taw's blog: SQL-based Game of Life refactored

Tuesday, December 01, 2015

SQL-based Game of Life refactored

a185991 sahara-2 by The.Rohit from flickr (CC-NC)

A couple weeks ago I posted SQL-based Game of Life. It was rather ridiculous, with 3-levels of VIEWs, one of them including LEFT OUTER JOIN.

I revisited the problem for some refactoring and it turns out solution is far simpler.

Rules of game of life

Standard rules are:

cell's neighbours are all cells next to the cell
cell is not its own neighbour
cell becomes live if it was alive and has 2 or 3 live neighbours
cell becomes dead if it was dead and has 3 live neighbours

But I thought, what if we tweaked them a bit so every cell is considered its own neighbour as well? Mathematically tweaking definition:

cell's neighbours are all cells next to the cell
cell is its own neighbour
cell becomes alive if it was alive and has 3 or 4 live neighbours
cell becomes dead if it was dead and has 3 live neighbours

That might or might not lead to simpler query. But those numbers are weirdly close, what if instead we did this:

cell's neighbours are all cells next to the cell
cell is 0.5 of its own neighbour
cell becomes alive if it was alive and has 2.5 or 3.5 live neighbours
cell becomes dead if it was dead and has 3 live neighbours

Wait, doesn't that simply mean...

cell's neighbours are all cells next to the cell
cell is 0.5 of its own neighbour
cell becomes alive if it has between 2.5 and 3.5 live neighbours

So it turns out we don't need to check alive/dead at all, leading to really simple code to which I'll get soon. Especially if we multiply weight by 2 to avoid floats - so cell counts as 1-weight neighbour and all cells around it count as 2-weight neighbours - and then we look for any total weight between 5 and 7.

Does that work with other rules?

For a while I thought it's a hack for standard Game of Life, and it wouldn't generalize to other types, but it totally would. If for example live cells stay live at 1, 6, or 7 neighbours, and dead ones at 2, 4, or 8, then we check for total in [3,13,15,4,6,16]. Or you could weight them at 10/1 instead of 1/2, then rules would be in [11,16,17,2,4,8].

I didn't use this generalization is the code, as code is a tiny bit simpler for standard case, but it's really tiny difference.

The code

CREATE TABLE life(
  simulation string,
  generation int,
  x int,
  y int
);

CREATE TABLE neighbour_coordinates(
  dx int,
  dy int,
  weight int
);

INSERT INTO neighbour_coordinates VALUES
  (-1,-1, 2),
  (-1, 0, 2),
  (-1,+1, 2),
  ( 0,-1, 2),
  ( 0, 0, 1),
  ( 0,+1, 2),
  (+1,-1, 2),
  (+1, 0, 2),
  (+1,+1, 2);

CREATE VIEW alive_next_generation AS
  SELECT
    simulation,
    generation+1 AS generation,
    x+dx as x,
    y+dy as y
  FROM life
  JOIN neighbour_coordinates
  GROUP BY simulation, generation, life.x+dx, life.y+dy
  HAVING SUM(weight) BETWEEN 5 and 7;

Driver being exactly as before. That's ridiculously simple compared with initial solution.

MongoDB rant

And then I thought - how about doing this in MongoDB? And oh my god, that aggregation pipeline...

MongoDB JSON-document model is the sweetest thing ever in the world of databases, and it's interface is OK for simple queries, but it seriously just needs a real query language. So far every time I had to do anything nontrivial in MongoDB I did it in Ruby on application side, and I'm not unusual here.

SQL world is on the other end of the scale, with literally the worst document model ever invented, but fairly adequate query language. I wouldn't call SQL "good", but compared to monstrosities required by mongo for even fairly straightforward stuff...

Did anybody ever manage to get best of both worlds with some document model based on JSON or similar, and adequate query language?

I know some RDBMSs have JSON boltons but they're half-supported bullshit, and SQL is just adequate in comparison to what Mongo requires, not by any objective standard, so I'd rather not get there.

Some NoSQLs like MongoDB for some reason feel they don't need any kind of real querying language, just some $-trees, but that doesn't get anywhere close to adequate even for rather simple uses and it just moves querying to application side hacks.

I guess solr/lucene is sort of doing both sides decently, but it's a fairly special kind of NoSQL, and for many reason difficult to use as primary database. (don't even ask what kind of hackery I've done with it over years).