Sitemap

Puzzle me this

Tom Geudens
14 min readJan 7, 2025

--

Completely unrelated to the rest of the post but I’m currently being haunted by Bach’s BMV 1060. I hear it everywhere. Very strange. Catchy tune though.

Why
Do not reinvent warm water! Were you ever told that? Did you then move on to something else, away from what had piqued your interest? I do not. Move on I mean. Lying in hospital one day, I got an unexpected visitor. As we didn’t want to discuss my condition he talked me — in detail — through an alternative system for bikes. One that he had developed. He had mastered how bikes work (bikes are fully understood, it’s a myth that they are not) and worked out a better and more efficient method. I love that. There is satisfaction to be found in understanding.

A puzzle is already understood by its creator. Solving it brings no new result. What it brings is understanding for the solver. How do the pieces fit together? How does the mind of the creator work?

At this time of the year — and given that I write about IT-related things — you might think I’m going to discuss the Advent Of Code puzzles that just had their 10th iteration. I’m not. They were great though. The overwhelming majority could be solved with a graph (look we have arrived at graphs ;-) and I had additional fun by limiting myself to pure Cypher/GQL (no apoc, no gds, no custom code). I’ll give you one quote that I found to be very telling. The more competitive people often use hacky methods to get to the result fast. Puzzle 24 part II required you to fix a diagram that represents an adder (not the venomous snake). Quote: I just looked at the graph representation and could pick out the flawed parts.

Graph that represents part of the adder diagram of the Advent Of Code 2024 day 24 part II puzzle, created with the hierarchical view in Neo4j Bloom.
part of the adder diagram in Neo4j Bloom

The full diagram is much bigger but there’s indeed one flaw (of the four in the puzzle) you can see in this image. A visible flaw in the pattern without having to do any computation (other than what goes on behind your eyes automatically). That’s the power of graphs.

What
In this post I discuss a puzzle which can be represented with a simpler graph. The Sudoku. As my Japanese is — sadly — limited to what I learned from reading Shōgun, I’ll leave the history of the name for you to look up.

A sudoku grid has 81 (9 times 9) cells. Each of these can be represented with a node.

A blue circle with black border representing a node. Inside is a black bordered white background pill that holds the label of the node, Cell. Next to the node are the properties, key value pairs.
sudoku cell

Sounds like a logical choice but note that in the adder-diagram above, I represent the wires as nodes and the actions (boolean gates) as relationships. Some thought does go into these choices!

I did go overboard with the properties. There is a lot of duplicated information. As there are only going to be 81 nodes in my database I’m going with convenience rather than sparseness. And by the way, the cartesian point is only there for easy visualisation; it plays no part in the solving.

Each cell is part of three sets. Row, column and box. Each of these is represented with a relationship.

ROW relationship
COLUMN relationship

Yes, I am aware I’m representing a column horizontally. It just looked horrible vertically. Use your imagination, I’m going to do it again to represent the box.

BOX relationship

Did you know that there actually is no name for a 3x3 grid (which each box in a sudoku is)? There is one for a 2x2 grid, it’s called a quadrant. And everybody in IT sales thinks those are magic. I hereby propose the word nonant. Sounds nice and it’s also a place in Normandy, France. What else could you possibly want?

Here is the syntax to create the database with an empty grid.

// schema first
CREATE CONSTRAINT uniqueCellid
IF NOT EXISTS FOR (c:Cell) REQUIRE (c.id) IS UNIQUE;
// create the nodes
FOREACH(row IN range(1,9) |
FOREACH(column IN range(1,9) |
CREATE (:Cell {
id: toInteger(toString(row) + toString(column)),
stringid: "r" + toString(row) + "c" + toString(column),
row: row,
column: column,
gridlocation: point({ x: column - 1, y: (row - 9) * -1 }),
value: 0,
box: 3 * ( ( row - 1 ) / 3 ) + ( ( column - 1 ) / 3 ) + 1
})
)
)
WITH "created the nodes" AS result
MATCH (:Cell)
RETURN count(*) AS thecount;
// create the ROW relationship
MATCH (c:Cell)
WITH c.row AS row, c.id AS id
ORDER BY row, id
WITH row, collect(id) AS toconnect
CALL (row, toconnect) {
UNWIND range(0,8) AS index
WITH toconnect[index] AS sourceid, toconnect[index+1] AS targetid
MATCH (source:Cell WHERE source.id = sourceid)
MATCH (target:Cell WHERE target.id = targetid)
CREATE (source)-[:ROW]->(target)
RETURN count(*) AS connections
}
RETURN sum(connections) AS thecount;
// create the COLUMN relationship
MATCH (c:Cell)
WITH c.column AS column, c.id AS id
ORDER BY column, id
WITH column, collect(id) AS toconnect
CALL (column, toconnect) {
UNWIND range(0,8) AS index
WITH toconnect[index] AS sourceid, toconnect[index+1] AS targetid
MATCH (source:Cell WHERE source.id = sourceid)
MATCH (target:Cell WHERE target.id = targetid)
CREATE (source)-[:COLUMN]->(target)
RETURN count(*) AS connections
}
RETURN sum(connections) AS thecount;
// create the BOX relationship
MATCH (c:Cell)
WITH c.box AS box, c.id AS id
ORDER BY box, id
WITH box, collect(id) AS toconnect
CALL (box, toconnect) {
UNWIND range(0,8) AS index
WITH toconnect[index] AS sourceid, toconnect[index+1] AS targetid
MATCH (source:Cell WHERE source.id = sourceid)
MATCH (target:Cell WHERE target.id = targetid)
CREATE (source)-[:BOX]->(target)
RETURN count(*) AS connections
}
RETURN sum(connections) AS thecount;

Before I can get to the solving, one more thing about my model needs to be explained. When a node has no value for a particular property, the rule in a property graph is to not store the property. In this case however that rule is broken (it’s more of a guideline anyway). When a cell has no value yet, its value is zero. That is not only con-vention (in the sudoku space) it’s also con-venient for the calculations we’ll be doing (and if you don’t believe that, I’ll invent another con-story).

How
In case the rules of sudoku aren’t clear (please message me with the location of the rock you’ve been hiding under, I’d love to rent the place) here they are

  1. Each row of nine cells has to contain the numbers 1, 2, 3, 4, 5, 6, 7, 8, 9 with no repeats.
  2. Each column of nine cells has to contain the numbers 1, 2, 3, 4, 5, 6, 7, 8, 9 with no repeats.
  3. Each box of nine cells has to contain the numbers 1, 2, 3, 4, 5, 6, 7, 8, 9 with no repeats.

An emergent rule from this is that the numbers in each row, column and box must add up to 45 (= 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9).

You are given a grid with some cells already provided and have to fill out all the other cells according to those rules. Only true sudoku puzzles are considered in this post. Those only have one solution.

To start the syntax-heavy part (you’ve been warned), a couple of helper queries

// how many left todo
MATCH (cell:Cell WHERE cell.value = 0)
RETURN count(*) AS todo;
// frequency count
UNWIND range(1,9) AS value
WITH value, COUNT {MATCH (cell:Cell) WHERE cell.value = value} AS frequency
ORDER BY frequency DESC
WITH value, frequency
WHERE frequency < 9
RETURN value, frequency;
// set the grid
WITH split($input,'') AS entries
UNWIND range(1,9) AS row
UNWIND range(1,9) AS column
MATCH (cell:Cell WHERE cell.id = toInteger(toString(row) + toString(column)))
SET cell.value = toInteger(entries[((row - 1) * 9) + (column - 1)]);
// export the grid
UNWIND range(1,9) AS row
UNWIND range(1,9) AS column
MATCH (cell:Cell WHERE cell.id = toInteger(toString(row) + toString(column)))
RETURN reduce(result="",entry in collect(cell.value) | result + entry)
AS sudoku;

A human knows when a puzzle is done, a computer does not. Hence the todo-query. A human knows which number is prevalent (and thus might offer the most opportunities), a computer does not. Hence the frequency-query. I made a mistake on my first iteration of that. I forgot that not all puzzles start out with all values in them. Tant pis (that is French, that is) but that’s why I’m using the range there instead of going over the nodes.

The grid is set by using an 81 position string. Say for example that we want to set this puzzle

. 2 . | . 3 . | . . .
. . 3 | . . . | . 4 .
. . 4 | 9 . 8 | . . 3
------ ----- -----
. 4 . | 7 1 . | . 5 6
3 . . | . . . | . . 2
5 1 . | . 9 6 | . 8 .
------ ----- -----
9 . . | 3 . 5 | 4 . .
. 8 . | . . . | 1 . .
. . . | . 2 . | . 9 .

Then that is represented as follows

:param input => "020030000003000040004908003040710056300000002510096080900305400080000100000020090"

And the final helper query can take a grid and turn it into such a string.

My children have to solve a sudoku every day. Yes, I am an old school parent, my children have duties. They also have to read (in a physical book) and write (by hand with a pen) every day. Parenting is a subject for another post. Sudoku solving is the one here. I am not a — complete — monster, the puzzles they get are the basic ones you find in magazines. Most of what they have to do is find the naked singles.

In case you think this post has now gone x-rated, watch this Sudoku master discuss it and realize the innuendo is only in your head!
It’s a cell that — based on the values already in the grid — can only be a single value. Because either the row, column or box forces it so.

Taking our puzzle again, there are 8 such cells.

. 2 . | . 3 . | . . .
. . 3 | . . . | . 4 .
. . 4 | 9 . 8 | . . 3
------ ----- -----
. 4 . | 7 1 3 | . 5 6
3 . . | . . . | . 1 2
5 1 . | . 9 6 | 3 8 4
------ ----- -----
9 . 1 | 3 . 5 | 4 . .
. 8 . | . . 9 | 1 3 .
. 3 . | . 2 . | . 9 .

And you might think I’m using one to find the other, but I’m not
1 goes into position 58 because it’s the only option for the box and the row
1 goes into position 73 because it’s the only option for the row
3 goes into position 46 because it’s the only option for the box and the column
… and so on, each of them could have been the one you saw first.

The Cypher query goes thus

UNWIND range(1,9) AS value
CALL (value) {
CALL (value) {
MATCH (cell:Cell)
// the cell is empty
WHERE cell.value = 0
// and there's nothing in the cell's row that has the value
AND NOT EXISTS {
MATCH (cell)-[:ROW]-+(other) WHERE other.value = value
}
// and there's nothing in the cell's column that has the value
AND NOT EXISTS {
MATCH (cell)-[:COLUMN]-+(other) WHERE other.value = value
}
// and there's nothing in the cell's box that has the value
AND NOT EXISTS {
MATCH (cell)-[:BOX]-+(other) WHERE other.value = value
}
WITH cell.box AS box, collect(cell.id) AS theids
WHERE size(theids) = 1
// there's only one possibility for the value in the box
RETURN theids[0] AS single, "box" AS reason
}
RETURN single, reason
UNION
CALL (value) {
MATCH (cell:Cell)
// the cell is empty
WHERE cell.value = 0
// and there's nothing in the cell's row that has the value
AND NOT EXISTS {
MATCH (cell)-[:ROW]-+(other) WHERE other.value = value
}
// and there's nothing in the cell's column that has the value
AND NOT EXISTS {
MATCH (cell)-[:COLUMN]-+(other) WHERE other.value = value
}
// and there's nothing in the cell's box that has the value
AND NOT EXISTS {
MATCH (cell)-[:BOX]-+(other) WHERE other.value = value
}
WITH cell.row AS row, collect(cell.id) AS theids
WHERE size(theids) = 1
// there's only one possibility for the value in the row
RETURN theids[0] AS single, "row" AS reason
}
RETURN single, reason
UNION
CALL (value) {
MATCH (cell:Cell)
// the cell is empty
WHERE cell.value = 0
// and there's nothing in the cell's row that has the value
AND NOT EXISTS {
MATCH (cell)-[:ROW]-+(other) WHERE other.value = value
}
// and there's nothing in the cell's column that has the value
AND NOT EXISTS {
MATCH (cell)-[:COLUMN]-+(other) WHERE other.value = value
}
// and there's nothing in the cell's box that has the value
AND NOT EXISTS {
MATCH (cell)-[:BOX]-+(other) WHERE other.value = value
}
WITH cell.column AS column, collect(cell.id) AS theids
WHERE size(theids) = 1
// there's only one possibility for the value in the column
RETURN theids[0] AS single, "column" AS reason
}
RETURN single, reason
}
RETURN value, single, collect(reason) AS reasons
ORDER BY value, single;

That is the same subquery three times. Well spotted! Once aggregated on row, once on column and once on box. Wouldn’t it be cool if there was a way to define a simple function in Cypher itself?

Updating is — obviously — almost the same query but ends with

...
WITH value, single, collect(reason) as reasons
MATCH (cell:Cell WHERE cell.id = single)
SET cell.value = value
RETURN count(*);

Rinse and repeat. Six more singles depended on those eight. Then three. Then … it stops. Our puzzle now looks like this

. 2 . | . 3 . | . . .
. . 3 | . . 2 | . 4 .
1 . 4 | 9 . 8 | 2 . 3
------ ----- -----
. 4 . | 7 1 3 | 9 5 6
3 . . | . . . | 7 1 2
5 1 7 | 2 9 6 | 3 8 4
------ ----- -----
9 . 1 | 3 . 5 | 4 2 .
. 8 . | . . 9 | 1 3 .
. 3 . | . 2 . | 6 9 .

We could now go fancy. Try finding X-wings. Or use the Snyder notation to find options. My children are 11 and 14. Do you really think I would put them through that?

Consider instead position 56. It can only be 4. Not because it’s the only option for that value in the row, column or box but because it can not be either of the two other numbers missing in the box, 5 and 8 (both are ruled out by the column). Likewise position 93 can only be 5.
Written as a query that looks like this

UNWIND range(1,9) AS value
CALL (value) {
CALL (value) {
MATCH (cell:Cell)
// the cell is empty
WHERE cell.value = 0
// and there's nothing in the cell's row that has the value
AND NOT EXISTS {
MATCH (cell)-[:ROW]-+(other) WHERE other.value = value
}
// and there's nothing in the cell's column that has the value
AND NOT EXISTS {
MATCH (cell)-[:COLUMN]-+(other) WHERE other.value = value
}
// and there's nothing in the cell's box that has the value
AND NOT EXISTS {
MATCH (cell)-[:BOX]-+(other) WHERE other.value = value
}
WITH cell.box AS box, collect(cell.id) AS theids
RETURN theids AS options, "box" AS reason
}
RETURN options, reason
UNION
CALL (value) {
MATCH (cell:Cell)
// the cell is empty
WHERE cell.value = 0
// and there's nothing in the cell's row that has the value
AND NOT EXISTS {
MATCH (cell)-[:ROW]-+(other) WHERE other.value = value
}
// and there's nothing in the cell's column that has the value
AND NOT EXISTS {
MATCH (cell)-[:COLUMN]-+(other) WHERE other.value = value
}
// and there's nothing in the cell's box that has the value
AND NOT EXISTS {
MATCH (cell)-[:BOX]-+(other) WHERE other.value = value
}
WITH cell.row AS row, collect(cell.id) AS theids
RETURN theids AS options, "row" AS reason
}
RETURN options, reason
UNION
CALL (value) {
MATCH (cell:Cell)
// the cell is empty
WHERE cell.value = 0
// and there's nothing in the cell's row that has the value
AND NOT EXISTS {
MATCH (cell)-[:ROW]-+(other) WHERE other.value = value
}
// and there's nothing in the cell's column that has the value
AND NOT EXISTS {
MATCH (cell)-[:COLUMN]-+(other) WHERE other.value = value
}
// and there's nothing in the cell's box that has the value
AND NOT EXISTS {
MATCH (cell)-[:BOX]-+(other) WHERE other.value = value
}
WITH cell.column AS column, collect(cell.id) AS theids
RETURN theids AS options, "column" AS reason
}
RETURN options, reason
}
UNWIND options AS option
WITH option, collect(value) as values
WITH option, reduce(uniquevalues=[], value in values |
CASE WHEN value IN uniquevalues
THEN uniquevalues
ELSE uniquevalues + value END) AS uniquevalues
WHERE size(uniquevalues) = 1
RETURN option, uniquevalues[0] AS single;

Instead of looking if there’s only one position for a value, it checks if there’s only one value for a position. To update, replace the final return statement with

...
MATCH (cell:Cell WHERE cell.id = option)
SET cell.value = uniquevalues[0]
RETURN count(*);

With 34 positions left to solve after that, we move back to checking for singles. Three are found on the first iteration, three more on the second, three more on the third, then four, two, four, three, four, five, three more on the tenth iteration … and guess what, we’re done.

solved puzzle in Neo4j Bloom

Wasn’t that fun?

Well not so much if you — like me — are executing this manually in the Neo4j Browser. This puzzle requires 14 iterations. Cypher/GQL is a great query language but it lacks coding constructs (functions, proper loops and branching).

To alleviate that pain (and keep myself busy during the dark winter days) I created a Python notebook with solver code. Don’t shoot me if the code is bad; Golang is my usual poison. You’ll find it, the browser scripts and everything else I used in this blogpost in this github repository.

Last words
The above will not solve all sudoku puzzles. You’ll find a third approach in my scripts (that one involves basic pencilmarking). With that added most hard-level puzzles on The Sudoku can be solved.

But that is not the point. I gave my kids puzzle 00759 (hard-level) yesterday. It looks hard (relatively few clues to start with and very few starting points on the first couple of iterations). Neither of them got very far in their allotted time (as I said, I’m not a complete monster, their tasks are timeboxed). And yet the puzzle only requires finding naked singles. Try it. Few clues to start and twenty-five iterations to go through, that is what makes that puzzle hard.

That too is not the point. I didn’t look up what common approaches to solve sudokus are. I gave you my own. It’s very likely that I’m repeating what others have done. It’s very likely that my queries are not the most efficient. And I should probably have looked into what can be done with that adds-up-to-45-rule. I had fun with it and if I put you on a journey of your own I have succeeded!

Reinvent warm water/the wheel. Please. The planet can do with better versions of those (and a million other things some will claim we have solved).

--

--

Tom Geudens
Tom Geudens

Written by Tom Geudens

At the age of 15, Tom Geudens’ parents gave him a choice. Either become a baker or go into IT. He went into IT …

No responses yet