Thursday, 19 February 2026

Large Language Models and Sudoku

I am being somewhat of a stick-in-the-mud for this post by refusing to use the terms AI or Artificial Intelligence.

The reason for this post is that earlier this week I saw a Cracking the Cryptic video featuring a puzzle created from a one-shot prompt to a large language model (in this case Claude Opus 4.6).  This got me thinking: I'm generally aware that the underlying base models have had all sorts of capability for a while - after all, once you've trained on the entire internet there isn't really anywhere else to go - and that recent progress comes more from the various thinking and fine-tuning that you layer on top of that.

In the video Simon made a rather bold claim regarding the "Cracking the Cryptic" test - i.e. that he'd pretty much be able to tell the difference between something entirely produced by electricity and silicon and something made with more thoughtful human creativity (even allowing for software assistance).

With that in mind I wanted to put various models to the test.  I started small with 6x6, came up with a prompt and saw what happened.  The results were the following puzzles, which I think are all reasonably interesting, especially puzzles 2 and 4:


(p.s. I've just about had enough of Blogger - uploading an image shouldn't require turning off privacy options.  A blog change is now 100% on the cards whenever I get round to it.)

These were puzzles produced by the following LLMs:
1. GPT 5.2 Thinking
2. Claude Sonnet 4.6 Thinking
3. Grok 4.1 Thinking
4. Gemini 3 Pro

I'll add a note that non-thinking models were hopeless at this task.  Grok and Gemini also have question marks in my mind about whether they fluked it: although the puzzles are fine, neither could output the corresponding solution correctly.  Both also failed to add the 3x2 regions correctly.
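
For what it's worth, the "did they fluke it" question can be checked mechanically rather than by eye. Below is a minimal sketch in Python, not the method used for the puzzles above. It assumes a plain 6x6 with no variant rules, 0 for an empty cell, and boxes that are 2 rows tall by 3 columns wide (my assumption about the layout of the 3x2 regions). The function names are made up for illustration.

```python
from typing import List

Grid = List[List[int]]  # 6x6 list of rows; 0 marks an empty cell (assumed encoding)


def is_valid_solution(grid: Grid, box_rows: int = 2, box_cols: int = 3) -> bool:
    """True if every row, column and box contains each digit 1..size exactly once."""
    size = len(grid)
    want = set(range(1, size + 1))
    rows_ok = all(set(row) == want for row in grid)
    cols_ok = all({grid[r][c] for r in range(size)} == want for c in range(size))
    boxes_ok = all(
        {grid[r][c]
         for r in range(br, br + box_rows)
         for c in range(bc, bc + box_cols)} == want
        for br in range(0, size, box_rows)
        for bc in range(0, size, box_cols)
    )
    return rows_ok and cols_ok and boxes_ok


def candidates(grid: Grid, r: int, c: int, box_rows: int = 2, box_cols: int = 3) -> List[int]:
    """Digits not yet used in the row, column or box of cell (r, c)."""
    size = len(grid)
    used = set(grid[r]) | {grid[i][c] for i in range(size)}
    br, bc = r - r % box_rows, c - c % box_cols
    for i in range(br, br + box_rows):
        for j in range(bc, bc + box_cols):
            used.add(grid[i][j])
    return [d for d in range(1, size + 1) if d not in used]


def count_solutions(grid: Grid, limit: int = 2, box_rows: int = 2, box_cols: int = 3) -> int:
    """Count solutions by backtracking, stopping early once `limit` is reached.

    A result of exactly 1 (with limit >= 2) means the puzzle is uniquely solvable.
    The grid is modified during the search but restored before returning.
    """
    size = len(grid)
    for r in range(size):
        for c in range(size):
            if grid[r][c] == 0:
                total = 0
                for d in candidates(grid, r, c, box_rows, box_cols):
                    grid[r][c] = d
                    total += count_solutions(grid, limit - total, box_rows, box_cols)
                    grid[r][c] = 0
                    if total >= limit:
                        break
                return total
    # No empty cells left: count it only if the filled grid is actually valid,
    # which also catches givens that contradict each other.
    return 1 if is_valid_solution(grid, box_rows, box_cols) else 0
```

With something like this, a model's puzzle passes if `count_solutions(puzzle)` returns 1, and its claimed solution passes if `is_valid_solution(solution)` is true and the solution agrees with every given digit. The early exit at `limit` keeps the search from enumerating every solution of an under-clued grid.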

I'll add a second note that generating the puzzles as text seemed easy in comparison to generating the puzzles as images.  The diffusion models attached to LLMs are bloody useless with puzzle grids, and it takes thinking and programming tool use to get an image right.

I'll add a third note that I don't currently have access to Claude Opus 4.6, which created the CtC puzzle.
