Thursday, 19 February 2026

Large Language Models and Sudoku

I am being somewhat of a stick-in-the-mud for this post by refusing to use the terms AI or Artificial Intelligence.

So the reason for making this post is me seeing a Cracking the Cryptic video earlier this week featuring a puzzle created as the result of a one-shot prompt to a large language model (in this case Claude Opus 4.6).  This got me thinking: I'm generally aware that the underlying base models have had all sorts of capability for a while - after all once you've trained on the entire internet there isn't really anywhere else to go - and that recent progress is more in the various thinking and fine tuning that you layer on top of that.

In the video Simon made a rather bold claim regarding the "Cracking the Cryptic" test - i.e. that he'd pretty much be able to tell the difference between something entirely produced by electricity and silicon and something made with more thoughtful human creativity (even allowing for software assistance).

With that in mind I wanted to put various models to the test.  I started small with 6x6, came up with a prompt and saw what happened.  The results were the following puzzles, which I think are all reasonably interesting, especially puzzles 2 and 4:


(p.s. I've just about had enough of blogger - uploading an image shouldn't require turning off privacy options.  A blog change is now 100% on the cards whenever I get round to it.)

These were puzzles produced by the following LLMs:
1. GPT 5.2 Thinking
2. Claude Sonnet 4.6 Thinking
3. Grok 4.1 Thinking
4. Gemini 3 Pro

I'll add a note that non-thinking models were hopeless with this task.  I'll add that Grok and Gemini have question marks in my mind about whether they fluked it, because although the puzzles are fine, they couldn't output the corresponding solution correctly.  Both also failed to add 3x2 regions correctly.

I'll add a second note that generating the puzzles as text seemed easy in comparison to generating the puzzles as images.  The diffusion models attached to LLMs are bloody useless with puzzle grids, and require thinking and programming tool use to get right.

I'll add a third note that I don't currently have access to Claude Opus 4.6 which created the CtC puzzle.

Monday, 23 June 2025

Bonus Puzzles from the UKSC

So the 2025 UKSC has been and gone.  Here are a couple of variations of puzzles that appeared in the contest, created by removing some given digits and cages. I checked with solving tools that each had only 1 solution, but to me both seemed too hard to understand fully.  What I can say is that both of these puzzles were set with particular attention paid to the grid geometry - in particular using intersection theory/SET patterns.
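For anyone wanting to repeat the uniqueness check on a classic grid, here is a minimal sketch of the sort of thing a solving tool does.  To be clear, this is not the tool I used: the function name `count_solutions`, the 0-for-empty grid format and the `limit` parameter are all my own assumptions for illustration, and it only handles classic row/column/box rules (no killer cages).  It also assumes the givens don't already clash with each other.

```python
def count_solutions(grid, box_rows=3, box_cols=3, limit=2):
    """Count completions of a classic sudoku grid, stopping at `limit`.

    `grid` is a list of lists with 0 marking an empty cell (an assumed
    format).  Boxes are `box_rows` tall and `box_cols` wide, so a 6x6
    grid with boxes 2 tall and 3 wide uses box_rows=2, box_cols=3.
    """
    n = box_rows * box_cols
    grid = [row[:] for row in grid]  # work on a copy

    def allowed(r, c, d):
        # d must not already appear in the row, column or box of (r, c)
        if any(grid[r][j] == d for j in range(n)):
            return False
        if any(grid[i][c] == d for i in range(n)):
            return False
        r0, c0 = r - r % box_rows, c - c % box_cols
        return all(grid[i][j] != d
                   for i in range(r0, r0 + box_rows)
                   for j in range(c0, c0 + box_cols))

    def search():
        # Pick the empty cell with the fewest candidates.
        best = None
        for r in range(n):
            for c in range(n):
                if grid[r][c] == 0:
                    cands = [d for d in range(1, n + 1) if allowed(r, c, d)]
                    if not cands:
                        return 0  # dead end
                    if best is None or len(cands) < len(best[2]):
                        best = (r, c, cands)
        if best is None:
            return 1  # no empty cells left: one completed grid
        r, c, cands = best
        total = 0
        for d in cands:
            grid[r][c] = d
            total += search()
            grid[r][c] = 0
            if total >= limit:
                break  # enough to know the puzzle is not unique
        return total

    return search()
```

A puzzle is unique exactly when `count_solutions(puzzle)` returns 1.  Stopping at 2 is a deliberate choice: you only need to know whether there is a second solution, not how many there are.  Of course a brute-force count like this says nothing at all about how hard the puzzle is for a human.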

On the classic - perhaps there's more to be done with the Phistomefel ring?  You didn't need to use this explicitly for the competition puzzle - the break-in for that was a series of two hidden pairs.  However, these are now gone with the removal of the given.  I suppose I should also add that the Sudokuwiki solver gets absolutely nowhere with this - in other words this one is off the scale in terms of that measure of difficulty.

On the killer - there's all sorts going on with the larger 2x2 checkerboard combination - but it also seems to me that there's some stuff that limits the permutations within those 2x2s.  The configuration doesn't work with a pure 2x2 grid - you need the L trimino cages in order to disambiguate the solution; that said I think they should be working with the larger checkerboard layout in exactly the same way.  Or not!

It would be interesting to see if any of my dearest readers is able to share any insight.  I would say enjoy, but in this case I'm not 100% sure that experience is guaranteed!
    #363 Sudoku – rated 10/10 [Extra Hard]
    #364 Killer Sudoku – rated 10/10 [Extra Hard]

Friday, 10 February 2023

Puzzle 362 Penalty Heyawake

It's been a good long while dearest reader, but here's something I came up with on a bit of a whim.  Come to think of it, lots of things about this post are a bit scatter-gun.  For one, Puzzle 361 remains unpublished.  For two I don't know if this means I'll be posting anything other than quick ideas in the near future (not that I suppose I ever posted anything other than relatively quick ideas).

But yes.  All that aside, the idea of this puzzle is to apply the usual rules of Heyawake.  There is one further constraint to add to the mix: you must not be able to draw a loop in the unshaded squares.

The easiest consequence of this is that there cannot be any 2x2 blocks of unshaded squares, but there are a couple more logical consequences/solving heuristics to consider.  I'm not sure how many of them make it into this puzzle, or whether it's something that is really worth exploring in all that much detail, or even indeed whether this is a particularly helpful name for the variation... but it will have to do for now.  Enjoy!
    #362 Penalty Heyawake – rated 5/10 [Medium]
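If you want to check the extra no-loop constraint by computer, here is a minimal sketch.  It assumes that "loop" means a closed path through orthogonally adjacent unshaded squares, and it assumes a made-up grid format of strings with `#` for shaded and `.` for unshaded; the function name `has_unshaded_loop` is likewise just for illustration.  It does not check any of the usual Heyawake rules.

```python
def has_unshaded_loop(rows):
    """Return True if the unshaded cells ('.') contain a closed loop.

    `rows` is a list of equal-length strings, '#' for shaded and '.'
    for unshaded (an assumed format).  Cells are joined orthogonally.
    """
    # Union-find over unshaded cells: joining two cells that are already
    # connected means the new edge closes a loop.
    parent = {(r, c): (r, c)
              for r, row in enumerate(rows)
              for c, ch in enumerate(row) if ch == "."}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (r, c) in list(parent):
        # Only look down and right so each adjacency is visited once.
        for neighbour in ((r + 1, c), (r, c + 1)):
            if neighbour in parent:
                a, b = find((r, c)), find(neighbour)
                if a == b:
                    return True
                parent[a] = b
    return False
```

A 2x2 block of unshaded squares is the smallest possible loop, so `has_unshaded_loop(["..", ".."])` is True, while a grid whose unshaded squares form a simple path gives False.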

All puzzles © Tom Collyer 2009-23.

Sunday, 30 October 2022

Unofficial Championship Host's Guidebook

Before I get going, a couple of quick notes to self:

  1. Move this somewhere more permanent on the blog, and dig up other similar posts!
  2. Finish the post!

I don't know exactly how much appetite there is for this kind of post, but I don't think it's of no interest at all.  Hopefully it's of more interest than the existing WPF guidebook that exists somewhere on the internet, but that I am not interested enough in to find and link to right now.

The best approach to this kind of thing is to somehow work backwards from a successful outcome.  To that end, I will draw on my many experiences as a competition participant as well as my experience as a competition organiser to run through all the things a participant tends to experience at a well-run event.

These then can be regrouped from an organiser's point of view.

Tuesday, 25 October 2022

WSPC 22: Aftermath

So the intention wasn’t to leave the last post as something of a cliffhanger, but it turns out I didn’t really have the energy to do updates during the week. 

The first thing to say is that the week was a bit of a rollercoaster. There were some unpleasant lows I won’t speak more about, but they left me feeling greatly saddened and robbed me of both self-esteem and proper sleep during the week. That’s never ideal when you want to be at peak mental sharpness, to say the least.

But overall it was great to return to things after 3 long years away. The puzzle solving itself also had its ups and downs; that’s usually the case but this year I was well within my usual abilities. I do think the WPC in particular will go down as a classic vintage. The 3 years away has given the opportunity for a new generation of solvers and authors to break through, and it has been a real delight to meet some of them and start new friendships. And of course to meet dear old friends once again.

I’ll be doing some posts with the added benefit of hindsight when I’m back in London. These will probably focus more on the highs as I think it’s about time to put the aforementioned lows behind me now.
