Monday, 23 February 2026

"Mistaken kudos you received — it's a puzzle (6)"

I mentioned in the previous post the LLM-generated puzzle that Cracking the Cryptic had featured - it turns out it got more mainstream attention.  In the Times puzzle weekly wrap-up email, Mick Hodgkin wrote the following:
Having said in that Times piece that the cryptic crossword “could be the last holdout of human puzzling ingenuity”, I felt compelled to put it to the test once more, asking Gemini to write a cryptic clue for SUDOKU. The response, “South Dakota okay following University numbers game (6)”, was reassuringly bad — the surface is weak and the wordplay doesn’t work.

I thought I'd also try putting that to the test.  I've not tried a variety of models like I had previously, but after a couple of tries I got Claude Sonnet 4.6 (not the thinking version, interestingly) to come up with the following:

"Scrambled kudos you received — it's a puzzle (6)"
  • Anagram ("scrambled") of KUDOS + U (you) = SUDOKU ✓
  • Definition: "a puzzle" ✓
  • Surface: reads naturally as "the praise you received got all mixed up" ✓
I looked at that and thought it was basically there - I decided to change the word "scrambled" for "mistaken" - and then realised I had beautifully ironic material for a blog post.
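
As an aside, the wordplay in both clues is easy enough to check mechanically.  What follows is only a minimal Python sketch: the helper name is_anagram is made up for illustration, and reading the Gemini clue as SD + OK + U is an assumption about how it was meant to parse.

```python
from collections import Counter

def is_anagram(fodder: str, answer: str) -> bool:
    """True if the letters of `fodder` can be rearranged to spell `answer`."""
    def letters(s: str) -> Counter:
        # Ignore case, spaces and punctuation; only letters matter.
        return Counter(ch for ch in s.upper() if ch.isalpha())
    return letters(fodder) == letters(answer)

# The Claude clue: KUDOS scrambled together with U ("you").
print(is_anagram("KUDOS" + "U", "SUDOKU"))

# Assumed reading of the Gemini clue: SD (South Dakota) + OK + U (University).
# That is only five letters, so it cannot make SUDOKU in any order.
print(is_anagram("SD" + "OK" + "U", "SUDOKU"))
```

This only covers the letter-counting part of "valid wordplay"; whether the surface reads well is still a job for a human.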

Anyhow, here's the prompt I gave it.  I'm pretty sure the clue was derived by the model itself, but I think next time I'd amend the prompt to explicitly call for something original that wasn't looked up from somewhere:
I want to write a good cryptic crossword clue for “SUDOKU” (6)

A good clue satisfies the following:

- a surface that reads well and misleads on first reading
- valid wordplay
- correct definition
- no extra words that aren’t wordplay or definition.

Ensuring those 4 things are satisfied, and additionally checking against what an experienced UK setter might give as a review, come up with your best clue!

I will blog more on Thomas Snyder's observation (I'm struggling to reference that observation more clearly/directly than this) that a good parallel for AI capability can be found in the description of the J3016 Automation levels:

0 No Automation 
1 Driver Assistance 
2 Partial Automation
3 Conditional Automation 
4 High Automation
5 Full Automation

I think this is a good way of thinking about automating tasks by specifying them well vs. truly autonomous intelligence.  Even then, I have my doubts as to whether level 5 really means intelligence, but in the crossword clue setting context I think what I've done today only really counts as level 1.  Still, it's enough for me to say that if you want to give me kudos for this clue, then I kind of feel you are mistaken.  Maybe!

Thursday, 19 February 2026

Large Language Models and Sudoku

I am being somewhat of a stick-in-the-mud for this post by refusing to use the terms AI or Artificial Intelligence.

So the reason for making this post is me seeing a Cracking the Cryptic video earlier this week featuring a puzzle created as the result of a one-shot prompt to a large language model (in this case Claude Opus 4.6).  This got me thinking: I'm generally aware that the underlying base models have had all sorts of capability for a while - after all, once you've trained on the entire internet there isn't really anywhere else to go - and that recent progress is more about the various thinking and fine-tuning that you layer on top of that.

In the video Simon made a rather bold claim regarding the "Cracking the Cryptic" test - i.e. that he'd pretty much be able to tell the difference between something entirely produced by electricity and silicon and something made with more thoughtful human creativity (even allowing for software assistance).

With that in mind I wanted to put various models to the test.  I started small with 6x6, came up with a prompt and saw what happened.  The results were the following, which I think are all reasonably interesting, especially puzzles 2 and 4:


(p.s. I've just about had enough of blogger - uploading an image shouldn't require turning off privacy options.  A blog change is now 100% on the cards whenever I get round to it.)

These were puzzles produced by the following LLMs:
1. GPT 5.2 Thinking
2. Claude Sonnet 4.6 Thinking
3. Grok 4.1 Thinking
4. Gemini 3 Pro

I'll add a note that non-thinking models were hopeless with this task.  I'll add that Grok and Gemini have question marks in my mind about whether they fluked it, because although the puzzles are fine, they couldn't output the corresponding solution correctly.  Both also failed to add 3x2 regions correctly.
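
For what it's worth, checking a model's claimed solution doesn't need any trust in the model.  Here is a minimal Python sketch of such a check.  The function names, the use of 0 for an empty cell, and the reading of "3x2 regions" as boxes three cells wide and two cells tall are all assumptions made for illustration, and the sample grid is invented rather than being one of the puzzles above.

```python
def is_valid_solution(grid, box_rows=2, box_cols=3):
    """Check a completed grid: each row, column and box holds 1..n exactly once."""
    n = box_rows * box_cols
    want = set(range(1, n + 1))
    if len(grid) != n or any(len(row) != n for row in grid):
        return False
    rows = [set(row) for row in grid]
    cols = [{grid[r][c] for r in range(n)} for c in range(n)]
    boxes = [
        {grid[r][c]
         for r in range(br, br + box_rows)
         for c in range(bc, bc + box_cols)}
        for br in range(0, n, box_rows)
        for bc in range(0, n, box_cols)
    ]
    return all(unit == want for unit in rows + cols + boxes)

def matches_givens(puzzle, solution):
    """Check the solution agrees with every given digit (0 marks an empty cell)."""
    return all(p == 0 or p == s
               for prow, srow in zip(puzzle, solution)
               for p, s in zip(prow, srow))

# An invented 6x6 solution with boxes 3 wide and 2 tall.
SOLVED = [
    [1, 2, 3, 4, 5, 6],
    [4, 5, 6, 1, 2, 3],
    [2, 3, 1, 5, 6, 4],
    [5, 6, 4, 2, 3, 1],
    [3, 1, 2, 6, 4, 5],
    [6, 4, 5, 3, 1, 2],
]

print(is_valid_solution(SOLVED))                          # boxes 3 wide, 2 tall
print(is_valid_solution(SOLVED, box_rows=3, box_cols=2))  # boxes drawn the other way
```

The second call shows why region orientation matters: the same grid fails when the boxes are drawn two wide and three tall.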

I'll add a second note that generating the puzzles as text seemed easy in comparison to generating the puzzles as images.  The diffusion models attached to LLMs are bloody useless with puzzle grids, and require thinking and programming tool use to get right.

I'll add a third note that I don't currently have access to Claude Opus 4.6 which created the CtC puzzle.

Monday, 23 June 2025

Bonus Puzzles from the UKSC

So the 2025 UKSC has been and gone.  Here are a couple of variations of puzzles that appeared in the contest, created by removing some given digits and cages.  I checked with solving tools that each had only 1 solution, but to me both seemed too hard to understand fully.  What I can say is that both of these puzzles were set with particular attention paid to the grid geometry - in particular using intersection theory/SET patterns.
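
A uniqueness check of this kind can be sketched in a few lines of Python.  To be clear, this is not the solving tool used for these puzzles, just a brute-force illustration for the classic rules only (killer cages would need extra constraints); the function name, the 0-for-empty convention and the sample grid are all assumptions.

```python
def count_solutions(grid, limit=2):
    """Count completions of a 9x9 classic grid (0 = empty), stopping at `limit`.

    Givens are assumed to be mutually consistent; they are not validated here.
    """
    grid = [row[:] for row in grid]  # work on a copy
    empties = [(r, c) for r in range(9) for c in range(9) if grid[r][c] == 0]
    found = 0

    def fits(r, c, d):
        # Digit d must not already appear in the row, column or 3x3 box.
        if any(grid[r][k] == d or grid[k][c] == d for k in range(9)):
            return False
        br, bc = 3 * (r // 3), 3 * (c // 3)
        return all(grid[i][j] != d
                   for i in range(br, br + 3)
                   for j in range(bc, bc + 3))

    def search(i):
        nonlocal found
        if i == len(empties):
            found += 1
            return
        r, c = empties[i]
        for d in range(1, 10):
            if found >= limit:
                return  # enough solutions seen to prove non-uniqueness
            if fits(r, c, d):
                grid[r][c] = d
                search(i + 1)
                grid[r][c] = 0

    search(0)
    return found

# A simple valid solved grid built from a shifting pattern (illustrative only).
solved = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]

one_blank = [row[:] for row in solved]
one_blank[4][4] = 0
print(count_solutions(one_blank) == 1)   # a single missing digit is forced

two_rows_blank = [[0] * 9, [0] * 9] + [row[:] for row in solved[2:]]
print(count_solutions(two_rows_blank))   # more than one way to fill two blank rows
```

Stopping at two solutions is a deliberate choice: for a uniqueness check you only need to know whether a second solution exists, not how many there are.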

On the classic - perhaps there's more to be done with the Phistomefel ring?  You didn't need to use this explicitly for the competition puzzle - the break-in for that was a series of two hidden pairs.  However, these are now gone with the removal of the given.  I suppose I should also add that the Sudokuwiki solver gets absolutely nowhere with this - in other words this one is off the scale in terms of that measure of difficulty.

On the killer - there's all sorts going on with the larger 2x2 checkerboard combination - but it also seems to me that there's some stuff that limits the permutations within those 2x2s.  The configuration doesn't work with a pure 2x2 grid - you need the L trimino cages in order to disambiguate the solution; that said I think they should be working with the larger checkerboard layout in exactly the same way.  Or not!
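
For anyone who hasn't met it, the Phistomefel ring is the ring of 16 cells immediately surrounding the centre box, and the result is that in any valid solution those cells hold exactly the same digits (counted with repeats) as the 16 cells in the four 2x2 corners of the grid.  The Python sketch below just demonstrates the equality on a sample solved grid; the function name and the grid are invented for illustration and have nothing to do with puzzle #363.

```python
from collections import Counter

def phistomefel_sets(grid):
    """Return digit counts for the ring round the centre box and the four 2x2 corners."""
    # Ring: the border of the 5x5 square covering rows/columns 3 to 7 (1-indexed).
    ring = [grid[r][c]
            for r in range(2, 7)
            for c in range(2, 7)
            if r in (2, 6) or c in (2, 6)]
    # Corners: the 2x2 block in each corner of the grid.
    corners = [grid[r][c] for r in (0, 1, 7, 8) for c in (0, 1, 7, 8)]
    return Counter(ring), Counter(corners)

# A simple valid solved grid built from a shifting pattern (illustrative only).
solved = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]

ring, corners = phistomefel_sets(solved)
print(ring == corners)
```

Comparing Counters rather than sets matters here, since digits repeat within both groups of 16 cells.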

It would be interesting to see if any of my dearest readers is able to share any insight.  I would say enjoy, but in this case I'm not 100% sure that experience is guaranteed!
    #363 Sudoku – rated 10/10 [Extra Hard]
    #364 Killer Sudoku – rated 10/10 [Extra Hard]

Friday, 10 February 2023

Puzzle 362 Penalty Heyawake

It's been a good long while, dearest reader, but here's something I came up with on a bit of a whim.  Come to think of it, lots of things about this post are a bit scatter-gun.  For one, Puzzle 361 remains unpublished.  For two, I don't know if this means I'll be posting anything other than quick ideas in the near future (not that I suppose I ever posted anything other than relatively quick ideas).

But yes.  All that aside, the idea of this puzzle is to apply the usual rules of Heyawake.  There is one further constraint to add to the mix: you must not be able to draw a loop in the unshaded squares.

The easiest consequence of this is that there cannot be any 2x2 blocks of unshaded squares, but there are a couple more logical consequences/solving heuristics to consider.  I'm not sure how many of them make it into this puzzle, or whether it's something that is really worth exploring in all that much detail, or even indeed whether this is a particularly helpful name for the variation... but it will have to do for now.  Enjoy!
    #362 Penalty Heyawake – rated 5/10 [Medium]
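
For the programmatically minded, the extra constraint amounts to saying that the unshaded squares, joined to their orthogonal neighbours, must not contain a cycle.  Here is a minimal Python sketch of that check using union-find.  The function name and the grid encoding ('#' for shaded, '.' for unshaded) are assumptions made for illustration, as is the reading of "loop" as a cycle through orthogonally adjacent squares; it checks only the no-loop rule, not the usual Heyawake rules.

```python
def has_unshaded_loop(grid):
    """grid: list of equal-length strings, '#' = shaded, '.' = unshaded."""
    rows, cols = len(grid), len(grid[0])
    # Each unshaded cell starts in its own component.
    parent = {(r, c): (r, c)
              for r in range(rows) for c in range(cols)
              if grid[r][c] == '.'}

    def find(x):
        # Follow parent links to the component root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (r, c) in list(parent):
        # Look only down and right so each adjacency is visited once.
        for nb in ((r + 1, c), (r, c + 1)):
            if nb in parent:
                a, b = find((r, c)), find(nb)
                if a == b:
                    return True  # the two cells were already connected: a loop
                parent[a] = b
    return False

print(has_unshaded_loop(["..",
                         ".."]))    # a 2x2 unshaded block
print(has_unshaded_loop(["...",
                         ".#.",
                         "..."]))   # a ring round a single shaded square
print(has_unshaded_loop(["..#",
                         "#..",
                         "..#"]))   # unshaded squares form a tree
```

The first example is the 2x2 consequence mentioned above; the second shows that a lone shaded square fully surrounded by unshaded ones is also ruled out.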

All puzzles © Tom Collyer 2009-23.

Sunday, 30 October 2022

Unofficial Championship Host's Guidebook

Before I get going, a couple of quick notes to self:

  1. Move this somewhere more permanent on the blog, and dig up other similar posts!
  2. Finish the post!

I don't know exactly how much appetite there is for this kind of post, but I don't think it's of no interest at all.  Hopefully it's of more interest than the WPF guidebook that exists somewhere on the internet, but which I am not interested enough in to find and link to right now.

The best approach to this kind of thing is to somehow work backwards from a successful outcome.  To that end, I will draw on my many experiences as a competition participant as well as my experience as a competition organiser to run through all the things a participant tends to experience at a well-run event.

These then can be regrouped from an organiser's point of view.
