Friday 3 December 2021

Puzzle 360: Killer Sudoku

I'm sure this sort of thing must have been tried before, dearest reader.  Not my most elegant work, but the vast majority of the windows will need to be opened if you want to make it through to the end.  Enjoy!
    #360 Killer Sudoku – rated 6/10 [Hard]
All puzzles © Tom Collyer 2009-21.

Thursday 2 September 2021

Puzzle 359: Heyawake

It's been a long time since I did Heyawake.  As with the recent Masyu, it's not entirely Nikoli smooth, and there are bits I'd even go so far as describing as fiddly, but hopefully it makes for a decent challenge at least.

As a bit of an aside, I came across recently, and lots of it basically seems like magic to me.  For example:
There's also something called pzprRT, which seems like an extraordinary setting tool.  Basically, for a small number of types, including 4 of my favourites: Yajilin, Masyu, Slitherlink and indeed Heyawake, you are able to mark up your puzzle using the pzprjs project (i.e. as featured on and then you can press a button and it will give you the live deductions corresponding to the clues you've put in.

I am still generally avoiding marking up my puzzles using, as the interaction between the puzzle player and the hidden database (which stores the metadata of logged in solvers) is still not entirely clear to me.  I think I saw that this metadata is indexed by hashed versions of the puzzle URLs rather than the URLs themselves - this would kind of be fine by me I think as it means the database would be genuinely only metadata, rather than actual puzzle data making up some kind of secret dystopian mega master puzzle database.  But in any case pzprRT generated me a, so I think it means this puzzle is now on the database, hashed or otherwise, and now what’s done is done.

I suppose what it does mean for you, dearest reader, is that you can also play along online!  I may even start making a habit of it.  Enjoy!
    #359 Heyawake – rated 8/10 [Very Hard]
All puzzles © Tom Collyer 2009-21.

Sunday 22 August 2021

Puzzle 358: Masyu

So I don't think this is really smooth and polished enough to live up to Nikoli standards, but there are a couple of flashes of something here and there to keep you interested, dearest reader.  Enjoy!
    #358 Masyu – rated 6/10 [Hard]
All puzzles © Tom Collyer 2009-21.

Saturday 21 August 2021

Puzzle 357: Sudoku

I'm going to be posting a few Nikoli style puzzles over the next few days.  I was about to say that there probably won't be any Sudoku, but that's not entirely true.  As a bit of a preview, there will probably be a few Renban Groups that I'll post, mainly because I've seen too many examples where the groups are presented as lines, rather than shaded regions, and it's annoyed me past a critical point.

Anyhow, puzzles like these I guess aren't as common as they used to be.  Nothing too flash here.  Enjoy!
    #357 Sudoku – rated 3/10 [Easy]
All puzzles © Tom Collyer 2009-21.

Tuesday 17 August 2021

Maki Kaji

I saw in the news today that Nikoli president Maki Kaji has passed away at the age of 69.

I had the chance to meet him at the 2010 World Sudoku Championship, hosted that year in Philadelphia, USA. Always fond of games and gambling - Nikoli the puzzle company takes its name from Nikoli the racehorse - he hosted a couple of rounds of the Nikoli derby, in which I was lucky enough to finish second, winning a tenugui which I still have with me today. 

(Thanks - in advance! - to Przemysław Dębiak for the photo)

Kaji was known as the “godfather of sudoku,” following his role in the worldwide explosion of its popularity in 2004.  

I am tickled to see an example of a Masyu puzzle included on the BBC website (and also how much better designed it used to be!). But I come under the category of a puzzle fanatic. I think part of his puzzling greatness is reflected in this quote:
"The secret to inventing a good puzzle," he said, "is to make the rules simple and easy for everyone, including beginners. 

"You have to be able to make both easy and difficult puzzles using the same rules.

"Between 200 and 300 people help to complete a new puzzle. It has to be something that children, old people and everyone in between can enjoy, to be really good."
Rest in peace. 

Wednesday 28 July 2021

Puzzle Projects

First a quick update: I haven't written here for a little while, which feels a bit personally disappointing.  I suppose part of it reflects my continuing malaise within the puzzle community. As I've previously said, it used to be a place where I thought I fit in, and increasingly I am made to feel as if I do not.  The best way I can describe this is that any enthusiasm I have for the puzzle community has been blunted by expediency, apathy and lethargy.

About this time last year I outlined that I'd like to try and work on a bunch of different puzzle projects as a way of trying to get back into the swing of things.  In some ways it was going to be part of my own recovery, but for a number of reasons none of them really happened.  The main thing to say is that I haven't forgotten about them, and I would like to try and make them happen.

Recently I have been thinking about what a week-long online global festival of puzzles might look like. And yes, I have been corresponding privately with some relevant people before you ask, dearest reader.  But I'd also like to share the sorts of things I would hypothetically like to see.

Saturday 15 May 2021

Portrait of a flying fish

If, like me dearest reader, you are interested in sudoku solving techniques, then perhaps this is an interesting link to a YouTube video.
If you found that video interesting, then perhaps this is an interesting picture.  [Someone going by the pseudonym shye is responsible for this puzzle, and it even has a name.  Imagine that, dearest reader: a sudoku with a name!  Rather grandly, this one is called "Valtari"]

If you found that picture interesting, then perhaps these are some interesting words.
  • Add rows 1, 3, 5, 7, 9 together with box 5 (in blue, with double weights in R5C4 and R5C6).
  • Subtract Columns 3, 5, 7 (in red).
  • The blue cells, considered with multiplicity, have the same contents as the red cells plus 3 extra copies of the digits 1-9.
  • In the double-weighted blue cells, the only possibilities are 2, 3 and 4.
  • There are already exactly two each of 2, 3 and 4 appearing as given digits in the single-weighted blue cells.
  • This means that whichever two of the three digits end up in the double-weighted blue cells will bring the multiplicity of both of those digits up to 4 within the blue cells.
  • This means that the two empty red cells have the same contents as the two empty double-weighted blue cells.
  • It also means that those two digits cannot appear again in any other blue cell.
  • More on this to come I'm sure, following this good work by Philip Newman:
UPDATE: as far as I can tell, there is no need to add Box 5 to the equation here - this makes the intersections slightly simpler (no need for double-weights any more) - but it does make it harder to identify where you are supposed to be looking.
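To see where the double weights come from, here is a minimal Python sketch. It is my own illustration, not Philip Newman's method. For every cell it counts how many blue units the cell belongs to and subtracts how many red units it belongs to. The helper names `box_of` and `net_weights` are hypothetical. Rows, columns and boxes are numbered 1-9, with boxes counted left to right, top to bottom.

```python
from collections import Counter


def box_of(row, col):
    """Box number 1-9, counted left to right, top to bottom."""
    return 3 * ((row - 1) // 3) + (col - 1) // 3 + 1


def net_weights(blue_rows=(), blue_boxes=(), red_cols=()):
    """Net multiplicity of each cell: blue memberships minus red memberships.

    Cells whose weight cancels to 0 are left out of the result.
    """
    weights = Counter()
    for row in range(1, 10):
        for col in range(1, 10):
            w = (row in blue_rows) + (box_of(row, col) in blue_boxes) - (col in red_cols)
            if w:
                weights[(row, col)] = w
    return weights


with_box = net_weights(blue_rows=(1, 3, 5, 7, 9), blue_boxes=(5,), red_cols=(3, 5, 7))
without_box = net_weights(blue_rows=(1, 3, 5, 7, 9), red_cols=(3, 5, 7))

# Every unit holds exactly one copy of 1-9, so total weight / 9 = surplus copies.
print(sum(with_box.values()) // 9)     # 3
print(sum(without_box.values()) // 9)  # 2
print(sorted(cell for cell, w in with_box.items() if w == 2))  # [(5, 4), (5, 6)]
```

The weight-2 cells are exactly R5C4 and R5C6. R5C5 drops back to weight 1 because column 5 is subtracted. Leaving box 5 out gives a surplus of 2 copies rather than 3, and no cell ends up above weight 1.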

If you didn't find that interesting, then I'll leave you with some whimsical musings.  I always thought that an obscure and difficult sudoku solving technique going by the name Exocet was named for a powerful missile that could bust open the hardest of puzzles.  Cela a plus de sens en français (it makes more sense in French, where an exocet is a flying fish).

Concord and Discord

I was pleased to receive a couple of messages on Discord yesterday by virtue of sharing a server - namely the Cracking the Cryptic server.  Unfortunately I can't reply to any messages any more, as I've decided it's for the best for me to leave that server, perhaps permanently.

The first message was from Philip Newman, letting me know about a document he's been working on.

This is, in fairness, an acquired taste.  It takes what was to me an impenetrable family of sudoku solving techniques (Exocets) and relates them to a slightly less impenetrable sudoku solving technique, namely "SET".  

As an aside, I don't think endless jargon, neologisms, portmanteaus, puns and acronyms are always very helpful when it comes to the dissemination of knowledge and understanding.  I should say that SET stands for "set equivalence theory", which is what has emerged following some of the discussions and posts on this blog (I still don't like that word even after all these years!) in autumn 2020.  I'd previously referred to it vaguely as intersection theory and will probably continue to do so in the future as well, in case that causes any confusion.

Anyhow, whilst I might not love the words, I do find the underlying concepts fascinating.  If you enjoyed said posts and discussions from last autumn then I think you'll enjoy Philip's document.  In particular, it seems to move the discussion on to more complicated and extended equivalent sets/intersections, and to more complicated interactions inside individual instances of them.  The way in which seemingly unrelated cells become strongly related turns out to be far more subtle than we really touched on previously.

Hopefully the last section implies this isn't going to be the final word on the matter either!  Maybe I'll take the time to add my own musings on Philip's work if I can muster up time and/or motivation.

The second message I received was from someone who had read one of my earlier blog (ugh) posts where I had previously mentioned that I wasn't having a great time on either the Puzzler's Club or the Cracking the Cryptic discord servers. 

I tried to reply privately to that message, but it turns out that having left the server I can't send the message.  So on the off chance that person is also reading this, let me reply in slightly less specific terms: thank you for your message, it was very kind of you to think of me and there was certainly no need to apologise on anyone's behalf.  Plenty of people have a better time of it there than I do - I can only relate my own experience.  At times (not all the time) it can feel very cliquey, hierarchical and beset by an unpleasant group-think of what is and isn't the best way to think about things.  

I particularly appreciate that this is by no means reflective of all the members of the server, but it is reflective of enough of them for me to conclude that it doesn't really work for me right now.

Sunday 14 February 2021

Puzzle 356: Arrow Sudoku

Well, I don't see why everyone else should be having fun with arrow sudoku and leave me feeling left out.  Who knows?  Maybe I'll make a few more.  

A.K.A. you need a spare arrow somewhere to break the symmetry.  Enjoy!
    #356 Arrow Sudoku – rated 6/10 [Hard]
All puzzles © Tom Collyer 2009-21.

Friday 5 February 2021

Puzzle 355: Arrow Sudoku

Look at me - I can make arrow sudoku too! Although perhaps not as difficult as the modern taste demands.  I thought this was fun anyway.  

In case you haven't seen arrow sudoku before, numbers placed in cells with circles must equal the sum of numbers placed along their corresponding arrows.  Numbers placed on arrows may repeat.  Enjoy!
    #355 Arrow Sudoku – rated 4/10 [Medium]
All puzzles © Tom Collyer 2009-21.
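As a rough illustration of how the circle-and-arrow rule narrows things down, here is a small Python sketch. It lists every ordered way to fill an arrow of a given length for a given circle digit. The function `arrow_fills` is a hypothetical helper of my own. It deliberately ignores the row, column and box interactions between arrow cells.

```python
from itertools import product


def arrow_fills(circle, length):
    """Ordered digit sequences for an arrow of `length` cells summing to `circle`.

    Digits may repeat, as the rules allow.  Whether two equal digits can really
    share an arrow depends on the rows, columns and boxes it passes through,
    which this sketch does not check.
    """
    return [cells for cells in product(range(1, 10), repeat=length) if sum(cells) == circle]


print(arrow_fills(4, 2))  # [(1, 3), (2, 2), (3, 1)]
print(arrow_fills(3, 3))  # [(1, 1, 1)]
```

The second case is the extreme one. A circle of 3 with a three-cell arrow forces every arrow cell to be 1, so those three cells must all lie in different rows, columns and boxes.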

Puzzle Rating Systems 2: Croco and Puzzle Duel

I'm going to continue my discussion about rating systems by exploring a couple of examples from the puzzling world.  I will apologise at this point for not setting up something like MathJax on the blog and making the formulae look a bit nicer.  Maybe I'll do that later.

The first example is from a website that sadly isn't running any more: croco puzzle.  This was a German puzzle website featuring daily puzzle(s) which placed you on a daily leaderboard, and then also on an overall leaderboard, functioning as a rating system. I won't go into the full details of the mechanics here, as thankfully a record of the mechanics has been lovingly documented by Chris Dickson here:

The key features of the rating system have much in common with Elo.  You start off with a current rating, you have a performance event, and from that you can calculate an updated rating by starting with the current rating and somehow adjusting it by the difference between your current rating and what the rating of the performance implied.  There is something pleasingly Bayesian about the whole state of affairs.

On croco puzzle, you started off with a rating of 0.  Performance was evaluated by comparing your solving time for a particular puzzle with the best time and the median solving time of the entire solving population.  Specifically:
Performance = 3000 * 1 / 2 ^ [(Your Time - The Best Time) / (Median Time - The Best Time)]
The update mechanism is simpler, and occurred every day:
New rating = Current Rating + (Performance Rating - Current Rating) / 60
[N.B. these formulas weren't completely consistent over time; there was some experimentation varying the parameters set here at 2 and 60 over different periods.  However I don't want to overcomplicate the exposition.]
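The two formulas translate directly into code. Below is a minimal Python sketch using the parameter values quoted above (2 and 60); the function names are my own. The `days_to_close_gap` helper is an extra illustration. It shows how slowly a rating closes the gap to a steady level of performance, assuming the same performance every day.

```python
import math


def croco_performance(your_time, best_time, median_time, base=2.0):
    """Daily performance: 3000 for the best time, and (with base 2) halved for
    every further (median - best) of extra solving time."""
    if median_time <= best_time:
        raise ValueError("median time must be slower than the best time")
    exponent = (your_time - best_time) / (median_time - best_time)
    return 3000 / base ** exponent


def croco_update(current_rating, performance, damping=60):
    """Move 1/damping of the way from the current rating towards the day's performance."""
    return current_rating + (performance - current_rating) / damping


def days_to_close_gap(fraction, damping=60):
    """Days of identical performances needed to close `fraction` of the initial gap."""
    # After n days the remaining gap is (1 - 1/damping) ** n of the original gap.
    return math.ceil(math.log(1 - fraction) / math.log(1 - 1 / damping))


print(croco_performance(300, 300, 600))  # 3000.0 (you set the best time)
print(croco_performance(600, 300, 600))  # 1500.0 (you match the median)
print(croco_update(0, 1500))             # 25.0
```

Under these assumptions `days_to_close_gap(0.9)` comes out at 138. That is about four and a half months of identical daily performances just to close 90% of the gap.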

Before talking about the dynamics, it's worth noting that performance evaluation is entirely dependent on the ability of the crowd on the day.  This aggregate ability might change over time because a different number of individuals might be playing on any two given days.  The aggregate ability might also change even if the individuals making up the population remain the same, as their overall abilities improve (or decline).  This means that the benchmark against which all performances are measured is likely to be stable from day to day, but perhaps not from year to year.  Indeed, I recall some analysis provided on the website demonstrating that ratings didn't really seem to be comparable over the longer term (screenshot dug up from

The rating system shares some of the limitations I noted with Elo, particularly with regard to the dynamics.  It took a great deal of time for a rating to converge to a sensible level, and the higher your rating, the more unstable it became, particularly past the median mark of 1500.  The instability compared to Elo was somewhat asymmetric: good performances were very much a case of diminishing returns from the update mechanism, but bad performances ended up being very costly.  To what extent this was a disincentive to play is probably debatable, but there were certainly examples of individuals cherry-picking the puzzles they played.
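A quick way to see the asymmetry is to compare two extremes at different ratings. The largest possible daily gain comes from a performance of 3000, i.e. the best time. The largest possible daily loss comes from a performance close to 0, i.e. a very slow solve. Here is a minimal Python sketch under the same assumed damping of 60. The floor of 0 is a limiting case, since the performance formula only approaches 0.

```python
def daily_swing(current_rating, damping=60, max_performance=3000, min_performance=0):
    """Largest possible daily gain and loss under the croco-style update.

    min_performance = 0 is the limit of a very slow solve; the formula never
    quite reaches it.
    """
    best_day = (max_performance - current_rating) / damping
    worst_day = (min_performance - current_rating) / damping
    return best_day, worst_day


for rating in (500, 1500, 2500):
    up, down = daily_swing(rating)
    print(f"rating {rating}: best day {up:+.1f}, worst day {down:+.1f}")
```

This prints +41.7/-8.3 at 500, +25.0/-25.0 at 1500 and +8.3/-41.7 at 2500. The median mark of 1500 is the only point where the two extremes balance.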

In fairness the croco system was designed to build up slowly, giving a sense of long-term achievement.  What I found interesting is that alongside the rating, a daily "Tagesrating" (daily rating) was published, giving a more complete picture of recent form, as opposed to long-term ability.  This is potentially where the use of the best time as a defining parameter might skew the daily evaluation, depending on who exactly got the best time, and how extreme it was in comparison to everyone else.  I shouldn't be surprised that Thomas Snyder had some thoughts on this at the time.

Puzzle Duel is a puzzle website which runs on a similar idea, although it uses a slightly different mechanic.

The explanation given is here:, although I'll also leave a screenshot to record things as well.

Again the starting point is a rating of 0.  The performance evaluation is there in the formula, but I'm not absolutely sure I've understood the update mechanism correctly.  I think it's the case that the ratings as per the leaderboard are calculated only weekly, despite there being a daily puzzle to solve.  That is to say the update mechanism runs in aggregate:
End of Week rating = Start of Week Rating + [Sum (Performance Rating - Start of Week Rating) / 7] ^ 0.66
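Here is that weekly update as a minimal Python sketch. It follows my reading above rather than anything confirmed by Puzzle Duel, and the function name is hypothetical. Two details are assumptions, flagged in the code. The first is how a negative average gap is raised to the power 0.66. The second is how missed days are treated.

```python
import math


def puzzle_duel_weekly_update(start_rating, daily_performances, exponent=0.66):
    """End-of-week rating, as I read the formula above.

    ASSUMPTION: the 0.66 power is applied to the size of the average gap with its
    sign kept, since a negative number has no real fractional power.
    ASSUMPTION: days without a performance are simply absent from the sum, and
    the divisor stays at 7.
    """
    mean_gap = sum(p - start_rating for p in daily_performances) / 7
    step = math.copysign(abs(mean_gap) ** exponent, mean_gap)
    return start_rating + step


print(round(puzzle_duel_weekly_update(0, [2000] * 7)))  # 151
```

The power term dampens large gaps heavily. Under these assumptions, a newcomer scoring 2000 every day of their first week would move up only about 151 points.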
Certainly you can see that individual points are calculated for each user every day, which I assume use a single term of the sum, rather than all 7:

When it comes to the dynamics of the puzzle duel rating, the same observations apply: as performance is benchmarked specifically to the median solving time, we are dealing with something that is likely to be stable from day to day, but perhaps not year to year.  That said, there is no dependence on the best time any more; instead every shift is measured relative to multiples of the median.  The update mechanic implies that 1500 means that on average you are 2x as slow as the median solving time, and 2500 means that you are 2x as fast; similarly 1000 means you are 4x as slow and 3000 means you are 4x as fast.  
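The anchor points above amount to a logarithmic scale: 2000 at the median, minus 500 for each doubling of your time and plus 500 for each halving. Here is a minimal Python sketch of that relationship. It is inferred from the anchor points only and is not copied from Puzzle Duel's own formula. Both function names are hypothetical.

```python
import math


def implied_performance(your_time, median_time):
    """Performance implied by the anchor points: 2000 at the median time,
    500 points lost per doubling of time and gained per halving.
    Inferred, not Puzzle Duel's published formula."""
    return 2000 - 500 * math.log2(your_time / median_time)


def time_multiple(performance):
    """Inverse mapping: the multiple of the median time a performance corresponds to."""
    return 2 ** ((2000 - performance) / 500)


for t in (25, 50, 100, 200, 400):
    print(t, implied_performance(t, 100))
```

With a median of 100, the loop prints 3000.0, 2500.0, 2000.0, 1500.0 and 1000.0 in turn. In other words, 4x fast, 2x fast, median, 2x slow and 4x slow.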

In practice I think this will end up bunching the majority of ratings quite tightly around the 2000 mark long term, which is possibly not good news for an accurate ranking system.  For now, individual ratings are still generally converging to their level - in particular they are mostly still increasing and I suppose this means that everyone is getting a good feeling when they come to look at their rating.

The equivalent weekly performance ratings are also viewable as a time series alongside the overall rating:

Perhaps time will tell, but it may well be the case that a bad performance (particularly if you have a submission error) is likely to be as asymmetrically punishing as it was for croco.  Puzzle Duel says that there is an initial bonus given to newcomers - but I'm not sure how effective that has been in getting people more towards their level.  Perhaps that's not the intention.

I'll conclude with a germ of a thought that I might shape the next post around.  I think one area to possibly consider is the asymmetry of the update mechanism, depending on where you currently sit overall.  I'd like to see a little more subtlety in the update mechanism, where recent form can more dynamically amplify or dampen down the magnitude of the update.  The desired dynamic would be to quickly accelerate an individual to a stable rating, and then leave them there if recent form stabilises.  In that way one bad performance in an otherwise stable run of form might not require several good performances to make up the lost ground, and one good performance isn't going to change everything overnight either.  On the other hand, should recent form start consistently moving away from the rating as part of a genuine shift in ability, then the update could become more sensitive and quickly re-converge.  More on that next time.

Tuesday 2 February 2021

Puzzle Rating Systems part 1: Elo and Credit Scores

The subject of rating systems is something that has been on my mind for many years, and I'd like to spend a bit of time blogging about my thoughts, to see where I end up as much as anything else.  Hopefully I'll get a couple of useful comments along the way from some of my dearest readers.  I've come across a few different rating systems in the puzzle solving context, and indeed beyond that, and whilst some of them have their pros and cons none of them seems to excite me particularly.  I think I could eventually do better!

I think my first exposure to rating systems was in the context of online matchmaking for the games Age of Empires 2 and Age of Mythology, which I used to play a lot before the days that sudoku really was a thing.  (I am mildly tickled to have discovered recently that Age of Empires 2 is still thriving, and indeed flourishing!).  Anyhow, the idea was there was a ladder based on the Elo system, with individuals gaining or losing points after a match depending on (1) their previous rating and (2) the difference in rating between them and the opponent they either won or lost to.  The system more or less worked to get games between players of equal skill, but as the basis for a rating system it did seem to leave a lot to be desired.
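For reference, here is the standard textbook Elo update as a minimal Python sketch. The K-factor of 32 is chosen purely for illustration. The parameters those game ladders actually used aren't known here.

```python
def expected_score(rating_a, rating_b):
    """Elo expected score for player A against player B (win = 1, draw = 0.5, loss = 0)."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def elo_update(rating_a, rating_b, score_a, k=32):
    """Return both players' new ratings after one game.

    k sets how far a single result can move a rating; 32 is an illustrative
    value, not necessarily what any particular ladder used.
    """
    change = k * (score_a - expected_score(rating_a, rating_b))
    # Expected scores sum to 1, so B's change is exactly the negative of A's.
    return rating_a + change, rating_b - change


print(elo_update(1500, 1500, 1))  # (1516.0, 1484.0)
```

The dependence on the rating gap is easy to see with these numbers. A 1800 player who beats a 1400 player gains only about 2.9 points. The same result the other way round moves the 1400 player up by about 29.1.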

[Incidentally, the same Elo system is used to provide the basis of Chess rankings.]

The limitations that I remember thinking about Elo at the time are:
  • Individual ratings didn't seem to be very stable.
  • Ratings weren't particularly comparable over time, and in particular seemed to be dependent on the total number of active players.
  • Individuals could manipulate their own ratings by cherry-picking their opponents.
  • In some cases Elo produced an incentive for an individual not to play at all if they believed they had maximised their rating.
  • It took a certain amount of time for Elo to converge on a reasonable rating for a player - this certainly led to some frustration when a good player would "smurf" on a new name and beat up on less good players as they rose through the ranks.
I'm sure there is no shortage of literature out there on the Elo system with regard to some of these limitations.  Maybe I'll talk about some of that in further posts.

More recently, I think a lot about rating systems in my job, where they are used to estimate credit risk.  I suppose many people will have heard of credit scores, where the higher your score, the lower your credit risk is supposed to be.  The general idea is to achieve good discrimination in your scoring system, so that when you line up your population in score order and observe what happens, most of the credit that went bad should be at the low-score end and almost none of it should be at the high-score end.  The extent to which this is achieved tends to be measured by the Gini coefficient, or by the area under the receiver operating characteristic ("ROC") curve.

[Incidentally, this is also something that you might have heard about in relation to "machine learning".  I find it very amusing that people in credit risk have been doing "data science", "artificial intelligence" and "machine learning" for decades and decades; practitioners have preferred to call this "modelling", or maybe sometimes "statistical modelling".]
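To make the discrimination measures concrete, here is a small Python sketch. It computes the AUC as the probability that a randomly chosen good account outscores a randomly chosen bad one, with ties counting half. It then applies the usual credit-risk convention Gini = 2 * AUC - 1. The scores are made up for illustration. The pairwise loop is fine for tiny examples, but a rank-based calculation would be used for a real portfolio.

```python
def auc(good_scores, bad_scores):
    """Share of (good, bad) pairs in which the good account has the higher score.

    Ties count as half a pair.
    """
    pairs = 0.0
    for g in good_scores:
        for b in bad_scores:
            if g > b:
                pairs += 1.0
            elif g == b:
                pairs += 0.5
    return pairs / (len(good_scores) * len(bad_scores))


def gini(good_scores, bad_scores):
    """Gini coefficient as conventionally defined in credit scoring: 2 * AUC - 1."""
    return 2 * auc(good_scores, bad_scores) - 1


# Hypothetical scores: higher should mean lower risk.
good = [700, 520]
bad = [500, 600]
print(auc(good, bad), gini(good, bad))  # 0.75 0.5
```

A perfect rank order gives an AUC and Gini of 1. A scorecard no better than a coin toss gives an AUC of 0.5 and a Gini of 0.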

Once you have a good rank order of your population, there is a secondary task to calibrate the ranking to an overall risk level for the population.  It's no good if your system implies that 5% of your population is going to go bad, only to observe that 10% subsequently did go bad, even if your rank ordering held up OK.  (Even a difference between 2% implied and, say, 2.2% observed is a BIG deal in credit risk - you are out by 10% and this can end up being worth 10s of £millions).  Looking at rating systems from this point of view suggests that it might be a good idea to provide a link between your rankings and some kind of probability.
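The calibration check itself is simple arithmetic. Here is a minimal Python sketch on a hypothetical portfolio: 1,000 accounts each predicted at 2%, of which 22 actually go bad. The function name and the data are illustrative only.

```python
def calibration(predicted_default_probs, defaulted):
    """Compare the bad rate implied by a model with the bad rate actually observed.

    predicted_default_probs: model probability of going bad, one per account.
    defaulted: 1 if the account went bad, 0 otherwise (same order).
    Returns (implied rate, observed rate, relative error of observed vs implied).
    """
    implied = sum(predicted_default_probs) / len(predicted_default_probs)
    observed = sum(defaulted) / len(defaulted)
    return implied, observed, observed / implied - 1


# Hypothetical portfolio: 1,000 accounts predicted at 2%, 22 of which go bad.
probs = [0.02] * 1000
outcomes = [1] * 22 + [0] * 978
implied, observed, rel_err = calibration(probs, outcomes)
print(f"implied {implied:.1%}, observed {observed:.1%}, out by {rel_err:.0%}")
```

This prints "implied 2.0%, observed 2.2%, out by 10%". The rank ordering plays no part here; only the overall level is being tested.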

Comparing Elo and credit ratings isn't exactly like for like, but in theory there are some probabilistic statements about Elo that can be made.  For example, if two players have the same rating, then the probability of winning should be 50/50.  But if the gap between the players is large then this might correspondingly be 25/75 or 5/95 and so on.  This is where there is more to the comparison than meets the eye, as both examples make use of the logistic curve.
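Here is a short Python sketch that makes the shared logistic curve explicit. The Elo part is the standard expected-score formula, rewritten as a natural-log logistic. The scorecard part uses a hypothetical "points to double the odds" scaling: a score of 600 means good:bad odds of 50:1, and every 20 points doubles those odds. These values are chosen purely for illustration and are not taken from any real scorecard.

```python
import math


def logistic(x):
    return 1 / (1 + math.exp(-x))


def elo_win_probability(rating_gap):
    """Elo expected score for the player who is `rating_gap` points higher."""
    return 1 / (1 + 10 ** (-rating_gap / 400))


def gap_for_probability(p):
    """Rating gap at which the stronger player's expected score is p."""
    return 400 * math.log10(p / (1 - p))


def scorecard_bad_probability(score, anchor_score=600, anchor_odds=50, pdo=20):
    """Probability of going bad under a HYPOTHETICAL scorecard scaled so that
    `anchor_score` means good:bad odds of `anchor_odds`, and every `pdo` points
    doubles those odds.  The parameter values are illustrative only."""
    log_odds_good = math.log(anchor_odds) + (score - anchor_score) * math.log(2) / pdo
    return logistic(-log_odds_good)


# Elo is the same logistic curve with the rating gap rescaled by ln(10) / 400.
print(math.isclose(elo_win_probability(150), logistic(150 * math.log(10) / 400)))  # True
print(round(gap_for_probability(0.75)), round(gap_for_probability(0.95)))  # 191 512
```

On this reading, 75/25 corresponds to a rating gap of roughly 191 points and 95/5 to roughly 512. In both systems a fixed number of points multiplies the odds by a fixed factor. For Elo, 400 points multiply the odds by 10. On the hypothetical scorecard, 20 points double them.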

I'm going to wrap things up here and save further detail for future posts.  I think the point with all of this is that I'm not convinced you can ever be sure that Elo ratings are accurate in the first place, and this has to do with the score update mechanism.  Elo assumed that players' skill levels could be represented by a normal random variable, whose mean slowly varied over time.  However, the issue here is with the score update mechanism, which at times isn't sensitive enough (in getting new players up to speed with a suitable rating) and at others is too sensitive (leading to erratic increases and decreases in rating), and overall doesn't seem to do a great job of tracking how that mean might vary over time (and indeed says nothing about the variance around that mean!).  That said, Elo does roughly get you to the right place, eventually, but it seems to me that there might be room for improvement.
