I wouldn't recommend jumping straight into it however, so I'm going to try and post a little bit of background material first. I'm now going to work backwards.

Today (9th October) I was about to write this post, and then I saw Sam Cappleman-Lynes comment on my previous post regarding Phistomefel's theorem being equivalent to SK Loops, and so I am wondering if the Fred's generalisation, and the further generalisation I am about to try and explain, are in fact equivalently explained using some impenetrable notation in a dark corner somewhere over on http://forum.enjoysudoku.com/. Maybe we'll find that out later, dearest reader.

Now let's jump back to the beginning. My attention was piqued over on Cracking the Cryptic by a pair of videos where both Mark and Simon solved the same puzzle by Philip Newman, which comes with the name Tatooine Sunset:

- Mark's guessy solve is here: How I "Cheat" at Sudoku
- Simon's more methodical solve is here: How NOT To Cheat At Sudoku!
- For reference, the puzzle is also discussed here: http://forum.enjoysudoku.com/tatooine-sunset-tatooine-sunrise-t38180.html

I watched Simon's video on Wednesday night (7th October), and saw there were an awful lot of swordfish going on, together with some triples, and that everything was hinting at some extraordinary underlying structure to this puzzle. I messaged Simon to say that there was something sort of Phistomefel-ly with the givens in the 4x4 region of the grid between Rows 2-5 and Columns 2-5 (A). I also saw there was something else going on with some more regions: namely those between Rows 7-8/Columns 2-5 (B), Rows 2-5/Columns 8-9 (C), and then Rows 7-8/Columns 8-9 (D).

This clearly suggested a Fred style partition, grouping together Rows 169 to define one part of the partition, and Columns 167 to define the other. The 9 cell set (E) defined by the intersection of these rows and columns is absolutely fascinating, as it is where the breakthrough in this puzzle is made, after you're done with all the fishing around.

Fred's theory tells us that the set of digits placed in those 9 cells, plus 3 extra copies of 1-9, are equal to the set of 36 numbers placed in the four regions A, B, C and D described above. I'd even got as far as noticing that the digits 5,7,9 occur exactly 3 times each in that ABCD area. Which implies that the set E cannot contain the digits 5,7,9, since:

E + 3 x (1-9) = A + B + C + D

and the 3 copies of each digit on the right hand side are exactly accounted for by the 3 sets of 1-9 on the left. This felt like an exciting deduction, and I'd even managed to have the idea that this was in some sense equivalent to swordfishes, because as Fred has told us, swordfishes are one thing that can be explained by this intersection theory.

At that point I think I'd have been left scrabbling around not knowing what to do next. It was as if I had some of the picture, but not all of it, and I had no idea what I was looking for. However fate then smiled upon me.

On Thursday night (8th October), as I was googling "tatooine sunset sudoku" to try and find the enjoysudoku forum post again, I noticed a link to Trevor Tao's video. [I'm so excited by all of this I am only going to mention in passing that I *think* Trevor may well be the brother of the Fields Medal winner mathematician, Terence Tao]. I also messaged Simon about this video on Thursday, so I'm kind of hoping we might be seeing a video on this in the next week or so. Simon certainly has a talent for clear exposition than I can only dream about!

Trevor's genius observation was that the right way of thinking about this puzzle is not via Fred's inersection theory, and a partition into 4 regions of the grids, but instead to partition rows and columns into 3 (which he does via colours in the video) to end up with a 9 region partition of the grid.

In case you are too lazy to scroll up to see the link to Trevor's video, I'm going to post it again. I can't get over how unbelievably elegant this generalisation of Fred's already very elegant intersection theory actually is!

The video skips over some of the details a bit quickly, so I'm going to add a couple of observations.

- I'm not entirely convinced using the same colours to partition both the rows and the columns is necessarily helpful. When playing around with this in my notebook, I'm labelling the columns ABC and the rows DEF.
- I also don't think displaying the dual view of the 9 resulting intersections as exactly the same grid as the original Sudoku puzzle is all that intuitive. Whilst the two are linked, they do behave very differently. In my notebook I drew a 3x3 array of the partition something like this:DA DB DCEA EB ECFA FB FC
- On the other hand there are some advantages to the colouring, as it does highlight some of the beautiful structures underlying this puzzle - firstly that the red rows/red columns intersection is entirely occupied by givens, which is also the case for the blue rows/blue columns intersection.
- More than that, the digits in the red intersection and the digits in the blue intersection have no overlap! And this imposes further structure into the dual 3x3 array because now it means that, for example, red rows/blue columns is the same set as blue rows/red columns. In other words it's Fred's intersection theory working together simultaneously over 3 different 2x2 partitions!
- What this then means is that you can play a weird sudoku-like game (albeit one with repeated symbols!) in this dual 3x3 array to then try and solve the sets of digits in each of the 9 intersections, noting that each complete set of 3 rows or 3 columns must contain 3 complete copies of the digits 1-9.
- For this particular puzzle, this game is by no means trivial. The step where Trevor places two 8's and two 1's in the green rows/green columns intersection (having first observed that having three of either is impossible) is quite subtle, and relies on some pigeonhole principle style reasoning. For example, if you only have one 8 then this implies three 1s, which we have already seen is impossible. And vice versa.
- It's also worth observing that to solve the sets of digits within the dual 3x3 array, you still need to make some reference to the original grid. It's not easy!
- When you have solved the dual 3x3 array, using all this information to solve the puzzle is also not easy. You'll need to mark up quite a few triples in the grid, noting that cells in some of the intersections have only 4 possible candidates, before you can finish the solve. As in Simon's video, the techniques used after this are simply hidden singles and naked singles, but let me say this again, to actually spot these after all of the above, well, it's not easy!

This is great stuff Tom, thanks for posting! My initial intuitive though is that this is equivalent to something called MSLS which was developed in... hmm, I want to say about 2012. I'm working on a comment now which expresses the solution to Tattoine Sunset in terms of MSLS. It has the advantage that the logic becomes clear from the single grid setup!

ReplyDeleteIt's valuable to have these different perspectives on equivalent logic. I'm excited that this stuff is getting rediscovered and made available to a new audience again!

My thought exactly. A different perspective changes everything. These sure are interesting times for sudoku!

ReplyDeleteI can't claim to understand MSLS, but I do note that one enjoysudoku forum user mentions it in the thread I linked above:

ReplyDeletehttp://forum.enjoysudoku.com/tatooine-sunset-tatooine-sunrise-t38180.html

I had a look at what I think is the reference for MSLS. I can't claim to understand it, although I suspect part of that is not being willing to battle through the ASCII representations of the examples. I think I'd like to see this presented with some better pictures.

http://forum.enjoysudoku.com/using-multi-sector-locked-sets-t31222.html

The thing that stands out for me is dividing 1-9 into these home and away sets. That seems very similar to my observation 4.

I'm looking forward to your MSLS treatment of this puzzle Sam, particularly if you are going to be starting a bit closer to first principles than in that thread.

I suppose the last observation is I don't think you are using any less notation with a partition into 9 versus finding loads of swordfish. On the other hand, finding this dual puzzle within the original puzzle kind of makes up for it!

To start with, I'm not going to define exactly what I mean by an MSLS. Firstly because I'm not actually sure anyone agrees on a definition, and secondly because I think the definition would be overly technical and obscure what's actually going on where an example would be clearer.

ReplyDeleteIn short, an MSLS is a bit like a fish (X-Wing, Swordfish, Jellyfish etc.) except that it is allowed to consider multiple digits at once. Let's solve Tattooine Sunset with them... without a separate grid, and without wonky dimensions.

The downside of using a single grid is that it's slightly harder to keep track of the deductions we've already made, so to follow this solution you *might* need a few pencil marks. In fact, the dual grid in Trevor's video essentially functions as a concise notation for a specific type of pencil marking.

The first MSLS is found by considering the digits 2, 7 and 9. This is not surprising given Trevor's first step involves these digits.

ReplyDeleteNote that the given 2s, 7s and 9s nearly all line up nicely in rows 2,5,7 and columns 3,4,9. This is usually a clue that there's an MSLS lurking, and it is usually found in the columns *NOT* occupied by the givens we've identified. In our case, look at columns 2, 5 and 8, whose own givens also line up nicely in rows.

Within columns 2, 5 and 8, we need to place a total of nine 2s, 7s and 9s. Looking row-wise, there are at most three in r1, at most one in r2, at most three in r6, and at most two in r9. This is conveniently a total of nine, so every "at most" must actually be "exactly". This tells us that r1c258 is a 279 triple, r6c258 is a 279 triple, and furthermore we find that r2, r9 cannot contain any 2s 7s or 9s outside of c258 except the givens.

Now there's a second MSLS, this time looking at digits 3,4,5 and 6. Again, not a surprise given what Trevor did.

Look at, funnily enough, columns 3,4 and 9. These columns must have a total of twelve 3s, 4s, 5s and 6s between them. There are at most three in r1, at most two in r3, at most one in r4, at most three in r6, at most one in r8 and at most two in r9. This totals twelve. So every at most is an exactly.

This means that in r1 and r6, the cells in r16c347 cannot be anything *but* 3,4,5 or 6, so eliminate any other possibility from those cells. In the other rows, we have a 34 pseudo-pair somewhere in r3, a 6 somewhere in r4, a 3 somewhere in r8 and a 45 pseudo-pair somewhere in r9, so those can be eliminated from elsewhere in those rows.

Note also that, focusing now only on 5 and 6 in the same columns, that we need to place six digits in total, and there's at most two in r1, at most one in r4, at most two in r6 and at most one in r9, which totals six, so we can eliminate 5 and 6 from the other cells in those rows.

After carrying out eliminations from these MSLSs, there is now a hidden single 8 in row 1. Progress! That's it for the MSLSs, actually, although to finish the puzzle from here you'll still need to keep the eliminations from them in your mind for a short while. Good notation needed.

Nose to the grindstone. You need to spot a 179 triple (rows 3,4,8) in column 7. Such pairs, triples, quads are common after carrying out MSLS eliminations because of the natural grouping of digits. Check that this then gives you a naked single 4 in r6c7, followed by a hidden single 4 in box 4. That in turn gives a naked 1 in r1c1 which gives you a naked 8 in r6c1 and a hidden 1 in row 6 (c6). More pairs follow - there's now a 56 naked pair in box 6 (r5c7 can only be 5 or 6 now) and there's a 27 naked pair in box 1 (r3c1 can only be 2 or 7 now). Finally a 36 naked pair in row 2 (c2 and c7).

Now with all of the marked pairs and triples, you should have no problem finishing the puzzle normally.

---------------------------------------------------------------

We used three MSLSs here. Actually, the first one doesn't count because it was degenerate - you could easily express it as three disjoint swordfish on 2, 7 and 9 and get the same eliminations. Same goes for the third one, it was just a swordfish on 5 and a separate swordfish on 6. I still find it easier mentally to think of them as an MSLS though - feels like less to keep track of.

The second one, though, was genuinely something that couldn't be done digit-by-digit. It combined two sorts of elimination in a way that only makes sense when considering all four digits together.

This comment has been removed by the author.

ReplyDeleteBy the way, here's what I think is a pretty general statement of "intersection theory". With certain exceptions (e.g. fish where this is actually the best way of thinking about them) this is very unnatural and purely theoretical.

ReplyDeleteDefine a truth set to be a set of potential placements of which exactly one is true. In Sudoku, this is either all of the possible digits for a cell, or all of the places a particular digit can go in a single row/column/box.

Let's say we have N pairwise disjoint truth sets - call these the "base sets". The base sets contain a total of N true placements, one for each truth set.

Suppose further that we can find another N pairwise disjoint truth sets called the "cover sets" with the property that every placement that is in one of the base sets is also in one of the cover sets. Then the cover sets also contain exactly N true placements, and since the cover sets cover all of the base sets, every placement that is in a cover set but not in a base set must be false and can be eliminated.

Almost every Sudoku technique is a special case of this general formulation:

- Hidden single (e.g. hidden single 4 in row 7), the base set is all of the places a 4 can go in row 7, and the cover set is all of the possible digits for the unique cell in row 7 that can contain a 4. The conclusion is that every non-4 digit can be eliminated from that cell.

- Hidden pair (e.g. a 36 pair in column 2), base sets are all of the places where a 3 or a 6 can go in column 2, cover sets are all of the digits that can go in the two cells in columns 2 that have 3 or 6 as possibilities.

- Hidden triple, hidden quad etc. are similar.

- Naked pair (e.g. a naked 45 pair in row 3), base sets are the possible digits in the two cells, covers sets are all of the places a 4 or 5 can go in the row.

- Naked triple, quad etc. are similar

- Pointing (e.g. pointing 5s in box 3 eliminating from the remainder of column 8), base set is all of the places a 5 can go in box 3, cover set is all of the places where a 5 can go in column 8.

- Claiming (e.g. 2s in row 8 confined to box 8 eliminating from the remainder of box 8), base set is places a 2 can go in the row, cover set is the places 2 can go in the box.

- X-Wing (e.g. X-Wing on 1 in rows 3, 4 eliminating from cols 6, 7), base sets are all of the places a 1 can go in rows 3 and 4, cover sets are all of the places a 1 can go in columns 6 or 7.

- Swordfish, Jellyfish etc. are similar

There are also extensions to having non-disjoint base and cover sets and unequal numbers of base and cover sets. For example, if there are N disjoint base sets and (N + k) disjoint cover sets then you can eliminate any placement that appears in (k+1) of the cover sets. I'll leave the non-disjoint cases to you.

This allows you to formulate e.g. the XY-Wing in this framework. Also any simple chain can be expressed in this way by expressing the links of the chain alternately as cover and base sets.

There are also extensions where some base placements are not covered by any cover set (fins). These give you finned X-Wings, finned swordfish and related techniques.

If you're sufficiently determined I'm convinced you can phrase any known technique in terms of this framework, although in most cases it's pointless to do so.

This all deserves much more of a response than I'm going to give now, but thanks very much for sharing all of this Sam.

DeleteIt does remind me of a comment I left on Roland's blog a while back:

https://hausigel.de/blog/2020/07/04/selection-puzzles/#comment-36

In short solving sudoku by thinking about these truth sets is basically the same thing as solving a column dance puzzle.

I think it's worth thinking about Roland's response that he'd worked out that you could build a Turing machine out of Tapas and so all puzzles are Tapa variations :-)

I was about to comment about the link between Column Dance and Sudoku via the DLX algorithm but I see that's exactly what your comment to Roland was... :)

DeleteI solved it using the intersection theorem. I had to use the theorem for 2 distincts dissections (for the Yellow and Red set of cells), which guided me to dissect the grid into 6 sets: Y=Yellow, R=Red, B=Blue, Gn=Green, Gy=Gray. See the picture here: https://i.ibb.co/Lrm7h0T/Tatooine.png.

ReplyDeleteThe solving path is equivalent to Trevor's one, though mine is less elegant and more cumbersome.

The equation for the yellow set intersection writes:

Y + 3*(1–9) = R + B + Gn

The one for the red:

R + 3*(1–9) = Y + Gy + Gn

Then I distinct the empty cells of green area (Gne) and filled ones (Gnf) and solved both equations:

B + Gne = Y + 3*(1–9) – R – Gnf = 111 2222 3 4 777777 888 999999

Gy + Gne = R + 3*(1–9) – Y – Gnf = 111 333 44444 555555 6666 888

->Gne cannot contain any 2,5,6,7,9 (because they are not present in both sets above). It can contain only one 3 and one 4 (1rst equation) and only two 1's and two 8's.

It is solved like this:

Gne = 11 3 4 88

B = 1 2222 777777 8 999999

Gy = 1 33 4444 555555 6666 8

Then you can solve the sudoku (not trivial, though but interesting with these groupings).

Very nice! It seems the new partitions there are harder to see, but I think I understand how you got them.

DeleteI’ve just about understood your partitions by adding and taking away rows/columns/boxes in the way Sam proposed. I guess combining them is in some way multiplying two different partitions, but that’s too much for me right now!

Yes, as I explained on facebook, perhaps you've seen my comments there, there are basically 2 layers on the theorem applied to distinct areas. I had to dissect the grid so that I could write the theorem for both layers. That means I had to distinct area that are overlaid in both layers. That's why the look of the partitions is more intricate. I hope it's clear.

Delete