Friday 1 November 2013

Points scoring at the WSC

This is the first post in a series of blog posts in which I'd like to engage the Sudoku community in various discussions relating to the World Sudoku Championship.

For this first post, I'd like to talk about points scoring at the WSC.

The traditional model of the WSC goes like this.  The organisers will put together a set of puzzles of various types, and then group them into a number of rounds.  The puzzles will be test solved by a number of testers, their times will be aggregated, and each puzzle will be assigned a number of points according to how it tested.  For example, if we have decided each championship minute should be worth 10 points, a 2 minute puzzle will be graded as being worth 20 points.
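As a rough sketch of that arithmetic: the function name `puzzle_points` is hypothetical, and the choice of the median as the way to aggregate tester times is an assumption, since the method of aggregation is not specified here.

```python
from statistics import median

POINTS_PER_MINUTE = 10  # the example rate used in the text


def puzzle_points(test_times_minutes, points_per_minute=POINTS_PER_MINUTE):
    """Turn a list of tester solve times (in minutes) into a points value."""
    # Assumption: aggregate the testers' times with the median.
    typical_minutes = median(test_times_minutes)
    return round(typical_minutes * points_per_minute)


# Three hypothetical testers whose median time is 2 minutes.
print(puzzle_points([1.8, 2.0, 2.3]))  # 20
```

Any other aggregate (mean, trimmed mean) would slot into the same place; the only fixed part is points = minutes × points per minute.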

I think there are a few things worth discussing here, but the first I'd like to concentrate on is that there are Sudoku puzzles, and there are Sudoku puzzles.  For the former, read things that are generally well known and recognised as Sudoku, your classics, diagonals, extra regions, odd/evens, irregulars and killer.  For the latter, you are almost encroaching onto WPC territory with things like Greater Than, Kropki and Skyscrapers - all of which exist as standalone Latin square puzzles without Sudoku's trademark Third Constraint.

At every WSC there is inevitably at least one long round, typically 45 minutes to an hour, or even longer, full of challenging Sudoku variants (see also the Daily League project) that, if you were to show them to the general public, would be sure to provoke bemusement and bafflement.  Due to the length and difficulty of these rounds, they are usually the rounds that score the most points and decide who finishes where in the classification.

As such, the winner of the WSC tends to be the best all-round solver, combining a mix of WPC skills and raw Sudoku solving speed.

I think the balance used to be weighted much more towards WPC skills than it is now.  For example, in 2006 Rachel Roth-Huber bested David McNeill (and my good self!) at the Times Su Doku Championship.  But the following spring at the WSC in Prague, David - a WPC veteran - finished 4th, compared to Rachel who finished way back in 66th (for reference, I was 45th).  This is not to say David does not possess exceptional raw Sudoku solving speed (he certainly does), but I would certainly have forgiven you for expecting the gap between the two at the WSC to be a little closer.

These days, the WSC play-offs show a large (although not perfect) overlap between the best all-round solvers and the quickest classic solvers, which I think is probably testament to the fact that Sudoku solvers have generally improved their puzzling skills.

However I'd like to finish by asking my audience to question this paradigm.  What if it were decided to place (for the sake of argument) a 0.8 multiplier on the long variants rounds at the WSC, to place more emphasis on the friendlier, more publicly recognisable puzzles?  Would this be a good idea?  And do you have any other thoughts regarding WSC scoring?

I'd love to know everyone's thoughts on this, no matter what kind of solver you are!

27 comments:

  1. My 2 cents: What's the issue with simply having more classic and standard variants? ie. Instead of having 1 killer and 1 diagonal like this year, why not have 2 of each? That will greatly shift the balance back over to the more standard variants. Personally, I think that the sudoku competition should be about sudoku, not only in points, but in time spent as well.

    Replies
    1. Yes, this is a good point - one way of increasing the weighting given to classics and standard variations is to simply have more of them!

      The question then becomes will this be seen as being too boring? I'd be interested to see what people think!

  2. My 2 cents to begin with: don't have too many rounds. It becomes a test of endurance rather than a test of your solving skills. More inputs will follow.

    Replies
    1. There are two ways to interpret endurance: number of rounds, and length of the rounds. I think both are worth talking about, and I might reserve a separate post to talk about this in the future :)

  3. I don't see the problem in point-distribution but rather the puzzles themselves.
    In Thailand, we only use computer-generated basic variants (your definition of sudoku) in every single tournament. This also applies to the Philippines and Singapore, who dominated in their tournaments consisting of only Classics, but when we come to WSCs, we produce mediocre results; clearly it's because of the very different puzzles used.

    I disagree with the idea of sudoku having its own championship in the first place. Soon we'll be seeing World Tapa Championships, World Kakuro Championships and so forth. Personally, I don't mind the overlap between WPC puzzles appearing in WSC - as the only puzzle solver in Thailand I definitely have an edge over my fellow teammates this way. But to be politically correct, I think the World Sudoku Champion should be the fastest solver of the regular Sudoku - and not any of its variants. Of course that would be boring, which is why I find it silly to separate sudoku from puzzles in general in the first place.

    The reason WPC works is because "puzzle" is a general term encompassing a lot of types. Sudoku is only a subset of puzzles.
    It's like medley and backstroke. You can't include freestyle (something a medley swimmer would have to master) in the backstroke event.

    Replies
    1. The idea of having competitions with respect to one different style is something I'd like to talk about later - but I don't think it's such a bad idea.

      I think historically, the whole point of having a separate WSC was an attempt to cash in on the popularity of Sudoku. From the beginning the competition has always featured variants to add interest, which is no bad thing I think, but regardless of that the WSC has carved out a niche and most online competitions seem to follow the variant template - because that's the best way to prepare for the WSC.

      Still, I think Sudoku is broad enough to warrant its own event separate from the WPC. We are up to over 150 variants with the Daily League, and there seems to be no let up in people's creativity. To that end I think a WSC should celebrate the sheer diversity you can have with Sudoku. But this brings us back to the issue at hand. How much weight should be given to variants?

      Of course, penalising the weightings of particular rounds at WSC's is nothing new - indeed it seems the only fair thing to do with novelty rounds like the envelopes in China or the Round 2 with the tiles in Eger.

    2. Glad to see another author starting the "what if?" ideas going even if some of the discussion will be rather broad. Let me just give two quick points that may spur further discussion.

      I think one challenge with the current organization scheme is that the mindset of hosting a WPC is one of showing off new puzzle designs, and over the 22 years of the tournament (before recreational puzzle blogs and regular internet contests existed) it was almost like a puzzle convention to showcase puzzle creation in addition to being a competition. Take this thinking and apply it unchanged to the WSC and you end up with a lot of novelties and fewer classic or standard variation puzzles. Prescribing guidelines for how much of each to have will be difficult to make work, since even existing guidelines for the WPC on what should appear do not work as hosts have full discretion.

      You can look at WSC5 to get my first feel for what an event should look like. I was proud that we took a very particular mindset in hosting WSC5 to split apart sections with just classic sudoku (40% of the tournament), reasonably standard variations (30-40% of the tournament), and novelties (20-30%) and we combined the variation types in the "field" rounds so all solvers would have several familiar puzzles in each round. I'd be curious to hear, of the 8 events so far, how you would place certain ones versus others as having sudoku, "sudoku", and puzzles. I think we tried to move as cleanly to the first category as we could.

      As a second point, I often hear people say that just solving classic sudoku for speed would not be as "interesting". Let me say I would gladly pay to solve a classic sudoku tournament put together by Tetsuya Nishio, but never pay for one put together by random monkey at typewriter #655321. At the same time, from a competition fairness perspective, random puzzles might be the best choice as they should be capable of using a broad set of possible techniques if curated properly. I know I can write a puzzle that Kota can solve faster than anyone else, and I know I can write a puzzle that others can solve as fast as or faster than Kota. And it is all in the techniques present. This tells me removing humans might have a good effect from a competition perspective, just as say Rubik's Cube solves are fully random and no thought is put into their selection. But now you look entirely unlike a WPC tournament and entirely more like other competitive games.

    3. So the first thing I'd like to say is that as a relative newcomer to the WPC I'm a big fan of the format, and if it is seen as a designers' showcase then so much the better. Despite the suffering it seemed to cause at the time, the Evergreens round in Eger must surely go down as an all-time classic.

      Whilst I think there are a few problems if the same thinking is applied to the WSC, I'm not sure novelty is so much of a factor any more - for example we didn't see anything new in Beijing. But I do think one of the biggest problems facing organisers of a WSC is just how you group the puzzles together. This is certainly one of the key areas where I thought the event in Philadelphia was an overwhelming success - to my eyes WSC5 remains the gold standard. No one round had overly significant numbers of points attached to it, there weren't any dominating puzzles within any of the rounds and the themes linking the puzzles in each round were clear and well thought through.

      In other competitions, you end up inevitably having the kind of dominating, "assorted" round which decides everything. This was one of the problems with Zilina - after round 1 everything else was almost irrelevant.

      With regard to your second point, I think you bring up something which could be expanded upon in many different ways. For the last few years, classics of similar difficulties have been grouped together, roughly along the lines of your 100m, 400m and (regrettably) your 1500m round. Would you group together your techniques based on a scanraid style hierarchy of techniques, or would you have a true melting pot of classics (whose difficulties would nevertheless be somewhat betrayed by points distributions)?

      Sources of puzzles is definitely something I'll blog more about - I find even a given generator will tend to have its own identifiable feel and flavour to the puzzles it puts out.

    4. Random sudoku in my model case probably means random. Most computer programs actually have an embedded rating system to filter the puzzles into groups and this is one cause of bias or similarity. Say the Times wants a Fiendish for Saturday, and the rater requires a Swordfish or Y-Wing to hit that level. Now all your Saturday puzzles will have a Swordfish or Y-Wing. (This oversimplifies things, but is hopefully clear). I could certainly see something to giving a round of classic sudoku without any scoring of the puzzles given to the solvers beforehand. They probably want a definition like we used for 400m, but you would not necessarily know the sprint puzzle or the naked single puzzles or anything else. They might just all have 17-27 randomly positioned digits and 1 solution.

    5. Using predefined patterns, or predefined symmetries would be another bias that would not fit my meaning of random. I remember, in practicing Stefan Heine's puzzles for the US Sudoku Championship in 2009, that the repeated C4 symmetry really made the puzzles feel the same in terms of how you treated scanning and location discovery.

    6. A round of classics without any grading is certainly a fascinating idea - but it seems to me to have a few hurdles to overcome. I can only really see the way that you'd make such a round fair would be to keep going until everyone finishes - otherwise surely you have the element of luck as discussed here:

      http://logicmastersindia.com/forum/forums/thread-view.asp?pid=13296

      coming into things?

    7. I would certainly intend a round like that to be finished by at least 10 solvers. (My preferred model for a championship would have every round always finished by at least 10 people.)

      I might actually suggest "flex timing" so the round lasted until this was true, but "flex timing" is really only required if you don't think you can accurately time the solvers as on a team round.

  4. My personal viewpoint is that not all "new" variations need to be equated to being Puzzle-oriented in nature. One can have a newer variant that doesn't deviate much from using the Sudoku techniques. Ideally for the WSC any Sudoku, across the Championship, should use the classic techniques for a good bit of the solve. The variation rules should complement things rather than dominating the solve. If this is achieved, I don't think it matters if a variant is new or standard.

    A counter point to consider is, it is important to hold the variants at a WSC because, as far as I know, Sudoku variants are avoided as much as possible in the WPC because of the existence of a WSC. If the WSC changes to include only standard variants, there are many nice ideas that work well with the Sudoku rules that either won't be used or will probably make their way into WPC, which I don't think is a good idea because the converse holds true and puzzle solvers would not like an excess of Sudoku variants.

    And I don't think we need to worry about bemusement/bafflement at the rules at the WSC level. I'm sure anyone who makes it to that level is capable of adapting to the rules. It is more about the authoring side of things I feel, because often, in making a Sudoku variant "nice", some givens are removed to use the extra rules even more, which is what causes the deviation more than the variants themselves being standard/new.

    Just to give an example - See the usual Greater than, with no givens, it uses the classic rules, but the inequalities and working the extremes dominate the solve. But say you have a Greater than Sudoku like the ones on Croco (with better aesthetics maybe), with some amount of givens, and only some inequalities, it suddenly becomes a Sudoku-friendly solve. This holds true for almost any puzzle-like variant.

    Replies
    1. Oh and one possible way of awarding the best solver of standard Sudoku would be to have a separate Sprinter award or a Classic Solver award. The chances are they overlap with the eventual best, but you can limit this award to the winner across the classic rounds.

    2. So I recall that's what happened in Goa in 2008 - and I note that there were separate rankings for "Track" and "Field" in Philadelphia in 2010.

      I suppose the point I'd raise here would be that it's damaging to dilute the title of World Champion. It turned out that Thomas won both titles in 2008, to cut the debate short, but I don't think that there'd be one winner of both titles today. I would suggest we want to keep things as simple as possible, rather than have a multitude of different titles.

    3. Well I'm not suggesting the Sprinter award be a "World Title". Just an award of "best Classic solver at the World Sudoku Championships". I agree with that though, that it's best not to dilute the title.

      But I'd say that the GP might go a little way to doing that already, as you now have the World Sudoku GP champion, Kota, and the World Sudoku Champion, Jin. We also have Tiit as a runaway preliminary winner. So there's something in the fact that it will be a handful of people in that bracket anyway if you're having the GP final, a number of Sudoku Preliminary rounds, and a play-off.

    4. Oh I agree - one of the main arguments I have for radically changing the format of the Sudoku GP is that it's fairly pointless to have an online version of the WSC. In my eyes the GP should certainly take a heavily classic oriented format.

      As for a separate classics title - I guess the point is all anyone really cares about is who is the World Sudoku Champion. Once you start having to explain the technical differences between each title I feel you start diminishing all of them. Whilst we all know that the field of Sudoku is pretty broad, I don't think it is sufficiently broad to carry any more than one main title.

    5. Well yeah. So IF the Sudoku GP does become classic oriented (something I think is a really good idea, and needs to be done, as you are encouraging everyone to take part), then we have a GP champion anyway who will be the best in the standard stuff. I think it's best to keep the World Sudoku Championship as it is.

    6. I will comment on point distributions and other stuffs discussed here soon. Just wanted to reply on this subject.

      Do you really want the GP to become classic oriented? Then what? You'll have 8 similar tournaments containing 8 classic sudoku, 2 irregulars, 2 diagonals, 2 killers...? Do you want to restrict the authors to that? Do you want to have rankings like this: http://www.ukpuzzles.org/final-results.php?contestid=14 with no means to prove that some players cheated?

      I think the strength of the GP is having 8 different countries as organizers: you have 8 different visions, 8 different tournaments. The variety is exciting, and it gives countries that do not have the means to organize a WSC the opportunity to show their skills. And players can explore new horizons.

    7. Fred - part of the problem with the GP will be that it's going to be hosted on 1 website. I note that for Germany, UK and USA the GP legs were actually just the national WSC qualifiers. I can't speak for Germany or USA, but given that the UK has to organise an online qualifier on our own servers, as well as an offline open competition, next year, there is very little chance we'll have another spare month to prepare a GP leg in the same format as last year.

      Re the cheating - I believe the best safeguard to this is making everyone do it online, and having publicly viewable replays. It's a dead giveaway when you see that someone is putting in no digits for the first 30-60 seconds, and then enters in the solution row by row. I suppose it doesn't rule out more sophisticated cheating, but there are other ways that organisers can keep on top of things.

      Given that Karel already has a feature like this on Sudokucup, I don't think this will be so much of a problem. We all know who the top guys are, and what they are capable of so we know exactly when we should be raising our eyebrows.

    8. I think more than countries having time to organize a variety-filled contest, the main point for me is that the WSC is kind of an isolated event where just a bunch of people get to that level and then have a chance to improve at that level. There is sometimes a vast gap between an experienced world tournament solver and the audience that we try to reach out to.

      The main goal of the WPF GP should be to encourage the general audience and give them a chance to compete against the best. Otherwise we have the same situation that's been going on for years, i.e. a handful of people from each country are the only ones who get exposure at the world level and keep improving.

      We obviously can't have an open participation for an offline event like the WSC but we can easily do so with these online contests and that should be used for growth.

      We can still have our regular schedule of challenging contests on various sites and still enjoy the WSC. The first GP proved that the WPF does have a considerable audience reach, but we kept losing folks as the test variety was at our "normal" levels (maybe we'd lose them anyway, but I think it played a part). That reach should ideally be used to increase numbers rather than provide one more similar contest to the many we already participate in.

    9. I think making the WPF Grand Prix Classic oriented would probably do best with the general public. And it would probably be worth testing by for example making half of the rounds more classics oriented and others not. You can see what the effect of participation is in rounds that are and that aren't. Because I think there are people who are attracted by new ideas as well as people who are attracted by common ideas.
      But I think one of the goals we should have is to enlarge the group of what we now refer to as Classics. Because all these variants were at one point new. And certain Classics we now see a lot were at one point more complicated too. I am a bit afraid that Classics now tend to be defined as "variants that are easily made with a computer program and thus appear in random magazines and newspapers". But I hope that is not what we want Classics to remain.
      I think Classics are much better defined as Sudoku puzzles that use Sudoku techniques prominently in their solve. So that doesn't mean they have to be completely familiar ideas. You can combine your Classics rounds with similar puzzle ideas, that still fit this description.

      I'm not going to comment too much on what I think the WSC should look like, because if I got to choose it would have minimal classic variants. I score well on my logic strength, not my speed. I'm just going to say that I don't think Classics should take too much of a forefront. Because then there comes a point where you say, oh the best Sudoku solver is just someone who's trained at being really, really fast, while I think part of being the best Sudoku solver should also have a strong logic backing to solve the more uncommon variants as well.

  5. Finally got time to answer this post. I hope that I'll have some readers, as it is supposed to be a discussion (perhaps it would have been better to propose this on a forum, where answers are more visible).

    Firstly, let's talk about the 0.8 multiplier: it's hard to answer, because IMO there are a lot of factors to deal with (I'll develop this further), but I want to raise 2 things. I really don't understand your reasoning, even after reading your message. Classic sudoku is more popular: yes, of course. But to me, your message confuses uncommon variants with WPC-style sudoku. I consider myself a sudoku specialist; I don't enjoy puzzle competitions so much (even if I sometimes make the effort to take part, even in the WPC), but I enjoy sudoku variants (common and uncommon) at least as much as classic sudoku (or "classic variants"). Also, I have to say that it really makes no difference whether a variant existed as a Latin square puzzle before. Take the example of kropki: OK, it was a puzzle before it was a sudoku, but I really don't think that one needs WPC skills to be good at kropki sudoku. As long as it comes down to filling in digits with constraints between them, a sudoku specialist can learn to solve it as fast as a player who also has WPC skills.
    So I really don't see what the goal of your 0.8 multiplier is. Secondly, I want to remind you that this idea isn't new. In 2012 and 2013, the rounds seem to be (approximately) equally balanced (theoretically 2 points/minute in 2012, except round 7, and between 6.25 and 6.8 this year in Beijing). But if you look at the 2011 IB (Eger), you'll see that there is a lot of difference between rounds. For example, look at the last round, 3D sudoku, which brings only 140 points in 30 minutes, while some other rounds have, for example, 590 points in 45 minutes. My explanation is that the organizers wanted a lot of people to be able to finish the 3D sudoku (that was the case, I think more than 50 players did), so to keep the round from having too much importance, they gave it fewer points. In that example, I can see a real motivation (related to fairness of competition) for having a multiplier factor. This can be viewed as a multiplier factor or as extended time, and it brings me to the rest of my argument...
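For illustration, the Eger comparison in the comment above works out as follows. This is only a sketch of the arithmetic using the two figures quoted there (140 points in 30 minutes, 590 points in 45 minutes); the helper name `points_per_minute` is hypothetical.

```python
def points_per_minute(points, minutes):
    """Nominal weight of a round: points on offer per minute of round time."""
    return points / minutes


eger_3d = points_per_minute(140, 30)     # the 3D sudoku round
eger_other = points_per_minute(590, 45)  # one of the heavier rounds

# Implied multiplier of the 3D round relative to the heavier round.
implied_multiplier = eger_3d / eger_other

print(round(eger_3d, 2), round(eger_other, 2), round(implied_multiplier, 2))
# 4.67 13.11 0.36
```

On these figures the 3D round carried roughly a third of the per-minute weight of the heavier round, a far stronger adjustment than the 0.8 under discussion.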

  6. Your point of view seems a bit dogmatic to me: you want to give more weight to classic rounds for some reason, arguing that normally the rounds are equally weighted (2 points/minute or so...). That's only the theory of the instruction booklet, and if we analyse the real facts after the competition, we can see that rounds are far from being equally weighted! Let's analyse the best performance and the global performance (total points of all players) in the classic and long variant rounds of WSC 2013. The best player in round 1 (easy classics) gained 7.5 points/minute. In round 4 (hard classics): 8.2 points/minute. And in the long variant rounds, round 2: 6.8 points/minute and round 9: 5.6 points/minute. You have more than your 0.8 multiplier between round 4 and round 9, at least for top players. This factor is a bit smaller if you consider all players: rounds 1 and 4 have 517 and 466 points/minute respectively, and the long variant rounds 451 and 434 points/minute. Regarding the best performance, hard classics was the most heavily weighted of these 4 rounds, but it seems that hard classics are a bit harder for the average player, haha! (We have a 0.84 multiplier between round 1 and round 9 across all players.) I want to bring out another interesting point, concerning short (sprint) rounds. Both 15-minute rounds seemed to have a lot of weight for top players (9.9 and 9.3 points/minute for rounds 3 and 10 respectively), which is much more than the other rounds analysed above. But the same rounds, across all players, give 257 and 249 points/minute, which is much less than the other rounds. It looks like the sprint rounds were a bit like jackpots that can be won by only a few players. A 0.8 multiplier can only really be effective if each puzzle is perfectly tested, and you know exactly how long the best player in the world will take to solve it. I think the sudokus were well tested this year, and I broadly agreed with the points distribution and timing given for each round, but even in that case, as we see, some rounds can have different weights.
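The implied multipliers in the comment above can be checked with a few lines. This is a sketch that uses only the points-per-minute figures quoted there; the dictionary layout and the function name `effective_multiplier` are illustrative assumptions.

```python
# Points per minute at WSC 2013, as quoted in the comment above.
best_player_ppm = {"round 1": 7.5, "round 4": 8.2, "round 2": 6.8, "round 9": 5.6}
all_players_ppm = {"round 1": 517, "round 4": 466, "round 2": 451, "round 9": 434}


def effective_multiplier(ppm, round_name, baseline):
    """How heavily one round weighed relative to a baseline round."""
    return ppm[round_name] / ppm[baseline]


# Long variants (round 9) against hard classics (round 4), best player only.
print(round(effective_multiplier(best_player_ppm, "round 9", "round 4"), 2))  # 0.68

# Long variants (round 9) against easy classics (round 1), all players.
print(round(effective_multiplier(all_players_ppm, "round 9", "round 1"), 2))  # 0.84
```

Both ratios come out at or below the proposed 0.8 or close to it, which is the point being made: the rounds were already unequally weighted in practice.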

    Replies
    1. Fred, these points per minute calculations are absolutely brilliant! Would it be possible to repeat them at different levels so we have a more complete picture? For example at #10 for the round, #40 for the round, #80 for the round and #120 for the round?
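One way the requested breakdown could be computed is sketched below. The function name `ppm_at_ranks` and the sample scores are made up for illustration; real round results would have to be substituted.

```python
def ppm_at_ranks(round_scores, round_minutes, ranks=(1, 10, 40, 80, 120)):
    """Points per minute earned by the solver at each given rank in one round."""
    ordered = sorted(round_scores, reverse=True)
    result = {}
    for rank in ranks:
        # Skip ranks beyond the number of solvers in the round.
        if rank <= len(ordered):
            result[rank] = ordered[rank - 1] / round_minutes
    return result


# Hypothetical scores for a 30-minute round with five solvers.
sample_scores = [300, 90, 240, 270, 150]
print(ppm_at_ranks(sample_scores, 30, ranks=(1, 3, 5, 10)))
# {1: 10.0, 3: 8.0, 5: 3.0}
```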

  7. Another point with which I have to disagree is that you seem to consider that long variant rounds have too much importance in the final ranking. OK, I concede that if you have a very bad long variant round, it has a dramatic consequence on your final result (however, I don't agree that one mistake at the beginning of a long round will break your whole round, unlike in sprint rounds. I'll perhaps discuss this point on your other message). But another analysis of this year's results suggests that long variant rounds had no more effect on the ranking than others, given a group of players of the same level. I'll focus on the top 10 players, and assume that this year's WSC was fair enough to consider that the best 10 players were those who qualified for the playoffs. A look at the difference between the best and the worst of them in each round is very interesting. One may think that an 88-minute round will make around 6 times more difference than a 15-minute round... It's not so!
    The long variant rounds produced 214 and 95 points of difference respectively (round 9 did not have much of a "spread" effect among the best players). The classic rounds each made exactly 114 points of difference. The sprint (15-minute) rounds made 148 and 140 points of difference respectively. It is also interesting to see that if you add up both long variant rounds and both sprint rounds, Tiit Vunk (who made the best performance in both domains) had a larger margin on the sprint rounds, 268 points more than Sun Cheran, than on the long variant rounds, 259 points more than Jan Novotny.
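The "spread" measure used above is simply best score minus worst score within the group. The sketch below compares the quoted spreads with what the round lengths alone would suggest; the function name `spread` is hypothetical, and the comparison takes the largest long-round spread against the smallest sprint spread so that no assumption is needed about which long round lasted 88 minutes.

```python
def spread(scores):
    """Gap between the best and worst score in a group for one round."""
    return max(scores) - min(scores)


# Top-10 spreads quoted in the comment above.
long_variant_spreads = [214, 95]
sprint_spreads = [148, 140]

# If spread scaled with round length, 88 minutes vs 15 minutes would give:
length_ratio = 88 / 15

# The most generous observed comparison for the long rounds:
largest_observed_ratio = max(long_variant_spreads) / min(sprint_spreads)

print(round(length_ratio, 2), round(largest_observed_ratio, 2))  # 5.87 1.53
```

Even in the most favourable pairing, the long round separated the top 10 by about 1.5 times as much as a sprint round, not by the factor of nearly 6 that round length would predict.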

    My conclusion: the organizer of a WSC has to think carefully about the points, time and time bonus they attribute to each puzzle and each round. It can have a significant effect on the ranking. If you want to give more room to classic sudoku, or classic variations, just having more of them seems a better idea to me than trying to control it with the points distribution. The danger is to have (I caricature a bit, but it reminds me of something) a WSC with classic rounds, and the variant part consisting of a full round of irregulars (yuck!), a round of mental calculation variants and a round of geometrical variants... then you'd act as if all these interesting and "not so rare" variants don't exist. I definitely don't think it's a good idea! Some of the online tournaments are jewels in terms of puzzle quality and/or innovation. I would be sad not to find the same level of creation at the WSC.

    I find Prasanna's idea of giving a special award to the best classic sudoku solver a good one. I want the world champion to show amazing skills on both classic and variant sudoku, and I think the current format of the competition requires it. Without multiplying titles, I think that having a second figure who isn't World Champion, but is known as the best classic solver of the year, could be nice, without requiring another playoff or final.

    Replies
    1. Fred - thank you very much for your comments, you raise plenty of good points.

      There is a small problem with your analysis of longer rounds - if you are investigating the possibility there is an issue with the scoring, your test cases must be selected independently of the scoring system. It goes without saying the top 10 will not have too much variation, because they all scored highly. Without further information we can't say anything!

      Nevertheless, it is only a small problem. If we track the performance of the 9 (not Ulrich) GP finalists, we see they finished 1, 2, 3, 5, 6, 7, 8, 11 and 29. Clearly the correlation is very good there, apart from Michael Ley. It might be interesting to track the WSC performance of 11-20 in the GP as well.

