Posted: Wed Aug 17, 2005 9:09 am Post subject: Shortest proofs by contradiction / puzzle hardness

Hi all,

I've just finished the first stage of my solver, which is intended to give the shortest proofs by contradiction for hard puzzles. Currently the solver can find the shortest proofs in terms of the number of guesses and show which candidates can be examined to provide a proof. I'm going to extend it so it can show the exact logical steps constituting the shortest proof path (probably in September, after I return from holiday).
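For readers curious what a "one-guess proof" search might look like, here is a minimal Python sketch. All names are hypothetical and only naked-singles propagation is implemented - a stand-in for the solver's real rule set, which this sketch does not reproduce. The idea: place one candidate, propagate the basic rules, and record the guess if propagation reaches a contradiction.

```python
# Minimal sketch of a one-guess contradiction proof search.
# Cells are numbered 0..80; cands[i] is the candidate set of cell i.

def peers(i):
    """Indices sharing a row, column, or box with cell i."""
    r, c = divmod(i, 9)
    br, bc = 3 * (r // 3), 3 * (c // 3)
    ps = {r * 9 + k for k in range(9)} | {k * 9 + c for k in range(9)}
    ps |= {(br + dr) * 9 + (bc + dc) for dr in range(3) for dc in range(3)}
    ps.discard(i)
    return ps

PEERS = [peers(i) for i in range(81)]

def propagate(cands):
    """Apply naked singles to a fixpoint; False means contradiction."""
    changed = True
    while changed:
        changed = False
        for i in range(81):
            if not cands[i]:
                return False                # empty candidate set
            if len(cands[i]) == 1:
                v = next(iter(cands[i]))
                for p in PEERS[i]:
                    if v in cands[p]:
                        if len(cands[p]) == 1:
                            return False    # two clashing singles
                        cands[p].discard(v)
                        changed = True
    return True

def one_guess_proofs(cands):
    """All (cell, value) guesses refuted by basic propagation alone."""
    proofs = []
    for i in range(81):
        if len(cands[i]) == 1:
            continue                        # already decided, not a guess
        for v in cands[i]:
            trial = [set(s) for s in cands]
            trial[i] = {v}
            if not propagate(trial):
                proofs.append((i, v))
    return proofs
```

If any pair comes back, that single refuted guess eliminates a candidate outright - exactly the kind of short proof discussed above.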

The results I've gained so far make some interesting points in the discussion on hardness rating. The three most interesting:

This puzzle from menneske.no got a very high rating because it produces a HUGE search space when you choose the wrong way at the beginning. However, there exist a number of one-guess proofs, so my solver will probably not rate it so high :)

Another hard sudoku; shame on me, I didn't note who generated it. What's interesting is that it has ONLY ONE 2-guess proof, which seems to be a bit complicated (5 contradictions for the first guess and 2 for the second). When limiting the logical rules to only the 2 basic ones, we need a chain of 3 guesses.

Then there is a puzzle rated 2 by tilps. A number of 2-guess proofs are possible; however, using only the 2 basic rules for contradiction proofs, my solver wasn't able to give ANY proof. Possibly this shows the need to merge proofs by contradiction and proofs by lookahead into one module.

So, which of these puzzles would you consider the hardest? ;)

If someone is interested in more details, I've put them here.

Hardest is the one which takes the most time to solve... of course! In my case this means the last tilps puzzle, which took 0.4 ms. The middle one took only 0.195 ms and the menneske.no one 0.15 ms. So yes, my solver only does its work at computer speed, i.e. it tries to solve any sudoku as fast as possible. Thus it is a backtracker with three simple logics which are fast to calculate on a computer. Times vary from 0.02 ms to about 6 ms.
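As an illustration of the "fast to calculate on a computer" point, here is a hedged sketch of a speed-oriented backtracker in Python. This is not Merri's actual code, and the three logics he mentions are not reproduced; it shows only plain backtracking over 9-bit candidate masks:

```python
# Sketch of a bitmask backtracker: one 9-bit mask per row/column/box,
# so the legal digits for a cell are found with a couple of OR/AND ops.

def solve(cells):
    """cells: a list of 81 ints, 0 for empty. Returns a solved copy or None."""
    rows, cols, boxes = [0] * 9, [0] * 9, [0] * 9
    for i, v in enumerate(cells):
        if v:
            r, c = divmod(i, 9)
            b = (r // 3) * 3 + c // 3
            bit = 1 << (v - 1)
            rows[r] |= bit
            cols[c] |= bit
            boxes[b] |= bit
    cells = cells[:]                    # work on a copy

    def rec(i):
        while i < 81 and cells[i]:
            i += 1
        if i == 81:
            return True                 # every cell filled
        r, c = divmod(i, 9)
        b = (r // 3) * 3 + c // 3
        free = 0x1FF & ~(rows[r] | cols[c] | boxes[b])
        while free:
            bit = free & -free          # lowest unused digit as a single bit
            free ^= bit
            rows[r] |= bit; cols[c] |= bit; boxes[b] |= bit
            cells[i] = bit.bit_length()
            if rec(i + 1):
                return True
            rows[r] ^= bit; cols[c] ^= bit; boxes[b] ^= bit  # undo
            cells[i] = 0
        return False

    return cells if rec(0) else None
```

One 9-bit mask per row, column, and box means that checking a candidate never touches more than three machine words - the kind of CPU-friendly bookkeeping described above.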

Probably the others are trying to find a "human-scale difficulty", which is a bit outside my interest. There are too many humans out there. It would be better to make a program that measures how good someone is at solving sudoku, so it reports a personal difficulty level for the given puzzles. So yes, it would alter the difficulty levels based on what the player is good at. If the player is weak at some kind of solving, then a difficult puzzle would include that kind of problem...

Your program is certainly useful for explaining sudoku solution paths to an audience!

But for measuring hardness there are other factors to consider too. Does it make sense to search for these shortest proofs, or will this waste too much time on the other puzzles, so that the total running time on a random set of puzzles increases? My experience is that puzzles can require 5-10 times more computer solution time depending on how you rotate/reflect the puzzle before solving it, but determining the best orientation is nevertheless too much wasted time on average.

So it would be interesting to see some statistics of your solution times, and how your hardness measure correlates with others on a larger sample. E.g. I put my (ordered) list of the 100 hardest here:
http://magictour.free.fr/top100
Others have reported that they get completely different orderings of hardness here.

I agree there is no single human difficulty measure that would be perfect for everyone. But we still tend to look for such ratings; there are plenty of them around. I just wondered how we can prove a guess for hard puzzles, and also "how hard" a guess is. There are many aspects to rating a puzzle, and guess-proof hardness can be one of them. For me it is (as far as a human hardness measure is concerned).

Guenter,

your top100 definitely shows that computational difficulty and human difficulty are different. Sometimes they differ greatly, as computers can remember plenty of future states and people can't (this is also why I prefer contradictions over lookaheads). I've examined several puzzles from your top100; on both ends of the list are puzzles that can be solved in a step-by-step manner, and these are generally considered easier for humans than those which require trial and error. Since I'm interested in a human-scale measure and human-understandable proofs, my program operates in this field. And it seems I'm not the only one interested in such proofs.

Of course searching for the shortest proofs dramatically increases running time, but my solver is not meant to be a speed competitor. Moreover, sometimes an easier puzzle produces a bigger search space and thus takes longer to compute. Running times vary greatly, e.g. 1.75 ms for your top1 and 0.75 ms for your top100, up to over 2 seconds! (divide these numbers by 4 to scale for modern computers). But time is not a key factor for my solver, as it's targeted at points (1) and (b) Doug Bowman gave here. I think I'll refrain from producing a total hardness measure; the solver will just give an estimate of proof hardness.

>Merri,
>
>I agree there is no single human difficulty measure that would
>be perfect for everyone. But we still tend to look for such
>ratings; there are plenty of them around. I just wondered how
>we can prove a guess for hard puzzles, and also - "how hard"
>a guess is. There are many aspects to rating a puzzle, and
>guess-proof hardness can be one of them. For me it is (as
>far as a human hardness measure is concerned).
>
>Guenter,
>
>your top100 definitely shows that computational difficulty
>and human difficulty are different.

Does it? Doug, in the thread you quoted below, said that they correlate well. But I found that even different solvers perform quite differently on that list, or on any other list of "hard" puzzles. I haven't yet asked humans to rate the puzzles. My hardness measure and e.g. Merri's have a correlation coefficient of 0.37 here. When I just rotate the sudokus in that list by 180°, I get a correlation of only 0.16 with the same program! When you send me your 100 ratings, I'd like to compute our correlation coefficient. Or yours vs. Merri's.
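For reference, the correlation figures above are presumably plain Pearson coefficients over two rating lists. A minimal sketch, assuming numeric ratings in the same puzzle order:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length rating lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5
```

Two raters agree perfectly at +1, disagree perfectly at -1; the 0.37 and 0.16 quoted above sit much closer to "no relationship".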

>Sometimes they differ greatly,
>as computers can remember plenty of future states and people
>can't (this is also why I prefer contradictions over lookaheads).

Memory is not the problem here. You rarely have nested loops of depth more than 5-10. Humans could do it with photocopying or helper software.

>I've examined several puzzles from your top100, on both sides
>of the list are puzzles that can be solved in a step-by-step
>manner, and they are generally considered easier for humans
>than those which require trial-and-error. And since I'm
>interested in human-scale measure, and human-understandable
>proofs, my program operates in this field. And it seems I'm
>not the only one interested in such proofs.

Yes. I'd noticed that even mathematicians now write papers about this "backtracking-free solving"! (Eppstein, Simonis)

>Of course searching for the shortest proofs dramatically
>increases running time, but my solver is not meant
>to be a speed competitor. Moreover, sometimes an easier
>puzzle produces a bigger search space and thus takes longer
>to compute. Running times vary greatly, e.g. 1.75 ms for your
>top1 and 0.75 ms for top100, up to over 2 seconds! (divide these
>numbers by 4 to scale for modern computers). But time is not
>a key factor for my solver, as it's targeted at points (1) and
>(b) Doug Bowman gave here. I think I'll refrain from producing a
>total hardness measure; the solver will just give an estimate
>of proof hardness.

Suppose ten of my puzzles can be solved in half the time with "step-by-step", but this increases the average running time by 20%; then it won't make sense to implement it.

In principle, IMO, there should be no difference between human rating and computer speed. When there actually is one, it's because computer programs are not good enough (programmers optimize their implementation time as the primary goal; the running time of the program comes only second or third!) and because humans are not yet good enough at simple backtracking. Or just don't like it.

> When I just rotate the sudokus in that list by 180°, I get
> a correlation of only 0.16 with the same program!

Does this mean that the correlation coefficient can be easily manipulated? Shouldn't a puzzle rating be invariant under group operations?
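One way to make a rating invariant under these operations is to rate the whole symmetry orbit of a grid rather than a single orientation. A sketch, assuming puzzles encoded as 81-character strings (this covers only the 8 rotations/reflections, not digit relabeling or band/stack permutations):

```python
def rot90(g):
    """Quarter-turn clockwise: new cell (r, c) comes from old (8 - c, r)."""
    return "".join(g[(8 - c) * 9 + r] for r in range(9) for c in range(9))

def symmetries(g):
    """The 8 rotations/reflections of a flattened 9x9 grid."""
    out = []
    # transpose gives the reflected copy; rotating each 4 times covers all 8
    t = "".join(g[9 * c + r] for r in range(9) for c in range(9))
    for h in (g, t):
        for _ in range(4):
            out.append(h)
            h = rot90(h)
    return out
```

Taking, say, the minimum or mean rating over `symmetries(g)` gives the same answer for a puzzle and its 180° rotation by construction (note that a 180° rotation of the flattened string is simply `g[::-1]`).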

> When you send me your 100 ratings, I'd like to compute our
> correlation coefficient. Or yours vs. Merri's.

In fact I don't currently have any ratings, but when I get to it, it'll be interesting to compare them. Though, as I wrote before, I'll be rating just guess hardness, not total puzzle hardness.

> In principle IMO there should be no difference between
> human-rating and computer speed.

But why? We know very well that human brains operate differently from computer processors. What is a trivial task for a machine - e.g. performing an arithmetic operation on 2 floating-point numbers - can be difficult for humans. On the other hand, a chess-playing program that did not take advantage of the knowledge base built by humans would simply be weak.

> humans are not yet good enough at simple backtracking.
> Or just don't like it.

And that's my point - people don't like it, because most of them consider it hard. E.g. I prefer to see a logical explanation of all the steps I take while solving a sudoku. When backtracking with branched paths is involved, it's hard for me to see the big picture - so I find such a puzzle hard. Like puzzle #42 from your list, for which there is no unbranched proof by contradiction (which forces me to extend my solver :) )

>> When I just rotate the sudokus in that list by 180°, I get
>> a correlation of only 0.16 with the same program!
>
>Does this mean that the correlation coefficient can be easily manipulated?
>Shouldn't a puzzle rating be invariant under group operations?
>
>> When you send me your 100 ratings, I'd like to compute our
>> correlation coefficient. Or yours vs. Merri's.
>
>In fact I don't currently have any ratings, but when I get
>to it, it'll be interesting to compare them. Though, as I
>wrote before, I'll be rating just guess hardness, not total
>puzzle hardness.

Avoiding guesses could be too expensive.

>> In principle IMO there should be no difference between
>> human-rating and computer speed.
>
>But why? We know very well that human brains operate differently
>from computer processors.

i.e. slower

> What is a trivial task for a machine -
> e.g. performing an arithmetic operation on 2 floating-point
> numbers - can be difficult for humans. On the other hand,
> a chess-playing program that did not take advantage of the
> knowledge base built by humans would simply be weak.

That's because the program isn't yet strong enough. Tricks used by humans are often not included because they're tedious to implement.

>> humans are not yet good enough at simple backtracking.
>> Or just don't like it.
>
>And that's my point - people don't like it, because most
>of them consider it hard. E.g. I prefer to see a logical
>explanation of all the steps I take while solving a sudoku.
>When backtracking with branched paths is involved,
>it's hard for me to see the big picture - so I find
>such a puzzle hard. Like puzzle #42 from your list,
>for which there is no unbranched proof by contradiction
>(which forces me to extend my solver :) )

When this sudoku craze goes on and there are championships etc., then I assume that the best human sudoku solvers will eventually use backtracking almost as frequently as computers do. It would be helpful to solve on a computer then, so you can easily go back to the last branch point once you encounter a dead end. With pencil and paper you might want to use a different color for each level of backtracking, so you can locate and erase the last path.

when this sudoku-craze goes on and there are championships etc.

You've lost me on two counts:
1. Crazes (by definition) don't go on for long (see http://dictionary.reference.com/search?q=craze)
2. Championships? I seriously doubt their relevance to sudoku. Certainly any online challenge is a complete waste of time, since any kid can download a solver and have an answer in milliseconds.

Craze: pardon my bad English, I just pick up the expressions I meet... but I assume you know what I mean.
Championships: there are also human fast-solving championships, or tournaments with judges - not online, but in halls with spectators or in closed rooms with jurors.

Anyhow, I don't mean to argue with you. I just doubt that sudoku "championships" will ever generate much interest.

There are puzzle championships that include a question on sudoku or a variant. The Wikipedia entry on sudoku has some more details. I know that the entrants to these competitions use trial and error to solve, because it is faster.

>We know very well that human brains operate differently
>from computer processors.

i.e. slower

You know that's not the whole truth. Digital computers and analog neural networks are very different means of processing information. What would be the point of creating artificial neural networks if computers naturally operated this way? No simulations would be necessary!

when this sudoku-craze goes on and there are
championships etc., then I assume that the
best human sudoku-solvers will at last use
backtracking almost as frequently as computers do.

Perhaps you're right. But what is faster can be less enjoyable. When I can't see the logical dependencies in a static picture, solving sudoku is no longer fun for me - and I suppose I'm not the only one. The more complicated the logic behind the solving steps, the harder the sudoku is. For me. You prefer to look only at the computational complexity. However, understanding a logical formula that describes some pattern may be easy, while "resolving" it may be hard (i.e. computationally complex). What I remember from a Prolog course is that a programmer must "help" the computer in order to make SLD resolution work efficiently. Two programs, totally equivalent from a logical point of view, may have execution times differing by orders of magnitude. (let somebody correct me if I got this wrong)

But when it comes to the special task of solving 9*9 sudoku puzzles, I think the best humans and the best computers will operate similarly. An optimal strategy for computers will also be an optimal strategy for humans, and vice versa.

In principle it's also the same with chess computers; it's just that all the human tricks are not yet implemented...

But... the thing is that with a computer, the approach determines how quickly the task gets done. Using a computer you can pack the information into a very tiny space and operate on it quickly, using certain methods. My code doesn't aim to be superior as an algorithm; it aims to be fast: move the least memory it can, use the fastest math a CPU can perform. So I have chosen logics that are fast to calculate with arithmetic and which require little moving of memory.

You can't emulate that with human brains as it is. Thus, algorithm speed shouldn't be used to rate difficulty for humans. I should add a difficulty counter to my code, so that it assigns or emulates a human-like rating for the logics the code uses. And... it should gather statistics and analyze the board a bit more, so that instead of quitting immediately when it finds a solution, it also looks for the additional ways that lead to failure; then it can see how much total complexity there is in the board to confuse a human player.
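The "difficulty counter" idea can be sketched very simply: assign each logic an assumed human cost and sum the costs along the solution path. All the names and weights below are hypothetical, not taken from any actual solver:

```python
# Hypothetical per-technique weights - an assumption, not measured data.
TECHNIQUE_COST = {
    "naked_single": 1,    # cheap for a human to spot
    "hidden_single": 2,
    "guess": 20,          # trial and error is what players dislike most
}

def rate(steps):
    """Sum the assumed human cost of each technique used on the path."""
    return sum(TECHNIQUE_COST[s] for s in steps)
```

A solver that logs which technique fired at each step could feed that log straight into `rate`; counting the additional failure paths mentioned above would simply add more entries to the list.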
