I was waiting for sleep to come, and surfed over to the American Checker Federation website. As long-time readers of this blog might remember, I've been tinkering a checkers program together, which I tentatively named "Milhouse" to play checkers. This week's problem challenge was a classic 2 on 3 battle where White is to move and force a draw from the down position:
It's late, and been a full day, so I can't really say that I understand this position. But Milhouse doesn't get tired: when I load the endgame database, it immediately proclaim that the position is a draw. What's more is that it identifies two different lines: their website suggests 23-26 as a drawing move, but 19-15 is also a draw.
Without the endgame database, White thinks it is down a man, but with a 25 ply search can't find the draw. In addition to the two moves listed above, it thinks that 23-18 is also almost as viable. But 23-18 is a dead loss, presumably because it allows Red to king both of its checkers. Even with the database though, Milhouse can't seem to find the win for Red in a 31 ply deep search. This seems like a good test case for future improvements to playing with the database.
This morning, I woke up and was trying to resolve a problem: why is Milhouse (my checkers program) so slow? The answer didn't seem entirely obvious, so I pondered it some more. I refined it to "why is Milhouse so slow when it accesses the endgame database files?" I think I know why, but unfortunately, it will require some additional thought and work to remedy.
To explain, let me recap what happens. Put in simple terms, Milhouse does an alpha-beta search to some depth. As it is traversing in depth first order, it may realize that certain paths are futile (it can prove that no leaf in the subtree will be the right move regardless of what value it has, even without searching). Eventually, you'll get down to a leaf node, and you'll "evaluate" it (run a simple function which gives a figure of merit). These values get "backed up" through the tree and one of them ultimately wins as the value of the search.
In other words, every time you search, the score that you get is the evaluation of some node on the search frontier.
Let's imagine for a moment that you have a simple evaluation function. Say you count men (unpromoted checkers) as 100 points and kings as 130 points, and the evaluation is the sum of all your scores minus the sum of your opponents scores. That's certainly a reasonable, but relatively un-knowledgeable evaluation function, but it would "work". The problem is this: it doesn't differentiate very finely. Many, many positions on the search horizon will have identical scores. This significantly reduces the efficiency of search, because the "backed up values" can't do their job efficiently: there are many subtrees with nodes on the search frontier that look just as good as the nodes you've already looked at. Hence, you end up searching and ultimately evaluating many more nodes than you could if you took more information into account.
Ultimately then, "knowledge is power". Improving the evaluation function to include more information has an effect on overall search efficiency. The more knowledge it has in being able to differentiate between the merits of two positions, the fewer overall nodes it will have to evaluate in search (the fewer subtrees it will have to resolve all the way to the leaves).
My evaluation function doesn't understand anything about runaway checkers or trapped kings, and therefore it has to use deeper searches to resolve situations that include these features. When I've played against Cake, I've noticed that not only does Cake search deeper faster, but Cake seems to recognize that it's got me in trouble in searches of much shallower depth (sometimes as many as five moves earlier).
So, I've got to help Milhouse learn more about checkers. I can do this two ways: one, by learning more about checkers myself, and putting some of that knowledge into Milhouse. The second, by learning more about machine learning, and then allowing Milhouse to learn more itself, either through self-play or training.
I suspect that machine learning might be more productive, but I'm not entirely convinced. I'll have to give it a bit more thought.
Addendum: Okay, that was highly abstract, let's work through an example. I've been trying to absorb some knowledge from Checkers legend Derek Oldbury, whose Move Over is available online. Here's Diagram 17, a fairly simple position:
If it were Black's turn to move, it would be a draw (both men promote, and a boring 2x2 King draw ensues) but if White is to move, clearly, he can't move his king (backing away pins him against the wall, advancing is suicide), so he must move his man. Putting him against the wall is hopeless, the black man takes the opposing position and waits to jump him. Therefore, the line goes 24-19 8-11 19-15 18-22 15x8 22x13. There is an exchange of material, but while white will king his man, he's still doomed, as he can't escape to the safety of the double corner. Here's how the search looks using milhouse without its endgame database:
milhouse: restore diagram.17 board restore from diagram.17 White +--------+ | - - - -| |- - - - | | w - - -| |- - R W | | - - - -| |- - - - | | r - - -| |- - - - | +--------+ Red ... White to move ... FEN: W:WK17,24:B8,K18 milhouse: id +1 : [-11] 0.00s : 24-19 +3 : [-15] 0.00s : 24-19 8-12 17-13 +5 : [-12] 0.00s : 24-19 8-11 19-15 11-16 15-10 +7 : [-19] 0.00s : 24-19 18-23 19-16 23-18 16-12 8-11 12-8 +9 : [-16] 0.01s : 24-19 8-12 17-21 18-22 19-15 12-16 15-10 16-19 10-6 +11 : [-12] 0.02s : 24-19 8-11 19-15 18-22 15x8 22x13 8-3 13-9 3-8 : 9-14 8-11 +13 : [-14] 0.03s : 24-19 8-11 19-15 18-22 15x8 22x13 8-3 13-9 3-8 : 9-14 8-11 14-18 11-16 +15 : [-17] 0.10s : 24-19 8-11 19-15 18-22 15x8 22x13 8-3 13-9 3-8 : 9-14 8-11 14-18 11-16 18-15 16-12 ... researching after failing low (-42) +17 : [-9983] 0.30s : 24-19 8-11 19-15 18-22 15x8 22x13 8-3 13-9 3-8 : 9-6 8-11 6-10 11-16 10-15 16-12 15-11 12-16 final score = -9983 0.327 seconds elapsed. ... 231735 nodes searched, 65777 nodes evaluated ... 30493 alpha nodes, 5483 pv nodes, 65965 beta cutoffs ... avg branching factor 1.12 ... 11096 nodes ordered ... 19060 pv hits, 16916 pv misses (52.98%) transposition tables ... 166190 gets (85501 tier0 / 1072 tier1 / 79617 failed) ... 101941 puts (85701 tier0 / 16240 tier1) (30493 alpha, 5483, 65965 beta) ... 0 chinook database accesses.
The output bears some explanation. it uses iterative deepening to search, adding two ply to each search as it deepens. So, it starts out with one ply depth, then three, then five, and so on, up until it discovers a winning line with a 17 ply search (it's actually a 19 ply winning line, quiescence search makes it resolve the final jump that wins the game). You'll notice that at each search depth, it believes 24-19 to be the best move (it is) but that right up until the realization that the game is lost, (a 15 ply search) it believed the game to be essentially even. Had I encountered the starting position at the leaf node of a search, it might have believed it could salvage a draw from this positon, and selected it over a position which actually provided a draw.
Now, let's look at the same search, but using the endgame database.
board restore from diagram.17 White +--------+ | - - - -| |- - - - | | w - - -| |- - R W | | - - - -| |- - - - | | r - - -| |- - - - | +--------+ Red ... White to move ... FEN: W:WK17,24:B8,K18 (DB W LOSES) milhouse: depth 99 search depth is 99 (DB W LOSES) milhouse: id +1 : [-7614] 0.00s : 17-13 +3 : [-7614] 0.00s : 17-13 18-14 24-19 ... researching after failing low (-7712) +5 : [-7712] 0.01s : 17-13 18-14 13-9 14x5 24-19 ... researching after failing low (-7808) +7 : [-7808] 0.01s : 24-19 8-11 19-15 18-22 17x26 11x18 26-30 +9 : [-7810] 0.01s : 24-19 8-11 19-15 18-22 15x8 22x13 8-3 13-9 3-7 +11 : [-7812] 0.02s : 24-19 8-11 19-15 18-22 15x8 22x13 8-3 13-9 3-7 : 9-14 7-2 +13 : [-7812] 0.02s : 24-19 8-11 19-15 18-22 15x8 22x13 8-3 13-9 3-7 : 9-14 7-3 14-10 3-8 +15 : [-7812] 0.03s : 24-19 8-11 19-15 18-22 15x8 22x13 8-3 13-9 3-7 : 9-14 7-3 14-17 3-7 17-14 7-3 ... researching after failing low (-9983) +17 : [-9983] 0.04s : 24-19 8-11 19-15 18-22 15x8 22x13 8-3 13-9 3-7 : 9-14 7-3 14-10 3-8 10-15 8-12 15-11 12-16 final score = -9983 0.075 seconds elapsed. ... 27737 nodes searched, 4147 nodes evaluated ... 4573 alpha nodes, 124 pv nodes, 9520 beta cutoffs ... avg branching factor 1.09 ... 55 nodes ordered ... 3388 pv hits, 1309 pv misses (72.13%) transposition tables ... 23599 gets (9081 tier0 / 15 tier1 / 14503 failed) ... 14217 puts (14125 tier0 / 92 tier1) (4573 alpha, 124, 9520 beta) ... 3852 chinook database accesses. (DB W LOSES) milhouse:
In spite of the fact that it loaded 3852 positions from the database cache, it actually ran 7x faster. You can see that it searched and evaluated many fewer nodes. The highly negative scores even at the initial search depth of one means that it knows this position is lost immediately, so it effectively gives you the benefit of an additional 17 ply of search, should this position show up as a leaf in the search tree. What is needed is to create an evaluation function which more closely predicts the value of deeper searches without the expense.
Here's an example position that causes difficulty without the endgame database:
Without the endgame databse, milhouse scores this position as +6 for red, but of course, it's actually won (it'll take about 7 more ply of search depth to discover). If you look at how it scores the possible moves, you get:
analyzing board position... -6 : 6-1 11-15 ... -4 : 6-2 11-15 ... +5 : 6-9 11-15 ... +12 : 6-10 11-8 ...
You'll see it selects the right move, but the reality is that the first three moves are all draws, where the last move allows the win.
So, for the past couple of days, I've been working on fixing the WLD database lookup code in Milhouse, in an attempt to get it to handle endgame positions properly. There is a tiny bit of subtlety which I haven't seen explained anywhere else, so I think I'll take a moment here to explain it.
The Chinook database that I am using contains WLD for all positions with six pieces or less. But that's not actually true: it actually doesn't include information for any positions which are captures. Why? To shrink the size of the database. Roughly half of the positions (if I recall) are capture positions. Chinook works better if we look those up in the smaller database that results after the jump, rather than here. That's all very well and good. We could just do a quiescence sort of search: jump out your own captures, and see what the resulting positions are.
But there is a further wrinkle: Chinook deletes not only positions where not only the moving side has no captures, but the other side has none either. Again, this deletes about half the remaining positions from the database, so it's good, but now you really need to be somewhat clever.
Basically, you end up writing a little minimax search engine, but instead of running the evaluation function at the leaves, you do database lookups. You have to keep recursing down until both the moving side and the non moving side have no captures, then you can do the lookup. If any of your children are a loss for the other side, then you return a win. If all of your children are wins for the other side, then you've loss. Otherwise, you are drawn. (Implement that logic carefully, I botched it once, which led to an evening of debugging.) The algorithm always terminates because one or both sides is removing at least once checker for each pair of plies.
And, it appears to be working!
Frankly, using a WLD for only six pieces seems kind of odd. If we stored a "distance-to-conversion" or perfect play database, we'd take up more space, but it woud be trivial to get the program to play out difficult endgames, effectively adding dozens of ply to the search engine with relatively little complication. I think that's next on my list. I've begun trying to decipher Appendix B in Schaeffer's "Solving Large Retrograde Problems" paper, but the calculation of MaxWP is confusing, to say the least. I'll have to work through it myself.
Addendum: Actually, it's interesting to note that three quarters of the entire database are locations which are captures. It seems like that can't possibly be true in actual play (actual captures are quite rare in end games). Is this an exploitable feature? Martin Fierz hints that it might be in his "Stepping Stone" project suggestion. I'll have to think about it some more.
Addendum2: Sigh. There is an infinite loop condition which appears not to be caught. Not sure what the deal is with that yet. I'll have to think about it more. I think it is because you don't make any headway if the opposing side has a capture, so you play a move which removes the capture, then he plays a move which restores the capture, and so on... Not sure how to catch that. If we play two non-capture moves in a row, does that mean that we are drawn? Perchance.
I was surfing over at Bob Newell's "Checker Maven" website, and found this awesme collection of PDN files for download. I have a table of challenging positions compiled into milhouse, which I use for debugging and testing, so I decided to hack a simple parser to input some of the positions into Milhouse. The "Gem Problems" file contain 162 positions,
some all of which are quite challenging. Consider this one:
It takes around a 55 ply search to find the winning line, but a reasonably skilled checker player would be expected to play this line out without mistakes. Milhouse still finds some positions like this challenging, since it typically maxes out around 40 ply or so before grinding to a halt. I suspect this reveals a deep flaw in my move ordering, and I think it might be difficult to fix it. But I'll think about it some more.