Can I derive an adequate evaluation function for Milhouse through random play?
As I have mentioned from time to time, I have an implementation of a middling checkers player program that I’ve called Milhouse. Every few months I take time out to play with it and decide what I can do to make it better. Recently I’ve been pondering the possibility of implementing drop-out expansion to make a good opening book, and I’ve also been pondering how I might make further additions to its evaluation function. Sadly, improving its evaluation function requires that I actually understand more about the subtleties of checkers, and as yet, my own study of this fascinating game have been sporadic.
There have been a number of papers suggesting schemes for automatically generating heuristics for games. Most of the ones I’ve read have been based upon machine learning, and are sufficiently complicated that I haven’t had the time or energy to take a look at them. But this evening I stumbled upon the following paper by Abramson and Korf which suggested a different scheme which seems like it could be interesting to experiment with:
The basic idea is that the evaluation of a node can be defined as the expected value of games played randomly from the terminal position. In other words, you can determine the value of a position by playing a bunch of random games from that position, and returning the estimated expected value.
This seems like a bad idea on the face of it: for example consider the classic First Position from checkers:
This position requires really deep search: in fact, even with an endgame WLD database, my program has difficulty actually making headway. It seems all moves as winning, and therefore all moves as having virtually the same value, which means that it searches a lot of nodes.
Martin Fierz’s Cake program plays this position quite well. It takes a while to resolve, the first capture occurs some 56 ply deep in the search.
[Event ""] [Date ""] [Black ""] [White ""] [Result "*"] [Setup "1"] [FEN "W:WK19,K18:BK32,3."] 1. 19-23 32-28 2. 23-27 28-32 3. 18-23 3-8 4. 27-24 32-28 5. 24-19 28-32 6. 23-26 32-28 7. 26-31 28-32 8. 19-16 8-12 9. 16-19 32-28 10. 31-27 28-32 11. 19-23 32-28 12. 27-32 28-24 13. 23-18 24-28 14. 18-22 28-24 15. 32-28 24-19 16. 22-18 12-16 17. 28-32 19-24 18. 18-15 16-20 19. 15-11 24-19 20. 32-27 19-24 21. 27-23 24-28 22. 11-15 28-24 23. 15-18 24-28 24. 23-19 28-32 25. 18-23 32-28 26. 23-27 28-32 27. 19-23 20-24 28. 27x20 32-28 29. 23-19 28-32 30. 20-24 32-28 31. 24-27 28-32 32. 19-23 32-28 33. 27-32 28-24 34. 32-28 24-20 35. 23-18 20-16 36. 18-15 16-20 37. 15-11 20-24 38. 28x19 *
But this paper points out something interesting: to be useful, the program only needs to differentiate better moves from worst moves at the search horizon. And, the fact of the matter is that random play might very well be a good estimate of the relative value of the two positions.
Abramson and Korf did some experiments using the game Othello which compared using the expected outcome of random play as a heuristic value, and compared it with a standard static evaluation function, the worst possible choice, and a complete minimax search from the node (which was the true value of the game). It was found that using expected outcome was reasonable, and made fewer errors than their common static evaluator.
So here’s my question: is this applicable to endgames in checkers? At the time the paper was written, there weren’t large endgame databases for checkers, but now I have a copy of the WLD database for all positions with < 8 checkers. I can quite easily run this experiment, and compare my own static evaluation function to the evaluation function achieved by looking at the expected outcome from random play, and compare them to the actual game theoretic value of the position.
I estimate that it’ll take a few hours of coding to set this up. I’ll report back on the results sometime in the next couple of weeks.
Addendum: I arrived here actually by looking at research on Go, a game which I understand even less than Checkers, but which has seen some interesting research in the last few years, particularly in the idea of “Monte Carlo” search: using random play to evaluate game positions. I’m not so interested in Go, but am somewhat interested in Hex, which has similar characteristics: namely, very high branching factor which makes deep searching hard, and very large configurations during the endgame which prevents straightforward endgame databases. I’d like to write a Hex program someday.