July 6, 2009

In October 2008 I started building an 8 piece endgame database for the game of 10x10 international draughts, and today it is finished. The database contains the win, loss, or draw (WLD) value of every position with 4 pieces against 4 and with 5 against 3, a total of 16.5 trillion positions. Along with the WLD values, the database also contains the moves-to-conversion (MTC) depths for all positions that are at least 10 plies from a conversion move. In 2007 I had built the databases for 2 through 7 pieces, which are required for building the database of 8 piece positions.

Here are some statistics about the project:

Total calendar build and verify time:  8.8 months
Total CPU build time:                  2.65 CPU years
Total CPU verify time:                 2.91 CPU years
Size on disk (WLD only):               365 GB
Average build speed per CPU:           197,529 positions/sec
Average verify speed per CPU:          179,584 positions/sec
Longest MTC position:                  149 plies {B:WK1,K7,48:BK8,15,K18,22,K33.}

For the first few months I used a single computer, a 2.4GHz quad-core with 8GB of RAM. Building endgame databases is well suited to parallel computing techniques, and I ran 4 instances of the build program on this computer. By mid January the 8 piece database was only 14% complete, and I decided I needed more computing power. I built an 8-core dual-Xeon machine using two 2.5GHz quad-core processors, and with that I was able to run 8 additional parallel instances of the build program.

It is clear from the above statistics that parallelizing the build is critical to getting the 8 piece database built in a reasonable period of time. This was accomplished by partitioning the database into many small subdivisions and then recognizing which of them can be built in parallel.

The first subdivision is by man and king counts. Captures lead to databases of 7 or fewer pieces, and the only way the man and king counts of an 8 piece position can change without a capture is for a man to promote, so at this level each subdivision depends only on its promotion successors. For example, once all the positions with 4 black kings vs. 1 white man and 3 white kings (4bk 1wm 3wk) are built, the positions with 4bk 2wm 2wk can be built in parallel with the subset of 1bm 3bk 1wm 3wk positions, since the promotion successors of both subsets have by then been built. In addition, positions with 4 black pieces and 4 white pieces can be built in parallel with positions having 5 black pieces and 3 white pieces, as these cases cannot have any common 8 piece dependencies.
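
To make the dependency rule concrete, here is a minimal sketch of the first-level readiness test, assuming a build driver that tracks which count classes are finished (the Subdiv, Key, and ready names are mine, not from the actual build program):

```cpp
#include <map>
#include <tuple>

// One subdivision at the first level: black men/kings, white men/kings.
struct Subdiv { int bm, bk, wm, wk; };

using Key = std::tuple<int, int, int, int>;

// A subdivision is ready when its promotion successors are built; captures
// always lead to the 2-7 piece databases, which were finished in 2007.
bool ready(const Subdiv& s, const std::map<Key, bool>& built) {
    auto done = [&](int bm, int bk, int wm, int wk) {
        auto it = built.find(Key{bm, bk, wm, wk});
        return it != built.end() && it->second;
    };
    // A black man promoting turns a bm into a bk; a white man likewise.
    if (s.bm > 0 && !done(s.bm - 1, s.bk + 1, s.wm, s.wk)) return false;
    if (s.wm > 0 && !done(s.bm, s.bk, s.wm - 1, s.wk + 1)) return false;
    return true;  // all-king subdivisions have no 8 piece dependencies
}
```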

The second subdivision is by the row of the most advanced black man and the most advanced white man. Take the subset of 1bm 3bk 2wm 2wk. Once all the positions with the leading black man on row 8 and the leading white man on row 8 have been built, the positions with (leading black man on row 7, leading white man on row 8) and (leading black man on row 8, leading white man on row 7) can be built in parallel.
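
The ordering this creates is a wavefront over the pairs of leading rows: advancing a leading man moves a position to a slice with a higher leading row (or, on promotion, to a different count class handled by the first level), so slice (b, w) depends only on (b+1, w) and (b, w+1). A small sketch of the resulting schedule, following the example above where row 8 is the highest leading row:

```cpp
#include <cstdio>

int main() {
    // Rows counted in each side's direction of advance; row 8 is taken as
    // the highest leading row, per the example above.
    const int kMaxRow = 8;
    // Build in decreasing (b + w): every slice on one anti-diagonal has
    // all of its in-class dependencies on earlier anti-diagonals.
    for (int sum = 2 * kMaxRow; sum >= 2; --sum) {
        std::printf("parallel slices:");
        for (int b = 1; b <= kMaxRow; ++b) {
            int w = sum - b;
            if (w >= 1 && w <= kMaxRow)
                std::printf(" (b%d,w%d)", b, w);
        }
        std::printf("\n");
    }
    return 0;
}
```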

The third subdivision is by the configuration of black men and white men on the leading rank. All configurations with a fixed number of black men on black's most advanced rank can be built in parallel, and similarly all configurations with a fixed number of white men on white's most advanced rank can be built in parallel. Consider again the subset of 1bm 3bk 2wm 2wk, with the one black man on black's most advanced rank and two white men on white's most advanced rank. A rank has 5 playable squares, so there are 5 possible configurations of the 1 black man and 10 possible configurations of the 2 white men, giving 50 combinations that can be built in parallel within just that one combination of man and king counts and leading-row pair.

When all the possibilities for parallel builds are considered from the combinations of the first, second, and third subdivisions, at any instant during the build there are typically hundreds of subdivisions that can be built in parallel.
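
As a quick check of that count, a few lines (the choose helper is my own) reproducing the 5 x 10 = 50 leading-rank configurations:

```cpp
#include <cstdio>

// n choose k, computed incrementally; each division is exact in integers.
long long choose(int n, int k) {
    long long r = 1;
    for (int i = 1; i <= k; ++i) r = r * (n - k + i) / i;
    return r;
}

int main() {
    // 5 playable squares per rank: C(5,1) placements of the black man
    // times C(5,2) placements of the two white men.
    std::printf("%lld\n", choose(5, 1) * choose(5, 2));  // prints 50
    return 0;
}
```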

A graph of the actual number of subdivisions that were available to be built in parallel on each day of the 8 piece build is shown below. The graph does not tell the whole story, because the subdivisions are not all of equal size. But except for the first and last few days, and a couple of days around days 208 and 248, there were always more than enough subdivisions ready to keep 12 cores fully busy, and I could easily have kept two or three times as many cores occupied and built the 8 piece database in proportionally less time.

One difficulty with the three layers of subdivisions is that they create a huge number of files. There are about 200K subdivisions in the 8 piece database, and for most of them 8 files are generated: four WLD files and four MTC files. One problem with so many files is that the format is not good for efficient lookups during a draughts engine search. For that purpose the WLD and MTC databases were reindexed into subdivisions keyed only by piece counts. The reindexing process runs about 10 times faster than building, so for most of the build period I used 10 cores to build and verify and two cores to reindex the WLD and MTC subdivisions.
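
A minimal sketch of why count-keyed subdivisions suit engine lookups; the packing scheme here is my own illustration, not the actual file format:

```cpp
// During an engine search, a probe only needs the four piece counts to
// locate the right slice; no leading-rank or configuration bookkeeping
// is required. The packing below is illustrative, not the real layout.
inline int subdiv_index(int bm, int bk, int wm, int wk) {
    // Each count is at most 8, so base-9 digits give a unique id.
    return ((bm * 9 + bk) * 9 + wm) * 9 + wk;
}
```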

Verification is done after each subdivision is built and compressed. Each non-capture position is visited, and its value is checked for consistency with the WLD values of its successors. Only non-capture positions are checked because capture positions are excluded from the compressed database to make the run-length compression more effective. One would expect this verify process to run much faster than building. There are fewer calculations involved, and since there are no capture positions in the compressed databases, most of the positions in each subdivision do not need to be verified. I was initially surprised to find that verification was on average a bit slower than building. The reason is that many successor positions are captures, and during verification their values have to be found using a small search. This is time consuming, and often one capture position is a successor of several positions within a subdivision. In those cases the search is repeated each time a position being verified has that capture position as a successor, whereas during building the WLD values of positions within a subdivision are computed only once and then remain available for the rest of the build of that subdivision.

A simple way to speed up the verification process would be to cache the values of capture positions during the verify step. I expect this would nearly double the speed of the verify process, and had I done this, the 8 piece database would have been built in two months less time. The graphs below show the average build speed, verify speed, and the percentage of built positions that are captures as a function of build day.
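
A minimal sketch of such a cache, assuming a hashable position key and a search_capture_value routine supplied by the verifier (both names are mine, not from the actual program):

```cpp
#include <cstdint>
#include <unordered_map>

enum Value : std::uint8_t { LOSS, DRAW, WIN };

// Assumed hooks into the verifier (hypothetical names): a position key,
// e.g. a Zobrist-style hash, and the small search that resolves a capture
// position's value from its successors.
using PositionKey = std::uint64_t;
Value search_capture_value(PositionKey pos);  // defined by the verifier

class CaptureCache {
public:
    // Resolve a capture successor, searching at most once per distinct
    // position instead of once per reference to it.
    Value lookup(PositionKey pos) {
        auto it = table_.find(pos);
        if (it != table_.end()) return it->second;  // already searched
        Value v = search_capture_value(pos);        // small search, done once
        table_.emplace(pos, v);
        return v;
    }
    void clear() { table_.clear(); }  // reset between subdivisions

private:
    std::unordered_map<PositionKey, Value> table_;
};
```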