The Effect of Endgame Tablebases on Modern Chess Engines
By Christopher Peterson
CPE 462: Senior Project II
Dr. Helen Yu
June 11, 2018

Abstract

Modern chess engines can augment their evaluation by consulting massive tables containing billions of positions and their memorized solutions. This report examines the importance of these tables in order to better understand the circumstances under which they should be used. The analysis conducted in this report empirically examines how differences in the size and access speed of memorized positions affect engine strength. Using this technique, situations where memorized tables improve play (and situations where they do not) are identified.

Contents

1 Introduction
  1.1 Memorized Positions in Chess
  1.2 Problem Statement
  1.3 Topic Overview
    1.3.1 Literature Review
    1.3.2 Background
    1.3.3 Analysis

2 Literature Review
  2.1 Comparing Chess Players
    2.1.1 Selection of the Best Treatment in a Paired-Comparison Experiment
  2.2 Complexity in Chess
    2.2.1 Programming a Computer for Playing Chess
    2.2.2 Searching for Solutions in Games and Artificial Intelligence
  2.3 Endgame Tablebases
    2.3.1 Retrograde Analysis of Certain Endgames
    2.3.2 Space-Efficient Indexing of Chess Endgame Tables
    2.3.3 Reference Fallible Endgame Play
    2.3.4 Can Endgame Tablebases Weaken Play?
  2.4 Creating and Testing Engines
    2.4.1 Programming a Computer for Playing Chess
    2.4.2 Mechanisms for Comparing Chess Programs
    2.4.3 Statistical Minefields with Version Testing

3 Background
  3.1 Complexity in Chess
    3.1.1 State-space Complexity of Chess
    3.1.2 Game-tree Complexity of Chess
    3.1.3 Solving Chess
  3.2 Memorized Positions in Chess Engines
    3.2.1 Why is Memorization Used in Chess Engines?
    3.2.2 Current Limits of Memorization in Chess
    3.2.3 Places Where Memorization is Currently Feasible
      3.2.3.1 Opening Book
      3.2.3.2 Endgame Tablebase
      3.2.3.3 Endgame Bitbases
  3.3 Evaluating Chess Engines
    3.3.1 What Makes a Chess Engine Good?
    3.3.2 Determining Strength by Evaluating Board Positions
    3.3.3 Determining Strength by Playing Partial Games
    3.3.4 Determining Strength by Playing Full Games
  3.4 Comparing Chess Engines
    3.4.1 Variables to Consider when Evaluating Engines
      3.4.1.1 Move Strength
      3.4.1.2 Move Difficulty
      3.4.1.3 Hardware Speed
      3.4.1.4 Time Taken
      3.4.1.5 Memory Usage
    3.4.2 Elo Rating System
    3.4.3 Likelihood of Superiority
  3.5 Chess Engine Communication Protocols
    3.5.1 Chess Engine Communication Protocol
    3.5.2 Universal Chess Interface

4 Analysis
  4.1 Goals of Analysis
  4.2 Overall Approach
  4.3 Tools Used
    4.3.1 Strategic Test Suite
    4.3.2 Cutechess-CLI
    4.3.3 Bayeselo
    4.3.4 PGN-Extract
    4.3.5 PolyGlot
    4.3.6 Nalimov Tablebases
    4.3.7 Syzygy Tablebases
  4.4 Overall Method
    4.4.1 Constants
    4.4.2 Engines Tested
  4.5 Test 1: Time Control
    4.5.1 Test-Specific Goals
    4.5.2 Variables
    4.5.3 Method
    4.5.4 Results
      4.5.4.1 Weak Engine
      4.5.4.2 Medium-Strength Engine
      4.5.4.3 Strong Engine
    4.5.5 Analysis of Results
  4.6 Test 2: Tablebase Size
    4.6.1 Test-Specific Goals
    4.6.2 Variables
    4.6.3 Method
    4.6.4 Results
      4.6.4.1 Weak Engine
      4.6.4.2 Medium-Strength Engine
      4.6.4.3 Strong Engine
    4.6.5 Analysis of Results
  4.7 Test 3: Drive Speed
    4.7.1 Test-Specific Goals
    4.7.2 Variables
    4.7.3 Method
    4.7.4 Results
      4.7.4.1 Weak Engine
      4.7.4.2 Medium-Strength Engine
      4.7.4.3 Strong Engine
    4.7.5 Analysis of Results

5 Conclusion
  5.1 Future Work
    Opening Books
    Deeper Testing on a Per-Engine Basis

A References

B Appendix
  B.1 Tablebase File Notation
  B.2 Trimming 5-Piece Tablebases
  B.3 Stripping 5-Piece Tablebases
  B.4 Using PGN-Extract
  B.5 Using Cutechess CLI
  B.6 Using STS With Engine Options
  B.7 Testing to Ensure Tablebases are Properly Configured
  B.8 Using Bayeselo
  B.9 Glossary
    50-Move Rule
    Solid-State Drive
  B.10 Full Code

List of Figures

1 The Djaja position. White to move and draw
2 The BNK vs K checkmate. White to win in 29 moves
3 Relative Elo vs. Seconds per Move for AlphaZero and Stockfish [11]
4 UML State Diagram of the XBoard Protocol by Alessandro Scotti [16]
5 The .bat file used to run Cutechess-CLI
6 A simple PolyGlot initialization file
7 The tablebase test position. Ke4 leads to a draw in 15 moves, but all other moves lead to Black winning
8 The .bat file used to execute the Cutechess tablebase test. checkTB.pgn contains the position to test
9 An output file of a successful test

List of Tables

1 Sizes of commonly used endgame tablebases [1]
2 Sizes and metrics of commonly used endgame tablebases [1]
3 Weak engines that will be used and their CCRL 40/40 ratings [8]
4 Medium-strength engines that will be used during analysis and their CCRL 40/40 ratings [8]
5 Strong engines that will be used and their CCRL 40/40 ratings [8]
6 Results of 250 weak engine games played at 10s
7 Results of 100 weak engine games played at 50s
8 Results of 250 medium-strength engine games played at 10s
9 Results of 100 medium-strength engine games played at 50s
10 Results of 250 strong engine games played at 10s
11 Results of 100 strong engine games played at 50s
12 Sizes of trimmed and stripped tablebases
13 STS comparison between N2, N2 TrimmedTB, and N2 StrippedTB
14 250 weak engine games played between complete and trimmed tablebases
15 250 weak engine games played between complete and stripped tablebases
16 STS comparison between Spike, Spike TrimmedTB, and Spike StrippedTB
17 250 medium-strength engine games played between complete and trimmed tablebases
18 250 medium-strength engine games played between complete and stripped tablebases
19 STS comparison between Stockfish, Stockfish TrimmedTB, and Stockfish StrippedTB
20 250 strong engine games played between complete and trimmed tablebases