Evaluating Knowledge Transferability in Chess Endgames using Deep Neural Networks

by

Frímann Kjerúlf

Thesis of 30 ECTS credits submitted to the School of Computer Science at Reykjavík University in partial fulfillment of the requirements for the degree of Master of Science (M.Sc.) in Computer Science

June 2019

Examining Committee: Yngvi Björnsson, Supervisor, Professor, Reykjavík University, Iceland

Stephan Schiffel, Assistant Professor, Reykjavík University, Iceland

David Thue, Assistant Professor, Reykjavík University, Iceland

Copyright Frímann Kjerúlf June 2019

Evaluating Knowledge Transferability in Chess Endgames using Deep Neural Networks

Frímann Kjerúlf

June 2019

Abstract

Transfer learning is becoming an essential part of modern machine learning, especially in the field of deep neural networks. In the domain of image recognition there are known methods to evaluate the transferability of features, based on evaluating to what degree a feature extractor can be considered general to the domain or specific to the task at hand. This is of high importance when aiming for a successful knowledge transfer, since one typically wants to transfer only the general feature extractors and leave the specific ones behind. The general features in the case of image classification can be considered local with respect to each pixel, since the feature extractors in early layers activate on simple features like edges, which are localized within a certain radius of a given pixel. One might then ask whether similar methods are also applicable in domains other than image classification, and of special interest are domains characterized by non-local features. Chess is an excellent example of such a domain since a square's locality cannot be defined by the adjacent squares alone; one needs to take into account that a single piece can traverse the whole board in a single move. We show that this method is applicable in the case of tablebases, in spite of structural differences in the feature space, and that the distribution of the learned information within the network is similar to that in the case of image classification.

Titill

Frímann Kjerúlf

júní 2019

Útdráttur (Abstract)

Transfer learning has become an essential part of modern machine learning, especially in the field of deep neural networks. In the field of image recognition there are known methods for evaluating the transferability of features in transfer learning, based on assessing which parts of the neural network detect general features, such as lines and colour transitions, and which parts detect specific features, such as faces or houses. This is important when performing a successful knowledge transfer, since it often proves best to transfer only the general parts of the network. In image recognition these general features are considered local with respect to each pixel, since whether a pixel is part of a line can be determined by looking only at the pixels within a certain distance of it. One may then ask whether these same methods apply in other domains characterized by non-local general features. Chess is a very good example of such a domain, since the proximity of a square cannot be defined by looking only at its neighbours; one must take into account that some pieces can cross the whole board in a single move. Our conclusion is that the aforementioned methods from the image recognition domain for evaluating feature transferability apply well in the domain of chess endgames, in spite of this difference between the domains' features, and that the shape and distribution of information within the network is similar.

Evaluating Knowledge Transferability in Chess Endgames using Deep Neural Networks

Frímann Kjerúlf

Thesis of 30 ECTS credits submitted to the School of Computer Science at Reykjavík University in partial fulfillment of the requirements for the degree of Master of Science (M.Sc.) in Computer Science

June 2019

Student:

Frímann Kjerúlf

Examining Committee:

Yngvi Björnsson

Stephan Schiffel

David Thue

The undersigned hereby grants permission to the Reykjavík University Library to reproduce single copies of this Thesis entitled Evaluating Knowledge Transferability in Chess Endgames using Deep Neural Networks and to lend or sell such copies for private, scholarly or scientific research purposes only. The author reserves all other publication and other rights in association with the copyright in the Thesis, and except as herein before provided, neither the Thesis nor any substantial portion thereof may be printed or otherwise reproduced in any material form whatsoever without the author's prior written permission.

date

Frímann Kjerúlf Master of Science

- Dedicated to Loki -

Acknowledgements

I want to thank my supervisor Dr. Yngvi Björnsson for all the support and goodwill throughout this project.

This work was funded by 2014 RANNIS grant “Hermi- og brjóstvitstrjáleit í alhliða leikjaspilun og öðrum flóknum ákvörðunartökuvandamálum”.

Contents

Acknowledgements

Contents

List of Figures

List of Tables

List of Abbreviations

List of Symbols

1 Introduction

2 Background
   2.1 Convolutional Neural Networks
   2.2 Transfer Learning
   2.3 Transferability
   2.4 Endgame Tablebases

3 Methods
   3.1 Board State Representation
   3.2 WDL Values
   3.3 Transferability
   3.4 Network Design
   3.5 Transfer Learning
   3.6 Expansion Learning

4 Results and Discussions
   4.1 Tuning Hyperparameters
       4.1.1 Final Hyperparameters
   4.2 Experimental Setup
   4.3 Transferability
       4.3.1 Performance Loss Due to Co-Adaption
       4.3.2 Performance Loss Due to Specification
   4.4 Transfer Learning
   4.5 Expansion Learning

5 Conclusion
   5.1 Summary
   5.2 Future Work

Bibliography

List of Figures

2.1 An example of a Convolutional Neural Network (CNN) with 4 convolutional layers (Conv) and one fully connected layer (FC) (from Medium.com [3])
2.2 Difference between traditional ML (left) and transfer learning ML (right) (from Pan and Yang, 2010 [4])

3.1 Chess table with numbering of squares, where a1 is mapped to 0 and h8 to 63
3.2 Example board state with 7 pieces
3.3 Example from Figure 3.2 in vector state representation
3.4 For full state representation. Each piece type has its own 8 × 8 bit-array
3.5 Example from Figure 3.2 in full state representation. Each piece has its own 8 × 8 bit-array, resulting in an 8 × 8 × 4 tensor
3.6 Image classification performance after knowledge transfer from domain A to B (AnB) and B to B (BnB) as a function of n, the number of transferred layers. Transferred weights of AnB and BnB are kept frozen while transferred weights of AnB+ and BnB+ are allowed to fine-tune (from Yosinski et al. [1])
3.7 Comparing accuracy of 2x2 and 3x3 filters
3.8 Evaluating optimal number of convolutional layers

4.1 Comparing performance of Adam vs Adadelta
4.2 Evaluating effect of batch normalization on accuracy
4.3 Comparing accuracy of 16 and 32 bit floating point precision
4.4 Evaluating the number of epochs needed for convergence
4.5 Evaluating the possibility of overfitting
4.6 Evaluating effect of batch size on accuracy and training speed
4.7 Co-adaption splitting at layers 3, 4 and 5
4.8 Performance drop due to co-adaption and specification
4.9 Knowledge transfer seeds better initial accuracy

List of Tables

2.1 Definition of WDL values in the Syzygy tablebase, giving the game-theoretical values for a given board state
2.2 Syzygy endgame tablebase information, showing both all possible states and states with only pawns and kings

3.1 Breakdown by WDL values showing the number of states for each WDL value and the corresponding ratio of the whole dataset
3.2 Evaluating optimal number of convolutional layers
3.3 Evaluating optimal network size (250 epochs)

4.1 Evaluating effect of batch size on accuracy after 100 epochs
4.2 Evaluating effect of batch size on accuracy and training time for 10 epochs
4.3 Evaluating effect of batch size on accuracy after 30 minute training time
4.4 Chosen hyperparameters and network information
4.5 Network model showing layer type, filter size and number of parameters
4.6 Final label prediction accuracy
4.7 Final accuracy of φ3 on D3 by WDL values
4.8 Final accuracy of φ3→4 on D4 by WDL values

List of Abbreviations

ANN   Artificial Neural Network
CNN   Convolutional Neural Network
Conv  Convolutional
DL    Deep Learning
DTM   Depth to Mate Value
DTZ   Depth to Zeroing-Move Value
EL    Expansion Learning
FC    Fully Connected
ML    Machine Learning
TL    Transfer Learning
WDL   Win/Draw/Loss

List of Symbols

Symbol           Description
m                Number of pieces in a given state
mp               Number of piece types in a given state
X                The feature space of all possible board states
Xm               The feature space of all m-piece board states
Y                WDL value label space
Dm               Domain of all m-piece board states
X                Board state random variable
x                A given board state
y                A given WDL value
Pm(X)            The probability of sampling an m-piece board state from X
Dm               Dataset of m-piece board states
dm,i = (xi, yi)  Datapoint i in Dm
Tm               Training task
fm(·)            The objective predictive function mapping states in Xm to labels
φ(x) = y         A neural net with input x and output y
φm               A network trained on Dm
φm→k             A network trained on Dk with transfer from φm
φrnd             An untrained network initialized with random weights
ceil(z)          Rounds z ∈ R up to the next integer
Prnd(y)          Probability of φrnd outputting label y given a random input
PXm(y)           The probability of sampling a state with label y from Xm
n                Number of transferred layers
φAnB             A network trained on DB with n transferred layers from φA
AnB              Same as φAnB; transferred layers frozen
AnB+             Same as φAnB; transferred layers fine-tuned

Chapter 1

Introduction

In recent history, deep learning (DL) has been the most promising subfield of machine learning (ML) and has been responsible for the majority of notable breakthroughs in modern AI. Its applications span many different fields, including computer vision, speech recognition, natural language processing, self-driving cars and board-game agents, to name just a few. Within the DL field there is a wide range of architectures, with convolutional neural networks (CNNs) at the forefront. This architecture works in some ways similarly to a biological neural network, and bears some resemblance to the visual information processing systems found in animals. CNNs are composed of filters which convolve over the image and activate once they find a pattern that fits the given filter. In image recognition with CNNs the first layers look for low-level general features like lines and simple patterns, while layers further down the network activate on more complex, higher-level features, which can be considered specific to the given task.

Though DL has proven itself to be the most prominent ML method of modern times, it is not free of flaws. Its main drawback is the demand for large amounts of data in order to perform well, compared to classical ML methods. When a human takes on learning a new task, the process typically consists of adapting previous knowledge. This seeds the idea of reusing information which has already been learned, a method which commonly goes by the name of transfer learning (TL) and is widely used in ML. It can easily be applied to DL by reusing the layers of a pretrained network. One can choose to transfer all the layers, but in most cases it has proven more useful to transfer only the first n layers. In order to achieve a successful knowledge transfer one would therefore like to know how many layers of the network to transfer, i.e. which part of the network to reuse.
Essentially one would want to identify which layers are responsible for general feature extraction. In a recent study by Yosinski et al. [1] a method was devised to evaluate the information distribution within a neural network, the gradient from general to specific parts of the network, in order to quantify which parts of the network are most prominent for transfer. In essence, to evaluate at which layer n we start to get degraded performance due to specification of the transferred layers. Their method consists of evaluating final accuracy as a function of n, the number of transferred layers, while either freezing the transferred layers or allowing them to fine-tune. Their research was focused on the domain of image recognition and the results were specific to the problem at hand. Due to the short-sightedness of early layers in CNNs in terms of visibility within the feature space, the same methods might not necessarily be applicable in domains with non-local features, i.e. the transition from general to specific feature extractors might not be as clear as in the domain of image classification. Here we explore the feasibility of using Yosinski et al.'s method to evaluate the information distribution in a network in a domain with non-local features. This brings us to the research question we set out to answer:

How transferable is knowledge in domains with non-local features?

Chess is an excellent example of such a domain since a piece can traverse the whole board in a single move, meaning that its proximity cannot be defined by the adjacent squares alone, and its features can therefore be considered non-local. We therefore use chess as a test bench for our study and source our data from the field of chess endgame tablebases. Our measurements show that, in spite of the apparent non-locality of features in our domain of chess, we are still able to quantify the distribution of the learned information within the network, which has a structure similar to that seen in image classification. Furthermore, in order to evaluate the feasibility of expanding learned knowledge to a domain with limited or no label information, we take a special look at the edge case of doing a full transfer and freezing all layers, which resembles doing knowledge transfer without retraining on the target dataset. A use case might be one where we do not have access to the ground truth of the target domain. Our goal with this method is to use the given information from the source domain to expand our knowledge to the target domain, a technique we label expansion learning.

The thesis is organized as follows. In Chapter 2 we introduce the terminology used and provide the necessary background knowledge to set the stage for what follows. In Chapter 3 we cover the methodology used, including state representations, the CNN architecture, and a method for quantifying knowledge distribution within a network. In Chapter 4 we present and discuss our empirical results, based on the methods set up in the previous chapter. Finally, in Chapter 5 we conclude by summarizing our findings and discussing future work.

Chapter 2

Background

2.1 Convolutional Neural Networks

Convolutional neural networks (CNNs) were initially designed for analyzing features in images and were inspired by the design of the visual cortex [2]. CNNs are a form of artificial neural network (ANN) and consist of layers of nodes, where nodes in adjacent layers are connected by neurons which allow information to flow between layers. The input data is passed into the network and flows through it layer by layer towards the output. Each layer works as a feature extractor abstracting the given input, progressively extracting higher and higher level features as the data passes through the network. In the case of facial image recognition, an image in matrix form is fed to the input of the network. The first layer extracts localized features like edges, while the second layer extracts features in the form of collections of edges, and so forth. These are all features that can be considered general to the domain of image recognition and apply to most image recognition tasks, not only facial recognition. But as we traverse the network, the feature extractors start to react more and more to specialized features like noses and eyes, which are specific to the task at hand. It is then logical to assume that the layers, initially extracting general features, progressively move towards extracting more and more specific features with respect to the domain. An example of a CNN is depicted in Figure 2.1. The network consists of an input layer, a sequence of convolutional layers which serve as feature extractors, followed by a fully connected layer which acts as a classifier, and finally an output layer.
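The convolution-and-activation behaviour described above can be sketched in a few lines of plain Python. The example below is purely illustrative (a hand-crafted vertical-edge filter applied to a toy image, rather than a filter learned by training): it shows how an early-layer filter produces strong activations only where its pattern, here an edge, actually occurs.

```python
# Minimal 2D convolution (valid padding, stride 1) with a vertical-edge filter.
# Illustrative sketch only; real CNNs learn their filter values during training.

def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            out[i][j] = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw)
            )
    return out

# A vertical edge: dark (0) on the left half, bright (1) on the right half.
image = [[0, 0, 0, 1, 1, 1] for _ in range(4)]

# Hand-crafted vertical-edge filter (a crude, Gabor-like detector).
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

activations = conv2d(image, kernel)
print(activations)  # [[0, 3, 3, 0], [0, 3, 3, 0]]
```

The filter responds with 0 on the flat regions and with 3 at the dark-to-bright boundary, which is exactly the "activate once a matching pattern is found" behaviour of a first-layer feature extractor.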

2.2 Transfer Learning

The main drawback of DL with respect to classical ML algorithms is the need for large amounts of data in order to learn effectively. It is therefore of high importance to be able to reuse previously learned knowledge, i.e. to transfer the current knowledge. In transfer learning the aim is to extract knowledge from one or more tasks (source tasks) and apply it to a different, albeit related, task (target task). In the context of deep learning and neural networks this is typically accomplished by transferring a part of a network (typically a fixed number of early network layers) trained on a source task to a network that is being trained on a related target task. For a successful transfer it is beneficial to transfer only the first n layers, which are considered general, in order to disregard specific knowledge which is not applicable in the target domain.

Figure 2.1: An example of a Convolutional Neural Network (CNN) with 4 convolutional layers (Conv) and one fully connected layer (FC) (from Medium.com [3])

Figure 2.2: Difference between traditional ML (left) and transfer learning ML (right). (from Pan and Yang, 2010 [4])

There can be different impetus for doing such a transfer, for example, to expedite the learning in the target domain. In particular, transfer learning has been used successfully in modern deep neural networks for supervised image classification and recognition [5], where the early network layers exhibit the ability to learn general features that resemble Gabor filters or colour blobs. Gabor filters are an approximation of the functionality of the primary visual cortex in the brain [6], and if an image classification neural network is not learning these filters in its first layers, that is usually taken as an indication that something is wrong with the network. So at least for the task of image classification it seems like a waste of time to relearn the general features every time one trains a deep neural network. This effect has been observed not only across different natural image data sets, but also across different learning objectives [7], which is why we are interested in seeing how this methodology transfers to the domain of simple board games. Following is a mathematical definition of transfer learning, its components, and an introduction to the terminology used throughout the thesis.

Domain We define our domain for an m-piece dataset as Dm = {X, Pm(X)}, where X ∈ X. Here X is the feature space of all possible board states and Pm is the marginal probability distribution, which gives the probability of sampling an m-piece board state from X. An m-piece subset of X is defined as Xm = {x ∈ X | Pm(x) > 0}.

Datasets We define the dataset for an m-piece tablebase as Dm = {d1, ..., d#}, where # is the size of the m-piece tablebase. Here di = (xi, yi) is a single data-point, where xi ∈ Xm and yi ∈ Y, where Y is the label space. In our case the size of Dm could be the same as that of Xm, since we have access to all the data in Xm, but we keep this distinction since Dm incorporates both states and labels, while Xm refers only to the state space. Later on we omit some states from Dm for simplicity of representing the board states, as explained in Section 3.1.

Task Our task is defined as Tm = {Y, fm(·)}, where fm(·) : Xm → Y is the objective predictive function, mapping states to labels. Given a data-point (x, y), then fm(x) = y. In our case Y = {−1, 0, 1} for loss, draw and win, respectively. It is our objective to learn fm(·) by training a neural network φ. The goal is:

∀(x, y) ∈ Dm : φ(x) ≈ fm(x)

That is, to have the output of φ approximate the output of fm for all datapoints. In our case of working with endgame tablebases the task is to predict the outcome of the game.

Transfer learning Here we do knowledge transfer from Dm to Dk where m < k. Our predictive functions are not the same, i.e. fm(·) ≠ fk(·). The transfer consists of using fm(·) as a basis when learning to approximate fk(·). We define φm as the network trained on Dm with no knowledge transfer, and φm→k as a network trained on Dk with knowledge transfer from Dm. We estimate fm(·) by training a network φm with data Dm sampled from Xm. This network is used as a starting point to train a network φm→k on dataset Dk. We are interested in knowing if a transfer from m to k has any benefits in terms of initial training speed and/or final accuracy of φm→k, in contrast to training a network φk on Dk with no knowledge transfer from Dm. We now take a look at our adaption of a known method for evaluating at which n one should cut off the network for a successful transfer, and cover our main agenda, which is to evaluate the effectiveness of this method in a domain with non-local features, like a square's locality in the game of chess.
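The mechanics of the φm to φm→k transfer just described can be sketched with a toy example. Below, a "network" is reduced to a list of per-layer weight vectors (all names and numbers are our own illustration; the networks actually used in this thesis are CNNs trained by backpropagation): the first n layers are copied from a pretrained source network and optionally frozen, and a dummy gradient step then updates only the unfrozen layers.

```python
# Toy illustration of layer transfer: copy the first n layers of a trained
# source network into a target network and freeze them during fine-tuning.
# "Layers" are just flat weight lists here; all values are illustrative.

def transfer(source_layers, target_layers, n, freeze=True):
    """Return target layers with the first n layers copied from the source,
    plus a per-layer trainable mask (False = frozen)."""
    layers = [list(w) for w in target_layers]
    for i in range(n):
        layers[i] = list(source_layers[i])  # reuse learned general features
    trainable = [not (freeze and i < n) for i in range(len(layers))]
    return layers, trainable

def sgd_step(layers, grads, trainable, lr=0.5):
    """One gradient step that skips frozen layers."""
    return [
        [w - lr * g for w, g in zip(layer, grad)] if t else layer
        for layer, grad, t in zip(layers, grads, trainable)
    ]

phi_m = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]   # pretrained on D_m
phi_k = [[0.5, 0.5], [1.5, 1.5], [2.5, 2.5]]   # fresh init for D_k

layers, trainable = transfer(phi_m, phi_k, n=2)   # AnB-style: first 2 frozen
grads = [[1.0, 1.0]] * 3
layers = sgd_step(layers, grads, trainable)

print(layers[0])  # frozen transferred layer, unchanged: [1.0, 1.0]
print(layers[2])  # trainable layer, updated:            [2.0, 2.0]
```

Allowing the transferred layers to fine-tune (the AnB+ setting) simply corresponds to calling `transfer(..., freeze=False)` so that every layer receives updates.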

2.3 Transferability

When doing transfer learning with neural networks it is important to know which parts of the network to reuse, since reusing the specific parts can hinder performance. In a recent work by Yosinski et al. [1], a method was demonstrated for quantifying the transferability of features from each layer of a neural network, clearly showing how successive layers in the network adapt from detecting general to more specific features.

It was shown that transferability is negatively affected by, on the one hand, breaking up co-adapted layers in the original network, and, on the other hand, by transferring layers that have already become too specialized towards the original task. These two effects seem to be localized in the network, where co-adaption dominates performance issues around the middle of the network, while specification is the main culprit for performance loss when transferring from the latter part of the network. Furthermore, it was demonstrated that initializing a network with transferred features may improve generalization of the final network, that is, the accuracy of a fully trained network that was initialized with transferred layers is better than that of a network trained on the task at hand from scratch. Their domain of choice was image classification on the ImageNet dataset [8]. The previously mentioned study was performed in a domain characterized by local features. We are interested in evaluating the effectiveness of this method for quantifying the transferability of non-local features. Here we perform similar transfer learning experiments, but in a radically different problem domain: chess endgames, which are an excellent example of a domain characterized by non-local features, since a piece's locality is not just the adjacent squares. Interestingly, the transfer properties of our network are consistent with the ones observed in the previously mentioned image-classification experiment, further validating the effectiveness of the layer transferring approach in neural networks, even for non-image related tasks.

2.4 Endgame Tablebases

There are immense amounts of chess related data available online, but we decided on working with endgame tablebases. The main reason is that this dataset is characterized by sparsity of states, since tablebase states hold at most 7 pieces. Furthermore we use only a limited subset of the tablebases, focusing on states with two kings and some pawns. Endgame tablebases hold, among other things, the WDL value (win, draw, loss) for a given board state in the game of chess, given perfect play. The WDL value is the game-theoretical value of who will win, lose, or whether we have a draw. It takes into account the 50 move drawing rule, which states that a draw can be claimed by either player if no capture has taken place, and no pawn has moved, for the past 50 moves. As can be seen in Table 2.1, the possible WDL values range from -2 to 2. For loss we have -2 and -1 and for win we have 2 and 1. The difference is that 2 and -2 mean that a win or loss is guaranteed, while 1 and -1 mean that a win or loss is possible, but a draw can be claimed given the 50 move rule. Finally, a value of 0 refers to a draw. These values have been generated by retrograde analysis, which involves working backwards from a checkmate or a stalemate position, a technique proposed by Richard Bellman in 1965 [9]. This method of working backwards towards a state, instead of analyzing forward from a state, was a new way of solving games like chess and checkers. Given the exponential complexity of chess, endgame tablebases have only been solved for up to a certain number of pieces. At the time of this writing, endgame tablebases for chess have been solved for up to 7 pieces. Here we used data from the Syzygy tablebase, which is available online at the Syzygy website [10].

Table 2.1: Definition of WDL values in the Syzygy tablebase, giving the game-theoretical values for a given board state.

WDL value   Meaning
-2          Loss guaranteed
-1          Loss possible, draw can be claimed by the 50 move rule
 0          Draw guaranteed
 1          Win possible, draw can be claimed by the 50 move rule
 2          Win guaranteed

The Syzygy tablebase also holds, for a given board state, the depth to mate value (DTM) and the depth to zeroing-move value (DTZ), which were not used in this research. The number of possible board states is exponential with respect to the number of pieces. For example, the number of states for 7 pieces is 4 orders of magnitude greater than for 5 pieces. So these tablebases grow fast, as can be seen in Table 2.2. In this study we are only using states composed of two kings and some pawns. This gives us two advantages. First of all we are able to shrink our dataset substantially, which makes training feasible given the time restrictions and the hardware at our avail. Secondly, given only kings and pawns, the 50 move counter is bound to reset quite frequently. We can therefore ignore the 50 move rule, which shrinks the possible WDL values from 5 to 3, so our neural network has 3 outputs instead of 5 and fewer weights to train.
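The collapse from five WDL values to three outcome labels can be made explicit with a small helper. The function below is our own sketch (the thesis does not publish its code): it maps the Syzygy values {-2, -1, 0, 1, 2} onto the three network labels {-1, 0, 1}.

```python
# Collapse the five Syzygy WDL values {-2, -1, 0, 1, 2} into the three outcome
# labels {-1, 0, 1} (loss, draw, win) used as network outputs. With only kings
# and pawns on the board the 50 move rule can be ignored, so "win/loss possible"
# (WDL = ±1) is treated the same as "win/loss guaranteed" (WDL = ±2).

def collapse_wdl(wdl):
    if wdl not in (-2, -1, 0, 1, 2):
        raise ValueError("invalid WDL value: %r" % (wdl,))
    return (wdl > 0) - (wdl < 0)  # the sign of the WDL value

print([collapse_wdl(v) for v in (-2, -1, 0, 1, 2)])  # [-1, -1, 0, 1, 1]
```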

Table 2.2: Syzygy endgame tablebase information, showing both all possible states and states with only pawns and kings.

Number     Number of States                          Size
of Pieces  Pawns        All           Ratio     Pawns     All
2                       462                               incl. in 5
3          163 × 10^3   368 × 10^3    2                   incl. in 5
4          7 × 10^6     125 × 10^6    18        244 KB    incl. in 5
5          160 × 10^6   25 × 10^9     156       7 MB      939 MB
6          13 × 10^9    3 × 10^12     231       591 MB    150 GB
7                       423 × 10^12                       17 TB
8                       38 × 10^15                        1 PB

Chapter 3

Methods

Here we describe the architecture used for representing board states and the parameters used for designing our CNN, as well as the methods used for evaluating transferability. The representation of our training data is covered in Section 3.1 and the extraction of WDL values from the Syzygy tablebase in Section 3.2. In Section 3.3 we introduce methods to evaluate the general transferability of non-local features by transfer learning, a method we adopt from Yosinski et al. [1], which assesses reasons for performance drop due to co-adaption and specification of layers. Our method for doing full transfer learning is described in Section 3.5, and an extension of transfer learning called expansion learning is defined in Section 3.6.

3.1 Board State Representation

All possible board states for King vs Pawn chess were generated up to 7 pieces, and two suitable board state representations were designed. In both representations a square is referenced by a number between 0 and 63, where a1 is mapped to 0 and h8 to 63, as can be seen in Figure 3.1, while Figure 3.2 depicts an example state.

Figure 3.1: Chess table with numbering of squares, where a1 is mapped to 0 and h8 to 63.

Figure 3.2: Example board state with 7 pieces.

Figure 3.3: Example from Figure 3.2 in vector state representation.

Figure 3.4: For full state representation. Each piece type has its own 8 × 8 bit-array.

a) White pawns b) Black pawns

c) White king d) Black king

Figure 3.5: Example from Figure 3.2 in full state representation. Each piece has its own 8 × 8 bit-array, resulting in an 8 × 8 × 4 tensor.

Vector state representation is a dense representation of a board state. For a dataset of n pieces, the state is represented as an n-dimensional integer vector. In our case, working only with pawns and kings, this representation is unambiguous since we have defined rules about which piece governs which element in the vector. The first n − 2 elements are the pawns, where the first ceil((n − 2)/2) are white pawns and the rest are black pawns. The last two elements are the white and black king, respectively. As can be seen from the above, when n − 2 is odd white has the majority of pawns. This representation of the example from Figure 3.2 is depicted in Figure 3.3.
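The vector encoding just described can be sketched as follows (the function and variable names are our own illustration; the thesis does not publish its code):

```python
# Sketch of the vector state representation: each square name is mapped to an
# index 0..63 (a1 -> 0, h8 -> 63), and a state is the integer vector
# [white pawns..., black pawns..., white king, black king].

def square_index(name):
    file, rank = name[0], int(name[1])
    return (rank - 1) * 8 + (ord(file) - ord('a'))

def to_vector(white_pawns, black_pawns, white_king, black_king):
    squares = white_pawns + black_pawns + [white_king, black_king]
    return [square_index(s) for s in squares]

print(square_index('a1'), square_index('h8'))  # 0 63

# Illustrative 5-piece state: white pawns e4 and d5, black pawn e5,
# white king e1, black king e8.
state = to_vector(['e4', 'd5'], ['e5'], 'e1', 'e8')
print(state)  # [28, 35, 36, 4, 60]
```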

Full state representation follows the tradition of Leela [11] and AlphaZero [12] of using a sparse, image-like representation of chess states, which is well suited as input to convolutional neural networks. The state is defined by an 8 × 8 × mp tensor, where mp is the number of piece types. Each piece type has its own 8 × 8 bit-plane in this tensor, with values 0 or 1 representing an empty square or a square with a piece, respectively. So for pawn chess we have four types of pieces (mp = 4): white pawn, black pawn, white king and black king, which gives us an 8 × 8 × 4 tensor. This state representation is especially suitable for convolutional neural networks due to the image-like nature of the chess board. An example of this representation can be seen in Figure 3.5.
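A minimal builder for this bit-plane tensor might look as follows (nested lists stand in for a real tensor; names and layout order are our own assumptions):

```python
# Sketch of the full state representation: an 8 x 8 x 4 tensor of bit-planes,
# one plane per piece type (white pawn, black pawn, white king, black king).
# Here the tensor is nested lists indexed [plane][rank][file]; illustrative only.

PLANES = {'P': 0, 'p': 1, 'K': 2, 'k': 3}  # piece letter -> plane index

def to_planes(pieces):
    """pieces: list of (piece_letter, square_name) pairs, e.g. ('P', 'e4')."""
    tensor = [[[0] * 8 for _ in range(8)] for _ in PLANES]
    for piece, sq in pieces:
        file = ord(sq[0]) - ord('a')
        rank = int(sq[1]) - 1
        tensor[PLANES[piece]][rank][file] = 1
    return tensor

t = to_planes([('P', 'e4'), ('p', 'e5'), ('K', 'e1'), ('k', 'e8')])
print(t[0][3][4], t[3][7][4])  # 1 1  (white pawn on e4, black king on e8)
```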

It is worth noting that by defining the number of white pawns in an n-piece state to be equal to ceil((n − 2)/2), we are leaving out a subset of the corresponding Dn domain. In our research we worked with a maximum of n = 5, so the outcome is trivial for the omitted states where one player has a king and three pawns while the opponent has a single king, and leaving out those states should not change much. But with higher n the outcome is not as trivial, and it is therefore not recommended to limit the dataset as much in this regard. This rule also disregards states where black has more pawns than white, but for symmetry reasons we are not losing any information. One can always swap white and black in the board state and multiply the WDL value by −1 to get an estimation of a board state from black's perspective. This is not true in the general case, but since we are only working with pawns and kings there is complete symmetry between black and white. This also justifies the omission of the side-to-move information from the board state, so we always assume that white has the move. This comes in handy since feeding this information into a CNN is not as straightforward as feeding in a raw board state.
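The colour-swap symmetry argued above can be made concrete on the vector representation. A convenient bit trick: with a1 = 0 and h8 = 63, XOR-ing a square index with 56 mirrors it vertically (a1 to a8, e4 to e5, and so on). The helper below is our own illustration and assumes the [white pawns..., black pawns..., white king, black king] layout:

```python
# Colour-swap symmetry sketch: mirror the board vertically, exchange the roles
# of white and black, and negate the WDL label. Squares are 0..63 with a1 = 0;
# XOR-ing an index with 56 flips its rank (a1 <-> a8, e4 <-> e5, ...).

def swap_colours(state, wdl, num_white_pawns):
    pawns, kings = state[:-2], state[-2:]
    white_pawns = [sq ^ 56 for sq in pawns[:num_white_pawns]]
    black_pawns = [sq ^ 56 for sq in pawns[num_white_pawns:]]
    # Black's pieces become white's and vice versa; the kings swap as well.
    new_state = black_pawns + white_pawns + [kings[1] ^ 56, kings[0] ^ 56]
    return new_state, -wdl

# e4=28, d5=35, e5=36, e1=4, e8=60: white pawns e4 and d5, black pawn e5.
swapped, label = swap_colours([28, 35, 36, 4, 60], 2, num_white_pawns=2)
print(swapped, label)  # [28, 36, 27, 4, 60] -2
```

Applying the transform twice (with the pawn count of the swapped side) recovers the original state and label, confirming that no information is lost.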

3.2 WDL Values

The WDL values needed to be extracted from the Syzygy endgame tablebase, which can be downloaded from the Syzygy website [10]. In order to extract the WDL values we used the python-chess package, which has built-in methods to probe the Syzygy database. Before extracting the WDL values one needs to create a board state class with the python-chess package. Loading a state from disk took only 1% of the processing time, generating the class took 18%, but probing the Syzygy tablebase took up 81% of the time. In total, probing for one state took 50 µs. Here we focused on tablebases with m ∈ {3, 4, 5}, since working with larger m is quite time consuming, and this range for m should be considered a good proof of concept for the general ideas. In Table 3.1 we have a breakdown by the frequency of each WDL value for states including pawns and kings. From the table we see that, apart from the Loss WDL value for m = 3, no WDL value can be considered sparse. This is of high importance since if one label dominates the label space, one can get relatively decent accuracy from always guessing that label.

Table 3.1: Breakdown by WDL values showing the number of states for each WDL value and the corresponding ratio of the whole dataset.

Number of           LOSS                    DRAW                    WIN
pieces              states      ratio       states      ratio       states      ratio
3                   0           0.000       38,368      0.235       124,960     0.765
4                   1,737,970   0.234       2,485,090   0.334       3,213,028   0.432

3.3 Transferability

It has been shown that leaving out some of the more specific parts of the network when doing transfer learning can result in better final accuracy, as well as avoiding overfitting when working with small target datasets. This is especially important when the tasks in the source and target domain are dissimilar [13]. It is therefore important to develop methods to maximize the transferability of knowledge between domains by evaluating which parts of the network to reuse.

Figure 3.6: Image classification performance after knowledge transfer between domain A to B (AnB) and B to B (BnB) as a function of n, the number of transferred layers. Transferred weights of AnB and BnB are kept frozen while transferred weights of AnB+ and BnB+ are allowed to fine-tune (from Yosinski et al. [1]).

For a classifier network with a softmax output, it is natural to assume that the latter layers are related to the task at hand, since they are in close connection to the classifier, and are therefore considered specific, while the early layers in a network have been shown to detect simple features, at least in the case of image classification [13], and are therefore considered general. Given this different classification of initial and final layers, it is logical to assume a transition from general to specific as we traverse the network.

Here we adopt methods developed by Yosinski et al. [1] to measure the transferability of layers in a neural network and evaluate the applicability of this method in domains other than image recognition. This is done by quantifying to what degree a layer can be considered general or specific, and by evaluating whether this transition is sudden or evolves gradually over many layers. Yosinski et al. demonstrate their method by doing transfer learning in the domain of image classification, where they split the ImageNet [8] dataset into two approximately equal parts A and B by dividing the classes randomly between them. The results of Yosinski et al.'s measurements can be seen in Figure 3.6 and serve as a reference for how to evaluate the transferability of layers, as covered below. For a detailed explanation of their method please refer to Yosinski et al. [1].

Although there are similarities between chess states and image data, as both consist of layered 2D arrays with a certain local proximity surrounding each square/pixel, the locality of chess is more complex. Yosinski et al. did their research in the domain of image recognition, in which the general feature extractors activate on features that can be considered local. Take for example the task of deciding whether a single pixel is part of a line: one needs to look no further than the adjacent pixels.
But in the game of chess, a square's value cannot be determined by looking at the adjacent squares alone, since for example the queen can traverse the whole board in a single move and capture a piece 7 squares away. And a pawn with a high rank is much more likely to be promoted to a queen by reaching rank 8 than a pawn with a low rank, so its relative location on the board with respect to rank is important. These are all features that can be considered non-local.

Our goal is to evaluate whether the previously mentioned method by Yosinski et al. is applicable in domains other than image recognition. We wanted to select a domain that retains some local features but still differs in some regard, and the existence of non-local features in chess is an interesting addition. We did not, however, want to deviate too far from the original domain by introducing features too dissimilar in structure from image recognition. Therefore we chose to work only with kings-vs-pawns chess endgames. By limiting ourselves to this sub-domain of the whole tablebase, we are lessening the amount of non-locality, but still retaining some.

In our adoption of this method, we are not splitting the dataset in two, but rather comparing networks trained on subsets of the Syzygy tablebase split with respect to the number of pieces. As in Yosinski et al.'s method, two networks are trained as a base for the transfer: network φA is trained on DA while network φB is trained on DB, where in our case B = A+1. Transfer learning is then applied by training a source network on the source dataset, reusing the first n layers of the source network, randomly initializing the remaining layers, and retraining the network on the target dataset. A transfer is done both from A to B and from B to B by creating two networks:

• A transfer network φAnB (AnB for short) created by copying the first n layers from φA and retraining on DB.

• A selffer network φBnB (BnB for short) resulting from copying the first n layers from φB and retraining on DB.

The transfer network is used to evaluate the benefits of the transfer as a function of n, while the selffer network is used as a control case. The n reused layers of AnB and BnB are then either frozen, i.e. not updated by the backpropagation algorithm, or fine-tuned by allowing them to retrain. The fine-tuned networks are differentiated from the frozen ones by a plus sign, so we end up with four networks: AnB, BnB, AnB+ and BnB+. By studying the evolution of the accuracy, and the difference in accuracy between the networks as a function of n, we are able to estimate to what degree the layers are general or specific. A few general assumptions taken from Yosinski et al. [1] can be made given the accuracy of the resulting networks, evaluating the reasons behind performance loss based on which part of the network is transferred. The following assumptions are best understood by referring to Figure 3.6.

• First part of the network: If the accuracy of network AnB in predicting labels in DB is no different from the accuracy of BnB at the same task, we can assume that layers 1 to n are general with respect to DB. If, on the other hand, the performance of AnB is worse than that of BnB, we can assume that at least layer n is specific to DA. This effect is observed in Figure 3.6 at layers 1–3, which seem to be general.

• Middle part of the network: If we see a drop in accuracy of AnB and BnB in the middle of the network, while AnB+ and BnB+ seem to recover with retraining, we can assume that the performance drop is due to co-adaption of neurons in layer n with neighbouring layers. In the case of the frozen networks the backpropagation algorithm is not able to compensate and relearn the lost knowledge embedded in the co-adaption, while fine-tuning is able to adapt and get accuracy back on track. At layers 4 and 5 in Figure 3.6 we see clear evidence of performance loss dominated by the co-adaption effect.

• Latter part of the network: If the accuracy of network AnB is lower than that of network BnB for high n, we can assume that the drop in AnB's accuracy is due to specification of layers in φA with respect to DA. If BnB converges to the accuracy of BnB+, we can assume that performance loss due to co-adaption has diminished. The continued drop of AnB and the rise in performance of BnB in Figure 3.6 at layers 6 and 7 is an example of this effect, where specification is the dominant performance retardant.
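The construction of the four networks described above can be sketched in Python as an illustrative toy (not the thesis code; layer weights are flat lists here): copy the first n layers from the source, randomize the rest, and freeze the copied layers unless fine-tuning.

```python
import random

def init_transfer_net(source_layers, n, fine_tune):
    """Initial state of an AnB/BnB-style network: the first n layers are
    copied from the source network and frozen (unless fine-tuning, i.e.
    AnB+/BnB+), while the remaining layers are randomly re-initialized
    and are always trainable."""
    layers = []
    for i, weights in enumerate(source_layers):
        if i < n:
            layers.append({"weights": list(weights),        # copied layer
                           "trainable": fine_tune})         # frozen unless '+'
        else:
            layers.append({"weights": [random.gauss(0.0, 0.1)
                                       for _ in weights],   # fresh random init
                           "trainable": True})
    return layers
```

With `fine_tune=False` this corresponds to AnB/BnB; with `fine_tune=True` to AnB+/BnB+.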

3.4 Network Design

We used a Convolutional Neural Network (CNN) as our network architecture; an example can be seen in Figure 2.1. The network consists of an input layer, a sequence of convolutional layers which serve as feature extractors, followed by a fully connected layer which acts as a classifier, and finally an output layer. The shape of the input layer is 8 × 8 × 4, defined by the full state representation, but we had to decide on the shape of the convolutional filters and the number of convolutional layers. Note that in these comparisons we aimed at having the same number of weights for each network, in order to give a fair comparison. Unless stated otherwise, experiments were carried out using the D4 dataset, the subset of 4-piece states from the whole tablebase.

Figure 3.7: Comparing accuracy of 2x2 and 3x3 filters

We compared the accuracy of 2 × 2 filters and 3 × 3 filters, with results shown in Figure 3.7. After deciding on a 2 × 2 filter we tested the optimal number of convolutional layers, reported in Figure 3.8 and Table 3.2, where 7 layers gave us the best result. This comes as no surprise since we are using 2 × 2 filters, so the dimensions of the "board" shrink by one square for every convolutional layer. We therefore need 7 convolutional layers for the last one to output a tensor of shape 1 × 1 × i, where i is an integer defined by the number of filters in each layer. Remember that in the general case one piece can traverse the whole board in one move, so for classification it makes sense that the fully connected layer works with bottleneck feature extractions from the whole board. The shape of the output layer is 3, defined by the number of WDL values we need to predict.
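The shrinking of the board dimensions can be verified with a short helper (hypothetical, for illustration): each unpadded ("valid") convolution reduces the spatial size by kernel − 1, so seven 2 × 2 layers take 8 × 8 down to 1 × 1.

```python
def conv_output_size(size, kernel, layers):
    """Spatial size after `layers` unpadded ('valid') square convolutions."""
    for _ in range(layers):
        size -= kernel - 1  # each valid conv shrinks the board by kernel-1
    return size
```

For example, `conv_output_size(8, 2, 7)` gives 1, matching the 7-layer design above.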

Table 3.2: Evaluating optimal number of convolutional layers

Layers    Weights    Accuracy
3         191,000    0.922
4         176,000    0.951
6         187,000    0.975
7         182,000    0.977

Figure 3.8: Evaluating optimal number of convolutional layers

To decide on the size of the network (number of weights) we measured the accuracy with respect to training time for a few network sizes, as seen in Table 3.3. The network size was tuned by gradually increasing the number of filters per convolutional layer. For 512,000 weights convergence took much longer and did not reach as good a final accuracy as the other candidates (results not reported). We see that the mapping from network size to training time is non-linear,

Table 3.3: Evaluating optimal network size (250 epochs)

Weights    Accuracy    Training time [min]
35,000     0.956       167
70,000     0.975       192
182,000    0.977       282
256,000    0.983       306

and given the amount of decrease in final accuracy, it is not beneficial to go for a smaller network. We decided on 182,000 weights rather than 256,000 weights, even though the latter network gives a further increase in accuracy for the small cost of an approximately 8.5% increase in training time. The reason is that when a network's capacity is much higher than the information content of the domain, there is a higher risk of overfitting, and we would also be learning the D3 dataset with this network, so we opted for 182,000 weights. However, it turned out later that we were probably far from overfitting, so in retrospect we could have chosen the larger network without risk.

3.5 Transfer Learning

In our transferability study we did not have the means to let the training converge, due to lack of time and computational resources. We therefore want to evaluate the final accuracy of doing a full transfer and allowing the network to converge, to get an estimation of the final accuracy. We are not expecting a big domain difference between chess variants, since they are mostly played with a similar board structure and related rules. It is therefore likely that transferring knowledge between chess variants will prove successful. To evaluate the efficiency of transfer learning in this domain we do a knowledge transfer from D3 with the goal of enhancing accuracy in predicting labels in D4. This is done by fully training a network φ3 on D3, allowing the test accuracy to converge. We then initialize a network φ4 with the weights from φ3 and train it on D4. We are interested in two things: whether the knowledge transfer seeds better initial accuracy for φ4, and whether the transfer affects final accuracy compared to no knowledge transfer. Better final accuracy would be optimal, as has often been the case in other studies, e.g. Caruana, 1995 [13]. But no benefit, or even a negative transfer, where the transfer actually reduces final performance, could also be observed.

3.6 Expansion Learning

Usually the main application of transfer learning has been to transfer from a larger domain to a smaller one, in order to boost final accuracy and to avoid overfitting due to small sample size when retraining on the target domain. But we turn this around and evaluate the feasibility of transferring knowledge from a smaller domain to a larger domain without retraining, essentially expanding the knowledge in the source domain to the target domain. In order to evaluate the effectiveness of our method we take a special look at the edge case of AnB with n = 8, meaning that we do a full transfer and no retraining, a method we label expansion learning. The biggest advantage of vanilla transfer learning is that given a small sample from the target domain, one is able to avoid overfitting and get better accuracy in comparison to no knowledge transfer. In the case of expansion learning we can go even further and get estimations without any label information. The method consists of training a network φm on a source dataset Dm and using it, without retraining, to evaluate states from the target dataset Dk where m < k.
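In code, expansion learning reduces to evaluating a fixed, source-trained predictor on labelled target-domain states with no retraining; a minimal sketch (function names hypothetical):

```python
def expansion_accuracy(predict, dataset):
    """Accuracy of a frozen, source-trained predictor on a labelled
    target dataset, with no retraining (expansion learning)."""
    correct = sum(1 for state, label in dataset if predict(state) == label)
    return correct / len(dataset)
```

Here `predict` would be the trained network φm and `dataset` the target state-label pairs from Dk.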

This has possible applications for expanding the current tablebases beyond the m = 7 piece limit, by training a network on the current tablebase and using it to evaluate states in Xk with k > 7. Since we would have no label information there, we would have no way to evaluate whether our expansion is any good. But here we do a feasibility study of that concept by selecting k ≤ 7 and m = k − 1, for example m = 3 and k = 4, since we have the labels for both of those datasets. If our method is successful in evaluating labels in D4 given information in D3, as judged by comparing real and estimated labels, we might expect similar results when expanding beyond 7 pieces. As a base case for comparison we chose an untrained network φrnd initialized with random weights and measured its accuracy on the same task. The theoretical expected accuracy of a random network is P(φrnd(X) = Y) = 1/3, which might seem obvious, but it is interesting to note that this is independent of fm(·), the objective predictive function for state labels in Xm. This can be seen from the following deduction:

We define PXm(l) as the probability of sampling a state with label l from Xm. Consider an untrained network φrnd with randomized initial weights, and datapoints dm = (xi, yi) ∈ Dm, where xi ∈ Xm and yi ∈ Y = {−1, 0, 1}. Given a random state, and assuming that the network's output is completely random, the probability of the network outputting a label l is Prnd(l) = 1/3 for all l ∈ {−1, 0, 1}. Then the probability of our network guessing the right label is given by

P(φrnd(xi) = yi) = Σ_{l=−1}^{1} Prnd(l) · PXm(l)
                 = (1/3) · Σ_{l=−1}^{1} PXm(l)
                 = 1/3

As can be seen, the probability of guessing the right answer is 1/3, independent of PXm(l) and therefore of fm(·).

Chapter 4

Results and Discussions

Here we present and discuss the empirical results from our experiments. We begin by covering the reasons for selecting our final hyperparameters and the methods we used to tune our model for best performance. The model was optimized with respect to our experimental setup, which is covered in the section thereafter. We then move on to our study of transferability and the method's applicability in the domain of endgame tablebases. Since full convergence was not feasible for our transferability study, we finally take a closer look at two edge cases where n = 8: full transfer learning, by converging 3n4+, and expansion learning, by evaluating the predictive power of the 3n4 and 4n5 networks on their target datasets.

4.1 Tuning Hyperparameters

To tune our network for best performance we had to search for the best hyperparameters. All tests were done on the D4 dataset. Note that the results are from single runs, since averaging over many runs was time consuming. Also note that the accuracies observed in these tests are lower than the final scores we present later in this chapter. This is because here we are only optimizing one parameter at a time, so we have not reached the network's full potential, while in our final runs we are using a fully optimized version of the network. First of all we needed to choose an optimizer. Two popular choices are Adam and Adadelta. A comparison between the two can be seen in Figure 4.1, where Adadelta is clearly the better option. The effect of introducing batch normalization can be seen in Figure 4.2. Since batch normalization did not give an obvious performance boost, we opted to skip it. According to a recent study [14], for most neural networks, using 16 bit precision should not affect final accuracy much with respect to 32 bit precision, while resulting in much faster training times, since we are essentially reducing the size of the network by a factor of 2. As can be seen in Figure 4.3, we get a slight decrease in final accuracy. As a further downside we did not notice a decrease in training time; on the contrary, training with 16 bit precision took 40% more time than with 32 bit. This is surprising and possibly a limitation of Keras, the library we used for training.

Figure 4.1: Comparing performance of Adam vs Adadelta

Figure 4.2: Evaluating effect of batch normalization on accuracy

Figure 4.3: Comparing accuracy of 16 and 32 bit floating point precision

Finally we needed to decide on the number of epochs and evaluate whether our networks were overfitting. As can be seen in Figure 4.4, which shows the training and validation accuracy when training on D4, full convergence of the validation accuracy is reached between 100 and 150 epochs; so for those cases where full convergence is needed we decided on 150 epochs. One way to evaluate whether a network is overfitting is to monitor the validation loss, which should gradually go down with each epoch. If the validation error starts (on average) to rise again after reaching a minimum, this is an indication of overfitting. Also, if the training loss continues to drop while the validation loss has converged, that is an indication that we are limited by the capacity of our model [15]. As can be seen from Figure 4.5 we are clearly not overfitting and could thus benefit from using a bigger model.
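The monitoring rule described above can be sketched as a small helper (hypothetical, not the thesis code): track how many epochs have passed since the validation-loss minimum; a count that keeps growing after convergence suggests overfitting has begun.

```python
def epochs_since_best(val_losses):
    """Number of epochs elapsed since the minimum validation loss.
    A value that keeps growing while training loss still drops is a sign
    of overfitting; if validation loss merely plateaus while training
    loss falls, the model is likely capacity-limited instead."""
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    return len(val_losses) - 1 - best_epoch
```

One would typically combine this with a patience threshold, stopping training once the count exceeds it.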

Figure 4.4: Evaluating the number of epochs needed for convergence

Figure 4.5: Evaluating the possibility of overfitting

Figure 4.6: Evaluating effect of batch size on accuracy and training speed

In Figure 4.6 we see how changing the batch size affects accuracy. We checked the final accuracy after 100 epochs, shown in Table 4.1, and the accuracy after 10 epochs, shown in Table 4.2. We clearly see that if the batch size is too big it can hinder final performance. There does not seem to be a big difference in final accuracy for batch sizes of 256, 512 or 1,024, but one could assume from Figure 4.6 that 256 is initially faster at learning. From Table 4.2, where we train for only 10 epochs, we see that there is a dramatic difference in training time per epoch. From Table 4.3 we see that if we limit ourselves not to 10 epochs but to 30 minutes, a batch size of 1,024 reaches the highest accuracy within that time period. We nevertheless opted for a batch size of 256 based on Figure 4.6, since the information in Tables 4.2 and 4.3 was not available until we had already begun our measurements.

Table 4.1: Evaluating effect of batch size on accuracy after 100 epochs

Batch size    Accuracy
256           0.975
512           0.973
1,024         0.977
2,048         0.969
4,096         0.974
8,194         0.961

Table 4.2: Evaluating effect of batch size on accuracy and training time for 10 epochs

Batch size    Accuracy    Training time [mm:ss]
256           0.963       28:34
512           0.954       15:24
1,024         0.955       09:13
2,048         0.941       07:53
4,096         0.936       06:42
8,194         0.914       06:02

Table 4.3: Evaluating effect of batch size on accuracy after 30 minute training time

Batch size    Accuracy
256           0.963
512           0.962
1,024         0.970
2,048         0.961
4,096         0.966
8,194         0.952

4.1.1 Final Hyperparameters

In Table 4.4 we see the relevant chosen hyperparameters as well as important network information. A breakdown of the network structure can be seen in Table 4.5. Due to time restrictions and limited computational resources, we limited training to 10 epochs for all transferability measurements in Section 4.3.

Table 4.4: Chosen hyperparameters and network information

Hyperparameter         Value
Filter size            2 × 2
Conv layers            7
Input size             8 × 8 × 4
Output size            3
Number of weights      181,651
Optimizer              Adadelta
Batch normalization    not used
Precision              32 bit
Batch size             256
Network size           757 KB

Table 4.5: Network model showing layer type, filter size and number of parameters

Layer type          Filters    Parameters
Input               -          -
Conv 1              16         272
Conv 2              32         928
Conv 3              32         4,128
Conv 4              64         8,256
Conv 5              128        32,896
Conv 6              128        65,664
Conv 7              128        65,664
Flatten             -          -
Fully connected     -          3,843
Output              -          -
Total parameters               181,651
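Most rows of Table 4.5 follow the standard parameter count for a convolutional layer, kernel² × c_in × c_out weights plus one bias per filter; a quick check (helper name hypothetical):

```python
def conv_params(kernel, c_in, c_out):
    """Trainable parameters in a 2-D conv layer with a square kernel:
    kernel*kernel weights per (input channel, filter) pair, plus one
    bias per output filter."""
    return kernel * kernel * c_in * c_out + c_out
```

For example, `conv_params(2, 4, 16)` reproduces the 272 parameters of Conv 1 (a 2 × 2 kernel over the 4 input planes).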

4.2 Experimental Setup

We used two Nvidia GeForce GTX 1080 Ti GPUs with 11 GB of memory each. All neural networks were programmed in Python 3.6 using Jupyter Notebook. Our neural network framework was Keras 2.2.4 on top of TensorFlow 1.11.0. For creating board states we used the python-chess package, which was also used to probe the Syzygy tablebase for WDL values.

4.3 Transferability

To evaluate the transferability of knowledge in domains with non-local features, and to evaluate which parts of the network are feasible for transfer, we first compare 4n4 with 4n4+ to check for performance loss due to co-adaption between neighbouring layers. We then compare the accuracy of 3n4 with respect to 3n4+ to quantify to what degree a layer can be considered specific, by evaluating performance loss due to specification.

4.3.1 Performance Loss Due to Co-Adaption

We trained networks φ4, initialized with random weights, on D4 for 150 epochs and averaged over 5 runs, reaching a mean accuracy of (98.8 ± 0.1)%. Then networks 4n4 for all n ∈ {1, ..., 8} were initialized by transferring the first n layers of network φ4; the remaining layers were randomly initialized and the network retrained on D4 for 10 epochs. We see the results in Figure 4.7, where for 4n4 we freeze the transferred layers, while in 4n4+ we allow them to fine-tune by retraining. For reference, the red dotted line at 0.988 and the corresponding 95% confidence interval represent the final accuracy of a fully trained source network φ4. In the figure we see clear signs of co-adaption in certain parts of the network, where two or more neurons in neighbouring layers have co-evolved and collectively serve as a feature extractor. The signs come in the form of performance loss of the frozen networks (4n4) with respect to the networks allowed to fine-tune (4n4+). A statistically significant split between 4n4 and 4n4+ at layers 3 to 5 is an indication of this, while for other n there is not much difference in the networks' accuracy.

Figure 4.7: Co-adaption splitting at layers 3, 4 and 5

This performance decrease around the middle layers for 4n4 with respect to 4n4+ shows a similar trend to that reported by Yosinski et al., who attributed it to co-adaption. While the fine-tuning of 4n4+ seems able to compensate for this co-adaption, the backpropagation algorithm in the case of 4n4 does not seem able to relearn the lost knowledge.

Due to training time and computing power restrictions we were only able to train each data-point for 10 epochs on the target datasets, while around 150 epochs would be needed to fully converge the accuracy to its final value. From Figure 4.7 we see that performance generally increases with n. This trend is most likely due to the lack of convergence, explained by the fact that for low n we are throwing away more knowledge from the source network than for high n. This trend was not observed by Yosinski et al. [1], where the data-points were allowed to fully converge. Consequently, there is no reliable basis for determining the optimal n for our domain, but the observed co-adaption split should still be valid, and there is no reason to attribute it to lack of convergence.

One thing worth mentioning is that the same fully trained φ4 network was used as a seed for all runs, due to training time restrictions; it would of course have been optimal to retrain a new source network for every run. From Figure 4.7 we see a small but statistically significant performance increase for 4n4 at n = 7 compared to the converged φ4, which would possibly have increased given more training on the target network. From Table 4.6 we see that the final accuracy of the converged φ4 is (98.8 ± 0.1)%, as depicted by the red dotted line in Figure 4.7, while for 4n4 with n = 7 the final accuracy is (99.005 ± 0.007)%, depicted by the solid blue line. So in the case of 4n4, retraining only the final layer for 10 epochs seems to work better than continuing to train all the layers of φ4.
In contrast we do not seem to observe the same trend for 4n4+, which could possibly be explained by the fact that we are retraining the whole network; the trend might appear given enough training. The measured performance increase was unexpected, but might just be an artifact of always using the same source network φ4 as a seed for 4n4, instead of retraining a new φ4 network for every run. The small uncertainty of 4n4 at n = 7 is an indication that the same seed was used over and over. It is highly likely that a different value and uncertainty would be measured given fresh seeds, so this difference should be taken with a grain of salt.

4.3.2 Performance Loss Due to Specification

To evaluate the gradient from general to specific parts of the network, we train a network φ3 on D3 for 150 epochs. Then networks 3n4 are initialized by transferring the first n layers from φ3, and trained on D4 for 10 epochs. We see the results in Figure 4.8, averaged over 5 runs, where for 3n4 we freeze the first n layers, while for 3n4+ we allow them to retrain and fine-tune. The red dotted line at (0.963 ± 0.001) shows the accuracy of φ4 networks and the corresponding 95% confidence interval. These networks were trained for 10 epochs and averaged over 10 runs, the same number of epochs as in the training of 3n4+ on the target dataset, in order to give a fair comparison when evaluating the possibility of an initial accuracy boost from knowledge transfer.

Looking at the accuracy of 3n4 in Figure 4.8, a clear and gradual drop can be seen from the very start. This is an indication of two things: performance drop due to co-adaption of neurons between layers, and performance loss due to specification of the latter layers. With reference to Figure 4.7, we can safely say that this performance drop can partially be attributed to co-adaption at layers 3, 4 and 5. But since co-adaption was not detected above n = 5, at layers 6 to 8 the performance drop should be solely due to specification of the transferred layers. We have to take into account that with frozen weights one might expect faster learning than with fine-tuning, since we are updating more weights in the latter case, but given the big accuracy difference for n ∈ {3, 4, 5}, it is safe to assume that co-adaption is the cause. This is in tune with Yosinski et al.'s [1] results in the field of image classification, with one big difference: in their case the first 2 layers show no signs of performance drop, while in our case we start to see a performance drop at the first copied layer.
Since it is unlikely that specification contributes to the performance drop at the first layers, we might assume that this drop is due to co-adaption early on. But given our previous evaluation of Figure 4.7, where the verdict was that co-adaption was only observed at layers 3, 4 and 5, co-adaption is an unlikely cause. A plausible candidate is the fact that we are not allowing the test accuracy to converge but are capping the training at 10 epochs due to training time restrictions. Given convergence we might see a similar trend as in the image classification study on the ImageNet dataset. We might also be observing an artifact attributable to the non-local features of this domain: performance loss due to specification might stretch much further into the general part of the network than in the case of image classification, but full convergence is necessary for any concrete speculation.

To evaluate the possible effect of knowledge transfer on the boost of initial accuracy during training, in Figure 4.9 we take a closer look at 3n4+ and φ4. It seems that 3n4+ achieves better accuracy than the φ4 baseline for all n except n = 8.

Figure 4.8: Performance drop due to co-adaption and specification

As both these networks have been trained for 10 epochs, with the only difference that 3n4+ received a transfer from φ3, we are safe to say that we get an initial boost in training accuracy when doing a transfer from D3 to D4 for all n except n = 8, which is an indication that leaving out at least the final layer is feasible for a successful transfer.

Although we were not able to fully converge our measurements, we see a similar trend in our results as Yosinski et al. [1] did in the domain of image classification, with respect to performance loss due both to co-adaption and to specification. Transferability in domains with non-local features, at least in the case of chess, seems to be quantifiable with similar methods as in the case of image classification.

Figure 4.9: Knowledge transfer seeds better initial accuracy

4.4 Transfer Learning

The results from our transferability study were, as previously mentioned, not allowed to converge. We therefore evaluate the effect of doing a fully converged transfer by training for 150 epochs, at which point the final accuracy had converged. We do this in order to give a fair comparison between transfer and non-transfer, to evaluate whether there are any benefits from doing a transfer. The final results for each network were averaged over 5 runs, since the training result from a single run can vary slightly due to the non-deterministic nature of the backpropagation algorithm in conjunction with the Adadelta optimizer. Confidence intervals at the 95% confidence level were used for error estimation.

We fully trained a source network φ3 on D3. This network was used as a seed to fully train a target network φ3→4 on D4. For comparison we fully trained a network φ4 on D4 starting from random weights. The results can be seen in Table 4.6. In Tables 4.7 and 4.8 we can see the accuracy of the networks when predicting labels for D3 and D4 respectively, split by WDL value. Note that when calculating the accuracy for individual classes, we counted only true positives and false negatives within each WDL value, while false positives were not taken into account; this metric is also known as recall. This was not intentional, and taking false positives into account would of course have given us better insight into the datasets.
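The per-class figures in Tables 4.7 and 4.8 therefore correspond to recall, which can be computed as follows (an illustrative sketch, names hypothetical):

```python
def per_class_recall(y_true, y_pred, classes=(-1, 0, 1)):
    """Recall per WDL value: true positives / (true positives + false
    negatives). False positives are ignored, matching the per-class
    accuracy reported in the tables."""
    recall = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        recall[c] = tp / (tp + fn) if tp + fn else None
    return recall
```

Classes absent from the data (such as Loss in D3) get `None`, mirroring the dash in Table 4.7.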

In Table 4.6 we see the accuracy of the network φ3→4, the result of doing a fully converged transfer from D3 to D4. This is compared to the network φ4, initialized with random weights and trained on data from X4, which should give a good baseline for evaluating the effect of the transfer. As mentioned before, each measurement was averaged over 5 runs, and for each training of a new φ3→4 network a new φ3 seed was used, eliminating the possibility that the same bad initial φ3 seed was hindering performance. As can be seen from the results in Table 4.6, we are experiencing negative transfer: the performance of φ3→4 is lower than that of φ4, meaning that the transfer is actually hindering performance. This behaviour is expected when transferring between dissimilar domains [16], but in our case it comes as a surprise, since one might expect the task of estimating labels in D3 to be quite similar to estimating labels in D4. Given the results of our transferability study, we might be transferring too much specific information from the source domain. This negative transfer might have been avoided by dropping at least the final layer, and with the correct choice of n we might even have observed a positive transfer. The task of evaluating labels in a dataset with an odd number of pieces is to some degree dissimilar to evaluating labels in a dataset with an even number of pieces, so transferring from even to even or odd to odd might also give better transfer results. It could also be speculated, given the high accuracy of (99.50 ± 0.09)% of our source network φ3, that the culprit is overfitting of φ3 on D3: that the network is not learning to approximate f3(·) but rather memorizing the data. This matters because there is no basis for doing transfer from a network that suffers from overfitting.
In the case of φ3, overfitting would not come as a surprise, since the capacity of our network is optimized for learning X4. From Table 2.2 we see that the D4 dataset is approximately 42 times bigger than the D3 dataset, so the network's capacity runs the risk of being too high for this task. It is worth mentioning, however, that none of the networks showed signs of overfitting. This was evaluated by monitoring the validation error, which for φ4 had converged at around 150 epochs. So overfitting of φ3 is an unlikely cause of the observed negative transfer. It is worthwhile to note that for X4, as can be seen in Table 4.8, no WDL value is dominant and the dataset is relatively evenly distributed. In the case of X3, however, Loss is not an option, and Win makes up about three quarters of the dataset, which might contribute to the high accuracy of φ3, since this imbalance could benefit even a biased random predictor.
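The seeding setup described in this section can be sketched schematically: every layer of the target network starts as a copy of the source network's weights, as opposed to a fresh random initialization. The weight representation below (a list of per-layer vectors) is a stand-in for the real CNN weights, not the actual training code:

```python
import copy
import random

def init_random(n_layers, width):
    """Stand-in for a randomly initialized network phi_4: one weight
    vector per layer, drawn from a narrow Gaussian."""
    return [[random.gauss(0.0, 0.1) for _ in range(width)]
            for _ in range(n_layers)]

def seed_from_source(source_weights):
    """Full transfer, as in phi_3 -> phi_3->4: every target layer starts as
    a copy of the corresponding source layer. Dropping the final layer
    before retraining would mean replacing the last entry with a freshly
    initialized one instead."""
    return copy.deepcopy(source_weights)
```

The deep copy matters: retraining the seeded target must not mutate the stored source weights, since a fresh φ3 seed is compared across runs.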

Table 4.6: Final label prediction accuracy.

Task            Accuracy (%)

φ3 on D3        99.50 ± 0.09
φ4 on D4        98.8 ± 0.1
φ3→4 on D4      97.2 ± 0.5

Table 4.7: Final accuracy of φ3 on D3 by WDL values.

WDL value    Accuracy (%)    Proportion of dataset (%)
Loss         -                 0.00
Draw         98.99             23.50
Win          99.71             76.50
Total        99.50 ± 0.09     100.0

Table 4.8: Final accuracy of φ3→4 on D4 by WDL values.

WDL value    Accuracy (%)    Proportion of dataset (%)
Loss         97.40             23.4
Draw         95.97             33.4
Win          98.06             43.2
Total        97.2 ± 0.5       100.0

4.5 Expansion Learning

To evaluate the feasibility of expanding knowledge from one domain onto a related domain with limited to no label information, we measure how well a network φm, trained on data from Xm, predicts labels in Dm+1. We do expansion from D3 to D4, and from D4 to D5. As a control, we measure the effectiveness of a randomly initialized network at estimating labels in the target domain. As previously mentioned, this could prove useful for extending the current tablebases.
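Since expansion involves no retraining at all, the measurement reduces to evaluating the frozen source network directly on target-domain positions. A minimal sketch (where `model` is any callable mapping a state to a predicted label; the names are illustrative):

```python
def expansion_accuracy(model, states, labels):
    """Empirical P(model(x) = y) on the target domain: the fraction of
    target states whose label the (unretrained) source network predicts
    correctly."""
    correct = sum(1 for x, y in zip(states, labels) if model(x) == y)
    return correct / len(labels)
```

The same function serves for the control, by passing in the untrained network φrnd instead of the pretrained one.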

Accuracy of expanding from D3 to D4

Given an untrained network φrnd and a state-label pair (xi, yi) ∈ D4, where xi ∈ X4 and yi ∈ Y, the empirical probability of guessing the right label was measured to be

P(φrnd(xi) = yi) = 0.318

but for a network φ3 pretrained on data from X3 and a state-label pair (xi, yi) ∈ D4, the empirical probability of guessing the right label was measured as

P(φ3(xi) = yi) = 0.570

Accuracy of expanding from D4 to D5

Similarly, for (xi, yi) ∈ D5, where xi ∈ X5 and yi ∈ Y, the empirical probability of guessing the right label was measured as

P(φrnd(xi) = yi) = 0.301

But given a network φ4 pretrained on D4 and a state-label pair (xi, yi) ∈ D5, the empirical probability of guessing the right label was measured as

P(φ4(xi) = yi) = 0.500

In both cases, our expansion method outperformed a random predictor network at evaluating labels in the target domain, although not by a large margin. From the results we see that in expansion from D3 to D4, the label prediction accuracy of our expansion network φ3 is 0.570, which compared to the random estimator network φrnd at 0.318 is an increase of 79%. With expansion from D4 to D5, the prediction accuracy of our expansion network φ4 is 0.500, which compared to the random estimator network φrnd at 0.301 is an increase of 66%. This is a substantial increase, but still not high enough to be feasible for tablebase expansion beyond the m = 7 piece limit. We have to take into account, though, that in both cases we are extending between an even and an odd number of pieces. This might be a hindrance, since we are introducing a relatively high imbalance between the players, especially given the low piece count. It might prove more useful, and serve as a better proof of concept, to do expansion from even to even, or odd to odd, with a higher piece count.
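The quoted relative improvements follow directly from the measured accuracies; a quick check of the arithmetic:

```python
def relative_gain(accuracy, baseline):
    """Percentage increase of the expansion network's accuracy over the
    random-predictor baseline."""
    return (accuracy / baseline - 1.0) * 100.0

d3_to_d4 = relative_gain(0.570, 0.318)  # expansion from D3 to D4, ~79% increase
d4_to_d5 = relative_gain(0.500, 0.301)  # expansion from D4 to D5, ~66% increase
```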

Our results show that there is indeed some information transfer with this method, in spite of no retraining on the target domain, but our verdict is that more research is needed in order to give a reliable answer as to whether it is feasible to make predictions about domains with no label information by extending knowledge gained from related domains.

Chapter 5

Conclusion

Here we give our conclusions on the research question we set out to answer, "How transferable is knowledge in domains with non-local features?", by summarizing the results and giving our thoughts on possible future work.

5.1 Summary

The layers of CNNs can be thought of as feature extractors, where early layers react to general features, while later layers react to specific features more closely tied to the learning task. In order to speed up the learning process, as well as to gain better final accuracy, one typically does transfer learning from a related task by reusing parts of a source network. To maximize the transfer efficiency, one generally aims to transfer only the first n layers, which are attributed to the general part of the network, while leaving out the specific part.

Yosinski et al. [1] devised a method to evaluate transferability and to quantify the distribution of the learned knowledge within the neural network with respect to co-adaptation and specification of layers, as well as to determine at which n to cut. Their research was done in the domain of image classification, a domain characterized by local features. We wanted to study the effectiveness of this method in a domain characterized by non-local features, and chose chess endgame tablebases as our test case.

We observe a similar distribution of the learned knowledge within our networks as in the case of image classification, with co-adaptation and specification dominating performance loss at approximately the same parts of the network. In order to get precise results with low uncertainty, every measurement had to be averaged over many runs. This was time consuming, and as a compromise we did not allow our runs to converge, but rather capped them at a certain number of epochs. As a consequence, we did not get final converged transfer accuracy results for our transferability measurements. Interestingly, we seemed to be able to boost the final accuracy of a fully pretrained network by throwing out the last layer and retraining the network for 10 epochs, but we would like to do a fully converged study in order to check whether this is a real phenomenon or just an artifact of our experimental setup.
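The layer-reuse scheme summarized above can be sketched as follows. This is a schematic with toy per-layer weight lists, not the experimental code; `init_layer` stands in for whatever random initializer the fresh layers use:

```python
def make_transfer_net(source_layers, n, init_layer):
    """Yosinski-style transfer: keep the first n layers of the source
    network and give every remaining layer a fresh initialization.
    The resulting network is then retrained on the target task."""
    kept = [list(layer) for layer in source_layers[:n]]      # transferred, general part
    fresh = [init_layer() for _ in source_layers[n:]]        # reinitialized, specific part
    return kept + fresh
```

Sweeping n from 1 to the network depth, retraining, and recording the final accuracy at each cut is what produces the transferability curves discussed in Chapter 4.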
To evaluate the full effect of a transfer with respect to no transfer, we did a converged transfer of all layers from the m = 3 piece tablebase to the m = 4 piece tablebase. This resulted in negative transfer: performance was hindered by doing the transfer, with respect to no transfer at all. This might be an indication that we are transferring information that is too specific, so this negative transfer might have been avoided by not including the final layer in the transfer.

We also measured the effectiveness of expanding knowledge from a source domain onto a target domain with no label information. Keeping in mind the possible application of expanding the current tablebases, we measured the expandability within the current tablebase. Our results indicate that while knowledge is transferred, the transfer rate is not high enough to expand the current tablebases.

As an end result, and as an answer to our research question, we conclude that the method's evaluation of transferability in domains other than image classification is promising, and that the information distribution within the CNN is not much affected by the introduction of a different domain with moderate amounts of non-local features.

5.2 Future Work

The main limitation of our measurements was the lack of convergence in our study on transferability. There is a clear need to fully run these measurements in order to get more meaningful results and give a precise answer to our research question. This would tell us whether specification reaches further into the network when working with non-local features than in the case of local ones, as well as allow us to dive further into the observed effect of boosting a network by throwing out the last layer and retraining.

One potential drawback in our transfer learning study is the fact that we were in all cases doing knowledge transfer from a dataset with an odd number of pieces to one with an even number of pieces. This might introduce an imbalance between those datasets. Comparing datasets which both have an even, or both an odd, number of pieces would be optimal in the future, to check whether this has any effect on our measurements. This could help us evaluate whether the negative transfer is an actual effect, and give us a better evaluation of the feasibility of expanding the current tablebases.

As an interesting addition, a state representation could be designed that is more suitable for capturing the non-locality of squares in chess, which could prove highly beneficial and would be a valid direction to take in the future.

Last, we would like to mention a promising addition we would like to implement in future studies: dropout layers. These could lessen the amount of co-adaptation between layers by forcing them to work more independently, something we should be able to verify by utilizing the methods used in this research.

Bibliography

[1] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, "How transferable are features in deep neural networks?", in Advances in Neural Information Processing Systems, 2014, pp. 3320–3328.

[2] K. Fukushima, "Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position", Biological Cybernetics, vol. 36, no. 4, pp. 193–202, 1980.

[3] A. Dertat. (Nov. 2017). Applied deep learning - part 4: Convolutional neural networks. Accessed: May 5, 2019, [Online]. Available: https://towardsdatascience.com/applied-deep-learning-part-4-convolutional-neural-networks-584bc134c1e2.

[4] S. J. Pan and Q. Yang, "A survey on transfer learning", IEEE Trans. on Knowl. and Data Eng., vol. 22, no. 10, pp. 1345–1359, Oct. 2010, issn: 1041-4347. doi: 10.1109/TKDE.2009.191.

[5] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "Imagenet classification with deep convolutional neural networks", Commun. ACM, vol. 60, no. 6, pp. 84–90, May 2017, issn: 0001-0782. doi: 10.1145/3065386.

[6] B. A. Olshausen and D. J. Field, "Emergence of simple-cell receptive field properties by learning a sparse code for natural images", Nature, vol. 381, no. 6583, pp. 607–609, 1996, issn: 1476-4687. doi: 10.1038/381607a0.

[7] A. Hyvärinen, J. Hurri, and P. O. Hoyer, Natural Image Statistics: A Probabilistic Approach to Early Computational Vision, 1st edition. Springer Publishing Company, Incorporated, 2009, isbn: 9781848824904.

[8] Stanford Vision Lab, Stanford University. (2016). Imagenet dataset, [Online]. Available: http://www.image-net.org.

[9] R. Bellman, "On the application of dynamic programming to the determination of optimal play in chess and checkers", Proceedings of the National Academy of Sciences of the United States of America, vol. 53, no. 2, p. 244, 1965.

[10] R. de Man. (May 2019). Syzygy endgame tablebases webpage. Accessed: Nov 7, 2018, [Online]. Available: https://syzygy-tables.info.

[11] G.-C. Pascutto and G. Linscott. (Mar. 2019). [Online]. Available: https://lczero.org.

[12] D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, et al., "A general reinforcement learning algorithm that masters chess, shogi, and go through self-play", Science, vol. 362, no. 6419, pp. 1140–1144, 2018.

[13] R. Caruana, "Learning many related tasks at the same time with backpropagation", in Proceedings of the 7th International Conference on Neural Information Processing Systems, ser. NIPS'94, Denver, Colorado: MIT Press, 1994, pp. 657–664.

[14] S. Gupta, A. Agrawal, K. Gopalakrishnan, and P. Narayanan, "Deep learning with limited numerical precision", CoRR, vol. abs/1502.02551, 2015. arXiv: 1502.02551.

[15] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016, http://www.deeplearningbook.org.

[16] M. T. Rosenstein, Z. Marx, L. P. Kaelbling, and T. G. Dietterich, "To transfer or not to transfer", in NIPS 2005 Workshop on Transfer Learning, vol. 898, 2005, pp. 1–4.