DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2020

How does a general-purpose neural network with no domain knowledge operate as opposed to a domain-specific adapted chess engine?

ISHAQ ALI JAVID

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF INDUSTRIAL ENGINEERING AND MANAGEMENT

Abstract—This report examines how a general-purpose neural network (LC0) operates compared to a domain-specific adapted chess engine (Stockfish). Specifically, it examines the depth and the total number of simulations per move, and investigates how the selection of moves is conducted. The conclusion was that Stockfish searches and evaluates a significantly larger number of positions than LC0. Moreover, Stockfish analyses every possible move to a rather great depth, whereas LC0 selects moves sensibly and explores a few moves at a greater depth. Consequently, the argument can be made that a general-purpose neural network can conserve resources and calculation time, which could serve us towards sustainability. However, training the neural network is not very environmentally friendly. Therefore, stakeholders should seek collaboration and pursue a general-purpose approach that could solve problems in many fields.

Sammanfattning—This report is about how a general neural network (LC0) that plays chess works compared to the domain-specific adapted chess engine (Stockfish). Specifically, it examines the depth and the total number of simulations per move in order to understand how moves are selected and evaluated. The conclusion was that Stockfish searches and evaluates considerably more positions than LC0. Furthermore, Stockfish consumed more resources, roughly seven times more electricity. An argument was made that a general neural network has the potential to save resources and help us towards a sustainable society. However, training the neural networks costs a great deal of resources, and we should therefore try to collaborate to avoid unnecessary training runs and learn from others' mistakes. Finally, we must strive for a general neural network that can solve many problems in several fields.

I. INTRODUCTION

CHESS is a two-player strategy game that has been played and analyzed for over a thousand years. The game involves no hidden information, i.e. everything that happens in the game is visible to both players, and skill alone decides the game. In theory, the result of a game of chess under optimal play is a draw [1].

In most states of a chess game there are many possible moves, each move can be answered with numerous reasonable replies, and the process continues so that the move variations grow exponentially. Therefore, it is very challenging to always find the best moves, even for computers.
In the early 1990s, computers could not beat the top-level chess players, since it was unmanageable to calculate all the states and combinations efficiently. The IBM computer Deep Blue was the first engine to beat a reigning human world chess champion when it defeated Garry Kasparov in 1997 [2].

Computer chess has advanced greatly in the past decades and is now well beyond the best human players. Most engines use sophisticated search techniques, domain-specific adaptation, and handcrafted evaluation functions that have been refined by human experts over the decades [3]. Stockfish is an example: it has been one of the strongest chess engines of the past decade, has won the most Top Chess Engine Championship (TCEC) titles in recent years, and was considered the best chess engine [4].

Stockfish is a rule-based chess engine with a "brute force" strategy based on numerical calculations and deep searches of positions: it analyses every legal move in a given state of the game to a great depth. This strategy was described as an inefficient way of playing chess by Shannon [5]. Shannon suggested a more humanlike approach to searching. A decent human chess player, given a "quiet" position (not in check and with no piece about to be captured), considers only a few of the possible moves and searches to a depth of 1-4; grandmasters, however, search to a depth of 10-25 in forcing variations. Shannon's idea was that the machine should evaluate positions based on consistent interpretation and search sensibly, i.e. explore a few promising paths rather than relying on "brute force" calculation.

Stockfish has nevertheless managed to calculate and search a tremendous number of positions rather efficiently; it can search 60 million positions per second when competing at TCEC [4]. With modern computers, it is now possible to calculate many positions efficiently in the game of chess.
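To make the contrast between a full-width "brute force" search and Shannon's selective idea concrete, the toy sketch below compares the two styles on an invented single-agent game. The integer "positions", the move list, and the evaluation function are placeholders chosen only for illustration; this is not Stockfish's or LC0's actual search code.

```python
from typing import List, Tuple

# Toy "game": a position is an integer; each move adds a small offset.
MOVES: List[int] = [-2, -1, 0, 1, 2]

def evaluate(pos: int) -> float:
    """Toy static evaluation: prefer positions close to 10."""
    return -abs(10 - pos)

def full_width(pos: int, depth: int) -> Tuple[float, int]:
    """Expand every move at every node to a fixed depth ("brute force").
    Returns (best value found, number of leaf positions evaluated)."""
    if depth == 0:
        return evaluate(pos), 1
    best, visited = float("-inf"), 0
    for m in MOVES:
        value, leaves = full_width(pos + m, depth - 1)
        visited += leaves
        best = max(best, value)
    return best, visited

def selective(pos: int, depth: int, width: int = 2) -> Tuple[float, int]:
    """Expand only the `width` most promising moves at each node, ranked by a
    shallow evaluation, trading completeness for a much smaller tree."""
    if depth == 0:
        return evaluate(pos), 1
    ranked = sorted(MOVES, key=lambda m: evaluate(pos + m), reverse=True)[:width]
    best, visited = float("-inf"), 0
    for m in ranked:
        value, leaves = selective(pos + m, depth - 1, width)
        visited += leaves
        best = max(best, value)
    return best, visited

if __name__ == "__main__":
    for d in (4, 6, 8):
        _, n_full = full_width(0, d)
        _, n_sel = selective(0, d)
        print(f"depth {d}: full-width evaluates {n_full} leaves, selective evaluates {n_sel}")
```

With branching factor b and selective width k, the full-width search evaluates on the order of b^d leaf positions at depth d, while the selective search evaluates about k^d, which is why a selective searcher can afford to look much deeper along its chosen lines.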
However, other games such as Shogi and Go are far more complex than chess. In the game of Go especially, the number of possible positions that can arise from a given state grows significantly faster than in chess. With today's technology, it is not possible to calculate positions deep enough to achieve high-level play.

In 2016, Google's Deepmind developed a neural network named AlphaGo that could outperform an expert-level Go player. It was the first time that an engine could outperform an expert human Go player. Deepmind trained the neural network on the games of expert human players. Later, they challenged Lee Sedol, who had 18 international titles and was considered by many to be one of the best Go players of all time. AlphaGo defeated Lee Sedol 4-1, and the network later received the name AlphaGo Lee [3].

In the following year, Deepmind took a more general approach: they built a general neural network that masters the games of Go, chess, and Shogi through self-play, using the same algorithm and network architecture for all three games. This general-purpose neural network had no domain knowledge except the rules of the game. It was trained starting from random play and then learned and improved through self-play.

Shannon aspired to more general machines that could solve many problems through reasoning and sensibility. He explained that machines should be able to take inputs other than plain numbers, such as mathematical expressions, chess positions, and words, and follow a method developed by trial and error rather than a strict computing process. Besides, the machines should learn from their mistakes [5].

Shannon's aspired approach was implemented by Deepmind in some ways. Deepmind's approach was to combine general-purpose reinforcement learning with a general-purpose tree search algorithm. They built the neural network, and the network was named AlphaGo Zero (AlphaZero in chess). The general neural network outperformed all other engines in all three fields: AlphaGo Zero beat AlphaGo Lee with a score of 100-0, and AlphaZero outperformed Stockfish in 100 games with 28 wins, 0 losses, and 72 draws [6].

AlphaZero is owned by Deepmind and is not available to others; however, they published the pseudo-code [6]. Programmers then created a new chess engine based on AlphaZero called Leela Chess Zero (LC0), which is open source and available for experiments. LC0 has become one of the strongest chess engines today, and it defeated Stockfish in the latest TCEC to become the champion.

A general game-playing system has been a long-standing ambition in artificial intelligence. If a general-purpose neural network can play highly complex games such as Go and chess beyond the superhuman level, then perhaps we are close to fulfilling that ambition.

Most machine learning research is too focused on specific algorithms and is implemented in specific areas [7]. A general approach is desired that could be applied in different parts of life, including healthcare, manufacturing, education, financial modeling, policing, and marketing. Such an approach could also lead to a more evidence-based decision-making process [8].

II. AIM

A. What is the purpose of the study?

The study is divided into two parts, section x and section y.

Section x: In this part, the focus is on comparing the approaches and algorithms of Stockfish and LC0: specifically, how they evaluate each position and how the searching is conducted, since those are the most challenging aspects of a chess program. It is interesting to analyze how the general-purpose neural network manages these challenges compared to a rule-based engine.

Section y: In this part, we use the results from section x to evaluate the costs and benefits of the general-purpose neural network from an environmental viewpoint, i.e. from society's perspective, are the gains worth the expense of the training?

B. What is NOT the purpose of the study?

Which engine is better? The performance of the engines is heavily reliant on the hardware they are run on. Thus, for a comparison of performance, we compare their displays at TCEC, considering that optimal hardware and environment are applied [4].

parameter tuning. CLOP is an approach to local regression and is used to optimize the evaluation parameters. It has been argued that when the function to be optimized is smooth, this method outperforms all other tested algorithms [13].

2) How LC0 evaluates a position: LC0 and AlphaZero evaluate each state with the neural network. The network takes a board position with features as input and outputs a move-probability vector p and a value v (1). The vector p (2) gives the probability of the moves that an expert-level player would make given the state (during training the neural network also learns from the moves it analyzes, builds a probability distribution over the moves that lead to good results, and then treats these as "expert player moves") [11]. The value v is the estimated value of the position. If the expected outcome is z, then the approximate value of the position is given by (3).

s = the board position with features
v = the estimated value of the position
p = a vector of move probabilities
a = the next move, given the position

(p, v) = f(s)    (1)

p_a = p(a | s)    (2)
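As a concrete reading of (1) and (2), the minimal sketch below shows the shape of this evaluation interface: a position s is encoded into features, the network f returns a move-probability vector p and a value v, and p_a is the probability assigned to a particular move a. The FEN encoding stub, the uniform "network", and the example move list are invented placeholders, not LC0's real input planes, architecture, or weights.

```python
from typing import Dict, List, Tuple

def encode(board_fen: str) -> str:
    """Stand-in for turning a position into network input features
    (not LC0's real encoding)."""
    return board_fen

def f(s: str, legal_moves: List[str]) -> Tuple[Dict[str, float], float]:
    """(p, v) = f(s): a probability for every legal move and a scalar value
    estimate for the side to move. This stub returns a uniform policy and a
    neutral value; a real network would compute both from the encoded position."""
    p = {a: 1.0 / len(legal_moves) for a in legal_moves}  # p_a = p(a | s)
    v = 0.0                                               # estimate of the outcome z
    return p, v

if __name__ == "__main__":
    s = encode("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1")
    p, v = f(s, ["e2e4", "d2d4", "g1f3", "c2c4"])
    print(p, v, max(p, key=p.get))  # policy, value, and the stub policy's preferred move
```

In LC0 itself the network's outputs are not read off directly as in this stub; they guide the tree search that decides which moves to simulate further.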