DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2020

How does a general-purpose neural network with no domain knowledge operate as opposed to a domain-specific adapted engine?

ISHAQ ALI JAVID

KTH ROYAL INSTITUTE OF TECHNOLOGY SCHOOL OF INDUSTRIAL ENGINEERING AND MANAGEMENT

Abstract—This report examines how a general-purpose neural network chess engine (LC0) operates compared to a domain-specifically adapted engine (Stockfish). Specifically, it examines the depth and the total number of simulations per move, and investigates how moves are selected. The conclusion was that Stockfish searches and evaluates a significantly larger number of positions than LC0. Moreover, Stockfish analyses every possible move at a rather great depth, whereas LC0 selects moves sensibly and explores a few moves at a greater depth. Consequently, the argument can be made that a general-purpose neural network can conserve resources and computation time, which could serve us towards sustainability. However, training the neural network is not very environmentally friendly. Therefore, stakeholders should seek collaboration and pursue a general-purpose approach that could solve problems in many fields.

Sammanfattning—This report describes how a general-purpose neural network (LC0) that plays chess operates compared with the domain-specifically adapted chess engine (Stockfish). Specifically, it examines the depth and the total number of simulations per move in order to understand how moves are selected and evaluated. The conclusion was that Stockfish searches and evaluates significantly more positions than LC0. Furthermore, Stockfish consumed more resources, roughly seven times more electricity. An argument was made that a general-purpose neural network has the potential to conserve resources and help us towards a sustainable society. However, training the neural networks costs a great deal of resources, and we should therefore try to collaborate to avoid unnecessary training runs and to learn from others' mistakes. Finally, we must strive for a general-purpose neural network that can solve many problems in several fields.

I. INTRODUCTION

Chess is a two-player strategy game that has been played and analyzed for over a thousand years. The game involves no hidden information, i.e. everything that happens in the game is evident to both players, and pure skill alone decides the game. In theory, the result of a game of chess under optimal play is a draw [1]. In most states of a chess game there are various possible moves, each move can be answered with numerous reasonable moves, and as this process continues the move variations grow exponentially. Therefore, it is very challenging to always find the best moves, even for computers. In the early 1990s computers could not beat the top-level chess players, since it was unmanageable to calculate all the states and combinations efficiently. The IBM computer Deep Blue was the first engine to beat a reigning human world chess champion when it defeated Garry Kasparov in 1997 [2].

Computer chess has advanced greatly in the past decades and is now well beyond the best human players. Most engines use sophisticated search techniques, domain-specific adaptation, and handcrafted evaluation functions that have been refined by human experts over the decades [3]. Stockfish is an example that has been one of the strongest chess engines of the past decade. It has won the most Top Chess Engine Championship (TCEC) titles in recent years and was considered the best chess engine [4].

Stockfish is a rule-based chess engine with a "brute force" strategy based on numerical calculations and deep searches of the positions. Stockfish analyses every legal move in a state of the game at a great depth. This strategy was described as an inefficient way of playing chess by Shannon [5]. Shannon suggested a more humanlike approach to searching. A decent human chess player, given a "quiet" position (not in check and with no piece about to be captured), considers only a few of the possible moves and searches at a depth of 1-4; grandmasters, however, search at a depth of 10-25 in the forcing variations. Shannon's idea was that the machine should evaluate positions based on consistent interpretation and search sensibly, i.e. explore a few promising paths rather than perform "brute force" calculation.

Stockfish has managed to calculate and search a tremendous number of positions rather efficiently; it can search 60 million positions per second when competing at TCEC [4]. With modern computers it is now possible to calculate many positions efficiently in the game of chess. However, other games such as shogi and Go are far more complex than chess. Especially in the game of Go, the possible positions that can occur from a given state grow significantly faster than in chess. With today's technology it is not possible to calculate the positions deep enough to achieve high-level play.

In 2016 Google's Deepmind developed a neural network named AlphaGo that could outperform an expert-level player of Go. It was the first time that an engine could outperform an expert human Go player. Deepmind trained the neural network on the games of expert human players. Later, they challenged Lee Sedol, who held 18 international titles and was considered by many as one of the best Go players of all time. AlphaGo defeated Lee Sedol 4-1, and the network later received the name AlphaGo Lee [3].

In the following year, Deepmind took a more general approach: they built a general neural network that masters the games of Go, chess, and shogi by self-play, using the same algorithm and network architecture for all three games. They built a general-purpose neural network that had no domain knowledge except the rules of the game. The network was trained starting from random play, then learning and improving through self-play.

Shannon aspired to more general machines that could solve many problems through reasoning and sensibility. He explained that machines should be able to take inputs other than plain numbers, such as mathematical expressions, chess positions, and words, and follow a method developed by trial and error rather than a strict computing process. Besides, the machines should learn from their mistakes [5].

Shannon's aspired approach was in some ways implemented by Deepmind. Deepmind's approach was to have a general-purpose neural network and a general-purpose tree search algorithm. The network received the name AlphaGo Zero (AlphaZero in chess). The general neural network outperformed all other engines in all three fields: AlphaGo Zero outperformed AlphaGo Lee with a score of 100-0, and AlphaZero outperformed Stockfish in 100 games with a score of 28 wins, 0 losses, and 72 draws [6].

AlphaZero is owned by Deepmind and is not available to others; however, the pseudo-code has been published [6]. The chess community then created a new open-source chess engine based on AlphaZero, called Leela Chess Zero (LC0), that is available for experiments. LC0 has become one of the strongest chess engines; it defeated Stockfish in the latest TCEC to become the champion.

A general game-playing system has been a long-standing ambition in artificial intelligence. If a general-purpose neural network can play highly complex games such as Go and chess beyond the superhuman level, then perhaps we are close to fulfilling that ambition.

Most machine learning research is too focused on specific problems and is implemented in specific areas [7]. A general approach is desirable to implement in different parts of life, including healthcare, manufacturing, education, financial modeling, policing, and marketing. Such an approach could also lead to a more evidence-based decision-making process [8].

II. AIM

A. What is the purpose of the study?

The study is divided into two parts, section x and section y.

Section x: In this part, the focus is to compare the approaches and algorithms of Stockfish and LC0: specifically, how they evaluate each position and how the search is conducted, since those are the most challenging aspects of a chess program. It is interesting to analyze how the general-purpose neural network manages these challenges compared to a rule-based engine.

Section y: In this part, we use the results from section x to evaluate the costs and benefits of the general-purpose neural network from an environmental viewpoint, i.e. from society's perspective, are the gains worth the expense of the training?

B. What is NOT the purpose of the study?

Which engine is better? The performance of the engines is heavily reliant on the hardware they are run on. Thus, for a comparison of performance, we refer to their showings at TCEC, where optimal hardware and environment can be assumed [4].

III. SECTION X

A. Background

As described before, there are mainly two difficulties in computer chess: how to evaluate a position, and how to search. We start by clarifying the theory behind how the engines evaluate each state of the game and how the search process is conducted, beginning with how Stockfish evaluates a position.

1) How Stockfish evaluates a position: Stockfish has an evaluation function that takes in the following parameters [10]:
• Raw material: having more pieces is better
• Imbalance: non-symmetrical and imbalanced material creates more decisive results
• Pieces: pieces have characteristics and are evaluated differently based on the situation, e.g. knights are good on outposts and a bishop pair is better
• Mobility: the number of legal moves in the position
• King safety
• Threats: possible threats from the opponent
• Passed pawns
• Space: the number of squares your pieces cover and control

Stockfish evaluates a state based on these parameters, which have been given different weights. The weights are predefined and hardcoded in the program, and they differ across the three phases of the game: opening, middlegame, and endgame. For example, in the opening and middlegame the king's safety is weighted very highly, whereas in the endgame an active king is preferred. Stockfish implements CLOP (Confident Local Optimization) for parameter tuning. CLOP is an approach to local regression and is used to optimize the evaluation parameters. It has been argued that when the function to be optimized is smooth, this method outperforms all other tested algorithms [13]. A sketch of an evaluation in this style is given below.
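To make the idea concrete, the following sketch shows a phase-weighted, hand-crafted evaluation of the kind described above. The feature names and weights are hypothetical placeholders, not Stockfish's actual terms or values.

```python
# Minimal sketch of a phase-weighted, hand-crafted evaluation in the style
# described above. Feature names and weights are hypothetical placeholders;
# Stockfish's real terms and values differ.

# Hypothetical weights per game phase (opening, middlegame, endgame).
WEIGHTS = {
    "opening":    {"material": 1.0, "mobility": 0.3, "king_safety": 0.9},
    "middlegame": {"material": 1.0, "mobility": 0.4, "king_safety": 0.8},
    "endgame":    {"material": 1.0, "mobility": 0.5, "king_safety": 0.1},
}

def evaluate(features: dict, phase: str) -> float:
    """Score a position as a weighted sum of hand-crafted features.

    `features` maps feature names to raw scores from White's point of view
    (e.g. material balance in pawns, mobility difference).
    """
    weights = WEIGHTS[phase]
    return sum(weights[name] * features.get(name, 0.0) for name in weights)

# Example: a middlegame position where White is a pawn up but has an
# exposed king.
score = evaluate({"material": 1.0, "mobility": 2.0, "king_safety": -1.5},
                 "middlegame")
print(score)  # 1.0*1.0 + 0.4*2.0 + 0.8*(-1.5) -> 0.6
```

Note how the same features change meaning across phases: king safety nearly drops out of the endgame weights, mirroring the preference for an active king described above.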
2) How LC0 evaluates a position: LC0 and AlphaZero evaluate each state with the neural network. The network takes in a board position with features as input and outputs a vector p and a value v (1). The vector p (2) gives the probability of each move that an expert-level player would make given the state (in the training process the neural network also learns from the moves that it is analyzing, develops a probability distribution over the moves that lead to good results, and treats this as the "expert player move") [11]. The value v is the estimated value of the position: if the expected outcome is z, then the approximate value of the position is (3).

s = board position with features
v = the estimated value of the position
p = a vector of move probabilities
a = the next move, given the position

(p, v) = f(s)    (1)
p_a = p(a|s)    (2)
v ≈ E(z|s)    (3)

A minimal sketch of this interface is given below.
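In code, the interface of equations (1)-(3) might look like the following stub; the network itself is abstracted away, and the moves and probabilities are invented for illustration.

```python
# Sketch of the (p, v) = f(s) interface from equations (1)-(3).
# `f` stands in for the trained network; here it is a stub returning
# made-up numbers for illustration only.

def f(s):
    """Map a feature-encoded board position s to (p, v).

    p: dict move -> prior probability p(a|s), summing to 1.
    v: scalar in [-1, 1], the expected outcome E(z|s) for the side to move.
    """
    # A real engine would run the convolutional network here.
    p = {"c6": 0.45, "g6": 0.30, "a6": 0.15, "Qxb2": 0.10}
    v = 0.12
    return p, v

p, v = f(s="<encoded position>")
best_prior = max(p, key=p.get)  # the move an expert is most likely to play
print(best_prior, v)
```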

3) The search algorithm for Stockfish: Stockfish uses alpha-beta pruning as its search algorithm. Alpha-beta pruning is a significant enhancement of minimax search that decreases the number of nodes evaluated in the search tree: it prunes away leaves, or even whole subtrees, that cannot influence the final decision, since sometimes the guaranteed maximum and minimum of a subtree are already determined at some level, and the other side of the subtree therefore does not need to be evaluated. In each state of a chess game there are often many possible moves, and it would be very expensive to explore and examine all the variations. However, there are often only a few good moves, and the goal is to determine the good moves, explore them at a much greater depth, and prune away the obviously bad moves at an early stage.

Figure 1 illustrates pruning in a minimax tree search. It is white's turn (i.e. the moves are made in the order white, black, white, black). There are two possible first moves for white; both are explored to a depth of 3, and each final position is evaluated. Starting at the bottom left, the previous move was white's, and white must choose between -1 and 3; white chooses 3. In the next subtree, white can choose between 5 and some other value; however, on the move before that it is black's turn, and black, knowing that white then has a move worth at least 5, will not take that path. Therefore, the other side of that tree (light red) is pruned away [10]. A minimal implementation of this idea is sketched below.

Figure 1. Shows pruning in the minimax search algorithm.
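As a concrete illustration, here is a minimal alpha-beta search over a small hand-built tree. The tree values are generic examples, not the exact values of Figure 1, and the code is a sketch rather than Stockfish's implementation.

```python
import math

# Minimal alpha-beta pruning sketch. A "node" is either a number (a leaf
# evaluation) or a list of child nodes.

def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf):
    if not isinstance(node, list):          # leaf: return static evaluation
        return node
    if maximizing:                          # white: maximize the score
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:               # opponent will avoid this line:
                break                       # prune the remaining siblings
        return value
    else:                                   # black: minimize the score
        value = math.inf
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:
                break
        return value

# Root is a maximizing node; each child is a minimizing node over leaves.
# While searching the middle subtree, the first leaf (2) already proves the
# subtree is worth at most 2 < 3, so the leaves 4 and 6 are never evaluated.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(alphabeta(tree, True))                # 3
```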

4) The search algorithm for LC0: Leela uses a modified Monte Carlo tree search whose selection rule is called the Probabilistic Upper Confidence Tree (PUCT) score. Every game state is a node in the tree, and each node has an estimated value and a prioritized list of moves to consider, called the expert policy: that is, given a state, how likely it is that an expert-level player would make each possible move. The policy and the evaluation are optimized by the neural network during the training runs.

The search starts by picking the most promising move and doing rollouts, i.e. going from the start node and making moves until a terminal state is reached (a terminal state is a loss, a draw, or a win). In chess the search mostly does not reach terminal nodes, because of the complexity of the game; instead, it stops at some depth and evaluates the last node and the path. The selection of which paths to take is a balance between choosing lucrative moves, i.e. moves with high estimated values, and relatively unexplored moves, i.e. moves with low visit counts [11]. At the end of a game, the parameters are adjusted to minimize the difference between the predicted outcome v_t and the game outcome z, and to maximize the similarity of the policy vector p_t to the search probabilities π_t, where t indexes the turns of the game. Gradient descent is used to adjust the weights with a loss function that sums a mean-squared error and a cross-entropy loss (4), where c is a parameter controlling the level of L2 weight regularization (as specified in [6]):

l = (z − v)^2 − π^T log p + c||θ||^2    (4)

A simplified sketch of the selection rule is given below.
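The sketch below illustrates a PUCT-style selection rule of the kind described above; the exploration constant and the example statistics are assumed for illustration, and LC0's actual formula includes further refinements.

```python
import math

# Simplified PUCT-style move selection: balance the average value Q of a
# move against an exploration term U driven by the policy prior P and the
# visit counts N. The constant c_puct is an assumed tuning parameter.

def select_move(moves, c_puct=1.5):
    """moves: list of dicts with keys Q (mean value), P (prior), N (visits)."""
    total_visits = sum(m["N"] for m in moves)

    def puct(m):
        u = c_puct * m["P"] * math.sqrt(total_visits) / (1 + m["N"])
        return m["Q"] + u

    return max(moves, key=puct)

# Invented example statistics: a well-scoring, heavily visited move versus
# barely explored alternatives. An unexplored move can still be picked if
# its prior P is high enough relative to its visit count.
moves = [
    {"name": "Rxc8", "Q": 0.8,  "P": 0.60, "N": 274},
    {"name": "Re2",  "Q": 0.1,  "P": 0.05, "N": 5},
    {"name": "Ra7",  "Q": -0.5, "P": 0.01, "N": 1},
]
print(select_move(moves)["name"])  # Rxc8
```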

B. Method

The goal was to examine how each engine plays. Some of the methods that could be considered are as follows: how the engines operate under different time controls; furthermore, one could experiment with time odds, i.e. one engine gets more time than the other. Another option is to limit the engines' resources, such as a fixed depth or a fixed number of positions that they may analyze per state, and then analyze how their play changes. These methods are fine for comparing the engines; however, their focus would be more on performance than on approach.

For this experiment, the method chosen to compare the engines was to analyze, in general, how many positions on average each engine simulates in a state; furthermore, to find out how deeply each engine searches and how the paths are considered and explored. This way we get some understanding of how the complexity is handled, and we can see whether bad moves are identified early so that the focus goes to the better moves instead. This is mainly to understand how a general algorithm handles these complex situations compared to domain-specific algorithms.

Several things could affect the experiment in different ways, such as the choice of neural network for LC0, the computer used for the experiment, and other factors.

1) The choice of a neural network for LC0: Leela Chess Zero is an open-source chess engine. Thus, people inside and outside of the LC0 team have created thousands of neural networks. The three main aspects of the networks are the network structure, the training data, and the training procedure.

Network structure: A convolutional neural network that has been made in many different sizes. The size of the network is defined by the number of blocks and filters, which are referred to as the height and width of the network, and is often written as n x m, meaning n residual blocks and m filters.

Training data: LC0 uses reinforcement learning for its official runs, which means: train a network, use the network to generate games, and then use those games to generate better and stronger networks. The networks created by the LC0 team are called T-nets; these are the main training nets and are trained entirely on self-played games. The data is later made available to the LC0 community, who try different training procedures on it using a supervised learning approach and contribute runs, which often results in an even stronger network.

Training procedure: The optimal network depends on the hardware and the time controls. A smaller net size is recommended for faster time controls or in case of a weak GPU or no GPU. The larger networks are better on a GPU; they are slower but significantly stronger. The networks recommended by the author of LC0 for different scenarios are listed in appendix 1.1 [11].

2) The computer for the experiment and time controls: Both engines used the same computer. Details of the computer are in appendix 1.2, and the time controls for the games are in appendix 1.3.
3) Other: Opening books were used to avoid duplicate games. If two chess engines are given the same resources for every game, the engines will play the same game repeatedly; therefore, opening books are needed. The opening book used for the experiment can be accessed at [14].

Ponder was set to false: if ponder is set to false, the engines are not allowed to calculate when it is not their turn. This was to give each engine all calculating resources, one at a time, since they both ran on the same computer.

C. Result

1) Depth analysis: In this section, the depth and selective depth per move for both engines are compared and analyzed. Depth is the depth of a node in the tree; e.g. the sequence 1. e4 c5 2. Nf3 is a depth of 3 (three plies). For Stockfish, the "selective depth" in a position is how deeply it goes in certain forcing variations that contain tactics, exchanges, checks, etc. [10]. For LC0, the "selective depth" is the maximum depth of a rollout it has searched.

As shown in figure 3, Stockfish has a depth of somewhere between 30 and 50 in a state, while LC0 usually has a depth of 8 to 12. Stockfish goes deeper in most of the forcing variations. Besides, Stockfish reaches a significantly greater depth in the endgame, as shown in figure 4, since there are fewer possible moves and it can therefore calculate quite deep, whereas LC0's depth remains relatively constant throughout the game. It should be mentioned that LC0 on most occasions was better in the middlegame, having a +1 advantage going into the endgame against Stockfish; Stockfish managed to hold, and the games ended in draws. Stockfish normally has an advantage in the endgame. We analyze the endgame in figure 2 to get a better understanding.
Figure 2. Black to move (a winning endgame for black). Both engines are given 0.083 seconds each to analyze the next move.

In this position, there are 18 possible moves, and black has an easy win by exchanging the rooks and bringing the king in front of the pawn. Let us observe how the engines examine the next move. First, we start with LC0. The engine analyzes the position and outputs which moves were considered and how many visits each move had. All king moves, the pawn move, and the rook move c8-c4 have zero visits (they all lose the game). Rook to a7, b7, and f7 have all been visited once and then dropped (they are not good moves; they give white chances to draw). Rook to e2 and d2 have 5 and 23 visits, respectively (they are inaccuracies but could still win the game; the rook cuts off the king that would otherwise block the advance of the pawn). Finally, the rook exchange (the best move) has been visited 274 times and is presented as the best move, with an estimated winning chance of 99 percent.

Later, Stockfish starts to analyze the position. In 83 milliseconds Stockfish has produced 2.3 million simulations. Every possible move is examined at a depth over 30, and it has found a forced mate in 61 moves. In the endgame, when few pieces are on the board, Stockfish can use tablebases and calculate nearly every path in the tree to determine the best one. LC0, however, still uses the neural network and keeps the same approach throughout the game. The argument can be made that the domain-specific adaptations are a big advantage for Stockfish in the endgame, where precise calculation generally overtakes sensible judgment.

In the opening and middlegame there are often many possible moves, and the accurate and inaccurate moves are less clear. Consequently, Stockfish needs to calculate many potential moves, and it is more challenging to find decent paths. LC0, on the other hand, uses the neural network and picks out a few moves to explore and analyze. As shown in figure 5, Stockfish generally searches 1000 times more nodes than LC0: Stockfish searches between 10 and 80 million total nodes, while LC0 searches somewhere around 10 thousand to 100 thousand. This again is mainly due to Stockfish's domain-specific adaptations: the sophisticated search algorithm and the handcrafted evaluation functions allow Stockfish to calculate very efficiently compared to the general-purpose neural network and search algorithm.

Figure 3. Shows the depth from 10 rapid games between the engines.

Figure 4. Shows the depth from one rapid game in different states of the game.

Figure 5. Shows the total nodes (in millions) searched in each state of a game, compiled from 10 rapid games. Note: logarithmic scale.

Stockfish is a remarkably strong chess engine; as shown in figure 4, Stockfish sometimes searches at a depth of up to 90, which is truly amazing: it can find a forced mate 90 moves in advance, and this on a regular computer. What might be even more fascinating is that LC0, with an average depth of 10 and a general-purpose neural network, plays chess as well as Stockfish, if not better. To get a better understanding, we analyze a popular opening with both engines, as shown in figure 6. The position has been analyzed by humans and engines for decades; the most popular choices have been c6, a6, and g6. There are 43 possible moves to consider in the position; however, there are only a few good moves and many bad moves. We look at how these two engines proceed with the moves.

Figure 6. The Scandinavian opening, move no. 10 (black to move). Both engines are given 3 minutes each to analyze the next move.

To begin with Stockfish: after 4 million simulations, Stockfish has analyzed every move to a depth of 23 and considers g6 the best. After 100 million simulations, Stockfish has analyzed every legal move further, to a depth of 30, and still considers g6 the best. Finally, after 220 million simulations, the other legal moves are still at depth 31, but the move c6 has been analyzed to a depth of 34 and a selective depth of 51, and it is presented as the best move for the position.

Later, LC0 starts analyzing the position. After only 2 simulations, LC0 prefers the move c6; after 3-6 simulations, LC0 prefers g6. Then, from 6 up to 75 000 simulations, LC0 prefers c6 the most. In the end, c6 is presented as the best move, analyzed at a depth of 10. The more detailed outputs are illustrated in figure 7.

Figure 7. Shows the moves LC0 is considering (worst moves to the right, in red; best moves to the left, in green).
N (visits): the total number of playouts that have traversed the node.
Q: the average value of the path, with a range of -1 to 1.
P: the policy probability, i.e. how likely it is that an expert player makes the move.
U (UCB): the upper confidence bound, the part of the PUCT formula that encourages exploring moves that have not been searched yet; a value close to zero means the move has already been searched a great deal.

LC0 starts by analyzing every single move, but as shown in the figure it drops some moves after a few visits, since their initial value is very low and the probability of an expert playing them is also very low. However, a total of 71 000 visits went to the two best moves, since they are quite encouraging and have a high likelihood of an expert playing them; the visits to these two moves make up 95 percent of all visits. The three worst moves according to LC0 are all queen sacrifices with no real compensation. LC0 understands that very early: in the end it has seen only around 15 positions for each such move, at a depth of 2-4, to decide that they are bad moves. Stockfish, on the contrary, analyzed all these moves to a depth of 31 before it stopped searching them. Although Stockfish's search algorithm prunes away many unnecessary subtrees, the depth is still quite high, which takes a huge amount of calculation resources. A human player would understand that these are bad moves after two visits and would stop considering them; furthermore, the player would consider a few moves much more deeply. The argument could be made that LC0 plays more like a human, going back to Shannon's idea.

In conclusion, the main difference between the two engines is that Stockfish analyzes every possible move at a great depth and runs significantly more simulations, whereas LC0 analyses the moves at a significantly lower depth and chooses a few moves to analyze more deeply. Stockfish may be very efficient in calculation; however, LC0 is far more efficient and precise in playing the game. Stockfish uses relatively more resources and performs a greater amount of computation; this will be discussed in detail in section y. Deepmind's AlphaZero played against Stockfish with time odds, i.e. AlphaZero had 1/10 of the time that Stockfish had, and still managed to defeat Stockfish by a large margin [6]. This is perhaps a demonstration that the general-purpose neural network can potentially be more efficient than the domain-specific adaptation approach.

IV. SECTION Y

A. Introduction

In section x we concluded that the general-purpose neural network plays the game more efficiently, and that there is a potential to preserve resources with this approach. However, we have not yet discussed the resources that have been put into training these neural networks. Google invested a great amount of capital in Deepmind to create the general-purpose neural network. The total cost of the project is far beyond millions of dollars, considering the great number of TPUs required, the many skilled engineers, the cost of premises, and considerably more resources that cost a large amount of capital.

In this section, we analyze how the project influences our society. In particular, since the environment is a much-discussed subject in today's society, we try to evaluate the costs and benefits from an environmental viewpoint. To clarify, we calculate the costs and benefits in terms of the electricity that the neural networks consume. Deepmind's great resource consumption has other environmental impacts than electricity consumption alone; however, we assume that electricity consumption is the only cost of training, because we calculate the cost only for chess and only in terms of electricity.

B. Method

We present a break-even analysis from an environmental viewpoint to evaluate the costs and benefits of the general-purpose neural network.

Break-even analysis: A break-even analysis is a tool to determine at what point a project becomes beneficial. We calculate the value at which the net present value (NPV) is zero, i.e. when total cost meets total revenue, which is called the break-even point. There are two varieties of costs: fixed costs, which are expenses that stay the same no matter the sales volume, and variable costs, which are expenses that fluctuate up and down with the sales [15].

The break-even analysis is a great tool for understanding the maximum electricity cost we can put into training the neural network without harming sustainability. As mentioned, AlphaZero was trained on millions of games; we calculate how much electricity it cost to train the neural network and use this quantity as the cost of training one neural network in the break-even analysis. LC0 is more efficient at playing the game and consumes less energy than Stockfish, so we use the amount of energy preserved by using LC0 instead of Stockfish for a year as the revenue in the break-even analysis.

C. Result

First, we calculate how much electricity it costs to train one neural network, e.g. AlphaZero. Next, we calculate how much energy is preserved if all chess players used LC0 instead of Stockfish for a year. Finally, we calculate the number of neural networks that we can train at the break-even point.

1) The electricity cost of training AlphaZero:
- 5000 first-generation TPUs were used for self-play
- The training ran for 9 hours
→ a total of 9 x 5000 = 45 000 TPU-hours
- The power consumption of a first-generation TPU is estimated at 40 W
- The total energy consumed is therefore
→ 45 000 x 40 = 1 800 000 Wh

That is a total cost of 1 800 kWh for one net. For comparison, the average household electricity consumption for an apartment in Sweden is 2 500 kWh per year [16].

2) The total preserved energy for a year: Here, we calculate the electricity cost of playing the game. The computer that was used for the experiment has a CPU power consumption of 85 W and a GPU power consumption of 150 W. We use table 1 to calculate the electricity cost for each engine.

The electricity cost for Stockfish:

0.3 × 85 = 25.5 W    (4)

The electricity cost for LC0:

0.025 × 85 + 0.007 × 150 = 3.175 W    (5)

The amount of electric power preserved if LC0 is used instead of Stockfish:

25.5 − 3.175 = 22.325 W    (6)

Now that we have the preserved power, we need to calculate the total amount of electricity preserved in a year. For that, we must estimate how often Stockfish is used for analysis by chess players. We start by counting the number of active chess players. The total number of active online chess players per week is:
- Chess.com, 3.5 million
- Lichess.org, 1 million
- Chess24.com, 0.5 million
→ a total of 5 million active players per week.

We assume that each player uses Stockfish for 5 minutes per week to analyze their games. (This is a rough assumption! Many good players analyze their games with Stockfish for hours, while others never use Stockfish.) → The electricity preserved by one player in a week is:

22.325 × 5/60 ≈ 1.86 Wh    (7)

Thus, the total preserved electricity in one week is:

1.86 × 5 000 000 ≈ 9 000 000 Wh    (8)

Finally, the total preserved electricity for a year is:

9 000 000 × 52 ≈ 468 000 000 Wh    (9)

That is a total of 468 000 kWh. Now that we have the total cost and the total revenue, we calculate the break-even point.

3) The number of neural networks that we can train: The number of neural networks that we can train in a sustainable way is given by the break-even point, where total revenue minus total cost equals zero.

p = the total electricity preserved
c = the electricity cost of training one neural network
n = the number of neural networks

p − c × n = 0    (10)

Solving (10) for n gives 260; that is the number of neural networks that can be trained on the preserved electricity.

In summary, the electricity cost of training AlphaZero is estimated at 1 800 kWh. Stockfish draws about 22 W more power than LC0, which under our assumptions amounts to a total of 468 000 kWh in a year; that is the amount of energy that might be preserved if LC0 were used instead of Stockfish. The break-even point tells us that we can then train a total of 260 neural networks a year for the game of chess. The whole calculation chain is restated in the sketch below.
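As a check, the entire calculation chain of equations (4)-(10) can be restated in a few lines of code; the inputs are the report's own assumptions.

```python
# Restates the break-even arithmetic of equations (4)-(10), using the
# assumptions from the text.

CPU_W, GPU_W = 85, 150                          # power draw of the test PC

stockfish_w = 0.3 * CPU_W                       # (4): 25.5 W
lc0_w = 0.025 * CPU_W + 0.007 * GPU_W           # (5): 3.175 W
saved_w = stockfish_w - lc0_w                   # (6): 22.325 W

minutes_per_week = 5                            # rough assumption per player
players = 5_000_000                             # active online players/week

per_player_wh = saved_w * minutes_per_week / 60  # (7): ~1.86 Wh
weekly_wh = 9_000_000                            # (8): 1.86 Wh x 5M, rounded
yearly_kwh = weekly_wh * 52 / 1000               # (9): 468 000 kWh

training_kwh = 5000 * 9 * 40 / 1000             # 45 000 TPU-hours at 40 W
n = yearly_kwh / training_kwh                   # (10): p - c*n = 0 => n = p/c
print(f"{per_player_wh:.2f} Wh, {yearly_kwh:.0f} kWh, {n:.0f} nets")
# -> 1.86 Wh, 468000 kWh, 260 nets
```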
D. Discussion

As mentioned, the general-purpose neural network may be more efficient and have a lower computation expense when playing the game, but we must also consider the number of neural networks that are being trained. Deepmind trained many different neural networks, each of which played millions of games. LC0 is open-source, and since AlphaZero there have been thousands of different neural networks, each trained by different contributors. Consequently, the total resources consumed could be tremendous, which may not be environmentally friendly. As the break-even analysis shows, to remain environmentally friendly we can only train a few hundred nets per year. Thus, we must choose and develop nets wisely and efficiently.

Although AI has the potential to solve sustainability challenges, we may not be working with AI sustainably. The paper Green AI states that the computation required for deep learning research has been increasing exceptionally fast, and that these computations have a large carbon footprint [17]. According to a paper by Emma Strubell, a machine translation model that uses neural architecture search (NAS) ran for almost a billion steps during training and development; the training was responsible for an estimated 626,000 lbs. of CO2 emissions [18].

In today's AI research the focus is on the accuracy and results of neural networks, but we must consider the amount of resources that goes into training these networks and the effect on the environment. We should make AI greener and more inclusive. We should share the documentation of training methods and make it available so that others can learn from it and improve it, and so avoid wasting resources on repeating the same method. Furthermore, we should seek a more general-purpose neural network that can be used in several fields. Stakeholders should attempt to cooperate and take a bigger perspective by researching with a general-purpose approach rather than a domain-specific adaptation.

As observed earlier, a general-purpose neural network can play a complex game at a very high level. In chess and Go there is no hidden information, which makes them easier for a neural network to learn; real-world problems involve hidden information and incomplete data and are therefore far more challenging. Still, we might have come a step closer. OpenAI trained neural networks to play the game of Dota 2 [19]. The game involves hidden information and is heavily dependent on teamwork and cooperation strategies. The neural network bots played as a team and managed to defeat a team of human world champions (5 bots vs. 5 humans).

If we have managed to achieve excellent results with neural networks in such complex games, then perhaps we are closer to tackling complex real-world problems in areas such as healthcare, education, finance, and manufacturing. Something that future studies could focus on is how these general-purpose neural networks can help us towards a more sustainable society.

V. CONCLUSION

Comparing Stockfish and LC0 shows that a general-purpose neural network can be very efficient in a complex game such as chess, and perhaps we are closer to the long-standing desire of machine learning to have a system that can learn to master any game. Furthermore, we can learn from these networks and try to implement them in real-world problems.

VI. AUTHOR

Javid Ishaq Ali, a student at KTH Royal Institute of Technology in Stockholm, Sweden, majoring in industrial engineering and management.

REFERENCES

[1] W. Steinitz, "The Modern Chess Instructor" (Edition Olms, 1990).
[2] M. Campbell, A. J. Hoane Jr., F. Hsu, Artif. Intell. 134, 57-83 (2002).
[3] D. Silver et al., "A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play," 2018. [Online]. Available at: https://science.sciencemag.org/content/sci/362/6419/1140.full.pdf [Accessed 30 July 2020].
[4] G. Haworth and N. Hernandez, "TCEC Season 14 - the 14th Top Chess Engine Championship," 2019. [Online]. Available at: http://www.chessdom.com/TCEC-Season-14-the-14th-Top-Chess-Engine-Championship [Accessed 30 July 2020].
[5] C. E. Shannon, London Edinburgh Dublin Philos. Mag. J. Sci. 41, 256-275 (1950).
[6] D. Silver et al., "Supplementary material for a general reinforcement learning algorithm that masters chess, shogi, and Go through self-play," 2018. [Online]. Available at: https://science.sciencemag.org/content/sci/suppl [Accessed 30 July 2020].
[7] K. L. Wagstaff, "Machine learning that matters." [Online]. Available at: https://arxiv.org/ftp/arxiv/papers/1206/1206.4656.pdf [Accessed 30 July 2020].
[8] M. I. Jordan and T. M. Mitchell, "Machine learning: Trends, perspectives, and prospects," 2015. [Online]. Available at: https://science.sciencemag.org/content/349/6245/255 [Accessed 30 July 2020].
[9] Stockfish: Open source chess engine, 2020. [Online]. Available at: https://Stockfishchess.org/ [Accessed 30 July 2020].
[10] Stockfish: Source code, 2020. [Online]. Available at: https://github.com/official-Stockfish/Stockfish [Accessed 30 July 2020].
[11] LC0: Open source chess engine, 2020. [Online]. Available at: https://lczero.org/play/quickstart/ [Accessed 30 July 2020].
[12] LC0: Source code, 2020. [Online]. Available at: https://github.com/LeelaChessZero/LC0 [Accessed 30 July 2020].
[13] R. Coulom, "CLOP: Confident Local Optimization for Noisy Black-Box Parameter Tuning," 2011. [Online]. Available at: https://link.springer.com/chapter/10.1007/978-3-642-31866-5_13 [Accessed 30 July 2020].
[14] The Cerebellum opening book. [Online]. Available at: https://zipproth.de [Accessed 30 July 2020].
[15] J. Berk, P. DeMarzo, "Corporate Finance," 291-292 (Fourth edition, 2017).
[16] Energirådgivaren. [Online]. Available at: https://www.energiradgivaren.se/2011/09/elforbrukning-i-en-genomsnittlig-villa-respektive-lagenhet/ [Accessed 30 July 2020].
[17] R. Schwartz et al., "Green AI," 2019. [Online]. Available at: https://arxiv.org/pdf/1907.10597.pdf [Accessed 30 July 2020].
[18] E. Strubell et al., "Energy and Policy Considerations for Deep Learning in NLP," 2019. [Online]. Available at: https://arxiv.org/pdf/1906.02243.pdf [Accessed 30 July 2020].
[19] OpenAI, "OpenAI Five defeats Dota 2 world champions," 2019. [Online]. Available at: https://openai.com/blog/openai-five-defeats-dota-2-world-champions/ [Accessed 30 July 2020].

APPENDIX 1

1.1 Recommended LC0 network sizes:
30 blocks: recommended for multi-GPU (RTX), long analysis, or when speed is not a major factor
24 blocks: recommended for time controls longer than 1 minute per move with an RTX card
20 blocks: recommended for running on non-RTX cards, or time controls on the order of seconds (with RTX)
10 blocks: recommended for running on CPU

1.2 The computer that the engines ran on for the experiment:
GPU type: AMD Radeon (TM) R9 380 Series

Chess GUI: Arena Chess GUI
LC0 version: LC0-v0.25.1-windows-GPU-OpenCL
Network: different networks for different time controls; for classical games, the main run of the T60 net with 24 blocks and 320 filters
Stockfish version: Stockfish 20011801 x64

1.3 Time controls used for the games:
Classical: 60+45 (60 minutes per game plus a 45-second increment per move)
Rapid: 15+10 (15 minutes per game plus a 10-second increment per move)
Blitz: 3+2 (3 minutes per game plus a 2-second increment per move)
Bullet: 1+0 (1 minute per game with no increment)

TRITA-EECS-EX-2020:625

www.kth.se