
Article

Interactive Robot for Playing Russian Checkers

Ekaterina E. Kopets 1, Artur I. Karimov 1,*, Georgii Y. Kolev 1, Lorenzo Scalera 2 and Denis N. Butusov 1

1 Youth Research Institute, ETU LETI, 197376 St. Petersburg, Russia; [email protected] (E.E.K.); [email protected] (G.Y.K.); [email protected] (D.N.B.)
2 Polytechnic Department of Engineering and Architecture, University of Udine, 33100 Udine, Italy; [email protected]
* Correspondence: [email protected]; Tel.: +7-950-047-2974

Received: 3 November 2020; Accepted: 8 December 2020; Published: 9 December 2020

Abstract: Human–robot interaction in board games is a rapidly developing field of robotics. This paper presents a robot capable of playing Russian checkers designed for entertainment, training, and research purposes. Its control program is based on a novel unsupervised self-learning algorithm inspired by AlphaZero and represents the first successful attempt at using this approach in the checkers game. The main engineering challenge in mechanics is to develop a board state acquisition system insensitive to lighting conditions, which is achieved by rejecting computer vision and utilizing magnetic sensors instead. An original robot face is designed to endow the robot with the ability to express its attributed emotional state. Testing the robot at open-air multiday exhibitions shows the robustness of the design to difficult exploitation conditions and the visitors' high interest in the robot.

Keywords: draughts; checkers; robotics; AlphaZero; human–robot interaction

1. Introduction

Designing robots that play games with human opponents is a challenge. One of the earliest examples of playing robots dates back to the 18th century, when Wolfgang de Kempelen presented the first chess-playing robot, called "The Turk" or the "Automaton Chess Player" [1,2]. This automaton consisted of a mechanical puppet costumed as a Turkish sorcerer seated at a chessboard, awaiting the challenges of living opponents. It was in fact not autonomous at all, since the mechanical arm was operated by a human hiding inside the machine. The first successful automaton, "El Ajedrecista", was introduced in 1912 by Leonardo Torres y Quevedo [3]. El Ajedrecista was a true automaton built to play chess without the need for human guidance. Since then, no other examples of playing robots were recorded until the digital revolution and the introduction of computers.

Nowadays, the design of playing robots is strictly related to the use of advanced approaches in automated control, artificial intelligence, and human–machine interface design. Games that involve active interaction with physical media, such as table tennis or pool, require developing sophisticated automatic control algorithms rather than intelligent decision-making programs. Examples of such robots include a table tennis robot based on an industrial KUKA robot by J. Tebbe et al. [4], a self-learning table tennis robot by M. Matsushima et al. [5], an autonomous air hockey robot by M. Lakin [6], and a dual-armed pool robot by T. Nierhoff et al. [7]. Other games such as chess and Go do not necessarily need a physical implementation of a robotic player, but designing the decision-making algorithms for such games is an extremely difficult task. Examples of such game machines are the chess computer Deep Blue [8], capable of winning against the world chess champion Garry Kasparov, and AlphaGo [9], Google's AI program that beat the Chinese Go master Ke Jie. Nevertheless, it is proposed that the physical implementation of a robotic player in such games improves the user experience and the game attractiveness to human players [10]. Moreover, manipulating non-instrumented, arbitrary-shaped pieces on a board in changing environmental conditions is a challenging task in itself.

In [11], the autonomous chess-playing system Gambit, capable of playing physical board games against human opponents in non-idealized environments, was presented. In [12], the authors presented a four degree-of-freedom chess-playing robotic manipulator based on a computer vision chess recognition system, whereas in [13,14], a Baxter robot was used to play chess in unconstrained environments by dynamically responding to changes. Other recent examples are the manipulator for Chinese chess presented in [15], the didactic robotic arm for playing chess using an artificial vision system described in [16], and the Chinese chess robotic system for the elderly using convolutional neural networks shown in [17]. Moreover, in [18], the implementation of collaborative robots for a chess-playing system that was designed to play against a human player or another robot was experimentally evaluated. In order to make the board game design easier and fully independent from lighting conditions, some authors propose using an array of Hall sensors for detecting pieces on the board [19]. It is also important that robots with advanced mimics can transmit non-verbal information to human players, which can also be used in gameplay [20,21].

In our work, we pursued the goal of implementing a small-sized Russian checkers robot that would have a simple design and be capable of playing checkers on a flexible level. We created a robotic setup consisting of the Dobot 3-DoF manipulator with a custom magnetic-sensor checkers board, control electronics, and a laptop. We developed an original game algorithm using basic features of the renowned AlphaZero, which proved to be efficient in the game of Go and then was extended to chess and other games [22]. The game difficulty level is easily selected by choosing the generation of the neural network controlling the move selection process in our algorithm. To our knowledge, this is the first successful attempt to utilize the AlphaZero approach for the Russian checkers game. We obtained an unsupervised self-learning program that was proven to improve its playing capabilities after a number of iterations in our experiment.

The main engineering finding of our research is utilizing Hall sensors instead of computer vision (CV). In table game applications, computer vision is a relatively common and sometimes overestimated solution suffering from sensitivity to light conditions and requiring proper calibration. Our solution utilizes the properties of the game, which allow distinguishing the game state after a human player moves without any special checker identification, using only binary information on the presence/absence of a checker on a field. Therefore, more reliable and robust magnetic sensors were used instead of computer vision. As the main result, we created a highly transportable, friendly-designed robot, highly versatile in its playing skills. The main challenges in the reported work were creating a novel playing algorithm and reliable hardware and combining them in one machine, and the aim of our project is conducting a number of studies in human–machine interaction, especially in using robots for education.
While the popularity of such classical games as chess and checkers is threatened by video games, using robotized assistants can raise interest in table games among young players; taking advantage of AI and the robotic environment, such assistants might be very useful in teaching and training classical table games. Briefly, the main contributions of our study are as follows:

• A novel checkers board with Hall sensors was designed, which allowed us to avoid CV implementation.
• An innovative hardware setup was designed, including a control board and an animated face.
• A novel checkers game program was designed based on a deep learning approach inspired by the AlphaZero algorithm.

The rest of the paper is organized as follows: Section 2 describes the robot setup and the experimental testbed. Section 3 presents the AlphaZero-based algorithm allowing the robot to play Russian checkers. Finally, a discussion on our project in human–machine interaction (HMI) and conclusions are given in Section 4.

2. Robotic Setup

The checkers robot described in the paper has a simple design and contains the following components:

• Actuator (robotic arm);
• Control electronics;
• Checkers field module;
• Facial module (robot's head);
• Laptop.

As the main mechanical part, the 3-DoF Dobot robotic arm with a repeatability of 0.2 mm was used, equipped with a pneumatic gripper for capturing and moving checkers. It is powered and controlled with an Arduino-based electronic unit. Two PCBs are used in the design: the control electronics module PCB and the checkers field PCB. Due to splitting the electronics into two independent physical parts, the checkers board can be replaced in the future without changing the control electronics module. The working area of the robotic arm is limited due to its compact size, so a 15 cm × 15 cm custom checkers board is used. Equipped with magnetic checker sensors, it is placed in front of the arm. The robot's head is placed on a neck mounted on the robotic arm basement and has a face with 8 × 8 LED matrices for eyes and eyebrows for expressing a wide range of emotions: surprise, happiness, anger, etc., while remaining schematic enough to avoid the "uncanny valley" effect [23]. A laptop with a Core i5 CPU and the Win10 operating system running the checkers program is connected to the robot via a UART COM-port emulator over the USB protocol. Figure 1 shows the 3D model of the main parts of the robotic setup. The custom body details are made of Dibond composite panels. The body of the checkers robot consists of the following parts: a robotic arm and its platform, a rack for a control PCB, a rack for a checkers field PCB, a checkers field cover, a compartment for captured checkers, and a robot head with a face. The checkers and some auxiliary mechanical parts are printed from PLA plastic using a 3D printer.


Figure 1. The 3D model of the robot, with the 15 cm × 15 cm custom checkers field, robotic arm, and the head with a face: (a) 3D model of the assembled robot; (b) exploded assemblies.

Figure A1 (see Appendix A) presents the control electronics schematics for the checkers robot. The electronic unit uses three A4988 drivers for Nema 17 stepper motors and has a driver and a slot for an additional stepper motor. Additionally, the board controls two SG90 servo motors for the eyebrows and two MG996R servo motors for the robot neck. Two additional optional SG90 servo motors can be used for mounting a fingered gripper (not used in the current project). The Dobot original angle position sensors are not used. Instead, limit switches based on TLE4945L bipolar Hall sensors and neodymium magnets are used for more precise initial positioning. For object capturing, an air pump connected through an L293D driver to the microcontroller is used. The control board has three press buttons: reset, power, and settings. Two 8 × 8 LED matrices MAX7219 are connected to the control board for the robot eye imitation.

The board state is obtained via an array of Hall sensors below the black cells (see Figure A2). The human player confirms his/her move by pressing the button, which makes the microcontroller read the state of all Hall sensors and transmit a data packet to the laptop. To ensure the proper power supply of all 32 Hall sensors, a power unit should provide at least 15 W.

The trajectory of the arm is calculated by the robot controller, and then the robot moves the checker to the desired cell. Since there are no strong limitations on the speed of computation and the precision of the robot moves, the inverse kinematics problem can be solved directly on the controller. The solution utilizes the inverse Jacobian approach, briefly outlined as follows. The kinematic chain is depicted in Figure 2a, where the dependent angle θ3 = 270° − θ1 keeps the platform where the end effector (pump nozzle) is mounted strictly horizontal. The angle θ0 is not shown in the figure since it stands for the rotation of the vertical axis of the robot basement.


Figure 2. Checkers robotic setup: (a) kinematic chain, (b) manufactured setup view.

Let $s = (x, y, z)^\top$ be the end effector position, and let $\theta = (\theta_0, \theta_1, \theta_2)^\top$ be the angular position vector. Then, the Jacobian matrix is computed as

$$J(\theta)_{ij} = \frac{\partial s_i}{\partial \theta_j}.$$

Then, the dynamics for the forward kinematics is calculated as

$$\dot{s} = J(\theta)\,\dot{\theta}.$$

Thus, the inverse kinematics dynamics can be calculated as

$$\Delta\theta_n = J^{-1}(s^* - s_n),$$

where $\Delta\theta_n$ is the increment of the current angle, $\theta_{n+1} = \theta_n + \Delta\theta_n$, during the robot movement, and $s^*$ is the desired position. The path is represented as the fastest trajectory for achieving the desired position $s^*$. It is not necessary to provide any special path shape due to the absence of obstacles and singularities in the working field of the robot.
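To make the outlined scheme concrete, the following minimal sketch (our illustration, not the robot's actual controller firmware) iterates $\Delta\theta_n = J^{-1}(s^* - s_n)$ for a planar two-link approximation of the arm; the link lengths, start pose, and tolerance are assumed values.

```python
import numpy as np

L1, L2 = 0.135, 0.147  # assumed link lengths in meters (not from the paper)

def forward(theta):
    """Forward kinematics s = f(theta) of a planar two-link arm."""
    t1, t2 = theta
    return np.array([L1 * np.cos(t1) + L2 * np.cos(t1 + t2),
                     L1 * np.sin(t1) + L2 * np.sin(t1 + t2)])

def jacobian(theta):
    """Analytic Jacobian J(theta)_ij = d s_i / d theta_j."""
    t1, t2 = theta
    return np.array([[-L1 * np.sin(t1) - L2 * np.sin(t1 + t2), -L2 * np.sin(t1 + t2)],
                     [ L1 * np.cos(t1) + L2 * np.cos(t1 + t2),  L2 * np.cos(t1 + t2)]])

def solve_ik(s_star, theta0, tol=1e-4, max_iter=100):
    """Iterate delta_theta_n = J^{-1}(s* - s_n) until the end effector reaches s*.
    A real controller would add damping near singular poses."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        err = s_star - forward(theta)
        if np.linalg.norm(err) < tol:
            break
        theta = theta + np.linalg.solve(jacobian(theta), err)
    return theta

# example: move the nozzle to a point 20 cm out, 5 cm up
print(solve_ik(np.array([0.20, 0.05]), theta0=[0.5, 0.5]))
```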

If, during its move, the robot captures an opponent's checkers, then, at the end of its move, it successively picks up all the captured checkers and throws them into a special compartment representing a rectangular deepening in the robot basement. Figure 2b shows the robotic setup.

The program provides protection from improper moves. If the player performs a wrong move, the program interface warns with a sound and a message. The player then needs to change the move and press the red button again.

The variety of the robot's facial expressions is depicted in Figure 3. The basic emotions programmed for the robot, following the famous concept of Ekman and Friesen, are: neutral, happy, sad, angry, surprise, fear, and disgust. The main facial features imitating emotional states follow the FLOBI robot features [24].

Figure 3. Facial expressions of the checkers robot rendering its emotional state.

The eyebrow servo motor positions and facial LED matrix images are controlled by the laptop, which also plays back voice remarks attributed to the robot corresponding to its emotional state. For instance, when the robot captures an opponent's checker, it imitates the emotion of "happiness" on its face and plays a voice commentary of the corresponding emotion; when a player breaks the rules, the robot simulates the emotion of anger on its face and plays a voice comment that the rules are violated. The control program for the robot running on the laptop is written based on a custom implementation of the AlphaZero program.

3. Algorithm for Playing Russian Checkers

This section presents the AlphaZero-based algorithm allowing the robot to play Russian checkers. This algorithm does not need any input data for training except the game rules, which is a special feature of AlphaZero-based programs, and no database of the games, possible tricks, or tactics existing for the game is required [25]. Therefore, we decided to base our algorithm on the ideas of AlphaZero. While a number of open-source checkers programs exist, these approaches are mostly obsolete, not as versatile, and less promising for a possible upgrade in the future.


3.1. Russian Checkers Rules

Russian checkers was chosen because this variant of the game is the most popular in Russia, and visitors at shows will be able to start the game without additional instructions. It is also important that Russian checkers is not a solved game, unlike English checkers [26]. Russian checkers is a board game for two players. The general rules of Russian checkers are similar to other versions of this game, such as English checkers, except for a few points. For clarity, we give a brief list of the rules used during the algorithm training.

1. Simple checkers ("men") move diagonally forward by one cell.
2. Inverted checkers ("kings") move diagonally to any free field both forward and backward.
3. Capture is mandatory. Captured checkers are removed only after the end of the turn.
4. A simple checker located next to an opponent's checker, behind which there is a free field, transfers over this checker to this free field. If it is possible to continue capturing other checkers of the opponent, this capture continues until the capturing checker reaches a position from which capture is impossible. A simple checker performs the capture both forwards and backwards.
5. The king captures diagonally, both forward and backward, and stops on any available field after the checker is captured. Similarly, a king can strike multiple pieces of a rival and must capture as long as possible.
6. The rule of "Turkish capture" is applied: during one move, the opponent's checker can be captured only once. If, during the capture of several checkers, the simple checker or king moves to a cell from which it attacks an already captured checker, the move stops.
7. In multiple capture variants, such as capturing one checker or two, the player selects the move corresponding to his/her own decision.
8. The game may have a draw state. A draw occurs when both players have kings and no checkers have been attacked during the last 15 moves.
9. White moves first.
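As a minimal illustration of rule 4 (single jumps of a simple checker, both forwards and backwards), consider the following sketch. It is ours, not the robot's engine; multi-jump continuation and the Turkish-capture rule are not modeled, and the cell coding anticipates the one introduced in Section 3.3.

```python
import numpy as np

def man_captures(board, r, c):
    """Single capture jumps available to a simple checker ('man') at (r, c),
    checked in all four diagonal directions per rule 4.
    Cell coding (Section 3.3): 0 empty, 1 black man, 2 white man,
    3 black king, 4 white king. Returns the landing squares."""
    me = board[r][c]
    enemy = {1, 3} if me in (2, 4) else {2, 4}
    jumps = []
    for dr, dc in ((1, 1), (1, -1), (-1, 1), (-1, -1)):
        lr, lc = r + 2 * dr, c + 2 * dc
        if 0 <= lr < 8 and 0 <= lc < 8 and board[r + dr][c + dc] in enemy and board[lr][lc] == 0:
            jumps.append((lr, lc))
    return jumps

board = np.zeros((8, 8), dtype=int)
board[2, 2], board[3, 3] = 2, 1   # white man at c3, black man at d4
print(man_captures(board, 2, 2))  # [(4, 4)] -- jump landing on e5
```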

The checkers field has a size of 8 by 8 cells (64 cells). The white cell in the corner is on the right hand of each player; the cell coordinates are represented by a combination of a letter and a number.
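A small helper illustrating this coordinate convention follows; the 0–31 numbering of the black cells is hypothetical, since the paper does not specify the ordering used for the 32 Hall sensors.

```python
def cell_to_index(cell: str):
    """Convert an algebraic coordinate like 'c3' into (row, col) on the 8 x 8 board
    and into a 0..31 index over the black cells (hypothetical numbering)."""
    col = ord(cell[0].lower()) - ord('a')   # 'a'..'h' -> 0..7
    row = int(cell[1]) - 1                  # '1'..'8' -> 0..7
    if (row + col) % 2 != 0:                # with a1 dark, black cells have even row+col
        raise ValueError(f"{cell} is a white cell; play happens on black cells only")
    return row, col, (row * 8 + col) // 2

print(cell_to_index("c3"))  # -> (2, 2, 9)
```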

3.2. Monte-Carlo Tree Search Algorithm

The AlphaZero algorithm contains two basic elements: the Monte-Carlo Tree Search algorithm (MCTS) and the neural network. The MCTS algorithm is a randomized decision tree search. Each node of the decision tree is a field state, and each edge is an action necessary to reach the corresponding state. In order to perform decision making, the algorithm simulates the game, during which the player moves from the root of the tree to a leaf. The neural network estimates the probability v of the current player winning and the probability P of choosing the current move, giving information regarding which of the currently available moves is the most efficient. The MCTS algorithm uses these predictions (P and v) to decide where to perform a more elaborate tree search. Each tree edge contains four values: the probability P, the number N of the current edge visits, the total action value W, and the expected reward for the current move Q, which is calculated as

$$Q(s, a) = \frac{W(s, a)}{N(s, a)}. \qquad (1)$$

The structure of the decision tree is shown in Figure 4. The next move is chosen as the action $a_t$ maximizing the sum Q + U. The nonlinear parameter U is computed as

$$U(s, a) = c_{puct}\, P(s, a)\, \frac{\sqrt{\sum_b N(s, b)}}{1 + N(s, a)}, \qquad (2)$$

where $c_{puct}$ controls the level of tree exploration, and $\sum_b N(s, b)$ is the sum of visits of all parents of the current action. The term U is added in order to prevent selecting the "best" moves in early simulations and to encourage exploration of the tree. However, over time, the parameter Q, which characterizes the overall quality of the move based on multiple wins and defeats, starts to play the leading role, while the role of the parameter U decreases. This reduces the probability of choosing a potentially wrong action. As the result, MCTS computes a search probability vector π which assigns a probability of playing each move proportional to the number of nodes visited:

$$\pi \propto N(s, a)^{1/\tau},$$

where τ is the "temperature" parameter controlling the degree of exploration. When τ is 1, the algorithm selects moves randomly, taking into account the number of wins. When τ → 0, the move with the largest number of wins will be selected. Thus, τ = 1 allows exploration, and τ → 0 does not. The activity diagram of MCTS is shown in Figure 5. When the search is complete, the move is selected by the algorithm randomly, proportionally to its probability π found by MCTS.
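To make Equations (1) and (2), the Figure 5 update rule, and the temperature-based move selection concrete, here is a minimal, self-contained sketch. It is our illustration: the data structures and names are not from the robot's actual code.

```python
import math
import random

class Edge:
    """Per-edge statistics: prior P, visit count N, total value W, mean value Q."""
    def __init__(self, prior):
        self.P, self.N, self.W, self.Q = prior, 0, 0.0, 0.0

def select_action(edges, c_puct=1.5):
    """Equation (2): pick the action maximizing Q + U."""
    total_n = sum(e.N for e in edges.values())
    def u(e):
        return c_puct * e.P * math.sqrt(total_n) / (1 + e.N)
    return max(edges, key=lambda a: edges[a].Q + u(edges[a]))

def backup(path, v):
    """Figure 5 backup along the visited path: N = N + 1, W = W + v, Q = W / N."""
    for edge in reversed(path):
        edge.N += 1
        edge.W += v
        edge.Q = edge.W / edge.N  # Equation (1)

def play(edges, tau=1.0):
    """Sample the move to play with probability proportional to N^(1/tau)."""
    weights = [edges[a].N ** (1.0 / tau) for a in edges]
    return random.choices(list(edges), weights=weights)[0]

# toy usage: two candidate moves with DNN priors; one simulated evaluation
edges = {"e3-d4": Edge(0.6), "e3-f4": Edge(0.4)}
a = select_action(edges)
backup([edges[a]], v=0.3)
print(a, play(edges))
```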


Figure 4. AlphaZero game state tree. On each step, the algorithm selects the edge maximizing the sum Q + U, where Q(s,a) is the expected reward for the move and U(s,a) is a nonlinear term forcing the tree search depending on the action probability P(s,a) and the number of visits N(s,a).


(Figure 5 summary: starting from the root of the tree, the neural network returns P and v for the current state; U = f(P, N) is calculated, and the edge with maximum Q + U is chosen until a leaf is reached. Returning to the root, each visited edge is updated as N = N + 1, W = W + v, Q = W/N. Once the required number of simulations is complete, either the action with the greatest N is chosen (best move) or an action is sampled with probability proportional to N^(1/τ) (probabilistic move), followed by the transition to a new state.)

Figure 5. A diagram of the Monte-Carlo Tree Search algorithm.

3.3. Deep Neural Network

The proposed robot game algorithm uses convolutional deep neural networks (DNN) with a parameter set θ. To design it, input and output data are first determined. The input data are the field state. The field is a two-dimensional array 8 × 8, but only black cells are used in the game, and each black cell contains different types of data, as follows: empty cell (0), black checker (1), white checker (2), black king (3), white king (4). All checkers move diagonally. The checkers board contains four areas depending on how many moves a checker can perform from it—see Figure 6. Each area of smaller size contains four fewer black cells and two more moves—see Table 1.

Figure 6. Checkerboard areas with respect to a king.


Table 1. Summary field information on checkerboard areas.

Area    Black Cell Moves    Black Cells
I             7                 14
II            9                 10
III          11                  6
IV           13                  2

After calculating the data, we obtained 280 possible moves. This number is equal to the number of output neurons that get certain values as a result of the given field state classification. Regarding the state of the checkerboard, our algorithm takes into account not only the current state but also the previous states, dividing one state into two layers. One layer contains information about the first player's objects, and another one contains information about the objects of the second player. One additional layer provides information on which player is currently performing the action. The overall set of layers is shown in Figure 7.

(Figure 7 layers: the current status of the white draughts; the current status of the black draughts; n previous states of the white and black draughts; a matrix filled with "1" if white draughts move or "−1" if black draughts move.)

Figure 7. Game state used as an input for the deep neural network (DNN).
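The following sketch (ours; the exact plane layout and the way kings are marked inside a plane are assumptions, as the paper does not specify them) builds the input tensor of Figure 7 from the 0–4 cell coding introduced above; as detailed right below, three layers are used for the current state plus two per stored past move.

```python
import numpy as np

def player_plane(board, man, king):
    """One 8x8 plane per player: men marked 1, kings marked 2 (our assumption)."""
    board = np.asarray(board)
    return (board == man).astype(np.float32) + 2.0 * (board == king).astype(np.float32)

def encode_state(history, white_to_move, n_past=5):
    """Stack the 3 + 2*n_past input planes of Figure 7: white and black planes for
    the current state and each stored past state (`history` is newest first),
    zero-padded when fewer past states exist, plus one plane of +1 (white to move)
    or -1 (black to move)."""
    planes = []
    for board in history[: n_past + 1]:
        planes.append(player_plane(board, man=2, king=4))   # white pieces
        planes.append(player_plane(board, man=1, king=3))   # black pieces
    while len(planes) < 2 * (n_past + 1):                   # pad missing history
        planes.append(np.zeros((8, 8), dtype=np.float32))
    planes.append(np.full((8, 8), 1.0 if white_to_move else -1.0, dtype=np.float32))
    return np.stack(planes)

x = encode_state([np.zeros((8, 8), dtype=int)], white_to_move=True)
print(x.shape)  # (13, 8, 8) -- the 13-layer variant (n_past = 5)
```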

The size of this set depends on how many states are stored in memory. If only the current state is considered, three layers are used, but with storing one more move, the number of layers increases by two.

Training a DNN is performed after each game. At every move, the MCTS runs first, estimating move probabilities π, and each pair (s, π) is recorded; when the game is complete, its outcome z ∈ {−1, +1} is also recorded. Then, the DNN parameters θ are updated so as to minimize the objective function (up to the regularization term):

$$l = (z - v)^2 - \pi^\top \log p, \qquad (3)$$

where v and p are the values predicted by the DNN.

Initially, we had a completely empty, untrained network, so we could ignore network solutions and perform a tree search based on a random guess, updating only Q and N. After a huge number of preliminary simulations, we obtained a decision tree that works without the neural network directly. Then, we could teach the DNN by searching for a set of parameters that matches the Monte-Carlo search results as closely as possible. This DNN is close to the AlphaZero network proposed in [9,27].
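A direct transcription of Equation (3), without the regularization term, is shown below; the 280-way move output follows the text, while the batch construction is our illustration.

```python
import numpy as np

def alphazero_loss(z, v, pi, p, eps=1e-8):
    """Equation (3), averaged over a batch: l = (z - v)^2 - pi^T log p.
    z: game outcomes in {-1, +1}; v: predicted values;
    pi: MCTS search probabilities (batch x 280); p: predicted move probabilities."""
    value_term = (z - v) ** 2
    policy_term = -np.sum(pi * np.log(p + eps), axis=-1)
    return np.mean(value_term + policy_term)

# toy batch of 2 positions with the 280-move output described above
rng = np.random.default_rng(0)
p = rng.dirichlet(np.ones(280), size=2)
pi = rng.dirichlet(np.ones(280), size=2)
print(alphazero_loss(z=np.array([1.0, -1.0]), v=np.array([0.3, -0.2]), pi=pi, p=p))
```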


3.4. Comparing Different Generations of DNN

We ran several tests of the program with a similar neural network architecture, varying the size of the input data. Using these results, we analyzed the dependence of the DNN learning efficiency on the input data size. Eight different neural network architectures were generated. Each architecture had 6 residual layers containing 75 matching 4 × 4 filters. The architectures differed in the input data size, namely, how many past game states were stored. Thus, we had 8 variants of neural networks with the number of past states from 0 to 7, which corresponded to input data array sizes from 3 × 8 × 8 to 17 × 8 × 8. Each network was trained until at least five generations of the network were produced. A new generation was formed when the learner's agent defeated the current agent with coefficient 1.3, implying that the new version is potentially superior to the previous one. For each architecture, the results of the first five generations were examined. Within the same architecture, 20 games were played between each generation. Each generation received points calculated as the difference between the wins of this generation over another. These points qualitatively estimate the neural network performance. Figure 8 shows that the DNN with 13 input layers performs the best.
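A sketch of the generation-gating rule follows; note that reading "coefficient 1.3" as a win ratio is our interpretation of the paper's wording.

```python
def accept_new_generation(wins_learner, wins_current, coeff=1.3):
    """Promote the learner to a new generation when it defeats the current agent
    with coefficient 1.3 (interpreted here as the ratio of wins)."""
    if wins_current == 0:
        return wins_learner > 0
    return wins_learner / wins_current >= coeff

print(accept_new_generation(13, 7))  # True: 13/7 ~ 1.86 >= 1.3
```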

Figure 8. Difference of neural network playing capabilities as a function of the generation number. The playing quality is estimated as ∆ = N_next − N_prev, where N_next is the number of games won by the next generation and N_prev is the number of games won by the previous generations; the first generation competed with a non-trained network.

Using a three-layer neural network, each subsequent generation is progressing, but as we see in the fifth generation, during training, the network can also decrease its performance despite the superiority of the previous version. This is particularly evident in the neural network with 9 input layers, where there was a perceptible failure because the learning process went in the wrong direction. Since the 4th generation, the network has started to grow qualitatively. This indicates that even an ill-trained neural network is capable of retraining and improving initial results. The curve of the neural network with 13 input layers shows that it asymptotically approaches the limit of its learning, and each new generation is progressing less than the previous one.


It is difficult to confirm the explicit correlation of the input data with the accuracy of DNN training. In terms of training quality, the network with 13 input layers proved to be the best, which could potentially indicate the most appropriate amount of data for this configuration.

3.5. Comparing with Aurora Borealis

As a reference checkers program, we used a free version of Aurora Borealis [28]. This program is a well-known prize-winner of European and world championships and is one of the top ten best checkers programs. Its Russian checkers engine uses a database with approximately 110 thousand games. Maximum performance settings of the program were used for our test.

A number of matches were played between Aurora Borealis and the 5th generation DNN with 17 input layers. Most of the matches were played according to the rules of blitz tournaments with a limit of 15 min per match, and several matches were played according to the classical rules with a 1 h limit. In about half of all matches, Aurora Borealis failed to find a move in the allotted time and was disqualified. The remaining half of the matches was predominantly won by Aurora Borealis. The result of the tests shows that the DNN-based program is much faster and more reliable but still needs additional learning to be able to compete on equal terms with top checkers programs. Speed is an extremely important parameter here due to the main purpose of the designed robot.

3.6. Interaction between Checkers Program, Robot, and User

Figure 9 presents a diagram of the Russian checkers robot algorithm. After the software module based on AlphaZero has received the data on the checkers field state, it produces the move and returns the corresponding command set to the robotic setup.

(Figure 9 summary: after the player presses the tact switch, the Arduino reads the Hall sensor states and sends the checkers field state to the checkers program; an incorrect move triggers an error message, while a correct move goes to the neural network module, which determines the best move as a stroke value in the LNLN format and converts it into control commands moving the pieces with the Dobot, after which the system waits for the next tact switch click.)

Figure 9. Diagram of the checker program and robotic setup interaction.

The program also produces an emotional state based on the current robot move quality estimation v (the probability to win) given by the DNN. The correspondence between the robot's emotional state, its attitude to the human player, and the game state is given in Table 2.

Two emotions (surprise and fear) are used in exceptional cases of v in comparison to its previous value. The attitude is prescribed in the program settings and may be "positive" (benevolent) or "negative" (aggressive). When it is set to "positive", the robot imitates that it empathizes with the

human player and rejoices at the user's good play. When the attitude is "negative", the robot imitates aggression and rivalry. The phrases corresponding to the emotional state are occasionally played back, describing the robot's state and giving value judgments on the game, such as: "Brilliant move!" or "Sorry, but I don't think this move was good for you". It also asks the user to make another move if the last user's move was incorrect. The voice messages are transmitted through the laptop speakers or an external powerful speaker if the environment is too noisy—e.g., at exhibitions and shows. Servo motors in the robot neck tilt its head in the direction of the board when the robot makes a move and set the head in a vertical position when it waits for the opponent's move or speaks a message. Eye LED matrices imitate pupil movements towards the board and the user, and follow the overall emotional expression—see Figure 3.

Table 2. Emotional state of the robot.

Attitude to Human    Robot Move Estimation    Emotional State
positive             v < 0.6                  happiness
                     v ∈ [0.6; 0.9]           neutral
                     v > 0.9                  sadness
negative             v > 0.8                  happiness
                     v ∈ [0.4; 0.8]           sadness
                     v < 0.4                  anger
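Table 2 translates directly into a small dispatch function; the sketch below is our illustration (surprise and fear, which depend on jumps of v relative to its previous value, are handled separately and omitted here).

```python
def emotional_state(v, attitude="positive"):
    """Map the DNN win-probability estimate v to a displayed emotion per Table 2."""
    if attitude == "positive":
        if v < 0.6:
            return "happiness"
        return "neutral" if v <= 0.9 else "sadness"
    else:  # "negative" (aggressive) attitude
        if v > 0.8:
            return "happiness"
        return "sadness" if v >= 0.4 else "anger"

print(emotional_state(0.95, "positive"))  # sadness: the robot is winning too easily
```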

4. Discussion and Conclusions

In this paper, we presented a robot playing Russian checkers based on the AlphaZero algorithm. The checkers algorithm with 13 state layers recording the last five moves proved to achieve the best results among the tested versions. The robotic setup was built using the Dobot v.1.0 robotic arm with a custom checkers board and successfully performed its task. The reported robot participated in Geek Picnic 2019, Night of Museums 2019, and other exhibitions and public events in Saint Petersburg, Russia. Due to the use of Hall sensors instead of a computer vision system, it was independent of lighting conditions, which was especially valuable at events with rapidly changing, insufficient, and flashing light. It also made the design compact and inexpensive. On the other hand, using the digital Hall sensors turned out to be less robust to inaccurate piece localization than could be achieved with analog sensors.

The robot played a number of games with visitors at the abovementioned events. As a qualitative result, the robot gained the attention of visitors of all ages and showed itself as a well-recognized attraction. Adjusting the game difficulty level by selecting the generation of the DNN proved to be a reliable idea. The ability of the robot to render a facial emotional state and its voice remarks were appreciated by many users, since they bring better interactivity to the gameplay.

The robot was designed within the project on the human–machine interface (HMI). The main motivation of the work is as follows. While conventional interfaces for table games (Chess, Checkers, Go, etc.) between human players and a computer belong mostly to a class of GUI-based solutions, a classical game design with physical pieces and a board can only be used in a robotized environment [29]. It is intuitively clear that the physical interface (GUI or real board) may somehow affect the performance or psychosomatic state of players, but only recently was evidence of the latter phenomenon obtained for the chess game [30]. The authors show that this difference is caused by being familiar or not with playing chess with a computer. While training with a computer is an ordinary practice for many young players, these results indirectly indicate the importance of training in a physical environment as well.

The robot can not only serve as a manipulator for pieces, but it can also introduce some additional interactivity or a stress factor for making the training process more versatile, which is especially valuable for young players. For example, it can help to improve their focus on the game, or train their stress resistance. Some recent results show successful examples of using robots in teaching handwriting, showing that a robotized environment may be used for developing an efficient teaching process [31]. In our future work, we will focus on the following aspects:

1. Investigation of the robot application to teaching the checkers game and its efficiency in various training aspects.
2. Investigation of the performance of a checkers board with analog Hall sensors.
3. Additional learning procedures for the DNN and a more elaborate comparison with existing checkers programs.

Additionally, we will perform a qualitative study on how apparent emotions make the game process more entertaining.

Author Contributions: Conceptualization, E.E.K. and D.N.B.; data curation, L.S.; formal analysis, A.I.K. and L.S.; funding acquisition, D.N.B.; investigation, E.E.K. and G.Y.K.; methodology, G.Y.K.; project administration, A.I.K. and D.N.B.; resources, G.Y.K.; software, E.E.K. and A.I.K.; supervision, A.I.K. and D.N.B.; validation, G.Y.K., L.S. and D.N.B.; visualization, E.E.K. and G.Y.K.; writing—original draft, E.E.K. and A.I.K.; writing—review and editing, L.S. and D.N.B. All authors have read and agreed to the published version of the manuscript.

Funding: This research received no external funding.

Acknowledgments: The authors are grateful to M.Tech. student Andrey Ignatovskiy for preparing several program codes of the AlphaZero algorithm used in this research.

Conflicts of Interest: The authors declare no conflict of interest.

Appendix A

The robot control circuit is given in Figure A1.

Figure A1. Checkers robot control circuit.

The Russian checkers field consists of 32 black cells requiring 32 Hall sensors. To connect all the sensors to the microcontroller, two 16-channel CD74HC4067 analog multiplexers are used. The connection diagram is shown in Figure A2.

The Hall digital bipolar sensors TLE4945L located under the black cells on the second PCB determine the state of the cell due to neodymium magnet tokens mounted into each checker. The magnet polarity does not matter since the color and the type of checker (man or king) is non-ambiguously determined by gameplay.
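On the laptop side, receiving the board state might look like the following hedged sketch using pyserial; the 4-byte packet layout, port name, and baud rate are assumptions, since the paper does not specify the firmware's protocol.

```python
import serial  # pyserial

def read_board_state(port="COM3", baud=115200):
    """Read one board-state packet sent after the player presses the confirmation
    button. We assume a hypothetical 4-byte packet carrying 32 occupancy bits,
    one per black cell; the actual packet format is not given in the paper."""
    with serial.Serial(port, baud, timeout=2.0) as link:
        packet = link.read(4)
        if len(packet) != 4:
            raise TimeoutError("no confirmation packet received")
        bits = int.from_bytes(packet, "big")
        # occupancy[i] is True when the Hall sensor of black cell i sees a magnet
        return [bool((bits >> i) & 1) for i in range(32)]
```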


Figure A2. Checkers field schematics. The Hall sensors TLE4945L with 10 kΩ resistors are mounted under the black cells of the board (only 8 sensors are shown; the schematic of the others is similar). The button is used for the move confirmation.

References

1. Sussman,The Hall M. Performing digital bipolar the Intelligent sensors Machine:TLE4945L Deception located under and Enchantment black cells inon the the Life second of the AutomatonPCB determineChess Player. the TDRstate/ Theof the Drama cell Rev.due 1999to neodymium, 43, 81–96. [magnetCrossRef tokens] mounted into each checker. The 2. magnetStandage, polarity T. The does Turk: not The matter Life and since Times the of thecolor Famous and the Eighteenth-Century type of checker Chess-Playing (man or king) Machine is non-; Walker ambiguously determined by gameplay. Books: London, UK, 2002; ISBN 0-8027-1391-2. 3. Kovács, G.; Petunin, A.; Ivanko, J.; Yusupova, N. From the first chess-automaton to the mars pathfinder. Acta References Polytech. Hung. 2016, 13, 61–81. 4. 1.Tebbe, Sussman, J.; Gao, M. Y.; Performing Sastre-Rienietz, the Intelligent M.; Zell, Machine: A. A table Decept tennision and robot Enchantment system using in th ane Life industrial of the Automaton kuka robot arm. In GermanChess Player. Conference TDR/The on Pattern Drama Rev. Recognition 1999, 43;, Springer:81–96, doi:10.1162/105420499760347342. Cham, Germany, 2018; pp. 33–45. 2. Standage, T. The Turk: The Life and Times of the Famous Eighteenth-Century Chess-Playing Machine; Walker 5. Matsushima, M.; Hashimoto, T.; Takeuchi, M.; Miyazaki, F. A learning approach to robotic table tennis. IEEE Books: London, UK, 2002; ISBN 0-8027-1391-2. Trans. Robot. 2005, 21, 767–771. [CrossRef] 3. Kovács, G.; Petunin, A.; Ivanko, J.; Yusupova, N. From the first chess-automaton to the mars pathfinder. 6. Lakin,Acta M. Polytech. A Robotic Hung. Air 2016 Hockey, 13, 61–81. Opponent. CCSC SC Stud. Pap. E-J. 2009, 2, 1–6. 7. 4.Nierho Tebbe,ff, T.; J.; Kourakos,Gao, Y.; Sastre-Rienietz, O.; Hirche, S. M.; Playing Zell, A. pool A table with tennis a dual-armed robot system robot. using In an Proceedings industrial kuka of the robot 2011 IEEE Internationalarm. In German Conference Conference on on Robotics Pattern Recognition and Automation,; Springer: Shanghai, Cham, Germ China,any, 9–132018; Maypp. 33–45. 2011; pp. 3445–3446. 8. 5.Newborn, Matsushima, M. Kasparov M.; Hashimoto, Versus Deep T.; Take Blue:uchi, Computer M.; Miyazaki, Chess Comes F. A learning of Age; Springer:approach to New robotic York, table NY, tennis. USA, 1997. 9. Silver,IEEE D.; Trans. Huang, Robot. A.; 2005 Maddison,, 21, 767–771. C.J.; Guez, A.; Sifre, L.; Driessche, G.V.D.; Schrittwieser, J.; Antonoglou, I.; 6.Panneershelvam, Lakin, M. A Robotic V.; Lanctot, Air Hockey M.; et Opponent. al. Mastering CCSC the SC gameStud. Pap. of Go E-J. with 2009 deep, 2, 1–6. neural networks and tree search. 7.Nature Nierhoff,2016, 529T.; Kourakos,, 484–489. O.; [CrossRef Hirche, ][S.PubMed Playing pool] with a dual-armed robot. In Proceedings of the 2011 IEEE International Conference on Robotics and Automation, Shanghai, China, 9–13 May 2011; pp. 3445– 10. Carrera, L.; Morales, F.; Tobar, J.; Loza, D. MARTI: A Robotic Chess Module with Interactive Table, 3446. for Learning Purposes. In Proceedings of the World Congress on Engineering and Computer Science, 8. Newborn, M. Kasparov Versus Deep Blue: Comes of Age; Springer: New York, NY, USA, 1997. San Francisco, CA, USA, 25–27 October 2017; Volume 2. 9. Silver, D.; Huang, A.; Maddison, C.J.; Guez, A.; Sifre, L.; Driessche, G.V.D.; Schrittwieser, J.; Antonoglou, 11. Matuszek,I.; Panneershelvam, C.; Mayton, V.; B.; Lanctot, Aimi, R.;M.; Deisenroth,et al. 
Mastering M.P.; the Bo,game L.; of Chu, Go with R.;Kung, deep neural M.; LeGrand, networks L.;and Smith, tree J.R.; Fox,search. D. Gambit: Nature An 2016 autonomous, 529, 484–489, chess-playing doi:10.1038/nature16961. robotic system. In Proceedings of the 2011 IEEE International 10.Conference Carrera, onL.; RoboticsMorales, F.; and Tobar, Automation, J.; Loza, D. Shanghai, MARTI: A China, Robotic 9–13 Chess May Module 2011; pp.with 4291–4297. Interactive Table, for 12. Luqman,Learning H.M.; Purposes. Zaffar, In M. Pr Chessoceedings Brain of and the AutonomousWorld Congress Chess on En Playinggineering Robotic and Computer System. Science, In Proceedings San of the 2016Francisco, International CA, USA, Conference25–27 October on 2017; Autonomous Volume 2. Robot Systems and Competitions (ICARSC), Braganca, Portugal, 4–6 May 2016; pp. 211–216. 13. Chen, A.T.-Y.; Kevin, I.; Wang, K. Computer vision based chess playing capabilities for the Baxter humanoid robot. In Proceedings of the 2016 2nd International Conference on Control, Automation and Robotics (ICCAR), Hong Kong, China, 28–30 April 2016; pp. 11–14. Robotics 2020, 9, 107 15 of 15

14. Chen, A.T.-Y.; Wang, K.I.-K. Robust computer vision chess analysis and interaction with a humanoid robot. Computers 2019, 8, 14. [CrossRef] 15. Wang, J.; Wu, X.; Qian, T.; Luo, H.; Hu, C. Design and Implementation of Chinese Chess Based on Manipulator. In Proceedings of the 2019 IEEE 3rd Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC), Chongqing, China, 11–13 October 2019; pp. 1687–1690. 16. Del Toro, C.; Robles-Algarín, C.A.; Rodríguez-Álvarez, O. Design and Construction of a Cost-Effective Didactic Robotic Arm for Playing Chess, Using an Artificial Vision System. Electronics 2019, 8, 1154. [CrossRef] 17. Chen, P.-J.; Yang, S.-Y.; Wang, C.-S.; Muslikhin, M.; Wang, M.-S. Development of a Chinese Chess Robotic System for the Elderly Using Convolutional Neural Networks. Sustainablilty 2020, 12, 3980. [CrossRef] 18. Kołosowski, P.; Wolniakowski, A.; Miatliuk, K. Collaborative Robot System for Playing Chess. In Proceedings of the 2020 International Conference Mechatronic Systems and Materials (MSM), Bialystok, Poland, 1–3 July 2020; pp. 1–6. 19. Sušac, F.; Aleksi, I.; Hocenski, Ž. Digital chess board based on array of Hall-Effect sensors. In Proceedings of the 2017 40th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia, 22–26 May 2017; pp. 1011–1014. 20. Correia, F.; Alves-Oliveira, P.; Ribeiro, T.; Melo, F.S.; Paiva, A. A social robot as a card game player. In Proceedings of the Thirteenth Artificial Intelligence and Interactive Digital Entertainment Conference, Little Cottonwood Canyon, UT, USA, 5–9 October 2017. 21. Oliveira, R.; Arriaga, P.; Alves-Oliveira, P.; Correia, F.; Petisca, S.; Paiva, A. Friends or foes? Socioemotional support and gaze behaviors in mixed groups of humans and robots. In Proceedings of the 2018 ACM/IEEE International Conference on Human-Robot Interaction, McCormick Place North, Chicago, IL, USA, 5–8 March 2018; pp. 279–288. 22. Silver, D.; Hubert, T.; Schrittwieser, J.; Antonoglou, I.; Lai, M.; Guez, A.; Lanctot, M.; Sifre, L.; Kumaran, D.; Graepel, T.; et al. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science 2018, 362, 1140–1144. [CrossRef][PubMed] 23. Nitsch, V.; Popp, M. Emotions in robot psychology. Biol. Cybern. 2014, 108, 621–629. [CrossRef][PubMed] 24. Hegel, F.; Eyssel, F.; Wrede, B. The social robot ‘flobi’: Key concepts of industrial design. In Proceedings of the 19th International Symposium in Robot and Human Interactive Communication, Viareggio, Italy, 13–15 September 2010; pp. 107–112. 25. Duarte, F.F.; Lau, N.; Pereira, A.; Reis, L.P. A Survey of Planning and Learning in Games. Appl. Sci. 2020, 10, 4529. [CrossRef] 26. Schaeffer, J.; Burch, N.; Björnsson, Y.; Kishimoto, A.; Müller, M.; Lake, R.; Lu, P.; Sutphen, S. Checkers is solved. Science 2007, 317, 1518–1522. [CrossRef][PubMed] 27. Silver, D.; Schrittwieser, J.; Simonyan, K.; Antonoglou, I.; Huang, A.; Guez, A.; Hubert, T.; Baker, L.; Lai, M.; Bolton, A.; et al. Mastering the game of go without human knowledge. Nature 2017, 550, 354–359. [CrossRef] [PubMed] 28. Aurora Borealis Website. Available online: http://aurora.draughtsworld.com/ (accessed on 5 December 2020). 29. Li, S.; Qiu, K.; Qian, C.; Lv, J.; Cui, Y. A study on human-machine interaction computer games robot. In Proceedings of the 2017 29th Chinese Control And Decision Conference (CCDC), Chongqing, China, 28–30 May 2017; pp. 7643–7648. 30. 
Fuentes-García, J.P.; Pereira, T.; Castro, M.A.; Carvalho Santos, A.; Villafaina, S. Heart and brain responses to real versus simulated chess games in trained chess players: A quantitative EEG and HRV study. Int. J. Environ. Res. Public Health 2019, 16, 5021. [CrossRef][PubMed] 31. Lemaignan, S.; Jacq, A.; Hood, D.; Garcia, F.; Paiva, A.; Dillenbourg, P. Learning by teaching a robot: The case of handwriting. IEEE Robot. Autom. Mag. 2016, 23, 56–66. [CrossRef]

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).