GoGoGo: Improving Deep Neural Network Based Go Playing AI with Residual Networks
Xingyu Liu
Introduction
• Go-playing AIs using traditional search: GNU Go, Pachi, Fuego, Zen, etc.
• Powered by deep learning: Zen → Deep Zen Go, darkforest, AlphaGo
• Goal: from vanilla CNN to ResNets

Training Methodology and Data
Stages: SL on Policy Network; RL on Policy Network; RL on Value Network
• Use Ing Chang-ki rule: Board State + Ko is Game State. No need to remember the number of captured stones.
• From Kifu to Input Feature Maps. Channels: 1) Space Positions; 2) Black Positions; 3) White Positions; 4) Current Player; 5) Ko Positions
• Dynamic Board State Expansion: Ko fight performing. Saves disk space. Small memory usage.
• Two Levels of Batches (Kifus, moves): Random shuffling. Small memory usage and good locality.
Fig. 1 Ko fight, explicit expansion

Network Architecture
Fig. 2 (a) Policy Network (b) Value Network (c) Residual Module
(a), (b): Input (19x19x5), followed by a stack of CONV3, 64 layers. The final labels in the figure are "Output" and "FC 23104, 1".
(c): Data → CONV3 → Batch Norm → ReLU → CONV3 → Batch Norm → Eltwise Add → ReLU

Experiment Result
• Training Accuracy ~ 32%
• Testing Accuracy ~ 26%
[Plot: Supervised Learning Training Loss; x-axis in units of 100 batches]
(b) Hyperparameters

Hyperparameters      Value
Base learning rate   2E-4
Decay Policy         Exp
Decay Rate           0.95
Decay Step (kifu)    200
Loss Function        Softmax

Fig. 3 GoGoGo plays against itself, policy network only

Future Work
• Reinforcement Learning of Value Network
• Network Architecture Exploration
• Monte Carlo Tree Search
  $a_t = \arg\max_a \left( Q(s_t, a) + u(s_t, a) \right)$
  $u(s, a) \propto \frac{P(s, a)}{1 + N(s, a)}$
• Real Match Testing against Human Players

[1] David Silver et al., "Mastering the game of Go with deep neural networks and tree search", Nature, 529:484–503, 2016.
[2] Kaiming He et al., "Deep residual learning for image recognition", CoRR, abs/1512.03385, 2015.
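The five input channels listed under "From Kifu to Input Feature Maps" can be illustrated with a minimal sketch. The board representation (0 = empty, 1 = black, 2 = white), the function name `encode_position`, and the choice to fill the "Current Player" channel with 1 when Black is to move are all assumptions made for illustration; the poster does not specify them.

```python
EMPTY, BLACK, WHITE = 0, 1, 2  # hypothetical stone codes, not from the poster
SIZE = 19


def encode_position(board, to_move, ko_point=None):
    """Build a 19x19x5 nested list indexed as features[row][col][channel].

    Channels: 0 space (empty), 1 black, 2 white, 3 current player, 4 ko.
    """
    # Assumption: the current-player channel is constant over the board,
    # 1 when Black is to move and 0 when White is to move.
    player_value = 1 if to_move == BLACK else 0
    features = []
    for r in range(SIZE):
        row = []
        for c in range(SIZE):
            stone = board[r][c]
            row.append([
                int(stone == EMPTY),
                int(stone == BLACK),
                int(stone == WHITE),
                player_value,
                0,  # ko channel, set below if a ko point exists
            ])
        features.append(row)
    if ko_point is not None:
        kr, kc = ko_point
        features[kr][kc][4] = 1
    return features
```

Because the board state plus the ko point is the full game state under the Ing Chang-ki rule, no capture counts need to be encoded in this sketch.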
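The hyperparameter table (base learning rate 2E-4, exponential decay, rate 0.95, step of 200 kifu) suggests a schedule along the following lines. This is a sketch under assumptions: the function name `learning_rate` is hypothetical, and the poster does not say whether the decay is applied in discrete steps or continuously, so both variants are offered through a `staircase` flag.

```python
def learning_rate(kifus_seen, base_lr=2e-4, decay_rate=0.95,
                  decay_step=200, staircase=True):
    """Exponentially decayed learning rate after `kifus_seen` game records.

    staircase=True multiplies by decay_rate once per full decay_step
    (an assumption); staircase=False decays smoothly between steps.
    """
    if staircase:
        exponent = kifus_seen // decay_step
    else:
        exponent = kifus_seen / decay_step
    return base_lr * decay_rate ** exponent
```

For example, `learning_rate(200)` gives the base rate multiplied once by 0.95, while `learning_rate(199)` still returns the base rate in the staircase variant.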
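The selection rule listed with Monte Carlo Tree Search under Future Work, a_t = argmax_a (Q(s_t, a) + u(s_t, a)) with u(s, a) ∝ P(s, a) / (1 + N(s, a)), can be sketched as follows. The poster only gives a proportionality, so the constant `c` is an assumed parameter, and `select_action` and its list-based inputs are hypothetical names chosen for illustration.

```python
def select_action(q, prior, visits, c=1.0):
    """Return the index a maximizing Q(s, a) + u(s, a).

    u(s, a) = c * P(s, a) / (1 + N(s, a)); `c` stands in for the
    unspecified proportionality constant (an assumption).
    q, prior, visits are equal-length sequences indexed by action.
    """
    scores = [
        q_a + c * p_a / (1.0 + n_a)
        for q_a, p_a, n_a in zip(q, prior, visits)
    ]
    # Ties resolve to the lowest action index.
    return max(range(len(scores)), key=scores.__getitem__)


# Usage: an unvisited move with a high prior wins until it accumulates visits.
best = select_action(q=[0.1, 0.2, 0.0], prior=[0.2, 0.3, 0.5], visits=[0, 10, 0])
```

The bonus term shrinks as the visit count N(s, a) grows, so the search gradually shifts from moves the policy network favors toward moves with high estimated value Q.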