Large-Scale Platform for MOBA Game AI
Bin Wu & Qiang Fu
28th March 2018

Outline

• Introduction

• Learning algorithms

• Computing platform

• Demonstration

Game AI Development

1950s-1960s, early exploration: a checkers program beat a state champion
1970s-1980s, transition: Chess 4.5 beat human players
1990s-2000s, rapid development: Deep Blue (IBM) beat Garry Kasparov
2010s, explosive growth: AlphaGo (DeepMind) defeated Lee Sedol and Ke Jie

Applications of Game AI

Gaming: core applications in the gaming industry
◇ Pre-game procedures, e.g., game design
◇ Player experience, e.g., AI teammates and enemies
◇ Others, e.g., e-sports

Research: an ideal testbed for general AI research
◇ Massive data from human players
◇ Low experimental costs
◇ General abilities of perception and decision
◇ From the virtual world to the real world

Game AI Research Topic

Game AI has become a hot research topic since the success of AlphaGo [1]

• Many AI giants have joined game AI research
• Moving from Go to RTS, MOBA, etc.
◇ OpenAI's Dota 2 bot beat top human players in 1v1; 5v5 is planned for 2018 [5]
◇ StarCraft AI platforms have been released, with preliminary results in simple scenarios [3]; the StarCraft II AI platform is not yet able to defeat the built-in AI [4]

MOBA Game

• 5 vs. 5 game: obtain gold/exp → gain equipment advantages → win fights → destroy the enemy's base

Goal: destroy the enemy's base
◇ Enemy turrets
◇ Attack/skill control
◇ Movement control
◇ Equipment purchase
◇ Neutral creeps: sources of money/power/levels/…

MOBA Game

• Micro combat
◇ Movement
◇ Use of skills

MOBA Game

• Macro strategies
◇ Backing up
◇ Laning
◇ Ganking
◇ Stealing the base

MOBA AI - Key Challenges

◇ Learning algorithm
◇ Computing platform

Learning Algorithms

Learning Algorithms - Challenges

1. Complexity: state space ~10^20000
2. Multi-agent: 5v5 coordination
3. Sparse and delayed rewards: 20,000+ frames per game
4. Imperfect information: partially observable maps

Learning Algorithms - Challenges

• Complexity >> Go
◇ End-to-end solutions (SL/RL) do not work well: the agent cannot even finish basic movement and attacks. Similar observations were made by DeepMind [4].

Go
◇ State space: 3^361 ≈ 10^172 (361 positions, 3 states each)
◇ Action space: 250^150 ≈ 10^360 (250 positions available, ~150 decisions per game on average)

MOBA
◇ State space: ~10^20000 (10 heroes, 2000+ positions × 10+ states each)
◇ Action space: 20^20000 ≈ 10^26000 (20 actions per frame: left, right, …, skill 1/2/3 + position/target, recover, return, etc.; 20,000 frames per game)
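These magnitudes are easy to double-check in log space; a minimal sketch:

```python
# Sanity-check the magnitudes above in log10 space, using the bases and
# exponents quoted on this slide.
import math

def log10_pow(base: int, exp: int) -> float:
    """log10(base**exp) without materializing the huge integer."""
    return exp * math.log10(base)

print(f"Go states:    3^361    ~ 10^{log10_pow(3, 361):.0f}")     # ~10^172
print(f"Go actions:   250^150  ~ 10^{log10_pow(250, 150):.0f}")   # ~10^360
print(f"MOBA actions: 20^20000 ~ 10^{log10_pow(20, 20000):.0f}")  # ~10^26021
```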

Learning Algorithms - Challenges

• Multi-agent
◇ Macro strategy level: four defending while one steals the base
◇ Micro combat level: tanks protecting assassins

Learning Algorithms - Challenges

• Sparse and delayed rewards

Go: <360 steps per game
MOBA: >20,000 steps per game

Learning Algorithms - Challenges

• Imperfect information
◇ Maps are partially observable: guess the enemy's positions/strategy; actively explore to gain information

Model Architecture

• Divide and conquer: split the problem into three modules, Strategy, Transfer, and Combat
◇ Splitting simplifies the problem: solution space reduced from ~10^20000 to ~10^2000

Model - Transfer

• Where to send heroes?
◇ Compared to the Go game: treat heroes as stones and the map as the board
◇ Predict good positions (hotspot prediction; a sketch follows)
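A minimal sketch of what such a board-style hotspot predictor could look like (PyTorch; the input planes and 64×64 minimap size are illustrative assumptions, not the authors' actual model):

```python
# Treat the minimap like a Go board: stack per-entity feature planes and
# predict a per-cell probability of being a good position to move to.
import torch
import torch.nn as nn

class HotspotNet(nn.Module):
    def __init__(self, in_planes: int = 16):   # e.g., hero/turret/creep planes
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_planes, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),                # one logit per map cell
        )

    def forward(self, x):                       # x: (batch, planes, H, W)
        return self.body(x).flatten(1)          # logits over H*W cells

net = HotspotNet()
logits = net(torch.randn(1, 16, 64, 64))        # hypothetical 64x64 minimap
probs = torch.softmax(logits, dim=1)            # "hotspot" distribution
```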

Model - Strategy

• Key resources in MOBA
◇ Modeling macro objectives: describe the hotspot transition series before each key resource is destroyed

Model - Strategy

Example of macro session splitting: describe the hotspot transition series before each key resource is destroyed.

◇ Key resources marking session boundaries, e.g.: Start → Dragon → Mid 1st turret → Dark Slayer dragon → Mid 2nd turret → Mid 3rd turret → Bottom 1st turret → Base
◇ Hotspots within a session, e.g.: stealing the blue creep, killing bottom-lane creeps, attacking the bottom 1st turret
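A minimal sketch of the session-splitting idea (the event names are hypothetical; the real system works on replay data):

```python
# Split one game's event stream into sessions, each closed by the
# destruction of a key resource; the hotspots inside a session describe
# how the team worked toward that objective.
KEY_RESOURCES = {"dragon", "mid_1st_turret", "bottom_1st_turret", "base"}

events = [  # hypothetical (time_in_seconds, event) stream from a replay
    (30, "stealing_blue_creep"), (95, "dragon"),
    (140, "killing_bottom_creeps"), (210, "bottom_1st_turret"),
]

sessions, current = [], []
for t, ev in events:
    current.append((t, ev))
    if ev in KEY_RESOURCES:      # key resource destroyed -> session boundary
        sessions.append(current)
        current = []

for s in sessions:
    print([ev for _, ev in s])
# ['stealing_blue_creep', 'dragon']
# ['killing_bottom_creeps', 'bottom_1st_turret']
```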

Model - Transfer Network with Macro Strategy

(Figure: the transfer network predicts key resources and hotspots.)

Model - Combat

• Multi-task over control buttons
◇ Action space: move directions, skills, skill release position/target (a sketch of the multi-head output follows)
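A minimal sketch of the multi-task ("multi-head") action output (PyTorch; the dimensions and the three heads shown are illustrative assumptions):

```python
# One shared trunk with separate heads: movement direction, skill button,
# and skill release position, predicted jointly from the same features.
import torch
import torch.nn as nn

class MultiHeadPolicy(nn.Module):
    def __init__(self, obs_dim=512, n_dirs=8, n_skills=4, n_cells=64 * 64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU())
        self.dir_head = nn.Linear(256, n_dirs)      # which direction to move
        self.skill_head = nn.Linear(256, n_skills)  # which skill button to press
        self.target_head = nn.Linear(256, n_cells)  # where to release the skill

    def forward(self, obs):
        h = self.trunk(obs)
        return self.dir_head(h), self.skill_head(h), self.target_head(h)

dirs, skills, targets = MultiHeadPolicy()(torch.randn(1, 512))
```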

Learning Framework

• Imitation learning + reinforcement learning (a toy sketch follows)
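A toy sketch of the two-stage recipe (all data here are random stand-ins; the real pipeline trains on human replays and self-play games):

```python
# Stage 1 imitates human replay actions with cross-entropy;
# stage 2 fine-tunes the same policy with a REINFORCE-style update.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 4))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for _ in range(100):                              # imitation phase
    obs = torch.randn(64, 8)                      # stand-in for replay features
    act = torch.randint(0, 4, (64,))              # stand-in for human actions
    loss = nn.functional.cross_entropy(policy(obs), act)
    opt.zero_grad(); loss.backward(); opt.step()

for _ in range(100):                              # reinforcement phase
    obs = torch.randn(64, 8)                      # stand-in for self-play states
    dist = torch.distributions.Categorical(logits=policy(obs))
    act = dist.sample()
    reward = torch.randn(64)                      # stand-in for game outcomes
    loss = -(dist.log_prob(act) * reward).mean()  # REINFORCE objective
    opt.zero_grad(); loss.backward(); opt.step()
```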

Computing Platform

MOBA Game AI Platform

• Computing platform
◇ Computational power: large-scale CPU/GPU virtualization
◇ Learning platform: efficient and easy to use

Pipeline: game environment → feature extraction → model training → reinforcement learning → service deployment

◇ Task management: Tencent cloud function, machine learning platform
◇ Resource allocation: Kubernetes, elastic computation resource allocation
◇ Deployment: Docker with mixed online/offline deployment; Docker + GPU virtualization for shared resources; idle resource pool shared by online and offline services
◇ Computational units: millions of CPU cores, thousands of GPUs

Computational Power

• Computational costs: MOBA AI demands thousands of GPUs and millions of CPU cores

CPU/GPU demands: the more, the better
Challenge: improve resource utilization efficiency without additional costs
Solution: CPU/GPU virtualization for shared resources

CPU Virtualization

• Elastic and dynamic resource pool ◇ millions of CPU cores

70% from the idle resource pool
◇ New resources not yet delivered
◇ Old resources not yet cleared
◇ Returned resources

30% from idle slots in online services
◇ Average online-service CPU usage is only ~20%
◇ Docker isolation raises utilization from 20% to 65%

GPU Virtualization

• Goal: improve GPU usage efficiency
• Resource usage: thousands of GPUs; 65% of machines under low load; average GPU usage 28%
• Optimization idea: GPU virtualization

GPU virtualization via CUDA Multi-Process Service [12]
◇ Time-slice sharing
◇ Parallel sharing

Learning Platform

Core technique | Version update frequency
Feature extraction | hours
Model training | one day
RL training | one day

Learning Platform - Feature Extraction Platform

Pipeline: game replays → (gamecore) raw data → (pre-processing, feature extraction) features → (shuffle) training samples → (training) models → evaluation

• Demand 1
◇ Feature extraction from up to hundreds of thousands of replays
◇ Challenge: demands up to 210 thousand CPU cores per day
◇ Solution: CPU virtualization; Docker elastic and dynamic resource pool

• Demand 2
◇ Multiple tasks, each with millions of entries
◇ Challenge: parallel task scheduling
◇ Solution: Tencent Serverless Cloud Function

Learning Platform - Serverless Cloud Function

Advantages of Cloud Function
◇ Function as a Service
◇ Millions of CPU cores available
◇ Free of charge in idle slots: ~30% of the usual cost on average

Application layer: SDKs, COS, CMQ, …
Access layer: API
Control layer: function configuration, function calls, function coordination
Execution layer: functions
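A minimal sketch of a feature-extraction task written as a cloud function (the handler signature, event fields, and storage stubs are illustrative assumptions, not Tencent SCF's actual interface):

```python
# Each invocation processes one replay: download, extract features, upload.
import json

STORE = {}  # in-memory stand-in for object storage (e.g., COS)

def download(key):      return STORE[key]
def upload(key, data):  STORE[key] = data

def extract_features(replay_bytes: bytes) -> list:
    """Placeholder: parse one replay into per-frame feature rows."""
    return [{"frame": i, "features": []} for i in range(len(replay_bytes) // 1024)]

def handler(event, context):
    key = event["replay_key"]                        # one replay per invocation
    rows = extract_features(download(key))
    upload(key + ".features", json.dumps(rows))
    return {"rows": len(rows)}

STORE["replay_0001"] = bytes(10 * 1024)              # fake 10 KB replay
print(handler({"replay_key": "replay_0001"}, None))  # {'rows': 10}
```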

Learning Platform - Model Training Platform

1. Requirements
◇ Billions of samples per task
◇ Fast model training

2. Solution
◇ Multi-GPU, multi-machine training
◇ Machine learning platform

3. Challenges
◇ IO: big data, efficient data input, efficient computation
◇ Communication: efficient parameter exchange

Model Training Platform - IO

Data IO
◇ Multiprocessing
◇ "Lock-free" queue (see the sketch below)

Efficient computation
◇ Data pre-caching
◇ OP speed-up via multi-threading
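A minimal sketch of the pre-caching pattern using only the standard library (a plain multiprocessing.Queue stands in for the platform's lock-free queue):

```python
# A producer process decodes batches ahead of time and parks them in a
# bounded queue, so the consumer (the training loop) never waits on IO.
import multiprocessing as mp

def producer(q):
    for i in range(100):
        q.put([i] * 32)          # stand-in for a decoded training batch
    q.put(None)                  # sentinel: no more data

if __name__ == "__main__":
    q = mp.Queue(maxsize=8)      # bounded pre-cache of ready batches
    p = mp.Process(target=producer, args=(q,))
    p.start()
    while (batch := q.get()) is not None:
        _ = sum(batch)           # stand-in for a training step on the batch
    p.join()
```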

Model Training Platform - Communication

Parameter exchange
◇ NCCL2 [11]: efficient communication between GPUs
◇ RDMA: efficient communication across nodes
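For illustration, a minimal all-reduce sketch with PyTorch's NCCL backend (assumes one process per GPU, started by a distributed launcher that sets the usual rank/world-size environment variables):

```python
# Average gradients across workers after backward(); NCCL handles the
# GPU-to-GPU collective, and RDMA-capable fabric speeds up cross-node hops.
import torch
import torch.distributed as dist

def average_gradients(model: torch.nn.Module):
    world = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world

# Called once per worker at startup:
# dist.init_process_group(backend="nccl")
```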

Model Training Platform - Performance

Multi-GPU, multi-machine speed-up

(Figure: acceleration ratio for 1, 8, 16, 32, and 64 GPUs; IO, computation, and communication optimizations shown before and after, against the linear upper bound.)

Learning Platform - Reinforcement Learning Platform

• Demands
◇ Hierarchical RL: various scenarios
◇ Large-scale parallel self-play: millions of games
◇ Automatic task management: a unified framework for model analysis and evaluation

RL Platform - Hierarchical RL

• Hierarchical RL
◇ Scenario-specific sub-tasks: jungling, creep clearing, team fights, laning

• Solution
◇ General hierarchical RL

• Features
◇ Macro task selection
◇ Micro task selection (see the sketch below)
◇ Effectively handles long-term planning and delayed rewards
◇ Value network for guiding sub-task policy learning
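A minimal sketch of the macro/micro split (toy dimensions; the actual hierarchy, sub-task set, and value network are the platform's own):

```python
# The macro policy picks a sub-task (jungle, lane, team fight, ...);
# the micro policy, conditioned on that sub-task, picks the concrete action.
import torch
import torch.nn as nn

N_TASKS, N_ACTIONS, OBS = 4, 16, 64
macro = nn.Linear(OBS, N_TASKS)                # which sub-task to pursue
micro = nn.Linear(OBS + N_TASKS, N_ACTIONS)    # what to do within it

obs = torch.randn(1, OBS)
task = torch.distributions.Categorical(logits=macro(obs)).sample()
task_vec = nn.functional.one_hot(task, N_TASKS).float()
action_logits = micro(torch.cat([obs, task_vec], dim=-1))
action = torch.distributions.Categorical(logits=action_logits).sample()
```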

RL Platform – Parallel Training

• Large-scale parallel self-play
• Solution
◇ Docker images for gamecore version management
◇ Parallel training framework

RL Platform – Automatic Task Management

• Unified framework for model analysis and evaluation
◇ Task submission
◇ Task start/stop
◇ Results visualization: reward curves, radar charts, prediction distributions, self-play results

RL Platform – Performance

• Ten million scenarios per day
◇ 20 s per scenario with 16 GPUs
• Millions of full games
◇ 10+ min per game with 128 GPUs
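As a quick sanity check, the scenario throughput implies roughly this much concurrency (assuming scenarios run back-to-back around the clock):

```python
# ~10M scenarios/day at 20 s each implies ~2,300 scenarios in flight.
scenarios_per_day = 10_000_000
seconds_per_scenario = 20
print(scenarios_per_day * seconds_per_scenario / 86_400)  # ~2314.8
```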

Demonstration

Visualization

Demo – Quadra-kill Under Turret

• Micro combat
◇ Fight against mid-to-high-level testers
◇ Killing while avoiding harm from the turret

Demo – Pentakill

• Micro combat
◇ Fight against mid-to-high-level testers

Demo – Transfer & Strategy

• Opening

Demo – Transfer & Strategy

• The first Dragon appears at 2:00

Demo – Transfer & Strategy

• Besiege and destroy the base

Demo – RL

Before reinforcement learning vs. after reinforcement learning

Summary

Tencent Game AI Research

• Pursue general AI via game AI research
• MOBA AI
◇ Algorithm: imitation + reinforcement learning
◇ Computing platform
· Feature extraction platform: millions of CPU cores
· Model training platform: thousands of GPUs
· Reinforcement learning platform: hierarchical RL

Tencent Game AI Research

• Future work
◇ Algorithm
· Tactic-level search and planning
· Multi-agent RL
◇ Computational power
· Search/planning platform: MCTS
· Reinforcement learning platform: multi-agent RL

About Tencent AI Lab

Our journey
◇ 2016.4: Tencent establishes its corporate-level AI Lab
◇ 2017.3: Tencent announces leading AI researcher Dr. Tong Zhang as the Director of Tencent AI Lab
◇ 2017.3: "Jueyi" (Fine Art) wins the UEC World Cup
◇ 2017.5: Tencent establishes its Seattle AI Lab and announces leading speech recognition expert Dr. Dong Yu as Deputy Director
◇ 2017.11: Tencent is identified by the China Ministry of Science and Technology as a national open innovation platform for AI medical imaging
◇ Today: our team consists of 70 world-class AI scientists and 300 research engineers

About Tencent AI Lab

Game AI: a diverse game ecosystem; an environment for AGI

Social AI: a massive user base (WeChat: ~1 billion MAU; QQ: 850 million MAU); new ways to communicate

Content AI: China's leading news, video, music, and literature platforms; perceiving the world and generating content

Medical AI: building a national open innovation platform for AI medical imaging; impacting and advancing the industry

Thank you

References

• [1] Silver, David, et al. "Mastering the game of Go with deep neural networks and tree search." Nature 529.7587 (2016): 484-489.
• [2] "Artificial Intelligence Startup Landscape Trends and Insights - Q4 2016." Venture Scanner, November 20, 2016. https://www.venturescanner.com/blog/2016/artificial-intelligence-startup-landscape-trends-and-insights-q4-2016
• [3] Tian, Yuandong, et al. "ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games." arXiv preprint arXiv:1707.01067 (2017).
• [4] Vinyals, Oriol, et al. "StarCraft II: A New Challenge for Reinforcement Learning." https://deepmind.com/research/publications/starcraft-ii-new-challenge-reinforcement-learning/. August 9, 2017.
• [5] "We've created an AI which beats the world's top professionals at 1v1 matches of Dota 2." https://blog.openai.com/dota-2/
• [6] Ontañón, Santiago, Gabriel Synnaeve, Alberto Uriarte, Florian Richoux, David Churchill, and Mike Preuss. "RTS AI: Problems and Techniques." (2015): 1-12.
• [7] Miles, Chris, and Sushil J. Louis. "Co-evolving real-time strategy game playing influence map trees with genetic algorithms." Proceedings of the International Congress on Evolutionary Computation, Portland, Oregon. IEEE Press, 2006.
• [8] Jang, Su-Hyung, and Sung-Bae Cho. "Evolving neural NPCs with layered influence map in the real-time simulation game 'Conqueror'." Computational Intelligence and Games (CIG'08), IEEE Symposium on. IEEE, 2008.
• [9] Weber, Ben George, Michael Mateas, and Arnav Jhala. "Building Human-Level AI for Real-Time Strategy Games." AAAI Fall Symposium: Advances in Cognitive Systems. Vol. 11. 2011.
• [10] Shi, Xingjian, et al. "Convolutional LSTM network: A machine learning approach for precipitation nowcasting." Advances in Neural Information Processing Systems. 2015.
• [11] Luehr, Nathan. "NCCL: Accelerated Collective Communications for GPUs." GPU Technology Conference, April 5, 2016.
• [12] "CUDA Multi-Process Service." https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf