Large-Scale Platform for MOBA Game AI
Bin Wu & Qiang Fu
28th March 2018

Outline

• Introduction

• Learning algorithms

• Computing platform

• Demonstration

Game AI Development

1950s-1960s, early exploration: a checkers program beat a state champion
1970s-1980s, transition: Chess 4.5 beat human players
1990s-2000s, rapid development: Deep Blue (IBM) beat Garry Kasparov
2010s, explosive growth: AlphaGo (DeepMind) defeated Lee Sedol and Ke Jie

Applications of Game AI

Gaming: core applications in the gaming industry
◇ Pre-game procedures, e.g., game design
◇ Player experience, e.g., AI teammates and enemies
◇ Others, e.g., e-sports

Research: an ideal testbed for general AI research
◇ Massive data from human players
◇ Low experimental costs
◇ General abilities of perception and decision
◇ From the virtual world to the real world

Game AI Research Topic

Game AI has become a hot research topic since the success of AlphaGo [1]

• Many AI giants have joined game AI research
• Moving from Go to RTS, MOBA, etc.
◇ OpenAI's Dota 2 bot beat top human players in 1v1; 5v5 is planned for 2018 [5]
◇ StarCraft AI platforms have been released, with preliminary results in simple scenarios [3]; the StarCraft II AI platform is not yet able to defeat the built-in AI [4]

MOBA Game

• 5 vs. 5 game: obtain gold/exp → gain equipment advantages → win fights → destroy the enemy's base

Goal: destroy the enemy's base
◇ Enemy turrets
◇ Attack/skill control
◇ Movement control
◇ Equipment purchase
◇ Neutral creeps: sources of money/power/levels/…

MOBA Game

• Micro combat
◇ Movement
◇ Use of skills

MOBA Game

• Macro strategies
◇ Backing up
◇ Laning
◇ Ganking
◇ Stealing the base

MOBA AI - Key Challenges

◇ Learning algorithm
◇ Computing platform

Learning Algorithms

Learning Algorithms - Challenges

1. Complexity: state space ~10^20000
2. Multi-agent: 5v5 coordination
3. Sparse and delayed rewards: 20,000+ frames per game
4. Imperfect information: partially observable maps

Learning Algorithms - Challenges

• Complexity >> Go
◇ End-to-end solutions (SL/RL) do not work well: the agent cannot even finish basic movement and attacks. Similar observations were made by DeepMind [4].

Go
◇ State space: 3^361 ≈ 10^172 (361 positions, 3 states each)
◇ Action space: 250^150 ≈ 10^360 (250 positions available, ~150 decisions per game on average)

MOBA
◇ State space: ~10^20000 (10 heroes, 2000+ positions × 10+ states each)
◇ Action space: 20^20000 ≈ 10^26000 (20 actions per frame: left, right, …, skill 1/2/3 + position/target, recover, return, etc.; 20,000 frames per game)
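These magnitudes are easy to double-check in log space; a minimal sketch:

```python
# Sanity-check the magnitudes above in log10 space, using the bases and
# exponents quoted on this slide.
import math

def log10_pow(base: int, exp: int) -> float:
    """log10(base**exp) without materializing the huge integer."""
    return exp * math.log10(base)

print(f"Go states:    3^361    ~ 10^{log10_pow(3, 361):.0f}")     # ~10^172
print(f"Go actions:   250^150  ~ 10^{log10_pow(250, 150):.0f}")   # ~10^360
print(f"MOBA actions: 20^20000 ~ 10^{log10_pow(20, 20000):.0f}")  # ~10^26021
```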

Learning Algorithms - Challenges

• Multi-agent
◇ Macro strategy level: four defending while one steals the base
◇ Micro combat level: tanks protecting assassins

Learning Algorithms - Challenges

• Sparse and delayed rewards

Go: <360 steps per game
MOBA: >20,000 steps per game

Learning Algorithms - Challenges

• Imperfect information
◇ Maps are partially observable: guess the enemy's positions/strategy; actively explore to gain information

Model Architecture

• Divide and conquer: split the problem into three modules, Strategy, Transfer, and Combat
◇ Splitting simplifies the problem: solution space reduced from ~10^20000 to ~10^2000

Model - Transfer

• Where to send heroes?
◇ Compared to the Go game: treat heroes as stones and the map as the board
◇ Predict good positions (hotspot prediction; a sketch follows)
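A minimal sketch of what such a board-style hotspot predictor could look like (PyTorch; the input planes and 64×64 minimap size are illustrative assumptions, not the authors' actual model):

```python
# Treat the minimap like a Go board: stack per-entity feature planes and
# predict a per-cell probability of being a good position to move to.
import torch
import torch.nn as nn

class HotspotNet(nn.Module):
    def __init__(self, in_planes: int = 16):   # e.g., hero/turret/creep planes
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_planes, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 1, 1),                # one logit per map cell
        )

    def forward(self, x):                       # x: (batch, planes, H, W)
        return self.body(x).flatten(1)          # logits over H*W cells

net = HotspotNet()
logits = net(torch.randn(1, 16, 64, 64))        # hypothetical 64x64 minimap
probs = torch.softmax(logits, dim=1)            # "hotspot" distribution
```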

Model - Strategy

• Key resources in MOBA
◇ Modeling macro objectives: describe the hotspot transition series before each key resource is destroyed

Model - Strategy

Example of macro session splitting: describe the hotspot transition series before each key resource is destroyed.

◇ Key resources marking session boundaries, e.g.: Start → Dragon → Mid 1st turret → Dark Slayer dragon → Mid 2nd turret → Mid 3rd turret → Bottom 1st turret → Base
◇ Hotspots within a session, e.g.: stealing the blue creep, killing bottom-lane creeps, attacking the bottom 1st turret
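A minimal sketch of the session-splitting idea (the event names are hypothetical; the real system works on replay data):

```python
# Split one game's event stream into sessions, each closed by the
# destruction of a key resource; the hotspots inside a session describe
# how the team worked toward that objective.
KEY_RESOURCES = {"dragon", "mid_1st_turret", "bottom_1st_turret", "base"}

events = [  # hypothetical (time_in_seconds, event) stream from a replay
    (30, "stealing_blue_creep"), (95, "dragon"),
    (140, "killing_bottom_creeps"), (210, "bottom_1st_turret"),
]

sessions, current = [], []
for t, ev in events:
    current.append((t, ev))
    if ev in KEY_RESOURCES:      # key resource destroyed -> session boundary
        sessions.append(current)
        current = []

for s in sessions:
    print([ev for _, ev in s])
# ['stealing_blue_creep', 'dragon']
# ['killing_bottom_creeps', 'bottom_1st_turret']
```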

Model - Transfer Network with Macro Strategy

(Figure: the transfer network predicts key resources and hotspots.)

Model - Combat

• Multi-task over control buttons
◇ Action space: move directions, skills, skill release position/target (a sketch of the multi-head output follows)
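A minimal sketch of the multi-task ("multi-head") action output (PyTorch; the dimensions and the three heads shown are illustrative assumptions):

```python
# One shared trunk with separate heads: movement direction, skill button,
# and skill release position, predicted jointly from the same features.
import torch
import torch.nn as nn

class MultiHeadPolicy(nn.Module):
    def __init__(self, obs_dim=512, n_dirs=8, n_skills=4, n_cells=64 * 64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU())
        self.dir_head = nn.Linear(256, n_dirs)      # which direction to move
        self.skill_head = nn.Linear(256, n_skills)  # which skill button to press
        self.target_head = nn.Linear(256, n_cells)  # where to release the skill

    def forward(self, obs):
        h = self.trunk(obs)
        return self.dir_head(h), self.skill_head(h), self.target_head(h)

dirs, skills, targets = MultiHeadPolicy()(torch.randn(1, 512))
```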

Learning Framework

• Imitation learning + reinforcement learning (a toy sketch follows)
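A toy sketch of the two-stage recipe (all data here are random stand-ins; the real pipeline trains on human replays and self-play games):

```python
# Stage 1 imitates human replay actions with cross-entropy;
# stage 2 fine-tunes the same policy with a REINFORCE-style update.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 4))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for _ in range(100):                              # imitation phase
    obs = torch.randn(64, 8)                      # stand-in for replay features
    act = torch.randint(0, 4, (64,))              # stand-in for human actions
    loss = nn.functional.cross_entropy(policy(obs), act)
    opt.zero_grad(); loss.backward(); opt.step()

for _ in range(100):                              # reinforcement phase
    obs = torch.randn(64, 8)                      # stand-in for self-play states
    dist = torch.distributions.Categorical(logits=policy(obs))
    act = dist.sample()
    reward = torch.randn(64)                      # stand-in for game outcomes
    loss = -(dist.log_prob(act) * reward).mean()  # REINFORCE objective
    opt.zero_grad(); loss.backward(); opt.step()
```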

Computing Platform

MOBA Game AI Platform

• Computing platform
◇ Computational power: large-scale CPU/GPU virtualization
◇ Learning platform: efficient and easy to use

Pipeline: game environment → feature extraction → model training → reinforcement learning → service deployment

◇ Task management: Tencent cloud function, machine learning platform
◇ Resource allocation: Kubernetes, elastic computation resource allocation
◇ Deployment: Docker with mixed online/offline deployment; Docker + GPU virtualization for shared resources; idle resource pool shared by online and offline services
◇ Computational units: millions of CPU cores, thousands of GPUs

Computational Power

• Computational costs: MOBA AI demands thousands of GPUs and millions of CPU cores

CPU/GPU demands: the more, the better
Challenge: improve resource utilization efficiency without additional costs
Solution: CPU/GPU virtualization for shared resources

CPU Virtualization

• Elastic and dynamic resource pool ◇ millions of CPU cores

70% from the idle resource pool
◇ New resources not yet delivered
◇ Old resources not yet cleared
◇ Returned resources

30% from idle slots in online services
◇ Average online-service CPU usage is only ~20%
◇ Docker isolation raises utilization from 20% to 65%

GPU Virtualization

• Goal: improve GPU usage efficiency
• Resource usage: thousands of GPUs; 65% of machines under low load; average GPU usage 28%
• Optimization idea: GPU virtualization

GPU virtualization via CUDA Multi-Process Service [12]
◇ Time-slice sharing
◇ Parallel sharing

Learning Platform

Core technique | Version update frequency
Feature extraction | hours
Model training | one day
RL training | one day

Learning Platform - Feature Extraction Platform

Pipeline: game replays → (gamecore) raw data → (pre-processing, feature extraction) features → (shuffle) training samples → (training) models → evaluation

• Demand 1
◇ Feature extraction from up to hundreds of thousands of replays
◇ Challenge: demands up to 210 thousand CPU cores per day
◇ Solution: CPU virtualization; Docker elastic and dynamic resource pool

• Demand 2
◇ Multiple tasks, each with millions of entries
◇ Challenge: parallel task scheduling
◇ Solution: Tencent Serverless Cloud Function

Learning Platform - Serverless Cloud Function

Advantages of Cloud Function
◇ Function as a Service
◇ Millions of CPU cores available
◇ Free of charge in idle slots: ~30% of the usual cost on average

Application layer: SDKs, COS, CMQ, …
Access layer: API
Control layer: function configuration, function calls, function coordination
Execution layer: functions
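A minimal sketch of a feature-extraction task written as a cloud function (the handler signature, event fields, and storage stubs are illustrative assumptions, not Tencent SCF's actual interface):

```python
# Each invocation processes one replay: download, extract features, upload.
import json

STORE = {}  # in-memory stand-in for object storage (e.g., COS)

def download(key):      return STORE[key]
def upload(key, data):  STORE[key] = data

def extract_features(replay_bytes: bytes) -> list:
    """Placeholder: parse one replay into per-frame feature rows."""
    return [{"frame": i, "features": []} for i in range(len(replay_bytes) // 1024)]

def handler(event, context):
    key = event["replay_key"]                        # one replay per invocation
    rows = extract_features(download(key))
    upload(key + ".features", json.dumps(rows))
    return {"rows": len(rows)}

STORE["replay_0001"] = bytes(10 * 1024)              # fake 10 KB replay
print(handler({"replay_key": "replay_0001"}, None))  # {'rows': 10}
```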

Learning Platform - Model Training Platform

1. Requirements
◇ Billions of samples per task
◇ Fast model training

2. Solution
◇ Multi-GPU, multi-machine training
◇ Machine learning platform

3. Challenges
◇ IO: big data, efficient data input, efficient computation
◇ Communication: efficient parameter exchange

Model Training Platform - IO

Data IO
◇ Multiprocessing
◇ "Lock-free" queue (see the sketch below)

Efficient computation
◇ Data pre-caching
◇ OP speed-up via multi-threading
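A minimal sketch of the pre-caching pattern using only the standard library (a plain multiprocessing.Queue stands in for the platform's lock-free queue):

```python
# A producer process decodes batches ahead of time and parks them in a
# bounded queue, so the consumer (the training loop) never waits on IO.
import multiprocessing as mp

def producer(q):
    for i in range(100):
        q.put([i] * 32)          # stand-in for a decoded training batch
    q.put(None)                  # sentinel: no more data

if __name__ == "__main__":
    q = mp.Queue(maxsize=8)      # bounded pre-cache of ready batches
    p = mp.Process(target=producer, args=(q,))
    p.start()
    while (batch := q.get()) is not None:
        _ = sum(batch)           # stand-in for a training step on the batch
    p.join()
```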

Model Training Platform - Communication

Parameter exchange
◇ NCCL2 [11]: efficient communication between GPUs
◇ RDMA: efficient communication across nodes
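For illustration, a minimal all-reduce sketch with PyTorch's NCCL backend (assumes one process per GPU, started by a distributed launcher that sets the usual rank/world-size environment variables):

```python
# Average gradients across workers after backward(); NCCL handles the
# GPU-to-GPU collective, and RDMA-capable fabric speeds up cross-node hops.
import torch
import torch.distributed as dist

def average_gradients(model: torch.nn.Module):
    world = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world

# Called once per worker at startup:
# dist.init_process_group(backend="nccl")
```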

Model Training Platform - Performance

Multi-GPU, multi-machine speed-up

(Figure: acceleration ratio for 1, 8, 16, 32, and 64 GPUs; IO, computation, and communication optimizations shown before and after, against the linear upper bound.)

Learning Platform - Reinforcement Learning Platform

• Demands
◇ Hierarchical RL: various scenarios
◇ Large-scale parallel self-play: millions of games
◇ Automatic task management: a unified framework for model analysis and evaluation

RL Platform - Hierarchical RL

• Hierarchical RL
◇ Scenario-specific sub-tasks: jungling, creep clearing, team fights, laning

• Solution
◇ General hierarchical RL

• Features
◇ Macro task selection
◇ Micro task selection (see the sketch below)
◇ Effectively handles long-term planning and delayed rewards
◇ Value network for guiding sub-task policy learning
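A minimal sketch of the macro/micro split (toy dimensions; the actual hierarchy, sub-task set, and value network are the platform's own):

```python
# The macro policy picks a sub-task (jungle, lane, team fight, ...);
# the micro policy, conditioned on that sub-task, picks the concrete action.
import torch
import torch.nn as nn

N_TASKS, N_ACTIONS, OBS = 4, 16, 64
macro = nn.Linear(OBS, N_TASKS)                # which sub-task to pursue
micro = nn.Linear(OBS + N_TASKS, N_ACTIONS)    # what to do within it

obs = torch.randn(1, OBS)
task = torch.distributions.Categorical(logits=macro(obs)).sample()
task_vec = nn.functional.one_hot(task, N_TASKS).float()
action_logits = micro(torch.cat([obs, task_vec], dim=-1))
action = torch.distributions.Categorical(logits=action_logits).sample()
```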

RL Platform – Parallel Training

• Large-scale parallel self-play
• Solution
◇ Docker images for gamecore version management
◇ Parallel training framework

RL Platform – Automatic Task Management

• Unified framework for model analysis and evaluation
◇ Task submission
◇ Task start/stop
◇ Results visualization: reward curves, radar charts, prediction distributions, self-play results

RL Platform – Performance

• Ten million scenarios per day
◇ 20 s per scenario with 16 GPUs
• Millions of full games
◇ 10+ min per game with 128 GPUs
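As a quick sanity check, the scenario throughput implies roughly this much concurrency (assuming scenarios run back-to-back around the clock):

```python
# ~10M scenarios/day at 20 s each implies ~2,300 scenarios in flight.
scenarios_per_day = 10_000_000
seconds_per_scenario = 20
print(scenarios_per_day * seconds_per_scenario / 86_400)  # ~2314.8
```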

Demonstration

Visualization

Demo – Quadra-kill Under Turret

• Micro combat
◇ Fight against mid-to-high-level testers
◇ Killing while avoiding harm from the turret

Demo – Pentakill

• Micro combat
◇ Fight against mid-to-high-level testers

Demo – Transfer & Strategy

• Opening

Demo – Transfer & Strategy

• The first Dragon appears at 2:00

Demo – Transfer & Strategy

• Besiege and destroy the base

Demo – RL

Before reinforcement learning vs. after reinforcement learning

Summary

Tencent Game AI Research

• Pursue general AI via game AI research
• MOBA AI
◇ Algorithm: imitation + reinforcement learning
◇ Computing platform
· Feature extraction platform: millions of CPU cores
· Model training platform: thousands of GPUs
· Reinforcement learning platform: hierarchical RL

Tencent Game AI Research

• Future work
◇ Algorithm
· Tactic-level search and planning
· Multi-agent RL
◇ Computational power
· Search/planning platform: MCTS
· Reinforcement learning platform: multi-agent RL

About Tencent AI Lab

Our journey
◇ 2016.4: Tencent establishes its corporate-level AI Lab
◇ 2017.3: Tencent announces leading AI researcher Dr. Tong Zhang as the Director of Tencent AI Lab
◇ 2017.3: "Jueyi" (Fine Art) wins the UEC World Cup
◇ 2017.5: Tencent establishes its Seattle AI Lab and announces leading speech recognition expert Dr. Dong Yu as Deputy Director
◇ 2017.11: Tencent is identified by the China Ministry of Science and Technology as a national open innovation platform for AI medical imaging
◇ Today: our team consists of 70 world-class AI scientists and 300 research engineers

About Tencent AI Lab

Game AI: a diverse game ecosystem; an environment for AGI

Social AI: a massive user base (WeChat: ~1 billion MAU; QQ: 850 million MAU); new ways to communicate

Content AI: China's leading news, video, music, and literature platforms; perceiving the world and generating content

Medical AI: building a national open innovation platform for AI medical imaging; impacting and advancing the industry

Thank you

References

• [1] Silver, David, et al. "Mastering the game of Go with deep neural networks and tree search." Nature 529.7587 (2016): 484-489.
• [2] "Artificial Intelligence Startup Landscape Trends and Insights - Q4 2016." Venture Scanner, November 20, 2016. https://www.venturescanner.com/blog/2016/artificial-intelligence-startup-landscape-trends-and-insights-q4-2016
• [3] Tian, Yuandong, et al. "ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games." arXiv preprint arXiv:1707.01067 (2017).
• [4] Vinyals, Oriol, et al. "StarCraft II: A New Challenge for Reinforcement Learning." https://deepmind.com/research/publications/starcraft-ii-new-challenge-reinforcement-learning/. August 9, 2017.
• [5] "We've created an AI which beats the world's top professionals at 1v1 matches of Dota 2." https://blog.openai.com/dota-2/
• [6] Ontañón, Santiago, Gabriel Synnaeve, Alberto Uriarte, Florian Richoux, David Churchill, and Mike Preuss. "RTS AI: Problems and Techniques." (2015): 1-12.
• [7] Miles, Chris, and Sushil J. Louis. "Co-evolving real-time strategy game playing influence map trees with genetic algorithms." Proceedings of the International Congress on Evolutionary Computation, Portland, Oregon. IEEE Press, 2006.
• [8] Jang, Su-Hyung, and Sung-Bae Cho. "Evolving neural NPCs with layered influence map in the real-time simulation game 'Conqueror'." Computational Intelligence and Games (CIG'08), IEEE Symposium on. IEEE, 2008.
• [9] Weber, Ben George, Michael Mateas, and Arnav Jhala. "Building Human-Level AI for Real-Time Strategy Games." AAAI Fall Symposium: Advances in Cognitive Systems. Vol. 11. 2011.
• [10] Shi, Xingjian, et al. "Convolutional LSTM network: A machine learning approach for precipitation nowcasting." Advances in Neural Information Processing Systems. 2015.
• [11] Luehr, Nathan. "NCCL: Accelerated Collective Communications for GPUs." GPU Technology Conference, April 5, 2016.
• [12] "CUDA Multi-Process Service." https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf