
TRAJECTORY PREDICTION THROUGH DEEP REINFORCEMENT LEARNING IN

A Thesis Presented

By

Tribhuwan Singh

to

the College of Engineering

in partial fulfillment of the requirements

for the degree of

Master of Science

in the field of

Data Analytics Engineering

Northeastern University

Boston, Massachusetts

May 2021


ABSTRACT

This thesis was carried out to develop a playable 3D game. The project is a trajectory prediction shooting game in which the player has to search for the goal (star) inside a polytope and shoot the ball at the goal. There are a few obstacles inside the polytope, and they spawn randomly. The player must keep hitting the goal as many times as possible; if the player misses the goal, the game is over. This thesis investigates how artificial intelligence techniques can be used to predict the trajectory of the ball.

The game contains two main scenes: the Graphical User Interface (GUI) scene and the game scene. The first part of the thesis describes the practicalities of the game. From the GUI scene the player can play, control the physics of the ball, get help from the prediction line, and start or end the game.

The second part of the thesis introduces the vast field of artificial intelligence. Topics introduced include Reinforcement Learning techniques, agent simulation, and Imitation Learning.

We believe our approach demonstrates an artificial agent that can learn a conditioned trajectory to model human motion inside a polytope using imitation learning.


ACKNOWLEDGEMENTS

I would like to wholeheartedly thank Dr. Tony Mullen and Dr. Sagar Kamarthi for mentoring me throughout the course of this thesis and for providing me with invaluable suggestions, feedback, and encouragement, without which I would not have been able to complete this thesis. I am thankful to them for their constant help and support.


LIST OF FIGURES

2.1 Unity Project Structure ...... 12

2.2 Unity High Level Architecture ...... 13

2.3 Unity Domain Model Diagram ...... 14

2.4 Design of the Monobehaviour class ...... 15

2.5 Execution order of Event Functions ...... 17

3.1 Visual Depiction of Learning Environment ...... 19

3.2 The reinforcement Learning cycle ...... 20

3.3 Objective Function of PPO ...... 21

3.4 Demonstration recorder component ...... 23

3.5 Demonstration Inspector ...... 24

4.1 Illustration of a Reinforcement Learning Architecture .... 28

4.2 Illustration of the Convolutional Neural Network ...... 30

5.1 Final implementation of the game ...... 34

5.2 Cumulative reward over each step in the environment .... 35

5.3 Episode length over each step in the environment ...... 36

5.4 Entropy over each step in the environment ...... 37

5.5 Learning rate over each step in the environment ...... 38

5.6 Policy loss over each step in the environment ...... 39

5.7 Value estimate over each step in the environment ...... 40

5.8 Value loss over each step in the environment ...... 41

8.1 Demonstration of the game played by the trained agent .. 47

8.2 GAIL Value loss over each step in the environment ...... 47

8.3 Extrinsic Reward over each step in the environment ...... 48

8.4 GAIL Policy Estimate over each step in the environment .. 48

8.5 GAIL Reward over each step in the environment ...... 49


LIST OF TABLES

4.1 Data Table List of hyperparameters ...... 29

4.2 Data Table List of Network Setting ...... 31

4.3 Data Table List of Reward Signal ...... 32

4.4 Data Table List of Evaluation Setting ...... 33


LIST OF ABBREVIATIONS

GAIL Generative Adversarial Imitation Learning

PPO Proximal Policy Optimization

CNN Convolutional Neural Network

IL Imitation Learning

RL Reinforcement Learning

ML Machine Learning

IMPALA Importance Weighted Actor-Learner Architecture

LSTM Long Short-Term Memory


TABLE OF CONTENTS

ABSTRACT ...... ii

ACKNOWLEDGEMENTS ...... iii

LIST OF FIGURES ...... iv

LIST OF TABLES ...... vi

LIST OF ABBREVIATIONS ...... vii

1. INTRODUCTION ...... 10
2. UNITY3D ASSET DEVELOPMENT ...... 11
2.1. Architecture ...... 11
2.1.1. High-Level Architecture ...... 12
2.1.2. Scene Management ...... 13
2.2. Unity Lifecycle ...... 14
3. BACKGROUND ...... 18
3.1. ML-Agents ...... 18
3.2. Reinforcement Learning ...... 20
3.3. Proximal Policy Optimization ...... 21
3.4. Imitation Learning ...... 22
3.5. Visual Encoding ...... 25
4. METHOD ...... 26
4.1. Preprocessing ...... 26
4.2. Model Architecture ...... 27
4.2.1. Network Architecture ...... 27
4.2.2. Visual Encoder Architecture ...... 29
4.3. Training Details ...... 31
4.4. Evaluation Procedure ...... 32
5. RESULTS ...... 34
5.1. Cumulative Reward ...... 35
5.2. Episode Length ...... 36
5.3. Entropy ...... 37
5.4. Learning Rate ...... 38
5.5. Policy Loss ...... 39
5.6. Value Estimate ...... 40
5.7. Value Loss ...... 41
6. CONCLUSION ...... 42
6.1. Future Work ...... 42
7. REFERENCES ...... 44
8. APPENDIX ...... 47

1. Introduction

The idea of reinforcement learning is to train an agent to take actions that maximize the expected reward within an environment. The objective is to train agents to optimize their control of an environment when they are confronted with a complex task. Reinforcement learning has been successful in a variety of domains, and we want to take that a step further by developing an agent that can learn successful policies directly from sensory observations. To obtain a faster and better agent, we simply demonstrated the behavior we wanted the agent to perform by providing real-world examples from the game. Therefore, we used Generative Adversarial Imitation Learning (GAIL) in parallel with Proximal Policy Optimization (PPO) to ensure an optimal action.

We demonstrated that our agent is capable of learning a conditioned trajectory to model human motion inside a polytope using imitation learning.


2. Unity3D Asset Development

“Unity3D is a cross-platform game engine with a built-in IDE developed by Unity Technologies. It is generally used to develop video games for computer platforms such as web and desktop, consoles, and mobile devices, and is applied by several million developers in the world. Unity is primarily used to create mobile and web games, but there are various games being developed for PC. The game engine was programmed in C/C++ and can support code written in C#, JavaScript, or Boo. It grew from an OS X-supported game development tool in 2005 to the multi-platform game engine that it is today [1].”

“Unity is the perfect choice for small studios, indie developers, and those of us who have always wanted to make games. Its large user base and extremely active user community allow everyone from newbies to seasoned veterans to get answers and share information quickly [1].”

“Unity provides an excellent entry point into game development, balancing features and functionality with price point. The free version of Unity allows people to experiment, learn, develop, and sell games before committing any of their hard-earned cash. Unity also offers a very affordable, feature-packed Pro version that is royalty-free, allowing people to make and sell games with the very low overhead essential to the casual games market [1].”

2.1 Architecture

Unity3D is a commercial tool and a closed commercial engine. It integrates Animation Mechanics, Character Mechanics, Player Mechanics, Environment Mechanics, and developer programming. All Unity game projects follow the same model. Every game is component-based and contains at least one scene. The scene covers game objects. The components inside a game object can be scripts or predefined structures which are handled by the engine based on the game settings. A game can contain third-party libraries or toolkits which are located in the assets folder.

Figure 2.1: Unity Project Structure.

A folder represents a project and is recognized by the editor. Any changes made to a script keep the metadata updated in the script metafile. These metadata are constantly tracked by Unity, and the flow is automatic. Component-based design is supported by the Game Loop and Update method patterns.

2.1.1 High-Level Architecture

The Unity game engine was programmed in C/C++, but the logic is managed using C# or UnityScript. The drawback of using two languages freely is that one cannot refer to the other; therefore, one language should be used at a time to keep a consistent workflow.

Figure 2.2: Unity High Level Architecture.

The code gets generated in the game assembly whenever code is generated from an editor extension. This level of detail regarding Unity’s high-level architecture is enough for our thesis’s goal.

2.1.2 Scene Management

A project in Unity organizes files by type, while the game objects form a tree structure with parent-child relationships between them. Object is the root class, and all the things in the engine extend this class.

Every object in a scene is linked to a GameObject. A game object contains components and has a one-to-many relationship with them. While a Component is attached to a GameObject compositely, many inheritance hierarchy levels exist under Component. Transform is a special component that is mandatory for game objects.


Figure 2.3: Unity Domain Model Diagram.

Unity is a component-based game engine; therefore, during scene creation, the user can reach the GameObject and Transform objects. By default, each scene comes with a GameObject called Main Camera with a Transform component, the Camera component, and an Audio Listener to pick up audio in the application.
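To make these relationships concrete, the following C# sketch (illustrative only; the object names are not taken from the thesis project) creates a game object at runtime, attaches a component to it, and reaches its mandatory Transform:

using UnityEngine;

// Illustrative sketch: runtime creation of a GameObject with components.
public class SceneSetupExample : MonoBehaviour
{
    void Start()
    {
        // Every GameObject is created with a Transform component by default.
        GameObject ball = new GameObject("Ball");
        ball.transform.position = new Vector3(0f, 1f, 0f);

        // Components are attached to a GameObject in a one-to-many relationship.
        Rigidbody body = ball.AddComponent<Rigidbody>();
        body.mass = 0.5f;

        // The scene's default Main Camera can be reached through the Camera component.
        Debug.Log("Main camera position: " + Camera.main.transform.position);
    }
}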

2.2 Unity Lifecycle

All the event functions of Unity are called at certain moments. Every script attached to a game object that extends MonoBehaviour can override these functions. MonoBehaviour inherits from the Behaviour class, and Behaviour inherits from the Component class.

There are a few functions involved in Unity’s lifecycle.

Figure 2.4: Design of the Monobehaviour class.

The following functions are called once per game object when the first scene loads.

• Awake(): If a game object is active during startup, it is always called first for the object when the script is attached or reset.
• OnEnable(): If the object is active, it is called right after the Awake method or whenever the script is enabled.

The following function is called before the first frame of the scene is rendered.

• Start(): It is called only once during the lifecycle and only if the script is enabled at initialization time.

The following functions are called while the frame updates.

• FixedUpdate(): It is called once per physics cycle.
• Update(): It is called once per frame and is used to regularly update the game logic.
• LateUpdate(): This is called after all the calculations in Update are completed. It is called once, at the end of the game logic cycle.

The following function is called when the behaviour becomes disabled or inactive.

• OnDisable(): It is called for any cleanup code and is called only once during the lifecycle. It is also called if the object is destroyed.

The following function is called when the object is destroyed.

• OnDestroy(): It is called in the last frame before the script is destroyed.
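The execution order described above can be traced directly in the Unity Console with a minimal C# sketch (illustrative only, not part of the thesis project):

using UnityEngine;

// Attach to any active GameObject to trace the MonoBehaviour lifecycle in the Console.
public class LifecycleLogger : MonoBehaviour
{
    void Awake()       { Debug.Log("Awake: called first when the object is loaded"); }
    void OnEnable()    { Debug.Log("OnEnable: called each time the script is enabled"); }
    void Start()       { Debug.Log("Start: called once, before the first frame update"); }
    void FixedUpdate() { Debug.Log("FixedUpdate: called once per physics cycle"); }
    void Update()      { Debug.Log("Update: called once per rendered frame"); }
    void LateUpdate()  { Debug.Log("LateUpdate: called after all Update calls finish"); }
    void OnDisable()   { Debug.Log("OnDisable: called when the script is disabled or destroyed"); }
    void OnDestroy()   { Debug.Log("OnDestroy: called just before the object is destroyed"); }
}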

Figure 2.5: Execution order of Event Functions.


3. BACKGROUND

Unity is one of the most popular game engines because of its ability to deploy a single application to multiple platforms with minimal programming effort. The developer community is very active, and the developer forums are full of helpful threads. Unity3D provides a large variety of textures, game avatars, and plugins through its asset store, which aids the development of applications.

In Unity, every action and event is required to be scripted; however, Unity provides a large collection of useful examples and segments to help anyone. The visual environment for making a game is the Scene view. Most scripts are bound to objects or prefabs in order to work.

In this thesis, we introduced a field of artificial intelligence including Reinforcement Learning techniques, agent simulation, and Imitation Learning. These techniques are implemented in the game to help our player learn a conditioned trajectory to model human motion inside a polytope.

3.1 ML-Agents

Unity offers tools not only to create virtual simulated environments with customizable physics, landscapes, and characters but also to train reinforcement learning (RL) agents against those environments. Unity provides a software development kit (SDK), ML-Agents, as open-source software that allows us to train intelligent agents using Deep Reinforcement Learning, Imitation Learning, Neuro-Evolutionary Strategies, or other machine learning methods through a simple-to-use Python API.

Figure 3.1: Visual Depiction of Learning Environment [2].

There are three main kinds of objects within any Le arning Environment.

• Agent: An agent acts as an actor that has certain states and performs some actions within the environment based on some reward.
• Brain: This controls the decisions regarding the state and action space of the linked agent. The modes vary as External (decisions made using an ML library through the Python API), Internal (decisions made using a trained model), Player (decisions made using player input), or Heuristic (decisions made as coded in the script).
• Academy: This controls the scope of a single environment. The academy defines the Engine Configuration, Frameskip, and the Global episode length.


3.2 Reinforcement Learning

Reinforcement Learning (RL) [3] is an area of Machine Learning (ML) that allows us to train agents to take actions in an environment to maximize a long-term objective. The objective varies according to the game developed. In our thesis, we want the agent to navigate the ball to the goal as many times as it can.

Unlike supervised learning, RL assigns a reward to the agent rather than labeled targets, thus mapping observations to actions. The observations are what an agent measures from its environment using sensory inputs or ray casts. The actions vary from translating the position of the agent and rotating the agent to controlling the power of the ball.

Figure 3.2: The reinforcement Learning cycle [4].

In a traditional RL problem, the agent is trained to learn a policy that maximizes its cumulative reward. While training an agent, rewards are provided to the agent indicating how well it has done at completing a task. The rewards can be hand-coded according to the overall result or can be intrinsic. When training a Unity agent, a small negative reward is added at every step. The agent learns the objective because it receives a large positive reward when it completes an action favorable to the objective.

The agent’s behavior is subject to the constraints of the environment [5]. These limit the viable agent behaviors and are the environmental forces the agent must learn to deal with to obtain a high reward.

3.3 Proximal Policy Optimization

Proximal Policy Optimization (PPO) is a new class of reinforcement learning algorithms released by OpenAI. It is a state-of-the-art approach that is simple to implement and tune. PPO makes use of the policy gradient method, using multiple epochs of stochastic gradient ascent to perform minibatch updates. This optimization method is favored for being well balanced in terms of complexity, ease of tuning, and simplicity.

Figure 3.3: Objective Function of PPO [5]

• θ is the policy parameter.
• Ê_t denotes the empirical expectation over timesteps.
• r_t(θ) is the ratio of the probability under the new and old policies, respectively.
• Â_t is the estimated advantage at time t.
• ε is a hyperparameter, usually 0.1 or 0.2.
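In this notation, the clipped surrogate objective shown in Figure 3.3, as given by Schulman et al. [5], can be written as

    L^{CLIP}(θ) = Ê_t [ min( r_t(θ) Â_t , clip( r_t(θ), 1 − ε, 1 + ε ) Â_t ) ]

Taking the minimum with the clipped probability ratio keeps each policy update close to the previous policy, which is what makes PPO comparatively stable and easy to tune.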

The PPO algorithm is implemented in TensorFlow and communicates with the running Unity application over a socket.

3.4 Imitation Learning

It is sometimes very difficult for a Reinforcement Learning algorithm to teach a behavior to an agent via trial-and-error methods if the environment changes randomly at certain steps. Therefore, RL is often used in parallel with simply demonstrating the behavior we want the agent to perform, by providing real-world examples of observations and actions from the game. We use Generative Adversarial Imitation Learning (GAIL) to ensure an optimal action for the agent. Imitation Learning (IL) uses pairs of observations and actions from an expert demonstration to learn a policy [6].

With GAIL, two neural networks are used simultaneously. One is the actual policy that the agent acts on, and the other is a network called the discriminator. The role of the discriminator is to determine whether the actions or observations taken by the agent came from the RL policy or from the demonstration. The agent is then rewarded according to how often the discriminator judges its behavior to have come from the demonstration.

Figure 3.4: Demonstration recorder component

The Demonstration Recorder component is added to the agent in the scene. The scene is then played for a while, depending on the complexity of the game, and all the data is recorded and saved as a .demo file.
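For completeness, the same component can also be attached from code. The following C# sketch assumes a recent ML-Agents release and uses illustrative names and paths; in practice the recorder is configured directly in the Inspector, as described above:

using UnityEngine;
using Unity.MLAgents.Demonstrations;

// Illustrative sketch: configuring a DemonstrationRecorder from a script.
// Field names assume a recent ML-Agents release; values are examples only.
public class DemoRecorderSetup : MonoBehaviour
{
    void Awake()
    {
        var recorder = gameObject.AddComponent<DemonstrationRecorder>();
        recorder.Record = true;                              // capture expert play while the scene runs
        recorder.DemonstrationName = "ExpertDemo";           // saved as ExpertDemo.demo
        recorder.DemonstrationDirectory = "Assets/Demonstrations";
    }
}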

Figure 3.5: Demonstration Inspector

The demonstration file contains information regarding the observations, actions, and rewards given to the agent. It shows the number of episodes played and how much reward was amassed.


3.5 Visual Encoding

In a Unity scene, an agent can observe the environment around it in several ways. Adding more observations, on the one hand, slows down the training process and makes it more complex; however, it may significantly improve the policy at the cost of wall time. Unity offers a solution by allowing developers and researchers to gather visual observations and encode them to add an extra stack of observational parameters. Convolutional Neural Networks (CNN) are applied to the images generated from the cameras attached to the agent.
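As a sketch of how a visual observation is wired up (property names assume a recent ML-Agents release, and the resolution values are illustrative), a camera mounted on the agent can be exposed to the trainer through a CameraSensorComponent:

using UnityEngine;
using Unity.MLAgents.Sensors;

// Illustrative sketch: adding a camera-based visual observation to an agent.
public class VisualObservationSetup : MonoBehaviour
{
    public Camera agentCamera;   // camera attached to the agent

    void Awake()
    {
        var sensor = gameObject.AddComponent<CameraSensorComponent>();
        sensor.Camera = agentCamera;
        sensor.Width = 84;        // images are downscaled to this resolution before encoding
        sensor.Height = 84;
        sensor.Grayscale = false;
    }
}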

There are several choices for Visual encoders.

• simple: This is the default encoder type for encoding visual observations. It consists of two convolutional layers.
• nature_cnn: This implementation was proposed in a Nature article [8]. We used this encoder type in our thesis. It consists of three convolutional layers.
• resnet: This implementation uses the Importance Weighted Actor-Learner Architecture (IMPALA) ResNet, which consists of three stacked layers, each with two residual blocks, making it a much larger network than the other two [6].
• match3: This implementation is optimized for games where content creation is sequential, like Candy Crush, and can be used down to visual observation sizes of 5x5.


4. METHOD

4.1 Preprocessing

With the Unity ML-Agents toolkit, it is possible to train the behavior of an agent; however, we need to define a few entities present at every moment in the environment.

• Agents: An agent is a character attached to a Unity GameObject. All the actions are performed on the agent. It also handles receiving and assigning state rewards and collecting observations.
• Behavior: A behavior is like a function linked to an agent and identifies the attributes of the agent. It defines how many observations an agent can gather in a stacked vector. It also defines whether the agent takes continuous or discrete actions. There are three types of behavior: Learning, Heuristic, or Inference. A Learning behavior is one that is not yet defined but is about to be trained [7]. A Heuristic behavior is hard-coded, with the attributes of the agent defined in a script along with the events during episodes. An Inference behavior is one that includes an already trained model and infers from a neural network file. There can be multiple agents and multiple behaviors in an environment at the same time.
• Observations: Observations are an agent’s sense of the environment. These can be numerical and/or visual. Visual observations are the images generated by the agent and represent what the agent is seeing at that point in time [7]. These are encoded and processed by the network using CNNs. Numeric observations, on the other hand, measure attributes of the environment from the point of view of the agent. All the observations in a scene are sent through a communicator to a Python API, which processes the observations and sends back the appropriate actions.
• Actions: These define the actions an agent can take, which can be either continuous or discrete. Continuous actions are preferred for agents which tend to move freely in the environment. There is, however, a drawback to using continuous actions: in a complex environment they may take a greater number of steps to complete an episode compared to discrete actions, which are often used in a simpler world. To use a Long Short-Term Memory (LSTM) in the neural network, it is recommended to use discrete actions over continuous ones.
• Reward signals: Rewards are somewhat like incentives given to an agent indicating how well it has done the task. They need not be given every time an agent takes an action, and whenever the agent does something that is not preferred as optimal behavior, it is punished with a negative reward. The reward signal controls the actions of a state and needs to be implemented in a way that the cumulative reward is maximized and keeps increasing. A simplified agent script illustrating these entities is sketched after this list.
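The following C# sketch shows how these entities map onto the ML-Agents API. It is a simplified, hypothetical agent, not the exact script used in the game: the fields, reward values, and action meanings are illustrative, and the code assumes a recent ML-Agents release (the ActionBuffers API).

using UnityEngine;
using Unity.MLAgents;
using Unity.MLAgents.Sensors;
using Unity.MLAgents.Actuators;

// Hypothetical shooter agent: field names and reward values are illustrative only.
public class BallShooterAgent : Agent
{
    public Transform ball;
    public Transform goal;

    public override void OnEpisodeBegin()
    {
        // Reset the ball so every episode starts from a comparable state.
        ball.localPosition = new Vector3(0f, 0.5f, 0f);
    }

    public override void CollectObservations(VectorSensor sensor)
    {
        // Numeric observations: ball position and its offset to the goal.
        sensor.AddObservation(ball.localPosition);
        sensor.AddObservation(goal.localPosition - ball.localPosition);
    }

    public override void OnActionReceived(ActionBuffers actions)
    {
        // First continuous action: rotate the agent around the vertical axis.
        float rotate = actions.ContinuousActions[0];
        transform.Rotate(0f, rotate * 2f, 0f);
        // A second continuous action, actions.ContinuousActions[1], would scale
        // the force applied to the ball when it is shot.

        // Small per-step penalty encourages reaching the goal quickly; a goal hit
        // would call AddReward(+1f) and a miss would call EndEpisode() elsewhere.
        AddReward(-0.001f);
    }

    public override void Heuristic(in ActionBuffers actionsOut)
    {
        // Player-controlled fallback, used for testing and for recording demonstrations.
        var continuous = actionsOut.ContinuousActions;
        continuous[0] = Input.GetAxis("Horizontal");
        continuous[1] = Input.GetAxis("Vertical");
    }
}

The Behavior Parameters component attached to the same GameObject would then declare the observation and action sizes and select the behavior type (Learning, Heuristic, or Inference).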

4.2 Model Architecture

Our model consists of a few parts.

4.2.1 Network Architecture

The main architecture of our model is the PPO reinforcement learning model. The neural network consists of 512 hidden units and 4 hidden layers. The inputs are normalized beforehand.

Figure 4.1: Illustration of a Reinforcement Learning Architecture

The maximum number of observations collected and actions taken across all environments was set to 10 million before ending the training process.

Hyperparameter           Value    Description
Batch size               2048     The number of experiences in each iteration of gradient descent.
Buffer size              20480    Number of experiences to collect before updating the policy model.
Learning rate            0.0003   The initial learning rate for gradient descent.
Beta                     0.005    Strength of the entropy regularization.
Epsilon                  0.2      Determines how rapidly the policy can evolve.
Lambda                   0.95     Regularization parameter.
Number of epochs         3        Number of passes to make through the experience buffer when performing gradient descent optimization [6].
Learning rate schedule   linear   How the learning rate changes over time.

Table 4.1: Data Table List of hyperparameters

4.2.2 Visual Encoder Architecture

The visual encoder consists of a Convolutional Neural Network first proposed in a Nature article. It consists of three convolutional layers.

Figure 4.2: Illustration of the Convolutional Neural Network

“The input to the neural network consists of an 84×84×4 image produced by the preprocessing map. The first hidden layer convolves 32 filters of 8×8 with a stride of 4 with the input image and applies a rectifier nonlinearity. The second hidden layer convolves 64 filters of 4×4 with stride 2, again followed by a rectifier nonlinearity. This is followed by a third convolutional layer that convolves 64 filters of 3×3 with stride 1, followed by a rectifier. The final hidden layer is fully connected and consists of 512 rectifier units. The output layer is a fully connected linear layer with a single output for each valid action [8].” The number of actions varied from 1 to 3 in the games we considered.


Network Setting       Value        Description
Normalize             True         Determines if normalization is to be applied to observation inputs.
Hidden units          512          The number of units in the hidden layers of the neural network.
Number of layers      4            The number of hidden layers in the neural network.
Visual encoder type   nature_cnn   Encoder type for encoding visual observations.
Memory                null         Whether to use memory for the agent.

Table 4.2: Data Table List of Network Setting

4.3. Training Details

We performed experiments on 76 games. The same model architecture was used across all the games.

Reward Signal           Value       Description
Trainer type            ppo         The type of trainer to use.
Extrinsic -> Gamma      0.99        A discount factor that determines how far into the future the agent should care about possible rewards.
Extrinsic -> Strength   1.0         Factor by which to multiply the reward given by the environment.
GAIL -> Gamma           0.99        The discount factor for future rewards for the GAIL network.
GAIL -> Strength        1.0         Factor by which to multiply the GAIL reward.
GAIL -> Learning rate   0.0003      Learning rate for the discriminator.
GAIL -> Demo path       Demo.demo   The directory of the demonstration file.
GAIL -> Use actions     false       Determines if the discriminator wants the agent to mimic the actions from the demonstrations.

Table 4.3: Data Table List of Reward Signal

4.4 Evaluation Procedure

The trained agents were evaluated by playing 76 games in parallel for up to 5 hours, using a GPU as the inference device, with different initial random conditions. Actions were selected at 60 Hz and played under controlled conditions. The player was not allowed to pause, save, or reload the game.

Evaluation Setting    Value      Description
Maximum steps         10000000   Total number of steps that can be taken across all environments.
Time horizon          1000       Determines how many steps of experience to collect per agent before adding it to the experience buffer [6].
Summary frequency     50000      The number of experiences that need to be collected before generating and displaying training statistics [6].
Capture frame rate    60         Instructs the simulation to try to render at a specified frame rate [9].
Time scale            20         The multiplier for the delta time in the simulation.

Table 4.4: Data Table List of Evaluation Setting


5. RESULTS

To evaluate the training performance of our agent, we took advantage of the TensorBoard platform, which allows us to evaluate statistics while an environment is running. TensorBoard is a visualization tool provided by TensorFlow and is used to monitor training progress.

In a Unity environment, a cumulative reward for each step of the training session is calculated and later averaged with the other environments. This mean cumulative reward is plotted on TensorBoard, and each environment reset is also recorded to track progress.

Figure 5.1: Final implementation of the game

The ML-agents training program saves the following statistics:

5.1 Cumulative Reward

We plot the mean cumulative reward per episode over all the training agents. The cumulative reward increases during a successful training session.

Figure 5.2: Cumulative reward over each step in the environment.


5.2 Episode Length

The mean length of each episode in the environment of all agents [10].

Figure 5.3: Episode length over each step in the environment.


5.3 Entropy

How random the decisions of the model are. It should slowly decrease during a successful training process. If it decreases too quickly, the beta hyperparameter should be increased [10].

Figure 5.4: Entropy over each step in the environment.


5.4 Learning Rate

How large a step the training algorithm takes as it searches for the optimal policy [10]. It should decrease over time.

Figure 5.5: Learning rate over each step in the environment.


5.5 Policy Loss

The mean loss of the policy function update. Correlates to how much the policy (the process for deciding actions) is changing. The magnitude of this should decrease during a successful training session [10].

Figure 5.6: Policy loss over each step in the environment.


5.6 Value Estimate

The mean value estimate for all states visited by the agent [10]. It should increase during a successful training session.

Figure 5.7: Value estimate over each step in the environment.


5.7 Value Loss

The mean loss of the value function update. Correlates to how well the model can predict the value of each state. This should decrease during a successful training session [10].

Figure 5.8: Value loss over each step in the environment.


6. CONCLUSION

We have presented, to our knowledge, the first approach for a 3D game in which the player has to search for the goal (star) inside a polytope and shoot the ball so that it reaches the goal while avoiding the obstacles in between. This thesis also investigated how artificial intelligence techniques can be used to predict the trajectory of the ball.

To make an integrated game, two scenes were designed. The game scene for a human player was built with a third-person camera that rotates in either direction with respect to the ground. The player was provided with an extra feature, a visual prediction line that shows the trajectory the ball would follow if the player were to shoot it at that instant. This was achieved by simulating an extra scene on top of the current scene.

The thesis presents a trained artificial agent which can predict a conditioned trajectory to model human player motion using imitation learning. We also implemented a reward system in the environment so that our agent can navigate the ball correctly on the course. We implemented several techniques, including Reinforcement Learning, Convolutional Neural Networks, and Imitation Learning.

6.1 Future Work

Currently, the game is played entirely from a third-person perspective. The player can enjoy shooting the ball while rotating the camera around the plane. The player is accompanied by a prediction line which helps the player follow the trajectory of the ball before it is shot from the fire position.

Creating solutions for game development is time-consuming; thus, future work with Deep RL would improve the implementation of the training model. GAIL and the Nature CNN fulfilled our initial requirements and demonstrated the advantages and possibilities; however, it would be great to investigate other aspects of AI.

The current version of the game has no outstanding drawbacks, but implementing more features could make it much better.

• Resource Utilization: We noticed that it takes a long time to train agents in our Unity scene. Instantiation in Unity is very inefficient, and our scene instantiates a lot of game objects and destroys them later, which consumes a lot of resources on our local system. This could be made faster by running on servers using cloud services.
• Agent Rotation: Currently, we can rotate the agent with a combination of two axes. It would be interesting to test the movement in the third dimension.
• Cumulative Reward: The mean cumulative reward of our environments was making small, but not huge, progress.


REFERENCES

[1] Xia, Peng. “3D Game Development with Unity A Case Study: A First-Person Shooter (FPS) Game.” (2014)

[2] Juliani, A. 2017, Introducing: Unity Machine Learning Agents Toolkit, Unity Blog, accessed 20 April 2021.

[3] Nakayama, Y., Wang, H., Zhuang, Y. 2020, Training a reinforcement learning agent with Unity and Amazon SageMaker RL, AWS Machine Learning Blog, accessed 20 April 2021.

[4] Cohen, A. 2020, Training intelligent adversaries using self-play with ML-Agents, Unity Blog, accessed 21 April 2021, <https://blogs.unity3d.com/2020/02/28/training-intelligent-adversaries-using-self-play-with-ml-agents/>.

[5] Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)

[6] Training Configuration File 2021, Unity Technologies, accessed 21 March 2021, <https://github.com/Unity-Technologies/ml-agents/blob/main/docs/Training-Configuration-File.md>.


[7] ML-Agents Toolkit Overview 2021, Unity Technologies, accessed 21 March 2021, <https://github.com/Unity-Technologies/ml-agents/blob/main/docs/ML-Agents-Overview.md#gail-generative-adversarial-imitation-learning>.

[8] Mnih, V., Kavukcuoglu, K., Silver, D. et al. Human-level control through deep reinforcement learning. Nature 518, 529-533 (2015).

[9] Unity ML-Agents Python Low Level API 2020, Unity Technologies, accessed 20 April 2021, <https://github.com/Unity-Technologies/ml-agents/blob/8ad72c08d7d9408c4dda4601beec7f8e82f911ed/docs/Python-API.md#engineconfigurationchannel>.

[10] Using TensorBoard to Observe Training 2018, accessed April 2021, <https://github.com/miyamotok0105/unity-ml-agents/blob/master/docs/Using-Tensorboard.md>.

[11] Ho, J., Ermon, S.: Generative adversarial imitation learning. In: Advances in Neural Information Processing Systems, pp. 4565-4573 (2016)

[12] Yuan, Ye and Kris M. Kitani. “3D Ego-Pose Estimation via Imitation Learning.” ECCV (2018)

[13] Espeholt, L., Soyer, H., Munos, R., Simonyan, K., Mnih, V., Ward, T., Doron, Y., Firoiu, V., Harley, T., Dunning, I., et al. IMPALA: Scalable distributed deep-RL with importance weighted actor-learner architectures. arXiv preprint arXiv:1802.01561 (2018)

[14] Brendmoe, M., “Generating gameplay scenarios through faction conflict modeling.” (2012)

[15] Knudsen, K., “Freia: Exploring Biological Pathways Using Unity3D.” (2015)

[16] Izgi, E., “Framework for Roguelike Video Games Development.” (2018)

[17] Penzentcev, A., “Architecture and implementation of the system for serious games in Unity 3D.” (2015)

[18] Brown, H., “Applying Imitation and Reinforcement Learning to Sparse Reward Environments.” (2020)

[19] T. Singh, Y. Jain and V. Kumar, “Predicting parole hearing result using machine learning,” 2017 International Conference on Emerging Trends in Computing and Communication Technologies (ICETCCT), 2017, pp. 1-3, doi: 10.1109/ICETCCT.2017.8280342


APPENDIX

Figure 8.1: Demonstration of the game played by the trained agent.

Figure 8.2: GAIL Value loss over each step in the environment.

Figure 8.3: Extrinsic Reward over each step in the environment.

Figure 8.4: GAIL Policy Estimate over each step in the environment.

Figure 8.5: GAIL Reward over each step in the environment.