Master Thesis Software Engineering
Thesis no: MSE-2003:19
June 2003

Machine learning in simulated RoboCup
Optimizing the decisions of an Electric Field agent

Markus Bergkvist
Tobias Olandersson

Department of Software Engineering and Computer Science
Blekinge Institute of Technology
Box 520
SE-372 25 Ronneby
Sweden

This thesis is submitted to the Department of Software Engineering and Computer Science at Blekinge Institute of Technology in partial fulfillment of the requirements for the degree of Master of Science in Software Engineering. The thesis is equivalent to 2×20 weeks of full time studies.

Contact Information:

Author: Markus Bergkvist
Address: Blasius Königsgatan 30B, 372 35 Ronneby
E-mail: [email protected]

Author: Tobias Olandersson
Address: Blåbärsvägen 27, 372 38 Ronneby
E-mail: [email protected]

University advisor: Stefan Johansson
Department of Software Engineering and Computer Science

Department of Software Engineering and Computer Science
Blekinge Institute of Technology
Box 520
SE-372 25 Ronneby
Sweden
Internet: www.bth.se/ipd
Phone: +46 457 38 50 00
Fax: +46 457 271 25

Abstract

An implementation of the Electric Field Approach applied to the simulated RoboCup is presented, together with a demonstration of a learning system. Results are presented from the optimization of the Electric Field parameters in a limited situation, using the learning system. Learning techniques used in contemporary RoboCup research are also described, including a brief presentation of their results.

Keywords: Self-learning, Parameter optimization, Simulated RoboCup, Multi-Agent system, Electric Field Approach

Contents

1 Introduction
  1.1 RoboCup
    1.1.1 Simulated league
    1.1.2 Environmental issues
  1.2 Electric Field Approach
  1.3 Learning techniques
    1.3.1 Reinforcement learning
    1.3.2 Q-learning
    1.3.3 Hill-climbing
  1.4 Contemporary research
    1.4.1 Brainstormers
    1.4.2 Tsinghuaeolus
    1.4.3 CMUnited
  1.5 Problem description
  1.6 Delimitations
  1.7 Method
    1.7.1 Literature survey
    1.7.2 Experiments
  1.8 Thesis outline
2 Implementation
  2.1 CRaPI, a RoboCup API
  2.2 Yaffa, a RoboCup player
  2.3 Our approach
    2.3.1 Conceptualization of EFA
    2.3.2 Implementation of EFA
    2.3.3 Implementation of a learning system
3 The experiment
  3.1 Setup
  3.2 Results
    3.2.1 Training phase
    3.2.2 Benchmarks
4 Discussion
  4.1 Results
    4.1.1 Utilities for training
    4.1.2 Trained vs Untrained
    4.1.3 Game results
    4.1.4 Reliability
  4.2 Problems
    4.2.1 Passing
    4.2.2 Intersecting
    4.2.3 WorldModel
    4.2.4 Size of implementation
5 Conclusion
  5.1 Future work
6 Acknowledgements
Bibliography
A Behaviours in Yaffa

List of Figures

2.1 Architectural overview of CRaPI
2.2 Architectural overview of Yaffa
2.3 Visualization of the Electric Field Generator
2.4 Snapshot of a keep-away situation
3.1 Charges for object types during training
3.2 The hill-climbing through each training round
3.3 Benchmark of utilities for the keep-away situation
3.4 Benchmark of filtered utilities for the keep-away situation where only values exceeding 80 are included

List of Tables

1.1 Reinforcement Learning approach vs Greedy policy
3.1 Utilities for training
3.2 Best charge configuration found during training
3.3 Utilities for trained and untrained team
3.4 Filtered utilities for trained and untrained team where only values exceeding 80 are included in the calculations
3.5 Results after full time matches (6000 cycles). Multiple values are shown when matches have been played more than once
3.6 Ball possession after full time matches (6000 cycles)
4.1 Standard deviation of utility values on portion of full test
Chapter 1

Introduction

“By the year 2050, develop a team of fully autonomous humanoid robots that can win against the human world soccer champion team.”
– RoboCup Federation [2]

The goal and its timetable are intended to advance the overall level of technology in society. Even if the goal is not completely fulfilled, pursuing it will yield several technological achievements, ranging from improved sensor-reading computations to advanced multi-agent coordination policies.

1.1 RoboCup

The idea of using soccer-playing robots in research was introduced by Mackworth [1]. Unfortunately, the idea did not receive a proper response until it was further developed and adapted by Kitano, Asada, and Kuniyoshi in their proposal for a Japanese research program called Robot J-League, a professional soccer league in Japan [1]. During the autumn of 1993, several American researchers took interest in the Robot J-League, which thereafter changed its name to the Robot World Cup Initiative, or RoboCup for short. RoboCup is sometimes referred to as the RoboCup challenge or the RoboCup domain.

In 1995, Kitano et al. proposed the first Robot World Cup Soccer Games and Conferences, to take place in 1997 [1]. The aim of RoboCup was to present a new standard problem for AI and robotics, somewhat jokingly described as the life of AI after Deep Blue [1]. RoboCup differs from previous research in AI by focusing on a distributed solution instead of a centralized one, and by challenging researchers not only from traditionally AI-related fields, but also from areas such as robotics, sociology, and real-time mission-critical systems.

To coordinate the efforts of all researchers, the RoboCup Federation was formed. The goal of the RoboCup Federation is to promote RoboCup, for example by annually arranging the world cup tournament. Members of the RoboCup Federation are all active researchers in the field, representing a number of universities and major companies.
As the body of researchers is quite large and widespread, local committees are formed to promote RoboCup-related events in their geographical areas.

In order for a robot team to actually play a soccer game, various technologies must be incorporated, including: design principles of autonomous agents, multi-agent collaboration, strategy acquisition, real-time reasoning, robotics, and sensor fusion. RoboCup is a task for a team of multiple fast-moving robots in a dynamic environment.

1.1.1 Simulated league

The RoboCup simulated league is based on the RoboCup simulator, the soccer server [1]. The soccer server is written to support competition among multiple virtual soccer players in an uncertain multi-agent environment, with real-time demands as well as semi-structured conditions.

One of the advantages of the soccer server is its abstraction, which relieves researchers from having to handle robot problems such as object recognition, communication, and hardware issues, e.g., how to make a robot move. The abstraction enables researchers to focus on higher-level concepts such as cooperation and learning.

Since the soccer server provides a challenging environment, i.e., the intentions of the players cannot be mechanically deduced, there is a need for a referee when playing a match. The included artificial referee is only partially implemented and can detect trivial situations, e.g., when a team scores. However, there are several hard-to-detect situations in the soccer server, e.g., deadlocks, which bring the need for a human referee. There have been five world cups and one pre-world cup event [1].

1.1.2 Environmental issues

The type of environment in RoboCup is one of the hardest to deal with, according to the classification of environments in “Artificial Intelligence: A Modern Approach” [9].

• It is inaccessible, since the field of view is limited to the view angle and it is only possible to see a part of the soccer field, i.e., the agent has to maintain an internal state of the soccer field.

• It is nondeterministic from the agent’s viewpoint¹, because the next state of the environment cannot be determined from its current state and the actions selected by the agent. For instance, there are 21 other players whose selected actions the agent can only guess, and it is not possible to calculate the exact trajectory of the ball.

• It is nonepisodic, because the agent has to plan its actions several cycles ahead, and every action the agent takes has an impact on subsequent cycles.

• It is semidynamic, because the environment will not change while the agent is deliberating, provided that the agent comes to a decision and acts within the cycle time (currently 100 ms). If the agent does not send a command to the server before the end of the cycle, the server calculates the next state without any action from the agent.

• It is discrete in the sense that the agent knows which flags and players it can see, and which actions it can perform. But it is continuous in the sense that the possible perceptions of the flags and the players are neither limited nor clearly defined.

¹It is deterministic from the viewpoint of the system, since the noise can be predicted, given that the randomization key and the movement model are known.

1.2 Electric Field Approach

In autonomous robotics, artificial potential fields are often used to plan and control the motion of physical robots [6]. The Electric Field Approach [4] is proposed as a generalization of traditional potential field approaches, which allows for both motion control and object manipulation.
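As background for the potential field idea (not the thesis's own EFA implementation, which is described in Chapter 2), a classical attractive/repulsive potential field for motion planning can be sketched as follows. All function names and parameter values here are hypothetical, chosen only for illustration: the agent descends the combined potential, being pulled toward the goal and pushed away from obstacles that come within an influence radius.

```python
import math

def potential_gradient(pos, goal, obstacles,
                       k_att=1.0, k_rep=100.0, rho0=2.0):
    """Descent direction (negative gradient) of a classical
    attractive/repulsive potential field; parameters hypothetical."""
    # attractive term pulls the agent straight toward the goal
    gx = k_att * (goal[0] - pos[0])
    gy = k_att * (goal[1] - pos[1])
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 0.0 < d < rho0:
            # repulsive term pushes away from obstacles inside
            # the influence radius rho0, growing as d shrinks
            mag = k_rep * (1.0 / d - 1.0 / rho0) / (d ** 3)
            gx += mag * dx
            gy += mag * dy
    return gx, gy

def step(pos, goal, obstacles, step_len=0.05):
    """Move a fixed distance along the descent direction."""
    gx, gy = potential_gradient(pos, goal, obstacles)
    n = math.hypot(gx, gy) or 1.0
    return (pos[0] + step_len * gx / n, pos[1] + step_len * gy / n)
```

Iterating `step` moves the agent around nearby obstacles toward the goal; the well-known limitation of such fields, and part of the motivation for richer approaches like EFA, is that the agent can get trapped in local minima of the combined potential.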