Learning of Cooperative Actions in Multi-Agent Systems: a Case Study of Pass Play in Soccer
Hitoshi Matsubara, Itsuki Noda and Kazuo Hiraki
From: AAAI Technical Report SS-96-01. Compilation copyright © 1996, AAAI (www.aaai.org). All rights reserved.

Electrotechnical Laboratory
1-1-4 Umezono, Tsukuba, Ibaraki, 305 JAPAN
e-mail: {matsubar, noda, khiraki}@etl.go.jp

Abstract

From the standpoint of multi-agent systems, soccer (association football), which is just one of the usual team sports, makes a good example of a moderately abstracted real-world problem. We have chosen soccer as one of the standard problems for the study of multi-agent systems, and we are now developing a soccer server, which provides a common test-bench for various multi-agent systems. We present an experiment on cooperative action learning among soccer-player agents on the server. Our soccer agent can learn whether he should shoot a ball or pass it.

Introduction

Soccer (association football) is a typical team game, in which each player is required to play cooperatively. Soccer is also a real-time game in which the situation changes dynamically. We have chosen soccer as one of the standard problems for the study of multi-agent systems, and we are now developing a soccer server for the research. We present an experiment on cooperative action learning among soccer-player agents on the server.

Figure 1: Soccer players and a ball

Soccer as a Standard Problem

From the standpoint of multi-agent systems, soccer (association football), which is just one of the usual team sports, makes a good example of a moderately abstracted real-world problem. Multi-agent systems provide us with research subjects such as cooperation protocols by distributed control and effective communication, while having the following advantages:

• efficiency of cooperation
• adaptation
• robustness
• real-time

Soccer has the following characteristics:

• Robustness is more important than elaboration.
• Adaptability is required for dynamic change of plans according to the operations of the opposing team.
• Team play is advantageous.
• As precise communication by language is not expected, effectiveness must be provided by combining simple signals with the situation.

These characteristics show that soccer is an appropriate example for evaluation of multi-agent systems. Recently, soccer has been selected as an example on real robots as well as software simulators (Asada 1995; Sahota 1993; Sahota 1994; Stone 1995). Robo-cup, the robot world cup initiative, will be held in IJCAI-97 (Kitano 1995). Many of those experiments, however, have their own settings, which makes it difficult to compare them. To satisfy the need for a common setting, we are developing a soccer server (Kitano 1995; Noda 1995). Adoption of this soccer server as a common setting makes it possible to compare various algorithms on multi-agent systems in the form of a game.

Soccer Server

Fig.1 shows soccer players and a ball in our soccer server. Fig.2 shows a one-shot example of the server. This server will be used as one of the official servers of Robo-Cup in IJCAI-97.

Figure 2: Window Image of Soccer Server

Overview

The soccer server provides a virtual field where players of two teams play soccer. Each player is controlled by a client program via local area networks. Control protocols are simple, so it is easy to write client programs using any kind of programming system that supports UDP/IP sockets.

• Control via Networks
A client can control a player via local area networks. The protocol of the communication between clients and the server is UDP/IP. When a client opens a UDP socket, the server assigns a player on the soccer field to the client. The client can control the player via the socket.

• Physical Simulation
The soccer server has a physical simulator, which simulates movement of objects (ball and players) and collisions between them. The simulation is simplified so that it is easy to calculate the changes in real-time, but the essence of soccer is not lost.
The simulator works independently of communications with clients. Therefore, clients should assume that situations on the field change dynamically.

• Referee
The server has a referee module, which controls each game according to a number of rules. In the current implementation, the rules are: (1) Check goals; (2) Check whether the ball is out of play; (3) Control positions of players for kick-offs, throw-ins and corner-kicks, so that players on the defending team keep a minimum distance from the ball.
Judgments by the referee are announced to all clients as an auditory message.

Coach Mode

In order to make it easy to set up a certain situation, the server has a coach mode. In this mode, a special client called 'coach' can connect to the server and can move any players and the ball. This mode is useful for learning client programs.

Protocol

As described above, a client connects to the server by a UDP/IP socket. Using the socket, the client sends commands to control its player and receives sensory information of the field. Command and sensor information consist of ASCII strings.

• Initialization
First of all, a client opens a UDP socket and connects to a server host. (The server's port number is assumed to be 6000.) Then, the client sends the following string to the server:

(init TEAMNAME (POS-X POS-Y))

The server assigns a player whose team name is TEAMNAME, and initializes its position at (POS-X, POS-Y). If the initialization succeeds, the server returns the following string to the client:

(init SIDE NUMBER TIME),

where SIDE, NUMBER and TIME indicate the side of the team ('l' or 'r'), the player's uniform number, and the current time respectively.
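The initialization exchange described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the server's reference client: the helper names (format_init, parse_init_reply, connect), the default host, the timeout and the receive buffer size are all hypothetical, and only the port number and the two message formats come from the text.

```python
import re
import socket

SERVER_PORT = 6000  # port number stated in the text


def format_init(team_name, pos_x, pos_y):
    """Build the initialization string (init TEAMNAME (POS-X POS-Y))."""
    return f"(init {team_name} ({pos_x} {pos_y}))"


def parse_init_reply(reply):
    """Parse '(init SIDE NUMBER TIME)' into a (side, number, time) tuple."""
    match = re.fullmatch(r"\(init\s+(l|r)\s+(\d+)\s+(\d+)\)", reply.strip())
    if match is None:
        raise ValueError(f"unexpected init reply: {reply!r}")
    side, number, time = match.groups()
    return side, int(number), int(time)


def connect(team_name, pos_x, pos_y, host="localhost", timeout=2.0):
    """Send the init string over UDP and return (socket, side, number, time).

    The host, timeout and buffer size are assumptions for illustration.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    message = format_init(team_name, pos_x, pos_y).encode("ascii")
    sock.sendto(message, (host, SERVER_PORT))
    data, _server_addr = sock.recvfrom(4096)
    side, number, time = parse_init_reply(data.decode("ascii"))
    return sock, side, number, time
```

Keeping the string formatting and parsing separate from the socket code means the message handling can be checked without a running server.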
• Action Command
Each client can control its player by 3 kinds of commands, 'turn', 'dash' and 'kick'. Clients also can communicate with other clients using the 'say' command. Command formats are as follows:

  (turn POWER)             Change the direction of the player according to POWER. POWER should be -180 ~ 180.
  (dash POWER)             Increase the velocity of the player toward its direction according to POWER. POWER should be -30 ~ 100.
  (kick POWER DIRECTION)   Kick a ball to DIRECTION according to POWER. POWER should be -180 ~ 180, and DIRECTION should be 0 ~ 100. This command is effective only when the distance between the player and the ball is less than 2.
  (say MESSAGE)            Broadcast MESSAGE to all players. MESSAGE is informed immediately to clients using the (hear ...) format described below.

These commands may not cause the expected effects because of the physical situation (e.g. collisions) and noise.

• Sensory Information
Players can see objects in their views. The direction of view of a player is the direction of movement of the player, and the angle of view is 60 degrees. Players also can hear messages from other players and the referee. This information is sent to clients in the following format:

  (see TIME OBJINFO OBJINFO ...)   Inform visual information. OBJINFO is information about a visible object, whose format is (OBJ-NAME DISTANCE DIRECTION). This message is sent 2 times per second. (The frequency may be changed.)
  (hear TIME SENDER MESSAGE)       Inform auditory information. This message is sent immediately when a client SENDER sends a (say MESSAGE) command. TIME indicates the current time.

Current Status

The server is implemented by g++ and X window with Athena widgets.

WWW home page

Programs of the soccer server are available by FTP:
ftp://ci.etl.go.jp/pub/soccer/server/sserver-1.8.tar.gz

Home page of the soccer server is:
http://ci.etl.go.jp/~noda/soccer/server.html

We also have a mailing list about Robo-Cup:
rjl%csl.sony.co.jp

Learning of pass play

On this soccer server, we are conducting an experiment on cooperative-action learning among soccer players. Our first aim is to make the players learn how to kick a pass, which is supposed to be the fundamental of cooperative action in soccer. The information that shooting the ball into the enemy's goal is the purpose of the game is already known to the player agents. A shot easily leads to a team score when there are no enemy players in the area between the shooter and the goal, that is, when he is a free player trying to shoot a goal. When there are two players from his side and one player from the other, and the player of the enemy's team is defending against him, it is a wiser policy to pass the ball to a free player of his team than to shoot by himself. Evaluation of action is based on the score and the time taken for making the score.

We carried out an experiment on our server as follows:

• There are two offense players and one defense player in a penalty area.
• One offense player A is set at random. A ball is at his feet in the first place. He will shoot the ball to the goal or pass it to his teammate. He cannot move anywhere.
• Another offense player B and one defense player C are set at random. C keeps his position between the ball and the goal. In some situations (see Fig.3) C marks A (to pass the ball to B is a good strategy for A). In other situations (see Fig.4) C marks B (to shoot the ball to the goal directly is a good strategy).
• B is programmed to wait for the pass from A and shoot the ball.

• B: angle is 54 degrees on the left, distance is 4.5
• C: angle changes from 30 degrees on the left to 30 degrees on the right, distance is 5.5

Intuitively speaking, it is better to shoot the ball directly when C is on the left.
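The intuition above can be made concrete with a small hand-written rule. The sketch below is not the learned behavior of the agent and is not taken from the experiment; it only illustrates the geometry. It assumes that directions are given in degrees relative to A's facing direction with negative values to the left, and that the direction of the goal is known. The function names, the sign convention and the rule itself are hypothetical.

```python
def angular_gap(dir_a, dir_b):
    """Smallest absolute difference between two directions, in degrees."""
    return abs((dir_a - dir_b + 180.0) % 360.0 - 180.0)


def choose_action(goal_dir, mate_dir, defender_dir):
    """Return "shoot" or "pass" for player A.

    Hand-written heuristic (an assumption, not the learned policy):
    if defender C stands angularly closer to teammate B than to the goal,
    C is treated as marking B and A shoots; otherwise A passes to B.
    """
    gap_to_mate = angular_gap(defender_dir, mate_dir)
    gap_to_goal = angular_gap(defender_dir, goal_dir)
    return "shoot" if gap_to_mate < gap_to_goal else "pass"


# B at 54 degrees on the left; the goal is assumed to be straight ahead (0).
for defender_dir in (-30.0, 30.0):
    print(defender_dir, choose_action(0.0, -54.0, defender_dir))
```

With B at 54 degrees on the left and the goal assumed straight ahead, this rule shoots when C is at 30 degrees on the left and passes when C is at 30 degrees on the right, which agrees with the intuition stated above. A learned agent would have to acquire such a boundary from the score and the time taken to score, without it being written in by hand.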