Chinese Journal of Aeronautics, (2015), 28(1): 191–199



Consensus based on learning game theory with a UAV rendezvous application

Lin Zhongjie, Liu Hugh Hong-Tao *

University of Toronto Institute for Aerospace Studies, 4925 Dufferin Street, Toronto, Ontario M3H 5T6, Canada

Received 23 July 2014; revised 15 September 2014; accepted 2 November 2014; available online 26 December 2014

KEYWORDS: Consensus; Distributed algorithms; Game theory; Multi-agent systems

Abstract: Multi-agent cooperation problems are becoming more and more attractive in both civilian and military applications. In multi-agent cooperation problems, different network topologies determine different manners of cooperation between agents: a centralized system directly controls the operation of each agent with information flow from a single centre, while in a distributed system agents operate separately under certain communication protocols. In this paper, a systematic distributed optimization approach is established based on a learning game algorithm, and the convergence of the algorithm is proven under the game theory framework. Two typical consensus problems are analyzed with the proposed algorithm. The contributions of this work are threefold. First, the designed algorithm inherits the properties of learning game theory for problem simplification and proof of convergence. Second, the behaviour of learning endows the algorithm with robustness and autonomy. Third, with the proposed algorithm, the consensus problems are analyzed from a novel perspective.

© 2015 Production and hosting by Elsevier Ltd. on behalf of CSAA & BUAA.

1. Introduction

Multi-agent cooperation problems are becoming more and more attractive in both civilian and military applications, such as real-time monitoring of crop status, traffic monitoring,1,2 forest fire monitoring,3,4 surveillance5–7 and battlefield assessment.8 In multi-agent cooperation problems, different network topologies lead to different manners of cooperation between agents. A centralized system directly controls the operation of each agent with information flow from a single centre, while in a distributed system agents operate separately under certain communication protocols.

There are many related studies on distributed multi-agent cooperation problems, such as flocks,9 swarms,10,11 sensor fusion12,13 and so on. The theoretical framework for constructing and solving consensus problems within networked systems was introduced in Refs.14–17

For distributed multi-agent systems, three challenges need to be addressed to achieve cooperation among a potentially large number of agents:

(1) Only limited information is available for agents to utilize in achieving the global objective.
(2) The topology of the distributed network is unknown to the agents.
(3) The network can be dynamic, such as a stochastic network or a switching network.

To construct a game theory model for a distributed multi-agent cooperation problem: first, the agents of the game and their strategy domains need to be identified, in which agents are considered as myopic and rational players; second, based on the strategy domains and the mission requirement, a global cost function that reflects the performance of the joint actions is constructed; third, an optimal strategy is derived from the established global cost function. Therefore, in order to achieve cooperation in a distributed network among myopic agents, the designed game should yield the lowest costs for all agents if and only if their strategies benefit the global objective.

Based on the bargaining game concept, a useful strategy has been suggested for task assignment and resource distribution problems with a small number of agents.18 An approach has been proposed for consensus problems based on cooperative game theory and bargaining game theory.19 A distributed optimization algorithm has also been designed based on the state potential game framework.20,21

In classical game theory, agents are assumed to be able to correctly anticipate what their opponents will do, which may be unrealistic in practice. In real applications, agents cannot introspectively anticipate how others will act in distributed networks, especially in games with a large number of participants. Unlike classical game theory, learning game theory does not impose assumptions on agents' rationality and beliefs, but assumes instead that agents can learn over time about the game and the behaviour patterns of their opponents.

A preliminary version of this work was presented in Ref.22 In this paper, a systematic distributed optimization approach is established based on a learning game algorithm. The convergence of the algorithm is proven under the game theory framework, and two typical consensus problems are analyzed with the proposed algorithm.

2. Preliminaries

For a game G(N, {U_i, i ∈ N}, {J_i, i ∈ N}), U_i represents agent i's strategy domain, with elements μ_i. The joint strategy domain is denoted U = ∏_{i∈N} U_i, with elements μ = (μ_i, μ_{-i}), where μ_{-i} denotes the strategies of all agents other than agent i. The cost function of agent i is denoted J_i(μ_i, μ_{-i}), which implies that agent i's cost depends both on its own strategy and on every other agent's strategy.

2.1. Governing equation

The fictitious play algorithm is adopted for designing the distributed optimization algorithm in this work. It is a variant of a "stationary Bayesian learning" process: in fictitious play, agents act as if the behaviour patterns of their competitors were stationary.22

Consider a 2 × 2 matrix game

$$ G = \begin{bmatrix} (a_{11}, b_{11}) & (a_{12}, b_{12}) \\ (a_{21}, b_{21}) & (a_{22}, b_{22}) \end{bmatrix} \qquad (1) $$

where a_ij (i = 1, 2; j = 1, 2) represents the cost of Agent 1 according to its strategies {μ_1^1, μ_1^2}, and b_ij (i = 1, 2; j = 1, 2) is the cost of Agent 2 for its strategies {μ_2^1, μ_2^2}. Agents decide their strategies based on their current empirical frequencies for their opponents. The algorithm starts with initial empirical frequencies

$$ e_1(0) = (w_1^2,\, w_2^2), \qquad e_2(0) = (w_1^1,\, w_2^1) \qquad (2) $$

where w_i^j (i = 1, 2; j = 1, 2) represents the anticipated weight for opponent j's ith strategy.

Take Agent 1 as an example. According to the empirical frequency of Agent 1, the cost of each strategy is

$$ J_1(\mu_1^1 \mid e_1^2) = \frac{w_1^2 a_{11} + w_2^2 a_{12}}{w_1^2 + w_2^2}, \qquad J_1(\mu_1^2 \mid e_1^2) = \frac{w_1^2 a_{21} + w_2^2 a_{22}}{w_1^2 + w_2^2} \qquad (3) $$

Both agents choose the strategy that yields the lowest cost based on their own empirical frequencies. At the end of each game, agents update their empirical frequencies according to their opponents' strategies in the previous game.
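As an illustration, the following Python sketch runs the fictitious play iteration of Eqs. (1)–(3) on a small cost bimatrix. The cost matrices, the uniform initial weights and the iteration count are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

# Fictitious play on a 2x2 bimatrix game, following Eqs. (1)-(3).
# Cost matrices and initial weights are illustrative assumptions.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])      # a_ij: cost of Agent 1
B = np.array([[2.0, 1.0],
              [4.0, 3.0]])      # b_ij: cost of Agent 2

w1 = np.array([1.0, 1.0])       # Agent 1's weights over Agent 2's strategies
w2 = np.array([1.0, 1.0])       # Agent 2's weights over Agent 1's strategies

for _ in range(50):
    J1 = A @ (w1 / w1.sum())    # expected cost of Agent 1's strategies, Eq. (3)
    J2 = B.T @ (w2 / w2.sum())  # expected cost of Agent 2's strategies
    s1 = int(np.argmin(J1))     # each agent picks its lowest-cost strategy
    s2 = int(np.argmin(J2))
    w1[s2] += 1.0               # record the opponent's observed strategy, Eq. (2)
    w2[s1] += 1.0

print("strategies in the limit:", s1, s2)
```

For this dominance-solvable example the play settles immediately on the pure Nash equilibrium (first strategy of Agent 1, second of Agent 2), which is the behaviour Lemma 1 below formalizes.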

Definition 1 (ε-dominate23). A strategy μ_i ∈ U_i is ε-dominated by another strategy μ̂_i ∈ U_i if, for any μ_{-i} ∈ U_{-i}, J_i(μ_i, μ_{-i}) > J_i(μ̂_i, μ_{-i}) + ε. The set of strategies of agent i that are not ε-dominated is denoted as

$$ D_i^{\varepsilon}(\mathcal{U}) = \{\mu_i \in \mathcal{U}_i \mid J_i(\mu_i, \mu_{-i}) \le J_i(\hat{\mu}_i, \mu_{-i}) + \varepsilon,\ \forall \hat{\mu}_i \in \mathcal{U}_i,\ \forall \mu_{-i} \in \mathcal{U}_{-i},\ \varepsilon \ge 0\} \qquad (4) $$

According to the definition of the ε-dominate domain, adaptive learning is defined as follows.

Definition 2 (Adaptive learning23). A strategy sequence {μ_i(t)} of agent i is consistent with adaptive learning if

$$ \mu_i(t) \in D_i^{\varepsilon}(\{\mu(s) \mid \hat{t} \le s < t\}), \qquad \forall \hat{t} < t \qquad (5) $$

Definition 3 (Nash equilibrium). A strategy μ* = (μ_i*, μ_{-i}*) is called a Nash equilibrium of the game G(N, {U_i, i ∈ N}, {J_i, i ∈ N}) if and only if

$$ J_i(\mu_i^*, \mu_{-i}^*) \le J_i(\mu_i, \mu_{-i}^*), \qquad \forall i \in N,\ \forall \mu_i \in \mathcal{U}_i \qquad (6) $$

Definition 4 (Convergence). The sequence {f(t)} converges to f* if and only if there is a T > 0 such that f(t) = f* for all t ≥ T.

Lemma 1 (Adaptive learning convergence23). If a strategy sequence {μ_i(t)} is consistent with adaptive learning and converges to μ_i*, then μ_i* belongs to a pure strategy Nash equilibrium μ* = (μ_i*, μ_{-i}*).
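Definition 1 can be checked directly once the strategy domains are discretized. The sketch below tests whether a candidate strategy is ε-dominated under an illustrative quadratic cost; the cost function, the grids and the ε value are assumptions made for demonstration only.

```python
import numpy as np

# epsilon-dominance test of Definition 1 on a finite strategy grid.
# The quadratic cost J is an illustrative stand-in, not the paper's.
def J(mu_i, mu_minus_i):
    return (mu_i - mu_minus_i) ** 2

def eps_dominated(mu_i, grid_i, grid_minus_i, eps):
    """True if some mu_hat satisfies J(mu_i, m) > J(mu_hat, m) + eps
    for every opponent strategy m, per Definition 1."""
    return any(
        all(J(mu_i, m) > J(mu_hat, m) + eps for m in grid_minus_i)
        for mu_hat in grid_i
    )

grid = np.linspace(0.0, 1.0, 11)
print(eps_dominated(1.0, grid, grid, eps=0.1))                    # False: 1.0 is a best reply to m = 1
print(eps_dominated(1.0, grid, np.linspace(0, 0.5, 6), eps=0.1))  # True once opponents stay in [0, 0.5]
```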

2.2. Potential games

The potential game concept is important for distributed multi-agent cooperation problems.

Definition 5 (Potential game24). A game G(N, {U_i, i ∈ N}, {J_i, i ∈ N}) is a potential game if there exists a global function Φ(μ_i, μ_{-i}) such that

$$ J_i(\mu_i, \mu_{-i}) - J_i(\hat{\mu}_i, \mu_{-i}) = \Phi(\mu_i, \mu_{-i}) - \Phi(\hat{\mu}_i, \mu_{-i}), \qquad \forall i \in N,\ \mu_i, \hat{\mu}_i \in \mathcal{U}_i,\ \mu_{-i} \in \mathcal{U}_{-i} \qquad (7) $$

If the strategy domain of every agent in a potential game is bounded, the game is called a bounded potential game.

The potential game concept provides a valuable theoretical framework for distributed multi-agent cooperation problems. First, a game that admits the potential property is guaranteed to possess a Nash equilibrium. Second, from the definition of a potential game, the Nash equilibrium of every local cost function is consistent with the global objective. The potential game framework therefore provides distributed optimization problems with theoretical support for problem simplification. Many design methodologies25–27 can be adopted to derive a potential game according to the global objective.
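The identity in Eq. (7) can also be verified numerically. A minimal sketch follows for the averaging game used later in Section 4.1, on an assumed three-agent line graph; with the symmetric weights chosen here, half of the global cost of Eq. (19) serves as an exact potential, which is an observation of this sketch rather than a statement from the paper.

```python
import numpy as np

# Numerical check of the potential property, Eq. (7), for the averaging
# game of Section 4.1. Line-graph weights are illustrative; with symmetric
# a_ij, Phi = J_g / 2 satisfies Eq. (7) exactly for unilateral deviations.
A = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])

def J_local(i, x):
    return float(sum(A[i, j] * (x[i] - x[j]) ** 2 for j in range(3)))

def Phi(x):
    return 0.5 * float(sum(A[i, j] * (x[i] - x[j]) ** 2
                           for i in range(3) for j in range(3)))

rng = np.random.default_rng(0)
x = rng.random(3)
for i in range(3):
    y = x.copy()
    y[i] = rng.random()          # unilateral deviation by agent i
    assert np.isclose(J_local(i, x) - J_local(i, y), Phi(x) - Phi(y))
print("Eq. (7) holds for every unilateral deviation tested")
```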

2.3. Continuous and discrete consensus algorithm

The interaction topology of a network is denoted as G(N, E) with the set of agents N and the set of edges E. The set of neighbours of agent i is denoted as N_i = {j ∈ N | (i, j) ∈ E}. An adjacency matrix A = [a_ij] ∈ R^{n×n} is defined by a_ij > 0 if (i, j) ∈ E, and a_ij = 0 otherwise. The Laplacian matrix L = [l_ij] ∈ R^{n×n} is defined by l_ii = Σ_{j=1}^{n} a_ij and l_ij = −a_ij for i ≠ j.

Definition 6 (Stochastic matrix). A nonnegative square matrix is a row stochastic matrix if every row sums to one.

The product of two stochastic matrices is still a stochastic matrix. A row stochastic matrix P ∈ R^{n×n} is a stochastic indecomposable and aperiodic (SIA) matrix if lim_{k→∞} P^k = 1y^T with y ∈ R^n.28,29

The continuous-time consensus algorithm for a single-integrator system is described as

$$ \dot{x}_i(t) = \sum_{j \in N_i} a_{ij}(t)\,(x_j(t) - x_i(t)) \qquad (8) $$

Its Laplacian matrix form is

$$ \dot{x}(t) = -L(t)\,x(t) \qquad (9) $$

The consensus algorithm in Eqs. (8) and (9) is for a single-integrator system with dynamics ẋ_i = u_i. From Eq. (9), the discrete consensus control law is derived as

$$ \begin{cases} x(k+1) - x(k) = -L(k)\,x(k) \\ x(k+1) = (I - L(k))\,x(k) = A(k)\,x(k) \\ x_i(k+1) = \sum_{j \in N_i} a_{ij}(k)\,x_j(k) \end{cases} \qquad (10) $$

The matrix A(k) is a row stochastic matrix, which suggests that

$$ \lim_{t \to \infty} \mathrm{e}^{-Lt} = \mathbf{1}v^{\mathrm{T}} \qquad (11) $$

and

$$ \lim_{k \to \infty} A^k = \mathbf{1}u^{\mathrm{T}} \qquad (12) $$

where Σ_i v_i = 1 and Σ_i u_i = 1. Therefore, Eqs. (11) and (12) suggest that lim_{t→∞} x(t) = (Σ_i v_i x_i(0)) 1 and lim_{k→∞} x(k) = (Σ_i u_i x_i(0)) 1.
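The discrete iteration of Eq. (10) is easy to exercise directly. The sketch below builds A = I − εL for an assumed four-agent ring, with ε chosen so that A stays row stochastic, and iterates it to consensus; the topology, step size and initial states are illustrative.

```python
import numpy as np

# Discrete consensus x(k+1) = A(k) x(k) of Eq. (10) with a fixed topology.
# Ring graph, step size and initial states are illustrative assumptions.
n, eps = 4, 0.25
adj = np.zeros((n, n))
for i in range(n):                        # undirected ring: i <-> i+1
    adj[i, (i + 1) % n] = adj[(i + 1) % n, i] = 1.0

L = np.diag(adj.sum(axis=1)) - adj        # graph Laplacian
A = np.eye(n) - eps * L                   # row stochastic since eps * max degree < 1

x = np.array([1.0, 5.0, 2.0, 8.0])
for _ in range(200):
    x = A @ x
print(x)   # A is symmetric here, so u = 1/n in Eq. (12): all entries -> 4.0
```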

3. Distributed optimization algorithm

Under a centralized network, the centre can obtain information from all agents and conduct the optimization. In a distributed system, only partial information is available to each agent due to the distributed network topology.

According to the mission requirement of a distributed multi-agent cooperation problem, the global cost function is defined as

$$ J_g(x_1, x_2, \ldots, x_n) \qquad (13) $$

and the purpose of the optimization is to find the optimal {x_i*, i ∈ N} that satisfies

$$ x_i^* = \arg\min_{x_i \in \mathcal{U}_i} J_g(x_1, x_2, \ldots, x_n), \qquad i \in N \qquad (14) $$

In a distributed system, the only information available to agent i is its own local state x_i and the information from its neighbours {x_j, j ∈ N_i}. A distributed optimization algorithm therefore aims at designing a protocol under a distributed network that achieves asymptotic convergence of the state of each agent {x_i(k), i ∈ N} according to the global objective in Eq. (13).

3.1. Distributed optimization algorithm based on fictitious play concept

In order to adopt and extend the fictitious play concept to establish a distributed optimization approach: first, the empirical frequency is extended to the continuous domain; second, based on the global objective and the local empirical frequency, a local strategy evaluation function is established for agent i; third, an optimal strategy for agent i is decided for the next iteration.

3.1.1. Empirical frequency

In order to adopt the fictitious play concept for a continuous distributed problem, a distribution function is adopted to model the empirical frequency built up by each agent for its counterparts.

Consider that the strategy domain U_i of every agent i is continuous and convex. Agent i models the strategy adopted by its opponent j at time k as a Gaussian distribution μ_j(k) = f(u_j(k), σ_j^2(k)) ∼ N(u_j(k), σ_j^2(k)). The empirical frequency of agent i for its counterpart j at time k is denoted e_i^j(k). If the strategy agent j plays at time k + 1 is μ_j(k+1) = f(u_j(k+1), σ_j^2(k+1)), the empirical frequency e_i^j(k) is updated as

$$ e_i^j(k+1) = \tfrac{1}{2}\left(e_i^j(k) + \mu_j(k+1)\right) \sim N\!\left(u_{i,j}(k+1),\ \sigma_{i,j}^2(k+1)\right) \qquad (15) $$

with

$$ \begin{cases} u_{i,j}(k+1) = \tfrac{1}{2}\left(u_{i,j}(k) + u_j(k+1)\right) \\ \sigma_{i,j}^2(k+1) = \tfrac{1}{4}\left(\sigma_{i,j}^2(k) + \sigma_j^2(k+1)\right) \end{cases} $$

and the distribution function for e_i^j(k) is denoted as f_i(μ_j(k)). Every agent i ∈ N establishes its local empirical frequency at time k for all its opponents as e_i(k) = {e_i^j(k), j ∈ N_i}. During each game, agent i shares not only state information but also its empirical frequency with its neighbours, and updates with the protocol given in Eq. (15).
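A minimal sketch of the update in Eq. (15): the belief mean is averaged with the newly observed mean, and the variance contracts by a factor of four per step. All numerical values are illustrative assumptions.

```python
# Empirical-frequency update of Eq. (15); all values are illustrative.
def update_belief(u_ij, var_ij, u_obs, var_obs):
    """Return the updated (mean, variance) of agent i's belief about agent j."""
    return 0.5 * (u_ij + u_obs), 0.25 * (var_ij + var_obs)

u, var = 0.0, 1.0                        # initial belief about opponent j
for _ in range(3):                       # opponent keeps playing N(2.0, 0.5)
    u, var = update_belief(u, var, 2.0, 0.5)
print(u, var)                            # mean drifts toward 2.0, variance shrinks
```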
3.1.2. Optimization

Based on the updating and sharing protocols for the empirical frequency, the cost of a strategy μ_i ∈ U_i under the empirical frequency e_i^j(k), for a two-agent game problem, is calculated by

$$ J_i(\mu_i \mid e_i^j(k)) = \int J_i(\mu_i, \mu_j)\, f_i(\mu_j(k))\, \mathrm{d}\mu_j \qquad (16) $$

For the distributed optimization problem in Eq. (13), agent i establishes its empirical frequency for all its opponents as e_i(k) = {f_i(μ_1(k)), ..., f_i(μ_{i−1}(k)), f_i(μ_{i+1}(k)), ..., f_i(μ_n(k))}. Under the empirical frequency e_i(k), the cost of state x_i ∈ U_i is

$$ J_i(x_i \mid e_i(k)) = \int \cdots \int J_g(x_1, x_2, \ldots, x_n)\, f_i(\mu_1(k)) \cdots f_i(\mu_{i-1}(k))\, f_i(\mu_{i+1}(k)) \cdots f_i(\mu_n(k))\, \mathrm{d}x_1 \mathrm{d}x_2 \cdots \mathrm{d}x_{i-1} \mathrm{d}x_{i+1} \cdots \mathrm{d}x_n \qquad (17) $$

In every game, agent i searches for the strategy x_i* ∈ U_i that yields the lowest cost in Eq. (17) under the current empirical frequency,

$$ J_i(x_i^* \mid e_i(k)) \le J_i(\hat{x}_i \mid e_i(k)), \qquad \forall \hat{x}_i \in \mathcal{U}_i \qquad (18) $$

and updates x_i(k+1) = x_i*.

The cost function J_i(μ_i | e_i(k)) for strategy μ_i under the local empirical frequency e_i(k) is different from the local cost function J_i(μ_i, μ_{-i}) discussed in Eq. (7). The empirical-frequency-based cost function J_i(μ_i | e_i(k)) is constructed from the local empirical frequency e_i(k) and the global cost function J_g(x_1, x_2, ..., x_n), while the local cost function J_i(μ_i, μ_{-i}) is a predefined function of the current strategies of the agents that are accessible to agent i under a certain network topology. In this sense, without information from others, the local cost function cannot be evaluated, while agent i can still make decisions based on its empirical frequency and J_i(μ_i | e_i(k)).

3.2. Convergence analysis and remarks

Definition 2 suggests that if, at every iteration, agent i's strategy is selected from the ε-dominate strategy domain, the algorithm is consistent with an adaptive learning procedure. For the distributed optimization algorithm introduced in the previous section, agents always adopt the strategies that yield the lowest cost based on their current empirical frequencies. In this sense, the proposed distributed optimization algorithm is an adaptive learning process. According to Lemma 1, the sequence of states {x_i(k), i ∈ N} that follows the proposed distributed optimization algorithm converges to a pure Nash equilibrium x*, provided that the distributed problem Eq. (14) admits one.

Remark 1. From the updating protocol defined in Eq. (15), the memory size m_i^j of e_i^j(k) is defined such that agent i records only the most recent m_i^j strategies of its opponent j, i.e. {μ_j(k − m_i^j + 1), ..., μ_j(k − 1), μ_j(k)}.

Remark 2. Since agents in learning game theory have no prior knowledge concerning their opponents, both the empirical frequencies and the strategies concerning their opponents are modelled as Gaussian distributions on (−∞, +∞).
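When no closed form is available, the expectations in Eqs. (16) and (17) can be approximated by sampling the Gaussian beliefs. The sketch below evaluates an illustrative quadratic stand-in for J_i by Monte Carlo and scans a strategy grid for the minimizer of Eq. (18); the belief parameters, the grid and the sample count are assumptions.

```python
import numpy as np

# Monte Carlo evaluation of the expected cost of Eq. (16) and the
# best-response selection of Eq. (18). The quadratic cost and the
# belief parameters are illustrative assumptions.
rng = np.random.default_rng(1)

def expected_cost(x_i, u_ij, var_ij, n_samples=20_000):
    x_j = rng.normal(u_ij, np.sqrt(var_ij), n_samples)  # draw from f_i(mu_j)
    return float(np.mean((x_i - x_j) ** 2))             # J_i = (x_i - x_j)^2

grid = np.linspace(-2.0, 4.0, 121)                      # strategy domain U_i
costs = [expected_cost(x, u_ij=1.0, var_ij=0.25) for x in grid]
best = grid[int(np.argmin(costs))]
print(best)   # near the belief mean 1.0, matching the closed form of Eq. (23)
```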
4. Application to consensus problems

In this section, the theoretical developments proposed in the previous sections are illustrated with two typical consensus problems. Based on the problem requirements, the two problems are first modelled as distributed optimization problems, and the proposed algorithm is then adopted to solve them. The first problem is a distributed averaging problem; the second is a rendezvous problem in which every agent is modelled as a UAV with a unicycle model. The simulation results for the two problems are presented and discussed in the next section.

4.1. Distributed averaging problem

Consider n agents whose initial states are not all equal and whose states sum to l > 0. The state domain of each agent is continuous, convex and bounded, U_i = (0, l], ∀i ∈ N. The inter-agent connection is distributed, i.e. the network topology is not a full graph. The mission requirement is to design a distributed protocol that lets the states of all agents reach a consensus.

For this typical consensus problem, the global cost function is designed as

$$ J_g(x_1, x_2, \ldots, x_n) = \sum_{i \ne j} a_{ij}\,(x_i - x_j)^2 \qquad (i, j \in N;\ a_{ij} \ge 0) \qquad (19) $$

where x_i and x_j represent the states of agent i and agent j, respectively. By analyzing the cost function, the only equilibrium is achieved when x_i = x_j, ∀i, j ∈ N.

Based on the problem requirement, the local cost function can be designed as

$$ J_i(x_i, x_{-i}) = \sum_{j \in N_i} a_{ij}\,(x_i - x_j)^2 \qquad (20) $$

with the neighbour set of each agent defined by the distributed and connected network topology.

From the definition of the local cost function, the designed game is a bounded potential game24 with the global cost function Eq. (19) as its potential function. According to the properties of a potential game, the designed game is guaranteed to possess a Nash equilibrium, and the equilibrium of the local cost function Eq. (20) is consistent with the global objective Eq. (19).

Provided that the empirical frequency e_i(k) is defined as a Gaussian process, f_i(μ_j(k)) ∼ N(u_{i,j}(k), σ_{i,j}^2(k)), the cost of a strategy x_i ∈ U_i is

$$ J_i(x_i \mid e_i) = \int \cdots \int \sum_{j \in N_i} a_{ij}(x_i - x_j)^2\, f_i(\mu_1) f_i(\mu_2) \cdots f_i(\mu_{i-1}) f_i(\mu_{i+1}) \cdots f_i(\mu_n)\, \mathrm{d}x_1 \mathrm{d}x_2 \cdots \mathrm{d}x_{i-1} \mathrm{d}x_{i+1} \cdots \mathrm{d}x_n = \sum_{j \in N_i} a_{ij} \int (x_i - x_j)^2 f_i(\mu_j)\, \mathrm{d}x_j = \sum_{j \in N_i} a_{ij}\left(u_{i,j}^2 + \sigma_{i,j}^2 - 2x_i u_{i,j} + x_i^2\right) \qquad (21) $$

where the time index k is omitted for concision.

Because of the myopia assumption for the agents and the convexity of the local cost function, the optimal strategy x_i* of agent i based on its empirical frequency is derived from

$$ \frac{\partial J_i(x_i \mid e_i)}{\partial x_i} = \frac{\partial}{\partial x_i} \sum_{j \in N_i} a_{ij}\left(u_{i,j}^2 + \sigma_{i,j}^2 - 2x_i u_{i,j} + x_i^2\right) = \sum_{j \in N_i} 2a_{ij}\,(x_i - u_{i,j}) = 0 \qquad (22) $$

Therefore,

$$ x_i^* = \frac{\sum_{j \in N_i} a_{ij} u_{i,j}}{\sum_{j \in N_i} a_{ij}} \qquad (23) $$

which is the weighted average of the mean values of agent i's empirical frequency.

Further, if the weights satisfy Σ_{j∈N_i} a_ij = 1 and the memory size of agent i's empirical frequency equals 1, which implies f_i(μ_j(k)) = f(x_j(k), σ_j^2(k)), then

$$ x_i(k+1) = \sum_{j \in N_i} a_{ij}\, x_j(k) \qquad (24) $$

which is consistent with the discrete consensus protocol.3

Lemma 2. Every bounded potential game has the approximate finite improvement property (AFIP).24

Theorem 1. For the consensus problem in Eq. (19), the algorithm proposed in this section converges within finitely many iteration steps, and it converges only to the consensus state

$$ f^* = \{x_i^*, i \in N\}:\ x_1^* = x_2^* = \cdots = x_n^* \qquad (25) $$

Proof. The proof of this theorem is based on the following two statements:

(1) The sequence generated by the algorithm converges within finitely many iteration steps.
(2) If the algorithm converges, it converges only to the consensus state in Eq. (25).

[Proof of statement I] As discussed in Section 3.2 and Lemma 2, with the proposed distributed optimization algorithm, the problem converges within finitely many iteration steps.

[Proof of statement II] Assume the algorithm converges to a certain state f with f ≠ f*, where f* is the equilibrium state in Eq. (25). Since the network topology in this problem is distributed and connected, there must exist an agent j ∈ N_i such that

$$ x_i \ne x_j, \qquad x_i \in f,\ x_j \in f \qquad (26) $$

Since f is the state of convergence, according to the empirical frequency updating protocol and Definition 4, the empirical frequencies of agent i and agent j will be

$$ \lim_{k \to \infty} e_i^j(k) = f(x_j, \sigma_j^2) \qquad (27) $$

$$ \lim_{k \to \infty} e_j^i(k) = f(x_i, \sigma_i^2) \qquad (28) $$

Therefore, according to Eq. (23), the next update gives

$$ \begin{cases} x_i' = x_j \\ x_j' = x_i \end{cases} \qquad (29) $$

Since x_i ≠ x_j, the updated state f' = {x_1', x_2', ..., x_n'} is not equal to f; by the definition of convergence in Definition 4, the assumption does not hold. Therefore, if the algorithm converges, it converges only to the consensus state in Eq. (25). □
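Under memory size 1 and unit row sums, the best response of Eq. (23) reduces to a neighbour average as in Eq. (24). The sketch below iterates that protocol on an assumed path graph; a self-weight is included so the iteration matrix is aperiodic (SIA), which is an assumption of this sketch rather than part of the paper's protocol.

```python
import numpy as np

# Averaging protocol of Section 4.1 with memory size 1: each agent moves to
# the weighted mean of its belief means, Eqs. (23)-(24). Path topology and
# the added self-weight (for aperiodicity) are illustrative assumptions.
neighbors = {0: [0, 1], 1: [0, 1, 2], 2: [1, 2, 3], 3: [2, 3]}
x = np.array([1.0, 2.0, 6.0, 7.0])

for _ in range(300):
    x = np.array([np.mean([x[j] for j in neighbors[i]]) for i in range(4)])

print(x)   # all four states agree in the limit, the consensus state of Theorem 1
```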
4.2. Rendezvous problem for UAVs

A rendezvous problem is a typical multi-agent cooperation problem. In a rendezvous mission, multiple agents are usually required to arrive at a rendezvous point through communication with each other under a distributed network topology. Rendezvous problems arise in various application domains.30–32 The authors of Refs.30–32 considered the planning of threat-avoiding trajectories for agents in a rendezvous mission in a hostile environment. In this section, without further complicating the discussion, a simple rendezvous problem is considered and handled with the proposed distributed optimization algorithm discussed in the previous sections.

Assume there are n agents located randomly on a 2-dimensional (2D) plane, and the inter-agent communication is undirected and distributed. Each agent i can only communicate with its neighbours j ∈ N_i. Each agent follows a unicycle model:

$$ \begin{cases} \dot{x}_i = V\cos\theta_i \\ \dot{y}_i = V\sin\theta_i \\ \dot{\theta}_i = \omega_i \end{cases} \qquad (30) $$

where V is the constant speed of every agent and ω_i is the control input.

The rendezvous problem requires the agents to gather at the same location on the 2D plane. The controllers of the agents are designed based on their own empirical frequencies. The rendezvous mission does not require the UAVs to arrive at the same location at the same time. The inter-UAV communication happens at a predefined time interval, and the vehicles re-adjust their control inputs whenever an information update is available. For this rendezvous problem, the global cost function is designed as

$$ J_g(\mu_1, \mu_2, \ldots, \mu_n) = \sum_{i=1}^{n} J_i(\mu_i, \mu_{-i}) \qquad (31) $$

with

$$ J_i(\mu_i, \mu_{-i}) = \sum_{j \in N_i} \left[(x_i - x_j)^2 + (y_i - y_j)^2\right] $$

where μ_i = (x_i, y_i, θ_i) is the state of agent i.

Based on the discussion of the distributed optimization algorithm in the previous sections, each agent i establishes its local empirical frequency e_i = {e_i^1, e_i^2, ..., e_i^n} based on the information from its neighbours. Each entry e_i^j of the empirical frequency is the estimate of the state of another agent, i.e. e_i^j = (e_ix^j, e_iy^j, e_iθ^j), where e_ix^j, e_iy^j and e_iθ^j are all Gaussian processes: f_ix(μ_j) ∼ N(u_ix,j, σ_ix,j^2), f_iy(μ_j) ∼ N(u_iy,j, σ_iy,j^2) and f_iθ(μ_j) ∼ N(u_iθ,j, σ_iθ,j^2), respectively. For a strategy μ_i = (x_i, y_i, θ_i) ∈ U_i, since the global cost function only involves the states {x_j, j ∈ N_i} and {y_j, j ∈ N_i}, the cost of the strategy based on the local empirical frequency is

$$ J_i(\mu_i \mid e_i) = \int \cdots \int J_g(\mu_1, \mu_2, \ldots, \mu_n)\, f_{ix}(\mu_1) f_{iy}(\mu_1) \cdots f_{ix}(\mu_n) f_{iy}(\mu_n)\, \mathrm{d}\mu_1 \cdots \mathrm{d}\mu_n \qquad (32) $$

By analyzing J_g(μ_1, μ_2, ..., μ_n) and {J_i(μ_i, μ_{-i}), i ∈ N} in Eq. (31), the designed problem is consistent with a potential game as defined in Definition 5, and thus the equilibrium of {J_i(μ_i, μ_{-i}), i ∈ N} is consistent with the global objective. Therefore, the cost function for strategy μ_i of agent i can be simplified as

$$ J_i(\mu_i \mid e_i) = \int \cdots \int J_i(x_i, y_i)\, f_{ix}(\mu_1) f_{iy}(\mu_1) \cdots f_{ix}(\mu_n) f_{iy}(\mu_n)\, \mathrm{d}\mu_1 \cdots \mathrm{d}\mu_n = \sum_{j \in N_i}\left(u_{ix,j}^2 + \sigma_{ix,j}^2 - 2u_{ix,j}x_i + x_i^2\right) + \sum_{j \in N_i}\left(u_{iy,j}^2 + \sigma_{iy,j}^2 - 2u_{iy,j}y_i + y_i^2\right) = W_i(x_i, y_i) \qquad (33) $$

With the support of Eq. (33), the optimal control problem is designed as

$$ J_i = W_i(x_i(t_f), y_i(t_f)) + \int_0^{t_f} \frac{1}{2} r \omega_i^2\, \mathrm{d}t \qquad \text{s.t.} \quad \begin{cases} \dot{x}_i = V\cos\theta_i \\ \dot{y}_i = V\sin\theta_i \\ \dot{\theta}_i = \omega_i \end{cases} \qquad (34) $$

By solving this optimal control problem, the control input and the trajectory of each agent can be obtained for the rendezvous mission.
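Since W_i in Eq. (33) is quadratic in (x_i, y_i), its unconstrained minimizer is the mean of the belief means; the paper instead drives this cost through the unicycle dynamics via the optimal control problem of Eq. (34). The sketch below only evaluates W_i and its minimizer for one agent; the belief means and variances are illustrative assumptions.

```python
import numpy as np

# Rendezvous cost W_i of Eq. (33) for one agent and its unconstrained
# minimizer. Belief means/variances for two neighbours are illustrative.
u = np.array([[3.0, 7.0],        # (u_ix_j, u_iy_j) for each neighbour j
              [9.0, 10.0]])
s2 = np.array([[0.2, 0.2],       # (sigma_ix_j^2, sigma_iy_j^2)
               [0.3, 0.3]])

def W_i(x, y):
    # Eq. (33): per-axis sums of u^2 + sigma^2 - 2*u*z + z^2 over neighbours
    return float(np.sum(u[:, 0]**2 + s2[:, 0] - 2 * u[:, 0] * x + x**2)
                 + np.sum(u[:, 1]**2 + s2[:, 1] - 2 * u[:, 1] * y + y**2))

target = u.mean(axis=0)          # grad W_i = 0  =>  mean of the belief means
print(target, W_i(*target))      # the point the optimal control steers toward
```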

5. Simulation results

In this section, based on the analysis of the two consensus problems in the previous section, corresponding simulations are conducted to illustrate the theoretical developments.

5.1. Simulation for distributed averaging problem

The simulation for the distributed averaging problem is conducted to illustrate the influence of the number of neighbours and the memory size on convergence efficiency. The number of neighbours is the same for each agent. The weights a_ij are constant and equal to each other. Every agent can only communicate with its most adjacent agents. The updating protocol adopted in the simulation follows Eq. (15), with the variance values of the distribution functions equal to 0.

Table 1 and Fig. 1(a) demonstrate the influence of the number of neighbours on the algorithm convergence rate. The memory size in this case is infinite, which implies that each agent records the whole strategy profile of its counterparts. From the simulation results, it is apparent that the convergence efficiency increases when the number of neighbours of each agent is increased, which is coherent with fundamental consensus theory.33

Table 1 Simulation results for the distributed averaging problem.

Number of neighbours   Infinite memory size   Memory size 3   Memory size 2
3                      470                    395             306
5                      152                    128             96
7                      66                     58              39
10                     38                     45              19
15                     13                     23              15

Table 1 and Fig. 1(b) demonstrate the influence of the memory size of the empirical frequency on the algorithm convergence rate. In Table 1, the "Infinite memory size" column represents the case in which agents take all strategies adopted by their counterparts into consideration when making their own decisions, while the "Memory size 2" and "Memory size 3" columns represent agents that only use the most recent 2 or 3 observations to estimate their opponents' strategy patterns.

To make the simulation result clearer for demonstration, the information updating process of only one agent is shown in Fig. 1(b). The blue line represents the infinite memory case; the red line represents the case in which the memory size of the empirical frequency equals 2; the green line represents the memory size 3 case.

The results demonstrate that for a network in which each agent has fewer neighbours, decreasing the memory size improves the performance greatly. However, under network topologies in which each agent has more neighbours, there is no clear pattern describing the influence of memory on the convergence performance.

Fig. 1 Simulation for distributed averaging problem.

5.2. Simulation for rendezvous problem with multiple UAVs

The simulation for the rendezvous problem is conducted to demonstrate the design procedure of a distributed multi-agent cooperation mission based on the distributed optimization algorithm proposed above. In the simulation, 5 unicycle-model-based vehicles are considered with constant velocity V = 10 m/s. The vehicles are distributed on the 2D plane at points [1 km, 3 km], [3 km, 7 km], [9 km, 10 km], [10 km, 4 km] and [6 km, 2 km].

In this problem, inter-vehicle collision and obstacle avoidance are not considered for simplicity. The inter-vehicle communication is distributed, and each vehicle can only communicate with its 2 most adjacent neighbours under an undirected network topology. Communication is assumed to happen every 10 s after the mission starts, and communication failure is not considered. The optimization is terminated when the global cost is smaller than a predefined threshold T; in the simulation, the threshold is set as T = 0.1. The general pseudospectral optimization software GPOPS34 is adopted for the optimal control problem.

The trajectories of the vehicles are shown in Fig. 2(a), and the corresponding convergence of the global cost function with respect to time is illustrated in Fig. 2(b). The convergence of the states x and y of each vehicle is shown in Fig. 3(a) and (b), respectively. Fig. 4(a) shows the simulation result for the state θ, and the designed controller signal is shown in Fig. 4(b).

The simulation demonstrates that the designed controller based on the distributed optimization algorithm proposed in the previous section achieves the rendezvous mission requirement.

Fig. 2 Simulation results of trajectory for each vehicle and global cost.

Fig. 3 Simulation results of states x and y.

Fig. 4 Simulation results of heading angle θ and control signal ω.

6. Conclusions

(1) A distributed optimization algorithm is designed based on the fictitious play concept. The algorithm assumes that agents analyze past observations as if the behaviours of their competitors were stationary; by learning from the strategy patterns of their opponents, agents decide their own optimal strategies.
(2) The designed distributed optimization algorithm ensures autonomy and robustness through the learning behaviour of each agent.
(3) Convergence of the proposed algorithm is proven under the game theory framework, guaranteeing that the algorithm converges to a Nash equilibrium provided that the original problem admits one.
(4) By introducing and analyzing the two typical consensus problems with the proposed distributed optimization algorithm, this paper provides a guideline for multi-agent cooperation problem design under the game theory framework.
(5) Simulation results demonstrate that the designed game models for the distributed multi-agent cooperation problems can ensure global cooperation among myopic agents.

Acknowledgements

This work is funded by the China Scholarship Council (No. 201206230108) and the Natural Sciences and Engineering Research Council of Canada Discovery Grant (No. RGPIN 227674).

References

1. Hernández JZ, Ossowski S, García-Serrano A. Multiagent architectures for intelligent traffic management systems. Transp Res Part C Emerg 2002;10(5–6):473–506.
2. Abreu B, Botelho L, Cavallaro A, Douxchamps D, Ebrahimi T, Figueiredo P, et al. Video-based multi-agent traffic surveillance system. Proceedings of the IEEE intelligent vehicles symposium; 2000 Oct 3–5; Dearborn, MI. Piscataway, NJ: IEEE; 2000. p. 457–62.
3. Ren W, Beard RW. Distributed consensus in multi-vehicle cooperative control: theory and applications. London: Springer-Verlag; 2008.
4. Ren W, Beard RW. Consensus seeking in multiagent systems under dynamically changing interaction topologies. IEEE Trans Autom Control 2005;50(5):655–61.
5. Remagnino P, Tan T, Baker K. Multi-agent visual surveillance of dynamic scenes. Image Vis Comput 1998;16(8):529–32.
6. Orwell J, Massey S, Remagnino P, Greenhill D, Jones GA. A multi-agent framework for visual surveillance. Proceedings of the international conference on image analysis and processing; 1999 Sep 27–29; Venice, Italy. Piscataway, NJ: IEEE; 1999. p. 1104–7.
7. Ivanov YA, Bobick AF. Recognition of multi-agent interaction in video surveillance. Proceedings of the seventh IEEE international conference on computer vision; 1999 Sep 20–25; Corfu, Greece. Piscataway, NJ: IEEE; 1999. p. 169–76.
8. Janssen M, de Vries B. The battle of perspectives: a multi-agent model with adaptive responses to climate change. Ecol Econ 1998;26(1):43–65.
9. Olfati-Saber R. Flocking for multi-agent dynamic systems: algorithms and theory. IEEE Trans Autom Control 2006;51(3):401–20.
10. Jadbabaie A, Lin J, Morse AS. Coordination of groups of mobile autonomous agents using nearest neighbor rules. IEEE Trans Autom Control 2003;48(6):988–1001.
11. Vicsek T, Czirók A, Ben-Jacob E, Cohen I, Shochet O. Novel type of phase transition in a system of self-driven particles. Phys Rev Lett 1995;75(6):1226.
12. Zhang F, Leonard NE. Cooperative filters and control for cooperative exploration. IEEE Trans Autom Control 2010;55(3):650–63.
13. Zhang F, Fiorelli E, Leonard NE. Exploring scalar fields using multiple sensor platforms: tracking level curves. Proceedings of the 46th IEEE conference on decision and control; 2007 Dec 12–14; New Orleans, LA. Piscataway, NJ: IEEE; 2007. p. 3579–84.
14. Saber RO, Murray RM. Consensus protocols for networks of dynamic agents. Proceedings of the American control conference; 2003 Jun 4–6; Pasadena, CA. Piscataway, NJ: IEEE; 2003. p. 951–6.
15. Olfati-Saber R, Murray RM. Consensus problems in networks of agents with switching topology and time delays. IEEE Trans Autom Control 2004;49(9):1520–33.
16. Fax JA. Optimal and cooperative control of vehicle formations [dissertation]. Pasadena, CA: California Institute of Technology; 2001.
17. Fax JA, Murray RM. Information flow and cooperative control of vehicle formations. IEEE Trans Autom Control 2004;49(9):1465–76.
18. Kraus S. Negotiation and cooperation in multi-agent environments. Artif Intell 1997;94(1–2):79–97.
19. Semsar-Kazerooni E, Khorasani K. Multi-agent team cooperation: a game theory approach. Automatica 2009;45(10):2205–13.
20. Li N, Marden JR. Designing games for distributed optimization. Proceedings of the 50th IEEE conference on decision and control and European control conference; 2011 Dec 12–15; Orlando, FL. Piscataway, NJ: IEEE; 2011. p. 2434–40.
21. Li N, Marden JR. Designing games for distributed optimization. IEEE J Sel Top Signal Process 2013;7(2):230–42.
22. Lin ZJ, Liu HHT. Consensus based on learning game theory. Proceedings of the 6th IEEE Chinese guidance, navigation and control conference; 2014 Aug 8–10; Yantai, China. Piscataway, NJ: IEEE; 2014.
23. Milgrom P, Roberts J. Adaptive and sophisticated learning in normal form games. Games Econ Behav 1991;3(1):82–100.
24. Monderer D, Shapley LS. Potential games. Games Econ Behav 1996;14(1):124–43.
25. Shapley LS. A value for n-person games. 1952 Mar. Report No.: AD604084.
26. Gopalakrishnan R, Nixon S, Marden JR. Stable utility design for distributed resource allocation [Internet]; 2014 [cited 2014 Jun].
27. Marden JR, Shamma JS. Game theory and distributed control. In: Handbook of game theory, vol. 4. Forthcoming.
28. Wolfowitz J. Products of indecomposable, aperiodic, stochastic matrices. Proc Am Math Soc 1963;14(5):733–7.
29. Cao YC, Yu WW, Ren W, Chen GR. An overview of recent progress in the study of distributed multi-agent coordination. IEEE Trans Ind Inf 2013;9(1):427–38.
30. McLain TW, Chandler PR, Rasmussen S, Pachter M. Cooperative control of UAV rendezvous. Proceedings of the American control conference; 2001 Jun 25–27; Arlington, VA. Piscataway, NJ: IEEE; 2001. p. 2309–14.
31. McLain TW, Chandler PR, Pachter M. A decomposition strategy for optimal coordination of unmanned air vehicles. Proceedings of the American control conference; 2000 Jun; Chicago, IL. Piscataway, NJ: IEEE; 2000. p. 369–73.
32. McLain TW, Beard RW. Trajectory planning for coordinated rendezvous of unmanned air vehicles. 2000. Report No.: AIAA-2000-4369.
33. Olfati-Saber R, Fax JA, Murray RM. Consensus and cooperation in networked multi-agent systems. Proc IEEE 2007;95(1):215–33.
34. Rao AV, Benson DA, Darby C, Patterson MA, Francolin C, Sanders I, et al. Algorithm 902: GPOPS, a MATLAB software for solving multiple-phase optimal control problems using the Gauss pseudospectral method. ACM Trans Math Softw 2010;37(2):22:1–22:39.

Lin Zhongjie received his B.S. degree from the School of Electronic Information and Electrical Engineering of Shanghai Jiao Tong University and his M.S. degree from the School of Aeronautics and Astronautics of Shanghai Jiao Tong University. He is currently a Ph.D. candidate at the University of Toronto Institute for Aerospace Studies (UTIAS). His main research interests are multi-agent cooperation problems, distributed optimization, game theory and task assignment.

Liu Hugh Hong-Tao is a professor at the University of Toronto Institute for Aerospace Studies (UTIAS), Toronto, Canada, where he also serves as the Associate Director, Graduate Studies. His research work over the past several years has included a number of aircraft systems and control related areas, and he leads the "Flight Systems and Control" (FSC) Research Laboratory.
Trajectory planning for coordinated autonomous agents using nearest neighbor rules. IEEE Trans rendezvous of unmanned air vehicles. 2000. Report No.: AIAA- Autom Control 2003;48(6):988–1001. 2000-4369. 11. Vicsek T, Cziro´k A, Ben-Jacob E, Cohen I, Shochet O. Novel type 33. Olfati-Saber R, Fax JA, Murray RM. Consensus and cooperation of phase transition in a system of self driven particles. Phys Rev in networked multi-agent systems. Proc IEEE 2007;95(1):215–33. Lett 1995;75(6):1226. 34. Rao AV, Benson DA, Darby C, Patterson MA, Francolin C, 12. Zhang F, Leonard NE. Cooperative filters and control for Sanders I, et al. Algorithm 902: GPOPS, a MATLAB software for cooperative exploration. IEEE Trans Autom Control 2010;55(3): solving multiple-phase optimal control problems using the gauss 650–63. pseudospectral method. ACM Trans Math Softw 2010; 13. Zhang F, Fiorelli E, Leonard NE. Exploring scalar fields using 37(2):22:1–22:39. multiple sensor platforms: tracking level curves. Proceedings of 2007 46th IEEE conference on decision and control; 2007 Dec 12– Lin Zhongjie received his B.S. degree from the school of electronic 14; New Orleans, LA. Piscataway, NJ: IEEE; 2007. p. 3579–84. information and electrical engineering of Shanghai Jiao Tong Uni- 14. Saber RO, Murray RM. Consensus protocols for networks of versity and M.S. degree from the school of aeronautics and astro- dynamic agents. Proceedings of the American control conference; nautics of Shanghai Jiao Tong University. He is currently a Ph.D. 2003 Jun 4–6; Pasadena, CA. Piscataway, NJ: IEEE; 2003. p. candidate at the University of Toronto Institute for Aerospace Studies 951–6. (UTIAS). His main research interests are multi-agent cooperation 15. Olfati-Saber R, Murray RM. Consensus problems in networks of problems, distributed optimization, game theory and task assignment agents with switching topology and time delays. IEEE Trans domain. Autom Control 2004;49(9):1520–33. 16. Fax JA. Optimal and cooperative control of vehicle formations Liu Hugh hong-tao is a professor at the University of Toronto Institute [dissertion]. California: California Institute of Technology; 2001 for Aerospace Studies (UTIAS), Toronto, Canada, where he also [dissertation]. serves as the Associate Director, Graduate Studies. His research work 17. Fax JA, Murray RM. Information flow and cooperative control of over the past several years has included a number of aircraft systems vehicle formations. IEEE Trans Autom Control 2004;49(9):1465–76. and control related areas, and he leads the ‘‘Flight Systems and 18. Kraus S. Negotiation and cooperation in multi-agent environ- Control’’ (FSC) Research Laboratory. ments. Artif Intell 1997;94(1–2):79–97. 19. Semsar-Kazerooni E, Khorasani K. Multi-agent team coopera- tion: a game theory approach. Automatica 2009;45(10):2205–13.