Consensus Based on Learning Game Theory with a UAV Rendezvous Application
Chinese Journal of Aeronautics (2015) 28(1): 191–199. doi:10.1016/j.cja.2014.12.009

Lin Zhongjie, Liu Hugh Hong-Tao *

University of Toronto Institute for Aerospace Studies, 4925 Dufferin Street, Toronto, Ontario M3H 5T6, Canada

Received 23 July 2014; revised 15 September 2014; accepted 2 November 2014; available online 26 December 2014

* Corresponding author.

KEYWORDS: Consensus; Distributed algorithms; Fictitious play; Game theory; Multi-agent systems; Potential game

Abstract: Multi-agent cooperation problems are becoming more and more attractive in both civilian and military applications. In multi-agent cooperation problems, different network topologies determine different manners of cooperation between agents. A centralized system directly controls the operation of each agent with information flow from a single centre, while in a distributed system, agents operate separately under certain communication protocols. In this paper, a systematic distributed optimization approach is established based on a learning game algorithm. The convergence of the algorithm is proven under the game theory framework. Two typical consensus problems are analyzed with the proposed algorithm. The contributions of this work are threefold. First, the designed algorithm inherits the properties of learning game theory for problem simplification and proof of convergence. Second, the behaviour of learning endows the algorithm with robustness and autonomy. Third, with the proposed algorithm, the consensus problems are analyzed from a novel perspective.

1. Introduction

Multi-agent cooperation problems are becoming more and more attractive in both civilian and military applications, such as real-time monitoring of crop status, monitoring of the traffic situation,1,2 forest fire monitoring,3,4 surveillance5–7 and battlefield assessment.8 In multi-agent cooperation problems, different network topologies lead to different manners of cooperation between agents. A centralized system directly controls the operation of each agent with information flow from a single centre, while in a distributed system, agents operate separately under certain communication protocols.

There are many related studies on distributed multi-agent cooperation problems, such as flocks,9 swarms,10,11 sensor fusion12,13 and so on. The theoretical framework for constructing and solving consensus problems within networked systems was introduced in Refs.14–17

For distributed multi-agent systems, three challenges need to be addressed to achieve cooperation among a potentially large number of involved agents:

(1) Agents have only limited information to utilize in achieving the global objective.
(2) The topology of the distributed network is unknown to the agents.
(3) The network can be dynamic, such as a stochastic network or a switching network.

To construct a game theory model for a distributed multi-agent cooperation problem: first, the agents of the game and their strategy domains need to be identified, with agents considered as myopic and rational players; second, based on the strategy domains and the mission requirement, a global cost function that reflects the performance of the joint actions is constructed; third, an optimal strategy is derived from the established global cost function. Therefore, in order to achieve cooperation in a distributed network among myopic agents, the designed game must yield the lowest costs for all agents if and only if their strategies benefit the global objective.

Based on the bargaining game concept, a useful strategy has been suggested for task assignment and resource distribution problems with a small number of agents.18 An approach based on cooperative game theory and bargaining game theory has been proposed for consensus problems.19 A distributed optimization algorithm has been designed under the state potential game framework.20,21

In classical game theory, agents are assumed to be able to correctly anticipate what their opponents will do, which may be unrealistic in practice. In real applications, agents cannot introspectively anticipate how others will act in distributed networks, especially in games with a large number of participants. Unlike classical game theory, learning game theory23 does not impose assumptions on agents' rationality and beliefs, but assumes instead that agents can learn over time about the game and the behaviour patterns of their opponents.

A preliminary result was presented in Ref.22 In this paper, a systematic distributed optimization approach is established based on a learning game algorithm. The convergence of the algorithm is proven under the game theory framework. Two typical consensus problems are analyzed with the proposed algorithm.

2. Preliminaries

For a game $G(N, \{\mathcal{U}_i, i \in N\}, \{J_i, i \in N\})$, $\mathcal{U}_i$ represents agent $i$'s strategy domain, with its element denoted $\mu_i$. The collected strategy domain is $\mathcal{U} = \times_{i \in N} \mathcal{U}_i$, with its element $\mu = (\mu_i, \mu_{-i})$, where $\mu_{-i}$ denotes the strategies of all agents other than agent $i$. The cost function of agent $i$ is denoted $J_i(\mu_i, \mu_{-i})$, which implies that agent $i$'s cost depends both on its own strategy and on every other agent's strategy.

2.1. Governing equation

The fictitious play algorithm is adopted for designing the distributed optimization algorithm in this work. It is a variant of a "stationary Bayesian learning" process: in fictitious play, agents assume that the behaviour patterns of their competitors are stationary.22

Consider a 2 × 2 matrix game

$$\begin{bmatrix} (a_{11}, b_{11}) & (a_{12}, b_{12}) \\ (a_{21}, b_{21}) & (a_{22}, b_{22}) \end{bmatrix} \qquad (1)$$

where $a_{ij}\ (i = 1, 2;\ j = 1, 2)$ represents the cost of Agent 1 according to its strategies $\{\mu_1^1, \mu_1^2\}$ and $b_{ij}\ (i = 1, 2;\ j = 1, 2)$ is the cost of Agent 2 for its strategies $\{\mu_2^1, \mu_2^2\}$. Agents decide their strategies based on their current empirical frequencies for their opponents. The algorithm starts with the initial empirical frequencies

$$e_1(0) = (w_1^2, w_2^2), \qquad e_2(0) = (w_1^1, w_2^1) \qquad (2)$$

where $w_i^j\ (i = 1, 2;\ j = 1, 2)$ represents the anticipated weight for opponent $j$'s $i$th strategy.

Take Agent 1 as an example: according to the empirical frequency held by Agent 1, the cost of each strategy is

$$J_1(\mu_1^1 \mid e_1) = \frac{w_1^2 a_{11} + w_2^2 a_{12}}{w_1^2 + w_2^2}, \qquad J_1(\mu_1^2 \mid e_1) = \frac{w_1^2 a_{21} + w_2^2 a_{22}}{w_1^2 + w_2^2} \qquad (3)$$

Both agents choose the strategy that yields the lowest cost based on their own empirical frequencies. At the end of each game, agents update their empirical frequencies according to their opponents' strategies in the previous game.
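To make the iteration concrete, here is a minimal sketch of fictitious play for a 2 × 2 cost game as just described. The cost matrices and the unit initial weights are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Assumed cost matrices for illustration: A[i, j] is Agent 1's cost and
# B[i, j] is Agent 2's cost when Agent 1 plays strategy i and Agent 2
# plays strategy j (0-indexed).
A = np.array([[1.0, 4.0],
              [3.0, 2.0]])
B = np.array([[2.0, 1.0],
              [4.0, 3.0]])

# Initial empirical frequencies, Eq. (2): e1 holds Agent 1's weights on
# Agent 2's strategies; e2 holds Agent 2's weights on Agent 1's strategies.
e1 = np.array([1.0, 1.0])
e2 = np.array([1.0, 1.0])

for t in range(100):
    # Expected cost of each own strategy under the opponent's empirical
    # frequency, Eq. (3): a weighted average of matrix entries.
    J1 = A @ e1 / e1.sum()    # J1[i]: Agent 1's cost for strategy i
    J2 = B.T @ e2 / e2.sum()  # J2[j]: Agent 2's cost for strategy j

    # Each agent plays its lowest-cost (best-response) strategy.
    s1, s2 = int(np.argmin(J1)), int(np.argmin(J2))

    # After the game, each agent updates its empirical frequency with the
    # strategy its opponent actually played.
    e1[s2] += 1.0
    e2[s1] += 1.0

print("empirical frequencies:", e1 / e1.sum(), e2 / e2.sum())
```

With these particular matrices, play locks onto the second strategy of each agent, $(\mu_1^2, \mu_2^2)$, which is the unique pure-strategy Nash equilibrium of the assumed cost game, so each empirical frequency concentrates on the opponent's second strategy.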
Definition 1 (ε-dominate23). A strategy $\mu_i \in \mathcal{U}_i$ is ε-dominated by another strategy $\hat{\mu}_i \in \mathcal{U}_i$ if, for any $\mu_{-i} \in \mathcal{U}_{-i}$, $J_i(\mu_i, \mu_{-i}) > J_i(\hat{\mu}_i, \mu_{-i}) + \varepsilon$. The ε-dominate strategy domain of agent $i$, the set of strategies that are not ε-dominated, is denoted as

$$D_i^{\varepsilon}(\mathcal{U}) = \{\mu_i \in \mathcal{U}_i \mid J_i(\mu_i, \mu_{-i}) \leq J_i(\hat{\mu}_i, \mu_{-i}) + \varepsilon,\ \forall \hat{\mu}_i \in \mathcal{U}_i,\ \forall \mu_{-i} \in \mathcal{U}_{-i},\ \varepsilon \geq 0\} \qquad (4)$$

According to the definition of the ε-dominate domain, adaptive learning is defined as follows.

Definition 2 (Adaptive learning23). A strategy sequence $\{\mu_i(t)\}$ of agent $i$ is consistent with adaptive learning if

$$\mu_i(t) \in D_i^{\varepsilon}(\{\mu(s) \mid \hat{t} \leq s < t\}), \quad \forall \hat{t} < t \qquad (5)$$

Definition 3 (Nash equilibrium). A strategy $\mu^* = (\mu_i^*, \mu_{-i}^*)$ is called a Nash equilibrium of game $G(N, \{\mathcal{U}_i, i \in N\}, \{J_i, i \in N\})$ if and only if

$$J_i(\mu_i^*, \mu_{-i}^*) \leq J_i(\mu_i, \mu_{-i}^*), \quad \forall i \in N,\ \forall \mu_i \in \mathcal{U}_i \qquad (6)$$

Definition 4 (Convergence). The sequence $\{f(t)\}$ converges to $f^*$ if and only if there is a $T > 0$ such that $f(t) = f^*$ for all $t \geq T$.

Lemma 1 (Adaptive learning convergence23). If a strategy sequence $\{\mu_i(t)\}$ is consistent with adaptive learning and converges to $\mu_i^*$, then $\mu_i^*$ belongs to a pure strategy Nash equilibrium $\mu^* = (\mu_i^*, \mu_{-i}^*)$.

2.2. Potential games

The potential game concept is important for distributed multi-agent cooperation problems.

Definition 5 (Potential game24). A game $G(N, \{\mathcal{U}_i, i \in N\}, \{J_i, i \in N\})$ is a potential game if there exists a global function $\Phi(\mu_i, \mu_{-i})$ such that

$$J_i(\mu_i, \mu_{-i}) - J_i(\hat{\mu}_i, \mu_{-i}) = \Phi(\mu_i, \mu_{-i}) - \Phi(\hat{\mu}_i, \mu_{-i}), \quad \forall i \in N,\ \forall \mu_i, \hat{\mu}_i \in \mathcal{U}_i,\ \forall \mu_{-i} \in \mathcal{U}_{-i} \qquad (7)$$

If the strategy domain of every agent in a potential game is bounded, the game is defined as a bounded potential game.

The potential game concept provides a valuable theoretical framework for distributed multi-agent cooperation problems. First, a game that admits the potential property is guaranteed to possess a Nash equilibrium. Second, from the definition of a potential game, the Nash equilibrium of every local cost function is consistent with the global objective.
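When the strategy domains are finite, Definitions 1 and 3 can be checked by direct enumeration. The sketch below does so for a hypothetical two-agent, two-strategy cost game; all numbers are assumptions for illustration.

```python
import numpy as np
from itertools import product

# Assumed cost tables: J1[i, j] and J2[i, j] are the agents' costs when
# Agent 1 plays strategy i and Agent 2 plays strategy j.
J1 = np.array([[1.0, 2.0],
               [2.0, 3.0]])
J2 = np.array([[0.0, 2.0],
               [2.0, 4.0]])

def eps_undominated(J, eps):
    """D^eps of Eq. (4) for the row player: strategies whose cost stays
    within eps of the best achievable against every opposing strategy."""
    return [i for i in range(J.shape[0])
            if all(J[i, j] <= J[:, j].min() + eps for j in range(J.shape[1]))]

def pure_nash(J1, J2):
    """Pure-strategy Nash equilibria per Definition 3: no agent can lower
    its own cost by a unilateral deviation."""
    return [(i, j) for i, j in product(range(2), repeat=2)
            if J1[i, j] <= J1[:, j].min() and J2[i, j] <= J2[i, :].min()]

print(eps_undominated(J1, eps=0.0))  # [0]: only the first strategy survives
print(eps_undominated(J1, eps=1.0))  # [0, 1]: both survive at eps = 1
print(pure_nash(J1, J2))             # [(0, 0)]
```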
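Definition 5 can be verified the same way. The following sketch builds cost tables from an assumed candidate potential by adding terms that are constant in each agent's own strategy, which is exactly the structure that leaves the differences in Eq. (7) unchanged.

```python
import numpy as np
from itertools import product

# Assumed candidate potential over joint strategies.
Phi = np.array([[0.0, 2.0],
                [1.0, 3.0]])
J1 = Phi + np.array([1.0, 0.0])       # column-only offset for Agent 1
J2 = Phi + np.array([[0.0], [1.0]])   # row-only offset for Agent 2

def is_potential(J1, J2, Phi):
    """Verify Eq. (7): each unilateral cost change equals the Phi change."""
    for i, k, j in product(range(2), repeat=3):
        # Agent 1 deviates from strategy i to k with Agent 2 fixed at j.
        if not np.isclose(J1[i, j] - J1[k, j], Phi[i, j] - Phi[k, j]):
            return False
        # Agent 2 deviates from strategy i to k with Agent 1 fixed at j.
        if not np.isclose(J2[j, i] - J2[j, k], Phi[j, i] - Phi[j, k]):
            return False
    return True

print(is_potential(J1, J2, Phi))  # True for this construction
```

The cost tables constructed here equal those used in the previous sketch, whose unique pure Nash equilibrium (0, 0) is also the minimizer of Phi, illustrating the consistency between local cost functions and the global objective noted above.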
2.3. Consensus problems

The consensus problems considered in this work take two typical forms: continuous-time dynamics $\dot{x}(t) = -Lx(t)$, where $L$ is the Laplacian matrix of the communication graph, and the discrete-time update $x(k+1) = A(k)x(k)$. The matrix $A(k)$ is a row stochastic matrix, which suggests that

$$\lim_{t \to \infty} e^{-Lt} = \mathbf{1}v^{\mathrm{T}} \qquad (11)$$

and

$$\lim_{k \to \infty} A^k = \mathbf{1}u^{\mathrm{T}} \qquad (12)$$

where $\sum_i v_i = 1$ and $\sum_i u_i = 1$. Therefore, Eqs. (11) and (12) suggest that $\lim_{t \to \infty} x(t) = \left(\sum_i v_i x_i(0)\right)\mathbf{1}$ and $\lim_{k \to \infty} x(k) = \left(\sum_i u_i x_i(1)\right)\mathbf{1}$; that is, both dynamics drive all agents to a common weighted average of their initial states.

3. Distributed optimization algorithm

Under a centralized network, the centre can obtain information from all agents and conduct the optimization. For a distributed system, only partial information is available to each agent due to the distributed network topology.
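As a numerical illustration of Eq. (12), the sketch below iterates a fixed row stochastic matrix for a hypothetical three-agent network (the topology and initial states are assumptions) and compares the resulting common value with the weighted average predicted by the left eigenvector $u$. With a constant update matrix, $u^{\mathrm{T}}x(1) = u^{\mathrm{T}}Ax(0) = u^{\mathrm{T}}x(0)$, so the two indexings give the same limit.

```python
import numpy as np

# Assumed row stochastic matrix for a 3-agent network: every row sums to 1,
# so each agent moves to a convex combination of its neighbours' states.
A = np.array([[0.6, 0.4, 0.0],
              [0.3, 0.4, 0.3],
              [0.0, 0.5, 0.5]])
x0 = np.array([1.0, 5.0, 9.0])  # assumed initial states x(0)

# Discrete-time consensus iteration x(k+1) = A x(k).
x = x0.copy()
for k in range(200):
    x = A @ x

# Left eigenvector u of A at eigenvalue 1, normalized so sum(u) = 1,
# which gives the limit in Eq. (12): lim A^k = 1 u^T.
vals, vecs = np.linalg.eig(A.T)
u = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
u = u / u.sum()

print(x)       # all entries approach the same consensus value
print(u @ x0)  # predicted consensus value: sum_i u_i * x_i(0)
```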