
Network Games in Security: Three Problems

Yuke Li

January 27, 2020

Abstract: This paper contains the development and analysis of three network games applied to problems in the context of security. The first is the classic Byzantine Generals Problem [1] reformulated into a game on time-varying graphs. We explore conditions for the game under which different kinds of equilibria (an optimal consensus, a less optimal consensus, or no consensus) may arise, including the decision rule for the generals and the sequence of communication graphs. We also address the question of "fault tolerance" [1] – the maximum number of traitors in the game, beyond which there will no longer be an optimal-consensus type of equilibrium. The second and third problems are, respectively, robotic surveillance and agents' mutual influence on communication networks, whose analysis depends on a method we term "control of Markov chains." In these two games, agents can be regarded as designing a Markov chain. We study the existence and properties of pure-strategy Nash equilibria in different scenarios of the games. The main advantage of the games is that the payoff functions for the agents are properties derived from the Markov chain itself, such as the stationary distribution. This stands in contrast with the assumed payoff functions in standard game-theoretic approaches to these problems, such as [2, 3], [4], [5], [6], [7], [8] and others. Lastly, we discuss possibilities of extending our analysis.

Keywords: Byzantine generals, time-varying graphs, fault tolerance, Markov chains, robotic surveillance, opinion dynamics

1 The Byzantine Generals Problem

The Byzantine Generals Problem is a classic research question in the field of distributed systems (e.g., [1, 9]). Fundamental to the functioning of such a system is a consensus reached by its components, possibly after information transmission among them. A team of UAVs should coordinate on a formation strategy. The agents in a blockchain must confirm previous Bitcoin transactions to be valid. Data managers in a distributed database system need to agree on whether to commit or abort a given transaction. A distributed system needs to continue to function despite the failures of a few components (or agents). One type of failure is the "Byzantine fault" [1]. It means that some components

may send contradictory or conflicting information, or they may sleep and then resume activity after a certain amount of delay. In the original story, several divisions of the Byzantine army are camped outside an enemy city, with each division commanded by a lieutenant. The lieutenants receive a message in the form of "attack" (0) or "not attack" (1) from a commander, send these messages among themselves, and decide whether to attack via majority voting. However, some of these generals, including the commander, might be traitorous, trying to prevent a good decision from being reached, such as by sending conflicting messages to others.

In what follows, we propose to reformulate the original Byzantine Generals Problem as a game-theoretic one. In the game, the generals transmit messages among themselves on a sequence of communication graphs. The strategic communication protocol is that the commander sends its messages to some lieutenants, who decide on the messages sent to their "neighbors" in the communication graph (the lieutenants at the same "level" and the next "level") and on their positions based on the messages received. At the end of the protocol, they deliver a vote on their positions, which determines their payoffs from the game. We will study the strategies – the strategic, mutual best responses – of the generals. A significant difference from [1] is that we will still assume that the traitorous generals can send anything they wish. But the strategies of the traitors need to be optimally calibrated with those of the generals they communicate with and, importantly, vice versa. We will explore conditions for the game under which different kinds of equilibria (an optimal consensus, a less optimal consensus or no consensus) may arise, including the decision rule for the loyal generals (because the traitorous generals may act arbitrarily), and the sequence of communication graphs.
As in [1], we will also address the question of "fault tolerance" – the maximum number of traitors in the game beyond which there will no longer be an optimal-consensus type of equilibrium.

To the best of our knowledge, there are not many existing game-theoretic studies motivated by the Byzantine Generals Problem. [10] contains an informal discussion of how the problem can be thought of as a game between two opposing teams. The most relevant work, such as the electronic mail game formulated in [11] as well as its follow-up work (e.g., [12]), has shown that a lack of common knowledge generated by faulty communication can make coordinated actions impossible. However, an electronic mail game is usually played by two agents. In particular, the game does not cover the first type of Byzantine fault – the agents may send conflicting information to others – which is, instead, the focus of our study.

At a general level, our study relates to the literature on games of strategic information transmission. This strand of research started with the seminal work of [13], which develops a two-stage game with a sender transmitting a message to a receiver. The sender has private information about the statistical distribution of the state of the world unknown to the receiver and may send false messages to mislead the receiver if their interests do not align. The Crawford–Sobel model was later expanded to accommodate settings where a sender interacts with multiple receivers [14], where the senders and the receivers communicate over a dynamic horizon [15], or where the senders and the receivers communicate on a static network [16]. The contribution of our work is that we consider agents passing messages on a dynamic sequence of communication graphs, and that many of these agents are both a sender and a receiver.

Our study also relates to the literature on voting games, though to a lesser extent (e.g., [17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28]). As far as we know, a voting game normally does not cover a communication protocol between politicians and voters. Nor do we consider the issues of legislative bargaining or party competition, or assume a certain distribution for the voters' preferences as in a voting game.

Next, we will first formulate the game of the Byzantine Generals. Second, we study two scenarios: one where the lieutenants cannot communicate with others at the same "level" in the communication graph, and the other where they can. In the meantime, we explore equilibrium predictions in either scenario by assuming two different decision rules.

1.1 Problem Formulation

In the game of interest, there is a set of generals $n = \{1, 2, \ldots, n\}$ connected on a communication graph $G = \{\mathcal{V}, \mathcal{E}\}$, which is a directed graph. Any two connected agents $i$ and $j$ in $\mathcal{V}$ may have a directed edge $(i, j)$ or two directed edges $(i, j)$ and $(j, i)$ between them. We partition $n$ into $K \in \mathbb{N}$ disjoint subsets of generals,

$$\bigcup_{k=0}^{K} n_k = n.$$

The node set $n_0 = \{u\}$ includes $u$, designated as the commander. The node set $n_k$ represents the set of generals at level $k \in \{0, 1, \ldots, K\}$. For general $i$, let its neighbor set at level $k$ be $\mathcal{N}_{ik} = \{j : j \in n_k \text{ and } (i, j) \in \mathcal{E} \text{ or } (j, i) \in \mathcal{E}\}$. We require that for $i \in n_k$, $j \in \mathcal{N}_{i,k+1}$ and $k \in \{0, 1, \ldots, K\}$, only $(i, j)$ exists.

Assumption 1: Message Passing. At time $k \in \{0, 1, \ldots, K\}$, general $i$ at level $k$ sends a message of 0 or 1 to each of its neighbors at levels $k$ and $k+1$. Assume that $i$ will send messages to his neighbors at level $k$ based on the messages received from his neighbors at level $k-1$; $i$ will send messages to his neighbors at level $k+1$ based on the messages received from his neighbors at both level $k-1$ and level $k$.

In other words, for $i \in n_k$, its messages to neighbors at level $k$ are denoted by the $1 \times |\mathcal{N}_{ik}|$ vector $[m_{ij}]$ and determined by the map

$$[m_{pi}] \mapsto [m_{ij}],$$

where $p \in \mathcal{N}_{i,k-1}$ and $j \in \mathcal{N}_{ik}$. Its messages to neighbors at level $k+1$ are denoted by the $1 \times |\mathcal{N}_{i,k+1}|$ vector $[m_{iq}]$ and determined by the map

$$[m_{pi}] \times [m_{ji}] \mapsto [m_{iq}],$$

where $p \in \mathcal{N}_{i,k-1}$, $j \in \mathcal{N}_{ik}$ and $q \in \mathcal{N}_{i,k+1}$.

The messages are passed on a sequence of communication graphs $G(k)$, $k \in \{0, 1, \ldots, K\}$, each of which is a weakly connected digraph. We call this sequence an ascending chain of $G$'s spanning subgraphs,

$$G(k) \subset G(k+1),$$

which will reach $G$ at time $K$, $G(K) = G$, with the following property:

$$\mathcal{E}(k) \subset \mathcal{E}(k+1),$$

where

$$\mathcal{E}(k+1) \setminus \mathcal{E}(k) = \{(j, h) : j \in n_{k+1} \text{ and } h \in n_{k+1} \cup n_{k+2}\}$$

and

$$\mathcal{E}(K) \setminus \mathcal{E}(K-1) = \{\{j, h\} : j, h \in n_k\}.$$

The message $m_{jh} \in \{0, 1\}$ can thus be regarded as being passed on the directed edge $(j, h)$ in $G(k+1)$.

Assumption 2: Own Position. There are two types of generals, loyal generals and traitors, with their types known only to themselves. Each general $i$ needs to determine his own position $v_i \in \{0, 1\}$ based on messages received.
At level 0, the commander $u$ determines his position $v_u \in \{0, 1\}$. At level $k \in \{1, 2, \ldots, K\}$, the loyal generals determine their positions with the messages received from the generals at level $k-1$ and level $k$, and send those positions to their neighbors at level $k+1$. A loyal general's position is the value he sends to his neighbors at level $k+1$. By contrast, the value a traitor sends to his neighbors may not necessarily be his own position. A traitor may send whatever messages he would like to any neighbor.
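To make the level-based neighbor structure of Assumption 1 concrete, here is a small Python sketch; the encoding of levels and edges (and the function name) is our own, not the paper's:

```python
def neighbor_sets(levels, edges):
    """Build, for each general i at level k, the within-level neighbor
    set N_{ik} and the next-level neighbor set N_{i,k+1}.

    levels: list of sets, levels[k] = n_k (level 0 holds the commander).
    edges:  set of (i, j) pairs; pairs within a level are treated as
            undirected, and pairs toward level k+1 as the single
            directed edge (i, j) required by the model.
    """
    level_of = {i: k for k, nk in enumerate(levels) for i in nk}
    same = {i: set() for i in level_of}
    nxt = {i: set() for i in level_of}
    for (i, j) in edges:
        if level_of[i] == level_of[j]:
            same[i].add(j)   # within-level links are bidirectional
            same[j].add(i)
        elif level_of[j] == level_of[i] + 1:
            nxt[i].add(j)    # only (i, j) exists toward the next level
    return same, nxt
```

For example, with a commander `u`, lieutenants `a` and `b` at level 1 and `c` at level 2, the edge `('a', 'b')` is a within-level link while `('a', 'c')` feeds level 2.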

Assumption 2.1: Decision Rule I. Each loyal general $i \in n_k$, $k \in \{0, 1, \ldots, K\}$, can only observe messages sent to him from layers $k-1$ and $k$. General $i$ will send the simple majority value of the messages received from layer $k-1$ to his neighbors at layer $k$. He will adopt the simple majority value among the messages received from layers $k-1$ and $k$ as his own position $v_i$, and send this value to the neighbors at layer $k+1$. If "0"s and "1"s respectively make up one-half of all the received values, assume that $i$ may take either value as $v_i$. A traitor may send to his neighbors whatever he wishes.

Assumption 3: Final Vote. The game will end at time $K$. The state of the game is realized through a final, majority vote on the positions of the generals described by the vector

$$v = [v_i]_{1 \times n}.$$

The loyal generals will vote truthfully on their positions, while the traitors may vote strategically. The majority can be any desirable majority, which means at least a simple majority.

Assumption 4: Possible Outcomes. There are three possible types of outcomes. One is no consensus, which is entirely possible when the total number of generals is even. The second and third outcomes are that a consensus on either 0 or 1 will be reached. Only one of the consensus outcomes can be regarded as the "general will", which we call the optimal consensus, and to which the traitors are opposed.

Assume for simplicity the preference structure of the loyal generals and traitors as follows. If no consensus is made, each general receives payoff 0. If they have an optimal consensus, each traitor receives payoff 1, and each loyal general receives payoff 2. If they have the less

optimal consensus, each traitor receives payoff 2, and each loyal general receives payoff 1. The outcome and the payoffs are only realized at the end of the game. Other than the commander, the lieutenants have no prior information on whether 0 or 1 will be the optimal consensus.

In this game, it is natural to investigate the subgame perfect Nash equilibrium. Generals' strategies

$$\{m_{ij}, v_i \in \{0, 1\} : i \in n_k \text{ and } j \in n_k \cup n_{k+1}, \ 0 \le k \le K\}$$

are a subgame perfect Nash equilibrium of the game if for any general $i$ at any level $k$ unilaterally deviating to play an alternative strategy

$$\{m^*_{ij}, v^*_i \in \{0, 1\} : j \in n_k \cup n_{k+1}\},$$

he cannot receive a better payoff.

1.2 Lieutenants Cannot Communicate

We first consider the scenario where the lieutenants at each layer cannot communicate among themselves, under Assumptions 1, 2.1, 3, and 4 in Section II. The simplest game in this scenario only has two stages. Given that the lieutenants cannot communicate, the communication graph is a one-layer tree. In the set of generals $n = \{1, 2, \ldots, n\}$, $u \in n$ is the commander. At the root of the tree he sends a message $m_{ui} \in \{0, 1\}$ to each of the other $n-1$ generals $i$ at layer 1, $i \in n \setminus \{u\}$. General $i$ will then choose his position $v_i \in \{0, 1\}$ based on $m_{ui}$. Then a final majority vote will be delivered on $[v_i]_{1 \times n}$ and the payoffs from the voting outcome are realized.

Theorem 1 If the following holds:

1) Only the commander is traitorous;

2) The game only has two stages and takes place on a tree, with the generals at layer 1 unable to communicate among themselves;

then any subgame perfect Nash equilibrium will not realize the general will, but the less optimal consensus.

Proof of Theorem 1: Assume without loss of generality that the general will is to realize "1" as the outcome, and that the traitor prefers "0" over "1". If a simple majority decides the final vote, there are two cases to consider.

1) The total number of generals is odd ($2m+1$, $m \in \mathbb{N}$). In this case, it is impossible not to reach a consensus, though a non-consensus might be the commander's most preferred outcome. Then one dominant strategy of $u$ is $m_{ui} = 0$, $i \in n \setminus \{u\}$. In fact, in any equilibrium he only needs to send "0" to at least $m+1$ lieutenants. By Decision Rule I, a lieutenant at layer 1 will take whatever message he receives from $u$ as his position. The above scenario is a subgame perfect Nash equilibrium because $u$ has no incentive to change its strategy, and any unilateral deviation by any single lieutenant from "0" to "1" is unable to change the outcome.
If $u$ only sends "0" to $m$ lieutenants instead, there always exists one of those $m$ lieutenants preferring to deviate and adopt "1" as his position.

2) The total number of generals is even ($2m$, $m \in \mathbb{N}_+$). If the commander's most preferred outcome is "0", he will send "0" to at least $m$ lieutenants as above, and this will be a subgame perfect Nash equilibrium. Even if the commander's most preferred outcome is a non-consensus, his best strategy is to opt for his second-best option, which is to realize "0" as the consensus outcome. For a non-consensus to occur, $u$ has to send "0" to exactly $m-1$ lieutenants and "1" to $m$ lieutenants, and use "0" as his position. This scenario cannot be a subgame perfect Nash equilibrium, because there will always be a lieutenant receiving "0" as the message defecting to adopt "1" as his position¹. Then in any equilibrium, $u$ has to send "0" to at least $m$ lieutenants and take "0" as his position, which means that he "sincerely" votes according to his preference and therefore will not defect. Also, none of the lieutenants will unilaterally deviate to take "1" as their positions, which either will not change the outcome or will bring about a non-consensus. In all, the traitorous commander can always exploit Decision Rule I to disrupt the general will from being reached in equilibrium. A similar argument applies to the cases where the final vote is based on other kinds of majorities. ∎

A more general scenario than the one in Theorem 1 is that the game has multiple stages and the lieutenants at each layer cannot communicate among themselves. In the set of generals $n = \{1, 2, \ldots, n\}$, $u \in n$ is the commander, sending a message $m_{ui} \in \{0, 1\}$ to each of the generals at layer 1, $i \in n_1$. At layer $k \in \{1, 2, \ldots, K\}$, general $i$ will determine his position $v_i \in \{0, 1\}$ based on the received messages from layer $k-1$ and send messages to his neighbors at layer $k+1$. Then a final majority vote will be delivered on $[v_i]_{1 \times n}$ and the payoffs from the voting outcome are realized.
We first require that this game takes place on a multi-layer tree, which means that each non-commander general only has one sender from the previous layer and multiple receivers from the next layer.
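To make Theorem 1's two-stage game concrete, the message stage (under Decision Rule I, each lieutenant adopts the commander's message as his position) and the final vote of Assumption 3 can be simulated. A minimal Python sketch with our own function names, where `None` encodes a non-consensus:

```python
def majority(values, tie_breaker=None):
    """Simple majority of a list of 0/1 values; ties return tie_breaker.

    With tie_breaker=None, a tie models the no-consensus outcome of
    Assumption 4 (only possible when the number of voters is even).
    """
    ones = sum(values)
    zeros = len(values) - ones
    if ones > zeros:
        return 1
    if zeros > ones:
        return 0
    return tie_breaker

def two_stage_outcome(commander_msgs, commander_position):
    """Theorem 1's game: the commander sends one bit to each lieutenant;
    by Decision Rule I each lieutenant adopts the received bit as his
    position, and a final simple-majority vote decides the outcome."""
    positions = list(commander_msgs) + [commander_position]
    return majority(positions)

# With 2m+1 = 7 generals, a traitorous commander who prefers "0" can
# send "0" to m+1 = 4 of the 6 lieutenants and vote "0" himself.
```

This mirrors the proof of Theorem 1: once the commander sends "0" to at least $m+1$ lieutenants, no single lieutenant's deviation can flip the simple-majority vote.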

Definition 1 ("Critical Mass") Suppose the total number of generals is $2m+1$ (or $2m$), $m \in \mathbb{N}$. For a subset of generals $n_s$, if

1) they have the same most preferred outcome, "0" or "1" or non-consensus,

2) as a result of their message passing, either the total number of generals who will take their favourable positions by Decision Rule I is a desirable majority, or a non-consensus will occur,

we say $n_s$ is a "critical mass".

Remark 1 Given general $i$ at layer $k \in \{0, 1, 2, \ldots, K\}$ of the communication graph $G(k)$ (which is a tree), denote the subtree of size $K - k + 1$ with root node $i$ as $T_i$. Let the set of nodes in the subtree $T_i$ be $n_{T_i}$. When the vote is decided by a simple majority, a necessary

¹This reasoning applies in all other results and will not be repeated.

and sufficient condition for a set of non-commander traitors $n_0$ to be a "critical mass" is the following inequality:

$$\Big(\Big|\bigcup_{i \in n_0} n_{T_i}\Big| > m+1 \ (\text{or } m)\Big) \vee \Big(\Big|\bigcup_{i \in n_0} n_{T_i}\Big| = |n_0| = m+1 \ (\text{or } m)\Big)$$

Theorem 2 If the following holds:

1) Some generals may be traitorous.

2) The game has multiple stages, with the lieutenants at layer $k \in \{1, 2, \ldots, K\}$ receiving messages only from layer $k-1$ and transmitting messages to their neighbors on layer $k+1$, and the communication graph $G(k)$ being a tree.

3) A simple majority decides the final vote, and the inequality in Remark 1 holds (does not hold) for the number of traitors.

Then there will not be (will be) a subgame perfect Nash equilibrium realizing the general will.

Proof of Theorem 2: As in Remark 1, let the set of traitors be $n_0 \subset n$. We first consider the case when $|n| = 2m+1$, and proceed to prove the condition in Remark 1 to be both necessary and sufficient.

Necessity. There are two cases to consider.

1) If $\big|\bigcup_{i \in n_0} n_{T_i}\big| = m+1$ but $\big|\bigcup_{i \in n_0} n_{T_i}\big| \ne |n_0|$, it means that some of these $m+1$ generals are loyal, because $\bigcup_{i \in n_0} n_{T_i} \supseteq n_0$. It cannot be an equilibrium that these $m+1$ generals vote "0" and the other $m$ generals vote "1" – a unilateral deviation by any loyal general among the $m+1$ generals will be profitable.

2) If $\big|\bigcup_{i \in n_0} n_{T_i}\big|$

Sufficiency. When $\big|\bigcup_{i \in n_0} n_{T_i}\big| > m+1$, any traitor $i$ in any equilibrium will only send 0 to its neighbors in the next layer. By Decision Rule I, the number of generals holding 0 as their positions will be a simple majority. Then the only kind of subgame perfect Nash equilibrium in the game is 0 realized as the outcome.

Alternatively, suppose $\big|\bigcup_{i \in n_0} n_{T_i}\big| = |n_0| = m+1$, which is the second inequality in the condition in Remark 1. It is a subgame perfect Nash equilibrium that these $m+1$ traitors vote "0" and the $m$ loyal generals vote "1", because none can unilaterally deviate to make himself better off.

When $|n| = 2m$, or the final vote is decided by other kinds of majorities, a similar argument applies. ∎

Theorem 1 is a special case of Theorem 2 – if the commander is traitorous, the condition in Remark 1 easily holds. Theorems 1 and 2 could still apply when the loyal generals adopt a different decision rule – a loyal general may take the exact opposite value of the message received from the commander as his own position. Despite the existence of a traitorous

commander, the general will might be enforced in equilibrium if the loyal generals have no specific decision rule (e.g., acting randomly).

We now consider the scenario in which the lieutenants at the same layer cannot communicate, but the communication graph may not be a tree. Given the neighborhood structure of the generals, it would be particularly difficult to generalize on a necessary and sufficient condition by which a critical mass will form. Therefore, we derive a sufficient condition for a subset of generals being a critical mass below.

Theorem 3 If the following holds,

1) Both the commander and the generals from level 1 are traitorous with the same preferred positions.

2) The game has multiple stages, with the lieutenants at layer $k \in \{1, 2, \ldots, K\}$ receiving messages only from layer $k-1$ and transmitting messages to their neighbors on layer $k+1$.

Then there will not be a subgame perfect Nash equilibrium realizing the general will.

Proof of Theorem 3: As before, suppose without loss of generality that the general will is "1". If the commander and the generals from layer 1 are traitorous, their goal is to have enough generals adopt their positions eventually. At layer $k$, any traitorous general $i$ has a dominant strategy, which is to send at least $m_i \in \mathbb{N}$ neighbors at layer $k+1$ his real position, "0". By Decision Rule I, a loyal general at layer $k+1$ will adopt the majority value of the received messages as his position, and send his position on to layer $k+2$. A traitorous general at layer $k+1$ will do the same as those traitors did at layer $k$. None will unilaterally deviate because

1) the traitors have no incentives to change their strategies.

2) given the traitors’ strategies, none of the loyal generals can unilaterally reverse the game outcome.

Then in any equilibrium, the traitors will make sure there is a majority adopting their preferred positions. ∎

There could be more sufficient conditions along similar lines. For instance, the commander does not have to be traitorous. Alternatively, we could further require more generals below layer 1 to be traitorous. This leads to a more general discussion in Section 3.2 on the connection between the existence of a "critical mass" of traitors and the equilibrium prediction.

1.3 Lieutenants Can Communicate

Now we consider the scenario where the lieutenants at each layer may communicate among themselves, with the same set of assumptions as in Section 3.1.

The primary difference from the second game is that at layer $k \in \{1, 2, \ldots, K\}$, general $i$ will determine his position $v_i \in \{0, 1\}$ and send messages to his neighbors at layer $k+1$ based on the messages from layers $k-1$ and $k$. Then, when receiving conflicting messages, it may be impossible to deduce using Decision Rule I whether the traitors are from his layer or the previous layer or both.

Figure 1: Traitors exist from both layer 1 and layer 2

Example 1 Figure 1 shows a communication graph $G(2)$ (with a commander and two layers of lieutenants). Suppose the second lieutenant $v_1$ from layer 1 and the fifth lieutenant $v_5$ from layer 2 are traitorous. On receiving the likely identical messages from both $v_1$ and $v_5$, $v_4$ may not know both of them to be traitorous. On receiving the likely conflicting messages from the commander $u$ and $v_2$, $v_1$ may not tell whether one or both of them are traitorous.

The following statement is a tautology. If we have

1) Some generals are traitorous.

2) The game is played as described by Decision Rule I.

3) The traitors constitute a critical mass;

then there will not be a subgame perfect Nash equilibrium realizing the general will.

For the game considered in this section, it would be unrealistic to determine the existence conditions for a critical mass of traitors when given a set of generals $n$ and a sequence of communication graphs $G(k)$, $k \in \{0, 1, \ldots, K\}$. It would be realistic, however, to examine whether such a critical mass exists. We propose one algorithm for this purpose below.

Theorem 4 By Algorithm 1, the set of traitors in the game $n_0$ is a critical mass if $n_s$ is a desirable majority.

Proof of Theorem 4: By Algorithm 1, if general $i$ at layer $k \in \{0, 1, \ldots, K\}$ is a traitor, he is then an element of the set $n_s$.

Algorithm 1

Input: $n_k$, $G(k)$ for $k \in \{0, 1, \ldots, K\}$; $n_0$

Output: $n_s$

Initialize: $n_s \leftarrow \emptyset$; $n_s^* \leftarrow \emptyset$; $\bar{n}_s \leftarrow \emptyset$

for $0 \le k \le K$ do
  for $i \in n_k$ do
    if $i \in n_0$ then
      $n_s \leftarrow n_s \cup \{i\}$
    else if $|\{j \in n_{k-1} : \{i, j\} \in \mathcal{E}_k \text{ and } j \in n_0\}| > |\{j \in n_{k-1} : \{i, j\} \in \mathcal{E}_k \text{ and } j \notin n_0\}|$ then
      $n_s^* \leftarrow n_s^* \cup \{i\}$
    else
      go to the next general in $n_k$
  for $i \in n_k$ do
    if $|\{j \in n_{k-1} \cup n_k : j \in n_s \cup n_s^* \text{ and } \{i, j\} \in \mathcal{E}_k\}| > |\{j \in n_{k-1} \cup n_k : j \notin n_s \cup n_s^* \text{ and } \{i, j\} \in \mathcal{E}_k\}|$ then
      $\bar{n}_s \leftarrow \bar{n}_s \cup \{i\}$
    else
      go to the next general in $n_k$
  $n_s \leftarrow n_s \cup \bar{n}_s$
return $n_s$

A loyal general $i$ at layer $k \in \{0, 1, \ldots, K\}$ first determines what messages to send to his neighbors at the same layer. If he has more traitors than loyal generals among his neighbors from layer $k-1$,

$$|\{j \in n_{k-1} : \{i, j\} \in \mathcal{E}_k \text{ and } j \in n_0\}| > |\{j \in n_{k-1} : \{i, j\} \in \mathcal{E}_k \text{ and } j \notin n_0\}|,$$

then by Decision Rule I he will send the traitors' preferred value to his neighbors at layer $k$ in the game. We then let $i$ be an element of the set $n_s^*$.

By Decision Rule I, the loyal general $i$ will determine his position as well as the messages sent to his neighbors at layer $k+1$ based on messages received from layer $k-1$ and layer $k$. Then, if $i$ has more neighbors from layers $k-1$ and $k$ who will send him a traitor's preferred value than neighbors from these two layers who will not,

$$|\{j \in n_{k-1} \cup n_k : j \in n_s \cup n_s^* \text{ and } \{i, j\} \in \mathcal{E}_k\}| > |\{j \in n_{k-1} \cup n_k : j \notin n_s \cup n_s^* \text{ and } \{i, j\} \in \mathcal{E}_k\}|,$$

he will adopt this value as his position and send it to his neighbors at layer $k+1$. We let $i$ be an element of the set $\bar{n}_s$.

At the end of the $k$-th iteration ($k \in \{0, 1, \ldots, K\}$), the updated set $n_s$ denotes the union of the set of traitors and the set of loyal generals who will adopt the same positions as the traitors up to layer $k$. At the end of the algorithm, if $|n_s|$ is a desirable majority, then the original set of traitors $n_0$ is a critical mass by definition. ∎

Finding the minimum number of traitors who will constitute a critical mass is equivalent to solving the following integer programming problem for the communication graph $G$.

$$\min |n_0| \quad \text{such that} \quad n_0 \subset n \ \text{and} \ |n_s| \text{ is a desirable majority}^2$$

The integer programming problem is unlikely to be computable in polynomial time. Nevertheless, it does not have to be computed: if the commander is traitorous, his strategy is to selectively influence those from the layers below such that a desirable majority holding his preferred positions will eventually be obtained, regardless of whether other traitors might exist in the game. He could revise his strategy accordingly if such a goal falls short. Then, in any subgame perfect Nash equilibrium, the general will is not realized in the presence of a traitorous commander. By this reasoning, if the commander has a preferred value (instead of a non-consensus), he will have a majority voting for this value in equilibrium. Therefore, the minimum number of traitors that may constitute a critical mass, over all instances of the above problem, is 1.

²$n_s$ is obtained by Algorithm 1.
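Algorithm 1 can be made runnable. The sketch below is our reading of its two passes (layer-$(k-1)$ influence, then influence from layers $k-1$ and $k$ combined); the containers and function names are our own choices, not the paper's:

```python
def critical_mass_spread(levels, edges_by_level, traitors):
    """Our reading of Algorithm 1: return n_s, the traitors plus the
    loyal generals who, by Decision Rule I, end up adopting the
    traitors' preferred value.

    levels[k] is the set of generals at layer k; edges_by_level[k] is
    the set of frozenset({i, j}) edges in E(k); traitors is the set n_0.
    """
    n_s, n_star = set(), set()  # n_s and n_s^* from the pseudocode
    for k, layer in enumerate(levels):
        prev = levels[k - 1] if k > 0 else set()
        E_k = edges_by_level[k]
        # First pass: traitors, and loyal generals with a traitor
        # majority among their layer-(k-1) neighbors.
        for i in layer:
            if i in traitors:
                n_s.add(i)
            else:
                bad = sum(1 for j in prev
                          if frozenset((i, j)) in E_k and j in traitors)
                good = sum(1 for j in prev
                           if frozenset((i, j)) in E_k and j not in traitors)
                if bad > good:
                    n_star.add(i)
        # Second pass: generals whose neighbors from layers k-1 and k
        # mostly relay the traitors' preferred value.
        n_bar = set()
        for i in layer:
            nbrs = [j for j in (prev | layer)
                    if j != i and frozenset((i, j)) in E_k]
            swayed = sum(1 for j in nbrs if j in n_s or j in n_star)
            if swayed > len(nbrs) - swayed:
                n_bar.add(i)
        n_s |= n_bar
    return n_s
```

If the returned set's size is a desirable majority of all generals, the traitor set is a critical mass in the sense of Theorem 4; a brute-force search over traitor subsets with this check would correspond to the integer program above.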

1.4 Alternative Decision Rule

Earlier results, as well as Algorithm 1, are consistent with Decision Rule I. By Decision Rule I, each general at layer $k \in \{0, 1, \ldots, K\}$ can only observe messages received from layer $k-1$ and layer $k$. This decision rule has limitations, and we propose Decision Rule II below.

Assumption 2.2: Decision Rule II. Each general $i \in n_k$, $k \in \{0, 1, \ldots, K\}$, can observe the messages transmitted up to time $k$. He can then observe all the message paths leading to node $i$. A message path for $i$ at layer $k$, $p_i$, is a directed walk that begins with node $u$ and ends with node $i$ in the communication graph $G(k)$. Denote the set of all message paths for $i$ as $\mathcal{P}_i$, and the set of generals in those paths as $\mathcal{V}_i$. Let the set of generals in $\mathcal{V}_i$ prior to layer $k$ who have not sent conflicting messages be $\mathcal{V}'_{ik}$, the set of generals at the next layer to which the generals in $\mathcal{V}'_{ik}$ send messages be $\mathcal{Q}_i$, and the set of $i$'s neighbors from layer $k$ who have not sent conflicting messages be $\mathcal{N}'_{ik}$.

A loyal general $i$ at layer $k$ will pass the following simple majority value

$$\text{majority}\{m_{jg} : j \in \mathcal{V}'_{ik} \text{ and } g \in \mathcal{Q}_i\}$$

on to his neighbors at the same layer. General $i$ will take the following simple majority value

$$\text{majority}\{m_{jg}, m_{hi} : j \in \mathcal{V}'_{ik}, g \in \mathcal{Q}_i \text{ and } h \in \mathcal{N}'_{ik}\}$$

as his position and send this value to his neighbors at layer $k+1$. As previously, a traitor can send anything he wishes.

For instance, in Example 1, $v_7$ has two message paths in $\mathcal{P}_7$, which are $\{\{u, v_2\}, \{v_2, v_3\}, \{v_3, v_7\}\}$ and $\{\{u, v_3\}, \{v_3, v_7\}\}$. The set of generals in the paths is $\mathcal{V}_7 = \{u, v_2, v_3\}$. If none of the generals in $\{u, v_2, v_3\}$ have sent conflicting messages, $v_7$ will form his position based on the messages sent by $u$ to layer 1, and the messages sent by $v_2$ and $v_3$ to layer 2.

Next, we derive two results based on Assumptions 1, 2.2, 3, and 4.
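The message paths of Decision Rule II can be enumerated with a depth-first search. A sketch under our own adjacency-list encoding of the layered digraph (the example adjacency below reproduces the two paths of $v_7$ from Example 1):

```python
def message_paths(adj, u, target):
    """All directed walks from the commander u to target.

    adj maps a node to the list of nodes it sends messages to; in the
    layered graphs of the paper, these walks are the message paths p_i.
    """
    paths = []
    def dfs(node, path):
        if node == target:
            paths.append(path)
            return
        for nxt in adj.get(node, []):
            if nxt not in path:  # guard against revisiting a node
                dfs(nxt, path + [nxt])
    dfs(u, [u])
    return paths

def generals_on_paths(paths):
    """The set V_i of generals appearing on some message path to i,
    excluding the endpoint i itself."""
    target = paths[0][-1]
    return {v for p in paths for v in p} - {target}

# Example 1's situation for v7: u -> v2 -> v3 -> v7 and u -> v3 -> v7.
adj = {'u': ['v2', 'v3'], 'v2': ['v3'], 'v3': ['v7']}
```

Running `generals_on_paths(message_paths(adj, 'u', 'v7'))` recovers the set $\mathcal{V}_7 = \{u, v_2, v_3\}$ named in the text.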

Theorem 5 If the following holds,

1) The traitors are a desirable majority among all the generals up to layer $k \in \{0, 1, \ldots, K\}$.

2) The generals play the game as described by Decision Rule II.

Then there will not be a subgame perfect Nash equilibrium realizing the general will.

Proof of Theorem 5: Suppose without loss of generality that the general will is "1". By Assumption 1, $G(k)$, $k \in \{0, 1, \ldots, K\}$, is weakly connected. Therefore, there is a directed walk between general $i$ at layer $k$ and any other general before layer $k$. General $i$ will pass the following majority value to his neighbors at the same layer:

$$\text{majority}\Big\{m_{jg} : j \in \bigcup_{q \in \{0, 1, \ldots, k\}} n_q \text{ and } g \in \mathcal{N}_{j,q+1}\Big\}$$

General $i$ will then form a position based on the following value

$$\text{majority}\Big\{m_{jg}, m_{hi} : j \in \bigcup_{q \in \{0, 1, \ldots, k\}} n_q, \ g \in \mathcal{N}_{j,q+1} \text{ and } h \in \mathcal{N}'_{ik}\Big\}$$

and pass on this position to his neighbors at layer $k+1$. By Decision Rule II, the traitors would not find it optimal to send conflicting messages to their neighbors. Then, if the traitorous generals up to layer $k \in \{0, 1, \ldots, K\}$ make up a majority, a majority of generals at time $K$ will vote for their preferred position, "0". This is a subgame perfect Nash equilibrium where none will unilaterally deviate, and no other equilibria exist. ∎

Theorem 6 If the following holds:

1) The commander is traitorous.

2) The generals play the game as described by Decision Rule II.

Then there will not be a subgame perfect Nash equilibrium realizing the general will.

2) The generals play the game as described by Decision Rule II. then there will not be a subgame perfect Nash equilibrium realizing the general will. Proof of Theorem 6: General i at layer 1 will send the received value from the traitorous commander to his neighbors at the same layer. Then any two connected generals at this layer receiving the same value from the com- mander will transmit this value. General i’s position depends on the majority value received from the commander and the neighbors at the same layer if none of them has sent conflicting messages. This logic applies to any layer k 1, 2,...,K . Accordingly, the traitorous commander2 will{ send its preferred} value “0” to a number of his neighbors at layer 1 such that a majority will vote for his preferred position at time K. Actually, he should send its preferred value to all neighbors at layer 1. If he sends conflicting messages, generals at layer 1 will not base their positions on his message by Decision Rule II. This is a subgame perfect Nash equilibrium where none will unilaterally deviate, and no other equilibria exist. ⇤ Apparently, “fault tolerance” [1] – whether or not the general will can be realized in equilibrium in the presence of traitors – depends on the decision rule by the generals. The advantage of Decision Rule II is that it can tolerate around a majority of the generals to be traitors (except the case where the commander is traitorous), compared to a threshold 1 of 3 in the original problem [1], and a possibly even lower threshold under Decision Rule I. (whether the traitors can be a “critical mass”.)

2 Control of Markov Chains: Two Games

The theory of control of Markov chains and its applications have been an active research area over the past decades [29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40]. A Markov chain can be controlled to achieve the fastest mixing rate of the chain [40], or to maximize an average payoff function over time [29] when a Markov decision process is assumed.

It is natural to consider the "game-theoretic control of Markov chains," in which an agent adapts its control of the Markov chain as it learns components of the others' strategies, and vice versa. The closest strand of literature is stochastic games, where the players select actions and receive payoffs that depend on the current state and the chosen actions. The game then moves to a new random state whose distribution depends on the previous state and the actions selected by the players. One objective of studying stochastic games is to examine whether a Nash equilibrium exists for the games [41, 42, 43, 44]. However, we have noticed that many problems may be better studied without having to assume an underlying Markov decision process as in a stochastic game. Applications may arise where one or more agents control a Markov chain to optimize one or more performance indices, such as the Kemeny constant of a Markov chain in a stochastic surveillance problem [45]. We will first study a network game version of the robotic surveillance problem [38, 45, 46], where the surveillance agent's strategy is to control a Markov chain. In prior studies of the surveillance problem, a surveillance agent or a group of surveillance agents must provide adequate coverage of an environment to monitor and detect any intruders. In [46], agents' surveillance strategies are modeled as stochastic. A surveillance agent performs a random walk on a hypergraph, where each node corresponds to a section of the environment and each edge of the graph is labeled with the probability of transition between the nodes. The question of interest is the rate at which the Markov chain converges to its steady-state distribution, known as the mixing rate [46, 47].
Motivated by this fundamental idea [46], we will consider the case where the surveillance agent's strategy is to design the initial distribution and the transition probability matrix of the Markov chain, and the intruder's strategy is to choose the time at which to attack each node in the environment. As this will be a game-theoretic problem, we will focus on studying the strategies for both agents.

We will then turn to another network game, which we name the mutual influence problem, where agents optimize their social power by strategically influencing each other. The line of work that motivates our study starts with the DeGroot-Friedkin model [48, 49, 50] in opinion dynamics. The DeGroot-Friedkin model concerns agents' updated self-appraisal (defined as their "social power") [51, 50, 49, 52] in a process of exchanging opinions with neighbors on a sequence of issues. The model consists of a rule by DeGroot describing the agents' exchange of opinions or influences, known as "distributed averaging" [48], and a rule by Friedkin [49] describing the agents' update of self-appraisal. We will present two parsimonious game-theoretic models using Markov chains to investigate the evolution of agents' social power based on their mutual, strategic influences. In either problem, we plan to study the existence and properties of game equilibria by changing the assumptions on agents' knowledge, strategies and optimization objectives. The main advantage of the proposed approach is that the payoff functions for the agents will be properties derived from the Markov chain itself, such as the stationary distribution. Because the payoff functions are not forced to have a particular property such as convexity, this stands in contrast with the assumed payoff functions in standard game-theoretic approaches to these problems in [2, 3, 4, 5, 6, 7, 8] and others.

2.1 The Robotic Surveillance Game

As mentioned earlier, it is a fundamental problem of continuing importance to detect the presence of a malicious agent or intruder who is disrupting the functioning of a distributed system such as a transportation network [53, 54], a wireless sensor network [55, 56], or some other type of network of interest. A central aim of this task is to address this problem by formulating and solving what might be called a "network surveillance game".

Setup. For purposes of analysis, such a network can typically be modeled by an undirected graph $G$ with $m > 1$ vertices labeled $1$ to $m$. The vertices represent "locations" in the network at which the intruder can carry out an attack. It is presumed that time is discrete and that the goal of the intruder is to disrupt a given nonempty subset $\mathcal{I} \subset \mathbf{m}$ of vertices, one at a time, within a period of $T$ time units, such as by air strikes. A set of distinct times $\mathcal{T} = \{t_1, t_2, \dots, t_{\bar{m}}\} \subset \{1, 2, \dots, T\}$ is called an attack set if for each $i \in \mathcal{I}$, the intruder attacks vertex $i$ at time $t_i$; here $\bar{m}$ is the number of vertices in $\mathcal{I}$. Let $\mathcal{C}$ denote the class of all possible attack sets, and note that $\mathcal{C}$ is a finite set because time is discrete.

One way to protect such a network is to employ a surveillance agent whose job is to prevent an intruder from causing a disruption. The surveillance agent is presumed to accomplish this by arriving at a vertex in $\mathcal{I}$ while the intruder is there. The surveillance agent seeks to accomplish this by searching the network along a prescribed "search path", by which is meant a sequence of $G$'s vertices $v(t)$, $t \in \{1, 2, \dots, \infty\}$, with the property that for each $t \in \{1, 2, \dots, \infty\}$, either $v(t) = v(t+1)$ or $(v(t), v(t+1))$ is an edge in $G$. The actual search path is determined probabilistically. It is assumed that $v(t)$, $t \in \{1, 2, \dots, \infty\}$, is a Markov chain generated by a $1 \times m$ initial probability distribution vector $\pi_0$ and an $m \times m$ row stochastic matrix $\Pi$ of transition probabilities.
Let $\mathcal{P}$ denote the set of all initial probability distribution vector–stochastic matrix pairs $(\pi_0, \Pi)$ which are consistent with $G$.

Static Game. Suppose the game is static, in which case the two players' decisions are made without any knowledge of what the other has chosen. The surveillance agent picks an initial probability distribution vector–transition probability matrix pair $(\pi_0, \Pi) \in \mathcal{P}$ to maximize the sum of the probabilities of meeting the intruder at time $t_i$ as $t_i$ ranges over $\mathcal{T}$. In other words, the surveillance agent chooses a strategy $\mathcal{C} \to \mathcal{P}$ which, for each fixed $\mathcal{T} \in \mathcal{C}$, maximizes the sum

$$\sum_{i \in \mathcal{I}} (\pi_0 \Pi^{t_i})_i$$

over $\mathcal{P}$, where for each $i \in \mathcal{I}$, $(\pi_0 \Pi^{t_i})_i$ is the $i$th component of the probability vector $\pi_0 \Pi^{t_i}$. At the same time, the intruder chooses a strategy $\mathcal{P} \to \mathcal{C}$ which, for each fixed pair $(\pi_0, \Pi) \in \mathcal{P}$, minimizes the same sum over $\mathcal{C}$. The obvious question at this point is whether or not these strategies actually determine an equilibrium. We propose to pursue this question next.

One equilibrium concept is as follows. A strategy profile, which is a combination of the choices of the two players and is written as $((\pi_0^*, \Pi^*), \{t_i^*, i \in \mathcal{I}\})$, is a pure strategy Nash equilibrium if neither player can unilaterally improve his payoff. In other words, given the attack set $\{t_i^*, i \in \mathcal{I}\}$ selected by the intruder and any pair $(\pi_0, \Pi)$ chosen by the surveillance agent, there holds

$$\sum_{i \in \mathcal{I}} (\pi_0 \Pi^{t_i^*})_i \le \sum_{i \in \mathcal{I}} (\pi_0^* (\Pi^*)^{t_i^*})_i.$$

Similarly, given the pair $(\pi_0^*, \Pi^*)$ selected by the surveillance agent and any attack set $\{t_i, i \in \mathcal{I}\}$ chosen by the intruder, there holds

$$\sum_{i \in \mathcal{I}} (\pi_0^* (\Pi^*)^{t_i})_i \ge \sum_{i \in \mathcal{I}} (\pi_0^* (\Pi^*)^{t_i^*})_i.$$

There appears to have been little to no work on games in which the agents' strategies involve choosing an initial probability distribution vector–transition probability matrix pair. Moreover, the goal is to find not only the solution to each agent's optimization problem but also the solutions that are best responses to one another.
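The static-game payoff $\sum_{i \in \mathcal{I}} (\pi_0 \Pi^{t_i})_i$ can be computed directly. The following is a minimal numerical sketch, not the paper's code; the 3-vertex graph, $\pi_0$, $\Pi$, and the attack set below are hypothetical.

```python
import numpy as np

def surveillance_payoff(pi0, P, attack):
    """Sum of (pi0 P^{t_i})_i over the attack set; attack maps vertex i -> time t_i."""
    return sum((pi0 @ np.linalg.matrix_power(P, t))[i] for i, t in attack.items())

pi0 = np.array([1.0, 0.0, 0.0])          # agent starts at vertex 0
P = np.array([[0.0, 1.0, 0.0],           # deterministic cycle 0 -> 1 -> 2 -> 0
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
attack = {1: 1, 2: 2}                    # attack vertex 1 at t=1, vertex 2 at t=2
print(surveillance_payoff(pi0, P, attack))  # -> 2.0 (agent is at both vertices on time)
```

On this deterministic chain the agent meets the intruder with probability 1 at both attack times, so the sum attains its maximum of 2.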

Theorem 7 If the intruder's attack set contains only one location, $\mathcal{I} = \{i\}$, a pure strategy Nash equilibrium exists in the game, and for any equilibrium, $(\pi_0 \Pi^{t_i})_i = 1$.

Proof: Let $\pi_0 = [\pi_0(i)]_{1 \times m}$ be such that $\pi_0(i) = 1$ and, for $j \in \mathbf{m} \setminus \{i\}$, $\pi_0(j) = 0$. And let $\Pi = [\Pi_{ij}]_{m \times m}$ be such that the elements of the $i$th column are 1s, which means $\forall j \in \mathbf{m}$, $\Pi_{ji} = 1$, and the rest of the elements in $\Pi$ are 0s. The intruder can choose to attack $i$ at any point in time $0 \le t \le T$. The surveillance agent's strategy $(\pi_0, \Pi)$ and the intruder's strategy $t_i$ constitute a pure strategy Nash equilibrium because

1. It is assumed that the intruder has to attack $i$ before time $T$. Given any attack time $t_i$, the surveillance agent can catch the intruder with probability 1. The surveillance agent will not deviate from $(\pi_0, \Pi)$.

2. The intruder is thus indifferent as to when to attack, and therefore will not deviate from $t_i$.

There exist other equilibria of the game. In any equilibrium, the surveillance agent will design a $(\pi_0, \Pi)$ such that $(\pi_0 \Pi^{t_i})_i = 1$ when the intruder attacks at time $t_i$. Given that the intruder may adjust the attack time $t_i$ based on the surveillance agent's choice of $\pi_0$ and $\Pi$, a dominant strategy of the surveillance agent is one for which $(\pi_0 \Pi^{t_i})_i = 1$, $\forall t_i \in \{1, 2, \dots, T\}$. Then the intruder is indifferent as to when to attack $i$, and neither would like to deviate. $\blacksquare$
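The construction in the proof of Theorem 7 can be checked numerically. This is a sketch under the stated assumptions; the network size $m = 4$ and target vertex are made up for illustration.

```python
import numpy as np

# Theorem 7 construction (sketch): pi_0 puts all mass on the attacked vertex i,
# and every row of Pi sends probability 1 to column i. Then Pi^t = Pi for all
# t >= 1, so (pi_0 Pi^{t_i})_i = 1 for every attack time t_i.
m, i = 4, 2                              # hypothetical 4-vertex network, target i = 2
pi0 = np.zeros(m)
pi0[i] = 1.0                             # pi_0(i) = 1, all other entries 0
Pi = np.zeros((m, m))
Pi[:, i] = 1.0                           # ith column all ones, rest zeros (row stochastic)

for t in range(1, 6):                    # any attack time
    assert np.isclose((pi0 @ np.linalg.matrix_power(Pi, t))[i], 1.0)
print("agent at vertex", i, "with probability 1 at every time")
```

The check confirms the capture probability is 1 regardless of when the intruder attacks, which is why the intruder is indifferent and neither player deviates.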

2.2 The Mutual Influence Problem

We then formulate a mutual influence game using Markov chains, as both a static game and a dynamic game, and study the evolution of agents' social power using the games. The games present a new definition of social power, which is no longer the weight the agents place on their own opinions as in the DeGroot-Friedkin model [50, 49] and follow-up work such as [57]. A motivating example is that the mutual influences among countries in a trade war scenario can be described using the transition probability matrix of a Markov chain, and the social power of those countries, which represents their economic clout, can be defined using the stationary distribution of the chain.

Setup. A mutual influence game is one in which some agents influence each other over a sequence of issues, and optimize their social power by choosing the level of dependence on

others over each issue. Suppose there are $m$ issues labeled in $\mathbf{m} = \{1, 2, \dots, m\}$ and $n$ agents labeled in $\mathbf{n} = \{1, 2, \dots, n\}$ in total. In the case of the trade war, the issues may correspond to a basket of goods being traded among countries. In the case of opinion dynamics, the issues may correspond to a set of topics the agents have in discussion.

For the $k$th issue ($1 \le k \le m$), the agents interact with each other on an undirected graph $G(k) = \{\mathcal{V}, \mathcal{E}(k)\}$. The $n \times n$ nonnegative, row stochastic matrix $I(k)$ is the mutual influence matrix for the $k$th issue, consistent with $G(k)$. The amount of dependence $i$ has on others can be described by a nonnegative, real-valued, $1 \times n$ vector $I_i(k) = [I_{ij}(k)]_{1 \times n}$, where $I_{ij}(k)$ indicates the amount of dependence $i$ has on $j$ and $I_{ii}(k)$ denotes the level of self-reliance. If $i$ and $j$ are not connected in $G(k)$ for the $k$th issue, $I_{ij}(k) = 0$. Let the row vector $\pi(k)$ be the stationary probability vector of the stochastic matrix in the sense that $\pi(k) I(k) = \pi(k)$. For the $k$th issue, $\pi_t(k)$, $t \in \{1, 2, \dots, T\}$, denotes the vector of agents' social power on this issue at time $t$, and is a Markov chain generated by the initial probability distribution vector $\pi_0(k)$ and the row stochastic matrix $I(k)$ of transition probabilities such that

$$\pi_t(k) = \pi_0(k) I(k)^t.$$

For the $k$th issue, suppose a subset $\mathbf{n}(k) \subset \mathbf{n}$ of players is selected to exert influences on each other. Given $i$, denote the set of issues for which he is chosen as $\mathbf{m}_i \subset \mathbf{m}$. For $l \in \mathbf{m}_i$, $i$'s strategy is to change the following elements in $I_i(l)$: $I_{ij}(l)$, $j \in \mathbf{n}(l)$.

Static Mutual Influence Game. A static mutual influence game can be regarded as simultaneously taking place on multiple communication networks, each of which corresponds to one issue over which the agents mutually influence each other. Denote by $\pi_i(k)$ agent $i$'s social power in the stationary distribution $\pi(k)$. In the game, the optimization problem for each agent $i$ is thus the following:

$$\max_{I_i(k) \mathbf{1} = 1} \sum_{k \in \mathbf{m}_i} \pi_i(k)$$

The selected agents' strategies $I_{ij}(k)$, where $i, j \in \mathbf{n}(k)$ and $k \in \mathbf{m}$, constitute a pure strategy Nash equilibrium if no agent $i$ can make a profitable unilateral deviation.

In other words, given any alternative strategy of $i$ in $I_{ij}^*(k)$, where $j \in \mathbf{n}(k)$ and $k \in \mathbf{m}$, there holds

$$\sum_{k \in \mathbf{m}_i} \pi_i^*(k) \le \sum_{k \in \mathbf{m}_i} \pi_i(k).$$

Dynamic Mutual Influence Game. We now propose a dynamic mutual influence game. Following the ideas in [50, 49, 57], suppose in the dynamic game that a next issue is only dealt with after the agents finish with a previous issue, and that each agent's social power on the prior issue will be his initial power on the next issue. Let the initial probability vector of the agents for the $(k+1)$th issue be $\pi_0(k+1) = \pi(k)$. The mutual influence process of the agents over the $(k+1)$th issue can be expressed as

$$\pi(k) I(k+1),$$

whose $i$th component is written as $(\pi(k) I(k+1))_i$. The optimization problem for each agent is thus the following:

$$\max_{I_i(k) \mathbf{1} = 1} \sum_{k \in \mathbf{m}} (\pi(k) I(k+1))_i.$$

The agents' strategies $I_{ij}(k)$, where $i, j \in \mathbf{n}(k)$ and $k \in \mathbf{m}$, constitute a pure strategy Nash equilibrium if no agent $i$ can make a profitable unilateral deviation. In other words, given any alternative strategy of $i$ in $I_{ij}^*(k)$, where $j \in \mathbf{n}(k)$ and $k \in \mathbf{m}$, there holds

$$\sum_{k \in \mathbf{m}} (\pi^*(k) I^*(k+1))_i \le \sum_{k \in \mathbf{m}} (\pi(k) I(k+1))_i.$$
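The two quantities the games are built on, the stationary vector $\pi(k)$ with $\pi(k) I(k) = \pi(k)$ and the dynamic carry-over $\pi_0(k+1) = \pi(k)$, can be sketched numerically. The two influence matrices below are hypothetical two-agent examples, not from the paper.

```python
import numpy as np

def stationary(M):
    # Stationary row vector of a row stochastic matrix: the left eigenvector of
    # M for eigenvalue 1, normalized to a probability vector.
    vals, vecs = np.linalg.eig(M.T)
    v = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    return v / v.sum()

I1 = np.array([[0.7, 0.3],               # hypothetical influence matrix, issue 1
               [0.4, 0.6]])
I2 = np.array([[0.9, 0.1],               # hypothetical influence matrix, issue 2
               [0.2, 0.8]])

pi1 = stationary(I1)                     # social power vector on issue 1
payoff = pi1 @ I2                        # agent i's dynamic payoff term (pi(1) I(2))_i
print(pi1, payoff)
```

Since $I(2)$ is row stochastic, the carried-over mass is preserved: the components of `payoff` still sum to 1, matching the interpretation of social power as a probability vector.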

Theorem 8 Suppose the initial probability distribution vector $\pi_0(k)$ has a number of nonzero entries, and $q(k) = \{1, 2, \dots, n_0\} \subset \mathbf{n}(k)$ is the labeling set of these entries. Let the mutual influence matrix $I(k)$ be the row stochastic matrix such that $\forall i \in q(k)$, $I_{ii}(k) = 1$. Then the sequence of matrices $I(k)$, $k \in \mathbf{m}$, is a pure strategy Nash equilibrium in both the static game and the dynamic game.

Proof: In the static game, we know that $\pi_t(k) = \pi_0(k) I(k)^t$ and have the following:

1. For $i \in q(k)$ and given others' strategies $I_{ji}(k)$, $j \in \mathbf{n}(k)$, $i$ should optimally set $I_{ii}(k)$ to 1, and has no incentive to unilaterally deviate from this strategy.

2. For $i \in \mathbf{n}(k) \setminus q(k)$, $(\pi_t(k))_i = 0$, $t \in \{1, 2, \dots\}$, regardless of the strategy $I_i$ that $i$ chooses.

Therefore, none will unilaterally deviate from $I(k)$. The same reasoning applies to all of the $k$ issues. Then $I(k)$, $k \in \mathbf{m}$, is a pure strategy Nash equilibrium.

In the dynamic game, we have that for each $(k+1)$th issue, the updating process is described as $\pi(k) I(k+1)$. For $k \in \mathbf{m}$, $\pi_i(k)$ is nonzero if and only if the $i$th coordinate of the initial probability distribution vector $\pi_0(k)$ is nonzero. If $\pi_i(k)$ is nonzero, $i$'s strategy regarding the $(k+1)$th issue is $I_{ii}(k+1) = 1$. The previous reasoning for non-deviation in the static game still applies.

Therefore, if the initial probability distribution vector $\pi_0(k)$ has a number of nonzero entries labeled in $q(k) \subset \mathbf{n}(k)$, the sequence of mutual influence matrices $I(k)$, $k \in \mathbf{m}$, where $\forall i \in q(k)$, $I_{ii}(k) = 1$, is a pure strategy Nash equilibrium in the dynamic game. $\blacksquare$
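The key step in the proof, that agents outside $q(k)$ retain zero social power at every time $t$ whatever rows they choose, can be checked numerically. The four-agent setup and the two arbitrary rows below are hypothetical assumptions for illustration.

```python
import numpy as np

# Sketch of Theorem 8's argument: pi_0(k) is supported on q(k) = {0, 1}, and each
# i in q(k) sets I_ii(k) = 1. Agents 2 and 3 (outside q(k)) pick arbitrary rows,
# yet no probability mass ever flows to them.
n = 4
pi0 = np.array([0.6, 0.4, 0.0, 0.0])     # nonzero entries label q(k) = {0, 1}
I_k = np.eye(n)                          # agents in q(k) are fully self-reliant
I_k[2] = [0.25, 0.25, 0.25, 0.25]        # arbitrary strategy of agent 2
I_k[3] = [0.0, 0.5, 0.3, 0.2]            # arbitrary strategy of agent 3

pi_t = pi0.copy()
for _ in range(10):                      # pi_t(k) = pi_0(k) I(k)^t
    pi_t = pi_t @ I_k
    assert pi_t[2] == 0.0 and pi_t[3] == 0.0   # their social power stays zero
print(pi_t)  # -> [0.6 0.4 0.  0. ]
```

Because the agents in $q(k)$ place no weight on anyone else, columns 2 and 3 receive mass only from rows 2 and 3, which start with zero mass; hence deviations by those agents cannot be profitable.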

Theorem 9 Regardless of the initial probability distribution vector $\pi_0(k)$, a sequence of identity matrices $I(k)$, $1 \le k \le m$, is a pure strategy Nash equilibrium in both the static game and the dynamic game.

Proof: For agent $i$, given the others' strategies in $I(k)$, $1 \le k \le m$, $i$'s unilateral deviation must be such that $\exists k \in \mathbf{m}$, $I_{ii}(k) \ne 1$. In both games, we have the following:

1. In the static game, given the strategies of others, the deviation from $I_i(k)$, $1 \le k \le m$, will decrease $\pi_i(k)$ and then $\sum_{k \in \mathbf{m}_i} \pi_i(k)$. Therefore, $i$'s unilateral deviation is not profitable.

2. In the dynamic game, the unilateral deviation will decrease $\pi_i(k)$, and then $\sum_{k \in \mathbf{m}} (\pi(k) I(k+1))_i$. Therefore, $i$'s unilateral deviation is not profitable.

Thus, the sequence of identity matrices $I(k)$, $1 \le k \le m$, is a pure strategy Nash equilibrium in both the static game and the dynamic game. $\blacksquare$

3 Discussion

In this paper, we have first formulated a game of the Byzantine generals on a sequence of time-varying communication graphs. We have proposed two decision rules and studied equilibrium predictions of the game in the presence of traitorous generals. We found that when a traitorous commander exists, there will be no subgame perfect Nash equilibrium realizing the general will under either rule. In other cases, the number of traitors will affect the equilibrium predictions. For future work, it may be interesting to study the mixed strategy Nash equilibria of the game. Moreover, we would like to investigate the more general case of the game, in which a lieutenant can act at multiple times. It will also be worth studying how a general may deduce which generals are traitors from the messages transmitted, as well as how the decision rules may influence the game equilibria.

In the robotic surveillance game, there can be many other possible payoff functions which one might use to understand the bases on which both the intruder and the surveillance agent make their choices. For example, suppose the probability that the surveillance agent first returns to vertex $i$ at $t = \tau$ is $p_i(\tau)$; $p_i(\tau)$ can be calculated recursively using

$$p_i(\tau) = \pi_{ii}(\tau) - \sum_{k=1}^{\tau-1} p_i(k)\, \pi_{ii}(\tau - k).$$

Here $\pi_{ii}(\tau)$ is the $(i,i)$th entry of $\Pi^\tau$, which is the probability of the surveillance agent going from vertex $i$ to vertex $i$ in $\tau$ steps; $\pi_{ii}(\tau)$ can be derived using the Chapman-Kolmogorov equation [58]. At time $t_i$, the probability of the intruder first meeting the surveillance agent at vertex $i$, and thus being captured by the surveillance agent, is $p_i(t_i)$. The intruder then chooses the time to attack each vertex $i \in \mathcal{I}$ by minimizing the payoff function

$$\sum_{i \in \mathcal{I}} (\pi_0)_i\, p_i(t_i),$$

where $(\pi_0)_i$ is the probability of the surveillance agent starting from vertex $i$ at $t = 0$.

This two-player game can easily be extended to a Stackelberg game [59], in which one of the agents chooses his actual strategy before the other. In a Stackelberg game, the suitable equilibrium concept is a subgame perfect Nash equilibrium. Another straightforward extension is to an $N$-player game with multiple surveillance agents and intruders. Relatedly, in the mutual influence game, we also expect the fastest mixing rate of the Markov chain to be relevant for the existence of other Nash equilibria in the games.
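The first-return-time recursion $p_i(\tau) = \pi_{ii}(\tau) - \sum_{k=1}^{\tau-1} p_i(k)\,\pi_{ii}(\tau-k)$ can be sketched directly. The two-state chain below is a hypothetical example chosen so the answer is easy to verify by hand.

```python
import numpy as np

def first_return_probs(P, i, T):
    """p_i(tau) for tau = 1..T via the recursion, with pi_ii(tau) = (P^tau)[i, i]."""
    powers = [np.linalg.matrix_power(P, t) for t in range(T + 1)]
    p = [0.0] * (T + 1)
    for tau in range(1, T + 1):
        p[tau] = powers[tau][i, i] - sum(p[k] * powers[tau - k][i, i]
                                         for k in range(1, tau))
    return p[1:]

# Hypothetical two-state chain: from either state, move to each state w.p. 1/2,
# so the first return to state 0 at time tau has probability (1/2)^tau.
P = np.array([[0.5, 0.5],
              [0.5, 0.5]])
print(first_return_probs(P, 0, 3))   # -> [0.5, 0.25, 0.125]
```

For this chain the recursion reproduces the geometric first-return distribution, which gives a quick sanity check of an implementation of the intruder's payoff $\sum_{i} (\pi_0)_i\, p_i(t_i)$.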

References

[1] Leslie Lamport, Robert Shostak, and Marshall Pease. The Byzantine generals problem. ACM Transactions on Programming Languages and Systems (TOPLAS), 4(3):382–401, 1982.

[2] Bo An, Matthew Brown, Yevgeniy Vorobeychik, and Milind Tambe. Security games with surveillance cost and optimal timing of attack execution. In Proceedings of the 2013 International Conference on Autonomous Agents and Multi-Agent Systems, pages 223–230. International Foundation for Autonomous Agents and Multiagent Systems, 2013.

[3] Bo An, Milind Tambe, Fernando Ordóñez, Eric Anyung Shieh, and Christopher Kiekintveld. Refinement of strong Stackelberg equilibria in security games. In AAAI, 2011.

[4] Rong Yang, Christopher Kiekintveld, Fernando Ordóñez, Milind Tambe, and Richard John. Improving resource allocation strategies against human adversaries in security games: An extended study. Artificial Intelligence, 195:440–469, 2013.

[5] Tansu Alpcan and Tamer Başar. Network security: A decision and game-theoretic approach. Cambridge University Press, 2010.

[6] Rufus Isaacs. Differential games: A mathematical theory with applications to warfare and pursuit, control and optimization. Courier Corporation, 1999.

[7] Joao P Hespanha, Hyoun Jin Kim, and Shankar Sastry. Multiple-agent probabilistic pursuit-evasion games. In Proceedings of the 38th IEEE Conference on Decision and Control, volume 3, pages 2432–2437. IEEE, 1999.

[8] Valerii Patsko, Sergey Kumkov, and Varvara Turova. Pursuit-evasion games. Handbook of Dynamic Game Theory, pages 951–1038, 2018.

[9] Michael J Fischer. The consensus problem in unreliable distributed systems (a brief survey). In International Conference on Fundamentals of Computation Theory, pages 127–140. Springer, 1983.

[10] Joseph Y Halpern. A computer scientist looks at game theory. Games and Economic Behavior, 45(1):114–131, 2003.

[11] Ariel Rubinstein. The electronic mail game: Strategic behavior under "almost common knowledge". The American Economic Review, pages 385–391, 1989.

[12] Ken Binmore and Larry Samuelson. Coordinated action in the electronic mail game. Games and Economic Behavior, 35(1-2):6–30, 2001.

[13] Vincent P Crawford and Joel Sobel. Strategic information transmission. Econometrica: Journal of the Econometric Society, pages 1431–1451, 1982.

[14] Marco Battaglini and Uliana Makarov. Cheap talk with multiple audiences: An experimental analysis. Games and Economic Behavior, 83:147–164, 2014.

[15] Mikhail Golosov, Vasiliki Skreta, Aleh Tsyvinski, and Andrea Wilson. Dynamic strategic information transmission. Journal of Economic Theory, 151:304–341, 2014.

[16] Andrea Galeotti, Christian Ghiglino, and Francesco Squintani. Strategic information transmission networks. Journal of Economic Theory, 148(5):1751–1769, 2013.

[17] Thomas R Palfrey. A mathematical proof of Duverger's law. 1988.

[18] Anthony Downs et al. An economic theory of democracy. 1957.

[19] Otto A Davis, Melvin J Hinich, and Peter C Ordeshook. An expository development of a mathematical model of the electoral process. American Political Science Review, 64(2):426–448, 1970.

[20] David P Baron and John A Ferejohn. Bargaining in legislatures. American Political Science Review, 83(4):1181–1206, 1989.

[21] David Austen-Smith and Jeffrey Banks. Elections, coalitions, and legislative outcomes. American Political Science Review, 82(2):405–422, 1988.

[22] Kenneth A Shepsle. Institutional arrangements and equilibrium in multidimensional voting models. American Journal of Political Science, pages 27–59, 1979.

[23] Roger B Myerson. Large Poisson games. Journal of Economic Theory, 94(1):7–45, 2000.

[24] James M Snyder Jr, Michael M Ting, and Stephen Ansolabehere. Legislative bargaining under weighted voting. American Economic Review, 95(4):981–1004, 2005.

[25] William H Riker. The two-party system and Duverger's law: An essay on the history of political science. American Political Science Review, 76(4):753–766, 1982.

[26] Kenneth Shepsle. Models of multiparty electoral competition. Routledge, 2012.

[27] Peter J Coughlin. Probabilistic voting theory. Cambridge University Press, 1992.

[28] James M Enelow and Melvin J Hinich. The spatial theory of voting: An introduction. CUP Archive, 1984.

[29] Vivek S Borkar. Control of Markov chains with long-run average cost criterion: The dynamic programming equations. SIAM Journal on Control and Optimization, 27(3):642–657, 1989.

[30] R Wheeler and K Narendra. Decentralized learning in finite Markov chains. IEEE Transactions on Automatic Control, 31(6):519–526, 1986.

[31] M Aicardi, F Davoli, and R Minciardi. Decentralized optimal control of Markov chains with a common past information set. IEEE Transactions on Automatic Control, 32(11):1028–1031, 1987.

[32] François Delebecque and Jean-Pierre Quadrat. Optimal control of Markov chains admitting strong and weak interactions. Automatica, 17(2):281–296, 1981.

[33] Todd L Graves and Tze Leung Lai. Asymptotically efficient adaptive choice of control laws in controlled Markov chains. SIAM Journal on Control and Optimization, 35(3):715–743, 1997.

[34] G Santharam and PS Sastry. A reinforcement learning neural network for adaptive control of Markov chains. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, 27(5):588–600, 1997.

[35] Vivek S Borkar. Ergodic control of Markov chains with constraints: the general case. SIAM Journal on Control and Optimization, 32(1):176–186, 1994.

[36] Ashutosh Nayyar, Aditya Mahajan, and Demosthenis Teneketzis. Decentralized stochastic control with partial history sharing: A common information approach. IEEE Transactions on Automatic Control, 58(7):1644–1658, 2013.

[37] Mishel George, Saber Jafarpour, and Francesco Bullo. Markov chains with maximum entropy for robotic surveillance. IEEE Transactions on Automatic Control, 2018.

[38] Xiaoming Duan, Mishel George, and Francesco Bullo. Markov chains with maximum return time entropy for robotic surveillance. arXiv preprint arXiv:1803.07705, 2018.

[39] Riccardo Leonardi, Pierangelo Migliorati, and Maria Prandini. Semantic indexing of soccer audio-visual sequences: A multimodal approach based on controlled Markov chains. IEEE Transactions on Circuits and Systems for Video Technology, 14(5):634–643, 2004.

[40] Michael M Zavlanos, Daniel E Koditschek, and George J Pappas. A distributed dynamical scheme for fastest mixing Markov chains. In American Control Conference (ACC '09), pages 1436–1441. IEEE, 2009.

[41] S Lakshmivarahan and Kumpati S Narendra. Learning algorithms for two-person zero-sum stochastic games with incomplete information: A unified approach. SIAM Journal on Control and Optimization, 20(4):541–552, 1982.

[42] PS Sastry, VV Phansalkar, and MAL Thathachar. Decentralized learning of Nash equilibria in multi-person stochastic games with incomplete information. IEEE Transactions on Systems, Man, and Cybernetics, 24(5):769–777, 1994.

[43] Xiaofeng Wang and Tuomas Sandholm. Reinforcement learning to play an optimal Nash equilibrium in team Markov games. In Advances in Neural Information Processing Systems, pages 1603–1610, 2003.

[44] Michael Bowling and Manuela Veloso. Rational and convergent learning in stochastic games. In International Joint Conference on Artificial Intelligence, volume 17, pages 1021–1026. Lawrence Erlbaum Associates Ltd, 2001.

[45] Rushabh Patel, Pushkarini Agharkar, and Francesco Bullo. Robotic surveillance and Markov chains with minimal weighted Kemeny constant. IEEE Transactions on Automatic Control, 60(12):3156–3167, 2015.

[46] Jeremy Grace and John Baillieul. Stochastic strategies for autonomous robotic surveillance. In Proceedings of the 44th IEEE Conference on Decision and Control and the 2005 European Control Conference (CDC-ECC '05), pages 2200–2205. IEEE, 2005.

[47] Stephen Boyd, Persi Diaconis, and Lin Xiao. Fastest mixing Markov chain on a graph. SIAM Review, 46(4):667–689, 2004.

[48] Morris H DeGroot. Reaching a consensus. Journal of the American Statistical Association, 69(345):118–121, 1974.

[49] Noah E Friedkin. A formal theory of reflected appraisals in the evolution of power. Administrative Science Quarterly, 56(4):501–529, 2011.

[50] Peng Jia, Anahita MirTabatabaei, Noah E Friedkin, and Francesco Bullo. Opinion dynamics and the evolution of social power in influence networks. SIAM Review, 57(3):367–397, 2015.

[51] Dorwin Ed Cartwright. Studies in social power. 1959.

[52] Noah E Friedkin and Eugene C Johnsen. Social influence and opinions. Journal of Mathematical Sociology, 15(3-4):193–206, 1990.

[53] Xiaopeng Li and Yanfeng Ouyang. Reliable traffic sensor deployment under probabilistic disruptions and generalized surveillance effectiveness measures. Operations Research, 60(5):1183–1198, 2012.

[54] Michael D Meyer. Transportation system as a security challenge. Handbook of Science and Technology for Homeland Security. Wiley, Hoboken, New Jersey, 2010.

[55] Adrian Perrig, John Stankovic, and David Wagner. Security in wireless sensor networks. Communications of the ACM, 47(6):53–57, 2004.

[56] Tim Bass. Intrusion detection systems and multisensor data fusion. Communications of the ACM, 43(4):99–105, 2000.

[57] Mengbin Ye, Ji Liu, Brian DO Anderson, Changbin Yu, and Tamer Başar. Evolution of social power in social networks with dynamic topology. IEEE Transactions on Automatic Control, 63(11):3793–3808, 2018.

[58] Sheldon M Ross. Introduction to probability models. Academic Press, 2014.

[59] Heinrich Von Stackelberg. Market structure and equilibrium. Springer Science & Business Media, 2010.
