Signaling in Bayesian Stackelberg Games

Haifeng Xu¹, Rupert Freeman², Vincent Conitzer², Shaddin Dughmi¹, Milind Tambe¹
¹University of Southern California, Los Angeles, CA 90007, USA. {haifengx,shaddin,tambe}@usc.edu
²Duke University, Durham, NC 27708, USA. {rupert,conitzer}@cs.duke.edu

ABSTRACT

Algorithms for solving Stackelberg games are used in an ever-growing variety of real-world domains. Previous work has extended this framework to allow the leader to commit not only to a distribution over actions, but also to a scheme for stochastically signaling information about these actions to the follower. This can result in higher utility for the leader. In this paper, we extend this methodology to Bayesian games, in which either the leader or the follower has payoff-relevant private information, or both. This leads to novel variants of the model, for example by imposing an incentive compatibility constraint for each type to listen to the signal intended for it. We show that, in contrast to previous hardness results for the case without signaling [5, 16], we can solve unrestricted games in time polynomial in their natural representation. For security games, we obtain hardness results as well as efficient algorithms, depending on the setting. We show the benefits of our approach in experimental evaluations of our algorithms.

Keywords

Bayesian Stackelberg Games, Algorithms, Signaling, Security Games

1. INTRODUCTION

In the algorithmic game theory community, and especially the multiagent systems part of that community, there has been rapidly increasing interest in Stackelberg models where the leader can commit to a mixed strategy. This interest is driven in part by a number of high-impact deployed security applications [25]. One advantage of this framework, as opposed to, say, computing a Nash equilibrium of the simultaneous-move game, is that it sidesteps issues of equilibrium selection. Another is that in two-player normal-form games, an optimal mixed strategy to commit to can be found in polynomial time [5]. There are limits to this computational advantage, however; once we extend to three-player games or Bayesian games, the computational problem becomes hard again [5]. (In a Bayesian game, some of the players have private information that is relevant to the payoffs; their private information is encoded by their type.)

As has previously been observed [4, 28, 23], the leader may be able to do more than commit to a mixed strategy. The leader may additionally be able to commit to send signals to the follower(s) that are correlated with the action she has chosen. This ability can of course never hurt the leader: she always has the choice of sending an uninformative signal. In a two-player normal-form game, it turns out that no benefit can be had from sending an informative signal. This is because the expected leader utility conditioned on each signal, which corresponds to a posterior belief of the leader's action, is weakly dominated by the expected leader utility of committing to the optimal mixed strategy [4]. But this is no longer true in games with three or more players. Moreover, intriguingly, the enriched problem with signaling can still be solved in polynomial time in these games [4]. The idea of adding signals has also already been explored in security games [28]; however, those games were not Bayesian (though they had richer game structure).

In this paper, we extend this line of work to Bayesian Stackelberg Games (BSGs). We suppose that, when the follower has multiple possible types, the leader is able to send a separate signal to each of these types, without learning what the type is. For example, consider a security game on a rail network in which we aim to catch ticketless travelers (or, better yet, give them incentives to buy a ticket). Here, the attacker's type could encode at which location he starts his journey. Then, by making a separate announcement at each station, we send a separate signal to each type. As another example, we may send different signals over different (say) radio frequencies. In this case, each follower type receives a separate signal depending on the frequency to which he is listening. In this latter example (unlike the former), we also require an incentive compatibility (IC) constraint: no type should find it beneficial to switch over to a different frequency, since we have no way of forcing a type to listen to a particular frequency.

Besides considering the case of multiple follower types, we also consider the case of multiple leader types. Here, the signal sent by the leader can be correlated with her type as well as her action. Among other examples, this allows us to capture models where the leader is a seller of some item, and the type of the leader corresponds to knowledge about, for example, the quality of the item. She can then send an informative (but perhaps not completely informative) signal about this quality to the buyer. Such models are sometimes studied in the auction design literature [8, 18, 12], but here our interest is in generally applicable algorithms.

Our Contributions: We consider signaling in different models of Bayesian Stackelberg games, and essentially pin down the computational complexity in each. For the case with multiple follower types (but a single leader type), we show that the optimal combination of mixed strategy and signaling scheme can be computed in polynomial time using linear programming.¹ This is the case whether an incentive compatibility constraint applies or not.

¹One may wonder whether this just follows from the fact that we can model Bayesian games by representing each type as a single player, thereby reducing the problem to a multiplayer game. But this does not work, because the corresponding normal form of the game would have size exponential in the number of types.

Appears in: Proceedings of the 15th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2016), J. Thangarajah, K. Tuyls, C. Jonker, S. Marsella (eds.), May 9–13, 2016, Singapore. Copyright © 2016, International Foundation for Autonomous Agents and Multiagent Systems (www.ifaamas.org). All rights reserved.
However, for security games, we show that the problem is NP-hard, though we do identify a special case that can be solved efficiently. We also provide hardness evidence that this special case is almost the best one can hope for in terms of polynomial-time computability.

For the case with multiple leader types (but a single follower type), we show that the optimal combinations of mixed strategies and signaling schemes can also be computed in polynomial time. Moreover, the polynomial-time solvability extends to security games in this setting. We note that our results (both hardness and polynomial-time solvability) can be easily generalized to the case with both multiple leader and follower types, so we do not discuss that case explicitly in the paper. We conclude with an experimental evaluation of our approach.

2. AN EXAMPLE OF STACKELBERG COMPETITION

The Stackelberg model was originally introduced to capture market competition between a leader (e.g., a leading firm in some area) and a follower (e.g., an emerging start-up). The leader has the advantage of committing to a strategy (or, equivalently, moving first) before the follower makes decisions. Here we consider a Bayesian case of Stackelberg competition where the leader does not have full information about the follower.

For example, consider a market with two firms, a leader and a follower. The leader specializes in two products, product 1 and product 2. The follower is a new start-up which focuses on only one product. It is publicly known that the follower will focus on product 1 with probability 0.55 (call him a follower of type θ1 in this case), and on product 2 with probability 0.45 (call him a follower of type θ2). But the realization is known only to the follower. The leader has a research team, and must decide which product to devote this (indivisible) team to, or to send the team on vacation. On the other hand, the follower has two options: either entering the market and developing the product he focuses on, or leaving the market.

Figure 1: Payoff Matrices for Followers of Different Types. (Left: type θ1, p = 0.55; right: type θ2, p = 0.45. Rows correspond to leader actions L∅, L1, L2.)

Naturally, the follower wants to avoid competition with the leader's research team. In particular, depending on the type of the follower, the leader's decision may drive the follower out of the market or leave the follower with a chance to gain substantial market share. This can be modeled as a Bayesian Stackelberg Game (BSG) where the leader has one type and the follower has two possible types. To be concrete, we specify the payoff matrices for the different types of follower in Figure 1, where the leader's action Li denotes the leader's decision to devote the team to product i for i ∈ {1, 2}, and L∅ means a team vacation. Similarly, the follower's action Fi means the follower focuses on product i ∈ {1, 2}, and F∅ means leaving the market. Notice that the payoff matrices force the follower to produce only the product that is consistent with his type; otherwise he gets utility −∞. The utility for the leader is relatively simple: the leader gets utility 1 only if the follower (of either type) takes action F∅, i.e., leaves the market, and gets utility 0 otherwise. In other words, the leader wants to drive the follower out of the market.

Possessing a first-mover advantage, the leader can commit to a randomized strategy for assigning her research team so as to maximize her expected utility over the randomness of her mixed strategy and the follower types. Unfortunately, finding the optimal mixed strategy to commit to turns out to be NP-hard for BSGs in general [5]. Nevertheless, by exploiting the special structure in this example, it is easy to show that any mixed strategy that puts at least 2/3 probability on L1 is optimal for the leader to commit to. This is because, to drive a follower of type θ1 out of the market, the leader has to take L1 with probability at least 2/3. Likewise, to drive a follower of type θ2 out of the market, the leader has to take L2 with probability at least 1/2. Since 2/3 + 1/2 > 1, the leader cannot achieve both, so the optimal choice is to drive the follower of type θ1 (the type occurring with higher probability) out of the market, yielding leader utility 0.55 in expectation.

Notice that the leader commits to the strategy without knowing the realization of the follower's type. This is reasonable because the follower, as a start-up, can keep information confidential from the leader firm at the initial stage of the competition. However, as time goes on, the leader will gradually learn the type of the follower. Nevertheless, the leader firm cannot change her chosen action at that point because, for example, there is insufficient time to switch to another product. Can the leader still do something strategic at this point? In particular, we study whether the leader can benefit by partially revealing her action to the follower after observing the follower's type.

To be concrete, consider the following leader policy. Before observing the follower's type, the leader commits to choosing actions L1 and L2 uniformly at random, each with probability 1/2. Meanwhile, the leader also commits to the following signaling scheme. If the follower has type θ1, the leader will send a signal ∅ to the follower when the leader takes action L1, and will send either ∅ or 1 uniformly at random when the leader takes action L2. Mathematically, the signaling scheme for the follower of type θ1 is captured by the following probabilities:

  Pr(∅ | L1, θ1) = 1,    Pr(1 | L1, θ1) = 0;
  Pr(∅ | L2, θ1) = 1/2,  Pr(1 | L2, θ1) = 1/2.

On the other hand, if the follower has type θ2, the leader will always send ∅ regardless of what action she has taken.

When a follower of type θ1 receives signal ∅ (occurring with probability 3/4), he infers the posterior belief of the leader's strategy as Pr(L1 | ∅, θ1) = 2/3 and Pr(L2 | ∅, θ1) = 1/3, thus deriving an expected utility of 0 from taking action F1. Assuming the follower breaks ties in favor of the leader,² he will then choose action F∅, leaving the market. On the other hand, if the follower receives signal 1 (occurring with probability 1/4), he knows that the leader has taken action L2 for sure; thus the follower will take action F1, achieving utility 2. In other words, the signals ∅ and 1 can be viewed as recommendations to the follower to leave the market (∅) or develop the product (1), though we emphasize that a signal has no meaning beyond the posterior distribution over the leader's actions that it induces. As a result, the leader drives a follower of type θ1 out of the market 3/4 of the time. On the other hand, if the follower has type θ2, then since the leader reveals no information, the follower derives expected utility 0 from taking F2, and thus will choose F∅ in favor of the leader. In expectation, the leader gets utility 0.55 × 3/4 + 0.45 × 1 = 0.8625 (> 0.55). Thus, the leader achieves better utility by signaling.

The design of the signaling scheme above depends crucially on the fact that the leader can distinguish different follower types before sending the signals, and will signal differently to different follower types. This fits the setting where the leader can observe the follower's type after her action is taken, and then signal accordingly.

²This is without loss of generality because the leader can always slightly tune the probability mass to make the follower strictly prefer F∅.
However, in many cases the leader is not able to observe the follower's type. Interestingly, it turns out that the leader can in some cases design a signaling scheme which incentivizes the follower to truthfully report his type to the leader, and still benefit from signaling. Note that the signaling scheme above does not satisfy the follower's incentive compatibility constraints: if the follower is asked to report his type, a follower of type θ2 would be better off reporting his type as θ1. This follows from a simple calculation, but an intuitive reason is that a follower of type θ2 will not get any information if he truthfully reports θ2, but will receive a more informative signal, and thus benefit, by reporting θ1.

Now let us consider another leader policy. The leader commits to the mixed strategy (L∅, L1, L2) = (1/11, 6/11, 4/11). Interestingly, this involves sometimes sending the research team on vacation! Meanwhile, the leader also commits to the following more sophisticated signaling scheme. If the follower reports type θ1, the leader will send signal ∅ whenever L1 is taken, as well as 3/4 of the time that L2 is taken; otherwise the leader sends signal 1. If the follower reports type θ2, the leader sends signal ∅ whenever L2 is taken, as well as 2/3 of the time that L1 is taken; otherwise the leader sends signal 2. It turns out that this policy is incentive compatible (truthfully reporting the type is in the follower's best interests) and achieves the maximum expected leader utility, 17/22 ≈ 0.773 ∈ (0.55, 0.8625), among all such policies.

Justification of Commitment: The assumption of commitment to strategies is well motivated, and has been justified, in many applications, e.g., market competition [9] and security [25]. This is usually due to the leader's first-mover advantage. The assumption of commitment to signaling schemes is justified on the grounds of games that are played repeatedly (e.g., a leading firm plays repeatedly with start-ups that show up and fade away), so that the follower can learn the signaling scheme, i.e., how the signals correlate with the leader's actions. On the other hand, to balance short-term utility and long-term credibility, the leader has an incentive to follow the signaling scheme in order to build a reputation for her strategy of disclosing information. We refer the reader to [24] for a more thorough discussion of this phenomenon.

Remark: This example shows that the additional ability of committing to a signaling scheme can profoundly affect both players' strategies. We study how such additional commitment changes the game as well as the computation of the leader's optimal policy. The rest of this paper is organized as follows. In Section 3 we generalize the above example to BSGs, and also examine its application to Bayesian Stackelberg Security Games, a model of growing interest for various security challenges. Note that the above example only concerned the case where the follower has multiple types. In Section 4, we consider a variant of the model where the leader has multiple types (but the follower has only one type), and seek to compute the optimal leader policy. We show simulation results in Section 5 and conclude in Section 6.

3. SINGLE LEADER TYPE, MULTIPLE FOLLOWER TYPES

3.1 The Model

In this section, we generalize the example in Section 2 and consider how the leader's additional ability to commit to a signaling scheme changes the game and the computation. We start with a Bayesian Stackelberg Game (BSG) with one leader type and multiple follower types. Let Θ denote the set of all follower types. An instance of such a BSG is given by a set of tuples {(A^θ, B^θ, λ_θ)}_{θ∈Θ}, where A^θ, B^θ ∈ R^{m×n} are the payoff matrices of the leader (row player) and the follower (column player), respectively, when the follower has type θ, which occurs with probability λ_θ. We use [m] and [n] to denote the leader's and follower's pure strategy sets, respectively. For convenience, we assume that every follower type has the same number of actions (i.e., n) in the above notation. This is without loss of generality, since we can always add "dummy" actions with payoff −∞ for both players. We use a^θ_ij [b^θ_ij] to denote a generic entry of A^θ [B^θ]. If A^θ = −B^θ for all θ ∈ Θ, we say that the BSG is zero-sum.

Following the standard assumption of Stackelberg games, we assume that the leader can commit to a mixed strategy. Such a leader strategy is optimal if it results in maximal leader utility in expectation over the randomness of the strategy and the follower types, assuming each follower type best responds to the leader's mixed strategy.³ It is known that computing the optimal mixed strategy to commit to, also known as the Bayesian Strong Stackelberg Equilibrium (BSSE) strategy, is NP-hard in such a BSG [5]. A later result strengthened the hardness to approximation: no polynomial-time algorithm can achieve a non-trivial approximation ratio in general unless P = NP [16].

We consider a richer model where the leader can commit not only to a mixed strategy but also to a scheme, known as a signaling scheme, for partially releasing information regarding the action she is currently playing, i.e., the sample from her committed mixed strategy. Formally, the leader commits to a mixed strategy x ∈ Δ_m, where Δ_m is the m-dimensional simplex, and a signaling scheme φ, which is a stochastic map from Θ × [m] to a set of signals Σ. In other words, the sender randomly chooses a signal to send based on the action she currently plays and the follower type she observes. We call the pair

  (x, φ), where x ∈ Δ_m and φ : Θ × [m] → Σ is a randomized map,    (1)

a leader policy. After the commitment, the leader samples an action to play. Then the follower's type is realized, and the leader observes the follower's type and samples a signal. We assume that the follower has full knowledge of the leader policy. Upon receiving a signal, the follower updates his belief about the leader's action and takes a best response. Figure 2 illustrates the timeline of the game.

Figure 2: Timeline of the BSG with Multiple Follower Types. (Leader commits to strategy and signaling scheme; leader plays an action; follower's type is realized; leader "observes" the follower's type and samples a signal; follower observes the signal and plays.)

We note that if the leader cannot distinguish different follower types and has to send the same signal to all follower types, then signaling does not benefit the leader (for the same reason as in the non-Bayesian setting). In this case, she should simply commit to the optimal mixed strategy. The leader only benefits when she can target different follower types with different signals. In many cases, like the example in Section 2, the leader gets to observe the follower's type when it is realized (but after her action is completed) and can therefore choose to signal differently to different follower types. Moreover, in practice it is sometimes natural for the leader to send different signals to different follower types even without genuinely learning their types; e.g., the follower's type may be defined by his location, in which case we can send signals using location-specific devices such as physical signs or radio transmissions. This fits our model just as well. We will elaborate on one such example when discussing security games.

3.2 Commitment to Optimal Leader Policy

We first consider the case where the leader can explicitly observe the follower's type, and thus can signal differently to different follower types (but this would also fit the location-based model). We start with a simple observation.

OBSERVATION 3.1 (SEE, E.G., [15]). There exists an optimal signaling scheme using at most n signals, with signal σ_j recommending action j ∈ [n] to the follower.

Observation 3.1 follows simply from the fact that if two signals result in the same follower best-response action, we can merge them into a single signal without changing the follower's best response or the leader's utility. As a result, for the rest of the paper we assume that Σ = {σ_j}_{j∈[n]}.

THEOREM 3.2. The optimal leader policy can be computed in poly(m, n, |Θ|) time by linear programming.

PROOF. Let x = (x_1, ..., x_m) ∈ Δ_m be the leader's mixed strategy to commit to. As a result of Observation 3.1, the signaling scheme φ can be characterized by φ(σ_j | i, θ), the probability of sending signal σ_j conditioned on the leader's (pure) action i and the follower's type θ. Then p^θ_ij = x_i · φ(σ_j | i, θ) is the joint probability that the leader plays pure strategy i and sends signal σ_j, conditioned on observing a follower of type θ. The following linear program computes the optimal leader policy, captured by the variables {x_i}_{i∈[m]} and {p^θ_ij}_{i∈[m], j∈[n], θ∈Θ}:

  maximize    Σ_{θ∈Θ} λ_θ Σ_{i,j} p^θ_ij a^θ_ij
  subject to  Σ_{j=1}^n p^θ_ij = x_i,                              for i ∈ [m], θ ∈ Θ;
              Σ_{i=1}^m p^θ_ij b^θ_ij ≥ Σ_{i=1}^m p^θ_ij b^θ_ij′,  for all θ and j ≠ j′;    (2)
              Σ_{i=1}^m x_i = 1;
              p^θ_ij ≥ 0,                                          for all i, j, θ.

The first set of constraints says that the summation of the probability mass p^θ_ij (the joint probability of playing pure strategy i and sending signal σ_j conditioned on follower type θ) over j should equal the probability of playing action i, for every type θ. The second set of constraints guarantees that the action j recommended by signal σ_j is indeed the follower's best response.⁴

Given any game G, let U_sig(G) be the leader's expected utility from the optimal leader policy computed by LP (2). Moreover, let U_BSSE(G) be the leader's utility in the BSSE, i.e., the expected leader utility from committing to (only) the optimal mixed strategy.

PROPOSITION 3.3. If G is a zero-sum BSG, then U_sig(G) = U_BSSE(G). That is, the leader does not benefit from signaling in zero-sum BSGs.

The intuition underlying Proposition 3.3 is that, in a situation of pure competition, any information volunteered to the follower will be used to "harm" the leader. In other words, signaling is only helpful when the game exhibits some "cooperative components". We omit the detailed proof due to space limitations. Note that all omitted proofs can be found in the full version of this paper.

Remark: Notice that computing the optimal mixed strategy (assuming no signaling) to commit to is NP-hard in general for the setting above, and even NP-hard to approximate within any non-trivial ratio, as shown in [5, 16]. Interestingly, it turns out that when we consider the richer model with signaling, the problem becomes easy! Intuitively, this is because the signaling scheme "relaxes" the game by introducing correlation between the leader's and follower's actions (via the signal). Such correlation allows more efficient computation. Similar intuition can be seen in the literature on computing Nash equilibria (hard for two players [6, 3]) and correlated equilibria (easy in fairly general settings [20, 14]).

3.3 Incentivizing the Follower Type

In many situations, it is not realistic to expect that the leader can observe the follower's type. For example, the follower's type may be whether he has a high or low value for an object, which is not directly observable. In such cases, the leader can ask the follower to report his type. However, it is not always in the follower's best interests to truthfully report his own type, since the signal intended for a different follower type might be more beneficial to him (recall the example in Section 2). In this section, we consider how to compute an optimal incentive compatible (IC) leader policy, which incentivizes the follower to truthfully report his type while benefiting the leader.

We note that Observation 3.1 still holds in this setting. To see this, consider a follower of type θ that receives more than one signal, each resulting in the same follower best response. Then, as before, we can merge these signals without harming a follower of type θ. But if a follower of type θ′ ≠ θ misreports his type as θ, receiving the merged signal provides less information than receiving one of the unmerged signals. Therefore, if the follower of type θ′ had no incentive to misreport type θ before the signals were merged, he has no incentive to misreport after the signals are merged. So any signaling scheme with more than n signals can be reduced to an equivalent scheme with exactly n signals.

THEOREM 3.4. The optimal incentive compatible (IC) leader policy can be computed in poly(m, n, |Θ|) time by linear programming, assuming the leader does not observe the follower's type.

PROOF. As in Section 3.2, we use the variables x ∈ Δ_m and {p^θ_ij}_{i∈[m], j∈[n], θ∈Θ} to capture the leader's policy. Then α^θ_j = Σ_{i=1}^m p^θ_ij is the probability of sending signal σ_j when the follower has (reported) type θ. Now consider the case where the follower reports type θ′ but has true type θ. When the leader recommends action j (assuming a follower of type θ′), which now is not necessarily the follower's best response due to the misreport, the follower's utility for any action j′ is (1/α^{θ′}_j) Σ_{i=1}^m p^{θ′}_ij b^θ_ij′. Therefore, the follower will play argmax_{j′} (1/α^{θ′}_j) Σ_{i=1}^m p^{θ′}_ij b^θ_ij′, with expected utility max_{j′} (1/α^{θ′}_j) Σ_{i=1}^m p^{θ′}_ij b^θ_ij′. As a result, the expected utility of a follower of type θ who misreports type θ′ is

  U(θ′; θ) = Σ_{j=1}^n α^{θ′}_j max_{j′} [ (1/α^{θ′}_j) Σ_{i=1}^m p^{θ′}_ij b^θ_ij′ ] = Σ_{j=1}^n max_{j′} [ Σ_{i=1}^m p^{θ′}_ij b^θ_ij′ ].

Therefore, to incentivize the follower to truthfully report his type, we only need to add the incentive compatibility constraints U(θ; θ) ≥ U(θ′; θ). Using the condition max_{j′} Σ_{i=1}^m p^θ_ij b^θ_ij′ = Σ_{i=1}^m p^θ_ij b^θ_ij, i.e., the fact that the action j recommended by σ_j is indeed the follower's best response when the follower truthfully has type θ, we have

  U(θ; θ) = Σ_{j=1}^n max_{j′} [ Σ_{i=1}^m p^θ_ij b^θ_ij′ ] = Σ_{j=1}^n Σ_{i=1}^m p^θ_ij b^θ_ij.

Incorporating the above constraints into LP (2) gives the optimization program below, which computes an optimal incentive compatible leader policy.

³Note that the follower cannot observe the leader's realized action; this is a standard assumption in Stackelberg games.
⁴This is often called "obedience".

  maximize    Σ_{θ∈Θ} λ_θ Σ_{i,j} p^θ_ij a^θ_ij
  subject to  Σ_{j=1}^n p^θ_ij = x_i,                                                        for all i, θ;
              Σ_{i=1}^m p^θ_ij b^θ_ij ≥ Σ_{i=1}^m p^θ_ij b^θ_ij′,                            for all θ and j ≠ j′;
              Σ_{j=1}^n Σ_{i=1}^m p^θ_ij b^θ_ij ≥ Σ_{j=1}^n max_{j′} [ Σ_{i=1}^m p^{θ′}_ij b^θ_ij′ ],  for all θ and θ′ ≠ θ;    (3)
              Σ_{i=1}^m x_i = 1;
              p^θ_ij ≥ 0,                                                                    for all i, j, θ.

Notice that Σ_{j=1}^n max_{j′} [ Σ_{i=1}^m p^{θ′}_ij b^θ_ij′ ] is a convex function of the variables, therefore the above is a convex program. By standard tricks, the convex constraint can be converted into a set of polynomially many linear constraints (see, e.g., [2]).

In the security game setting, the attacker has a private type, but the defender only knows the distribution {λ_θ}_{θ∈Θ}. This captures many natural security settings. For example, in airport patrolling, the attacker could be either a terrorist or a regular policy violator, as modeled in [22]. In wildlife patrolling, the type of an attacker could be the species the attacker is interested in [10]. If the attacker chooses to attack target i ∈ [n], the players' utilities depend not only on whether target i is covered, but also on the attacker's type θ. We use U^{d/a}_{c/u}(i | θ) to denote the defender/attacker (d/a) utility when target i is covered/uncovered (c/u) and an attacker of type θ attacks target i.

Notice that the leader now has (n choose k) pure strategies, so the natural LP formulation has exponential size. Nevertheless, in security games we can sometimes solve the game efficiently by exploiting compact representations of the defender's strategies. Unfortunately, we show this is not possible here. Interestingly, it turns out that the hardness of the problem depends on how many targets an attacker is interested in. In particular, we say that an attacker of type θ is not interested in attacking target i if there exists j such that U^a_u(i | θ) < U^a_c(j | θ); that is, attacking target i even when it is uncovered is worse than attacking target j even when it is covered.
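LP (2) is small enough to hand to an off-the-shelf solver. The sketch below is our own code, not the authors': it builds LP (2) with scipy, and the toy instance mimics the market example of Section 2 with payoff entries we made up (for which the optimum works out to 0.8625). The IC constraints of program (3) could be added the same way after linearizing each max term with an auxiliary variable.

```python
import numpy as np
from scipy.optimize import linprog

def optimal_leader_policy(A, B, lam):
    """Solve LP (2): jointly optimal mixed strategy and signaling scheme
    for a BSG with one leader type and len(lam) follower types.
    A[t], B[t] are (m, n) leader/follower payoff matrices for type t;
    lam[t] is the prior probability of type t."""
    T, (m, n) = len(lam), A[0].shape
    nvar = m + T * m * n            # variables: x_1..x_m, then all p^t_ij
    def pidx(t, i, j):
        return m + t * m * n + i * n + j
    # linprog minimizes, so negate the objective sum_t lam_t p^t_ij a^t_ij.
    c = np.zeros(nvar)
    for t in range(T):
        for i in range(m):
            for j in range(n):
                c[pidx(t, i, j)] = -lam[t] * A[t][i, j]
    A_eq, b_eq = [], []
    for t in range(T):              # sum_j p^t_ij = x_i  for all t, i
        for i in range(m):
            row = np.zeros(nvar)
            row[i] = -1.0
            for j in range(n):
                row[pidx(t, i, j)] = 1.0
            A_eq.append(row); b_eq.append(0.0)
    row = np.zeros(nvar); row[:m] = 1.0     # sum_i x_i = 1
    A_eq.append(row); b_eq.append(1.0)
    A_ub, b_ub = [], []             # obedience: recommended j beats any j'
    for t in range(T):
        for j in range(n):
            for jp in range(n):
                if jp != j:
                    row = np.zeros(nvar)
                    for i in range(m):
                        row[pidx(t, i, j)] = B[t][i, jp] - B[t][i, j]
                    A_ub.append(row); b_ub.append(0.0)
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=np.array(A_eq), b_eq=np.array(b_eq))  # x, p >= 0 by default
    return -res.fun, res.x[:m], res.x[m:].reshape(T, m, n)

# Toy instance in the spirit of Section 2 (rows: L_empty, L1, L2; columns:
# develop the product, leave the market; entries are ours, not the paper's).
A1 = A2 = np.array([[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]])  # leader: 1 iff follower leaves
B1 = np.array([[2.0, 0.0], [-1.0, 0.0], [2.0, 0.0]])      # type theta1 follower
B2 = np.array([[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]])      # type theta2 follower
val, x, p = optimal_leader_policy([A1, A2], [B1, B2], [0.55, 0.45])
```

For this instance the LP drives the type-θ1 follower out 3/4 of the time and the type-θ2 follower out always, matching the hand calculation 0.55 × 3/4 + 0.45 = 0.8625.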

4.2 Commitment to Optimal Leader Policy

Similarly to Observation 3.1, it is easy to see that there exists an optimal leader policy with n signals, where each signal recommends an action to the follower. Therefore, without loss of generality, we assume Σ = {σ_1, ..., σ_n}, where σ_j is the signal recommending action j to the follower.⁶

THEOREM 4.2. The optimal defender policy can be computed in poly(n, |Θ|) time if the defender's best response problem (i.e., the defender oracle) admits a poly(n) time algorithm.⁷

PROOF. First, observe that LP (5) does not obviously extend to security game settings, because the number of leader pure strategies

⁶The optimal policy can be computed by an LP of exponential size. The defender oracle is essentially the dual of this LP. See the Appendix for a derivation of the defender oracle and the proof of the hardness.
⁷Interestingly, when |Θ| = 1, the game degenerates to a Stackelberg game without uncertainty about player types, and LP (5) degenerates to a linear program that computes the Strong Stackelberg equilibrium [4].

Figure 4: Simulation results showing the effect of varying the number of actions, n, and the number of types, |Θ|, on the runtime and utility of the three different models (no IC constraints, IC constraints, BSSE) in the case of multiple follower types. (Panels: (a) running time, |Θ| constant; (b) leader utility, |Θ| constant; (c) running time, n constant; (d) leader utility, n constant.)

The leader's pure-strategy space is exponentially large here, and so is the LP formulation. Therefore, like classic security game algorithms, it is crucial to exploit a compact representation of the leader's policy space. For this, we need an equivalent but slightly different view of the leader policy: the leader observes her type $\theta$ and then randomly chooses a signal $\sigma_j$ (occurring with probability $\sum_{i=1}^m p_{ij}^\theta$ in LP (5)), and finally picks a mixed strategy that depends on both $\theta$ and $j$ (i.e., the vector $(p_{1j}^\theta, p_{2j}^\theta, \dots, p_{mj}^\theta)$ normalized by the factor $\sum_{i=1}^m p_{ij}^\theta$ in LP (5)).

This different view of the leader policy allows us to write a quadratic program for computing the optimal leader policy. In particular, let $p_j^\theta$ be the probability that the leader sends signal $\sigma_j$ conditioned on the realized leader type $\theta$, and let $x_j^\theta$ be the leader's (marginal) mixed strategy conditioned on observing $\theta$ and sending signal $\sigma_j$. Then, upon receiving signal $\sigma_j$, a rational Bayesian attacker updates his belief and computes the expected utility of attacking target $j'$ as

$$\frac{1}{\alpha_j} \sum_{\theta} \lambda_\theta \, p_j^\theta \Big[ x_j^\theta(j') \, U_c^a(j' \mid \theta) + \big(1 - x_j^\theta(j')\big) \, U_u^a(j' \mid \theta) \Big] \qquad (7)$$

where the normalization factor $\alpha_j = \sum_\theta \lambda_\theta p_j^\theta$ is the probability of sending signal $\sigma_j$. Define $\mathrm{AttU}(j, j')$ to be the attacker's utility of attacking target $j'$ conditioned on receiving signal $\sigma_j$, multiplied by the probability $\alpha_j$ of receiving $\sigma_j$. Formally,

$$\mathrm{AttU}(j, j') = \alpha_j \times \text{Equation (7)} = \sum_\theta \lambda_\theta p_j^\theta x_j^\theta(j') \, U_c^a(j' \mid \theta) + \sum_\theta \big( \lambda_\theta p_j^\theta - \lambda_\theta p_j^\theta x_j^\theta(j') \big) \, U_u^a(j' \mid \theta).$$

Similarly, we can define $\mathrm{DefU}(j, j')$, the leader's expected utility of sending signal $\sigma_j$ with target $j'$ being attacked, scaled by the probability of sending $\sigma_j$. The attacker's incentive compatibility constraints are then $\mathrm{AttU}(j, j) \ge \mathrm{AttU}(j, j')$ for any $j' \ne j$. The leader's problem can then be expressed as the following quadratic program with variables $\{x_j^\theta\}_{j \in [n], \theta \in \Theta}$ and $\{p_j^\theta\}_{j \in [n], \theta \in \Theta}$:

$$\begin{aligned}
\text{maximize} \quad & \textstyle\sum_j \mathrm{DefU}(j, j) \\
\text{subject to} \quad & \mathrm{AttU}(j, j) \ge \mathrm{AttU}(j, j'), \quad \text{for } j \ne j', \\
& \textstyle\sum_j p_j^\theta = 1, \quad \text{for } \theta \in \Theta, \qquad (8) \\
& x_j^\theta \in \mathcal{D}, \quad \text{for all } j, \theta, \\
& p_j^\theta \ge 0, \quad \text{for all } j, \theta.
\end{aligned}$$

Program (8) is quadratic because $\mathrm{AttU}(j, j')$ and $\mathrm{DefU}(j, j')$ are quadratic in the variables. Notably, these two functions are linear in $p_j^\theta$ and the product $p_j^\theta x_j^\theta$. We therefore define variables $y_j^\theta = p_j^\theta x_j^\theta \in \mathbb{R}^n$; both $\mathrm{AttU}(j, j')$ and $\mathrm{DefU}(j, j')$ are then linear in $p_j^\theta$ and $y_j^\theta$. The only problematic constraint in program (8) is $x_j^\theta \in \mathcal{D}$, which now becomes $y_j^\theta \in p_j^\theta \mathcal{D}$, where both $p_j^\theta$ and $y_j^\theta$ are variables. Here $p\mathcal{D}$ denotes the polytope $\{px : x \in \mathcal{D}\}$ for any given $p$. It turns out that this is still a convex constraint, and it behaves nicely as long as the polytope $\mathcal{D}$ behaves nicely.

LEMMA 4.3. Let $\mathcal{D} \subseteq \mathbb{R}^n$ be any bounded convex set. Then the following hold:
1. The extended set $\widetilde{\mathcal{D}} = \{(x, p) : x \in p\mathcal{D}, \, p \ge 0\}$ is convex;
2. If $\mathcal{D}$ is a polytope expressed by constraints $Ax \le b$, then $\widetilde{\mathcal{D}}$ is also a polytope, given by $\{(x, p) : Ax \le pb, \, p \ge 0\}$;
3. If $\mathcal{D}$ admits a poly($n$)-time separation oracle,$^8$ so does $\widetilde{\mathcal{D}}$.

The proof of Lemma 4.3 is standard, and we defer it to the full version. We note that the restriction that $\mathcal{D}$ is bounded is important; otherwise some conclusions do not hold, e.g., Property 2. Fortunately, the polytope $\mathcal{D}$ of mixed strategies is bounded. Therefore, using Lemma 4.3, we can rewrite Quadratic Program (8) as the following linear program:

$$\begin{aligned}
\text{maximize} \quad & \textstyle\sum_j \mathrm{DefU}(j, j) \\
\text{subject to} \quad & \mathrm{AttU}(j, j) \ge \mathrm{AttU}(j, j'), \quad \text{for } j \ne j', \\
& \textstyle\sum_j p_j^\theta = 1, \quad \text{for } \theta \in \Theta, \qquad (9) \\
& (y_j^\theta, p_j^\theta) \in \widetilde{\mathcal{D}}, \quad \text{for all } j, \theta, \\
& p_j^\theta \ge 0, \quad \text{for all } j, \theta.
\end{aligned}$$

Program (9) is linear because $\mathrm{AttU}(j, j')$ and $\mathrm{DefU}(j, j)$ are linear in $p_j^\theta$ and $y_j^\theta$, and moreover, $(y_j^\theta, p_j^\theta) \in \widetilde{\mathcal{D}}$ are essentially linear constraints, due to Lemma 4.3 and the fact that $\mathcal{D}$ is a polytope in security games. Furthermore, LP (9) has a compact representation as long as the polytope $\mathcal{D}$ of realizable mixed strategies has one; in this case, LP (9) can be solved explicitly. More generally, by standard techniques from convex programming, we can show that the separation oracle for $\widetilde{\mathcal{D}}$ easily reduces to the defender's best response problem. Thus if the defender oracle admits a poly($n$)-time algorithm, then a separation oracle for $\mathcal{D}$ can be found in poly($n$) time. By Lemma 4.3, $\widetilde{\mathcal{D}}$ then admits a poly($n$)-time separation oracle, so LP (9) can be solved in poly($n, |\Theta|$) time. The proof is not particularly insightful, and a similar argument can be found in [26], so we omit the details here.
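Property 2 of Lemma 4.3 is easy to sanity-check numerically. The sketch below is illustrative only (the toy polytope and function names are ours, not the paper's): it verifies that scaled points $(p x, p)$ with $x \in \mathcal{D}$ and $p \ge 0$ satisfy the lifted linear constraints $Ay \le pb$, which is exactly how $y_j^\theta = p_j^\theta x_j^\theta$ is constrained in LP (9) without introducing products of variables.

```python
# Toy polytope D = {x in R^2 : x >= 0, x1 + x2 <= 1}, written as rows of A x <= b.
A = [(-1.0, 0.0), (0.0, -1.0), (1.0, 1.0)]
b = [0.0, 0.0, 1.0]

def in_D(x, tol=1e-9):
    """Membership in the original polytope D."""
    return all(a[0] * x[0] + a[1] * x[1] <= bi + tol for a, bi in zip(A, b))

def in_D_ext(y, p, tol=1e-9):
    """Membership in the lifted polytope of Lemma 4.3, Property 2:
    D~ = {(y, p) : A y <= p b, p >= 0}."""
    return p >= -tol and all(a[0] * y[0] + a[1] * y[1] <= p * bi + tol
                             for a, bi in zip(A, b))

x = (0.3, 0.5)                      # a point of D
assert in_D(x)
for p in (0.0, 0.25, 1.0, 3.7):     # any nonnegative scaling stays in D~
    assert in_D_ext((p * x[0], p * x[1]), p)

# Conversely, recovering the mixed strategy x = y / p (for p > 0) lands back in D.
y, p = (0.6, 1.0), 2.0
assert in_D_ext(y, p) and in_D((y[0] / p, y[1] / p))
```

Note that the lifted constraints remain linear in $(y, p)$ jointly, which is what lets Quadratic Program (8) be rewritten as the linear program (9).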
4.4 Relation to Other Models

We note that our model in this section is related to several models from the literature on both information design and security games. In particular, when the leader does not have actions and only privately observes her type, our model degenerates to the Bayesian Persuasion (BP) model of [15]. The BP model is a two-player game played between a sender (the leader in our case) and a receiver (the follower in our case). The receiver must take one of a number of actions with a priori unknown payoff, and the sender has no actions but possesses additional information regarding the payoffs of the various receiver actions (i.e., the leader observes her type).

$^8$A separation oracle for a convex set $\mathcal{D} \subseteq \mathbb{R}^n$ is an algorithm which, given any $x_0 \in \mathbb{R}^n$, either correctly asserts $x_0 \in \mathcal{D}$, or asserts $x_0 \notin \mathcal{D}$ and finds a hyperplane $a \cdot x = b$ separating $x_0$ from $\mathcal{D}$ in the following sense: $a \cdot x_0 > b$ but $a \cdot x \le b$ for any $x \in \mathcal{D}$. It is well known that the convex program $\max a \cdot x$ subject to $x \in \mathcal{D}$ can be solved in poly($n$) time for any $a \in \mathbb{R}^n$ if $\mathcal{D}$ has a poly($n$)-time separation oracle [11].
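For a polytope given explicitly by $Ax \le b$, the separation oracle of footnote 8 is just a scan for a violated constraint, since a violated row is itself a separating hyperplane. Below is a minimal sketch (our own toy illustration, not the paper's implementation; for compactly represented security games the oracle instead reduces to the defender's best-response problem, as discussed in Section 4.3):

```python
def separation_oracle(A, b, x0, tol=1e-9):
    """Separation oracle for the polytope D = {x : A x <= b}: either assert
    x0 in D (return None), or return a violated row (a, beta) satisfying
    a . x0 > beta while a . x <= beta for every x in D."""
    for a, beta in zip(A, b):
        if sum(ai * xi for ai, xi in zip(a, x0)) > beta + tol:
            return (a, beta)    # separating hyperplane a . x = beta
    return None                 # x0 is in D

# Unit box in R^2: 0 <= x_i <= 1, as A_box x <= b_box.
A_box = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
b_box = [1.0, 0.0, 1.0, 0.0]

assert separation_oracle(A_box, b_box, (0.5, 0.5)) is None              # inside
assert separation_oracle(A_box, b_box, (2.0, 0.5)) == ((1.0, 0.0), 1.0)  # x1 > 1
```

With such an oracle in hand, the ellipsoid method solves linear optimization over $\mathcal{D}$ (and, by Lemma 4.3, over $\widetilde{\mathcal{D}}$) in polynomial time [11].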
The BP model studies how the sender can signal her additional information to persuade the receiver to take an action that is more favorable to the sender. Variants of the BP model have been applied to varied domains including auctions, advertising, voting, multi-armed bandits, medical research, and financial regulation; for additional references, see [7]. Our model generalizes the BP model to the case where the sender has both actions and additional private information, and our results show that this generalized model can be solved in fairly general settings.

The security game setting in this section also relates to the model of Rabinovich et al. [23], who considered a similar security setting where the defender can partially signal her strategy and extra knowledge about the targets' states to the attacker in order to achieve better defender utility. This is essentially a BSG with multiple leader types and a single follower type. Rabinovich et al. [23] were able to efficiently solve the case with unconstrained identical security resources. Our Theorem 4.2 shows that this model can actually be solved efficiently in much more general security settings that allow complicated real-world scheduling constraints, as long as the defender oracle problem can be solved efficiently.

5. SIMULATIONS

We mainly present a comparison of the models discussed in Section 3 in terms of both the leader's optimal utility and the runtime required to compute the leader's optimal policy. We focus primarily on the setting with one leader type and multiple follower types, for two reasons. First, this is the case in which it is NP-hard to compute the optimal leader strategy without allowing the leader to signal (i.e., to compute the BSSE strategy), while our models of signaling permit a polynomial-time solution. Second, some interesting phenomena in our simulations for the case of multiple leader types also show up in the case of multiple follower types.
We generate random instances using a modification of the covariant game model [19]. In particular, for given values of $m$, $n$, and $\Theta$, we independently set $a_{ij}^\theta$ equal to a random integer in the range $[-5, 5]$ for each $i, j, \theta$. The probabilities $\{\lambda_\theta\}_{\theta \in \Theta}$ were generated randomly. For some value of $\alpha \in [0, 1]$, we then set $B = \alpha(B_0) + (1 - \alpha)(-A)$, where $B_0$ is a random matrix generated in the same fashion as $A$. So in the case that $\alpha = 0$ the game is zero-sum, while $\alpha = 1$ means independent and uniformly random leader and follower payoffs. For every set of parameter values, we averaged over 50 instances generated in this manner to obtain the utility/runtime values we report.

We first consider the value of signaling for different values of $\alpha$ chosen from the set $\{0, 0.1, 0.2, \dots, 1\}$. For these simulations, we fixed $m = n = 10$ and $|\Theta| = 5$. Figure 5 shows the absolute increase in leader utility from signaling (both with and without the type-reporting IC constraints), compared with the utility from BSSE (the $y = 0$ baseline).

[Figure 5: Extra utility gained by the leader from signaling. Curves: with and without IC constraints; x-axis: alpha; y-axis: utility.]

Note that when $\alpha = 0$ there is no gain from signaling, by Proposition 3.3. Interestingly, the gain from signaling is non-monotone, peaking at around $\alpha = 0.7$. Intuitively, a large $\alpha$ means low correlation between the payoff matrices of the leader and the follower, so there is a high probability that some entries will induce high payoffs for both players. The leader can therefore extract high utility from commitment alone, and thus derives little gain from signaling. However, as we decrease $\alpha$ and the game becomes more competitive, commitment alone is not as powerful for the leader, and she has more to gain from being able to signal.

We next investigate the relation between the size of the BSG and the leader's utility, as well as the runtime, for the three different models. In Figures 4(a) and 4(b), we hold the number of follower types constant ($|\Theta| = 5$) and vary $m = n$ between 1 and 15. In Figures 4(c) and 4(d) we fix $m = n = 5$ and vary $|\Theta|$ between 1 and 15. In all cases we set $\alpha = 0.5$ for generating random instances. Not surprisingly, allowing signaling (both with and without the IC constraints) provides a significant speed-up over computing the BSSE.$^9$ On the other hand, the additional constraints in the model with IC constraints also increase the running time over the model without those constraints. Indeed, the time to compute the leader's optimal policy without the IC constraints appears as a flat line in Figures 4(a) and 4(c).

In both leader-utility figures, the differences in the leader's utility among the models are as indicated by Proposition 3.5. Observe that in all models the leader's utility increases with the number of actions but decreases with the number of types. One explanation is that the former effect is due to the increased probability that the payoff matrices for a given follower type contain "cooperative" entries where both players achieve high utility. However, as the number of follower types increases, it becomes less likely that the leader's strategy (which does not depend on the follower type) can "cooperate" with a majority of follower types simultaneously. Thus there is an increased chance that the leader's strategy results in low utilities when playing against a reasonable fraction of follower types, which accounts for the latter effect.

In the case of multiple leader types, allowing the leader to signal actually results in a small computational speed-up compared to the case without signaling. We hypothesize that this is because we only need to solve one LP to compute the optimal policy, rather than the multiple LPs required without signaling [5]. Unsurprisingly, we also see an increase in the leader's utility. The utility trends are similar to the case of multiple follower types, so we do not present them in detail.
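The covariant-style instance generator described at the start of this section can be sketched as follows. This is our interpretation of the construction, not the authors' code; the function name, the per-type leader matrices, and the prior normalization are our assumptions:

```python
import random

def random_bsg(m, n, num_types, alpha, lo=-5, hi=5, seed=0):
    """Covariant-style random BSG generator (cf. Section 5): integer payoffs
    in [lo, hi]; follower payoffs B = alpha * B0 + (1 - alpha) * (-A), so
    alpha = 0 yields a zero-sum game and alpha = 1 independent random payoffs."""
    rng = random.Random(seed)

    def rand_matrix():
        return [[rng.randint(lo, hi) for _ in range(n)] for _ in range(m)]

    games = []
    for _ in range(num_types):
        A = rand_matrix()            # leader payoffs for this follower type
        B0 = rand_matrix()
        B = [[alpha * B0[i][j] + (1 - alpha) * (-A[i][j]) for j in range(n)]
             for i in range(m)]
        games.append((A, B))

    weights = [rng.random() for _ in range(num_types)]
    prior = [w / sum(weights) for w in weights]   # random prior over types
    return games, prior

games, prior = random_bsg(m=3, n=3, num_types=2, alpha=0.0)
A, B = games[0]
# At alpha = 0 the game is exactly zero-sum.
assert all(A[i][j] + B[i][j] == 0 for i in range(3) for j in range(3))
assert abs(sum(prior) - 1.0) < 1e-9
```

Averaging utility and runtime over many seeds for each parameter setting then reproduces the experimental protocol described above.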
6. CONCLUSIONS AND DISCUSSIONS

In this paper, we studied the effect of signaling in Bayesian Stackelberg games. We showed that the leader's power of commitment to a signaling scheme not only achieves higher utility but can also yield computational speed-ups. Some of the polynomial-time solvability results extend to security games, an important application domain of Stackelberg games, while others cease to hold. There are many interesting directions for future work. What if different follower types can share information with each other? For a Bayesian leader, what if her signaling scheme cannot be correlated with her mixed strategy, but only carries information about her type? Can we apply these ideas to other domains, e.g., mechanism design, where the mechanism designer implicitly serves as the leader?

$^9$To compute the BSSE, we implement the state-of-the-art algorithm DOBSS, a mixed integer linear program as formulated in [21].

Acknowledgments: This research is supported by NSF grant CCF-1350900 and MURI grant W911NF-11-1-0332. Part of this research was done while the authors were visiting the Simons Institute for the Theory of Computing.

REFERENCES
[1] A. Kolotilin, M. Li, T. Mylovanov, and A. Zapechelnyuk. Persuasion of a privately informed receiver. Working paper, 2015.
[2] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, New York, NY, USA, 2004.
[3] X. Chen, X. Deng, and S.-H. Teng. Settling the complexity of computing two-player Nash equilibria. J. ACM, 56(3):14:1–14:57, May 2009.
[4] V. Conitzer and D. Korzhyk. Commitment to correlated strategies. In Proceedings of the 25th AAAI Conference on Artificial Intelligence (AAAI), 2011.
[5] V. Conitzer and T. Sandholm. Computing the optimal strategy to commit to. In Proceedings of the 7th ACM Conference on Electronic Commerce (EC), pages 82–90. ACM, 2006.
[6] C. Daskalakis, P. W. Goldberg, and C. H. Papadimitriou. The complexity of computing a Nash equilibrium. In Proceedings of the 38th Annual ACM Symposium on Theory of Computing (STOC), pages 71–78. ACM, 2006.
[7] S. Dughmi and H. Xu. Algorithmic Bayesian persuasion. In Proceedings of the 48th Annual ACM Symposium on Theory of Computing (STOC), 2016.
[8] Y. Emek, M. Feldman, I. Gamzu, R. Paes Leme, and M. Tennenholtz. Signaling schemes for revenue maximization. In Proceedings of the 13th ACM Conference on Electronic Commerce (EC), pages 514–531. ACM, 2012.
[9] F. Etro. Stackelberg, Heinrich von: Market structure and equilibrium. Journal of Economics, 109(1):89–92, 2013.
[10] F. Fang, P. Stone, and M. Tambe. When security games go green: Designing defender strategies to prevent poaching and illegal fishing. In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI), 2015.
[11] M. Grötschel, L. Lovász, and A. Schrijver. Geometric Algorithms and Combinatorial Optimization, volume 2 of Algorithms and Combinatorics. Springer, 1988.
[12] M. Guo and A. Deligkas. Revenue maximization via hiding item attributes. CoRR, abs/1302.5332, 2013.
[13] M. Jain, E. Kardes, C. Kiekintveld, F. Ordóñez, and M. Tambe. Security games with arbitrary schedules: A branch and price approach. In Proceedings of the 24th AAAI Conference on Artificial Intelligence (AAAI). AAAI Press, 2010.
[14] A. X. Jiang and K. Leyton-Brown. Polynomial-time computation of exact correlated equilibrium in compact games. In Proceedings of the 12th ACM Conference on Electronic Commerce (EC), 2011.
[15] E. Kamenica and M. Gentzkow. Bayesian persuasion. American Economic Review, 101(6):2590–2615, 2011.
[16] J. Letchford, V. Conitzer, and K. Munagala. Learning and approximating the optimal strategy to commit to. In SAGT, volume 5814 of Lecture Notes in Computer Science, pages 250–262. Springer, 2009.
[17] Y. Li, V. Conitzer, and D. Korzhyk. Catcher-evader games. arXiv:1602.01896, 2016.
[18] P. B. Miltersen and O. Sheffet. Send mixed signals: earn more, work less. In Proceedings of the 13th ACM Conference on Electronic Commerce (EC), pages 234–247. ACM, 2012.
[19] E. Nudelman, J. Wortman, Y. Shoham, and K. Leyton-Brown. Run the GAMUT: A comprehensive approach to evaluating game-theoretic algorithms. In Proceedings of the 3rd International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 880–887. IEEE Computer Society, 2004.
[20] C. H. Papadimitriou and T. Roughgarden. Computing correlated equilibria in multi-player games. J. ACM, 55(3):14:1–14:29, Aug. 2008.
[21] P. Paruchuri, J. P. Pearce, J. Marecki, M. Tambe, F. Ordóñez, and S. Kraus. Efficient algorithms to solve Bayesian Stackelberg games for security applications. In Proceedings of the 23rd AAAI Conference on Artificial Intelligence (AAAI), pages 1559–1562, 2008.
[22] J. Pita, M. Jain, J. Marecki, F. Ordóñez, C. Portway, M. Tambe, C. Western, P. Paruchuri, and S. Kraus. Deployed ARMOR protection: the application of a game theoretic model for security at the Los Angeles International Airport. In Proceedings of the 7th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS), industrial track, pages 125–132, 2008.
[23] Z. Rabinovich, A. X. Jiang, M. Jain, and H. Xu. Information disclosure as a means to security. In Proceedings of the 2015 International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 645–653, 2015.
[24] L. Rayo and I. Segal. Optimal information disclosure. Journal of Political Economy, 118(5):949–987, 2010.
[25] M. Tambe. Security and Game Theory: Algorithms, Deployed Systems, Lessons Learned. Cambridge University Press, New York, NY, USA, 1st edition, 2011.
[26] H. Xu, F. Fang, A. X. Jiang, V. Conitzer, S. Dughmi, and M. Tambe. Solving zero-sum security games in discretized spatio-temporal domains. In Proceedings of the 28th AAAI Conference on Artificial Intelligence (AAAI), 2014.
[27] H. Xu, A. X. Jiang, A. Sinha, Z. Rabinovich, S. Dughmi, and M. Tambe. Security games with information leakage: Modeling and computation. In Proceedings of the 24th International Joint Conference on Artificial Intelligence (IJCAI), pages 674–680, 2015.
[28] H. Xu, Z. Rabinovich, S. Dughmi, and M. Tambe. Exploring information asymmetry in two-stage security games. In Proceedings of the 29th AAAI Conference on Artificial Intelligence (AAAI), 2015.
[29] Z. Yin, A. X. Jiang, M. Johnson, M. Tambe, C. Kiekintveld, K. Leyton-Brown, T. Sandholm, and J. Sullivan. TRUSTS: Scheduling randomized patrols for fare inspection in transit systems. In Proceedings of the Conference on Innovative Applications of Artificial Intelligence (IAAI), 2012.