Computers & Industrial Engineering 111 (2017) 216–227

Contents lists available at ScienceDirect

Computers & Industrial Engineering

journal homepage: www.elsevier.com/locate/caie

A branch and price algorithm for a Stackelberg Security Game ⇑ Felipe Lagos a, Fernando Ordóñez b, , Martine Labbé c,d a Georgia Institute of Technology, United States b Universidad de Chile, Chile c Université Libre de Bruxelles, Belgium d INRIA, Lille, France article info abstract

Article history: Mixed integer optimization formulations are an attractive alternative to solve Stackelberg Game prob- Received 14 June 2017 lems thanks to the efficiency of state of the art mixed integer algorithms. In particular, decomposition Received in revised form 26 June 2017 algorithms, such as branch and price methods, make it possible to tackle instances large enough to rep- Accepted 28 June 2017 resent games inspired in real world domians. Available online 29 June 2017 In this work we focus on Stackelberg Games that arise from a security application and investigate the use of a new branch and price method to solve its mixed integer optimization formulation. We prove that Keywords: the algorithm provides upper and lower bounds on the optimal solution at every iteration and investigate the use of stabilization heuristics. Our preliminary computational results compare this solution approach Stackelberg games Security with previous decomposition methods obtained from alternative formulations of Stackelberg games. Ó 2017 Published by Elsevier Ltd.

1. Introduction to attack. Such Stackelberg Security Game models have been used in the deployment of decision support systems with specialized Stackelberg games model the strategic interaction between algorithms in real security domain applications (Jain et al., 2010; players, where one participant – the leader – is able to commit Pita, Tambe, Kiekintveld, Cullen, & Steigerwald, 2011; Shieh et al., to a strategy first, knowing that the remaining players – the follow- 2012). ers – will take this strategy into account and respond in an optimal Recent work has developed efficient integer optimization solu- manner. These games have been used to represent markets in tion algorithms for different variants of the SSGs (Hochbaum, Lyu, which a participant has significant market share and can commit & Ordóñez, 2014; Jain, Kardes, Kiekintveld, Ordóñez, & Tambe, to a strategy (von Stackelberg, Bazin, Hill, & Urch, 2010), where 2010; Jain, Kiekintveld, & Tambe, 2011; Jain et al., 2011; government decides tolls or capacities in a transportation network Kiekintveld et al., 2009). In general terms these optimization prob- (Labbé, Marcotte, & Savard, 1998), and of late have been used to lems are formulated with the defender committing to a mixed represent the attacker-defender interaction in security domains (randomized) strategy and the attacker(s) responding with a pure (Jain et al., 2010). These games are examples of bilevel optimiza- strategy after conducting surveillance of the defender’s mixed tion problems, which are in general non prob- strategy. A mixed strategy refers to a probability distribution over lems that are difficult to solve. the possible actions while a pure strategy corresponds to selecting In this work we focus on a specific class of Stackelberg games one of the possible actions. In this SSG, the defender mixed strate- which we refer to as Stackelberg Security Games (SSG) that arise gies are probability distributions over possible patrolling strate- in security domains and have a particular payoff structure (Yin, gies, while the attacker’s pure strategy corresponds to selecting a Korzhyk, Kiekintveld, Conitzer, & Tambe, 2010). In a SSG, the secu- specific target to attack. In addition, the number of actions of the rity (or defender) behaves as the leader selecting a patrolling strat- defender can be exponential in size, with respect to the targets egy first and then, possibly many attackers act as the follower, and defense resources, due to the combinatorics of using N observing the defender’s patrolling strategy and deciding where resources to patrol m targets. This illustrates that to solve SSGs we have to address mixed integer optimization problems with ⇑ exponential number of variables. Addressing the combinatorial Corresponding author. size of defender strategies has led to both development of branch E-mail addresses: [email protected] (F. Ordóñez), [email protected] (M. Labbé). and price methods (Kiekintveld et al., 2009) and constraint gener- http://dx.doi.org/10.1016/j.cie.2017.06.034 0360-8352/Ó 2017 Published by Elsevier Ltd. F. Lagos et al. / Computers & Industrial Engineering 111 (2017) 216–227 217 ation methods (Yang, Jiang, Tambe, & Ordóñez, 2013). There are, follower will choose an optimal response and will break ties in favor however, problem instances that arise from real security applica- of the leader. In other words, following the formal definition of a tions that still challenge existing solution methods. Here we inves- SSE in Kiekintveld et al. (2009), a pair of strategies x and h tigate a new branch and price method developed for a novel ðq ðxÞÞh2H form a SSE if they satisfy: formulation of Stackelberg games (MIPSG), introduced in Casorrán-Amilburu, Fortz, Labbé, and Ordóñez (2017). This new ; h P 1. The leader (defender) maximizes utility: uDðx ðq ðxÞÞh2HÞ formulation has been shown to provide tighter linear relaxations 0; h 0 0 uDðx ðq ðx ÞÞh2HÞ for any feasible x than other existing mixed integer formulations and to give the con- h ; h P 2. The followers (attackers) play a best response: uAðx q ðxÞÞ vex hull of the feasible integer solutions when there is only one h u ðx; gÞfor any feasible g. follower. A 3. The follower breaks ties in favor of the leader: We begin by introducing notation and describing the integer u ðx; ðqhðxÞÞ Þ P u ðx; ðqhÞ Þ for any ðqhÞ that is optimal optimization formulations that have been considered previously D h2H D h2H h2H for the followers, that is for any h; qh 2 argmax uh ðx; gÞ. in the next section. We also introduce the equivalent MIPSG for- g A mulation. In Section 3 we present the column generation algorithm for the solution of the linear relaxation of MIPSG, along with a This can be formulated as the following bilevel optimization speed up that can be obtained by aggregating subproblems, and problem, where e is the vector of all ones of appropriate the existence of upper and lower bounds at every iteration. We dimension: also describe the branching strategies used in adapting this column ; h max uDðx ðq Þh2HÞ generation to a Branch and Price method and how to apply dual s:t: eT x ¼ 1; x P 0 stabilization techniques. We present our preliminary computa- h h ; T ; P h H: tional results in Section 4 and provide concluding remarks in q ¼ argmaxgfuAðx gÞje g ¼ 1 g 0g 2 Section 5. Given that the inner optimization problem is a linear optimization problem over the jQj dimensional simplex, there always exists an 2. Integer optimization formulations of SSG optimal pure-strategy response for the attacker, so in the integer optimization formulations we present now we restrict our attention In a Stackelberg security game we consider that the leader is the to the set of pure strategies for the attacker. As we see below, the defender and the attacker (of possibly many types) is the follower. optimality condition of the inner optimization problem can be We let H be the set of possible attacker types and assume that ph expressed with linear constraints and integer variables when we corresponds to a known a priori probability distribution that the make use of the fact that the followers respond with an optimal defender is facing an attacker of type h 2 H. The attacker may pure strategy. Although this leads to being able to use efficient decide to attack any one of a set of targets Q. The mixed strategy mixed integer optimization solution procedures, the problem for the hth attacker is the vector of probabilities over this set of tar- remains theoretically difficult as the problem of choosing the opti- gets, which we denote as qh ¼ðqhÞ . The defender allocates up to mal strategy for the leader to commit to in a Bayesian Stackelberg j j2Q game is NP-hard (Conitzer & Sandholm, 2006). N resources to protect targets, with N < jQj. Each resource can be The payoffs for agents depend only on the target attacked, the assigned to a patrol that protects multiple targets, s # Q, so the # adversary type and whether or not a defender resource is covering set of feasible patrols for one resource is a set S PðQÞ, where h PðQÞ represents the power set of Q. The defender’s pure strategies, the target. Let the parameter Rdj denote the defender’s utility, or or joint patrols, are combinations of up to N such patrols, one for reward, if j 2 Q is attacked by adversary h 2 H when it is covered each available resource. In addition we assume that in a joint by a defender resource. If j 2 Q is not covered, the defender patrol a target is covered by at most one resource. Let X denote h receives a penalty Pdj . Likewise, the attacker’s utilities are denoted the set of joint patrols, or defender strategies. A joint patrol i 2 X, h by a reward Raj when target j is attacked and not covered and pen- jQj can be represented by the vector a ¼½a ; a ; ...; a 2f0; 1g h i i1 i2 ijQj alty Pa , when j is attacked while protected. Therefore if we let j 2 i where a represents whether or not target j is covered in strategy j ij denote when target j 2 Q is protected by patrol i 2 X, then we con- i. The defender’s mixed strategy x ¼ðx Þ specifies the probabili- i i2X sider the following reward structure ties of selecting each joint patrol i 2 X. ( ( Both the leader and followers aim to maximize a linear utility h h h Rdj j 2 i h Paj j 2 i function that averages the rewards of every combination of pure R ¼ C ¼ : ij h R ij h R h h Pdj j i Raj j i strategies weighted by the mixed strategies. If we let Rij and Cij denote the utility received by the defender (and the hth attacker) Alternatively the strategy i can be represented by a vector for having the defender conduct patrol i while the hth attacker jQj ai 2f0; 1g such that aij ¼ 1 when j 2 i or when j 2 ai. Using this strikes target j, then the defender and hth attacker utilities are vector ai we have given by XXX h h h h h h h h R a : R Pd a Rd Pd C a : C ; h h h ð ijÞ ¼ ij ¼ j þ ij j j ð ijÞ ¼ ij uDðx ðq Þh2HÞ¼ p xiqj Rij h2H i2X j2Q h h h : ¼ Raj aij Raj Paj XX h ; h h h : uAðx q Þ¼ xiqj Cij We assume adding coverage to target j 2 Q is strictly better for the i2X j2Q h > h h > h defender and worse for the attacker. That is Rdj Pdj and Raj Paj . The goal is to find the optimal mixed strategy for the leader, given Note that this does not necessarily mean zero-sum. the follower may know this mixed strategy when choosing its strat- egy. Stackelberg equilibria can be of two types: strong and weak, as 2.1. DOBBS and ERASER described by Breton, Alj, and Haurie (1988). We use the notion of Strong Stackelberg Equilibrium (SSE), in which the leader selects Efficient and compact techniques for choosing the optimal an optimal mixed strategy based on the assumption that the strategies for Bayesian Stackelberg games have been a topic of 218 F. Lagos et al. / Computers & Industrial Engineering 111 (2017) 216–227 active research from the work of Paruchuri, Pearce, Tambe, gap. A branch and price method for ERASER is introduced in Jain Ordóñez, and Kraus (2007), Paruchuri et al. (2008). In particular, et al. (2010) and is shown to be efficient in practice and able to the DOBBS problem formulation below, introduced in Paruchuri solve large SSG problems. This algorithm will be used as a compar- et al. (2008), allows for a Bayesian Stackelberg game to be ison for the decomposition algorithm presented in this work. expressed compactly as a single mixed integer optimization A Branch and Price method is based on using a column genera- problem. tion method to solve the LP relaxation. In this column generation XXX h h h for ERASER, the master would solve the problem considering only max p z R ij ij a few of the defender strategies X X, obtaining an optimal master i2X h2H j2Q XX primal and dual solutions. Then, the method tests whether a defen- h h ðDOBBSÞ zij ¼ 1 8 der strategy variable xi would enter the master problem by check- i2X j2Q ing if its reduced cost is positive. Given the reduced master optimal X dual variables indicated in (2)–(6), the reduced cost for strategy zh ¼ qh 8j; h ij j i 2 X, also represented by the vector v 2f0; 1gjQj, is as follows, i2X XX X X h h h h h h h h h 6 v 6 ; h ci ¼ cv ¼ R b þ C ða r Þd ð11Þ 0 Cij zik ð1 qj ÞM 8j ij j ij j j j Q h2H i2X k2Q X2 X h h h X ¼ R ðv Þb þ C ðv Þðah rhÞd ð12Þ h ; h j j j j j j j zij ¼ xi 8i j2Q h2H j2Q Using this reduced cost expression we can define the subprob- h ; ; ; h zij 2½0 1 8i j lem for the ERASER’s column generation. In this case, the subprob- h lem also includes resources and patrol constraints. The branch and q 2f0; 1g 8j; h j price framework is used for ERASER is the same that is used for the

xi 2½0; 1 8i MIPSG model that will be presented in the next section. Thus, the only difference between the two models implementation are the Algorithms for large-scale SSG, using branch and price and fast branch and price tree nodes. upper bound generation framework are introduced in Jain et al. (2010). That work builds these algorithms from a more compact 2.2. Strong integer optimization formulation representation of DOBBS, which has been named as ERASER (Effi- cient Randomized Allocation of Security Resources). This formula- A novel equivalent formulation of this problem, a variation on h tion does not use variable zij obtaining a formulation that uses less the DOBBS formulation, was introduced in Casorrán-Amilburu variables overall but uses two sets of big M constraints. In the ERA- et al. (2017). In contrast to ERASER, this model has tighter linear SER formulation below we present the notation for the dual vari- representation but requires more variables. Below we present this ables of constraints (2)–(6) in parenthesis. optimization problem, referred to as Model Integer Problem for X h h Security Games (MIPSG). max p d ð1Þ XXX h h2H h h X max p zijRij ð13Þ h h h 6 ; h i2X h2H j2Q ðERASERÞ d xiRij ð1 qj ÞM1 8j ð2Þ i X XX X2 h h ph h h h zij ¼ 1 8 ð Þð14Þ a xiC 6 ð1 q ÞM2 8j; h ð3Þ ij j i2X j2Q X i2X X h 6 h ; h h h ; h rh xiCij a 8j ð4Þ zij ¼ qj 8j ð j Þð15Þ i2X i2X X X h h xi ¼ 1 ð5Þ h P ; ; h ah ðCij CikÞzij 0 8j k ð jkÞð16Þ i2X X i2X qh ¼ 1 8h ð6Þ X j h ; h bh j2Q zij ¼ xi 8i ð i Þð17Þ h ; ; h j2Q qj 2f0 1g 8j ð7Þ zh 2½0; 1 8i; j; h ð18Þ xi P 0 8i ð8Þ ij h ; ; h The M1 and M2 values are important for the ERASER perfor- qj 2f0 1g 8j ð19Þ mance, since their value helps determine how tight the linear x 0; 1 i 20 relaxation is. Thus they must be chosen large enough so that the i 2½ 8 ð Þ constraint does not eliminate a feasible solution but as small as In the above description we also give the notation for the dual vari- possible to give the tightest linear relaxation. The values for M1 ables of the linear relaxation of MIPSG for each of the four sets of and M2 are as follows, constraints. This is indicated by the variable in parenthesis on each h h constraint. M1 ¼ maxRdj minPdj ð9Þ j;h j;h The MIPSG formulation is similar to the DOBSS formulation of h h Stackelberg games. The only difference between these formula- M2 ¼ maxRaj minPaj ð10Þ j;h j;h tions is in how they represent the optimal response of the follow- H These values of M guarantee the problem keeps its feasible region ers. In DOBBS this is done by two sets of jQjj j constraints, with a h unchanged. We will show this in the next section for similar con- big M constant, that define the value v as the optimal reward stants in problem MIPSG. value for follower h. In MIPSG the characterization of the optimal When solving these equivalent formulations, one observes that follower response is done with the jQj2jHj constraints in (16). This the ERASER linear optimization relaxation is easier to solve than means that MIPSG is a formulation with more constraints than DOBBS, as it has less variables, however it gives a larger integrality DOBBS, but that does not need a big M constant. F. Lagos et al. / Computers & Industrial Engineering 111 (2017) 216–227 219

Proposition 2.1. Problem MIPSG is equivalent to DOBBS. h R This means that variables zij and xi with i X are not considered in the master problem and assumed fixed at 0. After solving the Proof. Problem MIPSG and DOBBS are the same except for one reduced master problem, the method looks for profitable strategies constraint. While in MIPSG the solution ðz; x; qÞ satisfies in X n X. To identify a profitable strategy i 2 X we should look for a P h h h h variable z or x with positive reduced cost. From linear program- ðC C Þz P 0 8j; k; h in DOBBS the solution ðz; x; q; vÞ satis- ij i i2X ij ik Pij P h h h h h ming duality we have that a positive reduced cost corresponds to fies 0 6 v C z 6 ð1 q ÞM 8j; h.Ifq ¼ 1 then the i2X ij k2QPik j h h h a violated dual constraint. Indeed, the process of column genera- DOBBS solution satisfies k2Q zik ¼ zih and therefore X X X X X X tion in a problem is equivalent to generating the corresponding h h h h 6 vh 6 h h h h ; dual constraints in the dual problem (Bertsimas & Tsitsiklis, Cijzih ¼ Cij zik Cih zik ¼ Cihzih i2X i2X k2Q i2X k2Q i2X 1997). Therefore to identify which variables (and corresponding strategies i 2 X) to add to the master, our method requires we iden- which is equivalent to the MIPSG constraint. tify constraints, either (22) or (24), in this dual problem that are Let us now consider a solution for MIPSG. If qh ¼ 1 then let P P h not being satisfied at the current dual optimal solution. Once the vh : h h h h ¼ i2X Cihzih. Since now we also have k2Q zik ¼ zih we have new variables are added to the master, we re-optimize the master from the MIPSG constraint that problem until there are no violated dual constraints. X X X However, the generic column generation method described vh h h P h h : ¼ Cihzih Cij zik above cannot be implemented as written since computing the i2X i2X k2Q h reduced cost of variables zij or xi that have not been considered This satisfies the DOBBS constraints as the only tight right hand R bh in the master (that is with i X), we need the dual variable i . This h inequality is the one that defines v . h is the dual variable corresponding to constraint (17) that is not pre- R bh sent in the master problem if strategy i X and therefore i is not The results in Casorrán-Amilburu et al. (2017) show that a solu- defined. tion that is feasible for the linear relaxation of the MIPSG formula- We address this difficulty by introducing a related optimization tion is a feasible solution for the linear relaxation of the DOBBS problem, which is based on the dual problem of MIPSG. Recall that formulation. Furthermore, the linear relaxation of the MIPSG prob- given a vector v 2f0; 1gjQj that represents a joint patrolling lem equals the convex hull of the feasible integer solutions when h h h h h h strategy, we denote R v Pd v Rd Pd and C v Ra there is only one adversary. ð jÞ ¼ j þ jð j j Þ ð jÞ ¼ j v h h h The total amount of defender’ strategies increase exponentially jðRaj Paj Þ the utility of the defender and the -th attacker if with the number of targets and resources. Without additional fea- target j is attacked. Assume a set S of individual patrols is given ; sibility constraints, the size of the set of possible defender strate- and for r 2 S let tjr 2f0 1g indicate whether patrol r covers target Q j or not. Consider the following optimization problem (SUBP): gies equals . This leads to problems that are too big to solve X N h max f ð26Þ in a standard computer. It is therefore necessary to find a way to f ;v;e;u h2H generate only the strategies that are used by the model. X h 6 h v h ph h h v h v h ah ; h f p Rð jÞ þ M ð1 ej Þ ðCð jÞ Cð kÞ Þ jk 8j ð27Þ k2Q 3. Column generation for MIPSG X ur 6 N ð28Þ r2S A column generation method on MIPSG aims at solving the lin- X h h ear relaxation of the problem by gradually considering more vari- ej ¼ 1 8 ð29Þ ables associated to the large set of defender strategies. The linear j2Q X relaxation of MIPSG relaxes the integrality constraints and consid- tjr ur ¼ vj 8j ð30Þ h h ers variables that satisfy 0 6 z ; q 2 R and x 2 R. Note that since r2S P P ij j i h h h h z ¼ 1 we still have that z ; q ; xi 2½0; 1. Below we give v ; ; ; ; ; ; h; i2X j2Q ij ij j j 2f0 1g ej 2f0 1g ur 2f0 1g 8j r ð31Þ the dual problem of the linear relaxation of the MIPSG problem, v using the dual variables identified in the statement of the MIPSG The solution vector of this problem corresponds to a joint patrol problem: formed by selecting individual patrols from the set S. Let ur be a bin- X ary variable that is 1 if the schedule r 2 S is used in the strategy v ph min ð21Þ and 0 otherwise. These ur must sum up to N, which is the number i2H h X of available resources. Finally, the variable ej is a binary variable h h 6 ph rh bh h h ah ; ; h noP p Rij þ j þ i þ ðCij CikÞ jk 8i j ð22Þ h h h h h h h that enables f ¼ maxj2Q p RðvjÞ p ðCðv jÞ CðvkÞ Þa . k2Q k2Q jk h h rh ; h This requires the use of a large constant M . The constant M should j ¼ 0 8j ð23Þ X be large enough so that when eh 0 the constraint becomes redun- bh j ¼ i ¼ 0 8i ð24Þ h2H dant, at the same time it is desirable that it be the smallest constant h ah 6 ; ; h that achieves this. The following result gives the best value for M . jk 0 8j k ð25Þ P h h h In the LP relaxation of MIPSG the constraint i2X zij ¼ qj Proposition 3.1. For all h 2 H, the smallest value of M that becomes redundant as it defines the value of qh, but this variable h j guarantees constraints (27) are redundant with ej ¼ 0 is no longer has to be an integer variable. This fact is reflected in that rh h h h h the corresponding dual variable j has a value of zero. M ¼ p maxRdj minPdj j j We now outline the column generation procedure that we pro- pose for MIPSG. We begin by solving a version of the MIPSG prob- ah h h 2jQj min jk maxRaj minPaj ð32Þ jk j j lem in which only a set X X of defender strategies are considered. 220 F. Lagos et al. / Computers & Industrial Engineering 111 (2017) 216–227 n P P h h h h h h 6 Proof. Recall that f will equal minj2Q p RðvjÞ p Proposition 3.2. If f ¼ max h2Hf ¼ h2H max Fij 0 for a new P v h v h ah h strategy i in the subproblem, then there is no new column that must be k2Q ðCð jÞ Cð kÞ Þ jkg. Then considering ej ¼ 0 in constraint h included to the master problem. This problem does not need more (27), we have that an M that makes the constraint redundant columns to be solved optimally. has to satisfy

X h h h h h b h h h Proof. The first thing we should notice is the i values can take M þ p RðvjÞ p C C a ij ik jk arbitrary values because their primal constraint is always feasible. k2Q P () h X Indeed, jzij ¼ xi is true for all strategy in or out of the master P h v h ph h h ah ; h H: bh max p Rð jÞ Cij Cik jk 8j 2 Q 2 problem at any iteration. Hence, if we find some arbitrary i that j2Q k2Q satisfies the dual problem for a non positive reduced cost strategy

h i, then it is not necessary to include that strategy. The M that satisfies this for all j 2 Q satisfies ()In fact, we know that in the dual problem we have to satisfy: X h h h h h h h h h 6 b ; h v p a Fij i 8j ð33Þ M þ min p Rð jÞ Cij Cik jk X j2Q h k2Q b ()i ¼ 0 ð34Þ X h2H h h h P max phRðv Þ ph C C ah : j ij ik jk h j2Q We can take an arbitrary h 2 H and set b such that k2Q P i h h b ¼ max F , and for all remaining h 2 H nfhg set To ensure we satisfy the above inequality, let us rearrange and i h2Hnfhg ij bh h bound: i ¼ max Fij. ()()bh X X These i for strategy i satisfies: h v h h h ah h v h h h ah X max p Rð jÞ ðCij CikÞ jk min p Rð jÞ ðCij CikÞ jk h j j 6 k2Q k2Q f ¼ max Fij 0 X h2H 6 h h h h h ah p ðmaxRdj minPdj Þ2max ðCij CikÞ jk X j j j h h k2Q 6 max Fij þ max Fij 0 no h2Hnfhg 6 h h h h ah h ah p ðmaxRdj minPdj Þ2jQjmaxRaj min jk þ 2jQjmaxmax Cik jk j j j jk j k h 6 bh max Fij i 6 h h h ah h h p ðmaxRdj minPdj Þ2jQj min jk ðmaxRaj minPaj Þ j j jk j j Using this last inequality it is easy to verify that for all j; h, the h h ¼ M values we have set for b meets the first set of constraints in (33) P i bh 6 and also h2H i ¼ 0. Therefore, when f 0 we have the conditions h necessary to finish the column generation. h

We now show that the optimal solution to SUBP either becomes the joint patrol that should be added to the master problem or it 3.1. Upper and lower bounds proves that the column generation found the optimal solution. The optimal solution to SUBP identifies a joint patrolling strategy Good upper and lower bounds can help speed up the column P h generation and branching process. Moreover, if we set optimality v. If the objective function f ¼ f of this solution is positive h2H tolerances, then having tight gaps lead to faster running times. then that strategy violates a constraint in the dual and must be We therefore are interested in being able to bound well the dis- incorporated into the master problem. To prove this define h h P h h tance between the optimal and the current solution. F ¼ phR ph ðC C Þah . ij ij k2Q ij ik jk P P Let L be the Lagrangian relaxation of MIPSG obtained by relax- h h It is easy to check that if f ¼ max hf ¼ max hFj is greater ing the adversaries best response constraint with a Lagrangian ah a than zero, then we can not satisfy the dual. In fact, multiplier of jk. This relaxation is therefore a function of and will X be updated every step, providing an upper bound for our problem. h h 6 ph rh bh h h ah p Rij þ j þ i þ ðCij CikÞ jk Next, we could write this function as follows, k2Q ! X XXX X h h ph rh h h ah 6 bh a h h h h ah h p Rij j ðCij CikÞ jk i Lð Þ¼max p R ðC C Þ z z;x ij ij ik jk ij k2Q i2X h2H j2Q k2Q ! XX X X X h h h h z ¼ 1 8h phR ph rh C C ah 6 bh ij ij j ð ij ikÞ jk i i2X j2Q h k2Q h X X X h ; h h h z ¼ xi 8i f F 6 b 0 ij ¼ j i ¼ j2Q h h zh 0; 1 i; j; h 6 ij 2½ 8 f 0 ! X X rh h h h h h In this set of equations we are using that j ¼ 0, from the dual 6 a max p Rij ðCij CikÞ jk h i2X;j2Q Eq. (23), when q 2 R, i.e., when there are no integer conditions. h2H k2Q j X X We show the condition we need in order to determinate ph h ¼ þ max Fij whether we can terminate the column generation or not. This con- j2Q;i2X h2H h2H dition is sufficient to guarantee optimality. F. Lagos et al. / Computers & Industrial Engineering 111 (2017) 216–227 221

The inequality in the second line is because we removed the classic large problems, such as Machine Scheduling, Bin Packing constraints that involved the xi variables. This further relaxed and Capacitated Vehicle Routing, reducing solution time by a factor a problem gives a value greater than the Lð Þ. In thisP way, we know of up to 5. They also develop a smoothing technique used for a in every iteration the optimal value is greater than ph and less hybridization of column generation with sub-gradient method. P P h2H ph h The smoothing technique in its simplest version is as follows. Let than h2H þ h2Hmaxj2Q;i2X Fij, therefore, the gap is P t h y be the dual solution at iteration t P 2 and 0 6 a 6 1bea f ¼ h Hmaxj Q;i X F , the objective function of the subproblem. 2 2 2 ij weighting parameter, then the dual y~t for the pricing problem for 6 This is further proof that when f 0 we have found the optimal the next iteration is solution using column generation. ~t ^t t So far, we have described how to identify new columns in our y ¼ ay þð1 aÞy : ð35Þ problem when the integrality constraints of the primal are relaxed, Here y^ is the dual associated to the best (min/max) Lagrangian dual h R i.e., when qj 2 . In the next subsection we discuss how to gener- solution so far. At each iteration the dual values are adjusted using ate columns (how to conduct pricing) when branching starts. as reference the best dual solution values. This gives some stability to the dual variables considered, preventing these dual variables 3.2. Branching scheme from changing radically from one iteration to the next. In this simple smoothing scheme we can face three situations: When the variable q is relaxed, we solve the master problem (i) updated duals give us a positive reduced cost column; (ii) we and the subproblem until the optimal solution is reached. How- get a new dual bound and improve the optimality gap; or (iii) a ever, the q variable must be integer for the general case, so we mis-pricing occurs and the next iteration smoothed prices get clo- implement a standard branch and price scheme. At every node ser to yt. A mis-pricing is when the subproblem finds a solution we solve the relaxed problem using column generation and then, with non-positive reduced cost with y^t, but that has positive if any of the integer variables is fractional, we branch on it. reduced costs if we use yt. Under these conditions, the column gen- In the dual MIPSG model presented in (21)–(25), there is a dual eration method with smoothing pricing approach converges to an rh h variable j related to the constraint that defines the qj primal vari- optimal solution in a finite number of iterations (Pessoa et al., able. If this primal variable is relaxed, the dual variable is equal to 2013). h a zero. However, when we branch on qj , then the dual variable is no A fixed gives a smoothing scheme that convergences after longer zero and it should be included in the subproblem. Hence, as some iterations. A better approach considers an auto-adaptive a, rh which increases and decreases as upper-lower bound gap changes. we branch in the branch and price tree, some of these j become active, changing the subproblem. The termination condition for In Pessoa et al. (2013) they propose an algorithm for this adaptive method, which is based on a sub-gradient information and a mis- the column generation is f 6 0. This condition remains a valid ter- pricing sequence for a given initial a. mination condition for the column generation in branched nodes, h but the non-zero r variables modify the value of f . j Algorithm 1. Sub-gradient routine. The strategy we follow to implement the branch and price is going through the tree following a depth-first search procedure, instead of a breadth-first search along nodes. In other words, we quickly find integer solutions, lower bounds of the problem and then check other branches. We also identify the variables those values are close to 0.5 in the first place, because they might be more decisive in the objective function.

3.3. Column generation stabilization by dual price smoothing

Column generation are iterative methods that can run into con- vergence problems when used to solve linear optimization prob- lem. Vanderbeck (2005) listed some of the typical issues that arise when implementing column generation methods due to the dual variables. Among the problems they have detected are: (i) slow convergence, a phenomenon called the tailing-off effect; (ii) irrelevant columns generated in first iterations; (iii) restricted master solution value keeps constant for several iterations; (iv) dual values that change considerably from one iteration to another; (v) Lagrangian dual bounds do not convergence monotonically. Some techniques have been developed in order to deal with these undesirable converging behavior. Lübbecke and Desrosiers (2005) described the three important methods: Weighted Danzing-Wolfe decomposition, method and Stabiliza- tion approach using primal and dual strategies. We will use the third method because it has shown a good performance solving classical problems and it is easy to implement (Pessoa, Sadykov, Uchoa, & Vanderbeck, 2013). A detailed description and analysis of a Stabilization approach using a smoothing strategy is given in Pessoa et al. (2013). That work also shows this algorithm improves the runtime for solving 222 F. Lagos et al. / Computers & Industrial Engineering 111 (2017) 216–227

Algorithm 2. Mis-pricing sequence. Algorithm 3. Column Generation Greedy.

In Algorithm 1, we use functions for increasing and decreasing a. These functions are as follows,

a a a : f incð tÞ¼ t þð1 tÞ0 1 ð36Þ ( a t if a 2 ½Þ0:5; 1 Algorithm 4. Greedy subroutine. a 1:1 t f decð tÞ¼ ð37Þ maxf0; at ð1 atÞ0:1g otherwise

The vector gt is the sub-gradient for a given dual yt ¼½p; a solution. This vector is computed as follows, () X P t t h h ph v h v h ah g y ¼ M ð1 ej Þ Cð jÞ Cð kÞ jk ð38Þ h2H k2Q

h v t where the values of ej and j for g are those we find through the subproblem at iteration t.

3.4. for subproblem

On one hand, the master problem solution for the leader corre- sponds to the best mixed strategy using available schedules. On the other hand, the subproblem finds the best schedule to include for a new column using the dual values from master problem. Since the number of resources is limited, it seems natural to solve this sub- problem with a greedy heuristic. Those targets with the highest value (from duals) for the objective function should probably be picked for the new column. Algorithm 4 describes this greedy heuristic in detail. We use this algorithm as an additional speed up subroutine for the column generation. First, we solve the Greedy Algorithm, if the column we find has a positive reduced cost, then we add it to Mas- ter problem. If not, then we try with SUBP described in Section 3, this optimization problem must be used for checking optimality at the last step. In the Greedy algorithm, we try all the schedules over set of 4. Computational results feasible schedules S and we keep in the new strategy only those with the highest reduced cost. We repeat this process until no We randomly generate a set of instances to be solved for each more resources can be assigned. Finally, we return the best solution method. The base algorithms considered are the branch strategy and its reduced cost. The algorithm we implement is and price methods for the MIPSG and the ERASER formulations Algorithm 3. of the problem, we refer to these solution algorithms as MIPSG-C F. Lagos et al. / Computers & Industrial Engineering 111 (2017) 216–227 223 and ERASER-C, respectively. The ERASER-C algorithm is the state of Table 2 the art benchmark from prior work (Jain et al., 2010). In addition Computational results summary. we solve each instance using the greedy subroutine and the stabi- Algorithm Problems solved (%) Nodes lization approach presented above to attempt to speed up the col- MIPS G-C 87.6 1 umn generation step when solving the MIPSG formulation. We ERASER-C 94.2 138 refer to these as GREEDY and STAB, respectively. In summary we GREEDY 85.4 1 present computational results to compare four solution methods: STAB 86.4 1 ERASER-C, MIPSG-C, GREEDY and STAB over randomly generated instances. We generate random instances by sampling rewards and penal- parameters are fixed at the base case: 70 targets, 600 feasible ties for the leader and the attacker and also by generating a ran- schedules and 5 targets covered by schedule. We present results dom set of initial patrols or schedules for each instance. We for 2, 4, 6, 8 and 10 resources. generate reward values using a log-normal distribution because The average number of columns that are generated are compa- this guarantees that the reward is positive. Furthermore the log- rable for all algorithms and varies with the number of resources. normal distribution depends on two parameters l and r that can The results suggest that instances with a large number of resources be used to adjust the distribution. If X Log-normalðl; rÞ then require less columns and are thus easier to solve. ERASER-C does l r X ¼ e þ Z with Z a standard normal distribution. We set very well with few resources as well. Finally, STAB reduces the l ¼ 3:107304 and r ¼ 1:268636 so that the coefficient of variation number of columns generated by MIPSG-C for all cases with pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi r2 lþr2=2 resources greater than 4. GREEDY does not seem to reduce the CV X ¼ e 1 ¼ 2 and the mean is EðXÞ¼e ¼ 50. Penalties number of columns generated. are generated in a similar way but with a negative log-normal. This In Figs. 1–3 we have plotted the average solution time for each guarantees that the penalty is always less than the reward for algorithm. When a method is not able to solve an instance within every attacker and defender. The set of available schedules is sam- the given time limit, we consider the maximum time of 2 h. In each pled from a discrete random variable in a way we do not have plot we present sensibility with respect to one parameter, keeping repeated schedules. We have seen that a coefficient of variation the rest in the base case. In Fig. 1 we see the solution times as a of 2 corresponds to a large input variability. function of the number of resources; in Fig. 2 as a function of the Our instances consider 1 adversary, and as a base case 70 differ- number of schedules; finally in Fig. 3 as a function of the number ent zones or targets, 5 police resources to be allocated, 600 individ- of targets. The running time increases more for ERASER-C than ual schedules that each police resource can choose from with each for the algorithms that tackle the MIPSG formulation. For large of these schedules covering 5 targets. We conduct sensitivity anal- number of resources (8 and 10) the runtime for MIPSG-C and ysis on these problem parameters by varying them as indicated in GREEDY is smaller than when there is less resources. The increase Table 1, base case is indicated by bold: in number of resources does create a significant difference For each combination of the first three problem parameters we between MIPSG-C and the speed-up methods, showing comparable generate 25 random problem instances also selecting randomly the runtimes regardless of the number of resources. The runtimes of value of targets per schedule. We vary one of the first three param- MIPSG-C is close to 5 times smaller than the runtimes for eters at a time and remove repeated instances. This gives a total of ERASER-C when there are many resources. Fig. 2 shows that the 500 problem instances that are solved by the four solution algo- algorithms that tackle the MIPSG formulation increase slightly as rithms. To solve each instance we use CPLEX 12.4 with a runtime the number of schedules that form the possible joint patrols limit set to 2 h. increases. ERASER-C is more sensitive showing a significant increase as the number of schedules goes from 600 to 1000. For 4.1. Algorithm comparison the instances with schedules from 600 to 1000 ERASER-C has more than 50% runtime than MIPSG-C. The speedup alternatives The first thing to note is that no algorithm is able to solve all of (GREEDY and STAB) give comparable runtimes to MIPSG-C. Finally, the 500 random problem instances in the 2 h time limit. In Table 2 in Fig. 3 we see that all four solution algorithms increase their run- we present the total percentage of problems solved and the num- time as the number of targets increase. In particular the runtimes ber of nodes in the branch and price tree used for each of the four increase significantly when the targets go from 70 to 80. Again solution algorithms considered MIPSG-C, ERASER-C, GREEDY and MIPS-G shows a slightly better runtime than ERASER-C and a com- STAB. Methods that use MIPSG as base model only need one node parable performance when compared to GREEDY and STAB. in B&P tree because the linear relaxation of this problem with one adversary gives the integer solution. ERASER needs to branch more 4.2. MIPSG column generation results because the big M formulation gives a larger integrality gap. Over- all the ERASER-C algorithm is able to solve the most instances, Fig. 4 shows how the number of targets affects the runtime for which suggest that it is more efficient. In what follows we present MIPSG-C. We have made this analysis showing separately the detailed results to understand for which problem parameters one result for each number of targets per schedule (T/S). The other algorithm is preferable over the other. parameters, schedules and resources, are the same as base case. The number of columns generated in the B&P method indicates From this plot, first we see that as we increase the number of tar- the amount of work that is needed to solve a problem. In Table 3 gets, the running time increases for all target to schedule values. we show the average number of columns generated by each algo- Note that this relation is not linear, which is consistent with the rithm for different number of resources. The remaining problem combinatorial nature of the problem. An instance with too many targets is hard to solve even when we have a limited number of Table 1 schedules and resources, in the base case set to 600 and 5, Problem parameters considered in computational results. Base case is in bold. respectively. We can also see that the T/S impacts the solution Number of targets 50, 60, 70, 80, 90, 100 time, a 2 T/S instance is easier to solve than a 5 T/S instance. We Defender resources 1, 2, 3, 4, 5,6,7,8,9,10 can see a direct relation between T/S and runtime. An explanation Number of schedules 200, 400, 600, 800, 1000 for this is that the number of T/S of an instance is related to how Targets covered per schedule 2, 3, 4, 5 flexible it is to cover targets. The case with 2 T/S assigns patrols 224 F. Lagos et al. / Computers & Industrial Engineering 111 (2017) 216–227

Table 3 Average number of generated columns. Targets 70, schedules 600, targets/schedule 5.

Resources MIPSG-C ERASER-C GREEDY STAB 2 484 115 578 487 4 482 401 367 406 6 734 886 639 680 8 158 388 209 120 10 115 140 118 54

Fig. 1. Running time versus number of resources.

Fig. 2. Running time versus number of schedules.

Fig. 3. Running time versus number of targets. to resources that more easily can be combined to protect targets of seem to be faster to solve than the values in the middle, for exam- interest. ple 5 or 6. Finally, in Fig. 6 we have runtime versus number of In Fig. 5 we show the average runtime for each resource number available schedules and T/S, there is no clear dependency of the and T/S value. The number of resources represents how many runtime on the number of schedules, as in all the plots that sepa- schedules can be in the same strategy. It is not clear how the num- rate results for each T/S value se see that there is a tendency to ber of resources impacts on time, but the extreme values, 1 and 10, increase solution time as the number of T/S increases. F. Lagos et al. / Computers & Industrial Engineering 111 (2017) 216–227 225

Fig. 4. Running time versus number of targets and T/S for MIPSG-C.

Fig. 5. Running time versus number of resources and T/S for MIPSG-C.

Fig. 6. Running time versus number of schedules and T/S for MIPSG-C.

4.3. MIPSG stabilization translated to a significant runtime reduction as can be seen from the aggregate results presented earlier. Further work is needed to For MIPSG model it is critical to find useful speedup strategies check whether other stabilization methods could lead to better for the column generation to be able to reduce the overall runtime. performance. When dual values vary substantially from one iteration to the next, we probably generate irrelevant columns as Vanderbeck has estab- lished (Vanderbeck, 2005). The dual stabilization method we 5. Conclusions implemented addresses this issue directly. Fig. 7 shows the upper and lower bounds for an instance, where we observe that STAB, the In this paper we have introduced a column generation method stabilization procedure that smoothes the dual variables, has a to solve a novel mixed integer formulation of Stackelberg Security more monotonic behavior. In this case, we not only have a Games. Decomposition methods are key to be able to solve ever smoother curve but also the method finishes in fewer iterations, larger problem instances and strong formulations, such as MIPSG, about 45 versus 50. In addition, note that the lower bound is provide good opportunities to develop efficient algorithms. That affected, the STAB bound reaches a higher value faster than said, even if the linear relaxation problems of MIPSG yield better MIPSG-C bound. The graph however shows that there still is some integrality gaps than other existing formulations, the relaxations variability and the rate of convergence, although smaller is compa- of MIPSG turn out to be challenging to solve due to the large num- rable to the un-stabilized case. This improved performance has not ber of variables in the formulation. 226 F. Lagos et al. / Computers & Industrial Engineering 111 (2017) 216–227

Fig. 7. Example of stabilization procedure (STAB) on convergence of column generation.

A standard column generation model would not work for this Acknowledgement MIPSG problem, as the reduced cost of candidate variables depend on undefined dual variables. We circumvent this challenge by for- Research was supported by CONICYT through Fondecyt grant mulating a related problem to replace the standard column gener- 1140807 and the Complex Engineering Systems Institute, ISCI ation subproblem, and use this to generate candidate joint (CONICYT: FB0816). The research of third author has been sup- strategies and to detect optimality. We are able to obtain upper ported by the Interuniversity Attraction Poles Programme P7/36 and lower bounds at every iteration. Furthermore we explore ‘‘COMEX: combinatorial optimization & exact methods to speed up the column generation approach, including methods” of the Belgian Science Policy Office. greedy heuristics to solve the subproblem SUBP, or dual smoothing techniques. Problem structure in SSG can be exploited to create a polyno- References mial mixed integer formulation for SSG when defender strategies Bertsimas, D., & Tsitsiklis, J. (1997). Introduction to linear optimization. Athena consist of all patrols that deploy N resources on jQj targets. The Scientific. approach is based on a formulation of the frequency with which Breton, M., Alj, A., & Haurie, A. (1988). Sequential Stackelberg equilibria in two- each target is protected and the fact that there are no patrol feasi- person games. Journal of Optimization Theory and Applications, 59(1), 71–97. Casorrán-Amilburu, C., Fortz, B., Labbé, M., & Ordóñez, F. (2017). A study of general bility constraints allows to obtain an implementable solution strat- and security Stackelberg game formulations. Working paper. Département egy from this optimal frequency, (Kiekintveld et al., 2009). In the d’Informatique, Université Libre de Bruxelles. game considered in this work however, joint strategies are con- Conitzer, V., & Sandholm, T. (2006). Computing the optimal strategy to commit to. structed from a given set of fixed patrolling alternatives. This In: Proc. of the 7th ACM conference on electronic commerce (pp. 82–90). Hochbaum, D. S., Lyu, C., & Ordóñez, F. (2014). Security routing games with makes it impossible to use the polynomial reformulation forcing multivehicle chinese postman problem. Networks, 64(3), 181–191. the use of exact decomposition algorithms. Jain, M., Kardes, E., Kiekintveld, C., Ordóñez, F., & Tambe, M. (2010). Security games Our preliminary computational results evaluate the efficiency with arbitrary schedules: A branch and price approach. In: Proc. of the 24th AAAI conference on artificial intelligence (pp. 792–797). of the column generation method presented, as well as the greedy Jain, M., Kiekintveld, C., & Tambe, M. (2011). Quality-bounded solutions for finite heuristic and the stabilization by dual price smoothing. Further- bayesian Stackelberg games: Scaling up. In: Proc. of the 10th international more we contrast these results with ERASER, a state of the art conference on autonomous agents and multiagent systems (AAMAS). Jain, M., Korzhyk, D., Vanek, O., Pechoucek, M., Conitzer, V., & Tambe, M. (2011). A column generation method, built on a different mixed integer for- double oracle algorithm for zero-sum security games on graphs. In: Proc. of the mulation. We note that ERASER branches more than the branch 10th international conference on autonomous agents and multiagent systems and price method proposed for MIPSG, ERASER also generates (AAMAS). Jain, M., Tsai, J., Pita, J., Kiekintveld, C., Rathi, S., Tambe, M., et al. (2010). Software more columns as the number of resources increases. However assistants for randomized patrol planning for the LAX airport police and the since each linear relaxation for ERASER is easier to solve, it federal air marshal service. Interfaces, 40, 267–290. remains a competitive solution alternative. Our computational Kiekintveld, C., Jain, M., Tsai, J., Pita, J., Tambe, M., & Ordóñez, F. (2009). Computing optimal randomized resource allocations for massive security games. In: Proc. of results show that MIPSG-C exhibits a significantly smaller run- the 8th international conference on autonomous agents and multiagent systems time when either the number of resources or the number of (AAMAS) (pp. 689–696). schedules increases. Labbé, M., Marcotte, P., & Savard, G. (1998). A bilevel model of taxation and its We note that there remains a challenge on how to solve the lin- application to optimal highway pricing. Management Science, 44(12-part 1), 1608–1622. ear relaxations for the MIPSG problem faster. Our computational Lübbecke, M. E., & Desrosiers, J. (2005). Selected topics in column generation. results show slight improvements from using both the Greedy Operations Research, 53(6), 1007–1023. heuristic and the stabilization procedure, but further work is nec- Paruchuri, P., Pearce, J. P., Tambe, M., Ordóñez, F., & Kraus, S. (2007). An efficient heuristic approach for security against multiple adversaries. In: Proc. of the 6th essary to make these speedup methods beneficial overall. Being international conference on autonomous agents and multiagent systems (AAMAS) able to more efficiently solve the linear relaxations of MIPSG can (pp. 311–318). help construct an efficient large scale algorithm for Stackelberg Paruchuri, P., Pearce, J. P., Marecki, J., Tambe, M., Ordóñez, F., & Kraus, S. (2008). Playing games with security: An efficient exact algorithm for Bayesian games as a column generation on MIPSG solves far fewer linear Stackelberg games. In: Proc. of the 7th international conference on autonomous relaxation problem. agents and multiagent systems (AAMAS) (pp. 895–902). F. Lagos et al. / Computers & Industrial Engineering 111 (2017) 216–227 227

Pessoa, A., Sadykov, R., Uchoa, E., & Vanderbeck, F. (2013). In-out separation and Vanderbeck, F. (2005). Implementing mixed integer column generation. In Column column generation stabilization by dual price smoothing. In Experimental generation (pp. 331–358). Springer. algorithms (pp. 354–365). Springer. von Stackelberg, H., Bazin, D., Hill, R., & Urch, L. (2010). Market structure and Pita, J., Tambe, M., Kiekintveld, C., Cullen, S., & Steigerwald, E. (2011). GUARDS – equilibrium. Berlin Heidelberg: Springer. game theoretic security allocation on a national scale. In: Proc. of the 10th Yang, R., Jiang, A.X., Tambe, M., & Ordóñez, F. (2013). Scaling-up security games international conference on autonomous agents and multiagent systems (AAMAS) with boundedly rational adversaries: A cutting-plane approach. In: Proc. of the (pp. 37–44). 24nd international joint conference on artificial intelligence (IJCAI). Shieh, E., An, B., Yang, R., Tambe, M., Baldwin, C., DiRenzo, J., ..., & Meyer, G. (2012). Yin, Z., Korzhyk, D., Kiekintveld, C., Conitzer, V., & Tambe, M. (2010). Stackelberg vs. PROTECT: A deployed game theoretic system to protect the ports of the United Nash in security games: Interchangeability, equivalence, and uniqueness. In: States. In: Proc. of the 11th international conference on autonomous agents and Proc. of the 9th international conference on autonomous agents and multiagent multiagent systems (AAMAS). systems (AAMAS) (pp. 1139–1146).