Int. J. Appl. Math. Comput. Sci., 2015, Vol. 25, No. 2, 337–351 DOI: 10.1515/amcs-2015-0026
COMPUTING THE STACKELBERG/NASH EQUILIBRIA USING THE EXTRAPROXIMAL METHOD: CONVERGENCE ANALYSIS AND IMPLEMENTATION DETAILS FOR MARKOV CHAINS GAMES
KRISTAL K. TREJO^{a}, JULIO B. CLEMPNER^{b}, ALEXANDER S. POZNYAK^{a,∗}
^{a} Department of Automatic Control, Center for Research and Advanced Studies, Av. IPN 2508, Col. San Pedro Zacatenco, 07360 Mexico City, Mexico, e-mail: {apoznyak,ktrejo}@ctrl.cinvestav.mx

^{b} Center for Economics, Management and Social Research, National Polytechnic Institute, Lauro Aguirre 120, Col. Agricultura, Del. Miguel Hidalgo, 11360 Mexico City, Mexico, e-mail: [email protected]
In this paper we present the extraproximal method for computing the Stackelberg/Nash equilibria in a class of ergodic controlled finite Markov chains games. We exemplify the original game formulation in terms of coupled nonlinear programming problems implementing the Lagrange principle. In addition, Tikhonov's regularization method is employed to ensure the convergence of the cost-functions to a Stackelberg/Nash equilibrium point. Then, we transform the problem into a system of equations in the proximal format. We present a two-step iterated procedure for solving the extraproximal method: (a) the first step (the extra-proximal step) consists of a "prediction" which calculates the preliminary position approximation to the equilibrium point, and (b) the second step is designed to find a "basic adjustment" of the previous prediction. The procedure is called the "extraproximal method" because of the use of an extrapolation. Each equation in this system is an optimization problem for which the necessary and sufficient condition for a minimum is solved using a quadratic programming method. This solution approach provides a drastically quicker rate of convergence to the equilibrium point. We present the analysis of the convergence as well as the rate of convergence of the method, which is one of the main results of this paper. Additionally, the extraproximal method is developed in terms of Markov chains for Stackelberg games. Our goal is to analyze completely a three-player Stackelberg game consisting of a leader and two followers. We provide all the details needed to implement the extraproximal method in an efficient and numerically stable way. For instance, a numerical technique is presented for computing the first step parameter (λ) of the extraproximal method. The usefulness of the approach is successfully demonstrated by a numerical example related to a pricing oligopoly model for airline companies.
Keywords: extraproximal method, Stackelberg games, convergence analysis, Markov chains, implementation.
∗ Corresponding author

1. Introduction

The extraproximal approach (Antipin, 2005) can be considered a natural extension of the proximal and gradient optimization methods used for solving the difficult problems of finding an equilibrium point in game theory. The simplest and most natural approach to implement the proximal method is to use a simple iteration by omitting the prediction step. However, as shown by Antipin (2005), this approach fails. A more versatile procedure would be to perform an "extraproximal step", i.e., gathering certain information on the future development of the process. Using this information, it is possible to execute the "principle step" based on the preliminary position. It seems more natural to call this procedure an "extraproximal method" (because of the use of extrapolation (Antipin, 2005)). Along with the extra-gradient technique (Poznyak, 2009), the extraproximal method can be considered a natural extension of the proximal and gradient optimization approaches to resolve more complicated game problems, such as the Stackelberg–Nash game.

Our goal is to analyze a three-player Stackelberg game (von Stackelberg, 1934), a leader and two followers, for a class of ergodic controllable finite Markov chains (Poznyak et al., 2000) using the extraproximal method (Antipin, 2005). The leader commits first to a strategy $x$ in the Markov chains game. Then, the followers realize a Nash solution (Clempner and Poznyak, 2011), obtaining $\varphi_l(\cdot,\cdot|x)$ ($l=1,2$) as the cost functions. The optimal actions $x$ of the leader are then chosen to minimize its payoff $\varphi_0(\cdot,\cdot|x)$. The main concern about Stackelberg games is as follows: the highest leader payoff is obtained when the followers always reply in the best possible way for the leader (this payoff is at least as high as any Nash payoff). For ground-breaking works in the Stackelberg game, see those by Bos (1986; 1991), Harris and Wiens (1980), Merril and Schneider (1966) or von Stackelberg (1934). Surveys can be found in the works of Breitmoser (2012), De Fraja and Delbono (1990), Nett (1993) or Vickers and Yarrow (1998).

In this paper we present the extraproximal method for computing the Stackelberg/Nash equilibria in a class of ergodic controlled finite Markov chains games. We exemplify the original game formulation in terms of coupled nonlinear programming problems implementing the Lagrange principle. In addition, Tikhonov's regularization method is employed to ensure the convergence of the cost-functions to a Stackelberg/Nash equilibrium point. The Tikhonov regularization is one of the most popular approaches to solving discrete ill-posed problems, represented here in the form of a function that is not necessarily strictly convex. Then, we transform the problem to a system of equations in the proximal format.

We present a two-step iterated procedure for solving the extraproximal method: (a) the first step (the extra-proximal step) consists of a "prediction" which calculates the preliminary position approximation to the equilibrium point, and (b) the second step is designed to find a "basic adjustment" of the previous prediction. Each equation in this system is an optimization problem for which the necessary condition for a minimum is solved using a quadratic programming method, instead of the iterating projectional gradient method given by Moya and Poznyak (2009). Iterating these steps, we obtain a new quick procedure which leads to a simple and logically justified computational realization: at each iteration of both sub-steps of the procedure, the functional of the game decreases and finally converges to a Stackelberg/Nash equilibrium point (Trejo et al., 2015).

We present the analysis of the convergence as well as the rate of convergence of the method. Additionally, the extraproximal method is developed in terms of Markov chains for Stackelberg games consisting of a leader and two followers. We provide all the details needed to implement the extraproximal method in an efficient and numerically stable way. For instance, a numerical method is presented for computing the first step parameter ($\lambda$) of the extraproximal method. The usefulness of the method is successfully demonstrated by a numerical example related to a pricing oligopoly model for airline companies that choose to offer limited airline seats for specific routes.

The paper is organized as follows. The next section presents the preliminaries needed to understand the rest of the paper. The formulation of the problem for the $N$-player game and the general format iterative version of the extraproximal method is presented in Section 3. A three-player Stackelberg game for a class of ergodic controllable finite Markov chains using the extraproximal method is given in Section 4. Section 5 includes the analysis of the convergence of the extraproximal method and proves the convergence rates, which is one of the main results of this paper. A numerical example related to a pricing oligopoly model for airline companies validates the proposed extraproximal method in Section 6. Final comments are outlined in Section 7.

2. Controllable Markov chains

A controllable Markov chain (Clempner and Poznyak, 2014; Poznyak et al., 2000) is a quadruple $MC=\{S,A,\Upsilon,\Pi\}$, where $S$ is a finite set of states, $S\subset\mathbb N$, endowed with the discrete topology; $A$ is the set of actions, which is a metric space. For each $s\in S$, $A(s)\subset A$ is the non-empty set of admissible actions at state $s\in S$. Without loss of generality we may take $A=\cup_{s\in S}A(s)$. $\Upsilon=\{(s,a)\,|\,s\in S,\ a\in A(s)\}$ is the set of admissible state-action pairs, which is a measurable subset of $S\times A$. $\Pi(k)=\big[\pi_{(i,j|k)}\big]$ is a stationary transition controlled matrix, where
$$
\pi_{(i,j|k)}\equiv P\big\{s(n+1)=s(j)\,\big|\,s(n)=s(i),\ a(n)=a(k)\big\},
$$
representing the probability associated with the transition from state $s(i)$ to state $s(j)$ under an action $a(k)\in A(s(i))$ ($k=1,\dots,M$) at time $n=0,1,\dots$ .

The dynamics of the game for Markov chains are described as follows. The game consists of $N+1$ players (denoted by $l=\overline{0,N}$) and begins at the initial state $s^l(0)$ which (as well as the states further realized by the process) is assumed to be completely measurable. Each player $l$ is allowed to randomize, with distribution $d^l_{(k|i)}(n)$, over the pure action choices $a^l_{(k)}\in A^l(s(i))$, $i=\overline{1,N_l}$ and $k=\overline{1,M_l}$. The leader corresponds to $l=0$ and the followers to $l=\overline{1,N}$. At each fixed strategy of the leader $d^0_{(k|i)}(n)$, the followers make the strategy selection trying to realize a Nash equilibrium. Below we will consider only stationary strategies $d^l_{(k|i)}(n)=d^l_{(k|i)}$.
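To make this setting concrete, a controllable Markov chain under a stationary randomized strategy can be simulated directly. The following sketch uses hypothetical transition values and a hypothetical strategy (not from the paper), chosen only so that the chain is ergodic:

```python
import numpy as np

# Hypothetical controllable Markov chain: 2 states, 2 actions.
# pi[k, i, j] is the transition probability pi(i, j | k) from state i
# to state j when action k is applied.
pi = np.array([[[0.9, 0.1],
                [0.2, 0.8]],     # action k = 0
               [[0.4, 0.6],
                [0.7, 0.3]]])    # action k = 1

# A stationary randomized strategy d[k, i] = d(k|i); each column sums to 1.
d = np.array([[0.5, 0.3],
              [0.5, 0.7]])

# Transition matrix induced by the mixed strategy:
# P[i, j] = sum_k d(k|i) * pi(i, j | k).
P = np.einsum('ki,kij->ij', d, pi)

# Ergodicity: the state distribution converges exponentially quickly to
# the unique fixed point p = p P; plain power iteration recovers it.
p = np.full(2, 0.5)
for _ in range(100):
    p = p @ P
print(P)   # row-stochastic matrix induced by the strategy
print(p)   # stationary state distribution
```

Changing the strategy `d` changes `P` and hence the stationary distribution, which is exactly the lever each player controls in the game below.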
These choices induce the state distribution dynamics
$$
P\big\{s^l(n+1)=s(j_l)\big\}
=\sum_{i_l=1}^{N_l}\sum_{k_l=1}^{M_l}\pi^l_{(i_l,j_l|k_l)}\,d^l_{(k_l|i_l)}\,P\big\{s^l(n)=s(i_l)\big\}.
$$
In the ergodic case (when all Markov chains are ergodic for any stationary strategy $d^l_{(k|i)}$), the distributions $P\{s^l(n+1)=s(j_l)\}$ exponentially quickly converge to their limits $P\{s^l=s(i)\}$ satisfying
$$
P\big\{s^l=s(j_l)\big\}
=\sum_{i_l=1}^{N_l}\sum_{k_l=1}^{M_l}\pi^l_{(i_l,j_l|k_l)}\,d^l_{(k_l|i_l)}\,P\big\{s^l=s(i_l)\big\}.
\tag{1}
$$

For any player $l$, his/her individual rationality is the player's cost function $J^l$ of any fixed policy $d^l$, defined over all possible combinations of states and actions; it indicates the expected value when taking action $a^l$ in state $s^l$ and following the policy $d^l$ thereafter. The $J$-values can be expressed by
$$
J^l_{(i_l)}(c^l):=\sum_{k_l}W^l_{(i_l,k_l)}\,d^l_{(k_l|i_l)}\,P\big\{s^l=s(i_l)\big\},
\tag{2}
$$
where
$$
W^l_{(i_l,k_l)}=\sum_{j_l}J(i_l,j_l,k_l)\,\pi^l_{(i_l,j_l|k_l)},
$$
the function $J(i_l,j_l,k_l)$ being a constant at state $s^{(i_l)}$ when the action $a(k_l)$ is applied.

Then, the cost function of each player, depending on the states and actions of all participants, is given by the values $W^l_{(i_1,k_1;\dots;i_N,k_N;i_0,k_0)}$, so that the "average cost function" $J^l$ for each player $l$ in the stationary regime can be expressed as
$$
J^l\big(c^0,\dots,c^N\big)
:=\sum_{i_0,k_0}\cdots\sum_{i_N,k_N}W^l_{(i_1,k_1;\dots;i_N,k_N;i_0,k_0)}\prod_{m=0}^{N}c^m_{(i_m,k_m)},
\tag{3}
$$
where
$$
c^l:=\big[c^l_{(i_l,k_l)}\big]_{i_l=\overline{1,N_l};\,k_l=\overline{1,M_l}}
$$
is a matrix with elements
$$
c^l_{(i_l,k_l)}=d^l_{(k_l|i_l)}\,P\big\{s^l=s(i_l)\big\}
\tag{4}
$$
satisfying
$$
c^l\in C^l_{\mathrm{adm}}=\left\{
c^l:\ \sum_{i_l,k_l}c^l_{(i_l,k_l)}=1,\quad c^l_{(i_l,k_l)}\ge0,\quad
\sum_{k_l}c^l_{(j_l,k_l)}=\sum_{i_l,k_l}\pi^l_{(i_l,j_l|k_l)}\,c^l_{(i_l,k_l)}
\right\}.
\tag{5}
$$

Notice that by (4) it follows that
$$
P\big\{s^l=s(i_l)\big\}=\sum_{k_l}c^l_{(i_l,k_l)},\qquad
d^l_{(k_l|i_l)}=\frac{c^l_{(i_l,k_l)}}{\sum_{k_l}c^l_{(i_l,k_l)}}.
\tag{6}
$$
In the ergodic case,
$$
\sum_{k_l}c^l_{(i_l,k_l)}>0
$$
for all $l=\overline{0,N}$. The individual aim of each participant is
$$
J^l(c^l)\to\min_{c^l\in C^l_{\mathrm{adm}}}.
$$
Obviously, here we have a conflict situation which can be resolved by the Stackelberg–Nash equilibrium concept discussed in detail below.

3. Formulation of the problem

3.1. Stackelberg–Nash equilibrium concept. To simplify the descriptions, below let us introduce the new variables
$$
x:=\operatorname{col}c^{(0)},\quad X:=C^{(0)}_{\mathrm{adm}},\qquad
v^{(l)}:=\operatorname{col}c^{(l)},\quad V^{(l)}:=C^{(l)}_{\mathrm{adm}},\quad l=\overline{1,N}.
\tag{7}
$$
Consider a non-zero sum game with a leader whose strategies are denoted by $x\in X$ and $N$ followers with strategies $v^l\in V^l$, $l=\overline{1,N}$. Denote by
$$
v=(v^1,\dots,v^N)\in V:=\bigotimes_{l=1}^{N}V^l
$$
the joint strategy of the followers. The leader is assumed to anticipate the reactions of the followers. They are trying to reach one of the Nash equilibria for any fixed strategy $x$ of the leader, that is, to find a joint strategy $w^*=\big(w^{1*},\dots,w^{N*}\big)\in W$ satisfying for any admissible $x\in X$, $w\in W$ and any $l=\overline{1,N}$ the system of inequalities (the Nash condition)
$$
g_l\big(v^l,w^{\hat l*}\big|x\big)\le 0\quad\text{for any }v^l\in V^l\text{ and all }l=\overline{1,N},
$$
$$
g_l\big(v^l,w^{\hat l}\big|x\big):=\varphi_l\big(w^l,w^{\hat l}\big|x\big)-\varphi_l\big(v^l,w^{\hat l}\big|x\big),
\tag{8}
$$
where $w^{\hat l}$ is a strategy of the rest of the followers adjoint to $v^l$, namely,
$$
w^{\hat l}:=\big(v^1,\dots,v^{l-1},v^{l+1},\dots,v^N\big)\in W^{\hat l}:=\bigotimes_{m=1,\,m\ne l}^{N}V^m,
$$
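The change of variables (4)–(6) and the admissible set (5) can be checked numerically. The following sketch reuses a hypothetical two-state, two-action chain and strategy (all values are illustrative, not from the paper):

```python
import numpy as np

# Hypothetical data: pi[k, i, j] = pi(i, j | k); d[k, i] = d(k|i).
pi = np.array([[[0.9, 0.1], [0.2, 0.8]],
               [[0.4, 0.6], [0.7, 0.3]]])
d = np.array([[0.5, 0.3],
              [0.5, 0.7]])

# Stationary state distribution p(i) = P{s = s(i)} under the strategy.
P = np.einsum('ki,kij->ij', d, pi)
p = np.full(2, 0.5)
for _ in range(2000):
    p = p @ P

# Change of variables (4): c(i, k) = d(k|i) * p(i).
c = d.T * p[:, None]                       # c[i, k]

# Constraints (5): nonnegativity, normalization and the ergodicity
# (balance) condition sum_k c(j,k) = sum_{i,k} pi(i,j|k) c(i,k).
assert (c >= 0).all()
assert np.isclose(c.sum(), 1.0)
lhs = c.sum(axis=1)                        # marginal state distribution
rhs = np.einsum('kij,ik->j', pi, c)
assert np.allclose(lhs, rhs)

# Recovery (6): d(k|i) = c(i,k) / sum_k c(i,k).
d_rec = (c / c.sum(axis=1, keepdims=True)).T
assert np.allclose(d_rec, d)
print(c)
```

This is why the method can optimize over the matrices $c^l$ on the polytope (5) instead of over the strategies $d^l$ directly: the map (4)/(6) between the two is invertible whenever every state carries positive mass, which the ergodicity assumption guarantees.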
and
$$
w^l:=\arg\min_{v^l\in V^l}\varphi_l\big(v^l,w^{\hat l}\big|x\big).
$$
Here $\varphi_l(v^l,w^{\hat l}|x)$ is the cost-function of the player $l$, which plays the strategy $v^l\in V^l$, while the rest of the players play the strategy $w^{\hat l}\in W^{\hat l}$.

Lemma 1. The Nash equilibrium $\bar w\in W$ for the followers given a fixed strategy of the leader $x\in X$ (8) can be equivalently expressed in the joint format (Tanaka and Yokoyama, 1991),
$$
\max_{v\in W}g(v,\bar w|x)\le 0,
$$
$$
g(v,w|x):=\sum_{l=1}^{N}\Big[\varphi_l\big(w^l,w^{\hat l}\big|x\big)-\varphi_l\big(v^l,w^{\hat l}\big|x\big)\Big],
\tag{9}
$$
$$
\varphi_l\big(w^l,w^{\hat l}\big|x\big):=\min_{z^l\in W^l}\varphi_l\big(z^l,w^{\hat l}\big|x\big),
\qquad
v\in W,\quad w\in W:=\bigotimes_{l=1}^{N}W^l.
$$

Proof. Summing (8) implies (9). Conversely, taking $v^m=w^m$ for all $m\ne l$ in (9), which is valid for any admissible $v^l$, we obtain (8). ■

Notice that the condition $g(v,\bar w|x)\le 0$ in (9), valid for any fixed $v\in V$ and any $x\in X$, is equivalent to
$$
\max_{v\in V}\{g(v,\bar w|x)\}\le 0.
$$

Let $f(x,w)$ ($x\in X$, $w\in W$) be the cost function of the leader. The functions of the followers $\varphi_l(v^l,w^{\hat l}|x)$, $l=\overline{1,N}$, and the function of the leader $f(x,w)$ are assumed to be convex in all their arguments.

Definition 1. A strategy $x^*\in X$ of the leader together with the collection $w^*\in W$ of the followers is said to be a Stackelberg–Nash equilibrium if
$$
(x^*,w^*)\in\operatorname{Arg}\min_{x\in X}\ \max_{v\in V,\,w\in W}\big\{f(x,w)\ \big|\ g(v,w|x)\le 0\big\}.
\tag{10}
$$

3.2. Regularized Lagrange principle application. Applying the Lagrange principle (see, for example, Poznyak et al., 2000) to Definition 1, we may conclude that (10) can be rewritten as
$$
(x^*,w^*)\in\operatorname{Arg}\min_{x\in X}\ \max_{v\in V,\,w\in W,\,\lambda\ge 0}L(x,v,w,\lambda),
$$
$$
L(x,v,w,\lambda):=f(x,w)+\lambda\,g(v,w|x).
\tag{11}
$$

The approximative solution obtained by the Tikhonov regularization (see Poznyak et al., 2000) is given by
$$
(x^*_\delta,w^*_\delta)=\arg\min_{x\in X}\ \max_{v\in V,\,w\in W,\,\lambda\ge 0}\mathcal L_\delta(x,v,w,\lambda),
$$
$$
\mathcal L_\delta(x,v,w,\lambda):=f_\delta(x,w)+\lambda\,g_\delta(v,w|x)-\frac{\delta}{2}\lambda^2,
\tag{12}
$$
where $\delta>0$ and
$$
f_\delta(x,w):=f(x,w)+\frac{\delta}{2}\|x\|^2,\qquad
g_\delta(v,w|x)=g(v,w|x)-\frac{\delta}{2}\big(\|v\|^2+\|w\|^2\big).
\tag{13}
$$
Notice that with $\delta>0$ the functions considered become strictly convex, providing the uniqueness of the solution of the discussed conditional optimization problem (12). For $\delta=0$ we have (11). Notice also that the Lagrange function in (12) satisfies the saddle-point condition (Poznyak, 2009), namely, for all $x\in X$ and $\lambda\ge 0$ we have
$$
\mathcal L_\delta(x^*_\delta,v,w,\lambda)\le
\mathcal L_\delta(x^*_\delta,v^*_\delta,w^*_\delta,\lambda^*_\delta)\le
\mathcal L_\delta(x,v^*_\delta,w^*_\delta,\lambda^*_\delta).
\tag{14}
$$

3.3. Proximal format. In the proximal format (see Antipin, 2005) the relation (12) can be expressed as
$$
\lambda^*_\delta=\arg\max_{\lambda\ge 0}\Big\{-\tfrac12\,|\lambda-\lambda^*_\delta|^2+\gamma\,\mathcal L_\delta(x^*_\delta,v^*_\delta,w^*_\delta,\lambda)\Big\},
$$
$$
x^*_\delta=\arg\min_{x\in X}\Big\{\tfrac12\,\|x-x^*_\delta\|^2+\gamma\,\mathcal L_\delta(x,v^*_\delta,w^*_\delta,\lambda^*_\delta)\Big\},
$$
$$
v^*_\delta=\arg\max_{v\in V}\Big\{-\tfrac12\,\|v-v^*_\delta\|^2+\gamma\,\mathcal L_\delta(x^*_\delta,v,w^*_\delta,\lambda^*_\delta)\Big\},
$$
$$
w^*_\delta=\arg\max_{w\in W}\Big\{-\tfrac12\,\|w-w^*_\delta\|^2+\gamma\,\mathcal L_\delta(x^*_\delta,v^*_\delta,w,\lambda^*_\delta)\Big\},
\tag{15}
$$
where the solutions $x^*_\delta$, $v^*_\delta$, $w^*_\delta$ and $\lambda^*_\delta$ depend on the small parameters $\delta,\gamma>0$. Define the extended variables
$$
\tilde x:=x\in\tilde X:=X,\qquad
\tilde y:=\begin{pmatrix}v\\ w\\ \lambda\end{pmatrix}\in\tilde Y:=V\times W\times\mathbb R^{+},
$$
$$
\tilde w=\begin{pmatrix}\tilde w_1\\ \tilde w_2\end{pmatrix}\in\tilde X\times\tilde Y,\qquad
\tilde v=\begin{pmatrix}\tilde v_1\\ \tilde v_2\end{pmatrix}\in\tilde X\times\tilde Y,
$$
and the function
$$
\tilde{\mathcal L}_\delta(\tilde x,\tilde y):=\mathcal L_\delta(x,v,w,\lambda).
$$

Remark 1. The direct one-step procedure, when $\lambda_{n+1}=\lambda_n$, does not work (see the counter-example of Antipin (2005)).
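The role of the Tikhonov terms in (12)–(13) can be seen on a small hypothetical example (not from the paper): a convex but not strictly convex objective has a whole set of minimizers, while its regularized version has exactly one, which tends to the minimal-norm minimizer as $\delta\to 0$:

```python
import numpy as np

# f(x) = (x1 + x2 - 1)^2 is convex but not strictly convex: every point
# of the line x1 + x2 = 1 is a minimizer, so the minimizer is not unique.
# f_delta(x) = f(x) + (delta/2) ||x||^2 is strictly convex; its unique
# minimizer solves the linear system (2 C^T C + delta I) x = 2 C^T d and
# converges to the minimal-norm minimizer (0.5, 0.5) as delta -> 0.
C = np.array([[1.0, 1.0]])
d = np.array([1.0])
for delta in (1.0, 0.1, 1e-3, 1e-6):
    x = np.linalg.solve(2.0 * C.T @ C + delta * np.eye(2), 2.0 * C.T @ d)
    print(delta, x)
```

The same mechanism makes the regularized problem (12) uniquely solvable even though the underlying game functions need not be strictly convex.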
Introduce also the function
$$
\Psi_\delta(\tilde w,\tilde v):=\tilde{\mathcal L}_\delta(\tilde w_1,\tilde v_2)-\tilde{\mathcal L}_\delta(\tilde v_1,\tilde w_2).
$$
For $\tilde w_1=\tilde x$, $\tilde w_2=\tilde y$, $\tilde v_1=\tilde v_1^*=\tilde x^*_\delta$ and $\tilde v_2=\tilde v_2^*=\tilde y^*_\delta$, we have
$$
\Psi_\delta(\tilde w,\tilde v):=\tilde{\mathcal L}_\delta(\tilde x,\tilde y^*_\delta)-\tilde{\mathcal L}_\delta(\tilde x^*_\delta,\tilde y).
$$
In these variables the relation (15) can be represented in a "short format" as
$$
\tilde v^*=\arg\min_{\tilde w\in\tilde X\times\tilde Y}\Big\{\tfrac12\,\|\tilde w-\tilde v^*\|^2+\gamma\,\Psi_\delta(\tilde w,\tilde v^*)\Big\}.
\tag{16}
$$

3.4. Extraproximal method. The extraproximal method for the conditional optimization problems (12) was suggested by Antipin (2005). We design the method for the static Stackelberg–Nash game in a general format. The general format iterative version ($n=0,1,\dots$) of the extraproximal method with some fixed admissible initial values ($x_0\in X$, $v_0\in V$, $w_0\in W$, and $\lambda_0\ge 0$) is as follows.

1. The first half-step (prediction):
$$
\bar\lambda_n=\arg\min_{\lambda\ge 0}\Big\{\tfrac12\,\|\lambda-\lambda_n\|^2-\gamma\,\mathcal L_\delta(x_n,v_n,w_n,\lambda)\Big\},
$$
$$
\bar x_n=\arg\min_{x\in X}\Big\{\tfrac12\,\|x-x_n\|^2+\gamma\,\mathcal L_\delta(x,v_n,w_n,\bar\lambda_n)\Big\},
$$
$$
\bar v_n=\arg\min_{v\in V}\Big\{\tfrac12\,\|v-v_n\|^2-\gamma\,\mathcal L_\delta(x_n,v,w_n,\bar\lambda_n)\Big\},
$$
$$
\bar w_n=\arg\min_{w\in W}\Big\{\tfrac12\,\|w-w_n\|^2-\gamma\,\mathcal L_\delta(x_n,v_n,w,\bar\lambda_n)\Big\},
\tag{17}
$$
or, in the extended variables,
$$
\hat v_n=\arg\min_{\tilde w\in\tilde X\times\tilde Y}\Big\{\tfrac12\,\|\tilde w-\tilde v_n\|^2+\gamma\,\Psi_\delta(\tilde w,\tilde v_n)\Big\}.
\tag{18}
$$

2. The second (basic) half-step:
$$
\lambda_{n+1}=\arg\min_{\lambda\ge 0}\Big\{\tfrac12\,\|\lambda-\lambda_n\|^2-\gamma\,\mathcal L_\delta(\bar x_n,\bar v_n,\bar w_n,\lambda)\Big\},
$$
$$
x_{n+1}=\arg\min_{x\in X}\Big\{\tfrac12\,\|x-x_n\|^2+\gamma\,\mathcal L_\delta(x,\bar v_n,\bar w_n,\bar\lambda_n)\Big\},
$$
$$
v_{n+1}=\arg\min_{v\in V}\Big\{\tfrac12\,\|v-v_n\|^2-\gamma\,\mathcal L_\delta(\bar x_n,v,\bar w_n,\bar\lambda_n)\Big\},
$$
$$
w_{n+1}=\arg\min_{w\in W}\Big\{\tfrac12\,\|w-w_n\|^2-\gamma\,\mathcal L_\delta(\bar x_n,\bar v_n,w,\bar\lambda_n)\Big\},
\tag{19}
$$
or, in the "short format",
$$
\tilde v_{n+1}=\arg\min_{\tilde w\in\tilde X\times\tilde Y}\Big\{\tfrac12\,\|\tilde w-\tilde v_n\|^2+\gamma\,\Psi_\delta(\tilde w,\hat v_n)\Big\}.
\tag{20}
$$

4. Markov format for the extraproximal method

We consider a three-player Stackelberg game, $l=\overline{1,N}$. "Player $l$" should be understood to refer to players in a general expression of the game. In the remainder of this paper, the leader is designated as player (0) and the followers as players (1) and (2). Numbers are used to refer to players in a specific expression. We also assume that the number of strategies and actions that this description can take is finite and fixed. Here we will apply the iterative quadratic method, providing a significantly quicker rate of convergence.

4.1. Cost functions and notation. The individual cost-functions of the followers and the leader are defined as follows.

Cost-function for the leader:
$$
J^{(0)}\big(c^{(1)},c^{(2)},c^{(0)}\big)
=\sum_{i_1,k_1}\sum_{i_2,k_2}\sum_{i_0,k_0}
W^{(0)}_{(i_1,k_1;i_2,k_2;i_0,k_0)}\,c^{(1)}_{(i_1,k_1)}c^{(2)}_{(i_2,k_2)}c^{(0)}_{(i_0,k_0)}.
\tag{21}
$$

Cost-functions for the followers:

Follower 1:
$$
J^{(1)}\big(c^{(1)},c^{(2)},c^{(0)}\big)
=\sum_{i_1,k_1}\sum_{i_2,k_2}\sum_{i_0,k_0}
W^{(1)}_{(i_1,k_1;i_2,k_2;i_0,k_0)}\,c^{(1)}_{(i_1,k_1)}c^{(2)}_{(i_2,k_2)}c^{(0)}_{(i_0,k_0)}.
\tag{22}
$$

Follower 2:
$$
J^{(2)}\big(c^{(1)},c^{(2)},c^{(0)}\big)
=\sum_{i_1,k_1}\sum_{i_2,k_2}\sum_{i_0,k_0}
W^{(2)}_{(i_1,k_1;i_2,k_2;i_0,k_0)}\,c^{(1)}_{(i_1,k_1)}c^{(2)}_{(i_2,k_2)}c^{(0)}_{(i_0,k_0)}.
\tag{23}
$$

For $n=0,1,\dots$, let us define the vectors
$$
x_n=\operatorname{col}c^{(0)}(n),\qquad
v_n=\begin{pmatrix}\operatorname{col}c^{(1)}(n)\\ \operatorname{col}c^{(2)}(n)\end{pmatrix},\qquad
w_n=\begin{pmatrix}\operatorname{col}c^{(\hat1)}(n)\\ \operatorname{col}c^{(\hat2)}(n)\end{pmatrix},
$$
$$
\mathring w=\begin{pmatrix}\operatorname{col}\mathring c^{(1)}(n)\\ \operatorname{col}\mathring c^{(2)}(n)\end{pmatrix}
=\begin{pmatrix}\operatorname{col}\arg\min\limits_{z\in C^{(1)}_{\mathrm{adm}}}J^{(1)}\big(z,c^{(\hat2)}(n),x_n\big)\\[4pt]
\operatorname{col}\arg\min\limits_{z\in C^{(2)}_{\mathrm{adm}}}J^{(2)}\big(c^{(\hat1)}(n),z,x_n\big)\end{pmatrix}.
$$

Let us introduce for simplicity the following notation:
$$
W^{(l)}_{(i_1,k_1;i_2,k_2;i_0,k_0)}=W^{(l)},\qquad
c^{(l)}_{(i_l,k_l)}=c^{(l)},\qquad
c^{(\hat l)}_{(i_l,k_l)}=c^{(\hat l)},\qquad
\mathring c^{(l)}_{(i_l,k_l)}=\mathring c^{(l)}.
$$
For the leader we have
$$
f(x,w)=\sum_{i_1,k_1}\sum_{i_2,k_2}\sum_{i_0,k_0}W^{(0)}\mathring c^{(1)}\mathring c^{(2)}c^{(0)}.
\tag{24}
$$
From
$$
J^{(1)}\big(\mathring c^{(1)},c^{(\hat2)}\big|c^{(0)}\big)\le J^{(1)}\big(c^{(1)},c^{(\hat2)}\big|c^{(0)}\big),\qquad
J^{(2)}\big(\mathring c^{(2)},c^{(\hat1)}\big|c^{(0)}\big)\le J^{(2)}\big(c^{(2)},c^{(\hat1)}\big|c^{(0)}\big),
$$
we have
$$
\varphi_l\big(w^l,w^{\hat l}\big|x\big)
=J^{(1)}\big(\mathring c^{(1)},c^{(\hat2)}\big|c^{(0)}\big)+J^{(2)}\big(\mathring c^{(2)},c^{(\hat1)}\big|c^{(0)}\big)
=\sum_{i_1,k_1}\sum_{i_2,k_2}\sum_{i_0,k_0}\Big(W^{(1)}\mathring c^{(1)}c^{(\hat2)}c^{(0)}+W^{(2)}c^{(\hat1)}\mathring c^{(2)}c^{(0)}\Big),
\tag{25}
$$
as well as
$$
\varphi_l\big(v^l,w^{\hat l}\big|x\big)
=J^{(1)}\big(c^{(1)},c^{(\hat2)}\big|c^{(0)}\big)+J^{(2)}\big(c^{(2)},c^{(\hat1)}\big|c^{(0)}\big)
=\sum_{i_1,k_1}\sum_{i_2,k_2}\sum_{i_0,k_0}\Big(W^{(1)}c^{(1)}c^{(\hat2)}c^{(0)}+W^{(2)}c^{(\hat1)}c^{(2)}c^{(0)}\Big).
\tag{26}
$$
For computing the Nash equilibrium point, we define the function $g(v,w|x)$ as follows:
$$
g(v,w|x)=\sum_{i_1,k_1}\sum_{i_2,k_2}\sum_{i_0,k_0}\Big(
W^{(1)}\mathring c^{(1)}c^{(\hat2)}c^{(0)}+W^{(2)}c^{(\hat1)}\mathring c^{(2)}c^{(0)}
-W^{(1)}c^{(1)}c^{(\hat2)}c^{(0)}-W^{(2)}c^{(\hat1)}c^{(2)}c^{(0)}\Big).
\tag{27}
$$

4.2. Tikhonov's regularization method. The regularized problem for (24) is defined as
$$
f_\delta(x,w)=\sum_{i_1,k_1}\sum_{i_2,k_2}\sum_{i_0,k_0}W^{(0)}\mathring c^{(1)}\mathring c^{(2)}c^{(0)}
+\frac{\delta}{2}\big\|c^{(0)}\big\|^2,
$$
and for (27) we have
$$
g_\delta(v,w|x)=\sum_{i_1,k_1}\sum_{i_2,k_2}\sum_{i_0,k_0}\Big(
W^{(1)}\mathring c^{(1)}c^{(\hat2)}c^{(0)}+W^{(2)}c^{(\hat1)}\mathring c^{(2)}c^{(0)}
-W^{(1)}c^{(1)}c^{(\hat2)}c^{(0)}-W^{(2)}c^{(\hat1)}c^{(2)}c^{(0)}\Big)
-\frac{\delta}{2}\left(\left\|\begin{pmatrix}c^{(1)}\\ c^{(2)}\end{pmatrix}\right\|^2
+\left\|\begin{pmatrix}c^{(\hat1)}\\ c^{(\hat2)}\end{pmatrix}\right\|^2\right).
\tag{28}
$$

4.3. Lagrange principle. Let us apply the Lagrange principle (Poznyak, 2009) with the Lagrange function
$$
\mathcal L_\delta(x,v,w,\lambda)=f_\delta(x,w)+\lambda\,g_\delta(v,w|x)-\frac{\delta}{2}\lambda^2
=\sum_{i_1,k_1}\sum_{i_2,k_2}\sum_{i_0,k_0}W^{(0)}\mathring c^{(1)}\mathring c^{(2)}c^{(0)}
+\lambda\sum_{i_1,k_1}\sum_{i_2,k_2}\sum_{i_0,k_0}\Big(W^{(1)}\mathring c^{(1)}c^{(\hat2)}c^{(0)}+W^{(2)}c^{(\hat1)}\mathring c^{(2)}c^{(0)}-W^{(1)}c^{(1)}c^{(\hat2)}c^{(0)}-W^{(2)}c^{(\hat1)}c^{(2)}c^{(0)}\Big)
-\lambda\frac{\delta}{2}\left(\left\|\begin{pmatrix}c^{(1)}\\ c^{(2)}\end{pmatrix}\right\|^2+\left\|\begin{pmatrix}c^{(\hat1)}\\ c^{(\hat2)}\end{pmatrix}\right\|^2\right)
+\frac{\delta}{2}\big\|c^{(0)}\big\|^2-\frac{\delta}{2}\lambda^2.
$$

4.4. Extraproximal method for Markov chains. The procedure for solving the Stackelberg/Nash equilibrium point by the extraproximal method consists of the following "iterative rules" implementation.
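These rules instantiate the general scheme (17)–(19). As a self-contained sanity check before the Markov-specific formulas, the two half-steps can be run on a hypothetical scalar saddle-point problem (not the Markov game); every half-step objective is quadratic, so each argmin/argmax is available in closed form, which is the same reason the paper can rely on a quadratic programming sub-solver:

```python
# Extraproximal scheme (17)-(19) on a toy saddle-point problem:
#   min_x max_{lam >= 0}  L(x, lam) = 0.5 * x**2 + lam * (1 - x),
# i.e., minimize 0.5 x^2 subject to g(x) = 1 - x <= 0, whose saddle
# point is (x*, lam*) = (1, 1).  Regularization delta = 0 for brevity.
gamma = 0.5
x, lam = 0.0, 0.0
for _ in range(300):
    # First half-step (prediction):
    lam_bar = max(0.0, lam + gamma * (1.0 - x))       # multiplier ascent
    x_bar = (x + gamma * lam_bar) / (1.0 + gamma)     # proximal step in x
    # Second (basic) half-step, evaluated at the prediction; in this toy
    # the x-update coincides with its prediction because only lam
    # couples to x in L.
    lam = max(0.0, lam + gamma * (1.0 - x_bar))
    x = (x + gamma * lam_bar) / (1.0 + gamma)
print(x, lam)   # both tend to 1
```

Dropping the prediction half-step (i.e., setting $\bar\lambda_n=\lambda_n$) destroys convergence in general, which is exactly the point of Remark 1.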
1. First half-step:

1.a. For
$$
\bar\lambda_n=\arg\min_{\lambda\ge 0}\Big\{\tfrac12(\lambda-\lambda_n)^2-\gamma\,\mathcal L_\delta(x_n,v_n,w_n,\lambda)\Big\},
$$
the objective is quadratic in $\lambda$, and we have the explicit solution
$$
\bar\lambda_n=\frac{\big[\lambda_n+\gamma\,g_\delta(v_n,w_n|x_n)\big]^{+}}{1+\gamma\delta},
$$
where $[\,\cdot\,]^{+}:=\max\{\,\cdot\,,0\}$ and, by (28),
$$
g_\delta(v_n,w_n|x_n)=\sum_{i_1,k_1}\sum_{i_2,k_2}\sum_{i_0,k_0}\Big(
W^{(1)}\mathring c^{(1)}c^{(\hat2)}(n)c^{(0)}(n)+W^{(2)}c^{(\hat1)}(n)\mathring c^{(2)}c^{(0)}(n)
-W^{(1)}c^{(1)}(n)c^{(\hat2)}(n)c^{(0)}(n)-W^{(2)}c^{(\hat1)}(n)c^{(2)}(n)c^{(0)}(n)\Big)
-\frac{\delta}{2}\left(\left\|\begin{pmatrix}c^{(1)}(n)\\ c^{(2)}(n)\end{pmatrix}\right\|^2
+\left\|\begin{pmatrix}c^{(\hat1)}(n)\\ c^{(\hat2)}(n)\end{pmatrix}\right\|^2\right).
$$

1.b. For
$$
\bar x_n=\arg\min_{x\in X}\Big\{\tfrac12\,\|x-x_n\|^2+\gamma\,\mathcal L_\delta(x,v_n,w_n,\bar\lambda_n)\Big\},
$$
substituting (24)–(28) and collecting the terms that depend on $x=\operatorname{col}c^{(0)}$, we obtain the quadratic programming problem
$$
\bar x_n=\arg\min_{x\in X}\Big\{\Big(\tfrac12+\gamma\tfrac{\delta}{2}\Big)\big\|c^{(0)}\big\|^2
-\big(c^{(0)}(n)\big)^{\!\top}c^{(0)}
+\gamma\sum_{i_1,k_1}\sum_{i_2,k_2}\sum_{i_0,k_0}\Big(W^{(0)}\mathring c^{(1)}\mathring c^{(2)}
+\bar\lambda_n\big[W^{(1)}\mathring c^{(1)}c^{(\hat2)}(n)+W^{(2)}c^{(\hat1)}(n)\mathring c^{(2)}
-W^{(1)}c^{(1)}(n)c^{(\hat2)}(n)-W^{(2)}c^{(\hat1)}(n)c^{(2)}(n)\big]\Big)c^{(0)}\Big\},
$$
up to additive terms not depending on $c^{(0)}$.

1.c. For
$$
\bar v_n=\arg\min_{v\in V}\Big\{\tfrac12\,\|v-v_n\|^2-\gamma\,\mathcal L_\delta(x_n,v,w_n,\bar\lambda_n)\Big\},
$$
the same expansion yields, up to terms not depending on $v=\operatorname{col}\big(c^{(1)},c^{(2)}\big)$,
$$
\bar v_n=\arg\min_{v\in V}\Bigg\{\Big(\tfrac12+\gamma\bar\lambda_n\tfrac{\delta}{2}\Big)\left\|\begin{pmatrix}c^{(1)}\\ c^{(2)}\end{pmatrix}\right\|^2
-\begin{pmatrix}c^{(1)}\\ c^{(2)}\end{pmatrix}^{\!\top}\begin{pmatrix}c^{(1)}(n)\\ c^{(2)}(n)\end{pmatrix}
+\gamma\bar\lambda_n\sum_{i_1,k_1}\sum_{i_2,k_2}\sum_{i_0,k_0}\Big(W^{(1)}c^{(1)}c^{(\hat2)}(n)c^{(0)}(n)+W^{(2)}c^{(\hat1)}(n)c^{(2)}c^{(0)}(n)\Big)\Bigg\}.
$$

1.d. For
$$
\bar w_n=\arg\min_{w\in W}\Big\{\tfrac12\,\|w-w_n\|^2-\gamma\,\mathcal L_\delta(x_n,v_n,w,\bar\lambda_n)\Big\},
$$
the analogous quadratic programming problem in $w=\operatorname{col}\big(c^{(\hat1)},c^{(\hat2)}\big)$ has the quadratic term $\big(\tfrac12+\gamma\bar\lambda_n\tfrac{\delta}{2}\big)\|w\|^2$ and the linear terms collected from
$$
-\,w^{\top}w_n-\gamma\bar\lambda_n\sum_{i_1,k_1}\sum_{i_2,k_2}\sum_{i_0,k_0}\Big(W^{(1)}\mathring c^{(1)}c^{(\hat2)}c^{(0)}(n)+W^{(2)}c^{(\hat1)}\mathring c^{(2)}c^{(0)}(n)
-W^{(1)}c^{(1)}(n)c^{(\hat2)}c^{(0)}(n)-W^{(2)}c^{(\hat1)}c^{(2)}(n)c^{(0)}(n)\Big).
$$

2. Second (basic) half-step. The formulas of the second half-step have the same structure, with the predictions $\bar x_n$, $\bar v_n$, $\bar w_n$, $\bar\lambda_n$ substituted inside $\mathcal L_\delta$ while the proximal centers $x_n$, $v_n$, $w_n$, $\lambda_n$ are kept.

2.a. For
$$
\lambda_{n+1}=\arg\min_{\lambda\ge 0}\Big\{\tfrac12(\lambda-\lambda_n)^2-\gamma\,\mathcal L_\delta(\bar x_n,\bar v_n,\bar w_n,\lambda)\Big\},
$$
we have
$$
\lambda_{n+1}=\frac{\big[\lambda_n+\gamma\,g_\delta(\bar v_n,\bar w_n|\bar x_n)\big]^{+}}{1+\gamma\delta}.
$$

2.b–2.d. The iterates $x_{n+1}$, $v_{n+1}$ and $w_{n+1}$ solve the quadratic programming problems of Steps 1.b–1.d with the predicted values substituted:
$$
x_{n+1}=\arg\min_{x\in X}\Big\{\tfrac12\,\|x-x_n\|^2+\gamma\,\mathcal L_\delta(x,\bar v_n,\bar w_n,\bar\lambda_n)\Big\},
$$
$$
v_{n+1}=\arg\min_{v\in V}\Big\{\tfrac12\,\|v-v_n\|^2-\gamma\,\mathcal L_\delta(\bar x_n,v,\bar w_n,\bar\lambda_n)\Big\},
$$
$$
w_{n+1}=\arg\min_{w\in W}\Big\{\tfrac12\,\|w-w_n\|^2-\gamma\,\mathcal L_\delta(\bar x_n,\bar v_n,w,\bar\lambda_n)\Big\}.
$$
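Each of the $x$-, $v$- and $w$-subproblems above is a quadratic program over the admissible set (5). A minimal sketch of its equality-constrained core follows (hypothetical two-state, two-action data; the nonnegativity constraints $c\ge 0$ are assumed inactive here — a full solver must enforce them as well, e.g., by an active-set or interior-point treatment):

```python
import numpy as np

# Project a point c0 onto the affine set {c : A c = b} collecting the
# normalization constraint and one (non-redundant) ergodicity balance
# constraint from (5), by solving the KKT system
#   [ I  A^T ] [ c  ]   [ c0 ]
#   [ A   0  ] [ mu ] = [ b  ].
pi = np.array([[[0.9, 0.1], [0.2, 0.8]],   # pi[k, i, j], hypothetical
               [[0.4, 0.6], [0.7, 0.3]]])
n_s, n_a = 2, 2
n = n_s * n_a                              # c is flattened as c[i * n_a + k]

A = np.zeros((2, n))
for i in range(n_s):
    for k in range(n_a):
        # balance at state j = 0: coefficient pi(i, 0 | k) - delta_{i0}
        A[0, i * n_a + k] = pi[k, i, 0] - (1.0 if i == 0 else 0.0)
A[1, :] = 1.0                              # normalization: sum of c is 1
b = np.array([0.0, 1.0])

c0 = np.full(n, 0.25)                      # point to be projected
KKT = np.block([[np.eye(n), A.T],
                [A, np.zeros((2, 2))]])
sol = np.linalg.solve(KKT, np.concatenate([c0, b]))
c = sol[:n]
print(c, A @ c)                            # feasible: A @ c equals b
```

With the quadratic objectives of Steps 1.b–1.d one simply replaces the identity block by the corresponding Hessian; this is the kind of subproblem handed to the quadratic programming solver described next.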
4.5. Quadratic programming solver. Let us restrict the expression (3) for the admissible set and the stationary case with the following constraint: each matrix $d^{(l)}_{(i_l|k_l)}$ represents a stationary mixed strategy that belongs to the simplex
$$
S^{(N_lM_l)}:=\left\{d^{(l)}_{(i_l|k_l)}\in\mathbb R^{N_lM_l}:\ d^{(l)}_{(i_l|k_l)}\ge 0,\ \sum_{k_l=1}^{M_l}d^{(l)}_{(i_l|k_l)}=1\right\}.
$$
A necessary and sufficient condition for $d^{(l)}_{(i_l|k_l)}$ to be a Stackelberg/Nash equilibrium point is the solution of the following QPP (quadratic programming problem):
$$
J\big(d_{(i_0,k_0)},d_{(i_1,k_1)},\dots,d_{(i_N,k_N)}\big)\to\min_{d^{(l)}_{(i_l|k_l)}}
\tag{29}
$$
subject to
$$
d^{(l)}_{(i_l|k_l)}\in S^{(N_lM_l)}.
$$
We consider the problem
$$
\tfrac12\,X^{\top}HX+C^{\top}X\to\min,\qquad
AX\le b,\qquad A_{\mathrm{eq}}X=b_{\mathrm{eq}},
\tag{30}
$$
where $0\le X\le 1$, $A^l\in\mathbb R^{(N_l+1)\times(N_lM_l)}$ and $b^l\in\mathbb R^{N_l+1}$.

The vector $C=C^l_{(N|M)}$ is defined as
$$
C^l_{(1|1)}=W^l_{(1|1)},\quad C^l_{(2|1)}=W^l_{(2|1)},\ \dots,\ C^l_{(N|1)}=W^l_{(N|1)},\ \dots,\
C^l_{(1|M_l)}=W^l_{(1|M_l)},\ \dots,\ C^l_{(N_l|M_l)}=W^l_{(N_l|M_l)},
$$
that is, $C_{(N|M)}$ is equal to $W_{(i_1,k_1,\dots,i_N,k_N)}$ ordered by columns. The vector $b^l$ is defined as
$$
b^l=(\underbrace{0,\dots,0}_{N_l\ \text{times}},1)^{\top}\in\mathbb R^{N_l+1}.
$$
We will construct the matrix $A^l\in\mathbb R^{N_l\times(N_lM_l)}$ using the ergodicity constraints defined in (5),
$$
0=\sum_{k_l=1}^{M_l}\Big(\sum_{i_l=1}^{N_l}\pi^{(l)}_{(i_lj_l|k_l)}c^{(l)}_{(i_l|k_l)}-c^{(l)}_{(j_l|k_l)}\Big).
\tag{31}
$$
Then, we have
$$
j_l=1:\quad \sum_{k_l=1}^{M_l}\Big(\sum_{i_l=1}^{N_l}\pi^{(l)}_{(i_l,1|k_l)}c^{(l)}_{(i_l|k_l)}-c^{(l)}_{(1|k_l)}\Big)=0,
$$
$$
j_l=2:\quad \sum_{k_l=1}^{M_l}\Big(\sum_{i_l=1}^{N_l}\pi^{(l)}_{(i_l,2|k_l)}c^{(l)}_{(i_l|k_l)}-c^{(l)}_{(2|k_l)}\Big)=0,
$$
$$
\vdots
$$
$$
j_l=N_l:\quad \sum_{k_l=1}^{M_l}\Big(\sum_{i_l=1}^{N_l}\pi^{(l)}_{(i_l,N_l|k_l)}c^{(l)}_{(i_l|k_l)}-c^{(l)}_{(N_l|k_l)}\Big)=0.
$$
Developing the formulas of (31) for $N$ players and collecting the coefficients of $c^{(l)}_{(i_l|k_l)}$ for each component of (31), we have
$$
BL^{(l)}=\Big[\pi^{(l)}_{(i_l,j_l|k_l)}-\delta_{(j_l,i_l)}\Big]
=\begin{bmatrix}
\pi^{(l)}_{(1,1|1)}-1&\cdots&\pi^{(l)}_{(N_l,1|M_l)}\\
\vdots&\ddots&\vdots\\
\pi^{(l)}_{(1,N_l|1)}&\cdots&\pi^{(l)}_{(N_l,N_l|M_l)}-1
\end{bmatrix}\in\mathbb R^{N_l\times(N_lM_l)},
$$
where $\delta_{(j_l,i_l)}$ is Kronecker's delta. Then $A_{\mathrm{eq}}\in\mathbb R^{\left(\sum_{l}N_l+(N+1)\right)\times\left(\sum_{l}N_lM_l\right)}$ is defined as the block matrix with one block $BL^{(l)}$ per player on the diagonal, followed by one normalization row per player,
$$
A_{\mathrm{eq}}=\begin{bmatrix}
BL^{(1)}&0&\cdots&0\\
0&BL^{(2)}&\cdots&0\\
\vdots&\vdots&\ddots&\vdots\\
{[1]}&0&\cdots&0\\
0&{[1]}&\cdots&0\\
\vdots&\vdots&\ddots&\vdots\\
0&0&\cdots&{[1]}
\end{bmatrix},
\qquad
[1]=(1,\dots,1)\in\mathbb R^{N_lM_l},
$$
and $b_{\mathrm{eq}}\in\mathbb R^{\sum_l N_l+(N+1)}$ is
$$
b_{\mathrm{eq}}=\big(0,\dots,0,1,\dots,1\big)^{\top}.
$$

5. Convergence analysis

5.1. Auxiliary results. Let us prove the following auxiliary results.

Lemma 2. Let $f(z)$ be a convex function defined on a convex set $Z$. If $z^*$ is a minimizer of the function
$$
\varphi(z)=\tfrac12\,\|z-x\|^2+\alpha f(z)
\tag{32}
$$
on $Z$ with fixed $x$, then $f(z)$ satisfies the inequality
$$
\tfrac12\,\|z^*-x\|^2+\alpha f(z^*)\le\tfrac12\,\|z-x\|^2+\alpha f(z)-\tfrac12\,\|z-z^*\|^2.
\tag{33}
$$

Proof. By the necessary condition for a minimum at $z^*$,
$$
\big\langle z^*-x+\alpha\nabla f(z^*),\,z-z^*\big\rangle\ge 0,
$$
and the convexity property of $f(z)$, expressed as
$$
f(z)\ge f(z^*)+\big\langle\nabla f(z^*),\,z-z^*\big\rangle,
$$
we derive
$$
0\le\big\langle z^*-x+\alpha\nabla f(z^*),\,z-z^*\big\rangle
\le\big\langle z^*-x,\,z-z^*\big\rangle+\alpha\big[f(z)-f(z^*)\big].
$$
Using this inequality in the identity
$$
\tfrac12\,\|z-x\|^2=\tfrac12\,\|z-z^*\|^2+\big\langle z^*-x,\,z-z^*\big\rangle+\tfrac12\,\|z^*-x\|^2,
$$
we get (33). The lemma is proven. ■

Lemma 3. If all partial derivatives of $\tilde{\mathcal L}_\delta(\tilde x,\tilde y)$ satisfy the Lipschitz condition with positive constant $C_0$, then the following Lipschitz-type condition holds:
$$
\big[\Psi_\delta(\tilde w+h,\tilde v+g)-\Psi_\delta(\tilde w,\tilde v+g)\big]
-\big[\Psi_\delta(\tilde w+h,\tilde v)-\Psi_\delta(\tilde w,\tilde v)\big]\le C_0\,\|h\|\,\|g\|,
\tag{34}
$$
valid for any $\tilde w,h,\tilde v,g\in\tilde X\times\tilde Y$.

Proof. By Lagrange's formula
$$
f(x+h)-f(x)=\int_0^1\big\langle\nabla f(x+th),\,h\big\rangle\,dt,
$$
we have
$$
\big[\Psi_\delta(\tilde w+h,\tilde v+g)-\Psi_\delta(\tilde w,\tilde v+g)\big]
-\big[\Psi_\delta(\tilde w+h,\tilde v)-\Psi_\delta(\tilde w,\tilde v)\big]
=\int_0^1\big\langle\nabla\Psi_\delta(\tilde w+th,\tilde v+g)-\nabla\Psi_\delta(\tilde w+th,\tilde v),\,h\big\rangle\,dt
$$
$$
\le\int_0^1\big|\big\langle\nabla\Psi_\delta(\tilde w+th,\tilde v+g)-\nabla\Psi_\delta(\tilde w+th,\tilde v),\,h\big\rangle\big|\,dt
\le\int_0^1 C_0\,\|h\|\,\|g\|\,dt\le C_0\,\|h\|\,\|g\|,
$$
which proves the lemma. ■

5.2. Main convergence theorem. The following theorem presents the convergence conditions of (17)–(19) and gives an estimate of its rate of convergence.

Theorem 1. Assume that the problem (16) has a solution. Let $\tilde{\mathcal L}_\delta(\tilde x,\tilde y)$ be differentiable in $\tilde x$ and $\tilde y$, whose partial derivative with respect to $\tilde y$ satisfies the Lipschitz condition with positive constant $C$. Then, for any $\delta\in(0,1)$, there exists a small enough $\gamma_0=\gamma_0(\delta)$

Proof. Taking in (33) $\alpha=\gamma$ and
$$
z=\tilde w,\quad x=\tilde v_n,\quad z^*=\hat v_n,\quad
f(z)=\Psi_\delta(\tilde w,\tilde v_n),\quad f(z^*)=\Psi_\delta(\hat v_n,\tilde v_n),
$$
we obtain
$$
\tfrac12\,\|\hat v_n-\tilde v_n\|^2+\gamma\Psi_\delta(\hat v_n,\tilde v_n)
\le\tfrac12\,\|\tilde w-\tilde v_n\|^2+\gamma\Psi_\delta(\tilde w,\tilde v_n)-\tfrac12\,\|\tilde w-\hat v_n\|^2.
\tag{37}
$$
Again setting in (33) $\alpha=\gamma$ and
$$
z=\tilde w,\quad x=\tilde v_n,\quad z^*=\tilde v_{n+1},\quad
f(z)=\Psi_\delta(\tilde w,\hat v_n),\quad f(z^*)=\Psi_\delta(\tilde v_{n+1},\hat v_n),
$$
we get
$$
\tfrac12\,\|\tilde v_{n+1}-\tilde v_n\|^2+\gamma\Psi_\delta(\tilde v_{n+1},\hat v_n)
\le\tfrac12\,\|\tilde w-\tilde v_n\|^2+\gamma\Psi_\delta(\tilde w,\hat v_n)-\tfrac12\,\|\tilde w-\tilde v_{n+1}\|^2.
\tag{38}
$$
Selecting $\tilde w=\tilde v_{n+1}$ in (37) and $\tilde w=\hat v_n$ in (38), we obtain
$$
\tfrac12\,\|\hat v_n-\tilde v_n\|^2+\gamma\Psi_\delta(\hat v_n,\tilde v_n)
\le\tfrac12\,\|\tilde v_{n+1}-\tilde v_n\|^2+\gamma\Psi_\delta(\tilde v_{n+1},\tilde v_n)-\tfrac12\,\|\tilde v_{n+1}-\hat v_n\|^2,
\tag{39}
$$
$$
\tfrac12\,\|\tilde v_{n+1}-\tilde v_n\|^2+\gamma\Psi_\delta(\tilde v_{n+1},\hat v_n)
\le\tfrac12\,\|\hat v_n-\tilde v_n\|^2+\gamma\Psi_\delta(\hat v_n,\hat v_n)-\tfrac12\,\|\hat v_n-\tilde v_{n+1}\|^2.
\tag{40}
$$
Adding (39) and (40) and using (34) for
$$
\tilde w+h=\tilde v_{n+1},\quad \tilde w=\hat v_n,\quad \tilde v+g=\tilde v_n,\quad \tilde v=\hat v_n,\quad
h=\tilde v_{n+1}-\hat v_n,\quad g=\tilde v_n-\hat v_n,
$$
we finally conclude that
$$
\|\tilde v_{n+1}-\hat v_n\|^2
\le\gamma\big[\Psi_\delta(\tilde v_{n+1},\tilde v_n)-\Psi_\delta(\hat v_n,\tilde v_n)\big]
-\gamma\big[\Psi_\delta(\tilde v_{n+1},\hat v_n)-\Psi_\delta(\hat v_n,\hat v_n)\big]
\le\gamma C_0\,\|\tilde v_{n+1}-\hat v_n\|\,\|\tilde v_n-\hat v_n\|.
$$