
From Competition to Coopetition: Stackelberg Equilibrium in Multi-user Power Control Games

Yi Su and Mihaela van der Schaar Department of Electrical Engineering, UCLA

Abstract: This paper considers the problem of how to allocate power among competing users sharing a frequency-selective interference channel. We model the interaction between these selfish users as a non-cooperative game. We study how a foresighted user, who knows the channel state information and response strategies of its competing users, should optimize its own transmission strategy. To characterize this multiuser interaction, the Stackelberg equilibrium is introduced. We start by analyzing in detail a simple two-user scenario, where the foresighted user can determine its optimal transmission strategy by solving a bi-level program which allows it to account for the myopic user's response strategies. Therefore, the competition among users is transformed into a cooperative competition (coopetition), since the foresighted user will avoid interfering with the myopic user. Since the optimal solution is computationally prohibitive, we propose a low-complexity algorithm based on Lagrangian duality theory. Numerical simulations illustrate that, if a foresighted user has the necessary information about its competitor, the resulting coopetition will benefit both users. Possible methods to acquire the required information and to extend the formulation to more than two users are also discussed.

I. INTRODUCTION

The multi-user power control problem in frequency-selective interference channels was investigated from the game-theoretic perspective in several prior works, including [1]-[6]. In these multi-user wideband power control games, users are modeled as players having individual goals and strategies. They compete or cooperate with each other until they agree on an acceptable resource allocation. Existing research can be categorized into two types, non-cooperative games and cooperative games.

First, the formulation of the multi-user wideband power control problem as a non-cooperative game has appeared in several recent works [1] [2]. An iterative water-filling (IW) algorithm was proposed to mitigate the mutual interference and optimize the performance without the need for a central controller [1]. At every decision stage, selfish users deploying this algorithm try to maximize their achievable rates by water-filling across the whole frequency band until a Nash equilibrium is reached. Alternatively, self-enforcing protocols are studied in the non-cooperative scenario, in which incentive compatible allocations are guaranteed [2]. By imposing punishments in the case of misbehavior and enforcing users to cooperate, efficient, fair, and incentive compatible spectrum sharing is shown to be possible.

Second, there have also been a number of related works studying dynamic spectrum management (DSM) in the setting of cooperative games [3]-[6]. Two (near-) optimal but centralized DSM algorithms, the Optimal Spectrum Balancing (OSB) algorithm and the Iterative Spectrum Balancing (ISB) algorithm, were proposed to solve the problem of maximization of a weighted rate-sum across all users [4] [5]. OSB has an exponential complexity in the number of users. ISB only has a quadratic complexity in the number of users because it implements the optimization in an iterative fashion. An autonomous spectrum balancing (ASB) technique is proposed to achieve near-optimal performance autonomously, without real-time explicit information exchanges [6]. These works focus on cooperative games, because it is well-known that the IW algorithm may lead to Pareto-inefficient solutions [7], i.e. selfishness is detrimental in the interference channel.

In short, previous research mainly concentrates on studying the existence and performance of Nash equilibrium in non-cooperative games and developing efficient algorithms to approach the Pareto boundary in cooperative games. However, an important intrinsic dimension of this decentralized multi-user interaction still remains unexplored. Prior research does not consider the users' availability of information about other users and their potential to improve their performance when having this information. Hence, the best strategy of a selfish user that knows how the competing users respond to interference still needs to be determined. Moreover, it still needs to be established whether such strategies can lead to a better performance than adopting the IW algorithm. It is important to look at these scenarios in order to assess the significance of information availability in terms of its impact on the users' performance in non-cooperative games, and show why selfish users have incentives to learn their environment and adapt their rational response strategies [8]. Intuitively, a "clever" user with more information in this non-cooperative game should be able to gain additional benefits [9].

Throughout this paper, we differentiate two types of selfish users based on their response strategies:

1) Myopic user: A user that always acts to maximize its immediate achievable rate. It is myopic in the sense that it treats other users' actions as fixed, ignores the dependence between its competitors' actions and its own action, and determines its response so as to maximize its immediate payoff.

2) Foresighted user: A user that selects its transmission action by considering the long-term impacts on its performance. It anticipates how the others will react, and maximizes its performance by considering their reactions. It should be highlighted that additional information is required to assist the foresighted user in its decision making.

As opposed to previous approaches considering myopic users [1], the best response strategy of a user that knows its myopic opponents' private information, including their channel state information and power constraints, was investigated in [11] using the Stackelberg equilibrium (SE) formulation. The foresighted behavior was formulated as a bi-level programming problem. It was shown that, surprisingly, a foresighted user playing the SE can improve both its own performance and the performance of all the other users. However, the solution proposed in [11] is heuristic, and it was derived by simply examining the necessary optimality conditions.

In this paper, we analyze the computational complexity of the Stackelberg equilibrium, and show that the optimal solution is computationally intractable. Inspired by the ISB algorithm [5], we provide a low-complexity algorithm based on Lagrangian duality theory. We also discuss how the strategic users can obtain the required information and how the problem can be extended to the general multi-user scenario.

The rest of the paper is organized as follows. Section II presents the non-cooperative game model and introduces the concept of Stackelberg equilibrium. In Section III, using a simple two-user example, we formulate the foresighted user's optimal decision making as a bi-level programming problem and discuss the computational complexity of its optimal solution. Section IV proposes a low-complexity dual-based approach and provides the simulation results. Section IV also discusses how the required information can be obtained by the strategic users and the problem formulation in the general multi-user case. Conclusions are drawn in Section V.

II. SYSTEM MODEL

In this section, we describe the mathematical model of the frequency-selective interference channel and formulate the non-cooperative multi-user power control game. We introduce the concept of Stackelberg equilibrium and prove the existence of this equilibrium in the power control game.

A. System Description

Fig. 1 illustrates a frequency-selective Gaussian interference channel model. There are $K$ transmitters and $K$ receivers in the system. Each transmitter and receiver pair can be viewed as a player (or user). The whole frequency band is divided into $N$ frequency bins. In frequency bin $f$, the channel gain from transmitter $i$ to receiver $j$ is denoted as $H_{ij}^f$, where $f = 1, 2, \cdots, N$. Similarly, denote the noise power spectral density (PSD) that receiver $k$ experiences as $\sigma_k^f$ and player $k$'s transmit PSD as $P_k^f$. For user $k$, the transmit PSD is subject to its power constraint:

$$\sum_{f=1}^{N} P_k^f \le P_k^{\max}. \tag{1}$$

Fig. 1. Gaussian interference channel model.

Define $\mathbf{P}_k = \{P_k^1, P_k^2, \cdots, P_k^N\}$ as user $k$'s power allocation pattern. For a fixed $\mathbf{P}_k$, if treating interference as noise, user $k$ can achieve the following data rate:

$$R_k = \sum_{f=1}^{N} \log_2\left(1 + \frac{P_k^f |H_{kk}^f|^2}{\sigma_k^f + \sum_{j \ne k} P_j^f |H_{jk}^f|^2}\right). \tag{2}$$

To fully capture the performance tradeoff in the system, the concept of a rate region is defined as

$$\mathcal{R} = \{(R_1, \cdots, R_K) : \exists\, (\mathbf{P}_1, \cdots, \mathbf{P}_K) \text{ satisfying (1) and (2)}\}. \tag{3}$$

Due to the non-convexity in the capacity expression as a function of power allocations, the computational complexity of optimal solutions (e.g., doing exhaustive search) in finding the rate region is prohibitively high. Existing works [4]-[6] aim to compute the Pareto boundary of this rate region and provide (near-) optimal performance with moderate complexity. Moreover, it is noted that cooperation among users is indispensable for this multi-user system to operate at the Pareto boundary.

On the other hand, the interference channel can also be modeled as a non-cooperative game among multiple competing users. Instead of solving the optimization problem globally, the IW algorithm models the users as myopic decision makers [1]. This means that they optimize their transmit PSD by water-filling and compete to increase their transmission data rates with the sole objective of maximizing their own performance regardless of the coupling among users. Under a wide range of realistic channel conditions [1] [14], the existence and uniqueness of the competitive optimal point (Nash equilibrium) is demonstrated and it can be obtained by the IW algorithm, which significantly outperforms the static spectrum management algorithms.

Throughout this paper, we also concentrate on the non-cooperative game setting. In the IW algorithm, users are assumed to be myopic, i.e., they update actions shortsightedly without considering the long-term impacts of taking these actions. We argue that the myopic behavior can be further improved because it neglects the coupling nature of players' actions and payoffs. In contrast with previous approaches, we study the problem of how a foresighted user should behave rather than taking myopic actions. This investigation provides some insight into the following question: why should a strategic user sense its environment and learn the response strategies of its competitors and, consequently, what is the benefit that a foresighted user can achieve compared with the myopic case?

To illustrate the foresighted behavior, Fig. 2 shows a simple Stackelberg game [12]. Note that in this game, the row player has a strictly dominant strategy [13], Down. Therefore, the two players will end up with a (Down, Left) play if the row player is myopic. However, if the row player is aware of the column player's coupled reaction, they will end up with an (Up, Right) play, which leads to an increased payoff for both players. It is worth noticing that additional information is needed to attain this performance improvement. The row player needs to know the payoff and the response strategy of the column player. To formulate how a strategic user can take foresighted actions, we introduce the concept of Stackelberg equilibrium. The next subsection will define the Stackelberg equilibrium and show its existence in the power control game.

          Left    Right
Up        1, 0    3, 2
Down      2, 1    4, 0

Fig. 2. Stackelberg game: the row player's payoff is given first in each cell, with the column player's payoff following.

B. Stackelberg Equilibrium

Let $G = [\mathcal{K}, \{\mathcal{A}_k\}, \{U_k\}]$ represent a game where $\mathcal{K} = \{1, \cdots, K\}$ is the set of players, $\mathcal{A}_k$ is the set of actions available to user $k$, and $U_k$ is user $k$'s payoff [13]. In the power control game, user $k$'s payoff $U_k$ is its achievable data rate $R_k$ and its action set $\mathcal{A}_k$ is the set of transmit PSDs satisfying constraint (1). Recall that the Nash equilibrium is defined to be any $(a_1^*, \cdots, a_K^*)$ satisfying

$$U_k(a_k^*, a_{-k}^*) \ge U_k(a_k, a_{-k}^*) \tag{4}$$

for all $a_k \in \mathcal{A}_k$ and $k = 1, \cdots, K$, where $a_{-k}^* = (a_1^*, \cdots, a_{k-1}^*, a_{k+1}^*, \cdots, a_K^*)$ [13].

We also define the action $a_k^*$ to be a best response (BR) to actions $a_{-k}$ if

$$U_k(a_k^*, a_{-k}) \ge U_k(a_k, a_{-k}), \quad \forall a_k \in \mathcal{A}_k. \tag{5}$$

The set of user $k$'s best responses to $a_{-k}$ is denoted as $BR_k(a_{-k})$.

The Stackelberg equilibrium is a solution concept originally defined for the cases where a hierarchy of actions exists between users [13]. Only one player is the leader and the other ones are followers. The leader begins the game by announcing its action. Then, the followers react to the leader's action. The Stackelberg equilibrium prescribes an optimal strategy for the leader if its followers always react by playing their Nash equilibrium strategies in the smaller sub-game. For example, in a two-player game, where user 1 is the leader and user 2 is the follower, an action $a_1^*$ is the Stackelberg equilibrium strategy for user 1 if

$$U_1(a_1^*, BR_2(a_1^*)) \ge U_1(a_1, BR_2(a_1)), \quad \forall a_1 \in \mathcal{A}_1. \tag{6}$$

For example, in Fig. 2, Up is the Stackelberg equilibrium strategy for the row player.

Next, we define Stackelberg equilibrium in the general case. Let $NE(a_k)$ be the Nash equilibrium strategy of the remaining players if player $k$ chooses to play $a_k$, i.e. $NE(a_k) = a_{-k}$, $\forall a_i = BR_i(a_{-i})$, $a_i \in \mathcal{A}_i$, $i \ne k$. The strategy profile $(a_k^*, NE(a_k^*))$ is a Stackelberg equilibrium with user $k$ leading iff

$$U_k(a_k^*, NE(a_k^*)) \ge U_k(a_k, NE(a_k)), \quad \forall a_k \in \mathcal{A}_k. \tag{7}$$

The existence of Stackelberg equilibrium in the power control game has been shown in [11]. Specifically, for the two-user game, the SE strategy can be derived by solving a bi-level program. However, the algorithm proposed in [11] to find the SE is heuristic and it cannot be easily extended to scenarios in which multiple myopic users are present in the network. The following section will revisit the bi-level formulation and analyze the properties and computational complexity of its optimal solution based on Lagrangian duality theory.

III. PROBLEM FORMULATION

In this section, we study how to achieve the Stackelberg equilibrium in the two-user case, and formulate the foresighted behavior as a bi-level programming problem. We analyze the computational complexity of the optimal solution, and show that the optimum is computationally intractable for the bi-level program. We start from the simplest two-user version, because it is illustrative for understanding the interactions emerging among competing users. The extension to the multi-user case will be discussed in Section IV.

A. A Bi-level Programming Formulation

The Stackelberg equilibrium applied to the two-user power control game can be represented by a bi-level mathematical problem [15], in which the foresighted user acts as the leader and the other user behaves as the follower. The leader chooses a transmit PSD to maximize its own benefits by considering the response of its follower, who reacts to the leader's transmit PSD by water-filling over the entire frequency band. Hence, the Stackelberg equilibrium can be found by solving the following optimization problem:

$$
\begin{aligned}
\text{upper-level problem:}\quad & \max_{\mathbf{P}_1} \sum_{f=1}^{N} \log_2\left(1 + \frac{P_1^f}{N_1^f + \alpha_2^f P_2^f}\right) && \text{(a)}\\
& \text{s.t. } \sum_{f=1}^{N} P_1^f \le P_1^{\max},\ P_1^f \ge 0, && \text{(b)}\\
\text{lower-level problem:}\quad & \mathbf{P}_2 = \arg\max_{\mathbf{P}_2'} \sum_{f=1}^{N} \log_2\left(1 + \frac{P_2^{\prime f}}{N_2^f + \alpha_1^f P_1^f}\right) && \text{(c)}\\
& \text{s.t. } P_2^{\prime f} \ge 0,\ \sum_{f=1}^{N} P_2^{\prime f} \le P_2^{\max}, && \text{(d)}
\end{aligned}
\tag{8}
$$

where $N_1^f = \sigma_1^f / |H_{11}^f|^2$, $\alpha_1^f = |H_{12}^f|^2 / |H_{22}^f|^2$, $N_2^f = \sigma_2^f / |H_{22}^f|^2$, and $\alpha_2^f = |H_{21}^f|^2 / |H_{11}^f|^2$. The sub-problem in (8.a)-(8.b) is called the upper-level problem and (8.c)-(8.d) corresponds to the lower-level problem. Recall that additional information is indispensable to formulate this bi-level program. This information includes the other user's channel condition $N_2^f$ and $\alpha_2^f$, maximum power constraint $P_2^{\max}$, and its response strategy, i.e. the IW algorithm. By letting $\mathbf{P}_1$ and $\mathbf{P}_2$ be the transmit PSDs of the IW algorithm, $\mathbf{P}_1^{NE}$ and $\mathbf{P}_2^{NE}$, we can see that the Nash equilibrium actually gives a lower bound of the problem in (8). Furthermore, by including the opponent's reaction into the lower-level problem, the user can avoid the myopic IW approach and potentially improve its performance. In addition, as we will show later, user 1's foresightedness turns out to even improve the myopic user's performance. Now we make several illustrative remarks by showing two simple examples.

Remark 1: The Nash equilibrium achieved by the IW algorithm may not solve the bi-level program (8). In other words, there exist other feasible power allocation schemes that can attain strictly better performance than that of the Nash equilibrium.

Example 1: We consider a two-user system with the parameters $N = 2$, $N_1^1 = N_2^2 = 4$, $N_1^2 = N_2^1 = 1$, $\alpha_i^f = 0.5$ for $\forall i, f$, $P_1^{\max} = P_2^{\max} = 10$. In this simple two-channel scenario, it is easy to derive that $R_1 = \log_2[1 + P_1^1/(8.5 - 0.25 P_1^1)] + \log_2[1 + (10 - P_1^1)/(1.5 + 0.25 P_1^1)]$ bits. Because $\partial R_1 / \partial P_1^1 < 0$, $R_1$ is maximized when $P_1^1 = 0$. The achievable rates attained at the Stackelberg equilibrium are $R_1^{SE} \approx 2.939$ bits and $R_2^{SE} \approx 3.474$ bits. The unique Nash equilibrium is reached by $\mathbf{P}_1^{NE} = \{2, 8\}$ and $\mathbf{P}_2^{NE} = \{8, 2\}$ and its achievable rates are $R_1^{NE} = R_2^{NE} \approx 2.644$ bits.

Remark 2: For some channel realizations, the Nash strategy solves the problem (8). If $\alpha_i^f = 0$ for $\forall i, f$, the upper-level and lower-level problems in bi-level program (8) are reduced to two uncoupled problems and the single-user water-filling solution can achieve the upper bound in (??). In addition, we give a non-trivial example in which $\alpha_i^f \ne 0$ for $\forall i, f$ and the Nash strategy still solves the problem in (8).

Example 2: Set the parameters $N_1^1, N_2^2$ in Example 1 to be 6, and keep the remaining ones unchanged. We have $R_1 = \log_2[1 + P_1^1/(11 - 0.25 P_1^1)] + \log_2[1 + (10 - P_1^1)/(1 + 0.25 P_1^1)]$ bits. In this channel realization, the Nash equilibrium coincides with the Stackelberg equilibrium. Both equilibria are reached at $\mathbf{P}_1^{NE} = \{0, 10\}$ and $\mathbf{P}_2^{NE} = \{10, 0\}$ and the resulting rates are $R_1 = R_2 \approx 3.459$ bits.

Remark 3: As opposed to the narrow-band case [16], we would like to highlight that the degrees of freedom in allocating the power across multiple bands are essential for the foresighted user to improve its performance. Consider the single-band case in which $N = 1$. Note that user $i$'s achievable rate $R_i$ is monotonically increasing in its transmitted power $P_i$. If users selfishly maximize their achievable rates, all of them will transmit at their maximum power in the single band, which results in the unique Nash equilibrium. It is easy to check that it is also the unique Stackelberg equilibrium and it is also Pareto efficient.

Although these examples provide some intuition about the relationship between NE and SE, we are still interested in computing the Stackelberg equilibrium in general scenarios. The following subsection will reformulate the bi-level program into a single-level problem, which helps us to understand the computational complexity of the Stackelberg equilibrium in the multi-user power control games.

B. An Exact Single-level Reformulation

Bi-level programming problems belong to the mathematical programs having optimization problems as constraints. It is well-known that they are intrinsically difficult to solve [15]. To understand the computational complexity, we first transform the original bi-level program into a single-level reformulation with the form of

$$
\max_{\mathbf{P}_1} \sum_{f=1}^{N} \log_2\left(1 + \frac{P_1^f}{N_1^f + \alpha_2^f\, g_2^f(\mathbf{P}_1, \mathbf{N}_2, \boldsymbol{\alpha}_1, P_2^{\max})}\right)
\quad \text{s.t. } \sum_{f=1}^{N} P_1^f \le P_1^{\max},\ P_1^f \ge 0,
\tag{9}
$$

in which $g_2^f(\mathbf{P}_1, \mathbf{N}_2, \boldsymbol{\alpha}_1, P_2^{\max})$ is a function that determines user 2's allocated power in the $f$th channel, $\mathbf{N}_2 = \{N_2^1, N_2^2, \cdots, N_2^N\}$, and $\boldsymbol{\alpha}_1 = \{\alpha_1^1, \alpha_1^2, \cdots, \alpha_1^N\}$.

Note that the lower-level problem in (8) is a standard convex programming problem. Its optimum is given by $P_2^f = g_2^f(\mathbf{P}_1, \mathbf{N}_2, \boldsymbol{\alpha}_1, P_2^{\max}) = (K_2 - N_2^f - \alpha_1^f P_1^f)^+$, where $K_2$ is a constant that satisfies $\sum_{f=1}^{N} P_2^f = P_2^{\max}$. In practice, $K_2$ is usually obtained using numerical (e.g. bisection) methods. In fact, an explicit expression of $g_2^f(\mathbf{P}_1, \mathbf{N}_2, \boldsymbol{\alpha}_1, P_2^{\max})$ is needed to analytically handle the single-level formulation. Towards this end, we first define a permutation $\pi : \{1, 2, \cdots, N\} \to \{1, 2, \cdots, N\}$, which ranks all the channels based on their noise plus interference PSDs and satisfies

$$\pi(f_1) < \pi(f_2), \quad \text{if } N_2^{f_1} + \alpha_1^{f_1} P_1^{f_1} < N_2^{f_2} + \alpha_1^{f_2} P_1^{f_2}. \tag{10}$$

Then, we can extend the results in [17], and have the closed-form expression in equation (11), where $k$ can be found according to the condition in inequality (12). We can see that the function $g_2^f(\mathbf{P}_1, \mathbf{N}_2, \boldsymbol{\alpha}_1, P_2^{\max})$ ranks all the frequency channels based on the channel conditions and gradually increases the water-level until the maximal power constraint is satisfied.

Even though we have the closed-form expression of $g_2^f(\mathbf{P}_1, \mathbf{N}_2, \boldsymbol{\alpha}_1, P_2^{\max})$, the single-level problem (9) is still intractable due to its non-convexity. Generally speaking, the global optimum can only be found via an exhaustive search. If we define the granularity in the foresighted user's transmit power as $\Delta P$, then the value of $P_1^f$ can be limited to the set $\{0, \Delta P, \cdots, P_1^{\max}\}$. By searching all the possible combinations, the optimum can be found. Hence, such an exhaustive search in $(P_1^1, \cdots, P_1^N)$ has an overall complexity of $O((P_1^{\max}/\Delta P)^N)$.

Recently, Lagrangian duality theory has been successfully used to solve non-convex weighted sum-rate maximization in the interference channel with moderate computational complexity [4]-[6]. We notice that the problem in (9) is similar to the problems investigated in these works in that the optimization variables $\mathbf{P}_1$ also appear in the denominators of the objective function. The following sections will revisit these dual approaches and show that these methods cannot reduce the computational complexity of problem (9), thereby demonstrating the challenges involved in optimally computing the Stackelberg equilibrium.

$$
g_2^f(\mathbf{P}_1, \mathbf{N}_2, \boldsymbol{\alpha}_1, P_2^{\max}) =
\begin{cases}
\dfrac{1}{k}\left(P_2^{\max} + \displaystyle\sum_{m=1}^{k}\left(N_2^{\pi^{-1}(m)} + \alpha_1^{\pi^{-1}(m)} P_1^{\pi^{-1}(m)}\right)\right) - N_2^f - \alpha_1^f P_1^f, & \pi(f) \le k,\\[2mm]
0, & \pi(f) > k,
\end{cases}
\tag{11}
$$

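$$
\begin{aligned}
& k\left(N_2^{\pi^{-1}(k)} + \alpha_1^{\pi^{-1}(k)} P_1^{\pi^{-1}(k)}\right) - \sum_{m=1}^{k}\left(N_2^{\pi^{-1}(m)} + \alpha_1^{\pi^{-1}(m)} P_1^{\pi^{-1}(m)}\right) < P_2^{\max} \\
& \quad \le (k+1)\left(N_2^{\pi^{-1}(k+1)} + \alpha_1^{\pi^{-1}(k+1)} P_1^{\pi^{-1}(k+1)}\right) - \sum_{m=1}^{k+1}\left(N_2^{\pi^{-1}(m)} + \alpha_1^{\pi^{-1}(m)} P_1^{\pi^{-1}(m)}\right).
\end{aligned}
\tag{12}
$$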
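The closed form in (11) and (12) amounts to sorting the follower's noise-plus-interference floors and raising a common water level until the power budget is used up. The following is a minimal Python sketch under that reading. The function names `follower_waterfill` and `rate_bits` are illustrative and do not come from the paper, and the numbers are the two-bin parameters of Example 1.

```python
import math

def follower_waterfill(p1, n2, a1, p2_max):
    """User 2's water-filling response g_2 to the leader's PSD p1, following (11)-(12)."""
    # Effective floor seen by user 2 in each bin: N_2^f + alpha_1^f * P_1^f.
    floor = [n + a * p for n, a, p in zip(n2, a1, p1)]
    # order[m-1] plays the role of pi^{-1}(m): bins sorted by increasing floor, as in (10).
    order = sorted(range(len(floor)), key=lambda f: floor[f])
    level, prefix = 0.0, 0.0
    for m, f in enumerate(order, start=1):
        trial_prefix = prefix + floor[f]
        trial_level = (p2_max + trial_prefix) / m
        if trial_level <= floor[f]:
            break  # left inequality of (12) fails: this bin and all worse bins stay empty
        level, prefix = trial_level, trial_prefix
    return [max(level - x, 0.0) for x in floor]

def rate_bits(p_own, n_own, a_cross, p_other):
    """Achievable rate with interference treated as noise, in the normalized form of (8)."""
    return sum(math.log2(1.0 + p / (n + a * q))
               for p, n, a, q in zip(p_own, n_own, a_cross, p_other))

# Example 1 parameters: N = 2 bins, alpha = 0.5 everywhere, both power budgets equal to 10.
N1, N2 = [4.0, 1.0], [1.0, 4.0]
A1, A2 = [0.5, 0.5], [0.5, 0.5]
P_MAX = 10.0

p1_se = [0.0, 10.0]                                # leader's allocation from Example 1
p2_se = follower_waterfill(p1_se, N2, A1, P_MAX)   # follower's reaction
r1_se = rate_bits(p1_se, N1, A2, p2_se)
r2_se = rate_bits(p2_se, N2, A1, p1_se)

p1_ne = [2.0, 8.0]                                 # Nash allocation from Example 1
p2_ne = follower_waterfill(p1_ne, N2, A1, P_MAX)
r1_ne = rate_bits(p1_ne, N1, A2, p2_ne)

print(p2_se, round(r1_se, 3), round(r2_se, 3))
print(p2_ne, round(r1_ne, 3))
```

Sorting once and scanning the prefix sums avoids the bisection on $K_2$ mentioned in the text, at the cost of an $O(N \log N)$ sort per evaluation of the follower's response.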

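$$
\begin{aligned}
\max_{\mathbf{P}_1, \mathbf{P}_2}\quad & \omega \sum_{f=1}^{N} \log_2\left(1 + \frac{P_1^f}{N_1^f + \alpha_2^f P_2^f}\right) + (1 - \omega) \sum_{f=1}^{N} \log_2\left(1 + \frac{P_2^f}{N_2^f + \alpha_1^f P_1^f}\right) \\
\text{s.t.}\quad & \sum_{f=1}^{N} P_1^f \le P_1^{\max},\ P_1^f \ge 0,\quad \sum_{f=1}^{N} P_2^f \le P_2^{\max},\ P_2^f \ge 0,
\end{aligned}
\tag{13}
$$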

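$$
L(\mathbf{P}_1, \mathbf{P}_2, \lambda_1, \lambda_2) = \sum_{f=1}^{N} \left\{ \omega \log_2\left(1 + \frac{P_1^f}{N_1^f + \alpha_2^f P_2^f}\right) + (1 - \omega) \log_2\left(1 + \frac{P_2^f}{N_2^f + \alpha_1^f P_1^f}\right) - \lambda_1 P_1^f - \lambda_2 P_2^f \right\},
\tag{14}
$$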
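For fixed multipliers, the Lagrangian in (14) is a sum of per-bin terms, so it can be maximized one frequency bin at a time. A rough Python sketch of this decomposition on a discrete power grid is given here. The grid, the Example 1 style parameters, and all helper names are assumptions made for illustration. The constant terms $\lambda_k P_k^{\max}$, which (14) does not show, are added so that the resulting dual value can be compared with a brute-force search of the primal problem (13) on the same grid.

```python
import itertools
import math

def bin_term(x, y, f, w, n1, n2, a1, a2):
    """Weighted rate of bin f in (13) when user 1 sends x and user 2 sends y."""
    r1 = math.log2(1.0 + x / (n1[f] + a2[f] * y))
    r2 = math.log2(1.0 + y / (n2[f] + a1[f] * x))
    return w * r1 + (1.0 - w) * r2

def dual_value(lam1, lam2, w, n1, n2, a1, a2, p1_max, p2_max, grid):
    """Grid approximation of the dual function: one independent 2-D search per bin."""
    total = lam1 * p1_max + lam2 * p2_max  # assumed constant terms, not shown in (14)
    for f in range(len(n1)):
        total += max(bin_term(x, y, f, w, n1, n2, a1, a2) - lam1 * x - lam2 * y
                     for x in grid for y in grid)
    return total

def primal_brute_force(w, n1, n2, a1, a2, p1_max, p2_max, grid):
    """Exhaustive search of (13) over all feasible grid allocations (exponential in N)."""
    bins = range(len(n1))
    best = -math.inf
    for p1 in itertools.product(grid, repeat=len(n1)):
        if sum(p1) > p1_max:
            continue
        for p2 in itertools.product(grid, repeat=len(n1)):
            if sum(p2) > p2_max:
                continue
            value = sum(bin_term(p1[f], p2[f], f, w, n1, n2, a1, a2) for f in bins)
            best = max(best, value)
    return best

# Illustrative two-bin setup (normalized parameters of Example 1) on a unit power grid.
N1, N2 = [4.0, 1.0], [1.0, 4.0]
A1, A2 = [0.5, 0.5], [0.5, 0.5]
GRID = [float(i) for i in range(11)]
W = 0.5

primal = primal_brute_force(W, N1, N2, A1, A2, 10.0, 10.0, GRID)
duals = [dual_value(l1, l2, W, N1, N2, A1, A2, 10.0, 10.0, GRID)
         for l1, l2 in [(0.0, 0.0), (0.05, 0.05), (0.2, 0.1)]]
print(primal, duals)
```

With the constants included, every dual value is an upper bound on the brute-force optimum. The per-bin search visits each bin's grid once, while the primal search enumerates all joint allocations across bins.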

C. Lagrangian Dual Approach for Non-convex Problems

We continue studying the simple two-user scenario to introduce the dual method. In a two-user frequency-selective interference channel, the weighted sum-rate maximization investigated in [4]-[6] is given by problem (13), in which $\omega \in [0, 1]$ is a fixed weight. The dual method forms the Lagrangian in equation (14), where $\lambda_1, \lambda_2 \ge 0$ are Lagrangian dual variables. The Lagrangian dual function is defined as

$$D(\lambda_1, \lambda_2) = \max_{\mathbf{P}_1, \mathbf{P}_2 \succeq 0} L(\mathbf{P}_1, \mathbf{P}_2, \lambda_1, \lambda_2). \tag{15}$$

Denote the objective function of problem (13) as $f(\mathbf{P}_1, \mathbf{P}_2)$. The overall complexity of an exhaustive search is $O((\prod_i (P_i^{\max}/\Delta P))^N)$. From optimization theory [18], we know that, for arbitrary feasible $\mathbf{P}_1, \mathbf{P}_2$, we have $f(\mathbf{P}_1, \mathbf{P}_2) \le D(\lambda_1, \lambda_2)$. This leads to $\min_{\lambda_1, \lambda_2} D(\lambda_1, \lambda_2) \ge \max_{\mathbf{P}_1, \mathbf{P}_2} f(\mathbf{P}_1, \mathbf{P}_2)$, and $\min_{\lambda_1, \lambda_2} D(\lambda_1, \lambda_2)$ provides an upper bound of the optimal value of the problem in (13). Generally speaking, if $f(\mathbf{P}_1, \mathbf{P}_2)$ is non-convex, the duality gap $\min_{\lambda_1, \lambda_2} D(\lambda_1, \lambda_2) - \max_{\mathbf{P}_1, \mathbf{P}_2} f(\mathbf{P}_1, \mathbf{P}_2)$ is not zero.

Fig. 3 summarizes the three key steps of a dual method, the OSB algorithm [4] [5], that can efficiently find the global optimum of the problem in (13). First, for fixed $\lambda_1, \lambda_2$, the maximization of $L(\mathbf{P}_1, \mathbf{P}_2, \lambda_1, \lambda_2)$ over $\mathbf{P}_1, \mathbf{P}_2$ in (15) is decomposed into $N$ uncoupled sub-problems, each of which corresponds to a per-bin optimization. Therefore, the overall complexity of maximizing $L(\mathbf{P}_1, \mathbf{P}_2, \lambda_1, \lambda_2)$ over $\mathbf{P}_1, \mathbf{P}_2$ is only $O(N \prod_i (P_i^{\max}/\Delta P))$. Second, it is shown that, for fixed $k$, the sum power of user $k$'s optimal power allocation in a multicarrier system is a monotonic function of $\lambda_k$ (Lemma 1 in [4]). This property guarantees that the bi-section dual update over $\lambda_1, \lambda_2$ will converge to the dual optimum. Third, it is also proven that, if the number of frequency bins $N$ is large enough and $H_{ij}^f$ and $\sigma_k^f$ are smooth in the spectral domain, the optimization problem (13) satisfies the so-called "time-sharing property" (Theorems 1 and 2 in [5]), and the duality gap of this non-convex problem is zero. Combining the three properties together, the dual approach can find the global optimum with the computational complexity of $O(T_1 N \prod_i (P_i^{\max}/\Delta P))$, where $T_1$ is the number of iterations needed for the dual update. We can see that the complexity of the dual approach is greatly reduced compared with that of the exhaustive search in the primal domain. In addition, it is found in [5] that, if $D(\lambda_1, \lambda_2)$ is approximated using a local maximum of $L(\mathbf{P}_1, \mathbf{P}_2, \lambda_1, \lambda_2)$, the ISB algorithm can achieve near-optimal performance with the computational complexity of $O(T_1 T_2 N \sum_i (P_i^{\max}/\Delta P))$, where $T_2$ is the number of iterations required for evaluating the local maximum.

Fig. 3. Key steps of the dual approach of non-convex weighted sum-rate maximization.

D. The Lagrangian Dual Approach for Computing Stackelberg Equilibrium

Now we apply the dual approach to our problem in (9) to understand why the Stackelberg equilibrium in our considered problem is intrinsically difficult to compute. Fig. 4 summarizes the key properties of the dual approach that will be addressed in the following parts.

Fig. 4. Complexity and properties of the dual approach of computing the Stackelberg equilibrium.

Denote the objective function of problem (9) as $f'(\mathbf{P}_1)$. Consider its dual objective function $D'(\mu)$ for a fixed dual variable $\mu$:

$$D'(\mu) = \max_{\mathbf{P}_1 \succeq 0} L'(\mathbf{P}_1, \mu), \tag{16}$$

f ′(P∗ ) 1 µ axis value of the tangent point monotonically increases. We denote µ∗ = arg min D0 (µ). Recall that Lemma 1 D′(µ) µ∗ µ PN f max does not claim the continuity of f=1 P1 (µ) in µ. It is R1 ′ because the allocated powers in different frequency bins are min D (µ) f max µ Pmax − N Pˆ∗f µ coupled due to function g (P , N ,αα , P ) and the time- ( 111 ∑ f =1 1 ) 2 1 2 1 2 sharing property in [5] is not guaranteed for problem (9). The discontinuity may lead to nonzero duality gap, i.e. at least max f ′(P ) 1 two tangent points exist on the tangent line in Fig. 5 and P1 N f ′ Pˆ ∗ P∗f x y ( 1 ) ∑ 1 f =1 they correspond to different power constraints P1 and P1. If N ˆ∗f xxx max yyy P P111 P111 P1 the duality gap is positive, the following theorem indicates ∑ f =1 1 11 that D0 (µ∗) provides a tighter upper bound of the achievable max Fig. 5. Duality gap for the problem in (9). rate than R1 in (??). Theorem 1: If the duality gap is nonzero, i.e. D0 (µ∗) > 0 µ ¶ max f (P1), the dual optimum provides a tighter upper N P P P f 1 in which L0 (P , µ) = log 1 + 1 +bound of user 1’s maximal achievable rate than the bound 1 2 N f +αf gf P ,N ,αα ,Pmax f=1 1 2 2 ( 1 2 1 2 ) 0 ∗ max ³ ´ in (8), i.e. D (µ ) < R1 . max PN f µ P1 − f=1 P1 . For a given µ, denote the Proof : As shown in Fig. 5, the non-zero duality gap optimal power allocation that maximizes (16) as implies that there exist at least two possible values for 0 f PN f ∗ x y P1 (µ) = arg max L (P1, µ) and P1 (µ) = [P1 (µ)]f , f=1 P1 (µ ) , which are denoted as P1 and P1 and they P1Â−0 x max y satisfy P1 < P1 < P1 . Denote the optimal power The following lemma holds for P1(µ): x y − P allocation of having power constraints P1 and P1 as P1 Lemma 1: N Pf (µ) is monotonic decreasing in + f=1 1 P and P1 respectively. We have equation (17). Moreover, µ. In addition, we have lim N P f (µ) = 0 and since Px < Pmax < Py, there exists 0 < υ < 1 µ→∞ f=1 1 1 1 1 P such that Pmax = υPx + (1 − υ)Py. 
Immediately, we get N Pf (0) = +∞. 1 ¡ ¢ 1 ¡ 1 ¢ f=1 1 D0 (µ∗) = υf 0 P− + (1 − υ) f 0 P+ . It corresponds to PN f 1 1 Proof: It is easy to see that P (0) = +∞. The rest − f=1 1 the time-sharing scenario, in which the power allocation P1 of the proof is the same as in Lemma 1 in [4].¥ + is adopted for time-fraction υ and P1 for time-fraction 1−υ. Fig. 5 gives a graphical illustration of the above Lemma. Consider the problem of allocating user 1’s power subject to Consider a sequence of optimization problems similar with max the maximal power constraint P1 in the interference-free (9). These problems are parameterized by the constraint environment. We know that the optimal solution is the single- imposed over user 1’s maximal sum power. The solid curve max x y ³ ´ user water-filling. Noting that P1 = υP1 + (1 − υ)P1 and PN ∗f 0 ∗ x y in Fig. 5 is a plot of the optimal value P1 , f (P1) P1 6= P , the aforementioned time-sharing strategy is sub- f=1 P 1 as this constraint varies. The curve is plotted with N P ∗f optimal for this problem. Therefore, we have equation (18) f=1 1 and this concludes the proof. ¥ Pon the x-axis. The y-axis is located at the point where N ∗f max By Theorem 1, evaluating the dual function leads to a f=1 P1 = P1 . The intersection of the curve with the 0 tighter upper bound of Stackelberg equilibrium than Rmax. y-axis is the optimum of (9), i.e. max f (P1). For a fixed 1 P 1³P ´ However, it is unfortunate that the computational complexity N ∗f 0 ∗ 0 max N µ, by drawing a tangent line to the f=1 P1 , f (P1) of optimally maximizing L (P1, µ) is still O((P1 /∆P ) ). f max curve and measuring the intersection of this tangent line This is because term g2 (P1, N2,αα1, P2 ) in the denomina- with the y-axis, the value of D0 (µ) can be graphically tor term of (9) is also a function of the allocated power 0 f 0 obtained. 
According to Lemma 1, as µ increases, the x- P1 (f 6= f) , which makes it impossible to decouple the µ ¶ µ ¶ XN XN 0 ∗ 0 ¡ −¢ ∗ max −f 0 ¡ +¢ ∗ max +f D (µ ) = f P + µ P1 − P = f P + µ P1 − P . (17) 1 f=1 1 1 f=1 1

µ Á ¶ XN ¯ ¯2 0 ∗ 0 ¡ −¢ 0 ¡ +¢ −f ¯ f ¯ f D (µ ) = υf P1 + (1 − υ) f P1 ≤ υ log2 1 + P1 ¯H11¯ σ1 f=1 µ Á ¶ µ Á ¶ (18) XN ¯ ¯2 XN ¯ ¯2 +f ¯ f ¯ f f∗ ¯ f ¯ f max + (1 − υ) log2 1 + P1 ¯H11¯ σ1 < log2 1 + P1 ¯H11¯ σ1 = R1 , f=1 f=1

TABLE I maximization in (16) into N independent sub-problems. To USER 1’S COMPUTATIONAL COMPLEXITY FOR DIFFERENT ALGORITHMS. conclude, the complexity of optimal solution in the dual domain is the same as the primal approach, which again Algorithm Computational complexity ± highlights the fact that the Stackelberg equilibrium is difficult max N Exhaustive search O((P1 ∆P±) ) max to compute. Algorithm 1 O(T1T2NP1 ∆P ) Iterative water-filling O(T3N) IV. LOW-COMPLEXITY ALGORITHM,SIMULATIONS, AND EXTENSIONS IW Algorithm In this section, we propose a low-complexity dual algo- 30 Pf 1 25 αf Pf +Nf rithm to search the Stackelberg equilibrium and examine its 2 2 1 achievable performance via extensive numerical simulations. 20 15 PSD

We also discuss how the strategic users can obtain the 10 required information and the extensions to general multi-user 5

0 scenarios. 0 2 4 6 8 10 12 14 16 18 20 frequency bins

A. A Low-Complexity Dual Approach

As we have shown, the dual approach cannot reduce the complexity of finding the global optimum of problem (9). However, inspired by the ISB algorithm [5], we develop an efficient dual approach, which is listed as Algorithm 1. The basic idea of the algorithm is to approximately evaluate $D'(\mu)$ by locally optimizing $L'(\mathbf{P}_1, \mu)$. For fixed $\mu$, the algorithm finds the optimal $P_1^f$ while keeping $P_1^1, \cdots, P_1^{f-1}, P_1^{f+1}, \cdots, P_1^N$ fixed, and changes the index $f$ until it converges to a local maximum of $L'(\mathbf{P}_1, \mu)$. Then the algorithm updates $\mu$ using bi-section search and repeats the procedure above until convergence is achieved.

As discussed in [5], the local optimum depends on the initial starting point and the ordering of iterations. Moreover, the proof of convergence of the whole algorithm becomes an issue. Algorithm 1 sets the Nash equilibrium as the initial starting point. In most of the experimental settings we have tested, Algorithm 1 has been observed to converge to a feasible solution within 10-15 iterations. The computational complexity of this iterative algorithm is only $O(T_1 T_2 N P_1^{max}/\Delta P)$, and it reduces the complexity of the optimal exhaustive search by a factor of $O\left((P_1^{max}/\Delta P)^{N-1}/(T_2 N)\right)$, which is considerably large for small $\Delta P$ and large $N$. Table I summarizes the computational complexity comparison for user 1 if it adopts different algorithms, in which $T_3$ is the number of iterations required in the iterative water-filling algorithm.

B. Illustrative Results

In this sub-section, we evaluate the performance of Algorithm 1 by comparing it with the IW algorithm. We simulate a wireless system with 20 sub-carriers over the 6.25-MHz band. We assume that $P_1^{max} = P_2^{max} = 200$ and $\sigma_1^f = \sigma_2^f = 0.01$. To evaluate the performance, we tested $10^5$ sets of frequency-selective fading channels where a unique Nash equilibrium exists, which are simulated using a four-ray Rayleigh model with the exponential power profile and 160 ns delay between two adjacent rays [19]. The simulated power of each ray decreases exponentially according to its delay. The total power of all rays of $H_{11}^f$ and $H_{22}^f$ is normalized as one, and that of $H_{12}^f$ and $H_{21}^f$ is normalized as 0.5.

Fig. 6 and 7 show the power allocations for both users using different algorithms. In the IW algorithm, each user water-fills the whole frequency band by regarding its competitor's transmit PSD as background noise until the Nash equilibrium is achieved. In contrast, user 1 does not water-fill if it adopts Algorithm 1. For example, in Fig. 6, user 1 allocates a large amount of power in frequency bin 3 even though it can gain an immediate increase in $R_1$ by re-allocating some of its power in the frequency bins 5 and

[Fig. 6. User 1's power allocation using different algorithms. Panels: IW Algorithm, Algorithm 1; x-axis: frequency bins; y-axis: PSD; curves: $P_1^f$ and $\alpha_2^f P_2^f + N_1^f$.]

Algorithm 1: A low-complexity dual approach
1: Input: $P_1^{max}$, $P_2^{max}$, $N_1^f$, $N_2^f$, $\alpha_1^f$, $\alpha_2^f$ for $\forall f$
2: Initialize: $\mathbf{P}_1 = \mathbf{P}_1^{NE}$, $\mu^{max}$, $\mu^{min}$
3: Repeat
4:   $\mu = (\mu^{max} + \mu^{min})/2$.
5:   Repeat
6:     for $f = 1$ to $N$,
7:       set $P_1^f = \arg\max_{P_1^f} \left\{ \sum_{f=1}^{N} \ln\left[ 1 + P_1^f \big/ \left( N_1^f + \alpha_2^f g_2^f(\mathbf{P}_1, \mathbf{N}_2, \boldsymbol{\alpha}_1, P_2^{max}) \right) \right] - \mu P_1^f \right\}$ by keeping $P_1^1, \cdots, P_1^{f-1}, P_1^{f+1}, \cdots, P_1^N$ fixed.
8:     end
9:   until $(P_1^1, \cdots, P_1^N)$ converges
10:  if $\sum_{f=1}^{N} P_1^f > P_1^{max}$, $\mu^{min} = (\mu^{max} + \mu^{min})/2$; else $\mu^{max} = (\mu^{max} + \mu^{min})/2$.
11: until it converges
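The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the myopic user's response $g_2^f$ is modeled as single-user water-filling against noise plus interference, line 7 is solved by a grid search with step `dp`, and the function names (`water_fill`, `iterative_water_filling`, `lagrangian`, `algorithm1`), the bracket `mu_hi`, and the iteration caps are hypothetical choices made here. All inputs are assumed to be floating-point NumPy arrays of length $N$.

```python
import numpy as np


def water_fill(noise, p_max, iters=40):
    """Water-filling: p_f = max(level - noise_f, 0), with sum(p) <= p_max."""
    lo, hi = noise.min(), noise.max() + p_max
    for _ in range(iters):  # bisection on the water level
        level = 0.5 * (lo + hi)
        if np.maximum(level - noise, 0.0).sum() > p_max:
            hi = level
        else:
            lo = level
    return np.maximum(lo - noise, 0.0)


def iterative_water_filling(n1, n2, a1, a2, p1_max, p2_max, iters=50):
    """IW baseline: users water-fill in turn against noise plus interference."""
    p1, p2 = np.zeros_like(n1), np.zeros_like(n2)
    for _ in range(iters):
        p1 = water_fill(n1 + a2 * p2, p1_max)
        p2 = water_fill(n2 + a1 * p1, p2_max)
    return p1, p2


def lagrangian(p1, mu, n1, n2, a1, a2, p2_max):
    """Line-7 objective; user 2's response g_2 is assumed to be water-filling."""
    p2 = water_fill(n2 + a1 * p1, p2_max)
    return np.sum(np.log(1.0 + p1 / (n1 + a2 * p2))) - mu * p1.sum()


def algorithm1(n1, n2, a1, a2, p1_max, p2_max, p1_init,
               dp=1.0, mu_hi=10.0, outer=12, inner=15):
    """Coordinate-wise grid search on P1 inside a bisection on mu."""
    grid = np.arange(0.0, p1_max + dp / 2, dp)  # candidate powers, step dp
    mu_lo = 0.0
    p1 = np.array(p1_init, dtype=float)
    best = np.zeros_like(p1)  # zero power is always feasible
    for _ in range(outer):  # lines 3-4 and 10-11: bisection on mu
        mu = 0.5 * (mu_lo + mu_hi)
        for _ in range(inner):  # lines 5-9: sweep until P1 stops changing
            prev = p1.copy()
            for f in range(p1.size):  # lines 6-8: update one bin at a time
                vals = []
                for cand in grid:
                    p1[f] = cand
                    vals.append(lagrangian(p1, mu, n1, n2, a1, a2, p2_max))
                p1[f] = grid[int(np.argmax(vals))]
            if np.allclose(p1, prev):
                break
        if p1.sum() > p1_max:
            mu_lo = mu  # power budget exceeded: raise the price mu
        else:
            mu_hi = mu  # feasible: keep this allocation and lower mu
            best = p1.copy()
    return best
```

Passing the user-1 allocation returned by `iterative_water_filling` as `p1_init` mirrors line 2 of the listing. The sketch returns the last feasible allocation found during the bisection, and it assumes `mu_hi` is large enough that the allocation at that price satisfies the power constraint.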

[Fig. 7. User 2's power allocation using different algorithms. Panels: IW Algorithm, Algorithm 1; x-axis: frequency bins; y-axis: PSD; curves: $P_2^f$ and $\alpha_1^f P_1^f + N_2^f$.]

[Fig. 8. Cdfs for the ratio of $R_i'/R_i^{NE}$ ($\sum_f |H_{12}^f|^2 = \sum_f |H_{21}^f|^2 = 0.5$). x-axis: Ratio; y-axis: Cumulative Probability; curves: $R_1'/R_1^{NE}$ and $R_2'/R_2^{NE}$.]
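For reference, an empirical cdf of the rate ratio $R_i'/R_i^{NE}$, such as the one plotted in Fig. 8, could be computed from per-realization rates as in the following minimal sketch. The function names are hypothetical, and any input passed to them here is a placeholder rather than the paper's simulation output.

```python
import numpy as np


def empirical_cdf(ratios):
    """Return sorted ratios x and the fraction of samples <= each x."""
    x = np.sort(np.asarray(ratios, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y


def fraction_improved(ratios):
    """Fraction of channel realizations with a ratio strictly above 1."""
    return float(np.mean(np.asarray(ratios, dtype=float) > 1.0))
```

Plotting `y` against `x` gives the cdf curve, and `fraction_improved` corresponds to the share of realizations in which a user's rate exceeds its Nash equilibrium rate.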

6 where the noise plus interference PSD is below its water-levels in the frequency bins 7-12.

Denote user $i$'s achieved rate by deploying Algorithm 1 as $R_i'$. Fig. 8 shows the simulated cumulative distribution functions (cdf) of $R_i'/R_i^{NE}$. From the curve, Algorithm 1 achieves a higher rate for the foresighted user in all the simulated realizations. The average rate improvement that Algorithm 1 provides over the IW algorithm is 38%. In addition, it is surprising to find that, in 95% of the simulation settings, Algorithm 1 also results in a higher rate $R_2'$ than $R_2^{NE}$ for the myopic user, and the average rate improvement is 45%. This is because user 1's Stackelberg strategy mitigates the interference it causes to user 2.

We also simulate the scenarios in which the total power of $H_{12}^f$ and $H_{21}^f$ is normalized as 0.25 and all the other parameters remain the same as above. Fig. 9 shows the simulated cdfs of $R_i'/R_i^{NE}$. The average rate improvement for user 1 is 27% and that of user 2 is 32%. It is intuitive that the average rate improvement decreases when the power of $H_{12}^f$ and $H_{21}^f$ decreases, because the interference coupling between users and the foresighted user's ability to shape the myopic user's response are both reduced.

[Fig. 9. Cdfs for the ratio of $R_i'/R_i^{NE}$ ($\sum_f |H_{12}^f|^2 = \sum_f |H_{21}^f|^2 = 0.25$). x-axis: Ratio; y-axis: Cumulative Probability; curves: $R_1'/R_1^{NE}$ and $R_2'/R_2^{NE}$.]

C. Information Acquisition

Previous sections mentioned that, in order to play the Stackelberg equilibrium, the additional information about the competing user's CSI, maximum power constraint, and power allocation strategy is indispensable. In practice, there are several possible methods to acquire this required information.

First, the myopic user has the incentive to provide the required information, because its performance can be greatly improved if the foresighted player knows the myopic player's private information. In the distributed setting, users can individually decide whether or not to play the Stackelberg strategy based on their computational hardware constraints. The user that wants to behave myopically can reveal its information to the foresighted user. This can be viewed as the user's cooperative behavior to avoid mutual interference.

When no information exchange among users is possible, the alternative way for users to gather this information is through predictive modeling. If the foresighted user strategically changes its power allocation, it can measure and model the resulting interference PSD, i.e. estimate the functional expression of $g_2^f(\mathbf{P}_1, \mathbf{N}_2, \boldsymbol{\alpha}_1, P_2^{max})$, without any information exchange among users. For instance, in [20], we showed that the foresighted user can effectively model its experienced interference as a linear function of its own allocated power, formulate a local approximation of the original bi-level program, and substantially improve both users' achievable rates.

D. Extensions to Multi-user Games

The two-user formulation can be extended to the general cases in which multiple users can be myopic or foresighted. The analysis in these cases becomes much more involved. We denote the number of foresighted users as $n_f$ and the number of myopic users as $n_m$. We briefly address two remaining cases as follows.

In the first case, $n_f = 1$, $n_m > 1$. As in (9), we can still have the single-level formulation given by problem (19), in which $N_i^f = \sigma_i^f / |H_{ii}^f|^2$, $\mathbf{N} = \{ N_i^f : i = 2, \cdots, n_m + 1, f = 1, \cdots, K \}$, $\boldsymbol{\alpha} = \{ \alpha_{ij}^f : i = 1, \cdots, n_m + 1, j = 2, \cdots, n_m + 1, f = 1, \cdots, K \}$, and $q_k^f(\mathbf{P}_1, \mathbf{N}, \boldsymbol{\alpha}, P_2^{max}, \cdots, P_{n_m+1}^{max})$ is the function determining user $k$'s allocated power in channel $f$.

$$\begin{aligned} \max_{\mathbf{P}_1} \ & \sum_{f=1}^{N} \log_2\left( 1 + \frac{P_1^f}{N_1^f + \sum_{k=2}^{n_m+1} \alpha_{k1}^f q_k^f(\mathbf{P}_1, \mathbf{N}, \boldsymbol{\alpha}, P_2^{max}, \cdots, P_{n_m+1}^{max})} \right) \\ \text{s.t.} \ & \sum_{f=1}^{N} P_1^f \le P_1^{max}, \quad P_1^f \ge 0, \end{aligned} \tag{19}$$

As a general form of the two-user case, problem (19) is also non-convex. It is easy to verify that Lemma 1 and Theorem 1 still hold. Although it is difficult to analytically derive $q_k^f(\mathbf{P}_1, \mathbf{N}, \boldsymbol{\alpha}, P_2^{max}, \cdots, P_{n_m+1}^{max})$, we are still able to numerically evaluate it. Hence, Algorithm 1 can be applied in this case by replacing its line 7 with numerically finding local maxima of (20).

$$\sum_{f=1}^{N} \left[ \log_2\left( 1 + \frac{P_1^f}{N_1^f + \sum_{k=2}^{n_m+1} \alpha_{k1}^f q_k^f(\mathbf{P}_1, \mathbf{N}, \boldsymbol{\alpha}, P_2^{max}, \cdots, P_{n_m+1}^{max})} \right) - \mu P_1^f \right] \tag{20}$$

We simulate the three-user scenarios in which $\sum_{f=1}^{N} |H_{ij}^f|^2 = 0.25$ for $i \neq j$, $P_i^{max} = 200$, and $\sigma_i^f = 0.01$, and all the other parameters remain the same as in Section IV.B. Fig. 10 shows the simulated cdfs of $R_i'/R_i^{NE}$. The average rate improvement for user 1 is 34% and that of users 2 and 3 is 10.5%. From Fig. 10, we can see that the Stackelberg strategy also benefits the two myopic users in more than 83% of the channel realizations.

[Fig. 10. Cdfs for the ratio of $R_i'/R_i^{NE}$ ($\sum_f |H_{ij}^f|^2 = 0.25$, $i \neq j$). x-axis: Ratio; y-axis: Cumulative Probability; curves: $R_1'/R_1^{NE}$, $R_2'/R_2^{NE}$, and $R_3'/R_3^{NE}$.]

Assume now that we have multiple foresighted users, i.e. $n_f > 1$, $n_m \ge 1$. In this case, the single objective function in the original upper-level problem disappears and it becomes a multi-objective optimization problem. For these foresighted users, a reasonable outcome is to choose an operating point in the set $\mathcal{R}^{n_f} = \{ (R_1, \cdots, R_{n_f}) : R_i \ge R_i^{NE}, \text{ for all } i = 1, \cdots, n_f \}$, where $R_i^{NE}$ is user $i$'s achievable rate if all the users are myopic. This point can be determined based on the negotiation among the foresighted users. Cooperative game theory provides many solution concepts, e.g. bargaining, for choosing the operating point [13]. Note that the overall game in this scenario is a mixture of cooperation and competition in that the cooperation exists among the foresighted users while myopic players compete with each other. A possible way of achieving the boundary point on $\mathcal{R}^{n_f}$ is to let some coordinator solve the following weighted sum-rate maximization and determine the transmitted PSDs for different foresighted users in problem (21), in which $\omega_i \ge 0$ is user $i$'s weight.

$$\begin{aligned} \max_{\mathbf{P}_1, \cdots, \mathbf{P}_{n_f}} \ & \sum_{i=1}^{n_f} \omega_i \sum_{f=1}^{N} \log_2\left( 1 + \frac{P_i^f}{N_i^f + \sum_{k=n_f+1}^{n_f+n_m} \alpha_{ki}^f q_k^f(\mathbf{P}_1, \cdots, \mathbf{P}_{n_f}, \mathbf{N}, \boldsymbol{\alpha}, P_{n_f+1}^{max}, \cdots, P_{n_f+n_m}^{max})} \right) \\ \text{s.t.} \ & \sum_{f=1}^{N} P_i^f \le P_i^{max}, \quad P_i^f \ge 0, \quad R_i \ge R_i^{NE}, \quad i = 1, \cdots, n_f, \end{aligned} \tag{21}$$

Although this problem is generally difficult to solve optimally, some low-complexity methods similar to Algorithm 1 can be adopted to obtain sub-optimal solutions.

V. CONCLUSION

This paper considers the strategic behavior in determining the transmit PSD for selfish users sharing a frequency-selective interference channel. We adopt the game theoretic concept of Stackelberg equilibrium and model the two-user case as a bi-level programming problem. We show that the Stackelberg equilibrium is intrinsically difficult to compute and propose a low-complexity approach based on Lagrangian dual theory. Numerical results show that the strategic user should avoid the shortsighted Nash strategy and that it can substantially improve both users' performance if it knows the CSI and response strategy of the competing user. Operational methods for acquiring the necessary information and extensions to multi-user scenarios are proposed. Obtaining satisfactory performance with minimal information exchange while multiple foresighted users exist is identified as a problem for further investigation.

REFERENCES

[1] W. Yu, G. Ginis, and J. Cioffi, "Distributed multiuser power control for digital subscriber lines," IEEE J. Sel. Areas Commun., vol. 20, no. 5, pp. 1105-1115, June 2002.
[2] R. Etkin, A. Parekh, and D. Tse, "Spectrum sharing for unlicensed bands," IEEE J. Sel. Areas Commun., vol. 25, no. 3, pp. 517-528, April 2007.
[3] J. Huang, R. Berry, and M. Honig, "Distributed interference compensation for wireless networks," IEEE J. Sel. Areas Commun., vol. 24, no. 5, pp. 1074-1084, May 2006.
[4] R. Cendrillon, W. Yu, M. Moonen, J. Verlinden, and T. Bostoen, "Optimal multiuser spectrum balancing for digital subscriber lines," IEEE Trans. Commun., vol. 54, no. 5, pp. 922-933, May 2006.
[5] W. Yu and R. Lui, "Dual methods for nonconvex spectrum optimization of multicarrier systems," IEEE Trans. Commun., vol. 54, pp. 1310-1322, 2006.
[6] R. Cendrillon, J. Huang, M. Chiang, and M. Moonen, "Autonomous spectrum balancing for digital subscriber lines," IEEE Trans. Signal Processing, vol. 55, no. 8, pp. 4241-4257, Aug. 2007.
[7] O. Popescu, D. Popescu, and C. Rose, "Simultaneous water filling for mutually interfering systems," IEEE Trans. Wireless Commun., vol. 6, no. 3, pp. 1102-1113, Mar. 2007.
[8] S. Haykin, "Cognitive radio: Brain-empowered wireless communications," IEEE J. Sel. Areas Commun., vol. 23, pp. 201-220, 2005.
[9] S. Haykin, "Cognitive dynamic systems," Proc. of IEEE ICASSP 2007, vol. 4, pp. 1369-1372, April 2007.
[10] E. Altman, T. Boulogne, R. El-Azouzi, T. Jimenez, and L. Wynter, "A survey on networking games," Computers and Operations Research, 2004.
[11] Y. Su and M. van der Schaar, "A new look at multi-user power control games," Proc. IEEE ICC, pp. 1072-1076, May 2008.
[12] Y. Shoham, R. Powers, and T. Grenager, "If multi-agent learning is the answer, what is the question?", Artificial Intelligence, vol. 171, no. 7, pp. 365-377, May 2007.
[13] D. Fudenberg and J. Tirole, Game Theory. Cambridge, MA: MIT Press, 1991.
[14] K. W. Shum, K.-K. Leung, and C. W. Sung, "Convergence of iterative waterfilling algorithm for Gaussian interference channels," IEEE J. Sel. Areas Commun., vol. 25, no. 6, pp. 1091-1100, Aug. 2007.
[15] B. Colson, P. Marcotte, and G. Savard, "Bilevel programming: A survey," 4OR: A Quarterly Journal of Operations Research, vol. 3, no. 2, pp. 87-107, 2005.
[16] E. Altman and Z. Altman, "S-modular games and power control in wireless networks," IEEE Trans. Automatic Control, vol. 48, no. 5, pp. 839-842, May 2003.
[17] E. Altman, K. Avrachenkov, and A. Garnaev, "Closed form solutions for symmetric water filling games," Proc. IEEE INFOCOM 2008, pp. 673-681, April 2008.
[18] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[19] T. S. Rappaport, Wireless Communications. Englewood Cliffs, NJ: Prentice-Hall, 1996.
[20] Y. Su and M. van der Schaar, "Conjectural equilibrium in water-filling games," IEEE Trans. Signal Processing, to appear. (available at http://arxiv.org/abs/0811.0048)