Gradient Play in n-Cluster with Zero-Order Information

Tatiana Tatarenko, Jan Zimmermann, Jurgen¨ Adamy

Abstract— We study a distributed approach for seeking a models, each agent intends to find a to achieve a in n-cluster games with strictly monotone Nash equilibrium in the resulting n-cluster , which is mappings. Each player within each cluster has access to the a stable state that minimizes the cluster’s cost functions in current value of her own smooth local cost function estimated by a zero-order oracle at some query point. We assume the response to the actions of the agents from other clusters. agents to be able to communicate with their neighbors in Continuous time algorithms for the distributed Nash equi- the same cluster over some undirected graph. The goal of libria seeking problem in multi-cluster games were proposed the agents in the cluster is to minimize their collective cost. in [17]–[19]. The paper [17] solves an unconstrained multi- This cost depends, however, on actions of agents from other cluster game by using gradient-based algorithms, whereas the clusters. Thus, a game between the clusters is to be solved. We present a distributed gradient play algorithm for determining works [18] and [19] propose a gradient-free algorithm, based a Nash equilibrium in this game. The algorithm takes into on zero-order information, for seeking Nash and generalized account the communication settings and zero-order information Nash equilibria respectively. In discrete time domain, the under consideration. We prove almost sure convergence of this work [5] presents a leader-follower based algorithm, which algorithm to a Nash equilibrium given appropriate estimations can solve unconstrained multi-cluster games in linear time. of the local cost functions’ gradients. The authors in [20] extend this result to the case of leaderless I.INTRODUCTION architecture. Both papers [5], [20] prove linear convergence in games with strongly monotone mappings and first-order Distributed optimization and provide power- information, meaning that agents can calculate gradients ful frameworks to deal with optimization problems arising of their cost functions and use this information to update in multi-agent systems. In generic distributed optimization their states. In contrast to that, the work [16] deals with problems, the cost functions of agents are distributed across a gradient-free approach to the cluster games. However, the network, meaning that each agent has only partial infor- the gradient estimations are constructed in such a way that mation about the whole optimization problem which is to only convergence to a neighborhood of the equilibrium can be solved. Game theoretic problems arise in such networks be guaranteed. Moreover, these estimations are obtained by when the agents do not cooperate with each other and the using two query points, for which an extra coordination cost functions of these non-cooperative agents are coupled by between the agents is required. the decisions of all agents in the system. The applications Motivated by relevancy of n-cluster game models in of game theoretic and distributed optimization approaches many engineering applications, we present a discrete time include, for example, electricity markets, power systems, distributed procedure to seek Nash equilibria in n-cluster flow control problems and communication networks [6], [11], games with zero-order information. We consider settings, [12]. where agents can communicate with their direct neighbors On the other hand, cooperation and competition coex- within the corresponding cluster over some undirected graph. ists in many practical situations, such as cloud computing, However, in many practical situations the agents do not hierarchical optimization in Smart Grid, and adversarial know the functional form of their objectives and can only networks [3], [4], [8]. A body of recent work has been access the current values of their objective functions at some arXiv:2107.12648v1 [cs.GT] 27 Jul 2021 devoted to analysis of non-cooperative games and distributed query point. Such situations arise, for example, in electricity optimization problems in terms of a single model called n- markets with unknown price functions [15]. In such cases, cluster games [5], [16]–[20]. In such n-cluster games, each the information structure is referred to as zero-order oracle. cluster corresponds to a player whose goal is to minimize Our work focuses on zero-order oracle information settings her own cost function. However, the clusters in this game and, thus, assumes agents to have no access to the analytical are not the actual decision-makers as the optimization of the form of their cost functions and gradients. The agents instead cluster’s objective is controlled by the agents belonging to construct their local query points and get the corresponding the corresponding cluster. Each of such agents has her own cost values from the oracle. Based on these values, the local cost function, which is available only to this agent, agents estimate their local gradients to be able to follow but depends on the joint actions of agents in all clusters. the step in the gradient play procedure. We formulate the The cluster’s objective, in turn, is the sum of the local cost sufficient conditions and provide some concrete example on functions of the agents within the cluster. Therefore, in such how to estimate the gradients to guarantee the almost sure convergence of the resulting algorithm to Nash equilibria in The authors are with Control Methods and Robotics Lab at the TU Darmstadt, Germany. n-cluster games with strictly monotone game mappings. To the best of our knowledge, we present the first algorithm solving n-cluster games with zero-order oracle and the clusters. Instead, we consider the following zero-order infor- corresponding one-point gradient estimations. mation structure in the system: No agent has access to the The paper is organized as follows. In SectionII we for- analytical form of any cost function, including its own. Each mulated the n-cluster game with undirected communication agent can only observe the value of its local cost function topology in each cluster and zero-order oracle information. given any joint action of all agents in the system. Formally, Section III introduces the gradient play algorithm which is given a joint action x Ω, each agent j [ni], i [n] j ∈ ∈ ∈ based on the one-point gradient estimations. The convergence receives the value Ji (x) from a zero-order oracle. Especially, result is presented in Section III as well. SectionIV provides no agent has or receives any information about the gradient. an example of query points and gradient estimations which Let us denote the game between the clusters introduced guarantee convergence of the algorithm discussed in Sec- above by Γ(n, J , Ω , ). We make the following { i} { i} {Gi} tion III. SectionV presents some simulation results. Finally, assumptions regarding the game Γ: SectionVI concludes the paper. Assumption 1. The n-cluster game under consideration is Notations. The set 1, . . . , n is denoted by [n]. For any strictly convex. Namely, for all , the set is convex, { }n ∂f(x) i [n] Ωi function f : K R, K R , if(x) = is the ∈ → ⊆ ∇ ∂xi the cost function Ji(xi, x i) is continuously differentiable in partial derivative taken in respect to the ith coordinate of the for each fixed . Moreover,− the game mapping, which n xi x i vector variable x R . We consider real normed space E, is defined as − which is the space∈ of real vectors, i.e. E = n. We use (u, v) R T to denote the inner product in E. We use to denote the F(x) , [ 1J1(x1, x 1),..., nJn(xn, x n)] (1) k·k ∇ − ∇ − Euclidean norm induced by the standard dot product in E. is strictly monotone on Ω. Any mapping g : E E is said to be strictly monotone on → j Q E, if (g(u) g(v), u v) > 0 for any u, v Q, where Assumption 2. Each function iJi (xi, x i) is Lipschitz ⊆ − − ∈ continuous on Ω. ∇ − u = v. We use Br(p) to denote the ball of the radius r 0 6 ≥ and the center p E and to denote the unit sphere with i S Assumption 3. The action sets Ωj, j [ni], i [n], are the center in 0 E∈. We use v to denote the projection ∈ ∈ ∈ PΩ { } compact. Moreover, for each i there exists a so called safety of v E to a set Ω E. The mathematical expectation 2 ∈ ⊆ ball Br(p) Ωi with ri > 0 and pi Ωi . of a random value ξ is denoted by E ξ . We use the big-O ⊆ ∈ { } The assumptions above are standard in the literature on notation, that is, the function f(x): R R is O(g(x)) as → f(x) both game-theoretic and zero-order optimization [1]. Finally, x a, f(x) = O(g(x)) as x a, if limx a |g(x)| K for some→ positive constant K. → → | | ≤ we make the following assumption on the communication graph, which guarantees sufficient information ”mixing” in II.NASH EQUILIBRIUM SEEKING the network within each cluster. A. Problem Formulation Assumption 4. The underlying undirected communication We consider a non-cooperative game between n clusters. graph i([ni], i) is connected for all i = 1, . . . , n. The j G A i n n associated non-negative mixing matrix Wi = [wkj] R × Each cluster i [n] itself consists of ni agents. Let Ji ∈ j 1 ∈ defines the weights on the undirected arcs such that wi > 0 and Ωi R denote respectively the cost function and kj ⊆ if and only if (k, j) and Pni wi = 1, k [n ]. the feasible action set of the agent j in the cluster i. We ∈ Ai k=1 kj ∀ ∈ i denote the joint action set of the agents in the cluster i by One of the stable solutions in any game Γ corresponds to 1 ni j Ωi = Ωi ... Ωi . Each function Ji (xi, x i), i [n], a Nash equilibrium defined below. depends on× x =× (x1, . . . , xni ) Ω , which represents− ∈ the i i i i T ∈ Definition 1. A vector x∗ = [x∗, x∗, , x∗ ] Ω is called joint action of the agents within the cluster i, and x i 1 2 n − ··· ∈ ∈ a Nash equilibrium if for any i [n] and xi Ωi Ω i = Ω1 ... Ωi 1 Ωi+1 Ωn, denoting the joint ∈ ∈ action− of the× agents× from− all× clusters× except for the cluster i. Ji(xi∗, x∗ i) Ji(xi, x∗ i). The cooperative cost function in the cluster i [n] is, thus, − ≤ − J (x , x ) = 1 Pni J j(x , x ). ∈ In this work, we are interested in distributed seeking of a i i i ni j=1 i i i We assume− that the agents within− each cluster can interact Nash equilibrium in any game Γ(n, Ji , Ωi , i ) with the information structure described{ above} { and} {G for} which over an undirected communication graph i([ni], i). The G A Assumptions1-4 hold. set of nodes is the set of the agents [ni] and the set of undirected arcs is such that (k, j) if and only if B. Existence and Uniqueness of the Nash Equilibrium Ai ∈ Ai (j, k) i, i.e. there is a bidirectional communication link In this subsection, we demonstrate the existence of the between∈ Ak to j, over which information in form of a message Nash equilibrium for Γ(n, Ji , Ωi , i ) under Assump- can be sent from the agent k to the agent j and vice versa tions1 and3. For this purpose{ we} { recall} {G the} results connect- in the cluster i. ing Nash equilibria and solutions of variational inequalities However, there is no explicit communication between the from [9].

1All results below are applicable for games with different dimensions 2Existence of the safety ball is required to construct feasible points for j j {di } of the action sets {Ωi }. The one-dimensional case is considered for costs’ gradient estimations in the zero-order settings under consideration the sake of notation simplicity. (see [1]). Definition 2. Consider a set Q Rd and a mapping g: aim to adapt the standard projected gradient play approach ⊆ Q Rd.A solution SOL(Q, g) to the variational inequality to the cluster game with the zero-order information. → problem VI(Q, g) is a set of vectors q∗ Q such that At this point we assume each agent j [ni], i [n], , for any . ∈ (j) ∈ ∈ g(q∗), q q∗ 0 q Q based on its local estimation xi , constructs a feasible query h − i ≥ ∈ (j) The following theorem is the well-known result on the point xˆi Ωi and sends it to the oracle. As a reply from ∈ j (j) ˆ connection between Nash equilibria in games and solutions the oracle, the agent receives the value Ji (ˆxi , x˜ i). The ˆ − of a definite variational inequality (see Corollary 1.4.2 in vector x˜ i here corresponds to the point obtained by some − [9]). combination of the query vectors sent by the agents from the other clusters. Formally, Theorem 1. Consider a non-cooperative game Γ. Suppose ˆ (j1) (ji−1) (ji+1) (jn) that the action sets of the players Ωi are closed and x˜ i = (ˆx1 ,..., xˆi 1 , xˆi+1 ,..., xˆn ), (3) { } − − convex, the cost functions Ji(xi, x i) are continuously { − } where j denotes some agent from the cluster k [n], k = i. differentiable and convex in xi for every fixed x i on the k − Further each agent , , uses the received∈ value6 interior of the joint action set Ω. Then, some vector x∗ Ω j [ni] i [n] ∈ j (j) ˆ ∈ ∈ j is a Nash equilibrium in Γ, if and only if x∗ SOL(Ω, F), Ji (ˆxi , x˜ i) to obtain the random estimation di of her local ∈ − j (j) where F is the game mapping defined by (1). cost’s gradient iJi at the point (xi , x˜ i), where ∇ − Next, we formulate the result guaranteeing existence and (j1) (ji−1) (ji+1) (jn) x˜ i = (x1 , . . . , xi 1 , xi+1 , . . . , xn ) (4) uniqueness of SOL(Q, g) in the case of strictly monotone − − map Q (see Corollary 2.2.5 and Proposition 2.3.3 in [9]). corresponds to the local estimations of other agents (one for each cluster different from i) based on which query points Theorem 2. Given the VI(Q, g), suppose that Q is compact (j) j j j ˆ ni j are obtained. Thus, di = di (Ji (ˆxi , x˜ i)) R . As di is and the mapping g is strictly monotone. Then, the solution − ∈ an estimation of j (j) , we represent this vector SOL(Q, g) exists and is a singleton. iJi (xi , x˜ i) by the following decomposition:∇ − Taking into account Theorems1 and2, we obtain the j j (j) j following result. di = iJi (xi , x˜ i) + ei , (5) ∇ − j Theorem 3. Let Γ(n, Ji , Ωi , i ) be a game for which where e is a random vector reflecting inaccuracy of the { } { } {G } i Assumptions1 and3 hold. Then, there exists the unique Nash obtained estimation, i.e. the estimation error vector. Note that equilibrium in . Moreover, the Nash equilibrium in is the (j) ˆ Γ Γ for the joint query point (ˆxi , x˜ i) the oracle is free to − solution of VI(Ω, F), where F is the game mapping (see choose any combination x˜ˆ i of the local queries defined in (1)). (3). − Thus, if Assumptions1 and3 hold, we can guarantee Now we are ready to formulate the gradient play between the clusters. Starting with an arbitrary x(j)(0) Ω , each existence and uniqueness of the Nash equilibrium in the i ∈ i agent updates the local estimation vector (j), , game Γ(n, Ji , Ωi , i ) under consideration and use the j xi j [ni] { } { } {G } i [n], as follows: ∈ corresponding variational inequality in the analysis of the ∈ optimization procedure presented below. ( ni ) (j) X i (l) j xi (t + 1) = Ωi wjlxi (t) αtdi (t) , (6) AIN ESULTS P − III.M R l=1 A. Zero-order gradient play between clusters where the time-dependent parameter αt > 0 corresponds to To deal with the zero-order information available to the the step size. agents and local state exchanges within the clusters, we Let Ft be the σ-algebra generated by the estimations (j) t assume each agent j from the cluster i maintains a local x (m) up to time t, j [ni], i [n]. Let { i }m=0 ∈ ∈ variable x¯ (t) = 1 Pni x(j)(t) be the running average of the i ni j=1 i agents’ estimations vectors within the cluster i. The follow- (j) (j)1 (j)j 1 j (j)i+1 (j)ni T x = [x , , x − , x , x , , x ] Ω , i i ··· i i i ··· i ∈ i ing proposition describes the behavior of x¯i(t) in the long (2) run. (j) which is her estimation of the joint action xi = Proposition 1. Let Assumptions3 and4 hold and xi (t), 1 2 ni T of the agents from her cluster [xi , xi , , xi ] Ωi j [ni], i [n], be updated according to (6). Then for all ··· (j)k ∈ j j i. Here, x Ω is player k’s estimate of x and j ∈ [n ], i ∈[n] i ∈ i i i (j)j j j is the action of agent from cluster ∈ ∈ q xi = xi Ωi j j 2 ∈ 1) if limt αt = 0 and limt αt E ei (t) Ft = i. The goal of the agents within each cluster is to update →∞ →∞ (j) {k k | } their local variables in such a way that the joint action 0 almost surely, then limt xi (t) x¯i(t) = 0 →∞k − k 1 ni almost surely; x = (x1, . . . , xn) Ω with xi = (xi , . . . , xi ) Ωi ∈ ∈ P 2 P 2 j 2 converges to the Nash equilibrium in the game Γ between the 2) if t∞=0 αt < and t∞=0 αt E ei (t) Ft < ∞ P (j) {k k | } ∞ clusters as time runs. To let the agents achieve this goal, we almost surely, then ∞ α x (t) x¯ (t) < . t=0 tk i − i k ∞ Proof. Follows from Lemma 8 in [7]3. Cauchy–Schwarz inequality, implying j j j j In view of the proposition above and to be able to analyze (ei ((t)), vi (t) xi∗) ei ((t)) vi (t) xi∗ a.s., behavior of the algorithm by means of the running averages − − ≤ k k − k and Assumption3 implying almost sure boundedness of x¯i(t), i [n] we make the following assumption on the j j 2 ∈ j vi (t) xi∗ . We focus now on the terms vi (t) xi∗ and balance between the step size αt and the error term ei (t). k − jk (j) j k − k 2αt( iJi (xi (t), x˜ i(t)), vi (t) xi∗). Due to Assump- j − ∇ − − Assumption 5. The step size αt and the error term ei (t) tion4, we have that a.s. are such that ni ni j 2 X i (l) 2 X i (l) 2 X∞ X∞ vi (t) xi∗ = wjlxi (t) xi∗ wjl xi (t) xi∗ . α = , α2 < , k − k k − k ≤ k − k t ∞ t ∞ l=1 l=1 t=0 t=0 Pni i And, as j=1 wjl = 1, we obtain that a.s. X∞ j αtE ei ((t)) Ft < almost surely, ni ni ni {k k| } ∞ X j 2 X X i (l) 2 t=0 v (t) x∗ ( w ) x (t) x∗ k i − i k ≤ jl k i − i k j=1 l=1 j=1 X∞ 2 j 2 αt E ei ((t)) Ft < almost surely. ni {k k | } ∞ X (l) 2 t=0 = x (t) x∗ . (8) k i − i k In SectionIV we shed light on how the gradients can l=1 be sampled to guarantee fulfillment of Assumption5. With Next, Proposition1 in place, we are ready to prove the main result j (j) j ( iJi (xi (t),x˜ i(t)), vi (t) xi∗) formulated in the theorem below. ∇ − − j (j) j (j) =( iJi (xi (t), x˜ i(t)), vi (t) xi∗) Theorem 4. Let Assumptions1-5 hold and xi (t), j [ni], ∇ − − ∈ j (j) i [n], be updated according to (6). Then the joint action ( iJi (xi (t), x˜ i(t)), x¯i(t) xi∗) ∈ − ∇ − − x(t) = (x1(t), . . . , xn(t)) converges almost surely to the j (j) + ( iJi (xi (t), x˜ i(t)), x¯i(t) xi∗) unique Nash equilibrium x∗ in the Γ(n, J , Ω , ), ∇ − − { i} { i} {Gi} j ( iJi (¯xi(t)¯x i(t)), x¯i(t) xi∗) i.e. Pr limt x(t) x∗ = 0 = 1. − { →∞k − k } − ∇ j − + ( iJi (¯xi(t)¯x i(t)), x¯i(t) xi∗), (9) Proof. Let x∗ = (x1∗, . . . , xn∗ ) be the unique Nash equilib- ∇ − − P rium in the game , see Theorem3. We ¯ k6=i nk Γ(n, Ji , Ωi , i ) where x˜ i(t) R is the joint running average of { } { } {G } (j) − ∈ proceed with estimating the distance between xi (t+1) and the agents’ local variable over all clusters except for the j Pni i (l) xi∗. Let vi (t) = l=1 wjlxi (t). As xi∗ Ωi, we can use cluster i (see more details in (4)). Thus, by applying the the non-expansion of the projection operator∈ to conclude that Cauchy–Schwarz inequality to (9), we get 4 almost surely (a.s.) j (j) j ( iJi (xi (t), x˜ i(t)), vi (t) xi∗) (j) j j − ∇ − − 2 2 (j) xi (t + 1) xi∗ vi (t) αtdi (t) xi∗ j j k − k ≤ k − − k iJi (xi (t), x˜ i(t)) vi (t) x¯i(t) j 2 ≤ k∇ − kk − k = vi (t) xi∗ j (j) j k − k + iJi (xi (t), x˜ i(t)) iJi (¯xi(t)¯x i(t)) j j 2 j 2 k∇ − − ∇ − k 2αt(di (t), vi (t) xi∗) + αt di (t) − − k k x¯i(t) xi∗ j 2 × k − k = vi (t) xi∗ j k − k ( iJi (¯xi(t)¯x i(t)), x¯i(t) xi∗), a.s.. (10) j (j) j − ∇ − − 2αt( iJi (xi (t), x˜ i(t)), vi (t) xi∗) − ∇ − − Taking into account almost sure boundedness of j j 2 j 2 j (j) 2αt(e ((t)), v (t) xi∗) + O(αt (1 + e (t) )) and (see − i i − k i k iJi (xi (t), x˜ i(t)) x¯i(t) xi∗ j 2 k∇ − k k − k v (t) x∗ Assumptions1 and3) and Assumption2, we conclude that ≤ k i − i k j (j) j j (j) j 2αt( iJi (xi (t), x˜ i(t)), vi (t) xi∗) ( iJi (xi (t), x˜ i(t)), vi (t) xi∗) − ∇ − − − ∇ − − j 2 j 2 j + O(αt e ((t)) ) + O(α (1 + e (t) )), (7) O( v (t) x¯ (t) ) k i k t k i k ≤ k i − i k (j) where in the last equality we used (5), which implies that + O( xi (t) x¯i(t) + x˜ i(t) x¯ i(t) ) k − k k − − − k a.s. j ( iJi (¯xi(t), x˜ i(t)), x¯i(t) xi∗). (11) − ∇ − − j 2 j (j) 2 j 2 di (t) 2( iJi (xi (t), x˜ i(t)) + ei (t) ) Thus, we get from (7) k k ≤ k∇ − k k k j 2 j 2 (j) 2 j 2 and, thus, d (t) = O(1 + e (t) ) a.s. (see Assump- x (t + 1) x∗ v (t) x∗ k i k k i k k i − i k ≤ k i − i k tions1 and3), whereas in the last inequality we used the j 2αt( iJi (¯xi(t)¯x i(t)), x¯i(t) xi∗) − − ∇ j − 3The proof can be repeated up to (37) in [7]. The inequality (37) and + 2α O( v (t) x¯ (t) ) t k i − i k the analysis afterward stay valid in terms of the conditional expectation (j) E{·|Ft}. + 2αtO( xi (t) x¯i(t) + x˜ i(t) x¯ i(t) ) 4In the following discussion the big-O notation is defined under the limit k − k k − − − k + O(α ej((t)) ) + O(α2(1 + ej(t) 2)), a.s.. (12) t → ∞ (see Notations). tk i k t k i k Analogously to (8) almost surely. Thus,

ni ni ∞ X j X (j) X v (t) x¯ (t) x (t) x¯ (t) . hi(t) < a.s. for all i [n]. (14) k i − i k≤ k| i − i k ∞ ∈ j=1 l=1 t=0

Next, let us introduce the vector u(t) = (u1(t), . . . , un(t)), Therefore, by averaging both sides of (12) over j = 1, . . . , ni 1   2 1 Pni (j) 2 and taking the conditional expectation in respect to Ft (below where ui = x (t) x∗ . Therefore, sum- ni j=1k i − i k we use the notation E Ft = Et ), we obtain that a.s. ming (13) over i [n] implies {·| } {·} ∈ ni ni 2 2 1 X (j) 2 1 X (j) 2 Et u(t + 1) u(t) 2αt(F(¯x(t)), x¯(t) x∗) Et x (t + 1) x∗ x (t) x∗ k k ≤ k k − − n {k i − i k } ≤ n k i − i k n i j=1 i j=1 X + hi(t) ni 1 X j i=1 2αt ( iJi (¯xi(t)¯x i(t)), x¯i(t) xi∗) 2 − ni ∇ − − v(t) 2α (F(¯x(t)) F(x∗), x¯(t) x∗) j=1 ≤ k k − t − −   n ni X 1 X (j) + hi(t), (15) + O  αt x (t) x¯i(t)  n k i − k i=1 i j=1 where in the last inequality we used the fact that x∗ is + O(αt x˜ i(t) x¯ i(t) ) k − − − k the Nash equilibrium in Γ(n, Ji , Ωi , i ) and, thus, ni { } { } {G } 1 X  j 2 j 2  (F(x∗), x¯(t) x∗) 0 a.s. for all t (see Theorem1). Due + O(αtEt ei ((t)) + αt (1 + Et ei (t) )) − ≥ ni {k k} {k k } to the strictly monotone mapping (see Assumption1), which j=1 implies ni 1 X (j) 2 (F(¯x(t)) F(x∗), x¯(t) x∗) 0, = xi (t) xi∗ − − ≥ ni k − k j=1 and (14), we can apply the Robbins and Siegmund result 2αt( iJi(¯xi(t)¯x i(t)), x¯i(t) xi∗) + hi(t), (see Theorem5 in Appendix) to the inequality (15). With − ∇ − − that, we conclude that u(t) 2 converges a.s. as t and (13) k k → ∞ X∞ α (F(¯x(t)) F(x∗), x¯(t) x∗) < a.s. where t − − ∞ t=1  n  1 i P X (j) Taking the inequality above and the fact that t∞=0 αt = hi(t) = O  αt x (t) x¯i(t)  ∞ n k i − k into account, we conclude that i j=1

+ O(αt x˜ i(t) x¯ i(t) ) lim inf(F(¯x(t)) F(x∗), x¯(t) x∗) = 0 a.s., k − − − k t − − ni →∞ 1 X j which together with strict monotonicity of F implies exis- + O(αtEt ei ((t)) ) ni {k k} tence of the subsequence x¯(tm) such that limm x¯(tm) = j=1 →∞ ni x∗ almost surely. From Proposition1 it follows that 1 X 2 j 2 (j) + O(α (1 + Et e (t) )). limm xi (tm) = x∗ a.s. for all j [ni], i [n]. Finally, n t {k i k } →∞ ∈ ∈ i j=1 by taking into account existence of the finite almost sure limit   2 Pn 1 Pni (j) 2 of u(t) = x (t) x∗ as t , By taking into account Proposition1 2) and the definition of k k i=1 ni j=1k i − i k → ∞ we conclude that x˜ i(t) (see (4)), we conclude that a.s. − (j)   Pr lim x (t) xi∗ = 0 = 1 for all j [ni], i [n], ni t i X∞ 1 X (j) { →∞k − k } ∈ ∈ O  αt x (t) x¯i(t)  < , n k i − k ∞ and, therefore, t=0 i j=1

Pr lim xi(t) xi∗ = 0 = 1 for all i [n]. {t k − k } ∈ X∞ →∞ O(αt x˜ i(t) x¯ i(t) ) < . k − − − k ∞ t=0 IV. GRADIENT ESTIMATIONS Moreover, due to Assumption5, In this section we present an approach to estimate the ni X∞ 1 X j gradients of the agents’ cost functions in such a way that O(αtEt ei ((t)) < , Assumption5 is fulfilled. The idea is borrowed from the ni {k k} ∞ t=0 j=1 work [1] dealing with bandit learning in games. We assume

ni the safety ball parameters ri and pi (see Assumption3) X∞ 1 X 2 j 2 are known for each agent from the cluster i. To obtain the αt (1 + Et ei (t) )) < ni {k k } ∞ j (j) t=0 j=1 estimation di (t) based on the current estimation xi (t) and to follow the update in (6), each agent j [n ] in the cluster V. SIMULATION RESULTS ∈ i i, i [n], takes the following steps at time t. The agent In this section, we verify our theoretical analysis with a ∈ j samples the vector zi (t) from the uniform distribution on practical simulation in order to show that the states of the n the unit sphere S R i . The query direction is defined by agent system converge to the Nash equilibrium, defined in (j) j ⊂ 1 (j) w (t) = z (t) r− (x (t) p ). Then, the query point i i − i i − i Definition1, when using the update equation (6) and the at which the oracle calculates the local cost function value oracle gradient estimation of (17). As an example applica- is tion, we chose a version of the well-known Cournot game. Consider the following setup: There are n companies that xˆ(j)(t) = x(j)(t) + σ w(j)(t) i i t i compete against each other regarding the price of some 1 (j) j 1 = (1 σtr− )x (t) + σt(z + r− p), (16) specific product. Each company i owns n factories that − i i i i i 1 produce said product. It is assumed that all factories produce where σt is the query radius chosen such that σtri− < 1. the product with the same quality. The cost of factory (j) (j) j Note that, given , the query point above j + xi (t) Ωi xˆi (t) belonging to cluster i for producing the amount x R of (j) ∈ i ∈ 0 is feasible, i.e. xˆi (t) Ωi (see [1] for more details). The the product is specific for this factory and defined by gradient estimation itself∈ is obtained as follows: j j j j 2 j j j Ci (xi ) = ai (xi ) + bi xi + ci . (19) j ni (j) (j) ˆ j di (t) = Ji (ˆxi (t), x˜ i(t)) zi , (17) σt − · Naturally, the amount of product produced cannot be neg- ative. Furthermore, each company has lower and upper j where x˜ˆ i(t) is defined as in (3). This vector is then used − production limits. The former defines a lower bound xi , for to follow the update in (6). As it has been proven in [1] which production is still cost efficient, while the latter defines (see, for example, (4.7) in [1]), dj(t), as constructed above, j i a production facility dependent upper bound xi . satisfies the following property: Each company i aggregates the product, produced in their ni j j (j) j factories, and sells it. In this version of the Cournot game, it di (t) = iJi (xi (t), x˜ i(t)) + ei (t), − is assumed that there exists only a single customer instance ∇ j where Et e (t) = O(σt), that buys all the aggregated product from all companies. The {k i k}  1  price that the customer pays per unit of product is dependent j 2 (18) Et ei (t) = O 2 on the total supply by all companies and therefore defined {k k } σt as follows: n ni with x˜ i(t) defined as in (4). Thus, for fulfillment of X X j − P (x) = Pc x , (20) Assumption5 the step size parameter αt and the query radius − i i=1 j=1 σt must be balanced as follows: where Pc is a constant, which is chosen such that for any X∞ X∞ α = , α < , decision vector x x x it holds that P (x) > 0. With t ∞ t ∞ ≤ ≤ t=0 t=0 this price definition and assuming that the production costs for the factories belonging to the company is shared, each X∞ X∞ α2 α σ < , t < . company i aims to minimize its profit function, therefore t t ∞ σ2 ∞ t=0 t=0 t solving the following optimization problem:

α0 σ0 ni One example of an appropriate choice is α = a , σ = t t t tb X j j j with min Ci (xi ) xi P (xi, x i) = min Ji(xi, x i) x xi xi − − x xi xi − i≤ ≤ i≤ ≤ 1 j=1 < a 1, b 0, (21) 2 ≤ ≥ It can be seen that the companies’ profits are coupled by a + b > 1, 2a 2b > 1. the customer’s price function, therefore a Nash equilibrium − needs to be found, from which no company has any incentive One possible parameter set is a = 1, b = 1 . 3 to deviate. Relating to the n-cluster games described in Remark 1. There exist other approaches to one-point gra- this paper, the companies represent the clusters, while the dient estimations. The most known one corresponds to the factories correspond to the agents. It can be readily confirmed queries sampled from the Gaussian distribution (see [2], that the profit optimization problem in Equation (21) fulfils [14]). However, to guarantee feasibility in this case, the the Assumptions1-2. query points have to be projected onto the action sets. To For our simulation, we choose a small setup consisting of be able to control the deviation term, that appears due to two clusters, each containing four agents with parameters this projection, one needs to introduce an auxiliary time- listed in TableV and cost coefficient Pc = 250. The dependent parameter to the projection step of the main agents inside the cluster i are connected by an undirected procedure (see [2], [13]). Thus, introducing this parameter communication graph Gi that fulfils Assumption4. Over this will somewhat complicate the analysis. That is why we graph, state information is shared inside the cluster i such (j) leave the approach based on sampling from the Gaussian that an estimation of all agent’s states xi can be performed. distribution beyond the scope of this paper. The agents update their own gradient estimation according Company i = 1 Company i = 2 j 1 2 3 4 1 2 3 4 VI.CONCLUSION j ai 5 8 4 5 3 7 9 2 In this paper we presented the distributed gradient play j bi 10 11 9 12 10 11 12 9 algorithm for strictly convex n-cluster games with commu- cj 1 3 2 5 3 2 3 1 i nication setups within each cluster and a zero-order oracle xj 0 0 0 0 0 0 0 0 i in the whole system. We prove the almost sure convergence xj 20 20 20 20 10 10 10 10 i of this procedure to the unique Nash equilibria given an TABLE I appropriate estimations of the local agents’ gradients. The PARAMETERSOFAGENTSIN COURNOT-GAME. future work will be devoted to investigation of possible modifications which should enable a faster convergence rate.

APPENDIX 20 x1 The following is a well-known result of Robbins and x 2 Siegmund on non-negative random variables [10]. x∗ 15 Theorem 5. Let (Ω,F,P ) be a probability space and F1 F ... a sequence of sub-σ-algebras of F . Let z , b , ξ⊂, 2 ⊂ t t t and ζt be non-negative Ft-measurable random variables ) t

( satisfying

j i 10 x E(zt+1 Ft) zt(1 + bt) + ξt ζt. | ≤ − 5 Then, almost surely limt zt exists and is finite for the P →∞ P case in which ∞ bt < , ∞ ξt < . Moreover, P{ t=1 ∞ t=1 ∞} in this case, ∞ ζ < almost surely. t=1 t ∞ 0 REFERENCES 0 0.2 0.4 0.6 0.8 1 [1] M. Bravo, D. Leslie, and P. Mertikopoulos. Bandit learning in concave Iterations t 105 · n-person games. In Proceedings of the 32nd International Conference on Neural Information Processing Systems, NIPS’18, page 5666–5676, Red Hook, NY, USA, 2018. Curran Associates Inc. Fig. 1. Convergence of the agent’s states towards the Nash equilibrium of [2] A.D. Flaxman, A.T. Kalai, and H.B. McMahan. Online convex 5 ∗ optimization in the bandit setting: Gradient descent without a gradient. the n-cluster game. Error norm after 1 × 10 steps: ||x − x ||2= 0.400. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’05, pages 385–394, USA, 2005. Society for Industrial and Applied Mathematics. [3] B. Gharesifard and J. Cortes.´ Distributed convergence to nash to equation (17), using the zero-order oracle information equilibria in two-network zero-sum games. Automatica, 49(6):1683– j 1692, 2013. at the query point xˆi defined in (16). The time-dependent, [4] M. Jarrah, M. Jaradat, Y. Jararweh, M. Al-Ayyoub, and A. Boussel- decreasing step-size αt of the gradient update and the query ham. A hierarchical optimization model for energy data flow in smart radius σ are chosen such that Assumption5 is satisfied. grid power systems. Information Systems, 53:190–200, 2015. t [5] M. Meng and X. Li. On the linear convergence of distributed nash Choosing a good set of parameters α0, σ0 > 0 and a, b is equilibrium seeking for multi-cluster games under partial-decision crucial for the convergence speed of the algorithm. Even information. arXiv preprint:2005.06923, 2020. then, due to the fact that only zero-order information is [6] A. Nedic´ and J. Liu. Distributed optimization for control. Annual Review of Control, Robotics, and Autonomous Systems, 1(1):77–103, available, convergence is slow. In FigureV the states, re- 2018. sulting from the application of the proposed algorithm to the [7] A. Nedic,´ A. Ozdaglar, and P. A. Parrilo. Constrained consensus and scenario specified above, are plotted. The dashed line marks optimization in multi-agent networks. IEEE Transactions on Automatic Control, 55(4):922–938, 2010. the true Nash Equilibrium x∗, while the two solid coloured [8] D. Niyato, A. V. Vasilakos, and Z. Kun. Resource and revenue sharing lines distinguish between the states of company 1, i.e. x1, with coalition formation of cloud providers: Game theoretic approach. and company 2, i.e. x , respectively. Because there are only In 2011 11th IEEE/ACM International Symposium on Cluster, Cloud 2 and Grid Computing, pages 215–224, 2011. four agents in each cluster, which are connected by an almost [9] J.-S. Pang and F. Facchinei. Finite-dimensional variational inequalities fully connected graph, the consensus dynamic of the agent and complementarity problems : vol. 1. Springer series in operations system is almost negligible against the gradient estimation research. Springer, New York, Berlin, Heidelberg, 2003. [10] H. Robbins and D. Siegmund. A convergence theorem for non negative and update dynamic. It can be seen that the algorithm almost supermartingales and some applications. In Herbert Robbins converges to a satisfactory vicinity of the Nash Equilibrium Selected Papers, pages 111–135. Springer, 1985. states after about 1 105 Iterations at which the error norm [11] W. Saad, H. Zhu, H. V. Poor, and T. Bas¸ar. Game-theoretic methods · for the smart grid: An overview of microgrid systems, demand-side between the agent’s states and the true Nash equilibrium x∗ management, and smart grid communications. IEEE Signal Processing measures x x∗ 2 = 0.40. While the constraints of company Magazine, 29(5):86–105, 2012. 2 are not touched,|| − || the true Nash Equilibrium for two firms of [12] G. Scutari, S. Barbarossa, and D. P. Palomar. Potential games: A framework for vector power control problems with coupled constraints. the second company lies at their maximum production limit In 2006 IEEE International Conference on Acoustics Speech and 10. Signal Processing Proceedings, volume 4, pages 241–244, May 2006. [13] M. Kamgarpour T. Tatarenko. Bandit online learning of nash equilibria in monotone games. arXiv preprint:2009.04258, 2020. [14] T. Tatarenko and M. Kamgarpour. Learning nash equilibria in monotone games. In 2019 IEEE 58th Conference on Decision and Control (CDC), pages 3104–3109, 2019. [15] A. C. Tellidou and A. G. Bakirtzis. Agent-based analysis of capacity withholding and tacit in electricity markets. IEEE Transac- tions on Power Systems, 22(4):1735–1742, Nov 2007. [16] G. Hu Y. Pang. Nash equilibrium seeking in n-coalition games via a gradient-free method. arXiv preprint:2008.12909, 2020. [17] M. Ye, G. Hu, and F. L. Lewis. Nash equilibrium seeking for N- coalition noncooperative games. Automatica, 95:266–272, 2018. [18] M. Ye, G. Hu, and S. Xu. An extremum seeking-based approach for Nash equilibrium seeking in N-cluster noncooperative games. Automatica, 114:108815, 2020. [19] X. Zeng, J. Chen, S. Liang, and Y. Hong. Generalized Nash equilibrium seeking strategy for distributed nonsmooth multi-cluster game. Automatica, 103:20–26, 2019. [20] J. Zimmermann, T. Tatarenko, V. Willert, and J. Adamy. Gradient- tracking over directed graphs for solving leaderless multi-cluster games. arXiv preprint:2102.09406, 2021.