
Projected Push-Sum Gradient Descent-Ascent for Convex Optimization with Application to Economic Dispatch Problems

arXiv:2004.02854v2 [eess.SY] 11 Aug 2020

Jan Zimmermann, Tatiana Tatarenko, Volker Willert, Jürgen Adamy

Abstract — We propose a novel algorithm for solving convex, constrained and distributed optimization problems defined on multi-agent-networks, where each agent has exclusive access to a part of the global objective function. The agents are able to exchange information over a directed, weighted communication graph, which can be represented as a column-stochastic matrix. The algorithm combines an adjusted push-sum consensus protocol for information diffusion and a gradient descent-ascent on the local cost functions, providing convergence to the optimum of their sum. We provide results on a reformulation of the push-sum into single matrix-updates and prove convergence of the proposed algorithm to an optimal solution, given standard assumptions in distributed optimization. The algorithm is applied to a distributed economic dispatch problem, in which the constraints can be expressed in local and global subsets.

(The work was gratefully supported by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) within the SPP 1984 "Hybrid and multimodal energy systems: System theoretical methods for the transformation and operation of complex networks". The authors are with the Control Methods and Robotics Lab at TU Darmstadt, Darmstadt, Germany.)

I. INTRODUCTION

We consider constrained optimization problems that are distributed over multi-agent-networks. In such scenarios each agent has a local cost function, known only to the respective agent. The overall goal of all agents is to minimize the sum of the local functions, while the exact form of the latter should remain private. This type of problem is known as social welfare optimization. Subject to the optimization is often a variety of constraints that, depending on the application, need to be considered in the process. For many of such constrained optimization problems, it can be distinguished between global constraints that effect all agents in the system and constraints that are local and only relevant to a single agent. An example application is the distributed economic dispatch problem (DEDP), where each agent represents a generator with a distinct cost function. The overall goal is to minimize the cost of producing the demanded power, while matching the power demand and keeping inside the generators' limits. In such problems, the balance constraint is globally defined, as it constrains the production of all generators, but the production limits of each generator should remain private and are therefore local. Depending on the cost function choice, the resulting problem is either convex or non-convex. We employ first order gradient methods as a core optimization strategy.

Currently, a lot of the work on distributed optimization methods is dedicated to the use of gradient tracking, instead of using the gradient at a distinct point in time. These methods, published for example in [7], [9] and [12], have the advantage that a constant step-size can be used for the gradient update, while first order methods usually require a diminishing step-size sequence for convergence. However, constraints have not yet been considered in gradient tracking methods. On the other hand, a couple of publications focused on first order projection-free methods that are able to respect constraints. A method of that sort is the penalized push-sum algorithm, which uses the push-sum for spreading information; it was published in [13] and further analyzed in [17]. One property of this approach is that it incorporates all constraints of the objective into a penalization function. For this method to be applicable to a wide range of problems, including the DEDP, the constraints are considered to be local. However, next to the step-size sequence, a second parameter sequence for the penalization needs to be determined, and choosing those dependent sequences optimally is proved to be non trivial [17]. One of the first projection-based, distributed gradient methods was published in [8]. However, it relies on double-stochastic matrix-updates, which restricts communication to undirected graphs. The contributions in [4] and [15] rely on row-stochastic communication matrices for diffusing the projected gradient information. Finally, [14] employs the push-sum consensus combined with a projection of a proximal convex function. The major drawback of the mentioned projection-based methods in relation to the DEDP is the assumption that all constraints of the problem under consideration are known by every agent. This restricts the privacy of the agents with respect to their local constraints. Compared to the projection-free method in [13], they have the advantage that no penalization parameter sequence needs to be chosen. Our work seeks to combine the projection methods' advantage of a reduced number of parameters with the ability to respect local constraints, while assuming directed communication architectures. This is done by exploiting the explicit distinction of the constraints into local and global subsets. Similar to [5] and [13], we employ the push-sum consensus for information diffusion. However, for easier analysis, we reformulate the push-sum algorithm into a single row-stochastic matrix-update, which is closer to the basic structure of the row-stochastic updates in [4] or [15], which do not rely on the push-sum consensus. For convergence to the optimum, we propose a novel distributed gradient descent-ascent algorithm, which uses a projection in order to incorporate the global constraints, while respecting local constraints by adding them, weighted by Lagrange multipliers, to the local cost functions.

Within this article, we make the following contributions: First, we provide a reformulation of the unconstrained push-sum consensus, which differs from the one published in e.g. [5], and provide convergence properties of the update matrix. Secondly, we prove convergence of the distributed gradient descent-ascent method to an optimal solution of the problem, respecting privacy of the local constraints. At last, we show that our proposed algorithm is applicable to the DEDP.

The paper is structured as follows: In section II we provide our notation for formulas and graphs. The main part begins in section III with the formulation of the problem class and the results on the reformulation of the push-sum, before our algorithm for solving the defined problems is proposed. The subsequent section IV provides the convergence proof of the proposed method. In section V we undergird the theoretic results by a simulation of a basic DEDP, before we summarize and conclude our results in section VI.

II. NOTATION AND GRAPHS

Throughout the paper we use the following notation: The set of non-negative integers is denoted by Z_+. All time indices t in this work belong to this set. To differentiate between multi-dimensional vectors and scalars we use boldface. ||·|| denotes the standard euclidean norm. The set 1, ..., n is denoted by [n]. The element ij of matrix M is denoted by M_ij. The operation P_X(u) is the projection of u onto the convex set X such that P_X(u) = argmin_{x in X} ||x − u||. Instances of a variable x at time t are denoted by x[t]. The directed graph G = {V, E} consists of a set of vertices V and a set of edges E ⊆ V × V. Vertex j can send information to vertex i if (j, i) ∈ E. The communication channels can be described by the Perron matrix P, where P_ij > 0 if (j, i) ∈ E and zero otherwise. This notation includes self-loops such that P_ii > 0. The set containing the in-neighborhood nodes of node i is N_i^+, while N_i^− is the set of out-neighbors. The out-degree of node i is denoted by d_i = |N_i^−|. We say that a directed graph is strongly connected if there exists a path from every vertex to every other vertex.

III. PROJECTED PUSH-SUM GRADIENT DESCENT-ASCENT

A. Problem formulation and matrix update of the push-sum consensus

We consider optimization problems of the form

    min_x F(x) = min_x Σ_{i=1}^n F_i(x),                          (1a)
    s.t. g_i(x) ≤ 0,  g_i(x) = (g_{i1}(x), ..., g_{im}(x))^T,     (1b)
         x ∈ X ⊂ R^d,                                             (1c)

where the functions F_i: R^d → R and g_{ij}: R^d → R are differentiable for i = [n] and j = [m]. While we consider g_i(x) to be local constraint functions of agent i, the constraint set X is assumed to be global and therefore known by every agent.

(Footnote 1: For the sake of notation, we assume here that all agents have m local constraints. However, the following considerations hold for arbitrary, yet finite numbers of local constraints that can differ between the agents.)

After defining M_i^0 = {µ_i ∈ R^m | µ_i ⪰ 0} and using µ = (µ_i^T)_{i=1}^n, the dual function of the problem takes the form

    q(µ) = inf_{x ∈ X} Σ_{i=1}^n ( F_i(x) + µ_i^T g_i(x) ),       (2)

with which the dual problem

    max_µ q(µ),                                                   (3a)
    s.t. µ ∈ M^0,                                                 (3b)

with M^0 = {µ ∈ R^{n×m} | µ_i ∈ M_i^0, i = [n]} can be defined. In accordance with that, the global Lagrangian is the sum of the local Lagrangians:

    L(x, µ) = Σ_{i=1}^n L_i(x, µ_i) = Σ_{i=1}^n F_i(x) + µ_i^T g_i(x).   (4)

Before we continue with the analysis of the push-sum consensus for information diffusion, we make the following assumptions regarding the above problem:

Assumption 1. F is strongly convex on X and g_i is convex for i = [n]. The optimal value F* is finite and there is no duality gap, namely F* = q*. There exist compact sets M_i ⊂ M_i^0, i = [n], containing the dual optimal vectors µ_i*, i = [n].

Assumption 2. X ⊂ R^d is convex and compact.

Remark 1. Given the convexity properties of the problem and Slater's constraint qualification, we have strong duality, which implies that the duality gap is zero. Furthermore, the optimal vector for µ_i is then uniformly bounded in the norm (see [2]). Thereby, Assumption 1 is given, for example, if Slater's condition holds, F and g_i, i = [n], are continuous and the domain X is compact.

Assumption 3. The gradients ∇_x L_i(x, µ_i) and ∇_{µ_i} L_i(x, µ_i) exist and are uniformly bounded on X and M_i, i.e. there exist L_x < ∞ and L_{µ_i} < ∞ such that ||∇_x L_i(x, µ_i)|| ≤ L_x for x ∈ X, µ_i ∈ M_i, and ||∇_{µ_i} L_i(x, µ_i)|| ≤ L_{µ_i} for x ∈ X, µ_i ∈ M_i.

Remark 2. Note that this means that for either fixed µ_i or fixed x the Lagrangian L_i(x, µ_i) is Lipschitz-continuous with constant L_x or L_{µ_i}, respectively.

Assumption 4. The graph G = {V, E} is fixed and strongly connected. The associated Perron matrix P is column-stochastic.

Remark 3. If, for example, agent j weights its messages by 1/d_j, with d_j representing the out-degree of j, the resulting communication matrix contains the elements P_ij = 1/d_j for (j, i) ∈ E, which achieves column-stochasticity of P.

Recall the push-sum consensus protocol from [1], [3]:

    y[t+1] = P y[t],                        (5a)
    x[t+1] = P x[t],                        (5b)
    z_i[t+1] = x_i[t+1] / y_i[t+1],         (5c)

with initial states x[0] = z[0] = x_0 and y[0] = 1. The updates (5a)-(5c) can be reformulated into a single matrix-update on the ratio states,

    z[t+1] = Q[t] z[t],  with  Q[t] = diag(y[t+1])^{-1} P diag(y[t]),   (6)

and for products of these time-dependent matrices we define

    Φ(t, s) = Q[t] Q[t−1] ... Q[s+1] Q[s],  t > s,                      (8)

with Φ(t, t) = Q[t] for easier notation.

Lemma 1. Given Assumption 4, the time-dependent communication matrix Q[t] of equation (6) and the matrix product Φ(t, s) defined in (8) have the following properties:
a) Matrix Q[t] and matrix Φ(t, s) are row-stochastic for 0 ≤ s ≤ t and all t.
b) lim_{t→∞} Φ(t, 0) = (1/n) 1 1^T.
c) lim_{t→∞} Φ(t, s) = (1/n) 1 y[s]^T for finite 0 ≤ s < t.

Proof. Part a): By construction, Φ(t, 0) = diag(y[t+1])^{-1} P^{t+1} because, with y[t+1] = P^{t+1} 1, each dimension of y[t+1] contains the sum over the respective row of P^{t+1}. Therefore, diag(y[t+1])^{-1} norms the rows of P^{t+1}, such that Φ(t, 0) is row-stochastic. As this is true for arbitrary t, we can factor out Q[t] and show row-stochasticity for all Q[t]:

    Φ(t, 0) 1 = Q[t] Φ(t−1, 0) 1 = Q[t] 1 = 1.

Thereby, we also have for 0 ≤ s ≤ t:

    Φ(t, s) 1 = Q[t] Q[t−1] ... Q[s] 1 = 1.

Part b): Under Assumption 4,

    lim_{t→∞} P^{t+1} = (1/n) w 1^T

holds, with w being the right eigenvector of P for the eigenvalue λ = 1, scaled such that 1^T w = n. Hence lim_{t→∞} y[t+1] = w and

    lim_{t→∞} Φ(t, 0) = (diag[w])^{-1} (1/n) w 1^T = (1/n) 1 1^T,

which is a doubly stochastic matrix.

Part c): Using the results from b), we can write for finite s:

    lim_{t→∞} Φ(t, s) = lim_{t→∞} diag(y[t+1])^{-1} P^{t+1−s} diag(y[s]) = (diag[w])^{-1} (1/n) w 1^T diag(y[s]) = (1/n) 1 y[s]^T.

Lemma 2. Let Assumption 4 hold. Then there exist C > 0 and λ ∈ (0, 1) that satisfy the following expressions for i, j = [n], 0 ≤ s ≤ t and all t:

a)  | Φ(t, 0)_ij − (1/n) Σ_{i=1}^n Φ(t, 0)_ij | ≤ C λ^t,        (9)

b)  | Φ(t, s)_ij − (1/n) Σ_{i=1}^n Φ(t, s)_ij | ≤ C λ^{t−s}.    (10)

Proof. Part a): We add −1/n + 1/n to the term on the left side of the inequality in (9) and apply the triangle inequality, which results in

    | Φ(t, 0)_ij − 1/n | + | 1/n − (1/n) Σ_{i=1}^n Φ(t, 0)_ij |.

Lemma 1 b) shows convergence of both terms to zero; the geometric rate is obtained by the standard procedure for products of row-stochastic, non-negative matrices, see for example Proposition 1 in [5]. Part b): Following the same line of thought as in a), we add +(1/n) y_j[s] − (1/n) y_j[s] to the left side of the inequality in (10) and receive

    | Φ(t, s)_ij − (1/n) y_j[s] | + | (1/n) y_j[s] − (1/n) Σ_{i=1}^n Φ(t, s)_ij |.

Again, Lemma 1 c) shows convergence of the first term to zero for finite s, and the geometric rate follows as in a).
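As a quick numerical sanity check of Lemma 1 b) (this sketch is our own illustration, not part of the paper; the ring topology, weights and initial values are assumptions), the following Python snippet runs the push-sum recursion (5a)-(5c) on a directed ring with self-loops, using the uniform out-degree weights of Remark 3 so that P is column-stochastic. Every de-biased state z_i approaches the network average of the initial values, consistent with lim Φ(t, 0) = (1/n) 1 1^T:

```python
import numpy as np

# Directed ring with self-loops: node i sends to itself and to node (i+1) % n.
# Each node splits its mass evenly over its two out-edges, so every column of
# P sums to 1, i.e. P is column-stochastic (Assumption 4, Remark 3).
n = 4
P = np.zeros((n, n))
for i in range(n):
    P[i, i] = 0.5            # self-loop weight
    P[(i + 1) % n, i] = 0.5  # edge from i to its out-neighbor

x = np.array([1.0, 5.0, -2.0, 8.0])  # x[0] = z[0] = x_0
y = np.ones(n)                        # y[0] = 1
for t in range(200):
    y = P @ y                         # (5a)
    x = P @ x                         # (5b)
z = x / y                             # (5c): de-biased ratio estimates

print(z)  # all entries approach mean(x_0) = 3.0
```

The division by y_i is what removes the bias that the directed, non-doubly-stochastic communication would otherwise introduce into plain averaging.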

Following from implication (15), the third line converges to zero as well, which concludes the proof of part a).

Part b): We bound

    Σ_{t=0}^∞ α_t ||x_i[t] − x̄[t]|| ≤ Σ_{t=0}^∞ α_t a_t + Σ_{t=0}^∞ α_t b_t
        + Σ_{t=0}^∞ α_t ( ||ε_i^x[t−1]|| + (1/n) Σ_{i=1}^n ||ε_i^x[t−1]|| ),    (17)

with the sequences

    a_t = Σ_{j=1}^n | Φ(t−1, 0)_ij − (1/n) Σ_{i=1}^n Φ(t−1, 0)_ij | ||x_j[0]||

and

    b_t = Σ_{s=0}^{t−2} Σ_{j=1}^n | Φ(t−1, s+1)_ij − (1/n) Σ_{i=1}^n Φ(t−1, s+1)_ij | ||ε_j^x[s]||.

From Lemma 2 a) we know that the sequence a_t can be bounded by a sequence a'_t as follows:

    a_t ≤ C λ^t Σ_{j=1}^n ||x_j[0]|| = a'_t.

The series Σ_{t=0}^∞ a'_t converges, as it is a geometric, convergent series with 0 < λ < 1. By the direct comparison test it follows that Σ_{t=0}^∞ a_t converges as well, because 0 ≤ a_t ≤ a'_t. Resulting from Assumption 5, α_t is a positive, non-increasing sequence. Therefore, there exists a 0 < α_0 < ∞ such that Σ_{t=0}^∞ α_t a_t ≤ α_0 Σ_{t=0}^∞ a_t < ∞. Exchanging the order of summation in the double sum defining b_t and applying the geometric bound (10) in the same manner, we conclude Σ_{t=0}^∞ α_t b_t < ∞. Finally, we have

    Σ_{t=0}^∞ α_t ||ε_i^x[t−1]|| ≤ Σ_{t=0}^∞ α_{t−1} ||ε_i^x[t−1]|| < ∞,

where we used again the implication (16) and the fact that α_t is non-increasing. Therefore,

    Σ_{t=0}^∞ α_t ||x_i[t] − x̄[t]|| < ∞

holds, which concludes the proof.

The above Lemma is necessary for the convergence proof of the suggested algorithm. Next, we provide an upper bound for the algorithm updates in each time step.

Proposition 1. Let Assumptions 2, 3 and 4 hold. Then, for the optimal primal dual pair x* ∈ X, µ_i* ∈ M_i and t ≥ t_0, the following bound holds:

    Σ_{i=1}^n ( y_i[t+1] ||x_i[t+1] − x*||² + ||µ_i[t+1] − µ_i*||² )
      ≤ Σ_{i=1}^n ( y_i[t] ||x_i[t] − x*||² + ||µ_i[t] − µ_i*||² )
        − 2α_t ( L(x̄[t], µ*) − L(x*, µ*) + L(x*, µ*) − L(x*, µ[t]) )
        − 2α_t Σ_{i=1}^n ( L_i(z_i[t], µ_i*) − L_i(x̄[t], µ_i*) )
        + L_x² α_t² Σ_{i=1}^n (1/w_i) + α_t² Σ_{i=1}^n L_{µ_i}².            (19)

Proof. For the term (19b), we replace Q[t]_ij with its elements in the first line, rearrange the sums in the second and use the column-stochasticity property of P:

    Σ_{i=1}^n Σ_{j=1}^n y_j[t] ||x_j[t] − x*||² P_ij = Σ_{i=1}^n y_i[t] ||x_i[t] − x*||².

With that,

    (19b) ≤ Σ_{i=1}^n y_i[t] ||x_i[t] − x*||² + Σ_{i=1}^n ||µ_i[t] − µ_i*||².

For (19c), because of convexity of L_i(z_i[t], µ_i[t]) for fixed µ_i[t], we have

    −∇_x L_i(z_i[t], µ_i[t])^T (z_i[t] − x*) ≤ L_i(x*, µ_i[t]) − L_i(z_i[t], µ_i[t]).

In (19d), L_i(z_i[t], µ_i[t]) depends affinely on µ_i[t] with fixed z_i[t] and therefore

    ∇_{µ_i} L_i(z_i[t], µ_i[t])^T (µ_i[t] − µ_i*) = L_i(z_i[t], µ_i[t]) − L_i(z_i[t], µ_i*).

Combining the above results, adding +L_i(x*, µ_i*) − L_i(x*, µ_i*) and +L_i(x̄[t], µ_i*) − L_i(x̄[t], µ_i*), as well as summing from i = 1 to n, we receive

    (19c) + (19d) ≤ −2α_t ( L(x̄[t], µ*) − L(x*, µ*) + L(x*, µ*) − L(x*, µ[t]) )
                    − 2α_t Σ_{i=1}^n ( L_i(z_i[t], µ_i*) − L_i(x̄[t], µ_i*) ).

For t > t_0, it holds that y_i[t] ≥ w_i / 2, as lim_{t→∞} y_i[t] = w_i. Therefore, using the gradient bounds, it holds for t > t_0 that

    (19e) + (19f) ≤ L_x² α_t² Σ_{i=1}^n (1/w_i) + α_t² Σ_{i=1}^n L_{µ_i}².

Combining the above results concludes the proof.

Before we are able to finally prove convergence of our method to the optimum of problem (1), we provide the following Lemma, which is the deterministic version of a Theorem in [10]:

Lemma 4. Let {v_t}_{t=0}^∞, {u_t}_{t=0}^∞, {b_t}_{t=0}^∞ and {c_t}_{t=0}^∞ be non-negative sequences such that Σ_{t=0}^∞ b_t < ∞, Σ_{t=0}^∞ c_t < ∞ and

    v_{t+1} ≤ (1 + b_t) v_t − u_t + c_t,  ∀t ≥ 0.

Then v_t converges and Σ_{t=0}^∞ u_t < ∞.

This Lemma will be the key element for proving the following main result:

Theorem 1. Let Assumptions 1-5 hold. Then x_i[t] and µ_i[t], updated by the rules in (11a)-(11d), converge to an optimal primal dual pair (x*, µ_i*) ∈ X × M for i = [n] as t → ∞.

Proof. To apply Lemma 4, let us define

    v_t = Σ_{i=1}^n ( y_i[t] ||x_i[t] − x*||² + ||µ_i[t] − µ_i*||² ),
    u_t = 2α_t ( L(x̄[t], µ*) − L(x*, µ*) + L(x*, µ*) − L(x*, µ[t]) ),
    c_t = 2α_t L_x n Σ_{i=1}^n ||x_i[t] − x̄[t]|| + L_x² α_t² Σ_{i=1}^n (1/w_i) + α_t² Σ_{i=1}^n L_{µ_i}²,
    b_t = 0,

where the term −2α_t Σ_{i=1}^n ( L_i(z_i[t], µ_i*) − L_i(x̄[t], µ_i*) ) of Proposition 1 is bounded in absolute value via the Lipschitz property of L_i (Remark 2) and absorbed into c_t. For showing that c_t is summable, we recall from Lemma 3 b) that, under the given assumptions, Σ_{t=0}^∞ α_t ||x_i[t] − x̄[t]|| < ∞. Therefore,

    2 L_x n Σ_{t=0}^∞ α_t Σ_{i=1}^n ||x_i[t] − x̄[t]|| < ∞.

By Assumption 5, we directly have

    ( L_x² Σ_{i=1}^n (1/w_i) + Σ_{i=1}^n L_{µ_i}² ) Σ_{t=0}^∞ α_t² < ∞.

Together, this results in Σ_{t=0}^∞ c_t < ∞. Applying Lemma 4, we can then make the statements

    ∃δ:  lim_{t→∞} v_t = lim_{t→∞} Σ_{i=1}^n ( y_i[t] ||x_i[t] − x*||² + ||µ_i[t] − µ_i*||² ) = δ ≥ 0,        (20)

    Σ_{t=0}^∞ u_t = Σ_{t=0}^∞ 2α_t ( L(x̄[t], µ*) − L(x*, µ*) + L(x*, µ*) − L(x*, µ[t]) ) < ∞.               (21)

Because the sum of the step-sizes is infinite, Σ_{t=0}^∞ α_t = ∞ by assumption, there need to exist subsequences x̄[t_l], µ_i[t_l] such that

    lim_{l→∞} L(x̄[t_l], µ*) − L(x*, µ*) + L(x*, µ*) − L(x*, µ[t_l]) = 0.

Because (x*, µ*) is a saddle point of L, it holds for all t_l that

    L(x̄[t_l], µ*) − L(x*, µ*) ≥ 0  and  L(x*, µ*) − L(x*, µ[t_l]) ≥ 0.

Therefore, the limit holds if and only if

    lim_{l→∞} L(x̄[t_l], µ*) = L(x*, µ*)  and  lim_{l→∞} L(x*, µ[t_l]) = L(x*, µ*).

Following from convergence of v_t to some constant δ, the subsequences x̄[t_l] and µ[t_l] are bounded. With that, we can choose convergent subsequences x̄[t_{l_s}] and µ[t_{l_s}], such that lim_{s→∞} (x̄[t_{l_s}], µ[t_{l_s}]) = (x̂, µ̂) ∈ X × M, since X and M are closed. Therefore, it holds that

    lim_{s→∞} L(x̄[t_{l_s}], µ*) = L(x̂, µ*) = L(x*, µ*)  and
    lim_{s→∞} L(x*, µ[t_{l_s}]) = L(x*, µ̂) = L(x*, µ*).

Resulting from the strong convexity of F(x) over X, the equality L(x̂, µ*) = L(x*, µ*) = min_x L(x, µ*) implies that x̂ = x*. Due to dual feasibility of µ̂, x̂ = x* and L(x*, µ̂) = L(x*, µ*) = max_{µ ⪰ 0} L(x*, µ), it is implied that (x̂, µ̂) = (x*, µ*). Next, taking into account Lemma 3 a) and (20), we obtain

    δ = lim_{t→∞} Σ_{i=1}^n ( y_i[t] ||x_i[t] − x*||² + ||µ_i[t] − µ_i*||² )
      = lim_{s→∞} Σ_{i=1}^n ( y_i[t_{l_s}] ||x̄[t_{l_s}] − x*||² + ||µ_i[t_{l_s}] − µ_i*||² ) = 0.

Finally, as y_i[t] > 0 for all t, we conclude

    lim_{t→∞} ||x_i[t] − x*|| = 0,   lim_{t→∞} ||µ_i[t] − µ_i*|| = 0   for i = [n].

V. SIMULATION

As motivated in the introduction, we consider an economic dispatch problem as an example application. In this problem, a group of networked generators seeks to fulfill some predefined demand D while minimizing their summed up local cost functions, which are assumed to take a quadratic form. Formally, the problem can be defined by

    min_p Σ_{i=1}^n C_i(p_i) = min_p Σ_{i=1}^n a_i p_i² + b_i p_i + c_i,    (22a)
    s.t. Σ_{i=1}^n p_i = D,                                                 (22b)
         p_{i,min} ≤ p_i ≤ p_{i,max},  ∀i ∈ [n].                            (22c)

The power balance constraint (22b) consists of the sum of all local decision variables p_i and is therefore part of the global constraint set. Furthermore, we add the technical constraint that all power outputs of the generators should be positive, p ⪰ 0, defining the global constraint set as P = {p | p ⪰ 0, Σ_{i=1}^n p_i = D}. Thereby, we ensure that P is closed and bounded and therefore compact, satisfying Assumption 2. Furthermore, this provides that the projection onto P binds the generation p_i, because, regardless of the other p_j, j = [n], j ≠ i, it holds that 0 ≤ p_i ≤ D. The lower and upper power limits of each generator in constraint (22c) allocate the local part of the constraint set and are therefore exclusively known by the respective agent i.

The solution set is non-empty if

    Σ_{i=1}^n p_{min,i} ≤ D ≤ Σ_{i=1}^n p_{max,i}.

Therefore, there exists at least one relative interior point for which the affine equality and inequality constraints are fulfilled, which satisfies the Slater constraint qualification. Together with the strong convexity of the cost functions, we conclude that Assumption 1 is given for this problem.
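The projection onto the global set P = {p | p ⪰ 0, Σ_i p_i = D} is a Euclidean projection onto a simplex scaled by D. The paper does not prescribe an implementation, so the following is only a minimal sketch using the standard sort-based threshold routine (the function name and example inputs are our own):

```python
def project_onto_scaled_simplex(v, D=1.0):
    """Euclidean projection of v onto {p : p >= 0, sum(p) = D}.

    Standard sort-based algorithm: find a threshold theta such that
    p_i = max(v_i - theta, 0) sums to D.
    """
    u = sorted(v, reverse=True)      # entries in decreasing order
    css, theta = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        css += uk                    # cumulative sum of the k largest entries
        t = (css - D) / k
        if uk - t > 0:               # component k is still active for threshold t
            theta = t
    return [max(vi - theta, 0.0) for vi in v]

p = project_onto_scaled_simplex([3.0, -1.0, 2.0], D=2.0)
print(p)  # feasible point: all entries >= 0 and summing to D = 2
```

Note that the result indeed satisfies the bound discussed above: each projected component lies between 0 and D, independently of the other components of the input.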


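For a reference optimum p* of the quadratic dispatch problem (22), one can solve the problem centrally by classical lambda-iteration: at the optimum each unconstrained generator equalizes its marginal cost 2 a_i p_i + b_i to a common price λ, clipped to its limits, and λ is found by bisection on the power balance. The coefficients below are made-up illustrative values, not the ones used in the paper's experiment:

```python
# Illustrative (assumed) coefficients: C_i(p) = a_i * p^2 + b_i * p + c_i
a = [0.10, 0.12, 0.08, 0.15]
b = [2.0, 1.5, 3.0, 1.0]
pmin = [0.0, 0.5, 0.0, 1.0]
pmax = [10.0, 8.0, 6.0, 5.0]
D = 12.0  # demand, chosen with sum(pmin) <= D <= sum(pmax)

def outputs(lam):
    # Optimal p_i for a given marginal price: 2*a_i*p_i + b_i = lam,
    # clipped to the generator limits (22c).
    return [min(max((lam - bi) / (2 * ai), lo), hi)
            for ai, bi, lo, hi in zip(a, b, pmin, pmax)]

# sum(outputs(lam)) is nondecreasing in lam, so bisect on the balance (22b).
lo, hi = 0.0, 100.0
for _ in range(100):
    lam = 0.5 * (lo + hi)
    if sum(outputs(lam)) < D:
        lo = lam
    else:
        hi = lam
p_star = outputs(lam)
print(p_star, sum(p_star))  # balanced dispatch: the outputs sum to D
```

Such a centrally computed p* is what the relative errors of the distributed iterates are measured against in the experiment below.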
The local Lagrangians of the problem take the form

    L_i(p_i, µ_i) = a_i p_i² + b_i p_i + c_i + µ_{i,1} (p_{i,min} − p_i) + µ_{i,2} (p_i − p_{i,max}).

To check whether Assumption 3 is satisfied, the gradients of the local Lagrangians are inspected. First, we realize that p_i[t] ≡ z_i[t] of equation (11b) is uniformly bounded in every time step. This results from projecting the gradient update onto the convex and compact set P and communicating the results (x[t+1] in equation (11c)) in the next time step via a row-stochastic communication matrix, such that the result lies inside the convex hull spanned by x[t+1] and therefore inside of P. Using this result and the fact that Slater's constraint qualification is given, which provides us with uniform bounds on µ_i (see Remark 1), we can conclude that both ∇_p L_i(p_i, µ_i) and ∇_{µ_i} L_i(p_i, µ_i) can be uniformly bounded. Together with the strong convexity of the cost functions (22a), we conclude that Assumption 3 is given for Problem (22).

We chose a simple setup of four generators and designed the demand D and the generator limits such that the feasibility condition Σ_i p_{min,i} ≤ D ≤ Σ_i p_{max,i} holds. Each agent i maintains an estimate p^i containing all decision variables, i.e. p^i = (p_j)_{j=1}^n. The agents were connected by a static, strongly connected graph, such that Assumption 4 is satisfied. At last, the step-size sequence was chosen by a grid search according to Assumption 5 as α_t = 15/t^{0.60}.

In Figure 1, the convergence of the relative errors δ_k = |p_k − p_k*| / p_k*, k = [4], of one example agent is depicted. Three of the four states approach 0 already after 500 iterations, while the error δ_3 shows slow convergence over several hundred iterations. After 1000 iterations, all errors are below 0.04 and after 4000 iterations the highest relative error in the agent system is lower than 0.01.

Fig. 1. Convergence of the relative error with step size α_t = 15/t^{0.60}. Largest error after 4000 iterations: 0.01.

VI. CONCLUSION

In the work at hand, we tailored a solution method to a class of distributed, convex optimization problems that need to respect both global and local constraints. Each agent projects its gradient update onto the global set while updating a Lagrange parameter, over which its local constraints are added to the cost function. Convergence to the optimal value was proven and some convergence properties were shortly discussed by an example of the economic dispatch problem. Future work will include augmenting the method for time-varying communication architectures in order to make the algorithm more robust against failing communication channels.

REFERENCES

[1] T. Charalambous and C. N. Hadjicostis. Average consensus in the presence of dynamically changing directed topologies and time delays. In Proceedings of the IEEE Conference on Decision and Control, pages 709–714, 2014.
[2] J.-B. Hiriart-Urruty and C. Lemaréchal. Convex Analysis and Minimization Algorithms I: Fundamentals, 1996.
[3] D. Kempe, A. Dobra, and J. Gehrke. Gossip-based computation of aggregate information. In Proceedings - Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 482–491. IEEE Computer Society, 2003.
[4] H. Li, Q. Lu, and T. Huang. Distributed projection subgradient algorithm over time-varying general unbalanced directed graphs. IEEE Transactions on Automatic Control, 64(3):1309–1316, 2019.
[5] A. Nedić and A. Olshevsky. Distributed optimization over time-varying directed graphs. IEEE Transactions on Automatic Control, 60(3):601–615, March 2015.
[6] A. Nedić, A. Olshevsky, and M. G. Rabbat. Network topology and communication-computation tradeoffs in decentralized optimization. Proceedings of the IEEE, 106(5):953–976, 2018.
[7] A. Nedić, A. Olshevsky, and W. Shi. Achieving geometric convergence for distributed optimization over time-varying graphs. SIAM Journal on Optimization, 27(4):2597–2633, 2017.
[8] A. Nedić, A. E. Ozdaglar, and P. A. Parrilo. Constrained consensus and optimization in multi-agent networks. IEEE Transactions on Automatic Control, 55(4):922–938, 2010.
[9] S. Pu, W. Shi, J. Xu, and A. Nedić. Push-pull gradient methods for distributed optimization in networks. CoRR, abs/1810.06653, 2018.
[10] H. Robbins and D. Siegmund. A convergence theorem for non negative almost supermartingales and some applications. In Optimizing Methods in Statistics, Academic Press, New York, pages 233–257, 1971.
[11] E. Seneta. Non-negative Matrices and Markov Chains. Springer Series in Statistics. Springer, New York, NY, 1981.
[12] W. Shi, Q. Ling, G. Wu, and W. Yin. EXTRA: An exact first-order algorithm for decentralized consensus optimization. SIAM Journal on Optimization, 25(2):944–966, 2015.
[13] T. Tatarenko, J. Zimmermann, V. Willert, and J. Adamy. Penalized push-sum algorithm for constrained distributed optimization with application to energy management in smart grid. In 58th Conference on Decision and Control (CDC), December 2019.
[14] K. I. Tsianos, S. Lawlor, and M. G. Rabbat. Consensus-based distributed optimization: Practical issues and applications in large-scale machine learning. In 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 1543–1550. IEEE, October 2012.
[15] S. Van Mai and E. H. Abed. Distributed optimization over weighted directed graphs using row stochastic matrix. In Proceedings of the American Control Conference, pages 7165–7170, 2016.
[16] M. Zhu and E. Frazzoli. Distributed robust adaptive equilibrium computation for generalized convex games. Automatica, 63:82–91, 2016.
[17] J. Zimmermann, T. Tatarenko, V. Willert, and J. Adamy. Optimales Energie-Management über verteilte, beschränkte Gradientenverfahren [Optimal energy management via distributed, constrained gradient methods]. at - Automatisierungstechnik, 67(11):922–935, November 2019.