Projected Push-Sum Gradient Descent-Ascent for Convex Optimization
with Application to Economic Dispatch Problems

arXiv:2004.02854v2 [eess.SY] 11 Aug 2020

Jan Zimmermann, Tatiana Tatarenko, Volker Willert, Jürgen Adamy

The work was gratefully supported by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG) within the SPP 1984 "Hybrid and multimodal energy systems: System theoretical methods for the transformation and operation of complex networks". The authors are with the Control Methods and Robotics Lab at TU Darmstadt, Germany.

Abstract— We propose a novel algorithm for solving convex, constrained and distributed optimization problems defined on multi-agent-networks, where each agent has exclusive access to a part of the global objective function. The agents are able to exchange information over a directed, weighted communication graph, which can be represented as a column-stochastic matrix. The algorithm combines an adjusted push-sum consensus protocol for information diffusion and a gradient descent-ascent on the local cost functions, providing convergence to the optimum of their sum. We provide results on a reformulation of the push-sum into single matrix-updates and prove convergence of the proposed algorithm to an optimal solution, given standard assumptions in distributed optimization. The algorithm is applied to the distributed economic dispatch problem, in which the constraints can be expressed in local and global subsets.
I. INTRODUCTION

We consider constrained optimization problems that are distributed over multi-agent-networks. In such scenarios each agent has a local cost function, known only to the respective agent. The overall goal is to minimize the sum of all local functions, while the exact form of the latter should remain private. This type of problem is known as social welfare optimization. The problems are often subject to a variety of constraints that depend on the application. For many such constrained optimization problems, it can be distinguished between global constraints, which affect all agents in the system, and local constraints, which are only relevant to a single agent. An example application is the distributed economic dispatch problem (DEDP), where each agent represents a generator with a distinct cost function. The goal of DEDPs is to minimize the overall cost for producing power, while matching the power demand and keeping inside the generators' limits. In such problems, the constraint of balancing demand and produced power is globally defined, as it constrains the production of all generators, while the power limits of each generator should remain private and are therefore local. Depending on the choice of the cost functions, the resulting problem is either convex or non-convex. We employ first order gradient methods as a core optimization strategy.

Currently, a lot of work is dedicated to optimization methods that use gradient tracking instead of the gradient at a distinct point in time. These methods, published for example in [7], [9] and [12], have the advantage that a constant step-size can be used for the gradient update, while first order methods usually require a diminishing step-size sequence for convergence. However, constraints have not yet been considered in the gradient tracking methods.

On the other hand, a couple of publications have focused on first order methods that are able to respect constraints. A projection-free method that uses the push-sum algorithm, already published in [13] for spreading information, was further analyzed in [17]. There, the constraints are incorporated into the objective by a penalization function. One property of this approach is that it considers all constraints to be local, which makes the method applicable to a wide range of optimization problems, including the DEDP. However, next to the step-size sequence a second penalization parameter sequence needs to be determined. Choosing those dependent sequences optimally proved to be non trivial [17]. One of the first projection-based, distributed gradient methods was published in [8]. It relies on double-stochastic matrix-updates, which restricts communication to undirected graphs. The contributions in [4] and [15] rely on row-stochastic communication matrices for diffusing the projected gradient information. Finally, [14] employs the push-sum consensus combined with a projection that uses a convex proximal function. The major drawback of the mentioned projection-based methods in relation to the DEDP is the assumption that all constraints of the optimization problem under consideration are known by every agent. This restricts the privacy of the agents and therefore of the local constraints. Compared to the projection-free method in [13], these methods have the advantage that no penalization parameter sequence needs to be chosen.

Our work seeks to combine the advantage of the reduced parameter number with the projection methods' ability to respect the local constraints, while assuming directed communication architectures. This is done by exploiting the explicit distinction of the constraints into local and global ones. Similar to the approaches in [5] and [13], we employ the push-sum consensus for information diffusion. However, by formulating the push-sum algorithm into a single row-stochastic matrix-update for the easier analysis, the basic structure of the algorithm is closer to the ones in [4] or [15], which also use row-stochastic updates, but do not rely on the push-sum consensus. For the convergence to the optimum, we propose a novel distributed gradient descent-ascent algorithm, which uses a projection in order to incorporate the global constraint while respecting the local constraints by adding them over a Lagrange multiplier to the local cost functions.
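To make the projection-plus-multiplier idea concrete, the following minimal, centralized sketch treats a toy problem with one inequality constraint: the primal iterate is projected onto a simple set after each descent step, while the constraint is priced into the cost through a multiplier updated by gradient ascent. The problem data, step-size rule and iteration count are our own illustrative assumptions; this is not the distributed algorithm proposed in the paper.

```python
# Toy saddle-point problem (illustrative choice, not from the paper):
#   min (x - 3)^2   s.t.   g(x) = x - 1 <= 0,   x in X = [0, 10]
# KKT point: x* = 1, mu* = 4  (stationarity: 2(x* - 3) + mu* = 0).

def project_interval(u, lo=0.0, hi=10.0):
    # Euclidean projection onto X = [lo, hi]
    return min(max(u, lo), hi)

x, mu = 8.0, 0.0
for t in range(1, 50001):
    alpha = t ** -0.6                  # diminishing step-size sequence
    grad_x = 2.0 * (x - 3.0) + mu      # gradient of L(x, mu) in x
    grad_mu = x - 1.0                  # gradient of L(x, mu) in mu, i.e. g(x)
    x = project_interval(x - alpha * grad_x)   # projected descent step
    mu = max(0.0, mu + alpha * grad_mu)        # ascent step, mu kept >= 0

print(round(x, 2), round(mu, 2))  # approaches the saddle point (1.0, 4.0)
```

With a diminishing step-size sequence the iterates settle at the saddle point of the Lagrangian; handling constraints through such a primal-dual pair, rather than through a penalization sequence, is the mechanism the proposed method builds on.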
Within this article, we make the following contributions: First, we provide a reformulation of the unconstrained push-sum consensus, which differs from the one published in e.g. [5], and provide convergence properties of the update matrix. Secondly, we prove convergence of the distributed gradient descent-ascent method to an optimal solution of the problem, respecting privacy of the local constraints. At last, we show that our proposed algorithm is applicable to the DEDP.

The paper is structured as follows: In section II we provide our notation for formulas and graphs. The main part begins in section III with the formulation of the problem class and the results on the reformulation of the push-sum, before our algorithm for solving the defined problems is proposed. The subsequent section IV provides the convergence proof of the proposed method. In section V we undergird the theoretic results by a simulation of a basic DEDP, before we summarize and conclude our results in section VI.

II. NOTATION AND GRAPHS

Throughout the paper we use the following notation: The set of non-negative integers is denoted by $\mathbb{Z}_+$. All time indices $t$ in this work belong to this set. To differentiate between multi-dimensional vectors and scalars we use boldface. $\|\cdot\|$ denotes the standard euclidean norm. The set $\{1, \dots, n\}$ is denoted by $[n]$. The element $ij$ of matrix $M$ is denoted by $M_{ij}$. The operation $P_X(u)$ is the projection of $u$ onto the convex set $X$, such that $P_X(u) = \arg\min_{x \in X} \|x - u\|$. Instances of a variable $x$ at time $t$ are denoted by $x[t]$. The directed graph $G = \{V, E\}$ consists of a set of vertices $V$ and a set of edges $E \subseteq V \times V$. Vertex $j$ can send information to vertex $i$ if $(j, i) \in E$. The communication channels can be described by the Perron matrix $P$, where $P_{ij} > 0$ if $(j, i) \in E$ and zero otherwise. This notation includes self-loops such that $P_{ii} > 0$. The set containing the in-neighborhood nodes of node $i$ is $N_i^+$, while $N_i^-$ is the set of out-neighbors. The out degree of node $i$ is denoted with $d_i = |N_i^-|$. We say that a directed graph is strongly connected if there exists a path from every vertex to every other vertex.

III. PROJECTED PUSH-SUM GRADIENT DESCENT-ASCENT

A. Problem formulation and matrix update of the push-sum consensus
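The graph notation can be illustrated with a short sketch. The graph below and its weighting are our own illustrative assumptions: each sender $j$ splits its messages as $1/d_j$ over its outgoing edges (self-loop included), which makes every column of $P$ sum to one, and strong connectivity is checked via positivity of $(I + A)^{n-1}$ for the adjacency pattern $A$.

```python
import numpy as np

# Directed graph on 4 vertices; an edge (j, i) means j sends to i.
n = 4
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (2, 0)]
edges += [(i, i) for i in range(n)]  # self-loops, so that P_ii > 0

# Out degree d_j counts all outgoing edges of j (self-loop included here,
# an assumption of this sketch).
out_deg = {j: sum(1 for (a, _) in edges if a == j) for j in range(n)}

# Perron matrix: P[i, j] = 1/d_j if (j, i) is an edge, zero otherwise,
# so every column sums to one (column-stochasticity).
P = np.zeros((n, n))
for (j, i) in edges:
    P[i, j] = 1.0 / out_deg[j]

assert np.allclose(P.sum(axis=0), 1.0)  # column-stochastic

# Strong connectivity: (I + A)^(n-1) has no zero entry iff every vertex
# reaches every other vertex.
A = (P > 0).astype(int)
reach = np.linalg.matrix_power(np.eye(n, dtype=int) + A, n - 1)
print((reach > 0).all())  # True for this graph
```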
We consider optimization problems of the form

    $\min_x F(x) = \min_x \sum_{i=1}^{n} F_i(x)$,    (1a)
    s.t. $g_i(x) \preceq 0$, $g_i(x) = (g_{i1}(x), \dots, g_{im}(x))^T$,    (1b)
    $x \in X \subset \mathbb{R}^d$,    (1c)

where the functions $F_i : \mathbb{R}^d \to \mathbb{R}$ and $g_{ij} : \mathbb{R}^d \to \mathbb{R}$ are differentiable for $i = [n]$ and $j = [m]$.¹ While we consider $g_i(x)$ to be local constraint functions of agent $i$, the constraint set $X$ is assumed to be global and therefore known by every agent.

¹ For the sake of notation, we assume here that all agents have $m$ local constraints. However, the following considerations hold for arbitrary, yet finite numbers of local constraints that can differ between the agents.

After defining $M_i^0 = \{\mu_i \in \mathbb{R}^m \mid \mu_i \succeq 0\}$ and by using $\mu = (\mu_i^T)_{i=1}^n$, the dual function of the problem takes the form

    $q(\mu) = \inf_{x \in X} \sum_{i=1}^{n} \left( F_i(x) + \mu_i^T g_i(x) \right)$,    (2)

with which the dual problem

    $\max_{\mu} q(\mu)$,    (3a)
    s.t. $\mu \in M^0$,    (3b)

with $M^0 = \{\mu \in \mathbb{R}^{n \times m} \mid \mu_i \in M_i^0, i = [n]\}$, can be defined. In accordance to that, the global Lagrangian is the sum of the local Lagrangians

    $L(x, \mu) = \sum_{i=1}^{n} L_i(x, \mu_i) = \sum_{i=1}^{n} \left( F_i(x) + \mu_i^T g_i(x) \right)$.    (4)

Before we continue with the analysis of the push-sum consensus for information diffusion, we make the following assumptions regarding the above problem:

Assumption 1. $F$ is strongly convex on $X$ and $g_i$ is convex for $i = [n]$. The optimal value $F^*$ is finite and there is no duality gap, namely $F^* = q^*$. There exist compact sets $M_i \subset M_i^0$, $i = [n]$, containing the dual optimal vectors $\mu_i^*$, $i = [n]$.

Assumption 2. $X \subset \mathbb{R}^d$ is convex and compact.

Remark 1. Given the convexity properties of the problem and Slater's constraint qualification, we have strong duality, which implies that the duality gap is zero. Furthermore, the optimal vector for $\mu_i$ is then uniformly bounded in the norm (see [2]). Thereby, Assumption 1 is given, for example, if Slater's condition holds, $F$ and $g_i$, $i = [n]$, are continuous and the domain $X$ is compact.

Assumption 3. The gradients $\nabla_x L_i(x, \mu_i)$, $\nabla_{\mu_i} L_i(x, \mu_i)$ exist and are uniformly bounded on $X$ and $M_i$, i.e. $\exists L_x < \infty, L_{\mu_i} < \infty$ such that $\|\nabla_x L_i(x, \mu_i)\| \le L_x$ for $x \in X$, $\mu_i \in M_i$ and $\|\nabla_{\mu_i} L_i(x, \mu_i)\| \le L_{\mu_i}$ for $x \in X$, $\mu_i \in M_i$.

Remark 2. Note that this means that for either fixed $\mu_i$ or fixed $x$ the Lagrangian $L_i(x, \mu_i)$ is Lipschitz-continuous with constant $L_x$, $L_{\mu_i}$, respectively.

Assumption 4. The graph $G = \{V, E\}$ is fixed and strongly connected. The associated Perron matrix $P$ is column-stochastic.

Remark 3. If, for example, every agent $j$ weights its messages by $1/d_j$, with $d_j$ representing the out degree of $j$, the resulting communication matrix contains the elements $P_{ij} = 1/d_j$, which achieves column-stochasticity of $P$.

Recall the push-sum consensus protocol from [1], [3]:

    $y[t+1] = P y[t]$,    (5a)
    $x[t+1] = P x[t]$,    (5b)
    $z_i[t+1] = \dfrac{x_i[t+1]}{y_i[t+1]}$,    (5c)

with initial states $x[0] = z[0] = x_0$ and $y[0] = \mathbf{1}$.

[...] holds, with $w$ being the right eigenvector of $P$ for the eigenvalue $\lambda = 1$. Therefore,

    $\lim_{t \to \infty} \Phi(t, 0) = (\mathrm{diag}[w])^{-1} w \frac{1}{n} \mathbf{1}^T = \frac{1}{n} \mathbf{1}\mathbf{1}^T$,

which is a double-stochastic matrix.

Part c): Using the results from b), we can write for finite $s$ [...]
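The push-sum protocol (5a)-(5c) can be simulated directly. In the sketch below the matrix and the initial values are our own illustrative choices; the point is that every ratio $z_i[t]$ recovers the average of the initial values $x_0$, even though $P$ is only column-stochastic and the graph is directed.

```python
import numpy as np

# Column-stochastic Perron matrix of a strongly connected digraph on
# three agents (values chosen for illustration only).
P = np.array([[0.5, 0.0, 0.3],
              [0.5, 0.5, 0.0],
              [0.0, 0.5, 0.7]])
assert np.allclose(P.sum(axis=0), 1.0)  # column-stochastic

x0 = np.array([1.0, 5.0, 12.0])  # local initial values, x[0] = z[0] = x0
x = x0.copy()
y = np.ones(3)                   # push-sum weights, y[0] = 1

# Push-sum iterations (5a)-(5c): z_i[t+1] = x_i[t+1] / y_i[t+1]
for t in range(60):
    y = P @ y
    x = P @ x
    z = x / y

print(np.round(z, 6))  # every entry approaches mean(x0) = 6.0
```

The $y$-iterate compensates for the imbalance a directed, merely column-stochastic matrix introduces, which is why the ratio, and not $x$ alone, reaches the average.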
Following from implication (15), the third line converges to zero as well, which concludes the proof of part a).

Part b): We have

    $\sum_{t=0}^{\infty} \alpha_t \|x_i[t] - \bar{x}[t]\| \le \sum_{t=0}^{\infty} \alpha_t a_t + \sum_{t=0}^{\infty} \alpha_t b_t + \sum_{t=0}^{\infty} \alpha_t \left( \|\epsilon_i^x[t-1]\| + \frac{1}{n} \sum_{i=1}^{n} \|\epsilon_i^x[t-1]\| \right)$    (17)

with the sequences

    $a_t = \sum_{j=1}^{n} \left| \Phi(t-1, 0)_{ij} - \frac{1}{n} \sum_{i=1}^{n} \Phi(t-1, 0)_{ij} \right| \|x_j[0]\|$

and

    $b_t = \sum_{s=0}^{t-2} \sum_{j=1}^{n} \left| \Phi(t-1, s+1)_{ij} - \frac{1}{n} \sum_{i=1}^{n} \Phi(t-1, s+1)_{ij} \right| \|\epsilon_j^x[s]\|$.

From Lemma 2 a) we know that the sequence $a_t$ can be bounded by a sequence $a'_t$ as follows:

    $a_t \le C \lambda^t \sum_{j=1}^{n} \|x_j[0]\| = a'_t$.

The series $\sum_{t=0}^{\infty} a'_t$ converges, as it is a geometric, convergent series with $0 < \lambda < 1$. By direct comparison test it follows that $\sum_{t=0}^{\infty} a_t$ converges as well, because $0 \le a_t \le a'_t$. Resulting from Assumption 5, $\alpha_t$ is a positive, non-increasing sequence. Therefore, there exists a [...] Applying this to our case, we conclude $\sum_{t=0}^{\infty} \alpha_t b_t < \infty$. Finally, we have

    $\sum_{t=0}^{\infty} \alpha_t \|\epsilon_i^x[t-1]\| \le \sum_{t=0}^{\infty} \alpha_{t-1} \|\epsilon_i^x[t-1]\| < \infty$,

where we used again the implication (16) and the fact that $\alpha_t$ is non-increasing. Therefore,

    $\sum_{t=0}^{\infty} \alpha_t \|x_i[t] - \bar{x}[t]\| < \infty$

holds, which concludes the proof.

The above Lemma is necessary for the convergence proof of the suggested algorithm. Next, we provide an upper bound for the algorithm updates in each time step.

Proposition 1. Let Assumptions 2, 3 and 4 hold. Then, for the optimal primal dual pair $x^* \in X$, $\mu_i^* \in M_i$ and $t \ge t_0$, the following bound holds:

    $\sum_{i=1}^{n} y_i[t+1] \|x_i[t+1] - x^*\|^2 + \|\mu_i[t+1] - \mu_i^*\|^2 \le \sum_{i=1}^{n} y_i[t] \|x_i[t] - x^*\|^2 + \|\mu_i[t] - \mu_i^*\|^2 + [...]$
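The geometric decay that drives the comparison-test argument above can be observed numerically. Assuming a constant column-stochastic $P$ (so that, in the convention of this sketch, $\Phi(t, 0) = \mathrm{diag}(y[t+1])^{-1} P^{t+1}$ with $y[0] = \mathbf{1}$; the matrix values are again our own illustration), the deviation of $\Phi(t, 0)$ from its limit $\frac{1}{n}\mathbf{1}\mathbf{1}^T$ shrinks like $\lambda^t$:

```python
import numpy as np

# Illustrative column-stochastic matrix of a strongly connected digraph.
P = np.array([[0.5, 0.0, 0.3],
              [0.5, 0.5, 0.0],
              [0.0, 0.5, 0.7]])
n = 3
y = np.ones(n)          # y[0] = 1
Pt = np.eye(n)
limit = np.ones((n, n)) / n   # (1/n) * 1 1^T

errs = []
for t in range(40):
    Pt = P @ Pt                    # P^(t+1)
    y = P @ y                      # y[t+1] = P^(t+1) 1
    Phi = np.diag(1.0 / y) @ Pt    # reformulated row-stochastic update
    errs.append(np.abs(Phi - limit).max())

# Deviation from the double-stochastic limit decays geometrically,
# mirroring the bound a_t <= C * lambda^t * sum_j ||x_j[0]||.
print(errs[-1] < 1e-9, errs[30] < errs[10])
```

Because the deviation is summable against any positive, non-increasing step-size sequence $\alpha_t$, the comparison test applies exactly as in the proof.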