arXiv:1407.0898v3 [math.OC] 30 Sep 2015

A Coordinate Descent Primal-Dual Algorithm and Application to Distributed Asynchronous Optimization

P. Bianchi, W. Hachem and F. Iutzeler

Abstract — Based on the idea of randomized coordinate descent of α-averaged operators, a randomized primal-dual optimization algorithm is introduced, where a random subset of coordinates is updated at each iteration. The algorithm builds upon a variant of a recent (deterministic) algorithm proposed by Vũ and Condat that includes the well known ADMM as a particular case. The obtained algorithm is used to solve asynchronously a distributed optimization problem. A network of agents, each having a separate cost function containing a differentiable term, seeks to find a consensus on the minimum of the aggregate objective. The method yields an algorithm where, at each iteration, a random subset of agents wake up, update their local estimates, exchange some data with their neighbors, and go idle. Numerical results demonstrate the attractive performance of the method. The general approach can be naturally adapted to other situations where coordinate descent convex optimization algorithms are used with a random choice of the coordinates.

Index Terms — Distributed Optimization, Coordinate Descent, Consensus algorithms, Primal-Dual Algorithm.

The first two authors are with the CNRS LTCI; Telecom ParisTech, Paris, France. The third author is with LJK, Université Joseph Fourier, Grenoble, France. E-mails: pascal.bianchi, [email protected], [email protected]. This work was granted by the French Defense Agency (DGA) ANR-grant ODISSEE.

I. INTRODUCTION

Let X and Y be two Euclidean spaces and let M : X → Y be a linear operator. Given two real convex functions f and g on X and a real convex function h on Y, we consider the minimization problem

    inf_{x ∈ X}  f(x) + g(x) + h(Mx),    (1)

where f is differentiable and its gradient ∇f is Lipschitz-continuous. Although our theoretical contributions are valid for very general functions f, g and h, the application part of this paper puts a special emphasis on the problem of distributed optimization. In this particular framework, one considers a set of N agents such that each agent n = 1, ..., N has a private cost of the form f_n + g_n, where f_n and g_n are two convex cost functions on some (other) space, f_n being differentiable. The aim is to distributively solve the minimization problem

    inf_u  ∑_{n=1}^N  f_n(u) + g_n(u).    (2)
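For concreteness, here is one purely hypothetical instance of (2), not taken from this paper: each agent n holds local data (A_n, b_n) and the network seeks a sparse consensus estimate, so that

    f_n(u) = (1/2) ‖A_n u − b_n‖²,    g_n(u) = λ ‖u‖₁.

Then ∇f_n(u) = A_nᵀ(A_n u − b_n) is Lipschitz-continuous with constant ‖A_n‖², while g_n is non-differentiable but has an explicit proximity operator (soft-thresholding), which is exactly the structure assumed on f_n and g_n above.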
In order to construct distributed algorithms, a standard approach consists in introducing the functions

    f(x) = ∑_{n=1}^N f_n(x_n)    and    g(x) = ∑_{n=1}^N g_n(x_n)

for all x = (x_1, ..., x_N) in the product space. Obviously, problem (2) is equivalent to the minimization of f(x) + g(x) under the constraint that all components of x are equal, i.e., x_1 = ··· = x_N. Therefore, Problem (2) is in fact a special instance of Problem (1) if one chooses h(Mx) as an indicator function, equal to zero if x_1 = ··· = x_N and +∞ otherwise. As we shall see, this reformulation of Problem (2) is often a mandatory step in the construction of distributed algorithms.
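The computational content of this reformulation can be sketched in a few lines. The sketch below uses our own hypothetical names (make_separable, prox_consensus), not the paper's notation; the fact it illustrates is that the proximity operator of the consensus indicator is the projection onto the consensus subspace, i.e., coordinate-wise averaging of the local copies.

    import numpy as np

    # x stacks the N local copies x_1, ..., x_N as the rows of an (N, d) array.

    def make_separable(local_funcs):
        """F(x) = sum_n f_n(x_n): the separable part built from the local costs."""
        def F(x):
            return sum(f_n(x_n) for f_n, x_n in zip(local_funcs, x))
        return F

    def prox_consensus(x):
        """Prox of the indicator of {x : x_1 = ... = x_N}: the projection onto
        the consensus subspace, which replaces every copy by their average."""
        return np.tile(x.mean(axis=0), (x.shape[0], 1))

    # Example with N = 3 agents in dimension d = 2.
    x = np.arange(6.0).reshape(3, 2)
    print(prox_consensus(x))  # three identical rows, each equal to [2., 3.]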
Our contributions are as follows.

1) Vũ and Condat have separately proposed an algorithm to solve (1) in [1] and [2] respectively. Elaborating on this algorithm, we provide an iterative algorithm for solving (1) which we refer to as ADMM+ (Alternating Direction Method of Multipliers plus) because it includes the well known ADMM [3], [4] as the special case corresponding to f = 0. Interestingly, in the framework of distributed optimization, the ADMM+ is provably convergent under weaker assumptions on the step sizes as compared to the original Vũ/Condat algorithm.

2) Based on the idea of the stochastic coordinate descent, we develop a distributed asynchronous version of ADMM+. As a first step, we borrow from [1]-[2] the idea that their algorithm is an instance of a so-called Krasnosel'skii-Mann iteration applied to an α-averaged operator [8, Section 5.2]. Such operators have contraction-like properties that make the Krasnosel'skii-Mann iterations converge to a fixed point of the operator. The principle of stochastic coordinate descent algorithms, which have been mainly studied in the literature in the special case of proximal gradient algorithms [5]–[7], is to update only a random subset of coordinates at each iteration. In this paper, we show in most generality that a randomized version of the Krasnosel'skii-Mann iterations still converges to a fixed point of an α-averaged operator (a toy sketch of this principle is given after this list). This provides as a side result a proof of the convergence of stochastic coordinate descent versions of the proximal gradient algorithm, since this algorithm can be seen as the application of a 1/2-averaged operator [8]. More importantly in the context of this paper, this idea leads to provably convergent asynchronous distributed versions of ADMM+.

3) Putting together both ingredients above, we apply our findings to asynchronous distributed optimization. First, the optimization problem (1) is rewritten in a form where the operator M encodes the connections between the agents within a graph, in a manner similar to [9]. Then, a distributed optimization algorithm for solving Problem (2) is obtained by applying ADMM+. Using the idea of coordinate descent on top of the algorithm, we then obtain a fully asynchronous distributed optimization algorithm that we refer to as the Distributed Asynchronous Primal-Dual algorithm (DAPD). At each iteration, an independent and identically distributed random subset of agents wake up, apply essentially the proximity operator on their local functions, send some estimates to their neighbors and go idle.
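The toy sketch below illustrates the randomized Krasnosel'skii-Mann principle of item 2 on a deliberately simple averaged operator: a gradient step on a quadratic, which is averaged whenever the step size lies in (0, 2/L). It is our own illustration of the general idea under these assumptions, not the ADMM+ operator studied in this paper.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy averaged operator: a gradient step on f(x) = 0.5 * x^T A x.
    # For gamma in (0, 2/L) with L = 5, T is averaged with unique fixed point 0.
    A = np.diag([1.0, 2.0, 5.0])
    gamma = 0.2
    T = lambda x: x - gamma * (A @ x)

    # Randomized Krasnosel'skii-Mann iteration: apply T, but keep only a
    # random subset of coordinates; the remaining coordinates stay untouched.
    x = np.ones(3)
    for _ in range(500):
        active = rng.random(x.size) < 0.5  # each coordinate updated w.p. 1/2
        x[active] = T(x)[active]
    print(x)  # close to the fixed point (0, 0, 0)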
An algorithm that has some formal resemblance with ADMM+ was proposed in [10], which considers the minimization of the sum of two functions, one of them being subjected to noise. This reference includes a linearization of the noisy function in the ADMM iterations. The use of stochastic

2) The algorithm is a proximal method. Similarly to the distributed ADMM, it allows for the use of a proximity operator at each node. This is especially important to cope with the presence of possibly non-differentiable regularization terms. This is unlike the classical adaptation-diffusion methods mentioned above or the more recent first-order distributed algorithm EXTRA proposed by [28].

3) The algorithm is a first-order method. Similarly to adaptation-diffusion methods, our algorithm allows the computation of explicit gradients of the local cost functions. This is unlike the distributed ADMM, which only admits implicit steps, i.e., agents are required to locally solve an optimization problem at each iteration.

4) The algorithm admits a constant step size. As remarked in [28], standard adaptation-diffusion methods require the use of a vanishing step size to ensure the convergence to the sought minimizer. In practice, this comes at the price of slow convergence. Our method allows for the use of a constant step size in the gradient descent step.

The paper is organized as follows. Section II is devoted to the introduction of the ADMM+ algorithm and its relation with