arXiv:1710.04869v3 [math.OC] 10 Sep 2018

Using a Factored Dual in Augmented Lagrangian Methods for Semidefinite Programming

Marianna De Santis^a, Franz Rendl^b, Angelika Wiegele^b

^a Dipartimento di Ingegneria Informatica Automatica e Gestionale, Sapienza Università di Roma, Via Ariosto 25, 00185 Roma, Italy
^b Institut für Mathematik, Alpen-Adria-Universität Klagenfurt, Universitätsstraße 65-67, 9020 Klagenfurt, Austria

Email addresses: [email protected] (Marianna De Santis), [email protected] (Franz Rendl), [email protected] (Angelika Wiegele)

Abstract

In the context of augmented Lagrangian approaches for solving semidefinite programming problems, we investigate the possibility of eliminating the positive semidefinite constraint on the dual matrix by employing a factorization. Hints on how to deal with the resulting unconstrained maximization of the augmented Lagrangian are given. We further use the approximate maximum of the augmented Lagrangian with the aim of improving the convergence rate of alternating direction augmented Lagrangian frameworks. Numerical results are reported, showing the benefits of the approach.

Keywords: Semidefinite Programming, Alternating Direction Augmented Lagrangian method, theta function
2010 MSC: 90C22, 90C30, 90C06

Preprint submitted to OR Letters, September 12, 2018

1. Introduction

Semidefinite Programs (SDP) can be solved in polynomial time to some fixed prescribed precision, but the computational effort grows both with the number m of constraints and with the order n of the underlying space of symmetric matrices. Interior point methods to solve SDPs become impractical, both in terms of computation time and memory requirements, once m ≥ 10^4. Several algorithmic alternatives have been introduced in the literature, including approaches based on augmented Lagrangians [1, 2, 3, 4, 5]. It is the purpose of this paper to elaborate on the alternating direction augmented Lagrangian (ADAL) algorithms proposed in [3, 4] by introducing computational refinements. The key idea will be to eliminate the positive semidefinite constraint on the dual matrix by employing a factorization, so that the maximization of the augmented Lagrangian function with respect to the dual variables can be performed in an unconstrained fashion.

In the remainder of this section we give the problem formulation and state our notation. In Section 2 the ADAL methods for solving semidefinite programs are described. Details on how we maximize the augmented Lagrangian, after a factorization of the dual matrix is employed, are given in Section 3. In Section 4 we outline our new algorithm DADAL: an additional update of the dual variables within one iteration of the ADAL method is used as improvement step. The convergence of ADAL easily follows from the analysis done in [5], which looks at ADAL as a fixed point method. We give insights on how DADAL can improve the convergence rate of ADAL in Section 4.1. Section 5 shows numerical results and Section 6 concludes.
1.1. Problem Formulation and Notation

Let S_n be the set of n-by-n symmetric matrices and S_n^+ ⊂ S_n the set of positive semidefinite matrices. Denoting by ⟨X, Y⟩ = trace(XY) the standard inner product in S_n, we write the standard primal-dual pair of SDP problems as follows:

    min ⟨C, X⟩   s.t.  AX = b,  X ∈ S_n^+,                                        (1)

    max b^⊤y     s.t.  A^⊤y + Z = C,  Z ∈ S_n^+,                                  (2)

where C ∈ S_n, b ∈ R^m, A: S_n → R^m is the linear operator (AX)_i = ⟨A_i, X⟩ with A_i ∈ S_n for i = 1, ..., m, and A^⊤: R^m → S_n is its adjoint, A^⊤y = Σ_i y_i A_i.

We assume that both problems have strictly feasible points, so that strong duality holds. Under this assumption, (X, y, Z) is optimal if and only if

    AX = b,  X ∈ S_n^+,    A^⊤y + Z = C,  Z ∈ S_n^+,    ZX = 0.                   (3)

We further assume that the matrix representation of A, having rows vec(A_i)^⊤, has full rank.

In the following, we denote by vec(M) ∈ R^{mn} the mn-dimensional vector formed by stacking the columns of a matrix M ∈ R^{m×n} on top of each other; vec^{-1} is the inverse operation. We denote by e_i the i-th vector of the standard basis in R^n and by Diag(v) ∈ S_n the diagonal matrix having the vector v ∈ R^n on the diagonal. Whenever a norm is used, we consider the Frobenius norm in case of matrices and the Euclidean norm in case of vectors. We denote the projection of a symmetric matrix M onto the positive semidefinite cone by (M)_+ and its projection onto the negative semidefinite cone by (M)_−.

2. Augmented Lagrangian Methods for SDP

Let X ∈ S_n be the Lagrange multiplier for the dual equation Z − C + A^⊤y = 0. In order to solve Problem (2) with the augmented Lagrangian method we introduce

    L_σ(y, Z; X) := b^⊤y − ⟨Z − C + A^⊤y, X⟩ − (σ/2) ‖Z − C + A^⊤y‖².

To solve (2), we deal with

    max L_σ(y, Z; X)   s.t.  y ∈ R^m,  Z ∈ S_n^+,                                 (4)

where X is fixed and σ > 0 is the penalty parameter. Once Problem (4) is (approximately) solved, the multiplier X is updated by the first order rule

    X = X + σ(Z − C + A^⊤y),                                                      (5)

and the process is iterated until convergence, i.e. until the optimality conditions (3) are satisfied within a certain tolerance (see Chapter 2 in [6] for further details).

Problem (4) is a convex quadratic semidefinite optimization problem, which is tractable but expensive to solve directly.
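For concreteness, the outer update (5) can be written in a few lines of MATLAB. The sketch below is ours and not taken from any of the implementations discussed in this paper; it assumes that the operator A is stored as a sparse m × n² matrix Amat, so that A(X) = Amat*X(:) and A^⊤y = vec^{-1}(Amat'*y).

% First order multiplier update (5); a hedged sketch, not mprw.m code.
function X = update_multiplier(X, y, Z, C, Amat, sigma)
    n = size(X, 1);
    Aty = reshape(Amat' * y, n, n);    % A^T y as an n x n matrix
    X = X + sigma * (Z - C + Aty);     % X <- X + sigma (Z - C + A^T y)
end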
Several simplified versions have been proposed in the literature to quickly get approximate solutions.

In the alternating direction framework proposed by Wen et al. [5], the augmented Lagrangian L_σ(y, Z; X) is maximized with respect to y and Z one after the other. More precisely, at every iteration k, the new point (X^{k+1}, y^{k+1}, Z^{k+1}) is computed by the following steps:

    y^{k+1} := argmax_{y ∈ R^m} L_{σ^k}(y, Z^k; X^k),                             (6)

    Z^{k+1} := argmax_{Z ∈ S_n^+} L_{σ^k}(y^{k+1}, Z; X^k),                        (7)

    X^{k+1} := X^k + σ^k (Z^{k+1} − C + A^⊤y^{k+1}).                               (8)

These three steps are iterated until a stopping criterion is met. The update of y in (6) is derived from the first-order optimality condition of Problem (4): y^{k+1} is the unique solution of

    ∇_y L_{σ^k}(y, Z^k; X^k) = b − A(X^k + σ^k(Z^k − C + A^⊤y)) = 0,

that is

    y^{k+1} = (AA^⊤)^{-1} ( b/σ^k − A(X^k/σ^k − C + Z^k) ).

Then, the maximization in (7) is conducted by considering the equivalent problem

    min_{Z ∈ S_n^+} ‖Z + W^k‖²,                                                   (9)

with W^k = X^k/σ^k − C + A^⊤y^{k+1}, or, in other words, by projecting W^k ∈ S_n onto the (closed convex) cone S_n^− and taking its additive inverse (see Algorithm 1). Such a projection is computed via the spectral decomposition of the matrix W^k.

The Boundary Point Method proposed in [3, 4] can also be viewed as an alternating direction augmented Lagrangian method. In fact, as also noted in [5], the available implementation mprw.m (see https://www.math.aau.at/or/Software/) is an alternating direction augmented Lagrangian method, since in the inner loop (see Table 2 in [3]) only one iteration is performed. We report in Algorithm 1 the scheme of mprw.m. The stopping criterion of mprw.m considers only the following primal and dual infeasibility errors:

    r_P = ‖AX − b‖ / (1 + ‖b‖),    r_D = ‖C − Z − A^⊤y‖ / (1 + ‖C‖);

the other optimality conditions (namely, X ∈ S_n^+, Z ∈ S_n^+, ZX = 0) are satisfied up to machine accuracy throughout the algorithm. More precisely, the algorithm stops as soon as the quantity

    δ = max{r_P, r_D}

is less than a fixed precision ε > 0.

Algorithm 1 Scheme of mprw.m
1  Initialization: Choose σ > 0, X ∈ S_n^+, ε > 0. Set Z = 0.
2  Repeat until δ < ε:
3    Compute y = (AA^⊤)^{-1} ( b/σ − A(X/σ − C + Z) )
4    Compute Z = −(X/σ − C + A^⊤y)_−  and  X = σ(X/σ − C + A^⊤y)_+
5    Compute δ = max{r_P, r_D}
6    Update σ
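The projection in Step 4 of Algorithm 1, i.e. the solution of (9), can be sketched as follows (minimal MATLAB under the same Amat convention as above; mprw.m may organize this computation differently):

% Split W = X/sigma - C + A^T y into its projections onto S_n^+ and S_n^-
% with one spectral decomposition; then Z = -(W)_- and X = sigma (W)_+.
function [Z, X] = projection_step(X, y, C, Amat, sigma)
    n = size(X, 1);
    W = X / sigma - C + reshape(Amat' * y, n, n);
    W = (W + W') / 2;                                     % symmetrize against round-off
    [Q, D] = eig(W);
    d = diag(D);
    Wplus = Q(:, d > 0) * diag(d(d > 0)) * Q(:, d > 0)';  % (W)_+
    X = sigma * Wplus;
    Z = Wplus - W;                                        % -(W)_- = (W)_+ - W
end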

It is the main purpose of this paper to investigate enhancements to this algorithm. The key idea will be to replace in the subproblem (4) the constraint Z ∈ S_n^+ by Z = VV^⊤ and to consider (4) as an unconstrained problem in y and V.

3. Solving the subproblem (4)

By introducing a variable V ∈ R^{n×r} (1 ≤ r ≤ n), such that Z = VV^⊤ ∈ S_n^+, we reformulate Problem (4) as the following unconstrained maximization problem

    max L_σ(y, V; X)   s.t.  y ∈ R^m,  V ∈ R^{n×r},                               (10)

where

    L_σ(y, V; X) = b^⊤y − ⟨VV^⊤ − C + A^⊤y, X⟩ − (σ/2) ‖VV^⊤ − C + A^⊤y‖².

Note that the number of columns r of the matrix V represents the rank of the dual variable Z.

The first-order necessary optimality conditions for Problem (10) state the following:

Proposition 1. Let (y*, V*) ∈ R^m × R^{n×r} be a stationary point of Problem (10). Then

    ∇_y L_σ(y*, V*; X) = b − A(X + σ(V*V*^⊤ − C + A^⊤y*)) = 0,
                                                                                  (11)
    ∇_V L_σ(y*, V*; X) = −2 (X + σ(V*V*^⊤ − C + A^⊤y*)) V* = 0.

From (11), we can easily see that we can keep the optimality conditions with respect to y satisfied while moving V along any direction D_V ∈ R^{n×r}:

Proposition 2. Let D_V ∈ R^{n×r}. Let

    y(V + αD_V) = y_0 + α y_1 + α² y_2,                                           (12)

with

    y_0 = (AA^⊤)^{-1} ( b/σ − A(X/σ − C + VV^⊤) ),
    y_1 = (AA^⊤)^{-1} ( −A(D_V V^⊤ + V D_V^⊤) ),
    y_2 = (AA^⊤)^{-1} ( −A(D_V D_V^⊤) ).

Then ∇_y L_σ(y(V + αD_V), V + αD_V; X) = 0 for all α ∈ R.

Proof. Let α ∈ R. From (11), we have that ∇_y L_σ(y, V; X) = 0 iff

    AA^⊤ y = b/σ − A(X/σ − C + VV^⊤).

Therefore, when V is replaced by V + αD_V, we get

    (AA^⊤) y = b/σ − A( X/σ − C + (V + αD_V)(V + αD_V)^⊤ )
             = b/σ − A( X/σ − C + VV^⊤ ) − α A(D_V V^⊤ + V D_V^⊤) − α² A(D_V D_V^⊤).

By multiplying both the l.h.s. and the r.h.s. with (AA^⊤)^{-1}, we get the expression in (12) and the proposition is proven.

Thanks to Proposition 2, we can maximize L_σ(y, V; X) with respect to V, keeping the variable y updated according to (12) along the iterations. Thus we are in fact maximizing a polynomial of degree 4 in V.

3.1. Direction Computation

In order to compute an ascent direction for the augmented Lagrangian we consider two possibilities: either we use the gradient of L_σ(y, V; X) with respect to V, or we use the gradient scaled with the inverse of the diagonal of the Hessian of L_σ(y, V; X). We recall the gradient of L_σ with respect to V, see (11), as the n × r matrix

    ∇_V L_σ(y, V; X) = −2 (M + σ VV^⊤) V,

where M = X + σ(A^⊤y − C). In order to compute the generic (s, t) entry on the main diagonal of the Hessian, we consider

    lim_{τ→0} (1/τ) ( f_{st}(V + τ e_s e_t^⊤) − f_{st}(V) ),

where f_{st}(V) := e_s^⊤ VV^⊤V e_t, s ∈ {1, ..., n}, t ∈ {1, ..., r}. We get

    ∂²L_σ(y, V; X) / ∂v_{s,t}∂v_{s,t} = −2 m_{ss} − 2σ ( (e_s^⊤Ve_t)² + ‖V^⊤e_s‖² + ‖Ve_t‖² ).

We define the n × r matrix H by

    (H)_{s,t} = 2 max{0, m_{ss}} + 2σ ( (e_s^⊤Ve_t)² + ‖V^⊤e_s‖² + ‖Ve_t‖² ),

and we propose to use the search direction D_V given by the gradient scaled by H, thus

    (D_V)_{s,t} := (∇_V L_σ)_{s,t} / H_{s,t}.                                     (13)

We note that H is generically positive, as V should not contain columns all equal to zero. In practice, we compute the scaled gradient direction in case ‖∇_V L_σ(y, V; X)‖ < 10^{-3}.

3.2. Exact Linesearch

Given a search direction D_V, we note that L_σ(y(V + αD_V), V + αD_V; X) is a polynomial of degree 4 in α, so that we can interpolate five different points

    (α_i, L_σ(y(V + α_i D_V), V + α_i D_V; X)),   i = 1, ..., 5,

to get its analytical expression. This also means that the maximum of L_σ(y(V + αD_V), V + αD_V; X) can be detected analytically (using the Cardano formula). In practice, we evaluate the degree-4 polynomial L_σ(y(V + αD_V), V + αD_V; X) at a few thousand points in (0, 10) and take the best α.

Algorithm 2 is a scheme of the generic iteration we perform to maximize L_σ(y, V; X), X being fixed.

Algorithm 2 Solving (4) approximately
1  Input: σ > 0, y ∈ R^m, V ∈ R^{n×r}
2  Repeat until ‖∇_V L_σ‖ < ε_inner:
3    Compute the search direction D_V ∈ R^{n×r} using (13)
4    Compute the optimal stepsize α
5    Update y = y(V + αD_V) according to (12) and V = V + αD_V.
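The following MATLAB sketch illustrates one pass of Algorithm 2, combining the scaled direction (13) with the exact linesearch. All names are ours; for compactness the sketch always scales by H (whereas we switch to the scaled direction only when the gradient norm is small) and maximizes the quartic through its stationary points rather than on a grid.

% One inner iteration: scaled-gradient direction (13), then exact
% maximization of the quartic alpha -> L_sigma(y(V+alpha*D), V+alpha*D; X).
function [y, V] = inner_iteration(V, X, C, b, Amat, sigma)
    [n, r] = size(V);
    y = ysolve(V, X, C, b, Amat, sigma);                % keeps grad_y = 0, cf. (12)
    M = X + sigma * (reshape(Amat' * y, n, n) - C);     % M = X + sigma (A^T y - C)
    G = -2 * (M + sigma * (V * V')) * V;                % gradient (11) w.r.t. V
    H = 2 * repmat(max(0, diag(M)), 1, r) ...           % diagonal Hessian scaling
        + 2 * sigma * (V.^2 + repmat(sum(V.^2, 2), 1, r) + repmat(sum(V.^2, 1), n, 1));
    D = G ./ H;                                         % search direction (13)
    nodes = 0:0.25:1;                                   % five nodes fix the quartic
    vals  = arrayfun(@(a) Lfac(V + a * D, X, C, b, Amat, sigma), nodes);
    p = polyfit(nodes, vals, 4);                        % analytical expression
    cand = roots(polyder(p));                           % stationary points
    cand = real(cand(abs(imag(cand)) < 1e-12 & real(cand) > 0));
    if isempty(cand), cand = 1; end                     % fallback stepsize (ours)
    [~, i] = max(polyval(p, cand));
    V = V + cand(i) * D;
    y = ysolve(V, X, C, b, Amat, sigma);                % equals (12) at the chosen alpha
end

function y = ysolve(V, X, C, b, Amat, sigma)            % y(V), cf. Proposition 2
    rhs = b / sigma - Amat * reshape(X / sigma - C + V * V', [], 1);
    y = (Amat * Amat') \ rhs;
end

function f = Lfac(V, X, C, b, Amat, sigma)              % L_sigma(y(V), V; X), cf. (10)
    n = size(V, 1);
    y = ysolve(V, X, C, b, Amat, sigma);
    R = V * V' - C + reshape(Amat' * y, n, n);
    f = b' * y - sum(sum(R .* X)) - (sigma / 2) * norm(R, 'fro')^2;
end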

4. DADAL: a dual step for improving alternating direction augmented Lagrangian methods for SDP

Our idea is to insert the approximated solution of Problem (4) obtained from Algorithm 2 as a simultaneous update of y and V, to be performed before the projection step in Algorithm 1. We detail below the scheme of the ADAL method where this "dual step" is inserted. We refer to Algorithm 3 as DADAL.

Algorithm 3 DADAL
1  Initialization: Choose σ > 0, r > 0, ε > 0, X ∈ S_n^+, V ∈ R^{n×r}, Z = VV^⊤, y ∈ R^m.
2  Repeat until δ < ε:
3    Update (y, V) by Algorithm 2
4    Compute Z = −(X/σ − C + A^⊤y)_−  and  X = σ(X/σ − C + A^⊤y)_+
5    Update r ← rank(Z) and V so that VV^⊤ = Z
6    Compute δ = max{r_P, r_D}
7    Update σ

4.1. Convergence Analysis

The convergence analysis of DADAL may follow from the analysis in [5], which looks at ADAL as a fixed point method. One iteration of the ADAL method can be seen as the result of the combination of two operators, namely (Z^{k+1}, X^{k+1}) = P(W(Z^k, X^k)), where P denotes the projection performed at Step 4 in Algorithm 1 and W(Z^k, X^k) = X^k/σ − C + A^⊤y(Z^k, X^k), being y(Z^k, X^k) = (AA^⊤)^{-1} ( b/σ − A(X^k/σ − C + Z^k) ). It can be shown that P and W are non-expansive operators (see [5] for further details). Hence, the key step in the proof of Theorem 2 in [5] states the following:

    ‖(Z^{k+1}, X^{k+1}/σ) − (Z*, X*/σ)‖ = ‖P(W(Z^k, X^k)) − P(W(Z*, X*))‖
                                        ≤ ‖W(Z^k, X^k) − W(Z*, X*)‖               (14)
                                        ≤ ‖(Z^k, X^k/σ) − (Z*, X*/σ)‖.

In DADAL, before the projection step, the dual variables are updated by performing one maximization step for the augmented Lagrangian, so that we get (ŷ^k, V̂^k) from (y^k, V^k) and, in particular, the dual matrix is given as Ẑ^k = V̂^k (V̂^k)^⊤.

Proposition 3. Let Ẑ, Z, X, X*, Z* ∈ S_n and let Z ≠ Z*. Let

    ‖Ẑ − Z*‖² ≤ ρ ‖Z − Z*‖²,

with 0 < ρ ≤ 1. Then there exists ρ̄, ρ ≤ ρ̄ ≤ 1, such that

    ‖(Ẑ, X) − (Z*, X*)‖² ≤ ρ̄ ‖(Z, X) − (Z*, X*)‖².

Proof.

    ‖(Ẑ, X) − (Z*, X*)‖² = ‖Ẑ − Z*‖² + ‖X − X*‖²
                          ≤ ρ ‖Z − Z*‖² + ‖X − X*‖²
                          = ρ̄ ‖(Z, X) − (Z*, X*)‖²,

where ρ̄ = ρ + ε, with

    ε = (1 − ρ) c(X, Z)   and   c(X, Z) := ‖X − X*‖² / ‖(Z, X) − (Z*, X*)‖².

Since 0 ≤ c(X, Z) < 1, we have 0 ≤ ε < (1 − ρ), so that ρ̄ ≤ 1 and ρ̄ ≥ ρ.

Note that, if ρ < 1, we have that ρ̄ = ρ + ε < 1.

Theorem 1. Let (ŷ^k, Ẑ^k) be the dual variables obtained from (y^k, Z^k) by performing one iteration of Algorithm 2. Let the direction D_V in Algorithm 2 be chosen such that

    ‖Ẑ^k − Z*‖ ≤ δ ‖Z^k − Z*‖,                                                    (15)

with δ ≤ 1. Then the sequence {X^k, y^k, Z^k} generated by Algorithm 3 converges to a solution {X*, y*, Z*}.

Proof. Since condition (15) holds, we can apply Proposition 3, so that there exists δ̄, δ ≤ δ̄ ≤ 1, such that

    ‖(Ẑ^k, X^k/σ) − (Z*, X*/σ)‖ ≤ δ̄ ‖(Z^k, X^k/σ) − (Z*, X*/σ)‖.

Then, the series of inequalities (14) can be extended as:

    ‖(Z^{k+1}, X^{k+1}/σ) − (Z*, X*/σ)‖ = ‖P(W(Ẑ^k, X^k)) − P(W(Z*, X*))‖
                                        ≤ ‖W(Ẑ^k, X^k) − W(Z*, X*)‖
                                        ≤ ‖(Ẑ^k, X^k/σ) − (Z*, X*/σ)‖
                                        ≤ δ̄ ‖(Z^k, X^k/σ) − (Z*, X*/σ)‖.

The rest of the proof follows the same arguments as those in Theorem 2 in [5].

Note that δ < 1 in (15) implies δ̄ < 1, so that the additional step of maximizing the augmented Lagrangian strictly improves the convergence rate of ADAL methods.

Remark 1. When dealing with unconstrained optimization problems, it is well known that the sequence {x^k} produced by Newton's method converges superlinearly to a stationary point x* if the starting point x^0 is sufficiently close to x* (see e.g. Prop. 1.4.1 in [7]). Therefore, condition (15) is satisfied when, e.g., D_V is chosen as the Newton direction and our starting V in Algorithm 2 is in a neighborhood of the optimal solution. Assumption (15) is in fact motivating our direction computation: as soon as we are close enough to an optimal solution, we try to mimic the Newton direction by scaling the gradient with the inverse of the diagonal of the Hessian (13).

4.2. Choice and Update of the Penalty Parameter

Let (y^0, Z^0, X^0) be our starting solution. In defining a starting penalty parameter σ^0, i.e. a starting value for scaling the violation of the dual equality constraints in the augmented Lagrangian, we might want to take into account the dual infeasibility error r_D (defined in Section 2) at the starting solution. Since the penalty parameter enters in the update of the primal solution (8) as well, we can tune σ^0 so that the starting primal and dual infeasibility errors are balanced. Our proposal is to use the following starting penalty parameter:

    σ^0 = r_P / r_D = ( ‖AX^0 − b‖ / (1 + ‖b‖) ) / ( ‖C − Z^0 − A^⊤y^0‖ / (1 + ‖C‖) ).   (16)
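In code, this initialization reads as follows (a sketch under the same Amat convention as before; the starting point (y0, Z0, X0) is whatever the caller provides):

% Starting penalty parameter (16): balance primal and dual infeasibility.
n  = size(X0, 1);
rP = norm(Amat * X0(:) - b) / (1 + norm(b));
rD = norm(C - Z0 - reshape(Amat' * y0, n, n), 'fro') / (1 + norm(C, 'fro'));
sigma = rP / rD;                               % sigma^0 = rP / rD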
It has been noticed that the update of the penalty parameter σ is crucial for the computational performance of ADAL methods for SDPs [3, 4, 5]. Therefore, in order to improve the numerical performance of DADAL, strategies to dynamically adjust the value of σ may be considered. In the implementation of DADAL used in our numerical experience (see Section 5), we adopt the following strategy. We monitor the primal and dual relative errors, and note that the stopping condition of the augmented Lagrangian method is given by r_D ≤ ε, provided that the inner problem is solved with reasonable accuracy. In order to improve the numerical performance, we use the following intuition. If r_D is much smaller than r_P (we test for 100 r_D < r_P), this indicates that σ is too big in the subproblem. On the other hand, if r_D is much larger than r_P (we test for 2 r_D > r_P), then σ should be increased to facilitate overall progress. If either of these two conditions occurs consecutively for several iterations, we change σ by dividing or multiplying it by 1.3. Similar heuristics have been suggested also in [5] and [4].
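This heuristic may be sketched as follows; the counters and the threshold kmax (how many consecutive iterations count as "several") are our illustrative choices, not the exact values used in our implementation.

% Dynamic update of sigma, run once per outer iteration (sketch).
if 100 * rD < rP
    cntP = cntP + 1; cntD = 0;        % sigma looks too big in the subproblem
elseif 2 * rD > rP
    cntD = cntD + 1; cntP = 0;        % sigma looks too small
else
    cntP = 0; cntD = 0;
end
if cntP >= kmax                       % e.g. kmax = 10 (assumption)
    sigma = sigma / 1.3; cntP = 0;
elseif cntD >= kmax
    sigma = sigma * 1.3; cntD = 0;
end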
5. Numerical Results

In this section we report our numerical experience: we compare the performance of DADAL and mprw.m on randomly generated instances, on instances of the SDP underlying the Lovász theta number of a graph, and on linear ordering problem instances. Both mprw.m and DADAL are implemented in MATLAB R2014b and are available at https://www.math.aau.at/or/Software/. In our implementation of Algorithm 3, we use the choice and update strategy for the penalty parameter σ described in Section 4.2 and we perform two iterations of Algorithm 2 in order to update (y, V) in Step 3. We set the accuracy level ε = 10^{-5}. We also report the run time of the interior point method for SDP using the solver MOSEK [8]. The experiments were carried out on an Intel Core i7 processor running at 3.1 GHz under Linux.

5.1. Comparison on randomly generated instances

The random instances considered in the first experiment are some of those used in [4] (see Table 1). For these instances, the Cholesky factor of AA^⊤ is computed once and then used along the iterations in order to update the dual variable y.

Table 1: Randomly generated instances.
Problem    n     m       p   seed
P1        300   20000    3   3002030
P2        300   25000    3   3002530
P3        300   10000    4   3001040
P4        400   30000    3   4003030
P5        400   40000    3   4004030
P6        400   15000    4   4001540
P7        500   30000    3   5003030
P8        500   40000    3   5004030
P9        500   50000    3   5005030
P10       500   20000    4   5002040
P11       600   40000    3   6004030
P12       600   50000    3   6005030
P13       600   60000    3   6006030
P14       600   20000    4   6002040
P15       700   50000    3   7005030
P16       700   70000    3   7007030
P17       700   90000    3   7009030
P18       800   70000    3   8007030
P19       800   100000   3   80010030
P20       800   110000   3   80011030

Table 2: Comparison on randomly generated instances.
            MOSEK      mprw.m            DADAL
Problem   time(s)    iter   time(s)    iter   time(s)
P1          633.1     340     10.1       68     10.2
P2         2440.4     427     33.1       76     31.2
P3          129.1     260     11.3      146     32.6
P4         7514.6     306     13.3      101     18.9
P5             -      376     51.4       73     56.3
P6          375.7     255     19.5      187     72.9
P7         7388.3     268     11.5      151     23.1
P8             -      289     15.2      130     30.7
P9             -      319     35.3      111     74.4
P10         886.0     251     26.9      222    119.8
P11            -      266     17.3      177     41.9
P12            -      275     18.6      148     42.1
P13            -      293     28.4      132     68.7
P14        1156.3     249     20.3      270    116.2
P15            -      262     22.4      207     83.7
P16            -      278     30.1      151     79.2
P17            -      303     74.2      128    181.5
P18            -      264     34.3      207    111.2
P19            -      285     53.3      157    126.4
P20            -      296     80.9      133    168.6

As can be seen in Table 2, interior point methods are not able to solve instances with more than 30000 constraints due to memory limitations. For what concerns DADAL, the number of iterations decreases. However, we observe that a decrease in the number of iterations does not always correspond to an improvement in the computational time: this suggests that the update of the dual variables performed at Step 3 in DADAL may be too expensive. According to the MATLAB profiling of our code, this is due to the need of solving three linear systems for the update of y (see Proposition 2).
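The reuse of the Cholesky factor mentioned above can be sketched as follows (names are ours; the factor R is computed once, outside the main loop, so that each of the three systems of Proposition 2 costs only two triangular solves):

% Factor AA^T once and reuse it for y0, y1, y2 of Proposition 2.
R = chol(Amat * Amat');                        % AA^T = R'R, computed once
solveAAt = @(rhs) R \ (R' \ rhs);
y0 = solveAAt(b / sigma - Amat * reshape(X / sigma - C + V * V', [], 1));
y1 = solveAAt(-Amat * reshape(Dv * V' + V * Dv', [], 1));
y2 = solveAAt(-Amat * reshape(Dv * Dv', [], 1));
y  = y0 + alpha * y1 + alpha^2 * y2;           % y(V + alpha Dv), cf. (12)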

5.2. Computation of the Lovász theta number

Given a graph G, let V(G) and E(G) be its set of vertices and its set of edges, respectively. The Lovász theta number ϑ(G) of G is defined as the optimal value of the following SDP problem:

    max ⟨J, X⟩   s.t.  X_{ij} = 0, ∀ ij ∈ E(G),   trace X = 1,   X ∈ S_n^+,

where J is the matrix of all ones.

In Table 3, we report the dimension and the optimal value of the theta instances considered, obtained from some random graphs of the Kim-Chuan Toh collection [9]. As in the case of randomly generated instances, MOSEK can only solve instances with m < 25000.

Table 3: Instances from the Kim-Chuan Toh collection [9].
Problem      n      m       ϑ(G)
ϑ-62        300    13389   29.6413
ϑ-82        400    23871   34.3669
ϑ-102       500    37466   38.3906
ϑ-103       500    62515   22.5286
ϑ-104       500    87244   13.3363
ϑ-123       600    90019   24.6687
ϑ-162       800   127599   37.0097
ϑ-1000     1000   249750   31.8053
ϑ-1500     1500   562125   38.8665
ϑ-2000     2000   999500   44.8558

In the case of theta instances AA^⊤ is a diagonal matrix, so that the update of the dual variable y turned out to be less expensive with respect to the case of random instances. We can see in Table 4 the benefits of using DADAL: the improvements both in terms of number of iterations and in terms of computational time are evident.

Table 4: Comparison on ϑ-number instances (from the Kim-Chuan Toh collection).
            MOSEK      mprw.m            DADAL
Problem   time(s)    iter   time(s)    iter   time(s)
ϑ-62        205.1     790      9.4      173      6.8
ϑ-82       1174.5     821     18.2      143     11.1
ϑ-102          -      852     32.2      163     20.7
ϑ-103          -      848     33.8      192     25.1
ϑ-104          -      873     34.0      290     32.9
ϑ-123          -      874     55.0      191     38.6
ϑ-162          -      907    108.9      165     61.2
ϑ-1000         -      941    212.9      177    117.8
ϑ-1500         -      997    803.6      131    266.6
ϑ-2000         -     1034   1948.5      111    478.8

We want to underline that for specific instances a different tuning of the parameters in DADAL can further improve the running time. In Table 5, we report the results we obtained with DADAL keeping σ fixed to σ^0 along the iterations and performing only one iteration of Algorithm 2 in order to update (y, V) in Step 3: this turned out to be a better choice for these instances and resulted in an even better performance compared to mprw.m.

Table 5: Comparison between mprw.m and DADAL on theta number instances - best parameter tuning.
            mprw.m            DADAL
Problem    iter   time(s)    iter   time(s)
ϑ-62        790      9.4      132      4.9
ϑ-82        821     18.2      124      6.8
ϑ-102       852     32.2      123     11.3
ϑ-103       848     33.8      125     11.7
ϑ-104       873     34.0      166     15.2
ϑ-123       874     55.0      121     17.1
ϑ-162       907    108.9      103     27.4
ϑ-1000      941    212.9      105     50.6
ϑ-1500      997    803.6       94    136.8
ϑ-2000     1034   1948.5       89    283.9
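For reference, the data of the theta SDP above can be assembled in the operator format used throughout the paper (a sketch; edges is assumed to be an |E(G)| × 2 list, the loop is written for clarity rather than efficiency, and a solver working with the minimization form (1) would use −J as objective):

% Theta number SDP data: trace(X) = 1 and X_ij = 0 for every edge ij.
function [Amat, b, C] = theta_data(n, edges)
    nE = size(edges, 1);
    I  = speye(n);
    Amat = sparse(nE + 1, n^2);
    Amat(1, :) = I(:)';                              % <I, X> = trace(X)
    for k = 1:nE
        i = edges(k, 1); j = edges(k, 2);
        E = sparse([i j], [j i], [0.5 0.5], n, n);   % symmetrized e_i e_j^T
        Amat(k + 1, :) = reshape(E, 1, []);          % <E, X> = X_ij
    end
    b = [1; zeros(nE, 1)];                           % m = |E(G)| + 1 constraints
    C = ones(n);                                     % J, the all-ones matrix
end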

5.3. Comparison on linear ordering problem instances

Ordering problems associate to each ordering (or permutation) of a set of objects N = {1, ..., |N|} a profit, and the goal is to find an ordering of maximum profit. In the case of the linear ordering problem (LOP), this profit is determined by those pairs (u, v) ∈ N × N where u comes before v in the ordering. The simplest formulation of LOP problems is a binary linear programming problem. Several semidefinite relaxations have been proposed to compute bounds on this challenging combinatorial problem [10]; the basic one, obtained from the matrix lifting approach, has the following formulation:

    max ⟨C, Z⟩   s.t.  Z ∈ S_n^+,   diag(Z) = e,
                       y_{ij,jk} − y_{ij,ik} − y_{ik,jk} = −1,  ∀ i < j < k,

where e denotes the vector of all ones and

    Z = ( 1   y^⊤
          y   Y  )

is of order n = |N|(|N| − 1)/2 + 1. We have considered LOP instances where the dimension of the set N ranges from 10 to 100 and the matrix C is randomly generated.

Again, for these instances, we have that AA^⊤ is a diagonal matrix, and using DADAL, especially when dealing with large scale instances, leads to an improvement both in terms of number of iterations and in terms of computational time (see Table 6).

Table 6: Comparison on linear ordering problems.
                          MOSEK      mprw.m             DADAL
|N|      n       m      time(s)    iter   time(s)     iter    time(s)
10       46     166        0.4      411      0.2       117       0.8
20      191     1331       3.6      441      2.3       150       2.6
30      436     4496      34.5      483     12.1       170      15.5
40      781    10661     294.1      542     60.7       232      79.8
50     1226    20826    1339.8      604    237.7       215     232.1
60     1771    35991        -       636    759.8       252     748.3
70     2416    57156        -       707   2049.1       273    2122.7
80     3161    85321        -       745   4788.6       282    4395.9
90     4006    121486       -       773   9589.3       300    8923.2
100    4951    166651       -       821  18820.7       323   17944.1
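A sketch of the corresponding constraint assembly is given below; the mapping idx from pairs to matrix positions is our convention, with the entries y_{ij,kl} of Z identified with Z_{idx(i,j),idx(k,l)}:

% LOP relaxation data: Z of order nZ = nchoosek(N,2)+1, diag(Z) = e, and one
% 3-cycle equality per triple i < j < k (written for clarity, not speed).
function [Amat, b] = lop_data(N)
    nZ  = nchoosek(N, 2) + 1;
    idx = zeros(N); c = 1;
    for i = 1:N-1
        for j = i+1:N
            c = c + 1; idx(i, j) = c;               % pair (i,j) -> row/col of Z
        end
    end
    m = nZ + nchoosek(N, 3);
    Amat = sparse(m, nZ^2);
    for k = 1:nZ
        Amat(k, (k - 1) * nZ + k) = 1;              % diag(Z) = e
    end
    row = nZ;
    for i = 1:N-2
        for j = i+1:N-1
            for k = j+1:N
                row = row + 1;
                T = sparse([idx(i,j) idx(j,k) idx(i,j) idx(i,k) idx(i,k) idx(j,k)], ...
                           [idx(j,k) idx(i,j) idx(i,k) idx(i,j) idx(j,k) idx(i,k)], ...
                           [0.5 0.5 -0.5 -0.5 -0.5 -0.5], nZ, nZ);
                Amat(row, :) = reshape(T, 1, []);   % y_{ij,jk} - y_{ij,ik} - y_{ik,jk}
            end
        end
    end
    b = [ones(nZ, 1); -ones(m - nZ, 1)];
end

For |N| = 10 this gives nZ = 46 and m = 166, matching the first row of Table 6.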
6. Conclusions

We investigate the idea of factorizing the dual variable Z when solving SDPs in standard form within augmented Lagrangian approaches. Our proposal is to use a first order update of the dual variables in order to improve the convergence rate of ADAL methods. We add this improvement step to the implementation mprw.m and we conclude that the approach proposed looks particularly promising for solving structured SDPs (which is the case for many applications).

From our computational experience we notice that the spectral decomposition needed to perform the projection is not necessarily the computational bottleneck anymore. In fact, the matrix multiplications needed in order to update y (see Proposition 2) can be the most expensive operations (e.g. in the case of randomly generated instances).

We also tried to use the factorization of Z in a "pure" augmented Lagrangian algorithm. However, this turned out to be not competitive with respect to ADAL methods. This may be due to the fact that positive semidefiniteness of the primal matrix and complementarity conditions are not satisfied by construction (as is the case in ADAL methods), and this slows down the convergence. We want to remark that dealing with the update of the penalty parameter in ADAL methods turned out to be a critical issue. Here we propose a unified strategy that leads to a satisfactory performance on the instances tested. However, for specific instances, different parameter tuning may further improve the performance, as demonstrated for the case of computing the ϑ-number.

Finally, we want to comment on the extension of DADAL to deal with SDPs in general form. Of course, any inequality constraint may be transformed into an equality constraint via the introduction of a non-negative slack variable, so that DADAL can be applied to solve any SDP. However, avoiding the transformation to SDPs in standard form is in general preferable, in order to preserve favorable constraint structures such as sparsity and orthogonality. In [5], an alternating direction augmented Lagrangian method to deal with SDPs that include, in particular, positivity constraints on the elements of the matrix X is presented and tested, even if no convergence analysis is given. In fact, when considering multi-block alternating direction augmented Lagrangian methods, theoretical convergence is an issue [11]. The investigation on how to properly insert the proposed factorization technique within converging augmented Lagrangian schemes for SDPs in general form will be the topic of future work.

References

[1] S. Burer, R. D. C. Monteiro, A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization, Math. Program. 95 (2, Ser. B) (2003) 329–357.
[2] S. Burer, R. D. C. Monteiro, Local minima and convergence in low-rank semidefinite programming, Math. Program. 103 (3, Ser. A) (2005) 427–444.
[3] J. Povh, F. Rendl, A. Wiegele, A Boundary Point Method to solve Semidefinite Programs, Computing 78 (2006) 277–286.
[4] J. Malick, J. Povh, F. Rendl, A. Wiegele, Regularization methods for semidefinite programming, SIAM J. Optim. 20 (1) (2009) 336–356.
[5] Z. Wen, D. Goldfarb, W. Yin, Alternating direction augmented Lagrangian methods for semidefinite programming, Math. Program. Comput. 2 (3) (2010) 203–230.
[6] D. P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods, Computer Science and Applied Mathematics, Academic Press Inc. [Harcourt Brace Jovanovich Publishers], New York, 1982.
[7] D. P. Bertsekas, Nonlinear Programming, Athena Scientific, 1999.
[8] MOSEK ApS, The MOSEK optimization toolbox for MATLAB manual. Version 8.0.0.81 (2017). URL http://docs.mosek.com/8.0/toolbox/index.html
[9] K.-C. Toh, Solving large scale semidefinite programs via an iterative solver on the augmented systems, SIAM J. Optim. 14 (3) (2003) 670–698.
[10] P. Hungerländer, F. Rendl, Semidefinite relaxations of ordering problems, Math. Program. 140 (1) (2013) 77–97.
[11] C. Chen, B. He, Y. Ye, X. Yuan, The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent, Math. Program. 155 (1-2) (2016) 57–79.

Acknowledgement

The authors would like to thank the anonymous referees for their careful reading of the paper and for their constructive comments, which were greatly appreciated. The third author acknowledges support by the Austrian Science Fund (FWF): I 3199-N31.