
Part 5. Dual-based Methods
Math 126, Winter 18
Date of current version: February 25, 2018
Donghwan Kim, Dartmouth College. E-mail: [email protected]

Abstract. This note studies dual-based methods such as the dual subgradient method, the dual proximal gradient method, the augmented Lagrangian method, and the alternating direction method of multipliers (ADMM). Many parts of this note are based on [1, Chapters 8, 12, 15]. Please email me if you find any typos or errors.

1 Dual Projected Subgradient Methods (see [1, Chapter 8])

Consider the problem

  min_x f(x)  subject to g(x) ⪯ 0,   (1.1)

where we assume that
– f : R^d → R is convex.
– g(·) = (g_1(·), ..., g_m(·))^⊤, where g_1, ..., g_m : R^d → R are convex.
– The problem has a finite optimal value, and the optimal set, denoted by X_*, is nonempty.
– There exists x̂ for which g(x̂) ≺ 0.
– For any λ ∈ R^m_+, the problem min_x {f(x) + λ^⊤ g(x)} has an optimal solution.

The Lagrange dual function is

  q(λ) = inf_x { L(x, λ) ≡ f(x) + λ^⊤ g(x) },   (1.2)

and the dual problem is

  max_{λ ⪰ 0} q(λ).   (1.3)

For a given λ ∈ R^m_+, suppose that the minimum in the minimization problem defining q(λ) in (1.2) is attained at x_λ, i.e.,

  L(x_λ, λ) = f(x_λ) + λ^⊤ g(x_λ) = q(λ).   (1.4)

We seek to find a subgradient of the convex function −q at λ. For any λ̄ ∈ R^m_+, we have

  q(λ̄) = min_x { f(x) + λ̄^⊤ g(x) }   (1.5)
       ≤ f(x_λ) + λ̄^⊤ g(x_λ)
       = f(x_λ) + λ^⊤ g(x_λ) + (λ̄ − λ)^⊤ g(x_λ)
       = q(λ) + g(x_λ)^⊤ (λ̄ − λ),

which leads to

  −q(λ̄) ≥ −q(λ) + (−g(x_λ))^⊤ (λ̄ − λ).   (1.6)

This shows that

  −g(x_λ) ∈ ∂(−q)(λ).   (1.7)

Algorithm 1 Dual Projected Subgradient (Ascent) Method
1: Input: λ_0 ∈ R^m_+.
2: for k ≥ 0 do
3:   Compute x_k ∈ argmin_x { L(x, λ_k) ≡ f(x) + λ_k^⊤ g(x) }.
4:   Choose a step size s_k > 0.
5:   λ_{k+1} = [λ_k + s_k g(x_k)]_+.
6:   If a stopping criterion is satisfied, then stop.

Example 1.1 (Dual decomposition) Consider the problem

  min_x { f(x) ≡ Σ_{i=1}^B f_i(x_i) }  subject to Ax ⪯ b,

where f is (block-)separable, x = (x_1^⊤, ..., x_B^⊤)^⊤, and A = (A_1, ..., A_B) is partitioned into conformable column blocks so that Ax = Σ_{i=1}^B A_i x_i. Then the Lagrangian L(·) is also (block-)separable in x:

  L(x, λ) = Σ_{i=1}^B L_i(x_i, λ),

where L_i(x_i, λ) = f_i(x_i) + λ^⊤ (A_i x_i − b/B). Thus, at the kth iteration of the dual subgradient method, the x-minimization splits into B separate minimizations:

  [x_k]_i ∈ argmin_{x_i} L_i(x_i, λ_k),  i = 1, ..., B.

Convergence analysis of the dual objective function sequence {q(λ_k)}_{k≥0} under various choices of {s_k}_{k≥0} is already known, since the convergence analysis of the projected subgradient method applies directly to the dual problem. What about the convergence of a primal sequence? For the primal problem, we have to consider a primal sequence other than {x_k}_{k≥0}:

– full averaging sequence:

  x̄_k = Σ_{i=0}^k η_{k,i} x_i,  where η_{k,i} = s_i / (Σ_{l=0}^k s_l),  i = 0, ..., k.   (1.8)

– partial averaging sequence:

  x̃_k = Σ_{i=⌈k/2⌉}^k η_{k,i} x_i,  where η_{k,i} = s_i / (Σ_{l=⌈k/2⌉}^k s_l),  i = ⌈k/2⌉, ..., k.   (1.9)

Lemma 1.1 Assume that there exists M > 0 such that ||g(x)||_2 ≤ M for any x. Let ρ > 0 be some positive number, and let {x_k}_{k≥0} and {λ_k}_{k≥0} be the sequences generated by the dual projected subgradient method with step size s_k = γ_k / ||g(x_k)||_2. Then for any k ≥ 2,

  f(x̄_k) − f(x_*) + ρ ||[g(x̄_k)]_+||_2 ≤ M ( (||λ_0||_2 + ρ)^2 + Σ_{i=0}^k γ_i^2 ) / ( 2 Σ_{i=0}^k γ_i )   (1.10)

and

  f(x̃_k) − f(x_*) + ρ ||[g(x̃_k)]_+||_2 ≤ M ( (||λ_{⌈k/2⌉}||_2 + ρ)^2 + Σ_{i=⌈k/2⌉}^k γ_i^2 ) / ( 2 Σ_{i=⌈k/2⌉}^k γ_i ).   (1.11)
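To make Algorithm 1 concrete, here is a minimal Python sketch (not taken from [1]; all problem data and names are illustrative). It assumes the specific instance f(x) = (1/2)||x − c||_2^2 with linear constraints g(x) = Ax − b, so that the inner minimization in step 3 has the closed form x_λ = c − A^⊤λ; it also maintains the full averaging sequence (1.8).

import numpy as np

# Sketch of Algorithm 1 for the illustrative instance
#   min 0.5*||x - c||^2   subject to   A x <= b,
# where step 3 has the closed form x_lambda = c - A^T lambda.
rng = np.random.default_rng(0)
d, m = 5, 3
c = rng.standard_normal(d)
A = rng.standard_normal((m, d))
b = A @ c - 0.5                          # chosen so some constraints are active

lam = np.zeros(m)                        # lambda_0 in R^m_+
x_bar, s_sum = np.zeros(d), 0.0          # full averaging sequence (1.8)
for k in range(200):
    x_k = c - A.T @ lam                  # x_k = argmin_x L(x, lambda_k)
    s_k = 1.0 / np.sqrt(k + 1)           # diminishing step size
    lam = np.maximum(lam + s_k * (A @ x_k - b), 0.0)   # projected subgradient step
    x_bar = (s_sum * x_bar + s_k * x_k) / (s_sum + s_k)
    s_sum += s_k

print("averaged primal iterate:", x_bar)
print("max constraint violation:", max(np.max(A @ x_bar - b), 0.0))

Note that the averaged iterate x̄_k, not the last iterate x_k, is the quantity for which primal guarantees such as Lemma 1.1 are stated.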
2 Dual Proximal Gradient Methods (see [1, Chapter 12])

Consider the problem

  min_x { f(x) + φ(Ax) },   (2.1)

where we assume that
– f : R^d → (−∞, ∞] is proper, closed, and σ-strongly convex (σ > 0).
– φ : R^p → (−∞, ∞] is proper, closed, and convex.
– A ∈ R^{p×d} is a matrix.
– There exist x̂ ∈ relint(dom f) and ẑ ∈ relint(dom φ) such that Ax̂ = ẑ.

We first reformulate the problem as

  min_{x,z} { f(x) + φ(z) }   (2.2)
  subject to Ax − z = 0.   (2.3)

The Lagrangian is

  L(x, z, µ) = f(x) + φ(z) − ⟨µ, Ax − z⟩ = f(x) + φ(z) − ⟨A^⊤µ, x⟩ + ⟨µ, z⟩,   (2.4)

and the dual function is

  q(µ) = inf_{x,z} { f(x) + φ(z) − ⟨A^⊤µ, x⟩ + ⟨µ, z⟩ }   (2.5)
       = − sup_{x,z} { −f(x) − φ(z) + ⟨A^⊤µ, x⟩ + ⟨−µ, z⟩ }.

Then, the dual problem is

  max_µ { q(µ) ≡ −f^*(A^⊤µ) − φ^*(−µ) }.   (2.6)

We reformulate the dual problem in its minimization form:

  min_µ { F(µ) + Φ(µ) },   (2.7)

where

  F(µ) = f^*(A^⊤µ),  Φ(µ) = φ^*(−µ).   (2.8)

Theorem 2.1 Let σ > 0. Then
– If f : R^d → R is a (1/σ)-smooth convex function, then f^* is σ-strongly convex.
– If f : R^d → (−∞, ∞] is a proper closed σ-strongly convex function, then f^* is (1/σ)-smooth.

Lemma 2.1 (properties of F and Φ)
(a) F is convex and L_F-smooth with L_F = ||A||_2^2 / σ.
(b) Φ is proper, closed, and convex.

Proof (a) Since f is proper, closed, and σ-strongly convex, f^* is (1/σ)-smooth. Therefore, for any µ_1, µ_2, we have

  ||∇F(µ_1) − ∇F(µ_2)||_2 = ||A ∇f^*(A^⊤µ_1) − A ∇f^*(A^⊤µ_2)||_2
    ≤ ||A||_2 ||∇f^*(A^⊤µ_1) − ∇f^*(A^⊤µ_2)||_2
    ≤ (1/σ) ||A||_2 ||A^⊤µ_1 − A^⊤µ_2||_2
    ≤ (1/σ) ||A||_2 ||A^⊤||_2 ||µ_1 − µ_2||_2
    = (||A||_2^2 / σ) ||µ_1 − µ_2||_2.

F is convex since it is the composition of the convex function f^* with a linear mapping. (b) Since φ is proper, closed, and convex, so is φ^*. Thus Φ(µ) ≡ φ^*(−µ) is also proper, closed, and convex.

If f is also L_f-smooth, then we can use the proximal gradient method on the primal problem as below.

Algorithm 2 Primal Proximal Gradient Method
1: Input: x_0 and L ≥ L_f.
2: for k ≥ 0 do
3:   x_{k+1} = prox_{(1/L) φ(A·)} ( x_k − (1/L) ∇f(x_k) ).

This might not be efficient, since the proximal operator prox_{(1/L) φ(A·)} may not have a closed-form expression due to A (unless A is, for example, a diagonal matrix). Let us see how we can circumvent this issue using the Lagrange dual, which moreover does not require the L_f-smoothness assumption on f. We can readily apply the proximal gradient method to the dual problem as below.

Algorithm 3 Dual Proximal Gradient Method (dual representation)
1: Input: µ_0 and L ≥ L_F = ||A||_2^2 / σ.
2: for k ≥ 0 do
3:   µ_{k+1} = prox_{(1/L) Φ} ( µ_k − (1/L) ∇F(µ_k) ).

Theorem 2.2 Let {µ_k}_{k≥0} be the sequence generated by the dual proximal gradient method. Then, for any optimal solution µ_* of the dual problem (2.7), we have

  q(µ_*) − q(µ_k) ≤ L ||µ_0 − µ_*||_2^2 / (2k).   (2.9)

We next give a primal representation of the dual proximal gradient method, written more explicitly in terms of f, φ, and A. (proof omitted)

Algorithm 4 Dual Proximal Gradient Method (primal representation)
1: Input: µ_0 and L ≥ L_F = ||A||_2^2 / σ.
2: for k ≥ 0 do
3:   x_k = argmax_x { ⟨x, A^⊤µ_k⟩ − f(x) }.
4:   µ_{k+1} = µ_k − (1/L) A x_k + (1/L) prox_{Lφ} ( A x_k − L µ_k ).

Remark 2.1 The sequence {x_k}_{k≥0} generated by the method will be called the primal sequence. Its elements are not necessarily feasible for the primal problem (2.1), since they are not guaranteed to belong to dom φ(A·). Nevertheless, the primal sequence converges to the optimal solution x_*.

Lemma 2.2 (primal-dual relation) Let µ̄ ∈ dom Φ, and let

  x̄ = argmax_x { ⟨x, A^⊤µ̄⟩ − f(x) }.   (2.10)

Then,

  ||x̄ − x_*||_2^2 ≤ (2/σ) ( q(µ_*) − q(µ̄) ).   (2.11)

Theorem 2.3 Let {x_k}_{k≥0} and {µ_k}_{k≥0} be the primal and dual sequences generated by the dual proximal gradient method. Then, for any optimal solution µ_* of the dual problem (2.7), we have

  ||x_k − x_*||_2^2 ≤ L ||µ_0 − µ_*||_2^2 / (σk).   (2.12)
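For concreteness, here is a small Python sketch of Algorithm 4 (not code from [1]; the instance and names are illustrative). It assumes f(x) = (σ/2)||x − c||_2^2 and φ = ||·||_1, so that the argmax in step 3 is x_k = c + A^⊤µ_k/σ and prox_{Lφ} is componentwise soft-thresholding.

import numpy as np

# Sketch of Algorithm 4 for the illustrative instance
#   min_x (sigma/2)*||x - c||^2 + ||A x||_1.
def soft_threshold(v, t):
    # prox of t*||.||_1: componentwise shrinkage
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(0)
d, p, sigma = 6, 4, 1.0
c = rng.standard_normal(d)
A = rng.standard_normal((p, d))
L = np.linalg.norm(A, 2) ** 2 / sigma        # L >= L_F = ||A||_2^2 / sigma

mu = np.zeros(p)
for k in range(300):
    x = c + A.T @ mu / sigma                 # x_k = argmax_x <x, A^T mu_k> - f(x)
    mu = mu - (A @ x) / L + soft_threshold(A @ x - L * mu, L) / L   # step 4
print("primal iterate x_k:", x)

The point of working on the dual here is that prox_{Lφ} (plain soft-thresholding) is cheap, whereas prox_{(1/L) φ(A·)} in Algorithm 2 has no closed form for a general A.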
We can accelerate the dual proximal gradient method by using the fast dual proximal gradient (FDPG) method as below.

Algorithm 5 Fast Dual Proximal Gradient Method (dual representation)
1: Input: µ_0 = η_0, t_0 = 1, and L ≥ L_F = ||A||_2^2 / σ.
2: for k ≥ 0 do
3:   µ_{k+1} = prox_{(1/L) Φ} ( η_k − (1/L) ∇F(η_k) ).
4:   t_{k+1} = (1 + √(1 + 4 t_k^2)) / 2.
5:   η_{k+1} = µ_{k+1} + ((t_k − 1) / t_{k+1}) (µ_{k+1} − µ_k).

Theorem 2.4 Let {µ_k}_{k≥0} be the sequence generated by the fast dual proximal gradient method. Then, for any optimal solution µ_* of the dual problem (2.7), we have

  q(µ_*) − q(µ_k) ≤ 2 L ||µ_0 − µ_*||_2^2 / (k + 1)^2.   (2.13)

Algorithm 6 Fast Dual Proximal Gradient Method (primal representation)
1: Input: µ_0 = η_0, t_0 = 1, and L ≥ L_F = ||A||_2^2 / σ.
2: for k ≥ 0 do
3:   x_k = argmax_x { ⟨x, A^⊤η_k⟩ − f(x) }.
4:   µ_{k+1} = η_k − (1/L) A x_k + (1/L) prox_{Lφ} ( A x_k − L η_k ).
5:   t_{k+1} = (1 + √(1 + 4 t_k^2)) / 2.
6:   η_{k+1} = µ_{k+1} + ((t_k − 1) / t_{k+1}) (µ_{k+1} − µ_k).
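The following Python sketch (again illustrative, not from [1]) runs the fast dual proximal gradient method in its primal representation on the same instance as the previous sketch: f(x) = (σ/2)||x − c||_2^2, φ = ||·||_1, so prox_{Lφ} is soft-thresholding. It combines the primal-representation update of Algorithm 4 with the momentum steps of Algorithm 5.

import numpy as np

# Sketch of the fast dual proximal gradient method (primal representation)
# for min_x (sigma/2)*||x - c||^2 + ||A x||_1.
def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(0)
d, p, sigma = 6, 4, 1.0
c = rng.standard_normal(d)
A = rng.standard_normal((p, d))
L = np.linalg.norm(A, 2) ** 2 / sigma

mu = np.zeros(p)
eta = mu.copy()
t = 1.0
for k in range(300):
    x = c + A.T @ eta / sigma                             # x_k = argmax_x <x, A^T eta_k> - f(x)
    mu_next = eta - (A @ x) / L + soft_threshold(A @ x - L * eta, L) / L
    t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0     # momentum parameter update
    eta = mu_next + ((t - 1.0) / t_next) * (mu_next - mu) # extrapolation step
    mu, t = mu_next, t_next
print("primal iterate x_k:", x)

Relative to the plain dual proximal gradient sketch, only the extrapolated dual point η_k changes; per-iteration cost is essentially the same while the dual objective gap improves from O(1/k) to O(1/k^2) by Theorem 2.4.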