UCLA Electronic Theses and Dissertations
Title: Operator Splitting Methods for Convex and Nonconvex Optimization
Permalink: https://escholarship.org/uc/item/78q0n13c
Author: Liu, Yanli
Publication Date: 2020
Peer reviewed | Thesis/dissertation

UNIVERSITY OF CALIFORNIA
Los Angeles

Operator Splitting Methods for Convex and Nonconvex Optimization

A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Mathematics

by

Yanli Liu

2020

© Copyright by Yanli Liu 2020

ABSTRACT OF THE DISSERTATION

Operator Splitting Methods for Convex and Nonconvex Optimization

by

Yanli Liu
Doctor of Philosophy in Mathematics
University of California, Los Angeles, 2020
Professor Wotao Yin, Chair

This dissertation focuses on a family of optimization methods called operator splitting methods. They solve complicated problems by decomposing the problem structure into simpler pieces and making progress on each piece separately. Over the past two decades, there has been a resurgence of interest in these methods as the demand for solving structured large-scale problems has grown. One of the major challenges for splitting methods is their sensitivity to ill-conditioning, which often makes it hard for them to reach high accuracy. Furthermore, their classical analyses are restricted to nice settings where solutions exist and everything is convex. Much less is known when either of these assumptions breaks down.

This work aims to address the issues above. Specifically, we propose a novel acceleration technique called inexact preconditioning, which exploits second-order information at relatively low computational cost. We also show that certain splitting methods still work on problems without solutions, in the sense that their iterates reveal what goes wrong and how to fix it.
Finally, for nonconvex problems with saddle points, we show that, under certain assumptions, splitting methods almost surely converge only to local minimizers.

The dissertation of Yanli Liu is approved.
Ali H. Sayed
Lieven Vandenberghe
Luminita Aura Vese
Wotao Yin, Committee Chair
University of California, Los Angeles
2020

Contents
List of Figures
List of Tables
Acknowledgments
Curriculum Vitae
I Introduction
1 Introduction
1.1 Background
1.2 Operator Splitting Methods
1.3 Contributions
1.3.1 Acceleration by inexact preconditioning
1.3.2 Convergence behavior on pathological problems
1.3.3 Convergence behavior on nonconvex problems
1.4 Notations and Preliminaries
1.5 Common Operator Splitting Schemes
II Acceleration by Inexact Preconditioning
2 Inexact Preconditioning for PDHG and ADMM
2.1 Introduction
2.1.1 Proposed approach
2.1.2 Related Literature
2.1.3 Organization
2.2 Preliminaries
2.3 Main results
2.3.1 Preconditioned PDHG
2.3.2 Choice of preconditioners
2.3.3 PrePDHG with fixed inner iterations
2.3.4 Global convergence of iPrePDHG
2.4 Numerical experiments
2.4.1 Graph cuts
2.4.2 Total variation based image denoising
2.4.3 Earth mover's distance
2.4.4 CT reconstruction
2.5 Conclusion
2.A ADMM as a special case of PrePDHG
2.B Proof of Theorem 2.3.4: bounded relative error when S is the iterator of cyclic proximal BCD
2.C Two-block ordering in Claim 2.4.1 and four-block ordering in Claim 2.4.2
3 Inexact Preconditioning for SVRG and Katyusha X
3.1 Introduction
3.1.1 Related Work
3.1.2 Our Contributions
3.2 Preliminaries and Assumptions
3.3 Proposed Algorithms
3.4 Main Theory
3.5 Experiments
3.5.1 Lasso
3.5.2 Logistic Regression
3.5.3 Sum-of-nonconvex Example
3.6 Conclusions and Future Work
3.A Proof of Lemma 3.4.1
3.B Proof of Theorem 3.4.2
3.C Proof of Lemma 3.4.4
3.D Proof of Theorem 3.4.3
3.E Proof of Theorems 3.4.5 and 3.4.6
III Convergence Behaviors on Pathological Problems
4 DRS for Pathological Conic Programs
4.1 Introduction
4.1.1 Basic definitions
4.1.2 Classification of conic programs
4.1.3 Classification method overview
4.1.4 Previous work
4.2 Obtaining certificates from Douglas-Rachford Splitting
4.2.1 Convergence of DRS
4.2.2 Fixed-point iterations without fixed points
4.2.3 Feasibility and infeasibility
4.2.4 Modifying affine constraints to achieve strong feasibility
4.2.5 Improving direction
4.2.6 Modifying the objective to achieve finite optimal value
4.2.7 Other cases
4.2.8 The algorithms
4.2.9 Case-by-case illustration
4.3 Numerical Experiments
5 DRS and ADMM for Pathological Convex Problems
5.1 Introduction
5.1.1 Summary of results, contribution, and organization
5.1.2 Prior work
5.2 Preliminaries
5.2.1 Duality and primal subvalue
5.2.2 Douglas–Rachford operator
5.2.3 Fixed-point iterations without fixed points
5.3 Theoretical results
5.3.1 Infimal displacement vector of the DRS operator
5.3.2 Function-value analysis
5.3.3 Evolution of shadow iterates
5.4 Pathological convergence: DRS
5.4.1 Classification
5.4.2 Convergence results
5.4.3 Interpretation
5.4.4 Feasibility problems
5.5 Pathological convergence: ADMM
5.5.1 Classification and convergence results
5.5.2 Interpretation
5.5.3 Proofs
5.6 When strong duality fails
5.7 Conclusion
IV Convergence Behaviors on Nonconvex Problems
6 Strict-Saddle Point Avoidance of FBS and DRS
6.1 Introduction
6.2 Preliminaries
6.3 Davis-Yin Splitting and its Envelope
6.3.1 Review of Davis-Yin Splitting
6.3.2 Envelope of Davis-Yin Splitting
6.4 Properties of Envelope
6.4.1 Global Minimizers Correspondence
6.4.2 Davis-Yin Splitting as Gradient Descent of the Envelope
6.4.3 Local Minimizers Correspondence
6.4.4 Critical and Stationary Point Correspondence
6.4.5 Strict Saddle Correspondence
6.5 Avoidance of Strict Saddle Points
6.6 Conclusions
Bibliography

List of Figures
2.1 Two-block ordering in Claim 2.4.1
2.2 Four-block ordering in Claim 2.4.2
2.3 Input image
2.4 Graph cut by iPrePDHG (Inner: BCD)
2.5 Noisy image
2.6 Denoising by iPrePDHG (Inner: BCD)
2.7 For PDHG, $\tau = 3 \times 10^{-6}$, $\sigma = \frac{1}{\tau \|\mathrm{div}\|^2}$; For iPrePDHG (Inner: BCD), $\tau = 3 \times 10^{-6}$, $M_1 = \tau^{-1} I_n$, $M_2 = \tau\, \mathrm{div}\, \mathrm{div}^T$, $\gamma = \frac{1}{\|M_2\|}$, and $p = 2$. $\|m^*\|_{1,2}$ is obtained by calling CVX.
2.8 $\rho_0$ and $\rho_1$ are the white standing cat and the black crouching cat, respectively. Both images are 256 × 256, and the earth mover's distance between $\rho_0$ and $\rho_1$ is 0.6718.
3.1 Lasso on w1a.t, $(n, d) = (47272, 300)$, $\lambda_1 = 10^{-3}$, $\lambda_2 = 10^{-8}$. For iPreSVRG and iPreKatX: $\eta_1 = 0.005$; For SVRG and Katyusha X: $\eta_2 = 0.08$; For Katyusha