Tensors: An Adaptive Approximation Algorithm, Convergence in Direction, and Connectedness Properties
A dissertation presented to the faculty of the College of Arts and Sciences of Ohio University

In partial fulfillment of the requirements for the degree Doctor of Philosophy

Nathaniel J. McClatchey

May 2018

© 2018 Nathaniel J. McClatchey. All Rights Reserved.

This dissertation titled Tensors: An Adaptive Approximation Algorithm, Convergence in Direction, and Connectedness Properties by NATHANIEL J. MCCLATCHEY has been approved for the Department of Mathematics and the College of Arts and Sciences by

Martin J. Mohlenkamp
Associate Professor of Mathematics

Robert Frank
Dean, College of Arts and Sciences

Abstract

MCCLATCHEY, NATHANIEL J., Ph.D., May 2018, Mathematics
Tensors: An Adaptive Approximation Algorithm, Convergence in Direction, and Connectedness Properties (135 pp.)
Director of Dissertation: Martin J. Mohlenkamp

This dissertation addresses several problems related to low-rank approximation of tensors. Low-rank approximation of tensors is plagued by slow convergence of the sequences produced by popular algorithms such as Alternating Least Squares (ALS), by ill-posed approximation problems which cause divergent sequences, and by poor understanding of the nature of low-rank tensors.

Though ALS may produce slowly-converging sequences, it remains popular due to its simplicity, its robustness, and the low computational cost of each iteration. I apply insights from Successive Over-Relaxation (SOR) to ALS, and develop a novel adaptive method based on the resulting Successive Over-Relaxed Least Squares (SOR-ALS) method. Numerical experiments indicate that the adaptive method converges more rapidly than the original ALS algorithm in almost all cases. Moreover, the adaptive method is as robust as ALS, is only slightly more complicated than ALS, and each iteration requires little computation beyond that of an iteration of ALS.

Divergent sequences in tensor approximation may be studied by examining their images under some map. In particular, such sequences may be re-scaled so that they become bounded, provided that the objective function is altered correspondingly. I examine the behavior of sequences produced when optimizing bounded multivariate rational functions. The resulting theorems provide insight into the behavior of certain divergent sequences.

Finally, to improve understanding of the nature of low-rank tensors, I examine connectedness properties of spaces of low-rank tensors. I demonstrate that spaces of unit tensors of bounded rank are path-connected if the space of unit vectors in at least one of the factor spaces is path-connected, and that spaces of unit separable tensors are simply-connected if the unit vectors are simply-connected in every factor space. Moreover, I partially address simple connectedness for unit tensors of higher rank.

Dedication

To my family, whose love and support brought me to this point.

Acknowledgments

This material is based upon work supported by the National Science Foundation under Grant No. 1418787. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.

I am grateful to Dr. Mohlenkamp, without whose guidance this dissertation would not have been possible. That you always had a word of advice when I needed it, despite my research touching three unique topics, is nothing short of amazing.
I am grateful also to the other faculty of Ohio University, particularly Dr. Just, whose timely and thorough feedback has been a blessing.

Table of Contents

Abstract
Dedication
Acknowledgments
List of Figures
List of Acronyms

1 Introduction

2 Background
  2.1 Tensors
  2.2 Tensor formats
    2.2.1 Coordinate format
    2.2.2 Separated representation
      2.2.2.1 Difficulties inherent in the separated representation
    2.2.3 Tucker format
    2.2.4 Tensor Train format
  2.3 Algorithms for separated representations
    2.3.1 Block Coordinate Descent
      2.3.1.1 Alternating Least Squares
    2.3.2 Line Search
    2.3.3 Rank adjustment
  2.4 Algorithms for separable approximation
  2.5 Łojasiewicz inequality and related results

3 Nonlinear SOR and Tensor Approximation
  3.1 Introduction
  3.2 SOR-ALS
  3.3 Local Q-linear convergence of SOR-ALS
    3.3.1 Uschmajew's proof of the local linear convergence of ALS
    3.3.2 Linearization of nonlinear SOR
    3.3.3 Linear SOR and the energy seminorm
    3.3.4 Local linear convergence of SOR-ALS
  3.4 Numerical results
    3.4.1 Modeling terminal behavior
    3.4.2 An adaptive algorithm
  3.5 Concluding remarks

4 On Convergence to Essential Singularities
  4.1 Introduction
  4.2 On the assumption of a Łojasiewicz inequality
    4.2.1 A Łojasiewicz-like inequality holds on cones
    4.2.2 A Łojasiewicz inequality for sequences approaching singularities
    4.2.3 New convergence theorems
  4.3 Algorithms, examples, and implications
    4.3.1 Example from figure
    4.3.2 Convergence in direction
    4.3.3 Implications for tensor approximation
  4.4 Concluding remarks

5 Connectedness Properties of Low-rank Unit Tensors
  5.1 Preliminaries
  5.2 Separable tensors
    5.2.1 The tensor product is continuous
    5.2.2 Path-connectedness
    5.2.3 Simple connectedness
    5.2.4 Miscellany
  5.3 Sums of separable tensors
    5.3.1 Path-connectedness
    5.3.2 Simple connectedness
  5.4 Concluding remarks

References

List of Figures

3.1 $f\left(v + \omega \arg\min_{z \in V^{(j)}} f(z + v)\right)$ as a function of $\omega$.
3.2 Example rate of convergence ($q_\omega$) of SOR-ALS relative to rate of convergence ($q_1$) of ALS. If $\ln(q_\omega / q_1) < 0$, then SOR-ALS outperforms ALS at that value of $\omega$.
3.3 Speedup of SOR-ALS compared to speed of ALS. Rank of target equals rank of approximation.
3.4 Speedup of SOR-ALS compared to speed of ALS. Rank of target equals rank of approximation.
3.5 Optimal $\omega$ compared to speed of ALS.
3.6 Speedup of Algorithm 8 compared to speed of ALS. Rank of target equals rank of approximation.
3.7 An example comparing ALS and Algorithm 8.
4.1 Line Search along the gradient maximizes $f(x, y) = \frac{-xy}{(x^2+y^2)(1+x^2+y^2)}$ from an initial estimate of $(x_0, y_0) = (2, -0.1)$.
4.2 Illustration of Lemma 4.2.8.
5.1 Refining the partition $\{s_n\}_{n=1}^{k_1}$.
5.2 Improving a spline by refining subdivisions.
5.3 Hierarchy of sets used in proof of Theorem 5.3.7.

List of Acronyms

ALS      Alternating Least Squares
BCD      Block Coordinate Descent
CP       Canonical Polyadic
dGN      damped Gauss-Newton
GD       Gradient Descent
GS       Gauss-Seidel
MBI      Maximum Block Improvement
NCG      Nonlinear Conjugate Gradient
NSOR     Nonlinear Successive Over-Relaxation
PNCG     Nonlinearly Preconditioned Nonlinear Conjugate Gradient
RALS     Randomized Alternating Least Squares
SEY      Schmidt-Eckhart-Young
SOR      Successive Over-Relaxation
SOR-ALS  Successive Over-Relaxed Least Squares
SVD      Singular Value Decomposition

1 Introduction

Representation and analysis of multidimensional data have been the subject of renewed interest in recent years. Consider a $d$-dimensional multi-index array $A$. Each tuple of indices $i_1, \dots, i_d$ has a corresponding element $A_{i_1,\dots,i_d}$ in the array. The computational and storage complexity of such an array depends exponentially on the number of dimensions $d$. For example, if each index $i_j$ ranges from 1 to 10, then $A$ would have $10^d$ entries. To reduce the computational and storage complexity of such an array, one may decompose it into a sum of separable arrays

$$A_{i_1,\dots,i_d} = \sum_{l=1}^{r} \prod_{j=1}^{d} v_j^l(i_j),$$

which may be represented more concisely using the tensor product

$$A = \sum_{l=1}^{r} \bigotimes_{j=1}^{d} v_j^l.$$

This decomposition, the separated representation, is known by many names including the Canonical Polyadic (CP) decomposition, CANDECOMP, PARAFAC, and tensor rank decomposition.
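To make the storage savings concrete, the following is a minimal illustrative sketch in Python/NumPy (not part of the dissertation); the helper name cp_to_full and the variable factors are hypothetical, chosen only for this example. It reconstructs the full array $A$ from the vectors $v_j^l$ and contrasts the $10^d$ entries of the full array with the $r \cdot d \cdot 10$ numbers stored by the separated representation.

    # Minimal sketch (not from the dissertation): reconstruct a d-way array
    # from a rank-r separated (CP) representation. `cp_to_full` and `factors`
    # are hypothetical names chosen for this illustration.
    import numpy as np

    def cp_to_full(factors):
        # factors[l][j] is the vector v_j^l; the full array is
        # A[i1, ..., id] = sum_l prod_j factors[l][j][i_j].
        shape = tuple(v.shape[0] for v in factors[0])
        A = np.zeros(shape)
        for term in factors:              # one separable (rank-one) term per l
            rank_one = term[0]
            for v in term[1:]:            # build the outer product v_1^l ⊗ ... ⊗ v_d^l
                rank_one = np.multiply.outer(rank_one, v)
            A += rank_one
        return A

    # Example: d = 4 directions of size 10 each, rank r = 3.
    rng = np.random.default_rng(0)
    factors = [[rng.standard_normal(10) for _ in range(4)] for _ in range(3)]
    A = cp_to_full(factors)
    print(A.shape)      # (10, 10, 10, 10): 10**4 entries stored explicitly
    print(3 * 4 * 10)   # 120 numbers suffice in the separated representation

In this sketch the full array requires 10,000 stored values, while the separated representation stores only 120, and the gap widens exponentially as $d$ grows.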