Université catholique de Louvain
École polytechnique de Louvain
Département d'ingénierie mathématique

Nonnegative Factorization: Complexity, Algorithms and Applications

Nicolas Gillis

Thesis submitted in partial fulfillment of the requirements for the degree of Docteur en Sciences de l’Ingénieur

Dissertation committee:
Prof. Philippe Lefèvre (UCL, President)
Prof. François Glineur (UCL, Advisor)
Prof. Pierre-Antoine Absil (UCL)
Prof. Paul Van Dooren (UCL)
Prof. Samuel Fiorini (Université libre de Bruxelles, Belgium)
Prof. Didier Henrion (LAAS-CNRS, University of Toulouse, France)
Prof. Sabine Van Huffel (Katholieke Universiteit Leuven, Belgium)
Prof. Stephen Vavasis (University of Waterloo, Canada)

February 2011

Acknowledgments

I would like to thank my advisor, François Glineur. I have enjoyed working with him since my master's thesis, and he has constantly supported and encouraged me in my different projects.

I also thank Robert Plemmons, who warmly welcomed me at Wake Forest University during the Fall semester of 2009. It was a pleasure working with him, Paúl Pauca and their colleagues.

I am grateful to my support committee, Didier Henrion and Paul Van Dooren, with whom I had the chance to discuss my problems and benefit from their experience. I also thank the members of my defense committee Pierre-Antoine Absil, Samuel Fiorini, Sabine Van Huffel, Stephen Vavasis, and the chair Philippe Lefèvre. I am also grateful to the other people who have helped me in various ways, and have interacted with this work: Michael Berry, Mathieu Van Vyve, Raphaël Jungers, Chia-Tche Chang, Quentin Rentmeesters, Santanu Dey, Laurence Wolsey, and others.

This experience could not have been what it has without the warm ambiance at CORE. I thank my office mates, Gauthier, Joachim, Salome, Grégory, Stéphane, Laurent, Alexis, Vladislav and Julie; my colleagues Robert, Olivier, Rafael, Sylvette, Joël, Peter, Maia, Margherita, Olivier', Jean-Sébastien, Claudia, Mikel and many others. I am also grateful to the CORE administrative staff for their precious help.

As an F.R.S.-FNRS research fellow, I thank the Fonds de la Recherche Scientifique (F.R.S.-FNRS) for its administrative and financial support.

Last but not least, I thank my family and my friends. I dedicate this thesis to my grandparents.

Abstract

Linear dimensionality reduction techniques such as principal component analysis are powerful tools for the analysis of high-dimensional data. In this thesis, we explore a closely related problem, namely nonnegative matrix factorization (NMF), a low-rank matrix approximation problem with nonnegativity constraints. More precisely, we seek to approximate a given nonnegative matrix with the product of two low-rank nonnegative matrices. These nonnegative factors can be interpreted in the same way as the data, e.g., as images (described by pixel intensities) or texts (represented by vectors of word counts), and lead to an additive and sparse representation. However, they render the problem much more difficult to solve (i.e., NP-hard).

A first goal of this thesis is to study theoretical issues related to NMF. In particular, we make connections with well-known problems in graph theory, combinatorial optimization and computational geometry. We also study computational complexity issues and show, using reductions from the maximum-edge biclique problem, NP-hardness of several low-rank matrix approximation problems, including the rank-one subproblems arising in NMF, a problem involving underapproximation constraints (NMU) and the unconstrained version of the factorization problem where some data is missing or unknown.

Our second goal is to improve existing techniques and develop new tools for the analysis of nonnegative data. We propose accelerated variants of several NMF algorithms based on a careful analysis of their computational cost. We also introduce a multilevel approach to speed up their initial convergence. Finally, a new greedy heuristic based on NMU is presented and used for the analysis of hyperspectral images, in which each pixel is measured along hundreds of wavelengths, which allows, for example, spectroscopy of satellite images.

Table of Contents

Acknowledgments

Abstract

Table of Contents

Notation

1 Introduction
  Thesis outline and related publications

2 Preliminaries
  2.1 Optimization
  2.2 Low-Rank Matrix Approximation
  2.3 Computational Complexity

3 Nonnegative Rank
  3.1 Computational Complexity
  3.2 Extended Formulations
  3.3 Restricted Nonnegative Rank
  3.4 Lower Bounds for the Nonnegative Rank
  3.5 Applications: Slack Matrices and Linear EDMs
  3.6 Improvements using the Matrix Transpose

4 Algorithms for NMF
  4.1 Three Existing Algorithms: MU, ANLS and HALS
  4.2 Accelerated MU and HALS Algorithms
  4.3 A Multilevel Approach

5 Nonnegative Factorization
  5.1 Rank-one Nonnegative Factorization
  5.2 Stationary Points
  5.3 Biclique Finding Algorithm
  5.4 MU for Nonnegative Factorization

6 Nonnegative Matrix Underapproximation
  6.1 Sparsity
  6.2 Related Work
  6.3 Computational Complexity
  6.4 Convex Formulations
  6.5 Lagrangian Relaxation based Algorithm for NMU
  6.6 NMU vs sparse NMF

7 Hyperspectral Data Analysis using NMU
  7.1 The Ideal Case
  7.2 The Non-Ideal Case
  7.3 ℓ0-Pseudo-Norm Minimization and ℓ1-Norm Relaxation
  7.4 Numerical Experiments

8 Weights and Missing Data
  8.1 Previous Results
  8.2 Weighted Low-Rank Approximation
  8.3 Low-Rank Approximation with Missing Data

Conclusion

Bibliography

Appendix

A Active set methods for NNLS

Notation

Scalars, Vectors, Matrices

$\mathbb{R}$  set of real numbers
$\mathbb{R}_0$  set of nonzero real numbers
$\mathbb{R}_+$  set of nonnegative real numbers
$\mathbb{R}^n$  set of real column vectors of dimension $n$
$\mathbb{R}^{m \times n}$  set of real matrices of dimension m-by-n
$\mathbb{R}^n_+$  set of nonnegative real column vectors of dimension $n$
$\mathbb{R}^{m \times n}_+$  set of nonnegative real matrices of dimension m-by-n

Norms

$\|\cdot\|_1$  $\ell_1$-norm, $\|x\|_1 = \sum_{i=1}^n |x_i|$, $x \in \mathbb{R}^n$
$\|\cdot\|_2$  vector $\ell_2$-norm, $\|x\|_2 = \sqrt{\sum_{i=1}^n x_i^2}$, $x \in \mathbb{R}^n$
$\|\cdot\|_2$  matrix $\ell_2$-norm, $\|A\|_2 = \max_{x \in \mathbb{R}^n, \|x\|_2 = 1} \|Ax\|_2$, $A \in \mathbb{R}^{m \times n}$
$\|\cdot\|_\infty$  vector $\ell_\infty$-norm, $\|x\|_\infty = \max_{1 \le i \le n} |x_i|$, $x \in \mathbb{R}^n$
$\|\cdot\|_0$  $\ell_0$-`norm', $\|x\|_0 = |\{ i \mid x_i \neq 0 \}|$, $x \in \mathbb{R}^m$
$\|\cdot\|_F$  Frobenius norm, $\|A\|_F = \sqrt{\sum_{i=1}^m \sum_{j=1}^n A_{ij}^2}$, $A \in \mathbb{R}^{m \times n}$

Functions on Matrices

$\langle \cdot, \cdot \rangle$  scalar product, $\langle A, B \rangle = \sum_{i=1}^m \sum_{j=1}^n A_{ij} B_{ij}$, $A, B \in \mathbb{R}^{m \times n}$
$\sigma_i(\cdot)$  $i$th singular value of a matrix, in non-increasing order
$\mathrm{rank}(\cdot)$  rank of a matrix
$\mathrm{rank}_+(\cdot)$  nonnegative rank of a nonnegative matrix
$\mathrm{rank}^*_+(\cdot)$  restricted nonnegative rank of a nonnegative matrix
$\mathrm{conv}(\cdot)$  convex hull of the columns of a matrix
$\mathrm{col}(\cdot)$  column space of a matrix
$A_{i:}$ or $A(i,:)$  $i$th row of $A$
$A_{:j}$ or $A(:,j)$  $j$th column of $A$
$A_{ij}$ or $A(i,j)$  entry at position $(i,j)$ of $A$
$A(I,J)$  submatrix of $A$ with row (resp. column) indices in $I$ (resp. $J$)
$\circ$  component-wise multiplication, $(A \circ B)_{ij} = A_{ij} B_{ij}$
$\frac{[\,\cdot\,]}{[\,\cdot\,]}$  component-wise division, $\left(\frac{[A]}{[B]}\right)_{ij} = \frac{A_{ij}}{B_{ij}}$
$(\cdot)^T$  transpose of a matrix, $(A^T)_{ij} = A_{ji}$

Miscellaneous

$1_m$  vector of all ones of dimension $m$
$1_{m \times n}$  matrix of all ones of dimension m-by-n
$0_{m \times n}$  matrix of zeros of dimension m-by-n
$I_n$  identity matrix of dimension $n$
$:=$  is equal by definition
$a\!:\!b$  set $\{a, a+1, \ldots, b-1, b\}$ (for $a$ and $b$ integers with $a \le b$)
$\nabla f$  gradient of the function $f$
$\nabla^2 f$  Hessian of the function $f$
$\lceil \cdot \rceil$  $\lceil x \rceil$ is the smallest integer greater than or equal to $x \in \mathbb{R}$
$\lfloor \cdot \rfloor$  $\lfloor x \rfloor$ is the largest integer smaller than or equal to $x \in \mathbb{R}$
$\setminus$  subtraction of two sets, $R \setminus S$ is the set of elements in $R$ not in $S$
$|\cdot|$  cardinality of a set, $|S|$ is the number of elements in $S$
$\bar{\cdot}$  complement of a set, for $S \subset R$, $\bar{S} = R \setminus S$
$\mathrm{supp}(\cdot)$  support (set of non-zero entries), $\mathrm{supp}(x) = \{ 1 \le i \le n \mid x_i \neq 0 \}$, $x \in \mathbb{R}^n$
$\overline{\mathrm{supp}}(\cdot)$  sparsity pattern (set of zero entries, complement of the support)

Acronyms

PCA  principal component analysis (p. 1)
SPCA  sparse principal component analysis (p. 1)
WLRA  weighted low-rank approximation (p. 1 and p. 200)
ICA  independent component analysis (p. 2)
NMF  nonnegative matrix factorization (p. 3)
LP  linear programming (p. 10)
QP  quadratic programming (p. 10)
SOCP  second order cone programming (p. 10)
SDP  semidefinite programming (p. 10)
GP  geometric programming (p. 10 and p. 143)
SVD  singular value decomposition (p. 13)
LRA  low-rank matrix approximation (p. 12)
LRA-1  rank-one matrix approximation (p. 13)
PSV  principal singular vectors (p. 14)
PC  principal component (p. 14)
Exact NMF  exact nonnegative matrix factorization (p. 22)
IS  intermediate simplex (p. 22)
RNR  restricted nonnegative rank (p. 26)
NPP  nested polytopes problem (p. 27)
EDM  Euclidean distance matrix (p. 48)
NNLS  nonnegative least squares (p. 64)
MU  multiplicative updates (p. 65)
ANLS  alternating nonnegative least squares (p. 68)
HALS  hierarchical alternating least squares (p. 70)
R1NF  rank-one nonnegative factorization (p. 110)
NF  nonnegative factorization (p. 127)
NMU  nonnegative matrix underapproximation (p. 135)
NMU-1  rank-one nonnegative matrix underapproximation (p. 139)
PCAMD  PCA with missing data (p. 199)
SFM  structure from motion (p. 203)
LRAMD  low-rank matrix approximation with missing data (p. 208)


Chapter 1

Introduction

Approximating a matrix with one of lower rank is a key problem in data analysis, and is widely used for linear dimensionality reduction. Typically, we are given a matrix $M \in \mathbb{R}^{m \times n}$, where each of the $n$ columns represents an element of a dataset in an $m$-dimensional space. The aim is to approximate $M$ with a low-rank matrix, which can be expressed as the product of two matrices $U \in \mathbb{R}^{m \times r}$ and $V \in \mathbb{R}^{r \times n}$ where $r$ is the rank of the approximation:
$$M \approx UV \;\Longleftrightarrow\; M \approx \sum_{k=1}^{r} U_{:k} V_{k:} \;\Longleftrightarrow\; M_{:j} \approx \sum_{k=1}^{r} U_{:k} V_{kj} \quad 1 \le j \le n,$$
i.e., $M$ is approximated with the sum of $r$ rank-one factors $U_{:k} V_{k:}$. Matrix factorization makes it possible to represent the $n$ columns of $M$ in a smaller $r$-dimensional space defined by the columns of $U$, with coordinates given by the $n$ columns of $V$, see Figure 1.1. In fact, each input column $M_{:j}$ for $1 \le j \le n$ is a linear combination of a set of $r$ basis elements $U_{:k}$, $1 \le k \le r$, with corresponding weights $V_{kj}$. This linear dimensionality reduction technique is widely used in machine learning, and notably enables noise filtering, classification, visualization, and interpretation. There exist numerous variants emphasizing different objective functions measuring the quality of the approximation and different constraints on the factors, e.g.,

⋄ principal component analysis (PCA) [95], without any constraints and using the Frobenius norm of the approximation error as the objective function;

⋄ sparse PCA, adding sparsity constraints on PCA [46];

⋄ weighted low-rank approximation (WLRA), attaching to each entry of the data matrix a different importance through the use of a weighting in the objective function. For example, associating a zero weight with missing or unknown data is equivalent to PCA with missing data (see Chapter 8);

⋄ independent component analysis (ICA) [38], assuming statistical independence of the sources (i.e., the columns of matrix $U$).

Figure 1.1: Linear dimensionality reduction through matrix factorization.

In the context of nonnegative data, i.e., when data matrix $M$ is (component-wise) nonnegative, which is denoted $M \ge 0$, one might consider nonnegativity of the input columns to be an important feature, and consequently require the basis elements to be also nonnegative, i.e., impose $U \ge 0$, so that they can be interpreted in the same way (e.g., these columns can correspond to images described by nonnegative pixel intensities or to texts represented by vectors of nonnegative word counts). Furthermore, one can impose nonnegativity of the weights with $V \ge 0$, leading to an essentially additive reconstruction of the input columns by the basis elements. This representation is then parts-based: basis elements $U_{:k}$ will represent common parts of the columns of $M$. This low-rank approximation technique with additional nonnegativity constraints is referred to as nonnegative matrix factorization (NMF)^1 and is often formulated as follows
$$\min_{U \in \mathbb{R}^{m \times r},\, V \in \mathbb{R}^{r \times n}} \|M - UV\|_F^2 \quad \text{such that} \quad U \ge 0, \; V \ge 0. \tag{NMF}$$

^1 Strictly speaking, NMF is an approximate problem and should preferably be referred to as approximate NMF, or nonnegative matrix approximation [52].

Because of these additional constraints, NMF is an additive linear model for nonnegative data. For example, let each column of $M$ be a vectorized gray-level image of a face using (nonnegative) pixel intensities. The nonnegative matrix factorization of $M$ will generate a matrix $U$ whose columns are nonnegative basis elements of the original images which can then be interpreted as images as well. Moreover, since each original face is reconstructed through a weighted sum of these basis elements, the latter are common parts extracted from the original faces, such as eyes, noses and lips. Figure 1.2 illustrates this property of the NMF decomposition.

Figure 1.2: NMF applied to the CBCL Face Database #1, MIT Center for Biological and Computational Learning. It consists of the approximation of 2429 gray-level images of faces represented with 19 × 19 pixels (columns of $M$) using r = 49 basis elements (columns of $U$).
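To make the optimization problem (NMF) concrete, here is a minimal sketch (not from the thesis) of one popular heuristic, the multiplicative updates of Lee and Seung reviewed in Chapter 4; the data, factorization rank and iteration count are illustrative choices only.

```python
# Minimal NMF sketch via multiplicative updates (illustrative parameters).
import numpy as np

def nmf_mu(M, r, n_iter=500, eps=1e-9, seed=0):
    """Approximate M >= 0 by U V with U, V >= 0 using multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    U = rng.random((m, r))
    V = rng.random((r, n))
    for _ in range(n_iter):
        # Each update keeps its factor nonnegative and does not increase ||M - UV||_F^2.
        V *= (U.T @ M) / (U.T @ U @ V + eps)
        U *= (M @ V.T) / (U @ V @ V.T + eps)
    return U, V

if __name__ == "__main__":
    M = np.abs(np.random.default_rng(1).normal(size=(30, 20)))   # nonnegative data
    U, V = nmf_mu(M, r=5)
    print("relative error:", np.linalg.norm(M - U @ V) / np.linalg.norm(M))
```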

NMF was introduced in 1994 by Paatero and Tapper [124] and started to be extensively studied after the publication of an article by Lee and Seung [105] in 1999. It has been widely used in machine learning, e.g., for air emission control, music analysis, graph clustering, food quality and safety analysis, microarray data analysis, collaborative filtering; see [10, 50, 52, 89] and the references therein. In this thesis, we will be particularly interested in the following three applications.

1. Image processing. NMF enables the extraction of the different constitutive parts of a set of images, e.g., the shared features of a set of faces as shown in Figure 1.2. The decomposition by parts is useful for classification tasks such as face recognition, because it makes the dimensionality reduction more robust to some unfavorable conditions, such as occlusion [84] (e.g., if a person is wearing sunglasses, the other parts of the face are still visible, hence can be well represented with the part-based basis).

2. Text mining. NMF can be interpreted as a biclustering model: each rank-one factor of the decomposition will be sparse (because of the nonnegativity constraints, and because $M$ is) and will typically correspond to a dense rectangular submatrix of $M$ (a bicluster), enabling NMF to extract highly connected rows and columns of the matrix $M$. In text mining applications, NMF extracts closely related sets of texts and words [55].

3. Hyperspectral data analysis. Given a hyperspectral image (which is constituted of several images taken at different wavelengths), NMF is able to automatically detect the constitutive materials present in the scene being imaged, and classify the pixels accordingly [126].

An important feature of NMF is that its nonnegativity constraints typically induce sparse factors, i.e., factors with relatively many zero entries. Intuitively, decomposition into parts requires the basis elements to be sparse, cf. Figure 1.2. More formally, the reason for this behavior is that stationary points $(U, V)$ of NMF will typically be located at the boundary of the feasible domain $\mathbb{R}^{m \times r}_+ \times \mathbb{R}^{r \times n}_+$, hence will feature zero components. This can be explained with the first-order optimality conditions: because the set of stationary points of a problem of the type
$$\min_{x \in \mathbb{R}^n} f(x) \quad \text{such that} \quad x \ge 0,$$
is given by the following expression
$$S_f = \{ x \in \mathbb{R}^n \mid x \ge 0, \; \nabla f(x) \ge 0 \;\text{ and }\; x_i [\nabla f(x)]_i = 0 \;\; 1 \le i \le n \},$$
some components of the solution can be expected to be equal to zero. Sparsity of the factors is an important consideration in practice: in addition to reducing memory requirements to store the basis elements and their weights, sparsity improves interpretation of the factors, especially when dealing with classification/clustering problems, e.g., in text mining [137] and computational biology [66, 97]. By contrast, unconstrained low-rank approximations such as PCA do not naturally generate sparse factors (for that reason, low-rank approximation techniques with additional sparsity constraints have been recently introduced; this is referred to as sparse PCA or SPCA, see, e.g., [46] and the references therein).
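As a rough, self-contained illustration of this sparsity behavior (not taken from the thesis), the following sketch compares the fraction of (near-)zero entries in an NMF factor and in PCA loadings on synthetic data generated from sparse nonnegative factors; it assumes scikit-learn is available, and all sizes are arbitrary.

```python
# Illustrative comparison: NMF factors tend to contain many (near-)zero entries,
# whereas PCA loadings are dense.  Synthetic data with sparse nonnegative structure.
import numpy as np
from sklearn.decomposition import NMF, PCA

rng = np.random.default_rng(0)
W_true = rng.random((200, 10)) * (rng.random((200, 10)) < 0.3)   # sparse, nonnegative
H_true = rng.random((10, 50)) * (rng.random((10, 50)) < 0.3)
X = W_true @ H_true + 1e-3                                       # nonnegative data matrix

W = NMF(n_components=10, init="nndsvd", max_iter=1000, random_state=0).fit_transform(X)
P = PCA(n_components=10).fit(X).components_

frac_zero = lambda A: float(np.mean(np.abs(A) < 1e-6))           # fraction of (near-)zeros
print("NMF factor:", frac_zero(W))
print("PCA loadings:", frac_zero(P))
```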

Unfortunately the advantages of NMF (part-based representation and sparsity) over PCA come at a certain price. First, because of the additional nonnegativity constraints, the approximation error of the input data for a given factorization rank r will always be higher for NMF than in the unconstrained case. Second, the optimization problem (NMF) is more difficult to solve than its unconstrained counterpart: while PCA problems can be solved in polynomial time (e.g., using a singular value decomposition technique [80]), NMF problems belong to the class of $\mathcal{NP}$-hard problems, as recently shown by Vavasis [148]. However, it should also be pointed out that these drawbacks (higher error, $\mathcal{NP}$-hardness) are also present for competing techniques emphasizing sparsity, such as sparse PCA.

Thesis outline and related publications

The aim of this thesis is twofold:

(1) study theoretical issues related to NMF and try to improve understanding of some of its aspects; in particular, try to make connections with other problems, e.g., in graph theory and combinatorial optimization,

and

(2) develop new tools for the analysis of nonnegative data, or improve them; in particular, develop improved and new algorithms for NMF, and use them for applications.

The thesis is structured as follows.

Chapter 2. Preliminaries

Some theoretical background needed throughout this thesis is briefly presented.

Chapter 3. Nonnegative Rank

The nonnegative rank of a nonnegative matrix is the minimum number of nonnegative rank-one factors needed to reconstruct it exactly. It is directly related to NMF, with the additional requirement that the factorization must be exact. In this chapter, we first review some of its complexity aspects, and link them with NMF. We then introduce and study a slightly different quantity called the restricted nonnegative rank. We prove its computation is equivalent to a problem in polyhedral combinatorics, which allows us to fully characterize the algorithmic complexity of its computation. This in turn sheds new light on the nonnegative rank problem, and in particular on its computational complexity. It also allows us to provide new improved lower bounds based on its geometric interpretation. We illustrate these results on slack matrices (arising from the

study of extended formulations in linear programming) and on linear Euclidean distance matrices.

[74] N. Gillis and F. Glineur, On the Geometric Interpretation of the Nonnegative Rank, CORE Discussion paper 2010/51, 2010.

Chapter 4. Algorithms for NMF

In this chapter, three well-known and widely used NMF algorithms are first presented, namely the multiplicative updates (MU), the hierarchical alternating least squares (HALS) and the alternating nonnegative least squares (ANLS). We then propose accelerated versions of MU and HALS, based on the analysis of the computational cost needed at each iteration. This approach can potentially be used for any first-order NMF algorithm, and is applied successfully to a projected gradient method. Our numerical experiments show that HALS and its accelerated version perform remarkably well, which is theoretically supported by an argument based on the properties of NMF and its solutions. We also embed NMF algorithms into the framework of multilevel methods, speeding up significantly their convergence in situations where the data admits a good approximate representation in a lower-dimensional space through linear transformations preserving nonnegativity.

[71] N. Gillis and F. Glineur, A Multilevel Approach for Nonnegative Matrix Factorization, CORE Discussion paper 2010/47, 2010. [72] N. Gillis and F. Glineur, Accelerated Multiplicative Updates and Hierarchical ALS Algorithms for Nonnegative Matrix Factorization, preprint, 2010.

Chapter 5. Nonnegative Factorization

Rank-one subproblems arising in NMF consist in approximating a not necessarily nonnegative matrix with a nonnegative rank-one matrix; this is referred to as rank-one nonnegative factorization (R1NF). R1NF is shown to be $\mathcal{NP}$-hard using a reduction from the maximum-edge biclique problem. Stationary points of R1NF are then linked to feasible solutions of the biclique problem, which allows us to design a new type of biclique finding algorithm based on the application of a block-coordinate descent scheme to R1NF. Also, the multiplicative updates for NMF are generalized to nonnegative factorization (NF) of any rank, which provides an explanation of the better performance of HALS over MU.

[69] N. Gillis and F. Glineur, Nonnegative Factorization and the Maximum Edge Biclique Problem, CORE Discussion paper 2008/64, 2008.

Chapter 6. Nonnegative Matrix Underapproximation

A novel approach for obtaining sparse solutions to NMF problems is introduced. It is based on a recursive approach and the use of additional underapproximation constraints. Our algorithm is based on the Lagrangian relaxation of the corresponding optimization problem, and its effectiveness in obtaining sparse solutions is experimentally demonstrated. We also consider some convex formulations in the rank-one case, and give some theoretical evidence explaining why there might not exist 'good' convex reformulations of NMU and NMF when the factorization rank is larger than one.

[11] M.W. Berry, N. Gillis and F. Glineur, Document Classification Using Nonnegative Matrix Factorization and Underapproximation, Proc. of the IEEE Int. Symp. on Circuits and Systems (ISCAS), p. 2782–2785, 2009. [75] N. Gillis and F. Glineur, Using Underapproximations for Sparse Nonnegative Matrix Factorization, Pattern Recognition, 43(4), 1676-1687, 2010.

Chapter 7. Hyperspectral Data Analysis using Underapproximation

The recursive approach described in Chapter 6 is used for the analysis of hyperspectral images. In particular, we focus on rank-one underapproximations, which enable us to extract features in a recursive way, like PCA, but preserving nonnegativity. This allows the recovery of the constitutive elements in hyperspectral data. Both $\ell_2$-norm and $\ell_1$-norm based minimizations of the energy functional are considered. We experimentally show the efficiency of this new strategy on hyperspectral images associated with space object material identification, and on HYDICE and related remote sensing images.

[76] N. Gillis and R.J. Plemmons, Dimensionality reduction, classifica- tion, and spectral mixture analysis using nonnegative underapprox- imation, SPIE conference Volume 7695, paper 46, Orlando, 2010. [77] N. Gillis and R.J. Plemmons, Dimensionality reduction, classifica- tion, and spectral mixture analysis using nonnegative underapprox- imation, Optical Engineering, 2011, in press.


Chapter 8. Weights and Missing Data

In some cases, it might be necessary to attach to each entry of the data matrix a weight corresponding to its relative importance. This problem is referred to as weighted low-rank approximation (WLRA). We prove that computing an optimal weighted low-rank approximation is $\mathcal{NP}$-hard, already when a rank-one approximation is sought. In fact, we show that it is hard to compute approximate solutions to the WLRA problem with some prescribed accuracy. Our proof is based on a reduction from the maximum-edge biclique problem, and applies to strictly positive weights as well as binary weights, the latter corresponding to low-rank matrix approximation with missing data. Because the reduction uses nonnegative input matrices, these results also apply to NMF, implying that weighted NMF and NMF with missing data are both $\mathcal{NP}$-hard to approximate in the rank-one case (while rank-one NMF can be solved in polynomial time).

[73] N. Gillis and F. Glineur, Low-Rank Matrix Approximation with Weights or Missing Data is NP-hard, CORE Discussion paper 2010/75, 2010.

Finally, a conclusion will summarize the contributions of the thesis, attempt to put them into perspective, and give some directions for further research.

Chapter 2

Preliminaries

This thesis revolves around three main topics: optimization, low-rank matrix approximation and computational complexity. In this chapter, we summarize some well-known theoretical results which will be the groundwork of this thesis.

2.1 Optimization

Optimization, also referred to as mathematical programming, is an important field of applied mathematics. Given a set of feasible solutions and an objective function (i.e., a quality measure), it looks for (one of) the best possible solutions in the feasible domain. In general, optimization problems can be described as follows
$$\min_{x \in X \subseteq \mathbb{R}^n} f(x), \tag{P}$$
where $f : X \to \mathbb{R} : x \mapsto f(x)$ is the objective function, and $X$ is the (feasible) domain with

$$X = \{ x \in \mathbb{R}^n \mid g_i(x) = 0 \; \forall i \in E \text{ and } g_i(x) \ge 0 \; \forall i \in I \},$$
where $g_i : X \to \mathbb{R} : x \mapsto g_i(x)$ defines an equality constraint for $i \in E$ or an inequality constraint for $i \in I$.

2.1.1 First-Order Optimality Conditions

Assuming functions $f$ and the $g_i$'s are continuously differentiable (i.e., $\nabla f$ and the $\nabla g_i$'s are well-defined and continuous on $X$), one can use this first-order information in order to derive necessary conditions for feasible solutions of (P) to be optimal. In fact, it is clear that if a solution $x$ is globally optimal, it must also be locally

optimal. Intuitively, it means that the first-order approximation of the function around $x$
$$f(x + \delta x) \approx f(x) + \nabla f(x)^T \delta x,$$
must be larger than $f(x)$ in the domain, i.e., $\nabla f(x)^T \delta x \ge 0$ for any feasible direction $\delta x$, i.e., for any $\delta x$ pointing inside the domain (including $\delta x \to 0$ which might be tangent to the domain). For example, if there is no constraint, i.e., $X = \mathbb{R}^n$, all directions are feasible and it reduces to $\nabla f(x) = 0$. In the general constrained case, these conditions are referred to as the Karush-Kuhn-Tucker (KKT) optimality conditions. Defining the set of active constraints at point $x$ as $A(x) = \{ j \mid g_j(x) = 0 \}$, they can be described as follows. Assuming that the gradients $\nabla g_i(x)$, $i \in A(x)$, of the active constraints at $x$ are linearly independent, the first-order necessary conditions for a solution $x$ to be locally optimal are given by
$$\nabla f(x) = \sum_{i \in I \cup E} \mu_i \nabla g_i(x),$$
$$g_i(x) = 0, \quad i \in E, \tag{KKT}$$
$$g_i(x) \ge 0, \quad \mu_i \ge 0, \quad \mu_i g_i(x) = 0, \quad i \in I,$$
where each $\mu_i$ is the Lagrangian multiplier (dual variable) associated with the constraint $g_i$. A point satisfying the KKT conditions will be referred to as a first-order stationary point, or a stationary point for short.
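As a small worked illustration (not from the thesis), consider minimizing $f(x) = (x_1 - 1)^2 + (x_2 + 1)^2$ over the nonnegative orthant, with constraints $g_1(x) = x_1 \ge 0$ and $g_2(x) = x_2 \ge 0$:
$$\nabla f(x) = \begin{pmatrix} 2(x_1 - 1) \\ 2(x_2 + 1) \end{pmatrix}, \qquad \nabla g_1(x) = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad \nabla g_2(x) = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$
At $x^* = (1, 0)$ we have $\nabla f(x^*) = (0, 2)^T = \mu_1 \nabla g_1(x^*) + \mu_2 \nabla g_2(x^*)$ with $\mu_1 = 0$ and $\mu_2 = 2 \ge 0$, and complementarity holds since $\mu_1 g_1(x^*) = 0 \cdot 1 = 0$ and $\mu_2 g_2(x^*) = 2 \cdot 0 = 0$. All (KKT) conditions are satisfied, so $x^*$ is a stationary point (in fact the global minimum, the problem being convex); note that the active constraint $x_2 \ge 0$ forces a zero component, the same mechanism that produces sparse NMF factors in Chapter 1.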

2.1.2 Convexity

If the function $f$ is convex and if the functions $g_i$ are affine for $i \in E$ and concave for $i \in I$ (more generally, if $X$ is convex), problem (P) is convex. This notably implies that

⋄ The set of optimal solutions is convex;

⋄ Any local minimum is also global;

⋄ The (KKT) conditions are sufficient to guarantee global optimality;

⋄ If there exists a strictly feasible solution (an interior point), i.e., a point $x_s$ such that $g_i(x_s) = 0 \; \forall i \in E$ and $g_i(x_s) > 0 \; \forall i \in I$, the (KKT) conditions are necessary and sufficient (without assuming the linear independence of the gradients). This condition is referred to as Slater's condition and $x_s$ is also called a Slater point.

In addition to (and in part because of) these nice properties, this class of problems can in general be solved efficiently, e.g., polynomial-time algorithms are available for linear programming (LP), quadratic programming (QP), recently extended to second-order cone programming (SOCP), semidefinite programming (SDP), geometric programming (GP), etc. However, it is important

to keep in mind that there are convex problems for which no polynomial-time algorithms are known, e.g., copositive programming which deals with the following feasible set of matrices

$$\mathcal{C} = \{ A \in \mathbb{S}^n \mid x^T A x \ge 0 \text{ for all } x \in \mathbb{R}^n_+ \},$$
called the copositive cone, can be used to model $\mathcal{NP}$-hard problems, see [60] and the references therein; it is therefore $\mathcal{NP}$-hard to solve copositive programs in general. In this case, it is even $\mathcal{NP}$-complete to check whether a point does not belong to the feasible set. In contrast, some non-convex problems are easy to solve. Many problems can be reformulated as efficiently solvable convex programs, or can be solved with dedicated algorithms, possibly after reformulation. In fact, it is well known that formulation is crucial in mathematical programming. We refer to [147] and references therein for a survey on these issues.

2.1.3 Block-Coordinate Descent Methods

There exist many techniques to locally improve a current solution $x$ of an optimization problem (P), e.g., gradient descent, conjugate gradient, Newton and Quasi-Newton methods. Under some (often mild) additional conditions, they can guarantee convergence to a first-order stationary point. Block-coordinate descent is another class of methods (also referred to as alternating variables) which work as follows:

1. Define $k$ disjoint subsets $F_i \subset N$, $1 \le i \le k$, whose union is equal to $N = \{1, 2, \ldots, n\}$, i.e.,

$$\bigcup_{1 \le i \le k} F_i = N = \{1, 2, \ldots, n\} \quad \text{and} \quad F_i \cap F_j = \emptyset \;\; \forall i \neq j.$$

2. Repeat until some stopping criterion is met:

For $i = 1, 2, \ldots, k$, fix variables $x_j$, $j \in N \setminus F_i$, and minimize (P) in the remaining variables $x_j$, $j \in F_i$.

In many cases, block-coordinate descent methods fail to converge rapidly when they get close to stationary points of (P), typically because of their zigzagging behavior (similar to what is frequently observed for gradient descent approaches, see, e.g., [12]). However, when the blocks of coordinates are rather large and can be optimized efficiently (possibly up to global optimality), they can prove to be a powerful technique. At least, they often exhibit a relatively fast initial convergence to the neighborhood of a stationary point. If one requires a solution with high accuracy, one can then switch to a more sophisticated second-order scheme, such as Newton's method. Block-coordinate

descent methods have recently attracted much interest, notably because in general (1) they converge initially quite fast, (2) they are easy to implement, (3) they can deal with huge-scale problems, and (4) in practice, it does not always make much sense to look for very accurate solutions (i.e., stationary points) because the underlying mathematical model is often approximate and built with noisy data.

It is sometimes possible to perform an exact block-coordinate descent, i.e., find a global minimum for (P) for each block of variables. In that particular case, we can ensure the following.

Theorem 2.1. [128], [12, p. 268] The limit points of the iterates of an exact block-coordinate descent algorithm are stationary points provided that the following two conditions hold:

1. each block of variables is required to belong to a closed convex set,

2. the minimum computed at each iteration for a given block of variables is uniquely attained.

Theorem 2.2. [82] The limit points of the iterates of an exact two-block coordinate descent algorithm are stationary points provided that the following two conditions hold:

1. each block of variables is required to belong to a closed convex set,

2. the objective function is continuously differentiable.

Hence exact two-block coordinate descent does not require the minimum of the subproblems to be uniquely attained to guarantee convergence to a stationary point.
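For illustration (not an algorithm from the thesis), the following sketch applies exact two-block coordinate descent to the smooth problem $\min_{x, y} \|Ax + By - c\|_2^2$: each block subproblem is an unconstrained least-squares problem solved exactly, so Theorem 2.2 applies. Data sizes are arbitrary.

```python
# Exact two-block coordinate descent (alternating least squares) on
#     min_{x, y} ||A x + B y - c||_2^2
# Each block subproblem is solved exactly; the objective is nonincreasing.
import numpy as np

rng = np.random.default_rng(0)
A, B, c = rng.normal(size=(50, 8)), rng.normal(size=(50, 6)), rng.normal(size=50)
x, y = np.zeros(8), np.zeros(6)

for it in range(20):
    x = np.linalg.lstsq(A, c - B @ y, rcond=None)[0]   # minimize over block x (y fixed)
    y = np.linalg.lstsq(B, c - A @ x, rcond=None)[0]   # minimize over block y (x fixed)
    print(it, np.linalg.norm(A @ x + B @ y - c))
```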

2.2 Low-Rank Matrix Approximation

Given a real matrix $A \in \mathbb{R}^{m \times n}$, we would like to find its best approximation $B \approx A$ such that $\mathrm{rank}(B) \le r$ (possibly with additional constraints on $B$). Using the Frobenius norm of the difference between $A$ and $B$ as an objective function, this problem can be formulated as

$$\min_{B} \|A - B\|_F^2, \quad \text{such that } \mathrm{rank}(B) \le r. \tag{LRA}$$
Since $\mathrm{rank}(B) \le r$, $B$ can be decomposed as the product of two matrices $U \in \mathbb{R}^{m \times r}$ and $V \in \mathbb{R}^{n \times r}$ with $B = UV^T$, and (LRA) can be equivalently written as
$$\min_{U, V} \|A - UV^T\|_F^2. \tag{LRA}$$
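As recalled in the next two theorems, (LRA) is solved by truncating the singular value decomposition. As a preview, here is a minimal NumPy sketch (with illustrative sizes) of this truncation and of the error identity stated in Theorem 2.4 below.

```python
# Best rank-r approximation of A in the Frobenius norm via the truncated SVD.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(40, 25))
r = 5

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_r = U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]

# The error equals the square root of the sum of the discarded squared singular values.
print(np.linalg.norm(A - A_r, "fro"), np.sqrt(np.sum(s[r:] ** 2)))
```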

The following two theorems provide a way to compute an optimal solution of (LRA).

Theorem 2.3 (Singular value decomposition (SVD), [80]). If $A$ is a real m-by-n matrix, then there exist orthogonal matrices

$$U = [u_1\, u_2\, \ldots\, u_m] \in \mathbb{R}^{m \times m} \quad \text{and} \quad V = [v_1\, v_2\, \ldots\, v_n] \in \mathbb{R}^{n \times n},$$
such that

$$U^T A V = \Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_p) \in \mathbb{R}^{m \times n}, \quad p = \min(m, n),$$
where $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_p \ge 0$ are the singular values of $A$. This also implies $A = U \Sigma V^T$, $AV = U\Sigma$, $A^T U = V \Sigma^T$,

$$A v_i = \sigma_i u_i \quad \text{and} \quad A^T u_i = \sigma_i v_i \quad \text{for all } 1 \le i \le p,$$
and $A = \sum_{i=1}^p \sigma_i u_i v_i^T$ (sum of $p$ rank-one terms).

Any $(\sigma, u, v)$ satisfying $Av = \sigma u$, $A^T u = \sigma v$, $\|u\|_2 = \|v\|_2 = 1$, $\sigma \ge 0$ will be called a singular triplet of $A$ (notice that $\sigma = u^T A v$).

Theorem 2.4 (Eckart-Young [62]). Given $(U, \Sigma, V)$ a singular value decomposition of $A$, and introducing the truncated sum of rank-one terms $A_r = \sum_{i=1}^r \sigma_i u_i v_i^T$, we have that $A_r$ solves (LRA) and

$$\min_{\mathrm{rank}(B) \le r} \|A - B\|_F^2 = \|A - A_r\|_F^2 = \sum_{i=r+1}^{\min(m,n)} \sigma_i^2.$$

2.2.1 Rank-One Approximation

In this thesis, we will be particularly interested in rank-one approximations, i.e., the case where the approximation $B$ can be written as an outer product $\sigma u v^T$,

$$\min_{\sigma \in \mathbb{R},\, u \in \mathbb{R}^m,\, v \in \mathbb{R}^n} \|A - \sigma u v^T\|_F^2, \quad \text{such that } \sigma \ge 0, \; \|u\|_2 = 1, \; \|v\|_2 = 1, \tag{LRA-1}$$
and, by Theorems 2.3 and 2.4, an optimal solution is given by any singular triplet $(\sigma, u, v)$ of $A$ associated with the largest singular value $\sigma = \sigma_1(A)$. In the following, we derive some alternative formulations for (LRA-1). Let $A \neq 0$ (otherwise the problem is trivial) and let $(\sigma, u, v)$ be an optimal solution of (LRA-1). Observing that

$$\begin{aligned}
\|A - \sigma u v^T\|_F^2 &= \langle A - \sigma u v^T, A - \sigma u v^T \rangle \\
&= \langle A, A \rangle - 2\sigma \langle A, u v^T \rangle + \sigma^2 \langle u v^T, u v^T \rangle \\
&= \|A\|_F^2 - 2\sigma \langle A, u v^T \rangle + \sigma^2,
\end{aligned}$$

the optimal solution for variable $\sigma$ is given by $\sigma = \max(0, u^T A v)$. In fact, either $\langle A, u v^T \rangle = u^T A v$ is nonnegative and $\sigma = u^T A v$, or it is negative and $\sigma = 0$. Since $A \neq 0$, $u^T A v$ is strictly positive hence $\sigma > 0$ and $\sigma = u^T A v$.

Therefore at optimality6 we must have

$$\|A - \sigma u v^T\|_F^2 = \|A\|_F^2 - (u^T A v)^2,$$
and minimizing $\|A - \sigma u v^T\|_F^2$ is equivalent to maximizing $(u^T A v)^2$. Since $u^T A v \ge 0$, (LRA-1) is then equivalent to
$$\max_{u, v} u^T A v, \quad \text{such that } \|u\|_2 = 1, \; \|v\|_2 = 1. \tag{PSV}$$
Optimal solutions $(u, v)$ of (PSV) are pairs of principal singular vectors associated with the maximum singular value $\sigma_1$ of $A$. Given $u$, we have that the optimal $v$ in (PSV) is given by

$$v = \frac{A^T u}{\|A^T u\|_2}.$$
One can then equivalently reformulate (PSV) as^1

$$\max_{u} \frac{u^T A A^T u}{\|A^T u\|_2} = \|A^T u\|_2, \quad \text{such that } \|u\|_2 \le 1, \tag{PC}$$
with optimal value $\|A^T\|_2 = \|A\|_2$ (by definition). We then have that, for $A \neq 0$,

$(\sigma, u, v)$ is an optimal solution of (LRA-1)

$\Longleftrightarrow$

$(u, v)$ is an optimal solution of (PSV) and $\sigma = u^T A v$

$\Longleftrightarrow$

$u$ is optimal for (PC), $v = \frac{A^T u}{\|A^T u\|_2}$ and $\sigma = u^T A v = \|A\|_2 = \sigma_1(A)$.

Stationary Points

One can derive the stationarity conditions for (LRA-1) and obtain

$$Av = \sigma u, \quad A^T u = \sigma v, \quad \sigma = u^T A v \ge 0, \quad \|u\|_2 = \|v\|_2 = 1, \tag{2.1}$$
implying that $(\sigma, u, v)$ is a singular triplet of $A$ if and only if it is a stationary point of (LRA-1).

^1 The equality constraint $\|u\|_2 = 1$ has been relaxed without loss of generality to $\|u\|_2 \le 1$ since the constraint will be active at optimality; otherwise it is trivial to construct a better solution by multiplying $u$ by $\|u\|_2^{-1}$.

Theorem 2.5. The set of stationary points of (LRA-1) is given by the set of singular triplets of $A$. The set of optimal solutions is the set of singular triplets associated with the largest singular value(s). The other stationary points are saddle points.

Proof. The first part of the proof is direct since singular triplets are defined as in Equation (2.1).

For the second part of the proof, assume $A \neq 0$, otherwise the result is trivial. Let $(\sigma, u, v)$ be any stationary point of (LRA-1) with $\sigma < \sigma_1 = \sigma_1(A)$ and denote by $(\sigma_1, u_1, v_1)$ an optimal solution of (LRA-1). Since $\sigma < \sigma_1$, $u$ (resp. $v$) is orthogonal to $u_1$ (resp. $v_1$). In fact, they are associated with different eigenvalues of the matrices $AA^T$ and $A^T A$ ($AA^T u = \sigma^2 u$ and $A^T A v = \sigma^2 v$, see Equation (2.1)). Then, for

$$u' = \sqrt{1 - \epsilon^2}\, u + \epsilon u_1 \quad \text{and} \quad v' = \sqrt{1 - \epsilon^2}\, v + \epsilon v_1,$$
where $-1 \le \epsilon \le 1$, we have $\|u'\|_2 = \|v'\|_2 = 1$, and
$$u'^T A v' = (1 - \epsilon^2)\sigma + \epsilon^2 \sigma_1 = \sigma + \epsilon^2 (\sigma_1 - \sigma) = \sigma' > \sigma.$$
Hence $(\sigma, u, v)$ is a saddle point (because it is not locally optimal).

The Power Method

A possible way to find singular triplets associated with a maximum singular value is to solve (LRA-1) or, equivalently, (PSV). Applying an exact two-block coordinate descent scheme (cf. Section 2.1.3) to (PSV) leads to the following updates
$$v \leftarrow \frac{A^T u}{\|A^T u\|_2}, \qquad u \leftarrow \frac{A v}{\|A v\|_2},$$
which amounts to performing
$$u \leftarrow A A^T u, \qquad u \leftarrow \frac{u}{\|u\|_2}.$$
This is the power method applied to $AA^T$, which is used to compute the eigenvector associated with the largest eigenvalue (in modulus). This algorithm is guaranteed to converge given that the initial vector $u$ is not perpendicular to the principal eigenspace of $AA^T$ (or equivalently, to the principal singular subspace of the columns of $A$).
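A minimal sketch of these alternating updates (illustrative; a fixed iteration count is used instead of a proper stopping criterion):

```python
# Power method / exact two-block coordinate descent on (PSV): alternating the
# closed-form updates for v and u converges to a principal singular triplet of A.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(30, 20))

u = rng.normal(size=30)
u /= np.linalg.norm(u)
for _ in range(200):
    v = A.T @ u
    v /= np.linalg.norm(v)
    u = A @ v
    u /= np.linalg.norm(u)

sigma = u @ A @ v
print(sigma, np.linalg.svd(A, compute_uv=False)[0])   # should both equal sigma_1(A)
```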

2.2.2 Nonnegative Matrices

It is clear that if $A$ is nonnegative, any solution $(\sigma, u, v)$ of (LRA-1) can be improved by taking its absolute value $(\sigma, |u|, |v|)$ since $(A_{ij} - \sigma u_i v_j)^2 \ge (A_{ij} - \sigma |u_i| |v_j|)^2$ because $A_{ij} \ge 0$. This is also a consequence of the Perron-Frobenius theorem.

Theorem 2.6. For any nonnegative real matrix $A$, there exists an optimal nonnegative solution $u \ge 0$ and $v \ge 0$ to (LRA-1).

Remark that there might exist optimal solutions with negative entries, if the matrix $M$ is reducible [9] and $\sigma_1(M) = \sigma_2(M)$. The simplest example is the 2-by-2 identity matrix $I_2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$. Two singular triplets of $I_2$ associated with the unique singular value ($= 1$) are given by $\left( 1, \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 \\ 0 \end{pmatrix} \right)$ and $\left( 1, \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right)$. Using the same construction as in the proof of Theorem 2.5, the following matrices are optimal rank-one approximations
$$\begin{pmatrix} \lambda_1^2 & \lambda_1 \lambda_2 \\ \lambda_1 \lambda_2 & \lambda_2^2 \end{pmatrix}, \quad \text{for any } \lambda_1^2 + \lambda_2^2 = 1.$$
For example, with $\lambda_1 = -\lambda_2 = \frac{\sqrt{2}}{2}$, we obtain an optimal solution
$$\begin{pmatrix} 1/2 & -1/2 \\ -1/2 & 1/2 \end{pmatrix},$$
with negative entries.
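A quick numerical check of this example (a NumPy sketch): the nonnegative approximation with $u = v = e_1$ and the mixed-sign approximation above achieve the same Frobenius error, equal to 1.

```python
# Both rank-one approximations of I_2 given above have the same error ||I_2 - B||_F = 1.
import numpy as np

I2 = np.eye(2)
B_nonneg = np.array([[1.0, 0.0], [0.0, 0.0]])       # sigma u v^T with u = v = e_1
B_mixed  = np.array([[0.5, -0.5], [-0.5, 0.5]])     # lambda_1 = -lambda_2 = sqrt(2)/2
print(np.linalg.norm(I2 - B_nonneg, "fro"), np.linalg.norm(I2 - B_mixed, "fro"))
```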

2.3 Computational Complexity

In this section, we define some key concepts of computational complexity and explain intuitively the computational models we are going to work with. We refer the reader to [5,16,67] and references therein for detailed discussions on the subject.

2.3.1 The Turing Machine Model

The Turing machine model aims to model any algorithm or computation that can be carried out on a computer. Given a finite input (whose size is the number of bits needed to represent it), the machine can perform a finite number of operations in order to generate a finite output. The main purpose of computational complexity is to classify problems that are formulated on these machines in different categories. The aim is to predict whether it is possible to solve (or approximate) these problems efficiently (i.e., in polynomial time) on such a machine.

Complexity Classes: $\mathcal{NP}$, $\mathcal{P}$, $\mathcal{NP}$-Complete and $\mathcal{NP}$-Hard

$\mathcal{NP}$ refers to the class of decision problems, i.e., problems with a yes/no answer, for which there exists a certificate for the yes-answer that can be verified by a polynomial-time algorithm^2 on a Turing machine. The class $\mathcal{P} \subset \mathcal{NP}$ is the class of decision problems that can be decided by a polynomial-time algorithm (it requires the number of operations to be bounded by a polynomial in the size of the input). A class of problems A can be reduced to a class B if there exists a function r (a reduction) that can be computed in polynomial time, and that transforms any problem of A into a problem of B in a polynomial number of operations in such a way that for any problem instance $a \in A$: a is a yes-instance of A if and only if r(a) is a yes-instance of B. This implies that any algorithm for B would solve A, i.e., that problems in B are at least as hard as those in A. The class $\mathcal{NP}$-complete $\subset \mathcal{NP}$ is the class of the most difficult problems in $\mathcal{NP}$, i.e., problems such that any problem in $\mathcal{NP}$ can be reduced to them.

A well-known open question in computer science is: Is $\mathcal{P} = \mathcal{NP}$?, i.e., can any decision problem which can be verified in polynomial time be solved in polynomial time? It is in general believed that $\mathcal{P} \neq \mathcal{NP}$. Showing that a problem is $\mathcal{NP}$-complete then implies that it is probably not possible to find a polynomial-time algorithm to solve it (at least nobody has succeeded so far). In particular, this justifies the use of polynomial-time algorithms that find 'good' but non-optimal solutions (e.g., using heuristic approaches such as local search or convex relaxations; sometimes with guaranteed approximation properties). Another class of problems is $\mathcal{NP}$-hard $\supset$ $\mathcal{NP}$-complete. It contains any problem (not necessarily a decision problem, e.g., an optimization problem) to which any problem in $\mathcal{NP}$ can be reduced.

Figure 2.1: Illustration of the complexity classes if $\mathcal{P} \neq \mathcal{NP}$.

^2 $\mathcal{NP}$ stands for nondeterministic polynomial, which stems from another equivalent definition of $\mathcal{NP}$, namely the class of decision problems solvable in polynomial time by a nondeterministic machine.

Example 2.1. Let us define the following class of problems, referred to as Quadratic Programming (QP):

Instance: A is an m-by-n rational matrix, b is a rational m-dimensional vector, H is a rational symmetric n-by-n matrix, c is a rational n-dimensional vector and K is a rational number.

Question: Is there an n-dimensional vector x such that $Ax \ge b$ and $x^T H x + c^T x \le K$?

QP was first shown to be $\mathcal{NP}$-hard: Sahni [133] showed that partition problems can be reduced to QP, i.e., any instance of Partition can be reduced to an instance of QP (see also [67, p. 245, MP2]). Pardalos and Vavasis [125] showed that QP is $\mathcal{NP}$-hard even if only one eigenvalue of H is negative, by reduction from the clique problem. However, it is not easy to determine whether QP is in $\mathcal{NP}$ or not, because some instances of QP could potentially have a solution x not polynomially bounded in the size of the input (e.g., irrational solutions) and, therefore, no polynomial-time algorithm could verify a certificate of a yes-instance. Vavasis [146] showed that QP actually belongs to $\mathcal{NP}$: he proved that there always exists a solution to QP which is rational, and polynomially bounded by the size of the input. This implies that there exists a polynomial-length certificate for any yes-instance of QP above and therefore that QP is in $\mathcal{NP}$, hence also $\mathcal{NP}$-complete.

Example 2.2 ( [15]). Most complexity results in convex optimization assume infinite-precision arithmetic. For example, let us consider linear programming (LP) defined as follows

Example 2.2 ([15]). Most complexity results in convex optimization assume infinite-precision arithmetic. For example, let us consider linear programming (LP) defined as follows
$$\min_{x \in \mathbb{R}^n} c^T x \quad \text{such that} \quad Ax \ge b, \; x \ge 0,$$
where $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$ and $c \in \mathbb{R}^n$. The size of the input is $mn + m + n$ and LP can be solved in polynomial time to any given precision in the real

computation model. However, it is an open question to determine whether solving LP exactly belongs to the class of polynomially solvable problems in the real model. In fact, the number of iterations might depend on the size of the coefficients of matrix A and vectors b and c (which is not part of the input size). The open question is then: does LP admit a strongly polynomial-time algorithm, i.e., an algorithm whose complexity is polynomial in mn and hence does not depend on the size of the coefficients of the input? For rational input, LP can be solved in polynomial time in the Turing model because the optimal solution is rational, and polynomially bounded by the size of the input.
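For concreteness (not part of the thesis), here is a small instance of this LP solved numerically; the sketch assumes SciPy is available and rewrites $Ax \ge b$ as $-Ax \le -b$ to match linprog's interface. The data are arbitrary.

```python
# Solve a small instance of   min c^T x  s.t.  A x >= b,  x >= 0   with SciPy.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 2.0], [3.0, 1.0]])
b = np.array([4.0, 6.0])
c = np.array([1.0, 1.0])

res = linprog(c, A_ub=-A, b_ub=-b, bounds=[(0, None), (0, None)])
print(res.x, res.fun)   # optimal vertex x = (1.6, 1.2), objective value 2.8
```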

In this thesis, we will mostly focus on the Turing machine model.


Chapter 3

Nonnegative Rank

In this chapter, we seek exact nonnegative matrix factorizations of nonnegative matrices $M \in \mathbb{R}^{m \times n}_+$, i.e., we look for $U \in \mathbb{R}^{m \times k}_+$ and $V \in \mathbb{R}^{k \times n}_+$ such that $M = UV = \sum_{i=1}^k U_{:i} V_{i:}$, or equivalently such that $\|M - UV\|_F^2 = 0$. The pair $(U, V)$ will be called a rank-k nonnegative factorization^1 of $M$. The minimum $k$ such that there exists a rank-k nonnegative factorization of $M$, i.e., the minimum number of nonnegative rank-one factors needed to reconstruct $M$ exactly, is called the nonnegative rank of $M$ and denoted $\mathrm{rank}_+(M)$. Clearly,

$$\mathrm{rank}(M) \le \mathrm{rank}_+(M) \le \min(m, n).$$
Determining the nonnegative rank and computing the corresponding nonnegative factorization have been studied relatively recently in linear algebra [8, 37]. In the literature, much more attention has been devoted to the approximate problem (i.e., NMF), which has been widely used as a data analysis technique, see Chapter 1. Nevertheless, there are not many theoretical results about the nonnegative rank, and better characterizations could help practitioners. For example, efficient computations of exact nonnegative factorizations could help to design new NMF algorithms using a two-step strategy [148]: first approximate $M$ with a low-rank nonnegative matrix $A$ (e.g., using the singular value decomposition^2) and then compute a nonnegative factorization of $A$. Bounds for the nonnegative rank could also help select the factorization rank of the NMF, replacing the trial and error approach often used by practitioners. For example, in hyperspectral image analysis, the nonnegative rank corresponds to the number of materials present in the image and its computation could lead to more efficient algorithms detecting these constitutive elements, see Chapter 7.

^1 Notice that matrices $U$ and $V$ in a rank-k nonnegative factorization are not required to have rank $k$.
^2 Even though an optimal low-rank approximation of a nonnegative matrix might not necessarily be nonnegative, it is often the case in practice [100].

The main goals of this chapter are to (1) better understand the computational complexity of determining the nonnegative rank using a geometric interpretation, and its implications for NMF, (2) make connections with other related problems (in particular in computational geometry and combinatorial optimization) and (3) provide new bounds for the nonnegative rank. It is organized as follows.

In Section 3.1, we review the complexity results about the nonnegative rank, and its relationship with NMF. In Section 3.2, we explain how the nonnegative rank is closely related to other problems, in particular the characterization of the size of extended formulations in combinatorial optimization. In Section 3.3, we introduce a new related quantity called the restricted nonnegative rank. Generalizing a recent result of Vavasis [148] (see also [114]), we show that computing this quantity is equivalent to a problem in polyhedral combinatorics, and fully characterize its computational complexity. In Section 3.4, based on the geometric interpretation of the nonnegative rank and the relationship with the restricted nonnegative rank, we derive new improved lower bounds for the nonnegative rank. In Section 3.5, we apply our results to slack matrices and linear Euclidean distance matrices. We obtain counter-examples to two conjectures of Beasley and Laffey [6], namely we show that the nonnegative rank of linear Euclidean distance matrices is not necessarily equal to their dimension, and that the rank of a matrix is not always greater than the nonnegative rank of its square. Finally, in Section 3.6, using the relationship between the restricted nonnegative rank of a matrix and its transpose, we further improve the bounds provided in Section 3.5.

3.1 Computational Complexity

Vavasis studies in [148] the algorithmic complexity of the NMF optimization problem, i.e., (NMF), see Chapter 1. More specifically, he studies the following problem, which he calls exact nonnegative matrix factorization:

(Exact NMF) Given a nonnegative matrix $M \ge 0$ of rank $r$, find, if possible, two nonnegative factors $U \in \mathbb{R}^{m \times r}_+$ and $V \in \mathbb{R}^{r \times n}_+$ such that $M = UV$.

Clearly, exact NMF is equivalent to asking whether $\mathrm{rank}_+(M) = \mathrm{rank}(M) = r$ and, if yes, to computing a rank-r nonnegative factorization of $M$. In order to prove $\mathcal{NP}$-hardness of exact NMF, Vavasis introduces the following problem, called intermediate simplex:

(IS) Given a bounded polyhedron
$$P = \{ x \in \mathbb{R}^{r-1} \mid 0 \le f(x) = Cx + d \},$$

with $(C \; d) \in \mathbb{R}^{m \times r}$ of rank $r$, and a set $S$ of $n$ points in $P$ not contained in any hyperplane (i.e., $\mathrm{conv}(S)$ is full-dimensional), are there $r$ points in $P$ whose convex hull $T$ contains $S$, i.e., is there a polytope $T$ with $r$ vertices such that $S \subseteq T \subseteq P$?

$P$ is referred to as the outer simplex, $\mathrm{conv}(S)$ as the inner simplex, and $T$ as the intermediate simplex. Vavasis then proves the following result.

Theorem 3.1 ( [148]). There exists a polynomial-time reduction from exact NMF to IS and vice-versa.

He concludes by showing that IS is $\mathcal{NP}$-hard using a reduction from 3-SAT. His construction requires the rank $r$ of matrix $M$ to increase to obtain $\mathcal{NP}$-hardness (and therefore its dimensions $m$ and $n$ have to increase as well), i.e., he shows that there is no algorithm running in polynomial time in $r$ solving the exact NMF problem, unless $\mathcal{P} = \mathcal{NP}$. NMF is therefore also $\mathcal{NP}$-hard, since any optimal solution to NMF can be used to answer the exact NMF problem (the answer being positive if and only if the optimal objective value of NMF is equal to zero). However, if the rank $r$ of matrix $M$ is fixed, no complexity results are known, and the following questions are still open (see also [148]):

⋄ What is the complexity of NMF when the rank of the matrix $M$ is fixed?

⋄ What is the complexity of NMF when the factorization rank is fixed?

In contrast, in the special cases when the rank of matrix $M$ is equal to 1 or 2, the exact NMF problem can always be answered in the affirmative:

1. When $\mathrm{rank}(M) = 1$, it is obvious that for any nonnegative rank-one matrix $M \ge 0$ there are nonnegative factors $u \ge 0$ and $v \ge 0$ such that $M = uv^T$.

2. When nonnegative matrix $M$ has rank 2, Thomas has shown [143] that exact NMF is also always possible (see also [37]). The fact that any rank-two nonnegative matrix can be exactly factorized as the product of two rank-two nonnegative matrices can be explained geometrically as follows: viewing columns of $M$ as points in $\mathbb{R}^m$, the fact that $M$ has rank 2 implies that the set of its columns belongs to a two-dimensional subspace. Furthermore, because these columns are nonnegative, they belong to a two-dimensional pointed cone, see Figure 3.1. Since such a cone is always spanned by two extreme vectors, this implies that all columns of $M$ can be represented exactly as nonnegative linear combinations of

Figure 3.1: Illustration of exact NMF for a rank-two 3-by-10 matrix.

two nonnegative vectors, and therefore the exact NMF is always possible^3. Moreover, these two extreme columns can easily be computed in polynomial time (using for example the fact that they define an angle of maximum amplitude among all pairs of columns), as sketched in the code below.
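To make this geometric argument concrete, here is a minimal sketch (not from the thesis) that computes an exact NMF of a randomly generated rank-two nonnegative matrix from the pair of columns forming the widest angle; it assumes the matrix has rank 2 and no zero column.

```python
# Exact NMF of a rank-two nonnegative matrix: the two columns forming the widest
# angle span the pointed cone containing all columns, so every column is a
# nonnegative combination of these two extreme columns.
import numpy as np

rng = np.random.default_rng(0)
M = rng.random((3, 2)) @ rng.random((2, 10))      # random rank-two nonnegative matrix

N = M / np.linalg.norm(M, axis=0)                  # normalized columns
i, j = np.unravel_index(np.argmin(N.T @ N), (M.shape[1],) * 2)   # widest-angle pair
U = M[:, [i, j]]                                   # the two extreme columns
V = np.linalg.lstsq(U, M, rcond=None)[0]           # coefficients (nonnegative up to rounding)

print(np.linalg.norm(M - U @ V))                   # ~ 0 : exact factorization
print(V.min())                                     # >= 0 up to numerical tolerance
```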

Based on these observations, rank-one^4 NMF can be solved in polynomial time: the Perron-Frobenius theorem implies that the dominant left and right singular vectors of a nonnegative matrix $M$ are nonnegative, while the Eckart-Young theorem states that the outer product of these dominant singular vectors is the best rank-one approximation of $M$; and these vectors can be computed in polynomial time, see Section 2.2. Also, when the optimal rank-two approximation of matrix $M$ is unique and nonnegative^5, rank-two NMF can be solved in polynomial time. However, this optimal rank-two approximation is not always nonnegative; typically when $M$ contains many relatively small entries (e.g., zeros, see Section 2.2.2). For example, the unique optimal rank-two approximation of the following matrix
$$M = \begin{pmatrix} 1 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix},$$
is
$$\begin{pmatrix} 0.8924 & 1.1341 & 0.9403 \\ 0.2417 & 0.6986 & 1.1341 \\ -0.1938 & 0.2417 & 0.8924 \end{pmatrix}.$$
Hence the complexity of rank-two (NMF) is not known (more generally, it is not known for any fixed factorization rank greater than 2, see also Chapter 5). Furthermore, to the best of our knowledge, the complexity of the exact NMF problem and NMF of a matrix of any fixed rank greater than 3 are still unknown.

^3 The reason why this property no longer holds for higher values of the rank $r$ is that an $r$-dimensional cone is not necessarily spanned by a set of $r$ vectors when $r > 2$.
^4 Rank-one NMF refers to the NMF optimization problem (NMF) with $r = 1$. Notice that, in this chapter, $r$ is often used to refer to the rank of the nonnegative matrix $M$, which will be clear from the context.
^5 If the optimal rank-two approximation is non-unique (i.e., $\sigma_2(M) = \sigma_3(M) = \cdots = \sigma_i(M)$, $i \ge 3$), then it is non-trivial to check whether a nonnegative solution exists, see Corollary 3.5, so that the complexity is also unknown in this case.

3.2 Extended Formulations

The nonnegative rank is closely related to other areas of optimization, in particular to linear programming (LP) extended formulations in combinatorial optimization. An extended formulation (or lifting) for a polytope $P \subseteq \mathbb{R}^n$ is a polyhedron $Q \subseteq \mathbb{R}^{n+p}$ such that
$$P = \mathrm{proj}_x(Q) := \{ x \in \mathbb{R}^n \mid \exists y \in \mathbb{R}^p \text{ s.t. } (x, y) \in Q \}.$$
Extended formulations whose size (number of constraints plus number of variables defining $Q$) is polynomial in $n$ are called compact and are of great importance in integer programming. They make it possible to reduce significantly the size of certain linear programs arising in the context of integer programming and combinatorial optimization, and therefore provide a way to solve them efficiently, i.e., in polynomial time (see [39] for a survey). Yannakakis [152, Theorem 3] showed that the minimum size $s$ of an extended formulation of a polytope^6

$$P = \{ x \in \mathbb{R}^n \mid Cx + d \ge 0, \; Ax = b \},$$
is of the same order^7 as the sum of its dimension $n$ and the nonnegative rank of its slack matrix $S_M \ge 0$, where each column of the slack matrix is defined as
$$S_M(:, i) = C v_i + d \ge 0, \quad i = 1, 2, \ldots, m, \tag{3.1}$$
and vectors $v_i$ are the $m$ vertices of the polytope $P$. Formally, we then have
$$s = \Theta(n + \mathrm{rank}_+(S_M)).$$

^6 This can be generalized to polyhedra [39].
^7 One can actually show that the minimum number of facets of an extension of $P$ is equal to the nonnegative rank of the slack matrix of $P$, see, e.g., 'Nonnegative rank of a matrix and projections onto a polyhedron' on http://dirkolivertheis.wordpress.com/.

In particular, any rank-k nonnegative factorization $(U, V)$ of $S_M = UV$ provides the following extended formulation for $P$ with size $\Theta(n + k)$

$$Q = \{ (x, y) \in \mathbb{R}^{n+k} \mid Cx + d = Uy, \; Ax = b, \; y \ge 0 \}. \tag{3.2}$$

In fact, $\mathrm{proj}_x(Q) \subseteq P$ since $Uy \ge 0$ implies $Cx + d \ge 0$ for any $x \in \mathrm{proj}_x(Q)$, and $P \subseteq \mathrm{proj}_x(Q)$ since $C v_i + d = UV(:, i)$ implies that $(v_i, V(:, i)) \in Q$ for all $i$ and therefore each vertex $v_i$ of $P$ belongs to $\mathrm{proj}_x(Q)$. Notice that the system $Cx + d = Uy$ contains at most $n + k$ linearly independent equalities. Intuitively, this extended formulation parametrizes the space of slacks of the original polytope with the convex cone $\{ Uy \mid y \ge 0 \}$.

It is therefore interesting to compute bounds for the nonnegative rank in order to estimate the size of these extended formulations. Recently, Goemans [79] used this result to show that the size of LP formulations of the permutahedron (the polytope whose $n!$ vertices are the permutations of $[1, 2, \ldots, n]$) is at least $\Omega(n \log(n))$ variables plus constraints (see Section 3.4).

We will see in Section 3.4.1 that the nonnegative rank is closely related to a problem in computational geometry consisting in finding a polytope with a minimum number of vertices nested between two given polytopes. Therefore a better understanding of the properties of the nonnegative rank would presumably also make it possible to better characterize the solutions of this geometric problem. The nonnegative rank also has connections with other problems, e.g., in communication complexity theory [107, 152], probability [26], and graph theory (cf. Section 3.4).

3.3 Restricted Nonnegative Rank

In this section, we analyze the following quantity

Definition 3.1. The restricted nonnegative rank of a nonnegative matrix M m k k n is the minimum value of k such that there exists U R+ × and V R+× with M = UV and rank(U) = rank(M), i.e., col(U) =∈ col(M). It is denoted∈ rank+∗ (M).

In particular, given a nonnegative matrix M, we are interested in computing its restricted nonnegative rank rank+∗ (M) and a corresponding nonnegative factorization, i.e., solve 26 3.3. RESTRICTED NONNEGATIVE RANK

m n (RNR) Given a nonnegative matrix M R+ × , find k = rank+∗ (M) m k k∈n and compute U R+ × and V R+× such that M = UV and rank(U) = rank(M∈ )= r. ∈ Without the rank constraint on the matrix U, this problem reduces to the stan- dard nonnegative rank problem. Motivation to study this restriction includes the following 1. The restricted nonnegative rank provides a new upper bound for the nonnegative rank, since rank (M) rank∗ (M). + ≤ + 2. The restricted nonnegative rank can be characterized much more eas- ily. In particular, its geometrical interpretation (Section 3.3.1) will lead to new improved lower bounds for the nonnegative rank (Sections 3.4 and 3.5). RNR is a generalization of exact NMF. Noting r = rank(M), recall that exact NMF asks whether rank+(M) = r and, if the answer is positive, to compute a rank-r nonnegative factorization of M. If rank+(M)= r then it is clear that rank+∗ (M) = rank+(M) since the rank of U in any rank-r nonnegative factorization (U, V ) of M must be equal to r. The -hardness result of Vavasis therefore also implies -hardness of RNR whenNP the rank of matrix M is not fixed. However, in theNP case where the rank r of matrix M is fixed, no complexity results are known (except in the trivial cases r 2, see Section 3.1). The situation for RNR is quite different: we are going to≤ show that RNR can be solved in polynomial time when r =3 and that it is -hard for any fixed r 4. In particular, this result implies that exact NMFNP can be solved in polynomial≥ time for rank-three nonnegative matrices, see Corollary 3.4. In order to do so, we first show equivalence of RNR with another prob- lem in polyhedral combinatorics, closely related to intermediate simplex (Sec- tion 3.3.1), and then apply results from the computational geometry literature to conclude about its computational complexity for fixed rank (Section 3.3.2).

3.3.1 Equivalence with the Nested Polytopes Problem Let consider the following problem called nested polytopes problem (NPP): (NPP) Given a bounded polyhedron

r 1 P = x R − 0 f(x)= Cx + d , { ∈ | ≤ } m r with (C d) R × of rank r, and a set S of n points in P not contained in∈ any hyperplane (i.e., conv(S) is full-dimensional), find the minimum number k of points in P whose convex hull T contains S such that S T P . ⊆ ⊆ 27 CHAPTER 3. NONNEGATIVE RANK

Polytope P is referred to as the outer polytope, and conv(S) as the inner polytope; note that they are given by two distinct types of representations: facets for P , extreme points for conv(S). The intermediate simplex problem mentioned earlier is a particular case of NPP in which one asks whether k is equal to r (which is the minimum possible value), i.e., if there exists a simplex T (defined by r vertices in a r 1 dimen- sional space) contained in P and containing S. −

We now prove equivalence between RNR and NPP. It is a generalization of Theorem 3.1. The reductions are valid in both Turing machine and real computations models (see Section 2.3).

Theorem 3.2. There is a polynomial-time reduction from RNR to NPP and vice-versa.

Proof. Let us construct a reduction of RNR to NPP. First we (1) delete the zero rows and columns of M and (2) normalize its columns such that M becomes column stochastic (columns are nonnegative and sum to one). One can easily check that it gives a polynomially equivalent RNR instance [31]. We then decompose M as the product of two rank-r matrices (using, e.g., reduction to row-echelon form)

r M = AB M = A B i, ⇐⇒ :i :l li ∀ Xl=1 m r r n where r = rank(M), A R × and B R × . We observe that one can assume without loss of generality∈ that the∈ columns of A and B sum to one. Indeed, since M is column stochastic, at least one column of A does not sum to zero (otherwise all columns of AB = M would sum to zero). One can then update A and B in the following way so that their columns sum to one:

For each column of A which sums to zero, add a column of A which does ⋄ not sum to zero, and update B accordingly;

Normalize the columns of A such that each of them sums to one, and ⋄ update B accordingly;

Observe that since the columns of A sum to one, M is column stochastic ⋄ and M = AB, the columns of B must also sum to one.

m k k n In order to find a solution of RNR, we have to find U R+ × and V R+× such that M = UV and rank(U) = r. For the same reasons∈ as for A ∈and B, U and V can be assumed to be column stochastic without loss of generality. Moreover, since M = UV = AB, 28 3.3. RESTRICTED NONNEGATIVE RANK

and rank(M) = rank(A) = rank(U) = r, the column spaces of M, A and U coincide; implying that the columns of U must be a linear combination of the columns of A. The columns of U must then belong to the following set

m Q = u Rm u col(A), u 0 and u =1 . { ∈ | ∈ ≥ i } i=1 X One can then reduce the search space to the (r 1)-dimensional polyhedron cor- responding to the coefficients of all possible linear− combinations of the columns of A generating stochastic columns. Defining

C(:,i)= A(:,i) A(:, r) 1 i r 1, and d = A(:, r), − ≤ ≤ − r 1 m and introducing affine function f : R − R : x f(x)= Cx + d, which is injective since C is full rank (because A→is full rank),→ this polyhedron can be defined as

r 1 r 1 − P = x R − A(:, 1:r 1)x + 1 x A(:, r) 0 { ∈ | − − i ≥ }  i=1  r 1 X = x R − f(x) 0 . { ∈ | ≥ } Note that B(1:r 1, j) P j since M(:, j) = AB(:, j) = f(B(1:r 1, j)) 0 j. − ∈ ∀ − ≥ ∀ Let us show that P is bounded: suppose P is unbounded, then

r 1 x P, y =0 R − , α 0 : x + αy P, ∃ ∈ ∃ 6 ∈ ∀ ≥ ∈ C(x + αy)+ d = (Cx + d)+ αCy 0. ≥ Since Cx + d 0, this implies that Cy 0. Observe that columns of C sum to zero (since≥ the columns of A sum to one)≥ so that Cy sums to zero as well; moreover, C is full rank and y is nonzero implying that Cy is nonzero and therefore that Cy must contain at least one negative entry, a contradiction. Notice that the set Q can be equivalently written as

Q = u Rm u = f(x), x P . { ∈ | ∈ } r 1 k Noting X = [x1 x2 ...xk] R − × , f(X) = [f(x1) f(x2) ...f(xk)] = CX + [dd...d], we finally have that∈ the following statements are equivalent

m k k n (i) U R × , V R × column stochastic with rank(U) = rank(M), and ∃M =∈UV , ∈

k n (ii) x , x ,...x P and V R × column stochastic such that ∃ 1 2 k ∈ ∈ M = f(B(1:r 1, :)) = f(X)V = f(XV ), − 29 CHAPTER 3. NONNEGATIVE RANK

k n (iii) x , x ,...x P and V R × column stochastic such that ∃ 1 2 k ∈ ∈ B(1:r 1, :) = XV. − The equivalence between (i) and (ii) follows from the above derivations (i.e., U = f(X) for some x1, x2,...xk P ); f(X)V = f(XV ) because V is column stochastic (so that [dd...d]V =∈ [dd...d]), and the second equivalence be- tween (ii) and (iii) is a consequence from the fact that f is an injection. We have then reduced RNR to NPP: find the minimum number k of points xi in P such that n given points (the columns of B(1:r 1, :) constructed from the columns of M, which define the set S in the NPP instance)− are contained in the convex hull of these points (since V is column stochastic). Because all steps in the above derivation are equivalences, we have actually also defined a reduction from NPP to RNR; to map a NPP instance to a RNR instance, we take M(:,i)= f(s )= Cs + d 0, s S 1 i n, i i ≥ i ∈ ≤ ≤ and rank(M)= r because the n points si are not all contained in any hyperplane (they affinely span P).

It is worth noting that M would be the slack matrix of P if S was the set of vertices of P (cf. Section 3.2). This will be useful later in Section 3.5.1.

Remark 3.1 (Uniqueness). The reduction from RNR to NPP is not unique. In fact, two different factorizations of M correspond to different NPP instances. However, these instances are simply affine transformations of each other, hence equivalent. Let us show this formally. Let M = AB and M = CD be two factorizations of M, with the columns of A, B, C and D summing to one. We then have two different NPP instances corresponding to M:

r 1 x S = B(1:r 1, j), 1 j n P = x R − A 0 , B { − ≤ ≤ } ⊆ A { ∈ | 1 Σ(x) ≥ }  −  (NPP1) where Σ(y)= n y ,y Rn, and i=1 i ∈ P r 1 x S = D(1:r 1, j), 1 j n P = x R − C 0 . D { − ≤ ≤ } ⊆ C { ∈ | 1 Σ(x) ≥ }  −  (NPP2) Since AB = CD and A,B,C,D are full rank, there exists an 1 1 Q such that A = CQ and B = Q− D where the columns of Q and Q− must sum to one (because the columns of A,B,C,D do). Let us note

q q ... q Q = 1 2 r . 1 Σ(q ) 1 Σ(q ) . . . 1 Σ(q )  − 1 − 2 − r  30 3.3. RESTRICTED NONNEGATIVE RANK

Then

x 1 x i xi(qi qr)+ qr A = AQ− Q = C − . 1 Σ(x) 1 Σ(x) 1 Σ( xi(qi qr)+ qr)  −   −   −P i −  Hence P x P y = x (q q )+ q = Q′x + q P , ∈ A ⇐⇒ i i − r r r ∈ C i X where Q′(:,i) = q q 1 i r 1. Finally, f(x) = Q′x + q is an affine i − r ≤ ≤ − r transformation from (NPP1) to (NPP2).

3.3.2 Computational Complexity Rank-Three Matrices Using Theorem 3.2, RNR of a rank-three matrix can be reduced to a two- dimensional nested polytopes problem. Therefore, one has to find a convex polygon T with minimum number of vertices nested in between two given con- vex polygons S P . This problem has been studied by Aggarwal et al. in [1], who proposed an⊆ algorithm running in (p log(k)) operations8, where p is the total number of vertices of the given polygonsO S and P , and k is the number of vertices of the minimal nested polygon T . If M is a m-by-n matrix then p m + n since S has n vertices, and the polygon P is defined by m inequal- ≤ ities so that it has at most m vertices. Moreover k = rank+∗ (M) min(m,n) follows from the trivial solutions T = S and T = P . Finally, we conclude≤ that one can compute the restricted nonnegative rank of a rank-three m-by-n matrix in O (m + n) log(min(m,n)) operations. Theorem 3.3. For rank(M) 3, RNR can be solved in polynomial time. ≤ Proof. Cases r =1, 2 are trivial since any rank-1 (resp. 2) nonnegative matrix can always be expressed as the sum of 1 (resp. 2) nonnegative factors, see Section 3.1. Case r =3 follows from Theorem 3.2 and the polynomial-time algorithm of Aggarwal et al. [1]. For the sake of completeness, we sketch the main ideas of the algorithm of Aggarwal et al. They first make the following observations: (1) any vertex of a solution T can be assumed to belong to the boundary of the polygon P (otherwise it can be projected back on P in order to generate a new solution containing the previous one), (2) any segment whose ends are on the boundary of P and tangent to S (i.e., S is on one side of the segment, and the segment touches S) defines a polygon with the boundary of P which must contain at

8Wang generalized the result for non-convex polygons [149]. Bhadury and Chandrasekaran propose an algorithm to compute all possible solutions [13]. 31 CHAPTER 3. NONNEGATIVE RANK

least one vertex of any feasible solution T (otherwise the tangent point on S could not be contained in T ), see, e.g., set Q on Figure 3.2 delimited by the segment [p1,p2] and the boundary of P , and such that T Q = for any feasible solution T . ∩ 6 ∅

Figure 3.2: Illustration of the algorithm of Aggarwal et al. [1].

Starting from any point p1 of the boundary of P , one can the tangent to S and hence obtain the next intersection p2 with P . Point p2 is chosen as the next vertex of a solution T , and the same procedure is applied (say k times) until the algorithm can reach the initial point without going through S, see Figure 3.2. This generates a feasible solution T (p1) = conv( p1,p2,...,pk ). Because of (1) and (2), this solution has at most one vertex{ more than} an optimal one, i.e., k rank+∗ (M)+1 (since T determines with the boundary of P k 1 disjoint polygons≤ tangent to S). Moreover, because of (1) and (2), there must− exist a vertex of an optimal solution on the boundary of P between p1 and p2. The point p1 is then replaced by the so called ‘contact change points’ located on this part of the boundary of P while the corresponding solution T (p1) is updated using the procedure described above. The contact change points are: (a) the vertices of P between p1 and p2, and (b) the points for which one tan- gent point of T (p1) on S is changed when p1 is replaced by them. This (finite) set of points provides a list of candidates where the number of vertices of the 32 3.3. RESTRICTED NONNEGATIVE RANK

solution T could potentially be reduced (i.e., where p1 and pk could coincide) by replacing p1 by one of these points. It is then possible to check whether the current solution can be improved or not, and guarantee global optimality. In the example of Figure 3.2, moving p1 on the (only) vertex of P between p1 and p2 generates an optimal solution of this RNR instance (since it reduces the original solution from 5 to 4 vertices).

Finally, Theorem 3.3 also allows to shed some light on the nonnegative rank and the nonnegative matrix factorization problems.

Corollary 3.4. Exact NMF can be solved in polynomial time for nonnegative matrices of rank r smaller than 3.

Proof. If rank+(M) = r, then clearly rank+∗ (M) = r since in any rank-r non- negative factorization (U, V ) of M, rank(U) = r. Therefore rank+(M) > r if and only if rank+∗ (M) > r, so that it suffices to compute rank+∗ (M) to solve exact NMF (as mentioned in the introduction of this section).

Corollary 3.5. The optimal solution of (NMF) with r = 3 can be computed in polynomial time, given that

(1) the best rank-three approximation A of M is unique and nonnegative, and

(2) its nonnegative rank rank+(A) is smaller than three. Proof. The unique best rank-three approximation A of M can be computed in polynomial time, e.g., using the SVD, see Section 2.2. If A is nonnegative and rank+(A) 3, a rank-3 nonnegative factorization of A can be computed with the algorithm≤ of Aggarwal et al. (see Corollary 3.4).

This result is similar as in the rank-two case, except that the nonnegative rank of a rank-two nonnegative matrix is always equal to two (cf. Section 3.1). Notice that if rank(M)=3, then the assumptions of Theorem 3.5 reduce to having rank+(M)=3. In fact, if rank+(M)=4, we do not know how to solve the rank-three NMF problem in polynomial-time.

Remark 3.2. If the nonnegative rank of the best rank-three approximation A of M is larger than 3, then knowing A is useless for obtaining a rank-three NMF solution for M. In fact, even if we were able to compute an optimal rank-three NMF of A, it will most likely not be optimal for M.

Remark 3.3. If the best rank-three approximation of M is non-unique (i.e., σi(M)= σi 1(M)= = σ4(M)= σ3(M), i 4), the set of optimal solutions can be described− from· · · the singular triplets of≥M which can be computed in polynomial time (see Section 2.2). However, checking whether there exists an 33 CHAPTER 3. NONNEGATIVE RANK

optimal solution which is nonnegative and has nonnegative rank three seems to be non-trivial, and we do not know how to do it in general. This is why we need the uniqueness assumption. Notice that, in the special case σ2(M) > σ3(M)= σ4(M) > σ5(M),

T T T A(λ)= σ1u1v1 + σ2u2v2 + σ3u(λ)v(λ) , where (σi,ui, vi) are the singular triplets associated with M, u(λ) = λu3 + 2 2 √1 λ u4, v(λ) = λv3 + √1 λ v4, and 1 λ 1, is the set of optimal rank-three− approximations of M− (see Theorem− ≤ 2.5).≤ The aim is to find λ such that A(λ) 0 and rank+(A(λ)) =3. Using the change of variable λ = cos(θ), θ [0, 2π],≥ checking whether there exists λ such that A(λ) 0 is equivalent to checking∈ if there exists θ such that ≥ σ u vT + σ u vT cos2(θ)u vT + cos(θ) sin(θ)(u vT + u vT ) + sin2(θ)u vT 1 1 1 2 2 2 . 3 3 3 4 4 3 4 4 ≥ σ − 3 We then have mn inequalities of the type

f(θ)= a cos2(θ)+ b cos(θ) sin(θ)+ c sin2(θ)+ d 0. ≥ Multiplying d by cos2(θ)+sin2(θ)(= 1), and dividing the inequalities by cos2(θ) f(θ) (for cos(θ) =0), we observe that 2 is a quadratic function in tan(θ). We 6 cos (θ) can then easily compute its roots, hence an union of intervals for the admissible values of θ. We finally have to find θ in the intersection of these intervals satisfying rank+(A(θ)) =3. For a given θ, checking whether rank+(A(θ)) =3 can be done in polynomial time (cf. Theorem 3.4). However, we do not know how to compute a solution θ in polynomial time (we actually don’t even know if there always exists a solution that is polynomially bounded in the size of the input matrix M, see also [148]).

Higher Rank Matrices For a rank-four matrix, RNR reduces to a three-dimensional problem of finding a polytope T with the minimum number of vertices nested between two other polytopes S P . This problem has been studied by Das et al. [42, 44] and has been shown⊆ to be -hard when minimizing the number of facets of T (the reduction is from planar-3SATNP ). From this result, one can deduce using a duality argument9 that minimizing the number of vertices of T is -hard as well [36, 43]. NP Theorem 3.6. For rank(M) 4, RNR is -hard. ≥ NP 9Taking the polar of the three nested polytopes exchanges the roles of the inner and outer polytopes, and transforms facet descriptions into vertex descriptions, so that the description of the inner and outer polytopes is unchanged but the intermediate polytope is now described by its facets. This will be proved in Section 3.6.2. 34 3.3. RESTRICTED NONNEGATIVE RANK

Proof. This is a consequence of Theorem 3.2 and the -hardness results of Das et al. [42–44]. NP

Note however that several approximation algorithms have been proposed in the literature. For example, Mitchell and Suri [121] approximate rank+∗ (M) in case rank(M)=4 within a (log(p)) factor, where p is the total number of vertices of the given polygonsOS and P . Clarkson [36] proposes a randomized algorithm finding a polytope T with at most r+∗ (5d ln(r+∗ )) vertices and run- 2 1+δ O ning in (r+∗ p ) expected time (with r+∗ = rank+∗ (M), d = rank(M) 1 and δ is anyO fixed value > 0). −

3.3.3 Some Properties In this section, we derive some useful properties of the restricted nonnegative rank.

Example 3.1. Construct M using the following NPP instance: P is the three 3 dimensional cube P = x R 0 xi 1, 1 i 3 , with 6 facets and S is { ∈ | ≤3 ≤ ≤ ≤ } the set of its 8 vertices S = x R xi 0, 1 , 1 i 3 . By construction, the convex hull of S is equal{ ∈ to P |and∈{ the unique} ≤ and≤ optimal} solution to this NPP instance is T = P = conv(S) with 8 vertices. By Theorem 3.2, the corresponding matrix M of the RNR instance

10110010 01001101  11010100  M =  00101011     11101000     00010111      has restricted nonnegative rank equal to 8 (note that its rank is 4 and its nonnegative rank is 6, see Section 3.4).

m n It is well-known that for a matrix M R+ × , we have rank+(M) min(m,n); surprisingly, this does not hold for∈ the restricted nonnegative rank.≤

m n Lemma 3.7. For M R × , ∈ +

rank∗ (M) n but rank∗ (M)  m. + ≤ + Proof. The first inequality is trivial since M = MI (I being the identity ma- trix). Example 3.1 provides an example when rank+∗ (M)=8 for a 6-by-8 matrix M. 35 CHAPTER 3. NONNEGATIVE RANK

T Lemma 3.7 implies that in general rank+∗ (M) = rank+∗ (M ), unlike the rank and nonnegative rank [37]. Note however that6 when rank(M) 3, the corresponding NPP instance is two-dimensional or lower, and we have≤

rank∗ (M) min(m,n). + ≤ In fact, the number of vertices of the outer polygon P in the NPP instance is smaller or equal to its number of facets m in the two-dimensional case or lower, and that the solution T = P is always feasible.

m n m r Lemma 3.8. Let A R × and B R × , then ∈ + ∈ + rank∗ ([AB]) rank∗ (A) + rank∗ (B). + ≤ + + Proof. Let (Ua, Va) and (Ub, Vb) be solutions of RNR for A and B respectively, then V 0 [AB] = [U U ] a , a b 0 V  b  and rank([Ua Ub]) = rank([AB]) since col(Ua) = col(A) and col(Ub) = col(B) by definition.

m n Lemma 3.9. Let M R+ × with rank(M) = r and rank+(M) = r+, U m r+ r+ n∈ ∈ R × and V R × with M = UV . Then + ∈ + r < rank∗ (M) r 1 rank(U) r and r rank(V ) r 1. + + ⇒ − ≤ ≤ + ≤ ≤ + − Moreover, if M is symmetric, r < rank∗ (M) r 1 rank(U) r 1 and r 1 rank(V ) r 1. + + ⇒ − ≤ ≤ +− − ≤ ≤ +− Proof. Clearly, r rank(U) r and r rank(V ) r . ≤ ≤ + ≤ ≤ +

If rank(U) = r, we would have rank+∗ (M) = r+ which is a contradiction, and rank(V ) = r+ would imply that V has a right pseudo-inverse V †, so that we could write U = MV † and then r rank(U) min(r, rank(V †)) r, a contradiction for the same reason. ≤ ≤ ≤ In case M is symmetric, to show that rank(U) < r+ and rank(V ) > r, we use symmetry and observe that UV = M = M T = V T U T . Corollary 3.10. Given a nonnegative matrix M,

rank∗ (M) rank(M)+1 rank (M) = rank∗ (M). + ≤ ⇒ + + If M is symmetric,

rank∗ (M) rank(M)+2 rank (M) = rank∗ (M). + ≤ ⇒ + + 36 3.3. RESTRICTED NONNEGATIVE RANK

m r+ r+ n Proof. Let r = rank(M), r = rank (M), and U R × and V R × + + ∈ + ∈ + such that M = UV . If r+ < rank+∗ (M), by Lemma 3.9, we have

r< rank(U) r < rank∗ (M), ≤ + + which is a contradiction if rank∗ (M) r +1. If M is symmetric, we have + ≤ rank(U) < r+ and the above equation is a contradiction if rank+∗ (M) r + 2. ≤ For example, this implies that to find a symmetric rank-three nonnegative matrix with rank+(M) < rank+∗ (M), we need rank+∗ (M) > rank(M)+2=5 and therefore have to consider matrices of size at least 6-by-6 with rank+∗ (M)= 6. Example 3.2. Let us consider the following matrix M and the rank-5 non- negative factorization (U, V ),

0 1 4 9 16 25 1 0 14 9 16  4 1014 9  M = = UV,  9 4101 4     16 9 41 0 1     25 16 9 4 1 0      with 50401 000135 30110   531000 10041   U = , and V = 001100 .  01041     100001   03110       010010   05401          One can check that rank(M)=3, and, using the algorithm of Aggarwal et al. [1], the restricted nonnegative rank can be computed10 and is equal to 6. Using the above decomposition, it is clear that rank (M) 5 < rank∗ (M)=6. + ≤ + By Lemma 3.9, for any rank+(M)-nonnegative factorization (U, V ) of M, we then must have 3 < rank(U)=4 < rank+(M) implying that rank+(M)=5. As we have already seen with Lemma 3.7, the restricted nonnegative rank does not share all the nice properties of the rank and the nonnegative rank functions [37]. The next two lemmas exploit Example 3.2 further to show different behavior between nonnegative rank and restricted nonnegative rank.

10The problem is actually trivial because each vertex of the inner polygon S is located on a different edge of the polygon P , so that they define with the boundary of P 6 disjoint ∗ polygons tangent to S. This implies that rank+(M) = 6, cf. Section 3.3.2. This matrix is actually a linear Euclidean distance matrix which will be analyzed later in Section 3.5.2. 37 CHAPTER 3. NONNEGATIVE RANK

m n m r Lemma 3.11. Let A R × and B R × , then ∈ + ∈ +

rank+∗ (A + B)  rank+∗ (A) + rank+∗ (B).

Proof. Take M, U and V from Example 3.2 and construct A = U(:, 1:3)V (1:3, :) with rank+∗ (A)=3 (since rank(A)=3), B = U(:, 4:5)V (4:5, :) with rank+∗ (B)= 2 (trivial) and rank+∗ (A + B)=6 since A + B = M. m n m r Lemma 3.12. Let A R × and B R × , then ∈ + ∈ +

rank+∗ ([AB])  rank+∗ (A).

m (n+r) where [AB] R × denotes the concatenation of the columns of A and B. ∈ + Proof. Let us take M, U and V from Example 3.2, and construct A = M and B = U(:, 1) with rank+∗ ([AB]) 5 since rank([AB]) =4 (this can be checked ≤ th easily) and [AB]= U[V e1] with rank(U)=4 (where ei denotes the i column of the identity matrix of appropriate dimension).

m r r n Lemma 3.13. Let B R × and C R × , then ∈ + ∈ +

rank+∗ (BC)  min(rank+∗ (B), rank+∗ (C)).

Proof. See Example 3.2 where rank+∗ (U) 5 by Lemma 3.7 and rank+∗ (M)= 6. ≤

3.4 Lower Bounds for the Nonnegative Rank

In this section, we provide new lower bounds for the nonnegative rank based on the restricted nonnegative rank. Recall that the restricted nonnegative rank already provides an upper bound for the nonnegative rank since for an m-by-n nonnegative matrix M,

0 rank(M) rank (M) rank∗ (M) n. ≤ ≤ + ≤ + ≤ Notice that this bound can in general only be computed in polynomial time in the case rank(M)=3 (unless = , see Theorems 3.3 and 3.6). As mentioned in the introduction,P NP it might also be interesting to compute lower bounds on the nonnegative rank. Some work has already been done in this direction, including the following

m n 1. Let M R × be any weighted biadjacency matrix of a bipartite graph ∈ + G = (V1 V2, E V1 V2) with M(i, j) > 0 (V1(i), V2(j)) E. A biclique of∪ G is a⊆ complete× bipartite subgraph (it⇐⇒ corresponds to a positi∈ ve rectangular submatrix of M). One can easily check that each rank-one 38 3.4. LOWER BOUNDS FOR THE NONNEGATIVE RANK

factor (U:k, Vk:) of any rank-k nonnegative factorization (U, V ) of M can be interpreted as a biclique of M (i.e., as a positive rectangular subma- k trix) since M = i=1 U:iVi:. Moreover, these bicliques (U:k, Vk:) must cover G completely since M = UV . The minimum number of bicliques needed to cover GPis then a lower bound for the nonnegative rank. It is called the biclique partition number and denoted b(G), see [150] and the references therein. Its computation is -complete [123] and is directly related to the minimum biclique coverNP problem (MBC). Consider for example the matrix M from Example 3.1. The largest bi- clique of the graph G generated by M has 4 edges11. Since G has 24 edges, we have b(G) 24 =6 and therefore 6 rank (M) min(m,n)=6. ≥ 4 ≤ + ≤ A crown graph G is a bipartite graph with V = V = n and E = | 1| | 2| (V1(i), V2(j)) i = j (it can be viewed as a biclique where the horizontal {edges have been| 6 removed).} de Caen, Gregory and Pullman [49] showed that k b(G) = min k n = Θ(log n). k | ≤ k/2 n ⌊ ⌋o Beasley and Laffey [6] studied linear Euclidean distance matrices defined 2 as M(i, j) = (ai aj) for 1 i, j n, ai R, ai = aj i = j. They proved that such− matrices have≤ rank≤ three and∈ that 6 6

k r+ min k n r+ which means n , (3.3) k | ≤ k/2 ≤ ≤ r+/2 n ⌊ ⌋o ⌊ ⌋ where r+ = rank+(M). In fact, such matrices are biadjacency matrices of crown graphs (only the diagonal entries are equal to zero). 2. Goemans makes [79] the following observation: the product UV of two m k k n nonnegative matrices U R+ × and V R+× generates a matrix M with at most 2k columns∈ (resp. 2k rows) with∈ different sparsity patterns. In fact, the columns (resp. rows) of M are additive linear combinations of the k columns of U (resp. rows of V ) and therefore no more than 2k sparsity patterns can be generated from these columns (resp. rows). Letting sp be the maximum between the number of columns and rows of m n M R × having a different sparsity pattern, we then have ∈ + rank (M) log (s ). + ≥ 2 p In particular, if all the columns and rows of M have a different sparsity pattern, then rank (M) log (max(m,n)). (3.4) + ≥ 2 11This can be computed explicitly, e.g., with a brute force approach. Note however that finding the biclique with the maximum number of edges is a combinatorial N P-hard opti- mization problem [127]. It is closely related to a variant of the approximate nonnegative factorization problem, see Chapter 5. 39 CHAPTER 3. NONNEGATIVE RANK

Goemans then uses this result to show that any extended formulation of the permutahedron in dimension n must have Ω(n log(n)) variables and constraints. In fact,

The minimal size s is of the order of the nonnnegative rank of its ⋄ slack matrix plus n (cf. Section 3.2). The slack matrix has n! columns (corresponding to each vertex of ⋄ the polytope) with different sparsity patterns (cf. Equation (3.1)).

This implies that

s = Θ(rank (S )+ n) Θ(log(n!)) = Θ(n log(n)). + M ≥ In this section, we provide some theoretical results linking the restricted nonnegative rank with the nonnegative rank, which allow us to improve and generalize the above results in Section 3.5 for both slack and linear Euclidean distance matrices.

3.4.1 Geometric Interpretation of a Nonnegative Factor- ization as a Nested Polytopes Problem In the following, we lay the groundwork for the main results of this chapter, introducing essential notations and observations that will be extensively used in this section. We rely on the geometric interpretation of the nonnegative rank, see also [31,59,148] where related results are presented. The main observation is that any rank-k nonnegative factorization (U, V ) of a nonnegative matrix M can be interpreted as the solution with k vertices of a nested polytopes problem in which the inner polytope has dimension rank(M) 1 and the outer polytope has dimension rank(U) 1. − − m n m k k n Without loss of generality, let M R+ × , U R+ × and V R+× be column stochastic with M = UV (cf. proof∈ of Theorem∈ 3.2, the columns∈ of M are convex combination of the columns of U). If the column space of U does not coincide with the column space of M, i.e., ru = rank(U) > rank(M) = r, it means that the columns of U belong to a higher dimensional affine subspace containing the columns of M (otherwise, see Theorem 3.2). m r r r+ Let us factorize U = AB where A R × u and B R u× are full rank and their columns sum to one. As in∈ Theorem 3.2, we∈ can construct the polytope of the coefficients of the linear combinations of the columns of A that generate stochastic vectors. It is defined as

r 1 u− r 1 P = x R u− f (x)= A(:, 1:r 1)x + 1 x A(:, r ) 0 . u { ∈ | u u− − i u ≥ }  i=1  40 X 3.4. LOWER BOUNDS FOR THE NONNEGATIVE RANK

r n Since col(M) col(U), there exists B′ R u× whose columns must sum ⊆ ∈ to one such that M = AB′. Since rank(M) = r and A is full rank, we must have rank(B′)= r. By construction, the columns of B = B(1:r 1, :) (corre- u u− sponding to the columns of U) and B = B′(1:r 1, :) (corresponding to the m u− columns of M) belong to Pu. Note that since rank(B′)= r, the columns of Bm live in a lower (r 1)-dimensional polytope − r 1 P = x R u− f (x) 0,f (x) col(M) P . m { ∈ | u ≥ u ∈ } ⊆ u

Polytope Pm contains the points in Pu generating vectors in the column space of M. Moreover M = AB′ = UV = ABV, implying that (since A is full rank)

B′ = BV and Bm = BuV.

Finally, the columns of Bm are contained in the convex hull of the columns of Bu, inside Pu, i.e., conv(B ) conv(B ) P . m ⊆ u ⊆ u Defining the polytope T as the convex hull of the columns of Bu, and the set of points S as the columns of Bm, we can then interpret the nonnegative factorization (U, V ) of M as follows. The (ru 1)-dimensional polytope T with k vertices (corresponding to the columns of −U) is nested between an inner (r 1)-dimensional polytope conv(S) (where each point in S corresponds to a column− of M) and a outer (r 1)-dimensional polytope P . u − u Let us use the matrix M and its nonnegative factorization (U, V ) of Ex- ample 3.2 as an illustration: rank(M)=3 so that Pm is a two-dimensional polytope and contains the set of points S, while rank(U)=4 and defines a three-dimensional polytope T containing S, see Figure 3.3.

3.4.2 Upper Bound for the Restricted Nonnegative Rank From the geometric interpretation introduced in the previous paragraph, we can now give the main result of this section. The idea is the following: using notations of Section 3.4.1, we know that

The polytope T (whose vertices correspond to the columns of U) contains ⋄ the (lower dimensional) set of points S (corresponding to the columns of M), and that

The polytope S is contained in Pm (which corresponds to the set of ⋄ stochastic vectors in the column space of M). 41 CHAPTER 3. NONNEGATIVE RANK

Figure 3.3: Illustration of the solution from Example 3.2 as a nested polytopes problem, with rank(M)=3 < rank(U)=4 < rank+(M)=5 < rank+∗ (M) = 6= n.

Therefore, the intersection between T and Pm must also contain S, i.e., the intersection T Pm defines a polytope which is contained in the column space of M, and contains∩ S. Hence its vertices provide a feasible solution to the RNR problem, and an upper bound for the restricted nonnegative rank can then be computed. In other words, any nonnegative factorization (U, V ) of a nonnegative matrix M can be used to construct a feasible solution to the restricted nonnegative rank problem. One has ‘simply’ to compute the intersection of the polytope generated by the (normalized) columns of U with the column space of M (which can obviously increase the number of vertices).

Theorem 3.14. Using notations of Section 3.4.1, we have

rank∗ (M) # vertices(T P ), (3.5) + ≤ ∩ m where # vertices(Q) denotes the number of vertices of Q.

Proof. Let x1, x2,...,xv be the v vertices of T Pm and note X = [x1 x2 ...xv] which has rank at most r (since it is contained∩ in the (r 1)-dimensional − polyhedron Pm). By construction,

B (:, j) T P = conv(X) 1 j n. m ∈ ∩ m ≤ ≤ 42 3.4. LOWER BOUNDS FOR THE NONNEGATIVE RANK

v n Therefore, there must exist a matrix V ∗ R × column stochastic such that ∈

Bm = XV ∗, implying that

M = fu(Bm)= fu(XV ∗)= fu(X)V ∗ = U ∗V ∗,

m v where U ∗ = f (X) R × is nonnegative since x P P i, and U ∗ has u ∈ i ∈ m ⊆ u ∀ rank r since M = U ∗V ∗ implies that its rank is at least r and U ∗ = fu(X) that it is at most r. The pair (U ∗, V ∗) is then a feasible solution of the corresponding RNR problem for M and therefore rank∗ (M) v = # vertices(T P ). + ≤ ∩ m 3.4.3 Lower Bound for the Nonnegative Rank based on the Restricted Nonnegative Rank We can now obtain a lower bound for the nonnegative rank based on the restricted nonnegative rank. Indeed, if we consider an upper bound on the quantity # vertices(T Pm) that increases with the nonnegative rank (i.e., the number of vertices∩ of T ), we can reinterpret Theorem 3.14 as providing a lower bound on the nonnegative rank. For that purpose, define the quantity faces(n, d, k) to be the maximal number of k-faces of a polytope with n vertices in dimension d. Theorem 3.15. The restricted nonnegative rank of a nonnegative matrix M with r = rank(M) and r+ = rank+(M) can be bounded above by

rank+∗ (M) max faces(r+, ru 1, ru r). (3.6) ≤ r ru r+ − − ≤ ≤

Proof. Let (U, V ) be a rank-r+ nonnegative factorization of M with rank(U)= ru. Using notations of Section 3.4.1 and the result of Theorem 3.14, rank+∗ (M) is bounded above by the the number of vertices of T Pm. Defining Qm = r 1 ∩ x R u− f (x) col(M) , we have P = Q P and since T P , { ∈ | u ∈ } m m ∩ u ⊆ u P T = Q P T = Q T. m ∩ m ∩ u ∩ m ∩ Since Q is (r 1)-dimensional, the number of vertices of T Q is bounded m − ∩ m above by the number of (ru r)-faces of T (in a (ru 1)-dimensional space, (r r)-faces are defined by −r 1 equalities), we then− have u − − rank∗ (M) # vertices(T P ) = # vertices(T Q ) faces(r , r 1, r r). + ≤ ∩ m ∩ m ≤ + u − u−

Notice that for r = r, faces(r , r 1, 0) = r which gives r = rank∗ (M) as u + − + + + expected. Finally, taking the maximum over all possible values of r ru r+ gives the above bound (3.6). ≤ ≤ 43 CHAPTER 3. NONNEGATIVE RANK

We introduce for easier reference a function φ corresponding to the upper bound in Theorem 3.15, i.e.,

φ(r, r+)= max faces(r+, ru 1, ru r). r ru r+ − − ≤ ≤ Clearly, when r is fixed, φ is an increasing function of its second argument r+, since faces(n, d, k) increases with n. Therefore inequality rank+∗ (M) φ(r, r+) from Theorem 3.15 implicitly provides a lower bound on the nonnegative≤ rank r+ that depends on both rank r and restricted nonnegative rank rank+∗ (M). Explicit values for function φ can be computed using a tight bound for faces(n, d, k) attained by cyclic polytopes [157, p.257, Corollary 8.28]

d 2 d i i n d 1+ i faces(n, d, k 1) = ∗ − + − − , − k i k d + i i i=0 X  −   −   d where ∗ denotes a sum where only half of the last term is taken for i = 2 if d is even, and the whole last term is taken for i = d = d 1 if d is odd. P 2 −2 Alternatively, simpler versions of the bound can be worked⌊ ⌋ out in the following way:

Theorem 3.16. The upper bound φ(r, r+) on the restricted nonnegative rank of a nonnegative matrix M with r = rank(M) and r+ = rank+(M) satisfies

φ(r, r+) = max faces(r+, ru 1, ru r) r ru r+ − − ≤ ≤ r r 2 max + + 2r+ 2r+ . ≤ r ru r+ r r +1 ≤ r /2 ≤ πr ≤ ≤ ≤  u −  ⌊ + ⌋ s + n Proof. The first inequality follows from the fact that faces(n, d, k 1) k , since any set of k distinct vertices defines at most one (k 1)-face.− The second≤  follows from the maximality of central binomial coefficients− . The third is a standard upper bound on central binomial coefficients, and the fourth is an even cruder upper bound. We will see in Section 4 that some of these weaker bounds correspond to existing results from the literature. When matrix M is symmetric, the bound can be slightly strengthened, leading to a different function φ′:

Corollary 3.17. Given a symmetric matrix M with r+ = rank+(M), r = rank(M) and r r +1, we have + ≥ rank+∗ (M) max faces(r+, ru 1, ru r)= φ′(r, r+) φ(r, r+). ≤ r ru r+ 1 − − ≤ ≤ ≤ − Proof. We have seen in Lemma 3.9 that for symmetric matrices ru = r+ implies rank+∗ (M)= r+. Therefore, in case r+ r +1, one can strengthen the result of Theorem 3.15 and only consider the range≥ r r r 1. ≤ u ≤ + − 44 3.5. APPLICATIONS: SLACK MATRICES AND LINEAR EDM’S

Improvements in the rank-three case It is possible to improve the above bound by finding better upper bounds for # vertices(T Pm) in Equation (3.5). For example, since two-dimensional polytopes (i.e., polygons)∩ have the same number of vertices (0-faces) and edges (1-faces), we have for rank(M)=3 that

# vertices(T P ) = #edges(T P ). ∩ m ∩ m Using the same argument as in Theorem 3.15, the number of edges of T P ∩ m is bounded above by the number of (ru r + 1)-faces of T (defined by r 2 equalities) leading to − −

Corollary 3.18. The restricted nonnegative rank of a rank-three nonnegative matrix M with r+ = rank+(M) can be bounded above with

rank+∗ (M) max min faces(r+, ru 1, ru 3+ i) φ(3, r+). (3.7) ≤ 3 ru r+ i=0,1 − − ≤ ≤ ≤ The minimum taken between 0 and 1 simply accounts for the two possible cases, i.e., the bound based on # vertices(T P ) with i =0 as in Theorem 3.15, ∩ m or based on #edges(T Pm) with i =1. A similar bound holds in the symmetric case. ∩

3.5 Applications: Slack and Linear Euclidean Dis- tance Matrices

So far, we have not provided explicit lower bounds for the nonnegative rank. As we have seen, inequalities (3.6) and (3.7) can be interpreted as implicit lower bounds on the nonnegative rank r+, but have the drawback of depending on the restricted nonnegative rank, which cannot be computed in polynomial time unless the rank of the matrix is smaller than 3 or = , see Theorems 3.3 and 3.6. P NP Nevertheless, we provide in this Section explicit lower bounds for the non- negative rank of slack matrices (Section 3.5.1) and linear Euclidean distance matrices (Section 3.5.2), cf. introduction of Section 3.4. These bounds are derived by showing that the restricted nonnegative rank of such matrices is maximum, i.e., it is equal to the number of columns of these matrices (cf. Lemma 3.7).

3.5.1 Slack Matrices Let us start with a simple observation: it is easy to construct a m n matrix of rank r< min(m,n) with maximum restricted nonnegative rank n×: 45 CHAPTER 3. NONNEGATIVE RANK

1. Take any (r 1)-dimensional polytope P with n r +1 vertices. − ≥ 2. Construct a NPP instance with S = vertices(P ). 3. Compute the corresponding matrix M in the equivalent RNR instance. Clearly, the unique solution for NPP is T = P = conv(S) and therefore the matrix M in the corresponding RNR instance must satisfy: rank+∗ (M) = # vertices(T )= n; see Example 3.1 for an illustration with the three-dimensional cube. Remark 3.4. The matrices constructed as described above also satisfy

rank(M) < rank+(M).

Otherwise rank+∗ (M) = rank+(M) = rank(M) < min(m,n) which is a con- tradiction. This is interesting because it is nontrivial to construct matrices with rank(M) < rank+(M) [114]. In fact, it is easy to check that generating randomly two nonnegative matrices U and V of dimensions m r and r n respectively, and constructing M = UV will generate a matrix×M of rank× r with probability one. In the context of compact formulations, the aim is to express a polytope Q with fewer constraints by using some additional variables, i.e., find a lifting of polynomial size. A possible way to do that is to compute a nonnegative factorization of the slack matrix SM of Q [152] (see Equation (3.1)). The next theorem states that the restricted nonnegative rank of any slack matrix f v S R × is maximum (f is the number of facets of Q, v its number of M ∈ + vertices), i.e., rank+∗ (SM )= v. This is directly related to the above observation: the slack matrix of a polytope Q corresponds to a NPP instance where Q is the outer polytope and its vertices are the points defining the inner polytope. Notice that the restricted nonnegative rank used as an upper bound for the nonnegative rank is useless in this case. Theorem 3.19. Let Q = x Rq F x h,Ex = g be a p-dimensional { ∈ | ≥ } polytope with v vertices, v > 1, and let SM (Q) be its slack matrix, then rank+∗ (SM (Q)) = v. Proof. In order to prove this result, we first construct a bijective transformation L between Q and a full-dimensional polytope P Rp. The vertices of P can then be easily constructed from the vertices of Q,⊆ which allows to show that P and Q share the same slack matrix. Finally, using the result of Theorem 3.2, we show that the slack matrix of P has maximum restricted nonnegative rank.

Since Q is a p-dimensional polytope, there exists a polytope P Rp and a bijective affine transformation ⊆ L : Q P : x L(x)= Ax + b → → 46 3.5. APPLICATIONS: SLACK MATRICES AND LINEAR EDM’S

with 1 1 L− : P Q : y L− (y)= A†y A†b, → → − 1 p q such that P = L(Q) and Q = L− (P ) (where A R × has full rank, A† q p p ∈ ∈ R × is its right inverse and b R ). By construction, ∈

p p 1 P = y R y = L(x), x Q = y R L− (y) Q , { ∈ p | 1 ∈ } 1{ ∈ | ∈ } = y R FL− (y) h,EL− (y)= g , { ∈ p | ≥ } = y R F A†y h + F A†b , { ∈ | ≥ } 1 p since the equalities EL− (y) = g must be satisfied for all y R since P is full-dimensional. ∈ q Noting C = F A† and d = h + F A†b, we have P = y R Cy d . Finally, we observe that { ∈ | ≥ }

1. Noting vi’s the v vertices of Q, we have that L(vi)’s define the v vertices of P . This can easily be checked since L is bijective ( y P, !x Q s.t. y = L(x) and vice versa). ∀ ∈ ∃ ∈

2. P can be taken as the outer polytope of a NPP instance, i.e., P is bounded and (C d) is full rank. P is bounded since Q is. C is full rank because P has at least one vertex (v > 1). If (C d) was not full rank, then z Rp such that d = Cz, implying that z P . Since P has at least two∃ vertices∈ (v > 1), y P with y = z, and∈ one can check that y + α(y z) P α 0. This∃ ∈ is a contradiction6 because P is bounded. − ∈ ∀ ≥ 3. The slack matrix of P is equal to the slack matrix of Q:

S (P ) = CL(V ) [d...d]= F A†L(V ) [h + F A†b...h + F A†b] M − − = F (A†L(V ) [A†b...A†b]) [h...h] 1 − − = FL− (L(V )) [h...h]= F V [h...h] − − = SM (Q),

where V = [v1 v2 ...vv] is the matrix whose columns are the vertices of Q, and L(V ) = [L(v1) L(v2) ...L(vv)] is the matrix whose columns are the vertices of P .

4. The NPP instance with P as the outer polytope and its v vertices L(vi)’s as the set of points S defining the inner polytope has a unique and optimal solution T = P = conv(S) with v vertices. The matrix M in the RNR instance corresponding to this NPP instance is given by the slack matrix SM (P ) of P implying that its restricted nonnegative rank is equal to v (cf. Theorem 3.2). 47 CHAPTER 3. NONNEGATIVE RANK

We can conclude that rank+∗ (SM (Q)) = v.

We can now derive a lower bound on the nonnegative rank of a slack matrix and on the size of an extended formulation, by combining Theorem 3.15 (cf. Equation (3.6)), Theorem 3.16, Theorem 3.19 and the result of Yannakakis [152].

f v Corollary 3.20. Let P be a polytope with v vertices and let SM R+× be its slack matrix of rank r (i.e., P has dimension r 1), then ∈ −

r+ r+ r+ v φ(r, r+)= φr(r+) max 2 , (3.8) ≤ ≤ r ru r+ r r +1 ≤ r /2 ≤ ≤ ≤  u −  ⌊ + ⌋ where r+ = rank+(SM ). Therefore, the minimum size s of any extended for- mulation of P follows

1 s = Θ(r + n) Θ(φ− (v)) Θ(log (v)), + ≥ r ≥ 2 1 where φ− ( ) is the inverse of the nondecreasing function φ ( )= φ(r, ). r · r · · The last bound 2r+ from Equation (3.8) is the one of Goemans [79, Theorem 1] (see introduction of Section 3.4), and therefore Corollary 3.20 provides us with an improved lower bound, even though it is still in Ω(log2(v)). It is actually not possible to provide a bound with a faster growth (i.e., without making additional hypothesis on the polytope P ) since Goemans showed that the size of any LP formulation of the permutahedron (with v = n! vertices) must be in (n log(n)), this implies that the nonnegative rank of its slack matrix is in Θ(nOlog(n)). This result also implies that the gap between the nonnegative rank and the restricted nonnegative rank can be arbitrarily large.

3.5.2 Linear Euclidean Distance Matrices Linear Euclidean distance matrices (linear EDM’s) are defined by

M(i, j) = (a a )2, 1 i, j n, for some a Rn. (3.9) i − j ≤ ≤ ∈

In this section we assume ai = aj i = j, so that these matrices have rank three. Linear EDM’s were used6 in [6] to6 show that the nonnegative rank of a matrix with fixed rank (rank 3 in this case) can be made as large as desired (while increasing the size of the matrix), implying that an upper bound for the nonnegative rank of a matrix based only on the rank cannot exist. We refer the reader to [102] and the references therein for detailed discus- sions about Euclidean distance matrices, and related applications. 48 3.5. APPLICATIONS: SLACK MATRICES AND LINEAR EDM’S

Restricted Nonnegative Rank of Linear Euclidean Distance Matrices We first show that the restricted nonnegative rank of linear EDM’s is maximum, i.e., it is equal to their dimension n. Definition 3.2. The columns of a matrix M have disjoint12 sparsity patterns if and only if supp(M ) * supp(M ), i = j, :i :j ∀ 6 th where supp(M:i)= k M(k,i)=0 is the sparsity pattern of the i column of M (i.e., the complement{ | of its support).} Theorem 3.21. Let M be a rank-three nonnegative square matrix of dimension n whose columns have disjoint sparsity patterns, then

rank+∗ (M)= n.

In particular, linear EDM’s have this property. Proof. Let P , S and T be the polygons defined in the two-dimensional NPP instance corresponding to the RNR instance of M (cf. Theorem 3.2). Aggarwal et al. [1] observe that if two points in S are on different edges of P , they define a polygon with the boundary of P (see each dark regions in Figure 3.4) which

Figure 3.4: Illustration of the restricted nonnegative rank of a linear EDM of dimension 5. The solution T must contain a point in each dark region. must contain a point of the solution T . Otherwise these two points could not be contained in T (see also Section 3.3.2). Therefore if each point of S is on a different edge of the boundary of P , any solution T to NPP must have at least S = n vertices since S defines n disjoint polygons with the boundary of P . | | 12This definition is a little abusive since disjoint should refer to sets with an empty inter- section. 49 CHAPTER 3. NONNEGATIVE RANK

Finally, two points x1 and x2 in S are on different edges of the boundary of the 2 polytope P = x R Cx+d 0 if and only if (Cx1 +d) and (Cx2 +d) have disjoint sparsity{ patterns∈ | or, equivalently,≥ } if and only if the two corresponding columns of M (which are precisely equal to Cx1 + d and Cx2 + d) in the RNR instance have disjoint sparsity patterns. Indeed, for two vertices a and b to be located on different edges, one needs one inequality that is active at a and inactive at b and another inequality that is active at b and inactive at a. This is equivalent to requiring the sparsity patterns of the corresponding columns of the matrix M to be disjoint. Remark 3.5. This result does not hold for higher rank matrices. For example, the matrix 0 1 4 9 16 25 2 0 14 9 16  8 1014 9  M = = UV,  13 4 10 1 4     17 9 41 0 1     25 16 9 4 1 0    with   00451 200001 10130 531000  40011  U = , V =  001100  ,  41001     000135   13100       010010   05401          has rank(M)=4 and rank+∗ (M) 5 since rank(U)=4. Therefore we cannot conclude that higher dimensional≤ Euclidean distance matrices have maximal restricted nonnegative rank.

Nonnegative Rank of Linear Euclidean Distance Matrices Since linear EDM’s are rank-three symmetric matrices, one can combine the results of Theorem 3.21 with Corollary 3.18 (cf. Equation (3.7)) and Corol- lary 3.17 in order to obtain lower bounds for the nonnegative rank of linear EDM’s. Corollary 3.22. For any linear Euclidean distance matrix M, we have

rank+∗ (M)= n max min faces(r+, ru 1, ru r + i) 3 r r+ 1 i=0,1 ≤ ≤ u≤ − − − max faces(r+, ru 1, ru r)= φ′(r, r+) 3 r r+ 1 ≤ ≤ u≤ − − − r+ ≤ r /2 ⌊ + ⌋ 2r+ . ≤ 50 3.5. APPLICATIONS: SLACK MATRICES AND LINEAR EDM’S

We observe that our results (first two inequalities above, from Theorem 3.15 and Corollary 3.18) strengthen the bounds from Equations (3.3) (Beasley and Laffey [6]) and (3.4) (Goemans [79]). Figure 3.5 displays the growth of the different bounds, and Table 3.1 compares the lower bounds on the nonnegative rank for small values of n. For example, for a linear EDM to be guaran-

Figure 3.5: Comparison of the different bounds for symmetric n-by-n matrices, with rank+∗ (M)= n.

dimension n 4 5 6 7 8 9 10 Equation (3.7) 455666 7 Equation (3.6) 455555 6 Beasly and Laffey (3.3) 444555 5 Goemans (3.4) 333344 4

Table 3.1: Comparison of the lower bounds for the nonnegative rank of linear EDM’s. teed to have nonnegative rank 10, the bounds requires respectively n = 50 (3.7), n = 150 (3.6), n = 252 (3.3) and n = 1024 (3.4). This is a signifi- cant improvement, even though all the bounds are still of the same order with r+ = Ω(log(n)). 51 CHAPTER 3. NONNEGATIVE RANK

Is it possible to further improve these bounds? Beasley and Laffey [6] con- jectured that the nonnegative rank of linear EDM’s is maximum, i.e., it is equal to their dimension. Lin and Chu [114, Theorem 3.1] first claimed to have proved that this equality always holds. However, Chu [30] has recently reported an error in the proof13, and showed instead that the nonnegative rank of linear EDM’s of dimension n is generically n (i.e., using a randomly gen- erated vector a to construct the linear EDM M, we have that rank(M)=3 and rank+(M) = n with probability one). Indeed, not all linear EDM’s have maximum nonnegative rank because of the following example. 6 6 Example 3.3. Taking M R × with ∈ + M(i, j) = (i j)2, 1 i, j 6, − ≤ ≤ gives rank+(M)=5. In fact, 0 1 4 9 16 25 1 0 14 9 16  4 1014 9  M =  9 4101 4     16 9 41 0 1     25 16 9 4 1 0     50401  000135 30110   531000 10041   = 001100 ,  01041     100001   03110       010010   05401         50100  000135 30010   531000 10001   = 014410 , (3.10)  01001     101101   03010       410014   05100          so that rank (M) 5, and rank (M) 5 is guaranteed by Equation (3.7), + ≤ + ≥ see Table 3.1 with rank+∗ (M)= n =6 (or by Lemma 3.9, see Example 3.2). Example 3.3 proves that linear EDM’s do not necessarily have a nonnegative rank equal to their dimension. In fact, we can even show that

13In their proof, they actually show that the restricted nonnegative rank is maximum (not the nonnegative rank), see Theorem 3.21. In fact, they only consider the case when the vertices of the solution T (corresponding to the columns of U) belong to the low-dimensional affine subspace defined by S (corresponding to the column of M) in the NPP instance. 52 3.5. APPLICATIONS: SLACK MATRICES AND LINEAR EDM’S

Theorem 3.23. Linear EDM’s of the following form

M (i, j) = (i j)2 1 i, j n, n − ≤ ≤ satisfy n rank (M ) 2+ , + n ≤ 2 l m where x is the smallest integer greater or equal to x. ⌈ ⌉ Proof. Let first assume that n is even and define

T n 1 0 0 n 1 n − 3 0 0 n − 3  −. .   . −.  . . I . . M  n/2   n/2   3 0   0 3       1 0   0 1  U =   , V =   ,  0 1   1 0       0 3   3 0       . .   . .   . . P   . . P M   n/2   n/2 n/2   0 n 3   n 3 0       0 n − 1   n − 1 0   −   −      where Im is the identity matrix of dimension m and Pm is the with Pm(i, j)= Im(i,m j +1) i, j; see Equation (3.10) for an example when n =6. One can check that− ∀ M A + P M M = UV = n/2 n/2 n/2 , n AT + P M M  n/2 n/2 n/2  with T n 1 1 n − 3 3  −.   .  A = . . .      3   n 3     −   1   n 1     −      n+1 If n is odd, we simply observe that rank+(Mn) rank+(Mn+1) 2+ 2 = 2+ n , since M is a submatrix of M [37]. ≤ ≤ ⌈ 2 ⌉ n n+1 Remark 3.6. In the construction of Theorem 3.23, we have rank(V )=4 and the factorization can then be interpreted as a nested polytopes problem (corre- sponding to M T = V T U T ) in which the outer polytope has (only) dimension 3. Therefore, there is still some room for improvement and rank+(Mn) is probably (much?) smaller. 53 CHAPTER 3. NONNEGATIVE RANK

This example also demonstrates that, in some cases, the structure of small size nonnegative factorizations (in this case, the one from Example 3.3) can be generalized to larger size nonnegative factorization problems. This might open new ways to computing large nonnegative factorizations.

In Example 3.3, the nonnegative rank is smaller than the restricted non- negative rank because there exists a higher dimensional polytope with only 5 vertices whose convex hull encloses the 6 vertices defined by the columns of M. Nested polytopes instance corresponding to the RNR instance with M given by Example 3.3 and the two above solutions are illustrated on Figures 3.3 and 3.6 respectively (note that they are transposed to each other, but correspond to different solutions of the NPP instance), see Section 3.4.1. Notice that the sec- ond solution (Figure 3.6) completely includes the outer polytope P ; therefore, the nonnegative rank of any nonnegative matrix with the same column space as the matrix M will be at most 5.

Figure 3.6: Illustration of the solution from Equation (3.10) as a nested poly- topes problem, based on a linear EDM with rank(M)=3 < rank(U)=4 < rank+(M)=5 < rank+∗ (M)=6= n.

Remark 3.7. The solutions of the above nonnegative rank problems have been computed with a standard nonnegative matrix factorization algorithm, namely the hierarchical alternating least squares algorithm (HALS), see Chapter 4. In general, an optimal solution (with M UV F close to machine accuracy) is found after 10 to 100 restarts of this|| algorithm− || on these small problems. 54 3.5. APPLICATIONS: SLACK MATRICES AND LINEAR EDM’S

3.5.3 The Nonnegative Rank of a Product Beasley and Laffey [6] proved that for A = BC with A, B and C 0 ≥ rank (A) rank(B) rank(C). + ≤ 2 2 In particular, rank+(A ) rank(A) . They also conjectured that for a non- negative n n matrix A, ≤ × rank (A2) rank(A), + ≤ which we prove to be false with the following counterexample (based on a ) 0 1 a 1+ a 1+ a a 1 0 0 0 1 a 1+ a 1+ a a 1  1 0 0 1 a 1+ a 1+ a a   a 1 0 0 1 a 1+ a 1+ a  A =   , (3.11)  1+ a a 1 0 0 1 a 1+ a     1+ a 1+ a a 1 0 0 1 a     a 1+ a 1+ a a 1 0 0 1     1 a 1+ a 1+ a a 1 0 0      2 where a =1+√2. In fact, one can check that rank(A)=3 and rank+(A )=4: 2 indeed, rank+∗ (A )=4 can be computed with the algorithm of Aggarwal et 2 al. [1] (see Figure 3.7 for an illustration) and, by Corollary 3.10, rank+(A )= 2 2 2 rank∗ (A ) since rank∗ (A ) rank(A )+1=4. + + ≤ Remark 3.8. The matrix A from Equation (3.11) is the slack matrix of a reg- ular octagon with sides of length √2. By Theorem 3.19, we have rank+∗ (A)=8. Notice also that A has rank 3 and its columns have disjoint sparsity patterns so that rank+∗ (A)=8 is implied by Theorem 3.21 as well. What is the nonnegative rank of A? Defining 11000000 01100000  00110000   00011000  R =   ,  00001100     00000110     00000011     10000001      we have that B = AR is symmetric, has rank 3 and only has zeros on its diago- nal. By Theorem 3.21, rank+∗ (B)=8. Using Table 3.1, we have rank+(B) 6. Moreover ≥ rank (AR) min(rank (A), rank (R)), + ≤ + + 55 CHAPTER 3. NONNEGATIVE RANK

Figure 3.7: Illustration of a NPP instance corresponding to A2 and an optimal solution T , cf. Equation (3.11). implying that 6 rank (B) rank (A). Finally, rank (A)=6 because ≤ + ≤ + + 10 0 10 a a 0 0 01 a +1 00010010  11 0 00 a  100001 a a    0 a 11 001  11000000 A =  −  ,  0 1 a 0 1 0   0 1 a a 10 0 0       1 0 a +1 0 a 0   00100001       0 0 a 1 1 0   00001100       00 1 a 10 1     −    (3.12) with rank(U)=4 and rank(V )=5 (U is the 8-by-6 matrices above, V the 6-by-8). Figure 3.8 displays the corresponding nested polytopes problem, see Section 3.4.1. It is interesting to observe that, from this nonnegative factorization, one can obtain an extended formulation (lifting) Q of the regular octagon P = x { ∈ R2 Cx d , defined as Q = (x, y) R2 R6 Cx + Uy = d, y 0 , with | ≤ } { ∈ × | ≥ } T 1 √2/2 0 √2/2 1 √2/2 0 √2/2 C = − − − , 0 √2/2 1 √2/2 0 √2/2 1 √2/2  − − −  √2 and d(i)=1+ 2 i, see Equation (3.2). Since the system of equalities Cx + Uy = d only defines∀ 4 linearly independent equalities (rank([C U]) = 4), the 56 3.6. IMPROVEMENTS USING THE MATRIX TRANSPOSE

Figure 3.8: Illustration of a nested polytopes instance corresponding to A and an optimal solution, cf. Equations (3.11) and (3.12).

Since the system of equalities Cx + Uy = d only defines 4 linearly independent equalities (rank([C U]) = 4), the description of Q can then be simplified and expressed with 4 variables and 6 inequality constraints. This extended formulation is actually a particular case of a construction proposed by Ben-Tal and Nemirovski [7] (see also Glineur [78], and [96]) to find an extended formulation of size O(k) for the regular 2^k-gon in two dimensions.
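The counterexample and the factorization (3.12) are easy to check numerically; the short NumPy script below is only an illustrative sanity check of the matrices given above (the script and its variable names are ours, not from the text).

import numpy as np

a = 1 + np.sqrt(2)
row = np.array([0, 1, a, 1 + a, 1 + a, a, 1, 0])
# Circulant matrix A of Equation (3.11): each row is the previous one shifted right.
A = np.array([np.roll(row, i) for i in range(8)])

U = np.array([
    [1, 0,     0,     1,     0, a    ],
    [a, 0,     0,     0,     1, a + 1],
    [1, 1,     0,     0,     0, a    ],
    [0, a - 1, 1,     0,     0, 1    ],
    [0, 1,     a,     0,     1, 0    ],
    [1, 0,     a + 1, 0,     a, 0    ],
    [0, 0,     a,     1,     1, 0    ],
    [0, 0,     1,     a - 1, 0, 1    ],
])
V = np.array([
    [0, 0, 0, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 0, 1, a, a],
    [1, 1, 0, 0, 0, 0, 0, 0],
    [0, 1, a, a, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1, 1, 0, 0],
])

assert np.allclose(A, U @ V)          # A = UV, a rank-6 nonnegative factorization
print(np.linalg.matrix_rank(A))       # 3
print(np.linalg.matrix_rank(U))       # 4
print(np.linalg.matrix_rank(V))       # 5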

3.6 Improvements using the Matrix Transpose

In the previous sections, we have only considered the restricted nonnegative rank of M ∈ R_+^{m×n}. However, since rank_+(M) = rank_+(M^T), the restricted nonnegative rank of M^T can also be used to generate lower and upper bounds for the nonnegative rank, exactly as for rank_+^*(M), with

$$\mathrm{rank}_+(M) \leq \mathrm{rank}_+^*(M^T) \leq m,$$
and
$$\mathrm{rank}_+^*(M^T) \leq \#\,\mathrm{vertices}\big(\mathrm{conv}(V^T) \cap \mathrm{col}(M^T)\big),$$
for any nonnegative factorization¹⁴ (V^T, U^T) of M^T = V^T U^T. In particular, since conv(V^T) has the same number of vertices as conv(U), we have

$$\max\big(\mathrm{rank}_+^*(M),\, \mathrm{rank}_+^*(M^T)\big) \leq \phi(r, r_+).$$
In this section, we show how to go further in this direction in order to generate improved lower bounds for the nonnegative rank, using the relationship

¹⁴ Without loss of generality, M, U and V are assumed to be column stochastic.

between the NPP instances corresponding to M and M^T. The main ingredient is the polar transformation.

3.6.1 Polar Transformation

The polar transformation of a set C ⊆ R^n is defined as
$$C^0 = \{\, y \in \mathbb{R}^n \mid y^T x \leq 1 \ \forall x \in C \,\}.$$
Based on this definition, the polar transformation of a bounded polytope P containing the origin in its interior with

$$P = \{\, x \in \mathbb{R}^n \mid Ax \leq \mathbf{1}_m \,\}, \qquad (3.13)$$
where A ∈ R^{m×n}, is the set
$$P^0 = \mathrm{conv}(A^T) = \mathrm{conv}\big(\{\, A(i,:)^T \in \mathbb{R}^n,\ 1 \leq i \leq m \,\}\big),$$
i.e., it is the convex hull of the m rows of A, see Figure 3.9 for an illustration.

Figure 3.9: Example of the polar transformation: (left) polytopes P (continuous) and S ⊆ P (dashed), and (right) polytopes P^0 (continuous) and S^0 ⊇ P^0 (dashed).

We now recall two well-known results, see, e.g., [157].

Lemma 3.24. Let P be a bounded polytope containing the origin in its interior. Then P^0 is a polytope containing the origin in its interior. Moreover, P has m facets if and only if P^0 has m vertices, and (P^0)^0 = P.

Lemma 3.25. Let P and S be two bounded polytopes containing the origin in their interior with S ⊆ P. Then P^0 ⊆ S^0.

3.6.2 Relationship between NPP Instances of M and M^T

We now analyze the relationship between the NPP instances corresponding to M and M^T. Let us define the following problem:

(NPPf) Given a bounded polyhedron

$$P = \{\, x \in \mathbb{R}^{r-1} \mid 0 \leq f(x) = Cx + d \,\},$$
with [C d] ∈ R^{m×r} of rank r, and a set S of n points in P not contained in any hyperplane (i.e., conv(S) is full-dimensional), find the minimum number k of facets defining a polytope T contained in P and containing S, i.e., S ⊆ T ⊆ P.

This is exactly the same problem as NPP where the number of facets of T is minimized instead of its number of vertices, as in [42,44], see Section 3.3.2. Let us introduce the following notation:

⋄ NPP(M) is the NPP instance corresponding to the RNR instance of M, see Theorem 3.2.

⋄ NPPf(M) is the NPPf instance where the polytope P and the set of points S are the ones from NPP(M).

We are going to show that computing the restricted nonnegative rank of M^T is polynomially equivalent to NPPf(M), i.e., that NPP(M^T) is polynomially equivalent to NPPf(M). In other words, minimizing the number of facets instead of the number of vertices in the NPP instance corresponding to M amounts to computing the restricted nonnegative rank of M^T.

Theorem 3.26. NPP(M^T) is polynomially equivalent to NPPf(M).

Proof. For rank(M) ≤ 1, the problem is trivial: we have a zero-dimensional problem, and the solution of NPP(M^T) (resp. NPPf(M)) has only one vertex (resp. facet). Let us then assume r = rank(M) ≥ 2, and denote P (resp. S) the outer (resp. inner) simplex of NPPf(M) (the same as in NPP(M)). The interior of P is non-empty since P contains the full-dimensional set S with n ≥ r points in R^{r−1}. Without loss of generality, we may then assume that 0 ∈ int(P) (otherwise perform a translation, e.g., using the center of gravity of the set S, and generate an equivalent NPP instance, cf. Remark 3.1), and that P is bounded (cf. Theorem 3.2). Therefore, P can be equivalently written as

$$P = \{\, x \in \mathbb{R}^{r-1} \mid Cx \leq \mathbf{1}_m \,\},$$
where C ∈ R^{m×(r−1)} is full rank. By construction, M(:,i) = 1_m − C v_i, where the v_i are the n points in S, hence

$$S = \{\, v_i = C^{\dagger}(\mathbf{1}_m - M(:,i)) \in \mathbb{R}^{r-1} \mid 1 \leq i \leq n \,\},$$

where C^† is the left inverse of C. Notice that this implies that M(:,i) = 1_m − C v_i = 1_m − C C^†(1_m − M(:,i)) for 1 ≤ i ≤ n, or equivalently
$$M(i,:)^T = \mathbf{1}_n - \big( C(i,:)\, C^{\dagger}(\mathbf{1}_{m\times n} - M) \big)^T, \quad 1 \leq i \leq m.$$
Let T be a solution of NPPf(M). By Lemma 3.25, S ⊆ T ⊆ P if and only if P^0 ⊆ T^0 ⊆ (conv(S))^0. By Lemma 3.24, minimizing the number of facets of T in NPPf(M) is then equivalent to minimizing the number of vertices of T^0 in an NPP instance with (conv(S))^0 as the outer simplex and P^0 as the inner simplex. By definition, the vertices of P^0 are given by {C(i,:)^T, i = 1, ..., m}, and
$$(\mathrm{conv}(S))^0 = \{\, x \in \mathbb{R}^{r-1} \mid (C^{\dagger}(\mathbf{1}_{m\times n} - M))^T x \leq \mathbf{1}_n \,\}.$$
Therefore the matrix N corresponding to this NPP instance is given by

$$N(:,i) = \mathbf{1}_n - (C^{\dagger}(\mathbf{1}_{m\times n} - M))^T C(i,:)^T
        = \mathbf{1}_n - \big( C(i,:)\, C^{\dagger}(\mathbf{1}_{m\times n} - M) \big)^T = M(i,:)^T,$$
implying N = M^T, which completes the proof. □

Corollary 3.27. Using the notations of Section 3.4.1, we have

$$\mathrm{rank}_+^*(M^T) \leq \#\,\mathrm{facets}(T \cap P_m), \qquad (3.14)$$
where # facets(Q) denotes the number of facets of Q.

Proof. By Theorem 3.14, T ∩ P_m is a feasible solution for NPP(M). Clearly, it also defines a feasible solution for NPPf(M). Finally, NPPf(M) is equivalent to NPP(M^T) (Theorem 3.26), which is equivalent to solving RNR for M^T (Theorem 3.2). □

3.6.3 Improved Lower Bound for the Nonnegative Rank

Corollary 3.27 together with Theorem 3.2 implies that, for any nonnegative factorization (U, V) of M, we have

$$1 \leq \min\left( \frac{\#\,\mathrm{vertices}(T \cap P_m)}{\mathrm{rank}_+^*(M)},\ \frac{\#\,\mathrm{facets}(T \cap P_m)}{\mathrm{rank}_+^*(M^T)} \right),$$
where T represents the convex hull of the columns of U and P_m the column space of M, see Section 3.4.1.

Corollary 3.28. Let M be a nonnegative matrix with r = rank(M) and r_+ = rank_+(M). Then

$$1 \leq \max_{r \leq r_u \leq r_+} \min\left( \frac{\mathrm{faces}(r_+, r_u-1, r_u-r)}{\mathrm{rank}_+^*(M)},\ \frac{\mathrm{faces}(r_+, r_u-1, r_u-2)}{\mathrm{rank}_+^*(M^T)} \right).$$

Proof. For any rank-r_+ nonnegative factorization (U, V) of M, T ∩ P_m is a feasible solution for both NPP(M) and NPPf(M). Its number of vertices (resp. facets) is therefore at least rank_+^*(M) (resp. rank_+^*(M^T)). Since T is a polytope with r_+ vertices in dimension (r_u − 1), the number of vertices of T ∩ P_m can be bounded by faces(r_+, r_u − 1, r_u − r) (see Theorem 3.15), and its number of facets by faces(r_+, r_u − 1, r_u − 2), because any facet of T intersected with P_m can potentially generate a facet of T ∩ P_m, and the facets of a polytope in dimension (r_u − 1) are (r_u − 2)-faces. □

Corollary 3.29. If M is symmetric or rank-three, then
$$\mathrm{rank}_+^*(M) = \mathrm{rank}_+^*(M^T) \leq \max_{r \leq r_u \leq r_+} \min\Big( \mathrm{faces}(r_+, r_u-1, r_u-r),\ \mathrm{faces}(r_+, r_u-1, r_u-2) \Big).$$

Proof. We have that rank_+^*(M) = rank_+^*(M^T) for rank-three (cf. Section 3.3.3) and symmetric matrices (trivial), while Corollary 3.28 gives the result. □

This is a generalization of Equation (3.7) from Corollary 3.18, which only considered rank-three matrices (see Figure 3.5 and Table 3.1 for a comparison of the different bounds).

Concluding Remarks

In this chapter, we have first reviewed the known complexity results for the nonnegative rank computation, and linked them with NMF. We have then introduced a new quantity called the restricted nonnegative rank, whose com- putation amounts to solving a problem in computational geometry consisting of finding a polytope nested between two given polytopes. This allowed us to fully characterize its computational complexity (see Table 3.2). This geometric interpretation and the relationship between the nonnegative rank and the re- stricted nonnegative rank also let us derive new improved lower bounds for the nonnegative rank, in particular for slack matrices and linear Euclidean distance matrices. This also allowed us to provide counterexamples to two conjectures concerning the nonnegative rank.

We conclude the chapter with the following conjecture:

Conjecture 3.1. Computing the nonnegative rank and the corresponding nonnegative factorization of a nonnegative matrix is NP-hard when the rank of the matrix is fixed and greater than or equal to 4 (or even possibly 3).

In fact, we have shown that computing a nonnegative factorization amounts to solving a nested polytopes problem in which the outer polytope might live

in a higher dimensional space. Moreover, this space is not known a priori (we only know that it contains the columns of the matrix to be factorized, cf. Section 3.4.1). Therefore, it seems plausible that this problem is at least as difficult as the restricted nonnegative rank computation problem, in which the outer polytope lives in the same low-dimensional space and is known. Moreover, in the rank-three case, even though the inner polytope has dimension two, the outer polytope might have any dimension (up to the dimensions of the matrix; see, e.g., Figures 3.3 and 3.6); therefore, it seems that the nonnegative rank computation might also be NP-hard when the rank of the matrix is three. Notice that, when rank_+^*(M) ≤ 5, Equation (3.6) implies rank_+(M) = rank_+^*(M), so that the nonnegative rank can be computed in polynomial time in this particular case. Table 3.2 recapitulates the complexity results for computing the restricted nonnegative rank and the nonnegative rank of a nonnegative matrix M.

r = rank(M)     r_+^* = rank_+^*(M)          r_+ = rank_+(M)
r not fixed     NP-hard                      NP-hard [148]
r ≥ 4 fixed     NP-hard (Theorem 3.6)        NP-hard?
r = 3           polynomial (Theorem 3.3)     polynomial if r_+^* ≤ 5, otherwise NP-hard?
r ≤ 2           trivial (= r)                trivial (= r) [143]

Table 3.2: Complexity of restricted nonnegative rank and nonnegative rank computations.

Chapter 4

Algorithms for Nonnegative Matrix Factorization

NMF is typically formulated as a nonlinear optimization problem with an ob- jective function measuring the quality of the low-rank approximation. In this thesis, we consider the sum of squared errors, i.e., the (squared) Frobenius norm of the difference between the nonnegative matrix M and the approximation UV :

$$\min_{U \in \mathbb{R}^{m\times r},\ V \in \mathbb{R}^{r\times n}} \|M - UV\|_F^2 \quad \text{such that} \quad U \geq 0,\ V \geq 0. \qquad \text{(NMF)}$$

One of the main challenges of NMF is to design fast and efficient algorithms generating nonnegative factors (U, V) that minimize the objective function. In fact, on the one hand, practitioners need to rapidly compute good factorizations for large-scale problems (e.g., in text mining or image processing); on the other hand, as explained in Chapter 3, NMF is an NP-hard problem and we cannot expect to find a globally optimal solution in a reasonable computational time. Hence practical algorithms aim instead at finding locally optimal solutions. More precisely, only convergence to stationary points of (NMF) is in general guaranteed. Recall that stationary points are points satisfying the necessary first-order optimality conditions, also called Karush-Kuhn-Tucker optimality conditions, see Section 2.1. For NMF, they reduce to

$$\begin{aligned}
U \geq 0, \quad & \nabla_U \|M - UV\|_F^2 \geq 0, \quad & U \circ \nabla_U \|M - UV\|_F^2 = 0, \\
V \geq 0, \quad & \nabla_V \|M - UV\|_F^2 \geq 0, \quad & V \circ \nabla_V \|M - UV\|_F^2 = 0,
\end{aligned} \qquad \text{(KKT-NMF)}$$
where

$$\nabla_U \|M - UV\|_F^2 = -2(M - UV)V^T, \qquad \nabla_V \|M - UV\|_F^2 = -2U^T(M - UV).$$
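As a concrete illustration of these conditions, the following NumPy sketch (ours; the function names are not from the text) evaluates the two gradients above and one common scalar measure of how far a nonnegative pair (U, V) is from satisfying (KKT-NMF).

import numpy as np

def nmf_gradients(M, U, V):
    # Gradients of f(U, V) = ||M - UV||_F^2, as given above.
    R = M - U @ V
    return -2.0 * R @ V.T, -2.0 * U.T @ R

def kkt_violation(M, U, V):
    # The entrywise minimum min(X, grad) vanishes exactly when
    # X >= 0, grad >= 0 and X o grad = 0, i.e., when (KKT-NMF) holds.
    gU, gV = nmf_gradients(M, U, V)
    return np.linalg.norm(np.minimum(U, gU)) + np.linalg.norm(np.minimum(V, gV))

rng = np.random.default_rng(0)
M = rng.random((30, 20))
U, V = rng.random((30, 5)), rng.random((5, 20))
print(kkt_violation(M, U, V))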

Most NMF algorithms are iterative and use the fact that NMF reduces to a convex nonnegative least squares problem (NNLS) when U or V is fixed (see Section 4.1.2). Actually, it seems that nearly all algorithms proposed for NMF adhere to the following general framework

(0) Select initial matrices (U, V ). Then repeat the following two steps:

(a) Fix V: find a new U ≥ 0 such that ||M − UV||_F^2 is reduced.

(b) Fix U: find a new V ≥ 0 such that ||M − UV||_F^2 is reduced.

More precisely, at each iteration, one of the two factors is fixed and the other is updated in such a way that the objective function is reduced, which amounts to a two-block coordinate descent method. Notice that the roles of the matrices U and V are perfectly symmetric: if one transposes the input matrix M, the new matrix M^T has to be approximated by a product V^T U^T, so that any formula designed to update the first factor in this product directly translates into an update for the second factor in the original problem. Formally, if the update performed in step (a) is described by U ← update(M, U, V), then an algorithm preserving symmetry will update the factor in step (b) according to V ← update(M^T, V^T, U^T)^T, as illustrated in the code sketch below.

This update can be carried out in many different ways: the most natural possibility is to compute an optimal solution for the NNLS subproblem, which leads to a class of algorithms called alternating nonnegative least squares (ANLS), see Section 4.1.2. However, because this computation is relatively costly, and since an optimal solution for the NNLS problem corresponding to one factor is not strictly necessary before the other factor is updated, several algorithms compute only an approximate solution of the NNLS subproblem, sometimes very roughly, but with a cheaper computational cost, leading to an inexact two-block coordinate descent scheme. Any nonlinear optimization method can be used for this purpose, such as the multiplicative updates (cf. Section 4.1.1), projected gradient methods [113], Newton-like methods [34,51], and block-coordinate descent with various numbers of blocks, including hierarchical alternating least squares with r blocks (cf. Section 4.1.3); see also [10,32,52,89] and the references therein. Notice that it can also be profitable to combine different algorithms and come up with a hybrid approach. For example, the ALS algorithm updates alternatively the matrices U and V by solving the corresponding unconstrained least squares problems and then projecting the solutions onto the nonnegative orthant. Even though ALS is not guaranteed to converge and typically fails to obtain good solutions for NMF problems (especially if M is a dense matrix), it is relatively cheap and leads to a fast initial decrease of the error, so that it can be successfully used as a preprocessing step for other techniques, see, e.g., [32, 68] and the references therein.
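The following NumPy sketch makes the symmetry argument concrete (our own illustration; the particular update used here, a single multiplicative step as in Section 4.1.1, is only a placeholder, and the small eps guarding the denominator is ours).

import numpy as np

def update_U(M, U, V, eps=1e-16):
    # One (placeholder) update of the first factor: a single multiplicative step.
    return U * (M @ V.T) / np.maximum(U @ (V @ V.T), eps)

def two_block_step(M, U, V):
    # Two-block coordinate descent step exploiting the symmetry between U and V:
    # the V-update is obtained by applying the U-update to the transposed problem.
    U = update_U(M, U, V)
    V = update_U(M.T, V.T, U.T).T
    return U, V

rng = np.random.default_rng(1)
M = rng.random((50, 40))
U, V = rng.random((50, 10)), rng.random((10, 40))
for _ in range(100):
    U, V = two_block_step(M, U, V)
print(np.linalg.norm(M - U @ V))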


In this chapter, we present in Section 4.1 three well-known and widely used algorithms for NMF, namely the alternating nonnegative least squares (ANLS), the multiplicative updates (MU), and the hierarchical alternating least squares (HALS). We then propose in Section 4.2 a simple modification of MU and HALS significantly speeding up their convergence, based on the analysis of their computational cost. This approach can potentially be used for any first-order NMF algorithm, and is applied successfully to the projected gradient method of Lin [113]. An explanation for the good performance of HALS is also given. In Section 4.3, NMF algorithms are embedded into the framework of multilevel methods in order to accelerate their convergence in some specific situations.

4.1 Three Existing Algorithms: MU, ANLS and HALS

4.1.1 Multiplicative Updates (MU)

In their seminal paper [105], Lee and Seung propose multiplicative update rules that aim at minimizing the Frobenius norm between a nonnegative matrix M and its approximation UV:

$$U \leftarrow U \circ \frac{[MV^T]}{[UVV^T]}, \qquad V \leftarrow V \circ \frac{[U^T M]}{[U^T UV]}. \qquad \text{(MU)}$$

Theorem 4.1 ([47,106]). For M, U, V ≥ 0, the Frobenius norm ||M − UV||_F is nonincreasing under each of the multiplicative update rules (MU).

This technique was actually originally proposed by Daube-Witherspoon and Muehllehner [47] to solve nonnegative least squares problems. The popularity of this algorithm came along with the popularity of NMF. MU can be interpreted as a rescaled gradient descent scheme [106] (i.e., a quasi-Newton or variable metric method with an approximate diagonal Hessian)¹. In fact, one can observe that MU is equivalent to

$$U \leftarrow U - S_U \circ \nabla_U \|M - UV\|_F^2, \quad \text{and} \quad V \leftarrow V - S_V \circ \nabla_V \|M - UV\|_F^2,$$
where $S_U = \frac{[U]}{[2UVV^T]}$ and $S_V = \frac{[V]}{[2U^TUV]}$. The algorithm based on the alternated application of these rules (see Algorithm 1) is not guaranteed to converge to a first-order stationary point (see, e.g., [89, Section 3.1] and references therein), but a slight modification proposed in [112] achieves this property (roughly speaking, MU is recast as a rescaled gradient descent method as above and the step length is modified accordingly).

¹ In standard notation, we have that min_x f(x) is solved iteratively using x ← x − D_x ∇f(x), where D_x is a diagonal matrix approximating the inverse of the Hessian matrix ∇²f(x).

Algorithm 1 Multiplicative Updates
Require: Data matrix M ∈ R_+^{m×n} and initial iterates (U, V) ∈ R_+^{m×r} × R_+^{r×n}.
1: while stopping criterion not satisfied do
2:     U ← U ∘ [MV^T] / [U(VV^T)];
3:     V ← V ∘ [U^T M] / [(U^T U)V];
4: end while

We propose another simple approach to overcome this problem by replacing the above updates by the following:

Theorem 4.2. For any constant ε > 0, M ≥ 0 and any² (U, V) ≥ ε, ||M − UV||_F is nonincreasing under
$$U \leftarrow \max\left(\epsilon,\ U \circ \frac{[MV^T]}{[UVV^T]}\right), \qquad V \leftarrow \max\left(\epsilon,\ V \circ \frac{[U^T M]}{[U^T UV]}\right), \qquad (4.1)$$
where the max is taken component-wise. Moreover, every limit point of the corresponding (alternated) algorithm is a stationary point of the following optimization problem
$$\min_{U \geq \epsilon,\ V \geq \epsilon} \|M - UV\|_F^2. \qquad (4.2)$$

Proof. See Section 5.4, where the more general Theorem 5.11 is proved.
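For illustration, a minimal NumPy implementation of the modified updates (4.1) could look as follows (this sketch, its function name and the fixed iteration budget are ours, not from the text; keeping the iterates above ε also keeps all denominators strictly positive).

import numpy as np

def mu_epsilon(M, U, V, eps=1e-16, max_iter=500):
    # Modified multiplicative updates (4.1): all entries of U and V stay >= eps.
    U, V = np.maximum(U, eps), np.maximum(V, eps)
    for _ in range(max_iter):
        U = np.maximum(eps, U * (M @ V.T) / (U @ (V @ V.T)))
        V = np.maximum(eps, V * (U.T @ M) / ((U.T @ U) @ V))
    return U, V

rng = np.random.default_rng(0)
M = rng.random((100, 80))
U, V = mu_epsilon(M, rng.random((100, 10)), rng.random((10, 80)))
print(np.linalg.norm(M - U @ V) / np.linalg.norm(M))  # relative error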

We might wonder how this modification of the feasible domain influences the objective function value, i.e., what is the increase in the objective function value of (4.2) with respect to (NMF)? Let (U, V) ≥ 0 be any feasible point of (NMF) and define (U_ε, V_ε) ≥ ε as follows

$$U_\epsilon(i,j) = \begin{cases} U_{ij} & \text{if } U_{ij} \geq \epsilon, \\ \epsilon & \text{if } U_{ij} < \epsilon, \end{cases} \qquad 1 \leq i \leq m,\ 1 \leq j \leq r,$$
and similarly for V_ε. We then have

$$\|M - U_\epsilon V_\epsilon\|_F = \|M - U_\epsilon V_\epsilon - UV + UV\|_F \leq \|M - UV\|_F + \|U_\epsilon V_\epsilon - UV\|_F.$$

² (U, V) ≥ ε means that U and V are component-wise larger than ε.

By construction, 0 ≤ UV ≤ U_ε V_ε ≤ (U + ε)(V + ε), so that
$$\begin{aligned}
\|U_\epsilon V_\epsilon - UV\|_F &\leq \|(U+\epsilon)(V+\epsilon) - UV\|_F \\
&\leq \|\epsilon\, U \mathbf{1}_{r\times n} + \epsilon\, \mathbf{1}_{m\times r} V + \epsilon^2\, \mathbf{1}_{m\times r}\mathbf{1}_{r\times n}\|_F \\
&\leq \epsilon \|U\|_F \|\mathbf{1}_{r\times n}\|_F + \epsilon \|\mathbf{1}_{m\times r}\|_F \|V\|_F + \epsilon^2 \|\mathbf{1}_{m\times r}\|_F \|\mathbf{1}_{r\times n}\|_F \\
&= \epsilon\sqrt{rn}\,\|U\|_F + \epsilon\sqrt{rm}\,\|V\|_F + \epsilon^2 r \sqrt{mn}. \qquad (4.3)
\end{aligned}$$
Hence ||M − U_ε V_ε||_F ≤ ||M − UV||_F + O(ε).

If (U, V) ≥ 0 is a stationary point of NMF, then ||UV||_F ≤ ||M||_F, see Theorem 4.4. Let us assume without loss of generality ||U_{:k}||_2 = ||V_{k:}||_2. By nonnegativity,

$$\|U_{:k}\|_2^2 = \|U_{:k}\|_2 \|V_{k:}\|_2 = \|U_{:k} V_{k:}\|_F \leq \|UV\|_F \leq \|M\|_F,$$
implying ||U||_F ≤ √r max_p ||U_{:p}||_2 ≤ √r ||M||_F^{1/2}, which also holds for V by symmetry. Plugging this into Equation (4.3), we obtain

$$\|M - U_\epsilon V_\epsilon\|_F \leq \|M - UV\|_F + \epsilon\, r(\sqrt{n} + \sqrt{m})\, \|M\|_F^{1/2} + \epsilon^2 r\sqrt{mn}.$$
Therefore, in terms of the relative error ||M − UV||_F / ||M||_F, to have an increase of the optimal relative objective function value of (4.2) compared to the one of (NMF) of at most δ, one has to choose ε such that
$$\frac{\epsilon\, r(\sqrt{n} + \sqrt{m})}{\|M\|_F^{1/2}} + \frac{\epsilon^2 r\sqrt{mn}}{\|M\|_F} \leq \delta.$$
For example, for the cbcl database with M ∈ [0,1]^{m×n}, m = 361, n = 2429 and r = 49, we need ε ≤ 7·10⁻⁶ to obtain an increase of the relative error of at most 0.1% (i.e., δ = 10⁻³); and taking ε = 10⁻¹⁶ gives an increase in the relative error of at most 1.5·10⁻¹²% (i.e., δ = 1.5·10⁻¹⁴), which is negligible.
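The bound above is straightforward to evaluate numerically; the following sketch (ours) returns the largest ε allowed for given m, n, r, ||M||_F and δ by solving the corresponding quadratic inequality (the value used for ||M||_F in the example call is a placeholder, it must be computed from the actual data).

import numpy as np

def max_epsilon(normM_F, m, n, r, delta):
    # Largest eps with  eps*r*(sqrt(n)+sqrt(m))/||M||_F^{1/2} + eps^2*r*sqrt(m*n)/||M||_F <= delta,
    # i.e., the positive root of the quadratic  a*eps^2 + b*eps - delta = 0.
    a = r * np.sqrt(m * n) / normM_F
    b = r * (np.sqrt(n) + np.sqrt(m)) / np.sqrt(normM_F)
    return (-b + np.sqrt(b**2 + 4 * a * delta)) / (2 * a)

# placeholder value for ||M||_F; the dimensions are those of the cbcl example above
print(max_epsilon(normM_F=100.0, m=361, n=2429, r=49, delta=1e-3))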

We might also wonder how a stationary point of (4.2) (which can be obtained as a limit point of the updates from Equation (4.1), see Theorem 4.2) can be used to approximate a stationary point of (NMF). Let us then denote by (U_ε, V_ε) a stationary point of (4.2) and let us focus on U_ε (the same reasoning applies to V_ε by symmetry). Noting f(U_ε, V_ε) = ||M − U_ε V_ε||_F^2 and ∇_U f(U_ε, V_ε) = −2(M − U_ε V_ε)V_ε^T, we then have, for all i, j, either

⋄ U_ε(i,j) = ε and (∇_U f(U_ε, V_ε))_{ij} ≥ 0, or

⋄ U_ε(i,j) > ε and (∇_U f(U_ε, V_ε))_{ij} = 0.

Defining
$$U(i,j) = \begin{cases} 0 & \text{if } U_\epsilon(i,j) = \epsilon, \\ U_\epsilon(i,j) & \text{if } U_\epsilon(i,j) > \epsilon, \end{cases} \qquad 1 \leq i \leq m,\ 1 \leq j \leq r,$$

and V similarly using V_ε, we would like to see how close (U, V) is to stationarity. A natural question is then: how do ∇_U f(U_ε, V_ε) and ∇_U f(U, V) differ? Using the same trick as for the objective function value, we obtain

$$\begin{aligned}
e &= \max_{ij} \tfrac{1}{2} \big| \nabla_U f(U_\epsilon, V_\epsilon) - \nabla_U f(U, V) \big|_{ij} \\
  &= \max_{ij} \big| \big((M - U_\epsilon V_\epsilon)V_\epsilon^T\big)_{ij} - \big((M - UV)V^T\big)_{ij} \big| \\
  &\leq \max_{ij} \big| M V_\epsilon^T - M V^T \big|_{ij} + \max_{ij} \big| U_\epsilon V_\epsilon V_\epsilon^T - U V V^T \big|_{ij} \\
  &\leq \max_{ij} \big| M (V + \epsilon \mathbf{1}_{r\times n})^T - M V^T \big|_{ij} + \max_{ij} \big| (U + \epsilon \mathbf{1}_{m\times r})(V + \epsilon \mathbf{1}_{r\times n})(V + \epsilon \mathbf{1}_{r\times n})^T - U V V^T \big|_{ij} \\
  &= \mathcal{O}(\epsilon).
\end{aligned}$$
Hence, for all i, j, we have either

⋄ U_{ij} = 0 and (∇_U f(U, V))_{ij} ≥ −O(ε), or

⋄ U_{ij} > 0 and |∇_U f(U, V)|_{ij} ≤ O(ε).

Therefore, even though (U, V) is not a stationary point of (NMF), it is close to satisfying the required conditions (KKT-NMF) when ε is sufficiently small.

Stopping Criterion

There are many different approaches to the stopping criterion of NMF algorithms, using different indicators of the convergence of the iterates, including the evolution of the objective function, the norm of the projected gradient [113] (directly related to the stationarity conditions), and the norm of the difference between two consecutive iterates (see Section 4.2). These criteria are typically combined with either a maximum number of iterations or a time limit to ensure termination.

4.1.2 Alternating Nonnegative Least Squares (ANLS)

As mentioned in the introduction, although NMF is a nonconvex and difficult problem, it is convex separately in each of the two factors U and V, i.e., finding the optimal factor U corresponding to a fixed factor V reduces to a convex optimization problem, and vice versa. More precisely, this convex problem corresponds to a nonnegative least squares (NNLS) problem, i.e., a least squares problem with nonnegativity constraints. Notice that the NNLS subproblem min_{U≥0} ||M − UV||_F^2 (and symmetrically for V) can be decomposed into m

independent NNLS subproblems in r variables corresponding to each row of U, since
$$\|M - UV\|_F^2 = \sum_{i=1}^m \|M_{i:} - U_{i:}V\|_F^2,$$
which makes this problem easier to solve. The so-called alternating nonnegative least squares (ANLS) algorithm for NMF minimizes the cost function alternatively over the factors U and V, so that a stationary point of NMF is obtained in the limit, see Theorem 2.2 in Section 2.1.3.

Algorithm 2 Alternating Nonnegative Least Squares
Require: Data matrix M ∈ R_+^{m×n} and initial iterate V ∈ R_+^{r×n}.
1: while stopping criterion not satisfied do
2:     U ← argmin_{U ≥ 0} ||M − UV||_F;
3:     V ← argmin_{V ≥ 0} ||M − UV||_F.
4: end while

A frequent strategy to solve the NNLS subproblems is to use active set methods [104] (see Appendix A), for which efficient implementations³ are described in [98,99,145], and shown to typically outperform other tested variants of NNLS methods on synthetic, image and text datasets. We refer the reader to [27] for a survey of NNLS methods.
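For illustration only, the ANLS scheme can be prototyped with an off-the-shelf NNLS solver by exploiting the row-wise decomposition above; the sketch below (ours) uses scipy.optimize.nnls as a simple stand-in for the active-set implementations cited in the text, and is much less efficient than those.

import numpy as np
from scipy.optimize import nnls

def nnls_rows(M, V):
    # Solve min_{U >= 0} ||M - U V||_F row by row:
    # row i of U solves an NNLS problem in r variables with matrix V^T.
    U = np.zeros((M.shape[0], V.shape[0]))
    for i in range(M.shape[0]):
        U[i, :], _ = nnls(V.T, M[i, :])
    return U

def anls(M, V0, n_iter=20):
    # Alternating nonnegative least squares (Algorithm 2), using the
    # symmetry between the two factors for the V-update.
    V = V0.copy()
    for _ in range(n_iter):
        U = nnls_rows(M, V)
        V = nnls_rows(M.T, U.T).T
    return U, V

rng = np.random.default_rng(0)
M = rng.random((60, 40))
U, V = anls(M, rng.random((5, 40)))
print(np.linalg.norm(M - U @ V) / np.linalg.norm(M))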

Remark 4.1. There actually exist other partitions of the variables that preserve convexity of the alternating minimization subproblems: since the cost function can be rewritten as ||M − Σ_{i=1}^r U_{:i}V_{i:}||_F, it is clearly convex as long as the variables do not simultaneously include an element of a column of U and an element of the corresponding row of V (i.e., U_{ki} and V_{il} for the same index i). Therefore, given a subset of indices K ⊆ R = {1, 2, ..., r}, NMF is clearly convex for both the following subset of variables

$$P_K = \big\{\, U_{:i} \mid i \in K \,\big\} \cup \big\{\, V_{j:} \mid j \in R \setminus K \,\big\}$$
and its complement

$$Q_K = \big\{\, U_{:i} \mid i \in R \setminus K \,\big\} \cup \big\{\, V_{j:} \mid j \in K \,\big\}.$$

Convexity is lost as soon as one column of U (U_{:i}) and the corresponding row of V (V_{i:}) are optimized simultaneously, so that the corresponding minimization subproblem can no longer be solved up to global optimality. In fact, such subproblems will be studied in Chapter 5 and shown to be NP-hard.

³ Available at http://www.cc.gatech.edu/~hpark/.

4.1.3 Hierarchical Alternating Least Squares (HALS)

In ANLS, the variables are partitioned at each iteration such that each subproblem is convex. However, the resolution of these convex NNLS subproblems is nontrivial and relatively expensive (see Appendix A). If we optimize instead one single variable at a time, we get a simple univariate quadratic problem which admits a closed-form solution

$$U_{ik}^* = \operatorname*{argmin}_{U_{ik} \geq 0} \|M - UV\|_F^2 = \max\left(0,\ \frac{M_{i:}V_{k:}^T - \sum_{l \neq k} U_{il} V_{l:} V_{k:}^T}{V_{k:}V_{k:}^T}\right).$$
Moreover, since the optimal value for a given entry of U (resp. V) does not depend on the other entries of the same column (resp. row), one can optimize alternatively whole columns of U (resp. rows of V), with

$$U_{:k}^* = \operatorname*{argmin}_{U_{:k} \geq 0} \|M - UV\|_F^2 = \operatorname*{argmin}_{U_{:k} \geq 0} \|R_k - U_{:k}V_{k:}\|_F^2 = \max\left(0,\ \frac{R_k V_{k:}^T}{\|V_{k:}\|_2^2}\right), \qquad (4.4)$$
and

$$V_{k:}^* = \operatorname*{argmin}_{V_{k:} \geq 0} \|M - UV\|_F^2 = \operatorname*{argmin}_{V_{k:} \geq 0} \|R_k - U_{:k}V_{k:}\|_F^2 = \max\left(0,\ \frac{U_{:k}^T R_k}{\|U_{:k}\|_2^2}\right), \qquad (4.5)$$
where R_k = M − Σ_{i≠k} U_{:i}V_{i:} is the k-th residual matrix. Notice that, in practice, one should not compute explicitly the residual matrices R_k, but instead observe that

$$R_k V_{k:}^T = M V_{k:}^T - \sum_{i \neq k} U_{:i} (V_{i:} V_{k:}^T) \quad \text{and} \quad U_{:k}^T R_k = U_{:k}^T M - \sum_{i \neq k} (U_{:k}^T U_{:i}) V_{i:},$$
see Algorithm 3. In fact, computing the residual matrices would be more expensive, especially for sparse matrices, since R_k can in principle be dense.

This method was first proposed by Cichocki et al. [33,35] and later independently by several other authors [70,89,110], and is herein referred to as hierarchical alternating least squares (HALS)⁴. It amounts to solving each NNLS subproblem with a single round of an exact coordinate descent method with r blocks (for which any cyclic order on the columns of U and the rows of V is admissible).

⁴ In [89], HALS is referred to as rank-one residue iteration (RRI), and in [110] as FastNMF.

Algorithm 3 Hierarchical Alternating Least Squares
Require: Data matrix M ∈ R_+^{m×n} and initial iterates (U, V) ∈ R_+^{m×r} × R_+^{r×n}.
1: α* = argmin_α ||M − αUV||_F = ⟨M, UV⟩/⟨UV, UV⟩ = ⟨MV^T, U⟩/⟨U^T U, V V^T⟩; U ← α*U;
2: while stopping criterion not satisfied do
3:     Compute A = MV^T and B = V V^T.
4:     for k = 1 : r do
5:         U_{:k} ← max(0, (A_{:k} − Σ_{l=1, l≠k}^{r} U_{:l} B_{lk}) / B_{kk});
6:     end for
7:     Compute C = U^T M and D = U^T U.
8:     for k = 1 : r do
9:         V_{k:} ← max(0, (C_{k:} − Σ_{l=1, l≠k}^{r} D_{kl} V_{l:}) / D_{kk});
10:    end for
11: end while
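A direct NumPy transcription of Algorithm 3 could look as follows (our sketch; the small lower bound eps on the entries is not part of Algorithm 3 itself, it anticipates the modification (4.6) discussed below and also guards against divisions by zero).

import numpy as np

def hals(M, U, V, n_iter=100, eps=1e-16):
    # Algorithm 3: exact coordinate descent over columns of U and rows of V.
    U, V = U.copy(), V.copy()
    # initial scaling: alpha* = <M,UV>/<UV,UV> = <MV^T,U>/<U^T U, V V^T>
    A, B = M @ V.T, V @ V.T
    U *= np.sum(A * U) / max(np.sum((U.T @ U) * B), eps)
    for _ in range(n_iter):
        A, B = M @ V.T, V @ V.T          # A = MV^T, B = VV^T
        for k in range(U.shape[1]):
            U[:, k] = np.maximum(eps, (A[:, k] - U @ B[:, k] + U[:, k] * B[k, k]) / max(B[k, k], eps))
        C, D = U.T @ M, U.T @ U          # C = U^T M, D = U^T U
        for k in range(V.shape[0]):
            V[k, :] = np.maximum(eps, (C[k, :] - D[k, :] @ V + D[k, k] * V[k, :]) / max(D[k, k], eps))
    return U, V

rng = np.random.default_rng(0)
M = rng.random((80, 60))
U, V = hals(M, rng.random((80, 10)), rng.random((10, 60)))
print(np.linalg.norm(M - U @ V) / np.linalg.norm(M))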

Convergence of HALS

A potential issue with HALS is that, in the course of the optimization process, one of the vectors U_{:k} (or V_{k:}) and the corresponding rank-one factor U_{:k}V_{k:} may become equal to zero (this happens for example if one of the residuals

R_k = M − Σ_{i≠k} U_{:i}V_{i:} is nonpositive). This then leads to numerical instabilities (the next update is not well-defined) and a rank-deficient approximation (i.e., with a rank lower than r). A possible way to overcome this problem is to replace the zero lower bounds on U_{:k} and V_{k:} by a small positive constant ε ≪ 1 (as for the MU), and consider the following subproblems

$$U_{:k}^* = \operatorname*{argmin}_{U_{:k} \geq \epsilon} \|R_k - U_{:k}V_{k:}\|_F^2 \quad \text{and} \quad V_{k:}^* = \operatorname*{argmin}_{V_{k:} \geq \epsilon} \|R_k - U_{:k}V_{k:}\|_F^2,$$
which lead to the modified closed-form update rules:

$$U_{:k} \leftarrow \max\left(\epsilon,\ \frac{R_k V_{k:}^T}{\|V_{k:}\|_2^2}\right) \quad \text{and} \quad V_{k:} \leftarrow \max\left(\epsilon,\ \frac{U_{:k}^T R_k}{\|U_{:k}\|_2^2}\right). \qquad (4.6)$$
This idea was already suggested in [35] in order to avoid numerical instabilities. In fact, this variant of the algorithm is now well-defined in all situations because it guarantees U_{:k} > 0 and V_{k:} > 0 at each iteration. Furthermore, one can now easily prove that it converges to a stationary point of (4.2).

Theorem 4.3. For every constant ε > 0, the limit points of the block-coordinate descent algorithm (4.6), initialized with positive matrices and applied to the optimization problem (4.2), are stationary points.

Proof. Following Theorem 2.1, we need the following two conditions to hold:

⋄ each block of variables is required to belong to a closed convex set,

⋄ the minimum computed at each iteration for a given block of variables is uniquely attained.

The first condition is clearly satisfied here, since U_{:k} and V_{k:} belong respectively to ([ε, +∞[)^m and ([ε, +∞[)^n, which are closed convex sets. The second condition holds because the subproblems can be shown to be strictly convex, so that their optimal value is uniquely attained by the solutions provided by the rules (4.6). Strict convexity is due to the fact that the objective functions of these problems are sums of quadratic terms, each involving a single variable and having a strictly positive coefficient (given by the corresponding diagonal element of the matrix V V^T > 0).

Scaling of Initial Iterates

Another practical issue is related to the choice of the initial matrices (U, V). In particular, HALS is sensitive to the scaling of the initial matrices (unlike MU and ANLS). For example, if the initial matrices U and V are chosen such that UV ≫ M, the optimal columns of U and optimal rows of V computed by formulas (4.4) and (4.5) at the first step will most likely be equal to zero. If the initial matrices (U, V) are properly scaled [89, p.85], i.e., by ensuring that

$$1 = \operatorname*{argmin}_{\alpha} \|M - \alpha UV\|_F = \frac{\langle M, UV\rangle}{\langle UV, UV\rangle}, \qquad (4.7)$$
this behavior is in general avoided (see step 1 of Algorithm 3). Notice that since
$$\frac{\langle M, UV\rangle}{\langle UV, UV\rangle} = \frac{\langle MV^T, U\rangle}{\langle U^T U,\ V V^T\rangle},$$
and the matrices MV^T and V V^T have to be computed anyway (see step 3 of Algorithm 3), the additional cost for the initial scaling is rather low, with O(mr²) operations for the computation of U^T U.

Obviously, any stationary point is scaled; the next theorem is an extension of a result of Ho et al. [89].

Theorem 4.4. The following statements are equivalent:

(1) (U, V) is scaled;

(2) UV is on the boundary of B(M/2, ||M||_F/2), the ball centered at M/2 of radius ||M||_F/2;

(3) ||M − UV||_F^2 = ||M||_F^2 − ||UV||_F^2 (implying ||M||_F^2 ≥ ||UV||_F^2).

Proof. A solution (U, V) is scaled if and only if ⟨M, UV⟩ = ⟨UV, UV⟩, see Equation (4.7), which is equivalent to

$$\begin{aligned}
\langle UV - M,\ UV\rangle &= 0 \\
\Longleftrightarrow\quad \langle UV - M,\ UV\rangle + \Big\langle \tfrac{M}{2}, \tfrac{M}{2}\Big\rangle &= \Big\langle \tfrac{M}{2}, \tfrac{M}{2}\Big\rangle \\
\Longleftrightarrow\quad \Big\langle UV - \tfrac{M}{2},\ UV - \tfrac{M}{2}\Big\rangle &= \Big\langle \tfrac{M}{2}, \tfrac{M}{2}\Big\rangle,
\end{aligned}$$
so that (1) and (2) are equivalent. For the equivalence of (1) and (3), we have

$$\begin{aligned}
\langle M - UV,\ M - UV\rangle &= \|M\|_F^2 - 2\langle M, UV\rangle + \|UV\|_F^2 \\
&= \|M\|_F^2 - \|UV\|_F^2 - 2\big(\langle M, UV\rangle - \langle UV, UV\rangle\big) \\
&= \|M\|_F^2 - \|UV\|_F^2
\end{aligned}$$
if and only if ⟨M, UV⟩ = ⟨UV, UV⟩. □

Theorem 4.4 can be used as follows: when one computes the error of the current solution, one can scale it without further computational cost. In fact,

$$\|M - UV\|_F^2 = \langle M - UV,\ M - UV\rangle = \|M\|_F^2 - 2\langle M, UV\rangle + \|UV\|_F^2. \qquad (4.8)$$
Note that the third term of (4.8) can be computed in O(max(m,n)r²) operations since

$$\begin{aligned}
\|UV\|_F^2 &= \sum_{ij}\Big(\sum_k U_{ik}^2 V_{kj}^2 + 2\sum_{k\neq l} U_{ik}V_{kj}U_{il}V_{lj}\Big) \\
&= \sum_k \Big(\sum_i U_{ik}^2\Big)\Big(\sum_j V_{kj}^2\Big) + 2\sum_{k\neq l}\Big(\sum_i U_{ik}U_{il}\Big)\Big(\sum_j V_{kj}V_{lj}\Big) \\
&= \sum_{ij} \big| (U^T U) \circ (V V^T) \big|_{ij}.
\end{aligned}$$
This is especially interesting for sparse matrices since ⟨M, UV⟩ = ⟨MV^T, U⟩ = ⟨U^T M, V⟩, hence UV (which could be dense) does not need to be computed explicitly.
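In code, this cheap error evaluation amounts to the following (a sketch of ours, which also checks the identity against the direct computation on a small dense example).

import numpy as np

def nmf_error_cheap(normM2, MVt, U, V):
    # ||M - UV||_F^2 = ||M||_F^2 - 2 <M,UV> + ||UV||_F^2, with
    #   <M,UV>     = <MV^T, U>            (MV^T is needed by MU/HALS anyway)
    #   ||UV||_F^2 = sum( (U^T U) o (V V^T) )
    return normM2 - 2.0 * np.sum(MVt * U) + np.sum((U.T @ U) * (V @ V.T))

rng = np.random.default_rng(0)
M = rng.random((50, 40))
U, V = rng.random((50, 8)), rng.random((8, 40))
cheap = nmf_error_cheap(np.sum(M * M), M @ V.T, U, V)
print(np.isclose(cheap, np.linalg.norm(M - U @ V)**2))  # True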

4.2 Accelerated MU and HALS Algorithms

In this section, we first analyze the computational cost needed to update the factor U in MU and HALS (since the V factor update is perfectly symmetric, with M^T = V^T U^T), then make several simple observations leading to the design of more efficient versions of these algorithms. This improvement can potentially be applied to any first-order NMF iterative algorithm, which is illustrated on a projected gradient method [113], but we focus here on MU (because it is by far the most popular NMF algorithm) and HALS (because it is very efficient in practice). Finally, we experimentally demonstrate the improvements in speed of convergence on several image and text datasets, with a comparison with the state-of-the-art ANLS algorithm of Kim and Park [99]. We also give an explanation of the remarkable performance of HALS in Section 4.2.4.

4.2.1 Computational Cost

In order to make the analysis valid for both dense and sparse matrices, let us introduce the parameter K denoting the number of nonzero entries in M if M is sparse, and K = mn otherwise. We assume that NMF achieves compression, which is often a requirement in practice. This means that storing U and V must be cheaper than storing M: roughly speaking, the number of nonzero entries in M must be larger than the number of entries in U and V, i.e., K ≥ r(m + n).

Algorithms 4 and 5 give an estimate of the number of floating point operations (flops) of each matrix product computation needed to update U in MU and HALS respectively⁵. One can check that the proposed organization of the different matrix computations (and, in particular, the ordering of the matrix products) minimizes the total computational cost (for example, starting the computation of the MU denominator UVV^T with the product UV is clearly worse than with VV^T).

Algorithm 4 MU update for U
1: A = MV^T;          → 2Kr flops
2: B = V V^T;         → 2nr² flops
3: C = UB;            → 2mr² flops
4: U ← U ∘ [A]/[C];   → 2mr flops
% Total: r(2K + 2nr + 2mr + 2m) flops

MU and HALS have almost exactly the same computational cost (the difference being mr flops). It is particularly interesting to observe that

1. The first two steps 1. and 2. in both algorithms are identical and do not depend on the matrix U;

2. The first step (i.e., computing MV^T) is the most expensive one, using the assumption K ≥ r(m + n).

⁵ The product of an m-by-n matrix with an n-by-r matrix requires 2mnr flops.

Algorithm 5 HALS update for U
1: A = MV^T;          → 2Kr flops
2: B = V V^T;         → 2nr² flops
3: for k = 1, 2, ..., r do
4:     C_{:k} = Σ_{l=1}^{k−1} U_{:l} B_{lk} + Σ_{l=k+1}^{r} U_{:l} B_{lk};   → 2m(r−1) flops, r times
5:     U_{:k} ← max(0, (A_{:k} − C_{:k}) / B_{kk});                          → 3m flops, r times
6: end for
% Total: r(2K + 2nr + 2mr + m) flops

This time-consuming step should be performed sparingly, i.e., we should take full advantage of having computed the relatively expensive MV^T and VV^T matrix products. This can be done by updating U several times before the next update of V, i.e., by performing steps 3 and 4 in MU (resp. steps 3 to 6 in HALS) several times. The original MU and HALS algorithms do not take advantage of this fact, and update the matrices U and V alternatively only once at each iteration. The question is: how many times should we update U, i.e., how many inner iterations of MU and HALS should we perform? This is the topic of the next section.

4.2.2 Stopping Criterion for the Inner Iterations

Let us first focus on the MU algorithm. Based on the flops count, it is possible to estimate how much more expensive the first update of U is with respect to the next ones (for V fixed), which is given by the following factor ρU (the corresponding value for V will be denoted by ρV )

$$\rho_U = \frac{2Kr + 2nr^2 + 2mr^2 + 2mr}{2mr^2 + 2mr} = 1 + \frac{K + nr}{mr + m}, \qquad \rho_V = 1 + \frac{K + mr}{nr + n}.$$

Values of ρU and ρV for several datasets are given in Section 4.2.3, see Tables 4.1 and 4.2.

Notice that for K ≥ r(m + n), we have ρ_U ≥ 2, so that the first update of U is at least twice as expensive as the next ones. For a dense matrix, K is equal to mn and we actually have that ρ_U = 1 + n(m + r)/(m(r + 1)) ≥ 1 + n/(r + 1), which is typically quite large since n is often much larger than r. For example, U could be updated about 1 + ρ_U ≈ 2 + n/(r + 1) times for the same computational cost as two independent updates of U in the original MU.
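These factors are easy to compute for a given dataset; the small helper below (ours) reproduces the CBCL values quoted in the next paragraph for m = 361, n = 2429 and r = 20.

def inner_iteration_factors(K, m, n, r):
    # rho_U and rho_V: cost of the first update of U (resp. V) relative to
    # the cost of the subsequent ones, for V (resp. U) fixed.
    rho_U = 1 + (K + n * r) / (m * r + m)
    rho_V = 1 + (K + m * r) / (n * r + n)
    return rho_U, rho_V

# dense CBCL face dataset: K = m*n
m, n, r = 361, 2429, 20
print(inner_iteration_factors(m * n, m, n, r))  # approximately (123.1, 18.3)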

Fixed Number of Inner Iterations

A natural and simple choice is then to update U and V a fixed number of times, depending on the values of ρ_U and ρ_V. Let us introduce a parameter α ≥ 0 such that U is updated (1 + αρ_U) times before the next update of V, and V is updated (1 + αρ_V) times before the next update of U. Let us also denote the corresponding algorithm MU_α (MU_0 reduces to the original MU). We then have that performing the (1 + αρ_U) updates of U in MU_α has approximately the same computational cost as updating U (1 + α) times in MU_0.

Note however that when the numbers of rows m and columns n of the matrix M are not of the same order of magnitude, for example when n ≫ m, we have ρ_U ≫ ρ_V. Hence, on the one hand, the matrix U has significantly fewer entries than V (mr ≪ nr), and the corresponding NNLS subproblem features a much smaller number of variables; on the other hand, ρ_U ≫ ρ_V and many more updates of U are performed. In other words, many more iterations are performed on the simpler problem, which does not seem to be reasonable. For example, for the CBCL face database (cf. Section 4.2.3) with m = 361, n = 2429 and r = 20, we have ρ_V ≈ 18 and ρ_U ≈ 123, and these hundred updates of U are not necessary to obtain an iterate close to an optimal solution of the corresponding NNLS subproblem. Therefore, we propose to add the following stopping criterion. Denoting by U^{(l)} the iterate after l updates of U (while V is kept fixed), we stop the inner iterations as soon as

$$\|U^{(l+1)} - U^{(l)}\|_F \leq \delta\, \|U^{(1)} - U^{(0)}\|_F, \qquad (4.9)$$
i.e., as soon as the improvement compared to the first update is negligible. Based on numerical experiments (cf. Section 4.2.3), it seems that δ = 0.01 gives good results, and this value will be used for all the tests of this section. Algorithm 6 displays the pseudocode for the accelerated MU and HALS algorithms.

In order to find an appropriate parameter α, we have performed some preliminary tests on image and text datasets. First, let us denote by e(t) the Frobenius norm of the error ||M − UV||_F achieved by an algorithm within time t, and define
$$E(t) = \frac{e(t) - e_{\min}}{e(0) - e_{\min}}, \qquad (4.10)$$
where e(0) is the error of the initial iterate, and e_min is the smallest error achieved among all algorithms across all initializations. The quantity E(t) is therefore a normalized measure of the improvement of the objective function, relative to the initial gap, as a function of time; and we have 0 ≤ E(t) ≤ 1 for monotonically decreasing algorithms (such as MU and HALS). The advantage of E(t) over e(t) is that one can meaningfully take the average over several runs involving different initializations and datasets, and display the average behavior of a given algorithm.

Algorithm 6 Accelerated MU and HALS
Require: Data matrix M ∈ R_+^{m×n} and initial iterates (U, V) ∈ R_+^{m×r} × R_+^{r×n}.
1: while stopping criterion not satisfied do
2:     Compute A = MV^T and B = V V^T; U^{(0)} = U;
3:     for l = 1 : 1 + αρ_U do
4:         Compute U^{(l)} using either MU or HALS (cf. Algorithms 4 and 5);
5:         if ||U^{(l)} − U^{(l−1)}||_F ≤ 0.01 ||U^{(1)} − U^{(0)}||_F then
6:             break;
7:         end if
8:     end for
9:     U ← U^{(l)};
10:    Update V from U and M using a symmetrically adapted version of steps 2-9;
11: end while

Figure 4.1 displays the average of this function E(t) for dense (on the left) and sparse (on the right) matrices, using the datasets described in Section 4.2.3, for five values of α (0, 0.5, 1, 2 and 4). We observe that the original MU algorithm (α = 0) converges significantly less rapidly than all the other tested variants (especially in the dense case). The best value for the parameter α is around one. Figure 4.2 displays the same computational experiments for HALS⁶. As for MU, HALS with α around one performs better than the original HALS. For sparse matrices, the improvement is harder to discern (but still present); an explanation for that observation will be given in Section 4.2.4.
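A compact NumPy version of the accelerated MU variant of Algorithm 6 is sketched below (ours; the HALS variant is analogous, and the small eps guard on denominators is an implementation detail not in the pseudocode).

import numpy as np

def accelerated_mu(M, U, V, alpha=1.0, outer_iter=50, delta=0.01, eps=1e-16):
    m, n = M.shape
    r = U.shape[1]
    K = np.count_nonzero(M)  # = m*n for a dense matrix

    def inner(M, X, Y, rho):
        # Repeated MU updates of X (Y fixed), reusing A = M Y^T and B = Y Y^T.
        A, B = M @ Y.T, Y @ Y.T
        first_step = None
        for _ in range(int(1 + alpha * rho)):
            X_new = np.maximum(eps, X * A / np.maximum(X @ B, eps))
            step = np.linalg.norm(X_new - X)
            X = X_new
            if first_step is None:
                first_step = step
            elif step <= delta * first_step:   # stopping criterion (4.9)
                break
        return X

    rho_U = 1 + (K + n * r) / (m * r + m)
    rho_V = 1 + (K + m * r) / (n * r + n)
    for _ in range(outer_iter):
        U = inner(M, U, V, rho_U)
        V = inner(M.T, V.T, U.T, rho_V).T   # symmetric V-update
    return U, V

rng = np.random.default_rng(0)
M = rng.random((200, 150))
U, V = accelerated_mu(M, rng.random((200, 10)), rng.random((10, 150)))
print(np.linalg.norm(M - U @ V) / np.linalg.norm(M))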

Dynamical Choice of the Number of Inner Iterations

Another possibility to decide when to switch from updating U to V and vice versa would be to use an appropriate criterion. For example, it is possible to use the norm of the projected gradient as proposed by Lin [113], or the norm of the difference between two consecutive iterates ||U^{(l+1)} − U^{(l)}||_F as presented in the above section, but without any a priori fixed maximal number of inner iterations. However, when trying several variants based solely on these criteria, it turned out that none would consistently give better results than the simple approach

⁶ Because HALS involves a loop over the columns of U and the rows of V, we observed that an update of HALS is noticeably slower than an update of MU when using MATLAB® (especially for r ≫ 1), despite the quasi-equivalent theoretical computational cost. Therefore, to obtain fair results, we adjusted ρ_U and ρ_V by measuring directly the ratio between the time spent for the first update and the next one, using the cputime function of MATLAB®.

outlined in the previous section. To illustrate this, we have modified Lin's projected gradient algorithm (PG) [113] by fixing the number of inner iterations along with the stopping criterion defined in Equation (4.9), as for MU and HALS, using⁷ different values for the parameter α. Figure 4.3 displays the computational results, and demonstrates that this variant works significantly better than the original PG algorithm (as available from [113]). A value of α around 0.5 gives the best results.

4.2.3 Numerical Experiments

In this section, we compare the following algorithms:

1. (MU) The multiplicative updates algorithm of Lee and Seung (Section 4.1.1).

2. (A-MU) The accelerated MU with a fixed number of inner iterations, using α = 1 (Section 4.2.2).

3. (HALS) The HALS algorithm of Cichocki et al. (Section 4.1.3).

4. (A-HALS) The accelerated HALS with a fixed number of inner iterations, using α = 1 (Section 4.2.2).

5. (PG) The projected gradient method of Lin [113].

6. (A-PG) The modified projected gradient method of Lin [113], using α = 0.5 (Section 4.2.2).

7. (ANLS) The alternating nonnegative least squares algorithm of Kim and Park [99] (Section 4.1.2).

All tests were run with MATLAB® 7.1 (R14), on a 3GHz Intel® Core™2 Dual CPU PC. We present numerical results on image datasets (dense matrices, Section 4.2.3) and on text datasets (sparse matrices, Section 4.2.3). The code of the algorithms is available at
http://www.core.ucl.ac.be/~ngillis/papers/Acc_MU_HALS_PG.zip

(run file RunME.m for a nice example with the CBCL face dataset).

⁷ Lin's algorithm [113] also requires the computation of VV^T and MV^T since the gradient is given by ∇_U ||M − UV||_F^2 = 2UVV^T − 2MV^T, so that our approach can be easily extended.

Figure 4.1: Average of functions E(t) for MU using different values of α: (top) dense matrices, (bottom) sparse matrices. It is the average over 4 image datasets and 6 text datasets, using two different values for the rank for each dataset and 10 random initializations, see Section 4.2.3.


Figure 4.2: Average of functions E(t) for HALS using different values of α: (top) dense matrices, (bottom) sparse matrices. Same settings as Figure 4.1.


Figure 4.3: Average of functions E(t) for the projected gradient algorithm of Lin [113], and its modification using a fixed number of inner iterations. Same settings as Figure 4.1.


Dense Matrices - Image Datasets

Table 4.1 summarizes the characteristics of the different datasets.

Table 4.1: Image datasets.

Data      # pixels     m       n      ⌊ρ_U⌋     ⌊ρ_V⌋
ORL¹      112 × 92    10304    400    13, 7     358, 195
Umist²    112 × 92    10304    575    19, 10    351, 188
CBCL³     19 × 19       361   2429    85, 47     12, 7
Frey²     28 × 20       560   1965    67, 36     19, 10

⌊x⌋ denotes the largest integer smaller than x.
¹ http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html
² http://www.cs.toronto.edu/~roweis/data.html
³ http://cbcl.mit.edu/cbcl/software-datasets/FaceData2.html

For each dataset, we use two different values for the rank (r = 30, 60) and initialize the algorithms with the same 10 random factors (U, V) (using i.i.d. uniform random variables on [0, 1])⁸. In order to assess the performance of the different algorithms, we display individually for each dataset the average over all runs of the function E(t) defined in Equation (4.10), see Figures 4.4 and 4.5. First, these results confirm what was already observed in previous work: PG performs better than MU [113], ANLS performs better than MU and PG [99], and HALS performs the best [89]. Second, they confirm that the accelerated algorithms are indeed more efficient: A-MU (resp. A-PG) clearly outperforms MU (resp. PG) in all cases, while A-HALS is much more efficient for the first two databases (ORL and Umist), and behaves slightly better than HALS for the other two (CBCL and Frey). It is interesting to notice that A-MU performs better than A-PG, and only slightly worse than ANLS, often converging as fast during the first iterations. Finally, A-HALS is, and sometimes by far, the best algorithm for all tested databases.

Sparse Matrices - Text Datasets

Table 4.2 summarizes the characteristics of the different datasets.

⁸ Generating initial matrices (U, V) randomly typically leads to a very large initial error e(0). This implies that E(t) will get very small after one step of any algorithm. To avoid this large initial decrease, we have applied one step of MU on (U, V) to obtain reasonable initial estimates.

Table 4.2: Text mining datasets [156] (sparsity is given in %: 100 * #zeros/(mn)).

Data       m       n        #nonzero   sparsity   r        ⌊ρ_U⌋     ⌊ρ_V⌋
classic    7094    41681     223839     99.92     4, 8     12, 9     2, 1
sports     8580    14870    1091723     99.14     7, 14    18, 11    10, 6
reviews    4069    18483     758635     98.99     5, 10    35, 22     8, 4
hitech     2301    10080     331373     98.57     6, 12    25, 16     5, 4
ohscal    11162    11465     674365     99.47    10, 20     7, 4      7, 4
la1        3204    31472     484024     99.52     6, 12    31, 21     3, 2

We used exactly the same settings as for the image datasets (cf. Section 4.2.3), except for the factorization rank r, which was set to the value proposed in [156] (first entry in the sixth column of Table 4.2) and twice this value (second entry). For the comparison, we used the same settings as for the dense matrices. Figures 4.6, 4.7 and 4.8 display for each dataset the evolution of the average of the functions E(t) over all runs. Again, the accelerated algorithms are much more efficient. In particular, A-MU and A-PG now converge initially faster than ANLS, and often obtain better or similar final solutions. A-MU, HALS and A-HALS have the fastest initial convergence rate, and HALS and A-HALS generate the best solutions in most cases. Notice that A-HALS does not always perform significantly better than HALS (it does for the classic, sports and la1 datasets), the reason being that HALS already performs remarkably well. An explanation for that behavior is given in the next section.

4.2.4 Why does HALS perform (so) well?

It is well known that (block) coordinate descent methods typically fail to converge rapidly because of their zig-zagging behavior, similar to what is frequently observed for gradient descent approaches, see, e.g., [12]. However, for the NNLS subproblems arising in NMF, we have observed in the previous section that this approach (i.e., HALS) is quite efficient (see also [33,69,89,110]). In this section, we offer a theoretical explanation for that fact. It is based on two simple observations.

First, it is well-known that NMF solutions are typically part-based: this is the main reason why NMF has become so popular as a data analysis technique [105], see Chapter 1 and Figure 1.2 for an illustration (the reason being the nonnegativity constraints on both the basis elements and the weights leading to an additive reconstruction of the input data). Therefore the supports (the set of nonzero entries) of the columns of U (resp. rows of V ) typically share few 83 CHAPTER 4. ALGORITHMS FOR NMF

elements. In other words, these supports are almost disjoint, implying that the matrix product U^T U (resp. V V^T) has large entries on its diagonal, and zeros or small entries nearly everywhere else.

Second, the NNLS problem min_{U≥0} ||M − UV||_F^2 can be decomposed into m independent NNLS subproblems corresponding to each row of U, since

$$\|M - UV\|_F^2 = \sum_{i=1}^m \|M_{i:} - U_{i:}V\|_F^2.$$
Each of these NNLS subproblems for a given row U_{i:} has the following form

$$\min_{U_{i:} \geq 0} \|M_{i:} - U_{i:}V\|_F^2 = \min_{U_{i:} \geq 0}\ U_{i:}(VV^T)U_{i:}^T - 2U_{i:}VM_{i:}^T + M_{i:}M_{i:}^T, \quad 1 \leq i \leq m. \qquad (4.11)$$
The quadratic term U_{i:}(VV^T)U_{i:}^T coupling the variables U_{i:} together depends on the (Hessian) matrix VV^T. In particular, if VV^T is diagonal, Problem (4.11) can be decoupled into r NNLS problems in one variable, min_{U_{ik} ≥ 0} ||M_{i:} − U_{ik}V_{k:}||_F^2, for which an exact coordinate descent method would generate an optimal solution in one step (i.e., after the update of each variable). We therefore have the following result.

Theorem 4.5. Let M ∈ R_+^{m×n} and V ∈ R_+^{r×n}. If the supports of the rows of V are disjoint, i.e., if VV^T is a diagonal matrix, then an optimal solution U* to the NNLS problem
$$\min_{U \geq 0} \|M - UV\|_F^2 \qquad \text{(NNLS)}$$
can be obtained by performing one iteration of the exact coordinate descent method from any initial matrix, i.e., by a single HALS update.

More generally, if the matrix VV^T is close to being diagonal, Problem (4.11) typically features large coefficients for the quadratic terms (i.e., U_{ik}^2) in comparison to the bilinear terms (i.e., U_{ik}U_{ip}, k ≠ p). Intuitively, this implies that the interaction between variables is low and therefore optimizing one variable at a time is still a relatively efficient procedure.

Combining the above two observations, we conclude that performing a few iterations of HALS on the NNLS subproblems arising in NMF allows the algorithm to get close to an optimal solution. This is especially true for sparse matrices M, since the factors (U, V) will be even sparser, which gives an explanation for the similar performances of HALS and A-HALS for sparse matrices.
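Theorem 4.5 is easy to check numerically: when the rows of V have disjoint supports, a single HALS sweep over the columns of U already matches the exact NNLS solution. The script below is our own illustration, using scipy.optimize.nnls as a reference solver.

import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
m, n, r = 30, 40, 4

# V with disjoint row supports (each column has a single nonzero), so V V^T is diagonal.
V = np.zeros((r, n))
for j in range(n):
    V[j % r, j] = rng.random() + 0.1
M = rng.random((m, n))

# one HALS sweep over the columns of U, starting from an arbitrary U
U = rng.random((m, r))
A, B = M @ V.T, V @ V.T
for k in range(r):
    U[:, k] = np.maximum(0, (A[:, k] - U @ B[:, k] + U[:, k] * B[k, k]) / B[k, k])

# reference: exact NNLS solution, computed row by row
U_ref = np.vstack([nnls(V.T, M[i, :])[0] for i in range(m)])
print(np.allclose(U, U_ref))  # True when V V^T is diagonal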


Figure 4.4: Average of functions E(t) for different image datasets: ORL (top) and Frey (bottom).


Figure 4.5: Average of functions E(t) for different image datasets: Umist (top) and CBCL (bottom).


Figure 4.6: Average of functions E(t) for text datasets: classic (top), reviews (bottom).


Figure 4.7: Average of functions E(t) for text datasets: ohscal (top), sports (bottom).


Figure 4.8: Average of functions E(t) for text datasets: hitech (top) and la1 (bottom).


4.3 A Multilevel Approach

This section presents a general framework based on a multilevel strategy leading to faster convergence of NMF algorithms when dealing with data admitting some kind of simple approximate low-dimensional representations (based on linear transformations preserving nonnegativity), such as images. In fact, in these situations, a hierarchy of lower-dimensional problems can be constructed and used to compute efficiently approximate solutions of the original problem. Similar techniques have previously been used for other dimensionality reduction tasks such as PCA [134].

4.3.1 Multigrid Methods

Let us briefly introduce multigrid methods. The aim is to give the reader some insight into these techniques in order to understand their application to NMF. We refer the reader to [21–23,144] and references therein for detailed discussions on the subject.

Multigrid methods were initially used to develop fast numerical solvers for boundary value problems. Given a (partial) differential equation on a continuous domain with boundary conditions, the aim is to find an approximation of a smooth function f satisfying the constraints. In general, the first step is to discretize the continuous domain, i.e., choose a set of points (a grid) where the function values will be computed. Then, a numerical method (e.g., finite differences, finite elements) translates the continuous problem into a specific (square) system of linear equations:

$$\text{find } x \in \mathbb{R}^n \ \text{s.t.}\ Ax = b, \quad \text{with } A \in \mathbb{R}^{n\times n},\ b \in \mathbb{R}^n, \qquad (4.12)$$
where the vector x will contain the approximate values of f on the grid points. The linear system (4.12) can be solved either by direct methods (e.g., Gaussian elimination) or iterative methods (e.g., Jacobi and Gauss-Seidel iterations). Of course, the computational cost of these methods depends on the number of points in the grid, which leads to a trade-off between precision (number of points used for the discretization) and computational cost.

Iterative methods update the solution at each step and hopefully converge to a solution of (4.12). Here comes the utility of multigrid: instead of working on a fine grid during all iterations, the solution is initially restricted to a coarser grid on which the iterations are cheaper. Moreover, the smoothness of the function f allows its low-frequency components to be recovered faster on coarser grids. Solutions of the coarse grid are then prolongated to the finer grid and the iterations can continue (higher-frequency components of the error are reduced faster). Because the initial guess generated on the coarser grid is (hopefully)

a good approximation of the final solution, fewer iterations are needed on the fine (expensive) grid to converge. Essentially, multigrid methods make iterative methods more efficient, i.e., accurate solutions are obtained faster.

More recently, these same ideas have been applied to a broader class of problems, e.g., multiscale optimization with trust-region methods [81] and multiresolution techniques in image processing [142].

4.3.2 Description of a Multilevel Approach for NMF

The three algorithms presented in Section 4.1 (MU, ANLS and HALS) iteratively try to find a stationary point of NMF. Actually, most practical NMF algorithms are iterative methods (cf. the introduction of this chapter). In order to embed these algorithms into a multilevel strategy, one has to define the different levels and describe how the variables and the data are transferred between them.

Let each column of the matrix M be an element of the dataset (e.g., a vectorized image) belonging to R_+^m. Given m′ < m, we define a restriction operator as a linear operator
$$\mathcal{R} : \mathbb{R}^m_+ \to \mathbb{R}^{m'}_+ : x \mapsto \mathcal{R}(x) = Rx,$$
with R ∈ R_+^{m′×m}, and a prolongation operator as a linear operator
$$\mathcal{P} : \mathbb{R}^{m'}_+ \to \mathbb{R}^{m}_+ : y \mapsto \mathcal{P}(y) = Py,$$
with P ∈ R_+^{m×m′}. Nonnegativity of the matrices R and P is a sufficient condition to preserve nonnegativity of the solutions when they are transferred from one level to another. In fact, in order to generate nonnegative solutions, one only needs to require

$$\mathcal{R}(x) \geq 0 \ \ \forall x \geq 0 \quad \text{and} \quad \mathcal{P}(y) \geq 0 \ \ \forall y \geq 0.$$
We also define the corresponding transfer operators on matrices, operating columnwise:

$$\mathcal{R}([x_1\, x_2 \dots x_n]) = [\mathcal{R}(x_1)\ \mathcal{R}(x_2) \dots \mathcal{R}(x_n)], \quad \text{and} \quad \mathcal{P}([y_1\, y_2 \dots y_n]) = [\mathcal{P}(y_1)\ \mathcal{P}(y_2) \dots \mathcal{P}(y_n)],$$
for x_i ∈ R_+^m, y_i ∈ R_+^{m′}, 1 ≤ i ≤ n.

In order for the multilevel strategy to work, the information lost when transferring from one level to another must be limited, i.e., the data matrix M has

to be well represented by R(M) in the lower-dimensional space, which means that the reconstruction P(R(M)) must be close to M. From now on, we say that M is smooth with respect to R and P if and only if
$$s_M = \frac{\|M - \mathcal{P}(\mathcal{R}(M))\|_F}{\|M\|_F} \quad \text{is small}.$$

The quantity s_M measures how well M can be mapped by R into a lower-dimensional space and then brought back by P, and still be a fairly good approximation of itself.
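As an illustration (ours, not from the text), for one-dimensional signals a simple nonnegative restriction is pairwise averaging and a simple prolongation is replication; the snippet below builds the corresponding matrices R and P and evaluates s_M on a smooth and on a random data matrix.

import numpy as np

def averaging_restriction(m):
    # R in R_+^{(m/2) x m}: averages pairs of adjacent entries (m assumed even).
    R = np.zeros((m // 2, m))
    for i in range(m // 2):
        R[i, 2 * i] = R[i, 2 * i + 1] = 0.5
    return R

m, n = 64, 30
R = averaging_restriction(m)
P = 2 * R.T                      # replication: P(y) repeats each coarse entry twice

t = np.linspace(0, 1, m)
M_smooth = np.array([np.sin(np.pi * t * (1 + j / n)) ** 2 for j in range(n)]).T
M_random = np.random.default_rng(0).random((m, n))

for M in (M_smooth, M_random):
    s_M = np.linalg.norm(M - P @ (R @ M)) / np.linalg.norm(M)
    print(s_M)   # small for the smooth data, much larger for the random data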

Based on these definitions, elaborating a multilevel approach for NMF is straightforward (a code sketch of the whole procedure is given after step 5 below):

1. We are given M ∈ R_+^{m×n} and (U_0, V_0) ∈ R_+^{m×r} × R_+^{r×n};

2. Compute M′ = R(M) = RM ∈ R_+^{m′×n} and U_0′ = R(U_0) = RU_0 ∈ R_+^{m′×r}, i.e., restrict the elements of the dataset and the basis elements of the current solution to a lower-dimensional space;

3. Compute a rank-r NMF (U′, V) of M′ using (U_0′, V_0) as initial matrices, i.e., U′V ≈ M′ = R(M). This can be done using any NMF iterative algorithm or, even better, using the multilevel strategy recursively (cf. Section 4.3.5);

4. Since

$$M \overset{(1)}{\approx} \mathcal{P}(\mathcal{R}(M)) = \mathcal{P}(M') \overset{(2)}{\approx} \mathcal{P}(U'V) = PU'V = \mathcal{P}(U')V = UV,$$

where U is computed as the prolongation of U′, (U, V) is a good initial estimate for a rank-r NMF of M, provided that (1) M is smooth with respect to R and P (i.e., s_M is small) and (2) U′V is a good approximation of M′ = R(M) (i.e., ||M′ − U′V||_F is small); in fact,

M (U ′)V M ( (M)) + ( (M)) (U ′V ) || −P ||F ≤ || −P R ||F ||P R −P ||F s M + ( (M) U ′V ) ≤ M || ||F ||P R − ||F s M + P (M) U ′V . ≤ M || ||F || ||F ||R − ||F (1) (2) | {z } | {z } 5. Further improve the solution (U, V ) using any NMF iterative algorithm.

92 4.3. A MULTILEVEL APPROACH

Because computations needed at step 3 are relatively cheap (since m′

4.3.3 Limitation of our Multilevel Approach In classical multigrid methods, when solving a linear system of equations Ax = b, the current approximate solution xc is not transferred from a fine level to a coarser one, because it would imply the loss of its high-frequency components; instead, the residual is transferred, which we briefly explain here. Defining the current residual r = b Ax and the error e = x x , we have the equivalent c − c − c defect equation Ae = rc and we would like to approximate e with a correction ec in order to improve the current solution with xc xc + ec. Hence the defect equation is solved approximately on the the coarser← grid by restricting the residual rc, the correction obtained on the coarser grid is interpolated and the new approximation xc + ec is computed, see, e.g., [144, p.37]. If instead the solution is transferred directly from one level to another (as we do in this section), the corresponding scheme is in general not convergent, see [144, p.156]. In fact, even an exact solution of the system Ax = b is not a fixed point, because the restriction of x is not an exact solution anymore at the coarser level (while, in that case, the residual r is equal to zero and the correction e will also be equal to zero). Therefore, the method presented in this section should only be used as a pre-processing/initialization step before another (convergent) NMF algorithm. In fact, if one already has a good approximate solution (U, V ) for NMF (e.g., a solution close to a stationary point), then transferring it to a coarser grid will most likely increase the approximation error because the high frequency components (such as edges in images) will be lost. Try to fix this drawback, which seems to be non-trivial, is a topic for further research. For example, it would be possible to use a ‘local linearization’ approach, consisting in linearizing the equation R = M (U + dU)(V + dV ) M UV UdV dUV, − ≈ − − − where dU and dV are the corrections to be computed on the coarser grids. However, several problems arise: how to take care of nonnegativity? or how to do this efficiently (in fact, computing the residual R is as expensive as comput- ing directly a gradient direction on the fine grid with (mnr) operations, see Section 4.2)? O Finally, it seems that, despite these theoretical reservations, our technique is still quite efficient (see Section 5.3.3). One intuitive reason for that good

9The low-frequency components refers to the parts of the data which are well-represented on coarse levels. 93 CHAPTER 4. ALGORITHMS FOR NMF

behavior is that NMF solutions are typically part-based and sparse (see Chap- ter 1 and Figure 1.2). Therefore, columns of matrix U contains relatively large ‘constant components’, made of their zero entries, which are perfectly trans- U ( (U)) F ferred from one level to another, i.e., sU = || −P UR || will typically be very || ||F small (in general much smaller than sM ).

We now illustrate this technique on image datasets, more precisely, on two- dimensional gray-level images. In general, images are composed of several smooth components, i.e., regions where pixel values are similar and change continuously with respect to their location (e.g., skin on a face or, the pupil or sclera10 of an eye), that is, a pixel value can often be approximated using the pixel values of its neighbors. This observation can be used to define the transfer operators (Section 4.3.4). For the computation of a NMF solution needed at step 3, the multilevel approach can be used recursively; three strategies (called multigrid cycles) are described in Section 4.3.5. Finally, numerical results are reported in Section 4.3.7.

4.3.4 Coarse Grid and Transfer Operators A crucial step of multilevel methods is to define the different levels and the transformations (operators) between them. Figure 4.9 is an illustration of a standard coarse grid definition: we note I1 the matrix of dimension (2a + 1) (2b + 1) representing the initial image and Il the matrix of dimension a×l+1 b l+1 (2 − +1) (2 − +1) representing the image at level l obtained by keeping, in each direction,× only one out of every two points of the grid at the preceding l 1 level, i.e., I − . The transfer operators describe how to transform the images when going from finer to coarser levels, and vice versa, i.e., how to compute the values l l 1 (pixel intensities) of the image I using values from image I − at the finer level (restriction) or from image Il+1 at the coarser level (prolongation). For the restriction, the full-weighting operator is a standard choice: values of the coarse grid points are the weighted average of the values of their neighbors on l the fine grid (see Figure 4.10 for an illustration). Noting Ii,j the intensity of the pixel (i, j) of image Il, it is defined as follows:

l+1 1 l l l l Ii,j = I2i 1,2j 1 + I2i 1,2j+1 + I2i+1,2j 1 + I2i+1,2j+1 16 − − − − h l l l l + 2(I2i,2j 1 + I2i 1,2j + I2i+1,2j + I2i,2j+1) (4.13) − − l +4I2i,2j , i 10The white part of the eye. 94 4.3. A MULTILEVEL APPROACH

Figure 4.9: Multigrid Hierarchy. Schematic view of a grid definition for image processing (image from ORL face database, cf. Section 4.3.7).

Figure 4.10: Restriction and Prolongation.

a l+1 except on the boundaries of the image (when i = 0, j = 0, i = 2 − and/or b l+1 j = 2 − ) where the weights are adapted correspondingly. For example, to 95 CHAPTER 4. ALGORITHMS FOR NMF

restrict a 3 3 image to a 2 2, is defined with × × R 420210000 1 024012000 R =   , 9 000210420  000012024      (3 3 images needing first to be vectorized to vectors in R9, by concatenation of× either columns or rows). For the prolongation, we set the values on the fine grid points as the average of the values of their neighbors on the coarse grid:

l l+1 Ii,j = meani′ rd(i/2) Ii′ ,j′ , (4.14) ′∈ j rd(j/2) ∈   where k/2 k even, rd(k/2) = {(k }1)/2, (k + 1)/2 k odd.  { − } For example, to prolongate a 2 2 image to a 3 3, is defined with × × P 420210000 1 024012000 P T =   . 4 000210420  000012024      Note that these transformations clearly preserve nonnegativity.

4.3.5 Multigrid Cycle Now that grids and transfer operators are defined, we need to choose the pro- cedure that is applied at each grid level as it moves through the grid hierarchy. In this section, we present three different approaches: nested iteration, V-cycle and full multigrid cycle.

In our setting, the transfer operators only change the number of rows m of the input matrix M, i.e., the number of pixels in the images of the database: the size of the images is approximatively four times smaller between each level: 1 m′ 4 m. Since the computational complexity per iteration of the three algo- rithms≈ (ANLS, MU and HALS) is almost proportional to m (cf. Section 4.2.1 and Appendix A), the iterations will be approximately four times cheaper. Therefore spending three quarter of the total time at the current level, and one quarter at the coarser levels leads to perform approximately the same number of iterations at each level (except at the coarsest one), which seems to give good results in practice. Table 4.3 shows the time spent and the corresponding number of iterations performed at each level. 96 4.3. A MULTILEVEL APPROACH

Level 1 Level 2 . . . Level Level L Total (finer) . . . L 1 (coarser) − # iterations 3k 3k . . . 3k 4k (3L + 1)k 3 3 3 1 time 4 T 16 T . . . 4L-1 T 4L-1 T T Table 4.3: Number of iterations performed and time spent at each level when allocating among L levels a total computational budget T corresponding to 4k iterations at the finest level.

Note that the transfer operators require (mn) operations and since they are only performed once between each level,O their computational cost can be neglected (at least for r 1 and/or when a sizeable amount of iterations are performed). ≫

Nested Iteration (NI) To initialize NMF algorithms, we propose to factorize the image at the coarsest resolution and then use the solution as a initial guess for the next (finer) resolu- tion. This is referred to as nested iteration, see Figure 4.11 for an illustration of the repartition of the iterations with three levels, Figure 4.12 for an illustration of the time spent at each level, and Algorithm 7 for the implementation. The idea is to start off the final iterations at the finer level with a better initial estimate, thus reducing the computational time required for the convergence of the iterative methods on the fine grid. The number of iterations and time spent at each level is chosen according to Table 4.3, i.e., one quarter of the time is alloted for iterations at the coarser levels followed by three quarters at the current level.

Figure 4.11: Nested Iteration. Transition between different levels for nested iteration.

Remark 4.2. When the ANLS algorithm is used, the prolongation of U ′ does 97 CHAPTER 4. ALGORITHMS FOR NMF

Figure 4.12: Nested Iteration. Time spent at current level.

Algorithm 7 Nested Iteration m n Require: Number of levels L N, data matrix M R+ × , initial matrices m r r n ∈ ∈ (U , V ) R × R × , and total time allocated to the algorithm T 0. 0 0 ∈ + × + ≥ 1: if L =1 then 2: [U, V ]= NMF algorithm(M,U0, V0,T ); 3: else 4: M ′ = (M); U ′ = (U ); R 0 R 0 5: [U ′, V ]= Nested Iteration(L 1,M ′,U ′ , V ,T/4); − 0 0 6: U = (U ′); P 7: [U, V ]= NMF algorithm(M,U,V, 3T/4); 8: end if not need to be computed since that algorithm only needs an initial value for one iterate. Note that this can be used in principle to avoid computing any prolongation, by setting U directly as the optimal solution of the NNLS problem minU 0 M UV F . ≥ || − || V–Cycle (VC) A drawback of nested iteration is that it does not take advantage of the smooth- ing properties of iterations on fine grids (high-frequency components of the error are reduced faster). It is therefore often more efficient to perform a few iter- ations at the fine level before going to coarser levels. The simplest choice is referred to as V-cycle and is illustrated on Figure 4.13 (with three levels) and Figure 4.14; see Algorithm 8 for the implementation. Time allocation is as follows: one quarter of the alloted time is devoted to iterations at the current level, followed by one quarter of the time for the recursive call to the imme- diately coarser level, and finally one half of the time again for iterations at the current level (we have therefore three quarters of the total time spent for iterations at current level, as for nested iteration).

Full Multigrid (FMG) Combining ideas of nested iteration and V-cycle leads to a full multigrid cycle defined recursively as follows: at each level, a V-cycle is initialized with the 98 4.3. A MULTILEVEL APPROACH

Figure 4.13: V-cycle. Transition between different levels for V-cycle.

Figure 4.14: V-cycle. Time spent at current level.

Algorithm 8 V-cycle m n Require: Number of levels L N, data matrix M R+ × , initial matrices m r r n ∈ ∈ (U , V ) R × R × , and total time allocated to the algorithm T 0. 0 0 ∈ + × + ≥ 1: if L =1 then 2: [U, V ]= NMF algorithm(M,U0, V0,T ); 3: else 4: [U, V ]= NMF algorithm(M,U0, V0,T/4); 5: M ′ = (M); U ′ = (U); R R 6: [U ′, V ]= V-cycle(L 1,M ′,U ′,V,T/4); − 7: U = (U ′); P 8: [U, V ]= NMF algorithm(M,U,V,T/2); 9: end if solution obtained at the underlying level using a full-multigrid cycle. This is typically the most efficient multigrid strategy [144]. In this case, we propose T to partition the time as follows (T is the total time): 4 for the initialization 3T (call of the full multigrid on the underlying level) and 4 for the V-cycle at the current level (see Figure 4.15 and Algorithm 9).

4.3.6 Smoothing Properties We explained why the multilevel strategy was potentially able to accelerate iterative algorithms for NMF: cheaper computations and smoothing of the error on coarse levels. Before giving extensive numerical results in Section 4.3.7, we 99 CHAPTER 4. ALGORITHMS FOR NMF

Figure 4.15: Full multigrid. Time spent at current level.

Algorithm 9 Full Multigrid m n Require: Number of levels L N, data matrix M R+ × , initial matrices m r r n ∈ ∈ (U , V ) R × R × , and total time allocated to the algorithm T 0. 0 0 ∈ + × + ≥ 1: if L =1 then 2: [U, V ]= NMF algorithm(M,U0, V0,T ); 3: else 4: U ′ = (U ); M ′ = (M);* R 0 R 5: [U ′, V ]= Full Multigrid(L 1,M ′,U ′, V ,T/4); − 0 6: U = prolongation(U ′); 7: [U, V ]= V-cycle(L,M,U,V, 3T/4); 8: end if *Note that the restrictions of M should be computed only once for each level. illustrate this crucial feature of multilevel methods on the ORL face database. Comparing three levels, Figure 4.16 displays the error (after prolongation to the fine level) for two faces and for different number of iterations (10, 50 and 100) using MU. Comparing the first row and the last row of Figure 4.16, it is clear that, in this example, the multilevel approach allows a significant smoothing of the error. Already after 10 iterations, the error obtained with the prolongated solution of the coarse level is smoother and smaller (see Figure 4.17) while it is computed much faster. Figure 4.17 gives the evolution of the error with respect to the number of iterations performed (left) and with respect to computational time (right). In this example, the initial convergence after a given number of iteration on the three levels is comparable, while the computational cost to achieve a given accuracy is much cheaper on coarse levels. In fact, compared to the fine level, the middle (resp. coarse) level is approximately 4 (resp. 16) times cheaper.

4.3.7 Computational Results To evaluate the performances of our multilevel approach, we present some nu- merical results for several standard image databases, see Table 4.3.7. For each database, the multilevel strategy is tested using 100 runs initial- ized with the same random matrices for the three algorithms (ANLS, MU and 100 4.3. A MULTILEVEL APPROACH

Figure 4.16: Smoothing on Coarse Levels. Example of the smoothing properties of the multilevel approach on the ORL face database. Each image represents the absolute value of the approximation error (black tones indicate a high error) of one of two faces from the ORL face database. These approximations are the prolongations (to the fine level) of the solutions obtained using the multiplica- tive updates on a single level, with r = 40 and the same initial matrices. From top to bottom: level 1 (fine), level 2 (middle) and level 3 (coarse); from left to right: 10 iterations, 50 iterations and 100 iterations.

Data # pixels m n r ORL face1 112 92 10304 400 40 Umist face2 112 × 92 10304 575 40 Iris3 960 ×1280 1228800 8 4 Hubble Telescope [126] 128× 128 16384 100 8 × Table 4.4: Image datasets.

1 http://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html 2 http://www.cs.toronto.edu/∼roweis/data.html 3 http://www.bath.ac.uk/elec-eng/research/sipg

HALS) and the three multigrid cycles (NI, VC and FMG), with a time limit of 10 seconds. All algorithms have been implemented in MATLAB® 7.1 (R14) and tested on a 3GHz Intel® Core™2 Dual CPU PC. 101 CHAPTER 4. ALGORITHMS FOR NMF

Figure 4.17: Evolution of the error on each level, after prolongation on the fine level, with respect to (left) the number of iterations performed and (right) the computational time. Same setting as in Figure 4.16.

Results Tables 4.5, 4.6 and 4.7 give the mean error attained within 10 seconds using the different approaches.

# lvl ORL Umist Iris Hubble NMF 1 14960 26013 28934 24.35 NI 2 14683 25060 27834 15.94 3 14591 24887 27572 16.93 4 14580 24923 27453 17.20 VC 2 14696 25195 27957 16.00 3 14610 24848 27620 16.12 4 14599 24962 27490 16.10 FMG 2 14683 25060 27821 16.10 3 14516 24672 27500 16.56 4 14460 24393 27359 16.70

Table 4.5: Comparison of the mean error on the 100 runs with ANLS.

In all the cases, the multilevel approaches generate much better solutions than the original NMF algorithms; indicating that it is able to accelerate their con- vergence. The full multigrid cycle is, as expected, the best strategy while nested iteration and V-cycle give inferior performances (except for the Hubble images where NI with ANLS and VC for MU are better than FMG). We also observe 102 4.3. A MULTILEVEL APPROACH

# lvl ORL Umist Iris Hubble NMF 1 34733 131087 64046 21.68 NI 2 23422 87966 37604 22.80 3 20502 67131 33114 18.49 4 19507 59879 31146 16.19 VC 2 23490 90064 36545 10.62 3 20678 69208 32086 9.77 4 19804 62420 30415 9.36 FMG 2 23422 87966 37504 22.91 3 19170 58469 32120 15.06 4 17635 46570 29659 11.71

Table 4.6: Comparison of the mean error on the 100 runs with MU.

# lvl ORL Umist Iris Hubble NMF 1 15096 27544 31571 17.97 NI 2 14517 25153 29032 17.37 3 14310 24427 28131 16.91 4 14280 24256 27744 16.92 VC 2 14523 25123 28732 17.37 3 14339 24459 28001 17.02 4 14327 24364 27670 17.04 FMG 2 14518 25153 29120 17.39 3 14204 23950 27933 16.69 4 14107 23533 27538 16.89

Table 4.7: Comparison of the mean error on the 100 runs with HALS. that the additional speed up of the convergence when the number of levels is increased from 3 to 4 is less significant; the final error achieved is even increased in some cases. In general, the ‘optimal’ number of levels will depend on the size and the smoothness of the data.

HALS combined with the full multigrid cycle is one of the best strategies. Figure 4.18 displays the distribution of the errors for the different databases in this particular case. For the ORL and Umist databases, the multilevel strategy is extremely efficient: all the solutions generated with 2 and 3 levels are better than the original NMF algorithm. For the Iris and Hubble databases, the dif- ference is not as clear. The reason is that the corresponding NMF problems are ‘easier’ because the rank r is smaller. Hence the algorithms converge faster to stationary points, and the distribution of the final errors is more concentrated. In order to visualize the evolution of the error through the iterations, Fig- ure 4.19 displays the evolution of the objective function with respect to the 103 CHAPTER 4. ALGORITHMS FOR NMF

Figure 4.18: Distribution of the error among the 100 random initializations using the HALS algorithm with a full multigrid cycle: (top left) ORL, (top right) Umist, (bottom left) Iris, and (bottom right) Hubble.

number of iterations independently for each algorithm and each database using nested iteration as the multigrid cycle (which is the easiest to represent). In all the cases, the prolongations of the solutions from the lower levels generate much better solutions that the one obtained on the fine level. These test results are very encouraging: the multilevel approach for NMF seems very efficient and allows to speed up convergence of algorithms signifi- cantly.

4.3.8 Extensions of the Multilevel Approach Generalization

We have only used our multilevel approach for a specific objective function (sum of squared errors) to speed up three NMF algorithms (ANLS, MU and 104 4.3. A MULTILEVEL APPROACH

Figure 4.19: Evolution of the objective function. From left to right : MU, ANLS and HALS. From top to bottom: ORL, Umist, Iris and Hubble databases. 1 level stands for the standard NMF algorithms. The initial points for the curves 2 levels and 3 levels are the prolongated solutions obtained on the coarser levels using nested iteration, cf. Section 4.3.5. All algorithms were initialized with the same random matrices.

HALS) and to factorize 2D images. However, this can be easily generalized to other objective functions, other iterative algorithms and applied to other kind of smooth data. Moreover, other types of coarse grid definition, transfer operators and grid cycle can be used and could potentially improve efficiency. 105 CHAPTER 4. ALGORITHMS FOR NMF

A limitation of the proposed approach is that the multigrid strategy is only applied to one dimension of the matrix (because we did not assume that the different images are related to each other in any way). However, in some applications, rows of matrix M might also be restricted to lower dimensional spaces. For example, in hyperspectral data analysis, each column of matrix M represents an image at a given wavelength, while each row represents the spectral signature of a pixel, see Chapter 7. Since spectral signatures feature smooth components as well, the multilevel strategy can be used to reduce both dimensions of the data matrix. This idea can also be extended to nonnegative tensor factorization (NTF) (see, e.g., [153] and references therein where it is used to analyze the hyper- spectral Hubble telescope images) by using multilevel techniques for higher dimensional spaces.

Initialization Several judicious initializations for NMF algorithms have been proposed in the literature and allow to speed up convergence and improve, in general, the final solution [18, 41]. The computational cost of these good initial guesses depends on the matrix dimensions and will then be cheaper to compute on the coarsest grid. Therefore, it would be interesting to combine classical NMF initializations techniques with our multilevel approach for further speedups.

Unstructured data A priori, applying a multilevel method to data for which we do not have any information about the matrix to factorize (and a fortiori about the solution) seems out of reach. In fact, in these circumstances, there is no sensible way to define the transfer operators. However, it is not hopeless to extend the multilevel idea to other type of data. For example, in text mining applications, the term-by-document matrix could be restricted by stacking synonyms or similar texts together (similarly as in [134]). Of course, this implies some a priori knowledge or preprocessing of the data (which should be cheap enough to be profitable).

Conclusion

In this chapter, we have described three of the most used NMF algorithms: MU, ANLS and HALS. We have then proposed a modification of MU and HALS, motivated by the analysis of the computations needed at each iteration of these algorithms. It was experimentally shown that these modified versions outperform the original ones, and compare favorably with a state-of-the-art algorithm, namely the ANLS method of Kim and Park [99]. HALS and its 106 4.3. A MULTILEVEL APPROACH

accelerated version are the most efficient variants for solving NMF problems, sometimes from afar. Besides the extensive numerical experiments, we have given a theoretical explanation for that fact, see Section 4.2.4. The reason is that NMF solutions are expected to be parts-based, i.e., in a decomposition M UV the supports of the columns of U (resp. rows of V ) will be ‘almost’ disjoint,≈ and an exact-coordinate descent method such as HALS allows to solve the nearly separable NNLS subproblems efficiently. We have also proposed a multilevel approach to speed up NMF algorithms whose efficiency was experimentally demonstrated.

107 CHAPTER 4. ALGORITHMS FOR NMF

108 Chapter 5

Nonnegative Factorization

In the special case where we seek a rank-one factorization (i.e., when r = 1), NMF is known to be polynomially solvable, cf. Section 2.2.2 and Chapter 3. The central problem studied in this chapter, called rank-one nonnegative fac- torization (R1NF), is an extension of rank-one NMF where the matrix to be approximated by the outer product of two nonnegative vectors is now allowed to contain negative elements. R1NF is introduced in Section 5.1, where it is shown that allowing negative elements in the matrix transforms the polynomially solvable rank-one NMF problem into a -hard problem. The reduction used in the proof is based on the problem ofNP finding a maximum-edge biclique in a bipartite graph. Because any algorithm designed to solve NMF must at least implicitly solve R1NF prob- lems, this hardness result sheds new light on the limitations of NMF algorithms and the complexity of NMF when the factorization rank r is fixed. In Section 5.2, stationary points of the R1NF problem used in the above- mentioned reduction are shown to coincide with bicliques of the corresponding graph. Building on that fact, Section 5.3 introduces a new type of biclique finding algorithm that relies on the application of a simple nonlinear optimiza- tion scheme (exact block-coordinate descent, similar as HALS) to the equiva- lent R1NF problem considered earlier, which only requires for each iteration a number of operations proportional to the number of edges of the graph. This method is then compared to a greedy heuristic and an existing algorithm on some synthetic and text mining datasets, and is shown to perform competi- tively. Finally, in Section 5.4, the multiplicative updates for NMF are generalized for matrices M with negative entries. This allows us to comprehend the differ- ences between the MU and HALS algorithms for NMF, and give an explanation for the better performance of HALS. 109 CHAPTER 5. NONNEGATIVE FACTORIZATION

5.1 Rank-one Nonnegative Factorization (R1NF)

Solving NMF amounts to finding r nonnegative rank-one factors U:kVk:, each having to satisfy the following equality as well as possible . U V M U V = R  0 k, :k k: ≈ − :i i: k ∀ i=k X6 i.e., each of them should be the best possible nonnegative rank-one approxima- tion of the corresponding residual matrix, denoted Rk. It is important to notice here that, unlike input matrix M, matrices Rk can contain negative elements. Therefore, any NMF algorithm has to solve, at least implicitly, the following subproblems 2 2 min M UV = Rk U:kVk: such that U:k 0, Vk: 0, Rm Rn F F U:k ,Vk: || − || || − || ≥ ≥ ∈ ∈ (5.1) for each k. We may wonder whether these subproblems can be solved efficiently, i.e., ask ourselves Is it possible to compute in polynomial time the best rank-one non- negative approximation of a matrix which is not necessarily non- negative?

A first observation is that computing the globally optimal value of U:k for a given value of Vk: can be done in closed-form (and similarly for computing the optimal value of Vk: for a fixed U:k), and this constitutes an efficient strategy to solve NMF; this is the basis for the HALS algorithm, see Chapter 4.

5.1.1 Definition of R1NF and Implications for NMF In order to shed some light on the above question, we define the problem of rank-one nonnegative factorization1 (R1NF) to be the variant of rank-one NMF where the matrix to be factorized can be any real matrix, i.e., is not necessarily m n nonnegative. Formally, given an m n real matrix R R × , one has to find a × ∈ + pair of nonnegative vectors u Rm and v Rn such that the nonnegative rank- one product uvT is the best possible∈ approximation∈ (in the Frobenius norm) of matrix R: T 2 min R uv F such that u 0, v 0. (R1NF) u Rm,v Rn || − || ≥ ≥ ∈ ∈ The next subsection shows that, in contrast with standard rank-one NMF, this problem is -hard, which provides the following new insights about the NMF problem: NP 1This terminology has already been used for the problem of finding a symmetric non- negative factorization, i.e., one where U=V, see, e.g., [87]. Also, in Chapter 3, a rank-k nonnegative factorization referred to an exact approximation of the factorized matrix. In this chapter, we assign it a different meaning. 110 5.1. RANK-ONE NONNEGATIVE FACTORIZATION

We cannot expect to be able to solve subproblems (5.1) in polynomial ⋄ time up to global optimality, and the HALS algorithm most probably cannot be improved with a better scheme for successively computing rank- one factors U:kVk: arising in (5.1). More generally, any algorithm for NMF cannot expect to solve at each iteration a subproblem where a given column of U:k and its corresponding row Vk: are to be optimized simultaneously. This shows that, in that sense, the partition of variables for block-coordinate schemes such as alternative nonnegative least squares (ANLS, optimizing U and V alternatively) and (implicitly) HALS is best possible. Recall that the -hardness result characterizing NMF requires both the ⋄ dimensions of matrixNP M and the factorization rank r of M to increase, and that the complexity of NMF for a fixed rank r is currently not known (except in the polynomially solvable rank-one case), see Chapter 3. Our hardness result on (R1NF) therefore suggests that NMF is also a difficult problem for any fixed rank r 2. Indeed, even if one was given the optimal solution of a NMF problem≥ except for a single rank-one factor, it is not guaranteed that one would be able to find this last factor in poly- nomial time, since the corresponding residual matrix is not necessarily nonnegative.

5.1.2 Complexity of R1NF and the Biclique Problem In this section, we show how the optimization version of the maximum-edge biclique problem (MB) can be formulated as a specific rank-one nonnegative factorization problem (R1NF-MB). Since the decision version of (MB) is - complete [127], this implies that rank-one nonnegative factorization (R1NF)NP is in general -hard. NP The Maximum-Edge Biclique Problem in Bipartite Graphs

A bipartite graph Gb is a graph whose vertices can be divided into two disjoint sets V1 and V2 such that there is no edge between two vertices in the same set

G = (V, E)= V V , E (V V ) . b 1 ∪ 2 ⊆ 1 × 2   A biclique Kb is a complete bipartite graph, i.e., a bipartite graph where all the vertices are connected

K = (V ′, E′)= V ′ V ′, E′ = (V ′ V ′) . b 1 ∪ 2 1 × 2   The so-called maximum-edge biclique problem in a bipartite graph Gb = (V, E) is the problem of finding a biclique K = (V ′, E′) in G (i.e., V ′ V and b b ⊆ E′ E) maximizing the number of edges E′ = V ′ V ′ . The decision problem: ⊆ | | | 1 || 2 | 111 CHAPTER 5. NONNEGATIVE FACTORIZATION

Given B, does Gb contain a biclique with at least B edges? has been shown to be -complete [127], and the corresponding optimization problem is at least NP-hard. NP m n Let Mb 0, 1 × be the biadjacency matrix of the unweighted bipartite graph G =∈{ (V } V , E) with V = s ,...s and V = t ,...t , i.e., b 1 ∪ 2 1 { 1 m} 2 { 1 n} Mb(i, j)=1 if and only if (si,tj) E. We denote by E the cardinality of E, i.e., the number of edges in G ; note∈ that E = M |2 .| The set of indices of b | | || b||F zero values in Mb will be denoted Z = (i, j) Mb(i, j)=0 , and its cardinality Z , with E + Z = mn. With this notation,{ | the maximum} biclique problem | | | | | | in Gb can be formulated as

T 2 min Mb uv u,v || − ||F uvT M, (MB) ≤ u 0, 1 m, v 0, 1 n, ∈{ } ∈{ } where ui = 1 (resp. vj = 1) means that node si (resp. tj ) belongs to the solution, u =0 (resp. v =0) otherwise. The constraint uvT M guarantees i j ≤ feasible solutions of (MB) to be bicliques of Gb. In fact,

M =0 u =0 or v =0, ij ⇒ i j i.e., if there is no edge between si and tj , they cannot belong to the same bi- clique. One can also check easily that the objective is equivalent to max ij uivj since Mb, u and v are binary: instead of maximizing the number of edges inside the biclique, one minimizes the number of edges outside. P Feasible solutions of (MB) correspond to bicliques of Gb. We will be partic- ularly interested in maximal bicliques, which are bicliques not contained in a larger biclique.

Complexity of R1NF Let us define the following rank-one nonnegative factorization problem

T 2 min Md uv F such that u 0, v 0 (R1NF-MB) u Rm,v Rn || − || ≥ ≥ ∈ ∈ where Md is the matrix Mb where the zero values have been replaced by d, i.e., − 1 if (Mb)ij =1 (Md)ij = , 1 i m, 1 j n, d if (Mb)ij =0 ≤ ≤ ≤ ≤  − where d 1 is a parameter. Although (R1NF-MB) is a continuous optimization problem,≥ we are going to show that, for a sufficiently large value of d, any of 112 5.1. RANK-ONE NONNEGATIVE FACTORIZATION

its optimal solutions has to coincide with a binary optimal solution of the corresponding (discrete) biclique problem (MB), which will then imply - hardness of (R1NF). NP Intuitively, if a d entry of Md is approximated by a positive value, say p, the corresponding term− in the squared Frobenius norm of the error is d2 +2pd+p2. As d increases, it becomes more and more costly to approximate d by a positive number and we will show that, for d is sufficiently large,− negative values of Md have to be approximated by zeros. Since the remaining values (not approximated by zeros) are all ones, the optimal rank-one solution will be binary. From now on, we say that a solution (u, v) coincides with another solution T T 1 (u′, v′) if and only if uv = u′v′ , i.e., if and only if u′ = λu and v′ = λ− v for some λ> 0. We note M+ the positive part of matrix M, with (M+)ij = max(0,Mij ) and M the negative part of matrix M, with (M )ij = max(0, Mij ) so that − − − M = M+ M . We also note min(M) the minimum element of matrix M, − − min(M) = minij (Mij ). Lemma 5.1. Any optimal rank-one approximation with respect to the Frobe- nius norm (see Problem (LRA-1) in Section 2.2) of a matrix M for which min(M) M contains at least one nonpositive entry. ≤ −|| +||F Proof. If M = 0, the result is trivial. If not, we have min(M) < 0 since min(M) M+ F . Suppose now (u, v) > 0 is a best rank-one approximation of M. Therefore,≤ −|| since|| the negative values of M are approximated by positive ones and since M has at least one negative entry, we have T 2 2 M uv F > M F . (5.2) || − || || −|| By the Eckart-Young theorem, the optimal rank-one approximation uvT must satisfy (see Section 2.2) M uvT 2 M 2 σ (M)2 = M 2 M 2 . || − ||F ≥ || ||F − max || ||F − || ||2 Clearly, 2 2 2 2 2 M F = M+ F + M F and M 2 min(M) , || || || || || −|| || || ≥ so that we can write T 2 2 2 2 M uv F M+ F + M F min(M) M F , || − || ≤ || || || −|| − ≤ || −|| which is in contradiction with (5.2).

Before stating the main result of this Chapter, about the equivalence of (R1NF-MB) and (MB), let us recall that the local minima of the best rank-one approximation problem with respect to the Frobenius norm are global minima (see Theorem 2.5). 113 CHAPTER 5. NONNEGATIVE FACTORIZATION

Theorem 5.2. For d E , any optimal solution (u,v) of (R1NF-MB) coincides with an optimal≥ solution| | of (MB), i.e., uvT is binary and uvT M . p ≤ b Proof. We focus on the entries of uvT which are positive and define their sup- port as

K = i 1, 2,...,m u > 0 and L = j 1, 2,...,n v > 0 . ∈{ } i ∈{ } j n o n (5.3)o

We also define u′ = v(K), v′ = w(L) and Md′ = Md(K,L) to be the subvectors and submatrix with indexes in K, L and K L. Since (u, v) is optimal for M , × d (u′, v′) must be optimal for M ′ . Suppose there is a d entry in M ′ , then d − d

min(M ′ )= d E = (M ) (M ′ ) , d − ≤− | | −|| d +||F ≤ −|| d +||F p so that Lemma 5.1 holds for Md′ . Since (u′, v′) is positive (i.e., it is located inside the feasible domain) and is an optimal solution of (R1NF-MB) for Md′ , it is a local minimum of problem without the nonnegativity constraints, i.e., the problem (LRA-1) of best rank-one approximation. By Theorem 2.5, this must be a global minimum. This is a contradiction with Lemma 5.1: (u′, v′) should contain at least one nonpositive entry. Therefore Md′ does not contain T any d entry, and we have Md′ = 1 K L which implies than u′v′ = Md′ by − | |×| | T optimality (it is the unique rank-one solution u′v′ with objective value equals to zero) and finally allows to conclude that uvT is binary and uvT M . ≤ b We have just proven the following theorem:

Theorem 5.3. Rank-one nonnegative factorization (R1NF) is -hard. NP 5.2 Stationary Points

We have shown that optimal solutions of (R1NF-MB) coincide with optimal solutions of (MB) for d E , whose computation is -hard. In this section, we focus on stationary≥ | points| of (R1NF-MB) insteadNP: we show how p they are related to the feasible solutions of (MB). This result will be used in Section 5.3 to design a new type of biclique finding algorithm.

5.2.1 Definitions and Notations The pair (u, v) is a stationary point for problem (R1NF-MB) if and only if it satisfies its first-order optimality conditions (cf. Section 2.1), i.e., if and only if

u 0, µ = (uvT M )v 0 and u µ =0, (5.4) ≥ − d ≥ ◦ v 0, λ = (uvT M )T u 0 and v λ =0. (5.5) ≥ − d ≥ ◦ 114 5.2. STATIONARY POINTS

Of course, we are only interested in nonzero solutions and, assuming that u =0 and v =0, one can check that conditions (5.4)-(5.5) are equivalent to 6 6 T Mdv Md u u = max 0, 2 and v = max 0, 2 . (5.6) v 2 u 2  || ||   || ||  We define three sets of rank-one matrices:

1. Given a positive real number d, Sd is the set of nonzero stationary points of (R1NF-MB), i.e.,

T m n S = uv R × (u, v) satisfies (5.4) and (5.5) ; d { ∈ | } 2. F is the set of feasible solutions of (MB), i.e.,

T m n F = uv R × (u, v) is a feasible for (MB) , { ∈ | } 3. B is the set of maximal bicliques of (MB), i.e., uvT B if and only if uvT F and uvT coincides with a maximal biclique. ∈ ∈ 5.2.2 Stationarity of Maximal Bicliques The next theorem states that, for d sufficiently large, the only nonzero feasible solutions of (MB) that are stationary points of (R1NF-MB) are the maximal bicliques. Theorem 5.4. For d> max(m,n) 1, F S = B. − ∩ d Proof. If M = 0 the result is trivial. Otherwise, let us show that uvT B if T T T ∈ and only if uv F and uv Sd. By definition, uv belongs to B if and only if uvT belongs to∈ F and is maximal,∈ i.e.,

(*) ∄i such that u =0 and M (i, j)=1, j s.t. v =0, i d ∀ j 6 (**) ∄j such that v =0 and M (i, j)=1, i s.t. u =0. j d ∀ i 6 Since uvT is binary and u =0, the nonzero entries of v must be equal to each other. Noting L the support6 of v (see Equation (5.3)), we then have

v v = || ||1 = C, j L, j L ∀ ∈ | | for some C R . Moreover, d> max(m,n) 1 so that (*) is equivalent to ∈ + −

∄ i such that ui =0 and Md(i, :)v > 0

⇐⇒ 1 Md(i,:) 1 Md(i,:)v u =0 M (i, :)v 0 and u =0 u = = || || = 2 . i d i i C v 1 v ⇒ ≤ 6 ⇒ || || || ||2 115 CHAPTER 5. NONNEGATIVE FACTORIZATION

These are exactly the stationarity conditions for v =0, cf. Equation (5.6). By symmetry, (**) is equivalent to the stationarity condition6 s for v, so that we can conclude that uvT B if and only if uvT F and uvT S . ∈ ∈ ∈ d

5.2.3 Limit Points of Sd

Theorem 5.4 implies that, for d sufficiently large, B Sd. It would be in- teresting to have the converse affirmation, i.e., to show⊂ that for d sufficiently large, any stationary point of (R1NF-MB) corresponds to a maximal biclique of (MB). As we will see later, this property unfortunately does not hold. How- ever, the following slightly weaker result can be proved: as d goes to infinity, the points in Sd get closer and closer to feasible solutions of (MB), i.e., to bicliques of the graph Gb. As a consequence, rounding stationary points of (R1NF-MB) for d sufficiently large will generate bicliques of Gb.

Lemma 5.5. The set S is bounded, i.e., d> 0, uvT S : d ∀ ∀ ∈ d uvT = u v E . || ||2 || ||2|| ||2 ≤ | | p T Proof. If u =0 or v =0, this is trivial; otherwise for any uv Sd, we have by (5.6) ∈

Mdv max(0,Md)v 2 max(0,Md) F E u 2 = max 0, 2 || 2 || || || = | |. || || v 2 2 ≤ v 2 ≤ v 2 v 2  || ||  || || || || p|| ||

Lemma 5.6. For uvT S , if M (i, j)= d and (uvT ) > 0, then ∈ d d − ij u v 0

Proof. By (5.6), we have

u 0 < v u 2 = M (:, j)T u u (d + 1)u 0

Theorem 5.7. As d goes to infinity, stationary points of (R1NF-MB) get arbitrarily close to feasible solutions of (MB), i.e., ǫ> 0, D s.t. d>D: ∀ ∃ ∀ T T max min uv ubv F < ǫ. (5.7) T b uv Sd ubvb F || − || ∈ ∈ 116 5.2. STATIONARY POINTS

T T Proof. Let uv Sd. We can assume w.l.o.g. that uv > 0 ; otherwise, we consider the subproblem∈ with the vectors u(K) and v(L) where K (resp. L) is the support of u (resp. v) and the matrix M(K,L), see Equation (5.3). In fact, it is clear that if (u(K), v(L)) is close to a feasible solution of (MB) for M (K,L), then (u, v) is for M . We also assume w.l.o.g. that v =1; in fact, b b || ||2 if uvT S , λu 1 v S , λ> 0. Note that Lemma 5.5 implies u E . ∈ d λ ∈ d ∀ || ||2 ≤ | | By (5.6),   p M T u u = M v and v = d . (5.8) d v 2 || ||2 Therefore, (u/ u , v) > 0 is a pair of singular vectors of M associated with || ||2 d the singular value u 2 > 0. If Md = 1m n, the only pair of positive singular || || × 1 1 1 1 T vectors of Md is √m m, √n n so that uv = Mb coincides with a feasible solution of (MB).  Otherwise, when Md = 1m n, we define 6 × A = i M (i, j)=1, j and B = j M (i, j)=1, i , (5.9) d ∀ d ∀ n o n o and their complements A¯ = 1, 2,...,m A, B¯ = 1, 2,...,n B; hence, { }\ { }\

Md(A, :) = 1 A n and Md(:,B)= 1m B . | |× ×| | These two sets clearly define the biclique A B in graph G , or, equivalently, a × b (binary) feasible solution (¯uA, v¯B) for problem (MB), where u¯A is equal to one for indices in A and to zero otherwise (similarly for v¯B and B). We are now T going to show that, for d sufficiently large, uv is arbitrarily close to u¯Av¯B, which will prove our claim. Using Lemma 5.6 and the fact that x √n x , x Rn, we get || ||1 ≤ || ||2 ∀ ∈

¯ m E ¯ √n 0

u 1 Noting kv = || ||2 and using (5.8), we get v(B) = kv 1 B . Combining this u 2 | | with (5.10) gives|| || √n 1 B¯ < v 2 v(B¯) 2 = v(B) 2 = B k2 v 2 =1. (5.13) − | |d +1 || ||2 − || ||2 || ||2 | | v ≤ || ||2 T Moreover, (5.8) implies u(A)= 1 A mv = v 11 A so that | |× || || | | ¯ ¯ √n B kv u(A) = ( u(B) 1 + u(B) 1)1 A < B kv + B . (5.14) | | ≤ || || || || | | | | | |d +1

Finally, multiplying (5.14) by kv, combining it with (5.13) and noting that, since v =1, we have k 1, we obtain || ||2 v ≤ B¯ √n B¯ √n 1 | | 1 A B

Example 5.1. Let 0 1 d 1 M = and M = . b 1 1 d −1 1     0 1 Clearly, belongs to the set B, i.e., it corresponds to a maximal bi- 0 1   clique of the graph generated by Mb. By Theorem 5.4, for d> 1, it belongs to T T Sd, i.e., (u, v) = ((1 1) , (0 1) ) is a stationary point of (R1NF-MB). For d > 1, one can also check that the singular values of Md are different and that the second pair of singular vectors is positive. Since it is a positive stationary point of the unconstrained problem, it is also a stationary point of (R1NF-MB). As d goes to infinity, it must get closer to a biclique of (MB) (The- orem 5.7). Moreover, Md is symmetric, so that the right and left singular vec- tors are equal to each other. Figure 5.1 shows the evolution2 with respect to d T of this positive singular vector (v1, v2), which is such that (v1 v2) (v1 v2) Sd. It converges to (0 1), which means that the outer product of the left and∈ right 0 0 singular vectors converges to , which is a biclique, i.e., a member 0 1 of F . We also note that this biclique is not maximal, which shows that the converse to Theorem 5.4 is false, even asymptotically as d goes to infinity. Corollary 5.8. For d 2max(m,n) E , (5.16) ≥ | | any stationary point uvT S of (R1NF-MB) can be rounded3 to generate a ∈ d p 2 By Wedin’s theorem (cf. matrix perturbation theory [141]), singular subspaces of Md associated with a positive singular value depend continuously on d. 3Values smaller than 0.5 are set to 0, other values are set to 1. 118 5.3. BICLIQUE FINDING ALGORITHM

Figure 5.1: Evolution of (v1, v2).

biclique of the graph Gb generated by Mb.

Proof. Clearly, the condition

T 1 max min max(uv ubvb)ij < , T uv Sd ubvb F ij − 2 ∈ ∈ is sufficient to guarantee that rounding any stationary point of (R1NF-MB) will generate a biclique of Gb. Looking back at Theorem 5.7, one can check that this is satisfied (cf. Equations (5.11), (5.12) and (5.15)) for d given by (5.16) (note that w.l.o.g. E max(m,n), i.e., that each row and each column | |≥ of Mb has at least one nonzero entry, otherwise they can be removed).

5.3 Biclique Finding Algorithm

Many real world applications rely on the discovery of maximal biclique sub- graphs, e.g., web community discovery, biological data analysis and text min- ing [115]. Some algorithms aim at detecting all the maximal bicliques, which is computationally challenging (this is closely related to formal concept analysis, see [65]). In fact, there might be an exponential number of such bicliques and the problem is at least -hard since it would solve (MB), see [2] and the references therein. For largeNP datasets, it is in general hopeless to extract all the maximal bicliques in a reasonable computational time. Therefore, one can be interested in finding only large maximal bicliques, which is what we focus on in this section. 119 CHAPTER 5. NONNEGATIVE FACTORIZATION

For example, a recent data analysis technique called binary matrix factor- ization (BMF) aims at expressing a binary matrix M as the product of two binary matrices [120,154,155]. Each rank-one factor of the decomposition cor- responds to a bicluster in the bipartite graph Gb generated by M. Finding bicliques in G allows to solve BMF recursively, since bicliques of G correspond to binary rank-one underapproximations of M (see also Chapter 6). In this section, we present a heuristic scheme designed to find large bicliques in a given graph, whose main iteration requires a number of operations propor- tional to the number of edges E in the graph. It is based on the reduction from the maximum-edge biclique| | problem to (R1NF-MB) (Theorems 5.2, 5.4 and 5.7). We compare its performance on random graphs and text mining datasets with two other algorithms requiring ( E ) operations per iteration. O | |

5.3.1 Description For d sufficiently large, stationary points of (R1NF-MB) are close to bicliques of (MB) (Corollary 5.8). Since (R1NF-MB) is a continuous optimization prob- lem, any standard nonlinear optimization technique can in principle be used to compute such a stationary point. One can therefore think of applying an algo- rithm that finds a stationary point of (R1NF-MB) in order to localize a large biclique of the graph generated by Mb. Moreover, since the two problems have the same objective function, stationary points with larger objective functions will correspond to larger bicliques. Of course, solving (R1NF-MB) up to global optimality, i.e., finding the best stationary point, is as hard as solving (MB). However, one can hope that the nonlinear optimization scheme used will converge to a relatively large biclique of Gb (i.e., with an objective function close to the global optimum) ; this hope will be confirmed empirically later in this section. We choose to use the coordinate descent method presented earlier, i.e., solve alternatively the problem in the variable u for v fixed, then in the variable v for u fixed, since the optimal solutions for each of these steps can be written in closed form, cf. Equation (5.6). We also propose, instead of fixing the value of parameter d to the value recommended by Corollary 5.8, to start with a lower initial value d0 and gradually increase it (with a multiplicative factor γ > 1) un- til it reaches the upper bound D equal to the recommended value. Convergence of the resulting scheme, Algorithm 10, is proved in the next Theorem.

Theorem 5.9. The rounding of every limit point of Algorithm 10 generates a biclique of Gb, the bipartite graph generated by Mb.

Proof. When an exact two-block coordinate descent is applied to an optimiza- tion problem with a continuously differentiable objective function and a feasible domain equal to the Cartesian product of two closed convex sets (the two blocks 120 5.3. BICLIQUE FINDING ALGORITHM

m n correspond to R+ and R+ in this case), every limit point of the iterates is a stationary point (Theorem 2.2). After a finite number of steps of Algorithm 10, parameter d attains the upper bound D = 2max(m,n) E and no longer changes, so that we can invoke this result and, using Corollary| | 5.8, guarantee that the resulting limit points can be rounded to generate a feasible solution of (MB), i.e., a biclique of Gb.

Algorithm 10 Biclique Finding Algorithm based on Nonnegative Factorization

Require: Bipartite graph Gb = (V, E) described by biadjacency matrix m n n M 0, 1 × , initial values v R and d > 0, parameter γ > 1. b ∈ { } 0 ∈ ++ 0 1: Set parameter D = 2max(m,n) E and initialize variables d d , v v | | ← 0 ← 0 2: for k =1, 2,... do 3:

u max 0, (1 + d)M v d v ; (5.17) ← b − || ||1 u u/ max( u) ;  ← T (1 + d)Mbu d u 1 v max 0, 2− || || ; (5.18) ← u 2  || ||  d min(γd,D) ; ← 4: end for

Note that the normalization of u (u u/ max(u)) performed by Algo- rithm 10 only changes the scaling of the← solution uvT and allows (u, v) to converge to binary vectors. Finally, one can easily check that Algorithm 10 requires only ( E ) operations per iteration, the main cost being the compu- O | | T tation of the matrix-vector products Mbv and Mb u (the rest of an iteration requiring only (max(m,n)) operations). O

Parameters

It is not clear a priori how the initial value d0 should be selected. We observed that it should not be chosen too large: otherwise, the algorithm often converges to the trivial solution: the empty biclique. In fact, in that case, the negative terms (d v 1 and d u 1) in (5.17) and (5.18) will dominate, even during the initial steps|| || of the algorithm,|| || and the solution will be set to zero4. On the other hand, the algorithm with d = 0 is equivalent to the power method applied to Mb, and then converges (under some mild assumptions) to

4In practice, we used a safety procedure which reduces the value of d whenever u (resp. v) is set to zero and reinitializes u (resp. v) to its previous value. 121 CHAPTER 5. NONNEGATIVE FACTORIZATION

the best rank-one approximation of Mb [68]. Hence we observed that when d0 is chosen too small, the iterates will in general converge to the same solution. In order to balance positive and negative entries in Md, we found appropri- ate to choose an initial value of d such that (Md)+ F (Md) F , i.e., || || ≈ || −||

Mb F E d0 || || = | | , (5.19) ≈ Z s Z | | | | p (recall Z is the number of zero entries in Mb). For our tests we chose d0 = E | | 2 |Z| , which appears to work well in practice. | | qFinally, the algorithm does not seem to be very sensitive to multiplicative factor γ and selecting values around 1.1 gives good results; this value will be used for the computational tests below.

5.3.2 Other Algorithms in ( E ) Operations O | | We briefly present here two other algorithms designed to find large bicliques using ( E ) operations per iteration. O | | Greedy Heuristic The simplest heuristic one can imagine is to add, at each step, the vertex which is connected to the most vertices in the other side of the bipartite graph. Once a vertex is selected, the vertices which are not connected to the chosen vertex are deleted. The procedure is repeated on the remaining graph until one obtains a biclique, which is then necessarily maximal.

Motzkin-Strauss Formalism In [56], Ding and co-authors extend the generalized Motzkin-Strauss formalism, defined for cliques, to bicliques by defining the optimization problem

T max x Mb y x F α,y F β ∈ x ∈ y

α n n α β n n β where Fx = x R+ i=1 xi = 1 , Fy = y R+ i=1 yi = 1 and 1 < α,β 2. { ∈ | } { ∈ | } Multiplicative≪ updates forP this problem are then provided: P

1 T 1 M y α M x β x x b y y b T , T . (MS) ← ◦ x Mb y ← ◦ x Mb y     This algorithm does not necessarily converge to a biclique: if α and β are not sufficiently small, it may only converge to a dense bipartite subgraph (a 122 5.3. BICLIQUE FINDING ALGORITHM

bicluster). In particular, for α = β = 2, it converges to an optimal rank-one solution of the unconstrained problem, as Algorithm 10 does for d =0. For our tests, we chose α = β =1.05 as recommended in [56]. In order to evaluate the quality of the solutions provided by this algorithm when it did not converge to a biclique, we used the following two post-processing procedures to convert a bicluster into a biclique: 1. Greedy (MS): extract from the generated bicluster a biclique using the greedy heuristic presented above. 2. Recursive (MS): use the algorithm recursively on the extracted bicluster, i.e., rerun it on the positive submatrix while decreasing the values of α 1 β 1 parameters α and β with α 1+ − and β 1+ − . ← 2 ← 2 5.3.3 Results Synthetic Data We first present numerical experiments with random graphs: for each density (0.1, 0.3, 0.5, 0.7 and 0.9), 100 bipartite graphs with 200 vertices (100 on each side, i.e., m = n = 100) were randomly generated (the probability that an edge belongs to the graph is equal to the density). We then performed, for each graph, 100 runs with the same random initializations and each algorithm was allotted 100 iterations, except for the greedy heuristic which was always run until completion and only once for each graph (since it does not require a random initialization). Actual amounts of CPU time spent by Algorithms MS and 10 were comparable, as expected from their similar iteration complexity, while the greedy heuristic was faster. Figure 5.2 displays the performance profile for these experiments [57], where the performance function at ρ 1 is defined as the percentage, among all graphs and all runs, of bicliques whose≤ sizes (i.e., number of edges) is larger than ρ times the size the largest biclique found by any algortihm in the corresponding graph, i.e., # bicliques size ρ size of best biclique found performance(ρ)= { | ≥ × }. #runs On such a performance profile, the higher the curve, the better ; more specif- ically, the left part of the graph measures efficiency, i.e., how often a given algorithm produces the best biclique among its peers, while the right part esti- mates robustness, i.e., how far from the best non-optimal solutions are. These two aspects are also reported more quantitatively in Table 5.1, which displays the value of the performance function at ρ = 1 (Efficicency, i.e., how often a given algorithm finds a biclique with largest size) and the smallest value of ρ such that the performance function is equal to 100% (Robustness, i.e., the relative size of the worst biclique found). 123 CHAPTER 5. NONNEGATIVE FACTORIZATION

Figure 5.2: Performance profile for random graphs (densities from 0.1 to 0.9).

We observe on the performance profile that both Algorithm 10 and (MS) perform better than the greedy heuristic. The variant of (MS) using recursive post-processing performs slightly better than the one based on the use of the greedy heuristic. Nevertheless, Algorithm 10 generates in general better solu- tions: it is more efficient (9% of its solutions are ‘optimal’, twice better than the greedy (MS)) and more robust (all solutions are at most a factor 0.56 away from the best solution, better than 0.42 and 0.30 of other algorithms).

Greedy Algo. 10 Greedy M.-S. Rec. M.-S. Both (Fig. 5.2) 1% | 0.42 9% | 0.56 4% | 0.30 2% | 0.30 Sparse (Fig. 5.3) 0% | 0.33 24% | 0.39 14% | 0.28 14% | 0.28 Dense (Fig. 5.3) 2% | 0.76 16% | 0.80 6% | 0.68 2% | 0.70

Table 5.1: Efficiency | Robustness.

It is worth noting that the algorithms behave quite differently on sparse and dense graphs. Using the same setting as before, Figure 5.3 displays perfor- mance profiles for sparse graphs (on the left, with densities 0.05, 0.1, 0.15 and 0.2) and dense graphs (on the right, with densities 0.8, 0.85, 0.9 and 0.95). For sparse graphs, both versions of (MS) seem to coincide and the greedy heuristic performs significantly worse. For dense graphs, the greedy heuristic coincides with the greedy (MS) and performs almost as well as the recursive (MS). How- ever, in all cases, Algorithm 10 performs better. It is more efficient: it finds 124 5.3. BICLIQUE FINDING ALGORITHM

Figure 5.3: Performance profiles for random graphs: sparse (left, from 0.05 to 0.2) and dense (right, from 0.8 to 0.95). the best solution in 24% (resp. 16%) of the runs for sparse (resp. dense) graphs while (MS) only achieves 14% (resp. 6%) and the greedy heuristic 0% (resp. 2%). It is also more robust: all solutions are at most a factor 0.39 (resp. 0.80) away from the best solution for sparse (resp. dense) graphs, bigger than the best factor 0.33 (resp. 0.76) of the other algorithms.

Text Datasets If parameter D in Algorithm 10 is chosen smaller than the value recommended by Corollary 5.8, the algorithm is no longer guaranteed to converge to a bi- clique. However, the negative entries in Md will force the corresponding entries of the solutions of (R1NF-MB) to be small (cf. Theorem 5.7). Therefore, in- stead of a biclique, one gets a dense submatrix of Mb, i.e., a bicluster. Algo- rithm 10 can then be used as a biclustering algorithm and the density of the corresponding submatrix will depend on the choice of parameter D between 0 and 2 max(m,n) E . We test this approach on the six text mining datasets (with sparse matrices)| | described in Table 5.2.

Figure 5.4 compares Algorithms 10 and MS for varying values of their parame- ters: for the Motzkin-Strauss formalism, we tested α = β [1.3, 1.9] with step [3, 9] ∈ size 0.025 and, for Algorithm 10, D d010 with step size 0.25 (d0 given by Equation (5.19)). For each value, we∈ performed 10 runs (same initializations for both algorithms and 500 iterations) and plotted all the non-dominated solu- tions (i.e., for which no other solution has both larger size and higher density) for each dataset. We observe that our approach consistently generates better results since its curves dominate the ones of the Motzkin-Strauss formalism, i.e., the biclusters it finds are denser for the same size or larger for the same density. 125 CHAPTER 5. NONNEGATIVE FACTORIZATION

Data m n E sparsity | | classic 7094 41681 223839 99.92 sports 8580 14870 1091723 99.14 reviews 4069 18483 758635 98.99 hitech 2301 10080 331373 98.57 ohscal 11162 11465 674365 99.47 la1 3204 31472 484024 99.52

Table 5.2: Text mining datasets [156] (sparsity is given in %: 100 Z /(mn)). ∗ | |

Finally, we mention that Algorithm 10 can be further enhanced in the fol- lowing ways:

It is applicable to non-binary matrices, i.e., weighted graphs. Theorem 5.2 ⋄ can easily be adapted using d M+ F (Lemma 5.1), and one can show that the resulting algorithm≥ will || converge|| to the optimal rank-one approximation of a positive submatrix of M.

It is possible to give more weight to a given side of the biclique by adding ⋄ regularization terms to the cost functions. For example, on can consider the following objective function

T 2 2 2 min Md uv F + α u 2 + β v 2 u,v 0 || − || || || || || ≥ which our algorithm can handle after some straightforward modifications (namely, the optimal solution for u when v is fixed can still be written in closed-form, and vice versa).

n n If M 0, 1 × is the of a (non bipartite) graph ⋄ b ∈ { } G = (V, E) with V = v1,...,vn , i.e., Mb(i, j)=1 (vi, vj ) E, one can check that formulation{ (MB)} corresponds to the⇔ maximum-∈edge bi- clique problem in any graph. This only requires that the diagonal entries of Mb are set to zero (no self loop in the graph) since a vertex cannot si- multaneously belong to both sides of a biclique. Therefore, all the results of this chapter are actually valid for not necessarily bipartite graphs.

5.4 Multiplicative Updates for Nonnegative Fac- torization

In this section, the multiplicative updates (MU) of Lee and Seung presented in Section 4.1.1 to find approximate solutions of NMF are generalized to not m n necessarily nonnegative matrices: given a matrix M R × and a factorization ∈ 126 5.4. MU FOR NONNEGATIVE FACTORIZATION

Figure 5.4: Normalized size vs. density for the Motzkin-Strauss formalism (dashed line) and Algorithm 10 based on (R1NF-MB) (solid line). The x- axis indicates the normalized sizes of the extracted clusters (i.e., number of entries in the extracted submatrix divided by the number of entries in the orig- inal matrix) while the y-axis indicates the density of these clusters (number of nonzero entries divided by the total number of entries) for the text datasets of Table 5.2. rank r, nonnegative factorization (NF) is defined as

2 min M UV F such that U 0, V 0. (NF) U Rm×r,V Rr×n || − || ≥ ≥ ∈ ∈ 127 CHAPTER 5. NONNEGATIVE FACTORIZATION

Other than providing a way of computing approximate solutions of (NF), this result will also help us to understand why the updates of Lee and Seung are not very efficient in practice, see Section 5.4.1.

Of course, any real matrix M can be written as the difference of two non- negative matrices: M = P N with P,N 0. Using the same idea as in Section 4.1.1, we get the following− multiplicative≥ update rules :

Theorem 5.10. For U, V 0 and M = P N with P,N 0, the cost function M UV is nonincreasing≥ under the following− update≥ rules: || − ||F [P V T ] [U T P ] U U , V V . (5.20) ← ◦ [UV V T + NV T ] ← ◦ [U T UV + U T N]

Proof. We only treat the proof for U since the problem is perfectly symmetric. The cost function can be split into m independent components related to each row of the error matrix, each depending on a specific row of P , N and U, which we call respectively p, n and u. Hence, we can treat each row of U separately, and we only have to show that the function

1 F (u)= p n uV 2 . 2|| − − ||F is nonincreasing under the following update

[pV T ] u0 u0 T T , u0 > 0. (5.21) ← ◦ [u0V V + nV ] ∀

F is a quadratic function so that

1 F (u)= F (u ) + (u u ) F (u )+ (u u ) 2F (u )(u u )T , u , 0 − 0 ∇ 0 2 − 0 ∇ 0 − 0 ∀ 0 with F (u ) = ( p+n+u V )V T and 2F (u )= V V T . Let G be a quadratic ∇ 0 − 0 ∇ 0 model of F around u0: 1 G(u)= F (u ) + (u u ) F (u )+ (u u )K(u )(u u )T 0 − 0 ∇ 0 2 − 0 0 − 0

T T with K(u ) = diag [u0V V +nV ] . G has the following nice properties (see 0 [u0] below):  

(1) G is an upper approximation of F , i.e., G(u) F (u), u; ≥ ∀ (2) The global minimum of G(u) is nonnegative and given by (5.21). 128 5.4. MU FOR NONNEGATIVE FACTORIZATION

Therefore, the global minimum of G, given by (5.21), provides a new iterate which guarantee the monotonicity of F . In fact,

F (u0)= G(u0) min G(u)= G(u∗) F (u∗). ≥ v ≥ It remains to show that (1) and (2) hold. T (1) G(u) F (u) u. This is equivalent to K(u0) V V positive semidefinite ≥ ∀ − T (PSD). Lee and Seung have proved [106] that A = diag [u0V V ] V V T is [u0] − T PSD (see also [89]). Since B = diag [nV ] is a diagonal nonnegative matrix [u0] for u > 0 and nV T 0, A + B = K(u ) V V T is also PSD. 0 ≥ 0 − (2) The global minimum of G is given by (5.21):

1 u∗ = argminuG(u) = u0 K− (u0) F (u0) − ∇T T T [ pV + (u0V V + nV )] = u0 u0 − T T − ◦ [u0V V + nV ] [pV T ] = u0 T T . ◦ [u0V V + nV ]

As with standard multiplicative updates (cf. Theorem 4.2), convergence can be guaranteed with a simple modification:

Theorem 5.11. For every constant ǫ> 0 and for M = P N with P,N 0, M UV is nonincreasing under − ≥ || − ||F [P V T ] [U T P ] U max ǫ,U , V max ǫ, V , ← ◦ [UV V T + NV T ] ← ◦ [U T UV + U T N]    (5.22) for any (U, V ) ǫ. Moreover, every limit point of a sequence of points obtained by using update≥(5.22) is a stationary point of the optimization problem (4.2).

Proof. We use exactly the same notation as in the proof of Theorem 5.20, so that F (u0)= G(u0) min G(u)= G(u∗) F (u∗), u0 ǫ ≥ u ǫ ≥ ≥ ≥ remains valid. By definition, K(u0) is a diagonal matrix implying that G(u) is the sum of r independent quadratic terms, each depending on a single entry of v. Therefore,

[pV T ] argminu ǫG(u) = max ǫ,u0 T T , ≥ ◦ [u0V V + nV ] 129  CHAPTER 5. NONNEGATIVE FACTORIZATION

and the monotonicity is proved. Let (U,¯ V¯ ) be a limit point of a sequence (U k, V k) generated by (5.22). The k k { } monotonicity implies that M U V F converges to M U¯V¯ F since the cost function is bounded{|| from− below.|| Moreover,} || − ||

U¯ = max ǫ, α U¯ , i, k (5.23) ik ik ik ∀   where P V¯ T α = i: k: , ik ¯ ¯ ¯ T ¯ T Ui:V Vk: + Ni:Vk: ¯ ¯ ¯ T which is well-defined since Ui:V Vk: > 0. One can easily check that the station- arity conditions of (4.2) for U¯ are

U¯ ǫ, α 1 and (U¯ ǫ) (α 1)=0, i,k. ik ≥ ik ≤ ik − ik − ∀ Finally, by (5.23), we have either U¯ik = ǫ and αik 1, or U¯ik > ǫ and αik = 1, i, k. The same can be done for V¯ by symmetry. ≤ ∀ In order to implement the updates (5.20), one has to choose the matrices P and N. It is clear that P,N 0 such that M = P N, there exists a matrix ∀ ≥ − C 0 such that the two components P and N can be written P = M+ + C and≥N = M + C. When C goes to infinity, the above updates do not change the matrices−U and V , which seems to indicate that smaller values of C are preferable. Indeed, in the case r = 1, one can prove that C = 0 is an optimal choice: Theorem 5.12. For any P,N 0 such that M = P N, and u Rn , v Rm ≥ − ∀ ∈ + ∈ + M u vT M u vT M uvT , (5.24) || − 1 ||F ≤ || − 2 ||F ≤ || − ||F for [M v] [P v] u = v + and u = v . 1 ◦ [uvT v + M v] 2 ◦ [uvT v + Nv] − Proof. The second inequality of (5.24) is a consequence of Theorem 5.20. For the first one, we treat the inequality separately for each entry of u, i.e., we prove that M u vT M u vT , i. || i: − 1i ||F ≤ || i: − 2i ||F ∀ Let define ui∗ as the optimal solution of the unconstrained problem, i.e.,

T Mi:v u∗ = argmin M u v = , i ui || i: − i ||F vT v and a,b,d 0, e> 0, as ≥ T a = (M+)i:v, b = (M )i:v, d = (P M+)i:v and e = uiv v. − − 130 5.4. MU FOR NONNEGATIVE FACTORIZATION

Noting that P M+ = N M , we have − − − a b a a + d u∗ = u − , u = u and u = u . i i e 1i i e + b 2i i e + b + d       Suppose u∗ u : then we have i ≥ i a b a b a − 1 a b e 0 − . e ≥ ⇒ − − ≥ ⇒ e ≥ e + b

a+d Moreover, we have1 e+b+d since u2i is a better solution than ui (Theo- rem 5.20). Finally, ≤

a + d a a b 1 − u u u u∗. ≤ e + b + d ≤ e + b ≤ e ⇒ i ≤ 2i ≤ 1i ≤ i

The case u∗ u is similar. i ≤ i Unfortunately, this result does not hold for r> 1. Surprisingly, this is even true for nonnegative matrices, i.e., one can improve the effect of a standard Lee and Seung multiplicative update by using a well-chosen matrix C.

Example 5.2. With the following matrices

0 0 1 1 1 0 0 1 1 0 0 M = 0 1 1 ,U = 1 0 , V = , C = 0 0 0 ,     1 0 1   1 1 0 1 1   1 0 0       we have M U ′V F < M U ′′V F where U ′ (resp. U ′′) is updated following (5.20) using|| −P = M|| + C||and−N = C|| (resp. P = M and N =0). However, in practice, it seems that the choice of a proper matrix C is non- trivial and we were not able to use this modification to accelerate the speed of convergence.

5.4.1 How good are the Multiplicative Updates? In this section, we use Theorem 5.10 to interpret the multiplicative updates and show why the HALS algorithm performs much better in practice.

An Improved Version of the MU The aim of the MU is to improve a current solution (U, V ) 0 by optimizing alternatively U (V fixed), and vice-versa. In order to prove≥ the monotonicity of the MU, M UV F was shown to be nonincreasing under an update of a single row of|| U −(resp.|| column of V ) since the objective function can be split 131 CHAPTER 5. NONNEGATIVE FACTORIZATION

into m (resp. n) independent quadratic terms, each depending on the entries of a row of U (resp. column of V ); cf. proof of Theorem 5.10. However, there is no guarantee, a priori, that the algorithm is also nonin- creasing with respect to an individual update of a column of U (resp. row of U). In fact, each entry of a column of U (resp. row of V ) depends on the other entries of the same row (resp. column) in the cost function. The next theorem states that this property actually holds.

Corollary 5.13. For U, V 0, M UV is nonincreasing under5 ≥ || − ||F T T [MVk: ] [U:kM] U:k U:k T , Vk: Vk: T , k, (5.25) ← ◦ [UV Vk: ] ← ◦ [U:kUV ] ∀ i.e., under the single update of any column of U or any row of V using the MU.

Proof. This is a consequence of Theorem 5.10 using P = M and N = i=k U:iVi:. 6 In fact, M UV F = (M i=k U:iVi:) U:kVk: F . || − || || − 6 − || P P Corollary 5.13 sheds light on a very interesting fact: the multiplicative updates are also trying to optimize alternatively the columns of U and the rows of V using a specific cyclic order: first, the columns of U and then the rows of V . We can now point out two ways of improving the MU:

1. When updating a column of U (resp. a row of V ), the columns (resp. rows) already updated are not taken into account: the algorithm uses their old values;

2. The multiplicative updates are not optimal for the column and row sub- problems: P = M+ and N = M (cf. Theorem 5.12). Moreover, there is actually a closed-form6 solution6 − for these subproblems (cf. HALS algo- rithm, Section 4.1.3).

Therefore, using Theorem 5.12, we have the following new improved updates

Corollary 5.14. For U, V 0, M UV is nonincreasing under ≥ || − ||F T T [(Rk)+Vk: ] [U:k(Rk)+] U:k U:k T T , Vk: Vk: T T , k, ← ◦ [U:kVk:Vk: + (Rk) Vk: ] ← ◦ [U:k U:kVk: + U:k(Rk) ] ∀ − −(5.26) with Rk = M i=k U:iVi:. Moreover, the updates (5.26) perform better than the updates (5.25)− ,6 but worse than HALS which is optimal. P 5The proof of convergence of the MU (see Theorem 5.10) can also be used to derive this result. In fact, the quadratic upper approximation (called auxiliary function) used in the proof is a sum of r independent quadratic terms. Therefore, minimizing only one term minimizes the sum of all terms which implies this result. 132 5.4. MU FOR NONNEGATIVE FACTORIZATION

Proof. This is a consequence of Theorem 5.10 and 5.12 using P = (Rk)+ and N = (Rk) . In fact, M UV F = Rk U:kVk: F . − || − || || − || Figure 5.5 shows an example of the behavior of the different algorithms: the original MU (Section 4.1.1), the improved version6 (Corollary 5.14) and the optimal HALS method (Section 4.1.3). The test was carried out on a commonly used data set for NMF: the cbcl face database (cf. Section 4.2.3 and Figure 1.2) for which we set r = 40 and we used the same scaled (see Equation (4.7)) random initialization and the same cyclic order (same as the MU, i.e., first the columns of U then the rows of V ) for the three algorithms. We observe that the MU converges significantly less rapidly than the two other

Figure 5.5: Comparison of the MU, the improved MU (5.26) and HALS on the cbcl face database. algorithms. There do not seem to be good reasons to use either the MU or the method of Corollary 5.14 since there is a closed-form solution (HALS) for the corresponding subproblems. Finally, the HALS algorithm has the same computational complexity (see Section 4.2) and performs provably much better than the popular multiplicative updates of Lee and Seung. Of course, because of the -hardness of NMF and the existence of numerous locally optimal solutions,NP it is not possible to give a theoretical guarantee that HALS will converge to a better solution than the MU: although its iterations are locally more efficient, they could still ultimately converge to a worse local optimum. 6The improved MU are actually computationally more expensive since the residual matri- ces Rk have to be computed explicitly (even though still in O(mnr) operations). However, we only use it as an illustration of the poor performances of the original MU. 133 CHAPTER 5. NONNEGATIVE FACTORIZATION

Conclusion

We have introduced nonnegative factorization (NF), a generalization of nonneg- ative matrix factorization (NMF), and proved it is -hard in the rank-one case by reduction from the maximum-edge bicliqueNP problem. Since finding each rank-one factor in any NMF decomposition implicitly amounts to solv- ing a rank-one NF problem, this suggests that NMF is a -hard problem for any fixed factorization rank and that no polynomial timeNP algorithm based on the successive optimization of the rank-one factors can be designed, giving more credence to algorithms based on alternating optimization (e.g., HALS or ANLS). We also presented a heuristic algorithm for detecting large bicliques whose iterations require ( E ) operations. It is based on results linking stationary points of a specificO rank-one| | nonnegative factorization problem (R1NF-MB) and the maximum-edge biclique problem. We experimentally demonstrated its efficiency and robustness on random graphs and text mining datasets. Finally, we generalized the NMF multiplicative updates to NF, and proved their convergence. This allowed us to give an explanation of the better per- formances of algorithm HALS over MU commonly observed in practice (cf. Chapter 4).

134 Chapter 6

Nonnegative Matrix Underapproximation

Finding a rank-one nonnegative matrix factorization, i.e., solving NMF with r = 1, is notably easier than for higher factorization ranks: while the general problem is -hard, computing a globally optimal rank-one approximation can be done inNP polynomial time, see Section 2.2. In principle, we might try exploit this result to find factorizations of higher ranks by applying it recursively: after identification of an optimal rank-one NMF solution (u, v), one could subtract the uvT factor from M and apply the same technique to M uvT to recover the next rank-one factor. Unfortunately, this idea cannot work:− the difference between M and its rank-one approximation may contain negative values (typi- cally roughly half of them), so that the next SVD factor will no longer provide a nonnegative solution. Moreover, there is no hope of replacing SVD by an- other efficient technique for this step since we showed that it is -hard to find the optimal nonnegative rank-one approximation to a matrixNP which is not nonnegative, cf. Chapter 5 (Theorem 5.3). If we wish to keep the principle of a recursive algorithm finding one rank- one factor at a time, we have to add a constraint ensuring that the uvT factor, when subtracted from M, gives a nonnegative remainder, i.e., we need to have uvT M. Therefore we introduce a similar upper bound constraint UV M to the≤ general (NMF) problem and obtain a new problem we call nonnegative≤ m n matrix underapproximation (NMU): given M R+ × and 1 r < min(m,n), the NMU optimization problem is defined as ∈ ≤

2 min M UV F such that U 0, V 0 and UV M. (NMU) U Rm×r ,V Rr×n || − || ≥ ≥ ≤ ∈ ∈ Assuming we are able to solve it for r =1, an underapproximation of any rank can then be built by following the recursive procedure outlined above. More 135 CHAPTER 6. NONNEGATIVE MATRIX UNDERAPPROXIMATION

precisely, if (U:1, V1:) is a rank-one underapproximation for M, i.e., U:1V1: M(= R ) and U V M, we have that R = M U V is nonnegative. R≈ 1 :1 1: ≤ 2 − :1 1: 2 can then be underapproximated by U:2V2: R2, leading to R3 = R2 U:2V2:, and so on. After r steps, we get an underapproximation≤ of rank r − M U V + U V + + U V ≥ :1 1: :2 2: · · · :r r: = [U:1 U:2 ... U:r][V1:; V2:; . . . ; Vr:] = UV.

Besides enabling this recursive procedure, we notice that NMU leads to a more localized part-based decomposition, in the sense that different basis ele- ments tend to describe disjoint parts of the input data (i.e., involving different nonzero entries). This is a consequence of the underapproximation constraints which impose the extracted parts (the basis elements U:k) to really be common features of the columns of M since

M ' U V , j . :j :k kj ∀ Xk Basis elements can only be combined to approximate a column of M if each of them represents a part of this column, i.e., none of the parts selected with a positive weight can involve a nonzero entry corresponding to a zero entry in the input column M:j. The following example demonstrates this behavior. Example 6.1 (Swimmer Database). The swimmer image dataset consists of 256 binary images of a body with 4 limbs which can be each in 4 different positions. NMF is expected to find a part-based decomposition of these images, i.e., isolate different constitutive parts of the images (the body and the limbs) in each of its basis elements. Figure 6.1 displays a sample of such images along with the basis elements obtained with NMF and NMU. While NMF elements are rather sparse, they are mixtures of several limbs. By contrast, NMU returns a even sparser solution and is able to extract a single body part for each of its elements.

The chapter is organized as follows. Section 6.1 studies the sparsity of the solutions generated by NMU, Section 6.2 reviews some earlier related work, Section 6.3 proves that it is -hard and Section 6.4 present some convex approaches to tackle the problem.NP In Section 6.5, we describe an algorithm to solve NMU using a technique based on Lagrangian relaxation. We test this approach on several standard image datasets, and show both qualitatively and quantitatively that it provides part-based and sparse representations that are comparable and sometimes superior to those obtained with standard sparse nonnegative matrix factorization techniques (sparse NMF), see Section 6.6. 136 6.1. SPARSITY

Figure 6.1: Basis elements generated for the swimmer image dataset with r =8: (a) Sample images from the dataset, (b) NMF and (c) NMU; see Section 6.6 for the algorithms used to compute the factorizations.

6.1 Sparsity

The fact that NMU decompositions naturally generate sparser solutions than NMF can be explained as follows: since the zero entries of M can only be underapproximated by zeros, we have

M =0 (UV ) =0 U =0 or V =0, k ij ⇒ ij ⇒ ik kj ∀ which shows that when the input matrix is sparse, many components of the NMU factors will have to be equal to zero. This observation can be made more formal: defining the sparsity s(M) of a m-by-n matrix M as the proportion of its zero entries, i.e., #zeros(M) s(M)= [0, 1], mn ∈ we have the following theorem relating sparsity of M and its NMU factors. Theorem 6.1. For any nonnegative rank-one underapproximation (u, v) m n m n ∈ R R of M R × we have + × + ∈ + s(u)+ s(v) s(M). ≥ Proof. For a rank-one matrix uvT , the number of nonzeros is exactly equal to the product of the number of nonzeros in vectors u and v. Therefore we have that (1 s(uvT )) = (1 s(u))(1 s(v)) which implies s(uvT )= s(u)+ s(v) s(u)s(v)− s(u)+ s(v).− Since underapproximation− uvT satisfies 0 uvT M−, it must have≤ more zeros than M and we have ≤ ≤

s(M) s(uvT ) s(u)+ s(v), ≤ ≤ 137 CHAPTER 6. NONNEGATIVE MATRIX UNDERAPPROXIMATION

proving our claim.

Recall the recursive definition of the residuals R = R U V and k+1 k − :k k: R1 = M. The following corollary relates their sparsity and the sparsity of the whole rank-r approximation with that of the NMU factors.

m r Corollary 6.2. For any nonnegative underapproximation (U, V ) R+ × r n m n ∈ × R × of M R × we have for each factor + ∈ + s(U )+ s(V ) s(R ) s(M), 1 k r, :k k: ≥ k ≥ ≤ ≤ and s(U)+ s(V ) s(M). ≥

Proof. We have 0 U:kVk: Rk M, which implies by the previous theorem ≤ ≤ ≤ 1 the first set of inequalities. Observing that S(U) = r k s(U:k) and S(V ) = 1 s(Vk:) is sufficient to prove the second inequality. r k P

PSparsity of the residuals Rk is monotonically nondecreasing at each step, since M = R1 R2 0. Moreover, the following theorem can guarantee an increase in sparsity≥ ≥···≥ at each step.

Theorem 6.3. For any locally optimal nonnegative rank-one underapproxima- m n m n tion (u, v) R+ R+ of M R+ × , define sets I and J (supports of vectors u and v) by∈ × ∈ I = i u > 0 , J = j v > 0 , { | i } { | j } and define matrix R(I, J) to be the submatrix of residual R = M uvT whose row and column indices belong respectively to I and J (corresponding− to the submatrix of M that is not approximated by zeros). Then there is at least one zero in each row and each column of submatrix R(I, J).

Proof. Simply observe that if R(i, J) > 0 (resp. R(I, j) > 0) for some i I ∈ (resp. j J), ui (resp. vj ) can be increased to obtain a strictly better solution, which contradicts∈ the local optimality assumption.

This ability of NMU to generate sparse part-based decomposition will be experimentally confirmed in Section 6.6, and Chapter 7.

6.2 Related Work

The problem of rank-one underapproximation has been first introduced by Levin in [108] in the case of positive stochastic matrices. He introduced a specific objective function different from the Frobenius norm and used a loga- rithmic change of variables in order to design an iterative method based on the corresponding optimality conditions. 138 6.3. COMPUTATIONAL COMPLEXITY

In [68], the rank-one underapproximation problem is cast as a convex prob- lem using again different objective functions which will be summarized in Sec- tion 6.4. Solutions are then used to initialize standard NMF algorithms in order to accelerate their convergence and, in general, find better final solutions as compared to those obtained with random initializations. Similar behavior was observed for other judicious initializations in [18, 41]. More recently, Dong et al. [58] studied the same problem with the additional constraint that the rank of the residual must be strictly smaller than the rank of the factorized matrix. Using the Wedderburn rank reduction formula, they proposed a numerical procedure which is able to compute the maximum rank splitting of a nonnegative matrix. However, the underlying optimization prob- lem is -hard [148] and their algorithm is not guaranteed to find a solution in all cases.NP Biggs et al. [14] also introduced a recursive procedure to solve NMF prob- lems: their idea is to locate and then approximate nearly rank-one submatrices of M. However, the problem of locating maximum rank-one submatrices is also shown to be -hard, and their algorithm is not guaranteed to find globally optimal solutions.NP

6.3 Computational Complexity

We now prove that NMU is -hard, even in the rank-one case, unlike NMF. In order to do this, the rank-oneNP version of the problem is proved to be equivalent to the maximum-edge biclique problem, which is -hard (cf. Section 5.1.2). The result is then generalized to NMU with arbitraryNP factorization rank r using a simple construction.

Recall that the maximum-edge biclique problem (MB) in a bipartite graph can be formulated as follows

2 min (Mij uivj ) u,v − i,j X u v M , i,j , (MB) i j ≤ ij ∀ u 0, 1 m, v 0, 1 n, ∈{ } ∈{ } and is -hard. For r =1, (NMU) can be written as NP 2 min (Mij uivj ) u Rm,v Rn − ∈ ∈ i,j X u v M , i,j , (NMU-1) i j ≤ ij ∀ u 0, v 0, ≥ ≥ 139 CHAPTER 6. NONNEGATIVE MATRIX UNDERAPPROXIMATION

which is very close to (MB): the difference is that vectors u and v are required to be binary for (MB) and nonnegative for (NMU-1). The next lemma proves that the two problems are actually equivalent.

m n Lemma 6.4. For M 0, 1 × , every optimal solution (u, v) of (NMU-1) T ∈ { } T m n is such that uv is binary, i.e., uv 0, 1 × , and can then be trivially ∈ { } m n transformed into a binary optimal solution (u′,u′) 0, 1 0, 1 of (MB). ∈{ } ×{ } Proof. For M = 0, this is trivial. Otherwise, suppose (u, v) is an optimal solution of (NMU-1). Let define (u′, v′) as

1 if ui =0 1 if vj =0 u′ = and v′ = , i 0 otherwise6 j 0 otherwise6   and analyze the different possibilities: as u v M , we have either i j ≤ ij M =1 and 0 1. Theorem 6.6. (NMU) is -hard. NP m n Proof. Let M 0, 1 × be the adjacency matrix of a bipartite graph Gb. We define the matrix∈ { }A as M 0 . . . 0 0 M 0 A = diag(M, r)=  . . .  , . .. .    0 ...M      r which is the adjacency matrix of another bipartite graph Gb which is the graph Gb repeated r times. Let (U, V ) be an optimal solution of (NMU). Since 140 6.4. CONVEX FORMULATIONS

r UV = k=1 U:kVk:, we have UV A U:kVk: A. Therefore (U:k, Vk:) is a ≤ ⇒ ≤ r feasible solution of (NMU-1) for the matrix A, i.e., for the graph Gb . Hence, each (UP, V ) corresponds to a biclique Bk = (V k V k, Ek) of Gr with :k k: G 1 ∪ 2 b U =0 s V k and V =0 t V k. ik 6 ⇔ i ∈ 1 kj 6 ⇔ j ∈ 2 By optimality of (U, V ) and since there are at least r independent maximum r r biclique in Gb , each (U:k, Vk:) must coincide with a maximum biclique of Gb which corresponds to a maximum biclique of Gb. This is due to the fact that, r because Gb is the graph Gb repeated r times, a biclique clearly cannot span two r disjoint subgraphs of Gb . Therefore, (NMU) is -hard since any instance of (MB) can be polynomially reduced to an instanceNP of (NMU).

6.4 Convex Formulations

In [68], the rank-one NMU problem is analyzed in order to build a recursive algorithm. In particular, the aim is to find alternative cost functions such that the corresponding problem is convex, hence solvable in polynomial time. A first possibility is to use the following formulation

T min max Mij uivj , uv M, (6.1) u,v 0 i,j | − | ≤ ≥ n o which can be solved with a sequence of linear systems of inequalities. This is done in two stages:

1. Reduce the problem to the case M > 0. Since

M =0 u =0 or v =0, kl ⇒ k l th th Mkl = 0 implies that either the k row or the l column of M is ap- proximated by 0. Hence

min max(Mkj ), max(Mil) max(Mij uivj ), j i ≤ i,j −   for any feasible solution (u, v) of (6.1). Therefore, if the maximum entry of the kth row (resp. lth column) of M is greater that the maximum entry th th of the l column (resp. k row) of M, one can then set vl = 0 (resp. uk = 0). Indeed, if any optimal solution features one uk set to zero (resp. vl), vl (resp. uk) could also be set to zero without deteriorating the solution. We then successively remove columns or rows corresponding to the zero entries of M and we end up with a reduced problem with strictly positive matrix M. 141 CHAPTER 6. NONNEGATIVE MATRIX UNDERAPPROXIMATION

2. For M > 0, one can check that there always exists a positive optimal solution (u, v) > 0 (if an entry of u or of v is equal to zero in an optimal solution, it can be replaced by a sufficiently small positive constant such that uvT remains smaller than M). We then introduce the variables 1 vj′ = uj− j and w.l.o.g. we impose ui 1 i; if the following linear system of inequalities∀ ≥ ∀

u v′ M , i, j, i ≤ j ij ∀ v′ M u Bv′ , i, j, j ij − i ≤ j ∀ u 1, i, i ≥ ∀ is solvable, it means that the optimal value of (6.1) is lower than B (and a solution of this system is a feasible solution of (6.1) with optimal value lower than B), greater than B otherwise. Hence (6.1) can be solved using a sequence of such linear systems of inequalities, e.g., with a bisection algorithm for values of B starting in [0, maxij (Mij )]. Unfortunately, this formulation is not really interesting for practical applica- tions since it focuses only on approximating the large entries of M.

For the specific case of positive matrices (M > 0), (NMU) has also been shown to be -hard for any factorization rank [68]. However, it is possible to design otherNP cost functions for which the problem is not as hard. Consider for example a cost function based on the ratio between M and its approximation

M min log ij . u,v>0 u v i,j i j X  

By the following logarithmic change of variables:

c = log(M ), x = log(u ) and y = log(v ), i, j ij ij i i j j ∀ one can compute the optimal rank-one positive matrix factorization with the above cost function with the following linear program :

min cij xi yj . x,y | − − | i,j X Observing that the additional underapproximation constraints can be written x + y c i, j, we get a linear program i j ≤ ij ∀

max n xi + m yj x,y i j X X x + y c , i, j. (Flow) i j ≤ ij ∀ 142 6.4. CONVEX FORMULATIONS

Note that this is the dual of a flow problem, namely, the Hitchcock problem [88] so that it can be solved efficiently with dedicated algorithms [63]. Another possible formulation for the rank-one positive matrix factorization is

M u v min max ij , i j u,v>0 u v M i,j i j ij X   which can be cast as a Geometric Program [61]. Recall that a geometric pro- gram has the following form

min f0(x) x f (x) 1, i ≤ x> 0, where fi are posynomials, i.e.,

f (x)= c xaik1 xaik2 ...xaikn with c 0, i,k. i ik 1 2 n ik ≥ ∀ Xk and that it can be convexified through a logarithmic change of variables (see, e.g., [20]). Hence, the rank-one underapproximation problem can be formulated as the following instance

M min ij u,v>0 u v i,j i j uXv i j 1. (6.2) Mij ≤

Unfortunately, (Flow) and (6.2) can only be applied to positive matrices. Some additional work has to be done if one wants to deal with nonnegative ma- trices. For example, one could think of extracting, from the nonnegative matrix, a rectangular positive submatrix (cf. Example 6.2) on which those methods can be applied. Of course, we would like to extract the largest submatrix possi- ble: this amounts exactly to finding a biclique with maximum weight1, which is -hard [48]. However, there exists heuristic algorithms for this problem andNP good approximate solutions can be computed (see, e.g., [3,68,90] and Sec- tion 5.3). Combining these ideas, a recursive NMU algorithm can be designed, see Algorithm 11.

1Since matrix M is not necessarily binary, it is the biadjacency matrix of a weighted bipartite graph Gb (i.e., to each edge (si, tj ) in Gb we assign a weight corresponding to the (i, j) entry of M) and we look for the biclique whith maximum weight. 143 CHAPTER 6. NONNEGATIVE MATRIX UNDERAPPROXIMATION

Algorithm 11 Combinatorial Recursive NMU [68]

Require: R1 = M, r> 0. Ensure: (U, V ) s.t. UV M with UV M. ≤ ≈ 1: for k =1 : r do 2: Extract a positive submatrix R˜k from Rk; 3: Perform a rank-one underapproximation (U˜:k, V˜k:) of R˜k using (Flow) or (6.2); 4: Extend (U˜:k, V˜k:) to (U:k, Vk:) by adding zeros at the entries corre- sponding to the ones of Rk not in R˜k; 5: Compute R = R U V 0. k+1 k − :k k: ≥ 6: end for

Example 6.2. One iteration of Algorithm 11 with (Flow) :

130 70 082 00 M =  9 6 4 0 1   8 0 5 0 3     0  0   8 0 5 0 1.25 ≥ 0.8  1       0 000 0 0 000 0 = .  6.4 0 4 0 1   8 0 5 0 1.25      In [68], Algorithm 11 is used as an initialization technique for standard NMF algorithms to accelerate convergence and, in general, is observed to improve the final solution compared to random initializations. The same kind of behavior was already observed for other judicious initializations [18,41].

An attempt is also made in [68] to compute higher rank factorizations using convex formulations. For example, using geometric programming, one could try to maximize (U, V ) while imposing the underapproximation constraint :

min f(U, V ) U>0,V >0 (UV ) M , i, j (6.3) ij ≤ ij ∀ 1 where f(U, V ) is any posynomial nonincreasing in U and V , e.g., ik Uik− + 1 V − . kj kj P 144 P 6.4. CONVEX FORMULATIONS

A first disadvantage of this formulation is that it can only deal with positive matrices and is not able to set variables to zero. Moreover, we observed that if f(U, V ) is symmetric with respect to each rank-one factor of UV , i.e., that for any permutation σ of [1, 2,...r]

U˜ V˜ = U V k f(U,˜ V˜ )= f(U, V ), (6.4) :k k: :σ(k) σ(k): ∀ ⇒ then the optimal solution obtained (U ∗, V ∗) had all the columns of U ∗ equal to each other, and similarly all the rows of V ∗ equal to each other. Theorem 6.7. For any formulation (6.3) with a symmetric f(U, V ) (cf. Equa- tion (6.4)), there exists an optimal solution (U ∗, V ∗) with the columns of U ∗ and the rows of V ∗ equal to each other. Proof. Consider the case r =2, the proof for r> 2 is similar. Let

(U, V ) = ([U:1 U:2], [V1:; V2:]) be an optimal solution of (6.3) with U:1 = U:2 or V1: = V2:. In the convex reformulation of the Geometric Program (6.3),6 the change6 of variables is given by X:1 = log(U:1), X:2 = log(U:2),

Y1: = log(V1:) and Y2: = log(V2:) where the logarithm is taken componentwise. The symmetry assumptions im- plies that the permutation of the columns of U and simultaneously of the cor- responding rows of V generates another optimal solution. Therefore, in the convex reformulation, the mean (a convex combination) of these permuted so- lutions X + X X + X Y + Y Y + Y :1 :2 , :1 :2 , 1: 2: , 1: 2: 2 2 2 2 h i h i is also an optimal solution. Hence

log U:1 + log U:2 U ∗ = U ∗ = exp = U U , :1 :2 2 :1 ◦ :2   p log V1: + log V2: V ∗ = V ∗ = exp = V V 1: 2: 2 1: ◦ 2:   with component-wise square root, must be an optimalp solution of (6.3).

Moreover, a similar reasoning shows that any other reasonable convex re- formulation with a symmetric objective function, approximate or exact, will suffer from the same drawback and lead to rank-one solutions. Nevertheless, it is possible to design asymmetric cost functions. For ex- ample, we tried to give different weights to each rank-one factor in the cost 145 CHAPTER 6. NONNEGATIVE MATRIX UNDERAPPROXIMATION

k k function (e.g., f(U, V ) = ik Uik− + kj Vkj− ). Unfortunately, we observe experimentally that the optimal solutions are such that the columns of U and the rows of V are multipleP of each otherP although we have no theoretical proof for this fact. We also tried to introduce random weights but it does not give satisfactory results in practice.

It therefore seems hopeless to extend those convex formulations for higher ranks. We suspect that it would be quite difficult, if not impossible, to find an appropriate convex formulation for NMU (and a fortiori for NMF) for r > 1 mostly because of the symmetry of the problem. The same observation was also made by d’Aspremont and co-authors in [45]. To conclude, we give another example of such failure for NMF, based on semidefinite programming.

6.4.1 Tentative of SDP formulation for NMF Let us note N = r + m + n,

Ir N r Y = U R × ,  V T  ∈

m r r n   for U R × and V R × , and ∈ ∈ T Ir U V Y11 Y12 Y13 T T T N = Y Y = U UU UV = Y12 Y22 Y23 S+ , Y  T T T T   T T  ∈ V V U V V Y13 Y23 Y33

N     where S+ is the set of N-by-N real symmetric positive semidefinite matrices. Using the correspondence with the variables (U, V ) in (NMF), we would like T to minimize M UV F = M Y23 F while requiring U = Y12 0 and V = Y 0.|| Let− us then|| write|| the− following|| optimization problem ≥ 13 ≥ min M Y23 F SN || − || Y∈ such that < 0, 0, Y Y ≥ Y11 = Ir, (NMFSDP ) rank = r. Y Theorem 6.8. From any optimal solution of (NMFSDP ), one can construct an optimal solution of (NMF), and vice-versa.

Proof. Let (U ∗, V ∗) be an optimal solution of (NMF). It is clear that

T Ir Ir = U ∗ U ∗ Y  T   T  V ∗ V ∗  146   6.4. CONVEX FORMULATIONS

is a feasible solution of (NMF ), implying that for any optimal solution ∗ SDP Y of (NMFSDP ) we have M Y23∗ F M U ∗V ∗ F . || − || ≤ || − || N Let ∗ be an optimal solution of (NMFSDP ). Since ∗ S and rank = Y N r T Y ∈ Y r, there exists a matrix Y R × with ∗ = Y Y (Y could be computed using the SVD, but it is not∈ necessarily unique).Y Noting

A N r r r m r n r Y = B R × , with A R × ,B R × and C R × ,  C  ∈ ∈ ∈ ∈   we have AAT ABT ACT T T T T N ∗ = Y Y = BA BB BC S+ , Y  CAT CBT CCT  ∈   with

AAT = I , i.e, A orthogonal. ⋄ r BAT 0 and ACT 0. ⋄ ≥ ≥ T T T T T M BC F = M B(A A)C F = M (BA )(AC ) F is mini- ⋄mized. || − || || − || || − ||

Hence we have (BAT , ACT ) is a feasible solution for (NMF) and

T T M (BA )(AC ) M U ∗V ∗ , || − ||F ≥ || − ||F for any optimal solution (U ∗, V ∗) of (NMF).

Combining the above two inequalities, M U ∗V ∗ F = M Y23∗ F = T T || − || || − || M (BA )(AC ) for any optimal solution ∗ of (NMF ) and any || − ||F Y SDP optimal solution (U ∗, V ∗) of (NMF), the proof is complete.

Notice that the formulation of (NMFSDP ) allows to remove degrees of free- dom of (NMF). In fact, for any feasible solution (U, V ) of (NMF), any equiv- alent solution (UA, AT V ) with A orthogonal, UA 0 and AT V 0, will correspond to the same solution in (NMF ). ≥ ≥ Y SDP One could now relax the above -hard problem, e.g., by removing the rank constraint as it is usually done.NP However, for the reasons explained in Theorem 6.7, any optimal solution for the relaxation will correspond to rank- T one matrices U = Y12 and V = Y13 because the formulation is symmetric. Another idea is to use the nuclear norm minimization heuristic [3,129]. Since minimizing the trace of Y is equivalent to minimizing its nuclear norm, we get 147 CHAPTER 6. NONNEGATIVE MATRIX UNDERAPPROXIMATION

the following SDP heuristic approach to approximate solutions of (NMFSDP ):

min trace( ) SN Y Y∈ such that < 0, 0, Y Y ≥ Y11 = Ir, (Heuristic for NMFSDP ) M Y ǫ, || − 23||F ≤ where ǫ is a desired approximation accuracy (e.g., for 5% relative accuracy, one could use ǫ =0.05 M F ). Notice that, in the rank-one case, this formulation is closely related to|| the|| SDP formulation of the biclique problem proposed by Ames and Vavasis [3].

Unfortunately, again, any solution of this Heuristic for NMFSDP generates T rank-one matrices U = Y12 and V = Y13. We were not able to break this symmetry.

6.5 An algorithm for NMU based on Lagrangian relaxation

Since (NMU), like (NMF), is a -hard problem, we can not expect to solve it up to guaranteed global optimalityNP in a reasonable (e.g., polynomial) com- putational time (unless P = NP ). In this section, we propose a nonlinear optimization scheme based on Lagrangian relaxation in order to compute ap- proximate solutions of (NMU).

Drop the m n underapproximation constraints UV M of (NMU) and add them into the× objective function with the corresponding≤ Lagrange multipliers m n (dual variables, forming a matrix) Λ R+ × , to obtain the Lagrangian function L(U,V, Λ) ∈

1 m n L(U,V, Λ) = M UV 2 + Λ (UV M) , 2|| − ||F ij − ij i=1 j=1 X X 1 where a factor of 2 was introduced to make the presentation nicer. The La- grangian relaxation subproblem (LRΛ) consists in minimizing L for a fixed value of the Λ multipliers, leading to the corresponding Lagrangian dual func- tion f(Λ) f(Λ) = min L(U,V, Λ) (LRΛ) U,V 0 ≥ where f(Λ) is well-defined because the minimum of L(U,V, Λ) is always at- tained, due to the fact that f is bounded below and the search space can be 148 6.5. LAGRANGIAN RELAXATION BASED ALGORITHM FOR NMU

restricted to a compact set. Indeed, considering each rank-one factor individ- 2 2 ually (U:k, Vk:) and imposing w.l.o.g. U:k F = Vk: F = U:k F Vk: F = U V , we have || || || || || || || || || :k k:||F U 2 = V 2 UV || :k||F || k:||F ≤ || ||F M Λ + M Λ UV , ≤ || − ||F || − − ||F 2 M Λ k, ≤ || − ||F ∀ where we have used the trivial solution (U, V )=(0, 0) to bound M Λ UV F (cf. derivations of Section 6.5.1). || − − || Standard application of Lagrangian duality tells us that

(NMU) min sup L(U,V, Λ) sup min L(U,V, Λ) = sup f(Λ), ≡ U,V 0 Λ 0 ≥ Λ 0 U,V 0 Λ 0 ≥ ≥ ≥ ≥ ≥ where the problem on the left of the inequality is equivalent to our original NMU formulation and the problem on the right is its Lagrangian dual, whose solution will provide a (hopefully tight) lower bound on the optimal (NMU). This new problem is a nondifferentiable optimization problem with the nice property that its objective f(Λ) = minU,V 0 L(U,V, Λ) is concave and its maximization (over a convex set) is then a convex≥ problem (see [4] and the references therein). We describe in the next section a general solution technique, which consists in repeatedly applying the following two steps: 1. Given multipliers Λ, compute (U, V ) to (approximately) minimize L(U,V, Λ), i.e., solve (LRΛ); this is discussed in Section 6.5.1; 2. Given solution (U, V ), update multipliers Λ; this is described in Section 6.5.2.

6.5.1 Solving the Lagrangian Relaxation The following derivations 1 L(U,V, Λ) = (M UV )2 + Λ (UV M) 2 − ij ij − ij i,j i,j X X 1 1 = M 2 M (UV ) + (UV )2 2 ij − ij ij 2 ij i,j i,j i,j X X X + Λ (UV ) Λ M ij ij − ij ij i,j i,j X X 1 1 = (M Λ) UV 2 Λ 2 , 2|| − − ||F − 2|| ||F show that minimizing L(U,V, Λ) for a fixed Λ is equivalent to minimizing (M Λ) UV 2 . Matrix R = M Λ is not necessarily nonnegative, therefore finding|| − − ||F − 149 CHAPTER 6. NONNEGATIVE MATRIX UNDERAPPROXIMATION

U 0 and V 0 such that R UV is a more general problem than NMF. It is actually≥ the≥ problem of nonnegative≈ factorization (NF) studied in Chapter 5 where it was formulated as

2 min R UV F such that U 0 and V 0, (NF) U Rm×r ,V Rr×n || − || ≥ ≥ ∈ ∈ m n with R R × and 1 r < min(m,n), was shown to be -hard even for r =1, see∈ Theorem 5.3.≤ NP Standard algorithms for NMF can easily be adapted to handle an input ma- trix that is not nonnegative, i.e., solve a NF problem (see, e.g., the generalized multiplicative updates in Chapter 5). We decide to use hierarchical alternating least squares (HALS) since it is one of the most efficient NMF algorithm, see Chapter 4. HALS alternatively updates each column of U and each row of V with the following optimal closed-form solutions:

2 U:∗k = argminU:k 0 (M Λ) UV F ≥ || r − − || A:k l=1,l=k U:lBlk = max 0, − 6 , (6.5) Bkk  P  with A = (M Λ)V T and B = V V T , and − 2 Vk∗: = argminVk: 0 (M Λ) UV F ≥ || r − − || Ck: l=1,l=k DklVl: = max 0, − 6 , (6.6) Dkk  P  with C = U T (M Λ) and D = U T U. The main computational cost of one HALS iteration is− the evaluation of A and C: they each require 2mnr (floating point) operations. One can check that the resulting total number of operations is 4mnr + ((m + n)r2). O 6.5.2 Update of the multipliers Λ The second step of our algorithm consists in updating Λ in order to find better (i.e., higher) solutions to the Lagrangian dual problem. Using the knowledge that any optimal solution (Λ∗, V ∗, W ∗) of the Lagrangian dual must satisfy the following complementarity slackness conditions

Λ∗ (M U ∗V ∗) =0 i, j, ij − ij ∀ as well as feasibility conditions

Λ∗ 0 and (M U ∗V ∗) 0, ij ≥ − ij ≥ we see that the update rule for the multipliers Λ should satisfy the following: 150 6.5. LAGRANGIAN RELAXATION BASED ALGORITHM FOR NMU

if (M UV ) > 0, Λ should be decreased and eventually reach zero if ⋄ − ij ij (M U ∗V ∗) > 0, − ij if (M UV )ij < 0, Λij should be increased to give more importance ⋄ to (M − UV ) in the cost function, hopefully in order to get a feasible − ij solution such that (M U ∗V ∗) 0. − ij ≥ In the sequel, we use the following rule to update Λ, which satisfies the above requirements: Λ max(0, Λ µ (M UV )), µ 0, ← − k − k → where µk is a predefined sequence of step lengths decreasing to zero; Λ can be initialized to zero. This update is inspired from the concept of subgradient methods [138]; in fact, one can easily check that the quantity (UV M) is a subgradient of − f(Λ) = min L(U,V, Λ) U,V 0 ≥ ¯ ¯ ¯ with respect to Λ, i.e., if (U, V ) = argminU,V 0 L(U,V, Λ), we have ≥ f(Λ) f(Λ)¯ + U¯V¯ M, Λ Λ¯ , Λ. ≤ − − ∀ Two questions now arise Since an iterative algorithm is used to solve (approximately) the La- ⋄ grangian relaxation problem (cf. section 6.5.1), after how many of these HALS iterations do we stop and proceed to update the multipliers Λ? How do we choose the sequence of step lengths µ ? ⋄ k Subgradient methods usually assume that the Lagrangian relaxation prob- lem (LRΛ) can be solved exactly and can guarantee their convergence to an optimal solution provided an appropriate sequence of step sizes is selected (see, e.g., [4]), for example µ satisfying the conditions { k}

∞ ∞ 0 µ 0 such that µ2 < + while µ =+ . ≤ k → k ∞ k ∞ Xk=0 Xk=0 1 In the sequel, we choose to use µk = k , which is such a suitable sequence. However, in our case, we cannot expect to solve (LRΛ) in a reasonable amount of time since the problem is -hard. It would even probably be too expensive to wait for the stabilizationNP of (U, V ) (e.g., getting close to a stationary but not necessarily optimal point). We therefore suggest to update (U, V ) only a constant number of times T between each update of Λ, which leads to Algorithm L-NMU. Note that because we do not solve (LRΛ) exactly, Algorithm L-NMU is not guaranteed to converge to an optimal solution of the Lagrangian dual but, as we will see, it produces satisfactory solutions in practice. 151 CHAPTER 6. NONNEGATIVE MATRIX UNDERAPPROXIMATION

Algorithm 12 L-NMU Lagrangian NMU m n m r r n Require: M R × , r> 0, U R × , V R × , maxiter, T . ∈ + ∈ + ∈ + Ensure: (U, V ) s.t. UV . M.

1: Λ=0; 2: for k = 1 : maxiter do 3: Update (U, V ) using T iterations of HALS (6.5)-(6.6); 4: Update Λ max(0, Λ 1 (M UV )); ← − k − 5: end for

The additional computational cost of one iteration of algorithm L-NMU when compared with one iteration of HALS for NMF consists in the computa- tion of M Λ (needed in step 3) and the update of Λ (at step 4), which require 2mnr + −(mn) operations (and, in the special case r =1, 5mn operations). O Remark 6.1. Because convergence is not theoretically guaranteed, Algorithm L-NMU may end up with solutions that do not completely satisfy the under- approximation constraint. Although our numerical experiments show that this has no detrimental influence on the quality of the obtained sparse part-based representations (see Section 6.6), we give here a simple technique to transform such a solution into a feasible underapproximation. Indeed, it is enough to consider the following QP problem (convex quadratic objective function, linear inequality constraints) which only involves the U factor

2 U ∗ = argminU 0,UV M M UV F . (6.7) ≥ ≤ || − || Because of its convexity, this problem can be solved up to global optimality in a very efficient manner, and replacing the original U factor by the optimal solution U ∗ leads to a feasible solution (U ∗, V ) to (NMU).

Remark 6.2. Because update rule (6.7) is exact and computable in practice, it would be natural to consider a simpler algorithm based on its alternative application to the U and V factors, without using the Lagrangian relaxation technique, hoping to converge to a solution of (NMU). Unfortunately, we observed that this is quite inefficient in practice. In fact,

it is relatively computationally expensive to solve these linearly con- ⋄ strained quadratic programs (with mn + mr and mn + nr inequalities), at least compared to the HALS closed-form update rules (6.5)-(6.6);

since the underapproximation constraint is imposed at each step, this ⋄ algorithm has much less freedom to converge to good solutions: iterates rapidly get stuck on the boundary of the feasible domain, typically with (too) many zeros and a lower rank. For example, assuming M has one zero 152 6.6. NMU VS SPARSE NMF

in each column, we have that for any positive matrix U the corresponding optimal V is equal to 0:

j, i s.t. M =0 j, i s.t. U V =0 V =0, k, j. ∀ ∃ ij ⇒ ∀ ∃ ik kj ⇒ kj ∀ Xk Therefore, such an algorithm can only work if we decide a priori which values in U and V should be equal to zero, i.e., if we find a good sparsity pattern for the solution, which is precisely where the difficulty of the problem lies. Note that the same behavior is observed if a HALS-type algorithm is used instead of (6.7) (i.e., updating columns of U and rows of V alternatively): after the update of one column of U, the residual will have one zero in each row (cf. Theorem 6.3) which will prevent the other columns of U to be nonzero (except if the sparsity pattern is chosen a priori).

Remark 6.3. The L-NMU algorithm described above is not particularly well- suited to deal with very sparse input matrices. In fact, one has to store a potentially dense m n matrix with the Lagrangian variables Λ. Neverthe- less, we have obtained× [11] encouraging results when applying NMU to sparse anomaly detection problems in text mining. Moreover, it is possible to take advantage of the input sparsity pattern and design a computationally cheaper method. First note that the Lagrangian variables associated with a zero of M will be nondecreasing in the course of the algorithm, since

0 (U (k)V (k)) and M =0 Λ(k) Λ(k+1), ≤ ij ij ⇒ ij ≤ ij where superscript (k) denotes the solution at step k. Therefore one can signifi- cantly reduce the computational cost by defining

Λ(k) = g(k) for all i, j s.t. M =0, ij − ij where g(k) is an arbitrary positive nondecreasing function, e.g., g(k)= γk with γ > 1. This is actually what was done in Algorithm 10 in Section 5.3 where g(k) corresponds to the parameter d and at each step we used d min(γd,D). This leads to an algorithm in ( E ) operations, where E is← the number of nonzero entries in M. O | | | |

6.6 NMU vs sparse NMF

We have argued in Section 6.1 that NMU is potentially able to extract a better part-based representation of the data and that its factors should be sparser than those of the standard NMF, at the detriment of the approximation error. In this 153 CHAPTER 6. NONNEGATIVE MATRIX UNDERAPPROXIMATION

section, we support these claims by reporting results of computational experi- ments involving two variants of Algorithm L-NMU on several image datasets. A direct comparison between NMU and NMF is not very informative in it- self: while the former will provide a sparser part-based representation, the latter will feature a lower approximation error. This does not really tell us whether the improvements in the part-based representation and sparsity are worth the increase in approximation error. For that reason, we chose to compare NMU with two other sparse nonnegative matrix factorizations techniques, described below, in order to better assess whether the increase in sparsity achieved by NMU is worth the loss in reconstruction accuracy.

6.6.1 Sparse NMF We selected and tested the following two sparse nonnegative matrix factoriza- tion techniques that are frequently used in the literature. 1. Hoyer describes in [91] an algorithm relying on additional explicit spar- sity constraints on the factors, enforced at each iteration by means of a projection. The approximation error is reduced via a combination of projected gradient and multiplicative updates. For our experiments, we use the MATLAB® code provided by the author2. It should be pointed out that Hoyer is using a different definition of sparsity: for any nonzero n dimensional vector x, his measure of sparsity sh(x) is defined as √n x / x sh(x)= − || ||1 || ||2 [0, 1]. (6.8) √n 1 ∈ − Hence, a vector with a single nonzero entry is perfectly sparse sh([0 . . . 0 k 0 . . . 0]) = 1, k =0, ∀ 6 while a vector with all entries equal to each other is completely dense sh([k...k])=0, k =0. ∀ 6 In our experiments, we report sparsity using both the standard s( ) indi- cator and Hoyer’s sh( ) measure. · · 2. Instead of enforcing sparsity at every iteration, a sparsity-inducing penalty term can be introduced in the objective function [111]. In particular, it is well-known that adding l1-norm penalty terms induce sparser solutions (see, e.g., [33, 97, 98]), and we therefore solve the following problem:

2 min M UV F + µU U 1 + µV V 1, (sNMF) U,V 0 || − || || || || || ≥ 2This code was downloaded from http://www.cs.helsinki.fi/u/phoyer/software.html. 154 6.6. NMU VS SPARSE NMF

where A 1 = ik Aik and µU and µV are two positive parameters controlling|| || the sparsity| of| U and V . In order to solve (sNMF), we use the HALS algorithmP which can easily be adapted to handle the additional l1-norm penalty terms (see, e.g., [33,89]). This algorithm will be referred to as sNMF. Technical details for the first technique are more complicated, but it allows the sparsity of the factors to be chosen a priori. The second technique is conceptually simpler but requires the determination of appropriate penalizing parameters by other means.

6.6.2 Tested algorithms Algorithm L-NMU proposed in Section 6.5 can be used to compute underap- proximations for any given factorization rank r. This opens the possibility of building a rank-r underapproximation in several different ways: one simple op- tion consists in applying algorithm L-NMU directly to the rank-r problem – we call this method global NMU (G-NMU). Another option consists in apply- ing the recursive technique outlined in the introduction, used to motivate the introduction of underapproximations. More specifically, this means running algorithm L-NMU successively r times to compute r rank-one approximations, subtracting each approximation from the input matrix before computing the next one – we call this method recursive NMU (R-NMU). Note that many r other variants are possible (e.g., computing two rank- 2 approximations, com- r puting 2 successive rank-two approximations, etc.) but we only tested the two above-mentioned variants, which represent two extreme cases (no recursion and maximum recursion). In practice, it is not clear how to determine in advance which variants is the most appropriate; it will highly depend on the data and the application of interest. In both cases, our implementation of algorithm L-NMU computes two HALS steps between each update of the multipliers Λ (i.e., we fixed T =2). Most of the computational work done in one iteration of L-NMU consists in computing M Λ, performing the two HALS steps and updating Λ; more specifically, one can− estimate the computational cost of one iteration of G-NMU to 10mnr + ((m+n)r2) operations, while an R-NMU iteration takes 13mn+ ((m+n)r) Ooperations (repeated r times in the recursive procedure). O For each dataset, we test five nonnegative factorization algorithms: NMF based on HALS updates (NMF), global NMU (G-NMU), recursive NMU (R- NMU), sparse NMF with l1-penalty terms (sNMF) and the algorithm of Hoyer. We also report the results of Principal Component Analysis (PCA) to serve as a reference (recall that the approximation error of the resulting unconstrained low-rank approximation, computed here with a singular value decomposition, is globally minimal for the given rank, but that its factors are neither nonnegative, nor sparse). 155 CHAPTER 6. NONNEGATIVE MATRIX UNDERAPPROXIMATION

6.6.3 Iteration limits and CPU time Each of the five iterative algorithms described above requires a limit on the number of its iterations; these limits were chosen in order to roughly allocate the same CPU time to each algorithm. More specifically, the standard NMF was given a 600-iterations limit, which corresponds to the computation of 600 HALS updates. The sparse sNMF, based on a slightly modified HALS update, was also allowed 600 iterations. Because a HALS update involves 4mnr + ((m + n)r2) operations, we can deduce the following iteration budgets forO G-NMU and R-NMU from the leading terms in their corresponding operation counts: G- 4 NMU is allowed 600 10 = 240 L-NMU iterations while R-NMU can take 600 4 180 iterations.× × 13 ≈ An exception to the equal CPU time rule was made for the algorithm of Hoyer. Results obtained after an amount of CPU time similar to that of the other algorithms were too poor to be compared in a meaningful way. Indeed, because this method is based on a projected gradient method and multiplicative updates (both (mnr) operations per iteration), which are known to converge at a typically muchO slower rate (see Chapter 4), a relatively high limit of 1000 iterations had to be fixed, although the resulting CPU time is then much larger than for the other methods (for example, on the CBCL dataset, 600 iterations of HALS took 80s. while 1000 iterations of the algorithm of Hoyer needed 260s.). ∼ ∼

6.6.4 Testing methodology
Recall we decided to test algorithms sNMF and Hoyer to assess the quality of the sparsity-accuracy compromise proposed by our NMU approaches. To achieve this, we decided to associate with each NMU variant a solution of sNMF/Hoyer featuring the same level of sparsity, and compare the resulting approximation errors. We therefore report results for eight algorithms on each dataset: PCA, NMF, G-NMU, sNMF with the same sparsity as G-NMU, which we denote by sNMF{G-NMU}, Hoyer{G-NMU}, R-NMU, sNMF{R-NMU} and Hoyer{R-NMU}. In order to enforce a sparsity similar to the NMU solution in Hoyer's code, we compute the sh measure of the NMU factors and input it as a parameter of the method (see description in subsection 6.6.1); note however that we could only enforce this for the sparsest of the two NMU factors³. In the case of sNMF, sparsity cannot be directly controlled, and penalty parameters are found using the following adaptive procedure, which proved to work well in practice: µU and µV are initialized to 0.1 and, after each iteration, µU (resp. µV) is increased by 5 percent if S(U) (resp. S(V)) is below the target sparsity, and is decreased

³Ideally, we would have imposed sparsity for both factors, but the implementation we used seemed to return poor results in that situation.

by 5 percent otherwise. All algorithms were run 10 times with the same initial random matrices and only the best solution with respect to the Frobenius norm of the error is reported. When testing with gray-level images, the input matrices M were normalized to have their entries varying between 0 and 1, with 0 representing white and 1 representing black (when trying to decompose M as a sum of parts, this makes more sense than the opposite convention, since the dark regions are the constitutive parts of the objects in the image datasets we analyze). Finally, before computing the reported sparsity measures of the factors, any sufficiently small⁴ entry is rounded to zero (indeed, because algorithms are stopped by the iteration limit before convergence, true zeros are typically not all reached). All tests were run with MATLAB® 7.1 (R14), on a 3GHz Intel® Core™2 Dual CPU PC.
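For illustration, the adaptive tuning of the penalty parameters described above can be sketched as follows; sparsity() implements the zero-counting rule with the 0.1% column threshold, while the penalized HALS update itself is passed in as a user-supplied 'step' callable (a hypothetical name, since sNMF's update is not reproduced here).

    import numpy as np

    def sparsity(X, tol=1e-3):
        # Fraction of entries that are (near-)zero, an entry being treated as zero
        # when it is below 0.1% of the largest entry of its column (cf. the rule above).
        colmax = np.maximum(np.abs(X).max(axis=0, keepdims=True), np.finfo(float).tiny)
        return float(np.mean(np.abs(X) <= tol * colmax))

    def tune_penalties(step, M, U, V, target_sU, target_sV, iters=600):
        # 'step' is a user-supplied penalized HALS update: (M, U, V, mu_U, mu_V) -> (U, V).
        mu_U = mu_V = 0.1
        for _ in range(iters):
            U, V = step(M, U, V, mu_U, mu_V)
            mu_U *= 1.05 if sparsity(U) < target_sU else 0.95
            mu_V *= 1.05 if sparsity(V) < target_sV else 0.95
        return U, V, mu_U, mu_V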

6.6.5 CBCL Face Database
The CBCL face image dataset was used for the illustrative example of Figure 1.2 and is made of 2429 gray-level images of faces represented with 19 × 19 pixels. We look for an approximation of rank r = 49.
Figure 6.2 displays the basis elements for NMF, G-NMU, R-NMU and sNMF{G-NMU} (which was the best solution obtained in terms of sparsity vs. error among all four sNMF and Hoyer variants). G-NMU, R-NMU and sNMF all achieve a better part-based representation than NMF, generating sparser solutions. An interesting feature of R-NMU is that it extracts parts successively in order of "importance": the first basis element is a 'mean' face (which is dense) while the next ones describe different complementary parts (which become sparser as the recursion moves on, cf. Corollary 6.2 and Theorem 6.3). A more quantitative assessment is provided in Table 6.1, reporting for the eight algorithms tested the relative error (in percent) of their solutions

    relative error = ||M − UV||_F / ||M||_F
in the second column ("Plain") and the corresponding sparsity measures (in percent) of factors U and V in the last four columns.
As expected, PCA returns the smallest error, albeit with very dense factors. NMF already features much sparser factors (slightly more than half of the entries in U are equal to zero), at the cost of a relatively modest increase in the approximation error (7.43 → 8.12). G-NMU provides an even sparser solution (three quarters of zero entries), increasing again the approximation error (8.12 → 12.45). The
⁴We declare an entry of a factor to be sufficiently small if it is less than 0.1% of the largest entry in its column.

Figure 6.2: Basis elements (U:k) generated for the CBCL image dataset: (a) NMF, (b) G-NMU, (c) R-NMU and (d) sNMF with sparsity of G-NMU.

factors recursively computed by R-NMU are in comparison not as sparse: as explained above, this is because R-NMU focuses on obtaining representative parts, including relatively dense ones for the first few steps of the recursion. However, it features much sparser weight vectors, giving again more credit to the hypothesis that better parts are extracted. The corresponding approximation error is higher than for other methods, because the intrinsically greedy approach taken by R-NMU cannot be globally better than a method that optimizes all the factors simultaneously.
Is the increased sparsity provided by G-NMU worth the increase in approximation error? Looking at the corresponding results for sNMF{G-NMU} and Hoyer{G-NMU}, i.e., for sparse NMF and Hoyer's algorithms with a similar target sparsity, it might seem at first that the answer is negative: the other methods return solutions with a similar number of nonzeros (slightly higher for

              Plain  Improved  s(U)  s(V)  sh(U)  sh(V)
PCA            7.43    7.43      0     0    22     22
NMF            8.12    8.11     56    11    66     22
G-NMU         12.45    8.76     74    14    74     21
sNMF{G-NMU}    8.68    8.44     74    14    74     30
Hoyer{G-NMU}   9.33    8.78     69     6    73     16
R-NMU         16.42   10.89     53    52    63     64
sNMF{R-NMU}   10.23    9.49     50    50    56     57
Hoyer{R-NMU}   8.83    8.56     54    12    64     22

Table 6.1: Comparison of the relative approximation error and sparsity for the CBCL image dataset.

Hoyer) and a lower approximation error (8.68 and 9.33 instead of 12.45). Actually, this was expected: because it tries to return an underapproximation, i.e., factors such that UV ≲ M, the entries in the error term M − UV are mostly nonnegative, while the other techniques, with no underapproximation constraint, obtain a smaller norm of the error by choosing the entries of M − UV to be roughly half negative, half positive. It is therefore not completely fair to compare directly the error of the NMU approach to the other techniques.
In order to compensate for this, a simple rescaling could be used, i.e., multiplying UV by a scalar since UV ≲ M (cf. Equation (4.7)). However, we chose a different procedure that has the advantage of benefiting all algorithms, including those whose error was not suffering from the underapproximation constraint. Once a solution is computed by one of the eight algorithms, we fix the zero entries of U and V and optimize the approximation error, i.e., min_{U,V ≥ 0} ||M − UV||²_F, on the remaining (nonzero) entries (again, HALS can easily be adapted to handle this situation). In essence, this allows us to compare the sparsity patterns of the different solutions. We perform 100 additional HALS steps on each solution, and report the new relative approximation error in the third column of Table 6.1 ("Improved"). Note that sNMF and Hoyer's errors are also improved by this procedure; this can be explained by the fact that they were also not directly trying to minimize the approximation error (Hoyer had to take into account its sparsity constraint, and sNMF was influenced by the penalty terms added to the approximation error). Looking now at the NMU solutions in a fairer comparison, we observe that their approximation error becomes very close to that of sNMF and Hoyer, in particular for G-NMU, and not very far from the denser NMF, so that we can conclude that the sparsity-approximation error compromise it offers is worthwhile.
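To make the post-processing behind the "Improved" column concrete, the following NumPy sketch freezes the sparsity patterns of U and V and refits the remaining entries with HALS-type updates; it is written for illustration (plain residual recomputation, no efficiency tricks) and is not the thesis' MATLAB implementation.

    import numpy as np

    def masked_hals(M, U, V, steps=100, eps=1e-16):
        # Refit the nonzero entries of U and V (their zero patterns are kept fixed),
        # reducing ||M - U V||_F^2 with column/row-wise HALS-type updates.
        U, V = U.copy(), V.copy()
        mU = (U > 0).astype(float)          # masks of the frozen sparsity patterns
        mV = (V > 0).astype(float)
        r = U.shape[1]
        for _ in range(steps):
            for k in range(r):
                # update column U[:, k] on its support
                Rk = M - U @ V + np.outer(U[:, k], V[k, :])
                U[:, k] = mU[:, k] * np.maximum(0.0, Rk @ V[k, :] / (V[k, :] @ V[k, :] + eps))
                # update row V[k, :] on its support
                Rk = M - U @ V + np.outer(U[:, k], V[k, :])
                V[k, :] = mV[k, :] * np.maximum(0.0, U[:, k] @ Rk / (U[:, k] @ U[:, k] + eps))
        return U, V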

6.6.6 Swimmer Database
For the swimmer image dataset described in Example 6.1 (256 images with 20 × 11 pixels), the eight basis elements obtained with the different algorithms are displayed on Figure 6.3 and the corresponding approximation errors and sparsity measures are reported in Table 6.2.

Figure 6.3: Basis elements for the swimmer image dataset: (a) NMF, (b) G- NMU, (c) R-NMU and (d) sNMF with sparsity of R-NMU.

As mentioned earlier, our two NMU algorithms are the only methods able to extract truly independent parts, while NMF and sNMF generate combinations of multiple parts. Note however that the solution generated by sNMF bears some similarity to the one of G-NMU.

6.6.7 Hubble Space Telescope Spectral Images
The next image dataset consists of 100 spectral images (128 × 128 pixels) of the Hubble telescope at different frequencies [126, 153], see Figure 6.4. With the choice r = 8, NMF generates a nearly exact factorization (relative error 0.29%), because the spectral reflectance of the Hubble telescope results from the additive linear combination of the reflectance of eight constitutive materials. Figure 6.5 and Table 6.3 provide the visual and computational results for this dataset.
Because NMF is already a nearly exact reconstruction (Table 6.3), the NMU constraints are somewhat redundant: NMF and G-NMU are basically equivalent and return solutions with very similar sparsity measures (albeit with a slightly

                  Error             s(U)  s(V)  sh(U)  sh(V)
                Plain  Improved
PCA             37.98   37.98       77*    0    67     17
NMF             40.41   40.41       84    45    73     67
G-NMU           47.70   46.85       94    75    85     78
sNMF{G-NMU}     50.52   42.04       89    66    84     73
Hoyer{G-NMU}    42.04   41.91       90    45    80     63
R-NMU           50.92   50.71       98    66    93     65
sNMF{R-NMU}     41.66   41.17       85    66    80     79
Hoyer{R-NMU}**    /       /          /     /     /      /

Table 6.2: Comparison of the relative approximation error and sparsity for the swimmer image dataset. *This value is very close to the percentage of zero rows in the matrix M (corresponding to pixels that are equal to zero in all images): in general, PCA factors feature a zero component when all the entries of either one row or one column of the input matrix are equal to zero. **When imposing the sparsity level of R-NMU (sh(U) = 0.93), Hoyer’s algorithm was not able to converge, probably because it is not well adapted to handle high sparsity constraints. Note that sNMF is also sensitive to high sparsity requirements: high penalty terms sometimes lead to optimal zero factors (U:k = 0 for some k), which had to be reinitialized.

Figure 6.4: Sample of Hubble space telescope spectral images.

lower error for G-NMU). For that reason, sNMF{G-NMU} and Hoyer{G-NMU} return results nearly identical to NMF and are omitted from the table.

Recursive R-NMU extracts parts in order of importance: first, a global picture of the telescope and then its different constitutive parts. This allows it to generate the sparsest solution, with several basis elements representing well-delimited constitutive parts of the telescope not identified by the other methods (see also Chapter 7).

Figure 6.5: Basis for the Hubble telescope: (a) NMF, (b) G-NMU, (c) R-NMU and (d) sNMF with sparsity of R-NMU.

                  Error             s(U)  s(V)  sh(U)  sh(V)
                Plain  Improved
PCA              0.01    0.01        57     0    62     25
NMF              0.29    0.29        64     5    57     35
G-NMU            0.52    0.11        64     4    60     31
R-NMU            3.74    1.37        79    30    71     62
sNMF{R-NMU}      0.48    0.37        73    28    66     64
Hoyer{R-NMU}     0.77    0.68        75     0    71     12

Table 6.3: Comparison of the relative approximation error and sparsity for the Hubble telescope image dataset.

6.6.8 Kuls Illuminated Faces
A static scene was illuminated from many directions with a moving light source to produce the Kuls image dataset⁵. It consists of 20 images (64 × 64 pixels) of a face. Because the images are very similar, most of the information (more than 70 percent) can be expressed with only one factor. The remaining information resides in the different orientations of the lighting. Computational and visual results for a rank-5 factorization are given by Table 6.4 and Figure 6.6.
We observe that NMF and G-NMU obtain similar results: even though they are both able to extract several faces with different lighting orientations, they do

⁵Available at http://www.robots.ox.ac.uk/~amb/.

not extract a sparse and part-based representation. R-NMU first extracts a face illuminated from all directions, and then com- plementary parts representing different orientations of the lighting (successively on the fourth row of Figure 6.6: global then light from the right, left, bottom and top). This nice recursive extraction of the information is a direct con- sequence of the underapproximation constraints. Although sNMF (with the same sparsity requirement as R-NMU) is also able to extract a part-based rep- resentation with a slightly better approximation error, only two components are well-identified (left and right lighting mixed with top and bottom lighting).

                  Error             s(U)  s(V)  sh(U)  sh(V)
                Plain  Improved
PCA              4.36    4.36         0     0    23     15
NMF              4.38    4.38         1     7     9     38
G-NMU            6.27    4.49         3    20     8     48
sNMF{G-NMU}      4.42    4.41         2    20     8     47
Hoyer{G-NMU}     4.60    4.71         2    25     8     53
R-NMU            8.13    5.73        29    31    38     67
sNMF{R-NMU}      5.24    5.01        29    31    32     59
Hoyer{R-NMU}     6.82    6.54         0    71     6     92

Table 6.4: Comparison of the relative approximation error and sparsity for the Kuls image dataset.

Some properties of R-NMU will be analyzed in the next chapter, and will provide an explanation for such a nice extraction. In fact, we will see in Chap- ter 7 that rows of matrix M which share similar information will be extracted together with the R-NMU approach (see Theorem 7.4). For the Kuls faces, each row of matrix M corresponds to a pixel, i.e., to a specific location on the image. Hence the rows corresponding to pixels located close to each other in the images will have similar ‘shapes’ (because the lighting orientation is similar), which is the reason why they are extracted together by R-NMU.

Conclusion

In order to solve the NMF problem in a recursive way, we have introduced a new problem, namely nonnegative matrix underapproximation (NMU), which was shown to be NP-hard using its equivalence with the biclique problem. The additional constraints of NMU are shown to induce sparser factors and to lead naturally to a better part-based representation of the data, while keeping

Figure 6.6: Basis for the Kuls image dataset, from top to bottom: sample of images, NMF, G-NMU, R-NMU, sNMF with sparsity of R-NMU.

a fairly good reconstruction. We proposed an algorithm based on Lagrangian relaxation to find approximate solutions to NMU.
We tested two factorization methods based on this algorithm, one with full

recursion (R-NMU), the other without recursion (G-NMU), on several standard image datasets. After suitable post-processing, we observed that the factors computed by these methods indeed offer a good compromise between their achieved sparsity and the resulting approximation error, comparable or sometimes superior to that of two standard sparse nonnegative matrix factorization techniques. These two variants can be contrasted in the following way: where G-NMU mainly focuses on finding sparse factors with a small reconstruction error, in the same spirit as sNMF and Hoyer, R-NMU typically computes an even sparser factorization corresponding to a better part-based representation, albeit with a moderate increase in the reconstruction error (due to the greedy approach). Moreover, this second variant is useful in situations where the factorization rank is not fixed a priori: the fact that it is recursive allows the user to stop the procedure as soon as the reconstruction error becomes satisfactory, without having to recompute a completely different solution from scratch every time a higher-rank factorization needs to be considered. In the next chapter, we use R-NMU for hyperspectral image analysis, and provide some further theoretical and experimental evidence that it is indeed able to efficiently cluster such data automatically.


Chapter 7

Hyperspectral Data Analysis using Underapproximation

A hyperspectral image is a set of images of the same object or scene taken at different wavelengths. Each image is acquired by measuring the reflectance (i.e., the fraction of incident electromagnetic power reflected) of each individual pixel at a given wavelength, see the illustration in Figure 7.1.

Figure 7.1: Illustration of hyperspectral image acquisition: (left) reflectance measurements, and (right) hyperspectral image.
A crucial aspect of hyperspectral image analysis is the identification of materials present in the object or scene being imaged. This can be for example used to classify the thousands of objects in orbit around the earth (e.g., military and commercial satellites, debris).


Dimensionality reduction techniques such as PCA are widely used as a pre- processing step for hyperspectral image analysis in order to reduce the compu- tational cost of classification algorithms (such as k-means or nearest neighbor, see, e.g., [24]) while keeping the pertinent information. In this context, it is often preferable to take advantage of the intrinsic properties of hyperspectral data: the spectral signature of each pixel results from the additive combina- tion of the nonnegative spectral signatures of its constitutive materials. Taking these nonnegativity constraints into account enhances interpretability of the extracted factors. This can be done using nonnegative matrix factorization. Let the matrix M be constructed as follows: each 2D image corresponding to a wavelength is vectorized and is a column M:j , while each row Mi: corresponds to the spectral signature of a pixel, see Figure 7.2. Then a decomposition

Figure 7.2: Construction of matrix M.

(U, V) ≥ 0 of M can be interpreted as follows
    M_i: ≈ Σ_k U_ik V_k:   ∀ i,
and the spectral signature of each pixel (M_i:, a row of M) is approximated with a nonnegative linear combination (with weights U_ik, representing abundances) of end-member signatures (V_k:, rows of V) which hopefully correspond to the signatures of the constituent materials of the hyperspectral image.
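As an illustration of this construction (cf. Figure 7.2), a hyperspectral cube with two spatial dimensions and one spectral dimension can be flattened into the pixel-by-wavelength matrix M as follows; this is a small NumPy sketch, and the array names and the random cube are for the example only.

    import numpy as np

    def cube_to_matrix(cube):
        # cube: array of shape (rows, cols, n_bands); cube[:, :, j] is the image
        # acquired at wavelength j. Returns M of shape (rows*cols, n_bands):
        # column M[:, j] is the vectorized j-th image, row M[i, :] is the spectral
        # signature of pixel i.
        rows, cols, n_bands = cube.shape
        return cube.reshape(rows * cols, n_bands)

    # Example: a random 307 x 307 x 162 cube like the Urban HYDICE data of Section 7.4.
    M = cube_to_matrix(np.random.rand(307, 307, 162))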

This model1 has been successfully applied for identification of the materials and the spectral unmixing in hyperspectral images [126,153]. However, NMF features some drawbacks. In particular,

1. NMF is an NP-hard nonlinear optimization problem with many local minimizers, see Chapter 3. In practice, NMF is solved using iterative schemes based on nonlinear optimization techniques, see Chapter 4.

2. The optimal solution is in general non-unique², which makes the problem ill-posed [103]. Additional constraints are often added to reduce the degrees of freedom, e.g., smoothness [94,126], sparsity [94], orthogonality [54,109], minimum-volume [119], sum-to-one constraints on the rows of U [118].

3. One needs to recompute a solution from scratch when the rank of the approximation is modified.

In this chapter, we use recursive NMU from Chapter 6, which overcomes some of these drawbacks³ (2. and 3. above), as a dimensionality reduction technique to analyze hyperspectral data. We provide some theoretical evidence that this technique is in fact able to automatically detect materials in hyperspectral images and illustrate this on a simple example. We then propose a new approach based on ℓ1-norm minimization (instead of the ℓ2-norm considered in Chapter 6), and explain why it is theoretically more appealing: it is potentially able to extract the materials in a more efficient and robust way. An algorithm is proposed with the same computational complexity as the one presented in Section 6.5. Finally, we experimentally show the efficiency of these new strategies on hyperspectral images associated with space object material identification, and on HYDICE and related remote sensing images.

7.1 The Ideal Case

If we assume that each pixel contains only one material, the corresponding matrix has the following form:

Assumption 7.1. M ∈ R_+^{m×n} with M = W H where
¹(NMF) is closely related to an older approach based on the geometric interpretation of the distribution of spectral signatures: they are located inside a low-dimensional simplex whose vertices are the pure pixel signatures (i.e., the signatures of each individual material) [17,40]. This is also related to the geometric interpretation of the nonnegative rank given in Chapter 3.
²Any invertible matrix D such that UD ≥ 0 and D⁻¹V ≥ 0 generates an equivalent solution.
³Unless P = NP, drawback 1 cannot be 'resolved' since the underlying problem of spectral unmixing is of combinatorial nature [92].

1. W ∈ {0,1}^{m×r} is a binary matrix of dimension m by r, with r ≤ min(m,n). W has full column rank, and has one and only one element equal to one in each row, with

    W_ik = 1  ⟺  pixel i contains material k.
This implies that its columns are orthogonal: W_{:i}^T W_{:j} = 0 ∀ i ≠ j.
2. H ∈ R_+^{r×n} is of full row rank r.
Of course, recovering W and H in this setting is trivial and, in practice, because of blurring and other mixing effects, limited resolution and mixed materials, the spectral signature of each pixel will be a mixture of spectral signatures of several materials (in particular, pixels located at the boundary of materials) plus noise. However, classifying each pixel into a single category amounts to approximating M with a matrix satisfying Assumption 7.1. This problem is referred to as orthogonal NMF (oNMF) and is equivalent to k-means clustering [54].
We now show that, under some mild assumptions, the underapproximation technique is able to retrieve the underlying structure in the ideal case, when each pixel corresponds to only one material. This will shed some light on the behavior of the recursive algorithm based on underapproximations presented in Chapter 6, and justify its efficiency when dealing with non-ideal hyperspectral images.

Recall that the rank-one NMU problem is the following

    min_{u ∈ R_+^m, v ∈ R_+^n} ||M − uv^T||_F^2   such that   uv^T ≤ M.        (NMU-1)
It is convex in u and v separately, and the corresponding optimal solutions can actually be trivially computed: for v ≥ 0,

    u*(v) = argmin_{u ≥ 0, uv^T ≤ M} ||M − uv^T||_F,   u_i*(v) = min_{j | v_j ≠ 0} { M_ij / v_j }  ∀ i,        (7.1)
and for u ≥ 0,

    v*(u) = argmin_{v ≥ 0, uv^T ≤ M} ||M − uv^T||_F,   v_j*(u) = min_{i | u_i ≠ 0} { M_ij / u_i }  ∀ j,        (7.2)
and they correspond to the stationarity conditions of (NMU-1). One can check that these first-order stationarity conditions (7.1) and (7.2) are the same for (many) other norms than ||·||_F, such as the ℓ1-norm which will be analyzed later in Section 7.3.
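For concreteness, the closed-form solutions (7.1) and (7.2) translate directly into code; the following NumPy sketch computes the optimal u for a fixed nonzero v (and, by symmetry, the optimal v for a fixed nonzero u), without any numerical safeguards.

    import numpy as np

    def optimal_u(M, v):
        # u_i*(v) = min_{j : v_j != 0} M_ij / v_j   (Equation (7.1))
        J = v > 0
        return (M[:, J] / v[J]).min(axis=1) if J.any() else np.zeros(M.shape[0])

    def optimal_v(M, u):
        # v_j*(u) = min_{i : u_i != 0} M_ij / u_i   (Equation (7.2))
        I = u > 0
        return (M[I, :] / u[I, None]).min(axis=0) if I.any() else np.zeros(M.shape[1])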

7.1.1 First Rank-One Factor
As for PCA, the first rank-one factor of NMU will reduce the error the most and will already be a fairly good approximation of the hyperspectral data.
Lemma 7.1. Let (u, v) be a nontrivial stationary point of (NMU-1) (i.e., u ≠ 0 and v ≠ 0), then the residual R = M − uv^T has at least one zero by row and by column.
Proof. This follows directly from Equations (7.1) and (7.2).
Lemma 7.2. Let (u, v) be a nontrivial stationary point of (NMU-1) for M = W H satisfying Assumption 7.1, then the residual R = M − uv^T can be written as R = W H′ for some H′ ≥ 0.
Proof. Because the columns of W are binary and orthogonal, each row of M is equal to a row of H. Therefore, the entries of u corresponding to the rows of M equal to each other must take the same value, i.e., ∀ i ∈ {1, 2, ..., r}, ∀ k, l ∈ supp(W_{:i}) : u_k = u_l. In fact, one can check that for v ≠ 0, the solution of Equation (7.1) is unique. It follows that u = W d for some d ∈ R_+^r, and then R = W H − (W d)v^T = W [H − dv^T]. The facts that R is nonnegative and that W is binary and orthogonal imply that H′ = H − dv^T ≥ 0.
Corollary 7.3. Let (u, v) be a nontrivial stationary point of (NMU-1) and M > 0, then u > 0 and v > 0. Moreover, the residual R = M − uv^T can be written as R = W H′ for some H′ ≥ 0 with at least one zero by row and by column in H′.
Proof. Positivity of u and v follows directly from Equations (7.1) and (7.2), while the structure of the residual matrix R is a consequence of Lemmas 7.1 and 7.2.
Let us use the notation of Corollary 7.3. We observe that it is typically very unlikely for the sparsity pattern of a row of H′ to be contained in the sparsity pattern of another row, i.e., that

    I = supp(H′_i:) ⊂ supp(H′_j:),   for some i ≠ j,        (7.3)
for some non-empty set I ⊂ {1, 2, ..., n}. There are two basic reasons for this fact:

1. We know there is at least one zero by row and by column in H′ (Corol- lary 7.3). Clearly,

    H′(i, I) = H′(j, I) = 0   ⟺   H(i, I) = α H(j, I),

for some constant α > 0. In fact, recall that H′(i, I) = H(i, I) − d_i v(I) so that H′(i, I) = H′(j, I) = 0 if and only if H(i, I) − d_i v(I) = H(j, I) −

d_j v(I) = 0, i.e., H(i, I) = (d_i/d_j) H(j, I). If |I| ≥ 2 and if we assume that H is generated randomly, the probability of having H(i, I) = αH(j, I) is zero (randomly generated vectors in two dimensions or more are multiples of each other with probability zero). If |I| = 1, it means that at least one row of H′ has only one zero element. We know that there are at least n zeros in H′ (one by column) and at least one zero in each of the r rows of H′. There are still at least (n − r) zeros to be placed in the r rows of H′. Assuming there are only (n − r) zeros (typically, there are many more zeros in the residual) and that they are distributed uniformly among the rows of H′, we can compute the probability of having the sparsity pattern of one row contained in the sparsity pattern of another⁴ in the case |I| = 1 of Equation (7.3) above. Figure 7.3 displays this probability for n = 210 (which is the number of spectral bands for HYDICE images, cf. Section 7.4) with respect to r (number of materials). For example, for r ≤ 25 (i.e., less than 25 materials present in the image), the probability for one row of H′ to have only one zero and for another row of H′ to have a zero at the same position is smaller than 10⁻².

Figure 7.3: Probability for at least one row of H′ to have one zero element, while another row has a zero at the same position for n = 210, assuming that H is randomly generated, and that H′ contains (only) n zeros which are uniformly distributed (with at least one zero by row).

2. In practice, it is observed that the zeros are not distributed uniformly among the rows of H′, and, typically, they have approximately the same

⁴We added up the probabilities to have i rows with only one zero element multiplied by the probability for at least one of these zeros to be located at the same position of another one (either in one of these i rows with one zero, or in the remaining r − i).

number of zeros (O(n/r)), located at different positions.
Using another cost function (the sum of the logarithms of the ratios between the entries of dv^T and H > 0), it can be proved that the problem is equivalent to a flow problem (using a logarithmic change of variables), see Problem (Flow) in Section 6.4. In contrast with item 1. above, if H is a square matrix (r = n) and (d, v) is an optimal solution of (Flow), the zeros of H − dv^T will be located on the diagonal of H′ (up to a permutation)⁵ and no row will share common zeros. When using the Frobenius norm as an objective function, it seems that the zeros follow the same sparsity pattern, even though we do not have a proof for this fact. However, notice that problem (Flow) shares the same stationarity conditions as (NMU-1) (see Equations (7.1) and (7.2)), so that optimal solutions of (Flow) are stationary points of (NMU-1).
Conclusion. After the first NMU recursion, the residual R can be written in the same form as M = W H (cf. Assumption 7.1) with R = W H′, and it is highly probable that

    supp(H′_i:) ⊈ supp(H′_j:)   ∀ i ≠ j,
i.e., that no row of H′ has its sparsity pattern contained in the sparsity pattern of another row. If this property holds, we will say that H′ satisfies the sparsity pattern assumption.

7.1.2 Next Rank-One Factors
Assuming that M = W H satisfies Assumption 7.1 and that H satisfies the sparsity pattern assumption, we show that the recursion outlined above will eventually locate each material individually.
Theorem 7.4. Let (u, v) be a nontrivial stationary point of (NMU-1) for M = W H satisfying Assumption 7.1 and supp(H_i:) ⊈ supp(H_j:) ∀ i ≠ j, i.e., H satisfies the sparsity pattern assumption. Then R = M − uv^T = W H′, with u = W d for some d ∈ R_+^r so that H′ = H − dv^T ≥ 0. Moreover,

    supp(u) = ∪_{i ∈ Ω} supp(W_{:i}),   for some Ω ⊂ {1, 2, ..., r},
and
    Ω = {i}   ⟺   H′_i: = 0   ⟺   d_i v^T = H_i:,        (7.4)
for some 1 ≤ i ≤ r. Ω = {i} amounts to extracting the material i.
⁵This is related to the way assignment problems are solved [63]. One can show that a solution (d, v) of (Flow) (cf. Section 6.4) is optimal if and only if the matrix D with d_ij = log(H_ij) − log(d_i) − log(v_j) has, up to a permutation of its rows or columns, zeros on its diagonal (this is how the Hungarian method used to solve assignment problems has been designed).

Proof. The first part is a consequence of Corollary 7.3. It remains to show that Equation (7.4) holds. The second equivalence is trivial since H′_i: = H_i: − d_i v^T is equal to zero for some i if and only if d_i v^T = H_i:. For the first equivalence, observe that d_i v^T = H_i: implies that Ω = {i} because of the underapproximation constraints and the sparsity pattern assumption. In fact, since v has the same support as H_i:, we have ∀ j ≠ i, ∃ k s.t. H_jk = 0 and v_k > 0, implying d_j = 0. Finally, it is clear that for Ω = {i}, the solution obtained with Equation (7.2) is v = (1/d_i) H_i:^T.
Theorem 7.4 implies that, at each step of the NMU recursion, a set of materials is extracted together. Moreover, a material is extracted alone if and only if the corresponding row of H′ is set to zero. Since the recursive approach outlined above will eventually end up with a zero residual (say, after r_u steps), we will have that
    M = Σ_{i=1}^{r_u} u_i v_i^T,
and, under the sparsity pattern assumption (at each step of the recursion),

    ∀ 1 ≤ i ≤ r, ∃ 1 ≤ j ≤ r_u s.t. supp(u_j) = supp(W_{:i}).

In fact, for the residual R = W H′ to be equal to zero, all the rows of H′ must be identically zero. This feature of the NMU recursion will be experimentally verified in Section 7.4.

Remark 7.1. The sparsity pattern assumption is a sufficient but not a necessary condition for exact recovery. For example, if two rows with the same sparsity pattern are extracted together, it is likely that the corresponding optimal solution will not be exactly equal to one of these two rows (because they are linearly independent) and therefore, at the next step, it is likely that they will satisfy the sparsity pattern assumption.

7.1.3 Illustration of Basis Recovery with NMU vs NMF
Let us construct the following synthetic data: 4 binary orthogonal images of 5 × 5 pixels (which are the columns of W, W ∈ {0,1}^{25×4}, see the top image of Figure 7.5) are randomly mixed (H ∈ R_+^{4×25} is randomly generated with uniform distribution between⁶ 0 and 1) to generate a 25 × 25 matrix M = W H satisfying Assumption 7.1. Figure 7.4 displays a sample of the 25 images contained in the columns of M, which then result from the nonnegative linear combination of the columns of W. Figure 7.5 displays the original images and the basis elements obtained with NMF and NMU. As was mentioned in Section 7.1.1, the first rank-one factor of NMU will reduce the error the most, and we

⁶We used the function rand(4,25) of MATLAB®.

Figure 7.4: Sample of images of the data matrix M: clean (left) and with mixed pixels (right).

include it and list a total of five for NMU (since each end-member has been extracted individually after 5 steps, the residual error is equal to zero, i.e., the approximation is exact, see Theorem 7.4). We observe that NMF is not able

to extract perfectly the four original basis elements (even though the objective function is equal to zero; the reason is the non-uniqueness of the solution: NMF retrieves a mixture of the basis elements) while NMU is able to do the extraction⁷.
Figure 7.5: From top to bottom: four original images (i.e., columns of W), basis elements obtained with NMF and with NMU.
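The synthetic example is easy to reproduce; the sketch below builds a matrix M = W H satisfying Assumption 7.1 from 4 binary orthogonal 5 × 5 'images'. The thesis uses hand-designed parts and MATLAB's rand(4,25); the random pixel partition here is only an illustrative stand-in and does not reproduce the exact images of Figure 7.5.

    import numpy as np

    rng = np.random.default_rng(0)

    # W: 25 x 4 binary matrix with orthogonal columns (each of the 5*5 pixels is
    # assigned to exactly one part); with high probability every part is nonempty,
    # so W has full column rank as required by Assumption 7.1.
    labels = rng.integers(0, 4, size=25)
    W = np.eye(4)[labels]

    # H: 4 x 25 nonnegative mixing matrix, uniform on [0, 1].
    H = rng.random((4, 25))

    M = W @ H                        # 25 x 25 data matrix, M = W H
    parts = W.T.reshape(4, 5, 5)     # the four binary 5 x 5 'images' (columns of W)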

7.2 The Non-Ideal Case

As we have already mentioned, practical problems don’t have the nice structure mentioned in Assumption 7.1 and the spectral signature of most pixels results from a combination of several materials. What can we expect of NMU in that case? Since the data matrix is positive, the first rank-one factor will still be a mixture of all materials (cf. Lemma 7.1). It seems more difficult to provide theoretical guarantees for the next factors in more general settings

7Notice that for n = 25 and k = 4, the probability for two rows of H′ to satisfy the sparsity pattern assumption is larger than 1−10−8, if we assume that the zeros are uniformly distributed, and that H is randomly generated (which is the case here), see Section 7.1.1. 175 CHAPTER 7. HYPERSPECTRAL DATA ANALYSIS USING NMU

and this will be a topic for further research. However, extracting a single constitutive material would allow one to approximate all the pixels containing it (removing from their spectral signature this component) and, since NMU aims at extracting components explaining the data as closely as possible in order to reduce the error the most, this indicates that NMU is encouraged to extract constitutive materials in non-ideal cases. For example, let us add to the matrix W in the illustration of the previous paragraph a randomly generated matrix, uniformly distributed between 0 and 0.5. This means that each pixel is now a mixture of several materials but one material is still predominant. Figure 7.6 displays the visual result: NMF performs even worse, while NMU is still able to extract the original parts fairly well. It actually provides a soft clustering for each pixel8, as it will also be shown in Section 7.4.

Figure 7.6: From top to bottom: 4 original images (i.e., columns of W ), basis elements obtained with NMF and with NMU.

7.3 ℓ0-Pseudo-Norm Minimization and ℓ1-Norm Relaxation

Ideally, each basis element extracted with the recursive approach outlined pre- viously should correspond to a different material present in the hyperspectral image: we would like that each extracted rank-one factor corresponds to only one material, i.e., that only a submatrix of M (a set of rows of M) corre- sponding to pixels containing the same material is approximated at each step.

⁸Soft clustering means that a single element of the dataset can be assigned to different clusters; a (nonnegative) weight being attached to each cluster (e.g., corresponding to the probability of the pixel to belong to the cluster). In a hyperspectral image, it makes sense since the pixels can be composed of different materials (the clusters) with different abundances (the weights, summing to one).

Unfortunately, the ℓ2-norm is not appropriate for this purpose: it is very sensitive to 'outliers', i.e., it cannot neglect some entries of the matrix M and set only a subset of the entries of the residual error to zero. It is more likely that it will try to approximate several materials at the same time in order to avoid large entries in the residual error. For this reason, we will see that the ℓ2-norm based algorithm typically first extracts several materials together.
If the ℓ0-'norm' is used instead, i.e., if the number of zero entries in the residual is maximized, one can check that for a matrix satisfying Assumption 7.1, this will lead to an exact recovery in r steps; because extracting one material (i.e., taking v = H_i:^T for some i at each step) will lead to the highest number of zeros in the residual R = M − uv^T (rows corresponding to the extracted material are identically zero; plus one zero by row and by column for the other ones). Unfortunately, the ℓ0-'norm' is very difficult to work with (||M − uv^T||_0 is non-differentiable and non-convex, even when one of the factors u or v is fixed). Moreover, in practice, because of noise and blur, the ℓ0-'norm' would not be appropriate since rows of M representing the same material cannot be approximated exactly. However, the ℓ1-norm, often used as a convex heuristic to approximate the ℓ0-'norm'⁹, is known to be less sensitive to outliers and is then disposed to let some entries of the error be large, in order to better approximate other entries. We will experimentally observe in Section 7.4 that using the ℓ1-norm allows us to extract materials individually in a more efficient manner, i.e., using a smaller number of recursive steps.

7.3.1 Algorithm for ℓ1-Norm Minimization
Using the idea of Lagrangian duality presented in Section 6.5, we propose to solve¹⁰

    min_{u ∈ R_+^m, v ∈ R_+^n} ||(M − Λ) − uv^T||_1 = Σ_ij |(M − Λ) − uv^T|_ij,        (7.5)
where Λ ∈ R_+^{m×n} represents the Lagrangian multipliers associated with the underapproximation constraints. Let us apply the same procedure as in Section 6.5, i.e., alternate optimization over u, v and Λ. Fixing v and Λ and noting A = M − Λ, u can be optimized by solving the following m independent problems

    min_{u_i ≥ 0} ||A_i: − u_i v^T||_1 = Σ_j |A_ij − u_i v_j| = Σ_{j ∈ supp(v)} v_j |A_ij/v_j − u_i| + Σ_{j ∉ supp(v)} |A_ij|,
⁹The convex envelope of ||x||_0 on the set S = {x ∈ R^n | ||x||_2 ≤ 1} is ||x||_1, i.e., ||x||_1 is the largest convex function smaller than ||x||_0 on S, see [129] and the references therein.
¹⁰Note that Λ does not correspond to the Lagrangian dual variables of min_{u≥0, v≥0, uv^T≤M} ||M − uv^T||_1 = Σ_ij |M − uv^T|_ij. However, this formulation is closely related to the Lagrangian relaxation and allows us to use the same derivations as for Algorithm 12.

which can be solved by computing the weighted median of z, with z_j = A_ij/v_j ∀ j and weights v_j. The same can be done for v by symmetry, and we propose to update u and v with (steps 6 and 7 of Algorithm 12)

    u_i = max( 0, weighted-median( [(M − Λ)_i(J)] / [v(J)], v(J) ) )   ∀ i,  J = supp(v),
and
    v_j = max( 0, weighted-median( [(M − Λ)_j(I)] / [u(I)], u(I) ) )   ∀ j,  I = supp(u).
The weighted median of an n-dimensional vector can be computed in O(n) operations, cf. [86] and the references therein, so that the algorithm can be implemented in O(mn) operations per iteration when the data matrix M has dimension m × n. We will refer to this algorithm as ℓ1-NMU. The ℓ2-norm version (where u and v are taken as the optimal solution of the ℓ2-norm minimization problem, see Algorithm 13) has the same computational complexity even though in practice ℓ1-NMU will be slower, but only up to a constant factor¹¹.
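For reference, a weighted median and the resulting ℓ1 update of u (for fixed v and Λ) can be written as follows; the O(n) selection algorithm of [86] is replaced here by a simple sort for clarity, and v is assumed to be nonzero.

    import numpy as np

    def weighted_median(z, w):
        # Minimizes sum_j w_j * |z_j - t| over t: sort z and return the first value
        # whose cumulative weight reaches half of the total weight.
        order = np.argsort(z)
        z, w = z[order], w[order]
        cum = np.cumsum(w)
        return z[np.searchsorted(cum, 0.5 * cum[-1])]

    def l1_update_u(M, Lam, v):
        # u_i = max(0, weighted-median of (M - Lam)_i(J)/v(J) with weights v(J)), J = supp(v).
        A = M - Lam
        J = v > 0                     # assumes v is not identically zero
        return np.array([max(0.0, weighted_median(A[i, J] / v[J], v[J]))
                         for i in range(M.shape[0])])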

The rest of the algorithm is implemented as follows, see Algorithm 13. Matrix Λ is updated with a subgradient type update (step 10, as in Algorithm 12). If Λ is too large, it might happen that u and/or v are set to zero, leading to a trivial stationary point. We propose to reduce the value of Λ if that happens and to set u and v back to their old values (step 12). Iterates (u, v) are initialized with the optimal rank-one solution of the ℓ2-norm unconstrained problem (i.e., the optimal rank-one approximation of the residual, step 2, corresponding to Λ = 0); Λ is initialized with the nonnegative part of the residual matrix (step 4). Since the algorithm is not guaranteed to generate a feasible solution¹², only the nonnegative part of the residual is considered (step 15).
Note that the updates of u and v share some similarities with the power method (applied to M − Λ, with projection on the nonnegative orthant) which computes the maximum singular value and its corresponding left and right singular vectors, see Section 2.2.1. It seems that Algorithm 12 behaves similarly to the power method in the sense that it converges in general relatively fast. Extensive experiments on a host of data and applications allow us to conclude that 100 iterations at each step of the recursion are sufficient (i.e., maxiter = 100, which will be used for the numerical experiments, see Section 7.4).

¹¹Implementation of both algorithms is available at http://www.core.ucl.ac.be/~ngillis/.
¹²This feature is actually an advantage for practical applications. In fact, this gives the algorithm some flexibility when dealing with noisy data. However, one can obtain a feasible stationary point by using updates (7.1) and (7.2) as a post-processing step.

Algorithm 13 Recursive NMU for ℓ1- and ℓ2-norms
Require: M ∈ R_+^{m×n}, r > 0, maxiter, norm = 1 or 2.
Ensure: (U, V) ∈ R_+^{m×r} × R_+^{r×n} s.t. UV ≲ M.
1: for k = 1 : r do
2:   [u, v] = optimal rank-one approximation(M);
3:   U_:k ← u; V_k: ← v;
4:   Λ ← max(0, −(M − uv^T));
5:   for p = 1 : maxiter do
6:     if norm = 2 then
7:       u ← max(0, (M − Λ)v / ||v||²₂);
8:       v ← max(0, (M − Λ)^T u / ||u||²₂);
9:     else if norm = 1 then
10:      u_i = max(0, w-median([(M − Λ)_i(J)] / [v(J)], v(J))) ∀ i, J = supp(v);
11:      v_j = max(0, w-median([(M − Λ)_j(I)] / [u(I)], u(I))) ∀ j, I = supp(u);
12:    end if
13:    if u ≠ 0 and v ≠ 0 then
14:      U_:k ← u; V_k: ← v;
15:      Λ ← max(0, Λ − (1/p)(M − uv^T));
16:    else
17:      Λ ← Λ/2; u ← U_:k; v ← V_k:;
18:    end if
19:  end for
20:  M = max(0, M − U_:k V_k:);
21: end for
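As an illustration, the ℓ2 branch of Algorithm 13 can be transcribed compactly in NumPy; this sketch mirrors the pseudocode above (rank-one SVD initialization, Lagrangian multiplier update, residual truncation) but is not the author's MATLAB implementation available at the address given in the footnote.

    import numpy as np

    def recursive_nmu_l2(M, r, maxiter=100):
        # Sketch of Algorithm 13 with norm = 2.
        M = M.astype(float).copy()
        m, n = M.shape
        U, V = np.zeros((m, r)), np.zeros((r, n))
        for k in range(r):
            # step 2: optimal rank-one approximation of the current residual
            W, s, Zt = np.linalg.svd(M, full_matrices=False)
            u, v = W[:, 0] * np.sqrt(s[0]), Zt[0, :] * np.sqrt(s[0])
            if u.sum() < 0:                      # leading vectors can be taken nonnegative
                u, v = -u, -v
            u, v = np.maximum(0.0, u), np.maximum(0.0, v)
            Lam = np.maximum(0.0, -(M - np.outer(u, v)))            # step 4
            for p in range(1, maxiter + 1):
                A = M - Lam
                u_new = np.maximum(0.0, A @ v / (v @ v + 1e-16))    # step 7
                v_new = np.maximum(0.0, A.T @ u_new / (u_new @ u_new + 1e-16))  # step 8
                if u_new.any() and v_new.any():
                    u, v = u_new, v_new
                    Lam = np.maximum(0.0, Lam - (M - np.outer(u, v)) / p)       # step 15
                else:
                    Lam = Lam / 2                                   # step 17 (keep old u, v)
            U[:, k], V[k, :] = u, v
            M = np.maximum(0.0, M - np.outer(u, v))                 # step 20
        return U, V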

Since only a rank-one matrix is computed at each step, (NMU-1) is in general well-posed in the sense that the optimal solution is unique (up to a scaling factor)¹³. In fact, for any rank-one nonnegative matrix A, there exists one and only one (u, v) ≥ 0 such that ||u||_2 = 1 and A = uv^T. In our experiments, we observed that NMU is much less sensitive to initialization and that, in general, when we allow several restarts of the algorithm with different initializations, it ends up with similar solutions (hopefully, close to the optimum). This is typically not the case with the standard NMF formulation because of non-uniqueness [103].

¹³Note that (NMF) with r = 1 is also well-posed; in fact, the optimal solution is unique if and only if the maximum singular value of M (σ1(M)) is strictly greater than the second biggest singular value (σ2(M) < σ1(M)), cf. Section 2.2.

7.4 Numerical Experiments

In this section, the algorithms for ℓ2- and ℓ1-norm minimization proposed in Section 7.3 (ℓ2-NMU and ℓ1-NMU) are used as dimensionality reduction techniques for hyperspectral data in order to achieve classification (selecting from the basis elements the different clusters), and spectral unmixing (using nonnegative least squares). In the first part, we analyze carefully the Urban HYDICE image (Section 7.4.2) and a Hubble space telescope simulated image (Section 7.4.3) developed in [153]. To serve as a comparison, we also give the classification results obtained with k-means¹⁴; we plan to compare the NMU strategy with more sophisticated techniques in the future. In the second part, we provide some visual results for aerial images of a desert region and of the San Diego airport (Section 7.4.4), and for an eye image with four bands (Section 7.4.4).

7.4.1 Classification and Spectral Unmixing
NMU can be used as a standard dimensionality reduction technique and any type of post-processing procedure can be used to extract the constitutive parts of spectral data, e.g., k-means, nearest neighbor, etc. However, we have shown why NMU is potentially able to extract these parts automatically. Therefore, the simplest approach would be to visually select each cluster from the generated basis elements (i.e., manually select the basis elements representing a single material). We stick to this approach and select, from the basis elements, each individual cluster: from the U matrix obtained with NMU, we only keep a subset of the columns, each corresponding to an individual material.
The second post-processing step is to normalize U. In fact, as NMF, NMU is invariant to the scaling of the columns of U (∀ k, U_:k V_k: = (αU_:k)(α⁻¹V_k:) ∀ α > 0). Moreover, in the context of hyperspectral image analysis, rows of U have a physical interpretation: U_ij is the abundance of material j in pixel i. Therefore, U_ij ≤ 1 ∀ i, j and the columns of U are normalized with

    U_:j = U_:j / max_i(U_ij)   ∀ j.
This means that, for each rank-one factor extracted with the NMU procedure, the maximum abundance of each pixel for the corresponding spectral signature

V_k: is at most 1. Moreover, since rows of U correspond to abundances, Σ_j U_ij = 1 ∀ i, and we can scale the rows of U as follows
    U_i: ← U_i: / (||U_i:||_1 + ε),   ε ≪ 1,
¹⁴We used the MATLAB® kmeans function, with default parameters.

so that they sum to one (except if they are identically zero). This allows us to equilibrate the relative importance of each pixel in each basis element. With this procedure, we end up with a soft clustering: each pixel i is composed of several materials with the corresponding abundances given by U_i:.
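The two normalization steps just described amount to a few lines; here is a minimal NumPy sketch (column scaling by the maximum entry, then row scaling to approximately unit ℓ1 norm, with a small ε protecting all-zero rows).

    import numpy as np

    def normalize_abundances(U, eps=1e-12):
        # Column-wise: U_:j <- U_:j / max_i U_ij, so that abundances are at most 1.
        U = U / np.maximum(U.max(axis=0, keepdims=True), eps)
        # Row-wise: U_i: <- U_i: / (||U_i:||_1 + eps), so each pixel's abundances sum to ~1.
        return U / (np.abs(U).sum(axis=1, keepdims=True) + eps)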

Once the pixels have been classified, one might be interested in recovering the spectral signatures of each individual material (corresponding to the rows of matrix V in a decomposition M ≈ UV), called the end-members. A standard approach is to solve a nonnegative least squares problem of the form

    min_{V ≥ 0} ||M − UV||_F²,        (NNLS)
where U represents the basis vectors, with dedicated algorithms [27].
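Since (NNLS) decouples over the columns of M (each column M_:j gives an independent nonnegative least squares problem with the fixed basis U), it can be solved column by column; the sketch below uses SciPy's nnls routine as a simple stand-in for the dedicated algorithms of [27].

    import numpy as np
    from scipy.optimize import nnls

    def endmembers_nnls(M, U):
        # Solve min_{V >= 0} ||M - U V||_F^2 one column of M at a time:
        # each column M[:, j] yields an independent problem min_{x >= 0} ||U x - M[:, j]||_2.
        n = M.shape[1]
        V = np.zeros((U.shape[1], n))
        for j in range(n):
            V[:, j], _ = nnls(U, M[:, j])
        return V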

Remark 7.2. The rows of V in the NMU solution don’t correspond directly to the spectral signatures of the end-members, because several materials are sometimes extracted together, in particular at the first steps of the algorithm. In order to get the spectral signature of an end-member from the NMU solution, one has to sum up all the rows of V corresponding to a column of U containing the desired end-member. For example, for M > 0, the first row of V always has to be used to obtain a spectral signature since all materials are present in the first basis element with U:1 > 0, see Corollary 7.3. It is therefore more convenient and more accurate to compute the spectral signatures by solving (NNLS).

7.4.2 Urban HYDICE Image
We consider first the Urban hyperspectral image¹⁵ taken with HYperspectral Digital Imagery Collection Experiment (HYDICE) air-borne sensors. We analyze the data where the noisy bands have been removed (162 bands left, originally 210), and the data cube has dimension 307 × 307 × 162. Figures 7.7 and 7.8 give the basis elements of the ℓ2- and ℓ1-NMU decompositions. The Urban data is mainly composed of 6 types of materials: road, dirt, trees, roofs, grass and metal, as reported in [85]. Table 7.1 gives the index of the NMU basis elements corresponding to single materials.

Clusters     Road  Dirt  Trees  Roofs  Grass  Metal
ℓ2 basis #    18    23     3      4      6      8
ℓ1 basis #    16    17     2      5      6      7

Table 7.1: Basis elements obtained: cluster selection.

¹⁵Available at http://www.agc.army.mil/hypercube/.

Figure 7.9 shows the classification obtained from the basis elements obtained with NMU (cf. Figures 7.7 and 7.8, and Table 7.1), and with k-means, which is only able to extract two materials correctly. Figure 7.10 displays the results of the spectral unmixing procedure for both NMU algorithms (ℓ2 and ℓ1), which are compared to the 6 end-members listed as the 'true' end-members for this data in [85].
In this example, NMU performs relatively well and is able to detect all the materials individually, which can then be used to classify the pixels and finally recover the end-member signatures. We also note that, as predicted, ℓ2-NMU needs more recursion steps than ℓ1-NMU to extract all materials individually (23 vs. 17). For example, it is interesting to observe that ℓ1-NMU actually extracts the grass as two separate basis elements (6 and 10, cf. Figure 7.8). The reason is that the spectral signatures of the pixels in these two basis elements differ (especially after the 100th band in the hyperspectral image): there are two types of grass with similar spectral signatures, and they are different enough to be assigned to different clusters (see Figure 7.11). Because ℓ1-NMU is able to extract parts separately in a more efficient way (cf. Section 7.3), it is able to detect this 'anomaly' while ℓ2-NMU is not. It also gives an explanation of the differences in the spectral signatures of the grass in Figure 7.10.

Number of Spectral Bands
In Section 7.1, a sufficient condition was presented to recover each material individually: the number of spectral bands (n) should be sufficiently larger than the number of materials (r). This is clearly satisfied by the Urban data since, for n = 162 and r = 6, the probability to satisfy the sparsity pattern assumption is larger than 1 − 10⁻¹². However, it was also explained why this condition is not necessary, see Section 7.1. What happens then if we reduce the number of bands? For example, suppose we keep only 9 bands from the 162 original clean bands of the Urban data¹⁶. Figure 7.12 displays the basis elements for ℓ2-NMU. Surprisingly, the algorithm is still able to separate the materials: trees are recovered in basis element 2, roofs in 3, road in 4 (mixed with dirt), grass in 5, metal in 7, and dirt in 8.
When fewer than 6 bands are kept, the algorithm cannot detect all the materials individually. Figure 7.13 displays the basis elements using 5 bands. The grass and the trees are extracted together (basis element 2), because their spectral signatures are similar; the roads and the dirt are also extracted together (basis element 4), as are the road and roofs (basis element 7), while the roofs and the metal (basis elements 3 and 6 respectively) are extracted individually.

¹⁶We selected the bands so that they are well distributed in the spectral domain. However, one could use more sophisticated techniques (e.g., subset selection algorithms such as [100]) or use dimensionality reduction techniques preserving nonnegativity (such as NMF) as a pre-processing step.

Finally, it seems that, as long as the spectral signatures of the different materials can be distinguished, the number of spectral bands does not need to be significantly larger than the number of materials in order for NMU to be able to perform classification. However, when this is not the case (e.g., when more noise and blur are present), more spectral bands are needed to distinguish the different materials, which seems to be a natural requirement (these observations will also be illustrated in the next section).

7.4.3 Simulated Hubble Space Telescope Data
Figure 7.14 displays some sample images of the simulated Hubble database, which consists of 100 spectral images of the Hubble telescope, 128 × 128 pixels each, with added blur and noise¹⁷ [126]. It is composed of 8 materials¹⁸, see Figure 7.15.
Figure 7.16 shows the basis elements obtained with NMU, Figure 7.17 displays the classification obtained with k-means (which is able to extract five constitutive materials), Table 7.2 gives the classification of the basis elements and Figure 7.18 shows the end-members extraction: original vs. noisy and blurred.

Clusters     Al  S. Cell  Glue  Cu  H. Side  H. Top  Edge  Bolts
ℓ2 basis #    2     3       4    6     7        8      13    11
ℓ1 basis #    2     3       4    7     6        9      11     8

Table 7.2: Basis elements obtained: cluster selection for the Hubble telescope database with noise and blur, see Figure 7.16.

Spectral signatures of the black rubber edge and the bolts are not recovered very accurately (or not at all in the case of the ℓ2-norm). The reason is that they are the smallest and thinnest parts: they get mixed with surrounding materials, which makes them difficult to extract. Moreover, the spectral signature of the bolts is very similar to the one of the copper stripping and therefore, when noise and blur are added, they are extracted together (basis elements 11 for the ℓ2-norm and 8 for the ℓ1-norm).
As for the Urban dataset, ℓ2-NMU extracts more mixed materials and therefore needs more recursions to get all the parts separated than ℓ1-NMU, which seems to do a better job (especially for the black rubber edge).

¹⁷Point spread function on 5 by 5 pixels and with standard deviation of 1, and white Gaussian noise σ = 1% of the values of M and Poisson noise σ = 1% of the mean value of M.
¹⁸These are true Hubble satellite material spectral signatures provided to us by the NASA Johnson Space Center.

Number of Spectral Bands
Let us reduce the number of spectral bands to 12, and compare the basis elements obtained with ℓ2-NMU on the clean vs. the noisy and blurry images. Figure 7.19 displays the basis elements. Clearly, in the noisy and blurry case, the algorithm is no longer able to extract all the materials. The reason is that there are not enough spectral bands left. Because of blur (spectral signatures of materials are mixed together) and noise (spectral signatures are perturbed), 12 bands are not enough to distinguish all the materials, as already explained in Section 7.4.2. Quite naturally, the larger the number of bands, the more robust the algorithm will be with respect to noise and blur.

7.4.4 Visual Experiments
In this section, we first provide some visual results¹⁹ for datasets analyzed in a recent comparative study of dimensionality reduction techniques for hyperspectral images [24], and for which 'ground truth' data is not available. However, it allows us to experimentally reinforce our claims about the properties of NMU, namely that it is indeed able to detect materials, or at least to separate some of them. We then display results for a hyperspectral image of an eye with 4 spectral bands, and propose a way to post-process the basis elements obtained with NMU in order to achieve clustering.

Aerial Image of a Desert Region
This is a HYDICE terrain data set with 166 clean bands (originally 210), each containing 500 × 307 pixels. Figure 7.20 displays the first 6 basis elements. NMU is easily able to extract trees (basis element 2), roads (basis element 3) and grass (basis element 6).

San Diego Airport
The San Diego airport hyperspectral image contains 158 clean bands, and 400 × 400 pixels for each spectral image. Figure 7.21 displays the first 8 basis elements obtained by the NMU decomposition. In this case, it is less clear what the different materials are. It would probably be necessary to apply more sophisticated post-processing techniques to classify the pixels. However, we observe that
⋄ Basis elements 4, 7 and 8 contain the roofs;
⋄ Basis element 6 mostly contains roads (including the parking lots);
¹⁹We will only display basis elements obtained with ℓ2-NMU because the results obtained with ℓ1-NMU are comparable.

⋄ Basis element 2 contains the grass and some roofs;
⋄ Basis element 3 is mostly composed of another type of road surface (including boarding and landing zones).

Eye Image
We are given (only) 4 spectral images with 1040 × 1392 pixels (the spectral bands correspond to IR, red, green and blue channels)²⁰. The data comes from the West Virginia University multispectral image iris database [132]. Figure 7.22 displays the basis elements obtained with ℓ2-NMU. The first basis element represents the pupil, a part of the iris and the eyelashes; the second some kind of substructure in the iris and the skin; and the third the pupil.
Segmentation and clustering of multispectral eye images are useful in the analysis of iris recognition algorithms in biometrics, see, e.g., [19]. A possible way to post-process the NMU basis elements in order to achieve clustering is to compare their supports. Recall that each basis element should represent a set of 'materials'. Therefore, if we want to identify these materials, the following simple procedure can be used (a short code sketch is given below):
1. Compare the supports of each pair of basis elements²¹, i.e., compute supp(u_i) ∩ supp(u_j) ∀ i ≠ j.
2. Define non-empty intersections as new basis elements. If no new basis elements are identified, go to step 3; otherwise go back to step 1.

3. ∀ i ≠ j such that supp(u_i) ⊂ supp(u_j), set supp(u_j) ← supp(u_j) \ supp(u_i). If some basis elements have been modified, go back to step 1, otherwise go to step 4.
4. Materials are identified as the remaining basis elements, which define disjoint clusters.
One can check that this procedure will always terminate within a finite number of steps. Notice that, in practice, some kind of thresholding should be used in order to define the supports, and the intersections containing a small number of pixels should be considered as empty. In the example of Figure 7.22, we have (after thresholding)
    supp(u_1) ∩ supp(u_2) ≈ ∅,   supp(u_3) ⊂ supp(u_1)   and   supp(u_2) ∩ supp(u_3) = ∅,
so that the above procedure provides us with the clustering displayed in Figure 7.23.
²⁰The data comes from the West Virginia University multispectral image iris database. The circle around the pupil and the matrix inside the pupil were embedded in the image.
²¹There are C(r,2) = r(r−1)/2 = O(r²) such pairs (at the first step), and each comparison requires O(mn) operations.
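The support-comparison procedure above can be sketched as follows; the thresholding rule, the minimum intersection size and the iteration cap are illustrative choices, not values prescribed by the text.

    import numpy as np

    def support_clusters(U, tol=1e-3, min_pixels=50, max_rounds=100):
        # Thresholded supports of the NMU basis elements (columns of U).
        S = [set(np.flatnonzero(U[:, k] > tol * U[:, k].max())) for k in range(U.shape[1])]
        for _ in range(max_rounds):
            changed = False
            # Steps 1-2: promote sufficiently large pairwise intersections to new elements.
            new = []
            for i in range(len(S)):
                for j in range(i + 1, len(S)):
                    inter = S[i] & S[j]
                    if len(inter) >= min_pixels and inter not in S and inter not in new:
                        new.append(inter)
            if new:
                S += new
                changed = True
            # Step 3: remove nested supports from the larger one.
            for i in range(len(S)):
                for j in range(len(S)):
                    if i != j and S[i] < S[j]:        # strict subset
                        S[j] = S[j] - S[i]
                        changed = True
            if not changed:
                break
        # Step 4: the remaining supports define (approximately) disjoint clusters.
        return S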

Conclusion

In this chapter, we have used recursive NMU for dimensionality reduction in the context of hyperspectral data analysis. We gave theoretical and experimental evidence showing that NMU is able to perform automatically soft-clustering of hyperspectral images. A main advantage is that no sparsity parameters have to be tuned and parts-based representation is naturally achieved. It would be interesting to compare NMU with other dimensionality reduction techniques such as PCA, NMF, ICA, etc., see [24]. Another direction of research would be to design automatic classification algorithms, based on the properties of NMU, to classify the pixels; as we proposed in Section 7.4.4. It would be particularly interesting to study these properties in more depth and see if it is possible to obtain stronger theoretical guarantees for the factors generated by NMU. Finally, NMU can be generalized to higher order tensors, which could be called nonnegative tensor underapproximation (NTU). For example, for a third order tensor of dimension m n p, we would use a third vector p T × × w R in order to underapproximate with a rank-one tensor u v w, ∈ + T ⊗ ⊗ i.e., ijk uivj wk. The optimal solutions for u, v and w separately can still be writtenT ≈ in closed-forms and Algorithm 12 can be easily generalized. Notice that the hyperspectral images analyzed in this section can also be represented as third order tensors (two spatial dimensions and the wavelength, see Figure 7.1). However, abundance maps typically don’t have a rank-one structure because constitutive elements can be spread anyhow over the scene being imaged. Hence they cannot be well represented by a rank-one matrix uvT , and using NTU does not give good results. Nevertheless, it could be useful in other situations. For example, if we also had images taken at different time steps. This situation can be encountered for the analysis of a chemical reaction where the concentrations of the different constitutive elements evolve with time.

186 7.4. NUMERICAL EXPERIMENTS

Figure 7.7: Basis elements (columns of matrix U) for the Urban dataset ex- tracted with ℓ2-NMU; dark tones indicate a high value of an entry (0 is white, 1 black); numbers indicate the position of the factor in the NMU decomposition.

187 CHAPTER 7. HYPERSPECTRAL DATA ANALYSIS USING NMU

Figure 7.8: Basis elements of ℓ1-NMU for the Urban dataset extracted, identi- fied as in Figure 7.7.

188 7.4. NUMERICAL EXPERIMENTS

Figure 7.9: Post-processed basis elements of NMU for Urban dataset with ℓ2- norm (top) and ℓ1-norm (middle), and classification obtained with k-means able to extract properly the grass, the trees and the dirt (bottom). Light tones represent high degree of membership. 189 CHAPTER 7. HYPERSPECTRAL DATA ANALYSIS USING NMU

Figure 7.10: End-members extraction: ℓ2-NMU (gray solid) and ℓ1-NMU (dashed) vs. 6 end-members from the image using N-FINDR5 [151] plus manual adjustment (dark solid) from [85]. The x-axis gives the wavelength bands while y-axis gives the reflectance values (intensities of reflected light).

350 300 Basis 6 Basis 10 250 200 150 100 50

0 50 100 150

Figure 7.11: Spectral signatures of grass: weighted average of the spectral signatures of the pixels present in basis elements 6 and 10 of the ℓ1-norm solution.

190 7.4. NUMERICAL EXPERIMENTS

Figure 7.12: Basis elements of ℓ2-NMU for the Urban dataset using only 9 bands (bands 1, 20, 40, ..., 160).

191 CHAPTER 7. HYPERSPECTRAL DATA ANALYSIS USING NMU

Figure 7.13: Basis elements of ℓ2-NMU for the Urban dataset using only 5 bands (1,41,81,121,161).

Figure 7.14: Sample of images in the Hubble tensor with blur and noise.

Figure 7.15: The 8 materials for the Hubble telescope data provided to us by NASA. From left to right: Aluminum, Solar Cell, Green Glue, Copper Stripping, Honeycomb Side, Honeycomb Top, Black Rubber Edge and Bolts.

192 7.4. NUMERICAL EXPERIMENTS

Figure 7.16: Basis elements of NMU for Hubble telescope with ℓ2-norm (top) and ℓ1-norm (bottom) with added blur and noise.

193 CHAPTER 7. HYPERSPECTRAL DATA ANALYSIS USING NMU

Figure 7.17: K-means classification for Hubble telescope with added blur and noise, not able to separate the Honeycomb Top, Black Rubber Edge and Bolts.

Figure 7.18: Endmembers for the Hubble satellite data with noise and blur. NMU with ℓ2 (gray solid) and NMU with ℓ1 (dashed) vs. 8 true end-members (black solid).

194 7.4. NUMERICAL EXPERIMENTS

Figure 7.19: Basis elements obtained with ℓ2-NMU on the Hubble telescope hyperspectral image using only 12 spectral bands (1+9i, 0 i 11): (top) clean image and (bottom) noisy and blurry image (same settin≤gs≤ as before).

195 CHAPTER 7. HYPERSPECTRAL DATA ANALYSIS USING NMU

Figure 7.20: Aerial Image of a Desert Region basis elements obtained with ℓ2-NMU.

196 7.4. NUMERICAL EXPERIMENTS

Figure 7.21: San Diego Airport basis elements obtained with ℓ2-NMU.

Figure 7.22: Eye basis elements obtained with ℓ2-NMU.

Figure 7.23: Clustering of the eye, based on the NMU decomposition.

197 CHAPTER 7. HYPERSPECTRAL DATA ANALYSIS USING NMU

198 Chapter 8

Weights and Missing Data

Approximating a matrix with one of lower rank is a key problem in data analysis and is widely used for linear dimensionality reduction, see the introduction of the thesis (Chapter 1). In some cases, it might be necessary to attach a weight to each entry of the data matrix corresponding to its relative importance [64]. This is for example the case in the following situations: The matrix to be approximated is obtained via a sampling procedure ⋄ and the number of samples and/or the expected variance vary among the entries, e.g., 2-D digital filter design [116], or microarray data analysis [117]. Some data is missing/unknown, which can be taken into account assign- ⋄ ing zero weights to the missing/unknown entries of the data matrix. This is for example the case in collaborative filtering, notably used to de- sign recommender systems [135] (in particular, the Netflix prize competi- tion has demonstrated the effectiveness of low-rank matrix factorization techniques [101]), or in computer vision to recover structure from mo- tion [93,139], see also [28]. This problem is often referred to as PCA with missing data (PCAMD) [83,139], and can be viewed as a low-rank ma- trix completion problem with noise, i.e., approximate a given noisy data matrix featuring missing entries with a low-rank matrix1. A greater emphasis must be placed on the accuracy of the approximation ⋄ on a localized part of the data, a situation encountered for example in image processing [89, Chapter 6].

Finding a low-rank matrix that is the closest to the input matrix according to these weights is an optimization problem called weighted low-rank approx-

1In our settings, the rank of the approximation is fixed a priori. 199 CHAPTER 8. WEIGHTS AND MISSING DATA

imation (WLRA). Formally, it can be formulated as follows: first, given an m n m-by-n nonnegative weight matrix W R+ × , we define the weighted Frobe- ∈ 1 nius norm of an m-by-n matrix A as A = ( W A2 ) 2 . Then, given an || ||W i,j ij ij m-by-n real matrix M Rm n and a positive integer r min(m,n), we seek × P an m-by-n matrix X with∈ rank at most r that approximates≤ M as closely as possible, where the quality of the approximation is measured by the weighted Frobenius norm of the error:

2 p∗ = inf M X W such that X has rank at most r. X Rm×n || − || ∈ Since any m-by-n matrix with rank at most r can be expressed as the product of two matrices of dimensions m-by-r and r-by-n, we will use the following m r more convenient formulation featuring two unknown matrices U R × and r n ∈ V R × but no explicit rank constraint: ∈ 2 2 p∗ = inf M UV W = Wij (M UV )ij . (WLRA) U Rm×r ,V Rr×n || − || − ij ∈ ∈ X Even though (WLRA) is suspected to be -hard [93,140], this has never, to the best of our knowledge, been studied formally.NP In this chapter, we focus only on the computational complexity of this problem in the rank-one case2 (i.e., for r = 1) and prove the following two results.

m n m n Theorem 8.1. When M 0, 1 × , and W ]0, 1] × , it is -hard to find an approximate solution∈ { of} rank-one (WLRA)∈ with objectiveNP function 11 6 accuracy less than 2− (mn)− .

m n m n Theorem 8.2. When M [0, 1] × , and W 0, 1 × , it is -hard to find an approximate solution∈ of rank-one (WLRA)∈ { with} objectiveNP function 12 7 accuracy less than 2− (mn)− .

It is then -hard to find an approximate solution to the following prob- lems: (1) rank-oneNP (WLRA) with positive weights, and (2) rank-one approxi- mation of a matrix with missing data.

Even though this chapter does not directly deal with nonnegativity con- straints, the matrices used in the reductions are nonnegative (see Theorems 8.1 and 8.2). Moreover, when the data matrix M is nonnegative, it is easy to show that, without loss of generality, any optimal solution to rank-one (WLRA) can be assumed to be nonnegative (cf. derivations of Example 8.1). Therefore, all the complexity results of this chapter also apply to NMF; in particular, it

2The obtained results can be easily generalized to any fixed rank r, see Remark 8.2. 200 8.1. PREVIOUS RESULTS

is -hard to find an approximate solution to rank-one weighted NMF 3 and to NPrank-one NMF with missing data, in contrast with rank-one NMF which is polynomially solvable (cf. Chapter 3). Remark 8.1. The following more general problem is sometimes also referred to as WLRA: 2 inf M UV P , U Rm×r ,V Rr×n || − || ∈ ∈ 2 T where A P = vec(A) P vec(A), with vec(A) a vectorization of matrix A and P an mn||-by-||mn positive semidefinite matrix, see [136] and the references therein; our WLRA formulation hence reduces to having P diagonal and nonnegative.

The chapter is organized as follows. We first review existing results about the complexity of (WLRA) in Section 8.1. In Section 8.2, we prove Theorem 8.1 and, in Section 8.3, Theorem 8.2 using polynomial-time reductions from the maximum-edge biclique problem (see Problem (MB) in Section 5.1.2). We conclude with a discussion and some open questions.

8.1 Previous Results

Weighted low-rank approximation is known to be much more difficult than the corresponding unweighted problem (i.e., when W is the matrix of all ones), which is solved using the SVD, see Section 2.2. In fact, it has been previously observed that the weighted problem might have several local minima which are not global [140]. Example 8.1. Let 1 0 1 1 100 2 M = 0 1 1 , and W = 100 1 2 .  1 1 1   1 1 1      In the case of a rank-one factorization (r = 1) and a nonnegative matrix M, one can impose without loss of generality that U 0 and V 0. In fact, one can easily check that any solution UV is improved≥ by taking≥ its component- wise absolute value UV = U V . Moreover, we can impose without loss of | | | || | generality that U 2 =1, so that only two degrees of freedom remain. Indeed, for a given || || x x 0,y 0 U(x, y)= y , with ≥ ≥ ,   x2 + y2 1 1 x2 y2  ≤ − −   3Weighted NMF amountsp to solving (WLRA) with additional nonnegativity constraints on U ≥ 0 and V ≥ 0, see, e.g., [89, Chapter 6]. 201 CHAPTER 8. WEIGHTS AND MISSING DATA

2 the corresponding optimal V ∗(x, y) = argminV M U(x, y)V W can be com- puted easily (it reduces to a weighted least squares|| − problem).|| Figure 8.1 dis- plays the surface of the objective function M U(x, y)V ∗(x, y) W with respect to parameters x and y; we distinguish 4 local|| − minima, close to ||( 1 , 0), (0, 1 ), √2 √2 (0, 0) and ( 1 , 1 ). We will see later in Section 8.2 how this example has been √2 √2

Figure 8.1: Objective function of (WLRA) with respect to the parameters (x, y). generated.

m n However, if the rank of the weight matrix W R+ × is equal to one, T m n ∈ i.e., W = st for some s R+ and t R+, (WLRA) can be reduced to an unweighted low-rank approximation.∈ In∈ fact,

M UV 2 = W (M UV )2 = s t (M UV )2 || − ||W ij − ij i j − ij i,j i,j X X 2 = s t M (√s U )( t V ) . i j ij − i i: j :j i,j X p p 

Therefore, if we define a matrix M ′ such that Mij′ = √sitj Mij i, j, an optimal weighted low-rank approximation (U, V ) of M can be recovered∀ from a solution (U ′, V ′) to the unweighted problem for matrix M ′ using U = U ′ /√s i and i: i: i ∀ 202 8.1. PREVIOUS RESULTS

V = V ′ /√t j. :j :j j ∀ When the weight matrix W is binary, WLRA amounts to approximating a matrix with missing data. This problem is closely related to low-rank matrix completion (MC), see [25] and references therein, which can be defined as min rank(X) such that Xij = Mij for (i, j) Ω 1, 2,...,m 1, 2,...,n , X ∈ ⊂{ }×{ } (MC) where Ω is the set of entries for which the values of M are known. (MC) has been shown to be -hard [29], and it is clear that an optimal solution X∗ of (MC) can be obtainedNP by solving a sequence of (WLRA) problems with the same matrix M, with 1 if (i, j) Ω W = , ij 0 otherwise∈  and for different values of the target rank ranging from r =1 to r = min(m,n). 2 The smallest value of r for which the objective function M UV W of (WLRA) vanishes provides an optimal solution for (MC).|| This−observation|| implies that it is -hard to solve (WLRA) for each possible value of r (from 1 to min(m,n)) sinceNP it would solve (MC). However, this does not imply that (WLRA) is -hard when r is fixed, and in particular when r = 1. In fact, checking whetherNP (MC) admits a rank-one solution can be done easily4. Rank-one (WLRA) can be equivalently reformulated as

2 inf M A W such that rank(A) 1, A || − || ≤ and, when W is binary, it is then the problem of finding, if possible, the best rank-one approximation of a matrix with missing entries. To the best of our knowledge, the complexity of this problem has never been studied formally; it will be shown to be -hard in the Section 8.3. NP Another closely related result is the -hardness of the structure from mo- tion problem (SFM), in the presence ofNP noise and missing data [122]. Several points of a rigid object are tracked with cameras (we are given the projections of the 3-D points on the 2-D camera planes; missing data arises because the points might not always be visible by the camera, e.g., in case of rotation), and the aim is to recover the structure of the object and the positions of the 3-D points. SFM can be written as a rank-four (WLRA) problem with a binary weight matrix5 [93]. However, this result does not imply anything on the com- plexity analysis of rank-one (WLRA).

4The solution X = uvT can be constructed observing that the vector u must be multiple of each column of M. 5 Except that the last row of V must be all ones, i.e., Vr: = 11×n. 203 CHAPTER 8. WEIGHTS AND MISSING DATA

An important feature of (WLRA) is exposed by the following example. Example 8.2. Let 1 ? M = 0 1   where ? indicates that an entry is missing, i.e., that the weight associated with this entry is 0 (1 otherwise). Observe that (u, v) Rm Rn, ∀ ∈ × rank(M)=2 and rank(uvT )=1 M uvT > 0. ⇒ || − ||W However, we have T inf M uv W =0. (u,v) Rm Rn || − || ∈ × In fact, one can check that with

(k) 1 (k) 1 (k) (k)T u = k and v = k , we have lim M u v W =0. 10− 10 k + || − ||     → ∞ This indicates that when W has zero entries the set of optimal solutions of (WLRA) might be empty: there might not exist an optimal solution. In other words, the (bounded) infimum might not be attained. At the other end, the infimum is always attained for W > 0 since . W is then a norm. For this reason, these two cases will be analyzed|| || separately: in Section 8.2, we study the computational complexity of the problem when W > 0, and, in Section 8.3, when W is binary (the problem with missing data).

8.2 Weighted Low-Rank Approximation

In order to prove -hardness of rank-one (WLRA) with positive weights (W > 0), let us considerNP the following instance:

T 2 p∗ = min M uv W , (W-1d) u Rm,v Rn || − || ∈ ∈ m n with M 0, 1 × the biadjacency of a bipartite graph Gb = (V, E) and the weight matrix∈{ defined} as

1 if Mij =1 Wij = , 1 i m, 1 j n, d if Mij =0 ≤ ≤ ≤ ≤  with d 1 a parameter. We are going to show that as d increases the opti- mal solutions≥ of (W-1d) gets closer to optimal solutions of the maximum-edge biclique problem (MB) (cf. Section 5.1.2). Intuitively, increasing the value of d makes the zero entries of M more important in the objective function, which leads them to be approximated by 204 8.2. WEIGHTED LOW-RANK APPROXIMATION

small values. This observation will be used to show that, for d sufficiently large, the optimal value p∗ of (W-1d) will be close to the minimum E E∗ of (MB) (Lemma 8.5). Recall E = M 2 denotes the cardinality of| the|−| edge| set of G | | || ||F b and E∗ is the cardinality of a maximum-edge biclique of G . | | b In fact, as the value of parameter d increases, the local minima of (W-1d) get closer to the ‘locally’ optimal solutions of (MB), which are binary vectors describing the maximal bicliques in Gb, i.e., bicliques not contained in larger bicliques. Example 8.1 illustrates the situation: the graph Gb corresponding to matrix M is represented in Figure 8.2 and contains four maximal bicliques

Figure 8.2: Graph corresponding to the matrix M of Example 8.1.

s1,s3,t1,t3 , s2,s3,t2,t3 , s3,t1,t2,t3 and s1,s2,s3,t3 , and the weight matrix{ W that} { was used is} similar{ to the case} d ={ 100 in problem} (W-1d). We now observe that (W-1d) has four local optimal solutions as well (cf. Figure 8.1) close to ( 1 , 0), (0, 1 ), (0, 0) and ( 1 , 1 ). There is a one to one correspon- √2 √2 √2 √2 dence between these solutions and the four maximal bicliques listed above (in this order). For example, for (x, y) = ( 1 , 0) we have U(x, y) = ( 1 0 1 )T , √2 √2 √2 T V ∗(x, y) is approximately equal to (√20 √2) , and this solution corresponds to the maximal biclique s ,s ,t ,t . { 1 3 1 3} This idea is similar to the one used in Chapter 5 to prove Theorem 5.3, i.e., to prove -hardness of the rank-one nonnegative factorization problem NP T Rm Rn minu + ,v + M uv F , where the zero entries of M were replaced by suf- ficiently∈ large∈ negative|| − ones.||

Let us now prove this formally. It is first observed that for any (u, v) such T 2 T that M uv W E , the absolute value of the row or the column of uv corresponding|| − to|| a zero≤ | entry| of M must be smaller than a constant inversely proportional to √4 d.

Lemma 8.3. Let (i, j) be such that M = 0, then (u, v) such that M ij ∀ || − 205 CHAPTER 8. WEIGHTS AND MISSING DATA

uvT 2 E , ||W ≤ | | 2 4 4 E min max uivk , max upvj | | . 1 k n | | 1 p m | | ≤ d ≤ ≤ ≤ ≤ r   Proof. Without loss of generality u and v can be scaled such that u 2 = v 2 without changing the product uvT . First, observe that since . || is|| a norm,|| || || ||W uvT E = uvT M M uvT E . || ||W − | | || ||W − || ||W ≤ || − ||W ≤ | | Since all entries ofpW are larger than 1 (d 1), we have p ≥ u v = uvT uvT 4 E , || ||2|| ||2 || ||F ≤ || ||W ≤ | | and then u = v 4 4 E . p || ||2 || ||2 ≤ | | 2 T 2 E Moreover d(0 uivj ) p M uv E , so that uivj | | which − ≤ || − ||W ≤ | | | |≤ d 4 E 4 E q implies that either u | | or v | | . Combining above inequalities | i|≤ d | j |≤ d with the fact that (max1 kqn vk ) and (maxq 1 p m up ) are bounded above by ≤ ≤ | | ≤ ≤ | | u = v 4 4 E completes the proof. || ||2 || ||2 ≤ | |

p T 2 Using Lemma 8.3, we can associate any point (u, v) such that M uv W E with a biclique of G , the graph generated by the biadjacency|| matrix− ||M.≤ | | b Corollary 8.4. For any pair (u, v) such that M uvT 2 E , the set || − ||W ≤ | | Ω(u, v)= I J, with I = i j s.t. u v > α , J = j i s.t. u v > α , × { | ∃ | i j | } { | ∃ | i j | }

4 4 E 2 where α = | d | , defines a biclique of Gb. q We can now provide lower and upper bounds on the optimal value p∗ of (W-1d), and show that it is not too different from the optimal value E E∗ of (MB). | | − | |

26 E 6 Lemma 8.5. Let 0 <ǫ 1. For any value of parameter d such that d | 4 | , ≤ ≥ ǫ the optimal value p∗ of (W-1d) satisfies

E E∗ ǫ

T 2 p∗ = M uv p = E E∗ , || − ||W ≤ | | − | | 206 8.2. WEIGHTED LOW-RANK APPROXIMATION

which gives the upper bound. By Corollary 8.4, the set Ω = Ω(u, v) defines a biclique of (MB) with Ω E∗ edges. By construction, the entries in M which are not in Ω are | | ≤ | | 4 4 E 2 2 approximated by values smaller than α. If α = | | 1, i.e., d 4 E d ≤ ≥ | | which is satisfied for 0 < ǫ 1, the error correspondingq to a one entry of M ≤ 2 not in the biclique Ω is at least (1 α) . Since there are at least p = E E∗ such entries, we have − | | − | | (1 α)2p M uvT 2 . (8.1) − ≤ || − ||W Moreover

(1 α)2p> (1 2α)p = p 2αp p 2α E p ǫ, − − − ≥ − | |≥ − 26 E 6 since 2α E ǫ d | 4 | , which gives the lower bound. | |≤ ⇐⇒ ≥ ǫ This result implies that for ǫ = 1, i.e., for d (2 E )6, we have E ≥ | | | |− E∗ 1 (2mn) , M 0, 1 × , and W 1, d × , it is -hard to find an approximate solution∈{ of rank-one} (WLRA)∈{with objective} NP 3/2 function accuracy less than 1 (2mn) . − d1/4 3/2 6 (2mn) Proof. Let d > (2mn) , 0 < ǫ = d1/4 < 1, and (¯u, v¯) be an approximate solution of (W-1d) with objective function accuracy (1 ǫ), i.e., p∗ p¯ = 6 6 T 2 (2mn) (2 E ) − ≤ M u¯v¯ W p∗ +1 ǫ. Since d = ǫ4 |ǫ4| , Lemma 8.5 applies and we|| have− || ≤ − ≥

E E∗ ǫ< p∗ p¯ p∗ +1 ǫ E E∗ +1 ǫ. | | − | |− ≤ ≤ − ≤ | | − | | −

We finally observe that p¯ allows to recover E∗ , which is -hard. In fact, | | NP adding ǫ to the above inequalities gives E E∗ < p¯+ ǫ E E∗ +1, and therefore | | − | | ≤ | | − | | E∗ = E p¯ + ǫ +1. | | | |− l m

We are now in position to prove Theorem 8.1, which deals with the hardness of rank-one (WLRA) with bounded weights. 207 CHAPTER 8. WEIGHTS AND MISSING DATA

m n Proof of Theorem 8.1. Let us use Corollary 8.6 with W 1, d × , and 1 1 m n ∈ { } define W ′ = d W d , 1 × . Clearly, replacing W by W ′ in (W-1d) simply ∈ { } 1 T 2 amounts to multiplying the objective function by d , with M uv W ′ = 1 T 2 1/4 3/2 || − || d M uv W . Taking d = 2(2mn) in Corollary 8.6, we obtain that || − || m n m n for M 0, 1 × and W ]0, 1] × , it is -hard to find an approximate ∈ { } ∈ NP solution of rank-one (WLRA) with objective function accuracy less than 1 1 d − 3/2 (2mn) 1 11 6  d1/4 = 2d =2− (mn)− .  Remark 8.2. Using the same construction as in Theorem 6.6 from Chapter 6, this rank-one -hardness result can be generalized to any factorization rank, i.e., approximateNP (WLRA) for any fixed rank r is -hard. NP Remark 8.3. The bounds on d have been quite crudely estimated, and can be improved. Our goal was only to show existence of a polynomial-time reduction from (MB) to rank-one (WLRA).

8.3 Low-Rank Approximation with Missing Data

Unfortunately, the above -hardness proof does not include the case when W is binary, corresponding toNP missing data in the matrix to be approximated (or to low-rank matrix completion with noise). This corresponds to the following problem

2 2 m n inf M UV W = Wij (M UV )ij , W 0, 1 × . U Rm×r ,V Rr×n || − || − ∈{ } ij ∈ ∈ X (LRAMD) In the same spirit as before, we consider the following rank-one version of the problem T 2 p∗ = inf M uv W , (MD-1d) u Rm,v Rn || − || ∈ ∈ with input data matrices M and W defined as follows

Mb 0s Z 1s t B1 M = × and W = × , 0Z t dIZ B2 IZ  ×    s t where M 0, 1 × is the biadjacency matrix of the bipartite graph G = b ∈ { } b (V, E), d> 1 is a parameter, Z = st E is the number of zero entries in Mb, m = s + Z and n = t + Z are the dimensions− | | of M and W . s Z Z t Binary matrices B 0, 1 × and B 0, 1 × are constructed as 1 ∈ { } 2 ∈ { } follows: assume the Z zero entries of Mb can be enumerated as

M (i , j ),M (i , j ),...,M (i , j ) , { b 1 1 b 2 2 b Z Z } 208 8.3. LOW-RANK APPROXIMATION WITH MISSING DATA

and let k be the (unique) index k (1 k Z) such that (i , j ) = (i, j) ij ≤ ≤ k k (therefore kij is only defined for pairs (i, j) such that Mb(i, j)=0, and estab- lishes a bijection between these pairs and the set 1, 2,...,Z ). We now define matrices B and as follows: for every index 1 k{ Z, we} have 1 ≤ ij ≤

B (i, k )=1,B (i′, k )=0 i′ = i and B (k , j)=1,B (k , j′)=0 j′ = j . 1 ij 1 ij ∀ 6 2 ij 2 ij ∀ 6

Equivalently, each column of B1 (resp. row of B2) corresponds to a different zero entry Mb(i, j)=0, and contains only zeros except for a one in position i within the column (resp j within the row). In the case of Example 8.1, we get

1 0 1 0 1 13 3 0 1 0 1 1 03 2  ×  M = × and W = 0 0 ,  1 1 1   0 1 0   02 3 d I2   I2   ×   1 0 0        i.e., the matrix to be approximated can be represented as

1 0 1 0 ? 0 1 1 ? 0  1 1 1 ? ?  .  ? 0 ? d ?     0 ? ? ? d      For any feasible solution (u, v) of (MD-1d), we also note

ub m s Z u = R , ub R and ud R , ud ∈ ∈ ∈  

vb n t Z v = R , vb R and vd R . vd ∈ ∈ ∈   We will show that this formulation ensures that, as d increases, the zero entries of the matrix Mb (upper left of matrix M, which is the biadjacency matrix of Gb) have to be approximated with smaller values. Hence, as for (W-1d), we will be able to prove that the optimal value p∗ of (MD-1d) will have to get close to the minimum E E∗ of (MB), implying its -hardness. | | − | | NP Intuitively, when d is large, the lower right matrix dIZ of M will have to be approximated by a matrix with large diagonal entries since they correspond to one entries in the weight matrix W . Hence ud(kij )vd(kij ) has to be large for all 1 k Z. We then have at least either u (k ) or v (k ) large for all k (re- ≤ ij ≤ d ij d ij ij call each kij corresponds to a zero entry in M at position (i, j), cf. definition of B1 and B2 above). By construction, we also have two entries M(s + kij , j)=0 209 CHAPTER 8. WEIGHTS AND MISSING DATA

and M(i,t+kij )=0 with nonzero weights corresponding to the nonzero entries B1(i, kij ) and B2(kij , j), which then have to be approximated by small values. If ud(kij ) (resp. vd(kij )) is large, then vb(j) (resp. ub(i)) will have to be small since u (k )v (j) 0 (resp. u (i)v (k ) 0). Finally, either u (i) or v (j) has d ij b ≈ b d ij ≈ b b to be small, implying that Mb(i, j) is approximated by a small value, because (ub, vb) is bounded independently of the value of d.

We now proceed as in Section 8.2. Let us first give an upper bound for the optimal value p∗ of (MD-1d).

Lemma 8.7. For d> 1, the optimal value p∗ of (MD-1d) is bounded above by E E∗ , i.e., | | − | | T 2 p∗ = inf M uv W E E∗ . (8.2) u Rm,v Rn || − || ≤ | | − | | ∈ ∈ Proof. Let us build the following feasible solution (u, v) of (MD-1d) where (ub, vb) is an optimal solution of (MB) and (ud, vd) is defined as

K K d if ub(i)=0, d if vb(j)=0, ud(kij )= 1 K and vd(kij )= 1 K d − if u (i)=1, d − if v (j)=1,  b  b with K R and k the index of the column of B and the row of B corre- ∈ ij 1 2 sponding to the zero entry (i, j) of Mb (i.e., (i, j) = (ikij , jkij )). One can check that

T 1 K T ubvb d − B1 (uv ) W = 1 K , ◦ d − B2 dIZ ! where is the component-wise (or Hadamard) product between two matrices, so that◦ T 2 2Z p∗ M uv W = E E∗ + 2(K 1) , K. (8.3) ≤ || − || | | − | | d − ∀ Since d> 1, taking the limit K + gives the result. → ∞ We now prove a property similar to Lemma 8.3 for any solution with ob- jective value smaller that E . | |

Lemma 8.8. Let d > E and (i, j) be such that Mb(i, j)=0, then the following holds for any pair| (u,| v) such that M uvT 2 E : p || − ||W ≤ | | 3 √2 E 4 min max uivk , max upvj | | 1 . (8.4) 1 k n | | 1 p m | | ≤ 2 ≤ ≤ ≤ ≤ d E   − | | 210 p  8.3. LOW-RANK APPROXIMATION WITH MISSING DATA

Proof. Without loss of generality we set ub 2 = vb 2 by scaling u and v without changing uvT . Observing that || || || ||

u v E = u vT M M u vT M uvT E , || b||2|| b||2− | | || b b ||F −|| b||F ≤ || b− b b ||F ≤ || − ||W ≤ | | p 1 p we have ub 2 vb 2 2 E , and ub 2 = vb 2 √2 E 4 . Assume M|| (||i,|| j) is|| zero≤ for| some| pair|| ||(i, j) ||and|| let≤ k =| k| denote the index b p ij of the corresponding column of B1 and row of B2 (i.e., such that B1(i, k) = B2(k, j)=1). By construction, ud(k)vd(k) has to approximate d in the objec- tive function. This implies (u (k)v (k) d)2 E and then d d − ≤ | | u (k)v (k) d E > 0. d d ≥ − | | p Suppose ud(k) is greater than vd(k) (the case vd(k) greater than ud(k) is | | | | 1 1 | | | | similar), this implies u (k) (d E 2 ) 2 . Moreover u (k)v has to approxi- | d |≥ − | | d j mate zero in the objective function, since B2(k, j)=1, implying

(u (k)v 0)2 E u (k)v E . d j − ≤ | | ⇒ | d j |≤ | | Hence p 1 E E 2 vj | | | | 1 , (8.5) | |≤ upd(k) ≤ d E 2 | | − | | 1 and since (max1 p m up ) is bounded by ubp2 √2 E 4 , the proof is com- plete. ≤ ≤ | | || || ≤ | |

One can now associate to any point with objective value smaller than E a | | biclique of Gb, the graph generated by the biadjacency matrix Mb.

Corollary 8.9. Let d > E , then for any pair (u, v) such that M uvT 2 E , the set | | || − ||W ≤ | | p

Ω(u, v)= I J, with I = i j s.t. uivj >β , J = j i s.t. uivj >β , × { | ∃ | | } { | ∃ | | (8.6)} 3 √2 E 4 where β = | | 1 , defines a biclique of Gb. d √ E 2 − | |  The next lemma gives a lower bound for the value of p∗.

Lemma 8.10. Let 0 < ǫ 1. For any value of parameter d that satisfies 7 ≤ 8 E 2 1 2 d> | 2| + E , the infimum p∗ of (MD-1d) satisfies ǫ | |

E E∗ ǫ

Proof. If E = E∗ , the result is trivial since p∗ = 0. Otherwise, suppose | | | | 3 7 √2 E 4 8 E 2 1 2 p∗ E E∗ ǫ and let β = | | 1 . First observe that d> |ǫ2| + E ≤ | | − | |− d √ E 2 | | − | | is equivalent to 2 E β<ǫ. Then, by continuity of (MD-1d), for any δ such  that δ<ǫ, there exists| | a pair (u, v) such that

T 2 M uv E E∗ δ. || d − ||W ≤ | | − | |− In particular, let us take δ =2 E β<ǫ. We can now proceed as for Lemma 8.5. | | By Corollary 8.9, Ω(u, v) corresponds to a biclique of Gb, with at most E∗ 3 1 | | edges. Then, for β 1, i.e., for d 2 E 2 + E 2 satisfied for 0 <ǫ 1, ≤ ≥ | | | | ≤ 2 T 2 (1 β) ( E E∗ ) M uv E E∗ δ. − | | − | | ≤ || − ||W ≤ | | − | |− Dividing the above inequalities by E E∗ > 0, we obtain | | − | | δ δ 1 2β < (1 β)2 1 1 δ < 2 E β, − − ≤ − E E ≤ − E ⇒ | | | | − | ∗| | | a contradiction.

7/2 m n Corollary 8.11. For any d> 8(mn) + √mn, M 0, 1, d × , and W m n ∈{ } ∈ 0, 1 × , it is -hard to find an approximate solution of rank-one (WLRA) { } NP 2√2(mn)7/4 with objective function accuracy 1 (d √mn)1/2 . − − 7/4 7/2 2√2(mn) Proof. Let d > 8(mn) + √mn, 0 < ǫ = (d √mn)1/2 < 1, and (¯u, v¯) be an − approximate solution of (W-1d) with absolute error (1 ǫ), i.e., p∗ p¯ = − 7/2 ≤ T 2 8(mn) M u¯v¯ W p∗ +1 ǫ. Lemma 8.10 applies because d = ǫ2 + √mn || −7/2 || ≤ 7/2− ≥ 8(st) √ 8 E 1/2 ǫ2 + st | ǫ|2 + E . Using Lemmas 8.7 and 8.10, the rest of the proof is identical≥ as the one| | of Theorem 8.1. Since the reduction from (MB) to (MD-1d) is polynomial (description of matrices W and M has polynomial length, since the increase in matrix dimensions from Mb to M is polynomial), we conclude that finding such an approximate solution for (MD-1d) is - hard. NP We can now easily derive Theorem 8.2, which deals with the hardness of rank-one (WLRA) with a bounded matrix M.

1 Proof of Theorem 8.2. Replacing M by M ′ = d M in (MD-1d) gives an 1 1 equivalent problem with objective function multiplied by d2 , since d2 M T 2 uvT 2 5 7/2 || − uv W = M ′ d W . Taking d = 2 (mn) + √mn in Corollary 8.11, we find|| that|| it is− ||-hard to compute an approximate solution of rank-one NP m n m n (WLRA) for M [0, 1] × and W 0, 1 × , and with objective function ∈ 7/4∈ { } 1 2√2(mn) 1 12 7 accuracy less than d2 1 (d √mn)1/2 = 2d2 2− (mn)− . − − ≥  212 8.3. LOW-RANK APPROXIMATION WITH MISSING DATA

Concluding Remarks

In this chapter, we have studied the complexity of the weighted low-rank ap- proximation problem (WLRA), and proved that finding an approximate solu- tion is -hard, already in the rank-one case, both for positive and for binary weightsNP (the latter also corresponding to low-rank matrix completion with noise, or PCA with missing data). Nevertheless, some questions remain open. In particular, When W is the matrix of all ones, WLRA can be solved in polynomial- ⋄ time. We have shown that, when the ratio between the largest and the smallest entry in W is large enough, the problem is -hard (Theo- rem 8.1). It would be interesting to investigate the gapNP between these two facts, i.e., what is the minimum ratio of the entries of W so that WLRA is -hard? NP When rank(W )=1, WLRA can be solved in polynomial-time (cf. Sec- ⋄ tion 8.1) while it is -hard for general matrix W (with rank up to min(m,n)). But whatNP is the complexity of (WLRA) if the rank of the weight matrix W is fixed and greater than one, e.g., if rank(W )=2? When data is missing, the rank-one matrix approximation problem is ⋄ -hard in general. Nevertheless, it has been observed [27] that when the NPgiven entries are sufficiently numerous, well-distributed in the matrix, and affected by a relatively low level of noise, the original uncorrupted low- rank matrix can be recovered accurately, with a technique based on convex optimization (minimization of the nuclear norm of the approximation, which can be cast as a semidefinite program). It would then be especially interesting to analyze the complexity of the problem given additional assumptions on the data matrix, for example on the noise distribution, and deal in particular with situations related to applications.

213 CHAPTER 8. WEIGHTS AND MISSING DATA

214 Conclusion

In this conclusion, we first summarize our results and give some directions for further research. We then review again the three main themes of this thesis: complexity, algorithms and applications, and try to stand back and make some final general comments.

Summary and Further Research

In this thesis, we have explored a relatively recent problem in linear algebra, namely nonnegative matrix factorization (NMF). It is a linear dimensionality reduction technique for nonnegative data, and requires factors of the corre- sponding low-rank matrix approximation to be nonnegative. These additional constraints enhance compression (through sparsity) and interpretability (be- cause basis elements are sparse and can be interpreted in the same way as the data, e.g., as images or texts, and because they lead to an additive and part-based representation). We started this thesis with some theoretical considerations, and studied in Chapter 3 the problem of determining the smallest value of the rank for which an exact nonnegative matrix factorization exists (called the nonnegative rank). Defining a new closely related quantity (the restricted nonnegative rank), and using a geometric interpretation, we were able to understand better the com- plexity of the nonnegative rank computation, and to provide some new lower and upper bounds for the nonnegative rank. The geometric interpretation also gave new insights on the NMF problem. For example, it allows the reduction of the dimension of the search space, hence it might be a starting point for new efficient NMF algorithms (see also [148]). However, there are still many open questions related to the computational complexity of both NMF and nonneg- ative rank computation. For example, are these problems difficult when the rank of the matrix to be factored is fixed (to three, or higher values)? Is NMF difficult when the factorization rank is fixed (to two, or higher values)? The nonnegative rank can also be used to characterize the size of extended formu- lations, and it would be interesting to investigate further in this direction. For 215 CONCLUSION

example, would it be possible to apply nonnegative rank computation algo- rithms on the slack matrices of integer programs in order to design extended formulations? Could an approximate nonnegative factorization (i.e., NMF) be used to find approximate extended formulations? The fact that slack matrices have exponential size (in the dimension of the polytope) is a drawback that will have to be overcome. In Chapter 4, three standard NMF algorithms (MU, ANLS and HALS) were studied, along with two new approaches to speed up their convergence: one based on the reorganization of the updates performed at each iteration, and the other based on a multilevel approach valid for specific classes of datasets. Using the properties of the NMF optimization problem and the structure of the solutions (part-based), we gave an explanation for the good performances of HALS. Further research on this topic includes the design of more effective algorithms. It also seems non-trivial to come up with good stopping criteria to be used to decide when to switch between blocks of variables in the general framework of an inexact coordinate descent scheme. This could potentially improve the current approaches (see Section 4.2). In Chapter 5, the rank-one subproblems arising in NMF were studied. They amount to approximating a not necessarily nonnegative matrix with a rank-one nonnegative matrix (R1NF), which we proved to be -hard using a reduction from the maximum-edge biclique problem. Using a linkNP between the stationary points of R1NF and the bicliques of a graph, we proposed a new efficient biclique finding algorithm. We also generalized the MU algorithm of NMF for matrices with negative entries, and gave an explanation for the worse performance of multiplicative updates compared to the HALS algorithm. In Chapter 6, a new recursive approach to solve NMF was introduced. Based on the introduction of underapproximation constraints, it enables the extrac- tion of features in a recursive way, like PCA, but preserving nonnegativity, and the computation of sparse solutions, enhancing interpretability. It was ex- plained in Chapter 7 how and why this technique is useful for the analysis of hyperspectral data. In particular, we showed that a variant based on ℓ1-norm minimization was able to recursively extract constitutive materials in hyper- spectral images in a robust and efficient way, as experimentally demonstrated on images associated with space object material identification and on HYDICE and related remote sensing images. It would be interesting to generalize some of these results to tensors. In particular, could the recursive approach based on underapproximation applied to tensors be useful in some applications? Finally, in Chapter 8, low-rank matrix approximation with weights or miss- ing data, but without nonnegativity constraints on the factors, was proved to be -hard, already when computing an approximate rank-one solution. These resultsNP also apply to NMF, i.e., weighted NMF and NMF with missing data are both -hard, even to approximate in the rank-one case. NP 216 CONCLUSION

Complexity

The standard low-rank matrix approximation problem can be defined as

2 min M X F such that rank(X) r. (LRA) X Rm×n || − || ≤ ∈ Using the singular value decomposition, one can obtain an optimal solution (see Theorem 2.3), and also characterize the set of all stationary points. In fact, letting

S = (σ, u, v) R Rm Rn u = v =1,M T u = σv,Mv = σu { ∈ + × × | || ||2 || ||2 } be the set of singular triplets of M, all stationary points are of the form

r T Xs = σiuivi , i=1 X where (σi,ui, vi) S i and (σi,ui, vi) = (σj ,uj , vj ) i = j, with at least min(m,n) ∈ ∀ 6 ∀ 6 r such points if M is full-rank, see, e.g., [89, p.29]. One nice feature of (LRA) is that any local minimum is also global (as for convex optimization  problems), and all other stationary points are saddle points, see Theorem 2.5. A recurring theme in this thesis is that, as soon as one perturbs the problem6 by either modifying the domain (i.e., adds constraints on matrix X) or changing the objective function, the structure of (LRA) is typically lost, and the problem becomes difficult to solve, often even in the rank-one case r =1. Intuitively, the reason is that there are still exponentially many stationary points, including many local minima, while global optimality can no longer be checked easily, see, e.g., Figure 8.1 for an illustration when weights are attached to the entries of matrix M. In this thesis, we have considered several variants of LRA, which can be equivalently reformulated as

2 min M UV F , (LRA) U Rm×r,V Rr×n || − || ∈ ∈ and all were shown to be -hard. Table 8.1 summarizes these computational complexity results7. NP At first sight, these computational complexity results are quite discouraging. However, it appears in practice that approximate solutions are often satisfac- tory. In some situations, it is even possible to have a guarantee of optimality. For example, in case of missing data, a semidefinite programming reformulation

6Perturbed variants of other related polynomially solvable problems, such as solving linear systems or linear programming, have also been shown to be NP-hard, see the work of Jiri Rohn and co-authors [130, 131]. 7Notice that N P-hardness for r = 1 implies N P-hardness for r fixed (see construction of Theorem 6.6), and N P-hardness for r fixed implies N P-hardness for r not fixed. 217 CONCLUSION

Variant r = 1 r fixed r not fixed LRA polynomial polynomial polynomial SPCA - Sparsity on U NP-hard [46] NP-hard NP-hard NMF - U, V ≥ 0, M ≥ 0 polynomial open NP-hard [148] NF- U, V ≥ 0, M  0 NP-hard (Th. 5.3) NP-hard NP-hard NMU - U, V ≥ 0, UV ≤ M NP-hard (Th. 6.5) NP-hard NP-hard 2 WLRA - min ||M − UV ||W NP-hard (Th. 8.1) NP-hard NP-hard LRAMD - missing entries NP-hard (Th. 8.2) NP-hard NP-hard [29]

Table 8.1: Computational complexity of several low-rank matrix approximation problems.

based on the nuclear norm minimization allows to recover the solution of the original problem in some specific circumstances [25]. Along the same line, the maximum-edge biclique problem, used in the reductions for our -hardness proofs, can be solved with a convex reformulation when the bipartiteNP graph contains a planted biclique, i.e., a biclique with significantly more edges than the other bicliques [3]. Hence there might exist classes of instances for which these low-rank matrix approximation problems can be solved in polyomial time, and it would be particularly interesting to characterize them.

Algorithms

In Chapter 4, we have presented several algorithms for solving

2 min M UV F , such that U 0, V 0, (NMF) U Rm×r ,V Rr×n || − || ≥ ≥ ∈ ∈ and used them throughout the thesis. It appears that most NMF algorithms are two-block coordinate descent schemes, i.e., successively optimizing U for V fixed and then V for U fixed. There are two reasons for that fact:

1. The corresponding subproblems are convex nonnegative least squares (NNLS), hence efficiently solvable, potentially up to global optimality.

2. The gradient of these subproblems can be updated efficiently ; in fact, we have shown that it is at least twice less expensive than reconstructing it from scratch (and it is in general even less expensive), see Section 4.2.

We are actually not aware of any efficient algorithm updating both matrices U and V simultaneously.

We have then observed that an exact coordinate descent applied on these NNLS subproblems, i.e., the hierarchical alternating least squares algorithm 218 CONCLUSION

(HALS), was the best strategy among the ones we have tested, which was motivated by the following two facts:

2 1. Structure of NNLS. The problem minU 0 M UV F (resp. minV 0 M 2 ≥ || − || ≥ || − UV F ) can be decoupled into m (resp. n) completely independent NNLS subproblems|| in r variables, corresponding to each row (resp. column) of M, each with Hessian matrix V V T (resp. U T U).

2. Structure of NMF solutions. Because of the nonnegativity constraints, NMF solutions are typically part-based, i.e., rows of V (resp. columns of U) shares few nonzero entries. Therefore matrix V V T (resp. U T U) typ- ically features large diagonal entries, and the coupling between variables in the NNLS subproblems is rather low, with a beneficial effect on the efficiency of coordinate descent schemes.

Hence HALS combines cheap (because it only uses first-order information) and effective (because subproblems are almost separable) updates.

This illustrates the general fact that it is crucial to take into account the properties of a given problem to select and design efficient algorithms. In particular, we have proposed two approaches to speed up significantly NMF algorithms: reorganizing ordering of the updates in Section 4.2, and using a multilevel approach in Section 4.3. We believe it should be possible to go further in this direction and take advantage of other properties of NMF. For example, we typically observe convergence of the sparsity patterns of the factors before actual convergence. When the positions of the zeros in U and V stop changing after several updates, one could simply fix all these zero entries and update the remaining entries using (unconstrained) least squares, as done in active-set like methods (see Appendix A).

Applications

NMF is a dimensionality reduction technique used as a preprocessing step for classification tasks, and can be applied in many different situations, e.g., image processing in (see Chapter 1), text mining (see Chapter 5 and [11]), and hyper- spectral image analysis (see Chapter 7). The advantage of NMF over standard low-rank approximation techniques such as PCA is its interpretability (part- based and sparse representation, see Chapter 1) which makes its post-processing much easier. However, there are two main pitfalls to using NMF in practice:

1. Complexity of the underlying optimization problem. In fact, Vavasis showed -hardness of NMF [148], and many closely related variants appearsNP to be -hard as well (see Complexity above). In addition to the theoreticalNP reasons outlined earlier, NMF can be interpreted as a 219 BIBLIOGRAPHY

clustering technique (such as k-means [53]), closely related to difficult combinatorial optimization problems [55]. Another closely related prob- lem is the NMF of symmetric matrices with one factor forced to be equal to the transpose of the other (M = UU T ), which amounts to finding the decomposition of completely positive matrices (see Chapter 2). 2. Non-uniqueness. Solutions to NMF are in general non-unique. This can be shown for example using its geometric interpretation presented in Sec- tion 3.4.1: there might exist many different solutions to the equivalent nested polytopes problem (see also [103]), i.e., there might exist optimal solutions (U, V ) and (U ′, V ′) with M UV = M U ′V ′ and || − ||F || − ||F UV = U ′V ′. In particular, when restarting several times an NMF al- gorithm,6 we often obtain quite different solutions with nearby objective function values, and it is rather difficult to decide which one is the best for the application of interest. Moreover, for a solution (U, V ), it is possible that a non-monomial (i.e., not a permutation of a diagonal matrix) invertible matrix D exists such 1 that (UD,D− V ) is nonnegative, and these ‘equivalent’ solutions could potentially lead to different interpretations (in particular if the sparsity 1 patterns of matrices UD and D− V are different from those of U and V respectively). The second pitfall can sometimes be alleviated by incorporating prior infor- mation into the model. In fact, in many applications, additional information on the structure of the solutions is available, such as sparsity or smoothness, cf. introduction of Chapter 7. For example, we have proposed in Chapters 6 and 7 a new framework based on nonnegative underapproximations to analyze hyperspectral images which makes the optimal solution of the underlying optimization problem unique. Ideally, for each application, one should try to design an appropriate model in order to reduce the degrees of freedom in NMF, allowing the algorithms to obtain solutions that fit the situation at hand. Objective functions different from the Frobenius norm can also be used to improve performances, as shown in Section 7.3 (ℓ1-norm).

220 Bibliography

[1] A. Aggarwal, H. Booth, J. O’Rourke, and S. Suri. Finding minimal convex nested polygons. Information and Computation, 83(1):98–110, 1989.

[2] G. Alexe, S. Alexe, Y. Crama, S. Foldes, P.L. Hammer, and B. Simeone. Consensus algorithms for the generation of all maximal bicliques. Discrete Applied Mathematics, 145(1):11–21, 2004.

[3] B. Ames and S.A. Vavasis. Nuclear norm minimization for the planted clique and biclique problems. arXiv:0901.3348v1, 2009.

[4] K.M. Anstreicher and L.A. Wolsey. Two "well-known" properties of sub- gradient optimization. Mathematical Programming, 120(1):213–220, 2009.

[5] S. Arora and B. Barak. Computational Complexity. Cambridge University Press, New York, 2009.

[6] L.B. Beasley and T.J. Laffey. Real rank versus nonnegative rank. Linear Algebra and its Applications, 431 (12):2330–2335, 2009.

[7] A. Ben-Tal and A. Nemirovski. On polyhedral approximations of the second-order cone. Mathematics of Operations Research, 26(2):193–205, 2001.

[8] A. Berman and R.J. Plemmons. Rank factorization of nonnegative ma- trices. SIAM Review, 15 (3):655, 1973.

[9] A. Berman and R.J. Plemmons. Nonnegative matrices in the mathemat- ical sciences. SIAM, 1994.

[10] M.W. Berry, M. Browne, A. Langville, V.P. Pauca, and R.J. Plemmons. Algorithms and Applications for Approximate Nonnegative Matrix Fac- torization. Computational Statistics and Data Analysis, 52:155–173, 2007.

[11] M.W. Berry, N. Gillis, and F. Glineur. Document Classification Using Nonnegative Matrix Factorization and Underapproximation. In Proc. of 221 BIBLIOGRAPHY

the IEEE Int. Symp. on Circuits and Systems (ISCAS), pages 2782–2785, 2009. ISBN: 978-1-4244-3828-0. [12] D.P. Bertsekas. Nonlinear Programming: Second Edition. Athena Scien- tific, Massachusetts, 1999. [13] J. Bhadury and R. Chandrasekaran. Finding The Set of All Minimal Nested Convex Polygons. In Proceedings of the 8th Canadian Conf. on Computational Geometry, pages 26–31, 1996. [14] M. Biggs, A. Ghodsi, and S.A. Vavasis. Nonnegative Matrix Factorization via Rank-One Downdate. In 25th Int. Conf. on machine learning (ICML), 2008. [15] L. Blum. Computing over the reals: Where Turing meets Newton. Notices of the American Mathematical Society, 51:1024–1034, 2004. [16] L. Blum, F. Cucker, M. Shub, and S. Smale. Complexity and real com- putation. Springer-Verlag, New York, 1998. [17] J.W. Boardman. Geometric mixture analysis of imaging spectrometry data. In Proc. IGARSS 4, Pasadena, Calif., pp. 2369–2371, 1994. [18] C. Boutsidis and E. Gallopoulos. SVD based initialization: A head start for nonnegative matrix factorization. Pattern Recognition, 41:1350–1362, 2008. [19] C. Boyce, A. Ross, M. Monaco, L. Hornak, and X. Li. Multispectral iris analysis: A preliminary study. In Proceedings of the Conf. on Com- puter Vision and Pattern Recognition Workshop (CVPRW’06), pp. 51– 60, 2006. [20] S. Boyd, A. Hassibi, S.J. Kim, and L. Vandenberghe. A Tutorial on Geometric Programming. Optimization and Engineering, 8(1):67–127, 2007. [21] J.H. Bramble. Multigrid methods. Number 294 Pitman Research Notes in Mathematic Series. Longman Scientific & Technical, UK, 1995. [22] A. Brandt. Guide to multigrid development. W. Hackbusch and U. Trot- tenberg, eds., Multigrid Methods, Lecture Notes in Mathematics, Springer, 960:220–312, 1982. [23] W.L. Briggs. A Multigrid Tutorial. SIAM, Philadelphia, 1987. [24] K. Burgers, Y. Fessehatsion, S. Rahmani, J.Y. Seo, and T. Wittman. A comparative analysis of dimension reduction algorithms on hyperspectral data. Available at http://www.math.ucla.edu/~wittman, 2009. 222 BIBLIOGRAPHY

[25] E.J. Candès and B. Recht. Exact Matrix Completion via Convex Opti- mization. Foundations of Computational Mathematics, 9:717–772, 2009.

[26] E. Carlini and F. Rapallo. Probability matrices, non-negative rank, and parameterization of mixture models. Linear Algebra and its Applications, 433:424–432, 2010.

[27] D. Chen and R.J. Plemmons. Nonnegativity Constraints in Numerical Analysis. In A. Bultheel and R. Cools (Eds.), Symposium on the Birth of Numerical Analysis, World Scientific Press., 2009.

[28] P. Chen. Optimization Algorithms on Subspaces: Revisiting Missing Data Problem in Low-Rank Matrix. Int. J. of Computer Vision, 80(1):125–142, 2008. [29] A.L. Chistov and D.Yu. Grigoriev. Complexity of quantifier elimination in the theory of algebraically closed fields. Proceedings of the 11th Sympo- sium on Mathematical Foundations of Computer Science, Lecture Notes in Computer Science, Springer, 176:17–31, 1984.

[30] M.T. Chu. On the nonnegative rank of Euclidean distance matrices, II. http://www4.ncsu.edu/~mtchu/Research/Papers/letter04.pdf, 2010.

[31] M.T. Chu and M.M. Lin. Low-Dimensional Polytope Approximation and Its Applications to Nonnegative Matrix Factorization. SIAM J. Sci. Comput., 30 (3):1131–1155, 2008. [32] A. Cichocki, S. Amari, R. Zdunek, and A.H. Phan. Non-negative Matrix and Tensor Factorizations: Applications to Exploratory Multi-way Data Analysis and Blind Source Separation. Wiley-Blackwell, 2009. [33] A. Cichocki and A.-H. Phan. Fast local algorithms for large scale Nonneg- ative Matrix and Tensor Factorizations. IEICE Trans. on Fundamentals of Electronics, Vol. E92-A No.3:708–721, 2009. [34] A. Cichocki, R. Zdunek, and S. Amari. Non-negative Matrix Factor- ization with Quasi-Newton Optimization. In Lecture Notes in Artificial Intelligence, Springer, volume 4029, pages 870–879, 2006.

[35] A. Cichocki, R. Zdunek, and S. Amari. Hierarchical ALS Algorithms for Nonnegative Matrix and 3D Tensor Factorization. In Lecture Notes in Computer Science, Vol. 4666, Springer, pp. 169-176, 2007.

[36] K.L. Clarkson. Algorithms for polytope covering and approximation. In Proceedings of the Third Workshop on Algorithms and Data Structures, pages 246–252, 1993. 223 BIBLIOGRAPHY

[37] J.E. Cohen and U.G. Rothblum. Nonnegative ranks, Decompositions and Factorization of Nonnegative Matrices. Linear Algebra and its Applica- tions, 190:149–168, 1993. [38] P. Comon. Independent component analysis, A new concept? Signal Processing, 36:287–314, 1994. [39] M. Conforti, G. Cornuéjols, and G. Zambelli. Extended formulations in combinatorial optimization. 4OR: A Quarterly J. of Operations Research, 10 (1):1–48, 2010. [40] M.D. Craig. Minimum-volume tranforms for remotely sensed data. IEEE Trans. on Geoscience and Remote Sensing, 32 (3):542–552, 1994. [41] J. Curry, A. Dougherty, and S. Wild. Improving non-negative ma- trix factorizations through structured initialization. Pattern Recognition, 37(11):2217–2232, 2004. [42] G. Das. Approximation schemes in computational geometry. PhD thesis, University of Wisconsin-Madison, 1990. [43] G. Das and M. Goodrich. On the Complexity of Optimization Problems for Three-Dimensional Convex Polyhedra and Decision Trees. Computa- tional Geometry: Theory and Applications, 8:123–137, 1997. [44] G. Das and D.A. Joseph. The Complexity of Minimum Convex Nested Polyhedra. In Proc. of the 2nd Canadian Conf. on Computational Ge- ometry, pages 296–301, 1990. [45] A. d’Aspremont. Semidefinite Optimization with Applications in Sparse Multivariate Statistics. Talk at BIRS, Banff, available at http://www.princeton.edu/~aspremon/Banff07.pdf, 2007. [46] A. d’Aspremont, L. El Ghaoui, M.I. Jordan, and G.R.G. Lanckriet. A Direct Formulation for Sparse PCA Using Semidefinite Programming. SIAM Rev., 49(3):434–448, 2007. [47] M.E. Daube-Witherspoon and G. Muehllehner. An iterative image space reconstruction algorithm suitable for volume ECT. IEEE Trans. Med. Imaging, 5:61–66, 1986. [48] M. Dawande, P. Keskinocak, and S. Tayur. On the biclique problem in bipartite graphs. GSIA Working Paper 1996-04, Carnegie Mellon Uni- versity, Pittsburgh, PA 15213, USA, 1996. [49] D. de Caen, D.A. Gregory, and N.J. Pullman. The boolean rank of zero- one matrices. In Proc. 3rd Caribbean Conf. on Combinatorics and Com- puting, pp. 169-173, 1981. 224 BIBLIOGRAPHY

[50] K. Devarajan. Nonnegative Matrix Factorization: An Analytical and In- terpretive Tool in Computational Biology. PLoS Computational Biology, 4(7), e1000029, 2008.

[51] I.S. Dhillon, D. Kim, and S. Sra. Fast Newton-type Methods for the Least Squares Nonnegative Matrix Approximation problem. In Proc. of SIAM Conf. on Data Mining, 2007.

[52] I.S. Dhillon and S. Sra. Nonnegative Matrix Approximations: Algorithms and Applications. Technical report, University of Texas (Austin), 2006. Dept. of Computer Sciences.

[53] C. Ding, X. He, and H.D. Simon. On the Equivalence of Nonnegative Matrix Factorization and Spectral Clustering. In SIAM Int’l Conf. Data Mining (SDM’05), pages 606–610, 2005.

[54] C. Ding, T. Li, W. Peng, and H. Park. Orthogonal nonnegative matrix tri- factorizations for clustering. In Proc. SIGKDD, Int. Conf. on Knowledge Discovery and Data Mining, pages 126–135, 2006.

[55] C. Ding, L. Tao, and M.I. Jordan. Nonnegative matrix factorization for combinatorial optimization: Spectral clustering, graph matching, and clique finding. IEEE Int. Conf. on Data Mining, pages 183–192, 2008.

[56] C. Ding, Y. Zhang, T. Li, and S.R. Holbrook. Biclustering Protein Com- plex Interactions with a Biclique Finding Algorithm. In Sixth IEEE Int. Conference on Data Mining, pages 178–187, 2006.

[57] E.D. Dolan and J.J. Moré. Benchmarking optimization software with performance profiles. Math. Program., Ser. A, 91:201–213, 2002.

[58] B. Dong, M.M. Lin, and M.T. Chu. Nonnegative rank factorization via rank reduction. Preprint, 2008.

[59] D. Donoho and V. Stodden. When does non-negative matrix factorization give a correct decomposition into parts? In Advances in Neural Information Processing 16, 2003. MIT Press.

[60] M. Dür. Copositive Programming - a Survey. In Book of Abstracts of the 14th Belgian-French-German Conference on Optimization (BFG’09), Leuven, 2010.

[61] R.J. Duffin, E.L. Peterson, and C. Zener. Geometric Programming: Theory and Applications. Wiley, New York, 1967.

[62] C. Eckart and G. Young. The approximation of one matrix by another of lower rank. Psychometrika, 1(3):211–218, 1936.

[63] L.R. Ford Jr. and D.R. Fulkerson. Flows in Networks. Princeton University Press, Paris, 1962.

[64] K.R. Gabriel and S. Zamir. Lower Rank Approximation of Matrices by Least Squares With Any Choice of Weights. Technometrics, 21(4):489–498, 1979.

[65] B. Ganter and R. Wille. Formal Concept Analysis: Mathematical Foundations. Springer-Verlag, New York, 1997.

[66] Y. Gao and G. Church. Improving molecular cancer class discovery through sparse non-negative matrix factorization. Bioinformatics, 21(21):3970–3975, 2005.

[67] M.R. Garey and D.S. Johnson. Computers and Intractability: A guide to the theory of NP-completeness. Freeman, San Francisco, 1979.

[68] N. Gillis. Approximation et sous-approximation de matrices par factorisation positive: algorithmes, complexité et applications. Master’s thesis, Université catholique de Louvain, 2007. In French.

[69] N. Gillis and F. Glineur. Nonnegative Factorization and The Maximum Edge Biclique Problem. CORE Discussion paper 2008/64, 2008.

[70] N. Gillis and F. Glineur. Nonnegative Matrix Factorization and Underapproximation. Communication at 9th Int. Symposium on Iterative Methods in Scientific Computing, Lille, France, 2008.

[71] N. Gillis and F. Glineur. A Multilevel Approach for Nonnegative Matrix Factorization. CORE Discussion paper 2010/47, 2010.

[72] N. Gillis and F. Glineur. Accelerated Multiplicative Updates and Hierarchical ALS Algorithms for Nonnegative Matrix Factorization. Preprint, 2010.

[73] N. Gillis and F. Glineur. Low-Rank Matrix Approximation with Weights or Missing Data is NP-hard. CORE Discussion paper 2010/75, 2010.

[74] N. Gillis and F. Glineur. On the Geometric Interpretation of the Nonnegative Rank. CORE Discussion paper 2010/51, 2010.

[75] N. Gillis and F. Glineur. Using underapproximations for sparse nonnegative matrix factorization. Pattern Recognition, 43(4):1676–1687, 2010.

[76] N. Gillis and R.J. Plemmons. Dimensionality Reduction, Classification, and Spectral Mixture Analysis using Nonnegative Underapproximation. In SPIE conference Volume 7695, paper 46, Orlando, 2010.

[77] N. Gillis and R.J. Plemmons. Dimensionality Reduction, Classification, and Spectral Mixture Analysis using Nonnegative Underapproximation. Optical Engineering, 2011. in press.

[78] F. Glineur. Computational experiments with a linear approximation of second order cone optimization. Image Technical Report 0001, Service de Mathématique et de Recherche Opérationnelle, Faculté Polytechnique de Mons, 2000.

[79] M.X. Goemans. Smallest compact formulation for the permutahedron. Talk at ISMP, Chicago, 2009.

[80] G.H. Golub and C.F. Van Loan. Matrix Computations, 3rd Edition. The Johns Hopkins University Press, Baltimore, 1996.

[81] S. Gratton, A. Sartenaer, and P. Toint. Recursive trust-region methods for multiscale nonlinear optimization. SIAM J. on Optimization, 19:414–444, 2008.

[82] L. Grippo and M. Sciandrone. On the convergence of the block nonlinear Gauss-Seidel method under convex constraints. Operations Research Letters, 26:127–136, 2000.

[83] B. Grung and R. Manne. Missing values in principal component analysis. Chemom. and Intell. Lab. Syst., 42:125–139, 1998.

[84] D. Guillamet and J. Vitrià. Non-negative matrix factorization for face recognition. Lecture Notes in Computer Science, 2504:336–344, 2002.

[85] Z. Guo, T. Wittman, and S. Osher. L1 unmixing and its application to hyperspectral image enhancement. In Proc. SPIE Conf. on Algorithms and Technologies for Multispectral, Hyperspectral, and Ultraspectral Imagery XV, 2009.

[86] C. Gurwitz. Weighted median algorithms for l1 approximation. BIT Numerical Mathematics, 30 (2):301–310, 1990.

[87] J. Hannah and T.J. Laffey. Nonnegative factorization of completely pos- itive matrices. Linear Algebra and its Applications, 55:1–9, 1983.

[88] F.L. Hitchcock. The Distribution of a Product from Several Sources to Numerous Localities. J. Math. Phys., 10:224–230, 1941.

[89] N.-D. Ho. Nonnegative Matrix Factorization - Algorithms and Applications. PhD thesis, Université catholique de Louvain, 2008.

[90] D.S. Hochbaum. Approximating clique and biclique problems. J. Algorithms, 29(1):174–200, 1998.

[91] P.O. Hoyer. Nonnegative matrix factorization with sparseness constraints. J. Machine Learning Research, 5:1457–1469, 2004.

[92] A. Ifarraguerri and C.-I. Chang. Multispectral and hyperspectral image analysis with convex cones. IEEE Trans. on Geoscience and Remote Sensing, 37(2):756–770, 1999.

[93] D. Jacobs. Linear fitting with missing data for structure-from-motion. Vision and Image Understanding, 82:57–81, 2001.

[94] S. Jia and Y. Qian. Constrained nonnegative matrix factorization for hyperspectral unmixing. IEEE Trans. on Geoscience and Remote Sensing, 47(1):161–173, 2009.

[95] I.T. Jolliffe. Principal Component Analysis. Springer-Verlag, 1986.

[96] V. Kaibel and K. Pashkovich. Constructing Extended Formulations from Reflection Relations. arXiv:1011.3597v1, 2010.

[97] H. Kim and H. Park. Sparse non-negative matrix factorizations via alternating non-negativity-constrained least squares for microarray data analysis. Bioinformatics, 23(12):1495–1502, 2007.

[98] H. Kim and H. Park. Non-negative Matrix Factorization Based on Alternating Non-negativity Constrained Least Squares and Active Set Method. SIAM J. Matrix Anal. Appl., 30(2):713–730, 2008.

[99] J. Kim and H. Park. Toward Faster Nonnegative Matrix Factorization: A New Algorithm and Comparisons. In Proceedings of IEEE Int. Conf. on Data Mining, pages 353–362, 2008.

[100] B. Klingenberg, J. Curry, and A. Dougherty. Non-negative matrix factorization: Ill-posedness and a geometric algorithm. Pattern Recognition, 42(5):918–928, 2009.

[101] Y. Koren, R. Bell, and C. Volinsky. Matrix Factorization Techniques for Recommender Systems. IEEE Computer, 42(8):30–37, 2009.

[102] N. Krislock and H. Wolkowicz. Euclidean Distance Matrices and Applications. Handbook of Semidefinite, Cone and Polynomial Optimization: Theory, Algorithms, Software and Applications, Miguel Anjos and Jean Lasserre (eds.), 2010. In preparation.

[103] H. Laurberg, M.G. Christensen, M.D. Plumbley, L.K. Hansen, and S.H. Jensen. Theorems on positive data: On the uniqueness of NMF. Computational Intelligence and Neuroscience, 2008. Article ID 764206.

[104] C.L. Lawson and R.J. Hanson. Solving Least Squares Problems. Prentice-Hall, 1974.

[105] D.D. Lee and H.S. Seung. Learning the Parts of Objects by Nonnegative Matrix Factorization. Nature, 401:788–791, 1999.

[106] D.D. Lee and H.S. Seung. Algorithms for Non-negative Matrix Factorization. In Advances in Neural Information Processing, 13, 2001.

[107] T. Lee and A. Shraibman. Lower Bounds in Communication Complexity. Foundations and Trends in Theoretical Computer Science, 2009.

[108] B. Levin. On Calculating Maximum Rank One Underapproximations for Positive Arrays. Technical report, Columbia University, 1985. Div. of Biostatistics.

[109] H. Li, C. Adal, W. Wang, D. Emge, and A. Cichocki. Non-negative matrix factorization with orthogonality constraints and its application to Raman spectroscopy. J. of VLSI Signal Processing, 48:83–97, 2007.

[110] L. Li and Y.-J. Zhang. FastNMF: highly efficient monotonic fixed-point nonnegative matrix factorization algorithm with good applicability. J. Electron. Imaging, 18(033004), 2009.

[111] S.Z. Li, X.W. Hou, H.J. Zhang, and Q.S. Cheng. Learning spatially localized parts-based representation. In Proceedings of IEEE Int. Conf. on Computer Vision and Pattern Recognition, pages 207–212, 2001.

[112] C.-J. Lin. On the Convergence of Multiplicative Update Algorithms for Nonnegative Matrix Factorization. IEEE Trans. on Neural Networks, 2007.

[113] C.-J. Lin. Projected Gradient Methods for Nonnegative Matrix Factorization. Neural Computation, 19:2756–2779, 2007. MIT Press.

[114] M.M. Lin and M.T. Chu. On the nonnegative rank of Euclidean distance matrices. Linear Algebra and its Applications, 433:681–689, 2010.

[115] G. Liu, K. Sim, and J. Li. Efficient Mining of Large Maximal Bicliques. Springer Berlin, Lecture Notes in Computer Science, pages 437–448, 2006.

[116] W.-S. Lu, S.-C. Pei, and P.-H. Wang. Weighted low-rank approximation of general complex matrices and its application in the design of 2-D digital filters. IEEE Trans. Circuits Syst. I, 44:650–655, 1997.

[117] I. Markovsky and M. Niranjan. Approximate low-rank factorization with structured factors. Computational Statistics & Data Analysis, 54:3411–3420, 2010.

[118] Y.M. Masalmah and M. Veléz-Reyes. A full algorithm to compute the constrained positive matrix factorization and its application in unsupervised unmixing of hyperspectral imagery. In Proc. SPIE, Vol. 6966, doi:10.1117/12.779444, 2008.

[119] L. Miao and H. Qi. Endmember extraction from highly mixed data using minimum volume constrained nonnegative matrix factorization. IEEE Trans. on Geoscience and Remote Sensing, 45(3):765–777, 2007.

[120] P. Miettinen. The boolean column and column-row matrix decompositions. Data Mining and Knowledge Discovery, 17(1):39–56, 2008.

[121] J.S.B. Mitchell and S. Suri. Separation and Approximation of Polyhedral Surfaces. Operations Research Letters, 11:255–259, 1992.

[122] D. Nister, F. Kahl, and H. Stewenius. Structure from Motion with Missing Data is NP-Hard. In IEEE 11th Int. Conf. on Computer Vision, 2007.

[123] J. Orlin. Contentment in graph theory: Covering graphs with cliques. Indagationes Mathematicae (Proceedings), 80(5):406–424, 1977.

[124] P. Paatero and U. Tapper. Positive matrix factorization: a non-negative factor model with optimal utilization of error estimates of data values. Environmetrics, 5:111–126, 1994.

[125] P.M. Pardalos and S.A. Vavasis. Quadratic programming with one negative eigenvalue is NP-hard. J. of Global Optimization, 1:15–22, 1991.

[126] V.P. Pauca, J. Piper, and R.J. Plemmons. Nonnegative matrix factorization for spectral data analysis. Linear Algebra and its Applications, 406(1):29–47, 2006.

[127] R. Peeters. The maximum edge biclique problem is NP-complete. Discrete Applied Mathematics, 131(3):651–654, 2003.

[128] M.J.D. Powell. On Search Directions for Minimization Algorithms. Mathematical Programming, 4:193–201, 1973.

[129] B. Recht, M. Fazel, and P.A. Parrilo. Guaranteed Minimum Rank Solutions to Linear Matrix Equations via Nuclear Norm Minimization. SIAM Review, 52(3):471–501, 2010.

[130] J. Rohn. Linear Programming with Inexact Data is NP-Hard. Zeitschrift für Angewandte Mathematik und Mechanik, 78, Supplement 3:S1051–S1052, 1998.

[131] J. Rohn and V. Kreinovich. Computing exact componentwise bounds on solutions of linear systems with interval data is NP-hard. SIAM J. on Matrix Analysis and Applications, 16(2):415–420, 1995.

[132] A. Ross. Iris dataset obtained from West Virginia University (WVU). Available at http://www.wvu.edu/, 2010.

[133] S. Sahni. Computationally related problems. SIAM J. Computing, 3(4):262–279, 1974.

[134] S. Sakellaridi, H.-R. Fang, and Y. Saad. Graph-based Multilevel Dimensionality Reduction with Applications to Eigenfaces and Latent Semantic Indexing. Preprint, 2009.

[135] B.M. Sarwar, G. Karypis, J.A. Konstan, and J. Riedl. Item-Based Collaborative Filtering Recommendation Algorithms. In 10th Int. World Wide Web Conference, 2001.

[136] M. Schuermans. Weighted Low Rank Approximation: Algorithms and Applications. PhD thesis, Katholieke Universiteit Leuven, 2006.

[137] F. Shahnaz, M.W. Berry, V.P. Pauca, and R.J. Plemmons. Document clustering using nonnegative matrix factorization. Information Processing and Management, 42:373–386, 2006.

[138] N.Z. Shor. Minimization Methods for Non-differentiable Functions. Springer Series in Computational Mathematics, 1985.

[139] H. Shum, K. Ikeuchi, and R. Reddy. Principal component analysis with missing data and its application to polyhedral object modeling. IEEE Trans. Pattern Anal. Mach. Intelligence, 17(9):854–867, 1995.

[140] N. Srebro and T. Jaakkola. Weighted Low-Rank Approximations. In 20th ICML Conf. Proceedings, 2004.

[141] G.W. Stewart. Perturbation Theory for The Singular Value Decomposition. Technical report, UMIACS, 1990. Dept. of Computer Sciences.

[142] D. Terzopoulos. Image Analysis Using Multigrid Relaxation Methods. IEEE Trans. Pattern Anal. Mach. Intelligence, PAMI-8(2):129–139, 1986.

[143] L.B. Thomas. Rank factorization of nonnegative matrices. SIAM Review, 16(3):393–394, 1974.

[144] U. Trottenberg, C. Oosterlee, and A. Schüller. Multigrid. Elsevier Academic Press, London, 2001.

[145] M.H. Van Benthem and M.R. Keenan. Fast algorithm for the solution of large-scale non-negativity constrained least squares problems. J. Chemometrics, 18:441–450, 2004.

[146] S.A. Vavasis. Quadratic programming is in NP. Information Processing Letters, 36:73–77, 1990.

[147] S.A. Vavasis. Complexity issues in global optimization: A survey. In R. Horst and P. Pardalos, editors, Handbook of Global Optimization, pages 27–41. Kluwer Academic Publishers, 1995.

[148] S.A. Vavasis. On the complexity of nonnegative matrix factorization. SIAM J. on Optimization, 20(3):1364–1377, 2009.

[149] C.A. Wang. Finding Minimal Nested Polygons. BIT, 31:230–236, 1991.

[150] V. Watts. Fractional biclique covers and partitions of graphs. Electronic J. of Combinatorics, 13:#R74, 2006.

[151] M. Winter. N-findr: an algorithm for fast autonomous spectral endmember determination in hyperspectral data. In Proc. SPIE Conf. on Imaging Spectrometry V, 1999.

[152] M. Yannakakis. Expressing Combinatorial Optimization Problems by Linear Programs. Computer and System Sciences, 43:441–466, 1991.

[153] Q. Zhang, H. Wang, R.J. Plemmons, and V.P. Pauca. Tensor methods for hyperspectral data analysis: a space object material identification study. J. Optical Soc. Amer. A, 25(12):3001–3012, 2008.

[154] Z.-Y. Zhang, T. Li, C. Ding, X.W. Ren, and X.-S. Zhang. Binary matrix factorization for analyzing gene expression data. Data Mining and Knowledge Discovery, 20(1):28–52, 2009.

[155] Z.-Y. Zhang, T. Li, C. Ding, and X.-S. Zhang. Binary matrix factorization with applications. In Proceedings of 2007 IEEE Int. Conf. on Data Mining, 2007.

[156] S. Zhong and J. Ghosh. Generative model-based document clustering: a comparative study. Knowledge and Information Systems, 8(3):374–384, 2005.

[157] G.M. Ziegler. Lectures on Polytopes. Springer-Verlag, 1995.

Appendix A

Active set methods for NNLS

In a nutshell, active set methods for nonnegative least squares work in the following iterative fashion [104, Algorithm NNLS, p. 161]:

0. Choose the set of active (zero) and passive (nonzero) variables.

1. Get rid of the nonnegativity constraints and solve the unconstrained least squares problem (LS) corresponding to the set of passive (nonzero) variables (the solution is obtained by solving a linear system, i.e., the normal equations).

2. Check the optimality conditions, i.e., the nonnegativity of the passive variables and the nonnegativity of the gradients of the active variables (see Chapter 1). If they are not satisfied:

3. Exchange variables between the set of active and the set of passive variables in such a way that the objective function is decreased at each step, and go to 1.

In (NMF), the problem of computing the optimal U (resp. V) for a given fixed V (resp. U) can be decoupled into m (resp. n) independent NNLS subproblems in r variables:
$$\min_{U_{i:} \in \mathbb{R}^r_+} \|M_{i:} - U_{i:}V\|_F^2, \quad 1 \le i \le m \qquad \Big(\text{resp. } \min_{V_{:j} \in \mathbb{R}^r_+} \|M_{:j} - UV_{:j}\|_F^2, \quad 1 \le j \le n\Big).$$
Each of them amounts to solving a sequence of linear subsystems (with at most r variables, cf. step 1 above) of
$$U_{i:}(VV^T) = M_{i:}V^T, \quad 1 \le i \le m \qquad \Big(\text{resp. } (U^TU)V_{:j} = U^TM_{:j}, \quad 1 \le j \le n\Big).$$
In the worst case, one might have to solve every possible subsystem, which requires $\mathcal{O}(g(r))$ operations, where $g(r) = \sum_{i=1}^{r} \binom{r}{i}\, i^3 = \Theta(2^r r^3)$ (one can check that $(2^{r-3}-1)(r-2)^3 \le g(r) \le 2^r r^3$).
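To make the row-wise decoupling above concrete, here is a minimal sketch (not code from the thesis) of one half of an ANLS step, updating U for a fixed V by solving the m independent NNLS subproblems with SciPy's active-set NNLS solver; the names M, U, V and the helper update_U are illustrative assumptions.

```python
# Minimal sketch (assumed names, not the thesis implementation): one ANLS
# half-step that updates U for a fixed V by solving the m decoupled NNLS
# subproblems row by row with SciPy's active-set NNLS solver.
import numpy as np
from scipy.optimize import nnls

def update_U(M, V):
    """Solve min_{U >= 0} ||M - U V||_F^2 for fixed V (r x n), M being m x n."""
    m = M.shape[0]
    r = V.shape[0]
    U = np.zeros((m, r))
    for i in range(m):
        # Row subproblem min_{u >= 0} ||M[i, :] - u V||_2, written in the
        # standard NNLS form min_{u >= 0} ||V^T u - M[i, :]||_2.
        U[i, :], _ = nnls(V.T, M[i, :])
    return U

# Small usage example on random nonnegative data.
rng = np.random.default_rng(0)
M = rng.random((50, 20))   # m = 50 rows, n = 20 columns
V = rng.random((5, 20))    # factorization rank r = 5
U = update_U(M, V)
```

The symmetric update of V for fixed U is obtained in the same way on the columns of M.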

Note that $VV^T$ and $MV^T$ (resp. $U^TU$ and $U^TM$) can be computed once for all to avoid redundant computations. Finally, one ANLS step requires at most $\mathcal{O}(mnr + (m+n)\,s(r)\,r^3)$ operations per iteration, where $s(r) \le 2^r$. In the worst case, $s(r)$ is in $\Theta(2^r)$, while in practice it is in general much lower (as is the case for the simplex method for linear programming).
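As an illustration of this precomputation (a hedged sketch under assumed names, not the thesis code), the Gram matrix $VV^T$ and the product $MV^T$ can be formed once and then reused for every row $i$ and every passive set visited by the active-set iterations; solve_passive below is a hypothetical helper that solves the restricted normal equations.

```python
# Sketch of the precomputation discussed above (assumed names): V V^T and
# M V^T are computed once, and each passive-set subsystem
# U_{i,P} (V V^T)_{P,P} = (M V^T)_{i,P} is then solved from these cached products.
import numpy as np

def solve_passive(VVt, MVt_i, passive):
    """Solve the normal equations restricted to the passive variables P."""
    P = np.asarray(passive, dtype=int)
    u = np.zeros(VVt.shape[0])
    u[P] = np.linalg.solve(VVt[np.ix_(P, P)], MVt_i[P])
    return u

rng = np.random.default_rng(1)
M = rng.random((6, 8))      # m x n data matrix
V = rng.random((3, 8))      # r x n factor
VVt = V @ V.T               # r x r Gram matrix, computed once
MVt = M @ V.T               # m x r, computed once
# Example: for row i = 0, suppose variables {0, 2} are currently passive.
u0 = solve_passive(VVt, MVt[0], passive=[0, 2])
```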

When $m$ is reduced by a certain factor (e.g., four, as in our multilevel approach presented in Section 4.3), the computational cost is not exactly reduced by the same factor, because the $(m+n)$ factor above also depends on $n$. However, in our applications, when $m$ (the number of pixels) is much larger than $n$ (the number of images), one can roughly consider the cost per iteration to be reduced by the same factor since $\frac{m}{4}+n \approx \frac{m}{4}$.
