
COMPRESSED NONNEGATIVE MATRIX FACTORIZATION IS FAST AND ACCURATE

Mariano Tepper, Guillermo Sapiro

This work was partially supported by NSF, ONR, NGA, ARO, and NSSEFF. The authors are with the Department of Electrical and Computer Engineering, Duke University, NC 27708 USA (e-mail: {mariano.tepper,guillermo.sapiro}@duke.edu).

Abstract—Nonnegative matrix factorization (NMF) has an established reputation as a useful data analysis technique in numerous applications. However, its usage in practical situations has been undergoing challenges in recent years. The fundamental factor behind this is the ever-growing size of the datasets available and needed in the information sciences. To address this, in this work we propose to use structured random compression, that is, random projections that exploit the data structure, for two NMF variants: classical and separable. In separable NMF (SNMF) the left factors are a subset of the columns of the input matrix. We present suitable formulations for each problem, dealing with different representative algorithms within each one. We show that the resulting compressed techniques are faster than their uncompressed variants, vastly reduce memory demands, and do not encompass any significant deterioration in performance. The proposed structured random projections for SNMF allow us to deal with arbitrarily shaped large matrices, beyond the standard limit of tall-and-skinny matrices, granting access to very efficient computations in this general setting. We accompany the algorithmic presentation with theoretical foundations and numerous and diverse examples, showing the suitability of the proposed approaches.

Index Terms—Nonnegative matrix factorization, separable nonnegative matrix factorization, structured random projections, big data.

I. INTRODUCTION

The number and diversity of the fields that make use of data analysis is rapidly increasing, from economics and marketing to medicine and neuroscience. In all of them, data is being collected at an astounding speed: databases are now measured in gigabytes and terabytes, including trillions of point-of-sale transactions, worldwide social networks, and gigapixel images. Organizations need to rapidly turn these terabytes of raw data into significant insights for their users to guide their research, marketing, investment, and/or management strategies.

Matrix factorization is a fundamental data analysis technique. Whereas its usefulness as a theoretical tool is beyond doubt now, its usage in practical situations has undergone a few challenges in recent years. Among other factors contributing to this are new developments in computer hardware architecture and new applications in the information sciences.

Perhaps the key aspect is that the matrices to analyze are becoming astonishingly big. Classical algorithms are not designed to cope with the amount of information present in these large-scale problems. We may even hypothesize that, if proper tools for these problems were widely available for commercial computer power, such rich datasets would be created at an increasing speed.

In this big data scenario, data communication is one of the main performance bottlenecks for numerical algorithms (here, we mean communication in a broad sense, including, for example, network transfers and secondary memory access). Since the data cannot be easily stored in main memory, performing fewer passes over the original data, even at the cost of more floating-point operations, may result in substantially faster techniques.

Lastly, the architecture of computing units is evolving towards massive parallelism (consider, for example, general purpose GPUs and MapReduce models [1]). Numerical algorithms should adapt to these environments and exploit their benefits for boosting their performance.

In recent years, Nonnegative Matrix Factorization (NMF) [2] has been frequently used since it provides a good way for modeling many real-life applications (e.g., recommender systems [3] and audio processing [4]). NMF seeks to represent a nonnegative matrix (i.e., a matrix with nonnegative entries) as the product of two nonnegative matrices. One of the reasons for the method's popularity is that the use of non-subtractive linear combinations renders the factorization, in many cases, easily interpretable. The goal of this work is to develop algorithms, based on structured random projections, for computing NMF for big data matrices.

A. Two flavors of nonnegative matrix factorization

Given an m × n nonnegative matrix A, NMF is formally defined as

  min_{X ∈ R^{m×r}, Y ∈ R^{r×n}} ||A − XY||_F^2   s.t.   X, Y ≥ 0,   (1)

where r is a parameter that controls the size of the factors X and Y and, hence, the factorization's accuracy. For simplicity, we use B ≥ 0 to denote a matrix B with nonnegative entries.

Despite its appealing advantages, NMF does present some theoretical and practical challenges. In the general case, NMF is known to be NP-hard [5] and highly ill-posed [6, and references therein]. However, there are matrices that exhibit a particular structure such that NMF can be solved efficiently (i.e., in polynomial time) [7].

Definition 1. A nonnegative matrix A is r-separable if there exists an index set K of cardinality r over the columns of A and a nonnegative matrix Y ∈ R^{r×n} such that

  A = (A)_{:K} Y,   (2)

where (A)_{:K} represents the matrix obtained by horizontally stacking the columns of A indexed by K. Consequently, a nonnegative matrix A is near r-separable if it can be represented as

  A = (A)_{:K} Y + N,   (3)

where N is a noise matrix.

When A presents this type of special structure, the NMF problem (now denoted as separable NMF, SNMF) can be simply modeled as

  min_{K ⊂ {1,...,n}, Y ∈ R^{r×n}} ||A − (A)_{:K} Y||_F^2   s.t.   #K = r,  Y ≥ 0,   (4)

where the choice of the Frobenius norm corresponds to a Gaussian noise matrix N. Having a more constrained structure for the left factor (i.e., X = (A)_{:K}) makes the problem significantly easier to solve, improving the stability and the speed of the involved algorithms.

B. Structured random projections

In recent years, we have seen an increase in the popularity of randomized algorithms for computing partial matrix decompositions. These partial decompositions assume that most of the action of a matrix occurs in a subspace. The key observation here is that such a subspace can be identified through random sampling. After projecting the input matrix onto this subspace (i.e., compressing it), the desired low-rank factorization can be obtained by deterministically manipulating this compressed matrix. In many cases, this approach outperforms its classical competitors in terms of accuracy, speed, and robustness. See [8] for a thorough review of these techniques.

C. Contributions and organization

We propose an algorithmic solution for computing structured random projections of extremely large matrices (i.e., matrices so large that even after compression they do not fit in main memory). This is useful as a general tool for computing many different matrix decompositions (beyond NMF, which is the particular focus of this work). Our approach leads to the implementation of compression algorithms that perform out-of-core computations (i.e., loading information into main memory only as needed).

We propose to use structured random projections for NMF and show that, in practice, their use implies a substantial increase in speed. This performance boost does not come at the price of significant errors with respect to the uncompressed solutions. We show this for representative algorithms of different NMF approaches, namely, multiplicative updates [9], the active set method for nonnegative least squares [10], and ADMM [11].

We present a general SNMF algorithm based on structured random projections, reaching similar conclusions as in the general NMF case. While there are in the literature very efficient SNMF algorithms for tall-and-skinny matrices [12], we show that, when the rank of the desired decomposition is lower than the number of columns of the input matrix, the proposed algorithm is substantially faster than its competitors. Interestingly, the use of structured random projections allows us to compute SNMF for arbitrarily large matrices, eliminating the tall-and-skinny requirement while preserving efficiency.

Our code is available at http://www.marianotepper.com.ar/research/cnmf.

The remainder of the paper is organized as follows. In Section II we provide an overview of random projection methods for matrix factorization and provide some theoretical results relevant to this work. In sections III and IV we propose a set of techniques for using random projections for NMF and SNMF, respectively. Extensive experimental results on diverse problems are presented in Section V, studying the performance of the proposed techniques on both medium and large-scale problems. Finally, we provide some concluding remarks in Section VI.

II. ON RANDOMIZATION AND MATRIX DECOMPOSITIONS

In this section we begin by describing the random projection algorithm used throughout this work. We also present theory that provides some guarantees for the use of random projections in matrix decomposition (in this work we use projection and compression interchangeably). Finally, we discuss the performance limits of the algorithm when dealing with big data and introduce a way to overcome such limitations.

In problems (1) and (4), the rank of the desired matrix factorization is prespecified. In the following, we will thus assume that we are given a matrix A, a target rank r, and an oversampling parameter rOV (its role will become clear next). We define a Gaussian random matrix Ω as a matrix whose entries are drawn independently from a standard Gaussian distribution, i.e., each entry (Ω)_{ij} is a realization of an independent and identically distributed random variable with distribution N(0, 1).

The overall approach to matrix factorization presented in [8] consists of the following three steps:

1) Compute an approximate basis for the range of the input matrix A: we construct a matrix Q with r + rOV orthonormal columns (i.e., Q^T Q = I, where I is the (r + rOV) × (r + rOV) identity matrix), for which

  ||A − QQ^T A||_2 ≈ min_{rank(Z) ≤ r} ||A − Z||_2 = σ_{r+1},   (5)

where σ_j denotes the j-th largest singular value of A. In other words, QQ^T A is a good rank-r approximation of A.

2) Compute a factorization of Q^T A.

3) Multiply the leftmost factor of the decomposition by Q; all other factors remain unchanged.

Throughout this paper, we will use the algorithm in Fig. 1 for performing Step 1). For more details about this algorithm, we refer the reader to [8]. Since the algorithm exploits the structure in A, trying to find a subspace where the majority of its action happens, we will refer to this technique as structured random compression.
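To make this three-step template concrete, the following is a minimal NumPy sketch that uses a thin SVD as the deterministic factorization in Step 2); the function name and the choice of an SVD are illustrative assumptions, not part of the original presentation (any factorization of Q^T A could take its place).

```python
import numpy as np

def compressed_factorization(A, Q):
    """Steps 2) and 3): factor the compressed matrix and lift the result.

    Q is an m x (r + r_ov) matrix with orthonormal columns approximating
    the range of A (Step 1), e.g., computed with the algorithm in Fig. 1.
    Here the deterministic factorization is a thin SVD of Q^T A.
    """
    B = Q.T @ A                                     # small (r + r_ov) x n matrix
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small                                 # Step 3): lift the leftmost factor
    return U, s, Vt
```

Since A ≈ QQ^T A, the lifted factors approximate A with an error governed by the bounds presented next.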

In the following, we present some results from [8] that demonstrate the nice theoretical characteristics of the compression matrix Q obtained with the algorithm in Fig. 1. Let E denote the expectation with respect to the random matrix.

input: a matrix A ∈ R^{m×n}, a target rank r ∈ N+, an oversampling parameter rOV ∈ N+ (r + rOV ≤ m), an exponent w ∈ N.
output: a compression matrix Q ∈ R^{m×(r+rOV)} for A.
1 Draw a Gaussian random matrix Ω ∈ R^{n×(r+rOV)};
2 Form the matrix product B = (AA^T)^w AΩ;
3 Let Q be an orthogonal basis for B, obtained using the QR decomposition;

Fig. 1. Structured random compression algorithm.

Theorem 1 ([8]). Given a matrix A ∈ R^{m×n}, a target rank r ∈ N+, and an oversampling parameter rOV ∈ N+ (r + rOV ≤ m), execute the algorithm in Fig. 1 with w = 0 (no power iterations). We obtain a matrix Q ∈ R^{m×(r+rOV)}. Let P = QQ^T. Then,

  E ||A − PA||_F ≤ (1 + r / (rOV − 1))^{1/2} (Σ_{j>r} σ_j^2)^{1/2},   (6)

  E ||A − PA||_2 ≤ [1 + (4 √(r + rOV)) / (rOV − 1) · √(min{m, n})] σ_{r+1}.   (7)

Note that (Σ_{j>r} σ_j^2)^{1/2} and σ_{r+1} are the smallest possible errors, see Equation (5).

Theorem 2 ([8]). Frame the same hypotheses of Theorem 1. Assume rOV ≥ 4. Then, ∀ u, t ≥ 1,

  ||A − PA||_F ≤ (1 + t √(12 r / rOV)) (Σ_{j>r} σ_j^2)^{1/2} + u t (e √(r + rOV)) / (rOV + 1) · σ_{r+1},   (8)

with failure probability at most 5 t^{−rOV} + 2 e^{−u²/2}. We also have

  ||A − PA||_2 ≤ 3 √(r + rOV) (Σ_{j>r} σ_j^2)^{1/2} + (1 + t √(8 (r + rOV) rOV log rOV)) σ_{r+1},   (9)

with failure probability at most 6 rOV^{−rOV}.

Beyond proving that the achieved error is very close to the optimal error, the above theorems provide a theoretical justification for the oversampling parameter rOV. It grants more freedom in the choice of Q, crucial for the effectiveness of Step 2) [8]. This freedom allows the probability of failure to decrease exponentially fast as rOV grows.

Theorem 3 ([8]). Given a matrix A ∈ R^{m×n}, a target rank r ∈ N+, an oversampling parameter rOV ∈ N+ (r + rOV ≤ m), and an exponent w ∈ N, execute the algorithm in Fig. 1. We obtain a matrix Q ∈ R^{m×(r+rOV)}. Let P = QQ^T. Then,

  E ||A − PA||_2 ≤ c^{1/(2w+1)} σ_{r+1},   (10)

where

  c = 1 + √(r / (rOV − 1)) + (e √(r + rOV)) / rOV · √(min{m, n} − r).   (11)

As we increase the exponent w, the power scheme drives the extra factor in the error to one exponentially fast. As noted in [8], finding an analogous bound for the Frobenius norm is still an open problem.

Throughout this work we use a Gaussian test matrix Ω. Other test matrices can be used in its place, such as the subsampled randomized Hadamard and Fourier transforms [8, 13]. The product AΩ can be significantly faster when using a test matrix obtained with these transforms, giving an automatic speedup. From this perspective, all the experimental results in this paper present a worst-case scenario with respect to running times.

Note. An alternative to structured random compression would be to simply left-multiply A by a Gaussian random matrix Ω. Let us define the compression matrix QΩ ∈ R^{m×s} as

  QΩ = s^{−1/2} Ω,   (12)

where Ω is a Gaussian random matrix. Then, instead of computing measures with the data matrix A in the m-dimensional space, the much smaller matrix QΩ^T A can be used to compute approximations in the s-dimensional space. It is well studied that Gaussian projection preserves the ℓ2 norm [e.g., 14, and references therein]. However, our extensive experiments show that structured random compression achieves better performance than Gaussian compression. Intuitively, Gaussian compression is a general data-agnostic tool, whereas structured compression uses information from the matrix (an analogue of training). Theoretical research is needed to fully justify this performance gap.

A. Big data algorithmic solutions

By design, the product in line 2 of the algorithm in Fig. 1 forms a tall-and-skinny matrix B ∈ R^{m×(r+rOV)}, where m ≫ (r + rOV). We have thus successfully reduced the number of columns in B from n to r + rOV. While matrix A may not fit in main memory, we can still perform the necessary computations using B without significant loss of precision.

An interesting question arises when working with large matrices: what happens if the number of rows m is so large that even B does not fit in main memory? Assuming that we need to store B in secondary memory (i.e., the hard drive), how do we compute its QR decomposition (line 3 of the algorithm in Fig. 1)?

A suitable and efficient algorithm to address the latter question is the direct TSQR (tall-and-skinny QR) [12]. For completeness, we give its outline in Appendix A. The highlight of TSQR is that it is designed to be parallelizable while minimizing the dependencies between parallel computations (i.e., communication costs). Thus, it adheres perfectly to the main mantra of this work.

An interesting byproduct of using TSQR is that there is no need to form the entire matrix B in main memory. See Appendix A for further details. This allows us to implement an out-of-core version of the compression algorithm, that is, one where the involved matrices do not reside in main memory. Let us note that the use of TSQR for computing random compression is introduced in this paper for the first time, providing a truly scalable solution for computing many types of matrix decompositions (i.e., beyond NMF) when both the number of rows and columns of the input matrix are large.
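The in-core version of the algorithm in Fig. 1 takes only a few lines of NumPy. The sketch below is a minimal illustration (the function name and the remark about re-orthonormalization are ours); the out-of-core variant discussed above would replace the final QR with TSQR.

```python
import numpy as np

def structured_random_compression(A, r, r_ov=10, w=0, seed=None):
    """Minimal in-core sketch of the algorithm in Fig. 1.

    Returns Q, an m x (r + r_ov) matrix with orthonormal columns whose
    range approximates the range of A.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, r + r_ov))   # line 1: Gaussian test matrix
    B = A @ Omega                                # line 2: B = (A A^T)^w A Omega
    for _ in range(w):
        B = A @ (A.T @ B)
    # For larger w, re-orthonormalizing B inside the loop improves stability.
    Q, _ = np.linalg.qr(B)                       # line 3: orthonormal basis for B
    return Q
```

For matrices whose rows do not fit in memory, B can instead be accumulated block by block (B_{Ki:} = A_{Ki:} Ω) and line 3 replaced by the TSQR decomposition outlined in Appendix A.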

B. Matrix decompositions with alternative norms

The compression algorithm in Fig. 1 works under the Frobenius and spectral norms, as detailed in the theorems presented above. These two cases already cover a significant range of matrix decompositions that are commonly used in practice.

However, other norms are becoming increasingly popular in recent years. For example, NMF is widely used in audio processing with the Itakura-Saito distance instead of the Frobenius norm in Problem (1). The entrywise ℓ1 norm is also very popular when the input matrix A is contaminated with impulsive noise. In these cases, proper structured random projection algorithms need to be used, adapted to the right type of measure for the application at hand.

In particular, we are currently investigating the use of the framework developed here for NMF under an ℓ1 norm. In such a case, the fast Cauchy transform appears as a suitable alternative for the task [15].

III. RANDOMLY COMPRESSED NMF

The goal of this section is to efficiently solve Problem (1) for large input matrices. We do not aim at developing a new NMF algorithm, but rather to illustrate how structured random projections can be used to enhance the speed of existing algorithms and make them usable for big data. As detailed in Section V, this speedup does not come at the price of significantly higher reconstruction errors.

Most NMF algorithms work by iterating the following two steps:

• Find X_{k+1} ∈ R^{m×r}, X_{k+1} ≥ 0, such that

  ||A − X_{k+1} Y_k||_F^2 ≤ ||A − X_k Y_k||_F^2.   (13a)

• Find Y_{k+1} ∈ R^{r×n}, Y_{k+1} ≥ 0, such that

  ||A − X_{k+1} Y_{k+1}||_F^2 ≤ ||A − X_{k+1} Y_k||_F^2.   (13b)

This general formulation encompasses different particular algorithms such as multiplicative updates [9] and several variants of alternating nonnegative least squares [10, 16, 17]. The latter is a particular case of Algorithm (13) in which the right-hand sides are minimized to the end. We thus obtain the following algorithm:

  X_{k+1} = argmin_{X ∈ R^{m×r}} ||A − X Y_k||_F^2   s.t.   X ≥ 0,   (14a)

  Y_{k+1} = argmin_{Y ∈ R^{r×n}} ||A − X_{k+1} Y||_F^2   s.t.   Y ≥ 0.   (14b)

Let us assume that we apply the algorithm in Fig. 1 to A and A^T and obtain two matrices L ∈ R^{m×(r+rOV)} and R ∈ R^{(r+rOV)×n}, respectively. By construction, L and R have orthonormal columns and rows, respectively. Also let Ǎ = AR^T and Â = L^T A.

Using matrices L and R, we propose to approximate Algorithm (13) with the iterations

• Find X_{k+1} ∈ R^{m×r}, X_{k+1} ≥ 0, such that

  ||Ǎ − X_{k+1} Y_k R^T||_F^2 ≤ ||Ǎ − X_k Y_k R^T||_F^2.   (15a)

• Find Y_{k+1} ∈ R^{r×n}, Y_{k+1} ≥ 0, such that

  ||Â − L^T X_{k+1} Y_{k+1}||_F^2 ≤ ||Â − L^T X_{k+1} Y_k||_F^2.   (15b)

Equivalently, using L and R, we propose to approximate Algorithm (14) with the iterations

  X_{k+1} = argmin_{X ∈ R^{m×r}} ||Ǎ − X Y_k R^T||_F^2   s.t.   X ≥ 0,   (16a)

  Y_{k+1} = argmin_{Y ∈ R^{r×n}} ||Â − L^T X_{k+1} Y||_F^2   s.t.   Y ≥ 0.   (16b)

The algorithm in Fig. 2 contains an overview of the proposed NMF algorithm using structured random compression.

input: a matrix A ∈ R^{m×n}, a target rank r ∈ N+, an oversampling parameter rOV ∈ N+ (r + rOV ≤ min{m, n}), an exponent w ∈ N.
output: nonnegative matrices X_k ∈ R^{m×r}, Y_k ∈ R^{r×n}.
1 Compute compression matrices L ∈ R^{m×(r+rOV)}, R ∈ R^{(r+rOV)×n};
2 k ← 1;
3 Initialize Y_k;
4 repeat
5   Y̌_k ← Y_k R^T;
6   Find X_{k+1} ∈ R^{m×r}, X_{k+1} ≥ 0, such that ||Ǎ − X_{k+1} Y̌_k||_F^2 ≤ ||Ǎ − X_k Y̌_k||_F^2;
7   X̂_{k+1} ← L^T X_{k+1};
8   Find Y_{k+1} ∈ R^{r×n}, Y_{k+1} ≥ 0, such that ||Â − X̂_{k+1} Y_{k+1}||_F^2 ≤ ||Â − X̂_{k+1} Y_k||_F^2;
    // The optimizations in lines 6 and 8 can be performed using any variant of multiplicative updates or any nonnegative least squares method.
9   k ← k + 1;
10 until convergence;

Fig. 2. NMF using structured random compression.

For our experiments regarding the techniques described in Section III, as representative examples of Algorithm (13) and Algorithm (14), we respectively use the active set method [10] and the multiplicative updates in [18, Eq. (8)].

We achieve a significant size reduction of the matrices in algorithms (15) and (16). For each of these algorithms, we reduced the number of columns from n to r + rOV in equations (15a) and (16a) and the number of rows from m to r + rOV in equations (15b) and (16b). This makes the systems much faster to solve but, more importantly in our context, it greatly reduces the cost of data communication in parallel frameworks. For example, after compression, large matrices might fit in GPU memory.
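As an illustration of the iterations (16), the sketch below alternates nonnegative least squares solves on the compressed systems using SciPy's nnls; it is a minimal in-core example with helper names of our choosing, not the Matlab implementation used in the experiments.

```python
import numpy as np
from scipy.optimize import nnls

def nnls_matrix(C, D):
    """Solve min_{W >= 0} ||D - C W||_F column by column."""
    W = np.zeros((C.shape[1], D.shape[1]))
    for j in range(D.shape[1]):
        W[:, j], _ = nnls(C, D[:, j])
    return W

def compressed_nmf(A, L, R, r, n_iter=50, seed=None):
    """Alternating NNLS on the compressed systems of equations (16).

    L (m x (r + r_ov)) and R ((r + r_ov) x n) come from applying the
    algorithm in Fig. 1 to A and A^T, respectively.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    A_check = A @ R.T            # m x (r + r_ov)
    A_hat = L.T @ A              # (r + r_ov) x n
    Y = rng.random((r, n))
    for _ in range(n_iter):
        # (16a): min_X ||A_check - X (Y R^T)||_F^2, X >= 0, solved as a
        # transposed NNLS problem.
        X = nnls_matrix((Y @ R.T).T, A_check.T).T
        # (16b): min_Y ||A_hat - (L^T X) Y||_F^2, Y >= 0.
        Y = nnls_matrix(L.T @ X, A_hat)
    return X, Y
```

In the uncompressed Algorithm (14), the same loop would operate on A directly; here every NNLS system has only r + rOV rows or columns, which is where the speedup comes from.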
Alternatively, Problem (1) can be equivalently re-formulated as

  min_{X, U ∈ R^{m×r}, Y, V ∈ R^{r×n}} ||A − XY||_F^2   s.t.   U = X,  V = Y,  U, V ≥ 0.   (17)

Again, using the matrices L and R defined above, we propose to approximate Problem (17) with

  min_{X, U ∈ R^{m×r}, Y, V ∈ R^{r×n}} ||LL^T (A − XY) R^T R||_F^2   s.t.   U = X,  V = Y,  U, V ≥ 0.   (18)

Let Ã = L^T A R^T, X̃ = L^T X, and Ỹ = Y R^T. We propose to further approximate Problem (17) with

  min_{U ∈ R^{m×r}, V ∈ R^{r×n}, X̃ ∈ R^{(r+rOV)×r}, Ỹ ∈ R^{r×(r+rOV)}} ||Ã − X̃ Ỹ||_F^2   s.t.   U = L X̃,  V = Ỹ R,  U, V ≥ 0.   (19)

The alternating direction method of multipliers (ADMM) can be used for solving Problem (17) [11]. Thus, a similar technique can solve Problem (19). The details of the proposed algorithm are presented in Appendix B.

The level of compression in Problem (19) is significantly higher than in algorithms (15) and (16). The latter formulations only employ (alternated) single-sided compression, whereas the former uses a (simultaneous) double-sided compression. One may be inclined to think that such an aggressive compression might lead to greater errors; however, in practice, this is not the case. Studying this behavior from a theoretical standpoint might shed light on this interesting characteristic.

A. Limits of NMF for big data

When matrix A gets sufficiently large, solving Problem (1) becomes challenging. The compression techniques presented here significantly alleviate the problem for in-core computations and are easily extensible to out-of-core computations. For example, each iteration of the multiplicative updates algorithm can be implemented on a MapReduce framework [19]; its structured compressed version can be easily adapted to this framework, greatly reducing communication costs thanks to the use of smaller matrices. Implementing our compressed ADMM algorithm on a MapReduce framework is just as straightforward.

However, when dealing with large volumes of data, the practical problem actually resides in the iterative nature of the algorithms. As an example, consider that the execution time of a single iteration of the multiplicative algorithm on a MapReduce framework is measured in hours for sparse matrices with millions of columns and rows [19, 20]. As expected, the issue is hugely exacerbated for dense matrices.

IV. RANDOMLY COMPRESSED SEPARABLE NMF

Following Definition 1, let us now assume that matrix A is (near) r-separable. Most state-of-the-art techniques for computing SNMF, see Problem (4), are based on the following two-step approach:

1) Extract r columns of A, indexed by K. The literature usually refers to them as extreme columns.
2) Solve

  Y = argmin_{H ∈ R^{r×n}} ||A − (A)_{:K} H||_F^2   s.t.   H ≥ 0.   (20)

The literature on SNMF has mainly focused on Step 1) of the above algorithm. There are several types of algorithms for performing this task [21–24]. As for Step 2), Problem (20) involves solving n nonnegative least squares problems separately, i.e.,

  (Y)_{:i} = argmin_{h ∈ R^r} ||A_{:i} − (A)_{:K} h||_F^2   s.t.   h ≥ 0.   (21)

This makes Step 2) trivially parallelizable.

Let Q ∈ R^{m×m} be an orthonormal basis for A ∈ R^{m×n},

  Q^T A = [ R ; 0 ],   Q^T (A)_{:K} = [ (R)_{:K} ; 0 ],   (22)

where R ∈ R^{n×n}. A key observation here is that the zero rows do not provide information for finding extreme columns of A [12]. We also trivially have that, for any orthonormal matrix Q ∈ R^{m×m},

  ||Q^T (A − XY)||_F ∝ ||A − XY||_F.   (23)

Then,

  Y = argmin_{H ≥ 0} ||A − (A)_{:K} H||_F^2   (24a)
    = argmin_{H ≥ 0} ||Q^T (A − (A)_{:K} H)||_F^2   (24b)
    = argmin_{H ≥ 0} ||R − (R)_{:K} H||_F^2.   (24c)

Notice that Problem (24c) has succeeded in reducing the problem size to n × n from the original m × n Problem (20). We then obtain the following three-step algorithm [12]:

1) Compute Q using, e.g., a QR decomposition of A.
2) Find r extreme columns of R = Q^T A, indexed by K.
3) Solve

  Y = argmin_{H ∈ R^{r×n}} ||R − (R)_{:K} H||_F^2   s.t.   H ≥ 0.   (25)

As the main assumption in NMF and SNMF is that A has (or can be approximated by) a low-rank structure, by all practical means we expect that r ≪ min(m, n); otherwise, it would not even make sense to try these types of decompositions. We claim that little to no information is lost by replacing the full orthonormal basis with a rank-preserving basis that projects the data into a lower-dimensional space.

As the reader might already suspect, we propose to obtain such a basis via structured random projections. This involves a small but conceptually important change in the above SNMF algorithm: replace Step 1) by

1) Compute a structured random compression matrix Q for A.

The proposed algorithm is depicted in Fig. 3. Let us now detail the main differences with the QR-based algorithm.
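Before detailing those differences, the following minimal in-core sketch puts the whole compressed SNMF pipeline of Fig. 3 together, assuming SPA as the column-selection routine in Step 2; the SPA code shown is a generic textbook variant and the function names are illustrative.

```python
import numpy as np
from scipy.optimize import nnls

def spa(R, r):
    """Successive projection algorithm: greedily pick r extreme columns of R."""
    X = R.astype(float).copy()
    cols = []
    for _ in range(r):
        j = int(np.argmax(np.sum(X**2, axis=0)))   # column with largest norm
        cols.append(j)
        u = X[:, j] / np.linalg.norm(X[:, j])
        X -= np.outer(u, u @ X)                     # project out the chosen direction
    return cols

def compressed_snmf(A, r, r_ov=10, w=0, seed=None):
    Q = structured_random_compression(A, r, r_ov, w, seed)  # Fig. 1 (sketched earlier)
    R = Q.T @ A                                             # (r + r_ov) x n
    K = spa(R, r)                                           # Step 2: extreme columns
    Y = np.zeros((r, A.shape[1]))
    for i in range(A.shape[1]):                             # Step 3 / Problem (26)
        Y[:, i], _ = nnls(R[:, K], R[:, i])
    return K, Y
```

Compared to the QR-based pipeline, the only change is how the basis is obtained and, consequently, that R has r + rOV rows instead of n.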

input: a matrix A ∈ R^{m×n}, a target rank r ∈ N+, an oversampling parameter rOV ∈ N+ (r + rOV ≤ min{m, n}), an exponent w ∈ N.
output: index set K of the extreme columns of A, nonnegative matrix Y ∈ R^{r×n}.
1 Compute Q ∈ R^{m×(r+rOV)} using the algorithm in Fig. 1;
2 Find r extreme columns of R = Q^T A, indexed by K;
3 Solve, ∀ i ∈ [0, n),

  (Y)_{:i} ← argmin_{h ∈ R^r} ||R_{:i} − (R)_{:K} h||_F^2   s.t.   h ≥ 0;   (26)

Fig. 3. SNMF using structured random compression.

[Figure: log-log plot of compression time (s) versus the number m of rows, comparing in-core and out-of-core compression for matrices with n = 500 columns.]

Fig. 4. Performance of out-of-core compression. We tested different values for m, while fixing n = 500. The out-of-core algorithm for structured random compression presented in Section II is slower for matrices with approximately less than 2·10^4 rows. For larger matrices, it exhibits the same complexity as the in-core one (linear in m). Out-of-core computations do not come at the price of a significantly slower compression algorithm, yet they permit working with significantly larger matrices.

First, let us note that R = Q^T A is now an (r + rOV) × n matrix instead of an n × n matrix. This allows us to process matrices that have many more columns, as storing R has become orders of magnitude easier/cheaper. Also note that each nonnegative least squares problem in Problem (21) has also become orders of magnitude smaller and thus faster to solve. Again, the huge decrease in communication costs for parallel implementations is even more important in our context than the gain in computational speed.

Second, the computation of the basis itself has become much faster. This is easy to understand when we compare the algorithm in Fig. 1, which only computes the QR decomposition of an m × (r + rOV) matrix, with the QR decomposition of the full m × n matrix. Of course, as the ratio r/n decreases, the proposed algorithm becomes faster.

Let us assume for a moment that n is sufficiently small such that we can use the TSQR algorithm directly on the input matrix A, but not trivially small. As detailed in Appendix A, the QR decomposition in Equation (30) in the appendix is the only centralized step in TSQR; the amount of information that needs to be transmitted to carry out this step is, again, orders of magnitude smaller when using structured random compression.

Note. The separable NMF model is similar to the model presented in [25] (and in [26] without non-negativity constraints),

  min_{T ∈ R^{n×n}} ||A − AT||_F^2 + λ ||T||_{row-0}   s.t.   T ≥ 0,   (27)

where ||T||_{row-0} denotes the number of non-zero rows. The similarity resides in that selecting a subset of rows from T is equivalent to selecting a subset of columns from A. This problem can be relaxed into a convex problem by replacing the ℓ0 pseudo-norm by a (possibly weighted) ℓ1 norm. However, whichever optimization technique we choose for solving this problem, it will involve an iterative algorithm, where an n × n system is solved in every iteration. In [25], the problem is shrunk by clustering the columns of A and feeding a new matrix, only containing the cluster centers, into Equation (27). For these reasons, in our view, the SNMF model, as presented here, provides a cleaner and faster alternative to Equation (27).

V. EXPERIMENTAL RESULTS

We will now present numerous examples supporting the use of structured random projections for NMF and SNMF, both in terms of speed and accuracy.

Before jumping to these problems, in Fig. 4 we show a simulation of the nice properties of the out-of-core compression algorithm presented in Section II. We performed our tests on m × n matrices with Gaussian entries, where different values for m were tested, ranging from 10^3 to 10^6, and n = 500 in all cases. This ensures that all matrices fit in main memory, allowing (1) to compress them with the in-core algorithm, and (2) to disregard disk access times, making the comparisons fair. The out-of-core algorithm for structured random compression is slower for matrices with approximately less than 2·10^4 rows; for these small matrices, the overhead of processing the matrix in blocks becomes evident (notice though that both computing times are well under 1 second). For larger matrices, the overhead's impact becomes less significant, and both algorithms exhibit the same overall performance (linear in m). In summary, we observe the expected behavior: the greater flexibility of the proposed out-of-core compression algorithm for processing large matrices does not cause performance to degrade with respect to the in-core one.

A. NMF

For our experiments regarding the techniques presented in Section III, as representative examples of Algorithm (13) and Algorithm (14), we respectively use the active set method [10] and the multiplicative updates in [18, Eq. (8)]. For these two algorithms, we compared with a vanilla version and a variant using Gaussian projection, as presented in [27] (also see Section II). We also implemented the ADMM algorithm in [11] and the proposed ADMM algorithm with structured random compression. All the methods were implemented in Matlab. In all tests, we set w = 4 and rOV = 10 in the compression algorithm in Fig. 1; we further adjust the value of rOV so that r + rOV = min(max(20, r + rOV), n).
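Spelled out in code, the oversampling adjustment used in all experiments amounts to the following small helper (the function name is introduced here only for illustration).

```python
def adjust_oversampling(r, r_ov, n):
    """Clamp the compressed dimension so that r + r_ov = min(max(20, r + r_ov), n)."""
    total = min(max(20, r + r_ov), n)
    return total - r   # adjusted r_ov

# Examples: adjust_oversampling(2, 10, 23742) -> 18  (compressed dimension 20)
#           adjust_oversampling(130, 10, 15691) -> 10 (compressed dimension 140)
```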

We begin by showing in Fig. 7 simulation results of the different NMF variants on synthetic examples. The first interesting observation from these examples is that, although the computation of the compression matrix is more costly for structured than for Gaussian compression, this might not end up reflected in the overall computing time; this is because, in general, the NMF variant with Gaussian compression requires more iterations to converge. The second observation is that the NMF variants that use structured compression yield very similar relative reconstruction errors to their uncompressed counterparts (higher in one example, lower in three). For multiplicative updates and ADMM, the gain in speed from using structured compression is huge; for active set, the speedup is smaller. Lastly, Gaussian compression seems to come at the cost of higher reconstruction errors.

We also ran different NMF algorithms on a hyperspectral positron emission tomography (PET) image, see Fig. 8. This example allows us to visually compare the errors produced by the different methods. The NMF methods with Gaussian compression create "clusters" of errors (particular areas in which the errors seem to concentrate). In Table I we show several error statistics and the computing time for the different methods. The statistics also reflect the same behavior as our previous visual inspection. Structured compression has a positive effect on the computing time (it decreases), and no significant effect on the error statistics.

TABLE I. Performance when compressing a PET image. See Fig. 8 for a detailed explanation of the setup. As we can observe, the use of Gaussian compression (GC) is detrimental to the reconstruction error (higher values indicate a lower error), while the proposed structured random compression (SC) has no significant impact on it. As a counterpart, the use of SC significantly decreases the computing time with respect to the original method. The best values for each column are highlighted in green.

                                Error (−log10)                Time (s)
                          Mean      STD      Median
  Multiplicative          3.628     3.435    3.980     177.813
  Multiplicative - GC     3.496     3.304    3.834      19.742
  Multiplicative - SC     3.626     3.433    3.979      71.437
  ADMM                    3.638     3.436    4.027      32.195
  ADMM - SC               3.636     3.433    4.028      23.168
  Active set              3.628     3.436    3.976      18.251
  Active set - GC         3.536     3.350    3.882      14.277
  Active set - SC         3.638     3.436    4.024      11.371
  ALS¹                    3.638     3.436    4.023      18.162
  ALS with proj. grad.¹   3.634     3.436    4.004      58.208

  ¹ Obtained from http://cogsys.imm.dtu.dk/toolbox/nmf/.

Climate datasets are very interesting to analyze using NMF. We believe that the evidence of a low-rank model within climate data is of interest by itself. Nonnegativity is a useful addition since, under this model, the effects of different factors cannot cancel each other. The technical details and results of an experiment using climate data are shown in Fig. 9. In this case, we only use the active set method for our comparisons. We found that two factors explain the data with enough accuracy. Both factors seem to correspond to two very different seasons across the globe, and they exhibit inversely correlated periodic patterns. While the left and right factors obtained using structured compression are very similar to their uncompressed counterparts, Gaussian compression introduces visible artifacts in the resulting factorization. Structured random compression is also the fastest of the three methods.

Our last classical NMF example consists of a popular application: biclustering. In this case, we bicluster a bipartite social network, i.e., one that contains two different types of nodes. In our particular example, these two types correspond to characters from Marvel comic books and to the comic books in which they appear. We performed NMF with r = 10 (recall that r is the number of factors). We then thresholded each column of X and each row of Y to obtain sparse components that we define as a bicluster (we could have also added a sparsity term to the formulation, but opted for a simpler approach that does not introduce additional complexity). For each column (row) of X (Y), we set to zero the entries smaller than the column (row) mean plus three standard deviations. Then, for display purposes, we only keep the largest 25 entries in each column of X if there are more than that number of nonzero entries. In Fig. 10 we show two of the biclusters obtained in such a way. It becomes quickly apparent that structured compression does not introduce significant artifacts in the biclusters, whereas the clusters found with Gaussian compression are heavily intertwined (all ten factors seem to be mixed together). For example, Mary Jane Parker-Watson, Spider-Man's wife, is not a recurring character of the comic books.

To summarize, the overall observation is that structured compression brings additional speed to NMF methods without introducing significant errors. On the other hand, Gaussian compression seems to come at the cost of higher reconstruction errors and is not consistently faster than structured compression.

B. Separable NMF

We implemented our SNMF algorithms in Python, using the dask and into libraries¹ to perform out-of-core matrix computations (i.e., without fully loading the involved matrices in main memory). A byproduct of this implementation choice is that we can compute SNMF on very large matrices on a regular laptop, without having to resort to a cluster. To the best of our knowledge, our TSQR implementation is the first publicly available one that runs on any regular laptop using out-of-core computations.

¹ http://dask.readthedocs.org/, http://into.readthedocs.org/

We perform all of our comparisons with the SNMF algorithm using the QR decomposition [12], analyzed in Section IV. We use SPA [21, 24] and XRAY [23] as the column selection algorithms. Throughout this section, we simply use compression to refer to structured compression. In all tests, we set w = 0 and rOV = 10 in the compression algorithm in Fig. 1; we further adjust the value of rOV so that r + rOV = min(max(20, r + rOV), n).

We first present results on synthetic matrices in Fig. 11. We produced different matrices of fixed size by varying their rank, see Fig. 11(a). In general, we aim at explaining the data matrix with a small fraction of its columns. The proposed compression method for SNMF is faster when fewer factors are needed to explain the data. On the other hand, QR-based methods always have the same (high) computing time, no matter how simple the structure of the data is.

[Figure: six panels of processing time (s) and mean error versus matrix size for Multiplicative, Active set, and ADMM, each in vanilla, GC, and SC variants.]

(a) Synthetic dense matrices. The matrix size indicates the number of rows m; the number of columns n is fixed to n = 0.75m in all cases. Since the matrices are dense, δ = 1.

(b) Synthetic sparse matrices. The matrix size indicates the number of rows m; the number of columns n and the sparsity level δ are fixed to n = 0.75m and δ = 10^−2 in all cases.

Fig. 7. Performance comparison on synthetic matrices. We first generate two matrices X_GT ∈ R^{m×r}, Y_GT ∈ R^{r×n}, where their entries are uniformly distributed in [0, 1] with probability δ, or zero with probability 1 − δ. We then build A = X_GT Y_GT + N, where the entries of N are normally distributed with probability δ², or zero with probability 1 − δ². GC and SC stand for Gaussian and structured compression, respectively. The reconstruction error is reported as the mean over 10 different runs. While both GC and SC are generally faster than the original uncompressed methods (top row), the accuracy levels of the latter are only matched (and sometimes even outmatched) by SC (bottom row).

[Figure: reconstruction-error maps for PET slices under each method (Multiplicative, Multiplicative - GC, Multiplicative - SC, ADMM, ADMM - SC, Active set, Active set - GC, Active set - SC, ALS, ALS (proj. grad.)) alongside the original image; brighter means higher error.]

Fig. 8. Reconstruction errors when compressing a positron emission tomography (PET) image (http://cogsys.imm.dtu.dk/toolbox/nmf/). The image is composed of 40 temporal frames, where each frame is a 128×128×35 3D image (35 is the number of 128×128 slices). The matrix size is then 573440×40 and we perform NMF with r = 5. As we can observe in each slice (a few of them are highlighted with zoom-ins), the use of Gaussian compression (GC) increases the reconstruction errors, while structured random compression (SC) has no identifiable effect. See Table I for additional numerical results.

We also investigated how much faster the proposed method is with respect to QR-based approaches. We generated m × n input matrices, where m is fixed and n varies; we then extract n/10 columns. Remember that QR-based approaches solve an n × n version of Problem (24c), while the proposed compressed approach solves an (r + rOV) × n version. This difference is reflected almost exactly in the speedup that we observe in Fig. 11(b): about an order of magnitude is gained with the proposed scheme.

In Fig. 12 we analyze the same dataset as in Fig. 9. Interestingly, a similar conclusion is reached using SNMF and NMF. The data is well explained by the same two factors (in this case, two extreme columns). Notice that the analyzed matrix is fat and the QR-based approach provides no speedup, i.e., R ∈ R^{m×n} in Problem (24c). On the other hand, the proposed approach produces a smaller problem independently of the input matrix's shape. Quantitatively, in this example, compressed SNMF is two orders of magnitude faster than the QR-based SNMF.

Our last example consists of an application for selecting representative frames from videos. We first examine a short clip (5 seconds long, 120 frames) of the open-source movie "Elephants Dream" at a resolution of 360p (640 × 360). In Table II we show a summary of the comparisons performed with this video. An example of the frames extracted by SPA with compression is shown in Fig. 13.

Our first observation is that the proposed compressed SNMF is at least an order of magnitude faster than the QR-based variant. Second, since the matrix built from the video is not truly low-rank, projecting the matrix into a low-rank subspace by means of compression seems to yield better results than when using the QR decomposition. Intuitively, compression eliminates some variability in the data in such a way that it can be better approximated by SNMF.

Although not strictly comparable, because it does not impose nonnegativity constraints, we included in our comparisons the method for extracting representative elements from [26]. As discussed in Section IV, this method's formulation does not scale gracefully with large input matrices, a fact that is easily reflected in its slow running time, even for a relatively small example.

Scaling to Big Data: We also ran tests on the complete open-source movie "Elephants Dream."² The movie is approximately 11 minutes long (15691 frames). We processed the video at two resolutions, 360p (640×360) and 1080p (1920×1080), resulting in 691200 × 15691 and 6220800 × 15691 matrices, respectively. The HDF5 files occupy 43.55 GB and 391.13 GB, respectively, not fitting in main memory. Using compressed SPA, we extract 130 representatives (extreme columns) from the video, one every 120 frames (5 seconds). At 360p we obtained a relative error of 0.2941 in 1891 seconds (about 32 minutes). At 1080p, we obtained a relative error of 0.2676 in 20776 seconds (about 5:46 hours), processing both matrices on a laptop with 16 GB of memory.

² http://www.elephantsdream.org

[Figure: (a) global temperature maps of the two left factors for Active set - GC, Active set, and Active set - SC; (b) time series of the two right factors over 1948–2013 and a one-year (1948) close-up for each method.]

(a) Left factors analysis. Interestingly, the first factor (top row) corresponds to summer and winter in the north and south hemispheres, respectively, while the second factor (bottom row) corresponds to summer in the south and winter in the north. Visually, it is very clear that SC introduces far fewer artifacts than GC compared to the vanilla method (center).

(b) Right factors analysis. In the top row, we can easily observe that the two factors are periodic and inversely correlated, corroborating the winter/summer duality between both components. In the bottom row, we observe that the GC factors are much noisier (about an order of magnitude larger), compared to the original and SC methods.

Fig. 9. NMF on gridded climate data (http://www.esrl.noaa.gov/psd/repository/). The data contains daily mean surface temperatures arranged in a 144×73 grid since 1948 (23742 days in total), forming a 10512 × 23742 matrix. We perform NMF using the active set method with r = 2. The computing times for the method in its vanilla version (center), with Gaussian compression (GC), and with structured random compression (SC) were 50, 70, and 20 seconds, respectively; the respective relative reconstruction errors were 0.0459, 0.0537, and 0.0458, confirming the conclusions reached through visual inspection.

VI. CONCLUSIONS

In this work we proposed to use structured random projections for NMF and SNMF. For NMF, we presented formulations for three popular techniques, namely, multiplicative updates [9], the active set method for nonnegative least squares [10], and ADMM [11]. For SNMF, we presented a general technique that can be used with any algorithm. In all cases, we showed that the resulting compressed techniques are faster than their uncompressed variants and, at the same time, do not introduce significant errors in the final result.

There are in the literature very efficient SNMF algorithms for tall-and-skinny matrices. Interestingly, the use of structured random projections allows us to compute SNMF for arbitrarily large matrices, granting access to very efficient computations in the general setting.

As a byproduct, we also propose an algorithmic solution for computing structured random projections of extremely large matrices (i.e., matrices so large that even after compression they do not fit in main memory). This is useful as a general tool for computing many different matrix decompositions, such as the singular value decomposition, for example.

We are currently investigating the problem of replacing the Frobenius norm with an ℓp norm in our compressed variants of NMF and SNMF. In this setting, the fast Cauchy transform [15] is a suitable alternative to structured random projections. Compression then consists of sampling and rescaling rows of A, thus identifying the so-called coreset of the problem. This formulation is of particular interest for network analysis, where we need to deal with sparse structures.

ACKNOWLEDGMENTS

The authors would like to thank Mauricio Delbracio for many useful scientific discussions and Matthew Rocklin for his help and technical support with the dask and into libraries.

[Figure: radar plots of the left- and right-factor coefficients for two biclusters, one identified with the Fantastic Four and one with Spider-Man, as recovered by Active set, Active set - GC, and Active set - SC; the axes list the character and comic-book names with nonzero coefficients.]

Fig. 10. Biclustering the Marvel Universe collaboration network (http://www.chronologyproject.com/). The network links Marvel characters and the Marvel comic books in which they appear, and exhibits most characteristics of real-life collaboration networks [28]. It can be represented as an m × n matrix, where m = 6445 and n = 12850 are the number of characters and comics, respectively. We bicluster this matrix using NMF with r = 10, aiming at obtaining 10 very representative groups of characters appearing jointly in different comic books. The ith bicluster (i = 1 ... 10) is formed by the ith column of X and the ith row of Y (small entries were set to zero, as explained in Section V-A). The radar plots represent the coefficients of these vectors. We show two biclusters that we identify with characters from the Fantastic Four (first two columns of the figure) and the Spider-Man (last two columns of the figure) comics. Active set NMF correctly identifies that Mr. Fantastic, The Thing, the Invisible Woman, and the Human Torch are the four most recurring characters in the "Fantastic Four" (FF) series. Similarly, active set NMF correctly identifies that Spider-Man/Peter Parker, Mary Jane Watson-Parker (Peter Parker's wife), and Jonah Jameson (Peter Parker's boss) are the most recurring characters in the "Amazing Spider-Man" (ASM) and "Peter Parker, The Spectacular Spider-Man" (PPTSS) series. It is clear that the biclusters recovered using structured random compression (SC) are very close to the biclusters found with no compression; contrarily, Gaussian compression (GC) significantly affects the biclustering result. All 10 biclusters can be found at http://www.marianotepper.com.ar/research/cnmf.

APPENDIX

A. QR decompositions for tall-and-skinny matrices

The direct TSQR algorithm uses a simple but highly efficient approach for computing the QR decomposition of a tall-and-skinny matrix. Let A be the m × n matrix to decompose (m ≫ n). The direct TSQR algorithm starts by splitting A into a stack of b blocks

  A = [ A_{K1:} ; A_{K2:} ; … ; A_{Kb:} ],   (28)

where K_i: denotes the set of rows selected in the i-th block. Each block A_{Ki:} is factorized into its components Q^{(1)}_{Ki:}, R_{Ki:} using any standard QR decomposition algorithm. This can be written in matrix form as

  [ A_{K1:} ; … ; A_{Kb:} ] = diag(Q^{(1)}_{K1:}, …, Q^{(1)}_{Kb:}) [ R_{K1:} ; … ; R_{Kb:} ],   (29)

where the three matrices are of sizes m × n, m × bn (block diagonal), and bn × n, respectively. The second step is to gather the matrix composed by vertically stacking the factors R_{Ki:} and to compute an additional QR decomposition, i.e.,

  [ R_{K1:} ; … ; R_{Kb:} ] = [ Q^{(2)}_{K1:} ; … ; Q^{(2)}_{Kb:} ] R,   (30)

where the stacked matrices are bn × n and R ∈ R^{n×n}. This is the only centralized step in TSQR. We then multiply the intermediate Q factors to get the matrix

  Q = diag(Q^{(1)}_{K1:}, …, Q^{(1)}_{Kb:}) [ Q^{(2)}_{K1:} ; … ; Q^{(2)}_{Kb:} ] = [ Q^{(1)}_{K1:} Q^{(2)}_{K1:} ; … ; Q^{(1)}_{Kb:} Q^{(2)}_{Kb:} ].   (31)

Finally, note that A = QR, where Q is an orthonormal matrix (obtained from the multiplication of two orthonormal matrices) and R is, by algorithmic design, upper triangular. Thus, these matrices form a QR decomposition of A.
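A minimal in-core NumPy sketch of direct TSQR follows; it processes the blocks sequentially for clarity, whereas a parallel or out-of-core implementation (such as the dask-based one mentioned in Section V-B) factors the blocks independently and only centralizes the small stacked-R factorization of Equation (30).

```python
import numpy as np

def direct_tsqr(blocks):
    """Direct TSQR: QR of a tall-and-skinny matrix given as a list of row blocks.

    Assumes each block has at least as many rows as columns.
    """
    # First pass: independent QR of each block (Equation (29)).
    Q1, R1 = zip(*(np.linalg.qr(B) for B in blocks))
    # Centralized step: QR of the stacked small R factors (Equation (30)).
    Q2, R = np.linalg.qr(np.vstack(R1))
    # Combine the intermediate factors (Equation (31)).
    n = R.shape[0]
    Q_blocks = [q1 @ Q2[i * n:(i + 1) * n, :] for i, q1 in enumerate(Q1)]
    return np.vstack(Q_blocks), R

# For structured random compression, each block of B = A @ Omega can be formed
# on the fly as blocks[i] = A_rows_i @ Omega, so neither A nor B needs to be
# held in memory at once.
```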

[Figure: two plots of SNMF computing time (s), (a) versus the rank of the input matrix and (b) versus the number of columns of the input matrix, comparing SPA and XRAY with QR-based and compressed variants, both in-core and out-of-core.]

(a) We extract r columns, where r is the rank of the 10^6 × 100 input matrix. As expected, the computing time of the QR-based methods does not change with the number of extracted columns. On the other hand, compressed methods are faster when the rank of the input matrix is low compared to its size. In this case, out-of-core methods appear slower than in-core ones (slightly above 2×). We use an oversampling factor rOV = 10 for compression, which explains the flattening of the compressed curves towards their end.

(b) We extract n/10 columns from a 10^5 × n input matrix. Note that the speedup of compressed versus QR-based methods (approx. 10×) is straightforwardly explained by the fixed ratio between the rank and the number of columns. In this case, no significant speed difference is noticeable when comparing in-core with out-of-core methods.

Fig. 11. Performance of different SNMF algorithms on synthetic matrices. We generate the input matrix A = X_GT Y_GT, where X_GT ∈ R^{m×r} and Y_GT ∈ R^{r×n} have normally distributed entries (r and n take different values in subfigures (a) and (b)). All algorithms select the same set of columns, thus producing equal errors.

[Figure: (a) map of the columns extracted with SPA-comp.; (b) computing time (s) and relative error versus the number of extracted columns for SPA and XRAY with QR-based and compressed variants, in-core and out-of-core.]

(a) Columns extracted with SPA-comp. When r = 2, the extreme columns look similar to the ones found with traditional NMF, see Fig. 9.

(b) With the uncompressed methods, extracting columns becomes extremely slow (about two orders of magnitude slower) than with the compressed methods. Since compressed SNMF is faster with 10 columns than QR-based SNMF with two columns, we simply stopped the computation of the latter after 2 columns. Notice that these QR-based methods are explicitly designed to be faster for tall-and-skinny matrices, but end up being extremely slow for fat matrices. The proposed compressed SNMF is also very fast for fat matrices.

Fig. 12. SNMF on gridded climate data. Same dataset as in Fig. 9. The data form a fat 10512 × 23742 matrix. We study the performance of SNMF in terms of computing speed and relative error as the number r of columns changes. As with NMF, the data is well explained with only two factors, as seen from the decay in the reconstruction error. SPA with compression seems not to increase its computing time as the number of extracted columns increases; since we force the compression algorithm to produce at least 20 rows, the subsequent column extraction in SPA is extremely efficient. Notice that SPA with compression is about four times faster than NMF using the active set method with compression, see Fig. 9.

1) TSQR for structured random compression: When using TSQR for compressing a matrix A with the algorithm in Fig. 1, the input matrix to decompose is

  B = (AA^T)^w AΩ,   (32)

where w ∈ N. Let us assume, for simplicity, that w = 0. The input of TSQR is not the matrix B as a whole, but blocks extracted from it. We can thus avoid storing the entire matrix B in main memory, and compute its blocks as needed, i.e.,

  B_{Ki:} = A_{Ki:} Ω.   (33)

A similar (but more complex) indexing holds for w > 0.

B. An ADMM algorithm for solving Problem (19)

We consider the augmented Lagrangian of Problem (19),

  L(X̃, Ỹ, U, V, Λ, Φ) = ||Ã − X̃Ỹ||_F^2 + Λ • (LX̃ − U) + (λ/2) ||LX̃ − U||_F^2 + Φ • (ỸR − V) + (φ/2) ||ỸR − V||_F^2,   (34)

[Plot: normalized reconstruction coefficients (columns of H) over the 120 frame indices.]
Fig. 13. Extracting representative frames from a video (http://www.elephantsdream.org). The video has a resolution of 640 × 360 pixels and contains 120 frames (5 seconds). On the top block, we display 40 uniformly sampled frames. We build a 691,200 × 120 matrix by vectorizing one frame per column (each frame has 3 color channels), and then use SNMF with compression to extract six representative frames (bottom left). On the bottom right we show the (normalized) columns of the matrix H in Step (25), i.e., the reconstruction coefficients. It took 2.18 seconds to compute the result, with relative errors of 0.2714 and 0.4240 with respect to the compressed and the original matrices, respectively.
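To make the construction of the data matrix concrete, here is a small sketch of how the frames could be stacked into a 691,200 × 120 matrix and how the indices returned by column selection map back to frames. The file name, the choice of imageio as frame reader, and the selected_columns list are illustrative assumptions, not part of the original experiment.

import numpy as np
import imageio.v3 as iio  # any frame reader works; imageio is an illustrative choice

# Load the 120 RGB frames of the clip (each 640 x 360 x 3).
frames = iio.imread("elephants_dream_clip.mp4")       # shape: (120, 360, 640, 3)

# One vectorized frame per column: 640 * 360 * 3 = 691,200 rows, 120 columns.
A = frames.reshape(frames.shape[0], -1).T.astype(np.float64)

# Compressed SNMF returns the indices of the extracted columns; a placeholder
# list is used here to show how they map back to representative frames.
selected_columns = [3, 27, 51, 78, 95, 112]           # illustrative only
representative_frames = frames[selected_columns]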

TABLE II. Extracting representative frames from a video. For details about the experiment setup, see Fig. 13. We are considering a (relatively small) 691,200 × 120 matrix to be able to compare the performance of in-core and out-of-core methods and with ESV [26], which is not fit for large-scale matrices. The proposed compression scheme for SNMF (SPA-COMP) greatly improves speed with no detriment to the reconstruction error. Notice that since the matrix is not actually low-rank (it is a video), enforcing the projection onto a subspace helps in finding a better solution (SPA-COMP versus SPA-QR).

Method               Comp. model    r    Time (s)   Rel. error
SPA-COMP             in-core        6    2.28       0.4240
SPA-COMP             out-of-core    6    4.75       0.4293
SPA-QR               in-core        6    18.76      0.5446
SPA-COMP             in-core        9    2.31       0.3626
SPA-COMP             out-of-core    9    4.59       0.3610
SPA-QR               in-core        9    19.08      0.4453
ESV [26] (α = 2)¹    in-core        9    57.38      0.3751²
SPA-COMP             in-core        15   2.65       0.3068
SPA-COMP             out-of-core    15   5.50       0.3047
SPA-QR               in-core        15   19.93      0.4011
ESV [26] (α = 50)¹   in-core        15   68.05      0.1358²
¹ α is a regularization parameter that (indirectly) controls the number of representatives r.
² The errors are not directly comparable since this formulation does not impose nonnegativity.

B. An ADMM algorithm for solving Problem (19)

We consider the augmented Lagrangian of Problem (19),
L(X̃, Ỹ, U, V, Λ, Φ) = ‖Ã − X̃Ỹ‖²_F + Λ • (LX̃ − U) + (λ/2)‖LX̃ − U‖²_F + Φ • (ỸR − V) + (φ/2)‖ỸR − V‖²_F, (34)
where Λ ∈ R^{m×r}, Φ ∈ R^{r×n} are Lagrange multipliers, λ, φ ∈ R_+ are penalty parameters, and B • C = Σ_{i,j} (B)_{ij}(C)_{ij} for matrices B, C of the same size.
We use the Alternating Direction Method of Multipliers (ADMM) for solving Problem (19). The algorithm works in a coordinate descent fashion, successively minimizing L with respect to X̃, Ỹ, U, V, one at a time while fixing the others at their most recent values, i.e.,
X̃_{k+1} = argmin_{X̃} L(X̃, Ỹ_k, U_k, V_k, Λ_k, Φ_k), (35a)
Ỹ_{k+1} = argmin_{Ỹ} L(X̃_{k+1}, Ỹ, U_k, V_k, Λ_k, Φ_k), (35b)
U_{k+1} = argmin_{U≥0} L(X̃_{k+1}, Ỹ_{k+1}, U, V_k, Λ_k, Φ_k), (35c)
V_{k+1} = argmin_{V≥0} L(X̃_{k+1}, Ỹ_{k+1}, U_{k+1}, V, Λ_k, Φ_k), (35d)
and then updating the multipliers Λ, Φ. Each of these steps can be written in closed form; together they define our algorithm, see Fig. 14. In practice, we set α, β, γ, ξ to 1.

input : a matrix A ∈ R^{m×n}, a target rank r ∈ N^+, an oversampling parameter r_OV ∈ N^+ (r + r_OV ≤ min{m, n}), an exponent w ∈ N.
output: nonnegative matrices U_k ∈ R^{m×r}, V_k ∈ R^{r×n}.
1  Compute compression matrices L ∈ R^{m×(r+r_OV)}, R ∈ R^{(r+r_OV)×n};
2  k ← 1;
3  Initialize U_k, V_k;
4  Ã ← L^T A R^T; Ỹ_k ← V_k R^T;
5  Λ_k ← 0; Φ_k ← 0;
6  I ← the r × r identity matrix;
7  repeat
8      X̃_{k+1} ← (ÃỸ_k^T + λL^T U_k − L^T Λ_k)(Ỹ_k Ỹ_k^T + λI)^{-1};
9      Ỹ_{k+1} ← (X̃_{k+1}^T X̃_{k+1} + φI)^{-1}(X̃_{k+1}^T Ã + φV_k R^T − Φ_k R^T);
       // (P_+(B))_{ij} = max{(B)_{ij}, 0}
10     U_{k+1} ← P_+(LX̃_{k+1} + λ^{-1}Λ_k);
11     V_{k+1} ← P_+(Ỹ_{k+1}R + φ^{-1}Φ_k);
12     Λ_{k+1} ← Λ_k + ξλ(LX̃_{k+1} − U_{k+1});
13     Φ_{k+1} ← Φ_k + ξφ(Ỹ_{k+1}R − V_{k+1});
14     k ← k + 1;
15 until convergence;
Fig. 14. ADMM algorithm for NMF with structured random compression.

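For concreteness, the following NumPy sketch implements the updates of Fig. 14 for the w = 0 case. The construction of L and R via a Gaussian sketch followed by a QR factorization, the random initialization of U and V, and the default parameter values are assumptions made to keep the example self-contained; in practice, the structured random compression described earlier (e.g., the TSQR-based scheme) would be used to build L and R.

import numpy as np

def admm_compressed_nmf(A, r, r_ov=10, lam=1.0, phi=1.0, xi=1.0,
                        n_iter=200, seed=0):
    """Sketch of the ADMM iteration in Fig. 14 (w = 0, Gaussian sketches)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    k = r + r_ov

    # Compression matrices with orthonormal columns/rows (step 1), built here
    # from plain Gaussian sketches of A and A^T followed by QR.
    L, _ = np.linalg.qr(A @ rng.standard_normal((n, k)))      # m x k
    Rt, _ = np.linalg.qr(A.T @ rng.standard_normal((m, k)))   # n x k
    R = Rt.T                                                  # k x n

    # Initialization (steps 2-6); random nonnegative factors are an assumption.
    U = np.abs(rng.standard_normal((m, r)))
    V = np.abs(rng.standard_normal((r, n)))
    A_t = L.T @ A @ R.T          # compressed data matrix A~ = L^T A R^T
    Y_t = V @ R.T                # Y~ = V R^T
    Lam = np.zeros((m, r))
    Phi = np.zeros((r, n))
    I_r = np.eye(r)

    for _ in range(n_iter):
        # Closed-form updates of the compressed factors (steps 8-9).
        X_t = np.linalg.solve(Y_t @ Y_t.T + lam * I_r,
                              (A_t @ Y_t.T + lam * (L.T @ U) - L.T @ Lam).T).T
        Y_t = np.linalg.solve(X_t.T @ X_t + phi * I_r,
                              X_t.T @ A_t + phi * (V @ R.T) - Phi @ R.T)
        # Nonnegative auxiliary variables (steps 10-11).
        U = np.maximum(L @ X_t + Lam / lam, 0.0)
        V = np.maximum(Y_t @ R + Phi / phi, 0.0)
        # Multiplier updates (steps 12-13).
        Lam = Lam + xi * lam * (L @ X_t - U)
        Phi = Phi + xi * phi * (Y_t @ R - V)

    return U, V  # approximate nonnegative factors of A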
We now provide a preliminary convergence property of the proposed ADMM algorithm. Our analysis follows closely the one in [11, Section 2.3]. To simplify notation, we consolidate all the variables as Z = (X̃, Ỹ, U, V, Λ, Φ). A point Z is a Karush-Kuhn-Tucker (KKT) point of Problem (19) if
(X̃Ỹ − Ã)Ỹ^T + L^T Λ = 0, (36a)
X̃^T(X̃Ỹ − Ã) + ΦR^T = 0, (36b)
LX̃ − U = 0, (36c)
ỸR − V = 0, (36d)
Λ ≤ 0 ≤ U, Λ ◦ U = 0, (36e)
Φ ≤ 0 ≤ V, Φ ◦ V = 0, (36f)
where ◦ denotes the Hadamard (entrywise) matrix product.

Proposition 1. Let {Z_k}_{k=1}^∞ be a sequence generated by the algorithm in Fig. 14 that satisfies the condition
lim_{k→∞} (Z_{k+1} − Z_k) = 0. (37)
Then any accumulation point of {Z_k}_{k=1}^∞ is a KKT point of Problem (19).

Proof: From Assumption (37), we have
X̃_{k+1} − X̃_k → 0, (38a)
Ỹ_{k+1} − Ỹ_k → 0, (38b)
Λ_{k+1} − Λ_k → 0, (38c)
Φ_{k+1} − Φ_k → 0, (38d)
U_{k+1} − U_k → 0, (38e)
V_{k+1} − V_k → 0. (38f)
Plugging these differences into the variable updates in Fig. 14, we get
(Ã − X̃_kỸ_k)Ỹ_k^T − L^TΛ_k → 0, (39a)
X̃_{k+1}^T(Ã − X̃_{k+1}Ỹ_k) − Φ_kR^T → 0, (39b)
LX̃_{k+1} − U_{k+1} → 0, (39c)
Ỹ_{k+1}R − V_{k+1} → 0, (39d)
P_+(LX̃_{k+1} + λ^{-1}Λ_k) − U_k → 0, (39e)
P_+(Ỹ_{k+1}R + φ^{-1}Φ_k) − V_k → 0. (39f)
Notice that the terms λ(L^T U_k − X̃_k) and φ(V_kR^T − Ỹ_k) have been eliminated from equations (39a) and (39b) by invoking equations (39c) and (39d), respectively. Equations (36a–36d) are clearly satisfied by equations (39a–39d) at any limit point
Z_∞ = (X̃_∞, Ỹ_∞, U_∞, V_∞, Λ_∞, Φ_∞).
We are then left to prove that equations (36e) and (36f) hold. Updates (35c) and (35d) guarantee the non-negativity of U_∞, V_∞. Let us focus on Equation (36e) first. Equation (39e), when combined with Equation (39c), yields
U_∞ = P_+(U_∞ + λ^{-1}Λ_∞). (40)
If (U_∞)_{ij} = 0, we get P_+(λ^{-1}(Λ_∞)_{ij}) = 0 and then (Λ_∞)_{ij} ≤ 0. If (U_∞)_{ij} > 0, Equation (40) gives (U_∞)_{ij} = (U_∞)_{ij} + λ^{-1}(Λ_∞)_{ij}, and hence (Λ_∞)_{ij} = 0. From this, we obtain that Equation (36e) holds. An identical argument applies to equations (36f) and (39f).
With this, we have proven that any accumulation point of {Z_k}_{k=1}^∞ is a KKT point of Problem (17). From the equivalence of problems (1) and (17), any accumulation point of {(X_k, Y_k)}_{k=1}^∞ is a KKT point of Problem (1).

Corollary 1. Whenever {Z_k}_{k=1}^∞ converges, it converges to a KKT point of Problem (17).

Ideally, we would like to guarantee that Algorithm (35) will always converge to a KKT point of Problem (19). The above simple result is an initial step in this direction, providing some assurance on the behavior of Algorithm (35).

REFERENCES
[1] J. Dean and S. Ghemawat, "MapReduce: Simplified data processing on large clusters," Commun ACM, vol. 51, no. 1, pp. 107–113, 2008.
[2] P. Paatero and U. Tapper, "Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values," Environmetrics, vol. 5, no. 2, pp. 111–126, 1994.
[3] P. Melville and V. Sindhwani, "Recommender systems," in Encyclopedia of Machine Learning. Springer, 2010, pp. 829–838.
[4] C. Févotte, N. Bertin, and J.-L. Durrieu, "Nonnegative matrix factorization with the Itakura-Saito divergence: With application to music analysis," Neural Comput, vol. 21, no. 3, pp. 793–830, 2009.
[5] S. A. Vavasis, "On the complexity of nonnegative matrix factorization," SIAM J Optim, vol. 20, no. 3, pp. 1364–1377, 2010.
[6] N. Gillis, "Sparse and unique nonnegative matrix factorization through data preprocessing," J Mach Learn Res, vol. 13, no. 1, pp. 3349–3386, 2012.
[7] S. Arora, R. Ge, R. Kannan, and A. Moitra, "Computing a nonnegative matrix factorization – provably," in STOC, 2012.
[8] N. Halko, P.-G. Martinsson, and J. Tropp, "Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions," SIAM Review, vol. 53, no. 2, pp. 217–288, 2011.
[9] D. D. Lee and H. S. Seung, "Algorithms for non-negative matrix factorization," in NIPS, 2000.

[10] H. Kim and H. Park, "Nonnegative matrix factorization based on alternating nonnegativity constrained least squares and active set method," SIAM J Matrix Anal Appl, vol. 30, pp. 713–730, 2008.
[11] Y. Xu, W. Yin, Z. Wen, and Y. Zhang, "An alternating direction algorithm for matrix completion with nonnegative factors," Front Math China, vol. 7, no. 2, pp. 365–384, 2012.
[12] A. R. Benson, J. D. Lee, and D. F. Gleich, "Scalable methods for nonnegative matrix factorizations of near-separable tall-and-skinny matrices," in NIPS, 2014.
[13] J. Tropp, "Improved analysis of the subsampled randomized Hadamard transform," Adv Adapt Data Anal, vol. 3, no. 01n02, pp. 115–126, 2011.
[14] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin, "A simple proof of the restricted isometry property for random matrices," Constr Approx, vol. 28, no. 3, pp. 253–263, 2008.
[15] K. L. Clarkson, P. Drineas, M. Magdon-Ismail, M. W. Mahoney, X. Meng, and D. P. Woodruff, "The fast Cauchy transform and faster robust linear regression," in SODA, 2013, pp. 466–477.
[16] M. Chu, F. Diele, R. Plemmons, and S. Ragni, "Optimality, computation, and interpretation of nonnegative matrix factorizations," Tech. Rep., 2004.
[17] C.-J. Lin, "Projected gradient methods for nonnegative matrix factorization," Neural Comput, vol. 19, no. 10, pp. 2756–2779, 2007.
[18] C. Ding, T. Li, and M. I. Jordan, "Convex and semi-nonnegative matrix factorizations," IEEE Trans Pattern Anal Mach Intell, vol. 32, no. 1, pp. 45–55, 2010.
[19] C. Liu, H.-C. Yang, J. Fan, L.-W. He, and Y.-M. Wang, "Distributed nonnegative matrix factorization for web-scale dyadic data analysis on MapReduce," in WWW, 2010.
[20] R. Liao, Y. Zhang, J. Guan, and S. Zhou, "CloudNMF: A MapReduce implementation of nonnegative matrix factorization for large-scale biological datasets," Genomics, Proteomics & Bioinformatics, vol. 12, no. 1, pp. 48–51, 2014.
[21] M. Araújo, T. Saldanha, R. Galvão, T. Yoneyama, H. Chame, and V. Visani, "The successive projections algorithm for variable selection in spectroscopic multicomponent analysis," Chemometr Intell Lab Syst, vol. 57, pp. 65–73, 2001.
[22] V. Bittorf, B. Recht, C. Ré, and J. Tropp, "Factoring nonnegative matrices with linear programs," in NIPS, 2012.
[23] A. Kumar, V. Sindhwani, and P. Kambadur, "Fast conical hull algorithms for near-separable non-negative matrix factorization," in ICML, 2013.
[24] N. Gillis and S. A. Vavasis, "Fast and robust recursive algorithms for separable nonnegative matrix factorization," IEEE Trans Pattern Anal Mach Intell, vol. 36, no. 4, pp. 698–714, 2014.
[25] E. Esser, M. Möller, S. Osher, G. Sapiro, and J. Xin, "A convex model for nonnegative matrix factorization and dimensionality reduction on physical space," IEEE Trans Image Process, vol. 21, no. 7, pp. 3239–3252, 2012.
[26] E. Elhamifar, G. Sapiro, and R. Vidal, "See all by looking at a few: Sparse modeling for finding representative objects," in CVPR, 2012.
[27] F. Wang and P. Li, "Efficient nonnegative matrix factorization with random projections," in SDM, 2010.
[28] R. Alberich, J. Miro-Julia, and F. Rossello, "Marvel universe looks almost like a real social network," 2002, arXiv:cond-mat/0202174.