Non-negative matrix factorization

Mathieu Blondel

NTT Communication Science Laboratories

2014/10/28

1 / 44 Outline

• Non-negative matrix factorization (NMF)

• Optimization algorithms

• Passive-aggressive algorithms for NMF

2 / 44 Non-negative matrix factorization

3 / 44 Non-negative matrix factorization (NMF)

Given an observed matrix $R \in \mathbb{R}_+^{n \times d}$, find matrices $P \in \mathbb{R}_+^{n \times m}$ and $Q \in \mathbb{R}_+^{m \times d}$ such that $R \approx PQ$:

$$
\underbrace{\begin{bmatrix} r_{1,1} & \cdots & r_{1,d} \\ \vdots & \ddots & \vdots \\ r_{n,1} & \cdots & r_{n,d} \end{bmatrix}}_{n \times d}
\;\approx\;
\underbrace{\begin{bmatrix} p_{1,1} & \cdots & p_{1,m} \\ \vdots & \ddots & \vdots \\ p_{n,1} & \cdots & p_{n,m} \end{bmatrix}}_{n \times m}
\times
\underbrace{\begin{bmatrix} q_{1,1} & \cdots & q_{1,d} \\ \vdots & \ddots & \vdots \\ q_{m,1} & \cdots & q_{m,d} \end{bmatrix}}_{m \times d}
$$

$m$ is a user-given hyper-parameter

PQ is called a low-rank approximation of R
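
As a concrete illustration, here is a minimal numpy sketch of one way to compute such a factorization, using the classical multiplicative updates of Lee & Seung for the squared error. The function name `nmf`, the iteration count and the toy data are illustrative assumptions, not taken from these slides (optimization algorithms are discussed later in the talk).

```python
import numpy as np

def nmf(R, m, n_iter=200, eps=1e-9, seed=0):
    """Factorize a non-negative R (n x d) as R ≈ P Q with P (n x m) >= 0, Q (m x d) >= 0."""
    rng = np.random.default_rng(seed)
    n, d = R.shape
    P = rng.random((n, m))
    Q = rng.random((m, d))
    for _ in range(n_iter):
        # Multiplicative updates preserve the non-negativity of P and Q.
        Q *= (P.T @ R) / (P.T @ P @ Q + eps)
        P *= (R @ Q.T) / (P @ Q @ Q.T + eps)
    return P, Q

# Toy usage: rank-5 approximation of a random non-negative 100 x 40 matrix.
R = np.abs(np.random.default_rng(1).normal(size=(100, 40)))
P, Q = nmf(R, m=5)
print(np.linalg.norm(R - P @ Q) / np.linalg.norm(R))  # relative approximation error
```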

4 / 44 Examples of non-negative data

The matrix R could contain...

• Number of word occurrences in text documents

• Pixel intensities in images

• Ratings given by users to movies

• Magnitude spectrogram of an audio signal

• etc...

5 / 44 Why impose non-negativity on P and Q?

• Natural assumption if R is non-negative

• Each row of R is approximated by a strictly additive combination of factors / bases / atoms

$$
[r_{u,1}, \cdots, r_{u,d}] \;\approx\; \sum_{k=1}^{m} \underbrace{p_{u,k}}_{\text{weight / activation}} \times \underbrace{[q_{k,1}, \cdots, q_{k,d}]}_{\text{factor / basis / atom}}
$$

• P and Q tend to be sparse (have many zeros) ⇒ easy-to-interpret, parts-based solution (see the numeric sketch below)
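
The additive-combination view above can be checked numerically; the sketch below is purely illustrative (random P and Q standing in for an actual NMF fit).

```python
import numpy as np

# Illustrative P (n x m) and Q (m x d); in practice they would come from an NMF fit.
rng = np.random.default_rng(0)
P = rng.random((6, 3))   # weights / activations
Q = rng.random((3, 8))   # factors / bases / atoms

u = 2
# Row u of the approximation is a purely additive mix of the atoms (rows of Q).
row_u = sum(P[u, k] * Q[k] for k in range(Q.shape[0]))
assert np.allclose(row_u, (P @ Q)[u])
```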

6 / 44 Application 1: document analysis

• R is a collection of n text documents

• Each row $[r_{u,1}, \cdots, r_{u,d}]$ of R corresponds to a document represented as a bag of words

• $r_{u,i}$ is the number of occurrences of word i in document u

• Factors $[q_{k,1}, \ldots, q_{k,d}]$ in Q correspond to “topics”

• $p_{u,k}$ is the weight of topic k in document u (see the code sketch after this list)
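
A hedged sketch of this document-analysis setting using scikit-learn's NMF; the toy corpus and every parameter value below are assumptions made for illustration, not the setup used in the slides.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import NMF

docs = [
    "the court upheld the constitution",
    "the president addressed the government",
    "flowers and leaves of the plant",
    "symptoms of the skin infection",
]

vectorizer = CountVectorizer(stop_words="english")
R = vectorizer.fit_transform(docs)            # n documents x d words (counts)

model = NMF(n_components=2, init="nndsvda", random_state=0)
P = model.fit_transform(R)                    # document-topic weights (n x m)
Q = model.components_                         # topic-word factors   (m x d)

words = vectorizer.get_feature_names_out()
for k, topic in enumerate(Q):
    top = topic.argsort()[::-1][:4]
    print(f"topic {k}:", [words[i] for i in top])  # most heavily weighted words per topic
```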

7 / 44 [Embedded figure from Lee & Seung (1999), Nature: NMF discovers semantic features (“topics”) in encyclopedia articles; each feature is displayed as its eight highest-frequency words (e.g. court / government / council, flowers / leaves / plant), and the entry on the “Constitution of the United States” is reconstructed as an additive combination of a few features.]

Using n = 30,991 articles from the Grolier encyclopedia, vocabulary size d = 15,276 and number of topics m = 200

[Lee & Seung, 99]

8 / 44 Application 2: image processing

• R is a collection of n images or image patches

• Each row $[r_{u,1}, \cdots, r_{u,d}]$ of R corresponds to an image or image patch

• $r_{u,i}$ is the intensity of pixel i in image u

• Factors $[q_{k,1}, \ldots, q_{k,d}]$ in Q correspond to image “parts”

• $p_{u,k}$ is the weight of part k in image u ⇒ can be used as a high-level feature descriptor (see the code sketch after this list)
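
A hedged sketch of the image setting, using scikit-learn's bundled 8x8 digit images as stand-in non-negative image data; the component count and other values are illustrative assumptions, not figures from the slides.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import NMF

digits = load_digits()
R = digits.data                      # n images x d pixels, non-negative intensities

model = NMF(n_components=16, init="nndsvda", max_iter=400, random_state=0)
P = model.fit_transform(R)           # per-image part activations (high-level features)
Q = model.components_                # 16 "parts"; each row reshapes to an 8 x 8 image

print(P.shape, Q.shape)              # (1797, 16) (16, 64)
```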

9 / 44 Application 2: image processing

[Figure 1 of Lee & Seung (1999), Nature 401: NMF learns a parts-based representation of faces, whereas vector quantization (VQ) and principal components analysis (PCA) learn holistic representations; a face is approximated by an additive combination of sparse, part-like basis images.]

Using n = 2,429 face images, d = 19 × 19 pixels and m = 49 basis images [Lee & Seung, 99]

10 / 44

Application 3: recommendation systems

• R is a partially observed rating matrix

Example of rating matrix R [Louppe, 2010]:

           Avatar   Escape From Alcatraz   K-Pax   Shawshank Redemption   Usual Suspects
  Alice      1               2               ?              5                    4
  Bob        3               ?               2              5                    3
  Clint      ?               ?               ?              ?                    2
  Dave       5               ?               4              4                    5
  Ethan      4               ?               1              1                    ?

11 / 44 Application 3: recommendation systems

• Users and movies are projected in a common m-dimensional latent space [Louppe, 2010]

[Figure 2.4 of Louppe (2010): users (Alice, Bob, Clint, Ed) and movies are placed in a two-dimensional latent space whose axes range from female-oriented to male-oriented and from drama to comedy]

12 / 44

Application 3: recommendation systems

• Inner product in this space can be used to predict missing values

r_{u,i} ≈ [p_{u,1}, ..., p_{u,m}] · [q_{1,i}, ..., q_{m,i}]ᵀ = p_u · q_i
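A minimal NumPy sketch of this prediction (the toy sizes, index values and variable names below are illustrative, not from the slides): a missing rating is estimated as the dot product between the user's latent vector p_u and the item's latent vector q_i.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, m = 5, 4, 2        # toy dimensions
P = rng.random((n_users, m))         # one latent vector per user (rows of P)
Q = rng.random((m, n_items))         # one latent vector per item (columns of Q)

u, i = 2, 3                          # predict the rating of user 2 on item 3
predicted_rating = P[u] @ Q[:, i]    # p_u . q_i
print(predicted_rating)
```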

13 / 44 Optimization algorithms

14 / 44 Formulating an optimization problem

How do we find P ∈ R_+^{n×m} and Q ∈ R_+^{m×d} such that R ≈ PQ ?

15 / 44 Formulating an optimization problem

Using Euclidean distance:

minimize_{P≥0, Q≥0} F(P, Q) = (1/2) ‖R − PQ‖² + (λ/2) (‖P‖² + ‖Q‖²)

where the first term is the error term and the second is the regularization term

Non-convex in P and Q jointly, but convex in P or Q separately ⇒ we can alternate between updating P and Q
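For concreteness, a small NumPy sketch of evaluating this objective (the function and variable names are mine); an alternating scheme then repeatedly decreases F with respect to P for fixed Q, and then with respect to Q for fixed P, using any of the solvers on the following slides.

```python
import numpy as np

def euclidean_objective(R, P, Q, lam=0.1):
    """F(P, Q) = 1/2 ||R - PQ||^2 + lam/2 (||P||^2 + ||Q||^2)."""
    residual = R - P @ Q
    error = 0.5 * np.sum(residual ** 2)
    reg = 0.5 * lam * (np.sum(P ** 2) + np.sum(Q ** 2))
    return error + reg
```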

16 / 44 Formulating an optimization problem

Using generalized KL divergence, a.k.a. I-divergence:

minimize_{P≥0, Q≥0} F(P, Q) = D_I(R‖PQ) + (λ/2) (‖P‖² + ‖Q‖²)

where the first term is the error term and the second is the regularization term

where D_I(A‖B) = Σ_{u,i} [ A_{u,i} log(A_{u,i} / B_{u,i}) − A_{u,i} + B_{u,i} ]

When λ = 0, equivalent to the MLE solution assuming r_{u,i} ∼ Poisson((PQ)_{u,i}) [Févotte, 2009]
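A direct NumPy translation of D_I(A‖B) (a sketch; it uses the usual convention 0 log 0 = 0 for zero entries of A):

```python
import numpy as np

def i_divergence(A, B):
    """Generalized KL (I-)divergence between non-negative matrices A and B."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    mask = A > 0
    log_term = np.zeros_like(A)
    log_term[mask] = A[mask] * np.log(A[mask] / B[mask])   # A log(A/B), only where A > 0
    return np.sum(log_term - A + B)
```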

17 / 44 Two kinds of sparsity

• Sparsity of non-zero entries

    R = [ 1 0 3 ]
        [ 0 2 0 ]
        [ 0 0 1 ]

• Sparsity of observed values

    R = [ 1 ? 3 ]
        [ ? 2 ? ]
        [ ? ? 1 ]

• These two settings require different algorithm design and implementation
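The two notions of sparsity suggest different data structures. A minimal sketch of one common choice (not the only option): a scipy.sparse matrix when the zeros are genuine observations, versus a list of (row, column, value) triplets when only some entries are observed.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Sparsity of non-zero entries: the zeros are real, observed values.
R_sparse = csr_matrix(np.array([[1, 0, 3],
                                [0, 2, 0],
                                [0, 0, 1]], dtype=float))

# Sparsity of observed values: only the entries in Omega are known;
# store them as (u, i, r) triplets and never materialize the "?" cells.
observed = [(0, 0, 1.0), (0, 2, 3.0),
            (1, 1, 2.0),
            (2, 2, 1.0)]
```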

18 / 44 Multiplicative method

• Euclidean distance, no regularization [Lee & Seung, 2001]

P_{u,k} ← P_{u,k} · (R Qᵀ)_{u,k} / (P Q Qᵀ)_{u,k}        Q_{k,i} ← Q_{k,i} · (Pᵀ R)_{k,i} / (Pᵀ P Q)_{k,i}

• Similar updates exist for the generalized KL divergence case

• Guarantees that the objective is non-increasing... [Lee & Seung, 2001]

• ...but not convergence [Lin, 2007]
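A compact NumPy version of these updates (a sketch; the small constant eps added to the denominators to avoid division by zero is a common implementation detail, not part of the update as stated above):

```python
import numpy as np

def nmf_multiplicative(R, m, n_iter=200, eps=1e-10, seed=0):
    """Multiplicative updates for min ||R - PQ||^2 with P, Q >= 0 [Lee & Seung, 2001]."""
    rng = np.random.default_rng(seed)
    n, d = R.shape
    P = rng.random((n, m))
    Q = rng.random((m, d))
    for _ in range(n_iter):
        P *= (R @ Q.T) / (P @ Q @ Q.T + eps)
        Q *= (P.T @ R) / (P.T @ P @ Q + eps)
    return P, Q
```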

19 / 44 Projected gradient method

• Gradient step followed by a truncation [Lin, 2007]

P ← max(P − η ∇_P F(P, Q), 0)
Q ← max(Q − η ∇_Q F(P, Q), 0)

• η can be fixed to a small constant or adjusted by line search

• Converges to a stationary point of F
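A sketch of one projected gradient step with a fixed step size η on the regularized Euclidean objective (Lin (2007) additionally uses a line search and stopping conditions that are not shown here):

```python
import numpy as np

def projected_gradient_step(R, P, Q, eta=1e-3, lam=0.1):
    """One step of P <- max(P - eta * grad_P F, 0) and Q <- max(Q - eta * grad_Q F, 0)."""
    residual = P @ Q - R                 # gradient of the error term uses PQ - R
    grad_P = residual @ Q.T + lam * P
    grad_Q = P.T @ residual + lam * Q
    P_new = np.maximum(P - eta * grad_P, 0.0)
    Q_new = np.maximum(Q - eta * grad_Q, 0.0)
    return P_new, Q_new
```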

20 / 44 Projected stochastic gradient method

• Objective with missing values:

minimize_{P≥0, Q≥0} F(P, Q) = 1/(2|Ω|) Σ_{(u,i)∈Ω} (r_{u,i} − p_u · q_i)² + (λ/2) (‖P‖² + ‖Q‖²)

where Ω is the set of observed values
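This restricted objective can be evaluated directly from the observed triplets (a sketch; Ω is passed as parallel arrays of row indices, column indices and values):

```python
import numpy as np

def masked_objective(rows, cols, vals, P, Q, lam=0.1):
    """F(P, Q) over the observed set Omega = {(rows[j], cols[j], vals[j])}."""
    preds = np.einsum('jk,kj->j', P[rows], Q[:, cols])   # p_u . q_i for each observation
    error = np.sum((np.asarray(vals) - preds) ** 2) / (2.0 * len(vals))
    reg = 0.5 * lam * (np.sum(P ** 2) + np.sum(Q ** 2))
    return error + reg
```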

21 / 44 Projected stochastic gradient method

• Similar to the projected gradient method, but uses a stochastic approximation of the gradient (a per-rating update is sketched after this list)

p_u ← max(p_u − η ∇_{p_u} F(P, Q), 0)
q_i ← max(q_i − η ∇_{q_i} F(P, Q), 0)

• Slow convergence in terms of number of iterations...

• ...but very low iteration cost ⇒ very fast in practice

• However, quite sensitive to the choice of η ☹
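One observed rating (u, i, r) yields the following projected stochastic gradient step (a sketch of the per-rating update; the λ regularization terms and the 1/|Ω| scaling are omitted here for brevity):

```python
import numpy as np

def projected_sgd_step(P, Q, u, i, r, eta=0.01):
    """Update row P[u] and column Q[:, i] in place from a single rating r = R[u, i]."""
    err = P[u] @ Q[:, i] - r             # prediction error on this entry
    grad_p = err * Q[:, i]               # gradient w.r.t. p_u
    grad_q = err * P[u]                  # gradient w.r.t. q_i (uses the old p_u)
    P[u] = np.maximum(P[u] - eta * grad_p, 0.0)
    Q[:, i] = np.maximum(Q[:, i] - eta * grad_q, 0.0)
```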

22 / 44 Coordinate descent

• Update a single variable at a time [Hsieh & Dhillon, 2011; Yu et al., 2012]

P_{u,k} ← P_{u,k} + argmin_δ F(P + δ E_{u,k}, Q)     or     Q_{k,i} ← Q_{k,i} + argmin_δ F(P, Q + δ E_{k,i})

where Eu,k is a matrix with all elements zero except the (u, k) element which equals one

• Closed-form update in the Euclidean distance case (a sketch follows after this list)

• My personal favorite in the batch setting ☺
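For the unregularized Euclidean loss, the one-variable subproblem is a convex quadratic in δ, so the update is a Newton step clipped at zero. A minimal sketch for a single element P[u, k] (variable names are mine; it assumes Q[k] is not identically zero):

```python
import numpy as np

def cd_update_P(R, P, Q, u, k):
    """Closed-form coordinate update of P[u, k] for min 1/2 ||R - PQ||^2 with P >= 0."""
    residual_row = R[u] - P[u] @ Q       # residual of row u
    grad = -residual_row @ Q[k]          # dF / dP[u, k]
    hess = Q[k] @ Q[k]                   # d^2F / dP[u, k]^2
    P[u, k] = max(P[u, k] - grad / hess, 0.0)
```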

23 / 44 Passive-aggressive algorithms for NMF

24 / 44 Online algorithms

• In real-world applications, missing entries in R may be observed in real time ◦ A user gave a rating to a movie

◦ A user clicked on a link

• Ideally, P and Q should be updated in real time to reflect the knowledge that we gained from the new entry

25 / 44 Online algorithms

1. Initialize P and Q randomly

                       [ q1,1 q1,2 q1,3 ]
                       [ q2,1 q2,2 q2,3 ]
                       [ q3,1 q3,2 q3,3 ]

[ p1,1 p1,2 p1,3 ]     [ ?    ?    ?    ]
[ p2,1 p2,2 p2,3 ]     [ ?    ?    ?    ]
[ p3,1 p3,2 p3,3 ]     [ ?    ?    ?    ]

26 / 44 Online algorithms

2. An element of R is revealed (e.g., a user rated a movie)

                       [ q1,1 q1,2 q1,3 ]
                       [ q2,1 q2,2 q2,3 ]
                       [ q3,1 q3,2 q3,3 ]

[ p1,1 p1,2 p1,3 ]     [ ?    ?    ?    ]
[ p2,1 p2,2 p2,3 ]     [ ?    ?    3    ]
[ p3,1 p3,2 p3,3 ]     [ ?    ?    ?    ]

27 / 44 Online algorithms

3. Update corresponding row of P and column of Q

                       [ q1,1 q1,2 q1,3 ]
                       [ q2,1 q2,2 q2,3 ]
                       [ q3,1 q3,2 q3,3 ]

[ p1,1 p1,2 p1,3 ]     [ ?    ?    ?    ]
[ p2,1 p2,2 p2,3 ]     [ ?    ?    3    ]
[ p3,1 p3,2 p3,3 ]     [ ?    ?    ?    ]

(here row 2 of P and column 3 of Q, since r_{2,3} was revealed)

28 / 44 Large-scale learning using online algorithms

• Online algorithms can also be used in a large-scale batch setting

• Online to batch conversion: make several passes over the dataset • Advantages of online algorithms ◦ Low iteration cost

◦ Low memory footprint

◦ Ease of implementation 29 / 44 Passive-aggressive algorithms for NMF

• Passive-aggressive algorithms [Crammer et al., 2006] are online algorithms for classification and regression

• Very popular in the Natural Language Processing (NLP) community

• We propose passive-aggressive algorithms for NMF

30 / 44 Passive-aggressive algorithms for NMF

• On iteration t, r_{u_t,i_t} is revealed

• We propose to update p_{u_t} and q_{i_t} by

p_{u_t}^{t+1} = argmin_{p ∈ R_+^m} (1/2) ‖p − p_{u_t}^t‖²   s.t.   |p · q_{i_t}^t − r_{u_t,i_t}| = 0

q_{i_t}^{t+1} = argmin_{q ∈ R_+^m} (1/2) ‖q − q_{i_t}^t‖²   s.t.   |p_{u_t}^t · q − r_{u_t,i_t}| = 0

• Conservative (do not change the model too much) and corrective (satisfy the constraint) update

31 / 44 Passive-aggressive algorithms for NMF

Since the two problems have the same form, we can simplify the notation:

w = p or q (variable)
w^{t+1} = p_{u_t}^{t+1} or q_{i_t}^{t+1} (solution)
w^t = p_{u_t}^t or q_{i_t}^t (current iterate)
x_t = q_{i_t}^t or p_{u_t}^t (input)

y_t = r_{u_t,i_t} (target)

32 / 44 Passive-aggressive algorithms for NMF

• Allow the target not to be fit perfectly:

w^{t+1} = argmin_{w ∈ R_+^m} (1/2) ‖w − w^t‖²   s.t.   |w · x_t − y_t| ≤ ε

• If |w^t · x_t − y_t| ≤ ε, the algorithm is “passive”, i.e., w^{t+1} = w^t
• Otherwise, it is “aggressive”: the model is updated

33 / 44 Passive-aggressive algorithms for NMF

• The previous update changes the model as much as needed to satisfy the constraint ⇒ potential overfitting

• Introduce a slack variable to allow some error

w^{t+1}, ξ* = argmin_{w ∈ R_+^m, ξ ∈ R_+} (1/2) ‖w − w^t‖² + C ξ

s.t. |w · x_t − y_t| ≤ ε + ξ

• C > 0 controls the trade-off between being conservative and corrective

34 / 44 Passive-aggressive algorithms for NMF

• The solution is of the form w^{t+1} = max(w^t + (κ − θ) x_t, 0)

where κ and θ are non-negative scalars • In our AISTATS paper, we present three O(m) methods for finding κ and θ [Blondel et al., 2014] ◦ An exact method

◦ A bisection method

◦ An approximate update method
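The exact and bisection updates are described in the paper; purely as an illustration of the passive/aggressive structure, the sketch below applies a standard PA-I ε-insensitive regression step [Crammer et al., 2006] followed by clipping to the non-negative orthant. It has the same form max(w + (κ − θ) x, 0), but it is an approximation, not the exact update of Blondel et al. (2014).

```python
import numpy as np

def pa_nonneg_step(w, x, y, eps=0.1, C=0.1):
    """PA-I-style epsilon-insensitive step, then projection onto w >= 0 (approximate)."""
    pred = w @ x
    loss = max(0.0, abs(pred - y) - eps)
    if loss == 0.0:                      # passive: prediction already within eps of y
        return w
    tau = min(C, loss / (x @ x))         # aggressive: PA-I step size
    return np.maximum(w + np.sign(y - pred) * tau * x, 0.0)
```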

35 / 44 Passive-aggressive algorithms for NMF

Difference between the effect of ε and C
• Increasing ε increases the number of passive updates ⇒ trades some error for faster training

• Reducing C reduces update aggressiveness, since 0 ≤ κ ≤ C and 0 ≤ θ ≤ C ⇒ reduces overfitting

36 / 44 NMF algorithm comparison

Solver                        Iteration cost   Online   Hyper-parameter
Multiplicative                      ☹             ☹            ☺
Projected grad.                     ☹             ☹            ☺
Projected stochastic grad.          ☺             ☺            ☹
Coordinate descent                  ☺             ☹            ☺
Passive-Aggressive                  ☺             ☺            ☺

(☺ = favorable, ☹ = unfavorable.) In the setting with missing values.

37 / 44 Experimental results

• Datasets used

Dataset        Users       Items     Ratings
Movielens10M   69,878      10,677    10,000,054
Netflix        480,189     17,770    100,480,507
Yahoo-Music    1,000,990   624,961   252,800,275

• We split ratings into 4/5 for training and 1/5 for testing

• The task is to predict ratings in the test set

38 / 44 Convergence results

[Figure: absolute error on test data vs. training time (seconds), for C = 10⁻² (left) and C = 10⁻¹ (right); curves compare the exact method, bisection with τ = 10⁻³, 10⁻², 10⁻¹, and the approximate update]

Results w.r.t. test data on the Movielens10M dataset

39 / 44 Comparison with other solvers

Dataset        Passes           NN-PA-I          SGD              CD
Movielens10M   1       Error    23.75 ± 0.05     31.58 ± 1.91     34.59 ± 0.03
                       Time      3.24 ± 0.01      2.68 ± 0.01      3.88 ± 0.01
               3       Error    20.91 ± 0.04     25.27 ± 0.02     21.38 ± 0.05
                       Time     10.28 ± 0.01      8.09 ± 0.08     12.73 ± 0.01
               5       Error    20.61 ± 0.01     24.54 ± 0.02     20.47 ± 0.01
                       Time     17.40 ± 0.06     13.44 ± 0.03     22.57 ± 0.01
Netflix        1       Error    22.32 ± 0.01     27.29 ± 0.81     34.31 ± 0.01
                       Time     34.29 ± 0.10     27.68 ± 0.41     36.58 ± 0.37
               3       Error    20.01 ± 0.01     24.28 ± 0.01     21.60 ± 0.01
                       Time    109.53 ± 2.97     82.98 ± 0.14    153.46 ± 0.72
               5       Error    19.64 ± 0.01     23.70 ± 0.14     19.37 ± 0.01
                       Time    181.43 ± 0.22    133.59 ± 0.60    270.28 ± 0.49
Yahoo-Music    1       Error    50.64 ± 0.33     52.52 ± 0.68     57.08 ± 0.28
                       Time    114.16 ± 0.05     96.89 ± 0.04    170.38 ± 0.06
               3       Error    38.44 ± 0.16     44.63 ± 1.24     45.32 ± 0.23
                       Time    335.13 ± 0.34    291.59 ± 0.24    468.86 ± 0.69
               5       Error    36.26 ± 0.09     41.62 ± 1.15     37.97 ± 0.21
                       Time    576.08 ± 0.73    475.86 ± 2.90    787.57 ± 1.68

40 / 44 Learned topic model

Topic 1: Scream (Comedy, Horror, Thriller); The Fugitive (Thriller); The Blair Witch Project (Horror, Thriller); Deep Cover (Action, Crime, Thriller); The Plague of the Zombies (Horror)

Topic 2: Dumb & Dumber (Comedy); Ace Ventura: Pet Detective (Comedy); Five Corners (Drama); Ace Ventura: When Nature Calls (Comedy); Jump Tomorrow (Comedy, Drama, Romance)

Topic 3: Pocahontas (Animation, Children, Musical, ...); [title illegible] (Adventure, Animation, Children, ...); Merry Christmas Mr. Lawrence (Drama, War); Toy Story (Adventure, Animation, Children, ...); The Sword in the Stone (Animation, Children, Fantasy, ...)

Topic 4: Belle de jour (Drama); Jack the Bear (Comedy, Drama); The Cabinet of Dr. Caligari (Crime, Drama, Fantasy, ...); M*A*S*H (Comedy, Drama, War); Bed of Roses (Drama, Romance)

Topic 5: Four Weddings and a Funeral (Comedy, Romance); The Birdcage (Comedy); Shakespeare in Love (Comedy, Drama, Romance); Henri V (Drama, War); Three Men and a Baby (Comedy)

Topic 6: Terminator 2: Judgment Day (Action, Sci-Fi); Braveheart (Action, Drama, War); Aliens (Action, Horror, Sci-Fi); Mortal Kombat (Action, Adventure, Fantasy); Congo (Action, Adventure, Mystery, ...)

6 out of 20 topics extracted from the Movielens10M dataset

Topic 4 Topic 5 Topic 6 Belle de jour Four Weddings and a Funeral Terminator 2: Judgment Day (Drama) (Comedy, Romance) (Action, Sci-Fi) Jack the Bear The Birdcage Braveheart (Comedy, Drama) (Comedy) (Action, Drama, War) The Cabinet of Dr. Caligari Shakespeare in Love Aliens (Crime, Drama, Fantasy, ...) (Comedy, Drama, Romance) (Action, Horror, Sci-Fi) M*A*S*H Henri V Mortal Kombat (Comedy, Drama, War) (Drama, War) (Action, Adventure, Fantasy) Bed of Roses Three Men and a Baby Congo (Drama, Romance) (Comedy) (Action, Adventure, Mystery, ...) 6 out of 20 topics extracted from the Movielens10M

41 / 44 dataset Conclusion

• NMF is a widely-used method in machine learning and signal processing

• Its main applications are high-level feature extraction, denoising and matrix completion

• We proposed online passive-aggressive algorithms for the setting with missing values

42 / 44 References

Mathieu Blondel, Yotaro Kubo, and Naonori Ueda. Online passive-aggressive algorithms for non-negative matrix factorization and completion. In Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, pages 96–104, 2014.

Koby Crammer, Ofer Dekel, Joseph Keshet, Shai Shalev-Shwartz, and Yoram Singer. Online passive-aggressive algorithms. Journal of Machine Learning Research, 7:551–585, 2006.

Cho-Jui Hsieh and Inderjit S. Dhillon. Fast coordinate descent methods with variable selection for non-negative matrix factorization. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1064–1072, 2011.

Cédric Févotte and A. Taylan Cemgil. Nonnegative matrix factorizations as probabilistic inference in composite models. In Proceedings of EUSIPCO, pages 1913–1917, 2009. 43 / 44 References

Daniel D. Lee and H. Sebastian Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755):788–791, 1999.

Daniel D. Lee and H. Sebastian Seung. Algorithms for non-negative matrix factorization. In Advances in Neural Information Processing Systems 13, pages 556–562, 2001.

Chih-Jen Lin. Projected gradient methods for nonnegative matrix factorization. Neural Computation, 19(10):2756–2779, 2007.

Hsiang-Fu Yu, Cho-Jui Hsieh, Si Si, and Inderjit S. Dhillon. Scalable coordinate descent approaches to parallel matrix factorization for recommender systems. In ICDM, pages 765–774, 2012.

44 / 44