Convex Nonnegative Matrix Factorization with Missing Data

Ronan Hamon, Valentin Emiya, Cédric Févotte
Ronan Hamon, Valentin Emiya (Aix Marseille Univ, CNRS, LIF, Marseille, France), Cédric Févotte (CNRS & IRIT, Toulouse, France)

2016 IEEE International Workshop on Machine Learning for Signal Processing, Sept. 13–16, 2016, Salerno, Italy

This work was supported by the ANR JCJC program MAD (ANR-14-CE27-0002).

ABSTRACT

Convex nonnegative matrix factorization (CNMF) is a variant of nonnegative matrix factorization (NMF) in which the components are a convex combination of atoms of a known dictionary. In this contribution, we propose to extend CNMF to the case where the data matrix and the dictionary have missing entries. After formulating the problem in this missing-data context, we propose a majorization-minimization algorithm to solve the resulting optimization problem. Experimental results with synthetic data and audio spectrograms highlight an improvement of the reconstruction performance with respect to standard NMF. The performance gap is particularly significant when the reconstruction task becomes arduous, e.g. when the ratio of missing data is high, the noise is strong, or the complexity of the data is high.

Index Terms— matrix factorization, nonnegativity, low-rankness, matrix completion, spectrogram inpainting

1. INTRODUCTION

Convex NMF (CNMF) [1] is a special case of nonnegative matrix factorization (NMF) [2], in which the matrix of components is constrained to be a linear combination of atoms of a known dictionary. The term "convex" refers to the constraint on the linear combination: the combination coefficients forming each component are nonnegative and sum to 1. Compared to the fully unsupervised NMF setting, the use of known atoms is a source of supervision that may guide learning based on this additional data: in particular, an interesting case of CNMF consists in auto-encoding the data themselves, by defining the atoms as the data matrix. CNMF has been of interest in a number of contexts, such as clustering, data analysis, face recognition, or music transcription [1, 3]. It is also related to the self-expressive dictionary-based representation proposed in [4].

An issue that has not yet been addressed is the case where the data matrix has missing coefficients. Such an extension of CNMF is worth considering, as it opens the way to data-reconstruction settings with nonnegative low-rank constraints, which cover several relevant applications. One example concerns the field of image or audio inpainting [5, 6, 7, 8], where CNMF may improve current reconstruction techniques. In inpainting of audio spectrograms, for example, setting up the dictionary to be a comprehensive collection of notes from a specific instrument may guide the factorization toward a realistic and meaningful decomposition, increasing the quality of the reconstruction of the missing data. In this contribution, we also consider the case where the dictionary itself may have missing coefficients.

The paper is organized as follows. Section 2 formulates CNMF in the presence of missing entries in the data matrix and in the dictionary. Section 3 describes the proposed majorization-minimization (MM) algorithm. Sections 4 and 5 report experimental results with synthetic data and audio spectrograms.

2. CONVEX NONNEGATIVE MATRIX FACTORIZATION WITH MISSING DATA

2.1. Notations and definitions

For any integer $N$, the integer set $\{1, 2, \dots, N\}$ is denoted by $[N]$. The coefficients of a matrix $\mathbf{A} \in \mathbb{R}^{M \times N}$ are denoted by either $a_{mn}$ or $[\mathbf{A}]_{mn}$. The element-wise matrix product, matrix division and matrix power are denoted by $\mathbf{A}.\mathbf{B}$, $\frac{\mathbf{A}}{\mathbf{B}}$ and $\mathbf{A}^{.\gamma}$, respectively, where $\mathbf{A}$ and $\mathbf{B}$ are matrices with the same dimensions and $\gamma$ is a scalar. $\mathbf{0}$ and $\mathbf{1}$ denote vectors or matrices composed of zeros and ones, respectively, with dimensions that can be deduced from the context. The element-wise negation of a binary matrix $\mathbf{M}$ is denoted by $\bar{\mathbf{M}} \triangleq \mathbf{1} - \mathbf{M}$.
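To make this notation concrete, the element-wise operations above map one-to-one onto NumPy array operations. The following minimal sketch is our own illustration, not from the paper:

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[5.0, 6.0], [7.0, 8.0]])

prod = A * B     # element-wise product A.B
quot = A / B     # element-wise division A/B
powr = A ** 0.5  # element-wise power A^{.gamma}, here with gamma = 0.5

# Element-wise negation of a binary mask M: M_bar = 1 - M
M = np.array([[1.0, 0.0], [0.0, 1.0]])
M_bar = 1.0 - M
```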
2.2. NMF and Convex NMF

NMF consists in approximating a data matrix $\mathbf{V} \in \mathbb{R}_+^{F \times N}$ as the product $\mathbf{WH}$ of two nonnegative matrices $\mathbf{W} \in \mathbb{R}_+^{F \times K}$ and $\mathbf{H} \in \mathbb{R}_+^{K \times N}$. Often, $K < \min(F, N)$, such that $\mathbf{WH}$ is a low-rank approximation of $\mathbf{V}$. Every sample $\mathbf{v}_n$, the $n$-th column of $\mathbf{V}$, is thus decomposed as a linear combination of $K$ elementary components or patterns $\mathbf{w}_1, \dots, \mathbf{w}_K \in \mathbb{R}_+^{F}$, the columns of $\mathbf{W}$. The coefficients of the linear combination are given by the $n$-th column $\mathbf{h}_n$ of $\mathbf{H}$.

In [9] and [10], algorithms have been proposed for the unsupervised estimation of $\mathbf{W}$ and $\mathbf{H}$ from $\mathbf{V}$, by minimization of the cost function $D_\beta(\mathbf{V}|\mathbf{WH}) = \sum_{fn} d_\beta(v_{fn} \,|\, [\mathbf{WH}]_{fn})$, where $d_\beta(x|y)$ is the $\beta$-divergence defined as:

$$d_\beta(x|y) \triangleq \begin{cases} \dfrac{1}{\beta(\beta-1)} \left( x^\beta + (\beta-1)\, y^\beta - \beta x y^{\beta-1} \right) & \text{for } \beta \in \mathbb{R} \setminus \{0, 1\} \\[2mm] x \log \dfrac{x}{y} - x + y & \text{for } \beta = 1 \\[2mm] \dfrac{x}{y} - \log \dfrac{x}{y} - 1 & \text{for } \beta = 0 \end{cases} \tag{1}$$

When ill-defined, we set by convention $d_\beta(0|0) = 0$.
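For reference, a direct NumPy transcription of the $\beta$-divergence (1), summed over all entries, could read as follows. This is our own illustrative sketch: the function name and interface are ours, and $y$ is assumed positive wherever needed.

```python
import numpy as np

def beta_divergence(x, y, beta):
    """Sum of d_beta(x|y) of eq. (1) over all entries.

    x: nonnegative array; y: positive array of the same shape.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if beta == 1:   # generalized Kullback-Leibler divergence
        # Convention d_beta(0|0) = 0: entries with x == 0 contribute y only.
        xlogxy = np.where(x > 0, x * np.log(np.where(x > 0, x, 1.0) / y), 0.0)
        return np.sum(xlogxy - x + y)
    if beta == 0:   # Itakura-Saito divergence
        r = x / y
        return np.sum(r - np.log(r) - 1.0)
    # General case, e.g. beta = 2 gives the squared Euclidean distance / 2.
    return np.sum((x ** beta + (beta - 1.0) * y ** beta
                   - beta * x * y ** (beta - 1.0)) / (beta * (beta - 1.0)))
```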
CNMF is a variant of NMF in which $\mathbf{W} = \mathbf{SL}$: $\mathbf{S} = [\mathbf{s}_1, \dots, \mathbf{s}_P] \in \mathbb{R}_+^{F \times P}$ is a nonnegative matrix of atoms and $\mathbf{L} = [\mathbf{l}_1, \dots, \mathbf{l}_K] \in \mathbb{R}_+^{P \times K}$ is the so-called labeling matrix. Each dictionary element $\mathbf{w}_k$ is thus equal to $\mathbf{S}\mathbf{l}_k$, with usually $P \gg K$, and the data is in the end decomposed as $\mathbf{V} \approx \mathbf{SLH}$. The scale indeterminacy between $\mathbf{L}$ and $\mathbf{H}$ may be lifted by imposing $\|\mathbf{l}_k\|_1 = 1$, in which case $\mathbf{w}_k$ is precisely a convex combination of the elements of $\mathbf{S}$. CNMF can be related to the so-called archetypal analysis [11], which however does not consider any nonnegativity constraint. The use of known examples in $\mathbf{S}$ can then be seen as a source of supervision that guides learning.

A special case of CNMF is obtained by setting $\mathbf{S} = \mathbf{V}$, thus auto-encoding the data as $\mathbf{V} \approx \mathbf{VLH}$. This particular case is studied in depth in [1]. In this paper, we consider the general case for $\mathbf{S}$, with or without missing data.

2.3. Convex NMF with missing data

We assume that some coefficients in $\mathbf{V}$ and $\mathbf{S}$ may be missing. Let $\mathcal{V} \subset [F] \times [N]$ be a set of pairs of indices that locates the observed coefficients in $\mathbf{V}$: $(f, n) \in \mathcal{V}$ iff $v_{fn}$ is known. Similarly, let $\mathcal{S} \subset [F] \times [P]$ be a set of pairs of indices that locates the observed coefficients in $\mathbf{S}$. The use of the sets $\mathcal{V}$ and $\mathcal{S}$ may be reformulated equivalently by defining masking matrices $\mathbf{M}_V \in \{0,1\}^{F \times N}$ and $\mathbf{M}_S \in \{0,1\}^{F \times P}$ from $\mathcal{V}$ and $\mathcal{S}$ as

$$[\mathbf{M}_V]_{fn} \triangleq \begin{cases} 1 & \text{if } (f, n) \in \mathcal{V} \\ 0 & \text{otherwise} \end{cases} \qquad \forall (f, n) \in [F] \times [N] \tag{2}$$

$$[\mathbf{M}_S]_{fp} \triangleq \begin{cases} 1 & \text{if } (f, p) \in \mathcal{S} \\ 0 & \text{otherwise} \end{cases} \qquad \forall (f, p) \in [F] \times [P] \tag{3}$$

A major goal in this paper is to estimate $\mathbf{L}$, $\mathbf{H}$ and the missing entries in $\mathbf{S}$, given the partially observed data matrix $\mathbf{V}$. Denoting by $\mathbf{S}^o$ the set of observed/known dictionary matrix coefficients, our aim is to minimize the objective function

$$C(\mathbf{S}, \mathbf{L}, \mathbf{H}) \triangleq D_\beta(\mathbf{M}_V.\mathbf{V} \mid \mathbf{M}_V.\mathbf{SLH}) \tag{4}$$

3. PROPOSED ALGORITHM

3.1. General description of the algorithm

Algorithm 1 extends the algorithm proposed in [9] for complete CNMF with the $\beta$-divergence to the case of missing entries in $\mathbf{V}$ or $\mathbf{S}$. The algorithm is a block-coordinate descent procedure in which each block is one of the three matrix factors. The update of each block/factor is obtained via majorization-minimization (MM), a classic procedure that consists in iteratively minimizing a tight upper bound (called auxiliary function) of the objective function. In the present setting, the MM procedure leads to multiplicative updates, characteristic of many NMF algorithms, that automatically preserve nonnegativity given positive initialization.

Algorithm 1 CNMF with missing data

Require: $\mathbf{V}$, $\mathbf{S}^o$, $\mathbf{M}_V$, $\mathbf{M}_S$, $\beta$
Initialize $\mathbf{S}$, $\mathbf{L}$, $\mathbf{H}$ with random nonnegative values
loop
Update $\mathbf{S}$:
$$\mathbf{S} \leftarrow \mathbf{M}_S.\mathbf{S}^o + \bar{\mathbf{M}}_S.\mathbf{S}.\left( \frac{\left( \mathbf{M}_V.(\mathbf{SLH})^{.(\beta-2)}.\mathbf{V} \right) (\mathbf{LH})^T}{\left( \mathbf{M}_V.(\mathbf{SLH})^{.(\beta-1)} \right) (\mathbf{LH})^T} \right)^{.\gamma(\beta)} \tag{5}$$
Update $\mathbf{L}$:
$$\mathbf{L} \leftarrow \mathbf{L}.\left( \frac{\mathbf{S}^T \left( \mathbf{M}_V.(\mathbf{SLH})^{.(\beta-2)}.\mathbf{V} \right) \mathbf{H}^T}{\mathbf{S}^T \left( \mathbf{M}_V.(\mathbf{SLH})^{.(\beta-1)} \right) \mathbf{H}^T} \right)^{.\gamma(\beta)} \tag{6}$$
Update $\mathbf{H}$:
$$\mathbf{H} \leftarrow \mathbf{H}.\left( \frac{(\mathbf{SL})^T \left( \mathbf{M}_V.(\mathbf{SLH})^{.(\beta-2)}.\mathbf{V} \right)}{(\mathbf{SL})^T \left( \mathbf{M}_V.(\mathbf{SLH})^{.(\beta-1)} \right)} \right)^{.\gamma(\beta)} \tag{7}$$
Rescale $\mathbf{L}$ and $\mathbf{H}$:
$$\forall k \in [K]: \quad \mathbf{h}_k \leftarrow \|\mathbf{l}_k\|_1 \times \mathbf{h}_k \tag{8}$$
$$\mathbf{l}_k \leftarrow \frac{\mathbf{l}_k}{\|\mathbf{l}_k\|_1} \tag{9}$$
end loop
return $\mathbf{S}$, $\mathbf{L}$, $\mathbf{H}$

3.2. Detailed updates

We consider the optimization of $C(\mathbf{S}, \mathbf{L}, \mathbf{H})$ with respect to each of its three arguments individually, using MM.
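To make Algorithm 1 concrete in executable form, here is a self-contained NumPy sketch of the updates (5)–(9) as we read them; it is an illustration under stated assumptions, not the authors' reference implementation. In particular, the exponent $\gamma(\beta)$ is not defined in this excerpt, so we assume the standard MM exponent for $\beta$-divergence NMF (as in the line of work of [9]): $\gamma(\beta) = 1/(2-\beta)$ for $\beta < 1$, $1$ for $1 \le \beta \le 2$, $1/(\beta-1)$ for $\beta > 2$. The guard constant `EPS` and all names are ours.

```python
import numpy as np

EPS = 1e-12  # numerical guard against division by zero (ours, not in the paper)

def gamma(beta):
    """Assumed MM exponent gamma(beta); not defined in this excerpt."""
    if beta < 1.0:
        return 1.0 / (2.0 - beta)
    if beta <= 2.0:
        return 1.0
    return 1.0 / (beta - 1.0)

def cnmf_missing(V, S_obs, M_V, M_S, beta, K, n_iter=200, seed=0):
    """Sketch of Algorithm 1: CNMF with missing data, updates (5)-(9).

    V     : (F, N) data; entries where M_V == 0 are ignored but must be finite.
    S_obs : (F, P) dictionary; entries where M_S == 0 are missing.
    M_V, M_S : binary observation masks of V and S.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    P = S_obs.shape[1]
    g = gamma(beta)
    # Random positive initialization; observed dictionary entries are imposed.
    S = np.where(M_S == 1, S_obs, rng.random((F, P)) + EPS)
    L = rng.random((P, K)) + EPS
    H = rng.random((K, N)) + EPS

    def masked_terms():
        # M_V.(SLH)^{.(beta-2)}.V  and  M_V.(SLH)^{.(beta-1)}
        V_hat = S @ L @ H
        return M_V * V_hat ** (beta - 2.0) * V, M_V * V_hat ** (beta - 1.0)

    for _ in range(n_iter):
        # Update S, eq. (5): observed entries kept, missing ones updated.
        num, den = masked_terms()
        LH_T = (L @ H).T
        S = M_S * S_obs + (1.0 - M_S) * S * ((num @ LH_T) /
                                             (den @ LH_T + EPS)) ** g
        # Update L, eq. (6).
        num, den = masked_terms()
        L = L * ((S.T @ num @ H.T) / (S.T @ den @ H.T + EPS)) ** g
        # Update H, eq. (7).
        num, den = masked_terms()
        SL_T = (S @ L).T
        H = H * ((SL_T @ num) / (SL_T @ den + EPS)) ** g
        # Rescale, eqs. (8)-(9): each column l_k of L is normalized to sum 1,
        # and the corresponding row h_k of H absorbs the scale.
        norms = L.sum(axis=0) + EPS  # ||l_k||_1, since L is nonnegative
        H = norms[:, None] * H
        L = L / norms[None, :]
    return S, L, H
```

Note that the masked terms are recomputed after each block update, consistently with block-coordinate descent, and that the observed dictionary entries are re-imposed at every $\mathbf{S}$ update, so only the missing entries of $\mathbf{S}$ evolve multiplicatively. For instance, with an audio spectrogram `V` and a dictionary of note spectrograms `S_obs`, `S, L, H = cnmf_missing(V, S_obs, M_V, M_S, beta=1, K=10)` yields reconstructed missing entries as `(S @ L @ H)[M_V == 0]`.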