
Engineering Applications of Artificial Intelligence 20 (2007) 101–110

Initialization enhancer for non-negative matrix factorization

Zhonglong Zheng a,b,*, Jie Yang b, Yitan Zhu c

a Institute of Information Science and Engineering, P.O. Box 143, Zhejiang Normal University, Jinhua 321004, Zhejiang, People's Republic of China
b Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200030, People's Republic of China
c Alexandria Research Institute, Virginia Polytechnic Institute and State University, 206 N. Washington Street, Alexandria, VA 22314, USA

Received 25 March 2005; received in revised form 9 September 2005; accepted 12 March 2006. Available online 27 April 2006.

Abstract

Non-negative matrix factorization (NMF), proposed recently by Lee and Seung, has been applied to many areas such as image classification, image compression, and so on. Based on traditional NMF, researchers have put forward several new algorithms to improve its performance. However, particular emphasis has to be placed on the initialization of NMF because of its local convergence, although this is usually ignored in the literature. In this paper, we explore three initialization methods based on principal component analysis (PCA), fuzzy clustering and Gabor wavelets, chosen either for the reduction of computational complexity or for the preservation of data structure. In addition, the three methods suggest an efficient way of selecting the rank of the NMF factorization in low-dimensional space.
© 2006 Elsevier Ltd. All rights reserved.

Keywords: Non-negative matrix factorization; Principal component analysis; Fuzzy clustering; Gabor wavelet; Dimensionality reduction; Image classification

1. Introduction

In pattern analysis and computer vision, visual recognition of objects is one of the most challenging problems. Approaches to overcoming such problems have focused on several methodologies. Appearance-based representation and recognition is one of the most successful methods still used today. It involves preprocessing of multidimensional signals, such as face images (Turk and Pentland, 1991), character images (Jiangying Zhou and Lopresti, 1997), speech spectrograms (Bell and Sejnowski, 1995) and so on. In fact, the essence of the preprocessing is the so-called dimensionality reduction. Principal component analysis (PCA), based on second-order statistics, is one of the most popular linear dimensionality reduction methods. It is a well-known fact that PCA is optimal in terms of the reconstruction error (MSE) but not for the separation and classification of classes. In addition, the eigenface derived from PCA is holistic, or can be called dense (Buciu and Pitas, 2004).

Recently, an unsupervised approach called non-negative matrix factorization (NMF) has been proposed by Lee and Seung (1999, 2001). NMF differs from PCA and independent component analysis (ICA) (Barlett et al., 1998) in adding non-negativity constraints. When applied to image analysis and representation, the obtained NMF bases are localized features that correspond with intuitive notions of the parts of the images. This is supported by psychological and physiological evidence that perception of the whole is based on parts-based representations (Mel, 1999). Researchers have proposed several different algorithms based on traditional NMF. Guillamet et al. presented a study of a weighted version of the original NMF (WNMF) (Guillamet et al., 2003). Li et al. developed a variant of NMF, named local non-negative matrix factorization (LNMF), by imposing additional constraints (Li et al., 2001). Buciu et al. proposed a supervised technique called discriminant non-negative matrix factorization (DNMF) by taking class information into account (Buciu and Pitas, 2004). Also, the NMF technique has been applied to many areas (Pauca et al., 2004; Guillamet and Vitrià, 2002; Lee and Lee, 2001).

* Corresponding author. Institute of Information Science and Engineering, P.O. Box 143, Zhejiang Normal University, Jinhua 321004, Zhejiang, People's Republic of China. Tel.: +86 579 2282155; fax: +86 579 2298229. E-mail addresses: [email protected], [email protected] (Z. Zheng).


As we know, however, few documents pay much attention to the problem of NMF initialization. In theory, NMF unavoidably converges to local minima, so the NMF bases will differ under different initializations. The experiments cannot be repeated by others if they do not know the initialization conditions. It is our desire to give a proper initialization for a given task in order to improve the performance of NMF, either for the consideration of computational complexity or for the preservation of data structure. To this end, we explore three techniques, namely PCA, fuzzy clustering and Gabor wavelets, to realize NMF initialization. Furthermore, the three initialization methods provide helpful indications for determining the rank of NMF in low-dimensional space. In the following sections, we give a brief introduction to the NMF algorithm. Then we present the three initialization methods more formally, together with some illustrative simulations on a face image dataset. Finally, we discuss how to determine the rank of NMF in low-dimensional space through the three initialization techniques.

2. Non-negative matrix factorization algorithm

Let a set of N training images be given as an m × N matrix X, with each column consisting of the m non-negative pixel values of an image. NMF seeks to find non-negative factors W and H such that

X \approx \tilde{X} = WH, \quad (1)

where W \in R^{m \times r} and H \in R^{r \times N}. The r columns of W are thought of as basis images derived from the NMF algorithm (Lee and Seung, 1999). H is the encoding variable matrix. Dimensionality reduction is achieved when r < N.

NMF imposes non-negativity constraints on both W and H. As a consequence, the entries of W and H are all non-negative and hence only non-subtractive basis combinations are allowed. This is believed to be compatible with the intuitive notion of combining parts to form a whole. PCA, by contrast, requires the columns of W to be orthonormal and the rows of H to be mutually orthogonal. It lacks this intuitive meaning in that the linear combination of the bases in W usually involves complex cancellations between positive and negative numbers because of the arbitrary sign of the entries in W and H.

To find an approximate factorization, a cost function that quantifies the quality of the approximation has to be defined. Such a cost function can be constructed using some measure of distance between X and \tilde{X}. One natural way is simply to use the Euclidean distance between the two matrices:

D_E(X, \tilde{X}) = \|X - \tilde{X}\|^2 = \sum_{ij} (X_{ij} - \tilde{X}_{ij})^2. \quad (2)

This is lower bounded by 0, and equals 0 if and only if X = \tilde{X}. To minimize \|X - \tilde{X}\|^2 with respect to W and H, subject to the constraints W, H \ge 0, the multiplicative update rule is

H_{au} \leftarrow H_{au} \frac{(W^T X)_{au}}{(W^T W H)_{au}}, \qquad W_{ia} \leftarrow W_{ia} \frac{(X H^T)_{ia}}{(W H H^T)_{ia}}. \quad (3)

The Euclidean distance is invariant under these updates if and only if W and H are at a stationary point of the distance.

Another useful measure is the divergence of X and \tilde{X}, defined as

D_D(X \| \tilde{X}) = \sum_{ij} \left( X_{ij} \log \frac{X_{ij}}{\tilde{X}_{ij}} - X_{ij} + \tilde{X}_{ij} \right). \quad (4)

It is also lower bounded by 0, and vanishes if and only if X = \tilde{X}. We call it "divergence" instead of "distance" because it is not symmetric in X and \tilde{X}. To minimize D(X \| \tilde{X}) with respect to W and H, subject to the constraints W, H \ge 0, the corresponding multiplicative update rule is

H_{au} \leftarrow H_{au} \frac{\sum_i W_{ia} X_{iu}/(WH)_{iu}}{\sum_k W_{ka}}, \qquad W_{ia} \leftarrow W_{ia} \frac{\sum_u H_{au} X_{iu}/(WH)_{iu}}{\sum_v H_{av}}. \quad (5)

The divergence is also invariant under these updates if and only if W and H are at a stationary point of the divergence (Lee and Seung, 2001).

Li et al. proposed a refinement of NMF by a slight variation of the divergence algorithm, called LNMF (Li et al., 2001). The objective function of LNMF is

D_{LD} = \sum_{i=1}^{m} \sum_{j=1}^{N} \left( X_{ij} \log \frac{X_{ij}}{\tilde{X}_{ij}} - X_{ij} + \tilde{X}_{ij} \right) + \alpha \sum_{ij} U_{ij} - \beta \sum_{i} V_{ii}, \quad (6)

where \alpha, \beta are positive constants, U = W^T W and V = H H^T. The LNMF update rule for W is nearly identical to that in Eq. (5). For H, the update rule uses a square root to satisfy the additional constraints:

H_{au} \leftarrow \sqrt{ H_{au} \sum_i X_{iu} W_{ia} / (WH)_{iu} }. \quad (7)

Having introduced the framework of NMF, we test it on our private SJTU face database to extract basis images and give an illustrative impression of NMF. We select 400 images, each cropped to 64 × 64 pixels. The results are shown in Fig. 1. The NMF basis images look somewhat like the eigenfaces derived from PCA. However, the basis faces in Fig. 1(b) approximate the entire collection of faces using only positive combinations, instead of the arbitrary combinations in PCA.
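As a concrete reading of the update rules, here is a minimal NumPy sketch of Eqs. (3) and (5); the random default initialization, the epsilon guards in the denominators and the iteration count are our assumptions, not part of the original formulation:

```python
import numpy as np

def nmf_euclidean(X, r, n_iter=1000, W0=None, H0=None, eps=1e-9, seed=0):
    """Multiplicative updates of Eq. (3), minimizing ||X - WH||^2."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], r)) if W0 is None else W0.copy()
    H = rng.random((r, X.shape[1])) if H0 is None else H0.copy()
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # H update, Eq. (3)
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # W update, Eq. (3)
    return W, H

def nmf_divergence(X, r, n_iter=1000, eps=1e-9, seed=0):
    """Multiplicative updates of Eq. (5), minimizing the divergence D(X||WH)."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], r))
    H = rng.random((r, X.shape[1]))
    for _ in range(n_iter):
        # numerators are W^T (X/WH) and (X/WH) H^T; denominators are the
        # column sums of W and the row sums of H, as in Eq. (5)
        H *= (W.T @ (X / (W @ H + eps))) / (W.sum(axis=0)[:, None] + eps)
        W *= ((X / (W @ H + eps)) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H
```

Because the updates are multiplicative, any entry initialized to zero stays zero; this is one reason why the choice of the initial W and H matters in what follows.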

Fig. 1. Basis images: (a) part of original face images; (b) NMF basis images; (c) LNMF basis images; and (d) PCA basis images.

The basis faces shown in Fig. 1(c) are derived from LNMF and show more partial patches representing images. But we have to note that slow convergence is one of the weaknesses of LNMF.

3. Initializing NMF with different techniques

As pointed out in Section 2, NMF seeks to find basis images W and an encoding matrix H under non-negativity constraints. In theory, the solution to Eq. (1) is not unique under such constraints. Lee and Seung also provided proofs that both multiplicative update rules converge to local minima of the respective objective functions (Lee and Seung, 2001). As a result, different initialization conditions may lead to different outcomes. Generally speaking, there are distinct requirements in data analysis or representation for different applications such as visual coding, clustering and so on. It is our desire to design specific initializations for NMF in order to improve its performance, either for the consideration of computational complexity or for the preservation of data structure.

3.1. PCA-based initialization

PCA is one of the best known unsupervised feature extraction methods because of its conceptual simplicity and the existence of efficient algorithms that implement it (Mark and Erkki, 2004). Particularly in the face representation task, faces can be economically represented in the eigenface coordinate space, and approximately reconstructed using just a small collection of eigenfaces and their corresponding projections ('coefficients'). This representation is optimal in the sense of mean-square error.

Given the m × N matrix X as in NMF, the average vector c can be computed as

c = \frac{1}{N} \sum_{i=1}^{N} X_i. \quad (8)

Subtracting the mean vector gives the centered matrix

\bar{X} = (X_1 - c, \ldots, X_N - c). \quad (9)

Using SVD to compute the eigenvectors of \bar{X}^T \bar{X} instead of \bar{X} \bar{X}^T (because usually m \gg N), up to N eigenvalues and eigenvectors are obtained. The eigenvector matrix W is constructed by keeping only the r eigenvectors (corresponding to the r largest eigenvalues \lambda_i) as column vectors, and H is an r × N matrix containing the encoding coefficients. The reconstruction error is

E_{mse} = E \| \bar{X} - WH \|^2. \quad (10)

The criterion for selecting r is usually

\frac{\sum_{i=1}^{r} \lambda_i}{\sum_{i=1}^{N} \lambda_i} \ge a. \quad (11)

Comparing Eq. (10) with Eq. (2), we find they are very similar to each other. If a is large enough (of course not more than 1), PCA finds the directions in input space where most of the energy of the input lies. Motivated by this, we take the PCA-based method into account to initialize NMF. However, the signs of the variables in W and H of Eq. (10) are arbitrary, whereas in the NMF procedure all variables in W and H are non-negative. In order to utilize the W and H obtained from PCA, we introduce a nonlinear operator p such that

p(W) = p(W_+) = \max(0, W_{ij}),
p(H) = p(H_+) = \max(0, H_{ij}). \quad (12)

In this case, the reconstruction error is

E_{mse+} = E \| \bar{X} - p(W) p(H) \|^2. \quad (13)

Taking the mean matrix \bar{c}, composed of the mean vector c, into consideration, we rewrite Eq. (13) as

E_{mse+} = E \| X - (p(W) p(H) + \bar{c}) \|^2, \quad (14)

where X is the original data matrix. Obviously, Eq. (14) is equivalent to Eq. (2). If we initialize both factors W and H of NMF with p(W) and p(H), then minimizing Eq. (13) is equivalent to minimizing Eq. (2).

The reason why we adopt the PCA-based method to initialize W and H lies in two aspects. One is that much information is contained in the PCA subspace, so that it gives a proper approximation to the original data; such an approximation cannot be guaranteed by random initialization. The other is that the operator p enhances the sparseness of W, one possible goal of NMF; random initialization, on the contrary, is somewhat counterproductive in this respect. The visual images of random(W) and p(W) are shown in Fig. 2. Comparing the images shown in Fig. 2(b) with those in Fig. 1(d), we can conclude that the operator p makes the basis images sparser. The sparsity of the basis images slightly reduces the computational complexity: once an element is zero, it remains zero under the multiplicative update rules, so an element initialized to zero need not be computed.

To initialize NMF using the PCA-based method, we have to choose the rank of the low-dimensional space. As stated in Eq. (11), we choose r = 25 when setting a = 0.9. The ordering of columns in W is unimportant as long as the ordering of rows in H is kept consistent. Fig. 3 illustrates the improvement of the basis images after different numbers of iterations using Euclidean-distance NMF. For visualization, each basis image is normalized to have elements with a maximum value of 1.

We find from Fig. 3 that, with random initialization, NMF starts to model faces from scratch. As the algorithm progresses, the random initialization begins to lose its original noise and more closely resembles the expected faces. The PCA-based initialization, however, already works in the 'face space' with predefined parts. After 300 iterations, the basis images obtained by random initialization are distinctly different from the initial images shown in Fig. 2(a). With PCA-based initialization, the algorithm has a better start, emphasizing facial components such as eyes, nose and eyebrows from the beginning. Comparing the basis images after 1000 iterations with the initial images shown in Fig. 2(b), the reader will find they are very similar to each other.

We further analyze the basis images W based on the results shown in Fig. 3. One natural analysis, illustrated in Fig. 6, is the minimization error of the resulting approximation. Here, the error is measured by the Frobenius norm:

\| X - WH \|_F = \sqrt{ \sum_{i=1}^{m} \sum_{j=1}^{N} e_{ij}^2 }. \quad (15)

Fig. 6 shows that the approximation error from both PCA-based and random initialization decreases quickly before 200 iterations; after 300 iterations, the error decreases very slowly. Before 300 iterations, the PCA-based method achieves a better approximation than the random one because of its better starting point. In the long term, random initialization will achieve a lower error than the PCA-based method. This is because PCA-based initialization enforces certain restrictions on W and H, and the NMF algorithm using the Euclidean distance measure cannot pull it out of a local minimum. However, it is doubtful whether this long-term difference is meaningful, given the very low rate of decrease.

The other straightforward analysis concerns the sparsity and orthogonality of the normalized basis images W. For the sparsity of w_i, we define the measure

\| w \|_p = \Big( \sum_{i=1}^{m} w_i^p \Big)^{1/p}, \quad 0 < p \le 1. \quad (16)

In fact, this is a quasi-norm for 0 < p < 1 since it does not satisfy the triangle inequality, but only the weaker conditions \| x + y \|_p \le 2^{1/p'} ( \| x \|_p + \| y \|_p ), where p' = p/(p-1) is the conjugate exponent of p, and \| x + y \|_p^p \le \| x \|_p^p + \| y \|_p^p (Donoho, 2001).

For the orthogonality of w_i, we define the measure as the sum of the (r − 1) inner products

O(w_i) = \sum_{j=1, j \ne i}^{r} w_i^T w_j, \quad i = 1, \ldots, r. \quad (17)

If W is an orthogonal matrix, O(w_i) equals 0; otherwise, O(w_i) increases as the orthogonality decreases. The sparsity and orthogonality results of random and PCA-based initialization after 1000 iterations are shown in Fig. 4. Here we see that the basis images in W of PCA-based initialization after 1000 iterations are more sparse and orthogonal than those obtained by the random method.
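A sketch of this PCA-based initialization, with the rank chosen by the energy criterion of Eq. (11); the SVD route stands in for the eigendecomposition described above, the function name is ours, and `nmf_euclidean` refers to the sketch in Section 2:

```python
import numpy as np

def pca_init(X, a=0.9):
    """PCA-based NMF initialization (Eqs. (8)-(12)).

    Returns the rectified factors p(W), p(H) and the rank r chosen so that
    the kept eigenvalues carry at least a fraction `a` of the energy.
    """
    c = X.mean(axis=1, keepdims=True)          # mean vector, Eq. (8)
    Xc = X - c                                 # centered matrix, Eq. (9)
    # Left singular vectors of Xc are the eigenvectors of Xc Xc^T,
    # and the eigenvalues are the squared singular values.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    energy = np.cumsum(s**2) / np.sum(s**2)
    r = int(np.searchsorted(energy, a) + 1)    # smallest r satisfying Eq. (11)
    W = U[:, :r]                               # eigenvector matrix
    H = W.T @ Xc                               # encoding coefficients
    # Nonlinear operator p of Eq. (12): clip negative entries to zero.
    return np.maximum(W, 0), np.maximum(H, 0), r

# Usage: W0, H0, r = pca_init(X, a=0.9)
#        W, H = nmf_euclidean(X, r, W0=W0, H0=H0)
```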

Fig. 2. Visual basis image of (a) random(W) and (b) p(W).
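To make the sparsity and orthogonality comparison of Fig. 4 concrete, here is a direct transcription of the measures in Eqs. (16) and (17); the choice p = 0.5 is our assumption, since the paper does not state which p was used:

```python
import numpy as np

def sparsity(w, p=0.5):
    """Quasi-norm ||w||_p of Eq. (16), 0 < p <= 1; smaller means sparser."""
    return np.sum(np.abs(w) ** p) ** (1.0 / p)

def orthogonality(W):
    """O(w_i) of Eq. (17): sum of inner products of column i with the others.

    Returns a length-r vector that is all zeros iff the columns of W
    are mutually orthogonal.
    """
    G = W.T @ W                        # Gram matrix of the basis columns
    return G.sum(axis=1) - np.diag(G)  # drop the j = i term
```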

Fig. 3. (a)–(j) Improvement of basis images for the random and PCA-based methods at different iterations. The top two rows are the results of random initialization; the other two are the results of PCA-based initialization.

3.2. Clustering-based initialization

In conventional clustering, each sample in the original dataset X is either assigned or not assigned to a group. The fuzzy clustering method, which applies the concept of fuzzy sets to clustering, gives membership to groups at each point of the dataset through a membership function (Mihaela et al., 2002). It has proved to be well suited to imprecise data. The advantage of fuzzy clustering is its adaptation to noisy data and to classes that are not well separated. The fuzzy c-means (FCM) clustering algorithm used in this paper is described as follows:

Fig. 4. Orthogonality and sparsity analysis: (a) sparsity of NMF W; (b) orthogonality of NMF W; (c) sparsity of PCA-based W; and (d) orthogonality of PCA-based W.

Fig. 5. (a)–(f) FCM-based initialization after different iterations.

(a) Fix the number of clusters c, the fuzzifier q, the termination tolerance \epsilon and an inner-product norm metric for R^n.
(b) Initialize the cluster center matrix W.
(c) Compute the dissimilarity matrix D_{i,j}, the dissimilarity between w_i and x_j.
(d) Compute the membership matrix

u_{i,j} = \frac{1}{\sum_{s=1}^{c} (D_{i,j}/D_{s,j})^{1/(q-1)}}.

(e) Update the cluster centers

w_i = \frac{\sum_{j=1}^{N} u_{i,j}^q x_j}{\sum_{j=1}^{N} u_{i,j}^q}.

(f) Calculate the objective function

J = \sum_{i=1}^{c} \sum_{j=1}^{N} u_{i,j}^q D_{i,j};

if the tolerance \epsilon is satisfied, the algorithm terminates; otherwise, go to (c).

Before carrying out FCM, the column vectors x_i of the original data matrix X are normalized to have elements with a maximum value of 1. Using the FCM algorithm, we obtain the centroid matrix W and the membership u of each sample in the dataset. The centroid matrix W is shown in Fig. 5(a) for c = 25. To some extent, each basis image contained in W is a combination of one or more faces in the original dataset. In Fig. 5(a), some images look more blurry than others because the corresponding cluster contains more faces. Motivated by this, consider the following least-squares problem:

X \approx WL, \quad L = (W^T W)^{-1} W^T X, \quad (18)

where the existence of (W^T W)^{-1} depends on the linear independence of the columns of W (Dhillon and Modha, 2001). Unfortunately, the elements of the least-squares solution L are unconstrained in sign. Therefore, we seek another matrix H whose elements are non-negative to approximate X together with W in order to initialize NMF. Assuming that each cluster derived from FCM is compact, a natural way is to replace every x_j in X with its corresponding cluster centroid w_i. This leads to

H_{ij} = \begin{cases} 1, & i = \arg\max_i (U_{ij}), \ j = 1, \ldots, N, \\ 0, & \text{otherwise}. \end{cases} \quad (19)

In other words, the values of the elements in H are automatically determined by the membership matrix: the maximum value in column U_{:j} corresponds to a 1 in H_{:j}. The approximation is then X \approx WH. Let us compare the objective function Eq. (2) with the cost function of the FCM algorithm under this constraint. In this case, we rewrite Eq. (2) as

D_E(X, \tilde{X}) = \sum_{j=1}^{N} \| X_j - W_{c_j} \|^2, \quad (20)

where W_{c_j} denotes the centroid to which the sample X_j belongs. The FCM cost function is transformed to

J = \sum_{j=1}^{N} D_j. \quad (21)

It is interesting that they are equivalent to each other. Therefore, it is appropriate to use the FCM-based method to initialize NMF. The iteration results under this initialization are shown in Fig. 5(b)–(f).

The experimental results illustrated in Fig. 5 show that the basis images do not change dramatically even after 500 iterations, unlike those of random initialization shown in Fig. 3. The reason is that the clusters derived from FCM on this small dataset are compact enough that the approximation error is very small from the head start, so the factorization easily converges to certain local minima, and the multiplicative update rule cannot change the basis much or pull it out, because of the already good approximation. The approximation error is shown in Fig. 6.

Fig. 6. Approximation error given different initializations.

However, as the basis images in W are the centroids of different clusters, they are combinations of one or more faces. So the bases are denser than a single face image and the orthogonality decreases. The sparsity and orthogonality of W, measured using Eqs. (16) and (17), are shown in Fig. 7. Compared with the values shown in Fig. 4(c) and (d), the values in Fig. 7 are much larger.
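A compact sketch of steps (a)–(f) and the H construction of Eq. (19), assuming the usual squared-Euclidean dissimilarity and a fuzzifier q = 2 (the paper fixes neither); initializing the centers from random samples is also our assumption:

```python
import numpy as np

def fcm_init(X, c=25, q=2.0, tol=1e-5, max_iter=300, seed=0):
    """FCM-based NMF initialization: steps (a)-(f), then H via Eq. (19).

    X : (m, N) data matrix with max-normalized columns.
    Returns the centroid matrix W (m, c) and the binary encoding H (c, N).
    """
    m, N = X.shape
    rng = np.random.default_rng(seed)
    W = X[:, rng.choice(N, size=c, replace=False)].copy()  # (b) init centers
    J_old = np.inf
    for _ in range(max_iter):
        # (c) squared-Euclidean dissimilarity D[i, j] between w_i and x_j
        D = ((W[:, :, None] - X[:, None, :]) ** 2).sum(axis=0) + 1e-12
        # (d) membership: u_ij = D_ij^(-1/(q-1)) / sum_s D_sj^(-1/(q-1))
        inv = D ** (-1.0 / (q - 1.0))
        U = inv / inv.sum(axis=0, keepdims=True)
        # (e) centers as membership-weighted means
        Uq = U ** q
        W = (X @ Uq.T) / Uq.sum(axis=1)
        # (f) objective; stop when the decrease falls below the tolerance
        J = (Uq * D).sum()
        if abs(J_old - J) < tol:
            break
        J_old = J
    # Eq. (19): one-hot H, a 1 where cluster i has the largest membership
    H = np.zeros((c, N))
    H[U.argmax(axis=0), np.arange(N)] = 1.0
    return W, H
```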

3.3. Gabor-based initialization

The Gabor wavelet is a powerful tool in image feature extraction, and Lee described its good performance under rotation, scaling and translation (Daugman, 1988; Liu and Wechsler, 2001). The Gabor wavelet (Lee, 1996) can be defined as

\psi_{\mu,\nu}(z) = \frac{\| k_{\mu,\nu} \|^2}{\sigma^2} e^{-\| k_{\mu,\nu} \|^2 \| z \|^2 / (2\sigma^2)} \left[ e^{i k_{\mu,\nu} \cdot z} - e^{-\sigma^2/2} \right], \quad (22)

where \mu and \nu define the orientation and scale of the Gabor kernels, and the wave vector k_{\mu,\nu} is defined as

k_{\mu,\nu} = k_\nu e^{i \phi_\mu}. \quad (23)

Fig. 7. Sparsity and orthogonality of basis images (FCM).

Fig. 8. The mean-face (a) and the basis images using Gabor-based initialization after different iterations (b)–(g).

Fig. 9. Sparsity and orthogonality of basis images (Gabor).

Once these parameters are given, the Gabor feature representation of an image I(z) is

G_{\mu,\nu}(z) = I(z) * \psi_{\mu,\nu}(z), \quad (24)

where z = (x, y) and * is the convolution operator. When convolved with Gabor wavelets, the image is transformed into a set of image features at certain scales and orientations; conversely, the image can be reconstructed from these features. Motivated by this standpoint, we try to initialize NMF with the Gabor-based method. In order to utilize Gabor wavelets to obtain the initial W for NMF, we simply compute the mean face of the dataset, that is, the center of all the images in the dataset. The mean-face image is shown in Fig. 8(a). Here, we choose five scales and five orientations for the sake of comparison, although four scales and eight orientations are selected in most cases. The initial basis and iterative results are shown in Fig. 8.

The approximation error using Gabor-based initialization at different iterations is illustrated in Fig. 6. The error curve is similar to that of random initialization. In this case, all elements of the right factor H are initialized to 1. Imprecisely speaking, the samples in the original dataset are approximated by the computed mean face under such initialization. It is obvious that such an approximation is inferior to both the clustering-based method and the PCA-based method. However, the features contained in the basis images are different from those of the clustering-based and PCA-based methods because of the scale and orientation properties. Although the basis images after 1000 iterations also look whole-face alike, there may exist some (unexplored) constraints that strengthen the features in terms of scale or orientation according to different applications. The sparsity and orthogonality of the bases are similar (Fig. 9).
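A sketch of how such an initial W could be assembled from Eqs. (22)–(24): build one Gabor kernel per (scale, orientation) pair, convolve the mean face with each, and stack the (non-negative) magnitude responses as columns. The k_max, f and \sigma values are conventional choices from the Gabor-face literature, not stated in the paper, and the magnitude rectification is likewise our assumption:

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_kernel(mu, nu, size=64, sigma=2 * np.pi, k_max=np.pi / 2, f=np.sqrt(2)):
    """Gabor wavelet of Eq. (22) with wave vector k_{mu,nu} of Eq. (23)."""
    k = k_max / f**nu                   # scale: k_nu = k_max / f^nu
    phi = np.pi * mu / 5.0              # orientation phi_mu (5 orientations)
    kv = k * np.exp(1j * phi)           # k_{mu,nu} = k_nu e^{i phi_mu}
    half = size // 2
    y, x = np.mgrid[-half:half, -half:half]
    z2 = x**2 + y**2
    carrier = np.exp(1j * (kv.real * x + kv.imag * y)) - np.exp(-sigma**2 / 2)
    return (k**2 / sigma**2) * np.exp(-k**2 * z2 / (2 * sigma**2)) * carrier

def gabor_init(mean_face, n_scales=5, n_orients=5):
    """Initial W: magnitudes of the Eq. (24) responses of the mean face,
    one column per (scale, orientation) pair; H is initialized to all ones."""
    cols = []
    for nu in range(n_scales):
        for mu in range(n_orients):
            g = fftconvolve(mean_face, gabor_kernel(mu, nu), mode='same')
            cols.append(np.abs(g).ravel())   # non-negative by construction
    return np.stack(cols, axis=1)            # (m, 25) for the 5 x 5 setting
```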

4. Discussion of choosing rank r

We have discussed the PCA-based, clustering-based and Gabor-based initialization methods in detail in the previous sections. For the sake of comparison, all experiments choose rank r = 25 for the low-dimensional space. Is there any other choice of the rank r? The answer is definitely "yes". Generally speaking, the rank r of the low-dimensional space is difficult to determine in a dimensionality reduction task, and the criterion for choosing it varies with the application. For example, online image retrieval demands that r be as small as possible to meet real-time requirements (Gunther and Beretta, 2001). For image compression, however, r should not be too small, in order to achieve a better approximation of the originals. In addition, common users who are not familiar with the specific application or the properties of the original dataset may be puzzled when selecting a particular rank r for the low-dimensional space.

The initialization methods discussed above give some guidance in determining the rank r, compared with random initialization. The PCA-based method is mainly concerned with energy conservation, as stated in Eq. (11); this standpoint is very popular in PCA-related applications and still gives direction when used to initialize NMF. For the clustering-based method, the number of clusters indicates the rank r under this type of initialization; it is believed that other clustering algorithms can also be adopted to initialize NMF (Wild, 2003). The Gabor-based method draws on its multi-scale and multi-orientation property; when other constraints are imposed on scale and orientation, it may show its potential benefit for specific applications.

If we compute NMF for different values of r in order to, for example, gain a better approximation, it can be computationally very expensive: for different datasets, the rank r has to be re-computed in the absence of a rank selection strategy. This is the reason we use these methods to initialize NMF. Furthermore, we find that the error decreases dramatically before 200 iterations. Both the PCA-based and clustering-based methods achieve better approximation accuracy than random initialization from the head start, though they are inferior in the long term. In our view, the approximation accuracy of the PCA-based and clustering-based methods is satisfactory in most cases because the residual rate of error decrease is small enough to be ignored in the long run. Even for the Gabor-based method, the iteration result is nearly the same as that of random initialization. One can expect to benefit from the proposed methods in choosing the rank r.

5. Conclusion

It has been proven that non-negative matrix factorization is a useful tool in the analysis of a diverse range of data. One of its most salient properties is that the resulting bases are often intuitive and easy to interpret because they are non-negative and sparse. Researchers often resort to random initialization when utilizing NMF. In fact, random initialization may make experiments unrepeatable because of the local-minima property (Baldi and Hornik, 1995). In this paper, we discuss the NMF algorithm with emphasis on the initialization problem. The proposed methods are based on PCA, clustering and Gabor techniques. Compared with random initialization, the three methods demonstrate their superiority either in fast convergence in the early phase or in structure preservation. We believe that the performance of NMF will be enhanced with the proposed initialization methods in data analysis.

Acknowledgement

The authors wish to acknowledge that this work is supported by the Fundamental Project of Shanghai under grant number 03DZ14015. The authors would like to thank Dr. Yonggang Wang for useful discussion.

References

Baldi, P., Hornik, K., 1995. Learning in linear neural networks: a survey. IEEE Transactions on Neural Networks 6, 837–857.
Barlett, M.S., Lades, H.M., Sejnowski, T.J., 1998. Independent component representations for face recognition. Proceedings of SPIE 3299, 528–539.
Bell, A.J., Sejnowski, T.J., 1995. An information-maximization approach to blind separation and blind deconvolution. Neural Computation 7, 1129–1159.
Buciu, I., Pitas, I., 2004. Application of non-negative and local non-negative matrix factorization to facial expression recognition. In: Proceedings of the International Conference on Pattern Recognition (ICPR), vol. 1, pp. 288–291.
Daugman, J.G., 1988. Complete discrete 2-D Gabor transforms by neural networks for image analysis and compression. IEEE Transactions on Acoustics, Speech, and Signal Processing 36, 1169–1179.
Dhillon, I.S., Modha, D.S., 2001. Concept decompositions for large sparse text data using clustering. Machine Learning 42, 143–175.
Donoho, D.L., 2001. Sparse components analysis and optimal atomic decomposition. Constructive Approximation 17, 353–382.
Guillamet, D., Vitrià, J., 2002. Non-negative matrix factorization for face recognition. In: Proceedings of the 5th Catalonian Conference on AI: Topics in Artificial Intelligence, pp. 336–344.
Guillamet, D., Vitrià, J., Schiele, B., 2003. Introducing a weighted non-negative matrix factorization for image classification. Pattern Recognition Letters 24, 2447–2454.
Gunther, N.J., Beretta, G., 2001. A benchmark for image retrieval using distributed systems over the Internet: BIRDS-I. HP Labs Technical Report HPL-2000-162, Palo Alto.
Jiangying Zhou, Lopresti, D.P., 1997. Extracting text from WWW images. In: Proceedings of ICDAR, pp. 248–252.
Lee, S., 1996. Image representation using 2D Gabor wavelets. IEEE Transactions on PAMI 18, 959–971.
Lee, J.S., Lee, D.D., 2001. Non-negative matrix factorization of dynamic images in nuclear medicine. In: IEEE Medical Imaging Conference, pp. 2027–2030.
Lee, D.D., Seung, H.S., 1999. Learning the parts of objects by non-negative matrix factorization. Nature 401, 788–791.

Lee, D.D., Seung, H.S., 2001. Algorithms for non-negative matrix factorization. Advances in Neural Information Processing Systems 13, 556–562.
Li, S.Z., Hou, X.W., Zhang, H.J., Cheng, Q.S., 2001. Learning spatially localized, parts-based representation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 207–212.
Liu, C., Wechsler, H., 2001. A Gabor feature classifier for face recognition. In: Proceedings of the 8th IEEE International Conference on Computer Vision, Vancouver, BC, Canada, pp. 270–279.
Mark, D.P., Erkki, O., 2004. A 'nonnegative PCA' algorithm for independent component analysis. IEEE Transactions on Neural Networks 15, 66–76.
Mel, B.W., 1999. Think positive to find parts. Nature 401, 759–760.
Mihaela, G., Constantine, K., Apostolos, G., Ioannis, P., 2002. A new fuzzy c-means based segmentation strategy: application to lip region identification. In: IEEE-TTTC International Conference on Automation, Quality and Testing, Robotics, May 23–25, Cluj-Napoca, Romania, pp. 1–6.
Pauca, V., Shahnaz, F., Berry, M.W., Plemmons, R., 2004. Text mining using non-negative matrix factorization. In: Proceedings of the 4th SIAM International Conference on Data Mining.
Turk, M., Pentland, A., 1991. Eigenfaces for recognition. Journal of Cognitive Neuroscience 3, 71–86.
Wild, S.M., 2003. Seeding non-negative matrix factorizations with the spherical k-means clustering. M.S. Thesis, University of Colorado.

Zhonglong Zheng received the B.A. degree in EE from the University of Petroleum, China, in 1999 and the doctoral degree from the Institute of Image Processing & Pattern Recognition of Shanghai Jiao Tong University, China. He is now an associate professor in the Institute of Information Science and Engineering, Zhejiang Normal University. His current research interests include image analysis, video tracking, and face detection and recognition.

Jie Yang received a Ph.D. degree in computer science from the University of Hamburg, Germany, in 1994. Dr. Yang is now Professor and Vice-Director of the Institute of Image Processing & Pattern Recognition of Shanghai Jiao Tong University, China. His research interests include artificial intelligence, image processing, computer vision, data mining and pattern recognition. He is the author of more than 80 scientific publications and one book.

Yitan Zhu received his B.A. and M.S. degrees in pattern recognition and intelligent systems from Shanghai Jiao Tong University, China. He is now a Ph.D. student at the Alexandria Research Institute, Virginia Polytechnic Institute and State University.