Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17)

Latent Discriminant Analysis with Representative Feature Discovery

Gang Chen
Department of Computer Science and Engineering, SUNY at Buffalo, Buffalo, NY 14260
[email protected]

Copyright © 2017, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

Linear Discriminant Analysis (LDA) is a well-known method for dimension reduction and classification with a focus on discriminative feature selection. However, how to discover discriminative as well as representative features in the LDA model has not been explored. In this paper, we propose a latent Fisher discriminant model with representative feature discovery in a semi-supervised manner. Specifically, our model leverages the advantages of both discriminative and generative models by generalizing LDA with a data-driven prior over the latent variables. Thus, our method combines multi-class learning, latent variables and dimension reduction in a unified Bayesian framework. We test our method on the MUSK and Corel datasets and yield competitive results compared to baselines. We also demonstrate its capacity on the challenging TRECVID MED11 dataset for semantic keyframe extraction and conduct a human-factors ranking-based experimental evaluation, which clearly demonstrates that our proposed method consistently extracts more semantically meaningful keyframes than challenging baselines.

Introduction

Linear Discriminant Analysis (LDA) (Fisher 1936) is a powerful tool for dimensionality reduction and classification that projects high dimensional data into a low-dimensional space where the data achieves maximum class separability (Duda, Hart, and Stork 2000; Fukunaga 1990; Wu, Wipf, and Yun 2015). The basic idea in classical LDA, known as Fisher Linear Discriminant Analysis (FDA), is to obtain the projection matrix by minimizing the within-class distance and maximizing the between-class distance simultaneously to yield the maximum class discrimination. It has been proved analytically that the optimal transformation is readily computed by solving a generalized eigenvalue problem (Fukunaga 1990). In order to deal with multi-class scenarios (Rao 1948; Duda, Hart, and Stork 2000), LDA can be easily extended from the binary case to multi-class problems, which finds a subspace with d − 1 dimensions, where d is the number of classes in the training dataset. Because of its effectiveness and computational efficiency, it has been applied successfully in many applications, such as face recognition (Belhumeur, Hepanha, and Kriegman 1997) and microarray gene expression data analysis. Moreover, LDA was shown to compare favorably with other supervised dimensionality reduction methods through extensive experiments (Sugiyama et al. 2010).

However, as a supervised approach, LDA expects manually annotated training sets, e.g., instance/label pairs. As we know, it is labor-intensive and time-consuming to label each instance, which is prohibitive especially for large scale data. Correspondingly, it is reasonable to extend supervised LDA into a semi-supervised method, and many approaches (Joachims 1999; Cai, He, and Han 2007; Zhang and Yeung 2008; Sugiyama et al. 2010) have been proposed. Unfortunately, most of these methods still need instance/label pairs, i.e., training a classifier with a few labeled instances. In practice, many real applications only come with bag-level labels (Andrews, Tsochantaridis, and Hofmann 2002), such as molecule activity (Maron and Lozano-Pérez 1998), image classification (Maron and Ratan 1998) and event detection (Perera et al. 2011). Recently, MI-SVM or latent SVM (Andrews, Tsochantaridis, and Hofmann 2002; Felzenszwalb et al. 2010) has been widely used for classification tasks, such as object detection. In a sense, MI-SVM can learn discriminative features effectively under the maximum margin framework. However, MI-SVM does not consider the data distribution while inferring latent variables. On the contrary, LDA leverages the data distribution by computing between-class and within-class covariances to learn a discriminant projection. Thus it is possible to incorporate a data-driven prior into LDA to discover both representative and discriminative features.

In this paper, we propose a Latent Fisher Discriminant Analysis model (LFDA in short) with representative feature discovery. On the one hand, we hope our model can handle semi-supervised learning problems. On the other hand, we can generalize discriminative FDA to select representative features as well. More specifically, our method unifies the discriminative nature of FDA with a data-driven Gaussian mixture prior over the training data under a Bayesian framework. By combining these two terms into one model, we infer the latent variables and learn the projection matrix in an alternating manner until convergence. To further leverage the compactness of each component of the Gaussian mixture model, we assume that all instances in each component have the same label.

Thus, our model relaxes the instance-level inference into component-level inference by maximizing a joint likelihood, which can capture representative features effectively. To sum up, our method combines multi-class learning, latent variables and dimension reduction in a unified Bayesian framework. We demonstrate the advantages of our model on the MUSK and Corel datasets for classification problems, and on the TRECVID MED11 dataset for semantic keyframe extraction on five video events (Perera et al. 2011).

Related Work

LDA has been a popular method for dimension reduction and classification. It searches for a projection matrix that simultaneously maximizes the between-class dissimilarity and minimizes the within-class dissimilarity to increase class separability, typically for classification applications. Many methods (Belhumeur, Hepanha, and Kriegman 1997; Chen et al. 2000; Baudat and Anouar 2000; Merchante, Grandvalet, and Govaert 2012) have been proposed to either leverage or extend LDA because of its effectiveness and computational efficiency. Belhumeur et al. proposed PCA+LDA (Belhumeur, Hepanha, and Kriegman 1997) for face recognition. Recently, sparsity-induced LDA has also been proposed (Merchante, Grandvalet, and Govaert 2012; Wu, Wipf, and Yun 2015).

However, many real-world applications only provide labels at the bag level, such as object detection (Felzenszwalb et al. 2010) and image classification (Maron and Ratan 1998). In the last decades, semi-supervised methods have been proposed to utilize unlabeled data to aid classification or regression tasks under situations with limited labeled data, such as Transductive SVM (TSVM) (Vapnik 1998; Joachims 1999) and Co-Training (Blum and Mitchell 1998). One main trend is to extend LDA to handle semi-supervised problems (Cai, He, and Han 2007; Zhang and Yeung 2008; Sugiyama et al. 2010) in a transductive manner, which attempts to utilize unlabeled data to aid classification or regression tasks under situations with limited labeled data. For example, Semi-supervised Discriminant Analysis (Cai, He, and Han 2007) was proposed, which makes use of both labeled and unlabeled samples. Sugiyama et al. proposed a semi-supervised dimensionality reduction method (Sugiyama et al. 2010), which can preserve the global structure of unlabeled samples in addition to separating labeled samples in different classes from each other. Chen and Corso proposed a semi-supervised approach (Chen and Corso 2012) to learn discriminative codebooks and classify instances with nearest neighbor voting. Recently, latent SVM or MI-SVM has attracted great attention for semi-supervised problems, such as multiple instance learning and object detection (Zhang and Yeung 2008; Felzenszwalb et al. 2010). It basically infers the latent variables by maximizing a posterior probability and shows great improvement on object detection (Felzenszwalb et al. 2010). Another trend prefers to extend LDA to unsupervised scenarios. For example, Ding and Li proposed to combine LDA and K-means clustering into the LDA-Km algorithm (Ding and Li 2007) for adaptive dimension reduction. In this algorithm, K-means clustering is used to generate class labels and LDA is utilized to perform subspace selection. However, directly casting LDA as a semi-supervised method to handle bag-level labels is still a challenge for multi-class problems.

Latent Fisher discriminant analysis

Let X = {x_1, x_2, ..., x_n} represent n bags, with the corresponding labels L = {l_1, l_2, ..., l_n} as the training data. Each bag x_i ∈ X can have one or multiple instances (Andrews, Tsochantaridis, and Hofmann 2002), and its label l_i is categorical and assumes values in a finite set, e.g. {1, 2, ..., C}. Let x_i ∈ R^{d×n_i}, which means it contains n_i instances (or frames), denoted as x_i = {x_i^1, x_i^2, ..., x_i^{n_i}}, with its j-th instance x_i^j ∈ R^d (whose label is not given). Given the data X and its corresponding instance-level labels Z(X), LDA searches for a discriminative feature transformation f : X → Y that maximizes the ratio of between-class variance to within-class variance, where y ∈ R^{d̄} and d̄ ≤ d. In general, d̄ is decided by C, namely d̄ = C − 1. However, we do not know the instance-level labels Z(X) for the data X; in our case, only the bag-level labels L are available. Thus, we assume that any instance x ∈ X has a corresponding label z(x), which can be inferred from the training pairs (X, L).

Latent Fisher discriminant analysis generalizes LDA with latent variables. Suppose the projection matrix is P and y = f(x) = Px; then our latent Fisher LDA proposes to minimize the following ratio:

P* = argmin_{P,Z} J(P, Z)
   = argmin_{P,Z} trace( (P^T Σ_w(X, L, Z) P) / (P^T Σ_b(X, L, Z) P) + β P^T P )    (1)

where Z are the latent variables for the data X, and β is a weighting parameter for the regularization term. The variable z ∈ Z(X) defines the possible latent values for a sample x ∈ X; in our case, z ∈ {1, 2, ..., C}. Σ_b(X, L, Z) is the between-class scatter matrix and Σ_w(X, L, Z) is the within-class scatter matrix, defined respectively as follows:

Σ_w(X, L, Z) = Σ_{k=1}^{C} Σ_{x ∈ X | δ(z(x)=k)} (x − x̄_k)(x − x̄_k)^T    (2)

Σ_b(X, L, Z) = Σ_{k=1}^{C} m_k (x̄_k − x̄)(x̄_k − x̄)^T    (3)

where δ(z(x)=k) is the indicator function, m_k is the number of training samples for class k, x̄_k = (1/m_k) Σ_{x ∈ X | δ(z(x)=k)} x is the mean of the k-th class, and x̄ is the total mean vector given by x̄ = (1 / Σ_{k=1}^{C} m_k) Σ_{k=1}^{C} m_k x̄_k. Note that LDA (Fisher 1936) depends on a categorical variable z (i.e. the class label) for each instance x to compute Σ_b and Σ_w. If z is given for every x, we can use LDA to find the discriminative transform P, using the eigenvectors of Σ_w^{-1} Σ_b to capture both the compactness of each class and the separation between classes.
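To make Eqs. (1)–(3) concrete, the following NumPy sketch computes the two scatter matrices and evaluates the objective for a fixed label assignment. It is an illustration under our own reading of Eq. (1), where the matrix "fraction" is taken as trace((P^T Σ_b P)^{-1}(P^T Σ_w P)); the function and variable names are ours, not the paper's.

```python
import numpy as np

def scatter_matrices(X, z, C):
    """Within- and between-class scatter of Eqs. (2)-(3).
    X: (n, d) instance matrix, z: (n,) inferred labels in {0, ..., C-1}."""
    d = X.shape[1]
    x_bar = X.mean(axis=0)                          # total mean vector
    Sw, Sb = np.zeros((d, d)), np.zeros((d, d))
    for k in range(C):
        Xk = X[z == k]
        x_bar_k = Xk.mean(axis=0)                   # mean of class k
        D = Xk - x_bar_k
        Sw += D.T @ D                               # Eq. (2)
        Sb += len(Xk) * np.outer(x_bar_k - x_bar, x_bar_k - x_bar)  # Eq. (3)
    return Sw, Sb

def objective(P, X, z, C, beta):
    """Objective J(P, Z) of Eq. (1) for a fixed assignment z; P is (d, d_bar)."""
    Sw, Sb = scatter_matrices(X, z, C)
    num, den = P.T @ Sw @ P, P.T @ Sb @ P
    return np.trace(np.linalg.solve(den, num)) + beta * np.trace(P.T @ P)
```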

However, in our case, given the training data X we only know the bag-level labels L, not the instance-level labels Z. To minimize J(P, Z), we need to infer z(x) for any given x. This is a chicken-and-egg problem, and it can be solved by alternating algorithms, such as EM (Dempster, Laird, and Rubin 1977). In other words, we solve for P in Eq. (1) with z fixed, and vice versa, in an alternating strategy.

Updating z

Suppose we have found the projection matrix P. Then, we can project X into its corresponding subspace Y = PX, where Y = {y_1, y_2, ..., y_n} is the one-to-one mapping of X = {x_1, x_2, ..., x_n}. Instead of inferring latent variables at the instance level as latent SVM does, we propose to infer the latent variable z at the cluster level in the projected space Y. That means all elements in the same cluster have the same label. Such an assumption is reasonable because elements in the same cluster are close to each other. On the other hand, cluster-level inference can speed up the learning process. We extend the mixture discriminant analysis model in (Hastie and Tibshirani 1996) by incorporating latent variables over all instances for a given class. Thus, we assume each class i is a mixture of K Gaussian components,

p(t | λ_i) = Σ_{j=1}^{K} π_i^j g(y | μ_i^j, Σ_i^j)    (4)

where t, y ∈ R^{d̄} (i.e. a vector or feature); π_i = {π_i^j}_{j=1}^{K} are the mixture weights, and g(y | μ_i^j, Σ_i^j) is the j-th component Gaussian with mean μ_i^j and covariance Σ_i^j. λ_i = {π_i, μ_i, Σ_i} are the parameters for class i, with μ_i = {μ_i^j}_{j=1}^{K} and Σ_i = {Σ_i^j}_{j=1}^{K}, which we need to estimate. Given the training data (Y, L), we can compute the data-driven prior using the Gaussian mixture model (GMM) in Eq. (4). In our case, for each class i ∈ {1, 2, ..., C}, we collect all instances whose y belongs to class i, and then we use the mixture of Gaussians in Eq. (4) for clustering analysis. As a result, we can estimate λ_i and obtain its K components S_i = {S_i^1, S_i^2, ..., S_i^K} with the EM algorithm, as well as its data-driven prior distribution π_i = {π_i^1, π_i^2, ..., π_i^K}. With a little abuse of symbols, we use μ_i^j to indicate the j-th cluster of the i-th class, and each cluster S_i^j has n_i^j elements, which satisfy Σ_{j=1}^{K} n_i^j = n_i.

Note that the basic idea here is to select the most discriminative or representative cluster in each class, and then infer its latent variables by maximizing a posterior probability or a joint likelihood. Suppose we have the discriminative weights (or posterior probabilities) w_i = {w_i^1, w_i^2, ..., w_i^K} corresponding to the K components in each class, which are determined by LFDA and will be discussed in the next part. We maximize one of the following two objectives:

Maximizing a posterior probability:

μ_i^j = argmax_{μ_i^j ∈ μ_i, j ∈ [1,K]} w_i = argmax_{μ_i^j ∈ μ_i, j ∈ [1,K]} p(z_i | μ_i, P)    (5a)

Maximizing the joint probability with the prior:

μ_i^j = argmax_{μ_i^j ∈ μ_i, j ∈ [1,K]} (π_i ◦ w_i)    (5b)

where z_i is the latent label assignment, π_i is the data-driven prior for each class i, w_i is the posterior (or weight) determined by kNN voting (see further below) in the subspace, and ◦ is the pointwise product (Hadamard product). We treat Eq. (5a) as the latent Fisher discriminant analysis model (LFDA), because it takes the same strategy as the latent SVM model (Andrews, Tsochantaridis, and Hofmann 2002; Felzenszwalb et al. 2010). As for Eq. (5b), we extend LFDA by combining both representative and discriminative factors together, and find the cluster S_i^j in class i by maximizing Eq. (5b). In a sense, Eq. (5b) considers the prior distribution from the training dataset; thus we treat it as the joint latent Fisher discriminant analysis model (JLFDA), or LFDA with prior. In a nutshell, we propose a way to formulate discriminative and generative models together under a Bayesian framework. We comparatively analyze both of these models in the experiments.

Consequently, if we select the cluster S_i^j with mean μ_i^j which maximizes the above objective (for example, Eq. (5a)) for class i, we can relabel all samples y ∈ S_i^j as positive for class i and the rest as negative, subject to y = Px and y ∈ S_i^j. Further, we can decide the label of x under the assumption z(y) = z(x). Thus, for any x ∈ X, we decide its label z(x) as follows:

z(x) = l_i, if y = Px ∈ S_i^j, where S_i^j maximizes Eq. (5)    (6)

Then, we update the training data X+ = {x_1^+, x_2^+, ..., x_n^+} with labels L+ = {z_1^+, z_2^+, ..., z_n^+}, where x_i^+ = S_i^j for class i with n_i^j elements, and its labels z_i^+ = {z_i^1, z_i^2, ..., z_i^{n_i^j}} are on the instance level. In a sense, we are doing a kind of selection, which finds the most discriminative and representative component for each class from the mixture of Gaussians. Obviously, x_i^+ ⊆ x_i and X+ is a subset of X. The difference between X and X+ lies in that every element x_i^+ ∈ X+ has a label z(x_i^+) decided by Eq. (6), while x_i ⊂ X only has a bag-level label.
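As a concrete illustration of Eqs. (4)–(6), the sketch below fits a per-class GMM in the projected space with scikit-learn and picks the component that maximizes either w (Eq. (5a)) or π ◦ w (Eq. (5b)); the instances of the winning component are then relabeled for that class. The function names are ours, and w_c stands for the kNN-voting weights defined in Eq. (8) below.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def select_component(Y_c, w_c, K, use_prior=True, seed=0):
    """Partition one class's projected instances Y_c (n_c, d_bar) into K
    Gaussian components (Eq. 4) and select the representative one."""
    gmm = GaussianMixture(n_components=K, covariance_type='full',
                          random_state=seed).fit(Y_c)
    pi = gmm.weights_                         # data-driven prior over components
    score = pi * w_c if use_prior else w_c    # Eq. (5b) vs. Eq. (5a)
    j = int(np.argmax(score))                 # winning component S_i^j
    members = gmm.predict(Y_c) == j           # instances relabeled for class i (Eq. 6)
    return j, gmm.means_[j], members
```

The relabeled members across all classes form the subset X+ used in the projection update that follows.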
Updating projection

After we decide the labels for the new training data X+, we can use LDA to minimize J(P, Z). Note that Eq. (1) is invariant to the scale of the vector P. Hence, we can always choose P such that the denominator is simply P^T Σ_b P = 1. For this reason, we can transform the problem of minimizing Eq. (1) into the following constrained optimization problem (Duda, Hart, and Stork 2000; Fukunaga 1990; Ye 2007):

P* = argmin_P trace( P^T Σ_w(X+, L+, Z) P + β P^T P )
     s.t. P^T Σ_b(X+, L+, Z) P = 1    (7)

where 1 is the identity matrix in R^{d̄×d̄}. The optimal multi-class LDA solution consists of the top eigenvectors of (Σ_w(X+, L+, Z) + β)^† Σ_b(X+, L+, Z) corresponding to the nonzero eigenvalues (Fukunaga 1990); here (Σ_w(X+, L+, Z) + β)^† denotes the pseudo-inverse of (Σ_w(X+, L+, Z) + β). After we calculate P, we can project X+ into the subspace Y+. Note that in the subspace Y+, any y+ ∈ Y+ preserves the same label as in the original space. In other words, Y+ has corresponding labels L+ at the element level, namely z(y+) = z(x+).
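A minimal sketch of this projection update, reading the "+β" in the text as adding β·I before taking the pseudo-inverse (our interpretation), might look like the following; Sw and Sb are the scatter matrices of Eqs. (2)–(3) computed on the relabeled subset (X+, L+).

```python
import numpy as np

def update_projection(Sw, Sb, n_components, beta):
    """Top eigenvectors of (Sw + beta*I)^+ Sb, solving Eq. (7).
    Returns P of shape (d, n_components); project with Y = X @ P."""
    d = Sw.shape[0]
    M = np.linalg.pinv(Sw + beta * np.eye(d)) @ Sb   # pseudo-inverse, as in the text
    vals, vecs = np.linalg.eig(M)
    order = np.argsort(-vals.real)                   # largest (nonzero) eigenvalues first
    return vecs[:, order[:n_components]].real
```

For C classes, n_components is typically C − 1, matching d̄ = C − 1 above.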

In general, multi-class LDA (Ye 2007) uses kNN to classify new input data. We compute w_i using the following kNN strategy: for each sample x ∈ X, we get y = Px by projecting it into the subspace Y. Then, for y ∈ Y, we choose its N nearest neighbors from Y+, and use their labels as votes for each cluster S_i^j in each class i. Then, we compute the following posterior probability:

w_i^j = p(z_i = 1 | μ_i^j) ∝ p(μ_i^j | z_i = 1) p(z_i = 1) = p(z_i = 1) · p(μ_i^j, z_i = 1) / Σ_{i=1}^{C} p(μ_i^j, z_i = 1)    (8)

It counts all y ∈ Y that fall into the N nearest neighbors of μ_i^j with label z_i. Note that kNN is widely used as the classifier in the subspace after the LDA transformation. Thus, Eq. (8) considers all training data to vote for the weight of each discriminative cluster S_i^j in every class i. Hence, we can find the most discriminative cluster S_i^j, s.t. w_i^j > w_i^k, k ∈ [1, K], k ≠ j.
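The paper does not spell the vote counting out in code, and Eq. (8) admits more than one reading; under one plausible reading (count, among the N nearest labeled neighbors of each cluster center, how many carry the class label), a sketch could be:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_component_weights(Y_plus, z_plus, centers, N):
    """Approximate the kNN vote of Eq. (8) -- one possible reading, not
    necessarily the authors' exact procedure.

    Y_plus  : (m, d_bar) relabeled instances in the subspace
    z_plus  : (m,) their class labels (NumPy array)
    centers : dict {(i, j): mu_ij} with the GMM cluster centers per class/component
    Returns w[(i, j)] = fraction of mu_ij's N neighbors that carry label i."""
    nn = NearestNeighbors(n_neighbors=N).fit(Y_plus)
    w = {}
    for (i, j), mu in centers.items():
        _, idx = nn.kneighbors(mu.reshape(1, -1))
        w[(i, j)] = float(np.mean(z_plus[idx[0]] == i))   # neighbors voting for class i
    return w
```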
Algorithm

The basic idea behind our method is that we use the Gaussian mixture model to partition all instances in each class, which gives the data-driven prior for each component. Then, we infer the latent variable at the component level, which can further be used to label each instance in that cluster. Finally, given the labeled data, we use LDA to find a discriminative transformation. This process repeats until convergence. We summarize the above discussion in pseudo code in Algorithm 1. Put simply, we update P and z in an alternating manner, and accept the new projection matrix P obtained with LDA on the relabeled instances. Such an algorithm converges very fast (e.g. 10 iterations). After we have learned the matrix P and {λ_i}_{i=1}^{C} by maximizing Eq. (5), we can use them to select representative and discriminative features (i.e. frames from video datasets) by nearest neighbor searching.

Algorithm 1
Input: training data X and its labels L at video level, β, K, N, T and ε.
Output: P, {λ_i}_{i=1}^{C}
1: Initialize P and w_i;
2: for Iter = 1 to T do
3:   for i = 1; i <= C; i++ do
4:     Project all the training data X into the subspace Y using Y = PX;
5:     For each class, use the Gaussian mixture model to partition its elements in the subspace, and learn λ_i = {π_i, μ_i, Σ_i};
6:     Maximize Eq. (5) to find S_i^j with center μ_i^j;
7:     Relabel all elements in the cluster S_i^j as positive for class i according to Eq. (6);
8:   end for
9:   Update z and construct the new subset X+ and its labels L+ for all C classes;
10:  Do Fisher linear discriminant analysis and update P;
11:  if P converges (changes less than ε), then break;
12:  Compute the N nearest neighbors for each training sample, and calculate the discriminative weight w_i for each class i according to Eq. (8).
13: end for
14: Return P and the cluster centers {λ_i}_{i=1}^{C} learned respectively for all C classes;

Convergence analysis

Our method updates the latent variable z and then P in an alternating manner. Such a strategy can be attributed to the hard-assignment version of the EM algorithm. Recall the EM approach:

P* = argmax_P p(X, L | P) = argmax_P Σ_{i=1}^{C} p(X, L, z_i | P) = argmax_P Σ_{i=1}^{C} p(X, L | z_i) p(z_i | P)    (9)

Then the likelihood can be optimized using iterative application of the EM algorithm.

Theorem 1. Assume the latent variable z is inferred for each instance in X; then maximizing the above function is equivalent to maximizing the following auxiliary function

P = argmax_P Σ_{i=1}^{C} p(z_i | X, L, P̄) ln( p(X, L | z_i) p(z_i | P) )    (10)

where P̄ is the old P. This can be shown using Jensen's inequality.

Lemma 1.1. The hard assignment of the latent variable z by maximizing Eq. (5) is a special case of the EM algorithm.

Proof.

Σ_{i=1}^{C} p(z_i | X, L, P̄) ln( p(X, L | z_i) p(z_i | P) )
  = Σ_{i=1}^{C} p(z_i | X, L, P̄) ln( p(z_i | X, L, P) ) + Σ_{i=1}^{C} p(z_i | X, L, P̄) ln( p(X, L | P) )
  = Σ_{i=1}^{C} p(z_i | X, L, P̄) ln( p(z_i | X, L, P) ) + ln( p(X, L | P) )    (11)

Given P, we can infer the latent variable z. Because of the hard assignment of z, the first term on the right hand side of Eq. (11) assigns z_i to one class. Note that p(z | X, L, P) ln(p(z | X, L, P)) is a monotonically increasing function, which means that by maximizing the posterior likelihood p(z | X, L, P) for each instance, we maximize Eq. (11) for the hard-assignment case in Eq. (5). Thus, the updating strategy in our algorithm is a special case of the EM algorithm, and it converges to a local maximum as the EM algorithm does. Note that in our implementation, we infer the latent variable at the cluster level. In other words, to maximize p(z_i | X, L, P̄), we can include another latent variable π_i^j, j ∈ [1, K]. Specifically, we need to maximize Σ_{j=1}^{K} p(z_i, π_i^j | X, L, P̄), for which we can recursively determine the latent variable π_i using an embedded EM algorithm. Hence, our algorithm uses two steps of EM, and it converges to a local maximum. Refer to (Wu 1983) for more details about the convergence of the EM algorithm.
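For completeness, the Jensen's-inequality step behind Theorem 1 is the standard EM lower bound; in the notation above (this derivation is ours, not quoted from the paper) it reads

ln p(X, L | P) = ln Σ_{i=1}^{C} q(z_i) · p(X, L, z_i | P) / q(z_i)
             ≥ Σ_{i=1}^{C} q(z_i) ln( p(X, L | z_i) p(z_i | P) / q(z_i) ),

with equality when q(z_i) = p(z_i | X, L, P̄). Dropping the P-independent entropy term −Σ_i q(z_i) ln q(z_i) leaves exactly the auxiliary objective of Eq. (10), so any P that increases Eq. (10) cannot decrease the likelihood in Eq. (9).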

Figure 1: Example of the graphical representation for only one class (event) in our model. h_i^j is the hidden variable, x_i^j is the observable input, y_i^j is the projection of x_i^j in the subspace, j ∈ [1, n_i], and n_i is the number of total training data for class i. The K cluster centers μ_i = {μ_i^1, μ_i^2, ..., μ_i^K} are determined by both π_i and w_i. The graphical model of our method is similar to a GMM model in the vertical direction. By adding z_i into LDA, the graphical model can handle latent variables.

Probabilistic understanding for the model

The latent SVM model (Felzenszwalb et al. 2010; Andrews, Tsochantaridis, and Hofmann 2002) attempts to label an instance x_i in a positive bag by maximizing p(z(x_i) = 1 | x_i), which is the optimal Bayes decision rule. Similarly, Eq. (5a) takes the same strategy as latent SVM to maximize a posterior probability. Moreover, instead of only maximizing p(z = 1 | x), we also maximize the joint probability p(z = 1, x), using the Bayes rule p(z = 1, x) = p(x) p(z = 1 | x). In this paper, we use the Gaussian mixture model to approximate the prior p(x). We argue that maximizing a joint probability is reasonable, because it considers both the discriminative (posterior probability) and representative (prior) properties of the video dataset. We give the graphical representation of our model in Fig. 1.

Experiments and results

In this section, we perform experiments on various data sets to evaluate the proposed techniques and compare them to other baseline methods. For all the experiments, we set T = 20 and β = 40 if there is no other specification, and we initialize uniformly weighted w_i and the projection matrix P with LDA.

Classification on toy data sets

The MUSK data sets¹ are the benchmark data sets used in virtually all previous approaches and have been described in detail in the landmark paper (Dietterich, Lathrop, and Lozano-Pérez 1997). Both data sets, MUSK1 and MUSK2, consist of descriptions of molecules using multiple low-energy conformations. Each conformation is represented by a 166-dimensional feature vector derived from surface properties. MUSK1 contains on average approximately 6 conformations per molecule, while MUSK2 has on average more than 60 conformations in each bag. The Corel data set consists of three different categories ("elephant", "fox", "tiger"), and each instance is represented with 230-dimensional features, characterized by color, texture and shape descriptors. The data sets have 100 positive and 100 negative example images; the latter have been randomly drawn from a pool of photos of other animals. We first use PCA to reduce the dimension to 40 for our method. For the parameter setting, we set K = 3, T = 20 and N = 4 (namely, the 4-Nearest-Neighbor (4NN) algorithm is applied for classification) on all datasets except Elephant (for which we set K = 2). The averaged results of 10-fold cross-validation runs are summarized in Table 1. We set LDA and MI-SVM as our baselines.² We can observe that JLFDA outperforms LDA and MI-SVM on the MUSK1, Elephant and Fox data sets; in particular, our method shows a significantly better result on the MUSK1 data set. We also show the average accuracy over the five datasets, which demonstrates that our model is better than MI-SVM, with higher accuracy.

Table 1: Accuracy results for various methods on the MUSK and Corel datasets. Our approach outperforms LDA significantly, and we get better results than MI-SVM on the MUSK1, Elephant and Fox datasets. On average, our method outperforms MI-SVM, which indicates that our model is stable for semi-supervised classification.

Data set   inst/Dim   MI-SVM   LDA     LFDA    JLFDA
MUSK1      476/166    77.9     70.4    81.4    87.1
MUSK2      6598/166   84.3     51.8    76.4    81.3
Elephant   1391/230   81.4     70.5    74.5    82.2
Fox        1320/230   57.8     53.5    61.5    59.5
Tiger      1220/230   84.0     71.5    74.0    80.5
Average    -          77.08    63.54   73.56   78.1

¹ www.cs.columbia.edu/~andrews/mil/datasets.html
² We use the bag label as the instance label to test its performance.
Semantic keyframe extraction

A keyframe defines the starting or ending point of a smooth transition in a video. A semantic keyframe, in our paper, refers to a keyframe that carries semantic meaning; in other words, we can tell what event or topic happened in the video when we observe the keyframes. We conduct experiments on the challenging TRECVID MED11 dataset³ with five events (or classes): attempting a board trick, feeding an animal, landing a fish, wedding ceremony and working on a woodworking project. As for the parameters, we set K = 10 and N = 10. We learn the representative clusters for each class (event), and then use them to find semantic frames in videos with the same labels.

³ http://www.nist.gov/itl/iad/mig/med11.cfm
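The keyframe selection itself is a nearest-neighbor search against the learned representative cluster, as stated in the Algorithm section above; a minimal sketch (the frame features, projection P, and center mu are placeholders, and top_k = 10 matches the 10 frames per video used below) might be:

```python
import numpy as np

def select_keyframes(frame_features, P, mu, top_k=10):
    """Rank a video's frames by distance to the representative cluster center
    of its event in the discriminant subspace, and keep the closest ones.

    frame_features : (n_frames, d) per-frame descriptors (e.g. HOG3D bag-of-words)
    P              : (d, d_bar) learned projection, columns = discriminant directions
    mu             : (d_bar,) center of the selected component S_i^j for this event"""
    Y = frame_features @ P                     # project frames into the subspace
    dist = np.linalg.norm(Y - mu, axis=1)      # distance to the representative center
    return np.argsort(dist)[:top_k]            # indices of candidate semantic keyframes
```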

Then we evaluate the semantic frames for each video through human-factors analysis; the semantic keyframe extraction problem demands a human in the loop for evaluation.

Video representation. For all videos, we extract HOG3D descriptors (Klaser, Marszalek, and Schmid 2008) every 25 frames (roughly sampling one frame per second). To represent videos using local features, we apply a bag-of-words model, using all detected points and a codebook with 1000 elements.

Benchmark methods. We use SVM as the benchmark method in the experiment. We take the one-vs-all strategy to train a linear SVM classifier using SVMlight (Joachims, Finley, and Yu 2009) for each kind of event. Then we choose 10 frames for each video which are either far from the margin or close to the margin on the positive side. We refer to the frames chosen farthest away from the margin as SVM(1), and to the frames closest to the margin as SVM(2). We also randomly select 10 frames from each video, which we refer to as RAND in our experiments.

Experimental setting. Ten highly motivated graduate students (ranging from 22 to 30 years old) served as subjects in all of the following human-in-the-loop experiments. Each subject new to the annotation-task paradigm underwent a training process. Two of the authors gave a detailed description of the dataset and problem, including its background, definition and purpose. In order to indicate what representative and discriminative mean for each event, the two authors showed videos for each kind of event to the subjects, and made sure all subjects understood what semantic keyframes are. The training procedure was terminated after the subject's performance had stabilized. We take a pairwise ranking strategy for our evaluation. We extract 10 frames per video for the 5 different methods (SVM(1), SVM(2), LFDA, JLFDA and RAND), respectively. For each video, we had about 1000 image pairs for comparison. We developed an interface in Matlab to display an image pair and three options (Yes, No and Equal) for comparing an image pair each time. The students were taught how to use the software; a trial requires them to give a ranking: if the left image is better than the right, they choose 'Yes'; if the right is better than the left, they choose 'No'; if the two images are the same, they choose 'Equal'. The subjects are again informed that "better" means a better semantic keyframe. The ten subjects each installed the software on their own computers and conducted the image pair comparisons independently.

Figure 2: Comparison of 5 methods for five events. Higher value, better performance.

Table 2: Win-loss matrix for the five methods. It represents how many times the method in each row wins against the method in each column.

            JLFDA   LFDA    RAND    SVM(1)  SVM(2)
JLFDA       -       3413    2274    2257    3758
LFDA        2957    -       2309    2230    3554
RAND        2111    2175    -       1861    2274
SVM(1)      2088    2270    2010    -       2314
SVM(2)      3232    3316    2113    2125    -
Experimental Results. We have a score for each image pair. Then, by sampling 10 videos from each event, we in the end had annotations for 104 videos, which means the sampled videos obtained from the 10 subjects almost cover all the test data (105 videos). Table 2 shows the win-loss matrix between the five methods, obtained by counting the pairwise comparison results on all 5 events. It shows that JLFDA and LFDA always beat the three baselines. Furthermore, JLFDA is better than LFDA because it considers the data-driven prior, which helps JLFDA to find more representative frames. Refer to the supplementary material for keyframes extracted with JLFDA. We also compared the five methods on the basis of the Condorcet voting method. We treat 'Yes', 'No' and 'Equal' as voters for each method in the pairwise comparison: if 'Yes', we cast one ballot for the left method; if 'No', we add a ballot for the right method; otherwise we do nothing for the two methods. Fig. 2 shows the ballots for each method on each event. It demonstrates that our method JLFDA always beats the other methods, except on the E004 dataset.

We further compared the five methods based on the Elo rating system⁴. For each video, we ranked the five methods according to the Elo ranking system. Then, we counted the number of times each method ranked No.1 at the video level in each event. The results in Table 3 show that our method is better than the others, except on E004. For example, E002 has a total of 20 videos (column summation), where JLFDA ranked first on 7 videos, while RAND ranks first on only 2 videos. Such results are consistent with those obtained using the Condorcet ranking method in Fig. 2.

Table 3: The number of No.1 rankings for each method in each event. Higher value, better results. It demonstrates that our method is more capable of extracting semantically meaningful keyframes.

Method   E001   E002   E003   E004   E005
JLFDA    6      7      7      3      7
LFDA     6      4      4      5      1
SVM(1)   4      4      4      2      4
SVM(2)   6      3      1      7      6
RAND     2      2      4      3      2

⁴ https://en.wikipedia.org/wiki/Elo_rating_system
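For reference, the Condorcet-style ballot counting described above is a straightforward tally; a sketch over hypothetical judgment records (the method names are the real ones, but the `judgments` format is ours) could be:

```python
from collections import defaultdict

METHODS = ["JLFDA", "LFDA", "RAND", "SVM(1)", "SVM(2)"]

def win_loss_matrix(judgments):
    """judgments: iterable of (left_method, right_method, answer) tuples with
    answer in {'Yes', 'No', 'Equal'}; 'Yes' favors the left image, 'No' the
    right, and 'Equal' adds no ballot, as in the setup above."""
    wins = defaultdict(int)
    for left, right, answer in judgments:
        if answer == "Yes":
            wins[(left, right)] += 1      # row method beats column method once
        elif answer == "No":
            wins[(right, left)] += 1
    return [[wins[(r, c)] if r != c else None for c in METHODS] for r in METHODS]
```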

E004 is the wedding ceremony event, and on it our method is consistently outperformed by the SVM baseline. We believe this is due to the distinct nature of the E004 videos, in which the video scene context itself distinguishes the event from the other four (wedding ceremonies typically involve very many people and take place indoors). Hence the discriminative components of the methods take over, and the SVM is able to outperform our model.

Conclusion

In this paper, we have presented a latent Fisher discriminant analysis model with representative feature discovery, which combines latent variables, multi-class learning and dimension reduction in a unified framework. To the best of our knowledge, this is the first paper to generalize LDA and study the extraction of semantically representative and discriminative features together rather than in a separate manner (either representative or discriminative). We conduct thorough experiments on the MUSK, Corel and TRECVID MED11 datasets, and demonstrate that our method is able to consistently outperform competitive baselines.

Acknowledgement

The authors would like to thank Dr. Jason J. Corso for inspiring discussions on the whole idea and valuable feedback on the paper. We also thank all VPML members at SUNY at Buffalo for conducting the human-factors ranking-based experimental evaluation on the TRECVID MED11 dataset. In addition, we would also like to thank the anonymous reviewers for their helpful comments.

References

Andrews, S.; Tsochantaridis, I.; and Hofmann, T. 2002. Support vector machines for multiple-instance learning. In NIPS, 561–568.
Baudat, G., and Anouar, F. 2000. Generalized discriminant analysis using a kernel approach. Neural Computation 12:2385–2404.
Belhumeur, P. N.; Hepanha, J. P.; and Kriegman, D. J. 1997. Eigenfaces vs. Fisherfaces: recognition using class specific linear projection. IEEE TPAMI 19:711–720.
Blum, A., and Mitchell, T. 1998. Combining labeled and unlabeled data with co-training. In Proceedings of the Workshop on Computational Learning Theory, 92–100.
Cai, D.; He, X.; and Han, J. 2007. Semi-supervised discriminant analysis. In ICCV. IEEE.
Chen, G., and Corso, J. J. 2012. Greedy multiple instance learning via codebook learning and nearest neighbor voting. arXiv preprint.
Chen, L.; Liao, H.; Ko, M.; Lin, J.; and Yu, G. 2000. A new LDA-based face recognition system which can solve the small sample size problem. Pattern Recognition 33:1713–1726.
Dempster, A. P.; Laird, N. M.; and Rubin, D. B. 1977. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B 39(1):1–38.
Dietterich, T. G.; Lathrop, R. H.; and Lozano-Pérez, T. 1997. Solving the multiple instance problem with axis-parallel rectangles. Artificial Intelligence 89:31–71.
Ding, C., and Li, T. 2007. Adaptive dimension reduction using discriminant analysis and k-means clustering. In ICML, 521–528. New York, NY, USA: ACM.
Duda, R. O.; Hart, P. E.; and Stork, D. G. 2000. Pattern Classification (2nd Edition). Wiley-Interscience.
Felzenszwalb, P. F.; Girshick, R. B.; McAllester, D.; and Ramanan, D. 2010. Object detection with discriminatively trained part-based models. TPAMI 32:1627–1645.
Fisher, R. A. 1936. The use of multiple measurements in taxonomic problems. Annals of Eugenics 7(7):179–188.
Fukunaga, K. 1990. Introduction to Statistical Pattern Recognition (2nd ed.). San Diego, CA, USA: Academic Press Professional, Inc.
Hastie, T., and Tibshirani, R. 1996. Discriminant analysis by Gaussian mixtures. Journal of the Royal Statistical Society, Series B 58:155–176.
Joachims, T.; Finley, T.; and Yu, C.-N. J. 2009. Cutting-plane training of structural SVMs. JMLR 77:27–59.
Joachims, T. 1999. Transductive inference for text classification using support vector machines. In Proceedings of the Sixteenth International Conference on Machine Learning, 200–209.
Klaser, A.; Marszalek, M.; and Schmid, C. 2008. A spatio-temporal descriptor based on 3D-gradients. In BMVC.
Maron, O., and Lozano-Pérez, T. 1998. A framework for multiple-instance learning. In NIPS.
Maron, O., and Ratan, A. L. 1998. Multiple-instance learning for natural scene classification. In ICML, 341–349. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.
Merchante, L. F. S.; Grandvalet, Y.; and Govaert, G. 2012. An efficient approach to sparse linear discriminant analysis. In ICML.
Perera, A.; Oh, S.; Leotta, M.; Kim, I.; Byun, B.; Lee, C.-H.; McCloskey, S.; Liu, J.; Miller, B.; Huang, Z. F.; Vahdat, A.; Yang, W.; Mori, G.; Tang, K.; Koller, D.; Fei-Fei, L.; Li, K.; Chen, G.; Corso, J. J.; Fu, Y.; and Srihari, R. K. 2011. GENIE TRECVID2011 multimedia event detection: Late-fusion approaches to combine multiple audio-visual features. In NIST TRECVID Workshop.
Rao, C. R. 1948. The utilization of multiple measurements in problems of biological classification. Journal of the Royal Statistical Society, Series B 10(2):159–203.
Sugiyama, M.; Idé, T.; Nakajima, S.; and Sese, J. 2010. Semi-supervised local Fisher discriminant analysis for dimensionality reduction. Machine Learning 78:35–61.
Vapnik, V. N. 1998. Statistical Learning Theory. John Wiley & Sons.
Wu, Y.; Wipf, D. P.; and Yun, J. 2015. Understanding and evaluating sparse linear discriminant analysis. In AISTATS.
Wu, C. F. J. 1983. On the convergence properties of the EM algorithm. The Annals of Statistics 11(1):95–103.
Ye, J. 2007. Least squares linear discriminant analysis. In ICML.
Zhang, Y., and Yeung, D.-Y. 2008. Semi-supervised discriminant analysis via CCCP. In ECML. IEEE.
