
Discriminant Analysis in Correlation Similarity Measure Space

Yong Ma [email protected]
Shihong Lao [email protected]
Erina Takikawa [email protected]
Masato Kawade [email protected]
Sensing & Control Lab., Omron Corporation, Kyoto, 619-0283, Japan

Abstract

Correlation is one of the most widely used similarity measures in pattern classification, like the Euclidean and Mahalanobis distances. However, compared with the numerous discriminant learning algorithms that have been proposed in distance metric space, very little work has been conducted on this topic using the correlation similarity measure. In this paper, we propose a novel discriminant learning algorithm in correlation measure space, Correlation Discriminant Analysis (CDA). In this framework, based on the definitions of within-class correlation and between-class correlation, the optimum transformation can be sought to maximize the difference between them, which is empirically in accordance with good classification performance. Under different cases of the transformation, different implementations of the algorithm are given. Extensive empirical evaluations of CDA demonstrate its advantage over alternative methods.

Appearing in Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, 2007. Copyright 2007 by the author(s)/owner(s).

1. Introduction

Correlation (also termed normalized correlation, Pearson correlation, or cosine similarity; hereafter simply correlation) is a widely used measure of the similarity $s$ between two vectors $u$ and $v$ in pattern classification and signal processing problems, like the Euclidean distance and the Mahalanobis distance:

$$ s = \frac{u^T v}{\sqrt{u^T u}\,\sqrt{v^T v}}. \qquad (1) $$
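As a quick illustration of Eq. (1), here is a minimal NumPy sketch (the function name is ours, not from the paper); it also shows that correlation, unlike the Euclidean distance, ignores the scale of its arguments:

```python
import numpy as np

def correlation_similarity(u, v):
    """Correlation similarity of Eq. (1): s = u^T v / (||u|| ||v||)."""
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

u = np.array([1.0, 2.0, 3.0])
v = 10.0 * u                                  # same direction, different scale
print(correlation_similarity(u, v))           # 1.0: maximal correlation
print(np.linalg.norm(u - v))                  # ~33.7: large Euclidean distance
```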
For example, in face recognition, the framework (Kittler, 2000) combining principal component analysis (PCA), linear discriminant analysis (LDA) (Fukunaga, 1990), and a nearest neighbor (NN) classifier based on the correlation measure achieved clearly superior performance to other frameworks, such as the combination of PCA, LDA, and NN based on the Euclidean or weighted Euclidean metric, in large-scale comparative evaluations such as BANCA (Kittler, 2000). In gene expression analysis (Brown, 2000) and document categorization problems (Han, 2001; Peterson, 2005), correlation was also found to be an effective measure. Besides these areas, in the signal processing community, correlation and correlation-based filters (Kumar, 1986; Mahalanobis, 1987; Xie, 2005) are widely used to detect the presence of specific signals.

However, compared with the Euclidean or Mahalanobis distance, correlation is not a metric, because it satisfies neither non-negativity nor the triangle inequality. And in contrast to the situation in distance metric space, where many discriminant learning algorithms exist, such as LDA, relevant component analysis (RCA) (Hillel, 2005) and distance metric learning with side-information (DMLSI) (Xing, 2003), there is only very little discriminant learning work based on the correlation similarity measure. To the best of our knowledge, this type of research includes canonical correlation analysis (CCA) (Hardoon, 2004) and correlation filter design such as the minimum average correlation energy (MACE) filter (Mahalanobis, 1987). So it is very interesting to extend discriminant learning research in correlation similarity space in a new direction.

On the other hand, from the research (Belhumeur, 1997; Kittler, 2000; Martinez, 2005) on face recognition with the framework of PCA, LDA and a correlation-based NN classifier, we know that PCA is mainly used to reduce the feature dimension and to solve the singularity problem of LDA caused by the small sample size; LDA is used to find the optimum discriminant projections that make the Euclidean metric-based within-class scatter denser and the between-class scatter more spread out over the training set; and the correlation-based NN classifier is used, during the evaluation stage, to find the best match in the gallery set for a probe face according to their correlation similarity. We notice that although correlation-based NN does not match the Euclidean metric-based LDA very well, it can achieve better performance than Euclidean metric-based NN. Naturally, we wonder what the performance would be if we directly substituted the correlation measure for the Euclidean metric during the discriminant learning stage.

In this paper, we explore the above question and propose a novel discriminant learning algorithm in correlation measure space, Correlation Discriminant Analysis (CDA). CDA combines discriminant learning with the correlation measure and seeks the optimum transformation that maximizes the difference between the within-class correlation and the between-class correlation. This difference can be empirically regarded as a criterion for classification performance. In extensive experimental studies on publicly available databases and real applications, we show that the proposed learning algorithm can consistently improve the performance of correlation-based classification, and achieves comparable or even better performance than alternative methods.

The rest of the paper is organized as follows. In Section 2, we briefly introduce the related work and compare it with CDA. In Section 3, the details of the CDA algorithm and the optimization methods for its different cases are given. In Section 4, thorough comparisons are made among CDA and other supervised discriminant learning algorithms. Finally, Section 5 gives the conclusions.
2. Related Work

In this section, we first review the most popular discriminant learning techniques in distance metric space, such as LDA, relevant component analysis (RCA) (Hillel, 2005) and distance metric learning with side-information (DMLSI) (Xing, 2003), and then introduce related work in the similarity measure domain.

As we know, LDA finds the best projection directions, on which the within-class scatter is minimized while the between-class scatter is maximized, under the Fisher criterion (Fukunaga, 1990). To solve its singularity problem when the number of samples is much smaller than the feature dimension, PCA is often applied prior to LDA. Not considering the maximization of the between-class scatter matrix, which stands for the negative equivalence constraints, RCA is only concerned with minimizing the sum of squared Mahalanobis distances over positive equivalence constraints; under this criterion, the optimum transformation is the inverse of the within-class scatter matrix. Different from RCA, DMLSI tries to learn an optimum Mahalanobis metric under both positive and negative constraints, in which the sum of squared Mahalanobis distances from positive constraints is minimized given that the sum of Mahalanobis distances from negative constraints stays above a constant value.

LDA, RCA and DMLSI are only concerned with the global Euclidean or Mahalanobis structure. Sometimes, in the context of sparse data such as face recognition, these distance metrics are not optimal (Kittler, 2000). Several attempts have been made to take advantage of non-linear structure to provide more discriminatory information while keeping the LDA structure untouched. In Smith et al.'s (2006) work, an angular LDA (ALDA) was presented. In their method, the LDA transformation is first applied to the probe vector, and then a non-linear transformation projects the probe LDA vector into the probe ALDA vector, whose i-th component is the angle between the probe LDA vector and the i-th LDA component axis. A standard classifier can then be designed in this ALDA space. This work still separates the optimization procedure from the metric or measure learning.

The correlation similarity measure has also been widely utilized together with k-nearest neighbor algorithms to replace the traditional Euclidean/Mahalanobis distance. Han et al. (2001) proposed a weight-adjusted k-nearest neighbor classification method to learn the weights of different features in the correlation similarity; Peterson et al. (2005) presented a genetic optimization algorithm for the same problem that optimizes both the feature weights and offsets to features according to a simple empirical criterion. The main drawback of these methods is that the transformation is only in diagonal form and the weight of every feature is assumed to be uncorrelated.

Canonical correlation analysis (CCA) (Hardoon, 2004) might be the most successful discriminant learning method dedicated to the correlation measure so far. It can be viewed as the problem of finding basis vectors for two sets of variables such that the correlations between the projections of the variables are mutually maximized. So if one set of variables is taken as class labels, CCA can be explicitly utilized to realize a supervised linear feature extraction and subsequent classification (Loog, 2004). A similar idea is also used straightforwardly in partial least squares (PLS) (Barker, 2003) for classification, which likewise involves the correlation between two sets of variables, under different constraints.

There are two main problems when CCA and PLS are applied to classification. The first is that different encoding modes for the class labels can result in different performances. Johansson (2001) proposed one-of-c label encoding to associate a sample with its corresponding class label; Sun (2007) proposed a soft label encoding method based on the fuzzy k-nearest neighbor method. The second problem is that they are not suitable for open-set verification applications (Phillips, 2000). In open-set verification, the classes in the gallery or probe set might never appear in the training set (for example, the verification of a person against his claimed passport photo, where both never appear in the training face database). However, the transformations learned by CCA or PLS during the training stage are class-specific and related to the classes which appear in the training set.


When verifying two new samples, such a system needs to first find projections for the two samples online, separately, to discriminate them from the training faces, and then decide whether they belong to the same class or not. That is unfavorable for real applications.

The correlation-based filter design methods have a similar problem when used in open-set verification applications, as they need class-specific correlation filters. This type of method includes the minimum variance synthetic discriminant function filter (Kumar, 1986), which minimizes the correlation output noise variance and typically suppresses high frequencies in order to achieve noise tolerance; the minimum average correlation energy (MACE) filter (Mahalanobis, 1987), which minimizes the average correlation output energy and emphasizes high spatial frequencies in order to produce sharp correlation peaks; and the optimal tradeoff filter (Refregier, 1990), which is designed to balance these two criteria by minimizing a weighted criterion.

3. Correlation Discriminant Analysis

Correlation Discriminant Analysis (CDA) is a method that seeks a global linear transformation to maximize the correlation of samples from the same class and minimize the correlation of samples from different classes in the transformed space. Compared with LDA, RCA and DMLSI, CDA combines similarity measure learning with supervised discriminant learning in a very simple way, exploring nonlinear information instead of only linear information. And compared with CCA, PLS and the correlation-based filter design methods, CDA only cares about the correlation between within-class and between-class sample pairs, instead of the correlation between samples and their corresponding labels; the trained projection matrix is class-irrelevant. So verification between two new samples, neither of which appears in the training set, can easily be conducted without modifying the trained projection matrix. The details of CDA are given as follows.

For general multi-class classification problems, we are given a training set $\{x_i, t_i\}_{i=1}^{n}$, where $x_i \in R^d$ is the sample feature vector, $t_i$ is the corresponding class label, the total number of classes is $c$ and the $j$-th class has $n_j$ samples. Under a transformation $w \in R^{d \times d}$, the feature vector $x$ is projected into $y = w x$. (Note that, by putting the original dataset through a non-linear kernel function $\phi$, a non-linear transformation can also be learned.) In the transformed space, the within-class correlation $S_w$, the between-class correlation $S_b$ and the total correlation $S_t$ over the training set are defined as follows:

$$ S_w = \frac{1}{N_w} \sum_{(i,j):\, t_i = t_j} \frac{y_i^T y_j}{\|y_i\|\,\|y_j\|} = \frac{1}{N_w} \sum_{(i,j):\, t_i = t_j} \frac{x_i^T w^T w\, x_j}{\sqrt{x_i^T w^T w\, x_i}\,\sqrt{x_j^T w^T w\, x_j}} \qquad (2) $$

$$ S_b = \frac{1}{N_b} \sum_{(i,j):\, t_i \neq t_j} \frac{y_i^T y_j}{\|y_i\|\,\|y_j\|} = \frac{1}{N_b} \sum_{(i,j):\, t_i \neq t_j} \frac{x_i^T w^T w\, x_j}{\sqrt{x_i^T w^T w\, x_i}\,\sqrt{x_j^T w^T w\, x_j}} \qquad (3) $$

$$ S_t = \frac{1}{n^2} \sum_{(i,j)} \frac{y_i^T y_j}{\|y_i\|\,\|y_j\|} = \frac{1}{n^2} \sum_{(i,j)} \frac{x_i^T w^T w\, x_j}{\sqrt{x_i^T w^T w\, x_i}\,\sqrt{x_j^T w^T w\, x_j}} \qquad (4) $$

Here $N_w$ and $N_b$ are the numbers of sample pairs from the same class and from different classes, respectively. So

$$ N_w + N_b = n^2, \qquad (5) $$

$$ N_w S_w + N_b S_b = n^2 S_t, \qquad (6) $$

$$ S_w - S_b = \frac{n^2}{N_b}\,(S_w - S_t). \qquad (7) $$

Define $A = w^T w$; then $A$ is symmetric and positive semi-definite, $A \succeq 0$.

Similar to the LDA algorithm, which minimizes the within-class scatter and maximizes the between-class scatter, a simple way of defining a criterion for the desired transformation is to demand a larger difference between $S_w$ and $S_b$, which generally stands for better discriminant power to separate sample pairs from the same class from pairs from different classes. This gives the optimization problem

$$ \max_A \; S_w - S_b \quad \text{or, equivalently by (7),} \quad \max_A \; S_w - S_t \qquad (8) $$

$$ \text{s.t.} \quad A \succeq 0. \qquad (9) $$

This problem has an objective that is nonlinear and non-convex in the parameters $A$, so the optimization problem is different from the convex optimization problems of LDA, CCA and DMLSI.
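As a reading aid, Eqs. (2)-(4) can be computed directly from a labeled sample matrix; the following is a minimal NumPy sketch under our own naming (X stores samples as rows, t holds integer labels, and w is the d x d transformation), not the authors' code:

```python
import numpy as np

def class_correlations(X, t, w):
    """Within-class, between-class and total correlations S_w, S_b, S_t of Eqs. (2)-(4)."""
    Y = X @ w.T                                          # y_i = w x_i, stored as rows
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)     # normalize, so Y @ Y.T holds correlations
    S = Y @ Y.T                                          # S[i, j] = correlation of samples i and j
    same = (t[:, None] == t[None, :])                    # within-class pair indicator (i == j included)
    N_w, N_b = same.sum(), (~same).sum()
    S_w = S[same].sum() / N_w
    S_b = S[~same].sum() / N_b
    S_t = S.sum() / (len(t) ** 2)
    return S_w, S_b, S_t
```

Since the pairs (i, j) range over all ordered pairs including i = j, the counts satisfy N_w + N_b = n^2, and one can verify Eqs. (5)-(7) numerically, e.g. N_w*S_w + N_b*S_b ≈ n^2*S_t.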


We also note that, while one might consider criteria other than (8), a ratio criterion such as $S_w / (S_b + 1)$ would not be a good choice, mainly because of its much more complex optimization process.

3.1 The Case of Diagonal A

If we restrict $A$ to be diagonal, this corresponds to learning a transformation in which the different axes are given different "weights" in the correlation similarity of the transformed space. We define $A = \mathrm{diag}(\rho_1^2, \ldots, \rho_d^2)$ and $w = \mathrm{diag}(\rho_1, \ldots, \rho_d)$. The optimization problem (8), (9) then becomes the following unconstrained maximization problem:

$$ f(A) = f(\rho_1, \ldots, \rho_d) = S_w - S_t = \sum_{i=1}^{c} p_i\, m_i^T m_i \;-\; m^T m \qquad (10) $$

Here $p_i$ is the fraction of the $N_w$ within-class sample pairs that come from the $i$-th class, and $m_i$ and $m$ are the means of the $i$-th class samples and of all samples after projection and normalization:

$$ p_i = \frac{n_i^2}{N_w}, \qquad m_i = \frac{1}{n_i} \sum_{k:\, t_k = i} \frac{w x_k}{\|w x_k\|}, \qquad m = \frac{1}{n} \sum_{i=1}^{n} \frac{w x_i}{\|w x_i\|}. $$

Using a Newton-Raphson gradient-based optimization method, the above maximization problem can be solved, with

$$ \frac{\partial f}{\partial \rho_k} = 2 \sum_{i=1}^{c} p_i\, m_i^T \frac{\partial m_i}{\partial \rho_k} \;-\; 2\, m^T \frac{\partial m}{\partial \rho_k}. \qquad (11) $$

In the above equation,

$$ \frac{\partial m}{\partial \rho_k} = \frac{1}{n} B \;-\; \frac{\rho_k}{n} \left( \frac{w x_1}{\|w x_1\|}, \ldots, \frac{w x_n}{\|w x_n\|} \right) C^T, \qquad (12) $$

where

$$ B = \Big( 0, \ldots, \sum_{i=1}^{n} \frac{x_{ik}}{\|w x_i\|}, \ldots, 0 \Big)^T, \qquad C = \Big( \frac{x_{1k}^2}{\|w x_1\|^2}, \ldots, \frac{x_{nk}^2}{\|w x_n\|^2} \Big). $$

Here $x_{ik}$ is the $k$-th component of sample $x_i$, and the only nonzero entry of $B$ is its $k$-th one. The derivative $\partial m_i / \partial \rho_k$ can be computed in a similar way. Because only the relative values of $\rho_1, \ldots, \rho_d$ affect the correlation value, we can set $\rho_1 = 1$ and only need to estimate $\rho_2, \ldots, \rho_d$.
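To make the diagonal case concrete, here is a hedged sketch in NumPy (our own code, not the authors' implementation): it evaluates the objective of Eq. (10) through the class means of the normalized projected samples, and optimizes it with plain gradient ascent using a finite-difference gradient in place of the analytic expressions (11)-(12); the step size and iteration count are illustrative choices.

```python
import numpy as np

def f_diag(rho, X, t):
    """Diagonal-CDA objective of Eq. (10): S_w - S_t, via class means of normalized projections."""
    Y = X * rho                                          # w = diag(rho), so y_i = rho * x_i elementwise
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)     # normalized projected samples
    classes = np.unique(t)
    N_w = sum(int((t == c).sum()) ** 2 for c in classes) # number of within-class ordered pairs
    m = Y.mean(axis=0)                                   # overall mean of normalized samples
    value = -float(m @ m)                                # the -S_t term
    for c in classes:
        m_c = Y[t == c].mean(axis=0)
        p_c = int((t == c).sum()) ** 2 / N_w             # p_i = n_i^2 / N_w
        value += p_c * float(m_c @ m_c)                  # the +S_w term
    return value

def fit_diag_cda(X, t, steps=200, lr=0.5, eps=1e-4):
    """Gradient ascent on f_diag with a finite-difference gradient; rho[0] stays fixed at 1."""
    d = X.shape[1]
    rho = np.ones(d)
    for _ in range(steps):
        grad = np.zeros(d)
        for k in range(1, d):                            # rho[0] = 1 fixes the overall scale
            e = np.zeros(d); e[k] = eps
            grad[k] = (f_diag(rho + e, X, t) - f_diag(rho - e, X, t)) / (2 * eps)
        rho += lr * grad
    return rho
```

The paper itself uses the analytic gradient of Eqs. (11)-(12) together with a line search along the gradient direction, which is what the running times reported in Section 4 refer to.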

3.2 The Case of Full A

More generally, in the case of learning a full matrix $A$, the constraint $A \succeq 0$ becomes more difficult to enforce, and Newton's method often becomes prohibitively expensive (requiring $O(d^6)$ time to invert the Hessian over the $d^2$ parameters). Similar to the optimization problem in DMLSI (Xing, 2003), using gradient ascent and iterative projections (Rockafellar, 1970), we derive a different algorithm for this setting.

    Initialize A
    Repeat
        Repeat
            A := argmin_{A'} { ||A' - A||_F : A' in C }
        Until A converges
        A := A + alpha * grad_A f(A)
    Until convergence

Figure 1. Gradient ascent + iterative projection algorithm for full CDA. Here $\|\cdot\|_F$ is the Frobenius norm on matrices, $\|M\|_F = (\sum_{ij} M_{ij}^2)^{1/2}$.

We use a gradient ascent step on $f(A)$ to optimize (8), followed by the method of iterative projections to ensure that constraint (9) holds. Specifically, we repeatedly take a gradient step $A := A + \alpha \nabla_A f(A)$ and then repeatedly project $A$ onto the set $C = \{A : A \succeq 0\}$. This gives the algorithm shown in Figure 1.

To calculate the derivatives of $f(A)$ with respect to $A_{kl}$, the component in the $k$-th row and $l$-th column of $A$, we just need to compute the derivatives $\partial s_{ij} / \partial A_{kl}$ of the correlation between the $i$-th and $j$-th samples and sum up the derivatives between-class and within-class separately:

$$ s_{ij} = \frac{x_i^T A\, x_j}{\sqrt{x_i^T A\, x_i}\,\sqrt{x_j^T A\, x_j}}. \qquad (13) $$


If $k = l$,

$$ \frac{\partial s_{ij}}{\partial A_{kl}} = \frac{x_{ik}\, x_{jk}}{\sqrt{x_i^T A x_i}\,\sqrt{x_j^T A x_j}} \;-\; \frac{x_i^T A\, x_j}{2\sqrt{x_i^T A x_i}\,\sqrt{x_j^T A x_j}} \left( \frac{x_{ik}^2}{x_i^T A x_i} + \frac{x_{jk}^2}{x_j^T A x_j} \right). \qquad (14) $$

If $k \neq l$,

$$ \frac{\partial s_{ij}}{\partial A_{kl}} = \frac{x_{ik}\, x_{jl} + x_{il}\, x_{jk}}{\sqrt{x_i^T A x_i}\,\sqrt{x_j^T A x_j}} \;-\; \frac{x_i^T A\, x_j}{\sqrt{x_i^T A x_i}\,\sqrt{x_j^T A x_j}} \left( \frac{x_{ik}\, x_{il}}{x_i^T A x_i} + \frac{x_{jk}\, x_{jl}}{x_j^T A x_j} \right). \qquad (15) $$

In the inner iteration of this algorithm, the projection step onto $C$, the set of all positive semi-definite matrices, is done by first finding the diagonalization $A = U \Lambda U^T$, where $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_d)$ is the diagonal matrix of $A$'s eigenvalues and the columns of $U \in R^{d \times d}$ contain the corresponding eigenvectors, and then taking $A' = U \Lambda' U^T$, where $\Lambda' = \mathrm{diag}(\max(0, \lambda_1), \ldots, \max(0, \lambda_d))$ (Golub, 1996).

Also, as in the diagonal case, we can set $A_{11} = 1$ and only need to estimate the other parameters.
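As an illustration (our own code, with an assumed step size and no convergence checks), the following sketch performs one outer iteration of the Figure 1 procedure for the criterion $S_w - S_b$: the gradient is accumulated from the pairwise derivatives (14)-(15) and the updated matrix is projected back onto the positive semi-definite cone as described above.

```python
import numpy as np

def pair_grad(A, xi, xj):
    """Gradient of s_ij (Eq. 13) w.r.t. the symmetric matrix A, following Eqs. (14)-(15)."""
    ai, aj = xi @ A @ xi, xj @ A @ xj
    N, D = xi @ A @ xj, np.sqrt(ai * aj)
    G = (np.outer(xi, xj) + np.outer(xj, xi)) / D \
        - (N / D) * (np.outer(xi, xi) / ai + np.outer(xj, xj) / aj)
    G[np.diag_indices_from(G)] *= 0.5            # diagonal entries follow Eq. (14), off-diagonal Eq. (15)
    return G

def project_psd(A):
    """Projection onto C = {A : A >= 0}: clip negative eigenvalues (inner loop of Figure 1)."""
    lam, U = np.linalg.eigh((A + A.T) / 2)       # eigendecomposition of the symmetrized matrix
    return (U * np.maximum(lam, 0.0)) @ U.T      # U diag(max(0, lambda)) U^T

def full_cda_step(A, X, t, alpha=0.1):
    """One outer iteration of Figure 1 for the criterion S_w - S_b (alpha is an assumed step size)."""
    n = len(t)
    same = (t[:, None] == t[None, :])
    N_w, N_b = same.sum(), (~same).sum()
    grad = np.zeros_like(A)
    for i in range(n):
        for j in range(n):
            g = pair_grad(A, X[i], X[j])
            grad += g / N_w if same[i, j] else -g / N_b
    return project_psd(A + alpha * grad)
```

The explicit double loop over sample pairs is written for clarity; in practice one would precompute the inner products $x_i^T A x_j$, which is where the $O(n^2)$ pairwise cost discussed in Section 4 comes from.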

4. Experimental Results

To evaluate the performance of the proposed CDA algorithm, classification experiments are performed on two types of data: (a) 7 datasets from the UCI machine learning repository: balance, glass, lenses, sonar, thyroid, vehicle and wine (Blake & Merz, 1998); and (b) the ORL human face dataset for face recognition (available at http://www.uk.research.att.com/facedatabase.html). These datasets differ in feature dimension (from 4 to 3280), sample size (from 30 to 800), number of classes (from 3 to 40) and physical meaning, so the experimental results over this range of datasets should be reliable.

The methods used in the following comparisons include:

Ed-NN: Euclidean distance based 1-nearest neighbor classifier;

Co-NN: Correlation measure based 1-nearest neighbor classifier;

PCA: Principal component analysis;

LDA: Linear discriminant analysis;

CCA: Canonical correlation analysis for classification; here we employ one-of-c label encoding (Johansson, 2001);

d-CDA: Correlation discriminant analysis with diagonal A;

f-CDA: Correlation discriminant analysis with full matrix A;

Ke-SVM: Gaussian kernel support vector machines (Vapnik, 1995) for classification; here we use a one-vs-others strategy to solve the multi-class problem by learning one binary classifier per class, and the Gaussian kernel parameters and C are decided by experiments;

Ke-RVM: Gaussian kernel relevance vector machines for classification (Tipping, 2001); the one-vs-others strategy is also used to design multi-class classifiers, and the Gaussian kernel parameters are decided by experiments.

The details of the experiments are reported separately below.

4.1 Experiments on UCI Data

All contrastive experiments are based on the identical partition of the training/test set for each UCI dataset. In each round of experiments, half of the total samples are randomly selected for training and the other samples are used for testing. The experiments are repeated 100 times; a minimal sketch of this protocol is given below.
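The evaluation protocol just described can be sketched as follows (our own code; the correlation-based 1-NN classifier is the Co-NN baseline, and the number of rounds and the random seed are illustrative):

```python
import numpy as np

def co_nn_accuracy(X_train, t_train, X_test, t_test):
    """1-nearest-neighbor classification with correlation similarity (the Co-NN classifier)."""
    A = X_train / np.linalg.norm(X_train, axis=1, keepdims=True)
    B = X_test / np.linalg.norm(X_test, axis=1, keepdims=True)
    pred = t_train[np.argmax(B @ A.T, axis=1)]   # label of the most-correlated training sample
    return float(np.mean(pred == t_test))

def random_split_eval(X, t, rounds=100, seed=0):
    """Average Co-NN accuracy over repeated random half/half training-test splits."""
    rng = np.random.default_rng(seed)
    accuracies = []
    for _ in range(rounds):
        perm = rng.permutation(len(t))
        half = len(t) // 2
        train, test = perm[:half], perm[half:]
        accuracies.append(co_nn_accuracy(X[train], t[train], X[test], t[test]))
    return float(np.mean(accuracies))
```

To evaluate d-CDA or f-CDA in this protocol, the transformation is learned on the training half only and then applied to both halves before the correlation 1-NN step.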


Table 1. Comparison of classification accuracies on the UCI datasets (%).

Method         balance   glass   lenses   sonar   thyroid   vehicle   wine
Ed-NN            78.59    68.45    76.64   83.14     95.04     68.59   94.79
Co-NN            81.30    68.94    78.72   84.19     89.44     69.52   94.38
LDA+Ed-NN        87.74    58.91    78.03   68.60     94.24     73.07   98.11
LDA+Co-NN        75.62    59.01    76.76   58.52     85.48     72.03   97.66
CCA+Ed-NN        87.50    57.24    76.09   71.52     92.83     73.26   97.35
d-CDA+Co-NN      84.34    71.23    81.81   84.56     90.65     71.91   95.90
f-CDA+Co-NN      87.93    72.55    82.72   81.38     91.29     74.35   97.04
Ke-SVM           91.76    74.27    83.85   83.98     96.88     78.01   95.80
Ke-RVM           95.85    73.15    76.94   80.90     93.45     76.58   96.65

The recognition results on all datasets are given in Table 1. We can observe that on almost all datasets except sonar, d-CDA and f-CDA outperform Co-NN, which demonstrates that including discriminant information in the correlation measure can improve performance. However, on the sonar dataset, f-CDA is worse than d-CDA. That is mainly because the feature dimension is 60 while the number of training samples is only 104, and f-CDA has about 1800 parameters to estimate; over-fitting therefore occurred on the training set and degraded performance on the test set.

We can also observe that the d-CDA+Co-NN framework achieves overall better performance than LDA+Co-NN, especially on the sonar and glass datasets. That is mainly due to the nonlinear distribution of these two datasets; obviously, the d-CDA and f-CDA algorithms can explore and utilize more information in this situation.

For the purpose of comparison, the table also gives results for the state-of-the-art kernel methods, kernel SVM and kernel RVM. Although it is not very fair to directly compare d-CDA and f-CDA with these two powerful nonlinear machine learning methods, we can see that on some datasets their performances are still comparable.

In terms of computational cost, both the optimization of d-CDA and that of f-CDA are gradient-based iterative processes. Specifically, for d-CDA, each iteration consists of computing the gradient $\nabla f(\rho)$ and the optimum step along the gradient direction; if the number of iterations is $m$, the total computation is $O(mdn^2)$. For f-CDA, the computation is much more complex, due to the gradient over the $d(d+1)/2$ parameters, the sum of $O(n^2)$ between-class and within-class correlations, and the eigen-decomposition of the full matrix, so the computation is about $O(m(d^2 n^3 + d^4))$. In the experiments, the running time of the d-CDA algorithm (in MATLAB, on a PIV 3.4 GHz) is less than 9 seconds for glass and about 26 seconds for sonar in one round; the corresponding times for f-CDA are about 70 seconds and 200 seconds. As mentioned earlier, the target function of CDA is not convex, so the gradient-based methods cannot guarantee reaching the global optimum. To partially address this problem, we run d-CDA and f-CDA several times with random initial values and select the best result, which also makes the training process a little longer.

4.2 Application of CDA to Face Recognition

The characteristics that distinguish face recognition from the previous classification problems are that there are more classes and fewer training samples. So in this experiment we want to evaluate the performance of CDA on problems with a large number of classes and a sparse data distribution.

The experiments are randomly repeated 10 times for ORL, and all contrastive experiments are based on the identical partition of the training/test set. For the ORL dataset, the training set size of 200 is far smaller than the dimension of the Gabor features, which is about 3000. To overcome this problem, all the algorithms are preceded by PCA and then performed in the transformed 120-dimensional PCA subspace, which accounts for 95% of the total energy. The comparison is shown in Table 2 (note that in this experiment, due to the optimization complexity, we only evaluate the PCA+d-CDA+Co-NN framework and do not evaluate f-CDA).

Table 2. Face recognition accuracies on the ORL database (%).

METHOD              ORL
PCA+Ed-NN           93.75
PCA+Co-NN           94.10
PCA+LDA+Ed-NN       97.7
PCA+LDA+Co-NN       98.5
CCA+Ed-NN           97.1
PCA+d-CDA+Co-NN     99.5

From Table 2, we can first confirm that the correlation measure-based methods are always better than the Euclidean metric-based methods, in both the PCA space and the PCA+LDA space, which coincides well with the conclusion of Kittler (2000). Besides this, we find a similar trend to the UCI experiments: by incorporating discriminant information, d-CDA achieves much better performance than the PCA+Co-NN framework, and better performance than the popular PCA+LDA+Co-NN framework for face recognition.
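For reference, the PCA+d-CDA+Co-NN pipeline of Table 2 can be sketched as follows (our own code; `fit_diag_cda` and `co_nn_accuracy` are the illustrative helpers sketched earlier and are assumed to be in scope, and the fixed 120-dimensional subspace stands in for the 95%-energy criterion):

```python
import numpy as np

def pca_fit(X, dim=120):
    """Fit a PCA projection with `dim` components (no whitening)."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:dim]                        # principal directions as rows

def pca_dcda_conn(X_train, t_train, X_test, t_test, dim=120):
    """PCA -> d-CDA -> correlation 1-NN, mirroring the PCA+d-CDA+Co-NN framework."""
    mean, P = pca_fit(X_train, dim)
    Z_train = (X_train - mean) @ P.T             # project into the PCA subspace
    Z_test = (X_test - mean) @ P.T
    rho = fit_diag_cda(Z_train, t_train)         # diagonal CDA weights learned on the training set only
    return co_nn_accuracy(Z_train * rho, t_train, Z_test * rho, t_test)
```

Weighting the PCA coordinates by rho and then applying correlation 1-NN is exactly the correlation of Eq. (13) with the learned diagonal A = diag(rho^2).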
5. Discussion and Some Further Research Topics

We have proposed a novel discriminant learning algorithm in correlation measure space, Correlation Discriminant Analysis (CDA). In this framework, based on the definitions of within-class correlation and between-class correlation, the optimum transformation can be sought to maximize the difference between them, which is empirically in accordance with good classification performance. For the different cases of the transformation, diagonal and full-matrix implementations of the algorithm are given. Extensive empirical evaluations of CDA demonstrate its advantage over alternative methods.

Although in this paper both d-CDA and f-CDA are limited to their linear transformation forms, they can in fact be generalized to the corresponding kernel versions via the kernel trick.


However, compared with the simple extension of CCA to kernel CCA and of LDA to kernel LDA, the extension of CDA to kernel CDA is not very easy. A method similar to that of (Yeung, to appear), which represents the projection matrix in the feature space, can be adopted to solve this problem.

Another interesting topic might be to combine sparse Bayesian learning (Tipping, 2001) with CDA to solve the over-fitting problem and at the same time fulfill the discriminant learning by CDA.

Acknowledgments

We would like to express our grateful thanks to the anonymous reviewers for their valuable comments and suggestions to improve the quality of our paper.

References

Barker, M. & Rayens, W. (2003). Partial least squares for discrimination. Journal of Chemometrics, 17, pp. 166-173.

Belhumeur, P. N., Hespanha, J. & Kriegman, D. (1997). Eigenfaces vs. Fisherfaces: recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(7), pp. 711-720.

Blake, C. L. & Merz, C. J. (1998). UCI repository of machine learning databases. URL http://www.ics.uci.edu/~mlearn/MLRepository.html.

Brown, M. P., Grundy, W. N., Lin, D., Cristianini, N., Sugnet, C. W., Furey, T. S. & Haussler, D. (2000). Knowledge-based analysis of microarray gene expression data by using support vector machines. Proceedings of the National Academy of Sciences, 97, pp. 262-267.

Fukunaga, K. (1990). Introduction to statistical pattern recognition (2nd edition). Academic Press, San Diego.

Golub, G. H. & Van Loan, C. F. (1996). Matrix computations. Johns Hopkins University Press.

Han, E. H., Karypis, G. & Kumar, V. (2001). Text categorization using weight adjusted k-nearest neighbor classification. 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), pp. 53-65.

Hardoon, D. R., Szedmak, S. & Shawe-Taylor, J. (2004). Canonical correlation analysis: an overview with application to learning methods. Neural Computation, 16, pp. 2639-2664.

Hillel, A. B., Hertz, T., Shental, N. & Weinshall, D. (2005). Learning a Mahalanobis metric from equivalence constraints. Journal of Machine Learning Research, 6, pp. 937-965.

Johansson, B. (2001). On classification: simultaneously reducing dimensionality and finding automatic representation using canonical correlation. Technical report LiTH-ISY-R-2375, ISSN 1400-3902, Linköping University.

Kittler, J., Li, Y. P. & Matas, J. (2000). On matching scores for LDA-based face verification. Proceedings of the British Machine Vision Conference (BMVC00), pp. 42-51.

Kumar, B. V. (1986). Minimum variance synthetic discriminant functions. Journal of the Optical Society of America A: Optics, Image Science, and Vision, 3(10), pp. 1579-1584.

Loog, M., Ginneken, B. V. & Duin, R. W. (2004). Dimensionality reduction by canonical contextual correlation projections. 8th European Conference on Computer Vision, pp. 562-573.

Mahalanobis, A., Kumar, B. V. & Casasent, D. (1987). Minimum average correlation energy filters. Applied Optics, 26(17), pp. 3633-3645.

Martinez, A. M. & Zhu, M. L. (2005). Where are linear feature extraction methods applicable? IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(12), pp. 1934-1944.

Peterson, M. R., Doom, T. E. & Raymer, M. L. (2005). Facilitated KNN classifier optimization with varying similarity measures. IEEE Congress on Evolutionary Computation, vol. 3, pp. 2514-2521.

Phillips, P. J., Moon, H., Rauss, P. J. & Rizvi, S. (2000). The FERET evaluation methodology for face recognition algorithms. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(10), pp. 1090-1104.

Refregier, P. (1990). Filter design for optical pattern recognition: multi-criteria optimization approach. Optics Letters, 15(15), pp. 854-856.

Rockafellar, R. (1970). Convex analysis. Princeton University Press.

Smith, R. S., Kittler, J., Hamouz, M. & Illingworth, J. (2006). Face recognition using angular LDA and SVM ensembles. Proceedings of the 18th International Conference on Pattern Recognition (ICPR'06), vol. 3, pp. 1008-1012.

Sun, T. K. & Chen, S. C. (2007). Class label- versus sample label-based CCA. Applied Mathematics and Computation (in press).

Tipping, M. E. (2001). Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1, pp. 211-244.

Vapnik, V. (1995). The nature of statistical learning theory. New York: Springer Verlag.

Xie, C. Y., Savvides, M. & Kumar, B. V. (2005). Redundant class-dependence feature analysis based on correlation filters using FRGC2.0 data. Proceedings of Computer Vision and Pattern Recognition, vol. 3, pp. 153-153.

Xing, E. P., Ng, A. Y., Jordan, M. I. & Russell, S. (2003). Distance metric learning, with application to clustering with side-information. Advances in Neural Information Processing Systems 15, pp. 521-528.

Yeung, D. Y. & Chang, H. (to appear). A kernel approach for semi-supervised metric learning. IEEE Transactions on Neural Networks.