Learning Canonical Correlations of Paired Tensor Sets Via Tensor-To-Vector Projection
Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence

Haiping Lu
Institute for Infocomm Research, Singapore
[email protected]

Abstract

Canonical correlation analysis (CCA) is a useful technique for measuring the relationship between two sets of vector data. For paired tensor data sets, we propose a multilinear CCA (MCCA) method. Unlike existing multilinear variations of CCA, MCCA extracts uncorrelated features under two architectures while maximizing paired correlations. Through a pair of tensor-to-vector projections, one architecture enforces zero correlation within each set while the other enforces zero correlation between different pairs of the two sets. We take a successive and iterative approach to solve the problem. Experiments on matching faces of different poses show that MCCA outperforms CCA and 2D-CCA, while using much fewer features. In addition, the fusion of the two architectures leads to performance improvement, indicating complementary information.

1 Introduction

Canonical correlation analysis (CCA) [Hotelling, 1936] is a well-known method for analyzing the relations between two sets of vector-valued variables. It assumes that the two data sets are two views of the same set of objects and projects them into low-dimensional spaces where each pair is maximally correlated subject to being uncorrelated with other pairs. CCA has various applications such as information retrieval [Hardoon et al., 2004], multi-label learning [Sun et al., 2008; Rai and Daumé III, 2009] and multi-view learning [Chaudhuri et al., 2009; Dhillon et al., 2011].
Many real-world data are multi-dimensional, represented as tensors rather than vectors. The number of dimensions is the order of a tensor, and each dimension is a mode. Second-order tensors include matrix data such as 2-D images. Examples of third-order tensors are video sequences, 3-D images, and web graph mining data organized in three modes of source, destination and text [Kolda and Bader, 2009; Kolda and Sun, 2008].

To deal with multi-dimensional data effectively, researchers have attempted to learn features directly from tensors [He et al., 2005]. The motivation is to preserve data structure in feature extraction, obtain more compact representations, and process big data more efficiently, without reshaping tensors into vectors. Many multilinear subspace learning (MSL) algorithms [Lu et al., 2011] generalize linear algorithms such as principal component analysis and linear discriminant analysis to second/higher-order tensors [Ye et al., 2004a; 2004b; Yan et al., 2005; Tao et al., 2007; Lu et al., 2008; 2009a; 2009b]. There have also been efforts to generalize CCA to tensor data. To the best of our knowledge, there are two approaches.

One approach analyzes the relationship between two tensor data sets rather than vector data sets. The two-dimensional CCA (2D-CCA) [Lee and Choi, 2007] is the first work along this line to analyze relations between two sets of image data without reshaping into vectors. It was further extended to local 2D-CCA [Wang, 2010], sparse 2D-CCA [Yan et al., 2012], and 3-D CCA (3D-CCA) [Gang et al., 2011]. Under the MSL framework, these CCA extensions use the tensor-to-tensor projection (TTP) [Lu et al., 2011].

Another approach studies the correlations between two data tensors with one or more shared modes, because CCA can be viewed as taking two data matrices as input. This idea was first introduced in [Harshman, 2006]. Tensor CCA (TCCA) [Kim and Cipolla, 2009] was proposed later with two architectures for matching (3-D) video volume data of the same dimensions. The single-shared-mode and joint-shared-mode TCCAs can be viewed as transposed versions of 2D-CCA and classical CCA, respectively. They apply canonical transformations to the non-shared one/two axes of 3-D data, which can be considered as partial TTPs under MSL. Following TCCA, the multiway CCA in [Zhang et al., 2011] uses partial projections to find correlations between second-order and third-order tensors.

However, none of the multilinear extensions above takes an important property of CCA into account, i.e., CCA extracts uncorrelated features for different pairs of projections. To close this gap, we propose a multilinear CCA (MCCA) algorithm for learning canonical correlations of paired tensor data sets with uncorrelated features under two architectures. It belongs to the first approach mentioned above.

MCCA uses the tensor-to-vector projection (TVP) [Lu et al., 2009a], which will be briefly reviewed in Sec. 2. We follow the CCA derivation in [Anderson, 2003] to successively solve for elementary multilinear projections (EMPs) that maximize pairwise correlations while enforcing same-set or cross-set zero-correlation constraints in Sec. 3. Finally, we evaluate the performance on facial image matching in Sec. 4.

To reduce notation complexity, we present our work here only for second-order tensors, i.e., matrices, as a special case, although we have developed MCCA for general tensors of any order following [Lu et al., 2011].

2 Preliminary of CCA and TVP

2.1 CCA and zero-correlation constraint

CCA captures the correlations between two sets of vector-valued variables that are assumed to be different representations of the same set of objects [Hotelling, 1936]. For two paired data sets $\{\mathbf{x}_m \in \mathbb{R}^I\}$ and $\{\mathbf{y}_m \in \mathbb{R}^J\}$, $m = 1, \ldots, M$, with $I \neq J$ in general, CCA finds paired projections $\{\mathbf{u}_{x_p}, \mathbf{u}_{y_p}\}$, $\mathbf{u}_{x_p} \in \mathbb{R}^I$, $\mathbf{u}_{y_p} \in \mathbb{R}^J$, $p = 1, \ldots, P$, such that the canonical variates $\mathbf{w}_p$ and $\mathbf{z}_p$ are maximally correlated. The $m$th elements of $\mathbf{w}_p$ and $\mathbf{z}_p$ are denoted as $w_{p_m}$ and $z_{p_m}$, respectively:

$$w_{p_m} = \mathbf{u}_{x_p}' \mathbf{x}_m, \quad z_{p_m} = \mathbf{u}_{y_p}' \mathbf{y}_m, \qquad (1)$$

where $\mathbf{u}'$ denotes the transpose of $\mathbf{u}$. In addition, for $p \neq q$ ($p, q = 1, \ldots, P$), $\mathbf{w}_p$ and $\mathbf{w}_q$, $\mathbf{z}_p$ and $\mathbf{z}_q$, and $\mathbf{w}_p$ and $\mathbf{z}_q$ are all uncorrelated. Thus, CCA produces uncorrelated features both within each set and between different pairs of the two sets.

The solutions $\{\mathbf{u}_{x_p}, \mathbf{u}_{y_p}\}$ for this CCA problem can be obtained as eigenvectors of two related eigenvalue problems. However, a straightforward generalization of these results to tensors [Lee and Choi, 2007; Gang et al., 2011] does not give uncorrelated features while maximizing the paired correlations (to be shown in Figs. 3(b) and 3(c)). Therefore, we derive MCCA following the more principled approach of solving CCA in [Anderson, 2003], where CCA projection pairs are obtained successively through enforcing the zero-correlation constraints.
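For concreteness, the following minimal NumPy sketch (ours, not from the paper; the function name `cca` and the small ridge term are illustrative choices) estimates the CCA projections of Eq. (1) via the standard whitened cross-covariance SVD and then checks the zero-correlation property: the Gram matrices $\mathbf{W}'\mathbf{W}$ and $\mathbf{Z}'\mathbf{Z}$ of the canonical variates come out numerically diagonal.

```python
import numpy as np

def cca(X, Y, P):
    """Classical CCA via SVD of the whitened cross-covariance.

    X: M x I data matrix, Y: M x J data matrix (rows are paired samples).
    Returns Ux (I x P), Uy (J x P) so the canonical variates of Eq. (1)
    are w_p = X @ Ux[:, p] and z_p = Y @ Uy[:, p].
    """
    X = X - X.mean(axis=0)                 # zero-mean, as assumed in the text
    Y = Y - Y.mean(axis=0)
    Cxx = X.T @ X / (len(X) - 1)
    Cyy = Y.T @ Y / (len(Y) - 1)
    Cxy = X.T @ Y / (len(X) - 1)

    def inv_sqrt(C):
        # Inverse matrix square root, with a tiny ridge for stability.
        d, E = np.linalg.eigh(C + 1e-8 * np.eye(len(C)))
        return E @ np.diag(d ** -0.5) @ E.T

    Wx, Wy = inv_sqrt(Cxx), inv_sqrt(Cyy)
    A, corrs, Bt = np.linalg.svd(Wx @ Cxy @ Wy)
    return Wx @ A[:, :P], Wy @ Bt.T[:, :P], corrs[:P]

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))
Y = X[:, :5] @ rng.standard_normal((5, 6)) + 0.1 * rng.standard_normal((100, 6))
Ux, Uy, corrs = cca(X, Y, P=3)
W = (X - X.mean(0)) @ Ux                   # columns are the variates w_p
Z = (Y - Y.mean(0)) @ Uy                   # columns are the variates z_p
# Zero-correlation property: W'W and Z'Z are (numerically) diagonal.
print(np.round(W.T @ W, 3))
print(np.round(Z.T @ Z, 3))
print("canonical correlations:", np.round(corrs, 3))
```

In MCCA, this same zero-correlation requirement is imposed on the TVP-derived coordinate vectors instead, either within each set (Architecture I) or across the two sets (Architecture II), as defined below.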
2.2 TVP: tensor space to vector subspace

To extract uncorrelated features from tensor data directly, we use the TVP in [Lu et al., 2009a]. The TVP of a matrix $\mathbf{X} \in \mathbb{R}^{I_1 \times I_2}$ to a vector $\mathbf{r} \in \mathbb{R}^P$ consists of $P$ EMPs $\{\mathbf{u}_p, \mathbf{v}_p\}$, $\mathbf{u}_p \in \mathbb{R}^{I_1}$, $\mathbf{v}_p \in \mathbb{R}^{I_2}$, $p = 1, \ldots, P$. The $p$th EMP projects $\mathbf{X}$ to a scalar

$$r_p = \mathbf{u}_p' \mathbf{X} \mathbf{v}_p, \qquad (2)$$

and the TVP stacks the $P$ EMP outputs:

$$\mathbf{r} = \mathrm{diag}(\mathbf{U}' \mathbf{X} \mathbf{V}), \qquad (3)$$

where $\mathbf{U} = [\mathbf{u}_1, \ldots, \mathbf{u}_P]$ and $\mathbf{V} = [\mathbf{v}_1, \ldots, \mathbf{v}_P]$. It is a mapping from the original tensor space $\mathbb{R}^{I_1} \otimes \mathbb{R}^{I_2}$ into a vector subspace $\mathbb{R}^P$.

3.1 Problem formulation through TVP

In (second-order) MCCA, we consider two matrix data sets with $M$ samples each: $\mathbf{X}_m \in \mathbb{R}^{I_1 \times I_2}$ and $\mathbf{Y}_m \in \mathbb{R}^{J_1 \times J_2}$, $m = 1, \ldots, M$. $I_1$ and $I_2$ may not equal $J_1$ and $J_2$, respectively. We assume that their means have been subtracted so that they have zero mean, i.e., $\sum_{m=1}^{M} \mathbf{X}_m = \mathbf{0}$ and $\sum_{m=1}^{M} \mathbf{Y}_m = \mathbf{0}$. We are interested in paired projections $\{(\mathbf{u}_{x_p}, \mathbf{v}_{x_p}), (\mathbf{u}_{y_p}, \mathbf{v}_{y_p})\}$, $\mathbf{u}_{x_p} \in \mathbb{R}^{I_1}$, $\mathbf{v}_{x_p} \in \mathbb{R}^{I_2}$, $\mathbf{u}_{y_p} \in \mathbb{R}^{J_1}$, $\mathbf{v}_{y_p} \in \mathbb{R}^{J_2}$, $p = 1, \ldots, P$, for dimensionality reduction to vector spaces to reveal correlations between them. The $P$ projection pairs form matrices $\mathbf{U}_x = [\mathbf{u}_{x_1}, \ldots, \mathbf{u}_{x_P}]$, $\mathbf{V}_x = [\mathbf{v}_{x_1}, \ldots, \mathbf{v}_{x_P}]$, $\mathbf{U}_y = [\mathbf{u}_{y_1}, \ldots, \mathbf{u}_{y_P}]$, and $\mathbf{V}_y = [\mathbf{v}_{y_1}, \ldots, \mathbf{v}_{y_P}]$. The matrix sets $\{\mathbf{X}_m\}$ and $\{\mathbf{Y}_m\}$ are projected to $\{\mathbf{r}_m \in \mathbb{R}^P\}$ and $\{\mathbf{s}_m \in \mathbb{R}^P\}$, respectively, through the TVP (3) as

$$\mathbf{r}_m = \mathrm{diag}(\mathbf{U}_x' \mathbf{X}_m \mathbf{V}_x), \quad \mathbf{s}_m = \mathrm{diag}(\mathbf{U}_y' \mathbf{Y}_m \mathbf{V}_y). \qquad (4)$$

We can write the projection above according to the EMP (2) as

$$r_{m_p} = \mathbf{u}_{x_p}' \mathbf{X}_m \mathbf{v}_{x_p}, \quad s_{m_p} = \mathbf{u}_{y_p}' \mathbf{Y}_m \mathbf{v}_{y_p}, \qquad (5)$$

where $r_{m_p}$ and $s_{m_p}$ are the $p$th elements of $\mathbf{r}_m$ and $\mathbf{s}_m$, respectively.

We then define two coordinate vectors $\mathbf{w}_p \in \mathbb{R}^M$ and $\mathbf{z}_p \in \mathbb{R}^M$ for the $p$th EMP, where their $m$th elements are $w_{p_m} = r_{m_p}$ and $z_{p_m} = s_{m_p}$. Denote $\mathbf{R} = [\mathbf{r}_1, \ldots, \mathbf{r}_M]$, $\mathbf{S} = [\mathbf{s}_1, \ldots, \mathbf{s}_M]$, $\mathbf{W} = [\mathbf{w}_1, \ldots, \mathbf{w}_P]$, and $\mathbf{Z} = [\mathbf{z}_1, \ldots, \mathbf{z}_P]$. We have $\mathbf{W} = \mathbf{R}'$ and $\mathbf{Z} = \mathbf{S}'$. Here, $\mathbf{w}_p$ and $\mathbf{z}_p$ are analogous to the canonical variates in CCA [Anderson, 2003].

We propose MCCA with two architectures. Architecture I: $\mathbf{w}_p$ and $\mathbf{z}_p$ are maximally correlated, while for $q \neq p$ ($p, q = 1, \ldots, P$), $\mathbf{w}_q$ and $\mathbf{w}_p$, and $\mathbf{z}_q$ and $\mathbf{z}_p$, respectively, are uncorrelated. Architecture II: $\mathbf{w}_p$ and $\mathbf{z}_p$ are maximally correlated, while for $q \neq p$ ($p, q = 1, \ldots, P$), $\mathbf{w}_p$ and $\mathbf{z}_q$ are uncorrelated. Figure 1 is a schematic diagram showing MCCA and its two different architectures.

We extend the sample-based estimation of canonical correlations and variates in CCA to the multilinear case for matrix data using Architecture I.

Definition 1. Canonical correlations and variates for matrix data (Architecture I). For two matrix data sets with $M$ samples each, $\{\mathbf{X}_m \in \mathbb{R}^{I_1 \times I_2}\}$ and $\{\mathbf{Y}_m \in \mathbb{R}^{J_1 \times J_2}\}$, $m = 1, \ldots, M$, the $p$th pair of canonical variates is the pair
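To make the TVP-based projections of Eqs. (2)-(5) concrete, here is a small NumPy sketch (again ours, not the paper's algorithm; it uses random stand-ins for the learned EMPs rather than actual MCCA solutions) that applies Eq. (4) to two paired matrix sets, verifies the elementwise EMP form of Eq. (5), and assembles the coordinate vectors via $\mathbf{W} = \mathbf{R}'$ and $\mathbf{Z} = \mathbf{S}'$.

```python
import numpy as np

rng = np.random.default_rng(1)
M, P = 50, 4
I1, I2, J1, J2 = 10, 12, 8, 9            # I1, I2 need not equal J1, J2

# Paired matrix data sets {X_m} and {Y_m}, centered so each set sums to zero.
X = rng.standard_normal((M, I1, I2)); X -= X.mean(axis=0)
Y = rng.standard_normal((M, J1, J2)); Y -= Y.mean(axis=0)

# Random stand-ins for the P learned EMP pairs (columns of Ux, Vx, Uy, Vy).
Ux, Vx = rng.standard_normal((I1, P)), rng.standard_normal((I2, P))
Uy, Vy = rng.standard_normal((J1, P)), rng.standard_normal((J2, P))

# TVP of Eq. (4): r_m = diag(Ux' X_m Vx), s_m = diag(Uy' Y_m Vy).
R = np.stack([np.diag(Ux.T @ Xm @ Vx) for Xm in X], axis=1)  # P x M
S = np.stack([np.diag(Uy.T @ Ym @ Vy) for Ym in Y], axis=1)  # P x M

# The elementwise EMP form of Eq. (5) gives the same numbers.
r_11 = Ux[:, 0] @ X[0] @ Vx[:, 0]
assert np.isclose(r_11, R[0, 0])

# Coordinate vectors: W = R', Z = S'; column p holds w_p (resp. z_p).
W, Z = R.T, S.T                                              # M x P

# Architecture I would require the correlations within W (and within Z)
# to vanish off the diagonal; Architecture II would require the cross-
# correlation between W and Z to be diagonal. With random EMPs neither
# holds, which is exactly what the learning procedure must enforce.
print(np.round(np.corrcoef(W, rowvar=False), 2))
```

Learning EMPs that actually maximize the pairwise correlations while satisfying these zero-correlation constraints is the subject of the successive, iterative solution developed in Sec. 3.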