Manifold Alignment using Procrustes Analysis

Chang Wang [email protected]
Sridhar Mahadevan [email protected]
Computer Science Department, University of Massachusetts, Amherst, MA 01003 USA

Abstract

In this paper we introduce a novel approach to manifold alignment, based on Procrustes analysis. Our approach differs from "semi-supervised alignment" in that, when used with a suitable dimensionality reduction method, it results in a mapping that is defined everywhere rather than just on the training data points. We describe and evaluate our approach both theoretically and experimentally, providing results showing useful knowledge transfer from one domain to another. Novel applications of our method, including cross-lingual information retrieval and transfer learning in Markov decision processes, are presented.

1. Introduction

Manifold alignment is very useful in a variety of applications since it provides knowledge transfer between two seemingly disparate data sets. Sample applications include automatic machine translation, representation and control transfer between different Markov decision processes (MDPs), image comparison, and bioinformatics.

More precisely, suppose we have two data sets $S_1 = \{x_1, \ldots, x_m\}$ and $S_2 = \{y_1, \ldots, y_n\}$ for which we want to find a correspondence. Working with the data in its original form can be very difficult, as the data might lie in high dimensional spaces and the two sets might be represented by different features. For example, $S_1$ could be a collection of English documents, whereas $S_2$ is a collection of Arabic documents; it may therefore be difficult to directly compare documents from the two collections.

Even though the processing of high-dimensional data sets is challenging, in many cases the data source may only have a limited number of degrees of freedom, implying that the data set has a low intrinsic dimensionality. Similar to current work in the field, we assume that kernels for computing the similarity between data points in the original space are already given. In the first step, we map the data sets to low dimensional spaces reflecting their intrinsic geometries using a standard (nonlinear or linear) dimensionality reduction approach. For example, using a graph-based nonlinear dimensionality reduction method provides a discretized approximation to the manifolds, so the new representations characterize the relationships between points but not the original features. By doing this, we can compare the embeddings of the two sets instead of their original representations. Generally speaking, if two data sets $S_1$ and $S_2$ have similar intrinsic geometry structures, they have similar embeddings. In our second step, we apply Procrustes analysis to align the two low dimensional embeddings of the data sets based on a number of landmark points. Procrustes analysis, which has been used for statistical shape analysis and image registration of 2D/3D data (Luo et al., 1999), removes the translational, rotational and scaling components from one set so that the optimal alignment between the two sets can be achieved; a toy sketch of this step is given below.
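As a minimal illustration of that second step (not part of the original paper), the NumPy sketch below fabricates a 2D point set, distorts a copy of it by an invented translation, rotation angle, and scale factor, and recovers the alignment with the SVD-based Procrustes solution derived in Section 2.2:

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy "embedding" X, and a copy Y distorted by translation,
# rotation and scaling -- exactly the components Procrustes removes.
X = rng.normal(size=(50, 2))
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Y = 2.5 * X @ R + np.array([3.0, -1.0])

# Remove translation by centering both configurations.
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)

# Remove rotation and scale: with SVD(Y^T X) = U Sigma V^T, the optimal
# rotation is Q = U V^T and the optimal scale is k = tr(Sigma) / tr(Y^T Y).
U, s, Vt = np.linalg.svd(Yc.T @ Xc)
Q = U @ Vt
k = s.sum() / np.trace(Yc.T @ Yc)

print(np.allclose(Xc, k * Yc @ Q))  # True: the distortion is removed
```

Centering, one orthogonal matrix $Q$, and one scalar $k$ are the only operations ever applied to an embedding, which is why the internal shape of each manifold is preserved by the alignment.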
There is a growing body of work on manifold alignment. Ham et al. (Ham et al., 2005) align the manifolds by leveraging a set of correspondences. In their approach, they map the points of the two data sets to the same space by solving a constrained embedding problem, where the embeddings of the corresponding points from different sets are constrained to be identical. The work of Lafon et al. (Lafon et al., 2006) is based on a framework similar to ours. They use Diffusion Maps to embed the nodes of the graphs corresponding to the aligned sets, and then apply affine matching to align the resulting clouds of points.

Our approach differs from semi-supervised alignment (Ham et al., 2005) in that it results in a mapping that is defined everywhere rather than just on the known data points (provided a suitable dimensionality reduction method like LPP (He et al., 2003) or PCA is used). Recall that semi-supervised alignment is defined only on the known data points, and it is hard to extend to new test points (Bengio et al., 2004). Our method is also faster, since it requires computing eigendecompositions of much smaller matrices. Compared to affine matching, which changes the shape of one given manifold to achieve alignment, our approach keeps the manifold shape untouched. This property preserves the relationship between any two data points in each individual manifold during alignment. The computation times for affine matching and Procrustes analysis are similar; both run in $O(N^3)$ time, where $N$ is the number of instances.

Given the fact that dimensionality reduction approaches play a key role in our approach, we provide a theoretical bound for the difference between the subspaces spanned by the low dimensional embeddings of the two data sets. This bound analytically characterizes when the two data sets can be aligned well. In addition to the theoretical analysis of our algorithm, we also report on several novel applications of our alignment approach.

The rest of this paper is organized as follows. In Section 2 we describe the main algorithm. In Section 3 we explain the rationale underlying our approach, and prove a bound on the difference between the subspaces spanned by the low dimensional embeddings of the two data sets being aligned. We describe some novel applications and summarize our experimental results in Section 4. Section 5 provides some concluding remarks.

2. Manifold Alignment

2.1. The Problem

Given two data sets, along with additional pairwise correspondences between a subset of the training instances, we want to determine a correspondence between the remaining instances in the two data sets. Formally speaking, we have two sets $S_1 = S_1^l \cup S_1^u = \{x_1, \ldots, x_m\}$ and $S_2 = S_2^l \cup S_2^u = \{y_1, \ldots, y_n\}$, where the subsets $S_1^l$ and $S_2^l$ are in pairwise alignment. We want to find a mapping $f$, which is more precisely defined in Section 3.1, to optimally match the points between $S_1^u$ and $S_2^u$.

2.2. The Algorithm

Assume the kernel $K_i$ for computing the similarity between data points in each of the two data sets is already given. The algorithmic procedure is stated below; an illustrative code sketch follows it. For the sake of concreteness, Laplacian eigenmaps (Belkin et al., 2003) is used for dimensionality reduction in the procedure.

1. Constructing the relationship matrices:

   - Construct the weight matrices $W_1$ for $S_1$ and $W_2$ for $S_2$ using $K_i$, where $W_1(i, j) = K_1(x_i, x_j)$ and $W_2(i, j) = K_2(y_i, y_j)$.
   - Compute the Laplacian matrices $L_1 = I - D_1^{-0.5} W_1 D_1^{-0.5}$ and $L_2 = I - D_2^{-0.5} W_2 D_2^{-0.5}$, where $D_k$ is a diagonal matrix with $D_k(i, i) = \sum_j W_k(i, j)$ and $I$ is the identity matrix.

2. Learning low dimensional embeddings of the data sets:

   - Compute selected eigenvectors of $L_1$ and $L_2$ as the low dimensional embeddings of the data sets $S_1$ and $S_2$. Let $X$, $X_U$ be the $d$ dimensional embeddings of $S_1^l$ and $S_1^u$, and let $Y$, $Y_U$ be the $d$ dimensional embeddings of $S_2^l$ and $S_2^u$, where $S_1^l$ and $S_2^l$ are in pairwise alignment and $|S_1^l| = |S_2^l|$.

3. Finding the optimal alignment of $X$ and $Y$:

   - Translate the configurations in $X$, $X_U$, $Y$ and $Y_U$ so that $X$ and $Y$ have their centroids, $\sum_{i=1}^{|S_1^l|} X_i / |S_1^l|$ and $\sum_{i=1}^{|S_2^l|} Y_i / |S_2^l|$, at the origin.
   - Compute the singular value decomposition (SVD) of $Y^T X$, that is, $U \Sigma V^T = \mathrm{SVD}(Y^T X)$.
   - $Y^* = kYQ$ is the optimal mapping result that minimizes $\|X - Y^*\|_F$, where $\|\cdot\|_F$ is the Frobenius norm, $Q = UV^T$ and $k = \mathrm{trace}(\Sigma) / \mathrm{trace}(Y^T Y)$.

4. Applying $Q$ and $k$ to find correspondences between $S_1^u$ and $S_2^u$:

   - Compute $Y_U^* = k Y_U Q$.
   - For each element $x$ in $X_U$, its correspondence in $Y_U^*$ is $\arg\min_{y^* \in Y_U^*} \|y^* - x\|$.

Depending on the approach that we want to use, there are several variations of Step 1. For example, if we are using PCA, then we use the covariance matrices instead of the Laplacian matrices; similarly, if we are using LPP (He et al., 2003), then we construct the weight matrices $W_1^l$ for $D_1^l$ and $W_2^l$ for $D_2^l$ using $K_i$ and then learn the projections. Note that when PCA or LPP is used, the low dimensional embedding will be defined everywhere rather than just on the training points.
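The end-to-end procedure can be sketched as follows. This is an illustrative reconstruction, not the authors' code: it assumes a Gaussian (RBF) kernel as the given similarity kernel $K_i$, uses NumPy's dense eigensolver, and fabricates two data sets that trace the same circle under different feature representations, with the first `n_l` pairs serving as the landmark correspondences.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_weights(S, sigma=0.5):
    """Step 1: weight matrix W(i, j) = K(s_i, s_j), with a Gaussian kernel
    standing in for the given similarity kernel K_i."""
    sq_dists = ((S[:, None, :] - S[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def laplacian_embedding(S, d):
    """Steps 1-2: normalized Laplacian L = I - D^{-1/2} W D^{-1/2},
    embedded via the eigenvectors of its d smallest nontrivial eigenvalues."""
    W = rbf_weights(S)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(W.sum(axis=1)))
    L = np.eye(len(S)) - D_inv_sqrt @ W @ D_inv_sqrt
    _, vecs = np.linalg.eigh(L)      # eigenvalues in ascending order
    return vecs[:, 1:d + 1]          # drop the trivial constant eigenvector

# Two data sets tracing the same underlying circle, represented with
# different features: S1 lives in 2D, S2 in a randomly rotated 3D frame.
n, n_l, d = 120, 30, 2               # set size, number of landmarks, dim
t = rng.uniform(0.0, 2.0 * np.pi, n)
S1 = np.c_[np.cos(t), np.sin(t)]
R3 = np.linalg.qr(rng.normal(size=(3, 3)))[0]    # random 3D rotation
S2 = np.c_[np.cos(t), np.sin(t), np.zeros(n)] @ R3

E1 = laplacian_embedding(S1, d)
E2 = laplacian_embedding(S2, d)

# The first n_l points of each set are the landmark pairs S1^l, S2^l.
X, X_U = E1[:n_l], E1[n_l:]
Y, Y_U = E2[:n_l], E2[n_l:]

# Step 3: center every configuration on the landmark centroids, then
# solve for the optimal rotation Q = U V^T and scale k = tr(S)/tr(Y^T Y).
mx, my = X.mean(axis=0), Y.mean(axis=0)
X, X_U, Y, Y_U = X - mx, X_U - mx, Y - my, Y_U - my
U, s, Vt = np.linalg.svd(Y.T @ X)
Q, k = U @ Vt, s.sum() / np.trace(Y.T @ Y)

# Step 4: map the unlabeled part of S2 and match by nearest neighbor.
Y_U_star = k * Y_U @ Q
dists = ((X_U[:, None, :] - Y_U_star[None, :, :]) ** 2).sum(axis=-1)
match = dists.argmin(axis=1)         # match[i]: index in S2^u paired with x_i
print("fraction correctly matched:", (match == np.arange(len(X_U))).mean())
```

In practice one would replace the dense RBF graph with a k-nearest-neighbor graph and a sparse eigensolver for large data sets, but the Procrustes step itself is unchanged.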
3. Justification

In this section, we prove two theorems.

Case 1: If $\operatorname{trace}(Q^T Y^T X) \geq 0$, then the problem becomes
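The algebra behind this case analysis can be reconstructed from the quantities defined in Step 3 of the algorithm; the following sketch is standard orthogonal Procrustes reasoning, not text recovered from the paper. For centered $X$ and $Y$, an orthonormal $Q$, and a scale $k > 0$,

```latex
\begin{align*}
\|X - kYQ\|_F^2
  &= \operatorname{trace}\!\left((X - kYQ)^T (X - kYQ)\right) \\
  &= \operatorname{trace}(X^T X)
     - 2k \operatorname{trace}(Q^T Y^T X)
     + k^2 \operatorname{trace}(Y^T Y),
\end{align*}
```

where the last step uses $\operatorname{trace}(Q^T Y^T Y Q) = \operatorname{trace}(Y^T Y)$ for orthonormal $Q$. Only the middle term depends on $Q$, so when $\operatorname{trace}(Q^T Y^T X) \geq 0$ the problem reduces to maximizing that trace. Writing the SVD $Y^T X = U \Sigma V^T$ gives $\operatorname{trace}(Q^T Y^T X) = \operatorname{trace}(V^T Q^T U \Sigma) \leq \operatorname{trace}(\Sigma)$, with equality at $Q = UV^T$; setting the derivative with respect to $k$ to zero then yields $k = \operatorname{trace}(\Sigma)/\operatorname{trace}(Y^T Y)$, matching Step 3 of the algorithm.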