Correlation Clustering for Learning Mixtures of Canonical Correlation Models

Correlation Clustering for Learning Mixtures of Canonical Correlation Models X. Z. Fern∗ C. E. Brodley† M. A. Friedl‡ Abstract multivariate statistical technique commonly used to This paper addresses the task of analyzing the correlation identify and quantify the correlation between two sets between two related domains X and Y . Our research is of random variables. Given a compound data set motivated by an Earth Science task that studies the rela- described by feature vectors ~x and ~y, CCA seeks to find tionship between vegetation and precipitation. A standard a linear transformation of ~x and a linear transformation statistical technique for such problems is Canonical Correla- of ~y such that the resulting two new variables are tion Analysis (CCA). A critical limitation of CCA is that it maximumly correlated. can only detect linear correlation between the two domains In Earth science research, CCA has been often that is globally valid throughout both data sets. Our ap- applied to examine whether there is a cause-and-effect proach addresses this limitation by constructing a mixture relationship between two domains or to predict the of local linear CCA models through a process we name cor- behavior of one domain based on another. For example, relation clustering. In correlation clustering, both data sets in [13] CCA was used to analyze the relationship are clustered simultaneously according to the data’s corre- between the monthly mean sea-level pressure (SLP) and lation structure such that, within a cluster, domain X and sea-surface temperature (SST) over the North Atlantic domain Y are linearly correlated in the same way. Each clus- in the months of December, January and February. This ter is then analyzed using the traditional CCA to construct analysis confirmed the hypothesis that atmospheric SLP local linear correlation models. We present results on both anomalies cause SST anomalies. artificial data sets and Earth Science data sets to demon- Because CCA is based on linear transformations, strate that the proposed approach can detect useful correla- the scope of its applications is necessarily limited. One tion patterns, which traditional CCA fails to discover. way to tackle this limitation is to use nonlinear canonical correlation analysis (NLCCA) [5, 8]. NLCCA applies 1 Introduction nonlinear functions to the original variables in order to extract correlated components from the two sets of vari- In Earth science applications, researchers are often ables. Although promising results have been achieved interested in studying the correlation structure between by NLCCA in some Earth science applications, it tends two domains in order to understand the nature of to be difficult to apply such techniques because of the the relationship between them. The inputs to our complexity of the model and the lack of robustness due correlation analysis task can be considered as two data to overfitting [5]. sets X and Y whose instances are described by feature In this paper we propose to use a mixture of lo- vectors ~x and ~y respectively. The dimension of ~x and cal linear correlation models to capture the correlation that of ~y do not need to be the same, although there structure between two sets of random variables (fea- must be a one-to-one mapping between instances of X tures). Mixtures of local linear models not only provide and instances of Y . Thus, it is often more convenient to an alternative solution to capturing nonlinear correla- consider these two data sets as one compound data set tions, but also have the potential to detect correlation whose instances are described by two feature vectors ~x patterns that are significant only in a part (a local re- and ~y. Indeed, throughout the remainder of this paper, gion) of the data. The philosophy of using multiple lo- we will refer to the input of our task as one data set, cal linear models to model global nonlinearity has been and the goal is to study how the two sets of features are successfully applied to other statistical approaches with correlated to each other. similar linearity limitations such as principal component Canonical Correlation Analysis (CCA) [4, 6] is a analysis [12] and linear regression [7]. Our approach uses a two-step procedure. Given a compound data set, ∗School of Elec. and Comp. Eng., Purdue University, West we propose to first solve a clustering problem that par- Lafayette, IN 47907, USA titions the data set into clusters such that each cluster †Dept. of Comp. Sci., Tufts University, Medford, MA 02155, contains instances whose ~x features and ~y features are USA linearly correlated. We then independently apply CCA ‡Dept. of Geography, Boston University, Boston, MA, USA to each cluster to form a mixture of correlation models The remainder of the paper is arranged as follows. that are locally linear. In Section 2, we review the basics of CCA. Section 3 in- In designing this two-step process, we need address troduces the intuitions behind our correlation clustering the following two critical questions. algorithm and formally describes the algorithm, which is then applied to artificially constructed data sets to 1. Assume we are informed a priori that we can model demonstrate its efficacy in finding correlation clusters the correlation structure using k local linear CCA from the data. Section 4 demonstrates how cluster en- models. How should we cluster the data in the semble techniques can be used to determine the num- context of correlation analysis? ber of clusters in the data and address the local optima problem of the k-means style correlation clustering al- 2. In real-world applications, we are rarely equipped gorithm. Section 5 explains our motivating application, with knowledge of k. How can we decide how many presents results, and describes how our domain expert clusters there are in the data or whether a global interprets the results. Finally, in Section 6 we conclude linear structure will suffice? the paper and discuss future directions. Note that the goal of clustering in the context of cor- 2 Basics of CCA relation analysis is different from traditional clustering. Given a data set whose instances are described by two In traditional clustering, the goal is to group instances feature vectors ~x and ~y, the goal of CCA is to find linear that are similar (as measured by certain distance or sim- transformations of ~x and linear transformations of ~y ilarity metric) together. In contrast, here we need to such that the resulting new variables are maximumly group instances based on how their ~x features and ~y fea- correlated. tures correlate to each other, i.e., instances that share In particular, CCA constructs a sequence of pairs similar correlation structure between the two sets of fea- of strongly correlated variables (u1, v1), (u2, v2), , tures should be clustered together. To differentiate this · · · (ud, vd) through linear transformations, where d is the clustering task from traditional clustering, we name it minimum dimension of ~x and ~y. These new variables correlation clustering1 and, in Section 3 we propose an ui’s and vi’s, named canonical variates (sometimes iterative greedy k-means style algorithm for this task. referred to as canonical factors). They are similar To address the second question, we apply the tech- to principal components in the sense that principal nique of cluster ensembles [2] to our correlation cluster- components are linear combinations of the original ing algorithm, which provides a user with a visualization variables that capture the most variance in the data and of the results that can be used to determine the proper in contrast canonical variates are linear combinations of number of clusters in the data. Note that our correla- the original variables that capture the most correlation tion clustering algorithm is a k-means style algorithm between two sets of variables. and as such may have many locally optimal solutions— To construct these canonical covariates, CCA first different initializations may lead to significantly differ- seeks to transform ~x and ~y into a pair of new variables ent clustering results. By using cluster ensembles, we u1 and v1 by the linear transformations: can also address the local optima problem of our clus- T T tering algorithm and find a stable clustering solution. u1 = ( ~a1) ~x, and v1 = (b~1) ~y To demonstrate the efficacy of our approach, we ~ apply it to both artificial data sets and real world Earth where the transformation vectors ~a1 and b1 are defined science data sets. Our results on the artificial data such that corr(u1, v1) is maximized subject to the 2 sets show that (1) the proposed correlation clustering constraint that both u1 and v1 have unit variance. ~ ~ algorithm is capable of finding a good partition of Once ~a1,b1; ; ~ai,bi are determined, we then find the · · · ~ the data when the correct k is used and (2) cluster next pair of transformations ai~+1 and bi+1 such that the T ~ T ensembles provide an effective tool for finding k. When correlation between (ai~+1) ~x and (bi+1) ~y is maximized applied to the Earth science data sets, our technique with the constraint that the resulting ui+1 and vi+1 are 3 detected significantly different correlation patterns in uncorrelated with all previous canonical variates. Note comparison to what was found via traditional CCA. that the correlation between ui and vi becomes weaker These results led our domain expert to highly interesting as i increases. Let ri represent the correlation between hypotheses that merit further investigation. the ith pair of canonical variates, we have ri ri+1. ≥ 2This constraint ensures unique solutions. 1Note that the term correlation clustering has also been used 3This constraint ensures that the extracted canonical variates by [1] as the name of a technique for traditional clustering.

Correlation Clustering for Learning Mixtures of Canonical Correlation Models

Canonical Correlation a Tutorial

A Tutorial on Canonical Correlation Analysis

A Call for Conducting Multivariate Mixed Analyses

A Probabilistic Interpretation of Canonical Correlation Analysis

Canonical Correlation Analysis and Graphical Modeling for Huaman

Cross-Modal Subspace Clustering Via Deep Canonical Correlation Analysis

The Geometry of Kernel Canonical Correlation Analysis

Lecture 7: Analysis of Factors and Canonical Correlations

Too Much Ado About Instrumental Variable Approach

Too Much Ado About Instrumental Variable Approach

Temporal Kernel CCA and Its Application in Multimodal Neuronal Data Analysis

Connections Between Canonical Correlation Analysis, Linear Discriminant Analysis, and Optimal Scaling