Cross-Modal Subspace Clustering via Deep Canonical Correlation Analysis
The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20)

Quanxue Gao,1 Huanhuan Lian,1∗ Qianqian Wang,1† Gan Sun2
1State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an 710071, China.
2State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, China.
[email protected], [email protected], [email protected], [email protected]
∗Corresponding author: Huanhuan Lian. †Corresponding author: Qianqian Wang.

Abstract

For cross-modal subspace clustering, the key point is how to exploit the correlation information between cross-modal data. However, most hierarchical and structural correlation information among cross-modal data cannot be well exploited due to its high-dimensional non-linear property. To tackle this problem, in this paper, we propose an unsupervised framework named Cross-Modal Subspace Clustering via Deep Canonical Correlation Analysis (CMSC-DCCA), which incorporates a correlation constraint with a self-expressive layer to make full use of information among the inter-modal data and the intra-modal data. More specifically, the proposed model consists of three components: 1) a deep canonical correlation analysis (Deep CCA) model; 2) a self-expressive layer; 3) Deep CCA decoders. The Deep CCA model consists of convolutional encoders and a correlation constraint. The convolutional encoders are used to obtain the latent representations of cross-modal data, while imposing the correlation constraint on the latent representations makes full use of the information of the inter-modal data. Furthermore, the self-expressive layer operates on the latent representations and constrains them to satisfy the self-expression property, so that the shared coefficient matrix can capture the hierarchical intra-modal correlations of each modality. Finally, the Deep CCA decoders reconstruct the data to ensure that the encoded features preserve the structure of the original data. Experimental results on several real-world datasets demonstrate that the proposed method outperforms the state-of-the-art methods.

Figure 1: The illustration shows the influence of the correlation constraint. Without the correlation constraint, the learned features of the two modalities in the subspace are quite different, resulting in a poor shared coefficient matrix S, because it is too difficult for a single matrix to reflect the different structures of the two modalities simultaneously. However, by combining the correlation constraint with a self-expressive layer, we can learn a better shared coefficient matrix S. X1 and X2 are the input data; Z1 and Z2 are the latent representations in the subspace; S1 and S2 are the self-expression coefficient matrices.

Introduction

In machine learning, classification (Liu, Tsang, and Müller 2017; Liu and Tsang 2017) and clustering are two core tasks. However, the clustering task has attracted considerable attention due to the fact that label information is difficult to obtain in real applications. Most traditional clustering methods mainly focus on the clustering problem of low-dimensional data. Generally, traditional clustering methods can be roughly divided into five categories: 1) non-negative matrix factorization (NMF) clustering (Akata, Thurau, and Bauckhage 2011); 2) multi-kernel learning strategies (Guo et al. 2014); 3) subspace clustering methods (Chaudhuri et al. 2009); 4) self-representation based methods (Yin et al. 2015); 5) graph constraint based methods (Xia et al. 2014; Nie, Cai, and Li 2017).

Among these methods, subspace and self-representation based methods have received a lot of attention and achieved remarkable results. However, these traditional methods adopt shallow, linear embedding functions to reveal the intrinsic structure of data, which cannot model the high-dimensional nonlinear characteristics of data very well. In particular, for subspace clustering methods, high-dimensional data can easily lead to the "curse of dimensionality".

To solve the "curse of dimensionality" problem, Elhamifar et al. (2013) propose the Sparse Subspace Clustering (SSC) method to reduce the dimension. Patel et al. (2015) propose Latent Space Sparse Subspace Clustering (LSSC), which simultaneously performs dimensionality reduction and sparse coding for SSC. With the development of neural networks, deep learning has been introduced to solve subspace clustering problems (Ji et al. 2017; Sun et al. 2018; Li et al. 2019). For instance, Ji et al. (2017) use stacked auto-encoders as their basic model and adopt self-expression to learn the affinity of data in the latent space for clustering. Deep Multi-modal Subspace Clustering (DMSC) (Abavisani and Patel 2018) presents a convolutional neural network (CNN) approach for unsupervised multi-modal subspace clustering. However, for cross-modal (Rasiwasia et al. 2010; Jia, Salzmann, and Darrell 2011) subspace clustering, there are only a few strong methods. For example, Zhang et al. (2007) propose to exploit cross-modal correlations to cluster data from two modalities. He et al. (2015) use cross-modal learning via a pairwise constraint and aim to find the common hidden structure of cross-modal data. These cross-modal methods reduce the semantic gap of the inter-modal data and improve clustering accuracy. However, they cannot well capture nonlinear features for cross-modal clustering.
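To make the self-expression idea behind SSC and its deep variants concrete: each sample is represented as a linear combination of the other samples lying in the same subspace. Up to the choice of norm and an optional noise term, the SSC objective (Elhamifar et al. 2013) can be written as

    \min_{C} \|C\|_{1} \quad \text{s.t.} \quad X = XC, \;\; \mathrm{diag}(C) = 0,

where the columns of X are the samples and C is the self-expression coefficient matrix. Spectral clustering is then run on the affinity matrix W = |C| + |C|^{\top}. Deep variants such as Ji et al. (2017) apply the same construction to latent features Z produced by an encoder, penalizing \|Z - ZC\|_{F}^{2} instead of enforcing exact self-expression.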
Moreover, cross-modal subspace clustering still confronts the following challenges:
• How should we consider the correlations of the inter-modal data and the intra-modal data simultaneously so as to learn a representative shared subspace representation and improve clustering accuracy?
• How can we guarantee that the features encoded by the deep network still reflect the structural distribution of the original data?

To overcome these challenges, we propose a novel Cross-Modal Subspace Clustering framework via Deep Canonical Correlation Analysis (CMSC-DCCA) to improve clustering performance. The proposed method consists of three parts: a deep canonical correlation analysis (Deep CCA) (Andrew et al. 2013) model, a self-expressive layer, and Deep CCA decoders. In the Deep CCA model, we map the high-dimensional nonlinear cross-modal data into latent representations through deep convolutional encoders; meanwhile, the correlation constraint maximizes the correlations of the inter-modal data. For the self-expressive layer, a self-expressive loss function is employed on the latent representations, which reveals the correlation information among the intra-modal data. In addition, we add the Deep CCA decoders to reconstruct the original data, which ensures the representations processed by the encoders can still well reflect the characteristics of the original data.

Our main contributions are summarized as follows:
• We propose a novel unsupervised cross-modal subspace clustering framework (CMSC-DCCA) that can simultaneously consider both the correlations of the inter-modal data and the distribution of the intra-modal data.
• We add Deep CCA decoders into our model to reconstruct data, which ensures that the encoded features can well reflect the overall structural distribution of the original data.
• We give an efficient algorithm to optimize the loss function and train the entire network at once, reducing the parameter computation. In addition, we perform spectral clustering on the shared coefficient matrix (a sketch of this step follows this list). Experiments show our model achieves the best clustering performance.
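The following is a minimal sketch of the spectral clustering step mentioned in the last contribution, assuming a learned shared coefficient matrix S is already available; the helper name cluster_from_shared_coefficients and the use of scikit-learn are illustrative choices, not the authors' released code.

    import numpy as np
    from sklearn.cluster import SpectralClustering

    def cluster_from_shared_coefficients(S, n_clusters):
        # Symmetrize the learned self-expression coefficients into a
        # valid (non-negative, symmetric) affinity matrix.
        W = 0.5 * (np.abs(S) + np.abs(S).T)
        sc = SpectralClustering(n_clusters=n_clusters, affinity="precomputed")
        # With affinity="precomputed", fit_predict takes the affinity matrix itself.
        return sc.fit_predict(W)

Subspace clustering pipelines often also threshold small entries of S before building W; that post-processing is omitted here for brevity.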
Related work

In real life, data are often described by different modalities. Therefore, cross-modal data processing is receiving more and more attention; it focuses on maximizing the correlations of the inter-modal data. In this section, we provide a brief review of correlation studies on cross-modal data.

Among cross-modal data correlation studies, the most representative method is based on Canonical Correlation Analysis (CCA) (Kim, Kittler, and Cipolla 2007). The main idea of CCA is to find the mapping vector of each modality in the subspace by maximizing the correlations of the inter-modal data. However, CCA can only capture the linear correlations of the inter-modal data, while in real applications the relationships between cross-modal data may be nonlinear. To solve this problem, some nonlinear CCA methods have been proposed, such as Kernel Canonical Correlation Analysis (KCCA) (Akaho 2006) and Locality Preserving CCA (LPCCA) (Sun and Chen 2007). However, these methods have high computational complexity; in addition, KCCA overfits easily, and it is relatively difficult to choose a suitable kernel function for it. The Deep Canonical Correlation Analysis (Deep CCA) (Andrew et al. 2013) method can learn, through a multi-layer deep network, complex nonlinear transformations of the inter-modal data such that the learned representations are highly correlated. The disadvantage of Deep CCA is that it cannot reconstruct data, which further results in poor clustering performance. Wang et al. (2016) further propose Deep Canonically Correlated Auto-encoders (DCCAE), which add an auto-encoder to reconstruct data. Deep Generalized Canonical Correlation Analysis (DGCCA) (Benton et al. 2019) learns a modal-independent representation for clustering, which reduces redundant information in the data. However, neither DCCAE nor DGCCA considers the correlations of the intra-modal data.
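For completeness, the quantity Deep CCA maximizes is the total canonical correlation between the two views, which equals the sum of singular values of T = Σ11^{-1/2} Σ12 Σ22^{-1/2} (Andrew et al. 2013). Below is a minimal NumPy sketch of that computation on a batch of latent features; the ridge term r and the function names are our own illustrative additions, the former for numerical stability.

    import numpy as np

    def total_canonical_correlation(Z1, Z2, r=1e-4):
        # Z1: (d1, n) and Z2: (d2, n) latent features, one column per sample.
        n = Z1.shape[1]
        Z1c = Z1 - Z1.mean(axis=1, keepdims=True)
        Z2c = Z2 - Z2.mean(axis=1, keepdims=True)
        # Regularized covariance estimates.
        S11 = Z1c @ Z1c.T / (n - 1) + r * np.eye(Z1.shape[0])
        S22 = Z2c @ Z2c.T / (n - 1) + r * np.eye(Z2.shape[0])
        S12 = Z1c @ Z2c.T / (n - 1)

        def inv_sqrt(M):
            # Inverse square root of a symmetric positive-definite matrix.
            w, V = np.linalg.eigh(M)
            return V @ np.diag(w ** -0.5) @ V.T

        T = inv_sqrt(S11) @ S12 @ inv_sqrt(S22)
        # The singular values of T are the canonical correlations.
        return np.linalg.svd(T, compute_uv=False).sum()

During training, the encoders are updated to maximize this value (equivalently, its negative serves as a loss), which is the role the correlation constraint plays in CMSC-DCCA.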