Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence
Semi-Supervised Learning with Manifold Fitted Graphs

Tongtao Zhang†, Rongrong Ji†‡, Wei Liu (IBM Research, USA), Dacheng Tao§, Gang Hua (Stevens Institute of Technology, USA)
†Columbia University, USA; ‡Xiamen University, China; §University of Technology Sydney, Australia
[email protected] [email protected] [email protected] [email protected] [email protected]
Abstract

In this paper, we propose a locality-constrained and sparsity-encouraged manifold fitting approach, aiming at capturing the locally sparse manifold structure into neighborhood graph construction by exploiting a principled optimization model. The proposed model formulates neighborhood graph construction as a sparse coding problem with a locality constraint, therefore achieving simultaneous neighbor selection and edge weight optimization. The core idea underlying our model is to perform a sparse manifold fitting task for each data point, so that close-by points lying on the same local manifold are automatically chosen to connect, and meanwhile the connection weights are acquired by simple geometric reconstruction. We term the novel neighborhood graph generated by our proposed optimization model an M-Fitted Graph, since such a graph stems from sparse manifold fitting. To evaluate the robustness and effectiveness of M-fitted graphs, we leverage graph-based semi-supervised learning as the testbed. Extensive experiments carried out on six benchmark datasets validate that the proposed M-fitted graph is superior to state-of-the-art neighborhood graphs in terms of classification accuracy using popular graph-based semi-supervised learning methods.

1 Introduction

In this paper, we investigate the problem of graph-based semi-supervised learning [Zhu and Goldberg, 2009], an emerging machine learning topic that has received broad research attention and has triggered a variety of practical applications, including document and image ranking [Zhou et al., 2003b], web-scale image search [Jing and Baluja, 2008][Liu et al., 2011a], image and video annotation [Jiang et al., 2012], protein classification [Weston et al., 2003], etc. To accomplish sensible semi-supervised learning, the key challenge is how to build a robust neighborhood graph capable of capturing the inherent manifold structure underlying the input data samples. After constructing such a graph, representative semi-supervised learning methods, e.g., [Zhu et al., 2003][Zhou et al., 2003a][Belkin et al., 2006][Liu and Chang, 2009], can be readily deployed by engaging this graph in proper smoothness regularization terms. A typical kind of graph is the kNN graph, which has been intensively used in a number of learning problems such as dimensionality reduction [Belkin and Niyogi, 2003] and spectral clustering [Ng et al., 2001], besides the semi-supervised learning studied in this paper. To construct a kNN graph on an input dataset X = {x_1, ..., x_n} ⊂ R^d with cardinality |X| = n, edges are established to connect each individual data point and its k nearest neighbors. The edge weights are then obtained by either parametric kernel functions (typically the Gaussian kernel) or nonparametric optimization procedures [Jebara et al., 2009]. However, kNN graphs do not explicitly explore or capture the intrinsically low-dimensional manifolds on which high-dimensional data samples reside, though they can approximate some manifolds existing in simple datasets.

Towards capturing the geometric structure hidden in data during graph construction, one recent trend is to explore the sparse subspace structure [Elhamifar and Vidal, 2009][Elhamifar and Vidal, 2010][Elhamifar and Vidal, 2011][Gao et al., 2010], which has turned out to be very effective for discovering the clusters formed in high-dimensional data such as face images. Specifically, [Elhamifar and Vidal, 2009][Elhamifar and Vidal, 2010] proposed to reconstruct each individual data point x_i using the rest of the points in X in a least squares sense. Very importantly, for each instance x_i a sparsity constraint is imposed to make such a geometric reconstruction find the most fitted subspace spanned by a few instances in X. In other words, the found subspace holds the shortest distance to the point x_i while keeping the subspace dimension as low as possible. This sparse subspace fitting problem can be relaxed to an ℓ1 minimization problem [Candés and Tao, 2005] and thus solved by conventional convex optimization techniques.
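For concreteness, the kNN graph construction with Gaussian-kernel edge weights described in the introduction can be sketched as follows. This is a minimal NumPy illustration, not code from the paper; the function name and the symmetrization rule (keep an edge if either endpoint selects it) are our own choices.

```python
import numpy as np

def knn_graph(X, k=5, sigma=1.0):
    """Build a kNN affinity graph with Gaussian-kernel edge weights.

    X: (n, d) array of n data points with d features.
    Returns a symmetric (n, n) affinity matrix W with zero diagonal.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances via the expansion ||a-b||^2.
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    D2 = np.maximum(D2, 0.0)              # guard against tiny negatives
    np.fill_diagonal(D2, np.inf)          # exclude self-loops
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D2[i])[:k]      # k nearest neighbors of x_i
        W[i, nbrs] = np.exp(-D2[i, nbrs] / (2.0 * sigma ** 2))
    return np.maximum(W, W.T)             # symmetrize the affinity matrix
```

As the paper notes, such a graph fixes k and the kernel bandwidth globally, which is exactly the rigidity the proposed approach aims to avoid.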
In addition, the seminal works [Gao et al., 2010] and [Elhamifar and Vidal, 2011] suggested similar geometric reconstruction schemes in kernel-induced feature spaces and on manifolds, respectively. In these works, the sparsity constraint is enforced as well.

The key computational challenge of the aforementioned sparse subspace fitting scheme [Elhamifar and Vidal, 2009] and its variants is that they require all, or at least a part, of the input dataset X to reconstruct every data point x_i, that is, minimizing an ℓ1 problem (‖·‖_1 denotes the ℓ1 norm) of n − 1 variables:

    min_{a_i ∈ R^{n−1}} ‖a_i‖_1   s.t.  x_i = X̄_i a_i.    (1)
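Eq. (1) can indeed be handled by conventional convex optimization: the standard trick splits a = u − v with u, v ≥ 0, so that ‖a‖_1 = Σ(u + v) and the problem becomes a linear program. The sketch below (assuming SciPy's linprog is available; this is not the authors' implementation) makes that reduction explicit, with X̄_i formed by stacking all points except x_i as columns.

```python
import numpy as np
from scipy.optimize import linprog

def sparse_subspace_fit(X, i):
    """Solve eq. (1): min ||a||_1  s.t.  x_i = Xbar_i a.

    X is d x n with data points as columns; Xbar_i drops column i.
    Cast as an LP via the split a = u - v, u >= 0, v >= 0, so that
    ||a||_1 = sum(u + v).
    """
    d, n = X.shape
    Xbar = np.delete(X, i, axis=1)        # basis of the other n-1 points
    c = np.ones(2 * (n - 1))              # objective: 1^T u + 1^T v
    A_eq = np.hstack([Xbar, -Xbar])       # Xbar u - Xbar v = x_i
    res = linprog(c, A_eq=A_eq, b_eq=X[:, i], bounds=(0, None))
    z = res.x
    return z[: n - 1] - z[n - 1 :]        # recover a = u - v
```

Even in this small form, the LP has 2(n − 1) variables per data point, which illustrates the scalability concern raised next.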
In reconstructing x_i, this ℓ1 problem involves a basis X̄_i = [x_1, ..., x_{i−1}, x_{i+1}, ..., x_n] composed of the n − 1 instances in X except the target instance x_i. First of all, the convex relaxation from the ℓ0 norm to the ℓ1 norm cannot guarantee that the resulting solution a_i (i.e., the subspace reconstruction coefficient vector) is always meaningful: the ℓ1 minimization may lead to a dense a_i and thus violate the sparsity expectation of the subspace reconstruction coefficients. Second, solving the ℓ1 problem in eq. (1) is nontrivial and time-consuming for high-dimensional or large-scale data. Third, the sparse subspace fitting scheme neglects the locality of data and is hence likely to introduce irrelevant instances that are outliers or belong to other classes with respect to x_i. Finally and most critically, sparse subspace fitting assumes that a multi-subspace structure exists in the data, which is not always true for real-world data. It is noted that [Cheng et al., 2009] considered the locality constraint when performing sparse subspace fitting for neighborhood graph construction, but it still adopted the multi-subspace assumption.

In this paper, we argue that a multi-manifold structure is more natural than the multi-subspace structure for realistic datasets; the latter can be regarded as an extreme case of the multi-manifold structure.

In order to overcome the potential issues associated with sparse subspace fitting, in this work we propose a locality-constrained and sparsity-encouraged reconstruction approach, namely Sparse Manifold Fitting, which can capture the locally sparse manifold structure and thus discover the global multi-manifold structure. The proposed approach yields a novel neighborhood graph called the M-Fitted Graph. In sparse manifold fitting, we directly design an ℓ0 norm-based optimization model to perform sparse coding with a locality constraint that restricts the coding within the scope of a limited number of neighbors, thereby ensuring optimization efficiency.

Our M-fitted graphs can benefit the many applications which previously used kNN graphs. To validate the robustness and effectiveness of M-fitted graphs, we adopt graph-based semi-supervised learning as the testbed, since its classification accuracy heavily depends on the quality of the graph used. Through extensive experiments, we find that our M-fitted graphs are very suitable for semi-supervised learning and exhibit superior performance over state-of-the-art neighborhood graphs when cooperating with representative semi-supervised learning techniques, e.g., Local and Global Consistency (LGC) [Zhou et al., 2003a], Linear Neighborhood Propagation (LNP) [Wang and Zhang, 2008], and Gaussian Fields and Harmonic Functions [Zhu et al., 2003].

In the remainder of this paper, we first present the proposed M-fitted graphs, then illustrate their usage in the context of graph-based semi-supervised learning, and finally show the quantitative experimental results.

2 M-Fitted Graphs

[Figure 1: The best fitted local affine subspace F_i (red dashed line) consists of two local directions (x_{i1} − x_i)/‖x_{i1} − x_i‖ and (x_{i4} − x_i)/‖x_{i4} − x_i‖, and holds the minimal distance to x_i.]

Notations. We define the input data matrix as X = [x_1, ..., x_n] ∈ R^{d×n}, with d feature dimensions and n instances. We formulate the neighborhood graph to be constructed as an affinity matrix W ∈ R^{n×n}, where w_ij represents the edge weight between vertices x_i and x_j. We denote by M the manifold(s) underlying the input dataset X, by {T_i}_{i=1}^n the local tangent spaces, by {U_i}_{i=1}^n the local direction basis matrices, and by {F_i}_{i=1}^n the local affine subspaces.

Geometric Idea. Following the well-known manifold learning approach Local Tangent Space Alignment [Zhang and Zha, 2004], we assume that the manifold(s) (possibly multiple manifolds) underlying the input data collection X are composed of and merged from a series of local tangent spaces. Each tangent space, denoted by T_i, lies under its data point x_i. Mathematically, we can write M = ∪_{i=1}^n T_i if n → ∞, where a local tangent space T_i is actually a small patch of a local manifold from M. Our goal is then to seek such a tangent space located at each point. We employ the geometric sparsity idea proposed in [Elhamifar and Vidal, 2011] to approximately seek the local tangent spaces. Specifically, we reconstruct each data point x_i using sparse bases from a local direction basis U_i, and the minimal reconstruction distortion is pursued to yield the optimal local directions that determine the optimally fitted local affine subspace F_i for x_i. As a result, this sparse reconstruction gives rise to the local affine subspace F_i with the minimal angle to the target local tangent space T_i, therefore discovering the best fitted local manifold.

Through sparse reconstruction, we can build a neighborhood graph in which the graph edges correspond to the solved local directions. Under such a circumstance, the edges connecting to x_i are closely parallel to the local tangent space T_i and thus closely reside on the local manifold at x_i.

Formulation. Our proposed sparse reconstruction starts from setting up a relatively large neighborhood of m (m
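Although the Formulation is truncated here, the local direction basis U_i suggested by the Geometric Idea paragraph and Figure 1 (unit directions from x_i toward its m nearest neighbors) can be sketched as follows. This construction is our reading of the text, not the authors' code, with X stored as a d × n matrix per the Notations paragraph.

```python
import numpy as np

def local_direction_basis(X, i, m):
    """Form a local direction basis U_i for sparse manifold fitting.

    X: d x n data matrix with points as columns.
    Columns of U_i are the unit directions (x_j - x_i) / ||x_j - x_i||
    toward the m nearest neighbors of x_i (cf. Figure 1); a sparse
    subset of these columns spans the fitted local affine subspace F_i.
    """
    diffs = X - X[:, [i]]                  # x_j - x_i for all j
    dist = np.linalg.norm(diffs, axis=0)
    dist[i] = np.inf                       # exclude x_i itself
    nbrs = np.argsort(dist)[:m]            # m nearest neighbors of x_i
    U = diffs[:, nbrs] / dist[nbrs]        # normalize each direction column
    return U, nbrs
```

Selecting a sparse subset of these m columns to reconstruct x_i, as the Geometric Idea paragraph describes, is what turns the chosen neighbors into graph edges.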