
TANGENT SPACE INTRINSIC MANIFOLD REGULARIZATION FOR DATA REPRESENTATION

Shiliang Sun

Department of Computer Science and Technology, East China Normal University
500 Dongchuan Road, Shanghai 200241, P. R. China

Thanks to NSFC Project 61075005 and Shanghai Knowledge Service Platform Project (No. ZF1213) for funding, and to Prof. Tong Zhang from Rutgers University for helpful discussions. Email: [email protected]

ABSTRACT

A new regularization method called tangent space intrinsic manifold regularization is presented, which is intrinsic to the data manifold and favors linear functions on the manifold. Fundamental elements involved in its formulation are local tangent space representations, which we estimate by local principal component analysis, and the connections which relate adjacent tangent spaces. We exhibit its application to data representation, where a nonlinear embedding in a low-dimensional space is found by solving an eigen-decomposition problem. Experimental results, including comparisons with state-of-the-art techniques, show the effectiveness of the proposed method.

Index Terms— Regularization, tangent space, manifold learning, dimensionality reduction, data representation

1. INTRODUCTION

In this paper, we present a new data-dependent regularization method named tangent space intrinsic manifold regularization for function learning, which is motivated largely by the geometry of the data-generating distribution. In particular, data lying in a high-dimensional space are assumed to be intrinsically of low dimensionality. That is, the data can be well characterized by far fewer parameters or degrees of freedom than the actual ambient representation. This setting is usually referred to as manifold learning, and the distribution of the data is regarded to live on or near a low-dimensional manifold. The validity of manifold learning, especially for high-dimensional image and text data, has already been demonstrated by recent developments [1, 2, 3]. Although our new regularization method has the potential to be applied to a variety of pattern recognition and machine learning problems, here we only consider unsupervised data representation or dimensionality reduction.

The problem of representing data in a low-dimensional space for the sake of data visualization and organization is essentially a dimensionality reduction problem. Given a data set {x_i}_{i=1}^k with x_i ∈ R^d, its task is to deliver a data set {f_i}_{i=1}^k, where f_i ∈ R^m corresponds to x_i and m ≪ d. Classical linear dimensionality reduction methods include principal component analysis (PCA) and metric multidimensional scaling (MDS) [4]. Recent nonlinear dimensionality reduction methods have achieved great success, especially for representing data that obey the manifold assumption, e.g., successfully unfolding the intrinsic degrees of freedom of gradual changes of images and videos. Representative algorithms in this category include locally linear embedding [2], isomap [3], Laplacian eigenmaps [1], Hessian eigenmaps [5], maximum variance unfolding with semidefinite programming [6], and others [7]. Some of the above linear and nonlinear dimensionality reduction methods can be characterized as spectral methods, because computationally they often contain the procedure of eigen-decomposition of an appropriately constructed matrix and then exploit the top or bottom eigenvectors [6].

The principle of regularization has its root in mathematics for solving ill-posed problems, and is widely used in pattern recognition and machine learning. Many well-known algorithms, e.g., SVMs [8, 9], ridge regression and the lasso [10, 11], can be interpreted as instantiations of the idea of regularization. The regularization method presented in this paper is intrinsic to the data manifold and prefers linear functions on the manifold. Namely, functions with constant manifold derivatives are more appealing. Fundamental elements involved in the regularization formulation are local tangent space representations and the connections which relate adjacent tangent spaces. When applying this regularization principle to nonlinear dimensionality reduction with data modeled as adjacency graphs, it turns out that the resultant problem can be readily solved by eigen-decomposition. We illustrate that this regularization method can obtain good and reasonable data embedding results.
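To make the spectral-method pattern mentioned above concrete (eigen-decompose an appropriately constructed matrix, then keep the top or bottom eigenvectors), the following minimal NumPy sketch casts classical PCA in that form. The sketch is ours, not code from the paper, and the function name pca_embed is purely illustrative.

```python
import numpy as np

def pca_embed(X, m):
    """Classical PCA as a spectral method: eigen-decompose a constructed
    matrix (here the covariance) and keep the top-m eigenvectors."""
    Xc = X - X.mean(axis=0)                      # center the data
    C = Xc.T @ Xc / len(X)                       # d x d covariance matrix
    vals, vecs = np.linalg.eigh(C)               # eigenvalues in ascending order
    top = vecs[:, np.argsort(vals)[::-1][:m]]    # top-m eigenvectors as columns
    return Xc @ top                              # k x m embedding
```

Methods such as Laplacian eigenmaps follow the same pattern but use the bottom (nonzero) eigenvectors of a graph-based matrix instead of the top eigenvectors of a covariance.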
2. THE PROPOSED REGULARIZATION

We are interested in estimating a function f(x) defined on M ⊂ R^d, where M is a smooth manifold in R^d. We assume that f(x) can be well approximated by a linear function with respect to the manifold M. Let m be the dimensionality of M. At each point z ∈ M, f(x) can be represented locally around z as a linear function f(x) ≈ b_z + w_z^⊤ u_z(x) + o(‖x − z‖²), where u_z(x) = T_z(x − z) is an m-dimensional vector representing x in the tangent space around z, and T_z is an m × d matrix that projects x around z to a representation in the tangent space of M at z. Note that in this paper the basis for T_z is computed using local PCA for its simplicity and wide applicability. In particular, the point z and its neighbors are sent over to the regular PCA procedure, and the top m eigenvectors are returned as the rows of the matrix T_z. The weight vector w_z ∈ R^m is an m-dimensional vector, and it is also the manifold derivative of f(x) at z with respect to the u_z(·) representation on the manifold, which we write as ∇_T f(x)|_{x=z} = w_z.
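As a concrete illustration of the local PCA step just described, the following NumPy sketch estimates a tangent basis T_z at every data point. This is our own sketch rather than the paper's code; the neighborhood size n_neighbors and the helper name local_tangent_bases are illustrative assumptions, since the paper does not fix how neighbors are chosen.

```python
import numpy as np

def local_tangent_bases(X, m, n_neighbors=10):
    """Estimate an m x d tangent basis T_z at each data point by local PCA:
    the point and its nearest neighbors are centered, and the top-m
    eigenvectors of the local covariance become the rows of T_z,
    so that T_z T_z^T = I."""
    k, d = X.shape
    T = np.zeros((k, m, d))
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise distances
    for i in range(k):
        idx = np.argsort(dists[i])[:n_neighbors + 1]   # the point itself plus its neighbors
        local = X[idx] - X[idx].mean(axis=0)           # center the local patch
        vals, vecs = np.linalg.eigh(local.T @ local)   # local (scaled) covariance
        T[i] = vecs[:, np.argsort(vals)[::-1][:m]].T   # rows span the estimated tangent space
    return T
```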
Mathematically, a linear function with respect to the manifold M, which is not necessarily a globally linear function in R^d, is a function that has constant manifold derivative. However, this does not mean that w_z is a constant function of z, due to the different coordinate systems when the "anchor point" z changes from one point to another. This needs to be compensated using "connections" that map a coordinate representation u_{z'} to u_z for any z' near z.

To see how our approach works, we assume for simplicity that T_z is an orthogonal matrix for all z: T_z T_z^⊤ = I_{m×m}. This means that if x ∈ M is close to z ∈ M, then x − z ≈ T_z^⊤ T_z(x − z) + O(‖x − z‖²). Now consider x that is close to both z and z'. We can express f(x) both in the tangent space representation at z and at z', which gives

b_z + w_z^\top u_z(x) \approx b_{z'} + w_{z'}^\top u_{z'}(x) + O(\|x - z'\|^2 + \|x - z\|^2).

That is, b_z + w_z^⊤ u_z(x) ≈ b_{z'} + w_{z'}^⊤ u_{z'}(x). This means that b_z + w_z^⊤ T_z(x − z) ≈ b_{z'} + w_{z'}^⊤ T_{z'}(x − z'). Setting x = z, we obtain b_z ≈ b_{z'} + w_{z'}^⊤ T_{z'}(z − z'), and

b_{z'} + w_{z'}^\top T_{z'}(z - z') + w_z^\top T_z(x - z) \approx b_{z'} + w_{z'}^\top T_{z'}(x - z').

This implies that

w_z^\top T_z(x - z) \approx w_{z'}^\top T_{z'}(x - z) \approx w_{z'}^\top T_{z'} T_z^\top T_z(x - z) + O(\|x - z'\|^2 + \|x - z\|^2).   (1)

Since (1) holds for arbitrary x ∈ M close to z ∈ M, it follows that w_z^⊤ ≈ w_{z'}^⊤ T_{z'} T_z^⊤ + O(‖z − z'‖), or equivalently w_z ≈ T_z T_{z'}^⊤ w_{z'} + O(‖z − z'‖).

This means that if we expand at points z_1, ..., z_k ∈ Z, and denote the neighbors of z_j as N(z_j), then the correct regularizer will be

R(\{b_z, w_z\}_{z \in Z}) = \sum_{i=1}^{k} \sum_{j \in N(z_i)} \Big[ \big(b_{z_i} - b_{z_j} - w_{z_j}^\top T_{z_j}(z_i - z_j)\big)^2 + \gamma \big\| w_{z_i} - T_{z_i} T_{z_j}^\top w_{z_j} \big\|_2^2 \Big].   (2)

3. GENERALIZATION AND REFORMULATION

Relating data with a discrete weighted graph is popular, especially in graph-based machine learning methods. It also makes sense for us to generalize the regularizer in (2) using a symmetric weight matrix W constructed from the above data collection Z. Entries in W characterize the closeness of different points, where the points are often called nodes in the terminology of graphs. Usually there are two steps involved in constructing a weighted graph. The first step builds an adjacency graph by putting an edge between two "close" points. People can choose to use a parameter ε ∈ R or a parameter n ∈ N to determine close points, which means that two nodes would be connected if their Euclidean distance is within ε, or if either node is among the n nearest neighbors of the other as indicated by the Euclidean distance. The second step calculates weights on the edges of the graph with a certain similarity measure. For example, the heat-kernel method computes the weight W_ij for two connected nodes i and j by W_ij = exp(−‖x_i − x_j‖²/t) with parameter t > 0, while for nodes not directly connected the weights are zero [1].

Therefore, the generalization of the tangent space intrinsic manifold regularizer turns out to be

R(\{b_z, w_z\}_{z \in Z}) = \sum_{i=1}^{k} \sum_{j=1}^{k} W_{ij} \Big[ \big(b_{z_i} - b_{z_j} - w_{z_j}^\top T_{z_j}(z_i - z_j)\big)^2 + \gamma \big\| w_{z_i} - T_{z_i} T_{z_j}^\top w_{z_j} \big\|_2^2 \Big].   (4)

Now we reformulate the regularizer (4) into a canonical matrix quadratic form to facilitate subsequent formulations on data representation. In particular, we would like to rewrite the regularizer as a quadratic form in terms of a symmetric matrix S as follows,

R(\{b_z, w_z\}_{z \in Z}) = [b_{z_1}, \ldots, b_{z_k}, w_{z_1}^\top, \ldots, w_{z_k}^\top] \begin{bmatrix} S_1 & S_2 \\ S_2^\top & S_3 \end{bmatrix} [b_{z_1}, \ldots, b_{z_k}, w_{z_1}^\top, \ldots, w_{z_k}^\top]^\top,   (5)

where the block matrix composed of S_1, S_2, S_2^⊤ and S_3 is a block representation of S, and the size of S_1 is k × k. In this paper, we suppose that w_{z_i} (i = 1, ..., k) is an m-dimensional vector. Thus, the size of S is (k + mk) × (k + mk). Due to the space limit, we omit the detailed derivations of S_1, S_2 and S_3, which will be made available on the internet.
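The two-step graph construction and the generalized regularizer (4) can be made concrete with the following NumPy sketch. It is our illustration, not code from the paper: the function names, the n-nearest-neighbor rule, and the direct O(k²) evaluation are assumed choices, whereas the paper folds the same quadratic form into the matrix S of (5) and handles the resulting data representation problem by eigen-decomposition.

```python
import numpy as np

def heat_kernel_weights(X, n_neighbors=10, t=1.0):
    """Two-step graph construction: an n-nearest-neighbor adjacency graph
    followed by heat-kernel weights; pairs that are not connected stay zero."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = np.zeros_like(dists)
    for i in range(len(X)):
        for j in np.argsort(dists[i])[1:n_neighbors + 1]:        # skip the point itself
            W[i, j] = W[j, i] = np.exp(-dists[i, j] ** 2 / t)    # heat-kernel weight
    return W

def tangent_space_regularizer(b, w, T, Z, W, gamma=1.0):
    """Naive O(k^2) evaluation of the generalized regularizer in Eq. (4).
    b: (k,) offsets, w: (k, m) tangent-space weights, T: (k, m, d) bases,
    Z: (k, d) expansion points, W: (k, k) symmetric graph weights."""
    R = 0.0
    k = len(Z)
    for i in range(k):
        for j in range(k):
            if W[i, j] == 0.0:
                continue
            fit = b[i] - b[j] - w[j] @ (T[j] @ (Z[i] - Z[j]))   # first-order consistency term
            conn = w[i] - T[i] @ T[j].T @ w[j]                  # connection term between tangent spaces
            R += W[i, j] * (fit ** 2 + gamma * float(conn @ conn))
    return R
```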