
Mathematical Advances in Manifold Learning

Nakul Verma University of California, San Diego [email protected]

June 03, 2008

Abstract

Manifold learning has recently gained a lot of interest by machine learning practitioners. Here we provide a mathematically rigorous treatment of some of the techniques in unsupervised learning in the context of manifolds. We will study the problems of dimension reduction and density estimation and present some recent results in terms of fast convergence rates when the data lie on a manifold.

1 Introduction

With the increase in the amount of data, both in terms of the number of observations as well as the number of measurements, traditional learning algorithms are now faced with new challenges. One may expect that more data should lead to more accurate models; however, a large collection of irrelevant and correlated features just adds to the computational burden of the algorithm, without helping much to solve the task at hand. This makes the learning task especially difficult. In an attempt to alleviate such problems, a new model in terms of manifolds for finding relevant features and representing the data by a few parameters is gaining interest in the machine learning and signal processing communities.

Most common examples of superficially high dimensional data are found in the fields of data mining and computer vision. Consider the problem of estimating the face and body pose in humans. Knowing where a person is looking gives a wealth of information to an automated agent regarding where the object of interest is – whether the person wants to interact with the agent or whether she is conversing with another person. The task of deciding where someone is looking seems quite challenging given the fact that the agent is only receiving a large array of pixels. However, knowing that a person's orientation only has one degree of freedom, the relevant information in this data can be expressed by just a single number – the angle of the turn, i.e. the orientation of the body.

In a typical learning scenario the task is slightly more complicated, as the agent only gets to see a few samples from which it somehow needs to interpolate and generalize to various possible scenarios. In our example this translates to the agent only having access to a few of the body poses, from which it needs to predict where the person is looking. Thus the agent is faced with the difficulty of finding an appropriate (possibly non-linear) transformation to represent this data compactly. Manifold learning can be broadly described as the study of algorithms that use and infer the properties of data that is sampled from an underlying manifold.

The goal of this survey is to study different mathematical techniques by which we can estimate some global properties of a manifold from a few samples. We will start by studying random projections as a nonadaptive linear dimensionality reduction procedure, which provides a probabilistic guarantee on preserving the interpoint distances between all points on a manifold. We will then focus on analyzing the spectrum of the Laplace-Beltrami operator on functions on a manifold for finding non-linear mappings and simplifying its structure. Lastly we will look at kernel density estimation to estimate high density regions on a manifold.

It is worth mentioning that our survey is by no means comprehensive and we simply highlight some of the recent theoretical advances in manifold learning. Most notably we do not cover the topics of regularization, regression and clustering of data belonging to manifolds. In the topic of dimensionality reduction, we are skipping the analysis of classic techniques such as LLE (Locally Linear Embedding), Isomap and their variants.

1.1 Preliminaries

We begin by introducing our notation, which we will use throughout the paper.


Definition 1. We say a map f : U → V is a diffeomorphism if it is smooth¹ and invertible with a smooth inverse.

Definition 2. A subset M ⊂ R^D is said to be a smooth n-manifold if M is locally diffeomorphic to R^n, that is, at each p ∈ M we can find an open neighborhood U ⊂ R^D such that there exists a diffeomorphism between U ∩ M and R^n.

It is always helpful to have a picture in mind. See figure 1 for an example of a 1-manifold in R^3. Notice that locally any small segment of the manifold "looks like" an interval in R^1.

Figure 1: A 1-manifold in R^3.

Definition 3. A tangent space at a point p ∈ M, denoted by T_pM, is the affine subspace formed by the collection of all tangent vectors to M at p.

For the purposes of this survey we will restrict ourselves to the discussion of manifolds whose tangent space at each point is equipped with an inner product. Such manifolds are called Riemannian manifolds and allow us to define various notions of length, angles, curvature, etc. on the manifold.

Since we will largely be dealing with samples from a manifold, we need to define

Definition 4. A sequence x_1, ..., x_n ⊂ M ⊂ R^D is called independent and identically distributed (i.i.d.) when each x_i is picked independently from a fixed distribution D over M.

With this mathematical machinery in hand, we can now demonstrate that manifolds incorporate a wide array of important examples – we present two such examples that serve as a motivation to study these objects.

1.2 Some examples of manifolds

Movement of a robotic arm: Consider the problem of modelling the movement of a robotic arm with two joints (see figure 2). For simplicity let's restrict the movement to the 2D-plane. Since there are two degrees of freedom, intuitively one should suspect that the movement should trace out a 2-manifold. We now confirm this in detail.

Figure 2: Movement of a robot's arm traces out a 2-manifold in R^4.

Let's denote the fixed shoulder joint as the origin, the position of the elbow joint as (x_1, y_1), and the position of the wrist as (x_2, y_2). To see that the movement of the robotic arm traces out a 2-manifold, consider the map f : R^4 → R^2 defined as

    (x_1, y_1, x_2, y_2) ↦ (x_1² + y_1², (x_2 − x_1)² + (y_2 − y_1)²).

Clearly M ⊂ R^4, such that M = f^{-1}(b², a²), is the desired manifold. We can verify that locally M is diffeomorphic to R^2 by looking at its derivative map

    Df = 2 [ x_1        y_1        0          0
             x_1 − x_2  y_1 − y_2  x_2 − x_1  y_2 − y_1 ]

and observing that it has maximal rank for non-degenerate values of a and b.

Set of orthogonal n × n matrices: We present this example to demonstrate that manifolds are not only good for representing physical processes with small degrees of freedom but also help us better understand some of the abstract objects which we regularly encounter. Consider the problem of understanding the geometry of the set of orthonormal matrices in the space of real n × n matrices. Note that the set of n × n orthonormal matrices is also called the orthogonal group, and is denoted by O(n). We claim that this set forms an n(n − 1)/2-manifold in R^{n²}.

To see this, consider the map f : R^{n²} → R^{n(n+1)/2} defined by A ↦ A^T A. Now M ⊂ R^{n²} such that M = f^{-1}(I_{n×n}) is exactly O(n). To see that M is in fact a manifold, observe that the derivative map Df_A · B = B^T A + A^T B is regular.

¹ Recall that a function is smooth if all its partial derivatives ∂^n f / ∂x_{i_1} ... ∂x_{i_n} exist and are continuous.
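Returning to the robotic-arm example, the claim about the derivative map can be checked numerically. Below is a minimal sketch in NumPy (the particular arm configuration is an arbitrary illustration, not taken from the text): it evaluates f and Df at a non-degenerate configuration and checks that the Jacobian has rank 2.

```python
import numpy as np

def f(x1, y1, x2, y2):
    # Constraint map from the text: it records the two squared link lengths.
    return np.array([x1**2 + y1**2,
                     (x2 - x1)**2 + (y2 - y1)**2])

def Df(x1, y1, x2, y2):
    # Derivative map of f, as written in the text.
    return 2 * np.array([[x1,      y1,      0.0,     0.0],
                         [x1 - x2, y1 - y2, x2 - x1, y2 - y1]])

# A generic (non-degenerate) configuration: elbow at (1, 0), wrist at (1, 1).
p = (1.0, 0.0, 1.0, 1.0)
print(f(*p))                          # the values (b^2, a^2) at this configuration
print(np.linalg.matrix_rank(Df(*p)))  # 2, so f^{-1}(b^2, a^2) is locally a 2-manifold
```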

Observe that the examples above required us to know the mapping f a priori. However, in the context of machine learning, the task is typically to estimate properties of M without having access to f.

1.3 Outline

The paper is organized as follows. We will discuss some linear and non-linear dimensionality reduction methods on manifolds, with a special focus on random projections, in section 2. We will then study Laplacian Eigenmaps as a process to simplify manifold structure in section 3, followed by nonparametric density estimation techniques on manifolds in section 4. We will finally conclude by discussing the significance of the results and some directions for future work in section 5.

2 Random projections for linear dimension reduction

Dimension reduction is an important preprocessing step in data analysis that has been studied extensively. Here we provide the motivation for why dimension reduction of data is desirable. We briefly discuss different techniques that have been employed for dimension reduction on data coming from an underlying manifold and examine a recently analyzed technique of random projections.

2.1 Dimensionality reduction

We know that learning algorithms scale poorly with the dimension of the data. This makes dimension reduction a popular preprocessing step – first map the data into a lower dimensional space while preserving the relevant information, and then run the regular learning algorithms in the smaller projected space. One reasonable criterion to measure the quality of our low dimensional mapping is to test how well the mapping preserves pairwise distances. The basic intuition is that the distances between points in space relate to the dissimilarity between the corresponding observations. Thus, it is undesirable that two points that are far apart in the original space get mapped close to each other by performing a dimension reduction. Similarly, we would not want points that were close originally to get mapped far apart.

As one might expect, finding a mapping that preserves all distances of an arbitrary dataset can be a difficult task. Luckily in our case, the saving grace comes from observing that the data has a manifold structure. We are only required to preserve distances between points that lie on the manifold and not the whole ambient space.

2.1.1 Dimension reduction of manifold data

In the past decade, numerous methods for manifold dimension reduction have been proposed. The classic techniques such as Locally Linear Embeddings (LLE) and Isomaps, and newer ones such as Laplacian Eigenmaps and Hessian Eigenmaps, all share a common intuition – all these methods try to capture the local manifold geometry by constructing the adjacency graph on the sampled data. They all benefit from the observation that inference done on this neighborhood graph corresponds approximately to inference on the underlying manifold. For a comprehensive survey we refer the readers to [8].

Note that these methods are examples of non-linear dimensionality reduction techniques on manifolds. However, we will present a linear dimension reduction technique that works surprisingly well on manifolds. The goal is to find a linear map Φ : R^D → R^d, preferably with d ≪ D, which when applied to the data preserves all interpoint distances. More formally, we want to give guarantees of the form: for all x, y ∈ M, ‖x − y‖ ≈ ‖Φx − Φy‖.

2.1.2 Issues with principal component analysis

Arguably the most popular linear dimension reduction technique is Principal Component Analysis (PCA). The main idea is to find an affine subspace of a specified dimension that captures the maximum amount of variance in the data. It turns out that this optimization problem can be solved efficiently in closed form, and the desired optimal subspace is given by the span of the top d eigenvectors (corresponding to the top eigenvalues) of the covariance matrix of the data [17].

Unfortunately PCA, like all deterministic linear methods, is not suited for asserting global distance preservation guarantees on all pairwise points. One can easily construct examples where distances among far away points in the original space get collapsed in the projected space (see figure 3). Instead we will look at projecting the data onto a random subspace.
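Before moving on to random projections, here is a minimal sketch of the PCA projection just described (the toy data and the target dimension are arbitrary illustrations): center the data, take the top d eigenvectors of the covariance matrix, and project onto their span.

```python
import numpy as np

def pca_project(X, d):
    """Project the rows of X onto the span of the top-d eigenvectors
    of the sample covariance (the subspace capturing maximum variance)."""
    Xc = X - X.mean(axis=0)                          # center the data
    cov = np.cov(Xc, rowvar=False)                   # D x D covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    top = eigvecs[:, np.argsort(eigvals)[::-1][:d]]  # top-d eigenvectors as columns
    return Xc @ top                                  # m x d projected coordinates

X = np.random.randn(200, 10)    # 200 toy points in D = 10 dimensions
Y = pca_project(X, d=2)
print(Y.shape)                  # (200, 2)
```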


Figure 3: PCA projection can sometimes collapse distances between faraway points, making it an undesirable choice for distance preserving dimension reduction.

Figure 4: Normals of a manifold. Note that the normals (dotted lines) of a particular length incident at each point of the manifold (solid curve) will intersect if the manifold is too curvy.

2.1.3 Random projections of manifolds

As the name suggests, random projection is concerned with projecting the data onto a random subspace of a fixed dimension d. We will be able to conclude that if the data lie on a manifold M, then with high probability, projecting the data down to a sufficiently large random subspace approximately preserves all interpoint distances. At first glance this result appears very counter-intuitive – after all, how can projecting the data onto a random subspace, which doesn't even take the samples into account, have the capability to preserve distances?

The starting point of such a counter-intuitive result is the much celebrated theorem of Johnson and Lindenstrauss, which states that any point-set of size m in R^D can be embedded in R^{O(log m)} with small distortion by using a linear map. Moreover, this linear map is essentially a random subspace of the desired embedding dimension.

We can leverage this result and get the basic proof outline for preserving distances on a manifold [2]:

1. We will show that not just a pointset, but an entire subspace can be preserved by a random projection.

2. We will show that distances between points within a small region of the manifold can be approximated by a subspace, and thus are well preserved.

3. By taking an ε-net of suitable resolution over the manifold, distances between points that are far away are also well preserved.

We can now provide the results in detail². We will start by defining one extra piece of notation which will help our discussion.

Definition 5 ([26]). The condition number of a manifold M is 1/τ, if the normals of length r < τ at any two distinct points p, q ∈ M don't intersect.

Look at figure 4 to see the normals of a manifold. Notice that long non-intersecting normals are possible only if the manifold is relatively flat. Hence the condition number of M gives us a handle on how curvy M can be.

Lemma 6 (Johnson-Lindenstrauss [19], [12]). For any 0 < ε < 1 and any integer m, let d be a positive integer such that d = Ω(ln m / ε²). Then for any set V of m points in R^D, there is a linear map Φ : R^D → R^d such that for all x, y ∈ V,

    (1 − ε) ≤ ‖Φx − Φy‖² / ‖x − y‖² ≤ (1 + ε).

A projection onto a random subspace (of d dimensions) will satisfy this with high probability.

Proof. Let Φ(x) = √(D/d) · R^T x, where R is a D × d Gaussian random matrix with entries γ_ij ~ N(0, 1) i.i.d. Note that R^T x (for a fixed x) is distributed as a Gaussian random vector, and from concentration properties of Gaussians it follows that

1. Pr[ ‖R^T x‖² ≥ (1 + ε) (d/D) ‖x‖² ] ≤ e^{−Ω(dε²)},

2. Pr[ ‖R^T x‖² ≤ (1 − ε) (d/D) ‖x‖² ] ≤ e^{−Ω(dε²)}.

This immediately implies that (with high probability)

    ‖Φx − Φy‖² = ‖Φ(x − y)‖² = (D/d) ‖R^T(x − y)‖² ≤ (1 + ε) (D/d)(d/D) ‖x − y‖² = (1 + ε) ‖x − y‖².

Similarly we can also assert that ‖Φx − Φy‖² ≥ (1 − ε) ‖x − y‖². Now, requiring this property to hold for all pairwise distances between the m points, a simple application of the union bound gives the desired result.

² For clarity of the exposition we only provide a proof sketch here and refer the readers to the original papers for detailed proof arguments.
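The following sketch illustrates Lemma 6 empirically (the manifold, a circle sitting in R^D, and all sizes are illustrative choices; the normalization of the random map is one common convention and may differ from the proof above by a fixed scale factor): it projects sampled points with a Gaussian random matrix and compares pairwise distances before and after.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
D, d, m = 1000, 50, 30

# m points on a 1-manifold (a circle) sitting in the first two coordinates of R^D.
angles = rng.uniform(0, 2 * np.pi, m)
X = np.zeros((m, D))
X[:, 0], X[:, 1] = np.cos(angles), np.sin(angles)

# Random linear map Phi(x) = R^T x / sqrt(d) with i.i.d. Gaussian entries,
# normalized so that E||Phi x||^2 = ||x||^2.
R = rng.standard_normal((D, d))
Y = X @ R / np.sqrt(d)

ratios = [np.linalg.norm(Y[i] - Y[j]) / np.linalg.norm(X[i] - X[j])
          for i, j in combinations(range(m), 2)]
print(min(ratios), max(ratios))   # distance ratios concentrate around 1 (tighter as d grows)
```

Note that the projection never looks at the data: the same random matrix works, with high probability, for any point set of this size.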

Lemma 7 (subspace preservation [1]). Let L be an n-dimensional affine subspace of R^D. Pick ε, δ > 0 and d ≥ Ω( (n/ε²) log(1/ε) + (1/ε²) log(1/δ) ). If Φ is a random subspace of d dimensions, then with probability > 1 − δ, we have that for all x ∈ L,

    (1 − ε) √(d/D) ‖x‖ ≤ ‖Φx‖ ≤ (1 + ε) √(d/D) ‖x‖.

Proof. By positive homogeneity of norms, it suffices to prove the result for vectors of length 1. Let V be an ε/4-cover of a ball B of radius 1. Note that B can be covered by an ε/4-net of size ≤ (12/ε)^n. Applying lemma 6 from above with distortion ε/2 immediately yields, for all v ∈ V (with high probability),

    (1 − ε/2) ‖v‖² ≤ ‖Φv‖² ≤ (1 + ε/2) ‖v‖².

Let A be the smallest number such that ‖Φx‖ ≤ (1 + A) ‖x‖ for all x ∈ L with ‖x‖ ≤ 1 (suppressing the common scale factor of √(d/D)). Note that

    ‖Φx‖ ≤ ‖Φv‖ + ‖Φ(x − v)‖ ≤ 1 + ε/2 + (1 + A) ε/4.

Now since A is the smallest such number, we have that A ≤ ε/2 + (1 + A) ε/4, or equivalently A ≤ (3ε/4)/(1 − ε/4) ≤ ε. Similarly we can obtain a lower bound, yielding the desired result.

Lemma 8 (effects on close-by points [2]). Suppose S = M ∩ B, where the ball B has radius r. Pick δ, ε > 0 and d = Ω( (n/ε²) log(1/ε) + (1/ε²) log(1/δ) ). If r ≤ (ετ/4) √(d/D) and Φ is a random projection to d dimensions, then with probability > 1 − δ, for all x, y ∈ S,

    (1 − ε) √(d/D) ≤ ‖Φx − Φy‖ / ‖x − y‖ ≤ (1 + ε) √(d/D).

Proof. Since we have chosen S small enough, pick any p ∈ S and consider its tangent space T_p. For any x ∈ S, let x̄ be its projection onto T_p and x⊥ = x − x̄. Note that for any x, y ∈ S, we have that ‖x⊥ − y⊥‖ / ‖x − y‖ ≤ r/τ.

Now by applying the subspace preservation lemma to T_p, we have that (with high probability)

    ‖Φx − Φy‖ ≤ ‖Φx̄ − Φȳ‖ + ‖Φx⊥ − Φy⊥‖
              ≤ ‖x̄ − ȳ‖ √(d/D) (1 + ε/2) + ‖x⊥ − y⊥‖
              ≤ ‖x − y‖ √(d/D) (1 + ε/2) + ‖x − y‖ r/τ
              ≤ ‖x − y‖ √(d/D) (1 + ε).

Similarly we can bound ‖Φx − Φy‖ ≥ ‖x − y‖ √(d/D) (1 − ε), giving us the desired result.

Theorem 9 (manifold preservation [2]). Suppose M is a compact n-dimensional manifold in R^D with condition number 1/τ. Suppose that for all ε > 0, M has an ε-cover of size ≤ N_0 (1/ε)^n. Pick any ε, δ > 0 and d = Ω( (n/ε²) log(D/(ετ)) + (1/ε²) log(N_0/δ) ). Let Φ be a random subspace of d dimensions. Then with probability > 1 − δ, for all x, y ∈ M,

    (1 − ε) √(d/D) ≤ ‖Φx − Φy‖ / ‖x − y‖ ≤ (1 + ε) √(d/D).

Proof. For ε_0 = (ε²τ/128) √(d/D), let µ_1, ..., µ_N be an ε_0-cover of M. Note that N < N_0 (1/ε_0)^n.

Let B_i be a ball of radius (ετ/4) √(d/D) centered at µ_i; we can apply lemma 8 to B_1, ..., B_N to have distances within each B_i preserved up to (1 ± ε).

Pick any x, y ∈ M. If ‖x − y‖ ≤ (ετ/8) √(d/D), then x, y ∈ B_i and thus the projected distances are preserved. If ‖x − y‖ > (ετ/8) √(d/D), let µ_i and µ_j be their closest representatives. Then

    ‖Φx − Φy‖ ≤ ‖Φµ_i − Φµ_j‖ + ‖Φx − Φµ_i‖ + ‖Φy − Φµ_j‖
              ≤ ‖µ_i − µ_j‖ √(d/D) (1 + ε/2) + ε_0 √(d/D) (1 + ε) + ε_0 √(d/D) (1 + ε)
              ≤ (‖x − y‖ + 2ε_0) √(d/D) (1 + ε/2) + 2ε_0 √(d/D) (1 + ε)
              ≤ ‖x − y‖ √(d/D) (1 + ε).

Similarly we can find a lower bound, yielding the final result.

2.2 Discussion

Random projection of manifolds was first considered in [2] and the result was later improved in [10]. Note that the methodology of random projections provides a nonadaptive dimensionality reduction approach for manifold learning, where the projection map is independent of the actual data. The result presented is significant because data-independent projections are rarely seen in the manifold learning literature. It is also worth mentioning that the main result on the minimum number of dimensions needed bears a strong resemblance to results seen in the area of Compressed Sensing for encoding sparse vectors (see [15] for more details), and some of the ideas presented in [2] are borrowed from the Compressed Sensing literature.

Note that manifold learning practitioners are more interested in geodesic distances (distances along the manifold) rather than the standard Euclidian distances considered in the analysis above. The result of theorem 9 is easily extendible to geodesic distances by considering limits of sums of Euclidian distances [2].

Observe that the result presented here is a worst case analysis; it gives us an estimate of the minimum number of dimensions needed to preserve all interpoint distances within a factor of (1 ± ε). It would be interesting to see a bound on the number of dimensions needed to preserve distances in an average sense.

3 Laplacian Eigenmaps for simplifying manifold structure

Laplacian Eigenmaps was recently proposed as a simple and intuitive algorithm for providing a low dimensional representation of data lying on a manifold. Like many manifold learning algorithms, it finds a low dimensional representation by performing computations on the adjacency graph of the sampled data. The basic intuition is that the graph constructed from the samples serves as a discrete approximation to the manifold, and inference based on the graph should correspond to the desired inference on the underlying manifold. What sets Laplacian Eigenmaps apart is that the choice of weights used in constructing the graph and the subsequent spectral analysis is formally justified as a process which "simplifies" the manifold structure.

In contrast to random projections, which explicitly attempt to preserve all pairwise distances, the optimization criterion of Laplacian Eigenmaps only incorporates the condition to preserve local distances. It turns out that the solution to this optimization criterion has a remarkable property of smoothing the manifold structure. More precisely, as we will observe in the following sections, this mapping has the property of reducing the curvature of high-curvature regions, transforming the manifold into a smoother, more manageable object.

3.1 Desirability of simple structure

As mentioned before, Laplacian Eigenmaps provide a non-linear mapping that, in essence, smooths out high curvature regions of the manifold. The power and success of such a mapping comes from noting that such regions can be thought of as eccentricities in the collected data. Thus smoothing out these regions should provide good generalization ability on manifolds.

Consider, for instance, a typical machine learning task of discriminating two classes on a manifold. Due to the inherent curvy manifold structure, it is difficult to find a simple classifier that can separate the classes. However, by first mapping the data via Laplacian Eigenmaps, one can find a simple classifier that can separate the classes well. See figure 5.

Figure 5: Laplacian Eigenmaps (the map x ↦ (f_1(x), ..., f_d(x)) from R^D to R^d) take a manifold of complex structure to a relatively simpler structure. This is beneficial for many learning tasks; the task of discriminating two classes on a manifold becomes easier, for instance.

3.2 Geometric derivation of Laplacian Eigenmaps

Suppose we want to map M to a line such that nearby points get mapped close together. Let f : M → R be such a map. Then for any x ∈ M and y in the neighborhood of x, we would like |f(x) − f(y)| to be bounded in terms of the original geodesic distance d_M(x, y). Let l = d_M(x, y); then using the Taylor expansion around x,

    |f(x) − f(y)| ≤ l ‖∇f(x)‖ + o(l).

Thus ‖∇f(x)‖ provides us with an estimate of how far apart f maps nearby points. Hence in order to preserve distances, one should look for a map f that minimizes this quantity over all x ∈ M. One sensible minimizing criterion (in the "sum-squared" sense) is argmin_{‖f‖=1} ∫_M ‖∇f(x)‖². Note that

    ∫_M ‖∇f(x)‖² = ∫_M ⟨∇f, ∇f⟩ = ∫_M ⟨f, ∆f⟩,

where ∆ is defined to be the Laplace-Beltrami operator on M. Thus minimizing this objective function is the same as minimizing ∫_M f ∆f. Notice that this quantity has the same functional form as the Rayleigh quotient (with ‖f‖₂ = 1). Hence the problem reduces to finding the eigenfunctions corresponding to the lowest eigenvalues of ∆ [3].

This argument can be generalized for mappings to R^d. For a compact M, the optimal d-dimensional embedding is given by the map x ↦ (f_1(x), ..., f_d(x)), where f_i is the eigenfunction corresponding to the i-th lowest (non-zero) eigenvalue of ∆ [3].

3.2.1 Laplace operator as a smoothness functional

Note that ∆ also has the desirable property of being a smoothness functional [25]. Smoothness of a function f over, say, the unit circle S¹ can be defined as S(f) := ∫_{S¹} |f′(x)|² dx. Functions for which S(f) is close to zero are considered smooth. Note that constant functions over S¹ are clearly smooth. In general, for any f : M → R,

    S(f) := ∫_M ‖∇f(x)‖² dx = ∫_M f ∆f dx = ⟨∆f, f⟩_{L²(M)}.

Observe that the smoothness of a unit norm eigenfunction e_i of ∆ is controlled by the corresponding eigenvalue λ_i, since S(e_i) = ⟨∆e_i, e_i⟩ = λ_i. Therefore, approximating a function f in terms of its first d eigenfunctions of ∆ is a way of controlling its smoothness.

So far we have established that the spectrum of ∆ of a manifold M provides us with a desirable mapping of M. However, since we just have samples from M, we need a way to approximate ∆.

3.2.2 The graph Laplacian

The graph Laplacian is considered as a discrete approximation to the Laplace-Beltrami operator introduced in the previous section.

Let x_1, ..., x_m be an independent sample from the uniform distribution over M and let t be a free parameter (optimized later). We can then construct a completely connected weighted undirected graph with the samples as vertices and edge weights w_ij = e^{−‖x_i − x_j‖²/4t}. The corresponding graph Laplacian operator is given by the matrix [6]:

    (L^t_m)_ij = −w_ij   if i ≠ j,    (L^t_m)_ii = Σ_k w_ik.

We may think of it as an operator on functions evaluated at points from the manifold. Let p ∈ M and f : M → R; then

    L^t_m f(p) = (1/m) f(p) Σ_j e^{−‖p − x_j‖²/4t} − (1/m) Σ_j f(x_j) e^{−‖p − x_j‖²/4t}.

We can now relate L^t_m to ∆_M for any function f on M.³

³ For conciseness we will denote the ∆ operator on M as ∆_M.
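A minimal sketch of this construction (the sample, the bandwidth t, and the test function below are placeholders): it builds the heat-kernel weight matrix and the graph Laplacian, and applies the resulting empirical operator to a smooth function sampled at the data points.

```python
import numpy as np

def graph_laplacian(X, t):
    """Graph Laplacian on the fully connected graph with heat-kernel weights
    w_ij = exp(-||x_i - x_j||^2 / 4t), i.e. the matrix with -w_ij off the
    diagonal and the row sums of W on the diagonal."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-sq / (4 * t))
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W

# Toy sample from a circle (a 1-manifold in R^2) and a smooth function on it.
rng = np.random.default_rng(1)
theta = np.sort(rng.uniform(0, 2 * np.pi, 200))
X = np.c_[np.cos(theta), np.sin(theta)]
f = np.sin(theta)

L = graph_laplacian(X, t=0.05)
print((L @ f)[:3] / len(X))   # the empirical operator (1/m) L^t_m f at the first few sample points
```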

3.2.3 Connecting it all together

Let L^t f(p) be the continuous approximation of the graph Laplacian operator, defined by

    L^t f(p) := f(p) ∫_M e^{−‖p − y‖²/4t} dν_y − ∫_M f(y) e^{−‖p − y‖²/4t} dν_y.

We can show that L^t_m is a functional approximation to ∆_M. The proof outline goes as follows ([5], [6]). For a fixed p ∈ M and a smooth map f (note that all statements are pointwise in p and f):

1. We will first deduce that L^t_m converges to L^t. This follows almost immediately from the law of large numbers.

2. We will relate L^t to ∆_M by

   (a) restricting the L^t integral to a small ball in M; this will help us express L^t in a single local coordinate chart;

   (b) applying a change of coordinates, so that we can express L^t as an integral in R^n;

   (c) finally relating the new integral in R^n to ∆_M.

Lemma 10 (continuous approximation of L^t_m [6]). Let L^t_m and L^t be defined as above. Then for any ε > 0,

    Pr[ |L^t_m f(p) − L^t f(p)| > ε ] < 2 e^{−Ω(ε²m)}.

Proof. Note that L^t_m is the empirical average of m independent samples drawn uniformly from M and L^t is its expectation. Since M is compact, we can use Hoeffding's inequality to bound the deviation, giving the result.

Noting that M is compact and any f can be approximated arbitrarily well by a sequence of functions {f_i}, we can get uniform convergence over the entire M for any f (see [6] for details).

Lemma 11 (restricting L^t to local coordinates [6]). Let B ⊂ M be a sufficiently small open ball containing p such that B can be expressed in a single chart. For any a > 0, as t → 0,

    | L^t f(p) − ∫_B e^{−‖p − y‖²/4t} (f(p) − f(y)) dy | = o(t^a).

Proof. For any point x ∈ M − B, let d = inf_{x ∈ M−B} ‖p − x‖². Note that d > 0 (since B is open). Hence the total contribution of such points to the integral is bounded by C e^{−d²/4t} for some constant C. Note that as t tends to zero, this term decreases exponentially, giving the desired result.

Since we have restricted the integral to a small enough ball, we can now use the local coordinate system around an open neighborhood of p. We can apply the canonical change of coordinates by using the exponential map exp_p : T_pM (≅ R^n) → M, which carries radial lines from 0 in T_pM into geodesics starting at p in M. Note that exp_p(0) = p.

To reduce the computations to R^n, any y ∈ M (in a neighborhood of p) can be written as exp_p(x) for some x ∈ T_pM. Let f̄(x) := f(exp_p(x)). Then a key fact about the Laplace-Beltrami operator is that ∆_M f(p) = ∆_{R^n} f̄(0) = −Σ_i ∂²f̄/∂x_i²(0). Hence we can analyze L^t in Euclidian space via the (inverse) exponential map [6]:

    L^t f(p) = (1/Vol(M)) ∫_B̄ e^{−‖x‖²/4t} (f̄(0) − f̄(x)) (1 + O(‖x‖²)) dx.

Using the Taylor approximation about 0, we have that

    f̄(x) − f̄(0) = x∇f̄ + (1/2) x^T H x + O(‖x‖³).

Hence for functions with bounded third order derivatives and letting t → 0, we have that (see [6] for details)

    L^t f(p) = −(1/Vol(M)) ∫_B̄ ( x∇f̄ + (1/2) x^T H x ) e^{−‖x‖²/4t} dx
             = −tr(H)/Vol(M) = −(1/Vol(M)) Σ_i ∂²f̄(0)/∂x_i².

Combining the above lemmas immediately yields the main result.

Theorem 12 (relating L^t to ∆_M [6]). Let L^t and ∆_M be as defined above. Then for any p ∈ M and any smooth function f with bounded third order derivative, if t → 0 sufficiently fast, then

    L^t f(p) = (1/Vol(M)) ∆_M f(p).

3.3 A practical algorithm

As seen in the previous sections, Laplacian Eigenmaps have a sound mathematical basis for simplifying data representation. [3] gives a practical algorithm for embedding the data in lower dimensions using this technique. Let X = x_1, ..., x_m ∈ R^D be an independent sample drawn uniformly at random from M, d be the embedding dimension and t be the bandwidth parameter.

Algorithm 3.1: Laplacian Eigenmaps(X, d, t)

1. Let W_ij = e^{−‖x_i − x_j‖²/t} if x_i and x_j are close, and 0 otherwise.

2. Let L = A − W, where A is the diagonal matrix with A_ii = Σ_j W_ji.

3. Compute eigenvectors and eigenvalues for the generalized eigenvector problem Lf = λAf. Let the solutions be the column vectors of F.

4. Return the columns of F corresponding to the d lowest non-zero eigenvalues.

This algorithm has been applied successfully to real-world datasets in [25], giving promising results.
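A sketch of Algorithm 3.1 in NumPy (the k-nearest-neighbor rule used to decide which points are "close", and the parameter values in the demo call, are illustrative choices not fixed by the algorithm):

```python
import numpy as np

def laplacian_eigenmaps(X, d, t, k=10):
    """Embed the rows of X into R^d following Algorithm 3.1.
    'Close' points are taken to be k-nearest neighbors (one common choice)."""
    m = X.shape[0]
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    # Step 1: heat-kernel weights, kept only between neighboring points.
    W = np.exp(-sq / t)
    order = np.argsort(sq, axis=1)
    mask = np.zeros((m, m), dtype=bool)
    mask[np.arange(m)[:, None], order[:, 1:k + 1]] = True
    mask |= mask.T                       # symmetrize the neighborhood graph
    W = W * mask
    # Step 2: graph Laplacian L = A - W with the diagonal degree matrix A.
    A = np.diag(W.sum(axis=1))
    L = A - W
    # Step 3: generalized eigenproblem L f = lambda A f, solved here via A^{-1} L.
    eigvals, eigvecs = np.linalg.eig(np.linalg.solve(A, L))
    idx = np.argsort(eigvals.real)
    # Step 4: drop the trivial near-zero eigenvalue, keep the next d eigenvectors.
    return eigvecs[:, idx[1:d + 1]].real

Y = laplacian_eigenmaps(np.random.default_rng(2).normal(size=(300, 5)), d=2, t=1.0)
print(Y.shape)   # (300, 2): the embedded coordinates (f_1(x_i), ..., f_d(x_i))
```

For large samples the dense m × m matrices used here would normally be replaced by sparse neighborhood graphs, but the steps are the same.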

4.1 Density estimation

3.4 Discussion

Laplacian Eigenmaps provide a sound low-dimensional representation of a manifold, which has the benefit of simplifying its structure. The corresponding algorithm presented here is simple and intuitive – it requires a few matrix computations and one eigenvalue problem, making it quite appealing. One major limitation of the result presented is that points are sampled i.i.d. from the uniform measure over the manifold. In general, one would like to relax this condition, and this problem is still open.

[25] exploits the fact that the embedding simplifies the structure of the manifold for semi-supervised learning on data generated from manifolds. They also show an improvement in classification accuracy for certain real-world datasets.

As discussed, the Laplace-Beltrami operator ∆ provides a good measure of smoothness; [7] and [25] have used this fact to develop a framework for regularization of functions on a manifold.

Just like the spectrum of the Laplace-Beltrami operator yields a smoother representation of a manifold, it would be interesting to study what conditions are optimized if we explore a different basis for functions on a manifold. For instance, the benefits of approximating functions on a manifold using the Lagrange basis (for polynomials) or the Fourier basis (for square integrable functions) are largely unexplored.

4 Kernel methods for manifold density estimation

Many manifold learning methods rely heavily on having independent samples from the uniform distribution on M. However, in general, we can't expect such restrictive conditions on the underlying density. Even though the analysis of many procedures in the non-uniform setting largely remains an open problem, we do, however, have a handle on estimating the underlying density from independent samples via the method of kernels.

Since we would like to make the fewest possible assumptions on the underlying density, we will focus on nonparametric density estimation techniques in this section. We refer the readers to [14], [13], and [31] for an excellent treatment of the subject.

4.1 Density estimation

Density estimation is an important problem in statistics and machine learning. Here the goal is to estimate the underlying density from an i.i.d. sample. Let f be the true density and f̂_m be our estimate (from m samples). Note that we will make few assumptions about the structural form of f.

Given f and f̂_m, we can evaluate the quality of our estimate by looking at the associated deviation (called the risk) of f̂_m from f. One popular way to analyze risk is by looking at the expected squared difference between the true density and our estimate. Thus, risk can be defined as

    R = E ∫ (f̂_m(x) − f(x))² dx.

Of course, for any reasonable estimator, as the sample size gets larger, one would expect the risk to go down. Here we are interested in studying how fast the risk goes to zero for different estimators.

One intuitive estimator which works well in low dimensions is the histogram estimate. The idea is that we can grid the space and count the relative frequency of points falling into each bin. Though quite intuitive, histograms have their share of disadvantages which make them quite unappealing [31]. Primarily, due to sharp jumps between adjacent bins, histogram estimates are not smooth. Moreover, the estimator f̂_m is heavily dependent on the placement of the grid; by slightly moving the grid, one can get a wildly different estimator.

This motivates the study of kernel density estimators, which largely alleviate these problems by giving a smooth approximation to the underlying density, and which don't suffer from the choice of grid placement.

4.1.1 Kernel density estimation

As mentioned in the previous section, kernel density estimation provides an attractive alternative to naive histogram estimates that works well in practice. The basic idea behind the kernel estimate is as follows. To remove the dependence on the grid edges, kernel estimators center a "kernel function" at each sampled data point. By placing a smooth kernel function, the resulting estimator will be a smooth density estimate.

More formally, let K be a kernel function, that is, a smooth function with the following properties:

1. Non-negative: K(x) ≥ 0.

2. Integrates to one: ∫ K(x) dx = 1.

3. Zero mean: ∫ x K(x) dx = 0.

4. Finite variance: ∫ x² K(x) dx < ∞.

5. Maximum at zero: sup_x K(x) = K(0).

Then a kernel density estimate on a sample x_1, ..., x_m sampled independently from a fixed underlying distribution on R^D is given by

    f̂_{m,K}(x) = (1/(m h^D)) Σ_{i=1}^m K( ‖x − x_i‖ / h ),

where h is the bandwidth parameter, dependent on the number of samples. It is easy to check that f̂_{m,K} is a well defined density function.

It is known that the quality of the kernel estimate is particularly sensitive to the value of the bandwidth parameter h and less so to the form of K [31]. Hence the choice of bandwidth is important for a good approximation. See figure 6 to see how changes in the bandwidth result in varying approximations to the underlying density. Small values of h lead to spiky estimates (without much smoothing) while larger h values lead to oversmoothing.

For the optimal choice of bandwidth, the risk decreases as O(m^{−4/(4+D)}) (see [31] for details). Note that due to the exponential dependence on D, the quality of the estimate decreases sharply with increase in the dimension; we require an exponential number of points to get the same level of accuracy in high dimensions. This is generally referred to as the curse of dimensionality.

In the context of manifolds, one would hope that since the manifold occupies a small fraction of the entire ambient space, better convergence rates should be possible. We will study this next.

Figure 6: Kernel density estimate of one dimensional data generated from a mixture of two Gaussians. For a fixed independent sample of size 20, and using the Gaussian kernel function (dotted lines), we see that different choices of bandwidth yield significantly different kernel estimators (solid line). The top figure (undersmoothed) shows the effect of small bandwidths, the middle figure (oversmoothed) shows the effect of a large bandwidth, and the bottom figure shows the choice of optimal bandwidth. Note that the optimal bandwidth recovers that the underlying density is in fact a mixture of two Gaussians.
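For concreteness, here is a sketch of the estimator above on one-dimensional data drawn from a mixture of two Gaussians, in the spirit of figure 6 (the mixture, the sample size, the grid and the bandwidth values are illustrative):

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def kde(x_grid, samples, h):
    """Kernel density estimate f_hat(x) = (1/(m h)) * sum_i K((x - x_i)/h) for D = 1."""
    m = len(samples)
    return gaussian_kernel((x_grid[:, None] - samples[None, :]) / h).sum(axis=1) / (m * h)

rng = np.random.default_rng(3)
# 20 samples from an equal mixture of N(-3, 1) and N(3, 1).
samples = np.where(rng.random(20) < 0.5, rng.normal(-3, 1, 20), rng.normal(3, 1, 20))
x = np.linspace(-10, 10, 201)

for h in (0.1, 5.0, 1.0):        # undersmoothed, oversmoothed, roughly right
    est = kde(x, samples, h)
    print(f"h = {h}: integral ~ {(est * (x[1] - x[0])).sum():.3f}")   # each estimate integrates to ~1
```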

4.2 Manifold density estimation using kernels

For a curvy object such as a manifold, we need to define a modified version of the kernel density estimator [27]:

    f̂_{m,K}(p) = (1/m) Σ_{i=1}^m (1/(h^n θ_{x_i}(p))) K( d_M(p, x_i) / h ),

where d_M(p, q) is the geodesic distance between p, q ∈ M and θ_p(q) is the volume density function on M, defined as the ratio of the canonical measure of the Riemannian metric on T_pM to the Lebesgue measure of the Euclidian metric on T_pM, evaluated at exp_p^{-1}(q).

Note that this estimator is a well defined probability density. We will be able to relate it to the underlying true density by [27]:

1. Separately bounding the squared bias and the variance of the estimator. To do this, we will apply a change of coordinates to express the integrals in R^n.

2. Decomposing the expected risk into its bias and variance components. We can then apply the calculated bounds, yielding the optimal convergence rates.

Lemma 13 (bounding the squared bias [27]). Let f be a probability density on M and f̂_{m,K} be its estimator. If f is square integrable with bounded second derivative, then there exists a constant C_1 such that

    ∫_M ( E f̂_{m,K}(p) − f(p) )² dp ≤ C_1 h⁴.

Proof. Consider the pointwise bias,

    b(p) = E f̂_{m,K}(p) − f(p)
         = ∫_{q∈M} (1/(θ_q(p) h^n)) K( d_M(p, q)/h ) f(q) dq − f(p)
         = (1/h^n) ∫_{x∈T_pM} K( ‖x‖/h ) O(‖x‖²) dx,

where the last step is by applying a change of coordinates via the canonical exponential map exp_p : T_pM → M, and applying the Taylor approximation around 0, f(exp_p(x)) =: f̄(x) = f̄(0) + x∇f̄(0) + O(‖x‖²). Hence, by applying the change of variables y = x/h,

    ∫_M b²(p) dp ≤ C h⁴ ∫_M ( ∫_M ‖y‖² K(‖y‖) dy )² dp ≤ C′ h⁴ Vol(M).

Lemma 14 (bounding the variance [27]). Let f and f̂_{m,K} be defined as above. Then there exists a constant C_2 such that

    ∫_M Var( f̂_{m,K}(p) ) dp ≤ C_2 / (m h^n).

Proof. Since Var(X) ≤ E X², we have that for any p ∈ M,

    Var( f̂_{m,K}(p) ) ≤ (1/(m h^{2n})) E [ (1/θ_{x_1}²(p)) K²( d_M(p, x_1)/h ) ]
                      = (1/(m h^{2n})) ∫_M (f(q)/θ_q²(p)) K²( d_M(p, q)/h ) dq.

Integrating both sides over the entire M, we have that ∫_M Var( f̂_{m,K}(p) ) dp is

    ≤ (1/(m h^{2n})) ∫_{p∈M} ∫_{q∈M} (f(q)/θ_q²(p)) K²( d_M(p, q)/h ) dq dp
    ≤ (1/(m h^{2n})) K²(0) ∫_{q∈M} f(q) ∫_{p∈M} (1/θ_q²(p)) dp dq
    ≤ (C h^n Vol(S^n)/(m h^{2n})) K²(0) ∫_{q∈M} f(q) dq,

where the last inequality is by letting C = sup_{p,q} θ_q^{-1}(p) and noting ∫ 1/θ_q(p) dp = h^n Vol(S^n). The desired result follows.

Theorem 15 (kernel density estimation on manifolds [27]). Let M be a compact n-dimensional Riemannian manifold in R^D, and let f, f̂_{m,K} be defined as above. Then there exists a constant C such that

    E ‖f̂_{m,K} − f‖² ≤ C ( 1/(m h^n) + h⁴ ).

Proof. By doing the standard bias-variance decomposition, we have that

    E ‖f̂_{m,K} − f‖² = ∫_M ( E f̂_{m,K}(p) − f(p) )² dp + ∫_M Var( f̂_{m,K}(p) ) dp ≤ C_1 h⁴ + C_2 / (m h^n),

where the last inequality is by applying the previous two lemmas, immediately giving the desired result.

Note that, as a consequence, setting the bandwidth h ≈ m^{−1/(n+4)} results in the optimal rate of convergence of O(m^{−4/(n+4)}), which is independent of the ambient space dimension D.
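One way to see where the stated bandwidth and rate come from is to minimize the bound of Theorem 15 over h; sketching the standard calculation:

```latex
\[
  \frac{d}{dh}\Big(\frac{C_2}{m h^{n}} + C_1 h^{4}\Big)
  = -\,\frac{n C_2}{m h^{n+1}} + 4 C_1 h^{3} = 0
  \quad\Longrightarrow\quad
  h^{*} = \Big(\frac{n C_2}{4 C_1 m}\Big)^{1/(n+4)} \asymp m^{-1/(n+4)},
\]
\[
  \text{so that both terms scale as } m^{-4/(n+4)} \text{ and }
  \mathbb{E}\,\|\hat f_{m,K} - f\|^{2} = O\!\big(m^{-4/(n+4)}\big),
  \text{ independently of the ambient dimension } D.
\]
```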

4.3 Discussion

Density estimation is a central topic in statistics and machine learning. In the case of nonparametric density estimation, convergence rates to the true density are known to be exponential in the dimension. In the case of manifolds, since the data is locally diffeomorphic to a smaller subspace, one may expect a weaker dependence on the ambient space. The result presented here is noteworthy as the number of samples needed to gain the desired accuracy is independent of the ambient dimension. Note that the exponential dependence on the intrinsic manifold dimension, although still unacceptable, is generally more manageable in a typical machine learning scenario.

[14] argues that one should look at the ℓ1 risk as it is invariant under monotone transformations. It would be interesting to see if these rates can be sharpened in ℓ1 when the data is known to lie on a manifold. ℓ2 and ℓ∞ risks have been considered in [18] using Fourier analysis, though their estimator is not a proper probability density.

5 Conclusion and future work

In this survey we examined how some of the known mathematical techniques can be applied in a new context, when data is assumed to be sampled from a manifold. We observed that the manifold assumption leads to results that are significantly less dependent on the ambient dimension.

We looked at random projections as a linear dimensionality reduction procedure on manifolds, and concluded that a projection onto a space of dimension just Ω(n log D) can preserve all pairwise distances on a manifold remarkably well. We then focused on analyzing the spectrum of the Laplace-Beltrami operator on functions on a manifold and concluded that the resulting eigenmap has the surprising property of simplifying the manifold structure, making it into a more manageable object. Lastly, we looked at kernel density estimation to estimate high density regions on a manifold and found that the sample size needed to get the desired accuracy can be made completely independent of the ambient dimension. As we can see, significant progress has been made in the area of manifold learning in the last few years, though much still remains largely unexplored.

In terms of low dimensional mappings, [23], [24] proved that any Riemannian manifold can be isometrically embedded in a (2n + 1)-dimensional Euclidian space. However, finding such an embedding by a discrete algorithm still remains a hard open problem.

Note that all techniques mentioned in this survey and elsewhere in the literature crucially depend on knowledge of the intrinsic dimension of the manifold. However, in a typical machine learning problem this quantity is unknown. Note that a poor estimate of n can render manifold learning methods useless; an underestimate will result in low accuracies and an overestimate will require impractically large sample sizes. Researchers have started looking into estimating the intrinsic dimension using likelihood and bin-packing methods ([21], [20]), though further progress is needed for a more unified approach.

Researchers often find the "manifold assumption" (data lying exactly on a smooth manifold) too restrictive. In an attempt to relax this assumption, [11] recently proposed a new viewpoint of analyzing algorithms in terms of local covariance dimension. This framework effectively incorporates data that is not necessarily coming from an underlying manifold, but locally has low dimensional structure in an average sense. Both practical and theoretical analysis of machine learning problems in this promising framework is open.

Acknowledgments

The author would like to thank Sanjoy Dasgupta, Joe Pasquale and Lawrence Saul for insightful comments and suggestions.

References

[1] R. Baraniuk, M. Davenport, R. DeVore, and M. Wakin. A simple proof of the restricted isometry property for random matrices. Constructive Approximation, 2008.

[2] R. Baraniuk and M. Wakin. Random projections of smooth manifolds. Foundations of Computational Mathematics, 2007.

[3] M. Belkin and P. Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation. Neural Computation, 15(6):1373–1396, 2003.

[4] M. Belkin and P. Niyogi. Semi-supervised learning on Riemannian manifolds. Machine Learning Journal, 56:209–239, 2004.

[5] M. Belkin and P. Niyogi. Towards a theoretical foundation for Laplacian based manifold methods. Conference on Computational Learning Theory, 2005.

[6] M. Belkin and P. Niyogi. Towards a theoretical foundation for Laplacian based manifold methods. Journal of Computer and System Sciences, 2007.

[7] M. Belkin, P. Niyogi, and V. Sindhwani. On manifold regularization. International Conference on Artificial Intelligence and Statistics, 2005.

[8] L. Cayton. Algorithms for manifold learning. UCSD Technical Report CS2008-0923, 2008.

[9] Y. Chikuse. Statistics on special manifolds. Springer, 2003.

[10] K. Clarkson. Tighter bounds for random projections of manifolds. Computational Geometry, 2007.

[11] S. Dasgupta and Y. Freund. Random projection trees and low dimensional manifolds. ACM Symposium on Theory of Computing, 2008.

[12] S. Dasgupta and A. Gupta. An elementary proof of the Johnson-Lindenstrauss lemma. UC Berkeley Tech. Report 99-006, March 1999.

[13] L. Devroye. A course in density estimation. Birkhauser Verlag AG, 1987.

[14] L. Devroye and L. Gyorfi. Nonparametric density estimation: the L1 view. Wiley and Sons, 1984.

[15] D. Donoho. Compressed sensing. 2004.

[16] D. Donoho and C. Grimes. Hessian eigenmaps: locally linear embedding techniques for high dimensional data. Proc. of National Academy of Sciences, 100(10):5591–5596, 2003.

[17] R. Duda, P. Hart, and D. Stork. Pattern Classification. Wiley-Interscience, 2nd edition, 2000.

[18] H. Hendriks. Nonparametric estimation of probability density on a Riemannian manifold using Fourier expansions. Annals of Statistics, 18(2):832–849, 1990.

[19] W. Johnson and J. Lindenstrauss. Extensions of Lipschitz mappings into a Hilbert space. Conf. in Modern Analysis and Probability, pages 189–206, 1984.

[20] B. Kégl. Intrinsic dimension estimation using packing numbers. Neural Information Processing Systems, 14, 2002.

[21] E. Levina and P. Bickel. Maximum likelihood estimation of intrinsic dimension. Neural Information Processing Systems, 17, 2005.

[22] J. Milnor. Topology from the differentiable viewpoint. Univ. of Virginia Press, 1972.

[23] J. Nash. C1 isometric imbeddings. Annals of Mathematics, 60(3):383–396, 1954.

[24] J. Nash. The imbedding problem for Riemannian manifolds. Annals of Mathematics, 63(1):20–63, 1956.

[25] P. Niyogi. Manifold regularization and semi-supervised learning: some theoretical analysis. Technical Report TR-2008-01, 2008.

[26] P. Niyogi, S. Smale, and S. Weinberger. Finding the homology of submanifolds with high confidence from random samples. Disc. Computational Geometry, 2006.

[27] B. Pelletier. Kernel density estimation on Riemannian manifolds. Statistics and Probability Letters, 73:297–304, 2005.

[28] S. Rosenberg. The Laplacian on a Riemannian manifold. Cambridge University Press, 1997.

[29] S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290, 2000.

[30] J. Tenebaum, V. de Silva, and J. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290, 2000.

[31] L. Wasserman. All of nonparametric statistics. Springer, 2005.
