Cauchy Graph Embedding
Dijun Luo [email protected]
Chris Ding [email protected]
Feiping Nie [email protected]
Heng Huang [email protected]
The University of Texas at Arlington, 701 S. Nedderman Drive, Arlington, TX 76019

Abstract

Laplacian embedding provides a low-dimensional representation for the nodes of a graph whose edge weights denote pairwise similarity among the node objects. It is commonly assumed that the Laplacian embedding results preserve the local topology of the original data in the low-dimensional projected subspace, i.e., any pair of graph nodes with large similarity should be embedded closely in the embedded space. However, in this paper we show that Laplacian embedding often does not preserve local topology as well as expected. To enhance the local topology preserving property in graph embedding, we propose a novel Cauchy graph embedding which preserves the similarity relationships of the original data in the embedded space via a new objective. Consequently, machine learning tasks (such as k Nearest Neighbor classification) can be easily conducted on the embedded data with better performance. Experimental results on both synthetic and real world benchmark data sets demonstrate the usefulness of this new type of embedding.

1. Introduction

Unsupervised dimensionality reduction is an important procedure in various machine learning applications, ranging from image classification (Turk & Pentland, 1991) to genome-wide expression modeling (Alter et al., 2000). Many high-dimensional real world data intrinsically lie in a low-dimensional space, hence the dimensionality of the data can be reduced without significant loss of information. From the data embedding point of view, we can classify unsupervised embedding approaches into two categories. Approaches in the first category embed data into a linear space via linear transformations, such as principal component analysis (PCA) (Jolliffe, 2002) and multidimensional scaling (MDS) (Cox & Cox, 2001). Both PCA and MDS are eigenvector methods that model linear variabilities in high-dimensional data. They have long been known and widely used in many machine learning applications.

However, the underlying structure of real data is often highly nonlinear and hence cannot be accurately approximated by linear manifolds. Approaches in the second category embed data in a nonlinear manner based on different purposes. Recently several promising nonlinear methods have been proposed, including Isomap (Tenenbaum et al., 2000), Locally Linear Embedding (LLE) (Roweis & Saul, 2000), Local Tangent Space Alignment (Zhang & Zha, 2004), Laplacian Embedding/Eigenmap (Hall, 1971; Belkin & Niyogi, 2003; Luo et al., 2009), and Local Spline Embedding (Xiang et al., 2009). Typically, these methods set up a quadratic objective derived from a neighborhood graph and solve for its leading eigenvectors: Isomap takes the eigenvectors associated with the largest eigenvalues; LLE and Laplacian embedding use the eigenvectors associated with the smallest eigenvalues. Isomap tries to preserve the global pairwise distances of the input data as measured along the low-dimensional manifold; LLE and Laplacian embedding try to preserve local geometric relationships of the data.
As one of the most successful methods in transductive inference (Belkin & Niyogi, 2004), spectral clustering (Shi & Malik, 2000; Simon, 1991; Ng et al., 2002; Ding et al., 2001), and dimensionality reduction (Belkin & Niyogi, 2003), Laplacian embedding seeks a low-dimensional representation of a set of data points given a matrix of pairwise similarities (i.e., graph data). Laplacian embedding and the related use of eigenvectors of the graph Laplacian matrix were first developed in the 1970s, where they were called the quadratic placement (Hall, 1971) of graph nodes in a space. The eigenvectors of the graph Laplacian matrix were also used for graph partitioning and connectivity analysis (Fiedler, 1973). This approach became popular in the 1990s for circuit layout in the VLSI community (see the review by Alpert & Kahng, 1995) and for graph partitioning (Pothen et al., 1990) in domain decomposition, a key problem in distributed-memory computing. A generalized version of the graph Laplacian (the p-Laplacian) was also developed for other graph partitioning tasks (Bühler & Hein, 2009; Luo et al., 2010).

It is generally considered that Laplacian embedding has the local topology preserving property: a pair of graph nodes with high mutual similarity is embedded nearby in the embedding space, whereas a pair of graph nodes with small mutual similarity is embedded far away. The local topology preserving property provides a basis for utilizing the quadratic embedding objective function as a regularizer in many applications (Zhou et al., 2003; Zhu et al., 2003; Nie et al., 2010). This assumption was considered a desirable property of Laplacian embedding, and much previous work used it as a regularization term to embed graph data while preserving local topology (Weinberger et al.; Ando & Zhang).

In this paper, we point out that the perceived local topology preserving property of Laplacian embedding does not hold in many applications. More precisely, we first give a precise definition of the local topology preserving property, and then show that Laplacian embedding often gives an embedding that does not preserve local topology, in the sense that node pairs with large mutual similarities are not embedded nearby in the embedding space. After that, we propose a novel Cauchy embedding method that not only has the nice nonlinear embedding properties of Laplacian embedding, but also successfully preserves the local topology of the original data. Moreover, we introduce Exponential and Gaussian embedding approaches that further emphasize that data points with large similarities should have small distances in the embedding space. Our empirical studies on both synthetic data and real world benchmark data sets demonstrate the promising results of the proposed methods.

2. Laplacian Embedding

We start with a brief introduction to Laplacian embedding/Eigenmap. The input data is a matrix W of pairwise similarities among n data objects. We view W as the edge weights on a graph with n nodes. The task is to embed the nodes of the graph into a 1-D space with coordinates (x_1, ..., x_n). The objective is that if i and j are similar (i.e., w_{ij} is large), they should be adjacent in the embedded space, i.e., (x_i - x_j)^2 should be small. This can be achieved by minimizing (Hall, 1971; Belkin & Niyogi, 2003)

    min_x J(x) = \sum_{ij} (x_i - x_j)^2 w_{ij}.    (1)

The minimization of \sum_{ij} (x_i - x_j)^2 w_{ij} would give x_i = 0 if there were no constraint on the magnitude of the vector x. Therefore, the normalization \sum_i x_i^2 = 1 is imposed. The objective function is also invariant if we replace x_i by x_i + a, where a is a constant; thus the solution is not unique. To fix this uncertainty, we can adjust the constant such that \sum_i x_i = 0 (x is centered around 0). With the centering constraint, the x_i have mixed signs. With these two constraints, the embedding problem becomes

    min_x \sum_{ij} (x_i - x_j)^2 w_{ij},  s.t.  \sum_i x_i^2 = 1,  \sum_i x_i = 0.    (2)

The solution of this embedding problem is easily obtained, because

    J(x) = 2 \sum_{ij} x_i (D - W)_{ij} x_j = 2 x^T (D - W) x,    (3)

where D = diag(d_1, ..., d_n) and d_i = \sum_j W_{ij}. The matrix (D - W) is called the graph Laplacian, and the embedding solution minimizing the embedding objective is given by the eigenvectors of

    (D - W) x = \lambda x.    (4)

Laplacian embedding has been widely used in machine learning, often as a regularizer for embedding graph nodes while preserving local topology.
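To make Eqs. (2)-(4) concrete, the following is a minimal NumPy sketch of the 1-D Laplacian embedding (our illustration, not code from the paper; the function name and the toy similarity matrix are hypothetical). It forms the graph Laplacian D - W and takes the eigenvector of the second-smallest eigenvalue, since the smallest eigenvalue is 0 with a constant eigenvector, which the centering constraint rules out.

    import numpy as np

    def laplacian_embedding_1d(W):
        """1-D Laplacian embedding of a symmetric similarity matrix W (Eqs. (2)-(4))."""
        d = W.sum(axis=1)                 # degrees d_i = sum_j W_ij
        L = np.diag(d) - W                # graph Laplacian D - W
        vals, vecs = np.linalg.eigh(L)    # eigenvalues in ascending order
        # The smallest eigenvalue is 0 with a constant eigenvector, excluded by
        # the centering constraint sum_i x_i = 0; take the next eigenvector.
        x = vecs[:, 1]
        return x / np.linalg.norm(x)      # enforce sum_i x_i^2 = 1

    # Toy example (hypothetical): two tightly linked pairs joined by a weak bridge.
    W = np.array([[0.0, 1.0, 0.1, 0.0],
                  [1.0, 0.0, 0.1, 0.0],
                  [0.1, 0.1, 0.0, 1.0],
                  [0.0, 0.0, 1.0, 0.0]])
    print(laplacian_embedding_1d(W))

For a k-dimensional embedding, the same construction takes the eigenvectors of the k smallest nonzero eigenvalues as the embedding coordinates.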
3. The Local Topology Preserving Property of Graph Embedding

In this paper, we study the local topology preserving property of graph embeddings. We first provide a definition of local topology preserving, and then show that, contrary to the widely accepted conception, Laplacian embedding may not preserve the local topology of the original data in the embedded space in many cases.

3.1. Local Topology Preserving

We first provide a definition of local topology preserving. Given a symmetric (undirected) graph with edge weights W = (w_{ij}) and a corresponding embedding (x_1, ..., x_n) of the n nodes of the graph, we say that the embedding preserves local topology if the following condition holds:

    if w_{ij} \geq w_{pq}, then (x_i - x_j)^2 \leq (x_p - x_q)^2,  \forall i, j, p, q.    (5)

Roughly speaking, this definition says that for any pair of nodes (i, j), the more similar they are (the bigger the edge weight w_{ij} is), the closer they should be embedded (the smaller |x_i - x_j| should be).

Laplacian embedding has been widely used in machine learning with a perceived notion of preserving local topology. As a contribution of this paper, we point out here that this perceived notion of local topology preserving is in fact false in many cases.

[Figure: scatter plots of embedding results with numbered data points; the caption and panel labels are not recoverable from the extraction.]

Our finding has two aspects. First, at large distance (small similarity):

    The quadratic function of the Laplacian embedding emphasizes the large-distance pairs, which enforces node pairs (i, j) with small w_{ij} to be separated far away.

Second, at small distance (large similarity):

    The quadratic function of the Laplacian embedding de-emphasizes the small-distance pairs, leading to many violations of local topology preserving at small-distance pairs.
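Such violations of Eq. (5) can be detected directly by brute force over all pairs of node pairs. The sketch below is our own illustration (not code from the paper; the helper name, tolerance parameter, and toy graph are hypothetical): a single quadruple (i, j, p, q) in which the more similar pair sits farther apart already shows that an embedding fails the definition.

    import numpy as np
    from itertools import combinations

    def preserves_local_topology(W, x, tol=1e-12):
        """Check Eq. (5) over all distinct node pairs:
        w_ij >= w_pq must imply (x_i - x_j)^2 <= (x_p - x_q)^2."""
        pairs = list(combinations(range(len(x)), 2))
        for i, j in pairs:
            for p, q in pairs:
                if W[i, j] >= W[p, q] and (x[i] - x[j])**2 > (x[p] - x[q])**2 + tol:
                    return False      # violating quadruple (i, j, p, q) found
        return True

    # Toy example (hypothetical): a 3-node path embedded at equally spaced
    # coordinates satisfies Eq. (5).
    W = np.array([[0.0, 1.0, 0.2],
                  [1.0, 0.0, 1.0],
                  [0.2, 1.0, 0.0]])
    x = np.array([-1.0, 0.0, 1.0])
    print(preserves_local_topology(W, x))   # True

Applying the same check to the output of a Laplacian embedding gives a concrete way to exhibit the violations discussed above; the small tolerance only guards against floating-point ties.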