
Higher Order Learning with Graphs

Sameer Agarwal [email protected]    Kristin Branson [email protected]    Serge Belongie [email protected]
Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA

Abstract

Recently there has been considerable interest in learning with higher order relations (i.e., three-way or higher) in the unsupervised and semi-supervised settings. Hypergraphs and tensors have been proposed as the natural way of representing these relations, and their corresponding algebra as the natural tools for operating on them. In this paper we argue that hypergraphs are not a natural representation for higher order relations; indeed, pairwise as well as higher order relations can be handled using graphs. We show that various formulations of the semi-supervised and the unsupervised learning problem on hypergraphs result in the same graph theoretic problem and can be analyzed using existing tools.

1. Introduction

Given a data set, it is common practice to represent the similarity relation between its elements using a weighted graph. A number of methods for unsupervised and semi-supervised learning can then be formulated in terms of operations on this graph. In some cases, like spectral clustering, the relation between the structural and the spectral properties of the graph can be exploited to construct matrix theoretic methods that are also graph theoretic. The most commonly used matrix in these methods is the Laplacian of the graph (Chung, 1997). In the same manner that the Laplace-Beltrami operator is used to analyze the geometry of continuous manifolds, the Laplacian of a graph is used to study the structure of the graph and functions defined on it.

A fundamental constraint in this formulation is the assumption that it is possible to measure similarity between pairs of points. Consider a k-lines algorithm which clusters points in R^d into k clusters where elements in each cluster are well approximated by a line. As every pair of data points trivially defines a line, there is no useful measure of similarity between pairs of points for this problem. However, it is possible to define measures of similarity over triplets of points that indicate how close they are to being collinear. This analogy can be extended to any model-based clustering task where the fitting error of a set of points to a model can be considered a measure of the dissimilarity among them. We refer to similarity/dissimilarity measured over triples or more of points as higher order relations.

A number of questions that have been addressed in domains with pairwise relations can now be asked for the case of higher order relations. How does one perform clustering in such a domain? How can one formulate and solve the semi-supervised and unsupervised learning problems in this setting?

Hypergraphs are a generalization of graphs in which the edges are arbitrary non-empty subsets of the vertex set. Instead of having edges between pairs of vertices, hypergraphs have edges that connect sets of two or more vertices. While our understanding of hypergraph spectral methods relative to that of graphs is very limited, a number of authors have considered extensions of spectral graph theoretic methods to hypergraphs.

Another possible representation of higher order relations is a tensor. Tensors are a generalization of matrices to higher dimensional arrays, and they can be analyzed with multilinear algebra.

Recently a number of authors have considered the problem of unsupervised and semi-supervised learning in domains with higher order relations (Agarwal et al., 2005; Govindu, 2005; Shashua & Hazan, 2005; Shashua et al., 2006; Zhou et al., 2005). The success of graph and matrix theoretic representations has prompted researchers to extend these representations to the case of higher order relations.

Appearing in Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, 2006. Copyright 2006 by the author(s)/owner(s).
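As a concrete illustration of the k-lines setting, one natural higher order dissimilarity over a triple of points is the total least-squares error of fitting a single line through the three points. The sketch below is our own toy example (the function names are not from any library); it computes this residual with numpy via the singular values of the centered point set:

```python
import numpy as np

def line_fit_error(points):
    # Total least-squares residual of fitting one line to a set of
    # d-dimensional points: sum of squared singular values beyond the
    # first, after centering the points at their mean.
    X = np.asarray(points, dtype=float)
    X = X - X.mean(axis=0)
    s = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(s[1:] ** 2))

def triple_dissimilarity(p, q, r):
    # Higher order (3-way) dissimilarity: how far a triple of points
    # is from being collinear.
    return line_fit_error([p, q, r])
```

A collinear triple such as (0,0), (1,1), (2,2) has zero residual, while a spread-out triple has a strictly positive one; thresholding or exponentiating this quantity gives a 3-way affinity of the kind discussed above.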
In this paper we focus on spectral graph and hypergraph theoretic methods for learning with higher order relations. We survey a number of approaches from machine learning, VLSI CAD and spectral graph theory that have been proposed for analyzing the structure of hypergraphs. We show that despite significant differences in how previous authors have approached the problem, there are two basic graph constructions that underlie all these studies. Furthermore, we show that these constructions are essentially the same when viewed through the lens of the normalized Laplacian.

The paper is organized as follows. Section 2 defines our notation. Section 3 reviews the properties of the graph Laplacian from a machine learning perspective. Section 4 considers the algebraic generalization of the Laplacian to higher order structures and shows why it is not useful for machine learning tasks. Section 5 presents a survey of graph constructions and linear operators related to hypergraphs that various studies have used for analyzing the structure of hypergraphs and for unsupervised and semi-supervised learning. In Section 6 we show how all these constructions can be reduced to two graph constructions and their associated Laplacians. Finally, in Section 7 we conclude with a summary and discussion of key results.

2. Notation

Let G(V, E) denote a hypergraph with vertex set V and edge set E. The edges are arbitrary subsets of V, with a weight w(e) associated with edge e. The degree d(v) of a vertex is d(v) = Σ_{e∈E : v∈e} w(e). The degree of an edge e is denoted by δ(e) = |e|. For k-uniform hypergraphs, the degrees of all edges are the same, δ(e) = k. In particular, for the case of ordinary graphs or "2-graphs," δ(e) = 2. The vertex-edge incidence matrix H is |V| × |E|, where the entry h(v, e) is 1 if v ∈ e and 0 otherwise. By these definitions, we have:

    d(v) = Σ_{e∈E} w(e) h(v, e)   and   δ(e) = Σ_{v∈V} h(v, e)    (1)

De and Dv are the diagonal matrices consisting of edge and vertex degrees, respectively. W is the diagonal matrix of edge weights, w(·). A number of different symbols have been used in the literature to denote the Laplacian of a graph. We follow the convention in (Chung, 1997) and use L for the combinatorial Laplacian and 𝓛 for the normalized Laplacian. L is also known as the unnormalized Laplacian of a graph and is usually written as

    L = Dv − S    (2)

where S is the |V| × |V| affinity matrix, with entry (u, v) equal to the weight of the edge (u, v) if u and v are connected, and 0 otherwise. An important variant is the normalized Laplacian,

    𝓛 = I − Dv^{−1/2} S Dv^{−1/2}    (3)

For future reference it is useful to rewrite the above expressions in terms of the vertex-edge incidence matrix:

    L = 2 Dv − H W H^T    (4)
    𝓛 = 2 I − Dv^{−1/2} H W H^T Dv^{−1/2}    (5)

3. The Graph Laplacian

The graph Laplacian is the discrete analog of the Laplace-Beltrami operator on compact Riemannian manifolds (Belkin & Niyogi, 2003; Rosenberg, 1997; Chung, 1997). It has been used extensively in machine learning, initially for the unsupervised case and then recently for semi-supervised learning (Zhu, 2005). In this section we highlight some of the properties of the Laplacian from a machine learning perspective, and motivate the search for similar operators on hypergraphs.

Perhaps the earliest use of the graph Laplacian was the development of spectral clustering algorithms, which considered continuous relaxations of graph partitioning problems (Alpert et al., 1999; Shi & Malik, 2000; Ng et al., 2002). The relaxation converts the optimization problem into a generalized eigenvalue problem involving the Laplacian of the graph.

In (Zhou & Schölkopf, 2005), the authors develop a discrete calculus on 2-graphs by treating them as discrete analogs of compact Riemannian manifolds. As one of the consequences of this development they argue that, in analogy to the continuous case, the graph Laplacian be defined as an operator L : H(V) → H(V),

    Lf := (1/2) div(∇f)    (6)

Zhou et al. also argue that there exists a family of regularization operators on 2-graphs, the Laplacian being one of them, that can be used for transduction, i.e., given a partial labeling y of the graph vertices, use the geometric structure of the graph to induce a labeling f on the unlabeled vertices. The vertex label y(v) is +1, −1 for positive and negative examples, respectively, and 0 if no information is available about the label. They consider the regularized least squares problem

    arg min_f  ⟨f, Lf⟩ + µ ‖f − y‖²    (7)
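The notation of Section 2 is easy to exercise numerically. The sketch below uses a toy hypergraph of our own choosing to build H, W, Dv and De, and then checks the 2-graph identity L = Dv − S = 2 Dv − H W H^T of Eqs. (2) and (4) on a small ordinary graph:

```python
import numpy as np

# Toy weighted hypergraph on vertices 0..3 with two 3-vertex hyperedges.
edges = [([0, 1, 2], 1.0), ([1, 2, 3], 2.0)]
V, E = 4, len(edges)

H = np.zeros((V, E))                      # vertex-edge incidence matrix
for j, (e, _) in enumerate(edges):
    H[e, j] = 1.0
w = np.array([we for _, we in edges])
W = np.diag(w)                            # diagonal edge weight matrix

d_v = H @ w                               # d(v) = sum_e w(e) h(v,e), Eq. (1)
delta = H.sum(axis=0)                     # delta(e) = sum_v h(v,e), Eq. (1)
Dv, De = np.diag(d_v), np.diag(delta)

# For an ordinary graph (every delta(e) = 2), L = Dv - S = 2 Dv - H W H^T.
Hg = np.zeros((3, 2)); Hg[[0, 1], 0] = 1.0; Hg[[1, 2], 1] = 1.0
Wg = np.diag([1.0, 3.0])
S = np.array([[0, 1, 0], [1, 0, 3], [0, 3, 0]], dtype=float)  # affinity
Dg = np.diag(S.sum(axis=1))
assert np.allclose(Dg - S, 2 * Dg - Hg @ Wg @ Hg.T)           # Eq. (4)
```

The identity holds because for a 2-graph the diagonal of H W H^T is exactly the vertex degree, so H W H^T = Dv + S.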

While the discrete version of the problem, where f(v) ∈ {+1, −1}, is a hard combinatorial problem, relaxing the range of f to the real line R results in a simple linear least squares problem, solved as f = µ(µI + L)^{−1} y. A similar formulation is considered by (Belkin & Niyogi, 2003). In the absence of any labels, the problem reduces to that of clustering, and the eigenmodes of the Laplacian are used to label the vertices (Shi & Malik, 2000; Ng et al., 2002).

More generally, in (Smola & Kondor, 2003), the authors prove that just as the continuous Laplacian is the unique linear second order self-adjoint operator invariant under the action of rotation operators, the same is true for the Laplacian and the unnormalized Laplacian, with the group of rotations replaced by the group of permutations.

A number of successful regularizers in the continuous domain can be written as ⟨f, r(L)f⟩, where L is the continuous Laplacian, f is the model, and r is a non-decreasing scalar function that operates on the spectrum of L. Smola and Kondor show that the same can be shown for a variety of regularization operators on graphs.

4. Higher Order Laplacians

In light of the previous section, it is interesting to consider generalizations of the Laplacian to higher order structures. We now present a brief look at the algebro-geometric view of the Laplacian, and how it leads to the generalization of the combinatorial Laplacian for hypergraphs. For simplicity of exposition, we will consider the unweighted case. For a more formal presentation of the material in this section, we refer the reader to (Munkres, 1984; Chung, 1993; Chung, 1997; Forman, 2003).

Let us assume that a graph represents points in some abstract space, with the edges representing lines connecting these points and the weights on the edges having an inverse relation to the length of the line. The Laplacian then measures how smoothly a function defined on these points (vertices) changes with respect to their relative arrangement. As we saw earlier, the quadratic form f^T L f does this for the vertex function f. This view of a graph and its Laplacian can be generalized to hypergraphs. A hypergraph represents points in some abstract space where each hyperedge corresponds to a simplex in that space with the vertices of the hyperedge as its corners. The weight on the hyperedge is inversely related to the size of the simplex. Now we are not restricted to defining functions on just vertices; we can define functions on sets of vertices, corresponding to lines, triangles, etc. Algebraic topologists refer to these functions as p-chains, where p is the size of the simplices on which they are defined. Thus vertex functions are 0-chains, edge functions are 1-chains, and so on. In each case one can ask the question, how does one measure the variation in these functions with respect to the geometry of the hypergraph or its corresponding simplex?

Let us take a second look at the graph Laplacian. As the graph Laplacian is a positive semidefinite operator, it can be written as

    L = B B^T    (8)

Here, B is a |V| × |E| matrix such that the column corresponding to edge (u, v) contains +1 and −1 in rows u and v, respectively; the exact ordering does not matter. B is called the boundary operator ∂₁ that maps 1-chains (edges) to 0-chains, and B^T is the co-boundary operator that maps 0-chains to 1-chains. Note that B is different from H; although H is also a vertex-edge incidence matrix, all of its entries are non-negative. We can rewrite

    f^T L f = f^T B B^T f = ‖B^T f‖²    (9)

Thus f^T L f is the squared norm of a vector of size |E|, whose entries are the changes in the vertex function, or 0-chain, along each edge. This is a particular case of the general definition of the pth Laplacian operator on p-chains, given by

    L_p = ∂_{p+1} ∂_{p+1}^T + ∂_p^T ∂_p    (10)

Symbolically, this is exactly the same as the Laplace operator on p-forms on a Riemannian manifold (Rosenberg, 1997). For the case of hypergraphs or simplicial complexes, we interpret this as the operator that measures variations on functions defined on p-sized subsets of the vertex set (p-chains). It does so by considering the change in the chain with respect to simplices of size p + 1 and p − 1. For the case of the ordinary graph, we only consider the first term in the above expression, since vertex functions are 0-chains and there are no −1 sized simplices. It is, however, possible to consider 1-chains, or functions defined on the edges of the graph, and measure their variation using the edge Laplacian, given by L₁ = B^T B. In light of this, the usual Laplacian on the graph is the L₀ or vertex Laplacian. In (Chung, 1993) the Laplacian for the particular case of the k-uniform hypergraph is presented. A more elaborate discussion of the construction of various kinds of Laplacians on simplicial complexes and their uses is given in (Forman, 2003).

Unfortunately, while geometrically and algebraically these constructions extend the graph Laplacian to hypergraphs, it is not clear how one can use them in machine learning. The fundamental object we are interested in is a vertex function or 0-chain, thus the linear operator we are looking for should operate on 0-chains. Notice, however, that a pth order Laplacian only considers p-chains, and the structure of the Laplacian depends on the incidence relations between p − 1, p and p + 1 simplices. To operate on vertex functions, one needs a vertex Laplacian, which unfortunately only considers the incidence of 0-chains with 1-chains. Thus the vertex Laplacian for a k-uniform hypergraph will not consider any hyperedges, rendering it useless for the purposes of studying vertex functions. Indeed, the Laplacian on a 3-uniform hypergraph operates on 2-chains, functions defined on all pairs of vertices (Chung, 1993).

5. Hypergraph Learning Algorithms

A number of existing methods for learning from a hypergraph representation of data first construct a graph representation using the structure of the initial hypergraph. Then, they project the data onto the eigenvectors of the combinatorial or normalized graph Laplacian. Other methods define a hypergraph "Laplacian" using analogies from the graph Laplacian. These methods show that the eigenvectors of their Laplacians are useful for learning, and that there is a relationship between their hypergraph Laplacians and the structure of the hypergraph. In this section, we review these methods. In the next section, we compare these methods analytically.

5.1. Clique Expansion

The clique expansion algorithm constructs a graph G^x(V, E^x ⊆ V²) from the original hypergraph G(V, E) by replacing each hyperedge e = (u₁, ..., u_{δ(e)}) ∈ E with an edge for each pair of vertices in the hyperedge (Zien et al., 1999): E^x = {(u, v) : u, v ∈ e, e ∈ E}. Note that the vertices in hyperedge e form a clique in the graph G^x. The edge weight w^x(u, v) minimizes the difference between the weight of the graph edge and the weight of each hyperedge e that contains both u and v:

    w^x(u, v) = arg min_{w^x(u,v)} Σ_{e∈E : u,v∈e} (w^x(u, v) − w(e))²    (11)

Thus, clique expansion uses the discriminative model that every edge in the clique of G^x associated with hyperedge e has weight w(e). The minimizer of this criterion is simply

    w^x(u, v) = µ Σ_{e∈E : u,v∈e} w(e) = µ Σ_e h(u, e) h(v, e) w(e)    (12)

Here µ is a fixed scalar. The combinatorial or normalized Laplacian of the constructed graph G^x is then used to partition the vertices.

5.2. Star Expansion

The star expansion algorithm constructs a graph G*(V*, E*) from hypergraph G(V, E) by introducing a new vertex for every hyperedge e ∈ E, thus V* = V ∪ E (Zien et al., 1999). It connects the new graph vertex e to each vertex in the hyperedge corresponding to it, i.e. E* = {(u, e) : u ∈ e, e ∈ E}. Note that each hyperedge in E corresponds to a star in the graph G*, and that G* is a bipartite graph. Star expansion assigns the scaled hyperedge weight to each corresponding graph edge:

    w*(u, e) = w(e)/δ(e)    (13)

The combinatorial or normalized Laplacian of the constructed graph G* is then used to partition the vertices.

5.3. Bolla's Laplacian

Bolla (Bolla, 1993) defines a Laplacian for an unweighted hypergraph in terms of the diagonal vertex degree matrix Dv, the diagonal edge degree matrix De, and the incidence matrix H, defined in Section 2:

    L° := Dv − H De^{−1} H^T.    (14)

The eigenvectors of Bolla's Laplacian L° define the "best" Euclidean embedding of the hypergraph. Here, the cost for an embedding φ : V → R^k of the hypergraph is the total squared distance between pairs of embedded vertices in the same hyperedge,

    Σ_{u,v∈V} Σ_{e∈E : u,v∈e} ‖φ(u) − φ(v)‖² / δ(e)    (15)

Bolla shows a relationship between the spectral properties of L° and the minimum cut of the hypergraph.

5.4. Rodríguez's Laplacian

Rodríguez (Rodríguez, 2003; Rodríguez, 2002) constructs a weighted graph G^r(V, E^r = E^x) from an unweighted hypergraph G(V, E). Like clique expansion, each hyperedge is replaced by a clique in the graph G^r. The weight w^r(u, v) of an edge is set to the number of hyperedges containing both u and v:

    w^r(u, v) = |{e ∈ E : u, v ∈ e}|    (16)

Rodríguez expresses the graph Laplacian applied to G^r in terms of the hypergraph structure:

    L^r(G^r) = Dv^r − H H^T    (17)
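For concreteness, the clique expansion weights of Eq. (12) (taking µ = 1) are just the off-diagonal entries of H W H^T, and with unit edge weights they reduce to Rodríguez's co-occurrence counts of Eq. (16). The sketch below uses a toy hypergraph of our own choosing:

```python
import numpy as np

def clique_expansion_affinity(H, w, mu=1.0):
    # w^x(u,v) = mu * sum_{e : u,v in e} w(e) = mu * [H W H^T]_{uv}
    # for u != v (Eq. 12); the diagonal is zeroed out.
    A = mu * (H @ np.diag(w) @ H.T)
    np.fill_diagonal(A, 0.0)
    return A

# Toy 3-uniform hypergraph on 4 vertices with two hyperedges.
H = np.array([[1, 0],
              [1, 1],
              [1, 1],
              [0, 1]], dtype=float)
w = np.array([1.0, 2.0])

A = clique_expansion_affinity(H, w)            # weighted clique expansion
R = clique_expansion_affinity(H, np.ones(2))   # Rodriguez: co-occurrence counts
```

Vertices 1 and 2 share both hyperedges, so w^x(1, 2) = 1 + 2 = 3 and w^r(1, 2) = 2, while vertices 0 and 3 share no hyperedge and get weight 0.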

where Dv^r is the vertex degree matrix of the graph G^r. Like Bolla, Rodríguez shows a relationship between the spectral properties of L^r and the cost of minimum partitions of the hypergraph.

5.5. Zhou's Normalized Laplacian

Zhou et al. (Zhou et al., 2005) generalize their earlier work on regularization on graphs and consider the following regularization on a vertex function f:

    ⟨f, L^z f⟩ = (1/2) Σ_{e∈E} Σ_{{u,v}⊆e} (w(e)/δ(e)) ( f(u)/√d(u) − f(v)/√d(v) )²

Note that this regularization term is small if vertices with high affinities have the same label. They show that the operator L^z can then be written as

    L^z = I − Dv^{−1/2} H W De^{−1} H^T Dv^{−1/2}    (18)

In addition, Zhou et al. define a hypergraph normalized cut criterion for a k-partition of the vertices P_k = {V₁, ..., V_k}:

    NCut(P_k) := Σ_{i=1}^{k} [ Σ_{e∈E} w(e) |e ∩ V_i| |e ∩ V_i^c| / δ(e) ] / [ Σ_{v∈V_i} d(v) ]    (19)

This criterion is analogous to the normalized cut criterion for graphs. They then show that if minimizing the normalized cut is relaxed to a real-valued optimization problem, the second smallest eigenvector of L^z is the optimal classification function f. Finally, they also draw a parallel between their hypergraph normalized cut criterion and random walks over the hypergraph.

5.6. Gibson's Dynamical System

In (Gibson et al., 1998) the authors propose a dynamical system to cluster categorical data that can be represented using a hypergraph. They consider the following iterative process:

    1. s_{ij}^{n+1} = Σ_{e : i∈e} Σ_{k∈e, k≠i} w_e s_{kj}^n
    2. Orthonormalize the vectors s_j^n.

They prove that the above iteration is convergent. We observe that

    s_{ij}^{n+1} = Σ_e h(i, e) ( Σ_k h(k, e) w_e s_{kj}^n − w_e s_{ij}^n )

    s_j^{n+1} = (H W H^T − Dv) s_j^n    (20)

Thus, the iterative procedure described above is the power method for calculating the eigenvectors of the adjacency matrix S = H W H^T − Dv.

5.7. Li's Adjacency Matrix

Li et al. (Li & Solé, 1996) formally define properties of a regular, unweighted hypergraph G(V, E) in terms of the star expansion of the hypergraph. In particular, they define the |V| × |V| adjacency matrix of the hypergraph, H H^T. They show a relationship between the spectral properties of the adjacency matrix H H^T and the structure of the hypergraph.

6. Comparing Hypergraph Learning Algorithms

In this section, we compare the algorithms for learning from a hypergraph representation of data described in Section 5. In Section 6.1, we compute the normalized Laplacian for the star expansion graph. In Section 6.2, we compute the combinatorial and normalized Laplacian of the clique expansion graph. In Section 6.3, we show that these Laplacians are nearly equivalent to each other. Finally, in Section 6.4, we show that the various hypergraph Laplacians can be written as the graph Laplacian of the clique or star expansion graph.

We begin by stating a simple lemma. The proof is trivial.

Lemma 1. Let

    B = [ I     −A ]
        [ −A^T   I ]

be a block matrix with A rectangular, and consider the eigenvalue problem

    [ I     −A ] [ x ]     [ x ]
    [ −A^T   I ] [ y ]  = λ [ y ]

Then the following relation holds:

    A A^T x = (1 − λ)² x

6.1. Star Graph Laplacian

Given a hypergraph G(V, E), consider the star graph G*(V*, E*), i.e. V* = V ∪ E, E* = {(u, e) : u ∈ e, e ∈ E}. Notice that this is a bipartite graph, with vertices corresponding to E on one side and vertices corresponding to V on the other, since there are no edges from V to V or from E to E. Let us also assume that the vertex set V* has been ordered such that all elements of V come before elements of E.

Let w* : V × E → R+ be the (as yet unspecified) graph edge weight function. In addition, let S* be the (|V| + |E|) × (|V| + |E|) affinity matrix. We can write the affinity matrix in terms of the hypergraph structure and the weight function w* as

    S* = [ 0_{|V|}    H W* ]
         [ W* H^T    0_{|E|} ]    (21)

The degrees of vertices in G* are then

    d*(u) = Σ_{e∈E} h(u, e) w*(u, e),   u ∈ V    (22)
    d*(e) = Σ_{u∈V} h(u, e) w*(u, e),   e ∈ E    (23)

The normalized Laplacian of this graph can now be written in the form

    𝓛* = [ I     −A ]
         [ −A^T   I ]    (24)

Here, A = Dv*^{−1/2} H W* De*^{−1/2} is the |V| × |E| matrix with entry

    A_{ue} = h(u, e) w*(u, e) / ( √d*(u) √d*(e) )    (25)

Any |V| + |E| eigenvector x = [x_v^T, x_e^T]^T of 𝓛* satisfies 𝓛* x = λ x. Then by Lemma 1, we know that

    A A^T x_v = (λ − 1)² x_v.    (26)

Thus, the |V| elements of the eigenvectors of the normalized Laplacian 𝓛* corresponding to vertices V ⊆ V* are the eigenvectors of the |V| × |V| matrix A A^T. Element (u, v) of A A^T is

    [A A^T]_{uv} = Σ_{e∈E} h(u, e) h(v, e) w*(u, e) w*(v, e) / ( √d*(u) d*(e) √d*(v) )    (27)

For the standard star expansion weighting function, w*(u, e) = w(e)/δ(e), so the vertex degrees are

    d*(u) = Σ_{e∈E} h(u, e) w(e)/δ(e),   u ∈ V    (28)
    d*(e) = Σ_{u∈e} w(e)/δ(e) = w(e),    e ∈ E    (29)

Thus, we can write

    [A A^T]_{uv} = Σ_{e∈E} ( h(u, e) h(v, e) w(e)/δ(e)² ) / ( √d*(u) √d*(v) )    (30)

6.2. Clique Graph Laplacian

Given a hypergraph G(V, E), consider the graph G^c(V, E^c = E^x) with the same structure as the clique expansion graph, i.e. E^c = {(u, v) : u, v ∈ e, e ∈ E}. Let w^c : V × V → R+ be the (as yet unspecified) graph edge weight function. We can write the normalized Laplacian of G^c in terms of the hypergraph structure and the weight function w^c as 𝓛^c := I − C. If there is no hyperedge e ∈ E such that u, v ∈ e, then C_{uv} = 0. Otherwise,

    [C]_{uv} = w^c(u, v) / ( √d^c(u) √d^c(v) )    (31)

where

    d^c(u) = Σ_{v : v≠u} w^c(u, v)    (32)

is the vertex degree. For the standard clique expansion construction,

    w^c(u, v) = w^x(u, v) = Σ_{e∈E : u,v∈e} w(e),    (33)

so the vertex degrees are

    d^c(u) = d^x(u) = Σ_{e∈E} h(u, e) (δ(e) − 1) w(e)    (34)

6.3. Unifying Star and Clique Expansion

To show the relationship between star and clique expansion, consider the star expansion graph G_c*(V*, E*) with weighting function

    w_c*(u, e) := w(e)(δ(e) − 1)    (35)

Note that this is δ(e)(δ(e) − 1) times the standard star expansion weighting function w*(u, e) (Eq. (13)). Plugging this value into Equations (22) and (23), we get that the degrees of vertices in G_c* are

    d_c*(u) = Σ_{e∈E} h(u, e) w(e)(δ(e) − 1) = d^x(u)    (36)
    d_c*(e) = w(e) δ(e) (δ(e) − 1)    (37)

where d^x(u) is the vertex degree for the standard clique expansion graph G^x. Thus,

    [A_c A_c^T]_{uv} = Σ_{e∈E} ( (δ(e) − 1)/δ(e) ) · h(u, e) h(v, e) w(e) / ( √d^x(u) √d^x(v) )    (38)

Similarly, suppose we choose the clique expansion weighting function

    w_*^c(u, v) := Σ_{e∈E} h(u, e) h(v, e) w(e) / ( δ(e)(δ(e) − 1) )    (39)
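Lemma 1 is easy to verify numerically. The sketch below (a random rectangular A of our own choosing) checks that the V-block of every eigenvector of the block matrix satisfies A A^T x = (1 − λ)² x:

```python
import numpy as np

# Numeric check of Lemma 1 on a random |V| x |E| matrix A.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 6))
L = np.block([[np.eye(4), -A], [-A.T, np.eye(6)]])

lam, U = np.linalg.eigh(L)           # L is symmetric, so eigh applies
for i in range(10):
    x = U[:4, i]                     # the |V| block of the i-th eigenvector
    # Lemma 1: A A^T x = (1 - lambda)^2 x (trivially true when x = 0)
    assert np.allclose(A @ A.T @ x, (1 - lam[i]) ** 2 * x, atol=1e-8)
```

The check follows directly from the block equations x − A y = λ x and −A^T x + y = λ y, which give A y = (1 − λ) x and A^T x = (1 − λ) y.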

Then we can show that the vertex degree is

    d_*^c(u) = Σ_{e∈E} h(u, e) w(e)/δ(e) = d*(u)    (40)

where d*(u) is the vertex degree function for the standard star expansion. We can then write

    [C_*]_{uv} = Σ_{e∈E} ( 1/(δ(e)(δ(e) − 1)) ) · h(u, e) h(v, e) w(e) / ( √d*(u) √d*(v) )    (41)

A commonly occurring case is the k-uniform hypergraph. In this case, each hyperedge has exactly the same number of vertices, i.e. δ(e) = k. Then it is easy to see that the bipartite graph matrix A_c A_c^T is a constant scalar times the clique expansion matrix C. Thus, the eigenvectors of the normalized Laplacian for the bipartite graph G_c* are exactly the eigenvectors of the normalized Laplacian for the standard clique expansion graph G^x. Similarly, the clique matrix C_* is a constant scalar times the standard star expansion matrix A A^T. Thus, the eigenvectors of the normalized Laplacian for the clique graph G_*^c are exactly the eigenvectors of the normalized Laplacian for the standard star expansion. This is a surprising result, since the two graphs are completely different in the number of vertices and the connectivity between these vertices.

For non-uniform hypergraphs (i.e., when the hyperedge cardinality varies), the bipartite graph matrix A_c A_c^T, while not the same, is close to the standard clique expansion matrix C. Each term in the sum in Equation (38) has an additional factor (δ(e) − 1)/δ(e), giving slightly higher weight to hyperedges with a higher degree. This difference, however, is not large, especially for higher cardinalities. As the bipartite graph matrix A_c A_c^T is approximately the clique expansion matrix, we conclude that their eigenvectors are similar. A similar relation holds for the clique graph G_*^c and the standard star expansion, where the clique graph gives lower weight to larger edges. These observations can be reversed to characterize the behavior of the standard clique expansion and star expansion constructions, and we conclude that clique expansion gives more weight to evidence from larger edges than star expansion.

There is no clear reason why one should give more weight to smaller hyperedges versus larger hyperedges or vice versa. The exact choice will depend on the properties of the affinity function used.

6.4. Unifying Hypergraph Laplacians

In this section we take a second look at the various constructions in Section 5 and show that they all correspond to either the clique or the star expansion of the original hypergraph with an appropriate weighting function.

For an unweighted hypergraph, Bolla's Laplacian L° corresponds to the unnormalized Laplacian of the associated clique expansion in which the hyperedge weights are scaled by the inverse of the edge degree matrix De, giving the affinity matrix

    W° = H De^{−1} H^T    (42)

The row sums of this matrix are given by

    d°(u) = Σ_{e∈E} h(u, e) (1/δ(e)) Σ_v h(v, e) = Σ_{e∈E} h(u, e)    (43)

which, as a diagonal matrix, is exactly the vertex degree matrix Dv for an unweighted hypergraph, giving us the unnormalized Laplacian

    L° = Dv − H De^{−1} H^T    (44)

The Rodríguez Laplacian can similarly be shown to be the unnormalized Laplacian of the clique expansion of an unweighted hypergraph with every hyperedge weight set to 1. Similarly, Gibson's algorithm calculates the eigenvectors of the adjacency matrix of the clique expansion graph.

We now turn our attention to the normalized Laplacian of Zhou et al. Consider the star expansion of the hypergraph with the weight function w^z(u, e) = w(e). Then the adjacency matrix of the resulting bipartite graph can be written as

    S^z = [ 0       H W ]
          [ W H^T   0   ]    (45)

It is easy to show that the degree matrix for this graph is the diagonal matrix

    D^z = [ Dv   0    ]
          [ 0    W De ]    (46)

Thus the normalized Laplacian for this bipartite graph is given by the matrix

    [ I                                 −Dv^{−1/2} H W^{1/2} De^{−1/2} ]
    [ −De^{−1/2} W^{1/2} H^T Dv^{−1/2}   I                             ]

Now if we consider the eigenvalue problem for this matrix, with eigenvectors x = [x_v^T, x_e^T]^T, then by Lemma 1 we can show that x_v is given by the following eigenvalue problem:

    Dv^{−1/2} H W De^{−1} H^T Dv^{−1/2} x_v = (1 − λ)² x_v    (47)
    (I − Dv^{−1/2} H W De^{−1} H^T Dv^{−1/2}) x_v = (1 − (1 − λ)²) x_v

This is exactly the same eigenvalue problem that Zhou et al. propose for the solution of the clustering problem. Thus Zhou et al.'s Laplacian is equivalent to constructing a star expansion and using the normalized Laplacian defined on it. Table 1 summarizes this discussion.
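The k-uniform equivalence of Section 6.3 can also be checked numerically. For δ(e) = k with the standard weightings, Eqs. (30) and (38) combine (our own algebra, not stated in this exact form above) into the affine relation A A^T = ((k − 1)/k) C + (1/k) I, so the two matrices share eigenvectors. The sketch below verifies this on a toy 3-uniform hypergraph of our own choosing:

```python
import numpy as np

# Toy 3-uniform hypergraph: 6 vertices, 5 hyperedges, fixed weights.
k = 3
edges = [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 0)]
w = np.array([1.0, 2.0, 0.5, 1.5, 1.0])

H = np.zeros((6, len(edges)))
for j, e in enumerate(edges):
    H[list(e), j] = 1.0

# Standard star expansion: w*(u,e) = w(e)/k, so d*(e) = w(e) (Eq. 29).
d_star_v = H @ (w / k)                        # d*(u), Eq. (28)
A = (H * (w / k)) / np.sqrt(np.outer(d_star_v, w))   # Eq. (25)

# Standard clique expansion: w^x(u,v) = sum_e h h w(e), d^x from Eq. (34).
Wx = H @ np.diag(w) @ H.T
np.fill_diagonal(Wx, 0.0)
dx = H @ ((k - 1) * w)
C = Wx / np.sqrt(np.outer(dx, dx))            # Eq. (31)

# k-uniform case: the two normalized affinities agree up to an affine
# spectral shift, hence identical eigenvectors.
assert np.allclose(A @ A.T, (k - 1) / k * C + np.eye(6) / k)
```

The shift by (1/k) I only translates the spectrum, so the eigenvectors of the star and clique constructions coincide, as claimed.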

    Algorithm    Graph    Matrix
    Bolla        Clique   Combinatorial Laplacian
    Rodríguez    Clique   Combinatorial Laplacian
    Zhou         Star     Normalized Laplacian
    Gibson       Clique   Adjacency
    Li           Star     Adjacency

Table 1. This table summarizes the various hypergraph learning algorithms, their underlying graph construction and the associated matrix used for the spectral analysis.

7. Discussion

In this paper we have examined the use of hypergraphs in learning with higher order relations. We surveyed the various Laplace-like operators that have been constructed to analyze the structure of hypergraphs. We showed that all of these methods, despite their very different formulations, can be reduced to two graph constructions, the star expansion and the clique expansion, and the study of their associated Laplacians. We have also shown that for the commonly occurring case of k-uniform hypergraphs these two constructions are identical. This is a surprising and unexpected result, as the two graph constructions are completely different in structure. In the case of non-uniform hypergraphs, we showed that the essential difference between the two constructions is how they weigh the evidence from hyperedges of differing sizes.

In summary, while hypergraphs may be an intuitive representation of higher order similarities, it seems (anecdotally at least) that graphs lie at the heart of this problem.

Acknowledgements

It is a pleasure to acknowledge our conversations with Fan Chung Graham, Pietro Perona, Lihi Zelnik-Manor, Eric Wiewiora, Piotr Dollár, and Manmohan Chandraker. Serge Belongie and Sameer Agarwal were supported by NSF CAREER Grant #0448615 and the Alfred P. Sloan Research Fellowship. Kristin Branson was supported by NASA GSRP grant NGT5-50479.

References

Agarwal, S., Lim, J., Zelnik-Manor, L., Perona, P., Kriegman, D. J., & Belongie, S. (2005). Beyond pairwise clustering. CVPR (2) (pp. 838–845).

Alpert, C. J., Kahng, A. B., & Yao, S.-Z. (1999). Spectral partitioning with multiple eigenvectors. Discrete Applied Mathematics, 90, 3–26.

Belkin, M., & Niyogi, P. (2003). Semi-supervised learning on Riemannian manifolds. Machine Learning, 56, 209–239.

Bolla, M. (1993). Spectra, Euclidean representations and clusterings of hypergraphs. Discrete Mathematics, 117.

Chung, F. R. (1993). The Laplacian of a hypergraph. In J. Friedman (Ed.), Expanding graphs (DIMACS series), 21–36. AMS.

Chung, F. R. K. (1997). Spectral graph theory. AMS.

Forman, R. (2003). Bochner's method for cell complexes and combinatorial Ricci curvature. Discrete & Computational Geometry, 29, 323–374.

Gibson, D., Kleinberg, J. M., & Raghavan, P. (1998). Clustering categorical data: An approach based on dynamical systems. VLDB (pp. 311–322).

Govindu, V. M. (2005). A tensor decomposition for geometric grouping and segmentation. CVPR.

Li, W., & Solé, P. (1996). Spectra of regular graphs and hypergraphs and orthogonal polynomials. European Journal of Combinatorics, 17, 461–477.

Munkres, J. R. (1984). Elements of algebraic topology. The Benjamin/Cummings Publishing Company.

Ng, A., Jordan, M., & Weiss, Y. (2002). On spectral clustering: Analysis and an algorithm. NIPS.

Rodríguez, J. A. (2002). On the Laplacian eigenvalues and metric parameters of hypergraphs. Linear and Multilinear Algebra, 50, 1–14.

Rodríguez, J. A. (2003). On the Laplacian spectrum and walk-regular hypergraphs. Linear and Multilinear Algebra, 51, 285–297.

Rosenberg, S. (1997). The Laplacian on a Riemannian manifold. London Mathematical Society.

Shashua, A., & Hazan, T. (2005). Non-negative tensor factorization with applications to statistics and vision. ICML.

Shashua, A., Zass, R., & Hazan, T. (2006). Multi-way clustering using super-symmetric non-negative tensor factorization. ECCV.

Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. PAMI, 22, 888–905.

Smola, A., & Kondor, I. (2003). Kernels and regularization on graphs. COLT. Springer.

Zhou, D., Huang, J., & Schölkopf, B. (2005). Beyond pairwise classification and clustering using hypergraphs (Technical Report 143). Max Planck Institute for Biological Cybernetics, Tübingen, Germany.

Zhou, D., & Schölkopf, B. (2005). Regularization on discrete spaces. Pattern Recognition, 361–368.

Zhu, X. (2005). Semi-supervised learning literature survey (Technical Report 1530). Computer Sciences, University of Wisconsin.

Zien, J. Y., Schlag, M. D. F., & Chan, P. K. (1999). Multilevel spectral hypergraph partitioning with arbitrary vertex sizes. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 18, 1389–1399.