
Higher Order Learning with Graphs

Sameer Agarwal [email protected]    Kristin Branson [email protected]    Serge Belongie [email protected]
Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA 92093, USA

Abstract

Recently there has been considerable interest in learning with higher order relations (i.e., three-way or higher) in the unsupervised and semi-supervised settings. Hypergraphs and tensors have been proposed as the natural way of representing these relations, and their corresponding algebra as the natural tools for operating on them. In this paper we argue that hypergraphs are not a natural representation for higher order relations; indeed, pairwise as well as higher order relations can be handled using graphs. We show that various formulations of the semi-supervised and the unsupervised learning problem on hypergraphs result in the same graph theoretic problem and can be analyzed using existing tools.

1. Introduction

Given a data set, it is common practice to represent the similarity relation between its elements using a weighted graph. A number of methods for unsupervised and semi-supervised learning can then be formulated in terms of operations on this graph. In some cases, like spectral clustering, the relation between the structural and the spectral properties of the graph can be exploited to construct matrix theoretic methods that are also graph theoretic. The most commonly used matrix in these methods is the Laplacian of the graph (Chung, 1997). In the same manner that the Laplace-Beltrami operator is used to analyze the geometry of continuous manifolds, the Laplacian of a graph is used to study the structure of the graph and functions defined on it.

A fundamental constraint in this formulation is the assumption that it is possible to measure similarity between pairs of points. Consider a k-lines algorithm which clusters points in R^d into k clusters where elements in each cluster are well approximated by a line. As every pair of data points trivially defines a line, there is no useful measure of similarity between pairs of points for this problem. However, it is possible to define measures of similarity over triplets of points that indicate how close they are to being collinear. This analogy can be extended to any model-based clustering task where the fitting error of a set of points to a model can be considered a measure of the dissimilarity among them. We refer to similarity/dissimilarity measured over triples or more of points as higher order relations.

A number of questions that have been addressed in domains with pairwise relations can now be asked for the case of higher order relations. How does one perform clustering in such a domain? How can one formulate and solve the semi-supervised and unsupervised learning problems in this setting?

Hypergraphs are a generalization of graphs in which the edges are arbitrary non-empty subsets of the vertex set. Instead of having edges between pairs of vertices, hypergraphs have edges that connect sets of two or more vertices. While our understanding of hypergraph spectral methods relative to that of graphs is very limited, a number of authors have considered extensions of spectral graph theoretic methods to hypergraphs.

Another possible representation of higher order relations is a tensor. Tensors are a generalization of matrices to higher dimensional arrays, and they can be analyzed with multilinear algebra.

Recently a number of authors have considered the problem of unsupervised and semi-supervised learning in domains with higher order relations (Agarwal et al., 2005; Govindu, 2005; Shashua & Hazan, 2005; Shashua et al., 2006; Zhou et al., 2005). The success of graph and matrix theoretic representations has prompted researchers to extend these representations to the case of higher order relations.

Appearing in Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, 2006. Copyright 2006 by the author(s)/owner(s).
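As a concrete illustration of the k-lines setting, one natural higher order dissimilarity over a triple of points is the total least-squares error of fitting a single line through the three points. The sketch below is our own toy example (the function names are not from any library); it computes this residual with numpy via the singular values of the centered point set:

```python
import numpy as np

def line_fit_error(points):
    # Total least-squares residual of fitting one line to a set of
    # d-dimensional points: sum of squared singular values beyond the
    # first, after centering the points at their mean.
    X = np.asarray(points, dtype=float)
    X = X - X.mean(axis=0)
    s = np.linalg.svd(X, compute_uv=False)
    return float(np.sum(s[1:] ** 2))

def triple_dissimilarity(p, q, r):
    # Higher order (3-way) dissimilarity: how far a triple of points
    # is from being collinear.
    return line_fit_error([p, q, r])
```

A collinear triple such as (0,0), (1,1), (2,2) has zero residual, while a spread-out triple has a strictly positive one; thresholding or exponentiating this quantity gives a 3-way affinity of the kind discussed above.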
In this paper we focus on spectral graph and hypergraph theoretic methods for learning with higher order relations. We survey a number of approaches from machine learning, VLSI CAD and spectral graph theory that have been proposed for analyzing the structure of hypergraphs. We show that despite significant differences in how previous authors have approached the problem, there are two basic graph constructions that underlie all these studies. Furthermore, we show that these constructions are essentially the same when viewed through the lens of the normalized Laplacian.

The paper is organized as follows. Section 2 defines our notation. Section 3 reviews the properties of the graph Laplacian from a machine learning perspective. Section 4 considers the algebraic generalization of the Laplacian to higher order structures and shows why it is not useful for machine learning tasks. Section 5 presents a survey of graph constructions and linear operators related to hypergraphs that various studies have used for analyzing the structure of hypergraphs and for unsupervised and semi-supervised learning. In Section 6 we show how all these constructions can be reduced to two graph constructions and their associated Laplacians. Finally, in Section 7 we conclude with a summary and discussion of key results.

2. Notation

Let G(V, E) denote a hypergraph with vertex set V and edge set E. The edges are arbitrary subsets of V, with a weight w(e) associated with edge e. The degree d(v) of a vertex is d(v) = Σ_{e∈E : v∈e} w(e). The degree of an edge e is denoted by δ(e) = |e|. For k-uniform hypergraphs, the degrees of all edges are the same, δ(e) = k. In particular, for the case of ordinary graphs or "2-graphs," δ(e) = 2. The vertex-edge incidence matrix H is |V| × |E|, where the entry h(v, e) is 1 if v ∈ e and 0 otherwise. By these definitions, we have:

    d(v) = Σ_{e∈E} w(e) h(v, e)   and   δ(e) = Σ_{v∈V} h(v, e)    (1)

De and Dv are the diagonal matrices consisting of edge and vertex degrees, respectively. W is the diagonal matrix of edge weights, w(·). A number of different symbols have been used in the literature to denote the Laplacian of a graph. We follow the convention in (Chung, 1997) and use L for the combinatorial Laplacian and 𝓛 for the normalized Laplacian. L is also known as the unnormalized Laplacian of a graph and is usually written as

    L = Dv − S    (2)

where S is the |V| × |V| affinity matrix, with entry (u, v) equal to the weight of the edge (u, v) if u and v are connected, and 0 otherwise. An important variant is the normalized Laplacian,

    𝓛 = I − Dv^{−1/2} S Dv^{−1/2}    (3)

For future reference it is useful to rewrite the above expressions in terms of the vertex-edge incidence matrix:

    L = 2 Dv − H W H^T    (4)
    𝓛 = 2 I − Dv^{−1/2} H W H^T Dv^{−1/2}    (5)

3. The Graph Laplacian

The graph Laplacian is the discrete analog of the Laplace-Beltrami operator on compact Riemannian manifolds (Belkin & Niyogi, 2003; Rosenberg, 1997; Chung, 1997). It has been used extensively in machine learning, initially for the unsupervised case and then recently for semi-supervised learning (Zhu, 2005). In this section we highlight some of the properties of the Laplacian from a machine learning perspective, and motivate the search for similar operators on hypergraphs.

Perhaps the earliest use of the graph Laplacian was the development of spectral clustering algorithms, which considered continuous relaxations of graph partitioning problems (Alpert et al., 1999; Shi & Malik, 2000; Ng et al., 2002). The relaxation converts the optimization problem into a generalized eigenvalue problem involving the Laplacian of the graph.

In (Zhou & Schölkopf, 2005), the authors develop a discrete calculus on 2-graphs by treating them as discrete analogs of compact Riemannian manifolds. As one of the consequences of this development they argue that, in analogy to the continuous case, the graph Laplacian be defined as an operator L : H(V) → H(V),

    Lf := (1/2) div(∇f)    (6)

Zhou et al. also argue that there exists a family of regularization operators on 2-graphs, the Laplacian being one of them, that can be used for transduction, i.e., given a partial labeling y of the graph vertices, use the geometric structure of the graph to induce a labeling f on the unlabeled vertices. The vertex label y(v) is +1, −1 for positive and negative examples, respectively, and 0 if no information is available about the label. They consider the regularized least squares problem

    arg min_f  ⟨f, Lf⟩ + µ ‖f − y‖²    (7)
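The notation of Section 2 is easy to exercise numerically. The sketch below uses a toy hypergraph of our own choosing to build H, W, Dv and De, and then checks the 2-graph identity L = Dv − S = 2 Dv − H W H^T of Eqs. (2) and (4) on a small ordinary graph:

```python
import numpy as np

# Toy weighted hypergraph on vertices 0..3 with two 3-vertex hyperedges.
edges = [([0, 1, 2], 1.0), ([1, 2, 3], 2.0)]
V, E = 4, len(edges)

H = np.zeros((V, E))                      # vertex-edge incidence matrix
for j, (e, _) in enumerate(edges):
    H[e, j] = 1.0
w = np.array([we for _, we in edges])
W = np.diag(w)                            # diagonal edge weight matrix

d_v = H @ w                               # d(v) = sum_e w(e) h(v,e), Eq. (1)
delta = H.sum(axis=0)                     # delta(e) = sum_v h(v,e), Eq. (1)
Dv, De = np.diag(d_v), np.diag(delta)

# For an ordinary graph (every delta(e) = 2), L = Dv - S = 2 Dv - H W H^T.
Hg = np.zeros((3, 2)); Hg[[0, 1], 0] = 1.0; Hg[[1, 2], 1] = 1.0
Wg = np.diag([1.0, 3.0])
S = np.array([[0, 1, 0], [1, 0, 3], [0, 3, 0]], dtype=float)  # affinity
Dg = np.diag(S.sum(axis=1))
assert np.allclose(Dg - S, 2 * Dg - Hg @ Wg @ Hg.T)           # Eq. (4)
```

The identity holds because for a 2-graph the diagonal of H W H^T is exactly the vertex degree, so H W H^T = Dv + S.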

While the discrete version of the problem, where f(v) ∈ {+1, −1}, is a hard combinatorial problem, relaxing the range of f to the real line R results in a simple linear least squares problem, solved as f = µ(µI + L)^{−1} y. A similar formulation is considered by (Belkin & Niyogi, 2003). In the absence of any labels, the problem reduces to that of clustering, and the eigenmodes of the Laplacian are used to label the vertices (Shi & Malik, 2000; Ng et al., 2002).

More generally, in (Smola & Kondor, 2003), the authors prove that just as the continuous Laplacian is the unique linear second order self-adjoint operator invariant under the action of rotation operators, the same is true for the Laplacian and the unnormalized Laplacian, with the group of rotations replaced by the group of permutations.

A number of successful regularizers in the continuous domain can be written as ⟨f, r(L)f⟩, where L is the continuous Laplacian, f is the model, and r is a non-decreasing scalar function that operates on the spectrum of L. Smola and Kondor show that the same can be shown for a variety of regularization operators on graphs.

4. Higher Order Laplacians

In light of the previous section, it is interesting to consider generalizations of the Laplacian to higher order structures. We now present a brief look at the algebro-geometric view of the Laplacian, and how it leads to the generalization of the combinatorial Laplacian for hypergraphs. For simplicity of exposition, we will consider the unweighted case. For a more formal presentation of the material in this section, we refer the reader to (Munkres, 1984; Chung, 1993; Chung, 1997; Forman, 2003).

Let us assume that a graph represents points in some abstract space, with the edges representing lines connecting these points and the weights on the edges having an inverse relation to the length of the line. The Laplacian then measures how smoothly a function defined on these points (vertices) changes with respect to their relative arrangement. As we saw earlier, the quadratic form f^T L f does this for the vertex function f. This view of a graph and its Laplacian can be generalized to hypergraphs. A hypergraph represents points in some abstract space where each hyperedge corresponds to a simplex in that space with the vertices of the hyperedge as its corners. The weight on the hyperedge is inversely related to the size of the simplex. Now we are not restricted to defining functions on just vertices; we can define functions on sets of vertices, corresponding to lines, triangles, etc. Algebraic topologists refer to these functions as p-chains, where p is the size of the simplices on which they are defined. Thus vertex functions are 0-chains, edge functions are 1-chains, and so on. In each case one can ask the question, how does one measure the variation in these functions with respect to the geometry of the hypergraph or its corresponding simplex?

Let us take a second look at the graph Laplacian. As the graph Laplacian is a positive semidefinite operator, it can be written as

    L = B B^T    (8)

Here, B is a |V| × |E| matrix such that the column corresponding to edge (u, v) contains +1 and −1 in rows u and v, respectively; the exact ordering does not matter. B is called the boundary operator ∂₁ that maps 1-chains (edges) to 0-chains, and B^T is the co-boundary operator that maps 0-chains to 1-chains. Note that B is different from H; although H is also a vertex-edge incidence matrix, all of its entries are non-negative. We can rewrite

    f^T L f = f^T B B^T f = ‖B^T f‖²    (9)

Thus f^T L f is the squared norm of a vector of size |E|, whose entries are the changes in the vertex function, or 0-chain, along each edge. This is a particular case of the general definition of the pth Laplacian operator on p-chains, given by

    L_p = ∂_{p+1} ∂_{p+1}^T + ∂_p^T ∂_p    (10)

Symbolically, this is exactly the same as the Laplace operator on p-forms on a Riemannian manifold (Rosenberg, 1997). For the case of hypergraphs or simplicial complexes, we interpret this as the operator that measures variations on functions defined on p-sized subsets of the vertex set (p-chains). It does so by considering the change in the chain with respect to simplices of size p + 1 and p − 1. For the case of the ordinary graph, we only consider the first term in the above expression, since vertex functions are 0-chains and there are no −1 sized simplices. It is, however, possible to consider 1-chains, or functions defined on the edges of the graph, and measure their variation using the edge Laplacian, given by L₁ = B^T B. In light of this, the usual Laplacian on the graph is the L₀ or vertex Laplacian. In (Chung, 1993) the Laplacian for the particular case of the k-uniform hypergraph is presented. A more elaborate discussion of the construction of various kinds of Laplacians on simplicial complexes and their uses is given in (Forman, 2003).

Unfortunately, while geometrically and algebraically these constructions extend the graph Laplacian to hypergraphs, it is not clear how one can use them in machine learning. The fundamental object we are interested in is a vertex function or 0-chain, thus the linear operator we are looking for should operate on 0-chains. Notice, however, that a pth order Laplacian only considers p-chains, and the structure of the Laplacian depends on the incidence relations between p − 1, p and p + 1 simplices. To operate on vertex functions, one needs a vertex Laplacian, which unfortunately only considers the incidence of 0-chains with 1-chains. Thus the vertex Laplacian for a k-uniform hypergraph will not consider any hyperedges, rendering it useless for the purposes of studying vertex functions. Indeed, the Laplacian on a 3-uniform hypergraph operates on 2-chains, functions defined on all pairs of vertices (Chung, 1993).

5. Hypergraph Learning Algorithms

A number of existing methods for learning from a hypergraph representation of data first construct a graph representation using the structure of the initial hypergraph. Then, they project the data onto the eigenvectors of the combinatorial or normalized graph Laplacian. Other methods define a hypergraph "Laplacian" using analogies from the graph Laplacian. These methods show that the eigenvectors of their Laplacians are useful for learning, and that there is a relationship between their hypergraph Laplacians and the structure of the hypergraph. In this section, we review these methods. In the next section, we compare these methods analytically.

5.1. Clique Expansion

The clique expansion algorithm constructs a graph G^x(V, E^x ⊆ V²) from the original hypergraph G(V, E) by replacing each hyperedge e = (u₁, ..., u_{δ(e)}) ∈ E with an edge for each pair of vertices in the hyperedge (Zien et al., 1999): E^x = {(u, v) : u, v ∈ e, e ∈ E}. Note that the vertices in hyperedge e form a clique in the graph G^x. The edge weight w^x(u, v) minimizes the difference between the weight of the graph edge and the weight of each hyperedge e that contains both u and v:

    w^x(u, v) = arg min_{w^x(u,v)} Σ_{e∈E : u,v∈e} (w^x(u, v) − w(e))²    (11)

Thus, clique expansion uses the discriminative model that every edge in the clique of G^x associated with hyperedge e has weight w(e). The minimizer of this criterion is simply

    w^x(u, v) = µ Σ_{e∈E : u,v∈e} w(e) = µ Σ_e h(u, e) h(v, e) w(e)    (12)

Here µ is a fixed scalar. The combinatorial or normalized Laplacian of the constructed graph G^x is then used to partition the vertices.

5.2. Star Expansion

The star expansion algorithm constructs a graph G*(V*, E*) from hypergraph G(V, E) by introducing a new vertex for every hyperedge e ∈ E, thus V* = V ∪ E (Zien et al., 1999). It connects the new graph vertex e to each vertex in the hyperedge corresponding to it, i.e. E* = {(u, e) : u ∈ e, e ∈ E}. Note that each hyperedge in E corresponds to a star in the graph G*, and that G* is a bipartite graph. Star expansion assigns the scaled hyperedge weight to each corresponding graph edge:

    w*(u, e) = w(e)/δ(e)    (13)

The combinatorial or normalized Laplacian of the constructed graph G* is then used to partition the vertices.

5.3. Bolla's Laplacian

Bolla (Bolla, 1993) defines a Laplacian for an unweighted hypergraph in terms of the diagonal vertex degree matrix Dv, the diagonal edge degree matrix De, and the incidence matrix H, defined in Section 2:

    L° := Dv − H De^{−1} H^T.    (14)

The eigenvectors of Bolla's Laplacian L° define the "best" Euclidean embedding of the hypergraph. Here, the cost for an embedding φ : V → R^k of the hypergraph is the total squared distance between pairs of embedded vertices in the same hyperedge,

    Σ_{u,v∈V} Σ_{e∈E : u,v∈e} ‖φ(u) − φ(v)‖² / δ(e)    (15)

Bolla shows a relationship between the spectral properties of L° and the minimum cut of the hypergraph.

5.4. Rodríguez's Laplacian

Rodríguez (Rodríguez, 2003; Rodríguez, 2002) constructs a weighted graph G^r(V, E^r = E^x) from an unweighted hypergraph G(V, E). Like clique expansion, each hyperedge is replaced by a clique in the graph G^r. The weight w^r(u, v) of an edge is set to the number of hyperedges containing both u and v:

    w^r(u, v) = |{e ∈ E : u, v ∈ e}|    (16)

Rodríguez expresses the graph Laplacian applied to G^r in terms of the hypergraph structure:

    L^r(G^r) = Dv^r − H H^T    (17)
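For concreteness, the clique expansion weights of Eq. (12) (taking µ = 1) are just the off-diagonal entries of H W H^T, and with unit edge weights they reduce to Rodríguez's co-occurrence counts of Eq. (16). The sketch below uses a toy hypergraph of our own choosing:

```python
import numpy as np

def clique_expansion_affinity(H, w, mu=1.0):
    # w^x(u,v) = mu * sum_{e : u,v in e} w(e) = mu * [H W H^T]_{uv}
    # for u != v (Eq. 12); the diagonal is zeroed out.
    A = mu * (H @ np.diag(w) @ H.T)
    np.fill_diagonal(A, 0.0)
    return A

# Toy 3-uniform hypergraph on 4 vertices with two hyperedges.
H = np.array([[1, 0],
              [1, 1],
              [1, 1],
              [0, 1]], dtype=float)
w = np.array([1.0, 2.0])

A = clique_expansion_affinity(H, w)            # weighted clique expansion
R = clique_expansion_affinity(H, np.ones(2))   # Rodriguez: co-occurrence counts
```

Vertices 1 and 2 share both hyperedges, so w^x(1, 2) = 1 + 2 = 3 and w^r(1, 2) = 2, while vertices 0 and 3 share no hyperedge and get weight 0.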

where Dv^r is the vertex degree matrix of the graph G^r. Like Bolla, Rodríguez shows a relationship between the spectral properties of L^r and the cost of minimum partitions of the hypergraph.

5.5. Zhou's Normalized Laplacian

Zhou et al. (Zhou et al., 2005) generalize their earlier work on regularization on graphs and consider the following regularization on a vertex function f:

    ⟨f, L^z f⟩ = (1/2) Σ_{e∈E} Σ_{{u,v}⊆e} (w(e)/δ(e)) ( f(u)/√d(u) − f(v)/√d(v) )²

Note that this regularization term is small if vertices with high affinities have the same label. They show that the operator L^z can then be written as

    L^z = I − Dv^{−1/2} H W De^{−1} H^T Dv^{−1/2}    (18)

In addition, Zhou et al. define a hypergraph normalized cut criterion for a k-partition of the vertices P_k = {V₁, ..., V_k}:

    NCut(P_k) := Σ_{i=1}^{k} [ Σ_{e∈E} w(e) |e ∩ V_i| |e ∩ V_i^c| / δ(e) ] / [ Σ_{v∈V_i} d(v) ]    (19)

This criterion is analogous to the normalized cut criterion for graphs. They then show that if minimizing the normalized cut is relaxed to a real-valued optimization problem, the second smallest eigenvector of L^z is the optimal classification function f. Finally, they also draw a parallel between their hypergraph normalized cut criterion and random walks over the hypergraph.

5.6. Gibson's Dynamical System

In (Gibson et al., 1998) the authors propose a dynamical system to cluster categorical data that can be represented using a hypergraph. They consider the following iterative process:

    1. s_{ij}^{n+1} = Σ_{e : i∈e} Σ_{k∈e, k≠i} w_e s_{kj}^n
    2. Orthonormalize the vectors s_j^n.

They prove that the above iteration is convergent. We observe that

    s_{ij}^{n+1} = Σ_e h(i, e) ( Σ_k h(k, e) w_e s_{kj}^n − w_e s_{ij}^n )

    s_j^{n+1} = (H W H^T − Dv) s_j^n    (20)

Thus, the iterative procedure described above is the power method for calculating the eigenvectors of the adjacency matrix S = H W H^T − Dv.

5.7. Li's Adjacency Matrix

Li et al. (Li & Solé, 1996) formally define properties of a regular, unweighted hypergraph G(V, E) in terms of the star expansion of the hypergraph. In particular, they define the |V| × |V| adjacency matrix of the hypergraph, H H^T. They show a relationship between the spectral properties of the adjacency matrix H H^T and the structure of the hypergraph.

6. Comparing Hypergraph Learning Algorithms

In this section, we compare the algorithms for learning from a hypergraph representation of data described in Section 5. In Section 6.1, we compute the normalized Laplacian for the star expansion graph. In Section 6.2, we compute the combinatorial and normalized Laplacian of the clique expansion graph. In Section 6.3, we show that these Laplacians are nearly equivalent to each other. Finally, in Section 6.4, we show that the various hypergraph Laplacians can be written as the graph Laplacian of the clique or star expansion graph.

We begin by stating a simple lemma. The proof is trivial.

Lemma 1. Let

    B = [ I     −A ]
        [ −A^T   I ]

be a block matrix with A rectangular, and consider the eigenvalue problem

    [ I     −A ] [ x ]     [ x ]
    [ −A^T   I ] [ y ]  = λ [ y ]

Then the following relation holds:

    A A^T x = (1 − λ)² x

6.1. Star Graph Laplacian

Given a hypergraph G(V, E), consider the star graph G*(V*, E*), i.e. V* = V ∪ E, E* = {(u, e) : u ∈ e, e ∈ E}. Notice that this is a bipartite graph, with vertices corresponding to E on one side and vertices corresponding to V on the other, since there are no edges from V to V or from E to E. Let us also assume that the vertex set V* has been ordered such that all elements of V come before elements of E.

Let w* : V × E → R+ be the (as yet unspecified) graph edge weight function. In addition, let S* be the (|V| + |E|) × (|V| + |E|) affinity matrix. We can write the affinity matrix in terms of the hypergraph structure and the weight function w* as

    S* = [ 0_{|V|}    H W* ]
         [ W* H^T    0_{|E|} ]    (21)

The degrees of vertices in G* are then

    d*(u) = Σ_{e∈E} h(u, e) w*(u, e),   u ∈ V    (22)
    d*(e) = Σ_{u∈V} h(u, e) w*(u, e),   e ∈ E    (23)

The normalized Laplacian of this graph can now be written in the form

    𝓛* = [ I     −A ]
         [ −A^T   I ]    (24)

Here, A = Dv*^{−1/2} H W* De*^{−1/2} is the |V| × |E| matrix with entry

    A_{ue} = h(u, e) w*(u, e) / ( √d*(u) √d*(e) )    (25)

Any |V| + |E| eigenvector x = [x_v^T, x_e^T]^T of 𝓛* satisfies 𝓛* x = λ x. Then by Lemma 1, we know that

    A A^T x_v = (λ − 1)² x_v.    (26)

Thus, the |V| elements of the eigenvectors of the normalized Laplacian 𝓛* corresponding to vertices V ⊆ V* are the eigenvectors of the |V| × |V| matrix A A^T. Element (u, v) of A A^T is

    [A A^T]_{uv} = Σ_{e∈E} h(u, e) h(v, e) w*(u, e) w*(v, e) / ( √d*(u) d*(e) √d*(v) )    (27)

For the standard star expansion weighting function, w*(u, e) = w(e)/δ(e), so the vertex degrees are

    d*(u) = Σ_{e∈E} h(u, e) w(e)/δ(e),   u ∈ V    (28)
    d*(e) = Σ_{u∈e} w(e)/δ(e) = w(e),    e ∈ E    (29)

Thus, we can write

    [A A^T]_{uv} = Σ_{e∈E} ( h(u, e) h(v, e) w(e)/δ(e)² ) / ( √d*(u) √d*(v) )    (30)

6.2. Clique Graph Laplacian

Given a hypergraph G(V, E), consider the graph G^c(V, E^c = E^x) with the same structure as the clique expansion graph, i.e. E^c = {(u, v) : u, v ∈ e, e ∈ E}. Let w^c : V × V → R+ be the (as yet unspecified) graph edge weight function. We can write the normalized Laplacian of G^c in terms of the hypergraph structure and the weight function w^c as 𝓛^c := I − C. If there is no hyperedge e ∈ E such that u, v ∈ e, then C_{uv} = 0. Otherwise,

    [C]_{uv} = w^c(u, v) / ( √d^c(u) √d^c(v) )    (31)

where

    d^c(u) = Σ_{v : v≠u} w^c(u, v)    (32)

is the vertex degree. For the standard clique expansion construction,

    w^c(u, v) = w^x(u, v) = Σ_{e∈E : u,v∈e} w(e),    (33)

so the vertex degrees are

    d^c(u) = d^x(u) = Σ_{e∈E} h(u, e) (δ(e) − 1) w(e)    (34)

6.3. Unifying Star and Clique Expansion

To show the relationship between star and clique expansion, consider the star expansion graph G_c*(V*, E*) with weighting function

    w_c*(u, e) := w(e)(δ(e) − 1)    (35)

Note that this is δ(e)(δ(e) − 1) times the standard star expansion weighting function w*(u, e) (Eq. (13)). Plugging this value into Equations (22) and (23), we get that the degrees of vertices in G_c* are

    d_c*(u) = Σ_{e∈E} h(u, e) w(e)(δ(e) − 1) = d^x(u)    (36)
    d_c*(e) = w(e) δ(e) (δ(e) − 1)    (37)

where d^x(u) is the vertex degree for the standard clique expansion graph G^x. Thus,

    [A_c A_c^T]_{uv} = Σ_{e∈E} ( (δ(e) − 1)/δ(e) ) · h(u, e) h(v, e) w(e) / ( √d^x(u) √d^x(v) )    (38)

Similarly, suppose we choose the clique expansion weighting function

    w_*^c(u, v) := Σ_{e∈E} h(u, e) h(v, e) w(e) / ( δ(e)(δ(e) − 1) )    (39)
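Lemma 1 is easy to verify numerically. The sketch below (a random rectangular A of our own choosing) checks that the V-block of every eigenvector of the block matrix satisfies A A^T x = (1 − λ)² x:

```python
import numpy as np

# Numeric check of Lemma 1 on a random |V| x |E| matrix A.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 6))
L = np.block([[np.eye(4), -A], [-A.T, np.eye(6)]])

lam, U = np.linalg.eigh(L)           # L is symmetric, so eigh applies
for i in range(10):
    x = U[:4, i]                     # the |V| block of the i-th eigenvector
    # Lemma 1: A A^T x = (1 - lambda)^2 x (trivially true when x = 0)
    assert np.allclose(A @ A.T @ x, (1 - lam[i]) ** 2 * x, atol=1e-8)
```

The check follows directly from the block equations x − A y = λ x and −A^T x + y = λ y, which give A y = (1 − λ) x and A^T x = (1 − λ) y.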

Then we can show that the vertex degree is

    d_*^c(u) = Σ_{e∈E} h(u, e) w(e)/δ(e) = d*(u)    (40)

where d*(u) is the vertex degree function for the standard star expansion. We can then write

    [C_*]_{uv} = Σ_{e∈E} ( 1/(δ(e)(δ(e) − 1)) ) · h(u, e) h(v, e) w(e) / ( √d*(u) √d*(v) )    (41)

A commonly occurring case is the k-uniform hypergraph. In this case, each hyperedge has exactly the same number of vertices, i.e. δ(e) = k. Then it is easy to see that the bipartite graph matrix A_c A_c^T is a constant scalar times the clique expansion matrix C. Thus, the eigenvectors of the normalized Laplacian for the bipartite graph G_c* are exactly the eigenvectors of the normalized Laplacian for the standard clique expansion graph G^x. Similarly, the clique matrix C_* is a constant scalar times the standard star expansion matrix A A^T. Thus, the eigenvectors of the normalized Laplacian for the clique graph G_*^c are exactly the eigenvectors of the normalized Laplacian for the standard star expansion. This is a surprising result, since the two graphs are completely different in the number of vertices and the connectivity between these vertices.

For non-uniform hypergraphs (i.e., when the hyperedge cardinality varies), the bipartite graph matrix A_c A_c^T, while not the same, is close to the standard clique expansion matrix C. Each term in the sum in Equation (38) has an additional factor (δ(e) − 1)/δ(e), giving slightly higher weight to hyperedges with a higher degree. This difference, however, is not large, especially for higher cardinalities. As the bipartite graph matrix A_c A_c^T is approximately the clique expansion matrix, we conclude that their eigenvectors are similar. A similar relation holds for the clique graph G_*^c and the standard star expansion, where the clique graph gives lower weight to larger edges. These observations can be reversed to characterize the behavior of the standard clique expansion and star expansion constructions, and we conclude that clique expansion gives more weight to evidence from larger edges than star expansion.

There is no clear reason why one should give more weight to smaller hyperedges versus larger hyperedges or vice versa. The exact choice will depend on the properties of the affinity function used.

6.4. Unifying Hypergraph Laplacians

In this section we take a second look at the various constructions in Section 5 and show that they all correspond to either the clique or the star expansion of the original hypergraph with an appropriate weighting function.

For an unweighted hypergraph, Bolla's Laplacian L° corresponds to the unnormalized Laplacian of the associated clique expansion in which the hyperedge weights are scaled by the inverse of the edge degree matrix De, giving the affinity matrix

    W° = H De^{−1} H^T    (42)

The row sums of this matrix are given by

    d°(u) = Σ_{e∈E} h(u, e) (1/δ(e)) Σ_v h(v, e) = Σ_{e∈E} h(u, e)    (43)

which, as a diagonal matrix, is exactly the vertex degree matrix Dv for an unweighted hypergraph, giving us the unnormalized Laplacian

    L° = Dv − H De^{−1} H^T    (44)

The Rodríguez Laplacian can similarly be shown to be the unnormalized Laplacian of the clique expansion of an unweighted hypergraph with every hyperedge weight set to 1. Similarly, Gibson's algorithm calculates the eigenvectors of the adjacency matrix of the clique expansion graph.

We now turn our attention to the normalized Laplacian of Zhou et al. Consider the star expansion of the hypergraph with the weight function w^z(u, e) = w(e). Then the adjacency matrix of the resulting bipartite graph can be written as

    S^z = [ 0       H W ]
          [ W H^T   0   ]    (45)

It is easy to show that the degree matrix for this graph is the diagonal matrix

    D^z = [ Dv   0    ]
          [ 0    W De ]    (46)

Thus the normalized Laplacian for this bipartite graph is given by the matrix

    [ I                                 −Dv^{−1/2} H W^{1/2} De^{−1/2} ]
    [ −De^{−1/2} W^{1/2} H^T Dv^{−1/2}   I                             ]

Now if we consider the eigenvalue problem for this matrix, with eigenvectors x = [x_v^T, x_e^T]^T, then by Lemma 1 we can show that x_v is given by the following eigenvalue problem:

    Dv^{−1/2} H W De^{−1} H^T Dv^{−1/2} x_v = (1 − λ)² x_v    (47)
    (I − Dv^{−1/2} H W De^{−1} H^T Dv^{−1/2}) x_v = (1 − (1 − λ)²) x_v

This is exactly the same eigenvalue problem that Zhou et al. propose for the solution of the clustering problem. Thus Zhou et al.'s Laplacian is equivalent to constructing a star expansion and using the normalized Laplacian defined on it. Table 1 summarizes this discussion.
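The k-uniform equivalence of Section 6.3 can also be checked numerically. For δ(e) = k with the standard weightings, Eqs. (30) and (38) combine (our own algebra, not stated in this exact form above) into the affine relation A A^T = ((k − 1)/k) C + (1/k) I, so the two matrices share eigenvectors. The sketch below verifies this on a toy 3-uniform hypergraph of our own choosing:

```python
import numpy as np

# Toy 3-uniform hypergraph: 6 vertices, 5 hyperedges, fixed weights.
k = 3
edges = [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 0)]
w = np.array([1.0, 2.0, 0.5, 1.5, 1.0])

H = np.zeros((6, len(edges)))
for j, e in enumerate(edges):
    H[list(e), j] = 1.0

# Standard star expansion: w*(u,e) = w(e)/k, so d*(e) = w(e) (Eq. 29).
d_star_v = H @ (w / k)                        # d*(u), Eq. (28)
A = (H * (w / k)) / np.sqrt(np.outer(d_star_v, w))   # Eq. (25)

# Standard clique expansion: w^x(u,v) = sum_e h h w(e), d^x from Eq. (34).
Wx = H @ np.diag(w) @ H.T
np.fill_diagonal(Wx, 0.0)
dx = H @ ((k - 1) * w)
C = Wx / np.sqrt(np.outer(dx, dx))            # Eq. (31)

# k-uniform case: the two normalized affinities agree up to an affine
# spectral shift, hence identical eigenvectors.
assert np.allclose(A @ A.T, (k - 1) / k * C + np.eye(6) / k)
```

The shift by (1/k) I only translates the spectrum, so the eigenvectors of the star and clique constructions coincide, as claimed.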

    Algorithm    Graph    Matrix
    Bolla        Clique   Combinatorial Laplacian
    Rodríguez    Clique   Combinatorial Laplacian
    Zhou         Star     Normalized Laplacian
    Gibson       Clique   Adjacency
    Li           Star     Adjacency

Table 1. This table summarizes the various hypergraph learning algorithms, their underlying graph construction and the associated matrix used for the spectral analysis.

7. Discussion

In this paper we have examined the use of hypergraphs in learning with higher order relations. We surveyed the various Laplace-like operators that have been constructed to analyze the structure of hypergraphs. We showed that all of these methods, despite their very different formulations, can be reduced to two graph constructions, the star expansion and the clique expansion, and the study of their associated Laplacians. We have also shown that for the commonly occurring case of k-uniform hypergraphs these two constructions are identical. This is a surprising and unexpected result, as the two graph constructions are completely different in structure. In the case of non-uniform hypergraphs, we showed that the essential difference between the two constructions is how they weigh the evidence from hyperedges of differing sizes.

In summary, while hypergraphs may be an intuitive representation of higher order similarities, it seems (anecdotally at least) that graphs lie at the heart of this problem.

Acknowledgements

It is a pleasure to acknowledge our conversations with Fan Chung Graham, Pietro Perona, Lihi Zelnik-Manor, Eric Wiewiora, Piotr Dollár, and Manmohan Chandraker. Serge Belongie and Sameer Agarwal were supported by NSF CAREER Grant #0448615 and the Alfred P. Sloan Research Fellowship. Kristin Branson was supported by NASA GSRP grant NGT5-50479.

References

Agarwal, S., Lim, J., Zelnik-Manor, L., Perona, P., Kriegman, D. J., & Belongie, S. (2005). Beyond pairwise clustering. CVPR (2) (pp. 838–845).

Alpert, C. J., Kahng, A. B., & Yao, S.-Z. (1999). Spectral partitioning with multiple eigenvectors. Discrete Applied Mathematics, 90, 3–26.

Belkin, M., & Niyogi, P. (2003). Semi-supervised learning on Riemannian manifolds. Machine Learning, 56, 209–239.

Bolla, M. (1993). Spectra, Euclidean representations and clusterings of hypergraphs. Discrete Mathematics, 117.

Chung, F. R. (1993). The Laplacian of a hypergraph. In J. Friedman (Ed.), Expanding graphs (DIMACS series), 21–36. AMS.

Chung, F. R. K. (1997). Spectral graph theory. AMS.

Forman, R. (2003). Bochner's method for cell complexes and combinatorial Ricci curvature. Discrete & Computational Geometry, 29, 323–374.

Gibson, D., Kleinberg, J. M., & Raghavan, P. (1998). Clustering categorical data: An approach based on dynamical systems. VLDB (pp. 311–322).

Govindu, V. M. (2005). A tensor decomposition for geometric grouping and segmentation. CVPR.

Li, W., & Solé, P. (1996). Spectra of regular graphs and hypergraphs and orthogonal polynomials. European Journal of Combinatorics, 17, 461–477.

Munkres, J. R. (1984). Elements of algebraic topology. The Benjamin/Cummings Publishing Company.

Ng, A., Jordan, M., & Weiss, Y. (2002). On spectral clustering: Analysis and an algorithm. NIPS.

Rodríguez, J. A. (2002). On the Laplacian eigenvalues and metric parameters of hypergraphs. Linear and Multilinear Algebra, 50, 1–14.

Rodríguez, J. A. (2003). On the Laplacian spectrum and walk-regular hypergraphs. Linear and Multilinear Algebra, 51, 285–297.

Rosenberg, S. (1997). The Laplacian on a Riemannian manifold. London Mathematical Society.

Shashua, A., & Hazan, T. (2005). Non-negative tensor factorization with applications to statistics and vision. ICML.

Shashua, A., Zass, R., & Hazan, T. (2006). Multi-way clustering using super-symmetric non-negative tensor factorization. ECCV.

Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. PAMI, 22, 888–905.

Smola, A., & Kondor, I. (2003). Kernels and regularization on graphs. COLT. Springer.

Zhou, D., Huang, J., & Schölkopf, B. (2005). Beyond pairwise classification and clustering using hypergraphs (Technical Report 143). Max Planck Institute for Biological Cybernetics, Tübingen, Germany.

Zhou, D., & Schölkopf, B. (2005). Regularization on discrete spaces. Pattern Recognition, 361–368.

Zhu, X. (2005). Semi-supervised learning literature survey (Technical Report 1530). Computer Sciences, University of Wisconsin.

Zien, J. Y., Schlag, M. D. F., & Chan, P. K. (1999). Multilevel spectral hypergraph partitioning with arbitrary vertex sizes. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 18, 1389–1399.