ORIE 6334 Spectral Graph Theory                                September 13, 2016
Lecture 7
Lecturer: David P. Williamson                                  Scribe: Sam Gutekunst

In this lecture, we introduce normalized adjacency and Laplacian matrices. We state and begin to prove Cheeger's inequality, which relates the second eigenvalue of the normalized Laplacian to a graph's connectivity. Before stating the inequality, we will also define three related measures of the expansion properties of a graph: conductance, (edge) expansion, and sparsity.

1 Normalized Adjacency and Laplacian Matrices

We use notation from Lap Chi Lau.

Definition 1 The normalized adjacency matrix is
\[
\mathcal{A} \equiv D^{-1/2} A D^{-1/2},
\]
where $A$ is the adjacency matrix of $G$ and $D = \mathrm{diag}(d)$ for $d(i)$ the degree of node $i$. For a graph $G$ (with no isolated vertices), we can see that
\[
D^{-1/2} =
\begin{pmatrix}
\frac{1}{\sqrt{d(1)}} & 0 & \cdots & 0 \\
0 & \frac{1}{\sqrt{d(2)}} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \frac{1}{\sqrt{d(n)}}
\end{pmatrix}.
\]

Definition 2 The normalized Laplacian matrix is
\[
\mathcal{L} \equiv I - \mathcal{A}.
\]
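As a quick illustration (mine, not from the original notes), a minimal NumPy sketch of these two definitions might look as follows; it assumes a symmetric 0/1 adjacency matrix with no isolated vertices.

import numpy as np

def normalized_matrices(A):
    # A: symmetric 0/1 adjacency matrix (NumPy array) with no isolated vertices.
    d = A.sum(axis=1)                       # degree vector d(i)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))  # D^{-1/2}
    A_norm = D_inv_sqrt @ A @ D_inv_sqrt    # normalized adjacency D^{-1/2} A D^{-1/2}
    L_norm = np.eye(len(d)) - A_norm        # normalized Laplacian I - A_norm
    return A_norm, L_norm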

Notice that $\mathcal{L} = I - \mathcal{A} = D^{-1/2}(D - A)D^{-1/2} = D^{-1/2} L_G D^{-1/2}$, for $L_G$ the (unnormalized) Laplacian. Recall that for the largest eigenvalue $\lambda$ of $A$ and $\Delta$ the maximum degree of a node in the graph, $d_{\mathrm{avg}} \le \lambda \le \Delta$. "Normalizing" the adjacency matrix makes its largest eigenvalue 1, so the analogous result for normalized matrices is the following:

Claim 1 Let $\alpha_1 \ge \cdots \ge \alpha_n$ be the eigenvalues of $\mathcal{A}$ and let $\lambda_1 \le \cdots \le \lambda_n$ be the eigenvalues of $\mathcal{L}$. Then

\[
1 = \alpha_1 \ge \cdots \ge \alpha_n \ge -1, \qquad 0 = \lambda_1 \le \cdots \le \lambda_n \le 2.
\]

^0 This lecture is derived from Lau's 2012 notes, Week 2, http://appsrv.cse.cuhk.edu.hk/~chi/csc5160/notes/L02.pdf.

Proof: First, we show that 0 is an eigenvalue of $\mathcal{L}$ using the vector $x = D^{1/2}e$. Then
\[
\mathcal{L}(D^{1/2}e) = D^{-1/2} L_G D^{-1/2} D^{1/2} e = D^{-1/2} L_G e = 0,
\]
since $e$ is an eigenvector of $L_G$ corresponding to eigenvalue 0. This shows that $D^{1/2}e$ is an eigenvector of $\mathcal{L}$ with eigenvalue 0. To show that it is the smallest eigenvalue, notice that $\mathcal{L}$ is positive semidefinite,^1 as for any $x \in \mathbb{R}^n$:

\[
x^T \mathcal{L} x = x^T (I - \mathcal{A})x
= \sum_{i \in V} x(i)^2 - \sum_{(i,j) \in E} \frac{2x(i)x(j)}{\sqrt{d(i)d(j)}}
= \sum_{(i,j) \in E} \left( \frac{x(i)}{\sqrt{d(i)}} - \frac{x(j)}{\sqrt{d(j)}} \right)^2
\ge 0.
\]

The last equality can be seen "in reverse" by expanding $\left( \frac{x(i)}{\sqrt{d(i)}} - \frac{x(j)}{\sqrt{d(j)}} \right)^2$. We have now shown that $\mathcal{L}$ has nonnegative eigenvalues, so indeed $\lambda_1 = 0$. To show that $\alpha_1 \le 1$, we make use of the positive semidefiniteness of $\mathcal{L} = I - \mathcal{A}$. This gives us that, for all $x \in \mathbb{R}^n$:
\[
x^T (I - \mathcal{A})x \ge 0 \;\Longrightarrow\; x^T x - x^T \mathcal{A} x \ge 0 \;\Longrightarrow\; 1 \ge \frac{x^T \mathcal{A} x}{x^T x}. \tag{1}
\]

This Rayleigh quotient gives us the upper bound $\alpha_1 \le 1$. To get equality, consider again $x = D^{1/2}e$. For this $x$,

\[
x^T \mathcal{L} x = 0 \;\Longrightarrow\; x^T (I - \mathcal{A})x = 0.
\]

The exact same steps as in Equation (1) then yield $\frac{x^T \mathcal{A} x}{x^T x} = 1$, as we now have equality throughout. To get a similar lower bound on $\alpha_n$, we can show that $I + \mathcal{A}$ is positive semidefinite using a similar sum expansion.^2 Then

\[
x^T (I + \mathcal{A})x \ge 0 \;\Longrightarrow\; x^T x + x^T \mathcal{A} x \ge 0 \;\Longrightarrow\; \frac{x^T \mathcal{A} x}{x^T x} \ge -1 \;\Longrightarrow\; \alpha_n \ge -1.
\]

^1 A slick proof that does not make use of this quadratic form is to use the fact that $L_G$ is positive semidefinite. Thus $L_G = BB^T$ for some $B$, so that $\mathcal{L} = VV^T$ for $V = D^{-1/2}B$.

^2 This time, use
\[
x^T (I + \mathcal{A})x = \sum_{i \in V} x(i)^2 + \sum_{(i,j) \in E} \frac{2x(i)x(j)}{\sqrt{d(i)d(j)}} = \sum_{(i,j) \in E} \left( \frac{x(i)}{\sqrt{d(i)}} + \frac{x(j)}{\sqrt{d(j)}} \right)^2 \ge 0.
\]

Finally, notice that $x^T (I + \mathcal{A})x \ge 0$ implies the following chain:

\[
-x^T \mathcal{A} x \le x^T x \;\Longrightarrow\; x^T I x - x^T \mathcal{A} x \le 2x^T x \;\Longrightarrow\; \frac{x^T \mathcal{L} x}{x^T x} \le 2 \;\Longrightarrow\; \lambda_n \le 2,
\]

using the same Rayleigh quotient trick and the fact that $\lambda_n$ is the maximizer of that quotient. □

Remark 1 Notice that, given the spectrum of $\mathcal{A}$, we have the following: the spectrum of $-\mathcal{A}$ consists of the negatives of the eigenvalues of $\mathcal{A}$, and $I - \mathcal{A}$ adds one to each eigenvalue of $-\mathcal{A}$. Hence, $0 = \lambda_1 \le \cdots \le \lambda_n \le 2$ follows directly from $1 = \alpha_1 \ge \cdots \ge \alpha_n \ge -1$.
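To sanity-check Claim 1 numerically, one can compute both spectra for a small example graph; the snippet below is my own illustrative check (not part of the original notes), using the path on four vertices.

import numpy as np

# Path graph on 4 vertices: 1 - 2 - 3 - 4
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
A_norm = D_inv_sqrt @ A @ D_inv_sqrt      # normalized adjacency
L_norm = np.eye(4) - A_norm               # normalized Laplacian

alphas = np.sort(np.linalg.eigvalsh(A_norm))[::-1]  # alpha_1 >= ... >= alpha_n
lambdas = np.sort(np.linalg.eigvalsh(L_norm))       # lambda_1 <= ... <= lambda_n
print(alphas)    # largest is 1 and all values are >= -1
print(lambdas)   # smallest is 0 and all values are <= 2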

2 Connectivity and $\lambda_2(\mathcal{L})$

Recall that $\lambda_2(L_G) = 0$ if and only if $G$ is disconnected. The same is true for $\lambda_2(\mathcal{L})$, and we can say more!

2.1 Flavors of Connectivity

Let $S \subset V$. Recall that $\delta(S)$ denotes the set of edges with exactly one endpoint in $S$, and define $\mathrm{vol}(S) \equiv \sum_{i \in S} d(i)$.

Definition 3 The conductance of $S \subset V$ is

\[
\phi(S) \equiv \frac{|\delta(S)|}{\min\{\mathrm{vol}(S), \mathrm{vol}(V - S)\}}.
\]

The edge expansion of S is

\[
\alpha(S) \equiv \frac{|\delta(S)|}{|S|}, \qquad \text{for } |S| \le \frac{n}{2}.
\]

The sparsity of $S$ is
\[
\rho(S) \equiv \frac{|\delta(S)|}{|S|\,|V - S|}.
\]
These measures are similar if $G$ is $d$-regular (i.e., $d(i) = d$ for all $i \in V$). In this case,
\[
\alpha(S) = d\,\phi(S), \qquad \frac{n}{2}\,\rho(S) \le \alpha(S) \le n\,\rho(S).
\]
To see the first equality, e.g., notice that the volume of $S$ is $d|S|$. In general, notice that $0 \le \phi(S) \le 1$ for all $S \subset V$. We're usually interested in finding the sets $S$ that minimize these quantities over the entire graph.

Definition 4 We define

\[
\phi(G) \equiv \min_{S \subset V} \phi(S), \qquad
\alpha(G) \equiv \min_{S \subset V : |S| \le \frac{n}{2}} \alpha(S), \qquad
\rho(G) \equiv \min_{S \subset V} \rho(S).
\]
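For a small graph these minima can be checked by brute force over all nonempty proper vertex subsets. The sketch below is my own illustration (not from the notes); the function names and the edge-list input format are hypothetical.

from itertools import combinations

def cut_measures(n, edges, S):
    # Return (conductance, edge expansion, sparsity) of the vertex set S.
    S = set(S)
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    delta = sum(1 for i, j in edges if (i in S) != (j in S))   # |delta(S)|
    vol_S = sum(deg[i] for i in S)
    vol_rest = sum(deg) - vol_S
    phi = delta / min(vol_S, vol_rest)
    alpha = delta / len(S)                    # meaningful when |S| <= n/2
    rho = delta / (len(S) * (n - len(S)))
    return phi, alpha, rho

def graph_measures(n, edges):
    # Brute-force phi(G), alpha(G), rho(G) over all nonempty proper subsets.
    best_phi = best_alpha = best_rho = float("inf")
    for k in range(1, n):
        for S in combinations(range(n), k):
            phi, alpha, rho = cut_measures(n, edges, S)
            best_phi = min(best_phi, phi)
            best_rho = min(best_rho, rho)
            if 2 * k <= n:                    # alpha(G) restricted to |S| <= n/2
                best_alpha = min(best_alpha, alpha)
    return best_phi, best_alpha, best_rho

# Example: the 4-cycle 0-1-2-3-0
print(graph_measures(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))   # (0.5, 1.0, 0.5)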

We call a graph $G$ an expander if $\phi(G)$ (or $\alpha(G)$) is "large" (i.e., a constant^3). Otherwise, we say that $G$ has a sparse cut. One algorithm for finding a sparse cut that works well in practice, but that lacks strong theoretical guarantees, is called spectral partitioning.

Algorithm 1: Spectral Partitioning

1. Compute $x_2$, the eigenvector of $\mathcal{L}$ corresponding to $\lambda_2(\mathcal{L})$;
2. Sort $V$ such that $x_2(1) \le \cdots \le x_2(n)$;
3. Define the sweep cuts $S_i \equiv \{1, \ldots, i\}$ for $i = 1, \ldots, n-1$;
4. Return $\min_{i \in \{1, \ldots, n-1\}} \phi(S_i)$.
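In code, a minimal sketch of this procedure (my own illustration, assuming NumPy and a connected graph given by a symmetric 0/1 adjacency matrix) might look like the following.

import numpy as np

def spectral_partition(A):
    # Sweep-cut spectral partitioning: return the sweep cut of smallest conductance.
    # A: symmetric 0/1 adjacency matrix of a connected graph (NumPy array).
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_norm = np.eye(len(d)) - D_inv_sqrt @ A @ D_inv_sqrt
    _, eigvecs = np.linalg.eigh(L_norm)      # eigenvalues come back in increasing order
    x2 = eigvecs[:, 1]                       # eigenvector for lambda_2
    order = np.argsort(x2)                   # vertices sorted by x2 value
    total_vol = d.sum()
    best_phi, best_cut = float("inf"), None
    for i in range(1, len(d)):               # sweep cuts S_i = first i vertices
        in_S = np.zeros(len(d), dtype=bool)
        in_S[order[:i]] = True
        crossing = A[in_S][:, ~in_S].sum()   # |delta(S_i)|
        vol_S = d[in_S].sum()
        phi = crossing / min(vol_S, total_vol - vol_S)
        if phi < best_phi:
            best_phi, best_cut = phi, set(order[:i].tolist())
    return best_phi, best_cut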

The following picture illustrates the idea of the algorithm; sweep cuts correspond to cuts between consecutive bars:

[Figure: the values $x_2(1) \le x_2(2) \le \cdots \le x_2(n)$ drawn as consecutive bars on a line; each sweep cut separates the first $i$ bars from the rest.]

Cheeger’s inequality provides some insight into why this algorithm works well.

3 Cheeger’s Inequality

We now work towards proving the following:

Theorem 2 (Cheeger's Inequality) Let $\lambda_2$ be the second smallest eigenvalue of $\mathcal{L}$. Then
\[
\frac{\lambda_2}{2} \le \phi(G) \le \sqrt{2\lambda_2}.
\]
The theorem proved by Jeff Cheeger actually has to do with manifolds and hypersurfaces; the theorem above is considered to be a discrete analog of Cheeger's original inequality, but the name has stuck. Typically, people think of the first inequality as being "easy" and the second as being "hard." We'll prove the first inequality, and start the proof of the second inequality.
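As a quick worked example (my own, not from the notes), take the cycle $C_n$: it is 2-regular, its normalized Laplacian has $\lambda_2 = 1 - \cos(2\pi/n) \approx 2\pi^2/n^2$, and cutting the cycle into two paths of roughly $n/2$ vertices cuts 2 edges, giving $\phi(C_n) \approx 2/n$. Plugging in,
\[
\frac{\lambda_2}{2} \approx \frac{\pi^2}{n^2} \;\le\; \phi(C_n) \approx \frac{2}{n} \;\le\; \sqrt{2\lambda_2} \approx \frac{2\pi}{n},
\]
so both bounds hold, and for the cycle the square-root upper bound is the one that is tight up to a constant factor.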

^3 One should then ask "A constant with respect to what?" Usually one defines families of graphs of increasing size as families of expanders, in which case we want the conductance or expansion constant with respect to the number of vertices.

Proof: Recall that
\[
\lambda_2 = \min_{x \perp D^{1/2}e} \frac{x^T \mathcal{L} x}{x^T x} = \min_{x \perp D^{1/2}e} \frac{x^T D^{-1/2} L_G D^{-1/2} x}{x^T x}.
\]

Consider the change of variables obtained by setting $y = D^{-1/2}x$, so that $x = D^{1/2}y$:

\[
\lambda_2 = \min_{y : D^{1/2}y \perp D^{1/2}e} \frac{y^T L_G y}{(D^{1/2}y)^T (D^{1/2}y)} = \min_{y : D^{1/2}y \perp D^{1/2}e} \frac{y^T L_G y}{y^T D y}.
\]

The minimum is being taken over all $y$ such that $\langle D^{1/2}y, D^{1/2}e \rangle = 0$; that is, over $y$ such that
\[
(D^{1/2}y)^T D^{1/2}e = 0 \iff y^T D e = 0 \iff \sum_{i \in V} d(i)y(i) = 0.
\]
Hence, we have that
\[
\lambda_2 = \min_{y : \sum_{i \in V} d(i)y(i) = 0} \frac{\sum_{(i,j) \in E} (y(i) - y(j))^2}{\sum_{i \in V} d(i)y(i)^2}.
\]
Now let $S^*$ be such that $\phi(G) = \phi(S^*)$, and try defining
\[
\hat{y}(i) =
\begin{cases}
1, & i \in S^* \\
0, & \text{else.}
\end{cases}
\]

It would be great if $\lambda_2$ were bounded by $\frac{|\delta(S^*)|}{\sum_{i \in S^*} d(i)} = \frac{|\delta(S^*)|}{\mathrm{vol}(S^*)}$. However, there are two problems. We have $\sum_{i \in V} d(i)\hat{y}(i) \ne 0$; moreover, $\frac{|\delta(S^*)|}{\mathrm{vol}(S^*)}$ might not be $\phi(S^*)$, as we want the denominator to be $\min\{\mathrm{vol}(S^*), \mathrm{vol}(V - S^*)\}$. Hence, we redefine

\[
\hat{y}(i) =
\begin{cases}
\frac{1}{\mathrm{vol}(S^*)}, & i \in S^* \\
-\frac{1}{\mathrm{vol}(V - S^*)}, & \text{else.}
\end{cases}
\]
Now we notice that
\[
\sum_{i \in V} d(i)\hat{y}(i) = \frac{\sum_{i \in S^*} d(i)}{\mathrm{vol}(S^*)} - \frac{\sum_{i \notin S^*} d(i)}{\mathrm{vol}(V - S^*)} = 1 - 1 = 0.
\]

Thus $\hat{y}$ is a feasible solution to the minimization problem defining $\lambda_2$, and the only edges contributing anything nonzero to the numerator are those with exactly one endpoint in $S^*$. Hence:

\begin{align*}
\lambda_2 &\le \frac{|\delta(S^*)| \left( \frac{1}{\mathrm{vol}(S^*)} + \frac{1}{\mathrm{vol}(V - S^*)} \right)^2}{\sum_{i \in S^*} d(i) \left( \frac{1}{\mathrm{vol}(S^*)} \right)^2 + \sum_{i \notin S^*} d(i) \left( \frac{1}{\mathrm{vol}(V - S^*)} \right)^2} \\
&= \frac{|\delta(S^*)| \left( \frac{1}{\mathrm{vol}(S^*)} + \frac{1}{\mathrm{vol}(V - S^*)} \right)^2}{\frac{1}{\mathrm{vol}(S^*)} + \frac{1}{\mathrm{vol}(V - S^*)}} \\
&= |\delta(S^*)| \left( \frac{1}{\mathrm{vol}(S^*)} + \frac{1}{\mathrm{vol}(V - S^*)} \right) \\
&\le 2|\delta(S^*)| \max\left\{ \frac{1}{\mathrm{vol}(S^*)}, \frac{1}{\mathrm{vol}(V - S^*)} \right\} \\
&= \frac{2|\delta(S^*)|}{\min\{\mathrm{vol}(S^*), \mathrm{vol}(V - S^*)\}} \\
&= 2\phi(G). \qquad \square
\end{align*}

This completes the proof of the first inequality. To get the second, the idea is to suppose we had a $y$ with

\[
R(y) \equiv \frac{\sum_{(i,j) \in E} (y(i) - y(j))^2}{\sum_{i \in V} d(i)y(i)^2}.
\]

Claim 3 We'll be able to find a cut $S \subseteq \mathrm{supp}(y) \equiv \{i \in V : y(i) \ne 0\}$ with
\[
\frac{|\delta(S)|}{\mathrm{vol}(S)} \le \sqrt{2R(y)}.
\]

This will not suffice to prove the second part of the inequality, as $\frac{|\delta(S)|}{\mathrm{vol}(S)}$ need not equal $\phi(S)$, but we'll come back to this next lecture.

Proof: Without loss of generality, we assume $-1 \le y(i) \le 1$, as we can scale $y$ if not. Our trick (from Trevisan) is to pick $t \in (0, 1]$ uniformly at random, and let $S_t = \{i \in V : y(i)^2 \ge t\}$. Notice that

\[
\mathbb{E}[\mathrm{vol}(S_t)] = \sum_{i \in V} d(i)\Pr[i \in S_t] = \sum_{i \in V} d(i)y(i)^2,
\]

and assuming^4 that $(i,j) \in E \Longrightarrow y(i)^2 \le y(j)^2$,

\[
\mathbb{E}[|\delta(S_t)|] = \sum_{(i,j) \in E} \Pr[(i,j) \in \delta(S_t)] = \sum_{(i,j) \in E} \Pr[y(i)^2 < t \le y(j)^2] = \sum_{(i,j) \in E} (y(j)^2 - y(i)^2),
\]
since $t$ is uniform on $(0, 1]$ and $0 \le y(i)^2 \le y(j)^2 \le 1$.

^4 We make this assumption without loss of generality because it doesn't matter in the end and is notationally convenient.

Rewriting the above using the difference of squares and using Cauchy-Schwarz,

\begin{align*}
\sum_{(i,j) \in E} (y(j) - y(i))(y(j) + y(i)) &\le \sqrt{\sum_{(i,j) \in E} (y(j) - y(i))^2} \sqrt{\sum_{(i,j) \in E} (y(j) + y(i))^2} \\
&\le \sqrt{\sum_{(i,j) \in E} (y(j) - y(i))^2} \sqrt{2\sum_{(i,j) \in E} (y(j)^2 + y(i)^2)} \\
&= \sqrt{\sum_{(i,j) \in E} (y(j) - y(i))^2} \sqrt{2\sum_{i \in V} d(i)y(i)^2} \\
&= \sqrt{2R(y)} \sum_{i \in V} d(i)y(i)^2.
\end{align*}

This gives that

\[
\frac{\mathbb{E}[|\delta(S_t)|]}{\mathbb{E}[\mathrm{vol}(S_t)]} \le \sqrt{2R(y)} \;\Longrightarrow\; \mathbb{E}\left[|\delta(S_t)| - \sqrt{2R(y)}\,\mathrm{vol}(S_t)\right] \le 0.
\]
This means that there exists a $t$ such that
\[
\frac{|\delta(S_t)|}{\mathrm{vol}(S_t)} \le \sqrt{2R(y)}.
\]

To derandomize the algorithm, look at each of the $n$ possible cuts $S_t$ by looking at sweep cuts for the order $y(1)^2 \le y(2)^2 \le \cdots \le y(n)^2$. □
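A short sketch of this derandomized rounding (my own illustration, assuming NumPy and the same 0/1 adjacency-matrix conventions as in the earlier snippets): sort the vertices by $y(i)^2$ and check every threshold cut inside the support of $y$.

import numpy as np

def best_threshold_cut(A, y):
    # Return the threshold cut S (a prefix in decreasing order of y(i)^2)
    # minimizing |delta(S)| / vol(S), as in Claim 3.
    d = A.sum(axis=1)
    order = np.argsort(-y**2)                # vertices in decreasing y(i)^2
    best_ratio, best_S = float("inf"), None
    for i in range(1, len(y) + 1):
        if y[order[i - 1]] == 0:             # stay inside supp(y)
            break
        in_S = np.zeros(len(y), dtype=bool)
        in_S[order[:i]] = True
        crossing = A[in_S][:, ~in_S].sum()   # |delta(S)|
        ratio = crossing / d[in_S].sum()
        if ratio < best_ratio:
            best_ratio, best_S = ratio, set(order[:i].tolist())
    return best_ratio, best_S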
