ORIE 6334 Spectral Graph Theory                        September 13, 2016

Lecture 7

Lecturer: David P. Williamson                        Scribe: Sam Gutekunst

In this lecture, we introduce normalized adjacency and Laplacian matrices. We state and begin to prove Cheeger's inequality, which relates the second eigenvalue of the normalized Laplacian matrix to a graph's connectivity. Before stating the inequality, we will also define three related measures of expansion properties of a graph: conductance, (edge) expansion, and sparsity.

1 Normalized Adjacency and Laplacian Matrices

We use notation from Lap Chi Lau.[0]

[0] This lecture is derived from Lau's 2012 notes, Week 2, http://appsrv.cse.cuhk.edu.hk/~chi/csc5160/notes/L02.pdf.

Definition 1. The normalized adjacency matrix is $\mathcal{A} \equiv D^{-1/2} A D^{-1/2}$, where $A$ is the adjacency matrix of $G$ and $D = \operatorname{diag}(d)$ for $d(i)$ the degree of node $i$.

For a graph $G$ (with no isolated vertices), we can see that
\[
D^{-1/2} = \begin{pmatrix}
\frac{1}{\sqrt{d(1)}} & 0 & \cdots & 0 \\
0 & \frac{1}{\sqrt{d(2)}} & \cdots & 0 \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \frac{1}{\sqrt{d(n)}}
\end{pmatrix}.
\]

Definition 2. The normalized Laplacian matrix is $\mathcal{L} \equiv I - \mathcal{A}$.

Notice that $\mathcal{L} = I - \mathcal{A} = D^{-1/2}(D - A)D^{-1/2} = D^{-1/2} L_G D^{-1/2}$, for $L_G$ the (unnormalized) Laplacian.

Recall that for the largest eigenvalue $\lambda$ of $A$ and $\Delta$ the maximum degree of a vertex in the graph, $d_{\mathrm{avg}} \le \lambda \le \Delta$. "Normalizing" the adjacency matrix makes its largest eigenvalue 1, so the analogous result for normalized matrices is the following:

Claim 1. Let $\alpha_1 \ge \cdots \ge \alpha_n$ be the eigenvalues of $\mathcal{A}$ and let $\lambda_1 \le \cdots \le \lambda_n$ be the eigenvalues of $\mathcal{L}$. Then
\[
1 = \alpha_1 \ge \cdots \ge \alpha_n \ge -1, \qquad 0 = \lambda_1 \le \cdots \le \lambda_n \le 2.
\]

Proof: First, we show that 0 is an eigenvalue of $\mathcal{L}$ using the vector $x = D^{1/2}e$:
\[
\mathcal{L}(D^{1/2}e) = D^{-1/2} L_G D^{-1/2} D^{1/2} e = D^{-1/2} L_G e = 0,
\]
since $e$ is an eigenvector of $L_G$ corresponding to eigenvalue 0. This shows that $D^{1/2}e$ is an eigenvector of $\mathcal{L}$ of eigenvalue 0. To show that it is the smallest eigenvalue, notice that $\mathcal{L}$ is positive semidefinite,[1] as for any $x \in \mathbb{R}^n$:
\[
x^T \mathcal{L} x = x^T (I - \mathcal{A}) x
= \sum_{i \in V} x(i)^2 - \sum_{(i,j) \in E} \frac{2\,x(i)x(j)}{\sqrt{d(i)d(j)}}
= \sum_{(i,j) \in E} \left( \frac{x(i)}{\sqrt{d(i)}} - \frac{x(j)}{\sqrt{d(j)}} \right)^2 \ge 0.
\]
The last equality can be seen "in reverse" by expanding $\left( \frac{x(i)}{\sqrt{d(i)}} - \frac{x(j)}{\sqrt{d(j)}} \right)^2$. We have now shown that $\mathcal{L}$ has nonnegative eigenvalues, so indeed $\lambda_1 = 0$.

[1] A slick proof that does not make use of this quadratic is to use the fact that $L_G$ is positive semidefinite. Thus $L_G = BB^T$ for some $B$, so that $\mathcal{L} = VV^T$ for $V = D^{-1/2}B$.

To show that $\alpha_1 \le 1$, we make use of the positive semidefiniteness of $\mathcal{L} = I - \mathcal{A}$. This gives us that, for all $x \in \mathbb{R}^n$:
\[
x^T (I - \mathcal{A}) x \ge 0 \implies x^T x - x^T \mathcal{A} x \ge 0 \implies 1 \ge \frac{x^T \mathcal{A} x}{x^T x}. \tag{1}
\]
This Rayleigh quotient gives us the upper bound $\alpha_1 \le 1$. To get equality, consider again $x = D^{1/2}e$: for this $x$, $x^T \mathcal{L} x = 0$ implies $x^T (I - \mathcal{A}) x = 0$. The exact same steps as in Equation (1) yield $\frac{x^T \mathcal{A} x}{x^T x} = 1$, as we now have equality throughout.

To get a similar lower bound on $\alpha_n$, we can show that $I + \mathcal{A}$ is positive semidefinite using a similar sum expansion.[2] Then
\[
x^T (I + \mathcal{A}) x \ge 0 \implies x^T x + x^T \mathcal{A} x \ge 0 \implies \frac{x^T \mathcal{A} x}{x^T x} \ge -1 \implies \alpha_n \ge -1.
\]

[2] This time, use
\[
x^T (I + \mathcal{A}) x = \sum_{i \in V} x(i)^2 + \sum_{(i,j) \in E} \frac{2\,x(i)x(j)}{\sqrt{d(i)d(j)}}
= \sum_{(i,j) \in E} \left( \frac{x(i)}{\sqrt{d(i)}} + \frac{x(j)}{\sqrt{d(j)}} \right)^2 \ge 0.
\]

Finally, notice that $x^T (I + \mathcal{A}) x \ge 0$ implies the following chain:
\[
-x^T \mathcal{A} x \le x^T x \implies x^T I x - x^T \mathcal{A} x \le 2\,x^T x \implies \frac{x^T \mathcal{L} x}{x^T x} \le 2 \implies \lambda_n \le 2,
\]
using the same Rayleigh quotient trick and the fact that $\lambda_n$ is the maximizer of that quotient. □

Remark 1. Notice that, given the spectrum of $\mathcal{A}$, we have the following: $-\mathcal{A}$ has as its spectrum the negatives of the eigenvalues of $\mathcal{A}$, and $I - \mathcal{A} = \mathcal{L}$ adds one to each eigenvalue of $-\mathcal{A}$. Hence, $0 = \lambda_1 \le \cdots \le \lambda_n \le 2$ follows directly from $1 = \alpha_1 \ge \cdots \ge \alpha_n \ge -1$.
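To make Claim 1 concrete, here is a minimal numpy sketch (my own illustration, not part of the original notes) that builds $\mathcal{A}$ and $\mathcal{L}$ for a small example graph and checks that their eigenvalues land in $[-1, 1]$ and $[0, 2]$; the choice of a 4-cycle is arbitrary.

```python
# A minimal numpy sketch (illustration only) checking Claim 1: the
# eigenvalues of the normalized adjacency matrix lie in [-1, 1] and
# those of the normalized Laplacian lie in [0, 2].
import numpy as np

# Adjacency matrix of a 4-cycle -- an arbitrary small graph with no
# isolated vertices.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)

d = A.sum(axis=1)                        # degree vector d(i)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))   # D^{-1/2}
A_norm = D_inv_sqrt @ A @ D_inv_sqrt     # normalized adjacency D^{-1/2} A D^{-1/2}
L_norm = np.eye(len(d)) - A_norm         # normalized Laplacian I - A_norm

alphas = np.sort(np.linalg.eigvalsh(A_norm))[::-1]  # alpha_1 >= ... >= alpha_n
lambdas = np.sort(np.linalg.eigvalsh(L_norm))       # lambda_1 <= ... <= lambda_n

assert np.isclose(alphas[0], 1.0) and alphas[-1] >= -1.0 - 1e-9
assert np.isclose(lambdas[0], 0.0) and lambdas[-1] <= 2.0 + 1e-9
print(alphas, lambdas)   # [1, 0, 0, -1] and [0, 1, 1, 2] for the 4-cycle
```

Since the 4-cycle is bipartite, the extreme values $\alpha_n = -1$ and $\lambda_n = 2$ are attained exactly in this example.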
2 Connectivity and $\lambda_2(\mathcal{L})$

Recall that $\lambda_2(L_G) = 0$ if and only if $G$ is disconnected. The same is true for $\lambda_2(\mathcal{L})$, and we can say more!

2.1 Flavors of Connectivity

Let $S \subset V$. Recall that $\delta(S)$ denotes the set of edges with exactly one endpoint in $S$, and define $\operatorname{vol}(S) \equiv \sum_{i \in S} d(i)$.

Definition 3. The conductance of $S \subset V$ is
\[
\phi(S) \equiv \frac{|\delta(S)|}{\min\{\operatorname{vol}(S), \operatorname{vol}(V - S)\}}.
\]
The edge expansion of $S$ is
\[
\alpha(S) \equiv \frac{|\delta(S)|}{|S|}, \quad \text{for } |S| \le \frac{n}{2}.
\]
The sparsity of $S$ is
\[
\rho(S) \equiv \frac{|\delta(S)|}{|S|\,|V - S|}.
\]

These measures are similar if $G$ is $d$-regular (i.e., $d(i) = d$ for all $i \in V$). In this case,
\[
\alpha(S) = d\,\phi(S), \qquad \frac{n}{2}\,\rho(S) \le \alpha(S) \le n\,\rho(S).
\]
To see the first equality, e.g., notice that the volume of $S$ is $d|S|$; the second pair of inequalities follows since $\alpha(S) = \rho(S)\,|V - S|$ and $\frac{n}{2} \le |V - S| \le n$ when $|S| \le \frac{n}{2}$. In general, notice that $0 \le \phi(S) \le 1$ for all $S \subset V$. We are usually interested in finding the sets $S$ that minimize these quantities over the entire graph.

Definition 4. We define
\[
\phi(G) \equiv \min_{S \subset V} \phi(S), \qquad \alpha(G) \equiv \min_{S \subset V : |S| \le n/2} \alpha(S), \qquad \rho(G) \equiv \min_{S \subset V} \rho(S).
\]

We call a graph $G$ an expander if $\phi(G)$ (or $\alpha(G)$) is "large" (i.e., a constant[3]). Otherwise, we say that $G$ has a sparse cut.

[3] One should then ask "A constant with respect to what?" Usually one defines families of graphs of increasing size as families of expanders, in which case we want the conductance or expansion constant with respect to the number of vertices.

One algorithm for finding a sparse cut that works well in practice, but that lacks strong theoretical guarantees, is called spectral partitioning; see the code sketch below.

Algorithm 1: Spectral Partitioning
1. Compute $x_2$ of $\mathcal{L}$ (the eigenvector corresponding to $\lambda_2(\mathcal{L})$).
2. Sort $V$ such that $x_2(1) \le \cdots \le x_2(n)$.
3. Define the sweep cuts $S_i \equiv \{1, \ldots, i\}$ for $i = 1, \ldots, n-1$.
4. Return $\min_{i \in \{1, \ldots, n-1\}} \phi(S_i)$.

[Figure: the sorted entries $x_2(1) \le x_2(2) \le \cdots \le x_2(n)$ drawn as bars on a line; sweep cuts correspond to cuts between consecutive bars.]
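The following is a minimal numpy sketch of Algorithm 1 (my own illustration; the helper names and the example graph are mine, not from the notes). It computes the eigenvector for $\lambda_2(\mathcal{L})$, sorts the vertices by its entries, and returns the sweep cut of smallest conductance.

```python
# A sketch of spectral partitioning (Algorithm 1); illustration only.
import numpy as np

def conductance(A, d, S):
    """phi(S) = |delta(S)| / min(vol(S), vol(V - S))."""
    mask = np.zeros(len(d), dtype=bool)
    mask[np.asarray(S)] = True
    cut = A[mask][:, ~mask].sum()                  # |delta(S)|
    return cut / min(d[mask].sum(), d[~mask].sum())

def spectral_partition(A):
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    L_norm = np.eye(len(d)) - D_inv_sqrt @ A @ D_inv_sqrt
    _, eigvecs = np.linalg.eigh(L_norm)            # eigenvalues ascending
    x2 = eigvecs[:, 1]                             # eigenvector for lambda_2
    order = np.argsort(x2)                         # sort V by the entries of x2
    # Sweep cuts S_i = first i vertices in sorted order, i = 1, ..., n-1.
    sweeps = ((conductance(A, d, order[:i]), order[:i])
              for i in range(1, len(d)))
    return min(sweeps, key=lambda t: t[0])         # smallest phi(S_i)

# Example: two triangles joined by one edge; the sparse cut separates them.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0
phi, S = spectral_partition(A)
print(phi, sorted(S))   # expect phi = 1/7 with S one of the triangles
```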
Cheeger's inequality provides some insight into why this algorithm works well.

3 Cheeger's Inequality

We now work towards proving the following:

Theorem 2 (Cheeger's Inequality). Let $\lambda_2$ be the second smallest eigenvalue of $\mathcal{L}$. Then
\[
\frac{\lambda_2}{2} \le \phi(G) \le \sqrt{2\lambda_2}.
\]

The theorem proved by Jeff Cheeger actually has to do with manifolds and hypersurfaces; the theorem above is considered to be a discrete analog of Cheeger's original inequality, but the name has stuck. Typically, people think of the first inequality as being "easy" and the second as being "hard." We'll prove the first inequality, and start the proof of the second.

Proof: Recall that
\[
\lambda_2 = \min_{x : \langle x, D^{1/2}e \rangle = 0} \frac{x^T \mathcal{L} x}{x^T x} = \min_{x : \langle x, D^{1/2}e \rangle = 0} \frac{x^T D^{-1/2} L_G D^{-1/2} x}{x^T x}.
\]
Consider the change of variables obtained by setting $y = D^{-1/2}x$, so that $x = D^{1/2}y$:
\[
\lambda_2 = \min_{y : \langle D^{1/2}y, D^{1/2}e \rangle = 0} \frac{y^T L_G y}{(D^{1/2}y)^T (D^{1/2}y)} = \min_{y : \langle D^{1/2}y, D^{1/2}e \rangle = 0} \frac{y^T L_G y}{y^T D y}.
\]
The minimum is being taken over all $y$ such that $\langle D^{1/2}y, D^{1/2}e \rangle = 0$; that is, over $y$ such that
\[
(D^{1/2}y)^T D^{1/2} e = 0 \iff y^T D e = 0 \iff \sum_{i \in V} d(i)\,y(i) = 0.
\]
Hence, we have that
\[
\lambda_2 = \min_{y : \sum_{i \in V} d(i) y(i) = 0} \frac{\sum_{(i,j) \in E} (y(i) - y(j))^2}{\sum_{i \in V} d(i)\,y(i)^2}.
\]
Now let $S^*$ be such that $\phi(G) = \phi(S^*)$, and try defining
\[
\hat{y}(i) = \begin{cases} 1, & i \in S^* \\ 0, & \text{else.} \end{cases}
\]
It would be great if $\lambda_2$ were bounded by $\frac{|\delta(S^*)|}{\sum_{i \in S^*} d(i)} = \frac{|\delta(S^*)|}{\operatorname{vol}(S^*)}$. However, there are two problems. We have $\sum_{i \in V} d(i)\,\hat{y}(i) \ne 0$; moreover, $\frac{|\delta(S^*)|}{\operatorname{vol}(S^*)}$ might not be $\phi(S^*)$, as we want the denominator to be $\min\{\operatorname{vol}(S^*), \operatorname{vol}(V - S^*)\}$. Hence, we redefine
\[
\hat{y}(i) = \begin{cases} \dfrac{1}{\operatorname{vol}(S^*)}, & i \in S^* \\[1ex] -\dfrac{1}{\operatorname{vol}(V - S^*)}, & \text{else.} \end{cases}
\]
Now we notice that
\[
\sum_{i \in V} d(i)\,\hat{y}(i) = \frac{\sum_{i \in S^*} d(i)}{\operatorname{vol}(S^*)} - \frac{\sum_{i \notin S^*} d(i)}{\operatorname{vol}(V - S^*)} = 1 - 1 = 0.
\]
Thus $\hat{y}$ is a feasible solution to the minimization problem defining $\lambda_2$, and the only edges contributing anything nonzero to the numerator are those with exactly one endpoint in $S^*$. Thus:
\begin{align*}
\lambda_2 &\le \frac{|\delta(S^*)| \left( \frac{1}{\operatorname{vol}(S^*)} + \frac{1}{\operatorname{vol}(V - S^*)} \right)^2}{\sum_{i \in S^*} d(i) \left( \frac{1}{\operatorname{vol}(S^*)} \right)^2 + \sum_{i \notin S^*} d(i) \left( \frac{1}{\operatorname{vol}(V - S^*)} \right)^2} \\
&= \frac{|\delta(S^*)| \left( \frac{1}{\operatorname{vol}(S^*)} + \frac{1}{\operatorname{vol}(V - S^*)} \right)^2}{\frac{1}{\operatorname{vol}(S^*)} + \frac{1}{\operatorname{vol}(V - S^*)}} \\
&= |\delta(S^*)| \left( \frac{1}{\operatorname{vol}(S^*)} + \frac{1}{\operatorname{vol}(V - S^*)} \right) \\
&\le 2\,|\delta(S^*)| \max\left\{ \frac{1}{\operatorname{vol}(S^*)}, \frac{1}{\operatorname{vol}(V - S^*)} \right\} \\
&= \frac{2\,|\delta(S^*)|}{\min\{\operatorname{vol}(S^*), \operatorname{vol}(V - S^*)\}} \\
&= 2\,\phi(G).
\end{align*}
This completes the proof of the first inequality.
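As a quick numerical sanity check of Theorem 2 (again my own illustration, not from the notes), one can compute $\lambda_2$ for a small graph and find $\phi(G)$ by brute force over all nonempty proper vertex subsets; a path on 6 vertices is used here.

```python
# Sanity check of Cheeger's inequality, lambda_2/2 <= phi(G) <= sqrt(2*lambda_2),
# on the path P_6; illustration only.
import numpy as np
from itertools import combinations

n = 6
A = np.zeros((n, n))
for i in range(n - 1):                   # path graph P_6
    A[i, i + 1] = A[i + 1, i] = 1.0
d = A.sum(axis=1)

D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L_norm = np.eye(n) - D_inv_sqrt @ A @ D_inv_sqrt
lam2 = np.sort(np.linalg.eigvalsh(L_norm))[1]   # second smallest eigenvalue

def phi(S):
    """Conductance of the subset S."""
    mask = np.zeros(n, dtype=bool)
    mask[list(S)] = True
    cut = A[mask][:, ~mask].sum()
    return cut / min(d[mask].sum(), d[~mask].sum())

# Brute force over all nonempty proper subsets (fine for n = 6).
phi_G = min(phi(S) for k in range(1, n) for S in combinations(range(n), k))

print(lam2 / 2, phi_G, np.sqrt(2 * lam2))  # approximately 0.095 <= 0.2 <= 0.618
assert lam2 / 2 <= phi_G + 1e-9
assert phi_G <= np.sqrt(2 * lam2) + 1e-9
```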