Filtering and Sampling Graph Signals, and its Application to Compressive Spectral Clustering
Nicolas Tremblay(1,2), Gilles Puy(1), Rémi Gribonval(1), Pierre Vandergheynst(1,2)
(1) PANAMA Team, INRIA Rennes, France (2) Signal Processing Laboratory 2, EPFL, Switzerland
Why graph signal processing?
N. Tremblay, Compressive Spectral Clustering, Gdr ISIS, 17th of June 2016
Outline:
1. Introduction to GSP: Graph Fourier Transform; Graph filtering
2. Graph sampling
3. Application to clustering: What is Spectral Clustering?; Compressive Spectral Clustering; A toy experiment; Experiments on the SBM
4. Conclusion
Introduction to graph signal processing: graph Fourier transform
What's a graph signal?

A graph signal is a vector x ∈ R^N that assigns one value to each of the N nodes of a graph G.
Three useful matrices

The adjacency matrix W and the degree matrix S (for a 4-node example graph):

    W = [0 1 1 0        S = [2 0 0 0
         1 0 1 1             0 3 0 0
         1 1 0 0             0 0 2 0
         0 1 0 0]            0 0 0 1]

The Laplacian matrix:

    L = S − W = [ 2 −1 −1  0
                 −1  3 −1 −1
                 −1 −1  2  0
                  0 −1  0  1]
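The construction above can be checked numerically; a minimal numpy sketch using the slide's 4-node example graph:

```python
import numpy as np

# Adjacency matrix of the 4-node example graph above.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)

S = np.diag(W.sum(axis=1))   # degree matrix: S_ii = sum_j W_ij
L = S - W                    # combinatorial Laplacian L = S - W
```

Each row of L sums to zero, and L is symmetric whenever W is.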
Three useful matrices

The same construction for a weighted version of the graph:

    W = [ 0  .5 .5  0        S = [1 0 0 0
         .5   0 .5  4             0 5 0 0
         .5  .5  0  0             0 0 1 0
          0   4  0  0]            0 0 0 4]

The Laplacian matrix:

    L = S − W = [  1  −.5 −.5   0
                 −.5    5 −.5  −4
                 −.5  −.5   1   0
                   0   −4   0   4]
What's a graph Fourier transform? [Hammond '11]

Diagonalise the Laplacian:

    L = S − W = U Λ U^⊤

• U is the Fourier basis of the graph,
• Λ = diag(λ₁, λ₂, ..., λ_N) is the spectrum.

The Fourier transform of a signal x reads: x̂ = U^⊤ x.

[Figure: a low-frequency Fourier mode vs. a high-frequency Fourier mode.]
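As a sanity check, the graph Fourier transform is just an orthonormal change of basis; a small numpy sketch on the same toy graph (the signal values are arbitrary):

```python
import numpy as np

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W

# L is symmetric, so eigh returns L = U diag(lam) U^T with lam ascending.
lam, U = np.linalg.eigh(L)

x = np.array([1.0, -2.0, 0.5, 3.0])  # an arbitrary graph signal
x_hat = U.T @ x                      # graph Fourier transform
x_back = U @ x_hat                   # inverse transform recovers x
```

For a connected graph the smallest eigenvalue is 0, with a constant eigenvector.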
The graph Fourier transform encodes the structure of the graph
Slide courtesy of D. Shuman
Introduction to graph signal processing: filtering graph signals
Graph filtering

Given a filter function h defined in the Fourier space.

[Figure: an example filter h(λ) plotted over the spectrum.]

In the node space, the signal x filtered by h reads:

    x^h = U h(Λ) U^⊤ x = Hx

Problem: this costs L's diagonalisation [O(N³)].

Solution: we use a polynomial approximation of order p of h:

    h̃(λ) = Σ_{l=1}^{p} α_l λ^l ≈ h(λ).

Indeed, in this case:

    H̃x = U h̃(Λ) U^⊤ x = U (Σ_{l=1}^{p} α_l Λ^l) U^⊤ x = Σ_{l=1}^{p} α_l L^l x ≈ Hx

⇒ only matrix-vector multiplications are involved [costs O(pN)].
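The polynomial trick fits in a few lines of numpy. In this sketch a least-squares fit on a grid (including a constant term) stands in for the Chebyshev approximation of [Hammond '11], and the filter h(λ) = exp(−2λ) is just an illustrative choice; the key point is that applying the fitted polynomial needs only matrix-vector products (Horner's scheme), never an eigendecomposition:

```python
import numpy as np

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W

h = lambda lam: np.exp(-2.0 * lam)       # example smooth low-pass filter
lam_max = np.linalg.eigvalsh(L)[-1]      # spectrum lies in [0, lam_max]

# Degree-p least-squares polynomial fit of h on [0, lam_max].
p = 5
grid = np.linspace(0.0, lam_max, 200)
alpha = np.polyfit(grid, h(grid), p)     # coefficients, highest degree first

def fast_filter(L, alpha, x):
    """Apply the polynomial of L to x with p matrix-vector products (Horner)."""
    y = np.zeros_like(x)
    for a in alpha:
        y = L @ y + a * x
    return y

x = np.array([1.0, 0.0, -1.0, 2.0])
x_filt = fast_filter(L, alpha, x)
```

Comparing `x_filt` with the exact filtering U h(Λ) U^⊤ x shows the (small) polynomial approximation error; increasing p shrinks it.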
A few applications

• Tikhonov regularization for denoising: argmin_f { ‖f − y‖₂² + γ f^⊤ L f }
• Wavelet denoising: argmin_a { ‖f − W^∗ a‖₂² + γ ‖a‖_{1,µ} }
• Compression via filterbanks, etc.

Slide courtesy of D. Shuman.
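The first application has a closed form worth noting: setting the gradient of ‖f − y‖₂² + γ f^⊤ L f to zero gives (I + γL) f = y. A minimal sketch (the noisy signal y and the value of γ are hypothetical):

```python
import numpy as np

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W

# Tikhonov denoising: argmin_f ||f - y||^2 + gamma * f^T L f.
# Zero gradient  <=>  (I + gamma L) f = y.
gamma = 1.0
y = np.array([1.1, 0.9, 1.0, -0.2])   # hypothetical noisy observations
f = np.linalg.solve(np.eye(len(y)) + gamma * L, y)
```

Larger γ pulls f further toward signals that vary little across edges (small f^⊤ L f).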
Graph sampling

Sampling a graph signal consists in:
1. choosing a subset of nodes,
2. measuring the signal on these nodes only.

How to reconstruct the original signal? Basically, we need:
1. a (low-dimensional) model for the signal to sample,
2. a method to choose the nodes to sample,
3. a "decoder" that exactly recovers the signal given its samples.
Smoothness assumption

In 1D signal processing, a smooth signal has most of its energy at low frequencies.

[Figure: a smooth signal in time and its Fourier transform, concentrated at low frequencies.]

Definition (Bandlimited graph signal [Puy '15, Chen '15, Anis '16, Segarra '15])
A k-bandlimited signal x ∈ R^N on G is a signal that satisfies, for some α̂ ∈ R^k,

    x = U_k α̂,

where U_k = (u₁ | u₂ | ... | u_k) gathers the first k eigenvectors of L.
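Concretely, a k-bandlimited signal is built from (or checked against) the first k columns of U; a small sketch with hypothetical Fourier coefficients α̂:

```python
import numpy as np

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W
lam, U = np.linalg.eigh(L)

k = 2
Uk = U[:, :k]
alpha_hat = np.array([1.0, -0.5])   # hypothetical coefficients in R^k
x = Uk @ alpha_hat                  # a k-bandlimited signal

x_hat = U.T @ x                     # its graph Fourier transform
```

All of x's spectral energy sits in the first k frequencies: x̂ vanishes beyond index k.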
Sampling band-limited graph signals

Preparation:
• Associate to each node i a probability p_i of drawing this node.
• This defines a probability distribution p ∈ R^N.

Sampling procedure: draw n nodes according to p: {ω_i}, i = 1, ..., n.

We create a matrix M that measures the signal x only on the selected nodes:

    M_ij := 1 if j = ω_i, and 0 otherwise.

For any signal x ∈ R^N on G, its sampled version is y = Mx (it has size n < N).
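The measurement matrix M is just a row selection; a sketch with a uniform sampling distribution, chosen here only for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
N, n = 8, 3
p = np.full(N, 1.0 / N)              # uniform distribution, for illustration

omega = rng.choice(N, size=n, p=p)   # draw n node indices according to p
M = np.zeros((n, N))
M[np.arange(n), omega] = 1.0         # M_ij = 1 iff j = omega_i

x = rng.standard_normal(N)           # any graph signal
y = M @ x                            # its sampled version, of size n < N
```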
Optimizing the sampling distribution

Some nodes are more important to sample than others.

For any signal x, remember that ‖U_k^⊤ x‖₂ is the energy of x on the first k frequencies.

Then:
1. For each node i, construct the Dirac δ_i centered at node i.
2. Compute ‖U_k^⊤ δ_i‖₂ (we have 0 ≤ ‖U_k^⊤ δ_i‖₂ ≤ 1).

• If ‖U_k^⊤ δ_i‖₂ ≈ 1: there exists a smooth signal concentrated on node i ⇒ node i is important.
• If ‖U_k^⊤ δ_i‖₂ ≈ 0: no smooth signal has energy concentrated on node i ⇒ node i can be sampled with lower probability.
The graph weighted coherence

We measure the quality of p with the graph weighted coherence.

Definition (Graph weighted coherence)
Let p ∈ R^N represent a sampling distribution on {1, ..., N}. The graph weighted coherence of order k for the pair (G, p) is

    ν_p^k := max_{1≤i≤N} { p_i^{−1/2} ‖U_k^⊤ δ_i‖₂ }.
How many nodes to select?

Theorem (Restricted isometry property)
Let M be a random subsampling matrix constructed using the sampling distribution p, and P := diag(p). For any δ, ε ∈ (0, 1), with probability at least 1 − ε,

    (1 − δ) ‖x₁ − x₂‖₂² ≤ (1/n) ‖M P^{−1/2} (x₁ − x₂)‖₂² ≤ (1 + δ) ‖x₁ − x₂‖₂²

for all x₁, x₂ ∈ span(U_k), provided that

    n ≥ (3/δ²) (ν_p^k)² log(2k/ε).

• Let's minimize ν_p^k! Its lower bound, √k, can always be reached, for p*:

    ∀i ∈ [1, N], p*_i = ‖U_k^⊤ δ_i‖₂² / k.

• With p*, one needs n ≳ k log(k) ⇒ up to the log factor, it is optimal!
• We have an efficient algorithm that estimates p* in O(pN log N)!
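Since U_k^⊤ δ_i is simply the i-th row of U_k, the optimal distribution p* and the coherence are one-liners; on the toy graph from earlier, the coherence of p* indeed hits its lower bound √k:

```python
import numpy as np

W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W
lam, U = np.linalg.eigh(L)

k = 2
Uk = U[:, :k]

# p*_i = ||Uk^T delta_i||^2 / k ; row i of Uk is exactly Uk^T delta_i.
p_star = (Uk ** 2).sum(axis=1) / k

# Graph weighted coherence nu_p^k = max_i p_i^{-1/2} ||Uk^T delta_i||_2.
row_norms = np.linalg.norm(Uk, axis=1)
coherence = float(np.max(row_norms / np.sqrt(p_star)))
```

p* sums to 1 automatically because the columns of U_k are orthonormal (‖U_k‖_F² = k).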
Reconstruction

We sampled the signal x ∈ R^N, i.e., we measured y = Mx + n (n ∈ R^n models noise). The goal is to estimate x from y.

We propose to solve (links with the SSL literature [Chapelle '10, Fu '12]):

    min_{z ∈ R^N} ‖P_Ω^{−1/2} (Mz − y)‖₂² + γ z^⊤ g(L) z,

where γ > 0, P_Ω := diag(p_{ω_1}, ..., p_{ω_n}), and g : R → R is a nonnegative and nondecreasing polynomial function.
Reconstruction

Solving

    min_{z ∈ R^N} ‖P_Ω^{−1/2} (Mz − y)‖₂² + γ z^⊤ g(L) z

can be done, e.g., by gradient descent or conjugate gradient. It is fast, as it involves only matrix-vector multiplications with sparse matrices.

We proved that the result is accurate and stable to noise:
• The quality of the reconstruction depends on the eigengap ratio g(λ_k)/g(λ_{k+1}).
• γ should be adjusted with the signal-to-noise ratio.
• In the absence of noise, the reconstruction quality improves as g(λ_k)/g(λ_{k+1}) → 0 and γ → 0. If g(λ_k) = 0 and g(λ_{k+1}) > 0, we have exact recovery.
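Putting the pieces together on a small hypothetical graph: sample a k-bandlimited signal from p* and solve the decoder with g(L) = L. For this quadratic objective the normal equations read (M^⊤ P_Ω^{−1} M + γ g(L)) z = M^⊤ P_Ω^{−1} y, solved directly here since the graph is tiny (γ is a hypothetical small value):

```python
import numpy as np

# Two triangles joined by a bridge: a hypothetical 6-node toy graph.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
L = np.diag(W.sum(axis=1)) - W
lam, U = np.linalg.eigh(L)

k = 2
x = U[:, :k] @ np.array([1.0, 0.7])     # a k-bandlimited signal

# Sample n nodes i.i.d. from the optimal distribution p*.
rng = np.random.default_rng(3)
p = (U[:, :k] ** 2).sum(axis=1) / k
n = 4
omega = rng.choice(6, size=n, p=p)
M = np.zeros((n, 6))
M[np.arange(n), omega] = 1.0
y = M @ x                               # noiseless measurements

# Decoder with g(L) = L:
# (M^T P^{-1} M + gamma L) z = M^T P^{-1} y.
gamma = 1e-6
P_inv = np.diag(1.0 / p[omega])
z = np.linalg.solve(M.T @ P_inv @ M + gamma * L, M.T @ P_inv @ y)
```

With g(L) = L we have g(λ_k) > 0 here, so recovery is not exact; it becomes exact for a g vanishing on [0, λ_k], as stated above.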
Recap

Given a graph and its Laplacian matrix L, and any graph signal x defined on this graph, one can:
1. filter this signal with any filter h(λ): x^h = U h(Λ) U^⊤ x [O(N³)],
2. fast filter it with the polynomial approximation h̃(λ) = Σ_{l=1}^{p} α_l λ^l: x^h ≈ Σ_{l=1}^{p} α_l L^l x [O(pN)].

Given a k-bandlimited graph signal x defined on this graph, one can:
1. estimate the optimal probability distribution p*_i = ‖U_k^⊤ δ_i‖₂² / k [O(pN log N)],
2. sample n = O(k log k) nodes from this distribution,
3. measure the signal y = Mx ∈ R^n,
4. reconstruct the signal [O(pN)]:

    x_rec = argmin_{z ∈ R^N} ‖P_Ω^{−1/2} (Mz − y)‖₂² + γ z^⊤ g(L) z.
Application to clustering: What is Spectral Clustering?
Given a series of N objects:
1/ Find adapted descriptors,
2/ Cluster.
From the N objects, one creates:
• N vectors: x₁, x₂, ..., x_N,
• and their distance matrix Δ ∈ R^{N×N}.

Goal of clustering: assign a label c(i) ∈ {1, ..., k} to each object i in order to organize / simplify / analyze the data.

There exist two general types of methods:
• methods directly based on the x_i and/or Δ, like k-means or hierarchical clustering;
• graph-based methods.
Graph construction from the distance matrix Δ

Create a graph G = (V, E):
• each node in V is one of the N objects,
• each pair of nodes (i, j) is connected if Δ(i, j) is small enough.

For example, two connectivity possibilities:
• Gaussian kernel: 1. connect all pairs of nodes with links of weight exp(−Δ(i, j)/σ), 2. remove all links of weight below a threshold;
• k nearest neighbors: connect each node to its k nearest neighbors.
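A sketch of the Gaussian-kernel construction on random 2-D points (the values of σ and of the threshold are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 2))   # N = 20 objects described in R^2

# Distance matrix Delta(i, j) = ||x_i - x_j||.
Delta = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

# Gaussian-kernel graph: weights exp(-Delta/sigma), then thresholding.
sigma, thresh = 1.0, 0.3           # hypothetical parameter values
W = np.exp(-Delta / sigma)
np.fill_diagonal(W, 0.0)           # no self-loops
W[W < thresh] = 0.0                # remove weak links
```

The result is a symmetric weighted adjacency matrix, ready for the Laplacian construction of the first part.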
The clustering problem now states: given the graph G representing the similarity between the N objects, find a partition of all nodes into k clusters.

Many methods exist [Fortunato '10]:
• modularity (or other cost-function) optimisation methods [Newman '06],
• random walk methods [Schaub '12],
• methods inspired from statistical physics [Krzakala '13] or information theory [Rosvall '08],
• spectral methods,
• ...
The classical spectral clustering (SC) algorithm [Von Luxburg ’06]:
Given the N-node graph G of laplacian matrix L :
1. Compute L’s first k eigenvectors :
Uk = (u1|u2| · · · |uk ) .
k 2. Consider each node i as a point in R :
> fi = Uk δi .
3. Run k-means with the Euclidean distance : Dij = kfi − fj k and obtain k clusters.
Definition : Let us call Dij the spectral clustering distance.
N. Tremblay Compressive Spectral Clustering Gdr ISIS, 17th of June 2016 27 / 47 Introduction to GSP Graph sampling Application to clustering Conclusion
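The three steps fit in a short numpy sketch; the graph is a hypothetical pair of triangles joined by one edge, and a minimal Lloyd's k-means (initialised with the two most distant feature vectors, rather than a library call) stands in for step 3:

```python
import numpy as np

# Two triangles joined by a single edge: two natural clusters.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    W[i, j] = W[j, i] = 1.0
L = np.diag(W.sum(axis=1)) - W

# Steps 1-2: spectral features f_i are the rows of U_k.
k = 2
lam, U = np.linalg.eigh(L)
F = U[:, :k]

# Step 3: minimal Lloyd's k-means on the rows of F.
pd2 = ((F[:, None] - F[None]) ** 2).sum(-1)
i0, j0 = np.unravel_index(np.argmax(pd2), pd2.shape)
centers = F[[i0, j0]].copy()                 # farthest-pair initialisation
for _ in range(20):
    labels = np.argmin(((F[:, None] - centers[None]) ** 2).sum(-1), axis=1)
    centers = np.array([F[labels == c].mean(axis=0) for c in range(k)])
```

On this graph the second eigenvector (Fiedler vector) changes sign between the two triangles, so k-means recovers them exactly.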
What's the point of using a graph?

[Figure: N points in d = 2 dimensions. Left: result of k-means (k = 2) on Δ. Right: result after creating a graph, partially diagonalising L, and running k-means (k = 2) on D.]
Application to clustering: Compressive Spectral Clustering

Our goal

Goal: SC in high dimensions, with N > 10⁶ nodes and/or k > 100.

Problem: when N and/or k are large, there are two main bottlenecks:
1. the partial eigendecomposition of the (sparse) Laplacian (e.g. by restarted Arnoldi) [at least O(k³ + Nk²)] [Chen '11a],
2. the high-dimensional k-means step [O(Nk²)].

Contribution: an algorithm that
• approximates the true SC solution,
• with controlled relative error,
• with a running time in O(k² log² k + pN(log N + k)).
Main ideas of Compressive Spectral Clustering (CSC):

CSC is based on two main observations:
1. SC does not explicitly need f_i = U_k^⊤ δ_i, but only D_ij = ‖f_i − f_j‖;
2. each cluster indicator function c_j ∈ R^N is in fact approximately k-bandlimited: ∀j ∈ [1, k], c_j "is close to" span(U_k).

CSC follows 4 steps:
1. Estimate D_ij by filtering d random graph signals,
2. Sample n nodes out of the N available ones,
3. Run low-dimensional k-means on these n nodes to obtain c_j^r ∈ R^n,
4. Reconstruct each reduced cluster indicator function c_j^r back on the whole graph to obtain c_j, as desired.

(Steps 2 to 4 are already covered!)

Step 1: how to estimate D_ij without computing U_k?
Remember: the classical spectral clustering algorithm

Given the N-node graph G of Laplacian matrix L:
1. Compute L's first k eigenvectors: U_k = (u₁ | u₂ | ... | u_k).
2. Consider each node i as a point in R^k: f_i = U_k^⊤ δ_i.
3. Run k-means with D_ij = ‖f_i − f_j‖ and obtain k clusters.

Our goal: estimate D_ij without computing U_k exactly. Write δ_ij = δ_i − δ_j and let h_{λ_k} be the ideal low-pass filter with cut-off λ_k (h_{λ_k}(λ) = 1 if λ ≤ λ_k, and 0 otherwise). Then:

    D_ij = ‖U_k^⊤ (δ_i − δ_j)‖
         = ‖U_k^⊤ δ_ij‖
         = ‖U_k U_k^⊤ δ_ij‖
         = ‖U h_{λ_k}(Λ) U^⊤ δ_ij‖
         = ‖H_{λ_k} δ_ij‖.
Fast filtering [Hammond, ACHA '11]

In practice, we use a polynomial approximation of order p of h_{λ_k}:

    h̃_{λ_k} = Σ_{l=1}^{p} α_l λ^l ≈ h_{λ_k},

such that D_ij = ‖H_{λ_k} δ_ij‖ = lim_{p→∞} ‖H̃_{λ_k} δ_ij‖.

[Figure: the ideal filter h_{λ_k}(λ) and polynomial approximations of increasing order.]
N. Tremblay Compressive Spectral Clustering Gdr ISIS, 17th of June 2016 33 / 47 N×d Let R = (r1|r2| · · · |rd ) ∈ R be a random Gaussian matrix, i.e. a collection of d random graph signals, with 0 mean and var. 1/d.
We define f˜ = (H˜ R)>δ ∈ d and D˜ = f˜ − f˜ i λk i R ij i j
Theorem (Norm conservation theorem in the case of infinite p) 2 Let > 0, if d > d0 ∼ log N/ , then, with proba > 1 − 1/N, we have : 2 ˜ ∀(i, j) ∈ [1, N] (1 − )Dij 6 Dij 6 (1 + )Dij .
Norm conservation result [Tremblay '16a, Ramasamy '15]

The spectral distance reads: D_ij = ||H_λk δ_ij|| = lim_{p→∞} ||H̃_λk δ_ij||.

Let R = (r1 | r2 | · · · | rd) ∈ R^{N×d} be a random Gaussian matrix, i.e. a collection of d random graph signals with zero mean and variance 1/d.

We define f̃_i = (H̃_λk R)^⊤ δ_i ∈ R^d and D̃_ij = ||f̃_i − f̃_j||.

Theorem (norm conservation in the case of infinite p). Let ε > 0. If d > d0 ∼ log N / ε², then, with probability > 1 − 1/N, we have:
∀(i, j) ∈ [1, N]², (1 − ε) D_ij ≤ D̃_ij ≤ (1 + ε) D_ij.

Consequence: to estimate D_ij with no partial diagonalisation of L, fast-filter only d ∼ log N random signals!
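To make the theorem concrete, the estimate can be checked numerically. The sketch below is illustrative only: the helper names are ours, and the ideal low-pass filter H_λk = U_k U_k^⊤ is built by exact diagonalisation, whereas the point of the fast method is to replace it by an order-p polynomial H̃_λk in L.

```python
import numpy as np

def ideal_lowpass(L, k):
    """Ideal low-pass filter H_{lambda_k} = U_k U_k^T.
    Built by exact diagonalisation for illustration only; the fast method
    replaces it by an order-p polynomial in L, never diagonalising."""
    _, U = np.linalg.eigh(L)
    return U[:, :k] @ U[:, :k].T

def spectral_distances(H):
    """Exact D_ij = ||H (delta_i - delta_j)|| (H is symmetric, so columns
    of H are the filtered Kronecker deltas)."""
    diff = H[:, :, None] - H[:, None, :]
    return np.linalg.norm(diff, axis=0)

def estimated_distances(H, d, rng):
    """D~_ij = ||f~_i - f~_j|| from d random graph signals of variance 1/d."""
    N = H.shape[0]
    R = rng.normal(0.0, 1.0 / np.sqrt(d), size=(N, d))
    F = H @ R                       # row i of F is f~_i = (H R)^T delta_i
    diff = F[:, None, :] - F[None, :, :]
    return np.linalg.norm(diff, axis=2)
```

On a small two-community graph, the estimated distances stay within a multiplicative factor of the exact ones, as the theorem predicts for d large enough.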
How to quickly estimate λk, the sole unknown of the fast filtering operation?

Goal: given a positive semi-definite L, estimate its k-th eigenvalue as fast as possible.

We use eigencount techniques [Napoli '13] (also based on polynomial filtering of random vectors!):
• given an interval [0, b], estimate the number of enclosed eigenvalues;
• find λk by dichotomy (bisection) on b.

Done in O(pN log N).
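The eigencount-plus-dichotomy procedure can be sketched as follows. For readability, this toy version builds the ideal spectral projector by full diagonalisation; in [Napoli '13] the projector is replaced by a polynomial filter of order p applied to the same random probes, which is what makes each count cheap. Function names are ours.

```python
import numpy as np

def eigencount(L, b, n_vec, rng):
    """Stochastic estimate of #{eigenvalues of L in [0, b]}.
    Sketch: the ideal projector is built by diagonalisation; [Napoli '13]
    replaces it by an order-p polynomial filter applied to the probes."""
    lam, U = np.linalg.eigh(L)
    P = U[:, lam <= b]                         # eigenvectors with eigenvalue <= b
    R = rng.normal(size=(L.shape[0], n_vec))   # random probe vectors
    return np.mean(np.sum((P.T @ R) ** 2, axis=0))   # E[||P^T r||^2] = count

def estimate_lambda_k(L, k, n_vec=200, tol=1e-3, rng=None):
    """Find lambda_k by dichotomy (bisection) on the interval end b."""
    rng = np.random.default_rng(0) if rng is None else rng
    lo, hi = 0.0, 2.0 * np.max(np.abs(L))      # Gershgorin-style spectral bound
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # the 0.5 margin guards the decision against estimator noise
        if eigencount(L, mid, n_vec, rng) >= k - 0.5:
            hi = mid
        else:
            lo = mid
    return hi
```

With polynomial filtering in place of the projector, each probe costs O(p|E|), which gives the stated O(pN log N) overall budget on sparse graphs.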
The CSC algorithm [Tremblay '16b, Puy '16]

1. Estimate λk, the k-th eigenvalue of L.
2. Generate d random graph signals, stacked in a matrix R ∈ R^{N×d}.
3. Filter them with H̃_λk and treat each node i as a point in R^d: f̃_i^⊤ = δ_i^⊤ H̃_λk R.

If d ∼ log N, we prove that D̃_ij = ||f̃_i − f̃_j|| ≃ D_ij.

Next steps (sampling):
4. Sample n nodes from p*.
5. Run k-means on the n associated feature vectors and obtain {c_j^r}_{j=1:k}.
6. Reconstruct all k indicator functions {c_j}_{j=1:k}.

If n ∼ k log k and c_j^r = M c_j (the sampled indicators), we prove that we control the reconstruction error.
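Steps 1-3 of the algorithm, followed by k-means, can be sketched in a few lines. This is not the authors' implementation: λk is taken as known, the ideal filter is built by diagonalisation instead of polynomial filtering, steps 4-6 are omitted (k-means is simply run on all N feature vectors), and all names are ours.

```python
import numpy as np

def csc_features(L, k, d, rng):
    """Steps 2-3: features f~_i from d filtered random signals.
    Sketch only: the ideal filter H_{lambda_k} = U_k U_k^T is built by
    diagonalisation; CSC approximates it by an order-p polynomial in L."""
    _, U = np.linalg.eigh(L)
    H = U[:, :k] @ U[:, :k].T
    R = rng.normal(0.0, 1.0 / np.sqrt(d), size=(L.shape[0], d))
    return H @ R                              # row i is f~_i in R^d

def kmeans(X, k, iters=20):
    """Tiny deterministic Lloyd's k-means (farthest-point initialisation),
    enough for a toy example."""
    C = [X[0]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in C], axis=0)
        C.append(X[np.argmax(d2)])            # farthest point from current seeds
    C = np.array(C)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2),
                           axis=1)
        C = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                      else C[j] for j in range(k)])
    return labels
```

Even with d far below N, the filtered random features separate the communities well enough for k-means.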
Application to clustering: a toy experiment

SC on a toy example

N = 1000, k = 2. Community 1: 300 nodes; community 2: 700 nodes.

[Figure: adjacency matrix W, with the two diagonal community blocks visible]

Compute U2 = (u1, u2), then f_i = U2^⊤ δ_i ∈ R²:

[Figure: scatter plot of the spectral features f_i, one cluster per community]

D_ij:

[Figure: pairwise distance matrix D_ij, block structure matching the two communities]

k-means on the f_i: perf = 0.996.
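As a minimal, self-contained analogue of this experiment (a 10-node graph with cliques of sizes 3 and 7 standing in for the 300/700 communities, and the k-means step reduced to splitting on the sign of u2, which is equivalent here for two well-separated communities; helper names are ours):

```python
import numpy as np

def two_cliques(n1, n2):
    """Adjacency of two cliques joined by a single edge (toy communities)."""
    N = n1 + n2
    W = np.zeros((N, N))
    W[:n1, :n1] = 1.0
    W[n1:, n1:] = 1.0
    np.fill_diagonal(W, 0.0)
    W[0, n1] = W[n1, 0] = 1.0          # one inter-community edge
    return W

def spectral_clustering_2(W):
    """Classical SC for k = 2: compute U2 = (u1, u2) and split on the
    sign of the Fiedler vector u2."""
    L = np.diag(W.sum(axis=1)) - W     # combinatorial Laplacian L = S - W
    lam, U = np.linalg.eigh(L)
    u2 = U[:, 1]                       # second column of U2
    return (u2 > 0).astype(int)
```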
CSC on the same toy example

1. Estimate λ2 and p*.
2. Generate d = 3 random graph signals.
3. Low-pass filter them: f̃_i ∈ R³.

[Figure: scatter plot of the filtered features f̃_i]

D̃_ij ≃ D_ij:

[Figure: estimated distance matrix D̃_ij, same block structure as D_ij]

4. Sample n = 3 nodes from p*.
5. Run low-dimensional k-means.
6. Reconstruct the result: after interpolation, perf = 0.951.
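Step 6 above (reconstruction by interpolation) can be sketched on an idealised case. In this sketch the signal is exactly k-bandlimited (two disconnected cliques, so the indicators span the Laplacian's nullspace), and U_k is formed explicitly for a least-squares fit; [Puy '16] solves an equivalent regularised problem using graph filtering alone, never forming U_k. The function name is ours.

```python
import numpy as np

def reconstruct_bandlimited(L, k, sample_idx, samples):
    """Interpolate a k-bandlimited signal from its n sampled values:
    solve min_alpha ||U_k[sample_idx] alpha - samples||, return U_k alpha.
    Sketch only: U_k comes from diagonalisation here, whereas [Puy '16]
    avoids it with a filter-based formulation."""
    _, U = np.linalg.eigh(L)
    Uk = U[:, :k]
    alpha, *_ = np.linalg.lstsq(Uk[sample_idx], samples, rcond=None)
    return Uk @ alpha
```

When the sampled rows of U_k have full column rank (at least one sample per community here), the k-bandlimited indicator is recovered exactly.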
Application to clustering: experiments on the SBM

Experiments

The Stochastic Block Model (SBM):
• N nodes and k communities of equal size N/k;
• edge probability q1 if the two nodes are in the same community, q2 if not;
• define the ratio ε = q2/q1;
• the SBM is then fully defined by ε and the average degree s;
• define the critical ratio ε_c = (s − √s) / (s + √s (k − 1)) [Decelle '11].

Experiments with N = 10³, k = 20, s = 16, w.r.t. different parameters:

[Figure: three panels of recovery performance vs ε/ε_c, comparing SC with CSC for n = {1, 2, 3, 4} k log(k), d = {2, 3, 4, 5} log(n), and p = {10, 20, 50, 100}]
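A sketch of the SBM generator and of the critical ratio used on these slides (helper names are ours; q1 is chosen so that the expected average degree is s):

```python
import numpy as np

def sbm(N, k, s, eps, rng):
    """Sample an SBM adjacency matrix: k equal communities of size N/k,
    average degree s, ratio eps = q2/q1."""
    labels = np.repeat(np.arange(k), N // k)
    # expected degree: s = q1*(N/k - 1) + q2*(N - N/k), with q2 = eps*q1
    q1 = s / ((N / k - 1) + eps * (N - N / k))
    P = np.where(labels[:, None] == labels[None, :], q1, eps * q1)
    A = (rng.random((N, N)) < P).astype(float)
    A = np.triu(A, 1)                  # keep independent upper-triangle draws
    return A + A.T, labels             # symmetrise, no self-loops

def eps_critical(s, k):
    """Detectability threshold eps_c = (s - sqrt(s)) / (s + sqrt(s)(k - 1))
    [Decelle '11]."""
    return (s - np.sqrt(s)) / (s + np.sqrt(s) * (k - 1))
```

For the slide's setting (s = 16, k = 20) this gives ε_c = 12/92 ≈ 0.13, the value against which the ε-axes of the plots are normalised.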
Experiments

With parameters d = 4 log(k), n = 2k log(k), p = 50, γ = 10⁻³, and ε = ε_c/4:

[Figure: computation time (s) and recovery performance vs number of classes k ∈ {20, 50, 100, 200}, for N = 10⁴, 10⁵, 10⁶, comparing CSC, PM and SC]

PM = Power Method [Lin '10, Boutsidis '15].

On a real-world graph: the Amazon graph, with 335,000 nodes and 926,000 edges:

           SC                               CSC
k=250      7h17m, 0.84                      1h20m, 0.83
k=500      15h29m, 0.84                     3h34m, 0.84
k=1000     17h36m (eigs) + at least 21h     10h18m, 0.84
           for k-means; perf unknown
Conclusion
Two main ideas
Fast low-pass graph filtering of random signals: a way to bypass the Laplacian's diagonalisation for learning tasks.

Cluster indicator functions live in a low-dimensional space (they are k-bandlimited): we can use sampling schemes to recover them efficiently.

Details of this work are found in:

(Sampling part) Random sampling of bandlimited signals on graphs, ACHA 2016. A MATLAB toolbox is available at grsamplingbox.gforge.inria.fr.

(Clustering part) Compressive Spectral Clustering, ICML 2016. A MATLAB toolbox is available at cscbox.gforge.inria.fr.
Links with literature
Low-rank approximation: Nyström methods [Sun '15], leverage scores [Mahoney '11]

Machine learning: semi-supervised learning [Chapelle '10], active learning [Fu '12, Gadde '14], coresets [Har-Peled '04, Frahling '08]

Compressed sensing: variable density sampling [Puy '11]

Other fast approximate SC algorithms: [Lin '10, Fowlkes '04, Wang '09, Chen '11a, Chen '11b]
Perspectives and difficult questions

Two difficult questions (among others):
1. Given a positive semi-definite matrix, how to estimate its k-th eigenvalue as fast as possible, and only that one?
2. How to choose the appropriate polynomial order p automatically?

Perspectives:
1. Rational filters instead of polynomial filters? [Shi '15, Isufi '16]
2. Smoother filters for better approximation? [Sakiyama '16]
3. What happens if nodes are added one by one?
4. SBMO! [cf. E. Kaufmann]
5. The experiments shown were done with L = I − D^{−1/2} W D^{−1/2}. Test L = D^{1−2α̂} − D^{−α̂} W D^{−α̂}! [cf. R. Couillet]
References

[Ramasamy '15] Compressive spectral embedding: sidestepping..., NIPS.
[Fortunato '10] Community detection in graphs, Physics Reports.
[Newman '06] Modularity and community structure in networks, PNAS.
[Schaub '12] Markov dynamics as a zooming lens for multiscale..., PLoS One.
[Krzakala '13] Spectral redemption: clustering sparse networks, PNAS.
[Rosvall '08] Maps of random walks on complex networks reveal..., PLoS One.
[Von Luxburg '06] A tutorial on spectral clustering, Statistics and Computing.
[Chen '11a] Parallel spectral clustering in distributed systems, IEEE TPAMI.
[Lin '10] Power iteration clustering, ICML.
[Boutsidis '15] Spectral clustering via the power method - provably, ICML.
[Fowlkes '04] Spectral grouping using the Nyström method, IEEE TPAMI.
[Wang '09] Approximate spectral clustering, AKDDM.
[Chen '11b] Large scale spectral clustering with landmark-based..., CAI.
[Shuman '13] The emerging field of signal processing on graphs..., IEEE Signal Processing Magazine.
[Hammond '11] Wavelets on graphs via spectral graph theory, ACHA.
[Napoli '13] Efficient estimation of eigenvalue counts in an interval, arXiv.
[Tremblay '16a] Accelerated spectral clustering using graph..., ICASSP.
[Tremblay '16b] Compressive spectral clustering, ICML.
[Puy '16] Random sampling of bandlimited signals..., ACHA.
[Shi '15] Infinite impulse response graph filters in wireless sensor networks, SPL.
[Chen '15] Discrete signal processing on graphs: sampling theory, IEEE TSP.
[Anis '16] Efficient sampling set selection for bandlimited graph..., IEEE TSP.
[Segarra '15] Sampling of graph signals with successive local aggregations, IEEE TSP.
[Chapelle '10] Semi-Supervised Learning, The MIT Press.
[Fu '12] A survey on instance selection for active learning, KIS.
[Mahoney '11] Randomized algorithms for matrices and data, Foundations and Trends in ML.
[Sun '15] A review of Nyström methods for large-scale machine learning, Information Fusion.
[Puy '11] On variable density compressive sampling, SPL.
[Gadde '14] Active semi-supervised learning using sampling theory..., SIGKDD.
[Isufi '16] Distributed time-varying graph filtering, arXiv.
[Sakiyama '16] Spectral graph wavelets and filter banks with low approximation error, not yet published.