
Filtering and Sampling Graph Signals, and its Application to Compressive Spectral Clustering

Nicolas Tremblay(1,2), Gilles Puy(1), Rémi Gribonval(1), Pierre Vandergheynst(1,2)

(1) PANAMA Team, INRIA Rennes, France; (2) Signal Processing Laboratory 2 (LTS2), EPFL, Switzerland

Why graph signal processing?


Outline

1. Introduction to GSP
   - Graph Fourier Transform
   - Graph filtering
2. Graph sampling
3. Application to clustering
   - What is Spectral Clustering?
   - Compressive Spectral Clustering
   - A toy experiment
   - Experiments on the SBM
4. Conclusion


Introduction to graph signal processing: graph Fourier transform


What's a graph signal?


Three useful matrices

The adjacency matrix and the degree matrix:

W = ( 0 1 1 0        S = ( 2 0 0 0
      1 0 1 1              0 3 0 0
      1 1 0 0              0 0 2 0
      0 1 0 0 )            0 0 0 1 )

The Laplacian matrix:

L = S − W = (  2 −1 −1  0
              −1  3 −1 −1
              −1 −1  2  0
               0 −1  0  1 )


Three useful matrices (weighted graph)

The adjacency matrix and the degree matrix:

W = (  0  .5 .5 0        S = ( 1 0 0 0
      .5   0 .5 4              0 5 0 0
      .5  .5  0 0              0 0 1 0
       0   4  0 0 )            0 0 0 4 )

The Laplacian matrix:

L = S − W = (  1  −.5 −.5  0
              −.5   5 −.5 −4
              −.5 −.5   1  0
               0   −4   0  4 )


What's a graph Fourier transform? [Hammond '11]

Diagonalise the Laplacian: L = S − W = U Λ Uᵀ.

• U is the Fourier basis of the graph;
• Λ = diag(λ_1, λ_2, ..., λ_N) is the spectrum;
• the Fourier transform of a signal x reads x̂ = Uᵀ x (a numpy sketch follows below).

[Figure: a low-frequency Fourier mode and a high-frequency Fourier mode.]
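The transform is concrete enough to compute directly. Below is a minimal numpy sketch (not from the slides) that builds the unweighted 4-node example above, diagonalises its Laplacian, and takes the Fourier transform of a signal; the signal values are arbitrary illustrative numbers.

```python
import numpy as np

# Adjacency and degree matrices of the unweighted 4-node example above.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
S = np.diag(W.sum(axis=1))
L = S - W                             # combinatorial Laplacian

# L is symmetric, so eigh returns the (real, ascending) spectrum and an
# orthonormal Fourier basis U such that L = U diag(lam) U^T.
lam, U = np.linalg.eigh(L)

x = np.array([1.0, 0.8, 0.9, -0.2])   # a graph signal (arbitrary values)
x_hat = U.T @ x                       # graph Fourier transform
assert np.allclose(U @ x_hat, x)      # the inverse transform recovers x
```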


The graph Fourier transform encodes the structure of the graph

[Figure; slide courtesy of D. Shuman.]


Introduction to graph signal processing: filtering graph signals

Graph filtering

Given a filter function h defined in the Fourier space. [Figure: a filter h(λ) over the spectrum.]

In the node space, the signal x filtered by h reads:

x_h = U h(Λ) Uᵀ x = H x.

Problem: this costs L's diagonalisation [O(N³)].

Solution: we use a polynomial approximation of order p of h:

h̃(λ) = Σ_{l=1}^p α_l λ^l ≃ h(λ).

Indeed, in this case:

H̃ x = U h̃(Λ) Uᵀ x = U (Σ_{l=1}^p α_l Λ^l) Uᵀ x = Σ_{l=1}^p α_l L^l x ≃ H x

⇒ only matrix-vector multiplications are involved [costs O(pN)]. (A sketch follows below.)
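As a concrete illustration, here is a minimal sketch of fast filtering with a monomial expansion h̃(λ) = Σ_l α_l λ^l, as on the slide; in practice a Chebyshev expansion is preferred for numerical stability [Hammond '11]. The coefficient convention (alpha[0] multiplying L⁰x = x) is an implementation choice, not the slides'.

```python
def fast_filter(L, x, alpha):
    """Compute sum_l alpha[l] * L^l x using only matrix-vector products.

    L: sparse N x N Laplacian; x: signal(s) on the nodes (vector or N x d
    matrix); alpha: polynomial coefficients, alpha[0] being the constant
    term. Cost: len(alpha) - 1 products by the sparse L, i.e. O(pN).
    """
    out = alpha[0] * x
    v = x
    for a in alpha[1:]:
        v = L @ v                # v now holds L^l x
        out = out + a * v
    return out
```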

A few applications

• Tikhonov regularization for denoising: argmin_f { ||f − y||₂² + γ fᵀ L f } (a direct solver is sketched after this list);
• wavelet denoising: argmin_a { ||f − W* a||₂² + γ ||a||_{1,μ} };
• compression via filterbanks, etc.

[Slide courtesy of D. Shuman.]
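The Tikhonov problem above has a closed-form solution: setting the gradient 2(f − y) + 2γLf to zero gives (I + γL) f = y, a sparse linear solve. A minimal sketch (not from the slides), assuming a sparse L:

```python
import scipy.sparse as sp
from scipy.sparse.linalg import spsolve

def tikhonov_denoise(L, y, gamma):
    """Minimiser of ||f - y||_2^2 + gamma * f^T L f, i.e. (I + gamma L) f = y."""
    N = L.shape[0]
    return spsolve(sp.identity(N, format="csc") + gamma * L.tocsc(), y)
```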


Graph sampling

Sampling a graph signal consists in:
1. choosing a subset of nodes,
2. measuring the signal on these nodes only.

How to reconstruct the original signal? Basically, we need:
1. a (low-dimensional) model for the signal to sample,
2. a method to choose the nodes to sample,
3. a "decoder" that exactly recovers the signal given its samples.

Smoothness assumption

In 1D signal processing, a smooth signal has most of its energy at low frequencies.

[Figure: a smooth signal in time and its Fourier transform, concentrated near zero frequency.]

Definition (Bandlimited graph signal [Puy '15, Chen '15, Anis '16, Segarra '15])
A k-bandlimited signal x ∈ R^N on G is a signal that satisfies, for some α̂ ∈ R^k,

x = U_k α̂,

where U_k ∈ R^{N×k} contains the first k Fourier modes of the graph.

Sampling band-limited graph signals

Preparation:
• associate to each node i a probability p_i of drawing this node;
• this defines a probability distribution p ∈ R^N.

Sampling procedure: draw n nodes according to p: {ω_i}_{i∈[1,n]}.

We create a matrix M that measures the signal x only on the selected nodes:

M_ij := 1 if j = ω_i, 0 otherwise.

For any signal x ∈ R^N on G, its sampled version is y = Mx (of size n < N). (See the sketch below.)
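In code, M need not be formed explicitly: keeping the drawn indices ω is enough. A minimal sketch, assuming nodes are drawn independently with replacement (the i.i.d. sampling model of [Puy '16]):

```python
import numpy as np

def sample_signal(x, p, n, rng=np.random.default_rng(0)):
    """Draw omega_1..omega_n ~ p (with replacement) and return y = M x."""
    omega = rng.choice(len(x), size=n, replace=True, p=p)
    y = x[omega]          # row i of M has a single 1, at column omega_i
    return omega, y
```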

Optimizing the sampling distribution

Some nodes are more important to sample than others.

For any signal x, remember that ||U_kᵀ x||₂ is the energy of x on the first k frequencies.

Then:
1. For each node i, construct the Dirac δ_i centered at node i.
2. Compute ||U_kᵀ δ_i||₂ (we have 0 ≤ ||U_kᵀ δ_i||₂ ≤ 1).

• If ||U_kᵀ δ_i||₂ ≈ 1: there exists a smooth signal concentrated on node i ⇒ node i is important.
• If ||U_kᵀ δ_i||₂ ≈ 0: no smooth signal has energy concentrated on node i ⇒ node i can be sampled with lower probability.

The graph weighted coherence

We measure the quality of p with the graph weighted coherence.

Definition (Graph weighted coherence)
Let p ∈ R^N represent a sampling distribution on {1, ..., N}. The graph weighted coherence of order k for the pair (G, p) is

ν_p^k := max_{1≤i≤N} { p_i^{−1/2} ||U_kᵀ δ_i||₂ }.

How many nodes to select?

Theorem (Restricted isometry property)
Let M be a random subsampling matrix constructed with the sampling distribution p. For any δ, ε ∈ (0, 1), with probability at least 1 − ε,

(1 − δ) ||x_1 − x_2||₂² ≤ (1/n) ||M P^{−1/2} (x_1 − x_2)||₂² ≤ (1 + δ) ||x_1 − x_2||₂²

for all x_1, x_2 ∈ span(U_k), provided that

n ≥ (3/δ²) (ν_p^k)² log(2k/ε).

• Let's minimize ν_p^k! Its lower bound, √k, is always attained by the distribution p*:

∀i ∈ [1, N], p*_i = ||U_kᵀ δ_i||₂² / k.

• With p*, one needs n ≳ k log(k) ⇒ up to the log factor, this is optimal!
• We have an efficient algorithm that estimates p* in O(pN log N). (An exact-computation sketch follows below.)
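For small graphs where U_k can be computed exactly, p* is just the squared row norms of U_k divided by k. A minimal illustrative sketch (the paper's O(pN log N) estimator avoids computing U_k altogether):

```python
import numpy as np

def optimal_sampling_distribution(L, k):
    """Exact p*_i = ||U_k^T delta_i||_2^2 / k (dense eigendecomposition)."""
    lam, U = np.linalg.eigh(L)
    Uk = U[:, :k]                          # first k Fourier modes
    p_star = (Uk ** 2).sum(axis=1) / k     # squared row norms, divided by k
    # sanity check: the squared row norms of U_k sum to ||U_k||_F^2 = k
    assert np.isclose(p_star.sum(), 1.0)
    return p_star
```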

Reconstruction

We sampled the signal x ∈ R^N, i.e., we measured y = Mx + n, where n ∈ R^n models noise. The goal is to estimate x from y.

We propose to solve (this has links with the SSL literature [Chapelle '10, Fu '12]):

min_{z ∈ R^N} ||P_Ω^{−1/2} (Mz − y)||₂² + γ zᵀ g(L) z,

where P_Ω := diag(p_{ω_1}, ..., p_{ω_n}), γ > 0, and g: R → R is a nonnegative and nondecreasing polynomial function.

Reconstruction (continued)

Solving

min_{z ∈ R^N} ||P_Ω^{−1/2} (Mz − y)||₂² + γ zᵀ g(L) z

can be done, e.g., by gradient descent or conjugate gradient (a matrix-free sketch follows below). It is fast, as it involves only matrix-vector multiplications with sparse matrices.

We proved that the result is accurate and stable to noise:
• the quality of the reconstruction depends on the eigengap ratio g(λ_k)/g(λ_{k+1});
• γ should be adjusted to the signal-to-noise ratio;
• in the absence of noise, the reconstruction quality improves when g(λ_k)/g(λ_{k+1}) → 0 and γ → 0; if g(λ_k) = 0 and g(λ_{k+1}) > 0, we have exact recovery.
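A matrix-free sketch of the decoder, assuming the fast_filter and sampling helpers sketched earlier and a polynomial g given by its monomial coefficients g_alpha. Setting the gradient of the objective to zero yields the normal equations (Mᵀ P_Ω⁻¹ M + γ g(L)) z = Mᵀ P_Ω⁻¹ y, solved here by conjugate gradient:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def decode(L, omega, y, p, gamma, g_alpha):
    """Solve (M^T P^-1 M + gamma g(L)) z = M^T P^-1 y matrix-free with CG."""
    N = L.shape[0]
    w = 1.0 / p[omega]                           # diagonal of P_Omega^{-1}

    def matvec(z):
        out = gamma * fast_filter(L, z, g_alpha)  # gamma * g(L) z
        np.add.at(out, omega, w * z[omega])       # + M^T P^-1 M z (repeats sum)
        return out

    rhs = np.zeros(N)
    np.add.at(rhs, omega, w * y)                  # M^T P^-1 y
    z, info = cg(LinearOperator((N, N), matvec=matvec), rhs)
    return z
```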

Recap

Given a graph and its Laplacian matrix L, and any graph signal x defined on this graph, one can:
1. filter the signal with any filter h(λ): x_h = U h(Λ) Uᵀ x [O(N³)];
2. fast-filter it with a polynomial approximation h(λ) ≃ Σ_{l=1}^p α_l λ^l: x_h ≃ Σ_{l=1}^p α_l L^l x [O(pN)].

Given a k-bandlimited graph signal x defined on this graph, one can:
1. estimate the optimal sampling distribution p*_i = ||U_kᵀ δ_i||₂² / k [O(pN log N)];
2. sample n = O(k log k) nodes from this distribution;
3. measure the signal y = Mx ∈ R^n;
4. reconstruct the signal [O(pN)]:

x_rec = argmin_{z ∈ R^N} ||P_Ω^{−1/2} (Mz − y)||₂² + γ zᵀ g(L) z.


Application to clustering: What is Spectral Clustering?


Given a series of N objects:
1/ find adapted descriptors,
2/ cluster.

From the N objects, one creates:
• N vectors x_1, x_2, ..., x_N,
• and their distance matrix ∆ ∈ R^{N×N}.

Goal of clustering: assign a label c(i) ∈ {1, ..., k} to each object i in order to organize / simplify / analyze the data.

There exist two general types of methods:
• methods directly based on the x_i and/or ∆, like k-means or hierarchical clustering;
• graph-based methods.

Graph construction from the distance matrix ∆

Create a graph G = (V, E):
• each node in V is one of the N objects;
• each pair of nodes (i, j) is connected if ∆(i, j) is small enough.

For example, two connectivity choices (a sketch of the first follows below):
• Gaussian kernel:
  1. connect all pairs of nodes with links of weight exp(−∆(i, j)/σ),
  2. remove all links of weight smaller than a threshold ε;
• k nearest neighbours: connect each node to its k nearest neighbours.
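A minimal numpy sketch of the Gaussian-kernel construction just described; sigma and the threshold (called eps here) are free parameters the user must tune:

```python
import numpy as np

def gaussian_kernel_graph(X, sigma, eps):
    """X: (N, d) data points. Returns a dense weighted adjacency matrix."""
    Delta = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    W = np.exp(-Delta / sigma)    # connect all pairs ...
    W[W < eps] = 0.0              # ... then remove the weak links
    np.fill_diagonal(W, 0.0)      # no self-loops
    return W
```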


The clustering problem now states: given the graph G representing the similarity between the N objects, find a partition of all nodes into k clusters.

Many methods exist [Fortunato '10]:
• modularity (or other cost-function) optimisation methods [Newman '06];
• random walk methods [Schaub '12];
• methods inspired by statistical physics [Krzakala '13] or information theory [Rosvall '08];
• spectral methods;
• ...

The classical spectral clustering (SC) algorithm [Von Luxburg '06]

Given the N-node graph G with Laplacian matrix L:

1. Compute L's first k eigenvectors: U_k = (u_1|u_2|···|u_k).
2. Consider each node i as a point in R^k: f_i = U_kᵀ δ_i.
3. Run k-means with the Euclidean distance D_ij = ||f_i − f_j|| and obtain k clusters.

Definition: we call D_ij the spectral clustering distance. (A sketch follows below.)
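The three steps translate almost line-for-line into code. A minimal sketch for small graphs, using a dense eigendecomposition and scipy's k-means; the choice of kmeans2 with "++" seeding is ours, not the slides':

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def spectral_clustering(L, k, seed=0):
    lam, U = np.linalg.eigh(L)    # step 1: first k eigenvectors of L
    F = U[:, :k]                  # step 2: row i of F is f_i = U_k^T delta_i
    _, labels = kmeans2(F, k, minit="++", seed=seed)   # step 3: k-means
    return labels
```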

What's the point of using a graph?

[Figure: N points in d = 2 dimensions; the result of k-means (k = 2) directly on ∆, versus the result obtained after creating a graph, partially diagonalising L, and running k-means (k = 2) on D.]

Application to clustering: Compressive Spectral Clustering

Our goal

Problem: when N and/or k is large, SC hits two main bottlenecks:
1. the partial eigendecomposition of the (sparse) Laplacian (e.g. by restarted Arnoldi) [at least O(k³ + Nk²)] [Chen '11a];
2. high-dimensional k-means [O(Nk²)].

Goal: SC in high dimensions, i.e. with N > 10⁶ nodes and/or k > 100 clusters.

Contribution: an algorithm that
• approximates the true SC solution,
• with controlled relative error,
• with a running time in O(k² log² k + pN(log N + k)).

Main ideas of Compressive Spectral Clustering (CSC)

CSC is based on two main observations:
1. SC does not need the f_i = U_kᵀ δ_i explicitly, but only D_ij = ||f_i − f_j||;
2. each cluster indicator function c_j ∈ R^N is in fact approximately k-bandlimited: for all j ∈ [1, k], c_j "is close to" span(U_k).

CSC follows 4 steps:
1. Estimate D_ij by filtering d random graph signals.
2. Sample n nodes out of the N available ones.
3. Run low-dimensional k-means on these n nodes to obtain reduced indicator functions c_j^r ∈ R^n.
4. Reconstruct each c_j^r back on the whole graph to obtain c_j, as desired.

(Steps 2 to 4 are already covered!)

Step 1: how to estimate D_ij without computing U_k?

Remember the classical spectral clustering algorithm. Given the N-node graph G with Laplacian matrix L:
1. Compute L's first k eigenvectors: U_k = (u_1|u_2|···|u_k).
2. Consider each node i as a point in R^k: f_i = U_kᵀ δ_i.
3. Run k-means with D_ij = ||f_i − f_j|| and obtain k clusters.

Our goal: estimate D_ij without computing U_k exactly.

Let δ_ij := δ_i − δ_j, and let h_{λ_k} be the ideal low-pass filter with cut-off λ_k (h_{λ_k}(λ) = 1 if λ ≤ λ_k, and 0 otherwise). Since U_k has orthonormal columns, and since U h_{λ_k}(Λ) Uᵀ = U_k U_kᵀ:

D_ij = ||U_kᵀ (δ_i − δ_j)|| = ||U_kᵀ δ_ij|| = ||U_k U_kᵀ δ_ij|| = ||U h_{λ_k}(Λ) Uᵀ δ_ij|| = ||H_{λ_k} δ_ij||.

[Figure: the ideal low-pass filter h_{λ_k}(λ).]

Fast filtering [Hammond, ACHA '11]

In practice, we use a polynomial approximation of order p of h_{λ_k}:

h̃_{λ_k} = Σ_{l=1}^p α_l λ^l ≃ h_{λ_k},

such that D_ij = ||H_{λ_k} δ_ij|| = lim_{p→∞} ||H̃_{λ_k} δ_ij||.

[Figure: the ideal filter and its polynomial approximations of increasing order.]

N. Tremblay Compressive Spectral Clustering Gdr ISIS, 17th of June 2016 33 / 47 N×d Let R = (r1|r2| · · · |rd ) ∈ R be a random Gaussian matrix, i.e. a collection of d random graph signals, with 0 mean and var. 1/d.

We define f˜ = (H˜ R)>δ ∈ d and D˜ = f˜ − f˜ i λk i R ij i j

Theorem (Norm conservation theorem in the case of infinite p) 2 Let  > 0, if d > d0 ∼ log N/ , then, with proba > 1 − 1/N, we have : 2 ˜ ∀(i, j) ∈ [1, N] (1 − )Dij 6 Dij 6 (1 + )Dij .

Consequence : to estimate Dij with no partial diagonalisation of L, fast filter only d ∼ log N random signals !

Introduction to GSP Graph sampling Application to clustering Conclusion

Norm conservation result [Tremblay ’16a, Ramasamy 15’]

The spectral distance reads : D = kH δ k = lim H˜ δ ij λk ij p→∞ λk ij

N. Tremblay Compressive Spectral Clustering Gdr ISIS, 17th of June 2016 34 / 47 Theorem (Norm conservation theorem in the case of infinite p) 2 Let  > 0, if d > d0 ∼ log N/ , then, with proba > 1 − 1/N, we have : 2 ˜ ∀(i, j) ∈ [1, N] (1 − )Dij 6 Dij 6 (1 + )Dij .

Consequence : to estimate Dij with no partial diagonalisation of L, fast filter only d ∼ log N random signals !

Introduction to GSP Graph sampling Application to clustering Conclusion

Norm conservation result [Tremblay ’16a, Ramasamy 15’]

The spectral distance reads : D = kH δ k = lim H˜ δ ij λk ij p→∞ λk ij

N×d Let R = (r1|r2| · · · |rd ) ∈ R be a random Gaussian matrix, i.e. a collection of d random graph signals, with 0 mean and var. 1/d.

We define f˜ = (H˜ R)>δ ∈ d and D˜ = f˜ − f˜ i λk i R ij i j

N. Tremblay Compressive Spectral Clustering Gdr ISIS, 17th of June 2016 34 / 47 Consequence : to estimate Dij with no partial diagonalisation of L, fast filter only d ∼ log N random signals !

Introduction to GSP Graph sampling Application to clustering Conclusion

Norm conservation result [Tremblay ’16a, Ramasamy 15’]

The spectral distance reads : D = kH δ k = lim H˜ δ ij λk ij p→∞ λk ij

N×d Let R = (r1|r2| · · · |rd ) ∈ R be a random Gaussian matrix, i.e. a collection of d random graph signals, with 0 mean and var. 1/d.

We define f˜ = (H˜ R)>δ ∈ d and D˜ = f˜ − f˜ i λk i R ij i j

Theorem (Norm conservation theorem in the case of infinite p) 2 Let  > 0, if d > d0 ∼ log N/ , then, with proba > 1 − 1/N, we have : 2 ˜ ∀(i, j) ∈ [1, N] (1 − )Dij 6 Dij 6 (1 + )Dij .

N. Tremblay Compressive Spectral Clustering Gdr ISIS, 17th of June 2016 34 / 47 Introduction to GSP Graph sampling Application to clustering Conclusion

Norm conservation result [Tremblay ’16a, Ramasamy 15’]

The spectral distance reads : D = kH δ k = lim H˜ δ ij λk ij p→∞ λk ij

N×d Let R = (r1|r2| · · · |rd ) ∈ R be a random Gaussian matrix, i.e. a collection of d random graph signals, with 0 mean and var. 1/d.

We define f˜ = (H˜ R)>δ ∈ d and D˜ = f˜ − f˜ i λk i R ij i j

Theorem (Norm conservation theorem in the case of infinite p) 2 Let  > 0, if d > d0 ∼ log N/ , then, with proba > 1 − 1/N, we have : 2 ˜ ∀(i, j) ∈ [1, N] (1 − )Dij 6 Dij 6 (1 + )Dij .

Consequence : to estimate Dij with no partial diagonalisation of L, fast filter only d ∼ log N random signals !

N. Tremblay Compressive Spectral Clustering Gdr ISIS, 17th of June 2016 34 / 47 Introduction to GSP Graph sampling Application to clustering Conclusion
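A minimal sketch of this estimation, assuming the fast_filter helper from earlier and monomial coefficients h_alpha approximating the ideal low-pass filter h_{λ_k}: row i of the filtered matrix is the feature vector f̃_i, and distances between rows estimate D_ij.

```python
import numpy as np

def random_features(L, h_alpha, d, rng=np.random.default_rng(0)):
    """Return F whose rows are f~_i = (H~ R)^T delta_i; ||F[i]-F[j]|| ~ D_ij."""
    N = L.shape[0]
    R = rng.normal(0.0, 1.0 / np.sqrt(d), size=(N, d))  # d signals, var 1/d
    return fast_filter(L, R, h_alpha)   # filters all d columns at once
```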

How to quickly estimate λ_k, the sole unknown of the fast filtering operation?

Goal: given an SDP matrix L, estimate its k-th eigenvalue as fast as possible.

We use eigencount techniques [Napoli '13] (also based on polynomial filtering of random vectors!):
• given an interval [0, b], estimate the number of eigenvalues it encloses;
• find λ_k by dichotomy on b (a sketch follows below).

Done in O(pN log N).
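A sketch of this idea under stated assumptions: the count of eigenvalues in [0, b] is the trace of the ideal spectral projector, approximated here by a Jackson-Chebyshev expansion of the step function (coefficients as in the KPM literature, e.g. Weiße et al. '06) and a Hutchinson trace estimator. The constants and the damping formula are our choices, not necessarily the paper's.

```python
import numpy as np
from scipy.sparse.linalg import eigsh

def eigencount(L, b, lmax, p=50, d=20, rng=np.random.default_rng(0)):
    """Estimate #{eigenvalues of L in [0, b]} by filtering random vectors."""
    N = L.shape[0]
    tb = np.clip(2.0 * b / lmax - 1.0, -1.0, 1.0)  # threshold mapped to [-1,1]
    phib = np.arccos(tb)
    j = np.arange(1, p + 1)
    c0 = (np.pi - phib) / np.pi                    # Chebyshev coefficients of
    cj = -2.0 * np.sin(j * phib) / (np.pi * j)     # the step 1_{lambda <= b}
    a = np.pi / (p + 2)                            # Jackson damping vs. Gibbs
    cj *= ((p - j + 2) * np.cos(j * a) + np.sin(j * a) / np.tan(a)) / (p + 2)
    total = 0.0
    for _ in range(d):                             # Hutchinson trace estimate
        r = rng.choice([-1.0, 1.0], size=N)        # Rademacher probe vector
        v0, v1 = r, (2.0 / lmax) * (L @ r) - r     # Chebyshev recurrence on
        total += c0 * (r @ v0) + cj[0] * (r @ v1)  # the rescaled Laplacian
        for coef in cj[1:]:
            v0, v1 = v1, 2.0 * ((2.0 / lmax) * (L @ v1) - v1) - v0
            total += coef * (r @ v1)
    return total / d

def estimate_lambda_k(L, k, iters=20):
    """Dichotomy on b until the estimated eigencount brackets k."""
    lmax = 1.01 * float(eigsh(L, k=1, which="LA", return_eigenvectors=False)[0])
    lo, hi = 0.0, lmax
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if eigencount(L, mid, lmax) < k else (lo, mid)
    return hi                                      # upper bracket of lambda_k
```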

The CSC algorithm [Tremblay '16b, Puy '16]

1. Estimate λ_k, the k-th eigenvalue of L.
2. Generate d random graph signals, collected in a matrix R ∈ R^{N×d}.
3. Filter them with H̃_{λ_k} and treat each node i as a point in R^d: f̃_iᵀ = δ_iᵀ H̃_{λ_k} R.

If d ∼ log N, we prove that D̃_ij = ||f̃_i − f̃_j|| ≃ D_ij.

Next steps (sampling):
4. Sample n nodes from p*.
5. Run k-means on the n associated feature vectors and obtain the reduced indicator functions {c_j^r}_{j=1:k}.
6. Reconstruct all k indicator functions {c_j}_{j=1:k}.

If n ∼ k log k and c_j^r = M c_j, we prove that we control the reconstruction error. (An end-to-end sketch follows below.)
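Putting the pieces together, here is a compact end-to-end sketch, assuming the helpers sketched earlier (random_features, optimal_sampling_distribution, decode) plus filter coefficients h_alpha (low-pass at λ_k) and g_alpha (decoder penalty). The constants mirror the d ∼ log N and n ∼ k log k regimes, and the exact p* is used only for illustration.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def csc(L, k, h_alpha, g_alpha, gamma=1e-3, rng=np.random.default_rng(0)):
    N = L.shape[0]
    d = max(2, int(4 * np.log(N)))              # d ~ log N random signals
    F = random_features(L, h_alpha, d, rng)     # steps 2-3: features f~_i
    p = optimal_sampling_distribution(L, k)     # exact p* (illustration only)
    n = min(N, max(k, int(2 * k * np.log(k))))  # n ~ k log k samples
    omega = rng.choice(N, size=n, replace=True, p=p)        # step 4
    _, labels_r = kmeans2(F[omega], k, minit="++", seed=0)  # step 5
    # step 6: reconstruct each sampled indicator function on the whole graph
    C = np.column_stack([decode(L, omega, (labels_r == j).astype(float),
                                p, gamma, g_alpha) for j in range(k)])
    return C.argmax(axis=1)                     # final cluster assignment
```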

Application to clustering: A toy experiment

SC on a toy example

N = 1000, k = 2; community 1 has 300 nodes, community 2 has 700 nodes.

1. Compute U_2 = (u_1, u_2).
2. Consider each node i as a point f_i = U_2ᵀ δ_i ∈ R².
3. Run k-means on the f_i: perf = 0.996.

[Figures: the block-structured adjacency matrix of the two communities, the 2-D embedding f_i, and the spectral clustering distance matrix D_ij.]

CSC on the same toy example

1. Estimate λ_2 and p*.
2. Generate d = 3 random graph signals.
3. Low-pass filter them: f̃_i ∈ R³.
4. Sample n = 3 nodes from p*.
5. Run low-dimensional k-means.
6. Reconstruct the result by interpolation: perf = 0.951.

[Figures: the filtered features f̃_i, the estimated distance matrix D̃_ij ≃ D_ij, and the final clustering after interpolation.]

Application to clustering: Experiments on the SBM

Experiments

The Stochastic Block Model (SBM):
• N nodes and k communities C_1, ..., C_k of equal size N/k;
• two nodes connect with probability q_1 if they are in the same community, and with probability q_2 if not;
• define the ratio ε = q_2/q_1;
• the SBM is then fully defined by ε and the average degree s;
• define the critical ratio ε_c = (s − √s) / (s + √s(k − 1)) [Decelle '11].

(A toy generator is sketched below.)

Experiments with N = 10³, k = 20, s = 16, with respect to different parameters:

[Figure: recovery performance versus ε/ε_c, compared to SC, when varying n ∈ {1, 2, 3, 4} · k log(k), d ∈ {2, 3, 4, 5} · log(n), and p ∈ {10, 20, 50, 100}.]
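A toy generator for this model, assuming N divisible by k; q_1 is recovered from s and ε through the expected-degree relation s = (N/k − 1) q_1 + (N − N/k) q_2, a back-of-the-envelope convention that may differ slightly from the paper's.

```python
import numpy as np

def sbm_adjacency(N, k, s, eps, rng=np.random.default_rng(0)):
    """Sample an SBM adjacency matrix with k equal communities (N % k == 0)."""
    labels = np.repeat(np.arange(k), N // k)
    q1 = s / ((N / k - 1) + eps * (N - N / k))   # so the average degree is ~s
    q2 = eps * q1
    same = labels[:, None] == labels[None, :]
    P = np.where(same, q1, q2)                   # edge probabilities
    A = np.triu(rng.random((N, N)) < P, k=1)     # sample the upper triangle
    A = (A | A.T).astype(float)                  # symmetrise; zero diagonal
    return A, labels
```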

SC CSC k=250 7h17m, 0.84 1h20m, 0.83 k=500 15h29m, 0.84 3h34m, 0.84 17h36m (eigs) k=1000 + at least 21 h 10h18m, 0.84 for k-means, unknown

Introduction to GSP Graph sampling Application to clustering Conclusion

Experiments −3 With params d = 4 log (k), n = 2k log (k), p = 50, γ = 10 , and  = c /4 :

10 5 1 N=10 4, CSC N=10 4, PM 0.95 4 N=10 , SC 10 3 N=10 5, CSC 0.9 N=10 5, PM N=10 5, SC 1 0.85 N=10 6, CSC 10 N=10 6, PM

0.8 N=10 6, SC Computation time (s) Recovery performance 20 50 100 200 20 50 100 200 # of classes k # of classes k PM = Power Method [Lin ’10, Boutsidis ’15]

N. Tremblay Compressive Spectral Clustering Gdr ISIS, 17th of June 2016 42 / 47 Introduction to GSP Graph sampling Application to clustering Conclusion

Experiments −3 With params d = 4 log (k), n = 2k log (k), p = 50, γ = 10 , and  = c /4 :

10 5 1 N=10 4, CSC N=10 4, PM 0.95 4 N=10 , SC 10 3 N=10 5, CSC 0.9 N=10 5, PM N=10 5, SC 1 0.85 N=10 6, CSC 10 N=10 6, PM

0.8 N=10 6, SC Computation time (s) Recovery performance 20 50 100 200 20 50 100 200 # of classes k # of classes k PM = Power Method [Lin ’10, Boutsidis ’15]

On a real-world graph : Amazon graph with 335000 nodes and 926000 edges :

SC CSC k=250 7h17m, 0.84 1h20m, 0.83 k=500 15h29m, 0.84 3h34m, 0.84 17h36m (eigs) k=1000 + at least 21 h 10h18m, 0.84 for k-means, unknown

N. Tremblay Compressive Spectral Clustering Gdr ISIS, 17th of June 2016 42 / 47 Introduction to GSP Graph sampling Application to clustering Conclusion

Conclusion


Two main ideas

Fast low-pass graph filtering of random signals: a way to bypass the Laplacian's diagonalisation for learning tasks.

Cluster indicator functions live in a low-dimensional space (they are k-bandlimited): we can use sampling schemes to recover them efficiently.

Details of this work can be found in:

• (Sampling part) Random sampling of bandlimited signals on graphs, ACHA 2016. A MATLAB toolbox is available at grsamplingbox.gforge.inria.fr.
• (Clustering part) Compressive Spectral Clustering, ICML 2016. A MATLAB toolbox is available at cscbox.gforge.inria.fr.


Links with literature

Low-rank approximation: Nyström methods [Sun '15], leverage scores [Mahoney '11].

Machine learning: semi-supervised learning [Chapelle '10], active learning [Fu '12, Gadde '14], coresets [Har-Peled '04, Frahling '08].

Compressed sensing: variable density sampling [Puy '11].

Other fast approximate SC algorithms: [Lin '10, Fowlkes '04, Wang '09, Chen '11a, Chen '11b].

Perspectives and difficult questions

Two difficult questions (among others):
1. Given an SDP matrix, how to estimate its k-th eigenvalue, and only that one, as fast as possible?
2. How to choose the appropriate polynomial order p automatically?

Perspectives:
1. Rational filters instead of polynomial filters? [Shi '15, Isufi '16]
2. Smoother filters for better approximation? [Sakiyama '16]
3. What if nodes are added one by one?
4. SBMO! [cf. E. Kaufmann]
5. The experiments shown were done with L = I − D^{−1/2} W D^{−1/2}. Test L = D^{1−2α̂} − D^{−α̂} W D^{−α̂}! [cf. R. Couillet]

References

[Ramasamy '15] Compressive spectral embedding: sidestepping ..., NIPS.
[Fortunato '10] Community detection in graphs, Physics Reports.
[Newman '06] Modularity and community structure in networks, PNAS.
[Schaub '12] Markov dynamics as a zooming lens for multiscale ..., PLOS One.
[Krzakala '13] Spectral redemption: clustering sparse networks, PNAS.
[Rosvall '08] Maps of random walks on complex networks reveal ..., PLOS One.
[Von Luxburg '06] A tutorial on spectral clustering, Statistics and Computing.
[Chen '11a] Parallel spectral clustering in distributed systems, IEEE TPAMI.
[Lin '10] Power iteration clustering, ICML.
[Boutsidis '15] Spectral clustering via the power method - provably, ICML.
[Fowlkes '04] Spectral grouping using the Nyström method, IEEE TPAMI.
[Wang '09] Approximate spectral clustering, AKDDM.
[Chen '11b] Large scale spectral clustering with landmark-based ..., CAI.
[Shuman '13] The emerging field of signal processing on graphs ..., IEEE Signal Processing Magazine.
[Hammond '11] Wavelets on graphs via spectral graph theory, ACHA.
[Napoli '13] Efficient estimation of eigenvalue counts in an interval, arXiv.
[Tremblay '16a] Accelerated spectral clustering using graph filtering of random signals, ICASSP.
[Tremblay '16b] Compressive spectral clustering, ICML.
[Puy '16] Random sampling of bandlimited signals on graphs, ACHA.
[Shi '15] Infinite impulse response graph filters in wireless sensor networks, SPL.
[Chen '15] Discrete signal processing on graphs: sampling theory, IEEE TSP.
[Anis '16] Efficient sampling set selection for bandlimited graph signals ..., IEEE TSP.
[Segarra '15] Sampling of graph signals with successive local aggregations, IEEE TSP.
[Chapelle '10] Semi-Supervised Learning, The MIT Press.
[Fu '12] A survey on instance selection for active learning, KIS.
[Mahoney '11] Randomized algorithms for matrices and data, Foundations and Trends in ML.
[Sun '15] A review of Nyström methods for large-scale machine learning, Information Fusion.
[Puy '11] On variable density compressive sampling, SPL.
[Gadde '14] Active semi-supervised learning using sampling theory ..., SIGKDD.
[Isufi '16] Distributed time-varying graph filtering, arXiv.
[Sakiyama '16] Spectral graph wavelets and filter banks with low approximation error, preprint.