LAB 8 MATH 150 LINEAR ALGEBRA SPECTRAL CLUSTERING SPRING 2019

Graphs

A graph is a system of vertices and edges, with an edge between vertices indicating some kind of relationship between those vertices. Sometimes edges include arrows to indicate direction, or may loop from a vertex back to itself.

There is an entire field of study devoted to graphs, with many problems that are easy to understand but hard to solve. In this lab, we’ll focus on a particular type of graph, with the following properties:

• Simple: there should not be multiple edges between any two vertices, and no loops from a vertex to itself.
• Labeled: each vertex will be distinguished in some way, usually by numbering. The vertices are meant to represent particular objects, people, places, or other things that cannot be considered equivalent.
• Undirected: no edges will have an arrow indicating a “one-way” relationship between the vertices.

An example of just such a graph is below:

[Figure: an example graph on vertices 1 through 6]

Note that the placement of the vertices means nothing, nor does the length of an edge indicate anything about the relationship between the vertices. In fact, the graph below is completely equivalent to the one above:

[Figure: the same graph with its vertices placed differently]

All of the vertices have the same relationship (as defined by the edges) that they did before, so the meaning of the graph is not changed.

Adjacency Matrix

The adjacency matrix of a graph has an entry of 1 in the ith row and jth column if there is an edge between vertices i and j, and a 0 otherwise. The adjacency matrix for the graph above would be

$$
A = \begin{pmatrix}
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 1 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 \\
0 & 1 & 1 & 0 & 0 & 1 \\
1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0
\end{pmatrix}
$$
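As a cross-check in Python (a swap for the lab's Scilab, assuming NumPy is installed), the adjacency matrix can be built from a list of edges. The edge list below is read off the matrix above; vertices are numbered from 1, so each index is shifted down by one for NumPy.

```python
import numpy as np

# Edges of the example graph, read off the adjacency matrix above.
edges = [(1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 6), (4, 6)]
n = 6

A = np.zeros((n, n), dtype=int)
for i, j in edges:
    # Undirected graph: record each edge in both directions,
    # shifting 1-based vertex labels to 0-based indices.
    A[i - 1, j - 1] = 1
    A[j - 1, i - 1] = 1

print(A)
```

Because the graph is undirected, the result is always symmetric, and because it is simple, the diagonal stays zero.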

The degree matrix of a graph is a diagonal matrix with an entry of k in row i and column i if vertex i has k edges attached to it, and zeros everywhere else.

Here again is our graph:

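[Figure: the example graph]

and here is its degree matrix:

$$
D = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 3 & 0 & 0 & 0 & 0 \\
0 & 0 & 3 & 0 & 0 & 0 \\
0 & 0 & 0 & 3 & 0 & 0 \\
0 & 0 & 0 & 0 & 2 & 0 \\
0 & 0 & 0 & 0 & 0 & 2
\end{pmatrix}
$$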

We can also find the degree matrix by adding up the entries in each row of the adjacency matrix: the sum of row i is the degree of vertex i.
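A short NumPy sketch of that row-sum rule (Python is used here in place of Scilab; in Scilab itself, diag(sum(A, 2)) should do the same thing):

```python
import numpy as np

# Adjacency matrix of the example graph.
A = np.array([
    [0, 0, 0, 0, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
])

# Row i has a 1 for every neighbor of vertex i, so its sum is the degree.
degrees = A.sum(axis=1)
D = np.diag(degrees)

print(degrees)  # [1 3 3 3 2 2]
```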

The Laplacian

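The Laplacian matrix of a (simple) graph is the difference between its degree matrix and its adjacency matrix, L = D − A. In our example,

$$
L = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 \\
0 & 3 & 0 & 0 & 0 & 0 \\
0 & 0 & 3 & 0 & 0 & 0 \\
0 & 0 & 0 & 3 & 0 & 0 \\
0 & 0 & 0 & 0 & 2 & 0 \\
0 & 0 & 0 & 0 & 0 & 2
\end{pmatrix}
-
\begin{pmatrix}
0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 1 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 \\
0 & 1 & 1 & 0 & 0 & 1 \\
1 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 1 & 0 & 0
\end{pmatrix}
=
\begin{pmatrix}
1 & 0 & 0 & 0 & -1 & 0 \\
0 & 3 & -1 & -1 & -1 & 0 \\
0 & -1 & 3 & -1 & 0 & -1 \\
0 & -1 & -1 & 3 & 0 & -1 \\
-1 & -1 & 0 & 0 & 2 & 0 \\
0 & 0 & -1 & -1 & 0 & 2
\end{pmatrix}
$$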
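The subtraction is easy to reproduce in NumPy (again an assumed Python translation, not part of the Scilab lab), and it makes one property of the Laplacian visible: each row sums to zero, because the degree on the diagonal cancels the −1 entries for that vertex's neighbors.

```python
import numpy as np

# Adjacency matrix of the example graph.
A = np.array([
    [0, 0, 0, 0, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
])

D = np.diag(A.sum(axis=1))  # degree matrix from row sums
L = D - A                   # graph Laplacian

print(L)
print(L.sum(axis=1))  # [0 0 0 0 0 0]
```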

The Fiedler vector

The Fiedler vector of a graph is an eigenvector of the Laplacian matrix of that graph. Specifically, it’s the eigenvector associated with the second smallest eigenvalue of the Laplacian, called the “spectral gap.”

Even in this small example, finding the eigenvectors of a 6 × 6 matrix is labor-intensive, so we’ll have Scilab compute everything for us. First define the matrix and check that it’s entered correctly:

    --> L = [1 0 0 0 -1 0; 0 3 -1 -1 -1 0; 0 -1 3 -1 0 -1; 0 -1 -1 3 0 -1; -1 -1 0 0 2 0; 0 0 -1 -1 0 2]
     L  =

       1.   0.   0.   0.  -1.   0.
       0.   3.  -1.  -1.  -1.   0.
       0.  -1.   3.  -1.   0.  -1.
       0.  -1.  -1.   3.   0.  -1.
      -1.  -1.   0.   0.   2.   0.
       0.   0.  -1.  -1.   0.   2.

Next, use the spec function to retrieve the eigenvalues and eigenvectors of the matrix. Called with two outputs, spec actually returns two matrices: one whose columns are the eigenvectors of L, and a diagonal matrix whose diagonal entries are the corresponding eigenvalues, in the same order. Only the eigenvalues are returned when we type spec(L):

    --> spec(L)
     ans  =

       8.327D-16
       0.4384472
       2.
       3.
       4.
       4.5615528

But we can see everything we need if we give names to the two matrices, like evals and evecs:

    --> [evecs, evals] = spec(L)
     evals  =

       7.078D-16   0.          0.   0.   0.   0.
       0.          0.4384472   0.   0.   0.   0.
       0.          0.          2.   0.   0.   0.
       0.          0.          0.   3.   0.   0.
       0.          0.          0.   0.   4.   0.
       0.          0.          0.   0.   0.   4.5615528

     evecs  =

       0.4082483  -0.7018088  -0.5         0.2886751   2.490D-16  -0.0863966
       0.4082483   0.0863966   0.5         0.2886751  -8.097D-16  -0.7018088
       0.4082483   0.3077061   0.          0.2886751  -0.7071068   0.3941027
       0.4082483   0.3077061   1.138D-16   0.2886751   0.7071068   0.3941027
       0.4082483  -0.3941027   0.5        -0.5773503   0.          0.3077061
       0.4082483   0.3941027  -0.5        -0.5773503  -8.882D-16  -0.3077061

Note: The smallest eigenvalue of a Laplacian will always be exactly 0: every row of L sums to zero, so L times the all-ones vector is the zero vector. (The first column of evecs is that vector scaled to unit length, with every entry equal to 1/√6 ≈ 0.4082483.) However, Scilab computes the eigenvalues numerically and is open to rounding error, so it reports a tiny nonzero value such as 7.078D-16 instead of 0. This is a good example of why numerical approximations can be misleading!
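For readers working in Python rather than Scilab, a rough NumPy counterpart of spec for a symmetric matrix is numpy.linalg.eigh, which returns the eigenvalues first (in ascending order, as a 1-D array) and the eigenvectors as columns. The sketch below is a translation, not part of the lab's Scilab workflow; it also checks the two facts from the note. L times the all-ones vector is exactly zero, while the computed smallest eigenvalue is zero only up to round-off, so it is compared against a tolerance (the value 1e-10 is an arbitrary choice).

```python
import numpy as np

# Laplacian of the example graph, as entered in Scilab above.
L = np.array([
    [ 1,  0,  0,  0, -1,  0],
    [ 0,  3, -1, -1, -1,  0],
    [ 0, -1,  3, -1,  0, -1],
    [ 0, -1, -1,  3,  0, -1],
    [-1, -1,  0,  0,  2,  0],
    [ 0,  0, -1, -1,  0,  2],
], dtype=float)

# eigh is for symmetric matrices: evals is sorted in ascending order and
# column k of evecs is a unit eigenvector for evals[k].
evals, evecs = np.linalg.eigh(L)

# Every row of L sums to zero, so L @ ones is exactly the zero vector.
ones = np.ones(6)
print(L @ ones)

# The computed smallest eigenvalue is tiny but need not be exactly 0,
# so count "zero" eigenvalues with a tolerance instead of ==.
tol = 1e-10
num_zero = int(np.sum(np.abs(evals) < tol))
print(evals)
print(num_zero)  # 1
```

The number of (numerically) zero eigenvalues of a Laplacian equals the number of connected components of the graph, which is why exactly one shows up for this connected example.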

The second smallest eigenvalue is λ ≈ 0.4384472. There may be a bit of rounding error here too, but the exact value does not matter. The corresponding eigenvector, the second column of evecs, is the Fiedler vector we want:

$$
\vec{f} = \begin{pmatrix} -0.7018088 \\ 0.0863966 \\ 0.3077061 \\ 0.3077061 \\ -0.3941027 \\ 0.3941027 \end{pmatrix}
$$
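Extracting the Fiedler vector programmatically takes one more step than reading it off the screen: an eigenvector is only determined up to a nonzero scalar multiple, so a solver may return −f instead of f. This NumPy sketch (an assumed Python stand-in for Scilab) takes column 1 of the eigenvector matrix, which pairs with the second smallest eigenvalue because numpy.linalg.eigh sorts eigenvalues in ascending order. It then flips the sign so the first entry is negative; that convention is arbitrary and chosen only to match the Scilab output.

```python
import numpy as np

L = np.array([
    [ 1,  0,  0,  0, -1,  0],
    [ 0,  3, -1, -1, -1,  0],
    [ 0, -1,  3, -1,  0, -1],
    [ 0, -1, -1,  3,  0, -1],
    [-1, -1,  0,  0,  2,  0],
    [ 0,  0, -1, -1,  0,  2],
], dtype=float)

evals, evecs = np.linalg.eigh(L)

# Second smallest eigenvalue and its eigenvector (0-based column 1).
lam = evals[1]
f = evecs[:, 1]

# Sign convention (arbitrary): make the first entry negative.
if f[0] > 0:
    f = -f

print(round(float(lam), 7))
print(np.round(f, 7))
```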

Clustering Data

The Fiedler vector we just found encodes a method for approximating the minimum “graph cut” needed to split the graph into two connected groups. The minimum cut is the one that requires breaking the fewest edges, so it will effectively split our vertices into two strongly related groups. Looking at the graph, you may be able to predict how this will turn out. The vertices corresponding to rows of the Fiedler vector with positive entries will form one cluster, and those corresponding to rows with negative entries will form the second cluster:

$$
\vec{f} = \begin{pmatrix} -0.7018088 \\ 0.0863966 \\ 0.3077061 \\ 0.3077061 \\ -0.3941027 \\ 0.3941027 \end{pmatrix}
\Rightarrow
\begin{pmatrix} - \\ + \\ + \\ + \\ - \\ + \end{pmatrix}
$$

So vertices 1 and 5 form one cluster, and vertices 2, 3, 4, and 6 form the other. The cut passes through only one edge, the one between vertices 2 and 5, to form these two connected components:

[Figure: the example graph split into the clusters {1, 5} and {2, 3, 4, 6}]
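The sign rule and the cut can both be checked in a few lines of NumPy (a Python sketch standing in for the Scilab steps). Putting an exact zero entry on the positive side is an assumption, though none occur here. Because the overall sign of the eigenvector is arbitrary, the two groups may come out in either order; only the split itself is meaningful.

```python
import numpy as np

# Example graph from the lab, vertices numbered 1 to 6.
edges = [(1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 6), (4, 6)]
n = 6

A = np.zeros((n, n))
for i, j in edges:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1
L = np.diag(A.sum(axis=1)) - A

# Column 1 of eigh's output pairs with the second smallest eigenvalue.
_, evecs = np.linalg.eigh(L)
f = evecs[:, 1]

# Split by sign (assumption: an exact 0 would go with the positive side).
negative = [v for v in range(1, n + 1) if f[v - 1] < 0]
positive = [v for v in range(1, n + 1) if f[v - 1] >= 0]

# An edge is cut when its endpoints land in different clusters.
cut = [(i, j) for i, j in edges if (i in positive) != (j in positive)]

print(negative, positive)
print(cut)  # [(2, 5)]
```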

4-Way Clustering

The eigenvectors of the Laplacian can also be used to lay out the graph on a Cartesian grid so that closely related vertices land near each other, in the same quadrant. We then “cut” the edges crossing each axis to form four connected clusters.

To do so, we’ll use the second and third eigenvectors: the Fiedler vector and the one after it, corresponding to the next eigenvalue up, λ = 2. In this case, the eigenvectors we’re interested in are

$$
\vec{x} = \begin{pmatrix} -0.7018088 \\ 0.0863966 \\ 0.3077061 \\ 0.3077061 \\ -0.3941027 \\ 0.3941027 \end{pmatrix}
\quad \text{and} \quad
\vec{y} = \begin{pmatrix} -0.5 \\ 0.5 \\ 0 \\ -1.260 \times 10^{-16} \\ 0.5 \\ -0.5 \end{pmatrix}
$$
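In NumPy (assumed here as a stand-in for Scilab), both vectors are simply columns 1 and 2 of the eigenvector matrix returned by numpy.linalg.eigh. Since each is only defined up to a factor of −1, the sketch flips signs, by an arbitrary convention, so the first entries match the values printed above.

```python
import numpy as np

L = np.array([
    [ 1,  0,  0,  0, -1,  0],
    [ 0,  3, -1, -1, -1,  0],
    [ 0, -1,  3, -1,  0, -1],
    [ 0, -1, -1,  3,  0, -1],
    [-1, -1,  0,  0,  2,  0],
    [ 0,  0, -1, -1,  0,  2],
], dtype=float)

evals, evecs = np.linalg.eigh(L)
x = evecs[:, 1]  # eigenvector for the second smallest eigenvalue
y = evecs[:, 2]  # eigenvector for the third smallest eigenvalue (2)

# Sign convention (arbitrary): make the first entries negative,
# matching the printed vectors.
if x[0] > 0:
    x = -x
if y[0] > 0:
    y = -y

print(evals[1:3])
print(np.round(x, 7))
print(np.round(y, 7))
```

Expect entries that are zero in exact arithmetic (here, the third and fourth entries of y) to come out as round-off-sized numbers rather than exact zeros.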

Now for each vertex i, plot a point at $(x_i, y_i)$, where $x_i$ and $y_i$ are the ith entries of $\vec{x}$ and $\vec{y}$. For instance, vertex 1 would be plotted at (−0.7018088, −0.5). Doing so for each vertex yields:

[Figure: each vertex plotted at (x_i, y_i), with the original edges drawn between them]

All of the original edges are there too, just “linearized” to lie along a single path.

Next, form clusters according to the quadrant each vertex falls in. It’s hard to see, but vertices 3 and 4 are plotted at the same position, (0.3077061, 0) up to rounding, which we count as quadrant I along with vertex 2. So the clusters would be {2, 3, 4}, {5}, {1}, and {6}. Returning to the graph to highlight the new clusters:

[Figure: the example graph with the four clusters {2, 3, 4}, {5}, {1}, and {6} highlighted]
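The quadrant rule can be written out as a small function. This Python sketch starts from the printed vectors x and y rather than recomputing them, and it makes two assumptions the lab leaves implicit: coordinates smaller than 1e-10 in absolute value are treated as round-off and set to 0, and a point lying on an axis is counted on the positive side, which places vertices 3 and 4 in quadrant I with vertex 2.

```python
import numpy as np

# The second and third eigenvectors, as printed in the lab.
x = np.array([-0.7018088, 0.0863966, 0.3077061, 0.3077061, -0.3941027, 0.3941027])
y = np.array([-0.5, 0.5, 0.0, -1.260e-16, 0.5, -0.5])

# Assumed tolerance: treat round-off-sized coordinates as exactly zero.
tol = 1e-10
x = np.where(np.abs(x) < tol, 0.0, x)
y = np.where(np.abs(y) < tol, 0.0, y)

def quadrant(px, py):
    # Assumed tie-break: a point on an axis counts as being on the
    # positive side of that axis.
    if px >= 0:
        return 1 if py >= 0 else 4
    return 2 if py >= 0 else 3

clusters = {}
for vertex, (px, py) in enumerate(zip(x, y), start=1):
    clusters.setdefault(quadrant(px, py), []).append(vertex)

print(dict(sorted(clusters.items())))  # {1: [2, 3, 4], 2: [5], 3: [1], 4: [6]}
```

Without the tolerance, vertex 4's coordinate of −1.260 × 10^−16 would push it into quadrant IV, splitting it from vertex 3 on the strength of rounding error alone.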

Collaboration Matrices

The matrix below represents the collaboration between faculty members, and can also be considered an adjacency matrix. Each row and column represents a faculty member, labeled 1 through 15. A 1 in the matrix indicates that the faculty members in that row and column have collaborated on a research project. A 0 indicates the faculty members have never worked together.

$$
A = \left(\begin{array}{ccccccccccccccc}
0&1&0&1&1&0&0&1&0&0&1&0&0&1&0 \\
1&0&0&1&0&0&1&1&0&0&0&1&0&0&1 \\
0&0&0&0&0&1&1&0&0&1&1&0&0&0&0 \\
1&1&0&0&1&0&0&0&1&0&0&1&0&0&1 \\
1&0&0&1&0&0&1&0&0&1&0&1&0&1&0 \\
0&0&1&0&0&0&0&0&1&0&0&0&1&0&1 \\
0&1&1&0&1&0&0&1&0&0&1&0&1&0&0 \\
1&1&0&0&0&0&1&0&0&1&0&0&0&1&0 \\
0&0&0&1&0&1&0&0&0&0&1&1&0&0&0 \\
0&0&1&0&1&0&0&1&0&0&0&0&1&0&1 \\
1&0&1&0&0&0&1&0&1&0&0&1&0&1&0 \\
0&1&0&1&1&0&0&0&1&0&1&0&0&0&1 \\
0&0&0&0&0&1&1&0&0&1&0&0&0&1&1 \\
1&0&0&0&1&0&0&1&0&0&1&0&1&0&0 \\
0&1&0&1&0&1&0&0&0&1&0&1&1&0&0
\end{array}\right)
$$
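Before clustering, it is worth checking that the matrix really is the adjacency matrix of a simple, undirected graph. A NumPy sketch (Python in place of Scilab), with the rows typed as strings of digits to make transcription errors easier to spot:

```python
import numpy as np

# Rows of the collaboration matrix, faculty 1 through 15.
rows = [
    "010110010010010",
    "100100110001001",
    "000001100110000",
    "110010001001001",
    "100100100101010",
    "001000001000101",
    "011010010010100",
    "110000100100010",
    "000101000011000",
    "001010010000101",
    "101000101001010",
    "010110001010001",
    "000001100100011",
    "100010010010100",
    "010101000101100",
]
A = np.array([[int(c) for c in r] for r in rows])

# Sanity checks for a simple, undirected graph.
assert A.shape == (15, 15)
assert (A == A.T).all()          # collaboration is mutual
assert (np.diag(A) == 0).all()   # no loops

# Number of collaborators for each faculty member (the degrees).
print(A.sum(axis=1))  # [6 6 4 6 6 4 6 5 4 5 6 6 5 5 6]
```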

The department is considering rearranging its offices so that faculty who work together are in the same area of the building. A look at the graph suggests this will be difficult, as there is no obvious “split” in faculty collaboration. Any attempt to divide the faculty will be imperfect, but we’re seeking the best possible outcome.

[Figure: the faculty collaboration graph on vertices 1 through 15]

Performing the two-way clustering analysis using Scilab gives us the following division: {1, 2, 4, 5, 7, 8, 11, 12, 14} and {3, 6, 9, 10, 13, 15}.

After removing the edges between clusters and dividing the graph:

[Figure: the collaboration graph before and after removing the edges between the two clusters]
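The two-way analysis for the collaboration graph follows the same steps as the 6-vertex example. This sketch redoes them in NumPy rather than Scilab (an assumed translation). Because the sign of an eigenvector is arbitrary, its two groups may come out in either order, and its output should be compared against the division listed above rather than taken as authoritative. It also counts how many collaborations the split separates.

```python
import numpy as np

rows = [
    "010110010010010",
    "100100110001001",
    "000001100110000",
    "110010001001001",
    "100100100101010",
    "001000001000101",
    "011010010010100",
    "110000100100010",
    "000101000011000",
    "001010010000101",
    "101000101001010",
    "010110001010001",
    "000001100100011",
    "100010010010100",
    "010101000101100",
]
A = np.array([[int(c) for c in r] for r in rows], dtype=float)
L = np.diag(A.sum(axis=1)) - A  # Laplacian L = D - A

evals, evecs = np.linalg.eigh(L)
f = evecs[:, 1]  # Fiedler vector (second smallest eigenvalue)

# Split faculty members (numbered 1 to 15) by the sign of their entry.
group_a = [i + 1 for i in range(15) if f[i] < 0]
group_b = [i + 1 for i in range(15) if f[i] >= 0]

# Count collaborations (edges) running between the two groups.
in_b = f >= 0
cut = int(A[np.ix_(in_b, ~in_b)].sum())

print(group_a)
print(group_b)
print(cut)
```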