
Lecture III: Functions in Network Science, Part 2

Michele Benzi
Department of Mathematics and Computer Science
Emory University

Atlanta, Georgia, USA

Summer School on Theory and Computation of Matrix Functions
Dobbiaco, 15–20 June, 2014

Outline

1 Diffusion on networks

2 Hub and authority rankings in digraphs

3 Numerical methods of network analysis

4 Numerical examples

5 Bibliography

Dynamics on networks: diffusion

The diffusion of contagious diseases or computer viruses, the adoption of new products, the spread of rumors, beliefs, and fads, consensus phenomena, etc., can all be modeled in terms of diffusion processes on graphs.

There exist a number of models to describe diffusion phenomena on networks, both deterministic and stochastic in nature.

Moreover, it is possible to investigate a number of important structural properties of networks by studying the behavior of solutions of evolution problems, such as diffusion or wave propagation, taking place on the network.

An increasingly important model in this direction is given by quantum graphs, discussed in Lecture VI.

Dynamics on networks: diffusion (cont.)

The simplest (semi-deterministic) model of diffusion on a graph is probably the initial value problem for the heat equation on G:

dx/dt = −Lx,   t > 0,   x(0) = x_0.

Here L = D − A denotes the graph Laplacian, x = x(t) is the “temperature” (or “concentration”) at time t, and x_0 the initial distribution; e.g., if x_0 = e_i, the “substance” that is diffusing on G is initially all concentrated on node v_i. As a rule, x_0 ≥ 0. The solution of the initial value problem is then given by

x(t) = e^{−tL} x_0,   t > 0.

Here we assume that G is connected.
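For concreteness, here is a minimal Matlab sketch of this solution formula; the path graph and all variable names are an arbitrary illustration of mine, not one of the lecture's examples:

N  = 10;
A  = diag(ones(N-1,1),1) + diag(ones(N-1,1),-1);   % adjacency of a path graph
L  = diag(sum(A,2)) - A;                           % graph Laplacian L = D - A
x0 = zeros(N,1);  x0(1) = 1;                       % all "heat" on node v_1
for t = [0.1 1 10 100]
    x = expm(-t*L)*x0;                             % x(t) = e^{-tL} x0
    fprintf('t = %6.1f   max|x(t) - 1/N| = %.2e\n', t, norm(x - 1/N, inf));
end

The printed quantity decays to zero as t grows, illustrating the convergence to the uniform distribution discussed next.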

Dynamics on networks: diffusion (cont.)

We note that the heat kernel {e^{−tL}}_{t≥0} forms a family (semigroup) of doubly stochastic matrices. This means that each matrix e^{−tL} is nonnegative and has all its row and column sums equal to 1.

The diffusion process on G is therefore a continuous time Markov process, which converges for t → ∞ to the uniform distribution, regardless of the (nonzero) initial state:

lim_{t→∞} x(t) = c·1,   c > 0,

where c is the average of the entries of the initial vector x_0.

Dynamics on networks: diffusion (cont.)

The rate at which the system reaches the steady-state (equilibrium) is governed by the smallest non-zero eigenvalue of L, which in turn is determined by the topology of the graph. In general, the small-world property leads to extremely fast diffusion (“mixing") in complex networks.

This can help explain the very rapid global spread of certain epidemics (like the flu) but also of fads, rumors, financial crises, etc.

Note that this rapid convergence to equilibrium is in contrast with the situation for diffusion on a regular grid obtained from the discretization of a (bounded) domain Ω ⊂ R^d (d = 1, 2, 3).

Dynamics on networks: diffusion (cont.)

More in detail, let

L = QΛQ^T = Σ_{k=1}^N λ_k q_k q_k^T

be the eigendecomposition of the Laplacian of the (connected) graph G. Recalling that the eigenvalues of L satisfy 0 = λ_1 < λ_2 ≤ ··· ≤ λ_N and that the (normalized) eigenvector associated with the simple zero eigenvalue is q_1 = (1/√N)·1, we obtain

e^{−tL} = (1/N)·11^T + Σ_{k=2}^N e^{−tλ_k} q_k q_k^T.

Therefore we have

e^{−tL} x_0 = (1/N)(1^T x_0)·1 + Σ_{k=2}^N e^{−tλ_k} (q_k^T x_0) q_k → c·1   for t → ∞,

where c = (1/N)(1^T x_0) is positive if the initial (nonzero) vector x_0 is nonnegative. It is clear that the rate of convergence to equilibrium (“mixing rate”) depends on the smallest nonzero eigenvalue λ_2 of L (also called the spectral gap of L).

Dynamics on networks: diffusion (cont.)

As an extreme example, consider the complete graph on N nodes, in which every node is connected to every other node.

The Laplacian of this graph has only two distinct eigenvalues, 0 and N, and equilibrium is reached extremely fast.

However, there also exist very sparse graphs that exhibit a sizable gap and therefore fast mixing rates. An important example is given by expander graphs. These graphs are highly sparse, yet they are highly connected and have numerous applications in the design of communication and infrastructure networks.

Another important property of such graphs is that they are very robust, meaning that many edges have to be deleted to break the network into two (large) disconnected subnetworks.

The problem of constructing sparse graphs with good expansion properties is an active area of research.

Dynamics on networks: the Schrödinger equation

Also of interest in Network Science is the Schrödinger equation on a graph:

i dψ/dt = Lψ,   t ∈ R,   ψ(0) = ψ_0,

where ψ_0 ∈ C^N is a prescribed initial state with ‖ψ_0‖_2 = 1. The solution is given explicitly by ψ(t) = e^{−itL} ψ_0 for all t ∈ R; note that since −itL is skew-Hermitian, the propagator U(t) = e^{−itL} is a unitary matrix, which guarantees that the solution has unit norm for all t:

‖ψ(t)‖_2 = ‖U(t)ψ_0‖_2 = ‖ψ_0‖_2 = 1,   ∀t ∈ R.

The squared amplitudes |[U(t)]_{ij}|^2 express the probability of finding the system in state e_i at time t, given that the initial state was e_j. The Schrödinger equation has recently been proposed as a tool to analyze properties of complex graphs in P. Suau, E. R. Hancock, & F. Escolano, Graph characteristics from the Schrödinger operator, LNCS 7877 (2013), pp. 172–181.
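A small sketch (same toy path graph as above, purely illustrative) confirming the unitarity of the propagator and the conservation of the norm:

N = 10;
A = diag(ones(N-1,1),1) + diag(ones(N-1,1),-1);
L = diag(sum(A,2)) - A;
psi0 = zeros(N,1);  psi0(1) = 1;                 % ||psi0||_2 = 1
U = expm(-1i*5*L);                               % propagator U(t) at t = 5
fprintf('||U''U - I|| = %.2e,  ||psi(5)||_2 = %.6f\n', ...
        norm(U'*U - eye(N)), norm(U*psi0));
P = abs(U).^2;   % P(i,j) = probability of state e_i at t = 5, starting from e_j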


Hubs and authorities in digraphs

Returning to the topic of centrality measures, suppose that G = (V, E) is a directed graph (digraph). In such graphs each node can play two different roles: that of hub, and that of authority. For example, we may be interested in analyzing citation patterns among authors in a given field, or prey–predator relations in a food web.

The Hyperlink-Induced Topic Search (HITS) algorithm (Kleinberg, 1999) provides a way of ranking hubs and authorities in a digraph. The original application area was WWW information retrieval. In a nutshell:

A hub is a webpage that points to many good authorities;

An authority is a webpage that is pointed to by many good hubs.

Every webpage i is given an authority score xi and a hub score yi, based on an iterative scheme.


Calculation of HITS scores

The HITS ranking scheme is an iterative method converging to a stationary solution.

Initially, each x_i and y_i is set equal to 1. Weights are updated in the following way:

x_i^{(k)} = Σ_{j:(j,i)∈E} y_j^{(k−1)}   and   y_i^{(k)} = Σ_{j:(i,j)∈E} x_j^{(k)}

for k = 1, 2, 3, ... Weights are normalized so that Σ_j (x_j^{(k)})^2 = 1 and Σ_j (y_j^{(k)})^2 = 1. Under mild conditions, the sequences of vectors {x^{(k)}} and {y^{(k)}} converge as k → ∞.


HITS as the power method

When written in terms of matrices, the HITS process becomes

x^{(k)} = A^T y^{(k−1)}   and   y^{(k)} = A x^{(k)}.

This can be rewritten as

x^{(k)} = A^T A x^{(k−1)}   and   y^{(k)} = A A^T y^{(k−1)}.

HITS is just the power method for computing the dominant eigenvectors of A^T A and A A^T.

A^T A is the authority matrix.

A A^T is the hub matrix.

Note that the Perron–Frobenius Theorem applies to both matrices.
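A minimal Matlab sketch of the iteration (the 4-node digraph is an arbitrary example of mine; normalization is in the 2-norm, as on the previous slide):

A = sparse([1 1 2 3 4 4],[2 3 3 2 2 3],1,4,4);   % edge (i,j): page i points to page j
x = ones(4,1);  y = ones(4,1);                   % initial authority/hub scores
for k = 1:50
    x = A'*y;  x = x/norm(x);                    % x^(k) = A^T y^(k-1)
    y = A*x;   y = y/norm(y);                    % y^(k) = A x^(k)
end
% On convergence, x and y are the dominant eigenvectors of A'*A and A*A'.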

HITS reformulation

Letting

𝒜 = [ 0    A
      A^T  0 ]

we obtain a symmetric matrix. Taking powers of this matrix, we get:

𝒜^{2k} = [ (AA^T)^k   0
           0          (A^T A)^k ]

𝒜^{2k+1} = [ 0                A(A^T A)^k
             (A^T A)^k A^T    0 ].

Additionally, 𝒜^T 𝒜 = 𝒜 𝒜^T = 𝒜^2.

HITS reformulation (cont.)

Let

u^{(k)} = [ y^{(k)}
            x^{(k)} ].

The HITS iteration becomes

u^{(k)} = 𝒜^2 u^{(k−1)} = [ AA^T   0
                            0      A^T A ] u^{(k−1)}.

The first N entries of u^{(k)} are the hub rankings of the nodes; the last N are the authority rankings. The HITS rankings use only the information contained in the dominant eigenvector of 𝒜. Hence, HITS is just eigenvector centrality for 𝒜! Can we do better than HITS?

Subgraph centrality for digraphs

Recall that the subgraph centrality of a node v_i in an undirected graph is

SC(i) = [e^A]_{ii},

and that the communicability between two nodes v_i, v_j is

C(i, j) = [e^A]_{ij}.

The notion of subgraph centrality applies primarily to undirected graphs, since it cannot distinguish between the two roles of hub and authority.

In analogy with HITS, we can calculate these quantities for the augmented symmetric matrix

𝒜 = [ 0    A
      A^T  0 ].

Subgraph centrality for digraphs (cont.)

As we saw in Lecture II, this matrix is the adjacency matrix of the bipartite graph 𝒢 = (𝒱, ℰ) associated with the original directed graph G.

The network behind 𝒜 can be thought of as follows:

Take the vertices of the original network and make two copies V and V′. Let 𝒱 = V ∪ V′.

Undirected edges exist between the two sets based on the following rule: ℰ = {(v_i, v_j′) | there is a directed edge from v_i to v_j in the original network}. This creates a graph with 2N nodes.

The nodes v_i ∈ V represent the original nodes in their role as hubs, while the nodes v_i′ ∈ V′ represent the original nodes in their role as authorities.

Subgraph centrality for digraphs (cont.)

Using the SVD of A, we can show that

e^𝒜 = [ cosh(√(AA^T))                       A (√(A^T A))^† sinh(√(A^T A))
        sinh(√(A^T A)) (√(A^T A))^† A^T     cosh(√(A^T A)) ].

Here M^† denotes the Moore–Penrose pseudoinverse of the matrix M.

The diagonal entries of the top left block, cosh(√(AA^T)), can be used to rank the nodes of G in their role as hubs; those of the bottom right block, cosh(√(A^T A)), can be used to rank the nodes in their role as authorities.

The interpretation of the off-diagonal entries is a bit more involved, but they too provide useful information on the network.
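For small digraphs these quantities can be computed directly from a dense SVD, as in the following sketch (the toy matrix and variable names are mine):

A = full(sparse([1 1 2 3 4 4],[2 3 3 2 2 3],1,4,4));  % small digraph
[U,S,V] = svd(A);
s = diag(S);                       % singular values of A
hub  = diag(U*diag(cosh(s))*U');   % [cosh(sqrt(A A^T))]_ii : hub scores
auth = diag(V*diag(cosh(s))*V');   % [cosh(sqrt(A^T A))]_ii : authority scores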

Interpretation

In a directed network, each node has two roles: being a hub and being an authority.

Nodes 1, 2, ..., n ∈ 𝒱 represent the original nodes in their role as hubs.

Nodes n+1, n+2, ..., 2n ∈ 𝒱 represent the original nodes in their role as authorities.

Alternating walks

An alternating walk of length k, starting with an out-edge, from v_1 to v_{k+1} is: v_1 → v_2 ← v_3 → ···

An alternating walk of length k, starting with an in-edge, from v_1 to v_{k+1} is: v_1 ← v_2 → v_3 ← ···

[AA^T A ···]_{ij} (k matrices multiplied) counts the number of alternating walks of length k, starting with an out-edge, from node i to node j.

[A^T A A^T ···]_{ij} (k matrices multiplied) counts the number of alternating walks of length k, starting with an in-edge, from node i to node j.

Interpretation of diagonal entries

[(AA^T)^k]_{ij} and [(A^T A)^k]_{ij} count the number of alternating walks of length 2k starting, respectively, with an out-edge and an in-edge.

If node i in the original network is a good hub, it will point to many good authorities. These authorities will be pointed at by many good hubs, etc.

Thus, if node i is a good hub, it will be included in the sets of hubs described above many times. This means that there should be many even-length alternating walks, starting with an out-edge, from node i to itself.

Interpretation of diagonal entries (cont.)

The number of such walks, scaling a walk of length 2k by 1/(2k)!, can be counted by the (i, i) entry of:

I + AA^T/2! + (AA^T)^2/4! + ··· + (AA^T)^k/(2k)! + ···

Letting A = UΣV^T be the SVD of A, this becomes:

U ( I + Σ^2/2! + Σ^4/4! + ··· + Σ^{2k}/(2k)! + ··· ) U^T = U cosh(Σ) U^T = cosh(√(AA^T)).

Interpretation of diagonal entries (cont.)

The hub centrality of node i (in the original network) is given by

[e^𝒜]_{(i,i)} = [cosh(√(AA^T))]_{(i,i)}.

It measures how well node i transmits information to the authoritative nodes in the network.

Using a similar analysis, it can be seen that the authority centrality of node i is given by

[e^𝒜]_{(i+n,i+n)} = [cosh(√(A^T A))]_{(i,i)}.

It measures how well node i receives information from the hubs in the network. These measures use information from all of the eigenvectors (singular vectors) of the matrix A.

The influence of the spectral gap

When the gap between the largest and second largest singular value of A is large enough, the resulting rankings tend to agree with those obtained by HITS.

However, if the gap is relatively small, the new rankings can be significantly different from, and arguably more meaningful than, those obtained with HITS, since more singular value/vector information is taken into account.

This method is an alternative to the Katz-like approach based on the row and column sums of f(A).

We illustrate this on three examples.

M. Benzi, E. Estrada, & C. Klymko, Ranking hubs and authorities using matrix functions, Linear Algebra Appl., 438 (2013), pp. 2447–2474.

Example: ranking of websites on specific topics

Next, we compare our approach with HITS.

The following data sets are small web graphs on various issues.

The data is provided by P. Tsaparas and can be found on his website

http://www.cs.toronto.edu/~tsap/experiments/www10-experiments/index.html

Abortion dataset

The abortion dataset contains n = 2293 nodes and m = 9644 directed edges. The maximum eigenvalue of 𝒜 is λ_N = 31.91022 and the second largest eigenvalue is λ_{N−1} = 26.03882.

[Figure: plot of the eigenvalues of the extended abortion matrix 𝒜.]

Hub ranking

Table: Top 10 hubs of the abortion data set, ranked using [e^𝒜]_{ii} and HITS.

[e^𝒜]_{ii} nodes   HITS nodes
48      48
1021    1006
1007    1007
1006    1021
1053    1053
1020    1020
987     960
990     968
985     969
989     970

Authority ranking

Table: Top 10 authorities of the abortion data set, ranked using [e^𝒜]_{ii} and HITS.

[e^𝒜]_{ii} nodes   HITS nodes
967     939
958     958
939     967
962     961
963     962
964     963
961     964
965     965
966     966
587     1582

Computational complexity dataset

The computational complexity dataset contains n = 884 nodes and m = 1616 directed edges. The maximum eigenvalue of 𝒜 is λ_N = 10.93347 and the second largest eigenvalue is λ_{N−1} = 9.85677.

[Figure: plot of the eigenvalues of the extended computational complexity matrix 𝒜.]

Hub ranking

Table: Top 10 hubs of the computational complexity data set, ranked using [e^𝒜]_{ii} and HITS.

[e^𝒜]_{ii} nodes   HITS nodes
57      57
17      634
644     644
643     721
634     643
106     554
119     632
529     801
86      640
162     639

Authority ranking

Table: Top 10 authorities of the computational complexity data set, ranked using [e^𝒜]_{ii} and HITS.

[e^𝒜]_{ii} nodes   HITS nodes
1       719
315     717
673     727
148     723
719     808
717     735
2       737
45      1
727     722
534     770

Death penalty dataset

The death penalty dataset contains n = 1850 nodes and m = 7363 directed edges. The maximum eigenvalue of 𝒜 is λ_N = 28.0250 and the second largest eigenvalue is λ_{N−1} = 17.68323.

[Figure: plot of the eigenvalues of the extended death penalty matrix 𝒜.]

Hub ranking

Table: Top 10 hubs of the death penalty data set, ranked using [e^𝒜]_{ii} and HITS.

[e^𝒜]_{ii} nodes   HITS nodes
210     210
637     637
413     413
1586    1586
552     552
462     462
930     930
542     542
618     618
1275    1275

Authority ranking

Table: Top 10 authorities of the death penalty data set, ranked using [e^𝒜]_{ii} and HITS.

[e^𝒜]_{ii} nodes   HITS nodes
4       4
1       1
6       6
7       7
10      10
16      16
2       2
3       3
44      44
27      27


Computational tasks

We saw that many problems in network analysis lead to solving sparse linear systems (Katz), computing eigenvectors or singular vectors (eigenvector centrality, PageRank, HITS), and evaluating matrix functions f(A) (or f(L)), or their action on a prescribed vector.

Typically, the diagonal entries of f(A) are wanted, and for large graphs some global averages over subsets of entries of f(A), often expressed in terms of traces or row/column sums.

High accuracy is not always required. For example, in the case of centrality rankings it is often enough to be able to identify just the top-ranked nodes. In many cases, bounds are sufficient.

Remark: When only row and/or column sums of f(A) are required, there is no need to explicitly compute any of the entries of f(A).

Computational tasks (cont.)

Many of these computations can be formulated as the evaluation of expressions of the form u^T f(A) v for suitably chosen vectors u, v ∈ R^N and for suitable functions f(x):

Subgraph centrality: SC(i) = e_i^T exp(A) e_i

Communicability: C(i, j) = e_i^T exp(A) e_j

Average communicability: ⟨C(i)⟩ = (1/(N−1)) [1^T exp(A) e_i − e_i^T exp(A) e_i]

Hub scores: SH(i) = e_i^T cosh(√(AA^T)) e_i

Authority scores: SA(i) = e_i^T cosh(√(A^T A)) e_i

Total network communicability: TC(G) = (1/N) 1^T exp(A) 1

Analogous resolvent-based, Katz-like quantities

Question: How do we compute these quantities efficiently?
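For a small graph, all of these bilinear forms can be checked directly with dense expm; a toy sketch (the test matrix is mine; large graphs require the methods discussed below):

A = full(sparse([1 2 3 3],[2 3 1 4],1,4,4));  A = A + A';   % small undirected graph
N = size(A,1);  E = expm(A);  one = ones(N,1);
SC  = diag(E);               % subgraph centralities e_i' exp(A) e_i
C13 = E(1,3);                % communicability C(1,3)
TC  = (one'*E*one)/N;        % total network communicability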

Golub & Meurant to the rescue!

[Photo: Gene Golub and Gérard Meurant (Oxford, July 2007).]

Quadrature-based bounds

Explicit computation of f(A) is prohibitive for large graphs. Also, many real-world networks are “noisy”: for example, edges in biological networks include many “false positives”. Gaussian quadrature rules can be used to obtain bounds on the entries of certain functions of symmetric matrices: Golub & Meurant (1994, 2010), Benzi & Golub (1999). Upper and lower bounds are available for the bilinear form

h(u, v) = ⟨f(A)u, v⟩ = u^T f(A) v

with

u, v ∈ R^N (taking u = e_i, v = e_j yields [f(A)]_{ij});

A ∈ R^{N×N} symmetric;

f(x) strictly completely monotonic on an interval containing the spectrum of A.

Quadrature-based bounds (cont.)

Definition. A real function f(x) is strictly completely monotonic (s.c.m.) on an interval I ⊂ R if f^{(2j)}(x) > 0 and f^{(2j+1)}(x) < 0 on I for all j ≥ 0.

Examples: f(x) = 1/x is s.c.m. on (0, ∞); f(x) = e^x is not s.c.m.; f(x) = e^{−x} is s.c.m. on R.

In particular, we can compute:

bounds on e^A = e^{−(−A)};

bounds on B^{−1} if B = I − αA is positive definite (that is, if 0 < α < 1/λ_max(A)).

Quadrature-based bounds (cont.)

Consider the spectral decompositions

A = QΛQ^T,   f(A) = Q f(Λ) Q^T.

For u, v ∈ R^N we have

u^T f(A) v = u^T Q f(Λ) Q^T v = p^T f(Λ) q = Σ_{i=1}^N f(λ_i) p_i q_i,

where p = Q^T u and q = Q^T v. Rewrite this as a Riemann–Stieltjes integral:

u^T f(A) v = ∫_a^b f(λ) dμ(λ),   where
μ(λ) = 0                  if λ < a = λ_1,
μ(λ) = Σ_{j=1}^i p_j q_j   if λ_i ≤ λ < λ_{i+1},
μ(λ) = Σ_{j=1}^N p_j q_j   if b = λ_N ≤ λ.

Caveat: Note the numbering of the eigenvalues of A here.

Gauss quadrature

The general Gauss-type quadrature rule is

∫_a^b f(λ) dμ(λ) = Σ_{j=1}^n w_j f(t_j) + Σ_{k=1}^M v_k f(z_k) + R[f],

where the nodes {z_k} are prescribed:

Gauss: M = 0;

Gauss–Radau: M = 1, z_1 = a or z_1 = b;

Gauss–Lobatto: M = 2, z_1 = a and z_2 = b.

The evaluation of these quadrature rules reduces to the computation of orthogonal polynomials via a three-term recurrence or, equivalently, to the computation of entries and spectral information of the corresponding tridiagonal (Jacobi) matrix (Lanczos).

Gauss quadrature (cont.)

For instance, we have for the Gauss rule:

∫_a^b f(λ) dμ(λ) = Σ_{j=1}^n w_j f(t_j) + R[f]

with

R[f] = [f^{(2n+M)}(η) / (2n+M)!] ∫_a^b [ Π_{j=1}^n (λ − t_j) ]^2 dμ(λ),

for some a < η < b.

Theorem (Golub–Meurant)

Σ_{j=1}^n w_j f(t_j) = e_1^T f(J_n) e_1 = [f(J_n)]_{11}.

Gauss quadrature (cont.)

The tridiagonal matrix J_n corresponds to the three-term recurrence relationship satisfied by the set of polynomials orthonormal with respect to dμ:

J_n = [ ω_1   γ_1
        γ_1   ω_2   γ_2
              ⋱     ⋱     ⋱
                γ_{n−2}   ω_{n−1}   γ_{n−1}
                          γ_{n−1}   ω_n ]

The eigenvalues of J_n are the Gauss nodes, whereas the Gauss weights are given by the squares of the first entries of the normalized eigenvectors of J_n.

The quadrature rule is computed with the Golub–Welsch QR algorithm. Alternatively, the (1,1) entry of f(J_n) can be computed via explicit diagonalization of J_n.

Gauss quadrature (cont.)

Consider the case u = v = e_i (corresponding to the (i, i) entry of f(A)).

The entries of J_n are computed using the symmetric Lanczos algorithm:

γ_j x_j = r_j = (A − ω_j I) x_{j−1} − γ_{j−1} x_{j−2},   j = 1, 2, ...,
ω_j = x_{j−1}^T A x_{j−1},
γ_j = ‖r_j‖_2,

with initial vectors x_{−1} = 0 and x_0 = e_i.

Two approaches for computing bounds:

Explicit, a priori bounds are obtained by taking a single Lanczos step.

Alternatively, one may explicitly carry out a certain number of Lanczos iterations (MMQ Matlab toolbox by G. Meurant). Each additional Lanczos step amounts to adding another node to the Gauss-type quadrature rule, resulting in tighter and tighter bounds.
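The following sketch implements the second approach in its plainest form: n Lanczos steps started from e_i, followed by evaluation of [f(J_n)]_{11} with f = exp (no reorthogonalization, so it is only meant for small n; this mimics, in a simplified way, what the MMQ codes do):

function est = gauss_expA_ii(A, i, n)
% Gauss-rule estimate of [e^A]_ii from n symmetric Lanczos steps.
N = size(A,1);  x = zeros(N,1);  x(i) = 1;       % x_0 = e_i
xold = zeros(N,1);  gold = 0;
om = zeros(n,1);  ga = zeros(max(n-1,1),1);
for j = 1:n
    om(j) = x'*(A*x);                            % omega_j
    r = A*x - om(j)*x - gold*xold;
    if j < n
        ga(j) = norm(r);  xold = x;  x = r/ga(j);  gold = ga(j);
    end
end
Jn  = diag(om) + diag(ga(1:n-1),1) + diag(ga(1:n-1),-1);   % Jacobi matrix
fJ  = expm(Jn);
est = fJ(1,1);                                   % Gauss rule = [f(J_n)]_{11}
end

For example, gauss_expA_ii(A, 1, 5) typically agrees with the corresponding entry of expm(A) to several digits after 4–5 steps on small test graphs.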

Gauss quadrature (cont.)

For f(A) = e^A and f(A) = (I − αA)^{−1} we obtain:

bounds on [f(A)]_{ii} from symmetric Lanczos;

bounds on [f(A)]_{ij} using the identity

e_i^T f(A) e_j = (1/4)[(e_i + e_j)^T f(A)(e_i + e_j) − (e_i − e_j)^T f(A)(e_i − e_j)],

or bounds on [f(A)]_{ii} + [f(A)]_{ij} using nonsymmetric Lanczos;

lower bounds from the Gauss and the Gauss–Radau rules;

upper bounds from the Gauss–Radau and Gauss–Lobatto rules.

In computations one can use the simple (Geršgorin) estimates

λ_min ≈ −max_{1≤i≤N} deg(i),   λ_max ≈ max_{1≤i≤N} deg(i),

where deg(i) is the degree of node v_i. Using more accurate estimates of the extreme eigenvalues of A generally leads to improved results, especially for scale-free graphs.

Explicit bounds for SC(i) and EE(G)

Carrying out explicitly (“by hand”) a single Lanczos step, we obtain the following bounds for subgraph centrality.

Gauss:

[e^A]_{ii} ≥ (e^{T_i/d_i} / √(4T_i^2 + 4d_i^3)) [ √(4T_i^2 + 4d_i^3) cosh( √(4T_i^2 + 4d_i^3) / (2d_i) ) − 2T_i sinh( √(4T_i^2 + 4d_i^3) / (2d_i) ) ]

Gauss–Radau:

(b^2 e^{d_i/b} + d_i e^{−b}) / (b^2 + d_i) ≤ [e^A]_{ii} ≤ (a^2 e^{d_i/a} + d_i e^{−a}) / (a^2 + d_i)

Gauss–Lobatto:

[e^A]_{ii} ≤ (a e^{−b} − b e^{−a}) / (a − b).

Bounds on the Estrada index are easily computed by summing over i. In these formulas, T_i denotes the number of triangles (closed walks of length three) in G with a vertex at node i.
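A sketch evaluating the Gauss–Radau pair above for one node (the small test matrix is mine; a and b are the Geršgorin estimates from the previous slide):

A  = full(sparse([1 2 3 3],[2 3 1 4],1,4,4));  A = A + A';  % small test graph
d  = sum(A,2);  i = 1;  di = d(i);
b  = max(d);  a = -max(d);                 % Gersgorin estimates of the spectrum
lo = (b^2*exp(di/b) + di*exp(-b)) / (b^2 + di);
up = (a^2*exp(di/a) + di*exp(-a)) / (a^2 + di);
% lo <= [expm(A)](i,i) <= up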

Explicit bounds for communicability

The Gauss–Radau rule gives the following bounds:

(b^2 e^{T_{ij}/b} + T_{ij} e^{−b}) / (b^2 + T_{ij}) − (a^2 e^{d_i/a} + d_i e^{−a}) / (a^2 + d_i) ≤ [e^A]_{ij}

[e^A]_{ij} ≤ (a^2 e^{T_{ij}/a} + T_{ij} e^{−a}) / (a^2 + T_{ij}) − (b^2 e^{d_i/b} + d_i e^{−b}) / (b^2 + d_i),

where T_{ij} := Σ_{k≠i} A_{ki}(A_{ki} + A_{kj}) − (A_{ij})^2 ≥ 0.

Explicit bounds for resolvent centrality

Let B = I − A/(N − 1). Note that B is a symmetric M-matrix. We have:

(Gauss)   [ Σ_{k≠i} Σ_{ℓ≠i} B_{ki} B_{kℓ} B_{ℓi} ] / [ Σ_{k≠i} Σ_{ℓ≠i} B_{ki} B_{kℓ} B_{ℓi} − d_i^2/(N−1)^4 ] ≤ [B^{−1}]_{ii},

(Radau)   [1 − b + d_i/(b(N−1)^2)] / [1 − b + d_i/(N−1)^2] ≤ [B^{−1}]_{ii} ≤ [1 − a + d_i/(a(N−1)^2)] / [1 − a + d_i/(N−1)^2],

(Lobatto)   [B^{−1}]_{ii} ≤ (a + b − 1)/(ab),

and

1 ≤ [B^{−1}]_{ii} ≤ 2 (N^2 − 3N + 2)/(N^2 − 3N + 3).

M. Benzi & P. Boito, Quadrature rule-based bounds for functions of adjacency matrices, Linear Algebra Appl., 433 (2010), pp. 637–652.

MMQ (iteratively computed) bounds

This refers to the (increasingly tight) bounds obtained by adding quadrature nodes (= performing additional Lanczos steps) to estimate entries of f(A).

Computations are performed using G. Meurant's MMQ package, suitably adapted to handle sparse matrices.

Fast and accurate computation of upper and lower bounds for centrality and communicability.

Complexity is at most O(N) per iteration (much less for the first 2–3 iterations).

For graphs with bounded maximum degree, the number of iterations does not depend on the graph size N (see below).

Parallelization is possible on shared memory machines.

Low-rank approximations

A major drawback of subgraph centrality (and communicability) in the case of large networks is cost.

The approximate O(N^2) scaling needed to evaluate the subgraph centrality of each node for a network with N nodes is unacceptably high for networks with millions of nodes (or more).

The cost can be reduced by working with low-rank approximations of A. Estimating a small number k ≪ N of the largest eigenpairs (λ_i, x_i) and using the approximation

e^A ≈ Σ_{i=1}^k e^{λ_i} x_i x_i^T

can lead to significant savings. The accuracy is typically good due to rapid decay in the eigenvalues of A for many networks.
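A sketch of this low-rank estimate using eigs (the grid-based test graph, the value of k, and the eigs option name reflect current Matlab and are choices of mine):

A = delsq(numgrid('S',12));  A = spones(A) - speye(size(A,1));  % a sparse graph
k = 10;
[X,D]  = eigs(A, k, 'largestreal');           % k largest eigenpairs
sc_low = sum(X.^2 .* exp(diag(D))', 2);       % sum_i e^{lambda_i} x_i(j)^2
sc_ref = diag(expm(full(A)));                 % dense reference (small N only)
fprintf('max relative error: %.2e\n', max(abs(sc_low - sc_ref)./sc_ref));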

Low-rank approximations (cont.)

Additional savings can be obtained if one only wants to identify the top k nodes, where k ≪ N. Similarly, in the case of digraphs, A can be approximated in terms of a few dominant singular triplets. Other improvements can be obtained when computing communicabilities by using the block Lanczos algorithm. For details, we refer to the following works:

C. Fenu, D. Martin, L. Reichel, & G. Rodriguez, Network analysis via partial spectral factorization and Gauss quadrature, SIAM J. Sci. Comput., 35 (2013), pp. A2046–A2068.

C. Fenu, D. Martin, L. Reichel, & G. Rodriguez, Block Gauss and anti-Gauss quadrature with application to networks, SIAM J. Matrix Anal. Appl., 34 (2013), pp. 1655–1684.

J. Baglama, C. Fenu, L. Reichel, & G. Rodriguez, Analysis of directed networks via partial singular value decomposition and Gauss quadrature, Linear Algebra Appl., 456 (2014), pp. 93–121.

Computing row and column sums

Recall that the computation of row/column sums of matrix functions amounts to computing the action of f(A) or f(A^T) = f(A)^T on the vector 1 of all ones.

For instance, computing the total communicability of a graph with adjacency matrix A,

TC(G) = (1/N) 1^T e^A 1,

amounts to taking the arithmetic mean of the entries of the vector e^A 1.

This can be accomplished very quickly using Krylov subspace methods for evaluating the action of the matrix exponential on a vector.

A very efficient Matlab toolbox has been developed by Stefan Güttel: http://www.mathe.tu-freiberg.de/∼guettels/funm_kryl

Krylov subspace methods

Krylov subspace methods are the algorithms of choice for solving many linear algebra problems, including:

Large-scale linear systems of equations Ax = b

Large-scale (generalized) eigenvalue problems Ax = λBx

Computing f(A)v, where A is a large matrix, v is a vector, and f a given function (e.g., f(A) = e^{−tA}, t > 0)

Solving matrix equations (e.g., AX + XA^* + B = 0)

An attractive feature of Krylov methods in some applications is that the matrix A is not needed explicitly: only the ability to multiply A times a given vector is required (action, or operator principle).

Krylov subspace methods (cont.)

The main idea behind Krylov methods is the following:

A nested sequence of suitable low-dimensional subspaces (the Krylov subspaces) is generated;

The original problem is projected onto these subspaces;

The (small) projected problems are solved “exactly”;

The approximate solution is expanded back to the original N-dimensional space, once sufficient accuracy has been attained.

Krylov subspace methods are examples of polynomial approximation methods, in which f(A)v is approximated by p(A)v, where p is a (low-degree) polynomial. Since every matrix function f(A) is a polynomial in A, this is appropriate.

Krylov subspace methods (cont.)

The kth Krylov subspace of A ∈ C^{N×N} and a nonzero vector v ∈ C^N is defined by

K_k(A, v) = span{v, Av, ..., A^{k−1}v},

and it can be written as

K_k(A, v) = {q(A)v | q is a polynomial of degree ≤ k − 1}.

Obviously,

K_1(A, v) ⊂ K_2(A, v) ⊂ ··· ⊂ K_d(A, v) = ··· = K_N(A, v).

Here d is the degree of the minimum polynomial of A with respect to v.

Krylov subspace methods (cont.)

As is well known, computing projections onto a subspace is greatly facilitated if an orthonormal basis for the subspace is known. This is also desirable for numerical stability reasons.

An orthonormal basis for a Krylov subspace can be efficiently constructed using the Arnoldi process; in the Hermitian case, this is known as the Lanczos process (Arnoldi, 1951; Lanczos, 1952). Both of these are efficient implementations of the classical Gram–Schmidt process.

In Arnoldi’s method, the projected matrix has upper Hessenberg structure, which can be exploited in the computation. In the Hermitian case it is tridiagonal.

Arnoldi's process

For arbitrary A ∈ C^{N×N} and v ∈ C^N, v ≠ 0, the Arnoldi process is:

Set q_1 = v/‖v‖_2;
For j = 1, ..., m do:
  – h_{i,j} = ⟨Aq_j, q_i⟩ for i = 1, 2, ..., j
  – u_j = Aq_j − Σ_{i=1}^j h_{i,j} q_i
  – h_{j+1,j} = ‖u_j‖_2
  – If h_{j+1,j} = 0 then STOP;
  – q_{j+1} = u_j/‖u_j‖_2

Remarks: (i) If the algorithm does not stop before the mth step, the Arnoldi vectors {q_1, ..., q_m} form an orthonormal basis (ONB) for the Krylov subspace K_m(A, v). (ii) At each step j, the Arnoldi process requires one matrix–vector product, j + 1 inner products, and j linked triads. (iii) At each step the algorithm computes Aq_j and then orthonormalizes it against all previously computed q_i's.

Arnoldi's process (cont.)

Define Q_m = [q_1, ..., q_m] ∈ C^{N×m}. Introducing the (m+1) × m matrix Ĥ_m = [h_{ij}] and the m × m upper Hessenberg matrix H_m obtained by deleting the last row of Ĥ_m, the following Arnoldi relations hold:

A Q_m = Q_m H_m + u_m e_m^T = Q_{m+1} Ĥ_m,

Q_m^* A Q_m = H_m.

Hence, the m × m matrix H_m is precisely the projected matrix Q_m^* A Q_m. If A = A^*, then H_m = H_m^* = T_m is a tridiagonal matrix, and the Arnoldi process reduces to the Lanczos process, which is much cheaper in terms of both operations and storage.

Indeed, the Lanczos process consists of a three-term recurrence, with constant operation count and storage costs per step, whereas the Arnoldi process has increasing costs for increasing j.

Krylov subspace approximation to f(A)v

To approximate f(A)v we take q_1 = v/‖v‖_2 and for an appropriate k we compute

f_k := ‖v‖_2 Q_k f(H_k) e_1 = Q_k f(H_k) Q_k^* v.

The vector f_k is the kth approximation to f(A)v. Typically, k ≪ N and computing f(H_k) is inexpensive and can be carried out in a number of ways. For instance, when H_k = H_k^* = T_k, it can be computed via explicit diagonalization of T_k.

Deciding when this approximation f_k is sufficiently close to f(A)v to stop the process is a non-trivial problem, and an area of active research.

For the case of the matrix exponential, Saad (1992) suggested the error estimate

‖e^A v − Q_k e^{H_k} Q_k^* v‖_2 ≈ ‖v‖_2 |e_k^T e^{H_k} e_1|.

Note that e_k^T e^{H_k} e_1 is the (k, 1) entry of e^{H_k}.
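A compact sketch of the whole procedure for f = exp (plain Gram–Schmidt Arnoldi, no restarting or breakdown handling; the function name is mine):

function fk = expAv_krylov(A, v, k)
% k-step Arnoldi approximation of e^A v, with Saad's error estimate.
N = size(A,1);  Q = zeros(N,k+1);  H = zeros(k+1,k);
Q(:,1) = v/norm(v);
for j = 1:k
    w = A*Q(:,j);
    for i = 1:j
        H(i,j) = Q(:,i)'*w;  w = w - H(i,j)*Q(:,i);   % orthogonalization
    end
    H(j+1,j) = norm(w);
    if H(j+1,j) == 0, break, end                      % invariant subspace found
    Q(:,j+1) = w/H(j+1,j);
end
E  = expm(H(1:k,1:k));
fk = norm(v)*Q(:,1:k)*E(:,1);                  % f_k = ||v||_2 Q_k e^{H_k} e_1
fprintf('Saad estimate: %.2e\n', norm(v)*abs(E(k,1)));
end

For example, mean(expAv_krylov(A, ones(N,1), 20)) approximates the total network communicability TC(G) = (1/N) 1^T e^A 1.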

Summary of results

We begin with subgraph centrality and communicability calculations. Numerical experiments were performed on a number of networks, both synthetic (generated with the CONTEST Toolbox for Matlab, Taylor & Higham 2010) and from real-world applications, including biological, social, collaboration, and transportation networks.

A few (3–5) Lanczos steps per estimate are usually enough.

Independent of N for graphs with bounded maximum degree.

Almost as good for power-law degree distributions.

Very efficient approach for computing global averages (like the total communicability).

M. Benzi and P. Boito, Quadrature rule-based bounds for functions of adjacency matrices, Linear Algebra Appl., 433 (2010), pp. 637–652.

M. Benzi and C. Klymko, Total communicability as a centrality measure, J. Complex Networks, 1(2) (2013), pp. 124–149.

Numerical examples

Experiments are performed on "random" matrices, "small world" matrices and "range-dependent" matrices, generated using the CONTEST toolbox for Matlab ([Taylor & Higham 2010]). We test a priori and a posteriori bounds and compare them to previous results.

Small-world networks (Watts–Strogatz model)

A = smallw(N,k,p)

N positive integer (number of nodes)
k positive integer (number of connected neighbors)
0 ≤ p ≤ 1 (probability of a shortcut)

The Watts–Strogatz small-world model interpolates between a regular lattice and a random graph.

Begin with a ring... and add random shortcuts.

Small-world networks (Watts–Strogatz model)

Here is the corresponding adjacency matrix (N = 15, k = 1, p = 0.4):

[Figure: spy plot of the adjacency matrix; nz = 40.]

Range-dependent networks

A = renga(N,α,λ)

N positive integer (number of nodes)
α > 0
0 ≤ λ ≤ 1

Choose an ordering of the nodes (i = 1, 2, ..., N). Nodes i and j have a probability α·λ^{|i−j|−1} of being connected. This is a good model for protein–protein interaction networks.

Range-dependent networks

Sparsity pattern for N = 100, α = 1 and λ = 0.85:

[Figure: spy plot; nz = 1256.]

Erdös–Rényi (random) networks

A = erdrey(N,m)

N positive integer (number of nodes)
m positive integer (number of edges)

An (N, m)-Erdös-Rényi graph is a graph chosen uniformly at random in the set of all (suitable) graphs with N nodes and m edges.
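Assuming the CONTEST toolbox is on the Matlab path, the three generators just described are called as follows (parameter values here are arbitrary):

A1 = smallw(100, 2, 0.1);   % Watts-Strogatz small world
A2 = renga(100, 1, 0.85);   % range-dependent
A3 = erdrey(100, 300);      % Erdos-Renyi with 300 edges
spy(A2)                     % sparsity patterns as in the figures shown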

Erdös–Rényi (random) networks

Sparsity pattern for N = 100, m = 300:

[Figure: spy plot; nz = 600.]

A priori bounds for EE(G)

More bounds for EE(G) are found in the literature (de la Peña et al., LAA (2007)):

√(N^2 + 4m) ≤ EE(G) ≤ N − 1 + e^{√(2m)}.

We compare bounds on EE(G) for a 100 × 100 range-dependent test matrix:

Lower bounds:   de la Peña 1.1134·10^3   Radau 1.8931·10^3   Gauss 1.8457·10^4   (EE(G) = 5.4802·10^5)

Upper bounds:   (EE(G) = 5.4802·10^5)   Radau 2.3373·10^8   Lobatto 1.3379·10^7   de la Peña 1.0761·10^15

A priori bounds for EE(G)

Logarithmic plot of bounds on the Estrada index for small world matrices of increasing size.

[Figure: semilogarithmic plot of the Estrada index together with the Radau (lower/upper) and de la Peña (lower/upper) bounds, for N = 40, ..., 220.]

MMQ bounds for EE(G) and communicability

A is a 100 × 100 range-dependent matrix. MMQ approximations of the Estrada index EE(G) = 425.0661:

# it            1          2         3         4         5
Gauss           348.9706   416.3091  424.4671  425.0413  425.0655
Radau (lower)   378.9460   420.6532  424.8102  425.0570  425.0659
Radau (upper)   652.8555   437.7018  425.6054  425.0828  425.0664
Lobatto         2117.9233  531.1509  430.3970  425.2707  425.0718

MMQ approximations of [e^A]_{1,5} = 0.396425:

# it            1          2         3         4         5
Radau (lower)   −2.37728   0.213316  0.388791  0.396141  0.396420
Radau (upper)   4.35461    0.595155  0.404905  0.396626  0.396431

MMQ bounds w.r.t. matrix size

Relative error for the Estrada index of small world matrices of parameters (4, 0.1); averaged over 10 matrices; 5 iterations.

N     error (upper bound)   error (lower bound)
50    4.87e−5   4.35e−5
100   5.05e−5   4.09e−5
150   5.31e−5   3.98e−5
200   5.05e−5   3.57e−5
250   5.57e−5   3.84e−5
300   5.63e−5   3.73e−5

MMQ bounds w.r.t. matrix size (cont.)

Bounds for the Estrada index of small world matrices of increasing size. Watts–Strogatz model with parameters (4, 0.1); five Lanczos iterations.

N      Gauss    Gauss–Radau (l)   Gauss–Radau (u)   Gauss–Lobatto
1000   1.64e5   1.63e5   1.64e5   1.66e5
2500   4.09e5   4.10e5   4.11e5   4.12e5
5000   8.18e5   8.18e5   8.20e5   8.33e5
7500   1.23e6   1.22e6   1.22e6   1.25e6
10000  1.63e6   1.63e6   1.64e6   1.66e6

Five iterations suffice to reach a small relative error, independent of N.

MMQ bounds for resolvent centrality

Relative errors for MMQ bounds on the resolvent centrality of node 10 for Erdös–Rényi matrices associated with graphs with N vertices and 4N edges; averaged on 10 matrices; 2 iterations.

N     Gauss      Radau (lower)   Radau (upper)   Lobatto
100   3.00e−9    1.29e−11   1.84e−9    2.70e−9
200   3.02e−11   4.65e−14   6.99e−12   2.79e−11
300   2.51e−12   8.35e−15   1.21e−12   2.38e−12
400   3.15e−13   5.55e−16   1.67e−14   2.85e−13

In this case the error decreases because I − A/(N − 1) approaches the identity matrix I as N → ∞.

Asymptotic independence of convergence rate on N

Theorem

Let {AN } be the adjacency matrices associated with a sequence of graphs {GN } of size N → ∞. Assume that the degree di of any node in GN satisfies di ≤ D for all N, with D constant. Let f be an analytic function defined on a region containing the interval [−D,D]. Then, for any given ε > 0, the number of Lanczos steps (or, equivalently, of quadrature nodes) needed to approximate any entry [f(A)]ij with an error < ε is bounded independently of N.

The assumption that d_i ≤ D for all N may be restrictive for certain networks (e.g., Twitter). More generally, we have proved that the number of Lanczos steps grows at worst like O(D_N), where D_N is the maximum degree of any node in G_N. The result is a consequence of the fact that the Chebyshev expansion coefficients of an analytic function decay at least exponentially fast.

Some timing results in Matlab

[Figure: timings (in seconds) for computing the diagonal of the exponential of small-world matrices of increasing order N. Blue: time for the Matlab expm function. Black: time using eig. Red: using 5 iterations of the Lanczos algorithm for each node.]

Some timing results in Matlab (cont.)

[Figure: time (in seconds) for computing the diagonal entries of the exponential of small-world matrices of size N = 1000, 5000, 10000, using 5 iterations of the Lanczos algorithm for each node.]

Results for scale-free networks

Experiments were performed on scale-free networks (Barabási–Albert model) of increasing size, to see the effect of increasing maximum degree.

In this case, the Geršgorin estimate λ_max ≈ d_max gives poor results. Indeed, whereas the maximum degree increases with N, the spread of the eigenvalues tends to grow much more slowly (roughly, λ_max = O(√d_max) for N → ∞).

A better strategy is to estimate λ_min, λ_max using Lanczos. The number of Lanczos steps per estimate (either diagonal or off-diagonal entries) increases very slowly with N, for a fixed accuracy. Typically, 9–11 Lanczos steps suffice to achieve relative errors around 10^{−6}.

Results for real world networks

Network               N      NNZ     λ_1     λ_2
Zachary Karate Club   34     156     6.726   4.977
Drug Users            616    4024    18.010  14.234
Yeast PPI             2224   13218   19.486  16.134
Pajek/Erdos971        472    2628    16.710  10.199
Pajek/Erdos972        5488   14170   14.448  11.886
Pajek/Erdos982        5822   14750   14.819  12.005
Pajek/Erdos992        6100   15030   15.131  12.092
SNAP/ca-GrQc          5242   28980   45.617  38.122
SNAP/ca-HepTh         9877   51971   31.035  23.004
SNAP/as-735           7716   26467   46.893  27.823
Gleich/Minnesota      2642   6606    3.2324  3.2319

Characteristics of selected real world networks. All networks are undirected.

Results for real world networks (cont.)

Network               expm      mmq     funm_kryl
Zachary Karate Club   0.062     0.138   0.120
Drug Users            0.746     2.416   0.363
Yeast PPI             47.794    9.341   0.402
Pajek/Erdos971        0.542     2.447   0.317
Pajek/Erdos972        579.214   35.674  0.410
Pajek/Erdos982        612.920   39.242  0.393
Pajek/Erdos992        656.270   53.019  0.325
SNAP/ca-GrQc          281.814   23.603  0.465
SNAP/ca-HepTh         2710.802  58.377  0.435
SNAP/as-735           2041.439  75.619  0.498
Gleich/Minnesota      1.956     10.955  0.329

Timings (in seconds) to compute centrality rankings based on the diagonal and row sum of eA for various test problems using different codes. expm: Matlab’s built-in matrix exponential; mmq: Meurant’s code, modified; funm_kryl: Güttel’s code.

A large example: the Wikipedia graph

The Wikipedia graph is a directed graph with N = 4,189,503 nodes and 67,197,636 edges (as of June 6, 2011).

The row/columns of the exponential of the adjacency matrix can be used to rank the hubs and authorities in Wikipedia in order of importance.

Using the Arnoldi-based code funm_kryl we can compute the hub and authority rankings for all the nodes in about 216s to high accuracy on a parallel system comprising 24 Intel(R) Xeon(R) E5-2630 2.30GHz CPUs.

Summary of results (cont.)

Computing Katz scores requires the solution of large and sparse linear systems of the form (I − αA)x = 1. For this task, iterative methods should be used.

Note that for an undirected graph the condition number of I − αA is bounded by κ = 2/(1 − αλ_1). For α not too close to 1/λ_1, convergence is usually fast; for example, conjugate gradient, Richardson, and Chebyshev iterations have all been found to perform well. Often, simple preconditioners give the best results.

On the other hand, for large networks (N = O(10^6) or larger) one would like to parallelize the computation. It turns out that for most complex networks this is very difficult, due to the fact that generalized nested dissection and similar orderings result in huge separator sets and therefore very high communication costs.

Summary of results (cont.)

For the same reasons, sparse direct solvers generate enormous amounts of fill-in and are generally unsuitable for solving these problems.

Example: GeneRank linear system of order N = 152,520 from computational genomics, 791,768 nonzeros, SPD.

Matlab's backslash: 42.3 sec
CG with no preconditioning: 1.46 sec
CG with diagonal preconditioning: 0.42 sec
Chebyshev with diagonal preconditioning: 0.18 sec

With AMD reordering, the Cholesky factor contains 211,825,948 nonzeros!

M. Benzi and V. Kuhlemann, Chebyshev acceleration of the GeneRank algorithm, Electr. Trans. Numer. Anal., 40 (2013), 311–320.
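A sketch of the iterative approach in Matlab (pcg with a diagonal preconditioner; the grid-based matrix below is a stand-in of mine, not the GeneRank matrix):

A = delsq(numgrid('S',200));  A = spones(A) - speye(size(A,1));  % sparse test graph
N = size(A,1);
alpha = 0.5/normest(A);            % safely below 1/lambda_max(A)
B = speye(N) - alpha*A;            % SPD Katz-type matrix
M = diag(diag(B));                 % diagonal (Jacobi) preconditioner
[x,flag,relres,iter] = pcg(B, ones(N,1), 1e-8, 200, M);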

Summary of results (cont.)

In many problems involving the matrix exponential, convergence is usually very fast; one should also keep in mind that high accuracy is seldom needed in this context. As already mentioned, for very large problems, low rank approximations of the adjacency matrix can be used to reduce cost. For example, centrality calculations can often be performed working with just the 5-10 leading eigenpairs (or singular triplets) of the adjacency matrix. The presence of a few dominant eigenvalues well-separated from the rest of the spectrum is of great help. This is another aspect in which complex network problems behave very differently from discrete PDE problems.

Concluding remarks

Network Science is an exciting multidisciplinary field undergoing rapid development. Research opportunities abound, and applied mathematicians can make a number of important contributions on all fronts: theory, modeling, and computation.

Current emphasis is on dynamic processes involving time-evolving networks.

The notions of centrality, communicability, community, diffusion etc. are being adapted to this setting, but much more work remains to be done, including (especially?) on the algorithmic aspects.

Monographs

G. H. Golub and G. Meurant, Matrices, Moments and Quadrature with Applications, Princeton University Press, 2010.

J. Liesen and Z. Strakos, Krylov Subspace Methods: Principles and Analysis, Oxford University Press, 2013.

Software

Complex Networks Package for Matlab:

http://www.levmuchnik.net/Content/Networks/ComplexNetworksPackage.html

CONTEST Matlab Toolbox:

http://www.mathstat.strath.ac.uk/research/groups/numerical_analysis/contest/toolbox

G. Meurant’s MMQ package: http://pagesperso-orange.fr/gerard.meurant/.

Stefan Güttel package: http://www.mathe.tu-freiberg.de/∼guettels/funm_kryl
