Journal of Computational Dynamics, Volume 2, Number 1, June 2015, pp. 1–24. doi:10.3934/jcd.2015.2.1. © American Institute of Mathematical Sciences.

MODULARITY OF DIRECTED NETWORKS: CYCLE DECOMPOSITION APPROACH

Nataša Djurdjevac Conrad (Freie Universität Berlin, Department of Mathematics and Computer Science, Arnimallee 6, 14195 Berlin, Germany; and Zuse Institute Berlin, Takustrasse 7, 14195 Berlin, Germany)

Ralf Banisch (Freie Universität Berlin, Department of Mathematics and Computer Science, Arnimallee 6, 14195 Berlin, Germany)

Christof Schütte (Freie Universität Berlin, Department of Mathematics and Computer Science, Arnimallee 6, 14195 Berlin, Germany; and Zuse Institute Berlin, Takustrasse 7, 14195 Berlin, Germany)

Abstract. The problem of decomposing networks into modules (or clusters) has gained much attention in recent years, as it can account for a coarse-grained description of complex systems, often revealing functional subunits of these systems. A variety of module detection algorithms have been proposed, mostly oriented towards finding hard partitionings of undirected networks. Despite the increasing number of fuzzy clustering methods for directed networks, many of these approaches tend to neglect important directional information. In this paper, we present a novel random walk based approach for finding fuzzy partitions of directed, weighted networks, where edge directions play a crucial role in defining how well nodes in a module are interconnected. We will show that cycle decomposition of a random walk process connects the notion of network modules and information transport in a network, leading to a new, symmetric measure of node communication. Finally, we will use this measure to introduce a communication graph, for which we will show that although being undirected it inherits important directional information of modular structures from the original network.

1. Introduction. In recent years complex networks have become widely recognized as a powerful abstraction of real-world systems [25]. The simple form of network representation allows an easy description of various systems, such as biological, technological, physical and social systems [26]. Thus, network analysis can

2010 Mathematics Subject Classification. Primary: 60J20, 05C81; Secondary: 94C15.
Key words and phrases. Directed networks, modules, cycle decomposition, measure of node communication.

be used as a tool to understand the structural and functional organization of real-world systems. One of the most important classes of network substructures are modules (or clusters, community structures), which typically correspond to groups of nodes that are similar to each other in some sense. In the case of undirected networks, modules represent densely interconnected subgraphs having relatively few connections to the rest of the network [13]. Finding modules leads to a coarse-grained description of the network by smaller subunits, which in many cases have been found to correspond to functional units of real-world systems, such as protein complexes in biological networks [6]. Although a large number of different clustering algorithms [12, 25, 31] have been developed, most of them are oriented towards: (i) finding hard (or full, complete) partitions of networks into modules, where every node is assigned to exactly one module; (ii) analyzing undirected networks. Several approaches have been proposed for module identification where overlaps between modules are allowed; for example k-clique percolation [30] and the extension of the modularity score to overlapping partitions [28]. However, in this paper we will not consider complete or overlapping partitions of the network. Instead, we will consider modules as groups of densely connected nodes that do not partition the network completely and do not overlap. This means that there will remain nodes that are not assigned to any module at all, and we will call the set of these nodes the transition region. Nodes in the transition region are affiliated to several modules at once in a probabilistic fashion, which reflects uncertainty in the clustering. We call such a clustering a fuzzy clustering. This is a useful concept when considering imperfectly structured networks such as real-world networks.
Very few fuzzy clustering methods have been discussed in the literature [8, 35, 21] recently, all of which are restricted to undirected networks. However, many real-world networks are weighted and directed [4, 20, 22], such as metabolic networks, email networks, the World Wide Web, food webs etc. In such networks edge directions contain important information about the underlying system and as such should not be dismissed [18]. So far, several approaches for finding modules in directed networks have been developed [19, 40, 20, 37], see [22] for an extensive overview. Among these, generalizations of modularity optimization became the most popular group of approaches [2, 20, 16]. We will refer to this family of methods as generalized Newman-Girvan (gNG) approaches. Despite their frequent application, modularity based methods are typically only sensitive to the density of links, but tend to neglect information encoded in the pattern of the directions, see [18, 36]. This limitation of gNG methods (but also of one-step methods and methods based on network symmetrization) is illustrated with the following example. Using a gNG approach, the network shown in Figure 1a is partitioned into three modules C1, C2 and C3 with high link density. However, C3 is qualitatively different from C1 and C2: while in C1 and C2 edge directions are fairly symmetric, in C3 they are completely asymmetric, such that all nodes in C3 are connected to each other by short paths in one direction and long paths in the other direction. For example, nodes A, B ∈ C3 are connected by a directed edge (AB), but in the direction from B to A they are connected only by long paths that pass through the whole network. In this paper, we will require nodes in a module to be close in both directions and therefore wish to reject a structure such as C3 from being modular. The result of the fuzzy clustering algorithm we present here is shown in Figure 1b.


Figure 1. Illustration of a false module identification: (a) clustering into three modules (red, blue and green) by a gNG algorithm. (b) result of our new clustering algorithm into two modules (red and blue), where colors indicate the level of affiliation to one of the two modules.

The main topic of this paper is finding modules in directed, weighted networks using random walks, while explicitly respecting information encoded in the pattern of the directions. Our approach is a dynamics-based multi-step community detection algorithm and as such can deal with directed networks in a natural way. A similar dynamics-based approach is Markov stability [17], but this algorithm can produce only hard partitionings. In this paper, we will show that the notion of modules in directed networks is closely related to network cycles and described by dynamical properties of a random walk process, see Sections 2.2 and 2.3. We will make this connection precise in Section 2.4, by using a cycle decomposition of this Markov process to introduce a novel measure of communication I_xy between nodes. In Section 2.5, we will demonstrate that I_xy indeed reflects information transport in the network, and as such we will use I_xy to introduce a communication graph which, although being undirected, inherits important directional information about modular structures from the original network by construction. That is the key contribution of this paper: fuzzy clustering approaches for undirected graphs can be used for finding fuzzy partitions of directed graphs. In Section 3 we will describe our new algorithmic approach and its relevant computational aspects. Finally, we will demonstrate our method on several examples in Section 4 and compare it to gNG methods, indicating a possible improvement of the modularity function.

2. Theoretical background.

2.1. Random walks on directed networks. We consider a weighted, directed network G = (V, E), where V is the set of nodes and E the set of edges. We further assume that the network is strongly connected, i.e. there is a directed path from every node to every other node in the network. In particular, this means that the network has no sinks and sources. Now, we can define a time-discrete random walk process (X_n)_n on the network with a transition matrix P given by

p_{xy} = K(x, y) / K^+(x),   (1)

where K(x, y) denotes the weight of an edge (xy) ∈ E and K^+(x) = Σ_y K(x, y) the weighted out-degree of a node x. Since the networks we consider are strongly connected, the introduced random walk process is ergodic and has a unique stationary distribution π with π^T P = π^T. However, this process is not time-reversible and the detailed balance condition

π_x p_{xy} = π_y p_{yx},   x, y ∈ V,

does not hold. For this reason, most of the existing spectral methods fail to deal with finding modules in directed networks, as they are restricted to the analysis of reversible processes.

2.2. Cycle decomposition of flows. In this section we briefly introduce the theory of cycle decompositions of Markov processes on finite state spaces as developed in [15] and [14]. We will use this decomposition to describe the random walk process defined by (1) in terms of cycles. Let us introduce a probability flow F : E → R, such that F_{xy} = π_x p_{xy} for every edge (xy) ∈ E. Then, the stationarity equation π^T P = π^T for π directly yields conservation of the flow at every node x:

Σ_y F_{yx} = Σ_y F_{xy}.   (2)

Now we will generalize the notion of edge flows to cycle flows, where we define cycles in the following way.

Definition 2.1. An n-cycle (or n-loop) on G is defined as an ordered sequence¹ of n connected nodes γ = (x_1, x_2, ..., x_n), where (x_i, x_{i+1}) ∈ E for i = 1, ..., n − 1 and (x_n, x_1) ∈ E, whose length we denote by |γ| = n. Cycles that do not have self-intersections are called simple cycles, and we denote by C the collection of simple cycles in G.

Figure 2 shows examples of two simple cycles α and β, where |α| = 4 and |β| = 2. As a consequence of ergodicity and the flow conservation law (2), we can find a cycle decomposition of F into cycles. More precisely, there is a collection Γ ⊂ C of cycles γ with real positive weights F(γ) such that for every edge (xy) the flow decomposition formula holds:

F_{xy} = Σ_{γ ⊃ (xy)} F(γ),   (3)

where we write γ ⊃ (xy) if the edge (xy) is in γ. Condition (3) does not specify cycle decompositions uniquely, and different approaches for finding a particular one have been developed [38, 41, 15, 1]. An algorithm termed the deterministic algorithm in [15] finds a cycle decomposition in polynomial time in |V| by iteratively reducing the flow F.
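As a numerical illustration of (1) and (2), the following Python sketch constructs the transition matrix for a small, hypothetical weighted directed network (the adjacency matrix K is invented purely for illustration) and checks flow conservation at every node:

```python
import numpy as np

# Hypothetical weighted, strongly connected 4-node directed network;
# K[x, y] is the weight of edge (xy). Chosen for illustration only.
K = np.array([[0., 2., 1., 1.],
              [1., 0., 3., 0.],
              [0., 1., 0., 2.],
              [2., 0., 1., 0.]])

# Transition matrix p_xy = K(x, y) / K^+(x), cf. (1).
P = K / K.sum(axis=1, keepdims=True)

# Stationary distribution: left eigenvector of P for eigenvalue 1.
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()

# Edge flows F_xy = pi_x p_xy are conserved at every node, cf. (2):
# total inflow (column sums) equals total outflow (row sums).
F = pi[:, None] * P
assert np.allclose(F.sum(axis=0), F.sum(axis=1))
```

Both the row and column sums of F equal π here, since π^T P = π^T.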
A more algebraic approach due to Schnakenberg [38] uses the fact that cycles form a vector space of dimension d = |E| − |V| + 1, and realizes Γ as a basis of

¹More precisely, cycles are equivalence classes of ordered sequences up to cyclic permutations. In this note we do not distinguish between cycles and their representatives.

[Figure 2: cycles α = (x_1, x_2, x_3, x_4) and β = (x_2, y_1). One realization of the Markov chain X_1, ..., X_9 visits the states x_1, x_2, y_1, x_2, y_1, x_2, x_3, x_4, x_1 and completes the cycles β, β, α.]

Figure 2. Examples of two simple cycles α and β, |α| = 4 and |β| = 2, and one realization of the Markov chain (X_n)_n.

this vector space, which is constructed via a minimal spanning tree. We give more details on these algorithms in Section 3.1. In this paper we will use the stochastic decomposition approach introduced in [15, 14], which leads to a unique cycle decomposition with the additional property that the weights F(γ) have a stochastic interpretation. Let (X_n)_{1≤n≤T} be a realization of the Markov chain given by (1). We will say that (X_n)_{1≤n≤T} passes through the edge (xy) if there exists k < T such that X_k = x and X_{k+1} = y. The process (X_n)_{1≤n≤T} passes through a cycle γ if it passes through all edges of γ in the right order, but not necessarily consecutively, meaning that excursions through one or more full new cycles are allowed. This behavior is illustrated in Figure 2. Let N_T^γ be the number of times (X_n)_{1≤n≤T} passes through a cycle γ up to time T. The limit

w(γ) := lim_{T→∞} N_T^γ / T   (4)

exists almost surely [14]. Then, if the following weights condition (WC) holds

WC:  Γ = {γ | w(γ) > 0},  F(γ) = w(γ),   (5)

we obtain a unique cycle decomposition (Γ, w). In terms of the random walk process, the weight w(γ) denotes the mean number of passages through the cycle γ of almost all sample paths (X_n)_{1≤n≤T}. We will refer to w(γ) as an importance weight for the cycle γ. An explicit formula to calculate the weights w(γ) was given in [14], which for γ = (x_1, ..., x_n) reads

w(γ) = π_{x_1} p_{x_1 x_2} · · · p_{x_n x_1} N_{x_1...x_n},   (6)

where the normalization factor N_{x_1...x_n} is a product of taboo Green's functions accounting for the excursions the random walk process can take while completing γ. It is hard to calculate w(γ) in general, so we will use (6) for illustration purposes only, see [14] for more details. As an example we consider the barbell graph presented in Figure 3. This is a weighted, directed graph consisting of two cycles α_l = (l_0, l_1, ..., l_{n−1}) and α_r = (r_0, r_1, ..., r_{n−1}) with n nodes each, where all edge weights are one. These two cycles are joined by an undirected edge (or 2-cycle) α_c = (l_0, r_0) with edge weight


Figure 3. The barbell graph example network, consisting of two cycles with n nodes joined by an edge with weight ε. The cycle decomposition of the barbell graph consists of three cycles α_l, α_c and α_r.

K(l_0, r_0) = K(r_0, l_0) = ε < 1. Then, the probability of the standard random walk process (1) leaving either of the two cycles α_l or α_r at l_0 or r_0, respectively, is p_{l_0 r_0} = ε/(1 + ε) = p_{r_0 l_0}. The central nodes have the same stationary distribution π_{l_0} = π_{r_0} = π_c, and for all other nodes we have π_{l_i} = π_{r_i} = π_l. Using the flow conservation property (2) at one of the central nodes we get the equation π_l + π_c ε/(1 + ε) = π_c, from which we can easily calculate

π_l = 1/(2(n + ε)),   π_c = (1 + ε) π_l = (1 + ε)/(2(n + ε)).
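These closed-form values can be checked numerically. The sketch below (Python; the choice n = 5, ε = 0.1 is arbitrary) builds the barbell graph and compares the stationary distribution of the random walk (1) with the formulas above:

```python
import numpy as np

# Barbell graph (Figure 3): two directed n-cycles joined at l0 and r0 by an
# undirected edge of weight eps; node order l0..l_{n-1}, r0..r_{n-1}.
n, eps = 5, 0.1
N = 2 * n
K = np.zeros((N, N))
for i in range(n):
    K[i, (i + 1) % n] = 1.0            # cycle alpha_l: l_i -> l_{i+1}
    K[n + i, n + (i + 1) % n] = 1.0    # cycle alpha_r: r_i -> r_{i+1}
K[0, n] = K[n, 0] = eps                # connecting 2-cycle alpha_c = (l0, r0)

P = K / K.sum(axis=1, keepdims=True)   # transition matrix, cf. (1)
evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()

# Compare with the closed-form values pi_l and pi_c derived above.
pi_l = 1.0 / (2 * (n + eps))
pi_c = (1 + eps) * pi_l
assert np.isclose(pi[1], pi_l)   # an ordinary cycle node, e.g. l1
assert np.isclose(pi[0], pi_c)   # central node l0
```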

Since every edge belongs to exactly one of the three cycles α_c = (l_0, r_0), α_l = (l_0, l_1, ..., l_{n−1}) and α_r = (r_0, r_1, ..., r_{n−1}), the weights of these cycles can be inferred directly from (3) as

w(α_l) = w(α_r) = 1/(2(n + ε)) =: w,   w(α_c) = εw.

2.3. Cycle-based random walks. In this section, we will use the stochastic cycle decomposition to introduce reversible processes that are a good approximation of the original non-reversible process, in the sense that they contain all the important directional information needed for the module identification. This will allow us to construct graph transformations which map a directed graph into an undirected graph, such that cycle node connections (i.e. whether x, y ∈ α) and the cycle importance weights are encoded into the structure of the new graph. Then, we will be able to reduce the original module finding problem on directed networks to the well understood problem of finding module partitions of undirected networks. We start by reformulating the original dynamics (X_n)_n on vertices V in terms of dynamics on the cycle space Γ. Namely, instead of the dynamics of the standard random walk process with jumps from one node to another node, we will look at a random walk process with dynamics of the type node-cycle-node. To this end, we introduce

Definition 2.2. The process (X̃_n)_n is a stochastic process on V generated by the following procedure:

1. If X̃_n = x, then draw the next cycle ξ_n according to the conditional probabilities

P(ξ_n = α | X̃_n = x) =: b_α^{(x)} = { w(α)/π_x,  x ∈ α;  0,  x ∉ α }.

2. If ξ_n = α, then follow the cycle α one step to obtain X̃_{n+1} = y, (xy) ⊂ α.

In this process, jumps from a node x to a cycle α are probabilistic, with probabilities b_α^{(x)} given by the cycle decomposition {Γ, w}. Using (3) it follows that Σ_{α ⊃ x} w(α) = π_x, so that the b_α^{(x)} indeed form conditional probabilities. Furthermore, (X̃_n)_n is a Markov process with transition matrix P:

P(X̃_{n+1} = y | X̃_n = x) = Σ_{α ⊃ (xy)} P(ξ_n = α | X̃_n = x) = (1/π_x) Σ_{α ⊃ (xy)} w(α) = F_{xy}/π_x = p_{xy},

where the second-to-last equality uses (3).
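For the barbell graph, where the stochastic cycle decomposition is known explicitly (each edge lies in exactly one of α_l, α_c, α_r), this identity can be verified numerically. A Python sketch for n = 3, ε = 0.1, with the node labels l_0, l_1, l_2, r_0, r_1, r_2 mapped to indices 0, ..., 5 for illustration:

```python
import numpy as np

# Known cycle decomposition of the barbell graph with n = 3.
n, eps = 3, 0.1
w = 1.0 / (2 * (n + eps))
cycles = [[0, 1, 2], [0, 3], [3, 4, 5]]   # alpha_l, alpha_c, alpha_r
weights = [w, eps * w, w]

# pi_x = sum of w(alpha) over cycles alpha containing x, by (3).
pi = np.zeros(6)
for gamma, wg in zip(cycles, weights):
    for x in gamma:
        pi[x] += wg

# Node-cycle-node transition probabilities: sum w(alpha)/pi_x over
# cycles alpha containing the edge (x, y).
P_tilde = np.zeros((6, 6))
for gamma, wg in zip(cycles, weights):
    for i, x in enumerate(gamma):
        y = gamma[(i + 1) % len(gamma)]
        P_tilde[x, y] += wg / pi[x]

# Direct construction of p_xy from the edge weights, cf. (1).
K = np.zeros((6, 6))
K[0, 1] = K[1, 2] = K[2, 0] = 1.0   # alpha_l
K[3, 4] = K[4, 5] = K[5, 3] = 1.0   # alpha_r
K[0, 3] = K[3, 0] = eps             # alpha_c
P = K / K.sum(axis=1, keepdims=True)
assert np.allclose(P_tilde, P)
```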

Thus, (X_n)_n and (X̃_n)_n are statistically identical, and we will refer to them both as (X_n)_n in the rest of the paper. As a byproduct we obtained the associated chain (ξ_n)_n ⊂ Γ of cycles used by (X_n)_n. Notice that cycle-node jumps are deterministic and that (ξ_n)_n is a non-Markovian process. To see this, consider the barbell graph in Figure 3. We have P(ξ_{n+2} = α_r | ξ_{n+1} = α_r, ξ_n = α_c) = 1, since in this case X_{n+2} = r_1 with probability 1. But P(ξ_{n+2} = α_r | ξ_{n+1} = α_r) = 1 − (1/n) · ε/(1 + ε), as in this case X_{n+2} = r_0 with probability 1/n, so that ξ_{n+2} = α_c is possible. If (X_n)_n is in equilibrium such that P(X_n = x) = π_x, then the chain (ξ_n)_n is also in equilibrium, with distribution

µ(α) := P(ξ_n = α) = Σ_x P(ξ_n = α | X_n = x) P(X_n = x) = Σ_{x ⊂ α} b_α^{(x)} π_x = Σ_{x ⊂ α} (1/π_x) w(α) π_x = |α| w(α).

Further assuming that (X_n)_n is in equilibrium, the one-step transition matrix of (ξ_n)_n is

Q_{αβ} := P(ξ_n = β | ξ_{n−1} = α) = Σ_{x ∈ (α ∩ β)} (1/|α|) b_β^{(x)},   (7)

see the Appendix for the detailed calculation. Moreover, Q is detailed-balanced:

µ(α) Q_{αβ} = Σ_{x ∈ (α ∩ β)} (1/π_x) w(α) w(β) = µ(β) Q_{βα}.   (8)

Thus, given a non-reversible Markov chain (Xn)n on V , we have constructed a reversible, but non-Markovian associated chain (ξn)n on the cycle space Γ. We will now modify Definition 2.2 to obtain two Markov chains (Zn)n ⊂ V and (ζn)n ⊂ Γ. It will turn out later that (ζn)n is the Markov approximation of (ξn)n.

Definition 2.3. The cycle-node-cycle process (ζn)n ⊂ Γ and the node-cycle-node process (Zn)n ⊂ V are stochastic processes generated by the following procedure:

1. If Z_n = x, then draw the next cycle ζ_n according to the conditional probabilities

P(ζ_n = α | Z_n = x) = b_α^{(x)}.

2. If ζ_n = α, then draw Z_{n+1} = y with probability

P(Z_{n+1} = y | ζ_n = α) := { 1/|α|,  y ∈ α;  0,  y ∉ α }.

Jumps node-cycle are again probabilistic, but now jumps from a cycle to a node are also probabilistic, because from a cycle α the process can jump with uniform probability to any of the nodes in this cycle. Both processes (Z_n)_n and (ζ_n)_n are Markov chains, and we will now look at their transition properties. To simplify the notation we introduce the two stochastic matrices B : R^Γ → R^V and V : R^V → R^Γ with components

B_{xα} = P(ζ_n = α | Z_n = x) = b_α^{(x)},
V_{αy} = P(Z_{n+1} = y | ζ_n = α) = (1/|α|) δ(y ∈ α).   (9)

Let π_n(x) := P(Z_n = x) and µ_n(α) := P(ζ_n = α) be the probability distributions of (Z_n)_n and (ζ_n)_n. Then the above definition implies the following propagation scheme for π_n and µ_n:

π_n —(B^T)→ µ_n —(V^T)→ π_{n+1} —(B^T)→ µ_{n+1} —(V^T)→ · · ·   (10)

The next lemma gives explicit expressions for the transition matrices of (ζ_n)_n and (Z_n)_n in terms of B and V.

Lemma 2.4. Let P be the transition matrix of (Z_n)_n and Q be the transition matrix of (ζ_n)_n. Then P = BV and Q = VB, with

P_{xy} = (1/π_x) Σ_{α ∋ x,y} w(α)/|α|,   (11)

Q_{αβ} = Σ_{x ∈ (α ∩ β)} (1/|α|) b_β^{(x)}.   (12)

Proof. By the Kolmogorov forward equation π_{n+1} = P^T π_n, which in view of (10) immediately implies P = BV, and similarly we get Q = VB. On the other hand, using (9):

(VB)_{αβ} = Σ_{x ∈ α ∩ β} (1/|α|) b_β^{(x)},   (BV)_{xy} = (1/π_x) Σ_{α ∋ x,y} w(α)/|α|.

Since B and V are stochastic, P and Q are stochastic matrices too.

Process                  (X_n)_n          (ξ_n)_n       (Z_n)_n      (ζ_n)_n
Properties               non-reversible   reversible    reversible   reversible
                         Markov           non-Markov    Markov       Markov
State space              V                Γ             V            Γ
Transition matrix        P                Q             P            Q
Stationary distribution  π                µ             π            µ

Table 1. Properties of the introduced random walk processes.

By comparing with (7), we see that the one-step transition matrices of (ζ_n)_n and (ξ_n)_n are the same, so that (ζ_n)_n is the Markov approximation of (ξ_n)_n. Thus, our construction produced a Markov process (Z_n)_n on the space of graph nodes V with transition matrix P and a closely related Markov process (ζ_n)_n on the space of graph cycles Γ with the transition matrix Q. Table 1 shows the main properties of the introduced random walk processes. To establish a more precise relation between these processes, we will examine the spectral properties of P and Q. Let D_π = diag(π) and D_µ = diag(µ) with µ(α) = |α| w(α) as before. We equip R^V and R^Γ with the scalar products ⟨u, v⟩_π := u^T D_π v and ⟨u, v⟩_µ := u^T D_µ v. Then, Lemma 2.5 shows that (Z_n)_n and (ζ_n)_n are two time-reversible Markov processes with identical spectral properties.

Lemma 2.5. The transition matrices P and Q of the processes (Z_n)_n and (ζ_n)_n have the following properties:
(i) π is the stationary distribution of P and µ is the stationary distribution of Q.
(ii) (Z_n)_n and (ζ_n)_n are time-reversible processes, i.e. ⟨u, Pv⟩_π = ⟨Pu, v⟩_π and ⟨u, Qv⟩_µ = ⟨Qu, v⟩_µ.
(iii) The non-zero spectrum of Q is identical to the non-zero spectrum of P, i.e. σ(Q) \ {0} = σ(P) \ {0}.
(iv) For λ ∈ σ(P) \ {0}, dim ker(P − λ1) = dim ker(Q − λ1). If v ∈ ker(Q − λ1), then Bv ∈ ker(P − λ1). If v ∈ ker(P − λ1), then Vv ∈ ker(Q − λ1).

Proof. See Appendix.
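Lemma 2.4 and Lemma 2.5 (iii) can be checked on a small example. The Python sketch below builds B and V from the known cycle decomposition of the barbell graph with n = 3 (node indexing chosen for illustration) and verifies that P = BV and Q = VB are stochastic with coinciding non-zero spectra:

```python
import numpy as np

# Known cycle decomposition of the barbell graph, n = 3, eps = 0.1;
# nodes l0, l1, l2, r0, r1, r2 are indexed 0..5.
n, eps = 3, 0.1
w = 1.0 / (2 * (n + eps))
cycles = [[0, 1, 2], [0, 3], [3, 4, 5]]      # alpha_l, alpha_c, alpha_r
weights = [w, eps * w, w]

# pi_x = sum of w(alpha) over cycles alpha containing x, by (3).
pi = np.zeros(6)
for gamma, wg in zip(cycles, weights):
    for x in gamma:
        pi[x] += wg

# Stochastic matrices B (node -> cycle) and V (cycle -> node), cf. (9).
B = np.zeros((6, len(cycles)))
V = np.zeros((len(cycles), 6))
for a, (gamma, wg) in enumerate(zip(cycles, weights)):
    for x in gamma:
        B[x, a] = wg / pi[x]
        V[a, x] = 1.0 / len(gamma)

P_bold, Q = B @ V, V @ B                      # Lemma 2.4: P = BV, Q = VB
assert np.allclose(P_bold.sum(axis=1), 1.0)
assert np.allclose(Q.sum(axis=1), 1.0)

# Lemma 2.5 (iii): every eigenvalue of Q also appears in the spectrum of P
# (the remaining eigenvalues of the larger matrix P are zero).
ev_P = np.linalg.eigvals(P_bold)
for lam in np.linalg.eigvals(Q):
    assert np.min(np.abs(ev_P - lam)) < 1e-8
```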


Figure 4. Eigenvalues of random walk processes on a barbell graph with n = 40 and ε = 0.1. Red: eigenvalues of the original transition matrix P. Blue: eigenvalues of the cycle-based transition matrix P = BV. Panel (a) shows the real parts of the 20 largest eigenvalues of both matrices; panel (b) shows the complete spectra.

Figure 4 shows the spectra of the transition matrices P and P = BV for the barbell graph example from Figure 3 with n = 40, ε = 0.1. The spectrum of the transition matrix P of the original non-reversible random walk process consists of complex eigenvalues which all lie almost on the unit circle, i.e. have modulus close to 1. On the other hand, the spectrum of P = BV (and therefore also of Q) is real valued and exhibits a clear spectral gap after the second eigenvalue, indicating the existence of two network modules. The benefit of the new random walk process in this example is that it preserves the dominant real valued eigenvalues related to the existence of network modules and projects out the complex eigenvalues.

2.4. Communication graph and cycle graph. Time-reversibility of (Zn)n and (ζn)n directly implies that the weights Ixy := πxPxy and Wαβ := µ(α)Qαβ are symmetric. Remarkably, both Wαβ and Ixy have an interpretation in terms of the original Markov process (Xn)n. More precisely, the quantity

W_{αβ} = Σ_{x ∈ (α ∩ β)} w(α) b_β^{(x)},   α, β ∈ Γ,   (13)

is the total exchange of probability flux F between the cycles α and β per timestep. To see this, note that for x ∈ α, w(α) is the fraction of probability current arriving at x which is carried by α, and b_β^{(x)} is the probability to switch to β at x, so that w(α) b_β^{(x)} is the exchange of probability flux between α and β that happens at x. The quantity

I_{xy} = Σ_{α ∋ x,y} w(α)/|α|   (14)

is a measure for the intensity of communication between nodes x, y under (X_n)_n in terms of the cycles through which x and y are connected. Indeed, I_{xy} is large when x and y are connected by many cycles α which are important (i.e. w(α) large) and short (i.e. |α| small). This interpretation of I_{xy} as a communication intensity measure will be made clear in Section 2.5. We will use these properties of W_{αβ} and I_{xy} to introduce the following undirected graphs:

Definition 2.6. Given the directed graph G = (V, E), we let H_G(Γ, E_W) be the undirected graph with vertex set Γ which has an edge with weight W_{αβ} between α and β if W_{αβ} > 0, and we call H_G the cycle graph of G. We let K_G(V, E_I) be the undirected graph with vertex set V which has an edge with weight I_{xy} between x and y if I_{xy} > 0, and we call K_G the communication graph of G.

Notice that H_G is simply the network representation of the cycle-node-cycle process (ζ_n)_n, and K_G is the network representation of the node-cycle-node process (Z_n)_n. Thus, the standard random walk processes on H_G and K_G will have transition matrices Q and P, respectively. Since important module information from G is encoded in the structure of both H_G and K_G, we can apply some of the existing module finding approaches for undirected graphs to H_G and K_G and project the result back to G. In this paper we will present the algorithm for module identification which uses only the communication graph (see Section 3), because H_G is usually much larger than K_G for modular directed networks G. In Figure 5, the barbell graph is shown together with the corresponding cycle graph H_G and communication graph K_G. Since the cycle decomposition of the barbell graph produces three cycles α_l, α_c and α_r, the cycle graph has only three nodes. The edge weights (13) of the cycle graph are W_{α_l α_c} = εw/(1 + ε) = W_{α_c α_r}. The communication graph of the barbell graph has the following edge weights:

I_{xy} = { w/n,          x, y ∈ α_l;
           w/n,          x, y ∈ α_r;
           εw/2,         x ≠ y ∈ α_c;
           εw/2 + w/n,   x = y ∈ α_c;
           0,            otherwise }.   (15)
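The entries (15) can be recovered directly from (14). A Python sketch for n = 3, ε = 0.1, again with the node labels l_0, ..., r_2 mapped to indices 0, ..., 5 for illustration:

```python
import numpy as np

# Communication weights I_xy (14) for the barbell graph, computed from the
# known cycle decomposition; node order l0, l1, l2, r0, r1, r2.
n, eps = 3, 0.1
w = 1.0 / (2 * (n + eps))
cycles = [[0, 1, 2], [0, 3], [3, 4, 5]]   # alpha_l, alpha_c, alpha_r
weights = [w, eps * w, w]

I = np.zeros((6, 6))
for gamma, wg in zip(cycles, weights):
    for x in gamma:
        for y in gamma:
            I[x, y] += wg / len(gamma)    # each cycle contributes w(alpha)/|alpha|

# Check against the closed-form entries (15).
assert np.isclose(I[1, 2], w / n)                # x, y in alpha_l
assert np.isclose(I[0, 3], eps * w / 2)          # x != y in alpha_c
assert np.isclose(I[0, 0], eps * w / 2 + w / n)  # x = y in alpha_c
assert np.isclose(I[1, 4], 0.0)                  # nodes sharing no cycle
assert np.allclose(I, I.T)                       # I is symmetric
```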


Figure 5. The barbell graph and corresponding cycle and communication graphs.

As observed in Figure 5, this results in the cycles α_l and α_r being mapped into two complete subgraphs with equally weighted edges, and consequently the appearance of two modules M_1 and M_2 in K_G.

Remark 1. (Line-graphs) Cycle graphs can be related to line-graphs (or edge-graphs), which are important objects in spectral graph theory [29, 7]. If G = (V, E) is an undirected graph with |E| edges, its line-graph L(G) has |E| nodes, which are the edges of G, and nodes in L(G) are adjacent if the corresponding edges in G are. So, a transformation of G into L(G) maps the graph's edges into nodes, and a transformation of G into H_G maps the graph's cycles into nodes. Furthermore, line-graphs can be used very efficiently for fuzzy clustering of undirected networks [11], by employing some of the full partitioning approaches for finding modules in line-graphs and then projecting these modules back to the original graph. Similarly, one can use cycle graphs for fuzzy partitioning of directed networks.

Remark 2. (Entropy production) The work of [1] introduces cycle graphs as a way to describe non-equilibrium Markov processes. Indeed, many non-equilibrium properties may very well be related to cycle decompositions of the type we discussed here. We give one example. The entropy production rate is believed to be a measure of the degree of irreversibility of a system. For Markov chains, it is given by

e_p = (1/2) Σ_{x,y} (π_x p_{xy} − π_y p_{yx}) log( π_x p_{xy} / (π_y p_{yx}) )

if the condition p_{xy} > 0 ⇔ p_{yx} > 0 holds for all pairs of states x, y (otherwise e_p = ∞). It has been shown in [14] that the entropy production rate can be expressed by means of the stochastic cycle decomposition (4) as

e_p = Σ_{γ ∈ Γ} (w(γ) − w(γ⁻)) log( w(γ)/w(γ⁻) ).   (16)

Here, γ⁻ denotes the cycle γ with reversed orientation². This shows that the entropy production rate is given in terms of the differences of the cycle weights for cycles γ and their reversed counterparts γ⁻.
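The edge-based formula for e_p is straightforward to evaluate. Below is a Python sketch for a hypothetical 3-state non-reversible chain (the transition matrix is invented for illustration); its stationary distribution is uniform by symmetry, and e_p > 0 signals broken detailed balance:

```python
import numpy as np

# A made-up 3-state non-reversible chain; p_xy > 0 <=> p_yx > 0 holds.
P = np.array([[0.0, 0.7, 0.3],
              [0.3, 0.0, 0.7],
              [0.7, 0.3, 0.0]])

evals, evecs = np.linalg.eig(P.T)
pi = np.real(evecs[:, np.argmax(np.real(evals))])
pi = pi / pi.sum()          # doubly stochastic here, so pi = (1/3, 1/3, 1/3)

# Entropy production rate, edge-based formula.
e_p = 0.0
for x in range(3):
    for y in range(3):
        if x != y and P[x, y] > 0:
            e_p += 0.5 * (pi[x] * P[x, y] - pi[y] * P[y, x]) \
                   * np.log(pi[x] * P[x, y] / (pi[y] * P[y, x]))

# Strictly positive iff detailed balance fails; a reversible chain gives 0.
assert e_p > 0
```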
More precise relations between cycle decomposition and entropy production will be the topic of our future research.

2.5. Information transport. In this section we show that the node-cycle-node process (Z_n)_n in fact mimics a model for information transport in the network, which gives an interpretation to the quantity I_{xy}. Suppose that initially X_0 = x. Let Z̃_0 = x denote the initial location of one bit of information. We will now model how this bit is transported through the network by means of (X_n)_n:

1. Let τ be the first time (Xn)n returns to x. Since we assumed (Xn)n to be irreducible, τ is a.s. finite and the path p = (X0,...,Xτ ) is a loop, possibly with self-intersections. We can decompose p into simple cycles, exactly one of which contains x. Let this cycle be γ. 2. The bit of information is now distributed uniformly among all nodes on γ. To model this, we will pick the new location Z˜1 = y of the bit uniformly at random among the vertices of γ. We will then restart the process (Xn)n at y and repeat.

Now observe that the outcome of step 1 is the cycle γ exactly if (X_n)_n passes through γ, as defined in Section 2.2. The probability of the outcome being γ is therefore the probability of (X_n)_n passing through γ conditioned on starting at x, which is δ(x ∈ γ) w(γ) / Σ_{γ' ⊃ x} w(γ') = B_{xγ}. On the other hand, if the outcome of step 1 is γ, then the probability of Z̃_1 = y in step 2 is (1/|γ|) δ(y ∈ γ) = V_{γy}. We thus have, by Lemma 2.4,

P(Z̃_1 = y | Z̃_0 = x) = Σ_γ B_{xγ} V_{γy} = P_{xy} = P(Z_1 = y | Z_0 = x).

This implies that (Z̃_n)_n = (Z_n)_n, that is, the node-cycle-node process describes the sequence of locations of the bit of information which is transported along the network by (X_n)_n. Then

I_{xy} = P(Z_n = x, Z_{n+1} = y) = Σ_{α ∋ x,y} w(α)/|α|

is the probability that the bit is delivered from x to y, and therefore measures how easy it is to share information between the nodes x and y. From the point of view of the network G, I_{xy} connects topological properties of G (how many cycles connect x and y, and how long are they) with dynamical properties of the process (X_n)_n (which gives the importance weights w(α)). As we established in Section 2.4, I_{xy} is symmetric since (Z_n)_n is reversible. This allows us to apply any clustering method designed for undirected, weighted networks to find a partition of G into modules based on node communication. The algorithmic aspects of this approach will be presented in the next section, together with a view on its computational complexity.

²If γ = (x_1, ..., x_s), then γ⁻ = (x_s, ..., x_1).

3. Algorithmic aspects. Following the ideas developed in Section 2, we will say that a subset M ⊂ V is a module of the directed graph G = (V, E) iff it is a module, in the usual sense, of the undirected, weighted communication graph K_G = (V, E_I) defined in Section 2.4. This suggests the following procedure to identify modules in G:
1. Given the graph G = (V, E) with edge weights K(x, y), construct the transition matrix P according to (1).
2. Compute I_{xy} and construct the undirected weighted graph K_G.
3. Find fuzzy partitions of K_G.
In the following, we will explain steps (2) and (3) in more detail.

3.1. Computing I_{xy}. We now discuss how to compute I_{xy} in practice. One way of doing so is via (14), which requires the computation of the cycle decomposition {Γ, w} with weights w(γ) satisfying (4). This is clearly a hard problem: while even finding and enumerating all cycles in G scales exponentially in the worst case, explicitly computing the weight w(γ) for γ = (x_1, ..., x_l) according to (6) requires the computation of l taboo Green's functions of the form (I − P_{{x_1,...,x_s}})^{−1} for 1 ≤ s ≤ l, where P_{{x_1,...,x_s}} is P with the rows and columns indexed by {x_1, ..., x_s} deleted [14]. This will not be feasible for large networks, and we therefore resort to sampling. First we present an algorithm which computes the collection Γ of cycles γ and the counts N_T^γ from a realization (X_n)_{1≤n≤T} of the Markov chain (X_n)_n, see [14]. The algorithm uses an auxiliary chain η which is initialized at η = X_1; then the states X_i are added sequentially to η until η contains a cycle, which is then removed from η:

Algorithm 1 (sampling the cycle decomposition):
(i) Initialization: Set Γ = ∅ and η = [X_1].
(ii) for i = 2 to T:
    If X_i ∉ η, set η = [η, X_i].
    Else if η = [η_1, ..., η_{p−1}, X_i, η_{p+1}, ..., η_l] then
        1. Set γ = [X_i, η_{p+1}, ..., η_l] and η = [η_1, ..., η_{p−1}, X_i].
        2. If γ ∈ Γ, then N_T^γ = N_T^γ + 1. Otherwise add γ to Γ and set N_T^γ = 1.

We then set w_T(γ) := N_T^γ / T. In [14] it has been shown that w_T(γ) → w(γ) a.s. as T → ∞; the finite-time weights w_T returned by the sampling algorithm are therefore an approximation of the true weights w. However, the number |Γ| of cycles may grow exponentially with |V|, and the rate of convergence of the weights w_T(γ) may therefore be very slow for many γ ∈ Γ. But note that we do not actually need to compute the cycle decomposition; for the clustering task it is enough to compute Ixy. This turns out to be much easier:

Algorithm 2 (sampling Ixy):
(i) Initialization: Set Γ = ∅, η = [X1] and Ñ_T(x, y) = 0 for all x, y ∈ V.
(ii) for i = 2 to T:
If Xi ∉ η, set η = [η, Xi].
Else, if η = [η1, . . . , ηp−1, Xi, ηp+1, . . . , ηl], then
1. Set γ = [Xi, ηp+1, . . . , ηl] and η = [η1, . . . , ηp−1, Xi].
2. Update Ñ_T(x, y) → Ñ_T(x, y) + 1/|γ| for all x, y ∈ γ.

We then set I_T(x, y) := Ñ_T(x, y) / T. We have a.s. convergence I_T(x, y) → Ixy for the following reason: the counts Ñ_T(x, y) computed by Algorithm 2 fulfill the equation
Ñ_T(x, y) = Σ_{γ∋x,y} (1/|γ|) N_T^γ,
as is seen by comparing step (ii) of both algorithms. We thus have
lim_{T→∞} (1/T) Ñ_T(x, y) = lim_{T→∞} Σ_{γ∋x,y} (1/|γ|) w_T(γ) = Σ_{γ∋x,y} (1/|γ|) w(γ) = Ixy
by virtue of w_T(γ) → w(γ) a.s. and (14). The computational complexity of Algorithm 2 is O(T). The convergence rate of the counts I_T(x, y) depends on the slowest relaxation timescale and hence on the second largest eigenvalue of P [34]. If P is very metastable, then sampling can become prohibitively expensive due to low crossing rates between different metastable sets. In such a case Algorithm 2 can be improved further, e.g. by parallel sampling of multiple trajectories initialized at different points.
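Algorithms 1 and 2 share the same cycle-popping loop and differ only in the counts they update. The following is a minimal Python sketch of that loop (function and variable names are ours) which returns both the per-cycle counts N_T^γ and the pairwise counts Ñ_T(x, y); dividing by T then gives w_T and I_T. Cycles are stored as tuples starting at the revisited state, so rotations of the same cycle may appear under distinct keys.

```python
from collections import defaultdict

def sample_cycle_counts(traj):
    """Pop cycles from a trajectory (the shared loop of Algorithms 1 and 2).

    Returns per-cycle counts N_T^gamma (Algorithm 1) and pairwise counts
    N~_T(x, y) (Algorithm 2); dividing by T = len(traj) gives w_T and I_T.
    """
    eta = [traj[0]]                  # auxiliary chain, initialized at X_1
    cycle_counts = defaultdict(int)  # N_T^gamma, keyed by cycle tuple
    pair_counts = defaultdict(float) # N~_T(x, y)
    for x in traj[1:]:
        if x not in eta:             # no cycle closed yet: extend eta
            eta.append(x)
            continue
        p = eta.index(x)             # x closes the cycle [x, eta_{p+1}, ..., eta_l]
        gamma = tuple(eta[p:])
        eta = eta[:p + 1]            # keep the first occurrence of x in eta
        cycle_counts[gamma] += 1     # Algorithm 1 update
        for u in gamma:              # Algorithm 2 update
            for v in gamma:
                pair_counts[(u, v)] += 1.0 / len(gamma)
    return cycle_counts, pair_counts
```

For example, the trajectory (1, 2, 3, 1, 2, 1) pops the cycles (1, 2, 3) and (1, 2).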

Remark 3 (Timeseries perspective). In many applications, the network G will be a representation of a finite timeseries of data (Xn)1≤n≤T . In such situations it is natural to use the sampling algorithm directly on the data, which fixes T and removes any convergence issues. Our method then turns into a form of timeseries analysis via recurrence networks. This point of view is discussed in [3].

Remark 4 (Alternative algorithms). Various alternative algorithms exist which obtain a cycle decomposition {Γ̃, w̃} satisfying only (3), but not (4). We quickly review two examples:
(a) Deterministic iterative algorithm [15, 14]: This algorithm iterates the following steps: (i) find a cycle γ in G, (ii) set w(γ) = min_{(xy)⊂γ} Fxy, (iii) update F → F − w(γ)χγ, where χγ is the unit flow along γ. The iteration stops when F < TOL is reached for a prescribed small TOL > 0. This algorithm has polynomial complexity in the number of vertices of G.
(b) Cycle-basis construction [15]: The space C of cycles is a vector space of dimension d = |E| − |V| + 1 and can be equipped with a scalar product. Then we may take Γ̃ to be a basis of C, which can be found e.g. via a minimal spanning tree [15]. The weights w̃ can be computed by solving a linear system involving the d × d matrix C_{γγ'} = ⟨χγ, χγ'⟩.
If we use (a) or (b) instead of the sampling algorithm in step (2), then the resulting weights Ĩxy lose their interpretation as an information transport measure between x and y. Still, in many cases Ĩxy may be a good approximation to Ixy, and (a) and (b) are fast even for large networks. The approximation quality of Ĩxy obtained by (a) and (b) will be addressed in future work.
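Variant (a) can be sketched in a few lines. The sketch below uses our own naming and assumes F is a balanced flow (e.g. Fxy = πx pxy), given as a dict keyed by edges: it repeatedly follows positive-flow edges until a node repeats, pops that cycle with its minimal edge flow, and subtracts it.

```python
def cycle_decomposition(F, tol=1e-12):
    """Iteratively decompose a balanced flow F[(x, y)] into weighted cycles:
    find a cycle, subtract its minimal edge flow, repeat until F < tol."""
    F = dict(F)
    cycles = []
    while True:
        live = [e for e, f in F.items() if f > tol]  # edges still carrying flow
        if not live:
            break
        path, seen = [live[0][0]], {live[0][0]}
        while True:
            x = path[-1]
            # follow any outgoing edge with positive residual flow
            y = next(v for (u, v), f in F.items() if u == x and f > tol)
            if y in seen:
                i = path.index(y)
                gamma = path[i:]  # cycle y -> ... -> x -> y
                edges = [(gamma[k], gamma[(k + 1) % len(gamma)])
                         for k in range(len(gamma))]
                w = min(F[e] for e in edges)
                for e in edges:   # update F -> F - w(gamma) * chi_gamma
                    F[e] -= w
                cycles.append((tuple(gamma), w))
                break
            path.append(y)
            seen.add(y)
    return cycles
```

Since each pop zeroes at least one edge, the loop terminates; the resulting {Γ̃, w̃} satisfies (3) but, as noted above, not (4).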

3.2. Fuzzy network partitioning. To partition KG in step (3), one can use any method designed for clustering undirected, weighted graphs [12, 25]. In this paper, we will use Markov State Modeling (MSM) clustering [9, 33], as it can find multiscale fuzzy network partitions. More precisely, MSM clustering returns m module cores M1, . . . , Mm ⊂ V, which are disjoint and do not necessarily form a full partition:
∪_{i=1}^m Mi ≠ V ⇒ T = V \ ∪_{i=1}^m Mi ≠ ∅.

The number of modules m can be inferred from the spectrum of P , see [10] for more details. Furthermore, the MSM approach aims at finding fuzzy network partitions into modules, such that some nodes do not belong to only one of the modules but to several or to none at all. For nodes that do not deterministically belong to one of the modules, we will say that they belong to the transition region T . The next obvious question is how to characterize the affiliation of the nodes from the transition region to one of the modules. In the MSM approach, this affiliation is given in terms of the random walk process (Zn)n on KG by the committor functions

q_i(x) = P[ τ_x( ∪_{j≠i} M_j ) > τ_x(M_i) ], i = 1, . . . , m, x ∈ V,
where τ_x(A) is the first hitting time of the set A ⊂ V by the process (Zn)n started in node x, τ_x(A) = inf{n ≥ 0 : Z_n ∈ A, Z_0 = x}. Therefore, q_i(x) is the probability that the random walk process, started in node x, will enter the module M_i before any other module. Committor functions can be computed very efficiently by solving sparse, symmetric and positive definite linear systems [24, 10]

(Id − P) q_i(x) = 0, ∀x ∈ T,
q_i(x) = 1, ∀x ∈ M_i, (17)
q_i(x) = 0, ∀x ∈ M_j, j ≠ i.

Furthermore, it is easy to see that the committor functions q1, . . . , qm form a partition of unity,
Σ_{i=1}^m q_i(x) = 1, ∀x ∈ V,
such that we can interpret q_i(x) as the natural random walk based probability of affiliation of a node x to the module M_i.
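For small networks, (17) can be solved directly. Below is a minimal pure-Python sketch (our own naming; P is assumed to be a dict mapping each node to a dict of out-transition probabilities). In practice one would use a sparse solver instead of dense Gaussian elimination.

```python
def committor(P, M_i, M_other, nodes):
    """Solve (Id - P) q = 0 on the transition region T,
    with q = 1 on the module M_i and q = 0 on all other modules, cf. (17)."""
    T = [x for x in nodes if x not in M_i and x not in M_other]
    n = len(T)
    # assemble the linear system (I - P_TT) q_T = P_{T, M_i} * 1
    A = [[(1.0 if i == j else 0.0) - P[x].get(T[j], 0.0) for j in range(n)]
         for i, x in enumerate(T)]
    b = [sum(P[x].get(m, 0.0) for m in M_i) for x in T]
    # naive Gaussian elimination with partial pivoting
    for k in range(n):
        piv = max(range(k, n), key=lambda r: abs(A[r][k]))
        A[k], A[piv] = A[piv], A[k]
        b[k], b[piv] = b[piv], b[k]
        for r in range(k + 1, n):
            f = A[r][k] / A[k][k]
            for c in range(k, n):
                A[r][c] -= f * A[k][c]
            b[r] -= f * b[k]
    q_T = [0.0] * n
    for k in reversed(range(n)):
        q_T[k] = (b[k] - sum(A[k][c] * q_T[c]
                             for c in range(k + 1, n))) / A[k][k]
    q = {x: 1.0 for x in M_i}          # boundary condition on M_i
    q.update({x: 0.0 for x in M_other})  # boundary condition on the other modules
    q.update({T[i]: q_T[i] for i in range(n)})
    return q
```

On a 5-node path with nearest-neighbour hopping and modules {0} and {4}, this reproduces the classical gambler's-ruin committor q(x) = (4 − x)/4.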

3.3. Verification of modularity. It is always important to assess the quality of the partition {χ_m}_{m=1}^M returned by any clustering algorithm. An obvious way of doing so is to consider systems where one has prior knowledge about the features a good clustering should capture, and we will do so in Section 4. Additionally, one may use the well-known modularity function [27] to evaluate the quality of the resulting partition. Note that we have two choices here: we can evaluate the partition on the original directed graph G, or we can utilize the cycle decomposition once more and evaluate {χ_m}_{m=1}^M on the undirected communication graph KG. In probabilistic terms, we ask for the modularity as seen by either the standard RW process (Xn)n or the node-cycle-node process (Zn)n, cf. Definition 2.3. In the latter case we obtain
Q̄ = Σ_m Σ_{x,y} χ_m(x) [P(Z_n = x, Z_{n+1} = y) − P(Z_n = x) P(Z_n = y)] χ_m(y)
  = Σ_m Σ_{x,y} χ_m(x) [I_xy − π_x π_y] χ_m(y). (18)

In the former case we obtain in the same manner
Q = Σ_m Σ_{x,y} χ_m(x) [P(X_n = x, X_{n+1} = y) − P(X_n = x) P(X_n = y)] χ_m(y)
  = Σ_m Σ_{x,y} χ_m(x) [π_x p_xy − π_x π_y] χ_m(y), (19)
which agrees with an extension of the original modularity function to directed networks [20, 16]. Of course one can find 'optimal' partitions by optimizing over Q resp. Q̄, and we will consider these alternative clustering algorithms as well in Section 4. However, approaches based on maximizing Q have been shown to often neglect important directional information [18, 36] and to produce the same outcome as when symmetrizing P. More precisely, Q is unchanged if π_x p_xy is replaced by the symmetrized version π_x p^s_xy = (1/2)(π_x p_xy + π_y p_yx), which ignores directions. In contrast, using Q̄ to evaluate the quality of a partitioning of KG is unambiguous, and the directional information of G is directly built into I_xy. We will demonstrate on some examples in Section 4 that Q and Q̄ may differ significantly. This difference comes mainly from the fact that Q takes into account only one-step transitions p_xy, unlike Q̄, which considers the multi-step transitions encoded in I_xy via the cycle structures of G.
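Both (18) and (19) are instances of one quadratic form, differing only in the stationary pair probability that is plugged in. A small sketch (our own naming) for full partitions with χ_m(x) ∈ {0, 1}:

```python
def modularity(flow, pi, partition):
    """Q = sum_m sum_{x,y in module m} [flow(x, y) - pi_x * pi_y].

    With flow(x, y) = pi_x * p_xy this is the directed modularity (19);
    with flow(x, y) = I_xy it is the communication-graph modularity (18).
    """
    return sum(flow(x, y) - pi[x] * pi[y]
               for module in partition
               for x in module
               for y in module)
```

For two disconnected 2-cycles with uniform stationary distribution, the natural two-module partition scores Q = 1/2.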

4. Numerical experiments. In this section, we will compare the results of three different clustering algorithms on a few example networks. The algorithms consid- ered are:

1. The algorithm proposed in Section 3, i.e. construction of KG via a realization (Xn)1≤n≤T and then MSM clustering of KG, which we will refer to as Cycle MSM (CMSM) clustering. This algorithm returns a fuzzy partition {χ_m}_{m=1}^M.
2. Construct KG via a realization (Xn)1≤n≤T and then optimize the KG-based modularity function Q̄ (18) over all full partitions {χ_m}_{m=1}^M, χ_m(x) ∈ {0, 1}. We will refer to this as Q̄-maximization.
3. Maximize the G-based modularity function Q (19) over all full partitions {χ_m}_{m=1}^M, χ_m(x) ∈ {0, 1}. This direct approach does not need the construction of KG. We will refer to it as Q-maximization.
The optimizations of Q and Q̄ were performed with the algorithm in [13].

4.1. Barbell graph. As discussed earlier, the communication graph KG consists of two complete graphs with n nodes each, joined by the edge (l0, r0). CMSM clustering produces a full partition {χ1, χ2} where χ1 is one on C1 = αl and zero on C2 = αr, see Figure 6. The scores assigned by Q and Q̄ to {χ1, χ2} are almost identical:
Q(C1, C2) = 1/2 − ε/(n + ε),  Q̄(C1, C2) = 1/2 − (1/2) ε/(n + ε).

Now we address the question whether {χ1, χ2} optimizes Q resp. Q̄. To do so, we compare {χ1, χ2}, for even n, with another partition into the three sets C1,a, C1,b and C2, where |C1,a| = |C1,b|, see Figure 6. Let ∆Q = Q(C1, C2) − Q(C1,a, C1,b, C2) and define ∆Q̄ similarly. One finds
∆Q = 1/8 − 1/(n + ε),  ∆Q̄ = 1/8 − (1/4) n/(n + ε).

Then ∆Q > 0 for n ≥ 8, while ∆Q̄ < 0 as long as n > ε. That is, as n grows, Q-maximization produces a partition into more and more subchains with fewer than 8 nodes, while Q̄-maximization always produces the partition {χ1, χ2}. Notice that the block structure (15) indicates that it is equally easy to transmit information between all nodes in C1, while it is hard to transmit information between C1 and C2. From the point of view of the RW process this is natural: the average time to go from C1 to C2 is (n + 1/2)(1 + ε)/ε, while mixing within C1, and in particular transitions between C1,a and C1,b, happens quickly. To summarize, the barbell graph has a modular structure which is not based on link density and therefore cannot be detected by density based methods such as Q-maximization. Rather, this modular structure is information based and revealed by KG. In the particular case here, using CMSM clustering or Q̄-optimization to partition KG yields the same result.
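The sign conditions for ∆Q and ∆Q̄ are easy to verify with exact rational arithmetic. A small sketch, transcribing the displayed formulas (function names are ours):

```python
from fractions import Fraction

def delta_Q(n, eps):
    # Delta Q = 1/8 - 1/(n + eps), cf. the barbell example in the text
    return Fraction(1, 8) - Fraction(1) / (n + eps)

def delta_Qbar(n, eps):
    # Delta Qbar = 1/8 - (1/4) * n/(n + eps)
    return Fraction(1, 8) - Fraction(1, 4) * Fraction(n) / (n + eps)
```

For example, with ε = 1/10 one finds ∆Q > 0 at n = 8 but ∆Q < 0 at n = 4, and ∆Q̄ < 0 whenever n > ε, matching the signs stated in the text.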


Figure 6. The two modules C1 and C2 produced by CMSM clustering. Q-maximization gives a partitioning into subchains of the type C1,a, C1,b.

4.2. Wheel switch. Our next example is the 'wheel switch' graph shown in Figure 7a. It consists of a clockwise outer loop αo and an anticlockwise inner loop αi with n nodes each, and some connections between the two loops. At each connection, the probability to switch between αo and αi is p, and the connections are such that two smaller loops β1 and β2 of length 4 are formed (coloured in Figure 7a). The community structure of this graph depends on p. If p > 1/2, completing the small loops β1 and β2 is more likely than completing αi and αo, so we expect a community structure where β1 and β2 dominate. Indeed, CMSM clustering produces the partition shown in Figure 7a, which identifies β1 and β2 as modules and gives all other nodes an affiliation of 0.5 to both modules (indicated by the white colour). The modularity values achieved by this partitioning depend on p and n, but typical values are Q̄ ≈ 0.2. If p ≤ 1/2, transitions between αi and αo are unlikely and we expect to find the community structure shown in Figure 7b. Indeed, CMSM clustering produces exactly this partition. As in the previous example, for p ≤ 1/2, Q̄-maximization gives the same result. On the other hand, and again as in the previous example, Q-maximization divides the red and green modules in Figure 7b further into many small chains of length less than 8. Also for 1/2 < p ≤ 0.63, Q̄-maximization produces the partition shown in Figure 7b, while for p > 0.63 it gives the one shown in Figure 7c. Note that the only difference between Figure 7a and Figure 7c is that the white nodes are now fully assigned to one of the two modules, in such a way that as much of αi and αo is kept intact as possible. Q-maximization again produces many small chains of length less than 8 as modules, and the exact number of new modules depends on n.
Even if we restrict Q-maximization to partitions with only two modules, the optimum is degenerate: any cut S that produces modules of equal size and leaves β1 and β2 intact, as shown in Figure 7d, will optimize Q. In summary, CMSM clustering exactly reproduces the expected community structure for all values of p and n. Q-maximization destroys the community structure for large n and produces many small chains instead. Q̄-maximization cannot produce the fuzzy partition of Figure 7a, but instead produces the best possible full partition, independent of n. However, the p-threshold between outcomes 7a and 7b is p = 0.5 for CMSM clustering, while it is p = 0.63 between outcomes 7b and 7c for Q̄-maximization.


Figure 7. Graph of the wheel switch with n = 10, where p is the probability to switch between the outer and inner loop. The coloring: (a) shows the modules found by CMSM clustering for p > 1/2; (b) shows the clustering found by CMSM for p ≤ 1/2; (c) shows the clustering resulting from Q̄-maximization for p > 0.63; and (d) shows one example of a clustering obtained by Q-maximization for p > 1/2.

4.3. Langevin dynamics. Recent years have seen a continuous increase in research on so-called Markov State Models (MSM) for molecular dynamics (MD). MSMs are based on a discretization of the transfer operator associated with MD and aim at the reproduction of the longest timescales in MD, see [39, 5]. However, most approaches to MSM building assume that the underlying dynamics is reversible [5], and MSM building for non-reversible MD is still in its infancy. The most popular model for non-reversible MD is the so-called Langevin system [32]:

ẋ = p,  ṗ = −∇_x V(x) − γ p + σ Ẇ_t, (20)
where x is the vector of the Euclidean coordinates of all atoms in the molecular system, p the vector of associated momenta, V(x) the interaction energy, and −∇_x V(x) the vector of all inter-atom interaction forces. γ > 0 denotes some friction constant and F_ext = σ Ẇ_t the external forcing given by a 3N-dimensional Brownian motion W_t. The external stochastic force is assumed to model the influence of the heat bath surrounding the molecular system. The total internal energy given by the Hamiltonian H(x, p) = p²/2 + V(x) is not preserved, but the interplay between stochastic excitation and damping balances the internal energy. As a consequence, the canonical pdf µ(x, p) ∝ exp(−βH(x, p)) is invariant w.r.t. the Markov process corresponding to the Langevin system under weak growth conditions on the potential energy function [23], if the noise and damping constants satisfy [32]:
β = 2γ / σ². (21)
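System (20) can be integrated with a simple Euler–Maruyama scheme; the following minimal sketch (function name, step size and seed are ours, for illustration) uses the one-dimensional double well V(x) = (x² − 1)² considered below, with σ chosen from the fluctuation–dissipation relation (21):

```python
import math
import random

def langevin_trajectory(x0, p0, beta=5.0, gamma=0.2, dt=1e-3,
                        n_steps=10000, seed=0):
    """Euler-Maruyama discretization of (20) with V(x) = (x^2 - 1)^2."""
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * gamma / beta)   # from beta = 2 gamma / sigma^2
    x, p = x0, p0
    traj = [(x, p)]
    for _ in range(n_steps):
        grad_V = 4.0 * x * (x * x - 1.0)    # V'(x) for the double well
        x += dt * p
        p += dt * (-grad_V - gamma * p) \
             + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        traj.append((x, p))
    return traj
```

A trajectory produced this way could then be discretized over a box covering to estimate the transition matrix, as described for the example below.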

The Langevin process (Y_t) is not reversible, but satisfies the so-called extended detailed balance condition
P[Y_{t+T} = (x', p') | Y_t = (x, p)] µ(x, p) = P[Y_{t+T} = (x, −p) | Y_t = (x', −p')] µ(x', −p'), (22)
for arbitrary t and T > 0. We consider Langevin dynamics with one-dimensional x and p and the double well potential V(x) = (x² − 1)². Figure 8 shows the phase space of this system together with some periodic orbits of the associated Hamiltonian system. In this case the process has two metastable sets around the minima x± = ±1 of the potential V and p = 0. We set β = 5 and γ = 0.2, so that the Langevin dynamics is still "close" to the associated Hamiltonian system

ẋ = p,  ṗ = −∇_x V(x), (23)
i.e., if we start the Langevin process in (x0, p0) with energy E0 = H(x0, p0) < 0.9, then the dynamics will approximately follow the periodic orbit H(x, p) = E0 (with x < 0 if x0 < 0 and x > 0 if x0 > 0) of the associated Hamiltonian system for some time interval of order 1. Therefore the typical transition from the vicinity of one of the wells across the energy barrier at x = 0 towards the other well looks as follows: first the trajectory orbits the initial well for some period of time before it crosses the barrier and starts to orbit the target well, until it finally hits the close vicinity of the respective energy minimum. For this Langevin process, MSM building is done as follows [5, 39]: we construct a uniform box covering of the essential state space [−1.8, 1.8] × [−1.8, 1.8], where the invariant pdf is larger than the square root of the machine precision. We took square boxes B_i, i = 1, . . . , n, of size ∆x = 0.2 and ∆p = 0.2. Next, M = 1000 trajectories of length τ = 0.25 were started in every box (distributed according to µ), and the transition matrix
P_ij = m_ij / M


Figure 8. Some periodic orbits of the system with Hamiltonian H = p²/2 + V(x). The coloring is according to the energy E0 = H(x0, p0).

was constructed, where m_ij is the number of trajectories starting in B_i and ending up in B_j. The length τ = 0.25 of the trajectories is very small compared to the expected transition time (which is larger than 100 here) and still shorter than the period of the periodic orbits of the associated Hamiltonian system. MSM theory [39] tells us that the leading eigenvalues of the transition matrix P are very close approximations of the leading eigenvalues of the Langevin transfer operator and thus allow for an approximation of the transition statistics between the metastable wells. The fuzzy partitioning found by CMSM is shown in Figures 9a and 9b. It achieves a modularity score of Q̄ = 0.39. We observe that the cluster centers, i.e. the boxes x with q1(x) = 1 or q1(x) = 0, respect the rotational symmetry (x, p) ↦ (−x, −p) that the Hamiltonian system (23) enjoys. Furthermore, the affiliation function q1 picks out the separatrix, i.e. the curve E0 = 1 which separates orbits that are restricted to one of the minima from orbits around both minima, in the following sense: boxes inside the separatrix either have q1(x) ≈ 0 or q1(x) ≈ 1, while boxes outside have q1(x) ≈ 0.5. This is in perfect agreement with the intuition that the Langevin system is still close to the Hamiltonian system. Q-maximization produces the clustering shown in Figure 9c, with Q = 0.603. We observe five larger modules and a number of modules of size one at the boundary of the phase space (shown in black in 9c); this is due to the stationary distribution being almost zero there. The metastable cores (x±, 0) are identified, but the two clusters largest in size seem mostly arbitrary, and the separatrix is not identified. The result of Q̄-maximization is shown in Figure 9d, with Q = 0.502. Again, we observe a number of small modules of size one at the boundary of the phase space as well as four larger modules. The largest module roughly corresponds to the region outside the separatrix, and the metastable cores are correctly identified.

Figure 9. Clustering results for the Langevin system. (a): CMSM clustering, cluster centers. (b): CMSM clustering, affiliation q1 and separatrix E0 = 1 (blue curve). (c): Q-maximization. (d): Q̄-maximization. Black boxes in (c) and (d) are clusters of size one.

5. Conclusion. In this paper we discussed the problem of finding fuzzy partitions of weighted, directed networks. The main difficulty compared to the well studied case of analyzing undirected networks is that considering asymmetric edge directions often leads to more complicated network substructures. For this reason most of the clustering approaches for undirected networks cannot be generalized to directed networks in a straightforward way. Although different new methods have been proposed, most of them fail to capture all directional information important for module finding, as they consider only one-step, one-directional transitions, which can lead to the detection of false modules (where nodes are closely connected only in one direction) and to over-partitioning of the network. To this end, we introduced herein a novel measure of communication between nodes which is based on multi-step, bidirectional transitions encoded by a cycle decomposition of the probability flow. We have shown that this measure is symmetric and that it reflects the information transport in the network. Furthermore, this enabled us to construct an undirected communication graph that captures the information flow of the original graph, and to apply clustering methods designed for undirected graphs. We applied our new method to several examples, showing how it overcomes the essential limitations of common methods. This indicates that our method can be used to circumvent the problems of one-step methods, like modularity based methods. Further generalizations of the modularity function and the consideration of different null models will be a topic of our future research. An important aspect of our method is its role in describing non-equilibrium Markov processes and in particular the connection with the entropy production rate. Our future research will also be oriented towards these problems.

Acknowledgments. The authors would like to thank Peter Koltai, Stefan Klus and Marco Sarich for insightful discussions. RB thanks the Dahlem Research School (DRS) and the Berlin Mathematical School (BMS).

Appendix.

Proof of equation 7. We want to find the transition matrix for (ξn)n:
P(ξn = β | ξn−1 = α) = Σ_x P(ξn = β | ξn−1 = α, Xn = x) P(Xn = x | ξn−1 = α)
= Σ_x P(ξn = β | Xn = x) P(Xn = x | ξn−1 = α)
= Σ_{x∈(α∩β)} b_β^{(x)} P(Xn = x | ξn−1 = α)

by the properties of (Xn)n and (ξn)n and the definition of b_β^{(x)}. Now

P(Xn = x | ξn−1 = α)
= Σ_y P(Xn = x | ξn−1 = α, Xn−1 = y) P(Xn−1 = y | ξn−1 = α)
= Σ_y P(Xn = x | ξn−1 = α, Xn−1 = y) P(ξn−1 = α | Xn−1 = y) P(Xn−1 = y) / P(ξn−1 = α)
= Σ_{y⊂α} P(Xn = x | ξn−1 = α, Xn−1 = y) b_α^{(y)} π_y / (|α| F(α))
= Σ_{y⊂α} P(Xn = x | ξn−1 = α, Xn−1 = y) (1/|α|).

Since Xn is actually determined by the pair (ξn−1, Xn−1), we have

P(Xn = x | ξn−1 = α, Xn−1 = y) = 1 if y = n_α^−(x), and 0 otherwise,

where n_α^−(x) is the vertex we reach next if we follow the cycle α in reverse orientation starting at x. Putting it all together, we have

Q_αβ := P(ξn = β | ξn−1 = α) = (1/|α|) Σ_{x∈(α∩β)} b_β^{(x)}.

Proof of Lemma 2.5. (i) follows from (ii) by the fact that P and Q are stochastic. To show (ii), π_x P_xy = π_y P_yx and µ(α) Q_αβ = µ(β) Q_βα can be directly inferred from (11) and (12). (iii) follows from (iv). To show (iv), let λ ≠ 0 and v ∈ ker(Q − λ1). This implies VBv = λv; multiplication from the left with B yields PBv = λBv, hence Bv ∈ ker(P − λ1). Now, since Q = VB, we have ker B ⊂ ker Q ⊥ ker(Q − λ1). Therefore B is injective as a map from ker(Q − λ1) to ker(P − λ1). To show that B is also surjective, take w ∈ ker(P − λ1) and define v := (1/λ) Vw. Then v ∈ ker(Q − λ1), since Qv = (1/λ) QVw = (1/λ) VPw = (1/λ) Vλw = λv, and Bv = (1/λ) BVw = w.

REFERENCES

[1] B. Altaner, S. Grosskinsky, S. Herminghaus, L. Katthän, M. Timme and J. Vollmer, Network representations of nonequilibrium steady states: Cycle decompositions, symmetries, and dominant paths, Phys. Rev. E, 85 (2012), 041133, URL http://link.aps.org/doi/10.1103/PhysRevE.85.041133.
[2] A. Arenas, J. Duch, A. Fernández and S. Gómez, Size reduction of complex networks preserving modularity, New Journal of Physics, 9 (2007), p176.
[3] R. Banisch and N. D. Conrad, Cycle-flow based module detection in directed recurrence networks, EPL (Europhysics Letters), 108 (2014), 68008, URL http://stacks.iop.org/0295-5075/108/i=6/a=68008.
[4] A. Barrat, M. Barthelemy, R. Pastor-Satorras and A. Vespignani, The architecture of complex weighted networks, Proceedings of the National Academy of Sciences of the United States of America, 101 (2004), 3747–3752, URL http://www.pnas.org/content/101/11/3747.abstract.
[5] G. R. Bowman, V. S. Pande and F. Noé, editors, An Introduction to Markov State Models and Their Application to Long Timescale Molecular Simulation, volume 797 of Advances in Experimental Medicine and Biology, Springer, 2014.
[6] J. Chen and B. Yuan, Detecting functional modules in the yeast protein–protein interaction network, Bioinformatics, 22 (2006), 2283–2290.
[7] D. Cvetkovic, P. Rowlinson and S. Simic, Spectral Generalizations of Line Graphs, Cambridge University Press, 2004, Cambridge Books Online.
[8] P. Deuflhard and M. Weber, Robust Perron cluster analysis in conformation dynamics, Linear Algebra and its Applications, 398 (2005), 161–184, URL http://www.sciencedirect.com/science/article/pii/S0024379504004689, Special Issue on Matrices and Mathematical Biology.
[9] N. Djurdjevac, S. Bruckner, T. O. F. Conrad and C. Schütte, Random walks on complex modular networks, Journal of Numerical Analysis, Industrial and Applied Mathematics, 6 (2011), 29–50.
[10] N. Djurdjevac, M. Sarich and C. Schütte, Estimating the eigenvalue error of Markov state models, Multiscale Modeling & Simulation, 10 (2012), 61–81.
[11] T. S. Evans and R. Lambiotte, Line graphs, link partitions, and overlapping communities, Phys. Rev. E, 80 (2009), 016105.
[12] S. Fortunato, Community detection in graphs, Physics Reports, 486 (2010), 75–174, URL http://www.sciencedirect.com/science/article/pii/S0370157309002841.
[13] M. Girvan and M. Newman, Community structure in social and biological networks, Proceedings of the National Academy of Sciences of the United States of America, 99 (2002), 7821–7826, URL http://www.scopus.com/inward/record.url?eid=2-s2.0-0037062448&partnerID=40&md5=11975daf89980be3e7f9b3ee3051796d.
[14] D. Jiang, M. Qian and M.-P. Qian, Mathematical Theory of Nonequilibrium Steady States: On the Frontier of Probability and Dynamical Systems, Springer, 2004.
[15] S. L. Kalpazidou, Cycle Representations of Markov Processes, Springer, 2006.
[16] Y. Kim, S.-W. Son and H. Jeong, Finding communities in directed networks, Phys. Rev. E, 81 (2010), 016103.
[17] R. Lambiotte, J. C. Delvenne and M. Barahona, Laplacian dynamics and multiscale modular structure in networks, ArXiv.
[18] A. Lancichinetti and S. Fortunato, Benchmarks for testing community detection algorithms on directed and weighted graphs with overlapping communities, Phys. Rev. E, 80 (2009), 016118.
[19] A. Lancichinetti, F. Radicchi, J. J. Ramasco and S. Fortunato, Finding statistically significant communities in networks, PLoS ONE, 6 (2011), e18961.
[20] E. A. Leicht and M. E. J. Newman, Community structure in directed networks, Phys. Rev. Lett., 100 (2008), 118703.
[21] T. Li, J. Liu and W. E, Probabilistic framework for network partition, Phys. Rev. E, 80 (2009), 026106.
[22] F. D. Malliaros and M. Vazirgiannis, Clustering and community detection in directed networks: A survey, Physics Reports, 533 (2013), 95–142, URL http://www.sciencedirect.com/science/article/pii/S0370157313002822.
[23] J. Mattingly, A. M. Stuart and D. J. Higham, Ergodicity for SDEs and approximations: Locally Lipschitz vector fields and degenerated noise, Stochastic Process Appl., 101 (2002), 185–232.
[24] P. Metzner, C. Schütte and E. Vanden-Eijnden, Transition path theory for Markov jump processes, Multiscale Modeling & Simulation, 7 (2008), 1192–1219.
[25] M. E. J. Newman, The structure and function of complex networks, SIAM Review, 45 (2003), 167–256.
[26] M. E. J. Newman, Fast algorithm for detecting community structure in networks, Phys. Rev. E, 69 (2004), 066133.
[27] M. E. J. Newman, Modularity and community structure in networks, Proceedings of the National Academy of Sciences, 103 (2006), 8577–8582.
[28] V. Nicosia, G. Mangioni, V. Carchiolo and M. Malgeri, Extending the definition of modularity of directed graphs with overlapping communities, Journal of Statistical Mechanics: Theory and Experiment, 2009 (2009), p03024.
[29] P. Pakoński, G. Tanner and K. Życzkowski, Families of line-graphs and their quantization, Journal of Statistical Physics, 111 (2003), 1331–1352.
[30] G. Palla, I. Derenyi, I. Farkas and T. Vicsek, Uncovering the overlapping community structure of complex networks in nature and society, Nature, 435 (2005), 814–818.
[31] M. A. Porter, J.-P. Onnela and P. J. Mucha, Communities in networks, Notices of the American Mathematical Society, 56 (2009), 1082–1097.
[32] H. Risken, The Fokker-Planck Equation, 2nd edition, Springer, New York, 1996.
[33] M. Sarich, N. Djurdjevac, S. Bruckner, T. O. F. Conrad and C. Schütte, Modularity revisited: A novel dynamics-based concept for decomposing complex networks, Journal of Computational Dynamics, 1 (2014), 191–212.
[34] M. Sarich, F. Noé and C. Schütte, On the approximation quality of Markov state models, Multiscale Modeling & Simulation, 8 (2010), 1154–1177.
[35] M. Sarich, C. Schütte and E. Vanden-Eijnden, Optimal fuzzy aggregation of networks, Multiscale Modeling & Simulation, 8 (2010), 1535–1561.
[36] M. T. Schaub, J.-C. Delvenne, S. N. Yaliraki and M. Barahona, Markov dynamics as a zooming lens for multiscale community detection: Non clique-like communities and the field-of-view limit, PLoS ONE, 7 (2012), e32210.
[37] M. T. Schaub, R. Lambiotte and M. Barahona, Encoding dynamics for multiscale community detection: Markov time sweeping for the map equation, Phys. Rev. E, 86 (2012), 026112.
[38] J. Schnakenberg, Network theory of microscopic and macroscopic behavior of master equation systems, Rev. Mod. Phys., 48 (1976), 571–585.
[39] Ch. Schütte and M. Sarich, Metastability and Markov State Models in Molecular Dynamics: Modeling, Analysis, Algorithmic Approaches, volume 24 of Courant Lecture Notes, American Mathematical Society, December 2013.
[40] A. Viamontes Esquivel and M. Rosvall, Compression of flow can reveal overlapping-module organization in networks, Phys. Rev. X, 1 (2011), 021025.
[41] R. K. P. Zia and B. Schmittmann, Probability currents as principal characteristics in the statistical mechanics of non-equilibrium steady states, Journal of Statistical Mechanics: Theory and Experiment, 2007 (2007), p07012.

Received August 2014; revised February 2015.
E-mail address: [email protected]
E-mail address: [email protected]
E-mail address: [email protected]