Mining evolving network processes

Misael Mongiovì, Petko Bogdanov, Ambuj K. Singh

Abstract: Processes within real world networks evolve according to the underlying graph structure. A bounty of examples exists in diverse network genres: botnet communication growth, moving traffic jams [25], information foraging [37] in document networks (WWW and Wikipedia), and spread of viral memes or opinions in social networks. The network structure in all the above examples remains relatively fixed, while the shape, size and position of the affected network regions change gradually with time. Traffic jams grow, move, shrink and eventually disappear. Public attention shifts among current hot topics, inducing a similar shift of highly accessed Wikipedia articles. Discovery of such smoothly evolving network processes has the potential to expose the intrinsic mechanisms of the dynamics, enable new data-driven models and improve network design.

We introduce the novel problem of Mining smoothly evolving processes (MINESMOOTH) in networks with dynamic real-valued node/edge weights. We show that ensuring smooth transitions in the solution is NP-hard even on restricted network structures such as trees. We propose an efficient filtering-based framework, called LEGATO. It achieves 3-7 times improvement in the obtained process scores (i.e. larger and stronger-impact processes) compared to alternatives on real networks, and above 80% accuracy in discovering realistic “embedded” processes in synthetic networks. In transportation networks, LEGATO discovers processes that conform to existing theoretical models for traffic jams, while its obtained processes in Wikipedia reveal the temporal evolution of information seeking of users.

I. INTRODUCTION

Network processes in domains ranging from biology and neuroscience to transportation and information networks evolve temporally, following the underlying graph topology. One example is the process of fact searching in information networks such as Wikipedia. The network structure of interlinked pages remains fixed (relative to the rate of accesses to them) and, as major events unfold, the number of views of subnetworks of related pages increases above normal levels. An important question then arises: How can we capture the evolution of attention focus of information seekers in large networks such as Wikipedia?

One such information foraging process that we discovered in Wikipedia is summarized in Fig. 1. It coincides with the final games from the Cricket World Cup in 2011 and captures the gradual shift of interest across pages of currently playing teams, players, statistics and cricket rules pages. The underlying structure of the information (hyperlink) graph remains constant, but edge significance scores change in time, with high values if both connected pages receive higher than usual traffic (shaded nodes have high-value links). A temporally contiguous series of subgraphs that include many high-value edges captures the evolving process of information foraging [37] of fans who surf Wikipedia to get trivia and facts about teams, players, rules and records as the games proceed. This process is naturally localized in the information graph as relevant pages are well interlinked and those links direct people in their search for facts. In addition, the attention of information seekers smoothly shifts from team to team as the world cup progresses. Our goal is to capture such localized, contiguous and smooth processes within the global dynamics of the underlying network.

Fig. 1. The information searching process in Wikipedia coinciding with the Cricket World Cup finals in 2011. The process is represented by an evolving network region of highly accessed pages. As the games proceed, attention shifts to currently playing teams and eventually onto the winning team. [Figure: five snapshots (28/3, 29/3, 30/3 - 31/3, 1/4 - 4/4, 5/4) of a subgraph whose nodes include India NCT, New Zealand NCT, Sri Lanka NCT, Pakistan NCT, NZ Cricket Players, Cricket Pages, 2011 Cricket World Cup and Finals Schedule.]

Similar networked evolving processes are common to other domains as well. Cellular signaling pathways have traditionally been modeled as dynamic networked systems, whose evolution includes spatial and temporal switching and oscillating behavior in the global cell activity [42]. The nervous system also exhibits dynamic behavior within a fixed connectivity of neurons, with both theoretical [10] and experimental [14] support for the existence of evolving activation processes. In transportation, computer and social networks, diverse processes like traffic jams, denial of service attacks and social rumors all evolve gradually along the underlying structure. Traffic jams shift spatially in highway networks [21], [40]; malicious botnet activity propagates in computer networks as computers get infected and recover due to software patches [18]; and similarly rumors spread along friendship links in social networks [36].

Mining and characterization of evolving network processes has a number of important applications: it furthers the understanding of their domain-specific causes and mechanisms; it is essential in summarizing the micro-evolution of the network behavior as well as spotting anomalous behavior. In water networks, where low concentrations of chlorine indicate a contamination [31], mapping a contamination process as a growing network region may help indicate the source of contamination and predict the rate and direction of its spread. Similarly, identifying flexible congestion processes in highway networks helps the understanding of traffic jam propagation and may enable improved urban planning [25].

We propose a novel problem formulation and a corresponding solution for mining significant smoothly-evolving processes on a network with dynamic node/edge weights. In our setting, a process is a smoothly evolving subgraph in time that includes high-weighted (significant) edges. While we allow the subgraph to change in consecutive time steps to capture the evolution of the underlying phenomenon, we impose a smoothness constraint on its rate of change, ensuring that we capture a unique network process. Smoothness is important in modeling processes that evolve gradually in time due to physical constraints or limited resources (e.g. traffic jams and information foraging). Similar temporal smoothness notions have previously been adopted for different dynamic network problems such as evolutionary clustering [13].

Mining smoothly evolving processes is a computationally difficult task due to noise and the large scale of real-world networks evolving over long periods of time. Furthermore, we show that mining smoothly-evolving subgraphs is NP-hard even on simple structures such as trees. While the problem of mining fixed or connected temporal subgraphs can be solved by reduction to the corresponding single-slice (one time step) problem, this approach is not suitable for discovering evolving processes.

Our contributions are as follows:
Novelty: We introduce the problem of mining smoothly-evolving processes and prove that it is NP-hard even on trees.
Quality: We propose a high-quality method for the smoothly-evolving process mining problem that achieves above 80% accuracy in detecting realistic embedded processes and 3-7 times improvement in solution scores compared to alternatives.
Efficiency: We design a filtering-based framework for reducing the size of the input, making our high-quality algorithm feasible for large datasets.
Relevance: Processes discovered in real-world transportation networks agree with established models for traffic jam evolution, while those in information networks reveal the information seeking behavior of a large user population as a response to major events.

II. PROBLEM DEFINITION AND METHOD OVERVIEW

We first introduce some notation, define the problem of mining smoothly evolving subgraphs (MINESMOOTH) and prove that it is NP-hard even on trees. We then give a brief overview of our method.

A time-evolving network $\bar{G} = (V, E, W)$ is an undirected connected graph where $V$ is the set of nodes, $E$ is the set of edges, and $W = \{w_1, w_2, \ldots, w_{|W|}\}$ is a family of edge-weighting functions of the kind $w_t : E \to \mathbb{R}$ with positive (affected by a process) and negative (regular) values. For simplicity we consider only edge-weighted networks, although similar notions/solutions hold for node-weighted networks. Each function $w_t$ corresponds to a discrete time $t$ and defines a single snapshot static network sharing the topology of the dynamic network. The weight of an edge in time quantifies its degree of involvement in a process of interest. A high positive weight indicates strong activity (related to an underlying process), while a negative score means low or not significant process-related activity of the edge.

In our formulation, the structure is fixed, while edge weights evolve in time. A different time-evolving network formulation may involve dynamic node weights (our methods extend to such cases as well). Note that our formulation is different from alternatives focusing on network structure evolution [13]. We will use the terms time-evolving network and dynamic network interchangeably.

A network process can be summarized by aggregating a connected set of edges that altogether have high (positive) weights. For example, in a traffic network a connected set of edges that prevalently have high positive weight represents a congestion. The sum of the weights quantifies the significance of the process. Summing up weights is a powerful method that can model a variety of problems, by suitably balancing between positive and negative weights. In order to model processes in a dynamic network, we first introduce the concept of smoothly evolving subgraph, then we formalize its score as the sum of its edge weights. Smoothness is targeted to processes that evolve gradually in time (such as traffic jams and information foraging), due to physical constraints or limited resources.

A smoothly evolving subgraph is a contiguous sequence of connected subgraphs that changes gradually in time. Smoothness is controlled by a parameter α. An evolving subgraph with smoothness not exceeding α is called an α-smooth evolving subgraph. More formally:

Definition 1: Given a dynamic network $\bar{G}$ and an integer α, an α-smooth evolving subgraph $R = \{G_i, G_{i+1}, \ldots, G_j\}$ is a sequence of subgraphs (each one in a separate time slice) of $\bar{G}$ that satisfies the following constraints:
smoothness: $|E(G_t)| + |E(G_{t+1})| - 2\,|E(G_t) \cap E(G_{t+1})| \le \alpha, \ \forall t \in [i, j-1]$.
connectivity: every subgraph is connected within its time slice.
temporal connectivity: two contiguous subgraphs share at least one edge, i.e. $E(G_t) \cap E(G_{t+1}) \neq \emptyset, \ \forall t \in [i, j-1]$.
no-negative slices: all slices have positive score, i.e. $score(G_t) > 0, \ \forall t \in [i, j]$.

We measure the amount of variation between two consecutive subgraphs as the Hamming distance of their edge-sets, i.e. the number of edges that belong to exactly one of the two. For example, in Fig. 4 the evolving subgraph composed by A, B and C is a 1-smooth evolving subgraph, while D and E form a 2-smooth evolving subgraph. An alternative is to use the Jaccard similarity, which allows for a number of changes relative to the current size of the subgraph. Our complexity results and solution to the problem apply to both measures. The other constraints in the definition ensure connectivity in graph and time, respectively. The requirement of no negative slices ensures that the graph of interest evolves contiguously and does not combine components that are temporally distant (by a low-cost temporal path).

Definition 2: Score of an evolving subgraph: Given a time-evolving network $\bar{G}$, and an evolving subgraph $R = \{G_i, G_{i+1}, \ldots, G_j\}$ of $\bar{G}$, the score of $R$ is given by:

$$score(R) = \sum_{t=i}^{j} score(G_t) = \sum_{t=i}^{j} \sum_{e \in E(G_t)} w_t(e)$$

We remind the reader that edge weights can be positive or negative. A “good” evolving subgraph tends to include positive edges and omit negative ones. Now, we are ready to define the main problem.

Definition 3: Problem α-MINESMOOTH: Given a time-evolving network $\bar{G}$, and a smoothness threshold α, find the α-smooth evolving subgraph $R_\alpha$ of maximum score.

We focus on maximizing the score of one single pattern, considering it as a fundamental step that allows for wider analysis. An algorithm for optimizing a smoothly evolving
subgraph can be applied several times, erasing the positive edges of found solutions, in order to list the main processes occurring in a network.

We now discuss some special cases of our problem. None of them is appropriate for describing evolving processes. However, we will refer to them in our method description.

1) Single slice: In a static network the problem reduces to the Heaviest Subgraph (HS) problem [8], which calls for finding the subgraph $G' \subset G$ of maximum score. HS can be shown to be equivalent to the Prize Collecting Steiner Tree (PCST) [22], which considers nodes with non-negative weights (prizes) and edges with negative weights (costs). Both HS and PCST are NP-hard on graphs but polynomial on trees. We will also refer to a variant of HS, called S-constrained HS, requiring that a set S of nodes (or edges) is included in the solution. S-constrained HS can be approximated by generalizing the algorithm for computing rooted HS [33] (for details see Suppl. mat. [2]).

2) Fixed pattern (0-MINESMOOTH): This problem can be reduced to $O(T^2)$ instances of PCST [8]; on trees it can be solved with time complexity $O(T^2 \cdot |E|)$.

3) Connected pattern (∞-MINESMOOTH): This problem can be solved in polynomial time on trees (see Suppl. mat. [2]).

Problem α-MINESMOOTH and its special cases are NP-hard as they generalize HS to multiple slices. However, optimizing under a general smoothness constraint α makes the problem even harder. More formally, if we restrict the input to trees, both 0-MINESMOOTH and ∞-MINESMOOTH become solvable in polynomial time, while we show that α-MINESMOOTH remains NP-hard. We call this restriction to trees α-MINESMOOTHTREE, and prove that, even for α = 1, the problem (1-MINESMOOTHTREE) is NP-hard.

Theorem 1: The decision version of 1-MINESMOOTHTREE: “Is there a 1-MINESMOOTHTREE solution with score exceeding B?” is NP-hard. The problem remains NP-hard with the Jaccard coefficient smoothness constraint (in place of the Hamming distance).

A proof by reduction from 3SAT is in the supplement [2].

Solutions for α-MINESMOOTH can be approached in several ways. An optimal algorithm can be designed by branch-and-bound or by reducing to Integer Linear Programming and employing a standard solver. However, such approaches are limited to very small problem instances, and therefore are infeasible on large networks (in our tests, a standard solver (ILOG CPLEX) was able to handle up to a few tens of slices in a network with 100 nodes).

Another direction consists of approximating the solution with a fast heuristic by starting from a promising graph in a time slice and extending greedily, until no “profitable” extensions are possible without violating the constraints. Although efficient, such an approach loses in quality and hence is not satisfactory. However, as we show in Sect. III-D, such an approach can be used to generate a quick lower bound estimate.

Our approach LEGATO (illustrated in Fig. 2) reconciles the objective of accuracy with the need for feasibility for large problem instances. LEGATO is a two-step technique that first filters nodes of the dynamic network that provably cannot belong to the solution, and then applies a high-quality search procedure on the remaining subnetworks. The no-loss filtering step involves computing a global lower bound LB of the solution, and an upper bound UB(u, t) for each pair node/time-slice (u, t) for the maximum score of any smooth solution including the pair. Each node/time-slice (u, t) such that UB(u, t) < LB can be filtered out. The filtering step (i) drastically reduces the size of the input without any loss in accuracy, and (ii) shatters the dynamic network into smaller disconnected components that can be analyzed separately. This permits the employment of our accurate, though expensive, heuristic SPOTANDSMOOTH in the second step. SPOTANDSMOOTH spots promising traces over multiple time slices, builds “rough” solutions by relaxing the smoothness constraint, and then refines the solutions until smoothness is satisfied. The method is described in detail in the following sections.

Fig. 2. Overview of our approach LEGATO for α-MINESMOOTH. [Figure: a dynamic network goes through no-loss filtering (LB, UB) and then accurate search (Spot, Smooth), producing the α-MineSmooth process.]

III. NO-LOSS FILTERING FOR MINESMOOTH

We employ an effective filtering approach to reduce the size of the input. For each pair node/time-slice (u, t) of the dynamic network, we compute an upper bound UB(u, t) for the score of any α-MINESMOOTH solution that includes the node u at time t. UB(u, t) is then compared with a global lower bound LB for the optimal solution, computed by a fast heuristic. Nodes in time (u, t) s.t. UB(u, t) < LB cannot belong to the optimal solution and can be filtered out.

Next, we discuss the computation of our upper bound. It involves two steps: 1) computation of a single-slice upper bound $UB_s(v, t)$ for every node and then 2) computation of an upper bound for MINESMOOTH (across multiple slices) $UB(v, t)$ based on $UB_s(v, t)$. We first define a novel tight upper bound for a single slice. We further improve it, by designing an efficient and effective partitioning-based algorithm for computing an even tighter node-wise upper bound. We then extend our “group upper bound” to multiple slices. We also describe our fast heuristic for computing a lower bound.

A. Upper Bound in a single slice

We describe the computation of an upper bound in a single slice under the PCST setting: non-negative node weights (prizes) and negative edge weights (costs). A Heaviest Subgraph problem instance can be reduced to a PCST instance, as discussed previously. While our goal is to develop effective filtering for our problem, our upper bound is applicable to general PCST instances, and hence can be used for estimating the solution of arbitrarily hard PCST instances.

The general idea is that an optimal PCST solution may contain (at most) every positive node that is not too distant from other positive nodes. Hence, every positive node can potentially contribute to the score of a solution by its weight minus some cost that depends on its distance from other positive nodes in the graph. For example, in Fig. 3, node i is too far from any other positive-score node in the graph, and therefore it cannot contribute to the score of an aggregated solution. In contrast, nodes a, b, e and f are close-by, and hence each of them contributes to the upper bound.

Fig. 3. Example computation of the upper bound for a single slice. [Figure: groups C1, C2 and C3 with UB(C1) = 5 and UB(C2) = 13; potentials p(a) = 4, p(b) = 3, p(e) = 4, p(f) = 2, p(g) = 2, p(h) = 3, p(i) = -5; the optimal solution S has score 11.]

Let $d(u, v)$ be the shortest path distance in a PCST instance (considering the absolute value of edge weights and disregarding node prizes) between two positive nodes u and v in G. Let also $d(u)$ be the shortest path distance between a node u in G and the closest positive node in G (other than u itself). We call the quantity $p(u) = w(u) - d(u)/2$ the potential of node u. We define the total potential $TP(G)$ of a graph as the sum of the positive potentials: $TP(G) = \sum_{u \in G :\, p(u) > 0} p(u)$. The upper bound is computed as the maximum of the total potential and the highest node weight in G:

$$UB(G) = \max\left( TP(G),\ \max_{u \in G} w(u) \right)$$

Lemma 1: UB(G) is an upper bound for the optimal PCST in G.

The proof is obtained by bounding the cost in an optimal solution by the length of the circular path that explores all solution nodes in DFS order. The detailed proof can be found in the supplement [2]. The upper bound can be computed with time complexity $O(|E| + |V| \log |V|)$ by a modification of Dijkstra’s algorithm that computes d(u) for every node. The modified Dijkstra’s algorithm starts by visiting all positive nodes, and then zero nodes in order of proximity (shortest path distance) to the closest visited node. In this way, “discs” of visited nodes are grown around positive nodes (centers). When an edge across two discs is explored, the distance of the two centers is updated. Eventually each positive node will store its minimum distance from any other positive node.

UB(G) can be used as a node-wise upper bound by posing $UB_s(u, t) = UB(G_t)$. However, this upper bound does not permit applying our filtering to parts of a slice. Next we show that we can apply our upper bound to groups of nodes in place of the whole network. This allows us to compute the upper bound at a finer granularity.

B. Group Upper Bound

The single-slice upper bound sums up quantities from all nodes in the graph that have positive potential. Some of these nodes are far from each other and hence they cannot belong to the same solution. For instance, in Fig. 3, g and h have positive potential, and hence contribute to the upper bound. However, since the subnetwork composed by g and h is far from the other positive nodes in the graph, we may infer that it cannot be aggregated with S in the optimal PCST solution. This situation is typical in real networks, where active nodes and edges are often “clustered” and there may be considerable distance among clusters. To take advantage of this observation, we propose to partition the network in groups and compute an upper bound for each group. Our procedure aggregates nodes in a bottom-up fashion. It starts with assigning each positive node to a different group and proceeds by aggregating groups iteratively until each group is isolated, i.e. its border nodes have a distance to positive components higher than the group upper bound. At the end, the maximum upper bound among all isolated groups is an upper bound for the optimal PCST in G.

A group C is a connected set of nodes in G. It may include zero and positive nodes. The upper bound of a group UB(C) is defined as:

$$UB(C) = \max\left( \sum_{u \in C :\, p(u) > 0} p(u),\ \max_{u \in C} w(u) \right)$$

The border of a group C is the subset of nodes in C that have at least one edge to a node not in C. For instance, in Fig. 3 the border of C1 is composed of the two black nodes, since each of them has an edge that crosses the dashed “border” line. We define the minimum border distance $d_{min}(C)$ as the minimum d(u) over all u on the border of C (i.e. the minimum distance of border nodes from any positive node in G). We say that a group C is isolated if its minimum border distance is at least twice its upper bound (i.e. $d(u) \ge 2 \cdot UB(C)$ for each u in the border of C). We next prove that an optimal solution for PCST cannot lie across two isolated groups. As a consequence, the maximum UB(C) over all groups is an upper bound to the optimal solution.

Lemma 2: Let $\mathcal{C}$ be a family of disjoint isolated groups that spans all positive nodes in G. An optimal PCST solution of positive score cannot include two or more groups in $\mathcal{C}$.

The proof proceeds by contradiction. A hypothetical cross-group solution would have score below zero. This can be proved by bounding its score by the length of the circular path that explores each node/edge of the solution in DFS order and considering that at least one sub-path (path between two nodes) must have length greater than $2 \cdot UB(C)$. For a detailed proof see the supplement [2].

The above lemma implies that $\max_{C \in \mathcal{C}} UB(C)$ is an upper bound for the optimal PCST in G. A general upper bound for every node in the graph is defined as follows:

Corollary 1: Let $\mathcal{C}$ be a family of disjoint isolated groups that spans all positive nodes in G and let $v \in C \in \mathcal{C}$ be a node in the group C. UB(C) is an upper bound for the optimal PCST in G rooted in v.

We denote with $UB_s(v, t)$ the single-slice upper bound for node v in slice t of a dynamic network.

The grouping algorithm proceeds as follows. First, the distance d(u) from the closest positive node is computed for every node in the graph. This can be done in time complexity $O(|E| + |V| \log |V|)$. Next, each positive node is assigned to a different group. Each group C maintains its (i) UB(C) and (ii) $d_{min}(C) = \min d(u)$ over all u in the border of C. Groups are expanded in the following order. First, we pick the group C that minimizes $2 \cdot UB(C) - d_{min}(C)$. Then, we expand the edge (u, v) such that u is on the border and d(u) is minimum. When an edge across two groups C1 and C2 is explored, C1 and C2 are merged and the upper bound of the new group is updated. The procedure stops when all groups are isolated ($2 \cdot UB(C) - d_{min}(C) \le 0$) or the whole graph falls into a single group. By using binomial heaps and a disjoint-set data structure [15] the overall complexity can be shown to be $O(|E| + |V| \log |V|)$.

In the example in Fig. 3, the algorithm produces three groups. UB(C2) has the highest value (13) and therefore it represents an upper bound for the optimal solution S. The optimal solution is fully contained in C2. Its value is 11.

C. Upper Bound across slices

An upper bound for α-MINESMOOTH can be obtained by relaxing the smoothness constraint and propagating the upper bound for static networks across slices. The idea is that a single-slice upper bound in a node can contribute to the upper bound of the same node in both the next and previous time slices. In view of efficiency, we compute the upper bound at the level of groups, instead of nodes, as all nodes within a group have the same upper bound.

We extend the notation UB(u, t), indicating the upper bound of node u at time t, to groups, and write UB(C, t) to denote the multiple-slice upper bound of a group C at time slice t (and likewise $UB_s(C, t)$ for the single-slice bound UB(C) of a group C in slice t). The upper bound of a group is an upper bound for every node within it.

We introduce the quantities $\overleftarrow{UB}(C, t)$ (score propagated to C from future time slices) and $\overrightarrow{UB}(C, t)$ (score propagated to C from past time slices). Let $C^t$ be a group in time slice t and $\mathcal{C}^{t+1}$ be the set of groups in slice t + 1 that share at least one node with $C^t$. $\overleftarrow{UB}(C^t, t)$ is defined recursively as:

$$\overleftarrow{UB}(C^t, t) = \max_{C^{t+1} \in \mathcal{C}^{t+1}} \left( \overleftarrow{UB}(C^{t+1}, t+1) + UB_s(C^{t+1}, t+1) \right)$$

$\overrightarrow{UB}(C^t, t)$ is defined similarly. UB(C, t) is defined as:

$$UB(C, t) = \overrightarrow{UB}(C, t) + \overleftarrow{UB}(C, t) + UB_s(C, t)$$

The following lemma states that UB(C, t) is an upper bound for the optimal multi-slice solution.

Lemma 3: Let C be an isolated group in a time slice t of a dynamic network $\bar{G}$. Let u be an arbitrary node in C and α an arbitrary integer. UB(C, t) is an upper bound for the optimal α-MINESMOOTH solution that contains u at time slice t.

The proof of this lemma stems from the fact that any solution for MINESMOOTH can be decomposed into a number of connected graphs, one for each slice, each of them fully contained in a group. Therefore, the score of a multiple-slice solution can be bounded by the sum of the single-slice upper bounds of each group.

Lemma 3 gives us a simple algorithm for computing the upper bound on multiple slices. After the group upper bound is computed on every slice, the scores are propagated forward and then backward. During the backward propagation, the multiple-slice upper bound is computed. The algorithm is linear in the number of connections between groups in contiguous time slices. Further details on the multiple-slice upper bound are given in the supplement [2].

D. EXPAND: a fast heuristic for MINESMOOTH

Given a tight upper bound for every node/time-slice, we need to compare it to a global lower bound, to decide which nodes to filter. We approximate a lower bound for the optimal solution of MINESMOOTH by a fast heuristic: EXPAND. It first finds seeds by approximating the highest-score subgraph in every slice. Then it expands seeds forward and backward in time by a greedy strategy that preserves the smoothness constraint.

In more detail, EXPAND samples k time slices at random with a probability that is proportional to their sum of edge weights. This gives us higher chances to select a slice that contains a connected subgraph of high score. For each selected slice, we approximate the Heaviest Subgraph [8] and expand the obtained subgraph greedily forward and backward in time. At the end, the highest-score evolving subgraph found is selected.

The expansion procedure proceeds recursively as follows. Given an evolving subgraph expanded up to slice t, first the subgraph $G_{t+1}$ at time t + 1 (t - 1 in the case of expansion backward) is set equal to $G_t$. Then, the set of all candidate “edits” (edge insertions and deletions) on $G_{t+1}$ that give a “revenue” (increase of the score) is considered and organized into a priority queue. Such a set includes all edges of $G_{t+1}$ that do not disconnect the graph and have negative weight, and all edges not in $G_{t+1}$ that have at least one endpoint in $G_{t+1}$ and have positive weight. The edits are processed one by one until no more edits with positive revenue are possible without violating the smoothness constraint. Every time an edit is processed, the list of candidate edits is refreshed. The expansion procedure stops when no more revenue can be obtained from a local expansion. This procedure can be computed efficiently by keeping track of all bi-connected components [45]. The overall complexity is $O(k \cdot |E| \cdot |W| \cdot (\log |E| + \log |V|))$. We refer to the supplemental material [2] for further details.

IV. AN ACCURATE ALGORITHM FOR MINESMOOTH

The filtering allows us to reduce the size of the dynamic network significantly by removing parts of the slices that cannot belong to the solution. Next, we can apply a high-quality (though expensive) algorithm on the remaining dynamic network. Our graph-smoothing-based algorithm SPOTANDSMOOTH for α-MINESMOOTH first identifies promising “rough” candidates and then refines them to obtain smooth solutions. There are two main phases of the algorithm: (i) Spot (candidate identification) and (ii) Smooth (refinement).

Algorithm 1 Spot
Require: A time-evolving network $\bar{G} = (V, E, W)$, an integer α
Ensure: A set of candidate traces C in $\bar{G}$
  1: Compute all {v}-constrained HS $G_{t,v}$ in each node/slice (v, t)
  2: Construct a temporal DAG with nodes of the kind (v, t) representing {v}-constrained HSs with positive score, and edges across contiguous slices of the kind (v, t) → (u, t + 1) indicating that v and u can be connected across the slices t, t + 1
  3: C ← ∅
  4: for all pairs $((v_i, i), (v_j, j))$ in DAG do
  5:   Compute the highest-score path P between $(v_i, i)$ and $(v_j, j)$
  6:   C ← C ∪ {P}
  7: end for
  8: return C

Algorithm 2 Smooth
Require: A time-evolving network $\bar{G} = (V, E, W)$, an integer α
Require: A set of candidate traces C in $\bar{G}$
Ensure: The best α-smooth evolving subgraph found R
  1: for all paths P ∈ C do
  2:   $\hat{R}$ ← BuildEvolvingSubgraph(P)
  3:   R ← SmoothEvolvingSubgraph($\hat{R}$)
  4: end for
  5: return the maximum score evolving subgraph R found

The general idea of SPOTANDSMOOTH can be described with the example in Fig. 4. Positive edge weights are indicated in red, while negative ones are colored green. Five groups of positive edges (A-E) are highlighted. Those groups can be joined by traces across slices (in dashed line). For clarity only three possible traces are indicated. Among the two traces joining A and C, only the one that passes through B is considered as a candidate. Indeed, for joining E with A we need to (i) insert negative edges to maintain connectivity across slices and (ii) change A and/or E so as to satisfy the smoothness constraint. The A-E-C choice is suboptimal and hence neglected. Smoothing is then applied to the traces A-B-C and D-E. Assuming α = 1, A-B-C is a smoothly evolving subgraph, while D-E needs to be smoothed as two edges change from one slice to the next. Edges can be added to E (or removed from D) until the smoothness constraint is satisfied (in this case only one edge is sufficient).

Fig. 4. Illustrative example for SPOTANDSMOOTH’s operations.

A. The Spot algorithm

The Spot procedure is described in Alg. 1. It takes as input a dynamic network $\bar{G}$ and a smoothness constraint α, and computes a set of candidate traces in $\bar{G}$. After computing the {v}-constrained HS for each node/slice in $\bar{G}$ (Step 1), we build a directed acyclic graph (DAG) that summarizes the dynamic network (Step 2) and then we compute the highest-score path among all pairs in the DAG (Steps 3-7). Each path (trace) corresponds to a candidate evolving subgraph. Although a candidate is not necessarily smooth, smoothness is taken into account in the highest-score path computation, so as to favor paths that are likely to produce good smooth solutions. The highest-score path problem is hard on general graphs but can be computed efficiently on a DAG.

In order to construct the DAG, for each node/slice (v, t), we compute the {v}-constrained HS $G_{t,v}$ and add a node in the DAG identified by (v, t). Two DAG nodes (v, t) and (u, t+1) in contiguous time slices are connected by an edge iff their constrained HSs $G_{t,v}$ and $G_{t+1,u}$ can be interconnected without causing the score in one of the two slices to be negative. The edge direction between two nodes respects the time order (from t to t+1). Each node (v, t) is assigned a weight equal to $score(G_{t,v})$. Each edge is assigned a negative score that represents the cost for interconnecting the two subgraphs plus the cost for smoothing (computing the two costs is discussed later in this section). The cost for interconnecting considers that the HSs in two contiguous slices may not overlap, and thus a cost needs to be paid to maintain connectivity. The cost for smoothing corresponds to the minimum penalty which needs to be paid for satisfying the smoothness constraint. We introduce the cost for smoothing to penalize drastic changes and increase the chances that the best path corresponds to a smooth subgraph. The resulting structure does not have any directed cycles as no edges are added among nodes within the same time slice and added edges follow the direction of time. Finding the highest-score path between a pair of nodes is an NP-hard problem for general weighted graphs [15]. However, for DAGs one can employ dynamic programming to find the highest score from a source to any destination in linear time [15].

Next, we discuss in detail how to compute the cost for interconnection and the cost for smoothing.

1) Cost for interconnection: Given two connected subgraphs $G_{t,v}$ and $G_{t+1,u}$ in two contiguous slices, if such subgraphs have at least one node in common, their cost of interconnection is zero. Otherwise, the cost for interconnection is obtained by enumerating all nodes in $\bar{G}$ and, for each node z (called bridge node), summing up the cost for interconnecting $G_{t,v}$ with z in slice t ($cost_t$) and the cost for interconnecting $G_{t+1,u}$ with z in slice t + 1 ($cost_{t+1}$). The minimum over all nodes is the interconnection cost. $cost_t$ can be computed by approximating the {v, z}-constrained HS in slice t and subtracting $score(G_{t,v})$. $cost_{t+1}$ can be computed analogously.

2) Cost for smoothing: The cost for smoothing is the penalty that needs to be paid for transforming $G_{t,v}$ and $G_{t+1,u}$ into two graphs that satisfy the smoothness constraint. It can be computed by smoothing the evolving subgraph $\{G_{t,v}, G_{t+1,u}\}$ and comparing the smoothed score with the “rough” one. Smoothing is discussed next.

B. The Smooth algorithm

The Smooth procedure is described in Alg. 2. For each candidate trace $P = [(v_i, i), (v_{i+1}, i+1), \ldots, (v_j, j)]$ from the Spot procedure, first a “rough” evolving subgraph is built as follows:
(1) For each internal path node $(v_t, t)$ (except the first and the last ones), consider the bridge nodes (nodes that minimize the interconnection cost, see Sect. IV-A) $z_{t-1}$ (with the previous slice) and $z_t$ (with the next slice) and compute $G_t$ as the $\{z_{t-1}, z_t\}$-constrained HS in t;
(2) Compute $G_i$ as the $\{z_i\}$-constrained HS in i (first slice);
(3) Compute $G_j$ as the $\{z_{j-1}\}$-constrained HS in j (last slice);
(4) Build $\hat{R}$ as $\{G_t \mid t = i, \ldots, j\}$.

Significance score. We obtain real-valued (positive and negative) scores for our real-world datasets by applying a signifi-

The next step is to refine $\hat{R}$ to obtain a smoothly evolving subgraph. Smoothing is performed from the starting slice to the end, by transforming each slice such that the edge-set Hamming distance with neighboring slices is reduced to at most α. The transformation considers a series of moves, where a move can be an edge insertion or an edge deletion. Each move has a cost, corresponding to the penalty (score reduction) that needs to be paid for making the move. Feasible moves are the ones that do not disconnect the graph.
Moves are chosen greedily cance transformation wt(e) = −log(p(st(e))/µ), where st(e) until the smooth constraint is satisfied, similar to the extension is the original value of the edge/node at time t (average speed strategy in EXPAND (Sect.III-D). or number of page views), p() is the p-value of observing the value in the empirical distribution for the same edge/node and µ is a significance threshold level used for the whole network. V. EXPERIMENTS In Wikipedia, we assign an edge score based on the average p-values of neighboring nodes, ensuring that two pages are A. Implementation more likely to be included in the solution if they both observe We implement our method in Java and experiment on a higher than usual page view traffic. The significance level Linux server Intel Xeon 2.0 GHz 4MB cache (on a single core) transformation is designed to boost the score of significant and 98 GB RAM. Since there are no competing methods for α- (anomalous) edge weights, thus targeted to anomaly processes MINESMOOTH, we compare LEGATO to two other heuristics: unfolding in the networks. Other scoring schemes such as REFINE and EXPAND.REFINE obtains a fixed dynamic region thresholding can be applied as well. The value of µ is set based on [8] and then refines it by removing negative or adding to 0.01 for Traffic-100 and Wikipedia networks and 0.001 positive edges greedily, while keeping the smoothness con- for the large Traffic network. straint satisfied. We also compare to the previously described heuristic EXPAND (Sect. III-D). C. Evaluation In this section, we evaluate the quality of LEGATO (in terms B. Datasets of solution score), its running time and the pruning power of the upper bound. At the end, we summarize the mined patterns. We use three real-world dynamic networks: a small and a large highway traffic network and a subnetwork of Wikipedia. Quality. First, we evaluate the quality of LEGATO and alter- We also experiment with synthetic dynamic networks. Table I natives. 
The higher the score of the obtained region, the better lists the sizes of all datasets. The largest network (Traffic) it has captured the underlying process (bigger/longer region). contains 53 million edges in all time slices. We measure the quality as the relative score improvement with respect to a 0-MINESMOOTH solution (obtained by [8]). Note TABLE I. SIZESOFTHEEXPERIMENTALNETWORKS that by definition a 0-MINESMOOTH is a valid, although sub- optimal, solution for α-MINESMOOTH (α > 0). Dataset Nodes Edges Slices Slice length Traffic-100 100 128 8640 5 min In order to demonstrate the quality over different-size prob- Traffic 1923 6208 8640 5 min lem instances, we sample random time intervals of increasing Wikipedia 992 2888 1446 1 day size (100 samples per interval size) and plot the obtained Synthetic 500 1000 8000 improvement (average % improvement w.r.t. 0-MINESMOOTH) in Fig. 5(a)-5(d). Note, that high quality in larger intervals is of Traffic. The structure of the traffic datasets corresponds to more interest as real datasets might evolve over long periods. the highway network of Los Angeles, CA. Edges are highway LEGATO outperforms the simple heuristics on all datasets with segments weighted by average traffic speeds over time at 5 increasing improvement gap for longer intervals. On average, min. resolution. This 854MB dataset spans April 2011 and is we achieve a relative improvement exceeding 25% over 0- obtained from the PeMS [1] website. We also experiment with MINESMOOTH, demonstrating that processes in all considered a smaller connected subgraph of Traffic, called Traffic-100. networks evolve smoothly and hence require an adequate and EGATO 3 Wikipedia. The Wikipedia dataset captures the daily number flexible method to be detected. L exhibits between - 7 of views of hyper-linked articles in a subgraph of Wikipedia fold (on Traffic-100) and -fold (on Traffic) improvement as (in the period 2008-2011). 
An edge is included if there are at compared to EXPAND and REFINE for long intervals. least 8 hyper-links between 2 articles. Next, we measure quality for increasing α (Fig. 5(e)- 5(h)). Again, LEGATO outperforms the alternatives in all cases. Synthetic. The structure of our synthetic network is an Erdos-˝ When a small number of changes across time slices is allowed Renyi´ . We generate the dynamics by injecting (α ≤ 3), all alternatives achieve scores close to the baseline shifting diffusion processes and then assigning weights to 0-MINESMOOTH solution. As we allow regions to vary more edges. “In-pattern” edges are part of injected processes, while across slices (targeting evolving processes), LEGATO’s scores “out-pattern” ones are normal edges. To select “in-pattern” exceeds the 0-MINESMOOTH score by more than 25% on real edges, we choose uniformly at random a root edge e(t) at datasets, constituting more than 2-fold improvement over the time t and a duration of the subgraph (distributed normally alternatives. This gap is even bigger in Traffic (7-fold). Since around 10 time steps). We expand the pattern in the graph our Synthetic network is designed to be challenging for fixed by a random walk from e(t) based on a fixed probability pattern approaches (due to aggressive shift of regions), we p = 0.8 of including a neighboring edge. The t + 1 root edge obtain solution scores 4 times higher than the ones found by is chosen uniformly among the “in-pattern” edges at time t REFINE and EXPAND (for α = 30). and the expansion process is repeated until the pattern reaches the chosen duration. We repeat process injection until a fixed Accuracy. We also evaluate the accuracy of LEGATO in detect- fraction µ = 0.01 of in-pattern edges is reached. We weight ing manually introduced processes 6(a). 
We generate random in-/out-pattern edges by random positive/negative scores drawn dynamic networks (|V | = 100, |E| = 400, |W | = 100), from an exponential distribution and add noise by reversing the weighted similarly to our synthetic network, but containing sign of some edges. The evolution of our synthetic network only one shifting process each. We control the level of shifting is designed to be more similar to real-world processes. in the generation of embedded processes, by varying the edges 30 25 40 200 Legato Legato 35 Legato 25 Expand 20 Expand Expand 150 Refine Refine 30 Refine 20 15 25 100 Legato 15 20 Expand 10 15 50 Refine 10 10 5 0 5 5 % Impr. wrt 0-MineSmooth % Impr. wrt 0-MineSmooth % Impr. wrt 0-MineSmooth % Impr. wrt 0-MineSmooth 0 0 0 -50 10 100 200 300 10 50 100 150 10 50 100 150 10 50 100 150 Interval Size Interval Size Interval Size Interval Size (a) Traffic-100 (b) Wiki 1k (c) Traffic (d) Synthetic

35 25 35 400 Legato Legato Legato 350 Legato 30 20 30 Expand Expand Expand 300 Expand 25 Refine Refine 25 Refine Refine 15 250 20 20 200 10 15 15 150 5 100 10 10 50 0 5 5 0 % Impr. wrt 0-MineSmooth % Impr. wrt 0-MineSmooth % Impr. wrt 0-MineSmooth % Impr. wrt 0-MineSmooth 0 -5 0 -50 1 3 5 10 30 1 3 5 10 30 1 3 5 10 30 1 3 5 10 30 α α α α (e) Traffic-100 (f) Wikipedia (g) Traffic (h) Synthetic

Fig. 5. LEGATO outperforms alternatives significantly in quality (score improvement over 0-MINESMOOTH) in all datasets for (a)-(d) increasing interval sizes (α = 8), and (e)-(h) increasing α (interval size 100).
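The two quantities behind Fig. 5 can be illustrated with a minimal Python sketch: the α-smoothness test on consecutive slices (edge-set Hamming distance at most α, as described in Sect. IV-B) and the percentage improvement of a solution score over a baseline score. The function names, the representation of each slice as a set of edge tuples, and the toy values are assumptions made for illustration only; they are not taken from the authors' Java implementation.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]  # assumed canonical (u, v) ordering for undirected edges


def is_alpha_smooth(slices: List[Set[Edge]], alpha: int) -> bool:
    """True if every pair of consecutive slices differs in at most alpha edges.

    The Hamming distance between two edge sets is the size of their
    symmetric difference (edges present in exactly one of the two slices).
    """
    return all(len(cur ^ nxt) <= alpha for cur, nxt in zip(slices, slices[1:]))


def improvement_pct(score: float, baseline: float) -> float:
    """Relative improvement of `score` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (score - baseline) / baseline


# Toy evolving subgraph: one edge leaves and one edge enters between slices.
toy = [{("a", "b"), ("b", "c")}, {("b", "c"), ("c", "d")}]
print(is_alpha_smooth(toy, alpha=1))   # two edges change, so alpha = 1 is violated
print(is_alpha_smooth(toy, alpha=2))
print(improvement_pct(125.0, 100.0))
```

With α = 0 the check reduces to requiring identical edge sets in all slices, which matches the fixed-region setting used as the baseline in Fig. 5.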

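The highest-score path computation that the Spot phase relies on (Sect. IV-A) can be sketched in Python as a dynamic program over a temporal DAG. This is a minimal illustration under stated assumptions: DAG nodes are `(v, t)` tuples, `node_score` holds the node weights, and each edge carries a non-negative `cost` (interconnection plus smoothing) that is subtracted from the path score. The names `best_paths_from` and `trace`, and the toy numbers, are hypothetical. Nodes are sorted by time slice, which is a valid topological order because edges only go from slice t to slice t + 1; the sort makes this sketch O(n log n) rather than strictly linear.

```python
from typing import Dict, List, Optional, Tuple

Node = Tuple[str, int]  # (vertex id, time slice)


def best_paths_from(
    source: Node,
    node_score: Dict[Node, float],
    edges: Dict[Node, List[Tuple[Node, float]]],
) -> Tuple[Dict[Node, float], Dict[Node, Optional[Node]]]:
    """Highest path score from `source` to every reachable DAG node."""
    best: Dict[Node, float] = {source: node_score[source]}
    parent: Dict[Node, Optional[Node]] = {source: None}
    # Increasing time slice is a topological order of the temporal DAG.
    for node in sorted(node_score, key=lambda n: n[1]):
        if node not in best:
            continue  # not reachable from the source
        for succ, cost in edges.get(node, []):
            cand = best[node] + node_score[succ] - cost
            if cand > best.get(succ, float("-inf")):
                best[succ] = cand
                parent[succ] = node
    return best, parent


def trace(parent: Dict[Node, Optional[Node]], target: Node) -> List[Node]:
    """Recover the path ending at `target` by following parent pointers."""
    path: List[Node] = []
    cur: Optional[Node] = target
    while cur is not None:
        path.append(cur)
        cur = parent[cur]
    return path[::-1]


# Toy DAG with three slices (all values are made up).
scores = {("a", 0): 5.0, ("b", 1): 3.0, ("c", 1): 1.0, ("d", 2): 4.0}
links = {
    ("a", 0): [(("b", 1), 1.0), (("c", 1), 0.0)],
    ("b", 1): [(("d", 2), 0.0)],
    ("c", 1): [(("d", 2), 0.0)],
}
best, parent = best_paths_from(("a", 0), scores, links)
print(best[("d", 2)], trace(parent, ("d", 2)))
```

Running the routine once per source node yields the all-pairs candidate traces of Alg. 1 (Steps 4-7) in this simplified setting.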
[Figure 6 plots. (a) Accuracy: fraction overlap vs. % shift across slices, for Legato, Expand, Refine. (b) Time break-down: % time for Filter and SpotAndSmooth. (c) Pruning vs. no pruning: time (s) vs. interval size. (d) Pruned nodes: % filtered nodes vs. interval size, for Traffic-100, Wiki, Traffic, Synthetic. (e) UB tightness: solution score (% of UB) vs. interval size, for α = 1, 10, 30.]

Fig. 6. (a) Accuracy of discovering “embedded” dynamic processes. (b) Execution break-down of LEGATO on Traffic-100 (interval size 100, α = 8). The boxes mark the 1st and 3rd quartile, the thick line is the average, and the error bars mark max/min values. (c) Avg. running time of LEGATO with and without pruning on Traffic. (d) Avg. % of pruned nodes in all datasets. (e) Tightness of the upper bound on Traffic-100, defined as the solution score as a percentage of the average node upper bound.
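Two of the measures in Fig. 6 can be written down as a short Python sketch. Accuracy is taken here as the fraction of embedded-process edges that also appear in the solution, and tightness follows the caption: the solution score as a percentage of the average node upper bound. The choice of denominator for accuracy, the function names, and the toy inputs are assumptions for illustration; the text does not spell out these details.

```python
from typing import Hashable, Sequence, Set


def overlap_accuracy(found: Set[Hashable], embedded: Set[Hashable]) -> float:
    """Fraction of embedded-process edges recovered by the solution.

    Assumption: the denominator is the size of the embedded process.
    """
    if not embedded:
        raise ValueError("embedded process must contain at least one edge")
    return len(found & embedded) / len(embedded)


def ub_tightness_pct(solution_score: float, node_upper_bounds: Sequence[float]) -> float:
    """Solution score as a percentage of the average node upper bound."""
    if not node_upper_bounds:
        raise ValueError("need at least one upper bound")
    avg_ub = sum(node_upper_bounds) / len(node_upper_bounds)
    return 100.0 * solution_score / avg_ub


# Toy values: edges are represented by integer ids.
print(overlap_accuracy({1, 2, 3, 4}, {2, 3, 4, 5, 6}))
print(ub_tightness_pct(75.0, [90.0, 110.0]))
```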

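The significance transformation of Sect. V-B, $w_t(e) = -\log(p(s_t(e))/\mu)$, can be sketched in Python as follows. Several details are assumptions and not stated in the text: the p-value is estimated as a one-sided empirical tail fraction with add-one smoothing (to avoid taking the logarithm of zero), the direction of "anomalous" is a parameter (low speeds for traffic, high counts for page views), and the natural logarithm is used. The function names are hypothetical.

```python
import math
from typing import Sequence


def empirical_p_value(value: float, history: Sequence[float],
                      high_is_anomalous: bool = True) -> float:
    """One-sided empirical p-value of `value` against past observations.

    Assumption: add-one smoothing keeps the estimate strictly positive.
    """
    if high_is_anomalous:
        extreme = sum(1 for h in history if h >= value)
    else:
        extreme = sum(1 for h in history if h <= value)
    return (extreme + 1) / (len(history) + 1)


def significance_score(p_value: float, mu: float) -> float:
    """w = -log(p / mu): positive when p < mu, negative when p > mu."""
    if not (0.0 < p_value <= 1.0) or mu <= 0.0:
        raise ValueError("require 0 < p_value <= 1 and mu > 0")
    return -math.log(p_value / mu)


# Toy usage with the threshold reported for Traffic-100 and Wikipedia.
mu = 0.01
print(significance_score(0.001, mu))  # more significant than the threshold
print(significance_score(0.5, mu))    # ordinary observation
print(empirical_p_value(10.0, [1.0, 2.0, 3.0, 10.0]))
```

The sign change at p = µ is what produces both positive and negative edge scores from raw measurements.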
that change from one time slice to the next from 40% to 60%. We also introduce noise by flipping the signs of random edges in time in the whole network. Accuracy of a solution is measured as the fraction of edges that overlap with the originally embedded process. LEGATO dominates the alternatives in accuracy for all levels of shift, retaining close to 90% of the original process for small shifts and 80% for higher levels of shift. The relative improvement of LEGATO over the alternatives (15%-40%) increases with the shift in embedded processes, making it suitable for real world networks (Traffic and Wiki) that observe significant shifting behavior. EXPAND is not able to capture the embedded processes completely because its greedy strategy sometimes causes it to “break” the extension.

The advantage of LEGATO stems mainly from its ability to spot good multiple-slice candidates. This is due to the summarization performed during the DAG construction and the possibility to take into account the single-slice (PCST-based) optimization. LEGATO captures flexible (growing, shrinking, moving) processes and effectively handles possible noisy negative edges that might “break” high-scoring regions.

Execution time. We report the running time of all techniques in Table II. We measure the average time (100 samples) for intervals of size 100 and for the full datasets. While our scheme achieves the best score in all datasets (Fig. 5), this superior quality comes at the price of more computation. LEGATO completes in less than 2.5 minutes when evaluating size-100 intervals on all datasets, and in less than an hour when executed on the full datasets. Although the sampling/greedy strategies used by EXPAND and REFINE are much faster than LEGATO, they often miss interesting high-scoring patterns, which capture the underlying evolution of network processes.

TABLE II. RUNNING TIME COMPARISON (IN SECONDS) FOR ALL TECHNIQUES AND DATASETS (α = 8).

              REFINE          EXPAND           LEGATO
Interval:     100     Full    100     Full     100     Full
Traffic-100   0.01    6       0.004   0.15     1.8     131
Traffic       0.3     64      0.1     0.6      15.5    3005
Wikipedia     0.13    2       0.05    0.18     139     2655
Synthetic     0.07    1       0.03    0.07     144     1833

Figure 6(b) shows the percentage of time spent by LEGATO in each of its phases. While there is some variability for different sampled instances, SPOTANDSMOOTH tends to dominate the execution time. Finally, in order to quantify the effect of the upper-bound filtering, we compare LEGATO with a version without filtering in the large Traffic dataset (Fig. 6(c)). Filtering enables 4 orders of magnitude running time savings for instances as small as 10 time steps. This gap tends to increase dramatically, making it infeasible to analyze large real-world datasets without incorporating filtering.

[Figure 7 plots. (a) Relative Growth: average rel. growth vs. time since start (5m to 30m). (b) Relative Shift: average center shift vs. time since start (5m to 30m). Series: Legato, Expand, Refine.]

Fig. 7. (a) Relative process growth w.r.t. the first time slice region and (b) relative process center shift for a 30-min period in the Traffic-100 dataset.

Pruning. While LEGATO produces high quality solutions, the filtering step is essential for scaling to large instances. Thanks to the upper-bound filtering, over 95% of nodes (over time) are filtered across all datasets (Fig. 6(d)), and this number exceeds 99% for the large networks: Traffic and Synthetic.

We also evaluate the tightness of the upper bound (how close the solution score is to the upper bound) (Fig. 6(e)). For every solution of LEGATO in randomly sampled Traffic intervals, we measure the average difference of the solution score and the included nodes’ upper bounds. We plot those for varying interval length and smoothness. The upper bound is tighter when more changes are allowed, but even for α = 1 the solution score is more than 75% of the upper bound.

Discovered patterns. We demonstrated that LEGATO dominates alternatives based on score. Next, we also show that it discovers more interesting and domain-relevant processes. First, we analyze the evolution of discovered patterns (representing processes) in Traffic-100. We measure the relative growth of patterns that last at least 30 minutes (7 time steps) in Fig. 7(a) and their relative average “shift” in Fig. 7(b). The relative growth is measured as the ratio of the subgraph size after a fixed elapsed time (5 to 30 min) and its size in the first time slice of the solution. The shift of a dynamic subgraph is measured as the distance between the most central node in the initial slice and the centers of subsequent slices.

According to established models for congestion evolution in road networks [25], traffic jams “grow and shift” upstream of the traffic direction with time. Since LEGATO is specifically targeted to discover such evolving processes, the behavior of its solutions agrees with the established models. The average LEGATO pattern grows up to 2.4 times its original size in the first 15 minutes and this growth captures the onset of traffic jams (Fig. 7(a)). The pattern regions identified by LEGATO also tend to “drift” from their original position with time (Fig. 7(b)). Our scheme can also identify traffic jams that spread to intersecting highways, and can aid novel models that go beyond a single highway.

Note that similar trends of growth and shift are not captured to the same extent by REFINE and EXPAND (Fig. 7(a), 7(b)), as by design REFINE is not tailored to evolving patterns while EXPAND is biased to the region of its original seed slice. The solutions of the alternative techniques cover about 3/4 of the average LEGATO pattern (based on number of common edges), missing parts of the underlying process evolution. The superiority of LEGATO is not only based on better pattern score, but also on capturing the natural smooth evolution of the underlying processes. For more characterization of the discovered process behaviors/shapes see Suppl. mat. [2].

We extracted multiple patterns from Wikipedia by repeatedly applying LEGATO on the whole network timeline and erasing positive edge scores after each discovered pattern. We were able to match the obtained fact searching processes (by time and network location) to major events that received significant media attention: the Cricket World Cup finals (see Fig. 1), other major sport events, airplane accidents, elections and others. While earlier methods [33] have also discovered important events in Wikipedia’s page views time-line, LEGATO is further able to capture the evolution of attention focus in the information foraging process, since it is tailored to the discovery of flexible subgraphs that evolve in time as opposed to ones with a fixed structure.

VI. RELATED WORK

Several mining methods for time-evolving networks have been recently proposed. Approaches include evolutionary clustering (community detection) [26], [13], [20], [11], [35], [30], [46], [39], [6], temporal pattern mining [17], [32], frequent pattern mining [9], [7], link prediction [41], and anomaly detection [43], [3]. Our problem is different from all the above since we focus on mining processes (parts of the graphs that are relevant as opposed to frequent) and we focus on scenarios with evolving node/edge weights. While our method is applicable for detecting anomalies, it differs from existing methods in focusing on evolving subgraphs as opposed to single nodes or edges.

Research on mining processes in real networks has focused mainly on epidemic processes, such as virus propagation [38] and information diffusion [19], [34]. Studies on information diffusion have coped with problems such as influence maximization [23], [24], [28], tracking [4], and prediction [27], [44]. Epidemic processes are profoundly different from the ones we introduce in this paper. While epidemic processes spread very fast, we focus on processes that evolve smoothly in time, such as traffic jams and information foraging. More similar to our setting are the heaviest dynamic subgraph problem [8] and the discovery of correlated spatio-temporal changes in evolving graphs [12]. In these works the subgraph of interest is considered fixed throughout its temporal extent. This is a strict requirement for a number of application domains such as traffic networks and biological networks. Congested regions have been shown to spread and migrate in the network [25], [40]. Similarly, the components of biological functional modules are dynamically activated [16], [29].

The concept of temporal smoothness has been employed in evolutionary clustering [13], motivated by the idea that communities tend to evolve slowly in time. A similar concept has been re-proposed by Ahmed et al. [5] for modeling the evolution of “relational states” in dynamic networks. Here, we consider smoothness for modeling the evolution of quantitative processes within a network and therefore the problem solved in this paper is significantly different from previously proposed ones. Moreover, to the best of our knowledge there are no specific studies on the complexity introduced by smoothness. From this perspective, our results can be conceptually extended to evolutionary clustering and other fields.

VII. CONCLUSIONS

Mining processes that may expand, shrink, and shift over a dynamic network is a difficult task. The challenges are many: the size of the network structure, the number of time slices over which the network evolves, the separation of signal from noise, and ensuring smoothness from one time slice to the next. A successful approach for the task should be able to adapt to the studied underlying process while scaling to large problem instances.

We demonstrated through an extensive validation in real-world datasets that we can find relevant dynamic processes (not just ones of high score, which is only one way to measure their “interestingness”). Our intuitive problem formulation captures the dynamics of evolving processes in real networks. Apart from achieving 3-7 times improvement over alternatives, our method LEGATO discovers evolving processes in traffic networks that conform to known models for shifting traffic jams, as well as evolving regions of highly accessed pages in Wikipedia, temporally corresponding to major events. LEGATO can aid novel data driven models for congestion dynamics in traffic networks, root cause analysis in computer and water distribution networks as well as provide an understanding of the dynamics of interactions within communities in social networks.

REFERENCES

[1] PeMS. http://pems.dot.ca.gov/.
[2] Supplemental material. https://www.box.com/s/sjkq4euln2sscjnmvj87.
[3] J. Abello, T. Eliassi-Rad, and N. Devanur. Detecting novel discrepancies in communication networks. In ICDM, 2010.
[4] E. Adar and L. Adamic. Tracking information epidemics in blogspace. In Web Intelligence, 2005. Proceedings. The 2005 IEEE/WIC/ACM International Conference on, pages 207–214, 2005.
[5] R. Ahmed and G. Karypis. Algorithms for mining the evolution of conserved relational states in dynamic networks. In ICDM, 2011.
[6] S. Asur, S. Parthasarathy, and D. Ucar. An event-based framework for characterizing the evolutionary behavior of interaction graphs. In KDD, 2007.
[7] M. Berlingerio, F. Bonchi, B. Bringmann, and A. Gionis. Mining graph evolution rules. In PKDD, 2009.
[8] P. Bogdanov, M. Mongiovi, and A. K. Singh. Mining heavy subgraphs in time-evolving networks. In ICDM, 2011.
[9] K. Borgwardt, H. Kriegel, and P. Wackersreuther. Pattern mining in frequent dynamic subgraphs. In ICDM, 2006.
[10] R. B. Buxton, E. C. Wong, and L. R. Frank. Dynamics of blood flow and oxygenation changes during brain activation: The balloon model. Magnetic Resonance in Medicine, 39(6), 1998.
[11] D. Chakrabarti, R. Kumar, and A. Tomkins. Evolutionary clustering. In KDD, 2006.
[12] J. Chan, J. Bailey, and C. Leckie. Discovering correlated spatio-temporal changes in evolving graphs. Knowledge and Information Systems, 16(1):53–96, 2008.
[13] Y. Chi, X. Song, D. Zhou, K. Hino, and B. L. Tseng. Evolutionary spectral clustering by incorporating temporal smoothness. In KDD, 2007.
[14] J. Cohen, W. Perlstein, T. Braver, L. Nystrom, D. Noll, J. Jonides, and E. Smith. Temporal dynamics of brain activation during a working memory task. Nature, 386, 1997.
[15] T. H. Cormen, C. Stein, R. L. Rivest, and C. E. Leiserson. Introduction to Algorithms. McGraw-Hill, 2001.
[16] U. de Lichtenberg, L. J. Jensen, S. Brunak, and P. Bork. Dynamic complex formation during the yeast cell cycle. Science, 307(5710), 2005.
[17] N. Du, H. Wang, and C. Faloutsos. Analysis of large multi-modal social networks: Patterns and a generator. In Machine Learning and Knowledge Discovery in Databases, volume 6321 of LNCS. 2010.
[18] S. Goel, A. Baykal, and D. Pon. Botnets: the anatomy of a case. Journal of Information Systems Security, 2006.
[19] D. Gruhl, R. Guha, D. Liben-Nowell, and A. Tomkins. Information diffusion through blogspace. In Proceedings of the 13th international conference on World Wide Web, WWW ’04, pages 491–501, New York, NY, USA, 2004. ACM.
[20] M. Gupta, C. Aggarwal, J. Han, and Y. Sun. Evolutionary clustering and analysis of bibliographic networks. In ASONAM, 2011.
[21] D. Helbing, M. Treiber, A. Kesting, and M. Schönhof. Theoretical vs. empirical classification and prediction of congested traffic states. Eur. Phys. J. B, 69(4):583–598, 2009.
[22] D. S. Johnson, M. Minkoff, and S. Phillips. The prize collecting Steiner tree problem: Theory and practice. In SODA, 2000.
[23] D. Kempe, J. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 137–146. ACM, 2003.
[24] D. Kempe, J. Kleinberg, and É. Tardos. Influential nodes in a diffusion model for social networks. In Automata, languages and programming, pages 1127–1138. Springer, 2005.
[25] B. S. Kerner, H. Rehborn, M. Aleksic, and A. Haug. Recognition and tracking of spatial-temporal congested traffic patterns on freeways. Transportation Research Part C: Emerging Technologies, 12(5), 2004.
[26] M.-S. Kim and J. Han. A particle-and-density based evolutionary clustering method for dynamic networks. VLDB Endow., 2.
[27] K. Lerman and T. Hogg. Using a model of social dynamics to predict popularity of news. In Proceedings of the 19th international conference on World wide web, WWW ’10, pages 621–630, New York, NY, USA, 2010. ACM.
[28] J. Leskovec, L. A. Adamic, and B. A. Huberman. The dynamics of viral marketing. ACM Transactions on the Web (TWEB), 1(1):5, 2007.
[29] M. Li, X. Wu, J. Wang, and Y. Pan. Towards the identification of protein complexes and functional modules by integrating PPI network and gene expression data. BMC Bioinformatics, (1):109, 2012.
[30] Y.-R. Lin, Y. Chi, S. Zhu, H. Sundaram, and B. L. Tseng. Facetnet: a framework for analyzing communities and their evolutions in dynamic networks. In WWW, 2008.
[31] X. Ma, H. Xiao, S. Xie, Q. Li, Q. Luo, and C. Tian. Continuous, online monitoring and analysis in large water distribution networks. In ICDE, 2011.
[32] M. McGlohon, J. Leskovec, C. Faloutsos, M. Hurst, and N. Glance. Finding patterns in blog shapes and blog evolution. In ICWSM, 2007.
[33] M. Mongiovi, P. Bogdanov, R. Ranca, A. K. Singh, E. E. Papalexakis, and C. Faloutsos. Netspot: Spotting significant anomalous regions on dynamic networks. In SDM, 2013.
[34] A. Montanari and A. Saberi. The spread of innovations in social networks. Proceedings of the National Academy of Sciences, 107(47):20196–20201, 2010.
[35] P. J. Mucha, T. Richardson, K. Macon, M. A. Porter, and J.-P. Onnela. Community structure in time-dependent, multiscale, and multiplex networks. Science, 328(5980), 2010.
[36] S. A. Myers, C. Zhu, and J. Leskovec. Information diffusion and external influence in networks. In KDD, 2012.
[37] P. Pirolli and W.-T. Fu. Snif-act: A model of information foraging on the world wide web. pages 45–54, 2003.
[38] B. A. Prakash, H. Tong, N. Valler, M. Faloutsos, and C. Faloutsos. Virus propagation on time-varying networks: Theory and immunization algorithms. In Machine Learning and Knowledge Discovery in Databases, pages 99–114. Springer, 2010.
[39] J. Sun, C. Faloutsos, S. Papadimitriou, and P. S. Yu. Graphscope: parameter-free mining of large time-evolving graphs. In KDD, 2007.
[40] M. Treiber and A. Kesting. Calibration and validation of models describing the spatiotemporal evolution of congested traffic patterns. arXiv:1008.1639v1, 2010.
[41] T. Tylenda, R. Angelova, and S. J. Bedathur. Towards time-aware link prediction in evolving social networks. In SNAKDD, page 9, 2009.
[42] J. J. Tyson, K. C. Chen, and B. Novak. Sniffers, buzzers, toggles and blinkers: dynamics of regulatory and signaling pathways in the cell. Current Opinion in Cell Biology, 15(2), 2003.
[43] X. Wan, E. Milios, N. Kalyaniwalla, and J. Janssen. Link-based event detection in email communication networks. In SAC, 2009.
[44] L. Weng, F. Menczer, and Y.-Y. Ahn. Virality prediction and community structure in social networks. arXiv preprint arXiv:1306.0158, 2013.
[45] J. Westbrook and R. Tarjan. Maintaining bridge-connected and biconnected components on-line. Algorithmica, 7:433–464, 1992.
[46] T. Yang, Y. Chi, S. Zhu, Y. Gong, and R. Jin. Detecting communities and their evolutions in dynamic social networks: a Bayesian approach. Mach. Learn., 82, 2011.