
Uncovering the Temporal Dynamics of Diffusion Networks

Manuel Gomez-Rodriguez (1,2), David Balduzzi (1), Bernhard Schölkopf (1)
(1) MPI for Intelligent Systems, (2) Stanford University

Abstract

Time plays an essential role in the diffusion of information, influence and disease over networks. In many cases we only observe when a node copies information, makes a decision or becomes infected, but the connectivity, transmission rates between nodes and transmission sources are unknown. Inferring the underlying dynamics is of outstanding interest since it enables forecasting, influencing and retarding infections, broadly construed. To this end, we model diffusion processes as discrete networks of continuous temporal processes occurring at different rates. Given cascade data (observed infection times of nodes) we infer the edges of the global diffusion network and estimate the transmission rates of each edge that best explain the observed data. The optimization problem is convex. The model naturally (without heuristics) imposes sparse solutions and requires no parameter tuning. The problem decouples into a collection of independent smaller problems, thus scaling easily to networks on the order of hundreds of thousands of nodes. Experiments on real and synthetic data show that our algorithm both recovers the edges of diffusion networks and accurately estimates their transmission rates from cascade data.

1. Introduction

Diffusion and propagation processes have received increasing attention in a broad range of domains: information propagation (Adar & Adamic, 2005; Gomez-Rodriguez et al., 2010; Meyers & Leskovec, 2010), social networks (Kempe et al., 2003; Lappas et al., 2010), viral marketing (Watts & Dodds, 2007) and epidemiology (Wallinga & Teunis, 2004).

Appearing in Proceedings of the 28th International Conference on , Bellevue, WA, USA, 2011. Copyright 2011 by the author(s)/owner(s).

Observing a diffusion process often reduces to noting when nodes (people, blogs, etc.) reproduce a piece of information, get infected by a virus, or buy a product. Epidemiologists can observe when a person becomes ill but they cannot tell who infected her or how many exposures and how much time was necessary for the infection to take hold. In information propagation, we observe when a blog mentions a piece of information. However if, as is often the case, the blogger does not link to her source, we do not know where she acquired the information or how long it took her to post it. Finally, viral marketers can track when customers buy products or subscribe to services, but typically cannot observe who influenced customers' decisions, how long they took to make up their minds, or when they passed recommendations on to other customers. In all these scenarios, we observe where and when but not how or why information (be it in the form of a virus, a meme, or a decision) propagates through a population of individuals. The mechanism underlying the process is hidden. However, the mechanism is of outstanding interest in all three cases, since understanding diffusion is necessary for stopping infections, predicting meme propagation, or maximizing sales of a product.

This article presents a method for inferring the mechanisms underlying diffusion processes based on observed infections. To achieve this aim, we construct a model incorporating some basic assumptions about the spatiotemporal structures that generate diffusion processes. The assumptions are as follows. First, diffusion processes occur over static (fixed) but unknown networks (directed graphs). Second, infections are binary, i.e., a node is either infected or it is not; we do not model partial infections or the partial propagation of information. Third, infections along edges of the network occur independently of each other. Fourth, an infection can occur at different times: the likelihood of node a infecting node b at time t is modeled via a probability density function depending on a, b and t. Finally, we observe all infections occurring in the network during the recorded time window. Our aim is to infer the connectivity of the network and the likelihood of infections across its edges after observing the times at which nodes in the network become infected.

In more detail, we formulate a generative probabilistic model of diffusion that aims to describe realistically how infections occur over time in a static network. Finding the optimal network and transmission rates maximizing the likelihood of an observed set of infection cascades reduces to solving a convex program. The convex problem decouples into many smaller problems, allowing for natural parallelization so that our algorithm scales to networks with hundreds of thousands of nodes. We show the effectiveness of our method by reconstructing the connectivity and continuous temporal dynamics of synthetic and real networks using cascade data.

Related work. The work most closely related to ours (Gomez-Rodriguez et al., 2010; Meyers & Leskovec, 2010) also uses a generative probabilistic model for inferring diffusion networks. Gomez-Rodriguez et al. (2010) (NETINF) infer network connectivity using submodular optimization, and Meyers & Leskovec (2010) (CONNIE) infer not only the connectivity but also a prior probability of infection for every edge using a convex program and some heuristics. However, both papers force the transmission rate between all nodes to be fixed rather than inferred. In contrast, our model allows transmission at different rates across different edges so that we can infer temporally heterogeneous interactions within a network, as found in real-world examples. Thus, we can now infer the temporal dynamics of the underlying network.

The main innovation of this paper is to model diffusion as a spatially discrete network of continuous, conditionally independent temporal processes occurring at different rates. Infection transmission depends on the complex intricacies of the underlying mechanisms (e.g., a person's susceptibility to viral infections depends on weather, diet, age, stress levels, prior exposures to similar pathogens and so on). We avoid modeling the mechanisms underlying individual infections, and instead develop a data-driven approach, suitable for large-scale analyses, that infers the diffusion process using only the visible spatiotemporal traces (cascades) it generates. We therefore model diffusion using only time-dependent pairwise transmission likelihoods between pairs of nodes, transmission rates and infection times, but not prior probabilities of infection that depend on unknown external factors. To the best of our knowledge, the continuous temporal dynamics of diffusion networks have not been modeled or inferred in previous work. We believe this is a key point for understanding diffusion processes.

2. Problem formulation

This paper develops a method for inferring the spatiotemporal dynamics that generate observed infections. In this section we formulate our model, starting from the data it is designed for, and concluding with a precise statement of the network inference problem.

Data. Observations are recorded on a fixed population of $N$ nodes and consist of a set $C$ of cascades $\{\mathbf{t}^1, \ldots, \mathbf{t}^{|C|}\}$. Each cascade $\mathbf{t}^c$ is a record of observed infection times within the population during a time interval of length $T^c$. A cascade is an $N$-dimensional vector $\mathbf{t}^c := (t^c_1, \ldots, t^c_N)$ recording when nodes are infected, $t^c_k \in [0, T^c] \cup \{\infty\}$. The symbol $\infty$ labels nodes that are not infected during the observation window $[0, T^c]$; it does not imply that nodes are never infected. The 'clock' is reset to 0 at the start of each cascade. Lengthening the observation window $T^c$ increases the number of observed infections within a cascade $c$ and results in a more representative sample of the underlying dynamics. However, these advantages must be weighed against the cost of observing for longer periods. For simplicity we assume $T^c = T$ for all cascades; the results generalize trivially.

The time-stamps assigned to nodes by a cascade induce the structure of a directed acyclic graph (DAG) on the network (which is not acyclic in general) by defining node $i$ to be a parent of node $j$ if $t_i < t_j$. Thus, it is meaningful to refer to parents and children within a cascade, but not on the network. The DAG structure dramatically simplifies the computational complexity of the inference problem. Also, since the underlying network is inferred from many cascades (each of which imposes its own DAG structure), the inferred network is typically not a DAG.

Pairwise transmission likelihood. The first step in modeling diffusion dynamics is to consider pairwise interactions. We assume that infections can occur at different rates over different edges of a network, and aim to infer the transmission rates between pairs of nodes in the network.

Define $f(t_i \mid t_j; \alpha_{j,i})$ as the conditional likelihood of transmission between a node $j$ and a node $i$. The transmission likelihood depends on the infection times $(t_j, t_i)$ and a pairwise transmission rate $\alpha_{j,i}$. A node cannot be infected by a node infected later in time. In other words, a node $j$ that has been infected at a time $t_j$ may infect a node $i$ at a time $t_i$ only if $t_j < t_i$. Although in some scenarios it may be possible to estimate a non-parametric likelihood empirically, for simplicity we consider three well-known parametric models: exponential, power-law and Rayleigh (see Table 1). Transmission rates are denoted as $\alpha_{j,i} \ge 0$, and $\delta$ is the minimum allowed time difference in the power-law model, needed for the likelihood to be bounded. As $\alpha_{j,i} \to 0$ the likelihood of infection tends to zero and the expected transmission time becomes arbitrarily long. Without loss of generality, we consider $\delta = 1$ in the power-law model from now on.
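As an illustration of the data model above, the following minimal Python sketch stores a cascade as a list of infection times, with `math.inf` standing in for the symbol $\infty$, and lists the potential parents of each infected node under the rule $t_j < t_i$. The function name `potential_parents` and the example values are hypothetical choices for this sketch, not part of the paper.

```python
import math

def potential_parents(cascade, T):
    """Map each node infected within [0, T] to the nodes infected strictly earlier.

    cascade[k] is the infection time of node k, or math.inf if node k
    was not infected during the observation window.
    """
    parents = {}
    for i, t_i in enumerate(cascade):
        if t_i > T:  # not infected within the window (covers math.inf)
            continue
        parents[i] = [j for j, t_j in enumerate(cascade) if t_j < t_i]
    return parents

# Hypothetical 4-node cascade: node 0 is the root, node 2 is never observed infected.
cascade = [0.0, 1.5, math.inf, 0.7]
parents = potential_parents(cascade, T=10.0)
```

Because parents are defined by time order alone, the resulting parent relation is acyclic within a cascade even when the underlying network contains cycles.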

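Table 1. Pairwise transmission models: transmission likelihood, log survival function and hazard function.

| Model | Transmission likelihood $f(t_i \mid t_j; \alpha_{j,i})$ | Log survival function $\log S(t_i \mid t_j; \alpha_{j,i})$ | Hazard function $H(t_i \mid t_j; \alpha_{j,i})$ |
|---|---|---|---|
| Exponential (EXP) | $\alpha_{j,i}\, e^{-\alpha_{j,i}(t_i - t_j)}$ if $t_j < t_i$, 0 otherwise | $-\alpha_{j,i}(t_i - t_j)$ | $\alpha_{j,i}$ |
| Power law (POW) | $\frac{\alpha_{j,i}}{\delta}\left(\frac{t_i - t_j}{\delta}\right)^{-1-\alpha_{j,i}}$ if $t_j + \delta < t_i$, 0 otherwise | $-\alpha_{j,i} \log\left(\frac{t_i - t_j}{\delta}\right)$ | $\alpha_{j,i} \cdot \frac{1}{t_i - t_j}$ |
| Rayleigh (RAY) | $\alpha_{j,i}(t_i - t_j)\, e^{-\frac{1}{2}\alpha_{j,i}(t_i - t_j)^2}$ if $t_j < t_i$, 0 otherwise | $-\alpha_{j,i}\frac{(t_i - t_j)^2}{2}$ | $\alpha_{j,i} \cdot (t_i - t_j)$ |

Exponential and power-law models are monotonic and have been previously used in modeling diffusion networks and social networks (Gomez-Rodriguez et al., 2010; Meyers & Leskovec, 2010). Power-laws model infections with long tails. The Rayleigh model is a non-monotonic parametric model previously used in epidemiology (Wallinga & Teunis, 2004). It is well-adapted to modeling fads, where infection likelihood rises to a peak and then drops extremely rapidly.

We recall some additional standard notation (Lawless, 1982). The cumulative density function, denoted $F(t_i \mid t_j; \alpha_{j,i})$, is computed from the transmission likelihoods. Given that node $j$ was infected at time $t_j$, the survival function of edge $j \to i$ is the probability that node $i$ is not infected by node $j$ by time $t_i$:

$$S(t_i \mid t_j; \alpha_{j,i}) = 1 - F(t_i \mid t_j; \alpha_{j,i}).$$

The hazard function, or instantaneous infection rate, of edge $j \to i$ is the ratio

$$H(t_i \mid t_j; \alpha_{j,i}) = \frac{f(t_i \mid t_j; \alpha_{j,i})}{S(t_i \mid t_j; \alpha_{j,i})}.$$

Since each infected node $k$ acts on node $i$ independently, the probability that node $i$ is not infected by time $T$ by any node infected within the observation window is the product of the corresponding survival functions:

$$\prod_{k: t_k \le T} S(T \mid t_k; \alpha_{k,i}). \tag{1}$$

We now compute the likelihood of the infections observed up to time $T$ in a cascade, denoted $\mathbf{t}^{\le T}$, where $\mathbf{A}$ collects the pairwise transmission rates $\alpha_{j,i}$. We assume infections are conditionally independent given the parents of the infected nodes, so the likelihood factorizes over nodes as

$$f(\mathbf{t}^{\le T}; \mathbf{A}) = \prod_{t_i \le T} f(t_i \mid t_1, \ldots, t_N \setminus t_i; \mathbf{A}). \tag{2}$$

Computing the likelihood of a cascade thus reduces to computing the conditional likelihood of the infection time of each node given the rest of the cascade. As in the independent cascade model (Kempe et al., 2003), we assume that a node gets infected once the first parent infects the node. Given an infected node $i$, we compute the likelihood of a potential parent $j$ to be the first parent by applying Eq. 1,

$$f(t_i \mid t_j; \alpha_{j,i}) \times \prod_{j \ne k,\, t_k < t_i} S(t_i \mid t_k; \alpha_{k,i}). \tag{3}$$

We now compute the conditional likelihoods of Eq. 2 by summing over the likelihoods of the mutually disjoint events that each potential parent is the first parent,

$$f(t_i \mid t_1, \ldots, t_N \setminus t_i; \mathbf{A}) = \sum_{j: t_j < t_i} f(t_i \mid t_j; \alpha_{j,i}) \times \prod_{j \ne k,\, t_k < t_i} S(t_i \mid t_k; \alpha_{k,i}). \tag{4}$$

By Eq. 2, the likelihood of the infections in a cascade is

$$f(\mathbf{t}^{\le T}; \mathbf{A}) = \prod_{t_i \le T} \sum_{j: t_j < t_i} f(t_i \mid t_j; \alpha_{j,i}) \times \prod_{j \ne k,\, t_k < t_i} S(t_i \mid t_k; \alpha_{k,i}). \tag{5}$$

Removing the condition $k \ne j$ makes the product independent of $j$:

$$f(\mathbf{t}^{\le T}; \mathbf{A}) = \prod_{t_i \le T} \prod_{k: t_k < t_i} S(t_i \mid t_k; \alpha_{k,i}) \times \sum_{j: t_j < t_i} \frac{f(t_i \mid t_j; \alpha_{j,i})}{S(t_i \mid t_j; \alpha_{j,i})}. \tag{6}$$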
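The three parametric models and the hazard identity $H = f / S$ can be checked numerically. The sketch below is a minimal, assumption-laden illustration: the function names (`log_survival`, `hazard`, `likelihood`, `first_parent_sum`, `factored_form`) and the example time differences and rates are hypothetical. It encodes the log survival and hazard columns of Table 1, and verifies that summing over candidate first parents gives the same value as the product of survival functions multiplied by the sum of hazards.

```python
import math

def log_survival(model, dt, alpha, delta=1.0):
    """log S(t_i | t_j; alpha) from Table 1, where dt = t_i - t_j."""
    if dt <= 0 or (model == "pow" and dt <= delta):
        return 0.0  # transmission is not yet possible, so survival is certain
    if model == "exp":
        return -alpha * dt
    if model == "pow":
        return -alpha * math.log(dt / delta)
    if model == "ray":
        return -alpha * dt ** 2 / 2.0
    raise ValueError(f"unknown model: {model}")

def hazard(model, dt, alpha, delta=1.0):
    """H(t_i | t_j; alpha) from Table 1, where dt = t_i - t_j."""
    if dt <= 0 or (model == "pow" and dt <= delta):
        return 0.0
    if model == "exp":
        return alpha
    if model == "pow":
        return alpha / dt
    if model == "ray":
        return alpha * dt
    raise ValueError(f"unknown model: {model}")

def likelihood(model, dt, alpha, delta=1.0):
    """Transmission likelihood f, recovered as hazard times survival."""
    return hazard(model, dt, alpha, delta) * math.exp(log_survival(model, dt, alpha, delta))

def first_parent_sum(model, dts, alphas):
    """Sum over candidate first parents j of f_j times the survival of all other parents."""
    total = 0.0
    for j in range(len(dts)):
        term = likelihood(model, dts[j], alphas[j])
        for k in range(len(dts)):
            if k != j:
                term *= math.exp(log_survival(model, dts[k], alphas[k]))
        total += term
    return total

def factored_form(model, dts, alphas):
    """Product of all parents' survival functions times the sum of their hazards."""
    log_s = sum(log_survival(model, d, a) for d, a in zip(dts, alphas))
    return math.exp(log_s) * sum(hazard(model, d, a) for d, a in zip(dts, alphas))

# Hypothetical node with three potential parents: time differences and rates.
dts = [2.0, 3.5, 1.2]
alphas = [0.5, 0.1, 0.8]
for model in ("exp", "pow", "ray"):
    assert math.isclose(first_parent_sum(model, dts, alphas),
                        factored_form(model, dts, alphas))
```

The factored form costs time linear in the number of parents, whereas the direct sum over first parents is quadratic, which is one practical reason to prefer it.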

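Eq. 6 only considers infected nodes. However, the fact that some nodes are not infected during the observation window is also informative. We therefore add the multiplicative survival term from Eq. 1 and also replace the ratios in Eq. 6 with hazard functions:

$$f(\mathbf{t}; \mathbf{A}) = \prod_{t_i \le T} \prod_{t_m > T} S(T \mid t_i; \alpha_{i,m}) \times \prod_{k: t_k < t_i} S(t_i \mid t_k; \alpha_{k,i}) \sum_{j: t_j < t_i} H(t_i \mid t_j; \alpha_{j,i}). \tag{7}$$

Assuming independent cascades, the likelihood of a set of cascades is the product of the likelihoods of the individual cascades given by Eq. 7:

$$f(\{\mathbf{t}^1, \ldots, \mathbf{t}^{|C|}\}; \mathbf{A}) = \prod_{\mathbf{t}^c \in C} f(\mathbf{t}^c; \mathbf{A}). \tag{8}$$

The network inference problem is to find the transmission rates that maximize the likelihood of the observed cascades:

$$\begin{aligned} \underset{\mathbf{A}}{\text{minimize}} \quad & -\sum_{c \in C} \log f(\mathbf{t}^c; \mathbf{A}) \\ \text{subject to} \quad & \alpha_{j,i} \ge 0, \quad i, j = 1, \ldots, N,\ i \ne j. \end{aligned} \tag{9}$$

The edges of the network are those pairs of nodes with transmission rates $\alpha_{j,i} > 0$.

3. Proposed algorithm: NETRATE

The solution to Eq. 9 is unique, computable and consistent:

Theorem 1. Given log-concave survival functions and concave hazard functions in the parameter(s) of the pairwise transmission likelihoods, the network inference problem defined by Eq. 9 is convex in $\mathbf{A}$.

Proof. By Eq. 8, the log-likelihood of a set of cascades is

$$L(\{\mathbf{t}^1, \ldots, \mathbf{t}^{|C|}\}; \mathbf{A}) = \sum_{c} \Psi_1(\mathbf{t}^c; \mathbf{A}) + \Psi_2(\mathbf{t}^c; \mathbf{A}) + \Psi_3(\mathbf{t}^c; \mathbf{A}), \tag{10}$$

where for each cascade $\mathbf{t}^c \in \{\mathbf{t}^1, \ldots, \mathbf{t}^{|C|}\}$,

$$\Psi_1(\mathbf{t}^c; \mathbf{A}) = \sum_{i: t_i \le T} \sum_{t_m > T} \log S(T \mid t_i; \alpha_{i,m}),$$

$$\Psi_2(\mathbf{t}^c; \mathbf{A}) = \sum_{i: t_i \le T} \sum_{j: t_j < t_i} \log S(t_i \mid t_j; \alpha_{j,i}),$$

$$\Psi_3(\mathbf{t}^c; \mathbf{A}) = \sum_{i: t_i \le T} \log \sum_{j: t_j < t_i} H(t_i \mid t_j; \alpha_{j,i}).$$

If all pairwise transmission likelihoods between pairs of nodes in the network have log-concave survival functions and concave hazard functions in the parameter(s) of the pairwise transmission likelihoods, then convexity of Eq. 9 follows from linearity, composition rules for concavity, and concavity of the logarithm.

Corollary 2. The network inference problem defined by Eq. 9 is convex for the exponential, power-law and Rayleigh models.

For these models, the objective depends only on the transmission rates $\alpha_{j,i}$ and infection time differences $(t_i - t_j)$, but not on absolute infection times $t_i$ or $t_j$. Our formulation thus does not depend on the absolute time of the root node of each cascade.

The $\Psi_1$ and $\Psi_2$ terms contribute a positively weighted $\ell_1$-norm on the vector $\mathbf{A}$ that encourages sparse solutions (Boyd & Vandenberghe, 2004). The penalty arises naturally within the probabilistic model and therefore heuristic penalty terms to encourage sparsity are not necessary. Each term of the $\ell_1$-norm is linearly (exponential model), logarithmically (power-law) or quadratically (Rayleigh) weighted by infection times.

The $\Psi_2$ term penalizes edges $k \to i$ based on the infection time difference $t_i - t_k$. Edges that transmit infections slowly are heavily penalized, whereas edges that transmit quickly are penalized lightly. The $\Psi_1$ term penalizes edges $i \to j$ targeting uninfected nodes $j$ based on the time $T - t_i$ until the observation window cut-off. Lengthening the observation window produces harsher penalties; however, it also allows further infections. The penalties are finite, i.e., if no infection of node $j$ is observed, we can only say that it has survived until time $T$. There is insufficient evidence to claim that node $j$ will never be infected.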
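To make the three terms of the log-likelihood concrete, here is a minimal Python sketch for the exponential model on a single cascade. It is an illustration under stated assumptions rather than the authors' implementation: the names `exp_log_likelihood_terms`, `rates` and `total` are hypothetical, `A[j][i]` stores $\alpha_{j,i}$, and the root of the cascade (a node with no earlier-infected node) is skipped in the hazard term, which the text does not spell out.

```python
import math

def exp_log_likelihood_terms(cascade, A, T):
    """Return (Psi_1, Psi_2, Psi_3) for one cascade under the exponential model.

    cascade[k] is the infection time of node k (math.inf if uninfected),
    A[j][i] is the transmission rate alpha_{j,i}, T is the window length.
    """
    n = len(cascade)
    infected = [i for i in range(n) if cascade[i] <= T]
    uninfected = [m for m in range(n) if cascade[m] > T]
    psi1 = psi2 = psi3 = 0.0
    for i in infected:
        # Psi_1: node i failed to infect each node m that survived until T.
        for m in uninfected:
            psi1 -= A[i][m] * (T - cascade[i])
        parents = [j for j in infected if cascade[j] < cascade[i]]
        if not parents:
            continue  # assumption: the root has no parent, so no Psi_2/Psi_3 term
        # Psi_2: node i survived each potential parent until its own infection.
        for j in parents:
            psi2 -= A[j][i] * (cascade[i] - cascade[j])
        # Psi_3: log of the summed hazards; for the exponential model H = alpha.
        rate = sum(A[j][i] for j in parents)
        psi3 += math.log(rate) if rate > 0 else -math.inf
    return psi1, psi2, psi3

def rates(a, b, c):
    """Hypothetical 3-node rate matrix with edges 0->1 (a), 0->2 (b), 1->2 (c)."""
    return [[0.0, a, b], [0.0, 0.0, c], [0.0, 0.0, 0.0]]

cascade = [0.0, 2.0, math.inf]  # node 0 is the root, node 2 is never infected
T = 10.0
psi1, psi2, psi3 = exp_log_likelihood_terms(cascade, rates(0.5, 0.1, 0.2), T)

def total(a, b, c):
    """Log-likelihood of the example cascade as a function of the three rates."""
    return sum(exp_log_likelihood_terms(cascade, rates(a, b, c), T))
```

In this example the rates on the two edges pointing at the uninfected node only appear in $\Psi_1$, so setting them to zero raises the log-likelihood, while the rate on the edge $0 \to 1$ trades the linear penalty $-2\alpha$ against the reward $\log \alpha$ and is best at $\alpha = 1/2$, the reciprocal of the observed delay.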

NETRATE instead infers that infections are impossible across certain edges, i.e., that some of the optimal rates $\alpha_{j,i}$ are 0, based solely on the observed data and the length of the time horizon.

The $\Psi_3$ term ensures infected nodes have at least one parent, since otherwise the log-likelihood would be unbounded below, i.e., $\log 0 = -\infty$. Moreover, our formulation encourages a natural diminishing property on the number of parents of a node: since the logarithm grows slowly, it only weakly rewards infected nodes for having many parents.

Optimizing NETRATE. We speed up the convex program by orders of magnitude via two improvements:

Distributed optimization: The optimization problem splits into $N$ subproblems, one for each node $i$, in which we find the $N - 1$ rates $\alpha_{j,i}$, $j = 1, \ldots, N \setminus i$. The computation can be performed in parallel, obtaining local solutions that are globally optimal. Importantly, each node's computation only requires the infection times of other nodes in cascades it belongs to.

Unfeasible rates: If a pair $(j, i)$ is not in any common cascade, $\alpha_{j,i}$ only arises in the non-positive term $\Psi_1$ in Eq. 10, so the optimal $\alpha_{j,i}$ is zero. We therefore simplify the objective function by setting $\alpha_{j,i}$ to zero.

Solving NETRATE. We solve Eq. 9 with CVX, a package for specifying and solving convex programs (Grant & Boyd, 2010).

4. Experimental evaluation

We evaluate the performance of NETRATE on: (i) synthetic networks that mimic the structure of social networks and (ii) real cascades extracted from the MemeTracker dataset (footnote 1). We show that NETRATE discovers more than 95% of the edges in synthetic networks and more than 60% in real networks, accurately recovers transmission rates from diffusion data, and typically outperforms two previously developed inference algorithms, NETINF and CONNIE. We use the public implementations of NETINF and CONNIE.

Footnote 1: Data available at http://memetracker.org

4.1. Experiments on synthetic data

Experimental setup. We focus on synthetic networks that mimic the structure of real-world diffusion networks, in particular social networks. We consider two models of directed real-world social networks to generate diffusion networks: the Forest Fire (scale free) model (Barabási & Albert, 1999) and the Kronecker Graph model (Leskovec et al., 2010). We generate three types of Kronecker graph with very different structures: random (Erdős & Rényi, 1960) (parameter matrix [0.5, 0.5; 0.5, 0.5]), hierarchical (Clauset et al., 2008) ([0.9, 0.1; 0.1, 0.9]) and core-periphery (Leskovec et al., 2008) ([0.9, 0.5; 0.5, 0.3]).

First, we generate a network $G^*$ by drawing transmission rates for edges $(j, i)$ from a uniform distribution. For the exponential and Rayleigh models $\alpha \in [0.01, 1]$ and for the power law $\alpha \in [0.01, 2]$. The transmission rate for an edge $(j, i)$ models how fast the information spreads from node $j$ to node $i$ in social networks. Then, we generate a set of cascades over $G^*$. Root nodes of cascades are chosen uniformly at random. As noted previously, the optimization problem depends on the time differences $(t_i - t_j)$. Therefore, our formulation does not depend on the absolute time of the root node of each cascade. Once a node is infected, the transmission likelihoods of outgoing edges determine the infection times of its neighbours. We record the time of the first infection if a node is infected more than once. Infections are not observed after a pre-specified time horizon $T$.

Accuracy of NETRATE. We evaluate NETRATE in two ways. First, we compare it against two other inference methods, NETINF and CONNIE, by comparing the inferred and true networks via three measures: precision, recall and accuracy. Precision is the fraction of edges in the inferred network $\hat{G}$ present in the true network $G^*$. Recall is the fraction of edges of the true network $G^*$ present in the inferred network $\hat{G}$. Accuracy is

$$1 - \frac{\sum_{i,j} |I(\alpha^*_{i,j}) - I(\hat{\alpha}_{i,j})|}{\sum_{i,j} I(\alpha^*_{i,j}) + \sum_{i,j} I(\hat{\alpha}_{i,j})},$$

where $I(\alpha) = 1$ if $\alpha > 0$ and $I(\alpha) = 0$ otherwise. Inferred networks with no edges or only false edges have zero accuracy. Second, we evaluate how accurately NETRATE infers transmission rates over edges by computing the normalized mean absolute error (MAE, i.e., $E\left[|\alpha^* - \hat{\alpha}| / \alpha^*\right]$, where $\alpha^*$ is the true transmission rate and $\hat{\alpha}$ is the estimated transmission rate). Note that in NETRATE, as for real cascades, the probability of infection depends on both the transmission rate and the observation window. In contrast, CONNIE assigns probability priors to edges that are defined without reference to an observation window. Therefore, the values assigned to edges by NETRATE and CONNIE are not comparable, so we do not compute MAE for CONNIE.

[Figure 1: six panels comparing NetRate, NetInf and ConNIe. (a) Precision-recall (Hierarchical, EXP); (b) Precision-recall (Forest fire, POW); (c) Precision-recall (Random, RAY); (d) Accuracy (Hierarchical, EXP); (e) Accuracy (Forest fire, POW); (f) Accuracy (Random, RAY). Panels (a-c): precision against recall. Panels (d-f): accuracy against ρ (top axis) and k (bottom axis).]

Figure 1. Panels (a-c) plot precision against recall; panels (d-f) plot accuracy. For CONNIE and NETINF we sweep over parameters ρ (penalty factor) and k (number of edges) respectively to control the solution sparsity in both algorithms, thereby generating a family of inferred models. NETRATE has no tunable parameters and therefore yields a unique solution. (a,d): 1,024 node hierarchical Kronecker network with exponential model for 5,000 cascades. (b,e): 1,024 node Forest Fire network with power law model for 5,000 cascades. (c,f): 1,024 node random Kronecker network with Rayleigh model for 2,000 cascades.

Figure 1 compares the precision, recall and accuracy of NETRATE with NETINF and CONNIE for two types of Kronecker networks (hierarchical community structure and random) and a Forest Fire network over an observation window of length $T = 10$. In terms of precision-recall, NETRATE outperforms CONNIE and NETINF for all the synthetic examples in the Pareto sense (Boyd & Vandenberghe, 2004). More specifically, if we set CONNIE and NETINF's tunable parameters to provide solutions with the same precision as NETRATE, NETRATE's recall is always higher than that of the other two methods. Strikingly, CONNIE and NETINF do not achieve NETRATE's recall for any precision value. NETRATE outperforms CONNIE with respect to accuracy for any penalty factor ρ in all the synthetic examples. It is also more accurate than NETINF for most values of k (number of edges). Importantly, NETINF and CONNIE yield a curve of solutions from which one has to select a point blindly (or at best heuristically), whereas NETRATE yields a unique solution without any tuning.

[Figure 2: bar chart of normalized MAE for the EXP, POW and RAY models on Core-Periphery, Hierarchical, Random and Forest-Fire networks.]

Figure 2. NETRATE's normalized mean absolute error (MAE) for three types of Kronecker networks (1,024 nodes and 2,048 edges) and a Forest Fire network (1,024 nodes and 2,422 edges) for 5,000 cascades. We consider all three models of transmission: exponential (EXP), power-law (POW) and Rayleigh (RAY).

Figure 2 shows the normalized MAE of the estimated transmission rates for the same networks, computed on 5,000 cascades. The normalized MAE is under 25% for almost all networks and transmission models, which is surprisingly low given that we are estimating more than 2,000 non-zero real numbers.

NETRATE performance vs. cascade coverage. Observing more cascades leads to higher precision-recall and more accurate estimates of the transmission rates. Figure 3(a) plots the MAE of inferred networks against the number of observed cascades for a hierarchical Kronecker network with all three transmission models. Estimating transmission rates is considerably harder than simply discovering edges and therefore more cascades are needed for accurate estimates. As many as 5,000 cascades are required to obtain normalized MAE values lower than 20%.

NETRATE performance vs. time horizon. Intuitively, the longer the observation window, the more accurately NETRATE is able to infer transmission rates. Figure 3(b) confirms this intuition by showing the MAE of inferred networks for different time horizons $T$ for a hierarchical Kronecker network with exponential, power-law and Rayleigh transmission models for 5,000 cascades.

NETRATE running time. Figure 3(c) plots the average running time to infer rates of all incoming edges to a node against the number of nodes in a network (the number of edges is twice the number of nodes) on a single CPU.

Further improvements can be achieved since NETRATE naturally splits into a collection of subproblems, one per node. A cluster with 25 CPUs can therefore infer a network with 16,000 nodes (and 32,000 edges) in less than 4 hours.

[Figure 3: three panels. (a) Cascade coverage: normalized MAE against number of cascades, for EXP, POW and RAY. (b) Time horizon: normalized MAE against T, for EXP, POW and RAY. (c) Running time: running time (s) against network size (nodes).]

Figure 3. Panels (a,b) show NETRATE's normalized MAE vs. number of cascades and time horizon respectively for a hierarchical Kronecker network with 1,024 nodes and 2,048 edges with exponential (EXP), power-law (POW) and Rayleigh (RAY) transmission models. Panel (c) plots NETRATE's average running time to infer rates of all incoming edges to a node against network size (number of nodes) for a hierarchical Kronecker network.

4.2. Experiments on real data

Dataset description. As in previous work tackling diffusion networks, we use the MemeTracker dataset, which contains more than 172 million news articles and blog posts from 1 million online sources. We use hyperlinks between blog posts to trace the flow of information. A site publishes a piece of information and uses hyperlinks to refer to the same or closely related pieces of information published by other sites. These other sites link to still others and so on. A cascade is thus a collection of time-stamped hyperlinks between sites (in blog posts) that refer to the same or closely related pieces of information. We record one cascade per piece (or closely related pieces) of information. We extract the top 500 media sites and blogs with the largest number of documents, 5,000 hyperlinks and 116,234 cascades.

Accuracy on real data. As the ground truth is unknown on real data, we proceed as follows. We create a network where there is an edge $(u, v)$ if a post on a site $u$ linked to a post on a site $v$. We consider this as the ground truth network $G^*$, and we use the hyperlink cascades to infer the network $\hat{G}$ and evaluate how many edges our method estimates properly. We assume an exponential model.

Figure 4 compares our results with NETINF and CONNIE. As in the synthetic experiments, NETRATE yields a unique solution whereas the other algorithms produce curves of solutions. Panel (a) shows that NETRATE performs comparably to NETINF and outperforms CONNIE on precision and recall. Panel (b) plots accuracy: NETRATE outperforms the other two algorithms on the majority of their output solutions, and almost matches their best performances. Since there are no principled methods for choosing single solutions for NETINF and CONNIE, there is no guarantee that the solutions chosen from the curves will be anywhere near the highest achievable value.

5. Conclusions

We have developed a flexible model, NETRATE, of the spatiotemporal structure underlying diffusion processes. The model makes minimal assumptions about the physical, biological or cognitive mechanisms responsible for diffusion. Instead, it infers transmission rates between nodes of a network by computing the model that maximizes the likelihood of the observed data: temporal traces left by cascades of infections. Qualitative assumptions about infections (e.g., are they long-tailed?) determine the choice of parametric model on the edges. An interesting feature of NETRATE, to be investigated in future work, is the possibility of combining exponential, power law, Rayleigh or other models within a single inference algorithm, thus providing tremendous flexibility in fitting real data which may combine long-tailed, faddish and other qualitative behaviors.

Remarkably, introducing continuous temporal dynamics, allowing variable transmission rates across edges, and avoiding further assumptions dramatically simplified the problem compared with previous approaches (Gomez-Rodriguez et al., 2010; Meyers & Leskovec, 2010). The model has parameters with natural interpretations, and it leads to a well-defined convex maximum likelihood problem that can be solved efficiently. Importantly, we do not need to tune parameters by hand to control the sparsity of the inferred network (i.e., number of edges to infer or penalty terms). Heuristic $\ell_1$-like penalty terms, such as the ones used in Meyers & Leskovec (2010), are unnecessary since the probabilistic model naturally imposes sparsity.

[Figure 4: two panels comparing NetRate, NetInf and ConNIe on real data. (a) Precision-recall: precision against recall. (b) Accuracy: accuracy against ρ (top axis) and k (bottom axis).]

Figure 4. Real data. Precision-recall and accuracy of NETRATE, NETINF and CONNIE, with an exponential model, on a 500 node hyperlink network with 5,000 edges using hyperlink cascades.

We evaluated NETRATE on a wide range of synthetic diffusion networks with heterogeneous temporal dynamics which aim to mimic the structure of real-world social and information networks. NETRATE provides a unique solution to the network inference problem with high recall, precision and accuracy. A direct comparison with the current state of the art is difficult, since these methods include a parameter controlling the sparsity of the inferred network that requires blind tuning. Nevertheless, NETRATE is typically better in terms of accuracy than previous methods across the full range of their tunable parameters. In addition, it accurately estimates transmission rates, which other methods cannot estimate at all. The performance of CONNIE appears significantly worse than reported in Meyers & Leskovec (2010); a possible explanation for the degradation is that in our work, we consider networks with heterogeneous temporal dynamics. It is surprising how well NETINF performs in comparison with NETRATE despite assuming uniform temporal dynamics and priors.

Finally, we evaluated NETRATE on real data. Again, NETRATE provides a unique solution to the network inference problem but in this case, as expected, the values of recall, precision and accuracy are modest, since adopting a simple parametric pairwise transmission model is a simplistic assumption on real data. In terms of accuracy, it outperforms previous methods across a significant part of the full range of their tunable parameters.

NETRATE provides a novel view of diffusion processes. We believe it can be fruitfully applied to several lines of research including influence maximization, control of epidemics, and causal inference.

References

Adar, E. and Adamic, L. A. Tracking information epidemics in blogspace. In Web Intelligence, pp. 207–214, 2005.

Barabási, A.-L. and Albert, R. Emergence of scaling in random networks. Science, 286:509–512, 1999.

Boyd, S. P. and Vandenberghe, L. Convex optimization. Cambridge University Press, 2004.

Clauset, A., Moore, C., and Newman, M. E. J. Hierarchical structure and the prediction of missing links in networks. Nature, 453(7191):98–101, 2008.

Erdős, P. and Rényi, A. On the evolution of random graphs. Publication of the Mathematical Institute of the Hungarian Academy of Science, 5:17–67, 1960.

Gomez-Rodriguez, M., Leskovec, J., and Krause, A. Inferring Networks of Diffusion and Influence. In Proc. of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1019–1028, 2010.

Grant, M. and Boyd, S. CVX: Matlab software for disciplined convex programming, version 1.21. http://cvxr.com/cvx, 2010.

Kempe, D., Kleinberg, J. M., and Tardos, É. Maximizing the spread of influence through a social network. In Proc. of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 137–146, 2003.

Lappas, T., Terzi, E., Gunopulos, D., and Mannila, H. Finding effectors in social networks. In Proc. of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1059–1068, 2010.

Lawless, J. F. Statistical models and methods for lifetime data. Wiley New York, 1982.

Leskovec, J., Lang, K. J., Dasgupta, A., and Mahoney, M. W. Statistical properties of community structure in large social and information networks. In Proc. of the 17th International Conference on World Wide Web, 2008.

Leskovec, J., Chakrabarti, D., Kleinberg, J., Faloutsos, C., and Ghahramani, Z. Kronecker graphs: An approach to modeling networks. The Journal of Machine Learning Research, 11:985–1042, 2010.

Meyers, S. and Leskovec, J. On the Convexity of Latent Social Network Inference. In Advances in Neural Information Processing Systems, 2010.

Newey, W. K. and McFadden, D. L. Large sample estimation and hypothesis testing. In Handbook of Econometrics, Volume IV, pp. 2111–2245. 1994.

Wallinga, J. and Teunis, P. Different epidemic curves for severe acute respiratory syndrome reveal similar impacts of control measures. American Journal of Epidemiology, 160(6):509–516, 2004.

Watts, Duncan J. and Dodds, Peter S. Influentials, networks, and public opinion formation. Journal of Consumer Research, 34(4):441–458, 2007.