A Comparison Study of Disease Spreading Models

by Douglas Nogueira Oliveira

A dissertation submitted to Florida Institute of Technology in partial fulfillment of the requirements for the degree of

Doctor of Philosophy in Computer Science

Melbourne, Florida
April 2018

We the undersigned committee hereby recommend that the attached dissertation

be accepted as fulfilling in part the requirements for the degree of

Doctor of Philosophy in Computer Science

A Comparison Study of Disease Spreading Models

by Douglas Nogueira Oliveira

Marco Carvalho, Ph.D. Associate Professor, Computer Sciences Advisor and Committee Chair

Robert van Woesik, Ph.D. Professor, Biological Sciences

William Allen, Ph.D. Associate Professor, Computer Sciences

Ronaldo Menezes, Ph.D. Professor, Computer Sciences

Abstract

Title: A Comparison Study of Disease Spreading Models
Author: Douglas Nogueira Oliveira
Committee Chair: Marco Carvalho, Ph.D.

Epidemic diseases have become one of the major sources of concern in modern society. Diseases such as HIV and H1N1 have infected millions of people around the world. Many studies have attempted to describe the dynamics of disease spreading. Despite the numerous models, most only analyze one aspect of a given model, such as the impact of the topology of the contagion network or the spreading rate of the disease. Although useful, these types of analyses do not provide insights into how different models could be combined to better understand the phenomenon, or how modeling decisions affect the dynamics of disease spreading. The lack of methods to compare and analyze different models and assess their limitations is an open problem. We need methods to answer questions such as: Is it possible to combine different scenarios produced by different models to gain a better understanding of disease spreading? How do modeling decisions affect the dynamics of disease spreading? Is it possible to rank different models according to

their accuracy and identify statistical dependencies among the parameters of the system? In this dissertation, I propose an approach to compare disease spreading models using Bayesian networks, which allows one to understand the relationship between the variables of a model and the results it is able to generate. This approach, formalized as a methodology, provides a new perspective on the study of disease spreading models. By creating a merged Bayesian model that accounts for all the results of the analyzed models, we are able to evaluate how modeling decisions affect the dynamics of disease spreading and, consequently, the accuracy of such models. We apply our methodology by analyzing models of disease spreading presented in the literature. Our results show that some models are better suited than others with respect to their accuracy in reproducing real-world scenarios. We evaluated the effects of the models' parameters, and of the values chosen for them, on the spreading dynamics. We also used real-world data about chlamydia to validate our results.

Contents

1 Introduction

2 Literature Review
  2.1 Related Work
    2.1.1 Complex Networks
    2.1.2 Disease Spreading Models
    2.1.3 Probabilistic Graphical Models
  2.2 State-of-the-Art
  2.3 Literature Gaps
    2.3.1 Problem Statement
    2.3.2 Thesis Statement

3 Proposed Methodology
  3.1 Controlled Experiment
  3.2 Weak Ensemble Assumption
  3.3 Proposed Methodology

4 Model Analyses
  4.1 SIS Models Analysis
    4.1.1 Analysis
    4.1.2 CDC Data Analysis
  4.2 SIR Models Analysis
    4.2.1 Analysis

5 Conclusions

List of Figures

2.1 Examples of various real-world networks. (a) A tourism network where nodes are grouped into communities of different colors [32]. (b) A regulatory network of a bacterium where nodes are colored according to the community they belong to [155]. (c) A friendship network of children in a US school where nodes are colored according to race [150]. (d) A network of sexual contacts from [172]. (e) The Brazilian airline network where nodes are colored according to their community [156]. (f) A network of the nervous system of the worm C. elegans, from [112]. (g) A recipe network where the color of a node represents its country of origin, from [116].
2.2 (a) Example of network evolution using the model of [63] with different p. (b) The degree distribution of a random network with 10,000 nodes and p = 0.0015, showing the number of nodes with degree k, X_k. Both figures were taken from [7].
2.3 The degree distribution of real-world networks. (a) The Internet at the router level. (b) Movie actor collaboration network. (c) Coauthorship network of physicists. (d) Coauthorship network of neuroscientists. All plots were taken from [7].
2.4 (a) The random rewiring of edges in a network with twenty nodes, varying p from zero to one. When p = 0 we have a regular lattice; when p = 1 the lattice is random; between the extremes lie small-world networks. (b) Characteristic path length L(p) and clustering coefficient C(p) for the Watts and Strogatz model. Both figures were taken from [203].
2.5 (a) Delay in case importation from Mexico to a given country compared with the reference scenario, as a function of the travel reduction α. (b) The same as (a) where earlier dates for the start of the intervention are considered, with a fixed α = 90%. Figure taken from [18].
2.6 The Bayesian network graph for a naive Bayes model. Figure taken from [113].

3.1 Bayesian network of the phenomenon of the controlled experiment.
3.2 Bayesian networks that represent the different classes of models that describe the controlled-experiment phenomenon. On the top left is a Bayesian network generated from an instance of Class 1; on the top right, Class 2; on the bottom left, Class 3; and on the bottom right, Class 4.
3.3 Bayesian networks obtained from merging the results of different models. On the left is the Bayesian network obtained by merging the results from an instance of Class 1 and an instance of Class 2; in the center, from an instance of Class 1 and an instance of Class 3; on the right, from an instance of Class 2 and an instance of Class 4.
3.4 An illustration of how the weak ensemble approach fits our work.
3.5 Steps of the proposed methodology.

4.1 The Bayesian network of the PV model, generated by implementing the model used in [161].
4.2 The Bayesian network of the KA model, generated by implementing the model used in [117].
4.3 The Bayesian network generated from the sampled table of the results of the KA model. Edges colored red were removed and edges colored blue were added.
4.4 The resulting Bayesian network created using the merged dataset from the PV and KA models.
4.5 Table representing the Bayesian structure of each model and the merged one. Each red-colored cell indicates the presence of an edge between the two variables of its row and column. Cells colored in yellow represent variables that were not considered in that model. The number in each cell represents the corresponding points scored for that edge.
4.6 The Bayesian network of each model with the dependencies colored to reflect accurate (green) and spurious (red) dependencies. The Bayesian network on the left belongs to the PV model and the one on the right belongs to the KA model.
4.7 The proportion of accurate and spurious edges identified in each model analyzed.
4.8 The total rate of reported cases of chlamydia for the United States and outlying areas. Figure taken from [68].
4.9 The total rate of reported cases of chlamydia by county in the state of Florida. Figure taken from [68].
4.10 The Bayesian network created from real data about chlamydia.
4.11 The proportion of accurate and spurious edges identified in each Bayesian network created in this analysis: merged, PV, and KA models. This comparison uses as its basis the Bayesian network created from real-world data about chlamydia.
4.12 The Bayesian network generated by implementing the model used in [146].
4.13 The Bayesian network generated by implementing the model used in [139].
4.14 The Bayesian network created by merging the PGMs of the Nek and Moore models.
4.15 Table representing the Bayesian structure of each model and the merged one. Each red-colored cell indicates the presence of an edge between the two variables of its row and column. Cells colored in yellow represent variables that were not considered in that model. The number in each cell represents the corresponding scenario of the merging outcome.
4.16 The Bayesian network of each model with the dependencies colored to reflect accurate (green) and spurious (red) dependencies. The Bayesian network on the left belongs to the Nek model and the one on the right belongs to the Moore model.
4.17 The proportion of accurate and spurious edges identified in each model analyzed.

Acknowledgements

I would like to express my deepest appreciation to my advisor, Dr. Marco Carvalho, who has shown singular attitude and substance. Without his supervision and constant help, this dissertation would not have been possible. I would also like to thank Dr. Robert van Woesik, Dr. William Allen, and Dr. Ronaldo Menezes for serving as my committee members. I am also grateful to the staff of the Computer Science Department for being supportive and helpful throughout my doctoral program. Special thanks to Mrs. Dee Casale, who made these last four years more enjoyable and fun with her inexplicably ever-present good mood. I would also like to thank Dr. Vasco Furtado for helping me seize this opportunity here at Florida Tech. A special thanks to my family: words cannot express how grateful I am for all of the sacrifices that you have made on my behalf. Your support and safe harbor are what have sustained me thus far. I would like to dedicate this dissertation to the family that I chose, my friends. If this dissertation is finished, it is because of you: too many to name, too many to count, too many to mention how you helped me. This journey would not have been bearable without all of you.

Declaration

I declare that the work in this dissertation is solely my own except where attributed and cited to other authors. Most of the material in this thesis has been previously published by the author.

Chapter 1

Introduction

The increasing interconnectedness of transcontinental world commerce and travel has made all nations susceptible to infectious diseases that ignore geographic and political boundaries and therefore constitute a global threat that puts every nation and every person at risk [120]. Despite the notorious effects of many modern infectious diseases [40, 60], none is more devastating than HIV, which affects more than forty million people around the world [159], 28.5 million of them in sub-Saharan Africa alone [157]. Per capita gross domestic product (GDP) in affected countries is estimated to have decreased by eight percent, with losses exceeding twenty percent in heavily affected areas [158]. According to the US National Institutes of Health, sixteen new infectious diseases have been identified in the past two decades [3, 67]; moreover, five others have been identified as re-emerging. Indeed, the majority of countries have recently identified the spread of infectious disease as the greatest global problem they face [120]. Given its importance, researchers have recently turned to interdisciplinary areas such as complex networks to better understand the phenomenon of disease spreading [162].

The scientific field of complex networks is a young, interdisciplinary field that is attracting attention from researchers [190]. Efforts to model abstract and tangible objects as networks composed of millions of nodes have enabled advances in many areas, such as genetics [204], coauthorship collaboration [147], tourism [32], neurology [112], and transportation [156]. The first attempts to define new concepts and measures to characterize the topology of real-world networks resulted in a series of unifying principles and statistical properties shared by many real-world networks [27]. A number of books [31, 56, 163, 202] and review articles [7, 57, 150, 190] about complex networks can be found in the literature. Before such interdisciplinary areas were adopted, the phenomenon of disease spreading was initially described using cellular automata, a concept introduced by von Neumann in the 1940s [197]. In such models, each node represents an individual that can be in only one of a finite number of states, defined as a function of its own state and those of its neighbors; these models are considered the simplest form of a complex system [206]. Since then, a number of models have been proposed, mathematically analyzed, and applied to the planning and assessment of prevention, therapy, and control strategies [13, 17, 92, 95, 144]. Recently associated with percolation processes [83], disease spreading models have attracted the attention of physicists for their application to complex networks. The first analyses were pioneered by Pastor-Satorras and Vespignani [161, 162], which led to a variety of research efforts to understand the effects of the topology of social interaction networks on the rate and patterns of disease spreading [108]. The theoretical approach to simulating disease spreading uses compartmental models, i.e., models in which individuals are divided into a set of different groups [13, 144]. The SIS model describes diseases that do not confer immunity to their

survivors, so that an individual can acquire the same disease more than once. This applies, for example, to diseases such as tuberculosis and the common flu [161, 162]. The model assumes that each individual can be in one of two possible states, susceptible (S) or infected (I). Susceptible nodes represent healthy individuals who can acquire the disease if exposed to infected nodes. Once a node catches the infection, it becomes infected; after a recovery period, it becomes susceptible again. The model is based on two main parameters, the infection rate µ and the healing rate δ [161]. Conversely, some diseases confer immunity on individuals who survive them, such as chickenpox and measles. Such diseases are better described by the SIR model [214]. Like the SIS model, the SIR model is a compartmental model in which healthy individuals are placed in the susceptible group (S); once infected, individuals are placed in the infected group (I). However, when an individual recovers from the disease, it is placed in a third group called recovered (R), in which it neither infects others nor becomes infected again. Many variations of these models have been introduced and analyzed [13, 17, 92, 95, 144]. The models introduced above have fundamentally different dynamics following the initial insertion of a seed infection into a given population. In fact, long-term maintenance of the disease in a closed population is impossible in the SIR model, because the number of susceptible individuals decreases as the disease spreads through the population. Conversely, the SIS model represents endemic diseases, in which the disease can persist indefinitely, circulating among different subgroups, because the same individual can be infected several times [27]. Therefore, the variables to be measured in the two models are different. Analyses that use the SIR model and its variations are interested in measuring what fraction of the population can

potentially acquire the disease in a given scenario, or how fast the disease can spread before it dies out. Conversely, analyses that use variations of the SIS model attempt to show which scenarios produce stable endemic behavior and to assess how the other parameters of the system affect it. Most studies analyze only one aspect of a given model, such as the impact of changes in the topology of the contagion network or changes in the spreading rate of the disease [162, 214]. Although useful, such analyses do not provide insights into how different models could be combined to better understand the phenomenon, or into the limitations of a given model. The lack of methods to compare and analyze different models and assess how modeling decisions affect the dynamics of disease spreading is a gap in the literature. We need methods that answer questions such as: Is it possible to combine different scenarios produced by different models in order to better understand the phenomenon? How do modeling decisions affect the dynamics of disease spreading? Is it possible to rank different models according to their accuracy in identifying statistical dependencies among the parameters of the system? To better understand the disease spreading phenomenon, a more general approach that deals with the variables of the system and their dependencies is more suitable, such as Probabilistic Graphical Models (PGMs). PGMs are a class of modeling tools widely used in practice [59], especially Bayesian networks [164]. Bayesian networks are acyclic directed graphs in which nodes represent random variables and edges represent probabilistic dependencies among those variables. PGMs provide a framework for reasoning about systems with many interacting parameters [113]. More specifically, PGMs allow a declarative representation of the system that we want to reason

with [154]. In other words, they allow the analysis of a given system without specific knowledge about the domain. PGMs use a graph-based representation to compactly encode a complex distribution over a high-dimensional space. Representing a dynamic system as a probabilistic graphical model allows the use of different reasoning algorithms, because the representation provides its own clear semantics [91]. In many real-world distributions, variables tend to interact directly with only a few others; this property is exploited by the graphical representation of the system [101]. The advantages of this representation include modularity, transparency, and tractability [215]. The main contribution of this dissertation is a new approach to studying disease spreading models using Bayesian networks. The approach allows one to understand the relationship between the variables of a model and the results it is able to generate, and it compares different models by ranking them according to their ability to identify accurate and spurious dependencies. Formalized as a methodology, this approach provides a new perspective on the study of disease spreading models. Our main assumption is that a merged structure built from the results of two models is a better representation of the phenomenon than each model individually. This assumption, often called the Weak Ensemble assumption, is supported by our controlled experiment and has been widely used in a variety of areas, such as facial recognition [97] and medical diagnosis [136]. By creating a merged Bayesian model that accounts for all the results of the analyzed models, we are able to evaluate how modeling decisions affect the dynamics of spreading and, consequently, the accuracy of such models. The remaining chapters of this dissertation are organized as follows:

• Chapter 2 provides a review of the literature in the area; it comprises a background study and highlights related issues presented in the literature, focusing on its gaps.

• Chapter 3 presents the proposed approach for describing and comparing disease spreading models by using Bayesian networks.

• Chapter 4 validates the proposed methodology by applying it to two comparative analyses of different disease spreading models presented in the literature.

• Chapter 5 summarizes the contributions presented in this dissertation and suggests future directions.
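The SIS dynamics introduced in this chapter can be made concrete with a short discrete-time simulation on a contact network. This is a generic sketch, not the implementation of any specific model analyzed later in this dissertation; the parameter names mu (infection rate) and delta (healing rate) follow the text, while the network construction and all numeric values are arbitrary choices for illustration.

```python
import random

def simulate_sis(neighbors, mu, delta, seeds, steps, rng):
    """Discrete-time SIS dynamics: each infected node infects each of its
    susceptible neighbors with probability mu per step, and heals (returns
    to the susceptible state) with probability delta per step."""
    infected = set(seeds)
    history = [len(infected)]
    for _ in range(steps):
        nxt = set()
        for node in infected:
            for nb in neighbors[node]:        # try to infect susceptible neighbors
                if nb not in infected and rng.random() < mu:
                    nxt.add(nb)
            if rng.random() >= delta:         # node fails to heal, stays infected
                nxt.add(node)
        infected = nxt
        history.append(len(infected))
    return history

# An arbitrary random contact network (200 nodes, edge probability 0.03).
rng = random.Random(42)
n, p = 200, 0.03
neighbors = {i: [] for i in range(n)}
for i in range(n):
    for j in range(i + 1, n):
        if rng.random() < p:
            neighbors[i].append(j)
            neighbors[j].append(i)

# One seed infection; track the number of infected nodes over 50 steps.
history = simulate_sis(neighbors, mu=0.2, delta=0.3, seeds=[0], steps=50, rng=rng)
print(history[0], history[-1])
```

Replacing the healing step with a permanent move to a separate recovered set turns the same sketch into the SIR model, in which the infected count in a closed population must eventually fall to zero.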

Chapter 2

Literature Review

In this chapter, we provide a background and a review of the literature related to the work described in this dissertation.

2.1 Related Work

2.1.1 Complex Networks

The study of networks is not recent; it dates back to Euler's work on the problem of the seven bridges of Königsberg in the eighteenth century [205]. He used a graph, a mathematical structure consisting of nodes and edges connecting them, to prove that the problem has no solution. Many theories, algorithms, and concepts have been developed since; however, what is called complex networks today is quite different from the older network science [150]. Complex networks are concerned with the structure of naturally occurring networks rather than with theoretical ones. A network is defined as a graph composed of a set of vertices V and a set of edges E; what makes it complex is its irregular structure, its dynamic evolution

over time, and the entanglement of thousands or millions of interacting parts [27]. Networks are ubiquitous; they are present in many aspects of society, as can be seen in Figure 2.1. Examples of real-world networks include food webs [43], metabolic networks [102], genetic networks [204], the nervous system [112], organ transplantation networks [196], food networks [116], tourism and trade networks [32], the World Wide Web [33], scientific collaboration networks [147], and Internet routers [65]. Social parlor games that use concepts of network connectivity are well known, such as the Six Degrees of Marlon Brando created by a German newspaper [15], the Six Degrees of Kevin Bacon [66], and the Six Degrees of Monica Lewinsky created by The New York Times [110]. Other examples can be found in [150]. Concepts of complex networks have been used to model social interactions, combining statistics, sociology, and social psychology [14]. Such social contacts can be represented as graphs in which relationships are directed or undirected; e.g., consider the set of individuals 1, 2, ..., n, where (i, j) means that individual i has a social interaction with individual j. For a directed link, (i, j) ≠ (j, i); for an undirected link, (i, j) = (j, i) [73]. Complex networks can be characterized by many features, such as:

• The node degree accounts for the number of edges of a given node. In directed networks, each node has an in-degree and an out-degree, representing the number of incoming and outgoing edges, respectively. As an overall characteristic of the network, one can cite its degree distribution, which is the probability distribution of degree over the entire network [41]. The degree distribution has become an important measure in the analysis of real-world systems, such as the World Wide Web [8].
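The degree and degree distribution described above are straightforward to compute from an edge list. The following minimal sketch uses an invented toy network of five nodes and six undirected edges; for a directed network, one would keep two counters per node (in-degree and out-degree) instead of one.

```python
from collections import Counter

# An invented undirected edge list for illustration.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3), (3, 4)]

# Degree of each node: the number of edges incident to it.
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# Degree distribution P(k): the fraction of nodes that have degree k.
n = len(degree)
dist = {k: c / n for k, c in sorted(Counter(degree.values()).items())}
print(dict(degree))   # {0: 3, 1: 2, 2: 3, 3: 3, 4: 1}
print(dist)           # {1: 0.2, 2: 0.2, 3: 0.6}
```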

Cultured 17 For some reason such trees are conventionally depicted with their 16 Not to be confused with the entirelyproximity. different use of the wordThey can“root”carry atby the lines. top weights,and theirConnectivity “leaves” representing, at the bottom, whichsay, is not partitioned in various ways. We will see a number of neuronsclustering introduced artificially in Sec. III.B. prevented from ex-the natural order of things for most trees. how well two people knowalone each fails other. to explain They can also be examples in this review of bipartite graphs:graphsthat pressing a natural dynamic behavior can directed, pointingthey are in onlynotor predict onerepresentative direction. circuit func- Graphs of the com- culture.contain In vertices order of totwo do distinctsalt types, and with sugar edges which running we consider not to be representative rapidly modify their molecular makeup and tion. [Reproduced with permission from (29)] posed of directedthis automatically, edges are themselves we filter called thedirected networkonly by between removing unlike types. all So-calledof the relationaffiliation networks between recipes. the edges below a certain threshold w⇤. We start with Table I where we provide the basic char- (e) www.sciencemag.org SCIENCE VOL 284 2 APRIL 1999 (f) 97 acteristics of the networks of recipes with 4 w⇤ 7. The   Degree important aspect to notice here is that in all instances we are dealing with a network that is relatively dense, which only becomes sparse at higher edge-threshold levels. We

name each network as NoRw⇤ , where w⇤ is the threshold used in the formation of the network. Lately, there has been renewed interest on small-world networks, or more simply, on the concept that most people are just a short-step way from other people in the world. This concept was first proposed by Milgram in his famous six-degrees-of-separation experiment [19]. The concept has been defined more precisely [6] and today a network is said (g) to be small world if has two characteristics: short average Fig. 1. A 300-node Network of Recipes. Node size represents the degree while the color represents the country of origin. For visual- path lengths and high clustering coecient. ization purposes, the picture shows a network filtered where, w⇤ = 6. The verification that the recipe network is small world That is, only edges with weight greater or equal to 6 are kept—the suggests that most foods are related to others and that, recipes share at least 6 ingredients. despite di↵erences in their taste, recipes are concentrated Figure 2.1: Examples of various real world networks. (a)around A tourism a core set of ingredients. network Even where when we im- We have built a network from a dataset of 300 recipes pose thresholds on the number of ingredients shared to be from 15 di↵erent countries. We tried to select recipes that greater than 4, we still observe high connectivity between nodes are grouped in communitiesare considered of more different traditional from Africacolors3, Argentina, [32].the (b) recipes. A This regulatory means that people’s network diet may not be as Brazil, China, Germany, India, Japan, Mexico, Poland, diverse as one may think. We tend to stay around a core of a bacteria where nodesPakistan, are colored Romania, Russia, according Thailand, and the USA. to Fig- theset ofcommunity ingredients allowing forthey some branching belong out. 
In fact, ure 1 displays the network where nodes are sorted along it is the existence of this core that makes the NoR have to [155]. (c) A friendship networkthe x-axis by of degree. children The di↵erent colors in a represent US theschool 15 high clustering where coe nodescients even areat high colored values of w⇤. di↵erent countries in the dataset. The dataset was build Surely we can ask the question: What are the most con- from recipes extracted from the Living Cookbook software nected, or most influential recipes in the world? Providing according to race [150]. (d)that aggregates A network recipes from many of websites. sexual The recipes contactsa definitive fromanswer to this [172]. question requires (e) a larger The dataset we used come primarily from the “Darkstar’s Meal-Master of recipes. Instead, we can provide a hint about the origin Brazilian airline network whereRecipes” nodes which divides are recipes colored according to countries. according4 of these to recipes. its Better community yet, is that from our [156]. dataset we can probably argue that the recipes in Figure 3 are likely IV. Experimental Results to be liked by most people in the world. Our argument is (f) A network of the nervous system of the worm C.based elegans, on the computation from of the[112]. Pagerank of each (g) recipe. In this section, we present results obtained from our Net- A recipes network where thework of Recipes. color We of concentrate a node on the network’s represents prop- its country of origin, erties and on the formation of communities. We aim at NoR4 NoR5 NoR6 NoR7 demonstrating that food and their inter-relation can help 1 ITALY 12 from [116]. define cultures. The idea is to see whether the communi- ties formed tend to be diverse (i.e., they include recipes 2 BRAZIL 18 CHINA 17 from many di↵erent countries) or uniform (i.e., they tend to include recipes from fewer countries). 
But before we 3 PAKISTAN 12 discuss the communities, we present some results related CHINA 1 to the network’s structure as it may help reveal other in- 4 PAKISTAN 15 PAKISTAN 7 teresting characteristics such as the universality of a recipe (e.g., how “liked” is the recipe around the world). 5 AFRICA 9 RUSSIA 17 In all experiments presented9 below, we have worked with four versions of the network of recipes described in Section Fig. 3. Pagerank of specific recipes in each of the Networks of Recipes III. The di↵erences between each version is only on the (NoRs) used in this paper. edge-value threshold w⇤, that is, once the network is cre- ated we have filtered out edges whose weights fall below a Pagerank provides a good measure of how often a per- given threshold. As explained before, the motivation for son following the network links will non-randomly reach a this is that most recipes have common ingredients such as given node (a recipe in our case). A high Pagerank value hence indicates the importance of that recipe in the world 3 We have combined the African countries into one because we did not have access to many recipes from Africa. of recipes. Importance here correlates to how easy the in- 4 http://home.earthlink.net/ darkstar105/ gredients can be found. Nodes with high Pagerank values ⇠ 41 such systems was shown to follow a power-law distribution, with a few highly connected nodes (hubs) and a majority of nodes with only few connections. Such characteristics gave rise to a whole new group of networks known as Scale-Free networks [9].

• Transitivity relates to the tendency of triangles to form in social connections, e.g. if node a is connected to node b and node b is connected to node c, then node a is likely to also be connected to node c [70]. More formally, it is measured using a concept known as the clustering coefficient. The clustering coefficient, Ci, of a node i is given by:

Ci = 2mi / (ki(ki − 1)),    (2.1)

where mi is the number of links between the ki neighbors of i; the clustering

coefficient of the entire network is just the average of all Ci over the number of nodes in the network, |V|.
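As an illustration, Eq. (2.1) and the network-wide average can be computed directly from an adjacency list. This is a minimal sketch, not code from the dissertation; function names are my own.

```python
def clustering_coefficient(adj, i):
    """C_i = 2*m_i / (k_i*(k_i - 1)), Eq. (2.1).

    adj maps each node to the set of its neighbors.
    """
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:                      # C_i is taken as 0 for degree < 2
        return 0.0
    # m_i: number of links among the k_i neighbors of i
    m = sum(1 for u in neighbors for v in neighbors
            if u < v and v in adj[u])
    return 2.0 * m / (k * (k - 1))

def average_clustering(adj):
    """Network clustering coefficient: average of C_i over all |V| nodes."""
    return sum(clustering_coefficient(adj, i) for i in adj) / len(adj)

# A triangle (0, 1, 2) plus one pendant node 3: node 0 has neighbors
# {1, 2, 3}, and only the pair (1, 2) is connected, so C_0 = 2/(3*2) = 1/3.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(clustering_coefficient(adj, 0))   # ≈ 0.333
```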

• Community identification is a common analysis performed on complex networks. It concerns whether nodes can be clustered into groups that share common characteristics or functions [69]. Examples of such analyses include biological networks [155, 177], airline networks [156] and collaboration networks [79].

• Assortativity measures the tendency of similar nodes to be connected to each other [98, 130, 138, 148]. As pointed out by McPherson et al. [138], there are two types of assortativity: status and value. The first refers to the fact that individuals with the same status (e.g. religion, race or gender) tend to interact with each other. The latter claims that people with similar values tend to interact with each other more often, regardless of social status. Many other factors can also influence this property, such as geography, social ties and organizational structure. In network terms, this could relate to nodes with similar degrees connecting to each other, e.g. hubs connecting with hubs [148].
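One common way to quantify degree assortativity, following the degree-correlation idea cited above [148], is the Pearson correlation between the degrees at the two endpoints of every edge. The sketch below is illustrative only (the function name and the example graph are my own, not from the dissertation).

```python
import math

def degree_assortativity(edges):
    """Pearson correlation of endpoint degrees over the edges of an
    undirected graph given as a list of (u, v) pairs."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    # each undirected edge contributes both orderings (u, v) and (v, u)
    xs = [deg[u] for u, v in edges] + [deg[v] for u, v in edges]
    ys = [deg[v] for u, v in edges] + [deg[u] for u, v in edges]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:          # all degrees equal: correlation undefined
        return float("nan")
    return cov / (sx * sy)

# A star graph is maximally disassortative: the hub (high degree) connects
# only to leaves (degree 1), so the coefficient is -1.
star = [(0, i) for i in range(1, 5)]
print(degree_assortativity(star))   # ≈ -1.0
```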

• Centrality measures were designed to identify those individuals who have more influence than others [21]. Many variations of centrality measures have been proposed [200], such as:

– Betweenness Centrality: accounts for the number of shortest paths passing through a specific node, representing how likely a node is to receive and forward information to different groups [16]. For a network

N = (V,E) the betweenness CB(v) for a node v is defined as:

CB(v) = Σ_{s ≠ t ≠ v ∈ V} αst(v) / αst,    (2.2)

where αst is the number of shortest paths from s to t, and αst(v) is the number of shortest paths from s to t that pass through node v.

– Closeness Centrality: is calculated as the inverse of the sum of distances from a node to all other nodes in the network. A high closeness indicates that a node is relatively close to all other nodes [153].
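Both centrality measures above can be sketched in a short, brute-force implementation. This is an illustrative sketch (function names are my own); the betweenness routine evaluates Eq. (2.2) over all ordered pairs s ≠ t, so on an undirected graph each unordered pair is counted twice.

```python
from collections import deque

def bfs_counts(adj, s):
    """Shortest-path distances and path counts (alpha) from source s."""
    dist, sigma = {s: 0}, {s: 1}
    q = deque([s])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                q.append(w)
            if dist[w] == dist[u] + 1:   # w lies one step further: extend paths
                sigma[w] += sigma[u]
    return dist, sigma

def betweenness(adj, v):
    """Eq. (2.2): sum over s != t != v of alpha_st(v) / alpha_st."""
    info = {s: bfs_counts(adj, s) for s in adj}
    total = 0.0
    for s in adj:
        for t in adj:
            if s == t or s == v or t == v:
                continue
            dist_s, sig_s = info[s]
            dist_t, sig_t = info[t]
            # v is on a shortest s-t path iff d(s,v) + d(v,t) = d(s,t)
            if (t in dist_s and v in dist_s and v in dist_t
                    and dist_s[v] + dist_t[v] == dist_s[t]):
                total += sig_s[v] * sig_t[v] / sig_s[t]
    return total

def closeness(adj, v):
    """Inverse of the sum of distances from v to all other nodes."""
    dist, _ = bfs_counts(adj, v)
    return 1.0 / sum(dist.values())

# Path graph 0-1-2: node 1 lies on the only shortest path between 0 and 2,
# in both directions, so its betweenness is 2 under ordered-pair counting.
path = {0: {1}, 1: {0, 2}, 2: {1}}
print(betweenness(path, 1))   # 2.0
print(closeness(path, 1))     # 0.5
```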

Real-world networks have not only been analyzed through measures but also through network models that describe their topological patterns. The most common network models are the Erdős–Rényi model, the Watts–Strogatz model and the Barabási–Albert model.

Traditionally, networks have been created randomly using the Erdős–Rényi model [62, 63]. In such a model any pair of nodes has the same probability p of having an edge connecting them and probability 1−p of not having such an edge. Thus, any edge in the network is independently and identically distributed as a Bernoulli random variable, and the average distance between any pair of nodes is O(log N) [208]. As can be seen on the left side of Figure 2.2, as we increase p the network becomes more connected. On the right side we see the degree distribution of a large network created randomly, and its comparison with a Poisson distribution. However, due to the increasing availability of high-volume data, many other models have been developed in order to incorporate the "not random" features of real-world networks. In Barabási and Albert [19], the authors report the existence of a universal law that characterizes large-scale properties of many real-world networks. They claim that, for example, the WWW network, where nodes represent web pages and two nodes are connected if one has a link to the other, and the citation network have a similar characteristic, namely their degree distribution (see Figure 2.3). The probability of a node having a degree k, given by P(k), decreases as a power law: P(k) = ck^−λ, where c and λ are constants. These distributions, informally called long-tail, indicate the existence of a few highly connected nodes (hubs) and a very large number of nodes with a small number of connections; an unexpected feature in all previous random network models. It has also been observed that most real-world networks have similar power-law degree distributions; the exponent of the distribution normally assumes values between 2 ≤ λ ≤ 3 [7]. Two main


Figure 2.2: (a) Example of network evolution using the model of [63] with different p. (b) The degree distribution of a random network with 10,000 nodes and p = 0.0015, showing the number of nodes with degree k, Xk. Both figures were taken from [7].

factors drive the generation of a degree distribution that follows a power law: the growth of the network and preferential attachment [19]. Preferential attachment drives the likelihood of the connection of new nodes to highly connected ones, and network growth captures the fact that the number of nodes in the network is continuously increasing. In the Barabási–Albert model a network starts with m0 nodes, and new nodes are added to the network linked to m < m0 existing nodes with a probability proportional to the degree of each such node. Let p(ki) be the probability that a new node will create an edge to a node i with degree ki, defined as:

p(ki) = ki / Σj kj    (2.3)

Thus, highly connected nodes (hubs) have a higher chance of receiving edges from newly added nodes, further increasing their degree.
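Both generative models just described can be sketched in a few lines of Python. This is a minimal sketch under the stated assumptions (function names are my own, and no effort is made at efficiency).

```python
import random

def erdos_renyi(n, p, seed=None):
    """G(n, p): each of the n(n-1)/2 node pairs gets an edge with
    probability p (one Bernoulli trial per pair)."""
    rng = random.Random(seed)
    return [(u, v) for u in range(n) for v in range(u + 1, n)
            if rng.random() < p]

def barabasi_albert(n, m0=3, m=2, seed=None):
    """Growth plus preferential attachment, Eq. (2.3)."""
    rng = random.Random(seed)
    # start from a small fully connected seed of m0 nodes
    edges = [(u, v) for u in range(m0) for v in range(u + 1, m0)]
    # each node appears deg(i) times in `targets`, so a uniform draw
    # from this list realizes p(k_i) = k_i / sum_j k_j
    targets = [x for e in edges for x in e]
    for new in range(m0, n):
        chosen = set()
        while len(chosen) < m:          # m distinct, degree-biased targets
            chosen.add(rng.choice(targets))
        for t in chosen:
            edges.append((t, new))
            targets += [t, new]
    return edges

# The ER graph has about p*n*(n-1)/2 edges with a Poisson-like degree
# distribution; the BA graph has exactly m0*(m0-1)/2 + (n-m0)*m edges
# and a heavy-tailed degree distribution dominated by a few hubs.
```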

Figure 2.3: The degree distribution of real-world networks. (a) The Internet network at the router level. (b) Movie actor collaboration network. (c) Coauthorship network of physicists. (d) Coauthorship of neuroscientists. All plots were taken from [7].

Small-world networks can be generated using the Watts–Strogatz model [203]. This class of network does not arise from purely random features, yet it is widely present in many real-world networks, such as social networks. The Watts–Strogatz model was inspired by the widely known Milgram experiment [137]. The experiment relies on social events; we often meet people, end up discovering that we have a friend in common, and think "what a small world". Milgram decided to investigate this situation on a more global scale; the goal of the experiment was to find short chains of acquaintances connecting people all over the country. Over many trials the average length of such chains was between five and six, corroborating the "Six Degrees of Separation" principle [85]. The Watts–Strogatz model, also known as the small-world model, is based on the idea of rewiring edges of a low-dimensional regular lattice to create shortcuts that link remote nodes together. Such structure was initially uncovered for a one-dimensional lattice, but was later extended to higher dimensions [202]. The probability of rewiring an edge, the main parameter of the system, denoted by p, tunes the network structure from a regular lattice (p = 0) to a completely random network (p = 1). Between these extremes lies the small-world network (Figure 2.4a), in which the average shortest path between a pair of nodes (often calculated using the algorithm in [54]) grows proportionally to the logarithm of the number of nodes |V|. Another characteristic of this kind of network is the high clustering coefficient (Figure 2.4b).
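The rewiring procedure can be sketched as follows. This is a minimal sketch assuming the common convention of a ring lattice in which each node is linked to its k nearest neighbors (k even); the function name is my own.

```python
import random

def watts_strogatz(n, k, p, seed=None):
    """Ring lattice of n nodes, each linked to its k nearest neighbors;
    every edge is rewired with probability p to a uniformly chosen node,
    avoiding self-loops and duplicate edges."""
    rng = random.Random(seed)
    # regular ring lattice: node i linked to i+1 .. i+k//2 (mod n)
    edges = [(i, (i + j) % n) for i in range(n) for j in range(1, k // 2 + 1)]
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    rewired = []
    for u, v in edges:
        if rng.random() < p:
            # candidate endpoints: not u itself and not already a neighbor
            choices = [w for w in range(n) if w != u and w not in adj[u]]
            if choices:
                w = rng.choice(choices)
                adj[u].remove(v); adj[v].remove(u)
                adj[u].add(w); adj[w].add(u)
                rewired.append((u, w))
                continue
        rewired.append((u, v))
    return rewired

# p = 0 leaves the regular lattice intact; p = 1 rewires every edge;
# intermediate p yields the small-world regime of Figure 2.4.
```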

Figure 2.4: (a) The random rewiring of edges in a network with twenty nodes, changing p from zero to one. When p = 0 we have a regular lattice, when p = 1 the lattice is random; between these extremes lie the small-world networks. (b) Characteristic path length L(p) and clustering coefficient C(p) for the Watts and Strogatz model. Both figures were taken from [203].

2.1.2 Disease Spreading Models

Spreading is a ubiquitous dynamic process present in many aspects of society, ranging from innovations [175] to epidemiological diseases [53]. A better
C reflects the extent to which friends of v are also friends of each other; and thus C measures the cliquishnessunderstanding of a typical of this friendship process circle. is crucial to develop methods that either hinder Cv reflectsv the extent to which friends of v are also friends of each other; and thus C measures the cliquishness of a typical friendship circle. the spread, in cases of diseases, or accelerate the process in the case of information SMALL The regular lattice at p = 0 is The random network at p = 1 is a These limiting cases might lead one to suspect that large C is always associated with SMALL TheWORLDS regularThe regular latticea lattice athighly p = at0clustered, isp = 0 isThe large random Theworld random networkpoorly network at clustered, p = 1at is p a=small 1 is worldaThese where Theselimiting limiting largecases L ,mightcases and small leadmight Cone leadwith dissemination.to smallonesuspect to L .suspect On that the large As contrary,that mentioned C large is always we C findis previously, always associatedthat there associated is manywith a broad countries with recently identified the SMALL where L grows linearly with n. L grows only logarithmically with n. interval of p over which L(p) is almost as small as L yet C >> C . WORLDSWORLDSa highlya highlyclustered, clustered, large worldlarge worldpoorly poorlyclustered, clustered, small worldsmall whereworld wherelarge Llarge, and Lsmall, and C small with Csmall with L small. On theL. On contrary, the contrary, we find we that find thererandom that is there a broad pis a broadrandom where L grows linearly with n. L grows only logarithmically with n. interval of p over which L(p) isspread almost of as infectious small as L diseases yet as theC >> greatest C global. problem they face [120]. Given where L grows linearly with n. L grows only logarithmically with n. 
interval of p over which L(p) is almost as small as Lrandom randomyet Cp >> Crandomp .random its importance, researchers have recently been using interdisciplinary areas such as complex networks to better understand the phenomenon of disease spreading [162]. The use of networks to better understand the process of disease spreading is not new; early works date from more than thirty years ago, and analyzed the networks These small-world networks result from the immediate drop in L(p) caused by the introduction of a few long-range edges. of many infectious diseases such as gonorrhea [96] and AIDS [134]. Aiming to have TheseThese small-world small-worldSuch networks 'short networks cuts'result connect fromresult the from vertices immediate the thatimmediate would drop otherwise drop be in L(p) caused by the introduction of a few long-range edges. a better understanding of the dynamics of such networks, the creation of models in L(p) caused bymuch the introduction farther apart of than a few Lrandom long-range. For small edges. p, each short Such 'shortSuch cuts''short connect cuts' connect vertices vertices that would that wouldotherwisecut otherwise has be a highly be nonlinear much farther apart than L . For small effectp, each on short L, contracting the much farther apart than Lrandomrandom. For small p, each short cut hascut a highlyhas a distancehighlynonlinear nonlinear not just between the 16 effect oneffect L, contractingon Lpair, contracting of vertices the the that it distancedistance not just notconnects, between just between butthe between the their pair ofpair vertices of vertices thatimmediate it that itneighbourhoods, 5 hops to connects,shortcutconnects, to but betweenneighbourhoods but between their their of neigh- neighbourhood immediateneighbourhoodimmediate neighbourhoods,bourhoods neighbourhoods, and so on. neighbourhoods of neigh- 5 hops 5to hops to shortcutshortcut to toneighbourhoods of neigh- bourhoods and so on. 
The data shown in the figure are averages over 20 random realizations of the rewiring process, neighbourhoodneighbourhoodByneighbourhood contrast,neighbourhood an edgebourhoods removed and from so aon. clustered neighbour- hood to make a short cut has, at most, a linear effect on C; and have been normalized by the values L(0), C(0) for a regular lattice. All the graphs have n = By contrast,hence an edge C(p) removed remains practicallyfrom a clustered unchanged neighbour- for small p even The data 1000shown vertices in the andfigure an areaverage averages degree over of k20 = random10 edges realizations per vertex. ofWe the note rewiring that a logarithmicprocess, By contrast, an edge removed from a clustered neighbour- The data shown inhorizontal the figure scale are hasaverages been used over to20 resolve random the realizations rapid drop inof L(p),the rewiring corresponding process, to the onset of though L(p) drops rapidly. The important implication here is and have been normalized by the values L(0), C(0) for a regular lattice. All the graphs have n = hood tohood make to makea short a cutshort has, cut at has, most, at amost, linear a effectlinear oneffect C; on C; and have been normalizedthe small-world by the phenomenon. values L(0), DuringC(0) for this a regulardrop, C(p) lattice. remains All the almost graphs constant have nat = its value for the that at the local level (as 1000 vertices and an average degree of k = 10 edges per vertex. We note that a logarithmic hence C(p) remains practically unchanged for small p even 1000 vertices andregular an average lattice, degree indicating of kthat = 10 the edges transition per vertex.to a small We world note is that almost a logarithmic undetectable at the local level. hence C(p) remains practically unchanged for smallreflected p even by C(p)), the trans- horizontal scale has been used to resolve the rapid drop in L(p), corresponding to the onset of thoughthough L(p) drops L(p) dropsrapidly. rapidly. 
The important The important implication implication here is here is horizontal scale has been used to resolve the rapid drop in L(p), corresponding to the onset of that at theition local to levela small (as world is the small-worldthe small-world phenomenon. phenomenon. During this During drop, this C(p) drop, remains C(p) remainsalmost constant almost constant at its value at itsfor value the for the that at the local almostlevel (as undetectable. regular lattice, indicating that the transition to a small world is almost undetectable at the local level. reflectedreflected by C(p) by), C(p)the trans-), the trans- regular lattice, indicating that the transition to a small world is almost undetectable at the local level. ition toition a small to a worldsmall isworld is almostalmost undetectable. undetectable. The 4 neighbors of With shortcut, each vertex have this is still true 3 out of 6 edges for almost among themselves. every vertex. The 4 neighborsThe 4 neighbors of With of shortcut,With shortcut, C = 3/6 = 0.5 C = 0.48 each vertexeach havevertex havethis is stillthis true is still true 3 out of3 6 out edges of 6 edgesfor almostfor almost among themselves. every vertex. among themselves. every vertex. C = 3/6 = 0.5 C = 0.48 C = 3/6 = 0.5 C = 0.48 that describe the behavior of such systems was a natural step in this kind of research. Many of the models relied on the classification of groups having different sexual activity regimes, leading to highly heterogeneous networks, in which it is known to be harder to eradicate diseases [13]. Such heterogeneity requires special immunization strategies [94]. Early studies also focused on correlated networks, which are networks in which the degree of each node depends of the degree of its neighbors [80]. An example of this kind of network is undirected Markovian random networks, which are defined in terms of connectivity distribution (P(k)) and conditional connectivity distribution (P (k0|k)) [199]. 
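The rewiring procedure illustrated in Figure 2.4 can be sketched in a few lines; this is a minimal illustration of the Watts–Strogatz construction, not code from [203] or from this dissertation, and the function and variable names are our own:

```python
import random

def watts_strogatz_edges(n, k, p, seed=42):
    """Ring of n nodes, each linked to its k nearest neighbours,
    with each edge rewired with probability p (Watts-Strogatz)."""
    rng = random.Random(seed)
    edges = []
    # Regular ring lattice: connect each node to k/2 neighbours on each side.
    for i in range(n):
        for j in range(1, k // 2 + 1):
            edges.append((i, (i + j) % n))
    rewired = set()
    for (u, v) in edges:
        if rng.random() < p:
            # Rewire the far endpoint to a uniformly chosen node,
            # avoiding self-loops and duplicate edges.
            w = rng.randrange(n)
            while w == u or (u, w) in rewired or (w, u) in rewired:
                w = rng.randrange(n)
            rewired.add((u, w))
        else:
            rewired.add((u, v))
    return rewired

# p = 0 keeps the regular lattice; p = 1 approaches a random graph.
print(len(watts_strogatz_edges(20, 4, 0.0)))  # 20 * 4 / 2 = 40 edges
```

The number of vertices and edges is preserved for any p, which is what allows L(p) and C(p) to be compared across the whole range.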
The epidemic threshold in complex networks, i.e., the scenario in which new cases of a specific disease exceed the expectation [194], is determined by the connectivity of the nodes rather than by the connectivity distribution, as happens in uncorrelated networks; i.e., the results often found in scale-free and small-world networks can be shifted to this kind of network [28]. As pointed out by Morris et al. [142], who analyzed parameters in sexual relationships that can increase the speed and pervasiveness of HIV transmission, sex is simply not random (as is the case in most social interactions), thus it cannot be modeled as such. This is evidence that this new network science is better suited to model the spreading of diseases than previous stochastic models [121]. Recently, the entire world feared the power of a pandemic disease known as H1N1 influenza [72]. Spreading simulations were performed, using a model that accounts for travel and short-range human mobility combined with demographic data, to evaluate the effectiveness of travel restrictions on H1N1 influenza [18]. At the time of the pandemic, in 2009, about 40% of the international air traffic coming from and heading to Mexico was down due to travel restrictions. The results of the

simulations show that this 40% reduction only delayed the arrival of the infection in other countries by three days. Moreover, even if the restrictions had been an unrealistic 90%, they would only have delayed the arrival of the infection by two weeks (Figure 2.5). This is due to the plurality of human mobility alternatives; the problem is not long-range mobility such as airplanes, but short-range mobility, such as vehicular transport.


Figure 2.5: (a) Delay in the case importation from Mexico to a given country compared with the reference scenario, as a function of the travel reduction α. (b) The same as (a), where earlier dates for the start of the intervention are considered, with a fixed α = 90%. Figure taken from [18].

SIS Models

Most models that encapsulate the behavior of infectious diseases spreading in a given population rely on compartments that divide the population into a discrete set of states [34]. As mentioned previously, the SIS model describes diseases that do not confer immunity to their survivors, so that an individual can acquire the same disease more than once; this applies to diseases such as tuberculosis and the common flu [161, 162]. This model assumes that each individual can be in one of two possible states, susceptible (S) or infected (I). Susceptible nodes represent healthy individuals that can acquire the disease if exposed to infected nodes. Once a node catches the infection it becomes infected; after some time it recovers and becomes susceptible again. This model is based on two main parameters, the infection rate µ and the healing rate δ [161]. The state of a node i with k neighbors at time t + 1, denoted as τi(t + 1), is defined as:

τi(t + 1) = S   if τi(t) = S and ∄ nk such that τnk(t) = I,
                or if τi(t) = I, with probability δ;

τi(t + 1) = I   if τi(t) = S, with probability µ,
                or if τi(t) = I, with probability 1 − δ.    (2.4)

In their work, Pastor-Satorras and Vespignani focus on the persistence and prevalence of the infected population in scale-free networks, since they represent a good topological structure for disease spreading [123]. This kind of topology might reflect the process of disease spreading through casual social contacts, such as sexual relationships. Their results show, as expected, an epidemic threshold that defines two different populations when the exponent of the degree distribution is

less than -3 and higher than -4. However, when it assumes values between -2 and -3 there is no epidemic threshold, and the epidemic state can happen at any time, thus changing many standard immunization strategies in disease spreading. These findings also suggest that the infection rate of a disease does not matter: it can rapidly proliferate in these networks. They show that nodes with a low degree have a higher chance of being infected when they are connected to the hubs, for high values of λ. Epidemic threshold values were initially investigated in [13]. Another work that uses the SIS model is that of Kuperman and Abramson [117], specifically on small-world networks. In their work, before returning to the susceptible state (S) a node remains temporarily refractory (R) during a given period of time; after that time passes, the individual who was infected becomes susceptible again. Formally, the state of a given node i at time t, denoted as τi(t), is given by:

τi(t) = S if πi(t) = 0,

τi(t) = I if πi(t) ∈ [1, πI],

τi(t) = R if πi(t) ∈ [πI + 1, π0].

Where πi represents a counter for node i, πi(t) = 0, 1, . . . , πI + πR ≡ π0, describing its phase in the cycle of the disease. Each individual remains infected and refractory for a given amount of time, defined by the parameters πI and πR, respectively. Its state in the next time step depends on its current state and the states of its neighbors. The rules for state transition are:

πi(t + 1) = 0 if πi(t) = 0 and no infection occurs,

πi(t + 1) = 1 if πi(t) = 0 and i becomes infected,

πi(t + 1) = πi(t) + 1 if 1 ≤ πi(t) < π0,

πi(t + 1) = 0 if πi(t) = π0.
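The transition rules above amount to advancing a cyclic counter. The sketch below is a hypothetical illustration of those rules, not code from [117]; the function name and the caller-supplied `exposed` flag (which stands in for the neighbor-infection test) are our own:

```python
def next_phase(phase, pi_I, pi_0, exposed):
    """One step of the counter pi_i(t) in a Kuperman-Abramson-style
    SIRS cycle: 0 = susceptible, 1..pi_I = infected,
    pi_I + 1..pi_0 = refractory.

    phase   -- current value of pi_i(t)
    exposed -- True if node i catches the infection this step
    """
    if phase == 0:
        return 1 if exposed else 0      # S -> I on infection, else stay S
    if phase < pi_0:
        return phase + 1                # advance through the I and R phases
    return 0                            # end of cycle: back to susceptible

# Example: pi_I = 4 infected steps, pi_R = 9 refractory steps, pi_0 = 13.
states = [0]
for t in range(14):
    states.append(next_phase(states[-1], 4, 13, exposed=(t == 0)))
print(states)  # [0, 1, 2, ..., 13, 0]
```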

Their work focuses on the impact of the variation of the rewiring probability p of the Watts-Strogatz model on the evolution of disease spread, ranging from low-amplitude fluctuations at small p (an endemic scenario) to wide-amplitude oscillations at large p, resembling the periodic epidemic pattern typical of large populations. According to the authors, such results are congruent with real-world data from [42]. Another variation of the SIS model was used by Javarone and Armano [100]. In their experiments, all nodes in the network have the same probability ν of becoming infected when they have contact with an infected node, as in the classic SIS model. Each node also has a fitness parameter, and heals with probability δ computed as follows:

δi = f(φi, ti, t) = 1 − e^(−φi(t − ti))    (2.5)

The introduction of such a fitness parameter makes the healing probability of an individual increase over time: the higher the fitness, the faster δ increases, and thus the greater the chances of healing. Their experiments simulate scenarios with different ratios of infection rate to healing rate in random networks, small-world networks and scale-free networks. Their findings show that when the infection rate is at least 75% higher than the healing rate, the infection becomes stable regardless of the topology, whether scale-free, small-world or random networks.
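Equation 2.5 is simple to evaluate directly; the sketch below (with hypothetical names of our own, not code from [100]) shows how the healing probability grows with both the fitness φi and the time since infection t − ti:

```python
import math

def healing_probability(phi, t_i, t):
    """Fitness-based healing probability from Eq. 2.5:
    delta_i = 1 - exp(-phi_i * (t - t_i))."""
    return 1.0 - math.exp(-phi * (t - t_i))

# delta grows toward 1 with elapsed time, faster for larger fitness.
for phi in (0.1, 0.5):
    print([round(healing_probability(phi, 0, t), 3) for t in (1, 5, 10)])
```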

Attempting to understand how to stop an infection rather than how to spread it, many works have studied immunization schemes in different network scenarios [182]. The authors of [51] claim that the random distribution of cures in an infection network with a power-law degree distribution is not effective, having essentially no impact on the spread of the infection. On the other hand, any strategy that takes into account the heterogeneity of the degree distribution, paying more attention to the hubs, is not only more effective but also less expensive, since it requires fewer cures.
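The intuition behind hub-targeted immunization can be illustrated with a toy SIS simulation. Everything below (the star network, the parameter values, the function names) is a hypothetical sketch of ours, not the experimental setup of [51]:

```python
import random

def sis_step(adj, infected, immune, mu, delta, rng):
    """One synchronous SIS step in the spirit of Eq. 2.4, with an
    immunized set that can neither catch nor transmit the disease."""
    new_infected = set()
    for node, neighbours in adj.items():
        if node in immune:
            continue
        if node in infected:
            if rng.random() >= delta:        # node fails to heal this step
                new_infected.add(node)
        elif any(n in infected for n in neighbours):
            if rng.random() < mu:            # node catches the infection
                new_infected.add(node)
    return new_infected

# Hypothetical heterogeneous network: one hub joined to 50 leaves.
adj = {0: set(range(1, 51))}
for leaf in range(1, 51):
    adj[leaf] = {0}

rng = random.Random(7)
infected, immune = {1}, {0}   # degree-targeted: immunize the hub first
for _ in range(20):
    infected = sis_step(adj, infected, immune, mu=0.5, delta=0.2, rng=rng)
print(len(infected))  # 0 or 1: the infection cannot spread past the hub
```

With the hub immunized, the single infected leaf can never transmit to anyone, so the outbreak dies out; immunizing a random leaf instead would leave the hub free to broadcast the disease.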

SIR Models

As mentioned previously, SIR models describe diseases that confer immunity to their survivors. In this perspective, the work of Zanette analyzes the spread of an infection in a small-world network using a SIR model [214]. Such modeling is analogous to spreading a rumor, where susceptible individuals have not heard the rumor, infected individuals have heard the rumor and are willing to transmit it, and removed individuals have no interest in the rumor and thus do not transmit it. Their results show that the spread can fall into two regimes: one in which it is bound to a finite neighborhood of the initially infected node, and one in which it affects a finite fraction of the population. They also show that when p is small the infection is bound to a small neighborhood, affecting only a small part of the whole network; as p increases, the infection reaches a significantly larger portion of the network. Similar results were also reported by [117]. Many works use differential equations to describe disease spreading [127]. One example is the work of Moreno et al. [212]. They use a variation of the SIR model in which they investigate the role of topology, mainly heterogeneous networks, such

as scale-free networks, on the dynamics of the spreading. Their goal is to maximize the spread of information, as in gossip peer-to-peer protocols [78], by analyzing time profiles for each state. They use a numerical technique [141] that deals with mean-field rate equations, such as:

i(t) + s(t) + r(t) = 1    (2.6)

di(t)/dt = −λ⟨k⟩i(t)s(t)    (2.7)

ds(t)/dt = λ⟨k⟩i(t)s(t) − α⟨k⟩s(t)[s(t) + r(t)]    (2.8)

dr(t)/dt = α⟨k⟩s(t)[s(t) + r(t)]    (2.9)

Where λ stands for the infection rate, α stands for the healing rate and ⟨k⟩ is the mean number of connections. The main advantage of such modeling is its modest CPU time and memory requirements for large system sizes, compared to actually simulating the behavior of each individual or generating actual networks [81]. Variations of such modeling have been used to model information dissemination in online web forums [193, 209, 210]. Hybrid approaches that use networks and differential equations have also been proposed [107]. However, an approach based on differential equations is not able to reproduce the peculiarities of the connectivity patterns present in real-world systems [106, 212]. The focus of this dissertation is network-based disease spreading models. May et al. look at the infection dynamics from a different perspective, aiming to better understand the impact of the basic reproductive number (R0), i.e., the average number of secondary infections produced when one infected individual is

introduced into a host population where everyone is susceptible [135]. The authors analyze the spread in a closed population using a SIR model in which, a priori, it is impossible for the infection to be maintained over a long period, since the number of susceptible nodes decreases over time. They conclude that for diseases with

R0 < 1, the focus should be on identifying the source and potential reservoir and on reducing or eliminating infective contacts. In the case of diseases with R0 > 1, which are the most dangerous, the origin of the infection is not important and efforts should be focused on interventions aimed at stopping a self-sustained epidemic. For random networks, such as the one proposed by [63], it was shown that about 20% of the nodes in the network will never be infected [191]. Recently this property was also extended to small-world networks [214]; however, the percentage of never-infected nodes is considerably larger. Using a SIR model, the work of Liu et al. corroborates such findings through numerical simulations [122]. It has also been shown that the topology of scale-free and small-world networks influences the overall behavior of diseases that can be described using SIR models, with the fluctuation in the degree of the nodes playing a major role in the incidence of infection [140]. This characteristic becomes even more important in scale-free networks, in which this fluctuation is higher, exposing a weakness of such networks in resisting or eradicating an infection. If the infection is seeded among the less connected nodes or among the hubs, the behavior of the infection is practically the same in small-world networks; however, the same cannot be said about scale-free networks, which exhibit higher vulnerability among the hubs. SIR models were also previously studied, both locally and globally, in the stochastic models of Ball et al. [71].
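The threshold behavior around R0 = 1 discussed above can be reproduced with the textbook mean-field SIR equations (ds/dt = −βsi, di/dt = βsi − γi, dr/dt = γi, with R0 = β/γ). This is a generic numerical sketch with illustrative parameters, not the model of [135]:

```python
def sir_final_size(beta, gamma, i0=0.001, dt=0.01, steps=100_000):
    """Forward-Euler integration of the classic mean-field SIR system
    ds/dt = -beta*s*i, di/dt = beta*s*i - gamma*i, dr/dt = gamma*i.
    Returns the final removed (ever-infected) fraction r."""
    s, i, r = 1.0 - i0, i0, 0.0
    for _ in range(steps):
        new_s = s - beta * s * i * dt
        new_i = i + (beta * s * i - gamma * i) * dt
        new_r = r + gamma * i * dt
        s, i, r = new_s, new_i, new_r
    return r

# R0 = beta/gamma: below 1 the outbreak dies out, above 1 it is epidemic.
print(round(sir_final_size(beta=0.5, gamma=1.0), 3),   # R0 = 0.5: tiny outbreak
      round(sir_final_size(beta=2.0, gamma=1.0), 3))   # R0 = 2.0: large epidemic
```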

A variation of the SIR model in which an equal spread rate among all nodes in the network is assumed, i.e., a node spreads the disease to all its immediate neighbors, was introduced by Newman and Watts [152] and later extended in [151]. The result of such a spread is similar to the logistic growth function, which shows an initial exponential increase followed by a saturation state [11]. The logistic growth function is a common sigmoid function present in many biological processes, such as tumor growth [6]. As a consequence of this behavior, the number of nodes that have the disease at any time is proportional to the number of immediate neighbors of the node that started the spread. However, this scenario is unrealistic, even for very virulent diseases, thus the authors empirically define a proportion of the population that is susceptible to the disease. The values that this parameter can assume become crucial to understand the situation where the largest connected component covers most of the network and an epidemic state ensues [149]. This problem is known as site percolation [201].
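The site-percolation picture can be illustrated with a toy experiment: keep each node (a "susceptible site") with probability q and measure the largest connected component among the kept nodes. The sketch below uses an Erdős–Rényi substrate purely for illustration; none of the numbers come from [149] or [201]:

```python
import random
from collections import deque

def largest_component(edges, occupied):
    """Size of the largest connected component among the occupied nodes."""
    adj = {v: set() for v in occupied}
    for u, v in edges:
        if u in adj and v in adj:
            adj[u].add(v)
            adj[v].add(u)
    seen, best = set(), 0
    for start in occupied:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:                      # breadth-first search
            node = queue.popleft()
            size += 1
            for nxt in adj[node] - seen:
                seen.add(nxt)
                queue.append(nxt)
        best = max(best, size)
    return best

# Toy substrate: an Erdos-Renyi graph with 300 nodes, mean degree ~6.
rng = random.Random(0)
n = 300
edges = [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < 0.02]

sizes = {}
for q in (0.2, 0.8):  # fraction of sites kept (susceptible)
    occupied = {v for v in range(n) if rng.random() < q}
    sizes[q] = largest_component(edges, occupied)
print(sizes)  # a giant component appears only for large enough q
```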

2.1.3 Probabilistic Graphical Models

Causation is the relation between two particular events: something happens and causes something else to happen; each cause is a particular event, and so is each effect. A given event X can have more than one cause, none of which alone suffices to produce X. Causation is usually transitive, irreflexive (an event cannot cause itself) and antisymmetric (if X causes Y, Y cannot cause X) [188]. Probabilistic graphical models (PGMs) provide a framework for reasoning about systems with many parameters that interact among themselves [113]. PGMs encode the

joint probability distribution over a finite set of variables X1, ..., Xn and represent it as a series of conditional probabilities over each variable given its parents in the graph, i.e., for each variable Xi and its parents, denoted as pa(Xi), there is a conditional probability distribution P(xi|pa(Xi)). More specifically, PGMs allow a declarative representation of the system that we want to characterize [154]; in other words, they allow us to analyze a given system without any specific knowledge about the domain. PGMs are well suited to this analysis because they use a graph-based representation as a form to compactly encode a complex distribution over a high-dimensional space. In this study the nodes will represent the variables in the domain (the parameters of the simulation and the results) and the edges represent the direct probabilistic interactions between them. PGMs deal with uncertainty, an important aspect of many real-world processes: observations contain noise and capture the phenomenon only partially, which nevertheless has to be dealt with [101]. In many real-world situations our understanding of what is actually going on is incomplete, thus we need to describe them probabilistically [37]. The representation of a dynamic system in terms of a probabilistic graphical model allows us to use different algorithms to reason with it, because it provides its own clear semantics [91]. In many real-world distributions variables tend to interact directly with only a few others; such a property is exploited by the graphical representation of the system. The advantages of this representation include modularity, transparency and tractability [215]. A widely known probabilistic graphical model is Naive Bayes [113], which assumes that instances fall into a number of mutually exclusive classes C, and that each instance also contains a number of features X1,...,Xn whose values are typically

observed. This representation relies on the assumption that the features are conditionally independent given the instance's class. This assumption is called the Naive Bayes assumption, and is given by:

(Xi ⊥ X−i|C) for all i (2.11)
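Under this assumption the joint distribution factorizes as P(C, X1, ..., Xn) = P(C) ∏i P(Xi|C), so posterior inference reduces to a product of per-feature terms. A toy sketch, with all numbers invented purely for illustration:

```python
def naive_bayes_posterior(prior, likelihoods, observation):
    """Posterior over classes under the naive Bayes assumption:
    P(C | x1..xn) is proportional to P(C) * prod_i P(xi | C)."""
    scores = {}
    for c, p_c in prior.items():
        score = p_c
        for i, x in enumerate(observation):
            score *= likelihoods[c][i][x]   # multiply per-feature CPDs
        scores[c] = score
    total = sum(scores.values())            # normalize over classes
    return {c: s / total for c, s in scores.items()}

# Toy model: two classes, two binary features (numbers are made up).
prior = {"high": 0.3, "low": 0.7}
likelihoods = {
    "high": [{0: 0.2, 1: 0.8}, {0: 0.1, 1: 0.9}],
    "low":  [{0: 0.7, 1: 0.3}, {0: 0.6, 1: 0.4}],
}
post = naive_bayes_posterior(prior, likelihoods, (1, 1))
print(round(post["high"], 3))  # 0.72: observing both features favours 'high'
```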

Formally, a Bayesian network is a directed acyclic graph whose nodes are random variables and whose edges correspond to the direct influence of one node on another [113]. An illustration of the representation can be seen in Figure 2.6. This representation allows a parameterization that is linear in the number of variables, instead of the exponential parameterization used in the explicit representation of the joint probability. This representation uses two basic rules from probability which are more compatible with causality [50]. First, the chain rule, defined as:

P(α1 ∩ ... ∩ αk) = P(α1) P(α2|α1) ··· P(αk|α1 ∩ ... ∩ αk−1) (2.12)

Figure 2.6: The Bayesian network graph for a naive Bayes model. Figure taken from [113].


Second, as an immediate consequence of the definition of conditional probability, Bayes' rule, formally defined as:

P(α|β) = P(β|α) P(α) / P(β) (2.13)

For example, this approach allows us to answer queries concerning the posterior probability distribution over the values m of a variable M, conditioned on the fact that C = c; in other words, the marginal over M that we obtain by conditioning on c, formally p(M = m|C = c). The assignment of values to observed variables is called evidence. According to [113], such probability queries can be classified as:

• Causal Reasoning: Attempts to predict the downstream effects of various factors, starting from the independent variables and moving down until we reach the desired class.

• Evidential Reasoning: Reasons from effects to causes, starting from the nodes closest to the target and moving up until we reach the top, independent nodes.

• Intercausal Reasoning: Here the goal is to find out the interactions among different causes of the same effect. This type of reasoning is a very common pattern in human reasoning.

Such inference can easily be made by using Bayes' rule [84]:

p(M = m|C = c) = p(m, c) / p(c). (2.14)

The numerator is a probability expression p(m, c), which can be computed by summing all the entries in the joint probability table that correspond to assignments consistent with m and c. More precisely, let X be the set of all variables of the system and W = X − M − C be the random variables that are neither query nor evidence. Then:

p(m, c) = Σ_w p(m, c, w). (2.15)

Because M, C and W are all network variables, each term p(m, c, w) in the summation is simply an entry in the joint distribution. The probability p(c) can also be computed directly by summing over the joint, as follows:

p(c) = Σ_m p(m, c). (2.16)

The key operation performed when computing the probability of a subset of variables is marginalizing variables out of a distribution. The conditional probability of a parentless node is just its prior probability:

P(xi|∅) = P(xi). The joint probability distribution over the set X = {X1, ..., Xn} can be obtained by taking the product of all of these conditional probability distributions, as in Eq. 2.12:

P(x1, ..., xn) = ∏_{i=1}^{n} P(xi | pa(xi)) (2.17)
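To make Eqs. 2.14 to 2.17 concrete, the following minimal sketch builds the joint distribution of a toy three-variable network from its CPDs (Eq. 2.17) and answers the query p(M = m | C = c) by marginalizing out the remaining variable (Eqs. 2.15 and 2.16). The network structure (W → M ← C) and all CPD numbers are illustrative assumptions:

```python
# A toy network W -> M <- C; variable names echo the query variables in the text.
p_w = {0: 0.5, 1: 0.5}
p_c = {0: 0.6, 1: 0.4}
p_m = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.3, 1: 0.7},
       (1, 0): {0: 0.4, 1: 0.6}, (1, 1): {0: 0.2, 1: 0.8}}  # P(M | W, C)

def joint(w, c, m):
    """Eq. 2.17: the joint is the product of each variable's CPD given its parents."""
    return p_w[w] * p_c[c] * p_m[(w, c)][m]

def query(m, c):
    """p(M = m | C = c) as in Eq. 2.14: marginalize W out of the numerator
    (Eq. 2.15) and both W and M out of the denominator p(c) (Eq. 2.16)."""
    numerator = sum(joint(w, c, m) for w in p_w)
    denominator = sum(joint(w, c, mm) for w in p_w for mm in (0, 1))
    return numerator / denominator
```

With these numbers, query(1, 0) evaluates to 0.21/0.6 = 0.35, and the denominator agrees with the prior p_c[0], as it must.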

Such a probability distribution satisfies the Markov condition, i.e., each variable is independent of all others given its parents [59]. For example, if neither

Xj nor Xk is a parent of Xi, then P(xi|pa(xi), xk, xj) = P(xi|pa(xi)). Bayesian network inference exploits the network structure, in particular the conditional independencies and the locality of influence. However, the naive approach of summing over the full joint distribution is not satisfactory, since it returns us to the exponential

blowup of the joint distribution that the graphical model representation was precisely designed to avoid [113]. One solution to this problem is to adopt a dynamic programming strategy. Dynamic programming inverts the order of computation, performing it inside out instead of outside in. This approach is possible because the structure of Bayesian networks allows some subexpressions in the joint probability distribution to depend on only a small number of variables; once we compute these expressions and cache their results, we do not need to recompute them exponentially many times [61]. Another kind of query is the MAP (maximum a posteriori) query, which aims to find a high-probability joint assignment to some subset of non-evidence variables [113]. More formally, if W = X − E, the goal is to find the most likely assignment to the variables in W given the evidence E = e:

MAP(W|e) = argmax_w P(w, e) (2.18)

where argmax_x f(x) represents the value of x for which f(x) is maximal. The main difference between probability queries and MAP queries is that the assignment obtained when each variable individually picks its most likely value can be quite different from the most likely joint assignment of a group of variables [143]. One important aspect is that it is infeasible to take a fully specified distribution P and construct a graph G for it, because the joint distribution is much too large to be represented explicitly [215]. Another aspect is that not every distribution can be transformed into a graph that captures its independencies. The first type of counterexample involves regularities in the parameterization of the distribution that cannot be captured in the graph structure. One example

is deterministic relationships; logical functions (and, or, xor), for example, can lead to distributions that cannot be represented by a graph that captures their dependencies [113]. Other complications can arise due to symmetric variable-level independencies that are not naturally expressed within a Bayesian network, or because the independence assumptions imposed by the structure of Bayesian networks are simply not appropriate. In order to build a Bayesian network, the following steps must be accomplished: selecting the system variables and their possible values; determining the structure of the graph; and obtaining the conditional probability distribution over each of the system variables [59]. Most researchers use a database to construct a Bayesian network, taking the variables of the database as the variables of the system and using learning algorithms to generate the Bayesian network [44, 165]. Such algorithms require the database to contain a sufficiently large number of cases and to have no missing values or selection bias. Many algorithms have been proposed to construct a Bayesian network; one of the most widely used is the PC search algorithm [187]. The algorithm relies on the following basic observation: if X and Y are adjacent in G, we cannot separate them with any set of variables. Thus, if X and Y are not adjacent in G, then we can find a set U such that (X ⊥ Y |U), where U is the witness of their independence. For each pair of variables, we consider all potential witness sets and test for independence. If we find a witness U that separates the two variables, the witness consists of parents of X or parents of Y. If we do not find a witness, then we conclude that the two variables are adjacent in G.
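A toy version of this witness search over an exact distribution can be sketched as follows. The chain network A → B → C and its CPDs are invented for illustration, and the exact factorization test stands in for the statistical independence tests that real PC implementations run on finite data:

```python
from itertools import combinations, product

# Exact joint distribution generated by the chain A -> B -> C
# (all CPD numbers are illustrative).
p_a = {0: 0.5, 1: 0.5}
p_b = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}   # P(B | A)
p_c = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.1, 1: 0.9}}   # P(C | B)
joint = {(a, b, c): p_a[a] * p_b[a][b] * p_c[b][c]
         for a, b, c in product((0, 1), repeat=3)}
VARS = ["A", "B", "C"]

def marg(assign):
    """Probability of a partial assignment {variable index: value}."""
    return sum(p for full, p in joint.items()
               if all(full[i] == v for i, v in assign.items()))

def independent(x, y, witness):
    """Exact test of (X ⊥ Y | U): P(x, y, u) P(u) = P(x, u) P(y, u) for all values."""
    for vals in product((0, 1), repeat=len(witness) + 2):
        u = dict(zip(witness, vals[2:]))
        lhs = marg({x: vals[0], y: vals[1], **u}) * marg(u)
        rhs = marg({x: vals[0], **u}) * marg({y: vals[1], **u})
        if abs(lhs - rhs) > 1e-12:
            return False
    return True

# X and Y are adjacent iff no witness set renders them independent.
edges = []
for x, y in combinations(range(3), 2):
    others = [z for z in range(3) if z not in (x, y)]
    witnesses = [list(w) for r in range(len(others) + 1)
                 for w in combinations(others, r)]
    if not any(independent(x, y, w) for w in witnesses):
        edges.append((VARS[x], VARS[y]))
```

On this distribution the search removes the A-C edge (with witness {B}) and keeps A-B and B-C, recovering the skeleton of the chain.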

Many algorithms for causal inference have been proposed over the past decades under weak assumptions about the data-generating process [188]. Such assumptions are:

• Causal Markov Assumption: A directed acyclic graph G over a set of variables V and a probability distribution P(V) satisfy the Causal Markov Assumption if and only if for every W in V, W is independent of V \ (Descendants(W) ∪ Parents(W)) given Parents(W).

• Faithfulness Assumption: It says that all conditional independence relations are consequences of the Causal Markov Assumption applied to the directed graph representing the causal relations. Let G be a causal graph and P a probability distribution generated by G. < G, P > satisfies the Faithfulness assumption if and only if every conditional independence relation true in P is entailed by the Causal Markov Assumption applied to G.

• Causal Minimality Assumption: It says that each direct causal connection prevents some independence or conditional independence relation that would otherwise obtain. Formally, a graph G over a set of variables V and a probability distribution P satisfy the Causal Minimality condition if and only if for every proper subgraph H of G with vertex set V, the pair < H, P > does not satisfy the Causal Markov Assumption.

2.2 State-of-the-Art

The use of statistical frameworks to understand the disease spreading phenomenon has been extensively investigated in the literature. Most applications of Bayesian

networks to disease phenomena have been aimed at improving the diagnosis of chronic [52], psychological [47, 64, 185], seasonal [169], cardiovascular [131] and degenerative diseases [36, 124, 170]. Other efforts used such probabilistic frameworks to model the transmission of real-world diseases. One example is Severe Acute Respiratory Syndrome (SARS), which broke out in China in late 2002. The epidemic lasted 105 days and caused many economic and social problems [166]. In Walker et al. [198], the authors use Markov Chain Monte Carlo (MCMC) sampling methods [77] to choose the best parameters for a variation of the SIR model adapted to this disease [183, 184]. They focus on a contagious network with small-world features, justifying the choice of MCMC methods by the random placement of the long-range edges in small-world networks. They use Approximate Bayesian Computation to simulate their results and summary statistics to compare simulated data from the model with the observed data; i.e., they are interested in the posterior distribution of the parameters given the observed data. Thus, they use Bayes' rule, which expresses the posterior as the product of the likelihood of the data given a particular set of parameters and the prior, which describes the assumptions about the parameters before observing the data. The estimated parameters were consistent with the parameter values chosen based on medical opinion. Attempting not only to understand the rate of transmission and the direction and speed of the spread, but also to uncover an effective measure to contain the outbreak, the work of Maeno [126] uses Monte Carlo stochastic simulation [109] to understand the influence of spatial heterogeneity on the spreading mechanism. The author focuses on the macro view of transportation in the dynamics of the spread. The method aims to discover the effectively decisive topology of a network

of roads and transmission, and reveals the parameters that govern such stochastic transmission in the early growth phase of the outbreak. The results show that the estimation of the spreading parameters is particularly successful when the topology is sparse and reproduction is slow. In Blum et al. [26], the authors use Approximate Bayesian Computation (ABC) [22] to overcome the recurrent problem of missing data in epidemiological infections and to assess an alternative to MCMC. They argue that MCMC, although popular, may be computationally prohibitive for high-dimensional missing observations, and that the proposal distributions must be fine-tuned for efficient MCMC algorithms [168]. ABC methods rely only on numerical simulations of the model and comparisons between simulated and observed summary statistics [173]. They show that for standard compartmental models, such as SIR, the posterior probability found with their method is similar to others found in the literature. The use of statistical frameworks to model endemic diseases such as dengue was investigated by Samat and Percy [178]. They attempted to estimate the relative risk of disease spreading by extending the fundamental SIR Poisson-gamma model and developing a Bayesian analytic approach. Many works on disease mapping use regression-type models in which both observable and unobservable variables are included to give a clean map, and so represent the true excess risk [23, 29, 111, 119]. Their contributions include a method that takes into account the transmission of the disease in human populations and enables covariate adjustments and spatial correlation between adjacent areas, thus overcoming issues of previous modeling approaches such as the standardized morbidity ratio. With the goal of detecting an epidemic or bio-terrorist attack before a confirmed diagnosis can be made, the field of biosurveillance tracks the evolution of a given

disease over time [171]. Bio-terrorism attacks aim to cause fast and severe effects, yet may not be readily recognized. Scientists do not have experience with such bio-agents, so it is critical that they are detected early in order to minimize damage and ensure mitigating measures can be taken [93]. Biosurveillance consists of two parts. The first part is the detection and tracking of a singular underlying phenomenon given a set of noisy observations (patient records) over time. The second part consists of using the relationships among identified objects to reason about the current epidemic, with the goal of leveraging the information discovered in the first part to contextualize and explain the observations. Blind and Das [25] propose a statistical approach to biosurveillance analyses. Their approach uses Latent Semantic Analysis [49] to reduce the dimensionality of the disease space; then they cluster the entities using the nearest-neighbor heuristic [45], and finally they use dynamic Bayesian networks to assess the state of a disease outbreak [99]. They validated their work on influenza-like diseases using real-world data.
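The ABC idea referenced above can be illustrated with a minimal rejection sampler. The sketch below is a generic illustration under invented assumptions (a crude discrete-time stochastic SIR simulation, a uniform prior over the transmission rate, and the final epidemic size as the summary statistic); it is not the model, data, or procedure of any of the cited works:

```python
import random

def simulate_sir(beta, gamma=0.1, n=100, i0=2, steps=50):
    """Crude discrete-time stochastic SIR simulation; returns the final
    epidemic size (total number of individuals ever infected)."""
    s, i = n - i0, i0
    for _ in range(steps):
        new_inf = sum(1 for _ in range(s) if random.random() < beta * i / n)
        new_rec = sum(1 for _ in range(i) if random.random() < gamma)
        s, i = s - new_inf, i + new_inf - new_rec
    return n - s

random.seed(7)
observed = simulate_sir(beta=0.3)   # stands in for real outbreak data

# ABC rejection sampling: keep prior draws whose simulated summary statistic
# (final size) is close to the observed one; the accepted draws approximate
# the posterior over beta.
accepted = [b for b in (random.uniform(0, 1) for _ in range(500))
            if abs(simulate_sir(b) - observed) <= 15]
```

The accepted sample can then be inspected (e.g., its mean and spread) as an approximation of the posterior distribution of the transmission rate, which is the quantity of interest in the works discussed above.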

2.3 Literature Gaps

2.3.1 Problem Statement

We focus on the following gaps in the literature that need to be investigated. These gaps can be summarized by raising the following research questions:

• Question 1: Is it possible to use Bayesian networks to better understand the phenomenon of disease spreading?

Gap 1: It is unclear how Bayesian networks can be used to provide a different perspective on the study and analysis of disease spreading models.

Description: Bayesian networks are a widely known framework that provides a holistic view of a system in terms of its dependencies. Such an approach could be used to present a new perspective on the study of disease spreading models, by showing how the different parameters of the system interact to produce real world behaviors of the phenomenon.

• Question 2: Is it possible to create a methodology to compare different disease spreading models by combining their results?

Gap 2: There is a lack of methods to compare different disease spreading models by assessing their limitations and modeling decisions.

Description: A merged Bayesian network created by merging the results of the analyzed models could potentially reveal which dependencies of each model are more likely to be accurate, by belonging to the actual phenomenon described, or spurious; thus we would have a way to compare the models analyzed based on such accuracy.

2.3.2 Thesis Statement

A methodology based on Bayesian networks to study and compare different disease spreading models highlights how modeling decisions affect the dynamics of disease spreading, and identifies the limitations of each model. This methodology will allow the comparison of different models based on their accuracy,

in order to identify which dependencies among the variables of the system are more likely to be present in the real phenomenon.

Chapter 3

Proposed Methodology

In this chapter we present the main contribution of this dissertation: an approach, formalized as a methodology, to study and compare disease spreading models. Before explaining the details of the proposed methodology, we present a controlled experiment that supports our Weak Ensemble assumption, shows how the methodology was designed, and outlines the main ideas behind it.

3.1 Controlled Experiment

A given phenomenon has its own dynamics; in terms of a Bayesian network, such dynamics are represented by the dependencies between the variables of the system. In this experiment, we suppose that the phenomenon is represented by the following algebraic calculations:

C = 2A + B² (3.1)

K = 3C − 4E (3.2)

where the variables A, B and E are independent variables that can assume any integer value between zero and nine, and K is the target variable, i.e., the variable that we are trying to predict. In order to identify the structure of the Bayesian network that represents this phenomenon, we executed experiments using all possible combinations of values that each variable can assume and recorded the resulting values of K. The results were stored as a joint probability table in which, for each assignment of a given variable, we were able to predict its effects on the other variables considered. Next, we used the PC search algorithm [187], which, given a joint probability table, searches for the dependencies among the variables and creates a Bayesian network. The algorithm tests whether every pair of variables is conditionally independent given any third variable. These tests were performed for every value of each assessed variable, thus producing exhaustive and computationally heavy tests. The resulting Bayesian network can be seen in Figure 3.1. In this structure we can see that variables A and B influence variable C, and that variables C and E influence our target variable K. We also have a variable D that neither influences nor is influenced by any other variable. Variable D was introduced to assess the likelihood of a variable that is not in the system becoming part of it in the process of generating models that partially describe the phenomenon. This is the ground truth, the actual structure of the phenomenon that we are trying to understand. As with many other real-world phenomena, there are many models that attempt to describe this one. Each model has its own limitations; some of the models do not consider a given variable, whereas others only consider smaller intervals of the values that the variables can assume, like many models in the literature that attempt to describe a given phenomenon by focusing on specific variables and neglecting others.

Figure 3.1: Bayesian network of the phenomenon of the controlled experiment.

Thus, we created different classes of models that have different characteristics:

• Class 1: This class of model uses a smaller interval of values for one of the independent variables; variable A, for example, can only assume values between two and five. We also changed the rules of the model, in this case to: C = 3A + B³ and K = 3C − 6E.

• Class 2: This class of model keeps the value of one of the independent variables fixed, thus assuming that it is irrelevant either to the analysis or to the overall dynamics of the phenomenon; variable B, for example, can only assume the value three. We also changed the rules of the model, in this case to: C = A − B² and K = C − 3E.

• Class 3: This class of model keeps the values of two of the independent variables fixed, similarly to Class 2 but with more variables not being considered; variable A, for example, can only assume the value two, and variable E can only assume the value five. We also changed the rules of the model, in this case to: C = 2A + 4B and K = C² + E.

• Class 4: This class of model chooses a smaller interval of values for two of the independent variables, similarly to Class 1 but with more variables involved in the restriction. For example, variable B can only assume values between one and six, and variable E can only assume values between four and nine. We also changed the rules of the model, in this case to: C = 2A − 3B² and K = 5C + 2E.
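The exhaustive enumeration described above can be sketched as follows. The code is a hypothetical reimplementation, not the original experiment code: it evaluates Eqs. 3.1 and 3.2 (or a class's modified rules) over all input combinations and tabulates the joint occurrences that would be fed to the PC search algorithm. The disconnected variable D is omitted for brevity:

```python
from itertools import product
from collections import Counter

def run_model(a_vals, b_vals, e_vals,
              c_rule=lambda a, b: 2 * a + b ** 2,      # Eq. 3.1
              k_rule=lambda c, e: 3 * c - 4 * e):      # Eq. 3.2
    """Exhaustively evaluate a model over all input combinations and
    tabulate the joint occurrences of (A, B, E, C, K)."""
    table = Counter()
    for a, b, e in product(a_vals, b_vals, e_vals):
        c = c_rule(a, b)
        table[(a, b, e, c, k_rule(c, e))] += 1
    return table

digits = range(10)
ground_truth = run_model(digits, digits, digits)

# An instance of Class 1: A restricted to [2, 5], with the modified rules.
class1 = run_model(range(2, 6), digits, digits,
                   c_rule=lambda a, b: 3 * a + b ** 3,
                   k_rule=lambda c, e: 3 * c - 6 * e)
```

Normalizing each table's counts yields the joint probability table from which the structure-learning step builds each model's Bayesian network.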

For each class of model presented we created three scenarios. All scenarios of a given class of model have the characteristics of the class they belong to; Class 1,

for example, has instances M_1^1 (A = [2, 5]), M_2^1 (B = [4, 7]) and M_3^1 (E = [6, 9]). Then we performed exhaustive simulations over all combinations of values that each parameter can assume, in order to create a joint probability table and, consequently, a Bayesian network for each model, and to assess the effect of each restriction on the structure of the Bayesian network. The results can be seen in Figure 3.2. The restriction on the interval of the values of variable A in Class 1 did not allow the algorithm to accurately identify the dependency between variable A and variable C. In the Bayesian network created from an instance of Class 2, on the top right, we see that the restriction on variable B made the algorithm incorrectly identify dependencies between variable B and all other variables. In the Bayesian network created from an instance of Class 3, the

Figure 3.2: Bayesian networks that represent the different classes of models that describe the controlled experiment phenomenon. On the top left is a Bayesian network generated from an instance of Class 1, on the top right Class 2, on the bottom left Class 3, and on the bottom right Class 4.

restriction of having two variables with fixed values had a similar effect to the one produced in the instance of Class 2, in which all other variables had dependencies on the fixed variables; in this case, variables A and E remained fixed. In the Bayesian network created from an instance of Class 4, two variables had restricted intervals, and the algorithm was unable to correctly identify the structure, possibly because the intervals considered were too short to identify the dependencies among the variables. Subsequently, we started to merge the results produced by different models in order to construct a Bayesian network from the joint probability table of all analyzed models. Our hypothesis is that this merging reveals a structure of dependencies that belongs to no single model and yet shows us which dependencies are spurious and which are accurate (i.e., present in the phenomenon). We chose a merging strategy that would provide insights into the merging of inter-class and intra-class models. Examples of the merged Bayesian networks can be seen in Figure 3.3. We noticed, for example, that in the Bayesian network generated from merging the results from an instance of Class 1 and an instance of Class 2, on the left of Figure 3.3, the new information introduced by the instance of Class 1 made all the spurious dependencies between the variables and the one kept fixed in the instance of Class 2 (variable B) disappear. However, the new information introduced by the instance of Class 2 made the merged Bayesian network recognize that variable A does influence other variables, which the instance of Class 1 did not show. In general, we noticed that dependencies that are accurate, i.e., those also present in the ground truth of the phenomenon (Figure 3.1), persisted in the new structure. By merging the results of different models, we noticed that some patterns emerged.

Figure 3.3: Bayesian networks obtained from merging the results of different models. On the left is the Bayesian network obtained from merging the results from an instance of Class 1 and an instance of Class 2; in the center, from an instance of Class 1 and an instance of Class 3; on the right, from an instance of Class 2 and an instance of Class 4.

These patterns allowed us not only to distinguish dependencies that are more likely to be present in the phenomenon from spurious dependencies, but also to identify new dependencies. Hence we have a way to compare different models according to the number of correctly identified dependencies. More details about the proposed approach and the steps taken will be provided in the next sections. The merging actually revealed more information about the structure of the ground truth of the phenomenon than each model individually, and hence supported our main assumption: that the combination of the results of different models produces a better representation of the phenomenon they describe than each model individually. Such an assumption, often referred to as the Weak Ensemble Assumption, is not new; early work dates back to the 1960s [20], and it has since been successfully applied to a variety of other situations.
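One simple way to realize the merging step is to pool the joint-occurrence tables of the individual models before learning the merged network's structure. The sketch below is an illustrative assumption about the merging mechanics, not the dissertation's exact procedure; the toy tables stand in for two models' simulation results:

```python
from collections import Counter

def merge_tables(*tables):
    """Pool the joint-occurrence tables of several models and normalize the
    pooled counts into a single joint probability table, which can then be
    handed to a structure-learning algorithm such as PC search."""
    merged = Counter()
    for table in tables:
        merged.update(table)
    total = sum(merged.values())
    return {config: count / total for config, count in merged.items()}

# Toy occurrence tables standing in for two models' results.
model_1 = Counter({(0, 0): 2, (0, 1): 2})
model_2 = Counter({(0, 0): 1, (1, 1): 3})
merged = merge_tables(model_1, model_2)
```

Configurations that only one model can produce survive in the merged table, which is what lets the merged network expose dependencies that any single restricted model misses.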

3.2 Weak Ensemble Assumption

The idea that the combination of different solutions can often improve the overall success rate of the main solution has a statistical basis [207]. As pointed out by Granger [82], in the early 1960s a comparison study of many prediction methods was made [20]. The study showed that the Box-Jenkins (BJ) modeling procedure outperformed many other methods of the time, and the natural step would have been to discard the methods that did not perform as well and only use the BJ method. However, the author of the study observed that a simple average of two methods outperformed each of them. They concluded that to produce a superior prediction, both methods had to be suboptimal and use different inputs, so that each method would be optimal for its particular set of inputs. The difference between this approach and others is that it does not simply combine different training sets to produce a better-trained method, but rather trains each model with its own input and combines their results. This idea spread quickly due to its simplicity, pragmatism and superior performance [82]. Learning methods attempt to infer a function from a subset of R^n to a subset of R^p (the parent function) given a set of m samples of that function. An algorithm that attempts to guess the parent function, having as basis only the learning set of m vectors in R^{n+p}, is called a generalizer. Normally the performance of such a generalizer is measured through its learning accuracy, i.e., its ability to identify patterns from the learning set that can be applied to any other input. However, the Weak Ensemble Assumption was introduced to boost another kind of accuracy, generalization accuracy. Instead of trying to maximize the patterns identified from the learning set, methods that use the

Weak Ensemble assumption are focused on achieving a good generalization of the problem [207]. Often such methods partition the input, assign each part to a different sub-solution, analyze the partial results and combine them to achieve a better global solution [136, 180]. A widely known method that uses this approach is the AdaBoost algorithm [74]. It consists of generating a sequence of weak classifiers where, at each iteration, the algorithm finds the best classifier based on the current weights, then increases the weights of incorrectly classified samples and decreases the weights of correct ones. This forces the algorithm to learn different aspects of the data at each iteration [115]. It has been reported that when this approach is applied to neural networks, given a population of weak estimators (estimators with results marginally better than random), it is possible to construct a combined estimator that is as good as or better, in the error sense, than any individual estimator [167]. In neural networks, the ensemble paradigm has been used to gather various neural networks trained for the same task [195]. It is also well suited for parallel computation, where parts of the solution are computed by different estimators at the same time [89]. In [114], the authors attribute the better performance of their combined approach to the heterogeneity of the inputs; they use different but overlapping datasets. The ensemble approach uses local minima to build improved estimates, whereas other estimators are hindered by local minima. Ensembles are well suited to such cases because the selection of weights in neural networks is an optimization problem with many local minima. Methods that use global optimization strategies when faced with many local minima yield "optimal" parameter weights that differ greatly in each run of the algorithm, which constitutes a great deal of randomness

coming from different initial conditions [88]. This randomness tends to produce different errors in the networks; thus the networks will make errors on different subsets of inputs. It is therefore assumed that the collective predictions produced by the ensemble are less prone to errors than the estimations made by any of the individual networks. In [88], such arguments are supported by experiments, and the authors conclude that an ensemble can be far less fallible than any individual network. This approach has been used not only in neural networks but also in a variety of methods in diverse areas [218]. Facial recognition systems combine different machine learning methods such as decision trees; such systems are based on hierarchical organization and implement successive levels of abstraction in terms of information granularity [86]. An ensemble of neural networks and eigenfaces has also produced better results than methods that only use neural networks [97]. In optical character recognition it has been shown that boosting weak estimators with errors of less than fifty percent can generate methods with arbitrarily low error rates [58, 87]. In [128], the authors use three different ensemble strategies to assess which one produces better results in optical character recognition. The first approach consists of a collection of neural networks that are trained with the same data but with different random initial weights. The second approach, called bagging, consists of a re-sampling technique for generating data for each classifier, where each dataset is randomly sampled with replacement from a bigger repository. In their final approach, individual classifiers are trained hierarchically to learn harder and harder parts of the classification problem. They argue that the best strategy depends on the number of patterns to be recognized. In the area of scientific image recognition, multiple artificial neural networks with

different layers and nodes have been used to achieve expert-level accuracy in radar images of volcanoes on the surface of planet Venus [39]. Other applications that use this paradigm include medical diagnosis [46, 217] and seismic signal classification [181]. In the scope of this work we use such assumptions to support our comparison of different disease spreading models; Figure 3.4 illustrates how this approach is applied. Each model must have different inputs, i.e. different initial conditions that can produce different scenarios and dynamics. As mentioned previously, having different inputs is crucial for the different models to produce a better generalization when merged. For example, as Model 1 was trained with Input 1 it can reproduce some real-world scenarios but not others (scenarios are represented as blue circles of different sizes). The same happens with Model 2, which received Input 2 (not the same set as Input 1) and is able to reproduce some real-world scenarios. Both models can reproduce some scenarios, whereas other scenarios can only be reproduced by one model. This relationship between the scenarios produced by each model illustrates the suboptimality of each model, i.e. there are some scenarios that one model is able to reproduce whereas the other fails to do so. Thus, if we are able to guarantee such conditions, our combination of the results of each model should produce a better representation than each model individually.
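As a toy illustration of this assumption (not taken from any of the cited works), a majority vote over many independent weak classifiers, each only slightly better than random, is substantially more accurate than any single one:

```python
import random

random.seed(42)

def weak_predict(truth, accuracy=0.6):
    """A weak classifier: correct with probability `accuracy`."""
    return truth if random.random() < accuracy else 1 - truth

def ensemble_predict(truth, n_classifiers=25):
    """Majority vote over independent weak classifiers."""
    votes = [weak_predict(truth) for _ in range(n_classifiers)]
    return int(sum(votes) > n_classifiers / 2)

trials = 2000
labels = [random.randint(0, 1) for _ in range(trials)]
weak_acc = sum(weak_predict(y) == y for y in labels) / trials
ens_acc = sum(ensemble_predict(y) == y for y in labels) / trials
print(f"single weak classifier: {weak_acc:.2f}")  # around 0.60
print(f"25-classifier ensemble: {ens_acc:.2f}")   # noticeably higher
```

The individual accuracy of 0.6 and the 25 voters are arbitrary choices; the point is only that independent errors partially cancel under a vote.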

Figure 3.4: An illustration of how the Weak Ensemble approach fits our work.

3.3 Proposed Methodology

As mentioned in the previous chapters, disease spreading is an important phenomenon and a major source of concern for many countries. Formally, disease spreading models are models that describe the propagation of clinically evident illnesses resulting from the presence of a pathogenic microbial agent [132]. In the scope of this work we are interested in the set of disease spreading models that can be formally defined as:

• A graph G = {V, E} where:

V = {v1, v2, ..., vn} is the set of individuals, |V| = n is the number of individuals, n > 1;

E = {e1, e2, ..., em} is the set of interactions, |E| = m is the number of interactions, n − 1 ≤ m ≤ n(n − 1)/2;

ei = {vj, vk} = {vk, vj} such that vj ∈ V, vk ∈ V, 0 < j ≤ n, 0 < k ≤ n, j ≠ k, for all i ≤ m.

• A set of states S = {s1, s2, ..., sr}, where |S| = r is the number of states, r > 1.

• A set of nodes’ states U = {τ1, τ2, ..., τn}, where |U| = n is the number of nodes’ states and

τi(t) = τ(vi, t) is the state of node i at time t;

τi(t) = Φ(τi(t − 1), Pi(t − 1)) ∈ S for all i ≤ n, where

Pi = {p^i_1, p^i_2, ..., p^i_l} is the set of states of the neighbors of node i;

|Pi| = l is the number of neighbors of node i, 0 < l < n;

p^i_a(t) = τh(t) ∈ S such that {h, i} ∈ E, for all h ≤ n and all a ≤ l.
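A direct, minimal translation of this definition into code; the six-node cycle graph and the SIS-like transition function Φ below are illustrative placeholders, not a model from the literature:

```python
import random

random.seed(1)

# Graph G = {V, E}: individuals and their interactions.
V = list(range(6))
E = {(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)}

# Set of states S (here a minimal susceptible/infected pair).
S = {"S", "I"}

def neighbors(i):
    """P_i: indices of the neighbors of node i."""
    return [h for (a, b) in E for h in (a, b) if i in (a, b) and h != i]

def phi(state_i, neighbor_states, v=0.5, delta=0.3):
    """Transition function Φ(τ_i(t-1), P_i(t-1)) -> state in S."""
    if state_i == "S" and "I" in neighbor_states:
        return "I" if random.random() < v else "S"
    if state_i == "I":
        return "S" if random.random() < delta else "I"
    return state_i

# τ: the nodes' states, updated synchronously in discrete time.
tau = {i: ("I" if i == 0 else "S") for i in V}
for t in range(10):
    tau = {i: phi(tau[i], [tau[h] for h in neighbors(i)]) for i in V}
```

The infection and healing probabilities v and δ are arbitrary values chosen for the sketch.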

Various disease spreading models have been created that focus on understanding the forces that drive many real-world behaviors of such diseases. The models describe different kinds of diseases, most of them grouped according to whether the disease confers immunity to the individuals that acquire it. From the group of diseases that do not confer immunity we highlight SIS models. Many variations of this kind of modeling have been proposed, such as the one used by Pastor-Satorras and Vespignani [161]. Their model consists of a SIS model in which at each time step each susceptible node is

infected with probability v if one or more of its neighbors are infected, and infected nodes heal and become susceptible again with probability δ. Another variation of a SIS model is used by Kuperman and Abramson [117]. In this model the probability of a susceptible node becoming infected is proportional to the number of neighbors who are infected at a given time, and the amount of time it takes before healing is determined by a system parameter. Another peculiarity of this model is the presence of a refractory state, in which nodes that were infected remain refractory during a specified period of time before returning to the susceptible state. In this state a node can neither infect nor become infected. In Javarone and Armano [100] the authors introduce a model in which each infected node has probability v of infecting one of its neighbors, if this neighbor is susceptible. Each node has its own fitness parameter, introduced by the authors, that defines how fast it can heal over time. This characteristic increases the healing likelihood over time, and each node has its own curve of healing likelihood. Another variation of a SIS model is presented by Xu and Chen [213], in which they introduce the concept of a delay: a time interval between a node's exposure to the disease and the moment it becomes able to infect others. This delay might be due to the latency of the host or the latency of the disease itself. SIR models represent a group of diseases that confer immunity to the individuals that acquire them. The basic idea of this kind of model is presented by Murray [144]. In this model all susceptible nodes in the network have the same probability v of becoming infected once in contact with an infected neighbor, and all infected nodes have the same probability δ of healing. A variation of this model was used by Zanette [214]. In this model, the infection spreads more slowly.
At each time step a randomly chosen infected node contacts one of its neighbors; if this

neighbor is susceptible then the latter becomes infected. However, if the neighbor is either infected or refractory, the former becomes refractory. Using the concept of memory, Dodds and Watts [55] introduced a model in which individuals have a cumulative memory of their contacts with infected individuals. Each node has its own infection threshold that accounts for the number of contacts with infected individuals necessary for it to become infected. Only after this threshold is reached does a susceptible node become infected. In Ma et al. [125] the authors present a SIR model that introduces a delay in the infection, whether due to latency in the host or in the disease itself. Thus, a given susceptible node that was in contact with an infected one only becomes infected after some period of time, and not immediately after the contact. A comparison of agent-based disease spreading models with models based on differential equations was presented in [174]. The comparison presents the differences between the dynamics produced by the two types of models. The authors state that differential equation models produce a single trajectory for each parameter set, while stochastic models produce a distribution of outcomes. The comparison showed that the network topology and individual heterogeneity affect the dynamics, e.g. the clustering present in small-world networks slows the diffusion. The spreading dynamics differ in several metrics, such as the speed of diffusion, the peak of infected individuals, and the total number of individuals that acquire the disease. As we can see, over the years many models have been proposed and analyzed. Although they differ in behavior and contributions, they all aim to describe one common phenomenon, namely disease spreading. Each model has its own peculiarities, and in the process of designing the model some decisions and

assumptions were made for the respective model to better suit the desired analysis task. Such modeling decisions affect the dynamics of the spreading in different ways; in some cases they produce spurious behaviors that do not reflect the real-world phenomenon. As we have shown in our controlled experiment, combining different results from different models can provide more information than each model individually and provide a way to compare them. This approach is made possible by the Bayesian framework. Most of the research efforts related to Bayesian networks have focused on learning their structure from statistical data using batch algorithms [145]. Such approaches evaluate all of the data in order to identify statistical dependencies. However, complete data are not always available: not all data may have been stored, or new data may be expected at some time in the future. In some cases, it is unfeasible to acquire or to simulate all the data. To address such issues, incremental algorithms were developed that use previous experience and new data to refine the structure of the network [35]. The main problems in this situation arise when the new data have different, partial, or new attributes [216]. The natural solution would be to construct the network with the available data and then use this structure as a prior to construct a new network [118]. As expected, this new structure is similar to the old one, since it relies on the assumption that the first one is already an accurate model. Such an approach can have unwanted consequences, i.e. the new network is biased towards the old structure. In Friedman et al. the authors claim that the accuracy of incremental learning algorithms involves a tradeoff between the quality of the learned network and the amount of information that is maintained about past observations, and they propose

different strategies to deal with this issue [76]. Their first strategy uses a naive approach that stores all previously seen data and repeatedly executes a batch learning algorithm after each new example is recorded. As it uses all the information provided so far, it is optimal in terms of the quality of the graph generated; however, the extensive and repetitive use of batch algorithms leads to heavy computational effort and memory allocation [10]. Their second approach, called Maximum A Posteriori probability (MAP), avoids this overload of previous data by summarizing the data using the model already created, similar to the one proposed by Lam and Bacchus [118]. Their last approach, which they call incremental, provides a middle ground between the previous two. It focuses on keeping track of just enough information to make the next decision in the search process, by maintaining a set of network candidates. When new data arrive, the procedure updates the information stored in memory and invokes the search process to check whether one of the network candidates is more suitable than the current model. Another approach to incremental learning is based on sufficient statistics [75]. Sufficient statistics rely on the fact that it is possible to discard the actual data once sufficient statistics about them have been gathered, and to use only the collected information instead of the actual data [10]. Those algorithms rely on the fact that a new data entry alone does not provide any information; it is only useful within the context of previous information, and it is not necessarily the actual data but rather their statistics that are useful [104]. The possibility of merging the knowledge of different Bayesian networks has been thoroughly researched. One example is the work by Santos Jr. et al. [105]; they focus on Bayesian Knowledge Bases (BKBs), which are generalizations of Bayesian

networks in which dependencies are defined between states of random variables instead of between random variables, directed cycles are allowed, and the specified probability distribution does not need to be complete [176]. Their approach is based on the addition of new nodes to new input fragments when the probability distribution is not complete; moreover, dependencies are drawn among the values of the variables, not the variables themselves. Matzkevich and Abramson propose a strategy to fuse Bayesian networks that creates a new network representing the “consensus” of the experts who created the input networks instead of preserving the knowledge encoded by each expert; thus much information is lost in the process [133]. Other strategies include dividing the system into a set of small subsystems that deal with parts of the bigger problem [90]. Such approaches, mostly used in medical research, rely on the fact that a distinct local network is devised for each subproblem, and the set of local networks is combined into a single global network. The fundamental principle guiding this strategy is graph union, in which the global network contains all of the nodes and all of the arcs of all of the local networks. Using the opposite strategy, Bonduelle et al. [30] use graph intersection to identify the source of disagreement among experts in a decision-making context. The authors begin by identifying the core set of nodes and edges common to all of the input networks; then a combination of behavioral and mechanical techniques is used to refine the core into a more meaningful consensus structure. Jiang et al. [103] propose a framework for fusing multiple probabilistic graphical models (PGMs) and integrating new ones over time; however, their approach can require non-trivial changes to the input networks, such as the reversal of edges to avoid the creation of cycles and an ad-hoc addition of variables in order to get the conditional

probability tables into a state in which they can be combined to form a single set of joint probability tables. As pointed out by Da Costa and Laskey [48], Bayesian networks, although very useful in many contexts such as visual recognition [24] and language understanding [38], have serious limitations, such as the assumption of a simple attribute-value representation, i.e. each instance of a given problem reasons over the same fixed number of attributes and only the evidence values change in each instance. In order to overcome such problems, the authors introduce a Multi-Entity Bayesian system [211] based on first-order logic. They argue that first-order logic is able to represent entities of different types interacting with each other in varied ways. In their system, random variables represent features of entities and relationships among entities; the knowledge about attributes and their relationships is expressed as a collection of fragments. Each fragment represents a conditional probability distribution for instances of its resident random variables given their parents in the fragment. Each node in a fragment may have a list of arguments that could potentially represent different aspects of the reasoning, such as time and object types. In summary, their system is very complex and can represent any real-world situation because of the use of first-order logic, probability distributions, and a Boolean function to identify inputs. In this chapter we focus on the development of a methodology to compare disease spreading models with a merged Bayesian network as a basis. This new network enables us to compare different models from the literature and assess how the modeling decisions affect the dynamics of the spreading. The main idea is to create a merged Bayesian network from the individual results of each model, so that the analysis can be performed through the changes in the

structure of the network. In this chapter, we attempt to fill the gap presented in the previous chapter. The proposed methodology can be seen in Figure 3.5 and is summarized in the following steps:

1. Identify the set of variables that influences each model's dynamics, X^i = {x1, x2, ..., xn}.

2. Define the set of metrics of the dynamics, Y = {y1, y2, ..., yn}.

3. For each xi such that xi ∈ X^j and xi ∉ X^k, identify the values that this variable assumes in X^k, so that the joint probability table that is created has all xi considered in all models analyzed.

4. Implement and run experiments on the scenarios reported by each analyzed model in order to create a joint probability table with the sets X and Y, such that X = X^j ∪ X^k for all j and k.

5. From the joint probability table of the results of each model, create a Bayesian network using the PC search algorithm.

6. Balance the joint probability tables of the models so that each table contains the same number of scenarios.

7. Merge the joint probability tables of the models into a single one and create a Bayesian network of the merged table using the PC search algorithm.

8. Compare the structure of the merged Bayesian network with the structure of the Bayesian network of each model and rank each model according to the number of dependencies that are also present in the merged Bayesian network.

We are going to provide details about each step mentioned above and explain its role in the proposed methodology.

• Step 1: To be able to construct the Bayesian network of a given model, we must first identify the possible variables that influence that model. Different models, although describing the same phenomenon, can have different sets of variables that influence their dynamics. Such a set is normally presented by the authors of the model in their experiments. We only consider as variables those parameters of the system that change in their experiments. For example, a scientist who attempts to develop a model to describe the acceleration of objects on Earth has to consider many variables, such as the weight of the object and the air resistance, among others. However, as all his experiments are performed on Earth, he does not realize that gravity is an important aspect that influences the acceleration; thus he simply marginalizes that variable and assumes that it will always have the same value, Earth's gravitational pull. Thus, the set of values of the parameters chosen by a given author of a model has great importance in the definition of the model itself: it reveals which variables were considered and which were not.

In order to compare different models, they must share some common features. Such features, here seen as parameters of the model, define specific assumptions made by the authors when they created the models. The choice of the set of variables must be made carefully in order to preserve the Causal Markov Assumption. It states that no external variable can influence the


Figure 3.5: Steps of the proposed methodology.

system other than the set already chosen; otherwise the reasoning using the created Bayesian network will not be consistent.

• Step 2: In order to merge different models, even SIS and SIR models, which call for different analyses (endemic and epidemic, respectively), one needs to define a set of common measures of the dynamics that is able to capture all these behaviors. This step restricts the models used in the analysis; for example, it would not be feasible to compare disease spreading models with human mobility models because they produce fundamentally different dynamics that cannot be measured by the same set of metrics. If two different models cannot have the same set of measures to capture their dynamics, they cannot be merged and consequently cannot be compared.

This step is important to the proposed methodology because it is through the defined metrics that we can assess how the modeling decisions affect the spreading dynamics. The straightforward approach to choosing such measures is to carefully analyze the contributions of the compared models and define common measures that are able to describe such findings.

• Step 3: As mentioned previously, each model has its own set of variables that influence the dynamics, X^i; thus it is not rare that there is a given variable xi such that xi ∈ X^j and xi ∉ X^k. We then need to identify which values xi assumes in the model of X^k, because in order to merge the results of different models they must share the same variables. Although some models consider one set of variables and other models a different set, there is a value that xi assumes when it is marginalized, and that value is very important when merging the different models' results.

• Step 4: Once we have defined the set of common variables of the system, which includes the parameters of the models and the measures that capture the dynamics, we implement the analyzed models in order to create a joint probability table with all the values of X and Y, so that X = X^j ∪ X^k for all j and k. In order to create the broadest table possible, i.e. one that covers the broadest set of scenarios and parameters, it is necessary to run exhaustive experiments on all combinations of the sets of values of the parameters X^i chosen in each model. For each parameter xi there is a set of values it can assume; thus one must perform at least as many experiments as the product, over all parameters, of the numbers of values in order to cover all the scenarios of the chosen values.

The implementation of the analyzed models and the consequent creation of the joint probability table is an important step of the proposed methodology, because without the table it is impossible to create a Bayesian network of the model, which constitutes the core of the proposed methodology.
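The exhaustive enumeration of scenarios described above can be sketched as a Cartesian product over the chosen parameter values; the parameter names and grids here are hypothetical:

```python
from itertools import product

# Hypothetical parameter grids for one model.
params = {
    "Network": ["small-world", "scale-free"],
    "InfRate": [0.1, 0.3, 0.5],
    "HealRate": [0.2, 0.4],
}

names = list(params)
scenarios = [dict(zip(names, combo)) for combo in product(*params.values())]

# Every combination of values appears exactly once: 2 * 3 * 2 = 12.
print(len(scenarios))  # 12
```

Each scenario dictionary would then drive one simulation run whose measured dynamics become a row of the joint probability table.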

• Step 5: In order to create a merged Bayesian network that will serve as a basis to analyze each individual model through its own Bayesian network, we first need to create the Bayesian network of each analyzed model. Individual Bayesian networks will reveal how the parameters of the model influence the set of defined measures, i.e. which parameters are necessary to predict certain scenarios and results. In this work we use a widely known algorithm to create such Bayesian networks, the PC search algorithm [187]. The algorithm tests whether every pair of variables is conditionally independent given any third variable. These tests are performed for every value of each variable assessed, thus producing exhaustive and computationally heavy tests.
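As a rough illustration of the kind of independence test underlying the PC algorithm, the sketch below computes a marginal G-test statistic for two binary variables; the real algorithm also conditions on subsets of other variables and uses proper significance thresholds, so this is only a toy:

```python
import random, math

random.seed(7)

def g_statistic(pairs):
    """G-test statistic for independence of two binary variables."""
    n = len(pairs)
    counts = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for a, b in pairs:
        counts[(a, b)] += 1
    g = 0.0
    for (a, b), obs in counts.items():
        row = sum(counts[(a, y)] for y in (0, 1))
        col = sum(counts[(x, b)] for x in (0, 1))
        exp = row * col / n
        if obs > 0 and exp > 0:
            g += 2 * obs * math.log(obs / exp)
    return g

# Dependent pair: y tends to copy x.  Independent pair: unrelated coins.
dep = [(x, x if random.random() < 0.9 else 1 - x)
       for x in (random.randint(0, 1) for _ in range(500))]
ind = [(random.randint(0, 1), random.randint(0, 1)) for _ in range(500)]

print(g_statistic(dep) > g_statistic(ind))  # True: dependence inflates G
```

A large statistic relative to a chi-squared threshold is evidence against independence, which is what lets the algorithm keep or delete an edge.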

• Step 6: Due to the exponential combinatorial growth of the set of scenarios that will be simulated with the number of parameters and values chosen, different models can produce very different sets of scenarios. For example, a given model with three parameters, where each parameter can assume four values, will produce 4^3 = 64 scenarios. A second model with five parameters, where each parameter can assume six values, will produce 6^5 = 7,776 scenarios. Merging the results of both models will lead to the statistical dependencies present in the results of the second model overwhelming those of the first model, as there are many more scenarios in the second set.

In order to avoid such problems, one must down-sample the table with the higher number of scenarios. Down-sampling is a technique widely used in machine learning; it randomly chooses data points from the bigger class so that both classes have the same size [115]. In this case the number of samples to be selected is exactly the number of scenarios of the smaller table, so both tables will have the same number of scenarios.
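A minimal sketch of this balancing step; the table sizes are illustrative:

```python
import random

random.seed(3)

table_small = [{"id": i} for i in range(64)]      # e.g. 64 scenarios
table_large = [{"id": i} for i in range(7776)]    # e.g. 7,776 scenarios

# Randomly keep as many rows of the larger table as the smaller one has.
balanced = random.sample(table_large, len(table_small))

print(len(balanced) == len(table_small))  # True
```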

• Step 7: In order to be able to make assertions about the accuracy of the dependencies of a given model, we need the merged Bayesian network generated from the results of all analyzed models as a basis. Thus, it is necessary to create a single joint probability table by merging the individual ones and to run the PC search algorithm to uncover the Bayesian network of the merged scenarios.

• Step 8: In this step we analyze the structure of the merged Bayesian network generated in order to understand what it might reveal about the structure of the individual Bayesian networks of each model. Now that we have our

merged Bayesian network, and assuming that it is a more accurate representation of the phenomenon than each model individually under the Weak Ensemble assumption, we can use it as a basis for our comparison. Thus, since the merged BN is more accurate with respect to the real phenomenon than each individual model, the model whose BN structure is most similar to that of the merged BN is the most accurate. Hence, for each edge that is present in the BN of a given model and also present in the merged BN, the model scores one point. At the end, the model with the most points is considered more accurate than the others.
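This scoring can be sketched as a set intersection over edges; the variable names and edges below are invented for illustration, and edges are treated as undirected for simplicity:

```python
# Undirected edges of each network, as frozensets so direction is ignored.
def edges(*pairs):
    return {frozenset(p) for p in pairs}

merged_bn = edges(("X1", "Y1"), ("X2", "Y1"), ("X3", "Y2"))
bn_model1 = edges(("X1", "Y1"), ("X2", "Y1"))   # hypothetical model 1 BN
bn_model2 = edges(("X1", "Y1"), ("X2", "X3"))   # hypothetical model 2 BN

def score(model_bn, merged):
    """One point per edge shared with the merged Bayesian network."""
    return len(model_bn & merged)

bns = {"model1": bn_model1, "model2": bn_model2}
ranking = sorted(bns, key=lambda m: score(bns[m], merged_bn), reverse=True)
print(ranking)  # ['model1', 'model2']: model 1 shares more edges
```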

In the next chapter we perform comparative analyses of different disease spreading models using the proposed methodology in order to validate it and compare the analyzed models. We use SIS models and SIR models presented in the literature and transcribe their contributions and findings into our terminology so they can be properly analyzed.

Chapter 4

Model Analyses

In this chapter we perform two different comparative analyses, one for SIS models and the other for SIR models, using disease spreading models presented in the literature and applying the methodology proposed in Chapter 3. We also extend the SIS model analysis by introducing real-world data on the spreading of a SIS disease. Before applying the methodology, we present a brief introduction to the models used, how they work, and what contributions they make.

4.1 SIS Models Analysis

First, we present the two SIS models used in this analysis and their peculiarities. In order to better discuss them we are going to refer to them as:

• Pastor-Vespignani (PV) Model: Basic SIS model in which at each time step each susceptible node is infected with probability v if it has one or more infected neighbors, and infected nodes heal and become susceptible

again with probability δ. As reference we use the work of Pastor-Satorras and Vespignani [161]. They focus their experiments on the prevalence and persistence of infected individuals, specifically in small-world and scale-free networks. To do so, they use a SIS model with the rules presented in Eq. 2.4, in which all susceptible nodes have the same probability v of becoming infected if they are in contact with one or more infected individuals, and after being infected all have the same probability δ of healing and returning to the susceptible state.

• Kuperman-Abramson (KA) Model: In this model the probability of a susceptible node becoming infected is proportional to the number of neighbors that are infected, and the amount of time it takes before healing is determined by a system parameter. Another peculiarity of this model is the presence of a refractory state, in which nodes that were infected remain refractory during a specified period of time before returning to the susceptible state; in this state a node can neither infect nor become infected. We use the work of Kuperman and Abramson [117] as reference. The focus of their work is the impact of the variation of the rewiring probability p of the Watts-Strogatz model on the evolution of disease spread, ranging from low-amplitude oscillations at small p (endemic scenario) to wide-amplitude oscillations at large p, resembling the periodic epidemic pattern typical of large populations.
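A minimal sketch of the KA-style state cycle for a single node. The infection and refractory durations are placeholder values, and taking the infection probability as the infected fraction of the node's neighbors is an assumption made for this sketch:

```python
import random

random.seed(5)

TAU_I, TAU_R = 4, 9   # placeholder durations of infection and refractoriness

def step(state, clock, n_inf_neighbors, k):
    """Advance one node: state in {'S','I','R'}, clock counts steps in state."""
    if state == "S":
        # Infection probability proportional to the infected fraction of neighbors.
        if k > 0 and random.random() < n_inf_neighbors / k:
            return "I", 0
        return "S", 0
    if state == "I":
        return ("R", 0) if clock + 1 >= TAU_I else ("I", clock + 1)
    # Refractory: immune and non-infectious until TAU_R steps elapse.
    return ("S", 0) if clock + 1 >= TAU_R else ("R", clock + 1)
```

The deterministic I-to-R and R-to-S transitions are what distinguish this cycle from the purely probabilistic healing of the PV model.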

4.1.1 Analysis

In order to make a comparative analysis of the PV and KA models, we are going to apply the proposed methodology step by step.

Step 1

As reiterated in Chapter 2, many different disease spreading models have been proposed. In this work we are interested in a specific group of them that meet the following criteria:

• Cellular Automata: the model should divide the population into a set of finite, discrete, and mutually exclusive states, in which the transitions between states follow predetermined rules.

• Network-Based: individuals of the population analyzed must be represented as nodes in a network and the infection can only propagate through the edges connecting the nodes, representing either social interactions or sexual contacts.

• Local Description: the behavior of the infection should be described locally, as a function of a node's own state and the states of its neighbors.

• Discrete Time: the time involved in the simulation should be discrete, with a simulated horizon small enough that births and deaths in the population need not be accounted for.

As mentioned previously, Bayesian networks identify statistical dependencies among variables of a system thus allowing the use of queries to better understand its behavior. Therefore, it is necessary to identify each model’s possible variables that can potentially affect the results. Given the above criteria we identified the following properties that can be present in the models:

• Network: One of the parameters that has been reported to be relevant in disease spread dynamics is the topology of the network of social or sexual

contacts, i.e. its pattern of connectivity. In this work we chose the four most common topologies: small-world networks [203], scale-free networks [19], random networks [63], and grids.

• Number of Nodes: Commonly referred to as |N|, defines the size of the network in terms of individuals in the population.

• Density of Edges: Defines the number of edges in the network relative to the number of nodes; it can assume values between |N| (sparse) and |N|^2 (dense) and represents the contacts among the individuals through which the disease will propagate. The higher the number of edges of a node (its degree), the higher its chance of acquiring the disease.

• Input: Defines the proportion of nodes that will start the simulation infected; this group is often selected randomly. This parameter can assume values from zero to one.

• Infection Rate: Defines the rate at which susceptible individuals become infected. Depending on the model used it can vary from a fixed probability to one proportional to the number of infected neighbors.

• Healing Rate: Specifies the rate at which infected individuals leave that state. Depending on the model used it can assume different forms, from a fitness parameter for each individual to a specific number of time steps.
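Two of the four topologies, random networks and grids, can be generated with the standard library alone; the sizes and probabilities below are arbitrary (small-world and scale-free generators are omitted for brevity):

```python
import random
from itertools import combinations

random.seed(11)

def random_network(n, p):
    """Erdős–Rényi: each possible edge exists independently with probability p."""
    return {(i, j) for i, j in combinations(range(n), 2) if random.random() < p}

def grid_network(rows, cols):
    """4-neighbor lattice: nodes are (r, c); edges link horizontal/vertical neighbors."""
    edges = set()
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols:
                edges.add(((r, c), (r, c + 1)))
            if r + 1 < rows:
                edges.add(((r, c), (r + 1, c)))
    return edges

g = grid_network(4, 5)
print(len(g))  # 31 edges: 4*4 horizontal + 3*5 vertical
```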

The choice of such parameters must be made carefully because of the Causal Markov Assumption, which states that all possible variables that can influence the set of variables already chosen are within the set, i.e. there is no external variable that could influence a variable of the chosen set [188]. Thus, any variable

of the system that could potentially influence the results should be taken into consideration. Analyzing the experiments performed with the PV model, we identified that its set X^A is composed of: X^A = {Network, NumNodes, Input, InfRate, HealRate}. The authors of the PV model did not consider different densities of the network, i.e. they did not vary the intensity of the interactions between the individuals of the population. Perhaps that was not their focus, or they assumed that this variable did not influence their results. In the KA model, the authors used the following set of parameters X^B to perform their experiments: X^B = {Network, NumNodes, Density, InfRate, HealRate}. They did not consider different initial inputs of seeds of infected individuals. They assumed that this variable did not influence the results, because they were more interested in the long-term synchronization of the population, and the initial configuration has little importance with respect to long-term behavior.
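The two parameter sets and their union, together with the values the missing parameters implicitly assume (the fixed density of three neighbors in the PV experiments and the ten-percent initial input in the KA experiments, as reported in the papers), can be written down directly:

```python
X_A = {"Network", "NumNodes", "Input", "InfRate", "HealRate"}    # PV model
X_B = {"Network", "NumNodes", "Density", "InfRate", "HealRate"}  # KA model

X = X_A | X_B   # common set of variables for the merged table

# Values each model implicitly used for the parameters it marginalized,
# taken from the papers' experimental setups.
marginalized = {
    "Density": {"PV": 3},   # PV fixed the density at three neighbors
    "Input": {"KA": 0.1},   # KA fixed the initial infected fraction at 10%
}

missing_in_A = X - X_A
missing_in_B = X - X_B
print(sorted(X))
print(missing_in_A, missing_in_B)
```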

Step 2

In order to understand the dynamics in disease spreading simulations we need to define several measures that capture different perspectives of such dynamics and serve as a basis for understanding how the different parameters of the system can influence real-world scenarios. By carefully analyzing existing studies, we noticed that different measures are necessary depending on the type of model used. For example, in the analysis of the SIS model the focus is on its endemic characteristic and the influence of certain aspects of the system on that behavior, i.e. most analyses are related to the stability of a density of infected nodes. Conversely, in SIR models, most analyses focus on the epidemic aspect of the spreading, trying

to understand how fast a disease can spread or what proportion of the population can become infected before the disease dies out. Therefore, two fundamentally different analyses require different measures in order to capture different perspectives of the spreading phenomenon. In this analysis both models are SIS models, so we are going to use the following measures to capture the endemic nature of the dynamics: the average of infected nodes over time (average), the variance of infected nodes over time (variance), the peak of infected nodes during the simulation (peak), and the last density of infected nodes before the simulation ends (last). For example, consider a simulation with four time steps in which the density of infected nodes, denoted ρi, is .3 at time t0, .5 at time t1, .6 at time t2 and .4 at time t3. Then average = .45, variance = .0125, peak = .6 and last = .4.
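The four endemic measures, applied to the worked example above, can be sketched in a few lines; the function name is ours, not taken from the original implementations:

```python
def endemic_measures(densities):
    """Compute the four endemic measures from a time series of
    infected-node densities (one value per time step)."""
    n = len(densities)
    avg = sum(densities) / n
    variance = sum((d - avg) ** 2 for d in densities) / n  # population variance
    return {"average": avg,
            "variance": variance,
            "peak": max(densities),
            "last": densities[-1]}

# The worked example from the text: rho_i = .3, .5, .6, .4 over four steps.
m = endemic_measures([0.3, 0.5, 0.6, 0.4])
# average = .45, variance = .0125, peak = .6, last = .4
```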

Step 3

By analyzing each model’s set of parameters, we noticed that, from the union of both sets, the PV model does not consider the density of edges and the KA model does not consider the initial seed of infected nodes in their respective experiments. However, as with any network-based model, both models must have used some value for those missing parameters in order to run the simulations and produce their results. The PV model uses a density of three neighbors in scale-free and random networks in all experiments. The KA model assumes that ten percent of the individuals are initially infected, selected at random, in all experiments. The identification of the values that such parameters assume,

although they are not considered variables, is very important for merging tables with different parameters.

Step 4

In order to create individual Bayesian networks of each model we performed exhaustive simulations using different values of each variable. For each considered variable we used the values reported by the authors in their respective experiments. Each table has the same set of columns Y and X, where X = XA ∪ XB, filling the missing attributes with the values identified in the previous step. Thus, for each variable xi that has k values we performed a combination of xi^k experiments to cover the broadest possible set of scenarios in each model, thus creating a joint probability table for each model.
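The exhaustive sweep of Step 4 amounts to a Cartesian product over the values of each varying parameter, with the fixed values identified in Step 3 appended to every row. A sketch with placeholder parameter grids (not the authors' exact values):

```python
from itertools import product

def build_scenarios(varying, fixed):
    """Enumerate every combination of the varying parameter values, filling
    in the attributes a model did not vary with the fixed values identified
    in Step 3. Parameter names and grids here are illustrative only."""
    names = list(varying)
    rows = []
    for combo in product(*(varying[n] for n in names)):
        row = dict(zip(names, combo))
        row.update(fixed)  # e.g. the density the PV model used but never varied
        rows.append(row)
    return rows

# Illustrative grid for a PV-style sweep (values are placeholders):
scenarios = build_scenarios(
    varying={"Network": ["scale-free", "random"],
             "NumNodes": [1000, 5000],
             "Input": [0.05, 0.10],
             "InfRate": [0.1, 0.2, 0.3]},
    fixed={"Density": 3})
len(scenarios)  # 2 * 2 * 2 * 3 = 24 combinations
```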

Step 5

Once the joint probability table was created, we used the PC search algorithm [186] to create the Bayesian network of each analyzed model. We used the tool TETRAD [179] to construct the Bayesian networks. The resulting Bayesian network of the PV model can be seen in Figure 4.1 and that of the KA model in Figure 4.2.
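TETRAD's PC implementation performs conditional independence tests and edge orientation; as a rough illustration of the underlying idea only, the sketch below recovers an undirected skeleton from marginal chi-square tests on binary variables (a simplifying assumption made here for brevity; the full PC algorithm also conditions on subsets of variables):

```python
import math

def chi2_independent(xs, ys, alpha=0.01):
    """Marginal chi-square independence test for two binary variables.
    Returns True if independence is NOT rejected at level alpha (df = 1)."""
    n = len(xs)
    obs = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for a, b in zip(xs, ys):
        obs[(a, b)] += 1
    row = {a: obs[(a, 0)] + obs[(a, 1)] for a in (0, 1)}
    col = {b: obs[(0, b)] + obs[(1, b)] for b in (0, 1)}
    stat = sum((obs[(a, b)] - row[a] * col[b] / n) ** 2 / (row[a] * col[b] / n)
               for a in (0, 1) for b in (0, 1) if row[a] and col[b])
    p_value = math.erfc(math.sqrt(stat / 2))  # chi-square survival fn, df = 1
    return p_value > alpha

def skeleton(columns):
    """First phase of a PC-style search: connect every pair of variables
    that a marginal independence test fails to separate."""
    names = list(columns)
    return {(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if not chi2_independent(columns[a], columns[b])}
```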

Step 6

Due to the set of parameters and values chosen by the authors of each model, the tables have 7500 and 11250 data points for the PV and KA models, respectively. Each possible combination of parameters was simulated ten times. Using the down-sampling strategy we sampled, randomly and without repetition,


Figure 4.1: The Bayesian network of the PV model, generated by implementing the model used in [161].


Figure 4.2: The Bayesian network of the KA model, generated by implementing the model used in [117].

7500 samples of the KA model table, so both tables have the same number of data points. The Bayesian network generated from the sampled table can be seen in Figure 4.3. A reduction of 33% of the table led to a change of 12.5% in the dependencies (five edges): the ones colored blue were added and the ones colored red were removed. We are going to use this structure in our comparison.
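The down-sampling step is a uniform draw without replacement; a minimal sketch, with placeholder table contents:

```python
import random

def downsample(rows, target, seed=42):
    """Sample `target` rows uniformly at random and without repetition so
    that neither dataset overwhelms the other when the tables are merged."""
    if target > len(rows):
        raise ValueError("cannot sample more rows than the table contains")
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    return rng.sample(rows, target)

ka_table = [{"scenario": i} for i in range(11250)]  # stand-in for the KA table
balanced = downsample(ka_table, 7500)               # match the PV table size
```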

Step 7

Once both analyzed datasets were balanced, we merged them into a single dataset and used the PC search algorithm to identify the Bayesian network that corresponds to the graphical structure of the system’s dependencies of the merged scenarios. The resulting Bayesian network can be seen in Figure 4.4.

Step 8

Now that we have created the Bayesian network of each model, we can use the structure of the merged scenarios to analyze potentially accurate and spurious dependencies in each model. As mentioned in the previous chapter, by analyzing each possible outcome of the merged scenario between two variables we are able to better assess the likelihood that a given dependency in one model is either accurate or spurious. Figure 4.5 presents the structure of the Bayesian network of each model and of the merged scenarios. Each cell colored red indicates the presence of an edge between the two variables of its row and column. The number in each cell represents the corresponding scenario of the merged outcome. For example, in the KA model there is a dependency between the variable Network and the variable Avg; the PV model does not identify such a dependency, while the merged Bayesian network does. Thus, it


Figure 4.3: The Bayesian network generated from the sampled table of the results of the KA model. Edges colored red were removed and edges colored blue were added.


Figure 4.4: The resulting Bayesian network created using the merged dataset from the PV and KA models.

Model      PV Model (Avg, Variance, Peak, Last)   KA Model (Avg, Variance, Peak, Last)
Network    0 1 0 0   1 1 1 1
NumNodes   1 1 1 1   1 0 1 1
Density    0 0 0 0   0 1 1 1
Input      1 1 1 1   1 0 1 1
InfRate    1 1 1 0   0 1 1 1
HealRate   1 0 0 1   0 1 1 1
Avg        1 0 0 1   1 1 0 0
Variance   1 1 0 1   0 1 1 1
Peak       1 1 1 1   0 0 1 1
Last       1 1 0 1   1 1 0 1

Figure 4.5: Table representing the Bayesian structure of each model and the merged one. Each cell colored red indicates the presence of an edge between the two variables of its row and column. Cells colored yellow represent variables that were not considered in that model. The number in each cell represents the corresponding scenario of the merged outcome. constitutes an example of scenario number three in our analysis, which indicates that this dependency is more likely to be accurate and that the KA model is more accurate in identifying dependencies than the PV model. Figure 4.6 shows the structure of the Bayesian network of each model; the edges are colored to reflect accurate or spurious dependencies. Overall the KA model identified more accurate dependencies than the PV model; the proportion of accurately identified dependencies is .65 in the PV model and .725 in the KA model (Figure 4.7). Although the KA model identified more spurious dependencies (red edges), it also identified more accurate dependencies (green edges). The problem with the structure of the Bayesian network of the PV model is that it failed to identify many dependencies that were accurate, even though it introduced few that were not. For example, the variable Network affects all the measures yi of the


Figure 4.6: The Bayesian network of each model with the dependencies colored to reflect accurate (green) and spurious (red) dependencies. The Bayesian network on the left belongs to the PV model and the one on the right to the KA model. dynamics in the merged scenario, but the PV model was only able to identify one of these dependencies. This might have happened due to the restricted set of values that this variable assumed. The choice of the authors of the PV model not to consider different densities for the networks of disease spreading made their Bayesian network unable to identify that this variable does in fact affect the dynamics of the spreading. In contrast, the KA model was more accurate in neglecting the variable Input, which had little effect on the overall dynamics of the spread. We are only able to assess the impact of such modeling decisions on the dynamics because of the methodology of analysis proposed here.
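The accuracy proportions reported above (.65 for the PV model and .725 for the KA model) can be computed from the edge sets alone: a model's dependency counts as accurate when it also appears in the merged network. The sketch below uses illustrative, hypothetical edge sets, not the actual networks from the figures:

```python
def accuracy_proportion(model_edges, merged_edges):
    """Fraction of a model's identified dependencies that also appear in the
    merged Bayesian network (the 'accurate' proportion of Figure 4.7).
    Edges are compared as unordered pairs, i.e. as dependencies."""
    merged = {frozenset(e) for e in merged_edges}
    hits = sum(1 for e in model_edges if frozenset(e) in merged)
    return hits / len(model_edges)

# Illustrative edge sets (placeholders, not the real networks):
merged = [("Network", "Avg"), ("NumNodes", "Peak"), ("InfRate", "Avg")]
model = [("Network", "Avg"), ("HealRate", "Last")]
ratio = accuracy_proportion(model, merged)  # 1 of 2 edges confirmed -> 0.5
```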

4.1.2 CDC Data Analysis

The methodology proposed in this work aims to provide a comparison between two models based on their accuracy in describing a given phenomenon. In order to


Figure 4.7: The proportion of accurate and spurious edges identified in each model analyzed. make this analysis more robust we are going to introduce real-world data from a real SIS disease, create its Bayesian network and then compare the four network structures. Thus, we will have a way of validating the results found in the previous section. We are going to use data from the Centers for Disease Control and Prevention (CDC), an institution connected to the Department of Health and Human Services of the US government that monitors diseases [1]. Among the diseases monitored by the CDC we chose chlamydia [2]. Chlamydia has been under the surveillance of the CDC since 1984, so a large amount of data is available. It is a sexually transmitted disease caused by bacteria and the most common notifiable disease in the United States [192]. It is usually asymptomatic in women [189]. It is the most prevalent of all STDs throughout the world [160], and since 1994 has comprised the largest proportion of all STDs reported to the CDC. Among sexually active women

aged 16 to 24 years, chlamydia screening more than doubled, from 23.1% in 2001 to 47% in 2014 [129]. Chlamydia is an SIS disease. Ideally, we would also incorporate data about an SIR disease in our work. The problem is that the data available on SIR diseases show a behavior similar to that of SIS diseases (endemic). In other words, the real-world data available on SIR diseases incorporate many real-life dynamics that are not present in the models used in this work [117, 139, 146, 161]. Such dynamics include, for example, a high rate of turnover among the individuals of the population. Such turnover can happen when refractory individuals die and are replaced by new individuals that are susceptible to the disease, thus preventing the spreading from showing a peak followed by eradication. Similar dynamics occur due to immigration and emigration in the countries and regions where the data are collected; this behavior also contrasts with the closed-population simulations used in this work. Thus, although each individual can only acquire the disease once, the turnover of individuals in a given population makes the disease look endemic (a behavior common to SIS diseases). Another fact related to SIR diseases is that vaccines exist against most of them and/or they have been eradicated in most countries. Examples include polio, smallpox, hepatitis B, measles and mumps [4]; thus, their simulation must account for the decrease in spreading due to vaccination. The data provided by the CDC about chlamydia are organized in various ways, e.g. by region, state, county, age, gender or race. The data organized by state can be seen in Figure 4.8 and by county in the state of Florida in Figure 4.9. In this work, we are going to use the data organized by county because it is the dataset

with the highest degree of granularity, comprising smaller groups of individuals than the other datasets available. In order to produce a Bayesian network equivalent to the other networks studied here we must extract from this dataset the same parameters used in the analyzed models and the same metrics used to capture the dynamics of the simulations; some transformation of the data is therefore necessary. For example, to get the size of the population we need to transform the name of the county into its population in the corresponding year. Palm Beach county in Florida had 1,443,810 residents in 2016, 1,421,843 in 2015 and 1,397,526 in 2014 (according to the United States Census Bureau [5]), so whenever we use the data about Palm Beach county we assume these numbers to represent the actual population. For the purpose of this study we only considered the six Florida counties present in the dataset: Miami-Dade, Broward, Hillsborough, Orange, Duval and Palm Beach. Another example of a transformation used here is related to the variable Input, which accounts for the initial percentage of the population that is infected. In this case we had to adapt the information provided by the dataset, which is annual: we defined the value that this variable assumes in each year as the percentage of infected individuals in the previous year. For example, in 2015 the Input variable is the percentage of individuals that had the disease at the end of 2014. After all modifications were made, we used the modified dataset as input to generate its equivalent Bayesian network in TETRAD. The resulting network can be seen in Figure 4.10. As we can see, the variable that most influences the metrics used is the Number of Nodes. The variable Density, which accounts for the

Figure 4.8: The total rate of reported cases of Chlamydia for the United States and outlying areas (Guam, Puerto Rico, and Virgin Islands): 475.3 cases per 100,000 population. Figure taken from [68].

Figure 4.9: The total rate of reported cases of Chlamydia by county in the state of Florida. Figure taken from [68]. (Refer to the NCHHSTP Atlas for further county-level rate information: https://www.cdc.gov/nchhstp/atlas/.)

density of connections among individuals, has no edges, mostly because the dataset provided by the CDC does not contain any information about this variable, and we did not find any study showing the correlation between the size of a network and its density in real-world cities. Thus, this variable assumed the same value for all entries. In order to use the Bayesian network created with real-world data about chlamydia as a reference to compare the previously created Bayesian networks (Figure 4.1, Figure 4.3 and Figure 4.4), we analyzed the number of accurate and spurious edges in these networks. In this case, if an edge connecting the same pair of nodes is present in both networks we consider it accurate. If the edge is present in the Bayesian network from the PV, KA or merged model and not in the Bayesian network created from real data, we consider it spurious. If the edge is present in the Bayesian network created from real data and not in the Bayesian network from the PV, KA or merged model, we consider it missing. Figure 4.11 shows the proportion of accurate and spurious edges in each Bayesian network of this analysis. As we can see, the Bayesian network generated from the merged dataset of the results of the PV and KA models is the one with the highest proportion of accurate edges, i.e., its topology is more similar to the Bayesian network generated from real data than those generated from the individual results of each model. The Bayesian network generated from real data about chlamydia is the closest approximation that we have to the ideal Bayesian network that represents disease spreading in general. Thus, the results shown in Figure 4.11 corroborate our main assumption that a merged model is a more accurate representation of a given phenomenon than each model individually.
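The accurate/spurious/missing classification described above reduces to simple set operations on unordered edge sets. A minimal sketch with hypothetical edge lists, only to illustrate the three categories:

```python
def compare_to_reference(model_edges, reference_edges):
    """Classify dependencies by comparing a model's Bayesian network against
    the reference network built from real data: an edge in both networks is
    accurate, an edge only in the model is spurious, and an edge only in
    the reference is missing."""
    model = {frozenset(e) for e in model_edges}
    ref = {frozenset(e) for e in reference_edges}
    return {"accurate": len(model & ref),
            "spurious": len(model - ref),
            "missing": len(ref - model)}

# Hypothetical edge lists (edges are unordered, so ("A","B") matches ("B","A")):
counts = compare_to_reference([("A", "B"), ("B", "C")],
                              [("B", "A"), ("C", "D")])
# -> {'accurate': 1, 'spurious': 1, 'missing': 1}
```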


Figure 4.10: The Bayesian network created from real data about Chlamydia.

[Bar chart: proportions of correct vs wrong/missing edges; the merged network scores 72% correct and 28% wrong/missing, with the two individual models at 61%/39% and 67%/33%.]

Figure 4.11: The proportion of accurate and spurious edges identified in each Bayesian network created in this analysis: merged, PV and KA models. The comparison is based on the Bayesian network created from real-world data about Chlamydia.

4.2 SIR Models Analysis

We now present the analyzed SIR models and their peculiarities. In order to better discuss them we are going to refer to them as:

• Nekovee-Moreno (Nek) Model: In this model, once a randomly selected infected node contacts one of its neighbors, it has a probability v of infecting it if the neighbor is susceptible and a probability δ of becoming refractory if the neighbor is already infected or refractory. We are going to use as reference the work of Nekovee et al. [146]. Their results show the presence of a threshold as a function of the infection rate in networks with bounded degree fluctuations, such as random networks. This threshold is also present in finite-size scale-free networks, albeit at a much smaller value.

• Moore-Newman (Moore) Model: This SIR model implements a proportional infection rate in which an infected node has probability v of infecting all its neighbors who are susceptible and probability δ of healing, which does not depend on any contact. We are going to use as reference the work of Moore and Newman [139]. Their results show the presence of an epidemic threshold after which an epidemic can occur.
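As an illustration of the kind of discrete-time SIR process both models describe, the sketch below follows the Moore-style rules (an infected node infects each susceptible neighbor with probability v and heals with probability δ, independent of contact). This is our simplified reading, not the authors' implementation:

```python
import random

def simulate_sir(adj, v, delta, seed_frac=0.1, rng=None):
    """Minimal discrete-time SIR sketch in the spirit of the Moore model.
    `adj` maps each node to a list of its neighbors."""
    rng = rng or random.Random(0)  # fixed seed for reproducibility
    nodes = list(adj)
    state = {u: "S" for u in nodes}
    for u in rng.sample(nodes, max(1, int(seed_frac * len(nodes)))):
        state[u] = "I"              # initial seed of infected nodes
    history = []                    # density of infected nodes per step
    while any(s == "I" for s in state.values()):
        newly_infected, healed = [], []
        for u in nodes:
            if state[u] != "I":
                continue
            for w in adj[u]:        # proportional infection of neighbors
                if state[w] == "S" and rng.random() < v:
                    newly_infected.append(w)
            if rng.random() < delta:  # healing, independent of contact
                healed.append(u)
        for w in newly_infected:
            state[w] = "I"
        for u in healed:
            state[u] = "R"
        history.append(sum(s == "I" for s in state.values()) / len(nodes))
    return state, history
```

Because every infected node eventually becomes refractory, the run always terminates with no infected nodes, which is the epidemic (rather than endemic) behavior discussed above.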

4.2.1 Analysis

In order to make a comparative analysis of the Nek and Moore models we are going to use the proposed methodology, step by step.

Step 1

The analyzed models meet the same criteria as in the SIS model analysis: they are based on cellular automata, the infection propagates through a network of contacts, the model is defined locally, and time is discrete. Starting from the general set of variables, we examined each model’s experiments in order to identify the subset of variables used. In the Nek model, we identified the following set of variables: X1 = {Network, NumNodes, InfRate, HealRate}. The authors of the model did not perform any simulations in which they varied the density of the network or the initial seed of infected nodes. In the Moore model we identified the set X2 = {Network, Density, InfRate, HealRate}; the authors did not consider changes in the number of individuals in each network or in the initial input of infected individuals.

Step 2

As both models in this analysis are SIR models we are going to use the following measures to capture the epidemic nature of such dynamics: the peak of infected nodes over the time of the simulation (peak) and the final density of removed/refractory nodes after the simulation is done (totalR). The first measure, peak, accounts for how virulent the disease is; the higher this value the faster the disease spreads. The second measure, totalR, accounts for the proportion of the population that was infected during the time of the simulation, allowing us to analyze its epidemic behavior.
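The two epidemic measures can be computed directly from a run's history of infected densities and its final node states. A small sketch; the names are ours:

```python
def epidemic_measures(infected_series, final_state):
    """The two SIR measures used here: `peak`, the largest density of
    infected nodes observed during the run, and `totalR`, the final density
    of removed/refractory nodes once the simulation has finished."""
    n = len(final_state)
    return {"peak": max(infected_series),
            "totalR": sum(1 for s in final_state.values() if s == "R") / n}

# Example: infected densities over five steps, and a final node-state map.
measures = epidemic_measures([0.1, 0.3, 0.4, 0.2, 0.0],
                             {0: "R", 1: "R", 2: "S", 3: "R"})
# peak = 0.4, totalR = 0.75
```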

Step 3

Similar to the previous analysis, we identified the values that the missing variables assume in each model. For example, in the Nek model the authors did not change the value of the variable Density, so it is not formally considered a variable; nevertheless, some value must be used in order to perform the simulations. As mentioned previously, the identification of such values is important because when merging the datasets of the models we need to ensure that all variables considered by either model are present in the merged scenarios with the values reported by their authors. The Nek model uses a fixed density of six neighbors in all its simulations, and the Moore model uses a fixed number of 5×10^5 nodes in the network. The variable Input is irrelevant in this comparison: although it has been reported to affect the dynamics of disease spreading, it was disregarded by the authors of both models.

Step 4

Once we identified the variables that each model uses and the respective values that they assume, we performed exhaustive simulations in each scenario. Each table has the same set of columns Y and X, where X = X1 ∪ X2, filling the missing attributes with the values identified in the previous step. Thus, for each variable xi that has k values we performed a combination of xi^k experiments to cover the broadest possible set of scenarios in each model, thus creating a joint probability table for each model.

Step 5

We then used the joint probability table created in the previous step to generate a Bayesian network of each model. We used the PC search algorithm implemented in the tool TETRAD. The resulting Bayesian network of the Nek model can be seen in Figure 4.12 and the resulting Bayesian network of the Moore model can be seen in Figure 4.13.

Step 6

To avoid the problem of one dataset overwhelming the other, we need to balance the number of scenarios in each dataset. In this analysis both models have two missing variables; each scenario was performed exactly ten times, so each model has between 10^3 and 10^4 scenarios.

Step 7

After balancing each dataset properly, we merged them into a single one that will be used to create a merged Bayesian network that is the core of our analysis. The merged joint dataset was used as input in the PC search algorithm to uncover the Bayesian network of the merged scenarios. The resulting structure can be seen in Figure 4.14.

Step 8

Using the possible outcomes from the merging scenarios explained in the previous chapter, we now use the three Bayesian networks to rank the analyzed models according to their ability to accurately identify dependencies. As already discussed,


Figure 4.12: The Bayesian network generated by implementing the model used in [146].


Figure 4.13: The Bayesian network generated by implementing the model used in [139].


Figure 4.14: The Bayesian network created by merging the PGMs of the Nek and Moore models. not all scenarios provide a way to rank the models comparatively (scenarios one, two and five); scenarios three and four are more interesting because they display different behavior between the analyzed models and thus allow us to rank them. The table presented in Figure 4.15 shows how the structures of the Bayesian networks of each model and of the merged scenarios interact to produce different outcomes. Figure 4.15 shows that in the Nek model there is an edge connecting the variable Network and the variable Peak. The same edge was not identified in the Bayesian network of the Moore model, but it was identified again in the merged Bayesian network, thus characterizing an example of scenario number three in our analysis. By analyzing each possible dependency among the system’s variables, we are able to better assess the likelihood that a given dependency in each model

Model      Nek Model (Peak, TotalR)   Moore Model (Peak, TotalR)
Network    1 1   0 0
NumNodes   0 0   0 0
Density    0 0   1 0
InfRate    1 0   1 0
HealRate   0 0   0 1
Peak       1 1   1 1
TotalR     1 1   1 1

Figure 4.15: Table representing the Bayesian structure of each model and the merged one. Each cell colored red indicates the presence of an edge between the two variables of its row and column. Cells colored yellow represent variables that were not considered in that model. The number in each cell represents the corresponding scenario of the merging outcome. is either accurate or spurious. Figure 4.16 shows the structure of the Bayesian network of each analyzed model; the edges are colored to distinguish accurate from spurious dependencies. As can be seen in Figure 4.17, both models had the same proportion of accurate and spurious dependencies. The Bayesian networks in this analysis not only contain fewer nodes, because there are fewer epidemic metrics and fewer variables were considered by the authors, but are also sparser than those of our previous analysis. Each model was only able to identify five dependencies. Although the Nek model considered the number of nodes as a variable that could influence the dynamics, its results did not show any direct effect of this variable on the defined metrics. We see that the behavior of the infection rate and healing rate is similar in both models. The same cannot be said about the variables related to the structure of the network: its topology, density and number of nodes. This might indicate that the topology plays a more important role in SIR models than in


Figure 4.16: The Bayesian network of each model with the dependencies colored to reflect accurate (green) and spurious (red) dependencies. The Bayesian network on the left belongs to the Nek model and the one on the right to the Moore model.

SIS models. The choice of both sets of authors not to consider the initial seed of infected individuals as a variable of the system decreased the possibility of new edges being identified, and the dynamics of the spreading were restricted to the remaining variables. The choice of the authors of the Moore model not to consider the number of nodes of the network as a variable did not affect their results because, although the Nek model did consider it, it did not affect the dynamics. The merged Bayesian network has many more dependencies; this is evidence that each model alone had insufficient data and that, when combined, they reveal a system in which the variables have a greater influence on the metrics of the dynamics.


Figure 4.17: The proportion of accurate and spurious edges identified in each model analyzed.

Chapter 5

Conclusions

The spreading of epidemic diseases is one of the major problems of the modern world. With the ubiquity of human transportation and the continuing growth of global commerce, no country or individual is safe. Many scientific efforts attempt to model the dynamic behavior of such spreading; consequently, many models that describe both endemic and epidemic aspects of real-world diseases have been proposed. However, there is a lack of literature on methods to compare different disease spreading models and assess their scope and limitations. In this dissertation, we have described and analyzed our main contribution to the study of disease spreading models: a methodology to compare different models by merging their results and using Bayesian networks. This methodology provides a new perspective from which to study the accuracy of disease spreading models. It creates a Bayesian network for each analyzed model, merges their results and creates a new Bayesian network from the merged scenarios. Using these three structures we are able to assess the likelihood of the true nature of the dependencies

identified, i.e. whether they also belong to the phenomenon the models attempt to describe or are only present in that specific model. Thus, we have a way to evaluate how the modeling decisions behind each model affect the spreading dynamics it produces. The methodology was applied to two different analyses, of SIS models and of SIR models. In the first analysis we saw that some modeling decisions of the authors contributed considerably to the varied dynamics of the spreading and to the restrictions of such models in identifying certain dependencies. In the second analysis the generated Bayesian networks were smaller and sparser, so fewer dependencies were identified and both models had the same accuracy in identifying dependencies in the system. Future directions for this work include analyzing more than a pair of models; comparative analyses of three or more models would allow for richer kinds of comparison. It would also be interesting to compare SIS models with SIR models, which produce different dynamics, in order to evaluate whether they have any scenario or result in common. Due to its generic nature, the proposed methodology can also be used for phenomena other than disease spreading that are modeled with complex networks, such as urban crime.

Bibliography

[1] Centers for Disease Control and Prevention. https://www.cdc.gov/about/organization/cio.htm. Accessed: 2018-01-01.

[2] Chlamydia 2015 STD statistics. https://www.cdc.gov/std/stats15/chlamydia.htm. Accessed: 2018-01-01.

[3] Emerging and re-emerging infectious diseases. Bethesda, MD, USA: National Institutes of Health. http://www3.niaid.nih.gov. Accessed: 2015-12-17.

[4] Immunization, vaccines and biologicals. http://www.who.int/immunization/diseases/en/. Accessed: 2018-03-10.

[5] United States Census Bureau - county population totals and components of change: 2010-2016. https://www.census.gov/data/tables/2016/demo/popest/counties-total.html. Accessed: 2018-01-01.

[6] Bao-Quan Ai, Xian-Ju Wang, Guo-Tao Liu, and Liang-Gang Liu. Correlated noise in a logistic growth model. Phys. Rev. E, 67:022903, Feb 2003.

[7] Réka Albert and Albert-László Barabási. Statistical mechanics of complex networks. Rev. Mod. Phys., 74:47–97, Jan 2002.

[8] Réka Albert, Hawoong Jeong, and Albert-László Barabási. Internet: Diameter of the World-Wide Web. Nature, 401(6749):130–131, Sep 1999.

[9] Albert-László Barabási. Linked: The New Science of Networks. Perseus, 2002.

[10] Josep Roure Alcobé. Incremental methods for Bayesian network structure learning. Artificial Intelligence Communications, 18(1):61–62, 2005.

[11] Uri Alon. An introduction to systems biology: design principles of biological circuits, volume 10. CRC press, 2007.

[12] L. A. N. Amaral, A. Scala, M. Barthelemy, and H. E. Stanley. Classes of small-world networks. Proceedings of the National Academy of Sciences, 97(21):11149–11152, 2000.

[13] Roy M Anderson, Robert M May, and B Anderson. Infectious diseases of humans: dynamics and control, volume 28. Wiley Online Library, 1992.

[14] Steven B Andrews and David Knoke. Networks in and around organizations. Jai Press, 1999.

[15] Anonymous. Six degrees from hollywood. Newsweek, 11, 1999.

[16] Reuven Aviv, Zippy Erlich, Gilad Ravid, and Aviva Geva. Network analysis of knowledge construction in asynchronous learning networks. Journal of Asynchronous Learning Networks, 7(3):1–23, 2003.

[17] Norman TJ Bailey et al. The mathematical theory of infectious diseases and its applications. Charles Griffin & Company Ltd, 5a Crendon Street, High Wycombe, Bucks, 1975.

[18] Paolo Bajardi, Chiara Poletto, Jose J. Ramasco, Michele Tizzoni, Vittoria Colizza, and Alessandro Vespignani. Human mobility networks, travel restrictions, and the global spread of 2009 h1n1 pandemic. PLoS ONE, 6(1):e16591, Jan 2011.

[19] Albert-Laszlo Barabasi and Reka Albert. Emergence of scaling in random networks. Science, 286(5439):509–512, 1999.

[20] G. A. Barnard. New methods of quality control. Journal of the Royal Statistical Society. Series A (General), 126(2):255–258, 1963.

[21] Alex Bavelas. A mathematical model for group structures. Human Organization, 7(3):16–30, 1948.

[22] Mark A. Beaumont, Wenyang Zhang, and David J. Balding. Approximate bayesian computation in population genetics. Genetics, 162(4):2025–2035, 2002.

[23] L Bernardinelli, D Clayton, C Pascutto, C Montomoli, M Ghislandi, and M Songini. Bayesian analysis of space-time variation in disease risk. Statistics in medicine, 14(21-22):2433–2443, 1995.

[24] Thomas O Binford, Tod S Levitt, and Wallace B Mann. Bayesian inference in model-based machine vision. arXiv preprint arXiv:1304.2720, 2013.

[25] J. Blind and S. Das. Disease outbreak detection and tracking for biosurveillance: a data fusion approach. In Information Fusion, 2007 10th International Conference on, pages 1–7, July 2007.

[26] Michael G. B. Blum and Viet Chi Tran. HIV with contact tracing: a case study in approximate bayesian computation. Biostatistics, 11(4):644–660, 2010.

[27] S. Boccaletti, V. Latora, Y. Moreno, M. Chavez, and D.-U. Hwang. Complex networks: Structure and dynamics. Physics Reports, 424(4-5):175 – 308, 2006.

[28] Marian Boguna and Romualdo Pastor-Satorras. Epidemic spreading in correlated complex networks. Phys. Rev. E, 66:047104, Oct 2002.

[29] D Bohning, E Dietz, and P Schlattmann. Space-time mixture modelling of public health data. Statistics in medicine, 19(17-18):2333–2344, 2000.

[30] Y. Bonduelle. Aggregating Expert Opinions by Reconciling Sources of Disagreement. PhD thesis, Stanford University, 1987.

[31] Stefan Bornholdt and Heinz Georg Schuster. Handbook of graphs and networks: from the genome to the internet. John Wiley & Sons, 2006.

[32] J. Brahmbhatt and R. Menezes. On the relation between tourism and trade: A network experiment. In Network Science Workshop (NSW), 2013 IEEE 2nd, pages 74–81, April 2013.

[33] Andrei Broder, Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar Rajagopalan, Raymie Stata, Andrew Tomkins, and Janet Wiener. Graph structure in the web. Computer Networks, 33(1-6):309 – 320, 2000.

[34] David Brown, Peter Rothery, et al. Models in biology: mathematics, statistics and computing. John Wiley & Sons Ltd., 1993.

[35] Wray Buntine. Theory refinement on bayesian networks. In Proceedings of the Seventh Conference on Uncertainty in Artificial Intelligence, UAI’91, pages 52–60, San Francisco, CA, USA, 1991. Morgan Kaufmann Publishers Inc.

[36] Ana Karoline Araujo Castro, Placido Rogerio Pinheiro, and Mirian Caliope Dantas Pinheiro. Rough Sets and Knowledge Technology: 4th International Conference, RSKT 2009, Gold Coast, Australia, July 14- 16, 2009. Proceedings, pages 216–223. Springer Berlin Heidelberg, Berlin, Heidelberg, 2009.

[37] Eugene Charniak. Bayesian networks without tears. AI magazine, 12(4):50, 1991.

[38] Eugene Charniak and Robert P Goldman. Plan recognition in stories and in life. arXiv preprint arXiv:1304.1497, 2013.

[39] Kevin J Cherkauer. Human expert-level performance on a scientific image analysis task by a system using combined artificial neural networks. In Working notes of the AAAI workshop on integrating multiple learned models, pages 15–21. Citeseer, 1996.

[40] CM Chukwuani. Socio-economic implication of multi-drug resistant malaria in the community; how prepared is nigeria for this emerging problem? West African journal of medicine, 18(4):303–306, 1999.

[41] Aaron Clauset, Cosma Rohilla Shalizi, and M. E. J. Newman. Power-law distributions in empirical data. SIAM Review, 51(4):661–703, 2009.

[42] Andrew Cliff and Peter Haggett. Island epidemics. Scientific American, 250:138–147, 1984.

[43] Joel Cohen, Frederic Briand, and Charles Newman. Community food webs: data and theory, volume 20. Springer Science and Business Media, 2012.

[44] Gregory F. Cooper and Edward Herskovits. A bayesian method for the induction of probabilistic networks from data. Machine Learning, 9(4):309–347, 1992.

[45] T. Cover and P. Hart. Nearest neighbor pattern classification. Information Theory, IEEE Transactions on, 13(1):21–27, January 1967.

[46] Pádraig Cunningham, John Carney, and Saji Jacob. Stability problems with artificial neural networks and the ensemble solution. Artificial Intelligence in Medicine, 20(3):217–225, 2000.

[47] D.-I. Curiac, G. Vasile, O. Banias, C. Volosencu, and A. Albu. Bayesian network model for diagnosis of psychiatric diseases. In Information Technology Interfaces, 2009. ITI ’09. Proceedings of the ITI 2009 31st International Conference on, pages 61–66, June 2009.

[48] Paulo CG da Costa and Kathryn B Laskey. Multi-entity bayesian networks without multi-tears. Draft Version, 3(06), 2005.

[49] Scott C. Deerwester, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard A. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391–407, 1990.

[50] Morris H DeGroot and Mark J Schervish. Probability and statistics. 2002.

[51] Zoltan Dezso and Albert-Laszlo Barabasi. Halting viruses in scale-free networks. Phys. Rev. E, 65:055103, May 2002.

[52] C.C. Dias, F. Magro, and P. Pereira Rodrigues. Preliminary study for a bayesian network prognostic model for crohn’s disease. In Computer-Based Medical Systems (CBMS), 2015 IEEE 28th International Symposium on, pages 141–144, June 2015.

[53] Odo Diekmann and JAP Heesterbeek. Mathematical epidemiology of infectious diseases, volume 14. Wiley Chichester, 2000.

[54] Edsger W Dijkstra. A note on two problems in connexion with graphs. Numerische mathematik, 1(1):269–271, 1959.

[55] Peter Sheridan Dodds and Duncan J. Watts. Universal behavior in a generalized model of contagion. Phys. Rev. Lett., 92:218701, May 2004.

[56] S. N. Dorogovtsev and J. F. F. Mendes. Evolution of Networks. Oxford University Press, 2003.

[57] S. N. Dorogovtsev and J. F. F. Mendes. Evolution of networks. Advances in Physics, 51(4):1079–1187, 2002.

[58] Harris Drucker, Robert Schapire, and Patrice Simard. Improving performance in neural networks using a boosting algorithm. Advances in neural information processing systems, pages 42–42, 1993.

[59] Marek J. Druzdzel and Francisco J. Diez. Combining knowledge from different sources in causal probabilistic models. J. Mach. Learn. Res., 4:295–316, December 2003.

[60] M Eandi and GP Zara. Economic impact of resistance in the community. International journal of clinical practice. Supplement, 95:27–38, June 1998.

[61] Frederick Eberhardt, Clark Glymour, and Richard Scheines. N-1 experiments suffice to determine the causal relations among n variables. In Dawn E. Holmes and Lakhmi C. Jain, editors, Innovations in Machine Learning, volume 194 of Studies in Fuzziness and Soft Computing, pages 97–112. Springer Berlin Heidelberg, 2006.

[62] P. Erdős and A. Rényi. On random graphs. Publicationes Mathematicae Debrecen, 6:290–297, 1959.

[63] Paul Erdős and Alfréd Rényi. On the evolution of random graphs. Magyar Tud. Akad. Mat. Kutató Int. Közl, 5:17–61, 1960.

[64] Z.S. Estabragh, M.M.R. Kashani, F.J. Moghaddam, S. Sari, and K.S. Oskooyee. Bayesian network model for diagnosis of social anxiety disorder. In Bioinformatics and Biomedicine Workshops (BIBMW), 2011 IEEE International Conference on, pages 639–640, Nov 2011.

[65] Michalis Faloutsos, Petros Faloutsos, and Christos Faloutsos. On power-law relationships of the internet topology. SIGCOMM Comput. Commun. Rev., 29(4):251–262, August 1999.

[66] Craig Fass, Mike Ginelli, and Brian Turtle. Six Degrees of Kevin Bacon. Plume, 1996.

[67] Anthony S Fauci, Nancy A Touchette, and Gregory K Folkers. Emerging Infectious Diseases: a 10-Year Perspective from the National Institute of Allergy and Infectious Diseases. Emerging Infectious Diseases, 11(4):519–525, Apr 2005.

[68] Centers for Disease Control and Prevention. STD surveillance 2015. Atlanta: CDC, 2015.

[69] Santo Fortunato. Community detection in graphs. Physics Reports, 486(3-5):75 – 174, 2010.

[70] Ove Frank. Transitivity in stochastic graphs and digraphs. The Journal of Mathematical Sociology, 7(2):199–213, 1980.

[71] Frank Ball, Denis Mollison, and Gianpaolo Scalia-Tomba. Epidemics with two levels of mixing. The Annals of Applied Probability, 7(1):46–89, 1997.

[72] Christophe Fraser, Christl A. Donnelly, Simon Cauchemez, William P. Hanage, Maria D. Van Kerkhove, T. Déirdre Hollingsworth, Jamie Griffin, Rebecca F. Baggaley, Helen E. Jenkins, Emily J. Lyons, Thibaut Jombart, Wes R. Hinsley, Nicholas C. Grassly, Francois Balloux, Azra C. Ghani, Neil M. Ferguson, Andrew Rambaut, Oliver G. Pybus, Hugo Lopez-Gatell, Celia M. Alpuche-Aranda, Ietza Bojorquez Chapela, Ethel Palacios Zavala, Dulce Ma. Espejo Guevara, Francesco Checchi, Erika Garcia, Stephane Hugonnet, Cathy Roth, and The WHO Rapid Pandemic Assessment Collaboration. Pandemic potential of a strain of influenza a (h1n1): Early findings. Science, 324(5934):1557–1561, 2009.

[73] Linton C Freeman. Social networks and the structure experiment. Research methods in social network analysis, pages 11–40, 1989.

[74] Yoav Freund and Robert E Schapire. Experiments with a new boosting algorithm. In ICML, volume 96, pages 148–156, 1996.

[75] Nir Friedman and Moises Goldszmidt. Sequential update of bayesian network structure. In Proceedings of the Thirteenth Conference on Uncertainty in Artificial Intelligence, UAI’97, pages 165–174, San Francisco, CA, USA, 1997. Morgan Kaufmann Publishers Inc.

[76] Nir Friedman and Moises Goldszmidt. Learning in Graphical Models, chapter Learning Bayesian Networks with Local Structure, pages 421–459. Springer Netherlands, Dordrecht, 1998.

[77] Dani Gamerman and Hedibert F Lopes. Markov chain Monte Carlo: stochastic simulation for Bayesian inference. CRC Press, 2006.

[78] A.J. Ganesh, A.-M. Kermarrec, and L. Massoulie. Peer-to-peer membership management for gossip-based protocols. Computers, IEEE Transactions on, 52(2):139–149, Feb 2003.

[79] M. Girvan and M. E. J. Newman. Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12):7821–7826, 2002.

[80] K.-I. Goh, B. Kahng, and D. Kim. Fluctuation-driven dynamics of the internet topology. Phys. Rev. Lett., 88:108701, Feb 2002.

[81] J. B. Gómez, Y. Moreno, and A. F. Pacheco. Probabilistic approach to time-dependent load-transfer models of fracture. Phys. Rev. E, 58:1528–1532, Aug 1998.

[82] C. W. J. Granger. Invited review: Combining forecasts – twenty years later. Journal of Forecasting, 8(3):167–173, 1989.

[83] P. Grassberger. On the critical behavior of the general epidemic process and dynamical percolation. Mathematical Biosciences, 63(2):157 – 172, 1983.

[84] David M Grether. Bayes rule as a descriptive model: The representativeness heuristic. The Quarterly Journal of Economics, pages 537–557, 1980.

[85] John Guare. Six degrees of separation. Dramatists Play Service, Inc., 1992.

[86] S. Gutta and H. Wechsler. Face recognition using hybrid classifier systems. In Neural Networks, 1996., IEEE International Conference on, volume 2, pages 1017–1022 vol.2, Jun 1996.

[87] Lars Kai Hansen, Christian Liisberg, and Peter Salamon. Ensemble methods for handwritten digit recognition. In Neural Networks for Signal Processing [1992] II., Proceedings of the 1992 IEEE-SP Workshop, pages 333–342. IEEE, 1992.

[88] Lars Kai Hansen and Peter Salamon. Neural network ensembles. IEEE Transactions on Pattern Analysis & Machine Intelligence, (10):993–1001, 1990.

[89] Sherif Hashem. Optimal linear combinations of neural networks. Neural Networks, 10(4):599 – 614, 1997.

[90] David Heckerman. Probabilistic similarity networks. Networks, 20(5):607–636, 1990.

[91] David Heckerman, Dan Geiger, and David M. Chickering. Learning bayesian networks: The combination of knowledge and statistical data. Machine Learning, 20(3):197–243, 1995.

[92] JAP Heesterbeek. Mathematical epidemiology of infectious diseases: model building, analysis and interpretation, volume 5. John Wiley & Sons, 2000.

[93] Donald A. Henderson. The looming threat of bioterrorism. Science, 283(5406):1279–1282, 1999.

[94] Herbert W. Hethcote. An immunization model for a heterogeneous population. Theoretical Population Biology, 14(3):338 – 349, 1978.

[95] Herbert W. Hethcote. The mathematics of infectious diseases. SIAM Review, 42(4):599–653, 2000.

[96] Herbert W Hethcote and James A Yorke. Gonorrhea transmission dynamics and control, volume 56. Springer Berlin, 1984.

[97] Fu Jie Huang, Zhihua Zhou, Hong-Jiang Zhang, and Tsuhan Chen. Pose invariant face recognition. In Automatic Face and Gesture Recognition, 2000. Proceedings. Fourth IEEE International Conference on, pages 245–250, 2000.

[98] Herminia Ibarra. Homophily and differential returns: Sex differences in network structure and access in an advertising firm. Administrative Science Quarterly, 37(3):422–447, 1992.

[99] J.E. Introne, I. Levit, S. Harrison, and S. Das. A data fusion approach to biosurveillance. In Information Fusion, 2005 8th International Conference on, volume 2, 8 pp., July 2005.

[100] M.A. Javarone and G. Armano. A fitness model for epidemic dynamics in complex networks. In Signal Image Technology and Internet Based Systems (SITIS), 2012 Eighth International Conference on, pages 793–797, Nov 2012.

[101] Finn V Jensen. An introduction to Bayesian networks, volume 210. UCL press London, 1996.

[102] H Jeong, B Tombor, R Albert, Z N Oltvai, and A.-L. Barabasi. The large-scale organization of metabolic networks. Nature, 407(6804):651–654, Oct 2000.

[103] Chang-an Jiang, Tze-Yun Leong, and Kim-Leng Poh. PGMC: a framework for probabilistic graphical model combination. In AMIA Annual Symposium Proceedings, volume 2005, page 370. American Medical Informatics Association, 2005.

[104] Paul Joyce and Paul Marjoram. Approximately sufficient statistics and bayesian computation. Statistical applications in genetics and molecular biology, 7(1), 2008.

[105] Eugene Santos Jr., John T. Wilkinson, and Eunice E. Santos. Fusing multiple bayesian knowledge sources. International Journal of Approximate Reasoning, 52(7):935–947, 2011. Selected Papers - Uncertain Reasoning Track - FLAIRS 2009.

[106] M. Karsai, M. Kivelä, R. K. Pan, K. Kaski, J. Kertész, A.-L. Barabási, and J. Saramäki. Small but slow world: How network topology and burstiness slow down spreading. Phys. Rev. E, 83:025102, Feb 2011.

[107] S. Kasereka, N. Kasoro, and A. P. Chokki. A hybrid model for modeling the spread of epidemics: Theory and simulation. In ISKO-Maghreb: Concepts and Tools for knowledge Management (ISKO-Maghreb), 2014 4th International Symposium, pages 1–7, Nov 2014.

[108] Matt J Keeling and Ken T.D Eames. Networks and epidemic models. Journal of The Royal Society Interface, 2(4):295–307, 2005.

[109] M.J Keeling and J.V Ross. On methods for studying stochastic disease dynamics. Journal of The Royal Society Interface, 5(19):171–181, 2008.

[110] D. Kirby and P. Sahre. Six degrees of Monica. New York Times, 11, 1998.

[111] L Knorr-Held and J Besag. Modelling risk from a disease in time and space. Statistics in medicine, 17(18):2045–2060, September 1998.

[112] Christof Koch and Gilles Laurent. Complexity and the nervous system. Science, 284(5411):96–98, 1999.

[113] Daphne Koller and Nir Friedman. Probabilistic graphical models: principles and techniques. MIT press, 2009.

[114] Anders Krogh and Peter Sollich. Learning with ensembles: How over-fitting can be useful. In Proceedings of the 1995 Conference, volume 8, page 190, 1996.

[115] Max Kuhn and Kjell Johnson. Applied predictive modeling. Springer, 2013.

[116] D.K. Kular, R. Menezes, and E. Ribeiro. Using network analysis to understand the relation between cuisine and culture. In Network Science Workshop (NSW), 2011 IEEE, pages 38–45, June 2011.

[117] Marcelo Kuperman and Guillermo Abramson. Small world effect in an epidemiological model. Phys. Rev. Lett., 86:2909–2912, Mar 2001.

[118] Wai Lam and Fahiem Bacchus. Using new data to refine a bayesian network. In Proceedings of the Tenth International Conference on Uncertainty in Artificial Intelligence, UAI’94, pages 383–390, San Francisco, CA, USA, 1994. Morgan Kaufmann Publishers Inc.

[119] Andrew B Lawson and Allan Clark. Spatial mixture relative risk models applied to disease mapping. Statistics in medicine, 21(3):359–370, February 2002.

[120] Joshua Lederberg, Margaret A Hamburg, Mark S Smolinski, et al. Microbial Threats to Health: Emergence, Detection, and Response. National Academies Press, 2003.

[121] Fredrik Liljeros, Christofer R. Edling, and Luis A.Nunes Amaral. Sexual networks: implications for the transmission of sexually transmitted infections. Microbes and Infection, 5(2):189 – 196, 2003.

[122] Zonghua Liu, Ying-Cheng Lai, and Nong Ye. Propagation and immunization of infection on general networks with both homogeneous and heterogeneous components. Phys. Rev. E, 67:031911, Mar 2003.

[123] Alun L. Lloyd and Robert M. May. How viruses spread among computers and people. Science, 292(5520):1316–1317, 2001.

[124] M. Lopez, J. Ramirez, J.M. Gorriz, D. Salas-Gonzalez, I. Alvarez, F. Segovia, and R. Chaves. Multivariate approaches for alzheimer’s disease diagnosis using bayesian classifiers. In Nuclear Science Symposium Conference Record (NSS/MIC), 2009 IEEE, pages 3190–3193, Oct 2009.

[125] Wanbiao Ma, Mei Song, and Y. Takeuchi. Global stability of an SIR epidemic model with time delay. Applied Mathematics Letters, 17(10):1141–1145, 2004.

[126] Yoshiharu Maeno. Discovering network behind infectious disease outbreak. Physica A: Statistical Mechanics and its Applications, 389(21):4755 – 4768, 2010.

[127] Y. Maki and H. Hirose. Infectious disease spread analysis using stochastic differential equations for sir model. In Intelligent Systems Modelling Simulation (ISMS), 2013 4th International Conference on, pages 152–156, Jan 2013.

[128] Jianchang Mao. A case study on bagging, boosting and basic ensembles of neural networks for ocr. In Neural Networks Proceedings, 1998. IEEE World Congress on Computational Intelligence. The 1998 IEEE International Joint Conference on, volume 3, pages 1828–1833 vol.3, May 1998.

[129] R Mardon, S Shih, R Mierzejewski, S Halim, P Gwet, and JE Bost. National Committee for Quality Assurance: The State of Health Care Quality, 2015.

[130] Peter V. Marsden. Homogeneity in confiding relations. Social Networks, 10(1):57 – 76, 1988.

[131] A.H. Marshall, L.A. Hill, and F. Kee. Continuous dynamic bayesian networks for predicting survival of ischaemic heart disease patients. In Computer-Based Medical Systems (CBMS), 2010 IEEE 23rd International Symposium on, pages 178–183, Oct 2010.

[132] Maia Martcheva. Introduction to Mathematical Epidemiology, volume 61. Springer, 2015.

[133] Izhar Matzkevich and Bruce Abramson. The topological fusion of bayes nets. In Proceedings of the Eighth International Conference on Uncertainty in Artificial Intelligence, UAI’92, pages 191–198, San Francisco, CA, USA, 1992. Morgan Kaufmann Publishers Inc.

[134] RM May and RM Anderson. Transmission dynamics of hiv infection. Nature, 326(6109):137–142, 1987.

[135] Robert M. May, Sunetra Gupta, and Angela R. McLean. Infectious disease dynamics: what characterizes a successful invader? Philosophical Transactions of the Royal Society of London B: Biological Sciences, 356(1410):901–910, 2001.

[136] Ronny Meir. Bias, variance and the combination of least squares estimators. Advances in neural information processing systems, pages 295–302, 1995.

[137] Stanley Milgram. The small world problem. Psychology today, 2(1):60–67, 1967.

[138] Miller McPherson, Lynn Smith-Lovin, and James M. Cook. Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27:415–444, 2001.

[139] Cristopher Moore and M. E. J. Newman. Epidemics and percolation in small-world networks. Phys. Rev. E, 61:5678–5682, May 2000.

[140] Y. Moreno, R. Pastor-Satorras, and A. Vespignani. Epidemic outbreaks in complex heterogeneous networks. The European Physical Journal B - Condensed Matter and Complex Systems, 26(4):521–529, 2002.

[141] Yamir Moreno, Javier B. Gómez, and Amalio F. Pacheco. Epidemic incidence in correlated complex networks. Phys. Rev. E, 68:035103, Sep 2003.

[142] Martina Morris and Mirjam Kretzschmar. Concurrent partnerships and transmission dynamics in networks. Social Networks, 17(3-4):299 – 318, 1995. Social networks and infectious disease: HIV/AIDS.

[143] Kevin Patrick Murphy. Dynamic bayesian networks: representation, inference and learning. PhD thesis, University of California, Berkeley, 2002.

[144] J. D. Murray. Mathematical Biology, 2nd edn. Biomathematics Series, vol. 19. Springer, 1993.

[145] Richard E Neapolitan. Learning Bayesian Networks. Prentice Hall, 2003.

[146] M. Nekovee, Y. Moreno, G. Bianconi, and M. Marsili. Theory of rumour spreading in complex social networks. Physica A: Statistical Mechanics and its Applications, 374(1):457 – 470, 2007.

[147] M. E. J. Newman. The structure of scientific collaboration networks. Proceedings of the National Academy of Sciences, 98(2):404–409, 2001.

[148] M. E. J. Newman. Assortative mixing in networks. Phys. Rev. Lett., 89:208701, Oct 2002.

[149] M. E. J. Newman. Spread of epidemic disease on networks. Phys. Rev. E, 66:016128, Jul 2002.

[150] M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45(2):167–256, 2003.

[151] M. E. J. Newman, C. Moore, and D. J. Watts. Mean-field solution of the small-world network model. Phys. Rev. Lett., 84:3201–3204, Apr 2000.

[152] M. E. J. Newman and D. J. Watts. Scaling and percolation in the small-world network model. Phys. Rev. E, 60:7332–7342, Dec 1999.

[153] Mark E. J. Newman. Who is the best connected scientist? A study of scientific coauthorship networks. In Eli Ben-Naim, Hans Frauenfelder, and Zoltan Toroczkai, editors, Complex Networks, volume 650 of Lecture Notes in Physics, pages 337–370. Springer Berlin Heidelberg, 2004.

[154] Thomas Dyhre Nielsen and Finn Verner Jensen. Bayesian networks and decision graphs. Springer Science & Business Media, 2009.

[155] D. Oliveira and M. Carvalho. A comparison of community identification algorithms for regulatory network motifs. In Bioinformatics and Bioengineering (BIBE), 2013 IEEE 13th International Conference on, pages 1–6, Nov 2013.

[156] Douglas Oliveira, Marco Carvalho, and Ronaldo Menezes. Using network sciences to evaluate the brazilian airline network. In Ding-Zhu Du and Guochuan Zhang, editors, Computing and Combinatorics, volume 7936 of Lecture Notes in Computer Science, pages 849–858. Springer Berlin Heidelberg, 2013.

[157] Joint United Nations Programme on HIV/AIDS et al. Report on the Global HIV/AIDS Epidemic, July 2002. Joint United Nations Programme on HIV/AIDS, 2002.

[158] World Health Organization, Jeffrey Sachs, et al. Macroeconomics and health: investing in health for economic development: report of the Commission on Macroeconomics and Health. WHO, 2001.

[159] World Health Organization, June UNAIDS, et al. Report on the global aids epidemic, 4th global report, table of country-specific hiv. AIDS estimates and data, end, 2003:189–207, 2004.

[160] Jorma Paavonen and W Eggert-Kruse. Chlamydia trachomatis: impact on human reproduction. Human reproduction update, 5(5):433–447, 1999.

[161] Romualdo Pastor-Satorras and Alessandro Vespignani. Epidemic dynamics and endemic states in complex networks. Phys. Rev. E, 63:066117, May 2001.

[162] Romualdo Pastor-Satorras and Alessandro Vespignani. Epidemic spreading in scale-free networks. Phys. Rev. Lett., 86:3200–3203, Apr 2001.

[163] Romualdo Pastor-Satorras and Alessandro Vespignani. Evolution and structure of the Internet: A statistical physics approach. Cambridge University Press, 2007.

[164] Judea Pearl. Probabilistic reasoning in intelligent systems. Morgan Kaufmann, San Mateo, CA, 1988.

[165] Judea Pearl, Thomas Verma, et al. A theory of inferred causation. Morgan Kaufmann San Mateo, CA, 1991.

[166] J S M Peiris, Y Guan, and K Y Yuen. Severe acute respiratory syndrome. Nat Med, Nov 2004.

[167] Michael P Perrone and Leon N Cooper. When networks disagree: Ensemble methods for hybrid neural networks. Technical report, DTIC Document, 1992.

[168] Peter J. Diggle and Richard J. Gratton. Monte Carlo methods of inference for implicit statistical models. Journal of the Royal Statistical Society. Series B (Methodological), 46(2):193–227, 1984.

[169] A. Pinango and R. Dorado. A bayesian model for disease prediction using symptomatic information. In Central America and Panama Convention (CONCAPAN XXXIV), 2014 IEEE, pages 1–4, Nov 2014.

[170] P.R. Pinheiro, A. Castro, and M. Pinheiro. A multicriteria model applied in the diagnosis of alzheimer’s disease: A bayesian network. In Computational Science and Engineering, 2008. CSE ’08. 11th IEEE International Conference on, pages 15–22, July 2008.

[171] Richard Platt, Carmella Bocchino, Blake Caldwell, Robert Harmon, Ken Kleinman, Ross Lazarus, Andrew F. Nelson, James D. Nordin, and Debra P. Ritzwoller. Syndromic surveillance using minimum transfer of identifiable data: The example of the national bioterrorism syndromic surveillance demonstration program. Journal of Urban Health, 80(1):i25–i31, 2003.

[172] J J Potterat, L Phillips-Plummer, S Q Muth, R B Rothenberg, D E Woodhouse, T S Maldonado-Long, H P Zimmerman, and J B Muth. Risk network structure in the early epidemic phase of hiv transmission in colorado springs. Sexually Transmitted Infections, 78(suppl 1):i159–i163, 2002.

[173] J K Pritchard, M T Seielstad, A Perez-Lezaun, and M W Feldman. Population growth of human y chromosomes: a study of y chromosome microsatellites. Molecular Biology and Evolution, 16(12):1791–1798, 1999.

[174] Hazhir Rahmandad and John Sterman. Heterogeneity and network structure in the dynamics of diffusion: Comparing agent-based and differential equation models. Management Science, 54(5):998–1014, 2008.

[175] Everett M Rogers. Diffusion of innovations. Simon and Schuster, 2010.

[176] Tzachi Rosen, Solomon Eyal Shimony, and Eugene Santos. Reasoning with bkbs – algorithms and complexity. Annals of Mathematics and Artificial Intelligence, 40(3):403–425, 2004.

[177] Jean-Francois Rual, Kavitha Venkatesan, Tong Hao, Tomoko Hirozane-Kishikawa, Amelie Dricot, Ning Li, Gabriel F Berriz, Francis D Gibbons, Matija Dreze, Nono Ayivi-Guedehoussou, Niels Klitgord, Christophe Simon, Mike Boxem, Stuart Milstein, Jennifer Rosenberg, Debra S Goldberg, Lan V Zhang, Sharyl L Wong, Giovanni Franklin, Siming Li, Joanna S Albala, Janghoo Lim, Carlene Fraughton, Estelle Llamosas, Sebiha Cevik, Camille Bex, Philippe Lamesch, Robert S Sikorski, Jean Vandenhaute, Huda Y Zoghbi, Alex Smolyar, Stephanie Bosak, Reynaldo Sequerra, Lynn Doucette-Stamm, Michael E Cusick, David E Hill, Frederick P Roth, and Marc Vidal. Towards a proteome-scale map of the human protein-protein interaction network. Nature, 437(7062):1173–1178, Oct 2005.

[178] N.A. Samat and D.F. Percy. Dengue disease mapping in malaysia based on stochastic sir models in human populations. In Statistics in Science, Business, and Engineering (ICSSBE), 2012 International Conference on, pages 1–5, Sept 2012.

[179] Richard Scheines, Peter Spirtes, Clark Glymour, Christopher Meek, and Thomas Richardson. The tetrad project: Constraint based aids to causal model specification. Multivariate Behavioral Research, 33(1):65–117, 1998.

[180] Amanda JC Sharkey. Combining artificial neural nets: ensemble and modular multi-net systems. Springer Science & Business Media, 2012.

[181] Yair Shimshoni and Nathan Intrator. Classification of seismic signals by integrating ensembles of neural networks. Signal Processing, IEEE Transactions on, 46(5):1194–1201, 1998.

115 [182] Boris Shulgin, Lewi Stone, and Zvia Agur. Pulse vaccination strategy in the sir epidemic model. Bulletin of Mathematical Biology, 60(6):1123–1148, 1998.

[183] MICHAEL SMALL and CHI K. TSE. Small world and scale free model of transmission of sars. International Journal of Bifurcation and Chaos, 15(05):1745–1755, 2005.

[184] Michael Small and Chi Kong TSE. Plausible models for propagation of the sars virus. IEICE transactions on fundamentals of electronics, communications and computer sciences, 87(9):2379–2386, 2004.

[185] Soli Sorias. Overcoming the limitations of the descriptive and categorical approaches in psychiatric diagnosis: A proposal based on bayesian networks. T¨urkpsikiyatri dergisi= Turkish journal of psychiatry, 26(1):1, 2015.

[186] Pater Spirtes, Clark Glymour, Richard Scheines, Stuart Kauffman, Valerio Aimale, and Frank Wimberly. Constructing bayesian network models of gene expression networks from microarray data. 2000.

[187] Peter Spirtes and Clark Glymour. An algorithm for fast recovery of sparse causal graphs. Social Science Computer Review, 9(1):62–72, 1991.

[188] Peter Spirtes, Clark N Glymour, and Richard Scheines. Causation, prediction, and search, volume 81. MIT press, 2000.

[189] Walter E Stamm. Chlamydia trachomatis infections of the adult. Sexually transmitted diseases, 1999.

[190] Steven H Strogatz. Exploring complex networks. Nature, 410(6825):268–276, March 2001.

[191] Aidan Sudbury. The proportion of the population never hearing a rumour. Journal of Applied Probability, 22(2):443–446, 1985.

[192] Richard L Sweet, Daniel V Landers, Cheryl Walker, and Julius Schachter. Chlamydia trachomatis infection and pregnancy outcome. American journal of obstetrics and gynecology, 156(4):824–833, 1987.

[193] M. Tanaka, Y. Sakumoto, M. Aida, and K. Kawashima. Study on the growth and decline of snss by using the infectious recovery sir model. In Information and Telecommunication Technologies (APSITT), 2015 10th Asia-Pacific Symposium on, pages 1–3, Aug 2015.

[194] Ian Taylor, John Knowelden, et al. Principles of Epidemiology, 2nd edn, 1964.

[195] Anders Krogh and Peter Sollich. Learning with ensembles: How over-fitting can be useful. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems, pages 190–196, 1996.

[196] Srividhya Venugopal, Evan Stoner, Martin Cadeiras, and Ronaldo Menezes. Understanding organ transplantation in the usa using geographical social networks. Social Network Analysis and Mining, 3(3):457–473, 2013.

[197] John Von Neumann and Arthur Walter Burks. Theory of self-reproducing automata. University of Illinois Press, 1966.

[198] David M. Walker, David Allingham, Heung Wing Joseph Lee, and Michael Small. Parameter inference in small world network disease models with approximate bayesian computational methods. Physica A: Statistical Mechanics and its Applications, 389(3):540 – 548, 2010.

[199] Victor L Wallace. Toward an algebraic theory of markovian networks. In Proceedings of the Symposium on Telecommunications and Computer Networks, MRI Symposium Proceedings, volume 22, pages 397–408, 1972.

[200] Stanley Wasserman. Social network analysis: Methods and applications, volume 8. Cambridge university press, 1994.

[201] B. P. Watson and P. L. Leath. Conductivity in the two-dimensional-site percolation problem. Phys. Rev. B, 9:4893–4896, Jun 1974.

[202] Duncan J Watts. Small worlds: the dynamics of networks between order and randomness. Princeton University Press, 1999.

[203] Duncan J Watts and Steven H Strogatz. Collective dynamics of 'small-world' networks. Nature, 393(6684):440–442, Jun 1998.

[204] Gezhi Weng, Upinder S. Bhalla, and Ravi Iyengar. Complexity in biological signaling systems. Science, 284(5411):92–96, 1999.

[205] Douglas B West. Introduction to graph theory. Prentice Hall, Upper Saddle River, NJ, 1996.

[206] Stephen Wolfram. Cellular automata and complexity: collected papers, volume 1. Addison-Wesley Reading, 1994.

[207] David H. Wolpert. Stacked generalization. Neural Networks, 5(2):241–259, 1992.

[208] Ling Heng Wong, Philippa Pattison, and Garry Robins. A spatial model for social networks. Physica A: Statistical Mechanics and its Applications, 360(1):99–120, 2006.

[209] Jiyoung Woo and Hsinchun Chen. An event-driven SIR model for topic diffusion in web forums. In Intelligence and Security Informatics (ISI), 2012 IEEE International Conference on, pages 108–113, June 2012.

[210] Jiyoung Woo, Jaebong Son, and Hsinchun Chen. An SIR model for violent topic diffusion in social media. In Intelligence and Security Informatics (ISI), 2011 IEEE International Conference on, pages 15–19, July 2011.

[211] E. Wright, S. Mahoney, K. Laskey, M. Takikawa, and T. Levitt. Multi-entity Bayesian networks for situation assessment. In Information Fusion, 2002. Proceedings of the Fifth International Conference on, volume 2, pages 804–811, July 2002.

[212] Degang Xu and Xiyang Xu. Modeling and control of dynamic network SIR based on community structure. In Control and Decision Conference (2014 CCDC), The 26th Chinese, pages 4648–4653, May 2014.

[213] Xin-Jian Xu and Guanrong Chen. The SIS model with time delay on complex networks. International Journal of Bifurcation and Chaos, 19(02):623–628, 2009.

[214] Damián H. Zanette. Critical behavior of propagation on small-world networks. Phys. Rev. E, 64:050901, Oct 2001.

[215] Jia Zeng and Zhi-Qiang Liu. Probabilistic graphical models. In Type-2 Fuzzy Graphical Models for Pattern Recognition, pages 9–16. Springer, 2015.

[216] Yifeng Zeng, Yanping Xiang, and Saulius Pacekajus. Refinement of Bayesian network structures upon new data. In Granular Computing, 2008. GrC 2008. IEEE International Conference on, pages 772–777, Aug 2008.

[217] Zhi-Hua Zhou, Yuan Jiang, Yu-Bin Yang, and Shi-Fu Chen. Lung cancer cell identification based on artificial neural network ensembles. Artificial Intelligence in Medicine, 24(1):25–36, 2002.

[218] Zhi-Hua Zhou, Jianxin Wu, and Wei Tang. Ensembling neural networks: Many could be better than all. Artificial Intelligence, 137(1–2):239–263, 2002.
