For Peer Review Journal: ACM Transactions on Knowledge Discovery from Data

ACM Transactions on Knowledge Discovery from Data An Event-based Framework for Characterizing the Evolutionary Behavior of Interaction Graphs For Peer Review Journal: ACM Transactions on Knowledge Discovery from Data Manuscript ID: TKDD-2007-12-0071 Manuscript Type: Regular full paper submission Date Submitted by the 30-Dec-2007 Author: Complete List of Authors: Asur, Sitaram; Ohio State University, Computer Science and Engineering Parthasarathy, Srinivasan; Ohio State University, Computer Science and Engineering Ucar, Duygu; Ohio State University, Computer Science and Engineering Keywords: Interaction networks, Evolutionary analysis Page 1 of 42 ACM Transactions on Knowledge Discovery from Data 1 2 3 4 5 6 7 An Event-based Framework for Characterizing the 8 Evolutionary Behavior of Interaction Graphs 9 10 11 SITARAM ASUR 12 SRINIVASAN PARTHASARATHY 13 and 14 DUYGU UCAR 15 Ohio State University 16 17 18 For Peer Review 19 Interaction graphs are ubiquitous in many fields such as bioinformatics, sociology and physical 20 sciences. There have been many studies in the literature targeted at studying and mining these 21 graphs. However, almost all of them have studied these graphs from a static point of view. The 22 study of the evolution of these graphs over time can provide tremendous insight on the behavior of entities, communities and the flow of information among them. In this work, we present 23 an event-based characterization of critical behavioral patterns for temporally varying interaction 24 graphs. We use non-overlapping snapshots of interaction graphs and develop a framework for 25 capturing and identifying interesting events from them. We use these events to characterize 26 complex behavioral patterns of individuals and communities over time. We show how semantic 27 information can be incorporated to reason about community-behavior events. We also demonstrate 28 the application of behavioral patterns for the purposes of modeling evolution, link prediction and influence maximization. Finally, we present a diffusion model for evolving networks, based on our 29 framework. 30 31 Categories and Subject Descriptors: H.2.8 [Database Management]: Database Applications| Data Mining 32 33 General Terms: Algorithms, Measurement 34 Additional Key Words and Phrases: Interaction networks, Evolutionary analysis, Diffusion of innovations 35 36 37 38 39 1. INTRODUCTION 40 Many social and biological systems can be represented as complex interaction net- 41 works where nodes represent entities (i.e. individuals, proteins) and edges mimic 42 the interactions among them. These interaction networks arise from a wide variety 43 of scientific domains like computer science, physics, biology and sociology. Online 44 communities such as Flickr, MySpace and Orkut, e-mail networks, co-authorship 45 networks and WWW networks are examples of interesting interaction networks. 46 The study of these complex interaction networks can provide insight into their 47 structure, properties and behavior. 48 Early research in this area [Girvan and Newman 2002; Newman 2006; Barabasi 49 and Bonabeau 2003; Barabasi et al. 2002] has primarily focused on the static proper- 50 ties of these networks, neglecting the fact that most real-world interaction networks 51 are dynamic in nature. In reality, many of these networks constantly evolve over 52 time, with the addition and deletion of edges and nodes representing changes in the 53 interactions among the modeled entities. Recently, there has been some interest in 54 studying dynamic graphs [Leskovec et al. 2005; Backstrom et al. 2006; Chakrabarti 55 56 ACM Transactions on Computational Logic, Vol. 2, No. 3, 09 2001, Pages 1{32. 57 58 59 60 ACM Transactions on Knowledge Discovery from Data Page 2 of 42 1 2 3 4 5 2 · Sitaram Asur et al. 6 7 et al. 2006; Falkowski et al. 2006]. Identifying the portions of the network that are 8 changing, characterizing the type of change, predicting future events (e.g. link pre- 9 diction), and developing generic models for evolving networks are challenges that 10 need to be addressed. For instance, the rapid growth of online communities has dic- 11 tated the need for analyzing large amounts of temporal data to reveal community 12 structure, dynamics and evolution. 13 14 Interaction networks are often modular in nature. The interactions existing be- 15 tween nodes can be used to group them into clusters or communities. For instance, 16 in a social network, these clusters represent people with similar contact patterns 17 or interests. The problem of identifying these clusters or communities from static 18 graphs has bForeen extensiv elyPeerstudied [Girv anReviewand Newman 2002; Clauset et al. 2004; 19 Reichardt and Bornholdt 2004; Flake et al. 2002] over the past decade. However, 20 in the case of evolving graphs, the clusters are typically not static. Instead they 21 constantly change over time, as the network evolves. We believe that studying the 22 evolution of these clusters, in particular their formation, transitions and dissolution, 23 can be extremely useful for effectively characterizing the corresponding changes to 24 the network over time. 25 Another important aspect is the behavior of the nodes of the network. Nodes 26 of an evolving interaction network represent entities whose interaction patterns 27 change over time. The movement of nodes, their behavior and influence over other 28 nodes, can help make inferences regarding future interactions as well as predicting 29 changes to communities in the network. For instance, in a social network, if a 30 person is very sociable, the chances of him/her interacting with new people and 31 joining new groups is very high. In the case of a collaboration network, if a person 32 is known to collaborate frequently with different people, then the chance of a new 33 collaboration involving this person is high. The influence exerted by a node can 34 be studied in terms of its effect on other nodes. If several other people join a 35 community when a particular individual does, it indicates a high degree of positive 36 influence for that person. 37 38 Another important factor influencing behavior and evolution is the semantic na- 39 ture of the interactions itself. For instance, in a co-authorship network, two authors 40 are connected if they publish a paper together. The topic or the subject area of 41 the paper will definitely influence future collaborations for each of these authors. If 42 there are different authors working on similar topics, the chances of them collabo- 43 rating in the future is higher than two authors working on unrelated areas. We wish 44 to examine the influence of the semantics of the interaction on future interactions. 45 The study of diffusion or flow of information in an evolving network is impor- 46 tant for social science research, viral marketing applications and epidemiology. For 47 instance, pandemic viruses pose a severe threat to society due to their potential 48 to spread rapidly and cause tremendous widespread illnesses and deaths. In viral 49 marketing, the goal is to propagate an idea or innovation through an interaction 50 network. Analysis of the evolution of interactions in a social network and the 51 identification of influential nodes can be used to devise effective containment poli- 52 cies for pandemic disease spread in the case of epidemiological studies, as well as 53 `word-of-mouth' advertising in marketing. 54 In this paper, we provide an event-based framework for characterizing the evolu- 55 56 ACM Transactions on Computational Logic, Vol. 2, No. 3, 09 2001. 57 58 59 60 Page 3 of 42 ACM Transactions on Knowledge Discovery from Data 1 2 3 4 5 An Event-based Framework for Characterizing the Evolutionary Behavior of Interaction Graphs · 3 6 7 tion of interaction networks. We begin by converting an evolving graph into static 8 snapshot graphs at different time points. We obtain clusters at each of these snap- 9 shots independently. Next, we characterize the transformations of these clusters by 10 defining and identifying certain critical events. We define efficient incremental al- 11 gorithms involving bit-matrix computations for this purpose. We use these critical 12 events to compute and reason about novel behavior-oriented measures, which offer 13 new and interesting insights for the characterization of dynamic behavior of inter- 14 action graphs. We also develop semantic information measures that can be used 15 to analyze and reason about community behavior. We illustrate our framework 16 17 on three different evolving networks - the DBLP co-authorship network, Wikipedia 18 and a clinicalFortrials patien Peert network. In eacReviewh case, the behavioral patterns that we 19 discover using our framework help us make useful inferences about cluster evolution 20 and link prediction. Finally, we use the behavioral measures to detail a diffusion 21 model for evolving networks and demonstrate their application for the task of in- 22 fluence maximization. 23 In short, the key contributions of this work are 24 (1) The identification of key critical events that occur in evolving interaction 25 networks. 26 (2) Efficient incremental algorithms for the discovery of these critical events 27 (3) Novel behavioral measures for stability, sociability, influence and popularity 28 that can be computed incrementally over time 29 30 (4) A diffusion model for evolving networks based on our framework 31 (5) Application of the events and behavioral measures on two real datasets for 32 modeling evolution, predicting behavior and trends (link prediction) and influence 33 maximization. 34 35 2. RELATED WORK 36 There has been enormous interest in mining interaction graphs for interesting pat- 37 terns in various domains. However, the majority of these studies [Girvan and 38 Newman 2002; Newman 2006; Barabasi and Bonabeau 2003; Barabasi et al. 2002; 39 Clauset et al. 2004; Reichardt and Bornholdt 2004; Flake et al. 2002] have fo- 40 cused on mining static graphs to identify community structures, patterns and novel 41 information.

Load more