From Simple to Complex: Modeling Social Contagion
Samarth Swarup
Network Dynamics and Simulation Science Lab, VBI, Virginia Tech, Blacksburg, VA 24061

Abstract

Understanding the spread of information and behavior over complex social networks requires new models that go beyond the simple contagion models of traditional epidemiology. I have been working on developing, analyzing, and simulating new complex contagion models to shed light on phenomena such as popularity and social influence, group behavior in decision-making, [...] in information transfer, and the role of demographics and network structure in behavior change. This poster presents a range and combination of theory-driven and data-driven models, all of which utilize the synthetic information framework that has been developed here at the Network Dynamics and Simulation Science Laboratory. This is transdisciplinary work which brings together ideas from statistical physics, socio-linguistics, sociology, [...], and computer science, and has been carried out in collaboration with researchers both inside and outside Virginia Tech.

Theory-Driven Models

The Degree-Biased Voter Model

We introduced this extension to the voter model to study the role of popularity in the spread of linguistic features in a [...]. In this model, a node in the network updates its state by copying the state of a neighbor chosen with probability proportional to the neighbor's degree. We focus in particular on the role of “loners” and “hubs” in social networks. Loners are individuals on the periphery of the network who do not participate in the opinion dynamics in the network, i.e., their states are fixed. Hubs are highly connected central nodes, regarded as opinion leaders in the network. See figure on the right.

Figure: DBVM dynamics on a scale-free network with loners. Size and closeness to the center are both proportional to degree in the above figures.

We showed that norms emerge and are stable for long periods, but can suddenly shift in “sweeping changes”. The figure on the left illustrates this behavior. This model was used to resolve a long-standing debate in sociolinguistics about the relative roles of loners and hubs in linguistic dynamics: which types of individuals introduce innovations into the network, and which types are conservative norm preservers/enforcers? Our model showed that this is not an either/or situation. Both types of agents can be seen as playing each role depending on the instant of observation. We also showed how to analytically calculate norm duration and the transition time between norms.

Figure: The DBVM (upper) spends most of the time in a consensus or “norm” state, which is markedly different from the dynamics of the voter model (lower).

Publications and Presentations:
• Centers and Peripheries: Network Roles in Language Change. Zsuzsanna Fagyal, Samarth Swarup, Anna Maria Escobar, Les Gasser, and Kiran Lakkaraju, Lingua 120(8), pp. 2061-2079, 2010.
• Centers, Peripheries, and Popularity: The Emergence of Norms in Simulated Networks of Linguistic Influence. Zsuzsanna Fagyal, Samarth Swarup, Anna Maria Escobar, Les Gasser, and Kiran Lakkaraju, U. Penn. Working Papers in Linguistics 15(2), Article 10, 2009: Selected papers (top 10%) from the NWAV 2008 conference.
• The Degree-Biased Voter Model (poster). Andrea Apolloni and Samarth Swarup, Dynamics Days Conference, Evanston, IL, Jan. 4-7, 2010.

Data-Driven Models

The Spread of Smoking Behavior

We study the spread of smoking behavior in adolescent populations by combining data from the National Longitudinal Study of Adolescent Health (Add Health) with our synthetic population data.

We developed a technique to generate synthetic friendship networks for high school students in our synthetic populations based on hierarchical decompositions of the community structure of actual high school friendship networks. For example, the figure on the top right shows the hierarchical community structure of one of the networks from Add Health.

We also developed a data-driven diffusion model based on Add Health data on observed smoking rates of middle-school and high-school students over a period of three years. This model takes into account both the network structure (smoking status of one’s friends) and demographic factors like age, household income, and household size.

Combining the friendship networks and the diffusion model with our synthetic populations, we are able to do large-scale, long-term simulations to generate epidemic curves for smoking, such as the one shown on the right. The model can also form the basis for developing new interventions to reduce the prevalence of smoking in adolescent populations.

Publications and Presentations:
• A Synthetic Information Environment to Study the Spread of Smoking Behavior in Adolescent Populations (invited presentation). Samarth Swarup, Center for Policy Informatics Workshop, Arizona State University, Tempe, AZ, Feb 4-6, 2010.
• A Synthetic Information Environment to Study the Spread of Smoking Behavior in Adolescent Populations (in preparation). Richard Beckman, Christopher Kuhlman, Achla Marathe, Elaine Nsoesie, and Samarth Swarup.
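The degree-biased voter model update described above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the poster's results: the function name `dbvm_step`, the adjacency-dictionary representation, the binary states, and the choice of updating one uniformly random node per step are all assumptions made for the example. It also assumes that a loner's fixed state can still be copied by its neighbors.

```python
import random

def dbvm_step(adj, state, loners, rng):
    """One asynchronous update of a degree-biased voter model (hypothetical sketch).

    adj    : dict mapping each node to the set of its neighbors
    state  : dict mapping each node to its current state (e.g. 0 or 1)
    loners : set of nodes whose states are fixed
    rng    : random.Random instance
    """
    # Assumption: the node to update is picked uniformly at random.
    node = rng.choice(sorted(adj))
    if node in loners or not adj[node]:
        return node  # loners (and isolated nodes) never change state
    neighbors = sorted(adj[node])
    # Each neighbor is weighted by its own degree, so popular nodes
    # are copied more often.
    weights = [len(adj[n]) for n in neighbors]
    chosen = rng.choices(neighbors, weights=weights, k=1)[0]
    state[node] = state[chosen]
    return node

# Small example: node 0 is a hub, node 3 is a loner on the periphery.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
state = {0: 1, 1: 0, 2: 1, 3: 0}
rng = random.Random(0)
for _ in range(500):
    dbvm_step(adj, state, loners={3}, rng=rng)
print(state)
```

Because the loner never updates but remains visible to the hub, it can keep reintroducing its state into the network, which is one way to think about how a norm might eventually shift.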

Synthetic Information Environments

A synthetic information environment brings together multiple data sources and transforms them into contextually-structured synthetic information that is relevant to a particular problem (“query”).

For modeling social contagion phenomena, we use synthetic populations that are detailed, disaggregated descriptions of individuals (demographics) and their interactions (social contact networks and friendship networks). These data are impossible to gather through survey methods, because of their scale and detail; they have to be generated, in a way that statistically matches the level of detail in each input data source.

Running simulations of social contagions on these synthetic populations requires high-performance computing resources and specially-developed, highly-parallel software.

The methodology for generating synthetic information and the technology for running simulations have been developed at NDSSL over a period of several years by a large, transdisciplinary team of scientists.

Which is the right model?

“All models are wrong. Some are useful.”
- Box and Draper

The Spread of Information

In this work, we model and examine the spread of information through personal interaction in a synthetic population. We use a probabilistic model to decide whether two people will converse about a particular topic based on their similarity and familiarity. This combination is used to model homophily (“like attracts like”). Similarity is modeled by matching selected demographic characteristics, while familiarity is modeled by the amount of contact required to convey information (see the figure on the left). We resolved the results by age group, daily activities, time, household income, and household size, and examined the relative effect of these factors.

One interesting result of this study was to show the role of children in spreading information through the social network. The figure on the right shows the fraction of informed children at different activity locations. For children, communication occurs mostly at school, so more children get informed at school than at home. This is because the probability of interaction with similar individuals is higher at school, and the duration of interaction is long enough to spread the information (strong ties).

Due to these long-duration interactions at school and home, children form a strongly connected backbone of the social interaction network.

Publications and Presentations:
• Simulating Social Information Diffusion Using a Synthetic Population (poster). Andrea Apolloni, Karthik Channakeshava, Lisa Durbeck, Maleq Khan, Christopher Kuhlman, Bryan Lewis, and Samarth Swarup, NICO Complexity Conference, Northwestern University, Evanston, IL, Sept. 1-3, 2009. Selected as "poster winner."
• A Study of Information Diffusion over a Realistic Social Network Model. Andrea Apolloni, Karthik Channakeshava, Lisa Durbeck, Maleq Khan, Christopher Kuhlman, Bryan Lewis, and Samarth Swarup, The IEEE International Symposium on Social Computing Applications (SCA-09), Vancouver, Canada, Aug 29-31, 2009.

The Bi-Threshold Model

This model is motivated by the observation that people seem to require independent stimulus from multiple friends or neighbors to engage in certain social behaviors. For example, people both start and quit smoking in groups.

In this model, therefore, we assume that each node in a social network has two thresholds associated with it: the up-threshold determines how many neighbors of a node must be in state 1 for it to switch from state 0 to 1, and the down-threshold determines how many neighbors must be in state 0 for it to switch from state 1 to 0.

We use sequential dynamical systems theory to prove several results about the long-term behavior of this system, and computational complexity analysis to show the intrinsic hardness of some algorithmic problems of interest, such as how to find the nodes which are most effective in impeding the diffusion process if we externally set them to state 0.

Additionally, we use simulation with synthetic populations to shed light on the transient dynamical behavior of the process.

One of the problems we study is that of finding a set of critical nodes that can minimize the spread of contagion if we force them to be in state 0. We present a heuristic algorithm, called the Maximum Contributor Heuristic, to find these critical nodes. The graph on the left shows how the total outbreak size varies with the number of seed nodes (the nodes initially in state 1), for different values of the size of the critical set (β). We see that a reasonably small critical set (β=500) can be very effective. The network here has ~75,000 nodes.

Publications and Presentations:
• Simple and Complex Contagion Dynamics of Two-Choice Agent-Based Bi-Threshold Systems. Christopher Kuhlman, V. S. Anil Kumar, Madhav Marathe, S. S. Ravi, Daniel Rosenkrantz, and Samarth Swarup, Submitted to the AAMAS 2011 conference.
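The bi-threshold update rule described above can be illustrated with a short Python sketch. It is a hedged example rather than the implementation used for the poster: the function name `bithreshold_sweep`, the dictionary-based data layout, and the decision to count only a node's neighbors (not the node itself) toward each threshold are assumptions. Nodes are updated one after another in a given order, in the spirit of a sequential dynamical system, and a hypothetical `frozen_zero` set stands in for critical nodes that are externally held in state 0.

```python
def bithreshold_sweep(adj, state, up, down, order, frozen_zero=frozenset()):
    """Apply one sequential sweep of a bi-threshold rule (hypothetical sketch).

    adj         : dict mapping each node to the set of its neighbors
    state       : dict mapping each node to 0 or 1 (modified in place)
    up, down    : dicts of per-node up- and down-thresholds (neighbor counts)
    order       : sequence giving the order in which nodes are updated
    frozen_zero : nodes forced to stay in state 0 (e.g. a critical set)
    """
    for v in order:
        if v in frozen_zero:
            state[v] = 0
            continue
        ones = sum(state[u] for u in adj[v])   # neighbors currently in state 1
        zeros = len(adj[v]) - ones             # neighbors currently in state 0
        if state[v] == 0 and ones >= up[v]:
            state[v] = 1
        elif state[v] == 1 and zeros >= down[v]:
            state[v] = 0
    return state

# Example on a three-node path 0 - 1 - 2 with both end nodes seeded in state 1.
adj = {0: {1}, 1: {0, 2}, 2: {1}}
thresholds = {0: 2, 1: 2, 2: 2}
free = bithreshold_sweep(adj, {0: 1, 1: 0, 2: 1}, thresholds, thresholds, [0, 1, 2])
blocked = bithreshold_sweep(adj, {0: 1, 1: 0, 2: 1}, thresholds, thresholds,
                            [0, 1, 2], frozen_zero={1})
print(free, blocked)
```

Comparing the two runs shows the idea behind a critical set in miniature: holding the middle node in state 0 prevents it from adopting state 1 even though both of its neighbors are in that state.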

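The poster does not give the functional form of the probabilistic conversation model used for the spread of information, so the following Python sketch is only one plausible reading of the description above. Every name in it (`similarity`, `conveys_information`, `contact_minutes`, `required_minutes`) and the specific formula are assumptions: similarity is taken as the fraction of selected demographic attributes on which two people match, familiarity is treated as a minimum amount of contact, and information is passed with probability equal to the similarity once that minimum is met.

```python
import random

def similarity(person_a, person_b, attributes):
    """Fraction of the selected demographic attributes on which two people match."""
    matches = sum(person_a[k] == person_b[k] for k in attributes)
    return matches / len(attributes)

def conveys_information(person_a, person_b, contact_minutes,
                        required_minutes, attributes, rng):
    """Decide whether information passes between two people (hypothetical rule).

    Assumption: contact shorter than the required amount never conveys
    information; otherwise it is conveyed with probability equal to the
    demographic similarity of the pair.
    """
    if contact_minutes < required_minutes:
        return False
    return rng.random() < similarity(person_a, person_b, attributes)

# Two classmates who match on age group but not on household size.
alice = {"age_group": "child", "household_size": 4}
bob = {"age_group": "child", "household_size": 3}
rng = random.Random(0)
print(similarity(alice, bob, ["age_group", "household_size"]))
print(conveys_information(alice, bob, contact_minutes=360, required_minutes=60,
                          attributes=["age_group", "household_size"], rng=rng))
```

Under this reading, long school-day contacts between demographically similar children clear the contact requirement easily, which is consistent with the observation that children are informed mostly at school.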
Virginia Bioinformatics Institute 10th Anniversary, October 7, 2010.