Deep Generative Modeling in Network Science with Applications to Public Policy Research
Total Page:16
File Type:pdf, Size:1020Kb
Working Paper Deep Generative Modeling in Network Science with Applications to Public Policy Research Gavin S. Hartnett, Raffaele Vardavas, Lawrence Baker, Michael Chaykowsky, C. Ben Gibson, Federico Girosi, David Kennedy, and Osonde Osoba RAND Health Care WR-A843-1 September 2020 RAND working papers are intended to share researchers’ latest findings and to solicit informal peer review. They have been approved for circulation by RAND Health Care but have not been formally edted. Unless otherwise indicated, working papers can be quoted and cited without permission of the author, provided the source is clearly referred to as a working paper. RAND’s R publications do not necessarily reflect the opinions of its research clients and sponsors. ® is a registered trademark. CORPORATION For more information on this publication, visit www.rand.org/pubs/working_papers/WRA843-1.html Published by the RAND Corporation, Santa Monica, Calif. © Copyright 2020 RAND Corporation R® is a registered trademark Limited Print and Electronic Distribution Rights This document and trademark(s) contained herein are protected by law. This representation of RAND intellectual property is provided for noncommercial use only. Unauthorized posting of this publication online is prohibited. Permission is given to duplicate this document for personal use only, as long as it is unaltered and complete. Permission is required from RAND to reproduce, or reuse in another form, any of its research documents for commercial use. For information on reprint and linking permissions, please visit www.rand.org/pubs/permissions.html. The RAND Corporation is a research organization that develops solutions to public policy challenges to help make communities throughout the world safer and more secure, healthier and more prosperous. RAND is nonprofit, nonpartisan, and committed to the public interest. RAND’s publications do not necessarily reflect the opinions of its research clients and sponsors. Support RAND Make a tax-deductible charitable contribution at www.rand.org/giving/contribute www.rand.org Deep Generative Modeling in Network Science with Applications to Public Policy Research Gavin S. Hartnett,* Raffaele Vardavas¤, Lawrence Baker, Michael Chaykowsky, C. Ben Gibson, Federico Girosi, David Kennedy, and Osonde Osoba RAND Corporation, Santa Monica, CA 90401, USA Abstract Network data is increasingly being used in quantitative, data-driven public policy research. These are typically very rich datasets that contain complex correlations and inter-dependencies. This richness both promises to be quite useful for policy research, while at the same time posing a challenge for the useful extraction of information from these datasets - a challenge which calls for new data analysis methods. In this report, we formulate a research agenda of key methodological problems whose solutions would enable new advances across many areas of policy research. We then review recent advances in applying deep learning to network data, and show how these meth- ods may be used to address many of the methodological problems we identified. We particularly emphasize deep generative methods, which can be used to generate realistic synthetic networks useful for microsimulation and agent-based models capable of informing key public policy ques- tions. We extend these recent advances by developing a new generative framework which applies to large social contact networks commonly used in epidemiological modeling. For context, we also compare and contrast these recent neural network-based approaches with the more tradi- tional Exponential Random Graph Models. Lastly, we discuss some open problems where more progress is needed. *Correspondence may be addressed to Gavin Hartnett ([email protected]) or Raffaele Vardavas ([email protected]). This document has not been formally reviewed, edited, or cleared for public release. It should not be cited without the permission of the RAND Corporation. RAND’s publications do not necessarily reflect the opinions of its research clients and sponsors. RAND® is a registered trademark. 1 Contents 1 Introduction 3 2 Aims 6 2.1 List of Research Aims . 6 2.2 ReportLayout............................................ 11 3 Graph Neural Networks 11 3.1 Multi-Layer Perceptrons . 12 3.2 Graph Convolutional Networks . 13 3.3 GraphSAGE ............................................. 15 3.4 Equivariance Under Graph Isomorphisms . 15 4 Policy-Relevant Network Data 16 4.1 The NDSSL Sociocentric Dataset . 17 4.2 The FluPaths Egocentric Dataset . 20 5 Predictive Modeling Over Networks 24 5.1 Node-Level Prediction . 24 5.2 Link-Level Prediction . 27 5.3 Graph-Level Prediction . 30 6 Generative Modeling Over Networks 31 6.1 Introduction to Generative Modeling . 31 6.2 Graph Generation By Iterated Link Prediction . 33 6.3 NDSSL Example . 35 7 Exponential Random Graph Models 40 7.1 NDSSL Example . 42 8 Network Fusion 47 9 Outlook 48 A Methodological Needs in Public Policy Network Science 54 B Mathematical Glossary 56 C The NDSSL Sociocentric Dataset 61 D The FluPaths Egocentric Dataset 65 2 1 Introduction The convergence between computational simulation modeling, large datasets of unprecedented scale, and recent advancements in machine learning represent an exciting new frontier for the scientific analysis of complex systems. This is particularly the case in the study of large networked systems, such as those found in biology, critical infrastructures, and social science. For example, in biology this con- vergence promises to help us better understand how the behavior of individual neurons affects overall brain activity. For large networked infrastructures that are critical to the modern world, such as power grids, communication networks, or transportation networks, large datasets together with machine learning can be used to optimize their design and ensure that they are robust to failures. As yet an- other example, many social systems may be represented by graphs describing interactions among various complex entities (e.g., people, firms, organizations, etc.). These entities are often adaptive and change their behavior based social interactions and changes in the landscape and environment that they in part help form. The graph structure governs how these entities (also known as actors or agents) interact. Agents are represented by vertices (also known as nodes) on the graph, and their interaction occurs through the edges (also known as ties or links). Microsimulation and Agent-Based Models (ABMs) are examples of two powerful computer simulation techniques that are increasingly becoming essential tools to predict societal evolution and to compare the impacts of different “what- if” scenarios. Agent-based models in particular can model the complex social systems with adaptive agents that interact over social and physical networks. These models can be used by quantitative pub- lic policy researchers to inform policy problems such as those involving public health strategies and the design of economic markets. The scope of public policy challenges is broad and evolving, as is the range of recently developed quantitative and data-driven methodologies, especially those methodologies that employ machine learning. With this report, we aim to help bridge the gap between the machine learning research community and the community of quantitative public policy researchers to accelerate the applica- tion of new and powerful learning algorithms to complex public policy challenges. We have written the report to be broadly useful for many different policy problem areas, although the underlying moti- vation for many of the datasets and methodologies we explore is the problem of modeling the spread of contagion over a complex and evolving population. This is because modeling a contagion pro- vides a clear example of a system where the dynamics can strongly depend on network structure. The question of whether a pathogen will percolate through a population and infect a significant fraction of individuals depends both on properties intrinsic to the pathogen (such as the degree of infectivity and the incubation period) as well as properties of the underlying network of connections between individuals. The spread of a pathogen in a densely connected population will be much more rapid than in a sparsely connected one - hence the critical importance of social distancing measures in re- sponse to the COVID-19 pandemic [1]. The case of COVID-19 has underscored the importance of another network phenomenon on contagion spreading, namely the spreading of rumors, risk per- ceptions, and protective behaviors. In particular, it should be clear that it is not merely enough to model the spread of an infectious disease through some static population, which can be done rather straightforwardly with deterministic, population-based compartmental models (i.e. variants of the 3 SEIR-model), where the population is divided into compartment classes including the susceptible, exposed (latent), infectious, and removed (e.g., recovered or dead). In reality, the population is adap- tive and individual behaviors will be greatly affected by any official policy actions (school closures, mandatory social distancing measures, etc.), as well as by the risk tolerance, beliefs, and values held by individuals. Additionally, there is evidence that in situations such as this an individual’s actions are strongly influenced by the actions of their immediate social network [2], which underscores the importance of the network structure [3,4]. As with most models,