Sentinel Surveillance Network Design and Novel Uses of Wikipedia

Total Page:16

File Type:pdf, Size:1020Kb

Sentinel Surveillance Network Design and Novel Uses of Wikipedia - Improving disease surveillance : sentinel surveillance network design and novel uses of Wikipedia. Fairchild, Geoffrey Colin https://iro.uiowa.edu/discovery/delivery/01IOWA_INST:ResearchRepository/12730531290002771?l#13730830480002771 Fairchild, G. C. (2015). Improving disease surveillance: sentinel surveillance network design and novel uses of Wikipedia [University of Iowa]. https://doi.org/10.17077/etd.sfbu328x https://iro.uiowa.edu Copyright 2014 Geoffrey Colin Fairchild Downloaded on 2021/10/01 19:58:37 -0500 - IMPROVING DISEASE SURVEILLANCE: SENTINEL SURVEILLANCE NETWORK DESIGN AND NOVEL USES OF WIKIPEDIA by Geoffrey Colin Fairchild A thesis submitted in partial fulfillment of the requirements for the Doctor of Philosophy degree in Computer Science in the Graduate College of The University of Iowa December 2014 Thesis Supervisor: Professor Alberto Maria Segre Graduate College The University of Iowa Iowa City, Iowa CERTIFICATE OF APPROVAL PH.D. THESIS This is to certify that the Ph.D. thesis of Geoffrey Colin Fairchild has been approved by the Examining Committee for the thesis requirement for the Doctor of Philosophy degree in Computer Science at the December 2014 graduation. Thesis committee: Alberto M. Segre, Thesis Supervisor Philip M. Polgreen Sriram Pemmaraju Ted Herman Gerard Rushton Sara Y. Del Valle ACKNOWLEDGEMENTS A Ph.D. is a long and arduous journey. I've been lucky to have been sur- rounded by a truly wonderful group of people over the last 6 years, and I know for a fact that I wouldn't have made it this far without them. First and foremost, I want to thank Dr. Alberto Segre and Dr. Phil Polgreen for their patience, guidance, and advice. They were quick to recognize my affinity for medicine and were able to direct me to a research area that I wasn't even aware of prior to grad school. Computational epidemiology, as it turns out, is the perfect union between my skill set and research interests. Both of you have been instrumental in molding me into the computer scientist I am today, and I am forever grateful. I also want to thank the rest of the CompEpi group at Iowa. Dr. Sriram Pemmaraju, Dr. Ted Herman, and Dr. Gerard Rushton have been especially great at critiquing my work and offering advice when necessary. The rest of my cohort at Iowa was equally important, providing many laughs and thought-provoking discussions. Dr. Sara Del Valle, I'm not sure you'll ever know how much I appreciate the opportunity you've given me at Los Alamos. I've met some brilliant and inspiring people here, and you've opened up new avenues for me that I didn't even know were available. I hope that one day I can pay it forward. To my friends outside of academia and work, thanks for keeping me sane and grounded. It's easy to lose track of how to have fun during grad school, but you all ensured that I never fell into that trap. ii My mom, sister, dad, and stepmom have provided the kind of unconditional love and support that I so often needed. Mom, I am forever indebted to you for the sacrifices you made in order for Britt and me to have the chances we did. You kept us in good schools, and you've always been encouraging. Britt, you're my best friend. We can talk about anything, we always support one another, and I know you've got my back when I need it. Dad and Linda, I love our daily emails and weekly conversations about life. You've both always shown such an interest in my work, and I really appreciate that. I love you all more than you'll ever know. I'm so lucky to have you in my life. Finally, I want to thank Tamiko. You supplied motivation when there was none, laughs when I was down, deep discussions, and constant encouragement. With- out you, I'm not sure I would've survived grad school. iii ABSTRACT Traditional disease surveillance systems are instrumental in guiding policy- makers' decisions and understanding disease dynamics. The first study in this dis- sertation looks at sentinel surveillance network design. We consider three location- allocation models: two based on the maximal coverage model (MCM) and one based on the K-median model. The MCM selects sites that maximize the total number of people within a specified distance to the site. The K-median model minimizes the sum of the distances from each individual to the individual's nearest site. Using a ground truth dataset consisting of two million de-identified Medicaid billing records representing eight complete influenza seasons and an evaluation function based on the Huff spatial interaction model, we empirically compare networks against the existing volunteer-based Iowa Department of Public Health influenza-like illness network by simulating the spread of influenza across the state of Iowa. We compare networks on two metrics: outbreak intensity (i.e., disease burden) and outbreak timing (i.e., the start, peak, and end of the epidemic). We show that it is possible to design a network that achieves outbreak intensity performance identical to the status quo network using two fewer sites. We also show that if outbreak timing detection is of primary interest, it is actually possible to create a network that matches the existing network's performance using 42% fewer sites. Finally, in an effort to demonstrate the generic usefulness of these location-allocation models, we examine primary stroke center selection. We describe the ineffectiveness of the current self-initiated approach iv and argue for a more organized primary stroke center system. While these traditional disease surveillance systems are important, they have several downsides. First, due to a complex reporting hierarchy, there is generally a reporting lag; for example, most diseases in the United States experience a reporting lag of approximately 1{2 weeks. Second, many regions of the world lack trustworthy or reliable data. As a result, there has been a surge of research looking at using publicly available data on the internet for disease surveillance purposes. The second and third studies in this dissertation analyze Wikipedia's viability in this sphere. The first of these two studies looks at Wikipedia access logs. Hourly access logs dating back to December 2007 are available for anyone to download completely free of charge. These logs contain, among other things, the total number of accesses for every article in Wikipedia. Using a linear model and a simple article selection procedure, we show that it is possible to nowcast and, in some cases, forecast up to the 28 days tested in 8 of the 14 disease-location contexts considered. We also demonstrate that it may be possible in some cases to train a model in one context and use the same model to nowcast or forecast in another context with poor surveillance data. The second of the Wikipedia studies looked at disease-relevant data found in the article content. A number of disease outbreaks are meticulously tracked on Wikipedia. Case counts, death counts, and hospitalization counts are often provided in the article narrative. Using a dataset created from 14 Wikipedia articles, we trained a named-entity recognizer (NER) to recognize and tag these phrases. The NER achieved an F1 score of 0.753. In addition to these counts in the narrative, v we tested the accuracy of tabular data using the 2014 West African Ebola virus disease epidemic. This article, like a number of other disease articles on Wikipedia, contains granular case counts and deaths counts per country affected by the disease. By computing the root-mean-square error between the Wikipedia time series and a ground truth time series, we show that the Wikipedia time series are both timely and accurate. vi PUBLIC ABSTRACT This dissertation presents three studies that aim to improve the current state of disease surveillance. The first study looks at designing traditional sentinel surveillance systems, which are instrumental in guiding policy-makers' decisions and understanding disease dynamics. We use several popular location-allocation models to algorithmically design surveillance networks of varying sizes. By simulating the spread of influenza across the state of Iowa, we demonstrate that we are capable of generating smaller networks capable of performance at least as good as the volunteer-based influenza surveillance network used by the Iowa Department of Public Health. We also apply these network design methods to primary stroke center placement. The second and third studies recognize that while these traditional surveillance systems are important, they have drawbacks, such as reporting lags and untrustwor- thy, unreliable, or unavailable data in some instances. To help solve these problems, we introduce a novel data source: Wikipedia. The second study displays how a linear model using time series of article accesses can nowcast and forecast a variety of diseases in a variety of locations. Finally, the third study demonstrates how disease-related data can be elicited from Wikipedia article content. We show how a named-entity recognizer can be trained to tag case, death, and hospitalization counts in the article narrative. We also analyze tabular time series data and show that they are accurate and timely. vii TABLE OF CONTENTS LIST OF TABLES . .x LIST OF FIGURES . xi CHAPTER 1 INTRODUCTION . .1 2 HOW MANY SUFFICE? A COMPUTATIONAL METHOD FOR SIZ- ING SENTINEL SURVEILLANCE NETWORKS . .8 2.1 Background . .9 2.2 Methods . 13 2.2.1 Online tool . 13 2.2.2 Algorithms . 15 2.2.2.1 Maximal coverage model . 15 2.2.2.2 K-median model . 17 2.2.3 Validation . 18 2.2.3.1 Medicaid dataset . 18 2.2.3.2 Simulation . 22 2.3 Results . 24 2.3.1 Outbreak intensity .
Recommended publications
  • VFMV SCRIPT on LINE VERSION 5:20:21.Docx
    1 VACCINATION from the Misinformation Virus We SEE MONTAGE OF OLD PIX: 1918 FLU PANDEMIC Bette Korber, PhD Our lifespan for most of human history is short, right? We had 40 years if we were lucky and 40%, 45% of kids, depending on the era of history died before the age of 5. Then when you think about our lives, we get to see our children and our grandchildren grow up. The expectation is if you have a baby, you're going to get to see that baby grow up, that wasn't always the case. So the idea that we get these long lives, we get to see our children grow up, we can expect to see our children grow up, this is a gift of science. We SEE FAMILY PICTURES from our INTERVIEWEES as we HEAR: Denise A. Gonzales, MD I think we've forgotten a lot of the lessons that we've learned over the last hundred years since vaccines were developed. There was a big polio outbreak, devastating. Then we developed a vaccine and we called it cured, we were free of polio. Likewise with measles, mumps, rubella and scarlet fever. John D. Grabenstein, RPh, PhD The medicine that has saved more lives than anything else is vaccines. Not insulin, not antibiotics, not medicines for blood pressure. It's vaccines that have saved the most lives. It's really a remarkable contribution to human health. We SEE MORE FAMILY IMAGES from our INTERVIEWEES as we HEAR Announcer: “Funding for this program was provided by ... Presbyterian Healthcare Services, STC Health, The City of Albuquerque and SafeTeen New Mexico.
    [Show full text]
  • HERBERT W. HETHCOTE Home Address: 1866 Commodore Ln NW Bainbridge Island, WA 98110-2628 Phone: 206-855-0881 Email: [email protected]
    Curriculum Vitae (August 2009) HERBERT W. HETHCOTE Home Address: 1866 Commodore Ln NW Bainbridge Island, WA 98110-2628 phone: 206-855-0881 email: [email protected] Former Business Address: retired in 2006 from Department of Mathematics University of Iowa, Iowa City, Iowa 52242 homepage: http://www.math.uiowa.edu/~hethcote/ EDUCATIONAL AND PROFESSIONAL HISTORY 1. Higher Education University of Michigan, Mathematics, Ph.D., December 1968. University of Michigan, Mathematics, M.S., June 1965. Univ. of Colorado, Applied Mathematics, College of Engineering, B.S., June 1964. 2. Professional and Academic Positions Emeritus Professor, 2006-present; Professor, 1979-2006; Associate Professor, 1973-79; Assistant Professor, 1969-73; Department of Mathematics, University of Iowa Visiting Lecturer, Mathematical Epidemiology course, Departments of Applied Mathematics and Global Health, University of Washington, Spring Quarter, 2009, http://www.amath.washington.edu/courses/504-spring-2009/AMATH.html Visiting Professor, Abteilung fur Mathematik in den Naturwissenschaften und Mathematische Biologie, Technical University of Vienna, Austria, May-June, 1997 Visitor, Department of Mathematics, University of Hawaii, Honolulu, Hawaii, 1992-93 Visiting Mathematician, Laboratory of Theoretical Biology, National Institutes of Health, Bethesda, Maryland, 1980-81. Visiting Associate Professor, Department of Mathematics, Oregon State University, Corvallis, Oregon, 1977-78. Visiting Mathematician, Department of Biomathematics, M.D. Anderson Hospital and Cancer
    [Show full text]
  • Speaker Titles and Abstracts
    Pros and Cons of ABMs for Epidemic Modeling Virtual Workshop June 21-22, 2021 SPEAKER TITLES/ABSTRACTS David Banks Duke University/SAMSI “Inference Without Likelihood” In most ABM applications, one cannot write out the likelihood function. In those cases, the only two recourses that statisticians have are emulators and approximate Bayesian computation. This talk describes the strengths and weaknesses fo those methods in the context of epidemiologic modeling. Paul Birrell Public Health England “Real-time Challenges in Modelling a Real-life Pandemic” In England policy decisions during the SARS-COV-2 pandemic have relied on prompt scientific evidence on the state of the pandemic. Since March 2020, through weekly contributions to governmental advisory groups, the "PHE/Cambridge" real-time model has contributed to real time pandemic monitoring and assessment, from an early exponential phase, to the current embryonic third wave. Mostly, these contributions have been in the form of estimation of key epidemic quantities and short-to-medium-term projections of severe disease. In this talk I describe the statistical transmission modelling framework we have used throughout and its continual adaptation throughout the pandemic to tackle the introduction of vaccinations, the ecological effects of new variants, the changing relationship between available data streams and many other challenges posed by the ever-evolving epidemiology. As datasets get longer and models become more complex, more attention must be paid to the computational aspects of fitting such a model to ensure that model developments do not jeopardise the timely provision of relevant, converged, outputs to policymakers. Georgiy Bobashev RTI “Hybrid System Dynamics and Agent-Based Models” Predictive models range in levels of complexity, applicability, and the tasks they can perform.
    [Show full text]
  • A Quantitative Portrait of Wikipedia's High-Tempo Collaborations During
    TBD A Quantitative Portrait of Wikipedia’s High-Tempo Collaborations during the 2020 Coronavirus Pandemic BRIAN C. KEEGAN, Department of Information Science, University of Colorado Boulder, United States CHENHAO TAN, Department of Computer Science, University of Colorado Boulder, United States The 2020 coronavirus pandemic was a historic social disruption with significant consequences felt around the globe. Wikipedia is a freely-available, peer-produced encyclopedia with a remarkable ability to create and revise content following current events. Using 973,940 revisions from 134,337 editors to 4,238 articles, this study examines the dynamics of the English Wikipedia’s response to the coronavirus pandemic through the first five months of 2020 as a “quantitative portrait” describing the emergent collaborative behavior atthree levels of analysis: article revision, editor contributions, and network dynamics. Across multiple data sources, quantitative methods, and levels of analysis, we find four consistent themes characterizing Wikipedia’s unique large-scale, high-tempo, and temporary online collaborations: external events as drivers of activity, spillovers of activity, complex patterns of editor engagement, and the shadows of the future. In light of increasing concerns about online social platforms’ abilities to govern the conduct and content of their users, we identify implications from Wikipedia’s coronavirus collaborations for improving the resilience of socio-technical systems during a crisis. Additional Key Words and Phrases: COVID-19; online collaboration; crisis informatics; public health ACM Reference Format: Brian C. Keegan and Chenhao Tan. 2020. A Quantitative Portrait of Wikipedia’s High-Tempo Collaborations during the 2020 Coronavirus Pandemic. Proc. ACM Hum.-Comput. Interact. 3, CSCW, Article TBD (Novem- ber 2020), 36 pages.
    [Show full text]
  • Arxiv:2103.15561V2 [Q-Bio.PE] 21 Apr 2021 Simulation Results
    Pyfectious: An individual-level simulator to discover optimal containment polices for epidemic diseases Arash Mehrjou* Ashkan Soleymani* Max Planck Institute for Intelligent Systems Max Planck Institute for Intelligent Systems Tübingen, Germany & Tübingen, Germany ETH Zürich, Zürich, Switzerland [email protected] [email protected] Amin Abyaneh Samir Bhatt Max Planck Institute for Intelligent Systems Faculty of Medicine, School of Public Health Tübingen, Germany Imperial College [email protected] London, UK [email protected] Bernhard Schölkopf Stefan Bauer Max Planck Institute for Intelligent Systems Max Planck Institute for Intelligent Systems & Tübingen, Germany CIFAR Azrieli Global Scholar [email protected] Tübingen, Germany [email protected] Abstract Simulating the spread of infectious diseases in human communities is critical for predicting the trajectory of an epidemic and verifying various policies to control the devastating impacts of the outbreak. Many existing simulators are based on compartment models that divide people into a few subsets and simulate the dy- namics among those subsets using hypothesized differential equations. However, these models lack the requisite granularity to study the effect of intelligent policies that influence every individual in a particular way. In this work, we introduce a simulator software capable of modeling a population structure and controlling the disease’s propagation at an individualistic level. In order to estimate the confi- dence of the conclusions drawn from the simulator, we employ a comprehensive probabilistic approach where the entire population is constructed as a hierarchi- cal random variable. This approach makes the inferred conclusions more robust against sampling artifacts and gives confidence bounds for decisions based on the arXiv:2103.15561v2 [q-bio.PE] 21 Apr 2021 simulation results.
    [Show full text]
  • COVID-19 Forecasting Using Web-Based and Participatory Survey Data
    COVID-19 Forecasting using Web-Based and Participatory Survey Data Kavya Kopparapu Bryan Wilder Harvard University Harvard University Cambridge, MA, USA Cambridge, MA, USA [email protected] [email protected] Milind Tambe Maia Majumder Harvard University Boston Children’s Hospital and Harvard Medical School Cambridge, MA, USA Boston, MA, USA [email protected] [email protected] ABSTRACT Given these impacts, predicting the spread of epidemics is essen- The COVID-19 pandemic has had a significant and unprecedented tial to mounting an appropriate public health response. A predictive impact on the world since 2020. Concurrently, the rise of internet model is only as good as its underlying data sources and so under- technology has led to the development of several significant data standing the role played by different data sources is an important sources for forecasting the development of the pandemic, includ- digital epidemiology question. More nuanced comparisons can help ing but not limited to mobility data, crowdsourced symptoms, and guide the development of predictive models, as well as inform search trends. While there has been significant algorithmic interest priorities for the development of new data sources. In the past, fore- in developing forecasting models, previous work has not provided casting has targeted seasonal diseases (such as influenza) or else a systematic investigation of the relative utility of different data more localized outbreaks (e.g., dengue, Ebola, or Zika). In response sources for COVID-19 forecasting. We present the first work com- to these outbreaks, traditional forecasting utilizes epidemiological paring internet data sources for use in deep learning models to data – such as patient-level data or aggregated data regarding cases, predict case incidence of COVID-19 across states the US.
    [Show full text]
  • COVID-19 Reopening Strategies at the County Level in the Face of Uncertainty: Multiple Models for Outbreak Decision Support Katr
    medRxiv preprint doi: https://doi.org/10.1101/2020.11.03.20225409; this version posted November 5, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY-NC-ND 4.0 International license . COVID-19 reopening strategies at the county level in the face of uncertainty: Multiple Models for Outbreak Decision Support Katriona Shea1, Rebecca K. Borchering1, William J.M. Probert2, Emily Howerton1, Tiffany L. Bogich1, Shouli Li3, Willem G. van Panhuis4, Cecile Viboud5, Ricardo Aguás2, Artur Belov6, Sanjana H. Bhargava7, Sean Cavany8, Joshua C. Chang9,10, Cynthia Chen11, Jinghui Chen12, Shi Chen13,14, YangQuan Chen15, Lauren M. Childs16, Carson C. Chow17, Isabel Crooker18, Sara Y. Del Valle18, Guido España8, Geoffrey Fairchild18, Richard C. Gerkin19, Timothy C. Germann18, Quanquan Gu12, Xiangyang Guan11, Lihong Guo20, Gregory R. Hart21, Thomas J. Hladish7,22, Nathaniel Hupert23, Daniel Janies24, Cliff C. Kerr21, Daniel J. Klein21, Eili Klein25, 26, Gary Lin25, Carrie Manore18, Lauren Ancel Meyers27, John Mittler28, Kunpeng Mu29, Rafael C. Núñez21, Rachel Oidtman8, Remy Pasco30, Ana Pastore y Piontti29, Rajib Paul13, Carl A. B. Pearson31, Dianela R. Perdomo7, T Alex Perkins8, Kelly Pierce32, Alexander N. Pillai7, Rosalyn Cherie Rael18, Katherine Rosenfeld21, Chrysm Watson Ross18, Julie A. Spencer18, Arlin B. Stoltzfus33, Kok Ben Toh34, Shashaank Vattikuti17, Alessandro Vespignani29, Lingxiao Wang12, Lisa White2, Pan Xu12, Yupeng Yang26, Osman N. Yogurtcu6, Weitong Zhang12, Yanting Zhao35, Difan Zou12, Matthew Ferrari1, David Pannell36, Michael Tildesley37, Jack Seifarth1, Elyse Johnson1, Matthew Biggerstaff38, Michael Johansson38, Rachel B.
    [Show full text]
  • Vaccine Safety and Efficacy Date: March 2, 2021
    MAT Workgroup Name: Vaccine Safety and Efficacy Date: March 2, 2021 Question or request: Does the Medical Advisory Team recommend that providers administer the Johnson & Johnson/Janssen vaccine to New Mexicans? Recommendations: The Medical Advisory Team (MAT) Vaccine Safety and Efficacy Workgroup has reviewed the relevant documents related to the Janssen Ad26.COV2.S COVID-19 vaccine as provided by the manufacturer, the U.S. Food and Drug Administration, and the U.S. Centers for Disease Control and Prevention and concurs with the conclusions that the vaccine may prevent serious or life-threatening disease and that the known and potential benefits of the product outweigh the known and potential risks of the product. Therefore, the MAT recommends: 1. For those individuals who elect to receive a vaccine, approved healthcare providers in New Mexico should administer the Janssen Ad26.COV2.S COVID-19 vaccine consistent with the Emergency Use Authorization issued by the U.S. Food and Drug Administration and the clinical guidance provided by the U.S. Centers for Disease Control and Prevention. 2. Approved healthcare providers in New Mexico who administer the Janssen Ad26.COV2.S COVID-19 vaccine should ensure that each potential vaccine recipient receives the required vaccine information sheet to allow for a well- informed decision and receives the required vaccination documentation card upon vaccination. 3. Approved healthcare providers and health systems should monitor on-going and updated guidance from U.S. Food and Drug Administration and the U.S. Centers for Disease Control and Prevention related to the management of vaccine supplies, which may be provided on official websites or via official email messages to pharmacies, professional associations, or state officials.
    [Show full text]
  • Enhancing Disease Surveillance with Novel Data Streams: Challenges and Opportunities
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by eScholarship - University of California UCSF UC San Francisco Previously Published Works Title Enhancing disease surveillance with novel data streams: challenges and opportunities. Permalink https://escholarship.org/uc/item/7018t7vn Journal EPJ data science, 4(1) ISSN 2193-1127 Authors Althouse, Benjamin M Scarpino, Samuel V Meyers, Lauren Ancel et al. Publication Date 2015 DOI 10.1140/epjds/s13688-015-0054-0 Peer reviewed eScholarship.org Powered by the California Digital Library University of California Althouse et al. EPJ Data Science (2015)4:17 DOI 10.1140/epjds/s13688-015-0054-0 REGULAR ARTICLE OpenAccess Enhancing disease surveillance with novel data streams: challenges and opportunities Benjamin M Althouse1*†, Samuel V Scarpino1*†, Lauren Ancel Meyers1,2,JohnWAyers3, Marisa Bargsten4, Joan Baumbach4,JohnSBrownstein5,6,7, Lauren Castro8, Hannah Clapham9, Derek AT Cummings9, Sara Del Valle8, Stephen Eubank10, Geoffrey Fairchild8, Lyn Finelli11, Nicholas Generous8, Dylan George12, David R Harper13, Laurent Hébert-Dufresne1, Michael A Johansson14,KevinKonty15,MarcLipsitch16, Gabriel Milinovich17, Joseph D Miller18,ElaineONsoesie5,6,DonaldROlson15, Michael Paul19, Philip M Polgreen20, Reid Priedhorsky8, Jonathan M Read21,22, Isabel Rodríguez-Barraquer9, Derek J Smith23, Christian Stefansen24,DavidLSwerdlow25, Deborah Thompson4, Alessandro Vespignani26 and Amy Wesolowski16 *Correspondence: [email protected]; Abstract [email protected] 1Santa Fe Institute, Santa Fe, NM, Novel data streams (NDS), such as web search data or social media updates, hold USA promise for enhancing the capabilities of public health surveillance. In this paper, we †Equal contributors outline a conceptual framework for integrating NDS into current public health Full list of author information is available at the end of the article surveillance.
    [Show full text]
  • Printable Program Printed On: 08/20/2021 Wednesday, June 2
    Printable Program Printed on: 09/26/2021 Wednesday, June 2 SDSS Virtual Expo SDSS Hours Wed, Jun 2, 11:30 AM - 6:30 PM GS01 - Welcome and Plenary Panel: Equitable and Inclusive Data and Technology General Session Wed, Jun 2, 11:45 AM - 1:00 PM Chair(s): Wendy Martinez, Bureau of Labor Statistics Equitable and Inclusive Data and Technology Safiya Noble, University of California, Los Angeles; Stephen T. Ziliak, Roosevelt University; Jamelle Watson-Daniels, Data for Black Lives CS01 - Data Science Shaping the Financial World Refereed Wed, Jun 2, 1:10 PM - 2:45 PM Chair(s): Kali Prasun Chowdhury, University of California, Irvine 1:15 PM BERT as a Filter to Detect Pharmaceutical Innovations in News Articles Martha Czernuszenko, The University of Virginia 1:45 PM Dissecting the 2015 Chinese Stock Market Crash Min Shu, University of Wisconsin-Stout 2:15 PM A Hierarchical Bayesian Approach to Detecting Structural Changes in Bank Liquidity Premia Padma Ranjini Sharma, Federal Reserve Bank of Kansas City CS02 - Bayesian Approaches Refereed Wed, Jun 2, 1:10 PM - 2:45 PM Chair(s): Claire Bowen, Urban Institute 1:15 PM Parameter Estimation for Ising Model with Variational Bayes Minwoo Kim, Michigan State University 1:45 PM Bayesian Heteroskedasticity-Robust Regression Gabriel Durbin Lewis, University of Massachusetts Amherst 2:15 PM A Scalable Bayesian Hierarchical Modeling Approach for Large Spatio-Temporal Count Data Aritz Adin, Public University of Navarre CS03 - Shaping Decisions with Classification and Clustering Refereed Wed, Jun 2, 1:10 PM - 2:45 PM Chair(s): Thomas Chen, Academy for Mathematics, Science, and Engineering 1:15 PM Unbiased Estimations Based on Binary Classifiers: A Maximum Likelihood Approach Marco J.H.
    [Show full text]
  • Agent-Based Modeling Approaches for Simulating Infectious Diseases
    LA-UR-13-26511 Agent-based Modeling Approaches for Simulating Infectious Diseases Sara Del Valle, PhD Los Alamos National Laboratory Team: Sue Mniszewski, Mac Hyman, Reid Priedhorsky, Geoffrey Fairchild SAMSI Program on Computational Methods in Social Sciences August 18, 2013 UNCLASSIFIED Operated by Los Alamos National Security, LLC for NNSA Motivation “Previously, scientists had two pillars of understanding: theory and experiment. Now there is a third pillar: simulation” U.S. Secretary of Energy Steven Chu” Operated by Los Alamos National Security, LLC for NNSA Agent-based Model (ABM) A computational approach for simulating the actions of agents that interact within an environment. Operated by Los Alamos National Security, LLC for NNSA Agent-based Model (ABM) A computational approach for simulating the actions of agents that interact within an environment. Operated by Los Alamos National Security, LLC for NNSA ABM History The concept was originally discovered in the 1940s by Stanislaw Ulam and John von Neumann while working at LANL Von Neumann Neighborhood ABM History Due to the computational requirements, ABMs did not become widespread until the 1990s SimCity Why ABM? System is too complex to be captured by analytical expressions or experiments Experiments are too expensive or undesirable Operated by Los Alamos National Security, LLC for NNSA ABM Characteristics Heterogeneity Individual Agent Behavior Specification Explicit Space Complex Interactions Randomness Emergent Phenomena Operated by Los Alamos National Security, LLC for NNSA Ecological Systems Ecological Systems Understand the causes of this behavior Fish, ants, humans, locus, birds, honeybees, immune system, bacterial infection, tumor proliferation Locus plagues can contain up to 109 individuals Ecological Systems Buhl et al Experiments Vicsek et al.
    [Show full text]
  • Enhancing Disease Surveillance with Novel Data Streams: Challenges and Opportunities
    Enhancing Disease Surveillance with Novel Data Streams: Challenges and Opportunities Benjamin M. Althouse, PhD, ScM1,*,y, Samuel V. Scarpino, PhD1,*,y, Lauren Ancel Meyers, PhD1,2, John Ayers, PhD, MA3, Marisa Bargsten, MPH4, Joan Baumbach, MD, MPH, MS4, John S. Brownstein, PhD5,6,7, Lauren Castro, BA8, Hannah Clapham, PhD9, Derek A.T. Cummings, PhD, MHS, MS9, Sara Del Valle, PhD8, Stephen Eubank, PhD10, Geoffrey Fairchild, MS8, Lyn Finelli, DrPH, MS11, Nicholas Generous, MS8, Dylan George, PhD12, David R. Harper, PhD13, Laurent H´ebert-Dufresne, PhD, MSc1, Michael A Johansson, PhD14, Kevin Konty, MSc15, Marc Lipsitch, DPhil16, Gabriel Milinovich, PhD17, Joseph D. Miller, MBA, PhD18, Elaine O. Nsoesie, PhD5,6, Donald R Olson, MPH15, Michael Paul10, Philip M. Polgreen, MD, MPH20, Reid Priedhorsky, PhD8, Jonathan M. Read, PhD21,22, Isabel Rodr´ıguez-Barraquer9, Derek Smith, PhD23, Christian Stefansen24, David L. Swerdlow, MD25, Deborah Thompson, MD, MSPH4, Alessandro Vespignani, PhD26, and Amy Wesolowski, PhD16 1Santa Fe Institute, Santa Fe, NM 2The University of Texas at Austin, Austin, TX, USA 3San Diego State University, San Diego, CA 4New Mexico Department of Health, Santa Fe, NM 5Children's Hospital Informatics Program, Boston Children's Hospital 6Department of Pediatrics, Harvard Medical School, Boston 7Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Quebec, Canada 8Defense Systems and Analysis Division, Los Alamos National Laboratory, Los Alamos, NM 9Department of Epidemiology, Johns
    [Show full text]