Thousands of Small, Constant Rallies:A Large-Scale Analysis of Partisan Whatsapp Groups

Total Page:16

File Type:pdf, Size:1020Kb

Thousands of Small, Constant Rallies:A Large-Scale Analysis of Partisan Whatsapp Groups 1 54 2 Thousands of Small, Constant Rallies: 55 3 56 4 A Large-Scale Analysis of Partisan WhatsApp Groups 57 5 58 6 Victor S. Bursztyn Larry Birnbaum 59 7 [email protected] [email protected] 60 8 Department of Computer Science Department of Computer Science 61 9 Northwestern University Northwestern University 62 10 Evanston, Illinois, USA Evanston, Illinois, USA 63 11 64 ABSTRACT 1 INTRODUCTION 12 65 13 There is growing concern about the use of social platforms On October 28th 2018, amid strong polarization and con- 66 14 to push political narratives during elections. One very recent spiracy theories that flooded social media, Brazilians elected 67 15 case is Brazil’s, where WhatsApp is now widely perceived far-right candidate Jair Bolsonaro their next president. With 68 16 as a key enabler of the far-right’s rise to power. In this paper, over 120M users in Brazil – the second largest market in 69 17 we perform a large-scale analysis of partisan WhatsApp the world – the role that WhatsApp played in this electoral 70 18 groups to shed light on how both right- and left-wing users process has emerged as a major focus of attention and con- 71 19 leveraged the platform in the 2018 Brazilian presidential troversy due to its alleged importance in a large number of 72 20 election. Across its two rounds, we collected more than 2.8M successful political campaigns. 73 21 messages from over 45k users in 232 public groups (175 right- Indeed, WhatsApp in Brazil connects an audience com- 74 22 wing vs. 57 left-wing). After describing how we obtained a parable in scale to television’s to content created and dis- 75 23 sample that is many times larger than previous works, we tributed with almost no barriers or filters other than user 76 24 compare right- and left-wing users on their social network curation. Although the promise of inexpensive one-to-one 77 25 metrics, regional distribution, content-sharing habits, and mobile communication may have sparked this popularity – 78 26 most characteristic news sources. WhatsApp’s 1.5B users are largely in developing countries 79 27 – group chats arguably made it catch fire. The app allows 80 28 CCS CONCEPTS users to create groups where up to 256 users can share text 81 29 • Networks → Online social networks. and multimedia messages, transforming these groups into 82 30 highly active social spaces. Users can also create public in- 83 31 KEYWORDS vites to their groups and share them as URLs across the web, 84 transforming these groups into small public forums. 32 chat applications, WhatsApp, elections, partisanship, data 85 However, the fact that all messages circulate with end-to- 33 collection, social network analysis 86 34 end encryption hinders transparency and even law-enforcement, 87 making WhatsApp a fertile ground for bad actors. A recent 35 ACM Reference Format: 88 36 Victor S. Bursztyn and Larry Birnbaum. 2018. Thousands of Small, study on the Brazilian electorate found that false stories 89 37 Constant Rallies: A Large-Scale Analysis of Partisan WhatsApp that circulated massively through WhatsApp’s network were 90 38 Groups. In Woodstock ’18: ACM Symposium on Neural Gaze Detection, more far-reaching than initially assumed, as it revealed that 91 39 June 03–05, 2018, Woodstock, NY. ACM, New York, NY, USA, 5 pages. 90% of Bolsonaro’s supporters think they are truthful [2]. But 92 40 https://doi.org/10.1145/1122445.1122456 measuring effects on the surface only to speculate about such 93 41 a complex network isn’t enough to protect future elections 94 42 from the use of WhatsApp as a political weapon. Instead, it’s 95 43 Permission to make digital or hard copies of all or part of this work for necessary to learn how to measure partisan activity at scale. 96 personal or classroom use is granted without fee provided that copies are not 44 In practice, this is challenging because it requires finding 97 made or distributed for profit or commercial advantage and that copies bear a large number of partisan WhatsApp groups and following 45 this notice and the full citation on the first page. Copyrights for components 98 46 of this work owned by others than ACM must be honored. Abstracting with the digital rallies that happen in them. Next, we need to find 99 47 credit is permitted. To copy otherwise, or republish, to post on servers or to meaningful ways to analyze such rallies and characterize 100 48 redistribute to lists, requires prior specific permission and/or a fee. Request each partisan group. There are many analyses that can be 101 permissions from [email protected]. 49 made to compare right- and left-wing users in WhatsApp: 102 Woodstock ’18, June 03–05, 2018, Woodstock, NY 50 How their social networks are structured, how much do they 103 © 2018 Association for Computing Machinery. represent the larger population of voters, and what types of 51 ACM ISBN 978-1-4503-9999-9/18/06...$15.00 104 52 https://doi.org/10.1145/1122445.1122456 content are consumed and shared by them. 105 53 1 106 Woodstock ’18, June 03–05, 2018, Woodstock, NY Bursztyn and Birnbaum 107 Addressing the challenges involved in analyzing partisan 3 METHODOLOGY 160 108 activity in WhatsApp, we make the following contributions: Our data collection started on September 1, 2018, and was 161 109 162 • concluded for the purpose of this analysis on November 1, 110 We introduce a new data collection method capable of 163 growing a sample as much as necessary and towards 2018, thus spanning exactly two months. It included mean- 111 ingful events that preceded the electoral race (e.g., Brazil’s 164 112 specific directions in the political spectrum; 165 • Supreme Court barring former President Lula from the race 113 Our code, released to the research community upon 166 publication, can mine invites to other public WhatsApp on September 11) as well as the first and second rounds of 114 the election (on October 8 and 28, respectively). 167 115 groups from the groups already joined; 168 • All data was collected using a dedicated smartphone with 116 We analyze the first large-scale dataset of partisan 169 activity in WhatsApp using a variety of methods and 64GB of storage, allowing for a maximum of 1GB of data per 117 average day in our study. WhatsApp’s daily backups were 170 118 standpoints, contributing with real measurements about 171 right- vs. left-wing users in the platform. observed so that our portfolio of groups would never exceed 119 this safe margin. For our setting, in practice, it allowed the 172 120 tracking of 700-800 public groups – the total fluctuated as 173 2 RELATED WORK 121 groups were closed or added. 174 122 Several works indicate a recent rise of interest in analyzing In this section we describe our data collection method- 175 123 public WhatsApp groups: ology in two parts. First, we discuss the basic steps also 176 124 Rosenfeld et al. [8] provided an initial study of WhatsApp implemented by previous works [1, 3, 6, 7]. Second, we de- 177 125 messages with a particular focus on predictive analysis. Al- scribe our improvements, which substantially enhance data 178 126 though it comprised 4M messages, the dataset spanned only collection, now made available at a public repository. As 179 127 100 users and didn’t include the actual contents of those with previous works, we emphasize that the resulting data 180 128 messages – only a handful of meta-data. They noted that a collection complies to WhatsApp’s privacy policy. 181 129 small majority of messages originated from group messaging 182 130 and used it to distinguish WhatsApp from plain texting. Data Collection - Base Method 183 131 Garimella and Tyson [3] collected the first large-scale Public WhatsApp groups are characterized by the ability 184 132 WhatsApp dataset. To do so, they looked for invites to public to join them through a public URL created by their own- 185 133 WhatsApp groups in websites that aggregated and organized ers, called “an URL invite”. At the same time, all WhatsApp 186 134 information about such groups as well as by searching for the groups are limited to 256 users. Considering this, Caetano et 187 135 string “chat.whatsapp.com” in Google. After automating the al. [1, Figure 2] summarizes the base method in three steps: 188 136 process of joining groups using Selenium over WhatsApp’s 1) Searching the web for invites to public WhatsApp groups; 189 137 web client, their work was able to collect 454,000 messages 2) Trying to join them; and 3) Extracting data from them. 190 138 from 45,794 users in a six month period, encompassing a For step #1, a typical solution is to look for invites to pub- 191 139 total of 178 groups about a wide variety of themes. lic WhatsApp groups in a set of publicly accessible sources. 192 140 Caetano et al. [1] provided an initial study of political dis- These sources include: (i) websites aimed at organizing in- 193 141 cussions in Brazilian WhatsApp groups and, at the same time, formation about public WhatsApp groups; (ii) the web, in 194 142 offered valuable measurements for a non-electoral setting. general, by searching for the string that is the prefix of URL 195 143 Their dataset comprised 273,468 messages from 6,967 users in invites (“chat.whatsapp.com”) in Google; and (iii) the social 196 144 a one month period, encompassing a total of 81 groups.
Recommended publications
  • Learning React Functional Web Development with React and Redux
    Learning React Functional Web Development with React and Redux Alex Banks and Eve Porcello Beijing Boston Farnham Sebastopol Tokyo Learning React by Alex Banks and Eve Porcello Copyright © 2017 Alex Banks and Eve Porcello. All rights reserved. Printed in the United States of America. Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com/safari). For more information, contact our corporate/insti‐ tutional sales department: 800-998-9938 or [email protected]. Editor: Allyson MacDonald Indexer: WordCo Indexing Services Production Editor: Melanie Yarbrough Interior Designer: David Futato Copyeditor: Colleen Toporek Cover Designer: Karen Montgomery Proofreader: Rachel Head Illustrator: Rebecca Demarest May 2017: First Edition Revision History for the First Edition 2017-04-26: First Release See http://oreilly.com/catalog/errata.csp?isbn=9781491954621 for release details. The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Learning React, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
    [Show full text]
  • UNIVERSITY of CALIFORNIA SAN DIEGO Simplifying Datacenter Fault
    UNIVERSITY OF CALIFORNIA SAN DIEGO Simplifying datacenter fault detection and localization A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science by Arjun Roy Committee in charge: Alex C. Snoeren, Co-Chair Ken Yocum, Co-Chair George Papen George Porter Stefan Savage Geoff Voelker 2018 Copyright Arjun Roy, 2018 All rights reserved. The Dissertation of Arjun Roy is approved and is acceptable in quality and form for publication on microfilm and electronically: Co-Chair Co-Chair University of California San Diego 2018 iii DEDICATION Dedicated to my grandmother, Bela Sarkar. iv TABLE OF CONTENTS Signature Page . iii Dedication . iv Table of Contents . v List of Figures . viii List of Tables . x Acknowledgements . xi Vita........................................................................ xiii Abstract of the Dissertation . xiv Chapter 1 Introduction . 1 Chapter 2 Datacenters, applications, and failures . 6 2.1 Datacenter applications, networks and faults . 7 2.1.1 Datacenter application patterns . 7 2.1.2 Datacenter networks . 10 2.1.3 Datacenter partial faults . 14 2.2 Partial faults require passive impact monitoring . 17 2.2.1 Multipath hampers server-centric monitoring . 18 2.2.2 Partial faults confuse network-centric monitoring . 19 2.3 Unifying network and server centric monitoring . 21 2.3.1 Load-balanced links mean outliers correspond with partial faults . 22 2.3.2 Centralized network control enables collating viewpoints . 23 Chapter 3 Related work, challenges and a solution . 24 3.1 Fault localization effectiveness criteria . 24 3.2 Existing fault management techniques . 28 3.2.1 Server-centric fault detection . 28 3.2.2 Network-centric fault detection .
    [Show full text]
  • Crawling Code Review Data from Phabricator
    Friedrich-Alexander-Universit¨atErlangen-N¨urnberg Technische Fakult¨at,Department Informatik DUMITRU COTET MASTER THESIS CRAWLING CODE REVIEW DATA FROM PHABRICATOR Submitted on 4 June 2019 Supervisors: Michael Dorner, M. Sc. Prof. Dr. Dirk Riehle, M.B.A. Professur f¨urOpen-Source-Software Department Informatik, Technische Fakult¨at Friedrich-Alexander-Universit¨atErlangen-N¨urnberg Versicherung Ich versichere, dass ich die Arbeit ohne fremde Hilfe und ohne Benutzung anderer als der angegebenen Quellen angefertigt habe und dass die Arbeit in gleicher oder ¨ahnlicherForm noch keiner anderen Pr¨ufungsbeh¨ordevorgelegen hat und von dieser als Teil einer Pr¨ufungsleistung angenommen wurde. Alle Ausf¨uhrungen,die w¨ortlich oder sinngem¨aߨubernommenwurden, sind als solche gekennzeichnet. Nuremberg, 4 June 2019 License This work is licensed under the Creative Commons Attribution 4.0 International license (CC BY 4.0), see https://creativecommons.org/licenses/by/4.0/ Nuremberg, 4 June 2019 i Abstract Modern code review is typically supported by software tools. Researchers use data tracked by these tools to study code review practices. A popular tool in open-source and closed-source projects is Phabricator. However, there is no tool to crawl all the available code review data from Phabricator hosts. In this thesis, we develop a Python crawler named Phabry, for crawling code review data from Phabricator instances using its REST API. The tool produces minimal server and client load, reproducible crawling runs, and stores complete and genuine review data. The new tool is used to crawl the Phabricator instances of the open source projects FreeBSD, KDE and LLVM. The resulting data sets can be used by researchers.
    [Show full text]
  • Artificial Intelligence for Understanding Large and Complex
    Artificial Intelligence for Understanding Large and Complex Datacenters by Pengfei Zheng Department of Computer Science Duke University Date: Approved: Benjamin C. Lee, Advisor Bruce M. Maggs Jeffrey S. Chase Jun Yang Dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Computer Science in the Graduate School of Duke University 2020 Abstract Artificial Intelligence for Understanding Large and Complex Datacenters by Pengfei Zheng Department of Computer Science Duke University Date: Approved: Benjamin C. Lee, Advisor Bruce M. Maggs Jeffrey S. Chase Jun Yang An abstract of a dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Department of Computer Science in the Graduate School of Duke University 2020 Copyright © 2020 by Pengfei Zheng All rights reserved except the rights granted by the Creative Commons Attribution-Noncommercial Licence Abstract As the democratization of global-scale web applications and cloud computing, under- standing the performance of a live production datacenter becomes a prerequisite for making strategic decisions related to datacenter design and optimization. Advances in monitoring, tracing, and profiling large, complex systems provide rich datasets and establish a rigorous foundation for performance understanding and reasoning. But the sheer volume and complexity of collected data challenges existing techniques, which rely heavily on human intervention, expert knowledge, and simple statistics. In this dissertation, we address this challenge using artificial intelligence and make the case for two important problems, datacenter performance diagnosis and datacenter workload characterization. The first thrust of this dissertation is the use of statistical causal inference and Bayesian probabilistic model for datacenter straggler diagnosis.
    [Show full text]
  • Unicorn: a System for Searching the Social Graph
    Unicorn: A System for Searching the Social Graph Michael Curtiss, Iain Becker, Tudor Bosman, Sergey Doroshenko, Lucian Grijincu, Tom Jackson, Sandhya Kunnatur, Soren Lassen, Philip Pronin, Sriram Sankar, Guanghao Shen, Gintaras Woss, Chao Yang, Ning Zhang Facebook, Inc. ABSTRACT rative of the evolution of Unicorn's architecture, as well as Unicorn is an online, in-memory social graph-aware index- documentation for the major features and components of ing system designed to search trillions of edges between tens the system. of billions of users and entities on thousands of commodity To the best of our knowledge, no other online graph re- servers. Unicorn is based on standard concepts in informa- trieval system has ever been built with the scale of Unicorn tion retrieval, but it includes features to promote results in terms of both data volume and query volume. The sys- with good social proximity. It also supports queries that re- tem serves tens of billions of nodes and trillions of edges quire multiple round-trips to leaves in order to retrieve ob- at scale while accounting for per-edge privacy, and it must jects that are more than one edge away from source nodes. also support realtime updates for all edges and nodes while Unicorn is designed to answer billions of queries per day at serving billions of daily queries at low latencies. latencies in the hundreds of milliseconds, and it serves as an This paper includes three main contributions: infrastructural building block for Facebook's Graph Search • We describe how we applied common information re- product. In this paper, we describe the data model and trieval architectural concepts to the domain of the so- query language supported by Unicorn.
    [Show full text]
  • 2.3 Apache Chemistry
    UNIVERSITY OF DUBLIN TRINITY COLLEGE DISSERTATION PRESENTED TO THE UNIVERSITY OF DUBLIN,TRINITY COLLEGE IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MAGISTER IN ARTE INGENIARIA Content Interoperability and Open Source Content Management Systems An implementation of the CMIS standard as a PHP library Author: Supervisor: MATTHEW NICHOLSON DAVE LEWIS Submitted to the University of Dublin, Trinity College, May, 2015 Declaration I, Matthew Nicholson, declare that the following dissertation, except where otherwise stated, is entirely my own work; that it has not previously been submitted as an exercise for a degree, either in Trinity College Dublin, or in any other University; and that the library may lend or copy it or any part thereof on request. May 21, 2015 Matthew Nicholson i Summary This paper covers the design, implementation and evaluation of PHP li- brary to aid in use of the Content Management Interoperability Services (CMIS) standard. The standard attempts to provide a language indepen- dent and platform specific mechanisms to better allow content manage- ment systems (CM systems) work together. There is currently no PHP implementation of CMIS server framework available, at least not widely. The name given to the library is Elaphus CMIS. The implementation should remove a barrier for making PHP CM sys- tems CMIS compliant. Aswell testing how language independent the stan- dard is and look into the features of PHP programming language. The technologies that are the focus of this report are: CMIS A standard that attempts to structure the data within a wide range of CM systems and provide standardised API for interacting with this data.
    [Show full text]
  • Learning React Modern Patterns for Developing React Apps
    Second Edition Learning React Modern Patterns for Developing React Apps Alex Banks & Eve Porcello SECOND EDITION Learning React Modern Patterns for Developing React Apps Alex Banks and Eve Porcello Beijing Boston Farnham Sebastopol Tokyo Learning React by Alex Banks and Eve Porcello Copyright © 2020 Alex Banks and Eve Porcello. All rights reserved. Printed in the United States of America. Published by O’Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472. O’Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or [email protected]. Acquisitions Editor: Jennifer Pollock Indexer: Judith McConville Development Editor: Angela Rufino Interior Designer: David Futato Production Editor: Kristen Brown Cover Designer: Karen Montgomery Copyeditor: Holly Bauer Forsyth Illustrator: Rebecca Demarest Proofreader: Abby Wheeler May 2017: First Edition June 2020: Second Edition Revision History for the Second Edition 2020-06-12: First Release See http://oreilly.com/catalog/errata.csp?isbn=9781492051725 for release details. The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Learning React, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc. The views expressed in this work are those of the authors, and do not represent the publisher’s views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work.
    [Show full text]
  • Elmulating Javascript
    Linköping University | Department of Computer science Master thesis | Computer science Spring term 2016-07-01 | LIU-IDA/LITH-EX-A—16/042--SE Elmulating JavaScript Nils Eriksson & Christofer Ärleryd Tutor, Anders Fröberg Examinator, Erik Berglund Linköpings universitet SE–581 83 Linköping +46 13 28 10 00 , www.liu.se Abstract Functional programming has long been used in academia, but has historically seen little light in industry, where imperative programming languages have been dominating. This is quickly changing in web development, where the functional paradigm is increas- ingly being adopted. While programming languages on other platforms than the web are constantly compet- ing, in a sort of survival of the fittest environment, on the web the platform is determined by the browsers which today only support JavaScript. JavaScript which was made in 10 days is not well suited for building large applications. A popular approach to cope with this is to write the application in another language and then compile the code to JavaScript. Today this is possible to do in a number of established languages such as Java, Clojure, Ruby etc. but also a number of special purpose language has been created. These are lan- guages that are made for building front-end web applications. One such language is Elm which embraces the principles of functional programming. In many real life situation Elm might not be possible to use, so in this report we are going to look at how to bring the benefits seen in Elm to JavaScript. Acknowledgments We would like to thank Jörgen Svensson for the amazing support thru this journey.
    [Show full text]
  • Ecological Momentary Assessments from a Cross-Platform, Native Application
    Bachelor’s Thesis Ecological Momentary Assessments from a Cross-Platform, Native Application Jake Davison July, 2018 supervised by: Dr. Frank Blaauw Dr. Ando Emerencia Contents 1 Introduction 3 1.1 Ecological Momentary Assessments . 3 1.2 u-can-act . 4 1.3 Project . 4 2 Related Work and Background 5 2.1 Mobile EMA Applications . 5 2.1.1 RealLife Exp . 5 2.1.2 mEMA . 5 2.1.3 PIEL Survey . 6 2.1.4 Need for New Application . 6 2.2 Native Mobile Applications From a Single Code Base . 6 2.2.1 React Native . 7 2.2.2 Ionic / Apache Cordova . 8 2.2.3 Choice of Technology . 10 2.3 Domain Specific Languages . 10 2.4 Service Oriented Architectures . 11 3 System Overview 12 3.1 Project Requirements . 12 3.2 System Architecture . 13 3.3 Display of Questions . 14 3.3.1 Range . 14 3.3.2 Checkbox . 15 3.3.3 Radio . 15 3.3.4 Textfield . 15 3.3.5 Time . 15 3.3.6 Raw . 15 3.3.7 Unsubscribe . 15 3.4 Required and Supported Functionalities . 15 4 Implementation 17 4.1 Receiving, Parsing and Displaying Question Data . 17 4.2 Logging In to Receive Questionnaires . 19 4.3 Sending User Input . 20 5 Conclusions 22 5.1 Resulting Software . 22 5.1.1 Single Code Base . 22 5.1.2 Pull in, Parse and Display Questions . 22 5.1.3 Submitting User Input . 23 5.1.4 User Login . 23 5.1.5 Push Notification . 24 1 5.1.6 Applications Running on Device for iOS and Android .
    [Show full text]
  • “They Keep Coming Back Like Zombies”: Improving Software
    “They Keep Coming Back Like Zombies”: Improving Software Updating Interfaces Arunesh Mathur, Josefine Engel, Sonam Sobti, Victoria Chang, and Marshini Chetty, University of Maryland, College Park https://www.usenix.org/conference/soups2016/technical-sessions/presentation/mathur This paper is included in the Proceedings of the Twelfth Symposium on Usable Privacy and Security (SOUPS 2016). June 22–24, 2016 • Denver, CO, USA ISBN 978-1-931971-31-7 Open access to the Proceedings of the Twelfth Symposium on Usable Privacy and Security (SOUPS 2016) is sponsored by USENIX. “They Keep Coming Back Like Zombies”: Improving Software Updating Interfaces Arunesh Mathur Josefine Engel Sonam Sobti [email protected] [email protected] [email protected] Victoria Chang Marshini Chetty [email protected] [email protected] Human–Computer Interaction Lab, College of Information Studies University of Maryland, College Park College Park, MD 20742 ABSTRACT ers”, and “security analysts” only install updates about half Users often do not install security-related software updates, the time more than non-experts [34]. Despite this evidence leaving their devices open to exploitation by attackers. We that there are factors influencing updating behaviors, the are beginning to understand what factors affect this soft- majority of research on software updates focuses on the net- ware updating behavior but the question of how to improve work and systems aspect of delivering updates to end users current software updating interfaces however remains unan- [17, 47, 24, 13, 23]. swered. In this paper, we begin tackling this question by However, a growing number of studies have begun to explore studying software updating behaviors, designing alternative the human side of software updates for Microsoft (MS) Win- updating interfaces, and evaluating these designs.
    [Show full text]
  • Realtime Data Processing at Facebook
    Realtime Data Processing at Facebook Guoqiang Jerry Chen, Janet L. Wiener, Shridhar Iyer, Anshul Jaiswal, Ran Lei Nikhil Simha, Wei Wang, Kevin Wilfong, Tim Williamson, and Serhat Yilmaz Facebook, Inc. ABSTRACT dural language (such as C++ or Java) essential? How Realtime data processing powers many use cases at Face- fast can a user write, test, and deploy a new applica- book, including realtime reporting of the aggregated, anon- tion? ymized voice of Facebook users, analytics for mobile applica- • Performance: How much latency is ok? Milliseconds? tions, and insights for Facebook page administrators. Many Seconds? Or minutes? How much throughput is re- companies have developed their own systems; we have a re- quired, per machine and in aggregate? altime data processing ecosystem at Facebook that handles hundreds of Gigabytes per second across hundreds of data • Fault-tolerance: what kinds of failures are tolerated? pipelines. What semantics are guaranteed for the number of times Many decisions must be made while designing a realtime that data is processed or output? How does the system stream processing system. In this paper, we identify five store and recover in-memory state? important design decisions that affect their ease of use, per- formance, fault tolerance, scalability, and correctness. We • Scalability: Can data be sharded and resharded to pro- compare the alternative choices for each decision and con- cess partitions of it in parallel? How easily can the sys- trast what we built at Facebook to other published systems. tem adapt to changes in volume, both up and down? Our main decision was targeting seconds of latency, not Can it reprocess weeks worth of old data? milliseconds.
    [Show full text]
  • The Compassion Project
    The Compassion Project Mitchell Black, Kyle Melton, and Amelia Getty 25 April 2019 Abstract The Compassion Project is a public, collaborative art installation sourced from approximately 6,500 artists around Bozeman. Each participating artist painted an 8”x8” wooden block with their own interpretation of compassion. To accompany their art, each artist wrote an artist statement defining compassion or explaining how their block relates to compassion. We created a mobile app as a companion to the installation. The primary usage of this app is to allow visitors to lookup the artist statement on their phone from the unique ID number assigned to each block. Additional app functionality includes favoriting pictures, personal user viewing history, and usage statistics. We also performed a comparative analysis of image descriptors for the blocks. We were interested in the idea using an picture to search for that block artist’s statement. We have images of each block to be used as a thumbnail when users look up a block. To evaluate which image descriptors are most effective at grouping the blocks, we took secondary photos of a subset of the blocks and tested whether the descriptors can pair the thumbnail of a block to our photo of a block. 1 Qualifications See attached resumes. 2 Bozeman, MT 59718 ⋄ 406-580-3154⋄ [email protected] ⋄ linkedin.com/in/ameliagetty Amelia Getty ​ ​ Computer Science senior with diverse experience poised to transition to software development or engineering. Organized and dependable. Aptitude to learn quickly and work independently and with a team. PROGRAMMING EXPERIENCE Java ⋄ PHP (Laravel) ⋄ C ⋄ Python ⋄ C++ ⋄ SQL Linux Systems Admin ⋄ Databases ⋄ Graphics ⋄ Networks ⋄ Security (2019) EDUCATION ♢ Computer Science​.​ ​Montana State University​, 2017 - 2019 ♢ BA Modern Languages and Literatures, German​.
    [Show full text]