Full Book As

Total Page:16

File Type:pdf, Size:1020Kb

Full Book As FAIRNESS AND MACHINE LEARNING Limitations and Opportunities Solon Barocas, Moritz Hardt, Arvind Narayanan Created: Wed 16 Jun 2021 01:46:08 PM PDT Contents About the book 5 Why now? 5 How did the book come about? 6 Who is this book for? 6 What’s in this book? 6 About the authors 7 Thanks and acknowledgements 7 Introduction 9 Demographic disparities 11 The machine learning loop 13 The state of society 14 The trouble with measurement 16 From data to models 19 The pitfalls of action 21 Feedback and feedback loops 22 Getting concrete with a toy example 25 Other ethical considerations 28 Our outlook: limitations and opportunities 31 Bibliographic notes and further reading 32 Classification 35 4 Supervised learning 35 Sensitive characteristics 41 Formal non-discrimination criteria 43 Calibration and sufficiency 49 Relationships between criteria 52 Inherent limitations of observational criteria 55 Case study: Credit scoring 59 Problem set: Criminal justice case study 65 Problem set: Data modeling of traffic stops 66 What is the purpose of a fairness criterion? 70 Bibliographic notes and further reading 71 Legal background and normative questions 75 Causality 77 The limitations of observation 78 Causal models 81 Causal graphs 85 Interventions and causal effects 88 Confounding 89 Graphical discrimination analysis 92 Counterfactuals 97 Counterfactual discrimination analysis 103 Validity of causal models 108 Problem set 116 Bibliographic notes and further reading 116 Testing Discrimination in Practice 119 Part 1: Traditional tests for discrimination 120 Audit studies 120 Testing the impact of blinding 124 5 Revealing extraneous factors in decisions 125 Testing the impact of decisions and interventions 127 Purely observational tests 128 Summary of traditional tests and methods 132 Taste-based and statistical discrimination 132 Studies of decision making processes and organizations 134 Part 2: Testing discrimination in algorithmic systems 136 Fairness considerations in applications of natural language processing 137 Demographic disparities and questionable applications of computer vision 139 Search and recommendation systems: three types of harms 141 Understanding unfairness in ad targeting 143 Fairness considerations in the design of online marketplaces 146 Mechanisms of discrimination 148 Fairness criteria in algorithmic audits 149 Information flow, fairness, privacy 151 Comparison of research methods 152 Looking ahead 154 A broader view of discrimination 157 Case study: the gender earnings gap on Uber 157 Three levels of discrimination 161 Machine learning and structural discrimination 165 Structural interventions for fair machine learning 170 Organizational interventions for fairer decision making 175 Appendix: a deeper look at structural factors 184 Datasets 187 A tour of datasets in different domains 188 Roles datasets play 196 Harms associated with data 207 Beyond datasets 211 Summary 219 Chapter notes 220 6 Bibliography 221 1 About the book This book gives a perspective on machine learning that treats fair- ness as a central concern rather than an afterthought. We’ll review the practice of machine learning in a way that highlights ethical challenges. We’ll then discuss approaches to mitigate these problems. We’ve aimed to make the book as broadly accessible as we could, while preserving technical rigor and confronting difficult moral questions that arise in algorithmic decision making. This book won’t have an all-encompassing formal definition of fairness or a quick technical fix to society’s concerns with automated decisions. Addressing issues of fairness requires carefully under- standing the scope and limitations of machine learning tools. This book offers a critical take on current practice of machine learning as well as proposed technical fixes for achieving fairness. It doesn’t offer any easy answers. Nonetheless, we hope you’ll find the book enjoyable and useful in developing a deeper understanding of how to practice machine learning responsibly. Why now? Machine learning has made rapid headway into socio-technical sys- tems ranging from video surveillance to automated resume screening. Simultaneously, there has been heightened public concern about the impact of digital technology on society. These two trends have led to the rapid emergence of fairness, ac- countability, transparency in socio-technical systems as a research field. While exciting, this has led to a proliferation of terminology, re- discovery and simultaneous discovery, conflicts between disciplinary perspectives, and other types of confusion. This book aims to move the conversation forward by synthesizing long-standing bodies of knowledge, such as causal inference, with recent work in the community, sprinkled with a few observations of our own. 8 fairness and machine learning How did the book come about? In the fall semester of 2017, the three authors each taught courses on fairness and ethics in machine learning: Barocas at Cornell, Hardt at Berkeley, and Narayanan at Princeton. We each approached the topic from a different perspective. We also presented two tutorials: Barocas and Hardt at NIPS 2017, and Narayanan at FAT* 2018. This book emerged from the notes we created for these three courses, and is the result of an ongoing dialog between us. Who is this book for? We’ve written this book to be useful for multiple audiences. You might be a student or practitioner of machine learning facing ethical concerns in your daily work. You might also be an ethics scholar looking to apply your expertise to the study of emerging technolo- gies. Or you might be a citizen concerned about how automated systems will shape society, and wanting a deeper understanding than you can get from press coverage. We’ll assume you’re familiar with introductory computer science and algorithms. Knowing how to code isn’t strictly necessary to read the book, but will let you get the most out of it. We’ll also assume you’re familiar with basic statistics and probability. Throughout the book, we’ll include pointers to introductory material on these topics. On the other hand, you don’t need any knowledge of machine learning to read this book: we’ve included an appendix that intro- duces basic machine learning concepts. We’ve also provided a basic discussion of the philosophical and legal concepts underlying fair- 1 These haven’t yet been released. ness.1 What’s in this book? This book is intentionally narrow in scope: you can see an outline 2 This chapter hasn’t yet been released. here. Most of the book is about fairness, but we include a chapter2 that touches upon a few related concepts: privacy, interpretability, explainability, transparency, and accountability. We omit vast swaths of ethical concerns about machine learning and artificial intelligence, including labor displacement due to automation, adversarial machine learning, and AI safety. Similarly, we discuss fairness interventions in the narrow sense of fair decision-making. We acknowledge that interventions may take many other forms: setting better policies, reforming institutions, or upending the basic structures of society. A narrow framing of machine learning ethics might be tempting about the book 9 to technologists and businesses as a way to focus on technical in- terventions while sidestepping deeper questions about power and accountability. We caution against this temptation. For example, mit- igating racial disparities in the accuracy of face recognition systems, while valuable, is no substitute for a debate about whether such sys- tems should be deployed in public spaces and what sort of oversight we should put into place. About the authors Solon Barocas is an Assistant Professor in the Department of Infor- mation Science at Cornell University. His research explores ethical and policy issues in artificial intelligence, particularly fairness in machine learning, methods for bringing accountability to automated decision-making, and the privacy implications of inference. He was previously a Postdoctoral Researcher at Microsoft Research, where he worked with the Fairness, Accountability, Transparency, and Ethics in AI group, as well as a Postdoctoral Research Associate at the Center for Information Technology Policy at Princeton University. Barocas completed his doctorate at New York University, where he remains a visiting scholar at the Center for Urban Science + Progress. Moritz Hardt is an Assistant Professor in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. Hardt investigates algorithms and machine learning with a focus on reliability, validity, and societal impact. After obtaining a PhD in Computer Science from Princeton University, he held positions at IBM Research Almaden, Google Research and Google Brain. Arvind Narayanan is an Associate Professor of Computer Science at Princeton. He studies the risks associated with large datasets about people: anonymity, privacy, and bias. He leads the Princeton Web Transparency and Accountability Project to uncover how companies collect and use our personal information. His doctoral research showed the fundamental limits of de-identification. He co-created a Massive Open Online Course as well as a textbook on Bitcoin and cryptocurrency technologies. Narayanan is a recipient of the Presidential Early Career Award for Scientists and Engineers. Thanks and acknowledgements This book wouldn’t have been possible without the profound contri- butions of our collaborators and the community at large. We are grateful to our students for their
Recommended publications
  • INSTITUTE for COMPUTATIONAL and BIOLOGICAL LEARNING (ICBL)
    ICBL (HANSON, GLUCK, VAPNIK) 1 August 20th 2003 Governor's Commission on Jobs & Economic Growth INSTITUTE FOR COMPUTATIONAL and BIOLOGICAL LEARNING (ICBL) PRIMARY PARTICIPANTS: Stephen José Hanson, Psych. Rutgers-Newark; Cog. Sci. Rutgers-NB, Info. Sci., NJIT; Advanced Imaging Center, UMDNJ (Principal Investigator and Co-Director) Mark A. Gluck, CMBN Rutgers-Newark (co-Principal Investigator and Co-Director) Vladimir Vapnik, NEC (co-Principal Investigator and Co-Director) We also include participants and contact personnel for other participating institutions including Rutgers NB-Piscataway, Rutgers-Camden, UMDNJ, NJIT, Princeton, NEC, ATT, Siemens Corporate Research, and Merck Pharmaceuticals (See attachments that follow this document.) PRIMARY OBJECTIVE: The Institute for Computational and Biological Learning (ICBL) will be an interdisciplinary center for research and training in the learning sciences, encompassing computational, biological,and psychological studies of systems that adapt, extract patterns from high complexity data and in general improve from experience. Core programs include learning theory, neuroinformatics, computational neuroscience of learning and memory, neural-network models, machine learning, human computer interaction and financial/economic modeling. The learning sciences is an emerging interdisciplinary technology, recently recognized by the National Science Foundation (NSF) as a national priority. The NSF announced this year a major multi-disciplinary initiative which will fund approximately five new national Learning Sciences Centers with budgets of $3M to $5M a year for up to 10 years, for a total commitment of up to $50,000,000 per center. As envisioned by NSF, these centers will be “large-scale, long-term Centers that will extend the frontiers of knowledge on learning and create the intellectual, organizational, and physical infrastructure needed for the long-term advancement of learning research.
    [Show full text]
  • Cancel Culture: Posthuman Hauntologies in Digital Rhetoric and the Latent Values of Virtual Community Networks
    CANCEL CULTURE: POSTHUMAN HAUNTOLOGIES IN DIGITAL RHETORIC AND THE LATENT VALUES OF VIRTUAL COMMUNITY NETWORKS By Austin Michael Hooks Heather Palmer Rik Hunter Associate Professor of English Associate Professor of English (Chair) (Committee Member) Matthew Guy Associate Professor of English (Committee Member) CANCEL CULTURE: POSTHUMAN HAUNTOLOGIES IN DIGITAL RHETORIC AND THE LATENT VALUES OF VIRTUAL COMMUNITY NETWORKS By Austin Michael Hooks A Thesis Submitted to the Faculty of the University of Tennessee at Chattanooga in Partial Fulfillment of the Requirements of the Degree of Master of English The University of Tennessee at Chattanooga Chattanooga, Tennessee August 2020 ii Copyright © 2020 By Austin Michael Hooks All Rights Reserved iii ABSTRACT This study explores how modern epideictic practices enact latent community values by analyzing modern call-out culture, a form of public shaming that aims to hold individuals responsible for perceived politically incorrect behavior via social media, and cancel culture, a boycott of such behavior and a variant of call-out culture. As a result, this thesis is mainly concerned with the capacity of words, iterated within the archive of social media, to haunt us— both culturally and informatically. Through hauntology, this study hopes to understand a modern discourse community that is bound by an epideictic framework that specializes in the deconstruction of the individual’s ethos via the constant demonization and incitement of past, current, and possible social media expressions. The primary goal of this study is to understand how these practices function within a capitalistic framework and mirror the performativity of capital by reducing affective human interactions to that of a transaction.
    [Show full text]
  • A Priori Knowledge from Non-Examples
    A priori Knowledge from Non-Examples Diplomarbeit im Fach Bioinformatik vorgelegt am 29. März 2007 von Fabian Sinz Fakultät für Kognitions- und Informationswissenschaften Wilhelm-Schickard-Institut für Informatik der Universität Tübingen in Zusammenarbeit mit Max-Planck-Institut für biologische Kybernetik Tübingen Betreuer: Prof. Dr. Bernhard Schölkopf (MPI für biologische Kybernetik) Prof. Dr. Andreas Schilling (WSI für Informatik) Prof. Dr. Hanspeter Mallot (Biologische Fakultät) 1 Erklärung Die folgende Diplomarbeit wurde ausschließlich von mir (Fabian Sinz) erstellt. Es wurden keine weiteren Hilfmittel oder Quellen außer den Angegebenen benutzt. Tübingen, den 29. März 2007, Fabian Sinz Contents 1 Introduction 4 1.1 Foreword . 4 1.2 Notation . 5 1.3 Machine Learning . 7 1.3.1 Machine Learning . 7 1.3.2 Philosophy of Science in a Nutshell - About the Im- possibility to learn without Assumptions . 11 1.3.3 General Assumptions in Bayesian and Frequentist Learning . 14 1.3.4 Implicit prior Knowledge via additional Data . 17 1.4 Mathematical Tools . 18 1.4.1 Reproducing Kernel Hilbert Spaces . 18 1.5 Regularisation . 22 1.5.1 General regularisation . 22 1.5.2 Data-dependent Regularisation . 27 2 The Universum Algorithm 32 2.1 The Universum Idea . 32 2.1.1 VC Theory and Structural Risk Minimisation (SRM) in a Nutshell . 32 2.1.2 The original Universum Idea by Vapnik . 35 2.2 Implementing the Universum into SVMs . 37 2.2.1 Support Vector Machine Implementations of the Uni- versum . 38 2.2.1.1 Hinge Loss Universum . 38 2.2.1.2 Least Squares Universum . 46 2.2.1.3 Multiclass Universum .
    [Show full text]
  • 2020 Vision: Info Pro Skills for a New Decade
    2020 Vision: Info Pro Skills for a New Decade Search Skills for Today’s Info Pros and Thriving in the New Information Landscape Presented by: Mary Ellen Bates Bates Information Services BatesInfo.com Presented for: Initiative Fortbildung e.V. 9 and 10 May 2019 2020 VISION DAY 1: Search Skills for Today’s Info Pros INSIDE A SEARCHER’S MIND: BRINGING THE DETECTIVE TO THE SEARCH ..........................................1 TECHNIQUES OF A DETECTIVE ......................................................................................................................2 DIFFERENT SEARCH APPROACHES.................................................................................................................3 GETTING CREATIVE....................................................................................................................................5 WHAT’S NEW (OR AT LEAST USEFUL) WITH GOOGLE: TIPS AND TOOLS FOR TODAY’S GOOGLE .........6 GOOGLE TRICKS........................................................................................................................................6 SEARCHING THE DEEP WEB / GREY LITERATURE ................................................................................8 SEARCH STRATEGIES FOR GREY LITERATURE....................................................................................................9 SOME GREY LIT/DEEP WEB TOOLS.............................................................................................................10 GLEANING INSIGHT FROM SOCIAL MEDIA........................................................................................12
    [Show full text]
  • Ciência De Dados Na Ciência Da Informação
    Ciência da Informação v. 49 n.3 set./dez. 2020 ISSN 0100-1965 eISSN 1518-8353 Edição especial temática Special thematic issue / Edición temática especial Ciência de dados na ciência da informação Data science in Information Sience Ciencia de datos en la Ciencia de la Información Instituto Brasileiro de Informação em Ciência e Tecnologia (Ibict) Diretoria Indexação Cecília Leite Oliveira Ciência da Informação tem seus artigos indexados ou resumidos. Coordenação-Geral de Pesquisa e Desenvolvimento de Novos Produtos (CGNP) Bases Internacionais Anderson Luis Cambraia Itaborahy Directory of Open Access Journals - DOAJ. Paschal Thema: Science de L’Information, Documentation. Library and Coordenação-Geral de Pesquisa e Manutenção de Produtos Consolidados (CGPC) Information Science Abstracts. PAIS Foreign Language Bianca Amaro Index. Information Science Abstracts. Library and Literature. Páginas de Contenido: Ciências de la Información. Coordenação-Geral de Tecnologias de Informação e Informática EDUCACCION: Notícias de Educación, Ciencia y Cultura (CGTI) Tiago Emmanuel Nunes Braga Iberoamericanas. Referativnyi Zhurnal: Informatika. ISTA Information Science & Technology Abstracts. LISTA Library, Coordenação de Ensino e Pesquisa, Ciência e Tecnologia da Information Science & Technology Abstracts. SciELO Informação (COEPPE) Scientific Electronic Library On-line. Latindex – Sistema Gustavo Saldanha Regional de Información em Línea para Revistas Científicas Coordenação de Planejamento, Acompanhamento e Avaliação de América Latina el Caribe, España y Portugal, México. (COPAV) INFOBILA: Información Bibliotecológica Latinoamericana. José Luis dos Santos Nascimento Indexação em Bases de Dados Nacionais Coordenação de Administração (COADM) Reginaldo de Araújo Silva Portal de Periódicos LivRe – Portal de Periódicos de Livre Acesso. Comissão Divisão de Editoração Científica Nacional de Energia Nuclear (Cnen). Portal Periódicos Ramón Martins Sodoma da Fonseca da Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (Capes).
    [Show full text]
  • NSA's Efforts to Secure Private-Sector Telecommunications Infrastructure
    Under the Radar: NSA’s Efforts to Secure Private-Sector Telecommunications Infrastructure Susan Landau* INTRODUCTION When Google discovered that intruders were accessing certain Gmail ac- counts and stealing intellectual property,1 the company turned to the National Security Agency (NSA) for help in securing its systems. For a company that had faced accusations of violating user privacy, to ask for help from the agency that had been wiretapping Americans without warrants appeared decidedly odd, and Google came under a great deal of criticism. Google had approached a number of federal agencies for help on its problem; press reports focused on the company’s approach to the NSA. Google’s was the sensible approach. Not only was NSA the sole government agency with the necessary expertise to aid the company after its systems had been exploited, it was also the right agency to be doing so. That seems especially ironic in light of the recent revelations by Edward Snowden over the extent of NSA surveillance, including, apparently, Google inter-data-center communications.2 The NSA has always had two functions: the well-known one of signals intelligence, known in the trade as SIGINT, and the lesser known one of communications security or COMSEC. The former became the subject of novels, histories of the agency, and legend. The latter has garnered much less attention. One example of the myriad one could pick is David Kahn’s seminal book on cryptography, The Codebreakers: The Comprehensive History of Secret Communication from Ancient Times to the Internet.3 It devotes fifty pages to NSA and SIGINT and only ten pages to NSA and COMSEC.
    [Show full text]
  • Cities Within the City: Doityourself Urbanism and the Right to the City
    bs_bs_banner Volume 37.3 May 2013 941–56 International Journal of Urban and Regional Research DOI:10.1111/1468-2427.12053 Cities within the City: Do-It-Yourself Urbanism and the Right to the City KURT IVESON Abstract In many cities around the world we are presently witnessing the growth of, and interest in, a range of micro-spatial urban practices that are reshaping urban spaces. These practices include actions such as: guerrilla and community gardening; housing and retail cooperatives; flash mobbing and other shock tactics; social economies and bartering schemes; ‘empty spaces’ movements to occupy abandoned buildings for a range of purposes; subcultural practices like graffiti/street art, skateboarding and parkour; and more. This article asks: to what extent do such practices constitute a new form of urban politics that might give birth to a more just and democratic city? In answering this question, the article considers these so-called ‘do-it-yourself urbanisms’ from the perspective of the ‘right to the city’. After critically assessing that concept, the article argues that in order for do-it-yourself urbanist practices to generate a wider politics of the city through the appropriation of urban space, they also need to assert new forms of authority in the city based on the equality of urban inhabitants. This claim is illustrated through an analysis of the do-it-yourself practices of Sydney-based activist collective BUGA UP and the New York and Madrid Street Advertising Takeovers. Introduction In many cities around the world we are presently witnessing the growth of, and interest in, a range of micro-spatial urban practices that are reshaping urban spaces.
    [Show full text]
  • Trump's Disinformation 'Magaphone'. Consequences, First Lessons and Outlook
    BRIEFING Trump's disinformation 'magaphone' Consequences, first lessons and outlook SUMMARY The deadly insurrection at the US Capitol on 6 January 2021 was a significant cautionary example of the offline effects of online disinformation and conspiracy theories. The historic democratic crisis this has sparked − adding to a number of other historic crises the US is currently battling − provides valuable lessons not only for the United States, but also for Europe and the democratic world. The US presidential election and its aftermath saw domestic disinformation emerging as a more visible immediate threat than disinformation by third countries. While political violence has been the most tangible physical effect of manipulative information, corrosive conspiracy theories have moved from the fringes to the heart of political debate, normalising extremist rhetoric. At the same time, recent developments have confirmed that the lines between domestic and foreign attempts to undermine democracy are increasingly blurred. While the perceived weaknesses in democratic systems are − unsurprisingly − celebrated as a victory for authoritarian state actors, links between foreign interference and domestic terrorism are under growing scrutiny. The question of how to depolarise US society − one of a long list of challenges facing the Biden Administration − is tied to the polarised media environment. The crackdown by major social media platforms on Donald Trump and his supporters has prompted far-right groups to abandon the established information ecosystem to join right-wing social media. This could further accelerate the ongoing fragmentation of the US infosphere, cementing the trend towards separate realities. Ahead of the proposed Democracy Summit − a key objective of the Biden Administration − tempering the 'sword of democracy' has risen to the top of the agenda on both sides of the Atlantic.
    [Show full text]
  • On Security and Privacy for Networked Information Society
    Antti Hakkala On Security and Privacy for Networked Information Society Observations and Solutions for Security Engineering and Trust Building in Advanced Societal Processes Turku Centre for Computer Science TUCS Dissertations No 225, November 2017 ON SECURITY AND PRIVACY FOR NETWORKED INFORMATIONSOCIETY Observations and Solutions for Security Engineering and Trust Building in Advanced Societal Processes antti hakkala To be presented, with the permission of the Faculty of Mathematics and Natural Sciences of the University of Turku, for public criticism in Auditorium XXII on November 18th, 2017, at 12 noon. University of Turku Department of Future Technologies FI-20014 Turun yliopisto 2017 supervisors Adjunct professor Seppo Virtanen, D. Sc. (Tech.) Department of Future Technologies University of Turku Turku, Finland Professor Jouni Isoaho, D. Sc. (Tech.) Department of Future Technologies University of Turku Turku, Finland reviewers Professor Tuomas Aura Department of Computer Science Aalto University Espoo, Finland Professor Olaf Maennel Department of Computer Science Tallinn University of Technology Tallinn, Estonia opponent Professor Jarno Limnéll Department of Communications and Networking Aalto University Espoo, Finland The originality of this thesis has been checked in accordance with the University of Turku quality assurance system using the Turnitin OriginalityCheck service ISBN 978-952-12-3607-5 (Online) ISSN 1239-1883 To my wife Maria, I am forever grateful for everything. Thank you. ABSTRACT Our society has developed into a networked information soci- ety, in which all aspects of human life are interconnected via the Internet — the backbone through which a significant part of communications traffic is routed. This makes the Internet ar- guably the most important piece of critical infrastructure in the world.
    [Show full text]
  • Researching Violent Extremism the State of Play
    Researching Violent Extremism The State of Play J.M. Berger RESOLVE NETWORK | JUNE 2019 Researching Violent Extremism Series The views in this publication are those of the author. They do not necessarily reflect the views of the RESOLVE Network, its partners, the U.S. Institute of Peace, or any U.S. government agency. 2 RESOLVE NETWORK RESEARCH REPORT NO. 1 | LAKE CHAD BASIN RESEARCH SERIES The study of violent extremism is entering a new phase, with shifts in academic focus and policy direction, as well as a host of new and continuing practical and ethical challenges. This chapter will present an overview of the challenges ahead and discuss some strategies for improving the state of research. INTRODUCTION The field of terrorism studies has vastly expanded over the last two decades. As an illustrative example, the term “terrorism” has been mentioned in an average of more than 60,000 Google Scholar results per year since 2010 alone, including academic publications and cited works. While Google Scholar is an admittedly imprecise tool, the index provides some insights on relative trends in research on terrorism. While almost 5,000 publications indexed on Google Scholar address, to a greater or lesser extent, the question of “root causes of terrorism, at the beginning of this marathon of output, which started soon after the terrorist attacks of September 11, 2001, only a fractional number of indexed publications addressed “extremism.” Given the nature of terrorist movements, however, this should be no great mys- tery. The root cause of terrorism is extremism. Figure 1: Google Scholar Results per Year1 Publications mentioning terrorism Publications mentioning extremism 90,000 80,000 70,000 60,000 50,000 40,000 30,000 20,000 Google Scholar Results 10,000 0 Year 1 Google Scholar, accessed May 30, 2019.
    [Show full text]
  • As Facebook Raised a Privacy Wall, It Carved an Opening for Tec
    As Facebook Raised a Privacy Wall, It Carved an Opening for Tec... https://www.printfriendly.com/p/g/FEuXeA As Facebook Raised a Privacy Wall, It Carved an Opening for Tech Giants nytimes.com/2018/12/18/technology/facebook-privacy.html By Gabriel J.X. Dance , Michael LaForgia and Nicholas Confessore December 19, 2018 For years, gave some of the world’s largest technology companies more intrusive access to users’ personal data than it has disclosed, effectively exempting those business partners from its usual privacy rules, according to internal records and interviews. The special arrangements are detailed in hundreds of pages of Facebook documents obtained by The New York Times. The records, generated in 2017 by the company’s internal system for tracking partnerships, provide the most complete picture yet of the social network’s data-sharing practices. They also underscore how personal data has become the most prized commodity of the digital age, traded on a vast scale by some of the most powerful companies in Silicon Valley and beyond. The exchange was intended to benefit everyone. Pushing for explosive growth, Facebook got more users, lifting its advertising revenue. Partner companies acquired features to make their products more attractive. Facebook users connected with friends across different devices and websites. But Facebook also assumed extraordinary power over the personal information of its 2.2 billion users — control it has wielded with little transparency or outside oversight. Facebook allowed Microsoft’s Bing search engine to see the names of virtually all Facebook users’ friends without consent, the records show, and gave Netflix and Spotify the ability to read Facebook users’ private messages.
    [Show full text]
  • Cc5212-1 Procesamiento Masivo De Datos Otoño 2020
    CC5212-1 PROCESAMIENTO MASIVO DE DATOS OTOÑO 2020 Lecture 4.5 Projects, Practice with Pig/Hadoop Aidan Hogan [email protected] Course Marking (Revised) • 75% for Weekly Labs (~9% a lab) – 4/4 obligatory, 4/7 optional • 25% for Class Project • Need to pass in overall grade Assignments each week Hands-on each week! Working in groups Working in groups! CLASS PROJECTS Class Project • Done in threes • Goal: Use what you’ve learned to do something cool/fun (hopefully) • Process: – Form groups of three (in the forum, before April 30th) – On April 30th we will assign the rest automatically – Start thinking up topics / find interesting datasets! – Register topic (deadline around May 21st) – Work on projects during semester – Deliverables will due be around week 13 • Deliverables: 4 minute presentation (video) & short report • Marked on: Difficulty, appropriateness, scale, good use of techniques, presentation, coolness, creativity, value – Ambition is appreciated, even if you don’t succeed Desiderata for project • Must focus around some technique from the course! • Expected difficulty: similar to a lab, but without any instructions • Data not too small: – Should have >250,000 tuples/entries • Data not too large: – Should have <1,000,000,000 tuples/entries – If very large, perhaps take a sample? • In case of COVID-19 data, we can make exceptions Where to find/explore data? • Kaggle: – https://www.kaggle.com/ • Google Dataset Search: – https://datasetsearch.research.google.com/ • Datos Abiertos de Chile: – https://datos.gob.cl/ – https://es.datachile.io/ • … PRACTICE WITH HADOOP/PIG Practice with Hadoop • Optional Assignment 1 (not evaluated): – Hadoop: Find the number of good movies in which each actor/actresses has starred.
    [Show full text]