
FAIRNESS AND MACHINE LEARNING

Limitations and Opportunities

Solon Barocas, Moritz Hardt, Arvind Narayanan

Created: Wed 16 Jun 2021 01:46:08 PM PDT

Contents

About the book
  Why now?
  How did the book come about?
  Who is this book for?
  What’s in this book?
  About the authors
  Thanks and acknowledgements

Introduction
  Demographic disparities
  The machine learning loop
  The state of society
  The trouble with measurement
  From data to models
  The pitfalls of action
  Feedback and feedback loops
  Getting concrete with a toy example
  Other ethical considerations
  Our outlook: limitations and opportunities
  Bibliographic notes and further reading

Classification
  Supervised learning
  Sensitive characteristics
  Formal non-discrimination criteria
  Calibration and sufficiency
  Relationships between criteria
  Inherent limitations of observational criteria
  Case study: Credit scoring
  Problem set: Criminal justice case study
  Problem set: Data modeling of traffic stops
  What is the purpose of a fairness criterion?
  Bibliographic notes and further reading

Legal background and normative questions

Causality
  The limitations of observation
  Causal models
  Causal graphs
  Interventions and causal effects
  Confounding
  Graphical discrimination analysis
  Counterfactuals
  Counterfactual discrimination analysis
  Validity of causal models
  Problem set
  Bibliographic notes and further reading

Testing Discrimination in Practice
  Part 1: Traditional tests for discrimination
  Audit studies
  Testing the impact of blinding
  Revealing extraneous factors in decisions
  Testing the impact of decisions and interventions
  Purely observational tests
  Summary of traditional tests and methods
  Taste-based and statistical discrimination
  Studies of decision making processes and organizations
  Part 2: Testing discrimination in algorithmic systems
  Fairness considerations in applications of natural language processing
  Demographic disparities and questionable applications of computer vision
  Search and recommendation systems: three types of harms
  Understanding unfairness in ad targeting
  Fairness considerations in the design of online marketplaces
  Mechanisms of discrimination
  Fairness criteria in algorithmic audits
  Information flow, fairness, privacy
  Comparison of research methods
  Looking ahead

A broader view of discrimination
  Case study: the gender earnings gap on Uber
  Three levels of discrimination
  Machine learning and structural discrimination
  Structural interventions for fair machine learning
  Organizational interventions for fairer decision making
  Appendix: a deeper look at structural factors

Datasets
  A tour of datasets in different domains
  Roles datasets play
  Harms associated with data
  Beyond datasets
  Summary
  Chapter notes

Bibliography

1 About the book

This book gives a perspective on machine learning that treats fairness as a central concern rather than an afterthought. We’ll review the practice of machine learning in a way that highlights ethical challenges. We’ll then discuss approaches to mitigate these problems.

We’ve aimed to make the book as broadly accessible as we could, while preserving technical rigor and confronting difficult moral questions that arise in algorithmic decision making.

This book won’t have an all-encompassing formal definition of fairness or a quick technical fix to society’s concerns with automated decisions. Addressing issues of fairness requires carefully understanding the scope and limitations of machine learning tools. This book offers a critical take on current practice of machine learning as well as proposed technical fixes for achieving fairness. It doesn’t offer any easy answers. Nonetheless, we hope you’ll find the book enjoyable and useful in developing a deeper understanding of how to practice machine learning responsibly.

Why now?

Machine learning has made rapid headway into socio-technical systems ranging from video surveillance to automated resume screening. Simultaneously, there has been heightened public concern about the impact of digital technology on society.

These two trends have led to the rapid emergence of fairness, accountability, and transparency in socio-technical systems as a research field. While exciting, this has led to a proliferation of terminology, rediscovery and simultaneous discovery, conflicts between disciplinary perspectives, and other types of confusion.

This book aims to move the conversation forward by synthesizing long-standing bodies of knowledge, such as causal inference, with recent work in the community, sprinkled with a few observations of our own.

How did the book come about?

In the fall semester of 2017, the three authors each taught courses on fairness and ethics in machine learning: Barocas at Cornell, Hardt at Berkeley, and Narayanan at Princeton. We each approached the topic from a different perspective. We also presented two tutorials: Barocas and Hardt at NIPS 2017, and Narayanan at FAT* 2018. This book emerged from the notes we created for these three courses, and is the result of an ongoing dialog between us.

Who is this book for?

We’ve written this book to be useful for multiple audiences. You might be a student or practitioner of machine learning facing ethical concerns in your daily work. You might also be an ethics scholar looking to apply your expertise to the study of emerging technologies. Or you might be a citizen concerned about how automated systems will shape society, and wanting a deeper understanding than you can get from press coverage.

We’ll assume you’re familiar with introductory computer science and algorithms. Knowing how to code isn’t strictly necessary to read the book, but will let you get the most out of it. We’ll also assume you’re familiar with basic statistics and probability. Throughout the book, we’ll include pointers to introductory material on these topics.

On the other hand, you don’t need any knowledge of machine learning to read this book: we’ve included an appendix that introduces basic machine learning concepts. We’ve also provided a basic discussion of the philosophical and legal concepts underlying fairness.[1]

[1] These haven’t yet been released.

What’s in this book?

This book is intentionally narrow in scope: you can see an outline here. Most of the book is about fairness, but we include a chapter[2] that touches upon a few related concepts: privacy, interpretability, explainability, transparency, and accountability. We omit vast swaths of ethical concerns about machine learning and artificial intelligence, including labor displacement due to automation, adversarial machine learning, and AI safety.

[2] This chapter hasn’t yet been released.

Similarly, we discuss fairness interventions in the narrow sense of fair decision-making. We acknowledge that interventions may take many other forms: setting better policies, reforming institutions, or upending the basic structures of society.

A narrow framing of machine learning ethics might be tempting to technologists and businesses as a way to focus on technical interventions while sidestepping deeper questions about power and accountability. We caution against this temptation. For example, mitigating racial disparities in the accuracy of face recognition systems, while valuable, is no substitute for a debate about whether such systems should be deployed in public spaces and what sort of oversight we should put into place.

About the authors

Solon Barocas is an Assistant Professor in the Department of Information Science at Cornell University. His research explores ethical and policy issues in artificial intelligence, particularly fairness in machine learning, methods for bringing accountability to automated decision-making, and the privacy implications of inference. He was previously a Postdoctoral Researcher at Microsoft Research, where he worked with the Fairness, Accountability, Transparency, and Ethics in AI group, as well as a Postdoctoral Research Associate at the Center for Information Technology Policy at Princeton University. Barocas completed his doctorate at New York University, where he remains a visiting scholar at the Center for Urban Science + Progress.

Moritz Hardt is an Assistant Professor in the Department of Electrical Engineering and Computer Sciences at the University of California, Berkeley. Hardt investigates algorithms and machine learning with a focus on reliability, validity, and societal impact. After obtaining a PhD in Computer Science from Princeton University, he held positions at IBM Research Almaden, Google Research, and Google Brain.

Arvind Narayanan is an Associate Professor of Computer Science at Princeton. He studies the risks associated with large datasets about people: anonymity, privacy, and bias. He leads the Princeton Web Transparency and Accountability Project to uncover how companies collect and use our personal information. His doctoral research showed the fundamental limits of de-identification. He co-created a Massive Open Online Course as well as a textbook on Bitcoin and cryptocurrency technologies. Narayanan is a recipient of the Presidential Early Career Award for Scientists and Engineers.

Thanks and acknowledgements

This book wouldn’t have been possible without the profound contributions of our collaborators and the community at large. We are grateful to our students for their active participation in pilot courses at Berkeley, Cornell, and Princeton.

Thanks in particular to Claudia Roberts for lecture notes of the Princeton course. Special thanks to Katherine Yen for editorial and technical help with the book. Moritz Hardt is indebted to Cynthia Dwork for introducing him to the topic of this book during a formative internship in 2010.

We benefitted from substantial discussions, feedback and comments from Rediet Abebe, Andrew Brunskill, Aylin Caliskan, Frances Ding, Michaela Hardt, Lily Hu, Ben Hutchinson, Lauren Kaplan, Niki Kilbertus, Kathy Kleiman, Issa Kohler-Hausmann, Eric Lawrence, Zachary Lipton, Lydia T. Liu, John Miller, Smitha Milli, Shira Mitchell, Robert Netzorg, Juan Carlos Perdomo, Claudia Roberts, Olga Russakovsky, Matthew J. Salganik, Carsten Schwemmer, Ludwig Schmidt, Matthew Sun, Angelina Wang, Annette Zimmermann, Tijana Zrnic.

2 Introduction

Our success, happiness, and wellbeing are never fully of our own making. Others’ decisions can profoundly affect the course of our lives: whether to admit us to a particular school, offer us a job, or grant us a mortgage. Arbitrary, inconsistent, or faulty decision-making thus raises serious concerns because it risks limiting our ability to achieve the goals that we have set for ourselves and access the opportunities for which we are qualified.

So how do we ensure that these decisions are made the right way and for the right reasons? While there’s much to value in fixed rules, applied consistently, good decisions take available evidence into account. We expect admissions, employment, and lending decisions to rest on factors that are relevant to the outcome of interest.

Identifying details that are relevant to a decision might happen informally and without much thought: employers might observe that people who study math seem to perform particularly well in the financial industry. But they could test these observations against historical evidence by examining the degree to which one’s major correlates with success on the job. This is the traditional work of statistics—and it promises to provide a more reliable basis for decision-making by quantifying how much weight to assign certain details in our determinations.

Decades of research have compared the accuracy of statistical models to the judgments of humans, even experts with years of experience, and found that in many situations data-driven decisions trounce those based on intuition or expertise.[3] These results have been welcomed as a way to ensure that the high-stakes decisions that shape our life chances are both accurate and fair.

[3] Dawes, Faust, and Meehl, “Clinical Versus Actuarial Judgment,” Science 243, no. 4899 (1989): 1668–74.

Machine learning promises to bring greater discipline to decision-making because it offers to uncover factors that are relevant to decision-making that humans might overlook, given the complexity or subtlety of the relationships in historical evidence.

Rather than starting with some intuition about the relationship between certain factors and an outcome of interest, machine learning lets us defer the question of relevance to the data themselves: which factors—among all that we have observed—bear a statistical relationship to the outcome.

Uncovering patterns in historical evidence can be even more powerful than this might seem to suggest. Recent breakthroughs in computer vision—specifically object recognition—reveal just how much pattern-discovery can achieve. In this domain, machine learning has helped to overcome a strange fact of human cognition: while we may be able to effortlessly identify objects in a scene, we are unable to specify the full set of rules that we rely upon to make these determinations. We cannot hand code a program that exhaustively enumerates all the relevant factors that allow us to recognize objects from every possible perspective or in all their potential visual configurations. Machine learning aims to solve this problem by abandoning the attempt to teach a computer through explicit instruction in favor of a process of learning by example. By exposing the computer to many examples of images containing pre-identified objects, we hope the computer will learn the patterns that reliably distinguish different objects from one another and from the environments in which they appear.

This can feel like a remarkable achievement, not only because computers can now execute complex tasks but also because the rules for deciding what appears in an image seem to emerge from the data themselves.

But there are serious risks in learning from examples. Learning is not a process of simply committing examples to memory. Instead, it involves generalizing from examples: honing in on those details that are characteristic of (say) cats in general, not just the specific cats that happen to appear in the examples. This is the process of induction: drawing general rules from specific examples—rules that effectively account for past cases, but also apply to future, as yet unseen cases, too. The hope is that we’ll figure out how future cases are likely to be similar to past cases, even if they are not exactly the same.

This means that reliably generalizing from historical examples to future cases requires that we provide the computer with good examples: a sufficiently large number of examples to uncover subtle patterns; a sufficiently diverse set of examples to showcase the many different types of appearances that objects might take; a sufficiently well-annotated set of examples to furnish machine learning with reliable ground truth; and so on. Thus, evidence-based decision-making is only as reliable as the evidence on which it is based, and high quality examples are critically important to machine learning.

The fact that machine learning is “evidence-based” by no means ensures that it will lead to accurate, reliable, or fair decisions. This is especially true when using machine learning to model human behavior and characteristics. Our historical examples of the relevant outcomes will almost always reflect historical prejudices against certain social groups, prevailing cultural stereotypes, and existing demographic inequalities. And finding patterns in these data will often mean replicating these very same dynamics.

We write this book as machine learning begins to play a role in especially consequential decision-making. In the criminal justice system, defendants are assigned statistical scores that are intended to predict the risk of committing future crimes, and these scores inform decisions about bail, sentencing, and parole. In the commercial sphere, firms use machine learning to analyze and filter resumes of job applicants. And statistical methods are of course the bread and butter of lending, credit, and insurance underwriting.

At the same time, machine learning powers everyday applications that might seem frivolous in comparison but collectively have a powerful effect on shaping our culture: search engines, news recommenders, and ad targeting algorithms influence our information diet and our worldviews; chatbots and social recommendation engines mediate our interactions with the world.

This book is an attempt to survey the risks in these and many other applications of machine learning, and to provide a critical review of an emerging set of proposed solutions. It will show how even well-intentioned applications of machine learning might give rise to objectionable results. And it will introduce formal methods for characterizing these problems and assess various computational methods for addressing them.

Demographic disparities

Amazon uses a data-driven system to determine the neighborhoods in which to offer free same-day delivery.[4] A 2016 study found stark disparities in the demographic makeup of these neighborhoods: in many U.S. cities, white residents were more than twice as likely as Black residents to live in one of the qualifying neighborhoods.[5]

[4] We don’t know the details of how Amazon’s system works, and in particular we don’t know to what extent it uses machine learning. The same is true of many other systems reported on in the press. Nonetheless, we’ll use these as motivating examples when a machine learning system for the task at hand would plausibly show the same behavior.
[5] Ingold and Soper, “Amazon Doesn’t Consider the Race of Its Customers. Should It?” (https://www.bloomberg.com/graphics/2016-amazon-same-day/, 2016).

In Chapter 2 we’ll see how to make our intuition about demographic disparities mathematically precise, and we’ll see that there are many possible ways of measuring these inequalities. The pervasiveness of such disparities in machine learning applications is a key concern of this book.

When we observe disparities, it doesn’t imply that the designer of the system intended for such inequalities to arise.

Looking beyond intent, it’s important to understand when observed disparities can be considered to be discrimination. In turn, two key questions to ask are whether the disparities are justified and whether they are harmful. These questions rarely have simple answers, but the extensive literature on discrimination in philosophy and sociology can help us reason about them.

To understand why the racial disparities in Amazon’s system might be harmful, we must keep in mind the history of racial prejudice in the United States, its relationship to geographic segregation and disparities, and the perpetuation of those inequalities over time. Amazon argued that its system was justified because it was designed based on efficiency and cost considerations and that race wasn’t an explicit factor. Nonetheless, it has the effect of providing different opportunities to consumers at racially disparate rates. The concern is that this might contribute to the perpetuation of long-lasting cycles of inequality. If, instead, the system had been found to be partial to ZIP codes ending in an odd digit, it would not have triggered a similar outcry.

The term bias is often used to refer to demographic disparities in algorithmic systems that are objectionable for societal reasons. We’ll avoid using this sense of the word bias in this book, since it means different things to different people.

There’s a more traditional use of the term bias in statistics and machine learning. Suppose that Amazon’s estimates of delivery dates/times were consistently too early by a few hours. This would be a case of statistical bias. A statistical estimator is said to be biased if its expected or average value differs from the true value that it aims to estimate. Statistical bias is a fundamental concept in statistics, and there is a rich set of established techniques for analyzing and avoiding it.

There are many other measures that quantify desirable statistical properties of a predictor or an estimator, such as precision, recall, and calibration. These are similarly well understood; none of them require any knowledge of social groups and are relatively straightforward to measure.

The attention to demographic criteria in statistics and machine learning is a relatively new direction. This reflects a change in how we conceptualize machine learning systems and the responsibilities of those building them. Is our goal to faithfully reflect the data? Or do we have an obligation to question the data, and to design our systems to conform to some notion of equitable behavior, regardless of whether or not that’s supported by the data currently available to us? These perspectives are often in tension, and the difference between them will become clearer when we delve into stages of machine learning.
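To make the statistical sense of the term concrete, here is a minimal sketch in Python; the delivery-time numbers are invented for illustration and have nothing to do with Amazon’s actual system. The estimator’s average error is nonzero, which is all that statistical bias means; no information about social groups is involved.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical true delivery times in hours (numbers invented for illustration).
true_times = rng.normal(loc=24.0, scale=3.0, size=10_000)

# A hypothetical estimator that runs consistently about two hours early.
estimates = true_times - 2.0 + rng.normal(scale=1.0, size=true_times.size)

# Statistical bias: the average difference between estimate and true value.
bias = (estimates - true_times).mean()
print(f"estimated bias: {bias:+.2f} hours")  # close to -2.00
```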

The machine learning loop

Let’s study the pipeline of machine learning and understand how demographic disparities propagate through it. This approach lets us glimpse into the black box of machine learning and will prepare us for the more detailed analyses in later chapters. Studying the stages of machine learning is crucial if we want to intervene to minimize disparities. The figure below shows the stages of a typical system that produces outputs using machine learning. Like any such diagram, it is a simplification, but it is useful for our purposes.

Figure 1: The machine learning loop

The first stage is measurement, which is the process by which the state of the world is reduced to a set of rows, columns, and values in a dataset. It’s a messy process, because the real world is messy. The term measurement is misleading, evoking an image of a dispassionate scientist recording what she observes, whereas we’ll see that it requires subjective human decisions.

The ‘learning’ in machine learning refers to the next stage, which is to turn that data into a model. A model summarizes the patterns in the training data; it makes generalizations. A model could be trained using supervised learning via an algorithm such as Support Vector Machines, or using unsupervised learning via an algorithm such as k-means clustering. It could take many forms: a hyperplane or a set of regions in n-dimensional space, or a set of distributions. It is typically represented as a set of weights or parameters.

The next stage is the action we take based on the model’s predictions, which are applications of the model to new, unseen inputs. ‘Prediction’ is another misleading term—while it does sometimes involve trying to predict the future (“is this patient at high risk for cancer?”), usually it doesn’t. It can take the form of classification (determine whether a piece of email is spam), regression (assigning risk scores to defendants), or information retrieval (finding documents that best match a search query).

The corresponding actions in these three applications might be: depositing the email in the user’s inbox or spam folder, deciding whether to set bail for the defendant’s pre-trial release, and displaying the retrieved search results to the user. They may differ greatly in their significance to the individual, but they have in common that the collective responses of individuals to these decisions alter the state of the world—that is, the underlying patterns that the system aims to model.

Some machine learning systems record feedback from users (how users react to actions) and use them to refine the model. For example, search engines track what users click on as an implicit signal of relevance or quality. Feedback can also occur unintentionally, or even adversarially; these are more problematic, as we’ll explore later in this chapter.
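As a rough sketch of the loop just described, the following Python snippet walks through measurement, learning, prediction, action, and feedback on entirely synthetic data; the spam-filtering framing and every number in it are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Measurement: the state of the world reduced to rows, columns, and values.
# Here both the features and the labels are synthetic stand-ins for email data.
X_train = rng.normal(size=(500, 3))
y_train = (X_train[:, 0] + 0.5 * X_train[:, 1]
           + rng.normal(scale=0.5, size=500) > 0).astype(int)

# Learning: turn the data into a model, represented by a set of weights.
model = LogisticRegression().fit(X_train, y_train)

# Prediction and action: apply the model to new, unseen inputs and act on
# the result, e.g. route a message to the spam folder or the inbox.
X_new = rng.normal(size=(20, 3))
actions = np.where(model.predict(X_new) == 1, "spam folder", "inbox")

# Feedback: user reactions (e.g. "not spam" clicks) become new labels and
# flow back into the training data, closing the loop.
user_labels = (X_new[:, 0] > 0).astype(int)
X_train = np.vstack([X_train, X_new])
y_train = np.concatenate([y_train, user_labels])
model = LogisticRegression().fit(X_train, y_train)
print(actions[:5], user_labels[:5])
```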

The state of society

In this book, we’re concerned with applications of machine learning that involve data about people. In these applications, the available training data will likely encode the demographic disparities that exist in our society. For example, the figure shows the gender breakdown of a sample of occupations in the United States, based on data released by the Bureau of Labor Statistics for the year 2017.[6] Unsurprisingly, many occupations have stark gender imbalances. If we’re building a machine learning system that screens job candidates, we should be keenly aware that this is the baseline we’re starting from. It doesn’t necessarily mean that the outputs of our system will be inaccurate or discriminatory, but throughout this chapter we’ll see how it complicates things.

[6] The percentage of women in a sample of occupations in the United States. The area of the bubble represents the number of workers.

Why do these disparities exist? There are many potentially contributing factors, including a history of explicit discrimination, implicit attitudes and stereotypes about gender, and differences in the distribution of certain characteristics by gender. We’ll see that even in the absence of explicit discrimination, stereotypes can be self-fulfilling and persist for a long time in society. As we integrate machine learning into decision-making, we should be careful to ensure that ML doesn’t become a part of this feedback loop.

What about applications that aren’t about people? Consider “Street Bump,” a project by the city of Boston to crowdsource data on potholes. The smartphone app automatically detects potholes using data from the smartphone’s sensors and sends the data to the city. Infrastructure seems like a comfortably boring application of data-driven decision-making, far removed from the ethical quandaries we’ve been discussing.

And yet! Kate Crawford points out that the data reflect the patterns of smartphone ownership, which are higher in wealthier parts of the city compared to lower-income areas and areas with large elderly populations.[7]

[7] Crawford, “The Hidden Biases in Big Data,” Harvard Business Review 1 (2013).

The lesson here is that it’s rare for machine learning applications to not be about people. In the case of Street Bump, the data is collected by people, and hence reflects demographic disparities; besides, the reason we’re interested in improving infrastructure in the first place is its effect on people’s lives.

To drive home the point that most machine learning applications involve people, we analyzed Kaggle, a well-known platform for data science competitions. We focused on the top 30 competitions sorted by prize amount. In 14 of these competitions, we observed that the task is to make decisions about individuals. In most of these cases, there exist societal stereotypes or disparities that may be perpetuated by the application of machine learning. For example, the Automated Essay Scoring task[8] seeks algorithms that attempt to match the scores of human graders of student essays. Students’ linguistic choices are signifiers of social group membership, and human graders are known to sometimes have prejudices based on such factors.[9] Thus, because human graders must provide the original labels, automated grading systems risk enshrining any such biases that are captured in the training data.

[8] Kaggle, “The Hewlett Foundation: Automated Essay Scoring” (https://www.kaggle.com/c/asap-aes, 2012).
[9] Hanna and Linden, “Discrimination in Grading,” American Economic Journal: Economic Policy 4, no. 4 (2012): 146–68; Sprietsma, “Discrimination in Grading: Experimental Evidence from Primary School Teachers,” Empirical Economics 45, no. 1 (2013): 523–38.

In a further 5 of the 30 competitions, the task did not call for making decisions about people, but decisions made using the model would nevertheless directly impact people. For example, one competition sponsored by real-estate company Zillow calls for improving the company’s “Zestimate” algorithm for predicting home sale prices. Any system that predicts a home’s future sale price (and publicizes these predictions) is likely to create a self-fulfilling feedback loop in which homes predicted to have lower sale prices deter future buyers, suppressing demand and lowering the final sale price.

In 9 of the 30 competitions, we did not find an obvious, direct impact on people, such as a competition on predicting ocean health (of course, even such competitions have indirect impacts on people, due to actions that we might take on the basis of the knowledge gained). In two cases, we didn’t have enough information to make a determination.

To summarize, human society is full of demographic disparities, and training data will likely reflect these. We’ll now turn to the process by which training data is constructed, and see that things are even trickier.

The trouble with measurement

The term measurement suggests a straightforward process, calling to mind a camera objectively recording a scene. In fact, measurement is fraught with subjective decisions and technical difficulties.

Consider a seemingly straightforward task: measuring the demographic diversity of college campuses. A recent New York Times article aimed to do just this, and was titled “Even With Affirmative Action, Blacks and Hispanics Are More Underrepresented at Top Colleges Than 35 Years Ago.”[10] The authors argue that the gap between enrolled Black and Hispanic freshmen and the Black and Hispanic college-age population has grown over the past 35 years. To support their claim, they present demographic information for more than 100 American universities and colleges from the year 1980 to 2015, and show how the percentages of Black, Hispanic, Asian, White, and multiracial students have changed over the years.

[10] Ashkenas, Park, and Pearce, “Even with Affirmative Action, Blacks and Hispanics Are More Underrepresented at Top Colleges Than 35 Years Ago” (https://www.nytimes.com/interactive/2017/08/24/us/affirmative-action.html, 2017).

Interestingly, the multiracial category was only recently introduced in 2008, but the comparisons in the article ignore the introduction of this new category. How many students who might have checked the “White” or “Black” box checked the “multiracial” box instead? How might this have affected the percentages of “White” and “Black” students at these universities? Furthermore, individuals’ and society’s conception of race changes over time. Would a person with Black and Latino parents be more inclined to self-identify as Black in 2015 than in the 1980s? The point is that even a seemingly straightforward question about trends in demographic diversity is impossible to answer without making some assumptions, and illustrates the difficulties of measurement in a world that resists falling neatly into a set of checkboxes. Race is not a stable category; how we measure race often changes how we conceive of it, and changing conceptions of race may force us to alter what we measure.

To be clear, this situation is typical: measuring almost any attribute about people is similarly subjective and challenging. If anything, things are more chaotic when machine learning researchers have to create categories, as is often the case.

One area where machine learning practitioners often have to define new categories is in defining the target variable.[11] This is the outcome that we’re trying to predict – will the defendant recidivate if released on bail? Will the candidate be a good employee if hired? And so on.

[11] Barocas and Selbst, “Big Data’s Disparate Impact,” UCLA Law Review, 2016.

Biases in the training set’s target variable are especially critical, because they are guaranteed to bias the predictions (not necessarily so with other attributes).

But the target variable is arguably the hardest from a measurement standpoint, because it is often a construct that is made up for the purposes of the problem at hand rather than one that is widely understood and measured. For example, “creditworthiness” is a construct that was created in the context of the problem of how to successfully extend credit to consumers;[12] it is not an intrinsic property that people either possess or lack.

[12] Barocas and Selbst.

If our target variable is the idea of a “good employee,” we might use performance review scores to quantify it. This means that our data inherits any biases present in managers’ evaluations of their reports. Another example: the use of computer vision to automatically rank people’s physical attractiveness.[13] The training data consists of human evaluation of attractiveness, and, unsurprisingly, all these classifiers showed a preference for lighter skin.

[13] Plaugic, “FaceApp’s Creator Apologizes for the App’s Skin-Lightening ’Hot’ Filter” (The Verge. https://www.theverge.com/2017/4/25/15419522/faceapp-hot-filter-racist-apology, 2017); Manthorpe, “The Beauty.AI Robot Beauty Contest Is Back” (Wired UK. https://www.wired.co.uk/article/robot-beauty-contest-beauty-ai, 2017).

In some cases we might be able to get closer to a more objective definition for a target variable, at least in principle. For example, in criminal risk assessment, the training data is not judges’ decisions on who should get bail, but rather based on who actually went on to commit a crime. But there’s at least one big caveat—we can’t really measure who committed a crime, so we use arrests as a proxy. This replaces the biases of judges with the biases of policing.

On the other hand, if our target variable is whether the defendant appears or fails to appear in court for trial, we would be able to measure it directly with perfect accuracy. That said, we may still have concerns about a system that treats defendants differently based on predicted probability of appearance, given that some reasons for failing to appear are less objectionable than others (trying to hold down a job that would not allow for time off versus trying to avoid prosecution).

In hiring, instead of relying on performance reviews for (say) a sales job, we might rely on the number of sales closed. But is that an objective measurement or is it subject to the biases of the potential customers (who might respond more positively to certain salespeople than others) and workplace conditions (which might be a hostile environment for some, but not others)?

In some applications, researchers repurpose an existing scheme of classification to define the target variable rather than creating one from scratch. For example, an object recognition system can be created by training a classifier on ImageNet, a database of images organized in a hierarchy of concepts.[14] ImageNet’s hierarchy comes from Wordnet, a database of words, categories, and the relationships among them.[15] Wordnet’s authors in turn imported the word lists from a number of older sources, such as thesauri. As a result, WordNet (and ImageNet) categories contain numerous outmoded words and associations, such as occupations that no longer exist and stereotyped gender associations.

[14] Deng et al., “ImageNet: A Large-Scale Hierarchical Image Database,” in Proc. CVPR, 2009.
[15] Miller, “WordNet: A Lexical Database for English,” Communications of the ACM 38, no. 11 (1995): 39–41.

Thus, ImageNet-trained object recognition systems assume a categorization of the world that is mismatched with the world in which they operate. We think of technology changing rapidly and society being slow to adapt, but at least in this instance, the categorization scheme at the heart of much of today’s machine learning technology has been frozen in time while social norms have changed rapidly.

Our favorite example of measurement bias has to do with cameras, which we referenced at the beginning of the section as the exemplar of dispassionate observation and recording. But are they? The visual world has an essentially infinite bandwidth compared to what can be captured by cameras, whether film or digital, which means that photography technology involves a series of choices about what is relevant and what isn’t, and transformations of the captured data based on those choices. Both film and digital cameras have historically been more adept at photographing lighter-skinned individuals.[16] One reason is the default settings such as color balance which were optimized for lighter skin tones. Another, deeper reason is the limited “dynamic range” of cameras, which makes it hard to capture brighter and darker tones in the same image. This started changing in the 1970s, in part due to complaints from furniture companies and chocolate companies about the difficulty of photographically capturing the details of furniture and chocolate respectively! Another impetus came from the increasing diversity of television subjects at this time.

[16] Roth, “Looking at Shirley, the Ultimate Norm: Colour Balance, Image Technologies, and Cognitive Equity,” Canadian Journal of Communication 34, no. 1 (2009): 111.

When we go from individual images to datasets of images, we introduce another layer of potential biases. Consider the image datasets that are used to train today’s computer vision systems for tasks such as object recognition. If these datasets were representative samples of an underlying visual world, we might expect that a computer vision system trained on one such dataset would do well on another dataset. But in reality, we observe a big drop in accuracy when we train and test on different datasets.[17] This shows that these datasets are biased relative to each other in a statistical sense, and is a good starting point for investigating whether these biases include cultural stereotypes.

[17] Torralba and Efros, “Unbiased Look at Dataset Bias,” in Proc. CVPR (IEEE, 2011), 1521–28.

It’s not all bad news: machine learning can in fact help mitigate measurement biases. Returning to the issue of dynamic range in cameras, computational techniques, including machine learning, are making it possible to improve the representation of tones in images.[18] Another example comes from medicine: diagnoses and treatments are sometimes personalized by race. But it turns out that race is used as a crude proxy for ancestry and genetics, and sometimes environmental and behavioral factors.[19]

[18] Liu, Zhang, and Zhang, “Learning-Based Perceptual Image Quality Improvement for Video Conferencing,” in Multimedia and Expo, 2007 IEEE International Conference on (IEEE, 2007), 1035–38; Kaufman, Lischinski, and Werman, “Content-Aware Automatic Photo Enhancement,” in Computer Graphics Forum, vol. 31, 8 (Wiley Online Library, 2012), 2528–40; Kalantari and Ramamoorthi, “Deep High Dynamic Range Imaging of Dynamic Scenes,” ACM Trans. Graph 36, no. 4 (2017): 144.
[19] Bonham, Callier, and Royal, “Will Precision Medicine Move Us Beyond Race?” The New England Journal of Medicine 374, no. 21 (2016): 2003; Wilson et al., “Population Genetic Structure of Variable Drug Response,” Nature Genetics 29, no. 3 (2001): 265.

If we can measure these genetic and lifestyle factors and incorporate them—instead of race—into statistical models of disease and drug response, we can increase the accuracy of diagnoses and treatments while mitigating racial biases.

To summarize, measurement involves defining your variables of interest, the process for interacting with the real world and turning your observations into numbers, and then actually collecting the data. Usually machine learning practitioners don’t think about these steps, because someone else has already done those things. And yet it is crucial to understand the provenance of the data. Even if someone else has collected the data for you, it’s almost always too messy for your algorithms to handle, hence the dreaded “data cleaning” step. But the messiness of the real world isn’t just an annoyance to be dealt with by cleaning, it is instead a manifestation of the limitations of data-driven techniques.

From data to models

We’ve seen that training data reflects the disparities, distortions, and biases from the real world and the measurement process. This leads to an obvious question: when we learn a model from such data, are these disparities preserved, mitigated, or exacerbated?

Predictive models trained with supervised learning methods are often good at calibration: ensuring that the model’s prediction subsumes all features in the data for the purpose of predicting the outcome. By contrast, human intuition is notoriously poor at accounting for priors, and this is a major reason that statistical predictions perform better in a wide variety of settings. But calibration also means that by default, we should expect our models to faithfully reflect disparities found in the input data.

Here’s another way to think about it. Some patterns in the training data (smoking is associated with cancer) represent knowledge that we wish to mine using machine learning, while other patterns (girls like pink and boys like blue) represent stereotypes that we might wish to avoid learning. But learning algorithms have no general way to distinguish between these two types of patterns, because they are the result of social norms and moral judgments. Absent specific intervention, machine learning will extract stereotypes, including incorrect and harmful ones, in the same way that it extracts knowledge.

A telling example of this comes from machine translation. The screenshot on the right shows the result of translating sentences from English to Turkish and back.[20] The same stereotyped translations result for many pairs of languages and other words in all translation engines we’ve tested. It’s easy to see why.

[20] Translating from English to Turkish, then back to English injects gender stereotypes.

Turkish has gender neutral pronouns, and when translating such a pronoun to English, the system picks the sentence that best matches the statistics of the training set (which is typically a large, minimally curated corpus of historical text and text found on the web).

When we build a statistical model of language from such text, we should expect the gender associations of occupation words to roughly mirror real-world labor statistics. In addition, because of the male-as-norm bias[21] (the use of male pronouns when the gender is unknown) we should expect translations to favor male pronouns.

[21] Danesi, Dictionary of Media and Communications (Routledge, 2014).

It turns out that when we repeat the experiment with dozens of occupation words, these two factors—labor statistics and the male-as-norm bias—together almost perfectly predict which pronoun will be returned.[22]

[22] Caliskan, Bryson, and Narayanan, “Semantics Derived Automatically from Language Corpora Contain Human-Like Biases,” Science 356, no. 6334 (2017): 183–86.

Here’s a tempting response to the observation that models reflect data biases. Suppose we’re building a model for scoring resumes for a programming job. What if we simply withhold gender from the data? Surely the resulting model can’t be gender biased? Unfortunately, it’s not that simple, because of the problem of proxies[23] or redundant encodings,[24] as we’ll discuss in the next chapter. There are any number of other attributes in the data that might correlate with gender. In our culture, the age at which someone starts programming is well known to be correlated with gender. This illustrates another problem with proxies: they may be genuinely relevant to the decision at hand. How long someone has been programming is a factor that gives us valuable information about their suitability for a programming job, but it also reflects the reality of gender stereotyping.

[23] Barocas and Selbst, “Big Data’s Disparate Impact.”
[24] Hardt, “How Big Data Is Unfair” (https://medium.com/@mrtz/how-big-data-is-unfair-9aa544d739de, 2014).

Finally, it’s also possible for the learning step to introduce demographic disparities that aren’t in the training data. The most common reason for this is the sample size disparity. If we construct our training set by sampling uniformly from the training data, then by definition we’ll have fewer data points about minorities. Of course, machine learning works better when there’s more data, so it will work less well for members of minority groups, assuming that members of the majority and minority groups are systematically different in terms of the prediction task.[25]

[25] Hardt.

Worse, in many settings minority groups are underrepresented relative to population statistics. For example, minority groups are underrepresented in the tech industry. Different groups might also adopt technology at different rates, which might skew the datasets assembled from them. If training sets are drawn from these unrepresentative contexts, there will be even fewer training points from minority individuals. For example, many products that incorporate face-detection technology have been reported to have trouble with non-Caucasian faces, and it’s easy to guess why.[26]

[26] Hardt.

When we develop machine-learning models, we typically only test their overall accuracy; so a “5% error” statistic might hide the fact that a model performs terribly for a minority group. Reporting accuracy rates by group will help alert us to problems like the above example. In the next chapter, we’ll look at metrics that quantify the error-rate disparity between groups.

There’s one application of machine learning where we find especially high error rates for minority groups: anomaly detection. This is the idea of detecting behavior that deviates from the norm as evidence of abuse against a system. A good example is the Nymwars controversy, where Google, Facebook, and other tech companies aimed to block users who used uncommon (hence, presumably fake) names.

Further, suppose that in some cultures, most people receive names from a small set of names, whereas in other cultures, names might be more diverse, and it might be common for names to be unique. For users in the latter culture, a popular name would be more likely to be fake. In other words, the same feature that constitutes evidence towards a prediction in one group might constitute evidence against the prediction for another group.[27]

[27] Hardt.

If we’re not careful, learning algorithms will generalize based on the majority culture, leading to a high error rate for minority groups. This is because of the desire to avoid overfitting, that is, picking up patterns that arise due to random noise rather than true differences. One way to avoid this is to explicitly model the differences between groups, although there are both technical and ethical challenges associated with this, as we’ll show in later chapters.
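The following simulation, with invented data and a hypothetical two-group population whose feature-label relationship differs systematically between groups, illustrates both points at once: a model trained mostly on the majority group can look accurate overall while failing badly on the minority group, and only per-group reporting reveals it.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample(n, group):
    # Synthetic data in which the feature-label relationship differs
    # systematically between the majority (0) and minority (1) group.
    x = rng.normal(size=(n, 2))
    w = np.array([1.0, -1.0]) if group == 0 else np.array([-1.0, 1.0])
    y = (x @ w + rng.normal(scale=0.5, size=n) > 0).astype(int)
    return x, y

# The majority group vastly outnumbers the minority group in training.
x_maj, y_maj = sample(5000, group=0)
x_min, y_min = sample(100, group=1)
X = np.vstack([x_maj, x_min])
y = np.concatenate([y_maj, y_min])

model = LogisticRegression().fit(X, y)

# Evaluate on fresh samples from each group.
tx_maj, ty_maj = sample(2000, group=0)
tx_min, ty_min = sample(2000, group=1)
err_maj = np.mean(model.predict(tx_maj) != ty_maj)
err_min = np.mean(model.predict(tx_min) != ty_min)
# An "overall" error weighted like the training mix hides the disparity.
overall = (5000 * err_maj + 100 * err_min) / 5100
print(f"majority: {err_maj:.1%}  minority: {err_min:.1%}  overall: {overall:.1%}")
```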

The pitfalls of action

Any real machine-learning system seeks to make some change in the world. To understand its effects, then, we have to consider it in the context of the larger socio-technical system in which it is embedded.

In Chapter 2, we’ll see that if a model is calibrated—it faithfully captures the patterns in the underlying data—predictions made using that model will inevitably have disparate error rates for different groups, if those groups have different base rates, that is, rates of positive or negative outcomes. In other words, understanding the properties of a prediction requires understanding not just the model, but also the population differences between the groups on which the predictions are applied.

Further, population characteristics can shift over time; this is a well-known machine learning phenomenon known as drift. If sub-populations change differently over time, that can introduce disparities.
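Here is a small worked computation of that claim, using made-up score distributions: within each group the scores are calibrated, yet thresholding them produces different false positive and false negative rates simply because the base rates differ.

```python
# Two possible scores, 0.8 and 0.2, calibrated within each group: among
# people with score s, a fraction s has a positive outcome. The groups
# differ only in what fraction receives the high score (invented numbers).

def error_rates(p_high):
    p_low = 1.0 - p_high
    base_rate = p_high * 0.8 + p_low * 0.2          # P(Y = 1)
    # Decision rule: predict positive exactly when the score is 0.8.
    fpr = (p_high * 0.2) / (1.0 - base_rate)        # P(predict 1 | Y = 0)
    fnr = (p_low * 0.2) / base_rate                 # P(predict 0 | Y = 1)
    return base_rate, fpr, fnr

for group, p_high in [("A", 0.6), ("B", 0.2)]:
    base, fpr, fnr = error_rates(p_high)
    print(f"group {group}: base rate {base:.2f}  FPR {fpr:.2f}  FNR {fnr:.2f}")
# group A: base rate 0.56  FPR 0.27  FNR 0.14
# group B: base rate 0.32  FPR 0.06  FNR 0.50
```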

An additional wrinkle: whether or not disparities are objectionable may differ between cultures, and may change over time as social norms evolve.

When people are subject to automated decisions, their perception of those decisions depends not only on the outcomes but also the process of decision-making. An ethical decision-making process might require, among other things, the ability to explain a prediction or decision, which might not be feasible with black-box models.

A major limitation of machine learning is that it only reveals correlations, but we often use its predictions as if they reveal causation. This is a persistent source of problems. For example, an early machine learning system in healthcare famously learned the seemingly nonsensical rule that patients with asthma had lower risk of developing pneumonia. This was a true pattern in the data, but the likely reason was that asthmatic patients were more likely to receive in-patient care.[28] So it’s not valid to use the prediction to decide whether or not to admit a patient. We’ll discuss causality in Chapter 4.

[28] Caruana et al., “Intelligible Models for Healthcare: Predicting Pneumonia Risk and Hospital 30-Day Readmission,” in Proc. 21st ACM SIGKDD, 2015, 1721–30.

Another way to view this example is that the prediction affects the outcome (because of the actions taken on the basis of the prediction), and thus invalidates itself. The same principle is also seen in the use of machine learning for predicting traffic congestion: if sufficiently many people choose their routes based on the prediction, then the route predicted to be clear will in fact be congested. The effect can also work in the opposite direction: the prediction might reinforce the outcome, resulting in feedback loops. To better understand how, let’s talk about the final stage in our loop: feedback.

Feedback and feedback loops

Many systems receive feedback when they make predictions. When a search engine serves results, it typically records the links that the user clicks on and how long the user spends on those pages, and treats these as implicit signals about which results were found to be most relevant. When a video sharing website recommends a video, it uses the thumbs up/down feedback as an explicit signal. Such feedback is used to refine the model.

But feedback is tricky to interpret correctly. If a user clicked on the first link on a page of search results, is that simply because it was first, or because it was in fact the most relevant? This is again a case of the action (the ordering of search results) affecting the outcome (the link(s) the user clicks on). This is an active area of research; there are techniques that aim to learn accurately from this kind of biased feedback.[29]

[29] Joachims, Swaminathan, and Schnabel, “Unbiased Learning-to-Rank with Biased Feedback,” in Proc. 10th International Conference on Web Search and Data Mining (ACM, 2017), 781–89.
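One idea from this line of work, sketched below with invented propensities and relevances, is inverse propensity scoring: reweight click feedback by the estimated probability that the user even examined each position, so that items shown lower on the page are not unfairly penalized. This is a simplified illustration of the general idea, not the cited paper's full method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical probabilities that a user examines each of five positions
# (invented numbers): position 1 is seen far more often than position 5.
propensity = np.array([1.0, 0.5, 0.3, 0.2, 0.1])
# True relevance of the item currently shown at each position (invented):
# the item at position 4 is actually the most relevant.
relevance = np.array([0.3, 0.3, 0.3, 0.9, 0.3])

n = 100_000
examined = rng.random((n, 5)) < propensity
clicked = examined & (rng.random((n, 5)) < relevance)

naive = clicked.mean(axis=0)                # confounds position with relevance
ips = (clicked / propensity).mean(axis=0)   # reweights clicks by examination

print("naive click rate:", naive.round(2))  # item 4 looks worse than item 1
print("IPS estimate:    ", ips.round(2))    # approximately recovers relevance
```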

Bias in feedback might also reflect cultural prejudices, which is of course much harder to characterize than the effects of the ordering of search results. For example, the clicks on the targeted ads that appear alongside search results might reflect gender and racial stereotypes. There’s a well-known study that hints at this: Google searches for Black-sounding names such as “Latanya Farrell” were much more likely to result in ads for arrest records (“Latanya Farrell, Arrested?”) than searches for white-sounding names (“Kristen Haring”).[30] One potential explanation is that users are more likely to click on ads that conform to stereotypes, and the system is optimized for maximizing clicks.

[30] Sweeney, “Discrimination in Online Ad Delivery,” Queue 11, no. 3 (March 2013): 10:10–29.

In other words, even feedback that’s designed into systems can lead to unexpected or undesirable biases. But there are many unintended ways in which feedback might arise, and these are more pernicious and harder to control. Let’s look at three.

Self-fulfilling predictions. Suppose a predictive policing system determines certain areas of a city to be at high risk for crime. More police officers might be deployed to such areas. Alternatively, officers in areas predicted to be high risk might be subtly lowering their threshold for stopping, searching, or arresting people—perhaps even unconsciously. Either way, the prediction will appear to be validated, even if it had been made purely based on data biases.

Here’s another example of how acting on a prediction can change the outcome. In the United States, some criminal defendants are released prior to trial, whereas for others, a bail amount is set as a precondition of release. Many defendants are unable to post bail. Does the release or detention affect the outcome of the case? Perhaps defendants who are detained face greater pressure to plead guilty. At any rate, how could one possibly test the causal impact of detention without doing an experiment? Intriguingly, we can take advantage of a pseudo-experiment, namely that defendants are assigned bail judges quasi-randomly, and some judges are stricter than others. Thus, pre-trial detention is partially random, in a quantifiable way.

Studies using this technique have confirmed that detention indeed causes an increase in the likelihood of a conviction.[31] If bail were set based on risk predictions, whether human or algorithmic, and we evaluated its efficacy by examining case outcomes, we would see a self-fulfilling effect.

[31] Dobbie, Goldin, and Yang, “The Effects of Pre-Trial Detention on Conviction, Future Crime, and Employment: Evidence from Randomly Assigned Judges” (National Bureau of Economic Research, 2016).

Predictions that affect the training set. Continuing this example, predictive policing activity will lead to arrests, records of which might be added to the algorithm’s training set. These areas might then continue to appear to be at high risk of crime, and perhaps also other areas with a similar demographic composition, depending on the feature set used for predictions. The biases might even compound over time.

A 2016 paper analyzed a predictive policing algorithm by PredPol, one of the few to be published in a peer-reviewed journal.[32] By applying it to data derived from Oakland police records, they found that Black people would be targeted for predictive policing of drug crimes at roughly twice the rate of whites, even though the two groups have roughly equal rates of drug use.[33] Their simulation showed that this initial bias would be amplified by a feedback loop, with policing increasingly concentrated on targeted areas. This is despite the fact that the PredPol algorithm does not explicitly take demographics into account.

[32] PredPol deserves credit for publicly releasing their algorithm, without which this research would not even have been possible.
[33] Lum and Isaac, “To Predict and Serve?” Significance 13, no. 5 (2016): 14–19.

A more recent paper built on this idea and showed mathematically how feedback loops occur when data discovered on the basis of predictions are used to update the model.[34] The paper also shows how to tweak the model to avoid feedback loops: by quantifying how surprising an observation of crime is given the predictions, and only updating the model in response to surprising events.

[34] Ensign et al., “Runaway Feedback Loops in Predictive Policing,” arXiv Preprint arXiv:1706.09847, 2017.

Predictions that affect the phenomenon and society at large. Prejudicial policing on a large scale, algorithmic or not, will affect society over time, contributing to the cycle of poverty and crime. This is an extremely well-trodden thesis, and we’ll briefly review the sociological literature on durable inequality and the persistence of stereotypes in Chapter 3.

Let us remind ourselves that we deploy machine learning so that we can act on its predictions. It is hard to even conceptually eliminate the effects of predictions on outcomes, future training sets, the phenomena themselves, or society at large. The more central machine learning becomes in our lives, the stronger this effect. Returning to the example of a search engine, in the short term it might be possible to extract an unbiased signal from user clicks, but in the long run, results that are returned more often will be linked to, and thus rank more highly. As a side effect of fulfilling its purpose of retrieving relevant information, a search engine will necessarily change the very thing that it aims to measure, sort, and rank. Similarly, most machine learning systems will affect the phenomena that they predict. This is why we’ve depicted the machine learning process as a loop.

Throughout this book we’ll learn methods for mitigating societal biases in machine learning, but let us pause to consider that there are fundamental limits to what we can achieve, especially when we consider machine learning as a socio-technical system instead of a mathematical abstraction. The textbook model of training and test data being independent and identically distributed is a simplification, and might be unachievable in practice.
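A stylized simulation of this mechanism, with invented numbers and no connection to the actual PredPol model, shows how feeding discovered incidents back into an allocation rule concentrates attention on one district even when the true rates are identical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two districts with identical true incident rates; all numbers are
# invented and this is not the actual PredPol model.
true_rate = np.array([0.3, 0.3])
recorded = np.array([12, 10])          # historical records with a small skew
total_patrols = 20

for day in range(200):
    # Send most patrols to whichever district has more recorded incidents.
    patrols = np.array([5, 5])
    patrols[np.argmax(recorded)] += total_patrols - 10
    # Incidents are only discovered where police are present.
    discovered = rng.binomial(patrols, true_rate)
    recorded = recorded + discovered   # naive update: feed discoveries back

print("recorded incidents per district:", recorded)
# Despite identical true rates, patrols and recorded incidents concentrate
# on one district: more patrols mean more recorded incidents, which in turn
# justify still more patrols.
```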

Getting concrete with a toy example

Now let’s look at a concrete setting, albeit a toy problem, to illustrate many of the ideas discussed so far, and some new ones. Let’s say you’re on a hiring committee, making decisions based on just two attributes of each applicant: their college GPA and their interview score (we did say it’s a toy problem!). We formulate this as a machine-learning problem: the task is to use these two variables to predict some measure of the “quality” of an applicant. For example, it could be based on the average performance review score after two years at the company. We’ll assume we have data from past candidates that allows us to train a model to predict performance scores based on GPA and interview score.

Figure 2: Toy example: a hiring classifier that predicts job performance (not shown) based on GPA and interview score, and then applies a cutoff.

Obviously, this is a reductive formulation—we’re assuming that an applicant’s worth can be reduced to a single number, and that we know how to measure that number. This is a valid criticism, and applies to most applications of data-driven decision-making today. But it has one big advantage: once we do formulate the decision as a prediction problem, statistical methods tend to do better than

humans, even domain experts with years of training, in making decisions based on noisy predictors. The subject has been well researched, and we’ll study it in Chapter 3.
Given this formulation, the simplest thing we can do is to use linear regression to predict the average job performance rating from the two observed variables, and then use a cutoff based on the number of candidates we want to hire. The figure above shows what this might look like. In reality, the variables under consideration need not satisfy a linear relationship, thus suggesting the use of a non-linear model, which we avoid for simplicity.
As you can see in the figure, our candidates fall into two demographic groups, represented by triangles and squares. (This binary categorization is a simplification for the purposes of our thought experiment. Such simplifications are also common in the research literature. Indeed, most proposed fairness interventions themselves start by assuming such a categorization. But when building real systems, enforcing rigid categories of people can be ethically questionable. This is not specific to machine learning, and a similar tension arises in many data-driven settings, such as the checkboxes for race on census forms or employment applications.) Note that the classifier didn’t take into account which group a candidate belonged to. Does this mean that the classifier is fair? We might hope that it is, based on the fairness-as-blindness idea, symbolized by the icon of Lady Justice wearing a blindfold. In this view, an impartial model—one that doesn’t use the group membership in the regression—is fair; a model that gives different scores to otherwise-identical members of different groups is discriminatory.
We’ll defer a richer understanding of what fairness means to Chapter 3, so let’s ask a simpler question: are candidates from the two groups equally likely to be positively classified? The answer is no: the triangles are more likely to be selected than the squares. That’s because data is a social mirror; the “ground truth” labels that we’re predicting—job performance ratings—are systematically lower for the squares than the triangles.
There are many possible reasons for this disparity. First, the managers who score the employees’ performance might have a bias against one group. Or the overall workplace might be biased against one group, preventing them from reaching their potential and leading to lower performance. Alternately, the disparity might originate before the candidates were hired. For example, it might arise from disparities in educational institutions attended by the two groups. Or there might be intrinsic differences between them. Of course, it might be a combination of these factors. We can’t tell from our data how much of the disparity is attributable to these different factors. In general, such a determination is methodologically hard, and requires causal reasoning (Zhang and Bareinboim, “Fairness in Decision-Making — the Causal Explanation Formula,” in Proc. 32nd AAAI, 2018).
For now, let’s assume that we have evidence that the level of demographic disparity produced by our selection procedure is unjustified, and we’re interested in intervening to decrease it. How could we do it? We observe that GPA is correlated with the demographic attribute—it’s a proxy. Perhaps we could simply omit that variable as a predictor? Unfortunately, we’d also cripple the accuracy of our model.

In real datasets, most attributes tend to be proxies for demographic variables, and dropping them may not be a reasonable option.
Another crude approach is to pick different cutoffs so that candidates from both groups have the same probability of being hired. Or we could mitigate the demographic disparity instead of eliminating it, by decreasing the difference in the cutoffs. Given the available data, there is no mathematically principled way to know which cutoffs to pick. In some situations there is a legal baseline: for example, guidelines from the U.S. Equal Employment Opportunity Commission state that if the selection rate for one group is less than 80% of the selection rate for the group with the highest rate, it might constitute a sufficient disparate impact to initiate a lawsuit. But a disparate impact alone is not illegal; the disparity needs to be unjustified or avoidable for courts to find liability. Even these quantitative guidelines do not provide easy answers or bright lines.
At any rate, the pick-different-thresholds approach to mitigating disparities seems unsatisfying. It is no longer blind, and two candidates with the same observable attributes may receive different decisions depending on which group they are in. But there are other possible interventions, and we’ll discuss one. To motivate it, let’s take a step back and ask why the company wants to decrease the demographic disparity in hiring. One answer is rooted in justice to individuals and the specific social groups to which they belong.

But a different answer comes from the firm’s selfish interests: diverse teams work better (Rock and Grant, “Why Diverse Teams Are Smarter,” Harvard Business Review, https://hbr.org/2016/11/why-diverse-teams-are-smarter, 2016). From this perspective, increasing the diversity of the cohort that is hired would benefit the firm and everyone in the cohort.
How do we operationalize diversity in a selection task? If we had a distance function between pairs of candidates, we could measure the average distance between selected candidates. As a strawman, let’s say we use the Euclidean distance based on the GPA and interview score. If we incorporated such a diversity criterion into the objective function, it would result in a model where the GPA is weighted less. This technique has the advantage of being blind: we didn’t explicitly consider the group membership, but as a side-effect of insisting on diversity of the other observable attributes, we have also improved demographic diversity. However, a careless application of such an intervention can easily go wrong: for example, the model might give weight to attributes that are completely irrelevant to the task.
More generally, there are many possible algorithmic interventions beyond picking different thresholds for different groups. In particular, the idea of a similarity function between pairs of individuals is

a powerful one, and we’ll see other interventions that make use of it. But coming up with a suitable similarity function in practice isn’t easy: it may not be clear which attributes are relevant, how to weight them, and how to deal with correlations between attributes.
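To make the toy hiring pipeline described above concrete, here is a minimal Python sketch on synthetic data. The group effect sizes, noise levels, and the 30% hiring cutoff are made up for illustration; the point is only that a regression model that never sees the group label can still produce different selection rates across groups when the labels and a proxy feature differ by group.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)

    # Synthetic candidates: group 0 ("triangles") and group 1 ("squares").
    n = 1000
    group = rng.integers(0, 2, size=n)
    gpa = rng.normal(3.0 - 0.3 * group, 0.4, size=n)      # GPA is a proxy for group
    interview = rng.normal(5.0, 1.0, size=n)
    # "Ground truth" performance ratings are systematically lower for group 1.
    performance = 0.6 * gpa + 0.4 * interview - 0.5 * group + rng.normal(0.0, 0.5, size=n)

    # The model is "blind": it never sees the group label.
    X = np.column_stack([gpa, interview])
    scores = LinearRegression().fit(X, performance).predict(X)

    # Hire the top 30% by predicted performance.
    hired = scores >= np.quantile(scores, 0.7)
    for g in (0, 1):
        print(f"selection rate in group {g}: {hired[group == g].mean():.2f}")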

Other ethical considerations

So far we’ve been mostly concerned with ethical concerns that arise from demographic disparities in the outputs of machine learning systems. But a few other types of concerns are worth highlighting.

Predictions versus interventions

Fairly rendered decisions under unfair circumstances may do little to improve people’s lives. In many cases, we cannot achieve any reasonable notion of fairness through changes to decision-making alone; we need to change the conditions under which these decisions are made.
Let’s return to the hiring example above. When using machine learning to make predictions about how someone might fare in a specific workplace or occupation, we tend to treat the environment that people will confront in these roles as a constant and ask how people’s performance will vary according to their observable characteristics. In other words, we treat the current state of the world as a given, leaving us to select the person who will do best under these circumstances. This approach risks overlooking more fundamental changes that we could make to the workplace (culture, family-friendly policies, on-the-job training) that might make it a more welcoming and productive environment for people who have not flourished under previous conditions (Barocas, “Putting Data to Work,” in Data and Discrimination: Collected Essays, ed. Seeta Peña Gangadharan, Virginia Eubanks, and Solon Barocas, New America Foundation, 2014, 59–62).
The tendency with work on fairness in machine learning is to ask whether an employer is using a fair selection process, even though we might have the opportunity to intervene in the workplace dynamics that actually account for differences in predicted outcomes along the lines of race, gender, disability, and other characteristics (Jackson and VanderWeele, “Decomposition Analysis to Identify Intervention Targets for Reducing Disparities,” Epidemiology, 2018, 825–35).
We can learn a lot from the so-called social model of disability, which views a predicted difference in a disabled person’s ability to excel on the job as the result of a lack of appropriate accommodations (an accessible workplace, necessary equipment, flexible working arrangements) rather than any inherent capacity of the person himself. A person is only disabled in the sense that we have not built physical environments or adopted appropriate policies to ensure their equal participation.
The same might be true of people with other characteristics, and

changes to the selection process alone will not help us address the fundamental injustice of conditions that keep certain people from contributing as effectively as others.

Accuracy

Accuracy is an underappreciated ethical issue. The reason that it doesn’t get much attention in the technical literature is that we assume a setting where a decision maker has some notion of utility, which is almost always directly connected to maximizing accuracy. For example, a bank deciding who should receive a loan might use data to predict whether the recipient will pay it back; they would like to minimize both types of errors—false positives and false negatives—as they would lose money with false positives and forego potential profits with false negatives. Thus, machine learning problems are already framed in terms of maximizing accuracy, and the literature often talks about the accuracy-fairness trade-off.
Yet there are two reasons to separately consider accuracy as a criterion for responsible machine learning. We’ve already discussed one of these: errors might be unequally distributed between demographic groups, and a utility-maximizing decision maker might not take this into account. The other, related reason is that whether to deploy the automated decision-making system at all is often a debate to be had, and one that we’re not comfortable leaving to the logic (and whims) of the marketplace.
Consider two such recent debates: should police use of facial recognition technology be regulated, and how? What can go wrong with the use of DNA testing as a forensic tool? Understanding the error rate as well as the nature of errors of these technologies is critical to an informed debate (Garvie, Bedoya, and Frankle, “The Perpetual Line-Up,” Georgetown Law Center on Privacy and Technology, 2016). This is not to say that accuracy is the sole criterion in determining the acceptability of police use of facial recognition; the primary concerns are about civil liberties and the unaccountability of police power.
At the same time, debating the merits of these technologies on the basis of their likely accuracy for different groups may distract from a more fundamental question: should we ever deploy such systems, even if they perform equally well for everyone? We may want to regulate the police’s access to such tools, even if the tools are perfectly accurate. Our civil rights—freedom of movement and association—are equally threatened by these technologies when they fail and when they work well.

Diversity

Diversity is a bit of a catch-all term. It is a criterion in selection systems, such as in the hiring example above. Another context in which we might care about diversity is in the construction of training datasets for machine learning that are representative of the world. 32 fairness and machine learning

Let’s discuss two more.
In information systems, low diversity can lead to a narrowing of opportunity. For example, one reason that students from poor backgrounds don’t go to selective colleges is that they are simply unaware that the opportunity is available to them (Dillon and Smith, “The Determinants of Mismatch Between Students and Colleges,” National Bureau of Economic Research, 2013; Jaquette and Salazar, “Opinion | Colleges Recruit at Richer, Whiter High Schools,” The New York Times, https://www.nytimes.com/interactive/2018/04/13/opinion/college-recruitment-rich-white.html, 2018). Online search and ads are valuable avenues for mitigating this problem; yet, doing so requires swimming against the current of targeting of ads (and sometimes searches) based on algorithmic profiling of users. There is evidence that ad targeting sometimes narrows opportunities in this way (Datta, Tschantz, and Datta, “Automated Experiments on Ad Privacy Settings,” Proceedings on Privacy Enhancing Technologies 2015, no. 1 (2015): 92–112).
A related concern arises in personalization systems: the infamous filter bubble (Pariser, The Filter Bubble: What the Internet Is Hiding from You, Penguin UK, 2011). This is the idea that when algorithmic systems learn from our past activities to predict what we might click on, they feed us information that conforms to our existing views. Note that individual users may like the filter bubble—indeed, research suggests that our own choices result in a narrowing of what we consume online, compared to algorithmic recommendations (Bakshy, Messing, and Adamic, “Exposure to Ideologically Diverse News and Opinion on Facebook,” Science 348, no. 6239 (2015): 1130–32)—but the worry is that an ideologically segregated populace may not be conducive to a functioning democracy. The filter bubble is a concern for search engines, news websites, and social media; the relevant machine learning techniques include information retrieval and collaborative filtering.

Stereotype perpetuation and cultural denigration

Image search results for occupation terms such as CEO or software developer reflect (and arguably exaggerate) the prevailing gender composition and stereotypes about those occupations (Kay, Matuszek, and Munson, “Unequal Representation and Gender Stereotypes in Image Search Results for Occupations,” in Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, ACM, 2015, 3819–28). Should we care about such disparities in image search results? After all, these results don’t affect hiring or any other consequential decisions. And what are the harms from gender stereotypes in online translation? These and other examples that are disturbing to varying degrees—such as Google’s app labeling photos of Black Americans as “gorillas,” or offensive results in autocomplete—seem to fall into a different moral category than, say, a discriminatory system used in criminal justice, which has immediate and tangible consequences.
A recent talk lays out the differences (Crawford, “The Trouble with Bias,” NIPS keynote, https://www.youtube.com/watch?v=fMym_BKWQzk, 2017). When decision-making systems in criminal justice, health care, etc. are discriminatory, they create allocative harms, which are caused when a system withholds an opportunity or a resource from certain groups. In contrast, the other examples—stereotype perpetuation and cultural denigration—are examples of representational harms, which occur when systems reinforce the subordination of some groups along the lines of identity—race, class, gender, etc.

Allocative harms have received much attention both because their effects are immediate, and because they are easier to formalize and study in computer science and in economics. Representational harms have long-term effects, and resist formal characterization. But as machine learning becomes a bigger part of how we make sense of the world—through technologies such as search, translation, voice assistants, and image labeling—representational harms will leave an imprint on our culture, and influence identity formation and stereotype perpetuation. Thus, these are critical concerns for the fields of natural language processing and computer vision.

Our outlook: limitations and opportunities

We’ve seen how machine learning propagates inequalities in the state of the world through the stages of measurement, learning, action, and feedback. Machine learning systems that affect people are best thought of as closed loops, since the actions we take based on predictions in turn affect the state of the world. One major goal of fair machine learning is to develop an understanding of when these disparities are harmful, unjustified, or otherwise unacceptable, and to develop interventions to mitigate such disparities.
There are fundamental challenges and limitations to this goal. Unbiased measurement might be infeasible even in principle, as we’ve seen through examples. There are additional practical limitations arising from the fact that the decision maker is typically not involved in the measurement stage. Further, observational data can be insufficient to identify the causes of disparities, which is needed in the design of meaningful interventions and in order to understand the effects of intervention. Most attempts to “debias” machine learning in the current research literature assume simplistic mathematical systems, often ignoring the effect of algorithmic interventions on individuals and on the long-term state of society.
Despite these important limitations, there are reasons to be cautiously optimistic about fairness and machine learning. First, data-driven decision-making has the potential to be more transparent compared to human decision-making. It forces us to articulate our decision-making objectives and enables us to clearly understand the tradeoffs between desiderata. However, there are challenges to overcome to achieve this potential for transparency. One challenge is improving the interpretability and explainability of modern machine learning methods, which is a topic of vigorous ongoing research. Another challenge is the proprietary nature of datasets and systems that are crucial to an informed public debate on this topic. Many commentators have called for a change in the status quo (Reisman et al., “Algorithmic Impact Assessments: A Practical Framework for Public Agency Accountability,” https://ainowinstitute.org/aiareport2018.pdf, 2018).

Second, effective interventions do exist in many machine learning applications, especially in natural-language processing and computer vision. Tasks in these domains (say, transcribing speech) are subject to less inherent uncertainty than traditional decision-making (say, predicting if a loan applicant will repay), removing some of the statistical constraints that we’ll study in Chapter 2.
Our final and most important reason for optimism is that the turn to automated decision-making and machine learning offers an opportunity to reconnect with the moral foundations of fairness. Algorithms force us to be explicit about what we want to achieve with decision-making. And it’s far more difficult to paper over our poorly specified or true intentions when we have to state these objectives formally. In this way, machine learning has the potential to help us debate the fairness of different policies and decision-making procedures more effectively.
We should not expect work on fairness in machine learning to deliver easy answers. And we should be suspicious of efforts that treat fairness as something that can be reduced to an algorithmic stamp of approval. At its best, this work will make it far more difficult to avoid the hard questions when it comes to debating and defining fairness, not easier. It may even force us to confront the meaningfulness and enforceability of existing approaches to discrimination in law and policy, expanding the tools at our disposal to reason about fairness and seek out justice (Barocas and Selbst, “Big Data’s Disparate Impact”).
We hope that this book can play a small role in stimulating this nascent interdisciplinary inquiry.

Bibliographic notes and further reading

For an introduction to machine learning, we refer the reader to the text by Hardt and Recht, Patterns, Predictions, and Actions: A Story about Machine Learning (https://mlstory.org, 2021; https://arxiv.org/abs/2102.05242), which is available online at mlstory.org. An excellent textbook by Wasserman, All of Statistics: A Concise Course in Statistical Inference (Springer, 2010), also provides additional statistical background.
This chapter draws from several taxonomies of biases in machine learning and data-driven decision-making: a blog post by Moritz Hardt (“How Big Data Is Unfair”), a paper by Barocas and Selbst (“Big Data’s Disparate Impact”), and a 2016 report by the White House Office of Science and Technology Policy (Munoz, Smith, and Patil, “Big Data: A Report on Algorithmic Systems, Opportunity, and Civil Rights,” Executive Office of the President, The White House, 2016). For a broad survey of challenges raised by AI, machine learning, and algorithmic systems, see the AI Now report (Campolo et al., “AI Now 2017 Report,” AI Now Institute at New York University, 2017).
An early work that investigated fairness in algorithmic systems is by Friedman and Nissenbaum in 1996 (“Bias in Computer Systems,” ACM Transactions on Information Systems (TOIS) 14, no. 3 (1996): 330–47). Papers studying demographic disparities in classification began appearing regularly starting in 2008 (Pedreschi, Ruggieri, and Turini, “Discrimination-Aware Data Mining,” in Proc. 14th ACM SIGKDD, ACM, 2008, 560–68); the locus of this research was in Europe, and in the data mining research community. With the establishment of the

FAT/ML workshop in 2014, a new community emerged, and the topic has since grown in prominence. Several popular-audience books have delivered critiques of algorithmic systems in modern society (Pasquale, The Black Box Society: The Secret Algorithms That Control Money and Information, Harvard University Press, 2015; O’Neil, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, Broadway Books, 2016; Eubanks, Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor, St. Martin’s Press, 2018; Noble, Algorithms of Oppression: How Search Engines Reinforce Racism, NYU Press, 2018).

3 Classification

Simply put, the goal of classification is to determine a plausible value for an unknown variable Y given an observed variable X. For exam- ple, we might try to predict whether a loan applicant will pay back her loan by looking at various characteristics such as credit history, income, and net worth. Classification also applies in situations where the variable Y does not refer to an event that lies in the future. For example, we can try to determine if an image contains a cat by look- ing at the set of pixels encoding the image. This practice is also called object recognition or image classification. Object recognition might not even seem like a statistical problem, yet statistical methods came to be the method of choice for many important pattern recognition tasks in computer vision.

Supervised learning

A classifier is a mapping from the space of possible values for X to the space of values that the target variable Y can assume. Supervised learning is the prevalent method for constructing classifiers from observed data. The essential idea is very simple. Suppose we have labeled data, also called training examples, of the form (x1, y1), ..., (xn, yn), where each example is a pair (xi, yi) of an instance xi and a label yi.
Instances are usually arranged as vectors of some dimension. You can think of them as arrays with numbers in them. In a classification problem, labels typically come from a discrete set such as {−1, 1} in the case of binary classification. We interpret these labels as

partitioning the set of instances into positive and negative instances depending on their label. (Multi-class prediction is the generalization to label sets with more than two values.) We can interpret such a classifier as a decision rule by equating a positive label with acceptance and a negative label with rejection.
In a regression problem, the label y is typically a real number. The goal is no longer to predict the exact value of y but rather to be close

to it. The tools to solve classification and regression problems in practice are very similar. In both cases, roughly the same optimization approach is used to find a classifier f that maps an instance x to a label ŷ = f(x) that we hope agrees with the correct label. This optimization process is often called training; its specifics are irrelevant for this chapter.
To turn supervised learning into a statistical problem, we assume that there is an underlying distribution from which the data were drawn. The distribution is fixed and each example is drawn independently of the others. We can express this underlying distribution as a pair of random variables (X, Y). For example, our training examples might be responses from a survey. Each survey participant is chosen independently at random from a fixed sampling frame that represents an underlying population. As we discussed in the introduction, the goal of supervised learning is to identify meaningful patterns in the population that aren’t just artifacts of the sample.
At the population level, we can interpret our classifier as a random variable by considering Ŷ = f(X). In doing so, we overload our terminology slightly by using the word classifier for both the random variable Ŷ and the mapping f. The distinction is mostly irrelevant for this chapter as we will focus on the statistical properties of the joint distribution of the data and the classifier, which we denote as a tuple of three random variables (X, Y, Ŷ). For now, we ignore how Ŷ was learned from a finite sample, what the functional form of the classifier is, and how we estimate various statistical quantities from finite samples. While finite sample considerations are fundamental to machine learning, they are often not specific to the conceptual and technical questions around fairness that we will discuss.

Statistical classification criteria

What makes a classifier good for an application and how do we choose one out of many possible classifiers? This question often does not have a fully satisfying answer, but some formal criteria can help highlight different qualities of a classifier that can inform our choice.
Perhaps the most well-known property of a classifier Ŷ is its accuracy, defined as P{Y = Ŷ}, the probability of correctly predicting the target variable. It is common practice to apply the classifier that achieves highest accuracy among those available to us. (We typically don’t know the classifier that maximizes accuracy among all possible classifiers, but rather we only have access to those that we can find with effective training procedures.)
Accuracy is easy to define, but misses some important aspects. A classifier that always predicts no traffic fatality in the next year might have high accuracy, simply because individual accidents are highly unlikely. However, it’s a constant function that has no value in assessing the risk that an individual experiences a fatal traffic accident.

Many other formal classification criteria highlight different aspects of a classifier. In a binary classification setting, we can consider the conditional probability P{event | condition} for various different settings.

Table 1: Common classification criteria

Event     Condition     Resulting notion (P{event | condition})
Ŷ = 1     Y = 1         True positive rate, recall
Ŷ = 0     Y = 1         False negative rate
Ŷ = 1     Y = 0         False positive rate
Ŷ = 0     Y = 0         True negative rate

To be clear, the true positive rate corresponds to the frequency with which the classifier correctly assigns a positive label to a positive instance. We call this a true positive. The other terms false positive, false negative, and true negative derive analogously from the respective definitions. It is not important to memorize all these terms. They do, however, come up regularly in the classification setting so the table might come in handy. Another family of classification criteria arises from swapping event and condition. We’ll only highlight two of the four possible notions.

Table 2: Additional classification criteria

Event     Condition     Resulting notion (P{event | condition})
Y = 1     Ŷ = 1         Positive predictive value, precision
Y = 0     Ŷ = 0         Negative predictive value

We’ll return to these criteria later on when we explore some of their properties and relationships.
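As a quick reference, all of the criteria in the two tables can be computed directly from a set of labels and predictions. The following Python sketch (the function name and the toy arrays are ours, for illustration only) does exactly that:

    import numpy as np

    def classification_criteria(y_true, y_pred):
        """Empirical versions of the criteria in Tables 1 and 2 for labels in {0, 1}."""
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        return {
            "true positive rate (recall)": np.mean(y_pred[y_true == 1] == 1),
            "false negative rate": np.mean(y_pred[y_true == 1] == 0),
            "false positive rate": np.mean(y_pred[y_true == 0] == 1),
            "true negative rate": np.mean(y_pred[y_true == 0] == 0),
            "positive predictive value (precision)": np.mean(y_true[y_pred == 1] == 1),
            "negative predictive value": np.mean(y_true[y_pred == 0] == 0),
        }

    print(classification_criteria([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0]))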

Score functions

Classification is often attacked by first solving a regression problem to summarize the data in a single real-valued variable. We will refer to such a variable as score. We can turn a score into a classifier by thresholding it somewhere on the real line.
For an illustrative example consider the well-known body mass index which summarizes weight and height of a person into a single real number. In our formal notation, the features are X = (H, W) where H denotes height in meters and W denotes weight in

kilograms. The body mass index corresponds to the score function R = W/H².

Figure 3: Plot of the body mass index.

We could interpret the body mass index as measuring risk of heart disease. Thresholding it at the value 27, we might decide that individuals with a body mass index above this value are at risk of developing heart disease while others are not. It does not take a medical degree to suspect that the resulting classifier may not be very accurate (in fact, it seems to be quite poor). The body mass index has a number of known issues leading to errors when used for classification. We won’t go into detail, but it’s worth noting that these classification errors can systematically align with certain demographic groups. For instance, the body mass index tends to be inflated as a risk measure for taller people (due to its scaling issues).
Score functions need not follow simple algebraic formulas such as the body mass index. In most cases, score functions are built by fitting regression models against historical data. Think of a credit score, as is common in some countries, which can be used to accept or deny loan applicants based on the score value. We will revisit this example in detail later.
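As a tiny illustration, the body mass index score and the threshold classifier described above can be written in a few lines of Python. The threshold of 27 follows the example in the text; everything else is illustrative.

    def bmi_score(height_m, weight_kg):
        """Body mass index as a score function R = W / H^2."""
        return weight_kg / height_m ** 2

    def at_risk(height_m, weight_kg, threshold=27.0):
        """Threshold the score to obtain a (crude) binary classifier."""
        return bmi_score(height_m, weight_kg) > threshold

    print(bmi_score(1.76, 68))   # about 21.9 -> classified as not at risk
    print(at_risk(1.95, 103))    # about 27.1 -> classified as at risk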

The conditional expectation

A natural score function is the expectation of the target variable Y conditional on the features X we have observed. We can write this score as R = r(X) where r(x) = E[Y | X = x], or more succinctly, R = E[Y | X]. In a sense, this score function gives us the best guess for the

target variable given the observations we have. We can think of the conditional expectation as a lookup table that gives us for each setting of features the frequency of positive outcomes given these features. (We can make this statement more precise: this score is sometimes called the Bayes optimal score, as it minimizes the squared error E(g(X) − Y)² among all functions g(X).)
Such lookup tables have a long and fascinating history in applications of risk assessment such as insurance pricing (Bouk, How Our Days Became Numbered: Risk and the Rise of the Statistical Individual, University of Chicago Press, 2015). One of the earliest examples is Halley’s life table from 1693 that was used to estimate the life expectancy of an individual in order to accurately price certain annuities.

Figure 4: Halley’s life table (1693)

The conditional expectation also makes sense for our example of scoring risk of heart disease. What it would do here is to tell us for every setting of weight (say, rounded to the nearest kg unit) and every physical height (rounded to the nearest cm unit), the incidence rate of heart disease among individuals with these values of weight and height. The target variable in this case is a binary indicator of heart disease. So, r((176, 68)) would be the incidence rate of heart disease among individuals who are 1.76m tall and weigh 68kg. Intuitively, we can think of the conditional expectation as a big lookup table of incidence rates given some setting of characteristics. The conditional expectation is likely more useful as a risk measure of heart disease than the body mass index we saw earlier. After all, the conditional expectation directly reflects the incidence rate of heart disease given the observed characteristics, while the body mass index is a general-purpose summary statistic. That said, we can still spot a few issues with this score function. First, our definition of target variable was a bit fuzzy, lumping to- gether all sorts of different kinds of heart disease with different characteristics. Second, in order to actually compute the conditional expectation in practice, we would have to collect incidence rate statis- tics by height and weight. These data points would only tell us about historical incidence rates. The extent to which they can tell us about future cases of heart disease is somewhat unclear. If our data comes from a time where people generally smoked more cigarettes, our

statistics might overestimate future incidence rates. There are numerous other features that are relevant for the prediction of heart disease, including age and gender, but they are neglected in our data. We could include these additional features in our data; but as we increase the number of features, estimating the conditional expectation becomes increasingly difficult. Any feature set partitions the population into demographics. The more features we include, the fewer data points we can collect in each subgroup. As a result, the conditional expectation is generally hard to estimate in high-dimensional settings, where we have many attributes.
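Here is a sketch of how one might estimate such a lookup table empirically with pandas. The DataFrame and its column names (height_cm, weight_kg, heart_disease) are hypothetical, and in practice many cells would be empty or too sparse to estimate reliably, which is exactly the high-dimensionality problem noted above.

    import pandas as pd

    # Assumed columns (hypothetical names): "height_cm", "weight_kg",
    # and a binary "heart_disease" indicator.
    def incidence_lookup_table(df: pd.DataFrame) -> pd.Series:
        binned = df.assign(
            height_bin=df["height_cm"].round().astype(int),  # nearest cm
            weight_bin=df["weight_kg"].round().astype(int),  # nearest kg
        )
        # Empirical incidence rate of heart disease in each (height, weight) cell.
        return binned.groupby(["height_bin", "weight_bin"])["heart_disease"].mean()

    # Looking up r((176, 68)) would then be:
    #   table = incidence_lookup_table(df)
    #   table.loc[(176, 68)]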

From scores to classifiers

We just saw how we can turn a score function into a discrete classifier by discretizing its values into buckets. In the case of a binary classifier, this corresponds to choosing a threshold t so that when the score is above t our classifier outputs 1 (accept) and otherwise -1 (reject). (The choice of the values 1 and -1 is arbitrary. Any two distinct values will do.) Each choice of the threshold defines one binary classifier. Which threshold should we choose?
The answer to this question is surprisingly subtle. Roughly speaking, which threshold we choose depends on our notion of utility for the resulting classifier and the problem we’re trying to solve. Our notion of utility could be complex and depend on many different considerations. In classification, it is common to oversimplify the problem quite a bit by summarizing all considerations of utility with just two numbers: a cost for accepting a negative instance (false positive) and a cost for rejecting a positive instance (false negative). If in our problem we face a high cost for false positives, we want to choose a higher threshold than in other applications where false negatives are costly.
The choice of a threshold and its resulting trade-off between true positive rate and false positive rate can be neatly visualized with the help of an ROC curve (ROC stands for receiver operating characteristic). Note that true positive rate equals 1 − false negative rate.
The ROC curve serves another purpose. It can be used to eyeball how predictive our score is of the target variable. A common measure of predictiveness is the area under the curve, which is the probability that a random positive instance gets a score higher than a random negative instance. An area of 1/2 corresponds to random guessing, and an area of 1 corresponds to perfect classification, or more formally, the score equals the target. Known disadvantages make area under the curve a tool that must be interpreted with caution (Halligan, Altman, and Mallett, “Disadvantages of Using the Area Under the Receiver Operating Characteristic Curve to Assess Imaging Tests: A Discussion and Proposal for an Alternative Approach,” European Radiology 25, no. 4 (April 2015): 932–39).

Figure 5: Example of an ROC curve. Each point on the solid curve is realized by thresholding the score function at some value. The dashed line shows the trade-offs achieved by randomly accepting an instance irrespective of its features with some probability p ∈ [0, 1].
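In practice, the ROC curve and the area under it are typically computed with standard library routines rather than by hand. A minimal sketch using scikit-learn on toy data:

    import numpy as np
    from sklearn.metrics import roc_auc_score, roc_curve

    y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0])                    # toy labels
    scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.55, 0.5])  # toy score values

    fpr, tpr, thresholds = roc_curve(y_true, scores)  # one (FPR, TPR) point per threshold
    auc = roc_auc_score(y_true, scores)               # area under the ROC curve

    for f, t, thr in zip(fpr, tpr, thresholds):
        print(f"threshold {thr:.2f}: FPR {f:.2f}, TPR {t:.2f}")
    print(f"AUC: {auc:.2f}")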

Sensitive characteristics

In many classification tasks, the features X contain or implicitly encode sensitive characteristics of an individual. We will set aside the

letter A to designate a discrete random variable that captures one or multiple sensitive characteristics. (Note that formally we can always represent any number of discrete sensitive attributes as a single discrete attribute whose support corresponds to each of the possible settings of the original attributes.) Different settings of A correspond to different groups of the population. This notational choice is not meant to suggest that we can cleanly partition the set of features into two independent categories such as “neutral” and “sensitive.” In fact, we will see shortly that sufficiently many seemingly neutral features can often give high-accuracy predictions of sensitive characteristics. This should not be surprising. After all, if we think of A as the target variable in a classification problem, there is reason to believe that the remaining features would give a non-trivial classifier for A.
The choice of sensitive attributes will generally have profound consequences as it decides which groups of the population we highlight, and what conclusions we draw from our investigation. The taxonomy induced by discretization can on its own be a source of harm if it is too coarse, too granular, misleading, or inaccurate. Even the act of introducing a sensitive attribute on its own can be problematic. We will revisit this important discussion in the next chapter.

No fairness through unawareness

Some have hoped that removing or ignoring sensitive attributes would somehow ensure the impartiality of the resulting classifier.

Unfortunately, this practice is usually somewhere on the spectrum between ineffective and harmful.
In a typical data set, we have many features that are slightly correlated with the sensitive attribute. Visiting the website pinterest.com, for example, has a small statistical correlation with being female. (As of August 2017, 58.9% of Pinterest’s users in the United States were female; see here, retrieved 3-27-2018.) The correlation on its own is too small to predict someone’s gender with high accuracy. However, if numerous such features are available, as is the case in a typical browsing history, the task of predicting gender becomes feasible at high accuracy levels. In other words, several features that are slightly predictive of the sensitive attribute can be used to build high-accuracy classifiers for that attribute.

Figure 6: On the left, we see the distri- bution of a single feature that differs only very slightly between the two groups. In both groups the feature follows a normal distribution. Only the means are slightly different in each group. Multiple features like this can be used to build a high accuracy group membership classifier. On the right, we see how the accuracy grows as more and more features become available.
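A small simulation in the spirit of the figure: each synthetic feature differs between the two groups by only a tenth of a standard deviation, yet a logistic regression model trained on enough such features predicts group membership with high accuracy. All numbers here are made up for illustration.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n, shift = 10000, 0.1                  # each feature's group means differ by 0.1

    group = rng.integers(0, 2, size=n)     # sensitive attribute we try to predict

    for d in [1, 10, 100, 1000]:
        # d features, each only slightly correlated with group membership.
        X = rng.normal(size=(n, d)) + shift * group[:, None]
        X_tr, X_te, g_tr, g_te = train_test_split(X, group, random_state=0)
        acc = LogisticRegression(max_iter=1000).fit(X_tr, g_tr).score(X_te, g_te)
        print(f"{d:4d} features -> accuracy predicting the group: {acc:.2f}")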

In large feature spaces sensitive attributes are generally redundant given the other features. If a classifier trained on the original data uses the sensitive attribute and we remove the attribute, the classifier will then find a redundant encoding in terms of the other features. This results in an essentially equivalent classifier, in the sense of implementing the same function.
To further illustrate the issue, consider a fictitious start-up that sets out to predict your income from your genome. At first, this task might seem impossible. How could someone’s DNA reveal their income? However, we know that DNA encodes information about ancestry, which in turn correlates with income in some countries such as the United States. Hence, DNA can likely be used to predict income better than random guessing. The resulting classifier uses ancestry in an entirely implicit manner. Removing redundant encodings of ancestry from the genome is a difficult task that cannot be accomplished by removing a few individual genetic markers. What we learn from this is that machine learning can wind up building classifiers for sensitive attributes without explicitly being asked to,

simply because it is an available route to improving accuracy.
Redundant encodings typically abound in large feature spaces. What about small hand-curated feature spaces? In some studies, features are chosen carefully so as to be roughly statistically independent of each other. In such cases, the sensitive attribute may not have good redundant encodings. That does not mean that removing it is a good idea. Medication, for example, sometimes depends on race in legitimate ways if these correlate with underlying causal factors (Bonham, Callier, and Royal, “Will Precision Medicine Move Us Beyond Race?”). Forcing medications to be uncorrelated with race in such cases can harm the individual.

Formal non-discrimination criteria

Many fairness criteria have been proposed over the years, each aiming to formalize different desiderata. We’ll start by jumping directly into the formal definitions of three representative fairness criteria that relate to many of the proposals that have been made. Once we have acquired familiarity with the technical matter, we’ll have a broader debate around the purpose, scope, and meaning of these fairness criteria in Chapter 3.
Most of the proposed fairness criteria are properties of the joint distribution of the sensitive attribute A, the target variable Y, and the classifier or score R. This means that we can write them as some statement involving properties of these three random variables. (If all variables are binary, then the joint distribution is specified by 8 non-negative parameters that sum to 1. A non-trivial property of the joint distribution would restrict the way in which we can choose these parameters.)
To a first approximation, most of these criteria fall into one of three different categories defined along the lines of different (conditional) independence statements between the involved random variables.

Table 3: Non-discrimination criteria

Independence     Separation     Sufficiency
R ⊥ A            R ⊥ A | Y      Y ⊥ A | R

Below we will introduce and discuss each of these conditions in detail. Variants of these criteria arise from different ways of relaxing them.
As an exercise, think about why we omitted the conditional independence statement R ⊥ Y | A from our discussion here.

Independence

Our first formal criterion simply requires the sensitive characteristic to be statistically independent of the score.

Definition 1. The random variables (A, R) satisfy independence if A⊥R.

Independence has been explored through many equivalent terms or variants, referred to as demographic parity, statistical parity, group fairness, disparate impact and others. In the case of binary classification, independence simplifies to the condition

P{R = 1 | A = a} = P{R = 1 | A = b} ,

for all groups a, b. Thinking of the event R = 1 as “acceptance,” the condition requires the acceptance rate to be the same in all groups. A relaxation of the constraint introduces a positive amount of slack e > 0 and requires that

P{R = 1 | A = a} ≥ P{R = 1 | A = b} − e .

Note that we can swap a and b to get an inequality in the other direction. An alternative relaxation is to consider a ratio condition, such as

P{R = 1 | A = a} / P{R = 1 | A = b} ≥ 1 − e .

Some have argued that, for e = 0.2, this condition relates to the 80 percent rule in disparate impact law (Feldman et al., “Certifying and Removing Disparate Impact,” in Proc. 21st SIGKDD, ACM, 2015).
Yet another way to state the independence condition in full generality is to require that A and R must have zero mutual information, I(A; R) = 0. (Mutual information is defined as I(A; R) = H(A) + H(R) − H(A, R), where H denotes the entropy.) The characterization in terms of mutual information leads to useful relaxations of the constraint. For example, we could require I(A; R) ≤ e.
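Checking independence and its slack and ratio relaxations for a binary classifier amounts to comparing acceptance rates across groups. A minimal sketch, with a hypothetical function name and toy data:

    import numpy as np

    def acceptance_rates(decisions, groups):
        """Acceptance rate P{R = 1 | A = a} for every group a."""
        decisions, groups = np.asarray(decisions), np.asarray(groups)
        return {g: decisions[groups == g].mean() for g in np.unique(groups)}

    rates = acceptance_rates([1, 0, 1, 1, 0, 0, 1, 0],
                             ["a", "a", "a", "a", "b", "b", "b", "b"])
    values = list(rates.values())
    print(rates)
    print("slack relaxation (e = 0.2):", max(values) - min(values) <= 0.2)
    print("ratio relaxation (e = 0.2):", min(values) / max(values) >= 0.8)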

Limitations of independence

Independence is pursued as a criterion in many papers, for several reasons. For example, it may be an expression of a belief about human nature, namely that traits relevant for a job are independent of certain attributes. It also has convenient technical properties.
However, decisions based on a classifier that satisfies independence can have undesirable properties (and similar arguments apply to other statistical criteria). Here is one way in which this can happen, which is easiest to illustrate if we imagine a callous or ill-intentioned decision maker. Imagine a company that in group a hires diligently selected applicants at some rate p > 0. In group b, the company hires carelessly selected applicants at the same rate p. Even though the acceptance rates in both groups are identical, it is far more likely that unqualified applicants are selected in one group than in the other. As a result, it will appear in hindsight that members of group b performed worse than members of group a, thus establishing a negative track record for group b.

This situation might arise without positing malice: the company might have historically hired employees primarily from group a, giving them a better understanding of this group. As a technical matter, the company might have substantially more training data in group a, thus potentially leading to lower error rates of a learned classifier within that group. The last point is a bit subtle. After all, if both groups were entirely homogeneous in all ways relevant to the classification task, more training data in one group would equally benefit both. Then again, the mere fact that we chose to distinguish these two groups indicates that we believe they might be heterogeneous in relevant aspects.

Interlude: How to satisfy fairness criteria

A later chapter devoted to algorithmic interventions will go into detail, but we pause for a moment to think about how we can achieve the independence criterion when we actually build a classifier. We distinguish between three different techniques. While they generally apply to all the criteria and their relaxations that we review in this chapter, our discussion here focuses on independence.

• Pre-processing: Adjust the feature space to be uncorrelated with the sensitive attribute.
• At training time: Work the constraint into the optimization process that constructs a classifier from training data.
• Post-processing: Adjust a learned classifier so as to be uncorrelated with the sensitive attribute.

The three approaches have different strengths and weaknesses.
Pre-processing is a family of techniques to transform a feature space into a representation that as a whole is independent of the sensitive attribute. This approach is generally agnostic to what we do with the new feature space in downstream applications. After the pre-processing transformation ensures independence, any

deterministic training process on the new space will also satisfy independence. (Formally, this is a consequence of the data processing inequality from information theory.)
Achieving independence at training time can lead to the highest utility since we get to optimize the classifier with this criterion in mind. The disadvantage is that we need access to the raw data and training pipeline. We also give up a fair bit of generality as this approach typically applies to specific model classes or optimization problems.
Post-processing refers to the process of taking a trained classifier and adjusting it, possibly depending on the sensitive attribute and additional randomness, in such a way that independence is achieved.

Formally, we say a derived classifier Ŷ = F(R, A) is a possibly randomized function of a given score R and the sensitive attribute. Given a cost for false negatives and false positives, we can find the derived classifier that minimizes the expected cost of false positives and false negatives subject to the fairness constraint at hand. Post-processing has the advantage that it works for any black-box classifier regardless of its inner workings. There’s no need for re-training, which is useful in cases where the training pipeline is complex. It’s often also the only available option when we have access only to a trained model with no control over the training process. These advantages of post-processing are simultaneously also a weakness as it often leads to a significant loss in utility.
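One very simple instance of such a derived classifier, sketched below, picks a group-specific threshold on the score so that every group is accepted at (approximately) the same target rate. This ignores the cost-minimization step discussed above and uses no randomization; it is only meant to illustrate the post-processing idea, and the names and numbers are ours.

    import numpy as np

    def equal_acceptance_rate(scores, groups, target_rate):
        """Derived classifier: a group-specific threshold on the score R so that
        every group is accepted at (approximately) the same rate."""
        scores, groups = np.asarray(scores, dtype=float), np.asarray(groups)
        decisions = np.zeros(len(scores), dtype=int)
        for g in np.unique(groups):
            in_g = groups == g
            threshold = np.quantile(scores[in_g], 1 - target_rate)
            decisions[in_g] = (scores[in_g] >= threshold).astype(int)
        return decisions

    rng = np.random.default_rng(0)
    scores = rng.uniform(size=1000)
    groups = rng.integers(0, 2, size=1000)
    d = equal_acceptance_rate(scores, groups, target_rate=0.3)
    print([round(d[groups == g].mean(), 2) for g in (0, 1)])   # roughly equal rates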

Separation

Our next criterion acknowledges that in many scenarios, the sensitive characteristic may be correlated with the target variable. For example, one group might have a higher default rate on loans than another. A bank might argue that it is a matter of business necessity to therefore have different lending rates for these groups. Roughly speaking, the separation criterion allows correlation between the score and the sensitive attribute to the extent that it is justified by the target variable. This intuition can be made precise with a simple conditional independence statement.

Definition 2. Random variables (R, A, Y) satisfy separation if R ⊥ A | Y. (We can display separation as a graphical model in which R is separated from A by the target variable Y. If you haven’t seen graphical models before, don’t worry; all this says is that R is conditionally independent of A given Y.)

In the case where R is a binary classifier, separation is equivalent to requiring for all groups a, b the two constraints

P{R = 1 | Y = 1, A = a} = P{R = 1 | Y = 1, A = b}
P{R = 1 | Y = 0, A = a} = P{R = 1 | Y = 0, A = b} .

Recall that P{R = 1 | Y = 1} is called the true positive rate of the classifier. It is the rate at which the classifier correctly recognizes positive instances. The false positive rate P{R = 1 | Y = 0} highlights the rate at which the classifier mistakenly assigns positive outcomes to negative instances. What separation therefore requires is that all groups experience the same false negative rate and the same false positive rate.
This interpretation in terms of equality of error rates leads to natural relaxations. For example, we could only require equality of false negative rates. A false negative, intuitively speaking, corresponds to denied opportunity in scenarios where acceptance is desirable, such as in hiring. (In contrast, when the task is to identify high-risk individuals, as in the case of recidivism prediction, it is common to denote the undesirable outcome as the “positive” class. This inverts the meaning of false positives and false negatives, and is a frequent source of terminological confusion.)
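Auditing a binary classifier for separation then amounts to comparing true and false positive rates across groups, for example with a small helper like the following (names and toy data are ours):

    import numpy as np

    def error_rates_by_group(y_true, y_pred, groups):
        """True and false positive rate per group; separation requires them to match."""
        y_true, y_pred, groups = map(np.asarray, (y_true, y_pred, groups))
        rates = {}
        for g in np.unique(groups):
            yt, yp = y_true[groups == g], y_pred[groups == g]
            rates[g] = {"TPR": np.mean(yp[yt == 1] == 1),
                        "FPR": np.mean(yp[yt == 0] == 1)}
        return rates

    print(error_rates_by_group([1, 0, 1, 0, 1, 0],
                               [1, 0, 0, 0, 1, 1],
                               ["a", "a", "a", "b", "b", "b"]))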

Achieving separation

As was the case with independence, we can achieve separation by post-processing a given score function without the need for retraining. (Recall, a derived classifier is a possibly randomized mapping Ŷ = F(R, A).) The post-processing step uses the ROC curve that we saw earlier and it’s illustrative to go into a bit more detail.
A binary classifier that satisfies separation must achieve the same true positive rates and the same false positive rates in all groups. This condition corresponds to taking the intersection of all group-level ROC curves. Within this constraint region, we can then choose the classifier that minimizes the given cost.

Figure 7: ROC curve by group.

We see the ROC curves of a score displayed for each group separately. The two groups have different curves indicating that not all trade-offs between true and false positive rate are achievable in both groups. The trade-offs that are achievable in both groups are precisely those that lie under both curves, corresponding to the intersection of the regions enclosed by the curves. The highlighted region is the feasible region of trade-offs that we can achieve in all groups.
There is a subtlety though. Points that are not exactly on the curves, but rather in the interior of the region, require randomization. To understand this point, consider a classifier that accepts everyone corresponding to true and false positive rate 1, the upper right corner of the plot. Consider another classifier that accepts no one, resulting in true and false positive rate 0, the lower left corner of the plot. Now, consider a third classifier that

Figure 8: Intersection of area under the curves.

given an instance randomly picks and applies the first classifier with probability p, and the second with probability 1 − p. This classifier achieves true and false positive rate p, thus giving us one point on the dashed line in the plot. In the same manner, we could have picked any other pair of classifiers and randomized between them. We can fill out the entire shaded region in this way, because it is convex, meaning that every point in it lies on a line segment between two classifiers on the boundary.

Sufficiency

Our third criterion formalizes that the score already subsumes the sensitive characteristic for the purpose of predicting the target. This idea again boils down to a conditional independence statement.

Definition 3. We say the random variables (R, A, Y) satisfy sufficiency if Y ⊥ A | R. (We can again display sufficiency as a graphical model as we did with separation before; if you haven’t seen graphical models before, feel free to ignore this interpretation.)

We will often just say that R satisfies sufficiency when the sensitive attribute A and target variable Y are clear from the context.
Let us write out the definition more explicitly in the binary case where Y ∈ {0, 1}. In this case, a random variable R is sufficient for A if and only if for all groups a, b and all values r in the support of R, we have

P{Y = 1 | R = r, A = a} = P{Y = 1 | R = r, A = b} .

When R has only two values we recognize this condition as requiring

a parity of positive/negative predictive values across all groups. While it is often useful to think of sufficiency in terms of positive and negative predictive values, there’s a useful alternative. Indeed, sufficiency turns out to be closely related to an important notion called calibration, as we will discuss next.

Calibration and sufficiency

In some applications it is desirable to be able to interpret the values of the score functions as probabilities. Formally, we say that a score R is calibrated if for all score values r in the support of R, we have

P{Y = 1 | R = r} = r .

This condition means that the set of all instances assigned a score value r has an r fraction of positive instances among them. The condition refers to the group of all individuals receiving a particular score value. It does not mean that at the level of a single individual a score of r corresponds to a probability r of a positive outcome. The latter is a much stronger property that is satisfied by the conditional expectation R = E[Y | X]. (Formally, we then have for every set S, P{Y = 1 | R = r, X ∈ S} = r.)
In practice, there are various heuristics to achieve calibration. For example, Platt scaling is a popular method that works as follows. Platt scaling takes a possibly uncalibrated score, treats it as a single feature, and fits a one-variable regression model against the target variable based on this feature. More formally, given an uncalibrated score R, Platt scaling aims to find scalar parameters a, b such that the sigmoid function

S = 1 / (1 + exp(aR + b))

fits the target variable Y with respect to the so-called log loss

−E[Y log S + (1 − Y) log(1 − S)].

This objective can be minimized given labeled examples drawn from (R, Y) as is standard in supervised learning.
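A minimal sketch of Platt scaling with sklearn, assuming we already have an uncalibrated score array and binary labels (the toy data below only illustrates the mechanics); note that sklearn parametrizes the sigmoid with the opposite sign convention from the formula above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy stand-ins for an uncalibrated score and binary labels.
rng = np.random.default_rng(0)
scores = rng.random(1000)
y = (rng.random(1000) < scores).astype(int)

# Fit a one-variable logistic regression on the raw score.
# sklearn models P{Y=1|R} = 1 / (1 + exp(-(w*R + c))), which matches the
# text's 1 / (1 + exp(a*R + b)) with a = -w and b = -c.
platt = LogisticRegression(C=1e6)  # large C ≈ essentially no regularization
platt.fit(scores.reshape(-1, 1), y)

calibrated = platt.predict_proba(scores.reshape(-1, 1))[:, 1]
```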

Calibration by group

From the definition, we can see that sufficiency is closely related to the idea of calibration. To formalize the connection we say that the score R satisfies calibration by group if it satisfies

P{Y = 1 | R = r, A = a} = r ,

for all score values r and groups a. Recall that calibration is the same requirement at the population level without the conditioning on A.

Fact 1. Calibration by group implies sufficiency.

Conversely, sufficiency is only slightly weaker than calibration by group in the sense that a simple renaming of score values goes from one property to the other.

Proposition 1. If a score R satisfies sufficiency, then there exists a function ℓ : [0, 1] → [0, 1] so that ℓ(R) satisfies calibration by group.

Proof. Fix any group a and put ℓ(r) = P{Y = 1 | R = r, A = a}. Since R satisfies sufficiency, this probability is the same for all groups a and hence this map ℓ is the same regardless of what value a we chose. Now, consider any two groups a, b. We have,

r = P{Y = 1 | ℓ(R) = r, A = a}
  = P{Y = 1 | R ∈ ℓ⁻¹(r), A = a}
  = P{Y = 1 | R ∈ ℓ⁻¹(r), A = b}
  = P{Y = 1 | ℓ(R) = r, A = b} ,

thus showing that ℓ(R) is calibrated by group.

We conclude that sufficiency and calibration by group are essentially equivalent notions. In particular, this gives us a large repertoire of methods for achieving sufficiency. We could, for example, apply Platt scaling for each of the groups defined by the sensitive attribute.

Calibration by group as a consequence of unconstrained learning

Sufficiency is often satisfied by default without the need for any explicit intervention. Indeed, we generally expect a learned score to satisfy sufficiency in cases where the sensitive attribute can be predicted from the other attributes. To illustrate this point we look at the calibration values of a standard logistic regression model on the standard UCI adult data set. We fit a logistic regression model using Python's sklearn library on the UCI training data. The model is then applied to the UCI test data. (Number of test samples in the UCI data set by group: 1,561 Black, 13,946 White; 5,421 Female, 10,860 Male.) We make no effort to either tune or calibrate the model. As we can see from the figure below, the model turns out to be fairly well calibrated by gender on its own without any explicit correction. We see some deviation when we look at calibration by race. The deviation we see in the mid deciles may be due to the scarcity of the test data in the corresponding group and deciles. For example, the 6th decile, corresponding to the score range (0.5, 0.6], on the test

Figure 9: Calibration by gender on UCI adult data. A straight diagonal line would correspond to perfect calibration.

Figure 10: Calibration by race on UCI adult data.

data has only 34 instances with the ‘Race’ attribute set to ‘Black.’ As a result, the error bars in this region are rather large. Continue to explore the UCI Adult data in this code example. The lesson is that sufficiency often comes for free (at least approximately) as a consequence of standard machine learning practices. The flip side is that imposing sufficiency as a constraint on a classification system may not be much of an intervention. In particular, it would not effect a substantial change in current practices.
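As a rough sketch of how such a calibration-by-group check can be computed (the linked code example is the authoritative version; the toy data below only stands in for the UCI Adult test scores and group labels):

```python
import numpy as np
import pandas as pd


def calibration_by_group(scores, y, groups, bins=10):
    """Empirical positive rate and sample count per score decile, per group."""
    df = pd.DataFrame({"score": scores, "y": y, "group": groups})
    df["decile"] = pd.cut(df["score"], np.linspace(0, 1, bins + 1),
                          include_lowest=True)
    return df.groupby(["group", "decile"], observed=True)["y"].agg(["mean", "size"])


# Toy usage; in practice `scores` would be predict_proba values on the UCI
# Adult test set and `groups` its 'sex' or 'race' column.
rng = np.random.default_rng(0)
scores = rng.random(5000)
groups = rng.choice(["a", "b"], size=5000)
y = (rng.random(5000) < scores).astype(int)
print(calibration_by_group(scores, y, groups))
```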

Relationships between criteria

The criteria we reviewed constrain the joint distribution in non-trivial ways. We should therefore suspect that imposing any two of them simultaneously over-constrains the space to the point where only degenerate solutions remain. We will now see that this intuition is largely correct. What this shows is that we cannot impose multiple criteria as hard constraints. This leaves open the possibility that meaningful trade-offs between these different criteria exist.

Independence versus Sufficiency

We begin with a simple proposition that shows how in general independence and sufficiency are mutually exclusive. The only assumption needed here is that the sensitive attribute A and the target variable Y are not independent. This is a different way of saying that group membership has an effect on the statistics of the target variable. In the binary case, this means one group has a higher rate of positive outcomes than another. Think of this as the typical case.

Proposition 2. Assume that A and Y are not independent. Then sufficiency and independence cannot both hold.

Proof. By the contraction rule for conditional independence,

A⊥R and A⊥Y | R =⇒ A⊥(Y, R) =⇒ A⊥Y .

To be clear, A⊥(Y, R) means that A is independent of the pair of random variables (Y, R). Dropping R cannot introduce a dependence between A and Y. In the contrapositive,

A ⊥̸ Y =⇒ A ⊥̸ R or A ⊥̸ Y | R .

Independence versus Separation

An analogous result of mutual exclusion holds for independence and separation. The statement in this case is a bit more contrived and requires the additional assumption that the target variable Y is binary. We also additionally need that the score is not independent of the target. This is a rather mild assumption, since any useful score function should have correlation with the target variable.

Proposition 3. Assume Y is binary, A is not independent of Y, and R is not independent of Y. Then, independence and separation cannot both hold.

Proof. Assume Y ∈ {0, 1}. In its contrapositive form, the statement we need to show is

A⊥R and A⊥R | Y =⇒ A⊥Y or R⊥Y .

By the law of total probability,

P{R = r | A = a} = ∑_y P{R = r | A = a, Y = y} P{Y = y | A = a} .

Applying the assumption A⊥R and A⊥R | Y, this equation simplifies to

P{R = r} = ∑_y P{R = r | Y = y} P{Y = y | A = a} .

Applied differently, the law of total probability also gives

P{R = r} = ∑_y P{R = r | Y = y} P{Y = y} .

Combining this with the previous equation, we have

∑_y P{R = r | Y = y} P{Y = y} = ∑_y P{R = r | Y = y} P{Y = y | A = a} .

Careful inspection reveals that when y ranges over only two values, this equation can only be satisfied if A⊥Y or R⊥Y. Indeed, we can rewrite the equation more compactly using the symbols p = P{Y = 0}, p_a = P{Y = 0 | A = a}, r_y = P{R = r | Y = y}, as:

p · r_0 + (1 − p) · r_1 = p_a · r_0 + (1 − p_a) · r_1 .

Equivalently, p(r_0 − r_1) = p_a(r_0 − r_1). This equation can only be satisfied if r_0 = r_1, in which case R⊥Y, or if p = p_a for all a, in which case Y⊥A.

The claim is not true when the target variable can assume more than two values, which is a natural case to consider.

Exercise 1. Give a counterexample to the claim in the previous proposition where the target variable Y assumes three distinct values.

Separation versus Sufficiency

Finally, we turn to the relationship between separation and sufficiency. Both ask for a non-trivial conditional independence relationship between the three variables A, R, Y. Imposing both simultaneously leads to a degenerate solution space, as our next proposition confirms.

Proposition 4. Assume that all events in the joint distribution of (A, R, Y) have positive probability, and assume A ⊥̸ Y. Then, separation and sufficiency cannot both hold.

Proof. A standard fact about conditional independence (see Theorem 17.2 in Wasserman, All of Statistics) shows

A⊥R | Y and A⊥Y | R =⇒ A⊥(R, Y) . Moreover, A⊥(R, Y) =⇒ A⊥R and A⊥Y . Taking the contrapositive completes the proof.

For a binary target, the non-degeneracy assumption in the previous proposition states that in all groups, at all score values, we have both positive and negative instances. In other words, the score value never fully resolves uncertainty regarding the outcome. In case the classifier is also binary, we can weaken the assumption to require only that the classifier is imperfect in the sense of making at least one false positive prediction. What's appealing about the resulting claim is that its proof essentially only uses a well-known relationship between true positive rate (recall) and positive predictive value (precision). This trade-off is often called the precision-recall trade-off.

Proposition 5. Assume Y is not independent of A and assume Ŷ is a binary classifier with nonzero false positive rate. Then, separation and sufficiency cannot both hold.

Proof. Since Y is not independent of A there must be two groups, call them 0 and 1, such that

p_0 = P{Y = 1 | A = 0} ≠ P{Y = 1 | A = 1} = p_1 .

Now suppose that separation holds. Since the classifier is imperfect this means that all groups have the same non-zero false positive rate

FPR > 0, and the same true positive rate TPR ≥ 0. We will show that sufficiency does not hold. Recall that in the binary case, sufficiency implies that all groups have the same positive predictive value. The positive predictive value in group a, denoted PPV_a, satisfies

PPV_a = TPR · p_a / (TPR · p_a + FPR · (1 − p_a)) .

From the expression we can see that PPV_0 = PPV_1 only if TPR = 0 or FPR = 0. The latter is ruled out by assumption. So it must be that TPR = 0. However, in this case, we can verify that the negative predictive value NPV_0 in group 0 must be different from the negative predictive value NPV_1 in group 1. This follows from the expression

NPV_a = (1 − FPR)(1 − p_a) / ((1 − TPR) · p_a + (1 − FPR)(1 − p_a)) .

Hence, sufficiency does not hold.
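For intuition, a quick numerical check of these two formulas with made-up rates (TPR = 0.8, FPR = 0.2, shared across groups so that separation holds) and base rates p_0 = 0.3 and p_1 = 0.5:

```python
def ppv(tpr, fpr, p):
    return tpr * p / (tpr * p + fpr * (1 - p))


def npv(tpr, fpr, p):
    return (1 - fpr) * (1 - p) / ((1 - tpr) * p + (1 - fpr) * (1 - p))


TPR, FPR = 0.8, 0.2          # same error rates in both groups
for p in (0.3, 0.5):         # different base rates in the two groups
    print(p, ppv(TPR, FPR, p), npv(TPR, FPR, p))
# PPV ≈ 0.63 vs 0.80 and NPV ≈ 0.90 vs 0.80, so sufficiency fails.
```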

A good exercise is to derive variants of these trade-offs such as the following.

Exercise 2. Prove the following result: Assume Y is not independent of A and assume Ŷ is a binary classifier with nonzero false positive rate and nonzero true positive rate. Then, if separation holds, there must be two groups with different positive predictive values.

Inherent limitations of observational criteria

All criteria we’ve seen so far have one important aspect in common. They are properties of the joint distribution of the score, sensitive attribute, and the target variable. In other words, if we know the joint distribution of the random variables (R, A, Y), we can without ambiguity determine whether this joint distribution satisfies one of 86 For example, if all variables are bi- 86 nary, there are eight numbers specifying these criteria or not. the joint distributions. We can verify the We can broaden this notion a bit and also include all other fea- property by looking only at these eight tures, not just the sensitive attribute. So, let’s call a criterion observa- numbers. tional if it is a property of the joint distribution of the features X, the 87 Formally, this means an observa- sensitive attribute A, a score function R and an outcome variable Y.87 tional property is defined by set of joint distributions over a given set of Informally, a criterion is observational if we can express it using variables. probability statements involving the random variables at hand.

Exercise 3. Convince yourself that independence, separation, and sufficiency are all observational definitions. Come up with a criterion that is not observational.

Observational definitions have many appealing aspects. They're often easy to state and require only a lightweight formalism. They make no reference to the inner workings of the classifier, the decision maker's intent, the impact of the decisions on the population, or any notion of whether and how a feature actually influences the outcome. We can reason about them fairly conveniently as we saw earlier. In principle, observational definitions can always be verified given samples from the joint distribution—subject to statistical sampling error. At the same time, all observational definitions share inherent limitations that we will explore now. Our starting point is a pair of fictitious worlds with substantively different characteristics. We will see that despite their differences these two worlds can map to identical joint distributions. What follows is that all observational criteria will look the same in either world, thus glossing over whatever differences there are. To develop these two worlds, we'll use the case of a fictitious advertising campaign that targets a hiring ad to software engineers. A score function estimates the likelihood that an individual is a software engineer given some available features.

Scenario I

Imagine we introduce the following random variables in our classification problem.

• A indicates gender
• X1 indicates whether the user visited pinterest.com
• X2 indicates whether the user visited github.com
• R∗ is the optimal unconstrained score
• R̃ is the optimal score satisfying separation
• Y indicates whether the user is a software engineer

We can summarize the conditional independence relationships between the variables in a directed graphical model. (Learn more about graphical models here.) The main fact we need is that a node is conditionally independent of any node that is not one of its descendants, given its parents.

Figure 11: Directed graphical model for the variables in Scenario I

Let’s imagine a situation that corresponds to this kind of graphical model. We could argue that gender influences the target variable, since currently software engineers are predominantly male. Gender also influences the first feature, since Pinterest’s user base skews 89 As of August 2017, 58.9% of Pinter- 89 female. We assume github.com has a male bias. However, this est’s users in the United States were bias is explained by the target variable in the sense that conditional female. See here (Retrieved 3-27-2018) on being a software engineer, all genders are equally likely to visit github.com. Once we make these assumptions, we can work out what the optimal unconstrained classifier will do. Both features correlate with the target variable and are therefore useful for prediction. The first feature is predictive since (absent other information) visiting pinterest.com suggests female gender, which in turns makes “soft- ware engineer” less likely. The second feature is predictive in a more direct sense, as the website is specifically designed for software engineers. The optimal classifier satisfying separation will refrain from using the first feature (visiting pinterest.com). After all, we can see from the graphical model that this feature is not conditionally independent of the sensitive attribute given the target. This score will only use the directly predictive feature github.com, which is indeed conditionally independent of gender given the target.

Scenario II

Our two features are different in Scenario II, but all other variables have the same interpretation.

• X1 indicates whether the user studied computer science
• X2 indicates whether the user visited the Grace Hopper conference

Although the other variables have the same names and interpretations, we now imagine a very different graphical model.

Figure 12: Directed graphical model for the variables in Scenario II

As before, we assume that gender influences the target variable, but now we assume that the target variable is conditionally independent from gender given the first feature. That is, conditional on having studied computer science, all genders are equally likely to go on to become software engineers.

With these assumptions, we can again work out the optimal unconstrained classifier. This time, the optimal unconstrained classifier will only use one feature, namely the first. The reason is that, given the first feature, all remaining features (including the sensitive attribute) become conditionally independent of the target. Therefore, knowing the second feature does not help in predicting the target, once we have the first. The optimal classifier under separation turns out to be a bit subtle in Scenario II. The issue is that neither of the two features is conditionally independent from the sensitive attribute given the target. The classifier will therefore actively take the sensitive attribute into account in order to subtract its influence on the other features.

Different interpretations

Interpreted in the concrete advertising context, the two scenarios don't seem very similar. In particular, the inner workings of the optimal unconstrained classifier in each scenario are rather different. In the first scenario it uses pinterest.com as a weak proxy for being female, which it then uses as a proxy for not being a software engineer. Software engineers who visit pinterest.com might be concerned about this kind of stereotyping, as they might miss out on seeing the ad, and hence the job opportunity. In the second scenario, the unconstrained score leads to a classifier that is natural in the sense that it only considers the directly predictive educational information. Absent other features, this would seem agreeable. Similarly, the optimal classifier satisfying separation behaves differently in the two scenarios. In the first, it corresponds to the natural classifier that only uses github.com when predicting software engineer. Since github.com is primarily a website for software engineers, this seems reasonable. In the second scenario, however, the optimal constrained score performs a subtle adjustment procedure that explicitly takes the sensitive attribute into account. These score functions are also not equivalent from a legal standpoint. One uses the sensitive attribute explicitly for an adjustment step, while the other does not.

Indistinguishability

Despite all their apparent differences, we can instantiate the random variables in each scenario in such a manner that the two scenarios map to identical joint distributions. This means that no property of the joint distribution will be able to distinguish the two scenarios. Whatever property holds for one scenario, it will inevitably also hold for the other. If by some observational criterion we call one scenario unfair, we will also have to call the other unfair.

Proposition 6. The random variables in Scenario I and II admit identical joint distributions. In particular, no observational criterion distinguishes between the two scenarios.

The indistinguishability result has nothing to do with sample sizes or sampling errors. No matter how many data points we have, the size of our data does not resolve the indistinguishability. There's another interesting consequence of this result. Observational criteria cannot even determine if the sensitive attribute was fed into the classifier or not. To see this, recall that the optimal constrained score in one scenario directly uses gender, in the other it does not.

A forced perspective problem

To understand the indistinguishability result, it's useful to draw an analogy with a forced perspective problem. Two different objects can appear identical when looked at from a certain fixed perspective. A data set always forces a particular perspective on reality. There is a possibility that this perspective makes it difficult to identify certain properties of the real world. Even if we have plenty of data, so long as this data comes from the same distribution, it still represents the same perspective. Having additional data is a bit like increasing the resolution of our camera. It helps with some problems, but it doesn't change the angle or the position of the camera. The limitations of observational criteria are fundamentally the limitations of a single perspective. When analyzing a data set through the lens of observational criteria we do not evaluate alternatives to the data we have. Observational criteria do not tell us what is missing from our perspective. What then is not observational and how do we go beyond observational criteria? This is a profound question that will be the focus of later chapters. In particular, we will introduce the technical repertoire of measurement and causality to augment the classification paradigm. Both measurement and causality give us mechanisms to interrogate, question, and change the perspective suggested by our data.

Case study: Credit scoring

We now apply some of the notions we saw to credit scoring. Credit scores support lending decisions by giving an estimate of the risk that a loan applicant will default on a loan. Credit scores are widely used in the United States and other countries when allocating credit, ranging from micro loans to jumbo mortgages. In the United States, there are three major credit-reporting agencies that collect data on

various lendees. These agencies are for-profit organizations that each offer risk scores based on the data they collected. FICO scores are a well-known family of proprietary scores developed by FICO and sold by the three credit-reporting agencies. Regulation of credit agencies in the United States started with the Fair Credit Reporting Act, first passed in 1970, which aims to promote the accuracy, fairness, and privacy of consumer information collected by the reporting agencies. The Equal Credit Opportunity Act, a United States law enacted in 1974, makes it unlawful for any creditor to discriminate against any applicant on the basis of race, color, religion, national origin, sex, marital status, or age.

Score distribution

Our analysis relies on data published by the Federal Reserve (The Federal Reserve Board, "Report to the Congress on Credit Scoring and Its Effects on the Availability and Affordability of Credit," https://www.federalreserve.gov/boarddocs/rptcongress/creditscore/, 2007). The data set provides aggregate statistics from 2003 about a credit score, demographic information (race or ethnicity, gender, marital status), and outcomes (to be defined shortly). We'll focus on the

joint statistics of score, race, and outcome, where the race attribute assumes four values detailed below. (These numbers come from the "Estimation sample" column of Table 9 on this web page.)

Table 4: Credit score distribution by ethnicity

Race or ethnicity    Samples with both score and outcome
White                133,165
Black                18,274
Hispanic             14,702
Asian                7,906
Total                174,047

The score used in the study is based on the TransUnion TransRisk score. TransUnion is a US credit-reporting agency. The TransRisk score is in turn based on a proprietary model created by FICO, hence often referred to as FICO scores. The Federal Reserve renormalized the scores for the study to vary from 0 to 100, with 0 being least creditworthy. The information on race was provided by the Social Security Administration, thus relying on self-reported values. The cumulative distribution of these credit scores strongly depends on the group as the next figure reveals. For an extensive documentation of the data set see the Federal Reserve report.

Figure 13: Cumulative density of scores by group.

Performance variables and ROC curves

As is often the case, the outcome variable is a subtle aspect of this data set. Its definition is worth emphasizing. Since the score model is proprietary, it is not clear what target variable was used during the training process. What is it then that the score is trying to predict? In a first reaction, we might say that the goal of a credit score is to predict a default outcome. However, that's not a clearly defined notion. Defaults vary in the amount of debt recovered, and the amount of time given for recovery. Any single binary performance indicator is typically an oversimplification. What is available in the Federal Reserve data is a so-called performance variable that measures a serious delinquency in at least one credit line during a certain time period. More specifically,

(the) measure is based on the performance of new or existing accounts and measures whether individuals have been late 90 days or more on

one or more of their accounts or had a public record item or a new collection agency account during the performance period. (Quote from the Federal Reserve report.)

With this performance variable at hand, we can look at the ROC curve to get a sense of how predictive the score is in different demographics. The meaning of true positive rate is the rate of predicted positive performance given positive performance. Similarly, false positive rate is the rate of predicted positive performance given negative performance. We see that the shapes appear roughly visually similar in the groups, although the ‘White’ group encloses a noticeably larger area under the curve than the ‘Black’ group. Also note that even two ROC curves with the same shape can correspond to very different score

Figure 14: ROC curve of credit score by group.

Figure 15: ROC curve of credit score by group zoomed in on region of large differences.

functions. A particular trade-off between true positive rate and false positive rate achieved at a threshold t in one group could require a different threshold t′ in the other group.

Comparison of different criteria

With the score data at hand, we compare four different classification strategies:

• Maximum profit: Pick possibly group-dependent score thresholds in a way that maximizes profit.
• Single threshold: Pick a single uniform score threshold for all groups in a way that maximizes profit.
• Separation: Achieve an equal true/false positive rate in all groups. Subject to this constraint, maximize profit.
• Independence: Achieve an equal acceptance rate in all groups. Subject to this constraint, maximize profit.

To make sense of maximizing profit, we need to assume a reward for a true positive (correctly predicted positive performance), and a cost for false positives (negative performance predicted as positive). In lending, the cost of a false positive is typically many times greater than the reward for a true positive. In other words, the interest payments resulting from a loan are relatively small compared with the loan amount that could be lost. For illustrative purposes, we imagine that the cost of a false positive is 6 times greater than the return on a true positive. The absolute numbers don't matter. Only the ratio matters. This simple cost structure glosses over a number of details that are likely relevant for the lender such as the terms of the loan. There is another major caveat to the kind of analysis we're about to do. Since we're only given aggregate statistics, we cannot retrain the score with a particular classification strategy in mind. The only thing we can do is to define a setting of thresholds that achieves a particular criterion. This approach may be overly pessimistic with regard to the profit achieved subject to each constraint. For this reason and the fact that our choice of cost function was rather arbitrary, we do not state the profit numbers. The numbers can be found in the original analysis (Hardt, Price, and Srebro, "Equality of Opportunity in Supervised Learning," in Proc. 29th NIPS, 2016, 3315–23), which reports that ‘single threshold’ achieves higher profit than ‘separation,’ which in turn achieves higher profit than ‘independence.’ What we do instead is to look at the different trade-offs between true and false positive rate that each criterion achieves in each group. We can see that even though the ROC curves are somewhat similar, the resulting trade-offs can differ widely by group for some

Figure 16: ROC curves with thresholds induced by different criteria.

of the criteria. The true positive rate achieved by max profit for the Asian group is twice what it is for the Black group. The separation criterion, of course, results in the same trade-off in all groups. Independence equalizes acceptance rate, but leads to widely different trade-offs. For instance, the Asian group has a false positive rate more than three times the false positive rate within the Black group.
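The following minimal sketch illustrates how the ‘maximum profit’ and ‘single threshold’ policies can be computed from per-group ROC curves; the curves, group sizes, and base rates below are synthetic stand-ins, and only the 6:1 cost ratio comes from the text.

```python
import numpy as np

# Synthetic ROC-like data on a common grid of score thresholds.
thresholds = np.linspace(0, 100, 101)
x = 1 - thresholds / 100  # fraction of applicants above each threshold, roughly
groups = {
    # group: (repayment base rate p_a, group size n_a, TPR(t), FPR(t))
    "A": (0.85, 1000, x ** 0.5, x ** 2.0),
    "B": (0.70, 1000, x ** 0.6, x ** 1.6),
}
COST_RATIO = 6  # a false positive costs 6x the reward of a true positive


def profit(p, n, tpr, fpr):
    # Expected profit (in units of the true-positive reward) per threshold.
    return n * (p * tpr - COST_RATIO * (1 - p) * fpr)


# Max profit: pick a separate profit-maximizing threshold per group.
for g, (p, n, tpr, fpr) in groups.items():
    print(g, "threshold:", thresholds[np.argmax(profit(p, n, tpr, fpr))])

# Single threshold: maximize the sum of per-group profits over one threshold.
total = sum(profit(p, n, tpr, fpr) for (p, n, tpr, fpr) in groups.values())
print("single threshold:", thresholds[np.argmax(total)])
```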

Calibration values

Finally, we consider the non-default rate by group. This corresponds to the calibration plot by group. (The error bars on these plots were omitted as they are generally small except for very low score values (0–5) where few samples are available.)

Figure 17: Calibration values of credit score by group.

We see that the performance curves by group are reasonably well aligned. This means that a monotonic transformation of the score values would result in a score that is roughly calibrated by group according to our earlier definition. Due to the differences in score distribution by group, it could nonetheless be the case that

thresholding the score leads to a classifier with different positive predictive values in each group. Feel free to continue exploring the data in this code repository.

Problem set: Criminal justice case study

Risk assessment is an important component of the criminal justice system. In the United States, judges set bail and decide pre-trial detention based on their assessment of the risk that a released defendant would fail to appear at trial or cause harm to the public. While actuarial risk assessment is not new in this domain, there is increasing support for the use of learned risk scores to guide human judges in their decisions. Proponents argue that machine learning could lead to greater efficiency and less biased decisions compared with human judgment. Critical voices raise the concern that such scores can perpetuate inequalities found in historical data, and systematically harm historically disadvantaged groups. In this problem set (solutions to these problems are available to course instructors on request), we'll begin to scratch at the surface of the complex criminal justice domain. Our starting point is an investigation carried out by ProPublica (Angwin et al., "Machine Bias," ProPublica, 2016) of a proprietary risk score, called the COMPAS score. These scores are intended to assess the risk that a defendant will re-offend, a task often called recidivism prediction. Within the academic community, the ProPublica article drew much attention to the trade-off between separation and sufficiency that we saw earlier. We'll use data obtained and released by ProPublica as a result of a public records request in Broward County, Florida, concerning the COMPAS recidivism prediction system. The data is available here. Following ProPublica's analysis, we'll filter out rows where days_b_screening_arrest is over 30 or under -30, leaving us with 6,172 rows.
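A minimal pandas sketch of the loading and filtering step described above; the filename is the one commonly used for the ProPublica release and is an assumption here.

```python
import pandas as pd

# Assumes the ProPublica file has been downloaded locally.
df = pd.read_csv("compas-scores-two-years.csv")

# Keep rows where the screening date is within 30 days of the arrest,
# following ProPublica's analysis; the chapter reports 6,172 remaining rows.
df = df[df["days_b_screening_arrest"].between(-30, 30)]
print(len(df))
```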

Calibration/sufficiency

• Plot the fraction of defendants recidivating within two years (two_year_recid == 1) as a function of risk score (decile_score), for Black defendants (race == "African-American") and White defendants (race == "Caucasian").
• Based on these plots, does the risk score satisfy sufficiency across racial groups in this dataset? This is somewhat subjective, since we want to allow for approximate equality between groups; justify your answer in a sentence or two.

Error rates/separation

• Plot the distribution of scores received by the positive class (recidivists) and the distribution of scores received by the negative class (non-recidivists) for Black defendants and for White defendants.
• Based on these plots, does COMPAS achieve separation between the risk score and race?
• Report the Positive Predictive Value, False Positive Rate, and False Negative Rate for a risk threshold of 4 (i.e., defendants with decile_score >= 4 are classified as high risk), for Black defendants and for White defendants.
• Can we pick two thresholds (one for Black defendants, one for White defendants) such that FPR and FNR are roughly equal for the two groups (say, within 1% of each other)? What is the PPV for the two groups in this case? Note: trivial thresholds of 0 or 11 don't count.

Risk factors and interventions

• Report the recidivism rate of defendants aged 25 or lower, and defendants aged 50 or higher. Note the stark difference between the two: younger defendants are far more likely to recidivate.

The following questions are best viewed as prompts for a class discussion.

• Suppose we are interested in taking a data-driven approach to changing the criminal justice system. Under a theory of incarceration as incapacitation (prevention of future crimes by removal of individuals from society), how might we act on the finding that younger defendants are more likely to reoffend?
• How might we act on this finding under a rehabilitative approach to justice, in which we seek to find interventions that minimize a defendant's risk of recidivism?
• Under a retributive theory of justice, punishment is based in part on culpability, or blameworthiness; this in turn depends on how much control the defendant had over their actions. Under such a theory, how might we act on the finding that younger defendants are more likely to reoffend (and, more generally, commit offenses at all)?

Problem set: Data modeling of traffic stops

For this problem we’ll use data released by the Stanford Open Polic- ing Project (SOPP) for the state of North Carolina, available here. It classification 69

contains records of 9.6 million police stops in the state between 2000 and 2015. General notes and hints:

• The stop rates section of this problem requires linking SOPP data to census data, whereas the rest is based only on SOPP data and no external datasets. So you might want to work on post-stop outcomes and the following sections first, so that you can get familiar with the SOPP data before having to also deal with the census data. • Throughout this problem, report any data cleaning steps (such as dropping some rows) that you took. Also report any ambiguities you encountered and how you resolved them.

Stop rates

Part A

• For each possible group defined by race, age, gender, location, and year, where:
  – race is one of "Asian," "Black," "Hispanic," "White"
  – age is one of the buckets 15–19, 20–29, 30–39, 40–49, and 50+
  – gender is one of "female," "male"
  – location is a state patrol troop district
  – and year is between 2010 and 2015, inclusive
• report the following:

  – the population of the group from census data, and
  – the number of stops in that group from SOPP data.

The census data is available here and the fields are explained here. Your data should look like the table below.

Table 5: Census data

Race      Age    Gender  Location  Year  Population  Count
Hispanic  30-39  F       B5        2012  434         76
White     40-49  F       C8        2011  2053        213
Asian     15-19  M       A2        2012  2           0
White     20-29  M       A6        2011  8323        1464
Hispanic  20-29  F       D3        2010  393         56
Black     40-49  F       D7        2011  1832        252
Asian     30-39  M       E6        2013  503         34
Asian     15-19  F       B5        2015  12          4
White     20-29  M       A5        2012  12204       1852
Black     15-19  F       H1        2011  1281        55

Notes and hints:

• The table is a small sample of rows from the actual answer. You can use it to check your answers. There should be about 13,000 rows in the table in total.
• The relevant fields in the census data are AA_[FE]MALE, BA_[FE]MALE, H_[FE]MALE, WA_[FE]MALE.
• The relevant fields in the SOPP data are driver_race, driver_age, driver_gender, district, and stop_date.
• The census data is grouped by county, which is more granular than district. The mapping from county to district is available from SOPP here.

Part B

• Fit a negative binomial regression to your data from part (A) as given on page 5 of the SOPP paper. Report the coefficients of race, age, and gender, and the overdispersion parameter φ. Based on these coefficients, what is the ratio of stop rates of Hispanic drivers to White drivers, and Black drivers to White drivers, controlling for age, gender, location, and year?

Notes and hints:

• This and the following tasks will be easier using a data modeling framework such as R or statsmodels rather than an algorithmic modeling framework such as scikit-learn; a minimal statsmodels sketch follows these notes.
• The "Population" column in your data corresponds to the "exposure" variable in most frameworks. Equivalently, "offset" is the log of the exposure.
• The coefficients of the different values of each variable (e.g. female and male) are not interpretable individually; only the difference is interpretable.
• Treat year as a categorical rather than a continuous variable.
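As a hedged sketch of the model setup only (the dataframe below is a toy stand-in for the Part A table, and keyword arguments may need adjusting to your statsmodels version):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Toy stand-in for the Part A table; in the real problem this comes from
# joining the SOPP stop counts with census populations.
rng = np.random.default_rng(0)
n = 400
counts = pd.DataFrame({
    "race": rng.choice(["Asian", "Black", "Hispanic", "White"], n),
    "age": rng.choice(["15-19", "20-29", "30-39", "40-49", "50+"], n),
    "gender": rng.choice(["female", "male"], n),
    "district": rng.choice([f"D{i}" for i in range(8)], n),
    "year": rng.choice(range(2010, 2016), n),
    "population": rng.integers(100, 10_000, n),
})
counts["count"] = rng.poisson(0.05 * counts["population"])

# Negative binomial regression with population as the exposure;
# year is treated as categorical, per the hints above.
model = smf.negativebinomial(
    "count ~ C(race) + C(age) + C(gender) + C(district) + C(year)",
    data=counts,
    exposure=counts["population"],
)
print(model.fit(disp=0).summary())
```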

Part C

• Give three distinct potential reasons for the racial disparity in stop rate as measured in part B.

Post-stop outcomes

Part D

• Controlling for age (bucketed as in parts A & B), gender, year, and location, use logistic regression to estimate impact of race on

  – probability of a search (search_conducted)

  – probability of arrest (is_arrested),
  – probability of a citation (stop_outcome == "Citation")
• For each of the three outcomes, report the coefficients of race, age, and gender along with standard errors of those coefficients. Feel free to sample the data for performance reasons, but if you do, make sure that all standard errors are < 0.1.

Part E

• Interpret the coefficients you reported in part D.
  – What is the ratio of the probability of search of Hispanic drivers to White drivers? Black drivers to White drivers?
  – Repeat the above for the probability of arrest instead of search.
  – What is the difference in citation probability between Hispanic drivers and White drivers? Black drivers and White drivers?
  – Comment on the age and gender coefficients in the regressions.

Notes and hints:

• Interpreting the coefficients is slightly subjective. Since the search and arrest rates are low, in those regressions we can approximate the logistic formula 1/(1 + e^(−βx)) as e^(βx), and thus we can use differences in β between groups to calculate approximate ratios of search/arrest probabilities (see the worked example after these notes).
• This trick doesn't work for citation rates, since those are not low. However, we can pick "typical" values for the control variables, calculate citation rates, and find the difference in citation rate between groups. The results will have little sensitivity to the values of the control variables that we pick.
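As a worked illustration of the first hint (the coefficient difference 0.6 below is a made-up number, not a value from the data): since search rates are low,

P{search | Black, x} / P{search | White, x} ≈ exp(β_Black + β·x) / exp(β_White + β·x) = exp(β_Black − β_White),

so a difference of β_Black − β_White = 0.6 would correspond to a search-rate ratio of roughly e^0.6 ≈ 1.8, holding the controls x fixed.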

Part F

Explain in a sentence or two why we control for variables such as gender and location in the regression, and why the results might not be what we want if we don't control for them. (In other words, explain the idea of a confound in this context.)

Part G

However, decisions about what to control are somewhat subjective. What is one reason we might not want to control for location in testing for discrimination? In other words, how might we underestimate discrimination if we control for location? (Hint: broaden the idea of discrimination from individual officers to the systemic aspects of policing.)

Data quality

Part H

The SOPP authors provide a README file in which they note the incompleteness, errors, and missing values in the data on a state-by-state level. Pick any two items from this list and briefly explain how each could lead to errors or biases in the analyses you performed (or in the other analyses performed in the paper).

Notes and hints:

• Here is one example: For North Carolina, stop time is not available for a subset of rows. Suppose we throw out the rows with missing stop time (which we might have to if that variable is one of the controls in our regression). These rows might not be a random subset of rows: they could be correlated with location, because officers in some districts don't record the stop time. If so, we might incorrectly estimate race coefficients, because officer behavior might also be correlated with location.

What is the purpose of a fairness criterion?

There is an important question we have neglected so far. Although we have seen several demographic classification criteria and explored their formal properties and the relationships between them, we haven't yet clarified the purpose of these criteria. This is a difficult normative question that will be a central concern of the next chapter. Let us address it briefly here. Take the independence criterion as an example. Some support this criterion based on the belief that certain intrinsic human traits such as intelligence are independent of, say, race or gender. Others argue for independence based on their desire to live in a society where the sensitive attribute is statistically independent of outcomes such as financial well-being. In one case, independence serves as a proxy for a belief about human nature. In the other case, it represents a long-term societal goal. In either case, does it then make sense to impose independence as a constraint on a classification system? In a lending setting, for example, independence would result in the same rate of lending in all demographic groups defined by the sensitive attribute, regardless of the fact that individuals' ability to repay might be distributed differently in different groups. This makes it hard to predict the long-term impact of an intervention that imposes independence as a hard classification constraint. It is not clear how to account for the impact of the fact that giving out loans to individuals who cannot repay them impoverishes the individual who defaults (in addition to diminishing profits for the bank). Without an accurate model of long-term impact it is difficult to foresee the effect that a fairness criterion would have if implemented

as a hard classification constraint. However, if such a model of long-term impact were available, directly optimizing for long-term benefit may be a more effective intervention than imposing a general and crude demographic criterion (Liu et al., "Delayed Impact of Fair Machine Learning," in Proc. 35th ICML, 2018, 3156–64).

If demographic criteria are not useful as direct guides to fairness interventions, how should we use them then? An alternative view is that classification criteria have diagnostic value in highlighting different social costs of the system. Disparities in true positive rates or false positive rates, for example, indicate that two or more demographic groups experience different costs of classification that are not necessarily reflected in the cost function that the decision maker optimized. At the same time, the diagnostic value of fairness criteria is subject to the fundamental limitations that we saw. In particular, we cannot base a conclusive argument of fairness or unfairness on the value of any observational criterion alone. Furthermore, Corbett-Davies et al. make the important point that statistics such as positive predictive values or false positive rates can be manipulated through external (and possibly harmful) changes to the real world processes reflected in the data (Corbett-Davies et al., "Algorithmic Decision Making and the Cost of Fairness," arXiv preprint arXiv:1701.08230, 2017). In the context of recidivism prediction in criminal justice, for example, we could artificially lower the false positive rate in one group by arresting innocent people and correctly classifying them as low risk. This external intervention will decrease the false positive rate at the expense of a clearly objectionable practice.

Bibliographic notes and further reading

The fairness criteria reviewed in this chapter were already known in the 1960s and 70s, primarily in the education testing and psychometrics literature. (We are grateful to Ben Hutchinson for bringing these to our attention.) An important fairness criterion is due to Cleary and compares regression lines between the test score and the outcome in different groups. A test is considered fair by the Cleary criterion if the slope of these regression lines is the same for each group. This turns out to be equivalent to the sufficiency criterion, since it means that at a given score value all groups have the same rate of positive outcomes (Cleary, "Test Bias: Validity of the Scholastic Aptitude Test for Negro and White Students in Integrated Colleges," ETS Research Bulletin Series 1966, no. 2 (1966): i–23; Cleary, "Test Bias: Prediction of Grades of Negro and White Students in Integrated Colleges," Journal of Educational Measurement 5, no. 2 (1968): 115–24).

Einhorn and Bass considered equality of precision values, which is a relaxation of sufficiency as we saw earlier (Einhorn and Bass, "Methodological Considerations Relevant to Discrimination in Employment Testing," Psychological Bulletin 75, no. 4 (1971): 261). Thorndike considered a weak variant of calibration by which the frequency of positive predictions must equal the frequency of positive outcomes in each group, and proposed achieving it via a post-processing step that sets different thresholds in different groups (Thorndike, "Concepts of Culture-Fairness," Journal of Educational Measurement 8, no. 2 (1971): 63–70). Thorndike's criterion is incomparable to sufficiency in general.

Darlington ("Another Look at ‘Cultural Fairness’," Journal of Educational Measurement 8, no. 2 (1971): 71–82) stated four different criteria in terms of succinct

expressions involving the correlation coefficients between various pairs of random variables. These criteria include independence, a relaxation of sufficiency, a relaxation of separation, and Thorndike's criterion. Darlington included an intuitive visual argument showing that the four criteria are incompatible except in degenerate cases. Lewis reviewed three fairness criteria including equal precision and equal true/false positive rates (Lewis, "A Comparison of Three Models for Determining Test Fairness," Federal Aviation Administration Washington DC Office of Aviation Medicine, 1978).

These important early works were re-discovered later in the machine learning and data mining community. Numerous works considered variants of independence as a fairness constraint (Calders, Kamiran, and Pechenizkiy, "Building Classifiers with Independency Constraints," in Proc. IEEE ICDMW, 2009, 13–18; Kamiran and Calders, "Classifying Without Discriminating," in Proc. 2nd International Conference on Computer, Control and Communication, 2009). Feldman et al. studied a relaxation of demographic parity in the context of disparate impact law (Feldman et al., "Certifying and Removing Disparate Impact"). Zemel et al. adopted the mutual information viewpoint and proposed a heuristic pre-processing approach for minimizing mutual information (Zemel et al., "Learning Fair Representations," in Proc. 30th ICML, 2013). Dwork et al. argued that the independence criterion was inadequate as a fairness constraint (Dwork et al., "Fairness Through Awareness").

The separation criterion appeared under the name equalized odds, alongside the relaxation to equal false negative rates, called equality of opportunity (Hardt, Price, and Srebro, "Equality of Opportunity in Supervised Learning"). These criteria also appeared in an independent work under different names (Zafar et al., "Fairness Beyond Disparate Treatment & Disparate Impact: Learning Classification Without Disparate Mistreatment," in Proc. 26th WWW, 2017). Woodworth et al. studied a relaxation of separation stated in terms of correlation coefficients (Woodworth et al., "Learning Non-Discriminatory Predictors," in Proc. 30th COLT, 2017, 1920–53). This relaxation corresponds to the third criterion studied by Darlington ("Another Look at ‘Cultural Fairness’").

ProPublica implicitly adopted equality of false positive rates as a fairness criterion in their article on COMPAS scores (Angwin et al., "Machine Bias"). Northpointe, the maker of the COMPAS software, emphasized the importance of calibration by group in their rebuttal to ProPublica's article (Dieterich, Mendoza, and Brennan, "COMPAS Risk Scales: Demonstrating Accuracy Equity and Predictive Parity," 2016, https://www.documentcloud.org/documents/2998391-ProPublica-Commentary-Final-070616.html). Similar arguments were made quickly after the publication of ProPublica's article by bloggers including Abe Gong (see this and subsequent posts). There has been extensive scholarship on actuarial risk assessment in criminal justice that long predates the ProPublica debate; Berk et al. provide a survey with commentary (Berk et al., "Fairness in Criminal Justice Risk Assessments: The State of the Art," arXiv e-prints 1703.09207 (2017)).

Variants of the trade-off between separation and sufficiency were shown by Chouldechova ("Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments," in Proc. 3rd FATML, 2016) and Kleinberg et al. ("Inherent Trade-Offs in the Fair Determination of Risk Scores," arXiv preprint arXiv:1609.05807, 2016). Each of them considered somewhat different criteria to trade off. Chouldechova's argument is very similar to the proof we presented that invokes the relationship between positive predictive value and true positive rate. Subsequent work considers trade-offs between relaxed and approximate criteria (Pleiss et al., "On Fairness and Calibration," in Proc. 30th NIPS, 2017). The other trade-off results presented in this chapter are new to this book. The proof of the proposition relating separation and independence for binary classifiers, as well as the counterexample for ternary classifiers, is due to Shira Mitchell and Jackie Shadlen, pointed out to us in personal communication.

The unidentifiability result for observational criteria is due to Hardt, Price, and Srebro ("Equality of Opportunity in Supervised Learning"), except for minor changes in the choice of graphical models and their interpretation.

A dictionary of criteria

For convenience we collect some demographic fairness criteria below that have been proposed in the past (not necessarily including the original reference). We’ll match them to their closest relative among the three criteria independence, separation, and sufficiency. This table is meant as a reference only and is not exhaustive. There is no need to memorize these different names.

Table 6: List of demographic fairness criteria

Name                              Closest relative  Note        Reference
Statistical parity                Independence      Equivalent  Dwork et al. (2011)
Group fairness                    Independence      Equivalent
Demographic parity                Independence      Equivalent
Conditional statistical parity    Independence      Relaxation  Corbett-Davies et al. (2017)
Darlington criterion (4)          Independence      Equivalent  Darlington (1971)
Equal opportunity                 Separation        Relaxation  Hardt, Price, Srebro (2016)
Equalized odds                    Separation        Equivalent  Hardt, Price, Srebro (2016)
Conditional procedure accuracy    Separation        Equivalent  Berk et al. (2017)
Avoiding disparate mistreatment   Separation        Equivalent  Zafar et al. (2017)
Balance for the negative class    Separation        Relaxation  Kleinberg, Mullainathan, Raghavan (2016)
Balance for the positive class    Separation        Relaxation  Kleinberg, Mullainathan, Raghavan (2016)
Predictive equality               Separation        Relaxation  Chouldechova (2016)
Equalized correlations            Separation        Relaxation  Woodworth (2017)
Darlington criterion (3)          Separation        Relaxation  Darlington (1971)
Cleary model                      Sufficiency       Equivalent  Cleary (1966)
Conditional use accuracy          Sufficiency       Equivalent  Berk et al. (2017)
Predictive parity                 Sufficiency       Relaxation  Chouldechova (2016)
Calibration within groups         Sufficiency       Equivalent  Chouldechova (2016)
Darlington criterion (1), (2)     Sufficiency       Relaxation  Darlington (1971)

4 Legal background and normative questions

Coming up soon!

5 Causality

Our starting point is the difference between an observation and an action. What we see in passive observation is how individuals follow their routine behavior, habits, and natural inclination. Passive observation reflects the state of the world projected to a set of features we chose to highlight. Data that we collect from passive observation show a snapshot of our world as it is. There are many questions we can answer from passive observation alone: Do 16 year-old drivers have a higher incidence rate of traffic accidents than 18 year-old drivers? Formally, the answer corresponds to a difference of conditional probabilities assuming we model the population as a distribution as we did in the last chapter. We can calculate the conditional probability of a traffic accident given that the driver's age is 16 years and subtract from it the conditional probability of a traffic accident given the age is 18 years. Both conditional probabilities can be estimated from a large enough sample drawn from the distribution, assuming that there are both 16 year old and 18 year old drivers. The answer to the question we asked is solidly in the realm of observational statistics. But important questions often are not observational in nature. Would traffic fatalities decrease if we raised the legal driving age by two years? Although the question seems similar on the surface, we quickly realize that it asks for a fundamentally different insight. Rather than asking for the frequency of an event in our manifested world, this question asks for the effect of a hypothetical action. As a result, the answer is not so simple. Even if older drivers have a lower incidence rate of traffic accidents, this might simply be a consequence of additional driving experience. There is no obvious reason why an 18 year old with two months on the road would be any less likely to be involved in an accident than, say, a 16 year-old with the same experience. We can try to address this problem by holding the number of months of driving experience fixed, while comparing individuals of different ages. But we quickly

run into subtleties. What if 18 year-olds with two months of driving experience correspond to individuals who are exceptionally cautious and hence—by their natural inclination—not only drive less, but also more cautiously? What if such individuals predominantly live in regions where traffic conditions differ significantly from those in areas where people feel a greater need to drive at a younger age? We can think of numerous other strategies to answer the original question of whether raising the legal driving age reduces traffic accidents. We could compare countries with different legal driving ages, say, the United States and Germany. But again, these countries differ in many other possibly relevant ways, such as the legal drinking age. At the outset, causal reasoning is a conceptual and technical framework for addressing questions about the effect of hypothetical actions or interventions. Once we understand what the effect of an action is, we can turn the question around and ask what action plausibly caused an event. This gives us a formal language to talk about cause and effect. Not every question about cause is equally easy to address. Some questions are overly broad, such as, "What is the cause of success?" Other questions are too specific: "What caused your interest in 19th century German philosophy?" Neither question might have a clear answer. Causal inference gives us a formal language to ask these questions, in principle, but it does not make it easy to choose the right questions. Nor does it trivialize the task of finding and interpreting the answer to a question. Especially in the context of fairness, the difficulty is often in deciding what the question is that causal inference is the answer to. In this chapter, we will develop sufficient technical understanding of causality to support at least three different purposes. The first is to conceptualize and address some limitations of the observational techniques we saw in Chapter 2. The second is to provide tools that help in the design of interventions that reliably achieve a desired effect. The third is to engage with the important normative debate about when and to what extent reasoning about discrimination and fairness requires causal understanding. We will also see that causality forces us to grapple with some difficult questions that we have not encountered so far.

The limitations of observation

Before we develop any new formalism, it is important to understand why we need it in the first place. To see why, we turn to the venerable example of graduate admissions at the University of California, Berkeley in 1973 (Bickel et al., "Sex Bias in Graduate Admissions: Data from Berkeley," Science 187, no. 4175 (1975): 398–404). Historical

data show that 12,763 applicants were considered for admission to one of 101 departments and inter-departmental majors. Of the 4,321 women who applied roughly 35 percent were admitted, while 44 percent of the 8,442 men who applied were admitted. Standard statistical significance tests suggest that the observed difference would be highly unlikely to be the outcome of sample fluctuation if there were no difference in underlying acceptance rates. A similar pattern exists if we look at the aggregate admission decisions of the six largest departments. The acceptance rate across all six departments for men is about 44%, while it is only roughly 30% for women, again, a significant difference. Recognizing that departments have autonomy over who to admit, we can look at the gender bias of each department. (Source. Note: There is some discrepancy with a Wikipedia page. Retrieved: Dec 27, 2018.)

Table 7: UC Berkeley admissions data from 1973.

Department    Men Applied    Men Admitted (%)    Women Applied    Women Admitted (%)
A             825            62                  108              82
B             520            60                  25               68
C             325            37                  593              34
D             417            33                  375              35
E             191            28                  393              24
F             373            6                   341              7

What we can see from the table is that four of the six largest departments show a higher acceptance rate among women, while two show a higher acceptance rate for men. However, these two departments cannot account for the large difference in acceptance rates that we observed in aggregate. So, the higher acceptance rate for men that we observed in aggregate appears to have reversed at the department level.

Such reversals are sometimes called Simpson's paradox,124 even though mathematically they are no surprise. It's a fact of conditional probability that there can be events Y (here, acceptance), A (here, female gender taken to be a binary variable) and a random variable Z (here, department choice) such that:

1. P{Y | A} < P{Y | ¬A}
2. P{Y | A, Z = z} > P{Y | ¬A, Z = z} for all values z that the random variable Z assumes.

Simpson's paradox nonetheless causes discomfort to some, because intuition suggests that a trend which holds for all subpopulations should also hold at the population level.

124 For clarifications regarding the popular interpretation of Simpson's original article (Simpson, "The Interpretation of Interaction in Contingency Tables," Journal of the Royal Statistical Society: Series B (Methodological) 13, no. 2 (1951): 238–41), see (Hernán, Clayton, and Keiding, "The Simpson's paradox unraveled," International Journal of Epidemiology 40, no. 3 (March 2011): 780–85, https://doi.org/10.1093/ije/dyr041) and (Pearl, Causality (Cambridge University Press, 2009)).
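To see the reversal numerically, the aggregate and per-department rates can be recomputed from Table 7. The following short Python sketch is our own illustration; the admitted counts are reconstructed from the rounded percentages in the table, so the aggregate rates are only approximate.

# Rows of Table 7: (department, men applied, men admitted %, women applied, women admitted %)
table = [
    ("A", 825, 62, 108, 82),
    ("B", 520, 60, 25, 68),
    ("C", 325, 37, 593, 34),
    ("D", 417, 33, 375, 35),
    ("E", 191, 28, 393, 24),
    ("F", 373, 6, 341, 7),
]

men_admitted = sum(applied * pct / 100 for _, applied, pct, _, _ in table)
men_applied = sum(applied for _, applied, _, _, _ in table)
women_admitted = sum(applied * pct / 100 for _, _, _, applied, pct in table)
women_applied = sum(applied for _, _, _, applied, _ in table)

print("Aggregate acceptance rate, men:  ", round(men_admitted / men_applied, 2))     # about 0.44
print("Aggregate acceptance rate, women:", round(women_admitted / women_applied, 2)) # about 0.30

# Department-level comparison: women's rate is higher in four of the six departments
for dept, _, men_pct, _, women_pct in table:
    print(dept, "women higher" if women_pct > men_pct else "men higher")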

The reason why Simpson’s paradox is relevant to our discussion is that it’s a consequence of how we tend to misinterpret what infor- mation conditional probabilities encode. Recall that a statement of conditional probability corresponds to passive observation. What we see here is a snapshot of the normal behavior of women and men applying to graduate school at UC Berkeley in 1973. What is evident from the data is that gender influences depart- ment choice. Women and men appear to have different preferences for different fields of study. Moreover, different departments have different admission criteria. Some have lower acceptance rates, some higher. Therefore, one explanation for the data we see is that women chose to apply to more competitive departments, hence getting re- jected at a higher rate than men. Indeed, this is the conclusion the original study drew:

The bias in the aggregated data stems not from any pattern of discrimination on the part of admissions committees, which seems quite fair on the whole, but apparently from prior screening at earlier levels of the educational system. Women are shunted by their socialization and education toward fields of graduate study that are generally more crowded, less productive of completed degrees, and less well funded, and that frequently offer poorer professional employment prospects.125

125 Bickel et al., "Sex Bias in Graduate Admissions."

In other words, the article concluded that the source of gender bias in admissions was a pipeline problem: without any wrongdoing by the departments, women were "shunted by their socialization" at an earlier stage in their lives.

It is difficult to debate this conclusion on the basis of the available data alone. The question of discrimination, however, is far from resolved.126 We can ask why women applied to more competitive departments in the first place. There are several possible reasons. Perhaps less competitive departments, such as engineering schools, were unwelcoming of women at the time. This may have been a general pattern at the time or specific to the university. Perhaps some departments had a track record of poor treatment of women that was known to the applicants. Perhaps the department advertised the program in a manner that discouraged women from applying.

126 The example has been heavily discussed in various other writings, such as Pearl's recent discussion (Pearl and Mackenzie, The Book of Why: The New Science of Cause and Effect (Basic Books, 2018)). However, the development throughout this chapter will differ significantly in its arguments and conclusions.

The data we have also shows no measurement of an applicant's qualifications. It's possible that due to self-selection women applying to engineering schools in 1973 were over-qualified relative to their peers. In this case, an equal acceptance rate between men and women might actually be a sign of discrimination. There is no way of knowing what was the case from the data we have. We see that at best the original analysis leads to a number of follow-up questions.

What is encoded in the UC Berkeley admissions example is a variant of the impossibility result we saw in Chapter 2. There are multiple scenarios with fundamentally different interpretations and consequences that we cannot distinguish from the data at hand.

At this point, we have two choices. One is to design a new study and collect more data in a manner that might lead to a more conclusive outcome. The other is to argue over which scenario is more likely based on our beliefs and plausible assumptions about the world. Causal inference is helpful in either case. On the one hand, it can be used as a guide in the design of new studies. It can help us choose which variables to include, which to exclude, and which to hold constant. On the other hand, causal models can serve as a mechanism to incorporate scientific domain knowledge and exchange plausible assumptions for plausible conclusions.

Causal models

We will develop just enough formal concepts to engage with the technical and normative debate around causality and discrimination. The topic is much deeper than what we can explore in this chapter. We choose structural causal models as the basis of our formal discussion as they have the advantage of giving a sound foundation for various causal notions we will encounter. The easiest way to conceptualize a structural causal model is as a program for generating a distribution from independent noise variables through a sequence of formal instructions. Let's unpack this statement.

Imagine instead of samples from a distribution, somebody gave you a step-by-step computer program to generate samples on your own starting from a random seed. The process is not unlike how you would write code. You start from a simple random seed and build up increasingly more complex constructs. That is basically what a structural causal model is, except that each assignment uses the language of mathematics rather than any concrete programming syntax.

A first example

Let’s start with a toy example not intended to capture the real world. Imagine a hypothetical population in which an individual exercises regularly with probability 1/2. With probability 1/3, the individual has a latent disposition to develop overweight that manifests in the absence of regular exercise. Similarly, in the absence of exercise, heart disease occurs with probability 1/3. Denote by X the indicator variable of regular exercise, by W that of excessive weight, and by H the indicator of heart disease. Below is a structural causal model to 84 fairness and machine learning

generate samples from this hypothetical population.

1. Sample independent Bernoulli random variables, i.e., biased coin flips: U1 ∼ B(1/2), U2 ∼ B(1/3), U3 ∼ B(1/3).127
2. X := U1
3. W := if X = 1 then 0 else U2
4. H := if X = 1 then 0 else U3

127 A Bernoulli random variable B(p) with bias p is a biased coin toss that assumes value 1 with probability p and value 0 with probability 1 − p.
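For readers who prefer code, here is a minimal Python sketch of this generative program (our own illustration; the chapter itself uses mathematical notation rather than any particular programming language):

import numpy as np

rng = np.random.default_rng(0)

def sample():
    """One draw of (X, W, H) from the toy structural causal model."""
    U1 = rng.binomial(1, 1/2)   # exercise disposition
    U2 = rng.binomial(1, 1/3)   # latent overweight disposition
    U3 = rng.binomial(1, 1/3)   # latent heart disease disposition
    X = U1                      # X := U1
    W = 0 if X == 1 else U2     # W := if X = 1 then 0 else U2
    H = 0 if X == 1 else U3     # H := if X = 1 then 0 else U3
    return X, W, H

for _ in range(5):
    print(sample())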

Contrast this generative description of the population with a usual random sample drawn from the population that might look like this:

X  W  H
0  1  1
1  0  0
1  1  1
1  1  0
0  1  0
...

From the program description, we can immediately see that in our hypothetical population exercise averts both overweight and heart disease, but in the absence of exercise the two are independent. At the outset, our program generates a joint distribution over the random variables (X, W, H). We can calculate probabilities under this distribution. For example, the probability of heart disease under the distribution specified by our model is 1/2 · 1/3 = 1/6. We can also calculate the conditional probability of heart disease given overweight. From the event W = 1 we can infer that the individual does not exercise, so that the probability of heart disease given overweight increases to 1/3 compared with the baseline of 1/6.

Does this mean that overweight causes heart disease in our model? The answer is no, as is intuitive given the program to generate the distribution. But let's see how we would go about arguing this point formally. Having a program to generate a distribution is substantially more powerful than just having sampling access. One reason is that we can manipulate the program in whichever way we want, assuming we still end up with a valid program. We could, for example, set W := 1, resulting in a new distribution. The resulting program looks like this:

2. X := U1
3. W := 1
4. H := if X = 1 then 0 else U3

This new program specifies a new distribution. We can again calculate the probability of heart disease under this new distribution. We still get 1/6.

This simple calculation reveals a significant insight. The substitution W := 1 does not correspond to a conditioning on W = 1. One is an action, albeit inconsequential in this case. The other is an observation from which we can draw inferences. If we observe that an individual is overweight, we can infer that they have a higher risk of heart disease (in our toy example). However, this does not mean that lowering body weight would avoid heart disease. It wouldn't in our example. The active substitution W := 1 in contrast creates a new hypothetical population in which all individuals are overweight with all that it entails in our model.

Let us belabor this point a bit more by considering another hypothetical population, specified by the equations:

2. W := U2
3. X := if W = 0 then 0 else U1
4. H := if X = 1 then 0 else U3

In this population, exercise habits are driven by body weight. Overweight individuals choose to exercise with some probability, but that's the only reason anyone would exercise. Heart disease develops in the absence of exercise. The substitution W := 1 in this model leads to an increased probability of exercise, hence lowering the probability of heart disease. In this case, the conditioning on W = 1 has the same effect. Both lead to a probability of 1/6.

What we see is that fixing a variable by substitution may or may not correspond to a conditional probability. This is a formal rendering of our earlier point that observation isn't action. A substitution corresponds to an action we perform. By substituting a value we break the natural course of action our model captures. This is the reason why the substitution operation is sometimes called the do-operator, written as do(W := 1).

Structural causal models give us a formal calculus to reason about the effect of hypothetical actions. We will see how this creates a formal basis for all the different causal notions that we will encounter in this chapter.
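The difference between observing and intervening can be checked numerically. The sketch below (our own illustration, not code from the text) estimates the probability of heart disease after conditioning on W = 1 and after the surgery do(W := 1) in the first model:

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

U1 = rng.binomial(1, 1/2, n)
U2 = rng.binomial(1, 1/3, n)
U3 = rng.binomial(1, 1/3, n)

# First model: X -> W and X -> H
X = U1
W = np.where(X == 1, 0, U2)
H = np.where(X == 1, 0, U3)

# Observation: condition on the event W = 1
print("P(H=1 | W=1)      =", round(H[W == 1].mean(), 3))   # about 1/3

# Action: do(W := 1) replaces the assignment for W; since H does not
# depend on W, its assignment is unchanged by the surgery
H_do = np.where(X == 1, 0, U3)
print("P(H=1 | do(W:=1)) =", round(H_do.mean(), 3))         # about 1/6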

Structural causal models, more formally

Formally, a structural causal model is a sequence of assignments for generating a joint distribution starting from independent noise variables. By executing the sequence of assignments we incrementally build a set of jointly distributed random variables. A structural causal model therefore not only provides a joint distribution, but also a description of how the joint distribution can be generated from elementary noise variables. The formal definition is a bit cumbersome compared with the intuitive notion.

Definition 4. A structural causal model M is given by a set of variables X1, ..., Xd and corresponding assignments of the form

Xi := fi(Pi, Ui),   i = 1, ..., d.

Here, Pi ⊆ {X1, ..., Xd} is a subset of the variables that we call the parents of Xi. The random variables U1, ..., Ud are called noise variables, which we require to be jointly independent.

The directed graph corresponding to the model has one node for each variable Xi, which has incoming edges from all the parents Pi. We will call such a graph the causal graph corresponding to the structural causal model.

Let’s walk through the formal concepts introduced in this defini- tion in a bit more detail. The noise variables that appear in the definition model exogenous factors that influence the system. Consider, for example, how the weather influences the delay on a traffic route you choose. Due to the difficulty of modeling the influence of weather more precisely, we could take the weather induced delay to be an exogenous factor that enters the model as a noise variable. The choice of exogenous variables and their distribution can have important consequences for what conclusions we draw from a model. The parent nodes Pi of node i in a structural causal model are often called the direct causes of Xi. Similarly, we call Xi the direct effect of its direct causes Pi. Recall our hypothetical population in which weight gain was determined by lack of exercise via the assignment W := min{U1, 1 − X}. Here we would say that exercise (or lack thereof) is a direct cause of weight gain. Structural causal model are a collection of formal assumptions about how certain variables interact. Each assignment specifies a response function. We can think of nodes as receiving from their parents and acting according to these messages as well as the influence of an exogenous noise variable. To which extent a structural causal model conforms to reality is a separate and difficult question that we will return to in more detail later. For now, think of a structural causal model as formalizing and exposing a set of assumptions about a data generating process. As such different models can expose different hypothetical scenarios and serve as a basis for discussion. When we make statements about cause and effect in reference to a model, we don’t mean to suggest that these relationship necessarily hold in the real world. Whether causality 87

they do depends on the scope, purpose, and validity of our model, which may be difficult to substantiate. It’s not hard to show that a structural causal model defines a

unique joint distribution over the variables (X1, ..., Xd) such that Xi = fi(Pi, Ui). It’s convenient to introduce a notion for probabilities under this distribution. When M denotes a structural causal model, we will write the probability of an event E under the entailed joint distribution as PM{E}. To gain familiarity with the notation, let M denote the structural causal model for the hypothetical population in which both weight gain and heart disease are directly caused by an absence of exercise. We calculated earlier that the probability of heart disease in this model is PM{H} = 1/6. In what follows we will derive from this single definition of a structural causal model all the different notions and terminology that we’ll need in this chapter. Throughout, we restrict our attention to acyclic assignments. Many real-world systems are naturally described as stateful dynamical system with feedback loops. At the end of the chapter, we discuss some of the options for dealing with such closed loop systems. For example, often cycles can be broken up by introducing time depen- dent variables, such as, investments at time 0 grow the economy at time 1 which in turn grows investments at time 2, continuing so forth until some chosen time horizon t.
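To connect Definition 4 back to the earlier programming intuition, the sketch below (our own illustration; none of these names come from the text) represents a structural causal model as an ordered collection of assignments and implements both sampling and the surgery M[X := x] as replacing a single assignment:

import numpy as np

rng = np.random.default_rng(0)

# Each assignment maps the values of already-computed variables (the parents)
# to a value, drawing its own exogenous noise. This is the toy exercise model.
model = {
    "X": lambda v: rng.binomial(1, 1/2),                        # X := U1
    "W": lambda v: 0 if v["X"] == 1 else rng.binomial(1, 1/3),  # W := if X = 1 then 0 else U2
    "H": lambda v: 0 if v["X"] == 1 else rng.binomial(1, 1/3),  # H := if X = 1 then 0 else U3
}

def sample(model):
    """Execute the assignments in order (assumed topologically sorted)."""
    values = {}
    for name, assignment in model.items():
        values[name] = assignment(values)
    return values

def do(model, variable, value):
    """Model surgery M[variable := value]: replace one assignment by a constant."""
    surgery = dict(model)
    surgery[variable] = lambda v: value
    return surgery

draws = [sample(do(model, "W", 1)) for _ in range(100_000)]
print("P(H=1) under do(W:=1):", np.mean([d["H"] for d in draws]))   # about 1/6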

Causal graphs

We saw how structural causal models naturally give rise to causal graphs that represent the assignment structure of the model graphically. We can go the other way as well by simply looking at directed graphs as placeholders for an unspecified structural causal model which has the assignment structure given by the graph. Causal graphs are often called causal diagrams. We'll use these terms interchangeably. Below we see causal graphs for the two hypothetical populations from our heart disease example.

Figure 18: Causal diagrams for the heart disease examples.

The scenarios differ in the direction of the link between exercise and weight gain.

Causal graphs are convenient when the exact assignments in a structural causal model are of secondary importance, but what matters are the paths present and absent in the graph. Graphs also let us import the established language of graph theory to discuss causal notions. We can say, for example, that an indirect cause of a node is any ancestor of the node in a given causal graph. In particular, causal graphs allow us to distinguish cause and effect based on whether a node is an ancestor or descendant of another node. Let's take a first glimpse at a few important graph structures.

Forks

A fork is a node Z in a graph that has outgoing edges to two other variables X and Y. Put differently, the node Z is a common cause of X and Y.

Figure 19: Example of a fork.

We already saw an example of a fork in our weight and exercise example: W ← X → H. Here, exercise X influences both weight and heart disease. We also learned from the example that the fork node X has a confounding effect: ignoring exercise X, we saw that W and H appear to be positively correlated. However, the correlation is a mere result of confounding. Once we hold exercise levels constant (via the do-operation), weight has no effect on heart disease in our example.

Confounding leads to a disagreement between the calculus of conditional probabilities (observation) and do-interventions (actions). Real-world examples of confounding are a common threat to the validity of conclusions drawn from data. For example, in a well-known medical study a suspected beneficial effect of hormone replacement therapy in reducing cardiovascular disease disappeared after identifying socioeconomic status as a confounding variable.128

128 Humphrey, Chan, and Sox, "Postmenopausal Hormone Replacement Therapy and the Primary Prevention of Cardiovascular Disease," Annals of Internal Medicine 137, no. 4 (August 2002): 273–84.

Mediators

The case of a fork is quite different from the situation where Z lies on a directed path from X to Y:

Figure 20: Example of a chain.

In this case, the path X → Z → Y contributes to the total effect of X on Y. It's a causal path and thus one of the ways in which X causally influences Y. That's why Z is not a confounder. We call Z a mediator instead.

We saw a plausible example of a mediator in our UC Berkeley admissions example. In one plausible causal graph, department choice mediates the influence of gender on the admissions decision. The notion of a mediator is particularly relevant to the topic of discrimination analysis and we will return to this discussion in more detail later.

Colliders

Finally, let’s consider another common situation: the case of a collider.

Figure 21: Example of a collider.

Colliders aren’t confounders. In fact, in the above graph, X and Y are unconfounded, meaning that we can replace do-statements by conditional probabilities. However, something interesting happens when we condition on a collider. The conditioning step can create correlation between X and Y, a phenomenon called explaining away. A good example of the explaining away effect, or collider bias, is 129 due to Berkson. Two independent diseases can become negatively See the Wikipedia article and the reprint of Berkson’s original article correlated when analyzing hospitalized patients. The reason is that (Berkson, “Limitations of the Appli- when either disease (X or Y) is sufficient for admission to the hospital cation of Fourfold Table Analysis to Hospital Data,” International Journal of (indicated by variable Z), observing that a patient has one disease Epidemiology 43, no. 2 (2014): 511–15). 129 makes the other statistically less likely. 130 Moneta-Koehler, “The Limitations Berkson’s law is a cautionary tale for statistical analysis when of the GRE in Predicting Success in Biomedical Graduate School,” PLOS we’re studying a cohort that has been subjected to a selection rule. ONE 12, no. 1 (January 2017): 1–17; For example, there’s an ongoing debate about the effectiveness Hall, “Predictors of Student Produc- of GRE scores in higher education. Recent studies130 argue that tivity in Biomedical Graduate School Applications,” PLOS ONE 12, no. 1 GRE scores are not predictive of various success outcomes in a (January 2017): 1–14. graduate student population. However, care must be taken when studying the effectiveness of educational tests, such as the GRE, by examining a sample of admitted students. After all, students were in part admitted on the basis of the test score. It’s the selection rule that introduces the potential for collider bias. 90 fairness and machine learning

Interventions and causal effects

Structural causal models give us a way to formalize the effect of hypothetical actions or interventions on the population within the assumptions of our model. As we saw earlier, all we needed was the ability to perform substitutions.

Substitutions and the do-operator

Given a structural causal model M we can take any assignment of the form

X := f(P, U)

and replace it by another assignment. The most common substitution is to assign X a constant value x:

X := x

We will denote the resulting model by M′ = M[X := x] to indicate the surgery we performed on the original model M. Under this assignment we hold X constant by removing the influence of its parent nodes and thereby any other variables in the model. Graphically, the operation corresponds to eliminating all incoming edges to the node X. The children of X in the graph now receive a fixed message x from X when they query the node's value.

Figure 22: Graph before and after substitution.

The assignment operator is also called the do-operator to emphasize that it corresponds to performing an action or intervention. We already have notation to compute probabilities after applying the do-operator, namely, P_{M[X:=x]}(E). The following alternative notation is also common:

P{E | do(X := x)} = P_{M[X:=x]}(E)

This notation analogizes the do-operation with the usual notation for conditional probabilities, and is often convenient when doing calculations involving the do-operator. Keep in mind, however, that the do-operator (action) is fundamentally different from the conditioning operator (observation).

Causal effects

The causal effect of an action X := x on a variable Y refers to the distribution of the variable Y in the model M[X := x]. When we speak of the causal effect of a variable X on another variable Y, we refer to all the ways in which setting X to any possible value x affects the distribution of Y.

Often we think of X as a binary treatment variable and are interested in a quantity such as

E_{M[X:=1]}[Y] − E_{M[X:=0]}[Y] .

This quantity is called the average treatment effect. It tells us how much treatment (action X := 1) increases the expectation of Y relative to no treatment (action X := 0). Causal effects are population quantities. They refer to effects averaged over the whole population. Often the effect of treatment varies greatly from one individual or group of individuals to another. Such treatment effects are called heterogeneous.
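For instance, in the toy exercise model from earlier, we can estimate the average treatment effect of exercise X on heart disease H by evaluating both intervened models (a small sketch of our own):

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
U3 = rng.binomial(1, 1/3, n)

# M[X := 1]: everyone exercises, so H := 0
H_treated = np.zeros(n)
# M[X := 0]: nobody exercises, so H := U3
H_untreated = U3

ate = H_treated.mean() - H_untreated.mean()
print("Average treatment effect of exercise on heart disease:", round(ate, 3))  # about -1/3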

Confounding

Important questions in causality relate to when we can rewrite a do-operation in terms of conditional probabilities. When this is possible, we can estimate the effect of the do-operation from conventional conditional probabilities that we can estimate from data.

The simplest question of this kind asks when a causal effect P{Y = y | do(X := x)} coincides with the conditional probability P{Y = y | X = x}. In general, this is not true. After all, the difference between observation (conditional probability) and action (interventional calculus) is what motivated the development of causality.

The disagreement between interventional statements and conditional statements is so important that it has a well-known name: confounding. We say that X and Y are confounded when the causal effect of action X := x on Y does not coincide with the corresponding conditional probability.

When X and Y are confounded, we can ask if there is some combination of conditional probability statements that give us the desired effect of a do-intervention. This is generally possible given a causal graph by conditioning on the parent nodes PA of the node X:

P{Y = y | do(X := x)} = ∑_z P{Y = y | X = x, PA = z} P{PA = z}

This formula is called the adjustment formula. It gives us one way of estimating the effect of a do-intervention in terms of conditional probabilities.

The adjustment formula is one example of what is often called controlling for a set of variables: We estimate the effect of X on Y separately in every slice of the population defined by a condition Z = z for every possible value of z. We then average these estimated sub-population effects weighted by the probability of Z = z in the population. To give an example, when we control for age, we mean that we estimate an effect separately in each possible age group and then average out the results so that each age group is weighted by the fraction of the population that falls into the age group.

Controlling for more variables in a study isn't always the right choice. It depends on the graph structure. Let's consider what happens when we control for the variable Z in the three causal graphs we discussed above.

• Controlling for a confounding variable Z in a fork X ← Z → Y will deconfound the effect of X on Y.
• Controlling for a mediator Z on a chain X → Z → Y will eliminate some of the causal influence of X on Y.
• Controlling for a collider will create correlation between X and Y. That is the opposite of what controlling for Z accomplishes in the case of a fork. The same is true if we control for a descendant of a collider.
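To make the adjustment formula concrete, here is a simulation sketch of our own with a made-up model in which a binary confounder Z influences both a treatment X and an outcome Y. The adjusted estimate recovers the interventional probability, while the naive conditional probability does not:

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Z -> X, Z -> Y, and X -> Y, so Z confounds the effect of X on Y
Z = rng.binomial(1, 0.5, n)
X = rng.binomial(1, np.where(Z == 1, 0.8, 0.2))
Y = rng.binomial(1, 0.2 + 0.3 * X + 0.4 * Z)

# Naive conditional probability P(Y=1 | X=1)
naive = Y[X == 1].mean()

# Adjustment formula: sum over z of P(Y=1 | X=1, Z=z) * P(Z=z)
adjusted = sum(Y[(X == 1) & (Z == z)].mean() * (Z == z).mean() for z in (0, 1))

print("P(Y=1 | X=1)      =", round(naive, 3))     # about 0.82, biased by confounding
print("P(Y=1 | do(X:=1)) =", round(adjusted, 3))  # about 0.70, the causal quantity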

The backdoor criterion

At this point, we might worry that things get increasingly complicated. As we introduce more nodes in our graph, we might fear a combinatorial explosion of possible scenarios to discuss. Fortunately, there are simple sufficient criteria for choosing a set of deconfounding variables that is safe to control for.

A well-known graph-theoretic notion is the backdoor criterion.131 Two variables are confounded if there is a so-called backdoor path between them. A backdoor path from X to Y is any path starting at X with a backward edge "←" into X, such as:

X ← A → B ← C → Y

Intuitively, backdoor paths allow information flow from X to Y in a way that is not causal. To deconfound a pair of variables we need to select a backdoor set of variables that "blocks" all backdoor paths between the two nodes. A backdoor path involving a chain A → B → C can be blocked by controlling for B. Information by default cannot flow through a collider A → B ← C. So we only have to be careful not to open information flow through a collider by conditioning on the collider, or on a descendant of a collider.132

131 Pearl, Causality.
132 For additional discussion of backdoor paths and confounding, see (Pearl).

Unobserved confounding

The adjustment formula might suggest that we can always eliminate confounding bias by conditioning on the parent nodes. However, this is only true in the absence of unobserved confounding. In practice, there are often variables that are hard to measure or that were simply left unrecorded. We can still include such unobserved nodes in a graph, typically denoting their influence with dashed lines instead of solid lines.

Figure 23: Two cases of unobserved confounding.

The above figure shows two cases of unobserved confounding. In the first example, the causal effect of X on Y is unidentifiable. In the second case, we can block the confounding backdoor path X ← Z → W → Y by controlling for W even though Z is not observed. The backdoor criterion lets us work around unobserved confounders in some cases where the adjustment formula alone wouldn't suffice.

Unobserved confounding nonetheless remains a major obstacle in practice. The issue is not just lack of measurement, but often lack of anticipation or awareness of a confounding variable. We can try to combat unobserved confounding by increasing the number of variables under consideration. But as we introduce more variables into our study, we also increase the burden of coming up with a valid causal model for all variables under consideration. In practice, it is not uncommon to control for as many variables as possible in the hope of avoiding confounding bias. However, as we saw, controlling for mediators or colliders can be harmful.

Randomization

The backdoor criterion gives a non-experimental way of eliminating confounding bias given a causal model and a sufficient amount of observational data from the joint distribution of the variables. An alternative experimental method of eliminating confounding bias is the well-known randomized controlled trial.

In a randomized controlled trial a group of subjects is randomly partitioned into a control group and a treatment group. Participants do not know which group they were assigned to and neither do the staff administering the trial. The treatment group receives an actual treatment, such as a drug that is being tested for efficacy, while the control group receives a placebo identical in appearance. An outcome variable is measured for all subjects.

The goal of a randomized controlled trial is to break natural inclination. Rather than observing who chose to be treated on their own, we assign treatment randomly. Thinking in terms of causal models, what this means is that we eliminate all incoming edges into the treatment variable. In particular, this closes all backdoor paths and hence avoids confounding bias.

There are many reasons why randomized controlled trials are often difficult or impossible to administer. Treatment might be physically or legally impossible, too costly, or too dangerous. As we saw, randomized controlled trials are not always necessary for avoiding confounding bias and for reasoning about cause and effect. Nor are they free of issues and pitfalls.133

133 Deaton and Cartwright, "Understanding and Misunderstanding Randomized Controlled Trials," Social Science & Medicine 210 (2018): 2–21.

Graphical discrimination analysis

We now explore how we can bring causal graphs to bear on discussions of discrimination. We return to the example of graduate admissions at Berkeley and develop a causal perspective on the earlier analysis.

The first step is to come up with a plausible causal graph consistent with the data that we saw earlier. The data contained only three variables: sex A, department choice Z, and admission decision Y. It makes sense to draw two arrows A → Y and Z → Y, because both features A and Z are available to the institution when making the admissions decision.

We'll draw one more arrow, for now, simply because we have to. If we only included the two arrows A → Y and Z → Y, our graph would claim that A and Z are statistically independent. However, this claim is inconsistent with the data.134 This means we need to include either the arrow A → Z or Z → A.135 Deciding between the two isn't as straightforward as it might first appear.

If we interpreted A in the narrowest possible sense as the applicant's reported sex, i.e., literally which box they checked on the application form, we could imagine a scenario where some applicants choose to (mis-)report their sex in a certain way that depends in part on their department choice. Even if we assume no misreporting occurs, it's hard to substantiate reported sex as a plausible cause of department choice. The fact that an applicant checked a box labeled male certainly isn't the cause of their interest in engineering.

The causal story in Bickel's argument is a different one. It alludes to a complex socialization and preference formation process that took place in the applicant's life before they applied, which in part depended on the applicant's sex. To align this story with our causal graph, we need the variable A to reference whatever ontological entity it is that, through this "socialization process", influences intellectual and professional preferences, and hence department choice. It is difficult to maintain that this ontological entity coincides with sex as a biological trait. There is no scientific basis to support that the biological trait sex is what determines our intellectual preferences. Few scholars (if any) would currently attempt to maintain a claim such as: two X chromosomes cause an interest in English literature. The truth is that we don't know the exact mechanism by which the thing referenced by A influences department choice. In drawing the arrow from A to Z, we assert, perhaps with some naivety or ignorance, that there exists such a mechanism. We will discuss the important difficulty we encountered here in depth later on. For now, we commit to this modeling choice and thus arrive at the following graph.

134 We can see from the table that several departments have a statistically significant gender bias among applicants.
135 There is also the possibility of an unobserved confounder that we will discuss later.

Figure 24: Possible causal graph for the UC Berkeley graduate admissions scenario.

In this graph, department choice mediates the influence of gender on admissions. There's a direct path from A to Y and an indirect path that goes through Z. We will use this model to put pressure on the central claim in the original study, namely, that there is no evidence of sex discrimination.

In causal language, Bickel's argument had two components:136

1. There appears to be no direct effect of sex A on the admissions decision Y that favors men.
2. The indirect effect of A on Y that is mediated by department choice should not be counted as evidence of discrimination.

136 In fact, this is Pearl's proffered causal interpretation of Bickel's analysis. See (Pearl, Causality; Pearl and Mackenzie, The Book of Why).

We will discuss both arguments in turn.

Direct effects

To obtain the direct effect of A on Y we need to disable all paths between A and Y except for the direct link. In our model, we can accomplish this by holding department choice Z constant and evaluating the conditional distribution of Y given A. Recall that holding a variable constant is generally not the same as conditioning on the variable. Specifically, a problem would arise if department choice and admissions outcome were confounded by another variable, such as state of residence R.

Figure 25: Alternative causal graph for the UC Berkeley graduate admissions scenario showing influence of residence.

Department choice is now a collider between A and R. Conditioning on a collider opens the backdoor path A → Z ← R → Y. In this graph, conditioning on department choice does not give us the desired direct effect.137 If we assume, however, that department choice and admissions decisions are unconfounded, then Bickel's approach indeed supports the first claim.

137 The possibility of a confounder between department choice and decision was the subject of an exchange between Bickel and Kruskal, as Pearl discusses in (Pearl and Mackenzie, The Book of Why).

Unfortunately, the direct effect of a protected variable on a decision is a poor measure of discrimination on its own. At a technical level, it is rather brittle as it cannot detect any form of proxy discrimination. The department could, for example, use the applicant's personal statement to make inferences about their gender, which are then used to discriminate. It's best to think of the direct effect as capturing whether or not the decision maker explicitly uses the attribute in its decision rule. The absence of a direct effect corresponds to the somewhat troubled notion of a blind decision rule that doesn't have explicit access to the sensitive attribute. As we argued in all preceding chapters, blind decision rules can still be the basis of discriminatory practices.

As we saw in the previous chapter, direct effects don't cleanly map onto a legal framework. However, it's possible to see a resemblance between what a direct effect captures and the kind of discrimination the legal doctrine of disparate treatment describes.

Indirect paths

Let’s turn to the indirect effect of sex on admission that goes through department choice. It’s tempting to think of the the node Z as referencing the appli- cant’s inherent department preferences that stem from a process of socialization alluded to earlier. In this view, the department is not responsible for the applicant’s preferences and so the mediating causality 97

influence of department preferences is not interpreted as a sign of discrimination. This, however, is a normative judgment that does not follow as formal matter. We can easily think of natural alternative scenarios that are consis- tent with both the data and our causal model, in which the indirect path encodes a pattern of discrimination. For example, the department may have advertised the program in a manner that strongly discouraged women from applying. In this case, department preference in part measures exposure to this hostile advertising campaign. Or, the department could have a track record of hostile behavior against women and it is awareness of such that shapes preferences in an applicant. Finally, blatant discriminatory practices, such as compensating women at a lower rate than equally qualified male graduate students can obviously shape an applicant’s preference.

Figure 26: Alternative causal graph for the UC Berkeley graduate admissions scenario where department preferences are shaped by fear of discrimination.

Accepting the indirect path as non-discriminatory is to assert that all these scenarios we described are deemed implausible. Fundamentally, we are confronted with a normative question. The path A → Z → Y could either be where discrimination occurs or what explains the absence thereof. Which case we’re in isn’t a purely technical matter and cannot be resolved without subject matter knowledge. Causal modeling gives us a framework for exposing these normative questions, but not necessarily one to resolve them.

Path inspection

To summarize, discrimination may not only occur on the direct pathway from the sensitive category to the outcome. Seemingly innocuous mediating paths can hide discriminatory practices. We have to carefully discuss what pathways we consider evidence for or against discrimination.

To appreciate this point, contrast our Berkeley scenario with the important legal case Griggs v. Duke Power Co.138 that was argued before the U.S. Supreme Court in 1970. Duke Power Company had introduced the requirement of a high school diploma for certain higher paying jobs. We could draw a causal graph for this scenario not unlike the one for the Berkeley case. There's a mediating variable (here, level of education), a sensitive category (here, race) and an employment outcome (here, employment in a higher paying job).

138 Griggs v. Duke Power Co., 401 U.S. 424 (1971)

The company didn’t directly make employment decisions based on race, but rather used the mediating variable. The court ruled that the requirement of a high school diploma was not justified by business necessity, but rather had adverse impact on ethnic minority groups where the prevalence of high school diplomas is lower. Put differently, the court decided that the use of this mediating variable was not an argument against, but rather for discrimination. 139 139 Glymour, “Using Causal Diagrams Glymour makes another related and important point about the to Understand Common Problems in moral character of mediation analysis: Social Epidemiology,” Methods in Social Epidemiology, 2006, 393–428. Implicitly, the question of what mediates observed social effects in- forms our view of which types of inequalities are socially acceptable and which types require remediation by social policies. For example, a conclusion that women are “biologically programmed” to be de- pressed more than men may ameliorate the social obligation to try to reduce gender inequalities in depression. Yet if people get depressed whenever they are, say, sexually harassed—and women are more fre- quently sexually harassed than men—this suggests a very strong social obligation to reduce the depression disparity by reducing the sexual harassment disparity.

To end on a technical note: we currently do not have a method to estimate indirect effects. Estimating an indirect effect somehow requires us to disable the direct influence. There is no way of doing this with the do-operation that we've seen so far. However, we will shortly introduce counterfactuals, which among other applications will give us a way of estimating path-specific effects.

Structural discrimination

There’s an additional problem we neglected so far. Imagine a spite- ful university administration that systematically defunds graduate programs that attract more female applicants. This structural pattern of discrimination is invisible from the causal model we drew. There is a kind of type mismatch here. Our model talks about individual applicants, their department preferences, and their outcomes. Put differently, individuals are the units of our investigation. University policy is not one of the mechanisms that our model exposes. As a result we cannot talk about university policy as a cause of discrimina- tion in our model. The model we chose commits us to an individualistic perspective that frames discrimination as the consequence of how decision makers respond to information about individuals. An analogy is helpful. In epidemiology, scientists can seek the cause of health outcomes in biomedical aspects and lifestyle choices causality 99

of individuals, such as whether or not an individual smokes, ex- ercises, maintains a balanced diet etc. The growing field of social epidemiology criticizes the view of individual choices as causes of health outcomes, and instead draws attention to social and structural 140 140 Krieger, “Epidemiology and the causes, such as poverty and inequality. People’s Health: Theory and Context,” Similarly, we can contrast the individualistic perspective on dis- 2011. crimination with structural discrimination. Causal modeling can in principle be used to study the causes of structural discrimination, as well. But it requires a different perspective than the one we chose for our Berkeley scenario.

Counterfactuals

Fully specified structural causal models allow us to ask causal questions that are more delicate than the mere effect of an action. Specifically, we can ask counterfactual questions such as: Would I have avoided the traffic jam had I taken a different route this morning?

Counterfactual questions are common. We can answer them given a structural causal model. However, the procedure for extracting the answer from the model looks a bit subtle at first. It helps to start with a simple example.

A simple counterfactual

To understand counterfactuals, we first need to convince ourselves that they aren't quite as straightforward as a single substitution in our model.

Assume every morning we need to decide between two routes X = 0 and X = 1. On bad traffic days, indicated by U = 1, both routes are bad. On good days, indicated by U = 0, the traffic on either route is good unless there was an accident on the route. Let's say that U ∼ B(1/2) follows the distribution of an unbiased coin toss. Accidents occur independently on either route with probability 1/2. So, choose two Bernoulli random variables U0, U1 ∼ B(1/2) that tell us if there is an accident on route 0 and route 1, respectively.

We reject all external route guidance and instead decide on which route to take uniformly at random. That is, X := UX ∼ B(1/2) is also an unbiased coin toss. Introduce a variable Y ∈ {0, 1} that tells us whether the traffic on the chosen route is good (Y = 0) or bad (Y = 1). Reflecting our discussion above, we can express Y as

Y := X · max{U, U1} + (1 − X) · max{U, U0} .

In words, when X = 0 the first term disappears and so traffic is determined by the larger of the two values U and U0. Similarly, when X = 1 traffic is determined by the larger of U and U1.

Figure 27: Causal diagram for our traffic scenario.

Now, suppose one morning we have X = 1 and we observe bad traffic Y = 1. Would we have been better off taking the alternative route this morning?

A natural attempt to answer this question is to compute the likelihood of Y = 0 after the do-operation X := 0, that is, P_{M[X:=0]}(Y = 0). A quick calculation reveals that this probability is 1/2 · 1/2 = 1/4. Indeed, given the substitution X := 0 in our model, for the traffic to be good we need that max{U, U0} = 0. This can only happen when both U = 0 (probability 1/2) and U0 = 0 (probability 1/2).

But this isn't the correct answer to our question. The reason is that we took route X = 1 and observed that Y = 1. From this observation, we can deduce that certain background conditions did not manifest, for they are inconsistent with the observed outcome. Formally, this means that certain settings of the noise variables (U, U0, U1) are no longer feasible given the observed event {Y = 1, X = 1}. Specifically, if U and U1 had both been zero, we would have seen no bad traffic on route X = 1, but this is contrary to our observation. In fact, the available evidence {Y = 1, X = 1} leaves only the following settings for U and U1:141

141 We leave out U0 from the table, since its distribution is unaffected by our observation.

Table 9: Possible noise settings after observing the evidence

U   U1
0   1
1   1
1   0

Each of these three cases is equally likely, which in particular means that the event U = 1 now has probability 2/3. In the absence of any additional evidence, recall, U = 1 had probability 1/2. What this means is that the observed evidence {Y = 1, X = 1} has biased the distribution of the noise variable U toward 1. Let's use the letter U′ to refer to this biased version of U.142

Working with this biased noise variable, we can again entertain the effect of the action X := 0 on the outcome Y. For Y = 0 we need that max{U′, U0} = 0. This means that U′ = 0, an event that now has probability 1/3, and U0 = 0 (probability 1/2 as before). Hence, we get the probability 1/6 = 1/2 · 1/3 for the event that Y = 0 under our do-operation X := 0, and after updating the noise variables to account for the observation {Y = 1, X = 1}.

To summarize, incorporating available evidence into our calculation decreased the probability of no traffic (Y = 0) when choosing route 0 from 1/4 to 1/6. The intuitive reason is that the evidence made it more likely that it was generally a bad traffic day, and even the alternative route would've been clogged. More formally, the event that we observed biases the distribution of exogenous noise variables. We think of the result we just calculated as the counterfactual of choosing the alternative route given that the route we chose had bad traffic.

142 Formally, U′ is distributed according to the distribution of U conditional on the event {Y = 1, X = 1}.
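The three-step calculation can be verified by brute-force simulation (a sketch of our own; the conditioning step is done by rejection sampling, which is feasible here because the model is tiny):

import numpy as np

rng = np.random.default_rng(0)
n = 2_000_000

# Exogenous noise
U  = rng.binomial(1, 1/2, n)   # bad traffic day
U0 = rng.binomial(1, 1/2, n)   # accident on route 0
U1 = rng.binomial(1, 1/2, n)   # accident on route 1
UX = rng.binomial(1, 1/2, n)   # route choice

X = UX
Y = X * np.maximum(U, U1) + (1 - X) * np.maximum(U, U0)

# Abduction: keep only the noise settings consistent with the evidence {X=1, Y=1}
evidence = (X == 1) & (Y == 1)

# Action and prediction: re-run the model with X := 0 on the retained noise
Y_cf = np.maximum(U[evidence], U0[evidence])
print("P(Y_{X:=0} = 0 | X=1, Y=1) =", round((Y_cf == 0).mean(), 3))   # about 1/6

# For comparison: the interventional probability without using the evidence
Y_do = np.maximum(U, U0)
print("P(Y = 0 | do(X:=0))        =", round((Y_do == 0).mean(), 3))   # about 1/4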

The general recipe

We can generalize our discussion of computing counterfactuals from the previous example to a general procedure. There were three essential steps. First, we incorporated available observational evidence by biasing the exogenous noise variables through a conditioning operation. Second, we performed a do-operation in the structural causal model after we substituted the biased noise variables. Third, we computed the distribution of a target variable. These three steps are typically called abduction, action, and prediction, and can be described as follows.

Definition 5. Given a structural causal model M, an observed event E, an action X := x, and a target variable Y, we define the counterfactual Y_{X:=x}(E) by the following three-step procedure:

1. Abduction: Adjust the noise variables to be consistent with the observed event. Formally, condition the joint distribution of U = (U1, ..., Ud) on the event E. This results in a biased distribution U′.
2. Action: Perform the do-intervention X := x in the structural causal model M, resulting in the model M′ = M[X := x].
3. Prediction: Compute the target counterfactual Y_{X:=x}(E) by using U′ as the random seed in M′.

It’s important to realize that this procedure defines what a coun- terfactual is in a structural causal model. The notation YX:=x(E) denotes the outcome of the procedure and is part of the definition. We haven’t encountered this notation before. Put in words, we interpret the formal counterfactual YX:=x(E) as the value Y would’ve taken had the variable X been set to value x in the circumstances described by the event E. In general, the counterfactual YX:=x(E) is a random variable that varies with U0. But counterfactuals can also be deterministic. When 102 fairness and machine learning

the event E narrows down the distribution of U to a single point mass, called unit, the variable U0 is constant and hence the coun- terfactual YX:=x(E) reduces to a single number. In this case, it’s common to use the shorthand notation Yx(u) = YX:=x({U = u}), where we make the variable X implicit, and let u refer to a single unit. The motivation for the name unit derives from the common situ- ation where the structural causal model describes a population of entities that form the atomic units of our study. It’s common for a unit to be an individual (or the description of a single individual). However, depending on application, the choice of units can vary. In our traffic example, the noise variables dictate which route we take and what the road conditions are. Answers to counterfactual questions strongly depend on the specifics of the structural causal model, including the precise model of how the exogenous noise variables come into play. It’s possible to construct two models that have identical graph structures, and behave identically under interventions, yet give different answers to 143 143 Peters, Janzing, and Schölkopf, counterfactual queries. Elements of Causal Inference (MIT Press, 2017). Potential outcomes

The potential outcomes framework is a popular formal basis for causal inference, which goes about counterfactuals differently. Rather than deriving them from a structural causal model, we assume their existence as ordinary random variables, albeit some unobserved. Specifically, we assume that for every unit u there exist random variables Y_x(u) for every possible value of the assignment x. In the potential outcomes model, it's customary to think of a binary treatment variable X so that x assumes only two values, 0 for untreated, and 1 for treated. This gives us two potential outcome variables Y0(u) and Y1(u) for each unit u.144

The key point about the potential outcomes model is that we only observe the potential outcome Y1(u) for units that were treated. For untreated units we observe Y0(u). In other words, we can never simultaneously observe both, although they're both assumed to exist in a formal sense. Formally, the outcome Y(u) for unit u that we observe depends on the binary treatment T(u) and is given by the expression:

Y(u) = Y0(u) · (1 − T(u)) + Y1(u) · T(u)

It's often convenient to omit the parentheses from our notation for counterfactuals, so that this expression would read Y = Y0 · (1 − T) + Y1 · T.

144 There is some potential for notational confusion here. Readers familiar with the potential outcomes model may be used to the notation "Yi(0), Yi(1)" for the two potential outcomes corresponding to unit i. In our notation the unit (or, more generally, set of units) appears in the parentheses and the subscript denotes the substituted value for the variable we intervene on.

We can revisit our traffic example in this framework. The next table summarizes what information is observable in the potential outcomes model. We think of the route we choose as the treatment variable, and the observed traffic as reflecting one of the two potential outcomes.

Table 10: Traffic example in the potential outcomes model

Route X    Outcome Y0    Outcome Y1    Probability
0          0             ?             1/8
0          1             ?             3/8
1          ?             0             1/8
1          ?             1             3/8

Often this information comes in the form of samples. For example, we might observe the traffic on different days. With sufficiently many samples, we can estimate the above frequencies with arbitrary accuracy.

Table 11: Traffic data in the potential outcomes model

Day    Route X    Outcome Y0    Outcome Y1
1      0          1             ?
2      0          0             ?
3      1          ?             1
4      0          1             ?
5      1          ?             0
...

A typical query in the potential outcomes model is the average treatment effect E[Y1 − Y0]. Here the expectation is taken over the properly weighted units in our study. If units correspond to equally weighted individuals, the expectation is an average over these individuals.

In our original traffic example, there were 16 units corresponding to the background conditions given by the four binary variables U, U0, U1, UX. When the units in the potential outcomes model agree with those of a structural causal model, then causal effects computed in the potential outcomes model agree with those computed in the structural equation model. The two formal frameworks are perfectly consistent with each other.

As is intuitive from the table above, causal inference in the potential outcomes framework can be thought of as filling in the missing entries ("?") in the table above. This is sometimes called missing data imputation and there are numerous statistical methods for this task. If we could reveal what's behind the question marks, estimating the average treatment effect would be as easy as counting rows.

There is a set of established conditions under which causal inference becomes possible:

1. Stable Unit Treatment Value Assumption (SUTVA): The treatment that one unit receives does not change the effect of treatment for any other unit.
2. Consistency: Formally, Y = Y0(1 − T) + Y1T. That is, Y = Y0 if T = 0 and Y = Y1 if T = 1. In words, the outcome Y agrees with the potential outcome corresponding to the treatment indicator.
3. Ignorability: The potential outcomes are independent of treatment given some deconfounding variables Z, i.e., T ⊥ (Y0, Y1) | Z. In words, the potential outcomes are conditionally independent of treatment given some set of deconfounding variables.

The first two assumptions automatically hold for counterfactual variables derived from structural causal models according to the procedure described above. This assumes that the units in the potential outcomes framework correspond to the atomic values of the background variables in the structural causal model.

The third assumption is a major one. It's easiest to think of it as aiming to formalize the guarantees of a perfectly executed randomized controlled trial. The assumption on its own cannot be verified or falsified, since we never have access to samples with both potential outcomes manifested. However, we can verify if the assumption is consistent with a given structural causal model by checking if the set Z blocks all backdoor paths from treatment T to outcome Y.

There's no tension between structural causal models and potential outcomes and there's no harm in having familiarity with both. It nonetheless makes sense to say a few words about the differences between the two approaches.

We can derive potential outcomes from a structural causal model as we did above, but we cannot derive a structural causal model from potential outcomes alone. A structural causal model in general encodes more assumptions about the relationships of the variables. This has several consequences. On the one hand, a structural causal model gives us a broader set of formal concepts (causal graphs, mediating paths, counterfactuals for every variable, and so on). On the other hand, coming up with a plausibly valid structural causal model is often a daunting task that might require knowledge that is simply not available. We will dive deeper into questions of validity below. The difficulty of coming up with a plausible causal model often exposes unsettled substantive questions that require resolution first.

The potential outcomes model, in contrast, is generally easier to apply. There's a broad set of statistical estimators of causal effects that can be readily applied to observational data. But the ease of application can also lead to abuse. The assumptions underpinning the validity of such estimators are experimentally unverifiable. Frivolous application of causal effect estimators in situations where crucial assumptions do not hold can lead to false results, and consequently to ineffective or harmful interventions.

Counterfactual discrimination analysis

Counterfactuals serve at least two purposes for us. On the technical side, counterfactuals give us a way to compute path-specific causal effects. This allows us to make path analysis a quantitative matter. On the conceptual side, counterfactuals let us engage with the important normative debate about whether discrimination can be captured by counterfactual criteria. We will discuss each of these in turn.

Quantitative path analysis

Mediation analysis is a venerable subject dating back decades.145 Generally speaking, the goal of mediation analysis is to identify a mechanism through which a cause has an effect. We will review some recent developments and how they relate to questions of discrimination. In the language of our formal framework, mediation analysis aims to decompose a total causal effect into path-specific components. We will illustrate the concepts in the basic three-variable case of a mediator, although the ideas extend to more complicated structures.
145 Baron and Kenny, “The Moderator–Mediator Variable Distinction in Social Psychological Research: Conceptual, Strategic, and Statistical Considerations,” Journal of Personality and Social Psychology 51, no. 6 (1986): 1173.

Figure 28: Causal graph with mediator Z.

There are two different paths from X to Y: a direct path and a path through the mediator Z. The conditional expectation E[Y | X = x] lumps together influence from both paths. If there were another confounding variable in our graph influencing both X and Y, then the conditional expectation would also include whatever correlation is the result of confounding. We can eliminate the confounding path by virtue of the do-operator E[Y | do(X := x)]. This gives us the total effect of the action X := x on Y. But the total effect still conflates the two causal pathways, the direct effect and the indirect effect.

We will now see how we can identify the direct and indirect effects separately. We already dealt with the direct effect earlier, as it did not require any counterfactuals. Recall that we can hold the mediator fixed at level Z := z and consider the effect of treatment X := 1 compared with no treatment X := 0 as follows:

E [Y | do(X := 1, Z := z)] − E [Y | do(X := 0, Z := z)] .

We can rewrite this expression equivalently in terms of counterfactuals as:

E[Y_{X:=1, Z:=z} − Y_{X:=0, Z:=z}].

To be clear, the expectation is taken over the background variables in our structural causal model.146
146 In other words, the counterfactuals inside the expectation are invoked with an elementary setting u of the background variables, i.e., Y_{X:=1, Z:=z}(u) − Y_{X:=0, Z:=z}(u), and the expectation averages over all possible settings.
The formula for the direct effect above is usually called the controlled direct effect, since it requires setting the mediating variable to a specified level. Sometimes it is desirable to allow the mediating variable to vary as it would have had no treatment occurred. This too is possible with counterfactuals, and it leads to a notion called the natural direct effect, defined as:

E[Y_{X:=1, Z:=Z_{X:=0}} − Y_{X:=0, Z:=Z_{X:=0}}].

The counterfactual Y_{X:=1, Z:=Z_{X:=0}} is the value that Y would obtain had X been set to 1 and had Z been set to the value Z would’ve assumed had X been set to 0. The advantage of this slightly mind-bending construction is that it gives us an analogous notion of natural indirect effect:

E[Y_{X:=0, Z:=Z_{X:=1}} − Y_{X:=0, Z:=Z_{X:=0}}].

Here we hold the treatment variable constant at level X := 0, but let the mediator variable change to the value it would’ve attained had treatment X := 1 occurred. In our three-node example, the effect of X on Y is unconfounded. In the absence of confounding, the natural indirect effect corresponds to the following expression in terms of conditional probabilities (involving neither counterfactuals nor do-interventions):

∑_z E[Y | X = 0, Z = z] (P(Z = z | X = 1) − P(Z = z | X = 0)).

In this case, we can estimate the natural direct and indirect effects from observational data.
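As an illustration, here is a minimal sketch of the plug-in estimate corresponding to the formula above, on simulated binary data with hypothetical column names X, Z, and Y. It is only valid in the unconfounded setting described in the text.

```python
import numpy as np
import pandas as pd

def natural_indirect_effect(df: pd.DataFrame) -> float:
    """Plug-in estimate of sum_z E[Y | X=0, Z=z] (P(Z=z | X=1) - P(Z=z | X=0))."""
    x0 = df[df["X"] == 0]
    x1 = df[df["X"] == 1]
    effect = 0.0
    for z in df["Z"].unique():
        e_y = x0.loc[x0["Z"] == z, "Y"].mean()   # E[Y | X=0, Z=z]
        p_z_x1 = (x1["Z"] == z).mean()           # P(Z=z | X=1)
        p_z_x0 = (x0["Z"] == z).mean()           # P(Z=z | X=0)
        effect += e_y * (p_z_x1 - p_z_x0)
    return effect

# Simulated data from a chain X -> Z -> Y with an additional direct X -> Y edge.
rng = np.random.default_rng(1)
n = 50_000
X = rng.binomial(1, 0.5, size=n)
Z = rng.binomial(1, 0.3 + 0.4 * X)
Y = rng.binomial(1, 0.2 + 0.2 * X + 0.3 * Z)
print(natural_indirect_effect(pd.DataFrame({"X": X, "Z": Z, "Y": Y})))
```

With these simulated coefficients the estimate comes out near 0.3 · 0.4 = 0.12, the effect of X on Z times the effect of Z on Y, which is what the indirect pathway contributes in this toy model.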

The technical possibilities go beyond the case discussed here. In principle, counterfactuals allow us to compute all sorts of path-specific effects even in the presence of (observed) confounders. We can also design decision rules that eliminate path-specific effects we deem undesirable.

Counterfactual discrimination criteria

Beyond their application to path analysis, counterfactuals can also be used as a tool to put forward normative fairness criteria. Consider the typical setup of Chapter 2. We have features X, a sensitive attribute A, an outcome variable Y, and a predictor Ŷ. One criterion that is technically natural would say the following: For every possible demographic described by the event E := {X = x, A = a} and every possible setting a′ of A, we ask that the counterfactual Ŷ_{A:=a}(E) and the counterfactual Ŷ_{A:=a′}(E) follow the same distribution. We will refer to it as counterfactual demographic parity,147 since it’s closely related to the observational criterion conditional demographic parity.
147 This criterion was introduced as counterfactual fairness (Kusner et al., “Counterfactual Fairness,” in Proc. 30th NIPS, 2017, 4069–79).
Recall that conditional demographic parity requires that in each demographic defined by a feature setting X = x, the sensitive attribute is independent of the predictor. Formally, we have the conditional independence relation Ŷ ⊥ A | X. In the case of a binary predictor, this condition is equivalent to requiring, for all feature settings x and groups a, a′:

E[Ŷ | X = x, A = a] = E[Ŷ | X = x, A = a′].

The easiest way to satisfy counterfactual demographic parity is for the predictor Ŷ to only use non-descendants of A in the causal graph. This is analogous to the statistical condition of only using features that are independent of A. In the same way that we defined a counterfactual analog of demographic parity, we can explore causal analogs of the other statistical criteria from Chapter 2. In doing so, we need to be careful to separate technical questions about the difference between observational and causal criteria from the normative content of the criterion. Just because a causal variant of a criterion might get around some statistical issues of non-causal correlations does not mean that the causal criterion resolves the normative concerns or questions associated with its observational cousin.
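The observational cousin, at least, can be probed directly from data. Here is a minimal sketch, with hypothetical column names, that reports the largest group gap in average predictions within each feature bucket; checking the counterfactual criterion itself would additionally require a causal model.

```python
import pandas as pd

def conditional_demographic_parity_gaps(df: pd.DataFrame,
                                        feature_col: str = "x_bucket",
                                        group_col: str = "A",
                                        prediction_col: str = "yhat") -> pd.Series:
    """For each bucket x, report max_a E[yhat | x, a] - min_a E[yhat | x, a]."""
    rates = (df.groupby([feature_col, group_col])[prediction_col]
               .mean()
               .unstack(group_col))
    return rates.max(axis=1) - rates.min(axis=1)

# Toy usage: a gap near zero in every bucket is consistent with conditional
# demographic parity for a binary predictor.
toy = pd.DataFrame({
    "x_bucket": [0, 0, 0, 0, 1, 1, 1, 1],
    "A":        ["a", "a", "b", "b", "a", "a", "b", "b"],
    "yhat":     [1, 0, 1, 0, 1, 1, 0, 1],
})
print(conditional_demographic_parity_gaps(toy))
```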

Counterfactuals in the law

Many scholars see support for a counterfactual interpretation of United States discrimination law in various rulings by judges that seemed to have invoked counterfactual language. Here’s a quote from a popular recent textbook on causal inference:148
148 Pearl, Glymour, and Jewell, Causal Inference in Statistics: A Primer (Wiley, 2016), 114.

U.S. courts have issued clear directives as to what constitutes employment discrimination. According to law makers, “The central question in any employment-discrimination case is whether the employer would have taken the same action had the employee been of a different race (age, sex, religion, national origin etc.) and everything else had been the same.” (In Carson vs Bethlehem Steel Corp., 70 FEP Cases 921, 7th Cir. (1996).)

Unfortunately, the situation is not so simple. The quote invoked here—and in several other technical papers on the topic—expresses the opinion of judges in the 7th Circuit Court at the time. This court is one of thirteen United States courts of appeals. The case has little to no precedential value; the quote cannot be considered a definitive statement on what employment discrimination means under either Title VII or Equal Protection law. Even the U.S. Supreme Court has not issued “clear directives” that cleanly map onto technical criteria, as we examined in Chapter 3. There is currently no strong basis for a claim that U.S. law and legal precedents necessitate a formal counterfactual criterion to determine discrimination. Whether formal counterfactual reasoning should become the legal basis of deciding discrimination cases is a separate question.

Harvard college admissions

The language of counterfactuals regularly enters discussions of discrimination in high-stakes scenarios. A recent example is the important Harvard admissions lawsuit.

In a trial dating back to 2015, the plaintiff Students for Fair Admissions (SFFA)149 alleges discrimination against Asian-Americans in Harvard undergraduate admissions.
149 Plaintiff SFFA is an offshoot of Edward Blum’s Project on Fair Representation, a legal defense fund which aims to end the use of race in voting, education, contracting, and employment.

The trial entailed unprecedented discovery regarding higher education admissions processes and decision-making, including statistical analyses of individual-level applicant data from the past five admissions cycles. The plaintiff’s expert report claims:

Race plays a significant role in admissions decisions. Consider the example of an Asian-American applicant who is male, is not disadvantaged, and has other characteristics that result in a 25% chance of admission. Simply changing the race of the applicant to white—and leaving all his other characteristics the same—would increase his chance of admission to 36%. Changing his race to Hispanic (and leaving all other characteristics the same) would increase his chance of admission to 77%. Changing his race to African-American (again, leaving all other characteristics the same) would increase his chance of admission to 95%.150
150 Plaintiff’s expert report of Peter S. Arcidiacono, Professor of Economics at Duke University.

The plaintiff’s charge, summarized above, is based on the technical argument that conditional statistical parity is not satisfied by a model of Harvard’s admissions decisions.151
151 Harvard’s decision process isn’t codified as a formal decision rule. Hence, to talk about Harvard’s decision rule formally, we first need to model Harvard’s decision rule. The plaintiff’s expert Arcidiacono did so by fitting a logistic regression model against Harvard’s past admissions decisions in terms of variables deemed relevant for the admission decision.
Formally, denoting by Ŷ a model of Harvard’s admissions decisions, by X a set of applicant features deemed relevant for admission, and by A the applicant’s reported race, we have that

E[Ŷ | X = x, A = a] < E[Ŷ | X = x, A = a′] − δ,

for some groups a, a′ and some significant value of δ > 0. The violation of this condition certainly depends on which features we deem relevant for admissions. Indeed, this is to a large extent the response of the defendant’s expert.152
152 Defendant’s expert report of David Card, Professor of Economics at the University of California, Berkeley.

The selection and discussion of what constitute relevant features is certainly important for the interpretation of conditional statistical parity. But arguably a bigger question is whether a violation of conditional statistical parity constitutes evidence of discrimination in the first place. This isn’t merely a question of having selected the right features to condition on. What does the plaintiff’s expert report mean by “changing his race”? The literal interpretation is to “flip” the race attribute in the input to the model without changing any of the other features of the input. But a formal interpretation in terms of attribute swapping is not necessarily what triggers our moral intuition. A stronger moral intuition derives from interpreting such statements as alluding to a hypothetical world in which the applicant had been of a different race at the point of application. The construction of such a hypothetical world is closer to the semantics of counterfactual reasoning.

As we know now, attribute flipping generally does not produce valid counterfactuals. Indeed, if we assume a causal graph in which some of the relevant features are influenced by race, then computing counterfactuals with respect to race would require adjusting downstream features. Changing the race attribute without a change in any other attribute only corresponds to a counterfactual in the case where race does not have any descendant nodes—an implausible assumption. Attribute flipping is often mistakenly given a counterfactual causal interpretation. Obtaining valid counterfactuals is in general substantially more involved than flipping a single attribute independently of the others. In particular, we cannot meaningfully talk about counterfactuals without bringing clarity to how the variables under consideration interact. This raises the question of what substantive knowledge is required to create valid causal models and counterfactuals. We turn to this important topic next.
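Before moving on, here is a minimal sketch, on entirely simulated data rather than anything from the lawsuit, of what attribute flipping amounts to for a fitted logistic regression: only the group column changes, while a feature downstream of the group is left untouched. This is precisely the operation that should not be read as a counterfactual when the group influences other features.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Toy stand-in for applicant records: A is the group attribute and X1 is a
# feature that A influences in the assumed causal graph.
rng = np.random.default_rng(2)
n = 20_000
A = rng.binomial(1, 0.5, size=n)
X1 = rng.normal(loc=A, scale=1.0)
Y = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * A + 1.0 * X1 - 0.5))))

features = pd.DataFrame({"A": A, "X1": X1})
model = LogisticRegression().fit(features, Y)

# Attribute flipping: change A for one record while holding X1 fixed.
record = features.iloc[[0]].copy()
flipped = record.copy()
flipped["A"] = 1 - flipped["A"]

p_original = model.predict_proba(record)[0, 1]
p_flipped = model.predict_proba(flipped)[0, 1]
print(f"original: {p_original:.2f}, attribute-flipped: {p_flipped:.2f}")

# A counterfactual with respect to A would also have to adjust X1 (and any
# other descendants of A); the flip above leaves them untouched.
```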

Validity of causal models

So far we have neglected a central question: What makes a causal model valid and how do we create valid causal models? To approach this question, we start with the position that a valid causal model is one that represents scientific knowledge. As Pearl put it:

The causal diagrams are simply dot-and-arrow pictures that summarize our existing scientific knowledge. The dots represent variables of interest, and the arrows represent known or suspected causal relationships between those variables, namely, which variable “listens” to which others.153
153 Pearl and Mackenzie, The Book of Why.

This definition immediately puts some constraints on what we can put in a valid causal model. For example, we couldn’t substantiate statements about witchcraft in a valid causal model, since we believe that witchcraft has no scientific reality. We might argue that we could still make causal statements about witchcraft in reference to a fantasy novel. Such causal statements do not represent scientific knowledge, but rather reference an implicit ontology of what witchcraft is in some fictional universe. Causal statements that lack grounding in established scientific knowledge are better thought of as convenient mathematical fiction154 than valid causal models. Such narrative causal models can still be useful as the basis for exploring hypothetical scenarios, as we did in the Berkeley case.
154 To borrow an expression from (Glymour, “Comment: Statistics and Metaphysics,” Journal of the American Statistical Association 81, no. 396 (1986): 964–66).

The line between convenient mathematical fiction and scientifically valid causal models is often blurry. This is especially true in cases where causal models reference constructs with limited established scientific knowledge. To discuss these issues, a little bit of additional background and terminology will be helpful. Simply put, what things and relationships “dot-and-arrow pictures” reference is in part a matter of ontology. What beliefs or knowledge about these things are justified is a matter of epistemology.

Let’s get acquainted with these terms. Webster’s 1913 dictionary defines ontology as:

That department of the science of metaphysics which investigates and explains the nature and essential properties and relations of all beings, as such, or the principles and causes of being.

In order to create a valid causal model, we need to provide clarity about what ontological entities each node references, and what relationships exist between these entities. This task may no longer be difficult for certain objects of study. We might have strong scientifically justified beliefs on how certain mechanical parts in an airplane interact. We can use this knowledge to reliably diagnose the cause of an airplane crash. In other domains, our subject matter knowledge is less stable and subject to debate. This is where epistemology comes in, which the Oxford dictionary defines as:

The theory of knowledge, especially with regard to its methods, validity, and scope, and the distinction between justified belief and opinion.

Against this backdrop, we will explore some of the ontological and epistemic challenges surfaced by causal models relating to questions of discrimination.

A motivating example

Consider a claim of employment discrimination of the kind: The company’s hiring practices discriminated against applicants of a certain religion. What we will see is how two different ontological representations of religious affiliation lead to fundamentally different causal models. Each model in turn leads to different conclusions and comes with modeling challenges that raise difficult questions. Our first attempt is to model religious affiliation as a personal trait or characteristic that someone either does or does not possess. This trait, call it A, may influence choices relating to one’s appearance, social practices, and variables relevant to the job, such as the person’s level of education Z. So, we might like to start with a model such as the following:

Figure 29: Religion as a root node.

Religious affiliation A is a source node in this graph, which influences the person’s level of education Z. Members of certain religions may be steered away from or encouraged towards obtaining a higher level of education by their social peer group. This story is similar to how sex influences department choice in our Berkeley admissions graph. This view of religion places a burden on understanding the possible indirect pathways, such as A → Z → Y, through which religion can influence the outcome. There may be insufficient understanding of how a religious affiliation affects numerous other relevant variables throughout life. If we think of religion as a source node in a causal graph, changing it will potentially affect all downstream nodes. For each such downstream node we would need a clear understanding of the mechanisms by which religion influences the node. Where would scientific knowledge of such relationships come from?

But the causal story around religion might also be different. It could be that obtaining a higher level of education causes an individual to lose their religious beliefs.155
155 In fact, this modeling choice was put forward in a recent paper on this topic (Zhang and Bareinboim, “Fairness in Decision-Making — the Causal Explanation Formula”).
From this perspective, religious affiliation becomes a descendant of the level of education, and so the graph might look like this:

Figure 30: Religion as a descendant node.

This view of religion forces us to correctly identify the variables that influence religious affiliation and are also relevant to the decision. Perhaps it is not just level of education, but also socioeconomic status and other factors that have a similar confounding influence. What is troubling is that in our first graph level of education is a mediator, while in our second graph it is a confounder. The difference is important; to quote Pearl:

As you surely know by now, mistaking a mediator for a confounder is one of the deadliest sins in causal inference and may lead to the most outrageous error. The latter invites adjustment; the former forbids it.156
156 Pearl and Mackenzie, The Book of Why, 276.

Either of our modeling choices follows a natural causal story. Identifying which one is more accurate and applicable to the problem at hand is no easy task. Where do we turn for subject matter knowledge that confirms one model and rejects the other? The point is not that these are the only two possible modeling choices for how religious affiliation might interact with decision making processes. Rather, our point is that there exist multiple plausible choices. Moreover, different ontological representations lead to different results, emphasizing that how we think of categories such as religion is not just a philosophical debate that practitioners can afford to ignore.
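To see that the distinction has quantitative teeth, here is a minimal sketch that simulates data from each of the two hypothetical graphs (it is not a model of any real hiring process, and it assumes the statsmodels package is available) and compares the regression coefficient on A with and without adjusting for Z.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm  # assumed available; any OLS routine would do

rng = np.random.default_rng(3)
n = 50_000

def coef_on_A(df: pd.DataFrame, controls: list) -> float:
    """OLS coefficient on A when regressing Y on A plus optional controls."""
    design = sm.add_constant(df[["A"] + controls])
    return sm.OLS(df["Y"], design).fit().params["A"]

# Hypothetical graph 1: A -> Z -> Y and A -> Y (Z is a mediator).
A = rng.binomial(1, 0.5, size=n)
Z = 0.8 * A + rng.normal(size=n)
Y = 0.5 * A + 0.7 * Z + rng.normal(size=n)
mediator = pd.DataFrame({"A": A, "Z": Z, "Y": Y})

# Hypothetical graph 2: Z -> A, Z -> Y, A -> Y (Z is a confounder).
Z = rng.normal(size=n)
A = rng.binomial(1, 1 / (1 + np.exp(-Z)))
Y = 0.5 * A + 0.7 * Z + rng.normal(size=n)
confounder = pd.DataFrame({"A": A, "Z": Z, "Y": Y})

for name, df in [("mediator graph", mediator), ("confounder graph", confounder)]:
    print(name,
          "unadjusted:", round(coef_on_A(df, []), 2),
          "adjusted for Z:", round(coef_on_A(df, ["Z"]), 2))
```

In the confounder graph the adjusted coefficient is the one with a causal interpretation; in the mediator graph adjusting for Z recovers only the direct effect and discards the indirect pathway. Which number is relevant depends on which causal story is right.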

Social construction and human categories

The difficulties we encountered in our motivating example arise routinely when making causal statements involving human kinds and categories, such as race, religion, or gender, and how these interact with consequential decisions. Let’s take a closer look at the case of race. The ontology of race is complex and still a subject of debate, as an article on this topic noted not long ago:

In recent years, there has been a flurry of work on the metaphysics of race. While it is now widely accepted that races do not share robust, biobehavioral essences, opinions differ over what, if anything, race is.157
157 Mallon, “‘Race’: Normative, Not Metaphysical or Semantic,” Ethics 116, no. 3 (2006): 525–51.

Moreover, the act of assigning racial categories to individuals is inextricably tied to a long history of oppression, segregation, and discriminatory practices.158
158 For an entry point to this topic, see for example (Bowker and Leigh Star, Sorting Things Out: Classification and Its Consequences (MIT Press, 2000); Fields and Fields, Racecraft: The Soul of Inequality in American Life (Verso, 2014); Benjamin, Race After Technology (Polity, 2019)).

In the technical literature around discrimination and causality, it’s common for researchers to model race as a source node in a causal graph, which is to say that race has no incoming arrows. As a source node it can directly and indirectly influence an outcome variable, say, getting a job offer.159
159 A recent example of this kind of modeling approach can be found in (Kusner et al., “Counterfactual Fairness”).

Implicit in this modeling choice is a kind of naturalistic perspective that views race as a biologically grounded trait, similar to sex. The trait exists at the beginning of one’s life. Other variables that come later in life, education and income, for example, thus become descendants in the causal graph. This view of race challenges us to identify all the possible indirect pathways through which race can influence the outcome. But it’s not just this modeling challenge that we need to confront. The view of race as a biologically grounded trait stands in contrast with the social constructivist account of race.160
160 For an entry point to this literature, see (Hacking, The Social Construction of What? (Harvard University Press, 2000); Haslanger, Resisting Reality: Social Construction and Social Critique (Oxford University Press, 2012); Mallon, The Construction of Human Kinds (Oxford University Press, 2018); Glasgow et al., “What Is Race?: Four Philosophical Views,” 2019).

In this view, roughly speaking, race has no strong biological grounding but rather is a social construct. Race stems from a particular classification of individuals by society, and the shared experiences that stem from that classification. As such, the surrounding social system of an individual influences what race is and how it is perceived. In the constructivist view, race is a socially constructed category that individuals are assigned to.

The challenge with adopting this view is that it is difficult to tease out a set of nodes that faithfully represent the influence that society has on race, and perceptions of race. The social constructivist perspective does not come with a simple operational guide for identifying causal structures.

Lack of modularity

Another challenge with socially constructed categories is that they often lack the kind of modularity that a causal diagram requires. Suppose that group membership is constructed from a set of social facts about the group and practices of individuals within the group. We might have some understanding of how these facts and practices constitutively identify group membership. But we may not have an understanding of how each factor individually interacts with each other factor, or whether such a decomposition is even possible.161
161 For a deeper discussion of modularity in causality, see (Cartwright, Hunting Causes and Using Them, Too (Cambridge University Press, 2006)).

Causal models inevitably must draw “bounding boxes” around a subset of the possible variables that we could include in a study. Models are therefore systems of reduced complexity that isolate a set of variables from their possibly chaotic surroundings in the universe. Pearl summarizes this philosophical point crisply:

If you wish to include the entire universe in the model, causality disappears because interventions disappear—the manipulator and the manipulated lose their distinction. However, scientists rarely consider the entirety of the universe as an object of investigation. In most cases the scientist carves out a piece from the universe and proclaims that piece in—namely, the focus of investigation. The rest of the universe is then considered out or background and is summarized by what we call boundary conditions. This choice of ins and outs creates asymmetry in the way we look at things, and it is this asymmetry that permits us to talk about “outside intervention” and hence about causality and cause-effect directionality.162
162 Pearl, Causality, 420.

What we learn from a causal model therefore depends on what bounding box we choose. Too broad a bounding box can lead to a Byzantine model that requires too many interactions to be modeled. Too narrow a bounding box can fail to capture salient aspects of the object of our study. Drawing adequate bounding boxes around socially constructed categories is a delicate task.

Ontological instability

The previous arguments notwithstanding, a pragmatist might accuse our discussion of adding unnecessary complexity to what might seem like a matter of common sense to some. Surely, we could also find subtlety in other characteristics, such as smoking habits or physical exercise. How is race different from other things we reference in causal models?

An important difference is a matter of ontological stability. When we say rain caused the grass to be wet we also refer to an implicit understanding of what rain is, what grass is, and what wet means. However, we find that acceptable in this instance, because all three things we refer to in our causal statement have stable enough ontologies. We know what we reference when we invoke them. To be sure, there could be subtleties in what we call grass. Perhaps the colloquial term grass does not correspond to a precise botanical category, or one that has changed over time and will again change in the future. However, by making the causal claim, we implicitly assert that these subtleties are irrelevant for the claim we made. We know that grass is a plant and that other plants would also get wet from rain. In short, we believe the ontologies we reference are stable enough for the claim we make. This is not always an easy judgment to make.

There are, broadly speaking, at least two sources of ontological instability. One stems from the fact that the world changes over time. Social progress, political events, and our own epistemic activities may render theories obsolete, create new categories, or disrupt existing ones.163 Hacking’s work describes another important source of instability. Categories lead people who putatively fall into such categories to change their behavior in possibly unexpected ways. Individuals might conform or disconform to the categories they are confronted with. As a result, the responses of people, individually or collectively, invalidate the theory underlying the categorization. Hacking calls this a “looping effect.”164 As such, social categories are moving targets that need constant revision.165
163 Mallon calls this form of instability Taylor instability in reference to work by the philosopher Charles Taylor. (Mallon, The Construction of Human Kinds)
164 Hacking, “Making up People,” London Review of Books 28, no. 16 (2006).
165 Related feedback effects are well known in policy-making and sometimes go by the name Goodhart’s law or Campbell’s law. Patterns observed in a population tend to break down when used for consequential classification or control purposes.

Certificates of ontological stability

The debate around human categories in causal models is by no means new. But it often surfaces in a seemingly unrelated, yet long-standing discussion around causation and manipulation. One school of thought in causal inference aligns with the mantra no causation without manipulation, a view expressed by Holland in an influential article from 1986:

Put as bluntly and as contentiously as possible, in this article I take the position that causes are only those things that could, in principle, be treatments in experiments.166
166 Holland, “Statistics and Causal Inference,” Journal of the American Statistical Association (JASA) 81 (1986): 945–70.

Holland goes further by arguing that statements involving “attributes” are necessarily statements of association:

The only way for an attribute to change its value is for the unit to change in some way and no longer be the same unit. Statements of “causation” that involve attributes as “causes” are always statements of association between the values of an attribute and a response variable across the units in a population.167
167 Holland.

To give an example, Holland maintains that the sentence “She did well on the exam because she is a woman” means nothing but “the performance of women on the exam exceeds, in some sense, that of men.”168
168 Holland.

If we believed that there is no causation without manipulation, we would have to refrain from including immutable characteristics in causal models altogether. After all, there is by definition no experimental mechanism that turns immutable attributes into treatments. Holland’s view remains popular among practitioners of the potential outcomes model. The assumptions common in the potential outcomes model are easiest to conceptualize by analogy with a well-designed randomized trial. Practitioners in this framework are therefore used to conceptualizing causes as things that could, in principle, be a treatment in a randomized controlled trial.

The desire or need to make causal statements involving race does not only arise in the context of discrimination. Epidemiologists encounter the same difficulties when confronting health disparities,169 as do social scientists when reasoning about inequality in poverty, crime, and education.
169 Jackson and VanderWeele, “Decomposition Analysis to Identify Intervention Targets for Reducing Disparities”; VanderWeele and Robinson, “On Causal Interpretation of Race in Regressions Adjusting for Confounding and Mediating Variables,” Epidemiology, 2014.

Practitioners facing the need to make causal statements about race often turn to a particular conceptual trick. The idea is to change the object of study from the effect of race to the effect of perceptions of race.170 What this boils down to is that we change the units of the study from individuals with a race attribute to decision makers. The treatment becomes exposure to race through some observable trait, like the name on a CV in a job application setting. The target of the study is then how decision makers respond to such racial stimuli in the decision-making process. The hope behind this maneuver is that exposure to race, unlike race itself, may be something that we can control, manipulate, and experiment with.
170 Greiner and Rubin, “Causal Effects of Perceived Immutable Characteristics,” The Review of Economics and Statistics 93, no. 3 (2011): 775–85.

While this approach superficially avoids the difficulty of conceptualizing manipulation of immutable characteristics, it shifts the burden elsewhere. We now have to sort out all the different ways in which we think that race could possibly be perceived: through names, speech, style, and all sorts of other characteristics and combinations thereof. But not only that. To make counterfactual statements vis-à-vis exposure to race, we would have to be able to create the authentic background conditions under which all these perceptible characteristics would’ve come out in a manner that’s consistent with a different racial category. There is no way to construct such counterfactuals accurately without a clear understanding of what we mean by the category of race.171
171 For a deeper discussion of this point and an insightful critique of counterfactual arguments about race more broadly, see (Kohler-Hausmann, “Eddie Murphy and the Dangers of Counterfactual Causal Thinking about Detecting Racial Discrimination,” SSRN, 2019).

Just as we cannot talk about witchcraft in a valid causal model for lack of any scientific basis, we also cannot talk about perceptions of witchcraft in a valid causal model for the very same reason. Similarly, if we lack the ontological and epistemic basis for talking about race in a valid causal model, there is no easy remedy to be found in moving to perceptions of race.172
172 Note the term racecraft coined in (Fields and Fields, Racecraft).

In opposition to Holland’s view, other scholars, including Pearl, argue that causation does not require manipulability but rather an understanding of interactions. We can reason about hypothetical volcano eruptions without being able to manipulate volcanoes. We can explain the mechanism that causes tides without being able to manipulate the moon by any feasible intervention. What is required is an understanding of the ways in which a variable interacts with other variables in the model. Structural equations in a causal model are response functions. We can think of a node in a causal graph as receiving messages from its parent nodes and responding to those messages. Causality is thus about who listens to whom. We can form a causal model once we know how the nodes in it interact. But as we saw, the conceptual shift to interaction—who listens to whom—by no means makes it straightforward to come up with valid causal models. If causal models organize available scientific or empirical information, there are inevitably limitations to what constructs we can include in a causal model without running the danger of divorcing the model from reality. Especially in sociotechnical systems, scientific knowledge may not be available in terms of precise modular response functions.

We take the position that causes need not be experimentally manipulable. However, our discussion suggests that constructs referenced in causal models need a certificate of ontological and epistemic stability. Manipulation can be interpreted as a somewhat heavy-handed approach to clarifying the ontological nature of a node by specifying an explicit experimental mechanism for manipulating it. This is one way, but not the only way, to clarify what it is that the node references.

Looking ahead

We did not resolve the question of validity in causal modeling around discrimination. Nor do we expect that these questions can be resolved in full generality. Questions of validity depend on the purpose and scope of the model.173 We will return to questions of validity in our next chapter on measurement, which provides a helpful complementary perspective.
173 On this point of purpose and scope, see Cartwright’s essay in (Peschard and Van Fraassen, The Experimental Side of Modeling (University of Minnesota Press, 2018)).

Problem set

Bibliographic notes and further reading

Introductions to causality

There are several excellent introductory textbooks on the topic of causality. For an introduction to causality, turn to Pearl’s primer174 or the more comprehensive text.175 At the technical level, Pearl’s text emphasizes causal graphs and structural causal models. Our exposition of Simpson’s paradox and the UC Berkeley case was influenced by Pearl’s discussion, updated for a new popular audience book.176 All of these texts touch on the topic of discrimination. In these books, Pearl takes the position that discrimination corresponds to the direct effect of the sensitive category on a decision.
174 Pearl, Glymour, and Jewell, Causal Inference in Statistics.
175 Pearl, Causality.
176 Pearl and Mackenzie, The Book of Why.

The technically minded reader will enjoy complementing Pearl’s book with the recent open access text by Peters, Janzing, and Schölkopf177 that is available online. The text emphasizes two-variable causal models and applications to machine learning. See Spirtes, Glymour and Scheines178 for a general introduction based on causal graphs with an emphasis on graph discovery, i.e., inferring causal graphs from observational data.
177 Peters, Janzing, and Schölkopf, Elements of Causal Inference.
178 Spirtes et al., Causation, Prediction, and Search (MIT Press, 2000).

Morgan and Winship179 focus on applications in the social sciences. Imbens and Rubin180 give a comprehensive overview of the technical repertoire of causal inference in the potential outcomes model. Angrist and Pischke181 focus on causal inference and potential outcomes in econometrics. Hernán and Robins182 give another detailed introduction to causal inference that draws on the authors’ experience in epidemiology.
179 Morgan and Winship, Counterfactuals and Causal Inference (Cambridge University Press, 2014).
180 Imbens and Rubin, Causal Inference for Statistics, Social, and Biomedical Sciences (Cambridge University Press, 2015).
181 Angrist and Pischke, Mostly Harmless Econometrics: An Empiricist’s Companion (Princeton University Press, 2009).
182 Hernán and Robins, Causal Inference (Boca Raton: Chapman & Hall/CRC, forthcoming, 2019).

Pearl183 already considered the example of gender discrimination in UC Berkeley graduate admissions that we discussed at length. In his discussion, he implicitly advocates for a view of discrimination based on causal graphs, inspecting which paths in the graph go from the sensitive variable to the decision point.
183 Pearl, Causality.

Recent technical work

The topic of causal reasoning and discrimination gained significant momentum in the computer science and statistics community around 2017. Zhang et al.184 previously also considered discrimination analysis via path-specific causal effects. Kusner et al.185 introduced a notion of counterfactual fairness. The authors extend this line of thought in another work.186 Chiappa introduces a path-specific notion of counterfactual fairness.187
184 Zhang, Wu, and Wu, “A Causal Framework for Discovering and Removing Direct and Indirect Discrimination,” in Proc. 26th IJCAI, 2017, 3929–35.
185 Kusner et al., “Counterfactual Fairness.”
186 Russell et al., “When Worlds Collide: Integrating Different Counterfactual Assumptions in Fairness,” in Proc. 30th NIPS, 2017, 6417–26.
187 Chiappa, “Path-Specific Counterfactual Fairness,” in Proc. 33rd AAAI, vol. 33, 2019, 7801–8.

Kilbertus et al.188 distinguish between two graphical causal criteria, called unresolved discrimination and proxy discrimination. Both notions correspond to either allowing or disallowing paths in causal models. Nabi and Shpitser189 conceptualize discrimination as the influence of the sensitive attribute on the outcome along certain disallowed causal paths. Chiappa and Isaac190 give a tutorial on causality and fairness with an emphasis on the COMPAS debate.
188 Kilbertus et al., “Avoiding Discrimination Through Causal Reasoning,” in Proc. 30th NIPS, 2017, 656–66.
189 Nabi and Shpitser, “Fair Inference on Outcomes,” in Proc. 32nd AAAI, 2018, 1931–40.
190 Chiappa and Isaac, “A Causal Bayesian Networks Viewpoint on Fairness,” arXiv:1907.06430 (2019).

There is also extensive relevant scholarship in other disciplines that we cannot fully survey here. Of relevance is the vast literature in epidemiology on health disparities. In particular, epidemiologists have grappled with race and gender in causal models. See, for example, the article by VanderWeele and Robinson,191 as well as Krieger’s comment on the article,192 and Krieger’s article on discrimination and health inequalities193 for a starting point.
191 VanderWeele and Robinson, “On Causal Interpretation of Race in Regressions Adjusting for Confounding and Mediating Variables.”
192 Krieger, “On the Causal Interpretation of Race,” Epidemiology 25, no. 6 (2014): 937.
193 Krieger, “Discrimination and Health Inequities,” International Journal of Health Services 44, no. 4 (2014): 643–710.

Sociological and philosophical debate

A recent article by Hirschman and Reed194 does an excellent job of summarizing and systematizing sociological accounts of causality. Hirschman and Reed argue that traditional sociological and formal accounts of causality conceptualize causes as forces that govern fixed entities with variable attributes and the modular relationships between fixed entities. The article distinguishes between three main strands of such forcing cause accounts: variable causality, treatment and manipulation, and mechanisms. To connect this distinction with the content of this chapter, it may be helpful to analogize variable causality with drawing causal diagrams, treatment and manipulation with Rubin’s potential outcomes model, and mechanisms with mediation analysis.
194 Hirschman and Reed, “Formation Stories and Causality in Sociology,” Sociological Theory 32, no. 4 (2014): 259–82.

In this chapter we took a rather pragmatic perspective on causality by developing the minimal conceptual and technical tools to understand ongoing research on causal inference. In doing so, we’ve ignored centuries of philosophical debate around causality.
Cartwright’s work195 is a good starting point in this direction.
195 Cartwright, Hunting Causes and Using Them, Too.

Systems, dynamics, feedback loops

So far we have assumed that our causal models are always acyclic. Variables cannot simultaneously cause each other. In many applications it does make sense to talk about cyclic dependencies. For example, we might reason that the economy grew because of an increase in investments, and that investments grew because of a growing economy. The formalisms we encountered do not directly apply to such closed loop dynamics.

There are a few ways of coping. One is to unroll the system into discrete time steps. What this means is that we repeat the causal graph for some number of discrete time steps in such a manner that each node appears multiple times, indexed by a time step. An alternative route is to develop formalisms that directly deal with actions in closed loop dynamics. See work by Bongers et al.196 on extending the structural causal model perspective to cyclic models.
196 Bongers et al., “Theoretical Aspects of Cyclic Structural Causal Models,” arXiv.org preprint arXiv:1611.06221v2 (2018).

Traditionally, feedback systems are the focus of control theory,197 an area with a long history and vast technical repertoire. While much of control theory focuses on physical systems, concepts from control theory also influenced policy and decision making in other domains. A well-known example is the area of system dynamics pioneered by Forrester198 in the 60s and 70s that led to some politically influential works such as Limits to Growth.199 But see Baker’s thesis200 for a history of system modeling and its pitfalls.
197 Aström and Murray, Feedback Systems: An Introduction for Scientists and Engineers (Princeton University Press, 2010).
198 Forrester, “Urban Dynamics,” IMR; Industrial Management Review (Pre-1986) 11, no. 3 (1970): 67; Forrester, “Counterintuitive Behavior of Social Systems,” Technological Forecasting and Social Change 3 (1971): 1–22; Forrester, “System Dynamics, Systems Thinking, and Soft OR,” System Dynamics Review 10, no. 2–3 (1994): 245–56.
199 Meadows, Randers, and Meadows, The Limits to Growth: The 30-Year Update (Routledge, 2012).
200 Baker, “World Processors: Computer Modeling, Global , and the Birth of Sustainable Development” (Northwestern University, 2019).

6 Testing Discrimination in Practice

In previous chapters, we have seen statistical, causal, and normative fairness criteria. This chapter is about the complexities that arise when we want to apply them in practice.

A running theme of this book is that there is no single test for fairness. Rather, there are many quantitative criteria that can be used to diagnose potential unfairness or discrimination.201 There’s often a gap between moral notions of fairness and what is measurable by available experimental or observational methods. This does not mean that we can select and apply a fairness test based on convenience. Far from it: we need moral reasoning and domain-specific considerations to determine which test(s) are appropriate, how to apply them, determine whether the findings indicate wrongful discrimination, and whether an intervention is called for. We will see examples of such reasoning throughout this chapter. Conversely, if a system passes a fairness test, we should not interpret it as a certificate that the system is fair.202
201 We will use the terms unfairness and discrimination roughly synonymously. There is no overarching definition of either term, but we will make our discussion precise by referring to a specific criterion whenever possible. Linguistically, the term discrimination puts more emphasis on the agency of the decision maker.
202 We’ll use “system” as a shorthand for a decision-making system, such as hiring at a company. It may or may not involve any automation or machine learning.

In this chapter, our primary objects of study will be real systems rather than models of systems. We must bear in mind that there are many necessary assumptions in creating a model which may not hold in practice. For example, so-called automated decision making systems rarely operate without any human judgment. Or, we may assume that a machine learning system is trained on a sample drawn from the same population on which it makes decisions, which is also almost never true in practice. Further, decision making in real life is rarely a single decision point, but rather a cumulative series of small decisions. For example, hiring includes sourcing, screening, interviewing, selection, and evaluation, and those steps themselves include many components.203
203 Bogen and Rieke, “Help Wanted: An Examination of Hiring Algorithms, Equity, and Bias” (Technical report, Upturn, 2018).

An important source of difficulty for testing discrimination in practice is that researchers have a limited ability to observe — much less manipulate — many of the steps in a real-world system. In fact, we’ll see that even the decision maker faces limitations in its ability to study the system. Despite these limitations and difficulties, empirically testing fairness is vital. The studies that we’ll discuss serve as an existence proof of discrimination and provide a lower bound on its prevalence. They enable tracking trends in discrimination over time. When the findings are sufficiently blatant, they justify the need for intervention regardless of any differences in interpretation. And when we do apply a fairness intervention, they help us measure its effectiveness. Finally, empirical research can also help uncover the mechanisms by which discrimination takes place, which enables more targeted and effective interventions. This requires carefully formulating and testing hypotheses using domain knowledge.

The first half of this chapter surveys classic tests for discrimination that were developed in the context of human decision making systems. The underlying concepts are just as applicable to the study of fairness in automated systems. Much of the first half will build on the causality chapter and explain concrete techniques including experiments, difference-in-differences, and regression discontinuity. While these are standard tools in the causal inference toolkit, we’ll learn about the specific ways in which they can be applied to fairness questions. Then we will turn to the application of the observational criteria from Chapter 2. The summary table at the end of the first half lists, for each test, the fairness criterion that it probes, the type of access to the system that is required, and other nuances and limitations. The second half of the chapter is about testing fairness in algorithmic decision making, focusing on issues specific to algorithmic systems.

Part 1: Traditional tests for discrimination

Audit studies

The audit study is a popular technique for diagnosing discrimination. It involves a study design called a field experiment. “Field” refers to the fact that it is an experiment on the actual decision making system of interest (in the “field,” as opposed to a lab simulation of decision making). Experiments on real systems are hard to pull off. For example, we usually have to keep participants unaware that they are in an experiment. But field experiments allow us to study decision making as it actually happens rather than worrying that what we’re discovering is an artifact of a lab setting. At the same time, the experiment, by carefully manipulating and controlling variables, allows us to observe a treatment effect, rather than merely observing a correlation. How to interpret such a treatment effect is a trickier question.

In our view, most audit studies, including the ones we’ll describe, are best seen as attempts to test blindness: whether a decision maker directly uses a sensitive attribute. Recall that this notion of discrimination is not necessarily a counterfactual in a valid causal model (Chapter 4). Even as tests of blindness, there is debate about precisely what it is that they measure, since the researcher can at best signal race, gender, or another sensitive attribute. This will become clear when we discuss specific studies.

Audit studies were pioneered by the US Department of Housing and Urban Development in the 1970s for the purpose of studying the adverse treatment faced by minority home buyers and renters.204 They have since been successfully applied to many other domains. In one landmark study, researchers recruited 38 testers to visit about 150 car dealerships to bargain for cars, and record the price they were offered at the end of bargaining.205 Testers visited dealerships in pairs; testers in a pair differed in terms of race or gender. Both testers in a pair bargained for the same model of car, at the same dealership, usually within a few days of each other.
204 Wienk et al., “Measuring Racial Discrimination in American Housing Markets: The Housing Market Practices Survey,” 1979.
205 Ayres and Siegelman, “Race and Gender Discrimination in Bargaining for a New Car,” The American Economic Review, 1995, 304–21.

Pulling off an experiment such as this in a convincing way requires careful attention to detail; here we describe just a few of the many details in the paper. Most significantly, the researchers went to great lengths to minimize any differences between the testers that might correlate with race or gender. In particular, all testers were 28–32 years old, had 3–4 years of postsecondary education, and “were subjectively chosen to have average attractiveness.” Further, to minimize the risk of testers’ interaction with dealers being correlated with race or gender, every aspect of their verbal or nonverbal behavior was governed by a script. For example, all testers “wore similar ‘yuppie’ sportswear and drove to the dealership in similar rented cars.” They also had to memorize responses to a long list of questions they were likely to encounter. All of this required extensive training and regular debriefs.

The paper’s main finding was a large and statistically significant price penalty in the offers received by Black testers. For example, Black males received final offers that were about $1,100 more than White males, which represents a threefold difference in dealer profits based on data on dealer costs. The analysis in the paper has alternative target variables (initial offers instead of final offers; percentage markup instead of dollar offers), alternate model specifications (e.g. to account for the two audits in each pair having correlated noise), and additional controls (e.g. bargaining strategy). Thus, there are a number of different estimates, but the core findings remain robust.206
206 In an experiment such as this where the treatment is randomized, the addition or omission of control variables in a regression estimate of the treatment effect does not result in an incorrect estimate, but control variables can explain some of the noise in the observations and thus increase the precision of the treatment effect estimate, i.e., decrease the standard error of the coefficient.

A tempting interpretation of this study is that if two people were identical except for race, with one being White and the other being Black, then the offers they should expect to receive would differ by about $1,100. But what does it mean for two people to be identical except for race? Which attributes about them would be the same, and which would be different? With the benefit of the discussion of ontological instability in Chapter 4, we can understand the authors’ implicit framework for making these decisions. In our view, they treat race as a stable source node in a causal graph and attempt to hold constant all of its descendants, such as attire and behavior, in order to estimate the direct effect of race on the outcome. But what if one of the mechanisms of what we understand as “racial discrimination” is based on attire and behavior differences? The social construction of race suggests that this is plausible.207
207 Freeman et al., “Looking the Part: Social Status Cues Shape Race Perception,” PloS One 6, no. 9 (2011): e25107.

Note that the authors did not attempt to eliminate differences in accent between testers. Why not? From a practical standpoint, accent is difficult to manipulate. But a more principled defense of the authors’ choice is that accent is a part of how we understand race; a part of what it means to be Black, White, etc., so that even if the testers could manipulate their accents, they shouldn’t. Accent is subsumed into the “race” node in the causal graph. To take an informed stance on questions such as this, we need a deep understanding of cultural context and history. They are the subject of vigorous debate in sociology and critical race theory. Our point is this: the design and interpretation of audit studies requires taking positions on contested social questions. It may be futile to search for a single “correct” way to test even the seemingly straightforward fairness notion of whether the decision maker treats similar individuals similarly regardless of race. Controlling for a plethora of attributes is one approach; another is to simply recruit Black testers and White testers, have them behave and bargain as would be their natural inclination, and measure the demographic disparity. Each approach tells us something valuable, and neither is “better.”208
208 In most other domains, say employment, testing demographic disparity would be less valuable, because there are relevant differences between candidates. Price discrimination is unusual in that there are no morally salient qualities of buyers that may justify it.

Another famous audit study tested discrimination in the labor market.209 Instead of sending testers in person, the researchers sent fictitious resumes in response to job ads. Their goal was to test whether an applicant’s race had an impact on the likelihood of an employer inviting them for an interview. They signaled race in the resumes by using White-sounding names (Emily, Greg) or Black-sounding names (Lakisha, Jamal). By creating pairs of resumes that were identical except for the name, they found that White names were 50% more likely to result in a callback than Black names. The magnitude of the effect was equivalent to an additional eight years of experience on a resume.
209 Bertrand and Mullainathan, “Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination,” American Economic Review 94, no. 4 (2004): 991–1013.

Despite the study’s careful design, debates over interpretation have inevitably arisen, primarily due to the use of candidate names as a way to signal race to employers. Did employers even notice the names in all cases, and might the effect have been even stronger if they had? Or, can the observed disparities be better explained based on factors correlated with race, such as a preference for more common and familiar names, or an inference of higher socioeco- nomic status for the candidates with White-sounding names? (Of course, the alternative explanations don’t make the observed behav- ior morally acceptable, but they are important to consider.) Although the authors provide evidence against these interpretations, debate has persisted. For a discussion of critiques of the validity of audit studies, 210 210 Pager, “The Use of Field Experiments see Pager’s survey. for Studies of Employment Discrimi- In any event, like other audit studies, this experiment tests fair- nation: Contributions, Critiques, and ness as blindness. Even simple proxies for race, such as residential Directions for the Future,” The Annals of the American Academy of Political and neighborhood, were held constant between matched pairs of resumes. Social Science 609, no. 1 (2007): 104–33. Thus, the design likely underestimates the extent to which morally irrelevant characteristics affect callback rates in practice. This is just another way to say that attribute flipping does not generally pro- duce counterfactuals that we care about, and it is unclear if the effect sizes measured have any meaningful interpretation that generalizes beyond the context of the experiment. Rather, audit studies are valuable because they trigger a strong 211 211 Kohler-Hausmann, “Eddie Murphy and valid moral intuition. They also serve a practical purpose: and the Dangers of Counterfactual when designed well, they illuminate the mechanisms that produce Causal Thinking about Detecting Racial disparities and help guide interventions. For example, the car bar- Discrimination,” Nw. UL Rev. 113 (2018): 1163. gaining study concluded that the preferences of owners of dealer- ships don’t explain the observed discrimination, that the preferences of other customers may explain some of it, and strong evidence that dealers themselves (rather than owners or customers) are the primary source of the observed discrimination. Resume-based audit studies, also known as correspondence stud- ies, have been widely replicated. We briefly present some major findings, with the caveat that there may be publication biases. For example, studies finding no evidence of an effect are in general less likely to be published. Alternately, published null findings might reflect poor experiment design, or might simply indicate that discrim- ination is only expressed in certain contexts. A 2016 survey lists 30 studies from 15 countries covering nearly all continents revealing pervasive discrimination against racial and 212 212 Bertrand and Duflo, “Field Experi- ethnic minorities. The method has also been used to study dis- ments on Discrimination,” in Handbook crimination based on gender, sexual orientation, and physical ap- of Economic Field Experiments, vol. 1 213 (Elsevier, 2017), 309–93. pearance. It has also been used outside the labor market, in retail 213 214 Bertrand and Duflo. and academia. Finally, trends over time have been studied: a meta- 214 Bertrand and Duflo. 126 fairness and machine learning

Finally, trends over time have been studied: a meta-analysis found no change in racial discrimination in hiring against African Americans from 1989 to 2015. There was some indication of declining discrimination against Latinx Americans, although the data on this question was sparse (Quillian et al., "Meta-Analysis of Field Experiments Shows No Change in Racial Discrimination in Hiring over Time," Proceedings of the National Academy of Sciences 114, no. 41 (2017): 10870–75).

Collectively, audit studies have helped nudge the academic and policy debate away from the naive view that discrimination is a concern of a bygone era. From a methodological perspective, our main takeaway from the discussion of audit studies is the complexity of defining and testing blindness.

Testing the impact of blinding

In some situations, it is not possible to test blindness by randomizing the decision maker's perception of race, gender, or another sensitive attribute. For example, suppose we want to test if there's gender bias in peer review in a particular research field. Submitting real papers with fictitious author identities may result in the reviewer attempting to look up the author and realizing the deception. A design in which the researcher changes author names to those of real people is even more problematic. There is a slightly different strategy that's more viable: an editor of a scholarly journal in the research field could conduct an experiment in which each paper received is randomly assigned to be reviewed in either a single-blind fashion (in which the author identities are known to the referees) or a double-blind fashion (in which author identities are withheld from referees).

Indeed, such experiments have been conducted (Blank, "The Effects of Double-Blind Versus Single-Blind Reviewing: Experimental Evidence from the American Economic Review," The American Economic Review, 1991, 1041–67), but in general even this strategy can be impractical.

At any rate, suppose that a researcher has access to only observational data on journal review policies and statistics on published papers. Among ten journals in the research field, some introduced double-blind review, and did so in different years. The researcher observes that in each case, right after the switch, the fraction of female-authored papers rose, whereas there was no change for the journals that stuck with single-blind review. Under certain assumptions, this enables estimating the impact of double-blind reviewing on the fraction of accepted papers that are female-authored. This hypothetical example illustrates the idea of a "natural experiment," so called because experiment-like conditions arise due to natural variation. Specifically, the study design in this case is called differences-in-differences. The first "difference" is between single-blind and double-blind reviewing, and the second "difference" is between journals (row 2 in the summary table).
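As a concrete illustration (not from any actual study), here is a minimal sketch of how the diff-in-diff estimate could be computed for this hypothetical example. The file and column names are made up, and for simplicity it assumes all switching journals adopted double-blind review in the same year.

```python
# Toy differences-in-differences estimate for the hypothetical journal example.
# Assumed columns: "journal", "year", "frac_female" (fraction of accepted papers
# that are female-authored), "treated" (1 if the journal ever switches to
# double-blind review), "post" (1 for years after the common switch year).
import pandas as pd
import statsmodels.formula.api as smf

panel = pd.read_csv("journal_panel.csv")

# Simple 2x2 version: difference of before/after differences across the two groups.
means = panel.groupby(["treated", "post"])["frac_female"].mean().unstack("post")
did = (means.loc[1, 1] - means.loc[1, 0]) - (means.loc[0, 1] - means.loc[0, 0])
print(f"2x2 diff-in-diff estimate: {did:.3f}")

# Regression version with journal and year fixed effects; the coefficient on the
# interaction is the diff-in-diff estimate. Clustering standard errors by journal
# is one (partial) response to serially correlated outcomes.
fit = smf.ols("frac_female ~ treated:post + C(journal) + C(year)", data=panel).fit(
    cov_type="cluster", cov_kwds={"groups": panel["journal"]}
)
print(fit.params["treated:post"], fit.bse["treated:post"])
```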

Differences-in-differences is methodologically nuanced, and a full treatment is beyond our scope (see Pischke, "Empirical Methods in Applied Economics: Lecture Notes," 2005). We briefly note some pitfalls. There may be unobserved confounders: perhaps the switch to double-blind reviewing at each journal happened as a result of a change in editorship, and the new editors also instituted policies that encouraged female authors to submit strong papers. There may also be spillover effects (which violate the Stable Unit Treatment Value Assumption): a change in policy at one journal can cause a change in the set of papers submitted to other journals. Outcomes are serially correlated: if there is a random fluctuation in the gender composition of the research field due to an entry or exodus of some researchers, the effect will last many years, which complicates the computation of the standard error of the estimate (Bertrand, Duflo, and Mullainathan, "How Much Should We Trust Differences-in-Differences Estimates?" The Quarterly Journal of Economics 119, no. 1 (2004): 249–75). Finally, the effect of double blinding on the probability of acceptance of female-authored papers (rather than on the fraction of accepted papers that are female-authored) is not identifiable using this technique without additional assumptions or controls.

Even though testing the impact of blinding sounds similar to testing blindness, there is a crucial conceptual and practical difference. Since we are not asking a question about the impact of race, gender, or another sensitive attribute, we avoid running into ontological instability. The researcher doesn't need to intervene on the observable features by constructing fictitious resumes or training testers to use a bargaining script. Instead, the natural variation in features is left unchanged; the study involves real decision subjects. The researcher only intervenes on the decision making procedure (or exploits natural variation in it) and evaluates the impact of that intervention on groups of candidates defined by the sensitive attribute A. Thus, A is not a node in a causal graph, but merely a way to split the units into groups for analysis. Questions of whether the decision maker actually inferred the sensitive attribute, or merely a feature correlated with it, are irrelevant to the interpretation of the study. Further, the effect sizes measured do have a meaning that generalizes to scenarios beyond the experiment. For example, a study tested the effect of "resume whitening," in which minority applicants deliberately conceal cues of their racial or ethnic identity in job application materials to improve their chances of getting a callback (Kang et al., "Whitened Resumes: Race and Self-Presentation in the Labor Market," Administrative Science Quarterly 61, no. 3 (2016): 469–502). The effects reported in the study are meaningful to job seekers who engage in this practice.

Revealing extraneous factors in decisions

Sometimes natural experiments can be used to show the arbitrariness of decision making rather than unfairness in the sense of non-blindness (row 3 in the summary table). Recall that arbitrariness is one type of unfairness that we are concerned about in this book (Chapter 3).

Arbitrariness may refer to the lack of a uniform decision making procedure or to the incursion of irrelevant factors into the procedure. For example, a study looked at decisions made by judges in Louisiana juvenile courts, including sentence lengths (Eren and Mocan, "Emotional Judges and Unlucky Juveniles," American Economic Journal: Applied Economics 10, no. 3 (2018): 171–205). It found that in the week following an upset loss suffered by the Louisiana State University (LSU) football team, judges imposed sentences that were 7% longer on average. The impact was greater for Black defendants. The effect was driven entirely by judges who got their undergraduate degrees at LSU, suggesting that it is due to the emotional impact of the loss. (For readers unfamiliar with the culture of college football in the United States, the paper helpfully notes that "Describing LSU football just as an event would be a huge understatement for the residents of the state of Louisiana.")

Another well-known study on the supposed unreliability of judicial decisions is in fact a poster child for the danger of confounding variables in natural experiments. The study tested the relationship between the order in which parole cases are heard by judges and the outcomes of those cases (Danziger, Levav, and Avnaim-Pesso, "Extraneous Factors in Judicial Decisions," Proceedings of the National Academy of Sciences 108, no. 17 (2011): 6889–92). It found that the percentage of favorable rulings started out at about 65% early in the day, gradually dropped to nearly zero right before the judges' food break, and returned to about 65% after the break, with the same pattern repeating for the following food break! The authors suggested that judges' mental resources are depleted over the course of a session, leading to poorer decisions. It quickly became known as the "hungry judges" study and has been widely cited as an example of the fallibility of human decision makers.

Figure 31 (from Danziger et al.): fraction of favorable rulings over the course of a day. The dotted lines indicate food breaks.

The finding would be extraordinary if the order of cases were truly random. (Indeed, it would be so extraordinary that it has been argued that the study should be dismissed simply on the grounds that the effect size observed is far too large to be caused by psychological phenomena such as judges' attention; see Lakens, "Impossibly Hungry Judges," https://daniellakens.blogspot.com/2017/07/impossibly-hungry-judges.html, 2017.) The authors were well aware that the order wasn't random, and performed a few tests to see if it was associated with factors pertinent to the case (since those factors might also impact the probability of a favorable outcome in a legitimate way). They did not find such factors. But it turned out they didn't look hard enough. A follow-up investigation revealed multiple confounders and potential confounders, including the fact that prisoners without an attorney are presented last within each session and tend to prevail at a much lower rate (Weinshall-Margel and Shapard, "Overlooked Factors in the Analysis of Parole Decisions," Proceedings of the National Academy of Sciences 108, no. 42 (2011): E833–33). This invalidates the conclusion of the original study.

Testing the impact of decisions and interventions

An underappreciated aspect of fairness in decision making is the impact of the decision on the decision subject. In our prediction framework, the target variable (Y) is not impacted by the score or prediction (R). But this is not true in practice. Banks set interest rates for loans based on the predicted risk of default, but setting a higher interest rate makes a borrower more likely to default. The impact of the decision on the outcome is a question of causal inference.

There are other important questions we can ask about the impact of decisions. What is the utility or cost of a positive or negative decision to different decision subjects (and groups)? For example, admission to a college may have a different utility to different applicants based on the other colleges where they were or weren't admitted. Decisions may also have effects on people who are not decision subjects: for instance, incarceration impacts not just individuals but communities (Huq, "Racial Equity in Algorithmic Criminal Justice," Duke LJ 68 (2018): 1043). Measuring these costs allows us to be more scientific about setting decision thresholds and adjusting the tradeoff between false positives and negatives in decision systems.

One way to measure the impact of decisions is via experiments, but again, these can be infeasible for legal, ethical, and technical reasons. Instead, we highlight a natural experiment design for testing the impact of a decision — or a fairness intervention — on the candidates, called regression discontinuity (row 4 in the summary table). Suppose we'd like to test if a merit-based scholarship program for first-generation college students has lasting beneficial effects — say, on how much they earn after college. We cannot simply compare the average salary of students who did and did not win the scholarship, as those two variables may be confounded by intrinsic ability or other factors. But suppose the scholarships were awarded based on test scores, with a cutoff of 85%. Then we can compare the salaries of students with scores of 85% to 86% (who were awarded the scholarship) with those of students with scores of 84% to 85% (who were not). We may assume that within this narrow range of test scores, scholarships are awarded essentially randomly. (For example, if the standard error of test scores for students of identical ability is 5 percentage points, then the difference between 84% and 86% is of minimal significance.) Thus we can estimate the impact of the scholarship as if we had done a randomized controlled trial.

We need to be careful, though. If we consider too narrow a band of test scores around the threshold, we may end up with insufficient data points for inference. If we consider a wider band of test scores, the students in this band may no longer be exchangeable units for the analysis.

Another pitfall arises because we assumed that the set of students who receive the scholarship is precisely the set of those above the threshold. If this assumption fails, it immediately introduces the possibility of confounders. Perhaps the test score is not the only scholarship criterion, and income is used as a secondary criterion. Or some students offered the scholarship may decline it because they already received another scholarship. Other students may not avail themselves of the offer because the paperwork required to claim it is cumbersome. If it is possible to take the test multiple times, wealthier students may be more likely to do so until they meet the eligibility threshold.
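Here is a bare-bones sketch of the comparison just described, on hypothetical data with made-up column names. A real regression discontinuity analysis would typically fit local regressions on each side of the cutoff and choose the bandwidth carefully, and the simple comparison below is only valid when receiving the scholarship tracks the cutoff exactly, which is precisely the assumption questioned above.

```python
# Minimal sharp regression discontinuity comparison for the scholarship example.
# Assumed columns: "test_score" (percent), "scholarship" (0/1), "salary".
import pandas as pd

students = pd.read_csv("students.csv")
CUTOFF, BAND = 85.0, 1.0  # cutoff and window width (in percentage points)

above = students[(students.test_score >= CUTOFF) & (students.test_score < CUTOFF + BAND)]
below = students[(students.test_score >= CUTOFF - BAND) & (students.test_score < CUTOFF)]

# Sanity check for the sharp design: treatment should track the cutoff exactly.
assert above.scholarship.eq(1).all() and below.scholarship.eq(0).all()

effect = above.salary.mean() - below.salary.mean()
print(f"estimated effect near the cutoff: {effect:.0f}")
print(f"sample sizes: {len(above)} just above, {len(below)} just below")
```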

Purely observational tests

The final category of quantitative tests for discrimination is purely observational. When we are unable to do experiments on the system of interest and lack the conditions that enable quasi-experimental studies, there are still many questions we can answer with purely observational data.

One question that is often studied using observational data is whether the decision maker used the sensitive attribute; this can be seen as a loose analog of audit studies. This type of analysis is often used in the legal analysis of disparate treatment, although there is a deep and long-standing legal debate on whether and when explicit consideration of the sensitive attribute is necessarily unlawful (Norton, "The Supreme Court's Post-Racial Turn Towards a Zero-Sum Understanding of Equality," Wm. & Mary L. Rev. 52 (2010): 197).

The most common way to do this is to use regression analysis to see if attributes other than the protected attributes can collectively "explain" the observed decisions (row 5 in the summary table; see Ayres, "Three Tests for Measuring Unjustified Disparate Impacts in Organ Transplantation: The Problem of 'Included Variable' Bias," Perspectives in Biology and Medicine 48, no. 1 (2005): S68–S87). If they don't, then the decision maker must have used the sensitive attribute. However, this is a brittle test. As discussed in Chapter 2, given a sufficiently rich dataset, the sensitive attribute can be reconstructed using the other attributes. It is no surprise that attempts to apply this test in a legal context can turn into dueling expert reports, as seen in the SFFA vs. Harvard case discussed in Chapter 4.

We can of course try to go deeper with observational data and regression analysis. To illustrate, consider the gender pay gap. A study might reveal that there is a gap between genders in wage per hour worked for equivalent positions in a company. A rebuttal might claim that the gap disappears after controlling for college GPA and performance review scores. Such studies can be seen as tests for conditional demographic parity (row 6 in the summary table). (Testing conditional demographic parity using regression requires strong assumptions about the functional form of the relationship between the independent variables and the target variable.)

It can be hard to make sense of competing claims based on regression analysis. Which variables should we control for, and why? There are two ways in which we can put these observational claims on a more rigorous footing. The first is to use a causal framework to make our claims more precise.
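As an illustration of what such competing regressions look like in practice, here is a minimal sketch on hypothetical pay data; the file and column names are made up. The gender coefficient in the second model is the "adjusted" gap, and, as noted above, its interpretation rests on strong assumptions about the choice of controls and the linear functional form.

```python
# Raw vs. "adjusted" gender pay gap on hypothetical data with columns
# "wage", "gender", "gpa", and "perf_score".
import pandas as pd
import statsmodels.formula.api as smf

pay = pd.read_csv("pay_data.csv")

# Raw gap: difference in mean wages by gender.
raw = smf.ols("wage ~ C(gender)", data=pay).fit()
print(raw.params)

# Conditional-demographic-parity-style check: does the gender coefficient
# shrink toward zero once the chosen controls are included?
adjusted = smf.ols("wage ~ C(gender) + gpa + perf_score", data=pay).fit()
print(adjusted.params, adjusted.pvalues)
```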

In this case, causal modeling might alert us to unresolved questions: why do performance review scores differ by gender? What about the gender composition of different roles and levels of seniority? Exploring these questions may reveal unfair practices. Of course, in this instance the questions we raised are intuitively obvious, but other cases may be more intricate.

The second way to go deeper is to apply our normative understanding of fairness to determine which paths from gender to wage are morally problematic. If the pay gap is caused by the (well-known) gender differences in negotiating for pay raises, does the employer bear the moral responsibility to mitigate it? This is, of course, a normative and not a technical question.

Outcome-based tests

So far in this chapter we've presented many scenarios — screening job candidates, peer review, parole hearings — that have one thing in common: while they all aim to predict some outcome (job performance, paper quality, recidivism), the researcher does not have access to data on the true outcomes. Lacking ground truth, the focus shifts to the observable characteristics at decision time, such as job qualifications. A persistent source of difficulty in these settings is for the researcher to construct two sets of samples that differ only in the sensitive attribute and not in any of the relevant characteristics. This is often an untestable assumption. Even in an experimental setting such as a resume audit study, there is substantial room for different interpretations: did employers infer race from names, or socioeconomic status? And in observational studies, the findings might turn out to be invalid because of unobserved confounders (as in the hungry judges study).

But if outcome data are available, then we can do at least one test of fairness without needing any of the observable features (other than the sensitive attribute): specifically, we can test for sufficiency, which requires that the true outcome be conditionally independent of the sensitive attribute given the prediction (Y⊥A|R). For example, in the context of lending, if the bank's decisions satisfy sufficiency, then among applicants in any narrow interval of predicted probability of default (R), we should find the same rate of default (Y) for applicants of any group (A). Typically, the decision maker (the bank) can test for sufficiency, but an external researcher cannot, since the researcher only gets to observe Ŷ (whether or not the loan was approved) and not R. Such a researcher can test predictive parity rather than sufficiency. Predictive parity requires that the rate of default (Y) among favorably classified applicants (Ŷ = 1) be the same for every group (A). This observational test is called the outcome test (row 7 in the summary table).
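The sketch below shows what these two checks might look like from the bank's perspective, on hypothetical data with made-up column names. The first check bins applicants by score and compares default rates across groups within each bin; the second computes the default rate among approved applicants by group, which is the outcome test whose interpretation is discussed next.

```python
# Hypothetical lending data: "group" (A), "score" (R, predicted default
# probability), "approved" (the decision), and "default" (Y, observed only for
# approved applicants).
import pandas as pd

loans = pd.read_csv("loans.csv")
approved = loans[loans.approved == 1].copy()

# Sufficiency-style check (requires R): within narrow score bins, the default
# rate should not differ by group.
approved["score_bin"] = pd.cut(approved["score"], bins=10)
print(approved.groupby(["score_bin", "group"])["default"].mean().unstack("group"))

# Outcome test / predictive parity (needs only the decision, A, and Y):
# default rate among approved applicants, by group.
print(approved.groupby("group")["default"].mean())
```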

Here is a tempting argument based on the outcome test: if women who receive loans have a lower rate of default than men who receive loans, it suggests that the bank applies a higher bar for loan qualification to women. Indeed, this type of argument was the original motivation behind the outcome test. But it is a logical fallacy: sufficiency does not imply predictive parity (or vice versa). To see why, consider a thought experiment involving the Bayes optimal predictor. In the hypothetical figure below, applicants to the left of the vertical line qualify for the loan. Since the area under the curve to the left of the line is concentrated further to the right for men than for women, men who receive loans are more likely to default than women. Thus, the outcome test would reveal that predictive parity is violated, whereas it is clear from the construction that sufficiency is satisfied and the bank applies the same bar to all groups.

Figure 32: Hypothetical probability density of loan default for two groups, women (orange) and men (blue).

This phenomenon is called infra-marginality: the measurement is aggregated over samples that are far from the decision threshold (margin). If we are indeed interested in testing sufficiency (equivalently, whether the bank applied the same threshold to all groups), rather than predictive parity, this is a problem. To address it, we can try to narrow our attention to samples that are close to the threshold. This is not possible with (Ŷ, A, Y) alone: without knowing R, we don't know which instances are close to the threshold. However, if we also had access to some set of features X′ (which need not coincide with the set of features X observed by the decision maker), it becomes possible to test for violations of sufficiency. The threshold test is a way to do this (row 8 in the summary table); a full description is beyond our scope (Simoiu, Corbett-Davies, and Goel, "The Problem of Infra-Marginality in Outcome Tests for Discrimination," The Annals of Applied Statistics 11, no. 3 (2017): 1193–1216). One limitation is that it requires a model of the joint distribution of (X′, A, Y) whose parameters can be inferred from the data, whereas the outcome test is model-free.

While we described infra-marginality as a limitation of the outcome test, it can also be seen as a benefit.

When using a marginal test, we treat the distribution of applicant characteristics as a given, and miss the opportunity to ask: why are some individuals so far from the margin? Ideally, we can use causal inference to answer this question, but when the data at hand don't allow this, non-marginal tests might be a useful starting point for diagnosing unfairness that originates "upstream" of the decision maker. Similarly, error rate disparity, to which we now turn, is crude compared to more sophisticated tests for discrimination, but it attempts to capture some of our moral intuitions about why certain disparities are problematic.

Separation and selective labels

Recall that separation is defined as R⊥A|Y. At first glance, it seems that there is a simple observational test analogous to our test for sufficiency (Y⊥A|R). However, this is not straightforward, even for the decision maker, because outcome labels can be observed only for some of the applicants (i.e., the ones who received favorable decisions). Trying to test separation using this sample suffers from selection bias. This is an instance of what is called the selective labels problem. The issue also affects the computation of false positive and false negative rate parity, which are binary versions of separation.

More generally, the selective labels problem is the issue of selection bias in evaluating decision making systems due to the fact that the very selection process we wish to study determines the sample of instances that are observed. It is not specific to the issue of testing separation or error rates: it affects the measurement of other fundamental metrics, such as accuracy, as well. It is a serious and often overlooked issue that has been the subject of recent study (Lakkaraju et al., "The Selective Labels Problem: Evaluating Algorithmic Predictions in the Presence of Unobservables," in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM, 2017), 275–84).

One way to get around this barrier is for the decision maker to employ an experiment in which some sample of decision subjects receive positive decisions regardless of the prediction (row 9 in the summary table). However, such experiments raise ethical concerns and are rarely done in practice. In machine learning, some experimentation is necessary in settings where there does not exist offline data for training the classifier, which must instead simultaneously learn and make decisions (Bird et al., "Exploring or Exploiting? Social and Ethical Implications of Autonomous Experimentation in AI," in Workshop on Fairness, Accountability, and Transparency in Machine Learning, 2016).

One scenario where it is straightforward to test separation is when the "prediction" is not actually a prediction of a future event, but rather when machine learning is used for automating human judgment, such as harassment detection in online comments. In these applications, it is indeed possible and important to test error rate parity.
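In that setting, where a ground-truth label is available for every instance, a basic error rate parity check is straightforward; the sketch below uses hypothetical column names. In a lending-style setting the same computation would be invalid, because labels exist only for the selected instances.

```python
# Error rate parity check on hypothetical fully-labeled data, e.g., comments with
# human toxicity labels. Columns: "group" (A), "label" (Y, 0/1), "pred" (0/1).
import pandas as pd

data = pd.read_csv("labeled_predictions.csv")

def error_rates(df: pd.DataFrame) -> pd.Series:
    # False positive rate among true negatives, false negative rate among true positives.
    fpr = ((df.pred == 1) & (df.label == 0)).sum() / (df.label == 0).sum()
    fnr = ((df.pred == 0) & (df.label == 1)).sum() / (df.label == 1).sum()
    return pd.Series({"FPR": fpr, "FNR": fnr})

# The binary version of separation asks for these rates to be equal across groups.
print(data.groupby("group").apply(error_rates))
```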

Summary of traditional tests and methods

Table 12: Summary of traditional tests and methods, highlighting the relationship to fairness, the observational and experimental access required by the researcher, and limitations.

Each row lists: test / study design; fairness notion or application; access required; notes / limitations.

1. Audit study. Fairness notion: blindness. Access: A-exp :=, X :=, R. Limitations: difficult to interpret.
2. Natural experiment, especially diff-in-diff. Fairness notion: impact of blinding. Access: A-exp ∼, R. Limitations: confounding; SUTVA violations; other pitfalls.
3. Natural experiment. Fairness notion: arbitrariness. Access: W ∼, R. Limitations: unobserved confounders.
4. Natural experiment, especially regression discontinuity. Fairness notion: impact of decision. Access: R, Y or Y′. Limitations: sample size; confounding; other technical difficulties.
5. Regression analysis. Fairness notion: blindness. Access: X, A, R. Limitations: unreliable due to proxies.
6. Regression analysis. Fairness notion: conditional demographic parity. Access: X, A, R. Limitations: weak moral justification.
7. Outcome test. Fairness notion: predictive parity. Access: A, Y | Ŷ = 1. Limitations: infra-marginality.
8. Threshold test. Fairness notion: sufficiency. Access: X′, A, Y | Ŷ = 1. Limitations: model-specific.
9. Experiment. Fairness notion: separation / error rate parity. Access: A, R, Ŷ :=, Y. Limitations: often unethical or impractical.
10. Observational test. Fairness notion: demographic parity. Access: A, R. Limitations: see Chapter 2.
11. Mediation analysis. Fairness notion: "relevant" mechanism. Access: X, A, R. Limitations: see Chapter 4.

Legend:
• := indicates intervention on some variable (that is, X := does not represent a new random variable but is simply an annotation describing how X is used in the test)
• ∼ denotes natural variation in some variable, exploited by the researcher
• A-exp denotes exposure of a signal of the sensitive attribute to the decision maker
• W is a feature that is considered irrelevant to the decision
• X′ is a set of features which may not coincide with those observed by the decision maker
• Y′ is an outcome that may or may not be the one that is the target of prediction

Taste-based and statistical discrimination

We have reviewed several methods of detecting discrimination, but we have not addressed the question of why discrimination happens. A long-standing way to try to answer this question from an economic perspective is to classify discrimination as taste-based or statistical.

A taste-based discriminator is motivated by an irrational animus or prejudice toward a group. As a result, they are willing to make suboptimal decisions by passing up opportunities to select candidates from that group, even though they will incur a financial penalty for doing so. This is the classic model of discrimination in labor markets (Becker, The Economics of Discrimination (University of Chicago Press, 1957)).

A statistical discriminator, in contrast, aims to make optimal predictions about the target variable using all available information, including the protected attribute (Phelps, "The Statistical Theory of Racism and Sexism," The American Economic Review 62, no. 4 (1972): 659–61; Arrow, "The Theory of Discrimination," Discrimination in Labor Markets 3, no. 10 (1973): 3–33). In the simplest model of statistical discrimination, two conditions hold. First, the distribution of the target variable differs by group. The usual example is of gender discrimination in the workplace, involving an employer who believes that women are more likely to take time off due to pregnancy (resulting in lower job performance). The second condition is that the observable characteristics do not allow a perfect prediction of the target variable, which is essentially always the case in practice. Under these two conditions, the optimal prediction will differ by group even when the relevant characteristics are identical. In this example, the employer would be less likely to hire a woman than an equally qualified man.

There's a nuance here: from a moral perspective we would say that the employer above discriminates against all female candidates. But under the definition of statistical discrimination, the employer only discriminates against the female candidates who would not have taken time off if hired (and in fact discriminates in favor of the female candidates who would take time off if hired).
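The following toy calculation, with entirely made-up probabilities, illustrates the point: when group base rates differ and the individual signal is noisy, Bayes' rule yields different optimal predictions for identical observable characteristics.

```python
# Two groups with different base rates of the target variable, observed through
# the same noisy binary signal X. The Bayes-optimal prediction P(Y=1 | X=1, A)
# then differs by group even though the observable signal is identical.
base_rate = {"group_a": 0.2, "group_b": 0.4}   # P(Y=1 | A), assumed to differ
p_x1_given_y = {1: 0.8, 0: 0.3}                # P(X=1 | Y), same for both groups

for group, prior in base_rate.items():
    numerator = p_x1_given_y[1] * prior
    evidence = p_x1_given_y[1] * prior + p_x1_given_y[0] * (1 - prior)
    print(group, round(numerator / evidence, 2))
# Prints 0.4 for group_a and 0.64 for group_b.
```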

While some authors put much weight on understanding discrimination based on the taste-based vs. statistical categorization, we will de-emphasize it in this book. Several reasons motivate our choice. First, since we are interested in extracting lessons for statistical decision making systems, the distinction is not that helpful: such systems will not exhibit taste-based discrimination unless prejudice is explicitly programmed into them (while that is certainly a possibility, it is not a primary concern of this book).

Second, there are practical difficulties in distinguishing between taste-based and statistical discrimination. Often, what might seem to be a "taste" for discrimination is simply the result of an imperfect understanding of the decision maker's information and beliefs. For example, at first sight the findings of the car bargaining study may look like a clear-cut case of taste-based discrimination. But maybe the dealer knows that different customers have different access to competing offers and therefore have different willingness to pay for the same item. Then the dealer uses race as a proxy for this amount (correctly or not). In fact, the paper provides tentative evidence towards this interpretation. The reverse is also possible: if the researcher does not know the full set of features observed by the decision maker, taste-based discrimination might be mischaracterized as statistical discrimination.

Third, many of the fairness questions of interest to us, such as structural discrimination, don't map to either of these categories, as both only consider causes that are relatively proximate to the decision point. We will discuss structural discrimination in Chapter 6.

Finally, the distinction is also not especially valuable from a normative perspective. Recall that our moral understanding of fairness emphasizes the effects on the decision subjects and does not put much weight on the mental state of the decision maker. It's also worth noting that this dichotomy is associated with the policy position that fairness interventions are unnecessary: firms that practice taste-based discrimination will go out of business, and as for statistical discrimination, either it is argued to be justified or it is argued to be futile to proscribe, because firms will find workarounds. (For example, laws restricting employers from asking about applicants' criminal history resulted in employers using race as a proxy for it. See Agan and Starr, "Ban the Box, Criminal Records, and Racial Discrimination: A Field Experiment," The Quarterly Journal of Economics 133, no. 1 (2017): 191–235.) Of course, that's not necessarily a reason to avoid discussing taste-based and statistical discrimination, as the policy position in no way follows from the technical definitions and models themselves; it's just a relevant caveat for the reader who might encounter these dubious arguments in other sources.

Although we de-emphasize this distinction, we consider it critical to study the sources and mechanisms of discrimination. This helps us design effective and well-targeted interventions. For example, several studies (including the car bargaining study) test whether the source of discrimination lies in the owner, the employees, or the customers. An example of a study that can be difficult to interpret without understanding the mechanism is a 2015 resume-based audit study that revealed a 2:1 faculty preference for women for STEM tenure-track positions (Williams and Ceci, "National Hiring Experiments Reveal 2:1 Faculty Preference for Women on STEM Tenure Track," Proceedings of the National Academy of Sciences 112, no. 17 (2015): 5360–65). Consider the range of possible explanations: animus against men; a desire to compensate for past disadvantage suffered by women in STEM fields; a preference for a more diverse faculty (assuming that the faculties in question are currently male dominated); a response to financial incentives for diversification frequently provided by universities to STEM departments; and an assumption by decision makers that, due to prior discrimination, a female candidate with a CV equivalent to a male candidate's is of greater intrinsic ability. (Note that if this last assumption is correct, then a preference for female candidates is both accuracy maximizing, as a predictor of career success, and required under some notions of fairness, such as counterfactual fairness.)

To summarize, rather than a one-size-fits-all approach to understanding mechanisms such as taste-based vs. statistical discrimination, a more useful approach is a nuanced and domain-specific one in which we formulate hypotheses in part by studying decision making processes and organizations, especially in a qualitative way. Let us now turn to those studies.

Studies of decision making processes and organizations

One way to study decision making processes is through surveys of decision makers or organizations. Sometimes such studies reveal blatant discrimination, such as strong racial preferences by employers (Neckerman and Kirschenman, "Hiring Strategies, Racial Bias, and Inner-City Workers," Social Problems 38, no. 4 (1991): 433–47).

Over the decades, however, such overt attitudes have become less common, or at least less likely to be expressed (Pager and Shepherd, "The Sociology of Discrimination: Racial Discrimination in Employment, Housing, Credit, and Consumer Markets," Annu. Rev. Sociol. 34 (2008): 181–209). Discrimination tends to operate in more subtle, indirect, and covert ways.

Ethnographic studies excel at helping us understand covert discrimination. Ethnography is one of the main research methods in the social sciences and is based on the idea of the researcher being embedded among the research subjects for an extended period of time as they go about their daily activities. It is a set of qualitative methods that are complementary to and symbiotic with quantitative ones. Ethnography allows us to ask questions that are deeper than quantitative methods permit and to produce richly detailed accounts of culture. It also helps formulate hypotheses that can be tested quantitatively.

A good illustration is the book Pedigree, which examines hiring practices in a set of elite consulting, banking, and law firms (Rivera, Pedigree: How Elite Students Get Elite Jobs (Princeton University Press, 2016)). These firms together constitute the majority of the highest-paying and most desirable entry-level jobs for college graduates. The author used two standard ethnographic research methods. The first is a set of 120 interviews in which she presented as a graduate student interested in internship opportunities. The second method is called participant observation: she worked in an unpaid Human Resources position at one of the firms for nine months, after obtaining consent to use her observations for research. There are several benefits to the researcher becoming a participant in the culture: it provides a greater level of access, allows the researcher to ask more nuanced questions, and makes it more likely that the research subjects would behave as they would when not being observed.

Several insights from the book are relevant to us. First, the hiring process has about nine stages, including outreach, recruitment events, screening, multiple rounds of interviews and deliberations, and "sell" events. This highlights why any quantitative study that focuses on a single slice of the process (say, evaluation of resumes) is limited in scope. Second, the process bears little resemblance to the ideal of predicting job performance based on a standardized set of attributes, albeit noisy ones, that we described in Chapter 1. Interviewers pay a surprising amount of attention to attributes that should be irrelevant or minimally relevant, such as leisure activities, but which instead serve as markers of class. Applicants from privileged backgrounds are more likely to be viewed favorably, both because they are able to spare more time for such activities and because they have the insider knowledge that these seemingly irrelevant attributes matter in recruitment. The signals that firms do use as predictors of job performance, such as admission to elite universities — the pedigree in the book's title — are also highly correlated with socioeconomic status.

The author argues that these hiring practices help explain why elite status is perpetuated in society along hereditary lines. In our view, the careful use of statistical methods in hiring, despite their limits, may mitigate the strong social-class-based preferences exposed in the book.

Another book, Inside Graduate Admissions, focuses on education rather than the labor market (Posselt, Inside Graduate Admissions (Harvard University Press, 2016)). It resulted from the author's observations of decision making by graduate admissions committees in nine academic disciplines over two years. A striking theme that pervades this book is the tension between formalized and holistic decision making. For instance, committees arguably over-rely on GRE scores despite stating that they consider their predictive power to be limited. As it turns out, one reason for the preference for GRE scores and other quantitative criteria is that they avoid the difficulties of subjective interpretation associated with signals such as reference letters. This is considered valuable because it minimizes tensions between faculty members in the admissions process. On the other hand, decision makers are implicitly aware (and occasionally explicitly articulate) that if admissions criteria are too formal, then some groups of applicants — notably, applicants from China — would be successful at a far greater rate, and this is considered undesirable. This motivates a more holistic set of criteria, which often include idiosyncratic factors such as an applicant's hobby being considered "cool" by a faculty member. The author argues that admissions committees use a facially neutral set of criteria, characterized by an almost complete absence of explicit, substantive discussion of applicants' race, gender, or socioeconomic status, but which nonetheless perpetuates inequities. For example, there is a reluctance to take on students from underrepresented backgrounds whose profiles suggest that they would benefit from more intensive mentoring.

This concludes the first part of the chapter. Now let us turn to algorithmic systems. The background we've built up so far will prove useful. In fact, the traditional tests of discrimination are just as applicable to algorithmic systems. But we will also encounter many novel issues.

Part 2: Testing discrimination in algorithmic systems

An early example of discrimination in an algorithmic system is from the 1950s. In the United States, applicants for medical residency programs provide a ranked list of their preferred hospital programs to a centralized system, and hospitals likewise rank applicants. A matching algorithm takes these preferences as input and produces an assignment of applicants to hospitals that optimizes mutual desirability. (Specifically, it satisfies the requirement that if applicant A is not matched to hospital H, then either A matched to a hospital that they ranked higher than H, or H matched to a set of applicants all of whom it ranked higher than A.)

Early versions of the system discriminated against couples who wished to stay geographically close, because couples could not accurately express their joint preferences: for example, each partner might prefer a hospital over all others, but only if the other partner also matched to the same hospital (Roth, "The Origins, History, and Design of the Resident Match," JAMA 289, no. 7 (2003): 909–12; Friedman and Nissenbaum, "Bias in Computer Systems"). This is a non-comparative notion of discrimination: the system does injustice to an applicant (or a couple) when it does not allow them to express their preferences, regardless of how other applicants are treated. Note that none of the tests for fairness that we have discussed are capable of detecting this instance of discrimination, as it arises because of dependencies between pairs of units, which is not something we have modeled.

There was a crude attempt in the residency matching system to capture joint preferences, involving designating one partner in each couple as the "leading member"; the algorithm would match the leading member without constraints and then match the other member to a proximate hospital if possible. Given the prevailing gender norms at that time, it is likely that this method had a further discriminatory impact on women in heterosexual couples.

Despite these early examples, it was in the 2010s that testing unfairness in real-world algorithmic systems became a pressing concern and a distinct area of research. (A 2014 paper issued a call to action towards this type of research; most of the studies that we cite postdate that piece. See Sandvig et al., "Auditing Algorithms: Research Methods for Detecting Discrimination on Internet Platforms," ICA Pre-Conference on Data and Discrimination, 2014.)

This work has much in common with the social science research that we reviewed, but the targets of research have expanded considerably. In the rest of this chapter, we will review and attempt to systematize the research methods in several areas of algorithmic decision making: various applications of natural language processing and computer vision; ad targeting platforms; search and information retrieval tools; and online markets (ride hailing, vacation rentals, etc.). Much of this research has focused on drawing attention to the discriminatory effects of specific, widely-used tools and platforms at specific points in time. While that is a valuable goal, we will aim to highlight broader, generalizable themes in our review. We will close the chapter by identifying common principles and methods behind this body of research.

Fairness considerations in applications of natural language processing

One of the most central tasks in NLP is language identification: determining the language that a given text is written in. It is a precursor to virtually any other NLP operation on the text, such as translation to the user's preferred language on social media platforms. It is considered a more-or-less solved problem, with relatively simple models based on n-grams of characters achieving high accuracies on standard benchmarks, even for short texts that are a few words long.

However, a 2016 study showed that a widely used tool, langid.py, which incorporates a pre-trained model, had substantially more false negatives for tweets written in African-American English (AAE) than for tweets written in more common dialectal forms: 13.2% of AAE tweets were classified as non-English, compared to 7.6% of "White-aligned" English tweets. AAE is a set of English dialects commonly spoken by Black people in the United States (of course, there is no implication that all Black people in the United States primarily speak AAE or even speak it at all). For a treatise on AAE, see Green, African American English: A Linguistic Introduction (Cambridge University Press, 2002); the linguistic study of AAE highlights the complexity and internal consistency of its grammar, vocabulary, and other distinctive features, and refutes the basis of prejudiced views of AAE as inferior to standard English. The authors' construction of the AAE and White-aligned corpora themselves involved machine learning, as well as validation based on linguistic expertise; we will defer a full discussion to the Measurement chapter. The observed error rate disparity is likely a classic case of underrepresentation in the training data.
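A rough sketch of this kind of measurement appears below. It assumes the langid package's classify function and two hypothetical files of tweets, one per corpus; it is not the original study's code, and the hard part, constructing dialect-aligned corpora in the first place, is not shown.

```python
# Fraction of tweets in each (hypothetical) corpus that a language identifier
# labels as non-English. langid.classify returns a (language, score) pair.
import langid

def non_english_rate(path: str) -> float:
    with open(path, encoding="utf-8") as f:
        tweets = [line.strip() for line in f if line.strip()]
    misses = sum(1 for tweet in tweets if langid.classify(tweet)[0] != "en")
    return misses / len(tweets)

for corpus in ["aae_tweets.txt", "white_aligned_tweets.txt"]:
    print(corpus, round(non_english_rate(corpus), 3))
```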

Unlike the audit studies of car sales or labor markets discussed earlier, here it is not necessary (or justifiable) to control for any features of the texts, such as the level of formality. While it may certainly be possible to "explain" disparate error rates based on such features, that is irrelevant to the questions of interest in this context, such as whether NLP tools will perform less well for one group of users compared to another.

NLP tools range in their application from aids to online interaction to components of decisions with major career consequences. In particular, NLP is used in predictive tools for screening of resumes in the hiring process. There is some evidence of potential discriminatory impacts of such tools, both from employers themselves (Dastin, "Amazon Scraps Secret AI Recruiting Tool That Showed Bias Against Women," Reuters, 2018) and from applicants (Buranyi, "How to Persuade a Robot That You Should Get the Job," Guardian, 2018), but it is limited to anecdotes. There is also evidence from lab experiments on the task of predicting occupation from online biographies (De-Arteaga et al., "Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting," in Proceedings of the Conference on Fairness, Accountability, and Transparency (ACM, 2019), 120–28).

We briefly survey other findings. Automated essay grading software tends to assign systematically lower scores to some demographic groups (Ramineni and Williamson, "Understanding Mean Score Differences Between the e-rater Automated Scoring Engine and Humans for Demographically Based Groups in the GRE General Test," ETS Research Report Series 2018, no. 1 (2018): 1–31) compared to human graders, who may themselves provide biased ratings (Amorim, Cançado, and Veloso, "Automated Essay Scoring in the Presence of Biased Ratings," in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 2018, 229–37). Hate speech detection models use markers of dialect as predictors of toxicity, according to a lab study, resulting in discrimination against minority speakers (Sap et al., "The Risk of Racial Bias in Hate Speech Detection," in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 2019, 1668–78). Many sentiment analysis tools assign systematically different scores to text based on race-aligned or gender-aligned names of people mentioned in the text (Kiritchenko and Mohammad, "Examining Gender and Race Bias in Two Hundred Sentiment Analysis Systems," arXiv preprint arXiv:1805.04508, 2018). Speech-to-text systems perform worse for speakers with certain accents (Tatman, "Gender and Dialect Bias in YouTube's Automatic Captions," in Proceedings of the First ACL Workshop on Ethics in Natural Language Processing (Valencia, Spain: Association for Computational Linguistics, 2017), 53–59, https://doi.org/10.18653/v1/W17-1606). In all these cases, the author or speaker of the text is potentially harmed. In other NLP systems, i.e., those involving natural language generation or translation, there is a different type of fairness concern, namely the generation of text reflecting cultural prejudices, resulting in representational harm to a group of people (Solaiman et al., "Release Strategies and the Social Impacts of Language Models," arXiv preprint arXiv:1908.09203, 2019). The table below summarizes this discussion.

There is a line of research on cultural stereotypes reflected in word embeddings. Word embeddings are representations of linguistic units; they do not correspond to any linguistic or decision-making task. As such, lacking any notion of ground truth or harms to people, it is not meaningful to ask fairness questions about word embeddings without reference to specific downstream tasks in which they might be used. More generally, it is meaningless to ascribe fairness as an attribute of models as opposed to actions, outputs, or decision processes.

Table 13: Four types of NLP tasks and the types of unfairness that can result. Note that the traditional tests discussed in Part 1 operate in the context of predicting outcomes (row 3 in this table).

Each row lists: type of task; examples; sources of disparity; harm.

1. Perception. Examples: language identification, speech-to-text. Sources of disparity: underrepresentation in training corpus. Harm: degraded service.
2. Automating judgment. Examples: toxicity detection, essay grading. Sources of disparity: human labels, underrepresentation in training corpus. Harm: adverse decisions.
3. Predicting outcomes. Example: resume filtering. Sources of disparity: various, including human labels. Harm: adverse decisions.
4. Sequence prediction. Examples: language generation, translation. Sources of disparity: cultural stereotypes, historical prejudices. Harm: representational harm.

Demographic disparities and questionable applications of computer vision

Like NLP, computer vision technology has made major headway in the 2010s due to the availability of large-scale training corpora and improvements in hardware for training neural networks. Today, many types of classifiers are used in commercial products to analyze images and videos of people. Unsurprisingly, they often exhibit disparities in performance based on gender, race, skin tone, and other attributes, as well as deeper ethical problems.

A prominent demonstration of error rate disparity comes from an analysis of three commercial tools designed to classify a person's gender as female or male based on an image, developed by Microsoft, IBM, and Face++ respectively (Buolamwini and Gebru, "Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification," in Conference on Fairness, Accountability and Transparency, 2018, 77–91). The study found that all three classifiers perform better on male faces than female faces (8.1%–20.6% difference in error rate).

Further, all perform better on lighter faces than darker faces (11.8%–19.2% difference in error rate), and worst on darker female faces (20.8%–34.7% error rate). Finally, since all classifiers treat gender as binary, the error rate for people of nonbinary gender can be considered to be 100%.

If we treat the classifier's target variable as gender and the sensitive attribute as skin tone, we can decompose the observed disparities into two separate issues. First, female faces are classified as male more often than male faces are classified as female. This can be addressed relatively easily by recalibrating the classification threshold, without changing the training process. The second and deeper issue is that darker faces are misclassified more often than lighter faces.

Image classification tools have found it particularly challenging to achieve geographic equity due to the skew in training datasets. A 2019 study evaluated five popular object recognition services on images of household objects from 54 countries (Vries et al., "Does Object Recognition Work for Everyone?" in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2019, 52–59). It found significant accuracy disparities between countries, with images from lower-income countries being less accurately classified. The authors point out that household objects such as dish soap or spice containers tend to look very different in different countries. These issues are exacerbated when images of people are being classified. A 2017 analysis found that models trained on ImageNet and Open Images, two prominent datasets for object recognition, performed dramatically worse at recognizing images of bridegrooms from countries such as Pakistan and India compared to those from North American and European countries; the former were often classified as chain mail, a type of armor (Shankar et al., "No Classification Without Representation: Assessing Geodiversity Issues in Open Data Sets for the Developing World," in NIPS 2017 Workshop: Machine Learning for the Developing World, 2017).

Several other types of unfairness are known through anecdotal evidence in image classification and face recognition systems. At least two different image classification systems are known to have applied demeaning and insulting labels to photos of people (Simonite, "When It Comes to Gorillas, Google Photos Remains Blind," Wired, January 13, 2018; Hern, "Flickr Faces Complaints over 'Offensive' Auto-Tagging for Photos," The Guardian, 2015). Face recognition systems have been anecdotally reported to exhibit the cross-race effect, wherein they are more likely to confuse faces of two people who are from a racial group that is underrepresented in the training data (Martineau, "Cities Examine Proper—and Improper—Uses of Facial Recognition," Wired, 2019, https://www.wired.com/story/cities-examine-proper-improper-facial-recognition/). This possibility was shown in a simple linear model of face recognition as early as 1991 (O'Toole et al., "Simulating the 'Other-Race Effect' as a Problem in Perceptual Learning," Connection Science 3, no. 2 (1991): 163–78). Many commercial products have had difficulty detecting faces of darker-skinned people (McEntegart, "Kinect May Have Issues with Dark-Skinned Users | Tom's Guide," https://www.tomsguide.com/us/Microsoft-Kinect-Dark-Skin-Facial-Recognition,news-8638.html, 2010). Similar results are known from lab studies of publicly available object detection models (Wilson, Hoffman, and Morgenstern, "Predictive Inequity in Object Detection," arXiv preprint arXiv:1902.11097, 2019).

More broadly, computer vision techniques seem to be particularly prone to use in ways that are fundamentally ethically questionable regardless of accuracy.

Consider gender classification: while Microsoft, IBM, and Face++ have worked to mitigate the accuracy disparities discussed above, a more important question is why build a gender classification tool in the first place. By far the most common application appears to be displaying targeted advertisements based on inferred gender (and many other inferred characteristics, including age, race, and current mood) in public spaces such as stores or screens in the back seats of taxis. We won't recap the objections to targeted advertising here, but it is an extensively discussed topic, and the practice is strongly opposed by the public, at least in the United States (Turow et al., "Americans Reject Tailored Advertising and Three Activities That Enable It," Available at SSRN 1478214, 2009).

Morally dubious computer vision technology goes well beyond this example, and includes apps that "beautify" images of users' faces, i.e., edit them to better conform to notions of attractiveness; emotion recognition, which has been alleged to be a pseudoscience; and the analysis of video footage for cues such as body language for screening job applicants (Raghavan et al., "Mitigating Bias in Algorithmic Employment Screening: Evaluating Claims and Practices," arXiv preprint arXiv:1906.09208, 2019).

Search and recommendation systems: three types of harms

Search engines, social media platforms, and recommendation systems have different goals and underlying algorithms, but they do have many things in common from a fairness perspective. They are not decision systems and don't provide or deny people opportunities, at least not directly. Instead, there are (at least) three types of disparities and attendant harms that may arise in these systems. First, they may serve the informational needs of some consumers (searchers or users) better than others. Second, they may create inequities among producers (content creators) by privileging certain content over others. Third, they may create representational harms by amplifying and perpetuating cultural stereotypes. There is a plethora of other ethical concerns about information platforms, such as the potential to contribute to the political polarization of society. However, we will limit our attention to harms that can be considered to be forms of discrimination.

Unfairness to consumers. An illustration of unfairness to consumers comes from a study of collaborative filtering recommender systems that used theoretical and simulation methods, rather than a field study of a deployed system (Yao and Huang, "Beyond Parity: Fairness Objectives for Collaborative Filtering," in Advances in Neural Information Processing Systems, 2017, 2921–30). Collaborative filtering is an approach to recommendations that is based on explicit or implicit feedback (ratings, for example) provided by other users of the system. The intuition behind it is seen in the "users who liked this item also liked . . ." feature on many services. The study found that such systems can underperform for minority groups, in the sense of being worse at recommending content that those users would like. A related but distinct reason for underperformance occurs when users from one group are less observable, e.g., less likely to provide ratings.

In general, this type of unfairness is hard to study in real systems (not just by external researchers but also by system operators themselves). The main difficulty is accurately measuring the target variable. The relevant target construct from a fairness perspective is users’ satisfaction with the results, or how well the results served the users’ needs. Metrics such as clicks and ratings serve as crude proxies for the target, and are themselves subject to demographic measurement biases. Companies do expend significant resources on A/B testing or other experimental methods for optimizing search and recommendation systems, and frequently measure demographic differences as well. But to reiterate, such tests almost always emphasize metrics of interest to the firm rather than benefit or payoff for the user.

A rare attempt to transcend this limitation comes from an (internal) audit study of the Bing search engine (Mehrotra et al., “Auditing Search Engines for Differential Satisfaction Across Demographics,” in Proceedings of the 26th International Conference on World Wide Web Companion, 2017, 626–33). The authors devised methods to disentangle user satisfaction from other demographic-specific variation by controlling for the effects of demographic factors on behavioral metrics. They combined it with a method for inferring latent differences directly, instead of estimating user satisfaction for each demographic group and then comparing these estimates. This method infers which impression, among a randomly selected pair of impressions, led to greater user satisfaction. They did this using proxies for satisfaction such as reformulation rate; reformulating a search query is a strong indicator of dissatisfaction with the results. Based on these methods, they found no gender differences in satisfaction but mild age differences.

Unfairness to producers. In 2019, a group of content creators sued YouTube, alleging that YouTube’s algorithms as well as human moderators suppressed the reach of LGBT-focused videos and the ability to earn ad revenue from them. This is a distinct type of issue from that discussed above, as the claim is about a harm to producers rather than consumers (although, of course, YouTube viewers interested in LGBT content are also presumably harmed). There are many other ongoing allegations and controversies that fall into this category: partisan bias in search results and social media platforms, search engines favoring results from their own properties over competitors, fact-checking of online political ads, and inadequate (or, conversely, over-aggressive) policing of purported copyright violations. It is difficult to meaningfully discuss and address these issues through the lens of fairness and discrimination rather than a broader perspective of power and accountability.

The core issue is that when information platforms have control over public discourse, they become the arbiters of conflicts between competing interests and viewpoints. From a legal perspective, these issues fall primarily under antitrust law and telecommunication regulation rather than antidiscrimination law. (For in-depth treatments of the history and politics of information platforms, see Wu, The Master Switch: The Rise and Fall of Information Empires (Vintage, 2010); Gillespie, “The Politics of ‘Platforms’,” New Media & Society 12, no. 3 (2010): 347–64; Gillespie, Custodians of the Internet: Platforms, Content Moderation, and the Hidden Decisions That Shape Social Media (Yale University Press, 2018); Klonick, “The New Governors: The People, Rules, and Processes Governing Online Speech,” Harv. L. Rev. 131 (2017): 1598.)

Representational harms. The book Algorithms of Oppression drew attention to the ways in which search engines reinforce harmful racial, gender, and intersectional stereotypes (Noble, Algorithms of Oppression). There have also been quantitative studies of some aspects of these harms. In keeping with our quantitative focus, let’s discuss a study that measured how well the gender skew in Google image search results for 45 occupations (author, bartender, construction worker . . . ) corresponded to the real-world gender skew of the respective occupations (Kay, Matuszek, and Munson, “Unequal Representation and Gender Stereotypes in Image Search Results for Occupations”). This can be seen as a test for calibration: instances are occupations, and the fraction of women in the search results is viewed as a predictor of the fraction of women in the occupation in the real world. The study found weak evidence of stereotype exaggeration, that is, imbalances in occupational statistics are exaggerated in image search results. However, the deviations were minor.

Consider a thought experiment: suppose the study had found no evidence of miscalibration. Is the resulting system fair? It would be simplistic to answer in the affirmative, for at least two reasons. First, the study tested calibration between image search results and occupational statistics in the United States. Gender stereotypes of occupations, as well as occupational statistics, differ substantially between countries and cultures. Second, accurately reflecting real-world statistics may still constitute a representational harm when those statistics are skewed and themselves reflect a history of prejudice. Such a system contributes to the lack of visible role models for underrepresented groups. To what extent information platforms should bear responsibility for minimizing these imbalances, and what types of interventions are justified, remain matters of debate.
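To make the idea of such a calibration test concrete, here is a minimal sketch on synthetic data; it is not the procedure or data of the study just discussed. It treats the fraction of women in the search results as a predictor of the real-world fraction and checks, on the log-odds scale, whether imbalances are amplified (a fitted slope above 1). All numbers below are invented for illustration.

```python
# Sketch of a stereotype-exaggeration check on synthetic occupation data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n_occupations = 45
real_frac_women = rng.uniform(0.05, 0.95, n_occupations)   # stand-in for labor statistics

def logit(p): return np.log(p / (1 - p))
def expit(x): return 1 / (1 + np.exp(-x))

# Hypothetical search results that exaggerate the real skew (slope 1.4 on the
# logit scale), plus sampling noise from counting 100 images per occupation.
skewed = expit(1.4 * logit(real_frac_women))
search_frac_women = rng.binomial(100, skewed) / 100

# Perfect calibration corresponds to intercept 0 and slope 1 on the logit scale;
# a slope above 1 means imbalances are amplified in the results.
eps = 1e-3
X = sm.add_constant(logit(real_frac_women))
y = logit(np.clip(search_frac_women, eps, 1 - eps))
fit = sm.OLS(y, X).fit()
print(fit.params)      # [intercept, slope]; slope > 1 here by construction
print(fit.conf_int())  # confidence intervals for the exaggeration test
```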

Understanding unfairness in ad targeting

Ads have long been targeted in relatively crude ways. For example, a health magazine might have ads for beauty products, exploiting a coarse correlation. In contrast to previous methods, online targeting offers several key advantages to advertisers: granular data collection about individuals, the ability to reach niche audiences (in theory, the audience size can be one, since ad content can be programmatically generated and customized with user attributes as inputs), and the ability to measure conversion (conversion is when someone who views the ad clicks on it, and then takes another action such as a purchase). To date, ad targeting has been one of the most commercially impactful applications of machine learning.

The complexity of modern ad targeting results in many avenues for disparities in the demographics of ad views, which we will study. But it is not obvious how to connect these disparities to fairness. After all, many types of demographic targeting, such as targeting clothing ads by gender, are considered innocuous. There are two frameworks for understanding potential harms from ad targeting.

The first framework sees ads as unlocking opportunities for their recipients, because they provide information that the viewer might not have. This is why targeting employment or housing ads based on protected categories may be unfair and unlawful. The domains where targeting is legally prohibited broadly correspond to those which impact civil rights, and reflect the complex histories of discrimination in those domains, as discussed in Chapter 3.

The second framework views ads as tools of persuasion rather than information dissemination. In this framework, harms arise from ads being manipulative — that is, exerting covert influence instead of making forthright appeals (Susser, Roessler, and Nissenbaum, “Online Manipulation: Hidden Influences in a Digital World,” available at SSRN 3306006, 2018) — or exploiting stereotypes. Users are harmed by being targeted with ads that provide them negative utility, as opposed to the first framework, in which the harm comes from missing out on ads with positive utility. The two frameworks don’t necessarily contradict each other. Rather, individual ads or ad campaigns can be seen as either primarily informational or primarily persuasive, and accordingly, one or the other framework might be appropriate for analysis. (The economic analysis of advertising includes a third category, complementary, that is related to the persuasive or manipulative category; Bagwell, “The Economic Analysis of Advertising,” Handbook of Industrial Organization 3 (2007): 1701–844.)

There is a vast literature on how race and gender are portrayed in ads, which we consider to fall under the persuasion framework (see, e.g., Coltrane and Messineo, “The Perpetuation of Subtle Prejudice: Race and Gender Imagery in 1990s Television Advertising,” Sex Roles 42, no. 5–6 (2000): 363–89). However, this line of inquiry has yet to turn its attention to online targeted advertising, which has the potential to accentuate the harms of manipulation and stereotyping by targeting specific people and groups. Thus, the empirical research that we will highlight falls under the informational framework.

There are roughly three mechanisms by which the same targeted ad may reach one group more often than another. The most obvious is the use of explicit targeting criteria by advertisers: either the sensitive attribute itself or a proxy for it (such as ZIP code as a proxy for race). For example, Facebook allows thousands of targeting categories, including categories that are automatically constructed by the system based on users’ free-form text descriptions of their interests. These categories were found to include “Jew haters” and many other antisemitic terms (Angwin, Varner, and Tobin, “Facebook Enabled Advertisers to Reach ‘Jew Haters’,” ProPublica, https://www.propublica.org/article/facebook-enabled-advertisers-to-reach-jew-haters, 2017). The company has had difficulty eliminating even direct proxies for sensitive categories, resulting in repeated exposés.

The second disparity-producing mechanism is optimization of click rate (or another measure of effectiveness), which is one of the core goals of algorithmic targeting. Unlike the first category, this does not require explicit intent by the advertiser or the platform. The algorithmic system may predict a user’s probability of engaging with an ad based on her past behavior, her expressed interests, and other factors (including, potentially, explicitly expressed sensitive attributes).

The third mechanism is market effects: delivering an ad to different users may cost the advertiser different amounts. For example, some researchers have observed that women cost more to advertise to than men and hypothesized that this is because women clicked on ads more often, leading to a higher measure of effectiveness (Ali et al., “Discrimination Through Optimization: How Facebook’s Ad Delivery Can Lead to Biased Outcomes,” Proceedings of the ACM on Human-Computer Interaction 3, no. CSCW (2019): 199; Lambrecht and Tucker, “Algorithmic Bias? An Empirical Study of Apparent Gender-Based Discrimination in the Display of STEM Career Ads,” Management Science, 2019). Thus, if the advertiser simply specifies a total budget and leaves the delivery up to the platform (which is a common practice), then the audience composition will vary depending on the budget: smaller budgets will result in the less expensive group being overrepresented.

In terms of methods to detect these disparities, researchers and journalists have used broadly two approaches: interacting with the system either as a user or as an advertiser. Tschantz et al. created simulated users that had the “gender” attribute in Google’s Ad Settings page set to female or male, and found that Google showed the simulated male users ads from a certain career coaching agency that promised large salaries more frequently than the simulated female users (Datta, Tschantz, and Datta, “Automated Experiments on Ad Privacy Settings”). While this type of study establishes that employment ads through Google’s ad system are not blind to gender (as expressed in the ad settings page), it cannot uncover the mechanism, i.e., distinguish between explicit targeting by the advertiser and platform effects of various kinds.

Interacting with ad platforms as an advertiser has proved to be a more fruitful approach so far, especially for analyzing Facebook’s advertising system. This is because Facebook exposes vastly more details about its advertising system to advertisers than to users. In fact, it allows advertisers to learn more of the information it has inferred or purchased about a user than it will allow the user himself to access (Andreou et al., “AdAnalyst,” https://adanalyst.mpi-sws.org/, 2017). The existence of antisemitic auto-generated targeting categories, mentioned above, was uncovered using the advertiser interface. Ad delivery on Facebook has been found to introduce demographic disparities due to both market effects and effectiveness optimization effects (Ali et al., “Discrimination Through Optimization”). To reiterate, this means that even if the advertiser does not explicitly target an ad by (say) gender, there may be a systematic gender skew in the ad’s audience. The optimization effects are enabled by Facebook’s analysis of the contents of ads. Interestingly, this includes image analysis, which researchers revealed using the clever technique of serving ads with transparent content that is invisible to humans but nonetheless had an effect on ad delivery (Ali et al.).
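A toy model can build intuition for the market-effects mechanism described above. It is purely illustrative, not a description of any platform’s actual delivery system: the per-impression costs are made up, and the only assumption is that the platform spends a fixed budget on the cheapest available impressions first.

```python
# Toy model: one group is more expensive to reach, so small budgets skew
# the delivered audience toward the cheaper group.
import numpy as np

rng = np.random.default_rng(2)
n = 10_000
group = rng.choice(["A", "B"], size=n)                       # eligible users, 50/50
cost = np.where(group == "A",
                rng.normal(0.12, 0.02, n),                   # group A: pricier impressions
                rng.normal(0.08, 0.02, n))                   # group B: cheaper impressions
cost = np.clip(cost, 0.01, None)

def audience_share(budget):
    """Deliver to users in order of increasing cost until the budget runs out."""
    order = np.argsort(cost)
    reached = order[np.cumsum(cost[order]) <= budget]
    return (group[reached] == "A").mean(), len(reached)

for budget in [100, 400, 800]:
    share_a, reach = audience_share(budget)
    print(f"budget {budget}: reached {reach} users, {share_a:.0%} from group A")
# With a small budget nearly all impressions go to the cheaper group; only as
# the budget grows does the audience approach the 50/50 eligible pool.
```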

Fairness considerations in the design of online marketplaces

Online platforms for ride hailing, short-term housing, and freelance (gig) work have risen to prominence in the 2010s: notable examples are Uber, Lyft, Airbnb, and TaskRabbit. They are important targets for the study of fairness because they directly impact people’s livelihoods and opportunities. We will set aside some types of markets from our discussion. Online dating apps share some similarities with these markets, but they require an entirely separate analysis because the norms governing romance are different from those governing commerce and employment (Hutson et al., “Debiasing Desire: Addressing Bias & Discrimination on Intimate Platforms,” Proceedings of the ACM on Human-Computer Interaction 2, no. CSCW (2018): 73). Then there are marketplaces for goods such as Amazon and eBay. In these markets the characteristics of the participants are less salient than the attributes of the product, so discrimination is less of a concern (this is not to say that discrimination is nonexistent; see, e.g., Ayres, Banaji, and Jolls, “Race Effects on eBay,” The RAND Journal of Economics 46, no. 4 (2015): 891–917).

Unlike the domains studied so far, machine learning is not a core component of the algorithms in online marketplaces. (Nonetheless, we consider it in scope because of our broad interest in decision making and fairness, rather than just machine learning.) Therefore fairness concerns are less about training data or algorithms; the far more serious issue is discrimination by buyers and sellers. For example, one study found that Uber drivers turned off the app in areas where they did not want to pick up passengers (Lee et al., “Working with Machines: The Impact of Algorithmic and Data-Driven Management on Human Workers,” in Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (ACM, 2015), 1603–12).

Methods to detect discrimination in online marketplaces are fairly similar to those used in traditional settings such as housing and employment; a combination of audit studies and observational methods has been used. A notable example is a field experiment targeting Airbnb (Edelman, Luca, and Svirsky, “Racial Discrimination in the Sharing Economy: Evidence from a Field Experiment,” American Economic Journal: Applied Economics 9, no. 2 (2017): 1–22). The authors created fake guest accounts whose names signaled race (African-American or White) and gender (female or male), but were otherwise identical. Twenty different names were used: five in each combination of race and gender. They then contacted the hosts of 6,400 listings in five cities through these accounts to inquire about availability. They found a 50% probability of acceptance of inquiries from guests with White-sounding names, compared to 42% for guests with African-American-sounding names. The effect was persistent regardless of the host’s race, gender, and experience on the platform, as well as listing type (high or low priced; entire property or shared) and diversity of the neighborhood. Note that the accounts did not have profile pictures; if inference of race by hosts happens in part based on appearance, a study design that varied the accounts’ profile pictures might find a greater effect.
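As a back-of-the-envelope illustration of how one might check that a gap of this size is not sampling noise, the sketch below runs a two-proportion z-test. It assumes, purely for illustration, that the roughly 6,400 inquiries were split evenly between the two name groups; the paper itself reports its own estimates and standard errors.

```python
# Illustrative significance check for an 8-percentage-point acceptance gap,
# under an assumed even split of 6,400 inquiries across the two name groups.
from statsmodels.stats.proportion import proportions_ztest

n_per_group = 3200
accept_white = int(0.50 * n_per_group)   # accepted inquiries, White-sounding names
accept_black = int(0.42 * n_per_group)   # accepted inquiries, African-American-sounding names

stat, pvalue = proportions_ztest(
    count=[accept_white, accept_black],
    nobs=[n_per_group, n_per_group],
)
print(f"z = {stat:.2f}, p = {pvalue:.2g}")
# With samples of this size, an 8-point gap is far too large to attribute to chance.
```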
Compared to traditional settings, some types of observational data are readily available on online platforms, which can be useful to the researcher. In the above study, the public availability of reviews of listed properties proved useful. It was not essential to the design of the study, but it allowed an interesting validity check. When the analysis was restricted to the 29% of hosts in the sample who had received at least one review from an African-American guest, the racial disparity in responses declined sharply. If the study’s findings were a result of a quirk of the experimental design, rather than actual racial discrimination by Airbnb hosts, it would be difficult to explain why the effect would disappear for this subset of hosts. This supports the study’s external validity.

In addition to discrimination by participants, another fairness issue that many online marketplaces must contend with is geographic differences in effectiveness. One study of TaskRabbit and Uber found that neighborhoods with high population density and high-income neighborhoods receive the largest benefits from the sharing economy (Thebault-Spieker, Terveen, and Hecht, “Toward a Geographic Understanding of the Sharing Economy: Systemic Biases in UberX and TaskRabbit,” ACM Transactions on Computer-Human Interaction (TOCHI) 24, no. 3 (2017): 21). Due to the pervasive correlation between poverty and race/ethnicity, these also translate into racial disparities.

Of course, geographic and structural disparities in these markets are not caused by online platforms, and no doubt exist in offline analogs such as word-of-mouth gig work. In fact, the magnitude of racial discrimination is much larger in scenarios such as hailing taxis on the street compared to technologically mediated interactions (Ge et al., “Racial and Gender Discrimination in Transportation Network Companies,” National Bureau of Economic Research, 2016). However, in comparison to markets regulated by antidiscrimination law, such as hotels, discrimination in online markets is more severe. In any case, the formalized nature of online platforms makes audits easier. As well, the centralized nature of these platforms is a powerful opportunity for fairness interventions.

There are many ways in which platforms can use design to minimize users’ ability to discriminate (such as by withholding information about counterparties) and the impetus to discriminate (such as by making participant characteristics less salient compared to product characteristics in the interface) (Levy and Barocas, “Designing Against Discrimination in Online Markets,” Berkeley Tech. LJ 32 (2017): 1183). There is no way for platforms to take a neutral stance towards discrimination by participants: even choices made without explicit regard for discrimination can affect how vulnerable users are to bias.

As a concrete example, the authors of the Airbnb study recommend that the platform withhold guest information from hosts prior to booking. (Note that ride hailing services do withhold customer information. Carpooling services, on the other hand, allow users to view names when selecting matches; unsurprisingly, this enables discrimination against ethnic minorities; Tjaden, Schwemmer, and Khadjavi, “Ride with Me—Ethnic Discrimination, Social Markets, and the Sharing Economy,” European Sociological Review 34, no. 4 (2018): 418–32.) The authors of the study on geographic inequalities suggest, among other interventions, that ride hailing services provide a “geographic reputation” score to drivers to combat the fact that drivers often incorrectly perceive neighborhoods to be more dangerous than they are.

Mechanisms of discrimination

We’ve looked at a number of studies on detecting unfairness in algorithmic systems. Let’s take stock. In the introductory chapter we discussed, at a high level, different ways in which unfairness could arise in machine learning systems. Here, we see that the specific sources and mechanisms of unfairness can be intricate and domain-specific. Researchers need an understanding of the domain to effectively formulate and test hypotheses about sources and mechanisms of unfairness.

For example, consider the study of gender classification systems discussed above. It is easy to guess that unrepresentative training datasets contributed to the observed accuracy disparities, but unrepresentative in what way? A follow-up paper considered this question (Muthukumar et al., “Understanding Unequal Gender Classification Accuracy from Face Images,” arXiv preprint arXiv:1812.00099, 2018). It analyzed several state-of-the-art gender classifiers (in a lab setting, as opposed to the field tests of commercial APIs in the original paper) and argued that underrepresentation of darker skin tones in the training data is not a reason for the observed disparity. Instead, one mechanism suggested by the authors is based on the fact that many training datasets of human faces comprise photos of celebrities. (This overrepresentation is because photos of celebrities are easier to gather publicly, and celebrities are thought to have weakened privacy rights due to the competing public interest in their activities. However, for a counterpoint, see Harvey and LaPlace, “MegaPixels: Origins, Ethics, and Privacy Implications of Publicly Available Face Recognition Image Datasets,” 2019, https://megapixels.cc/.) They found that photos of female celebrities have more prominent makeup compared to photos of women in general. This led to classifiers using makeup as a proxy for gender in a way that didn’t generalize to the rest of the population.

Slightly different hypotheses can produce vastly different conclusions, especially in the presence of complex interactions between content producers, consumers, and platforms. For example, one study tested claims of partisan bias by search engines, as well as related claims that search engines return results that reinforce searchers’ existing views (the “filter bubble” hypothesis) (Robertson et al., “Auditing Partisan Audience Bias Within Google Search,” Proceedings of the ACM on Human-Computer Interaction 2, no. CSCW (2018): 148). The researchers recruited participants with different political views, and collected Google search results on a political topic in both standard and incognito windows from those participants’ computers.

They found that standard (personalized) search results were no more partisan than incognito (non-personalized) ones, seemingly finding evidence against the claim that online search reinforces users’ existing beliefs. This finding is consistent with the fact that Google doesn’t personalize search results except based on searcher location and the immediate (10-minute) history of searches. This is known based on Google’s own admission and prior research (D’Onfro, “Google Tests Changes to Its Search Algorithm; How Search Works,” https://www.cnbc.com/2018/09/17/google-tests-changes-to-its-search-algorithm-how-search-works.html, 2019; Hannak et al., “Measuring Personalization of Web Search,” in Proceedings of the 22nd International Conference on World Wide Web (ACM, 2013), 527–38).

However, a more plausible hypothesis for the filter bubble effect in search comes from a qualitative study (Tripodi, “Searching for Alternative Facts: Analyzing Scriptural Inference in Conservative News Practices,” Data & Society, 2018).

Simplified somewhat for our purposes, it goes as follows: when an event with political significance unfolds, key influencers (politicians, partisan news outlets, interest groups, political message boards) quickly craft their own narratives of the event. Those narratives selectively reach their respective partisan audiences through partisan information networks. Those people then turn to search engines to learn more or to “verify the facts.” Crucially, however, they use different search terms to refer to the same event, reflecting the different narratives to which they have been exposed. (For example, in 2017, US president Donald Trump called for the National Football League to fire players who engaged in a much-publicized political protest during games. Opposing narratives of this event were that NFL viewership had declined due to fans protesting players’ actions, or that it had increased despite the protests. Search terms reflecting these views might be “NFL ratings down” versus “NFL ratings up.”) The results for these different search terms are often starkly different, because the producers of news and commentary selectively and strategically cater to partisans using these same narratives. Thus, searchers’ beliefs are reinforced. Note that this filter-bubble-producing mechanism operates effectively even though the search algorithm itself is arguably neutral. (But see Golebiewski and Boyd, “Data Voids: Where Missing Data Can Easily Be Exploited,” Data & Society 29 (2018), “Data Void Type #4: Fragmented Concepts,” for an argument that search engines’ decision not to collapse related concepts contributes to this fragmentation.)

A final example to reinforce the fact that disparity-producing mechanisms can be subtle and that domain expertise is required to formulate the right hypothesis: an investigation by journalists found that staples.com showed discounted prices to individuals in some ZIP codes; these ZIP codes were, on average, wealthier (Valentino-Devries, Singer-Vine, and Soltani, “Websites Vary Prices, Deals Based on Users’ Information,” Wall Street Journal 10 (2012): 60–68). However, the actual pricing rule that explained most of the variation, as they reported, was that if there was a competitor’s physical store located within 20 miles or so of the customer’s inferred location, then the customer would see a discount! Presumably this strategy is intended to infer the customer’s reservation price or willingness to pay. Incidentally, this is a similar kind of “statistical discrimination” as seen in the car sales discrimination study discussed at the beginning of this chapter.

Fairness criteria in algorithmic audits

While the mechanisms of unfairness are different in algorithmic systems, the applicable fairness criteria are the same for algorithmic decision making as for other kinds of decision making. That said, some fairness notions are more often relevant, and others less so, in algorithmic decision making compared to human decision making. We offer a few selected observations on this point.

Fairness as blindness is seen less often in audit studies of algorithmic systems; such systems are generally designed to be blind to sensitive attributes. Besides, fairness concerns often arise precisely from the fact that blindness is generally not an effective fairness intervention in machine learning. Two exceptions are ad targeting and online marketplaces (where the non-blind decisions are in fact being made by users and not the platform).

Unfairness as arbitrariness. There are roughly two senses in which decision making could be considered arbitrary and hence unfair. The first is when decisions are made on a whim rather than through a uniform procedure. Since automated decision making results in procedural uniformity, this type of concern is generally not salient. The second sense of arbitrariness applies even when there is a uniform procedure, if that procedure relies on a consideration of factors that are thought to be irrelevant, either statistically or morally. Since machine learning excels at finding correlations, it commonly identifies factors that seem puzzling or blatantly unacceptable. For example, in aptitude tests such as the Graduate Record Examination, essays are graded automatically. Although e-rater and other tools used for this purpose are subject to validation checks, and are found to perform similarly to human raters on samples of actual essays, they can be fooled into giving perfect scores to machine-generated gibberish. Recall that there is no straightforward criterion that allows us to assess whether a feature is morally valid (Chapter 3), and this question must be debated on a case-by-case basis.

More serious issues arise when classifiers are not even subjected to proper validity checks. For example, there are a number of companies that claim to predict candidates’ suitability for jobs based on personality tests or body language and other characteristics in videos (Raghavan et al., “Mitigating Bias in Algorithmic Employment Screening”). There is no peer-reviewed evidence that job performance is predictable using these factors, and no basis for such a belief. Thus, even if these systems don’t produce demographic disparities, they are unfair in the sense of being arbitrary: candidates receiving an adverse decision lack due process to understand the basis for the decision, contest it, or determine how to improve their chances of success.

Observational fairness criteria, including demographic parity, error rate parity, and calibration, have received much attention in algorithmic fairness studies. Convenience has probably played a big role in this choice: these metrics are easy to gather and straightforward to report without necessarily connecting them to moral notions of fairness. We reiterate our caution about the overuse of parity-based notions; parity should rarely be made a goal by itself. At a minimum, it is important to understand the sources and mechanisms that produce disparities, as well as the harms that result from them, before deciding on appropriate interventions.

Representational harms. Traditionally, allocative and representational harms were studied in separate literatures, reflecting the fact that they are mostly seen in separate spheres of life (for instance, housing discrimination versus stereotypes in advertisements). Many algorithmic systems, on the other hand, are capable of generating both types of harms. A failure of face recognition for darker-skinned people is demeaning, but it could also prevent someone from being able to access a digital device or enter a building that uses biometric security.

Information flow, fairness, privacy

A notion called information flow is seen frequently in algorithmic audits. This criterion requires that sensitive information about subjects not flow from one information system to another, or from one part of a system to another. For example, a health website may promise that user activity, such as searches and clicks, is not shared with third parties such as insurance companies (since that may lead to potentially discriminatory effects on insurance premiums). Information flow can be seen as a generalization of blindness: whereas blindness is about not acting on available sensitive information, restraining information flow ensures that the sensitive information is not available to act upon in the first place.

There is a powerful method for testing violations of information flow constraints, which we will call the adversarial test (Datta, Tschantz, and Datta, “Automated Experiments on Ad Privacy Settings”). It does not directly detect information flow, but rather decisions that are made on the basis of that information. It is powerful because it does not require specifying a target variable, which minimizes the domain knowledge required of the researcher. To illustrate, let’s revisit the example of the health website. The adversarial test operates as follows:

1. Create two groups of simulated users (A and B), i.e., bots, that are identical except for the fact that users in group A, but not group B, browse the sensitive website in question.

2. Have both groups of users browse other websites that are thought to serve ads from insurance companies, or personalize content based on users’ interests, or somehow tailor content to users based on health information. This is the key point: the researcher does not need to hypothesize a mechanism by which potentially unfair outcomes result — e.g., which websites (or third parties) might receive sensitive data, or whether the personalization might take the form of ads, prices, or some other aspect of content.

3. Record the contents of the web pages seen by all users in the previous step.

4. Train a binary classifier to distinguish between web pages encountered by users in group A and those encountered by users in group B. Use cross-validation to measure its accuracy.

5. If the information flow constraint is satisfied (i.e., the health website did not share any user information with any third parties), then the websites browsed in step 2 are blind to user activities in step 1; thus the two groups of users look identical, and there is no way to systematically distinguish the content seen by group A from that seen by group B. The classifier’s test accuracy should not significantly exceed 1/2. The permutation test can be used to quantify the probability that the classifier’s observed accuracy (or better) could have arisen by chance if there is in fact no systematic difference between the two groups (Ojala and Garriga, “Permutation Tests for Studying Classifier Performance,” Journal of Machine Learning Research 11, no. Jun (2010): 1833–63). (A minimal code sketch of steps 4 and 5 appears at the end of this section.)

There are additional nuances relating to proper randomization and controls, for which we refer the reader to the study (Datta, Tschantz, and Datta, “Automated Experiments on Ad Privacy Settings”). Note that if the adversarial test fails to detect an effect, it does not mean that the information flow constraint is satisfied. Also note that the adversarial test is not capable of measuring an effect size. Such a measurement would be meaningless anyway, since the goal is to detect information flow, and any effect on observable behavior of the system is merely a proxy for it.

This view of information flow as a generalization of blindness reveals an important connection between privacy and fairness. Many studies based on this principle can be seen as either privacy or fairness investigations. For example, a study found that Facebook solicits phone numbers from users with the stated purpose of improving account security, but uses those numbers for ad targeting (Venkatadri et al., “Investigating Sources of PII Used in Facebook’s Targeted Advertising,” Proceedings on Privacy Enhancing Technologies 2019, no. 1 (2019): 227–44). This is an example of undisclosed information flow from one part of the system to another. Another study used ad retargeting — in which actions taken on one website, such as searching for a product, result in ads for that product on another website — to infer the exchange of user data between advertising companies (Bashir et al., “Tracing Information Flows Between Ad Exchanges Using Retargeted Ads,” in USENIX Security Symposium 16, 2016, 481–96). Neither study used the adversarial test.
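To make steps 4 and 5 of the adversarial test concrete, here is a minimal sketch. The page contents below are synthetic placeholders (in a real audit they would be the corpora collected in step 3), and the choice of a TF-IDF representation with logistic regression is ours for illustration; any reasonable text classifier could stand in.

```python
# Sketch of steps 4-5 of the adversarial test: can a classifier tell the two
# groups' pages apart better than chance?
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, permutation_test_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-ins for collected page text: here group A's pages happen to
# contain insurance-related ad copy, simulating a detectable information flow.
pages_a = [f"local news weather forecast insurance premium quote ad {i}" for i in range(100)]
pages_b = [f"local news weather forecast travel deal recipe ad {i}" for i in range(100)]
texts = pages_a + pages_b
labels = [1] * len(pages_a) + [0] * len(pages_b)

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

# Step 4: cross-validated accuracy of distinguishing the two groups' pages.
accuracy = cross_val_score(clf, texts, labels, cv=5, scoring="accuracy").mean()

# Step 5: permutation test -- how often does label-shuffled data do this well?
score, perm_scores, pvalue = permutation_test_score(
    clf, texts, labels, cv=5, n_permutations=200, scoring="accuracy"
)
print(f"accuracy = {accuracy:.2f}, permutation p-value = {pvalue:.3f}")
# Accuracy near 1/2 with a large p-value is consistent with no information flow;
# accuracy well above 1/2 with a small p-value indicates that downstream sites
# treat the two groups of users differently.
```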

Comparison of research methods

For auditing user fairness on online platforms, there are two main approaches: creating fake profiles and recruiting real users as testers. Each has its pros and cons. Both approaches have the advantage, compared to traditional audit studies, of allowing a potentially greater scale due to the ease of creating fake accounts or recruiting testers online (e.g., through crowd-sourcing).

Scaling is especially relevant for testing geographic differences, given the global reach of many online platforms. It is generally possible to simulate geographically dispersed users by manipulating testing devices to report faked locations. For example, the above-mentioned investigation of regional price differences on staples.com actually included a measurement from each of the 42,000 ZIP codes in the United States (Singer-Vine, Valentino-DeVries, and Soltani, “How the Journal Tested Prices and Deals Online,” Wall Street Journal, http://blogs.wsj.com/digits/2012/12/23/how-the-journal-tested-prices-and-deals-online, 2012). They accomplished this by observing that the website stored the user’s inferred location in a cookie, and proceeding to programmatically change the value stored in the cookie to each possible value.

That said, practical obstacles commonly arise in the fake-profile approach. In one study, the number of test units was practically limited by the requirement for each account to have a distinct credit card associated with it (Chen, Mislove, and Wilson, “Peeking Beneath the Hood of Uber,” in Proceedings of the 2015 Internet Measurement Conference (ACM, 2015), 495–508). Another issue is bot detection. For example, the Airbnb study was limited to five cities, even though the researchers originally planned to test more, because the platform’s bot-detection algorithms kicked in during the course of the study to detect and shut down the anomalous pattern of activity. It’s easy to imagine an even worse outcome in which accounts detected as bots are somehow treated differently by the platform (e.g., messages from those accounts are more likely to be hidden from intended recipients), compromising the validity of the study.

As this example illustrates, the relationship between audit researchers and the platforms being audited is often adversarial. Platforms’ efforts to hinder researchers can be technical but also legal. Many platforms, notably Facebook, prohibit both fake-account creation and automated interaction in their Terms of Service. The ethics of Terms-of-Service violation in audit studies is a matter of ongoing debate, paralleling some of the ethical discussions during the formative period of traditional audit studies. In addition to ethical questions, researchers incur a legal risk when they violate Terms of Service. In fact, under laws such as the US Computer Fraud and Abuse Act, it is possible that they may face criminal as opposed to just civil penalties.

Compared to the fake-profile approach, recruiting real users allows less control over profiles, but is better able to capture the natural variation in attributes and behavior between demographic groups. Thus, neither design is always preferable, and they are attuned to different fairness notions. When testers are recruited via crowd-sourcing, the result is generally a convenience sample (i.e., the sample is biased towards people who are easy to contact), resulting in a non-probability (non-representative) sample. It is generally infeasible to train such a group of testers to carry out an experimental protocol; instead, such studies typically handle the interaction between testers and the platform via software tools (e.g., browser extensions) created by the researcher and installed by the tester. For more on the difficulties of research using non-probability samples, see the book Bit by Bit (Salganik, Bit by Bit: Social Research in the Digital Age (Princeton University Press, 2019)).

Due to the serious limitations of both approaches, lab studies of algorithmic systems are commonly seen. The reason that lab studies have value at all is that, since automated systems are fully specified using code, the researcher can hope to simulate them relatively faithfully. Of course, there are limitations: the researcher typically doesn’t have access to training data, user interaction data, or configuration settings. But simulation is a valuable way for developers of algorithmic systems to test their own systems, and this is a common approach in the industry. Companies often go so far as to make de-identified user interaction data publicly available so that external researchers can conduct lab studies to develop and test algorithms. The Netflix Prize is a prominent example of such a data release (Bennett and Lanning, “The Netflix Prize,” in Proceedings of KDD Cup and Workshop, vol. 2007 (New York, NY, USA, 2007), 35). So far, these efforts have almost always been about improving the accuracy rather than the fairness of algorithmic systems.

Lab studies are especially useful for getting a handle on questions that cannot be studied by other empirical methods, notably the dynamics of algorithmic systems, i.e., their evolution over time. One prominent result from this type of study is the quantification of feedback loops in predictive policing (Lum and Isaac, “To Predict and Serve?”; Ensign et al., “Runaway Feedback Loops in Predictive Policing”). Another insight is the increasing homogeneity of users’ consumption patterns over time in recommender systems (Chaney, Stewart, and Engelhardt, “How Algorithmic Confounding in Recommendation Systems Increases Homogeneity and Decreases Utility,” in Proceedings of the 12th ACM Conference on Recommender Systems (ACM, 2018), 224–32).

Observational studies and observational fairness criteria continue to be important. Such studies are typically carried out by algorithm developers or decision makers, often in collaboration with external researchers (Obermeyer et al., “Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations,” Science 366, no. 6464 (2019): 447–53; Chouldechova et al., “A Case Study of Algorithm-Assisted Decision Making in Child Maltreatment Hotline Screening Decisions,” in Conference on Fairness, Accountability and Transparency, 2018, 134–48). It is relatively rare for observational data to be made publicly available. A rare exception, the COMPAS dataset, involved a Freedom of Information Act request.

Finally, it is worth reiterating that quantitative studies are narrow in what they can conceptualize and measure. Qualitative and ethnographic studies of decision makers thus provide an invaluable complementary perspective. To illustrate, we’ll discuss one study that reports on six months of ethnographic fieldwork in a corporate data science team (Passi and Barocas, “Problem Formulation and Fairness,” in Proceedings of the Conference on Fairness, Accountability, and Transparency (ACM, 2019), 39–48). The team worked on a project in the domain of car financing that aimed to “improve the quality” of leads (leads are potential car buyers in need of financing who might be converted to actual buyers). Given such an amorphous high-level goal, formulating a concrete and tractable data science problem is a necessary and nontrivial task — a task that is further complicated by the limitations of the data available. The paper documents how there is substantial latitude in problem formulation, and spotlights the iterative process that was used, resulting in the use of a series of proxies for lead quality. The authors show that different proxies have different fairness implications: one proxy would maximize people’s lending opportunities and another would alleviate dealers’ existing biases, both potentially valuable fairness goals. However, the data scientists were not aware of the normative implications of their decisions and did not explicitly deliberate on them.
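To give a flavor of what such lab simulations look like, here is a toy model of the predictive policing feedback loop mentioned above. It is written in the spirit of those analyses rather than reproducing their exact models; the rates, priors, and one-patrol-per-day simplification are all assumptions made for illustration.

```python
# Toy feedback-loop simulation: patrols follow previously discovered incidents,
# and incidents are only discovered where patrols go.
import numpy as np

rng = np.random.default_rng(3)
true_rate = np.array([0.10, 0.11])     # nearly identical underlying incident rates
discovered = np.array([1.0, 1.0])      # prior discovered incidents per district

for day in range(5000):
    p_patrol = discovered / discovered.sum()   # patrol where past data says crime "is"
    district = rng.choice(2, p=p_patrol)       # one patrol per day, for simplicity
    if rng.random() < true_rate[district]:     # discovery requires presence
        discovered[district] += 1

print("final patrol allocation:", np.round(discovered / discovered.sum(), 2))
# Although the true rates are nearly equal, the allocation typically locks in a
# heavy skew toward whichever district happened to accumulate more discovered
# incidents early on: the data the system learns from is generated by its own
# earlier decisions.
```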

Looking ahead

In this chapter, we covered traditional tests for discrimination as well as fairness studies of various algorithmic systems.

Together, these methods constitute a powerful toolbox for interrogating a single decision system at a single point in time. But there are other types of fairness questions we can ask: What is the cumulative effect of the discrimination faced by a person over the course of a lifetime? What structural aspects of society result in unfairness? We cannot answer such questions by looking at individual systems. The next chapter is all about broadening our view of discrimination and then using that broader perspective to study a range of possible fairness interventions.

7 A broader view of discrimination

Machine learning systems don’t operate in a vacuum; they are adopted in societies that already have many types of discrimina- tion intertwined with systems of oppression such as racism. This is at the root of fairness concerns in machine learning. In this chapter we’ll take a systematic look at discrimination in society. This will give us a more complete picture of the potential harmful impacts of machine learning. We will see that while a wide variety of fairness interventions are possible—and necessary—only a small fraction of them translate to technical fixes.

Case study: the gender earnings gap on Uber

We’ll use a paper that analyzes the gender earnings gap on Uber (Cook et al., “The Gender Earnings Gap in the Gig Economy: Evidence from over a Million Rideshare Drivers,” National Bureau of Economic Research, 2018) as a way to apply some of the lessons from the previous two chapters while setting up some of the themes of this chapter. (The study was coauthored by current and former Uber employees.) The paper starts with the observation that female drivers earn 7% less on Uber per active hour than male drivers do. It concludes that this gap can be explained by three factors: gender differences in drivers’ choices of where to drive, men’s greater experience on the platform, and men’s tendency to drive faster. It finds that customer discrimination and algorithmic discrimination do not contribute to the gap. We’ll take the paper’s technical claims at face value, but use the critical framework we’ve introduced to interpret the findings quite differently from the authors.

First, let’s understand the findings in more detail. The paper analyzes observational data on trips in the United States, primarily in Chicago. We’ve drawn a causal graph (Figure 33) showing what we consider to be the core of the causal model studied in the paper (the authors do not draw such a graph and do not pose their questions in a causal framework; we have chosen to do so for pedagogical purposes). A full graph would be much larger than the figure; for example, we’ve omitted a number of additional controls, such as race, that are presented in the appendices.

Figure 33: Our understanding of the causal model implicit in the Uber study.

We’ll use this graph to describe the findings. At a high level, the graph describes a joint distribution whose samples are trips. To illustrate, different trips corresponding to the same driver will have the same Residence (unless the driver moved during their tenure on the platform), but different Experience (measured as the number of prior trips).

Drivers’ hourly earnings are primarily determined by the algorithm that allocates trip requests from riders to drivers. The allocation depends on demand, which in turn varies by location and time of the week (the week-to-week variation is considered noise). Uber’s algorithm ignores driver attributes including experience and gender, hence there are no arrows from those nodes to Trip request. In addition, a few other factors might affect earnings. Drivers who drive faster complete more trips, drivers may strategically accept or cancel trips, and riders might discriminate by cancelling trips after the driver accepts.

The paper uses a technique called Gelbach decomposition to identify the effect of each of several variables on the hourly earnings. (Decomposition is a set of techniques used in economics for quantifying the contribution of various sources to an observed difference in outcomes. Although the authors don’t perform causal inference, we will continue to talk about their findings in causal terms for pedagogical purposes; the difference is not salient to the high-level points we wish to make.) It finds that the earnings gap (i.e., the effect of Driver gender on Hourly earning) can be entirely explained by paths involving Driver experience, Location, and Driving speed. Paths through Rider cancellation and Time of week don’t have significant effects.

The authors further interrogate the effect of gender on location (i.e., the choice of where to drive), and find that women are less likely to drive in less safe areas that also turn out to be more lucrative. They then dig deeper and argue that this effect operates almost entirely through women residing in safer areas and choosing to drive based on where they live.
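The sketch below shows the mechanics of such a decomposition on synthetic data. The data-generating process, variable names, and coefficients are invented for illustration and are far simpler than the paper’s actual specification and controls; the point is only how the change in the gender coefficient, when mediators are added, is apportioned among them.

```python
# Gelbach-style decomposition of a synthetic gender earnings gap into the
# contributions of experience, speed, and location.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 20_000
male = rng.binomial(1, 0.7, n).astype(float)

# Mediators depend on gender; earnings depend only on the mediators (assumption).
experience = rng.gamma(2.0, 150, n) + 400 * male           # prior trips
speed = rng.normal(19, 2, n) + 1.0 * male                  # average speed
location = rng.normal(0, 1, n) + 0.3 * male                # lucrativeness of where one drives
earnings = 15 + 0.004 * experience + 0.25 * speed + 1.0 * location + rng.normal(0, 3, n)

# Base model: earnings on gender only (the raw gap).
X_base = sm.add_constant(male)
beta_base = sm.OLS(earnings, X_base).fit().params[1]

# Full model: earnings on gender plus the mediators.
mediators = np.column_stack([experience, speed, location])
fit_full = sm.OLS(earnings, sm.add_constant(np.column_stack([male, mediators]))).fit()
beta_full = fit_full.params[1]
gamma = fit_full.params[2:]   # mediator coefficients in the full model

# Decomposition: the change in the gender coefficient equals the sum over
# mediators of (effect of gender on the mediator) x (the mediator's coefficient).
contributions = {}
for k, name in enumerate(["experience", "speed", "location"]):
    Gamma_k = sm.OLS(mediators[:, k], X_base).fit().params[1]   # auxiliary regression
    contributions[name] = Gamma_k * gamma[k]

print("gender coefficient, base vs. full model:", round(beta_base, 3), round(beta_full, 3))
print({k: round(v, 3) for k, v in contributions.items()})
print("sum of contributions:", round(sum(contributions.values()), 3),
      "= change in gender coefficient:", round(beta_base - beta_full, 3))
```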

The returns to experience could operate in several ways. The authors don’t decompose this effect but suggest several possibilities: the choice of where and when to drive, and other elements of strategy including which rides to accept. A key finding of the paper is the effect of gender on experience. Men are less likely to leave the platform and drive more hours during each week that they stay on the platform, resulting in a large experience differential. There are no gender differences in learning from experience: male and female drivers’ behavior changes at the same rate for a given number of trips.

The paper highlights questions that can be studied using observational data but not necessarily with field experiments (audit studies). An audit study of the Uber gender pay gap (along the lines of those discussed in the previous chapter) might have involved varying the driver’s name to test the effect on rider cancellation and ratings. Such an experiment would have no way to uncover the numerous other paths by which gender affects earnings. (An audit study would be more suited for studying discrimination by drivers against riders, in part because drivers in these systems exercise more choice in the matching process than riders do. Indeed, a study found that UberX and Lyft drivers discriminate against Black and female riders; Ge et al., “Racial and Gender Discrimination in Transportation Network Companies.”)

Causal diagrams in realistic scenarios are more complex than typical textbook examples. We reiterate that the graph above is much simplified compared to the (implicit) graph in the paper. The estimation in the paper proceeds as a series of regressions focusing iteratively on small parts of the graph, rather than an analysis of the entire graph at once. In any messy exercise such as this, there is always the possibility of unobserved confounders.

Despite the number of possible effects considered in the study, it leaves out many others. For example, some drivers may move to take advantage of the earning potential. This would introduce a cycle into our causal graph (Location –> Residence). This type of behavior might seem unlikely for an individual driver, which justifies ignoring such effects in the analysis. Over time, however, the introduction of transportation systems has the potential to reshape communities (Winner, Do Artifacts Have Politics? (Routledge, 2017); Bullard, Johnson, and Torres, Highway Robbery: Transportation Racism & New Routes to Equity (South End Press, 2004)). Today’s empirical methods have limitations in understanding these types of long-term phenomena that involve feedback loops.

A more notable omission from the paper is the effect of driver gender on experience. Why do women drop off the platform far more frequently? Could one reason be that they face more harassment from riders? The authors don’t seem to consider this question.

This leads to our most salient observation about this study: the narrow definition of discrimination. First, as noted, the study doesn’t consider that differential dropout rates might be due to discrimination. (For example, the authors say in the abstract: “Our results suggest that, in a ‘gig’ economy setting with no gender discrimination and highly flexible labor markets, women’s relatively high opportunity cost of non-paid-work time and gender-based differences in preferences and constraints can sustain a gender pay gap.”) This is especially pertinent since the gender gap in hourly earnings is merely 7%, whereas the gap in participation rate is a factor of 2.7! One would think that if there is rider discrimination, it would be most apparent in its effect on dropout rates.

In contrast, the only avenue of discrimination considered in the paper involves a (presumably misogynistic) rider who cancels a ride, incurring delays and potentially algorithmic penalties, based solely on the driver’s gender.

Further, the authors take an essentialist view of the gender difference in average speed (e.g., “men are more risk tolerant and aggressive than women”). We may question how innate these differences are, given that in contemporary U.S. society, women may face social penalties when they are perceived as aggressive. If this is true of driver-rider interactions, then women who drive as fast as men will receive lower ratings, with attendant negative consequences. This is a form of discrimination by riders. (For a broad discussion of customer ratings as a vehicle for discrimination on Uber, see Rosenblat et al., “Discriminating Tastes: Customer Ratings as Vehicles for Bias,” Data & Society, 2016, 1–21.)

Another possible view of the speed difference, also not considered by the authors, is that male drivers on average provide a lower quality of service due to an increase in accident risk resulting from greater speed (which also creates negative externalities for others on the road). In this view, Uber’s matching algorithm discriminates against female drivers by not accounting for this difference. (If riders give lower ratings to drivers who drive faster at the expense of safety, then the matching algorithm does indirectly take safety considerations into account. We think it is unlikely that driver ratings adequately reflect the risks of speeding, due to cognitive biases. After all, that is why we need speed limits instead of leaving it up to drivers.)

Finally, the paper doesn’t consider structural discrimination. It finds that women reside in less lucrative neighborhoods and that their driving behavior is shaped by safety considerations. However, a deeper understanding of the reasons for these differences is outside the scope of the paper. In fact, gender differences in safety risks and in the affordability of residential neighborhoods can be seen as an example of the greater burden that society places on women. In other words, Uber operates in a society in which women face discrimination and have unequal access to opportunities, and the platform perpetuates those differences in the form of a pay gap. (See Thebault-Spieker, Terveen, and Hecht, “Toward a Geographic Understanding of the Sharing Economy,” for a discussion of the many ways in which existing geographic inequalities manifest in sharing economy platforms, including Uber.)

Let us generalize a bit. There is a large set of studies that seek to explain the reasons for observed disparities in wages or another outcome. Generally these studies find that the direct effect of gender, race, or another sensitive attribute is much smaller than the indirect effect. Frequently this leads to a vigorous debate on whether or not the findings constitute evidence of discrimination or unfairness. There is room for different views on this question. The authors of the Uber study interpreted none of the three paths by which gender impacts earnings—experience, speed, and location—as discrimination; we’ve argued that all three can plausibly be interpreted as discrimination. Different moral frameworks will lead to different answers. Views on these questions are also politically split.
As well, scholars in different fields often tend to answer these questions differently (including, famously, social science and economics) (Small and Pager, “Sociological Perspectives on Racial Discrimination,” Journal of Economic Perspectives 34, no. 2 (2020): 49–67).

Certainly these definitional questions are important. However, perhaps the greatest value of studies on mechanisms of discrimination is that they suggest avenues for intervention without having to resolve definitional questions.

resolve definitional questions. Looking at the Uber study from this lens, several interventions are apparent. Recall that there is a massive gender disparity in the rate at which drivers drop out of the platform. Uber could more actively solicit and listen to feedback from female drivers and use that feedback to inform the design of the app. This may lead to interventions such as making it easier for drivers (and riders) to report harassment and taking stronger action in response to such reports. As for the speed difference, Uber could warn drivers who exceed the speed limit or whose speed results in a predicted accident risk that crosses some threshold (such a prediction is presumably possible given Uber’s access to data). In addition, Uber could use its predic- tive tools to educate drivers about strategy, decreasing the returns to experience for all drivers. Finally, the findings also give greater urgency to structural efforts to make neighborhoods safe for women. None of these interventions require a consensus on whether or not female drivers on Uber are discriminated against.

Three levels of discrimination

Sociologists organize discrimination into three levels: structural, organizational, and interpersonal.321 Structural discrimination arises from the ways in which society is organized, both through relatively hard constraints such as discriminatory laws and through softer ones such as norms and customs. Organizational factors operate at the level of organizations or other decision-making units, such as a company making hiring decisions. Interpersonal factors refer to the attitudes and beliefs that result in discriminatory behavior by individuals.

321: Pager and Shepherd, "The Sociology of Discrimination"; Small and Pager, "Sociological Perspectives on Racial Discrimination."

A separate way to classify discrimination is as direct or indirect. By direct discrimination we mean actions or decision processes that make explicit reference to a sensitive attribute. By indirect discrimination we refer to actions or decision processes that make no such reference, yet disadvantage one or more groups. The line between direct and indirect discrimination is hazy, and it is better to think of it as a spectrum rather than a binary category.322

322: For attempts by philosophers to formalize the distinction, see Altman, "Discrimination," in The Stanford Encyclopedia of Philosophy, ed. Edward N. Zalta, Winter 2020, https://plato.stanford.edu/archives/win2020/entries/discrimination/ (Metaphysics Research Lab, Stanford University, 2020). For a technical treatment of direct vs. indirect effects, refer back to the Causality chapter. See also Lily Hu, "Direct Effects," https://phenomenalworld.org/analysis/direct-effects, 2020; in particular, the point that "any direct effect is really an indirect effect if you zoom further into the relevant causal mechanism."

Table 14: Examples of discrimination organized into three levels and on a spectrum of directness

Level           | More direct                        | More indirect
Structural      | Laws against same-sex marriage     | Better funded schools in wealthier, more segregated areas
Organizational  | Lack of disability accommodations  | Networked hiring
Interpersonal   | Overt animus                       | Belief in need for innate brilliance (combined with gender stereotypes)

Structural factors

Structural factors refer to ways in which society is organized. A law that overtly limits opportunities for certain groups is an example of a direct structural factor. Due to various rights revolutions around the world, there are fewer of these laws today than there used to be. Yet discriminatory laws are far from a thing of the past. For example, as of 2021, a mere 29 countries recognize marriage equality.323 Further, discriminatory laws of the past have created structural effects which persist today.324

323: Wikipedia contributors, "Same-Sex Marriage — Wikipedia, the Free Encyclopedia," 2021, https://en.wikipedia.org/w/index.php?title=Same-sex_marriage&oldid=1027216965.
324: Rothstein, The Color of Law: A Forgotten History of How Our Government Segregated America (Liveright Publishing, 2017).

Indirect structural discrimination is pervasive in virtually every society. Here are two well-known examples affecting the United States. Drug laws and drug policies, despite being facially neutral, disproportionately affect minority groups, especially Black people.325 Schools in high-income neighborhoods tend to be better funded (since public schools are funded primarily through property taxes) and attract more qualified teachers, transmitting an educational advantage to children of higher-income parents.

325: Fellner, "Race, Drugs, and Law Enforcement in the United States," Stan. L. & Pol'y Rev. 20 (2009): 257.

Other factors are even less tangible yet no less serious in terms of their effects, such as cultural norms and stereotypes. In the case study of gender bias in Berkeley graduate admissions in Chapter 4, we encountered the hypothesis that societal stereotypes influence people's career choices in a way that reproduces gender inequalities in income and status:

The bias in the aggregated data stems . . . apparently from prior screening at earlier levels of the educational system. Women are shunted by their socialization and education toward fields of graduate study that are generally more crowded, less productive of completed degrees, and less well funded, and that frequently offer poorer professional employment prospects.

Organizational factors

Organizational factors operate at the level of organizations or decision-making units: how they are structured, the decision-making rules and processes they put in place, and the context in which individual actors operate. Again, these lie on a spectrum between direct and indirect. The most direct form of discrimination—excluding people from participation explicitly based on group membership—is mostly unlawful in liberal democracies. However, practices such as lack of disability accommodations and failure to combat sexual harassment are rampant. A more indirectly discriminatory policy is the use of employees' social networks in hiring, an extremely common practice. One observational study found that the use of employee referrals in predominantly White firms reduced the probability of a Black hire by nearly 75% relative to the use of newspaper ads.326 The study controlled for spatial segregation, occupational segregation, city, and firm size.

326: Mouw, "Are Black Workers Missing the Connection? The Effect of Spatial Distance and Employee Referrals on Interfirm Racial Segregation," Demography 39, no. 3 (2002): 507–28.

Organizational discrimination can be revealed and addressed at the level of a single organization, unlike structural factors (e.g. no individual school is responsible for teachers being attracted to schools in high-income neighborhoods).

Interpersonal factors

Interpersonal factors refer to the attitudes and beliefs that result in discriminatory behavior by individuals. Sometimes people may discriminate because of an overt animus toward a certain group, in the sense that the discriminator does not attempt to justify it by any appeal to rationality. More often, the mechanisms involved are relatively indirect.

A 2015 study found that academic fields in which achievement is believed to be driven by innate brilliance exhibit a greater gender disparity, i.e., they have fewer women.327 The authors propose that the disparity is caused by the combination of the belief in the importance of innate brilliance together with stereotypes about lower innate brilliance in women. This combination could then impact women in brilliance-emphasizing disciplines in two ways: either by practitioners of those disciplines exhibiting biases against women, or by women internalizing those stereotypes and self-selecting out of those disciplines (or performing more poorly than they otherwise would). The authors don't design tests to distinguish between these competing mechanisms. However, they do test whether the observed disparities could alternatively be caused by actual innate differences (rather than beliefs in innate differences) in ability or aptitude, or willingness to work long hours. Using various proxies (such as GRE score for innate ability), they argue that such competing explanations cannot account for the observed differences.

327: Leslie et al., "Expectations of Brilliance Underlie Gender Distributions Across Academic Disciplines," Science 347, no. 6219 (2015): 262–65.

One may wonder: can we not test for innate differences more rigorously, such as by examining young children? A follow-up study showed that children as young as six tend to internalize gendered stereotypes about innate brilliance, and these stereotypes influence their selection of activities.328 These difficulties hint at the underlying complexity of the concept of gender, which is produced and reinforced in part through these very stereotypes.329

328: Bian, Leslie, and Cimpian, "Gender Stereotypes about Intellectual Ability Emerge Early and Influence Children's Interests," Science 355, no. 6323 (2017): 389–91.
329: West and Zimmerman, "Doing Gender," Gender & Society 1, no. 2 (1987): 125–51.

To recap, we've discussed structural, organizational, and interpersonal discrimination, and the fact that these are often indirect and pervasive. The three levels are interconnected: for example, in the Uber case study, structural inequalities don't perpetuate themselves; they are perpetuated through organizational decisions, and those decisions at Uber are made by individuals whose worldviews are shaped by culture. In other words, even structural discrimination is actively perpetuated, and we collectively have the power to mitigate it and to reverse course. It would be a mistake to resign ourselves to viewing structural discrimination as simply the way the world is.

Notice that adopting statistical decision making is not automatically a way out of any of these factors, which operate for the most part in the background and not at a single, discrete moment of decision making.

Exercise. Review each of the examples of discrimination above and discuss whether and how the adoption of statistical decision making would change the discriminatory effect.

Exercise. In the Uber case study, what are the structural, organizational, and interpersonal factors that produce gender discrimination?

The persistence and magnitude of inequality

Formal equality under the law primarily addresses direct discrimination and has relatively little effect on indirect discrimination, whether structural, organizational, or interpersonal. This is one reason why inequality can be persistent in societies that seemingly promise equal opportunity. Here are two stark examples of how long inequalities can sustain themselves.

Beginning in 1609, Jesuit missions were established in the Guaraní region of South America that overlaps modern-day Argentina, Paraguay, and Brazil. In addition to religious conversion, the missionaries undertook educational efforts among the indigenous people. However, due to political upheaval in Spain and Portugal, the missions abruptly ended in 1767–68 and the missionaries were expelled. How long after this date would we expect the geographic inequalities introduced by Jesuit presence to persist? Perhaps a generation or two? Remarkably, the Jesuit effect on educational attainment has been found to persist 250 years later: areas closer to a former mission have 10–15% higher literacy rates as well as 10% higher incomes.330 Another study of the long-run persistence of inequality shows the present-day effects of a system of colonial forced labor in Peru and Bolivia between 1573 and 1812.331

330: See Valencia Caicedo, "The Mission: Human Capital Transmission, Economic Persistence, and Culture in South America," The Quarterly Journal of Economics 134, no. 1 (2019): 507–56. The study makes use of a clever idea to argue that the mission locations were essentially random, making this a natural experiment, and includes various checks to rule out alternative explanations.
331: Dell, "The Persistent Effects of Peru's Mining Mita," Econometrica 78, no. 6 (2010): 1863–903.

More evidence for the long-run persistence of inequality comes from the city of Florence, based on a unique dataset containing tax-related data for all individuals from the year 1427. A working paper finds that surnames associated with wealthier individuals in the dataset are associated with wealthier individuals today, six hundred years later.332

332: Barone and Mocetti, "Intergenerational Mobility in the Very Long Run: Florence 1427–2011," Bank of Italy Temi Di Discussione (Working Paper) No 1060 (2016).

While these are just a few examples, research shows that persistence of inequality over generations along social and geographic lines is the norm. Yet it is not widely appreciated. For example, Americans believe that an individual born into the bottom quintile of the income distribution has a 1-in-6 chance of rising to the top quintile, but the observed likelihood is 1-in-20.333 Mobility in the U.S. has decreased since the 1980s, and is lower for Black Americans than White Americans.334

333: Davidai and Gilovich, "Building a More Mobile America—One Income Quintile at a Time," Perspectives on Psychological Science 10, no. 1 (2015): 60–71.
334: Chetty et al., "Race and Economic Opportunity in the United States: An Intergenerational Perspective," The Quarterly Journal of Economics 135, no. 2 (2020): 711–83.

These inequalities are significant because of their magnitude in addition to their persistence. Median income of Black Americans is about 65% that of White Americans.335 Wealth inequality is much more severe: the median wealth of Black households is about 11% that of White households. Turning to gender, full-time, year-round working women earned 80% of what their male counterparts earned.336 Geographic inequalities also exist. For example, the richest and poorest census tracts in the United States differ in average income by a factor of about 30.337

335: Kochhar and Cilluffo, "Key Findings on the Rise in Income Inequality Within America's Racial and Ethnic Groups," Pew Research Center, 2018.
336: Semega, Fontenot, and Kollar, "Income and Poverty in the United States: 2016," Current Population Reports, no. P60–259 (2017).
337: Pendall and Hedman, "Worlds Apart: Inequality Between America's Most and Least Affluent Neighborhoods," Urban Institute, 2015.

Machine learning and structural discrimination

For a book about machine learning, we've covered a lot of ground on discrimination and inequality in society. There's a reason. To understand fairness, it isn't enough to think about the moment of decision making. We also need to ask: what impact does the adoption of machine learning by decision makers have on long-lasting cycles of structural inequality in society? Does it help us make progress toward equality of opportunity, or other normative ideals, over the course of people's lives? Here are some observations that can help answer those questions.

Predictive systems tend to preserve structural advantages and disadvantages

Predictive systems tend to operate within existing institutions. When such institutions perpetuate inequality due to structural factors, predictive systems will only reify those effects, absent explicit intervention. Predictive systems tend to inherit structural discrimination because the objective functions used in predictive models usually reflect the incentives of the organizations deploying them. As an example, consider a 2019 study that found strong racial bias in a system used to identify patients with a high risk of adverse health outcomes, in the sense that Black patients were assigned lower scores compared to equally at-risk White patients.338 The authors found that this happened because the model was designed to predict healthcare costs instead of needs, and the healthcare system spends less caring for Black patients than White patients even when they have the same health conditions.

338: Obermeyer et al., "Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations."
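To make the target-variable problem concrete, here is a minimal, hypothetical simulation in the spirit of that finding (it is not the study's data or model; the group labels, the roughly 30% spending gap, and the feature choices are all assumptions made for illustration). Because the label is cost rather than need, the learned "risk" score is lower for group 1 patients than for equally sick group 0 patients, and fewer of them are flagged for extra care.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    n = 50_000

    # Hypothetical data: two groups with identical distributions of true health need.
    group = rng.integers(0, 2, size=n)
    need = rng.gamma(shape=2.0, scale=1.0, size=n)

    # Assumption for illustration: the system spends ~30% less on group 1 patients
    # with the same need, so both the cost-based feature and the cost label are deflated.
    deflate = np.where(group == 1, 0.7, 1.0)
    prior_spending = need * deflate + rng.normal(0, 0.3, n)   # utilization feature
    clinical_signal = need + rng.normal(0, 0.5, n)            # unbiased clinical proxy
    future_cost = need * deflate + rng.normal(0, 0.3, n)      # the convenient training label

    X = np.column_stack([prior_spending, clinical_signal])
    score = LinearRegression().fit(X, future_cost).predict(X)  # "risk" score = predicted cost

    # Flag the top 10% of scores for a care-management program.
    flagged = score >= np.quantile(score, 0.9)
    for g in (0, 1):
        sel = group == g
        print(f"group {g}: mean need among flagged patients = "
              f"{need[flagged & sel].mean():.2f}; "
              f"share of sickest decile flagged = "
              f"{flagged[sel & (need > np.quantile(need, 0.9))].mean():.2%}")

In this sketch, a group 1 patient must typically be sicker than a group 0 patient to receive the same score, which is exactly the pattern the study describes: nothing in the learning algorithm is at fault, but the choice of label imports the institution's spending disparity into the model.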

Suppose a firm makes hiring decisions based on a model that predicts job performance from educational attainment. Imagine a society where students from higher-income families, on average, have had better educational opportunities that translate to greater job skills. This is not a measurement bias in the data that can be corrected away: education level genuinely predicts job performance. Thus, an accurate predictive system will rank higher-income candidates higher on average.

The structural effect of such systems becomes clear when we imagine every employer applying similar considerations. Candidates with greater educational opportunities end up with more desirable jobs and higher incomes. In other words, predictive systems have the effect of transferring advantages from one phase of life to the next, and from one generation to the next.

This phenomenon shows up in less obvious ways. For instance, online ad targeting is based on the assumption that differences in past behavior between users reflect differences in preferences. But they might also result from differences in structural circumstances, and there is no way for targeting engines to tell the difference. This helps explain why ads, including job ads, may be targeted in ways that reinforce stereotypes and structural discrimination.339

339: Merill, "Does Facebook Still Sell Discriminatory Ads?" The Markup, 2020. See, in particular, the quotes from David Brody.

This aspect of predictive systems is amplified by compounding injustice.340 That is, individuals are subject to a series of decisions over the course of their lives, and the effects of these decisions both accumulate and compound over time. When a person receives (or is denied) one opportunity, they are likely to appear more (or less) qualified at their next encounter with a predictive system.

340: Gandy, Coming to Terms with Chance: Engaging Rational Discrimination and Cumulative Disadvantage (Routledge, 2016); Hellman, "Sex, Causation, and Algorithms: How Equal Protection Prohibits Compounding Prior Injustice," Washington University Law Review 98, no. 2 (2020): 481–523.

Machine learning systems may make self-fulfilling predictions

Suppose we find that chess skill is correlated with productivity among software engineers. Here are a few possible explanations:

1. Chess skill makes one a better software engineer.
2. There are underlying cognitive skills that make one better at both.
3. College professors hold stereotypes about chess skills and software engineering, and steered students good at chess into computer science courses.
4. People with more leisure time were both able to pursue chess as a hobby and devote time to improving their software engineering skills.

Standard supervised learning does not distinguish between these causal paths. Regardless of the correct causal explanation, once a large swath of employers start using chess skill as a hiring criterion, they contribute to the perpetuation of the observed correlation. That is because applicants who are better at chess will have better opportunities for software engineering positions in this world, and these opportunities will allow them to develop their software engineering skills (a minimal simulation of this dynamic appears at the end of this subsection).

Machine learning automates the discovery of correlations such as the above. When we deploy those correlations as decision criteria, we alter the very phenomena that we are supposedly measuring. In other words, using non-causal variables as decision criteria may give them causal powers over time. This is not limited to machine learning: sociologists have long recognized that stereotypes that are used to justify discrimination may in fact be produced by that discrimination.341

341: Myrdal, An American Dilemma: The Negro Problem and Modern Democracy, Volume 2 (Routledge, 2017).
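The feedback loop described above is easy to simulate. In the hypothetical sketch below, chess skill starts out statistically unrelated to engineering skill; once employers screen on chess skill and the people hired receive opportunities to improve, a correlation appears in the data even though the original causal story never changes. The population size, hiring rate, and size of the opportunity effect are arbitrary assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 20_000

    # Initially, chess skill and engineering skill are independent (no causal link).
    chess = rng.normal(0, 1, n)
    engineering = rng.normal(0, 1, n)
    print("correlation before screening:", round(np.corrcoef(chess, engineering)[0, 1], 3))

    # Employers adopt chess skill as a hiring criterion; hired people get
    # on-the-job opportunities that improve their engineering skill.
    for _ in range(5):                            # several rounds of hiring and skill growth
        hired = chess > np.quantile(chess, 0.7)   # top 30% of chess players get hired
        engineering = engineering + 0.3 * hired   # assumed per-round opportunity effect

    print("correlation after screening:", round(np.corrcoef(chess, engineering)[0, 1], 3))

After a few rounds, chess skill really does predict engineering skill in the data, not because of any innate connection but because the decision rule created one.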

Algorithmic recommendation systems may contribute to segregation

Even small preferences for homogeneous neighborhoods can lead to dramatic large-scale effects. In the Appendix, we discuss a toy model of residential segregation showing such effects (a minimal simulation in the same spirit appears at the end of this subsection). But what about the online world, e.g., online social networks? The phenomenon of people making friends with similar others (online or offline) is called homophily. In the early days of social media, there was a hope—now seen as naive—that in the online sphere there would be no segregation, due to the ease with which people can connect with each other. Instead, we observe similar patterns of homophily and segregation online as offline. This is partly because real-world relationships are reflected online, but in part it is because segregation emerges through our online preferences and behaviors.342

342: Boyd, "White Flight in Networked Publics: How Race and Class Shaped American Teen Engagement with MySpace and Facebook," in Race After the Internet, ed. L. Nakamura and P. A. Chow-White, 2012, 203–22.

As social media has matured, concerns arising from homophily have expanded from demographic segregation to ideological echo chambers. The causal mechanisms behind polarized online discourse and the role of recommendation algorithms are being researched and debated (see the Testing chapter), but there is no doubt that online media can have structural effects.
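To illustrate the opening point—that mild preferences can produce stark large-scale segregation—here is a minimal Schelling-style simulation. It is not the Appendix's model; the ring topology, neighborhood size, and tolerance are arbitrary choices made for brevity. Agents merely prefer not to be in the local minority, yet the resulting neighborhoods typically end up far more homogeneous than that preference requires.

    import numpy as np

    rng = np.random.default_rng(0)
    size, k, tolerance, steps = 500, 4, 0.5, 100_000

    # A ring of cells: +1 and -1 are two groups, 0 is an empty cell.
    cells = rng.choice([1, -1, 0], size=size, p=[0.45, 0.45, 0.10])

    def local_same_share(cells, i):
        """Share of an agent's 2k nearest occupied neighbors that share its type."""
        idx = [(i + d) % len(cells) for d in range(-k, k + 1) if d != 0]
        neighbors = cells[idx]
        occupied = neighbors[neighbors != 0]
        return 1.0 if len(occupied) == 0 else np.mean(occupied == cells[i])

    def mean_same_share(cells):
        agents = [i for i in range(len(cells)) if cells[i] != 0]
        return np.mean([local_same_share(cells, i) for i in agents])

    print("average same-group neighbor share, before:", round(mean_same_share(cells), 3))

    # Dynamics: a randomly chosen agent that is unhappy (fewer than `tolerance`
    # of its occupied neighbors are like it) moves to a random empty cell.
    for _ in range(steps):
        i = rng.integers(size)
        if cells[i] != 0 and local_same_share(cells, i) < tolerance:
            j = rng.choice(np.flatnonzero(cells == 0))
            cells[j], cells[i] = cells[i], 0

    print("average same-group neighbor share, after: ", round(mean_same_share(cells), 3))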

Machine learning may lead to homogeneity of decision making

If a company hires only people whose names begin with certain letters of the alphabet, it may seem absurd but not necessarily a cause for alarm. One reason behind this intuition is that we expect the effect of any such idiosyncratic policy to cancel out, given that job candidates have many firms to apply to. If, on the other hand, every employer adopted such a policy, then the experience of job seekers becomes radically different.

Machine learning results in more homogeneous decision making compared to the vagaries of individual decisions. Studies of human behavior show that human decisions have a lot of "noise."343 Removing the noise is one of the main attractions of statistical decision making. But there are also risks. If statistical decision making results in similar decisions being made by many decision makers, otherwise idiosyncratic biases could become amplified and reified to the point where they create structural impediments.344

343: See Kahneman et al., "Noise: How to Overcome the High, Hidden Cost of Inconsistent Decision Making," Harvard Business Review, 2016. The article makes both a descriptive claim about the inconsistency of human decisions as well as a normative claim that inconsistent decision making is poor decision making. The latter claim can be contested along many lines, one of which we pursue here.
344: Creel and Hellman, "The Algorithmic Leviathan: Arbitrariness, Fairness, and Opportunity in Algorithmic Decision Making Systems," Virginia Public Law and Legal Theory Research Paper, no. 2021–13 (2021).

Homogeneity can happen in many ways. At a high level, if many machine learning systems use the same training data and the same target variable, they will make roughly the same classifications, even if the learning algorithms are very different. Intuitively, if this weren't the case, one could make more accurate classifications by ensembling their predictions. For a stark illustration of homogeneous predictions from the domain of predicting life outcomes, see the Fragile Families Challenge.345

345: Salganik et al., "Measuring the Predictability of Life Outcomes with a Scientific Mass Collaboration," Proceedings of the National Academy of Sciences 117, no. 15 (2020): 8398–8403.
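One quick way to see this: train two very different model families on the same data and target and compare how often they agree with each other versus with the ground truth. The dataset and numbers below are synthetic and purely illustrative; with label noise capping what either model can achieve, the two learners will typically agree with each other more often than either agrees with the true labels.

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Synthetic, purely illustrative data: same training set and target for both models.
    X, y = make_classification(n_samples=20_000, n_features=20, n_informative=5,
                               flip_y=0.2, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

    linear = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    boosted = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

    pred_a = linear.predict(X_te)
    pred_b = boosted.predict(X_te)

    print("accuracy of logistic regression:", round((pred_a == y_te).mean(), 3))
    print("accuracy of gradient boosting:  ", round((pred_b == y_te).mean(), 3))
    # The two quite different learners make largely the same decisions:
    # the shared data and target, not the algorithm, drive the outcome.
    print("agreement between the two models:", round((pred_a == pred_b).mean(), 3))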
Alternatively, many decision makers could use the same underlying system. There are anecdotes of job seekers being repeatedly screened out of jobs on the basis of personality tests, all offered by the same vendor.346 Even individual algorithmic systems may have such an outsized influence in society that their policies may have structural effects. The most obvious example is systems adopted by the state, such as a predictive policing system that leads to the overpolicing of low-income neighborhoods. But it is private platforms, especially those with a global scale, where this effect has been most prominent. Take content moderation: a small number of social media companies together determine which types of speech can be a part of mainstream online discourse and which communities are able to mobilize online. Platform companies have faced criticism for allowing content that incites violence and, conversely, for being overzealous in censoring individuals or groups.

346: O'Neil, "How Algorithms Rule Our Working Lives," The Guardian 16 (2016).

In some cases, platform policies are shaped by the capabilities and limitations of machine learning.347 For example, algorithms are relatively good at detecting nudity but relatively poor at detecting context. Companies such as Facebook have had broad bans on nudity without much attention to context, often taking down artwork and iconic historical images.

347: Gillespie, "Content Moderation, AI, and the Question of Scale," Big Data & Society 7, no. 2 (2020): 2053951720943234.

Machine learning shifts power

Like all technologies, machine learning shifts power. To make this more precise, we analyze the adoption of machine learning by a bureaucracy. We don't mean the term bureaucracy in its colloquial, pejorative sense of an inefficient, rule-bound government agency. Rather, we use the term as social scientists do: a bureaucracy is a public or private entity in which highly trained workers called bureaucrats, operating in a hierarchical structure, make decisions in a way that is constrained by rules and policies but also requires expert judgment. Firms, universities, hospitals, police forces, and public assistance programs are all bureaucracies to various degrees. Most of the decision making scenarios that motivate this book are situated in bureaucracies.

To understand the effect of adopting machine learning, we consider five types of stakeholders: decision subjects, the people who provide the training data, domain experts, machine learning experts, and policy makers. Our analysis builds on a talk by Kalluri.348

348: Kalluri, "The Values of Machine Learning" (NeurIPS Queer in AI workshop, 2019).

Machine learning as generally implemented today shifts power away from the first three categories. By representing decision subjects as standardized feature vectors, statistical decision making removes their agency and ability to advocate for themselves. In many domains, notably the justice system, this ability is central to the rights of decision subjects. Even in a relatively less consequential domain such as college admissions, the personal statement provides this ability and is a key component of the evaluation.

People who provide training data may have knowledge about the task at hand, but provide only their behavior as input to the system (think of email recipients clicking the "spam" button). Machine learning instead constructs a form of knowledge in a centralized way. In contrast, domain experts learn in part from the knowledge and lived experience of the individuals they interact with. Admittedly, experts such as physicians are often criticized for devaluing the knowledge and experience of decision subjects (patients). But the fact that such a debate is happening at all is evidence that patients have at least some power in the traditional system.

The role of domain experts is also more limited compared to traditional decision making, where the discretion and judgment of such experts holds sway.

In machine learning applications, domain experts have two main roles: formulating the problem and task, and labeling training examples.

These effects are not always harmful. In government bureaucracies, the power wielded by "street-level bureaucrats" such as police officers and social service caseworkers—the people who translate policy into individual decisions—can be abused, and removing their discretion is often seen as a fairness intervention. Yet the discretion and human intelligence of these decision makers can also be a vital fairness-promoting element due to the existence of extenuating factors or novel circumstances not seen in the training data or covered in existing policies.349

349: Lipsky, Street-Level Bureaucracy: Dilemmas of the Individual in Public Service (Russell Sage Foundation, 2010); Alkhatib and Bernstein, "Street-Level Algorithms: A Theory at the Gaps Between Policy and Decisions," in Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 2019, 1–13.

Machine learning experts, of course, tend to have a central role. Stakeholders' requirements have to be translated into implementation by these experts; whether intentionally or unintentionally, there are often substantial gaps between the desired policy and the policy that's realized in practice.350 In every automated system, there is something lost in the translation of policy from human language to computer code. For example, there have been cases where software miscalculated prison inmates' eligibility for early release, with harrowing consequences including being held in prison too long and being returned to prison after being released.351 But in those classic automated systems, these gaps tend to be mistakes that are generally obvious upon manual inspection (not that it is any comfort to those who are harmed). When machine learning is involved, however, the involvement of the expert is often necessary even to recognize that something has gone wrong. This is because the policy tends to be more ambiguous (what does "high risk" mean?) and because deviations from the policy become apparent only in aggregate.

350: Citron, "Technological Due Process," Wash. UL Rev. 85 (2007): 1249.
351: "US prisoners released early by software bug" (BBC News, 2015).

Finally, machine learning empowers policy makers or centralized decision makers. Consider a risk prediction tool used by a child protection agency to screen calls. Depending on the agency's budget and other factors, the decision maker may want to screen in a higher or lower proportion of calls. With a statistical tool, such a policy change can be implemented instantly, and is enormously easier than the alternative of retraining hundreds of case workers to adjust their mental heuristics. This is just one example that illustrates why such tools have proven so attractive to those who make the decision to deploy them.

Structural interventions for fair machine learning

The fact that machine learning may contribute to structural discrimination motivates the need for interventions that are similarly broad in scope. We call these structural interventions: changing the way machine learning gets built and deployed. The changes we have in mind go beyond the purview of any single organization, and require collective action. This could take the form of a broad social movement, or other collectives including communities, workers, researchers, and users.

Reforming the underlying institutions

One approach is to focus on the underlying institution rather than the technology, and change it so that it is less prone to adopt harmful machine learning tools in the first place. For example, shifting the focus of the criminal justice system from incapacitation to rehabilitation could decrease the demand for risk prediction tools.352 Many scholars and activists distinguish between reform and abolition (sometimes called non-reformist reform), abolition being a more radical and transformative approach.353 For our purposes, however, they both have the effect of centering the intervention on the institution rather than the technology.

352: Barabas et al., "Interventions over Predictions: Reframing the Ethical Debate for Actuarial Risk Assessment," in Conference on Fairness, Accountability and Transparency (PMLR, 2018), 62–76.
353: Akbar, "An Abolitionist Horizon for Police (Reform)," California Law Review 108, no. 6 (2020). See also Chapter 5 of Benjamin, Race After Technology, which includes a discussion of tech tools for resisting oppressive institutions.

In many domains, the very purposes and aims of our institutions remain contested. For example, what are the goals of policing? Commonly accepted goals include deterrence and prevention of crime, ensuring public safety and minimizing disorder, and bringing offenders to justice; they might also include broader efforts to improve the health and vitality of communities. The relative importance of these goals varies between communities and over time. Thus, formulating police allocation decisions as an optimization problem, as predictive policing systems do, involves taking positions on these deeply contested issues.

History shows us that many institutions that may feel like fixtures of modern society, such as higher education, have in fact repeatedly redefined their goals and purposes to adapt to a changing world. Indeed, sometimes the impetus for such shifts was to discriminate more effectively. In the early twentieth century, elite American universities morphed from treating size (in terms of enrollment) as a source of prestige to treating selectivity as one. A major reason for this change was to curtail the rising proportion of Jewish students without having to introduce explicit quotas; the newfound mission of being selective enabled them to emphasize traits like character and personality in admissions, which in turn allowed much leeway for discretion. In fact, this system that Harvard adopted in 1926 was the origin of the holistic approach to admissions that continues to be contentious today.354

354: Karabel, The Chosen: The Hidden History of Admission and Exclusion at Harvard, Yale, and Princeton (Houghton Mifflin Harcourt, 2005).

Some scholars have gone beyond the position that intervention to address algorithmic harms should focus on the underlying institution, and argued that the adoption of automated decision making actually enables resistant institutions to stave off necessary reform. Virginia Eubanks examines four public assistance programs for poor people in the United States—food assistance, Medicaid, homeless services, and services for at-risk children.355 In each case there are eligibility criteria administered automatically, some of which use statistical techniques. The book documents the harmful effects of these systems, including the punitive effects on those deemed ineligible; the disproportionate impact of those burdens on low-income people of color, especially women; the lack of transparency and seeming arbitrariness of the decisions; and the tracking and surveillance of the lives of the poor that is necessary for these systems to operate.

355: Eubanks, Automating Inequality.

These problems may be fixable to some extent, but Eubanks has a deeper critique: that these systems distract from the more fundamental goal of eradicating poverty ("We manage the individual poor in order to escape our shared responsibility for eradicating poverty"). In theory, the two approaches may coexist. In practice, Eubanks argues, these systems legitimize the idea that there is something wrong with some people, hide the underlying structural problem, and foster inaction. They also incur a high monetary cost that could otherwise be put toward more fundamental reform.

Community rights

Harmful technologies are often legally justified under a notice-and-consent framework, which rests on an individualistic conception of rights and is ill-equipped to address collective harms. For example, police departments obtain footage en masse from residential security cameras with the consent of residents through centralized platforms like Amazon Ring.356 However, consent is not a meaningful check in this scenario, because the people who stand to be harmed by police abuse of surveillance footage—such as protesters or members of racial minorities who had the police called on them for "acting suspiciously"—are not the ones whose consent is sought or obtained. This gap is especially salient in machine learning applications: even if a classifier is trained on data provided with consent, it may be applied to nonconsenting decision subjects.

356: Harwell, "Doorbell-Camera Firm Ring Has Partnered with 400 Police Forces, Extending Surveillance Concerns," Washington Post, 2019.

An alternative is to allow groups, such as geographic communities, the right to collectively consent to or reject the adoption of technology tools. In response to the police use of facial recognition, civil liberties activists advocated for a community right to reject such tools; the success of this advocacy has led to various local bans and moratoria.357 In contrast, consider online targeted advertising, another technology that has faced widespread criticism. In this case, there are no analogous collectives who can organize effective resistance, and hence attempts to reject the technology have been much less successful.358

357: Whittaker et al., AI Now Report 2018 (AI Now Institute at New York University, 2018).
358: Barocas and Levy, "Privacy Dependencies," Wash. L. Rev. 95 (2020): 555.

Beyond collective consent, another goal of community action is to obtain a seat at the table in the design of machine learning systems—as stakeholders and participants whose expertise and lived experience shape the conception and implementation of the system, rather than as mere data providers and decision subjects. Among other benefits, this approach would make it easier to foresee and mitigate representational harms—issues such as demeaning categories in computer vision datasets or image search results that represent offensive stereotypes. But there are also potential risks to participatory design: it may create further burdens for members of underrepresented communities, and it may act as a smokescreen for organizations resisting meaningful change. It is essential that participation be recognized as labor and be fairly compensated.359

359: Sloane et al., "Participation Is Not a Design Fix for Machine Learning," arXiv Preprint arXiv:2007.02423, 2020.

Regulation

Regulation that promotes fair machine learning can take the form of applying existing laws to decision systems that incorporate machine learning, or laws that specifically address the use of technology and its attendant harms. Examples of the latter include the above-mentioned bans on facial recognition, and restrictions on automated decision making under the European Union's General Data Protection Regulation (GDPR). Both flavors of regulation are evolving in response to the rapid adoption of machine learning in decision making systems. Regulation is a major opportunity for structural intervention for fair machine learning. Yet, because of the tendency of law to conceptualize discrimination in narrow terms, its practical effect on curbing harmful machine learning largely remains to be seen.360

360: Hoffmann, "Where Fairness Fails: Data, Algorithms, and the Limits of Antidiscrimination Discourse," Information, Communication & Society 22, no. 7 (2019): 900–915.

The gap between the pace of adoption of machine learning and the pace of law's evolution has led to attempts at self-regulation: a 2019 study found 84 AI ethics guidelines around the world.361 Such documents don't have the force of law but attempt to shape norms for organizations and/or individual practitioners. While self-regulation has been effective in some fields such as medicine, it is doubtful whether AI self-regulation can address the thorny problems we have identified in this chapter. Indeed, industry self-regulation generally aims to forestall actual regulation and the structural shifts it may necessitate.362

361: Jobin, Ienca, and Vayena, "The Global Landscape of AI Ethics Guidelines," Nature Machine Intelligence 1, no. 9 (2019): 389–99.
362: For a deeper critique of industry-led statements of principles, see Greene, Hoffmann, and Stark, "Better, Nicer, Clearer, Fairer: A Critical Assessment of the Movement for Ethical Artificial Intelligence and Machine Learning," in Proceedings of the 52nd Hawaii International Conference on System Sciences, 2019.

Workforce interventions

Machine learning shifts power to machine learning experts, which makes the ML workforce an important locus of interventions.

One set of efforts is aimed at enabling more people to benefit from valuable job opportunities in the industry363 and at fighting imbalances of power within the workforce—notably, between technology experts and those who perform other roles such as annotation.364 Another set of efforts seeks to align the uses of ML with the ethical values of the ML workforce. The nascent unionization movement in technology companies seems to have both objectives.

363: Judd, "Activities for Building Understanding: How Ai4all Teaches AI to Diverse High School Students," in Proceedings of the 51st ACM Technical Symposium on Computer Science Education, 2020, 633–34.
364: "Announcing the Contract Worker Disparity Project" (Tech Equity Collaborative, 2021).

While a more diverse workforce is morally valuable for its own sake, it is interesting to ask what effect it has on the fairness of the resulting products. One experimental study of programmers found that the gender or race of programmers did not impact whether they produced biased code.365 However, this is a lab study and should not be seen as a guide to the effects of structural interventions. For example, one causal path by which workforce diversity could impact products (not captured in the study's design) is that a team with a diversity of perspectives may be more willing to ask critical questions about whether a product should be built or deployed.

365: Cowgill et al., "Biased Programmers? Or Biased Data? A Field Experiment in Operationalizing AI Ethics," in Proceedings of the 21st ACM Conference on Economics and Computation, 2020, 679–81.

Another workforce intervention is education and training. Ethics education for computer science students is on the rise, and a 2018 compilation included over 200 such courses.366 A long-standing debate concerns the relative merits of stand-alone courses and integration of ethics into existing computer science courses.367 Professional organizations such as the Association for Computing Machinery (ACM) have had codes of ethics for several decades, but it's unclear if these codes have had a meaningful impact on practitioners.

366: Fiesler, Garrett, and Beard, "What Do We Teach When We Teach Tech Ethics? A Syllabi Analysis," in Proceedings of the 51st ACM Technical Symposium on Computer Science Education, 2020, 289–95.
367: Martin et al., "Implementing a Tenth Strand in the CS Curriculum," Communications of the ACM 39, no. 12 (1996): 75–84.

In many professional fields, including some engineering fields, ethical responsibilities are enforced in part through licensing of practitioners. Professionals such as doctors and lawyers must master a body of professional knowledge, including ethical codes, are required by law to pass standardized exams before being licensed to practice, and may have that license revoked if they commit ethical transgressions. This is not the case for software engineering. At any rate, the software engineering certification standards that do exist have virtually no overlap with the topics in this book.368

368: Wikipedia contributors, "Certified Software Development Professional — Wikipedia, the Free Encyclopedia," 2021, https://en.wikipedia.org/w/index.php?title=Certified_Software_Development_Professional&oldid=1021729741.

The research community

The machine learning research community is another important locus for reform and transformation. The most significant push for change has been the ongoing fight to treat research topics such as fairness, ethics, and justice as legitimate and first-rate. Traditionally, a few topics in machine learning such as optimization algorithms have been considered "core" or "real" machine learning, while other topics—even dataset construction—have been seen as peripheral and less intellectually serious.

A few other key debates: Should all machine learning researchers be required to reflect on the ethics of their research?369 Is there too much of a focus on fixing bias as opposed to deeper questions about power and justice?370 How can the perspectives of people and communities affected by machine learning systems be centered? What is the role of industry research on fair machine learning, given the conflicts of interest?

369: Nanayakkara, Hullman, and Diakopoulos, "Unpacking the Expressed Consequences of AI Research in Broader Impact Statements," arXiv Preprint arXiv:2105.04760, 2021.
370: Kasy and Abebe, "Fairness, Equality, and Power in Algorithmic Decision-Making," in Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 2021, 576–86.

Organizational interventions for fairer decision making

The structural interventions we’ve discussed above require social movements or other collective action and have been evolving on a timescale of years to decades. This is not to say that an organization should throw up its hands and wait for structural shifts. A plethora of interventions are available to most types of decision makers. This section is an overview of the most important ones. As you read, observe that the majority of interventions attempt to improve outcomes for all decision subjects rather than viewing fairness as an inescapable tradeoff. One reason this is possible is that many of them don’t operate at the moment of decision. Note, also, that evaluating the effects of interventions—whether with respect to fairness or other metrics—generally requires causal inference. Finally, only a small subset of potential fairness interventions can be implemented in the framework of machine learning. The others focus on organizational or human practices rather than the technical subsystem involved in decision making.

Figure 34: A summary of major types of organizational interventions

Redistribution or reallocation

Redistribution and reallocation are terms that refer to interventions that modify a decision-making process to introduce an explicit preference for one or more groups, usually groups considered to be disadvantaged. When we talk about fairness interventions, this might be the kind that most readily comes to mind.

When applied to selection problems where there is a relatively static number of slots, as is typical in hiring or college admissions, a plethora of algorithmic fairness interventions reduce to different forms of reallocation. This includes techniques such as adding a fairness constraint to the optimization step, or a post-processing adjustment to improve the scores of the members of the disadvantaged groups (a minimal sketch of the latter appears below). This is true regardless of whether the goal is demographic parity or any other statistical criterion.
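As a concrete (and deliberately crude) illustration of the post-processing flavor, the sketch below picks a separate score threshold for each group so that both groups are selected at the same rate—one simple way to enforce demographic parity after the fact. Everything here is hypothetical: the score distributions are synthetic and the helper functions are illustrative, not a library API.

    import numpy as np

    def demographic_parity_thresholds(scores, groups, selection_rate):
        """Pick one threshold per group so each group is selected at the same rate."""
        thresholds = {}
        for g in np.unique(groups):
            group_scores = scores[groups == g]
            # Select the top `selection_rate` fraction within each group.
            thresholds[g] = np.quantile(group_scores, 1.0 - selection_rate)
        return thresholds

    def select(scores, groups, thresholds):
        return np.array([s >= thresholds[g] for s, g in zip(scores, groups)])

    # Synthetic example: group B's scores are shifted down, e.g. due to biased features.
    rng = np.random.default_rng(0)
    scores = np.concatenate([rng.normal(0.6, 0.15, 5000),    # group A
                             rng.normal(0.5, 0.15, 5000)])   # group B
    groups = np.array(["A"] * 5000 + ["B"] * 5000)

    single_threshold = np.quantile(scores, 0.8)              # select top 20% overall
    naive = scores >= single_threshold
    thresholds = demographic_parity_thresholds(scores, groups, selection_rate=0.2)
    adjusted = select(scores, groups, thresholds)

    for g in ("A", "B"):
        print(f"group {g}: selection rate, single threshold = {naive[groups == g].mean():.2f}, "
              f"per-group thresholds = {adjusted[groups == g].mean():.2f}")

An equivalent effect can be achieved by shifting scores rather than thresholds; either way, the ranking within each group is preserved, which is the property noted in the next paragraph.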

Reallocation is appealing because it doesn't require a causal understanding of why the disparity arose in the first place. By the same token, reallocation is a crude intervention. It is designed to benefit a group—and it has the advantage of providing a measure of transparency by allowing a quantification of the group benefit—but most reallocation procedures don't incorporate a notion of deservingness of members within that group. Often, reallocation is accomplished by a uniform preference for members of the disadvantaged group. Alternatively, it may be accomplished by tinkering with the optimization objective to incorporate a group preference. In this approach, distributing the fruits of reallocation within the group is delegated to the model, which may end up learning a non-intuitive and unintended allocation (for example, an intersectional subgroup may end up further disadvantaged compared to a no-intervention condition). At best, reallocation methods will aim to ensure that the relative ranking within groups is left unchanged.

As crude as reallocation is, another intervention with an even worse tradeoff is to omit features correlated with group identity from consideration. To be clear, if the feature is statistically, causally, or morally irrelevant, that may be a good reason for omitting it (Chapter 3). But what if the feature is in fact relevant to the outcome? For example, suppose that people who contribute to open-source software projects tend to be better software engineers. This effect acts through a morally relevant causal path because programmers obtain useful software-engineering skills through open-source participation. Unfortunately, many open-source communities are hostile and discriminatory to women and minorities (perhaps because they lack the formal organizational structures that firms use to keep interpersonal discrimination in check to some degree). Recognizing this bias, a software company could either explicitly account for it in hiring decisions or simply omit consideration of open-source contributions as a criterion. If it does the latter, it ends up with less qualified hires on average; it also disadvantages the people who braved discrimination to develop their skills, arguably the most deserving group. Omitting features based on statistical considerations without a moral or causal justification is extremely popular in practice because it is simple to implement, politically palatable, and avoids the legal risk of disparate treatment.

Combatting interpersonal discrimination

Rather than intervene directly on the outputs, organizations can try to improve the process of decision making. In many cases, discriminators are surprisingly candid about their prejudices in surveys and interviews.371 Can they perhaps be trained out of their implicit or overt biases? This is the idea behind prejudice reduction, often called diversity training.

371: Neckerman and Kirschenman, "Hiring Strategies, Racial Bias, and Inner-City Workers."

But does diversity training work? Paluck & Green conducted a massive review of nearly a thousand such interventions in 2009.372 The interventions include promoting contact with members of different groups, recategorization of social identity, explicit instruction, consciousness raising, targeting emotions, targeting value consistency and self-worth, learning, entertainment (reading, media), discussion and peer influence. Unfortunately, only a small fraction of the published studies reported on field experiments; Paluck & Green are dubious about both observational field studies and lab experiments. Overall, the field experiments don't provide much support for the effectiveness of diversity interventions. That said, there were many promising lab methods that hadn't yet been tested in the field.

372: Paluck and Green, "Prejudice Reduction: What Works? A Review and Assessment of Research and Practice," Annual Review of Psychology 60 (2009): 339–67.

A more recent review summarizes the research progress from 2007 to 2019.373

373: Paluck et al., "Prejudice Reduction: Progress and Challenges," Annual Review of Psychology 72 (2020).

Minimizing the role of human judgment via formalization

Approaches like implicit bias training seek to improve the judgment of human decision makers, but ultimately defer to that judgment. In contrast, formalization aims to curb judgment and discretion. The simplest formalization technique is to withhold the decision subject's identity (or other characteristics considered irrelevant) from the decision maker. Although this idea dates to antiquity, in many domains the adoption of blind evaluation is a recent phenomenon and has been made easier by technology.374 Two major limitations of this approach are the ubiquitous availability of proxies and the fact that anonymization is not feasible in many contexts, such as in-person hiring interviews.375

374: Chohlas-Wood et al., "Blind Justice: Algorithmically Masking Race in Charging Decisions" (Technical report, 2020).
375: Even in these contexts, blinding of attributes that are not readily inferrable can be effective. Indeed, it is frowned upon to inquire about candidates' marital status during job interviews, and such inquiries may be treated as evidence of intent to discriminate ("Pre-Employment Inquiries and Marital Status or Number of Children," U.S. Equal Employment Opportunity Commission, 2021).

A more ambitious approach is rule-based or statistical decision making that removes human discretion entirely. For example, removing lender discretion in loan underwriting was associated with a nearly 30% increase in the approval rates of minority and low-income applicants, while at the same time increasing predictive accuracy (of the risk of default).376 Human decision makers tend to selectively ignore credit history irregularities of White applicants.377

376: Gates, Perry, and Zorn, "Automated Underwriting in Mortgage Lending: Good News for the Underserved?" Housing Policy Debate 13, no. 2 (2002): 369–91.
377: Squires and others, Capital and Communities in Black and White: The Intersections of Race, Class, and Uneven Development (Suny Press, 1994).

In some ways, machine learning can be seen as a natural progression of the shift from human judgment to rule-based decision making. In machine learning, the discovery of the rule—and not just its application—is deferred to the data and implemented by an automated system. Based on this, one might naively hope that machine learning will be even more effective at minimizing discrimination. However, there are several counterarguments. First, claims of the superiority of statistical formulas over human judgment, at least in some domains, have been questioned as being based on apples-to-oranges comparisons, because the human experts did not view their role as pure prediction. For example, judges making sentencing decisions may consider the wishes of victims, and may treat youth as a morally exculpatory factor deserving of leniency.378 Second, there has been a recognition of all the ways in which machine learning can be discriminatory, which is of course a central theme of this book. Third, there are numerous potential drawbacks, such as a loss of explainability and structural effects that are not captured by the human-machine comparisons.

378: Stevenson and Doleac, "Algorithmic Risk Assessment in the Hands of Humans," Available at SSRN 3489440, 2019.

Perhaps most significantly, incomplete formalization can simply shift the abuse of discretion elsewhere. In Kentucky, the introduction of pretrial risk assessment increased racial disparities for defendants with the same predicted risk. The effect appears to be partly because of differential adoption of risk assessment in counties with different racial demographics, and partly because even the same judges are more likely to override the recommended decision for Black defendants compared to White defendants.379 In Ontario, social service caseworkers described how they manipulate the inputs to the automated system to get the outcomes they want.380, 381 In Los Angeles, police officers used many strategies to resist being managed by predictive policing algorithms.382

379: Stevenson, "Assessing Risk Assessment in Action," Minn. L. Rev. 103 (2018): 303; Albright, "If You Give a Judge a Risk Score: Evidence from Kentucky Bail Decisions," Harvard John M. Olin Fellow's Discussion Paper 85 (2019).
380: Raso, "Displacement as Regulation: New Regulatory Technologies and Front-Line Decision-Making in Ontario Works," Canadian Journal of Law and Society 32, no. 1 (2017): 75–95.
381: Caseworkers report doing so in order to work around the limitations and non-transparency of the automated system to achieve just outcomes for clients. The difficulty of distinguishing between abuse of discretion and working around an overly rigid system further illustrates the double-edged nature of formalization as a fairness intervention.
382: Brayne, Predict and Surveil: Data, Discretion, and the Future of Policing (Oxford University Press, USA, 2020).

One way in which such laws can encode race is the 100-to-1 sentencing disparity between powder and crack cocaine, the popularity of the two forms of the same drug differing by income and socioeconomic status (Palamar et al., "Powder Cocaine and Crack Use in the United States: An Examination of Risk for Arrest and Socioeconomic Disparities in Use," Drug and Alcohol Dependence 149 (2015): 108–16). A very different kind of example comes from Google, which has had a vaunted, highly formalized process for recruiting in order to combat bias and enhance the quality of decisions (Google re:work team, "Guide: Hire by Committee," https://rework.withgoogle.com/print/guides/6053596147744768/, 2021). But recruiters have argued that this process in fact bakes in bias because it incorporates a ranking of colleges in which Historically Black Colleges and Universities are not ranked at all (Natasha Tiku, "Google's Approach to Historically Black Schools Helps Explain Why There Are Few Black Engineers in Big Tech," Washington Post, 2021).

The Harvard admissions lawsuit from Chapter 4 is another case study of formalization versus holistic decision making. Plaintiffs point out that the admissions criteria include subjective assessments of personality traits such as likability, integrity, helpfulness, kindness, and courage. Harvard scored Asian-American applicants on average far lower on these traits than any other racial group. Harvard, on the other hand, argues that evaluating the "whole person" is important to identify those with unique life experiences that would contribute to campus diversity, and that a consideration of subjective traits is a necessary component of this evaluation.

Procedural protections

Diversity training and formalization are examples of procedural fairness interventions. There are many other procedural protections: notably, making the process transparent, providing explanations of decisions, and allowing decision subjects to contest decisions that may have been made in error. As we discussed above, procedural protections are more important when machine learning is involved than for other types of automated systems.

United States law emphasizes procedural fairness over outcomes. This is one reason for the great popularity of diversity training despite its questionable effectiveness (Edelman, "Law at Work: The Endogenous Construction of Civil Rights," in Handbook of Employment Discrimination Research (Springer, 2005), 337–52). When the decision maker is the government, the legal conception of fairness is even more focused on procedure. For example, there is no notion of disparate impact under United States constitutional law.

While some procedural interventions such as diversity training have been widely adopted, many others remain rare despite their obvious fairness benefits. For example, few employers offer candid explanations for job rejection. Decision makers turning to automated systems are often looking to cut costs, and may hence be especially loath to adopt procedural protections.

There are many examples of fairness concerns with automated systems for which only procedural protections can be an effective

remedy (other than scrapping the system altogether). For example, Google's policy is to suspend users across its entire suite of services if they violate its terms of service. There are many anecdotal reports from users who have lost years' worth of personal and professional data, who insist that Google's decision was made in error, and who report that Google's appeal process did not result in a meaningful human review of the decision.

Outreach

The rest of the interventions are not about changing the decision making process (or outcomes). Instead, they change something about the decision subjects or the organizational environment.

A recent study sought to address the puzzling phenomenon that low-income students tend not to attend highly selective colleges, even when their strong academic credentials qualify them for admission and despite the availability of financial aid that would make it cheaper to attend a selective institution (Dynarski et al., "Closing the Gap: The Effect of a Targeted, Tuition-Free Promise on College Choices of High-Achieving, Low-Income Students" (National Bureau of Economic Research, 2018)). The authors designed an intervention in which they sent flyers to low-income high-school students informing them about a new scholarship at the University of Michigan, and found that, compared to a control group, these students were more than twice as likely both to apply and to enroll at the University. The effect was entirely due to students who would have otherwise attended less selective colleges or not attended college at all. The targets of outreach were highly qualified students identified based on standardized test scores (ACT and SAT), which allowed the university to guarantee financial aid conditional on admission. It is worth reiterating that this was a purely informational intervention: the scholarship was equally available to students in the control group, who received only postcards listing University of Michigan application deadlines.

To the extent that disparities are due to disadvantaged groups lacking knowledge of opportunities, informational interventions should decrease those disparities, but this point doesn't appear to be well researched. For example, the Michigan study targeted the intervention at low-income students, so it doesn't address the question of whether informing all students would close the income gap.

Intervening on causal factors

If we understand the causal factors that lead to underperformance of some individuals or groups, we can intervene to mitigate them. Like informational interventions, this approach seeks to help all individuals rather than simply minimize disparities.

This type of intervention is extremely common. Some examples: job training programs for formerly incarcerated people to improve welfare and decrease the chances of recidivism; efforts to bolster math and science education to address an alleged labor shortage of engineers (a so-called pipeline problem); and essentially all of public health and preventive healthcare. The use of randomized controlled trials to identify and intervene on the causes of poverty has been so influential in development economics that it led to the 2019 Nobel Prize in economics for Duflo, Banerjee, and Kremer.

In a competitive market, such as an employer competing for workers, this intervention may not pay off for an individual decision maker from an economic perspective: job seekers who have benefited from the intervention may choose to join other firms instead. Many approaches have been used to overcome this misalignment of incentives. Firms may act collectively, or the state may fund the intervention. If a firm is large enough, the overall payoffs could be high enough relative to the cost of the intervention to justify it, or the reputational benefit to the firm may be sufficient to do so.

Modifying the organizational environment

If decision makers have many opportunities to intervene before the point of decision (e.g. hiring), they also have opportunities to intervene after that point to ensure that individuals fulfill their potential. If a firm finds that few minority employees are successful, it may be because the workplace is hostile and discriminatory. In other cases, some individuals or groups may need additional accommodations to remedy past disadvantages or because of morally irrelevant differences. A few examples: remedial courses for disadvantaged students, a peer group for first-time college students, need-based scholarships, a nursing mother's room in a workplace, and disability accommodations.

Accommodation isn't simply redistribution in disguise: it does not (or need not) involve an explicit preference for the disadvantaged group. Even if the accommodation is made available to everyone, the disadvantaged group will preferentially benefit from it. This is obvious in the case of, say, disability accommodations. In other cases this is less obvious, but no less true. Even if financial aid were available to all students at a university, it would differentially benefit low-income students.

However, the actual effects of accommodations can be hard to predict and must be carefully measured empirically.

A notable example comes from a study showing that men benefit from gender-neutral tenure clock stopping policies (Antecol, Bedard, and Stearns, "Equal but Inequitable: Who Benefits from Gender-Neutral Tenure Clock Stopping Policies?" American Economic Review 108, no. 9 (2018): 2420–41). Such policies in universities allow both men and women to add time to the tenure clock with the birth of a child. While they are often adopted in the interest of fairness, the study shows that they increase men's tenure rates and lower women's, presumably because men are able to be more productive during their extended time due to differences in child-care responsibilities or the impact of the birth itself. (Note, however, that the policy has two fairness goals: to mitigate the adverse career impact of childbirth and to decrease gender disparities in said impacts. Presumably the policy still meets the first goal even if it fails the second.)

Here's a stark example of how organizational policies can cause people to fail and how easily they can be remedied. In New York City, there are approximately 300,000 cases of low-level offenses every year. The defendants are required to appear in court (for offenses of the lowest severity, the summons may be resolved by mail); if they fail to appear, arrest warrants are automatically issued. Historically, a remarkable 40% of defendants fail to appear in court. The resulting negative consequences of Failure to Appear (FTA) are both severe and unequally distributed: for instance, members of groups that are subject to overpolicing are more likely to be arrested. Remarkably, a study found that FTA rates decreased from 41% to 26% simply by redesigning the summons form to be less confusing and sending defendants text messages shortly before their court dates (Fishbane, Ouss, and Shah, "Behavioral Nudges Reduce Failure to Appear for Court," Science 370, no. 6517 (2020))!

Summary

We looked at seven broad types of fairness interventions that organizations can deploy. The majority of these interventions potentially improve opportunity for all decision subjects, as they are motivated by some underlying injustice rather than merely mitigating some disparity. In fact, interventions that aim to address an underlying injustice might sometimes increase certain disparities between groups, a possibility that would be morally justified under a non-comparative notion of fairness that calls for treating each subject as they ought to be treated (Hellman, "Two Concepts of Discrimination," Va. L. Rev. 102 (2016): 895).

Comparative notions of fairness are appealing to focus on because they are easy to quantify, but we shouldn't forget the deeper questions. A domain where this seems to have happened is algorithmic hiring. Tools used in algorithmic hiring utilize situational judgment tests, personality tests, and sometimes much more dubious techniques, increasingly involving machine learning, for screening and selecting candidates. Firms adopt such tools to cut recruitment costs, especially for low-wage positions where the cost of hiring a worker through the traditional process can be seen as significant in relation to the worker's contribution to the firm's revenue over the course of their employment.

These tools are problematic for many reasons. While they aim to formalize the hiring process, they often use attributes that are morally and causally irrelevant to job performance.

HireVue, for example, previously relied on facial expressions and intonations in a person's voice as part of its automated assessment. These tools also fail to take a broad view of discrimination. Focusing narrowly on minimizing disparities in hiring rates across groups leaves unaddressed what kind of environment employees will encounter once hired. If job applicants from certain groups were previously predicted to perform poorly in a certain workplace, the employer should strive to understand the reasons for this difference in success, rather than simply trying to find members of these groups who might be able to succeed under such unfavorable, unwelcoming, or hostile conditions. Parity-promoting interventions change the selection process, but preserve the organizational status quo, endorsing the idea that the candidates who have been selected should be able to deal with these conditions sufficiently well to be as productive as their peers who don't face similar challenges. Other productive, and potentially less harmful, forms of intervention include on-the-job training (which might be understood as a way of intervening on causal factors), meaningful feedback for rejected applicants (which would provide some degree of procedural protection, but also help guide applicants' future investment in their own development), and a strategic approach to sourcing candidates whom firms with more accurate tools might now be better able to assess.

The narrow focus on disparities can mean that there is little consideration of the quality of decisions made by the tools. Tools that simply lack validity raise a host of normative concerns. Notably, assessments that achieve approximate demographic parity but continue to suffer from accuracy disparity (also called differential validity) can set members of certain groups up for failure by expecting them to perform better than they are currently prepared to (see also the discussion of the limitations of independence as a fairness criterion in Chapter 2).

To reiterate, we do not advocate for treating statistical fairness criteria as constraints, at least in the first instance. That approach assumes that reallocation is the only available intervention. Instead, if we treat statistical fairness criteria as diagnostics, we are likely to uncover deeper problems that require remedying. Unfortunately, these deeper remedies are also harder. They require both causal inference and normative depth. That is of course why they are often ignored, and foundational questions remain unaddressed.

A case in point: a 2021 paper analyzes the fairness of pretrial detention in a non-comparative sense (Stevenson and Mayson, "Pretrial Detention and the Value of Liberty," Virginia Public Law and Legal Theory Research Paper, no. 2021–14 (2021)). How risky does a defendant have to be so that the expected benefit to public safety justifies the harm to the defendant from detention? Using the clever approach of asking survey respondents to choose between being detained and becoming victims of certain crimes, the authors conclude that pretrial detention is essentially never justified.

The study’s method is sure to be debated, but the point remains that there have been relatively few principled, quantitative attempts to justify the risk thresholds used in pretrial detention. There have been many other calls to end pretrial detention based on different moral and legal arguments. When such foundational questions continue to be debated, it would be exceedingly premature to declare a risk-based pretrial detention system to be “fair” because it satisfied some statistical criterion.

Appendix: a deeper look at structural factors

Let us briefly discuss two phenomena that help explain the long-run persistence of inequality: segregation and feedback loops.

The role of segregation

A structural factor that exacerbates all of the mechanisms of discrimination we discussed is the segregation of society along the lines of group identity. Segregation arguably enables interpersonal discrimination because increased contact among groups decreases prejudice toward outgroups, the controversial contact hypothesis (Paluck, Green, and Green, "The Contact Hypothesis Re-Evaluated," Behavioural Public Policy 3, no. 2 (2019): 129–58). At a structural level, segregation sustains inequality because an individual's opportunities for economically productive activities depend on her social capital, including the home, community, and educational environment.

A strand of the economics literature has built mathematical models and simulations to understand how group inequalities, especially racial inequalities, arise and persist indefinitely even in the absence of interpersonal discrimination, and despite no intrinsic differences between groups (a pioneering work in this area is Loury, "A Dynamic Theory of Racial Income Differences," Discussion Paper, 1976). In the extreme case, if we imagine two or more groups belonging to non-interacting economies that grow at the same rate, it is intuitively clear that differences can persist indefinitely. If segregation is imperfect, do gaps eventually close? This is sensitive to the assumptions in the model. In Lundberg and Startz's model the gaps close eventually, although extremely slowly (Lundberg and Startz, "On the Persistence of Racial Inequality," Journal of Labor Economics 16, no. 2 (1998): 292–323). In Bowles et al.'s model, they don't under some conditions; one reason is that the disadvantaged group might face higher costs of labor-market skill acquisition due to lower social capital (Bowles and Sethi, "Social Segregation and the Dynamics of Group Inequality," 2006).

In the United States, after the civil rights legislation of the 1960s and 70s, residential segregation by race has been decreasing, albeit slowly. On the other hand, residential segregation by income appears to be increasing (Massey, Rothwell, and Domina, "The Changing Bases of Segregation in the United States," The Annals of the American Academy of Political and Social Science 626, no. 1 (2009): 74–90).

The role of feedback loops

There is a classic economic model of feedback loops in the context of a labor market (Arrow, "The Theory of Discrimination"). There are two groups of workers and two types of jobs: high and low skilled, with high-skilled jobs requiring certain qualifications to perform effectively. Under suitable assumptions (especially, that employers cannot perfectly observe worker qualifications before hiring them, but only after providing costly on-the-job skills training), there exists an economic equilibrium in which the following feedback loop sustains itself:

1. The employer practices wage discrimination between the two groups.
2. As a result, the disadvantaged group achieves lower returns to investment in qualifications.
3. Workers, assumed to be rational, respond to such a differential by investing differently in acquiring qualifications, with one group acquiring more qualifications.
4. The employer (again, under certain rationality assumptions) wage discriminates because of the observed difference in qualifications.

The significance of this model is that it can explain the persistence of inequality (and discrimination) without assuming intrinsic differences between the groups, and without employers discriminating between equally qualified workers. It should be viewed as showing only the possibility of such feedback loops. As with any theoretical model, a claim that such a feedback loop explains some actually observed disparity would require careful empirical validation.
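To see how a loop of this kind can sustain a gap, consider the following toy simulation. It is a loose sketch in the spirit of such models rather than Arrow's model itself; the functional form and constants are arbitrary assumptions chosen only so that multiple stable equilibria exist. Two groups that differ only in their starting conditions settle at different qualification rates, and therefore different wages, indefinitely.

    # Toy two-group dynamical system loosely in the spirit of statistical
    # discrimination feedback loops (not Arrow's model; all constants are
    # arbitrary). The wage offered to a group tracks its observed
    # qualification rate, and the qualification rate next period depends
    # on the return to investing, i.e., on the offered wage.
    import math

    def invest(wage, a=8.0, c=4.0):
        """Fraction of a group acquiring qualifications at a given wage."""
        return 1.0 / (1.0 + math.exp(-(a * wage - c)))

    def simulate(q0, steps=50):
        q = q0
        for _ in range(steps):
            wage = q          # employer pays in proportion to the group's
                              # observed qualification rate
            q = invest(wage)  # rational workers respond to the return
        return q

    # No intrinsic differences between the groups, only different histories.
    print("group A settles at", round(simulate(0.3), 3))  # low equilibrium
    print("group B settles at", round(simulate(0.7), 3))  # high equilibrium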

8 Datasets

It's become commonplace to point out that machine learning models are only as good as the data they're trained on. The old "garbage in, garbage out" no doubt applies to machine learning practice, as does the related catchphrase "bias in, bias out." Yet, these proverbs still understate, and somewhat misrepresent, the significance of data for machine learning.

It's not only the output of a learning algorithm that may suffer with poor input data. A dataset serves many other vital functions in the machine learning ecosystem. The dataset itself is an integral part of the problem formulation. It implicitly sorts out and operationalizes what the problem is that practitioners end up solving. Datasets have also shaped the course of entire scientific communities in their capacity to measure and benchmark progress, support competitions, and interface between researchers in academia and practitioners in industry.

If so much hinges on data in machine learning, it might come as a surprise that there is no simple answer to the question of what makes data good for what purpose. The collection of data for machine learning applications has not followed any established theoretical framework, certainly not one that was recognized a priori.

In this chapter, we take a closer look at popular datasets in the field of machine learning and the benchmarks that they support. We will use this to tease apart the different roles datasets play in scientific and engineering contexts. Then we will review the harms associated with data and discuss how they can be mitigated based on the dataset's role. We will conclude with several broad directions for improving data practices.

We limit the scope of this chapter in some important ways. Our focus will be largely on publicly available datasets that support training and testing purposes in machine learning research and applications. Our focus excludes large swaths of industrial data collection, surveillance, and data mining practices. It also excludes

data purposefully collected to test specific scientific hypotheses, such as experimental data gathered in a medical trial.

A tour of datasets in different domains

The creation of datasets in machine learning does not follow a clear theoretical framework. Datasets aren't collected to test a specific scientific hypothesis. In fact, we will see that there are many different roles data plays in machine learning. As a result, it makes sense to start by looking at a few influential datasets from different domains to get a better feeling for what they are, what motivated their creation, how they organized communities, and what impact they had.

TIMIT

Automatic speech recognition is a machine learning problem of significant commercial interest. Its roots date back to the early 20th century (Li and Mills, "Vocal Features: From Voice Identification to Speech Recognition by Machine," Technology and Culture 60, no. 2 (2019): S129–60).

Interestingly, speech recognition also features one of the oldest benchmark datasets, the TIMIT (Texas Instruments/Massachusetts Institute of Technology) data. The creation of the dataset was funded through a 1986 DARPA program on speech recognition. In the mid-eighties, artificial intelligence was in the middle of a "funding winter" in which many governmental and industrial agencies were hesitant to sponsor AI research because it often promised more than it could deliver. DARPA program manager Charles Wayne proposed that a way around this problem was establishing more rigorous evaluation methods. Wayne enlisted the National Institute of Standards and Technology to create and curate shared datasets for speech, and he graded success in his program based on performance on recognition tasks on these datasets.

Many now credit Wayne's program with kick-starting a revolution of progress in speech recognition (Liberman, "Fred Jelinek," Computational Linguistics 36, no. 4 (2010): 595–99; Church, "Emerging Trends: A Tribute to Charles Wayne," Natural Language Engineering 24, no. 1 (2018): 155–60; Liberman and Wayne, "Human Language Technology," AI Magazine 41, no. 2 (2020)). According to Kenneth Ward Church,

It enabled funding to start because the project was glamour-and-deceit-proof, and to continue because funders could measure progress over time. Wayne's idea makes it easy to produce plots which help sell the research program to potential sponsors. A less obvious benefit of Wayne's idea is that it enabled hill climbing. Researchers who had initially objected to being tested twice a year began to evaluate themselves every hour.

A first prototype of the TIMIT dataset was released in December of 1988 on a CD-ROM. An improved release followed in October 1990.

TIMIT already featured the training/test split typical of modern machine learning benchmarks. There's a fair bit we know about the creation of the data due to its thorough documentation (Garofolo et al., "DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus CD-ROM. NIST Speech Disc 1-1.1," NASA STI/Recon Technical Report N 93 (1993): 27403). TIMIT features a total of about 5 hours of speech, composed of 6300 utterances, specifically, 10 sentences spoken by each of 630 speakers. The sentences were drawn from a corpus of 2342 sentences such as the following.

She had your dark suit in greasy wash water all year. (sa1)
Don't ask me to carry an oily rag like that. (sa2)
This was easy for us. (sx3)
Jane may earn more money by working hard. (sx4)
She is thinner than I am. (sx5)
Bright sunshine shimmers on the ocean. (sx6)
Nothing is as offensive as innocence. (sx7)

The TIMIT documentation distinguishes between 8 major dialect regions in the United States: New England, Northern, North Midland, South Midland, Southern, New York City, Western, and Army Brat (moved around). Of the speakers, 70% are male and 30% are female. All native speakers of American English, the subjects were primarily employees of Texas Instruments at the time. Many of them were new to the Dallas area where they worked.

Racial information was supplied with the distribution of the data and coded as "White," "Black," "American Indian," "Spanish-American," "Oriental," and "Unknown." Of the 630 speakers, 578 were identified as White, 26 as Black, 2 as American Indian, 2 as Spanish-American, 3 as Oriental, and 17 as unknown.

Table 15: Demographic information about the TIMIT speakers

                   Male   Female   Total (%)
White               402      176   578 (91.7%)
Black                15       11    26 (4.1%)
American Indian       2        0     2 (0.3%)
Spanish-American      2        0     2 (0.3%)
Oriental              3        0     3 (0.5%)
Unknown              12        5    17 (2.6%)

The documentation notes:

In addition to these 630 speakers, a small number of speakers with foreign accents or other extreme speech and/or hearing abnormalities were recorded as “auxiliary” subjects, but they are not included on the CD-ROM.

It comes as no surprise that early speech recognition models had significant demographic and racial biases in their performance.

Today, several major companies, including Amazon, Apple, Google, and Microsoft, all use speech recognition models in a variety of products from cell phone apps to voice assistants. There is no longer a major open benchmark that would support training models competitive with their industrial counterparts. Industrial speech recognition pipelines are generally complex and use proprietary data sources that we don't know a lot about.

Nevertheless, today's speech recognition systems continue to exhibit performance disparities along racial lines (Koenecke et al., "Racial Disparities in Automated Speech Recognition," Proceedings of the National Academy of Sciences 117, no. 14 (2020): 7684–89).

UCI Machine Learning Repository

The UCI Machine Learning Repository currently hosts more than 500 datasets, mostly for different classification and regression tasks. Most datasets are relatively small, consisting of a few hundred or a few thousand instances. The majority are structured tabular datasets with a handful or a few tens of attributes.

The UCI Machine Learning Repository contributed to the adoption of the train-test paradigm in machine learning in the late 1980s. Pat Langley recalls:

The experimental movement was aided by another development. David Aha, then a PhD student at UCI, began to collect data sets for use in empirical studies of machine learning. This grew into the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/), which he made available to the community by FTP in 1987. This was rapidly adopted by many researchers because it was easy to use and because it let them compare their results to previous findings on the same tasks. (Langley, "The Changing Science of Machine Learning" (Springer, 2011))

The most popular dataset in the repository is the Iris Data Set containing taxonomic measurements of 150 iris flowers, 50 from each of 3 species. The task is to classify the species given the measurements.

The second most popular dataset in the UCI repository (as of October 2020) is the Adult dataset. Extracted from the 1994 Census database, it features nearly 50,000 instances describing individuals in the United States, each having 14 attributes. The task is to classify whether an individual earns more than 50,000 US dollars or less.

The Adult dataset remains popular in the algorithmic fairness community, largely because it is one of the few publicly available datasets that features demographic information including gender (coded in binary as male/female), as well as race (coded as Amer-Indian-Eskimo, Asian-Pac-Islander, Black, Other, and White). Unfortunately, the data has some idiosyncrasies that make it less than ideal for understanding biases in machine learning models. Due to the age of the data, and the income cutoff at $50,000, almost all instances labeled Black are below the cutoff, as are almost all instances labeled female.

Indeed, a standard logistic regression model trained on the data achieves about 85% accuracy overall, while the same model achieves 91% accuracy on Black instances, and nearly 93% accuracy on female instances. Likewise, the ROC curves for the latter two groups actually enclose more area than the ROC curve for male instances. This is an atypical situation: more often, machine learning models perform worse on historically disadvantaged groups.
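The following minimal sketch shows the kind of check described above: it trains a logistic regression on the Adult data and compares accuracy across groups. The download URL and column names assume the standard UCI distribution of the dataset, and the exact accuracy figures will vary with preprocessing choices, so they will only roughly match the numbers quoted in the text.

    # Sketch: per-group accuracy of a logistic regression on the Adult data.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    cols = ["age", "workclass", "fnlwgt", "education", "education-num",
            "marital-status", "occupation", "relationship", "race", "sex",
            "capital-gain", "capital-loss", "hours-per-week",
            "native-country", "income"]
    url = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
    df = pd.read_csv(url, names=cols, skipinitialspace=True)

    y = (df["income"] == ">50K").astype(int)       # binary target
    X = pd.get_dummies(df.drop(columns=["income"]))  # one-hot encode categoricals

    X_tr, X_te, y_tr, y_te, df_tr, df_te = train_test_split(
        X, y, df, test_size=0.3, random_state=0)

    model = LogisticRegression(max_iter=2000).fit(X_tr, y_tr)

    print("overall accuracy:", model.score(X_te, y_te))
    for group in ["Black", "White"]:
        mask = df_te["race"] == group
        print(group, "accuracy:", model.score(X_te[mask], y_te[mask]))
    for group in ["Female", "Male"]:
        mask = df_te["sex"] == group
        print(group, "accuracy:", model.score(X_te[mask], y_te[mask]))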

MNIST

The MNIST dataset contains images of handwritten digits. Its most common version has 60,000 training images and 10,000 test images, each a 28x28-pixel grayscale image.

Figure 35: A sample of MNIST digits

MNIST was created by researchers Burges, Cortes, and LeCun from an earlier dataset released by the National Institute of Standards and Technology (NIST). The dataset was introduced in a research paper in 1998 to showcase the use of gradient-based deep learning methods for document recognition tasks (LeCun et al., "Gradient-Based Learning Applied to Document Recognition," Proceedings of the IEEE 86, no. 11 (1998): 2278–2324). Since then cited over 30,000 times, MNIST became a highly influential benchmark in the computer vision community. Two decades later, researchers continue to use the data actively.

The original NIST data had the property that training and test data came from two different populations. The former featured the handwriting of two thousand American Census Bureau employees, whereas the latter came from five hundred American high school students (Grother, "NIST Special Database 19," Handprinted Forms and Characters Database, National Institute of Standards and Technology, 1995, 10). The creators of MNIST reshuffled these two data sources and split them into training and test set. Moreover, they scaled and centered the digits.

The exact procedure to derive MNIST from NIST was lost, but was recently reconstructed by matching images from both data sources (Yadav and Bottou, "Cold Case: The Lost MNIST Digits," arXiv preprint arXiv:1905.10498, 2019). The original MNIST test set was of the same size as the training set, but the smaller test set became standard in research use. The 50,000 digits in the original test set that didn't make it into the smaller test set were later identified and dubbed the lost digits (Yadav and Bottou).

From the beginning, MNIST was intended to be a benchmark used to compare the strengths of different methods. For several years, LeCun maintained an informal leaderboard on a personal website that listed the best accuracy numbers that different learning algorithms achieved on MNIST.

Table 16: A snapshot of the original MNIST leaderboard from February 2, 1999. Source: Internet Archive (Retrieved: December 4, 2020)

Method                                          Test error (%)
linear classifier (1-layer NN)                           12.0
linear classifier (1-layer NN) [deskewing]                8.4
pairwise linear classifier                                7.6
K-nearest-neighbors, Euclidean                            5.0
K-nearest-neighbors, Euclidean, deskewed                  2.4
40 PCA + quadratic classifier                             3.3
1000 RBF + linear classifier                              3.6
K-NN, Tangent Distance, 16x16                             1.1
SVM deg 4 polynomial                                      1.1
Reduced Set SVM deg 5 polynomial                          1.0
Virtual SVM deg 9 [distortions]                           0.8
2-layer NN, 300 hidden units                              4.7
2-layer NN, 300 HU, [distortions]                         3.6
2-layer NN, 300 HU, [deskewing]                           1.6
2-layer NN, 1000 hidden units                             4.5
2-layer NN, 1000 HU, [distortions]                        3.8
3-layer NN, 300+100 hidden units                          3.05
3-layer NN, 300+100 HU [distortions]                      2.5
3-layer NN, 500+150 hidden units                          2.95
3-layer NN, 500+150 HU [distortions]                      2.45
LeNet-1 [with 16x16 input]                                1.7
LeNet-4                                                   1.1
LeNet-4 with K-NN instead of last layer                   1.1
LeNet-4 with local learning instead of ll                 1.1
LeNet-5, [no distortions]                                 0.95
LeNet-5, [huge distortions]                               0.85
LeNet-5, [distortions]                                    0.8
Boosted LeNet-4, [distortions]                            0.7

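For a sense of what the simplest rows of this leaderboard correspond to in modern tooling, the following hedged sketch trains a plain multinomial logistic regression (a linear classifier) on raw pixels and reports its test error on the standard 10,000-image test split. The "mnist_784" dataset name and the convention that the first 60,000 rows form the training set refer to the OpenML copy of MNIST, not the original distribution, and the error of roughly 7-10% is indicative only, broadly in line with the linear-classifier rows above.

    # Sketch: a linear-classifier baseline on MNIST via the OpenML copy.
    from sklearn.datasets import fetch_openml
    from sklearn.linear_model import LogisticRegression

    X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
    X = X / 255.0                         # scale pixel values to [0, 1]
    X_train, y_train = X[:60000], y[:60000]
    X_test, y_test = X[60000:], y[60000:]

    clf = LogisticRegression(max_iter=100, solver="saga", tol=0.1)
    clf.fit(X_train, y_train)
    print("test error (%):", 100 * (1 - clf.score(X_test, y_test)))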

In its capacity as a benchmark, it became a showcase for the

emerging kernel methods of the early 2000s that temporarily achieved top performance on MNIST (DeCoste and Schölkopf, "Training Invariant Support Vector Machines," Machine Learning 46, no. 1 (2002): 161–90). Today, it is not difficult to achieve less than 0.5% classification error with a wide range of convolutional neural network architectures. The best models classify all but a few pathological test instances correctly. As a result, MNIST is widely considered too easy for today's research tasks.

MNIST wasn't the first dataset of handwritten digits in use for machine learning research. Earlier, the US Postal Service (USPS) had released a dataset of 9298 images (7291 for training, and 2007 for testing).

The USPS data was actually a fair bit harder to classify than MNIST. A non-negligible fraction of the USPS digits look unrecognizable to humans (Bromley and Sackinger, "Neural-Network and k-Nearest-Neighbor Classifiers," Rapport Technique, 1991, 11359–910819), whereas humans recognize essentially all digits in MNIST.

ImageNet

ImageNet is a large repository of labeled images that has been highly influential in computer vision research over the last decade. The image labels correspond to nouns from the WordNet lexical database of the English language (Miller, WordNet: An Electronic Lexical Database (MIT Press, 1998)). WordNet groups nouns into cognitive synonyms, called synsets. The words car and automobile, for example, would fall into the same synset. On top of these categories WordNet provides a hierarchical tree structure according to a super-subordinate relationship between synsets. The synset for chair, for example, is a descendant of the synset for furniture in the WordNet hierarchy. WordNet existed before ImageNet and in part inspired the creation of ImageNet.
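To make the synset and hypernym structure concrete, here is a minimal sketch using NLTK's interface to WordNet; NLTK is an assumption of this sketch and is not part of the ImageNet pipeline itself. It shows that car and automobile share a synset and that chair sits below furniture in the hierarchy.

    # Sketch: exploring WordNet synsets and hypernyms with NLTK.
    import nltk
    nltk.download("wordnet", quiet=True)
    from nltk.corpus import wordnet as wn

    # "car" and "automobile" are lemmas of the same synset.
    car = wn.synset("car.n.01")
    print(car.lemma_names())   # includes 'car', 'auto', 'automobile', ...

    # Walk the super-subordinate (hypernym) paths from "chair" toward the root.
    chair = wn.synset("chair.n.01")
    for path in chair.hypernym_paths():
        print(" -> ".join(s.name() for s in path))
    # One of the printed paths passes through seat.n.03 and furniture.n.01,
    # which is the sense in which "chair" sits below "furniture".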

The initial release of ImageNet included about 5000 image categories, each corresponding to a synset in WordNet. These ImageNet categories averaged about 600 images per category (Deng et al., "Imagenet: A Large-Scale Hierarchical Image Database," in 2009 IEEE Conference on Computer Vision and Pattern Recognition (IEEE, 2009), 248–55). ImageNet grew over time and its Fall 2011 release had reached about 32,000 categories.

The construction of ImageNet required two essential steps: retrieving candidate images for each synset, and labeling the retrieved images. The first step utilized online search engines and photo sharing platforms with a search interface, specifically, Flickr. Candidate images were taken from the image search results associated with the synset nouns for each category.

For the second labeling step, the creators of ImageNet turned to Amazon's Mechanical Turk platform (MTurk). MTurk is an online labor market that allows individuals and corporations to hire on-demand workers to perform simple tasks. In this case, MTurk workers were presented with candidate images and had to decide whether or not the candidate image was indeed an image corresponding to the category that it was putatively associated with.

It is important to distinguish between this ImageNet database and a popular machine learning benchmark and competition, called the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), that was derived from it (Russakovsky et al., "Imagenet Large Scale Visual Recognition Challenge," International Journal of Computer Vision 115, no. 3 (2015): 211–52). The competition was organized yearly from 2010 until 2017, reaching significant notoriety in both industry and academia, especially as a benchmark for emerging deep learning models.

When machine learning practitioners say "ImageNet" they typically refer to the data used for the image classification task in the 2012 ILSVRC benchmark. The competition included other tasks, such as object recognition, but image classification has become the most popular task for the dataset. Expressions such as "a model trained on ImageNet" typically refer to training an image classification model on the benchmark dataset from 2012. Another common practice involving the ILSVRC data is pre-training. Often a practitioner has a specific classification problem in mind whose label set differs from the 1000 classes present in the data. It's possible nonetheless to use the data to create useful features that can then be used in the target classification problem. Where ILSVRC enters real-world applications it's often to support pre-training.

This colloquial use of the word ImageNet can lead to some confusion, not least because the ILSVRC-2012 dataset differs significantly from the broader database. It only includes a subset of 1000 categories. Moreover, these categories are a rather skewed subset of the broader ImageNet hierarchy. For example, of these 1000 categories only three are in the person branch of the WordNet hierarchy, specifically, groom, baseball player, and scuba diver. Yet, more than 100 of the 1000 categories correspond to different dog breeds (the number is 118, to be exact, not counting wolves, foxes, and wild dogs that are also present among the 1000 categories).

What motivated the exact choice of these 1000 categories is not entirely clear. The apparent canine inclination, however, isn't just a quirk either. At the time, there was an interest in the computer vision community in making progress on prediction with many classes, some of which are very similar. This reflects a broader pattern in the machine learning community. The creation of datasets is often driven by an intuitive sense of what the technical challenges are for the field. In the case of ImageNet, another important consideration was scale, both in terms of the number of images and the number of classes.

The large-scale annotation and labeling that went into ImageNet falls into a category of labor that anthropologist Gray and computer scientist Suri call ghost work in their book of the same name (Gray and Suri, Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass (Eamon Dolan Books, 2019)). They point out:

MTurk workers are the AI revolution's unsung heroes.

Indeed, ImageNet was labeled by about 49,000 MTurk workers from 167 countries over the course of multiple years.

The Netflix Prize

The Netflix Prize was one of the most famous machine learning competitions. Starting on October 2, 2006, the competition ran for nearly three years, ending with a grand prize of $1M, announced on September 18, 2009. Over the years, the competition saw 44,014 submissions from 5169 teams.

The Netflix training data contained roughly 100 million movie ratings from nearly 500 thousand Netflix subscribers on a set of 17770 movies. Each data point corresponds to a tuple of user, movie, date, and rating. At about 650 megabytes in size, the dataset was just small enough to fit on a CD-ROM, but large enough to pose a challenge at the time.

The Netflix data can be thought of as a matrix with n = 480189 rows and m = 17770 columns. Each row corresponds to a Netflix subscriber and each column to a movie. The only entries present in the matrix are those for which a given subscriber rated a given movie with a rating in {1, 2, 3, 4, 5}. All other entries, that is, the vast majority, are missing. The objective of the participants was to predict the missing entries of the matrix, a problem known as matrix completion, or collaborative filtering somewhat more broadly. In fact, the Netflix challenge did so much to popularize this problem that it is sometimes called the Netflix problem. The idea is that if we could predict missing entries, we'd be able to recommend unseen movies to users accordingly. The holdout data that Netflix kept secret consisted of about three million ratings. Half of them were used to compute a running leaderboard throughout the competition. The other half determined the final winner.

The Netflix competition was hugely influential. Not only did it attract significant participation, it also fueled much academic interest in collaborative filtering for years to come. Moreover, it popularized the competition format as an appealing way for companies to engage with the machine learning community. A startup called Kaggle, founded in April 2010, organized hundreds of machine learning competitions for various companies and organizations before its acquisition by Google in 2017.

But the Netflix competition became infamous for another reason. Although Netflix had replaced usernames by pseudonymous numbers, researchers Narayanan and Shmatikov were able to re-identify some of the Netflix subscribers whose movie ratings were in the dataset by linking those ratings with publicly available movie ratings on IMDB, an online movie database (Narayanan and Shmatikov, "Robust De-Anonymization of Large Sparse Datasets," in 2008 IEEE Symposium on Security and Privacy (IEEE, 2008), 111–25). Some Netflix subscribers had also publicly rated an overlapping set of movies on IMDB under their real identities. In the privacy literature, this is called a linkage attack and it's one of the ways that seemingly anonymized data can be de-anonymized (Dwork et al., "Exposed! A Survey of Attacks on Private Data," Annual Review of Statistics and Its Application 4 (2017): 61–84).

What followed were multiple class action lawsuits against Netflix, as well as an inquiry by the Federal Trade Commission over privacy concerns. As a consequence, Netflix canceled plans for a second competition, which it had announced on August 6, 2009.

To this day, privacy concerns are a legitimate obstacle to public data release and dataset creation. Deanonymization techniques are mature and efficient. There provably is no algorithm that could take a dataset and provide a rigorous privacy guarantee to all participants, while being useful for all analyses and machine learning purposes. Dwork and Roth call this the Fundamental Law of Information Recovery: "overly accurate answers to too many questions will destroy privacy in a spectacular way" (Dwork and Roth, "The Algorithmic Foundations of Differential Privacy," Foundations and Trends in Theoretical Computer Science 9, no. 3–4 (2014): 211–407).
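To illustrate the mechanics of a linkage attack in the simplest possible terms, the following toy sketch matches pseudonymous ratings against a public profile on shared (movie, rating) pairs. The data is entirely fabricated and the approach is a bare-bones caricature, not the Narayanan-Shmatikov method, which also exploits rating dates and tolerates noisy matches.

    # Toy linkage attack: which pseudonym rated the same movies the same way
    # as a known public profile? All data below is made up.
    import pandas as pd

    anonymized = pd.DataFrame({          # "released" data with pseudonymous IDs
        "pseudo_id": [17, 17, 17, 42, 42],
        "movie": ["Obscure Film A", "Obscure Film B", "Blockbuster",
                  "Blockbuster", "Obscure Film C"],
        "rating": [5, 2, 4, 3, 5],
    })
    public = pd.DataFrame({              # ratings published under a real name
        "name": ["alice", "alice", "alice"],
        "movie": ["Obscure Film A", "Obscure Film B", "Blockbuster"],
        "rating": [5, 2, 4],
    })

    # Count, for each pseudonym, how many ratings coincide with alice's.
    matches = anonymized.merge(public, on=["movie", "rating"])
    print(matches.groupby("pseudo_id")["name"].count())
    # Pseudonym 17 matches on all three of alice's public ratings,
    # singling it out as her likely record.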

Roles datasets play

In machine learning research and engineering, datasets play a different and more prominent set of roles than they do in most other fields. We have mentioned several of these above but let us now examine them in more detail. Understanding these is critical to figuring out which technical and cultural aspects of benchmarks are essential, how harms arise, and how to mitigate them.

A source of real data

Edgar Anderson was a botanist and horticulturist who spent much of the 1920s and '30s collecting and analyzing data on irises to study biological and taxonomic questions. The Iris dataset in the UCI machine learning repository mentioned above is the result of Anderson's labors, or a tiny sliver of them, as most of the observations in the dataset came from a single day of field work. The dataset contains 50 observations each of 3 iris species; the task is to distinguish the species based on 4 physical attributes (sepal length and width; petal length and width). Most of the tens of thousands of researchers who have used this dataset are not interested in taxonomy, let alone irises.

What, then, are they using the dataset for?

Although the data was collected by Anderson, it was actually published in the paper "The use of multiple measurements in taxonomic problems" by Ronald Fisher, who was a founder of modern statistics as well as a eugenicist (Fisher, "The Use of Multiple Measurements in Taxonomic Problems," Annals of Eugenics 7, no. 2 (1936): 179–88). The eugenics connection is not accidental: other central figures in the development of modern statistics, such as Francis Galton and Karl Pearson, were also eugenicists (Evans, "RA Fisher and the Science of Hatred," The New Statesman, 2020; Louçã, "Emancipation Through Interaction–How Eugenics and Statistics Converged and Diverged," Journal of the History of Biology 42, no. 4 (2009): 649–84). Fisher was Anderson's collaborator. Although Fisher had some interest in taxonomy, he was primarily interested in using the data to develop statistical techniques (with an eye toward applications for eugenics). In the 1936 paper, Fisher introduces Linear Discriminant Analysis (LDA) and shows that it performs well on this task.

The reason the Iris dataset proved to be a good application of LDA is that there exists a linear projection of the four features which seems to result in a mixture of Gaussians (one for each of the three species), and the means of the three distributions are relatively far apart; one of the species is in fact perfectly separable from the other two. Every learning algorithm implicitly makes assumptions about the data-generating process: without assumptions, there is no basis for making predictions on unseen points (Mitchell, The Need for Biases in Learning Generalizations (Department of Computer Science, Laboratory for Computer Science Research . . ., 1980)). If we could perfectly mathematically describe the data generating process behind the physical characteristics of irises (or any other population), we wouldn't need a dataset; we could mathematically work out how well an algorithm would perform. In practice, for complex phenomena, such perfect mathematical descriptions rarely exist (different communities place different value on attempting to discover the true data generating process, and machine learning places relatively little emphasis on this goal; see Breiman, "Statistical Modeling: The Two Cultures (with Comments and a Rejoinder by the Author)," Statistical Science 16, no. 3 (2001): 199–231). Thus, the usefulness of a learning algorithm is established by testing it on real datasets.
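The following minimal sketch, which uses scikit-learn's bundled copy of the Iris data rather than Anderson's original field measurements, fits LDA to the four features; the near-perfect training accuracy reflects the well-separated class means described above, and the exact number is indicative only.

    # Sketch: Fisher-style linear discriminant analysis on the Iris data.
    from sklearn.datasets import load_iris
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    X, y = load_iris(return_X_y=True)   # 150 flowers, 4 measurements, 3 species

    lda = LinearDiscriminantAnalysis(n_components=2)
    z = lda.fit(X, y).transform(X)      # project onto the discriminant directions

    print("training accuracy:", lda.score(X, y))  # typically around 0.98
    print("projected shape:", z.shape)            # (150, 2)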
The reliance on benchmark datasets as a source of real data was a gradual development in machine learning research. For example, Rosenblatt's perceptron experiments in the 1950s used two artificial stimuli (the characters E and X), with numerous variants of each created by rotation and other transformations (Rosenblatt, "Perceptron Simulation Experiments," Proceedings of the IRE 48, no. 3 (1960): 301–9). The controlled input was considered useful to understand the behavior of the system. Writing in 1988, Pat Langley advocates for a hybrid approach, pointing out that "successful runs on a number of different natural domains provide evidence of generality" but also highlighting the use of artificial data for better understanding (Langley, "Machine Learning as an Experimental Science" (Springer, 1988)). Especially after the establishment of the UCI repository around this time, it has become common to evaluate new algorithms on widely used benchmark datasets as a way of establishing that the researcher is not "cheating" by picking contrived inputs.

To summarize, when a researcher seeks to present evidence that an algorithmic innovation is useful, the use of a real dataset as opposed to artificial data ensures that the researcher didn't make up data to suit the algorithm. Further, the use of prominent benchmark datasets wards off skepticism that the researcher may have cherry picked a dataset with specific properties that make the algorithm effective.

Finally, the use of multiple benchmark datasets from different domains suggests that the algorithm is highly general.

Perversely, domain ignorance is treated almost as a virtue rather than a drawback. For example, researchers who achieve state-of-the-art performance on (say) Chinese-to-English translation may point out that none of them speak Chinese. The subtext is that they couldn't have knowingly or unknowingly picked a model that works well only when the source language is linguistically similar to Chinese.

A catalyst and measure of domain-specific progress

Algorithmic innovations that are highly portable across domains, while important, are rare. Much of the progress in machine learning is instead tailored to specific domains and problems. The most common way to demonstrate such progress is to show that the innovation in question can be used to achieve "state of the art" performance on a benchmark dataset for that task.

The idea that datasets spur algorithmic innovation bears some explanation. For example, the Netflix Prize is commonly credited as responsible for the discovery of the effectiveness of matrix factorization in recommender systems, often attributed to Simon Funk, a pseudonymous contestant (Funk, "Try This at Home," Http://Sifter.org/~Simon/Journal/2006, 2006). Yet, the technique had been proposed in the context of movie recommendation as early as 1998 (Billsus and Pazzani, "Learning Collaborative Information Filters," in ICML, vol. 98, 1998, 46–54) and for search as early as 1990 (Deerwester et al., "Indexing by Latent Semantic Analysis," Journal of the American Society for Information Science 41, no. 6 (1990): 391–407). However, it was not previously apparent that it outperformed neighborhood-based methods and that it could discover meaningful latent factors. The clarity of the Netflix leaderboard and the credibility of the dataset helped establish the significance of matrix factorization (Koren, Bell, and Volinsky, "Matrix Factorization Techniques for Recommender Systems," Computer 42, no. 8 (2009): 30–37).
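As a concrete illustration of the technique, here is a toy sketch of matrix factorization trained by stochastic gradient descent, in the general style popularized during the competition. The ratings, latent dimension, and hyperparameters are made up for illustration and are not the settings used by any actual contestant.

    # Toy matrix factorization (collaborative filtering) via SGD.
    import numpy as np

    rng = np.random.default_rng(0)
    n_users, n_movies, k = 5, 4, 2          # k latent factors per user/movie
    ratings = [                              # (user, movie, rating) triples
        (0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1),
        (2, 1, 2), (2, 3, 5), (3, 2, 4), (4, 3, 3),
    ]

    U = 0.1 * rng.standard_normal((n_users, k))   # user factors
    V = 0.1 * rng.standard_normal((n_movies, k))  # movie factors
    lr, reg = 0.05, 0.02

    for epoch in range(200):
        for u, m, r in ratings:
            err = r - U[u] @ V[m]                 # error on this observed rating
            U[u] += lr * (err * V[m] - reg * U[u])
            V[m] += lr * (err * U[u] - reg * V[m])

    # Predicted score for a missing (user, movie) entry of the matrix:
    print("predicted rating for user 0, movie 2:", U[0] @ V[2])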

Somewhat separately from the role of spurring algorithmic innovation, benchmark datasets also offer a convenient way to measure the results of such innovation (hence the term benchmark). The progression of state-of-the-art accuracy on a benchmark dataset and task can be a useful indicator. A relatively flat curve of accuracy over time may indicate that progress has stalled, while a discontinuous jump may indicate a breakthrough. Reaching an error rate that is close to zero or at least lower than the "human error" for perception tasks is often considered a sign that the task is "solved" and that it is time for the community to move on to a harder challenge. While these are appealing heuristics, there are also pitfalls. In particular, a statement such as "the state of the art accuracy for image classification is 95%" is not a scientifically meaningful claim that can be assigned a truth value, because the number is highly sensitive to the data distribution.

A notable illustration of this phenomenon comes from a paper by researchers Recht, Roelofs, Schmidt, and Shankar. They carefully recreated new test sets for the CIFAR-10 and ImageNet classification benchmarks according to the very same procedure as the original test sets (Recht et al., "Do ImageNet Classifiers Generalize to ImageNet?" in Proc. 36th ICML, 2019). They then took a large collection of representative models proposed over the years and evaluated all of them on the new test sets. All models suffered a significant drop in performance on the new test set, corresponding to about 5 years of progress in image classification. They found that this is because the new test set represents a slightly different distribution. This is despite the researchers' careful efforts to replicate the data collection procedure; we should expect that test sets created by different procedures would result in much greater performance differences.

The same graphs also provide a striking illustration of why benchmark datasets are a practical necessity for performance comparison in machine learning. Consider a hypothetical alternative approach analogous to the norm in many other branches of science: a researcher evaluating a claim (algorithm) describes in detail their procedure for sampling the data; other researchers working on the same problem sample their own datasets based on the published procedure. Some reuse of datasets occurs, but there is no standardization. The graphs show that even extremely careful efforts to sample a new dataset from the same distribution would shift the distribution sufficiently to make performance comparison hopeless.

Figure 36: Model accuracy on the original test sets vs. new test sets for CIFAR-10 and ImageNet. Each data point corresponds to one model in a test bed of representative models (shown with 95% Clopper-Pearson confidence intervals). The plots reveal two main phenomena: (i) There is generally a significant drop in accuracy from the original to the new test sets. (ii) The model accuracies closely follow a linear function, meaning that models that perform well on the old test set also tend to perform well on the new test set. The narrow shaded region is a 95% confidence region for the linear fit.

In other words, reported accuracy figures from benchmark datasets do not constitute generalizable scientific knowledge, because they don't have external validity beyond the specific dataset. While the Recht et al. paper is limited to image classification, it seems scientifically prudent to assume a lack of external validity for other machine learning tasks as well, unless there is evidence to the contrary.

Yet the two graphs above hint at a different type of knowledge that seems to transfer almost perfectly to the new test set: the relative performance of models.

The relative performance of models for a given task is a very useful type of practitioner-oriented knowledge that can be gained from benchmark leaderboards. A question that practitioners often face is, "which class of models should I use for [given task] and how should I optimize it?" A benchmark dataset (together with the associated task definition) can be seen as a proxy for answering this question in a constrained setting, analogous to laboratory studies in other branches of science. The hope is that algorithms (and model classes or architectures) identified as state of the art based on benchmark evaluation are also the ones that will be effective on the practitioner's test set. In other words, practitioners can outsource the laborious task of model selection to the benchmark leaderboard.

To be clear, this is an oversimplification. Practitioners have many concerns in addition to accuracy, such as the computational cost (of both training and prediction), interpretability, and, increasingly, fairness and environmental cost. Thus, benchmark performance is useful to practitioners but far from the only consideration for model selection.

We can imagine a spectrum of how similar the new test set is to the benchmark set. At one extreme, if the new test set is truly a new sample from the exact same distribution, then the ranking of model classes should be the same for the two sets. At the other extreme, the distributions may be so different that they constitute essentially different tasks, so that performance on one is not a useful guide to performance on the other. In between these extremes is a big grey area that is not well understood, and it is currently more art than science.

The lack of clarity on how much we can generalize from one or a few benchmarks is associated with well known controversies. For example, support vector machines were competitive with neural networks on earlier-generation benchmarks such as NIST digit recognition (Cortes and Vapnik, "Support-Vector Networks," Machine Learning 20, no. 3 (1995): 273–97), which was one reason why interest in neural networks dwindled in the 1990s. The clear superiority of neural networks on newer benchmarks such as ImageNet was only belatedly recognized.

A source of (pre-)training data

Above, we have envisioned that practitioners use the benchmark leaderboard as a guide to model selection but then train the selected models from scratch on their own (often proprietary) data sources. But practitioners often can and do go further.

In some cases, it may be possible to train on a benchmark dataset and directly use the resulting model in one's application. This depends on the domain and the task, and it is more suitable when the distribution shift is minimal and the set of class labels is stable. For example, it is reasonable to deploy a digit recognizer pretrained on MNIST, but not so much an image classifier pretrained on ILSVRC (without some type of adaptation to the target domain). Indeed, ILSVRC consists of a rather arbitrary subset of 1,000 classes of ImageNet, and a pretrained model is correspondingly limited in the set of labels it is able to output.437 Finally, consider a recommendation system benchmark dataset. There is no way to even attempt to use it directly as training data, because the users about whom one wants to make predictions are highly unlikely to be present in the training set.

437 The ImageNet Roulette project was a telling demonstration of what happens when a model trained on the (full) ImageNet dataset is applied to a different test distribution, one consisting primarily of images of people. The results were grotesque. The demonstration has been discontinued, but many archived results may be found in articles about the project. See also: (Crawford and Paglen, "Excavating AI: The Politics of Training Sets for Machine Learning," Excavating AI (www.excavating.ai), 2019).

In most cases, the creators of benchmark datasets do not intend them to be used as a source of training data, although benchmark datasets are often misused for this purpose. A rare exception is The Pile, a large (800 GB) English text corpus that is explicitly targeted at training language models. To improve the generalization capabilities of models trained on this corpus, the authors included diverse text from 22 different sources.438

438 Gao et al., "The Pile: An 800GB Dataset of Diverse Text for Language Modeling," arXiv preprint arXiv:2101.00027, 2020.

Even when benchmark datasets are not useful as training data for the reasons just mentioned, they can be useful as pre-training data for transfer learning. Transfer learning refers to using an existing model as a starting point for building a new model. A new model may be needed because the data distribution has shifted compared to what the existing model was optimized for, or because it aims to solve a different task altogether. For example, a model pre-trained on ImageNet (or ILSVRC) may be adapted via further training to recognize different species (distribution shift) or to serve as part of an image captioning model (a different task).

There are different intuitions for why transfer learning is often effective. One is that the final layers of a neural network correspond to semantically high-level representations of the input; pre-training is a way of learning representations that tend to be useful for many tasks. Another intuition is that pre-training is a way of initializing weights that improves over random initialization in that it requires fewer samples from the target domain for convergence.

Pretraining also offers the practical benefit of being able to share the knowledge contained in a dataset without releasing the raw data. Many datasets, especially those created by companies using customer data, cannot be published due to privacy or confidentiality concerns. The release of pretrained models is thus an important avenue of knowledge sharing from industry to academia.439 Sharing pretrained models is also helpful to users for whom training from scratch is cost prohibitive.

439 However, the privacy concern has resurfaced due to the possibility of model inversion attacks. The term refers to algorithms that aim to reconstruct some of the training data by interacting with a model trained on it. (Fredrikson, Jha, and Ristenpart, "Model Inversion Attacks That Exploit Confidence Information and Basic Countermeasures," in Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, 2015, 1322–33; Carlini et al., "The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks," in 28th USENIX Security Symposium, 2019, 267–84.)
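To make the pre-training role concrete, here is a minimal transfer-learning sketch, assuming a recent version of torchvision. The 20-class species task, the (commented-out) data loader, and the hyperparameters are hypothetical placeholders rather than a recipe from the text.

```python
import torch
import torch.nn as nn
from torchvision import models

# Load a model pre-trained on ImageNet (ILSVRC) to reuse its learned representations.
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze the pre-trained weights; only the new head will be trained at first.
for param in backbone.parameters():
    param.requires_grad = False

# Replace the 1000-way ILSVRC classification head with one for a new task,
# e.g. a hypothetical 20-class species-recognition problem.
num_classes = 20
backbone.fc = nn.Linear(backbone.fc.in_features, num_classes)

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Training loop over the practitioner's own (hypothetical) labeled data:
# for images, labels in species_loader:
#     optimizer.zero_grad()
#     loss = loss_fn(backbone(images), labels)
#     loss.backward()
#     optimizer.step()
```

A common refinement is to later unfreeze some of the earlier layers and continue training at a lower learning rate, trading more computation for a closer fit to the target domain.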

Let's wrap up our analysis of the roles of benchmark datasets. We identified six distinct roles:

1. providing data sampled from distributions occurring in the real world, which enables largely domain-agnostic investigations of learning algorithms;
2. enabling domain-specific progress by providing datasets that are representative of real-world tasks in that domain yet abstract away unnecessary detail;
3. providing a convenient, albeit crude, numerical way to track scientific progress on a problem;
4. enabling model comparison and allowing practitioners to outsource model selection to public leaderboards;
5. providing a source of pre-training data for representation learning, weight initialization, and similar uses;
6. providing a source of training data.

The progression of these six roles is generally toward increasing domain- and task-specificity, and from science-oriented to practice-oriented uses.

The scientific basis of machine learning benchmarks

Now we examine a seeming mystery: whether and why the benchmark approach works despite the practice of repeated testing on the same data.

Methodologically, much of modern machine learning practice rests on a variant of trial and error, which we call the train-test paradigm. Practitioners repeatedly build models using any number of heuristics and test their performance to see what works. Anything goes as far as training is concerned, subject only to computational constraints, so long as the performance looks good in testing. Trial and error is sound so long as the testing protocol is robust enough to absorb the pressure placed on it. We will examine to what extent this is the case in machine learning.

From a theoretical perspective, the best way to test the performance of a classifier is to collect a sufficiently large fresh dataset and to compute the average error on that test set. Data collection, however, is a difficult and costly task. In most applications, practitioners cannot sample fresh data for each model they would like to try out. A different practice has therefore become the de facto standard. Practitioners split their dataset into, typically, two parts: a training set used for training a model, and a test set used for evaluating its performance.440 Often the split is determined when the dataset is created. Datasets used for benchmarks, in particular, have one fixed split that persists over time. A number of variations on this theme go under the name holdout method.

440 Sometimes practitioners divide their data into multiple splits, e.g., training, validation, and test sets. However, for our discussion here that won't be necessary.

Machine learning competitions have adopted the same format. The company Kaggle, for example, has organized hundreds of competitions since it was founded.

In a competition, a holdout set is kept secret and is used to rank participants on a public leaderboard as the competition unfolds. In the end, the final winner is whoever scores highest on a separate secret test set not used up to that point.

In all applications of the holdout method, the hope is that the test set serves as a fresh sample that provides good performance estimates for all the models. The central problem is that practitioners don't just use the test data once and retire it immediately thereafter. The test data are used incrementally, building one model at a time while incorporating feedback received previously from the test data. This leads to the fear that models eventually begin to overfit to the test data. This type of overfitting is sometimes called adaptive overfitting or human-in-the-loop overfitting.
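The worry can be made concrete with a small simulation, entirely ours: on a pure-noise task, an analyst who repeatedly tweaks a model and keeps whatever scores better on a reused test set will see the test-set accuracy creep upward even though nothing real is being learned, as a comparison with a genuinely fresh sample reveals.

```python
import numpy as np

rng = np.random.default_rng(0)
n_test, n_fresh, d = 1000, 1000, 50

# Pure-noise task: features and labels are independent, so no classifier
# can genuinely do better than 50% accuracy.
X_test, y_test = rng.normal(size=(n_test, d)), rng.choice([-1, 1], n_test)
X_fresh, y_fresh = rng.normal(size=(n_fresh, d)), rng.choice([-1, 1], n_fresh)

def accuracy(w, X, y):
    return np.mean(np.sign(X @ w) == y)

# Adaptive analyst: propose random tweaks and keep whatever looks better on the
# reused test set -- a crude stand-in for human-in-the-loop model selection.
best_w = rng.normal(size=d)
for _ in range(2000):
    candidate = best_w + 0.3 * rng.normal(size=d)
    if accuracy(candidate, X_test, y_test) > accuracy(best_w, X_test, y_test):
        best_w = candidate

print("reused test set:", accuracy(best_w, X_test, y_test))   # noticeably above 0.5
print("fresh test set: ", accuracy(best_w, X_fresh, y_fresh)) # about 0.5
```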

Duda and Hart summarized the problem aptly in their 1973 textbook:441

In the early work on pattern recognition, when experiments were often done with very small numbers of samples, the same data were often used for designing and testing the classifier. This mistake is frequently referred to as "testing on the training data." A related but less obvious problem arises when a classifier undergoes a long series of refinements guided by the results of repeated testing on the same data. This form of "training on the testing data" often escapes attention until new test samples are obtained.

441 Duda and Hart, Pattern Classification and Scene Analysis (Wiley, New York, 1973).

Nearly half a century later, Hastie, Tibshirani, and Friedman still caution in the 2017 edition of their influential textbook:442

Ideally, the test set should be kept in a "vault," and be brought out only at the end of the data analysis. Suppose instead that we use the test-set repeatedly, choosing the model with smallest test-set error. Then the test set error of the final chosen model will underestimate the true test error, sometimes substantially.

442 Hastie, Tibshirani, and Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Springer, 2017).

While the suggestion to keep the test data in a "vault" is safe, it couldn't be further from the reality of modern practice. Popular test datasets often see tens of thousands of evaluations. Yet adaptive overfitting doesn't seem to be happening. Recall the scatter plots by Recht et al. above: the plots admit a clean linear fit with positive slope. In other words, the better a model is on the old test set, the better it is on the new test set. Notice that newer models, i.e., those with higher performance on the original test set, had more time to adapt to the test set and to incorporate more information about it. Nonetheless, the better a model performed on the old test set, the better it performs on the new one. Moreover, on CIFAR-10 we can even see clearly that the absolute drop in performance diminishes with increasing accuracy on the old test set. In particular, if our goal were to do well on the new test set, seemingly our best strategy is to continue to inch forward on the old test set.

The theoretical understanding of why machine learning practice has not resulted in overfitting is still catching up. Here we highlight one of many potential explanations, called the leaderboard principle. It is a subtle effect in which publication bias forces researchers to chase state-of-the-art results: they only publish models that show significant improvements over prior models. This cultural practice can be formalized by the Ladder algorithm. For each given classifier, the algorithm compares the classifier's holdout error to the smallest holdout error achieved by any classifier encountered so far. If the error is below the previous best by some margin, it announces the holdout error of the current classifier and records it as the new best. Importantly, if the error is not smaller by the margin, the algorithm re-releases the previous best (rather than the new error). It can be proven that the Ladder algorithm avoids overfitting in the sense that it accurately measures the error of the best performing classifier among those encountered.443

443 Blum and Hardt, "The Ladder: A Reliable Leaderboard for Machine Learning Competitions," in International Conference on Machine Learning (PMLR, 2015), 1006–14.
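Here is a minimal sketch of a Ladder-style leaderboard following the description above. The margin parameter and the 0/1 loss are our choices, and the published algorithm includes additional details (such as rounding the released values) that we omit.

```python
class Ladder:
    """Leaderboard that only reveals a new score when it beats the best by a margin."""

    def __init__(self, y_holdout, margin=0.01):
        self.y_holdout = y_holdout
        self.margin = margin
        self.best_error = float("inf")

    def submit(self, predictions):
        # Holdout error of the submitted classifier (0/1 loss).
        error = sum(p != y for p, y in zip(predictions, self.y_holdout)) / len(self.y_holdout)
        if error < self.best_error - self.margin:
            self.best_error = error      # announce and record the improvement
        return self.best_error           # otherwise, re-release the previous best
```

A competition platform would call submit once per entry and display only the returned value, so participants receive far less information about the holdout set than under naive score reporting.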

Benchmark praxis and culture

The above discussion hints at the importance of cultural practices for a full understanding of benchmark datasets. Let us now discuss these in more detail, highlighting both dataset creators and users. These practices have helped make the benchmark-oriented approach successful, but they also shape the harms associated with data.

Let's start with creators. Benchmark creators define the task. This involves, among other things, selecting the high-level problem, defining the target variable, specifying the procedure for sampling the data, and choosing the scoring function. If manual annotation of the data is necessary, the dataset creator must develop a codebook or rubric for doing so and orchestrate crowdwork if needed. Data cleaning to ensure high-quality labels is usually required.

In defining the task, benchmark developers navigate a tricky balance: a task seen as too easy using existing techniques will not spur innovation, while a task seen as too hard may be demotivating. Finding the sweet spot requires expertise, judgment, and some luck. If the right balance is achieved, the benchmark drives progress on the problem. In this way, benchmark creators play an outsized role in defining the vision and agenda for machine learning communities.

For example, as discussed above, MNIST was derived from a previous dataset released by NIST in which the training and test sets

were drawn from different sources. The MNIST creators argued that this was necessary because

Drawing sensible conclusions from learning experiments requires that the result be independent of the choice of training set and test among the complete set of samples.

In other words, if an algorithm performs well on NIST, it is unclear how much of this is due to its ability to learn the training distribution and how much is due to its ability to ignore the differences between the train and test distributions. MNIST allows researchers to focus selectively on the former question. This was a fruitful approach in 1995. Decades later, when problems like MNIST classification are effectively solved, the attention of benchmark dataset creators has

turned towards methods for handling distribution shift that LeCun et al. justifiably chose to ignore.444

444 Koh et al., "WILDS: A Benchmark of In-the-Wild Distribution Shifts," arXiv preprint arXiv:2012.07421, 2020.

Another tricky balance is between abstracting away domain details so that the task is approachable for a broad swath of machine learning experts, and preserving enough detail that methods that work in the benchmark setting will translate to production settings. One reason the Netflix Prize was so popular is that the data is just a matrix, and it is possible to achieve good performance (in the sense of beating Netflix's baseline) without really thinking about what the data means. No understanding of film or user psychology was necessary (or, as it turned out, helpful). It is possible that domain expertise would have proved essential if the problem had been formulated differently, say, to require explainability or to emphasize good performance even for users with very few previous ratings.

Another challenge for dataset creators is to avoid leakage. In an apocryphal story from the early days of computer vision, a classifier was trained to discriminate between images of Russian and American tanks with seemingly high accuracy, but it turned out that this was

only because the Russian tanks had been photographed on a cloudy day and the American ones on a sunny day.445 Data leakage refers to a spurious relationship between the feature vector and the target variable that is an artifact of the data collection or sampling strategy. Since the spurious relationship won't be present when the model is deployed, leakage usually leads to inflated estimates of model performance. Kaufman et al. present an overview of leakage in machine learning.446

445 Branwen, "The Neural Net Tank Urban Legend," 2011.
446 Kaufman et al., "Leakage in Data Mining: Formulation, Detection, and Avoidance," ACM Transactions on Knowledge Discovery from Data (TKDD) 6, no. 4 (2012): 1–21.
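As a minimal illustration of how leakage inflates performance estimates, consider the tank story rendered as synthetic data (all numbers are made up): a "brightness" feature that is an artifact of when each class was photographed makes the classifier look nearly perfect until deployment breaks the artifact.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
y = rng.integers(0, 2, n)                     # 0 = one class of tank, 1 = the other

# The genuine signal is weak; the "brightness" column leaks the label because
# each class happened to be photographed in different weather.
signal = 0.2 * (2 * y - 1) + rng.normal(size=n)
brightness = (2 * y - 1) + 0.1 * rng.normal(size=n)

X_leaky = np.column_stack([signal, brightness])
model = LogisticRegression().fit(X_leaky[:1000], y[:1000])
print("with leaky feature:", model.score(X_leaky[1000:], y[1000:]))   # near 1.0

# At deployment time the weather artifact is gone: brightness no longer tracks the label.
brightness_deploy = rng.normal(size=1000)
X_deploy = np.column_stack([signal[1000:], brightness_deploy])
print("after deployment:  ", model.score(X_deploy, y[1000:]))         # much worse
```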

Yet another critical responsibility of benchmark dataset creators is to implement a train-test framework. Most contests have various restrictions in place in an attempt to prevent both accidental overfitting to the leaderboard test set and intentional reverse engineering. Although, as we described above, benchmark praxis differs from the textbook version of the holdout method, practitioners have arrived at a set of techniques that work in practice, even if our theoretical understanding of why they work is still catching up.

Taking a step back, in any scientific endeavor there are the difficult tasks of framing the problem, ensuring that the methods have internal and external validity, and interpreting the results. Benchmark dataset creators handle as many of these hard tasks as possible, simplifying the goal of dataset users to the point where, if a researcher beats the state-of-the-art performance, there is a good chance that there is a scientific insight somewhere in the methods, although extracting what this insight is may still require work. Further simplifying things for dataset users is the fact that there are no restrictions other than computational constraints on how the researcher uses the training data, as long as performance on the test set looks good.

To be clear, this approach has many pitfalls. Researchers rarely perform the statistical hypothesis tests needed to have confidence in the claim that one model performs better than another. Our understanding of how to account for the numerous sources of variance in these performance measurements is still evolving; a 2021 paper that aims to do so argues that many claims of state-of-the-art performance in natural language processing and computer vision don't hold up when subjected to such tests.447

447 Bouthillier et al., "Accounting for Variance in Machine Learning Benchmarks," Proceedings of Machine Learning and Systems 3 (2021).
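One standard hypothesis test for comparing two classifiers evaluated on the same test set is McNemar's test, which uses only the examples on which the models disagree. The sketch below uses synthetic per-example correctness indicators; it is an example of the kind of test referred to above, not the specific procedure used in the cited paper.

```python
import numpy as np
from scipy.stats import binomtest

# Hypothetical per-example correctness of two models on the same test set.
rng = np.random.default_rng(0)
model_a_correct = rng.random(10000) < 0.93
model_b_correct = rng.random(10000) < 0.925

# McNemar's test uses only the discordant pairs: cases where exactly one model is right.
b = int(np.sum(model_a_correct & ~model_b_correct))   # A right, B wrong
c = int(np.sum(~model_a_correct & model_b_correct))   # B right, A wrong

# Under the null hypothesis that both models have the same error rate,
# each discordant case is equally likely to favor either model.
result = binomtest(b, b + c, p=0.5)
print(b, c, result.pvalue)
```

Even this simple test often shows that a fraction-of-a-percent "improvement" on a fixed test set is statistically indistinguishable from noise.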
There have long been articles noting the limitations of what researchers and practitioners can learn from benchmark performance evaluation.448 David Aha, co-creator of the UCI repository, recalls that these limitations were well understood as early as 1995, just a few years after the repository was established.449

448 Saitta and Neri, "Learning in the 'Real World'," Machine Learning 30, no. 2 (1998): 133–63; Salzberg, "On Comparing Classifiers: A Critique of Current Research and Methods," Data Mining and Knowledge Discovery 1, no. 1 (1999): 1–12.
449 Radin, "'Digital Natives': How Medical and Indigenous Histories Matter for Big Data," Osiris 32, no. 1 (2017): 43–64.

While it is important to acknowledge the limitations, it is also worth highlighting that this approach works at all. One reason for this success is that the scientific questions are primarily about algorithms and not about the populations from which the datasets are sampled. Indeed, there is a case to be made that other scientific communities should adopt the machine learning community's approach, sometimes called the Common Task Method.450 Diverse scientific fields including economics, political science, psychology, genetics, and many others have seen an infusion of machine learning methods alongside a new focus on maximizing predictive accuracy as a research objective. These shifts have been accompanied by a rash of reproducibility failures, with large fractions of published papers falling prey to pitfalls such as data leakage and incorrect train-test splits.451 Use of the benchmark dataset approach could have avoided most of these pitfalls.452

450 Liberman, "Fred Jelinek."
451 Ivanescu et al., "The Importance of Prediction Model Validation and Assessment in Obesity and Nutrition Research," International Journal of Obesity 40, no. 6 (June 2016): 887–94, https://doi.org/10.1038/ijo.2015.214; Whelan and Garavan, "When Optimism Hurts: Inflated Predictions in Psychiatric Neuroimaging," Biological Psychiatry, Mechanisms of Aging and Cognition, 75, no. 9 (May 2014): 746–48, https://doi.org/10.1016/j.biopsych.2013.05.014; DeMasi, Kording, and Recht, "Meaningless Comparisons Lead to False Optimism in Medical Machine Learning," PLoS ONE 12, no. 9 (September 2017), https://doi.org/10.1371/journal.pone.0184604; Blagus and Lusa, "Joint Use of over- and Under-Sampling Techniques and Cross-Validation for the Development and Assessment of Prediction Models," BMC Bioinformatics 16, no. 1 (November 2015): 363, https://doi.org/10.1186/s12859-015-0784-9; Vandewiele et al., "Overly Optimistic Prediction Results on Imbalanced Data: A Case Study of Flaws and Benefits When Applying over-Sampling," Artificial Intelligence in Medicine 111 (January 2021): 101987, https://doi.org/10.1016/j.artmed.2020.101987; Christodoulou et al., "A Systematic Review Shows No Performance Benefit of Machine Learning over Logistic Regression for Clinical Prediction Models," Journal of Clinical Epidemiology 110 (June 2019): 12–22, https://doi.org/10.1016/j.jclinepi.2019.02.004; Arp et al., "Dos and Don'ts of Machine Learning in Computer Security," October 2020, https://arxiv.org/abs/2010.09470v1.
452 We are grateful to Sayash Kapoor for this observation.

Now let us transition to dataset users.

Benchmark users have embraced the freedom afforded by the approach. As a result, the community of users is large: for example, the data science platform Kaggle has over 5 million registered users, of whom over 130,000 have participated in a competition. There is less gatekeeping in machine learning research than in other disciplines. Many prominent findings bypass peer review; if a technique performs well on the leaderboard, that is considered to speak for itself. Many people who contribute these findings are not formally affiliated with research institutions. Overall, the culture of progress in machine learning combines the cultures of academic scholarship, engineering, and even gaming, with a community of hobbyists and practitioners sharing tips and tricks on forums and engaging in friendly competition. This freewheeling culture may seem jarring to some observers, especially given the sensitivity of some of the datasets involved. The lack of gatekeeping also means fewer opportunities for ethical training.

There is another aspect of benchmark culture that amplifies the harms associated with data: collecting data without informed consent and distributing it widely without adequate context. Many modern datasets, especially in computer vision and natural language processing, are scraped from the web. In such cases, it is infeasible to obtain informed consent from the individual authors of the content. What about a dataset such as the Netflix Prize, where a company releases data from its own platform? Even if companies disclose in their terms of service that data might be used for research, it is doubtful that informed consent has been obtained, since few users read and understand terms-of-service documents and because of the complexity of the issues involved.

When an individual's data becomes part of a benchmark dataset, it gets distributed widely. Popular benchmark datasets are downloaded by thousands of researchers, students, developers, and hobbyists. Scientific norms also call for the data to be preserved indefinitely in the interest of transparency and reproducibility. Thus, not only might individual pieces of data in these datasets be distributed and viewed widely, they are also viewed in a form that strips them of their original context. A joke in bad taste written on social media and later deleted may be captured alongside documents from the Library of Congress.

Harms associated with data

Now we will discuss a few important types of harms associated with benchmark datasets and how to mitigate them. We don't mean to imply that all of these harms are the "fault" of dataset creators, but understanding how data plays into these harms will bring clarity on how to intervene.

Downstream and representational harms

A dataset’s downstream harms are those that arise from the models trained on it. This is a type of harm that readily comes to mind: bad data may lead to bad models which can cause harm to the people they purportedly serve. Errors in face recognition have led to 453 453 Hill, “Another Arrest, and Jail Time, arrests of the wrong person and biased criminal risk prediction Due to a Bad Facial Recognition Match,” systems disproportionately harm Black, minority, and overpoliced New York Times, 2020. populations among others. Properties of datasets that sometimes (but not always, and not in easily predictable ways) propagate downstream include imbalance, biases, stereotypes, and categorization. By imbalance we mean un- equal representation of different groups. For example, Buolamwini and Gebru pointed out that two facial analysis benchmarks, IJB-A 454 454 Buolamwini and Gebru, “Gender and Adience, overwhelmingly featured lighter-skinned subjects. Shades.” By dataset biases we mean incorrect associations, especially those corresponding to social and historical prejudices. For example, a dataset that measures arrests as a proxy for crime may reflect the biases of policing and discriminatory laws. By stereotypes we mean associations that accurately reflect a property of the world (or a spe- cific culture at a specific point in time) that is thought to be the result of social and historical prejudice. For example, gender-occupation

By categorization we mean assigning discrete (often binary) labels to complex aspects of identity such as gender and race.455

455 For a classic text on why categorization is a morally laden activity, see (Bowker and Leigh Star, Sorting Things Out). For a discussion of the harms of category systems embedded in machine learning datasets, see (Crawford, The Atlas of AI (Yale University Press, 2021)).

Representational harms occur when systems reinforce the subordination of some groups along the lines of identity. Representational harms can be downstream harms, such as when models apply offensive labels to people from some groups, but they can also be inherent in the dataset. For example, ImageNet contains numerous slurs and offensive labels inherited from WordNet, as well as pornographic images of people who did not consent to their inclusion in the dataset.456

456 Prabhu and Birhane, "Large Image Datasets: A Pyrrhic Win for Computer Vision?" arXiv preprint arXiv:2006.16923, 2020; Yang et al., "Towards Fairer Datasets: Filtering and Balancing the Distribution of the People Subtree in the ImageNet Hierarchy," in Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 2020, 547–58.

While downstream and representational harms are two categories that have drawn a lot of attention and criticism, many other harms often arise, including the environmental cost of training models on unnecessarily large datasets457 and the erasure of the labor of the subjects who contributed the data458 or the annotators who labeled it.459 For an overview of ethical concerns associated with datasets, see the survey by Paullada et al.460

457 Bender et al., "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?" in Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 2021, 610–23.
458 Radin, "'Digital Natives'."
459 Gray and Suri, Ghost Work.
460 Paullada et al., "Data and Its (Dis)Contents: A Survey of Dataset Development and Use in Machine Learning Research," arXiv preprint arXiv:2012.05345, 2020.

Mitigating harms: an overview

Approaches for mitigating the harms associated with data are developing quickly. Here we review a few selected ideas.

One approach targets the fact that many machine learning datasets are poorly documented and that details about their creation are often missing. This leads to a range of issues, from lack of reproducibility and concerns about scientific validity to misuse and ethical concerns. In response, datasheets for datasets is a template and initiative to promote more detailed and systematic annotation of datasets.461 A datasheet requires the creator of a dataset to answer questions relating to several areas of interest: motivation, composition, collection process, preprocessing/cleaning/labeling, uses, distribution, and maintenance. One goal is that the process of creating a datasheet will help anticipate ethical issues with the dataset. But datasheets also aim to make data practices more reproducible and to help practitioners select more adequate data sources.

461 Gebru et al., "Datasheets for Datasets," arXiv preprint arXiv:1803.09010, 2018.

Going a step beyond datasheets, researchers Jo and Gebru draw lessons from archival and library sciences for the construction and documentation of machine learning datasets.462 These lessons draw attention to issues of consent, inclusivity, power, transparency, ethics, and privacy.

462 Jo and Gebru, "Lessons from Archives: Strategies for Collecting Sociocultural Data in Machine Learning," in Proc. 3rd Conference on Fairness, Accountability, and Transparency (FAccT), 2020, 306–16.

Other approaches stay within the paradigm of minimally curated data collection but aim to modify or sanitize content deemed problematic in datasets. The ImageNet creators have made efforts to remove slurs and harmful terms as well as categories considered non-imageable, that is, unable to be characterized using images; "vegetarian" and "philanthropist" are two such categories that were removed.463 The REVISE tool aims to partially automate the process of identifying various kinds of biases in visual datasets.464

463 Yang et al., "Towards Fairer Datasets."
464 Wang, Narayanan, and Russakovsky, "REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets," in European Conference on Computer Vision (Springer, 2020), 733–51.

Mitigating harms by separating the roles of datasets

Our analysis of the different roles datasets play allows greater clarity in mitigating harms while preserving benefits. This analysis is not intended as an alternative to the many approaches that have already been proposed for mitigating harms. Rather, it can sharpen our thinking and strengthen other harm-mitigation strategies.

Our main observation is that the reuse of scientific benchmark datasets in engineering pipelines complicates efforts to address biases and harms. Attempts to address harms in such dual-use datasets leave creators with a conundrum. On the one hand, benchmark datasets need to be long-lived: many benchmark datasets created decades ago continue to be useful and widely used today. Thus, modifying a dataset down the line when new harms become known will compromise its scientific utility, as performance on the modified dataset may not be meaningfully comparable to performance on the older version.

On the other hand, attempting to anticipate all possible harms during dataset creation is infeasible if the dataset is going to be used as training or pre-training data. Experience shows that datasets turn out to be useful for an ever-expanding set of downstream tasks, some of which were not even conceived of at the time of dataset creation.

Better tradeoffs are possible if there is a clear separation between scientific benchmarks and production-oriented datasets. In cases where the same dataset can be useful for both purposes, creators should consider making two versions, or forks, of the data, because many of the harm-mitigation strategies that apply to one don't apply to the other, and vice versa. To enforce this separation, benchmark dataset creators should consider preventing the use of the dataset in production pipelines by explicitly prohibiting it in the terms of use. Currently, the licenses of many benchmark datasets prohibit commercial uses. This restriction has a similar effect, but it is not the best way to make the distinction. After all, production models may be noncommercial: they may be built by researchers or governments, with the latter category having an especially high potential for harm. At the same time, prohibiting commercial uses is arguably too strict, as it rules out using the dataset as a guide to model selection, a use that does not raise the same risks of downstream harm.

One reason why there are fairness interventions applicable to scientific benchmark datasets but not to production datasets is that, as we've argued, most of the scientific utility of benchmarks is captured by the relative performance of models. The fact that interventions that hurt absolute performance may be acceptable gives greater leeway for harm-mitigation efforts. Consider image classification benchmarks. We hypothesize that the relative ranking of models would be only minimally affected if the dataset were modified to remove all images containing people (keeping high-level properties such as the number of classes and images the same). Such an intervention would avoid a wide swath of the harms associated with datasets while preserving much of their scientific utility.

Conversely, one reason why there are fairness interventions applicable to production datasets but not to scientific benchmarks is that interventions for production datasets can be strongly guided by an understanding of their downstream impacts in specific applications. Language and images, in particular, capture such a variety of cultural stereotypes that sanitizing all of them has proved infeasible.465 It is much easier to design interventions once we fix an application and the cultural context(s) in which it will be deployed. Different interventions may be applicable to the same dataset used in different applications. Unlike for scientific benchmarks, dataset standardization is not necessary in engineering settings.

465 Gonen and Goldberg, "Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings but Do Not Remove Them," arXiv preprint arXiv:1903.03862, 2019.

In fact, the best locus of intervention even for dataset biases may be downstream of the data. For example, it has been observed for many years that online translation systems perpetuate gender stereotypes when translating gender-neutral pronouns. The text "O bir doktor. O bir hemşire." may be translated from Turkish to English as "He is a doctor. She is a nurse." Google Translate mitigated this by showing multiple translations in such cases.466 Compared to data interventions, this has the benefit of making the potential bias (or, in some cases, erroneous translation) more visible to the user.

466 Kuczmarski, "Reducing Gender Bias in Google Translate," Google Blog 6 (2018); Johnson, "A Scalable Approach to Reducing Gender Bias in Google Translate," Google Blog, 2020.

Our analysis points to many areas where further research could help clarify ethical implications. In particular, the pre-training role of benchmark datasets occupies a grey area where it is not clear when and to what extent dataset biases propagate to the target task or domain.467 Research in this area is nascent; it is vital because the (mis)use of scientific benchmarks for pre-training in production pipelines is common today and unlikely to cease in the near future.

467 Steed and Caliskan, "Image Representations Learned with Unsupervised Pre-Training Contain Human-Like Biases," in Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 2021, 701–13.

Datasets should not be seen as static, neutral technical artifacts. The harms that could arise from a dataset depend not just on its contents but also on the rules, norms, and culture surrounding its usage. Thus, modifying these cultural practices is one potential way to mitigate harms. As we discussed above, lack of domain knowledge by dataset users has come to be seen almost as a virtue in machine learning. This attitude should be reconsidered, as it has a tendency to accentuate ethical blind spots.

Datasets require stewardship, whether by the dataset creator or by another designated entity or set of entities. Consider the problem of derivatives: popular benchmark datasets are often extended by other researchers with additional features, and these derived datasets can introduce the possibility of harms not present (to the same extent) in the original. For example, the Labeled Faces in the Wild (LFW) dataset of faces was annotated by other researchers with characteristics such as race, gender, and attractiveness.468 Regardless of the ethics of LFW itself, the derived dataset enables new applications that classify people by appearance in harmful ways.469 Of course, not all derivatives are ethically problematic. Adjudicating and enforcing such ethical distinctions is only possible if there is a governance mechanism in place.

468 Huang et al., "Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments" (University of Massachusetts, Amherst, 2007); Kumar et al., "Describable Visual Attributes for Face Verification and Image Search," IEEE Transactions on Pattern Analysis and Machine Intelligence 33, no. 10 (2011): 1962–77.
469 The intended purpose of the derived dataset is to enable searching corpora of face images by describable attributes.

Beyond datasets
In this final section, we discuss important scientific and ethical questions that are relevant to datasets but also go beyond them, pervading machine learning: validity, problem framing, and limits to prediction.

Lessons from measurement

Measurement theory is an established science with ancient roots. In short, measurement is about assigning numbers to objects in the real world in a way that reflects relationships between those objects. Measurement draws an important distinction between a construct that we wish to measure and the measurement procedure that we use to create a numerical representation of the construct.

For example, we can think of a well-designed math exam as measuring the mathematical abilities of a student. A student with greater mathematical ability than another is expected to score higher on the exam. Viewed this way, an exam is a measurement procedure that assigns numbers to students. The mathematical ability of a student is the construct we hope to measure. We desire that the ordering of these numbers reflects the ordering of students by their mathematical abilities. A measurement procedure operationalizes a construct.

Every prediction problem has a target variable, the thing we're trying to predict.470 By viewing the target variable as a construct, we can apply measurement theory to understand what makes a good target variable.

470 Recall that in a prediction problem we have covariates X from which we're trying to predict a variable Y. This variable Y is what we call the target variable in our prediction problem.

The choice of a poor target variable cannot be ironed out with additional data. In fact, the more data we feed into our model, the better it gets at capturing the flawed target variable. Improved data quality or diversity is no cure either. All formal fairness criteria that involve the target variable, separation and sufficiency being two prominent examples,471 are either meaningless or downright misleading when the target variable itself is the locus of discrimination.

471 Recall from Chapter 2 that separation requires the protected attribute to be independent of the prediction conditional on the target variable. Sufficiency requires the target variable to be independent of the protected attribute given the prediction.

But what makes a target variable good or bad? Let's get a better grasp on this question by considering a few examples.

1. Predicting the value of the Standard and Poor's 500 Index (S&P 500) at the close of the New York Stock Exchange tomorrow.
2. Predicting whether an individual is going to default on a loan.
3. Predicting whether an individual is going to commit a crime.

The first example is rather innocuous. It references a fairly robust target variable, even though it relies on a number of social facts.

The second example is a common application of statistical modeling that underlies much of modern credit scoring in the United States. At first sight, a default event seems like a clean-cut target variable. But the reality is different. In a public dataset released by the Federal Reserve,472 default events are coded by a so-called performance variable that measures a serious delinquency in at least one credit line during a certain time period.

472 The Federal Reserve Board, "Report to the Congress on Credit Scoring and Its Effects on the Availability and Affordability of Credit."

More specifically, the report states that the

measure is based on the performance of new or existing accounts and measures whether individuals have been late 90 days or more on one or more of their accounts or had a public record item or a new collection agency account during the performance period.473

473 Quote from the Federal Reserve report.

Our third example runs into the most concerning measurement problem. How do we determine whether an individual committed a crime? What we can determine with certainty is whether an individual was arrested and found guilty of a crime. But this depends crucially on who is likely to be policed in the first place and who is able to maneuver the criminal justice system successfully following an arrest.

Sorting out what makes a good target variable, in full generality, can involve the whole apparatus of measurement theory. The scope of measurement theory, however, goes beyond defining reliable and valid target variables for prediction. Measurement comes in whenever we create features for a machine learning problem and should therefore be an essential part of the data creation process.474

474 Jacobs and Wallach, "Measurement and Fairness," in Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 2021, 375–85.

Judging the quality of a measurement procedure is a difficult task. Measurement theory has two important conceptual frameworks for arguing about what makes measurement good: one is reliability, the other is validity.

Reliability describes the differences observed in multiple measurements of the same object under identical conditions. Thinking of the measurement as a random variable, reliability is about the variance between independent, identically distributed measurements. As such, reliability can be analogized with the statistical notion of variance.

Validity is concerned with how well the measurement procedure in principle captures the concept we are trying to measure. If reliability is analogous to variance, it is tempting to see validity as analogous to bias. But the situation is a bit more complicated. There is no simple formal criterion we could use to establish validity. In practice, validity rests to a large extent on human expertise and subjective judgment.

One approach to formalizing validity is to ask how well a score predicts some external criterion. This is called external validity. For example, we could judge a measure of creditworthiness by how well it predicts default in a lending scenario. While external validity leads to concrete technical criteria, it essentially identifies good measurement with predictive accuracy. However, that's certainly not all there is to validity.
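The variance/bias analogy can be illustrated with a toy simulation (the instruments and numbers are hypothetical); keep in mind that, as discussed above, real validity involves far more than a numerical offset.

```python
import numpy as np

rng = np.random.default_rng(0)
true_ability = 70.0          # the construct we hope to measure, on a hypothetical scale
n_repeats = 10000

# Instrument A: unbiased but noisy -- scores scatter widely around the construct (low reliability).
scores_a = true_ability + rng.normal(0, 10, n_repeats)
# Instrument B: very consistent but systematically off -- reliable, yet a threat to validity.
scores_b = (true_ability - 15) + rng.normal(0, 1, n_repeats)

for name, scores in [("A", scores_a), ("B", scores_b)]:
    print(name,
          "spread (unreliability):", round(scores.std(), 1),
          "offset (one threat to validity):", round(scores.mean() - true_ability, 1))
```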

Construct validity is a framework for discussing validity that includes numerous different types of evidence. Messick highlights six aspects of construct validity:

• Content: How well does the content of the measurement instrument, such as the items on a questionnaire, measure the construct of interest?
• Substantive: Is the construct supported by a sound theoretical foundation?
• Structural: Does the score express relationships in the construct domain?
• Generalizability: Does the score generalize across different populations, settings, and tasks?
• External: Does the score successfully predict external criteria?
• Consequential: What are the potential risks of using the score with regard to bias, fairness, and distributive justice?

Of these different criteria, external validity is the one most familiar to the machine learning practitioner. But machine learning practice would do well to embrace the other, more qualitative, criteria as well. Ultimately, measurement forces us to grapple with the often surprisingly uncomfortable question: What are we even trying to do when we predict something?

Problem framing: comparisons with humans

A long-standing ambition of artificial intelligence research is to match or exceed human cognitive abilities with algorithms. This desire often leads to comparisons between humans and machines on various tasks. Judgments about human accuracy also often enter the debate about when to use statistical models in high-stakes decision-making settings.

models is by no means new. For decades, researchers have compared the accuracy of human judgments with that of statistical models.475

475 Dawes, Faust, and Meehl, "Clinical Versus Actuarial Judgment."

Even within machine learning, the debate dates way back.

A 1991 paper by Bromley and Säckinger explicitly compared the performance of artificial neural networks to a measure of human accuracy on the USPS digits dataset that predates the famous MNIST data.476 A first experiment put the human error rate at 2.5%, a second experiment found 1.51%, while a third reported 2.37%.477

476 Bromley and Säckinger, "Neural-Network and k-Nearest-Neighbor Classifiers."
477 Chaaban and Scheessele, "Human Performance on the USPS Database," Report, Indiana University South Bend, 2007.

Comparison with so-called human baselines has since become widely accepted in the machine learning community. The Electronic Frontier Foundation (EFF), for example, hosts a major repository of AI progress measures that compares the performance of machine learning models to reported human accuracies on numerous benchmarks.478

478 See https://www.eff.org/ai/metrics.

For the ILSVRC 2012 data, the reported human accuracy is 5.1%.479 This often-quoted number corresponds to the performance of a single human annotator who was "trained on 500 images and annotated 1500 test images".480 A second annotator who was "trained on 100 images and then annotated 258 test images" achieved an accuracy of 12%. Based on the 5.1% number, researchers announced in 2015 that their model was "the first to surpass human-level performance".481 Not surprisingly, this claim received significant attention throughout the media.

479 To be precise, this number refers to the fraction of times that the correct image label was not contained in the top 5 predicted labels of the model or human.
480 Russakovsky et al., "Imagenet Large Scale Visual Recognition Challenge."
481 He et al., "Delving Deep into Rectifiers: Surpassing Human-Level Performance on Imagenet Classification," in Proceedings of the IEEE International Conference on Computer Vision, 2015, 1026–34.

However, a later, more careful investigation into "human accuracy" on ImageNet revealed a very different picture.482 The researchers found that only models from 2020 onward are actually on par with the strongest human labeler. Moreover, when restricting the data to 590 object classes out of the 1000 classes in total, the best human labeler performed much better, at less than 1% error, than even the best predictive models. Recall that the ILSVRC 2012 data features 118 different dog breeds alone, some of which are extremely hard to distinguish for anyone who is not a trained dog expert. In fact, the researchers had to consult with experts from the American Kennel Club (AKC) to disambiguate challenging cases of different dog breeds. Simply removing the dog classes alone increases the performance of the best human labeler to less than 1.3% error.

482 Shankar et al., "Evaluating Machine Accuracy on Imagenet," in International Conference on Machine Learning (PMLR, 2020), 8634–44.
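Since the 5.1% figure and the comparisons around it are top-5 error rates, here is a minimal sketch of how that metric is computed from a matrix of class scores; the scores and labels are random placeholders, not ILSVRC data.

```python
import numpy as np

def top5_error(scores, labels):
    """Fraction of examples whose true label is not among the 5 highest-scoring classes."""
    top5 = np.argsort(scores, axis=1)[:, -5:]          # indices of the 5 largest scores per row
    hits = np.any(top5 == labels[:, None], axis=1)
    return 1.0 - hits.mean()

# Hypothetical scores for 4 examples over 1000 ILSVRC-style classes.
rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 1000))
labels = np.array([3, 17, 999, 42])
print(top5_error(scores, labels))
```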

There is another troubling fact. Small variations in the data collection protocol turn out to have a significant effect on the performance of machine classifiers: "the accuracy scores of even the best image classifiers are still highly sensitive to minutiae of the data cleaning process."483

483 Recht et al., "Do ImageNet Classifiers Generalize to ImageNet?"

These results cast doubt not only on how we measure human accuracy, but also on the validity of the presumed theoretical construct of "human accuracy" itself. However, the machine learning community has adopted a rather casual approach to measuring human accuracy. Many researchers assume that the construct of human accuracy exists unambiguously and is whatever number comes out of some ad-hoc testing protocol for some set of human beings. These ad-hoc protocols often result in anecdotal comparisons of questionable scientific value.

Invalid judgments about human performance relative to machines are not just a scientific error; they also have the potential to create narratives that support poor choices in high-stakes policy questions around the use of predictive models in consequential decisions. For example, criminal justice policy is being driven by claims that statistical methods are superior to judges at predicting risk of recidivism or failure to appear in court. However, these comparisons are dubious because judges are not solving pure prediction problems

but rather incorporate other factors, such as leniency towards younger defendants.484

484 Stevenson and Doleac, "Algorithmic Risk Assessment in the Hands of Humans."

Problem framing: focusing on a single optimization objective

Real-life problems rarely involve optimizing a single objective; more commonly, they involve some kind of tradeoff between multiple objectives. How best to formulate this as a statistical optimization problem is both an art and a science. However, benchmark tasks, especially those with leaderboards, tend to pick a single objective. For high-profile benchmarks, the resulting "overfitting to the problem formulation" may create scientific blind spots and limit the applicability of published findings to practical settings.

For example, it was well known at the time Netflix launched its Prize that recommendation is not just a matter of maximizing predictive accuracy and, even to the extent that it is, there isn't one single measure that's always appropriate.485 Yet the contest focused purely on prediction accuracy evaluated by a single metric. A few years after the contest ended, Netflix revealed that most of the work that went into the leaderboard had not translated to production models. Part of the reason was that the contest did not capture the range of Netflix's objectives and constraints: the tight dependence of recommendations on the user interface; the fact that "users" are typically households made up of members with differing tastes; explainability; freshness; and many more.486

485 Herlocker et al., "Evaluating Collaborative Filtering Recommender Systems," ACM Transactions on Information Systems (TOIS) 22, no. 1 (2004): 5–53.
486 Amatriain and Basilico, "Netflix Recommendations: Beyond the 5 Stars (Part 1)," Netflix Tech Blog 6 (2012).

If many of the insights from the leaderboard did not even generalize to Netflix's own production setting, the gap between Netflix and other recommendation-oriented platforms is far greater. Notably, as a movie platform, Netflix is unusual in that it has a relatively static inventory compared to platforms with user-generated content such as YouTube or Facebook. When the content pool is dynamic, a different class of algorithms is needed. The pull that the Netflix Prize exerted on recommender systems research may have diverted attention away from the latter type of algorithm for many years, although it is hard to know for sure because the counterfactual is unobservable.

Formal machine learning competitions, even if they cause blind spots due to the need to pick a single optimization objective, are at least carefully structured to promote scientific progress in some narrow sense. Arguably more damaging are the informal competitions that seem to inevitably emerge in the presence of a prominent benchmark dataset, resulting in unfortunate outcomes such as insightful

papers being rejected because they failed to beat the state of the art, or unoriginal papers being published because they did beat the state of the art through the (scientifically insignificant) application of greater computing power.

Another downside of a field oriented around one-dimensional, competitive pursuit is that it becomes structurally difficult to address biases in models and classifiers. If a contestant takes steps to prevent dataset bias from propagating to their models, there will be an accuracy drop (because accuracy is judged on a biased dataset) and fewer people will pay attention to the work.

As fairness issues in machine learning have gained prominence, fairness-focused benchmark datasets have proliferated, such as the Pilot Parliaments Benchmark for facial analysis487 and the Equity Evaluation Corpus for sentiment analysis.488 An advantage of this approach is that the scientific and cultural machinery of benchmark-oriented innovation can be repurposed for fairness research. A potential danger is Goodhart's law, which states, in its broad form, "When a measure becomes a target, it ceases to be a good measure." As we've emphasized in this book, fairness is multi-faceted, and benchmarks can capture only narrow notions of fairness. While these can be useful diagnostics, if they are misconstrued as targets in their own right, then research focused on optimizing for these benchmarks may not result in fairness in a more substantive sense.

487 Buolamwini and Gebru, "Gender Shades."
488 Kiritchenko and Mohammad, "Examining Gender and Race Bias in Two Hundred Sentiment Analysis Systems," in Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics (New Orleans, Louisiana: Association for Computational Linguistics, 2018), 43–53, https://doi.org/10.18653/v1/S18-2005.

In addition to creating fairness-focused benchmarks, the algorithmic fairness community has also repurposed earlier benchmarks for the study of fairness questions. Consider the Census dataset from the UCI repository discussed earlier. It originally gained popularity as a source of real-world data. Its use is acceptable for studying algorithmic questions such as, say, the relative strengths of decision trees and logistic regression; we expect the answers to be insensitive to issues like the cultural context of the data. But now it is being used to study fairness questions such as how classification accuracy tends to vary by race or gender. For such questions, the answers are sensitive to the details of the subpopulations. Further, the classification task associated with the benchmark (prediction of income treated as a binary variable) is artificial and does not correspond to any real-life application. Thus, accuracy disparities (and other fairness-related measurements) may look different for a different task, or if the data had been sampled differently, or if it came from a different time or place. Using benchmark datasets to make generalizable claims about fairness requires careful attention to issues of context, sampling, and validity.

Limits of data and prediction

Machine learning fails in many scenarios, and it's important to understand the failure cases as much as the success stories.

The Fragile Families Challenge was a machine learning competition based on the Fragile Families and Child Wellbeing Study (FFCWS).489 Starting from a random sample of hospital births between 1998 and 2000, the FFCWS followed thousands of American families over the course of 15 years, collecting detailed information about the families' children, their parents, educational outcomes, and the larger social environment. Once a family agreed to participate in the study, data were collected when the child was born and then at ages 1, 3, 5, 9, and 15.

489 Reichman et al., "Fragile Families: Sample and Design," Children and Youth Services Review 23, no. 4–5 (2001): 303–26.

The Fragile Families Challenge concluded in 2017. The underlying dataset for the competition contains 4,242 rows, one for each family, and 12,943 columns, one for each variable plus an ID number for each family. Of the 12,942 variables, 2,358 are constant (i.e., they have the same value for all rows), mostly due to redactions for privacy and ethics reasons. Of the approximately 55 million (4,242 x 12,942) entries in the dataset, about 73% do not have a value. Missing values have many possible causes, including non-response of surveyed families, participants dropping out of the study, and logical relationships between features that imply certain fields are missing depending on how others are set.

There are six outcome variables, measured at age 15: (1) child grade point average (GPA), (2) child grit, (3) household eviction, (4) household material hardship, (5) caregiver layoff, and (6) caregiver participation in job training.

The goal of the competition was to predict the values of the outcome variables at age 15 given the data from ages 1 through 9. As is common for competitions, the challenge featured a three-way data split: training, leaderboard, and test sets. The training set is publicly available to all participants, the leaderboard data support a leaderboard throughout the competition, and the test set is used to determine the final winner.

The outcome of the prediction challenge was disappointing. Even the winning models performed hardly better than a simple baseline: their predictions didn't differ much from predicting the mean of each outcome.

What caused the poor performance of machine learning on the Fragile Families data? One obvious possibility is that none of the contestants hit upon the right machine learning techniques for the task. But the fact that 160 teams of motivated experts submitted thousands of models over the course of five months makes this highly unlikely. Besides, models from disparate model classes all

made very similar (and equally erroneous) predictions, suggesting 490 490,491 Salganik et al., “Measuring the that learning algorithms weren’t the limitation. There are a Predictability of Life Outcomes with a few other technical possibilities that could explain the disappointing Scientific Mass Collaboration.” 491 performance, including the sample size, the study design, and the This highlights an advantage of the benchmark dataset approach over one missing values. with less standardization: even when But there is also a more fundamental reason that remains plausible. there is a failure to make substantial progress on prediction, we can learn Perhaps the dynamics of life trajectories are inherently unpredictable something valuable from that failure. over the six year time delay between measurement of the covariates and measurement of the outcome. This six year gap, for example, included the Great Recession, a period of economic shocks and decline between 2007 and 2009, that might have changed trajectories in unforeseeable ways. In fact, there’s an important reason why even the performance of models in the challenge, dismal as they were, may overestimate what we can expect in a real-world setting. That’s because the models were allowed to peek into the future, so to speak. The training and test sets were drawn from the same distribution and, in particular, the same time period, as is the standard practice in machine learning research. Thus, the data already incorporates information about the effect of the Great Recession and other global shocks during this period. In a real application, models must be trained on data from the past whereas predictions are about the future. Thus, there is always some drift — a change in the relationship between the covariates and the outcome. This puts a further limit on model performance. Machine learning works best in a static and stable world where the past looks like the future. Prediction alone can be a poor choice when we’re anticipating dynamic changes, or when we are trying to reason about the effect that hypothetical actions would have in the real world.
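The comparison to the mean baseline can be made precise with a skill score: how much lower is the model's error than the error of always predicting the training-set mean? The sketch below uses mean squared error for concreteness; the challenge's exact scoring rule and data-handling details are not specified in this chapter, so treat the function as an illustration rather than a reproduction of the official evaluation.

import numpy as np

def skill_over_mean_baseline(y_train: np.ndarray,
                             y_test: np.ndarray,
                             y_pred: np.ndarray) -> float:
    """Return 1 - MSE(model) / MSE(mean baseline) on held-out data.

    A value near 0 means the model barely improves on predicting the
    training-set mean; 1 would mean perfect predictions. (Illustrative
    metric, not the challenge's official scoring rule.)
    """
    baseline = np.full_like(y_test, y_train.mean(), dtype=float)
    mse_model = float(np.mean((y_test - y_pred) ** 2))
    mse_baseline = float(np.mean((y_test - baseline) ** 2))
    return 1.0 - mse_model / mse_baseline

A score near zero for each of the six outcomes, which is roughly what the challenge found, indicates that the submitted models added little information beyond the base rates. And because training and test data came from the same time period, even that modest score is likely optimistic relative to a genuinely prospective prediction task.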

Summary

Benchmark datasets are central to machine learning. They play many roles, including enabling algorithmic innovation, measuring progress, and providing training data. Since its systematization in the late 1980s, performance evaluation on benchmarks has gradually become a ubiquitous practice because it makes it harder for researchers to cheat intentionally or unintentionally.

But an excessive focus on benchmarks brings many drawbacks. Researchers spend prodigious amounts of effort optimizing models to achieve state-of-the-art performance. The results are often both scientifically uninteresting and of little relevance to practitioners because benchmarks omit many real-world details. The approach also amplifies the harms associated with data, including downstream harms, representational harms, and privacy violations. As we write this book, the benchmark approach is coming under scrutiny because of these ethical concerns.

While the benefits and drawbacks of benchmarks are both well known, our overarching goal in this chapter has been to provide a single framework that can help analyze both. Our position is that the core of the benchmark approach is worth preserving, but we envision a future where benchmarks play a more modest role as one of many ways to advance knowledge. To mitigate the harms associated with data, we believe that substantial changes to the practices of dataset creation, use, and governance are necessary. We have outlined a few ways to do this, adding to the emerging literature on this topic.

Chapter notes

This chapter was developed and first published by Hardt and Recht in the textbook Patterns, Predictions, and Actions: A Story about Machine Learning.492 With permission from the authors, we include a large part of the original text here with only slight modifications. We removed a significant amount of material on adaptive data analysis and the problem of overfitting in machine learning benchmarks. We added new material on the roles that datasets play, as well as discussion about fairness and ethical concerns relating to datasets.

The benefits of the benchmark dataset approach are discussed in a talk by Mark Liberman, who calls it the common task method.493 Adaptivity in holdout reuse was the subject of Dwork et al.494 and much subsequent work in the area of adaptive data analysis. Similar concerns go under the name of inference after selection in the statistics community.

The collection and use of large ad-hoc data sets (once referred to as "big data") has been scrutinized in several important works; see, for example, boyd and Crawford,495 as well as Tufekci.496 More recently, Couldry and Mejias497 use the term data colonialism to emphasize the processes by which data are appropriated and marginalized communities are exploited through data collection.

Olteanu et al.498 discuss biases, methodological pitfalls, and ethical questions in the context of social data analysis. In particular, the article provides comprehensive taxonomies of biases and issues that can arise in the sourcing, collection, processing, and analysis of social data.

For an introduction to measurement theory, not specific to the social sciences, see the books by Hand.499 The comprehensive textbook by Bandalos500 focuses on applications to the social sciences, including a chapter on fairness.

492 Hardt and Recht, Patterns, Predictions, and Actions.

493 Liberman, "Reproducible Research and the Common Task Method," Simons Foundation Lecture, 2015.

494 Dwork et al., "The Reusable Holdout: Preserving Validity in Adaptive Data Analysis," Science 349, no. 6248 (2015): 636–38.

495 boyd and Crawford, "Critical Questions for Big Data: Provocations for a Cultural, Technological, and Scholarly Phenomenon," Information, Communication & Society 15, no. 5 (2012): 662–79.

496 Tufekci, "Big Questions for Social Media Big Data: Representativeness, Validity and Other Methodological Pitfalls," in Proc. 8th International AAAI Conference on Weblogs and Social Media, 2014; Tufekci, "Engineering the Public: Big Data, Surveillance and Computational Politics," First Monday, 2014.

497 Couldry and Mejias, "Data Colonialism: Rethinking Big Data's Relation to the Contemporary Subject," Television & New Media 20, no. 4 (2019): 336–49.

498 Olteanu et al., "Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries," Frontiers in Big Data 2 (2019): 13.

499 Hand, Measurement Theory and Practice: The World Through Quantification (Wiley, 2010); Hand, Measurement: A Very Short Introduction (Oxford University Press, 2016).

500 Bandalos, Measurement Theory and Applications for the Social Sciences (Guilford Publications, 2018).

Bibliography

Agan, Amanda, and Sonja Starr. “Ban the Box, Criminal Records, and Racial Discrimination: A Field Experiment.” The Quarterly Journal of Economics 133, no. 1 (2017): 191–235. Akbar, Amna. “An Abolitionist Horizon for Police (Reform).” Califor- nia Law Review 108, no. 6 (2020). Albright, Alex. “If You Give a Judge a Risk Score: Evidence from Kentucky Bail Decisions.” Harvard John M. Olin Fellow’s Discussion Paper 85 (2019). Ali, Muhammad, Piotr Sapiezynski, Miranda Bogen, Aleksandra Ko- rolova, Alan Mislove, and Aaron Rieke. “Discrimination Through Optimization: How Facebook’s Ad Delivery Can Lead to Biased Outcomes.” Proceedings of the ACM on Human-Computer Interaction 3, no. CSCW (2019): 199. Alkhatib, Ali, and Michael Bernstein. “Street-Level Algorithms: A Theory at the Gaps Between Policy and Decisions.” In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, 1–13, 2019. Altman, Andrew. “Discrimination.” In The Stanford Encyclopedia of Phi- losophy, edited by Edward N. Zalta, Winter 2020. https://plato. stanford.edu/archives/win2020/entries/discrimination/; Metaphysics Research Lab, Stanford University, 2020. Amatriain, Xavier, and Justin Basilico. “Netflix Recommendations: Beyond the 5 Stars (Part 1).” Netflix Tech Blog 6 (2012). Amorim, Evelin, Marcia Cançado, and Adriano Veloso. “Automated Essay Scoring in the Presence of Biased Ratings.” In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 229–37, 2018. Andreou, Athanasios, Oana Goga, Krishna Gummadi, Loiseau Patrick, and Alan Mislove. “AdAnalyst.” https://adanalyst. mpi-sws.org/, 2017. Angrist, Joshua D., and Pischke Jörn-Steffen. Mostly Harmless Econo- metrics: An Empiricist’s Companion. Princeton University Press, 2009. 224 fairness and machine learning

Angwin, Julia, Jeff Larson, Surya Mattu, and Lauren Kirchner. “Ma- chine Bias.” Pro Publica, 2016. Angwin, Julia, Madeleine Varner, and Ariana Tobin. “Facebook En- abled Advertisers to Reach ‘Jew Haters’.” ProPublica. https:// www.%20propublica.%20org/article/facebook-enabled-advertisers-to-reach-jew-haters, 2017. “Announcing the Contract Worker Disparity Project.” Tech Equity Collaborative, 2021. Antecol, Heather, Kelly Bedard, and Jenna Stearns. “Equal but In- equitable: Who Benefits from Gender-Neutral Tenure Clock Stopping Policies?” American Economic Review 108, no. 9 (2018): 2420–41. Arp, Daniel, Erwin Quiring, Feargus Pendlebury, Alexander War- necke, Fabio Pierazzi, Christian Wressnegger, Lorenzo Cavallaro, and Konrad Rieck. “Dos and Don’ts of Machine Learning in Computer Security,” October 2020. https://arxiv.org/abs/2010. 09470v1. Arrow, Kenneth. “The Theory of Discrimination.” Discrimination in Labor Markets 3, no. 10 (1973): 3–33. Ashkenas, Jeremy, Haeyoun Park, and Adam Pearce. “Even with Affirmative Action, Blacks and Hispanics Are More Underrep- resented at Top Colleges Than 35 Years Ago.” https://www. nytimes.com/interactive/2017/08/24/us/affirmative-action. html, 2017. Aström, Karl Johan, and Richard M Murray. Feedback Systems: An Introduction for Scientists and Engineers. Princeton university press, 2010. Ayres, Ian. “Three Tests for Measuring Unjustified Disparate Impacts in Organ Transplantation: The Problem of" Included Variable" Bias.” Perspectives in Biology and Medicine 48, no. 1 (2005): 68–S87. Ayres, Ian, Mahzarin Banaji, and Christine Jolls. “Race Effects on eBay.” The RAND Journal of Economics 46, no. 4 (2015): 891–917. Ayres, Ian, and Peter Siegelman. “Race and Gender Discrimination in Bargaining for a New Car.” The American Economic Review, 1995, 304–21. Bagwell, Kyle. “The Economic Analysis of Advertising.” Handbook of Industrial Organization 3 (2007): 1701–844. Baker, Kevin T. “World Processors: Computer Modeling, Global Environmentalism, and the Birth of Sustainable Development.” PhD thesis, Northwestern University, 2019. Bakshy, Eytan, Solomon Messing, and Lada A Adamic. “Exposure to Ideologically Diverse News and Opinion on Facebook.” Science 348, no. 6239 (2015): 1130–32. Bandalos, Deborah L. Measurement Theory and Applications for the bibliography 225

Social Sciences. Guilford Publications, 2018. Barabas, Chelsea, Madars Virza, Karthik Dinakar, Joichi Ito, and Jonathan Zittrain. “Interventions over Predictions: Reframing the Ethical Debate for Actuarial Risk Assessment.” In Conference on Fairness, Accountability and Transparency, 62–76. PMLR, 2018. Barocas, Solon. “Putting Data to Work.” In Data and Discrimination: Collected Essays, edited by Seeta Peña Gangadharan Virginia Eubanks and Solon Barocas, 59–62. New America Foundation, 2014. Barocas, Solon, and Karen Levy. “Privacy Dependencies.” Wash. L. Rev. 95 (2020): 555. Barocas, Solon, and Andrew D Selbst. “Big Data’s Disparate Impact.” UCLA Law Review, 2016. Baron, Reuben M, and David A Kenny. “The Moderator–Mediator Variable Distinction in Social Psychological Research: Conceptual, Strategic, and Statistical Considerations.” Journal of Personality and Social Psychology 51, no. 6 (1986): 1173. Barone, Guglielmo, and Sauro Mocetti. “Intergenerational Mobility in the Very Long Run: Florence 1427-2011.” Bank of Italy Temi Di Discussione (Working Paper) No 1060 (2016). Bashir, Muhammad Ahmad, Sajjad Arshad, William Robertson, and Christo Wilson. “Tracing Information Flows Between Ad Exchanges Using Retargeted Ads.” In USENIX Security Symposium 16, 481–96, 2016. Becker, Gary S. The Economics of Discrimination. University of Chicago Press, 1957. Bender, Emily M, , Angelina McMillan-Major, and Shmargaret Shmitchell. “On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610– 23, 2021. Benjamin, Ruha. Race After Technology. Polity, 2019. Bennett, James, and Stan Lanning. “The Netflix Prize.” In Proceedings of KDD Cup and Workshop, 2007:35. New York, NY, USA., 2007. Berk, Richard, Hoda Heidari, Shahin Jabbari, Michael Kearns, and Aaron Roth. “Fairness in Criminal Justice Risk Assessments: The State of the Art.” ArXiv e-Prints 1703.09207 (2017). Berkson, Joseph. “Limitations of the Application of Fourfold Table Analysis to Hospital Data.” International Journal of Epidemiology 43, no. 2 (2014): 511–15. Bertrand, Marianne, and Esther Duflo. “Field Experiments on Dis- crimination.” In Handbook of Economic Field Experiments, 1:309–93. Elsevier, 2017. Bertrand, Marianne, Esther Duflo, and Sendhil Mullainathan. “How 226 fairness and machine learning

Much Should We Trust Differences-in-Differences Estimates?” The Quarterly Journal of Economics 119, no. 1 (2004): 249–75. Bertrand, Marianne, and Sendhil Mullainathan. “Are Emily and Greg More Employable Than Lakisha and Jamal? A Field Experiment on Labor Market Discrimination.” American Economic Review 94, no. 4 (2004): 991–1013. Bian, Lin, Sarah-Jane Leslie, and Andrei Cimpian. “Gender Stereo- types about Intellectual Ability Emerge Early and Influence Children’s Interests.” Science 355, no. 6323 (2017): 389–91. Bickel, Peter J, Eugene A Hammel, J William O’Connell, and others. “Sex Bias in Graduate Admissions: Data from Berkeley.” Science 187, no. 4175 (1975): 398–404. Billsus, Daniel, Michael J Pazzani, and others. “Learning Collabora- tive Information Filters.” In Icml, 98:46–54, 1998. Bird, Sarah, Solon Barocas, Kate Crawford, Fernando Diaz, and Hanna Wallach. “Exploring or Exploiting? Social and Ethical Implications of Autonomous Experimentation in AI.” In Workshop on Fairness, Accountability, and Transparency in Machine Learning, 2016. Blagus, Rok, and Lara Lusa. “Joint Use of over- and Under-Sampling Techniques and Cross-Validation for the Development and Assess- ment of Prediction Models.” BMC Bioinformatics 16, no. 1 (Novem- ber 2015): 363. https://doi.org/10.1186/s12859-015-0784-9. Blank, Rebecca M. “The Effects of Double-Blind Versus Single-Blind Reviewing: Experimental Evidence from the American Economic Review.” The American Economic Review, 1991, 1041–67. Blum, Avrim, and Moritz Hardt. “The Ladder: A Reliable Leader- board for Machine Learning Competitions.” In International Conference on Machine Learning, 1006–14. PMLR, 2015. Bogen, Miranda, and Aaron Rieke. “Help wanted: an examination of hiring algorithms, equity, and bias.” Technical report, Upturn, 2018. Bongers, Stephan, Jonas Peters, Bernhard Schölkopf, and Joris M. Mooij. “Theoretical Aspects of Cyclic Structural Causal Models.” arXiv.org Preprint arXiv:1611.06221v2 (2018). Bonham, Vence L, Shawneequa L Callier, and Charmaine D Royal. “Will Precision Medicine Move Us Beyond Race?” The New Eng- land Journal of Medicine 374, no. 21 (2016): 2003. Bouk, Dan. How Our Days Became Numbered: Risk and the Rise of the Statistical Individual. University of Chicago Press, 2015. Bouthillier, Xavier, Pierre Delaunay, Mirko Bronzi, Assya Trofimov, Brennan Nichyporuk, Justin Szeto, Nazanin Mohammadi Sep- ahvand, et al. “Accounting for Variance in Machine Learning Benchmarks.” Proceedings of Machine Learning and Systems 3 (2021). bibliography 227

Bowker, Geoffrey C., and Susan Leigh Star. Sorting Things Out: Classi- fication and Its Consequences. MIT Press, 2000. Bowles, Samuel, and Rajiv Sethi. “Social Segregation and the Dynam- ics of Group Inequality,” 2006. Boyd, D. “White Flight in Networked Publics: How Race and Class Shaped American Teen Engagement with MySpace and Facebook. Nakamura l, Chow-White PA, Eds. Race After the Internet.” Race After the Internet, 2012, 203–22. boyd, danah, and Kate Crawford. “Critical Questions for Big Data: Provocations for a Cultural, Technological, and Scholarly Phe- nomenon.” Information, Communication & Society 15, no. 5 (2012): 662–79. Branwen, Gwern. “The Neural Net Tank Urban Legend,” 2011. Brayne, Sarah. Predict and Surveil: Data, Discretion, and the Future of Policing. Oxford University Press, USA, 2020. Breiman, Leo, and others. “Statistical Modeling: The Two Cultures (with Comments and a Rejoinder by the Author).” Statistical Science 16, no. 3 (2001): 199–231. Bromley, J, and E Sackinger. “Neural-Network and k-Nearest- Neighbor Classifiers.” Rapport Technique, 1991, 11359–910819. Bullard, Robert Doyle, Glenn Steve Johnson, and Angel O Torres. Highway Robbery: Transportation Racism & New Routes to Equity. South End Press, 2004. Buolamwini, Joy, and Timnit Gebru. “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification.” In Conference on Fairness, Accountability and Transparency, 77–91, 2018. Buranyi, Stephen. “How to Persuade a Robot That You Should Get the Job.” Guardian, 2018. Calders, Toon, Faisal Kamiran, and Mykola Pechenizkiy. “Building Classifiers with Independency Constraints.” In In Proc. IEEE ICDMW, 13–18, 2009. Caliskan, Aylin, Joanna J Bryson, and Arvind Narayanan. “Semantics Derived Automatically from Language Corpora Contain Human- Like Biases.” Science 356, no. 6334 (2017): 183–86. Campolo, Alex, Madelyn Sanfilippo, Meredith Whittaker, and Kate Crawford. “AI Now 2017 Report.” AI Now Institute at New York University, 2017. Carlini, Nicholas, Chang Liu, Úlfar Erlingsson, Jernej Kos, and Dawn Song. “The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks.” In 28th {USENIX} Security Symposium ({USENIX} Security 19), 267–84, 2019. Cartwright, Nancy. Hunting Causes and Using Them, Too. Cambridge University Press, 2006. Caruana, Rich, Yin Lou, Johannes Gehrke, Paul Koch, Marc Sturm, 228 fairness and machine learning

and Noemie Elhadad. “Intelligible Models for Healthcare: Pre- dicting Pneumonia Risk and Hospital 30-Day Readmission.” In Proc. 21St ACM SIGKDD, 1721–30, 2015. Chaaban, Ibrahim, and Michael R Scheessele. “Human Performance on the USPS Database.” Report, Indiana University South Bend, 2007. Chaney, Allison JB, Brandon M Stewart, and Barbara E Engelhardt. “How Algorithmic Confounding in Recommendation Systems Increases Homogeneity and Decreases Utility.” In Proceedings of the 12th ACM Conference on Recommender Systems, 224–32. ACM, 2018. Chen, Le, Alan Mislove, and Christo Wilson. “Peeking Beneath the Hood of Uber.” In Proceedings of the 2015 Internet Measurement Conference, 495–508. ACM, 2015. Chetty, Raj, Nathaniel Hendren, Maggie R Jones, and Sonya R Porter. “Race and Economic Opportunity in the United States: An Inter- generational Perspective.” The Quarterly Journal of Economics 135, no. 2 (2020): 711–83. Chiappa, Silvia. “Path-Specific Counterfactual Fairness.” In Proc. 33Rd AAAI, 33:7801–8, 2019. Chiappa, Silvia, and William S. Isaac. “A Causal Bayesian Networks Viewpoint on Fairness.” Arxiv.org arXiv:1907.06430 (2019). Chohlas-Wood, Alex, Joe Nudell, Zhiyuan Jerry Lin, Julian Nyarko, and Sharad Goel. “Blind Justice: Algorithmically Masking Race in Charging Decisions.” Technical report, 2020. Chouldechova, Alexandra. “Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments.” In Proc. 3Rd FATML, 2016. Chouldechova, Alexandra, Diana Benavides-Prado, Oleksandr Fialko, and Rhema Vaithianathan. “A Case Study of Algorithm-Assisted Decision Making in Child Maltreatment Hotline Screening Deci- sions.” In Conference on Fairness, Accountability and Transparency, 134–48, 2018. Christodoulou, Evangelia, Jie Ma, Gary S. Collins, Ewout W. Steyer- berg, Jan Y. Verbakel, and Ben Van Calster. “A Systematic Review Shows No Performance Benefit of Machine Learning over Logis- tic Regression for Clinical Prediction Models.” Journal of Clinical Epidemiology 110 (June 2019): 12–22. https://doi.org/10.1016/j. jclinepi.2019.02.004. Church, Kenneth Ward. “Emerging Trends: A Tribute to Charles Wayne.” Natural Language Engineering 24, no. 1 (2018): 155–60. Citron, Danielle Keats. “Technological Due Process.” Wash. UL Rev. 85 (2007): 1249. Cleary, T Anne. “Test Bias: Prediction of Grades of Negro and White bibliography 229

Students in Integrated Colleges.” Journal of Educational Measure- ment 5, no. 2 (1968): 115–24. ———. “Test Bias: Validity of the Scholastic Aptitude Test for Negro and White Students in Integrated Colleges.” ETS Research Bulletin Series 1966, no. 2 (1966): i–23. Coltrane, Scott, and Melinda Messineo. “The Perpetuation of Subtle Prejudice: Race and Gender Imagery in 1990s Television Advertis- ing.” Sex Roles 42, no. 5–6 (2000): 363–89. Cook, Cody, Rebecca Diamond, Jonathan Hall, John A List, and Paul Oyer. “The Gender Earnings Gap in the Gig Economy: Evidence from over a Million Rideshare Drivers.” National Bureau of Economic Research, 2018. Corbett-Davies, Sam, Emma Pierson, Avi Feller, Sharad Goel, and Aziz Huq. “Algorithmic Decision Making and the Cost of Fair- ness.” arXiv Preprint arXiv:1701.08230, 2017. Cortes, Corinna, and Vladimir Vapnik. “Support-Vector Networks.” Machine Learning 20, no. 3 (1995): 273–97. Couldry, Nick, and Ulises A. Mejias. “Data Colonialism: Rethinking Big Data’s Relation to the Contemporary Subject.” Television & New Media 20, no. 4 (2019): 336–49. Cowgill, Bo, Fabrizio Dell’Acqua, Samuel Deng, Daniel Hsu, Nakul Verma, and Augustin Chaintreau. “Biased Programmers? Or Bi- ased Data? A Field Experiment in Operationalizing Ai Ethics.” In Proceedings of the 21st ACM Conference on Economics and Computa- tion, 679–81, 2020. Crawford, Kate. The Atlas of AI. Yale University Press, 2021. ———. “The Hidden Biases in Big Data.” Harvard Business Review 1 (2013). ———. “The Trouble with Bias.” NIPS Keynote https://www.youtube. com/watch?v=fMym_BKWQzk, 2017. Crawford, Kate, and Trevor Paglen. “Excavating AI: The Politics of Training Sets for Machine Learning.” Excavating AI (Www.excavating.ai), 2019. Creel, Kathleen, and Deborah Hellman. “The Algorithmic Leviathan: Arbitrariness, Fairness, and Opportunity in Algorithmic Decision Making Systems.” Virginia Public Law and Legal Theory Research Paper, no. 2021–13 (2021). D’Onfro, Jillian. “Google Tests Changes to Its Search Algorithm; How Search Works.” https://www.cnbc.com/2018/09/17/google-tests-changes-to-its-search-algorithm-how-search-works. html, 2019. Danesi, Marcel. Dictionary of Media and Communications. Routledge, 2014. Danziger, Shai, Jonathan Levav, and Liora Avnaim-Pesso. “Extra- neous Factors in Judicial Decisions.” Proceedings of the National 230 fairness and machine learning

Academy of Sciences 108, no. 17 (2011): 6889–92. Darlington, Richard B. “Another Look at ‘Cultural Fairness’.” Journal of Educational Measurement 8, no. 2 (1971): 71–82. Dastin, Jeffrey. “Amazon Scraps Secret AI Recruiting Tool That Showed Bias Against Women.” Reuters, 2018. Datta, Amit, Michael Carl Tschantz, and Anupam Datta. “Automated Experiments on Ad Privacy Settings.” Proceedings on Privacy Enhancing Technologies 2015, no. 1 (2015): 92–112. Davidai, Shai, and Thomas Gilovich. “Building a More Mobile America—One Income Quintile at a Time.” Perspectives on Psy- chological Science 10, no. 1 (2015): 60–71. Dawes, Robyn M, David Faust, and Paul E Meehl. “Clinical Versus Actuarial Judgment.” Science 243, no. 4899 (1989): 1668–74. De-Arteaga, Maria, Alexey Romanov, Hanna Wallach, Jennifer Chayes, Christian Borgs, Alexandra Chouldechova, Sahin Geyik, Krishnaram Kenthapadi, and Adam Tauman Kalai. “Bias in Bios: A Case Study of Semantic Representation Bias in a High-Stakes Setting.” In Proceedings of the Conference on Fairness, Accountability, and Transparency, 120–28. ACM, 2019. Deaton, Angus, and Nancy Cartwright. “Understanding and Mis- understanding Randomized Controlled Trials.” Social Science & Medicine 210 (2018): 2–21. DeCoste, Dennis, and Bernhard Schölkopf. “Training Invariant Support Vector Machines.” Machine Learning 46, no. 1 (2002): 161–90. Deerwester, Scott, Susan T Dumais, George W Furnas, Thomas K Landauer, and Richard Harshman. “Indexing by Latent Semantic Analysis.” Journal of the American Society for Information Science 41, no. 6 (1990): 391–407. Dell, Melissa. “The Persistent Effects of Peru’s Mining Mita.” Econo- metrica 78, no. 6 (2010): 1863–903. DeMasi, Orianna, Konrad Kording, and Benjamin Recht. “Mean- ingless Comparisons Lead to False Optimism in Medical Ma- chine Learning.” PLoS ONE 12, no. 9 (September 2017). https: //doi.org/10.1371/journal.pone.0184604. Deng, J., W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. “ImageNet: A Large-Scale Hierarchical Image Database.” In Proc. CVPR, 2009. Deng, Jia, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. “Imagenet: A Large-Scale Hierarchical Image Database.” In 2009 IEEE Conference on Computer Vision and Pattern Recognition, 248–55. Ieee, 2009. Dieterich, William, Christina Mendoza, and Tim Brennan. “COM- PAS Risk Scales: Demonstrating Accuracy Equity and Predic- tive Parity,” 2016. https://www.documentcloud.org/documents/ bibliography 231

2998391-ProPublica-Commentary-Final-070616.html. Dillon, Eleanor Wiske, and Jeffrey Andrew Smith. “The Determinants of Mismatch Between Students and Colleges.” National Bureau of Economic Research, 2013. Dobbie, Will, Jacob Goldin, and Crystal Yang. “The Effects of Pre- Trial Detention on Conviction, Future Crime, and Employment: Evidence from Randomly Assigned Judges.” National Bureau of Economic Research, 2016. Duda, Richard O, Peter E Hart, and David G Stork. Pattern Classifica- tion and Scene Analysis. Vol. 3. Wiley New York, 1973. Dwork, Cynthia, Vitaly Feldman, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Aaron Roth. “The Reusable Holdout: Pre- serving Validity in Adaptive Data Analysis.” Science 349, no. 6248 (2015): 636–38. Dwork, Cynthia, Moritz Hardt, Toniann Pitassi, Omer Reingold, and Richard Zemel. “Fairness Through Awareness.” In Proc. 3Rd ITCS, 214–26, 2012. Dwork, Cynthia, Aaron Roth, and others. “The Algorithmic Founda- tions of Differential Privacy.” Foundations and Trends in Theoretical Computer Science 9, no. 3–4 (2014): 211–407. Dwork, Cynthia, Adam Smith, Thomas Steinke, and Jonathan Ullman. “Exposed! A Survey of Attacks on Private Data.” Annual Review of Statistics and Its Application 4 (2017): 61–84. Dynarski, Susan, CJ Libassi, Katherine Michelmore, and Stephanie Owen. “Closing the Gap: The Effect of a Targeted, Tuition-Free Promise on College Choices of High-Achieving, Low-Income Students.” National Bureau of Economic Research, 2018. Edelman, Benjamin, Michael Luca, and Dan Svirsky. “Racial Dis- crimination in the Sharing Economy: Evidence from a Field Experiment.” American Economic Journal: Applied Economics 9, no. 2 (2017): 1–22. Edelman, Lauren B. “Law at Work: The Endogenous Construction of Civil Rights.” In Handbook of Employment Discrimination Research, 337–52. Springer, 2005. Einhorn, Hillel J, and Alan R Bass. “Methodological Considerations Relevant to Discrimination in Employment Testing.” Psychological Bulletin 75, no. 4 (1971): 261. Ensign, Danielle, Sorelle A Friedler, Scott Neville, Carlos Scheidegger, and Suresh Venkatasubramanian. “Runaway Feedback Loops in Predictive Policing.” arXiv Preprint arXiv:1706.09847, 2017. Eren, Ozkan, and Naci Mocan. “Emotional Judges and Unlucky Juveniles.” American Economic Journal: Applied Economics 10, no. 3 (2018): 171–205. Eubanks, Virginia. Automating Inequality: How High-Tech Tools Profile, 232 fairness and machine learning

Police, and Punish the Poor. St. Martin’s Press, 2018. Evans, Richard. “RA Fisher and the Science of Hatred.” The New Statesman, 2020. Feldman, Michael, Sorelle A Friedler, John Moeller, Carlos Scheideg- ger, and Suresh Venkatasubramanian. “Certifying and Removing Disparate Impact.” In Proc. 21St SIGKDD. ACM, 2015. Fellner, Jamie. “Race, Drugs, and Law Enforcement in the United States.” Stan. L. & Pol’y Rev. 20 (2009): 257. Fields, Karen E., and Barbara J. Fields. Racecraft: The Soul of Inequality in American Life. Verso, 2014. Fiesler, Casey, Natalie Garrett, and Nathan Beard. “What Do We Teach When We Teach Tech Ethics? A Syllabi Analysis.” In Pro- ceedings of the 51st ACM Technical Symposium on Computer Science Education, 289–95, 2020. Fishbane, Alissa, Aurelie Ouss, and Anuj K Shah. “Behavioral Nudges Reduce Failure to Appear for Court.” Science 370, no. 6517 (2020). Fisher, Ronald A. “The Use of Multiple Measurements in Taxonomic Problems.” Annals of Eugenics 7, no. 2 (1936): 179–88. Forrester, Jay W. “Counterintuitive Behavior of Social Systems.” Technological Forecasting and Social Change 3 (1971): 1–22. ———. “System Dynamics, Systems Thinking, and Soft OR.” System Dynamics Review 10, no. 2–3 (1994): 245–56. ———. “Urban Dynamics.” IMR; Industrial Management Review (Pre- 1986) 11, no. 3 (1970): 67. Frankel, Marvin E. “Criminal Sentences: Law Without Order,” 1973. Fredrikson, Matt, Somesh Jha, and Thomas Ristenpart. “Model Inversion Attacks That Exploit Confidence Information and Basic Countermeasures.” In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, 1322–33, 2015. Freeman, Jonathan B, Andrew M Penner, Aliya Saperstein, Matthias Scheutz, and Nalini Ambady. “Looking the Part: Social Status Cues Shape Race Perception.” PloS One 6, no. 9 (2011): e25107. Friedman, Batya, and Helen Nissenbaum. “Bias in Computer Sys- tems.” ACM Transactions on Information Systems (TOIS) 14, no. 3 (1996): 330–47. Frucci, Adam. “HP Face-Tracking Webcams Don’t Recognize Black People.” https://gizmodo.com/hp-face-tracking-webcams-dont-recognize-black-people-5431190, 2009. Funk, Simon. “Try This at Home.” Http://Sifter.org/~Simon/Journal/2006, 2006. Gandy, Oscar H. Coming to Terms with Chance: Engaging Rational Discrimination and Cumulative Disadvantage. Routledge, 2016. Gao, Leo, Stella Biderman, Sid Black, Laurence Golding, Travis bibliography 233

Hoppe, Charles Foster, Jason Phang, et al. “The Pile: An 800gb Dataset of Diverse Text for Language Modeling.” arXiv Preprint arXiv:2101.00027, 2020. Garofolo, John S, Lori F Lamel, William M Fisher, Jonathan G Fiscus, and David S Pallett. “DARPA TIMIT Acoustic-Phonetic Conti- nous Speech Corpus CD-ROM. NIST Speech Disc 1-1.1.” NASA STI/Recon Technical Report n 93 (1993): 27403. Garvie, Clare, Alvaro Bedoya, and Jonathan Frankle. “The Perpetual Line-up.” Georgetown Law: Center on Privacy and Technology., 2016. Gates, Susan Wharton, Vanessa Gail Perry, and Peter M Zorn. “Au- tomated Underwriting in Mortgage Lending: Good News for the Underserved?” Housing Policy Debate 13, no. 2 (2002): 369–91. Ge, Yanbo, Christopher R Knittel, Don MacKenzie, and Stephen Zoepf. “Racial and Gender Discrimination in Transportation Network Companies.” National Bureau of Economic Research, 2016. Gebru, Timnit, Jamie Morgenstern, Briana Vecchione, Jennifer Wort- man Vaughan, Hanna Wallach, Hal Daumé III, and Kate Craw- ford. “Datasheets for Datasets.” arXiv Preprint arXiv:1803.09010, 2018. Gillespie, Tarleton. “Content Moderation, AI, and the Question of Scale.” Big Data & Society 7, no. 2 (2020): 2053951720943234. ———. Custodians of the Internet: Platforms, Content Moderation, and the Hidden Decisions That Shape Social Media. Yale University Press, 2018. ———. “The Politics of ‘Platforms’.” New Media & Society 12, no. 3 (2010): 347–64. Glasgow, Joshua, Sally Haslanger, Chike Jeffers, and Quayshawn Spencer. “What Is Race?: Four Philosophical Views,” 2019. Glymour, Clark. “Comment: Statistics and Metaphysics.” Journal of the American Statistical Association 81, no. 396 (1986): 964–66. Glymour, M Maria. “Using Causal Diagrams to Understand Com- mon Problems in Social Epidemiology.” Methods in Social Epidemi- ology, 2006, 393–428. Golebiewski, M, and D Boyd. “Data Voids: Where Missing Data Can Easily Be Exploited.” Data & Society 29 (2018). Gonen, Hila, and Yoav Goldberg. “Lipstick on a Pig: Debiasing Meth- ods Cover up Systematic Gender Biases in Word Embeddings but Do Not Remove Them.” arXiv Preprint arXiv:1903.03862, 2019. Google re:work team. “Guide: Hire by Committee.” https://rework. withgoogle.com/print/guides/6053596147744768/, 2021. Gray, Mary L, and Siddharth Suri. Ghost Work: How to Stop Silicon Valley from Building a New Global Underclass. Eamon Dolan Books, 2019. 234 fairness and machine learning

Green, Lisa J. African American English: A Linguistic Introduction. Cambridge University Press, 2002. Greene, Daniel, Anna Lauren Hoffmann, and Luke Stark. “Better, Nicer, Clearer, Fairer: A Critical Assessment of the Movement for Ethical Artificial Intelligence and Machine Learning.” In Proceedings of the 52nd Hawaii International Conference on System Sciences, 2019. Greiner, D. James, and Donald B. Rubin. “Causal Effects of Perceived Immutable Characteristics.” The Review of Economics and Statistics 93, no. 3 (2011): 775–85. Grother, Patrick J. “NIST Special Database 19.” Handprinted Forms and Characters Database, National Institute of Standards and Technology, 1995, 10. Hacking, Ian. “Making up People.” London Review of Books 28, no. 16 (2006). ———. The Social Construction of What? Harvard University Press, 2000. Hall, Anna B. AND Cook, Joshua D. AND O’Connell. “Predictors of Student Productivity in Biomedical Graduate School Applica- tions.” PLOS ONE 12, no. 1 (January 2017): 1–14. Halligan, Steve, Douglas G. Altman, and Susan Mallett. “Disadvan- tages of Using the Area Under the Receiver Operating Character- istic Curve to Assess Imaging Tests: A Discussion and Proposal for an Alternative Approach.” European Radiology 25, no. 4 (April 2015): 932–39. Hand, David J. Measurement: A Very Short Introduction. Oxford Univer- sity Press, 2016. Hand, David J. Measurement Theory and Practice: The World Through Quantification. Wiley, 2010. Hanna, Rema N, and Leigh L Linden. “Discrimination in Grading.” American Economic Journal: Economic Policy 4, no. 4 (2012): 146–68. Hannak, Aniko, Piotr Sapiezynski, Arash Molavi Kakhki, Balachan- der Krishnamurthy, David Lazer, Alan Mislove, and Christo Wilson. “Measuring Personalization of Web Search.” In Proceed- ings of the 22nd International Conference on World Wide Web, 527–38. ACM, 2013. Hardt, Moritz. “How Big Data Is Unfair.” https://medium.com/ @mrtz/how-big-data-is-unfair-9aa544d739de, 2014. Hardt, Moritz, Eric Price, and Nati Srebro. “Equality of Opportunity in Supervised Learning.” In Proc. 29Th NIPS, 3315–23, 2016. Hardt, Moritz, and Benjamin Recht. Patterns, Predictions, and Ac- tions: A Story about Machine Learning. https://mlstory.org, 2021. https://arxiv.org/abs/2102.05242. Harvey, Adam, and Jules LaPlace. “MegaPixels: Origins, Ethics, and bibliography 235

Privacy Implications of Publicly Available Face Recognition Image Datasets,” 2019. https://megapixels.cc/. Harwell, Drew. “Doorbell-Camera Firm Ring Has Partnered with 400 Police Forces, Extending Surveillance Concerns.” Washington Post, 2019. Haslanger, Sally. Resisting Reality: Social Construction and Social Cri- tique. Oxford University Press, 2012. Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. The Ele- ments of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2017. He, Kaiming, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. “Delving Deep into Rectifiers: Surpassing Human-Level Performance on Imagenet Classification.” In Proceedings of the IEEE International Conference on Computer Vision, 1026–34, 2015. Hellman, Deborah. “Sex, Causation, and Algorithms: How Equal Protection Prohibits Compounding Prior Injustice.” Washington University Law Review 98, no. 2 (2020): 481–523. ———. “Two Concepts of Discrimination.” Va. L. Rev. 102 (2016): 895. Herlocker, Jonathan L, Joseph A Konstan, Loren G Terveen, and John T Riedl. “Evaluating Collaborative Filtering Recommender Systems.” ACM Transactions on Information Systems (TOIS) 22, no. 1 (2004): 5–53. Hern, Alex. “Flickr Faces Complaints over ‘Offensive’auto-Tagging for Photos.” The Guardian 20 (2015). Hernán, Miguel A, David Clayton, and Niels Keiding. “The Simp- son’s paradox unraveled.” International Journal of Epidemiology 40, no. 3 (March 2011): 780–85. https://doi.org/10.1093/ije/ dyr041. Hernán, Miguel, and James Robins. Causal Inference. Boca Raton: Chapman & Hall/CRC, forthcoming, 2019. Hill, Kashmir. “Another Arrest, and Jail Time, Due to a Bad Facial Recognition Match.” New York Times, 2020. Hirschman, Daniel, and Isaac Ariail Reed. “Formation Stories and Causality in Sociology.” Sociological Theory 32, no. 4 (2014): 259– 82. Hoffmann, Anna Lauren. “Where Fairness Fails: Data, Algorithms, and the Limits of Antidiscrimination Discourse.” Information, Communication & Society 22, no. 7 (2019): 900–915. Holland, Paul W. “Statistics and Causal Inference.” Journal of the American Statistical Association (JASA) 81 (1986): 945–70. Huang, Gary B., Manu Ramesh, Tamara Berg, and Erik Learned- Miller. “Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments.” University of Massachusetts, Amherst, 2007. 236 fairness and machine learning

Humphrey, Linda L., Benjamin K. S. Chan, and Harold C. Sox. “Post- menopausal Hormone Replacement Therapy and the Primary Prevention of Cardiovascular Disease.” Annals of Internal Medicine 137, no. 4 (August 2002): 273–84. Huq, Aziz Z. “Racial Equity in Algorithmic Criminal Justice.” Duke LJ 68 (2018): 1043. Hutson, Jevan A, Jessie G Taft, Solon Barocas, and Karen Levy. “De- biasing Desire: Addressing Bias & Discrimination on Intimate Platforms.” Proceedings of the ACM on Human-Computer Interaction 2, no. CSCW (2018): 73. Imbens, Guido W., and Donald B. Rubin. Causal Inference for Statistics, Social, and Biomedical Sciences. Cambridge University Press, 2015. Ingold, David, and Spencer Soper. “Amazon Doesn’t Consider the Race of Its Customers. Should It?” https://www.bloomberg.com/ graphics/2016-amazon-same-day/, 2016. Ivanescu, A. E., P. Li, B. George, A. W. Brown, S. W. Keith, D. Raju, and D. B. Allison. “The Importance of Prediction Model Val- idation and Assessment in Obesity and Nutrition Research.” International Journal of Obesity 40, no. 6 (June 2016): 887–94. https://doi.org/10.1038/ijo.2015.214. Jackson, John W., and Tyler J. VanderWeele. “Decomposition Anal- ysis to Identify Intervention Targets for Reducing Disparities.” Epidemiology, 2018, 825–35. Jacobs, Abigail Z, and Hanna Wallach. “Measurement and Fairness.” In Proceedings of the 2021 ACM Conference on Fairness, Accountabil- ity, and Transparency, 375–85, 2021. Jaquette, Ozan, and Karina Salazar. “Opinion | Colleges Recruit at Richer, Whiter High Schools - the New York Times.” https://www. nytimes.com/interactive/2018/04/13/opinion/college-recruitment-rich-white. html, 2018. Jo, Eun Seo, and Timnit Gebru. “Lessons from Archives: Strategies for Collecting Sociocultural Data in Machine Learning.” In Proc. 3Rd Conference on Fairness, Accountability, and Transparency (FAccT), 306–16, 2020. Joachims, Thorsten, Adith Swaminathan, and Tobias Schnabel. “Un- biased Learning-to-Rank with Biased Feedback.” In Proc. 10Th International Conference on Web Search and Data Mining, 781–89. ACM, 2017. Jobin, Anna, Marcello Ienca, and Effy Vayena. “The Global Landscape of AI Ethics Guidelines.” Nature Machine Intelligence 1, no. 9 (2019): 389–99. Johnson, Melvin. “A Scalable Approach to Reducing Gender Bias in Google Translate.” Google Blog, 2020. Judd, Sarah. “Activities for Building Understanding: How Ai4all bibliography 237

Teaches AI to Diverse High School Students.” In Proceedings of the 51st ACM Technical Symposium on Computer Science Education, 633–34, 2020. Kaggle. “The Hewlett Foundation: Automated Essay Scoring.” https: //www.kaggle.com/c/asap-aes, 2012. Kahneman, Daniel, AM Rosenfield, L Gandhi, and T Blaser. “Noise: How to Overcome the High, Hidden Cost of Inconsistent Decision Makinghttps.” Harvard Business Review, 2016. Kalantari, Nima Khademi, and Ravi Ramamoorthi. “Deep High Dynamic Range Imaging of Dynamic Scenes.” ACM Trans. Graph 36, no. 4 (2017): 144. Kalluri, Ria. “The Values of Machine Learning.” NeurIPS Queer in AI workshop, 2019. Kamiran, Faisal, and Toon Calders. “Classifying Without Discriminat- ing.” In Proc. 2Nd International Conference on Computer, Control and Communication, 2009. Kang, Sonia K, Katherine A DeCelles, András Tilcsik, and Sora Jun. “Whitened Resumes: Race and Self-Presentation in the Labor Market.” Administrative Science Quarterly 61, no. 3 (2016): 469–502. Karabel, Jerome. The Chosen: The Hidden History of Admission and Ex- clusion at Harvard, Yale, and Princeton. Houghton Mifflin Harcourt, 2005. Kasy, Maximilian, and Rediet Abebe. “Fairness, Equality, and Power in Algorithmic Decision-Making.” In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 576–86, 2021. Kaufman, Liad, Dani Lischinski, and Michael Werman. “Content- Aware Automatic Photo Enhancement.” In Computer Graphics Forum, 31:2528–40. 8. Wiley Online Library, 2012. Kaufman, Shachar, Saharon Rosset, Claudia Perlich, and Ori Stitel- man. “Leakage in Data Mining: Formulation, Detection, and Avoidance.” ACM Transactions on Knowledge Discovery from Data (TKDD) 6, no. 4 (2012): 1–21. Kay, Matthew, Cynthia Matuszek, and Sean A Munson. “Unequal Representation and Gender Stereotypes in Image Search Results for Occupations.” In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, 3819–28. ACM, 2015. Kilbertus, Niki, Mateo Rojas-Carulla, Giambattista Parascandolo, Moritz Hardt, Dominik Janzing, and Bernhard Schölkopf. “Avoid- ing Discrimination Through Causal Reasoning.” In Proc. 30Th NIPS, 656–66, 2017. Kiritchenko, Svetlana, and Saif Mohammad. “Examining Gender and Race Bias in Two Hundred Sentiment Analysis Systems.” In Proceedings of the Seventh Joint Conference on Lexical and Computa- 238 fairness and machine learning

tional Semantics, 43–53. New Orleans, Louisiana: Association for Computational Linguistics, 2018. https://doi.org/10.18653/v1/ S18-2005. Kiritchenko, Svetlana, and Saif M Mohammad. “Examining Gender and Race Bias in Two Hundred Sentiment Analysis Systems.” arXiv Preprint arXiv:1805.04508, 2018. Kleinberg, Jon, Sendhil Mullainathan, and Manish Raghavan. “Inher- ent Trade-Offs in the Fair Determination of Risk Scores.” arXiv Preprint arXiv:1609.05807, 2016. Klonick, Kate. “The New Governors: The People, Rules, and Pro- cesses Governing Online Speech.” Harv. L. Rev. 131 (2017): 1598. Kochhar, Rakesh, and Anthony Cilluffo. “Key Findings on the Rise in Income Inequality Within America’s Racial and Ethnic Groups.” Pew Research Center, 2018. Koenecke, Allison, Andrew Nam, Emily Lake, Joe Nudell, Minnie Quartey, Zion Mengesha, Connor Toups, John R Rickford, Dan Jurafsky, and Sharad Goel. “Racial Disparities in Automated Speech Recognition.” Proceedings of the National Academy of Sciences 117, no. 14 (2020): 7684–89. Koh, Pang Wei, Shiori Sagawa, Henrik Marklund, Sang Michael Xie, Marvin Zhang, Akshay Balsubramani, Weihua Hu, et al. “Wilds: A Benchmark of in-the-Wild Distribution Shifts.” arXiv Preprint arXiv:2012.07421, 2020. Kohler-Hausmann, Issa. “Eddie Murphy and the Dangers of Coun- terfactual Causal Thinking about Detecting Racial Discrimination.” Nw. UL Rev. 113 (2018): 1163. ———. “Eddie Murphy and the Dangers of Counterfactual Causal Thinking about Detecting Racial Discrimination.” SSRN, 2019. Koren, Yehuda, Robert Bell, and Chris Volinsky. “Matrix Factoriza- tion Techniques for Recommender Systems.” Computer 42, no. 8 (2009): 30–37. Krieger, Nancy. “Discrimination and Health Inequities.” International Journal of Health Services 44, no. 4 (2014): 643–710. ———. “Epidemiology and the People’s Health: Theory and Con- text,” 2011. ———. “On the Causal Interpretation of Race.” Epidemiology 25, no. 6 (2014): 937. Kuczmarski, James. “Reducing Gender Bias in Google Translate.” Google Blog 6 (2018). Kumar, Neeraj, Alexander Berg, Peter N Belhumeur, and Shree Nayar. “Describable Visual Attributes for Face Verification and Image Search.” IEEE Transactions on Pattern Analysis and Machine Intelligence 33, no. 10 (2011): 1962–77. Kusner, Matt J., Joshua R. Loftus, Chris Russell, and Ricardo Silva. bibliography 239

“Counterfactual Fairness.” In Proc. 30Th NIPS, 4069–79, 2017. Lakens, Daniel. “Impossibly Hungry Judges.” https://daniellakens. blogspot.com/2017/07/impossibly-hungry-judges.html, 2017. Lakkaraju, Himabindu, Jon Kleinberg, Jure Leskovec, Jens Ludwig, and Sendhil Mullainathan. “The Selective Labels Problem: Evalu- ating Algorithmic Predictions in the Presence of Unobservables.” In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 275–84. ACM, 2017. Lambrecht, Anja, and Catherine Tucker. “Algorithmic Bias? An Empirical Study of Apparent Gender-Based Discrimination in the Display of STEM Career Ads.” Management Science, 2019. Langley, Pat. “Machine Learning as an Experimental Science.” Springer, 1988. ———. “The Changing Science of Machine Learning.” Springer, 2011. LeCun, Yann, Léon Bottou, Yoshua Bengio, and Patrick Haffner. “Gradient-Based Learning Applied to Document Recognition.” Proceedings of the IEEE 86, no. 11 (1998): 2278–2324. Lee, Min Kyung, Daniel Kusbit, Evan Metsky, and Laura Dabbish. “Working with Machines: The Impact of Algorithmic and Data- Driven Management on Human Workers.” In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, 1603–12. ACM, 2015. Leslie, Sarah-Jane, Andrei Cimpian, Meredith Meyer, and Edward Freeland. “Expectations of Brilliance Underlie Gender Distribu- tions Across Academic Disciplines.” Science 347, no. 6219 (2015): 262–65. Levy, Karen, and Solon Barocas. “Designing Against Discrimination in Online Markets.” Berkeley Tech. LJ 32 (2017): 1183. Lewis, Mary A. “A Comparison of Three Models for Determining Test Fairness.” Federal Aviation Administration Washington DC Office of Aviation Medicine, 1978. Li, Xiaochang, and Mara Mills. “Vocal Features: From Voice Identifi- cation to Speech Recognition by Machine.” Technology and Culture 60, no. 2 (2019): S129–60. Liberman, Marc. “Reproducible Research and the Common Task Method.” Simmons Foundation Lecture Https://Www. Simonsfounda- tion. Org/Lecture/Reproducible-Research-and-Thecommon-Task-Method 2 (2015). Liberman, Mark. “Fred Jelinek.” Computational Linguistics 36, no. 4 (2010): 595–99. Liberman, Mark, and Charles Wayne. “Human Language Technol- ogy.” AI Magazine 41, no. 2 (2020). Lily Hu. “Direct Effects.” https://phenomenalworld.org/analysis/ direct-effects, 2020. 240 fairness and machine learning

Lipsky, Michael. Street-Level Bureaucracy: Dilemmas of the Individual in Public Service. Russell Sage Foundation, 2010. Liu, Lydia T., Sarah Dean, Esther Rolf, Max Simchowitz, and Moritz Hardt. “Delayed Impact of Fair Machine Learning.” In Proc. 35Th ICML, 3156–64, 2018. Liu, Zicheng, Cha Zhang, and Zhengyou Zhang. “Learning-Based Perceptual Image Quality Improvement for Video Conferencing.” In Multimedia and Expo, 2007 IEEE International Conference on, 1035–38. IEEE, 2007. Louçã, Francisco. “Emancipation Through Interaction–How Eugenics and Statistics Converged and Diverged.” Journal of the History of Biology 42, no. 4 (2009): 649–84. Loury, Glenn C. “A Dynamic Theory of Racial Income Differences.” Discussion paper, 1976. Lum, Kristian, and William Isaac. “To Predict and Serve?” Signifi- cance 13, no. 5 (2016): 14–19. Lundberg, Shelly, and Richard Startz. “On the Persistence of Racial Inequality.” Journal of Labor Economics 16, no. 2 (1998): 292–323. Mallon, Ron. “‘Race’: Normative, Not Metaphysical or Semantic.” Ethics 116, no. 3 (2006): 525–51. ———. The Construction of Human Kinds. Oxford University Press, 2018. Manthorpe, Rowland. “The Beauty.AI Robot Beauty Contest Is Back.” Wired UK. https://www.wired.co.uk/article/robot-beauty-contest-beauty-ai, 2017. Martin, C Dianne, Chuck Huff, Donald Gotterbarn, and Keith Miller. “Implementing a Tenth Strand in the CS Curriculum.” Communica- tions of the ACM 39, no. 12 (1996): 75–84. Martineau, Paris. “Cities Examine Proper—and Improper—Uses of Facial Recognition | WIRED.” https://www.wired.com/story/ cities-examine-proper-improper-facial-recognition/, 2019. Massey, Douglas S, Jonathan Rothwell, and Thurston Domina. “The Changing Bases of Segregation in the United States.” The Annals of the American Academy of Political and Social Science 626, no. 1 (2009): 74–90. McEntegart, Jane. “Kinect May Have Issues with Dark-Skinned Users | Tom’s Guide.” https://www.tomsguide.com/us/Microsoft-Kinect-Dark-Skin-Facial-Recognition, news-8638.html, 2010. Meadows, Donella H., Jorgan Randers, and Dennis Meadows. The Limits to Growth: The 30-Year Update. Routledge, 2012. Mehrotra, Rishabh, Ashton Anderson, Fernando Diaz, Amit Sharma, Hanna Wallach, and Emine Yilmaz. “Auditing Search Engines for Differential Satisfaction Across Demographics.” In Proceedings of the 26th International Conference on World Wide Web Companion, bibliography 241

626–33, 2017. Merill, Jeremy. “Does Facebook Still Sell Discriminatory Ads?” The Markup, 2020. Miller, George A. “WordNet: A Lexical Database for English.” Com- munications of the ACM 38, no. 11 (1995): 39–41. ———. WordNet: An Electronic Lexical Database. MIT press, 1998. Mitchell, Tom M. The Need for Biases in Learning Generalizations. De- partment of Computer Science, Laboratory for Computer Science Research . . . , 1980. Moneta-Koehler, Abigail M. AND Petrie, Liane AND Brown. “The Limitations of the GRE in Predicting Success in Biomedical Gradu- ate School.” PLOS ONE 12, no. 1 (January 2017): 1–17. Morgan, Stephen L., and Christopher Winship. Counterfactuals and Causal Inference. Cambridge University Press, 2014. Mouw, Ted. “Are Black Workers Missing the Connection? The Effect of Spatial Distance and Employee Referrals on Interfirm Racial Segregation.” Demography 39, no. 3 (2002): 507–28. Munoz, Cecilia, Megan Smith, and D Patil. “Big Data: A Report on Algorithmic Systems, Opportunity, and Civil Rights.” Executive Office of the President. The White House, 2016. Muthukumar, Vidya, Tejaswini Pedapati, Nalini Ratha, Prasanna Sattigeri, Chai-Wah Wu, Brian Kingsbury, Abhishek Kumar, Samuel Thomas, Aleksandra Mojsilovic, and Kush R Varshney. “Understanding Unequal Gender Classification Accuracy from Face Images.” arXiv Preprint arXiv:1812.00099, 2018. Myrdal, Gunnar. An American Dilemma: The Negro Problem and Modern Democracy, Volume 2. Routledge, 2017. Nabi, Razieh, and Ilya Shpitser. “Fair Inference on Outcomes.” In Proc. 32Nd AAAI, 1931–40, 2018. Nanayakkara, Priyanka, Jessica Hullman, and Nicholas Diakopou- los. “Unpacking the Expressed Consequences of AI Research in Broader Impact Statements.” arXiv Preprint arXiv:2105.04760, 2021. Narayanan, Arvind, and Vitaly Shmatikov. “Robust de-Anonymization of Large Sparse Datasets.” In 2008 IEEE Symposium on Security and Privacy (Sp 2008), 111–25. IEEE, 2008. Natasha Tiku. “Google’s Approach to Historically Black Schools Helps Explain Why There Are Few Black Engineers in Big Tech.” Washington Post, 2021. Neckerman, Kathryn M, and Joleen Kirschenman. “Hiring Strategies, Racial Bias, and Inner-City Workers.” Social Problems 38, no. 4 (1991): 433–47. Noble, Safiya Umoja. Algorithms of Oppression: How Search Engines Reinforce Racism. nyu Press, 2018. Norton, Helen. “The Supreme Court’s Post-Racial Turn Towards a 242 fairness and machine learning

Zero-Sum Understanding of Equality.” Wm. & Mary L. Rev. 52 (2010): 197.
O’Neil, Cathy. “How Algorithms Rule Our Working Lives.” The Guardian 16 (2016).
———. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. Broadway Books, 2016.
O’Toole, Alice J, Kenneth Deffenbacher, Hervé Abdi, and James C Bartlett. “Simulating the ‘Other-Race Effect’ as a Problem in Perceptual Learning.” Connection Science 3, no. 2 (1991): 163–78.
Obermeyer, Ziad, Brian Powers, Christine Vogeli, and Sendhil Mullainathan. “Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations.” Science 366, no. 6464 (2019): 447–53.
Ojala, Markus, and Gemma C Garriga. “Permutation Tests for Studying Classifier Performance.” Journal of Machine Learning Research 11, no. Jun (2010): 1833–63.
Olteanu, Alexandra, Carlos Castillo, Fernando Diaz, and Emre Kıcıman. “Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries.” Frontiers in Big Data 2 (2019): 13.
Pager, Devah. “The Use of Field Experiments for Studies of Employment Discrimination: Contributions, Critiques, and Directions for the Future.” The Annals of the American Academy of Political and Social Science 609, no. 1 (2007): 104–33.
Pager, Devah, and Hana Shepherd. “The Sociology of Discrimination: Racial Discrimination in Employment, Housing, Credit, and Consumer Markets.” Annual Review of Sociology 34 (2008): 181–209.
Palamar, Joseph J, Shelby Davies, Danielle C Ompad, Charles M Cleland, and Michael Weitzman. “Powder Cocaine and Crack Use in the United States: An Examination of Risk for Arrest and Socioeconomic Disparities in Use.” Drug and Alcohol Dependence 149 (2015): 108–16.
Paluck, Elizabeth Levy, and Donald P Green. “Prejudice Reduction: What Works? A Review and Assessment of Research and Practice.” Annual Review of Psychology 60 (2009): 339–67.
Paluck, Elizabeth Levy, Seth A Green, and Donald P Green. “The Contact Hypothesis Re-Evaluated.” Behavioural Public Policy 3, no. 2 (2019): 129–58.
Paluck, Elizabeth Levy, Roni Porat, Chelsey S Clark, and Donald P Green. “Prejudice Reduction: Progress and Challenges.” Annual Review of Psychology 72 (2020).
Pariser, Eli. The Filter Bubble: What the Internet Is Hiding from You. Penguin UK, 2011.
Pasquale, Frank. The Black Box Society: The Secret Algorithms That Control Money and Information. Harvard University Press, 2015.
Passi, Samir, and Solon Barocas. “Problem Formulation and Fairness.” In Proceedings of the Conference on Fairness, Accountability, and Transparency, 39–48. ACM, 2019.
Paullada, Amandalynne, Inioluwa Deborah Raji, Emily M Bender, Emily Denton, and Alex Hanna. “Data and Its (Dis)Contents: A Survey of Dataset Development and Use in Machine Learning Research.” arXiv Preprint arXiv:2012.05345, 2020.
Pearl, Judea. Causality. Cambridge University Press, 2009.
Pearl, Judea, Madelyn Glymour, and Nicholas P. Jewell. Causal Inference in Statistics: A Primer. Wiley, 2016.
Pearl, Judea, and Dana Mackenzie. The Book of Why: The New Science of Cause and Effect. Basic Books, 2018.
Pedreschi, Dino, Salvatore Ruggieri, and Franco Turini. “Discrimination-Aware Data Mining.” In Proc. 14th ACM SIGKDD, 560–68. ACM, 2008.
Pendall, Rolf, and Carl Hedman. “Worlds Apart: Inequality Between America’s Most and Least Affluent Neighborhoods.” Urban Institute, 2015.
Peschard, Isabelle F., and Bas C. Van Fraassen. The Experimental Side of Modeling. University of Minnesota Press, 2018.
Peters, Jonas, Dominik Janzing, and Bernhard Schölkopf. Elements of Causal Inference. MIT Press, 2017.
Phelps, Edmund S. “The Statistical Theory of Racism and Sexism.” The American Economic Review 62, no. 4 (1972): 659–61.
Pischke, Jörn-Steffen. “Empirical Methods in Applied Economics: Lecture Notes,” 2005.
Plaugic, Lizzie. “FaceApp’s Creator Apologizes for the App’s Skin-Lightening ‘Hot’ Filter.” The Verge. https://www.theverge.com/2017/4/25/15419522/faceapp-hot-filter-racist-apology, 2017.
Pleiss, Geoff, Manish Raghavan, Felix Wu, Jon Kleinberg, and Kilian Q Weinberger. “On Fairness and Calibration.” In Proc. 30th NIPS, 2017.
Posselt, Julie R. Inside Graduate Admissions. Harvard University Press, 2016.
Prabhu, Vinay Uday, and Abeba Birhane. “Large Image Datasets: A Pyrrhic Win for Computer Vision?” arXiv Preprint arXiv:2006.16923, 2020.
“Pre-Employment Inquiries and Marital Status or Number of Children.” U.S. Equal Employment Opportunity Commission, 2021.
Quillian, Lincoln, Devah Pager, Ole Hexel, and Arnfinn H Midtbøen. “Meta-Analysis of Field Experiments Shows No Change in Racial Discrimination in Hiring over Time.” Proceedings of the National Academy of Sciences 114, no. 41 (2017): 10870–75.
Radin, Joanna. “‘Digital Natives’: How Medical and Indigenous Histories Matter for Big Data.” Osiris 32, no. 1 (2017): 43–64.
Raghavan, Manish, Solon Barocas, Jon Kleinberg, and Karen Levy. “Mitigating Bias in Algorithmic Employment Screening: Evaluating Claims and Practices.” arXiv Preprint arXiv:1906.09208, 2019.
Ramineni, Chaitanya, and David Williamson. “Understanding Mean Score Differences Between the e-rater Automated Scoring Engine and Humans for Demographically Based Groups in the GRE General Test.” ETS Research Report Series 2018, no. 1 (2018): 1–31.
Raso, Jennifer. “Displacement as Regulation: New Regulatory Technologies and Front-Line Decision-Making in Ontario Works.” Canadian Journal of Law and Society 32, no. 1 (2017): 75–95.
Recht, Benjamin, Rebecca Roelofs, Ludwig Schmidt, and Vaishaal Shankar. “Do ImageNet Classifiers Generalize to ImageNet?” In Proc. 36th ICML, 2019.
Reichman, Nancy E, Julien O Teitler, Irwin Garfinkel, and Sara S McLanahan. “Fragile Families: Sample and Design.” Children and Youth Services Review 23, no. 4–5 (2001): 303–26.
Reisman, Dillon, Jason Schultz, Kate Crawford, and Meredith Whittaker. “Algorithmic Impact Assessments: A Practical Framework for Public Agency Accountability.” https://ainowinstitute.org/aiareport2018.pdf, 2018.
Rivera, Lauren A. Pedigree: How Elite Students Get Elite Jobs. Princeton University Press, 2016.
Robertson, Ronald E, Shan Jiang, Kenneth Joseph, Lisa Friedland, David Lazer, and Christo Wilson. “Auditing Partisan Audience Bias Within Google Search.” Proceedings of the ACM on Human-Computer Interaction 2, no. CSCW (2018): 148.
Rock, David, and Heidi Grant. “Why Diverse Teams Are Smarter.” Harvard Business Review. https://hbr.org/2016/11/why-diverse-teams-are-smarter, 2016.
Rosenblat, Alex, Karen EC Levy, Solon Barocas, and Tim Hwang. “Discriminating Tastes: Customer Ratings as Vehicles for Bias.” Data & Society, 2016, 1–21.
Rosenblatt, Frank. “Perceptron Simulation Experiments.” Proceedings of the IRE 48, no. 3 (1960): 301–9.
Roth, Alvin E. “The Origins, History, and Design of the Resident Match.” JAMA 289, no. 7 (2003): 909–12.
Roth, Lorna. “Looking at Shirley, the Ultimate Norm: Colour Balance, Image Technologies, and Cognitive Equity.” Canadian Journal of Communication 34, no. 1 (2009): 111.
Rothstein, Richard. The Color of Law: A Forgotten History of How Our Government Segregated America. Liveright Publishing, 2017.
Russakovsky, Olga, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, et al. “ImageNet Large Scale Visual Recognition Challenge.” International Journal of Computer Vision 115, no. 3 (2015): 211–52.
Russell, Chris, Matt J. Kusner, Joshua R. Loftus, and Ricardo Silva. “When Worlds Collide: Integrating Different Counterfactual Assumptions in Fairness.” In Proc. 30th NIPS, 6417–26, 2017.
Saitta, Lorenza, and Filippo Neri. “Learning in the ‘Real World’.” Machine Learning 30, no. 2 (1998): 133–63.
Salganik, Matthew. Bit by Bit: Social Research in the Digital Age. Princeton University Press, 2019.
Salganik, Matthew J, Ian Lundberg, Alexander T Kindel, Caitlin E Ahearn, Khaled Al-Ghoneim, Abdullah Almaatouq, Drew M Altschul, et al. “Measuring the Predictability of Life Outcomes with a Scientific Mass Collaboration.” Proceedings of the National Academy of Sciences 117, no. 15 (2020): 8398–8403.
Salzberg, Steven L. “On Comparing Classifiers: A Critique of Current Research and Methods.” Data Mining and Knowledge Discovery 1, no. 1 (1999): 1–12.
Sandvig, C., K. Hamilton, K. Karahalios, and C. Langbort. “Auditing Algorithms: Research Methods for Detecting Discrimination on Internet Platforms.” ICA Pre-Conference on Data and Discrimination, 2014.
Sap, Maarten, Dallas Card, Saadia Gabriel, Yejin Choi, and Noah A Smith. “The Risk of Racial Bias in Hate Speech Detection.” In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, 1668–78, 2019.
Semega, Jessica L, Kayla R Fontenot, and Melissa A Kollar. “Income and Poverty in the United States: 2016.” Current Population Reports, no. P60–259 (2017).
Shankar, Shreya, Yoni Halpern, Eric Breck, James Atwood, Jimbo Wilson, and D. Sculley. “No Classification Without Representation: Assessing Geodiversity Issues in Open Data Sets for the Developing World.” In NIPS 2017 Workshop: Machine Learning for the Developing World, 2017.
Shankar, Vaishaal, Rebecca Roelofs, Horia Mania, Alex Fang, Benjamin Recht, and Ludwig Schmidt. “Evaluating Machine Accuracy on ImageNet.” In International Conference on Machine Learning, 8634–44. PMLR, 2020.
Simoiu, Camelia, Sam Corbett-Davies, and Sharad Goel. “The Problem of Infra-Marginality in Outcome Tests for Discrimination.” The Annals of Applied Statistics 11, no. 3 (2017): 1193–1216.
Simonite, Tom. “When It Comes to Gorillas, Google Photos Remains Blind.” Wired, January 13 (2018).
Simpson, Edward H. “The Interpretation of Interaction in Contingency Tables.” Journal of the Royal Statistical Society: Series B (Methodological) 13, no. 2 (1951): 238–41.
Singer-Vine, Jeremy, Jennifer Valentino-DeVries, and Ashkan Soltani. “How the Journal Tested Prices and Deals Online.” Wall Street Journal. http://blogs.wsj.com/digits/2012/12/23/how-the-journal-tested-prices-and-deals-online, 2012.
Sloane, Mona, Emanuel Moss, Olaitan Awomolo, and Laura Forlano. “Participation Is Not a Design Fix for Machine Learning.” arXiv Preprint arXiv:2007.02423, 2020.
Small, Mario L, and Devah Pager. “Sociological Perspectives on Racial Discrimination.” Journal of Economic Perspectives 34, no. 2 (2020): 49–67.
Solaiman, Irene, Miles Brundage, Jack Clark, Amanda Askell, Ariel Herbert-Voss, Jeff Wu, Alec Radford, and Jasmine Wang. “Release Strategies and the Social Impacts of Language Models.” arXiv Preprint arXiv:1908.09203, 2019.
Spirtes, Peter, Clark N Glymour, Richard Scheines, David Heckerman, Christopher Meek, Gregory Cooper, and Thomas Richardson. Causation, Prediction, and Search. MIT Press, 2000.
Sprietsma, Maresa. “Discrimination in Grading: Experimental Evidence from Primary School Teachers.” Empirical Economics 45, no. 1 (2013): 523–38.
Squires, Gregory D, and others. Capital and Communities in Black and White: The Intersections of Race, Class, and Uneven Development. SUNY Press, 1994.
Steed, Ryan, and Aylin Caliskan. “Image Representations Learned with Unsupervised Pre-Training Contain Human-Like Biases.” In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 701–13, 2021.
Stevenson, Megan. “Assessing Risk Assessment in Action.” Minn. L. Rev. 103 (2018): 303.
Stevenson, Megan T, and Jennifer L Doleac. “Algorithmic Risk Assessment in the Hands of Humans.” Available at SSRN 3489440, 2019.
Stevenson, Megan T, and Sandra G Mayson. “Pretrial Detention and the Value of Liberty.” Virginia Public Law and Legal Theory Research Paper, no. 2021–14 (2021).
Susser, Daniel, Beate Roessler, and Helen Nissenbaum. “Online Manipulation: Hidden Influences in a Digital World.” Available at SSRN 3306006, 2018.
Sweeney, Latanya. “Discrimination in Online Ad Delivery.” Queue 11, no. 3 (March 2013): 10:10–29.
Tatman, Rachael. “Gender and Dialect Bias in YouTube’s Automatic Captions.” In Proceedings of the First ACL Workshop on Ethics in Natural Language Processing, 53–59. Valencia, Spain: Association for Computational Linguistics, 2017. https://doi.org/10.18653/v1/W17-1606.
The Federal Reserve Board. “Report to the Congress on Credit Scoring and Its Effects on the Availability and Affordability of Credit.” https://www.federalreserve.gov/boarddocs/rptcongress/creditscore/, 2007.
Thebault-Spieker, Jacob, Loren Terveen, and Brent Hecht. “Toward a Geographic Understanding of the Sharing Economy: Systemic Biases in UberX and TaskRabbit.” ACM Transactions on Computer-Human Interaction (TOCHI) 24, no. 3 (2017): 21.
Thorndike, Robert L. “Concepts of Culture-Fairness.” Journal of Educational Measurement 8, no. 2 (1971): 63–70.
Tjaden, Jasper Dag, Carsten Schwemmer, and Menusch Khadjavi. “Ride with Me—Ethnic Discrimination, Social Markets, and the Sharing Economy.” European Sociological Review 34, no. 4 (2018): 418–32.
Torralba, Antonio, and Alexei A Efros. “Unbiased Look at Dataset Bias.” In Proc. CVPR, 1521–28. IEEE, 2011.
Tripodi, Francesca. “Searching for Alternative Facts: Analyzing Scriptural Inference in Conservative News Practices.” Data & Society, 2018.
Tufekci, Zeynep. “Big Questions for Social Media Big Data: Representativeness, Validity and Other Methodological Pitfalls.” In Proc. 8th International AAAI Conference on Weblogs and Social Media, 2014.
———. “Engineering the Public: Big Data, Surveillance and Computational Politics.” First Monday, 2014.
Turow, Joseph, Jennifer King, Chris Jay Hoofnagle, Amy Bleakley, and Michael Hennessy. “Americans Reject Tailored Advertising and Three Activities That Enable It.” Available at SSRN 1478214, 2009.
“US prisoners released early by software bug.” BBC News, 2015.
Valencia Caicedo, Felipe. “The Mission: Human Capital Transmission, Economic Persistence, and Culture in South America.” The Quarterly Journal of Economics 134, no. 1 (2019): 507–56.
Valentino-DeVries, Jennifer, Jeremy Singer-Vine, and Ashkan Soltani. “Websites Vary Prices, Deals Based on Users’ Information.” Wall Street Journal 10 (2012): 60–68.
VanderWeele, Tyler J., and Whitney R. Robinson. “On Causal Interpretation of Race in Regressions Adjusting for Confounding and Mediating Variables.” Epidemiology, 2014.
Vandewiele, Gilles, Isabelle Dehaene, György Kovács, Lucas Sterckx, Olivier Janssens, Femke Ongenae, Femke De Backere, et al. “Overly Optimistic Prediction Results on Imbalanced Data: A Case Study of Flaws and Benefits When Applying over-Sampling.” Artificial Intelligence in Medicine 111 (January 2021): 101987. https://doi.org/10.1016/j.artmed.2020.101987.
Venkatadri, Giridhari, Elena Lucherini, Piotr Sapiezynski, and Alan Mislove. “Investigating Sources of PII Used in Facebook’s Targeted Advertising.” Proceedings on Privacy Enhancing Technologies 2019, no. 1 (2019): 227–44.
Vries, Terrance de, Ishan Misra, Changhan Wang, and Laurens van der Maaten. “Does Object Recognition Work for Everyone?” In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 52–59, 2019.
Wang, Angelina, Arvind Narayanan, and Olga Russakovsky. “REVISE: A Tool for Measuring and Mitigating Bias in Visual Datasets.” In European Conference on Computer Vision, 733–51. Springer, 2020.
Wasserman, Larry. All of Statistics: A Concise Course in Statistical Inference. Springer, 2010.
Weinshall-Margel, Keren, and John Shapard. “Overlooked Factors in the Analysis of Parole Decisions.” Proceedings of the National Academy of Sciences 108, no. 42 (2011): E833–33.
West, Candace, and Don H Zimmerman. “Doing Gender.” Gender & Society 1, no. 2 (1987): 125–51.
Whelan, Robert, and Hugh Garavan. “When Optimism Hurts: Inflated Predictions in Psychiatric Neuroimaging.” Biological Psychiatry, Mechanisms of Aging and Cognition, 75, no. 9 (May 2014): 746–48. https://doi.org/10.1016/j.biopsych.2013.05.014.
“Whistleblowers: Software Bug Keeping Hundreds of Inmates in Arizona Prisons Beyond Release Dates.” KJZZ, 2021.
Whittaker, Meredith, Kate Crawford, Roel Dobbe, Genevieve Fried, Elizabeth Kaziunas, Varoon Mathur, Sarah Myers West, Rashida Richardson, Jason Schultz, and Oscar Schwartz. AI Now Report 2018. AI Now Institute at New York University, 2018.
Wienk, Ronald E, Clifford E. Reid, John C. Simonson, and Frederick J. Eggers. “Measuring Racial Discrimination in American Housing Markets: The Housing Market Practices Survey.” 1979.
Wikipedia contributors. “Certified Software Development Professional — Wikipedia, the Free Encyclopedia,” 2021. https://en.wikipedia.org/w/index.php?title=Certified_Software_Development_Professional&oldid=1021729741.
———. “Same-Sex Marriage — Wikipedia, the Free Encyclopedia,” 2021. https://en.wikipedia.org/w/index.php?title=Same-sex_marriage&oldid=1027216965.
Williams, Wendy M, and Stephen J Ceci. “National Hiring Experiments Reveal 2:1 Faculty Preference for Women on STEM Tenure Track.” Proceedings of the National Academy of Sciences 112, no. 17 (2015): 5360–65.
Wilson, Benjamin, Judy Hoffman, and Jamie Morgenstern. “Predictive Inequity in Object Detection.” arXiv Preprint arXiv:1902.11097, 2019.
Wilson, James F, Michael E Weale, Alice C Smith, Fiona Gratrix, Benjamin Fletcher, Mark G Thomas, Neil Bradman, and David B Goldstein. “Population Genetic Structure of Variable Drug Response.” Nature Genetics 29, no. 3 (2001): 265.
Winner, Langdon. Do Artifacts Have Politics? Routledge, 2017.
Woodworth, Blake E., Suriya Gunasekar, Mesrob I. Ohannessian, and Nathan Srebro. “Learning Non-Discriminatory Predictors.” In Proc. 30th COLT, 1920–53, 2017.
Wu, Tim. The Master Switch: The Rise and Fall of Information Empires. Vintage, 2010.
Yadav, Chhavi, and Léon Bottou. “Cold Case: The Lost MNIST Digits.” arXiv Preprint arXiv:1905.10498, 2019.
Yang, Kaiyu, Klint Qinami, Li Fei-Fei, Jia Deng, and Olga Russakovsky. “Towards Fairer Datasets: Filtering and Balancing the Distribution of the People Subtree in the ImageNet Hierarchy.” In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, 547–58, 2020.
Yao, Sirui, and Bert Huang. “Beyond Parity: Fairness Objectives for Collaborative Filtering.” In Advances in Neural Information Processing Systems, 2921–30, 2017.
Zafar, Muhammad Bilal, Isabel Valera, Manuel Gómez Rodriguez, and Krishna P. Gummadi. “Fairness Beyond Disparate Treatment & Disparate Impact: Learning Classification Without Disparate Mistreatment.” In Proc. 26th WWW, 2017.
Zemel, Richard S., Yu Wu, Kevin Swersky, Toniann Pitassi, and Cynthia Dwork. “Learning Fair Representations.” In Proc. 30th ICML, 2013.
Zhang, Junzhe, and Elias Bareinboim. “Fairness in Decision-Making — the Causal Explanation Formula.” In Proc. 32nd AAAI, 2018.
Zhang, Lu, Yongkai Wu, and Xintao Wu. “A Causal Framework for Discovering and Removing Direct and Indirect Discrimination.” In Proc. 26th IJCAI, 3929–35, 2017.

List of Figures

1 The machine learning loop 13
2 Toy example: a hiring classifier that predicts job performance (not shown) based on GPA and interview score, and then applies a cutoff. 25

3 Plot of the body mass index. 38
4 Halley’s life table (1693) 39
5 Example of an ROC curve. Each point on the solid curve is realized by thresholding the score function at some value. The dashed line shows the trade-offs achieved by randomly accepting an instance irrespective of its features with some probability p ∈ [0, 1]. 41
6 On the left, we see the distribution of a single feature that differs only very slightly between the two groups. In both groups the feature follows a normal distribution. Only the means are slightly different in each group. Multiple features like this can be used to build a high accuracy group membership classifier. On the right, we see how the accuracy grows as more and more features become available. 42
7 ROC curve by group. 47
8 Intersection of area under the curves. 48
9 Calibration by gender on UCI adult data. A straight diagonal line would correspond to perfect calibration. 51
10 Calibration by race on UCI adult data. 51
11 Directed graphical model for the variables in Scenario I 56
12 Directed graphical model for the variables in Scenario II 57
13 Cumulative density of scores by group. 61
14 ROC curve of credit score by group. 62
15 ROC curve of credit score by group zoomed in on region of large differences. 62
16 ROC curves with thresholds induced by different criteria. 64
17 Calibration values of credit score by group. 64

18 Causal diagrams for the heart disease examples. 85
19 Example of a fork. 86
20 Example of a chain. 86

21 Example of a collider. 87
22 Graph before and after substitution. 88
23 Two cases of unobserved confounding. 91
24 Possible causal graph for the UC Berkeley graduate admissions scenario. 93
25 Alternative causal graph for the UC Berkeley graduate admissions scenario showing influence of residence. 94
26 Alternative causal graph for the UC Berkeley graduate admissions scenario where department preferences are shaped by fear of discrimination. 95
27 Causal diagram for our traffic scenario. 98
28 Causal graph with mediator Z. 103
29 Religion as a root node. 109
30 Religion as ancestor. 110

31 (from Danziger et al.): fraction of favorable rulings over the course of a day. The dotted lines indicate food breaks. 126
32 Hypothetical probability density of loan default for two groups, women (orange) and men (blue). 130

33 Our understanding of the causal model implicit in the Uber study. 158
34 A summary of major types of organizational interventions 175

35 A sample of MNIST digits 191
36 Model accuracy on the original test sets vs. new test sets for CIFAR-10 and ImageNet. Each data point corresponds to one model in a test bed of representative models (shown with 95% Clopper-Pearson confidence intervals). The plots reveal two main phenomena: (i) There is generally a significant drop in accuracy from the original to the new test sets. (ii) The model accuracies closely follow a linear function, meaning that models that perform well on the old test set also tend to perform well on the new test set. The narrow shaded region is a 95% confidence region for the linear fit. 199

List of Tables

1 Common classification criteria 37
2 Additional classification criteria 37
3 Non-discrimination criteria 43
4 Credit score distribution by ethnicity 60
5 Census data 67
6 List of demographic fairness criteria 73

7 UC Berkeley admissions data from 1973. 79
9 Possible noise settings after observing evidence 98
10 Traffic example in the potential outcomes model 101
11 Traffic data in the potential outcomes model 101

12 Summary of traditional tests and methods, highlighting the relationship to fairness, the observational and experimental access required by the researcher, and limitations. 132
13 Four types of NLP tasks and the types of unfairness that can result. Note that the traditional tests discussed in Part 1 operate in the context of predicting outcomes (row 3 in this table). 139

14 Examples of discrimination organized into three levels and on a spectrum of directness 161

15 Demographic information about the TIMIT speakers 189
16 A snapshot of the original MNIST leaderboard from February 2, 1999. Source: Internet Archive (Retrieved: December 4, 2020) 192