TOWARDS A MODERN APPROACH TO PRIVACY-AWARE GOVERNMENT DATA RELEASES

Micah Altman, Alexandra Wood, David R. O’Brien, Salil Vadhan & Urs Gasser†

ABSTRACT

Governments are under increasing pressure to publicly release collected data in order to promote transparency, accountability, and innovation. Because much of the data they release pertains to individuals, agencies rely on various standards and interventions to protect privacy interests while supporting a range of beneficial uses of the data. However, there are growing concerns among privacy scholars, policymakers, and the public that these approaches are incomplete, inconsistent, and difficult to navigate. To identify gaps in current practice, this Article reviews data released in response to freedom of information and Privacy Act requests, traditional public and vital records, official statistics, and e-government and open government initiatives. It finds that agencies lack formal guidance for implementing privacy interventions in specific cases. Most agencies address privacy by withholding or redacting records that contain directly or indirectly identifying information based on an ad hoc balancing of interests, and different government actors sometimes treat similar privacy risks vastly differently. These observations demonstrate the need for a more systematic approach to privacy analysis and also suggest a new way forward.

In response to these concerns, this Article proposes a framework for a modern privacy analysis informed by recent advances in data privacy from disciplines such as computer science, statistics, and law. Modeled on an information security approach, this framework characterizes and distinguishes between privacy controls, threats, vulnerabilities, and utility. When developing a data release mechanism, policymakers should specify the desired data uses and expected benefits, examine each stage of the data lifecycle to identify privacy threats and vulnerabilities, and select controls for each lifecycle stage that are consistent with the uses, threats, and vulnerabilities at that stage. This Article sketches the contours of this analytical framework, populates selected portions of its contents, and illustrates how it can inform the selection of privacy controls by discussing its application to two real-world examples of government data releases.

DOI: http://dx.doi.org/10.15779/Z38FG17
© 2015 Micah Altman, MIT; Alexandra Wood, David R. O’Brien, Salil Vadhan & Urs Gasser, Harvard University.

† Micah Altman and Alexandra Wood are the lead authors, with Alexandra Wood creating the initial draft of the manuscript and Micah Altman and Alexandra Wood taking primary responsibility for revisions. All authors, Micah Altman, Urs Gasser, David R. O’Brien, Salil Vadhan, and Alexandra Wood, contributed to the conception of the report (including core ideas and statement of research questions). Micah Altman, David R. O’Brien, and Alexandra Wood were primarily responsible for the methodology (development of the use cases and taxonomies applied), and David R. O’Brien for the project administration. Urs Gasser, David R. O’Brien, and Salil Vadhan contributed to the writing through critical review and commentary. Micah Altman, Urs Gasser, and Salil Vadhan provided scientific direction, and Urs Gasser led funding acquisition. Microsoft Corporation, in collaboration with the Berkeley Center for Law & Technology, supported the research and the writing of this report. In addition, this material is based upon work supported by the National Science Foundation under Grant No. 1237235, the Ford Foundation, and the John D. and Catherine T. MacArthur Foundation. We thank the members of the Privacy Tools for Sharing Research Data project for helpful comments.

1968 BERKELEY TECHNOLOGY LAW JOURNAL [Vol. 30:3

TABLE OF CONTENTS

I. INTRODUCTION: THE CHANGING LANDSCAPE OF GOVERNMENT RELEASES OF DATA
II. OVERVIEW OF CURRENT PRACTICES FOR RELEASING GOVERNMENT DATA
    A. FOUR BROAD CATEGORIES OF GOVERNMENT DATA RELEASES
        1. Freedom of Information and Privacy Act Requests
            a) Types of Information Released
            b) Standards for Making Release Decisions
            c) Privacy Interventions in Use
        2. Traditional Public and Vital Records
            a) Types of Information Released
            b) Standards for Making Release Decisions
            c) Privacy Interventions in Use
        3. Official Statistics
            a) Types of Information Released
            b) Standards for Making Release Decisions
            c) Privacy Interventions in Use
        4. E-Government and Open Government Initiatives
            a) Types of Information Released
            b) Standards for Making Release Decisions
            c) Privacy Interventions in Use
    B. SHORTCOMINGS IN CURRENT PRACTICES
III. A FRAMEWORK FOR MODERNIZING PRIVACY ANALYSIS
    A. CHARACTERIZING PRIVACY CONTROLS, THREATS, VULNERABILITIES, AND USES
    B. DEVELOPING A CATALOG OF PRIVACY CONTROLS AND INTERVENTIONS
        1. Privacy Controls at the Collection and Acceptance Stage
        2. Privacy Controls at the Transformation Stage
        3. Privacy Controls at the Retention Stage
        4. Privacy Controls at the Release and Access Stage
        5. Privacy Controls at the Post-Access Stage
    C. IDENTIFYING INFORMATION USES, THREATS, AND VULNERABILITIES
        1. Information Uses and Expected Utility
        2. Privacy Threats
        3. Privacy Vulnerabilities
    D. DESIGNING DATA RELEASES BY ALIGNING USE, THREATS, AND VULNERABILITIES WITH CONTROLS
        1. Specifying Desired Data Uses and Expected Benefits
        2. Selecting Controls
IV. APPLYING THE FRAMEWORK TO REAL-WORLD EXAMPLES OF GOVERNMENT DATA RELEASES
    A. PUBLIC RELEASE OF WORKPLACE INJURY RECORDS
        1. Collection and Acceptance Stage
        2. Retention Stage
        3. Post-Retention Transformation
        4. Release and Access Stage
        5. Post-Access Stage
        6. Aligning Uses, Threats, and Vulnerabilities with Controls
    B. MUNICIPAL OPEN DATA PORTALS
        1. Collection and Acceptance Stage
        2. Retention Stage
        3. Post-Retention Transformation
        4. Release and Access Stage
        5. Post-Access Stage
        6. Aligning Use, Threats, and Vulnerabilities with Controls
V. SUMMARY

I. INTRODUCTION: THE CHANGING LANDSCAPE OF GOVERNMENT RELEASES OF DATA

Transparency is a fundamental principle of democratic governance. Making government data more widely available promises to enhance organizational transparency, improve government functions, encourage civic engagement, support the evaluation of government decisions, and ensure accountability for public institutions. Releases of government data also promote growth in the private sector by guiding investment and other commercial decisions, supporting innovation in the technology sectors, and promoting economic development and competition broadly.1 Furthermore, improving access to government data also advances the state of research and scientific knowledge, changing how researchers approach their fields of study and enabling them to ask new questions and gain better insights into human behaviors.2 For instance, the increased availability of large-scale datasets is advancing developments in computational social science, a field that is rapidly changing the study of humans, human behavior, and human institutions, and effectively shifting the evidence base of social science.3 Scientists are also developing