Nothing Personal: the Concepts of Anonymization and Pseudonymization in European Data Protection

Pages: 16 | File type: PDF | Size: 1020 KB

Nothing Personal: The Concepts of Anonymization and Pseudonymization in European Data Protection

Master Thesis, Law and Technology LLM
Student: Filip Stoitsev (ANR 729037)
Supervisors: Lorenzo Dalla Corte (1st), Colette Cuijpers (2nd)
August 2016

Table of Contents

List of Abbreviations ... 2
Chapter 1 – Introduction ... 3
Chapter 2 – Defining the Concepts ... 9
  2.1. The concept of personal data ... 9
  2.2. Anonymization ... 13
  2.3. Pseudonymization ... 14
  2.4. Data Protection Directive ... 14
    2.4.1. Anonymization ... 14
    2.4.2. Pseudonymization ... 18
  2.5. GDPR ... 19
    2.5.1. Anonymization ... 19
    2.5.2. Pseudonymization ... 20
  2.6. Anonymization and Pseudonymization techniques ... 22
  2.7. Conclusion ... 23
Chapter 3 – Identifying the Threats ... 24
  3.1. Re-identification ... 24
  3.2. The landmark re-identification studies ... 25
    3.2.1. Massachusetts Medical Database ... 26
    3.2.2. AOL ... 26
    3.2.3. Netflix ... 27
  3.3. Utility versus Privacy ... 29
  3.4. New Challenges ... 32
    3.4.1. Big Data ... 32
    3.4.2. Profiling and Behavioral Advertising ... 37
  3.5. Conclusion ... 40
Chapter 4 – Measures to Address the Challenges ... 42
  4.1. Computer scientists' recommendations ... 42
  4.2. Risk ... 44
  4.3. Risk-based approach in the European Data Protection Legislation ... 45
    4.3.1. Risk-based approach and pseudonymization ... 48
  4.4. The robustness of Anonymization ... 50
  4.5. DPIA ... 52
  4.6. Data protection by design and by default ... 55
  4.7. Conclusion ... 58
Chapter 5 – Conclusion ... 59
Bibliography ... 61
  Legislation and Case Law ... 61
  Books, Articles and Papers ... 62
  Documents and Reports ... 67
  Other

List of Abbreviations

AOL – America Online
BD – Big Data
CNIL – Commission Nationale de l'Informatique et des Libertés (French Supervisory Authority)
DPD – Data Protection Directive
DPbD – Data Protection by Design
DPIA – Data Protection Impact Assessment
ECHR – European Convention on Human Rights
EDPB – European Data Protection Board
EDPS – European Data Protection Supervisor
EU – European Union
GDPR – General Data Protection Regulation
HHP – Heritage Health Prize
ICO – Information Commissioner's Office
ICT – Information and Communications Technologies
IMDb – Internet Movie Database
IP – Internet Protocol
ISO – International Organization for Standardization
MAC – Media Access Control (address)
MIT – Massachusetts Institute of Technology
MS – Member States
NYC – New York City
PbD – Privacy by Design
PETs – Privacy Enhancing Technologies
PSI – Public Sector Information
TFEU – Treaty on the Functioning of the European Union
UKAN – United Kingdom Anonymisation Network
US – United States
WP29 – Article 29 Working Party

Chapter 1 – Introduction

"In today's era of instant information gratification, we have ready access to opinions, rationalizations, and superficial descriptions. Much harder to come by is the foundation knowledge that informs a principled understanding of the world." – Zoltan L. Torey [1]

The clash between data use and data protection is one of the most relevant topics of our time. It is often framed as an ongoing conflict in which private companies and governments are the aggressors hunting for data, and individuals are the victims, or the providers, of personal data – "the new oil of the internet and the new currency of the digital world". [2] Data protection law is meant to bring balance to this unequal dispute; its effectiveness, however, has been repeatedly challenged by critics. In that sense, data protection legislation has been overtaken by the rapid development of data processing techniques, specifically those permitting the automated processing of vast amounts of data. [3] The dramatic change in information technologies and the widespread use of the internet have made the current Data Protection Directive [4] (hereinafter the "Directive" or "DPD") obsolete. [5] This should come as no surprise, as the current data protection principles "were drawn up in 1990 and adopted in 1995, when only 1% of the European Union population was using the Internet and the founder of Facebook was only 11 years old!" [6] The upcoming General Data Protection Regulation [7] ("GDPR"

1. Zoltan L. Torey, The Conscious Mind, MIT Press (2014), 1.
2. Meglena Kuneva, Roundtable on Online Data Collection, Targeting and Profiling (2009).
3. Orla Lynskey, The Foundations of EU Data Protection Law, OUP (2015), 1.
4. Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data, OJ L 281 (Data Protection Directive).
5. Bert-Jaap Koops, 'The Trouble with European Data Protection Law', 4 International Data Privacy Law (2014), 250.
6. Viviane Reding, 'Outdoing Huxley: Forging a High Level of Data Protection for Europe in the Brave New Digital World', Speech at Digital Enlightenment Forum (2012), 4.
Recommended publications
  • Viewing the GDPR Through a De-Identification Lens
    Viewing the GDPR Through a De-Identification Lens: A Tool for Clarification and Compliance Mike Hintze1 In May 2018, the General Data Protection Regulation (GDPR) will become enforceable as the basis for data protection law in the European Economic Area (EEA). The GDPR builds upon many existing concepts in European data protection law and creates new rights for data subjects. The result is new and heightened compliance obligations for organizations handling data. In many cases, however, how those obligations will be interpreted and applied remains unclear. De-identification techniques provide a range of useful tools to help protect individual privacy. There are many different de-identification techniques which represent a broad spectrum – from relatively weak techniques that can reduce privacy risks to a modest degree, to very strong techniques that can effectively eliminate most or all privacy risk. In general, the stronger the de-identification, the greater the loss of data utility and value. Therefore, different levels of de-identification may be appropriate or ideal in different scenarios, depending on the purposes of the data processing. While there is disagreement on certain aspects of de-identification and the degree to which it should be relied upon in particular circumstances, there is no doubt that de-identification techniques, properly applied, can reduce privacy risks and help protect data subjects’ rights. Regulatory guidance and enforcement activity under the GDPR can further these key objectives by encouraging and rewarding the appropriate use of de-identification. Guidance that fully recognizes the appropriate roles of de-identification can also help bring greater clarity to many GDPR requirements.
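The spectrum Hintze describes, from weak techniques that modestly reduce risk to strong ones that trade away utility, can be made concrete with a small sketch. This is an illustrative example of the general idea, not code from the paper; the record, key, and field choices are assumptions.

```python
# Four points on the de-identification spectrum, applied to one toy record.
# Stronger techniques remove more re-identification risk but also more utility.
import hashlib

record = {"name": "Alice Smith", "zip": "90210", "age": 34}

# 1. Masking (weak): hide part of a direct identifier.
masked = {**record, "name": record["name"][0] + "."}

# 2. Pseudonymization: replace the identifier with a keyed token.
#    The secret key must be held separately, or the data remains personal.
SECRET = b"example-key"  # hypothetical key for this sketch
token = hashlib.sha256(SECRET + record["name"].encode()).hexdigest()[:12]
pseudonymized = {**record, "name": token}

# 3. Generalization: coarsen quasi-identifiers (ZIP prefix, age band).
decade = record["age"] // 10 * 10
generalized = {"name": None,
               "zip": record["zip"][:3] + "**",
               "age": f"{decade}-{decade + 9}"}

# 4. Suppression/aggregation (strong): drop risky attributes entirely.
suppressed = {"age_band": f"{decade}-{decade + 9}"}

print(masked, pseudonymized, generalized, suppressed, sep="\n")
```

Each step down the list loses analytic value: the generalized record can still support age-band statistics, while the suppressed one supports little else.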
  • Healthy Data Protection
    Michigan Technology Law Review, Article 3 (2020). Lothar Determann (Freie Universität Berlin), Healthy Data Protection, 26 MICH. TELECOMM. & TECH. L. REV. 229 (2020). Available at: https://repository.law.umich.edu/mtlr/vol26/iss2/3. Modern medicine is evolving at a tremendous speed. On a daily basis, we learn about new treatments, drugs, medical devices, and diagnoses. Both established technology companies and start-ups focus on health-related products and services in competition with traditional healthcare businesses. Telemedicine and electronic health records have the potential to improve the effectiveness of treatments significantly. Progress in the medical field depends above all on data, specifically health information. Physicians, researchers, and developers need health information to help patients by improving diagnoses, customizing treatments and finding new cures. Yet law and policymakers
  • Rx-Anon—A Novel Approach on the De-Identification of Heterogeneous Data Based on a Modified Mondrian Algorithm
    rx-anon: A Novel Approach on the De-Identification of Heterogeneous Data based on a Modified Mondrian Algorithm. F. Singhofer (University of Ulm, Germany), A. Garifullina and M. Kern (BT Technology, United Kingdom), A. Scherp (University of Ulm, Germany). ABSTRACT: Traditional approaches for data anonymization consider relational data and textual data independently. We propose rx-anon, an anonymization approach for heterogeneous semi-structured documents composed of relational and textual attributes. We map sensitive terms extracted from the text to the structured data. This allows us to use concepts like k-anonymity to generate a joined, privacy-preserved version of the heterogeneous data input. We introduce the concept of redundant sensitive information to consistently anonymize the heterogeneous data. To control the influence of anonymization over unstructured textual data versus structured data attributes, we introduce a modified, parameterized Mondrian algorithm. A common measure to protect PII is to anonymize all personal identifiers. Prior work considered such personal data to be name, age, email address, gender, sex, ZIP, any other identifying numbers, among others [12, 16, 31, 34, 52]. Therefore, the field of Privacy-Preserving Data Publishing (PPDP) has been established, which makes the assumption that a data recipient could be an attacker, who might also have additional knowledge (e.g., by accessing public datasets or observing individuals). Data to be shared can be structured in the form of relational data or unstructured like free texts. Research in data mining and predictive models shows that a combination of structured and unstructured data leads to more valuable insights.
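The Mondrian algorithm the abstract builds on can be sketched in a few lines. This is a minimal, assumed illustration of plain greedy Mondrian over numeric quasi-identifiers, not the paper's rx-anon extension to text: recursively split the widest attribute at its median while both halves keep at least k records, then generalize each final partition to its attribute ranges.

```python
# Greedy Mondrian-style k-anonymization of (age, zip) tuples.
def mondrian(rows, k):
    dims = range(len(rows[0]))
    # Pick the attribute with the widest value range as the split dimension.
    d = max(dims, key=lambda i: max(r[i] for r in rows) - min(r[i] for r in rows))
    vals = sorted(r[d] for r in rows)
    median = vals[len(vals) // 2]
    left = [r for r in rows if r[d] < median]
    right = [r for r in rows if r[d] >= median]
    if len(left) >= k and len(right) >= k:
        # Allowable cut: both halves still satisfy k-anonymity.
        return mondrian(left, k) + mondrian(right, k)
    # No allowable cut: generalize the partition to per-attribute ranges.
    ranges = tuple((min(r[i] for r in rows), max(r[i] for r in rows)) for i in dims)
    return [ranges] * len(rows)

rows = [(25, 53715), (25, 53710), (26, 53712), (27, 53711),
        (41, 2140), (47, 2137), (42, 2139), (49, 2138)]
for generalized in mondrian(rows, k=4):
    print(generalized)
```

Every output range covers at least k = 4 records, so each individual hides in a group of four; rx-anon's contribution is extending this partitioning consistently across linked textual attributes.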
  • NIST SP 800-188, De-Identification of Government Datasets
    NIST Special Publication 800-188 (2nd DRAFT): De-Identifying Government Datasets. Simson L. Garfinkel, Information Access Division, Information Technology Laboratory. December 2016. U.S. Department of Commerce, Penny Pritzker, Secretary; National Institute of Standards and Technology, Willie May, Under Secretary of Commerce for Standards and Technology and Director. Authority: This publication has been developed by NIST in accordance with its statutory responsibilities under the Federal Information Security Modernization Act (FISMA) of 2014, 44 U.S.C. § 3551 et seq., Public Law (P.L.) 113-283. NIST is responsible for developing information security standards and guidelines, including minimum requirements for federal information systems, but such standards and guidelines shall not apply to national security systems without the express approval of appropriate federal officials exercising policy authority over such systems. This guideline is consistent with the requirements of the Office of Management and Budget (OMB) Circular A-130. Nothing in this publication should be taken to contradict the standards and guidelines made mandatory and binding on federal agencies by the Secretary of Commerce under statutory authority.
  • Addressing the Failure of Anonymization: Guidance from the European Union's General Data Protection Regulation
    Elizabeth A. Brasher* — It is common practice for companies to "anonymize" the consumer data that they collect. In fact, U.S. data protection laws and Federal Trade Commission guidelines encourage the practice of anonymization by exempting anonymized data from the privacy and data security requirements they impose. Anonymization involves removing personally identifiable information ("PII") from a dataset so that, in theory, the data cannot be traced back to its data subjects. In practice, however, anonymization fails to irrevocably protect consumer privacy due to the potential for deanonymization—the linking of anonymized data to auxiliary information to re-identify data subjects. Because U.S. data protection laws provide safe harbors for anonymized data, re-identified data subjects receive no statutory privacy protections at all—a fact that is particularly troublesome given consumers' dependence on technology and today's climate of ubiquitous data collection. By adopting an all-or-nothing approach to anonymization, the United States has created no means of incentivizing the practice of anonymization while still providing data subjects statutory protections. This Note argues that the United States should look to the risk-based approach taken by the European Union under the General Data Protection Regulation and introduce multiple tiers of anonymization, which vary in their potential for deanonymization, into its data protection laws. Under this approach, pseudonymized data—i.e., certain data
    * J.D. Candidate 2018, Columbia Law School; B.A. 2012, Bucknell University. Many thanks to Professor Ronald Mann for his insight throughout the Note-writing process.
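The tiered view above, where pseudonymized data sits between raw and anonymized data, turns on how reversible the pseudonyms are. A short sketch (an assumed illustration, not from the Note; the phone number and key are invented) shows why an unkeyed hash of a low-entropy identifier is a weak pseudonym, while a keyed HMAC is materially stronger:

```python
# Unkeyed hashing vs. keyed HMAC as pseudonymization techniques.
import hashlib
import hmac

phone = "555-0142"

# Weak tier: a plain hash is deterministic and unkeyed, so an attacker who
# can enumerate the identifier space just hashes every candidate value.
weak = hashlib.sha256(phone.encode()).hexdigest()
dictionary = {hashlib.sha256(f"555-{n:04d}".encode()).hexdigest(): f"555-{n:04d}"
              for n in range(10000)}
recovered = dictionary.get(weak)  # dictionary attack re-identifies the subject

# Stronger tier: an HMAC with a separately held secret key cannot be
# recomputed by the attacker, so the enumeration attack fails.
KEY = b"held-by-the-controller"  # hypothetical key for this sketch
strong = hmac.new(KEY, phone.encode(), hashlib.sha256).hexdigest()

print("dictionary attack recovers:", recovered)
print("keyed pseudonym:", strong[:16], "...")
```

Under the GDPR's logic, the keyed variant remains personal data for whoever holds the key, which is exactly why a tiered regime can calibrate obligations to the realistic risk of reversal.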
  • Data Anonymizing API – Phase II
    Data Anonymizing API – Phase II. TM Forum Digital Transformation World 2018, Nice, France, 14–16 May. © 2018 TM Forum.
    Catalyst Champion & Participants:
    • Orange: Emilie Sirvent-Hien (Anonymization project manager) and Sophie Nachman (Standards Manager). Goal: a standardized anonymization API that allows sharing data internally within Orange and externally with partners, in order to unleash service innovation while guaranteeing customer privacy in compliance with the GDPR (General Data Protection Regulation).
    • Vodafone: Atul Ruparelia (Data Architect) and Imo Ekong (Big Data Communication Specialist, Analytics CoE). Contributing towards a standard open API for anonymization/pseudonymization (using rich TMF assets), allowing data sharing with internal and external partners to drive service innovation while protecting PII in compliance with the GDPR.
    • Cardinality: Steve Bowker (CEO & Co-Founder) and Dejan Vujic (Head of Data Science). Cardinality have implemented one of the largest Hadoop-based analytics solutions in a European telco, leveraging a containerised microservices-based architecture that makes extensive use of APIs, including data anonymization, pseudonymization and encryption within their solution.
    • Brytlyt: Richard Heyns (CEO & Founder). Brytlyt leverage advanced processing on GPUs in natively parallelizable algorithms, which form the foundation of their high-performance data analytics and machine learning.
    • Liverpool John Moores University: Professor Paul Morrissey. Among other things, Paul is the Global Ambassador for the TM Forum, with responsibility for Big Data Analytics and Customer Experience Management. Provided input on business drivers, CurateFX, and the Osterwalder Business Canvas.
    • 5G Innovation Centre (5GIC), hosted by the University of Surrey: data scientists and software programmers. 5GIC members are collaborating closely to drive forward advanced wireless research, reduce the risks of implementing 5G (through their 5G testbed) and contribute to global 5G standardisation.