Anonymisation Models for Text Data: State of the Art, Challenges and Future Directions

Total Page:16

File Type:pdf, Size:1020Kb

Anonymisation Models for Text Data: State of the Art, Challenges and Future Directions Anonymisation Models for Text Data: State of the Art, Challenges and Future Directions Pierre Lison1, Ildiko´ Pilan´ 1, David Sanchez´ 2, Montserrat Batet2, and Lilja Øvrelid3 1Norwegian Computing Center, Oslo, Norway 2Universitat Rovira i Virgili, CYBERCAT, UNESCO Chair in Data Privacy, Spain 3Language Technology Group, University of Oslo, Norway fplison,[email protected] fdavid.sanchez,[email protected] liljao@ifi.uio.no Abstract tion techniques that render personal data no longer personal. Access to anonymised data is a prerequi- This position paper investigates the problem of automated text anonymisation, which is a pre- site for research advances in many scientific fields, requisite for secure sharing of documents con- notably in medicine and the social sciences. By fa- taining sensitive information about individuals. cilitating open data initiatives, anonymised data can We summarise the key concepts behind text also help empower citizens and support democratic anonymisation and provide a review of current participation. For structured databases, anonymi- approaches. Anonymisation methods have so sation can be enforced through well-established far been developed in two fields with little mu- privacy models such as k-anonymity (Samarati, tual interaction, namely natural language pro- cessing and privacy-preserving data publish- 2001; Samarati and Sweeney, 1998) or differen- ing. Based on a case study, we outline the ben- tial privacy (Dwork et al., 2006). These privacy efits and limitations of these approaches and models and their implementations are, however, discuss a number of open challenges, such as difficult to apply to unstructured data such as texts. (1) how to account for multiple types of seman- In fact, text anonymisation has been traditionally tic inferences, (2) how to strike a balance be- enforced manually, a process that is costly, time- tween disclosure risk and data utility and (3) consuming and prone to errors (Bier et al., 2009). how to evaluate the quality of the resulting These limitations led to the development of various anonymisation. We lay out a case for moving beyond sequence labelling models and incor- computational frameworks designed to extend auto- porate explicit measures of disclosure risk into mated or semi-automated anonymisation to the text the text anonymisation process. domain (Meystre et al., 2010;S anchez´ and Batet, 2016; Dernoncourt et al., 2017). 1 Introduction In this paper, we review the core concepts un- Privacy is a fundamental human right (Art. 12 of derlying text anonymisation, and survey the ap- the Universal Declaration of Human Rights) and proaches put forward to solve this task. These a critical component of any free society, among can be divided into two independent research di- others to protect citizens against social control, rections. On the one hand, NLP approaches rely stigmatisation, and threats to political expression. on sequence labelling to detect and remove prede- Privacy is also protected by multiple national and fined categories of entities that are considered sen- international legal frameworks, such as the General sitive or of personal nature (such as names, phone Data Protection Regulation (GDPR) introduced in numbers or medical conditions). On the other Europe in 2018. This right to privacy imposes hand, privacy-preserving data publishing (PPDP) constraints on the usage and distribution of data in- approaches take the notion of disclosure risk as cluding personal information, such as emails, court starting point and anonymise text by enforcing a pri- cases or patient records. In particular, personal vacy model. Anonymisation consists of a sequence data cannot be distributed to third parties (or even of transformations (such as removal or generalisa- used for secondary purposes) without legal ground, tion) on the document to ensure the requirements such as the explicit and informed consent of the derived from the privacy model are fulfilled. individuals to whom the data refers. This position paper makes the case that none As informed consent is often difficult to obtain of these approaches provide a fully satisfactory in practice, an alternative is to rely on anonymisa- account of the text anonymisation problem. We 4188 Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, pages 4188–4203 August 1–6, 2021. ©2021 Association for Computational Linguistics illustrate their merits and shortcomings on a case Direct Identifier: A (set of) variable(s) unique study and discuss three open challenges: for an individual (a name, address, phone 1. How to ensure that anonymisation is robust number or bank account) that may be used against multiple types of semantic inferences, to directly identify the subject. based on background knowledge assumed to Quasi Identifier: Information (such as gender, be available to an adversary ; nationality, or city of residence) that in iso- 2. How to transform the text in order to minimise lation does not enable re-identification, but the risk of disclosing personal data, yet retain may do so when combined with other quasi- as much semantic content as possible ; identifiers and background knowledge. 3. How to empirically evaluate the quality (in Confidential Attribute: Private personal infor- terms of disclosure risk and utility preserva- mation that should not be disclosed (such as a tion) of the resulting anonymisation. medical condition). Identity Disclosure: Unequivocal association of We argue in this paper that NLP and PPDP ap- a record/document with a subject’s identity. proaches should be viewed as complementary (one focusing on linguistic patterns, the other on disclo- Attribute disclosure: Unequivocal inference of sure risk) and that future anonymisation approaches a confidential attribute about a subject. for text should seek to reconcile these two views. Anonymisation: Complete and irreversible re- In particular, we contend that text anonymisation moval from a dataset of any information that, editor model models should combine a data-driven directly or indirectly, may lead to a subject’s (which selects masking operations on the docu- data being identified. ment) with an adversary seeking to infer confiden- tial attributes from edited documents. De-identification: Process of removing specific, predefined direct identifiers from a dataset. 2 What is Anonymisation? Pseudonymisation: Process of replacing direct The most common definition of privacy amounts to identifiers with pseudonyms or coded values self-determination, which is the ability of individ- (such ”John Doe” ! ”Patient 3”). The map- uals, groups or organisations to seclude informa- ping between coded values and the original tion about themselves selectively (Westin, 1967). identifiers is then stored separately. Information related to an identified or identifiable Table 1: Key terms related to data anonymisation. person is known as personal data, or more precisely personally identifiable information (PII). Datasets with PII cannot be released without control as this terms “anonymisation” and “de-identification” in- would impair the privacy of the data subjects. terchangeably (Chevrier et al., 2019). GDPR-compliant anonymisation is the complete 2.1 Legal Requirements and irreversible process of removing personal iden- Various legal frameworks regulate how PII can be tifiers, both direct and indirect, that may lead to an collected and processed. In particular, the Gen- individual being identified. Direct identifiers cor- eral Data Protection Regulation introduced in Eu- respond to values such as names or social security rope (GDPR, 2016) states that data owners must numbers that directly disclose the identity of the have a legal basis for processing PII, the most im- individual. However, removing direct identifiers is portant one being the explicit consent of the data not sufficient to eliminate all disclosure risks, as subjects.Alternatively, data owners may choose to individuals may also be re-identified by combining anonymise the data to ensure it can no longer be at- several pieces of information together with some tributed to specific individuals. Anonymised data is background knowledge. For instance, the combi- no longer regulated by the GDPR and can therefore nation of gender, birth date and postal code can be freely released. be exploited to identify between 63 and 87% of Table1 defines some of the key terms related the U.S. population, due to the public availability to data anonymisation (Elliot et al., 2016). This of US Census Data (Golle, 2006). These types of terminology is, however, not always applied con- personal identifiers are called quasi-identifiers and sistently, as several authors seem to use e.g. the encompass a large variety of data types such as 4189 demographic and geospatial data. Anonymisation disclosure, which is usually more harmful from a therefore necessitates both the removal of direct privacy perspective, should also be avoided. identifiers and the masking of quasi-identifiers. The removal of personal information necessarily Other legal frameworks have adopted a different entails some data utility loss. Because the ultimate approach. In the US, the Health Insurance Porta- purpose behind data releases is to produce usable bility and Accountability Act (HIPAA) (HIPAA, data, the best anonymisation methods are those 2004) lists 18 data types, such as patient’s name, that optimise the trade-off between minimising the address or social security number, which
Recommended publications
  • Pdf 126.47 Kb
    •Platinum Metals Rev., 2011, 55, (3), 175–179• 9th International Frumkin Symposium “Electrochemical Technologies and Materials for 21st Century” doi:10.1595/147106711X579074 http://www.platinummetalsreview.com/ Reviewed by Alexey Danilov Introduction A. N. Frumkin Institute of Physical Chemistry and The 9th International Frumkin Symposium was held Electrochemistry of the Russian Academy of Sciences, at the Conference Hall of the Russian Academy of Moscow, Russia Sciences in Moscow, Russia, between 24th–29th Email: [email protected] October 2010. The event was organised jointly by the A. N. Frumkin Institute of Physical Chemistry and Elec- trochemistry of the Russian Academy of Sciences (IPCE RAS) and the Chemical Department of Lomono- sov Moscow State University. The Symposium was sponsored by the Russian Academy of Sciences and the International Society of Electrochemistry. The fi rst symposium of this series, held in 1979, was dedicated to the memory of Russian electrochemist Alexander Frumkin (1895–1976). Since then, the Sym- posia have been held every 3–5 years to discuss cur- rent understanding of fundamental electrochemistry and its applications. The 9th Frumkin Symposium included 4 plenary lectures, 112 oral and 131 poster presentations. Scien- tists from the Russian Federation, Ukraine, Belarus, the USA, Canada, Austria, the UK, Germany, Spain, Denmark, Poland, Serbia, Switzerland, Finland, France, Italy, Iran and Taiwan took part in fi ve ‘microsympo- sia’ which made up the conference, including ‘Electri- cal Double Layer and Electrochemical Kinetics’, ‘New Processes, Materials and Devices for Successful Elec- trochemical Transformation of Energy’, ‘Corrosion and Protection of Materials’, ‘Electroactive Composi- tion Materials’ and ‘Bioelectrochemistry’. Platinum group metals (pgms) are widely used by electrochemists as electrode materials for studies of adsorption, nucleation, electrodeposition, electro- catalytic reactions and many other electrochemical processes used in applications such as fuel cells.
    [Show full text]
  • This Is an Electronic Reprint of the Original Article. This Reprint May Differ from the Original in Pagination and Typographic Detail
    This is an electronic reprint of the original article. This reprint may differ from the original in pagination and typographic detail. On the scientific heritage of Mikhail Isaakovich Temkin Murzin, Dmitry Published in: Kinetics and Catalysis DOI: 10.1134/S0023158419040104 Publicerad: 01/01/2019 Document Version (Referentgranskad version om publikationen är vetenskaplig) Document License CC BY-ND Link to publication Please cite the original version: Murzin, D. (2019). On the scientific heritage of Mikhail Isaakovich Temkin. Kinetics and Catalysis, 60(4), 388–397. https://doi.org/10.1134/S0023158419040104 General rights Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights. Take down policy If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim. This document is downloaded from the Research Information Portal of ÅAU: 25. Sep. 2021 УДК 541.124: 541.128 On the Scientific Heritage of Mikhail Isaakovich Temkin D. Yu. Murzina, * a Åbo Akademi University, Turku, 20500 Finland *e-mail: [email protected] Translated by Andrey Zeigarnik Received March 20, 2019; Revised March 20, 2019; Accepted March 25, 2019 Abstract—The article presents a retrospective of the main scientific works of an outstanding physical chemist M.I. Temkin (1908–1991) on the kinetics of catalytic reactions and chemical engineering, as well as other branches of physical chemistry.
    [Show full text]
  • Just Published
    JustFundamentals Published: of Electrochemistry This past December,December, ECS was pleased to add to its monograph series, the second edition of Fundamentals of Electrochemistry by Vladimir S. Bagotsky. This new book, published by John A. WileyWiley & Sons, gives the basic outline of most topics of theoretical and applied electrochemistryelectrochemistry for students not yet familiar with this fi eld, as well as an outline of recent and advanced developments in electrochemistryelectrochemistry for people who are already dealing with electrochemical problems. Dr.Dr. Bagotsky is a retired professor from the Frumkin Institute of ElectrochemistryElectrochemistry at the Russian Academy of Sciences. Previously he headed the new batterybattery system department of Moscow Power SourcesSources Institute. He has extensive experience with both theoretical and applied electrochemistry.y. He has authored 400 scientifi c papers, the fi rst edition of Fundamentals of ElectrochemistryElectrochemistry, and served on the editorial boards for the Journal of Power SourcesSources and ElektrokhimiyaElektrokhimiya. In addition to this important contribution to the body of work on electrochemistry, Dr. Bagotsky will leave another legacy: he is contributing all of his royalties to ECS, for which the Society would like to express its sincere thanks. The author has had a long and interesting career; and he recently shared some of his experiences with us. Interface: Congratulations on the solid-state science) who often encounter knowledge of English, French, Russian, publication of this monograph. It is electrochemical problems. and German art and literature. expected to fi ll a gap, certainly in The Interface: You graduated from high At that time the most notable event in Electrochemical Society Monograph school in Switzerland; did you grow up electrochemistry was the beginning of Series, as well as in the list of in Bern? When and how did you start to the development of electrochemical publications from Wiley.
    [Show full text]
  • The 67Th Annual Meeting of the International Society Of
    Program of the 67th Annual Meeting of the International Society of Electrochemistry i The 67th Annual Meeting of the International Society of Electrochemistry Electrochemistry: from Sense to Sustainability 21-26 August, 2016 The Hague, The Netherlands CONTENTS LIST Organizing Committee ..................................................................................................................v Symposium Organizers ..........................................................................................................vi-vii Tutorial Lectures ....................................................................................................................... viii Plenary Lectures ...........................................................................................................................ix Prize Winners .......................................................................................................................... x-xii ISE Society Meetings ................................................................................................................ xiii Poster Sessions ........................................................................................................................... xii General Information ........................................................................................... inside front cover Registration Hours during the Meeting ..................................................... inside front cover On Site Registration Fees .........................................................................
    [Show full text]
  • Physics in a Mad World
    9281_9789814619288_tp.indd 1 5/8/15 9:35 am May 2, 2013 14:6 BC: 8831 - Probability and Statistical Theory PST˙ws This page intentionally left blank :RUOG6FLHQWLÀF 9281_9789814619288_tp.indd 2 5/8/15 9:35 am Published by :RUOG6FLHQWL¿F3XEOLVKLQJ&R3WH/WG 7RK7XFN/LQN6LQJDSRUH 6$RI¿FH:DUUHQ6WUHHW6XLWH+DFNHQVDFN1- .RI¿FH6KHOWRQ6WUHHW&RYHQW*DUGHQ/RQGRQ:&++( British Library Cataloguing-in-Publication Data $FDWDORJXHUHFRUGIRUWKLVERRNLVDYDLODEOHIURPWKH%ULWLVK/LEUDU\ 7UDQVODWLRQIURP5XVVLDQE\-DPHV0DQWHLWK &RYHUGHVLJQE\$QQD/RYVN\ PHYSICS IN A MAD WORLD &RS\ULJKWE\:RUOG6FLHQWL¿F3XEOLVKLQJ&R3WH/WG $OOULJKWVUHVHUYHG7KLVERRNRUSDUWVWKHUHRIPD\QRWEHUHSURGXFHGLQDQ\IRUPRUE\DQ\PHDQV HOHFWURQLFRUPHFKDQLFDOLQFOXGLQJSKRWRFRS\LQJUHFRUGLQJRUDQ\LQIRUPDWLRQVWRUDJHDQGUHWULHYDO system now known or to be invented, without written permission from the publisher. )RUSKRWRFRS\LQJRIPDWHULDOLQWKLVYROXPHSOHDVHSD\DFRS\LQJIHHWKURXJKWKH&RS\ULJKW&OHDUDQFH &HQWHU,QF5RVHZRRG'ULYH'DQYHUV0$6$,QWKLVFDVHSHUPLVVLRQWRSKRWRFRS\ LVQRWUHTXLUHGIURPWKHSXEOLVKHU ,6%1 ,6%1 SEN 3ULQWHGLQ6LQJDSRUH Lakshmi - Physics in a Mad World.indd 1 21/7/2015 11:14:59 AM August 17, 2015 11:16 ws-procs9x6x11pt-9x6 9281-00-TOC page v CONTENTS Preface ix M. Shifman Introduction: Information and Musings 1 M. Shifman Part I. Houtermans Professor Friedrich Houtermans. Works, Life, Fate 91 Victor Frenkel 1. Introduction 93 2. Beginnings 95 3. G¨ottingen 99 4. Berlin 107 5. 1933 119 6. London 125 7. Impressions of Kharkov 129 8. Kharkov: 1935-1937 141 9. Arrests 151 10. Last Months in the USSR (From Charlotte Houtermans’ Diary) 157 11. Descent into the Prisons: 1937-39 172 12. From Desperation to Hope (Continuation of Charlotte Houtermans’ Diary) 180 13. Fighting for Freedom 194 14. Descent into the Prisons: 1939-40 205 15. The Bridge over the River Bug 210 v July 27, 2015 16:53 ws-procs9x6x11pt-9x6 9281-00-TOC page vi vi Contents 16.
    [Show full text]
  • Jews in the Soviet Union Are Strongly Attracted by Zionism
    Solomon Rabinovich (born in 1904 ) is a vetcrnu Sovi et journali st. During World War II he was cd i­ tor of an Army new spa ­ per a nd won thr ee orde rs fo r valour in battle. At pr esent he works for the Nov osti Pre ss Agen cy. Solomon Rabinovich is one of the authors of th e bo ok in Yiddi sh This Is How ' Ve Lin. After he had mad e a trip to Israel th e Soviet Jewish m aga­ zine Snvletlsh HehnIand pub llsh ecl hi s tr av cl notc s. whi ch evok ed great int e­ rest am on g Sovi et and Io ­ reign read er s. In 1965 th e Novosti Press Agen cy published a book­ let by Habinovich about the life of the Jews in the USSR, whi ch was received with gr eat interest abroad, so mu ch so that we hav e now publish ed a new booklet by the same author. SOLOMON RABINOViCH IN Novosti Press Agency Publishing House Moscow " To US our Motherland is the country where many generations of our people were born and died, where we ourselves were born, work and will die." Mendele Moikher-Sforim YESTERDAY AND TODA Y 9 A Bit of Histor y . 9 Anti-Semitism Outlawed . 18 Notes of a Talk .. 20 Wh at Is Birobid jan ? . 24 How Many J ews Are There in the Jewish Aut onomous Region ? . 28 IN Ol'E FAMILY OF NATIO"S 30 A Helping Hand . .. 30 In Ti mes of Stre ss 32 The Sto ry of a J ewish Boy .
    [Show full text]
  • 9Th International Frumkin Symposium
    •Platinum Metals Rev., 2011, 55, (3), 175–179• 9th International Frumkin Symposium “Electrochemical Technologies and Materials for 21st Century” doi:10.1595/147106711X579074 http://www.platinummetalsreview.com/ Reviewed by Alexey Danilov Introduction A. N. Frumkin Institute of Physical Chemistry and The 9th International Frumkin Symposium was held Electrochemistry of the Russian Academy of Sciences, at the Conference Hall of the Russian Academy of Moscow, Russia Sciences in Moscow, Russia, between 24th–29th Email: [email protected] October 2010. The event was organised jointly by the A. N. Frumkin Institute of Physical Chemistry and Elec- trochemistry of the Russian Academy of Sciences (IPCE RAS) and the Chemical Department of Lomono- sov Moscow State University. The Symposium was sponsored by the Russian Academy of Sciences and the International Society of Electrochemistry. The fi rst symposium of this series, held in 1979, was dedicated to the memory of Russian electrochemist Alexander Frumkin (1895–1976). Since then, the Sym- posia have been held every 3–5 years to discuss cur- rent understanding of fundamental electrochemistry and its applications. The 9th Frumkin Symposium included 4 plenary lectures, 112 oral and 131 poster presentations. Scien- tists from the Russian Federation, Ukraine, Belarus, the USA, Canada, Austria, the UK, Germany, Spain, Denmark, Poland, Serbia, Switzerland, Finland, France, Italy, Iran and Taiwan took part in fi ve ‘microsympo- sia’ which made up the conference, including ‘Electri- cal Double Layer and Electrochemical Kinetics’, ‘New Processes, Materials and Devices for Successful Elec- trochemical Transformation of Energy’, ‘Corrosion and Protection of Materials’, ‘Electroactive Composi- tion Materials’ and ‘Bioelectrochemistry’. Platinum group metals (pgms) are widely used by electrochemists as electrode materials for studies of adsorption, nucleation, electrodeposition, electro- catalytic reactions and many other electrochemical processes used in applications such as fuel cells.
    [Show full text]
  • Pillars of Modern Electroc Hemistry
    ClassicsECS Pillars of Moder n Electro chemis by A. K. Shukla and T. Prem Kumar try lthough there is some archaeological evidence which suggests that some form of a primitive Birth Pangs Abattery (sometimes called a Baghdad battery) was used for electroplating in Mesopotamia ca. 200 BC, t was only in the sixteenth century electrochemistry as we know it today had its genesis that electricity began to be understood. in the pile of crowns of Alessandro Volta in 1800. The IThe English scientist William Gilbert (1544-1603), known as the “father of inspiration for his studies might have come from the magnetism” for his work on magnets, famous frog leg experiments of Galvani, who, however, was among the first to experiment with was content to conclude that the phenomenon was of electricity. He devised methods to produce biological origin. A metamorphosis took place with William Gilbert as well as strengthen magnets. The first seminal contributions from John Daniell and Michael electric generator was constructed by the Faraday. From such humble beginnings, electrochemistry German physicist Otto von Guericke (1602-1686) in 1663. The device generated today has matured into a multidisciplinary branch of static electricity by friction between a large study. Built on the precision of physics and depth of sulfur ball and a pad. By the mid-1700s materials science, it encompasses chemistry, physics, the French chemist Charles François de biology, and chemical engineering. Cisternay du Fay (1698-1739) discovered The uniqueness of electrochemistry lies in the fact two types of static electricity. He found that the application of a potential or electric field can that like charges repel each other while help overcome kinetic limitations at low tempera- the unlike charges attract.
    [Show full text]
  • The Duke Historical Review Volume I, Issue 1: Fall 2018 Cover Photo Courtesy of Everpedia Historia Nova: the Duke Historical Review Volume I, Issue 1: Fall 2018
    The Duke Historical Review Volume I, Issue 1: Fall 2018 Cover photo courtesy of Everpedia Historia Nova: The Duke Historical Review Volume I, Issue 1: Fall 2018 Historia Nova features exceptional historical analysis from under- graduate students at Duke University and at institutions from across the United States and around the world, with the ultimate mission of showing that history is innovative, history is new, and history is ours, as our name would suggest. For more information regarding the publication and submissions, visit us at: https://history.duke.edu/news-events/undergraduate Reach us at: [email protected] Historia Nova is an undertaking by undergraduate students at Duke University. Duke University is not responsible for its contents. Editorial Board Editor-in-Chief Lizzie Bond ‘21 Editors Sisi Tang ‘19 Alexander Frumkin ‘21 Karan Thadhani ‘21 President, Duke History Union Michael Brunetti ‘19 Advisory Board John J. Martin, Professor of History, Chair of the Department of History Dirk Bönker, Associate Professor of History James Chappel, Hunt Family Assistant Professor of History Sarah Deutsch, Professor of History Prasenjit Duara, Oscar L. Tang Family Professor of East Asian Studies John French, Professor of History Malachi Hacohen, Professor of History, Director of Undergraduate Studies Reeve Huston, Associate Professor of History Vasant Kaiwar, Visiting Associate Professor of History Adriane Lentz-Smith, Associate Professor of History Sucheta Mazumdar, Associate Professor of History Sumathi Ramaswamy, James B. Duke Professor of History Simon Partner, Professor of History Gunther Peck, Associate Professor of History Susan Thorne, Associate Professor of History Ashton W. Merck, Ph.D. Candidate, Department of History Hannah Ontiveros, Ph.D.
    [Show full text]