Dealing with Pseudonymization and Key Files in Small-Scale Research
Total Page:16
File Type:pdf, Size:1020Kb
Dealing with pseudonymization and key files in small-scale research A few basic steps lcrdm The National Coordination Point Research Data Management (lcrdm) is a national network of experts on research data management (rdm) in the Netherlands. The lcrdm connects policy and daily practice. Within the lcrdm experts work together to put rdm topics on the agenda that ask for mutual national1 cooperation. version: december 2019 more information: www.lcrdm.nl colophon Dealing with pseudonymization and key files in small-scale research A few basic steps publication date | December 2019 doi | 10.5281/zenodo.3571046 lcrdm Task Group Pseudonymization | Simone van Kleef (St. Antonius Hospital), Jan Lucas van der Ploeg (University Medical Center Groningen - umcg), Martiene Moester (Leiden University Medical Center - lumc), Henk van den Hoogen (University Maastricht/liaison lcrdm advisory group), Erik Jansen (University Maastricht/liaison Dataversenl), Tineke van der Meer (University of Applied Sciences Utrecht), Francisco Romero Pastrana (University Groningen), Jolien Scholten (Free University/vu), Leander van der Spek (University for Humanistic Studies), Ingeborg Verheul (lcrdm) Consultancy Group | Derk Arts (Castor), Marlon Domingus (Erasmus University), Laura Huis in ’t Veld (dans), Nicole Koster (University of Twente), Karin van der Pal (Leiden University Medical Center – lumc), Alfons Schroten (University Maastricht/memic). hand out | Boudewijn van den Berg (lcrdm) design | Nina Noordzij, Collage, Grou translation | Gosse van der Leij copyright | all content published can be shared, giving appropriate credit creativecommons.org/licenses/by/4.0 lcrdm lcrdm is supported by 2 Dealing with pseudonymization and key files in small-scale research A few basic steps Contents 5 1. Introduction 6 2. What is pseudonymization? 6 3. Wat is small-scale research? 7 4. Approach 8 5. Basic steps 9 6. In conclusion 10 7. Recommendations 11 Appendix 1: Definitions 12 Appendix 2: Survey results 17 Appendix 3: List of references 3 4 Dealing with pseudonymization and key files in small-scale research A few basic steps 1] Introduction Everyone who carries out scientific research with (sensitive) personal data, faces the same questions: How can we guarantee to participants in research that their personal data will be stored safely? How do we ensure that directly identifiable personal data, necessary for communication and organization of research, are separated from research data? It is common for datasets with (sensitive) personal data for research, to be stored in a data management system in pseudonymized form. For major research projects, part of the budget is often reserved for pseudonymization by Trusted Third Parties (ttp’s), but in the case of small-scale research this is not possible. There is less time and money available. What guidelines should be followed in such circumstances? Especially managing key(files) to link directly identifiable personal data and research data requires special care. How to store it? Who has access? Where to store it? And how can you make sure that access to the key file does not depend on the knowledge of one person and will also be available in the future in one form or another? Is there already a software application that has a satis- factory solution for these issues? In the period between February and June 2019, a lcrdm task group investigated if there are practical ways of pseudonymizing data for small-scale research which can also be used - relatively simply - by other institutions. In the absence of such, suggestions could be made about how best to take the first steps to apply pseudonymization in small-scale research. 5 2] What is pseudonymization? In a large number of research disciplines, pseudonymization has been common practice for some time. During research it may be necessary to identify research participants, for 1 There was an overwhel- ming demand for clarity example, in order to verify source data or to monitor persons over a longer period of time. concerning this subject as In such cases research data cannot be anonymized. Researchers then opt for pseudony- proven by the speed with which people signed up for mization; not only to protect the privacy of research participants, but also with regard to the task group. Within 24 scientific integrity (for research, researcher and research institute). The recent publicity hours, a ten-member task group had been formed surrounding the implementation of the avg (the Dutch equivalent of the gdpr) has gene- together with a consultation rated extra interest for this subject.1 group comprising a number of interested parties. Pseudonymization is here understood to mean: replacing the directly identifiable variables 2 Appendix 1 contains a list 2 of relevant definitions. in a dataset with a pseudonym. In some circles it is also called coding. This way of wor- king does not mean that a whole dataset has been pseudonymized. If the dataset contains 3 As opposed to pseudony- mized data, anonymous free text fields, they may contain potentially directly traceable data. In addition, a combi- data cannot be traced back nation of other (not directly identifiable) variables that are important for the research in to a person in any manner. A different task group is question, may lead to the identification of a person. concerned with the issues of working with anonymous data. The purpose of pseudonymization in thís form is therefore not to obtain an anonymous dataset (or as anonymous as possible3), but to protect the privacy of research participants from the onset, during the collection of data. Moreover, within the context of scientific integrity, the directly identifiable variables must be withheld from the researcher. 3] What is small-scale research? The task group focused on guidelines for pseudonymization during small-scale and quanti- tative research. We define small-scale research as: research with a limited number of parti- cipants and/or restricted financial means. The limitation to quantitative data stems from the fact that pseudonymization of qualitative data (e.g. video and audio files and transcripts thereof) necessitate other measures than simply replacing directly identifiable variables in a data file with a code. Videos can show recognizable images of people or during an interview that is subsequently quoted in the accompanying transcript. 6 4] Approach First, based on our own experience and available literature, the task group has made an inventory of how pseudonymization is applied in small-scale research. A survey has been drawn up to gain a better and more complete picture of current pseudonymization prac- tices in research institutes. It was distributed via different channels, especially among data management support staff. The survey asked how data was pseudonymized and who was responsible; if and what kind of software was used; where key files were stored; who had access and what happened to the key files once the project had been completed; if the in- stitute had a policy for pseudonymization and which problems it faced. Of 32 respondents, including (several) researchers and research support staff, 26 used (a form of) pseudonymi- zation. The most important conclusions that can be drawn from this survey are: 1. Most institutes do not use specific pseudonymization software for pseudonymizing data. Some institutes do have certain tools but these cannot be directly deployed outside their own research, or institute. Currently these tools are therefore not useful on a national level. 2. Most institutes do not have policy concerning pseudonymization or a subfield thereof, for example, dealing with key files. The variety of answers also show that opinions about whether or not something is permitted, differ widely. Appendix 2 contains a more comprehensive description of the survey results. To complement the survey, use cases in the daily practice at institutes of the task group members were examined. Relevant laws and regulations and other related documentation were also studied (see appendix 3). This led to the formulation of a list of basic steps for the pseudonymization of data. 7 5] Basic steps This report is aimed at research support staff, researchers and/or research institutes that have little knowledge of pseudonymization and lack sufficient tooling/infrastructure. Before going through the following basic steps, it is advisable to first check if there already is an existing policy for pseudonymization at the institute where you work. Contact a spe- cialist within your own organization if you have questions about implementation of (one of) the measures described below. Institutional policy always takes precedence over the general basic steps listed here. The task group identifies the following basic steps that researchers and research support staff can follow when pseudonymizing datasets for small-scale research. 1. In the data management plan, describe why and how you’re going to pseudonymize data, how access to the separately stored key file and the dataset is regulated and what happens to the key file and the data when the project is completed. 4 A research data ma- Identify the following categories in your data: nagement system (dms) is 2. a programme which allows • Data necessary for identification, to organize research or to communicate with you to store and manage research participants research data. A high-qua- lity dms records all activity » »» Store these in a key file in the research data base • Data required for analysis (audit trail) and ensures adequate security. » »» Preferably stored in a data management system4 • Data not needed (e.g. in case of a supplied dataset) » »» This data should be deleted. 3. Pseudonymize the data as quickly as possible, i.e. immediately when collecting data. If you are sent a dataset with identifiable data by another party, pseudonymize the data immediately after receiving it. 4. Use different pseudonyms for different datasets. This prevents that data from partici- pants who feature in multiple datasets can be linked via the pseudonym. 5. Store the key file separately from the research data.