Ref. Ares(2017)5557553 - 14/11/2017

SoBigData – 654024 www.sobigdata.eu

Project Acronym SoBigData

Project Title SoBigData Research Infrastructure: Social Mining & Big Data Ecosystem

Project Number 654024

Deliverable Title Legal and Ethical Framework for SoBigData 2

Deliverable No. D2.3

Delivery Date 31 August 2017

Authors René Mahieu (TUDelft), David van Putten (TUDelft), Jeroen van den Hoven (TUDelft), Stefanie Hänold (LUH), Iryna Lishchuck (LUH), Nikolaus Forgó (LUH)

SoBigData receives funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 654024

DOCUMENT INFORMATION

PROJECT

Project Acronym SoBigData
Project Title SoBigData Research Infrastructure: Social Mining & Big Data Ecosystem
Project Start 1st September 2015
Project Duration 48 months
Funding H2020-INFRAIA-2014-2015
Grant Agreement No. 654024

DOCUMENT

Deliverable No. D2.3
Deliverable Title Legal and Ethical Framework for SoBigData 2
Contractual Delivery Date 31 August 2017
Actual Delivery Date 14 November 2017
Author(s) René Mahieu (TUDelft), David van Putten (TUDelft), Jeroen van den Hoven (TUDelft), Stefanie Hänold (LUH), Iryna Lishchuck (LUH), Nikolaus Forgó (LUH)
Editor(s) René Mahieu (TUDelft), David van Putten (TUDelft), Stefanie Hänold (LUH), Iryna Lishchuck (LUH)
Reviewer(s) Salvatore Ruggieri (UNIPI), Dag Elgesem (external - UIB)
Contributor(s) Tina Krügel (LUH); Gerhard Gossen (LUH)
Work Package No. WP2
Work Package Title WP2 - NA1_Legal and Ethical Framework
Work Package Leader TUDELFT
Work Package Participants CNR, USFD, UNIPI, LUH, TUDelft, ETHZ
Dissemination Public
Nature Report
Version / Revision V2.3
Draft / Final Final
Total No. Pages (including cover) 81

D2.3 Legal and Ethical Framework for SoBigData 2 Page 2 of 81 SoBigData – 654024 www.sobigdata.eu

Keywords Ethical principles, Responsibility infrastructure, Legal and ethical assessment sheet, Check list, Ethics brief, MOOC, data protection, General Data Protection Regulation, privacy, accountability, Intellectual Property


DISCLAIMER

SoBigData (654024) is a Research and Innovation Action (RIA) funded by the European Commission under the Horizon 2020 research and innovation programme.

SoBigData proposes to create the Social Mining & Big Data Ecosystem: a research infrastructure (RI) providing an integrated ecosystem for ethic-sensitive scientific discoveries and advanced applications of social data mining on the various dimensions of social life, as recorded by “big data”. Building on several established national infrastructures, SoBigData will open up new research avenues in multiple research fields, including mathematics, ICT, and human, social and economic sciences, by enabling easy comparison, re-use and integration of state-of-the-art big social data, methods, and services, into new research.

This document contains information on SoBigData core activities, findings and outcomes and it may also contain contributions from distinguished experts who contribute as SoBigData Board members. Any reference to content in this document should clearly indicate the authors, source, organisation and publication date.

The document has been produced with the funding of the European Commission. The content of this publication is the sole responsibility of the SoBigData Consortium and its experts, and it cannot be considered to reflect the views of the European Commission. The authors of this document have taken all available measures to ensure that its content is accurate, consistent and lawful. However, neither the project consortium as a whole nor the individual partners that implicitly or explicitly participated in the creation and publication of this document accept any responsibility for consequences that might arise from using its content.

The European Union (EU) was established in accordance with the Treaty on the European Union (Maastricht). There are currently 27 member states of the European Union. It is based on the European Communities and the member states’ cooperation in the fields of Common Foreign and Security Policy and Justice and Home Affairs. The five main institutions of the European Union are the European Parliament, the Council of Ministers, the European Commission, the Court of Justice, and the Court of Auditors (http://europa.eu.int/).

Copyright © The SoBigData Consortium 2015. See http://project.sobigdata.eu/ for details on the copyright holders.

For more information on the project, its partners and contributors please see http://project.sobigdata.eu/. You are permitted to copy and distribute verbatim copies of this document containing this copyright notice, but modifying this document is not allowed. You are permitted to copy this document in whole or in part into other documents if you attach the following reference to the copied elements: “Copyright © The SoBigData Consortium 2015.”

The information contained in this document represents the views of the SoBigData Consortium as of the date they are published. The SoBigData Consortium does not guarantee that any information contained herein is error-free, or up to date. THE SoBigData CONSORTIUM MAKES NO WARRANTIES, EXPRESS, IMPLIED, OR STATUTORY, BY PUBLISHING THIS DOCUMENT.


GLOSSARY

ABBREVIATION DEFINITION

CDR Call Data Record

DPD Data Protection Directive

DPO Data Protection Officer

GDPR General Data Protection Regulation

MOOC Massive Open Online Course

RI Research Infrastructure

TOM Technical and organisational measures

VSD Value Sensitive Design

VSDI Value Sensitive Institutional Design


TABLE OF CONTENT

DOCUMENT INFORMATION
DISCLAIMER
GLOSSARY
TABLE OF CONTENT
DELIVERABLE SUMMARY
EXECUTIVE SUMMARY
1 Relevance to SoBigData
1.1 Purpose of this document
1.2 Relevance to project objectives
1.3 SOBIGDATA project description
1.4 Relation to other workpackages
1.5 Structure of the document
2 Implementing value-sensitive institutional design
2.1 Background
2.2 Value Sensitive Institutional Design
2.3 Future Work
3 Responsibility infrastructure
4 Ethical principles
5 The data science ethical and legal assessment sheet
5.1 Data Protection checklist and associated guide
5.1.1 Need for a data protection checklist instrument within SoBigData
5.1.2 General concept/ Purpose of the Checklist
5.1.3 Scope of application – Processing of personal data for scientific research purposes or archiving purposes in the public interest
5.1.4 Development process
5.2 List of legal requirements for processing of personal data for scientific research purposes or archiving purposes in the public interest – preliminary step for developing the checklist and associated guide
5.3 Researcher’s checklist and associated guide
5.4 Public Information sheet
6 Ethics briefs
6.1 Need for Ethics Briefs
6.2 Development process
6.3 Design elements
6.4 Examples
7 MOOC
7.1 Need for a MOOC
7.2 Development process
7.3 Preliminary content
7.4 Data protection law section
7.5 Intellectual property rights section
8 Conclusion: Ethics section


DELIVERABLE SUMMARY

This deliverable is organised as follows:

• Section 2 “Implementing value sensitive institutional design” gives an overview of the work presented in last year's deliverable, explains the necessity of designing an infrastructure in such a way that high level ethical values and legal norms are translated into practices that work for final users of the infrastructure and presents an overview of the elements that are developed to reach this goal.

The subsequent Sections 3 through 7 give a more detailed description of these elements separately.

• Section 3 describes the responsibility infrastructure designed to assign clear responsibilities to all actors within SoBigData.
• Section 4 describes the key principles intended to give ethical direction to the work within the research infrastructure.
• Section 5 describes the data science ethical and legal assessment sheet.
• Section 6 describes the ethics briefs designed to provide more specialised information related to particular types of data and methods.
• Section 7 describes the MOOC that is being developed to give every user of the infrastructure an introduction to the key elements of the legal and ethical issues related to social science with big personal data.


EXECUTIVE SUMMARY

This deliverable describes the integration of the legal and ethical framework described in deliverable D2.2 – “Legal and Ethical Framework for SoBigData 1” into the SoBigData infrastructure. As such, this document gives an overview of a value-sensitive institutional design that is being developed to translate abstract ethical norms and legal principles into a practical framework that empowers users to do social science involving big data sets that include personal data. The work presented here constitutes a key step towards building a research infrastructure (RI) that provides an integrated ecosystem for ethic-sensitive scientific discoveries and advanced applications of social data mining on the various dimensions of social life, as recorded by “big data”. More specifically, it contributes to the objective to investigate, design and promote novel architectures, protocols and procedures for the safe and fair use of big data for research purposes, in order to boost excellence and international competitiveness of Europe’s big data research.


1 RELEVANCE TO SOBIGDATA

SoBigData proposes a research infrastructure (RI) for big data research, which foresees upload, sharing and research on social media.¹ This endeavour brings various legal and ethical issues into play. This document describes the various elements that are being developed to ensure that the ethical norms and legal principles will be translated into workable practices for all users of the SoBigData research infrastructure.

1.1 PURPOSE OF THIS DOCUMENT

The purpose of this deliverable is to describe the integration of the legal and ethical framework that is described in deliverable D2.2 into the SoBigData infrastructure. As such, this document gives an overview of a value-sensitive institutional design that is being developed to translate the abstract ethical norms and legal principles into a practical framework that helps to empower users to do social science involving big data sets including personal data.

1.2 RELEVANCE TO PROJECT OBJECTIVES

The work presented in this deliverable constitutes a key step towards building a research infrastructure (RI) that provides an integrated ecosystem for ethic-sensitive scientific discoveries and advanced applications of social data mining on the various dimensions of social life, as recorded by “big data”. More specifically, it contributes to the objective to investigate, design and promote novel architectures, protocols and procedures for the safe and fair use of big data for research purposes, in order to boost excellence and international competitiveness of Europe’s big data research.

1.3 SOBIGDATA PROJECT DESCRIPTION

“SoBigData proposes to create the Social Mining & Big Data Ecosystem: a research infrastructure (RI) providing an integrated ecosystem for ethic-sensitive scientific discoveries and advanced applications of social data mining on the various dimensions of social life, as recorded by “big data”. Building on several established national infrastructures, SoBigData will open up new research avenues in multiple research fields, including mathematics, ICT, and human, social and economic sciences, by enabling easy comparison, re-use and integration of state-of-the-art big social data, methods, and services, into new research”.²

1.4 RELATION TO OTHER WORKPACKAGES

The work presented in this deliverable will be used for training purposes (WP4, especially T4.2 “Training modules”, led by KCL). The work will be shared with WP8 “Big Data Ecosystem” and WP10 “SoBigData e-Infrastructure and integration of SoBigData RI into e-Infrastructures”³ in order to technically integrate the developed elements into the infrastructure. In the end, this will contribute to the aims of WP6 “Transnational Access” and WP7 “Virtual Access” by delivering practical tools that allow users of the infrastructure to do ethics-sensitive and legally compliant research.

¹ SoBigData GA, Annex 1, DoA, p. 13.
² Ibid., p. 3.
³ Ibid., p. 135.


1.5 STRUCTURE OF THE DOCUMENT

This deliverable is organised as follows: Section 2 “Implementing value sensitive institutional design” gives an overview of the work presented in last year's deliverable, explains the necessity of designing an infrastructure in such a way that high level ethical values and legal norms are translated into practices that work for final users of the infrastructure, and presents an overview of the elements that are developed to reach this goal. The subsequent Sections 3 through 7 give a more detailed description of these elements separately. Section 3 describes the responsibility infrastructure designed to assign clear responsibilities to all actors within SoBigData. Section 4 describes the key principles intended to give ethical direction to the work within the research infrastructure. Section 5 describes the data science ethical and legal assessment sheet. Section 6 describes the ethics briefs designed to provide more specialised information related to particular types of data and methods. Finally, Section 7 describes the MOOC that is being developed to give every user of the infrastructure an introduction to the key elements of the legal and ethical issues related to social science with big personal data.


2 IMPLEMENTING VALUE-SENSITIVE INSTITUTIONAL DESIGN

2.1 BACKGROUND

Deliverable D2.2 provides a comprehensive overview of the legal and ethical aspects that are relevant for the SoBigData research infrastructure (RI).

With respect to the legal aspects, this means an overview of the current legal framework governing the use of personal data. Both the General Data Protection Regulation (GDPR), which will apply from 25 May 2018, and the current Data Protection Directive (DPD), which remains in effect until that date, are discussed extensively there.

A key issue considered in D2.2 is the question of which actor, given the various configurations that are prevalent within the RI, qualifies as the data controller. This is important because the key legal responsibilities following from the data protection laws within the European legal framework belong to the data controller. The conclusion of the analysis presented in D2.2 is that, from a legal perspective, the RI is not considered to be the data controller. Instead, depending on the different ways that data and methods are shared through the RI, either the data provider, the final user, the software provider or a combination of them should be regarded as the data controller.

With regard to the ethical considerations, D2.2 aims to give a comprehensive overview of the key ethical questions that are relevant for the use of personal data in social science research. It lays out four different moral reasons for constraining and regulating the use of personal data. It moreover discusses different existing frameworks of ethical principles that could serve as points of reference for the development of an ethical framework for SoBigData.

Most of the principles are defined at a level of abstraction that does not give direct rules for action. Thus the existing frameworks are useful to provide direction, but still need work in order to be applied to concrete situations. This is a key issue that surfaced while reviewing the current state of the art with regard to the ethical use of personal data in the (social) sciences.

2.2 VALUE SENSITIVE INSTITUTIONAL DESIGN

The key aspect of this year’s effort in the legal and ethical work package is the transition of abstract legal and ethical principles into concrete practices. The aim is to design the research infrastructure in such a way that researchers using the infrastructure are supported and incentivised to uphold legal and ethical principles.

Based on the ground work in D2.2 and the collaborative work in the past year, an institutional design has been developed consisting of 5 main elements.

1. Responsibility infrastructure
2. Principles
3. The data science ethical and legal assessment sheet
4. Ethics briefs
5. MOOC

Below, we discuss the core features of the five elements of this institutional design and how they relate to each other.

1. A responsibility infrastructure has been developed that assigns responsibilities to the key actors in the RI. It is important to assign roles in a clear and transparent manner since direct attribution of responsibility increases the propensity of actors individually and the infrastructure as a whole to produce ethically adequate outcomes.

While the legal responsibility does not belong to the SoBigData RI, the RI does assume an ethical responsibility to provide an environment that is conducive to ethical behavior.

2. We have developed core principles that help guide actors within the RI. The principles are, by and large, procedural in nature, because we believe that the application of these general procedural principles can lead to moral behavior in concrete cases.

3. Accountability is guaranteed by the users filling out the legal and ethical self-assessment sheet. By reflecting on the implications of their research and motivating the decisions they have made, users reveal the grounds for their decisions. Because many parts of the sheet will be made publicly available to both their peers and the wider public, their research is open and they can be held accountable. This is in line with the GDPR's accountability principle. It is not per se a disciplinary measure, but a way to stimulate further thought. It is thus primarily a way of meeting the principles introduced in Section 4: No. 2 ('research should be critically questioned'), No. 3 ('trade-offs are publicly discussed') and No. 5 ('data subjects are informed and included in the discussion').

4. We will develop and provide specialized ethics briefs dealing with specific ethical questions relevant to particular data sets or methods. There is a sizable and growing body of work concerning the ethical use of personal data in social science research. On the one hand, some of this knowledge, such as the main principles underlying the European legal framework and key ethical principles such as autonomy, fairness and responsibility, is relevant for all those who will work with personal data. On the other hand, there is a lot of detailed information that is mostly relevant for specific data sets or methods. Data scientists for whom law and ethics are not the core of their specialization are much more likely to absorb information that directly connects with the research that they (will) do.

5. We consider it a key element to provide users of the RI with a base level of knowledge and information about data protection law, intellectual property law and ethics. In order to disseminate this knowledge within the RI and to make sure that all users are familiar with the basic elements, we are developing a massive open online course (MOOC). Every new user of the platform will follow this MOOC. Every section of the MOOC will end with a small exam (probably in the form of multiple choice) to test the knowledge of the new user.


2.3 FUTURE WORK

The purpose of the institutional design outlined above is to provide a research environment that will set a new standard for ethics-sensitive social science with personal data. SoBigData is currently working on the implementation of the elements described above. Since many of the elements are new, their functionality will have to be evaluated by monitoring their use in practice. In the upcoming period, we will continue to develop the elements drafted here. Furthermore, we aim to test and evaluate the practical use of these elements by staging sessions, in collaboration with WP4, in which we can collect feedback from the final users and other actors involved in SoBigData. Based on the feedback and practical experience, we will propose improvements to the institutional design.


3 RESPONSIBILITY INFRASTRUCTURE

The following actors have been considered to allocate responsibilities:

• Data scientist

• Technical ethics/privacy expert

• Ethical and legal experts

• Ethics board

• Project manager

• DPOs of universities

Ideally, data science practices would preserve privacy in a technical manner and at an early stage in the research process. Both the privacy risk assessment and data sanitisation would be carried out close to the source of the data gathering. In the case of Call Data Records (CDRs), this would be the case when the raw CDR data is converted into profiles at the telephone company. Another option would be to have the conversion algorithm as a part of the SoBigData infrastructure. In such cases, the infrastructure will have been designed by technical ethics/privacy experts in such a way that the data scientist does not interact with the privacy sensitive data. The methods of data sanitisation, however, have an effect on the usability of the data for research purposes. Delegating the data sanitisation purely to the technical ethics/privacy expert is therefore hard or impossible to maintain. It is instead necessary that the data scientist reviews the proposed research plan in collaboration with ethics/privacy experts. In SoBigData, we thus encourage interaction between data scientists and privacy experts. At the level of the data type (e.g. CDR) or at the level of the exploratory (this remains to be decided), the data scientist will be provided with three things:

1. a short (1-2 page) introduction to the privacy/ethical issues that pertain to the specific data type;
2. references to the top literature in the field;
3. contact information for experts in the specific field who are available to discuss issues of ethics and the possibilities of implementing certain privacy preservation techniques.
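As an illustration of the conversion step mentioned above, in which raw CDR data is turned into profiles close to the source, the following minimal Python sketch aggregates individual call records into coarse per-user counts and replaces the phone number with a keyed hash. The record layout (caller number, hour of day, weekend flag), the choice of time slots and the function names `pseudonymise` and `build_profiles` are assumptions made for this example only; they are not taken from the SoBigData infrastructure. Profiles that are pseudonymised in this way may still qualify as personal data, so a step of this kind supports, but does not replace, the review with the ethics/privacy experts.

```python
import hashlib
import hmac
from collections import defaultdict


def pseudonymise(caller_id: str, secret_key: bytes) -> str:
    """Replace a phone number with a keyed hash (hypothetical helper).

    The key is assumed to stay with the data provider, so that the
    researcher cannot recompute the mapping from number to pseudonym.
    """
    digest = hmac.new(secret_key, caller_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def build_profiles(records, secret_key: bytes) -> dict:
    """Aggregate raw call records into coarse per-user call counts.

    `records` is an iterable of (caller_id, hour_of_day, is_weekend)
    tuples; this layout is an assumption made for the example.
    """
    profiles = defaultdict(lambda: defaultdict(int))
    for caller_id, hour, is_weekend in records:
        day_type = "weekend" if is_weekend else "weekday"
        time_slot = "day" if 8 <= hour < 19 else "night"
        profiles[pseudonymise(caller_id, secret_key)][(day_type, time_slot)] += 1
    # Convert nested defaultdicts into plain dicts before handing them on.
    return {user: dict(slots) for user, slots in profiles.items()}


if __name__ == "__main__":
    raw = [
        ("+31600000001", 9, False),
        ("+31600000001", 22, False),
        ("+31600000002", 10, True),
    ]
    for user, slots in build_profiles(raw, b"provider-side-secret").items():
        print(user, slots)
```

The coarser the time slots, the lower the re-identification risk but also the lower the research utility, which is the trade-off the data scientist and the privacy expert would need to discuss.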

We want to have a continuous analysis of the applicability of different privacy preserving techniques. The technical ethics/privacy experts will collaborate with the philosopher ethics/privacy experts to maintain an overview of the developments within the emerging field of ethical big data science. The aim is to maintain a knowledge base of what is going on, have an overview of the strengths and weaknesses of different approaches and create awareness of the conditions of applicability of the different approaches.

The responsibility of the ethics board is to create this overview of the existing literature and practices in relation to certain data types, methods and tools. The technical experts will help to understand the technical aspects of the literature.

Data scientist

The data scientist using the SoBigData infrastructure has the responsibility to get acquainted with the fundamental ethical and legal aspects relating to his or her research. This is done first by watching the MOOC upon registering with the SoBigData infrastructure and then, upon starting to use specific data sets, methods and tools, by reading the ethics brief provided.


When starting up a research project the data scientist has the responsibility to take into account the ethical considerations in the set-up of his or her research and, if applicable, to make use of the methods proposed in the ethics briefs. In doing so, the data scientist is requested to contact the specific ethics/privacy expert(s) to discuss and possibly tweak the research set-up.

We consider it of central importance to apply the principle of openness. Lastly, when writing and publishing research done within SoBigData, the data scientist has the obligation to include the ethics and privacy considerations that were applied in the research. It should be stressed that there are cases in which not all ethical ideals can be applied to the fullest and a certain trade-off needs to be made. These cases should be explicitly mentioned and include the reasoning behind the position taken in the trade-off.

The scientist also needs to be aware that the law may set restrictions on his or her research. He or she must take the necessary precautions, should be acquainted with the basic legal principles and should adhere to them. Internal review possibilities and the tools provided for self-assessment should be used. We know it can be a difficult task, but please use the means provided to you by SoBigData, and if you feel unsure or lost, you can ask the experts for help.

Ethical and legal experts

The team of ethical and legal experts has the responsibility to develop a framework that promotes the ethical behavior and legal compliance within the research infrastructure.

Ethics board

Members of the ethics board critically assess the processes developed by the legal and ethical experts. If necessary, the board mobilizes additional relevant expertise with respect to the quality control regarding the ethical and legal aspects of the project and the methodology of designing the infrastructure that promotes moral and legal values. The ethics board can be called upon to give occasional advice, where and when required.

Project management

The project management should contribute to an efficient collaboration between the different work packages and the ethical and legal team. The management has the responsibility to maintain a thorough understanding of the development of the legal and ethical framework and the associated working processes. If, and when, responsibilities are assigned to actors within the infrastructure, the project management is in charge of overseeing and promoting the compliance with said responsibilities.

Data protection officers of universities/ other institutions

Data protection officers of universities/ other institutions have the responsibility to inform and advise the data controllers or the processors and the employees who carry out processing of their obligations pursuant to the data protection laws.


4 ETHICAL PRINCIPLES

One of the elements of creating a research infrastructure that promotes the ethical use of personal data in social science is the development of a set of principles. For the development of these principles we have, as much as possible, built upon previous work. As a point of reference we have taken general principles, such as the OECD privacy principles, and looked at some more context-specific principles. The conclusion of this endeavor, reported in D2.2 and in line with the OECD, was that at this moment there is no established set of principles for the use of personal data in (social) research.

1. Inform researchers and create awareness. The use of personal data is growing at a high rate. So is the awareness that, while this development unlocks an immense potential for academic development and public good, it also involves risks and ethical problems. However, while this awareness is growing, it is still not commonplace. To a large extent, legal and ethical issues relating to the use of personal data in social science are still a matter for specialists. Ideally we would like everybody involved in research based on personal data to be aware and act with ethical and legal considerations in mind. Therefore SoBigData aims to bring awareness and information to everyone involved. This principle is put into practice, amongst other things, by providing a MOOC and specialized ethics briefs.

2. All research is critically questioned; ethical questions are discussed early and openly. One of the central characteristics of ethical questions is that they almost always involve trade-offs. Moreover, when it comes to ethics in big data science we are at the frontier of technological and societal development. Fixed practices or rules of thumb that embody ethical wisdom accumulated over time thus do not yet exist. Under these conditions, many questions will be hard for any individual to answer adequately alone. A combination of ideas from people with different sets of knowledge and beliefs is necessary for finding answers. This may help individual researchers to develop a refined critical faculty to make ethical judgments, alleviate the pressure and doubt that may be associated with having to make difficult decisions alone and ensure dispersion of ethical knowledge in the research community. This principle is put into practice, amongst other things, by providing access to specialists and people with practical experience through the specialised ethics briefs.

3. Trade-offs are publicly discussed. Another element that deserves to be highlighted is the relationship between academics and the public. Research within SoBigData is dependent on the public. Scientific research is, in most cases, publicly funded and the research is enabled by personal data that ultimately belongs to the public. An underlying assumption of most scientists is that their use of public funding and personal data is justified by the public benefits that the research aims to achieve. However, in order to empower the public and respect individuals’ autonomy, it is necessary to lay open the considerations that gave rise to decisions concerning ethical trade-offs. This helps to establish a relationship of accountability. And, by opening up the decision making process, the rationality of any point of view can be more readily understood, even by those who may not agree with the final balancing. This principle is put into practice, amongst other things, by providing a list of questions that need to be answered and made publicly available via the RI (see section 5.4).

4. Apply privacy preserving techniques in the technical design whenever possible. A key element of a move to an acceptable use of personal data is the development and application of privacy preserving techniques. SoBigData aims to contribute to the development of these techniques (see D2.5 and D2.6). The intention is to develop methods and techniques that modify the data in such a way that individual identification becomes impossible (or prohibitively hard) while at the same time the utility for research of the datasets remains.

5. Inform the data subjects. In line with the intent of the law, this principle intends to create an environment of openness. We will be transparent about the processing of personal data that is enabled through the RI. While this principle is embedded in the GDPR, current practice in many research environments is rather restrained in opening up the data processing to the public. SoBigData will provide an environment that is conducive to opening up to the public. This principle is put into practice, amongst other things, by providing a list of questions that need to be answered and made publicly available via the RI (see section 5.4).


5 THE DATA SCIENCE ETHICAL AND LEGAL ASSESSMENT SHEET

The data science ethical and legal assessment sheet, which is part of the value sensitive institutional design, is meant to help researchers within SoBigData to manage their legal obligations and comply with key ethical principles. The legal assessment sheet consists of a data protection checklist and an associated guidance document. The ethical assessment sheet consists of a question catalogue the researcher should answer. The objective is that researchers critically assess their own research and also create transparency for the public and for the individuals whose personal data is used in the research.

5.1 DATA PROTECTION CHECKLIST AND ASSOCIATED GUIDE

5.1.1 NEED FOR A DATA PROTECTION CHECKLIST INSTRUMENT WITHIN SOBIGDATA

Researchers/universities/research institutes/other organisations processing personal data as data controllers in SoBigData need to comply with a number of data protection regulations which have been introduced in D 2.2. Under the new General Data Protection Regulation (GDPR)4 the data controller is not only responsible for complying with the basic data protection principles laid down in Art. 5 (1) GDPR, e.g. processing personal data in a lawful, fair and transparent manner; the controller must also be able to demonstrate compliance with these principles (Art. 5 (2) GDPR). If personal data is not handled in a legally compliant way, there could be serious consequences such as significantly increased administrative fines or reputational damage. In order to help researchers to meet this obligation, a checklist and associated guide will be provided. This document introduces a first prototype.

5.1.2 GENERAL CONCEPT/ PURPOSE OF THE CHECKLIST

The checklist is accompanied by a researcher’s guide that will provide more detailed information. The checklist necessarily simplifies some legal issues. The result is a practical tool for self-evaluation and critical self-assessment that helps researchers to achieve a better understanding of data protection principles and the resulting requirements and restrictions for their scientific research/archiving in the public interest, and to have a better-informed discussion with their research ethics committee or data protection office. It does not constitute legal advice.

5.1.3 SCOPE OF APPLICATION – PROCESSING OF PERSONAL DATA FOR SCIENTIFIC RESEARCH PURPOSES OR ARCHIVING PURPOSES IN THE PUBLIC INTEREST

The checklist is meant to help the data controller within SoBigData to evaluate whether the envisaged processing of personal data for scientific research purposes or archiving purposes in the public interest complies with data protection law requirements. This includes all processing steps, from the collection and storage of the data to those steps that follow an end-user's request to download a data set or have it analysed via the SoBigData platform/transnational access.

4 Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation).


5.1.4 DEVELOPMENT PROCESS

Such checklists with associated guides are not a new invention. As a first step, existing documents, such as a checklist and associated guide provided by the University of Edinburgh5, have been studied and assessed within WP 2. In a second step, possible improvements for user-friendliness have been discussed internally within WP 2. Thirdly, due to the enactment of the GDPR, the list had to take the new legal situation into account. The GDPR harmonizes the legal framework on data protection to a considerable extent, but also leaves some scope for national regulations with regard to the processing of personal data for research purposes, e.g. Member States are authorised to enact national regulations that allow the processing of special categories of personal data for research purposes or archiving purposes in the public interest (Art. 9 (2) (j) GDPR). Member States are also authorised to enact derogations from the data subject’s rights referred to in Art. 15, 16, 18 and 21 of the Regulation (Art. 89 (2) GDPR).

5.1.4.1 STUDYING EXISTING CHECKLISTS

The following documents have been considered:

• University of Edinburgh, Researcher’s Data Protection Checklist
• ESOMAR, Data Protection Checklist6
• ICO, A Quick ‘How to Comply’ Checklist7
• University of Central Lancashire, Data Protection Checklist: Teaching, research, knowledge transfer, consultancy and related activities8
• Middlesex University, Data Protection Checklist for Researchers9

In general, WP 2 members regarded a checklist as a useful instrument for non-lawyers/non-privacy experts to self-assess their research project in terms of data protection law requirements. This is because the checklist instrument is relatively short and filtered to the relevant aspects compared to the legal text of the Regulation or relevant legal textbooks. The format also enables a structured analysis of legal requirements in a logical order. An associated guide enables researchers to self-study the various aspects and to gain a better understanding of the legal points mentioned in the list. It is also considered useful not only to let researchers set checkmarks, but also to let them write short elaborations in order to avoid premature decisions.

5 University of Edinburgh, Researcher’s Data Protection Checklist, http://www.ed.ac.uk/records-management/data-protection/guidance-policies/research/act/checklist; Researcher’s guide to the data protection principles, http://www.ed.ac.uk/records-management/data-protection/guidance-policies/research/act/guide-principles.
6 ESOMAR, Data Protection Checklist, https://www.esomar.org/uploads/public/knowledge-and-standards/codes-and-guidelines/ESOMAR-Data-Protection-Checklist_update-April-2016.pdf.
7 ICO, A Quick ‘How to Comply’ Checklist, https://ico.org.uk/media/for-organisations/documents/1558/getting_it_right_-_how_to_comply_checklist.pdf.
8 https://www5.uclan.ac.uk/ou/sds/resource-centre/External%20library/Data%20protection%20checklist.pdf.
9 Middlesex University, Data Protection Checklist for Researchers, https://webcache.googleusercontent.com/search?q=cache:OETAGg7D0lsJ:https://unihub.mdx.ac.uk/__data/assets/word_doc/0026/156491/MU_DPA_Checklist-Sept-2014-Final.docx+&cd=9&hl=de&ct=clnk&gl=de&client=firefox-b.


An attempt was made to reduce the use of legal terms that are open to broad interpretation. However, because data protection law is by nature technology-neutral, it is often hard to translate the legal text into specific and clear requirements. Where suitable, examples were included to facilitate the understanding of legal conditions. Of course, this approach has limitations too, as examples cannot fully reflect the potentially broad scope of interpretation or the implications of a regulation. Nevertheless, it is believed that vivid examples can contribute to a better understanding of the law by researchers.10 This belief was strengthened by discussions with researchers on how legal texts could be made more understandable to them, in which the use of examples in particular was mentioned. Researchers will also receive suggestions for further reading to enhance their knowledge about relevant aspects, and are made aware of the possibility of contacting their data protection officer or ethics committee.

It was further considered that researchers, as a first step, need to determine whether they process personal data at all. This is because the regime of the General Data Protection Regulation applies only if the data processed is personal data.11 As elaborated in D 2.2 and in this deliverable, evaluating whether a data set is personal or non-personal can be a very difficult task. For this reason practical examples have been chosen to give researchers a better understanding of the conditions under which de-identified data may still be personal data. Where it is uncertain whether the data is personal, we also recommend treating the data as personal data or contacting the internal data protection officer or ethics committee for clarification.

It is envisaged to implement the checklist as an electronic document. Pop-up elements will give researchers relevant information on those aspects for which they require more explanation. The researcher’s guide will, however, also be maintained as a separate, complete document, and researchers with less experience of data protection law requirements will be advised to study the entire document.
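As an illustration only, the electronic checklist described above could be modelled along the following lines. This is a minimal sketch and not the SoBigData implementation: the class `ChecklistItem`, its fields and the function `open_items` are hypothetical names, and the two example questions are simplified. The `guidance` field stands in for the text a pop-up element might show, and an item counts as complete only when a checkmark is accompanied by a short elaboration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ChecklistItem:
    """One checklist question; all field names are hypothetical."""
    question: str
    guidance: str                    # text a pop-up element could display
    answer: Optional[bool] = None    # None means not yet answered
    elaboration: str = ""            # short written justification

    def is_complete(self) -> bool:
        # A bare checkmark is not enough: a short elaboration is required too.
        return self.answer is not None and bool(self.elaboration.strip())


def open_items(items: List[ChecklistItem]) -> List[str]:
    """Return the questions that still lack an answer or an elaboration."""
    return [item.question for item in items if not item.is_complete()]


checklist = [
    ChecklistItem(
        question="Does the project process personal data?",
        guidance="Personal data is any information relating to an identified "
                 "or identifiable natural person (Art. 4 (1) GDPR).",
    ),
    ChecklistItem(
        question="Is there a legal ground for the processing?",
        guidance="See Art. 6 GDPR, and Art. 9 GDPR for special categories.",
        answer=True,
        elaboration="Participants gave informed consent.",
    ),
]

print(open_items(checklist))
```

Requiring the elaboration in `is_complete` reflects the design goal of avoiding premature decisions; a real tool would also need persistence and a user interface, which are outside the scope of this sketch.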

5.1.4.2 ADAPTION TO THE GENERAL DATA PROTECTION REGULATION (GDPR)

In D 2.2 an overview of the changes by the GDPR has been provided. For the checklist a separate table has been created where as a first step legal requirements deemed as relevant for the processing of personal data for scientific research purposes or archiving purposes in the public interest have been listed with additional elaboration for interpretation.

As already outlined in D 2.2, the main data protection principles under the Regulation already existed under the legal regime of the Data Protection Directive. There is, however, an interesting question to clarify concerning the principle of lawfulness and its relation to the purpose limitation principle.12

Under the Directive and implementing national laws every step of processing personal data must have a legal basis. This can be, e.g., the (explicit) informed consent of the data subject. Under the Directive, Member States are also authorised to enact regulations that allow the processing of personal data for reasons of substantial public interest, e.g. scientific research. In addition, the processing must comply with the purpose limitation principle. The purpose limitation principle functions as a restrictive modality and requires that personal data shall be collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes.

10 See also Art. 29 Working Party, Opinion 4/2007 on the concept of personal data.
11 Art. 2 (1) GDPR; Albrecht, J. P. and Jotzo, F., Das neue EU Datenschutzrecht der EU, 2017, p. 58.
12 Elaborations in D 2.2 regarding the interpretation of Recital 50 will be updated with this Deliverable; see also footnote 28.

The Regulation also contains both of these principles in Art. 5 (1) GDPR; in addition, however, Recital 50 GDPR has to be considered: “The processing of personal data for purposes other than those for which the personal data were initially collected should be allowed only where the processing is compatible with the purposes for which the personal data were initially collected. In such a case, no legal basis separate from that which allowed the collection of the personal data is required.” Recital 50 also states: “Further processing for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes should be considered to be compatible lawful processing operations.”

Since the General Data Protection Regulation was enacted, legal scholars have been discussing the meaning and impact of Recital 50. Relying on its wording, some of them take the view that the further processing of personal data can be based on the legal ground that allowed the collection of the data if the new purpose is compatible with the initial purpose.13 On this view, the legal ground is extended to cover the compatible purpose as well. Other scholars take a much more restrictive view and argue that Recital 50 is to be interpreted much more narrowly or must be disregarded altogether.14 Especially in the case of further processing for privileged purposes such as archiving purposes in the public interest, scientific or historical research purposes or statistical purposes, it is argued that the interests of the data subject would not be sufficiently considered, despite the reference to Art. 89 (1) GDPR.15 Consequently, according to this opinion, the processing of personal data must have its own legal basis.

In favour of a rather restrictive interpretation of Recital 50, it is also argued that according to Art. 8 (2) of the Charter of Fundamental Rights of the European Union personal data must be processed fairly for specified purposes and on the basis of the consent of the person concerned or some other legitimate basis laid down by law.16 Recital 50 itself, however, cannot be regarded as a legal ground. Of course recitals are part of the Regulation17 and it is consistent practice to use them for interpretation of the legal text, but they are not binding as such. The ECJ has underlined in settled case-law that “it should be borne in mind that the preamble to a Community act has no binding legal force and cannot be relied on either as a ground for derogating from the actual provisions of the act in question or for interpreting those provisions in a manner clearly contrary to their wording (Deutsches Milch-Kontor, C-136/04, EU:C:2005:716, paragraph 32 and the case-law cited)”.18 It is also questionable to enlarge the scope of application of a legal ground solely on the basis of a recital. One may ask why the European legislator has not set a clear rule in the actual legal text in Art. 6 or Art. 5 GDPR if modifying such an established principle as the principle of lawfulness was indeed intended. The fact that the purpose is compatible does not constitute a legitimate basis as required by Art. 8 (2) of the Charter either.19 Nor is the principle of purpose limitation itself a legal basis for processing personal data20; it only sets further restrictions on the processing of personal data, as it demands that personal data must not be further processed in a manner incompatible with the initial purpose.

13 Frenzel, E. M., in: Paal, B. and Pauly, D. (edt.), Datenschutz-Grundverordnung, 2017, Art. 5 margin number 3; Richter, P., Big Data, Statistik und die Datenschutz-Grundverordnung, DuD 2016, p. 584; Piltz, K., Die Datenschutz-Grundverordnung, K & R 2016, p. 566; Schulz, S., in: Gola, P. (edt.), Datenschutz-Grundverordnung VO (EU) 2016/679, 2017, p. 256; Kühling, M., and Martini, M. in „Die Datenschutz-Grundverordnung: Revolution oder Evolution im europäischen und deutschen Datenschutzrecht“, EuZW 2016, p. 451; Härting, N., Datenschutz-Grundverordnung, 2016, p. 124.
14 Heberlein, H., in: Ehmann, E. and Selmayr, M. (edt.), Datenschutz-Grundverordnung, p. 267; Schantz, P., Die Datenschutz-Grundverordnung – Beginn einer neuen Zeitrechnung im Datenschutzrecht, NJW 2016, p. 1844; Schantz, P., in: Wolff, A. and Brink, S. (edt.), Beck Online Kommentar Datenschutzrecht, Art. 5, margin number 22; Herbst, T., in: Kühling, J. and Buchner, B. (edt.), Datenschutz-Grundverordnung: DS-GVO, 2017, pp. 205-206.
15 Schantz, P., in: Wolff, A. and Brink, S. (edt.), Beck Online Kommentar Datenschutzrecht, Art. 5, margin number 22.
16 Schantz, P., Die Datenschutz-Grundverordnung – Beginn einer neuen Zeitrechnung im Datenschutzrecht, NJW 2016, p. 1844; Heberlein, H., in: Ehmann, E. and Selmayr, M. (edt.), Datenschutz-Grundverordnung, p. 267.
17 Herbst, T., in: Kühling, J. and Buchner, B. (edt.), Datenschutz-Grundverordnung: DS-GVO, 2017, p. 205.

As the legal evaluation of the issue is still undecided and there are some strong arguments for a restrictive application of Recital 50, data controllers will be advised to check whether further processing of personal data is covered by its own legal ground.

5.2 LIST OF LEGAL REQUIREMENTS FOR PROCESSING OF PERSONAL DATA FOR SCIENTIFIC RESEARCH PURPOSES OR ARCHIVING PURPOSES IN THE PUBLIC INTEREST – PRELIMINARY STEP FOR DEVELOPING THE CHECKLIST AND ASSOCIATED GUIDE

No. Legal requirements

0 Processing of personal data:

The Regulation applies to the processing of personal data (Art. 2 (1) GDPR). It does not concern the processing of anonymous information, including for statistical or research purposes (Recital 26 GDPR). Personal data is “any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;” (Art. 4 (1) GDPR). “To determine whether a natural person is identifiable, account should be taken of all the means reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly. To ascertain whether means are reasonably likely to be used to identify the natural person, account should be taken of all objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology at the time of the processing and technological developments” (Recital 26 GDPR). Pseudonymised data that could be attributed to a natural person by the use of additional information shall be considered to be personal data (Recital 26 GDPR).

18 ECJ, C-345/13.
19 Heberlein, H., in: Ehmann, E. and Selmayr, M. (edt.), Datenschutz-Grundverordnung, p. 267.
20 Heberlein, H., in: Ehmann, E. and Selmayr, M. (edt.), Datenschutz-Grundverordnung, p. 267.
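To make the notion of “singling out” more tangible, the following minimal sketch checks whether a combination of attributes is unique within a data set from which direct identifiers have already been removed. It is a rough heuristic under stated assumptions and not the legal test: Recital 26 requires considering all means reasonably likely to be used, including additional information held elsewhere. The function name `singled_out`, the choice of attributes and the records themselves are invented for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, str]


def singled_out(records: Sequence[Record],
                quasi_identifiers: Sequence[str]) -> List[Tuple[str, ...]]:
    """Return the quasi-identifier combinations that occur exactly once."""
    combos = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(combos)
    return [combo for combo, n in counts.items() if n == 1]


# Invented records; names have already been stripped.
records = [
    {"postcode": "2628", "birth_year": "1980", "gender": "f"},
    {"postcode": "2628", "birth_year": "1980", "gender": "f"},
    {"postcode": "30167", "birth_year": "1975", "gender": "m"},
]

unique = singled_out(records, ["postcode", "birth_year", "gender"])
print(unique)
```

A non-empty result indicates that at least one person can be isolated by these attributes alone, which is a reason to treat the data set as potentially personal data. An empty result does not establish anonymity.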

Regarding the question of whether knowledge held by third parties that can be used to re-identify data subjects must be taken into account, the ECJ decided in its judgment in case C-582/14 of 19 October 2016: “In so far as that recital refers to the means likely reasonably to be used by both the controller and by ‘any other person’, its wording suggests that, for information to be treated as ‘personal data’ within the meaning of Article 2(a) of that directive, it is not required that all the information enabling the identification of the data subject must be in the hands of one person.” The ECJ also stated that the possibility of combining data with information held by third parties which would enable re-identification of the data must constitute a means reasonably likely to be used, and this “would not be the case if the identification of the data subject was prohibited by law or practically impossible on account of the fact that it requires a disproportionate effort in terms of time, cost and man-power, so that the risk of identification appears in reality to be insignificant.”

The judgment refers to the legal situation under the DPD. It is uncertain whether the reasoning of the ECJ will still be valid under the GDPR, as it is formally a different legal act and the wording of the GDPR in the legal definition/associated recital has changed slightly, e.g. while Recital 26 GDPR states that all the means “reasonably likely” to be used shall be considered, the wording in the Directive refers to means that are “likely reasonably” to be used. A difference can also be noticed in the German version, where the wording of Recital 26 has changed from “vernünftigerweise” to “wahrscheinlich”. Hence, according to the new wording it may not matter whether the identification, or the acquisition of the knowledge necessary for identification, is prohibited by law.21 To what extent the knowledge of third parties is to be considered under the regime of the GDPR is not clear. In the cited case the ECJ restricted it to such knowledge as can be acquired by the party holding the data with means “likely reasonably” to be used.22 Albrecht and Jotzo apply the same basic concept to the new legal situation under the Regulation when they require that, besides the controller's own knowledge, the re-identification means of third parties must be considered if the controller is “reasonably likely” to acquire such knowledge.23 There are also opinions that (still) argue for a strictly objective approach, under which it does not matter whether the party holding the data is able to acquire the knowledge held by a third party that would enable re-identification.24 In the light of the technical possibilities in the Big Data era this would force parties holding data sets to regard all data as personal data, as in most cases they would not be able to judge whether a third party holds the relevant information or would be able to acquire such information with means reasonably likely to be used.25

However, in order to raise the level of protection for the data subject in the event of a transfer of the data, the controller should consider the means for re-identification available to the transferee. If further transmissions by the transferee are not excluded, e.g. by contractual obligations, the controller/transferor must consider the means of secondary transferees, too.26

21 Feiler, L. and Forgó, N., EU-DSGVO Kurzkommentar, 2017, p. 72.
22 Weinhold, R., EuGH: Dynamische IP-Adresse ist personenbezogenes Datum – Folgen der Entscheidung für die Rechtsanwendung, ZD-Aktuell 2016, 05366.
23 Albrecht, J. P. and Jotzo, F., Das neue EU Datenschutzrecht der EU, 2017, p. 59.
24 Klabunde, A., in: Ehmann, E. and Selmayr, M. (edt.), Datenschutz-Grundverordnung, p. 242; Feiler, L. and Forgó, N., EU-DSGVO Kurzkommentar, 2017, p. 72.
25 Albrecht, J. P. and Jotzo, F., Das neue EU Datenschutzrecht der EU, 2017, p. 59.

1 Principle of purpose limitation:

Personal data shall be collected for specified, explicit and legitimate purposes. Further processing must be compatible with that purpose (Art. 5 (1) (b) GDPR).

Any processing steps following the initial collection of the personal data are to be seen as further processing of personal data, regardless of whether the processing is for the purpose initially specified or for any additional purpose.27 Further processing is not restricted to the same data controller who initially collected the data.28

In some cases it is obvious that further processing is compatible, for example if the data have been collected specifically to achieve the purpose that the intended further use pursues.29 Further processing for archiving purposes in the public interest or scientific research shall, in accordance with Art. 89 (1) GDPR, not be considered incompatible with the initial purpose (Art. 5 (1) (b) GDPR).30

Art. 89 (1) GDPR provides: “Processing for research purposes shall be subject to appropriate safeguards, in accordance with this Regulation, for the rights and freedoms of the data subject. Those safeguards shall ensure that technical and organisational measures are in place in particular in order to ensure respect for the principle of data minimisation. Those measures may include pseudonymisation provided that those purposes can be fulfilled in that manner. Where those purposes can be fulfilled by further processing which does not permit or no longer permits the identification of data subjects, those purposes shall be fulfilled in that manner.”

Suggestions of Art. 29 Working Party31 for safeguards for the data subject:

“When full anonymisation and use of aggregated data (at a sufficiently high level of aggregation) are not possible, data will often at least need to be partially anonymised (e.g. pseudo-anonymised, key-coded, and stripped of direct identifiers) and additional safeguards may also be required, as will be discussed below […]

Among the appropriate safeguards which may bring additional protection to the data subject the following could be considered:

- taking specific additional security measures (such as encryption);
- in the case of pseudonymisation, making sure that data enabling the linking of information to a data subject (the keys) are themselves also coded or encrypted and stored separately; […]
- restricting access to personal data only on a need-to-know basis, carefully balancing the benefits of wider dissemination against the risks of inadvertent disclosure of personal data to unauthorised persons. This may include, for example, allowing read only access on controlled premises. Alternatively, arrangements could be made for limited disclosure in a secure local environment to properly constituted closed communities. Legally enforceable confidentiality obligations placed on the recipients of the data, including prohibiting publication of identifiable information, are also important.

It is important to note that in high-risk situations, where the inadvertent disclosure of personal data would have serious or harmful consequences for individuals, even this type of access or restriction may not be suitable.”

26 Compare the opinion of Damman, U., in: Simitis, S. (edt.), Kommentar zum Bundesdatenschutzgesetz, 2011.
27 Art. 29 Working Party, Opinion 03/2013 on purpose limitation, p. 21; Schantz, P., in: Wolff, A. and Brink, S. (edt.), Beck Online Kommentar Datenschutzrecht, Art. 5, margin number 20.
28 Heberlein, H., in: Ehmann, E. and Selmayr, M. (edt.), Datenschutz-Grundverordnung, p. 302. Further processing has been interpreted more restrictively in D 2.2.
29 Art. 29 Working Party, Opinion 03/2013 on purpose limitation, p. 22.
30 Recital 50 states: “Further processing for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes should be considered to be compatible lawful processing operations.”
31 Art. 29 Working Party, Opinion 03/2013 on purpose limitation, pp. 26-28.

There is also a catalogue of technical and organisational measures in section 22 (2) German DPA (draft)32 that concerns measures to protect the interests of the data subject:

“In the cases of subsection 1, appropriate and specific measures shall be taken to safeguard the interests of the data subject. Taking into account the state of the art, the cost of implementation and the nature, scope, context and purposes of the processing as well as the risks of varying likelihood and severity for the rights and freedoms of natural persons posed by the processing, these measures may include in particular:

1. technical and organisational measures to ensure that processing complies with Regulation (EU) 2016/679,
2. measures to ensure that it is subsequently possible to verify and establish whether and by whom personal data were entered, altered or removed,
3. raising the awareness of those involved in processing operations,
4. designation of a data protection officer,
5. restriction of access to the personal data within the controller and by processors,
6. pseudonymisation of personal data,
7. encryption of personal data,
8. ensuring the ability, confidentiality, integrity, availability and resilience of the systems and services related to the processing of personal data, including the ability to rapidly restore availability and access in the event of a physical or technical incident,
9. to ensure the security of the processing, the establishment of a procedure for regularly testing, assessing and evaluating the effectiveness of the technical and organisational measures, or
10. specific rules of procedure to ensure compliance with this Act and with Regulation (EU) 2016/679 in the event of transfer or processing for other purposes.”

32 Referentenentwurf des Bundesministeriums des Innern, Entwurf eines Gesetzes zur Anpassung des Datenschutzrechts an die Verordnung (EU) 2016/679 und zur Umsetzung der Richtlinie 2016/680 (Datenschutz-Anpassungs- und Umsetzungsgesetz EU – DSAnpUG-EU), https://www.datenschutzbeauftragter-online.de/wp-content/uploads/2017/01/DSAnpUG-EU-Entwurf-Kabinett.pdf; section 27 of the draft contains regulations implementing Art. 9 (2) (j) GDPR which also refers to Art. 89 (1) GDPR.
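The pseudonymisation safeguard listed above, with the linking key kept apart from the data, could be sketched as follows. This is one possible technique (keyed hashing with HMAC-SHA256 from the Python standard library), chosen here for illustration and not prescribed by the GDPR, the Art. 29 Working Party or the German draft. The helper names `new_key`, `pseudonym` and `pseudonymise`, the field names and the sample records are hypothetical. Anyone holding the key can recompute pseudonyms for known identifiers, so the key has to be stored separately under restricted access, and the output remains personal data within the meaning of Recital 26 GDPR.

```python
import hashlib
import hmac
import secrets
from typing import Dict, List


def new_key() -> bytes:
    """Create a random secret key; it must be stored apart from the data."""
    return secrets.token_bytes(32)


def pseudonym(identifier: str, key: bytes) -> str:
    """Derive a stable pseudonym from a direct identifier with HMAC-SHA256."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymise(records: List[Dict[str, str]], field: str,
                 key: bytes) -> List[Dict[str, str]]:
    """Return copies of the records with the identifier field replaced."""
    return [{**r, field: pseudonym(r[field], key)} for r in records]


key = new_key()  # hypothetical: kept in a separate, access-restricted store
raw = [
    {"user": "alice@example.org", "posts": "12"},
    {"user": "bob@example.org", "posts": "7"},
]
safe = pseudonymise(raw, "user", key)
print(safe[0]["posts"], len(safe[0]["user"]))
```

Because the same identifier always maps to the same pseudonym under one key, records belonging to the same person can still be linked for research, while re-identification requires access to the separately stored key or to additional information.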

2 Lawful processing:

Personal data shall be processed lawfully (Art. 5 (1) (a) GDPR). Hence, for the processing of personal data a legal ground is necessary (Art. 6 GDPR/ Art. 9 GDPR (special categories of personal data)).

Processing shall be lawful, inter alia, where:

• the data subject has given consent to the processing of his or her personal data for one or more specific purposes (Art. 6 (1) (a) GDPR);
• processing is necessary for the performance of a task carried out in the public interest (Art. 6 (1) (e) GDPR). It is assumed that scientific research is covered by the public interest.33 The basis for processing shall be laid down by Union law or Member State law to which the controller is subject (Art. 6 (3) GDPR). This legal ground is especially relevant for public universities34;
• processing is necessary for the purposes of the legitimate interests pursued by the controller or by a third party, except where such interests are overridden by the interests or fundamental rights and freedoms of the data subject which require protection of personal data, in particular where the data subject is a child (Art. 6 (1) (f) GDPR). This legal ground cannot be used if the data controller is a public authority (e.g. public universities)35. Scientific research can be considered as a legitimate interest.36 “The fact that a controller acts not only in its own legitimate (e.g. business) interest, but also in the interests of the wider community, can give more 'weight' to that interest.”37 Protective measures to prevent any negative impact on the data subject may decrease the data subject's interest in his or her personal data not being processed.38

33 Reimer, P., in Sydow, G. (edt.), Europäische Datenschutz-Grundverordnung, 2017, p. 347.
34 Ibid.
35 Recital 47 GDPR; Frenzel, E. M., in: Paal, B. P., and Pauly, D. A. (edt.), Datenschutz-Grundverordnung, 2017, Art. 5 margin number 26.
36 Reimer, P., in Sydow, G. (edt.), Europäische Datenschutz-Grundverordnung, 2017, p. 351.

In the event of special categories of personal data (revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade union membership, and the processing of genetic data, biometric data for the purpose of uniquely identifying a natural person, data concerning health or data concerning a natural person's sex life or sexual orientation) processing is allowed, inter alia, where:

• the data subject has given explicit consent to the processing of those personal data for one or more specified purposes, except where Union or Member State law provide that the prohibition referred to in paragraph 1 may not be lifted by the data subject (Art. 9 (2) (a) GDPR);
• processing is necessary for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes in accordance with Article 89 (1) based on Union or Member State law which shall be proportionate to the aim pursued, respect the essence of the right to data protection and provide for suitable and specific measures to safeguard the fundamental rights and the interests of the data subject (Art. 9 (2) (j) GDPR);
• processing relates to personal data which are manifestly made public by the data subject (Art. 9 (2) (e) GDPR).39

The relevance of Recital 50 has been discussed in section 5.1.4.2.

3 Principle of data minimization:

“Personal data shall be adequate, relevant and limited to what is necessary in relation to the purposes for which they are processed” (Art. 5 (1) (c) GDPR). The principle of data minimization is an expression of the necessity principle, which is an essential element of the proportionality principle.40

As a reflection of that principle, data controllers are obliged to put in place appropriate technical and organizational measures designed to implement the principle of data minimization (Art. 25 (1) GDPR).
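By way of illustration only, one simple technical measure of this kind is to retain nothing but the fields that a given research purpose requires. The following Python sketch assumes a hypothetical record layout and a hypothetical list of needed fields; neither is prescribed by the GDPR or taken from SoBigData tooling.

```python
from typing import Any, Dict, Set


def minimise(record: Dict[str, Any], needed_fields: Set[str]) -> Dict[str, Any]:
    """Return a copy of `record` that keeps only the fields listed in
    `needed_fields` (hypothetical names chosen by the controller)."""
    return {key: value for key, value in record.items() if key in needed_fields}


# Hypothetical example: the research purpose only needs region and age band.
raw = {"name": "Jane Doe", "email": "jane@example.org",
       "region": "NL", "age_band": "30-39"}
reduced = minimise(raw, {"region", "age_band"})
```

Whether the remaining fields still allow identification has to be assessed separately; dropping direct identifiers does not by itself make data anonymous.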

4 Principle of storage limitation:

Personal data shall be kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the personal data are processed. Personal data may be stored for longer periods insofar as the personal data will be processed for scientific or historical research purposes or archiving purposes in the public interest in accordance with Article 89 (1) subject to implementation of the appropriate technical and organisational measures required by this Regulation in order to safeguard the rights and freedoms of the data subject (Art. 5 (1) (e) GDPR). If the scientific research purpose can also be achieved with anonymous data, the data must be anonymized (Art. 89 (1) GDPR).

37 Art. 29 Working Party, Opinion 06/2014 on the notion of legitimate interests of the data controller under Article 7 of Directive 95/46/EC, p. 35.
38 Art. 29 Working Party, Opinion 06/2014 on the notion of legitimate interests of the data controller under Article 7 of Directive 95/46/EC, pp. 36-40.
39 See for further restrictions D 2.2., point 2.3.4.2.
40 Albrecht, J. P. and Jotzo, F., Das neue Datenschutzrecht der EU, 2017, p. 40.

In order to ensure that the personal data are not kept longer than necessary, time limits should be established by the data controller for erasure or for a periodic review (Recital 39 GDPR).
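A minimal sketch of how such time limits could be tracked is given below. It assumes that the controller has assigned each stored record a retention period in days; the class, field and function names are hypothetical, and the appropriate period itself depends on the research purpose and is not something the code can determine.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass
class StoredRecord:
    record_id: str
    collected_on: date
    retention_days: int  # hypothetical limit set by the data controller

    def review_due(self, today: date) -> bool:
        """True once the retention period set for this record has elapsed."""
        return today >= self.collected_on + timedelta(days=self.retention_days)


def records_due_for_review(records: List[StoredRecord], today: date) -> List[str]:
    """Identifiers of records that should now be erased, anonymised or reviewed."""
    return [r.record_id for r in records if r.review_due(today)]
```

Running such a check periodically is one way of implementing the "periodic review" option; what happens to a flagged record (erasure, anonymisation or a documented extension) remains a decision for the controller.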

5 Principle of accuracy:

Personal data shall be accurate and, where necessary, kept up to date; every reasonable step must be taken to ensure that personal data that are inaccurate, having regard to the purposes for which they are processed, are erased or rectified without delay (Art. 5 (1) (d) GDPR).

From this follows in principle that researchers must ensure that their research data is accurate.

Whether the personal data must be up to date depends on the specific research purpose.41 Researchers do not need to ensure that the personal data is kept up to date, if the research is based on information representing a definitive time frame.42 Researchers are only to a certain degree obliged to erase or rectify inaccurate data. They only have to undertake reasonable steps – considering the purpose for which the data are processed – to erase or rectify inaccurate data without delay.

The principle of accuracy is of particular importance if the data subject is affected by the processing. If decisions affecting the individual are based on the personal data, or if data are transferred to third parties, the data controller has to increase its efforts to correct or, where necessary, update inaccurate or outdated personal data.43

The required accuracy of the data also depends on the purpose. Where data are only processed in an aggregated form, e.g. the weight of a person is only processed in steps of 5 kg, it is irrelevant whether the person weighs 72 or 74 kg.44 However, in cases where the exact weight is not relevant, data controllers are required by the data minimization principle to de-identify the data further.
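The weight example can be sketched in a few lines of Python. The function name and the choice of half-open bands are assumptions made for illustration; the point is only that 72 kg and 74 kg fall into the same 5 kg band, so the exact value need not be retained.

```python
from typing import Tuple


def weight_band(weight_kg: float, step_kg: int = 5) -> Tuple[int, int]:
    """Map an exact weight to the band [lower, upper) of width `step_kg`."""
    lower = int(weight_kg // step_kg) * step_kg
    return (lower, lower + step_kg)


# Both values from the example above end up in the same band.
band_72 = weight_band(72)
band_74 = weight_band(74)
```

Storing only the band instead of the exact value is a form of generalisation; whether it is sufficient to prevent re-identification depends on the other attributes held about the person.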

Rectification is not always appropriate where data prove to be wrong at a later stage. Where data relate to a specific procedure, rectification may distort their meaning. For example, a court record reflects what a person said in a court session, irrespective of whether the statement was true.45 In the case of archiving, it must be considered that, even if information proves to be wrong, the purpose of archiving is to show what data were available at a certain time.

41 Feiler, L. and Forgó, N., EU-DSGVO Kurzkommentar, 2017, p. 90.
42 Albrecht, J. P. and Jotzo, F., Das neue Datenschutzrecht der EU, 2017, p. 53; University of Edinburgh, Researcher’s guide to the data protection principles, http://www.ed.ac.uk/records-management/data-protection/guidance-policies/research/act/guide-principles.
43 Schantz, P., in: Wolff, A. and Brink, S. (eds.), Beck Online Kommentar Datenschutzrecht, Art. 5, margin number 28.
44 Herbst, T., in: Kühling, J. and Buchner, B. (eds.), Datenschutz-Grundverordnung Kommentar, 2017, p. 209.

6 Principle of integrity and confidentiality:

Personal data shall be processed in a manner that ensures appropriate security of the personal data, including protection against unauthorized or unlawful processing and against accidental loss, destruction or damage, using appropriate technical or organisational measures (Art. 5 (1) (f) GDPR).

Art. 32 GDPR states: “(1) Taking into account the state of the art, the costs of implementation and the nature, scope, context and purposes of processing as well as the risk of varying likelihood and severity for the rights and freedoms of natural persons, the controller and the processor shall implement appropriate technical and organisational measures to ensure a level of security appropriate to the risk, including inter alia as appropriate:

(a) the pseudonymisation and encryption of personal data;
(b) the ability to ensure the ongoing confidentiality, integrity, availability and resilience of processing systems and services;
(c) the ability to restore the availability and access to personal data in a timely manner in the event of a physical or technical incident;
(d) a process for regularly testing, assessing and evaluating the effectiveness of technical and organisational measures for ensuring the security of the processing.

2. In assessing the appropriate level of security account shall be taken in particular of the risks that are presented by processing, in particular from accidental or unlawful destruction, loss, alteration, unauthorised disclosure of, or access to personal data transmitted, stored or otherwise processed.

3. Adherence to an approved code of conduct as referred to in Article 40 or an approved certification mechanism as referred to in Article 42 may be used as an element by which to demonstrate compliance with the requirements set out in paragraph 1 of this Article.

4. The controller and processor shall take steps to ensure that any natural person acting under the authority of the controller or the processor who has access to personal data does not process them except on instructions from the controller, unless he or she is required to do so by Union or Member State law.”

45 Schantz, P., in: Wolff, A. and Brink, S. (eds.), Beck Online Kommentar Datenschutzrecht, Art. 5, margin number 31.
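Point (a) of Art. 32 (1) GDPR names pseudonymisation as one possible measure. A minimal sketch of one common technique, a keyed hash (HMAC-SHA256) from the Python standard library, is shown below. The function names are hypothetical, and the choice of technique is an assumption made for illustration, not a method prescribed by the GDPR. The secret key has to be stored separately from the pseudonymised data, since anyone holding the key can recompute the mapping.

```python
import hashlib
import hmac
import secrets


def generate_key() -> bytes:
    """Create a random 256-bit secret key (to be stored separately from the data)."""
    return secrets.token_bytes(32)


def pseudonymise(identifier: str, key: bytes) -> str:
    """Replace a direct identifier with a keyed hash.

    The same identifier and key always yield the same pseudonym, so records
    belonging to one person can still be linked without revealing the identifier.
    """
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
```

Pseudonymised data remain personal data as long as the controller or a third party can attribute them to a person with the help of additional information such as the key.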

The Annex to Section 9 of the German Federal Data Protection Act46 lists a number of technical and organisational measures to raise the level of security and confidentiality:

“Where personal data are processed or used automatically, the internal organization of authorities or enterprises is to be arranged in such a way that it meets the specific requirements of data protection. In particular, measures suited to the type of personal data or data categories to be protected shall be taken,

1. to prevent unauthorized persons from gaining access to data processing systems with which personal data are processed or used (access control),
2. to prevent data processing systems from being used without authorization (access control),
3. to ensure that persons entitled to use a data processing system have access only to the data to which they have a right of access, and that personal data cannot be read, copied, modified or removed without authorization in the course of processing or use and after storage (access control),
4. to ensure that personal data cannot be read, copied, modified or removed without authorization during electronic transmission or transport, and that it is possible to check and establish to which bodies the transfer of personal data by means of data transmission facilities is envisaged (transmission control),
5. to ensure that it is possible to check and establish whether and by whom personal data have been input into data processing systems, modified or removed (input control),
6. to ensure that, in the case of commissioned processing of personal data, the data are processed strictly in accordance with the instructions of the principal (job control),
7. to ensure that personal data are protected from accidental destruction or loss (availability control),
8. to ensure that data collected for different purposes can be processed separately.

One measure in accordance with the second sentence Nos. 2 to 4 is in particular the use of the latest encryption procedures.”

Section 64 (3) of the German DPA (draft)47 states (unofficial translation from the German):

“In the case of automated processing, the controller and the processor shall, following a risk assessment, take measures that aim at the following:

1. denying unauthorised persons access to the processing equipment with which the processing is carried out (equipment access control),
2. preventing the unauthorised reading, copying, modification or erasure of data media (data media control),
3. preventing the unauthorised input of personal data and the unauthorised inspection, modification and erasure of stored personal data (storage control),
4. preventing the use of automated processing systems by unauthorised persons by means of data communication equipment (user control),
5. ensuring that persons authorised to use an automated processing system have access only to the personal data covered by their access authorisation (data access control),
6. ensuring that it is possible to verify and establish to which bodies personal data have been or may be transmitted or made available by means of data communication equipment (transmission control),
7. ensuring that it is subsequently possible to verify and establish which personal data have been input into or modified in automated processing systems, at what time and by whom (input control),
8. ensuring that the confidentiality and integrity of the data are protected during the transmission of personal data and during the transport of data media (transport control),
9. ensuring that the systems used can be restored in the event of a malfunction (recoverability),
10. ensuring that all functions of the system are available and that any malfunctions that occur are reported (reliability),
11. ensuring that stored personal data cannot be corrupted by malfunctions of the system (data integrity),
12. ensuring that personal data processed on behalf of a principal can only be processed in accordance with the instructions of the principal (job control),
13. ensuring that personal data are protected against destruction or loss (availability control),
14. ensuring that personal data collected for different purposes can be processed separately (separability).

A purpose under sentence 1 nos. 2 to 5 can be achieved in particular by using state-of-the-art encryption procedures.”

46 Federal Data Protection Act in the version promulgated on 14 January 2003 (Federal Law Gazette I p. 66), as most recently amended by Article 1 of the Act of 14 August 2009 (Federal Law Gazette I p. 2814). The FDPA is the current applicable law implementing the Data Protection Directive.
47 Referentenentwurf des Bundesministeriums des Innern, Entwurf eines Gesetzes zur Anpassung des Datenschutzrechts an die Verordnung (EU) 2016/679 und zur Umsetzung der Richtlinie 2016/680 (Datenschutz-Anpassungs- und -Umsetzungsgesetz EU – DSAnpUG-EU), https://www.datenschutzbeauftragter-online.de/wp-content/uploads/2017/01/DSAnpUG-EU-Entwurf-Kabinett.pdf; section 27 of the draft contains regulations implementing Art. 9 (2) (j) GDPR which also refers to Art. 89 (1) GDPR.

7. Fair and transparent processing:

Personal data shall be processed fairly and in a transparent manner (Art. 5 (1) (a) GDPR). The principle of fairness and transparency is intended to make it traceable for the data subject who is processing his or her personal data and for what purpose.48

“It should be transparent to natural persons that personal data concerning them are collected, used, consulted or otherwise processed and to what extent the personal data are or will be processed. The principle of transparency requires that any information and communication relating to the processing of those personal data be easily accessible and easy to understand, and that clear and plain language be used. That principle concerns, in particular, information to the data subjects on the identity of the controller and the purposes of the processing and further information to ensure fair and transparent processing in respect of the natural persons concerned and their right to obtain confirmation and communication of personal data concerning them which are being processed. Natural persons should be made aware of risks, rules, safeguards and rights in relation to the processing of personal data and how to exercise their rights in relation to such processing“ (Recital 39 GDPR).

In particular, the information obligations (Art. 13 and 14 GDPR) and the right of access (Art. 15 GDPR) are meant to ensure that personal data are processed transparently.49 Art. 12 GDPR regulates the manner in which the information obligations and the right of access have to be fulfilled.

The law provides exemptions from data subjects' rights to information and access in Art. 14 (5) and 89 (2) GDPR. Art. 14 (5) (b) GDPR states that information duties do not apply where and insofar as: “the provision of such information proves impossible or would involve a disproportionate effort, in particular for processing for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes, subject to the conditions and safeguards referred to in Article 89 (1) or in so far as the obligation referred to in paragraph 1 of this Article is likely to render impossible or seriously impair the achievement of the objectives of that processing. In such cases the controller shall take appropriate measures to protect the data subject's rights and freedoms and legitimate interests, including making the information publicly available.”

Art. 11 GDPR also needs to be considered:

“1. If the purposes for which a controller processes personal data do not or do no longer require the identification of a data subject by the controller, the controller shall not be obliged to maintain, acquire or process additional information in order to identify the data subject for the sole purpose of complying with this Regulation.

2. Where, in cases referred to in paragraph 1 of this Article, the controller is able to demonstrate that it is not in a position to identify the data subject, the controller shall inform the data subject accordingly, if possible. In such cases, Articles 15 to 20 shall not apply except where the data subject, for the purpose of exercising his or her rights under those articles, provides additional information enabling his or her identification.”

According to Art. 11 GDPR and the principle of data minimisation, data controllers are required to obtain and hold only the personal data that they need in order to pursue their research purpose. If the researcher does not hold data that allow the data subject to be identified, an exemption from the following information obligations and data subject's rights applies.50

48 Albrecht, J. P. and Jotzo, F., Das neue Datenschutzrecht der EU, 2017, p. 51.
49 Ibid.

Arrangements to comply with the rights of data subjects/ exceptions that apply

7a. The information data controllers need to provide where the personal data have not been obtained from the data subject51:

Art. 14 (1) GDPR requires the controller to inform the data subject about:

• the identity and the contact details of the controller and, where applicable, of the controller's representative;
• the contact details of the data protection officer, where applicable;
• the purposes of the processing for which the personal data are intended as well as the legal basis for the processing;
• the categories of personal data concerned;
• the recipients or categories of recipients of the personal data, if any;
• where applicable, that the controller intends to transfer personal data to a recipient in a third country or international organisation and the existence or absence of an adequacy decision by the Commission, or in the case of transfers referred to in Article 46 or 47, or the second subparagraph of Article 49 (1), reference to the appropriate or suitable safeguards and the means to obtain a copy of them or where they have been made available.

50 Article 11 (2) GDPR only refers to Art. 15-20 GDPR, but it is argued that it needs to be extended also to the information rights in Art. 13 and 14 GDPR (Gola, P., in: Gola, P. (ed.), Datenschutz-Grundverordnung VO (EU) 2016/679, 2017, p. 313); however, Art. 11 (1) GDPR does refer to the Regulation as a whole and so does Recital 57 GDPR when it refers to “any provision of this Regulation”.
51 Where data are collected from the data subject, Art. 13 GDPR applies. The information obligations differ only slightly. Most relevant here is that Art. 13 GDPR does not require information about the categories of personal data concerned or about data sources. The information must be provided at the time when the data are obtained.

Art. 14 (2) GDPR states: The controller shall provide the following information necessary to ensure fair and transparent processing in respect of the data subject:

• the period for which the personal data will be stored, or if that is not possible, the criteria used to determine that period;
• where the processing is based on point (f) of Article 6 (1), the legitimate interests pursued by the controller or by a third party;
• the existence of the right to request from the controller access to and rectification or erasure of personal data or restriction of processing concerning the data subject and to object to processing as well as the right to data portability;
• where processing is based on point (a) of Article 6 (1) or point (a) of Article 9 (2), the existence of the right to withdraw consent at any time, without affecting the lawfulness of processing based on consent before its withdrawal;
• the right to lodge a complaint with a supervisory authority;
• from which source the personal data originate, and if applicable, whether it came from publicly accessible sources;
• the existence of automated decision-making, including profiling, referred to in Article 22 (1) and (4) and, at least in those cases, meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing for the data subject.

The controller shall provide the information (Art. 14 (3) GDPR):

• within a reasonable period after obtaining the personal data, but at the latest within one month, having regard to the specific circumstances in which the personal data are processed;
• if the personal data are to be used for communication with the data subject, at the latest at the time of the first communication to that data subject; or
• if a disclosure to another recipient is envisaged, at the latest when the personal data are first disclosed.
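The three time limits listed above can be sketched as a small date calculation. The sketch rests on two assumptions that are not stated in the text: that the earliest applicable limit is the one to observe, and that "one month" is counted as one calendar month, shortened to the last day of a shorter month. The function names are hypothetical, and the result is an outer limit only, since the information is in any case due "within a reasonable period".

```python
import calendar
from datetime import date
from typing import Optional


def add_one_month(d: date) -> date:
    """Same day in the following month, clamped to that month's last day."""
    year = d.year + 1 if d.month == 12 else d.year
    month = 1 if d.month == 12 else d.month + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def information_deadline(obtained_on: date,
                         first_communication: Optional[date] = None,
                         first_disclosure: Optional[date] = None) -> date:
    """Latest date for informing the data subject (assumption: earliest limit applies)."""
    limits = [add_one_month(obtained_on)]
    if first_communication is not None:
        limits.append(first_communication)
    if first_disclosure is not None:
        limits.append(first_disclosure)
    return min(limits)
```

For data obtained on 1 September 2017 with a first communication planned for 10 September 2017, the sketch returns 10 September 2017 rather than 1 October 2017.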

Art. 14 (4) GDPR adds: “Where the controller intends to further process the personal data for a purpose other than that for which the personal data were obtained, the controller shall provide the data subject prior to that further processing with information on that other purpose and with any relevant further information as referred to in paragraph 2.”

Paragraphs 1 to 4 of Art. 14 GDPR shall not apply insofar as “the provision of such information proves impossible or would involve a disproportionate effort, in particular for processing for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes, subject to the conditions and safeguards referred to in Article 89(1) or in so far as the obligation referred to in paragraph 1 of this Article is likely to render impossible or seriously impair the achievement of the objectives of that processing. In such cases the controller shall take appropriate measures to protect the data subject's rights and freedoms and legitimate interests, including making the information publicly available” (Art. 14 (5) (b) GDPR).52 According to Recital 62 GDPR, disproportionate effort could in particular be the case where processing is carried out for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes. In that regard, the number of data subjects, the age of the data and any appropriate safeguards adopted should be taken into consideration.

7b. Give access to the data:

According to Art. 15 GDPR, the data subject shall have the right to obtain from the controller confirmation as to whether or not personal data concerning him or her are being processed, and, where that is the case, access to the personal data and the following information:

• the purposes of the processing;
• the categories of personal data concerned;
• the recipients or categories of recipient to whom the personal data have been or will be disclosed, in particular recipients in third countries or international organisations;
• where possible, the envisaged period for which the personal data will be stored, or, if not possible, the criteria used to determine that period;
• the existence of the right to request from the controller rectification or erasure of personal data or restriction of processing of personal data concerning the data subject or to object to such processing;
• the right to lodge a complaint with a supervisory authority;
• where the personal data are not collected from the data subject, any available information as to their source;
• the existence of automated decision-making, including profiling, referred to in Article 22(1) and (4) and, at least in those cases, meaningful information about the logic involved, as well as the significance and the envisaged consequences of such processing for the data subject.

2. Where personal data are transferred to a third country or to an international organisation, the data subject shall have the right to be informed of the appropriate safeguards pursuant to Article 46 relating to the transfer.

3. The controller shall provide a copy of the personal data undergoing processing. For any further copies requested by the data subject, the controller may charge a reasonable fee based on administrative costs. Where the data subject makes the request by electronic means, and unless otherwise requested by the data subject, the information shall be provided in a commonly used electronic form.

Where personal data are processed for scientific or historical research purposes or statistical purposes, Union or Member State law may provide for derogations from the rights referred to in Articles 15, 16, 18 and 21, subject to the conditions and safeguards referred to in Art. 89 (1) GDPR, in so far as such rights are likely to render impossible or seriously impair the achievement of the specific purposes, and such derogations are necessary for the fulfilment of those purposes (Art. 89 (2) GDPR). Examples of such derogations can be found in the German DPA (draft):

52 The same rationale could be considered to apply if Art. 11 (1) GDPR applies.

Exception in section 27 (2) of the German DPA (draft)53 (unofficial translation from the German):

“The rights of the data subject provided for in Articles 15, 16, 18 and 21 of Regulation (EU) 2016/679 shall be restricted to the extent that these rights are likely to render impossible or seriously impair the achievement of the research or statistical purposes and the restriction is necessary for the fulfilment of the research or statistical purposes. Furthermore, the right of access under Article 15 of Regulation (EU) 2016/679 shall not apply if the data are necessary for purposes of scientific research and providing the information would involve a disproportionate effort.”

Exception in section 28 (2) of the German DPA (draft)54 (unofficial translation from the German):

“The data subject's right of access under Article 15 of Regulation (EU) 2016/679 shall not apply if the archival material is not indexed by the person's name or if no information is provided that would enable the archival material concerned to be located with reasonable administrative effort.”

7c. Right to rectification:

According to Art. 16 GDPR, the data subject shall have the right to obtain from the controller without undue delay the rectification of inaccurate personal data concerning him or her. Taking into account the purposes of the processing, the data subject shall have the right to have incomplete personal data completed, including by means of providing a supplementary statement.

Where personal data are processed for scientific or historical research purposes or statistical purposes, Union or Member State law may provide for derogations from the rights referred to in Articles 15, 16, 18 and 21, subject to the conditions and safeguards referred to in Art. 89 (1) GDPR, in so far as such rights are likely to render impossible or seriously impair the achievement of the specific purposes, and such derogations are necessary for the fulfilment of those purposes (Art. 89 (2) GDPR). Examples of such derogations can be found in the German DPA (draft):

Exception in section 27 (2) of the German DPA (draft)55 (unofficial translation from the German):

“The rights of the data subject provided for in Articles 15, 16, 18 and 21 of Regulation (EU) 2016/679 shall be restricted to the extent that these rights are likely to render impossible or seriously impair the achievement of the research or statistical purposes and the restriction is necessary for the fulfilment of the research or statistical purposes.”

Exception in section 28 (3) of the German DPA (draft)56 (unofficial translation from the German):

“The data subject's right to rectification under Article 16 of Regulation (EU) 2016/679 shall not apply if the personal data are processed for archiving purposes in the public interest. If the data subject disputes the accuracy of the personal data, he or she shall be given the opportunity to submit a counterstatement. The competent archive shall be obliged to add the counterstatement to the records.”

53 Referentenentwurf des Bundesministeriums des Innern, Entwurf eines Gesetzes zur Anpassung des Datenschutzrechts an die Verordnung (EU) 2016/679 und zur Umsetzung der Richtlinie 2016/680 (Datenschutz-Anpassungs- und -Umsetzungsgesetz EU – DSAnpUG-EU), https://www.datenschutzbeauftragter-online.de/wp-content/uploads/2017/01/DSAnpUG-EU-Entwurf-Kabinett.pdf; section 27 (2) of the draft contains regulations implementing Art. 89 (2) GDPR.
54 Ibid; section 28 (2) of the draft contains regulations implementing Art. 89 (3) GDPR.

7d Right to erasure (right to be forgotten)

According to Art. 17 GDPR, the data subject shall have “the right to obtain from the controller the erasure of personal data concerning him or her without undue delay and the controller shall have the obligation to erase personal data without undue delay where one of the following grounds applies:

(a) the personal data are no longer necessary in relation to the purposes for which they were collected or otherwise processed;
(b) the data subject withdraws consent on which the processing is based according to point (a) of Article 6(1), or point (a) of Article 9(2), and where there is no other legal ground for the processing;
(c) the data subject objects to the processing pursuant to Article 21(1) and there are no overriding legitimate grounds for the processing, or the data subject objects to the processing pursuant to Article 21(2);
(d) the personal data have been unlawfully processed;
(e) the personal data have to be erased for compliance with a legal obligation in Union or Member State law to which the controller is subject;
(f) the personal data have been collected in relation to the offer of information society services referred to in Article 8(1).

2. Where the controller has made the personal data public and is obliged pursuant to paragraph 1 to erase the personal data, the controller, taking account of available technology and the cost of implementation, shall take reasonable steps, including technical measures, to inform controllers which are processing the personal data that the data subject has requested the erasure by such controllers of any links to, or copy or replication of, those personal data.”

Paragraphs 1 and 2 shall inter alia not apply to the extent that processing is necessary for archiving purposes in the public interest, scientific or historical research purposes or statistical purposes in accordance with Article 89(1) in so far as the right referred to in paragraph 1 is likely to render impossible or seriously impair the achievement of the objectives of that processing (Art. 17 (3) (d) GDPR).

55 Ibid; section 27 of the draft contains regulations implementing Art. 9 (2) (j) GDPR which also refers to Art. 89 (1) GDPR.
56 Ibid; section 28 (2) of the draft contains regulations implementing Art. 89 (3) GDPR.

7e Right to object:

According to Art. 21 GDPR: „1. The data subject shall have the right to object, on grounds relating to his or her particular situation, at any time to processing of personal data concerning him or her which is based on point (e) or (f) of Article 6(1), including profiling based on those provisions. The controller shall no longer process the personal data unless the controller demonstrates compelling legitimate grounds for the processing which override the interests, rights and freedoms of the data subject or for the establishment, exercise or defence of legal claims.

[…]

4. At the latest at the time of the first communication with the data subject, the right referred to in paragraphs 1 […] shall be explicitly brought to the attention of the data subject and shall be presented clearly and separately from any other information.

[…]

6. Where personal data are processed for scientific or historical research purposes or statistical purposes pursuant to Article 89(1), the data subject, on grounds relating to his or her particular situation, shall have the right to object to processing of personal data concerning him or her, unless the processing is necessary for the performance of a task carried out for reasons of public interest.“

5.3 RESEARCHER’S CHECKLIST AND ASSOCIATED GUIDE

The checklist is meant to help you, as the data controller within SoBigData, to evaluate whether the envisaged processing of personal data for scientific research purposes or archiving purposes in the public interest complies with data protection law requirements. This includes all processing steps, from collection of the data through storage and analysis to transfer. This list does not cover the transfer of personal data to third countries outside the EU and EEA.

This checklist is for use alongside the Guidance notes on Research and the GDPR. Please refer to the notes for a more detailed explanation of the requirements.


You may choose to keep this form with your project management documentation so that you can show that you have taken into account the requirements of the GDPR.

No | Checklist | Guidance notes on research and the GDPR

0. Are personal data contained in the data set you want to process?

Checklist:

You process data obviously relating to identified persons

or

You are able to re-identify the person(s) to whom the data relates, taking into account all the means reasonably likely to be used by you (possibly also using the knowledge of third parties)

or

You are unsure whether you would be able to re-identify the person(s) to whom the data relates by using reasonable means.

and

In the case of transfer of a data set, you have also considered the possibilities for the transferee/secondary transferees to re-identify the data.

Guidance notes on research and the GDPR:

As a first step you will have to clarify whether the data that you want to use for your research is personal data or non-personal data. Personal data is data that refers to an identified individual or to an identifiable individual. If your dataset contains attributes like the name, date of birth or address, the data subjects are identified and that data set would count as personal data. These are very clear cases; often researchers deal with already de-identified data sets where obvious identifiers have been removed. In these cases you will have to check whether your data relates to an identifiable data subject.

In order to determine whether your data relates to identifiable data subject(s), you will have to consider whether you are able to identify the people to whom the data relate by using reasonable means. In other words: considering the time, effort or resources, the context in which you use the data, the available re-identification technologies and the related costs, would you deem it reasonable for you to re-identify the data subjects? If you know that a third party holds relevant information to re-identify the data, would you be able to get access to that information? What would it cost you? In case the purpose of the processing implies identification of individuals, it will be assumed that you have the means “reasonably likely” to be used and the data is to be regarded as personal data.

In case you are transferring the data to third parties you may also consider the possibilities for the transferee and secondary transferees to re-identify the data.

Anonymization is more and more difficult to achieve with increasing computing capacities and the ubiquitous availability of information. You also need to verify the status of the data regularly as identification risks may increase over time and depend on the development of information and communication technology.

Examples: If your dataset contains attributes like the name, date of birth or address the data subjects are identified and that data set would count as personal data.

Pseudonymised or link-coded data, where the name and other key identifiers have been removed but you hold a separate file containing the information needed to re-identify the data subjects, are also to be regarded as personal data.

In case you have a collection of MAC addresses of people's phones and you have applied a hash function to them to make them anonymous, but you still possess the hash function, the hashed numbers still count as personal data.57
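The MAC address example can be illustrated with a short sketch. The Python below is a scaled-down illustration under stated assumptions, not the procedure of any particular controller: it assumes an unsalted SHA-256 hash, a made-up address, and that the first four bytes of the address are already known, so that only 65,536 candidates have to be tried. The function names `hash_mac` and `recover_mac` are hypothetical.

```python
import hashlib
from typing import Optional


def hash_mac(mac: str) -> str:
    """Unsalted SHA-256 of a MAC address string (the supposed de-identification step)."""
    return hashlib.sha256(mac.lower().encode("ascii")).hexdigest()


def recover_mac(target_hash: str, known_prefix: str) -> Optional[str]:
    """Try every value of the last two bytes and compare hashes.

    Assumption for this sketch: the first four bytes are known.
    Anyone who knows the hash function can run the same search.
    """
    for suffix in range(0x10000):
        candidate = f"{known_prefix}:{suffix >> 8:02x}:{suffix & 0xFF:02x}"
        if hash_mac(candidate) == target_hash:
            return candidate
    return None


original = "00:11:22:33:a4:5b"      # made-up address
stored_value = hash_mac(original)   # what the "anonymized" data set contains
print(recover_mac(stored_value, "00:11:22:33") == original)
```

The full address space is larger than in this sketch, but the principle is the same: whoever controls the hash function can recompute the mapping from candidate addresses to hashes, which is why the hashed values remain personal data.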

57 Examples like this could be explained in more detail, e.g. in the form of a pop-up, especially as it could be questionable whether de-identified data could be re-identified with reasonable means. Hashing, for example, does not lead to anonymization of personal data if the original value can be found on the basis of the hash. Because in many cases the data controller controls the hashing formula, it is in most cases possible to recompute the hash from the original value. Rejo Zenger explains the argument in a blog post about an access request that he made to a company that performed wifi tracking. In it, he explains that it is possible to create a list that translates all possible MAC addresses to their respective hashes in 1.5 hours and for a few dollars by using Amazon cloud computing, even when a rather advanced form of hashing is used; see https://www.netkwesties.nl/885/welk-digitaal-spoor-heeft-citytraffic.htm; see also the Dutch Data Protection Authority, https://autoriteitpersoonsgegevens.nl/nl/nieuws/cbp-wifi-tracking-rond-winkels-strijd-met-de-wet.

Famous instances of re-identification of individuals from “anonymized” data sets using publicly available information are:

• the public NYC taxicab database:

https://www.theguardian.com/technology/2014/jun/27/new-york-taxi-details-anonymised-data-researchers-warn

• the Netflix dataset; Narayanan, A., and Shmatikov, V., Robust de-anonymization of large sparse datasets:

https://www.cs.cornell.edu/~shmat/shmat_oak08netflix.pdf

In case you are unsure whether your data is personal data you can

➢ Get in contact with the data protection officer of your institute

➢ Ask your research ethics committee/ethics board

➢ Read more about the issue:

• ICO, Anonymisation: managing data protection risk code of practice:

https://ico.org.uk/media/for-organisations/documents/1061/anonymisation-code.pdf

• Jisc, Data protection and research data:

https://www.jisc.ac.uk/guides/data-protection-and-research-data

• Art. 29 Working Party on the concept of personal data:

http://ec.europa.eu/justice/data-protection/article-29/documentation/opinion-recommendation/files/2007/wp136_en.pdf

• Art. 29 Working Party on anonymization techniques:

https://cnpd.public.lu/fr/publications/groupe-art29/wp216_en.pdf

• Lubarsky, B., “Re-identification of “anonymized” data”, the Georgetown Law Technology Review, April 2017

https://www.georgetownlawtechreview.org/re-identification-of-anonymized-data/GLTR-04-2017/

In case you are (still) unsure about the status of your data set, treat it as personal data. This means you will have to consider the following points on the checklist.

1. Purpose limitation principle:

Checklist:

You further process the data for scientific research purposes or archiving purposes in the public interest and you are not using the data to support measures or decisions relating to the data subjects and the data are not processed in such a way that damage or distress is likely to be caused to any data subject.

Please list the technical and organisational measures you have in place to protect the rights and freedoms of the data subject

-…………………………………………………………….
-…………………………………………………………….
-…………………………………………………………….

Guidance notes on research and the GDPR:

The purpose limitation principle is a central element of European data protection law. It means on the one hand that when the data is collected for the first time from the data subject, the purpose of the collection must be specified. The purpose also has to be unambiguous and clearly expressed and must not be in contradiction with any legal regulation (e.g. non-discrimination).

Any further processing after the initial collection of the personal data must not be incompatible with the original purpose. Further processing for scientific research or archiving purposes in the public interest has been privileged by the law and is to be seen as compatible with the original purpose. In order to fall under the privileging rule you are allowed to process the personal data only for scientific research purposes or archiving purposes in the public interest. You are not allowed to use the data to support measures or decisions relating to the data subjects. You shall not process the data in such a way that damage or distress is likely to be caused to any data subject.

In particular, when you want to make research data public, you must ensure that the data are effectively anonymized or that you have informed consent. You may also publish data if the publication of personal data is necessary for the representation of research results about contemporary events. Please clarify the individual case with your DPO/ethics committee.

You also need to have appropriate technical and organizational measures in place to protect the rights and freedoms of the data subject. For example, the data should be de-identified as much as the research purpose allows. There are other safeguards to be considered, too, inter alia:

✓ Pseudonymization;
✓ Encryption;
✓ the pseudonymization and encryption keys are also coded/encrypted and stored separately;
✓ access restrictions on a need-to-know basis;
✓ read-only access on controlled premises;
✓ limited disclosure in a secure local environment to properly constituted communities;
✓ contractual agreements setting up confidentiality obligations.

➢ Read more:

• Art. 29 Working Party, Opinion 03/2013 on purpose limitation

http://ec.europa.eu/justice/data-protection/article-29/documentation/opinion-recommendation/files/2013/wp203_en.pdf
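Pseudonymization with a separately stored key, one of the safeguards named in the guidance on purpose limitation, could look roughly like the following Python sketch. It is an illustration only and assumes a keyed hash (HMAC-SHA256) as the pseudonymization technique, which the framework does not prescribe; the record fields and function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def new_key() -> bytes:
    """Generate a random secret key. Keep it apart from the research data."""
    return secrets.token_bytes(32)


def pseudonymize(identifier: str, key: bytes) -> str:
    """Replace a direct identifier with a keyed hash (HMAC-SHA256)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


key = new_key()  # assumption: stored separately, with restricted access
record = {"name": "Jane Example", "answer": 4}  # made-up record
research_record = {
    "pid": pseudonymize(record["name"], key),  # same person always gets the same pid
    "answer": record["answer"],
}
print(research_record)
```

As long as the key exists, the data subjects can be re-identified by whoever holds it, so the pseudonymized records are still to be treated as personal data.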

2. Principle of lawfulness:

Checklist:

The data subject has given explicit informed consent to use his or her data for your research

or

you do not work for a state public institution and you use the data for scientific research purposes and using the data will not cause unwarranted prejudice to the data subject. Please indicate why you expect no damage for the data subject

…………………………………………………………………………….
…………………………………………………………………………….

or

you work for a public institution and the processing is necessary for your scientific research. In addition, it is required that a law determines that your public institution is charged with the task of scientific research.

The relevant law is:

…………………………………………………………………………….

In case you process special categories of personal data you are able to base the processing either on the

explicit consent of the data subject

or

a national regulation that authorizes you to process the data for your research.

For these to apply you generally have to fulfil at least the following conditions:

• Pursue a scientific research purpose;
• The personal data you envisage to process must be necessary for your research (check also No. 3, principle of data minimization);
• Your interest in processing the data for your research outweighs the interest of the data subject in not processing the data;
• You have appropriate technical and organisational measures in place to protect the personal data (check also No. 1, principle of purpose limitation).

I have clarified with my data protection officer/ethics committee what regulation applies and I have to consider the following additional requirements:

…………………………………………………………………………….
…………………………………………………………………………….

Guidance notes on research and the GDPR:

You will have to check whether you can base the envisaged processing of personal data on a legal basis. This can be, for example, the explicit informed consent of the data subject.58

In the event you have no informed consent from the data subject legitimising your research you may still use the data if:

• You do not work for a public institution,
• processing is necessary for your scientific research and
• the use of the data does not cause unwarranted prejudice to the data subject.

In case you have no informed consent from the data subject legitimising your research and you work for a public institution you may still use the data if:

• Processing is necessary for research purposes and
• a Union or Member State law to which the public institution you are working for is subject establishes scientific research as a task to be carried out in the public interest.

Please clarify with your data protection officer/ethics committee what regulation applies in your case.

In case you are processing special categories of data, which are data revealing one or more of the following aspects:

o racial or ethnic origin,
o political opinions,
o religious or philosophical beliefs,
o trade union membership,
o genetic data, biometric data for the purpose of uniquely identifying a natural person,
o data concerning health,
o data concerning a natural person's sex life or sexual orientation;

you will have to be able to base the processing of such personal data either on

• explicit informed consent

or

• a national regulation authorizing the processing for research purposes.

Please get in touch with your data protection officer to figure out the respective law and requirements of your country. Most likely these will include:

• You will have to process the personal data for a scientific research purpose;
• The personal data you envisage to process must be necessary for your research (check also No. 3, principle of data minimization);
• Your interest in processing the data for your research outweighs the interest of the data subject in not processing the data;
• You have appropriate technical and organisational measures in place to protect the personal data (check also No. 1, principle of purpose limitation).

➢ Read more:

• Jisc, Data protection and research data:

https://www.jisc.ac.uk/guides/data-protection-and-research-data

• Art. 29 Working Party, Opinion 06/2014 on the notion of legitimate interests of the data controller under Article 7 of Directive 95/46/EC

http://ec.europa.eu/justice/data-protection/article-29/documentation/opinion-recommendation/files/2014/wp217_en.pdf

• Art. 29 Working Party, Opinion 15/2011 on the definition of consent

http://ec.europa.eu/justice/policies/privacy/docs/wpdocs/2011/wp187_en.pdf

58 Where informed consent is the basis for further processing, a pop-up frame with the following information will be available in order to enable the data controller to check the legal requirements for valid informed consent:

For a valid informed consent you need to inform the data subject about the following aspects:

• Who you are and your institutional affiliation
• Who, if anyone, is funding the research
• What kind of data you are asking the participant to provide
• That provision of the data for your research is voluntary and that participants can withdraw their consent at any time and what effect that has on the use of already provided data
• How you plan to store the data and what technical and organisational measures you have in place to protect the data
• What you will do with the data and what is the purpose of your research
• If you are planning to publish data, give details about how they will be used in any subsequent publication
• What will happen to the data after the research project has ended? Will you destroy the data or securely archive them?

You should ask individuals to sign a consent form. You should keep the consent form for as long as you keep the personal data. This will enable you to prove at a later stage, if required, that the individual gave his or her informed consent.

3. Principle of data minimization:

Checklist:

I only hold and use that amount of data I need for my research. I have de-identified the data as much as the research purpose allows it.

Guidance notes on research and the GDPR:

You shall only collect and keep those personal data that you need for scientific research/archiving purposes. For example, if you do not need the date of birth of the data subjects for your research but only the year of birth, collect and hold only the year of birth. If age is irrelevant for your research do not collect or hold any information relating to the age of the data subjects. The principle also requires that you de-identify the data as much as the scientific research purpose allows it.
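The year-of-birth example can be sketched in a few lines of Python. This is an illustration only; the field names (`date_of_birth`, `year_of_birth`) and the helper `minimize` are hypothetical and would have to be adapted to your own data set.

```python
from datetime import date


def minimize(record: dict, needs_age: bool) -> dict:
    """Drop the full date of birth; keep the year only if the research needs it."""
    reduced = {k: v for k, v in record.items() if k != "date_of_birth"}
    if needs_age:
        reduced["year_of_birth"] = record["date_of_birth"].year
    return reduced


raw = {"participant": "P-001", "date_of_birth": date(1984, 5, 17), "score": 12}  # made-up record
print(minimize(raw, needs_age=True))   # year of birth only
print(minimize(raw, needs_age=False))  # no age information at all
```

Applying such a reduction at the point of collection, rather than later, avoids holding the full date of birth at any time.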

➢ Read more:

• Jisc, Data Protection and research data,

https://www.jisc.ac.uk/guides/data-protection-and-research-data

4. Principle of storage limitation:

Checklist:

I erase or anonymize the data effectively if I do not need the data anymore. In case I store the data longer for evidence purposes or because I want to use the data for further research, appropriate technical and organisational measures are in place to protect the data. I only store the data in a personal form if it is required.

Guidance notes on research and the GDPR:

You shall keep personal data only for as long as necessary. If you do not need the data anymore and there are no other legal obligations to keep the data, the personal data shall be effectively erased or anonymized. In order to ensure that the personal data are not kept longer than necessary, time limits should be established by the controller for erasure or for a periodic review. You may store the personal data longer insofar as the personal data will be processed for scientific or historical research purposes or archiving purposes in the public interest. In that case you will have to ensure that you have appropriate technical and organisational measures in place to protect the data (see elaborations on purpose limitation principle, No. 1). In case the scientific research purpose could also be achieved with anonymous data, the data must be anonymized.
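A time limit for erasure or periodic review can be tracked with a very simple rule. The Python sketch below is only an illustration: the review interval is a hypothetical value chosen by the controller, not a period set by the GDPR, and the function name is made up.

```python
from datetime import date, timedelta


def retention_action(collected_on: date, today: date,
                     review_after_days: int, still_needed: bool) -> str:
    """Decide what to do with a data set under a controller-defined time limit."""
    if not still_needed:
        return "erase or anonymize"
    if today >= collected_on + timedelta(days=review_after_days):
        return "review"  # time limit reached: check whether the data are still needed
    return "keep"


# Hypothetical example: data collected in 2017, reviewed once a year.
print(retention_action(date(2017, 1, 10), date(2018, 3, 1),
                       review_after_days=365, still_needed=True))
```

Running such a check on a schedule and recording its outcome also gives you evidence that the time limits were actually applied.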

➢ Read more:

• Jisc, Data Protection and research data,

https://www.jisc.ac.uk/guides/data-protection-and-research-data

5. Principle of accuracy:

Checklist:

The research data I hold is accurate and up to date. I have put processes in place to keep the data accurate such as

…………………………………………………………………………….

or

my research is based on information representing a definitive time frame because

…………………………………………………………………………….

Guidance notes on research and the GDPR:

In principle, researchers must ensure that their research data is accurate. Whether the personal data must be up to date depends on the specific research purpose.

Researchers do not need to ensure that the personal data is kept up to date if the research is based on information representing a definitive time frame.

Researchers are obliged to erase or rectify inaccurate data only to a certain degree. They only have to take reasonable steps, considering the purpose for which the data are processed, to erase or rectify inaccurate data without delay.

You need to invest increased effort if data are transferred to third parties. In case data will only be processed in an aggregated form, e.g. the weight of a person is only processed in steps of 5 kg, it is irrelevant whether the person weighs 72 or 74 kg. However, in these cases where the exact weight is not of relevance, you are required by the data minimization principle to de-identify the data further.
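The weight example corresponds to a simple generalization step, sketched here in Python. The function name `weight_band` and the default step are assumptions for illustration; the only point is that 72 kg and 74 kg end up in the same band, so only the band needs to be stored.

```python
from typing import Tuple


def weight_band(weight_kg: float, step_kg: int = 5) -> Tuple[int, int]:
    """Return the [lower, upper) band of width step_kg that contains the weight."""
    lower = int(weight_kg // step_kg) * step_kg
    return lower, lower + step_kg


print(weight_band(72) == weight_band(74))  # both fall into the 70 to 75 kg band
```

Storing the band instead of the measured value serves accuracy and data minimization at the same time: small measurement errors no longer matter, and the record is less identifying.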

Rectification of data is not appropriate in all cases if the data proves wrong at a later stage. In case data relate to a specific procedure, rectification may distort its meaningfulness. For example, a protocol reflects what the person has said in a court session, independently of the truth of the statement. In the event of archiving it may be considered that, even if information proves to be wrong, the purpose of archiving is to present what data was available at a certain time.

➢ Read more:

• Jisc, Data Protection and research data,

https://www.jisc.ac.uk/guides/data-protection-and-research-data

6. Principle of security and confidentiality:

Checklist:

Please list the technical and organizational measures that are implemented. Reason why the security and confidentiality level is reasonably high:

-…………………………………………………………….
-…………………………………………………………….
-…………………………………………………………….

……………………………………………………………...
……………………………………………………………...

Guidance notes on research and the GDPR:

Personal data should be processed in a manner that ensures appropriate security and confidentiality of the personal data. You need to have technical and organisational measures in place that prevent

• unauthorised access to and processing of the data
• unlawful processing
• accidental loss
• destruction or damage or use of personal data and the equipment used for the processing.

In particular, you should implement measures:

✓ to prevent unauthorized persons from gaining access to data processing systems with which personal data are processed or used,
✓ to prevent data processing systems from being used without authorization,
✓ to ensure that persons entitled to use a data processing system have access only to the data to which they have a right of access, and that personal data cannot be read, copied, modified or removed without authorization in the course of processing or use and after storage,
✓ to ensure that personal data cannot be read, copied, modified or removed without authorization during electronic transmission or transport, and that it is possible to check and establish to which bodies the transfer of personal data by means of data transmission facilities is envisaged,
✓ to ensure that it is possible to check and establish whether and by whom personal data have been input into data processing systems, modified or removed,
✓ to ensure that, in the case of commissioned processing of personal data, the data are processed strictly in accordance with the instructions of the principal (job control),
✓ to ensure that personal data are protected from accidental destruction or loss (availability control),
✓ to ensure that data collected for different purposes can be processed separately,
✓ to encrypt your data using state-of-the-art technology.

7. Principle of fairness and transparency:

Checklist:

I do not hold information that enables me to contact the data subject. Therefore, I do not have to consider the following information obligations/data subject’s rights, but I have made the information about my research and the use of the data public (not the actual data set!).

or

I do hold information that allows me to identify and contact the data subjects and go on with 7a-d.

Guidance notes on research and the GDPR:

The principle of fairness and transparency is meant to make it traceable for the data subject concerned who is processing his or her personal data and for what purpose. The transparency principle is reflected in the information obligations of the data controller and the data subject’s rights (e.g. right of access to their personal data). The law, however, restricts these obligations and data subjects’ rights to promote the principle of data minimization. According to the principle of data minimisation, data controllers are required to obtain and hold only the personal data that they need in order to pursue their research purpose (No. 3). If the researcher does not hold data that allows the data subject to be identified, an exemption from the following information obligations/data subjects’ rights applies. You need to verify whether the information you hold allows you to identify the person. This would be the case if you still hold the name and contact data, e.g. the postal address or e-mail address. In case you do not hold contact data you should, however, make the information about your research and the use of the data public (not the actual data set!). Please use the criteria below.

7a. Information duties:

Checklist:

I have informed the data subjects.

or

informing the data subjects is impossible or involves disproportionate effort and I have made the information about my research and the use of the data public (not the actual data set!).

Guidance notes on research and the GDPR:

The information data controllers need to provide where the personal data have not been obtained from the data subject:

• the identity and the contact details of the controller and, where applicable, of the controller's representative;
• the contact details of the data protection officer, where applicable;
• the purposes of the processing for which the personal data are intended as well as the legal basis for the processing;
• the categories of personal data concerned;
• the recipients or categories of recipients of the personal data, if any;
• the period for which the personal data will be stored, or if that is not possible, the criteria used to determine that period;
• the existence of the right to request from the controller access to and rectification or erasure of personal data or restriction of processing concerning the data subject and to object to processing as well as the right to data portability;
• where processing is based on point (a) of Article 6 (1) or point (a) of Article 9 (2), the existence of the right to withdraw consent at any time, without affecting the lawfulness of processing based on consent before its withdrawal;
• the right to lodge a complaint with a supervisory authority;
• from which source the personal data originate, and if applicable, whether it came from publicly accessible sources.

You will need to inform the data subject within a month at the latest. If you envisage disclosing the data to a third party, you must inform the data subject at the latest when the data are first disclosed.

In case you change the purpose for the processing you have to inform the data subject prior to that further processing about the new purpose and other points mentioned above that are influenced by the new purpose, e.g. the time of storage.

You do not have to inform the data subjects individually if the provision of such information proves impossible or would involve a disproportionate effort. In such cases you will have to make information about you/your institute and your research and what kind of data you use (not the actual data set!) publicly available, e.g. on the website of your research institute. Please use the criteria above.

In case data are collected from the data subject the information obligations only slightly differ. Most relevant here is that you are not required to inform about the categories of personal data concerned and about data sources. The information must be provided at the time when the data are obtained.

7b. Right of access:

Checklist:

I would be able to provide a copy of the data relating to the data subject and the relevant information regarding the use of the personal data

or

I would not be able to provide a copy of the data relating to the data subject and the relevant information regarding the use of the personal data as it would require disproportionate effort. I will contact the data protection officer or research ethics committee in the event of a request.

Guidance notes on research and the GDPR:

The data subject has the right to obtain from the controller confirmation as to whether or not personal data concerning him or her are being processed, and, where that is the case, access to the personal data and the following information:

• the purposes of the processing;
• the categories of personal data concerned;
• the recipients or categories of recipient to whom the personal data have been or will be disclosed;
• where possible, the envisaged period for which the personal data will be stored, or, if not possible, the criteria used to determine that period;
• the existence of the right to request from the controller rectification or erasure of personal data or restriction of processing of personal data concerning the data subject or to object to such processing;
• the right to lodge a complaint with a supervisory authority;
• where the personal data are not collected from the data subject, any available information as to their source.

The controller shall provide a copy of the personal data undergoing processing if requested.

Member State laws can provide derogations from the right of access. For example, if complying with the request would require disproportionate effort or would seriously impair the achievement of the objective of your research, contact your data protection officer or research ethics committee.

7c/e Right to rectification, erasure and object:

Checklist:

I comply with the principle of data minimization (No. 3) and storage limitation (No. 4) as well as the principle of accuracy (No. 5).

In case you cannot meet the need of the data subject you should contact your data protection officer or research ethics committee.

Guidance notes on research and the GDPR:

Please ensure you comply with the principles of data minimization and storage limitation as well as the principle of accuracy. Data subjects have been given rights to enforce that their personal data is accurate and that you only process the data when you need them for your research. Under certain circumstances they can also object to the processing of their personal data for research purposes.

EU law or Member State laws can provide derogations from these rights. For instance, if complying with the request would require disproportionate effort or would seriously impair the achievement of the objective of your research, please contact your data protection officer or research ethics committee.

5.4 PUBLIC INFORMATION SHEET

To make sure that the use of your dataset upholds the necessary legal and ethical standards, you have to complete a list of questions regarding your dataset, your research and your security and privacy measures. These questions are intended to (1) help you form an informed opinion about your use of personal data, (2) function as an internal basis for legal and ethical deliberation and (3) serve as a document that informs the public about your use of personal data for research purposes, in order to create transparency and accountability.


1. Give a brief description of your dataset, your methods and the research that you (or others) seek to perform. Describe the (public) benefit that you expect the research to have.

2a. Describe the potential privacy risks and the steps you are taking to minimize them (for example, the use of de-identification methods and other privacy-by-design methods).

2b. What steps are you taking to secure the data (for example, encrypting the data and restricting physical and digital access to the server)?

3. Where did the data originate from?

4. Is this source publicly available?

5. Who is the (representative of the) data controller?

6. Who can be contacted?

7. What is the address?

8. On what legal basis are you processing personal data?

9. With whom are you sharing the data, that is, whom are you allowing to access the data?

10. How long do you intend to keep the data, and how do you decide when to delete it?

11. What procedure do you have in place for a citizen to request access, rectification or restriction of processing?

If citizens are not happy with the processing of their data, they have a right to lodge a complaint with the Data Protection Authority.

The new European data protection law gives researchers a privileged position with regard to the use of personal data. The rationale for this privileged position is the public benefit that is associated with scientific research. This privileged position comes with an increased responsibility to deal with this data in a respectful and ethical manner, so as to prevent any damage to the individuals whose data you are using for your research.

To increase transparency, so that individuals know what is happening with their personal information, the data controller has to inform each individual about important aspects of the collection and processing of personal data. When informing data subjects individually would involve a disproportionate effort, this obligation is lifted. However, when this exception is invoked, the obligation to inform all data subjects individually is replaced with an obligation to make the same information publicly available.


6 ETHICS BRIEFS

6.1 NEED FOR ETHICS BRIEFS

We are developing specialized ethics briefs dealing with specific ethical questions relevant for particular data sets or methods. There is a sizable and growing body of work concerning the ethical use of personal data in social science research. On the one hand, some of this knowledge, such as the main principles underlying the European legal framework and key ethical principles such as autonomy, fairness and responsibility, is relevant for all those who will work with personal data. On the other hand, there is a lot of detailed information that is mostly relevant for specific data sets or methods. Data scientists for whom law and ethics are not the core of their specialization are much more likely to absorb information that directly connects with the research that they (will) do.

6.2 DEVELOPMENT PROCESS

The initial concept of ethics briefs was developed in a working week that brought together data scientists, technical privacy and VSD scholars, privacy and digital ethics scholars, legal scholars, developers of the research infrastructure and project management. The key question at hand was how to translate the generally abstract and complex legal and ethical concepts identified as important in the ethical and legal framework to a practical level that has a higher chance of positively influencing actual research practices. A key driver in the development of the ethics briefs is the realization that data scientists have a different specialization and focus than ethical and legal specialists and cannot be expected to have the time or interest to attain a high level of specialization in those fields. Information becomes more relevant when it relates more specifically to the work a data scientist wants to do.

A first prototype on Twitter data was then developed by the ethics members of work package 2. This first prototype was reviewed by members of the ethics board, then discussed and refined. At a later stage, a second prototype on the DE webarchive was reviewed by and discussed with technical specialists on this topic. On the basis of the different reviews and conversations the prototypes were refined. In the upcoming period the ethics briefs will be integrated in the digital platform. User feedback will be collected to see how this element of the value sensitive institutional design performs in practice and will be used to improve the design.

6.3 DESIGN ELEMENTS

We will now proceed to describe the design elements that guide the production of the ethics briefs and explain the rationale underlying these elements.

• Specific. Ethics briefs deal with specific ethical issues that are relevant in the context of particular types of data or particular methods. While many general texts on ethics in data science exist, they often contain a lot of information that is only relevant to particular types of data research, or that is so general that it is hard to know how to apply it in specific cases. By providing information that is specifically tailored to the type of work that a data scientist wants to do, the relevance of that information goes up. This increases the chance that the information contained in the briefs will actually be read, understood and applied.


• Short. Ethics briefs are short: introductions of between 500 and 1000 words to different ethical topics that are relevant in the context of particular kinds of data or method. Because they are short, the burden, in time and initial intellectual effort, put on the user is lowered. This increases the chance that the information contained in the briefs will actually be read, understood and applied.

• References to richer information. While the ethics briefs are short, they are in no way intended to be shallow. In order to do justice to the depth of many ethical questions and provide the necessary and relevant resources to data scientists, the ethics briefs contain references to the top publications regarding the respective questions at hand. In this way, the ethics briefs are designed to function as gateway documents.

• Practical examples. The ethics briefs contain practical examples whenever possible. Many of the ethical concerns related to the use of personal data and the risks involved can sound very abstract or theoretical. When confronted with actual instances of harm being done through data, it becomes easier for data scientists to imagine the possible harm that may follow from their own work, which increases the incentive to spend effort on reducing the risk of harm.

• Invitation to connect to specialists. Finally, the ethics briefs will contain contact information of specialists on the different topics. Applying ethical concepts is not easily learned through mere transfer of knowledge. It should be seen as a techne (art), a practical form of knowledge. Guidance by a person with experience is essential in learning to apply an art. Fostering connections between legal and ethical specialists and data scientists may also help the specialists to gain a better understanding of the particular instances in which their work is applied.

6.4 EXAMPLES

Below we show the template for the ethics briefs and a first set of ethics briefs.


Ethics Brief Template

SoBigData Ethics Brief for [dataset/method]: [name dataset/method]

Exploratory: [Name exploratory]

Contact persons: [Name contact person]

[Introductory paragraph explaining the nature of the dataset/method and the type of research that can be done based on this dataset]

[Introduction to a set of 3 to 5 ethical questions or dilemmas that are relevant with regard to the dataset/method, referring whenever possible to popular or scientific literature and real world examples]

To find the right balance it is important to apply the principles of deliberation and openness. When you are setting up your research make sure you:

• Get informed about the topic, for example via the proposed literature
• Commit your initial ethical deliberations to paper
• Discuss these deliberations with others, such as colleagues, contact persons from SoBigData and the IRB of your institution
• Discuss your deliberations in a section of publications of your research

Example literature

[List and short description of 2-4 of the best texts that interested researchers could read]

Example real life problems

[List 1-3 short descriptions of real life problems]

Contact persons

Philosopher ethics/privacy expert/ethics board: [name]: [e-mail address]

[short bio/description why relevant]

Technical ethics/privacy expert: [name]: [e-mail address]

[short bio/description why relevant]

Overall text should be between 500-1000 words.


SoBigData Ethics Brief for Dataset: Twitter data for

Exploratory: Societal Debates

Story: Polarised Political Debates or: Aalto-Twitter

Twitter data can be a rich source for social science research. It can be used to gain a deeper understanding of the points of view held by different groups in society or to see how the tone of a debate changes over time. In most cases the data is clearly made public by the data subject. Therefore it may seem that anyone, including researchers, could do with the data as they wish. This is not always the case, however. When a tweet is used in research, the context of that tweet changes.

• It is possible (and even likely) that including a tweet in a publication generates more attention to that tweet than it would otherwise have had.
• Moreover, anonymisation of tweets is very hard. Many tweets are unique, so even with the name removed, it is very easy to do a search on the tweet and reconnect it to the person who tweeted it. A practical rule proposed by Tijerina and Keller is never to quote, in a research publication, a tweet that is longer than 5 words.
• If a tweet is used that is not anonymised or is easily re-identifiable, you should consider getting informed consent from the owner of the tweet.

But as Dag Elgesem (2015) argues, whether consent is needed depends on many aspects. Equally, whether republishing a non-anonymised tweet is acceptable depends on many factors. A practical example comes from research done within SoBigData. A team from Sheffield is conducting sentiment analysis on tweets about Brexit, a highly sensitive political topic. In their ethical considerations they weighed the benefits of publishing some non-anonymised tweets against the potential harm done to individual tweet authors. In the end they decided to publish only tweets of users who should expect public scrutiny, such as cabinet members, members of parliament and accounts belonging to newspapers.

To find the right balance it is important to apply the principles of deliberation and openness. When you are setting up your research make sure you:

• Get informed about the topic, for example via the proposed literature
• Commit your initial ethical deliberations to paper
• Discuss these deliberations with others, such as colleagues, contact persons from SoBigData and the IRB of your institution
• Discuss your deliberations in a section of publications of your research

Example literature:

Elgesem, Dag. 2015. Consent and information - ethical considerations when conducting research on social media. Chapter 1, pages 14-34. In: Fossheim, Hallvard; Ingierd, Helene. 2015. Internet Research Ethics. 20 pages.


Tijerina, B. and Keller, E.F. 2015. Big Data Ethics Support Systems and Network.

Kennedy, H., Elgesem, D., & Miguel, C. (2015). On fairness: User perspectives on social media data mining. Convergence: The International Journal of Research into New Media Technologies.

University of Sheffield. Research ethics policy note on Research Involving Social Media Data. 10 pages.

Example real life problems:

• Zimmer, M. OkCupid Study Reveals the Perils of Big-Data Science. Wired. 14 May 2016. Concerns a case where a researcher made public a dataset containing profiles from the dating website OkCupid. The dataset was compiled by creating a fake profile on the platform and having a bot scrape profiles of users. After privacy concerns were raised, username and location were taken out of the database.

Contact persons: Philosopher ethics/privacy expert/ethics board: Prof. Dag Elgesem ([email protected]) of The University of Bergen, Information Sciences and Media Studies. Prof. Elgesem has a background in logic and analytical philosophy and has published widely on Ethics and IT, and Internet research ethics, specialising on ethical questions in doing social media research.

Technical ethics/privacy expert: Kalina Bontcheva ([email protected]) is a senior researcher at the University of Sheffield. She has carried out many research projects that involved measuring social trends by applying natural language processing techniques to corpora derived from social media, and she has ample experience discussing the ethics of this practice with ethics boards.


SoBigData Ethics Brief for Dataset: DE Webarchive

Exploratory: Societal Debates

Story: Monitoring topics across space and time

Contact persons: Gerhard Gossen

The number of internet archives is growing. The positive motivation for many of these archives is the preservation of cultural heritage that is produced and published on the web. For social scientists these archives present a great research opportunity, for example in observing cultural and political trends over time.

• Be aware that people who publish things online may not all be actively aware that content may be archived and therefore persist even after they “take it off line”. For many users of and contributors to the internet, the medium seems highly changeable: content put online at one point in time may be taken offline at a later moment. However, while it is often possible to delete content from the location where it was first published, this understanding of online content as temporary is naive. Content publicly available on the internet can be copied to another public or private location, making the deletion of the content at the initial place of publication ineffective as a way to make sure that the content is no longer available.

The European Court of Justice raised ethical issues with the right to be forgotten in Google vs Spain. In this decision the court held that a certain news story about a person may give a skewed representation of him/her. Especially when time has passed this may constitute unfair harm. The same could hold for the representation of a person/issue through a web archive.

• By publishing only highly condensed information, you can make sure that readers of your research cannot gain knowledge about individuals’ personal data.

• Be aware that (most) archives are incomplete. Most archives are incomplete representations of the domain they cover. It is important to be aware of this and to think carefully about the biases that this may create in the data.

• Be aware that the set of ideas published online is not necessarily representative of the set of ideas prevalent in a society.

There are some ideas about the use of technical means to alleviate some of the ethical concerns. It may be possible to have a privacy enhancing technology that automatically redacts (makes invisible) information that is private. Or to have an algorithm that automatically blocks queries on a database that seem to have malicious intent. However at this moment these techniques are not yet implemented. Therefore the burden of making correct ethical choices is on the researcher.


Example literature

Taylor, N. (2015) Questions of ethics at Web Archives 2015.

Good overview article with links to relevant literature on three main ethical questions:

1. The status of social media content and the related privacy questions in web archives.
2. The (skewed and unclear) incompleteness of web archives, with specific reference to discussions on the incompleteness of the Internet Archive Wayback Machine.
3. The Digital Divide in web archiving, focusing on the fact that relatively much more material from privileged internet content providers is archived.

Rauber, A., Kaiser, M., & Wachter, B. (2008). Ethical Issues in Web Archive Creation and Usage - Towards a Research Agenda. In 8th International Web Archiving Workshop (IWAW08).

Similar to the ethics of web search, identifies three assumptions basic to web archiving:

1. The web constitutes a new form of publishing that should be archived just as other forms of publication are archived (but in many cases “publishing” online should be seen as private communication in public; many users have little or no awareness of the fact that they are publishing publicly).
2. The web’s ephemeral nature is a deficiency in the design of the internet (yet for many users this ephemeral nature may be intended).
3. Web archiving is merely archiving material that is freely available anyway (but a picture over time of a certain person, topic, etc. constitutes a very different thing).

http://cogdogblog.com/2016/06/dont-archive/

Raises the question of the completeness of archives. Whether a website is included in the archive depends on certain conditions (e.g. the presence of a robots.txt file and this file allowing crawling). The text is written from the perspective of website owners who want their website to be part of the archive. What becomes clear is that, even for very tech-savvy users, (non-)inclusion in the archive is (currently) not a matter of active choice.
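The role of robots.txt mentioned in the last entry can be illustrated with a minimal sketch using Python's standard urllib.robotparser module. The crawler name "archivebot", the example.org URLs and the robots.txt content are hypothetical; real archives may apply additional or different inclusion rules.

```python
from urllib.robotparser import RobotFileParser


def may_crawl(robots_txt, user_agent, url):
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt: the archive crawler is excluded from the whole
# site, all other crawlers are only excluded from /private/.
robots_txt = """\
User-agent: archivebot
Disallow: /

User-agent: *
Disallow: /private/
"""

archive_allowed = may_crawl(robots_txt, "archivebot", "https://example.org/index.html")
other_allowed = may_crawl(robots_txt, "otherbot", "https://example.org/index.html")
other_private = may_crawl(robots_txt, "otherbot", "https://example.org/private/x")
```

In this sketch the site never reaches the hypothetical archive crawler, although the same page remains reachable for other crawlers. This is one way in which archive incompleteness can arise without the author of a page being aware of it.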

Contact persons

Philosopher ethics/privacy expert: René Mahieu, PhD candidate at Delft University of Technology [email protected] . René works on the ethical framework in SoBigData, where he tries to devise an infrastructure that helps academics of different backgrounds to work together in an interdisciplinary way.

Technical ethics/privacy expert: Gerhard Gossen, PhD candidate at L3S Research Centre [email protected] . Gerhard works on search strategies for web archives.


SoBigData Ethics Brief for Dataset: Human mobility data

Exploratory: City of Citizens

GPS data concerning the movement of individuals or groups is an invaluable tool for the researcher to have. It allows us to discover how humans move throughout an area, be it a village, city, province or country. This has a wide variety of purposes, from improving how we manage traffic to learning what locations draw tourists in a city.

Yet the researcher must ask themselves difficult questions in order to ensure that the ends justify the means:

• Does their dataset allow for the targeting (i.e. pinpointing the exact location) of either identifiable individuals or specific groups, either when the dataset is used alone, or when it is combined with another dataset concerning the same data subjects?

In one widely publicised instance, Metcalf & Crawford (2016) provide an overview of the ways in which a seemingly innocuous dataset of taxi fares can be combined with other information to reveal not only the identity of the taxi drivers, but also rather personal information, such as which of their passengers were celebrities or visited strip clubs. Much can be inferred from GPS data, especially when it is cross-referenced with other bits of information.

• If so, have appropriate measures been taken to minimise the possibility of (re-)identification of this or these individual(s)?

Promising ongoing methodological research is devoted to the question of how to reduce the specificity or ‘granularity’ of information as far as the purpose of the research will allow. Monreale et al. (2010), for example, have shown how GPS vectors can be clustered so as to reduce the possibility of targeting individuals. Put concretely: for purposes of traffic analysis, we do not need to know where John Smith is. It might be enough to know the location of the general collection of individuals to which John Smith may or may not belong. Using methodology to reduce the granularity of a dataset, while preserving its salience, might allow for John Smith to be protected from unwanted attention.
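The idea of reducing granularity can be sketched in a few lines of Python. This is a deliberately simple illustration (snapping points to a grid and suppressing sparsely populated cells), not the method of Monreale et al.; the function name, the cell size cell_deg, the threshold k and the sample points are all assumptions made for the example.

```python
from collections import defaultdict
from math import floor


def generalise(points, cell_deg=0.01, k=2):
    """Replace each (user, lat, lon) point by its grid cell and keep only
    cells that contain at least k distinct users."""
    users_per_cell = defaultdict(set)
    for user, lat, lon in points:
        cell = (floor(lat / cell_deg), floor(lon / cell_deg))
        users_per_cell[cell].add(user)
    # Only the cell index and a head count are released, never user ids.
    return {cell: len(users)
            for cell, users in users_per_cell.items()
            if len(users) >= k}


# Hypothetical sample: two people close together, one person far away.
points = [
    ("john", 43.7161, 10.3966),
    ("anna", 43.7189, 10.3912),
    ("marc", 43.7702, 11.2558),
]
released = generalise(points, cell_deg=0.01, k=2)
```

Here the released result only states that two people were in one grid cell; the isolated third person is suppressed entirely. Larger cells or a higher k protect individuals better but make the data less useful, which is the trade-off the researcher has to weigh against the purpose of the research.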

• Has the original dataset been appropriately safeguarded?

In many cases, one might receive a dataset with fully individualisable data. As a researcher, one might choose to anonymise this dataset. Nevertheless, it is important that steps are taken to keep the original dataset safe. State-of-the-art methodology amounts to nothing if hard drives are left lying around.

• Do any insurmountable risks remain?

Even after having taken appropriate steps, risks may remain. GPS data may be used to discriminate against certain groups, as in the infamous case of ‘redlining’, where specific areas of the United States were denied certain benefits on the basis of their ethnic composition. If one suspects one’s research may be used for unintended purposes, it may be wisest to take the painful step of refraining from publication.


Example literature

Monreale, A., Andrienko, G. L., Andrienko, N. V., Giannotti, F., Pedreschi, D., Rinzivillo, S., & Wrobel, S. (2010). Movement Data Anonymity through Generalization. Trans. Data Privacy, 3(2), 91-121.

This paper, written by members of the SoBigData consortium, details how GPS trajectories can be generalised in order to improve the k-anonymity of the dataset, i.e. to reduce the probability of successfully identifying a selected individual. The main advantage provided by the method is to use an algorithm to replace the specific location of individuals with approximate areas.

Krumm, J. (2009). A survey of computational location privacy. Personal and Ubiquitous Computing, 13(6), 391-399.

This survey provides an overview of the developing field of computational methods to improve the anonymity of GPS datasets.

Example real life problems

• Metcalf & Crawford refer to a number of studies into what can be inferred from a dataset released in 2013 by the New York Taxi & Limousine Commission. They show that this is due not only to the poor attempt at hashing the medallion numbers of the taxi drivers, but moreover to a general vulnerability of these kinds of specific datasets.

(Can be found in: Metcalf, J., & Crawford, K. (2016). Where are human subjects in big data research? The emerging ethics divide. Big Data & Society, 3(1), 2053951716650211.)
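Why hashing alone is a poor safeguard for short identifiers can be shown with a minimal sketch. It assumes a hypothetical identifier format (digit, letter, digit, digit) and uses SHA-256 purely for illustration; it does not reproduce the actual format or hash function of the released dataset. Because the space of possible identifiers is small, every candidate can be hashed in advance and the published hashes looked up.

```python
import hashlib
from itertools import product
from string import ascii_uppercase, digits


def hash_hex(text):
    """Unsalted hash of an identifier, as a naive pseudonymisation step."""
    return hashlib.sha256(text.encode("ascii")).hexdigest()


def build_lookup():
    """Hash every identifier of the assumed form digit-letter-digit-digit."""
    table = {}
    for d1, letter, d2, d3 in product(digits, ascii_uppercase, digits, digits):
        ident = d1 + letter + d2 + d3
        table[hash_hex(ident)] = ident
    return table


lookup = build_lookup()            # 10 * 26 * 10 * 10 = 26,000 entries
published = hash_hex("7B42")       # what a 'pseudonymised' release contains
recovered = lookup[published]      # the original identifier is recovered
```

The lookup table is built in well under a second on ordinary hardware. A secret salt or key makes this particular attack harder, but, as the authors point out, it does not remove the more general risk of re-identification through the remaining attributes.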

• Redlining, a term coined by the sociologist John McKnight, describes the act of indirectly discriminating against a certain ethnic or social group on the basis of their location. The location of individuals can often serve as a proxy for other, perhaps more sensitive pieces of information. (See: https://en.wikipedia.org/wiki/Redlining)


SoBigData Ethics Brief for Dataset: Call Detail Records

Exploratory: City of Citizens

Call detail records (CDRs) are logs collected by telecom providers which contain the details of a telephone conversation. Rather than recording the content of the call (i.e. the spoken words), call detail records contain data about the conversation, such as the sending and receiving phone numbers, the starting time and duration of the conversation, the route by which the conversation travelled through the telephone network and more. CDRs have been used in a wide variety of cases, from studies on migration and transportation to studies of the spread of infectious disease (as in the 2014 Ebola crisis). Yet they also pose a few significant risks.

CDRs constitute a form of what is sometimes known as ‘individualisable data’. The fact that they contain phone numbers of individual citizens means that the information they contain might pose a harm to these citizens. Even if the phone numbers are hashed, each conversation has enough characteristics to make it relatively unique, allowing for the targeting of individual users. Golle and Partridge (2009) showed that a fraction of the US working population can be uniquely identified by their home and work locations even when those locations are not known at a fine scale or granularity. Given that the locations most frequently visited by a mobile user often correspond to the home and workplace, the risk in releasing location traces of mobile phone users appears very high.

A first challenge for the researcher is therefore deciding how granular their data has to be. Researchers such as Letouzé (2014) show how it is possible to anonymise CDR data and reduce the specificity of the information it contains. It is a responsibility of the researcher to look into such methods and decide how to reduce the privacy risks associated with their research as much as possible. It should be noted in this regard that such anonymisation is never perfect. Indeed, as Zang and Bolot (2011) have shown, full anonymisation is even impossible as a result of the fact that all people tend to habitually visit the same places. And under different circumstances, with additional information gleaned from other data sources, it may always be possible to (re-)identify the data subjects in question. Research with CDRs is therefore always a matter of balancing risks and rewards, gauging whether or not the research ends justify the possible dangers.

Even when the data is well anonymised, it must be noted that anonymisation of individuals is not always enough. CDR data can also be used to target specific groups (e.g. a vulnerable ethnic group that is known to be located in a certain area). CDRs are thus a prime example of highly sensitive data. It is the responsibility of the researcher to ask themselves whether the use of CDR data is legitimate with regard to the ends of the research pursued.
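The re-identification risk described above can be made tangible with a small sketch that measures how many users are uniquely characterised by their N most visited locations. It is a toy illustration in the spirit of the cited studies, not a reproduction of their methodology; the function names, the pseudonymous user ids and the location labels are hypothetical.

```python
from collections import Counter


def top_n_signature(visits, n):
    """Return the n most frequently visited cells of one user as a frozenset."""
    counts = Counter(visits)
    return frozenset(cell for cell, _ in counts.most_common(n))


def unique_fraction(traces, n):
    """Fraction of users whose top-n signature is shared with nobody else."""
    signatures = {user: top_n_signature(v, n) for user, v in traces.items()}
    freq = Counter(signatures.values())
    unique = sum(1 for sig in signatures.values() if freq[sig] == 1)
    return unique / len(signatures)


# Hypothetical traces: user ids are already pseudonymised, locations are
# coarse cell labels such as antenna sectors.
traces = {
    "u1": ["A", "A", "A", "B", "B", "C"],
    "u2": ["A", "A", "A", "D", "D", "C"],
    "u3": ["A", "A", "A", "B", "B", "E"],
    "u4": ["F", "F", "F", "G", "G", "A"],
}

fractions = {n: unique_fraction(traces, n) for n in (1, 2, 3)}
```

In this toy dataset a quarter of the users are unique when only their single most visited location is known, half when two locations are known, and all of them when three are known, even though no name or phone number appears anywhere. Running such a check on the actual data before release can help in deciding how coarse the locations need to be.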

To find the right balance it is important to apply the principles of deliberation and openness. When you are setting up your research make sure you:

• Get informed about the topic, for example via the proposed literature
• Commit your initial ethical deliberations to paper
• Discuss these deliberations with others, such as colleagues, contact persons from SoBigData and the IRB of your institution
• Discuss your deliberations in a section of publications of your research

Example literature

Golle, Philippe and Partridge, Kurt. 2009. On the anonymity of home/work location pairs. In International Conference on Pervasive Computing, pages 390-397. Springer.

This paper discusses how individuals can be identified through the nearly unique combination of their home and work locations.

Zang, Hui and Bolot, Jean. 2011. Anonymization of location data does not work: A large-scale measurement study. In Proceedings of the 17th annual international conference on Mobile computing and networking, pages 145-156. ACM.

The authors look at the same problem as above, but from a different perspective: they consider the top N locations visited by each user instead of the simple home and work. The basic idea of this work is that more generally the number N of top preferential locations determines the power of an adversary and the safety of a user's privacy.

Letouzé, Emmanuel, & Vinck, Patrick. 2014. The Politics and Ethics of CDR Analytics.

This paper provides a general overview of both the political/ethical side and the technical side of CDR data. It also discusses possible means of anonymisation, whilst also identifying their advantages and shortcomings.


7 MOOC

7.1 NEED FOR A MOOC

A key element identified is the necessity to provide users of the RI with a base level of knowledge and information about data protection law, intellectual property law and ethics. This is based on the understanding that data protection law and digital ethics are highly specialized fields. Getting acquainted with the most important lessons in these fields may be a daunting task for the average data scientist, and getting into all the details may also be considered overkill. Specialization in science has clear benefits.

As mentioned in the introduction, one of the most pressing questions facing big personal data science now is how to translate the work done in these specialized fields into working practices that influence the actual on-the-ground work that is being done with big data. In order to disseminate the basic knowledge within the RI and to make sure that all users are familiar with the basic elements, we are developing a massive open online course (MOOC). This MOOC will be taken by every new user of the platform. Every section of the MOOC will end with a small exam (probably in the form of multiple choice) to test the knowledge of the new user.

7.2 DEVELOPMENT PROCESS

For the development of the MOOC the members of the legal and ethical team have created a concise summary of the key elements of their respective specializations (see Sections 7.4-7.6 below). These bodies of knowledge will be put into a format that makes the information easy to absorb.

In developing the final format of the MOOC we have looked at different existing online educational tools about the legal and ethical elements of data science. We are looking into Mantra59 “a free online course for those who manage digital data as part of their research project” developed by the University of Edinburgh. This course is available online under a license which makes it possible to use it as a basis and make the changes deemed necessary.

7.3 PRELIMINARY CONTENT

In the sections below we describe the preliminary content of the MOOC on ethics, data protection law and intellectual property law.

7.4 DATA PROTECTION LAW SECTION60

A. Which regulations exist in Europe regarding the protection of personal data?

There are a number of national and international legal instruments on data protection in Europe that have to be considered when it comes to the processing of personal data. At present, the centerpiece of the European regulatory data protection framework is the Data Protection Directive 95/46/EC. Because it is a directive, the European Member States had to implement it in their national legal systems. In spring 2016 the European legislator enacted a new legal instrument, the General Data Protection Regulation, which shall apply from 25 May 2018. This Regulation will be directly applicable in all Member States. Nevertheless, the Regulation also provides for a number of implementing acts by the Member States.

59 http://mantra.edina.ac.uk/
60 This section is subject to the legal situation under the Directive 95/46/EC.

B. When do I have to consider data protection law?

Crucial to the application of data protection regulations is the presence of personal data which is going to be processed.

Personal data are "any information relating to an identified or identifiable natural person ('data subject'); an identifiable person is one who can be identified, directly or indirectly, in particular by reference to an identification number or to one or more factors specific to his physical, physiological, mental, economic, cultural or social identity."

The data set provider will state in the metadata whether he or she regards the data set as personal or non-personal. Users should take this information into account and also verify it if they download the data set.

You can find more information on personal data in the Guidance notes on Research and the GDPR.

If you still struggle with the evaluation of the data as personal or non-personal please ask your data protection officer/ research ethics committee.

C. What is a data controller?

A 'data controller' is the person or body that, alone or jointly with others, determines the purposes and means of the processing of personal data. Determining the “means” of the processing refers not only to technical procedures but also to the questions of which data shall be processed, who shall have access to the data and when data is to be deleted. In a nutshell, this amounts to determining the “why” and “how” of certain processing activities. Data controllers processing personal data are subject to a number of legal rules. The most important regulations will be introduced to you in this MOOC.

If you are unsure whether you are a data controller ask for advice. For example, you can ask your data protection officer/ ethics committee.

D. Why is it important where the data controller is established?

Under the regime of the Directive, the place of establishment of the controller is important because the law of the Member State where the controller is established applies. For example, if the data controller is situated in Italy, the Italian data protection law implementing the Directive applies to the processing of personal data by that data controller.

E. I think I do not qualify as a data controller – are there still legal requirements to consider?

SoBigData RI offers various forms of data analysis. In case you download data sets that contain personal data you will qualify as a data controller if you have downloaded the data and you are in control over the data set.

There are other possibilities to carry out specific analyses that do not let you download the data but let you apply a certain method to the data set via the platform. This is a borderline case. You are not given access to the data set itself and you will only be able to analyse the data according to default settings chosen by the Data set Provider, so data protection regulations may not apply to you as the User. We nevertheless encourage you to initiate the analysis via the platform only if you would be allowed to process the personal data yourself. It may also be the case that the analysis results you receive still contain personal data, although best effort is made to avoid this. In that case you would become a data controller with all related obligations. Please check the results carefully and, if needed, adhere to the applicable data protection regulations.

F. Under which conditions am I allowed to process personal data?

Generally, it is prohibited to process personal data unless there is a legal ground allowing the processing.

The most important legal grounds for processing personal data in the research field are:

a) the informed consent of the data subject;

b) national regulations allowing the processing of personal data for research (these require that the interest of the researchers in pursuing their scientific research outweighs the interest of the data subject in not having their data processed for that purpose; it is usually also required that personal data is sufficiently protected by technical and organizational measures);

c) data has been made manifestly public by the data subject; this may be the case if the data subject has deliberately disclosed the data to the public and there is an obvious and conscious readiness by the data subject to make the data available to any member of the general public.

Please check the meta-data which may help you to assess the legal situation, e.g. if the data set is personal or if there is consent provided by the data subject. It is envisaged that the meta-data function as an evaluation aid for the User. Although reasonable effort is made to update the information it cannot be guaranteed that the meta data is correct and up to date.

In case of data protection sensitivities RI partners usually will only provide access to their data sets via transnational access which means you will have to apply to the respective Data set Provider and analysing the data will only be possible after admission and only on site of the Data set Provider.
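The metadata check described in this section could be partly automated before any processing starts. The following is a minimal sketch in Python, assuming hypothetical metadata fields (`contains_personal_data`, `consent_obtained`, `manifestly_public`, `national_research_basis`); the actual SoBigData catalogue metadata may be structured differently. The output is only a list of reminders and does not replace a legal assessment.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetMetadata:
    # Hypothetical fields for illustration; not the real catalogue schema.
    name: str
    contains_personal_data: Optional[bool]  # None means the provider did not say
    consent_obtained: bool = False
    manifestly_public: bool = False
    national_research_basis: bool = False


def pre_processing_checks(meta: DatasetMetadata) -> List[str]:
    """Return reminders that the User should resolve before processing."""
    if meta.contains_personal_data is None:
        return ["Personal-data status unknown: verify it yourself or ask "
                "your data protection officer."]
    if not meta.contains_personal_data:
        return ["Declared non-personal: verify this after download."]

    notes = []
    has_ground = (meta.consent_obtained
                  or meta.manifestly_public
                  or meta.national_research_basis)
    if not has_ground:
        notes.append("No legal ground recorded: do not process until one "
                     "is established.")
    # The metadata is only an evaluation aid and may be out of date.
    notes.append("Metadata may be outdated: confirm the legal ground "
                 "independently.")
    return notes


if __name__ == "__main__":
    example = DatasetMetadata(name="example-set", contains_personal_data=True)
    for note in pre_processing_checks(example):
        print(note)
```

The sketch deliberately never returns an empty list: even when a legal ground is recorded, the User is reminded that the metadata cannot be guaranteed to be correct and up to date.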

G. What are the main data protection law principles?

The main principles are:

a) Personal data must be processed fairly and lawfully. This implies that personal data must not be processed if there is no legal ground for doing so. The new Regulation adds the transparency principle, which requires that personal data be processed in a transparent manner. It should be transparent to natural persons that personal data concerning them are collected, used, consulted or otherwise processed and to what extent the personal data are or will be processed.

b) The principle of purpose limitation requires that personal data may be only collected for specified, explicit and legitimate purposes and not further processed in a way incompatible with those purposes. Further processing of data for historical, statistical or scientific purposes is generally not to be considered as incompatible provided that appropriate safeguards for the data are in place.


The purpose of these safeguards is to prevent that the data will be used to the detriment of the data subject, e.g. to foster decisions or measures against him or her.

c) The principle of data minimization requires that personal data must be adequate, relevant and not excessive in relation to the purposes for which they are collected and/or further processed.

d) Data must also be accurate and, where necessary, kept up to date; every reasonable step must be taken to ensure that data which are inaccurate or incomplete (having regard to the purposes for which they were collected or for which they are further processed) are erased or rectified.

e) The data must be kept in a form which permits identification of data subjects for no longer than is necessary for the purposes for which the data were collected or for which they are further processed.

H. What further obligations do I have as a data controller?

Data subjects enjoy several rights that are framed in such a way as to create obligations on the side of the data controller, such as:

a) the right of access;

b) the right to correction, erasure or blocking of transfer of inaccurate or incomplete data;

c) the right to object to the processing on compelling grounds.

There can be exceptions to these rights in the national laws where there is clearly no risk of breaching the privacy of the data subject and when data are processed solely for purposes of scientific research.

The data controller must as well implement appropriate technical and organizational measures to protect personal data against accidental or unlawful destruction or accidental loss, alteration, unauthorized disclosure or access, in particular where the processing involves the transmission of data over a network, and against all other unlawful forms of processing. Having regard to the state of the art and the cost of their implementation, such measures shall ensure a level of security appropriate to the risks represented by the processing and the nature of the data to be protected.

I. What restrictions apply if my research institute is established outside of the EU/EEA?

The Data Protection Directive applies to all EU countries and additionally to the non-EU countries Iceland, Liechtenstein and Norway (EEA). Precautions must be taken if personal data is transferred outside the EEA to third countries. Without such precautions, it would be very easy to undermine the high standards of data protection established by the Data Protection Directive, as it takes minimal effort to move data around in international networks. Therefore, the current legal framework set up by the Directive states that personal data can only be transferred to countries outside the EEA when an adequate level of protection is guaranteed.

Only Users from countries that are listed by the European Commission as ensuring an adequate level of protection will be able to use the platform's full services. For all other countries only a limited number of services will be available. It will be decided in the future how the SoBigData RI will extend its services to those countries in compliance with the applicable data protection regulations.
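The access rule above can be pictured as a simple lookup. The following Python sketch is an illustration only: the function name `service_level`, the two-letter country codes and the sets passed in the example are assumptions made for this sketch, and the adequacy set in particular is a placeholder rather than the European Commission's actual list. It is not a description of how the platform is implemented.

```python
from typing import AbstractSet

# The three non-EU EEA countries named in the text above.
EEA_NON_EU = frozenset({"IS", "LI", "NO"})


def service_level(country: str,
                  eu_members: AbstractSet[str],
                  adequate: AbstractSet[str]) -> str:
    """Return "full" or "limited" for a User's country of establishment.

    eu_members and adequate must be supplied by the caller from an
    authoritative, current source; nothing is hard-coded here.
    """
    code = country.strip().upper()
    if code in eu_members or code in EEA_NON_EU or code in adequate:
        return "full"
    return "limited"


if __name__ == "__main__":
    # Placeholder inputs for illustration, deliberately incomplete.
    eu_sample = {"IT", "DE", "NL"}
    adequacy_sample = {"CH"}
    for c in ("it", "NO", "ch", "br"):
        print(c, service_level(c, eu_sample, adequacy_sample))
```

Passing the country sets in as arguments keeps the rule separate from the lists themselves, which change over time.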


7.5 INTELLECTUAL PROPERTY RIGHTS SECTION

Aspects of Intellectual Property Rights surrounding use of social media content

A. What is IP?

• Intellectual property (IP) refers to creations of the mind, such as inventions; literary and artistic works; designs; and symbols, names and images used in commerce (http://www.wipo.int/about- ip/en/).

• Original creations in the literary, scientific and/or artistic domain, whatever may be the mode or form of expression, such as: writings, photographic works, melodies, works of applied art; illustrations, maps, plans, sketches and three-dimensional works relative to geography, topography, architecture or science, constitute subject matter protected by copyright.

• SoBigData RI provides access to some social media content, such as Facebook, Twitter, Flickr.

• Some social media content, such as blogs, commentaries, tweets, photos from Flickr, may be protected by Intellectual Property (IP) rights, most typically copyrights.

B. What actions matter for copyright?

• Distribution, reproduction, translation, modification, upload and sharing of such IP-protected content items to the public constitute copyright-relevant actions and require authorization of the right holder, who is typically the author.

C. Under what terms may IP-protected content be used?

• For example, an author may upload pictures to Flickr under a Creative Commons (CC) share-alike license, under “all rights reserved” or under “all possible uses allowed”.

Image 1. Snapshot of Flickr picture licensed under CC Attribution 2.0 Generic License

Source: Flickr
Author: lost places
License: Creative Commons Attribution 2.0 Generic License

Under CC Attribution 2.0 Generic License you are free to:

• Share: copy and redistribute the material in any medium or format

• Adapt: remix, transform, and build upon the material for any purpose, even commercially.

Under the following terms:

• Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

• Moreover, since the primary purpose of sharing such content was its distribution via social media, the processing of some social media content items beyond the settings of a platform, whether collected via the platform API or retrieved via web crawling, may stay governed by the platform terms. This is typically the case for Twitter content.

Image 2: Public Tweet retrieved on 01.11.2016

• Content from Twitter stays under the terms of Twitter.
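The attribution requirement of the CC Attribution 2.0 Generic License described above (credit, link to the license, indication of changes) can be captured in a small helper. The following Python sketch is only one reasonable way to format such a credit line; the function name `attribution_line` and the picture title used in the example are hypothetical, since the title of the picture in Image 1 is not given here.

```python
# Public URL of the CC Attribution 2.0 Generic license deed.
CC_BY_2_0_URL = "https://creativecommons.org/licenses/by/2.0/"


def attribution_line(title: str, author: str, source: str,
                     modified: bool = False) -> str:
    """Build a credit line that names the author, links the license
    and states whether the work was changed."""
    parts = [
        f'"{title}" by {author} ({source})',
        f"licensed under CC BY 2.0 ({CC_BY_2_0_URL})",
    ]
    if modified:
        parts.append("modified from the original")
    return ", ".join(parts)


if __name__ == "__main__":
    # "Example picture" is a placeholder title, not the real one.
    print(attribution_line("Example picture", "lost places", "Flickr",
                           modified=True))
```

Such a helper only covers content released under this particular license; content that stays under platform terms, such as Twitter content, needs to be handled according to those terms instead.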


D. Where can I find the terms under which to use the content?

• The terms which govern the use of individual datasets are indicated in the metadata accompanying each particular dataset of the SoBigData catalogue.

• Please consult the terms applicable to each dataset before using that dataset for your research and make sure that your research will be compliant with those terms.

Under what terms may I use the content which I access on SoBigData RI?

• Please be aware that access to the SoBigData catalogue works via registration as a user and acceptance of D4Science Terms available at: https://services.d4science.org/terms-of-use

• The D4Science Terms of Use govern in principle the use of D4Science services, such as: Email Service, Workspace Service, Social Service, etc. However, the use of individual datasets remains governed by the terms of individual datasets.

• Section Copyright of the D4Science Terms provides:


• Please also be responsible for the research you do and be respectful of the rights of other parties. By doing so, each user contributes to the community of ethical research.

• If you want your own rights to be respected in your research, please respect the rights of others as well.

Answering the questions below will help you to check your knowledge of the IP basics of social media research. For the IP section a prototype of a test has been developed:

IP-Quiz:

What is IP?

a. Creations of the mind
b. Innovations
c. Scientific discoveries
d. Logic, algorithms and mathematical formulae

What items of social media content can be considered IP-protected content?

a. Photos
b. Posts
c. Music files
d. Individual symbols, letters, numbers
e. Videos
f. Geo-data
g. CDR
h. IP-addresses

What requirements must a work satisfy to be protected by copyright?

a. Original creation
b. Expression
c. Financial investment
d. Official registration

What actions are relevant for copyright?

a. Distribution
b. Reproduction
c. Upload
d. Modification
e. Streaming
f. All

Under what conditions may you use works released under the Creative Commons Attribution License?

a. Credit author
b. Credit author, provide a link to the license and mark modifications, if any
c. Pay license fees
d. Release under the same license terms

Under what terms may you use Twitter content as made available via the D4Science Infrastructure?

a. Twitter Terms
b. D4Science Terms
c. Metadata
d. Terms dictated by the researcher providing the data


8 CONCLUSION: ETHICS SECTION

Social science is not as it used to be. As a data scientist, you can now have enormous quantities of data at your disposal. But this data may lead to some serious ethical consequences, such as:

• Threatening the autonomy of individual citizens, who lose control over what is known about them and how they present themselves;

• Actually harming individuals, by revealing information (e.g. their address or credit card number) that allows them to be targeted by criminals;

• Causing injustices, by allowing information to be used outside of its original context (e.g. medical info for your insurer), and

• Perpetuating inequality, by causing asymmetries in who controls information and deepening societal divides.

As a researcher, you have the responsibility to concern yourself with whether or not these consequences could be a result of your research. In order to make sure you are asking yourself the right questions, we invite you to fill out the ethical and legal self-assessment sheet we have provided. This will provide you with a general overview of the ramifications of your work. Answering these questions is not always easy, but do not worry, the problems themselves are today still often difficult.

Above all, try to uphold the following principles in your work:

• Maintain and demonstrate awareness of the problems in your work;

• Open yourself up to critical questioning;

• Discuss the trade-offs you make publicly;

• Apply privacy-preserving techniques whenever possible;

• Aim to inform the data subjects of your research.

In order to meet these standards, you need to consider if and how your research can be tailored so that possible harms are eliminated. Fortunately, there are ways to do this. At SoBigData, we offer assistance and support in this regard. For each kind of dataset, we provide you with so-called ‘ethics briefs’. These contain information on how best to proceed with your analysis: they outline the risks involved with the particular set of data, link you to relevant literature and offer indications on how your methodology can be improved, for example by using an algorithm to anonymise individuals in your dataset. As the field of data research is still evolving, solutions are not always forthcoming. That is why you may want to make a name for yourself by proposing new ethical methodologies.
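As one illustration of a privacy-preserving check, the sketch below tests whether a table is k-anonymous with respect to a chosen set of quasi-identifiers, meaning that every combination of those attribute values is shared by at least k records. k-anonymity is a technique chosen here for illustration; this document does not prescribe it, and passing such a check does not by itself make a dataset anonymous in the legal sense. The column names in the example are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def smallest_group_size(records: List[Dict[str, str]],
                        quasi_identifiers: Sequence[str]) -> int:
    """Size of the smallest group of records that share the same
    values for all quasi-identifiers (0 for an empty table)."""
    counts = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return min(counts.values()) if counts else 0


def is_k_anonymous(records: List[Dict[str, str]],
                   quasi_identifiers: Sequence[str], k: int) -> bool:
    """True if every quasi-identifier combination occurs at least k times."""
    return smallest_group_size(records, quasi_identifiers) >= k


if __name__ == "__main__":
    # Hypothetical, already generalised attributes.
    table = [
        {"age_band": "20-29", "region": "North"},
        {"age_band": "20-29", "region": "North"},
        {"age_band": "30-39", "region": "North"},
    ]
    print(is_k_anonymous(table, ["age_band", "region"], k=2))
```

A failed check indicates that some individuals stand out in the data and that further generalisation or suppression should be considered before results are shared.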

If you are ever unsure how to proceed, contact the person listed in the ethics brief, or refer to the ethical or legal team and we will help you.
