Schwartmann/Weiß (Eds.)

White Paper on Pseudonymization Drafted by the Data Protection Focus Group for the Safety, Protection, and Trust Platform for Society and Businesses in Connection with the 2017 Digital Summit

– Guidelines for the legally secure deployment of pseudonymization solutions in compliance with the General Data Protection Regulation –


Headed by:
Prof. Dr. Rolf Schwartmann, Research Center for Media Law – Cologne University of Applied Sciences

Members:
Walter Ernestus, German Federal Commissioner for Data Protection and Freedom of Information

Sherpa:
Steffen Weiß, LL.M., German Association for Data Protection and Data Security

Nicolas Goß, Association of the Internet Industry

Members:
Prof. Dr. Christoph Bauer, ePrivacy GmbH
Patrick von Braunmühl, Bundesdruckerei GmbH
Susanne Dehmel, German Association for Information Technology, Telecommunications and New Media
Michael Herfert, Fraunhofer Society for the Advancement of Applied Research
Serena Holm, SCHUFA Holding AG
Dr. Detlef Houdeau, Infineon Technologies AG

Version 1.0, 2017
Issued by the Digital Summit's data protection focus group

Management contact:
Prof. Dr. Rolf Schwartmann (TH Köln/GDD)

Sherpa contact:
Steffen Weiß, German Association for Data Protection and Data Security, Heinrich-Böll-Ring 10, 53119 Bonn, Germany, +49 228 96 96 75 0, [email protected]

Gesellschaft für Datenschutz und Datensicherheit e.V.

Members:
Annette Karstedt-Meierrieks, DIHK – Association of German Chambers of Industry and Commerce
Johannes Landvogt, German Federal Commissioner for Data Protection and Freedom of Information
Prof. Dr. Michael Meier, University of Bonn/German Informatics Society
Jonas Postneek, Federal Office for Information Security
Dr. Sachiko Scheuing, Acxiom Deutschland GmbH
Irene Schlünder, Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V.
Dr. Claus D. Ulmer, Deutsche Telekom AG
Dr. Winfried Veil, German Federal Ministry of the Interior
Dr. Martina Vomhof, German Insurance Association

Frederick Richter, LL.M., Foundation for Data Protection

Foreword

Data is the raw material of the future. It can consist of information about how we live our lives, which can be deployed for medical products, or of information about our driving behavior, which can be correlated with weather forecasts or our mistakes at the wheel. Data is generated when we use interconnected appliances, from television sets to toasters.

Because this raw material is made up of human characteristics, it may not be mined like metal ores that are brought to a forge for smelting into steel. As a form of "digitized personality", this raw material requires special protection before it may be used.

European data protection law in the General Data Protection Regulation makes companies responsible for protecting personal data by requiring them to ensure that data is protected by means of technological features and data protection-friendly default settings. In line with the ideas underlying the new law, pseudonymizing personal data represents a very important technique for protecting such information. It envisions the use of suitable processes to break the link between data and a given person, thereby making it possible to use the data in a manner that conforms to data protection requirements.

Due to the particular importance of pseudonymization, the Data Protection Focus Group at the German Government's Digital Summit assumed responsibility for this economically significant issue. An interdisciplinary group of representatives from the business community, government ministries, research, and supervisory authorities worked with the Focus Group to produce this White Paper.

We would like to express our thanks for their constructive, efficient, and sustained cooperation as part of this project. Particular thanks go to Assessor Steffen Weiß for his expert and painstaking coordination of the Focus Group's work.

Cologne, June 2017

Professor Dr. Rolf Schwartmann (Head of the data protection focus group for the safety, protection, and trust platform for society and businesses in connection with the German government's digital agenda)

Contents

Foreword

1. Introduction

2. Mission for the Digital Summit / goals of the Focus Group

3. Framework conditions of pseudonymization
   3.1. Definition
   3.2. Differentiating between pseudonymization and anonymization
   3.3. Functions
      3.3.1. Protective function
      3.3.2. Enabling and easing function

4. Techniques, and technical and organizational requirements
   4.1. Cryptographic basics and processes
   4.2. Requirements regarding pseudonyms
      4.2.1. Availability requirements
      4.2.2. Role binding
      4.2.3. Purpose binding
   4.3. Examples of pseudonymization techniques for implementing availability requirements
      4.3.1. Linkable, disclosable pseudonyms
      4.3.2. Non-linkable, disclosable pseudonyms
      4.3.3. Linkable, non-disclosable pseudonyms
      4.3.4. Role binding
      4.3.5. Organizational purpose binding
      4.3.6. Technical purpose binding
   4.4. Technical and organizational requirements

5. Transparency and data subjects' rights regarding pseudonymized data
   5.1. General transparency requirements
   5.2. Special conditions regarding pseudonymized data
      5.2.1. Information obligations (GDPR Articles 13 and 14)
         5.2.1.1. Initial situation
         5.2.1.2. Special situation: Disclosing pseudonymized data to a third party
      5.2.2. Data subjects' rights as per Articles 15-22 and 34
      5.2.3. Transferring pseudonymized data to third parties
   5.3. Tracing results to a person

6. Application scenarios
   6.1. Pseudonymization and Entertain TV (DTAG)
      6.1.1. Overview
      6.1.2. Data generation
      6.1.3. Pseudonymization
      6.1.4. Statistics generation
      6.1.5. Opt-out
   6.2. Direct marketing
      6.2.1. Promotional campaign
         6.2.1.1. Using pseudonymized data for comparing and matching data with external sources – creating an analysis database
         6.2.1.2. Interest-based advertising: Using pseudonymized data for selecting data
         6.2.1.3. Gauging the success of a campaign with pseudonymized data
      6.2.2. Data marketing on behalf of an agency/data broker
      6.2.3. Using display advertising
   6.3. Pseudonymization in the guidelines on data protection in medical research – TMF's generic solutions 2.0
      6.3.1. Overview: TMF's data protection guideline
      6.3.2. Pseudonymization as a technical and organizational measure for informational separation of powers
      6.3.3. Involving a trustee
      6.3.4. Repseudonymizing by pseudonymizing service (custodian) when exporting data to research module

1. Introduction

The global phenomena of digitization and networking continue apace and are linked with a significant increase in data volumes and information requirements. This also pertains to our personal data. Protecting this data should be a core ambition if we want to ensure that nobody's data can be subjected to processing that he/she can no longer control or even clearly understand. At the same time, technical advances also provide us with new opportunities that arise when we divulge our personal data. There is therefore a need to carefully weigh our desire to have our data excluded from processing against the extent of essential data processing, i.e. by a data controller. As its starting point, this kind of balancing act should be based on the question of whether it is or is not necessary to process personal data.

Pseudonymizing personal data provides one way to negotiate a route midway between the conflicting interests of individuals and data processors, and to develop usage scenarios where plain data is no longer required. Pseudonymizing is when information pertaining to a person's identity is removed during the course of processing.

Pseudonymization can benefit data protection in myriad ways. For example, even if a controller loses data by accident, pseudonymization makes it very difficult to connect data to the individuals whose information was lost. This ensures their personal rights are not infringed. Data pseudonymization can also win people's trust if highly transparent processes (involving technology) are used to "encode" personal information for the data subject's benefit, and if this approach makes it evident that the relevant controller is interested in safeguarding this information.

This White Paper is intended to provide an overview of the relationship between pseudonymization and data protection, and what functions it can serve. The paper also looks at technical and organizational options for performing pseudonymization. Finally, it investigates specific application scenarios that are already in use when handling pseudonymized data.

The paper's contents are aligned with the legal provisions of the EU's General Data Protection Regulation (GDPR), which will enter into force on May 25, 2018. The GDPR explicitly deals with the topic of pseudonymization in a number of different sections. The Digital Summit can play an important role by contributing, within a European context, information derived from many years of experience with pseudonymization as a result of Germany's still-applicable Federal Data Protection Act, in the hope of assisting all those involved to arrive at a shared understanding of and approach to the matter.

2. Mission for the Digital Summit / goals of the Focus Group

This White Paper was developed within the context of the Digital Summit, the nine-platform response of the German government to the digital transformation. Platform 8, "Safety, protection, and trust for society and businesses," is overseen by Germany's federal interior minister, Dr. Thomas de Maizière, and its mission is to establish online safeguards and security in such a way that digitization can realize its full potential for society and the economy in Germany. A modern, highly effective data protection system can safeguard the freedom and rights of individuals while at the same time making it possible to seize the opportunities associated with digitization.

The data protection focus group was established as part of the preparations for Platform 8, and it is headed up by Prof. Dr. Schwartmann. In addition to investigating the issues of pseudonymization and its application, the Focus Group would also like to see this paper generate guidelines for the legally compliant handling of pseudonymization solutions for deployment by private- and public-sector organizations and bodies alike. Guidelines can play a major role in ensuring that pseudonymization is deployed in a uniform manner across the board.

However, the Focus Group's objectives go further than simply developing guidelines. In the name of Europe-wide harmonization, it would be beneficial to use these guidelines to draft a code of conduct on pseudonymization and thereby create an acknowledged and binding set of standards. The GDPR expressly encourages associations and organizations to draft rules and so add to the detail and scope of GDPR data processing activities, pseudonymization among them. The relevant data protection supervisory authorities will then approve these rules. The European Commission may decide that the approved code of conduct submitted to it has general validity within the European Union.

3. Framework conditions of pseudonymization

3.1. Definition
Pseudonymization has long played a role in Germany's data protection law, and it is enshrined in Section 3.6a of the country's Federal Data Protection Act. Its application takes the form of technical and organizational measures for secure data processing as well as of data-minimizing measures. Section 15.3 of Germany's Telemedia Act even allows the use of data on the condition that such data is pseudonymized. The GDPR now introduces a harmonized definition of pseudonymization for all of Europe, and refers to this solution in several places. Article 4.5 of the GDPR defines pseudonymization as "the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organizational measures to ensure that the personal data are not attributed to an identified or identifiable natural person".

This gives rise to three requirements that pseudonymization must meet:

a) Data cannot be attributed to a specific person without the use of additional information

If it is possible to attribute data to an identifiable person without further ado (i.e. because the data contains a name, an address, or a staff number), this data is not pseudonymized. In this situation, the additional pseudonymization requirements do not apply.

b) Separate storage of additional information

The data which would make it possible to identify a person must be stored separately in such a manner that it cannot be combined without further effort. One option is the logical separation of data by means of different access authorizations. A technical or organizational separation is insufficient if it does not prevent access to the data which facilitates attribution. The protection level of the data in question can be used as a gauge for how thorough this separation should be.
Before applying pseudonymization processes, it is always necessary to clarify who has access to the allocation tables or encryption processes, who generates the pseudonym, whether the risk of depseudonymization can be ruled out, and under what conditions the combination of identification data is permitted.[1] If other data is to be added to the pseudonymized data, the new data must be checked to assess whether it could undo pseudonymization because the merger of the two sets of data makes it possible to unambiguously identify a person.

c) Securing technical and organizational measures for non-attribution

Recital 26 of the GDPR makes it clear that personal data that has undergone pseudonymization can be considered information about an identifiable natural person if it is possible to attribute it to a natural person by including additional information. As data which can be attributed to an individual, it is subject to the GDPR. The Regulation understands pseudonymization first and foremost as a risk-reducing technical or organizational measure (see Recital 28). This recital also clarifies that the explicit introduction of pseudonymization in the Regulation is not intended as an impediment to other data protection measures.

To incentivize the deployment of pseudonymization, Recital 29 clarifies that pseudonymization measures, even if allowing general analysis, should be possible within the same controller if that controller has taken the technical and organizational measures necessary to prevent the unauthorized re-identification of the person in question. In other words, it is not necessary to involve a third party, i.e. a data custodian. Individual cases can be checked to assess which variant should be prioritized in light of data protection requirements.

Pseudonymization should also be considered as a criterion when assessing the question of whether processing data for a purpose other than the one for which the data was initially collected is compatible with this original purpose (Article 6.4.e).[2] As pseudonymized data should be treated as personal data, it must be deleted when such data is no longer necessary in relation to the purposes for which it was collected or otherwise processed.[3]
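The three requirements above (attribution only with additional information, separate storage, and restricted access) can be illustrated with a minimal sketch. All names here are hypothetical, and a real deployment would add key management, logging, and role checks appropriate to the protection level of the data:

```python
import secrets

class AllocationTable:
    """Separately stored additional information (pseudonym -> identity).

    Illustrative sketch: in practice this table would live in a separate
    system with its own access authorizations, not beside the working data.
    """

    def __init__(self):
        self._mapping = {}

    def new_pseudonym(self, identity: str) -> str:
        # A random value reveals nothing about the person by itself.
        pseudonym = secrets.token_hex(16)
        self._mapping[pseudonym] = identity
        return pseudonym

    def resolve(self, pseudonym: str, caller_is_authorized: bool) -> str:
        # Attribution requires both this additional information AND authorization.
        if not caller_is_authorized:
            raise PermissionError("access to the allocation table is restricted")
        return self._mapping[pseudonym]

# The working data set holds only pseudonyms plus non-identifying payload.
table = AllocationTable()
pid = table.new_pseudonym("Erika Mustermann, staff no. 4711")
working_record = {"id": pid, "payload": "usage statistics"}
```

Without the table, the random pseudonym in `working_record` cannot be attributed to anyone; with it, only an authorized role can re-identify the person.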

[1] Paal/Pauly, General Data Protection Regulation, GDPR, Article 4, marginal nos. 40-47.
[2] See point 3.3.2.
[3] See point 4.4.

3.2. Differentiating between pseudonymization and anonymization
During the course of the discussion about the GDPR, one question that resurfaced time and again is whether the same difference was made between pseudonymization and anonymization everywhere in Europe. The EU's current Data Protection Directive (95/46/EC) does not mention pseudonymization, and only Recital 26 clarifies that the GDPR's protection principles "should therefore not apply [...] to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable". Concerning this issue, the Regulation establishes clarity by detailing a definition of pseudonymization in Article 4.5 and distinguishes between pseudonymization and anonymization in the recitals (above all in Recital 26). Article 4 does not itself define anonymization, but such a definition arises from the definition of "personal data" in Article 4.1 of the GDPR, and it is laid out in Recital 26:

"The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable. This Regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes."

The difference between pseudonymized and anonymized data can therefore be described as follows:

Pseudonymized data: Individuals can be identified following the addition of information that is stored separately or publicly accessible.

Anonymized data: Identifying individuals is impossible or requires inordinate effort.
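This distinction can be sketched in code under the simplifying assumption that the remaining payload is not itself identifying (field names are hypothetical): a pseudonymized record keeps a separately stored link back to the person, while an anonymized record has no such link anywhere.

```python
import secrets

# Separately stored additional information (the re-identification link).
allocation_table = {}

def pseudonymize(record: dict) -> dict:
    pseudonym = secrets.token_hex(8)
    allocation_table[pseudonym] = record["name"]  # kept apart from the working data
    return {"id": pseudonym, "payload": record["payload"]}

def anonymize(record: dict) -> dict:
    # No link is kept anywhere: identification becomes impossible
    # (assuming the payload itself does not identify the person).
    return {"payload": record["payload"]}

pseudonymized = pseudonymize({"name": "Jane Doe", "payload": "viewing statistics"})
anonymized = anonymize({"name": "Jane Doe", "payload": "viewing statistics"})

# With the separately stored information, the individual can be identified again:
assert allocation_table[pseudonymized["id"]] == "Jane Doe"
# The anonymized record carries no key into any allocation table:
assert "id" not in anonymized
```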

At what point exactly it is no longer possible to identify the individual in question is a topic that was repeatedly disputed in the past, and it is relevant for distinguishing between pseudonymization and anonymization. Section 3.6 of Germany's Federal Data Protection Act explains when this is the case, stating that anonymization existed only if disproportionate effort (time, expense, and labor) was required to attribute information to a specific person.

Recital 26 of the GDPR also supplies additional details regarding when information is anonymized in such a manner that it is no longer possible to identify the person in question:

"To determine whether a natural person is identifiable, account should be taken of all the means reasonably likely to be used, such as singling out, either by the controller or by another person to identify the natural person directly or indirectly. To ascertain whether means are reasonably likely to be used to identify the natural person, account should be taken of all objective factors, such as the costs of and the amount of time required for identification, taking into consideration the available technology at the time of the processing and technological developments."

In other words, the following considerations are of key importance:
• The costs of identifying someone
• How long it takes to identify someone
• Technology available at the time of processing
• Technological development (i.e. what future developments can be predicted)
• Other objective factors

Like the Federal Data Protection Act, the GDPR adopts a relative view of how to identify someone. It does not require the absolute irreversibility of anonymization for all time, instead focusing on a situation in which nobody can or would, in all probability, undertake de-anonymization because it would require too much effort and be too complex, if not impossible. This view should be based on conditions at the time when the data is processed, but attention should also be paid to technological developments for the future.[4] Another issue where this plays an important role concerns data that a controller pseudonymized and for which only it possesses the information to link the data to the relevant data subject: If this information is passed to another controller, can it be considered as anonymized or not for this second controller?[5]

[4] For information on the legal means that a website provider has for linking people to the data of its visitors' dynamic IP addresses, see the European Court of Justice C-582/14, dated October 19, 2016.
[5] See point 5.2.3. for additional information.

The GDPR criteria named here can therefore be used to distinguish pseudonymization and anonymization in individual cases.

3.3. Functions
3.3.1. Protective function
Pseudonymization fulfills a range of purposes in the GDPR. Of major importance is the protective function it extends to someone whose data is being processed. This protective function is expressed at several points in the law. Pseudonymization protects someone from direct identification. To this end, Article 4.5 of the GDPR states that it is not possible to attribute pseudonyms to a specific person without making use of additional information. It would potentially require inordinate effort to identify the relevant person without additional information over which the processing party has control. The pseudonym functions as a "mask". Further details are provided regarding the prevention of direct identification: This additional information must be stored separately and be subject to technical and organizational protective measures.

The principle of data minimization as per Article 5.1.c (i.e. personal data must be adequate and suitable for the given purpose, and its scope must also be limited to what is required for the purpose of processing) does not directly address the pseudonymization of personal data, but it becomes relevant, and beneficial for the individual in question, when attribution to a specific person is no longer necessary to achieve the purpose of processing. Here, pseudonymization is a measure for implementing what is stipulated by law and an expression of the parsimonious approach to personal data. Continuing in this vein, "privacy by design", detailed in Article 25,[6] is a data protection principle that calls for data minimization to be put into practice when the means for processing personal data are chosen and also when technical and organizational measures are used to perform the actual processing. Pseudonymization is one of the options at these two junctures. Following the "privacy by design" principle, pseudonymization ensures that it is possible to uncouple personal information from other data at an early stage. This can facilitate effective and comprehensive protection for the individuals concerned.

According to the GDPR, the security of personal data processing, expressed in Recital 83 by measures for protecting data from its unintentional or unlawful destruction, loss, modification, unauthorized disclosure, or unauthorized access, can also involve pseudonymization. In other words, pseudonymization would be part of a data

[6] See Art. 25.
[7] Sentence 2 of Recital 28 states: "The explicit introduction of 'pseudonymization' in this Regulation is not intended to preclude any other measures of data protection."

protection strategy[7] which can be put into practice via a catalog of technical and organizational measures. In this case, the personal data should be subject to protection that is commensurate with the risk (see Article 32, first sentence, first half-sentence).

Pseudonymization also affords protection according to Recital 28 by reducing risks for people whose data is being processed. Details of these "risks" are provided by Recital 75's information on infringements of the personal data's protection. Such infringements can be described as "data breaches". For example, risks can take the form of physical, material, or non-material damage, such as identity theft or fraud, financial loss, or reputational damage. Pseudonymized data can reduce these risks by making it impossible or difficult to identify the relevant people in the event of data being lost or stolen. EU lawmakers have therefore identified particular information obligations, as outlined in Recital 85, should pseudonymization be breached. The controller for the data must therefore act immediately if the protective function is compromised.

3.3.2. Enabling and easing function
If the purpose of processing does not require directly identifying data subjects but anonymization is not an option, pseudonymization can be used to protect the data subjects. For anyone involved in processing who does not have access to the key, identifying the sources is just as impossible as in the case of anonymized data, because the additional information necessary for attributing personal data to specific sources is stored separately.

Given the GDPR's risk-based approach, it is therefore justified that pseudonymization also benefits the controller. Pseudonymization can permit it to process data in ways that would otherwise not be permitted, something that is particularly important now, in the era of big data and the internet of things.

Article 6.4 describes an important example of this and applies to processing for a purpose other than that for which the personal data have been collected. Whether a new purpose is compatible with the original purpose and whether further processing may therefore use the same legal basis require careful consideration. Article 6.4 lists several criteria that must be taken into account. The purposes are compatible if there are suitable guarantees,[8] and pseudonymization can be one of these guarantees. Pseudonymization is normally not the sole factor, but it can play the most important role in the decision to permit further processing.

A special case regarding the compatibility of processing with the original purpose is outlined in Article 5.1.b: processing for archiving purposes in the public interest, scientific or historical research purposes, or statistical purposes. These activities are compatible with the original purpose if the requirements outlined in Article 89.1 are adhered to. This necessitates suitable measures to protect the rights and freedoms of the data subject. Pseudonymization can be one of these measures provided it makes fulfilling the above-mentioned goals possible (Article 89.1, second sentence). Pseudonymization can therefore make statistics and research undertakings possible.

In addition to the cases explicitly named in the Regulation, pseudonymizing data can play a role in connection with provisions requiring the consideration of interests. For example, Article 6.1.f permits data processing necessary to safeguard the controller's legitimate interests, provided these are not overridden by the interests or fundamental rights and liberties of the data subject which require personal data to be protected. When weighing up different interests, it can be of benefit to the controller if it has pseudonymized the data.[9]

Pseudonymizing data can free the controller from certain data protection obligations or minimize these obligations. For example, Article 11.1 limits the obligation to uphold the rights of the data subject. If identifying the data subject is not or no longer necessary when a controller processes data, the controller is not obligated to maintain, acquire, or process further information for identifying the data subject simply to comply with the Regulation. This satisfies the principle of data minimization as per Article 5.1.c. This changes the requirements regarding fulfillment of the data subject's rights as per Article 11.2.

Just how much pseudonymization can contribute to the enabling and easing function is something that must be assessed differently in different sectors and processing situations. It can therefore make sense to define pseudonymization rules for different situations in sector-specific codes of conduct. Article 40.2.d expressly addresses this.

[8] Article 6.4.e; Article 29 Data Protection Working Party, WP203, pp. 25 f., and Monreal, ZD 2016, 507, 511, put forward the claim that suitable guarantees can make up for deficiencies arising from the application of other consideration criteria.

[9] For example, see Buchner/Petri in Kühling/Buchner, GDPR, Article 6, marginal no. 154.

4. Techniques, and technical and organizational requirements

Different techniques can be used to implement pseudonymization. For example, it is possible to use an allocation table that links every plaintext item of data to one or more pseudonyms. Anyone with access to the allocation table can link a pseudonym to the associated plaintext item of data by scanning the relevant entries. Access to the allocation table should therefore be restricted. Alternatively, pseudonymization can entail the use of different cryptographic techniques which transform a plaintext item of data into one or more pseudonyms. Access to the cryptographic keys and, if necessary, other parameters can then be used to manage or restrict the reversibility of the pseudonymization process. The following sections take a particularly close look at cryptographic processes and their uses in connection with pseudonymization.

4.1. Cryptographic basics and processes

This section provides an overview of the main terms associated with cryptographic processes and their uses in connection with pseudonymization.

A one-way function is an "easily" computed mathematical function that is "difficult" to invert. This means that it is "easy" to compute the functional value of an input, but it is hard to take a functional value and identify an input that produces it. Here, "easy" and "hard" are to be understood in the sense of computational complexity theory. Simplifying greatly, "hard" can be rephrased as "practically impossible within a suitable length of time".

A cryptographic hash function is a one-way function that is collision-resistant. It assigns a hash value of fixed length to an input of any length. Collisions exist because the input space is larger than the output space. However, collision resistance makes it practically impossible to actually calculate such collisions, i.e. different inputs with the same hash value. An example of a cryptographic hash function is SHA-256, which computes hash values with a length of 32 bytes.

An encryption process uses a key to transform plaintext into ciphertext. Decryption is the inverse process of transforming the ciphertext back into the original plaintext. Symmetric encryption processes use the same key for encryption and decryption. Asymmetric encryption processes use a pair of keys, one public and one private. The public key is used for encryption, while the private key is used only for decryption.

Encryption processes are normally deterministic, i.e. the same plaintext is transformed into the same ciphertext (using the same key). If an encryption process generates different ciphertexts each time while using the same key, it is described as probabilistic. Every deterministic encryption process can be converted into a probabilistic one, for example by attaching a new random value to the plaintext before deterministic encryption and removing the random value after deterministic decryption.

4.2. Requirements regarding pseudonyms

In order to process pseudonymized data, it may be necessary for the generated pseudonyms to contain certain features of the underlying plaintext. These features are determined in advance and then made available via the pseudonyms, while further information about the plaintexts remains hidden. They are described as availability requirements for pseudonyms. Pseudonyms which satisfy specific availability requirements are described below as pseudonyms with availability options. Pseudonymization safeguards the confidentiality of the protected data and so safeguards the privacy of the data's owner. A pseudonym's availability options make part of the information in the underlying data available, thereby ensuring a degree of usability (utility) despite confidentiality. These options therefore provide "utility despite privacy".

4.2.1. Availability requirements

One availability option is the disclosability of the plaintext underlying the pseudonym in certain situations. This can be achieved by generating the pseudonym by encrypting the plaintext. If someone knows the technique and the key used, he/she can then decrypt the pseudonym and so disclose the underlying plaintext. This disclosure may be bound to the fulfillment of a specific purpose or to usage by someone with a specific role.

Another availability option is linkability regarding a relation r. If linkability exists regarding a specific relation, someone can take two pseudonyms and assess whether the underlying plaintexts are related in that sense. Linkability in terms of equality is a simple example: Pseudonyms that meet this availability option make it possible to check whether the plaintexts underlying two pseudonyms are identical. Linkability can also be bound to specific roles or purposes.
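The equality-linkability option can be illustrated with a short sketch. The following Python fragment (with hypothetical customer identifiers) uses the SHA-256 hash function mentioned in Section 4.1 to derive pseudonyms; identical plaintexts yield identical, and thus linkable, pseudonyms:

```python
import hashlib

def pseudonymize(plaintext: str) -> str:
    """Derive a fixed-length pseudonym from a plaintext identifier
    via the one-way hash function SHA-256 (32-byte output)."""
    return hashlib.sha256(plaintext.encode("utf-8")).hexdigest()

# Hypothetical identifiers: deterministic hashing yields linkable
# pseudonyms -- identical plaintexts map to identical pseudonyms.
p1 = pseudonymize("customer-4711")
p2 = pseudonymize("customer-4711")
p3 = pseudonymize("customer-0815")

assert p1 == p2                      # linkability in terms of equality
assert p1 != p3                      # different plaintexts, different pseudonyms
assert len(bytes.fromhex(p1)) == 32  # SHA-256 output length: 32 bytes
```

Because no key is involved here, such pseudonyms are non-disclosable by honest parties; disclosure would require an allocation table or, for small input spaces, a brute-force search.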

It is possible to use technical or organizational methods to enforce the purpose binding of availability options.

In the case of technical purpose binding, purely technical methods are used to ensure such a purpose limitation of an availability option without the need for a person to perform any action. These technical methods can include checking system values which characterize the purpose, such as the system time, before approval of an availability option. Using threshold schemes is another example: If specific events documented in the data sets occur very frequently, the relevant availability option is approved only when a fixed frequency threshold is exceeded.

In the case of organizational purpose binding, one or more natural persons are given the power to approve certain availability options. This means that a natural person interacts with the system to assess whether the purpose applies. To establish an organizational purpose limitation, the relevant pseudonyms can be additionally encrypted using a key known only to that person.

The advantages of technical over organizational purpose binding deserve emphasis. In contrast to organizational purpose binding, no user interaction with the system is necessary. This limits the potential for the exploitation of power. The underlying trust model of a technical purpose limitation can be derived from the security of the implemented technology. A purely technical implementation can ensure the real-time, automatic enforcement of purpose limitation.

4.3. Examples of pseudonymization techniques for implementing availability requirements

This section describes examples of creating pseudonyms with specific availability options. The equality of the plaintexts underlying the pseudonyms is considered as the relation for data linkability.

4.3.1. Linkable, disclosable pseudonyms

Deterministic encryption algorithms can be used to create linkable, disclosable pseudonyms. Linkability (even without knowledge of the encryption or decryption key) is established because these algorithms assign identical plaintexts to identical ciphertexts (pseudonyms). Disclosability is established when the decryption key is known.

4.3.2. Non-linkable, disclosable pseudonyms

Non-linkable but disclosable pseudonyms can be created by means of probabilistic encryption techniques. Pseudonym linkability is not established because probabilistic encryption algorithms map identical plaintexts onto non-identical ciphertexts (pseudonyms). Disclosability is established when the decryption key is known.
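The contrast between these two variants can be sketched as follows. The toy XOR cipher below (a keystream derived with SHA-256; an illustration only, not a production-grade encryption process) behaves deterministically when a fixed value is attached and probabilistically when a fresh random value is used, exactly as described in Section 4.1:

```python
import hashlib
import secrets

def keystream(key: bytes, value: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key || value || counter), truncated."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + value + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(key: bytes, data: bytes, value: bytes) -> bytes:
    """XOR with the keystream; applying it twice decrypts again."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, value, len(data))))

# Deterministic use: a fixed attached value -> identical plaintexts
# yield identical pseudonyms (linkable, disclosable for key holders).
FIXED = b"\x00" * 8
assert encrypt(b"key", b"alice", FIXED) == encrypt(b"key", b"alice", FIXED)

# Probabilistic use: a fresh random value per pseudonym -> identical
# plaintexts yield different pseudonyms (non-linkable, but still
# disclosable by whoever knows the key and the stored random value).
r = secrets.token_bytes(8)
pseudonym = (r, encrypt(b"key", b"alice", r))
assert encrypt(b"key", pseudonym[1], pseudonym[0]) == b"alice"  # disclosure
```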

4.3.3. Linkable, non-disclosable pseudonyms

Deterministic one-way functions (e.g. cryptographic hash functions) can be used to create linkable but non-disclosable pseudonyms. Linkability is established because deterministic algorithms map identical plaintexts onto identical result values (pseudonyms). A one-way function is used to prevent the reversal of pseudonymization, i.e. disclosure.

4.3.4. Role binding

Availability options can be linked to roles by means of an additional probabilistic encryption of the pseudonym, with the corresponding decryption key assigned only to the respective role. This ensures that only the role in question has access to the linkable or disclosable pseudonym. It is thus possible to restrict disclosure of pseudonyms by ensuring that the decryption key necessary for disclosure is assigned solely to a particular role.

4.3.5. Organizational purpose binding

Linking a role to the person assessing the purpose of a data processing operation can establish an organizational purpose limitation regarding the availability options of pseudonyms. In the case of disclosability of pseudonyms bound to a specific purpose, an exceptional situation can be recognized on the basis of the existing pseudonym linkability, and corrective measures based on the plaintext can be initiated. Correspondingly, a person assessing the purpose can restrict the disclosability of pseudonyms to a predefined exceptional situation by using the decryption key only he/she knows to decrypt the pseudonym.

4.3.6. Technical purpose binding

Technical purpose binding can also restrict data usage to exceptional situations, for example the frequent occurrence of data sets with pseudonyms representing the same plaintext. Such an occurrence would be connected to a frequency threshold. In this context, a cryptographic secret sharing scheme10 can be used which manages a secret key for every plaintext value and adds a unique partial key derived from the associated secret key to the pseudonym every time a plaintext value is pseudonymized. If the number of pseudonyms related to a plaintext exceeds the defined threshold, the partial keys can be used to determine the secret key, and decryption can be performed.
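The threshold mechanism just described can be sketched with Shamir's secret sharing scheme from the footnoted paper. In this simplified Python sketch (small prime field and illustrative parameters, not a hardened implementation), a secret key is split so that any three shares reconstruct it, while fewer reveal nothing:

```python
import random

P = 2**61 - 1  # prime modulus of the underlying finite field

def make_shares(secret: int, threshold: int, count: int):
    """Shamir's scheme: evaluate a random polynomial of degree
    threshold-1 with constant term `secret` at x = 1..count."""
    coeffs = [secret] + [random.randrange(P) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, count + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's method
            y = (y * x + c) % P
        shares.append((x, y))
    return shares

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the constant term."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = (num * -xj) % P
                den = (den * (xi - xj)) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

# Each pseudonymization of the same plaintext value would append one
# fresh share of that value's secret key to the pseudonym; once the
# defined threshold of pseudonyms exists, the key can be determined.
key = 123456789
shares = make_shares(key, threshold=3, count=5)
assert reconstruct(shares[:3]) == key  # threshold reached
assert reconstruct(shares[2:]) == key  # any three shares suffice
```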

10 See A. Shamir: How to share a secret. In: Communications of the ACM, Vol. 22, ACM, 1979, pp. 612-613.

4.4. Technical and organizational requirements

The technical and organizational requirements called for in the GDPR can be elaborated as follows:

n Pseudonymization requires the use of state-of-the-art techniques (see the BSI [German Federal Office for Information Security] or ENISA guidelines on cryptographic procedures for examples). These techniques must be replaced from time to time by current techniques (particularly in the case of pseudonymized data intended for long-term use). This provides a very high level of security.
n Pseudonymization must be performed as early as possible when processing data.
n In the case of plaintext data from small value ranges or with limited diversification within a range, pseudonymization processes are prone to pseudonym disclosure through brute-force attacks, for example using rainbow tables. In these attacks, the attacker calculates a plaintext-pseudonym allocation table for all (few) possible plaintexts or uses a previously created table. The table can then be used to determine the corresponding plaintexts for given pseudonyms. Salt values can be used to make things difficult for the attacker and so reduce the risk. Before running the pseudonymization process, a salt value is selected according to the context (e.g. for each data set) and combined with the plaintext value. This enlarges the value range and value diversification, and it is no longer possible to use allocation tables calculated or created in advance, as they cannot factor in the salt value combined with the plaintext. If the same pseudonyms need to be created for the same plaintexts, the same salt values must be used and stored in a suitable manner.
n State-of-the-art technical and organizational measures must be used when creating and managing (incl. distributing, storing, using, deleting) secret parameters (keys and salt values).
n Depending on the use case, suitable intervals (based on time or data volume) must be defined for changing the secret parameters (salt values and keys) used.
n Access to salt values and keys must be restricted to an absolute minimum of trustworthy users (need-to-know principle).
n The pseudonymization concept must be integrated into an IT security management system (e.g. as per ISO/IEC 27001) to prevent unauthorized access to pseudonymized data.
n Pseudonymized data must be deleted in line with data protection regulations when the purpose for processing no longer exists.
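The salt requirement can be sketched in Python. The example assumes a small value range (hypothetical four-digit customer numbers), for which an unsalted hash would be trivial to brute-force:

```python
import hashlib
import secrets

def pseudonymize(plaintext: str, salt: bytes) -> str:
    """Combine a context-specific salt with the plaintext before
    hashing, so that precomputed allocation (rainbow) tables for the
    small plaintext range no longer apply."""
    return hashlib.sha256(salt + plaintext.encode("utf-8")).hexdigest()

# A four-digit customer number has only 10,000 possible values and
# could be brute-forced without a salt.
salt = secrets.token_bytes(16)  # must be stored (access-restricted,
                                # need-to-know) for reproducibility
p = pseudonymize("0042", salt)

assert pseudonymize("0042", salt) == p                     # same stored salt: reproducible
assert pseudonymize("0042", secrets.token_bytes(16)) != p  # new salt: new pseudonym
```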

5. Transparency and data subjects' rights regarding pseudonymized data

5.1. General transparency requirements

Transparency must be established concerning data processing, the processed data, the purpose and procedure of data processing, and many other issues concerning data processing (see in particular Articles 13.1 and 13.2, 14.1 and 14.2, and 15.1 and 15.2). The goal of establishing transparency is to resolve the uncertainties of data subjects regarding data processing, to give them the tools to protect their data themselves, and to enable them to exercise their rights in the context of personal data processing. This must be done in a concise, transparent, intelligible, and easily accessible form, using clear and plain language (Article 12.1, first sentence). Suitable icons that are easily understood by the data subjects can be used in the context of the information obligations as per Articles 13 and 14 (Article 12.7).

5.2. Special conditions regarding pseudonymized data

5.2.1. Information obligations (GDPR Articles 13 and 14)

5.2.1.1. Initial situation

Basic information obligations as per Articles 13 and 14 entail virtually no special conditions concerning the processing of pseudonymized data. Information as per Articles 13.1 and 13.2 must be provided to the data subject "when personal data relating to the data subject are collected from the data subject". Information as per Articles 14.1 and 14.2 shall be provided "within a reasonable period after obtaining the personal data" (see Article 14.3.a). Normally, data will not yet be available in pseudonymized form when collected, permitting easy notification of the data subject. Similarly, the situation in Article 14.3.b, in which the personal data is to be used to communicate with the data subject (e.g. in the case of direct marketing), does not give rise to any special conditions, as communicating "with" the data subject obviously precludes eliminating the data's assignment to the data subject.

5.2.1.2. Special situation: Disclosing pseudonymized data to a third party

In contrast, problems could arise in connection with the situations described in Article 14.3.c and (in particular) in the event of further processing for another purpose as per Articles 13.3 and 14.4. If the controller performed pseudonymization before disclosure to another recipient or before the intended further processing, the recipient can no longer easily inform the data subject, as it cannot identify this person without the addition of further information. In such situations, it is recommended that the third party processing pseudonymized data provide general information via its own website stating that it processes pseudonymized data.

Information as per Articles 13 and 14 of the GDPR does not need to be provided on an individual-by-individual basis: Instead, this can be handled in advance via a website, basic contractual details, or notices, as follows in particular from Recital 58, sentences 1-3. Another factor that rules against sending individual information is that such notification (e.g. via e-mail) would overburden the data subject with information. Most data subjects would therefore not take any notice of the information they receive. Again in connection with Articles 13.3 and 14.4, if the decision is taken at a later date to make further use of the data and the data subject therefore did not receive notification when his/her data was collected or initially used, the data subject can be informed via publicly accessible sources (e.g. campaigns on a website or in other media).

5.2.2. Data subjects' rights as per Articles 15-22 and 34

The following conditions apply when satisfying the information obligation in Article 15 and data subjects' rights as per Articles 16-22 after pseudonymization of data. Special rules apply if the controller is not or no longer required (Article 11) to identify the data subject, something that is frequently the case following pseudonymization.

a) As per Article 11.1, the controller is not obligated to maintain, acquire, or process additional information in order to identify the data subject for the sole purpose of complying with the GDPR.

b) If the controller can prove that it is not able to identify the data subject, it must inform the data subject of this (Article 11.2) if possible (i.e. normally in connection with data subject rights requiring an action to be taken by the data subject).

In situations a) and b), data subject rights as per Articles 15-20 do not apply. This pertains to rights concerning information, corrections, completion, deletion

(except when the purpose of processing no longer exists, in which case the data must be deleted), restrictions on processing, notification, and data portability. This does not apply only if the data subject successfully facilitates identification by providing additional information (Article 11.2, second sentence, last half-sentence).

Data subject rights as per Articles 21 and 22 apply instead.11 Accordingly, the data subject can continue to file an objection to the following types of data processing12 (Article 21):

n Processing of pseudonymized data on the basis of legitimate interest (as per Article 6.1.f), which also comprises profiling based on legitimate interest,
n Processing required to perform a task in the public interest or in the exercise of official authority vested in the controller (Article 6.1.e),
n Processing for scientific or historical research purposes, or for statistical purposes as per Article 89.1 (Article 21.6),
n Processing for direct marketing purposes (Article 21.2).

It is unclear what role the right to object to direct marketing plays in the context of pseudonymization. If direct marketing is understood as an activity whereby a specific data subject is contacted in person (letter or e-mail), such a situation does not arise in the case of pseudonymized data. Therefore, advertising facilitated by tracking technology (cookies, mobile identifiers, fingerprinting, etc.) is not subject to the right to object laid out in Article 21.2, as such technology does not enable an individual to be identified. A data subject would still be able to object as per Article 21.1 to advertising using pseudonymized data. In such a situation, the difference to Article 21.2 is that the latter features no further prerequisites, whereas the data subject must have particular reasons when making an objection as per Article 21.1. The various interests must be weighed up.

Furthermore, the data subject has the right not to be subject to a decision based solely on automated processing (incl. profiling) which produces legal effects concerning him or her or similarly significantly affects him or her (Article 22).

If the data subject's objection is successful, his/her original data set is blocked: It cannot be processed for the purpose to which he/she has objected. This blocking is unproblematic if the objection takes place before pseudonymization. If the objection takes place after pseudonymization, the controller must first reidentify the data subject before it can perform the necessary blocking process. However, it is normally not possible to remove the data set from data that has already been pseudonymized, as this normally involves large-scale automated data processing.

5.2.3. Transferring pseudonymized data to third parties

If the controller transfers pseudonymized data to a third party, the recipient must check if the data can be considered unidentifiable data as per Article 11. Even more than the initial controller, the data recipient has the problem of being unable to comply with data subjects' rights, because it cannot establish a link between the data and a person who wants to exercise his/her rights.

Article 34 deals with another data subject right: The controller must immediately inform a data subject if a personal data breach is likely to result in a high risk to the rights and freedoms of natural persons. Subject to a review of the specific case, it is to a degree likely that the breach will not result in a high risk for the data subject if the controller has already performed pseudonymization. For example, in the event of data theft, the culprit will in the majority of cases be unable to identify the data subjects. It is recommended that the controller notify the data subjects via specified channels. Individual notification is not considered necessary.

To sum up: It is recommended that controllers planning to further process pseudonymized personal data make this processing transparent to data subjects in advance. They should allow for a right to object which permits the future exclusion of a data subject's data from pseudonymized further processing after the data subject has exercised his/her right to object.

5.3. Tracing results to a person

Tracing pseudonyms based on more complex pseudonymization processes (e.g. using a recognized hashing algorithm) back to a natural person is only possible for whoever has the key used for pseudonymization.

If the data is to be traced back to a specific person and this tracing does not serve to comply with rights vis-à-vis the data subject, it requires the data subject's consent. The data processing controller has this key and must use it if the data subject exercises his/her rights. Depending on the data subject's specific choices, the controller must then provide information about the data, correct the data, or delete it. This may require the involvement of a service provider that processes data on behalf of the controller.

11 Unlike Article 11.2, Article 12.2 refers not only to data subjects' rights in Articles 15-20 but also to data subjects' rights in Articles 21-22. According to Article 12.2, first sentence, the controller therefore also seems to have the right to refuse to act in the event of objections (Article 21) and automated individual decision-making (Article 22). Articles 11.2 and 12.2 are clearly contradictory.
12 This applies to personal data processing and pseudonymized data processing.

6. Application scenarios

6.1. Pseudonymization and Entertain TV (DTAG)

Deutsche Telekom markets access to television programs and films via the internet under the product name Entertain TV. The company provides customers with a set-top box for using the product. Statistics on viewers' habits are maintained for various purposes, including obligations vis-à-vis broadcasters.

6.1.1. Overview

The following diagram shows the data flow through the individual systems, from the set-top box to the statistics.

[Diagram: Entertain usage data (sequences of numbers) flows from the set-top box through a pseudonymization stage (salt/pepper) into the DWH Usage analytics tool, which feeds the statistics database. Reference data is taken from the TV archive. Maximum retention periods: usage data 7 days, salt/pepper 3 months, statistics database 24 months.]

6.1.2. Data generation

Using the set-top box, e.g. when the consumer uses the system's remote control, generates a range of events depending on what button was pressed and the relevant context. These events form the basis of the analyses, which document activities such as activation/deactivation, channel changes, information about the programs watched, information about users' recording activities, or information about users watching recorded programs. The corresponding event data sets contain a range of information, e.g. about the set-top box (device ID), the customer's account ID, date/time, and other specific subjects.

6.1.3. Pseudonymization

The account ID is a pseudonym for the customer, and the device ID is a pseudonym for the associated set-top box. The event data sets required for analysis purposes do not contain any attributes featuring (immediate) personal data. Managed separately in organizational terms, there are allocation tables which permit the pseudonyms (account and device IDs) to be linked to customers or set-top boxes. Access to these tables would make it possible to ultimately trace the device and account IDs back to the customers. Tracing is sometimes necessary, e.g. for billing services as per the contract.

However, as no party wants to trace details back to plaindata for the purpose of generating statistics, account and device IDs are subjected to additional pseudonymization before processing. In this instance, pseudonymization takes place within the Data Warehouse Acquisition Layer (DWH AcL) unit, and processing (statistics generation) takes place in Data Warehouse Usage (DWH Usage), a separate unit (see chart below).

The underlying pseudonymization process means that statistics are generated using pseudonyms that are linkable but non-disclosable for DWH Usage, the unit involved in processing. The pseudonyms are created using a deterministic cryptographic hashing process (SHA-512) and a uniform pseudonymization salt/pepper component. Pseudonym linkability is established because deterministic processes map identical plaintexts onto identical result values (pseudonyms) when the salt/pepper is identical. The output length of SHA-512 means that the risk of collisions is negligible (< 10⁻⁷⁰). As the DWH Usage department does not have access to the pseudonymization salt/pepper, it cannot trace pseudonyms back to people and so disclose plaintexts. This also rules out circumventing the one-way function by creating a look-up table, which would otherwise be possible due to the limited number of possible input values if no salt/pepper were used.

Ultimately, the pseudonymization process used means that no Deutsche Telekom Group employee can view, analyze, or pass on to anyone else information about a specific customer's usage behavior.
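The described process can be sketched as follows. The IDs and the salt/pepper value are hypothetical stand-ins; in the real system the salt/pepper is issued and held by a separate unit:

```python
import hashlib

PEPPER = b"issued-by-technical-operations"  # hypothetical salt/pepper value

def pseudonymize_id(raw_id: str) -> str:
    """Deterministic SHA-512 over salt/pepper + ID: identical IDs map
    to identical pseudonyms (linkable), while a unit without the
    salt/pepper cannot build a look-up table to disclose them."""
    return hashlib.sha512(PEPPER + raw_id.encode("utf-8")).hexdigest()

# A hypothetical event data set before and after pseudonymization.
event = {"device_id": "STB-0001", "account_id": "ACC-4711", "channel": 7}
pseudonymized = {
    "device_id": pseudonymize_id(event["device_id"]),
    "account_id": pseudonymize_id(event["account_id"]),
    "channel": event["channel"],  # non-identifying payload is kept
}

assert pseudonymize_id("STB-0001") == pseudonymized["device_id"]  # linkable
assert len(pseudonymize_id("STB-0001")) == 128  # 64-byte SHA-512 value, hex-encoded
```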

The following chart outlines the underlying organizational and authorization concept, and the various units involved.

Organizational and authorization concept

[Chart: Organizational separation of the units involved. Products & Innovation provides the account ID and device ID from the Microsoft (old) and Huawei (new) set-top boxes; it has no access to DWH AcL and does not know the salt/pepper. Within T-Systems IT-BD, DWH AcL hashes salt/pepper + ID (only the application reads the account and device IDs; no functional users), while the DWH Usage department only sees statistics based on aggregated, hashed information; it has no access to DWH AcL or the salt/pepper, so disclosure is not possible and there is no link to the ID. T-Systems ICTO Technical Operations AcL issues the salt/pepper and has no access to the other systems.]

6.1.4. Statistics generation

All attributes traceable to the end customer are pseudonymized using an account or device ID. Evaluations are possible because linkable pseudonyms are used. For example, this means that it is possible to answer the question of how many households or set-top boxes watched a certain channel at a certain time. The anonymous statistics no longer contain account and device IDs or the generated pseudonyms, which prevents the statistical figures from being traced back to the hashed IDs.
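How linkable pseudonyms support such counts can be sketched with a few hypothetical, already pseudonymized viewing events:

```python
# Hypothetical pseudonymized viewing events: (pseudonym, channel).
events = [
    ("pseud_a", "Channel 1"),
    ("pseud_a", "Channel 1"),  # same household twice: counted once
    ("pseud_b", "Channel 1"),
    ("pseud_c", "Channel 2"),
]

# Linkability allows counting distinct households per channel without
# knowing who is behind any pseudonym.
households_per_channel = {}
for pseudonym, channel in events:
    households_per_channel.setdefault(channel, set()).add(pseudonym)

counts = {channel: len(p) for channel, p in households_per_channel.items()}
assert counts == {"Channel 1": 2, "Channel 2": 1}
```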

Deutsche Telekom must meet certain obligations towards broadcasters, so it transfers only anonymized statistics on viewers' usage behavior, e.g. market share, using relative figures as illustrated in the following chart.

[Chart: Relative market shares (0% to 14%) of 18 broadcasters for the months August to November 2016.]

6.1.5. Opt-out

Data protection notifications inform every Entertain customer that data is gathered for statistical purposes. Deutsche Telekom informed customers of this via e-mails and pop-ups in Entertain itself before introducing this analysis solution.

Every customer has the option of objecting (opt-out) to his/her pseudonymized usage data being collected and analyzed. He/she can use his/her set-top box to perform this opt-out. This used to require the input of a PIN number with the previous product (the old Microsoft set-top box), but it is no longer necessary with the new product (the new Huawei set-top box). By opting out, the customer's usage data is used neither for a pseudonymized usage profile nor for anonymous statistics. Customers can also use conventional communication channels to inform Telekom that they want to exercise their opt-out right.

6.2. Direct marketing

6.2.1. Promotional campaign

The objective of an advertising campaign is to deploy advertisements within a certain period of time in a certain target market or for a certain target group in order to increase sales or increase awareness of a product or service. Analyses based on pre-existing data are necessary to identify a target market or target group for these campaigns. If an advertiser's existing data pool is not sufficient, this data is augmented with additional characteristics. The desired characteristics that were previously not present, e.g. age and estimated income bracket, are often obtained from external sources.

To select a target group, a range of characteristics such as age, previous purchasing behavior, etc. can be used to assess the likelihood of members of this group buying a certain product. Both offline (direct mailing, phone-based advertising, and e-mail advertising) and online options are available when defining target groups according to their affiliation with a specific customer segment. Pseudonymized data plays an important role in campaigns of these kinds. Further details are given below.

6.2.1.1. Using pseudonymized data for comparing and matching data with external sources – creating an analysis database

When augmenting existing data with externally sourced data, the first step is to compare the databases of the advertiser and the data provider and then match them. The relevant steps in this process are as follows:

1. A "data hygiene process" (standardization, homogenization, parsing, checking for duplicates, etc.) is applied to both databases to correct the characteristics which directly identify data subjects (e.g. name and address) and to align their quality levels before saving.

2. The same algorithm is used to pseudonymize the processed name and address fields in both databases to form a linkable ID number. The deployed pseudonymization process ranges from a simple "memnum", i.e. a 17-figure number consisting of the first three letters (distributed throughout the memnum) of the first name, last name, street, and house number, and the zipcode, to a proprietarily developed pseudonymization algorithm. The system then deletes the direct identifiers.
3. The ID numbers are used to compare the databases of the advertiser and the data supplier against each other. If the same pseudonyms (ID numbers) are found ("linkability"), the relevant data sets can be matched. A company can use this procedure to store additional characteristics from an external source in its marketing file.

The result is an analysis database comprising in-house and purchased characteristics and ID numbers, but no names or addresses. This database can be used to create a customer profile (e.g. age group 65+ and living in a city). This information can then be used to address people with similar profiles (e.g. by selecting only people aged 65+ and living in cities from the consumer database).

6.2.1.2. Interest-based advertising: Using pseudonymized data for selecting data
1. The advertiser's analysts select the ID numbers for a promotional campaign from the in-house analysis database. They are selected according to a simple combination of characteristics (e.g. age 65+ and city = "Frankfurt am Main" or "Berlin" or "Munich" or "Stuttgart"). However, more complex statistical models can also be used.
2. The selected ID numbers are matched with the reference file which stores the plain data (name, address, etc.) and ID numbers.
3. The advertising material is then dispatched to the selected customers (via e-mail, direct mailing, display, etc.).
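The matching steps above can be sketched in a few lines. This is a minimal illustration, not the white paper's actual algorithm: the field fragments are simply concatenated rather than distributed throughout the number, and a hashing step is added (my assumption, not prescribed by the text) so the ID itself reveals no name fragments. All function and variable names are hypothetical.

```python
import hashlib

def normalize(s: str) -> str:
    # Step 1 "data hygiene": standardize spelling and case before key building.
    return s.strip().upper().replace("ß", "SS")

def memnum_like_id(first: str, last: str, street: str, house_no: str, zipcode: str) -> str:
    # Simplified memnum-style match key: fragments of the direct identifiers
    # are combined into one linkable token (3+3+3+3+5 = 17 characters).
    raw = (normalize(first)[:3] + normalize(last)[:3] +
           normalize(street)[:3] + normalize(house_no)[:3] +
           normalize(zipcode)[:5])
    # Added safeguard: hash the key so the ID no longer reveals name fragments.
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:17]

# Advertiser and supplier databases, pseudonymized with the same algorithm,
# can then be matched on the ID ("linkability", step 3):
advertiser = {memnum_like_id("Fritz", "Müller", "Hauptstr.", "1", "50667"): {"age_group": "65+"}}
supplier   = {memnum_like_id("Fritz", "Müller", "Hauptstr.", "1", "50667"): {"affinity": "garden"}}

merged = {pid: {**advertiser[pid], **supplier.get(pid, {})} for pid in advertiser}
```

Because both parties apply the same normalization and algorithm, minor spelling differences (case, stray spaces) no longer prevent the link, which is the point of the data hygiene step.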

6.2.1.3. Gauging the success of a campaign with pseudonymized data
A promotional campaign's success is normally evaluated after it finishes. Reactions to the campaign, i.e. information requests or purchases, are saved, and the advertising message and the success of the advertising channel are evaluated. Reactions are saved as raw data in the advertiser's CRM system for future analyses. Just as when creating an analysis database, the direct identifiers are removed before the data is made available to the analysis unit.

6.2.2. Data marketing on behalf of an agency/data broker
If commercial data is marketed, the data owners (the entities responsible for collecting, processing, and using the data) integrate a control mechanism so that every time data is to be deployed or sold, the action may only be performed by the commissioned order data marketer (agency/broker) and only with the data owner's knowledge. This ensures that the necessary licensing fee is paid correctly. Data marketing usually consists of the following steps:
1. The data owner concludes an agreement with the agency/data broker regarding order data processing.
2. A file with selection characteristics and pseudonyms (ID numbers) but without direct identifiers is made available to the agency/data broker.
3. The agency/data broker selects the pseudonyms (ID numbers) using the selection characteristics for its customers and sends this selection to the data owner.
4. The data owner transfers the data straight to the customers of the agency/data broker, which contacts its end customers in writing (or it uses a lettershop for sending printed advertising to end customers).

6.2.3. Using display advertising
For online services such as publishers, social media, and e-retailers, advertising is an important revenue generator. In some sectors, it is the sole source of income, and it enables internet users to use services free of charge. Advertisements should be of relevance to users and not seem like a nuisance. The following example is one of the possible variants of interest-based display advertising, i.e. when an advertisement appears automatically on the screen of a user's device.
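The division of labor in section 6.2.2 can be sketched as follows, with hypothetical data and names: the broker only ever sees pseudonyms plus selection characteristics, while the resolution back to contact data stays with the data owner.

```python
# Data owner's reference file: pseudonym -> direct identifiers (never leaves the owner).
owner_reference = {
    "ID-001": {"name": "Fritz Müller", "address": "Hauptstr. 1, Köln"},
    "ID-002": {"name": "Hans Huber", "address": "Ring 2, Bonn"},
}

# File handed to the agency/data broker (step 2): no direct identifiers.
broker_file = {
    "ID-001": {"age_group": "65+", "city_size": "large"},
    "ID-002": {"age_group": "30-40", "city_size": "small"},
}

def broker_select(file, **criteria):
    # Step 3: the broker selects pseudonyms matching its customer's criteria.
    return [pid for pid, traits in file.items()
            if all(traits.get(k) == v for k, v in criteria.items())]

def owner_dispatch(selection):
    # Step 4: only the data owner resolves pseudonyms back to contact data
    # and dispatches the mailing, so usage stays visible and billable.
    return [owner_reference[pid]["address"] for pid in selection]

selected = broker_select(broker_file, age_group="65+")
addresses = owner_dispatch(selected)
```

Because every dispatch has to pass through `owner_dispatch`, the data owner is aware of each deployment, which is the control mechanism the text describes.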

1. A user registers on or logs into the online service provider's website.
2. The online service pseudonymizes the user's contact data (P1) by creating a user ID.
3. The online service installs a cookie with the pseudonym/user ID (P1) on the user's computer.
4. In a parallel process, the online service sends the pseudonym/user ID number (P1), the user's name, and the user's address to an agency at regular intervals as part of its order data processing activities.
5. On behalf of the online service, the agency processes the data (pseudonym/user ID number (P1), name, address) provided separately in file form and generates an internal pseudonym (P2) based on this information. The system then deletes all information apart from the two pseudonyms (P1 and P2).
6. The agency now has a file with pseudonyms (P2) and consumers' characteristics (estimated/identified). Using the pseudonym (P2), further data is added to the consumer characteristics (estimated characteristics and affinities) saved by the agency for a user. The pseudonym (P2) is then deleted.
7. The pseudonym (P1), the agency characteristics, and a source ID are made available to a real-time bidding market for selection via an advertising place platform – a demand side platform (DSP), sell side platform (SSP)13, or other. This enables an advertiser such as a garden center to target people with, for example, a house that has a garden. The source ID identifies the online service provider and ensures that the data stocks can be kept apart and that separate invoicing is possible for the online services.
8. If the DSPs/SSPs or another party in the real-time bidding market identifies a cookie with a pseudonym (P1), it can use this cookie to address the relevant user's display in an anonymized but still targeted manner.

Just as with written advertisements, the user can object to interest-based advertising. The sectoral self-regulation of the EDAA (European Digital Advertising Alliance), operated in Germany by the DDOW (Deutscher Datenschutzrat Online-Werbung), offers users this option in the form of an icon.14
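Steps 4 to 6 and 8 can be sketched as follows. The class and field names are hypothetical, and a random internal pseudonym (P2) stands in for whatever derivation an agency actually uses; the point is that identifiers are used only transiently and the bidding market sees characteristics keyed to pseudonyms alone.

```python
import secrets

class Agency:
    """Sketch of the agency's role: map the online service's pseudonym (P1)
    to an internal pseudonym (P2), then discard the direct identifiers."""

    def __init__(self):
        self.p1_to_p2 = {}   # P1 -> P2 mapping (step 5)
        self.profiles = {}   # keyed by P2: estimated characteristics (step 6)

    def ingest(self, p1, name, address):
        # Generate (or reuse) an internal random pseudonym P2 for this user.
        p2 = self.p1_to_p2.setdefault(p1, secrets.token_hex(8))
        # Name and address are used only transiently and never stored:
        # after this call, only the two pseudonyms remain.
        del name, address
        return p2

    def add_characteristics(self, p2, **traits):
        self.profiles.setdefault(p2, {}).update(traits)

    def bid_request_profile(self, p1_from_cookie):
        # Step 8: a cookie carrying P1 lets the bidding market retrieve the
        # profile without ever learning who the user is.
        p2 = self.p1_to_p2.get(p1_from_cookie)
        return self.profiles.get(p2, {})

agency = Agency()
p2 = agency.ingest("P1-cookie-123", "Fritz Müller", "Hauptstr. 1")
agency.add_characteristics(p2, garden_owner=True)
```

The double pseudonymization means a leak of the agency's profile file alone exposes neither names nor the online service's user IDs.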

13 In their role as technical service providers, DSPs help advertisers to find the right advertising space for reaching their target group at predetermined conditions: they are hub platforms that enable the efficient purchase of advertising inventory via a range of channels (ad networks, ad exchanges, etc.). SSPs, by contrast, handle the sale of advertising space.
14 For further information, see http://www.youronlinechoices.com/de/nutzungsbasierte-online-werbung/.

6.3. Pseudonymization in the guidelines on data protection in medical research – TMF's generic solutions 2.0

6.3.1. Overview: TMF's data protection guideline
TMF15 first published its data protection guideline for medical research projects in 2003, and it is now in its second edition (2014). The idea for developing the guideline was inspired by the creation of research networks throughout Germany in the 1990s. These networks were faced with uncoordinated data protection regulations issued by the different states in Germany as well as the national government, all of which were in turn overlaid by EU laws. The guideline's goal was to identify generic solutions for dealing with complex situations.

Medical research requires sensitive data that almost always takes the form of medical details or at least contains this kind of information. Data protection therefore also concerns doctor-patient confidentiality if data is sourced from treatment situations. As medical confidentiality is part of a doctor's professional rights and applies alongside data protection rights, TMF's data protection guideline can only allude to this legal framework to a lesser degree.

The guideline has a modular structure. The individual modules depict typical situations so that these can be used as generic foundations for developing suitable specific data protection concepts. However, there is a certain interdependence between the modules, as the data is often used in different contexts or passed from one context to another.

15 Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V.

[Figure: Medical research areas covered by the TMF data protection guideline's modules – clinical research (controlled clinical trials), patient-distant research, and biobanks, connected by flows of recruitment data, data transfers, samples, results, and analytical results.]

The special feature of the TMF's data protection guideline is the fact that the generic concepts were agreed in extensive negotiations with Germany's data protection authorities, and they have been recommended by the participants at the conference of Germany's state and national data protection officials since 2003. This establishes a level of legal security for researchers, as the TMF's guideline guarantees that a data protection concept meets the data protection requirements. Since TMF's foundation, a data protection working group operating within it has also contributed to supporting research institutes in implementing the guideline. As part of a peer review, a data protection concept is presented to the working group; its members study it, identify weaknesses, and propose additional measures. The concept may then be edited, and the members vote on its conformity. The relevant data protection authority then normally recognizes this vote as fit for purpose.

6.3.2. Pseudonymization as a technical and organizational measure for informational separation of powers
The "informational separation of powers" is a cornerstone of TMF's generic data protection concept. Pseudonymization is the main tool for separating identifying data on the one hand from the resulting research data on the other. It serves as a technical and organizational measure for protecting test persons.

During research activities, data processing is generally performed on a consensual basis, i.e. test persons receive information, normally when the data is collected, explaining what the objectives and risks are. This takes place before the data is used, and it is known as "informed consent". The data can be used while it still reveals a link to the data subject: Anonymization is necessary only in connection with the principle of data minimization but is not required for the data's usage.

However, the GDPR16 interprets pseudonymization differently from the Federal Data Protection Act17. The difference may seem slight, but it has considerable repercussions. While the Federal Data Protection Act merely requires re-identification to be difficult, the GDPR requires it to be impossible, which is identical to anonymization. Achieving de facto anonymity for pseudonymized data sets raises the requirements considerably. Within the context of biomedical research, however, this is unlikely to be possible, as the data sets are very often large and complex and increasingly feature genomic information; de-identifying such data to the point of anonymity would make it practically worthless for research purposes. Furthermore, the data's anonymity is not a feature that is secure in the long term; it depends on a range of factors, including unknown outside additional information, some of which might not become available until some time in the future. As a result, research data is generally not actually anonymous but instead largely de-identified so its usability can be maintained for the purpose in question. Data from which immediate identifiers have been removed normally contains a lot of additional details that, combined, do not prevent re-identification. Re-identification is merely made more difficult, but not de facto impossible. Pseudonymization is, however, just one of the options for protecting test persons. Others are conceivable and, depending on the situation, practical. The question arises here whether a new term needs to be found for data that is "encoded" but is not pseudonymized as per the GDPR.

The following model is based on the Federal Data Protection Act's interpretation of pseudonymization. This model can also be applied in line with the GDPR, but details may need adjusting because the remaining data sets generally do not meet the pseudonymization requirements laid out in the GDPR.

16 Article 4.5 of the GDPR defines pseudonymization as "the processing of personal data in such a manner that the personal data can no longer be attributed to a specific data subject without the use of additional information, provided that such additional information is kept separately and is subject to technical and organizational measures to ensure that the personal data is not attributed to an identified or identifiable natural person".
17 Section 3(6a) of the Federal Data Protection Act states that in pseudonymization, the name and other identifying features are replaced with a code, so as to make it impossible or far more difficult to identify the affected individual.

[Figure: Modules and data flows in the TMF generic concept – the clinical module (practitioner, identity management, clinical DB with IDAT and MDAT), the study module (investigator, study DB with PID and MDAT), the patient list (PID/IDAT), the pseudonymization service (PID/PSN), and the research and biobank modules (research DB with PSN and MDAT, analytics DB with AnaDAT and LabID).]
PID: Patient identifier (pseudonym within the clinical database) · MDAT: Medical data · IDAT: Identifying data of a patient · LabID: Laboratory data/sample number · PSN: Pseudonym in the research context · AnaDAT: Analysis data

A pseudonym must not be revealing: It may not contain elements that reveal identification data, e.g. dates of birth, initials, etc. Pseudonyms must only be allocated using an assignment list or rule, which must be well guarded.

6.3.3. Involving a trustee
Involving a trustee is recommended for complying with the requirement to store data separately as per Article 4.5 of the GDPR. The TMF guideline has considered this useful under existing data protection law. A trustee (TTP) that stores patients' identification data (IDAT) and pseudonyms (PID) must meet certain requirements:
>> It must be an independent legal entity (own permanent legal form).
>> It must have its own separate premises and staff.
>> It must be contractually obligated to implement the data protection concept.
>> It must be otherwise independent.

Order data management by the custodian for the research institute does not meet these requirements.
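The allocation rule described above (non-revealing pseudonyms, issued only via a guarded assignment list) can be sketched as follows. This is a minimal illustration with hypothetical names; in the TMF model the assignment list would be held by the trustee.

```python
import secrets
import string

_ALPHABET = string.ascii_uppercase + string.digits

def allocate_pid(assignment_list: dict, idat: tuple) -> str:
    """Allocate a non-revealing pseudonym (PID) for a patient.

    The PID is drawn at random, so it contains no element of the
    identification data (no initials, no date of birth). The assignment
    list mapping IDAT -> PID is the only way back and must be guarded."""
    if idat in assignment_list:
        return assignment_list[idat]              # pseudonyms stay stable
    while True:
        pid = "".join(secrets.choice(_ALPHABET) for _ in range(8))
        if pid not in assignment_list.values():   # enforce uniqueness
            assignment_list[idat] = pid
            return pid

patient_list = {}
pid = allocate_pid(patient_list, ("Müller", "Fritz", "01.01.1950"))
```

A rule-based alternative (deriving the PID from the IDAT) is also permitted by the text, but it must then be a guarded, non-revealing rule, e.g. a keyed transformation rather than initials plus birth date.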

[Figure: Two models for involving a trustee. In contract data processing (e.g. according to Sect. 11 FDPA), the researcher has the power to instruct the trustee within a working relationship, and both belong to a single sphere of responsibility. Under informational separation of powers (controller-to-controller), researcher and trustee each have their own sphere of responsibility and are bound to rules, e.g. by a cooperation agreement.]

Example: Key elements of the clinical model
>> Central data pool (e.g. EHR repository or data warehouse)
>> Online access to all data for staff performing treatment (IDAT & MDAT)
>> Saving MDAT under a pseudonym (PID)
>> Pseudonym (PID) known only to data warehouse staff and the custodian
>> MDAT and IDAT both visible only at client workstations (in the treatment context) and not on the application server
>> IDAT and PID both visible only to the custodian
>> Access controlled via a one-time token in the treatment context
>> No public use of the data pool (no access and no search function from outside)
>> External research only with exported data

The process of transferring a clinical database into a treatment database that is available to medical staff for further patient treatment and also contains data for research purposes consists of the following steps:

1. Separation of the clinical data into a treatment database and patient list

The clinical database is split up into two parts.

Clinical database:
Name    First name  Date of birth  Anamnesis       Finding              Laborat. 1  Laborat. 2
Müller  Fritz       01.01.1950     Abdominal pain  Tumor (abdomen)      1.5         2.5
Huber   Hans        02.02.1950     Headache        High blood pressure  2.5         1.5

Treatment database (contains the clinical findings from patients):
PID      Ticket  LabID  Anamnesis       Finding              Laborat. 1  Laborat. 2
?§$&%/?  ABCDE          Abdominal pain  Tumor (abdomen)      1.5         2.5
(&%$§&§  EDCBA          Headache        High blood pressure  2.5         1.5

Patient list (contains the identifiable information from patients):
Name    First name  Date of birth  PID
Müller  Fritz       01.01.1950     ?§$&%/?
Huber   Hans        02.02.1950     (&%$§&§
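The split in step 1 can be sketched as follows; record fields and function names are hypothetical, and a random token stands in for the shared secret identifier (PID).

```python
import secrets

def split_clinical_db(clinical_records):
    """Split each clinical record into
    - a patient list entry (IDAT + PID), and
    - a treatment database entry (PID + MDAT, without identifiers)."""
    patient_list, treatment_db = [], {}
    for rec in clinical_records:
        pid = secrets.token_urlsafe(6)  # shared random secret identifier
        patient_list.append({"name": rec["name"], "first_name": rec["first_name"],
                             "dob": rec["dob"], "pid": pid})
        # Everything except the identifying fields goes to the treatment DB.
        treatment_db[pid] = {k: v for k, v in rec.items()
                             if k not in ("name", "first_name", "dob")}
    return patient_list, treatment_db

clinical = [{"name": "Müller", "first_name": "Fritz", "dob": "01.01.1950",
             "anamnesis": "Abdominal pain", "finding": "Tumor (abdomen)",
             "lab1": 1.5, "lab2": 2.5}]
plist, tdb = split_clinical_db(clinical)
```

After the split, only someone holding both stores (or the patient list) can link a finding back to a name, which is the informational separation the model aims for.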

2. Referencing via a pseudonym

Treatment database (contains the clinical findings from patients):
PID      Ticket  LabID  Anamnesis       Finding              Laborat. 1  Laborat. 2
?§$&%/?  ABCDE          Abdominal pain  Tumor (abdomen)      1.5         2.5
(&%$§&§  EDCBA          Headache        High blood pressure  2.5         1.5

Patient list (contains the identifiable information from patients):
Name    First name  Date of birth  PID
Müller  Fritz       01.01.1950     ?§$&%/?
Huber   Hans        02.02.1950     (&%$§&§

The patient list and the treatment database reference one another using a shared random secret identifier (PID). They are logically and physically separated and are administered independently.

3. Total data access for the doctor providing treatment

Treatment database (contains all clinical findings from patients):
PID      Ticket  LabID  Anamnesis       Finding              Laborat. 1  Laborat. 2
?§$&%/?  ABCDE          Abdominal pain  Tumor (abdomen)      1.5         2.5
(&%$§&§  EDCBA          Headache        High blood pressure  2.5         1.5

Patient list (contains the identifiable information from the patient):
Name    First name  Date of birth  PID
Müller  Fritz       01.01.1950     ?§$&%/?
Huber   Hans        02.02.1950     (&%$§&§

A practitioner's request (name, first name, date of birth) is transferred via an encrypted line to the patient list. The patient is searched for in the database. When he/she is found, a unique random number (ticket) is generated. The access ticket is communicated to the practitioner's workstation (without the PID) and to the treatment database (together with the PID). The access ticket is stored temporarily in the treatment data set together with the PID, but not in the patient database. The practitioner can then access the requested information from the treatment database using the access ticket. The access ticket is deleted from the treatment database once the practitioner's request has been served or when the request has timed out.
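The ticket flow in step 3 can be sketched as follows. Class and method names are hypothetical; the point is that the practitioner's workstation only ever holds the one-time ticket, never the PID.

```python
import secrets

class PatientListService:
    """Sketch of the access-ticket flow: the patient list resolves IDAT to a
    PID, issues a one-time ticket, and hands the practitioner only the
    ticket."""

    def __init__(self, patient_list, treatment_db):
        self.patient_list = patient_list       # (name, first name, dob) -> PID
        self.treatment_db = treatment_db       # PID -> MDAT (+ temporary ticket)

    def request_access(self, name, first_name, dob):
        pid = self.patient_list[(name, first_name, dob)]
        ticket = secrets.token_hex(8)              # unique random number
        self.treatment_db[pid]["ticket"] = ticket  # stored with the PID, temporarily
        return ticket                              # sent to the workstation, without the PID

    def fetch_findings(self, ticket):
        for pid, record in self.treatment_db.items():
            if record.get("ticket") == ticket:
                record.pop("ticket")               # ticket deleted after the request
                return dict(record)
        return None                                # unknown or already-used ticket

svc = PatientListService(
    {("Müller", "Fritz", "01.01.1950"): "?§$&%/?"},
    {"?§$&%/?": {"anamnesis": "Abdominal pain", "finding": "Tumor (abdomen)"}})
t = svc.request_access("Müller", "Fritz", "01.01.1950")
```

A production system would additionally expire tickets on a timeout, as the text requires; that bookkeeping is omitted here for brevity.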

6.3.4. Repseudonymization by the pseudonymization service (custodian) when exporting data to the research module
The goal of the pseudonymization service is to provide particular protection for data in a research database created for long-term storage. This is done by using a cryptographic procedure to transform the PID from the patient list into a pseudonym (PSN) that is used as the identifier in the research database. As the pseudonymization service will never need or even see the medical data (MDAT) again, these are transferred using asymmetric encryption.
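One possible cryptographic procedure for the PID-to-PSN transformation is a keyed one-way function; HMAC is used here as an illustrative choice, since the guideline does not prescribe a specific algorithm. The key is held only by the pseudonymization service, so neither sender nor recipient can recompute or reverse the mapping.

```python
import hmac
import hashlib

def pid_to_psn(pid: str, secret_key: bytes) -> str:
    # Derive the research pseudonym PSN from the PID with a keyed one-way
    # function. The same PID always yields the same PSN (so research data
    # stays linkable), but without the custodian's key the mapping can
    # neither be reproduced nor inverted.
    return hmac.new(secret_key, pid.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

custodian_key = b"held-only-by-the-pseudonymization-service"
psn = pid_to_psn("?§$&%/?", custodian_key)

# The MDAT itself would be encrypted with the research database's public key
# before passing through the service, so the custodian relays it without
# being able to read it (asymmetric encryption; not sketched here, as it
# requires a third-party library such as `cryptography`).
```

This deterministic derivation also supports the repseudonymization the heading refers to: rotating the key yields a fresh, unlinkable set of PSNs for a new export.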

[Figure: a) The sender transmits the PID to the pseudonymization service provider, which converts it into the PSN for the recipient; b) the user data (MDAT) is communicated in encrypted form, coordinated via an access ticket exchanged through the service provider, so that the recipient retrieves the access ticket together with the user data.]

Pseudonymization is a purely machine-based procedure that requires no staff intervention. Data can only be accepted from authorized senders.