The Privacy Policy Landscape After the GDPR
Total Page:16
File Type:pdf, Size:1020Kb
Proceedings on Privacy Enhancing Technologies ; 2020 (1):47–64 Thomas Linden, Rishabh Khandelwal, Hamza Harkous, and Kassem Fawaz* The Privacy Policy Landscape After the GDPR Abstract: The EU General Data Protection Regulation 1 Introduction (GDPR) is one of the most demanding and comprehen- sive privacy regulations of all time. A year after it went For more than two decades since the emergence of the into effect, we study its impact on the landscape of privacy World Wide Web, the “Notice and Choice” framework has policies online. We conduct the first longitudinal, in-depth, been the governing practice for the disclosure of online and at-scale assessment of privacy policies before and after privacy practices. This framework follows a market-based the GDPR. We gauge the complete consumption cycle of approach of voluntarily disclosing the privacy practices and these policies, from the first user impressions until the com- meeting the fair information practices [25]. The EU’s recent pliance assessment. We create a diverse corpus of two sets General Data Protection Regulation (GDPR) promises to of 6,278 unique English-language privacy policies from in- change this privacy landscape drastically. As the most side and outside the EU, covering their pre-GDPR and the sweeping privacy regulation so far, the GDPR requires post-GDPR versions. The results of our tests and analyses information processors, across all industries, to be trans- suggest that the GDPR has been a catalyst for a major parent and informative about their privacy practices. overhaul of the privacy policies inside and outside the EU. This overhaul of the policies, manifesting in extensive tex- tual changes, especially for the EU-based websites, comes at mixed benefits to the users. Research Question While the privacy policies have become considerably longer, Researchers have conducted comparative studies around our user study with 470 participants on Amazon MTurk the changes of privacy policies through time, particularly 1 indicates a significant improvement in the visual represen- in light of previous privacy regulations (e.g., HIPAA and 2 tation of privacy policies from the users’ perspective for GLBA ) [1, 3, 5, 26]. Interestingly, the outcomes of these the EU websites. We further develop a new workflow for studies have been consistent: (1) the percentage of websites the automated assessment of requirements in privacy poli- with privacy policies has been growing, (2) the detail-level cies. Using this workflow, we show that privacy policies and descriptiveness of policies have increased, and (3) the cover more data practices and are more consistent with readability and clarity of policies have suffered. seven compliance requirements post the GDPR. We also The GDPR aims to address shortcomings of previous assess how transparent the organizations are with their pri- regulations by going further than any prior privacy reg- vacy practices by performing specificity analysis. In this ulation. One of its distinguishing features is that non- analysis, we find evidence for positive changes triggered by complying entities can face hefty fines, the maximum of 20 the GDPR, with the specificity level improving on average. million Euros or 4% of the total worldwide annual revenue. Still, we find the landscape of privacy policies to be in a Companies and service providers raced to change their pri- th transitional phase; many policies still do not meet several vacy notices by May 25 , 2018 to comply with the new reg- key GDPR requirements or their improved coverage comes ulations [9]. With the avalanche of updated privacy notices with reduced specificity. that users had to accommodate, a natural question follows: What is the impact of the GDPR on the landscape of Keywords: GDPR, Privacy, Security, Automated Analysis online privacy policies? of Privacy Policy, Polisis Researchers have recently started looking into this ques- DOI 10.2478/popets-2020-0004 tion by evaluating companies’ behavior in light of the Received 2019-05-31; revised 2019-09-15; accepted 2019-09-16. GDPR. Their approaches, however, are limited to a small number of websites (at most 14) [8, 32]. Concurrent to our work, Degeling et al. [9], performed the first large- scale study focused on the evolution of the cookie consent Thomas Linden: University of Wisconsin, E-mail: tlin- notices, which have been hugely reshaped by the GDPR [email protected] (with 6,579 EU websites). They also touched upon the Rishabh Khandelwal: University of Wisconsin, E-mail: [email protected] Hamza Harkous: École Polytechnique Fédérale de Lausanne, E-mail: [email protected] *Corresponding Author: Kassem Fawaz: University of Wisconsin, E-mail: [email protected] 1 The Health Information and Portability Accountability Act of 1996. 2 The Gramm-Leach-Bliley Act for the financial industry of 1999. The Privacy Policy Landscape After the GDPR 48 growth of privacy policies, finding that the percentage of D. Compliance (Sec. 8). We design seven queries that sites with privacy policies has grown by 4.9%. codify several of the GDPR compliance metrics, namely those provided by the UK Information Commissioner (ICO). The GDPR’s effect is evident; for both of our an- Methodology and Findings alyzed datasets, we find a positive trend in complying Previous studies have not provided a comprehensive an- with the GDPR’s clauses: substantially more companies swer to our research question. In this paper, we answer improved (15.1% for EU policies and 10.66% for Global this question by presenting the first on-scale, longitudinal policies, on average) on these metrics compared to those study of privacy policies’ content in the context of the that worsened (10.3% for EU policies and 7.12% for Global GDPR. We develop an automated pipeline for the collec- policies, on average). tion and analysis of 6,278 unique English privacy policies E. Specificity (Sec. 9). Finally, we design eight queries by comparing pre-GDPR and post-GDPR versions. These capturing how specific policies are in describing their data policies cover the privacy practices of websites from dif- practices. Our results show that providers are paying spe- ferent topics and regions. We approach the problem by cial attention to make the users aware of the specific data studying the change induced on the entire experience of collection/sharing practices; 25.22% of EU policies and the users interacting with privacy policies. We break this 19.4% of Global policies are more specific in describing experience into five stages: their data practice. Other providers, however, are attempt- A. Presentation (Sec. 4). To quantify the progress in ing to cover more practices in their policies at the expense privacy policies’ presentation, we gauge the change in user of specificity; 22.7% of EU policies and 17.8% of Global perception of their interfaces via a user study involving 470 policies are less specific than before. participants on Amazon Mechanical Turk. Our results find Building on the above, we draw a final set of takeaways a positive change in the attractiveness and clarity of EU- across both the time and geographical dimensions (Sec. 12). based policies; however, we find that outside the EU, the visual experience of policies is not significantly different. B. Text-Features (Sec. 5). We study the change in 2 GDPR Background the policies’ using high-level syntactic text features. Our analysis shows that both EU and Global privacy poli- The European Union began their expedition into privacy cies have become significantly longer, with (+35%, +25%) legislation since the 1990s. The Data Protection Direc- more words and (+33%, +22%) more sentences on average tive (95/46/EC) [11] set forth a set of principles for data respectively. However, we do not observe a major change protection as a preemptive attempt to safeguard EU con- in the metrics around the sentence structure. stituents in a data-heavy future. This directive was further We devise an approach, inspired by goal-driven require- amended with the ePrivacy Directive (2002/58/EC) [10] to ments engineering [35], to evaluate coverage, compliance, enhance the privacy of high-sensitivity data-types such as e- and specificity in the privacy policies. While previous lon- communications data. While the two directives maintained gitudinal studies either relied on manual investigations some focus on protecting user privacy, article 1, paragraph or heuristics-based search queries [1, 3, 9], we build on 2 of the Data Protection Directive very clearly states that the recent trend of automated semantic analysis of poli- "Member States shall neither restrict nor prohibit the free cies. We develop a total of 24 advanced, in-depth queries flow of personal data between Member States for reasons that allow us to assess the evolution of content among connected with the protection afforded under paragraph 1" the set of studied policies. We conduct this analysis by [11]. Thus, while the set of principles intended to increase building on top of the Polisis framework (Sec. 6), a recent user privacy, the primary focus of the legislation was to system developed for the automated analysis of privacy prevent the inhibiting of personal data flow. Furthermore, policies [12]. Following this approach, we perform a deeper the directives designated the Member State where a par- level of semantic analysis that overcomes the limitations ticular controller is located as the supervising authority. of keyword-based approaches. Naturally, the decentralized approach led to a wide vari- C.