Department of Law

Spring Term 2020

Master Programme in Intellectual Property Law

Master’s Thesis 30 ECTS

Text and Data Mining in EU Law

Author: Gabriella Svensson

Supervisor: Kacper Szkalej

Table of Contents

INTRODUCTION
  Subject and Purpose
  Material and Method
  Delimitations
  Outline
1. A BRIEF INTRODUCTION TO TEXT AND DATA MINING
2. FUNDAMENTAL EU COPYRIGHT LAW
  2.1 Protectable Subject Matter and Exclusive Rights
  2.2 The DSM Directive and TDM
3. WHY AND HOW MIGHT COPYRIGHT BE AN OBSTACLE FOR TDM
  3.1 Beneficiaries
  3.2 Lawful Access
  3.3 Retrieval and Analysis
  3.4 Sharing Results and Spreading Knowledge
  3.5 Storage and Verification
4. CONCLUSION
Bibliography
  Legislation
  Case Law
  EU Publications
  Reports
  Books
  Articles
  Other publications

INTRODUCTION

Text and data mining (TDM) can be a useful tool in such diverse fields as scientific research, journalism, culture and not least training of artificial intelligence (AI) and its importance is likely to only grow in the future. Despite its huge potential there are many indicators that copyright law restricts use of TDM – keeping users from optimal application. Copyright law should foster innovation and creativity and when it risks having the opposite effect it needs to be well motivated.

In the summer of 2019 the Directive on Copyright in the Digital Single Market (the DSM Directive)1 was adopted and provided the EU with two new copyright exceptions for TDM, which along with the rest of the directive are currently being implemented into national law. Given the recent changes in European copyright law concerning TDM and the technique’s promising usefulness, it is now relevant to investigate how the two interrelate.

Subject and Purpose

The purpose of this thesis is to describe whether and to what extent copyright can be an obstacle for TDM, with focus on the recent changes in EU law, and to comment critically on what has changed following the new directive and what is still missing for an efficient application of TDM. Efficient application is for the purpose of this thesis not to be understood as economic efficiency, but rather as practical application within the boundaries of the current framework with a satisfactory level of legal certainty.

The thesis will answer the following research questions:
• Who may benefit from the exceptions in the DSM Directive?
• How may contractual provisions restrict the efficient application of TDM?
• How may technological protection measures (TPM) restrict the efficient application of TDM?

1 Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market and amending Directives 96/9/EC and 2001/29/EC (DSM Directive) [2019] OJ L 130, 17.5.2019, p. 92–125

• To what extent does the legal framework permit publication of the result including parts of the input material?
• To what extent does the legal framework permit storage of the copies generated during the TDM process?
• How well does the DSM Directive meet its objectives of increased legal certainty and harmonisation?

The above questions will be discussed throughout the text and critically analysed in search of inconsistencies and practical problems that might arise when using TDM.

Material and Method

The main focus will lie on EU copyright law, with examples from states in and outside of Europe where suitable. The main Union legislation to be discussed is:
• Dir. 2019/790 on Copyright and Related Rights in the Digital Single Market (DSM Directive)
• Dir. 2001/29/EC on the harmonisation of certain aspects of copyright and related rights in the information society (InfoSoc)2
• Dir. 96/9/EC on the Legal Protection of Databases (Database Directive)3

The thesis will be based on a qualitative method, through analysis of relevant provisions in the above legal texts and study of scholarly articles and reports commenting on the drafting of the DSM Directive or evaluating the final version, as well as related legal questions. Stakeholder and interest organisation views will be part of the material to reflect practical issues that might arise, but will only be given little space in order to avoid giving too much focus to the lobbying activity that surrounded the drafting process of the directive. Finally, it is not the author’s choice to exclude EU case law from the material, but relatively few questions regarding the admissibility of TDM under the abovementioned legislation have been referred to the CJEU4 – possibly as a result of an undeniable level of uncertainty surrounding the

2 Directive 2001/29/EC of the European Parliament and of the Council of 22 May 2001 on the harmonisation of certain aspects of copyright and related rights in the information society (InfoSoc) [2001] OJ L 167, 22.6.2001, p. 10–19
3 Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases (Database Directive) [1996] OJ L 77, 27.3.1996, p. 20–28
4 Court of Justice of the European Union

relationship between copyright and TDM reducing the motivation to test the law. As a result, there will be few references made to EU case law throughout the following text.

Delimitations

It should be noted that TDM involving a minimum of copying, or the use of techniques crawling through data and processing each work separately, could be performed without infringing copyright or database law5, but the objective of this thesis is not to offer technical solutions to a legal problem. Rather, the choice of technique should be based on what is optimal to reach the desired result and not dictated by law. Similarly and for the same reason, the thesis will only describe the TDM process to the extent required to appreciate the legal discussion, limiting the technical detail to a minimum.

The main focus point for the thesis is copyright exceptions6, hence the rightholder’s exclusive rights will only be described to provide the right context. Article 3 of the DSM Directive applies to cultural heritage institutions in addition to scientific research organisations, but the thesis is limited to discussing the latter as a beneficiary only. Finally, other legal areas such as data protection and the General Data Protection Regulation (GDPR)7 can naturally hinder TDM too, but only copyright law will be touched upon.

Outline

The thesis is divided into four chapters, beginning with a brief introduction to text and data mining that presents the most central steps in the process and when it can be applied. The second chapter covers the fundamental EU copyright law that will form the basis for the following legal discussion and addresses the question of protectable subject matter. The third chapter forms the main body of the thesis and serves to identify legal barriers

5 Christophe Geiger, Giancarlo Frosio and Oleksandr Bulayenko, ‘Text and Data Mining: Articles 3 and 4 of the Directive 2019/790/EU’ (Centre for International Intellectual Property Studies (CEIPI) 2019) 2019–08 7 accessed 26 February 2020; Maria Bottis and others, ‘Text and Data Mining in the EU Acquis Communautaire Tinkering with TDM & Digital Legal Deposit’ (2019) 12 Erasmus Law Review 179, 179.
6 For practical reasons “exceptions and limitations” are referred to as “exceptions” throughout the thesis.
7 Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data, and repealing Directive 95/46/EC (General Data Protection Regulation) [2016] OJ L 119, 4.5.2016, p. 1–88

causing copyright law to hinder the efficient application of TDM. The issues are presented in the order they appear at the various stages of the TDM process, and solutions offered by Community law are presented where available. The final chapter will provide the conclusion, including observed shortcomings in applicable EU copyright law and suggestions aiming to improve the efficient application of TDM.

1. A BRIEF INTRODUCTION TO TEXT AND DATA MINING

The purpose of this chapter is to define TDM within the scope of the thesis and to briefly present the stages of the TDM process, aiming not at a full technical description of TDM but at a concise introduction necessary to appreciate the legal discussion. To illustrate that TDM can be of great use to society and that, consequently, not only traditionally defined scientific research purposes would benefit from a strong exception, the chapter ends with examples of the diverse application areas for TDM.

The definition of TDM8 used throughout this thesis is the act of using computer code to analyse and recombine large amounts of digital information (hereinafter referred to as “input material”) in order to identify new patterns and associations, with the objective of extracting knowledge.9 This corresponds largely with the definition provided by the DSM Directive.10 A human with the same ambition would likely apply a verification-based approach, i.e. work out a hypothesis which they subsequently seek to confirm through testing. TDM means applying a discovery approach – that is, investigating multiple possible relationships in a pre-defined dataset at the same time in order to identify those which stand out. The advantage compared to manual work is the ability to handle complex combinations without being limited by human imagination.11 Additionally, since TDM research can be extended with very little additional risk, it is possible to carry out more tests, including those less certain to produce results, at very small marginal costs.12
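The contrast between the verification-based approach and the discovery approach can be illustrated with a minimal Python sketch. It is an illustration only: the toy dataset, the variable names and the threshold of 0.9 are assumptions invented for this example and are not drawn from any of the cited sources.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy dataset, invented purely for illustration.
DATA = {
    "temperature": [10, 12, 15, 18, 21],
    "ice_cream_sales": [20, 25, 31, 37, 44],
    "umbrella_sales": [30, 12, 25, 9, 18],
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def verify(data, x, y):
    """Verification approach: test one relationship chosen in advance."""
    return pearson(data[x], data[y])

def discover(data, threshold=0.9):
    """Discovery approach: test every pair and keep those that stand out."""
    found = []
    for x, y in combinations(data, 2):
        r = pearson(data[x], data[y])
        if abs(r) >= threshold:
            found.append((x, y, r))
    return found
```

Here `verify` checks a single hypothesis formulated beforehand, whereas `discover` examines every pair of variables and reports only those whose correlation exceeds the assumed threshold, without anyone having had to suspect the relationship in advance.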

There are a number of different mining techniques, but their common starting point is the identification of input material – which can be both individual works and works organised in a database and can, depending on the area of application for the study, for instance

8 Also referred to as “text and data analysis”.
9 Eleonora Rosati, ‘Copyright as an Obstacle or an Enabler? A European Perspective on Text and Data Mining and Its Role in the Development of AI Creativity’ (2019) 27 Asia Pacific Law Review 198, 1–2.
10 DSM Directive art. 2(2)
11 Francesca Bignami, ‘European Versus American Liberty: A Comparative Privacy Analysis of Anti-Terrorism Data-Mining’ (2011) 48 Boston College Law Review 609, 614–615.
12 Thomas Margoni and Martin Kretschmer, ‘The Text and Data Mining Exception in the Proposal for a Directive on Copyright in the Digital Single Market: Why It Is Not What EU Copyright Law Needs’ (CREATe, 25 April 2018) accessed 14 February 2020.

consist of digitally stored text, data, images or sounds.13 Generally the TDM process can be summarised in the following steps14:

1. Retrieval of input material – Copies of sources or databases, in whole or in part, are made and downloaded to the user’s own server or platform.15
2. Creation of a dataset – Data relevant for the study is identified and copied from the collected material and entered in a dataset. This step can involve adding metadata and/or adapting the format to one readable by machines16 – PDFs are for instance not machine readable.17 This is referred to as normalisation and annotation and is sometimes performed by publishers as a service, yet some researchers prefer to do it themselves.18
3. Analysis of the dataset – The computer recombines the data in the dataset and searches for patterns.19
4. Publication – The process is sometimes finished by publication of findings.
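The four steps can be sketched in a few lines of Python. This is a deliberately simplified illustration and not a description of any real TDM tool: the two sample texts, the function names and the choice of word co-occurrence as the analysed “pattern” are all assumptions made for the example.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical in-memory "sources"; a real project would download them.
SOURCES = {
    "doc1": "Copyright law may restrict text mining. Mining needs copies.",
    "doc2": "Text mining finds patterns. Copyright protects original works.",
}

def retrieve(sources):
    """Step 1: copy the input material to the user's own storage."""
    return dict(sources)  # this copy is the potentially copyright-relevant act

def build_dataset(corpus):
    """Step 2: normalise each text into lower-case tokens and annotate it with an id."""
    return [
        {"id": doc_id, "tokens": re.findall(r"[a-z]+", text.lower())}
        for doc_id, text in corpus.items()
    ]

def analyse(dataset, top=3):
    """Step 3: count how many documents each pair of distinct words shares."""
    pairs = Counter()
    for record in dataset:
        vocabulary = sorted(set(record["tokens"]))
        pairs.update(combinations(vocabulary, 2))
    return pairs.most_common(top)

def publish(findings):
    """Step 4: report the derived patterns only, not the input material."""
    return [f"{a} + {b}: {n} document(s)" for (a, b), n in findings]

if __name__ == "__main__":
    for line in publish(analyse(build_dataset(retrieve(SOURCES)))):
        print(line)
```

Even in this toy version, full copies of the sources are made in step 1 and partial copies in step 2, although only aggregated word pairs are reported at the end. That is why the legal discussion concentrates on the retrieval and dataset stages rather than on the published result.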

According to a report from 2007, half of all published scientific papers are read only by their authors, referees and editors.20 However, just as people in the 18th century gained access to and started to read more books when the printing press was invented, we are now learning to read a million books at a time through software.21 Less time can now be spent on researching what someone else has already done and more time can be spent making new discoveries.22 This illustrates the evolution that the use of digital information has undergone from mere information retrieval to value extraction. Algorithms can perform complicated comparisons and find patterns in collections of information which are too encompassing for the human brain to

13 Bottis and others (n 5) 179.
14 Marco Caspers and Lucie Guibault, ‘D3.3 Baseline Report of Policies and Barriers of TDM in Europe’ (FutureTDM 2016) 8–9 accessed 6 February 2020.
15 Bottis and others (n 5) 179; Caspers and Guibault (n 14) 8.
16 Caspers and Guibault (n 14) 8.
17 Michelle Brook, Peter Murray-Rust and Charles Oppenheim, ‘The Social, Political and Legal Aspects of Text and Data Mining (TDM)’ (2014) 20 D-Lib Magazine 2 accessed 25 February 2020.
18 Bottis and others (n 5) 179.
19 ibid.
20 Margoni and Kretschmer (n 12).
21 Martine Oudenhoven, ‘TDM and the Reading Revolution’ (FutureTDM, 12 April 2017) accessed 10 February 2020.
22 Margoni and Kretschmer (n 12).

handle, and this has given large gatherings of data an additional value separate from that of each component making up the collection.23

TDM makes it possible to draw new knowledge from literature for use in the natural and human sciences; computers can learn to recognise motifs by mining photographs, and audio recordings can be mined to create translation tools.24 A concrete example is how the BlueDot project used TDM to discover the outbreak of the coronavirus and managed to warn its clients to avoid the Wuhan area a little under two weeks before the WHO (World Health Organisation) sent a similar public notice.25 The company uses AI to predict disease outbreaks and to track the connection between outbreaks and travel; among other sources, 100,000 news reports in 65 languages were searched each day.26 Other initiatives use TDM to try to find a vaccine for the disease by analysing scientific texts about the coronavirus family.27

In journalism TDM can be used to verify the accuracy of historical facts and thus expose fake news. It was also a central tool in exposing the Panama Papers scandal28.29 TDM has made its own contribution to culture as a tool to expose art forgeries30, in an attempt to

23 Maurizio Borghi and Stavroula Karapapa, Copyright and Mass Digitization (1st edn, Oxford University Press 2013) 50–51 accessed 13 March 2020.
24 Sean Flynn and others, ‘Implementing User Rights for Research in the Field of Artificial Intelligence: A Call for International Action’ (2020) 2020 European Intellectual Property Review 11, 8.
25 Marc Prosser, ‘How AI Helped Predict the Coronavirus Outbreak Before It Happened’ (Singularity Hub, 5 February 2020) accessed 30 April 2020.
26 ibid; Sean Flynn and João Pedro Quintais, ‘Implementing User Rights for Research in the Field of Artificial Intelligence: A Call for Action at International Level’ (Kluwer Copyright Blog, 21 April 2020) accessed 30 April 2020.
27 Flynn and others (n 24) 3.
28 The leak of 11.5m files which revealed offshore tax evasion among a high number of prominent individuals. Luke Harding, ‘What are the Panama Papers? A guide to history's biggest data leak’ The Guardian (London 5 April 2016) https://www.theguardian.com/news/2016/apr/03/what-you-need-to-know-about-the-panama-papers accessed 28 May 2020.
29 Margoni and Kretschmer (n 12); P Bernt Hugenholtz, ‘The New Copyright Directive: Text and Data Mining (Articles 3 and 4)’ (Kluwer Copyright Blog, 24 July 2019) accessed 13 January 2020; Geiger, Frosio and Bulayenko (n 5) 5.
30 Jackie Snow, ‘This AI Can Spot Art Forgeries by Looking at One Brushstroke’ MIT Technology Review (Cambridge, Massachusetts, 21 November 2017) accessed 28 May 2020.

complete Beethoven’s 10th symphony31 and to let code create “affordable” art32. Applied to text, it can be used to find grammatical patterns33, to create automated translation tools34, to process legal texts – for instance in order to identify common denominators in case law35 – and to build smart disclosure systems warning consumers of the risks in accepting far-reaching terms and conditions online36. Another major area of use is the development of AI, which is dependent on the use of TDM to train machine learning systems.37 Finally, TDM is used in intelligent crime analysis to try to predict criminal acts,38 in the search for terrorism suspects by mining call records or other digital information about individuals, and to keep possible terrorists from boarding a plane by cross-referencing airline records with other data.39

The vast area of application and the range of possible users show TDM’s economic potential but also its ability to work in the public interest in service of many parts of society, which motivates facilitating use of the technique.

31 Justin Huggler, ‘Computer Is Set to Complete Beethoven’s Unfinished Symphony’ The Telegraph (13 December 2019) accessed 19 May 2020.
32 Per Kristian Bjørkeng, ‘Han syntes kunst var for dyrt i Oslo. Løsningen hans blir elsket av vennene hans, men avvist av eksperter.’ Aftenposten (Oslo, 22 January 2020) accessed 19 May 2020.
33 Hugenholtz (n 29).
34 Margoni and Kretschmer (n 12).
35 Adam Wyner and others, ‘Approaches to Text Mining Arguments from Legal Cases’ in Enrico Francesconi and others (eds), Semantic Processing of Legal Texts: Where the Language of Law Meets the Law of Language (Springer 2010) 61 accessed 19 May 2020.
36 Rossana Ducato and Alain Strowel, ‘Limitations to Text and Data Mining and Consumer Empowerment. Making the Case for a Right to “Machine Legibility”’ (Kluwer Copyright Blog, 19 March 2019) accessed 5 February 2020.
37 Hugenholtz (n 29); Flynn and Quintais (n 26).
38 Mohammad Reza Keyvanpour, Mostafa Javideh and Mohammad Reza Ebrahimi, ‘Detecting and Investigating Crime by Means of Data Mining: A General Crime Matching Framework’ (2011) 3 Procedia Computer Science 872, 872–873.
39 Bignami (n 11) 615–616.

2. FUNDAMENTAL EU COPYRIGHT LAW

Since frequently only parts of works are used in TDM, this chapter starts with a definition of protectable subject matter before it moves on to a discussion about the exclusive rights that may be infringed during the TDM process, followed by the available exceptions for TDM with focus on the DSM Directive.

2.1 Protectable Subject Matter and Exclusive Rights

Entire works such as a database or a digital library can be part of the input material, but so can isolated segments of them, such as selected parts of the database or only those sections of the library’s books that describe male and female characters40. It therefore makes sense to first investigate what kind of input material can constitute protectable subject matter and what status fractions of it have. For the input material to enjoy copyright protection under the InfoSoc Directive it needs to be original.41 According to the CJEU, the originality requirement is met if the work is the author’s own intellectual creation42 – an interpretation that should be understood as implicitly harmonised by the directive43. Words as such are not protected44, but the choice, sequence and combination of them can be, provided that they meet the originality requirement45. The CJEU has found that a combination of 11 words is capable of this46 and that a data capture process can therefore fall within the scope of the directive.47 Relevant for TDM is that high-level data – books, newspapers, articles – is therefore more likely to be covered by copyright than low-level data – such as phone numbers or scientific measurements.48 For a database to be eligible for copyright protection, the author’s own intellectual creation must be shown in the “selection and arrangement” of content49,

40 This example is inspired by a real project to investigate how gender was represented in literature during a 300-year period. Matthew Sag, ‘The New Legal Landscape for Text Mining and Machine Learning’ (2019) 66 Journal of the Copyright Society of the USA 64, 5
41 Berne Convention for the Protection of Literary and Artistic Works 1886 (Berne Convention) art. 2(5)
42 C-5/08 Infopaq 1 para 34-37
43 Isabella Alexander, ‘The Concept of Reproduction and the “Temporary and Transient” Exception’ (2009) 68 The Cambridge Law Journal 520, 521.
44 C-5/08 Infopaq 1 para 46
45 C-5/08 Infopaq 1 para 45-47
46 C-5/08 Infopaq 1 para 48
47 C-5/08 Infopaq 1 para 51
48 Caspers and Guibault (n 14) 15.
49 Database Directive art. 3(1)

which applies regardless of whether the individual parts are protectable or not.50 It follows naturally that extracting parts from such a database runs a relatively low risk of copyright infringement, since infringement would require the extraction to mirror the selection and arrangement of data which made the database original.51 Merely organising a database in alphabetical or other logical order is not considered to show intellectual creation in selection and arrangement and is therefore not sufficient for copyright protection52, and since the structure of scientific databases is dictated by technical factors, they generally do not meet the originality requirement.53 On the other hand, the investment made in arranging “non-original” databases may enjoy protection under the sui generis right54 if the maker can show that a substantial investment (qualitatively or quantitatively speaking) has been made in obtaining, verifying or presenting the data.55 For instance, it would be difficult to prove that a database containing computer-generated airline schedules meets the requirement of a substantial investment56, which makes it an example of material that is protected by neither copyright nor the sui generis right. Data generated by a satellite, on the other hand, is likely to be deemed observed rather than generated and would, if so, be eligible for protection under the sui generis right if “arranged in a systematic or methodical way”57.58 Raw data59 can be processed using TDM and is protected by neither copyright nor the sui generis right, since it lacks originality and is not “arranged in a systematic or methodical way”60 and as such does not meet the requirements for a database.61 This corresponds with the general principle in copyright law that facts and data as such are not protectable subject matter, but the original expression of them is.62 As an example, scientific articles are generally protected by copyright, but not the

50 Justine Pila and Paul LC Torremans, European Intellectual Property Law (2nd edn, Oxford University Press 2019) 489.
51 P Bernt Hugenholtz, ‘Against “Data Property”’ in Hanns Ullrich, Peter Drahos and Gustavo Ghidini, Kritika: Essays on Intellectual Property, vol 3 (Edward Elgar Publishing 2018) 57 accessed 15 April 2020.
52 Pila and Torremans (n 50) 489; Caspers and Guibault (n 14) 16.
53 Lucie Guibault and others, Safe to Be Open - Study on the Protection of Research Data and Recommendations for Access and Usage (Universitätsverlag Göttingen 2013) 21 accessed 25 February 2020.
54 Pila and Torremans (n 50) 490.
55 Database Directive art. 7
56 C–30/14 Ryanair Ltd v PR Aviation BV EU:C:2015:10 (Ryanair v PR Aviation) para 22
57 Database Directive art. 1(2)
58 Hugenholtz (n 51) 59–60.
59 Data that has not been processed for use.
60 Database Directive art. 1(2)
61 Hugenholtz (n 51) 60.
62 Christophe Geiger, Giancarlo Frosio and Oleksandr Bulayenko, ‘The Exception for Text and Data Mining (TDM) in the Proposed Directive on Copyright in the Digital Single Market - Legal Aspects’ (Centre for International Intellectual Property Studies (CEIPI) 2018) 2018–02 2

research data they are based on.63 This can be, and has been, used as an argument as to why TDM should not be considered infringement – after all, there can be no infringement if there is no protectable subject matter in the first place. Additionally, it can be argued that during the TDM process the work(s) providing the input material is not used “as a work” – a public is not enjoying its expressive features (exploitative use) – rather, data is transformed and processed (non-exploitative/non-consumptive use).64 Scholars have used this line of reasoning to argue that “the right to read is the right to mine”65, but this is where the EU differs from, for instance, the US, where TDM, because of its transformative nature, is deemed fair use66. Due to the way copyright law has evolved in Europe67 and the broad interpretation68 applied to the “all-inclusive” reproduction right69, it is likely that the making of copies as part of the TDM process during the retrieval, selection and analysis of data would be considered copyright-relevant acts and constitute infringement of input material enjoying protection under the InfoSoc Directive70 or the Database Directive71.72 Additionally, if the result is published and the original input material quoted, there is a risk of infringement of the right of distribution73 (paper copies) or communication to the public74 (digital copies).75 It is worth noting that the scope of the exclusive rights for databases enjoying copyright protection is limited by the fact that a lawful user may do what is necessary in order to get access and for the sake of normal use76, even if this, for example, involves making reproductions.77 Corresponding exclusive rights under the sui generis protection are extraction (i.e. reproduction) and re-utilisation (i.e. distribution/communication to the public).78 The former might be infringed during the retrieving stage, when creating a target dataset and during the

accessed 6 February 2020; Margoni and Kretschmer (n 12).
63 Guibault and others (n 53) 21.
64 Geiger, Frosio and Bulayenko (n 62) 2; Margoni and Kretschmer (n 12); Hugenholtz (n 29).
65 Peter Murray-Rust, ‘The Right to Read Is the Right to Mine’ (Open Knowledge Foundation Blog, 1 June 2012) accessed 25 February 2020.
66 Sag (n 40) 32.
67 Geiger, Frosio and Bulayenko (n 62) 2; Margoni and Kretschmer (n 12); Hugenholtz (n 29).
68 C–5/08 Infopaq International A/S v Danske Dagblades Forening [2009] ECR I-6569 (Infopaq 1) para 43
69 Hugenholtz (n 29).
70 InfoSoc art. 2
71 Database Directive art. 5(a)
72 Caspers and Guibault (n 14) 18.
73 InfoSoc art. 4, Database Directive art. 5(c)
74 InfoSoc art. 3, Database Directive art. 5(d)
75 Caspers and Guibault (n 14) 19–20.
76 Database Directive art. 6(1)
77 Pila and Torremans (n 50) 489–490.
78 Database Directive art. 7

mining/analysis while the latter might be infringed at publication.79 It is worth noting that these exclusive rights, too, are limited by the lawful user’s right to use insubstantial parts of the database for whatever purpose.80 Hence, the risk of infringement at publication is very small81, but since entire works are often copied during the previous stages of the TDM process, there is a high likelihood that the threshold of a substantial part is reached.82 Additionally, repeated and systematic extraction is not permitted if it conflicts with the normal use of a database or unreasonably prejudices the legitimate interests of the maker of the database.83

The paradox of copyright law is that reproduction is an exclusive right, yet generating some kind of reproduction has become essential for the act of using certain works, in particular online, such as when browsing or searching the web.84 Applied to TDM this means that repeated reproduction/extraction of works – which might seem a clearly copyright-relevant act – is necessary to perform TDM and to serve its underlying purpose of extracting information. In other words, the use is transformative, the work is not treated as a work, and the act should as such be permissible.85 However, the CJEU’s formalistic interpretation of the law in the Infopaq cases has given the rightholder the possibility to prevent this kind of purely technical copying – arguably with detrimental consequences for copyright’s incentive to create new content.86

2.2 The DSM Directive and TDM

The fact that the EU legislator chose to handle mining of protected sources through an exception signals that it sees TDM as infringement by default when it occurs outside the scope of a licence agreement and there is no applicable exception.87 There can no longer be any doubt that, in the eyes of the legislator, the right to read is not the right to mine.88 The DSM Directive fails to explicitly spell out that mining of subject matter which is neither

79 Caspers and Guibault (n 14) 21–22.
80 Database Directive art. 8(1)
81 Caspers and Guibault (n 14) 19–20.
82 ibid 21–22.
83 Database Directive art. 7(5)
84 Borghi and Karapapa (n 23) 51–52.
85 ibid 51.
86 European Copyright Society, ‘General Opinion on the EU Copyright Reform Package’ (24 January 2017) 5 accessed 19 February 2020.
87 Rosati (n 9) 21.
88 ibid 23.

protected by copyright nor the sui generis right is admissible – yet this must have been the intention.89 According to the recitals “Text and data mining can […] be carried out in relation to mere facts or data that are not protected by copyright, and in such instances no authorisation is required under copyright law…”90 (emphasis added), but since facts and data as a general principle cannot attract copyright protection, the legislator surely had in mind facts and data contained in works which for one reason or another are unprotected. This could for example be subject matter that does not qualify for copyright or sui generis protection – such as raw data – or subject matter for which protection has expired. Additionally, there is the reduced scope of exclusive rights in respect of the lawful user.

If the input material is protected by copyright or the sui generis right, one needs to get authorisation or rely on an exception in order not to infringe any exclusive rights. However, trying to get authorisation is problematic for the purpose of TDM in the sense that the process requires a lot of data, possibly from a lot of different sources. Obtaining authorisation for each and every source can be both time consuming and expensive and is likely to cause projects to stall. To further complicate the situation, it is not always clear who the rightholder is – this is particularly true for out-of-commerce works91 and when a database has been created with public funding92. With the above in mind, being able to rely on an exception is clearly preferable.

The objective of the legislative process concerning TDM under the new directive was to maximise legal certainty and minimise clearance costs for researchers.93 These goals could have been reached in a number of ways, such as the introduction of new licensing practices, a reinterpretation of the reproduction right or an exception for TDM in copyright and database law.94 The legislator chose the last option, adding a new mandatory exception to the “list” of existing copyright exceptions. This indicates a change of mindset, since all previous exceptions, apart from the one for temporary reproductions, are optional. It can be interpreted as an intention to harmonise the rules on TDM and make it independent from

89 Geiger, Frosio and Bulayenko (n 5) 7–8.
90 DSM Directive Recital 9
91 Rosati (n 9) 8.
92 Guibault and others (n 53) 84.
93 Bottis and others (n 5) 183.
94 ibid 185–186; European Commission, ‘Standardisation in the Area of Innovation and Technological Development, Notably in the Field of Text and Data Mining: Report from the Expert Group’ (Publications Office of the European Union 2014) 51 accessed 3 March 2020.

national borders. The exceptions for temporary reproductions95, teaching and research96, quotation97, press98 and the above mentioned use of insubstantial parts99 are (depending on implementation and scope of implementation) commonly suggested as providing some protection for TDM outside of the DSM Directive.

In addition to functioning across borders without restriction, the ideal legislation should be able to accommodate similar technological processes in the future, and not only TDM as it works today100, but a closed list is poorly equipped for that.101 Additionally, since it is impossible to predict future technical advancements, the wording of the exception can turn out to be more restrictive than intended. It would have been more flexible to use an exception in the shape of an open norm102 similar to the “fair use” regime in the US, since it could withstand technological development and social change103 while applying equally to all types of exclusive rights and users.104 It has been suggested that such an exception could include uses not yet covered by an exception but which can be justified by an important public interest and/or fundamental human rights105, or that it could allow use that does not interfere with markets for copyright-protected works, i.e. use which does not include expressive communication to a public.106 To comply with the EU’s obligations as a contracting party under article 10 of the WCT107, the three-step test would continue to apply as a control mechanism to make sure that a fair balance between rightholder and user is maintained.108 An even more radical option would have been to – instead of an open norm – use a “recalibrated” version of the three-step test to allow case-by-case assessment, with the list of exceptions109 serving as a reference to identify situations which could be exempted if they pass the three-step test.110 The recalibration would be for courts to cease applying the test as a way to give existing

95 InfoSoc art. 5(1). 96 InfoSoc art. 5(3)(a), Database Directive art. 6(2)(b) and 9(b). 97 InfoSoc art. 5(3)(d). 98 InfoSoc art. 5(3)(c). 99 Database Directive art. 8(1). 100 Margoni and Kretschmer (n 12). 101 Geiger, Frosio and Bulayenko (n 62) 20. 102 Also referred to as general exception or opening clause 103 European Commission, ‘Expert Group Report’ (n 94) 66; Geiger, Frosio and Bulayenko (n 62) 20; European Copyright Society (n 86) 5; Bottis and others (n 5) 186. 104 Flynn and others (n 24) 7–8. 105 Geiger, Frosio and Bulayenko (n 62) 31. 106 Flynn and others (n 24) 7–8. 107 WIPO Copyright Treaty 1996 (WCT) 108 Martin Senftleben, ‘The Perfect Match: Civil Law Judges and Open-Ended Fair Use Provisions’ (2017) 33 American University International Law Review 231, 267. 109 InfoSoc art. 5(1)-(4) 110 Senftleben (n 108) 271; Bottis and others (n 5) 187.

limitations a narrow interpretation and to start using it to sanction new ones.111 In order to improve the balance between rightholder and user, a possible addition would be a fourth “step” safeguarding the legitimate interests of third parties.112 The introduction of an open norm was however not even mentioned among the options presented in the European Commission’s Impact Assessment113; apart from the alternative of doing nothing, they were all variations of a new exception. Scholars have been calling for the introduction of a general open norm in EU copyright law, but there is a longstanding concern that it would not fit the traditions of civil law countries (which make up the majority of Member States) and that civil law judges don’t have the necessary experience to handle open-ended defences114, which could possibly explain why the EU legislator chose the more traditional option of adding two new exceptions to the list in order to accommodate TDM.

The Member States have until 7 June 2021 to implement an exception for TDM as required by the DSM Directive115 but those that already apply a broader one concerning uses or fields covered by the DSM Directive may maintain such exceptions if compatible with the Database or the InfoSoc Directive. Interestingly, they may also adopt new broader exceptions under the same conditions.116 Hence, the Member States that have already implemented an exception for TDM under the exception for teaching and scientific research may continue to apply it, and Member States wishing to pave the way as much as possible for TDM can supplement the DSM exceptions with national law as long as the national exceptions derive from the exhaustive117 list of exceptions in the InfoSoc or Database directives. Additionally, such exceptions also need to conform with the three-step test118 to safeguard a fair balance between rightholders and users119. During the drafting process of the DSM Directive, the EU legislator rejected more encompassing TDM exceptions on the grounds that they would have a “negative impact on copyright as a fundamental right” and a significant negative impact on the licensing market and rightholders’ revenues.120 Naturally, it is the final version of the

111 Senftleben (n 108) 272. 112 European Commission, ‘Expert Group Report’ (n 94) 56. 113 European Commission, ‘Commission Staff Working Document - Impact Assessment on the Modernisation of EU Copyright Rules Part 1/3’ (2016) Text SWD(2016) 301 final 107–109 accessed 24 February 2020. 114 Senftleben (n 108) 231–232. 115 DSM Directive art. 29 116 DSM Directive art. 25 117 InfoSoc Recital 32 118 DSM Directive art. 7(2), InfoSoc art. 5(5) 119 InfoSoc Recital 31 120 European Commission, ‘Impact Assessment on the Modernisation of EU Copyright Rules’ (n 113) 117–118.

directive that decides what is permissible and what is not, but the reasoning of the legislator could perhaps give a hint of how a rightholder could argue before the CJEU to question the lawfulness of a broad national implementation. If a balance is nevertheless struck, this provision certainly opens the door to fragmentation between Member States, and the sought harmonisation will suffer. The DSM Directive takes the role of a minimum exception and, as is its habit, the EU legislator has drafted very general and malleable provisions and clearly stated that exceptions related to TDM may be introduced under other directives. On the one hand it’s laudable that the Member States that wish to support this type of research have been given the discretion to create more generous national exceptions for TDM, but on the other it might prove to be catastrophic for the goal of harmonisation enabling cross-border activity.

From here on, Article 3 of the DSM Directive will be referred to as “the scientific research exception” and Article 4 as “the general exception”. The latter is sometimes referred to as “the commercial exception”, but I find that misleading since there is room for some commercial intent under Article 3 and for non-commercial use under Article 4.

The Scientific Research Exception under the DSM Directive Article 3

Article 3 of the DSM Directive provides an exception for research organisations to carry out TDM – including storage of the copies produced in the process – for the purpose of scientific research. The exception only applies where the user already has lawful access to the work or other subject matter, but where they do, the rightholder is prevented from contracting out121. Licence terms excluding TDM are thereby null and void.122 As opposed to the general exception, which applies to the InfoSoc Directive, Database Directive and Software Directive123, the scientific research exception only makes an explicit reference to the former two.124 It is unclear whether this is an oversight by the legislator and whether a reference to the Software Directive is in fact necessary, since software is considered a literary work in copyright law.125

121 DSM Directive art. 7(1) 122 Hugenholtz (n 29). 123 Directive 2009/24/EC of the European Parliament and of the Council of 23 April 2009 on the legal protection of computer programs (Software Directive) [2009] OJ L 111, 5.5.2009, p. 16–22 124 DSM Directive art. 4(1) and 3(1) 125 Benjamin White and Maja Bogataj Jančič, ‘Articles 3-4: Text and Data Mining’ (Guidelines for the Implementation of the DSM Directive) accessed 27 April 2020.

The exception was inspired by and partially overlaps126 the InfoSoc Directive’s teaching and scientific research exception127, which continues to govern all use for the purpose of scientific research besides TDM128. The InfoSoc exception encompasses any reproduction for the purpose of teaching or scientific research, hence simply making it mandatory would have effects reaching far beyond TDM. More importantly, input material can enjoy both copyright and sui generis protection and while the exception is available for both, it is with slight variations: For the copyright exception to apply, the sole purpose has to be scientific research without commercial purpose, hence any type of project with a commercial gain, including private-public cooperation, is excluded. Unlike the copyright exception, the corresponding provision under the sui generis protection129 lacks the word “sole”, which indicates that projects with scientific research as only part of their purpose could use the exception. Hence, the exception under the sui generis protection could potentially cover the entire TDM process when performed for that purpose. A hurdle with the sui generis protection, on the other hand, is that it requires the source to be named, which is stricter than the copyright exception and problematic for TDM since the process requires a lot of sources.130 Finally, the earlier exceptions are optional, and whether they have been implemented as well as their scope varies a lot between Member States131; the mandatory status of the scientific research exception for TDM is a step closer to removing this barrier. In addition, it opens the door to public-private cooperation through the definition of a research organisation and it explicitly handles storage of the input material.

The General Exception under the DSM Directive Article 4

In the first drafts of the directive there was only a TDM exception for scientific research; the exception for other purposes was added later and remained optional until the final legislative proposal, where it received mandatory status.132 The general exception applies to TDM of lawfully accessed works, regardless of the beneficiary’s purpose (other than performing TDM).133 As such, it addresses a wider group of potential beneficiaries and purposes compared to the scientific research exception, but it provides considerably weaker protection since it is subject to the rightholder’s express reservation of the right to make

126 Bottis and others (n 5) 185. 127 InfoSoc art. 5(3)(a) 128 DSM Directive Recital 15 129 Database Directive art. 9(b) 130 Caspers and Guibault (n 14) 32. 131 ibid 30. 132 Hugenholtz (n 29). 133 DSM Directive art. 4(1)

reproductions and extractions for the purpose of TDM134.135 This can be achieved by contract as well as by technical means, meaning that the rightholder may actively prevent TDM without a scientific purpose through, for example, contract, machine-readable means or terms and conditions on their website.136 Copies generated during the mining process may only be kept as long as necessary for the purposes of TDM137, which is reminiscent of the temporary reproduction exception, both in wording and content.138 Indeed, it has been suggested that by exchanging “temporary” for “intermediary” copy to identify a stage in a process rather than to put a time limit on it, the temporary reproduction exception could encompass TDM until the stage of publishing. An intermediary copy should in this setting be understood as a copy that doesn’t have any independent significance in itself but is part of a technological process.139 However, the DSM recitals140 state that the exception for temporary reproductions continues to apply to TDM activity falling within said exception’s scope, which acknowledges that both of the TDM exceptions are intended to go beyond and not interfere with the temporary reproduction exception.

134 DSM Directive art. 4(3) 135 Hugenholtz (n 29). 136 ibid; Rossana Ducato and Alain Strowel, ‘Limitations to Text and Data Mining and Consumer Empowerment: Making the Case for a Right to Machine Legibility’ (2018) 2018 CRIDES Working Paper Series 45, 17. 137 DSM Directive art. 4(2) 138 Ducato and Strowel (n 136) 16. 139 Borghi and Karapapa (n 23) 58–59. 140 DSM Directive Recital 9 and 18

3. WHY AND HOW MIGHT COPYRIGHT BE AN OBSTACLE FOR TDM

This chapter aims to explain how copyright, despite recent additions, still might create barriers for the performance of TDM. It starts with a brief note on how legal uncertainty negatively affects the development of the digital single market and the EU’s competitiveness relative to, for instance, the US. It proceeds by discussing various aspects where EU copyright law may hinder the efficient performance of TDM, using the new DSM Directive as a starting point but with reference to the InfoSoc and Database directives where relevant.

TDM is part of the EU data economy together with for instance smart manufacturing and the internet of things141 and in 2015 the value of the EU data economy alone was 1.87% of EU GDP, an increase of 5.6% on the year before.142 The EU legislator has recognised that by not accommodating TDM the digital single market risks competitive disadvantages relative to more accepting regimes; acquiring licences is burdensome and there is a risk that EU entities would therefore choose to perform their TDM activity outside of the Union, that companies based outside the Union would prefer not to invest in companies within the EU and finally, that talented European researchers would seek occupation elsewhere (where they would be less restricted).143 Researchers are reluctant to test the law, even where there is an exception, if it is not well defined.144 Before the introduction of the DSM Directive, TDM was associated with a lot of legal uncertainty; it was not clear whether TDM required authorisation from the rightholder, could benefit from an exception or whether the intended use would constitute infringement of the rightholder’s exclusive rights at all.145 The DSM Directive aims to provide legal certainty in the digital environment and cross-border situations in general, and in particular regarding when TDM acts might infringe copyright or the sui generis database right.146

141 Hugenholtz (n 51) 51. 142 European Commission, ‘Communication from the Commission to the European Parliament, the Council, the European Economic and Social Committee and the Committee of the Regions “Building A European Data Economy”’ (2017) COM(2017) 9 final 2 accessed 28 May 2020. 143 Pamela Samuelson, ‘The EU’s Controversial Digital Single Market Directive - Part II: Why the Proposed Mandatory Text- and Data-Mining Exception Is Too Restrictive - Kluwer Copyright Blog’ (Kluwer Copyright Blog, 12 July 2018) accessed 14 February 2020. 144 Brook, Murray-Rust and Oppenheim (n 17) 3. 145 Rosati (n 9) 6. 146 DSM Directive Recital 8 and 3

3.1 Beneficiaries

As mentioned above, anyone with lawful access can benefit from the general exception, but as shall be further elaborated below, the scientific research exception provides a stronger defence for users and it is therefore desirable to try to become eligible for the latter. For a project to be able to benefit from the scientific research exception it needs to have a scientific research purpose and the beneficiary needs to be part of and pursue the project within a research organisation.147 This is sometimes referred to as the “double limitation”. It follows from the recitals that a scientific research purpose should be understood to include both the natural and human sciences148, implying a diverse range of possible application areas, which is welcome if the aim is to permit use of TDM in as many situations as possible, but at the same time the vague formulation creates legal uncertainty: How does one know if one is approaching the border of natural or human sciences and does it mean that the purpose needs to be solely scientific as opposed to mainly educational but partly scientific?149 Sometimes scientific discoveries are made by “stumbling across” something and TDM is a technique apt for finding connections or patterns in unexpected places. In that sense, a law demanding a clear and precise research plan for the exception to apply would restrict the potential of TDM.

As for the second limitation of the applicability of the exception, the directive defines “research organisations” based on their scope and structure. First, they need to have been established with the primary goal of conducting scientific research (possibly together with educational services). Second, their work needs to be carried out either on a not-for-profit basis or as part of a public-interest mission recognised by the state.150 Such recognition can be reflected through public funding, provisions in national laws or through public contracts. Universities and their libraries as well as hospitals which perform research are offered as examples. Finally, a negative example is provided: Organisations over which a commercial undertaking has a decisive influence, which could result in preferential access to the results, fall outside the scope of the definition.151 It is not entirely clear if this example is supposed to highlight an organisation which fails to meet the not-for-profit or the public-interest

147 DSM Directive art. 3(1) 148 DSM Directive Recital 12 149 Geiger, Frosio and Bulayenko (n 5) 33; Geiger, Frosio and Bulayenko (n 62) 28. 150 DSM Directive art. 2(1) 151 DSM Directive Recital 12

requirement or both, but it is tempting to use it as an example of an organisation that is not not-for-profit to get an indication of where the legislator intends the line to be drawn. Neither is it clear whether donations from private undertakings may change an entity’s status as a research organisation. The definition provided by the directive allows Member States to define research organisations in accordance with national law, so the question arises what the outcome would be if an entity in a state with very inclusive rules wants data from a state applying a more restrictive definition. Scholars have expressed concern regarding the definition of a research organisation due to its use of vague legal terms which are not defined, and which they fear will result in a lot of work for the CJEU.152 As an example, the research for a vaccine will likely meet the scientific research purpose requirement and it follows quite clearly that a medical institution of a university researching a vaccine constitutes a research organisation and can benefit from the scientific research exception. A scientist on secondment to a pharmaceutical company seems unlikely to meet the requirements due to the company’s strong underlying purpose of maximising income, unless it can be shown that the work is done within a public-private partnership. As indicated above, the latter depends on how big an influence the pharmaceutical company has on the project. An independent researcher, on the other hand, falls quite clearly outside the scope of the exception, since they don’t belong to an organisation – even if the same work could have benefited from the exception had it been performed for instance at a university or in a hospital.
Hence, it is the “who” and not the purpose of what they intend to do that puts most restrictions on the scope of beneficiaries, and calling it “exception for research organisations” would have corresponded better with the content of the provision since “scientific research” promises a much wider scope than “research organisations”. (However, naturally the headings need not affect the content of national implementation.)

The exception as adopted in the Directive is limited to not-for-profit scientific research and scientific research with a public interest objective153 but an alternative to limiting beneficiaries to research organisations would have been to limit them to non-commercial entities (an option which was considered by the Commission). However, restrictions on

152 Max Planck Institute for Innovation and Competition, ‘Position Statement of the Max Planck Institute for Innovation and Competition on the Proposed Modernisation of European Copyright Rules: PART B Exceptions and Limitations: Chapter 1 Text and Data Mining’ (2017) 4 accessed 22 April 2020. 153 ibid 2.

commerciality would be counterproductive in the sense that a lot of discoveries valuable to society are made by commercial entities or with a commercial interest. The above-mentioned BlueDot project was executed by a commercial entity, the Hathi Trust154 copies were made by a public-private partnership, journalism is often driven by private companies and research projects – in particular for medical treatments – are commercialised after the project.155 On a practical level it can also be difficult to distinguish between commercial and non-commercial in public-private partnerships.156 The scientific research exception has an expansive scope in that it includes non-commercial users as well as the work that commercial organisations do as part of a public-interest mission.157 Rather than excluding commercial entities the EU legislator has chosen to define who should be eligible for the exception – and in doing so avoided the problems associated with interpreting “commercial”. However, as with a non-commercial requirement, a lot of possible contributors to societal gain are still excluded – only covering research organisations means that start-ups, individual researchers and journalism fall outside the scope of beneficiaries158 – and it has been argued that the double limitation therefore comes very close to a limitation to non-commercial purposes.159 The reason beneficiaries were limited to research organisations in the first place was to compensate rightholders for the lack of a non-commercial requirement160 and the rationale was that others should pay for a licence to mine161, leaving the purely commercial TDM market untouched162.163 While it is possible to understand why larger corporations could be required to pay, limiting the TDM exception to non-commercial scientific research would have had an unfavourable outcome for unaffiliated researchers such as independent data scientists and think-tank personnel.164 It has even been argued that not letting SMEs benefit

154 A not-for-profit digital collaboration between academic and research libraries. https://www.hathitrust.org/ Accessed 28 May 2020. 155 Flynn and others (n 24) 9–10. 156 Bottis and others (n 5) 180. 157 Geiger, Frosio and Bulayenko (n 62) 27; Bottis and others (n 5) 180. 158 Geiger, Frosio and Bulayenko (n 62) 27–28. 159 Margoni and Kretschmer (n 12). 160 European Commission, ‘Assessment of the Impact of the European Copyright Framework on Digitally Supported Education and Training Practices: Final Report’ (Publications Office of the European Union 2016) 116 accessed 26 May 2020. 161 Margoni and Kretschmer (n 12). 162 European Commission, ‘Impact Assessment of the European Copyright Framework on Digitally Supported Education and Training Practices’ (n 160) 116–117. 163 This refers to the market for value added licences (including for instance normalisation of data) common in particular in life science and pharmaceuticals. 164 Samuelson (n 143).

from the exception is against the freedom of expression and information165 as well as the freedom to conduct a business166.167 Possibly this is a result of the fact that in the Impact Assessment stakeholders were divided into researchers, corporate research users and rightholders. That means that everyone who might use TDM and who wasn’t a researcher was grouped together, creating a very diverse group, and I am very doubtful that it was possible to summarise their various needs and possibilities fairly. Hence, while the statement that corporate research users had “generally not asked EU intervention” and were deemed to have different needs than research organisations168 may apply to corporate research, this does not mean that the entire group benefited from it. The definition of “research organisation” – despite trying to be very general and inclusive – to me reveals a very traditional view of where scientific research is conducted, by whom and for what purpose, and runs contrary to the objective of promoting innovation in scientific research.

An additional drawback for scientific innovation in Europe is that both Japan and the US apply exceptions that are not limited to non-profit scientific research; hence the EU Digital Single Market could suffer a competitive disadvantage by applying a narrow exception.169 The general exception is, as we shall see later, not as strong. Finally, TDM is more common in the commercial sector and the insurmountable obstacle of obtaining licences from rightholders is no smaller for commercial entities than for research organisations170 – hence it is a group that should not be omitted.

The above objectives could easily be achieved by extending the group of beneficiaries to everyone with lawful access – as is the current practice in the UK.171 Introducing such a criterion ought not to be so controversial, in particular in comparison to other IP law. Similar provisions can be found in the right to use information reached by reverse engineering in

165 Charter of Fundamental Rights of the European Union [2012] (Fundamental Rights Charter) OJ C 326, 26.10.2012, p. 391–407 art. 11 166 Fundamental Rights Charter art. 16 167 Margoni and Kretschmer (n 12). 168 European Commission, ‘Impact Assessment of the European Copyright Framework on Digitally Supported Education and Training Practices’ (n 160) 116. 169 Samuelson (n 143). 170 Eleonora Rosati, ‘EU Text and Data Mining Exception for the Few: Would It Make Sense?’ (2018) 13 Journal of Intellectual Property Law & Practice 429, 429–430. 171 Geiger, Frosio and Bulayenko (n 62) 33.

trade secret law as well as the exception for experimental use172 for the purpose of increasing scientific knowledge in patent law.173 It can even be said that the increase of scientific knowledge through experimental use is encouraged through the publication of detailed descriptions of patented inventions174. Similarly, the Software Directive makes an exception “to observe, study or test the functioning of the program in order to determine the ideas and principles which underlie any element of the program”.175 Hence, just like the rightholder who allows others to use their software has to assume that the users might make reproductions for said reason, the rightholder of copyright protected works could be expected to assume that someone with lawful access might perform TDM.176 Copyright needs to continue to make room for follow-on creativity and the exercise of fundamental rights.177 Research is firmly justified by fundamental rights through the freedom of information and safeguarded through limitations in the scope of the exclusive rights and through exceptions and limitations.178

3.2 Lawful Access

Common to both of the TDM exceptions in the DSM Directive is that the user is required to have lawful access to the work, which according to the recitals includes open access, licence and works freely available online.179 These examples alone are in my opinion too detailed to give users a satisfactory definition to rely on for legal certainty. One situation not addressed by the Directive is whether lawful access to a public library is sufficient to mine material submitted to the library as legal deposits.180 Another situation which is unclear under the DSM Directive is sending copies across borders, which is important for cooperation and validation, and this applies even where both of the nations concerned permit TDM – for example it is not apparent from the Directive whether a researcher in the EU may

172 76/76/EEC: Convention for the European patent for the common market (CPC 1975) [1975] OJ L 17, 26.1.1976, p. 1–28 art. 31(b), Agreement on a Unified Patent Court 2013 [2013] (UPC Agreement) OJ C 175/1 art. 27(b) 173 Flynn and others (n 24) 5-6 supra note 16. 174 Pila and Torremans (n 50) 194. 175 Software Directive art. 5(3) 176 Max Planck Institute for Innovation and Competition (n 152) 6–7. 177 Flynn and Quintais (n 26). 178 Flynn and others (n 24) 5. 179 DSM Directive art. 3(1) and 4(1) and Recital 14 180 Laurent Fournier and Jonas Holm, ‘Nya förutsättningar för text- och datautvinning’ (Kungliga Biblioteket (National Library of Sweden), 27 February 2020) accessed 27 April 2020.

transfer a lawfully acquired database to a research partner in the US for mining.181 Also unclear is the relationship (if any) between “lawful access” and the concept of “lawful use” from the temporary reproduction exception182 – meaning use either authorised by the rightholder or not restricted by law183. If the concepts are intended to have the same meaning, the same phrase should have been used.184

The Directive has been criticised because subjecting TDM to lawful access makes it possible for publishers (or other rightholders) to prevent beneficiaries from performing TDM by not granting them a licence or by increasing the price. Additionally, there is a fear that licensing prices may go up if publishers routinely start adding a fee for TDM, which risks creating a gap between rich and poor research institutions and/or between Member States as regards innovation.185 Scholars argue that if a use does not harm a market, it should not matter from the perspective of fairness towards exclusive rights whether the source has been accessed in a lawful manner or not.186 I would argue that the act of unlawfully accessing a work harms the market of the source. To make a harsh comparison: It is not illegal to read a stolen book, but it is illegal to steal it. On the other hand, it can be argued that purchase of a licence was never a feasible option due to high prices and that the market therefore was not harmed, but this is a larger discussion at the boundaries of this thesis, so I will leave the question open. The lawful access requirement is intended to shield “private actors from an obligation to open up their data to third parties”187 and it secures payment to the rightholders, which supposedly strikes a balance between them and users of a wide TDM exception. In other words: rightholders are compensated for the TDM exception through the lawful access requirement since it secures payment for access.188 This should also be seen in the light of the fact that, at least for the scientific research exception, rightholders are not allowed to exclude TDM from licence terms.

181 Flynn and others (n 24) 10. 182 InfoSoc art. 5(1) 183 InfoSoc Recital 33 184 Geiger, Frosio and Bulayenko (n 62) 32. 185 ibid 30; Geiger, Frosio and Bulayenko (n 5) 33–34; European Copyright Society (n 86) 4. 186 Flynn and others (n 24) 10. 187 Geiger, Frosio and Bulayenko (n 5) 33 supra note 138; European Commission, ‘Expert Group Report’ (n 94) 58. 188 Nicolas Jondet, ‘The Text and Data Mining Exception in the Proposal for a Directive on Copyright: Why the European Union Needs to Go Further than the Laws of Member States’ (2018) 67 Propriétés Intellectuelles 25, 19.

Looking at how some Member States had already implemented a TDM exception before the DSM Directive, it is difficult to tell with certainty how lawfulness has been handled without making a closer study than this thesis allows, but the UK189 and France both require lawful access, the Estonian Copyright Act requires attribution – which could indirectly indicate a lawful access requirement – while Germany’s copyright law is silent on the matter190. Outside of Europe it seems national copyright law tends to focus on what you intend to do with the data, which purpose you have and whether that purpose is commercial or not.191 The American fair use doctrine is an example of this, but it should be noted that the lawfulness of unlicensed TDM has not yet been expressly ruled on.192

TDM is an important tool for research and it has therefore been argued that it lies in the public interest not to give rightholders control over its use.193 Arguably it is essential to draft an exception that does not indirectly give the rightholder too much power, forcing researchers to resort to sites dedicated to circumventing paywalls such as Sci-Hub and LibGen194, but on the other hand it should not be forgotten that exceptions do not generally grant access, but permit certain use(s) for defined purposes.

Contractual override

Rightholders commonly use contractual provisions to limit access to and use of their works by excluding TDM from permitted uses under a licence or making it subject to additional payment. To guarantee TDM and make an exception truly efficient it should not be overridable by contract195 and contractual provisions contrary to the scientific research exception have been made null and void196, or in plain English, if you have a licence to access content, any provision saying that TDM is excluded from the allowed uses under the licence

189 Albeit, no longer a Member State. 190 Caspers and Guibault (n 14) 29–30; Geiger, Frosio and Bulayenko (n 5) 24–26. 191 European Commission, ‘Expert Group Report’ (n 94) Section 4.1 ‘TDM outside Europe’. 192 Rosati (n 9) 15. 193 European Copyright Society (n 86) 5. 194 Balázs Bodó, ‘The Science of Piracy, the Piracy of Science. Who Are the Science Pirates and Where Do They Come from: Part 1’ (Kluwer Copyright Blog, 6 March 2019) accessed 4 May 2020. 195 Iain Hargreaves, ‘Digital Opportunity - Review of Intellectual Property and Growth’ (Department for Business, Innovation & Skills 2011) Independent Report 11/968 47 accessed 24 April 2020. 196 DSM Directive art. 7(1)

is unenforceable.197 The general exception, however, can as mentioned above only be used as long as the rightholder has not expressly reserved use of the work in an appropriate manner198, for instance by expressing it in the terms and conditions of a website, through contractual agreements or by machine-readable means199. This would create problems for an independent researcher needing access to a source originating from a major publishing house if the licence agreement excludes TDM. A researcher affiliated with a research institution could use the scientific research exception as a defence for performing TDM, while an individual researcher or journalist would have to try to get an extended licence by paying more. However, for mining of sources protected neither by copyright nor by the sui generis right, the power of contract is a general problem.200 The CJEU made it clear in Ryanair v PR Aviation that even if the definition of a database in the Database Directive201 is met, the provisions concerning copyright or sui generis protection do not apply if the database fails to meet the conditions202 for protection.203 In practice this means that the directive cannot hinder the rightholder of an unprotected work from limiting use through contract204 as long as the contractual terms comply with other provisions in national law.205 As an example, an online database consisting of automatically stored records of historical sales on the stock market would meet the definition of a database if sorted in chronological order, but would fail to meet the originality as well as the substantial investment criteria and would be ineligible for protection. As a result, none of the TDM exceptions would provide a defence for mining the website if its terms and conditions exclude TDM from permitted uses, regardless of whether the site is otherwise publicly available.
In essence this means that the user of a protected database has more extensive possibilities to perform TDM than the user of an unprotected database.206 The situation has not changed since the introduction of the DSM Directive, so mining data such as that in the above example would require rightholder authorisation (likely in exchange for compensation) for lawful access.

197 Hugenholtz (n 29); Geiger, Frosio and Bulayenko (n 62) 27. 198 DSM Directive art. 4(3) 199 DSM Directive Recital 18 200 Ducato and Strowel (n 136) 19. 201 Database Directive art. 1(2) 202 Database Directive art. 3(1) “the author’s own intellectual creation” and art. 7 “substantial investment” 203 C–30/14 Ryanair v PR Aviation para 35 204 C–30/14 Ryanair v PR Aviation para 39 205 Rosati (n 9) 12. 206 Caspers and Guibault (n 14) 45–46.

Technical override

Another of the most common options for rightholders to try to control their resources is technological protection measures (TPMs), i.e. tools controlling access to and/or what users can do with a digital work207. As is the case for contractual means, the consequences for performers of TDM vary depending on which exception is available. As stated previously, the rightholder has the possibility to opt out from the general exception by expressly reserving use in an appropriate manner.208 In addition to contract, such a manner can be technological means such as metadata, paywalls, password control systems, time-limited access, encryption measures etc.209 As an example: some useful input material consists of large parts of a certain online database available for download until a predefined limit has been reached. Once that limit has been reached, the user is technically prevented from downloading more. A beneficiary of the general exception is not allowed to write code to work around the technical barrier since, by introducing it, the rightholder has reserved use by technical means and thereby prevented users from lawful use. As we shall see below, this applies even if the restrictions go beyond what is stipulated by copyright law, due to the strong protection against circumvention of technological measures. Had the user been a beneficiary of the scientific research exception, the picture would be different in theory, but risks having the same result in practice. On first impression it may seem that this exception is non-overridable by TPMs since the rightholder is only entitled to apply measures with the purpose of ensuring the security and integrity of the networks and databases where the works or other subject matter are stored210 – for instance to ensure that only persons with lawful access can actually access the content.
According to the recitals, these measures should be proportionate and may not go beyond what is necessary to reach that purpose and they should not undermine effective application of the scientific research exception.211 Some have praised the recitals’ clarity regarding when measures may be applied and to what extent212 while others have expressed that the restriction that measures should not undermine the effective application of the exception needs to be entered into the article for clarity and to strengthen beneficiaries.213 Due to the strong protection of technological measures under the

207 Teresa Nobre and Natalia Mileszyk, ‘Article 7: Contractual and Technological Override’ (Guidelines for the Implementation of the DSM Directive) accessed 4 May 2020. 208 DSM Directive art. 4(3) 209 DSM Directive Recital 18, Nobre and Mileszyk (n 207). 210 DSM Directive art. 3(3) 211 DSM Directive Recital 16 212 Bottis and others (n 5) 182. 213 Geiger, Frosio and Bulayenko (n 62) 30.

InfoSoc Directive, there is concern that the rightholder’s possibility to apply measures actually opens the door to TPMs to a greater extent than the provision in the DSM Directive might suggest at first sight.214 Before continuing, a review of the protection for technological measures in copyright law is called for:

The InfoSoc Directive article 6(1) prohibits circumvention of technological measures215 and it is noteworthy that the scope of protection encompasses not only technological measures designed to prevent copyright infringement (such as anti-copying devices), but measures preventing any use which has not been authorised by the rightholder, regardless of whether such use constitutes a copyright-relevant act or not.216 The alternatives available to the rightholder are in other words restricted by what is technologically possible rather than what is prescribed by copyright law – they can control access and use of works beyond the exclusive rights217 – and this is sometimes referred to as “”.218 As such, the extensive prohibition of circumvention is a threat to users’ rights, both for non-restricted uses such as enjoyment of the work and for the ability to benefit from copyright exceptions in general.219 For TDM, the possibly far-reaching effects of technological measures applied to ensure security and integrity risk creating a barrier.220

InfoSoc article 6(4) sub-paragraph 1 serves to clarify the relationship between the prohibition of circumvention of technological measures in article 6(1) and the copyright exceptions in article 5221 and stipulates that unless the rightholder has applied voluntary measures to ensure the possibility to benefit from certain explicitly referred exceptions, the Member State shall ensure that the rightholder does so.222 An important distinction to make is that the purpose of

214 Samuelson (n 143). 215 InfoSoc art. 6(1) 216 Séverine Dusollier, ‘Tipping the Scale in Favor of the Right Holders: The European Anti-Circumvention Provisions’ in Eberhard Becker and others (eds), Digital Rights Management: Technological, Economic, Legal and Political Aspects (Springer 2003) 465–466 accessed 26 March 2020. 217 Séverine Dusollier, ‘The Protection of Technological Measures: Much Ado About Nothing or Silent Remodeling of Copyright?’ in Rochelle Cooper Dreyfuss and Jane C Ginsburg (eds), Intellectual Property at the Edge (Cambridge University Press 2014) 255–256 accessed 26 March 2020. 218 ibid 253. 219 Geiger, Frosio and Bulayenko (n 5) 35–36. 220 ibid 19. 221 Alvise Maria Casellati, ‘The Evolution of Article 6.4 of the European Information Society Copyright Directive’ (2001) 24 Columbia - VLA Journal of Law & the Arts 369, 374; Dusollier (n 217) 254. 222 InfoSoc art. 6(4) 1st sub-para

the provision is not to give access to possible beneficiaries but to safeguard efficient application of the listed copyright exceptions once there is legal access to the source.223 A slightly outdated but still illustrative example is a teacher who has bought a dictionary on CD-ROM with a technical lock which prevents her from making copies to use for educational purposes.224 With the intention of making the TDM exceptions non-overridable by TPMs225, they were added to the “list” of exceptions to be guaranteed by the Member States.226 The Member States have the discretion to decide how to make it possible for beneficiaries of exceptions to make use of a work blocked by TPMs, but a report from 2016 showed that only nine states had implemented some kind of procedure.227 Some states have chosen court procedures while others require users to file a complaint with the relevant authority.228 The dedicated British authority has reportedly only received a handful of requests since the establishment of the procedure229, and these types of systems have been criticised since there is a risk that users believe either that they will not be able to use an exception or that trying to exercise their rights would be a waste of time with uncertain results.230

It follows from article 6(4) sub-paragraph 1 that the legislator’s preferred solution is for the rightholder to voluntarily “facilitate the exercise of exceptions to their rights”. This is really quite remarkable in that it creates something of an anti-theft alarm that goes off regardless of the intentions of the user and of whether there is an applicable exception. The weakness in this system is that it doesn’t require rightholders to remove TPMs as long as they give beneficiaries of an exception the means to still get access231, forcing the user to contact the rightholder or subsequently go through the process provided by the Member State for each work they intend to use. Given the vast number of sources needed to perform TDM and the lack of a time limit dictating how long it may take to get access, this is bound to slow down the TDM process. According to a survey conducted in 2020, users are required to wait on average

223 Dusollier (n 217) 254 and 472. 224 Dusollier (n 216) 472. 225 European Commission, ‘Expert Group Report’ (n 94) 57. 226 InfoSoc art. 6(4) through DSM Directive art. 7(2) 227 European Commission, ‘Impact Assessment of the European Copyright Framework on Digitally Supported Education and Training Practices’ (n 160) 63–64. 228 Nobre and Mileszyk (n 207); Urs Gasser, ‘Legal Frameworks and Technological Protection of Digital Content: Moving Forward Towards a Best Practice Model’ (Berkman Klein Center for Internet & Society 2006) 2006–04 29–30 accessed 30 March 2020. 229 Margoni and Kretschmer (n 12). 230 Martina Gillen and Gavin Sutter, ‘DRMS and Anti-Circumvention: Tipping the Scales of the Copyright Bargain?’ (2006) 20 International Review of Law, Computers & Technology 287, 291. 231 Nobre and Mileszyk (n 207).

one month232 before they receive access to content blocked by TPMs – and some reported never having received access despite contacting the rightholder.233 For perspective, this can be compared to the fast reconnection for consumers on services such as iTunes.234 In addition, the same survey showed that when rightholders find out that users mine their material, they are in the habit of applying sanctions such as suspension of campuses, threats to cut off access, technical limits on downloads, requirements of additional payment and technical frustration of TDM. This causes scientists to circumvent the TPMs themselves or to resort to Sci-Hub and the like235, and Member States should be aware that these are the signals they send to the research community if they use the same solution when implementing the provisions regarding the relationship between TPMs and the new TDM exceptions. The scientific research exception doesn’t explicitly address this type of retribution from rightholders, but the use of the past tense in the general exception implies that such acts of “punishment” are not within the scope of having expressly reserved use in an appropriate manner.

The solutions implemented by the Member States to ensure availability of exceptions following InfoSoc article 6(4) sub-paragraph 1 have been described as disproportionate in relation to what can be gained236 and the above is evidence that the existing exceptions are limited not by law but by technology and are as such not very effective in practice.237 The advantage is that the beneficiary doesn’t have to be a hacker to enjoy exceptions, and the option to apply voluntary measures makes room for freedom of contract.238 However, the EU’s solution has been called insufficient and accused of putting beneficiaries in a weak position because, while an obligation to do only what is necessary has been put on the rightholder, beneficiaries have no authority to act and circumvent illegal TPMs.239 The provision turns use of an exception into a matter of negotiation and contracting with the rightholder.240 If the procedures to exercise an exception are too burdensome, the law has made the exceptions so inefficient that they might as well not have been there in the first place. It has been said that

232 From 24 hours to 2,5 months 233 LIBER Europe, ‘Europe’s TDM Exception for Research: Will It Be Undermined By Technical Blocking From Publishers?’ (LIBER, 10 March 2020) accessed 24 April 2020. 234 ibid. 235 ibid. 236 Dusollier (n 217) 264. 237 Nobre and Mileszyk (n 207). 238 Casellati (n 221) 378. 239 Geiger, Frosio and Bulayenko (n 5) 35; Geiger, Frosio and Bulayenko (n 62) 31–32; Margoni and Kretschmer (n 12). 240 Dusollier (n 216) 478.

“ineffective rights regimes are worse than no rights at all”241 and I think this extends to exceptions too.

The anti-circumvention provision has so far resulted in more scholarly critique than case law242 but this doesn’t have to mean that all the critique directed towards the provision at the introduction of the directive was unjustified.243 Empirical studies show that potential beneficiaries of exceptions feel restricted by the prohibition, and there are indications that copyright holders are gradually shifting their strategy from prevention of use and reproduction towards moderation and control, replacing the most restrictive TPMs with more refined ones designed to keep users sufficiently happy by letting them use the work in a controlled manner – for instance by limiting possible reproductions of music files purchased online to a certain number of copies or devices.244 The legal issue has as such not been resolved, and this brings the dangerous side effect of steering user behaviour245, which in my mind is contrary to the free flow of information and ideas at the core of copyright.

Returning to application of the scientific research exception with the above in mind, the legislator also chose to include a new feature in the DSM Directive: Member States shall encourage rightholders and research institutions to define commonly agreed best practices regarding storage of copies and application of technical measures.246 This is reminiscent of the solution offered for far-reaching technical measures247 in that the rightholder is given the opportunity to apply voluntary measures to make the works available before the state intervenes. For the scientific research exception, the encouragement of best practices comes in addition. Hence it would seem that Member States need not just stand by and hope for rightholders to apply voluntary measures; they have an opening to encourage them, and it is in their interest to be perceived as encouraging because, in the absence of voluntary measures, they have an obligation to ensure that rightholders enable mining in accordance with the exception. It seems rightholders’ ability to apply TPMs is weakened since they may only be applied to protect the platform’s security and performance, to avoid overload caused by high use and similar, but it is left to the Member States to choose how high security they can

241 Hargreaves (n 195) 5. 242 Dusollier (n 217) 258. 243 ibid 254 and 264. 244 ibid 264–266. 245 ibid 266. 246 DSM Directive art. 3(4) 247 InfoSoc art. 6(4) 1st sub-para

accept and how to implement this provision, but through DSM article 3(4) the EU legislator implies application of “soft law”. It is also up to national law to determine whether the encouragement to find “best practices” should be a one-off or under continuous revision. Arguably it would be challenging to find the optimal solution on the first try, but it is very difficult to draw any conclusions regarding the efficiency of the exception at this stage. Finally, it ought to be mentioned that the three-step test applies to the implementation of the new exceptions and that a fair balance248 should be kept between rightholders and users. Part of the fair balance is the continued application of the Member States’ obligation249 to ensure that the rightholders make the exceptions available to the beneficiaries250, and the best practice solution has already been subject to critique for not fully compensating for the fact that rightholders can block beneficiaries or restrict how TDM is performed by use of technological measures251. Hence it will be interesting to see how the Member States try to strike a balance.

As for the general exception it would be fair to say that the legislator has tipped the scales in favour of the rightholder considering the possibility to opt out using technical means and the prohibition of circumvention. The best practices provision doesn’t apply to the general exception, but the exception is included in the list of exceptions that Member States should guarantee access to when blocked by technological measures252. This creates a situation very similar to that where InfoSoc article 6(4) sub-paragraph 4 is applicable – a paragraph which has been called “the greatest defect of the whole construction [of InfoSoc’s provisions on circumvention of technical means]”253. The abovementioned obligation to provide beneficiaries of a copyright exception with a way around technological measures254 does not extend to “works or other subject-matter made available to the public on agreed contractual terms in such a way that members of the public may access them from a place and at a time individually chosen by them.”255 Consequently, it does not apply to subscription services (such as Westlaw) or streaming services (such as Spotify).256 The more customary a library rather than a bookshop business model becomes in the digital industry, the more effect the

248 DSM Directive Recital 6 249 InfoSoc art. 6(4) 1st, 3rd and 5th sub-para 250 Bottis and others (n 5) 182. 251 EUA and others, ‘Future-Proofing European Research Excellence: A Statement from European Research Organisations on Copyright in the Digital Single Market’ 2 accessed 26 March 2020. 252 Ref. InfoSoc art. 6(4) sub-para 1 through DSM Directive art. 7(2) 253 Dusollier (n 216) 474. 254 InfoSoc art. 6(4) 1st sub-para 255 InfoSoc art. 6(4) 4th sub-para 256 Gillen and Sutter (n 230) 291.

exception will get and the balance will tip in the rightholder’s favour.257 An effect of the rule is that where information is accessed using digital means, the one who accessed it becomes the end user in the sense that further development of the content is prevented without the consent of the rightholder, and Member States are prevented from interfering if the rightholder of an on-demand/streaming service refuses to authorise or makes unreasonable demands. This risk of stopping the flow of information is, I dare say, contrary to the intention that copyright should enable an intellectual and creative exchange in the public. At its best this provision puts freedom of contract above copyright law258. It is therefore encouraging to see that InfoSoc article 6(4) sub-paragraph 4 is not applicable to any of the TDM exceptions, as the DSM Directive259 only refers to article 6(4) sub-paragraphs 1, 3 and 5, and that the recitals state that Member States should take appropriate measures in accordance with InfoSoc article 6(4) sub-paragraph 1 “including where works and other subject matter are made available to the public through on-demand services”260. I understand this to mean that there is no limitation to Member States’ authority to act to make the TDM exceptions available even for works which have been purchased under contract and made available online.261 This is good news for beneficiaries of the scientific research exception, but the possibility for rightholders to reserve use of their works under the general exception through technological means implies in practice that the status quo is kept for beneficiaries wishing to use input material available on demand or through streaming (i.e. online). This construction is likely to have similar effects as InfoSoc article 6(4) sub-paragraph 4 and slow down the efficient application of TDM, since online access is a prevalent way of accessing sources today.
Additionally, rightholders can now protect any type of source from TDM technologically.

To conclude the discussion on TPMs and overridability I would like to make the following remarks: A rightholder does not have the possibility to reserve use from beneficiaries of the scientific research exception, but they are allowed to use technical means to protect the functioning of their service. The prohibition of circumvention of technical measures should in theory not prevent users from getting access when the rightholder has introduced measures beyond what is necessary, preventing efficient use of the exception, but unless Member States make drastic changes when implementing the best practices provision, there are indications

257 ibid 291–292; Dusollier (n 216) 474. 258 Casellati (n 221) 388. 259 DSM Directive 7(2) 260 DSM Directive Recital 7 261 Nobre and Mileszyk (n 207).

that users’ weak position will remain262 and TPMs will continue to create obstacles for TDM beyond what is stipulated by copyright law – in particular in view of newer control-enhancing techniques such as blockchain technology. The rightholder may use technical means to reserve use towards “non-researchers” intending to perform TDM, which creates a situation similar to that under InfoSoc article 6(4) sub-paragraph 1. InfoSoc article 6(1) sub-paragraph 1 applies, but in practice it is difficult to see when Member States would have the possibility to intervene to help beneficiaries of the general exception to exercise it, given that rightholders can reserve use by using the same technical means. Seen together with rightholders’ possibility to also use contract to opt out, one wonders what remains of the exception when the ability to skip the need for authorisation is removed.263 Those who cannot rely on the scientific research exception are forced to try to reach a contractual agreement with the rightholder and there is no satisfactory justification for this; start-ups, journalists and information intermediaries have the same potential as researchers affiliated with a research organisation.264 Additionally, the chosen solution is very restrictive compared to jurisdictions with more open clauses and might reduce the EU Member States’ competitiveness.265 It cannot be denied that a clearer option would have been to spell out that neither of the DSM exceptions can be overruled by TPMs and to give users effective means to remove them.266

3.3 Retrieval and Analysis

A central part of the TDM process is the creation and analysis of a dataset. These steps are normally referred to separately, but will be dealt with together for the purposes of this thesis based on the assumption that if a project has come this far, the researchers have probably either found out that what they plan to do will not constitute an infringement, found an exception they can rely on or cancelled their plans. Hence, this part of the procedure generally doesn’t include any actions that will force reconsideration of a project’s permissibility from a legal point of view.

262 Ducato and Strowel (n 36). 263 A similar remark has been made about InfoSoc art. 6(4) 1st sub-paragraph: Dusollier (n 216) 472. 264 Max Planck Institute for Innovation and Competition (n 152) 3–4. 265 Geiger, Frosio and Bulayenko (n 5) 30. 266 Geiger, Frosio and Bulayenko (n 62) 34.

Interestingly, and as noted above, the mandatory exception for temporary reproductions267 from the InfoSoc Directive continues to apply to TDM268 and the analysis step of the TDM process might benefit from this if a technique is applied where copies are automatically deleted from the computer’s RAM and the making of them is an integral and essential part of the technological process. However, since the other stages don’t meet the transient requirement of the temporary reproduction exception, as the copies require manual work to be removed, it cannot be used as a defence for the entire TDM process.269 Additionally, it doesn’t apply to databases270 and there is no corresponding exception in the Database Directive, which suggests that there is no temporary reproduction exception applicable to databases.271 Hence the effectiveness of the exception used alone for TDM is limited and provides little legal certainty272, and it is necessary to turn to the DSM Directive.

3.4 Sharing Results and Spreading Knowledge

Once the analysis is finished there might be reason to share the results with the scientific community or a wider audience but, depending on the content, publication might constitute an infringement. It is not common to publish extracts from the mined source together with the result, but some research requires source material to be communicated within the research community for verification.273 Additionally, outside of the scientific sphere – for instance for journalists – publication of the results is very relevant.

In this instance the TDM exceptions in the DSM Directive have been treated the same by the EU legislator: as mentioned above they only apply to the right of reproduction, but it is possible to turn to the “old” copyright exceptions in search of an applicable defence. For copyright protected material, there are in particular two exceptions which could potentially be useful for publishing results: the exceptions for quotation and press. The copyright exception for quotation274 – which is too narrow to be useful for the other stages of the TDM process – could be used to permit quotes from the input material in publication of the results.275 If TDM

267 InfoSoc art. 5(1) 268 DSM Directive Recitals 9 and 18 269 Caspers and Guibault (n 14) 24. 270 InfoSoc art. 1(2)(e) 271 Caspers and Guibault (n 14) 23–24. 272 Geiger, Frosio and Bulayenko (n 5) 13. 273 Geiger, Frosio and Bulayenko (n 62) 8. 274 InfoSoc art. 5(3)(d) 275 Caspers and Guibault (n 14) 39.

has been performed for a journalistic purpose, the press exception276 might provide a defence, but only in very specific cases277 as only “the press” can benefit from it and the topic of the publication needs to be “current economic, political or religious”.278 Additionally, some Member States have restrictions on the medium and all use of the exception needs to be proportional, which can be difficult to know in advance. The attribution requirement naturally restricts efficient use of the exception as well.279 It therefore seems that a literal implementation of the exception might allow a data journalist to publish work online, but with the possible hurdle of attribution of the sources quoted. Finally, depending on if and how it has been implemented, the exception for teaching and scientific research280 might provide a defence, but the corresponding exception for works protected by the sui generis right cannot be used for the same purpose since it doesn’t extend to re-utilisation. However, the lawful user may not be prevented from re-utilisation of insubstantial parts of the data281 regardless of purpose, and this could provide the exception needed.282

The weakness preventing efficient application of TDM is that all of the above exceptions are optional – hence the same rules are unlikely to apply across borders within the EU. For efficient sharing of results it would therefore have been advisable for the EU to enable communication of research files283, possibly through the introduction of an exception in line with the parody exception284, which similarly is based on use of prior work without compensation to the rightholder. Interestingly, as mentioned earlier, the DSM Directive has given the Member States the option to adopt broader TDM exceptions as long as they are compatible with the exceptions in the InfoSoc and Database directives.285 It therefore seems possible to include a TDM exception for communication to the public and distribution286 by revision or implementation of one or more of the abovementioned exceptions as long as it passes the three-step test. If successful, this has the potential to increase legal certainty for TDM users. For copyright protected material, introducing a press exception extended to include parts of the input material would mean that at least journalists would with certainty be

276 InfoSoc art. 5(3)(c) 277 Caspers and Guibault (n 14) 44. 278 ibid 43. 279 ibid 44. 280 InfoSoc art. 5(3)(a) 281 Database Directive art. 8(1) 282 Caspers and Guibault (n 14) 32. 283 Geiger, Frosio and Bulayenko (n 62) 34. 284 Margoni and Kretschmer (n 12). 285 DSM Directive art. 25 286 InfoSoc art. 5(4)

able to report their results with examples from the input material without risking infringement. If well-defined and clearly delimited, such an exception should be able to meet the three-step test given its important purpose of providing the public with information – which additionally is supported by the freedom of expression and information287. This would only cover the press, but it seems difficult to find an exception that could encompass all of the beneficiaries, purposes and sources following from the general exception for TDM; the exception for quotation, for example, has a very specific purpose and doesn’t apply to databases, while the teaching and scientific research exception does but also suffers from its (in this context) narrow purpose.

3.5 Storage and Verification

Licences often contain provisions on how long data accessed through them may be stored288 and during the drafting of the DSM Directive stakeholders expressed the wish to be able to store research files used for the purpose of TDM.289 There might be several reasons for TDM users to want to store the input material – in the academic world not least to be able to verify the accuracy, replicability and transparency of the results later.290 Beneficiaries of the scientific research exception may now keep copies for the purpose of scientific research, including for verification of results291, but the files should be stored in a secure environment and it will be a matter for national legislation to decide appropriate levels of security.292 The main objective surely must be to ensure that third parties cannot reach the data without authorisation.293 Germany is one of the Member States which has a TDM exception based on the teaching and scientific research exception and German law designates certain institutions for long-term storage of the files created in the TDM process.294 Possibly this is the kind of arrangement the legislator had in mind (but less intrusive provisions are naturally also an option). Interestingly, the “best practices” provision295 also extends to storage. This is, to

287 Fundamental Rights Charter art. 11 – See in particular (2) “The freedom and pluralism of the media shall be respected.” 288 Fournier and Holm (n 180). 289 Geiger, Frosio and Bulayenko (n 62) 34. 290 Flynn and Quintais (n 26). 291 DSM Directive art. 3(2) 292 DSM Directive Recital 15 293 Max Planck Institute for Innovation and Competition (n 152) 11. 294 Geiger, Frosio and Bulayenko (n 5) 26. 295 DSM Directive art. 3(4)

40 my knowledge, unprecedented in copyright law and it will be interesting to see how the Member States choose to implement this.

The corresponding possibility for those relying on the general TDM exception only provides for copies to be retained as long as necessary for the purposes of TDM296, which is notably narrower. The prevailing interpretation seems to be that copies may be retained for the duration of the TDM process. Member States will have the opportunity to make their own interpretation, but with a strict reading of the provision there is a risk that it will become unnecessarily difficult to build on previous research without knowing the particulars on which it is built. TDM is very resource-saving compared to manual work, but it is still more efficient to know what has been done before, in order to avoid unknowingly repeating tests which have already been performed by others.

296 DSM Directive art. 4(2)

4. CONCLUSION

The introduction of a TDM exception was motivated by the lack of a harmonised European approach to TDM. Since the available exceptions were optional, TDM was treated differently (or not at all) in national law, and this created legal uncertainty – in particular in cross-border situations – which was compounded by the varying degree to which the exceptions could be overridden by contractual and technical measures.297

Following the DSM Directive it is a matter of fact that, due to the actions taken during the process – namely reproductions of works or parts of works – TDM is copyright infringement. However, it is also clear that TDM can be exempted in the name of not-for-profit science and science with a public interest objective298. For other purposes the status is, generally speaking, more or less unchanged: on paper there is an exception for anyone with lawful access, from which rightholders must act pre-emptively to opt out, but the required steps correspond with rightholders’ prevailing practices and they will therefore not need to change their habits in order to block use of the new exception. If anything, the rightholders claiming that TDM is infringement, and working actively to prevent it where it has not been explicitly authorised, have been proven right. The general exception does not necessarily lead to less harmonisation or to legal uncertainty, but it forces users to reach agreements with every rightholder of every source used, with no guarantee of success. As such, the rightholders’ possibility to opt out undermines what the provision accomplishes.299 It follows from the Impact Assessment that self-regulation was not considered sufficient to fulfil the harmonisation objective and satisfactorily reduce the legal uncertainty for researchers300, but self-regulation is just what the general exception creates, only not for researchers as defined by the directive, but for all other users. This shows that the EU legislator understood the practical implications of such a solution, so one might ask why it was considered insufficient for some users but not for others.

297 Bottis and others (n 5) 183.
298 Max Planck Institute for Innovation and Competition (n 152) 4.
299 Rosati (n 9) 21.
300 European Commission, ‘Impact Assessment on the Modernisation of EU Copyright Rules’ (n 113) 113; Bottis and others (n 5) 183.

It is something of a turning point that the new provisions are mandatory, and this is indeed an essential feature in order to reach the major goal of harmonisation301, to facilitate cross-border cooperation302 and to support the functioning of the digital single market303, as well as to remove the legal uncertainty caused by varying degrees of implementation of the optional exceptions in national law304. Copyright exceptions reflect fundamental freedoms universal throughout the EU and as such they deserve uniform implementation.305 According to the Commission, a motivation for introducing new exceptions was that “guarantee[d] legal certainty in cross-border situations” was not something that could be accomplished by national law alone306, but it is difficult to understand how the possibility to maintain or introduce broader exceptions provided by article 25 of the DSM Directive contributes to this objective. Not only are the two new TDM exceptions drafted in such a way as to give the Member States some discretion when implementing them, but the Member States are also able to create or uphold more far-reaching exceptions as long as they are compatible with InfoSoc and the Database Directive. Hence the legislator has opened up for larger discrepancies between national laws than one would normally expect from exceptions following from a directive and made it increasingly difficult for Member States to reach de facto harmonisation. It seems international consensus regarding TDM and copyright is still far away307 and this provision has the potential to make it worse, to the detriment of legal certainty.

Given the broad scope of the general exception, everyone with lawful access to input material is eligible for an exception for the purpose of text and data mining, but it can be argued that in order to be fully mandatory, the law needs a high level of harmonisation and needs to be non-overridable. In this light, what the DSM Directive lacks for a fully efficient application and competitive advantage is (1) provisions ensuring that, once someone has lawful access, the exception cannot be overridden by contract and (2) a clear statement that neither TPMs nor voluntary measures may hinder third parties from benefiting from any of the exceptions.308 Instead, both of the exceptions are sensitive to TPMs, and the general exception can be overridden by contract and as such creates a lawful access problem. TPMs are very effective in restricting use, and the available procedures for beneficiaries of an exception to get access have generally been seen as an additional burden.309 The DSM Directive does not change this aspect and TPMs are likely to continue as one of the main barriers for TDM. In an otherwise strong exception, this seems to be the weakness of the scientific research exception.

301 Geiger, Frosio and Bulayenko (n 62) 27.
302 European Commission, ‘Impact Assessment on the Modernisation of EU Copyright Rules’ (n 113) 81.
303 Geiger, Frosio and Bulayenko (n 5) 36.
304 Caspers and Guibault (n 14) 100.
305 European Copyright Society (n 86) 2.
306 European Commission, ‘Impact Assessment on the Modernisation of EU Copyright Rules’ (n 113) 81.
307 Flynn and others (n 24) 1.
308 Geiger, Frosio and Bulayenko (n 5) 36–37; Bottis and others (n 5) 185; European Commission, ‘Expert Group Report’ (n 94) 56–57.

Contrary to earlier legislation, the scientific research exception cannot be overridden by contract – which to a varying degree used to be a barrier, depending on how much national law allowed contractual arrangements to depart from copyright law.310 Non-overridability by contract precludes the industry from developing standard contracts311 – but this remains a possibility outside scientific research since the general exception does not enjoy the same protection. To be in a stronger position and not depend on negotiations with the rightholder(s), it is desirable for anyone needing to use TDM to be able to benefit from the scientific research exception – but, as shown, the scope of beneficiaries is rather narrow due to the double limitation, and unsatisfactory if a broader than traditional view of who performs scientific research is applied. The legislator has chosen not to extend the strong protection of the scientific research exception to other purposes – even though there are more purposes that are beneficial to society; journalism is arguably one, which in addition is protected by the Fundamental Rights Charter but still falls outside the scope of the exception. Possibly the intention is that Member States have the option to provide for TDM under the press exception using article 25 (to the detriment of harmonisation). A welcome addition, on the other hand, is the permissibility of private-public cooperation, since in addition to increasing the scope of beneficiaries it reduces the legal uncertainty as compared to a non-commercial requirement.

Since the DSM Directive does not provide exceptions to the right of communication to the public or distribution for TDM, users have to revert to the InfoSoc or Database directives to find a defence for publishing their results if they include parts of the input material. Depending on the situation there might be an applicable exception, but it would be favourable for legal certainty if the Member States explicitly stated whether those exceptions apply to TDM – for example as part of the imminent implementation work. However, the law does not seem to allow national legislators to introduce an exception for this purpose as wide as the general exception. Storage of input material as an exception to the right of reproduction, on the other hand, is allowed to a varying extent under the DSM Directive – whether the provisions are satisfactory or not remains to be seen.

309 Caspers and Guibault (n 14) 99–100.
310 ibid 99.
311 Bottis and others (n 5) 185; European Commission, ‘Expert Group Report’ (n 94) 56–57.

Finally, a note for the future: choosing to introduce two new exceptions is only a short-term solution in that it only accommodates techniques for extracting and recombining knowledge as known today. In order to find a solution that will last without revision and is not always a step behind technical and societal development, the legislator would have to consider a more open structure. TDM has the potential to change the foundations for how information is made available, and finding a matching solution might require challenging the habitual structure of European copyright law.

Bibliography

Legislation

Berne Convention for the Protection of Literary and Artistic Works 1886 (Berne Convention)

76/76/EEC: Convention for the European patent for the common market (CPC 1975) [1975] OJ L 17, 26.1.1976, p. 1–28

WIPO Copyright Treaty 1996 (WCT)

Directive 96/9/EC of the European Parliament and of the Council of 11 March 1996 on the legal protection of databases (Database Directive) [1996] OJ L 77, 27.3.1996, p. 20–28

Directive 2001/29/EC of the European Parliament and of the Council of 22 May 2001 on the harmonisation of certain aspects of copyright and related rights in the information society (InfoSoc) [2001] OJ L 167, 22.6.2001, p. 10–19

Agreement on a Unified Patent Court 2013 (UPC Agreement) [2013] OJ C 175/1

Directive 2009/24/EC of the European Parliament and of the Council of 23 April 2009 on the legal protection of computer programs (Software Directive) [2009] OJ L 111, 5.5.2009, p. 16–22

Directive (EU) 2019/790 of the European Parliament and of the Council of 17 April 2019 on copyright and related rights in the Digital Single Market and amending Directives 96/9/EC and 2001/29/EC (DSM Directive) [2019] OJ L 130, 17.5.2019, p. 92–125

Charter of Fundamental Rights of the European Union (Fundamental Rights Charter) [2012] OJ C 326, 26.10.2012, p. 391–407

Case Law

C–5/08 Infopaq International A/S v Danske Dagblades Forening [2009] ECR I-6569 (Infopaq 1)

C–302/10 Infopaq International A/S v Danske Dagblades Forening EU:C:2012:16 (Infopaq 2)

C–30/14 Ryanair Ltd v PR Aviation BV EU:C:2015:10 (Ryanair v PR Aviation)

EU Publications

European Commission, ‘Standardisation in the Area of Innovation and Technological Development, Notably in the Field of Text and Data Mining: Report from the Expert Group’ (Publications Office of the European Union 2014) accessed 3 March 2020

European Commission, ‘Assessment of the Impact of the European Copyright Framework on Digitally Supported Education and Training Practices: Final Report’ (Publications Office of the European Union 2016) accessed 26 May 2020

European Commission, ‘Commission Staff Working Document - Impact Assessment on the Modernisation of EU Copyright Rules Part 1/3’ (2016) Text SWD(2016) 301 final accessed 24 February 2020

European Commission, ‘Communication from the Commission to the European Parliament, the Council, the European Economic and Social Committee and the Committee of the Regions “Building A European Data Economy”’ (2017) COM(2017) 9 final accessed 28 May 2020

Reports

Caspers M and Guibault L, ‘D3.3 Baseline Report of Policies and Barriers of TDM in Europe’ (FutureTDM 2016) accessed 6 February 2020

Hargreaves I, ‘Digital Opportunity - Review of Intellectual Property and Growth’ (Department for Business, Innovation & Skills 2011) Independent Report 11/968 accessed 24 April 2020

LIBER Europe, ‘Europe’s TDM Exception for Research: Will It Be Undermined By Technical Blocking From Publishers?’ (LIBER, 10 March 2020) accessed 24 April 2020

Max Planck Institute for Innovation and Competition, ‘Position Statement of the Max Planck Institute for Innovation and Competition on the Proposed Modernisation of European Copyright Rules: PART B Exceptions and Limitations: Chapter 1 Text and Data Mining’ (2017) accessed 22 April 2020

Books

Guibault L and others, Safe to Be Open - Study on the Protection of Research Data and Recommendations for Access and Usage (Universitätsverlag Göttingen 2013) accessed 25 February 2020

Pila J and Torremans PLC, European Intellectual Property Law (2nd edn, Oxford University Press 2019)

Wyner A and others, ‘Approaches to Text Mining Arguments from Legal Cases’ in Enrico Francesconi and others (eds), Semantic Processing of Legal Texts: Where the Language of Law Meets the Law of Language (Springer 2010) accessed 19 May 2020

Articles

Alexander I, ‘The Concept of Reproduction and the “Temporary and Transient” Exception’ (2009) 68 The Cambridge Law Journal 520

Bignami F, ‘European Versus American Liberty: A Comparative Privacy Analysis of Anti-Terrorism Data-Mining’ (2011) 48 Boston College Law Review 609

Bodó B, ‘The Science of Piracy, the Piracy of Science. Who Are the Science Pirates and Where Do They Come from: Part 1’ (Kluwer Copyright Blog, 6 March 2019) accessed 4 May 2020

Borghi M and Karapapa S, Copyright and Mass Digitization (1st edn, Oxford University Press 2013) accessed 13 March 2020

Bottis M and others, ‘Text and Data Mining in the EU Acquis Communautaire Tinkering with TDM & Digital Legal Deposit’ (2019) 12 Erasmus Law Review 179

Brook M, Murray-Rust P and Oppenheim C, ‘The Social, Political and Legal Aspects of Text and Data Mining (TDM)’ (2014) 20 D-Lib Magazine accessed 25 February 2020

Casellati AM, ‘The Evolution of Article 6.4 of the European Information Society Copyright Directive’ (2001) 24 Columbia - VLA Journal of Law & the Arts 369

Ducato R and Strowel A, ‘Limitations to Text and Data Mining and Consumer Empowerment: Making the Case for a Right to Machine Legibility’ (2018) 2018 CRIDES Working Paper Series 45

Ducato R and Strowel A, ‘Limitations to Text and Data Mining and Consumer Empowerment. Making the Case for a Right to “Machine Legibility”’ (Kluwer Copyright Blog, 19 March 2019) accessed 5 February 2020

Dusollier S, ‘Tipping the Scale in Favor of the Right Holders: The European Anti-Circumvention Provisions’ in Eberhard Becker and others (eds), Digital Rights Management: Technological, Economic, Legal and Political Aspects (Springer 2003) accessed 26 March 2020

Dusollier S, ‘The Protection of Technological Measures: Much Ado about Nothing or Silent Remodeling of Copyright?’ in Rochelle Cooper Dreyfuss and Jane C Ginsburg (eds), Intellectual Property at the Edge (Cambridge University Press 2014) accessed 26 March 2020

Flynn S and others, ‘Implementing User Rights for Research in the Field of Artificial Intelligence: A Call for International Action’ (2020) 2020 European Intellectual Property Review 11

Flynn S and Quintais JP, ‘Implementing User Rights for Research in the Field of Artificial Intelligence: A Call for Action at International Level’ (Kluwer Copyright Blog, 21 April 2020) accessed 30 April 2020

Gasser U, ‘Legal Frameworks and Technological Protection of Digital Content: Moving Forward Towards a Best Practice Model’ (Berkman Klein Center for Internet & Society 2006) 2006–04 accessed 30 March 2020

Geiger C, Frosio G and Bulayenko O, ‘The Exception for Text and Data Mining (TDM) in the Proposed Directive on Copyright in the Digital Single Market - Legal Aspects’ (Centre for International Intellectual Property Studies (CEIPI) 2018) 2018–02 accessed 6 February 2020

Geiger C, Frosio G and Bulayenko O, ‘Text and Data Mining: Articles 3 and 4 of the Directive 2019/790/EU’ (Centre for International Intellectual Property Studies (CEIPI) 2019) 2019–08 accessed 26 February 2020

Gillen M and Sutter G, ‘DRMS and Anti-Circumvention: Tipping the Scales of the Copyright Bargain?’ (2006) 20 International Review of Law, Computers & Technology 287

Hugenholtz PB, ‘Against “Data Property”’ in Hanns Ullrich, Peter Drahos and Gustavo Ghidini, Kritika : Essays on Intellectual Property, vol 3 (Edward Elgar Publishing 2018) accessed 15 April 2020

Hugenholtz PB, ‘The New Copyright Directive: Text and Data Mining (Articles 3 and 4)’ (Kluwer Copyright Blog, 24 July 2019) accessed 13 January 2020

Jondet N, ‘The Text and Data Mining Exception in the Proposal for a Directive on Copyright: Why the European Union Needs to Go Further than the Laws of Member States’ (2018) 67 Propriétés Intellectuelles 25

Keyvanpour MR, Javideh M and Ebrahimi MR, ‘Detecting and Investigating Crime by Means of Data Mining: A General Crime Matching Framework’ (2011) 3 Procedia Computer Science 872

Margoni T and Kretschmer M, ‘The Text and Data Mining Exception in the Proposal for a Directive on Copyright in the Digital Single Market: Why It Is Not What EU Copyright Law Needs’ (CREATe, 25 April 2018) accessed 14 February 2020

Murray-Rust P, ‘The Right to Read Is the Right to Mine’ (Open Knowledge Foundation Blog, 1 June 2012) accessed 25 February 2020

Rosati E, ‘EU Text and Data Mining Exception for the Few: Would It Make Sense?’ (2018) 13 Journal of Intellectual Property Law & Practice 429

Rosati E, ‘Copyright as an Obstacle or an Enabler? A European Perspective on Text and Data Mining and Its Role in the Development of AI Creativity’ (2019) 27 Asia Pacific Law Review 198

Sag M, ‘The New Legal Landscape for Text Mining and Machine Learning’ (2019) 66 Journal of the Copyright Society of the USA 64

Samuelson P, ‘The EU’s Controversial Digital Single Market Directive - Part II: Why the Proposed Mandatory Text- and Data-Mining Exception Is Too Restrictive’ (Kluwer Copyright Blog, 12 July 2018) accessed 14 February 2020

Senftleben M, ‘The Perfect Match: Civil Law Judges and Open-Ended Fair Use Provisions’ (2017) 33 American University International Law Review 231

Other publications

Bjørkeng PK, ‘Han syntes kunst var for dyrt i Oslo. Løsningen hans blir elsket av vennene hans, men avvist av eksperter.’ Aftenposten (Oslo, 22 January 2020) accessed 19 May 2020

EUA and others, ‘Future-Proofing European Research Excellence: A Statement from European Research Organisations on Copyright in the Digital Single Market’ accessed 26 March 2020

European Copyright Society, ‘General Opinion on the EU Copyright Reform Package’ (24 January 2017) accessed 19 February 2020

Fournier L and Holm J, ‘Nya förutsättningar för text- och datautvinning’ (Kungliga Biblioteket (National Library of Sweden), 27 February 2020) accessed 27 April 2020

Huggler J, ‘Computer Is Set to Complete Beethoven’s Unfinished Symphony’ The Telegraph (13 December 2019) accessed 19 May 2020

Nobre T and Mileszyk N, ‘Article 7: Contractual and Technological Override’ (Guidelines for the Implementation of the DSM Directive) accessed 4 May 2020

Oudenhoven M, ‘TDM and the Reading Revolution’ (FutureTDM, 12 April 2017) accessed 10 February 2020

Prosser M, ‘How AI Helped Predict the Coronavirus Outbreak Before It Happened’ (Singularity Hub, 5 February 2020) accessed 30 April 2020

Snow J, ‘This AI Can Spot Art Forgeries by Looking at One Brushstroke’ (MIT Technology Review, 21 November 2017) accessed 28 May 2020

White B and Jančič M B, ‘Articles 3-4: Text and Data Mining’ (Guidelines for the Implementation of the DSM Directive) accessed 27 April 2020.
