
Licensing: More than meets the eye

Mashael Khayyat, Trinity College Dublin & King Abdulaziz University, Jeddah

Frank Bannister, Trinity College Dublin

Abstract

In discussions of open government data (hereafter simply open data or OGD) the question of how such data should be licensed, or whether they need to be licensed at all, has to date received only limited attention, at least in the academic literature.

A common assumption, at least in the public sphere, is that a large proportion of the data collected and held by governments can and should be released free of any constraints or restrictions for all citizens, communities and organizations to access and use as they wish. However, even for data that do not fall within the ambit of personal privacy or the security of the state, and are not otherwise sensitive, it is far from obvious that this should be so; different forms of formal licensing may be appropriate in some cases and necessary in others. A libertarian, free-for-all approach to open government data is just one of a number of licensing options from which governments can choose.

This paper will explore the various dimensions of open data licensing. Starting from a definition of what a licence is, it will first look at the debate(s) that have surrounded licensing in the worlds of open systems, freeware, shareware and open source. It will then examine and critique a number of existing or proposed open data licences including various international and national licensing frameworks. The Creative Commons and Open Database (ODbL) Licences will be critically examined and possible problems with the concepts underlying various licences will be explored. The question of what may be suitable for standard public licences and what may require bespoke or customised licensing will be analysed. Other questions to be investigated will be policing and conformance as well as the implications of modern analytics and the mashing up of large data sets from different sources.


1. Introduction

The modest, but growing, body of research into the barriers to the release of data collected and held by governments consistently includes a discussion of legal issues. There are several obstacles which fall under this general heading including existing legislative requirements such as data protection acts, rights, risks of consequential harm, commercial sensitivities, concerns about modern data analysis technology and individual privacy rights (Barry & Bannister 2014; Janssen 2012; Bertot et al 2010). Some scholars and theorists argue that, in such a legally complex situation, a well-designed licensing regime is not only necessary, it is critical to the success of open data initiatives. Creating a legal environment in which citizens, communities and corporates can use such data with clarity and confidence about their rights and obligations is essential if societies are to make the most of these data. Its absence is likely to hinder creativity and the economic, social and political benefits that are widely expected to ensue (Korn and Oppenheim 2011). According to Korn and Oppenheim an understanding of open data licensing is important for establishing “which and how” data can be re-used. It is not only important to understand the legal issues that may arise in the context of licensing open data, but also the different types of licences that are available and the implications that they carry with them.

This paper examines open government data licensing and explores a number of its dimensions. Its objectives are to highlight the complexities surrounding this topic, to examine some of these and to critique the current approach to open government data (OGD) licensing. The paper is organised as follows. Section two looks at the background to open licensing, starting with a brief review of the Open Source movement and the approach to open licensing of . Section three examines current OGD/open data (OD) licences and different approaches to licensing. Section four is a critique of the concept of OGD and includes some reflections on how the question of OGD licensing might evolve. Section five is a brief conclusion and contains some recommendations for future research.

2. Background

2.1 The Open Software Movement and

In the world of information and communications technology (ICT) the term ‘licence’ is traditionally associated with software, though licensing is also used for other types of intellectual property such as methodologies. Within the Information Systems (IS) literature and community there has been and continues to be much discussion about the relative merits and demerits of open software and open source. Over several decades, the Open Source movement has proposed or developed a number of business models which are designed to offer users various forms of freedom to modify software and pass it on, partially encumbered or unencumbered, to others. The foundational principle of the movement is that software should be free in the sense of free of restrictions on use and modification rather than free of charge (which is a separate issue). Unsurprisingly, some complicated legal problems can arise once one starts exploring the question of software licences in any depth.


Non- comes in a number of flavours. One key distinction is whether or not the source code is available. Freeware is the term generally applied to applications which anybody can use for free and without a licence, but which the user cannot modify or sell on to a third party. Many PC games and utilities, for example, fall into this category, as do some smartphone apps. Another variation is shareware, where the user needs a licence or permit and there may or may not be a charge for use. More complicated problems arise when source code is made available. This means that the user can modify the code, but while the original source code may be free, a user may feel entitled to charge for his modifications. Thus a developer may take some open source code, modify it and charge for the enhanced product. The latter may not matter provided he supplies the source code of his modification to other users to do what they want without further conditions or cost, even though this will limit his ability to make money from his enhancement. Developers have sometimes tried to circumvent this problem by embedding proprietary code or by attaching proprietary add-ons to open source code. Others have tried to ‘claim jump’ and hijack the free source code. This problem has led to what is called the Open Source Definition, which sets out a number of criteria for open source software, namely:

• There must be free redistribution. No royalties or fees;
• Distribution must include the source code;
• Derived works are allowed. Modifications must be permitted;
• The integrity of the author's source code must be maintained;
• There can be no discrimination against persons or groups;
• There can be no discrimination against fields of endeavour;
• Distribution of licence. One licence covers all;
• A licence must not be specific (tied) to a product;
• A licence must not restrict other software;
• A licence must be technology-neutral.

( 2014). Not all of these apply, or can be adapted to apply, to data, but some can. An attempt to create a similar set of principles for open data is discussed in section four.

An obvious question about open source is this: in such a world how does a software developer make a living? Various attempts have been made to address this problem and as a result there are currently over 100 free and/or open source licences available, including one from the EU, namely the European Union Public Licence (Joinup 2014; Schm…). The most important and influential of these licences is probably the General Public Licence (GPL), which incorporates the concept of copyleft. Under a GPL licence, a developer who modifies open source code cannot impose any conditions on a user’s use or further modification of the modified product, although he can charge for the modifications that he has made (Free Software Foundation 2014).

A full discussion of this is beyond the scope of this paper. This summary is presented because several of the problems and issues in open data have parallels in open source software, though there are other issues that arise with data but which are not a problem with software, and vice versa. Nonetheless, given that the open source movement has been around for several decades, there are likely to be useful lessons which can be drawn from the accumulated knowledge in this field. As will be seen, the principle of copyleft has been adapted and applied to data in the Creative Commons Licence.

2.2 The Legislative Context

There are a number of critical laws surrounding the data that governments use and the way that governments are allowed to use such data. Even without considering other factors (of which there are many – see below) existing legislation has multiple implications when it comes to licensing both software and data. Central to any discussion of data licensing are two types of act: data protection acts and freedom of information (FoI) acts, though other legislation and quasi legislation (such as privacy rights and official secrets acts) also bear on licence design. Of these, the most important are data protection acts (DPAs).

Most developed countries now have a DPA. The first DPA was enacted in the German state of Hesse in 1970 (Privacy International 2014). Since that time almost every country in the developed world has enacted some form of this legislation (DLA Piper 2013). The primary purpose of such acts has been to protect the personal data of individual citizens (Korn and Oppenheim 2011). But governments hold far more than citizens’ personal details or matters related to state security. They hold large volumes of data which are commercially valuable and in particular which have embedded Intellectual Property Rights (IPR). Korn and Oppenheim note that IPR can encompass several subsidiary rights including copyright, database rights, moral rights, and other rights. For the purposes of this paper the most important of these are copyright and database rights. The latter will be considered first.

A database structure might enjoy copyright which protects the author’s or designer’s rights in their creation. As a consequence, individual data items such as records and metadata may require copyright protection to protect a creator’s IPR. Such rights are also necessary to protect creators or designers from what Korn and Oppenheim describe as “derogatory treatment”, i.e. amending data or quoting it out of context in a way that could mislead others and potentially damage the reputation of the data creator or provider. Korn and Oppenheim state that there are three levels of rights regarding database rights:

• Database rights,
• Dataset rights and
• Data rights

These rights can arise where an individual or corporate has put substantial investment, be that in the form of financial and/or and/or technical resources, into the construction of a database or the assembly of data in such a database. A complex database design might be considered to be a form of intellectual property and legal protection of this may be justifiable. Korn and Oppenheim stress that when it comes to databases, datasets and data the question of IPR is usually complex, and they advise that taking legal advice about a database licence before using others’ data is essential to avoid legal risks. Miller, Styles and Heath (2008) claim that without “a legally recognised ”, communities will lack the authority to publish data or products, services or even findings derived from such data. Their solution to this problem is to use a ‘Share-Alike’ agreement for any open data. This concept is discussed further below.

The question of IPR in databases and data is a particularly difficult one for government. There are several reasons for this. As Nicol, Caruso and Archambault (2013, p.6) note:

“Governments produce and own large datasets [and] most national open data policies primarily target these datasets”.

Government datasets include personal data, corporate data, military intelligence, applications, criminal records, employee performance records and so on. It is not just the volume of data that governments hold that matters; it is the sheer variety. Governments also have much duplicated data typically scattered across multiple agencies and departments. Such data may be inconsistent and lack integrity. Government agencies range from those concerned with the security of the state to those whose job it is to foster enterprise or deal with serious social problems. And so on. It all makes for a demanding legal challenge when it comes to determining the rules, terms and conditions for data release.

2.3 Licensing and Open Data Policy

The need for comprehensive open data access policies was formally recognised in 2004 by the Ministers of Science and Technology of the thirty member countries of the Organisation for Economic Co-operation and Development (OECD) as well as those of China, Israel, Russia, and South Africa. Discussions of open data policy generally contain references to licensing as one of the steps involved in policy formulation. Schutzberg (2014) states that one of the key objectives of open data policy statements is to set out under what licence conditions data are to be made available. Open data stakeholders (local, state, regional, federal and private entities) may choose to develop and implement open data policies. He lists some components that are commonly found in open data policy statements:

1. Which data are to be made open and which are not;
2. When past and future data are to be available;
3. Where the data can be obtained/accessed;
4. In what format(s) the data are to be available;
5. Under what license(s) the data are to be available; and
6. The cost (if any) for reproduction or use of the data or any software associated with it.

Kaufman and Wagner (2012) also consider licensing as an integral aspect of open data processing. They consider a licence as one of seven steps necessary to open and maintain data, namely:

“1. Find […] data;
2. Convert data;
3. Test […] output;
4. Write up a license agreement;
5. Publish and publicize;
6. Update and modify as needed; and
7. create and maintain a dialogue”

(Kaufman and Wagner 2012 as cited in Wimmer et al., 2013, p.77).

Another example of the presence of licensing in an open data framework can be found in the Ten Open Data Building Blocks proposed by Davies (2012) (see table 1). Davies, despite comments he has made elsewhere (see below), would appear to acknowledge that it is important to have an ‘explicit licence’, although in line with his generally fairly libertarian approach he emphasises the importance of using licences that have the fewest constraints, though with acknowledgment of the source of the data.

Open data building block | Brief explanation
1. Leadership and bureaucratic support | “a top level mandate” from senior politicians, and an “engaged and well resourced ‘middle layer’ of skilled government bureaucrats” are essential to secure the release of open data (Hogge 2010).
2. Datasets | Datasets are at the core of open data. Open datasets need to be accessible (usually online), technically open (in a non-proprietary format), and legally open (Eaves 2009).
3. Licences | A range of copyright and intellectual property laws can cover datasets. It is important to have an explicit licence because without it, re-users will not know their legal permissions and rights in dealing with data, such as sharing data, combining data with other data, or building a commercial service off the back of a dataset. At the same time, open data advocates stress the importance of licences that have the fewest constraints, with acknowledgment of the source of the data.
4. Data standards | Describe what a dataset can contain, such as its fields, how they are commonly represented, and what conventions should be used for sharing dates, locations, categories and other common elements.
5. Data portals | A data portal provides access to open datasets, hosting meta-data that describes them, and allowing visitors to search for relevant datasets.
6. Interpretations, interfaces and applications | Third parties can provide their own interpretations or analysis in static reports and publications; they can build interfaces and visualisations of data to show trends and patterns in it; and they can create interactive applications that provide useful functionality.
7. Outreach and engagement | Just putting data online is not enough to get it used. Outreach, community building and engagement is required. The five stars of open data engagement explains that an open data initiative should: be demand driven; put data in context; support conversations around data; build capacity, skills and networks; and lead to collaboration on data as a common resource.
8. Capacity building | Capacity building often needs to take place on both supply and use sides of an open data initiative.
9. Feedback loops | Establish channels through which they can accept and work with feedback, either enhancing the data they hold, or taking action on the basis of feedback.
10. Policy and legislative lock-in | Develop a statutory footing by creating ‘right to data’ legislation, or writing open data clearly into contracts and policies.

Table 1: Ten Building Blocks of an Open Data Initiative (Davies 2012)

All three of these discussions include licensing as a key component in the development of open data policies, though views on the nature of those licences may not coincide. This leads to the question of whether there should be (or even whether it is possible to devise) a single open data licence, in the same way that the Open Source movement has tried to do for software, or whether a number of different types of licence will be required.

2.4 Defining “Licence”

According to the UK Licensing Framework:

“A licence is a legal document giving permission to use information” and it is considered as “a mechanism that gives people and organisations permission to re-use information and other material that is protected by copyright or database right. A licence should also provide clarity as to what users and re-users are permitted to do and whether there are any restrictions on the extent of that permission” (UK Government Licensing Framework, 2013, p.20, p.10)1.

Under the UK Licensing Framework, licences must set out clear conditions of use. For example users must not use the information/data to mislead others, misrepresent the data or suggest that any use that they make of the data is endorsed by a public sector body (and in particular the data source).

Davies (2012, p2)2 defines a license as setting out:

“… explicitly what someone who accesses a dataset can do with it […]and without an explicit license, a user does not know if they have the legal permissions to share data further, to combine it with other data, or to build a commercial service off the back of a dataset”

In a subsequent paper Davies et al. (2013, p15) add that:

“Open Knowledge Definition (OKD) presents a stringent definition of an open license as one that requires, at most, attribution of the dataset source” [emphasis added].

Davies (2010) argues that “open” in this context means that the user is free to use, re-use and distribute the data, but with two important caveats: first, that the data source is always attributed and second, that any new information created from the original data is shared with others. This has parallels with the concept of copyleft. These precepts, like those of copyleft, may not appeal to some users. Unsurprisingly, Davies et al. note that in practice many datasets do not meet the strong conditions of the Open Knowledge Definition.

A further nuance, noted by Davies et al., is that simple, permissive licences are preferred because incompatible licences may make it difficult to combine datasets, a key requirement for mashups or data analytics. Having presented this as a problem, Davies et al. do not elaborate on what they mean by incompatible or permissive licences or give any examples of where this has happened; they simply use it to underline what they consider to be the importance of permissive licences.
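One way to make the notion of incompatibility concrete is to treat each licence as a set of conditions and ask whether a combined dataset could honour all of them at once. The Python sketch below is a deliberately simplified illustration and not something proposed by Davies et al.; the `Licence` class, the condition names and the rule that two different share-alike licences cannot be mixed are assumptions made for the example, and it is no substitute for legal advice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Licence:
    """Hypothetical, simplified model of a data licence."""
    name: str
    attribution: bool = False      # source must be credited
    share_alike: bool = False      # derivatives must carry the same licence
    no_derivatives: bool = False   # data may not be modified or merged


def can_combine(a: Licence, b: Licence) -> bool:
    """Return True if, under this simplified model, two datasets can be merged."""
    # A no-derivatives condition rules out any mashup.
    if a.no_derivatives or b.no_derivatives:
        return False
    # Two different share-alike licences each demand that the result
    # carry their own terms, which cannot both be satisfied.
    if a.share_alike and b.share_alike and a.name != b.name:
        return False
    return True


permissive = Licence("attribution-only", attribution=True)
copyleft_x = Licence("share-alike-X", attribution=True, share_alike=True)
copyleft_y = Licence("share-alike-Y", attribution=True, share_alike=True)
locked = Licence("no-derivatives", attribution=True, no_derivatives=True)

print(can_combine(permissive, copyleft_x))  # True
print(can_combine(copyleft_x, copyleft_y))  # False
print(can_combine(permissive, locked))      # False
```

On this reading, a permissive licence is one that rarely causes `can_combine` to fail, which is one plausible interpretation of why such licences are favoured for mashups.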

1 http://www.nationalarchives.gov.uk/documents/information-management/uk-government-licensing-framework.pdf
2 http://www.opendataimpacts.net/2012/08/ten-building-blocks-of-an-open-data-initiative/


The concept of permissive licensing is also discussed by Miller, Styles and Heath (2008, p4) who present the case for this type of licence:

“…permissive licensing of data for the web means that we can all begin to move forward in lowering the walls of our silos, releasing data to play its part in the Data Web”.

However, they do not provide any technical explanation of what permissiveness means, confining their discussion to metaphor.

In the above discussions, an open data licence is considered to be a legal document (however brief) containing explicit rules as to what can and cannot be done with datasets. It may be noted that, besides Kaufman and Wagner’s steps, other important aspects may need to be considered in opening and maintaining data, for example publicity or public notification so that citizens are aware of open data, what data are available and how such data can be used. In addition, they argue, it is important to foster public innovation by encouraging creative use of these data with prizes and/or awards.

3. An Overview of Data Licensing Approaches.

3.1 Some Open Data Licences

According to Hatcher and Waelde (2007) (cited in Davies 2012) open data providers may “create a customized” licensing framework or “use one of the standard” open database licences. According to Miller, Styles and Heath (2008), the Talis Community License, released in 2006, was the first public open data licence. Korn and Oppenheim (2011) list the following ‘standard’ licences:

1. Creative Commons Attribution;
2. Creative Commons Zero (CC0);
3. Public Domain Dedication and Licence (PDDL);
4. Open Data Commons Attribution Licence (ODC);
5. Creative Commons Attribution Share Alike (but limited interoperability);
6. Open Government Licence (OGL).

Schutzberg (2014) adds to the list above a number of open data licences from public authorities in the UK. In practice the emergence of many such localised licences seems probable.

Like Hatcher and Waelde, Korn and Oppenheim note that there are both standard licences and bespoke licences. They suggest that while bespoke licences facilitate the use and potential reuse of data, standard licences lead to better interoperability and increased user awareness of the licence terms, which in turn leads to better compliance. They also claim that bespoke licences are not common; most licences are standard.

3.2 Two Important Open Data Licences


Two important general purpose licences are the Open Database Licence (ODbL) and the Creative Commons licence(s). The ODbL is published by the Open Knowledge Foundation. It is particularly well suited to countries in the European Union because these countries have specific rights that cover databases and the ODbL specifically addresses these rights. While databases can contain different content such as text, images and videos, the ODbL does not cover the contents of each component of the database; instead it governs the rights over the database. It is therefore only a half-solution, as users have to license the content of the database separately. The ODbL specifies that:

“…any subsequent use of the database must provide attribution, an unrestricted version of the new product must always be accessible, and any new products made using ODbL material must be distributed using the same terms. It is the most restrictive of all ODC licenses.”3

Arguably the most important standard open data licence offered to date is the Creative Commons (CC) licence. The CC licence is the brainchild of the American academic Lawrence Lessig and uses the same principles as copyleft. According to Chignard (2013, p2):

“Lawrence Lessig, […] is the founder of Creative Commons licenses, based on the idea of copyleft and free dissemination of knowledge.”

Chignard claims that the CC Licence is the most widely used licence for open data and one which:

“…provides authors with a way of formalizing their legal right to offer, in effect, to their work”.

According to creativecommons.org4, the Creative Commons copyright licences and toolset provide individuals, companies and institutions with a standardized way of granting copyright permissions to use their creative work. Data made available under the Creative Commons licences can be distributed, copied, edited, remixed, and built upon, all within the boundaries of copyright law.

The licence has a three layer structure illustrated in figure 1.

Figure 1: The three-layer design of the CC licence

3 http://guides.uflib.ufl.edu/content.php?pid=32772&sid=3760010
4 https://creativecommons.org


The top layer is the Legal Code within which the licence is valid. The second layer is the Commons Deed, which is sometimes referred to as the “human readable” version of the licence. A feature of a CC licence is that it presents information in a format that ordinary citizens can read and understand. This matters because most publishers and re-users (creators, educators, scientists, etc.) have no legal training and hiring professional legal expertise is expensive, especially for individuals and voluntary or community groups. Lastly, in order to make the Web recognise when a work is available under a Creative Commons licence, a Machine Readable version of the licence forms the third layer.
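As an illustration of the third layer, a publisher's web page can carry a hyperlink marked with rel="license" that points at the licence's URL, which is the kind of marker that software can detect. The Python sketch below builds such a link; the function name `licence_link` and the default version number are assumptions made for the example, and the full machine-readable layer contains more than this single attribute.

```python
from html import escape


def licence_link(code: str, version: str = "4.0") -> str:
    """Build an HTML link that marks a page as licensed under a CC licence.

    `code` is a licence code such as "by" or "by-nc-sa". This is a
    hypothetical helper; CC0 uses a different URL scheme and is not
    handled here.
    """
    url = f"https://creativecommons.org/licenses/{code.lower()}/{version}/"
    label = f"CC {code.upper()} {version}"
    # rel="license" is the attribute that lets software recognise the link
    # as a statement of the licence rather than an ordinary hyperlink.
    return f'<a rel="license" href="{escape(url)}">{escape(label)}</a>'


print(licence_link("by-sa"))
# <a rel="license" href="https://creativecommons.org/licenses/by-sa/4.0/">CC BY-SA 4.0</a>
```

A search engine or data portal that encounters this link can record the licence without anyone having to read the Commons Deed or the Legal Code.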

Implementing a CC Licence requires the two steps shown in figure 2. The licence has a number of options for data use (see figure 3). One of these options is, in effect, a data equivalent of copyleft (and even uses the backward C symbol of copyleft).

Figure 2: Steps in publishing data under Creative Commons (CC) Licence.


The meanings of the six options for licensing under the Creative Commons5 in figure 2 are:

Attribution (CC BY). This is the most open of the licence variants. Provided the user acknowledges the source of the data (s)he can do anything that (s)he wishes with it including adding to the data, manipulating it and offering any derivations of it for sale. Note that this, like other CC licences, does not stipulate any requirement for this to be free of charge.

Attribution – No Derivs (CC BY-ND). This permits the user to redistribute the data, but not to add to, modify or manipulate it. They can charge or use it for commercial purposes provided the source provider is credited.

Attribution-Non Commercial-ShareAlike (CC BY-NC-SA): Under this licence users can do what they want with the data for non-commercial purposes provided they acknowledge the data provider and pass their work on to others under the same (CC BY-NC-SA) licence.

Attribution-ShareAlike (CC BY-SA). This is the equivalent of copyleft, i.e. users may do what they want with the data provided that they credit the provider as the source of the original data and pass any of their own work or added data on under the same terms as they were given the data.

Attribution-Non Commercial (CC BY-NC): This allows the user to manipulate and add to the original data and pass this on, on a non-commercial basis. Whilst the new version must acknowledge the original, it does not have to be licensed on the same terms.

Attribution-Non Commercial-NoDerivs (CC BY-NC-ND). This is the most restrictive of the six CC licence forms. It allows third parties to access the original data and use it with acknowledgement as well as to share it, but they are not allowed to change it or use it commercially.

A user applying one of these licences is required to show the type of licence with any product or service that they base on it.

Finally there is the so called Creative Commons Zero (CC0) licence. This licence has no restrictions or requirements whatsoever, not even to attribute the source. Effectively the data suppliers waive all of their rights (or as many as they can). Note that this is not one of the six categories offered under the CC licence regime above.
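The six variants, together with CC0, differ only in which of four conditions they impose, so they can be summarised compactly. The Python sketch below encodes that summary; the names `CCTerms`, `parse_cc` and `permits` are illustrative assumptions, and the flags paraphrase the descriptions above rather than the legal code of the licences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CCTerms:
    attribution: bool      # BY: credit the source
    non_commercial: bool   # NC: no commercial use
    no_derivs: bool        # ND: no modified versions
    share_alike: bool      # SA: derivatives under the same licence


def parse_cc(code: str) -> CCTerms:
    """Derive the conditions from a code such as "CC BY-NC-SA" or "CC0"."""
    # Normalise "CC BY-NC-SA" and "CC-BY-NC-SA" to the same list of parts.
    parts = code.upper().replace(" ", "-").split("-")
    # CC0 yields the single part "CC0", so every flag is False.
    return CCTerms("BY" in parts, "NC" in parts, "ND" in parts, "SA" in parts)


def permits(code: str, *, commercial: bool, modified: bool) -> bool:
    """Would the licence allow this kind of reuse, assuming proper attribution?"""
    terms = parse_cc(code)
    if commercial and terms.non_commercial:
        return False
    if modified and terms.no_derivs:
        return False
    return True


for code in ["CC BY", "CC BY-ND", "CC BY-NC-SA", "CC BY-SA",
             "CC BY-NC", "CC BY-NC-ND", "CC0"]:
    # Ask the most demanding question: commercial reuse of a modified version.
    print(code, permits(code, commercial=True, modified=True))
```

Only CC BY, CC BY-SA and CC0 answer True to that question, which matches the ordering from most open to most restrictive given in the text.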

3.3 Some European Open Data Licensing Models

Different countries have adopted different approaches to licensing. Bunakov and Jeffery (2013) show the national Public Sector Information (PSI) portals of eight countries. This is summarised in table 3. Each country has its own regulations for data re-use.

5 https://creativecommons.org/licenses/


Table 3: European governmental data portals Licences (Bunakov and Jeffery 2013)

This table hides what are more complicated and nuanced licensing regimes in a number of countries. For example the French Licence Ouverte includes a number of features that can accompany Creative Commons categories (figure 4). Each white box in figure 4 represents a granular regulation component within the open data licence.

Figure 4: Regulation components of the French governmental portal for open licence Bunakov and Jeffery (2013)


As noted above, the UK (like the Netherlands) uses a framework (see figure 5).

Figure 5: UK Government Licensing framework. Source: http://www.nationalarchives.gov.uk/documents/information-management/uk-government-licensing-framework.pdf (page 24)

Germany offers multiple licences for different modes of data reuse. German governmental agencies have the option to choose the most appropriate licence for each case of data publishing.

Ireland is not included in Bunakov and Jeffery’s analysis. In Ireland a National Cross Industry Working Group (2012) recommended open data providers have a licence model that clarifies the financial side of the licence. They also recommended that an ad hoc open data licence be created specifically for Ireland.

3.4 Conformant and Non Conformant Licences.

The Open Definition Organization6 categorises licenses into Conformant and Non-Conformant. Conformant means that they conform to the principles set forth in . Conformant Licenses are classified as Recommended, Non-reusable, Little Used, Discontinued or Deprecated as shown in Tables 4, 5, and 67. Table 7 shows a list of non-conformant licences and table 8 discontinued licences.

6 opendefinition.org/licenses/
7 http://opendefinition.org/licenses/


Licence | Domain | By | SA | Comments
Creative Commons CCZero (CC0) | Content, Data | N | N | Dedicate to the Public Domain (all rights waived)
Open Data Commons Public Domain Dedication and Licence (PDDL) | Data | N | N | Dedicate to the Public Domain (all rights waived)
Creative Commons Attribution 4.0 (CC-BY-4.0) | Content, Data | Y | N |
Creative Commons Attribution (CC-BY) | Content | Y | N | All versions 1.0-3.0, including jurisdiction “ports”
Open Data Commons Attribution License (ODC-BY) | Data | Y | N | Attribution for data(bases)
Creative Commons Attribution Share-Alike 4.0 (CC-BY-SA-4.0) | Content, Data | Y | Y |
Creative Commons Attribution Share-Alike (CC-BY-SA) | Content | Y | Y | All versions 2.0-3.0, including jurisdiction “ports”; version 1.0 is little used and not recommended because it is incompatible with future versions.
Open Data Commons Open Database License (ODbL) | Data | Y | Y | Attribution-ShareAlike for data(bases)
Free Art License (FAL) | Content | Y | Y |

Table 4: Conformant Recommended Licenses

Licence | Domain | By | SA | Comments
UK Open Government Licence 2.0 (OGL-UK-2.0) | Content, Data | Y | N | For use by UK government licensors; re-uses of OGL-UK-2.0 material may be released under CC-BY or ODC-BY. Note version 1.0 is not approved as conformant
Open Government Licence – Canada 2.0 (OGL-Canada-2.0) | Content, Data | Y | N | For use by Canada government licensors. Note version 1.0 is not approved as conformant

Table 5: Conformant Non-reusable or Little Used Licenses

| Licence | Domain | By | SA | Comments |
|---|---|---|---|---|
| GNU Free Documentation License (GNU FDL) | | Y | Y | Only conformant subject to certain provisos |
| MirOS License | Code, Data | Y | N | Little used |
| Talis Community License | Data | ? | ? | Deprecated in favour of ODC licenses |
| Against DRM | Content | Y | Y | Little used |
| Design Science License | Data | Y | Y | Little used |
| EFF Open Audio License | Content | Y | Y | Deprecated in favour of CC-BY-SA |

Table 6: Conformant but Deprecated Licenses

| License name | Comments |
|---|---|
| Creative Commons No-Derivatives Licenses | Creative Commons No-Derivatives (by-nd-*) violates principle 3., “Reuse”, as they do not allow works, in part or in whole, to be re-used in derivative works. |
| Creative Commons Non-Commercial | Creative Commons Non-commercial licenses (by-nc-*) do not support the Open Knowledge Definition principle 8, “No Discrimination Against Fields of Endeavor”, as they exclude usage in commercial activities. |
| Project Gutenberg License | Used on Gutenberg’s ebooks of public domain texts. It is non-open because it restricts commercial use. Note that the license only applies if the user continues to use the Gutenberg name – if you remove the licensing information and any reference to Project Gutenberg then the resulting text is open. |

Table 7: Non-conformant Licenses

| License name | Comments |
|---|---|
| Creative Commons Developing Nations License | The license has been discontinued. Creative Commons developing nations license does not support principle “7. No Discrimination Against Persons or Groups”. |
| Open Publication License | Discontinued in favour of Creative Commons. In late 2004 the site was overhauled and turned into a portal to open academic content. In August 2007, David Wiley, the author of open content launched the draft License. License is not conformant if either options A or B are added to the main body of the license. Option A prohibits ‘substantive modification’ and option B prohibits commercial use of printed copies. |
| UK PSI (Public Sector Information) Click-Use Licence | Formerly used for a variety of material produced by UK central and local government. This license is not open. |

Table 8: Discontinued Licenses

Table 9 summarises the position in relation to a number of important licences drawn from Korn and Oppenheim (2011, p6) and Halonen (2013, p.61).

| Licence type | Who can use the resource and under what terms? | Can the licensed data be modified? | Suitability for data, datasets and databases |
|---|---|---|---|
| Creative Commons: Attribution (CC-BY) | Anyone | YES, but you must attribute. You must also ensure that you do not impose any restrictions on the whole of the work licensed beyond the terms of this licence. | Not specifically geared towards data, datasets and databases, but can be used with minimal amounts of data (to avoid attribution stacking) and as long as only an “insubstantial” amount of any databases or datasets are reused. |
| Creative Commons: Attribution Share Alike (BY-SA) | Anyone | YES, but you must attribute and if you use or reuse the data etc., you must use the CC BY SA end user licence for onward licensing. | As above. Share Alike requirement can impact negatively on interoperability of data and prevent linked open data. |
| Creative Commons: Attribution Non-Commercial (BY-NC) | Anyone – for non-commercial purposes only | YES, but you must attribute. | As above. Although the NC restriction does not pose immediate problems, ambiguity of what constitutes non-commercial may be problematic. There may also be interoperability problems with linking to data licensed under more permissive terms. |
| Creative Commons: Attribution No Derivatives (BY-ND) | Anyone | NO and you must attribute. | As above. Reuse and repurposing of data, datasets and databases not permitted. |
| Creative Commons: Attribution Non-Commercial Share Alike (BY-NC-SA) | Anyone – for non-commercial purposes only | YES, but you must attribute and if you use or reuse the data etc., you must use the CC BY SA end user licence for onward licensing. | As above. Share Alike requirement can impact negatively on interoperability of data and prevent linked open data. Although the NC restriction does not pose immediate problems, ambiguity of what constitutes non-commercial may be problematic. There may also be interoperability problems with linking to data licensed under more permissive terms. |
| Creative Commons: Attribution Non-Commercial No Derivatives (BY-NC-ND) | Anyone – for non-commercial purposes only | NO and you must attribute. | As above. Reuse and repurposing of data, datasets and databases not permitted. Although the NC restriction does not pose immediate problems, ambiguity of what constitutes non-commercial may be problematic. There may also be interoperability problems with linking to data licensed under more permissive terms. |
| Creative Commons Zero | Anyone | YES, with no restrictions whatsoever. | Ideal. |
| Open Data Commons Open Database Licence | Anyone | YES, but you must attribute any public use of the database, or works produced from the database, in the manner specified in the ODbL. For any use or redistribution of the database, or works produced from it, you must make clear to others the license of the database and keep intact any notices on the original database. Share-Alike: If you publicly use any adapted version of this database, or works produced from an adapted database, you must also offer that adapted database under the ODbL. | Ideal – although there may be some attribution requirements, leading to possible attribution stacking and also interoperability issues associated with the Share Alike requirement. |
| Open Data Commons Attribution Licence | Anyone (applies to data, datasets and databases) | YES, but you must attribute any public use of the database, or works produced from the database, in the manner specified in the ODbL. For any use or redistribution of the database, or works produced from it, you must draw third parties’ attention to the original licence of the database and keep intact any notices on the original database. | Ideal – although there may be some attribution requirements, leading to possible attribution stacking. |
| Public Domain and Dedication Licence | Anyone (applies to databases) | YES, with no restrictions whatsoever. | Ideal. |
| Open Government Licence | Anyone (applies to content, data, databases and source code) | YES, but you must attribute. | Can be used with minimal amounts of data (to avoid attribution stacking). |
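
The attribution stacking and Share Alike interoperability issues noted in the table can be made concrete with a small sketch. The Python below is illustrative only: the `Licence` class, the `combine` function and the four flags are hypothetical names introduced here, and the combination rules (No Derivatives blocks any mash-up, two different Share Alike licences conflict, Non-Commercial propagates, attribution accumulates) are deliberately simplified assumptions rather than a statement of what any licence text actually requires.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Licence:
    """Hypothetical, simplified model of a licence as four yes/no terms."""
    name: str
    attribution: bool = False      # the "By" column in the tables above
    share_alike: bool = False      # the "SA" column in the tables above
    non_commercial: bool = False
    no_derivatives: bool = False


CC0 = Licence("CC0")
CC_BY = Licence("CC-BY", attribution=True)
CC_BY_SA = Licence("CC-BY-SA", attribution=True, share_alike=True)
CC_BY_NC = Licence("CC-BY-NC", attribution=True, non_commercial=True)
CC_BY_ND = Licence("CC-BY-ND", attribution=True, no_derivatives=True)
ODBL = Licence("ODbL", attribution=True, share_alike=True)


def combine(sources):
    """Summarise the obligations of a mash-up built from (dataset, licence) pairs.

    Assumed rules (a simplification for illustration, not legal analysis):
    - any No Derivatives source blocks the mash-up;
    - sources under two different Share Alike licences cannot be combined;
    - every attribution source must be credited (attribution stacking);
    - any Non-Commercial source rules out commercial use of the result.
    """
    problems = []
    blocked = [dataset for dataset, lic in sources if lic.no_derivatives]
    if blocked:
        problems.append(f"no derivatives allowed for: {', '.join(blocked)}")
    share_alike = sorted({lic.name for _, lic in sources if lic.share_alike})
    if len(share_alike) > 1:
        problems.append(f"conflicting Share Alike terms: {', '.join(share_alike)}")
    return {
        "permitted": not problems,
        "problems": problems,
        "must_attribute": [d for d, lic in sources if lic.attribution],
        "onward_licence": share_alike[0] if len(share_alike) == 1 else None,
        "commercial_use": not any(lic.non_commercial for _, lic in sources),
    }


if __name__ == "__main__":
    mashup = combine([("census", CC0), ("parking", CC_BY), ("roads", ODBL)])
    print(mashup)
```

Under these assumptions, a mash-up of three sources already carries two attribution obligations and is pulled under the one Share Alike licence present, while combining an ODbL source with a CC-BY-SA source is flagged as a conflict.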

4. Reflections and Critique

4.1 Principles versus practicalities

In December 2007 a group of open government advocates held a meeting in Sebastopol, California to discuss OGD. From this emerged a set of eight principles (as well as some sub principles which are not reproduced here). They argued that open government data should be:

1. Complete: All public data that is not subject to valid privacy, security or privilege limitations should be available.
2. Primary: Data should be made available at as low a level of detail as is available, not just in aggregated or summary form.
3. Timely: Data should be released as quickly as possible.
4. Accessible: Data should be available to as wide a range of users as possible and for as wide a range of purposes as possible.
5. Machine processable.
6. Non-discriminatory: There should be no requirement to register or provide personal information in order to obtain data.
7. Non-proprietary: It should not be possible for anybody to acquire any kind of proprietary right or technical control over public data.
8. License-free: Data should not be subject to any copyright, patent, trademark or trade secret regulation.

The group proposed that reasonable privacy, security and privilege restrictions “can be allowed”, but they did not pursue the implications of this in any detail. This is quite a libertarian manifesto and while it raises some difficult questions, it provides a useful baseline against which to discuss issues in OGD licensing.

4.2 Ownership rights in data?

Copyright in data is a complicated matter. Schutzberg (2014) declares that, unlike the Open Source initiative which keeps a list of licenses that follow open source principles, there is no equivalent process involved for open data licences. According to Miller, Styles and Heath (2008, p2):

“Copyright protection applies to acts of creativity and categorically does not extend either to databases [or] to those non-creative parts of their content.”

(emphasis added). They go on to argue that data should be open by default and that there is no need for copyright because copyright and related forms of protection are only for creative work. Unfortunately this assertion is misleading and incorrect. First, the law recognises a sui generis right for databases which is specifically designed to recognise the cost of compiling such a database. In the EU, this right falls under Directive 96/9/EC of the European Parliament and of the Council. Secondly, certain types of data may be subject to copyright. Under Article 3 of the EU directive, databases which:

"…by reason of the selection or arrangement of their contents, constitute the author's own intellectual creation" are protected by copyright. So while it may not be possible to copyright a number, it is quite possible to copyright a photograph.

This raises potential difficulties for OGD licences. Consider a situation where a professional photographer takes a picture of a building for use in (say) a state publication on historic public buildings. Does the photographer or the state retain copyright in that picture? Arguably the answer could be yes. What then is the position if this photograph is embedded in some other document containing data which is clearly not copyright (such as viewing hours for the building in question)? How does one deal with copyright in this situation? This problem is not insoluble, but solutions may be messy or expensive to implement.

It can therefore be argued that where data is simply collected, either directly (as in a census or survey) or as a by-product of another process (such as making a passport application), there is little or no creativity involved and therefore no copyright. However, this may not be true where other forms of data, such as sound, pictures or video, are concerned. As Davies, Perini, and Alonso (2013, p.15) note, combining different data sets can create much value out of open data, but this in turn may create:

“…significant challenges in determining the legal status of derivative datasets”.

In short, this is a complex area and it is far from clear that current licence regimes deal with it adequately.

4.3 Should all OGD be free?

This leads directly to the question of whether OGD should be free of charge. The various open data licences discussed in section three are primarily concerned with ownership rights and with rights of usage and acknowledgement rather than payment. Ownership of data can have a number of meanings. One meaning is that the data is, in some sense, owned by the public. If somebody goes to trouble and expense to compile data that is in the public sphere (to take a trivial example, the number of non-pay car parking spaces available in different locations in a city), does that person ‘own’ these data? Current law suggests that they have some property rights in this situation. If a private individual or organization collects such data, it is under no obligation to give it away for free; many companies collect and sell such publicly available data on a routine basis. The company cannot claim copyright in the base data, and anybody else is free to collect the same data, but it can claim property rights in the compiled data. If, on the other hand, the state collects the data then, the argument runs, since the costs of collecting these data have been borne by the taxpayer, the taxpayer owns the data and is entitled to free access to it.

This is a common argument, but can be questioned on the grounds that there are plenty of examples of taxpayer funded resources, ranging from tolled roads to national parks, for which those who actually use or directly benefit from those resources pay a charge, even if the charge is a small fraction of the full economic cost of provision. Where a resource or service that benefits a specific subgroup of society is funded by general taxation, then it may well be appropriate that the beneficiaries make an additional contribution to that cost. In the case of OGD, suppose a commercial organisation can use government collected parking space data to generate a profit (say by creating an App identifying where such spaces are available); this will not be of much benefit to taxpayers who do not own cars. Why then, should the government not try to recompense all taxpayers by making those who may benefit from using the data pay to use it?

4.4 Privacy risks?

At first glance, the question of privacy would appear to have no implications for the licensing of OGD as such. Private data should not be released in the first place, so issues of personal privacy should not arise or need subsequent legal protection. Of course there is the question of determining what data should be released, but that is upstream of licensing. Once again, practice is not this clear cut.

Public ownership of data was discussed above. A second type of data ownership is personal. How much of a citizen’s data is legitimately in the public domain, even (where possible) in an anonymised form? Name and address may be public information, but what about social security or personal identity number? What about personal tax returns, health records, details about minor infractions of the law, social welfare receipts, driving licence information, passport number? As noted, in general neither principles nor licences deal with the question of privacy, on the presumption that private data will not or should not be released in the first place and therefore need not be protected in licences. But, as recent events have demonstrated, seemingly ‘non private’ data can be exploited to target individuals for commercial purposes by companies such as Google and Facebook (and they are the visible face of this industry; there are far less visible and more worrying entities at work in the data mining business), and such exploitation has impacted adversely on what people perceive as their privacy.

This capability comes from data analytics and the power of modern technology driven inference. As a result, from a privacy perspective, it is no longer valid to assume that once data is anonymised, or even aggregated, individuals are safe from threats to their privacy. As increasing amounts of data are combined from multiple public and private sector sources, deanonymisation becomes a greater risk. The risks from this are not just that citizens are targeted with unsolicited ‘personalised’ advertisements, but also more ominous impacts such as higher insurance premiums, difficulties in obtaining credit and other forms of discrimination.
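
The deanonymisation risk described above can be illustrated with a minimal sketch of a linkage attack. Everything in it is invented for illustration: the field names, the two toy data sets and the helper functions `k_anonymity` and `reidentify` are hypothetical, and real attacks and defences are considerably more sophisticated. The sketch assumes that a released data set has had names removed but still carries quasi-identifiers (district, birth year, sex) that also appear in a second, named data set.

```python
from collections import Counter

# Quasi-identifiers shared by both data sets (an assumption of this sketch).
KEYS = ("district", "birth_year", "sex")

# "Anonymised" release: names removed, sensitive attribute retained.
released = [
    {"district": "D02", "birth_year": 1980, "sex": "F", "condition": "asthma"},
    {"district": "D02", "birth_year": 1980, "sex": "F", "condition": "diabetes"},
    {"district": "D04", "birth_year": 1975, "sex": "M", "condition": "hypertension"},
]

# Separate named data set, for example a public register (invented records).
register = [
    {"name": "A. Example", "district": "D04", "birth_year": 1975, "sex": "M"},
    {"name": "B. Example", "district": "D02", "birth_year": 1980, "sex": "F"},
    {"name": "C. Example", "district": "D02", "birth_year": 1980, "sex": "F"},
]


def key_of(row, keys):
    """Tuple of a row's quasi-identifier values."""
    return tuple(row[k] for k in keys)


def k_anonymity(rows, keys):
    """Size of the smallest group of rows sharing the same quasi-identifiers."""
    groups = Counter(key_of(row, keys) for row in rows)
    return min(groups.values())


def reidentify(released_rows, named_rows, keys):
    """Pair a released row with a name when the match is unique on both sides."""
    released_counts = Counter(key_of(row, keys) for row in released_rows)
    named_by_key = {}
    for person in named_rows:
        named_by_key.setdefault(key_of(person, keys), []).append(person)
    matches = []
    for row in released_rows:
        candidates = named_by_key.get(key_of(row, keys), [])
        if released_counts[key_of(row, keys)] == 1 and len(candidates) == 1:
            matches.append((candidates[0]["name"], row))
    return matches


if __name__ == "__main__":
    print("k-anonymity of release:", k_anonymity(released, KEYS))
    for name, row in reidentify(released, register, KEYS):
        print(name, "->", row["condition"])
```

In this toy case the release is only 1-anonymous: one combination of district, birth year and sex is unique, so joining it to the register attaches a name to a health record even though no name was ever published.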

4.5 Legal risks to the taxpayer?

One of the characteristics of FoI as a form of OGD has been its use to embarrass politicians and public servants (Worthy 2010; Grimmelikhuijsen 2010). While this might be considered healthy in a democracy, it suggests that there may also be risks to the taxpayer arising from OGD. One source might be incorrect data. Consider an example where a government agency dealing with social welfare holds data on a computer recording that a particular citizen is believed to be a fraud risk, but these data are incorrect. As long as such data is confidential and restricted to (say) case officers, there is limited legal risk to the state, if only because it is unlikely that the citizen concerned will ever find this out. However, were such data to get into the public domain it would constitute libel and a citizen might justifiably sue the state. A further problem is downstream impacts. According to Davies, Perini, and Alonso (2013, p.15), combining different data sets can create much value from open data, yet it can also create:

“significant challenges in determining the legal status of derivative datasets”.

Unless licences are well designed and bullet proof, it is not difficult to envisage the state becoming embroiled in legal disputes about rights and ownership.

4.6 Problems with Existing Licences?

Science Commons and Creative Commons⁸ make a number of criticisms of existing licences. They state that there are many objectives of sharing data including:

• Reducing unnecessary transaction costs,
• Simplifying legal tools and
• Providing clarity and certainty of (provider and re-user) rights

They go on to suggest that the Open Database License (ODbL) fails to attain these objectives for sharing data publicly for several reasons including:

• “ODbL fails to promote legal predictability and certainty over the use of databases.
• ODbL is complex and difficult for non-lawyer to understand and apply.
• ODbL can result in high transaction costs on the community.
• ODbL imposes contractual obligations even in the absence of Copyright.”

They propose using other licences including the CC0 which they consider to be more consistent and simple than ODbL. In addition, they suggest using other public domain dedications or copyright waiver. An obvious question is whether governments would (or even could) grant a copyright waiver for such data.

Furthermore, although the Open Government Licence covers many aspects of re-using data, it does not cover several other critical aspects, including⁹:

• “Personal data in the Information;
• Information that has neither been published nor disclosed under information access legislation (including the Freedom of Information Acts for the UK and Scotland) by or with the consent of the Information Provider;
• Departmental or public sector organisation logos, crests and the Royal Arms except where they form an integral part of a document or dataset;
• Military insignia;
• Third party rights the Information Provider is not authorised to license;
• Other intellectual property rights, including patents, trademarks, and design rights; and
• identity documents such as the British Passport”

8 http://sciencecommons.org/resources/readingroom/comments-on-odbl/ accessed on 03/03/2014
9 ibid

A more extreme view is expressed by Miller, Styles and Heath (2008), who doubt whether open data licensing has any potential benefits at all. They suspect that licensing could hinder the process of opening up data and may even discourage the re-use of data.

4.7 Problems of Licence Design?

Ubaldi (2012, p37) comments that:

“FOI and PSI legislation as well as clear licensing guidelines are a cornerstone of OGD”

Ubaldi emphasises three important prerequisites for being able to publish open data, namely:

• The presence of a Freedom of Information concept;
• Good Public Sector Information legislation and
• Clear guidelines of open data licenses

In practice the first two of the above may not always be present and the third is easier to state than to deliver. For example, licences may need to be tailored to different types of user. A licence for business or commercial use might be different from that for research use or use by non-profits or state agencies. Even the geographical nature of data usage might be dealt with differently, in terms of global, international, national and local usage. Sharing data in a small community is different from sharing data between countries. Consequently, different terms and conditions may apply depending on both the type of user and the geographical location of usage. There are already political debates about storage of data outside of the jurisdiction in which users reside (Kandukuri et al 2009; Kertesz and Varadi 2014). Can a licence be enforced once data has been moved to a different polity? Developing a unified legal framework for open datasets is seen as an important issue to resolve as data increasingly travels across, and is stored beyond, national boundaries in jurisdictions where different rules and rights may apply. Halonen (2012) argues that there is a need for internationalisation of open data, in which licensing is a major determinant. She argues that data must be licensed under a licence that recognizes the users’ rights to take advantage of data in a range of ways, including for commercial purposes, though her arguments for this are rhetorical and normative.

4.8 Policing Compliance

Finally, having established a licensing regime, the regime needs, like any other regulatory system, to be policed. Problems that might arise are complaints about misuse or misleading use of data, privacy breaches, disputes over the right to charge for added value, disputes over copyright and possibly even more arcane matters such as the right to be forgotten (which, as is currently becoming evident, in itself opens a whole host of legal issues the consequences of which are still being worked out).

It is possible, though it seems improbable, that governments will simply be able to release data subject to a licence and then walk away. In addition to offices for Data Protection and FoI, governments are likely to find that they need an Open Government Data Commissioner and accompanying bureaucracy. This remains to be seen. The importance of licences may also vary with the way data is re-used, and this may need regulatory control. For example, if a company plans to redistribute the data, then arbitrators may need to be involved to make sure that authorisation to re-use the data has been given. In contrast, the licence will be of less concern if a citizen plans to re-use published data, because citizens may assume that the provision of the data online gives them the right to re-use it. However, in both cases a clear and simple licence is important, even where the data are already publicly available.

5 Summary and Some Concluding Thoughts

The purpose of this paper has been to unpack some of the issues surrounding OGD licensing. The libertarian view that public data, bar some exceptions for security, privacy, etc., should be open and free, in both senses of the latter, faces, as has been shown, many potential legal pitfalls and may not be politically realistic. The problems can be summarised under eight broad headings:

• First: what data is to be released? As noted, this would not appear initially to be a licensing problem. However, what happens when data that was thought to be sanitised from a security, privacy or commercial sensitivity perspective turns out not to be so? Can licences anticipate and contain such risks?

• Second: there is the question of return for the taxpayer on money invested in data acquisition. Should the beneficiary pay?

• Third: directly related to the preceding point, there is the question of acquired property rights. If private companies can acquire this right, so can states.

• Fourth: contrary to popular perception, copyright can exist in OGD, particularly in the form of visual material, but also potentially in other forms of data and metadata.

• Fifth: there is the question of consequential legal problems for the state arising from data release and whether licences can be designed which can forestall this (and it is far from evident that this is the case).

• Sixth: there is the problem of controlling the use of data once it moves beyond the jurisdiction of the issuing government.

• Seventh: there are questions of differential licensing for different types of user and the problems of containing restricted licences.

• Finally, there is the matter of establishing a regulatory infrastructure to manage and police this process.

The rising demand for OGD is not primarily about – it is about releasing and creating value. Already there is a rapidly growing number of products and services available which are based on such data. Human ingenuity and creativity being what they are, there are undoubtedly numerous good things that will emerge from the release of such data in years to come. In such circumstances, it is easy to understand why so many people believe that licences only get in the way. This examination suggests that festina lente might be a better motto. There are several benefits of licensing, not the least of which is providing users with certainty about where they stand. There is also the lurking worry about Donald Rumsfeld’s famous ‘unknown unknowns’: the possible consequences of analytics, data mining, mash ups, machine learning and other emergent technologies.

One of the most closely guarded secrets of the modern era is how to construct a nuclear weapon. In 1976 John Phillips, a 21-year-old undergraduate at Princeton working from published materials including textbooks, produced an outline design for a Second World War-type nuclear bomb (Rein 1976). There is some controversy as to whether Phillips’ design would have worked, though some nuclear engineers thought that it could have been used to construct an operational weapon. This was done long before the invention of the Web and in an area where massive state security and protection of critical data was involved. The moral is that one can never be sure what data will yield to the intelligent mind. Consequently, when it comes to OGD licensing it might be well to remember the words of former US president Theodore Roosevelt and suggest that governments should “speak softly and carry a big stick”.

References

Barry, E. and F. Bannister (2014) Barriers to Open Data Release; A view from the top, Information Polity, 19(1/2), 129-152.

Bertot, J.C., P.T. Jaeger and J.M. Grimes (2010) Using ICTs to create a culture of transparency: E-government and social media as openness and anti-corruption tools for societies, Government Information Quarterly, 27(3), 264-271.

Bunakov, V. & Jeffery, K. (2013) Licence management for Public Sector Information, in Parycek, P. and N. Edelmann (Eds.) CeDEM, Proceedings of the Conference for E-Democracy and Open Government, 277-288.

Chignard, S. (2013). A Brief History of Open Data. ParisTech Review. Available at: http://www.paristechreview.com/2013/03/29/brief-history-open-data/

Davies, T. (2012) Ten Building Blocks of an Open Data Initiative. Available at: http://www.opendataimpacts.net/wp-content/uploads/2012/08/Ten-Building-Blocks-of-an-Open-Data-Initiative.pdf

Davies, T., Perini, F. and Alonso, J. (2013) Researching the emerging impacts of open data: ODDC conceptual framework. Available at: http://www.opendataresearch.org/sites/default/files/posts/Researching%20the%20emerging%20impacts%20of%20open%20data.pdf

DLA Piper (2013) Data Protection Laws of the World, DLA Piper. Available at: http://files.dlapiper.com/files/Uploads/Documents/Data_Protection_Laws_of_the_World_2013.pdf.

Free Software Foundation (2014) A Quick Guide to GPLv3. Available at: http://www.gnu.org/licenses/quick-guide-gplv3.html

Grimmelikhuijsen, S. (2010) Do transparent government agencies strengthen trust?, Information Polity, 14(3), 173-176.

Halonen, A. (2012). Being open about data. Analysis of the UK open data policies and applicability of data. Available at: http://finnish-institute.org.uk/images/stories/pdf2012/being%20open%20about%20data.pdf.

Janssen, M., Y. Charalabidis and A. Zuiderwijk (2012) Benefits, Adoption Barriers and Myths of Open Data and Open Government, Information Systems Management, 29(4), 258-268

Joinup (2014) European Union Public Licence. Available at: https://joinup.ec.europa.eu/software/page/eupl

Kertesz, A. and S. Varadi (2014) Legal Aspects of Data Protection in Cloud Federations, in Nepal, S. and M. Pathan (Eds.) Security, Privacy and Trust in Cloud Systems, Berlin, Springer-Verlag.

Korn, N. & Oppenheim, C. (2011) Licensing open data: a practical guide (version 2.0). HEFCE: JISC, June.

Miller, P., Styles, R. & Heath, T. (2008) Open data commons, a license for open data. Proceedings of the 1st Workshop on Linked Data on the Web (LDOW2008).

Nicol, A., Caruso, J. & Archambault, É. (2013) Open Data Access Policies and Strategies in the European Research Area and Beyond, Science-Metrix.

Open Source Initiative (2014) The Open Source Definition (Annotated). Available at: http://opensource.org/osd-annotated.

Rein, R. K. (1976) A Princeton Tiger Designs An Atomic Bomb in a Physics Class, People, 6(17), October 25th 1976. Available on line at: www.people.com/people/archive/article/0,,20067027,00.html

Privacy International (2014) Data Protection and Privacy. Available at: https://www.privacyinternational.org/issues/data-protection-and-privacy-laws

Schmitz, P-E. (2013) The European Union Public Licence (EUPL), International Free and Open Source Software Law Review, 5(2), 121-136.

Schutzberg, A. (2014) Nine Things You Need to Know about Open Data [Online]. Available: http://www.directionsmag.com/articles/nine-things-you-need-to-know-about-open-data/385680 [Accessed 25/02/2014].

Ubaldi, B. (2013) “Open Government Data: Towards Empirical Analysis of Open Government Data Initiatives”, OECD Working Papers on Public Governance, No. 22, OECD Publishing. http://dx.doi.org/10.1787/5k46bj4f03s7-en

Wimmer, M., Scholl, J., Janssen, M. & Traunmüller, R. (2013) Electronic Government. Proceedings of Ongoing Research, General Development Issues and Projects of EGOV, Berlin, Springer-Verlag.

Worthy, B. (2010) More Open but Not More Trusted? The Effect of the Freedom of Information Act 2000 on the United Kingdom Central Government. Governance, 23 (4), 561–582.
