Digital Object Identifiers for Publishers and the e-Learning Community

Submitted: September 2003 A Report for the JISC from TSO

The Information Management Company Digital Object Identifiers for Publishers and the e-Learning Community

Contents

1 Conclusion
2 Introduction
  2.1 Scope
  2.2 Digital Identifiers
  2.3 Stakeholders
3 Evolution of Digital Identifiers
  3.1 Non-digital Origins
  3.2 Digital identifiers for digital objects
  3.3 Uniform resource names
  3.4 Persistent URLs
  3.5 XRI
  3.6 Handle
  3.7 Digital object identifiers
  3.8 Metadata for digital identifiers
  3.9 Building digital object identifier systems
  3.10 Minimum requirements for publishers and the e-learning community
4 Digital Object Identifier Lifecycle
  4.1 Definition
  4.2 Assignment
  4.3 Publication
  4.4 Resolution
  4.5 Maintenance
5 Uses of Digital Object Identifiers
  5.1 Publication
  5.2 Discovery
  5.3 Syndication and assembly
  5.4 Digital object identifiers and handle processes
  5.5 Costs
6 Use Scenarios
  6.1 Publishing with a digital identifier
  6.2 Embedding DOIs
  6.3 Identifiers for metadata records
  6.4 Digital rights
  6.5 Multiple resolution
  6.6 Multiple copies
  6.7 Cross sector example
  6.8 Researcher
7 Further Work
8 Recommendations
9 References
10 Glossary
11 Case studies
  11.1 Publisher case study: Granada
  11.2 The case study: SOSIG
  11.3 Publisher case study 2: TSO (The Stationery Office)
  11.4 The case study 2: Sunderland
12 Acknowledgements

2 www.tso.co.uk Digital Object Identifiers for Publishers and the e-Learning Community

1 Conclusion

A digital identifier system adopted by publishers in the higher and further education sectors in the UK should:
● reference multiple object types
● integrate with existing standards
● offer scope for future extensions and migration
● satisfy the needs of the broad JISC community and other sectors
● avoid any semantics or location information in the identifier

It is unlikely that the whole JISC community will use a single identifier system, for the following reasons.
● A single, local identifier implementation may hinder the medium to long-term goals of JISC information interoperability, and it could create a narrow ghetto that restricts interoperability with other sectors. The broader needs of a range of other sectors must be considered.
● Many information objects and/or metadata records, in particular those provided by third-party publishers, may be associated with digital identifiers before they enter the JISC community. It may therefore be essential for the JISC community to create, and interact with, a range of existing identifier types to enable exchange.
● Digital identifiers need to do more than just exist; they must also serve some extended function. Digitally enabled identifiers can support certain information service processes once their particular scheme is known.
● Actionability requires the availability of metadata to access services. These elements can contain descriptive, administrative, classification and service information.
● The provision of associated identifier services, such as being able to access a unique description of a resource, is regarded as key to interoperability for digital information systems.
● The informal sharing of information resources may have different digital identifier requirements from those of the more formal traditional publishing and dissemination processes.

There are some services that are critical and should be centrally influenced by government, including the control of namespaces, a single government registration authority and key resolution services or gateways. In addition, there is a need for other, more informal methods of creating digital identifiers that have low cost and minimal barriers for users.

In conclusion, it is prudent to review positively the acceptance of Handle-based Digital Object Identifiers, which have international deployment across many information communities in the UK and beyond. They have considerable advantages for information publishing, in particular the ability to go beyond an opaque identifier to an “actionable” identifier, i.e. one that can be resolved to single or multiple locations and can offer authentication capability.


2 Introduction

2.1 Scope

This report is concerned with persistent, digital identifiers and their use in the UK higher and further education sectors. It considers what digital identifiers are and what can be done with them. The creators and users of the identifiers may be publishers, authors, JISC services, tutors or learners. Identifiers can be associated with almost anything, but the two main types of items that are considered are:
● digital content objects for e-learning communities
● the metadata records associated with learning objects

2.2 Digital Identifiers

Digital identifier is a generic term for a label or name that can be transmitted electronically. Digital identifiers can be associated with electronic, non-electronic or abstract entities such as books, images, reports, metadata records or events.

A persistent identifier is intended to provide permanence and is expected to be globally valid and unique. Modern, persistent, digital identifiers are derived from previous work and experience in the publishing world, with its use of ISBN book identifiers, and the world wide web community with its use of URL website addresses.

There is a large family of identifiers such as Digital Object Identifier (DOI), Uniform Resource Identifier (URI), Uniform Resource Name (URN), Persistent Uniform Resource Locator (PURL) and eXtensible Resource Identifier (XRI). Digital Object Identifiers offer the capability to point to more than one associated location and to use metadata (data about the DOI), which provides information as to the title of the resource, its author and nature.

Persistent, digital identifiers need to be effectively managed throughout their lifecycle (naming, assignment, publication, resolution and maintenance) in order to be able to guarantee their global validity, persistence and ability to be discovered by searching mechanisms. If this is done effectively then it becomes possible not only to use them for publishing but also for discovery, syndication and assembly into larger resources (e.g., learning material assembled from smaller components).

Although Digital Object Identifiers are issued on a cost-recovery business model, like ISBN book identifiers, there can be an overhead in such digital persistent identifiers since central registries need to be maintained and investment made in systems to support publication and discovery.

Persistent identifiers for digital objects are designed to address some of the problems with other identifier schemes, such as the moving of objects to new locations rendering them hard to find again.

An aim of digital identifiers for digital objects is to make it easier to reference objects re-used across different systems and to foster interoperability. For example, a report might itself contain references to other reports in a digital library system and these could be directly linked to the new report in the references section and be opened up and even paid for by selecting that link.

The various features enabled by the main types of identifiers are also considered. Some, such as DOIs, enable several other metadata services to be built around them.


2.3 Stakeholders

Within the UK Further and Higher Education communities there are several bodies that have a stake in the application of digital identifiers including:

● traditional publishers who wish to make their content, or information about it, available electronically (e.g., e-books, journals, papers)
● learning materials content developers who wish to make their e-learning materials easy to index, find, purchase and integrate within electronic learning environments
● educational institutions and training institutions who wish to locate and integrate educational materials into their programmes
● government bodies who wish to foster access to learning and are interested in opportunities offered by electronic media
● quality agencies who wish to maintain and enhance quality standards in new forms of learning

The Office of the e-Envoy has established a set of minimum standards or specifications, called the e-government interoperability framework or eGIF, that suppliers need to be able to conform to in their dealings with government. Part of the eGIF deals with learning, education and training and recommends several specifications including metadata schemes. One of the mandatory elements is a unique identifier. If the identifier were a certain type of digital identifier for digital objects then it could be resolved to multiple locations including information in an official format.

The E-learning Strategy Unit, in the DfES, is developing a unified e-learning strategy that includes recognition of the need for technical standards and specifications. The value of particular persistent, digital identifiers such as DOIs in supporting this strategy is being considered.


3 Evolution of Digital Identifiers

3.1 Non-Digital Origins

The requirement to be able to name or identify things is not new. Identifiers have been used to uniquely identify journals or books for some time. Knowledge and experience gained from this earlier work has been built on when creating the new digital identifiers. It is useful to understand the basics of non-digital identifiers. There are traditional, non-digital object identifiers that identify clusters of items and those that identify individual items.

Most familiar will be the numeric identifiers which appear on books (International Standard Book Number System - ISBN) and periodicals (International Standard Serial Number - ISSN). These are used to identify clusters of documents contained in a book or serial publication such as a journal. ISBNs (http://www.isbn-international.org/) are made up of 4 components after the ISBN designator:

● the group (or area of the world or language group from which the publisher comes)
● the publisher within that group
● the title of the book
● a final check digit.
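The role of the final check digit can be illustrated in code. The following is a minimal sketch, assuming the ten-character form of the ISBN that was current when this report was written and the standard modulus-11 weighting (weights 10 down to 1, with "X" standing for 10 in the last position). Neither detail is spelled out in the text above, and the function name is illustrative only.

```python
DIGITS = "0123456789"


def isbn10_is_valid(isbn):
    """Return True if a ten-character ISBN has a consistent check digit.

    Assumption: the weighted sum of all ten characters (weights 10..1)
    must be divisible by 11; 'X' is allowed only as the final character.
    """
    # Hyphens and spaces only separate the components; they carry no value.
    chars = [c for c in isbn.upper() if c not in "- "]
    if len(chars) != 10:
        return False
    total = 0
    for position, char in enumerate(chars):
        if char in DIGITS:
            value = int(char)
        elif char == "X" and position == 9:
            value = 10
        else:
            return False
        total += (10 - position) * value
    return total % 11 == 0
```

A mistyped digit changes the weighted sum, so the check fails and the error can be caught before the number is used to order or catalogue a book.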

Similarly ISSN numbers for periodicals (http://www.issn.org) are made up of eight digits which uniquely identify a particular serially occurring periodical. The eight digits are arranged in two groups of 4, separated by a hyphen and preceded by the designator ISSN in order to distinguish them from ISBNs.
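As a hedged illustration of the eight-character layout described above, the sketch below computes and verifies the last character of an ISSN. It assumes the standard modulus-11 scheme (weights 8 down to 2 over the first seven digits, with "X" representing a check value of 10), which the text does not describe; the function names are illustrative.

```python
DIGITS = "0123456789"


def issn_check_character(first_seven):
    """Compute the eighth character from the first seven digits of an ISSN."""
    if len(first_seven) != 7 or any(c not in DIGITS for c in first_seven):
        raise ValueError("expected exactly seven digits")
    # Weights run 8, 7, ..., 2 across the seven digits.
    total = sum(w * int(d) for w, d in zip(range(8, 1, -1), first_seven))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def issn_is_valid(issn):
    """Accept forms such as 'ISSN 0378-5955' or '0378-5955'."""
    compact = issn.upper().replace("ISSN", "").replace("-", "").replace(" ", "")
    if len(compact) != 8:
        return False
    try:
        return issn_check_character(compact[:7]) == compact[7]
    except ValueError:
        return False
```

The hyphen and the "ISSN" designator are presentation only, which is why the sketch strips them before checking.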

These familiar identifiers are gradually being evolved by publishers in the Scientific, Technical and Medical fields such as Elsevier, INSPEC, Springer and IEEE (Paskin, Information Identifiers, Learned Publishing 10(2), 135-56 1997) into a common format known as PII.

PII (Publisher Item Identifier) is based on the Elsevier Standard Serial Document Identifier and the ADONIS number, which it has superseded. The PII builds on and incorporates existing publisher identifiers (ISBN for books and ISSN for serial items such as journals). It is made up of a one-letter identifier (either B for books or S for serial items), then either the ISBN or the ISSN plus year of publication, then an item identifier and a final check digit.

Techniques such as these uniquely identify a particular book or periodical, but as there is no central registry of numbers it is difficult to use them to find a particular article or chapter.

In order to be able to locate a single item the BiblID system was developed (ISO 9115:1987). This system did not gain wide acceptance and has been replaced by the Serial Item and Contribution Identifier (SICI) (Paskin, Information Identifiers, Learned Publishing 10(2), pp 135-56, 1997). The SICI consists of two parts: the first provides the unique identification of serial issues (SII) and the second a unique identifier for serial contributions (SCI). The SII is made up of the ISSN, issue date, issue numbering, SICI standard version number, and a check character. The SCI adds the contribution location (and a title code in case more than one title begins on a page) between the issue numbering and the SICI version number. SICIs are expressly designed to locate individual articles or issues. In many ways this system complements PII and it is conceivable that a publisher would assign a PII to the collection and SICIs to point to the physical location of the article.

Of course, there are other items that one might wish to identify, such as pieces of music, videos or domestic products and comestibles, and there is a whole raft of similar identification schemes for identifying: the Composer, Author and Editor/Publisher of music or literary texts (CAE), the edition of printed music (International Standard Music Number, ISMN), a particular recording (International Standard Recording Code, ISRC), audiovisual works (International Standard Audio-Visual Number, ISAN) and products via the Universal Product Code/European Article Number (UPC/EAN) identifiers. These identification schemes are currently being unified in the Common Information System into codes such as the International Standard Work Code (for musical compositions and works) in order to better meet the needs for efficient protection and management of copyright material.

Another general aspect of identifiers comes into play when identifying libraries of, for example, books, which use a particular format in which to record the information about their stock. This information (the library catalogue) about other information (books) is known as metadata. Should a library wish to exchange information with another library then it is most easily accomplished if a standard format is agreed, and this is the purpose of the UNIMARC format (http://www.ifla.org). The key point to note here is that not only do we need standard naming conventions for the physical items, we also need a standard way of recording metadata, that is, information about these resources.

Groups of Documents Single Documents Document Components Items ISSN PII None Coden SICI ISBN Biblid ISRN/ISMN/ISRC URN ISAN Metadata Unimarc DOI MARC Effect ISWC URL/PURL URC DC

Table 1: Identifiers for non-digital objects

Table 1 gives an overview of the main systems used in the non-digital world for identifying various artifacts (from consumer products and individual pieces of music through to collections of papers). In essence there is a need to identify single items and collections of items, and to be able to identify more than one realization or manifestation of a work (e.g., several different recordings of the same written piece of music). There is a need for information about data (metadata) so that catalogues and indexes can be created to allow users to access and exchange information easily.

3.2 Digital Identifiers for Digital Objects

Most people are familiar with digital objects and they are common in the JISC community. They include word processor documents, music and video files that are commonly stored on personal computers or local networks. Each of these “digital objects” has its own filename which is assigned to it when it is created and stored. The names used are initially known only locally. That is, they exist locally on a computer and would not necessarily be meaningful to someone else and cannot be easily found and used by someone else. For individual computer users, the ability to easily create and name their own digital objects is essential. When the digital objects that they create are made publicly available (e.g., downloadable music files, training courses and product documentation) it becomes valuable that each object has a persistent and unique name. The full pathname associated with files may be unique but also indicates the location, so once the object is moved the pathname changes. It is also useful if the digital identifier can be easily discovered by performing a web-like search in some form of “catalogue”. A familiar digital object identifier is the world wide web URL (uniform resource locator) such as http://www.jisc.ac.uk.

Web-based digital identifiers


Although the World Wide Web-based URL might be the most familiar, it is actually part of a broader family of digital identifiers for the web. This family is formally known as Uniform Resource Identifiers (URI, see http://www.w3.org/Addressing/ and http://www.ietf.org/rfc/rfc2396.txt). They comprise the following addressing conventions:
● World Wide Web URL e.g. http://www.bbc.co.uk/
● the file transfer protocol (FTP) e.g. ftp://ftp.freeware.com
● gopher (precursor to the World Wide Web) e.g. gopher://teach.resources.com
● the email addressing scheme e.g. mailto:[email protected]
● the internet news service scheme e.g. news:comp.infosystems.www.servers.unix
● the telnet (remote computer access) protocol e.g. telnet://melvyl.ucop.edu/

As can be seen from above, the basic format of a URI is the name of the type of activity (e.g., ftp, gopher, http) followed by the address of the required resource. This format was chosen so that it could easily be transcribed by users, written down and passed on to others. All of the URIs listed above reference a unique, single location at which a resource can be found, and it is herein that their weakness lies: the location of the resource can be changed without warning, and hence users may suddenly find that they are unable to use the resource as expected and no alternatives are offered. A very valuable property of digital identifiers to support persistence is to separate the name from the location.
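The "type of activity followed by address" format described above can be seen by splitting a few of the listed examples. This is a small sketch using Python's standard urllib.parse module; the helper name describe_uri is an assumption made for illustration.

```python
from urllib.parse import urlsplit


def describe_uri(uri):
    """Split a URI into (scheme, address).

    The scheme is the 'type of activity' (http, ftp, news, ...); the
    address is whatever follows it: host plus path for schemes written
    with '//', or the bare remainder for schemes such as news:.
    """
    parts = urlsplit(uri)
    address = parts.netloc + parts.path if parts.netloc else parts.path
    return parts.scheme, address


for example in (
    "http://www.bbc.co.uk/",
    "ftp://ftp.freeware.com",
    "news:comp.infosystems.www.servers.unix",
    "telnet://melvyl.ucop.edu/",
):
    scheme, address = describe_uri(example)
    print(f"{scheme:8} {address}")
```

In every case the address half names a location, which is exactly the part that breaks when a resource is moved.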

3.3 Uniform Resource Names

One solution to this problem of persistent names is the Uniform Resource Name (URN), which is a subset of the URI mentioned above. The format of the URN (RFC 2141) is the word “urn:” followed by the namespace (indicating an organization, such as a publisher, which has named a resource), then a colon, then the unique identifier in that namespace (as defined by the organisation responsible for that namespace). For example, URN:NBN:fi-fe19981001 would point to a particular electronically available resource at the Finnish national library, which would be retrieved by means of a browser looking up the location of the requested item in an online registry (which operates like the domain name servers already used to translate world wide web addresses into their machine-readable equivalent). In common with other national libraries, it uses National Bibliographic Numbers (NBN) to identify items for which no existing identifier is available, and has modelled their form on the URN (www.ietf.org, RFC 3188). Whilst the proposals made in 1997 for URNs included important abilities to reference more than one realization or manifestation of a document (e.g. PDF and plain HTML), little progress has been made since then and the URN has not reached a critical mass of users.
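The three-part URN layout described above (the literal "urn", a namespace, and a namespace-specific identifier, separated by colons) can be sketched as follows. The function name is illustrative; treating the "urn" prefix and the namespace as case-insensitive follows RFC 2141, and no registry lookup is attempted.

```python
def parse_urn(urn):
    """Split a URN into (namespace, identifier within that namespace).

    Only the first two colons are separators; any later colons belong
    to the namespace-specific identifier.
    """
    pieces = urn.split(":", 2)
    if len(pieces) != 3:
        raise ValueError(f"not a URN: {urn!r}")
    prefix, namespace, identifier = pieces
    if prefix.lower() != "urn" or not namespace or not identifier:
        raise ValueError(f"not a URN: {urn!r}")
    # The 'urn' prefix and the namespace are case-insensitive.
    return namespace.lower(), identifier


namespace, identifier = parse_urn("URN:NBN:fi-fe19981001")
print(namespace, identifier)
```

Note that nothing in the parsed result says where the resource is held; that information lives only in the registry consulted at resolution time.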

3.4 Persistent URLs

One attempt to force the pace of progress has been the PURL (Persistent URL), which has itself led to other similar URN-based identifiers (OAI, see http://www.openarchives.org/, and POI, see http://www.ukoln.ac.uk/distributed-systems/poi/). We shall consider PURLs as the main representative of this style of URN.

A PURL (http://www.purl.org/) is a form of URL which acts as a nickname (a permanent identifier) for a real URL which contains the resource. A user enters a PURL address which is then looked up by the PURL registry, which returns the real URL to the browser. This may seem a convoluted process, but separating the persistent identifier “nickname” from the real location of the resource means that, as long as the owner of the resource keeps their entry at the PURL registry up to date, it will always be possible to locate that resource. A PURL is made up of three parts: http://, followed by the address of the lookup service, followed by the actual persistent identifier, e.g., http://purl.oclc.org/OCLC/PURL/FAQ. PURLs are a half-way house to real URNs and the intention is to map PURLs to URNs when their full specification is clarified. One weakness of PURLs is that they point only to one location and cannot permit multiple references to different versions of a resource (e.g., Welsh, Gaelic, Gallic and English versions of government documents). Nor do they support metadata, one of the main reasons for using a persistent identifier in the first place. PURLs can only be used over http and cannot be extended to use other www protocols as and when they appear. PURLs have been available for several years but are not widely implemented in commercial settings and have a limited technical infrastructure.
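The lookup step described above can be sketched with a toy in-memory registry. This is an illustration of the idea only, not the actual PURL service: the PurlRegistry class and the example.org target URLs are hypothetical, and a real resolver answers over http with a redirect rather than a function call.

```python
from urllib.parse import urlsplit


def split_purl(purl):
    """Return the three parts: protocol, lookup service, persistent name."""
    parts = urlsplit(purl)
    return parts.scheme + "://", parts.netloc, parts.path


class PurlRegistry:
    """Hypothetical stand-in for a PURL lookup service."""

    def __init__(self):
        self._targets = {}

    def register(self, name, real_url):
        # The owner calls this again whenever the resource moves.
        self._targets[name] = real_url

    def resolve(self, purl):
        _protocol, _service, name = split_purl(purl)
        return self._targets[name]  # exactly one location per name


registry = PurlRegistry()
purl = "http://purl.oclc.org/OCLC/PURL/FAQ"
registry.register("/OCLC/PURL/FAQ", "http://old.example.org/faq.html")
print(registry.resolve(purl))

# The resource moves; the PURL handed out to users stays the same.
registry.register("/OCLC/PURL/FAQ", "http://new.example.org/help/faq.html")
print(registry.resolve(purl))
```

The single value stored per name mirrors the weakness noted above: there is nowhere to record a second location or any descriptive metadata.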

A further URL-based scheme is the OpenURL (van den Sompel 1999), which recognizes that there are limitations to the digital identifiers mentioned above, in that links do not typically lead to alternatives. For example, if the user's local library service does not have direct access to the text, the user just gets referred to the splash page of a supplier and no other services are provided to access the item. In order to solve this problem the digital identifier needs to point to a service that provides the item and that offers facilities to recognise users across domains. OpenURLs offer one solution to this problem by providing a link to services which then provide the resource. The nature of these service components is not yet fully defined and more work is required in order to guarantee full interoperability. OpenURL can be used in conjunction with DOI.

3.5 XRI

OASIS members have formed a new technical committee to establish a common identification scheme for distributed directory services. The Extensible Resource Identifier (XRI) Technical Committee proposes to create a URI scheme and a corresponding URN namespace for distributed directory services that enable the identification of resources (including people and organizations) and the sharing of data across domains, enterprises, and applications. XNS Public Trust Organization (XNSORG) will contribute the Extensible Name Service (XNS) specifications to the TC to serve as a basis for the OASIS committee work. The committee "will define a Uniform Resource Identifier (URI) scheme and a corresponding Uniform Resource Name (URN) namespace that meet these requirements, as well as basic mechanisms for resolving XRIs and exchanging data and metadata associated with XRI-identified resources."

XRI could become a longer term solution and could become a market leader. It is still in the early stages of development and is not yet a strong contender for implementation in the short or medium term. It is, however, being designed to be compatible with other existing URI schemes so a system implemented now could be migrated to XRI if the technology matures and offers value.

3.6 Handle

The Handle System was developed by the Corporation for National Research Initiatives (CNRI). It is a system that offers persistence, location independence and global uniqueness. Names are registered through a registry of naming authorities that uses a handle server. It is capable of multiple resolution and is used by the Digital Object Identifier system described below.

An organization that uses a large number of identifiers could set up a handle server of its own. This would enable it to name objects and manually enter their location into the freely available handle software. Such a system would work well for informal collections of objects which need to be referred to by other parties yet are not expected to be permanent (e.g., drafts of a project tender held in a shared workspace before submission). Such an approach would soon encounter both staff and resource constraints if it were to be used for large scale public operations.


3.7 Digital Object Identifiers

The Digital Object Identifier was initiated by the Association of American Publishers. This system brings together the expertise of publishers (e.g. Reed-Elsevier) and the naming registry (the Handle System) developed by the US Corporation for National Research Initiatives, which is built to allow backwards compatibility with legacy systems and offers the additional benefit of being able to reference multiple versions of the same resource.

A Digital Object Identifier (http://www.doi.org/handbook_2000/index.html ) is made up of a prefix which indicates the organisation involved and a suffix determined by that organisation (this could be an existing identifier or a novel one) e.g., 10.1002/prot.999. Note that an organization could also have multiple prefixes and many organizations could share the same prefix.
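The prefix/suffix structure just described can be shown with a short sketch. It assumes that the first forward slash separates the two parts, so that a suffix built from an existing identifier may itself contain further slashes; the function name is illustrative.

```python
def split_doi(doi):
    """Split a DOI into (prefix, suffix) at the first forward slash.

    The prefix indicates the organisation involved; the suffix is
    whatever that organisation chose (an existing identifier or a
    novel one) and is treated here as an opaque string.
    """
    prefix, separator, suffix = doi.partition("/")
    if not separator or not prefix or not suffix:
        raise ValueError(f"expected 'prefix/suffix', got {doi!r}")
    return prefix, suffix


prefix, suffix = split_doi("10.1002/prot.999")
print(prefix)  # the organisation's prefix
print(suffix)  # the item within that prefix
```

Because the suffix is opaque, software should compare and store it unchanged rather than try to read meaning into it.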

Like PURLs, Digital Object Identifiers can be entered into a standard browser and the underlying resource retrieved, e.g. http://dx.doi.org/10.1002/prot.999. The real strength of Digital Object Identifiers is that they are able to offer the user a choice of realisations of a particular resource. This facility is made possible by the inclusion of metadata associated with the Digital Object Identifier, which indicates all the available resources associated with that particular identifier. An additional side-effect of the metadata is that not only is the Digital Object Identifier machine readable but it is also possible to offer some human-readable information within the metadata if desired. The need for metadata in Digital Object Identifiers is a major area for consideration (see the next section).
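Turning a DOI into a browser address of the kind shown above amounts to appending it to the resolver's address. The sketch below uses the resolver address from the text; percent-encoding characters that are unsafe in a URL is an assumption added here for robustness, and the function name is illustrative.

```python
from urllib.parse import quote

# Resolver address as given in the text.
DOI_RESOLVER = "http://dx.doi.org/"


def doi_to_url(doi, resolver=DOI_RESOLVER):
    """Build a clickable address from a DOI.

    Assumption: characters that are not safe in a URL path (spaces,
    '#', '?', ...) are percent-encoded; '/' is left as it is.
    """
    return resolver + quote(doi, safe="/")


print(doi_to_url("10.1002/prot.999"))
```

The DOI itself never changes; only the resolver's record of where it points needs to be maintained.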

Table 2: Overview of Persistent Identifiers

1 In theory it could be, but only Handle is used at the moment.
2 Handle is itself a protocol.
3 Only when used as URNs are they then augmented with a controlled prefix (i.e. a country code).
4 Only if the PURL-URL relationship is maintained.
5 DDD = Dynamic Delegation Discovery system.
6 The NID (namespace identifier) must be approved by the IETF to ensure uniqueness.
7 A gradient of absolute to relative persistence can be defined.
8 Any namespace can be defined.


3.8 Metadata for Digital Identifiers

Metadata is data about data; that is, it tells us something useful about an item, such as its title, author or location.

For the JISC community, metadata could be used to describe the resource associated with a given digital identifier and to indicate what related services are available (e.g., language translation, pay per use etc.), as well as to enable the digital identifier to be discovered and indexed through web search or other mechanisms. The metadata could indicate:
● the resource identified by the digital identifier, its name, title and type
● whether or not it has an alternative identifier such as an ISSN
● who created and published the resource
● of what the resource is made and how it is perceived (audio, video)

Initial work was undertaken in conjunction with URNs, which were to have associated descriptions, but the proposals were seen to be too simplistic and have been superseded by work by various bodies on metadata (Digital Object Identifier: current status & outlook, Paskin, D-Lib, 1999 5(5)). Perhaps the easiest to understand is the ARK work undertaken by the Internet Engineering Task Force (IETF), which focuses on guaranteed persistence of a resource (Kunze, IETF 2003). The metadata format is designed to be both machine and human readable and provides information about who provided the resource, what it is, and where and when it was produced, by means of the basic tags who, what, where and when. Metadata also typically enables relationships between resources to be expressed, such as the location of related resources, from where each particular resource was derived, whether it forms part of a larger resource and what version the current resource is.
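A record built from the four basic tags (who, what, when, where) is simple enough to be read by both people and programs. The sketch below is only an illustration of that idea: the "tag: value" line layout, the class name and the sample values are assumptions, not the syntax defined in the ARK work cited above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class BasicDescription:
    """Four-tag description of a resource (illustrative layout only)."""

    who: str    # who provided the resource
    what: str   # what it is
    when: str   # when it was produced
    where: str  # where it was produced or can be found

    def as_text(self):
        """Render one 'tag: value' line per tag, readable by a person."""
        return "\n".join(
            f"{f.name}: {getattr(self, f.name)}" for f in fields(self)
        )

    @classmethod
    def from_text(cls, text):
        """Parse the same layout back into a record, as a program would."""
        values = {}
        for line in text.splitlines():
            tag, _, value = line.partition(": ")
            values[tag] = value
        return cls(**values)


# Sample values are hypothetical.
record = BasicDescription(
    who="Example University Press",
    what="Introductory statistics course notes",
    when="2003",
    where="http://repository.example.ac.uk/notes/42",
)
print(record.as_text())
```

The same four lines serve a person browsing a catalogue and a harvester building an index, which is the dual readability the paragraph above describes.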

The most recent general work on metadata for digital identifiers has followed a late 2002 meeting of organizations seen as maintenance authorities for metadata elements, organized by the EU IST CORES project to reach consensus on assigning URIs to metadata (Identifying Metadata Elements with URIs, Baker T, Dekkers M, D-Lib Magazine 9(7-8), 2003). Work from partners includes GILS (Global Information Locator Service), which cross-references different data sources by means of mapping and outlines an initial idea for URIs for their metadata. At a more advanced stage is the ONIX standard for book industry products (also used for other products), which is now standardizing its metadata on the INDECS system (the only metadata system to have been extensively tested), as used in Digital Object Identifiers. The MARC (Machine Readable Cataloguing) standards cover bibliographic type records, and a URN-based format is being proposed for longer-term use at the Library of Congress. The Common European Research Information Format (CERIF) is also proposing URI-based metadata standards, as are the IEEE Learning Object Metadata (LOM) and Dublin Core metadata groups.

What comes out of these discussions is the clear picture that metadata issues are not yet fully resolved for digital identifiers and that particular concerns centre around whether or not the metadata should be built hierarchically within the URI or independently; how can redundant identifiers be identified; should identifiers with the same meaning but different possible values have the same name and should different versions share the same persistent identifier? INDECS, CORES and ARK are possible contenders but the exact outcome is not yet clear and more real-world testing is required before a robust standard can be adopted.

3.9 Building Digital Object Identifier systems

Digital Object Identifiers can access various services which may be manually operated or automated. Manual links are activated by simply clicking on them (e.g., to view a review of a particular video). Standardized application profiles and associated application programmer interfaces (APIs) enable services to be able to link to a particular Digital Object Identifier and also to be able to communicate with other Digital Object Identifiers and hence offer combined services with the same look and feel.

For example, a student in a library may access a paper in a journal to which that University library subscribes and wish to access one of the papers listed in the references to which the university does not subscribe and which resides on a separate publisher’s server. A Digital Object Identifier linking with the use of the appropriate APIs would enable the system to offer the student access to the paper directly upon payment of an access charge.


3.10 Minimum Requirements for Publishers and the e-learning community

Digital identifiers need to be able to identify an object uniquely: each one is only ever associated with a single object. Also, because of the way that the names are allocated and managed, each one is globally unique and persistent, so it will always exist even if the thing it refers to disappears.

A key feature of DOIs is the ability to resolve the name to one or more locations. This could be achieved by incorporating the DOI into a web address that can be “clicked on” in a browser. These locations may provide a pointer to the object itself, some information about how it is catalogued, its version history or rights management. The locations may change but the DOI will remain the same, so enabling it to be used as a reference that can be distributed.
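Resolution to more than one location can be pictured as a record that holds several typed entries under one name. The sketch below is a simplified model under stated assumptions: the record layout, the entry types ("object", "catalogue", "versions", "rights") and the example URLs are hypothetical and do not reproduce the Handle System's actual data format.

```python
class ResolutionTable:
    """Toy model of one identifier resolving to several typed locations."""

    def __init__(self):
        self._records = {}

    def assign(self, doi, kind, url):
        """Add or update one typed location for an identifier."""
        self._records.setdefault(doi, {})[kind] = url

    def resolve(self, doi, kind=None):
        """Return every location for a DOI, or only the requested kind."""
        locations = self._records[doi]
        if kind is None:
            return dict(locations)
        return {kind: locations[kind]}


table = ResolutionTable()
doi = "10.1002/prot.999"
table.assign(doi, "object", "http://publisher.example/articles/prot999.pdf")
table.assign(doi, "catalogue", "http://publisher.example/catalogue/prot999")
table.assign(doi, "rights", "http://publisher.example/rights/prot999")

print(table.resolve(doi))            # all locations
print(table.resolve(doi, "rights"))  # one service only

# The object moves to a new server; references quoting the DOI still work.
table.assign(doi, "object", "http://archive.example/prot999.pdf")
```

A citation that quotes only the DOI is unaffected by the last step, which is what makes the identifier safe to distribute.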

The reusability of digital content objects and their metadata makes it possible to distribute them widely, locate them with some form of “search engine” and to combine them with other objects to create new content objects. For example, it would be possible to combine images from a picture library with a report and then publish the whole item on a particular system as a new, coherent whole with its own unique name.

There are also metadata considerations. There is now widespread and increasing use of metadata records, for example in repositories or online catalogues, used for finding and describing learning content. Most metadata records contain values that uniquely identify the object they refer to and some specify DOIs. A single object may be useful in a range of localised contexts and so separate metadata records may be applied to it. As these records proliferate and become dispersed the ability to differentiate between whether they refer to manifestations of the same object or different objects becomes more important. Name spaces such as ‘Title’ are not sufficiently accurate and nor is the location if the object is found in more than one place or may have been moved.

As well as the objects, the metadata records themselves are separate things of value that may require tracking or resolution to the original version or location. Identifiers can also be associated with the metadata records themselves, and schemes such as IEEE LOM have an element that identifies the metadata record.

Therefore, the minimal requirements for digital identifiers for publishers and e-learning providers would be to have an identifier with the following properties:

● globally persistent: the same identifier is used by all, everywhere and permanently
● unique identification: every entity which needs to be identified within an identified namespace can be and its name is not used by any other resource
● functional granularity: it should be possible to identify an entity when there is a reason to distinguish it
● actionable: the identifier does not just name a resource but links to it and may provide supplementary services
● interoperable: the identifier can be used across different systems to allow one system to access a resource within another
● designated authority: the author of metadata must be securely identified
● appropriate access: everyone who requires access to the metadata on which they depend should obtain it yet the privacy and confidentiality of metadata should be respected by those who do not require access to it
● metadata: provides information about the resource to enable it to be discovered by search and used in other services
● extensible: the schema permits future extensions
● multiple resolution: an identifier points to different realizations of the same resource
● backward compatibility: allows support of existing legacy naming conventions
● independent: the name is given by the naming authority alone which sets its own requirements for naming resources


Figure 1: DOIs and Associated Services

There should also be well-defined processes for the management of Digital Object Identifiers so that they retain these properties. It would make no more sense for a Digital Object Identifier to point to two unrelated resources than it would for an ISBN number to refer to two completely unrelated publications. The nature of the processes required for the use of Digital Object Identifiers is detailed in the next section.


4 Digital Object Identifier Lifecycle

There are five main phases: definition, assignment, publishing, resolution and maintenance.

4.1 Definition

The first phase, definition, concerns naming a particular resource by allocating it a Digital Object Identifier. What will typically happen is that an individual in an organisation will wish to make a resource, such as a document, publicly discoverable by means of a persistent identifier. The employee approaches the unit within their organisation which issues Digital Object Identifiers and is given a Digital Object Identifier for that resource. The nature of this identifier is determined by the individual organisation: it may be an existing standard number such as an ISSN or an internally defined scheme.

4.2 Assignment

The organisation’s Digital Object Identifier issuing unit will require the actual physical location of a resource (e.g., a library location identifier for a physical resource or a URL for an electronic resource) and will also request metadata information about the resource to be made public such as its title, the author, date of publication and its nature. This information is then passed to a recognised Registration Authority (RA) who enter the information into their systems.

4.3 Publication

Once the Registration Authority has entered the details of the resource into their system and the publishing organisation has placed the resource at the registered location it then becomes possible to enter the full Digital Object Identifier into, for example, a web browser and gain access to the resource by a process known as resolution.

4.4 Resolution

The resolution process enables a Digital Object Identifier entered into a browser to be turned from an opaque “name” into an actual resource of use to an end user. Unbeknown to the user, the Digital Object Identifier is broken up into its two separate parts: the prefix, which indicates the registration organisation, and the suffix, which is the organisation-determined identifier. The system first looks at the prefix to determine the registration authority which holds the full entry for the Digital Object Identifier and then passes the suffix to the servers at that registration authority. Those servers return both the actual, current location of the resource and its metadata. The browser then goes to that location and retrieves the resource, rather like a web page.
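The two-stage lookup described above can be sketched in a few lines of Python. This is a minimal illustration only: the directory contents, the `resolve` function, the `10.9999` prefix and the example URL are all hypothetical and do not reflect any real registration authority's interface.

```python
# Hypothetical in-memory stand-in for the resolution infrastructure.
# Stage 1 maps a prefix to a registration authority; stage 2 maps a
# suffix, held by that authority, to a location and its metadata.
PREFIX_DIRECTORY = {"10.9999": "example-ra"}

RA_RECORDS = {
    "example-ra": {
        "report-2003-01": {
            "location": "http://www.example.org/reports/2003-01.pdf",
            "metadata": {"title": "Example report", "date": "2003"},
        }
    }
}


def resolve(doi):
    """Return (location, metadata) for an identifier of the form prefix/suffix."""
    prefix, separator, suffix = doi.partition("/")
    if not separator or not prefix or not suffix:
        raise ValueError("expected prefix/suffix, got %r" % doi)
    authority = PREFIX_DIRECTORY[prefix]      # stage 1: find the authority
    record = RA_RECORDS[authority][suffix]    # stage 2: find the record
    return record["location"], record["metadata"]
```

Calling `resolve("10.9999/report-2003-01")` returns the stored URL together with the metadata dictionary; an unknown prefix or suffix raises `KeyError` in this sketch.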

4.5 Maintenance

Over time it is possible that a resource pointed to by a particular Digital Object Identifier will need to be updated. For the resource publisher this is simply a matter of refreshing the content at the location which has been stored at the Registration Authority that the resource publisher uses. More subtle maintenance issues emerge when one publisher’s resources become the property of another publisher (perhaps due to a merger). In this instance some of the resources may be relocated to new servers, and the new owner merely has to inform the Registration Authority of the new location of the resource so that the Digital Object Identifier record can be updated; users will still be able to access the resource with the same Digital Object Identifier. Over time some Digital Object Identifiers will point to obsolete resources, and in this case the owner can elect to point the Digital Object Identifier to a helpful message rather than leaving the browsing tool to produce an obscure error code.
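The maintenance operations above (relocation after a change of ownership, and pointing an obsolete identifier at an explanatory message) can be illustrated with a small sketch. The `Registry` class, its method names and the example URLs are assumptions made for illustration; they are not the interface of any actual registration authority.

```python
class Registry:
    """Hypothetical registry mapping an identifier to its current location."""

    def __init__(self):
        self._locations = {}

    def register(self, doi, url):
        if doi in self._locations:
            raise ValueError("identifier already registered: " + doi)
        self._locations[doi] = url

    def relocate(self, doi, new_url):
        # The identifier itself never changes; only the stored location does.
        if doi not in self._locations:
            raise KeyError(doi)
        self._locations[doi] = new_url

    def withdraw(self, doi, notice_url):
        # Point an obsolete identifier at an explanatory page, not a dead link.
        self.relocate(doi, notice_url)

    def resolve(self, doi):
        return self._locations[doi]


registry = Registry()
registry.register("10.9999/manual-1", "http://old-publisher.example/manual-1")
registry.relocate("10.9999/manual-1", "http://new-publisher.example/manual-1")
```

Anyone holding the identifier `10.9999/manual-1` continues to use it unchanged; only the registry entry has been edited.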


Figure 2: Digital Object Identifier Lifecycle

Above we have outlined the basic lifecycle of a Digital Object Identifier from its creation to its eventual demise. Having seen this basic lifecycle we can now turn to some typical uses for Digital Object Identifiers.


5 Uses of Digital Object Identifiers

Our discussion of the typical lifecycle of a Digital Object Identifier has already pointed to some of its obvious uses such as publication but there are other uses for Digital Object Identifiers too such as resource discovery, syndication and assembly and we shall look at these in more detail below.

Figure 3: Digital Object Identifier services

5.1 Publication

The key reason for the existence of Persistent identifiers like Digital Object Identifiers is to enable the publication of resources. Making resources available is in itself easy but in order for these to be available publicly to a wide range of users, with different needs and goals through different systems there simply must be a way of unambiguously identifying each resource. A settled identifier which is reliable, permanent, unique, can be used in any system now and yet is sufficiently forward facing to enable future extension is a key element of realizing this vision. Once such a standard naming convention is adopted it then becomes possible not just to make resources simply available but to offer value added services too.

One particular value-added service would be to create chains of linked digital objects by means of metadata information which expresses the relationship between each object. An example of this would be to create a web of each and every law and the cases relating to that law, so that legal professionals would be able to browse either by case or by law and read up on the related material. Such cross-referenced information would be based on metadata structures, could have its own Persistent Identifier and could form the basis of a valuable revenue-earning service.


5.2 Discovery

An additional value-added service that could be connected with Digital Object Identifiers is that of resource discovery. A key reason for assigning a unique name to a resource is to enable it to be found. In the case of Digital Object Identifiers this is aided by the fact that each Digital Object Identifier has some associated metadata. Hence, it becomes possible to devise tools which look through such metadata and retrieve objects which relate to a user’s aims, whether that is locating a particular book, course, periodical or other resource.

One particularly novel discovery mode would be to perform “reverse lookup” within cross-referenced sets of resources such as legal texts. It would be possible to build tools to use such interlinked resources to produce a filtered list of documents related to a key document and indicate the relationships between each of the documents. For example legal cases could be filtered on the basis of the legal jurisdiction in which they were based.
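A “reverse lookup” of this kind amounts to inverting the forward links held in the metadata and then filtering the result. The sketch below assumes a hypothetical set of cases, laws and jurisdictions; none of the names or data come from a real service.

```python
from collections import defaultdict

# Hypothetical forward links: each case lists the laws it cites.
CITES = {
    "case-A": ["law-1", "law-2"],
    "case-B": ["law-2"],
    "case-C": ["law-1"],
}
JURISDICTION = {"case-A": "England", "case-B": "Scotland", "case-C": "England"}


def build_reverse_index(cites):
    """Invert 'case cites law' into 'law is cited by cases'."""
    cited_by = defaultdict(list)
    for source, targets in cites.items():
        for target in targets:
            cited_by[target].append(source)
    return cited_by


def cases_citing(law, jurisdiction=None):
    """List the cases citing a law, optionally filtered by jurisdiction."""
    cases = build_reverse_index(CITES)[law]
    if jurisdiction is not None:
        cases = [c for c in cases if JURISDICTION[c] == jurisdiction]
    return sorted(cases)
```

In a real service the keys would be persistent identifiers rather than the short labels used here, so that the index survives changes of location.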

Figure 4: Service Layers for Cross-referenced Objects

5.3 Syndication and Assembly

The development of Digital Object Identifiers is likely to lead to the development of new markets. One possible area of business is in the cataloguing of available digital resources. For example, it would be useful for teachers of language to know all the available audio tracks, videos, texts and exercises which practise a particular area of grammar. Teachers could then use these raw components to create their own new courses in which the digital objects were the basic building blocks. In order to permit such re-use of digital objects, it would be necessary to catalogue them and refer to them in that catalogue by means of their persistent Digital Object Identifier.

Since combining such digital resources would be time consuming and would require skill and the ability to find just the right resource, it is likely that specialist aggregators would emerge to produce course materials based on raw components.

Having such raw components available in standardized packages and being able to recombine them would make it possible to offer a greater range of CAL materials at a lower base cost and which would more closely meet the needs of pupils, students and teachers.


5.4 Digital Object Identifiers and Handle Processes

Digital Object Identifiers are made up of two components: the prefix, which indicates the naming authority, and the suffix, which indicates the resource.

Figure 5: DOI Prefix and Suffix syntax

The resource is described by means of metadata (information about the Digital Object Identifier), which is defined by a minimum set of data elements known as the kernel (see Table 3).


Table 3: DOI Kernel metadata elements

Note that minimal kernel metadata elements, extensions for a specific DOI Genre and administrative data (e.g., registrant, registration date, record version number) are compulsory.


The exact resource is found by Handle technology, which enables the Digital Object Identifier to be resolved to the particular location at which the resource is stored. The Handle system is designed to accommodate very large numbers of resources and to allow distributed administration over the Internet. It supports secured Handle resolution, and security services such as data confidentiality, service integrity and non-repudiation on request. The resolution of a Handle returns a Handle data structure that is a collection of typed, indexed data (see Figure 6):

Figure 6: Structure of a Handle (example handles shown: 10.1786/HI4N6H2Y50SP and 10.1786/9JUI43DS2L0A)
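The idea of a handle record as a collection of typed, indexed values can be sketched as follows. The `HandleValue` type, the field names and the example values are illustrative assumptions; they show the shape of the data only and are not taken from a real handle record or from the Handle protocol itself.

```python
from typing import List, NamedTuple


class HandleValue(NamedTuple):
    index: int   # position of the value within the handle record
    type: str    # kind of data held, e.g. "URL" or "EMAIL"
    data: str


# Illustrative record for a single handle.
RECORD: List[HandleValue] = [
    HandleValue(1, "URL", "http://www.example.org/paper.pdf"),
    HandleValue(2, "URL", "http://mirror.example.org/paper.pdf"),
    HandleValue(3, "EMAIL", "rights@example.org"),
]


def values_of_type(record, wanted_type):
    """Return the data of every value with the given type, in index order."""
    return [v.data for v in sorted(record) if v.type == wanted_type]
```

Because each value carries its own type, one identifier can hold several locations alongside other kinds of information, which is what makes multiple resolution possible.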

The actual process of resolution involves routing resolution requests from a high-level service to a particular site (which may itself be distributed), which then takes responsibility for locating the particular record requested (see Figure 7):

Figure 7: Resolution across a distributed Handle system (example handles shown: 10.1786/HI4N6H2Y50SP and 10.1786/9JUI43DS2L0A)

Further details of the operation of Digital Object Identifiers can be found in the Digital Object Identifier handbook available from http://www.doi.org/handbook_2000/index.html.


5.5 Costs

Digital Object Identifiers are issued on a cost-recovery business model in which the cost is borne by the holder of the Digital Object Identifier prefix. This means that the end-user of Digital Object Identifiers is able to use Digital Object Identifiers at no cost, unless the resource owner wishes to charge for the resource. The owner of the Digital Object Identifier may need to pay registration costs and may incur additional costs from the registration agency if the resource is frequently accessed. A model for zero, or very low, cost to publishers of DOIs with a single location resolution is under consideration. There would continue to be a small cost for additional metadata and resolution services.

Such charges are needed to recover the investment in IT that registration authorities will require in order to maintain an orderly service. The registration authority will need to maintain sufficient bandwidth of access to its servers to ensure reliable resolution of Digital Object Identifiers to physical locations and have sufficient server resource to store Digital Object Identifiers and maintain associated services.

Given the economies of scale, it is unlikely that a JISC registration service could operate at a competitive cost level. However, JISC could consider becoming a registration authority or operating its own similar services if the cost benefits can be established.


6 Use Scenarios

6.1 Publishing With a Digital Identifier

The simplest example of the use of a digital identifier is the publication of a resource with a single resolution. Upon publication, a DOI or other identifier is registered with a core set of DOI metadata and a URL that links to the location of the resource itself. The publisher undertakes to the registration authority to maintain the DOI metadata.

If the location remains the same there may be no further maintenance of the DOI needed. If the location changes or the resource becomes unavailable the publisher submits a new URL that links to the resource, or an informative message if the resource is withdrawn. Similarly any core metadata changes that are required need only be made once.

Any references that use the DOI, rather than the original location, will automatically make use of this new location, so reducing the problem of broken links.

6.2 Embedding DOIs

A DOI could be attached to the resource itself, such as using HTML metatags, or embedded in other file formats such as Word or PDF. The DOI can then provide access to links to one or more metadata records, possibly using an extension to the delivery software such as Acrobat.

An advantage of embedding the DOI in the object itself is that the metadata, associated with the resource by the publisher, can be updated without the degradation and the ‘Chinese whisper’ effect that can happen if various versions of metadata records associated with a resource are created and distributed. The maintenance of the metadata becomes simpler and immediately reflects any changes to the information associated with a resource just using a single DOI embedded in it.
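As an illustration of embedding, the sketch below reads a DOI back out of an HTML page. It assumes, purely for the example, that the DOI is carried in a `<meta name="DC.identifier" content="doi:...">` tag; the report does not prescribe a tag name, and the `DoiMetaParser` and `embedded_dois` names are hypothetical.

```python
from html.parser import HTMLParser


class DoiMetaParser(HTMLParser):
    """Collect DOIs from <meta name="DC.identifier" content="doi:..."> tags."""

    def __init__(self):
        super().__init__()
        self.dois = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        content = attributes.get("content") or ""
        # Assumed convention: the content value is "doi:" followed by the DOI.
        if name == "dc.identifier" and content.lower().startswith("doi:"):
            self.dois.append(content[4:])


def embedded_dois(html_text):
    """Return every DOI embedded in the page's meta tags, in document order."""
    parser = DoiMetaParser()
    parser.feed(html_text)
    return parser.dois
```

A tool that finds the embedded DOI in this way can then resolve it to obtain the publisher's current metadata instead of relying on a copied record.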

6.3 Identifiers for Metadata Records

A metadata record is an entity in itself. It can be useful to reference such a record and link to its location, rather than replicate and distribute the metadata record. By assigning a DOI to a metadata record the relationships between the resource and the metadata can be managed more easily.

All the advantages of applying a persistent digital identifier to a resource can also be gained in managing and uniquely identifying metadata instances.

6.4 Digital Rights

A learner wishes to learn about Visual Basic and is interested in the “Visual Basic 1” course offered through his local college. He goes to a national portal and browses to the link for the course he requires. On clicking this link he is offered several choices each with different cost and rights associated with them. He could download a pack of materials for learning at home or gain access to the materials online via the college’s virtual learning environment. He chooses the virtual learning environment version and is passed to a registration system which takes his payment, creates a login for him and gives him access to the course.


The rights to freely use these course materials within the college environment are later purchased by the college, and the materials are located on the local server, with links to associated support materials. All the course component links are associated with Digital Object Identifiers that resolve to digital rights information, so it is only necessary to update the Digital Object Identifier registry, and the relocation and change of rights are transparent to end users.

Without Persistent Identifiers all the links within all the course materials would have had to be changed by hand. The digital rights information, whether enabling free use of the materials within the college or pointing to external payment services, has only to be changed in one location by the publisher.

6.5 Multiple Resolution

A learner wishes to find a video clip about a particular topic, so she goes to her preferred resource discovery service and browses the catalogue until she finds a title that appears to meet her needs. She clicks on the link to find out more, and the Persistent Identifier link offers her a choice of resolutions: some free reviews from other viewers of the video, some links to similar videos and various links to merchandise related to that video.

After reading some of the reviews the learner decides against that particular video but clicks on some of the links to related videos and chooses one of them. This new Persistent Identifier links to a range of formats and as she is browsing on a third generation mobile phone she selects the format for lower bandwidth.

This scenario is an example of the multiple resolution capability of Persistent Identifiers for digital objects. The first video is linked, via the metadata descriptions associated with it, to related resources, and the second video is offered in various formats applicable to different platforms. Not only does the Persistent Identifier automate the relationship between these related resources but it can (via linked services) control access to them if required. With identifiers which reference just a single location, such as PURL, this is not achievable. Moreover, as more digital objects are given identifiers it will become ever more important for them to be discovered accurately (by means of associated metadata) and to be able to link to related resources. Indeed it is quite likely that much value will be gained by being able to access such cross-referenced resources. For example, it would be useful for scientific, technical and medical students to be able to automatically list and read all related papers in a particular area.
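The choice between several resolutions of one identifier can be sketched as a simple selection over the entries registered against it. The table of resolutions, the `service` and `format` labels, the `choose` function and the URLs are all hypothetical and serve only to illustrate the scenario.

```python
# Hypothetical resolutions registered against a single identifier.
RESOLUTIONS = {
    "10.9999/video-42": [
        {"service": "content", "format": "high-bandwidth",
         "url": "http://media.example.org/42/high"},
        {"service": "content", "format": "low-bandwidth",
         "url": "http://media.example.org/42/low"},
        {"service": "reviews", "format": None,
         "url": "http://reviews.example.org/42"},
    ]
}


def choose(doi, service, preferred_format=None):
    """Return the URL for a service, preferring a given format if offered."""
    candidates = [r for r in RESOLUTIONS[doi] if r["service"] == service]
    if not candidates:
        raise LookupError(f"no {service!r} resolution for {doi}")
    for candidate in candidates:
        if candidate["format"] == preferred_format:
            return candidate["url"]
    return candidates[0]["url"]   # fall back to the first one offered
```

The learner on a mobile phone would correspond to `choose("10.9999/video-42", "content", "low-bandwidth")`, while a request for reviews uses the same identifier with a different service.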

6.6 Multiple Copies

This scenario concerns the potential problem which could arise if users copy documents. Were an original creator of a document (for example a software manual) to publish it as the Digital Object Identifier 20.11/AH along with its associated metadata and the actual document itself (Doc. 1), then it would be possible for such a public document to be copied and assigned separate, new Digital Object Identifiers and metadata (30.87/GG, 71.56/TT and 26.94/WE). Whilst each of these individual copies may well be correctly linked to the software manual, they would probably each be given slightly different metadata descriptions to go with the individual Digital Object Identifier which had been registered against them. Over time it will be difficult for indexing mechanisms to determine which of the Digital Object Identifiers leads to the original document, the one which in the long term is likely to have the highest level of integrity. What this case illustrates is that there is a need to manage both the Digital Object Identifier and its associated metadata. It would certainly be good practice to ensure that new versions of documents always refer to the original Digital Object Identifier (so that it can always be recovered) but it may be necessary to enforce integrity of data by preventing core (kernel) metadata from being copied.
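If each copy's metadata did record the identifier it was derived from, an indexing tool could recover the original by following those references. The sketch below uses the identifiers from this scenario, but the `DERIVED_FROM` field is an assumption (it is not a kernel metadata element described in this report), as is the copy-of-a-copy relationship shown for 71.56/TT.

```python
# Hypothetical metadata: each copy records the identifier it was derived from.
DERIVED_FROM = {
    "20.11/AH": None,          # the original document
    "30.87/GG": "20.11/AH",
    "71.56/TT": "30.87/GG",    # assumed here to be a copy of a copy
    "26.94/WE": "20.11/AH",
}


def original_of(doi, derived_from=DERIVED_FROM):
    """Follow 'derived from' links back to the original identifier."""
    seen = set()
    while derived_from[doi] is not None:
        if doi in seen:
            raise ValueError("cycle in derivation links at " + doi)
        seen.add(doi)
        doi = derived_from[doi]
    return doi
```

The cycle check guards against inconsistent metadata, which is exactly the kind of degradation this scenario warns about.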


Figure 8: Multiple Instances of Metadata

6.7 Cross Sector Example

A publisher has produced a learning object for the schools community and submitted metadata to Curriculum Online. This learning object is discovered as being useful for the FE community by a lecturer.

A DOI was associated with this learning object by the publisher and this can be resolved to a further DOI that is associated with the metadata record that has been submitted to the Curriculum Online portal.

Examination of the DOI resolutions for the learning object indicates the existence of the Curriculum Online metadata record and also resolves to digital rights metadata. It is clear that the learning object can be purchased, who the publisher is and how to find out more information about it. The college content management system has a record of the resource DOI that shows that the college has rights to use the learning object throughout its own intranet, but not to copy or use it more widely.

By accessing the Curriculum Online metadata record, using the resource or the metadata record DOI in interactions with the Curriculum Online Portal, the lecturer is able to create the basis of a new metadata record, without infringing the publisher’s rights, adding information that describes a use for the learning object within an FE context.

Later, the lecturer’s metadata record, including the original DOI that identifies the object, is published to a wider community and is identified using a new DOI.

Links have now been established between the original object, the DOI metadata from the original publisher, separate metadata records describing the learning object’s use in different sector contexts and digital rights information controlled by the publisher.

The publisher may then change the digital rights or the primary location, but the DOIs provide a resolution to the original learning object and the updated information provided by the publisher.


6.8 Researcher

A researcher at a university is interested in the area of neural networks. These are artificial intelligence techniques that have drawn inspiration from biological research. Our researcher wishes to access a paper that was mentioned to them at a conference. The researcher has been given the following Digital Object Identifier as a World Wide Web URL: http://www.publisher-doi.com/20.1986/TTY65T7E8P72H, so they enter it into their browser.

The browser notices the standard hypertext reference and looks to resolve the first part of the URL, www.publisher-doi.com, which is simply the web name for the registration agency used to register this Digital Object Identifier, 20.1986/TTY65T7E8P72H. Standard domain name lookup services translate this into the physical location of the registration agency’s servers. These are then passed the remainder of the URL, i.e. the Digital Object Identifier 20.1986/TTY65T7E8P72H. As is the convention, the first part of this (20.1986) indicates the issuer of the Digital Object Identifier and the second part (TTY65T7E8P72H) the actual resource. The issuer is looked up at the registration agency’s server, which then looks up the location of the actual resource before returning this as a standard URL (http://www.carpathia.edu/papers/mccarthy/nn65) to the web browser for it to display as a standard page. What is retrieved is the following page from Carpathia University:

Figure 9: Carpathia University web page for Professor McCarthy’s papers

This web page is actually derived from the metadata stored along with the Digital Object Identifier and offers the researcher a choice of two formats in which to download the paper which they wish to read (PDF or LaTeX) and a link to additional related papers which the researcher might well find relevant. This example shows the inherent usefulness of being able to refer to more than one digital object via a single Digital Object Identifier. There is no need to search to try to guess the correct URL at which a paper in a particular format might be stored – all available resources can be shown directly and also there is the added value of being linked directly to further papers which are likely to be of direct relevance.
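The way the URL in this scenario breaks down into a resolver host, an issuer prefix and a resource suffix can be shown with Python's standard `urllib.parse` module. The `split_doi_url` helper is a hypothetical name introduced for this sketch; the URL is the one given in the scenario.

```python
from urllib.parse import urlsplit


def split_doi_url(url):
    """Return (resolver host, issuer prefix, resource suffix) from a DOI URL."""
    parts = urlsplit(url)
    doi = parts.path.lstrip("/")
    prefix, separator, suffix = doi.partition("/")
    if not parts.hostname or not separator or not suffix:
        raise ValueError(f"not a DOI URL: {url!r}")
    return parts.hostname, prefix, suffix


host, issuer, resource = split_doi_url(
    "http://www.publisher-doi.com/20.1986/TTY65T7E8P72H"
)
# host is "www.publisher-doi.com", issuer is "20.1986",
# resource is "TTY65T7E8P72H"
```

Only the host part is handled by ordinary domain name lookup; the issuer and resource parts are interpreted by the registration agency's own servers.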


7 Further Work

Whilst some issues are clear it has become evident during the course of this work that further investigation is required in a number of areas. These include, but are not restricted to:
● the nature of the metadata to be used alongside the adopted Digital Object Identifiers
● the processes for management of persistence over time
● details of APIs
● enhanced services (cross referencing, rights management)

8 Recommendations

The following are recommendations for the JISC community.
● To avoid confusion JISC should have and disseminate a flexible policy for digital identifiers. The digital identifiers employed should incorporate metadata and be able to uniquely identify objects. Any digital identifier should be implemented with a view to future integration with other developing systems.
● To avoid becoming a ghetto for identifiers, JISC should build services and capacity into JISC repositories to implement and interoperate with a range of leading digital identifiers, even if they are not all used for JISC publications.
● Encourage user collaboration for the provision of informal naming services. These could build on free handle technology and be based on a manually entered record for small scale operations (large scale operations would require the human and IT resources of a registration authority).
● URLs can continue to provide a solution for short time scale, single location, user authored resources, with low management and minimal cost. The disadvantages are lack of persistence and uniqueness; these can be countered to some extent by clear guidelines for best practice.
● For resources or metadata records that are to be published internally on a large scale by JISC services, a candidate for a local solution is to implement a Handle server with the ability to provide multiple resolution if required. The management overhead and server maintenance costs have to be considered.
● For a more global solution for resources or metadata records that are to be published by JISC services, on any scale, the use of the Digital Object Identifiers implementation of Handle is the best solution. There is a cost but this is minimized by the efficiency of a large-scale registration authority. In the longer term JISC could consider becoming a registration authority.
● The government should provide a national framework for name space, registration authority and resolution service governance.
● JISC should engage with leading service agencies to provide identifier services and resolution gateways.


9 References

● PURL, http://purl.oclc.org/docs/new_purl_summary.html

● Handles, http://www.ietf.org/internet-drafts/draft-sun-handle-system-10.txt

● POI, http://www.ukoln.ac.uk/distributed-systems/poi/

● OAI, http://www.openarchives.org/OAI/2.0/guidelines-oai-identifier.htm

● DOI, http://www.doi.org/handbook_2000/index.html

● Common Names, http://www.commonname.com

● DOI for CMS, http://www.cmswatch.com/Features/TopicWatch/FeaturedTopic/?feature_id=66

● XRI, http://xml.coverpages.org/ni2003-01-08-a.html

● A Uniform Resource Identifier Scheme for SNMP, Lopes RP, Oliviera JL, IEEE Workshop on IP Operations & Management 2002

● Uniform Resource Identifiers and the Effort to Bring "Bibliographic Control" to the web: an Overview of Current Progress, Schwartz R, Bulletin of the American Society for Information Science, Oct/Nov 1997, 24(1)

● A Metadata Kernel for Electronic Permanence, Kunze JA, Journal of Digital Information 2001 2(2)

● The ARK Persistent Identifier Scheme, Kunze J, IETF, 3/3/03

● Interoperability: Digital Rights Management and the Emerging Ebook Environment, Mooney S, D-LIB Magazine, 2001, 7(1)

● DOI: Current Status & Outlook, Paskin N, D-Lib Magazine 1999 5(5)

● Digital Object Identifiers, Paskin N, Information Services & Use, 22, 2002, 97-112, IOS Press

● Digital Object Identifier, Jacso P, Information Today

● Integration of Simultaneous Searching & Reference Linking across Bibliographic Resources on the Web, Misch WH, Habing TG, Cole TW, JCDL 2002, ACM

● A common model to support Interoperable metadata, D-Lib Magazine, 5(1) 1999

● The Indecs Metadata framework, Rust G, Bide M, 2000

● INDECS Summary Report, 2000

● Information Identifiers, Paskin N, Learned Publishing 10(2), pp 135-56, 1997

● Open Linking in the Scholarly Environment Using the OpenURL Framework, D-Lib Magazine 7(3), 2001, Van de Sompel H, Beit-Arie O

● Identifying Metadata Elements with URIs, Baker T, Dekkers M, D-Lib Magazine 9(7-8), 2003


10 Glossary

Term Definition

API Application programming interface – a defined interface that enables developers to write programs that work with other systems

ARK IETF defined metadata language for describing Persistent Identifiers

BIBLID Bibliographic Identity defined by ISO9115, 1987

CAE Composeur, Auteur, Editeur or Composer, Author and Editor/Publisher Persistent Identifier for music texts

CORES EU Project investigating metadata for URIs

DOI Digital Object Identifier

eGIF Electronic government interoperability framework. Standards to ensure interoperable government systems

Handle Directory system which returns the real location of a DOI

IEEE Institute of Electrical and Electronics Engineers

IETF Internet Engineering Task Force – standards organisation for the internet

INDECS EU project which defined metadata for Persistent Identifiers

ISBN International Standard Book Number – Persistent Identifier for books

ISSN International Standard Serial Number – Persistent Identifier for periodical publications

JISC Joint Information Systems Committee

MARC Machine Readable Cataloguing – Bibliographic metadata coding scheme

Persistent Identifier A generic term for permanent identifiers such as URL, ISBN etc

PII Publisher Item Identifier – Persistent Identifier for books

PURL Persistent URL (see URL)

SICI Serial Item and Contribution Identifier – Persistent Identifier to enable articles to be located in larger works

UNIMARC Universal MARC bibliographic coding system (see MARC)

UPC/EAN Universal Product Code/European Article Number

URI Uniform Resource Identifier

URL Uniform Resource Locator

URN Uniform Resource Name

XRI eXtensible Resource Identifier

11 Case studies

11.1 Publisher Case Study: Granada

Granada used their allocated DOIs to reference a set of metadata records for Granada Learning products. A combination of Curriculum Online metadata and generic IMS metadata records were hosted on a test server and DOIs were created to reference each XML file. These records were accessed via a number of tools including standard web browsers, command-line tools and server-side Java web applications.

A selection of DOI URLs was embedded in an HTML page to test access via a web browser. However, the referenced resources were intended to be accessed programmatically, so usage was focused there. Granada embedded a DOI URL within a sample SCORM courseware package as an externally referenced metadata record. This was then imported into the Learnwise VLE system and the metadata verified.

The procedure for embedding the DOI URLs in HTML pages was identical to embedding any other form of URL. Granada found it extremely simple to use DOI URLs with a variety of tools.

The DOI URLs were embedded in the body of a test HTML page and as external metadata links in a SCORM courseware package manifest. The DOIs resolved successfully.
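As an illustration of how little is involved, the following minimal sketch builds an HTML link from a DOI URL in exactly the way it would for any other URL. The function name `doi_anchor` and the link label are hypothetical, and the DOI shown is the example value used in the SOSIG case study rather than one of Granada's own.

```python
from html import escape


def doi_anchor(doi_url: str, label: str) -> str:
    """Return an HTML anchor for a DOI URL, built like any other hyperlink."""
    # Escape both values so they are safe inside HTML attributes and text.
    return f'<a href="{escape(doi_url, quote=True)}">{escape(label)}</a>'


# Hypothetical example: the DOI is borrowed from the SOSIG case study.
link = doi_anchor("http://dx.doi.org/10.1786/107939PHE4RY", "Metadata record")
print(link)
```

Nothing in the markup is DOI-specific: the browser follows the link to the resolver, which redirects to the registered location.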

No significant performance degradation when using the resolution service was observed, and response times appeared to be as expected when accessing Internet hosted resources.

Granada Learning sees potential use for DOIs wherever long-lived resources exist, requiring location-independent access throughout their lifespan. They believe that the technology is particularly appropriate for providing access to external resources in vendor neutral environments.

Overall, Granada found working with DOIs straightforward and will definitely consider making use of the technology in future projects. The system provides a robust framework for persistent resource location. When coupled with an API for querying and maintaining DOI embedded metadata, the technology forms a compelling solution to a range of problems.

11.2 HE Case Study: SOSIG

SOSIG, the Social Science Information Gateway, did not make any particular use of DOIs itself but exposed DOIs in its records so that third parties could make use of them. For example, they were included in the OAI records exported to the RDN ResourceFinder. Enough use was made of the DOIs to show that they worked and could be used in other ways.

SOSIG records are harvested into the central RDN ResourceFinder database.

This database contains copies of all RDN records and can be searched using the Web interface at: http://www.rdn.ac.uk/

If someone searched for "British Journal Social Work", looked at the first result and then clicked on the 'More information' link, a second window appeared containing a DOI button. Clicking on the DOI button linked to the SOSIG metadata about the resource. It could have linked to the resource itself, if an alternative URL had been supplied. This demonstrated the DOI being passed from SOSIG to the RDN central service as part of a Dublin Core metadata record and then being used to add a link to the RDN search results.

Clearly, the DOI button could also have been added directly to the search results page, but it was not considered useful to interfere with the live RDN service. The DOIs are also being exposed through the RDN's central Z39.50 target, described at http://www.rdn.ac.uk/publications/workingwithrdn/

The DOIs were embedded simply by adding an extra line into the URI field. The RDN uses the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) to exchange metadata records. Records are encoded using a Dublin Core XML format. The DOIs are encoded in an additional dc:identifier element.

Two options for encoding the DOIs were considered.

1. doi:10.1786/107939PHE4RY
2. http://dx.doi.org/10.1786/107939PHE4RY
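The two forms carry the same DOI and differ only in their prefix, so converting between them is mechanical. The sketch below is illustrative only: the function names are hypothetical, and it assumes the resolver address is the http://dx.doi.org/ prefix shown above.

```python
RESOLVER = "http://dx.doi.org/"  # resolver prefix as used in option 2
SCHEME = "doi:"                  # scheme prefix as used in option 1


def bare_doi(identifier: str) -> str:
    """Strip either prefix, leaving the DOI itself (e.g. 10.1786/107939PHE4RY)."""
    if identifier.startswith(RESOLVER):
        return identifier[len(RESOLVER):]
    if identifier.lower().startswith(SCHEME):
        return identifier[len(SCHEME):]
    return identifier


def to_resolver_url(identifier: str) -> str:
    """Return the clickable form (option 2) for either input form."""
    return RESOLVER + bare_doi(identifier)


def to_doi_scheme(identifier: str) -> str:
    """Return the doi: form (option 1) for either input form."""
    return SCHEME + bare_doi(identifier)


print(to_resolver_url("doi:10.1786/107939PHE4RY"))
print(to_doi_scheme("http://dx.doi.org/10.1786/107939PHE4RY"))
```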

The second form was selected in order that systems that displayed the DOI in a Web interface would generate a clickable link without having to modify the DOI in any way. A DOI-enabled SOSIG record as encoded for exchange using the OAI-PMH looks something like this:

oai:rdn:SOSIG:903023698-18548
2003-07-29
British Journal of Social Work (The)
Published by Oxford University Press on behalf of the British Association of Social Workers, BJSW is the pre-eminent academic journal of social work in the UK. Publishing papers of the high standard, it contains a mixture of current research, practice, and policy developments drawn from contributors from the UK and elsewhere. The Web site contains tables of contents and abstracts from volume 26 1996 onwards, as well as information about subscriptions, the editorial board, advertising and subscription rates, and instructions for prospective authors. The article titles and abstracts can be searched by keyword and there is a facility for emailing new tables of contents.
BASW …………..etc
student social workers
Journals (contents and abstracts)
en
en
http://bjsw.oupjournals.org/
http://dx.doi.org/10.1786/107939PHE4RY
http://www.sosig.ac.uk/
en
This metadata record is for use by RDN partners only. All other use prohibited without permission.

http://www.sosig.ac.uk/resource?query=903023698-18548&database=SOSIG

Figure 10: RDN metadata resolved using a DOI

The embedding was very simple, although the script that exported the records to OAI format had to be altered, as previously only the first URI in each record had been exported.
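The kind of change involved can be sketched as follows. This is not the SOSIG export script, which is not reproduced in this report. It is a minimal illustration, using Python's standard library, of writing every URI in a record to its own dc:identifier element inside an oai_dc container. The function name `build_record` is hypothetical, and only the title and identifiers are included.

```python
import xml.etree.ElementTree as ET

# Standard namespaces for unqualified Dublin Core carried over OAI-PMH.
OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def build_record(title, uris):
    """Build an oai_dc element with one dc:identifier per URI, not just the first."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    ET.SubElement(root, f"{{{DC}}}title").text = title
    for uri in uris:
        ET.SubElement(root, f"{{{DC}}}identifier").text = uri
    return root


record = build_record(
    "British Journal of Social Work (The)",
    ["http://bjsw.oupjournals.org/", "http://dx.doi.org/10.1786/107939PHE4RY"],
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```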

Similarly, the software used to generate the 'More information' page (see above) had to be modified to look at all the available dc:identifier elements and to treat any that contained a DOI as a special case (i.e. by generating the DOI button).
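The special-case handling described above might look something like the sketch below. It assumes that a DOI can be recognised by one of the two prefixes considered earlier (doi: or http://dx.doi.org/). The sample record is an abbreviated, hypothetical reconstruction, and the function names are illustrative rather than taken from the RDN software.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"
DOI_PREFIXES = ("doi:", "http://dx.doi.org/")

# Abbreviated, hypothetical record containing two dc:identifier elements.
SAMPLE = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                       xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>British Journal of Social Work (The)</dc:title>
  <dc:identifier>http://bjsw.oupjournals.org/</dc:identifier>
  <dc:identifier>http://dx.doi.org/10.1786/107939PHE4RY</dc:identifier>
</oai_dc:dc>"""


def is_doi(identifier: str) -> bool:
    """True if the identifier uses one of the two DOI encodings."""
    return identifier.startswith(DOI_PREFIXES)


def split_identifiers(xml_text: str):
    """Return (dois, other_uris) from all dc:identifier elements in a record."""
    root = ET.fromstring(xml_text)
    values = [(el.text or "").strip() for el in root.iter(DC + "identifier")]
    dois = [v for v in values if is_doi(v)]
    others = [v for v in values if not is_doi(v)]
    return dois, others


dois, others = split_identifiers(SAMPLE)
print("DOI button targets:", dois)
print("Ordinary links:", others)
```

A display layer would then render a DOI button for each entry in `dois` and treat the remaining identifiers as ordinary links.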

The DOIs resolved successfully, though there was, naturally, a slight delay between creating the URIs and getting the DOIs registered. There were no specific statistics collected about the performance of the resolution service, but based on anecdotal evidence the resolver appeared to be very fast.

At the moment, any service that wants to attach additional metadata to a resource that is described by the RDN (e.g. some rights metadata or an annotation of some kind) can only do so by either:

1. identifying the resource directly using its current URL
2. identifying the resource indirectly using the RDN metadata record identifier

The first option was considered poor because of the problem of persistence of URLs. The second option is workable, but is poor in modelling terms and may cause longer-term issues when collaborating with external third-party services. Agreeing to build shared services with other organisations based on DOIs for the resources feels like a 'good thing'. Some of the functionality required of such services could be based on PURLs or Handles. One exception to this would be in the case of wanting to work closely with a 'publisher' where use of DOIs already has quite widespread acceptance. Clearly, this is potentially quite a common case for the RDN.

The actual process of setting up DOIs was fairly unproblematic. The demonstration could be improved by resolving directly to the resource. It would also be useful to be able to keep the DOIs for a while, so that they can be demonstrated to other parts of the community.

11.3 Publisher Case Study 2: TSO (The Stationery Office)

UK Official Publications (UKOP) is the comprehensive catalogue of UK official publications and is recognised as "the official catalogue" by Government. UKOP includes Parliamentary publications, legislation (including Acts and Statutory Instruments), the publications of central government departments and devolved bodies, and the output of quangos, agencies and selected international bodies, such as the UN, the WHO and the European Commission.

The UKOP catalogue contains over 450,000 records. It combines the entire TSO publications catalogue together with COBOP (The Catalogue of British Official Publications that are not published by TSO). As all Government departments are mandated to provide copies of their non-TSO publications for cataloguing within COBOP, the comprehensive extent of the coverage is guaranteed. TSO wanted to improve the ease of identifying and locating information within UKOP.

TSO has applied DOIs to the entire UKOP database of 450,000 metadata records. The DOIs and associated metadata records were created through an automated process using the TSO DOI API, which is directly connected to the DOI/Handle System global infrastructure. Existing metadata from the UKOP database was mapped to the DOI metadata and the DOIs were appended to the UKOP identifier element of each record. The process took, on average, two seconds per DOI. Thus the existing UKOP metadata remained largely undisturbed, yet crucially is linked to a DOI record with all the attendant benefits of DOI.
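To put the two-second figure in context, the short calculation below estimates the total processing time for the whole database. It assumes, purely for illustration, that records were processed one at a time. The report does not state whether the work ran sequentially or in parallel, so the result is an upper-bound sketch rather than a measured duration.

```python
RECORDS = 450_000        # UKOP metadata records, as stated above
SECONDS_PER_DOI = 2      # average time per DOI, as stated above

# Assumption: strictly sequential processing, one DOI at a time.
total_seconds = RECORDS * SECONDS_PER_DOI
total_hours = total_seconds / 3600
total_days = total_hours / 24

print(f"{total_seconds} seconds = {total_hours:.0f} hours = {total_days:.1f} days")
```

On that assumption the run amounts to 250 hours, or a little over ten days of continuous processing.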

As the DOI/Handle system is optimised for resolution, accessing any of the 450,000 UKOP DOIs is efficient and fast. Nor has it resulted in any degradation of the existing service to UKOP users.

As UKOP is a subscription-based service, TSO is in the process of developing a permissions layer. The DOI system architecture allows such a layer to be used to control access to any of the exposed DOI/UKOP metadata records.

Assigning DOIs to the UKOP metadata not only improves the discoverability of the information within the database, but also creates citation opportunities with much wider resource discovery, resolving and linking to other associated information. This makes UKOP an essential online resource for public, academic and corporate organisations.

11.4 HE Case Study 2: Sunderland

Sunderland used DOIs to reference a range of typical objects with which Higher Education institutions deal. These included:

● Degree course entries in online prospectuses, e.g. New Route PhD, BSc and MSc courses
● Electronic course materials, course components, handouts
● Publicity material: prospectus
● Research outputs: theses, papers
● Student information: careers, library, staff, alumni
● Staff information: phone directory, funding agencies
● Physical objects: books.

The items selected covered a variety of different user groups, from staff publishing course materials and searching for documents on the intranet, through prospective and existing student information, to administration and corporate management.

The DOIs for the items for these user groups were embedded in a web page to test access and user reactions. Not only was it technically easy to embed the DOIs in the existing web-based systems and environment but it was also easy for end-users to access the objects pointed to by means of their standard web tools. The technology is therefore easy to employ and to use yet offers a number of advantages over existing technology for the HE sector:

● Stable identification across the intranet and internet
● Increased likelihood of published resources being easily discoverable
● Built-in control over access to given resources
● Enhanced services built on DOI-referenced resources.

12 Acknowledgements

The authors of the report, Chris Bowerman of the University of Sunderland, and Mike Collett of Schemeta Limited, would like to thank all of those who participated and contributed to the project. In particular they would like to thank the following:

Prof Benedict du Boulay, University of Sussex
Dr Steve Jeyes, Edexcel International
Charlie Jones, Granada Learning Ltd
Dr Rose Luckin, University of Sussex
Andy Powell, UKOLN
Alan Treece, Granada Learning Ltd
Jonathan Whiting, TSO
Robin Wilson, TSO

If you would like to find out more about any aspects of this report or TSO’s digital identifier and metadata services, please contact:

The Stationery Office 0870 600 5522

e-mail [email protected] www.tso.co.uk

[doi:10.1786/543675452980]

Copyright The Stationery Office Limited 2003