CEN WORKSHOP AGREEMENT

CWA 16385

January 2012

ICS 03.180; 35.240.99

English version

Interoperability of Registries

This CEN Workshop Agreement has been drafted and approved by a Workshop of representatives of interested parties, the constitution of which is indicated in the foreword of this Workshop Agreement.

The formal process followed by the Workshop in the development of this Workshop Agreement has been endorsed by the National Members of CEN but neither the National Members of CEN nor the CEN-CENELEC Management Centre can be held accountable for the technical content of this CEN Workshop Agreement or possible conflicts with standards or legislation.

This CEN Workshop Agreement can in no way be held as being an official standard developed by CEN and its Members.

This CEN Workshop Agreement is publicly available as a reference document from the CEN Members National Standard Bodies.

CEN members are the national standards bodies of Austria, Belgium, Bulgaria, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, Netherlands, Norway, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden, Switzerland, Turkey and United Kingdom.

EUROPEAN COMMITTEE FOR STANDARDIZATION
COMITÉ EUROPÉEN DE NORMALISATION
EUROPÄISCHES KOMITEE FÜR NORMUNG

Management Centre: Avenue Marnix 17, B-1000 Brussels

© 2012 CEN All rights of exploitation in any form and by any means reserved worldwide for CEN national Members.

Ref. No.:CWA 16385:2012 E

CWA 16385:2012 (E)

1. Contents

Foreword
Introduction
1 Scope
2. Normative References
3. Terms, Definitions, Symbols and Abbreviations
3.1. Terms and Definitions
3.2. Abbreviations
4. Related Work
4.1. General
4.2. Federated Repositories for Education (FRED)
4.3. The Learning Registry
4.4. The GLOBE/ARIADNE v1.0 UDDI Registry
4.5. The Spider
5. Data Model
6. Access to a Registry
6.1. General
6.2. Query
6.2.1. General
6.2.2. Simple Query Interface (SQI)
6.2.3. Search/Retrieval Using URL (SRU)
6.3. Synchronize
6.3.1. General
6.3.2. Synchronizing by Harvesting Using OAI-PMH
6.3.3. Subscribing to News
6.3.4. Data Replication
6.4. Publish
6.4.1. General
6.4.2. The Simple Publishing Interface (SPI)
7. Collection Registry Specification
7.1. General
7.2. Data Model
7.3. Access Protocols
7.4. Functionality
7.4.1. General
7.4.2. Add a New Collection Descriptor
7.4.3. Search Collection Descriptors of the Registry
7.4.4. Harvest Available Collection Descriptors of the Registry
7.4.5. Modify a Collection Descriptor
7.4.6. Report a Problem with the Registry
7.4.7. Subscribe to News
7.4.8. Remove a Collection Descriptor
7.4.9. Login
8. 1st Reference Implementation: The ARIADNE Registry
8.1. General
8.2. Functionality
8.3. Implementation
9. 2nd Reference Implementation: LORRy
9.1. General
9.2. Functionality
9.3. Implementation
10. Case Studies
10.1. General
10.2. Interoperability between the ARIADNE Registry and LORRy
10.2.1. Context
10.2.2. Storing Collection Descriptions
10.2.3. Demonstration
10.2.4. Analysis
10.3. Interoperability between the ARIADNE registry and the Learning Registry
10.3.1. Context
10.3.2. Storing Collection Descriptions
10.3.3. Collection Descriptions Protocols
10.3.4. Demonstration
10.3.5. Analysis
11. Conclusions
12. Bibliography


Foreword

This CEN Workshop Agreement (Learning Technologies (WS/LT)) has been drafted and approved by a Workshop of representatives of interested parties on 10 October 2011, the constitution of which was supported by CEN following the public call for participation made on 10 August 2011.

A list of the individuals and organizations which supported the technical consensus represented by the CEN Workshop Agreement is available to purchasers from the CEN-CENELEC Management Centre. These organizations were drawn from the following:

List of Contributors and Editors

Joris Klerkx, Katholieke Universiteit Leuven (Editor)

Daniel Rehak, ADL (Editor)

David Massart, European Schoolnet (EUN) (Editor)

Fredrik Paulsson (Editor)

José Luis Santos, Katholieke Universiteit Leuven

Michael Totschnig, Vienna University of Economics and Business Administration

Frans Van Assche, Katholieke Universiteit Leuven

Tien Dung-le, European Schoolnet

Frederic Bergeron, LORNET

Elena Shulman, European Schoolnet

Contributions from the following partners are acknowledged: CEN WS/LT, ARIADNE, European Schoolnet, ASPECT, ICOPER, IMS, GLOBE, Learning Registry, Share.tec, Organice Edunet, Natural Europe, etc.

The formal process followed by the Workshop in the development of the CEN Workshop Agreement has been endorsed by the National Members of CEN but neither the National Members of CEN nor the CEN-CENELEC Management Centre can be held accountable for the technical content of the CEN Workshop Agreement or possible conflict with standards or legislation. This CEN Workshop Agreement can in no way be held as being an official standard developed by CEN and its members.

This CEN Workshop Agreement is publicly available as a reference document from the National Members of CEN: Austria, Belgium, Bulgaria, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, Netherlands, Norway, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden, Switzerland, Turkey and the United Kingdom.

Comments or suggestions from the users of the CEN Workshop Agreement are welcome and should be addressed to the CEN-CENELEC Management Centre.


Introduction

Over the last 15 years, considerable effort has been spent on the development of standards and specifications for learning object repositories, with significant results, including:

• IEEE 1484.12.1-2002 Standard for Learning Object Metadata (LOM) specifies how to describe learning content [IEEELOM 2002],

• CWA 15555 Guidelines and Support for Building Application Profiles in e-Learning (delivered under SA/CEN/2004/25) specifies how application profiles can be derived from IEEE LOM [CWA15555 2006],

• CWA 14645 Availability of alternative language versions of a learning resource in IEEE LOM (delivered under SA/CEN/2000/42) specifies how the availability of alternative language versions of a learning resource can be described in IEEE LOM [CWA14645 2003],

• CWA 15454 A Simple Query Interface Specification for Learning Repositories (delivered under SA/CEN/2003-13) defines SQI (Simple Query Interface)—a query interface to access content in learning repositories [CWA15454 2005]. Alternatives to SQI include SRU/SRW [SRU 2007],

• The ProLearn Query Language defines a query language for searching learning object repositories [PLQL 2008]. Alternatives include CQL [CQL 2008] and XQuery [XQuery 2007],

• Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) defines how metadata can be harvested from learning object repositories [OAIPMH 2002]. Alternatives include publishing a sitemap for a web crawler to harvest [Sitemap 2008],

• The Simple Publishing Interface (SPI) (delivered under SA/CEN/2007-24A) in the CEN Workshop on Learning Technologies defines how metadata and content can be inserted into learning object repositories [CWA16097 2010]. Alternatives include the Atom Publishing Protocol [ATOM 2005], PENS [PENS 2006], and SWORD [SWORD 2008].

This work resulted in global infrastructures that are interconnected worldwide. As a result, we have moved from the earlier problem of a scarcity of learning resources to a new problem of abundance. In the ICT PSP Digital Content1 (under eContentplus), the standards and specifications mentioned above were the core of the technical backbone for EU-funded projects including:

• MELT [http://info.melt-project.eu/]

• MACE [http://www.mace-project.eu/]

• ICOPER [http://www.icoper.org/]

• ASPECT [http://aspect-project.org/]

• ORGANIC.EDUNET [http://www.organic-edunet.eu/]

This standardization work goes one step further and attempts to standardize ways for federations such as the EUN Learning Resource Exchange (LRE) [http://lreforschools.eun.org/], ARIADNE [http://www.ariadne-eu.org/], GLOBE [http://www.globe-info.org], and others to automatically discover new repositories for inclusion in their federation.

1 http://ec.europa.eu/information_society/activities/econtentplus/about/index_en.htm

1 Scope

Typically, a federation of repositories consists of a number of participating Learning Object Repositories. The locations of those repositories and the description of the protocols they support for exposing their learning resources to the federation are maintained and managed at the federation level. This information can be managed either:

• By tools such as harvesters or federated search engines that connect the repositories to the federation, or

• In a separate registry that manages this information for all the repositories on behalf of these tools.

These registries are generally not available outside of the individual federation in which they operate. The obvious problem is that this leads to a duplication of effort, because a repository description must be entered in the registry of each federation of which the repository is a member. As a result, it is difficult to keep the information up to date across all the registry instances in all the federations. For example, if the Open Learn (OU-UK) repository changes the location of its OAI-PMH target, the location has to be changed in the registries of ARIADNE, ASPECT, ICOPER, etc.

The present document gives guidance to enable the connection of learning object repositories, in order to further increase their impact in making relevant content available to teachers, trainers and (life-long) learners, by specifying how a network of registries can be set up such that changes in the description of a repository only need to be made once. This document does not build new specifications but rather profiles existing specifications.

Therefore, this CWA will focus on the following topics:

• Overview of the data model that can be used to describe collection registries (Clause 5),

• Protocols and APIs that can be used to provide access to these registries (Clause 6),

• A Collection Registry Specification (Clause 7),

• Registry reference implementations that can be used to validate the specifications (Clauses 8 and 9),

• Case studies (Clause 10).

The data model and APIs facilitate the use of registries by external tools that can manage, query or update the information that the registries contain. They make it possible to share registries between federations, thus enabling the automatic discovery of repositories and the automatic integration of new repositories into a federation.


2. Normative References

The following documents, in whole or in part, are normatively referenced in this document and are indispensable for its application. For dated references, only the edition cited applies. For undated references, the latest edition of the referenced document (including any amendments) applies.

CWA 15454 2005 A Simple Query Interface Specification for Learning Repositories

NOTE CEN Workshop Agreement 15454:2005 (E), European Committee for Standardization, November 2005. Available at: ftp://ftp.cenorm.be/PUBLIC/CWAs/e-Europe/WS-LT/CWA15454-00-2005-Nov.

CWA 16097 2010, The Simple Publishing Interface (SPI) Specification,

NOTE CEN Workshop Agreement 16097:2010 (E), European Committee for Standardization, February 2010. Available at: ftp://ftp.cenorm.be/CEN/Sectors/TCandWorkshops/Workshops/CWA16097.pdf

[LODE 2010] D. Massart, N. Nicholas, and N. Ward, IMS GLC Learning Object Discovery and Exchange Base Document, v1.0, IMS Global Learning Consortium, March 2010.

NOTE Available at http://imsglobal.org/LODE/spec/imsLODEv1p0bd.html

[OAIPMH 2002] The Open Archives Initiative Protocol for Metadata Harvesting, Open Archives Initiative, V2.0, June 2002.

NOTE Available at: http://www.openarchives.org/OAI/openarchivesprotocol.html

ISO 2146 2010 Information and documentation - Registry services for libraries and related organizations

NOTE Available at: http://www.iso.org/iso/catalogue_detail.htm?csnumber=44936

[IEEELOM 2002] IEEE Standard for Learning Object Metadata, IEEE Std 1484.12.1™-2002, IEEE Computer Society, September 2002


3. Terms, Definitions, Symbols and Abbreviations

3.1. Terms and Definitions

We use the following definitions, as illustrated in Figure 1. Please note that these definitions may vary from those used by other communities to define the same terms.

learning object: a digital resource designed for, or identified as being used for, learning; also known as a learning resource or learning content.

metadata record: a digital record describing the characteristics of another object. The structure and semantics of the record are typically defined by a metadata standard, e.g., IEEE LOM, Dublin Core.

learning object metadata: a metadata record describing a learning object. The metadata will include information about the structure and cataloging of the object and information about how it can be used in learning. There may be multiple learning object metadata records for any single learning object.

repository: a managed, persistent digital store of learning resources (e.g., learning content, learning objects, learning object metadata). A repository may store the resources, the metadata, or both. A repository generally exposes a set of services used to access the resources or to submit resources.

referatory: a managed set of links to metadata describing learning resources, e.g., links to learning object metadata. A referatory only stores links, not the corresponding metadata objects. The links need to be resolved to access the metadata. Note that, as defined, a referatory only stores links to metadata; other definitions of a referatory permit the referatory to store both links to learning resources and to metadata.

collection: a named set of learning resources stored in a repository. The collection will have at least one associated metadata record that describes the collection as a whole (the collection descriptor). For the purpose of this document, only one entire collection is stored in a repository, i.e., the mapping between a collection and a repository is 1:1.

collection descriptor: the metadata that describes a collection, e.g., its name, typical target audience (schools, higher education), predominant language, predominant topic.

registry: a managed persistent digital store of information about collections. For each collection, the description in the registry includes the collection descriptor, additional information about the corresponding repository holding the collection (e.g., its location), and information about the services that the repository provides for the collection (e.g., the services, their end points, protocols, policies).

collection registry: a registry as defined herein.

paradata record: a digital resource describing the contextualized use of a learning object.

While a repository is just a general persistent digital object store and can hold any kind of data, and a registry generally holds any type of metadata records, herein we use the unqualified terms repository and registry to define specific types of repositories and registries, i.e., a repository is a learning resource repository and a registry is a collection registry. Other communities may use the unqualified terms in different ways, e.g., a registry might be what is called a referatory herein, or may not allow the use of the unqualified terms. We further require that the registry only contain metadata about the referenced collections and that the corresponding repositories hold the actual learning content and its associated metadata.
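The definitions above can be made concrete with a small data-structure sketch. It is illustrative only: the class and field names (`CollectionDescriptor`, `ServiceDescription`, `RegistryEntry`, `Registry`) and the example URLs are assumptions made for this example and are not defined by this CWA or by IMS LODE. The sketch shows a registry that stores only metadata about collections and enforces the 1:1 mapping between a collection and a repository.

```python
from dataclasses import dataclass, field


@dataclass
class CollectionDescriptor:
    """Metadata describing a collection as a whole (hypothetical fields)."""
    name: str
    target_audience: str
    language: str


@dataclass
class ServiceDescription:
    """A service that the repository provides for the collection."""
    protocol: str   # e.g. "OAI-PMH", "SQI", "SRU"
    endpoint: str   # service end point URL


@dataclass
class RegistryEntry:
    """What the registry stores per collection: metadata only, no content."""
    descriptor: CollectionDescriptor
    repository_location: str
    services: list[ServiceDescription] = field(default_factory=list)


class Registry:
    """In-memory collection registry keyed by repository location."""

    def __init__(self) -> None:
        self._entries: dict[str, RegistryEntry] = {}

    def add(self, entry: RegistryEntry) -> None:
        # Enforce the 1:1 mapping between a collection and a repository.
        if entry.repository_location in self._entries:
            raise ValueError("repository already holds a registered collection")
        self._entries[entry.repository_location] = entry

    def get(self, repository_location: str) -> RegistryEntry:
        return self._entries[repository_location]
```

Keying the entries by repository location is only one possible design choice; a real registry would more likely use a persistent identifier for each collection.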


Figure 1 - Definitions

3.2. Abbreviations

For the purposes of the present document, the following abbreviations apply:

API – Application Programming Interface
CQL – Common Query Language
FRED – Federated Repositories for Education
GLOBE – Global Learning Objects Brokered Exchange
IEEE LTSC LOM – IEEE Learning Technology Standards Committee Learning Object Metadata
IMS LODE – IMS Learning Object Discovery and Exchange
LRE – Learning Resource Exchange
LOR – Learning Object Repository
LORRy – Learning Object Repository Registry
OAI-PMH – Open Archives Initiative - Protocol for Metadata Harvesting
SRU – Search/Retrieval via URL
SUM – Service Usage Models
SQI – Simple Query Interface
RSS – Really Simple Syndication
PLQL – Prolearn Query Language
UDDI – Universal Description, Discovery and Integration


4. Related Work

4.1. General

This clause presents previous and related work that is taken into account in this document.

4.2. Federated Repositories for Education (FRED)

The Federated Repositories for Education (FRED) project2 “documented generic service-oriented models and produced software toolkits that support development of repository federations”. The FRED project used the e-Framework modeling and documentation methodology [http://e-framework.org] to describe components of a service-oriented infrastructure. The FRED project outputs included the detailed description of several services, including:

• a harvest service using OAI-PMH [OAIPMH 2002] and LOM [IEEELOM 2002]

• an obtain service, and

• a search service using SRU [SRU 2007], CQL [CQL 2008] and a LOM context set [LOMCQL 2007].

It produced two SUMs (Service Usage Models) that describe a comprehensive set of services needed in (1) operating a repository federation from the user perspective (e.g., publishing to it, discovery from it, using learning content) and (2) establishing, provisioning and managing the repository federation from the system perspective [Rehak 2009]. The FRED SUMs model the services for a single repository federation that includes a centralized system registry describing the policies and operations of the federation, a metadata registry holding a copy of the metadata from the constituent federants, and a federation registry with metadata about the repositories (repositories may contain resources or metadata) and collections in the federation (collection-to-repository mapping is M:N). While FRED models a particular view of a repository federation, the underlying services, standards and model are designed to be reused in describing other federation and registry models.

4.3. The Learning Registry

The Learning Registry3 [Rehak 2011] aims to make “learning resources easier to find, easier to access and easier to integrate into learning environments wherever they are stored—around the country and the world.” It defines a set of open APIs and open interoperability standards to provide three fundamental, enabling capabilities:

1. a lightweight mechanism to publish (push) metadata, paradata, assertions or resources into a learning resource distribution network, independent of format or the resource, metadata or paradata;

2. the ability for anyone to consume the published data and then, in turn, to publish additional feedback on its use into the network (e.g., additional paradata), amplifying the overall knowledge about the resources;

3. a high latency, loosely connected network of master-master synchronizing brokers distributing resources, metadata and paradata.

The core distribution network has no central control, central registry or central repository. Published data eventually flows to all nodes in the network. The network aims to be self-assembling. Edge services can connect to any distribution node to find what resources (and sources) are in the network, what’s changed, what’s being used, etc. Organizations may build consumer-facing, value-added services at the edge nodes to enable using, finding, sharing, and amplifying the resources, metadata and paradata for user communities. The Learning Registry provides social networking for metadata (trusted social collaboration around learning resources), enabling a learning layer on the social web.

2 http://fred.usq.edu.au/
3 http://learningregistry.org/

4.4. The GLOBE/ARIADNE v1.0 UDDI Registry

ARIADNE is a European association open to the world, for knowledge “Sharing and Reuse”. The core of the ARIADNE infrastructure is a federation of learning object repositories. It is one of the founding members of the GLOBE4 consortium, an alliance whose participating organizations have committed to work collaboratively on a shared vision of ubiquitous access to quality educational content.

Figure 2 illustrates the ARIADNE network in its early years, when it followed a federated search approach across several repositories and providers. This allowed transparent search for learning objects from a single user client. The access points to which these queries were distributed were registered in a registry based on the OASIS standard UDDI (“Universal Description, Discovery and Integration”) [UDDI 2004]. UDDI defines a universal method to dynamically discover and invoke web services. However, in recent years, ARIADNE has opted for a harvesting approach where learning object metadata is harvested to a central repository of learning resources. This allows for more efficient search than a federated approach, where one has to take into account possible bandwidth problems.

Figure 2 - SQI UDDI Registry

4.5. The Spider

The Spider5 offers a service for searching repositories targeting Swedish schools. Existing repositories have been interconnected in a “repository network” in order to accomplish the infrastructure needed for the service.

The Spider service was constructed using several existing technologies and software platforms. The RDF metadata store and management platform SCAM (Standardized Contextualized Access to Metadata) is used as the basis for the Spider [Paulsson 2003] and functions as its metadata store.

The Spider uses OAI-PMH for harvesting and SQI for querying repositories. However, the Spider is also set up to be able to serve as a “hub” for federated repository queries using the SRU protocol (illustrated in Figure 3). The Fire protocol that was previously used in MELT/LRE together with the LRE Query Language (LRE-QL) is also supported, but is no longer maintained [Paulsson 2009].

4 http://www.globe-info.org/
5 The Spider is called Spindeln in Swedish

Figure 3 - A Schematic View of the Spider Service [Paulsson 2009]

The Spider provides a single point of entry to query several of the largest repositories in Sweden and is offered as a free service via the Swedish schoolnet. Most of the archives that connect to the Spider use LOM or Dublin Core as the basis for their metadata. The Spider supports both LOM and DC. The Spider can also be queried using the Spider Widget, which can be integrated in the school's local environment, such as an LMS or a local portal. It is also possible to use the Spider as an integrated part of other (third party) services by using the Spider API, which provides direct access to most of the Spider functionality.

The Spider is the Swedish node in the European Learning Resource Exchange (LRE).


5. Data Model

As stated in Clause 3, this document considers the registry as a managed persistent digital store of information about collections. For each collection, the description in the registry includes the collection descriptor, additional information about the corresponding repository holding the collection (e.g., its location), and information about the services that the repository provides for the collection (e.g., the services, their end points, protocols, policies).

Instead of developing new specifications from scratch, the choice was made to profile existing specifications for our purposes. As described in the project proposal, in the first phase of this project, we chose to represent this information in the Learning Object Repository Registry data model, which is part of the IMS Learning Object Discovery & Exchange (LODE) specification [LODE 2010, Massart 2010]. This specification aims to facilitate the discovery and retrieval of learning objects stored in more than one collection. It can be seen as a glue specification that profiles existing general-purpose repository and digital library protocols in order to take into account requirements specific to the educational domain, rather than creating new protocols. The Learning Object Repository Registry Data Model for learning object collections is used in discovering and configuring access to those collections.

It provides a schema for describing:

• collections of learning objects,

• the persons responsible for those collections, and

• the available mechanisms for interacting with those collections.

This data model is based on the ISO 2146 standard [ISO2146 2010], “Registry Services for Libraries and related organizations”, which was developed by ISO TC46 SC4 WG7 and proposes a framework for building registry services for libraries and related organizations.

Descriptions following this model are intended to facilitate exchange of learning content between different collections. The data model provides a consistent framework, independent of local registry configurations, for:

• collection discovery,

• evaluation and vetting of collections,

• access to collections (e.g., harvesting or searching), and

• automated configuration of access to collections.

The UML diagram in Figure 4 illustrates the overall structure of the model; refer to the XML Binding of the model for further information (http://imsglobal.org/xsd/imsloreg_v1p0.xsd). The semantics of the model are discussed below.

The IMS LODE Registry data model consists of two main parts [LODE 2010]: (1) collections of learning content and collections of metadata used to describe this content; (2) the targets (i.e., the access points or service end points) used to access these collections.

Collections

Both types of collections (content and metadata collections) are described using two types of attributes: (1) attributes that describe a collection as a whole, such as ‘description’ or ‘average annual increase’; (2) attributes (called properties in LODE) that describe the elements contained in a collection, such as ‘language’ or ‘format’. These properties use a quantifier to indicate that ‘some’, ‘most’, or ‘all’ the elements of a collection have this property. For example, most of the learning objects in a content collection are in English.
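The distinction between collection-level attributes and quantified properties can be sketched as follows. This is a minimal illustration under stated assumptions: the names `Property`, `Collection`, `QUANTIFIERS` and `has` are invented for this example and do not reproduce the IMS LODE XML binding. The only elements taken from the text above are the three quantifiers and the example property ‘language’.

```python
from dataclasses import dataclass, field

QUANTIFIERS = ("some", "most", "all")  # ordered from weakest to strongest


@dataclass(frozen=True)
class Property:
    """A statement about the elements contained in a collection."""
    name: str        # e.g. "language" or "format"
    value: str       # e.g. "en"
    quantifier: str  # one of QUANTIFIERS

    def __post_init__(self) -> None:
        if self.quantifier not in QUANTIFIERS:
            raise ValueError(f"unknown quantifier: {self.quantifier!r}")


@dataclass
class Collection:
    description: str  # attribute of the collection as a whole
    properties: list[Property] = field(default_factory=list)

    def has(self, name: str, value: str, at_least: str = "some") -> bool:
        """True if the property holds with at least the given quantifier."""
        threshold = QUANTIFIERS.index(at_least)
        return any(
            p.name == name
            and p.value == value
            and QUANTIFIERS.index(p.quantifier) >= threshold
            for p in self.properties
        )
```

A federation could use such a check when vetting collections, for instance to select only those collections in which most or all learning objects are in a given language.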


Figure 4 - UML Class Diagram of the IMS LODE Registry Data Model [Massart 2010]

Targets

A target corresponds to a point of access to a collection. A target has a location and supports a given protocol. Access to targets is usually constrained by a certain access policy. The description of a target provides all the parameters necessary to automatically connect to the target. These parameters depend on the protocol supported by the target. Currently, IMS LODE defines schemata for expressing the configuration parameters of the SQI, OAI-PMH, and SRU protocols. The specification includes an extension mechanism that allows for easily adding support for other protocols.
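To illustrate how a target description can drive automated configuration, the sketch below maps a target to client settings according to its protocol. It is not the LODE schema: the `Target` class, the `access_policy` label, the `parameters` dictionary, the configurator functions and the example URLs are all assumptions introduced for this example. The dictionary of configurators mimics, in miniature, an extension mechanism in which support for a further protocol is added by registering one more entry.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    """A point of access to a collection."""
    location: str                 # URL of the service end point
    protocol: str                 # e.g. "OAI-PMH", "SQI", "SRU"
    access_policy: str = "open"   # hypothetical policy label
    parameters: dict[str, str] = field(default_factory=dict)


def configure_oai_pmh(target: Target) -> dict[str, str]:
    # Hypothetical settings for a harvester client.
    return {
        "base_url": target.location,
        "metadataPrefix": target.parameters.get("metadataPrefix", "oai_dc"),
    }


def configure_sru(target: Target) -> dict[str, str]:
    # Hypothetical settings for a search client.
    return {
        "base_url": target.location,
        "version": target.parameters.get("version", "1.2"),
    }


# Supporting a new protocol means registering one more configurator here.
CONFIGURATORS = {"OAI-PMH": configure_oai_pmh, "SRU": configure_sru}


def configure(target: Target) -> dict[str, str]:
    """Build client settings for a target, or fail if the protocol is unknown."""
    configurator = CONFIGURATORS.get(target.protocol)
    if configurator is None:
        raise ValueError(f"unsupported protocol: {target.protocol}")
    return configurator(target)
```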

A profile of this data model has been created that is recommended to describe repositories in our reference implementations. This is presented in Clause 7.


6. Access to a Registry

6.1. General

Each federation of learning object repositories makes use of a registry. Interconnecting various registries with each other in order to make the resources of their respective federations globally available means that open standards and specifications should be deployed on top of these registries for accessing their content. This document therefore describes functionality to provide answers to the following questions:

• How to query a registry?

• How to synchronize the contents of different registries with each other?

• How to add a new repository to the registry?

• How to update a description of a registry?

This clause presents those specifications that can be used to achieve this functionality. We have organized the specifications in three parts: (i) query, (ii) synchronize, and (iii) publish.

6.2. Query

6.2.1. General

There exist many specifications to issue queries to a search engine. For example, OpenSearch6 is a collection of simple formats for search queries and results. It is used by a vast number of search engines and search applications around the Internet. In Technology Enhanced Learning (TEL), there exist two major specifications used by numerous learning object repositories. End users can use both specifications to issue queries to search services.

6.2.2. Simple Query Interface (SQI)

The Simple Query Interface (SQI) specification [CWA15454 2005], supported by the CEN WS-LT (CWA 15454:2005), presents an Application Programming Interface (API) for querying learning object repositories. Since one major design objective is to keep the specification simple and easy to implement, the interface is labeled the “Simple Query Interface” (SQI). In the context of SQI, learning object repositories are defined as collections of educational material, courses, and learning objects with associated descriptions (referred to as “metadata”). Examples of repositories for learning are educational brokers, knowledge pools, streaming video servers, etc.

SQI has the following characteristics:

● SQI is neutral in terms of results format and query languages. The repositories accessed through SQI can be of highly heterogeneous nature,

● SQI supports synchronous and asynchronous queries to support heterogeneous use cases,

● SQI supports both a stateful and a stateless implementation, and

● SQI is based on a session management concept in order to separate authentication issues from query management.

Because SQI makes no assumptions about the query language or results format, SQI is a good candidate for providing access to a collection registry. A client tool using SQI can issue a query to the registry and get back the information about a specific repository in the collection.
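The separation between session management and query management can be sketched with a toy in-memory target. This is not the normative SQI API: the class `ToyRegistryTarget`, its method names and signatures, and the keyword-matching semantics are assumptions made for illustration, only loosely modeled on the kinds of operations SQI describes; CWA 15454 remains the reference for the actual interface.

```python
import uuid


class ToyRegistryTarget:
    """In-memory stand-in for a registry exposing an SQI-like interface."""

    def __init__(self, descriptors: dict[str, str]) -> None:
        self._descriptors = descriptors  # repository id -> descriptor text
        self._sessions: dict[str, dict[str, str]] = {}

    # Session management, kept apart from query management.
    def create_anonymous_session(self) -> str:
        session_id = uuid.uuid4().hex
        self._sessions[session_id] = {"query_language": "", "results_format": ""}
        return session_id

    def destroy_session(self, session_id: str) -> None:
        del self._sessions[session_id]

    # Query configuration: the interface itself stays language-neutral.
    def set_query_language(self, session_id: str, language: str) -> None:
        self._sessions[session_id]["query_language"] = language

    def set_results_format(self, session_id: str, results_format: str) -> None:
        self._sessions[session_id]["results_format"] = results_format

    # Synchronous query: results are returned directly to the caller.
    def synchronous_query(self, session_id: str, query: str) -> list[str]:
        if session_id not in self._sessions:
            raise KeyError("unknown or expired session")
        # Toy semantics: the query is a keyword matched against descriptors.
        return [d for d in self._descriptors.values() if query.lower() in d.lower()]
```

A client would create a session, declare the query language and results format it intends to use, issue its query, and destroy the session afterwards.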

6 http://www.opensearch.org/Home

6.2.3. Search/Retrieval Using URL (SRU)

Search/Retrieval via URL (SRU) [SRU 2007] and SRU via HTTP SOAP Transport (previously denoted SRW) are a web-enabled, modernized development of the Z39.50 protocol, initially developed in the 1970s [Z39.50 2003]. Z39.50 and SRU address the same problem: providing a query and search protocol for repositories and registries. SRU supports transport via HTTP GET, HTTP POST or HTTP SOAP, i.e., REST-like and web-service based protocols. The results of an SRU query can be processed using XSL in order to apply different stylesheets or for other modifications.
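As an illustration of the HTTP GET transport, the following sketch assembles an SRU 1.2 searchRetrieve request URL carrying a CQL query. The base URL and the function name `build_sru_url` are assumptions made for this example; the request parameters (`operation`, `version`, `query`, `maximumRecords`) are those of SRU 1.2. The sketch only builds the URL and does not contact any server.

```python
from urllib.parse import urlencode


def build_sru_url(base_url: str, cql_query: str, maximum_records: int = 10) -> str:
    """Return an SRU 1.2 searchRetrieve request URL for use with HTTP GET."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": cql_query,  # CQL query string, URL-encoded below
        "maximumRecords": str(maximum_records),
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical registry end point, queried on the Dublin Core title index.
url = build_sru_url("http://registry.example.org/sru", 'dc.title = "physics"', 5)
```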

SRU is complementary to the OAI-PMH [OAIPMH 2002] protocol. While OAI-PMH is designed to harvest full metadata sets (in full or limited by a date range), which can then be searched and filtered, SRU/SRW is used to formulate individual queries with better precision. The two approaches can be used in combination in different scenarios depending on the service objectives. The abstract model of SRU makes it possible to return query responses in a variety of formats expressed using plain text or XML (Context Sets). Examples of commonly used formats are Dublin Core [DCCQL 2007] and MODS [bibCQL 2009]. There is no single official CQL Context Set for representing IEEE LOM, but various research projects, such as the Federated Repositories for Education (FRED) project, have developed LOM-based context sets, e.g., [LOMCQL 2007]. The current version of the SRU specification set is Version 1.2. SRU is currently being revised and standardized within OASIS.

6.3. Synchronize

6.3.1. General

For synchronizing the contents of multiple registries, a number of approaches can be used: (i) harvesting collection descriptors on a regular basis with OAI-PMH, (ii) subscribing to news, or (iii) replicating data.

6.3.2. Synchronizing by Harvesting Using OAI-PMH

The Open Archives Initiative Protocol for Metadata Harvesting [OAIPMH 2002] is a protocol designed to collect (harvest) metadata from various repositories. There are two actors in this approach. On one side is the Service Provider (the harvester), who wants to collect the metadata (for example, to enhance it or make it easier to find), and on the other side are the Metadata Providers, who want to make their metadata available.

To offer a target for harvesting, one has to set up a web service on top of the repository (in our case, the registry) that supports a series of requests called "verbs":

• Identify: returns the description of the repository/registry,

• ListMetadataFormats: returns a list of supported metadata formats,

• GetRecord: returns the metadata for an individual requested item,

• ListIdentifiers: returns a list of identifiers of available metadata items in the repository/registry,

• ListRecords: returns a list of available metadata objects in the repository/registry,

• ListSets: returns a list of available sets in the repository/registry. Sets are used to organize items in the repository/registry.

For harvesting purposes, the Service Provider will usually invoke the ListRecords verb (with a metadata format prefix and usually a date span) on the Metadata Provider to obtain all metadata records from the provider that have been created or updated within the specified date span. The overall flow of making a service request and obtaining a response is shown in Figure 5.
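A ListRecords request of this kind can be sketched as a URL builder. The base URL `http://registry.example.org/oai` is hypothetical; the argument names (verb, metadataPrefix, from, until, set) are those defined by OAI-PMH.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, from_date=None, until_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request, optionally limited to a date span or set."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical registry endpoint, harvesting everything changed during 2011.
url = list_records_url(
    "http://registry.example.org/oai",
    "oai_dc",
    from_date="2011-01-01",
    until_date="2011-12-31",
)
```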


Figure 5 - OAI-PMH Request/Response Workflow

In OAI-PMH, a repository—or in the case of this document a registry—is a network accessible server that can process the six OAI-PMH requests in the manner described in the specification. A repository is managed by a data provider to expose metadata. To allow various repository configurations, OAI-PMH distinguishes between three distinct entities related to the metadata made accessible by the OAI-PMH:

• resource: A resource is the object or "stuff" that metadata is "about". In our case of a collection registry, a resource is a repository that contains learning resources.

• item: “An item is a constituent of a repository from which metadata about a resource can be disseminated” [OAIPMH 2002]. That metadata may be disseminated on-the-fly from the associated resource, cross-walked from some canonical form, actually stored in the repository, etc. In our case, the item is the collection descriptor that describes the repositories or collections in the registry.

• record: “A record is metadata in a specific metadata format. A record is returned as an XML-encoded byte stream in response to a protocol request to disseminate a specific metadata format from a constituent item” [OAIPMH 2002]. The metadata format in our case is the data model described above.
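To make the record notion concrete, the sketch below reads the headers of a ListRecords response with the Python standard library. The sample response is hand-written for illustration (the identifiers are hypothetical and the metadata payload is omitted); the element names and the `status="deleted"` header attribute follow OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample response; a real one would also carry <metadata> payloads.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:registry.example.org:collection-1</identifier>
        <datestamp>2011-05-02</datestamp>
      </header>
    </record>
    <record>
      <header status="deleted">
        <identifier>oai:registry.example.org:collection-2</identifier>
        <datestamp>2011-06-10</datestamp>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>"""


def read_headers(xml_text):
    """Return identifier, datestamp and deletion flag for every record."""
    root = ET.fromstring(xml_text)
    headers = []
    for record in root.iterfind(".//oai:record", OAI_NS):
        header = record.find("oai:header", OAI_NS)
        headers.append({
            "identifier": header.findtext("oai:identifier", namespaces=OAI_NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=OAI_NS),
            "deleted": header.get("status") == "deleted",
        })
    return headers


headers = read_headers(SAMPLE_RESPONSE)
```

A synchronizing registry can use the deletion flag to remove a collection descriptor locally and the datestamp to decide where the next incremental harvest should start.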

If a collection registry deploys an OAI-PMH target on top of its collection descriptors, incremental harvesting between registries can be set up on a regular basis. New, updated, or deleted collection descriptors can then be synchronized between those registries.

6.3.3. Subscribing to News

Both RSS [RSS 2009] and Atom [ATOM 2005] are web feed formats used to publish frequently updated works (such as blog entries, news headlines, audio, and video) in a standardized format. An RSS document (called a "feed", "web feed" or "channel") includes full or summarized text, plus metadata such as publishing dates and authorship. Such feeds could be used for manual synchronization between registries. For example, the owner of registry X could subscribe to a news feed of registry Y. Whenever a new collection is registered in Y, the owner receives an alert and can choose to add the same collection to his own registry. As this process requires manual steps, this approach may not scale in a large network of interconnected registries.

6.3.4. Data Replication

The first version of the ARIADNE infrastructure, built around 2001, synchronized learning objects over so-called regional knowledge pools [Duval 2001]. This replication of resources was based on a three-level, star-shaped topology, in which regional knowledge pools replicated from and to the central knowledge pool. End users interacted with local knowledge pools that in turn interacted with regional ones. A replication scheme such as this could replicate collection descriptors over multiple registries.

CouchDB [CouchDB], a NoSQL document store, includes a built-in replication function. Each CouchDB instance maintains a local copy of the data. The replication process is used to synchronize a source instance with a target instance. By uniquely identifying documents and update versions, the replication process is able to discern the differences between the two instances and efficiently transfer only the deltas from the source to the target. All of the mechanics of replication are hidden from the user. Any CouchDB instance can be connected to any number of other instances, in any topology. If there is at least one path from any instance to all other instances, the entire collection will reach a state of “eventual consistency” where all instances are fully synchronized. CouchDB replication can be triggered periodically by an external process, or can run continuously, in which case a change (a new or updated document) introduced into one instance is immediately replicated to all targets. Note that CouchDB is not a specification but rather a tool that can be used to achieve replication of documents.

6.4. Publish

6.4.1. General

One of the requirements of a registry is that it should be possible to add repositories to a registry, to delete them from it, and to update the collection descriptors it holds.

6.4.2. The Simple Publishing Interface (SPI)

The Simple Publishing Interface (SPI) is used to push digital resources or their metadata into a repository [CWA16097 2010, Ternier 2010]. SPI makes relatively few assumptions about the resources and metadata that can be published. Therefore, it is a good candidate for being used in the context of a registry. The UML class diagram in Figure 6 illustrates these assumptions.

Figure 6 - Digital Resources and their Metadata. Reproduced from [CWA16097 2010]

Every resource must have an identifier and may have an associated filename. A resource can be described by zero or more metadata instances. Every metadata instance describes exactly one resource. It must have a metadata identifier that identifies the metadata instance itself and must have the resource identifier of the resource it describes.

SPI does not assume that a resource and its metadata need to be published in the same repository. As a consequence, SPI supports four operations:

• Submitting (publishing) a resource to a repository/registry,

• Deleting a resource from a repository/registry,

• Submitting a metadata record to a repository/registry, and

• Deleting a metadata record from a repository/registry.

There is no explicit operation to update a resource or a metadata instance. An SPI binding can offer this functionality out-of-band as an SPI extension, or deleting the resource and re-publishing it may provide update functionality.
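The four operations, and the delete-then-republish route to an update, can be sketched with a small in-memory store. The class `InMemorySpiTarget` and its method names are hypothetical; an actual SPI binding defines its own messages and encodings.

```python
from dataclasses import dataclass, field


@dataclass
class InMemorySpiTarget:
    """Hypothetical in-memory store offering the four SPI operations."""

    resources: dict = field(default_factory=dict)  # resource id -> bytes
    metadata: dict = field(default_factory=dict)   # metadata id -> (resource id, record)

    def submit_resource(self, resource_id, content):
        self.resources[resource_id] = content

    def delete_resource(self, resource_id):
        del self.resources[resource_id]

    def submit_metadata(self, metadata_id, resource_id, record):
        self.metadata[metadata_id] = (resource_id, record)

    def delete_metadata(self, metadata_id):
        del self.metadata[metadata_id]

    def update_metadata(self, metadata_id, resource_id, record):
        # SPI has no update operation; emulate it by deleting and re-publishing.
        self.delete_metadata(metadata_id)
        self.submit_metadata(metadata_id, resource_id, record)


target = InMemorySpiTarget()
target.submit_metadata("md-1", "res-1", "<descriptor>v1</descriptor>")
target.update_metadata("md-1", "res-1", "<descriptor>v2</descriptor>")
```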

Besides extending the specification, an SPI binding may also omit operations. Similarly, an implementation of a binding can (if the binding allows this) omit operations. Depending on the content managed by a repository (i.e., resources only, metadata only, or resources and metadata), a repository can support different combinations of these operations. For instance, a referatory will only offer operations to submit and delete metadata and omit the operations to manage resources.

Submitting a resource involves sending a binary stream to the target. Depending on the binding that is used, this byte-stream can be encoded in various ways. SPI supports two ways of submitting a resource to a repository: "by-value" and "by-reference".


In by-value publishing, the resource is directly embedded (after encoding) in a message sent to a repository. In by-reference publishing, the message sent to the repository only contains a reference (e.g., a URL) to the submitted resource. It is then the responsibility of the repository to use this reference to retrieve the resource and store it.

By-value publishing is useful for a standalone application (e.g. an authoring tool), which is generally not associated with a web server from which a repository can obtain a resource. Embedding a resource in a message passed to the repository is beneficial for publishing a resource from a desktop application. It lowers the threshold for publication because uploading the resource to a third party component that hosts the referenced resource is unnecessary. By-reference publishing is particularly suited to publishing large resources, since embedding large files into a single message may cause degraded performance, resulting in a need for a distinct method (e.g., FTP, HTTP, SCP, etc.) for a large resource.
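The difference between the two submission modes can be sketched as two message builders. The JSON envelope, the field names, and the base64 encoding are assumptions made for illustration only; each SPI binding specifies its own message format and encoding.

```python
import base64
import json


def by_value_message(resource_id, content):
    """Embed the resource itself; base64 is one possible encoding for a JSON message."""
    return json.dumps({
        "resourceIdentifier": resource_id,
        "mode": "by-value",
        "content": base64.b64encode(content).decode("ascii"),
    })


def by_reference_message(resource_id, url):
    """Send only a reference; the repository is expected to fetch the resource itself."""
    return json.dumps({
        "resourceIdentifier": resource_id,
        "mode": "by-reference",
        "location": url,
    })


small = by_value_message("res-1", b"hello")
large = by_reference_message("res-2", "http://files.example.org/lecture.zip")
```

The by-value message grows with the size of the resource, whereas the by-reference message stays small regardless of what it points to.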

The submission of a metadata instance to a repository is similar to the submission of a resource by-value. The metadata instance itself is embedded in a message sent to the repository. Since multiple metadata instances can describe a single resource, the operation specifies the identifier for the metadata instance and the identifier of the resource it describes. Publishing an additional metadata instance for a resource can be realized by publishing it using a different metadata identifier.

SPI supports two delete operations, one for resources and one for metadata. These operations are straightforward. The identifier of the object (resource or metadata instance) to delete is submitted to the repository that then completes the deletion.


7. Collection Registry Specification

7.1. General

This clause explains the combination of the data model and the protocols that the project team proposes as a first specification to assure technical interoperability of Collection Registries.

7.2. Data Model

In the first phase of this work, we decided to use a profiled version of the IMS LODE data model that has been described in Clause 5. To validate the use of this model, we have described all repositories and referatories in several European projects such as ASPECT and ICOPER, and in the GLOBE consortium and the ARIADNE federation.

Figure 6 shows an example description expressed in an XML binding of the data model. It captures all information about the OpenLearn Repository in the registry:

• The identifier of this repository:
    o openlearn_open_ac_uk in the catalog of ARIADNE

• The description of the repository:
    o The full collection of the OpenLearn LearningSpace

• The target description:
    o The protocol identifier of the protocol to be used to access the repository:
        . oai-pmh-v2
    o Protocol information (in this case OAI-PMH):
        . The location:
            • http://openlearn.open.ac.uk/local/oai/oai2.php
        . The supported metadata formats:
            • oai_dc: Dublin Core
            • oai_lom: IEEE LOM
            • oai_lre: IMS ILOX
        . Datestamp granularity:
            • YYYY-MM-DDThh:mm:ssZ
        . Earliest datestamp:
            • 2006-10-25T00:00:00Z
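A client that reads such a description needs the granularity in order to interpret datestamps. The sketch below holds the OpenLearn values listed above in a plain dictionary (the key names are hypothetical and do not reflect the XML binding) and parses the earliest datestamp according to the declared granularity.

```python
from datetime import datetime

# Values taken from the OpenLearn example above; the dictionary keys are hypothetical.
openlearn_target = {
    "identifier": "openlearn_open_ac_uk",
    "protocol": "oai-pmh-v2",
    "location": "http://openlearn.open.ac.uk/local/oai/oai2.php",
    "metadata_formats": ["oai_dc", "oai_lom", "oai_lre"],
    "granularity": "YYYY-MM-DDThh:mm:ssZ",
    "earliest_datestamp": "2006-10-25T00:00:00Z",
}

# The two granularities that OAI-PMH allows, mapped onto strptime patterns.
GRANULARITY_PATTERNS = {
    "YYYY-MM-DD": "%Y-%m-%d",
    "YYYY-MM-DDThh:mm:ssZ": "%Y-%m-%dT%H:%M:%SZ",
}


def parse_datestamp(value, granularity):
    """Parse a datestamp using the pattern that matches the declared granularity."""
    return datetime.strptime(value, GRANULARITY_PATTERNS[granularity])


earliest = parse_datestamp(
    openlearn_target["earliest_datestamp"], openlearn_target["granularity"]
)
```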

For each repository that is registered in the registry, the full XML description can be seen in the registry interface. Figure 7 shows the Korea Open CourseWare repository, which is accessible through the Simple Query Interface (SQI). The mandatory protocol information captured in this example is:

• Anonymous session

• Synchronous mode

• Result format:
    o Metadata format: IEEE LOM

• Supported methods in Synchronous mode:
    o setQueryLanguage
    o setResultsFormat
    o setMaxQueryResults
    o setMaxDuration
    o setResultsSetSize
    o synchronousQuery
    o getTotalResultsCount

• Session Identifier:
    o (given) Persistent ID

• Target location:
    o http://www.kocw.net/home/services/TargetService


Figure 6 - Description of an OAI-PMH Target with the Registry Data Model


Figure 7 - Description of an SQI Target

Figure 8 shows a collection in a Fedora-based repository, available by using SRU/W. The mandatory protocol information described in this example is:

• Target location:
    o http://fedora.dlib.indiana.edu:8080/SRW/search/FedoraAdmin

• Supported Version:
    o 1.1

• Allowed Record Schemes:
    o http://dlib.indiana.edu/xml/iudlAdmin/version1.0/
    o srw/schema/1/mods-v3.1

• Extra allowed data:
    o https://wiki.dlib.indiana.edu/display/INF/SRU+Server


Figure 8 - Description of an SRU Target

7.3. Access Protocols

This document recommends that a collection registry support the following protocols to enable interconnection of repositories and other registries.

• OAI-PMH for synchronizing collection descriptors between interconnected registries.

• SPI for adding, updating and deleting repositories from a registry.

• SQI for searching the contents of the registry. As a query language, we recommend adopting PLQL level 1. This level can express exact searches on metadata fields, which are denoted by means of paths. It supports paths as simple concatenations of elements (separated by dots), starting from the root, with no omission; expressions and parentheses are not allowed. For instance, one could send the query metadataCollection.target.targetDescription.protocolIdentifier.entry = “oai-pmh” to a registry to get all repositories in the registry that offer an OAI-PMH target.
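The exact-match, dotted-path style of query described above can be illustrated with a small evaluator over nested dictionaries. This is not a PLQL implementation: the functions `parse_exact_query` and `matches` are hypothetical, handle only a single `path = "value"` clause, and assume that descriptors are nested in the way the example path suggests.

```python
def parse_exact_query(query):
    """Split a 'path = "value"' exact-match query into (path elements, value)."""
    path, _, value = query.partition("=")
    return path.strip().split("."), value.strip().strip('"')


def matches(descriptor, query):
    """Follow the full path from the root and compare the leaf for equality."""
    elements, expected = parse_exact_query(query)
    node = descriptor
    for element in elements:
        if not isinstance(node, dict) or element not in node:
            return False
        node = node[element]
    return node == expected


def make_descriptor(protocol):
    # Hypothetical descriptor shape mirroring the path used in the text.
    return {"metadataCollection": {"target": {"targetDescription": {
        "protocolIdentifier": {"entry": protocol}}}}}


descriptors = [make_descriptor("oai-pmh"), make_descriptor("sqi")]
query = 'metadataCollection.target.targetDescription.protocolIdentifier.entry = "oai-pmh"'
hits = [d for d in descriptors if matches(d, query)]
```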

The implementation that is described in the following clause supports both the data model and the protocols that are described herein. This version is used to validate whether we can reach our objectives with this combination of protocols.

7.4. Functionality

7.4.1. General

The following four basic use cases are the minimum that a collection registry should support in order to set up a network of interoperable collection registries:


I. Add a new Collection Descriptor

II. Search available Collection Descriptors

III. Harvest available Collection Descriptors

IV. Modify a Collection Descriptor

The following use cases are not mandatory but are recommended:

V. Report a problem

VI. Subscribe to news

VII. Remove a Collection Descriptor

VIII. Login

All of these use cases are described in the clauses below.

7.4.2. Add a New Collection Descriptor

Summary:

A person adds a new collection descriptor to the registry. The addition of a new descriptor to the registry is published as a news item in an RSS feed.

Actors:

A registered user.

Trigger:

Triggered by the actor.

Description:

• The user navigates to the login page of the registry web page. If a user is not yet registered, he needs to register first.

• The user logs in and navigates to the add repository page.

• The system shows both a form that the user can fill in and an upload field that allows the user to upload a metadata instance that describes the repository to be added. The form contains the necessary fields to enter needed information about the new repository.

• The user fills in the form or uploads a metadata object describing the repository.

• The system tries to automatically extract information from the combination of the protocol and the target URL and asks the user for confirmation of this extracted information.

• The system tries to validate the information provided by the user.

• The user validates all information and clicks on a "save in registry" button.

• The system publishes the news story of a new repository so that other interested partners (persons, client tools, etc.) are aware of a new target.
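For an OAI-PMH target, the automatic extraction step in this use case could rely on the Identify verb. The sketch below parses a hand-written Identify response (the repository name and URL are hypothetical) and collects the fields a registration form might pre-fill; the element names follow OAI-PMH 2.0, while the function `extract_target_info` is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample of an Identify response, for illustration only.
SAMPLE_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <Identify>
    <repositoryName>Example Repository</repositoryName>
    <baseURL>http://repository.example.org/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <earliestDatestamp>2006-10-25T00:00:00Z</earliestDatestamp>
    <deletedRecord>no</deletedRecord>
    <granularity>YYYY-MM-DDThh:mm:ssZ</granularity>
  </Identify>
</OAI-PMH>"""


def extract_target_info(identify_xml):
    """Collect the Identify fields that can pre-fill a registration form."""
    identify = ET.fromstring(identify_xml).find("oai:Identify", OAI_NS)
    fields = ("repositoryName", "baseURL", "protocolVersion",
              "earliestDatestamp", "granularity")
    return {name: identify.findtext("oai:" + name, namespaces=OAI_NS) for name in fields}


prefill = extract_target_info(SAMPLE_IDENTIFY)
```

The user would then confirm or correct the pre-filled values before saving the descriptor.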

Result:

A new target has become visible in the registry. Subscribed users are informed about this.

Remarks:

A collection registry owner can decide to add a validation or quality check to this ingestion procedure before finally adding the target to the registry.


7.4.3. Search Collection Descriptors of the Registry

Summary:

Any actor or person is able to search the contents of the registry.

Actors:

Any Person.

Trigger:

Triggered by the actor.

Description:

• The user navigates to the registry web page.

• The system shows a list of all repositories in the registry. From that page, a user can issue a simple query to select repositories that are of interest to him.

• The user fills in simple keywords or tags.

• The system displays the list of repositories matching the issued query. The list contains all information that is available in the registry.

Result:

The user is able to find interesting repositories that are available in the registry.

Remarks:

Besides basic information about the repositories in the registry, the registry can display the complete metadata instance that describes the repository.

7.4.4. Harvest Available Collection Descriptors of the Registry

Summary:

Any actor or person is able to harvest all collection descriptors of the Registry.

Actors:

Any Person.

Trigger:

Triggered by the actor.

Description:

• The user navigates to the configuration page of his harvesting tool that is based on OAI-PMH. This tool can be a registry on its own that supports pulling data using OAI-PMH.

• The user configures his harvesting tool with the OAI-PMH target of the registry in order to harvest (some of) its contents, and enters the destination registry into which the collection descriptors should be harvested.

• The harvesting tool harvests the collection descriptors to the provided destination.
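The harvesting step can be sketched as a loop that follows resumption tokens until the target reports no further pages. The function `fake_fetch` and the `PAGES` data are hypothetical stand-ins for a remote OAI-PMH target; a real harvester would issue HTTP requests and, for follow-up pages, send only the resumption token.

```python
def harvest(fetch_page, last_harvest):
    """Collect all records changed since last_harvest, following resumption tokens."""
    records, token = [], None
    while True:
        page = fetch_page(from_date=last_harvest, resumption_token=token)
        records.extend(page["records"])
        token = page.get("resumption_token")
        if not token:
            return records


# Hypothetical two-page response standing in for a remote OAI-PMH target.
PAGES = {
    None: {"records": ["descriptor-a", "descriptor-b"], "resumption_token": "page-2"},
    "page-2": {"records": ["descriptor-c"], "resumption_token": None},
}


def fake_fetch(from_date, resumption_token):
    # A real implementation would pass from_date only on the first request.
    return PAGES[resumption_token]


harvested = harvest(fake_fetch, last_harvest="2011-01-01")
```

Storing the date of each successful run and passing it as `last_harvest` next time is what makes the harvest incremental.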

Result:

The user is able to harvest all or some of the collection descriptors that are available in the registry.

Remarks:

If the collection registry supports incremental harvesting (see the OAI-PMH protocol for more information), the user only has to configure the target once. After this, harvesting can be automated.

7.4.5. Modify a Collection Descriptor

Summary:

A person modifies a collection descriptor in the registry. The modification of the descriptor in the registry is published as a news item in an RSS feed.

Actors:

A registered user.

Trigger:

Triggered by the actor.

Description:

• The user navigates to the login page of the registry web page. If a user is not yet registered, he needs to register first.

• The user logs in and navigates to the search page of the registry. If the target in question is not listed in the default screen, he issues a query to find the target he wants to modify.

• The system shows both a form that the user can fill in and an upload field that allows the user to upload a metadata instance that describes the repository to be modified. The form contains the necessary fields to modify existing information about the target.

• The user fills in the form or uploads a metadata object describing the repository.

• The system tries to validate the information provided by the user.

• The user validates all information and clicks on a "save in registry" button.

• The system publishes the news story of a modified repository so that other interested partners (persons, client tools, etc.) are aware of this fact.

Result:

A modified target is visible in the registry. Subscribed users are informed about this.

7.4.6. Report a Problem with the Registry

Summary:

A person is using the user interface of the Registry and detects a problem or a bug. The person can report this problem to the support team of the Registry.

Actors:

Any Person.

Trigger:

Triggered by the actor.

Description:

• The user is browsing the registry web application.

• The system displays an error or a problem, or the information that is presented to the user contains an error.

• The user can click on the ‘Found a bug’ link that is available in the user interface.


• The system forwards the page to the “Request For Change” tracking system.

• The system pre-fills information in the form so that the barrier for the user to report problems is low.

• The user adds further information about the bug he has found.

• The user submits the problem.

• The system sends out an automated mail to the support team of the Registry.

Result:

The Collection Registry support team learns about a problem that is found by a user. They can take action to make sure the problem is fixed.

Remarks:

A screenshot of the pre-filled form of the “Request for Change” tracking system can be seen in Figure 9.

Figure 9 - Screenshot of a possible Request for Change system that is used in the ARIADNE Registry. In this case, the TRAC system is used.

7.4.7. Subscribe to News

Summary:

News is provided by a set of RSS feeds to which any person can subscribe.

Actors:

Any Person.


Trigger:

Triggered by the actor.

Description:

• The user navigates to the registry web page, and can subscribe to the following RSS feeds:

o List of all repositories added.

o List of all OAI-PMH targets added.

o List of all SQI targets added.

o List of all SPI targets added.

• The user adds the feed to his or her favorite newsreader.
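A client tool, as opposed to a human reader, could consume such a feed along the following lines. The sample feed is hand-written and its titles, links, and identifiers are hypothetical; the element names (channel, item, title, link, guid) are those of RSS 2.0.

```python
import xml.etree.ElementTree as ET

# Hand-written sample feed, for illustration only.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Registry news</title>
    <item>
      <title>New target: Example Repository</title>
      <link>http://registry.example.org/targets/1</link>
      <guid>target-1</guid>
    </item>
    <item>
      <title>New target: Another Repository</title>
      <link>http://registry.example.org/targets/2</link>
      <guid>target-2</guid>
    </item>
  </channel>
</rss>"""


def new_items(feed_xml, seen_guids):
    """Return (guid, title, link) for every feed item not seen before."""
    channel = ET.fromstring(feed_xml).find("channel")
    items = []
    for item in channel.findall("item"):
        guid = item.findtext("guid")
        if guid not in seen_guids:
            items.append((guid, item.findtext("title"), item.findtext("link")))
    return items


fresh = new_items(SAMPLE_FEED, seen_guids={"target-1"})
```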

Result:

The user is able to get news updates on the registry in his newsreader.

7.4.8. Remove a Collection Descriptor

Summary:

A person removes a collection descriptor from the registry. The removal of the descriptor from the registry is published as a news item in an RSS feed.

Actors:

A registered user.

Trigger:

Triggered by the actor.

Description:

• The user navigates to the login page of the registry web page. If a user is not yet registered, he needs to register first.

• The user logs in and navigates to the search page of the registry. If the target in question is not listed in the default screen, he issues a query to find the target he wants to remove.

• The system shows both a form that the user can fill in and an upload field that allows the user to upload a metadata instance that describes the repository to be added. The form contains the necessary fields to modify existing information about the target.

• The user fills in the form or uploads a metadata object describing the repository.

• The system tries to validate the information provided by the user.

• The user validates all information and clicks on a "save in registry" button.

• The system publishes the news story of a removed repository so that other interested partners (persons, client tools, etc.) are aware of this fact.

Result:

The target is no longer visible in the registry. Subscribed users are informed about this.

7.4.9. Login

Summary:

A user logs on to the registry configuration.


Actors:

A registered user such as a Collection owner.

Trigger:

Triggered by the actor.

Description:

• A user navigates to the login page of the registry configuration tool.

• The system displays the logon page with two fields (username and password) and a "forgot password" link.

• The user fills in his or her credentials.

• The system checks the provided credentials and displays all pages that can be accessed by a logged-in user.

Result:

The user is able to:

• add repositories,

• search repositories in the registry,

• remove repositories from the registry,

• edit repository configurations,

• configure a harvester,

• etc.


8. 1st Reference Implementation: The ARIADNE Registry

8.1. General

The ARIADNE Foundation is a non-profit association that aims to foster Share and Reuse of Learning Resources. ARIADNE uses a collection registry to provide the information necessary for systems to be able to select the appropriate protocols, such as OAI-PMH, SQI, SPI, and SRU/SRW, supported by a given learning object repository. In the first year of this project, ARIADNE, ASPECT and this project team collaborated to build a first reference implementation that can be found at: http://ariadne.cs.kuleuven.be/ariadne-registry

This implementation uses a profiled data model as described in 7.2.

8.2. Functionality

The registry reference implementation provides the following use cases that were described above:

I. Add a new Collection Descriptor

II. Search available Collection Descriptors

III. Harvest available Collection Descriptors

IV. Modify a Collection Descriptor

V. Report a problem

VI. Subscribe to news -- The user can subscribe to the following RSS feeds:

a. List of all repositories added:

http://ariadne.cs.kuleuven.be/ariadne-registry/rss/LastTargetAdded.jsp

b. List of all OAI-PMH targets added:

http://ariadne.cs.kuleuven.be/ariadne-registry/rss/LastTargetAddedOai.jsp

c. List of all SQI targets added:

http://ariadne.cs.kuleuven.be/ariadne-registry/rss/LastTargetAddedSqi.jsp

d. List of all SPI targets added:

http://ariadne.cs.kuleuven.be/ariadne-registry/rss/LastTargetAddedSpi.jsp

VII. Remove a Collection Descriptor

VIII. Login

8.3. Implementation

The registry has been built on top of the ARIADNE repository architecture, which is a standards-based architecture for managing learning objects in an open and scalable way. Detailed information on the ARIADNE repository architecture can be found in [Klerkx 2010].

At the moment, the registry implementation supports:

• OAI-PMH for synchronizing collection descriptors between interconnected registries:
    o http://ariadne.cs.kuleuven.be/ariadne-registry/services/oai

• SPI for adding, updating and deleting repositories from a registry:
    o http://ariadne.cs.kuleuven.be/ariadne-registry/services/SPI

• SOAP-binding of SQI for searching the contents of the registry:
    o Session Management:
        . http://ariadne.cs.kuleuven.be/ariadne-registry/services/SqiSessionManagement
    o Target:
        . http://ariadne.cs.kuleuven.be/ariadne-registry/services/SqiTarget

• RESTful binding of SQI for searching the contents of the registry:
    o http://ariadne.cs.kuleuven.be/ariadne-registry/api/sqitarget


9. 2nd Reference Implementation: LORRy

9.1. General

A second learning object repository registry is available online at: http://lreregistry.eun.org:5984/registry/_design/registry/index.html

The Learning Resource Exchange (LRE: http://lreforschools.eun.org) Learning Object Repository Registry (LORRy) is used to manage information about the LRE content providers and the learning object collections they wish to share using the LRE. The registry is used to manage the entire lifecycle of the relationship between a content provider and the LRE.

This implementation uses the profiled data model as described in 7.2. The implemented use cases of this registry (as defined in detail in 7.4) are described below.

9.2. Functionality

The 2nd registry reference implementation provides the following use cases that were described in 7.4:

I. Add a new Collection Descriptor

a. Note that LORRy has a two-step registration. Once an initial form with basic information about the collection has been reviewed and accepted by LORRy’s administrator, the provider is invited to complete a second form with technical information, such as the details of supported protocols.

II. Search available Collection Descriptors

III. Harvest available Collection Descriptors

IV. Modify a Collection Descriptor

9.3. Implementation

The LRE Registry is implemented on top of Apache CouchDB [CouchDB], a document-oriented database that offers incremental replication with bi-directional conflict detection and resolution. The registry is a key element of a federation such as the LRE. Inter-registry replication makes it possible to easily maintain and update a back-up registry that is, therefore, always ready to replace the primary registry in case of a problem. Moreover, this replication feature also permits the sharing and exchange of collection descriptions between different federations.
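Inter-registry replication of this kind is requested from CouchDB over HTTP. The following sketch only builds the request (it does not send it); the database URLs are hypothetical, while the `_replicate` endpoint and the `source`, `target`, and `continuous` fields are part of CouchDB's replication API.

```python
import json


def replication_request(source_db, target_db, continuous=False):
    """Return the path and JSON body of a CouchDB replication request."""
    body = {"source": source_db, "target": target_db}
    if continuous:
        # Keep replicating as changes arrive instead of running once.
        body["continuous"] = True
    return "/_replicate", json.dumps(body)


# Hypothetical primary and back-up registries.
path, body = replication_request(
    "http://primary.example.org:5984/registry",
    "http://backup.example.org:5984/registry",
    continuous=True,
)
```

The body would be sent with an HTTP POST to the returned path on one of the CouchDB instances; swapping source and target gives replication in the other direction.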

From a specification standpoint, the LORRy makes collection descriptions available in the IMS LODE Registry format using the OAI-PMH protocol. This target is available here: http://lreregistry.eun.org:5984/registry/_design/registry/_list/oai-pmh/all-records


10. Case Studies

10.1. General

This document provides two case studies of the functionality of a collection registry. Subclause 10.2 shows interoperability between two reference implementations of this CWA: the ARIADNE registry and LORRy. Subclause 10.3 demonstrates interoperability between a reference implementation (i.e., the ARIADNE registry) and the Learning Registry, which is an existing registry of learning resources.

10.2. Interoperability between the ARIADNE Registry and LORRy

10.2.1. Context

Both the ARIADNE registry and LORRy are reference implementations of the model herein. The functionality of both collection registries has been described in Clauses 8 and 9. This case study validates interoperability between the two on two fronts: data model and protocols.

10.2.2. Storing Collection Descriptions

Both reference implementations use the same approach to describing collections and make use of the IMS LODE data model that has been described herein. In ARIADNE, the ARIADNE validation service has been created to ensure that only compliant metadata are stored in its Collection Registry. This validation service provides validation of descriptions against predefined application profiles by using a combination of techniques including XSD schema and Schematron rules.

10.2.3. Demonstration

To validate interoperability between registries using the model described herein, the demonstration shown in Figure 10 has been developed. Whenever a content provider registers his learning object repository (LOR) in the ARIADNE registry (step 1a) by providing the necessary information, such as the data models used to describe learning resources and the protocols used to access those resources, a number of semi-automated steps occur. First, the Collection Registry sends an alert to the ARIADNE harvester (step 1.3) and publishes the newly added LOR as a news item in its RSS feed (step 1.3) to share the collection with possibly interested parties. The ARIADNE harvester tool harvests a metadata description for each learning resource of the content provider (step 3.1). This description is enriched by assigning it a unique identifier in the ARIADNE federation (step 3.2), transforming it to the ARIADNE application profile (step 3.3), validating it against the validation scheme in use (step 3.4), and finally publishing it into the ARIADNE LOR (step 3.5).
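The enrichment steps 3.2 to 3.5 form a simple pipeline, sketched below. All function names, the identifier prefix, the profile label, and the validation rule are hypothetical placeholders; the actual ARIADNE harvester, application profile, and validation service are not reproduced here.

```python
import uuid


def assign_identifier(record):
    # Step 3.2: give the description an identifier unique within the federation.
    return {**record, "identifier": "example-federation:" + uuid.uuid4().hex}


def transform(record):
    # Step 3.3: placeholder for the mapping to the federation's application profile.
    return {**record, "profile": "example-profile"}


def validate(record):
    # Step 3.4: placeholder rule; a real service would apply XSD and Schematron checks.
    return bool(record.get("title")) and "identifier" in record


def process(harvested, repository):
    """Run every harvested description through steps 3.2 to 3.5."""
    for record in harvested:
        candidate = transform(assign_identifier(record))
        if validate(candidate):
            repository.append(candidate)  # Step 3.5: publish into the repository.
    return repository


published = process([{"title": "Introduction to algebra"}, {"title": ""}], [])
```

Descriptions that fail validation are simply not published in this sketch; a real harvester might log or report them instead.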

The ARIADNE registry and LORRy synchronize their collection descriptions using OAI-PMH (step 4). The ARIADNE Registry supports automatic incremental harvesting by allocating datestamps to all collection descriptions. In the current demonstration, LORRy successfully synchronized all collections of the ARIADNE Registry to LORRy. At the moment 22 SQI targets, 64 OAI-PMH targets and 1 SRU target are synchronized in the network of registries.


Figure 10 - Demonstration Interoperability (adapted from [Klerkx 2010]).

10.2.4. Analysis

Sharing collection descriptions with the approach defined herein is quite effective when OAI-PMH is used for synchronizing resources. However, this scenario only works optimally if content providers register their LORs in a single registry. If they registered them in multiple registries, extra functionality, such as a duplicate detector, would need to be built into each registry. We consider this out of scope for this document, because duplicate detection has to be employed at the level of learning objects in any case: one can never be sure that learning resources are not available in multiple repositories. Whenever duplicates are detected, a tool such as the harvester can decide not to publish the duplicate in the repository.

Furthermore, if a content provider modifies information about his collection, the registry should update the datestamp of the description to the modification date. This way, incremental harvesting can be employed in the network of registries.

10.3. Interoperability between the ARIADNE registry and the Learning Registry

10.3.1. Context

The Learning Registry [Rehak 2011] takes a slightly different approach to the sharing and distribution of collection descriptions. Like this work, the Learning Registry supports the concept of having a description of a collection, i.e., the metadata record(s) that describe the various aspects of the collection, such as information about the collection as a whole, information about the repository that the collection resides in, and information about the services that the repository provides.

The Learning Registry differs in that it does not tie a collection registry to any single federation or any single storage location. The Learning Registry is a collection of nodes, a node essentially being a metadata repository. These nodes can be operated independently of any learning object repository or collection, and the core nodes of the Learning Registry public distribution network are not associated with any repository (private nodes may be operated by the same organization that hosts and operates a repository, but even in this case, the Learning Registry node and the repository are separate entities). Each node contains a diverse collection of metadata records, not metadata of just a single type or schema. These metadata records can describe learning objects, services, collections, repositories, registries or anything else. In essence, a registry in this work is a logical subset of the metadata records that contain collection descriptions held at a node of the Learning Registry.

While the data model herein assumes that there is a single authoritative collection description, the Learning Registry enables different parts of the description to be held in different description records, e.g., each service target might be in a separate record. Allowing multiple description records could simplify adding services or changing their descriptions through incremental operations.

The Learning Registry is designed to allow data from any source to be published at any node, and likewise for this data to be accessed from any node. Underlying the Learning Registry approach is a content distribution protocol; the metadata published at any node will eventually, and automatically, be synchronized with all other nodes in the network. Federations, per se, do not exist; all nodes are peers. Publication and query may happen anywhere: no provider or consumer of a collection description needs to know about the makeup of a federation or the structure of the Learning Registry network; they can go to any node and publish their data or get the latest collection descriptions. The Learning Registry uses CouchDB as its implementation layer, thus distribution and synchronization are automatic.

10.3.2. Storing Collection Descriptions

In the Learning Registry, a collection description is just a metadata record, and the Learning Registry supports all types of metadata records. An XML data instance of the data model described herein can be stored directly in the Learning Registry as is.

The Learning Registry native data store at each node is a NoSQL database of JSON documents. The XML collection description needs to be wrapped in a JSON document for storage. This JSON document, called a learning resource description (a collection is just another kind of learning resource), contains some data not present in the LODE data model; some of this data is Learning Registry specific, some duplicates the data in the XML and some is additional descriptive data. This additional data is used to enable certain features of the Learning Registry. The Learning Registry model of the JSON document describing a learning resource, i.e., the metadata encoding describing a learning resource, consists of [LR 2011]:

• Resource Locator: the resource locator (a URL) provides a unique identifier for the learning resource described in the metadata. When the metadata is for a collection description (the learning resource is the collection description), it is the way to uniquely describe the repository that the collection is held in. The locator provides a way to reference the collection, to access it, and to assemble collations of data from different sources that describe it, i.e., all the different metadata records about a single locator for the entire description of the collection.

• Data Provider: the separate identity for each person or organization that:

o owns the resource being described, i.e., the owner of the collection or repository,

o owns and curates the metadata about the resource, i.e., who provides the collection description (the Learning Registry allows someone other than the owner of the collection to provide the metadata that describes it), and

o is submitting or publishing the resource description into the Learning Registry network, i.e., whoever is providing and publishing the metadata (in the Learning Registry, one organization may harvest from a collection registry that is owned by someone else and publish the harvested collection description data without being the metadata owner or curator).

These values are used to allow consumers to access and filter resource descriptions that come from specific providers. The different data provider values separate collection ownership, curating metadata about a collection, and publishing that data.

• Information Assurances: URL identifying the statement of information assurances under which the description for the learning resource is being provided, e.g., whether the collection description is public, private, copyrighted, or licensed. Information assurances apply only to a collection description, not to the identified collection. Information assurances provide a mechanism to support approved reuse and sharing of collection descriptions by clearly identifying the conditions under which the metadata may be shared.

• Digital Signature: Submissions are digitally signed using OpenPGP. The signature supports message integrity and tamper resistance, and provides a provable assertion that the submitter is in control of their digital identity, thus enabling a mechanism to build trust and reputation about submitters.

• Hashtags: Collection of unstructured tags that describe whatever is described in the metadata, e.g., unstructured hashtags describing the collection. Hashtags provide an alternative to formal metadata, and a way to associate arbitrary information with a learning resource.

• Payload Schema: the designation of the schema and storage model for the payload metadata, e.g., that the metadata is LODE.

• Resource Data: the metadata that describes the learning resource, e.g., the LODE XML collection description record.

• Workflow Data: information such as message IDs, versions, time stamps, transit nodes, etc., used to manage the resource description as it flows through the distribution network.
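As an illustration of the structure listed above, the following sketch wraps a LODE XML record in a JSON document. The key names (`resource_locator`, `data_provider`, `information_assurances`, `hashtags`, `payload_schema`, `resource_data`) simply mirror the element names in the list and are assumptions, not the field names of the Learning Registry specification [LR 2011]; the function names, the placeholder XML string and the provider name are likewise hypothetical. The helper `locator_from_service_url` reflects the demonstration's assumption, described in this clause, that the Resource Locator is the domain name portion of the first service target URL; returning scheme plus host is one possible reading of that assumption. The digital signature and workflow data are left out because they are generated during submission and distribution.

```python
import json
from typing import Dict, List, Optional
from urllib.parse import urlsplit


def locator_from_service_url(service_url: str) -> str:
    """Reduce a service target URL to scheme plus host.

    One reading of 'domain name portion'; an assumption made for this sketch.
    """
    parts = urlsplit(service_url)
    return f"{parts.scheme}://{parts.netloc}/"


def wrap_collection_description(lode_xml: str,
                                resource_locator: str,
                                submitter: str,
                                curator: Optional[str] = None,
                                owner: Optional[str] = None,
                                assurances_url: Optional[str] = None,
                                hashtags: Optional[List[str]] = None,
                                payload_schema: str = "LODE") -> Dict:
    """Return an illustrative, JSON-serializable resource description.

    All key names are hypothetical and only mirror the listed elements.
    """
    return {
        "resource_locator": resource_locator,
        "data_provider": {
            "owner": owner,          # owner of the collection or repository
            "curator": curator,      # owner/curator of the metadata
            "submitter": submitter,  # whoever publishes into the network
        },
        "information_assurances": assurances_url,
        "hashtags": list(hashtags or []),
        "payload_schema": payload_schema,
        # The XML record is carried unchanged as a JSON string.
        "resource_data": lode_xml,
    }


# Placeholder payload; a real LODE collection description would go here.
sample_xml = "<placeholder/>"
locator = locator_from_service_url(
    "http://caad.asro.kuleuven.be/MACE/oai/index.php")
envelope = wrap_collection_description(
    sample_xml, locator, submitter="Example Publisher",
    hashtags=["collection description", "OAI-PMH"])
document = json.dumps(envelope)
```

Keeping the XML as an opaque string means the record can be returned unchanged to clients that only understand the LODE binding, while the surrounding keys carry the data that the LODE record lacks.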

Taking a LODE XML collection description record and wrapping it in the Learning Registry JSON document structure is rather straightforward with a few exceptions due to missing or ambiguous data in the LODE metadata.

• Resource Locator: The LODE metadata does not have a single URL for the collection, but rather has a URL for each service target. There is no automated, unambiguous way to discern the collection URL from the service target location. E.g., for the OAI service at http://caad.asro.kuleuven.be/MACE/oai/index.php, the collection URL might be http://caad.asro.kuleuven.be/ or http://caad.asro.kuleuven.be/MACE/. If multiple collection descriptions are submitted to the Learning Registry, the Resource Locator is used to collate them. For purposes of demonstration, we assume the Resource Locator is the domain name portion of the URL of the first service target location in the XML.

• Data Provider:

o Collection Owner: The LODE metadata does not require the identification of the owner of the repository or collection. There is no automated, unambiguous way to discern it from the XML. Since this is optional data for the Learning Registry, for purposes of demonstration, it is omitted.

o Owner/Curator of the Metadata Record: The LODE metadata does not describe who created or owns the collection description. There is no automated, unambiguous way to discern it from the XML. In the demonstration, we are harvesting data, so we assume that the metadata is owned and curated by the source used in the harvest. This assumption is clearly invalid when metadata records have been federated and the harvest comes from the federator and not the federant.

o Publisher to the Learning Registry: This data needs to be provided by whoever is actually submitting the collection description to the Learning Registry. For demonstration, we use the identity of the publishing organization participating in the demonstration.

• Information Assurances: Rights to the collection description are not provided in the LODE metadata. For the purpose of demonstration, we assign CC0 rights to the collection descriptions. In the experiment the data is only published on a temporary basis to a Learning Registry test network, so the data is never made available to the general public under these terms.

• Digital Signature: The submission process automatically generates a digital signature on behalf of the publishing organization.

• Hashtags: This list of unstructured hashtags is optional. For purposes of demonstration, to enable discovery, four hashtags are extracted from the XML:

o The catalog name from the identifier.

o The entry name from the identifier.

o The string value from the description, and

o The type of each service available, e.g., OAI-PMH, SQI, SPI, SRU.

We also tag the record as “collection description”. Since the Learning Registry contains multiple, diverse metadata records, this makes it simpler for consumers to separate collection descriptions from other data without knowledge of the XML schema or XML data.

• Payload Schema: the schema is LODE. The Learning Registry uses an uncontrolled vocabulary for schema names; thus for demonstration the value used is “imsloreg”. Different communities of practice could adopt a different vocabulary value.

• Resource Data: the LODE XML collection description record as is, wrapped in a JSON string.

• Workflow Data: Automatically generated.

10.3.3. Collection Descriptions Protocols

The Learning Registry provides the following services at each node:

• Distribution Service: used to distribute resource descriptions from one node to another node, providing the essential capability to flow data through the network. Distribution from a node to its partners is triggered periodically, based on node policies and data volumes.

• Publish Services: used to push learning resource descriptions, e.g., collection descriptions, from external data providers into a node for distribution through the network.

o Publish: The publish API supports publishing one or more resource descriptions directly to a specific node.

o SWORD: The SWORD API lets a SWORD client submit resource descriptions directly to a node.

o OAI-PMH Intermediary: By policy Learning Registry nodes do not directly harvest data sources. A data provider may configure an OAI-PMH intermediary that harvests data from the source, transforms it into a JSON resource description and uses the publish API to push it to a node.

• Access Services: used by consumers to pull resource descriptions and other data from a node for external processing.

o Obtain: The Obtain API gets the list of all resource descriptions or all resource identifiers held at a node, gets the data for a list of resource description identifiers, or gets the collation of all data for a given resource identifier, i.e., it can obtain and collate all the collection descriptions for a single collection.

o Harvest: The Harvest APIs support OAI-PMH harvest (with Learning Registry extensions) of data from a node. The APIs may be used to return the complete raw JSON resource descriptions (by document ID or collated by resource identifier), or the payload data, in XML, for a specified payload schema.

o Slice: The Slice API returns a subset of the resource description data to a consumer. The API provides limited views into the data, e.g., subset by identity, by schema, by keyword value.
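The kind of subsetting that the Slice API offers can be pictured with a local, in-memory filter. The sketch below is not the Learning Registry API and makes no call to a node; it only illustrates selecting resource descriptions by tag or by payload schema. The dictionary keys `hashtags` and `payload_schema`, the function name `slice_descriptions` and the sample records are assumptions made for this example.

```python
from typing import Dict, Iterable, List, Optional


def slice_descriptions(descriptions: Iterable[Dict],
                       tag: Optional[str] = None,
                       schema: Optional[str] = None) -> List[Dict]:
    """Return the descriptions that carry the given tag or use the given schema."""
    selected = []
    for description in descriptions:
        has_tag = tag is not None and tag in description.get("hashtags", [])
        has_schema = (schema is not None
                      and description.get("payload_schema") == schema)
        if has_tag or has_schema:
            selected.append(description)
    return selected


# Hypothetical records held at a node: two collection descriptions
# and one record describing something else.
records = [
    {"id": "a", "hashtags": ["collection description", "OAI-PMH"],
     "payload_schema": "imsloreg"},
    {"id": "b", "hashtags": [], "payload_schema": "imsloreg"},
    {"id": "c", "hashtags": ["physics"], "payload_schema": "LOM"},
]
collections = slice_descriptions(
    records, tag="collection description", schema="imsloreg")
```

Matching on either the tag or the schema means that a collection description is still found when its publisher supplied no hashtags, at the cost of relying on an uncontrolled schema name.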

The Learning Registry can support the collection registry protocols either directly or indirectly.

• Synchronization: The OAI-PMH distribution protocol is not needed for synchronization through the Learning Registry distribution network. The Learning Registry automatically supports synchronization through its distribution process. Nodes in the network support OAI-PMH harvest. Thus an external client can access any Learning Registry node and harvest by metadata type, requesting “imsloreg” metadata, and can use the returned data to link and synchronize the Learning Registry network into an external federation or collection registry.

• Adding, Updating, Deleting: The Learning Registry supports SWORD-based publishing to add data. Update and delete are currently not supported by the SWORD 1.3 API, but will be supported by the upcoming SWORD 2.0 API. The Learning Registry does not have a closed set of APIs, thus an SPI interface could be developed. In addition, as noted, it is possible to build an OAI-PMH harvest proxy that takes data from another collection registry and publishes (or synchronizes) the collection descriptions into the Learning Registry network.

• Search: The Learning Registry does not have a native search API; others can build this functionality on top of the built in CouchDB Map-Reduce-View functionality, or a Learning Registry node can be linked to a search API like Elastic Search (which uses Lucene to index the data). As with SPI, a SQI or SRU search interface that is specific to collection descriptions could be developed.

The slice function can be used to find data that matches the value “collection description” in the hashtags or “imsloreg” in the schema, returning the entire set of collection descriptions held at a node. These can be piped into another filtered search. External access to collection descriptions is also provided via OAI-PMH harvest or via direct access to individual metadata records, either by record, or by collection using the collection resource URL to collate the data about the collection.

10.3.4. Demonstration

To validate using the Learning Registry to support interoperability of collection descriptions, the following demonstration has been developed.

Figure 11

• A set of LODE XML collection descriptions was harvested from an existing collection registry that supports OAI-PMH data harvest (the Ariadne registry). An existing open source harvest intermediary that performs the harvest, wraps the XML into the Learning Registry JSON document format, and publishes these directly into a node in the Learning Registry was customized to extract the LODE metadata and fill in the other values that are described above. This demonstrates both synchronization to an external registry and adding data.

• The Learning Registry distribution process automatically flows the data to other nodes in the network, providing network wide data synchronization.

• At a different node, data can be accessed in three ways:

o The OAI-PMH client at the node can be called, requesting data in the metadata dissemination format “imsloreg”. This demonstrates synchronization to an external registry.

o The slice API can be called to get resource descriptions that are collection descriptions. Requests are for data slices that have either the hashtag “collection description” or the payload schema “imsloreg”. This demonstrates query.

o A simple API using Elastic Search has been built to demonstrate query.

10.3.5. Analysis

The lack of some data in the LODE data model makes use of OAI-PMH to extract existing collection data from existing federations problematic, since the full description of the collection identity, metadata curator and terms of service for the collection description are missing. If collection descriptions are entered directly into the Learning Registry, this data could be provided by the client when it is added to the Learning Registry. But an OAI-PMH harvest from the Learning Registry will not return this data, as it is part of the JSON, not part of the LODE XML (a Learning Registry harvest of the JSON is available that would return the data).

With the exception of OAI-PMH, the APIs provided by the Learning Registry are not directly interoperable with those described herein. Equivalent functionality to create end-user clients does exist. Directly interoperable APIs could be built.

Since the Learning Registry model is slightly different from that described herein, and uncouples data distribution and synchronization from federation, OAI-PMH is just used to federate a Learning Registry network with a collection registry federation.

11. Conclusions

The present document has given guidance to enable the connection of learning object repositories, in order to further increase their impact in making relevant content available to teachers, trainers and (life-long) learners, by specifying how a network of registries can be set up such that changes in the description of a repository only need to be made once. Clause 7 has provided a specification that is divided into three parts:

• A data model to describe collections (7.2);

• Access protocols to get information from the collection registry on the one hand, and to add or modify the contents of the registry on the other hand (7.3);

• Mandatory and recommended functionality of a collection registry (7.4).

Two reference implementations were built that follow this specification. Two case studies have been provided in this document to show the functionality of the collection registry and to demonstrate the validity of the implementations. Those case studies showed that the data model has to be profiled further: the collection identity, metadata curator and terms of service for using the collection description should be added. Nevertheless, both case studies showed successful sharing of collection descriptions between three federations.

12. Bibliography

[ASPECT 2010] The ASPECT infrastructure and Toolset v2.0. Available at: http://www.aspect-project.org/sites/default/files/docs/ASPECT_D2p6.pdf

[ATOM 2005] The Atom Syndication Format, RFC 4287, IETF, December 2005. Available at: http://www.ietf.org/rfc/rfc4287.txt

[ATOM 2007] The Atom Publishing Protocol, RFC 5023, IETF, October 2007. Available at: http://tools.ietf.org/html/rfc5023

[bibCQL 2009] The bib Context Set for CQL, Version 1.0, The Library of Congress, July 2009. Available at: http://www.loc.gov/standards/sru/resources/bib-context-set.html

[CouchDB] Apache CouchDB: The Apache CouchDB Project. Available at: http://couchdb.apache.org/

[CQL 2008] CQL: Contextual Query Language, SRU Version 1.2 Specifications, The Library of Congress, August 2008. Available at: http://www.loc.gov/standards/sru/specs/cql.html

[CWA14645 2003] Availability of alternative language versions of a learning resource in IEEE LOM, CEN Workshop Agreement 14645:2003 (E), European Committee for Standardization, January, 2003. Available at: ftp://cenftp1.cenorm.be/PUBLIC/CWAs/e-Europe/WS-LT/cwa14645-00-2003-Jan.pdf

[CWA15454 2005] A Simple Query Interface Specification for Learning Repositories, CEN Workshop Agreement 15454:2005 (E), European Committee for Standardization, November 2005. Available at: ftp://ftp.cenorm.be/PUBLIC/CWAs/e-Europe/WS-LT/CWA15454-00-2005-Nov.pdf

[CWA15555 2006] Guidelines and Support for Building Application Profiles in e-Learning, CEN Workshop Agreement 15555:2006 (E), European Committee for Standardization, June, 2006. Available at: ftp://ftp.cenorm.be/PUBLIC/CWAs/e-Europe/WS-LT/CWA15454-00-2005-Nov.pdf

[CWA16097 2010] The Simple Publishing Interface (SPI) Specification, CEN Workshop Agreement 16097:2010 (E), European Committee for Standardization, February 2010. Available at: ftp://ftp.cenorm.be/CEN/Sectors/TCandWorkshops/Workshops/CWA16097.pdf

[DCCQL 2007] The Dublin Core (DC) Context Set, Version 1.1, The Library of Congress, July 2007. Available at: http://www.loc.gov/standards/sru/resources/dc-context-set.html

[Duval 2001] Duval, E., et al., The Ariadne Knowledge Pool System, Communications of the ACM, (44/5), May 2001.

[IEEELOM 2002] IEEE Standard for Learning Object Metadata, IEEE Std 1484.12.1™-2002, IEEE Computer Society, September 2002.

[ISO2146 2010] Information and documentation - Registry services for libraries and related organizations. Available at: http://www.iso.org/iso/catalogue_detail.htm?csnumber=44936

[Klerkx 2010] Klerkx, J., Vandeputte, B., Parra, G., Santos, J.L., Van Assche, F., Duval, E., (2010). “How to Share and Reuse Learning Resources: the ARIADNE Experience”, in Proceedings of Fifth European Conference on Technology Enhanced Learning, (EC-TEL 2010), September, Barcelona, Spain.

[LR 2011] Learning Registry Technical Specification. Available at: http://goo.gl/2Cf3L

[LODE 2010] D. Massart, N. Nicholas, and N. Ward, IMS GLC Learning Object Discovery and Exchange Base Document, v1.0, IMS Global Learning Consortium, March 2010. Available at: http://imsglobal.org/LODE/spec/imsLODEv1p0bd.html

[LOMCQL 2007] IEEE LOM CQL Context Set, Federated Repositories for Education (FRED) Project, V1.82, University of Southern Queensland, 2007. Available at: http://fred.usq.edu.au/files/CQLFREDContextSet182.pdf

[Massart 2010] D. Massart, et al., “Taming the Metadata Beast: ILOX”, D-Lib Magazine, 16(11/12), 2010. Available at: http://www.dlib.org/dlib/november10/massart/11massart.print.html

[OAIPMH 2002] The Open Archives Initiative Protocol for Metadata Harvesting, Open Archives Initiative, V2.0, June 2002. Available at: http://www.openarchives.org/OAI/openarchivesprotocol.html

[Paulsson 2003] F. Paulsson, Standardized Content Archive Management – SCAM. IEEE Learning Technology Newsletter, 5(1), 40-42. Available at: http://lttf.ieee.org/learn_tech/issues/january2003/learn_tech_january2003.pdf

[Paulsson 2009] F. Paulsson, “Connecting learning object repositories - strategies, technologies and issues”, Proceedings from The Fourth International Conference on Internet and Web Applications and Services, Venice, Italy, 2009.

[PENS 2006] Guidelines for Package Exchange Notification Services, CMI010, AICC, March 2006. Available at: http://aicc.org/docs/tech/cmi010v1a.pdf

[PLQL 2008] S. Ternier, et al., “Interoperability for Searching Learning Object Repositories, The ProLearn Query Language”, D-Lib Magazine, 14(1/2), 2008.

[Sitemap 2008] Sitemaps XML Format, V0.90, sitemaps.org, February 2008. Available at: http://www.sitemaps.org/protocol.php

[Rehak 2009] D. Rehak, N. Nicholas and N Ward, “Service-Oriented Models for Educational Resource Federations”, D-Lib Magazine, 15(11/12), 2009. Available at: http://www.dlib.org/dlib/november09/rehak/11rehak.html

[Rehak 2011] D. Rehak, S. Midgley, P. Jesukiewicz, “The Learning Registry: ‘Social Networking for Metadata’”, in Proceedings of Open Repositories 2011, Austin TX, June 2011.

[RSS 2009] RSS 2.0 Specification, RSS Advisory Board, October 2009. Available at: http://www.rssboard.org/rss-specification

[SRU 2007] Search/Retrieval via URL Specifications, SRU Version 1.2 Specifications, The Library of Congress, August 2007. Available at: http://www.loc.gov/standards/sru/specs/

[SWORD 2008] SWORD AtomPub Profile V 1.3. Available at: http://www.swordapp.org/docs/sword-profile-1.3.html

[Ternier 2010] S. Ternier, D. Massart, M. Totschnig, J. Klerkx, and E. Duval, “The simple publishing interface (spi),” D-Lib Magazine, 16(9/10), 2010. Available at: http://www.dlib.org/dlib/september10/ternier/09ternier.html

[UDDI 2004] UDDI Version 3.0.2, OASIS UDDI Spec TC, October 2004. Available at: http://www.oasis-open.org/committees/uddi-spec/doc/spec/v3/uddi-v3.0.2-20041019.htm

[XQuery 2007] XQuery 1.0: An XML Query Language, W3C Recommendation, W3C, January 2007. Available at: http://www.w3.org/TR/xquery/

[Z39.50 2003] Information Retrieval (Z39.50): Application Service Definition and Protocol Specification, ANSI/NISO Z39.50-2003, NISO, 2003. Available at: http://www.loc.gov/z3950/agency/Z39-50-2003.pdf
