KGCM Conference Paper
Developing the Discovery Layer in the University Research e-Infrastructure

Malcolm WOLSKI, Joanna RICHARDSON, Mark FALLU, Robyn REBOLLO, Joanne MORRIS
Division of Information Services, Griffith University
Brisbane, Queensland 4111, Australia

Abstract

Governments worldwide are faced with the challenge of creating research e-infrastructures to not only manage but also make accessible and discoverable increasingly large amounts of research data. Universities in turn are under pressure to ensure that their research strategies and support services are aligned with these national imperatives. This paper describes a nationally funded Australian university initiative to build a research e-infrastructure layer which connects individual researchers and the University to the Research Data Australia service in order to expose details of their research activity as well as available research data outputs. As governments work towards fully functional e-infrastructures which will be both cross-disciplinary and cross-border, the semantic metadata exchange service described in this paper offers a model which supports the interactive discovery of, and navigation to, content that may reside locally or across the world.

Keywords: Research infrastructure, VIVO, semantic web, Vitro, discovery systems, Kepler

1. Introduction

In a submission to the European Commission, Kroes [1] writes: "Information and Communication Technologies (ICT) are the most recent transformational factors in science. They enable close and almost instantaneous collaboration between scientists all over the world and they provide access to unprecedented volumes of scientific information." ICT have helped to create a world in which knowledge, and its application, is seen as a key to global competitiveness, and national prosperity is viewed as underpinned by knowledge innovation [2]. Within this context, governments worldwide are grappling with the challenges of creating robust research e-infrastructures which can not only manage this information but also ensure its discoverability and accessibility.

2. Australian National Data Service

As part of the Australian government’s NCRIS (National Collaborative Research Infrastructure Strategy) initiative, the Australian National Data Service (ANDS) was formed to support the "Platforms for Collaboration" capability. The service is underpinned by two fundamental concepts: (1) with the evolution of new means of data capture and storage, data has become an increasingly important component of the research endeavour, and (2) research collaboration is fundamental to the resolution of the major challenges facing humanity in the twenty-first century [3].

ANDS is building the Research Data Australia (RDA) service [4]. It consists of web pages describing data collections produced by or relevant to Australian researchers. RDA publishes only the descriptive metadata; it is at the discretion of the custodian whether access, i.e. links, will be provided to the corresponding data. Behind RDA lies the Australian Research Data Commons (ARDC), which is the infrastructure and systems needed to support data and metadata capture, publication feeds, and applications such as data integration, visualisation and analysis.

3. ANDS Objectives

The long term (ten year) objectives for data management within the Australian National Data Service (ANDS) are to:

- Increase the amount of research data that is routinely deposited into stable, accessible and sustainable data management and preservation environments
- Enable Australian researchers to discover, exchange, reuse and combine data from other researchers and other domains within their own research in new ways
- Facilitate the sharing of Australian data to support international and nationally distributed multidisciplinary research teams
- Support the development of data management services and support within institutions that promote good data management practices for researchers

Key stakeholders in the Australian research environment (ANDS, the National Library of Australia, funding bodies such as the Australian Research Council and the National Health and Medical Research Council, research institutes and universities) all have knowledge to be shared. In building its national collaborative infrastructure, ANDS has utilised a federated approach which supports multiple layers, i.e. RDA aggregates at the national level data about Australian research which has been aggregated at the local level.

Critical to the model is the ability to enhance discoverability and accessibility of all aspects of research to improve knowledge communication. The connectivity between research data and researchers is important, especially for purposes of re-use and in cross-disciplinary research. Identifying relationships between people, institutions, projects and the relevant research data created enhances opportunities for collaboration and new research [5] [6].

This paper describes how Griffith University has built a research e-infrastructure layer which connects individual researchers and the University to the Research Data Australia service. The local technical framework developed for the service is based on semantic web, triple store and open access technology.

4. Griffith University’s Metadata Exchange Hub

A Metadata Exchange Hub has been developed as part of an ANDS-EIF (Education Investment Fund) funded project involving collaboration between Griffith University and the Queensland University of Technology. The Hub was built to meet ANDS’ requirements for institutions to provide aggregated metadata store solutions to populate Research Data Australia (RDA). The metadata feeds encapsulate metadata providing high-level descriptions of research datasets and entities related to them, such as researchers, research groups, research projects and research services. The metadata schema used is the Registry Interchange Format - Collections and Services (RIF-CS) [7], which is a subset of the ISO standard 2146 [8]. The development of a metadata aggregator (Hub) has become a core piece of infrastructure [9].

To populate RDA, the metadata is harvested from institutions via the Open Archives Initiative’s Protocol for Metadata Harvesting (OAI-PMH). This protocol is an HTTP REST based web service with six methods defined for interrogation and harvesting of structured metadata. The default metadata schema for OAI-PMH is Dublin Core, but other schemas may also be used. For the purposes of transporting and aggregating research metadata for RDA, the RIF-CS schema is used. RIF-CS is a high level schema that defines four classes of objects: collections, parties, activities and services. The objects of these classes may be related to each other via relationships defined in a controlled vocabulary [10]. RIF-CS can also be effectively modelled using Resource Description Framework (RDF) and related semantic web standards. See Figure 1.

An important part of Griffith University’s Metadata Exchange Hub is to expose the relationships (using RIF-CS) among researchers, their projects and their research outputs, as illustrated in Figure 1. These relationships form a linked graph. For example, Mary Jane (party) has the relationship (is a participant, i.e. researcher) of a project (activity) but also has the relationship (manages) datasets that, in turn, have the relationship (is part of) Collection A, etc.

Figure 1: RIF-CS – Linked Data

As part of the ANDS-EIF project, staff analysed the pros and cons of existing software solutions as the potential foundation for the Hub. Since the major project driver was to develop an open source solution which could be used as an exemplar / good practice for Australian universities which want to be part of the national collaborative research infrastructure, the Project Team decided to use a semantic web solution called VIVO as the metadata store, which also includes mechanisms for the editing and display of Hub metadata. Other software used for the project included Kepler [11] for data workflow and transformation, OAI-CAT [12] for OAI-PMH provision, and custom Java code for object identifier creation.

5. Architecture of the Hub

The following diagram (Figure 2) is a simple illustration of the Metadata Exchange Hub components. VIVO, which is based on technology developed at Cornell, has been implemented with minimal changes to the underlying software architecture. Research activity metadata is uploaded to Research Data Australia (RDA) using the Registry Interchange Format - Collections and Services (RIF-CS).

As part of the Metadata Exchange Hub project in Australia, a number of additions have been made to VIVO to support the requirements of the ANDS metadata stores program, including (a) an extended ontology capable of fully expressing RIF-CS and modelling research activity in Australian research institutions; (b) an OAI-PMH provider for OAI-PMH feeds; (c) customised web page templates for presentation; and (d) workflow modules, e.g. Kepler, to support data ingestion and transformation. A more detailed explanation of key modules follows.

Dublin Core terms*                              http://purl.org/dc/terms/
Event Ontology*                                 http://purl.org/NET/c4dm/event.owl#
FOAF*                                           http://xmlns.com/foaf/0.1/
FOR 2008 Ontology                               http://purl.org/asc/1297.0/2008/for/
geopolitical.owl                                http://aims.fao.org/aos/geopolitical.owl#
ns                                              http://www.w3.org/2006/vcard/ns#
SEO 2008 Ontology                               http://purl.org/asc/1297.0/2008/seo/
SEO 1998 Ontology                               http://purl.org/asc/1297.0/1998/seo/
SKOS (Simple Knowledge Organization System)*    http://www.w3.org/2004/02/skos/core#
time                                            http://www.w3.org/2006/time#
Vitro