Discovery Project Review - CORE (COnnecting REpositories)
CORE was funded by JISC to improve access to collections that support research and education. This document is part of a series that describes the lessons from 8 JISC projects funded under the Discovery programme in 2011 to explore open metadata for libraries, museums and archives.

More information about the projects can be found at: http://www.jisc.ac.uk/whatwedo/programmes/inf11/infrastructureforresourcediscovery.aspx
CORE (COnnecting REpositories): http://core-project.kmi.open.ac.uk
The other documents in the series can be found at: http://discovery.ac.uk

Background

“The COnnecting REpositories (CORE) project aims to facilitate the access and navigation across relevant scientific papers stored in Open Access repositories. The project will make a new open metadata repository available in the Linked Data format describing the semantic relatedness between resources stored across a selection of UK repositories, including the Open University Open Research Online (ORO).”

Institution: The Open University
Responsible group: Knowledge Media Institute (KMi)
Capacity: A research project in the Knowledge Media Institute, but outputs embedded into the OU’s Open Research Online (ORO).
Data Scope: Metadata about Open Access scholarly articles, as stored in UK institutional repositories. Innovatively, CORE seeks to provide machine-generated metadata capturing assertions about the similarity of one article to another.
Data Scale: More than 3 million RDF triples, describing over 50,000 Open Access articles stored in 61 of the UK’s 168 institutional repositories.

Mechanics

Formats

CORE relies upon use of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) [1] to harvest metadata from open access repositories around the UK.
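As a rough illustration of the harvesting step described above, the sketch below builds an OAI-PMH ListRecords request and extracts identifiers, titles and candidate links from a response. It is not CORE's harvester: the function names, the `example.org` base URL and the sample response are assumptions made for illustration, and the code makes no network call. Only the `ListRecords` verb, the `oai_dc` metadata prefix, the `resumptionToken` argument and the XML namespaces come from the OAI-PMH and Dublin Core specifications.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, resumption_token=None):
    """Build a ListRecords request URL (hypothetical helper)."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)


def parse_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        records.append({
            "id": rec.findtext("oai:header/oai:identifier",
                               default="", namespaces=NS),
            "title": rec.findtext(".//dc:title", default="", namespaces=NS),
            # dc:identifier values that look like URLs are candidate links
            # to the full text; they still need checking after download.
            "links": [e.text for e in rec.iterfind(".//dc:identifier", NS)
                      if e.text and e.text.startswith("http")],
        })
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token or None)


# Made-up response, standing in for what a repository might return.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example paper</dc:title>
          <dc:identifier>http://example.org/1/paper.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    records, token = parse_records(SAMPLE)
    print(records)
    print(list_records_url("http://example.org/oai", token))
```

A harvester along these lines would keep requesting pages until no resumption token is returned, then try each candidate link in turn.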
The team attempted to follow links in the metadata to PDF documents containing the full text of each paper, which proved difficult or impossible in the majority of cases. Although OAI-PMH is reasonably easy to work with, the project found that data quality varied significantly. For example, although 24% of metadata records contained a link to something described as the ‘full text’ of a paper, only half of these (12% of the total) actually contained useful text; some were simply PDFs that, when opened, stated “No full text available”. Of those, only half (6% of the total set) contained text suitable for the project to analyse and compute relatedness to other documents.

Enhancement

CORE developed an RDF-based schema [2] that is used to expose the statements of relatedness upon which the project depends. This RDF is freely available for reuse, and the schema reuses concepts from existing ontologies such as MuSIM [3], BIBO [4] and FOAF [5] rather than inventing the whole structure from scratch.

______________
[1] http://www.openarchives.org/OAI/openarchivesprotocol.html
[2] http://core-project.kmi.open.ac.uk/node/13
[3] http://kakapo.dcs.qmul.ac.uk/ontology/musim/0.2/musim.html
[4] http://bibliontology.com
[5] http://www.foaf-project.org

Improving access to collections and enabling new services for UK education and research

Usability

The CORE data is available via a basic search interface [6] and a SPARQL endpoint [7]. It has also been embedded [8] within the Open University’s institutional repository, Open Research Online, and made available via a pilot application for Android devices [9].

Impact

Licensing

CORE data is made available for reuse under a Creative Commons Attribution Licence.
It is worth noting that the project team did not directly address the licensing of the data they harvested from third parties, “assuming” that “this [harvesting and processing] behaviour is in the spirit of open access.”

Benefits – How will end users benefit?

CORE was able to demonstrate that some meaningful relationships could be computed between items in institutional repositories, and that these relationships could be displayed to the end user of a repository in ways that might facilitate further exploration on their part. Some limitations in the metadata provided by repositories, and the project’s focus solely upon open access material, limit the broader utility of this as an end-user-facing service, but there is clearly scope to explore opportunities for both data enrichment and a broadening of CORE’s scope.

Outcome

CORE is embedded within the Open University’s institutional repository, and the team have submitted a bid for further funding with which to extend their work.

Lessons Learned

• It is possible to compute a measure of similarity between documents in institutional repositories, and to report this similarity to end users.
• Data quality within institutional repositories is a concern if robust services are to be built upon them.

See Also

• OpenDOAR - http://www.opendoar.org. A directory of open access repositories

______________
[6] http://core.kmi.open.ac.uk/search
[7] http://core.kmi.open.ac.uk/squery
[8] http://core-project.kmi.open.ac.uk/node/22
[9] http://core-project.kmi.open.ac.uk/node/12
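The full-text figures reported under Mechanics can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name and parameters are assumptions, and the only inputs taken from the review are the 24% link rate and the two successive halvings.

```python
def full_text_funnel(link_share=0.24, useful_share=0.5, analysable_share=0.5):
    """Return the share of all metadata records surviving each stage.

    link_share:       records linking to something described as 'full text'
    useful_share:     fraction of those links that contained useful text
    analysable_share: fraction of the useful texts suitable for analysis
    """
    with_link = link_share
    with_useful_text = with_link * useful_share
    analysable = with_useful_text * analysable_share
    return with_link, with_useful_text, analysable


if __name__ == "__main__":
    labels = ("link to 'full text'", "useful text", "suitable for analysis")
    for label, share in zip(labels, full_text_funnel()):
        print(f"{label}: {share:.0%} of all records")
```

With the review's figures this gives 24%, 12% and 6% of all records, matching the "6% of the total set" quoted in the text.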