Establishing an Eprint Repository at the University of Melbourne Implementation Aspects
Eve Young and Shirley Sullivan
University of Melbourne
Melbourne, Victoria 3010 AUSTRALIA
E-mail: [email protected]

In 2002, the University of Melbourne Information Division established a repository for the research output of University of Melbourne staff. The repository is one of a growing number, both nationally and internationally, using open source software compliant with the protocols and standards of the Open Archives Initiative. The paper discusses these protocols and standards and also outlines the authors’ experiences in establishing the repository. The paper complements EPRINTS@MELBOURNE by Jane Garner, Lynne Horwood and Shirley Sullivan, which outlines the means used to populate the repository and publicise it to academic staff.

Acknowledgements

The authors wish to thank Andrew Gfrerer from Teaching, Learning and Research Support for his advice on technical issues.

Introduction

While there is more and more freely accessible academic content on the Internet, finding it can be difficult. Much relevant information is “hidden” within databases and repositories, as it is not picked up through popular search engines. Other hidden information is housed on academics' computers, which hold a wealth of original content, such as research articles, field notes and images, not all of which will appear in journals or books (Young 2002).

Many academic institutions are creating "institutional repositories" for staff to upload copies of their research papers, data sets, and other work. The idea is to gather as much of the intellectual output of an institution as possible in an easy-to-search online collection (Young 2002).

The Open[1] Archives[2] Initiative (OAI) is an international movement that encourages data sharing by developing and promoting technical standards and supporting organisational aspects so that distributed repositories become interoperable and cross-searchable, increasing the visibility, accessibility and impact of scholarly repositories. As Krichel points out, the OAI “created the opportunity for the library community to enter as providers of freely available scholarly literature in institution-based digital archives” (Krichel 2002).

The repository established at the University of Melbourne was named UMER (University of Melbourne Eprint Repository). The eprint working group comprised staff from the Teaching, Learning and Research Support and Information Resources Access departments of the Information Division. Their skills covered information technology, intellectual property, metadata and academic support services.

[1] Open here means “open” from the architectural perspective: defining and promoting machine interfaces that facilitate the availability of content from a variety of providers. Openness does not mean “free” or “unlimited” access to the information repositories that conform to the OAI technical framework (http://www.openarchives.org/documents/FAQ.html).

[2] The term "archive" in the name Open Archives Initiative reflects the origins of the OAI in the eprints community, where the term archive is generally accepted as a synonym for repository of scholarly papers (http://www.openarchives.org/documents/FAQ.html). In this paper, however, the authors are using the term repository to avoid confusion between the archivist and IT use of the word “archive”.

Historical background

The initiative had its origins in a desire for enhanced access to already existing eprint repositories. A forum was established in October 1999 and named the Universal Preprint Service (UPS) initiative.
The forum was set up to “discuss and solve matters of interoperability between author self-archiving solutions, as a way to promote their global acceptance.” (http://vole.lanl.gov/ups.htm - accessed 30th October 1999[3]) Sponsors of the forum were the Council on Library and Information Resources (CLIR), the Digital Library Federation (DLF), the Scholarly Publishing and Academic Resources Coalition (SPARC), the Association of Research Libraries (ARL) and the Research Library of the Los Alamos National Laboratory (LANL).

The participants in the meeting were academic librarians and computer scientists specialising in archiving, metadata, and interoperability. (The list of invitees is at Appendix 1; the list of institutions represented is in Appendix 2.) Such was the enthusiasm to achieve results that by the time of the first meeting, a prototype for the UPS multidisciplinary digital library service had been created for the main existing eprint repositories.

By the end of October the initiative had changed its name to the Open Archives Initiative. This change of name reflected the wider utility expected of the software, which was no longer seen as restricted to eprint repositories.

Open Archives Initiative

The OAI is based at Cornell University and is supported by the Coalition for Networked Information (CNI) and the Digital Library Federation (DLF). A steering committee[4] sets policy and a technical committee[5] advises on the infrastructure. OAI provides a mechanism that allows OAI compliant participants to register themselves as data and/or service providers (Needleman 2002, p.156), although registration is not required in order to use the protocols. OAI compliance means using unqualified Dublin Core metadata tags, which ensures that distributed documents in OAI compliant repositories can be searched as though they are in one large database.

[3] This document is still accessible at http://www.openarchives.org/news/ups1-press.htm, but has changed its title to the name it now goes by.

[4] Members of the OAI steering committee include the following:
• Caroline Arms (Library of Congress)
• Lorcan Dempsey (Joint Information Systems Committee, UK)
• Dale Flecker (Harvard University)
• Ed Fox (Virginia Tech)
• Paul Ginsparg (Los Alamos National Laboratory)
• Daniel Greenstein (DLF)
• Carl Lagoze (Cornell University)
• Clifford Lynch (CNI)
• John Ober (California Digital Library)
• Diann Rusch-Feja (Max Planck Institute for Human Development)
• Herbert Van de Sompel (Cornell University)
• Don Waters (The Andrew W. Mellon Foundation)

[5] The interoperability infrastructure was developed by a technical committee, which continues to advise on the infrastructure as experience with it develops. Herbert Van de Sompel and Carl Lagoze are responsible for coordination of OAI activities, which are centred at Cornell University. They are two of the original group who met at Santa Fe in October 1999. (http://www.openarchives.org/documents/FAQ.html#Who manages the Open Archives Initiative)

The Open Archive Interoperability[6] Framework

Interoperability is the ability of two or more systems or components to exchange information and use the exchanged information without special effort on either system.[7] In other words, systems that comply with the interoperability standards are able to “talk” to each other in such a way that data found in one system is comprehensible and usable by another (Fietzer 2002 p. 82).

The OAI model distinguishes between data providers and service providers, although it is possible to be both. An organisation can make its metadata available to service providers and at the same time harvest metadata from other data providers, using the harvested metadata, either alone or in conjunction with its own metadata, to provide value-added services (Needleman 2002 p.156). (Gathering of metadata about a digital resource by a service provider is called metadata harvesting.)
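
To make the harvesting step concrete, the following is a minimal sketch, in Python, of what a service provider might do with a data provider's response: build a ListRecords request and read the unqualified Dublin Core fields from each record. It assumes version 2.0 of the OAI harvesting protocol and its published XML namespaces. The base URL, the record identifier and the sample response are hypothetical and simplified (a real response carries further elements, such as the response date and the original request), and the function names are illustrative only; nothing here is taken from the UMER implementation.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# XML namespaces used by OAI-PMH 2.0 and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical, simplified response from a data provider (illustration only).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:42</identifier>
        <datestamp>2002-11-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example eprint</dc:title>
          <dc:creator>Author, A.</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the ListRecords request a service provider would send."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


def parse_records(xml_text):
    """Extract the identifier, titles and creators of each harvested record."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        ident = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        titles = [e.text for e in rec.iterfind(".//dc:title", NS)]
        creators = [e.text for e in rec.iterfind(".//dc:creator", NS)]
        records.append({"identifier": ident, "titles": titles, "creators": creators})
    return records


if __name__ == "__main__":
    print(list_records_url("http://repository.example.org/oai"))
    for record in parse_records(SAMPLE_RESPONSE):
        print(record)
```

Because every compliant data provider exposes the same small set of Dublin Core elements, the same parsing routine can be run against any number of repositories and the results merged into one index, which is what allows them to be searched as a single collection.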

The Open Archives website maintains a growing list of tools implemented by members of the OAI community. These cover the needs of both data providers and service providers and can be found at http://www.openarchives.org/tools/tools.html.

Data providers[8]

By implementing the OAI technical framework, data providers (such as institutional repositories and discipline-specific repositories) provide a submission mechanism, a long-term storage system and a means of exposing metadata for harvesting by the service providers.

Service providers[9]

Service providers search and harvest metadata from OAI compliant data providers and use it as a basis for creating value-added services such as indexes, catalogues and portals to materials that are distributed across multiple libraries, museums, archives, and other repositories (Digital Library Federation 2001).

The search engines available from UMER’s website are Arc (http://arc.cs.odu.edu), MyOAI (http://www.myoai.com/), and OAISTER (http://oaister.umdl.umich.edu/o/oaister/). The eprint working group also investigated DP9 (http://arc.cs.odu.edu:8080/dp9/index.jsp), but making use of it required a lot of preliminary work, so it will be revisited in 2003.

Citebase (http://citebase.eprints.org/) is another tool to watch. Citebase is a citation-ranked and impact discovery service. At present it enables searches only across the arXiv, CogPrints and BioMed Central repositories, but it will expand beyond these. “The most immediate plans are to include coverage of RePEc and eprints.org repositories, the latter targeting citation indexing at institutional archives for the first time” (Hitchcock 2002a). Citebase harvests the OAI metadata records for papers in these repositories, but also extracts references from each of the papers and ranks search results based on references to papers (Hitchcock 2002a).
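
The general idea of ranking by references can be illustrated with a short Python sketch. This is not Citebase's actual algorithm, which is not described here; it simply counts how often each paper is cited by the others in a collection and orders the papers by that count. The paper identifiers and the function name are hypothetical.

```python
from collections import Counter


def rank_by_citations(references):
    """Order paper ids by how many other papers in the collection cite them.

    `references` maps each paper id to the list of paper ids it cites.
    Ties are broken alphabetically so that the ordering is deterministic.
    """
    counts = Counter()
    for paper, cited in references.items():
        counts[paper] += 0  # make sure uncited papers still appear in the ranking
        for target in set(cited):  # count each citing paper at most once per target
            if target != paper:  # ignore self-citation
                counts[target] += 1
    return sorted(counts, key=lambda p: (-counts[p], p))


if __name__ == "__main__":
    # Hypothetical reference lists extracted from three harvested papers.
    refs = {"a": ["b", "c"], "b": ["c"], "c": []}
    print(rank_by_citations(refs))
```

In a service of this kind the reference lists would come from the full text of the papers, while the descriptive details shown alongside each result would come from the harvested metadata records.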

[6] Interoperability is a broad term, touching many diverse aspects of archive initiatives, including their metadata formats, their underlying architecture, their openness to the creation of third-party digital library