Standards-Based Interfaces for Harvesting and Obtaining Assets from Digital Repositories

Total Page:16

File Type:pdf, Size:1020Kb

Standards-Based Interfaces for Harvesting and Obtaining Assets from Digital Repositories Op standaarden gebaseerde interfaces voor het verzamelen en verwerven van objecten uit digitale bewaarplaatsen Standards-Based Interfaces for Harvesting and Obtaining Assets from Digital Repositories Jeroen Bekaert Promotoren: prof. dr. ir.-architect E. De Kooning, dr. H. Van de Sompel Proefschrift ingediend tot het behalen van de graad van Doctor in de Ingenieurswetenschappen Vakgroep Architectuur en Stedenbouw Voorzitter: prof. dr. B. Verschaffel Faculteit Ingenieurswetenschappen Academiejaar 2005 - 2006 iv dedicated to P.B. (in memoriam) iii iv English Summary Various communities are deploying and maintaining digital repositories, as a serious and long-term commitment to store, safeguard, and share rich collections of digital assets. Each community has a different perspective on the design and management of digital repositories. For example, the learning community is developing repositories that focus on use and re-use of learning objects; the digital library community focuses on storage and access of scholarly articles; the cultural heritage community works on curation of digital images, and so on. This focus on particular materials and application domains has led to a variety of parallel technical approaches used in the design and implementation of repository systems, and, as a result to a serious lack of cross-community and cross-repository interoperability. The net result of this lack of interoperability is the inability to use and re-use assets stored in heterogeneous repositories in cross-repository services and service workflows. As a result, the promise and potential of a truly interconnected digital knowledge environment remains unfulfilled. This general problem presents itself at a smaller, yet significant, scale in the context of the ongoing deployment of institutional repositories. These repositories are increasingly used by universities and research institutions as tools to take on the stewardship of the materials created by their constituency, and to make them accessible to the general public. Several institutional repository systems have been developed and adopted successfully, including v DSpace (Massachusetts Institute of Technology & Hewlett-Packard Labs), Fedora (Cornell University & University of Virginia), and EPrints (University of Southampton). All of these institutional repository systems are conceived as autonomous software components and their content is accessed using diverse proprietary interfaces. This lack of common repository interfaces hinders the realization of the vision of a new scholarly communication system that would have these institutional repositories – and other scholarly repositories – as its core components, as realizing that vision would require the development of federations of such scholarly repositories, the provision of cross-repository service overlays, and the expression and invocation of automated workflow applications. The study that is object of this doctoral dissertation is framed in this problem space, and focuses on the definition and design of two essential interfaces to digital repositories: An interface to enable harvesting of batches of digital assets (harvest interface) and an interface to enable obtaining disseminations of a single digital asset (obtain interface). The characteristics and requirements for such harvest and obtain interfaces have been explored over the course of two elaborate experiments. A first experiment focused on the multi-faceted use of the Open Archives Initiative Protocol for Metadata Harvesting (OAI- PMH) and NISO’s OpenURL Framework for Context-Sensitive Services as information access protocols in a repository architecture designed and implemented for ingesting, storing, and accessing a vast collection of digital assets at the Research Library of the Los Alamos National Laboratory. A second experiment aimed at designing and implementing a robust solution for the recurrent and accurate transfer of digital assets from the repository of the American Physical Society to the aDORe environment of the Los Alamos National Laboratory; again the OAI-PMH played a significant role in the deployed solution. The insights obtained from these experiments are generalized and they result in the definition of generic harvest and obtain interfaces that could be supported across heterogeneous repositories. The proposed protocol-based interfaces operate in a manner that is neutral to the way in which a digital repository stores, manages and identifies its digital assets, and they build upon existing technologies that have recently received considerable buy-in from the digital library and scholarly information communities. These technologies are the OAI-PMH, NISO’s OpenURL Framework and several XML-based techniques for the representation and packaging of compound digital assets. By proposing these repository interfaces, the author hopes to contribute to the ongoing global discourse aimed at reaching new levels of interoperability across systems that store valuable content, and hence towards the realization of a richer scholarly information environment. vi Nederlandstalige Synthese Digitale bewaarplaatsen voor het ontsluiten, opslaan en langdurig bewaren van datacollecties hebben de laatste tien jaar een sterke toename gekend in uiteenlopende disciplines en vakdomeinen. Elke discipline heeft haar eigen visie op het ontwerp en beheer van digitale bewaarplaatsen. Zo zijn er bewaarplaatsen die zich richten op het gebruik en hergebruik van elektronisch leer- en studiemateriaal, digitale bibliotheken voor het stockeren en toegankelijk maken van onderzoeksartikels, bewaarplaatsen voor het ontsluiten van gedigitaliseerd cultureel erfgoed, enzovoort. Deze evolutie kan ongetwijfeld positief worden genoemd, zij het niet dat deze verscheidenheid aan visies en materialen tot een veelvoud aan parallelle technologische oplossingen voor het ontwerpen en ontwikkelen van digitale bewaarsystemen heeft geleid. Het gevolg hiervan is een gebrek aan interoperabiliteit van de verscheidene bewaarsystemen en disciplines. Het gemis aan eenvormigheid en coherentie leidt tot een onvermogen om objecten, die in heterogene digitale bewaarplaatsen gestockeerd worden, te gebruiken en hergebruiken in systeemoverschrijdende online-diensten en werkschema- toepassingen. De mogelijkheden van een sterk geconnecteerde digitale kennisomgeving worden bijgevolg niet gerealiseerd, laat staan optimaal benut. Deze problematiek komt tot uiting op een kleinere, maar belangrijke schaal in de huidige, sterke ontwikkeling van zogenaamd institutionele digitale bewaarplaatsen. Deze laatste vii worden door universiteiten en andere onderzoeksinstellingen meer en meer als hulpmiddel gebruikt om eigen onderzoeksresultaten en -artikels intern te bewaren en via het Internet toegankelijk te maken voor externe geïnteresseerden. Voorbeelden van kant-en-klare softwarepakketten voor de uitbouw van dergelijke institutionele bewaarplaatsen zijn DSpace (Massachusetts Institute of Technology & Hewlett-Packard Labs), Fedora (Cornell University & University of Virginia) en EPrints (University of Southampton). Elk van deze systemen wordt autonoom ontwikkeld en ontsloten aan de hand van onderling sterk verschillende technische specificaties. Door het gebrek aan uniforme interfaces wordt de realisatie van nieuwe wetenschappelijke communicatienetwerken, gebaseerd op institutionele of andere bewaarplaatsen, aanzienlijk gehinderd. Dergelijke netwerken zijn immers sterk afhankelijk van de ontwikkeling van federaties van bewaarplaatsen en van de uitbouw van online-diensten die de verschillende bewaarplaatsen overkoepelen. Het onderzoek dat in het kader van deze doctorale studie werd uitgevoerd, moet binnen deze interoperabiliteitsproblematiek worden gesitueerd. De studie richt zich op de definitie en het ontwerp van een aantal generieke interfaces voor heterogene digitale bewaarsystemen. Het proefschrift onderzoekt in het bijzonder interfaces voor het verzamelen van bundels van complexe objecten (harvest interfaces), en interfaces voor het verwerven van disseminaties van individuele complexe objecten (obtain interfaces). Om deze interfaces te kunnen ontwikkelen, en hun eigenschappen te verkennen, werden twee uitgebreide experimenten uitgevoerd. In een eerste experiment werd onderzoek verricht naar een digitale bewaaromgeving voor het deponeren, stockeren, verwerken en ontsluiten van de omvangrijke datacollecties van de Research Library van het Los Alamos National Laboratory. Zowel het Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) en NISO’s OpenURL Framework for Context-Sensitive Services Framework worden hier op een veelzijdige manier ingezet. Een tweede experiment beoogde de ontwikkeling van een raamwerk voor de herhaaldelijke en accurate overdracht van nieuwe en gewijzigde complexe dataobjecten van de digitale bewaarplaats van de American Physical Society naar de digitale bewaaromgeving van het Los Alamos National Laboratory. Ook hier is een belangrijke rol weggelegd voor het OAI-PMH raamwerk. De inzichten die uit beide experimenten zijn voortgevloeid, worden veralgemeend en monden uit in de definitie van interfaces voor het verzamelen en verwerven van digitale objecten. Deze interfaces zijn inzetbaar in een verscheidenheid aan digitale bewaarsystemen. De voorgestelde interfaces functioneren onafhankelijk van de wijze waarop een digitaal bewaarsysteem data- objecten stockeert, beheert en identificeert. De interfaces worden opgebouwd aan de hand van een aantal recente en eenvoudige technologieën die succesvol
Recommended publications
  • Bibliography of Erik Wilde
    dretbiblio dretbiblio Erik Wilde's Bibliography References [1] AFIPS Fall Joint Computer Conference, San Francisco, California, December 1968. [2] Seventeenth IEEE Conference on Computer Communication Networks, Washington, D.C., 1978. [3] ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, Los Angeles, Cal- ifornia, March 1982. ACM Press. [4] First Conference on Computer-Supported Cooperative Work, 1986. [5] 1987 ACM Conference on Hypertext, Chapel Hill, North Carolina, November 1987. ACM Press. [6] 18th IEEE International Symposium on Fault-Tolerant Computing, Tokyo, Japan, 1988. IEEE Computer Society Press. [7] Conference on Computer-Supported Cooperative Work, Portland, Oregon, 1988. ACM Press. [8] Conference on Office Information Systems, Palo Alto, California, March 1988. [9] 1989 ACM Conference on Hypertext, Pittsburgh, Pennsylvania, November 1989. ACM Press. [10] UNIX | The Legend Evolves. Summer 1990 UKUUG Conference, Buntingford, UK, 1990. UKUUG. [11] Fourth ACM Symposium on User Interface Software and Technology, Hilton Head, South Carolina, November 1991. [12] GLOBECOM'91 Conference, Phoenix, Arizona, 1991. IEEE Computer Society Press. [13] IEEE INFOCOM '91 Conference on Computer Communications, Bal Harbour, Florida, 1991. IEEE Computer Society Press. [14] IEEE International Conference on Communications, Denver, Colorado, June 1991. [15] International Workshop on CSCW, Berlin, Germany, April 1991. [16] Third ACM Conference on Hypertext, San Antonio, Texas, December 1991. ACM Press. [17] 11th Symposium on Reliable Distributed Systems, Houston, Texas, 1992. IEEE Computer Society Press. [18] 3rd Joint European Networking Conference, Innsbruck, Austria, May 1992. [19] Fourth ACM Conference on Hypertext, Milano, Italy, November 1992. ACM Press. [20] GLOBECOM'92 Conference, Orlando, Florida, December 1992. IEEE Computer Society Press. http://github.com/dret/biblio (August 29, 2018) 1 dretbiblio [21] IEEE INFOCOM '92 Conference on Computer Communications, Florence, Italy, 1992.
    [Show full text]
  • UK Research Data Registry Mapping Schemes
    UK Research Data Registry Mapping Schemes Project Information Project Identifier PID TBC Project Title UK Research Data (Metadata) Registry Pilot Project Hashtag #jiscrdr Start Date 1 October 2013 End Date 31 March 2014 Lead Institution Jisc Project Director Rachel Bruce Project Manager — Contact Email — Partner Institutions Digital Curation Centre (Universities of Edinburgh, Glasgow, Bath); UKDA (University of Essex) Project Webpage URL http://www.dcc.ac.uk/projects/research-data-registry-pilot Programme Name Jisc Capital Programme Document Information Author(s) and Role(s) Alex Ball (Metadata Coordinator) Date 12 May 2014 Project Refs T3.1; D3.1 Filename uk-rdr-mapping-v09.pdf URL http://www.dcc.ac.uk/sites/default/files/documents/ registry/uk-rdr-mapping-v09.pdf Access This report is for general dissemination Document History Version Date Comments 01 29 November 2013 Summary of RIF-CS. First draft of DDI mapping. 02 8 January 2014 Second draft of DDI mapping. First draft of DataCite mapping. 03 22 January 2014 New introduction. Expanded summary of RIF-CS. First draft of EPrints ReCollect, NERC Discovery, and OAI-PMH Dublin Core mappings. 04 23 January 2014 Section on tips for contributors. 05 31 January 2014 Second draft of NERC Discovery (UK GEMINI), EPrints mappings. 06 12 February 2014 Third draft of DDI mapping. 07 18 March 2014 Corrections to all mappings in the light of codification. 08 21 March 2014 First draft of MODS mapping. Registry quality levels included. 09 9 May 2014 Fixed small errors. Added MODS elements to tips section. 1 Contents 1 Introduction 3 1.1 Typographical conventions .
    [Show full text]
  • Describing Data Patterns. a General Deconstruction of Metadata Standards
    Describing Data Patterns A general deconstruction of metadata standards Dissertation in support of the degree of Doctor philosophiae (Dr. phil.) by Jakob Voß submitted at January 7th 2013 defended at May 31st 2013 at the Faculty of Philosophy I Humboldt-University Berlin Berlin School of Library and Information Science Reviewer: Prof. Dr. Stefan Gradman Prof. Dr. Felix Sasaki Prof. William Honig, PhD This document is licensed under the terms of the Creative Commons Attribution- ShareAlike license (CC-BY-SA). Feel free to reuse any parts of it as long as attribution is given to Jakob Voß and the result is licensed under CC-BY-SA as well. The full source code of this document, its variants and corrections are available at https://github.com/jakobib/phdthesis2013. Selected parts and additional content are made available at http://aboutdata.org A digital copy of this thesis (with same pagination but larger margins to fit A4 paper format) is archived at http://edoc.hu-berlin.de/. A printed version is published through CreateSpace and available by Amazon and selected distributors. ISBN-13: 978-1-4909-3186-9 ISBN-10: 1-4909-3186-4 Cover: the Arecibo message, sent into empty space in 1974 (image CC-BY-SA Arne Nordmann, http://commons.wikimedia.org/wiki/File:Arecibo_message.svg) CC-BY-SA by Widder (2010) Abstract Many methods, technologies, standards, and languages exist to structure and de- scribe data. The aim of this thesis is to find common features in these methods to determine how data is actually structured and described. Existing studies are limited to notions of data as recorded observations and facts, or they require given structures to build on, such as the concept of a record or the concept of a schema.
    [Show full text]
  • Index of /Sites/Default/Al Direct/2007/April
    AL Direct, April 4, 2007 Contents U.S. & World News ALA News Booklist Online D.C. Update Division News Round Table News Awards Seen Online Tech Talk April 4, 2007 Actions & Answers Poll Calendar U.S. & World News Axed library media teachers protest reassignment School librarians and their boosters are protesting a March 13 decision by the board of the Madera (Calif.) Unified School District to eliminate entirely as of the 2007–2008 academic year the category of library media teacher from its roster of certificated posts and to reassign to classrooms the four media specialists who serve MUSD’s three middle schools and two high schools.... Is accessibility a Salt Lake County rescinds “One Book” concern for you? Find selection, author invitation out how to meet your Two weeks after he was notified in January that his communication or 2004 novel An Unfinished Life had been selected by physical requirements. Salt Lake County (Utah) Library Services for its “One County, One Book” reading program, author Mark Spragg received an e-mail from the library informing him that the book’s selection and the library’s invitation to speak at an October event had been rescinded.... ALA News ALA to co-sponsor advocacy programs at TLA Annual Conference In partnership with Texas Library Association and Texas Woman’s University, ALA will co- sponsor two advocacy programs on Thursday, April 12, during TLA’s Annual Conference held April 11-14, in San Antonio. The programs Getting teens to read —“The ABCs of Advocacy” and “Creating Advocacy Leaders: An for fun is the ultimate Advocacy Institute Program”—are part of ALA’s Advocacy Institute challenge, yet research initiative...
    [Show full text]
  • Access Interfaces for Open Archival Information Systems Based on the OAI-PMH and the Openurl Framework for Context-Sensitive Services
    Access Interfaces for Open Archival Information Systems based on the OAI-PMH and the OpenURL Framework for Context-Sensitive Services Jeroen Bekaert1,2, and Herbert Van de Sompel1 1 Digital Library Research & Prototyping Team, Los Alamos National Laboratory, MS P362, PO Box 1663, Los Alamos, NM 87544-7113, US {jbekaert, herbertv}@lanl.gov 2 Dept. of Architecture and Urbanism, Faculty of Engineering, Ghent University, Jozef-Plateaustraat 22, 9000 Gent, Belgium {jeroen.bekaert}@ugent.be Abstract. In recent years, a variety of digital repository and archival systems have been developed and adopted. All of these systems aim at hosting a variety of compound digital assets and at providing tools for storing, managing and accessing those assets. This paper will focus on the definition of common and standardized access interfaces that could be deployed across such diverse digital respository and archival systems. The proposed interfaces are based on the two formal specifications that have recently emerged from the Digital Library community: The Open Archive Initiative Protocol for Metadata Harvesting (OAI-PMH) and the NISO OpenURL Framework for Context-Sensitive Services (OpenURL Standard). As will be described, the former allows for the retrieval of batches of XML-based representations of digital assets, while the latter facilitates the retrieval of disseminations of a specific digital asset or of one or more of its constituents. The core properties of the proposed interfaces are explained in terms of the Reference Model for an Open Archival Information System (OAIS). 1. Introduction The Open Archival Information System (OAIS) Reference Model [1] addresses a wide range of information preservation functions, including ingest, storage, management, and access.
    [Show full text]
  • Describing Data Patterns. a General Deconstruction of Metadata Standards 2013
    Repositorium für die Medienwissenschaft Jakob Voß Describing Data Patterns. A general deconstruction of metadata standards 2013 https://doi.org/10.25969/mediarep/4151 Veröffentlichungsversion / published version Hochschulschrift / academic publication Empfohlene Zitierung / Suggested Citation: Voß, Jakob: Describing Data Patterns. A general deconstruction of metadata standards. Berlin: Humboldt-Universität zu Berlin 2013. DOI: https://doi.org/10.25969/mediarep/4151. Erstmalig hier erschienen / Initial publication here: https://doi.org/10.18452/16794 Nutzungsbedingungen: Terms of use: Dieser Text wird unter einer Creative Commons - This document is made available under a creative commons - Namensnennung - Weitergabe unter gleichen Bedingungen 4.0/ Attribution - Share Alike 4.0/ License. For more information see: Lizenz zur Verfügung gestellt. Nähere Auskünfte zu dieser Lizenz https://creativecommons.org/licenses/by-sa/4.0/ finden Sie hier: https://creativecommons.org/licenses/by-sa/4.0/ Describing Data Patterns A general deconstruction of metadata standards Dissertation in support of the degree of Doctor philosophiae (Dr. phil.) by Jakob Voß submitted at January 7th 2013 defended at May 31st 2013 at the Faculty of Philosophy I Humboldt-University Berlin Berlin School of Library and Information Science Reviewer: Prof. Dr. Stefan Gradman Prof. Dr. Felix Sasaki Prof. William Honig, PhD This document is licensed under the terms of the Creative Commons Attribution- ShareAlike license (CC-BY-SA). Feel free to reuse any parts of it as long as attribution is given to Jakob Voß and the result is licensed under CC-BY-SA as well. The full source code of this document, its variants and corrections are available at https://github.com/jakobib/phdthesis2013.
    [Show full text]
  • LCC Principles of Identification V1 1 April 2014 with Appendices
    Linked Content Coalition Principles of Identification v1.1 Principles of identification Version 1.1, April 2014 Editors Norman Paskin ( [email protected] ), Godfrey Rust ( [email protected] ) An identifier is a name which is unique within its type and domain. This document comprises the LCC recommendations for the design and use of identifiers within the digital network in the content and rights supply chain. Detailed support for the recommendations is provided in the attached appendixes. These recommendations are presented as a model of best practise for identification to support the highest level of automation, interoperability, trust and accuracy within the network. They are not mandatory in the sense that none is legally or systematically enforceable for all identifier types, and failure to comply will not normally block the supply chain entirely, but make its operation more time- consuming, labour-intensive and error-prone. “The digital network” here includes, but is not limited to, the internet. 1 Entities: what should be identified? Public, persistent identification of key supply chain entities is essential. • Each entity which needs to be recognised distinctly in the digital network should be assigned at least one persistent public identifier so that it may be denoted unambiguously wherever that is required or useful. The entity denoted by an identifier is known as its referent. • A public identifier is one that is accessible and recognisable by people or machines within the digital network. • Key entities which require identifiers include each item of content (“ creation ”) which needs to be recognised (at whatever level of granularity is required), and each party (person or organization) who is recognised as, or claims to be, a contributor or rightsholder of content or an asserter of metadata.
    [Show full text]
  • Internet Bible As Is...(April 2011)
    * * * 1:1 [kjv] In the beginning was the Word, and the Word was with God, and the Word was God. [bbe] From the first he was the Word, and the Word was in relation with God and was God. 1:2 [kjv] The same was in the beginning with God. [bbe] This Word was from the first in relation with God. 1:3 [kjv] All things were made by him; and without him was not any thing made that was made. The Gospel from John, New Testament, The Holy Bible In computing, word is a term for the natural unit of data used by a particular computer design. A word is simply a fixed sized group of bits that are handled together by the system. The number of bits in a word (the word size or word length) is an important characteristic of computer architecture. The size of a word is reflected in many aspects of a computer's structure and operation; the majority of the registers in the computer are usually word sized and the amount of data transferred between the processing part computer and the memory system, in a single operation, is most often a word. The largest possible address size, used to designate a location in memory, is typically a hardware word (in other words, the full-sized natural word of the processor, as opposed to any other definition used on the platform). http://en.wikipedia.org/wiki/Word_(computing) Internet Bible as is....(April 2011) It was compiled and written by Igor ([email protected], http://www.appservgrid.com) on 04/07/2011 year of Anno Domini 0/0/42 years ACE (after Common Era appearing of RFC-1 at 04/07/1969 A.D.) at the Town like "Holy Sacrament" and "Es como el sagrado sacramento" ( http://en.wikipedia.org/wiki/Sacramento,_California ) in honour of names of greatest Vinton Gray Cerf and Tim Berners-Lee and thousands others.
    [Show full text]
  • LCC Identifiers Workstream Paper V0 2 Nov 7 2012
    Linked Content Coalition LCC identifiers specification v0.2 LCC Identifiers workstream Identifiers specification Version 0.2 | November 2012 These are the identifier specifications recommended for LCC. Detailed support for the recommendations is provided in the attached appendix. 1. Identification is essential • Each entity which needs to be recognised distinctly in the network should have at least one public identifier. • Types of entities here include those identified in the LCC Rights Reference Model: Party, Creation, Place, Context, Right, RightsAssignment, Assertion, RightsConflict . • In particular, this includes each item of content which needs to be recognised (at whatever level of granularity is required), and each person or organization who is recognised as (or claims to be) a contributor or rightsholder of content (an "interested party"). • A public identifier is not necessarily humanly readable: "public" means that it is accessible to people or machines within the network. 2. Identifiers should be resolvable • A resolvable identifier in the digital network is one that enables a system to locate the identified resource, or some metadata or service related to it, elsewhere in the network. • Because the World Wide Web is the dominant network using the Internet, then it is a minimum requirement to support that, and a potential requirement to support other networks. This in effect recognises the URI as a practical common framework for global digital content identification. First class identifiers may still be used where appropriate and expressed as URIs where necessary. The URI syntax can incorporate existing standard or proprietary identifiers while remaining globally unique, and much technology already exists for recognising and resolving URIs in various ways.
    [Show full text]
  • Options for Improving Access to Legislative Records: a White Paper
    Preserving State Government Digital Information Minnesota Historical Society Options for Improving Access to Legislative Records: A White Paper Abstract Access to information has been greatly enhanced by the ubiquity of the web. But there are limitations that impede even greater access. Information stored in proprietary formats or dynamically generated content may be unreadable by web search engines, thus making certain web content invisible to searchers. Persons using assistive technologies may have difficulty interpreting certain content elements. Approaches to making records accessible are compared and relative strengths and weaknesses of selected tools and file formats discussed. Ways to facilitate aggregation and analysis which can improve access while leveraging user-created value are also considered. Any comments, corrections, or recommendations may be sent to the project team, care of: Nancy Hoffman Project Analyst Minnesota Historical Society [email protected] / 651.259.3367 Overview of Access Considerations Legislative records that are only available in paper form clearly limits access to those citizens who can get their hands on a copy of the printed document. Now that the internet “has become the de- facto platform of interoperability and search engines have emerged as primary information portals,” 1 information made available on the web offers the promise of making these same documents and data easier to obtain and use. Electronic records may still pose serious barriers to access and use. They will have many of the same limitations as paper records if they are only held on internal network systems and not made available on the web. Even exposing records on the internet will not automatically make them easy to find and use.
    [Show full text]
  • OAI 4 Issues in Managing Persistent Identifiers
    OCLC Online Computer Library Center OAI 4 CERN Issues in Managing Persistent Identifiers Stuart Weibel Senior Research Scientist October, 2005 In the digital world… y Unambiguous identification of assets in digital systems is key: y Physical y Digital y Conceptual y Knowing you have what you think you have y Comparing identity (referring to the same thing) y Reference linking y Managing intellectual property What do we want from Identifiers? y Global uniqueness y Authority y Reliability y Appropriate Functionality (resolution and sometimes other services) y Persistence – throughout the life cycle of the information object The Identifier Layer Cake • Identifiers come in many sizes, flavours, and colours… what questions do we ask? Social Business Policy Application Functionality The Web: http…TCP/IP…future infrastructure? Social Layer y The only guarantee of the usefulness and persistence of identifier systems is the commitment of the organizations which assign, manage, and resolve identifiers y Who do you trust? y Governments? y Cultural heritage institutions? y Commercial entities? y Non-profit consortia? y We trust different agencies for different purposes at different times Business layer y Who pays the cost? y How, and how much? y Who decides (see governance model)? y The problem with identifier business models… y Those who accrue the value are often not the same as those who bear the costs y You probably can’t collect revenue for resolution y Identifier management generally needs to be subsidiary to other business processes Policy
    [Show full text]
  • Document History 1. Definition of the Subject 2. Keywords
    State of the Art Report (StoAR) Name: <StoAR_Persistent_Identifiers> Work Package: <Tools> Responsible: <Ahmed Rahali> Document history Date Version Status Author(s) Changes / comments <20.10.2004> 1 In progress ARA <13.01.2005> 1 First version ARA 1. Definition of the subject Persistent identification is regarded as an increasingly important aspect for strategies aimed to ensure effective management of digital information resources and their long-term access. The assignment of unique identifiers allows digital objects to be labelled or referenced in such a way that they can be reliably found over time in a dynamic distributed information environment. To date, various schemes have been suggested as a global approach to persistent identification, including the URN (Uniform Resource Name), the DOI (Digital Object Identifiers), the PURL (Persistent URL), the Handle System, the ARK (Archival Resource Key) and the XRI (eXtensible Resource Identifier). Each of these approaches came to birth as a result of progressive ongoing research and has succeeded, in varying degrees, to build communities around it. This document is aimed to provide an overview of current activity in the field of persistent identifiers and an understanding of the diverse persistent identification strategies being used and developed. The document is to serve as a reference for review and evaluation of all these various schemes and as a basis for recommendations of relevance to the eSciDoc project. 2. Keywords Persistent identifier, Persistent identification, Persistent link, Digital identifier, Persistence, Uniform Resource Name, Uniform Resource Identifier, Uniform Resource Locator, National Bibliography Number, Persistent URL, Handle, Digital Object Identifier, Archival Resource Key, eXtensible Resource Identifier, URI, URL, URN, NBN, PURL, DOI, ARK, infoURI, XRI.
    [Show full text]