OAI-PMH for Absolute Beginners: a Non-Technical Introduction
Tutorial 1
OAI and OAI-PMH for absolute beginners: a non-technical introduction

Monica Duke, UKOLN, University of Bath, United Kingdom
Philip Hunter, UKOLN, University of Bath, United Kingdom

Overview of the morning
- Overview and introductions
- Part I: History and overview
- Short break (10.30 am)
- Quiz
- Part II: Main ideas of the OAI-PMH
- Part III: Implementation issues

CERN Workshop on Innovations in Scholarly Communications (OAI3), 12th-14th February 2004

Acknowledgements
- These slides have a long history!
- Many of them have been kindly donated by (taken from!):
  - Herbert Van de Sompel
  - Carl Lagoze
  - Michael Nelson
  - Simeon Warner
  - Andy Powell
  - Pete Cliff
  - Uwe Muller
  - (and others probably!)

Tutorial 1: OAI and OAI-PMH for absolute beginners
An introduction to the Open Archives Initiative and the Protocol for Metadata Harvesting
Part I: History and basic concepts

The Open Archives Approach
- Facilitates access to heterogeneous web-accessible material
- A low-barrier interoperability solution
- Based on repositories supporting:
  - Metadata sharing
  - Publishing
  - Archiving
- Arose out of the e-print community
- Two main features:
  - Open Archives Initiative
  - OAI Protocol for Metadata Harvesting (OAI-PMH)

The Open Archives Initiative
- Mission: "The Open Archives Initiative develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content."
- An Executive for management, plus Steering and Technical Committees
- Funding:
  - Digital Library Federation (DLF)
  - National Science Foundation (NSF)
  - Coalition for Networked Information (CNI)
- Participation of a world-wide community, especially in Europe and North America

OAI-PMH
- A mechanism for harvesting
- Data providers make metadata available for harvesting
- Service providers harvest metadata
- Metadata can be centrally collected or "aggregated"
- That's all it is: a way to bring metadata together in one place!

Open Archives Forum Tutorial
- Task List Page
- Task 1: Seven key definitions
- Local link: file:///D:/Moni/OAFTutorial/page1.htm#section3
- Web link: http://www.oaforum.org/tutorial/english/page1.htm#section3

A History Lesson: Roots of OAI
- Early activity: scholarly research (e-print archives)
  - XXX (arXiv): high-energy physics
  - CogPrints: psychology
  - NCSTRL: computer science technical reports
  - RePEc: economics
- Web interfaces for people
  - No machine interfaces
- Different interfaces for different archives
- End users forced to learn diverse interfaces
- Little or no autonomous metadata sharing

Santa Fe Meeting
- "…the joint impact of these and future initiatives can be substantially higher when interoperability between them [e-print archives] can be established…" [Ginsparg, Luce, Van de Sompel, UPS Call, July 1999]

The Problems
Two problems:
- End users were (and still are) faced with multiple search interfaces, which made resource discovery harder.
- No machine-based way of sharing the metadata

Cross Search?
- US digital library experience suggests that cross-searching doesn't scale: N > 100 targets = bad!
- Collection description: knowing which target to use
- Query language and search attribute variation
- Rank merging problem
- Targets of different sizes and types can skew results
- Performance is limited to that of the slowest target
- Difficult to build a browse interface
SOLUTION: get all the metadata records in one place

Harvest?
- Harvest records out of archives into one place
- Universal Preprint Service (UPS) prototype
So:
- N = 1 most of the time…
- One query language, one set of search attributes and one ranking algorithm
- An awareness of the data makes browse structures easier to build
- UPS was quickly renamed OAI, the Open Archives Initiative

Data and Service Providers
- Data Provider
  - Creators and keepers of the metadata and repositories of resources
  - Handle deposit and publishing
- Service Provider
  - Harvesters of metadata for the purpose of providing a service such as a search interface, peer-review system, etc.
- One 'service' can play both roles

The Dawn of a Protocol
To facilitate metadata harvesting there needs to be agreement on:
- Transport protocol: HTTP or FTP or …
- Metadata format: Dublin Core or MARC or …
- Metadata quality assurance: mandatory element set, naming and subject conventions, etc.
- Intellectual property and usage rights: who can do what with what?
- Agreement led to (fanfare): the Santa Fe Convention

The Santa Fe Convention
- First incarnation of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
- Drew upon:
  - The UPS prototype
  - RePEc/SODA: the service/data provider model
  - The Dienst protocol
  - Work of the Santa Fe group
- Aim: to "optimise the discovery of e-prints"

The OAI-PMH 1.0
- Introduced the Dublin Core element set
- Drew upon:
  - Santa Fe Convention
  - Digital Library Federation meetings
  - Work at Cornell
  - Feedback from alpha-testers
- A new focus: to facilitate the discovery of "document-like objects"

The OAI-PMH 1.0 - Summary
- Low-barrier interoperability specification
- Based around the metadata harvesting model
- Focus on "document-like objects"
- HTTP based
- GET / POST requests
- XML responses
- Uses unqualified Dublin Core
- Not a search protocol!
- Experimental

The OAI-PMH 1.1
- A revision of the 1.0 specification taking account of changes to the emerging XML Schema specification

The OAI-PMH 2.0
- Major revision: not backward compatible with 1.x
- Drew upon:
  - OAI-PMH 1.x
  - Feedback from the OAI Implementers list
  - OAI tech deliberation
  - Feedback from alpha-testers
- New focus: "the recurrent exchange of metadata about resources between systems"

The OAI-PMH 2.0 - Summary
- Still a low-barrier interoperability specification
- Based around the metadata harvesting model
- Metadata about resources
- HTTP based
- GET / POST requests
- XML responses
- Uses unqualified Dublin Core
- Not a search protocol!
- Stable: OAI has committed to making subsequent revisions of the protocol backward compatible

            Santa Fe convention  OAI-PMH v.1.0/1.1        OAI-PMH v.2.0
nature      experimental         experimental             stable
verbs       Dienst               OAI-PMH                  OAI-PMH
requests    HTTP GET/POST        HTTP GET/POST            HTTP GET/POST
responses   XML                  XML                      XML
transport   HTTP                 HTTP                     HTTP
metadata    OAMS                 unqualified Dublin Core  unqualified Dublin Core
about       eprints              document-like objects    resources
model       metadata harvesting  metadata harvesting      metadata harvesting

Multiple data and service providers
[Diagram: data providers; harvesting based on OAI-PMH; service providers]

Aggregators
[Diagram: data providers; aggregator; service providers]

Can be mixed with cross-searching
[Diagram: data providers; harvesting based on OAI-PMH; searching based on Z39.50 or SRW; service providers]

The Benefits of OAI-PMH
- Simple
- Web (and so firewall) friendly
- Access control, compression, error codes, etc. based on HTTP
- Many toolkits, which can hide the protocol from developers
- Multiple service providers (SPs) can harvest from multiple data providers (DPs), ensuring a wider spread of metadata
- A base layer to build other services on
- Complements search protocols like Z39.50

Summary So Far
- Early movers developing separately
- Need for interoperability
- Santa Fe Meeting led to OAI
- OAI promotes interoperability via:
  - OAI-PMH
    - Low cost
    - Harvest model
    - Data Providers / Service Providers
    - Simple, easy and built on existing technology
    - An open standard

Open Archives Forum Tutorial
- Task Page
- Task 2: Sources of further information
- Local link: file:///D:/Moni/OAFTutorial/page2.htm#section9
- Web link: http://www.oaforum.org/tutorial/english/page2.htm#section9

Tutorial 1 OAI and OAI-PMH for absolute beginners An introduction