arXiv:cs/0307008v1 [cs.DL] 4 Jul 2003 010/77801499.Junlvrin2003/03/14, version 03:24:35 Journal 10.1108/07378830310479794. aaHretn OIPH Lgz,20a stersl falmos of development. result and the experimentation is of Protoc 2002a) (Lagoz (Lagoze, Initiative (OAI-PMH) applicable Archives Harvesting widely Open data exten more application-neutral, was be OAI current, the to The of generalized communica During scope protocol scholarly the development associated archives. improving and eprint discussion of of between intention year interoperability Conve the Fe improved with Santa through the 2000), Fe and Sompel, Santa 1999) 1999 de (Ginsparg, the from meeting born Service was (OAI) Initiative Archives Open The Introduction Communication arly fteOI n euti htteeaeagoignme fern a eprint of number OAI-PMH. growing the a via are available there de is that the metadata is in which result roles One active OAI. play the to of continued have participants original the et)ueoln ehd ofidpitmtrasfrrsac (spr research for materials unde print and find graduate to (faculty, methods online respondents use of dents) 87% that reported optn n nomto cec,CrelUiest,N,USA. NY, University, Cornell Science, Information and Computing ∗ Keywords: hl h ou fteOIhsbodndt nld oeta ute just than more include to broadened has OAI the of focus the While eetsuy(relne,20)o h coal nomto en information scholarly the of 2002) (Friedlander, study recent A ulse sLbayH eh oue2,Nme ,1118(2 151-158 2, Number 21, Volume Tech, Hi Library as Published avsigmyb sdt ute mrv h tlt fepri infrastructu of communication utility scholarly the the improve of further component a to as used wh f be situations protocol several usi may discuss OAI harvesting services then the of I article using (OAI-PMH). and this repositories harvesting repositories, In eprint eprint from providers. sti OAI harvested data repositories of OAI eprint survey broadened, of brief fraction been significant has OAI a repositories. the eprint of between scope interoperability promote to pit n h pnAcie Initiative Archives Open the and Eprints h pnAcie ntaie(A)wscetda practica a as created was (OAI) Initiative Archives Open The pit pnAcie ntaie eaaaHretn,Schol- Harvesting, Metadata Initiative, Archives Open Eprint, [email protected] ienWarner Simeon Abstract 1 hsvrin ae 2003/07/04 Date: version: this lhuhthe Although r metadata ere gmetadata ng re. rmetadata or lrepresent ll rsn a present I tarchives nt gaut stu- rgraduate he years three t a between ead lfrMeta- for ol e n the and ded way l to (Van ntion 0) DOI: 003), cie for rchives velopment vironment ,2002c). e, Universal h first the ∗ prints, tion online databases, library finding aids, search engines, subject directories and other forms). This suggests that the inclusion of OAI-PMH harvested meta- data for in library finding aids, search engines and subject directories would increase the impact of eprints – with the added advantage for users that they can also access the material online. The same study reported that the 74% of respondents said that online access was their preferred access method for electronic journal articles. (As opposed to using a library terminal, or personal holdings.)

What is an eprint?

Different writers use the terms eprint in more or less general senses. Some imply a very general meaning, for example: “An Eprint Archive is a collection of digital documents” [1], which would include a private digital library or a proprietary electronic journal with restricted access. Others restrict the term to author-self archived electronic documents, and yet others apply the term only to author- self archived pre-prints. The term was originally used in the announcement of “Algebraic Geometry E-Prints” at Duke. Paul Ginsparg recounts “...originally ‘e-print’ was a pun on preprint, originally appeared on a page created by Dave Morrison at Duke in Feb 1992 for ‘Algebraic Geometry E-Prints’, the second archive based on my original hep-th csh scripts. (on that page, Dave credited his colleague Greg Lawler with coining the word.) the word ‘e-print’ then quickly devolved to meaninglessness but more recently has been rehabilitated to mean an article either in draft or final form self-archived by the author.” [2]. In this paper I use a definition similar to that given by Pinfield et al (Pinfield, 2002). I use the term eprint to group together many forms of scholarly literature for which there is to the full-content via the internet. Eprints may include: journal articles, pre-prints, technical reports, books, theses and dissertations. Eprints may or may not be refereed.

Who uses eprints?

The importance of eprints varies widely over different subject areas. Eprints have been most successful in high-energy physics where they are used as a dissemination mechanism that shortcuts the delays and access-restrictions as- sociated with conventional journals. This success is usually attributed to the pre-existing culture of sharing pre-prints (Kreitz, 1996; Kling, 2000). Kling and McKim (Kling, 2000) point out that other disciplines use electronic media in different ways and for different parts of the scientific communication process. They suggest that many of the differences between disciplines will be “durable features of the scholarly landscape”. We should thus avoid the temptation to imagine that simply extending the model of physicists’ use of eprints to other disciplines will succeed. However, the OAI framework is not dependent on one model of electronic media use and can provide a discipline independent infras-

2 tructure for metadata exchange while also supporting the exchange of discipline specific metadata.

Review and certification

Publication and review are not the same thing in general, and one does not nec- essarily imply the other. To publish means simply “to make generally known” or “to disseminate to the public” [3]. If we consider the four components of schol- arly communication described by Roosendaal and Guerts (Roosendaal, 1998) — Registration, Certification, Awareness and Archiving — registration is the com- ponent satisfied by publication, and certification may be satisfied by peer-review or some other process (perhaps more than one). The arXiv eprint archive [4] provides almost no certification and yet has transformed scholarly communication in some areas of physics. arXiv does pro- vide the registration, awareness and, to some extent, the archiving components of scholarly communication. However, even those physicists that rely on arXiv for dissemination of their research usually also rely on conventional journals to provide the certification (by peer-review) necessary to support career advance- ment and funding applications. Theses and dissertations are different from typical research articles in that they are subject to a certification process that is usually quite separate from publication and any associated revenue stream. This makes theses and dis- sertations one form of scholarly communication that already appear as eprints over a much broader range of subjects than are covered by the few successful discipline-based eprint repositories. The meaning of the certification of a the- ses or dissertation depends strongly on the reputation of the degree awarding institution so any metadata which will allow a user to assess the certification should include this information. The draft metadata format proposed by the Networked Digital Library of Theses and Dissertations (NDLTD) (Atkins, 2001) includes specific fields to describe the name, level, discipline and grantor of the degree. One hurdle for more widespread acceptance of eprints is users’ skepticism about the authenticity and credibility of information obtained from the Internet. A recent Digital Library Federation (DLF) commissioned study (Friedlander, 2002) reported that more than half of respondents say they verify the accuracy of information they obtain from the Internet. The same report shows a wide spectrum of methods used to determine the authoritativeness of information obtained from the Internet: 19% only reference known sources, 14% check with alternative sources, 13% trust the author, 9% trust the sponsoring organization or publisher, 9% trust the web site, 7% only reference academic sources provided by an accredited institution. These figures suggest that the identity of an eprint repository and the authority associated with it will be important in determining acceptance. There already exists a metadata format specifically for including “branding” information with OAI-PMH records (Lagoze, 2002b). The branding information

3 includes an icon and a link to be associated with the icon (typically the home page of the originating repository). The linked icon can be displayed by service providers which use harvested metadata. Greater adoption of this standard may help support the projection of repository identities and associated author- ity assumptions in OAI based services. This may be particularly effective for institutional repositories where the institution carries significant authority.

Eprints and the OAI

The recent SPARC white paper on institutional repositories (Crow, 2002) presents a compelling case for institutional repositories, a type of eprint archive, as part of an evolving scholarly publishing system. Interoperability as provided by the existing OAI infrastructure is cited as an essential infrastructure component required for the effective use of such repositories. Perhaps the most prominent system to encourage the creation of eprint repositories is the EPrints software [5], the goal of which is to promote au- thor self-archiving and institutional archives. Exchange of metadata via the OAI-PMH is a key element of the EPrints software and has been built into the system since its first version. At the very least, by sharing metadata, individual repositories will be included in the ‘union catalogs’ upon which OAI search and alerting services are based. In this way, institutional repositories keep the pub- lication and preservation functions with the institution, which has motivations to support these activities, while still allowing the eprints to be part of a global collection. The development of additional OAI-based services can add value to the entire collection or to selected segments. Table 1 shows that over half of the registered OAI data providers (reposi- tories) contain metadata about eprints. Determination of whether a repository contains metadata about eprints was made based on the repository name, de- scription and response to the OAI-PMH Identify verb. Only repositories reg- istered with the OAI website were included. Some repositories are registered twice because they support both versions 1.1 and 2.0 of the OAI-PMH. These duplicate entries and “test servers” were excluded from the percentages calcu- lated. Data provider type number % of total Metadata about eprints 57 54% Metadata not about eprints 30 29% Unreachable or broken 18 17% Duplicate v1.1 and v2.0 servers 9 not included Test servers 5 not included

Table 1: Survey of registered OAI data providers to estimate number that contain metadata about eprints (October 2002)

Table 2 shows the number of items (equivalently, the number of metadata

4 records in the mandatory Dublin Core format) in a sample of OAI repositories and an estimate of the number which describe eprints. The arXiv.org [4] eprint archive holds the largest collection of metadata about eprints (and the full content of those eprints) currently exported via the OAI-PMH. The OCLC xtcat repository [6] has many more records than arXiv.org, over 4 million, but exceedingly few include any means by which the full content may be accessed. Of the OAI data providers that export metadata for eprints, a significant fraction are repositories of theses and dissertations. Many of these repositories have just a few hundred records at present. The VTETD repository is a repository of electronic theses and dissertations with a significant number of records (3665) and approximately two-thirds include links to freely accessible full content.

Data provider Items Eprints Resource type arXiv [7] 212,976 212,976 articles / tech. reports / theses NCSTRLH [8] 20517 20517 articles / tech. reports NACA [9] 7549 7549 tech. reports RePEc [10] 231,822 ∼60001 articles / tech. reports/ author & institution records LTRS [11] 3002 3002 articles / tech. reports VTETD [12] 3665 24082 theses CogPrints [13] 1543 1543 articles BioMed Central [14] 1186 1186 articles CULEuclid [15] 4938 114 articles

1 Estimate provided by Christian Zimmermann after discussion on the repec-run mail- ing list http://lists.openlib.org/pipermail/repec-run/2002-November/000557.html 2 The metadata records are marked with ‘restricted’, ‘unrestricted’ or ‘mixed’. This number is the count of metadata records marked ‘unrestricted’. 3 The count of eprints includes only those articles which will remain ‘open access’, a significant additional number of articles are currently listed as ‘limited time open access’.

Table 2: Survey of selected OAI data providers with significant fraction of metadata about eprints (October 2002)

Table 3 is a survey of all registered OAI service providers. The list is domi- nated by search services, some of which have additional facilities. For example, torii [21] includes personalization and personal document storage facilities.

Metadata harvesting as a way to improve the util- ity of eprints

At present there are just a few significant eprint archives and they tend to be discipline based. This means that researchers in a particular field can know

5 Service provider Coverage Service arc [16] OAI repositories search my.OAI [17] 11 OAI DPs search with personalization Perseus [18] A few OAI repositories + search (full text for local) local OAIster [19] OAI repositories1 search iCite [20] arXiv2 search by author and citations torii [21] OAI eprints3 search, personalization, and personal document store PKP [22] OAI eprints4 search citebaseSearch [23] A few repositories5 search with local citation and impact analyses Scirus [24] A few OAI repositories + search proprietary + web NCSTRL [25] A few OAI repositories + search local6

1 Harvests from sources where full content is available digitally: the resources “have a corresponding web-based digital representation (e.g., this would not include the metadata records for slides when the slides cannot be accessed through the web).” 2 Limited to the physics section of arXiv (contains most of the submissions in arXiv). Harvests full-content outside of OAI-PMH. 3 arXiv, JHEP (not currently a registered OAI data provider), BioMed Central, M2DB (a local multi-media database). 4 Not yet operational, just a few records harvested. 5 Test service, harvests full-content PDF from arXiv, BioMed Central and CogPrints outside of the OAI-PMH for reference extraction. 6 Harvests from institutional CS technical-report repositories and includes the locally stored, historical NCSTRL collection.

Table 3: Survey of end-user services provided by registered OAI service providers (October 2002)

6 about the one or two archives appropriate to their interests, and services using metadata from eprint archives can manually select appropriate archives to har- vest from. If eprint archives become more numerous then these conditions will no longer hold. The OAI and, more recently, the Budapest Open Access Initiative (Chan, 2002) has spurred growth in the number of institutional archives. Presumably some institutional archives will combine electronic theses and dissertations with research articles. There are currently more electronic theses and dissertations archives than general institutional archives registered as OAI data providers. This is perhaps to be expected because efforts to encourage the use of electronic theses and dissertations have been going for longer. These efforts also have the advantage that universities usually have considerably more control over students than over faculty, some even mandate the electronic submission of theses and dissertations. It is likely that universities will have to put significant effort into promoting general institutional archives and assisting faculty in using them. The OAI-PMH is designed to support automation and this feature will be- come more important as the number of OAI data providers increases. In the next few sections I highlight a few areas where the OAI metadata harvesting infrastructure may improve the utility of eprints and eprint archives.

Discovery There are already several search engines based all or in-part on OAI harvested metadata. Some of these services also include local or web data and clearly one can see adding appropriate OAI metadata to a local search engine as a way of adding value to the local search and helping to make that a good starting point for information discovery. Metadata based search engines are good at answering certain types of ques- tion which make use of the structure of the metadata. Trivial examples are “find documents authored by Fred Bloggs” (notwithstanding problems created by possible lack of name-authority information to associate articles with author listed as “F Bloggs” and to separate from articles authored by some other Fred Bloggs), or “find documents that cite A” (given appropriate citation metadata). However, questions such as “find documents similar to document X” are unlikely to be answered well by query engines which do not have access to the full con- tent or more complete summary information. The OAI has steered away from specifying facilities for the exchange of full content for various reasons, includ- ing: appropriate use and rights issues, concerns about resource and bandwidth use, and the desire to create a strong base-line interoperability framework with as many players as possible. One way to improve discovery tools without going as far as sharing full content is to include summary metadata. Research articles in many fields typically include author-created abstracts, in other cases it may be appropriate to augment metadata with automatically generated summaries as surrogates for the full content.

7 Grouping and classification Resources from different sources and in different subject areas are often classified according to different classification schemes. Within an interoperable framework there will be the need to provide classifications across disparate collections and to group or classify documents according to criteria that were not imagined by the creators of the source repositories. In most cases it will not be feasible to use human catalogers. In recent years there have been significant advances in systems automatic classification and clustering of text-documents (Sebastiani, 2002). While these systems cannot yet compete with well-trained human catalogers, they do pro- duce useful classifications which can be inexpensively applied to large collec- tions. A large class of automated classifiers and clustering algorithms are based on analysis of word frequencies in the abstracts or full texts of documents. An obvious way to permit such analysis is to allow full content to be harvested and indexed but this has at least two potential problems: 1) the content may be prohibitively large and harvesting it may place excessive burdens on the re- sources of data providers, and 2) if full content is harvested, the harvester must understand every format available to be able to extract the text. Generation of summaries or text-only versions of the content (without markup) may prove an effective alternative which can both reduce bandwidth requirements and use local knowledge of content and formats to do these extraction jobs well.

Reference and citation linking Citebase [23] is an impressive demonstration of a reference and citation linking service built with automatically extracted reference data from eprints. While this data can be obtained only from the full content (PDF in the case of Cite- base) of articles, the reference data may then be shared. All reference and citation data extracted by Citebase is available via the OAI-PMH, it may be harvested and used by other services. Citebase also provides a search service which can rank results by an impact measure based on citations and web usage data. There is the exciting potential to realize a globally connected web of eprints based on link extraction and identifier resolution. Metadata exchanged via OAI- PMH will be one component of such a web and may be combined with context- sensitive linking services based on the OpenURL standard (Van de Sompel, 2001).

Rights management Rights management is important both to commercial entities who wish to pro- tect their resources, and also to providers of eprints and other open-access con- tent who may still be wish to describe appropriate uses or restrictions on use. Project RoMEO [26] is specifically investigating the rights issues surrounding open-access scholarly literature in the context of the OAI.

8 Current eprint repositories often include free-text rights statements regard- ing the use of their metadata and content. The creation of tools which under- stand rights information requires the development of machine readable rights metadata. Creation and inclusion of rights metadata was been suggested sev- eral times during the development but rejected as outside the scope of the core protocol and likely to be difficult or impossible given the broad scope of OAI. This task may be tractable within the limited scope of the eprints community.

Preservation The preservation of paper copies of journals and other scholarly works has tra- ditionally been a role of university and national libraries. The situation is much less clear with commercial electronic journals and even worse for eprints. Em- pirical studies of the persistence of objects in digital libraries (Nelson, 2001) and of web references in the scientific literature (Lawrence, 2001) have reported losses of ∼ 3%. Such losses would clearly be unacceptable in a traditional li- brary. These studies remind us that many valuable digital resources, including scholarly works, are not adequately provided with preservation strategies. The LOCKSS (Reich, 2001) system illustrates one suggested approach for the preservation of web published material. Using LOCKSS, libraries act to preserve material they subscribe to by running persistent caches. In the event that material is not available from the publisher, multiple copies will remain in these persistent caches. In the subscription world there is a need for agreements between publishes and libraries to grant the libraries the right to make and use cached copies in this way. With open-access material, including eprints, there need not be agreements between individual data providers and agents preserving the content provided there is some machine readable information that says such caching is acceptable. In this way, a portal could automatically cache all material that its users access and the OAI-PMH would be one way to exchange additional metadata to support this. Indeed, such a system must be automatic if it is to work with perhaps thousands of individual repositories. The Open Archival Information System (OAIS) (OCLC/RLG Preservation Metadata Working Group, 2002) provides a comprehensive preservation meta- data framework that is likely to be too heavyweight to be implemented by eprint repositories in the near future. However, various elements described in this framework could be used to fulfill specific preservation requirements. For example, an MD5 checksum (Rivest, 1992) might be included in the metadata for a resource to allow the resource’s validity to be checked (part of the fixity requirement in OAIS).

Conclusion

There are an increasing number of eprint repositories, both discipline based and institutional. The (OAI) is a key infrastructure component that avoids individual repositories becoming isolated islands of infor-

9 mation, by supporting the creation user services as a separate layer. A number of services already provide cross-repository searching and other facilities based on OAI harvested metadata. There is considerable potential to add value to eprint repositories with new services and facilities. Some of these services will require additional metadata, and this metadata must be machine-readable so that it can used automatically. As the number of OAI repositories and services increases, it will become increas- ingly important that services do not need to create and implement a custom solution for each repository they cover. There is the realizable potential to cre- ate a global network of interoperating eprint archives with a rich set of discovery, classification and linking services based on the OAI framework.

Acknowledgments

This work is supported by the U.S. National Science Foundation under agree- ment number 0132355 and grant number IIS-9817416.

References

Atkins, Anthony, Ed Fox, Robert France and Hussein Suleman (editors) (2001), ETD-ms: an Interoperability Metadata Standard for Electronic Theses and Dissertations — version 1.00. Available: http://www.ndltd.org/standards/metadata/ETD-ms-v1.00.html

Chan, Leslie, et al (2002), Budapest Open Access Initiative. Available: http://www.soros.org/openaccess/read.shtml

Crow, Raym (2002), The Case for Institutional Repositories: A SPARC Position Paper, SPARC. Available: http://www.arl.org/sparc/IR/ir.html

Friedlander, Amy (November 2002), Dimensions and Use of the Scholarly Infor- mation Environment: Introduction to a Data Set Assembled by the Digital Library Federation and Outsell, Inc. Available: http://www.clir.org/pubs/abstract/pub110abst.html. Table 81 and section 3 - use of online methods to find print material. Table 84 and sec- tion 3 - preference for online method to find electronic journal articles. Table 44 - verification of accuracy of information obtained from internet. Table 55 - verification of authoritativeness of information obtained from internet. Ginsparg, Paul, Rick Luce, Herbert Van de Sompel (1999), The Open Archives initiative aimed at the further promotion of author self-archived solutions, Universal PrePrint Service (UPS) Meeting. Available: http://www.openarchives.org/meetings/SantaFe1999/ups-invitation-ori.htm

10 Kling, Rob and Geoffrey McKim (2002), “Not Just a Matter of Time: Field Differences and the Shaping of Electronic Media in Supporting Scientific Communication,” Journal of the American Society for Information Science. Available: http://arxiv.org/abs/cs.CY/9909008 Kreitz, P.A., L. Addis, H. Galic and T. Johnson (1996), The Virtual Library in Action: Collaborative International Control of High-Energy Physics Pre-Prints SLAC-PUB-7110. Available: http://www.slac.stanford.edu/pubs/slacpubs/7000/slac-pub-7110.html Lagoze, Carl, Herbert Van de Sompel, Michael Nelson and Simeon Warner (ed- itors) (2002), The Open Archives Initiative Protocol for Metadata Harvest- ing v2.0. Available: http://www.openarchives.org/OAI/2.0/openarchivesprotocol.htm Lagoze, Carl, Herbert Van de Sompel, Michael Nelson and Simeon Warner (ed- itors) (2002), Implementation Guidelines for the Open Archives Initiative Protocol for Metadata Harvesting — XML schema to hold branding infor- mation for collections. Available: http://www.openarchives.org/OAI/2.0/guidelines-branding.htm Lagoze, Carl and Herbert Van de Sompel (2002), The Making of the Open Archives Initiative for Metadata Harvesting, ibid Lawrence, S., F Coetzee, E. Glover, D. Pennock, G. Flake, F. Nielsen, R. Krovetz, A. Kruger and C. L. Giles (2001), “Persistence of Web References in Scientific Research” IEEE Computer, Vol. 34 No. 2, pp. 26–31 Nelson, Michael L. and B. Danette Allen (January 2001), “Object Persistence and Availability in Digital Libraries,:: D-Lib Magazine, Vol. 8 No. 1. Avail- able: http://www.dlib.org/dlib/january02/nelson/01nelson.html OCLC/RLG Preservation Metadata Working Group (June 2002), Preservation Metadata and the OAIS Information Model. Available: http://oclc.org/research/pmwg/pm framework.pdf Pinfield, Stephen, Mike Gardner, John MacColl (April 2002), “Setting up an institutional e-print archive” Ariadne, Issue 31. Available: http://www.ariadne.ac.uk/issue31/eprint-archives/intro.html Reich, Vicky and David S.H. Rosenthal (June 2001), “LOCKSS: A Permanent Web Publishing and Access System” D-Lib Magazine, Vol. 7 No. 6. Avail- able: http://www.dlib.org/dlib/june01/reich/06reich.html Rivest, R. (1992), “The MD5 Message-Digest Algorithm,” IETF RFC1321. Available: http://www.faqs.org/ftp/rfc/rfc1321.txt

11 Roosendaal, Hans E. and Peter A Th. M Guerts (1998), “Forces and functions in scientific communication: an analysis of their interplay,”, CRISP, Vol. 97. Available: http://www.physik.uni-oldenburg.de/conferences/CRISP97/roosendaal.html Sebastiani, Fabrizio (2002), “Machine learning in automated text categoriza- tion,” ACM Computing Surveys, Vol. 34 No. 1, pp. 1-47. Available: http://faure.iei.pi.cnr.it/~fabrizio/Publications/ACMCS02.pdf

Van de Sompel, Herbert and Carl Lagoze (2000), “The Santa Fe Convention of the Open Archives Initiative,” D-Lib Magazine, Vol. 6 No. 2. Available: http://www.dlib.org/dlib/february00/vandesompel-oai/02vandesompel-oai.html Van de Sompel, Herbert and Oren Beit-Arie (March 2001), “Open Linking in the Scholarly Information Environment Using the OpenURL Framework,” D-Lib Magazine, Vol. 7 No. 3. Available: http://www.dlib.org/dlib/march01/vandesompel/03vandesompel.html

Notes

[1] Self-Archiving FAQ, “Q: What is an Eprint Archive?, A: An Eprint Archive is a collection of digital documents. ...” http://www.eprints.org/self-faq/#Eprint-Archive [2] September American Scientist Forum, http://listserver.sigmaxi.org/sc/wa.exe?A2=ind99&L=september98-forum&P=21053 [3] Merriam-Webster Online, http://www.merriamwebster.com/ [4] The arXiv.org eprint archive, http://arXiv.org/ [5] GNU EPrints Software, http://software.eprints.org/ [6] OCLC Experimental Catalog, http://alcme.oclc.org/xtcat/index.html. OAI baseURL http://alcme.oclc.org/xtcat/servlet/OAIHandler [7] arXiv eprint archive, 29 October 2002: 212,976 records, all openly available. OAI-PMH baseURL http://arXiv.org/oai2 [8] NCSTRL historical collection, http://oai.dlib.vt.edu:80/~ncstrlh/cgi-bin/oai.pl [9] National Advisory Committee for Aeronautics (NACA) Technical Report Server, OAI-PMH baseURL http://naca.larc.nasa.gov/oai2.0/index.cgi [10] RePEc, OAI-PMH baseURL http://oai.repec.openlib.org [11] Langley Technical Report Server, OAI-PMH baseURL http://techreports.larc.nasa.gov/ltrs/oai2.0/

12 [12] Virginia Tech. Electronic Theses and Dissertations, 29Oct2002, 3665 records, 2408 “unrestricted”, 1134 “restricted” (sampled entries were restricted to VT campus users), 123 “mixed”. OAI-PMH baseURL http://scholar.lib.vt.edu:80/theses/OAI/cgi-bin/index.pl

[13] CogPrints eprint archive, OAI-PMH baseURL http://cogprints.soton.ac.uk/perl/oai [14] BioMed Central, OAI-PMH baseURL http://www.biomedcentral.com/oai/1.1/bmcoai.asp [15] Project Euclid, Cornell University, OAI-PMH baseURL http://ProjectEuclid.org/Dienst [16] arc OAI service provider, http://arc.cs.odu.edu/ [17] my.OAI OAI service provider, http://www.myoai.org/ [18] Perseus OAI service provider, http://www.perseus.tufts.edu/cgi-bin/vor [19] OAIster OAI service provider, http://oaister.umdl.umich.edu/o/oaister/ [20] iCite OAI service provider, http://icite.sissa.it:8888/icite/ [21] torii OAI service provider, http://torii.sissa.it/ [22] Public Knowledge Project OAI service provider, http://pkp.ubc.ca/harvester/

[23] Citebase OAI service provider, http://citebase.eprints.org/cgi-bin/search [24] Scirus OAI service provider, http://www.scirus.com [25] NCSTRL OAI service provider, http://www.ncstrl.org/ [26] RoMEO project: Rights MEtadata for Open archiving, http://www.lboro.ac.uk/departments/ls/disresearch/romeo/index.html

13