Distributed Archives and Scientometrics

Resolving an Anomaly www.eprints.org

H. G. Wells, World Brain: The Idea of a Permanent World Encyclopaedia
Encyclopédie Française, August, 1937

• Encyclopaedias of the past sufficed for the needs of a cultivated minority – universal education was unthought of – gigantic increase in recorded knowledge
• Discontent with the role of universities and libraries in the intellectual life of mankind
• Universities multiply but do not enlarge their scope – thought & knowledge organization of the world
• No obstacle to the creation of an efficient index to all human knowledge, ideas and achievements

The Optimal and Inevitable for Researchers

All of this will come to pass. The only question is “How Soon?”

• The entire full-text refereed corpus online
• On every researcher’s desktop, everywhere
• 24 hours a day
• All papers citation-interlinked
• Fully searchable, navigable, retrievable
• For free, for all, forever

Globalizing Research

[Figure: two panels, Impact and Access, each showing Harvard and The Rest separated by financial firewalls.]

The :

Sufficient to free the entire refereed corpus forever, immediately:

1. Universities install off-the-shelf, OAI-compliant software
2. Authors self-archive ( & )
3. Institutions subsidize first start-up wave of self-archiving
4. The Give-Away corpus is freed

Hypothetical Sequel:

5. Users prefer free version?
6. Publisher S/L/P revenues shrink, Library S/L/P savings grow?
7. Publishers downsize to QC/C service-providers + optional add-ons?
8. QC/C service costs funded by author-institution out of reader-institution S/L/P savings?

Five Essential PostGutenberg Distinctions:
(if you don’t make them, none of this will make sense)

1. Distinguish the non-give-away vs. give-away literature
   Litmus test: “Does the author seek a royalty/fee?”: books (yes) vs. refereed journal papers (no)

2. Distinguish income (from paper sale) vs. impact (from paper use)
   (and distinguish give-away-author imprint-income [0] vs. impact-income [??])

3. Distinguish give-away author copyright protection from: theft-of-authorship (wanted) vs. theft-of-text (unwanted)

4. Distinguish self-publishing (vanity press) vs. self-archiving (of published, refereed research)

5. Distinguish unrefereed preprints vs. refereed postprints
   “eprints” = preprints + postprints

Zeno’s Prima-FaQs
“I worry about self-archiving because…”:

1. Preservation
2. Authentication
3. Corruption
4. Navigation (info-glut)
5. Certification
6. Evaluation
7.
8. Paying the piper
9. Downsizing
10. Copyright
11. Plagiarism
12. Priority
13. Censorship
14. Capitalism
15. Readability
16. Graphics
17. Publishers’ future
18. Libraries’ future
19. Learned Societies’ future
20. University conspiracy
21. Serendipity
22. Tenure/Promotion
23. (your prima-FaQ here…)

Answers available at < http://cogsci.soton.ac.uk/~harnad/Tp/resolution.htm >

Eprints < www.eprints.org > is dedicated to freeing the research literature, pre- and post-refereeing, through author/institution self-archiving in interoperable Open Archives < www.openarchives.org >.

To help the self-archiving initiative quickly gain momentum, archive-creating software, compliant with the OAI protocol, hence fully interoperable with all other Open Archives, has been developed at the University of Southampton.

Eprints is designed to be as flexible and adaptable as possible, so that all universities world-wide can immediately adopt and configure it with minimal effort for all their disciplines’ self-archiving needs.
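Interoperability here rests on every archive answering the same harvesting requests. The following is a minimal sketch of how a service might build such a request, assuming the archive exposes an endpoint in the style of the OAI metadata-harvesting protocol (a `ListRecords` verb with a `metadataPrefix` argument). The base URL and the function name `build_list_records_url` are hypothetical, and no network call is made.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, metadata_prefix="oai_dc", from_date=None):
    """Return a harvesting URL for an OAI-style ListRecords request.

    base_url is the archive's harvesting endpoint (hypothetical here).
    from_date, if given, asks only for records added or changed since then.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


# Hypothetical archive endpoint, for illustration only.
url = build_list_records_url("http://archive.example.org/oai", from_date="2001-01-01")
print(url)
```

Because every compliant archive accepts the same request shape, a citation-linking or search service only needs the list of base URLs to harvest metadata from all of them.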

The Eprints software has been available (for free, of course) from eprints.org since December 2000.

From Linear Growth to Exponential

[Figure: Deposit Rates against Time, by Disciplines.]

• arXiv submission rates: linear growth only
• Exponential growth in archiving to catch up with paper-based research
• 30% of citations to papers deposited in arXiv
• 100% of papers archived, in all disciplines

Wells’s Global Research Database?

New OAI Services

Citation Linking & Scientometric Analysis

[Figure: Multiple Updates by LANL Subfield (based on LANL meta-data). Horizontal axis: No. of Papers with Updates, 0 to 25,000. Subfields: adap-org, astro-ph, chao-dyn, comp-gas, cond-mat, cs, gr-qc, hep-ex, hep-lat, hep-ph, hep-th, math, math-ph, nlin, nucl-ex, nucl-th, patt-sol, physics, quant-ph, solv-int. Series: No Updates, 1 Update, 2 Updates, 3 Updates, 4 Updates.]

Citation-Ranked Searches

Citation-based Visualisation

Decreasing Citation Latencies

[Figure: Frequency of Citation Latencies: 1992-1999. Vertical axis: Citations, 0 to 5000. Horizontal axis: Time Difference/Months, 0 to 96. One series per year, 92 to 99.]

• The raw data show that the latency of the citation peak has been reducing over the period of the archive

The “New Paper Rush”

[Figure: Age of paper against number of downloads. Vertical axis: Number of Downloads, 0 to 50,000. Horizontal axis: Age of Paper (days), 0 to 30.]

• Users subscribe to an email alerting service that informs them of new papers.

Article Embryology

[Figure: hep-th papers by month, 199107 to 200001. Vertical axis: Papers, 0 to 200. Series: With J-R, With J-R/Report, Report, Unknown.]

• Papers with a journal reference [J-R] cross papers without a J-R at an age of 13 months, suggesting a time difference of 13 months between pre-print and post-print

Effect of Paper Impact

• The papers were split into three sets based on the number of citations to them.
• There are an equal number of citations to the papers in the low, medium and high sets.
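One way such a split could be made is sketched below. This is an assumption about the procedure, not the method used for the slides: papers are ranked by citation count and cut wherever the running citation total passes one third and two thirds of the overall total. The function name `split_by_citation_share` is hypothetical.

```python
def split_by_citation_share(citations, n_sets=3):
    """Partition paper indices into n_sets groups with roughly equal citation totals.

    Papers are ranked from least to most cited. Each paper is placed in the
    set selected by the running citation total accumulated before it.
    """
    order = sorted(range(len(citations)), key=lambda i: citations[i])
    total = sum(citations)
    sets = [[] for _ in range(n_sets)]
    running = 0
    for i in order:
        # Integer arithmetic avoids floating-point drift at the cut points.
        k = min(n_sets - 1, running * n_sets // total) if total else 0
        sets[k].append(i)
        running += citations[i]
    return sets


# Toy data: seven papers sharing 12 citations.
toy = [1, 1, 1, 1, 2, 2, 4]
low, medium, high = split_by_citation_share(toy)
for name, members in (("low", low), ("medium", medium), ("high", high)):
    print(name, len(members), "papers,", sum(toy[i] for i in members), "citations")
```

On the toy data each set holds 4 of the 12 citations, but the low set needs four papers to get there and the high set only one, which is why the high-impact set is such a small percentage of all papers.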

Author Impact Quartiles

Quartile   Total    % Total  Citations  Papers  Mean Citations/Author/Paper  Deposits  Updates/Author
High 25%   798      2.09%    240,092    2,732   0.11                         6,720     0.48
Med 50%    9,262    24.20%   733,272    37,318  0.00212                      93,671    0.37
Low 25%    28,211   73.71%   251,925    67,951  0.000131                     165,971   0.27

• High impact authors update more than medium or low
• High and medium impact authors deposit more papers than low
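The derived columns can be checked against the raw ones. The sketch below assumes, from the numbers rather than from anything stated on the slide, that "Total" counts authors, that "% Total" is each group's share of all authors, and that "Mean Citations/Author/Paper" is Citations divided by Total and then by Papers.

```python
# Figures copied from the table; the column interpretation is an assumption.
rows = {
    "High": {"total": 798, "citations": 240_092, "papers": 2_732,
             "pct": 2.09, "mean": 0.11},
    "Med": {"total": 9_262, "citations": 733_272, "papers": 37_318,
            "pct": 24.20, "mean": 0.00212},
    "Low": {"total": 28_211, "citations": 251_925, "papers": 67_951,
            "pct": 73.71, "mean": 0.000131},
}

all_authors = sum(row["total"] for row in rows.values())

derived = {}
for name, row in rows.items():
    pct = 100 * row["total"] / all_authors
    mean = row["citations"] / row["total"] / row["papers"]
    derived[name] = (pct, mean)
    print(f"{name}: {pct:.2f}% of authors, {mean:.3g} citations/author/paper")
```

Under this reading the recomputed values agree with the reported ones to the precision shown in the table.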

Citation Quality

[Figure: Do Papers Cite Papers of Like Impact? Vertical axis: No. of Citations, 0 to 140,000. Grouped by Source Impact (High, Medium, Low) and Dest. Impact (High, Medium, Low).]

• Papers generally cite papers of like impact (χ² test underway).
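The test mentioned above can be sketched as a Pearson chi-square test of independence on the 3×3 table of source impact against destination impact. The counts below are hypothetical placeholders, not the slide's data, and `chi_square_statistic` is an illustrative helper rather than the authors' code.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if source and destination impact were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# HYPOTHETICAL counts. Rows: source impact (High, Medium, Low).
# Columns: destination impact (High, Medium, Low).
counts = [
    [50, 30, 20],
    [30, 40, 30],
    [20, 30, 50],
]
statistic = chi_square_statistic(counts)
degrees_of_freedom = (len(counts) - 1) * (len(counts[0]) - 1)
print(statistic, degrees_of_freedom)
```

With 4 degrees of freedom the 5% critical value is about 9.49, so a statistic above that would reject the hypothesis that a paper's impact is unrelated to the impact of the papers it cites.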

Citation Spread

[Figure: Histogram of Citations per Paper (author impact). Vertical axis: Papers, 0 to 40,000. Bins: No citations, 1 Citation, 2/3 Citations, 4/5/6 Citations, 7/8/9/10 Citations, 11 or more Citations. Series: High (2.53%), Medium (34.55%), Low (62.92%). Bar labels in extraction order: 30807, 13668, 11527, 2060, 6784, 9627, 3105, 6534, 4441, 6072, 5863, 4781, 138, 121, 170, 257, 249, 1797.]

• 30,000 papers were by authors with no citation
• A small number of papers receive a very large number of citations
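The histogram's unequal bins can be reproduced with a small bucketing function. This is a sketch: the bin edges follow the labels on the slide, the names `citation_bin` and `citation_histogram` are hypothetical, and the sample input is made up.

```python
from collections import Counter

BINS = [
    "No citations",
    "1 Citation",
    "2/3 Citations",
    "4/5/6 Citations",
    "7/8/9/10 Citations",
    "11 or more Citations",
]


def citation_bin(n):
    """Map a paper's citation count to its histogram bin label."""
    if n < 0:
        raise ValueError("citation count cannot be negative")
    if n == 0:
        return BINS[0]
    if n == 1:
        return BINS[1]
    if n <= 3:
        return BINS[2]
    if n <= 6:
        return BINS[3]
    if n <= 10:
        return BINS[4]
    return BINS[5]


def citation_histogram(citations_per_paper):
    """Return paper counts per bin, in the order given by BINS."""
    counts = Counter(citation_bin(n) for n in citations_per_paper)
    return [counts[name] for name in BINS]


# Made-up sample: nine papers with these citation counts.
sample_histogram = citation_histogram([0, 0, 1, 2, 3, 5, 8, 11, 250])
print(dict(zip(BINS, sample_histogram)))
```

The open-ended last bin is what hides the long tail: a paper with 11 citations and one with 250 land in the same bar.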

Effect of Paper Impact on Usage

[Figure: All Papers. Vertical axis: Frequency Density, 0 to 0.0025. Horizontal axis: Age of paper (days), 0 to 2398. Series: High (2.0%), Medium (7.7%), Low (46.5%), Unknown (39.6%).]

• Higher impact papers have a longer download life expectancy.

Correlating citations and downloads

Download type                   r          n
All Papers                      0.11155    63671
High Impact Papers (2.0%)       0.27293    1981
Medium Impact Papers (7.7%)     0.01288    5937
Low Impact Papers (46.5%)       -0.01412   30163

• There is a significant positive correlation between citations and downloads for high impact papers.
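For readers who want to reproduce this kind of figure, here is a minimal sketch of a Pearson correlation and of the usual t statistic for testing r against zero. The slide does not say which significance test was used, so the t test is an assumption. The r and n values are copied from the table, and the function names are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def t_statistic(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# (r, n) pairs as reported in the table above.
reported = {
    "All Papers": (0.11155, 63671),
    "High Impact Papers (2.0%)": (0.27293, 1981),
    "Medium Impact Papers (7.7%)": (0.01288, 5937),
    "Low Impact Papers (46.5%)": (-0.01412, 30163),
}
t_values = {name: t_statistic(r, n) for name, (r, n) in reported.items()}
for name, t in t_values.items():
    print(f"{name}: t = {t:.2f}")
```

With samples this large, a two-sided test at the 5% level needs roughly |t| > 1.96. With n in the thousands even a small r can clear that bar, so the size of r matters as much as its significance.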

Implementation Issues

• Creating new metadata
vs
• Creating new services
