arXiv@25: Exploring Future Directions and Strategies

Oya Y. Rieger, Associate University Librarian, Scholarly Resources & Preservation Services; arXiv Program Director
Martin Lessmeister, arXiv Developer
Sandy Payette, Director of IT for Research and Scholarship; arXiv CTO

CNI Fall 2016 Member Meeting

https://arxiv.org/

Cornell University Library, 2016

Organizational & Business Model

1.2 million OA e-prints in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance and Statistics
• 2012 – 84,000 new submissions – 64 million downloads
• 2013 – 92,500 new submissions – 67 million downloads
• 2014 – 97,000 new submissions – 90 million downloads*
• 2015 – 105,000 new submissions – 139 million downloads*

* The numbers are sensitive to robot downloads, and it is hard to remove all of them, so there is potentially significant over-counting; we have put less effort into cleaning up this data from 2014 on.
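As a rough illustration of the growth these figures imply, the sketch below computes year-over-year percentage change from the numbers on the slide. The helper name `yoy_growth` and the data layout are assumptions made for this example only; the download percentages inherit the robot-traffic caveat in the footnote.

```python
# Figures copied from the slide: (new submissions, downloads in millions).
# Download counts from 2014 on are likely over-counted (robot traffic).
stats = {
    2012: (84_000, 64),
    2013: (92_500, 67),
    2014: (97_000, 90),
    2015: (105_000, 139),
}


def yoy_growth(series):
    """Return {year: percent change vs. the previous year} for a {year: value} dict."""
    years = sorted(series)
    return {
        year: round(100 * (series[year] - series[prev]) / series[prev], 1)
        for prev, year in zip(years, years[1:])
    }


# Split the combined table into one series per measure.
submissions = {year: subs for year, (subs, _) in stats.items()}
downloads = {year: dl for year, (_, dl) in stats.items()}

print("submissions growth (%):", yoy_growth(submissions))
print("downloads growth (%):  ", yoy_growth(downloads))
```

Submissions grow by single-digit to low double-digit percentages each year, while the jump in downloads after 2013 is much steeper, which is consistent with the over-counting caveat rather than evidence of a real change in readership.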

arXiv submission rate statistics

http://arxiv.org/help/stats

https://confluence.cornell.edu/display/culpublic/arXiv+Sustainability+Initiative

2015 International Use Distribution

• US Edu, 24%
• Germany, 12%
• United Kingdom (UK), 11%
• Japan, 7%
• Switzerland, 7%
• France, 6%
• Italy, 4%
• Canada, 3%
• US Gov, 2%
• Spain, 2%
• Netherlands, 2%
• All Other Countries, 18%

Other Countries: Australia, Austria, Belgium, Brazil, Czech Republic, Denmark, Finland, Hong Kong, India, Israel, Poland, Russia, Singapore, Sweden, Taiwan

Business Model, 2013-2017

• Cornell University Library
– $75,000 per year in support of operational costs
– in-kind contribution of all indirect costs (37%)
• Simons Foundation
– $100,000 per year (increased to $100,000 per year in 2016)
– $300,000 per year matching grant
• Member Institution
– annual fees within $1,500-3,000 range (based on usage ranking)

arXiv Organizational Model - Cornell University Library

• Oya Rieger, Program Director (0.3FTE)
• Vacant, Scientific Director (0.4FTE)

OPERATIONS & MEMBERSHIP
• Gail Steinhart, Program Associate (0.3FTE)
• Jim Entwood, Operations Manager (1FTE)
• Chloe McLaren, Membership Program Coordinator (0.2FTE)
• arXiv administrators and students (3.1FTE)

TECHNOLOGY
• Sandy Payette, CTO (0.5FTE)
• Martin Lessmeister, Lead Developer (1.0FTE)
• arXiv developers (2.2FTE)

Key: Leadership team; Leadership & operations teams; Operations team

arXiv: Roles & Responsibilities

SCIENTIFIC ADVISORY BOARD (SAB):
• Provides advice and guidance pertaining to intellectual oversight of arXiv, with particular focus on arXiv's moderation system and criteria for depositing content.
• Proposes & reviews proposals for new subject domains.
• Makes recommendations and provides feedback on development projects.

MEMBER ADVISORY BOARD (MAB):
• Represents members’ interests.
• Advises CUL on development, business planning, outreach and advocacy.

LEADERSHIP TEAM:
• Bears overall responsibility for arXiv’s operation and development, with guidance from the MAB and SAB.
• Responsible for business and sustainability planning, collaborations and partnerships.

OPERATIONS TEAM:
• Manages moderation, submission and user support processes.
• Operates and develops arXiv’s technical infrastructure.
• Administers membership program.

CORNELL UNIVERSITY LIBRARY (CUL) ADMINISTRATION:
• Assumes overall responsibility for arXiv’s obligations.
• Provides institutional support and resources for arXiv (HR, business services, legal, etc.).
• Final arbiter for arXiv decisions.

SUSTAINABILITY

• Financial Stability
• Sharable & Persistent Data
• Discovery and Access
• Scalable and Reusable Repository Architecture
• Attention to User Needs & Scholarly Communication Culture
• Curatorial Policies & Quality Control
• Information Policies
• Interoperability with Related Systems

arXiv@25 & Next-Gen arXiv


Vision Setting Process

• Gather feedback from arXiv's advisory boards
• Conduct a user survey to seek input
• Convene a technical infrastructure (IT) workshop

Member Advisory Board & Scientific Advisory Board Survey

• Essential to keep focused on the core mission

• Maintain and improve fundamental systems, functionality, moderation support, and user interaction before any new major initiatives are undertaken

April 6-27, 2016

main place of work is located in:

Other Countries: 1% or less representation each from 113 countries

95%

I appreciate the simple, functional design without marketing or other distractions.

A bit old-school but functionalities are basically easy to find and use.

Deliciously old fashioned. Please keep it that way.

Key Findings: Wish List

• Improve the search function & author name disambiguation

• Provide better support for submitting and linking research data, code, slides and other materials associated with papers

• Add direct links to papers in the references and support reference extraction


Key Findings: QC & Moderation

• Continue to implement quality control measures:
– checking for text overlap
– correct classification of submissions
– rejection of papers without much scientific value
– asking authors to fix format-related problems

• Provide more information about the moderation process and policies

Key Findings: arXiv and Scientific Communication

• Divided opinions:
– think boldly and further advance
– emphasis on the importance of sticking to the main mission

• Urge vigilance when approaching any changes

• Caution against turning arXiv into a “” style platform


arXiv is NOT a social media platform. It is a repository for scholarly work, and its primary focus should stay that way.

Do not make arXiv into a social media platform or something complicated. Keep working on improving your core, which is what we use and love!

Key Findings:

• Rating system – split between very important/important (36%) and not important/should not be doing this (36%)

• Annotation feature – split with 34.89% of users ranking it as very important/important and 34.08% as not important/should not be doing this

• Implement very carefully and systematically


Don't give in to pressure to add new features. The core features currently provided are crucial for the day to day operating of many professional researchers, who certainly don't need an arXiv with knobs on.

I do not feel it is appropriate for arXiv to become a forum for public commentary, which often deteriorates into personal or inappropriate remarks.

DEMOGRAPHIC CHARACTERISTICS & CORRELATIONS

Chart: Responses to the question “How important is it to… Improve support for submitting research papers by updating the TeX engine,” by years respondents have used arXiv (0-5 years, 6-10 years, 11 or more years). Response categories: Very important, important; Somewhat important; Not important, should not be doing this; No opinion.

Chart: Responses to the question “How important is it to… Offer a rating system so readers can recommend arXiv papers that they find valuable,” by years respondents have used arXiv (0-5 years, 6-10 years, 11 or more years). Response categories: Very important, important; Somewhat important; Not important, should not be doing this; No opinion.

72%

20%

8%

Source:

Oya Y. Rieger, Gail Steinhart, Deborah Cooper (2016). arXiv@25: Key findings of a user survey. http://arxiv.org/abs/1607.08212


Key Conclusions of the IT Workshop

USERS: arXiv is a production service, not a technology experiment.

CODE BASE: No single system can replace everything that arXiv does. The code is 20+ years old - infrastructure is at risk.

ARCHITECTURE: Broad agreement about pursuing a modular approach that builds on either an open-source stack or framework.

PROCESS: Managing the transition and putting sound project oversight in place is as critical as making the right technological choices.

STAKEHOLDERS: There needs to be a careful stakeholder analysis to understand use cases.

Next-Gen arXiv Phase I Initiative

Technical Infrastructure Workshop (April 28-29 2016)

Q: Spectrum of approaches on where to go next? …

From Workshop Report: Two key recommendations

Process Matters
“The arXiv team needs to create a balanced plan that factors in a range of issues extending from architectural choices to sustainability …”

Modular Code Base
“A solution somewhere in the middle of the spectrum seems like the most plausible option. The core is what needs to be improved while maintaining simplicity…”

arXiv-NG - Notional Architecture v.1

This is unique for the arXiv community

Current arXiv-NG Analysis Phase

• Designing “notional architecture”
• Analyzing workflows, esp. moderation
• Evaluating partnership opportunities
• Evaluating existing technologies – stacks and service components


Business Planning

• Develop a roadmap to transition Classic arXiv to arXiv-NG
• Consider existing workflows, policies, user support needs, and communication strategies
• Consider the existing business model and the governance structure
• Engage additional funders and development partners to support future development phases
