arXiv@25: Exploring Future Directions and Strategies
Oya Y. Rieger, Associate University Librarian, Scholarly Resources & Preservation Services; arXiv Program Director
Martin Lessmeister, arXiv Lead Developer
Sandy Payette, Director of IT for Research and Scholarship; arXiv CTO

CNI Fall 2016 Member Meeting
Cornell University Library, 2016

https://arxiv.org/

Organizational & Business Model

1.2 million OA e-prints in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance and Statistics
• 2012 – 84,000 new submissions – 64 million downloads
• 2013 – 92,500 new submissions – 67 million downloads
• 2014 – 97,000 new submissions – 90 million downloads*
• 2015 – 105,000 new submissions – 139 million downloads*
* The numbers are sensitive to robot downloads, and it is hard to remove all of them, so there is potentially significant over-counting; we have put less effort into cleaning up this data from 2014 on.

arXiv submission rate statistics
http://arxiv.org/help/stats

https://confluence.cornell.edu/display/culpublic/arXiv+Sustainability+Initiative

2015 International Use Distribution
• US Edu, 24%
• Germany, 12%
• United Kingdom (UK), 11%
• Japan, 7%
• Switzerland, 7%
• France, 6%
• Italy, 4%
• Canada, 3%
• US Gov, 2%
• Spain, 2%
• Netherlands, 2%
• All Other Countries, 18%
Other Countries (named without individual percentages): Australia, Austria, Belgium, Brazil, Czech Republic, Denmark, Finland, Hong Kong, India, Israel, Poland, Russia, Singapore, Sweden, Taiwan

Business Model, 2013-2017
• Cornell University Library
– $75,000 per year in support of operational costs
– in-kind contribution of all indirect costs (37%)
• Simons Foundation
– $50,000 per year (increased to $100,000 per year in 2016)
– $300,000 per year matching grant
• Member Institution
– annual fees within $1,500-3,000 range (based on usage ranking)

arXiv Organizational Model - Cornell University Library
• Oya Rieger, Program Director (0.3FTE)
• Vacant, Scientific Director (0.4FTE)
OPERATIONS & MEMBERSHIP
• Gail Steinhart, Program Associate (0.3FTE)
• Jim Entwood, Operations Manager (1FTE)
• Chloe McLaren, Membership Program Coordinator (0.2FTE)
• arXiv administrators and students (3.1FTE)
INFORMATION TECHNOLOGY
• Sandy Payette, CTO (0.5FTE)
• Martin Lessmeister, Lead Developer (1.0FTE)
• arXiv developers (2.2FTE)
Key: Leadership team; Leadership & operations teams; Operations team

arXiv: Roles & Responsibilities
SCIENTIFIC ADVISORY BOARD (SAB):
• Provides advice and guidance pertaining to intellectual oversight of arXiv, with particular focus on arXiv's moderation system and criteria for depositing content.
• Proposes & reviews proposals for new subject domains.
• Makes recommendations and provides feedback on development projects.
MEMBER ADVISORY BOARD (MAB):
• Represents members' interests.
• Advises CUL on development, business planning, outreach and advocacy.
LEADERSHIP TEAM:
• Bears overall responsibility for arXiv's operation and development, with guidance from the MAB and SAB.
• Responsible for business and sustainability planning, collaborations and partnerships.
OPERATIONS TEAM:
• Manages moderation, submission and user support processes.
• Operates and develops arXiv's technical infrastructure.
• Administers membership program.
CUL ADMINISTRATION (Cornell University Library Administration):
• Assumes overall responsibility for arXiv's obligations.
• Provides institutional support and resources for arXiv (HR, business services, legal, etc.).
• Final arbiter for arXiv decisions.

SUSTAINABILITY
• Financial Stability
• Sharable & Persistent Data
• Discovery and Access
• Scalable and Reusable Repository Architecture
• Attention to User Needs & Scholarly Communication Culture
• Curatorial Policies & Quality Control
• Information Policies
• Interoperability with Related Systems

arXiv@25 & Next-Gen arXiv

https://confluence.cornell.edu/display/culpublic/arXiv+Sustainability+Initiative

Vision Setting Process
• Gather feedback from arXiv's advisory boards
• Conduct a user survey to seek input
• Convene a technical infrastructure (IT) workshop

Member Advisory Board & Scientific Advisory Board Survey
• Essential to keep focused on the core mission
• Maintain and improve fundamental systems, functionality, moderation support, and user interaction before any new major initiatives are undertaken

April 6-27, 2016

[Figure: respondents' main place of work, by country. Other Countries: 1% or less representation each from 113 countries]

[Figure: 95%]

I appreciate the simple, functional design without marketing or other distractions.
A bit old-school but functionalities are basically easy to find and use.
Deliciously old fashioned. Please keep it that way.

Key Findings: Wish List
• Improve the search function & author name disambiguation
• Provide better support for submitting and linking research data, code, slides and other materials associated with papers
• Add direct links to papers in the references and support reference extraction

Key Findings: QC & Moderation
• Continue to implement quality control measures:
– checking for text overlap
– correct classification of submissions
– rejection of papers without much scientific value
– asking authors to fix format-related problems
• Provide more information about the moderation process and policies

Key Findings: arXiv and Scientific Communication
• Divided opinions:
– think boldly and further advance open access
– emphasis on the importance of sticking to the main mission
• Urge vigilance when approaching any changes
• Caution against turning arXiv into a “social media” style platform

arXiv is NOT a social media platform. It is a repository for scholarly work, and its primary focus should stay that way.
Do not make arXiv into a social media platform or something complicated. Keep working on improving your core, which is what we use and love!

Key Findings: Open Science
• Rating system – split between very important/important (36%) and not important/should not be doing this (36%)
• Annotation feature – split with 34.89% of users ranking it as very important/important and 34.08% as not important/should not be doing this
• Implement very carefully and systematically

Don't give in to pressure to add new features. The core features currently provided are crucial for the day to day operating of many professional researchers, who certainly don't need an arXiv with knobs on.
I do not feel it is appropriate for arXiv to become a forum for public commentary, which often deteriorates into personal or inappropriate remarks.

DEMOGRAPHIC CHARACTERISTICS & CORRELATIONS

[Figure: Responses to the question “How important is it to… Improve support for submitting research papers by updating the TeX engine,” by years respondents have used arXiv (0 - 5 years, 6 - 10 years, 11 or more years). Response categories: Very important, important; Somewhat important; Not important, should not be doing this; No opinion.]

[Figure: Responses to the question “How important is it to… Offer a rating system so readers can recommend arXiv papers that they find valuable,” by years respondents have used arXiv (0 - 5 years, 6 - 10 years, 11 or more years). Response categories: Very important, important; Somewhat important; Not important, should not be doing this; No opinion.]

[Figure: 72%, 20%, 8%]

Source: Oya Y. Rieger, Gail Steinhart, Deborah Cooper (2016). arXiv@25: Key findings of a user survey. http://arxiv.org/abs/1607.08212

Key Conclusions of the IT Workshop
USERS: arXiv is a production service, not a technology experiment.
CODE BASE: No single system can replace everything that arXiv does. The code base is 20+ years old - infrastructure is at risk.
ARCHITECTURE: Broad agreement about pursuing a modular approach that builds on either an open-source stack or framework.
PROCESS MATTERS: Managing the transition and putting in place sound project oversight is as critical as making the right technological choices.
STAKEHOLDERS: There needs to be a careful stakeholder analysis to understand use cases.

Next-Gen arXiv Phase I Initiative

Technical Infrastructure Workshop (April 28-29, 2016)
Q: Spectrum of approaches on where to go next?
…

From Workshop Report
Two key recommendations
Process Matters: “The arXiv team needs to create a balanced plan that factors in a range of issues extending from architectural choices to sustainability …”
Modular Code Base: “A solution somewhere in the middle of the spectrum seems like the most plausible option. The core is what needs to be improved while maintaining simplicity…”

arXiv-NG - Notional Architecture v.1
This is unique for arXiv community

Current arXiv-NG Analysis Phase
• Designing “notional architecture”
• Analyzing workflows, esp. moderation
• Evaluating partnership opportunities
• Evaluating existing technologies
– Open source stacks and service components
[Diagram labels: arXiv-NG, Classic arXiv]

Business Planning
• Develop a roadmap to transition Classic arXiv to arXiv-NG
• Consider existing workflows, policies, user support needs, and communication strategies
• Consider the existing business model and the governance structure
• Engage additional funders and development partners to support future development phases