Developing a Comprehensive Repository at Carnegie Mellon University

Total Page:16

File Type:pdf, Size:1020Kb

Developing a Comprehensive Repository at Carnegie Mellon University publications Article Balancing Multiple Roles of Repositories: Developing a Comprehensive Repository at Carnegie Mellon University David Scherer 1,* and Daniel Valen 2 1 University Libraries, Carnegie Mellon University, Pittsburgh, PA 15237, USA 2 Figshare, Cambridge, MA 02139, USA; dan@figshare.com * Correspondence: [email protected]; Tel.: +1-412-268-2443 Received: 27 February 2019; Accepted: 22 April 2019; Published: 26 April 2019 Abstract: Many academic and research institutions today maintain multiple types of institutional repositories operating on different systems and platforms to accommodate the needs and governance of the materials they house. Often, these institutions support multiple repository infrastructures, as these systems and platforms are not able to accommodate the broad range of materials that an institution creates. Announced in 2017, the Carnegie Mellon University (CMU) Libraries implemented a new repository solution and service model. Built upon the Figshare for Institutions platform, the KiltHub repository has taken on the role of a traditional institutional repository and institutional data repository, meeting the disparate needs of its researchers, faculty, and students. This paper will review how the CMU Libraries implemented the KiltHub repository and how the repository services was redeveloped to provide a more encompassing solution for traditional institutional repository materials and research datasets. Additionally, this paper will summarize how the CMU University Libraries surveyed the current repository landscape, decided to implement Figshare for Institutions as a comprehensive institutional repository, revised its previous repository service model to accommodate the influx of new material types, and what needed to be developed for campus engagement. This paper is based upon a presentation of the same title delivered at the 2018 Open Repositories Conference held at Montana State University in Bozeman, Montana. Keywords: institutional repositories; research data; Carnegie Mellon University; scholarly communications; open access; open scholarship; open science; open data; engagement 1. Introduction Over the last two decades since the creation of the DSpace repository platform by MIT and Hewlett-Packard in 2002 [1], academic and research institutions have developed and implemented a wide range of institutional repositories. Increasingly, institutional repositories have become a dynamic tool for scholarly communication, and a necessary resource for managing institutional research and knowledge [2]. This has included multiple repositories focused on maintaining and housing the wide range of materials that required unique environments and needs to accommodate them digitally. Likewise, some repositories were designed for set purposes, such as Electronic Thesis/Dissertation (ETD) repositories, open access publication repositories, and research data repositories. As the creation of research data has increased, so too has the need to support its creation and management. Michael Witt noted that academic and research libraries have taken a more active role in the research data management services and infrastructure provided by institutions to handle the increase in data output [3]. The expansion of roles for academic libraries now has often led to their expanded integration in the research cycle of their institutions. Witt further elaborates this point, Publications 2019, 7, 30; doi:10.3390/publications7020030 www.mdpi.com/journal/publications Publications 2019, 7, 30 2 of 24 detailing that libraries can collaborate with their campus communities to understand what tools, services, and support will be necessary to support services for data [3]. As Tenopir et al. explained in their 2015 study, this can lead to libraries becoming invested partners in all aspects of the research process, from data collection to publication, and to the preservation of research outputs [4]. In his 2002 SPARC position paper, Raym Crow noted that an Institutional Repository (IR) could be implemented to demonstrate the visibility, reach, and overall significance of an institution’s research, thereby providing both short-term and long-term benefits [5]. In contrast, Clifford Lynch expanded upon the notion of how an IR could be defined beyond a single entity or service. In his 2003 ARL briefing, Lynch described an IR not just as a single entity or service, but rather as “a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members” [6]. More recently, the Research Data Alliance’s (RDA) Data Foundations and Terminology Working Group presented a more defined definition of a repository, especially with the involvement of research data. RDA defines a repository as “a Repository (aka Data Repository or Digital Data Repository) is a searchable and queryable interfacing entity that is able to store, manage, maintain and curate Data/Digital Objects. A data repository provides a service for human and machine to make data discoverable/searchable through collection(s) of metadata” [7]. As institutional and research repositories have grown in adoption and usage, the conceptual thinking of what a repository is has also grown. This new rethought includes the range of different repository platforms and services models. Just as repositories can be defined as a dynamic service and tool, the overall scholarly communication ecosystem can also be defined as a set of related and sometimes interrelated tools and services designed to create, maintain, publish, disseminate, and assess the data and other scholarly outputs created during the research lifecycle. Founded in 1900 by Steel magnate and philanthropist, Andrew Carnegie, Carnegie Mellon University is an R1: Doctoral Universities—Very high research activity private nonprofit research university located in Pittsburgh, Pennsylvania [8]. CMU is home to nearly 1400 faculty and 14,000 students representing seven academic colleges, ranging from business, computer science, fine arts, engineering, humanities and social sciences, information systems and public policy, and the sciences [9]. With campuses located nationwide in Pittsburgh, New York, Silicon Valley, and globally with campuses in Qatar, Rwanda, and Australia, CMU is well represented and situated within the global communities of research and practice. CMU students represent 109 countries, with faculty also representing forty-two countries. The CMU Alumni network includes over 105,000 thousand living members representing 145 different countries [9]. With such high and dynamic focuses in both the arts and STEM, and with such a large and diverse global reach, the research data and other scholarly outputs produced by its campus community range in diversity as well. In 2015, Carnegie Mellon University (CMU) began evaluating its own institutional repository platform and services models. CMU concluded that a new repository was needed to support the wide range of materials it produces, including research data and other forms of scholarly outputs. Beyond focusing on the repository and service models, CMU also focused on the overall scholarly communication ecosystem. This additional focus included examining and considering the expanded role the IR. CMU sought a partnership with open data repository platform Figshare, in examining the development of a new repository that could comprehensively serve an academic institution or research entity by serving the multiple needs required of a new generation of repositories, while also expanding the role a repository could play in the broader research lifecycle for the individual and the institution. This paper is based upon a presentation of the same title delivered at the 2018 Open Repositories Conference held at Montana State University in Bozeman, Montana [10]. 2. Figshare.com to Figshare for Institutions Figshare was launched in by Mark Hahnel while he was finishing his PhD in stem cell biology at Imperial College London [11]. Through a chance meeting during the Beyond Impact workshop at Publications 2019, 7, 30 3 of 24 the Wellcome Trust in 2011, Figshare was offered funding by Digital Science to grow the company with the mission of making research data citable, shareable, and discoverable [12]. The platform initially operated under a ‘freemium’ model, allowing individual researchers to create free accounts and upload research data, regardless of whether it was associated with a published paper or contained negative results. In 2013, there was a push from government initiatives and funding agencies across the globe regarding public access to research (this seen in open access activity in Australia to the OSTP Memo from The Obama Administration in the US) [13,14]. During this same year, Figshare announced an enterprise tool and began working to support universities and academic publishers in research data management. With Figshare for Institutions, universities were given a way to publish research data in any file type and encourage collaboration, sharing of research, and data reuse. Today, Figshare works with over 100 enterprise partners globally and hosts over 5 million files on the Figshare platform [15]. Since its inception, Figshare has aligned itself and its mission with the wider open access and open research communities. Larger open research initiatives like the Research Data Alliance (RDA) and FORCE11 have helped provide standards and guidelines for the community around best practices
Recommended publications
  • Gigabyte: Publishing at the Speed of Research Scott C
    EDITORIAL GigaByte: Publishing at the Speed of Research Scott C. Edmunds1,* and Laurie Goodman1 1 GigaScience, BGI Hong Kong Tech Co Ltd., 26F A Kings Wing Plaza, 1 On Kwan Street, Shek Mun, Sha Tin, NT, Hong Kong, China ABSTRACT Current practices in scientific publishing are unsuitable for rapidly changing fields and for presenting updatable data sets and software tools. In this regard, and as part of our continuing pursuit of pushing scientific publishing to match the needs of modern research, we are delighted to announce the launch of GigaByte, an online open-access, open data journal that aims to be a new way to publish research following the software paradigm: CODE, RELEASE, FORK, UPDATE and REPEAT. Following on the success of GigaScience in promoting data sharing and reproducibility of research, its new sister, GigaByte, aims to take this even further. With a focus on short articles, using a questionnaire-style review process, and combining that with the custom built publishing infrastructure from River Valley Technologies, we now have a cutting edge, XML-first publishing platform designed specifically to make the entire publication process easier, quicker, more interactive, and better suited to the speed needed to communicate modern research. Subjects Computer Sciences, Data Integration, Data Management FUTURE’S PAST In 2012, we launched GigaScience [1] as a new type of journal — one that provides standard scientific publishing linked directly to a database that hosts all the relevant data. Aiming to address Buckheit and Donoho’s 1995 complaint that “an article about computational results is advertising, not scholarship. The actual scholarship is the full software environment, code and data that produced the result.” [2], we were inspired to launch a new type of journal Submitted: 29 May 2020 that focussed on making sure the actual scholarship of all types of research was included Accepted: 01 June 2020 with the article narrative.
    [Show full text]
  • Open Science and the Role of Publishers in Reproducible Research
    Open science and the role of publishers in reproducible research Authors: Iain Hrynaszkiewicz (Faculty of 1000), Peter Li (BGI), Scott Edmunds (BGI) Chapter summary Reproducible computational research is and will be facilitated by the wide availability of scientific data, literature and code which is freely accessible and, furthermore, licensed such that it can be reused, inteGrated and built upon to drive new scientific discoveries without leGal impediments. Scholarly publishers have an important role in encouraGing and mandating the availability of data and code accordinG to community norms and best practices, and developinG innovative mechanisms and platforms for sharinG and publishinG products of research, beyond papers in journals. Open access publishers, in particular the first commercial open access publisher BioMed Central, have played a key role in the development of policies on open access and open data, and increasing the use by scientists of leGal tools – licenses and waivers – which maximize reproducibility. Collaborations, between publishers and funders of scientific research, are vital for the successful implementation of reproducible research policies. The genomics and, latterly, other ‘omics communities historically have been leaders in the creation and wide adoption of policies on public availability of data. This has been throuGh policies, such as Fort Lauderdale and the Bermuda Principles; infrastructure, such as the INSDC databases; and incentives, such as conditions of journal publication. We review some of these policies and practices, and how these events relate to the open access publishinG movement. We describe the implementation and adoption of licenses and waivers prepared by Creative Commons, in science publishinG, with a focus on licensing of research data published in scholarly journals and data repositories.
    [Show full text]
  • Gold Open Access in the Uk: Springer Nature's Transition
    springernature.com Construction of a reference genome sequence for barley Open Research GOLD OPEN ACCESS IN THE UK: SPRINGER NATURE’S TRANSITION Case study Open Research Journals Data Books Tools Open Research: Journals, books, data and tools from: Contents Foreword ...........................................................1 Authors Carrie Calder, Mithu Lucraft, Executive summary ...................................................3 Jessica Monaghan, Ros Pyne, Where are we now? Gold OA in the UK after Finch ..........................4 Veronika Spinka UK and Springer Nature timeline.....................................4 Springer Nature’s UK transition......................................6 May 2018 Springer Nature’s UK publications in fully OA and hybrid journals ..........7 Springer Nature’s UK publications by discipline .........................9 This case study has been made openly available in the Figshare repository. Springer Nature’s transition in Europe ................................10 Choosing the gold route ...............................................11 Access case study: Gold OA options in Springer Nature journals ...........................11 https://doi.org/10.6084/m9. The role of hybrid .................................................11 figshare.6230813.v1 Supporting the transition to OA .........................................13 Partnering with the research community ..............................13 BMC memberships ................................................13 Springer Compact.................................................14
    [Show full text]
  • Data Journals: a Survey
    This is a preprint of the article: L. Candela, D. Castelli, P. Manghi, A. Tani Data Journals: A Survey. Journal of the Association for Information and Science Technology. Accepted for publication in June 2014. Will appear with DOI: 10.1002/asi.23358 ----- Data Journals: A Survey Leonardo Candela, Donatella Castelli, Paolo Manghi, and Alice Tani Istituto di Scienza e Tecnologie dell’Informazione “A. Faedo”, Italian National Research Council, via G. Moruzzi, 1, 56124, Pisa, Italy. E-mail: {candela, castelli, manghi, tani}@isti.cnr.it Data occupy a key role in our information society. However, although the amount of published data continues to grow and terms like “data deluge” and “big data” today characterize numerous (research) initiatives, a lot of work is still needed in the direction of publishing data in order to make them effectively discoverable, available, and reusable by others. Several barriers hinder data publishing, from lack of attribution and rewards, vague citation practices, quality issues, to a rather general lack of data sharing culture. Lately, data journals came forward as a solution to overcome some of these barriers. In this study of more than 100 currently existing data journals, we describe the approaches they promote for datasets description, availability, citation, quality and open access. We close by identifying ways to expand and strengthen the data journals approach as a means to actually promote datasets access and exploitation. Introduction Data – that serving “Big Science” as well as that serving “Long-tail Science” (Murray-Rust, 2008) – is emerging as a driving instrument in science. Benefitting from data availability researchers are envisaging a large variety of new research patterns that are revolutionizing how science is being conducted.
    [Show full text]
  • Open Data Policies Among Library and Informationscience Journals
    publications Article Open Data Policies among Library and Information Science Journals Brian Jackson Library, Mount Royal University, 4825 Mount Royal Gate SW, Calgary, AB T3E 6K6, Canada; [email protected] Abstract: Journal publishers play an important role in the open research data ecosystem. Through open data policies that include public data archiving mandates and data availability statements, journal publishers help promote transparency in research and wider access to a growing scholarly record. The library and information science (LIS) discipline has a unique relationship with both open data initiatives and academic publishing and may be well-positioned to adopt rigorous open data policies. This study examines the information provided on public-facing websites of LIS journals in order to describe the extent, and nature, of open data guidance provided to prospective authors. Open access journals in the discipline have disproportionately adopted detailed, strict open data policies. Commercial publishers, which account for the largest share of publishing in the discipline, have largely adopted weaker policies. Rigorous policies, adopted by a minority of journals, describe the rationale, application, and expectations for open research data, while most journals that provide guidance on the matter use hesitant and vague language. Recommendations are provided for strengthening journal open data policies. Keywords: open data policies; research data management; academic publishing; library and infor- mation science Citation: Jackson, B. Open Data Policies among Library and Information Science Journals. Publications 2021, 9, 25. 1. Introduction https://doi.org/10.3390/ publications9020025 The idea of research data as a public good is burgeoning across disciplines. In this shifting conception, open research data is observed as bolstering the transparency and Academic Editor: reproducibility of results reported in other research products and as a distinct contribution Svetla Baykoucheva to the scholarly record.
    [Show full text]
  • How to Cite Datasets and Link to Publications
    A Digital Curation Centre ‘working level’ guide How to Cite Datasets and Link to Publications Alex Ball (DCC) and Monica Duke (DCC) Please cite as: Ball, A., & Duke, M. (2015). ‘How to Cite Datasets and Link to Publications’. DCC How-to Guides. Edinburgh: Digital Curation Centre. Available online: http://www.dcc.ac.uk/resources/how-guides Digital Curation Centre, 2015. Licensed under Creative Commons Attribution 4.0 International: http://creativecommons.org/licenses/by/4.0/ How to Cite Datasets and Link to Publications Introduction This guide will help you create links between your academic publications and the underlying datasets, so that anyone viewing the publication will be able to locate the dataset and vice versa. It provides a working knowledge of the issues and challenges involved, and of how current approaches seek to address them. This guide should interest researchers and principal investigators working on data-led research, as well as the data repositories with which they work. Why cite datasets and link mechanisms allowing authors to be open about their research while still receiving due credit; metrics used them to publications? to translate such attributions into rewards for authors 1 and their institutions; and archives ensuring that the The motivation to cite datasets arises from a recog- work is permanently available for reference and reuse.5 nition that data generated in the course of research If datasets are to be regarded as first-class records of are just as valuable to the ongoing academic discourse research, as they need to be, a similar set of control as papers and monographs.
    [Show full text]
  • Open Science MOOC Response to UNESCO Draft Open Science Recommendations
    Open Science MOOC Response to UNESCO Draft Open Science Recommendations December 30, 2020 1 The Open Science MOOC ​ 2020 Steering Committee ([email protected]) responds on ​ ​ ​ ​ behalf of our community to the UNESCO consultation on open science. We offer this response from our perspective as a global source of open educational materials for teaching and learning open science principles and practices. Our community and recommendations center early career researchers as a key audience for and as valuable creators of educational materials. Our steering committee offers our response to the UNESCO Draft Open Science Recommendations based on our own reactions and on comments from the broader Open Science MOOC community. Our response is structured around the following categories: ● Promoting a common understanding of Open Science ● Developing an enabling policy environment for Open Science ● Investing in Open Science infrastructures, services and capacity building for Open Science ● Transforming scientific culture and aligning incentives for Open Science ● Promoting innovative approaches for Open Science at different stages of the scientific process ● Promoting international cooperation on Open Science Promoting a common understanding of Open Science ​ ​ Definition of Open Science is a continuous debate. It can differ a lot for different communities. Consequently, it is extremely important to agree with the universal and common definition of Open Science and its components. This will make Open Science not a manipulative but a democratic agenda and prevent violation of Open Science principles and their usage subject to third party interests. Generally, Open Science’s definition, discussion, promotion and implementation need support from non-governmental, international and national governmental organisations, educational institutions and scientific unions, media and citizens.
    [Show full text]
  • Making Open Science a Reality
    MAKING OPEN SCIENCE A REALITY 1 ORGANISATION FOR ECONOMIC CO-OPERATION AND DEVELOPMENT The OECD is a unique forum where governments work together to address the economic, social and environmental challenges of globalisation. The OECD is also at the forefront of efforts to understand and to help governments respond to new developments and concerns, such as corporate governance, the information economy and the challenges of an ageing population. The Organisation provides a setting where governments can compare policy experiences, seek answers to common problems, identify good practice and work to co-ordinate domestic and international policies. The OECD member countries are: Australia, Austria, Belgium, Canada, Chile, the Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Iceland, Ireland, Israel, Italy, Japan, Korea, Luxembourg, Mexico, the Netherlands, New Zealand, Norway, Poland, Portugal, the Slovak Republic, Slovenia, Spain, Sweden, Switzerland, Turkey, the United Kingdom and the United Sates. The European Union takes part in the work of the OECD. This work is published on the responsibility of the Secretary-General of the OECD. The opinions expressed and arguments employed herein do not necessarily reflect the official views of the Organisation or of the governments of its member countries. © OECD 2015 The statistical data for Israel are supplied by and under the responsibility of the relevant Israeli authorities. The use of such data by the OECD is without prejudice to the status of the Golan Heights, East Jerusalem and Israeli settlements in the West Bank under the terms of international law. Photo credits: Cover © Getty Images International 2 TABLE OF CONTENTS FOREWORD ................................................................................................................................................... 5 GLOSSARY ...................................................................................................................................................
    [Show full text]
  • Design for Open Access Publications in European Research Areas for Social Sciences and Humanities Project Number: GA 731031 OPERAS-D
    Design for Open access Publications in European research Areas for Social Sciences and Humanities Project Number: GA 731031 OPERAS-D WP 2: Developing network and e-infrastructure strategy Landscape Study on Open Access Publishing The project has received funding from European Union’s Horizon 2020 research and innovation programme under grant agreement 731031. 0 Contents List of Acronyms and Abbreviations .......................................................................................... 2 Executive Summary ................................................................................................................... 3 1. Introduction .......................................................................................................................... 5 2. Milestones in the Open Access Movement ......................................................................... 6 2.1 The three Bs: Budapest, Berlin and Bethesda ............................................................... 6 2.2 Pathways to Open Access ............................................................................................... 9 2.3 Policies and Mandates .................................................................................................. 10 2.4 Infrastructures .............................................................................................................. 15 3. Open Access Publishing in SSH ........................................................................................... 19 3.1 The Landscape ..............................................................................................................
    [Show full text]
  • The Open Science Training Handbook
    The Open Science Training Handbook Table of Contents The Open Science Training Handbook .................................................................................................... 4 Help us making the handbook better .................................................................................................. 4 Let's run an Open Science training together ....................................................................................... 5 How to refer to the handbook ............................................................................................................ 5 The Authors and the Book Sprint facilitators ...................................................................................... 5 Thank you to ........................................................................................................................................ 6 Copyright statement............................................................................................................................ 6 Funding ................................................................................................................................................ 6 Purpose of the book ................................................................................................................................ 7 Who is this book for? .......................................................................................................................... 8 What is Open Science? .......................................................................................................................
    [Show full text]
  • Workflows for Research Data Publishing: Models and Key Components
    Final Manuscript for submission to the International Journal of Digital Libraries, Special edition: Research Data Publishing We would be very grateful if you would fill in this short survey on your evaluation of the manuscript. It will take about 3 minutes to complete the survey and you don’t need a SurveyMonkey account to do it (just close the annoying pop-up window if there is one: https://www.surveymonkey.com/r/workflows_for_data_publication ​ Workflows for Research Data Publishing: Models and Key Components 1 2 § 3 4,5 Authors: Theodora Bloom ,​ Sünje Dallmeier-Tiessen *​ ,​ Fiona Murphy *​ , Claire C. Austin ,​ ​ ​ ​ ​ ​ ​ ​ 6 7 8 9 10 Angus Whyte ,​ Jonathan Tedds ,​ Amy Nurnberger ,​ Lisa Raymond ,​ Martina Stockhause ,​ ​ ​ ​ ​ ​ Mary Vardigan11 ​ *Lead authors § C​ orresponding author: [email protected] ​ 12 13 14 15 15 Contributors: Tim Clarke ,​ Eleni Castro ,​ Elizabeth Newbold ,​ Samuel Moore ,​ Brian Hole ​ ​ ​ ​ ​ ​ 1 2 3 4 5 Author affiliations: B​ MJ, C​ ERN, W​ iley, E​ nvironment Canada, R​ esearch Data Canada, ​ ​ ​ ​ ​ ​ 6 7 8 9 D​ igital Curation Centre - Edinburgh, U​ niversity of Leicester, C​ olumbia University, W​ oods ​ ​ ​ 10 11 Hole Oceanographic Institution, G​ erman Climate Computing Centre (DKRZ) , U​ niversity of ​ ​ Michigan/ICPSR 12 13 Contributor affiliations: M​ assachusetts General Hospital/Harvard Medical School, H​ arvard ​ ​ ​ 14 15 University/IQSS, T​ he British Library, U​ biquity Press ​ ​ Author statement: All authors1 affirm that they have no undeclared conflicts of interest. ​ Opinions expressed in this paper are those of the authors and do not necessarily reflect the policies of the organizations with which they are affiliated. ABSTRACT Data publishing is a major cornerstone of open science, reliable research, and modern scholarly communication.
    [Show full text]
  • Data Papers As a New Form of Knowledge Organization in the Field of Research Data Joachim Schöpfel, Dominic Farace, Hélène Prost, Antonella Zane
    Data papers as a new form of knowledge organization in the field of research data Joachim Schöpfel, Dominic Farace, Hélène Prost, Antonella Zane To cite this version: Joachim Schöpfel, Dominic Farace, Hélène Prost, Antonella Zane. Data papers as a new form of knowledge organization in the field of research data. 12ème Colloque international d’ISKO-France : Données et mégadonnées ouvertes en SHS : de nouveaux enjeux pour l’état et l’organisation des con- naissances ?, ISKO France, Oct 2019, Montpellier, France. halshs-02284548 HAL Id: halshs-02284548 https://halshs.archives-ouvertes.fr/halshs-02284548 Submitted on 7 Oct 2019 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Data Papers as a New Form of Knowledge Organization in the Field of Research Data Joachim Schöpfel University of Lille, GERiiCO laboratory Associate professor in information sciences, ORCID 0000-0002-4000-807X [email protected] Dominic Farace GreyNet International, Amsterdam Director of the Grey Literature Network Service, ORCID 0000-0003-2561-3631 [email protected] Hélène Prost CNRS, GERiiCO laboratory Information engineer, ORCID 0000-0002-7982-2765 [email protected] Antonella Zane University of Padova, Library System Head of the Antonio Vallisneri Bio-medical Library, ORCID 0000-0001-7218-6068 [email protected] Abstract Data papers have been defined as scholarly journal publications whose primary purpose is to describe research data.
    [Show full text]