Developing a Comprehensive Repository at Carnegie Mellon University

publications Article Balancing Multiple Roles of Repositories: Developing a Comprehensive Repository at Carnegie Mellon University David Scherer 1,* and Daniel Valen 2 1 University Libraries, Carnegie Mellon University, Pittsburgh, PA 15237, USA 2 Figshare, Cambridge, MA 02139, USA; dan@figshare.com * Correspondence: [email protected]; Tel.: +1-412-268-2443 Received: 27 February 2019; Accepted: 22 April 2019; Published: 26 April 2019 Abstract: Many academic and research institutions today maintain multiple types of institutional repositories operating on different systems and platforms to accommodate the needs and governance of the materials they house. Often, these institutions support multiple repository infrastructures, as these systems and platforms are not able to accommodate the broad range of materials that an institution creates. Announced in 2017, the Carnegie Mellon University (CMU) Libraries implemented a new repository solution and service model. Built upon the Figshare for Institutions platform, the KiltHub repository has taken on the role of a traditional institutional repository and institutional data repository, meeting the disparate needs of its researchers, faculty, and students. This paper will review how the CMU Libraries implemented the KiltHub repository and how the repository services was redeveloped to provide a more encompassing solution for traditional institutional repository materials and research datasets. Additionally, this paper will summarize how the CMU University Libraries surveyed the current repository landscape, decided to implement Figshare for Institutions as a comprehensive institutional repository, revised its previous repository service model to accommodate the influx of new material types, and what needed to be developed for campus engagement. This paper is based upon a presentation of the same title delivered at the 2018 Open Repositories Conference held at Montana State University in Bozeman, Montana. Keywords: institutional repositories; research data; Carnegie Mellon University; scholarly communications; open access; open scholarship; open science; open data; engagement 1. Introduction Over the last two decades since the creation of the DSpace repository platform by MIT and Hewlett-Packard in 2002 [1], academic and research institutions have developed and implemented a wide range of institutional repositories. Increasingly, institutional repositories have become a dynamic tool for scholarly communication, and a necessary resource for managing institutional research and knowledge [2]. This has included multiple repositories focused on maintaining and housing the wide range of materials that required unique environments and needs to accommodate them digitally. Likewise, some repositories were designed for set purposes, such as Electronic Thesis/Dissertation (ETD) repositories, open access publication repositories, and research data repositories. As the creation of research data has increased, so too has the need to support its creation and management. Michael Witt noted that academic and research libraries have taken a more active role in the research data management services and infrastructure provided by institutions to handle the increase in data output [3]. The expansion of roles for academic libraries now has often led to their expanded integration in the research cycle of their institutions. Witt further elaborates this point, Publications 2019, 7, 30; doi:10.3390/publications7020030 www.mdpi.com/journal/publications Publications 2019, 7, 30 2 of 24 detailing that libraries can collaborate with their campus communities to understand what tools, services, and support will be necessary to support services for data [3]. As Tenopir et al. explained in their 2015 study, this can lead to libraries becoming invested partners in all aspects of the research process, from data collection to publication, and to the preservation of research outputs [4]. In his 2002 SPARC position paper, Raym Crow noted that an Institutional Repository (IR) could be implemented to demonstrate the visibility, reach, and overall significance of an institution’s research, thereby providing both short-term and long-term benefits [5]. In contrast, Clifford Lynch expanded upon the notion of how an IR could be defined beyond a single entity or service. In his 2003 ARL briefing, Lynch described an IR not just as a single entity or service, but rather as “a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members” [6]. More recently, the Research Data Alliance’s (RDA) Data Foundations and Terminology Working Group presented a more defined definition of a repository, especially with the involvement of research data. RDA defines a repository as “a Repository (aka Data Repository or Digital Data Repository) is a searchable and queryable interfacing entity that is able to store, manage, maintain and curate Data/Digital Objects. A data repository provides a service for human and machine to make data discoverable/searchable through collection(s) of metadata” [7]. As institutional and research repositories have grown in adoption and usage, the conceptual thinking of what a repository is has also grown. This new rethought includes the range of different repository platforms and services models. Just as repositories can be defined as a dynamic service and tool, the overall scholarly communication ecosystem can also be defined as a set of related and sometimes interrelated tools and services designed to create, maintain, publish, disseminate, and assess the data and other scholarly outputs created during the research lifecycle. Founded in 1900 by Steel magnate and philanthropist, Andrew Carnegie, Carnegie Mellon University is an R1: Doctoral Universities—Very high research activity private nonprofit research university located in Pittsburgh, Pennsylvania [8]. CMU is home to nearly 1400 faculty and 14,000 students representing seven academic colleges, ranging from business, computer science, fine arts, engineering, humanities and social sciences, information systems and public policy, and the sciences [9]. With campuses located nationwide in Pittsburgh, New York, Silicon Valley, and globally with campuses in Qatar, Rwanda, and Australia, CMU is well represented and situated within the global communities of research and practice. CMU students represent 109 countries, with faculty also representing forty-two countries. The CMU Alumni network includes over 105,000 thousand living members representing 145 different countries [9]. With such high and dynamic focuses in both the arts and STEM, and with such a large and diverse global reach, the research data and other scholarly outputs produced by its campus community range in diversity as well. In 2015, Carnegie Mellon University (CMU) began evaluating its own institutional repository platform and services models. CMU concluded that a new repository was needed to support the wide range of materials it produces, including research data and other forms of scholarly outputs. Beyond focusing on the repository and service models, CMU also focused on the overall scholarly communication ecosystem. This additional focus included examining and considering the expanded role the IR. CMU sought a partnership with open data repository platform Figshare, in examining the development of a new repository that could comprehensively serve an academic institution or research entity by serving the multiple needs required of a new generation of repositories, while also expanding the role a repository could play in the broader research lifecycle for the individual and the institution. This paper is based upon a presentation of the same title delivered at the 2018 Open Repositories Conference held at Montana State University in Bozeman, Montana [10]. 2. Figshare.com to Figshare for Institutions Figshare was launched in by Mark Hahnel while he was finishing his PhD in stem cell biology at Imperial College London [11]. Through a chance meeting during the Beyond Impact workshop at Publications 2019, 7, 30 3 of 24 the Wellcome Trust in 2011, Figshare was offered funding by Digital Science to grow the company with the mission of making research data citable, shareable, and discoverable [12]. The platform initially operated under a ‘freemium’ model, allowing individual researchers to create free accounts and upload research data, regardless of whether it was associated with a published paper or contained negative results. In 2013, there was a push from government initiatives and funding agencies across the globe regarding public access to research (this seen in open access activity in Australia to the OSTP Memo from The Obama Administration in the US) [13,14]. During this same year, Figshare announced an enterprise tool and began working to support universities and academic publishers in research data management. With Figshare for Institutions, universities were given a way to publish research data in any file type and encourage collaboration, sharing of research, and data reuse. Today, Figshare works with over 100 enterprise partners globally and hosts over 5 million files on the Figshare platform [15]. Since its inception, Figshare has aligned itself and its mission with the wider open access and open research communities. Larger open research initiatives like the Research Data Alliance (RDA) and FORCE11 have helped provide standards and guidelines for the community around best practices

Developing a Comprehensive Repository at Carnegie Mellon University

Gigabyte: Publishing at the Speed of Research Scott C

Open Science and the Role of Publishers in Reproducible Research

Gold Open Access in the Uk: Springer Nature's Transition

Data Journals: a Survey

Open Data Policies Among Library and Informationscience Journals

How to Cite Datasets and Link to Publications

Open Science MOOC Response to UNESCO Draft Open Science Recommendations

Making Open Science a Reality

Design for Open Access Publications in European Research Areas for Social Sciences and Humanities Project Number: GA 731031 OPERAS-D

The Open Science Training Handbook

Workflows for Research Data Publishing: Models and Key Components

Data Papers As a New Form of Knowledge Organization in the Field of Research Data Joachim Schöpfel, Dominic Farace, Hélène Prost, Antonella Zane