Media Digitization and Preservation Initiative: a Case Study

IJDC | Peer-Reviewed Paper Media Digitization and Preservation Initiative: A Case Study Devan Ray Donaldson Allison McClanahan Indiana University Indiana University Leif Christiansen Laura Bell Indiana University Indiana University Mikala Narlock Shannon Martin Indiana University Indiana University Haley Suby Indiana University Abstract Since its creation nearly a decade ago, the Digital Curation Centre (DCC) Curation Lifecycle Model has become the quintessential framework for understanding digital curation. Organizations and consortia around the world have used the DCC Curation Lifecycle Model as a tool to ensure that all the necessary stages of digital curation are undertaken, to define roles and responsibilities, and to build a framework of standards and technologies for digital curation. Yet, research on the application of the model to large-scale digitization projects as a way of understanding their efforts at digital curation is scant. This paper reports on findings of a qualitative case study analysis of Indiana University Bloomington’s multi-million-dollar Media Digitization and Preservation Initiative (MDPI), employing the DCC Curation Lifecycle Model as a lens for examining the scope and effectiveness of its digital curation efforts. Findings underscore the success of MDPI in performing digital curation by illustrating the ways it implements each of the model’s components. Implications for the application of the DCC Curation Lifecycle Model in understanding digital curation for mass digitization projects are discussed as well as directions for future research. Received 9 July 2017 ~ Revision received 4 May 2018 ~ Accepted 7 May 2018 Correspondence should be addressed to Dr. Devan Ray Donaldson, 700 N. Woodlawn Avenue, Luddy Hall 2126, Department of Information and Library Science, School of Informatics, Computing, and Engineering, Bloomington, IN, USA, 47408. Email: [email protected] The International Journal of Digital Curation is an international journal committed to scholarly excellence and dedicated to the advancement of digital curation across a wide range of sectors. The IJDC is published by the University of Edinburgh on behalf of the Digital Curation Centre. ISSN: 1746-8256. URL: http://www.ijdc.net/ Copyright rests with the authors. This work is released under a Creative Commons Attribution Licence, version 4.0. For details please see https://creativecommons.org/licenses/by/4.0/ International Journal of Digital Curation 91 http://dx.doi.org/10.2218/ijdc.v13i1.502 2018, Vol. 13, Iss. 1, 91–113 DOI: 10.2218/ijdc.v13i1.502 92 | Media Digitization and Preservation Initiative doi:10.2218/ijdc.v13i1.502 Introduction Since its creation nearly a decade ago, the Digital Curation Centre (DCC) Curation Lifecycle Model has become the quintessential framework for understanding digital curation. Organizations and consortia around the world have used the DCC Curation Lifecycle Model as a tool to ensure that all the necessary stages of digital curation are undertaken, to define roles and responsibilities, and to build a framework of standards and technologies for digital curation. Recently, researchers have begun exploring the impact of the DCC Curation Lifecycle Model on understanding how digital curation is performed in various contexts with different types of digital data. These include, but are not limited to: video data in social studies of interaction (Whyte, 2009), and brain images in psychiatric research (Whyte, 2008). One unexplored research area involves the utility of the DCC Curation Lifecycle Model to act as a lens for understanding digital curation in mass digitization projects. This type of research is critical given the rise of mass digitization projects over the past decade, which is expected to increase within the cultural heritage and library services domains. The purpose of this paper is to report findings from a qualitative case study analysis of a mass digitization project using the DCC Curation Lifecycle Model as a means of understanding the scope and effectiveness of its digital curation efforts. Of the mass digitization projects that currently exist worldwide, we selected Indiana University Bloomington’s (IUB’s) multi-million-dollar Media Digitization and Preservation Initiative (MDPI) for our case study because heretofore research has focused primarily on the mass digitization of textual resources (e.g., books). In contrast, MDPI aims to digitize and make accessible a wide variety of time-based, audiovisual media. The main research question this study addresses is: How do the actions of MDPI compare to the actions specified in the DCC Curation Lifecycle Model? The remainder of this paper is organized as follows. First, the background section explores: 1) the value and importance of mass digitization and recent mass digitization projects; 2) MDPI, including its history and development; and 3) the DCC Curation Lifecycle Model, including previous research on its application in various organizational contexts with different types of digital data. Second, the methods section describes how we applied case study research design and methods to this project. Third, the findings section provides details about how MDPI implements each part of the DCC Curation Lifecycle Model. Fourth, the discussion section addresses the utility of the DCC Curation Lifecycle Model for understanding digital curation in mass digitization projects based on our findings. The discussion section also compares our results with findings of other studies that have used the DCC Curation Lifecycle Model as a framework. The paper concludes with discussion of future directions for research. Background Defining Mass Digitization As early as 1993, mass digitization has been discussed as a way to provide wider access to and preservation for archives and collections. In the past decade, mass digitization IJDC | Peer-Reviewed Paper doi:10.2218/ijdc.v13i1.502 Donaldson et al. | 93 has received a considerable amount of attention and scrutiny due to the involvement of large corporations and research institutions such as Yahoo!, Google, Microsoft, Stanford University, the University of California system, and the University of Michigan. Despite several publications on the topic of mass digitization, few definitions for the term exist (Schmitz, 2008). This has led to some vagaries and has, in part, hidden the variety of mass digitization projects. Coyle (2006) defines mass digitization as “the conversion of materials on an industrial scale.” For this paper, our working definition of mass digitization builds on Coyle’s (2006) by considering six characteristics. Note that they are not mutually exclusive: Aggregation and production (e.g., whether a project aggregates material that others have digitized, and the extent to which a project performs any digitization in-house); Openness (e.g., the extent to which materials are open and freely accessible); Business model and cost (e.g., commercial, non-profit, etc., and who pays for the digitization?); Scope (e.g., are you digitizing everything, most everything, or specific collections?); Format (e.g., digitized books, audiovisual materials, 3D Materials, etc.); Time spent digitizing (e.g., seconds per item, minutes per item, hours per item, etc.). Based on our scan of recent major mass digitization projects, we argue that these characteristics make the most significant difference in determining whether a digitization project is a mass digitization project. Below we describe each of these characteristics as they relate to specific projects. Aggregation and production Arguably the most massive and most publicized digitization project is the Google Books project, formerly Google Print. Officially started in 2004, the Google Books project has amassed a collection of approximately 25 million volumes to create a searchable and meaningful indexed digital library (Heyman, 2015). Google digitized many of these volumes. Libraries may bring books to Google for digitization according to Google’s proprietary method.1 Libraries have also contributed books that they digitized, such as the University of California Davis’ contribution of 45,000 volumes.2 In contrast to Google Books, the Open Book Alliance was formed in support of open digitization projects.3 The Open Book Alliance was a consortium of industry, library, and union representatives dedicated to open digital libraries. Members of the Open Book Alliance undertook their own self-titled mass digitization projects. The most successful member of the Open Book Alliance is the Internet Archive, which has over 11 million books and texts.4 These books are scanned either by the archive or one of its 1 5 Things About JSTOR: https://about.jstor.org/5things/ 2 UC Davis Joins Google Book Digitization Project: https://www.cdlib.org/cdlinfo/2013/07/29/uc-davis- joins-google-book-digitization-project/ 3 Open Book Alliance - Mission: http://www.che.ntu.edu.tw/ntuche/safety/upload/browse.php? u=Oi8vd2ViLmFyY2hpdmUub3JnL3dlYi8yMDEzMDYyMDE1NTk0NS9odHRwOi8vd3d3Lm9wZ W5ib29rYWxsaWFuY2Uub3JnL21pc3Npb24v&b=13 4 Internet Archive – About: https://archive.org/about/ IJDC | Peer-Reviewed Paper 94 | Media Digitization and Preservation Initiative doi:10.2218/ijdc.v13i1.502 450+ partners. The Internet Archive operates 33 scanning centers across the globe.5 Partners of the Internet Archive include corporate entities such as Yahoo!, which contributed approximately 200,000 texts to the archive.6 The Internet Archive allows the full download and creative commons use of all 11 million

Media Digitization and Preservation Initiative: a Case Study

Diversity in Digital Libraries

Challenges in Sustaining the Million Book Project, a Project Supported by the National Science Foundation

Future Reading" Digitization and Its Discontents

Frequently Asked Questions About the Million Book Project

Handbook of E-Resources from Yenepoya Central Library

The Million Book Project at Bibliotheca Alexandrina

Identifying and Selecting Content for the Million Book Project

Zhejiang University Libraries and CADAL Project: Retrospection & Anticipation Huang Chen (Vice Librarian, Zhejiang University Libraries)

The Project Gutenberg Ebook of a Short History of Ebooks, by Marie Lebert

Preserving Ebooks 1

Creating a Free to Read International Digital Library - Five Years Later

Digitizing a Million Books: Challenges for Document Analysis