Metadata Creation Practices in Digital Repositories and Collections
Metadata Creation Practices in Digital Repositories and Collections: Schemata, Selection Criteria, and Interoperability

Jung-ran Park and Yuji Tosaka

This study explores the current state of metadata-creation practices across digital repositories and collections by using data collected from a nationwide survey of mostly cataloging and metadata professionals. Results show that MARC, AACR2, and LCSH are the most widely used metadata schema, content standard, and subject-controlled vocabulary, respectively. Dublin Core (DC) is the second most widely used metadata schema, followed by EAD, MODS, VRA, and TEI. Qualified DC’s wider use vis-à-vis Unqualified DC (40.6 percent versus 25.4 percent) is noteworthy. The leading criteria in selecting metadata and controlled-vocabulary schemata are collection-specific considerations, such as the types of resources, nature of the collection, and needs of primary users and communities. Existing technological infrastructure and staff expertise also are significant factors contributing to the current use of metadata schemata and controlled vocabularies for subject access across distributed digital repositories and collections. Metadata interoperability remains a major challenge. There is a lack of exposure of locally created metadata and metadata guidelines beyond the local environments. Homegrown locally added metadata elements may also hinder metadata interoperability across digital repositories and collections when there is a lack of sharable mechanisms for locally defined extensions and variants.

Metadata is an essential building block in facilitating effective resource discovery, access, and sharing across ever-growing distributed digital collections. Quality metadata is becoming critical in a networked world in which metadata interoperability is among the top challenges faced by digital libraries. However, there is no common data model that cataloging and metadata professionals can readily reference as a mediation mechanism during the processes of descriptive metadata creation and controlled vocabulary schemata application for subject description.1 The development of such a mediation mechanism calls for an empirical assessment of various issues surrounding metadata-creation practices.

The critical issues concerning metadata practices across distributed digital collections have been relatively unexplored. While examining learning objects and e-prints communities of practice, Barton, Currier, and Hey point out the lack of formal investigation of the metadata-creation process.2 As will be discussed in the following section, some researchers have begun to assess the current state of descriptive practices, metadata schemata, and content standards. However, the literature has not yet developed to a point where it affords a comprehensive picture. Given the propagation of metadata projects, it is important to continue to track changes in metadata-creation practices while they are still in constant flux. Such efforts are essential for adding new perspectives to digital library research and practices in an environment where metadata best practices are being actively sought after to aid in the creation and management of high-quality digital collections.

This study examines the current state of metadata-creation practices in digital repositories, collections, and libraries, which may include both digitized and born-digital resources. Using nationwide survey data, mostly drawn from the community of cataloging and metadata professionals, we seek to investigate issues in creating descriptive metadata elements, using controlled vocabularies for subject access, and propagating metadata and metadata guidelines beyond local environments.

We will address the following research questions:

1. Which metadata schema(ta) and content standard(s) are employed in individual digital repositories and collections?
2. Which controlled vocabulary schema(ta) are used to facilitate subject access?
3. What criteria are applied in selecting metadata and controlled-vocabulary schema(ta)?
4. To what extent are mechanisms for exposing and sharing metadata integrated into current metadata-creation practices?

In this article, we first review recent studies relating to current metadata-creation practices across digital collections. Then we present the survey method employed to conduct this study, the general characteristics of survey participants, and the validity of the collected data, followed by the study results. We report on how metadata and controlled vocabulary schema(ta) are being used across institutions, and we present a data analysis of current metadata-creation practices. The final section summarizes the study and presents some suggestions for future studies.

Jung-ran Park is Assistant Professor, College of Information Science and Technology, Drexel University, Philadelphia, and Yuji Tosaka is Cataloging/Metadata Librarian, TCNJ Library, The College of New Jersey, Ewing, New Jersey.

104 Information Technology and Libraries | September 2010

■■ Literature Review

As evinced by the principles and practices of bibliographic control through shared cataloging, successful resource access and sharing in the networked environment demands semantic interoperability based on accurate, complete, and consistent resource description. The recent survey by Ma finds that the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) and metadata crosswalks have been adopted by 83 percent and 73 percent of respondents, respectively. Even though the sample comes only from sixty-eight Association of Research Libraries (ARL) member libraries, and the figures thus may be skewed higher than those of the entire population of academic libraries, there is little doubt that interoperability is a critical issue given the rapid proliferation of metadata schemata throughout digital libraries.3

While there is a variety of metadata schemata currently in use for organizing digital collections, only a few of them are widely used in digital repositories. In her ARL survey, Ma reports that the MARC format is the most widely used metadata schema (91 percent), followed by Encoded Archival Description (EAD) (84 percent), Unqualified Dublin Core (DC) (78 percent), and Qualified DC (67 percent).4 Similarly, a 2007 member survey by OCLC Research Libraries Group (RLG) programs gathered information from eighteen major research libraries and cultural heritage institutions and also found that MARC is the most widely used scheme (65 percent), followed by EAD (43 percent), Unqualified DC (30 percent), and Qualified DC (29 percent). The different levels of use reported by these studies are probably due to different sample sizes and compositions, but results nonetheless suggest that metadata use at research institutions tends to rely on a small number of major schemata.5

There may in fact be much greater diversity in metadata use patterns when the scope is expanded to include both research and nonresearch institutions. Palmer, Zavalina, and Mustafoff, for example, tracked trends from 2003 through 2006 in metadata selection and application practices at more than 160 digital collections developed through Institute of Museum and Library Services grants.

[...] possible increase in the use of locally developed schemata as many projects added new types of nontextual digital objects that could not be adequately described by existing metadata schemata.6

There is a lack of research concerning the current use of content standards; however, it is reasonable to suspect that content-standards use exhibits patterns similar to that of metadata because of their often close association with particular metadata schemata. The OCLC RLG survey reveals that Anglo-American Cataloguing Rules, 2nd edition (AACR2), the traditional cataloging rule that has most often been used in conjunction with MARC, is the most widely used content standard (81 percent). AACR2 is followed by Describing Archives: A Content Standard (DACS) with 42 percent; Descriptive Cataloging of Rare Materials with 33 percent; Archives, Personal Papers, Manuscripts (APPM) with 25 percent; and Cataloging Cultural Objects (CCO) with 21 percent.7

In the same way as metadata schemata, there appears to be a concentration of a few controlled vocabulary schemata at research institutions. Ma’s ARL survey, for example, shows that the Library of Congress Subject Headings (LCSH) and Name Authority File (NAF) were used by most survey respondents (96 percent and 88 percent, respectively). These two predominantly adopted vocabularies are followed by several domain-specific vocabularies, such as Art and Architecture Thesaurus (AAT), Library of Congress Thesaurus for Graphic Materials (TGM) I and II, Getty Thesaurus of Geographic Names (TGN), and the Getty Union List of Artist Names (ULAN), which were used by between 30 percent and more than 60 percent of respondents.8 The OCLC RLG survey reports similar results; however, nearly half of the OCLC RLG survey respondents (N = 9) indicated that they had also built and maintained one or more locally developed thesauri.9

While creating and sharing information about local metadata implementations is an important step toward increased interoperability, recent studies tend to paint a grim picture of current local documentation practices and open accessibility. In a nationwide study of institutional repositories in U.S. academic libraries, Markey et al. found that only 61.3 percent of