Metadata Demystified: A Guide for Publishers
ISBN 1-880124-59-9

Table of Contents

What Metadata Is
What Metadata Isn’t
XML
Identifiers
Why Metadata Is Important
What Metadata Means to the Publisher
What Metadata Means to the Reader
Book-Oriented Metadata Practices
ONIX
Journal-Oriented Metadata Practices
ONIX for Serials
JWP On the Exchange of Serials Subscription Information
CrossRef
The Open Archives Initiative
Conclusion
Where To Go From Here
Compendium of Cited Resources
About the Authors and Publishers

Published by: The Sheridan Press & NISO Press
Contributing Editors: Pat Harris, Susan Parente, Kevin Pirkey, Greg Suprock, Mark Witkowski
Authors: Amy Brand, Frank Daly, Barbara Meyers
Copyright 2003, The Sheridan Press and NISO Press
Printed July 2003

This guide presents an overview of evolving metadata conventions in publishing, as well as related initiatives designed to standardize how metadata is structured and disseminated online. Focusing on strategic rather than technical considerations in the business of publishing, this guide offers insight into how book and journal publishers can streamline the various metadata-based operations at work in their companies and leverage that metadata for added exposure through digital media such as the Web. This exposure is an additional way of sharing information about content. It benefits not only publishers, but also potential readers who seek access to published products and the resource discovery environment more generally.

Publishers work with metadata on a daily basis. It is in the manuscript tracking process, in internal reports and content management systems, in marketing copy, and in the information transmitted to the supply chain. Whenever publishers complete copyright registration forms or supply promotional and library cataloging information during the editorial/production process, they create metadata. Similarly, whenever authors cite other publications, or libraries record their holdings, they create metadata.

What Metadata Is

The term metadata refers to information about information or, equivalently, data about data. In current practice, the term has come to mean structured information that feeds into automated processes, and this is currently the most useful way to think about metadata. This definition holds whether the publication that the metadata describes is in print or electronic form. While metadata in publishing can be classified according to a variety of specific functions, such as technical metadata for technical processes, rights metadata for rights resolution, and preservation metadata for digital archiving, this guide focuses on descriptive metadata, or metadata that characterizes the content itself.

Occurrences of metadata vary tremendously in richness; that is, how much or how little of the entity being described is actually captured in the metadata record. The strategic decisions publishers make about metadata often concern how much to expose. The answer to this question depends on the application at hand. In order to enable reference linking across publisher platforms, for instance, the number of metadata elements required is minimal, often less than what occurs in a typical citation. The CrossRef metadata set, which we will look at in section 5, contains only a handful of required elements. For electronic bookselling, where one role of metadata is to approximate the experience of perusing a physical book in a bookstore, the richer the metadata record, the better. Hence, the Online Information Exchange (ONIX) standard for books specifies over 200 elements.

To illustrate what metadata is, let’s look at a simple metadata standard called Dublin Core. The Dublin Core Metadata Initiative (DCMI) got underway in 1995 as a joint effort among professionals from the publishing, library, and academic communities. One outcome of this effort was the Dublin Core Metadata Element Set, which became a NISO standard in 2001 (ANSI/NISO Z39.85-2001) and an international standard (ISO 15836) in 2003.

The DCMI standard includes fifteen optional metadata elements for describing cross-genre, cross-disciplinary information resources. These elements are: title, creator, subject, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage, and rights. Some of these elements relate to the content of the item, some to the item as intellectual property, and others to the particular instantiation, or version of the item.

The Dublin Core website (http://dublincore.org) uses its own metadata scheme to display document information. Table 1 shows a three-element Dublin Core record.

Table 1. Dublin Core Record

Title                     Overview of Documentation for DCMI Metadata Terms
Identifier                http://dublincore.org/usage/documents/overview
Description of Document   This page provides an overview of official documentation of all DCMI metadata terms.

The left-hand column lists element types, and the right-hand column assigns element values for this particular document. Dublin Core has been mapped to several other metadata formats, including the Machine Readable Cataloging (MARC) 21 bibliographic format for representation and exchange of bibliographic information that most library catalogs use today. See http://www.loc.gov/marc for more information.

Metadata in the publishing and communication cycle is not new. What is relatively new to the broader publishing community, and crucial for interoperability in the digital age, is standardization. This is the process of building consensus around best practices in the formatting and use of metadata for specific applications, so that machines can interpret and exchange this information efficiently. In recent years, clear standards have emerged to define metadata elements and the record layout for transmitting those elements.

Standards-building is an ongoing, collaborative process in which book and journal publishers should participate. Despite the fact that a much greater proportion of journal content than book content is digitized, publisher-driven standardization initiatives in book publishing are more advanced than in journal publishing. Book publishers have been driven toward standardization in order to capitalize on aggregated bookselling (traditionally via wholesalers and now through the Internet), which has required them to conform to standards for supplying promotional metadata. Even existing standards have a routine review process to incorporate new features, and publishers can take part via organizations such as the National Information Standards Organization (NISO, http://www.niso.org), in order to have input on how both current and new standards take shape.

The remainder of this document is structured as follows: In the next section, we will refine our operational definition of metadata by explaining its relationship to Extensible Markup Language (XML) and to identifiers. Then we will look at the internal and external roles of metadata in today’s publishing companies, and why metadata has become a strategic issue. Next, we will turn to metadata practices and trends in book publishing. In the final section, we will discuss evolving standards in journal publishing.

Along the way, we will provide pointers to tools and resources that publishers should be familiar with as they embark on integrating automated metadata processes into their content management, production, and marketing/supply systems. A handful of sample metadata records will be displayed, but these are not intended to replace implementation guidelines for the various standards they illustrate, nor do they reflect the full range of metadata schemes, standards, and initiatives presently in use across the information industry.

What Metadata Isn’t

The term metadata has come to refer to standardized, structured information that machines can interpret and use. The boundaries of this definition often overlap, yet are not to be confused with, two related sets of conventions: XML, a widely adopted standard for structuring and exchanging data, and identifiers, which are standards for uniquely naming a piece of content or intellectual property. In this section we take a brief look at XML and identifiers to explain their relation to metadata.

XML

Although not a programming language per se, XML is a language for expressing rules that

XML syntax. XML uses a simple syntax that both people and machines can easily process. The syntax consists of matching start and end tags, such as <journal> and </journal>, to mark up information elements. These tags can also be associated with attributes, also known as name-value pairs (e.g., type = “print”).

Document Type Definition (DTD). An XML DTD provides a description (actually expressed in Standard Generalized Markup Language, or SGML) of the building blocks of any type of XML document, whether that document is a list, a metadata record, a journal article, or a whole book. It includes what to call different types of elements, how they should be ordered, and how they interrelate. Some DTDs are proprietary (created by a company for their internal use), while others are standardized and freely available. The latter include the metadata formats we will discuss in sections 4 and 5.

XML schema. An XML schema (also called an XSD file) is itself an XML document and is an alternative to the DTD that provides XML developers with enhanced validation capabilities and more refined tools for