Metadata Demystified: a Guide for Publishers

ISBN 1-880124-59-9

Table of Contents
What Metadata Is
What Metadata Isn’t
XML
Identifiers
Why Metadata Is Important
What Metadata Means to the Publisher
What Metadata Means to the Reader
Book-Oriented Metadata Practices
ONIX
Journal-Oriented Metadata Practices
ONIX for Serials
JWP On the Exchange of Serials Subscription Information
CrossRef
The Open Archives Initiative
Conclusion
Where To Go From Here
Compendium of Cited Resources
About the Authors and Publishers

Published by: The Sheridan Press & NISO Press
Contributing Editors: Pat Harris, Susan Parente, Kevin Pirkey, Greg Suprock, Mark Witkowski
Authors: Amy Brand, Frank Daly, Barbara Meyers
Copyright 2003, The Sheridan Press and NISO Press
Printed July 2003

This guide presents an overview of evolving metadata conventions in publishing, as well as related initiatives designed to standardize how metadata is structured and disseminated online. Focusing on strategic rather than technical considerations in the business of publishing, this guide offers insight into how book and journal publishers can streamline the various metadata-based operations at work in their companies and leverage that metadata for added exposure through digital media such as the Web. This exposure is an additional way of sharing information about content. It benefits not only publishers, but also potential readers who seek access to published products and the resource discovery environment more generally.

Publishers work with metadata on a daily basis. It is in the manuscript tracking process, in internal reports and content management systems, in marketing copy, and in the information transmitted to the supply chain. Whenever publishers complete copyright registration forms or supply promotional and library cataloging information during the editorial/production process, they create metadata. Similarly, whenever authors cite other publications, or libraries record their holdings, they create metadata.

What Metadata Is

The term metadata refers to information about information or, equivalently, data about data. In current practice, the term has come to mean structured information that feeds into automated processes, and this is currently the most useful way to think about metadata. This definition holds whether the publication that the metadata describes is in print or electronic form. While metadata in publishing can be classified according to a variety of specific functions, such as technical metadata for technical processes, rights metadata for rights resolution, and preservation metadata for digital archiving, this guide focuses on descriptive metadata, or metadata that characterizes the content itself.

Occurrences of metadata vary tremendously in richness; that is, how much or how little of the entity being described is actually captured in the metadata record. The strategic decisions publishers make about metadata often concern how much to expose. The answer to this question depends on the application at hand. In order to enable reference linking across publisher platforms, for instance, the number of metadata elements required is minimal, often less than what occurs in a typical citation. The CrossRef metadata set, which we will look at in section 5, contains only a handful of required elements. For electronic bookselling, where one role of metadata is to approximate the experience of perusing a physical book in a bookstore, the richer the metadata record, the better. Hence, the Online Information Exchange (ONIX) standard for books specifies over 200 elements.

To illustrate what metadata is, let’s look at a simple metadata standard called Dublin Core. The Dublin Core Metadata Initiative (DCMI) got underway in 1995 as a joint effort among professionals from the publishing, library, and academic communities. One outcome of this effort was the Dublin Core Metadata Element Set, which became a NISO standard in 2001 (ANSI/NISO Z39.85-2001) and an international standard (ISO 15836) in 2003.

The DCMI standard includes fifteen optional metadata elements for describing cross-genre, cross-disciplinary information resources. These elements are: title, creator, subject, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage, and rights. Some of these elements relate to the content of the item, some to the item as intellectual property, and others to the particular instantiation, or version of the item.

The Dublin Core website (http://dublincore.org) uses its own metadata scheme to display document information. Table 1 shows a three-element Dublin Core record.

Table 1. Dublin Core Record

Title                     Overview of Documentation for DCMI Metadata Terms
Identifier                http://dublincore.org/usage/documents/overview
Description of Document   This page provides an overview of official documentation of all DCMI metadata terms.

The left-hand column lists element types, and the right-hand column assigns element values for this particular document. Dublin Core has been mapped to several other metadata formats, including the Machine Readable Cataloging (MARC) 21 bibliographic format for representation and exchange of bibliographic information that most library catalogs use today. See http://www.loc.gov/marc for more information.

Metadata in the publishing and communication cycle is not new. What is relatively new to the broader publishing community, and crucial for interoperability in the digital age, is standardization. This is the process of building consensus around best practices in the formatting and use of metadata for specific applications, so that machines can interpret and exchange this information efficiently. In recent years, clear standards have emerged to define metadata elements and the record layout for transmitting those elements.

Standards-building is an ongoing, collaborative process in which book and journal publishers should participate. Despite the fact that a much greater proportion of journal content than book content is digitized, publisher-driven standardization initiatives in book publishing are more advanced than in journal publishing. Book publishers have been driven toward standardization in order to capitalize on aggregated bookselling (traditionally via wholesalers and now through the Internet), which has required them to conform to standards for supplying promotional metadata. Even existing standards have a routine review process to incorporate new features, and publishers can take part via organizations such as the National Information Standards Organization (NISO, http://www.niso.org), in order to have input on how both current and new standards take shape.

The remainder of this document is structured as follows: In the next section, we will refine our operational definition of metadata by explaining its relationship to Extensible Markup Language (XML) and to identifiers. Then we will look at the internal and external roles of metadata in today’s publishing companies, and why metadata has become a strategic issue. Next, we will turn to metadata practices and trends in book publishing. In the final section, we will discuss evolving standards in journal publishing.

Along the way, we will provide pointers to tools and resources that publishers should be familiar with as they embark on integrating automated metadata processes into their content management, production, and marketing/supply systems. A handful of sample metadata records will be displayed, but these are not intended to replace implementation guidelines for the various standards they illustrate, nor do they reflect the full range of metadata schemes, standards, and initiatives presently in use across the information industry.

What Metadata Isn’t

The term metadata has come to refer to standardized, structured information that machines can interpret and use. The boundaries of this definition often overlap, yet are not to be confused with, two related sets of conventions: XML, a widely adopted standard for structuring and exchanging data, and identifiers, which are standards for uniquely naming a piece of content or intellectual property. In this section we take a brief look at XML and identifiers to explain their relation to metadata.

Although not a programming language per se, XML is a language for expressing rules that

XML syntax. XML uses a simple syntax that both people and machines can easily process. The syntax consists of matching start and end tags, such as <journal> and </journal>, to mark up information elements. These tags can also be associated with attributes, also known as name-value pairs (e.g., type = “print”).

Document Type Definition (DTD). An XML DTD provides a description (actually expressed in Standard Generalized Markup Language, or SGML) of the building blocks of any type of XML document, whether that document is a list, a metadata record, a journal article, or a whole book. It includes what to call different types of elements, how they should be ordered, and how they interrelate. Some DTDs are proprietary (created by a company for its internal use), while others are standardized and freely available. The latter include the metadata formats we will discuss in sections 4 and 5.

XML schema. An XML schema (also called an XSD file) is itself an XML document and is an alternative to the DTD that provides XML developers with enhanced validation capabilities and more refined tools for
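
As a rough illustration of the XML syntax described above (matching start and end tags, plus name-value attributes), the three-element record from Table 1 can be written as a small XML document and read back with Python's standard library. This is only a sketch under stated assumptions: the wrapper element <record>, the lowercase element names, and the journal title "Example Journal" are hypothetical and do not follow any particular Dublin Core XML encoding, DTD, or schema. Only the three element values are taken from Table 1.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of the Table 1 record. The <record> wrapper and
# the lowercase element names are assumptions made for illustration only.
RECORD_XML = """
<record>
  <title>Overview of Documentation for DCMI Metadata Terms</title>
  <identifier>http://dublincore.org/usage/documents/overview</identifier>
  <description>This page provides an overview of official documentation of all DCMI metadata terms.</description>
</record>
"""

# A start tag carrying one attribute (name-value pair), as in the guide's
# <journal> example. The element text "Example Journal" is a placeholder.
JOURNAL_XML = '<journal type="print">Example Journal</journal>'


def parse_record(xml_text: str) -> dict:
    """Return a mapping of element name to element value for a flat record."""
    root = ET.fromstring(xml_text.strip())
    return {child.tag: (child.text or "").strip() for child in root}


record = parse_record(RECORD_XML)
journal = ET.fromstring(JOURNAL_XML)

# Element types play the role of the left-hand column, values the right-hand one.
for element_type, element_value in record.items():
    print(f"{element_type}: {element_value}")

# Attributes are exposed as a plain dictionary of name-value pairs.
print(journal.tag, journal.attrib)
```

The parser here only checks that the markup is well formed (every start tag has a matching end tag); checking a record against a DTD or XML schema is a separate validation step that this sketch does not attempt.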