Metadata Demystified: A Guide for Publishers
ISBN 1-880124-59-9

Table of Contents

What Metadata Is
What Metadata Isn’t
XML
Identifiers
Why Metadata Is Important
What Metadata Means to the Publisher
What Metadata Means to the Reader
Book-Oriented Metadata Practices
ONIX
Journal-Oriented Metadata Practices
ONIX for Serials
JWP On the Exchange of Serials Subscription Information
CrossRef
The Open Archives Initiative
Conclusion
Where To Go From Here
Compendium of Cited Resources
About the Authors and Publishers

Published by: The Sheridan Press & NISO Press

Contributing Editors: Pat Harris, Susan Parente, Kevin Pirkey, Greg Suprock, Mark Witkowski

Authors: Amy Brand, Frank Daly, Barbara Meyers

Copyright 2003, The Sheridan Press and NISO Press
Printed July 2003

This guide presents an overview of evolving metadata conventions in publishing, as well as related initiatives designed to standardize how metadata is structured and disseminated online. Focusing on strategic rather than technical considerations in the business of publishing, this guide offers insight into how book and journal publishers can streamline the various metadata-based operations at work in their companies and leverage that metadata for added exposure through digital media such as the Web. This exposure is an additional way of sharing information about content. It benefits not only publishers, but also potential readers who seek access to published products and the resource discovery environment more generally.

Publishers work with metadata on a daily basis. It is present in the manuscript tracking process, in internal reports and content management systems, in marketing copy, and in the information transmitted to the supply chain. Whenever publishers complete copyright registration forms or supply promotional and cataloging information during the editorial/production process, they create metadata. Similarly, whenever authors cite other publications, or libraries record their holdings, they create metadata.

What Metadata Is
The term metadata refers to information about information or, equivalently, data about data. In current practice, the term has come to mean structured information that feeds into automated processes, and this is the most useful way to think about metadata. This definition holds whether the publication that the metadata describes is in print or electronic form. While metadata in publishing can be classified according to a variety of specific functions, such as technical metadata for technical processes, rights metadata for rights resolution, and preservation metadata for digital archiving, this guide focuses on descriptive metadata, or metadata that characterizes the content itself.

Occurrences of metadata vary tremendously in their richness; that is, how much or how little of the entity being described is actually captured in the metadata record. The strategic decisions publishers make about metadata often concern how much to expose, and the answer depends on the application at hand. To enable reference linking across publisher platforms, for instance, the number of metadata elements required is minimal, often fewer than appear in a typical citation. The CrossRef metadata set, which we will look at in section 5, contains only a handful of required elements. For electronic bookselling, where one role of metadata is to approximate the experience of perusing a physical book in a bookstore, the richer the metadata record, the better. Hence, the Online Information Exchange (ONIX) standard for books specifies over 200 elements.

To illustrate what metadata is, let’s look at a simple metadata standard called Dublin Core. The Dublin Core Metadata Initiative (DCMI) got underway in 1995 as a joint effort among professionals from the publishing, library, and academic communities. One outcome of this effort was the Dublin Core Metadata Element Set, which became a NISO standard in 2001 (ANSI/NISO Z39.85-2001) and an international standard (ISO 15836) in 2003.

The DCMI standard includes fifteen optional metadata elements for describing cross-genre, cross-disciplinary information resources. These elements are: title, creator, subject, description, publisher, contributor, date, type, format, identifier, source, language, relation, coverage, and rights. Some of these elements relate to the content of the item, some to the item as intellectual property, and others to the particular instantiation, or version, of the item.

The Dublin Core website (http://dublincore.org) uses its own metadata scheme to display document information. Table 1 shows a three-element Dublin Core record. The left-hand column lists element types, and the right-hand column assigns element values for this particular document.

Table 1. Dublin Core Record
    Title          Overview of Documentation for DCMI Metadata Terms
    Identifier     http://dublincore.org/usage/documents/overview
    Description    This page provides an overview of official documentation of all DCMI metadata terms.

Dublin Core has been mapped to several other metadata formats, including the Machine Readable Cataloging (MARC) 21 bibliographic format, the format for representation and exchange of bibliographic information that most library catalogs use today. See http://www.loc.gov/marc for more information.

Metadata in the publishing and communication cycle is not new. What is relatively new to the broader publishing community, and crucial for interoperability in the digital age, is standardization: the process of building consensus around best practices in the formatting and use of metadata for specific applications, so that machines can interpret and exchange this information efficiently. In recent years, clear standards have emerged to define metadata elements and the record layout for transmitting those elements.

Standards-building is an ongoing, collaborative process in which book and journal publishers should participate. Although a much greater proportion of journal content than book content is digitized, publisher-driven standardization initiatives are more advanced in book publishing than in journal publishing. Book publishers have been driven toward standardization in order to capitalize on aggregated bookselling (traditionally via wholesalers and now through the Internet), which has required them to conform to standards for supplying promotional metadata. Even existing standards have a routine review process to incorporate new features, and publishers can take part via organizations such as the National Information Standards Organization (NISO, http://www.niso.org) in order to have input on how both current and new standards take shape.

The remainder of this document is structured as follows. In the next section, we will refine our operational definition of metadata by explaining its relationship to Extensible Markup Language (XML) and to identifiers. Then we will look at the internal and external roles of metadata in today’s publishing companies, and why metadata has become a strategic issue. Next, we will turn to metadata practices and trends in book publishing. In the final section, we will discuss evolving standards in journal publishing.

Along the way, we will provide pointers to tools and resources that publishers should be familiar with as they embark on integrating automated metadata processes into their content management, production, and marketing/supply systems. A handful of sample metadata records will be displayed, but these are not intended to replace implementation guidelines for the various standards they illustrate, nor do they reflect the full range of metadata schemes, standards, and initiatives presently in use across the information industry.

What Metadata Isn’t
The term metadata has come to refer to standardized, structured information that machines can interpret and use. The boundaries of this definition often overlap with, yet are not to be confused with, two related sets of conventions: XML, a widely adopted standard for structuring and exchanging data, and identifiers, which are standards for uniquely naming a piece of content or intellectual property. In this section we take a brief look at XML and identifiers to explain their relation to metadata.

XML
Although not a programming language per se, XML is a language for expressing rules that give structure to any kind of textual data, including but not limited to metadata. One way to think about XML in this context is as the information “wrapper” or container of choice for your metadata. XML has been widely adopted because it was designed for precisely the kind of data transfer that electronic publishing requires. It also provides an application-independent method for sharing data, and because it is free to license, XML can save publishers money through the use of inexpensive, off-the-shelf tools. A large part of its power comes from the nearly universal support it receives from product vendors, standards bodies, academia, and the open source community.

XML syntax. XML uses a simple syntax that both people and machines can easily process. The syntax consists of matching start and end tags, such as <TitleText> and </TitleText>, that mark up information elements. These tags can also carry attributes, also known as name-value pairs (e.g., type = “print”).

Document Type Definition (DTD). An XML DTD provides a description (actually expressed in Standard Generalized Markup Language, or SGML) of the building blocks of any type of XML document, whether that document is a list, a metadata record, a journal article, or a whole book. It specifies what to call different types of elements, how they should be ordered, and how they interrelate. Some DTDs are proprietary (created by a company for its internal use), while others are standardized and freely available. The latter include the metadata formats we will discuss in sections 4 and 5.

XML schema. An XML schema (also called an XSD file) is itself an XML document and is an alternative to the DTD that provides XML developers with enhanced validation capabilities and more refined tools for structuring their own XML-based formats. Whereas DTDs allow only relatively simple data types, a schema has a set of powerful, flexible semantics for defining what an XML file can contain.

XML workflow. This is not a technical term, but a way of describing the comprehensive infrastructure that publishers put in place in order to capture data in XML format as needed and streamline processes for creating, re-purposing, and disseminating that data.

For additional information on XML, go to http://www.w3.org/XML
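The following Python sketch shows how the XML syntax described above applies to the three-element record in Table 1. It serializes that record as XML. It assumes the Dublin Core element namespace http://purl.org/dc/elements/1.1/ with the conventional dc prefix. The wrapper element `record` is a hypothetical choice made for this sketch and is not defined by any standard discussed here.

```python
import xml.etree.ElementTree as ET

# Dublin Core element namespace (DCMES 1.1), serialized with the "dc" prefix.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# The three element/value pairs shown in Table 1.
TABLE_1 = [
    ("title", "Overview of Documentation for DCMI Metadata Terms"),
    ("identifier", "http://dublincore.org/usage/documents/overview"),
    ("description", "This page provides an overview of official "
                    "documentation of all DCMI metadata terms."),
]

def dublin_core_record(pairs):
    """Wrap each (element, value) pair in matching start and end tags."""
    # "record" is a hypothetical wrapper element chosen for this sketch.
    root = ET.Element("record")
    for name, value in pairs:
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root

if __name__ == "__main__":
    print(ET.tostring(dublin_core_record(TABLE_1), encoding="unicode"))
```

Each pair becomes a start tag and an end tag, such as <dc:title> and </dc:title>. In practice, the enclosing structure would be dictated by the DTD or schema of the target format rather than chosen freely.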

Identifiers
Identifiers are names or strings adhering to certain conventions that, if properly employed, ensure uniqueness. While standard identifiers for publications have been in use for decades, unambiguous identification of content entities has become especially important for electronic publishing and e-commerce platforms. Identifiers and metadata are not one and the same, yet identifiers are most useful in association with metadata.

Identifiers for book and journal publishing can be characterized according to six parameters or features:
• Whether the identifier itself is transparent (derivable) or opaque (not inherently meaningful or interpretable).
• Whether the entity that the identifier points to is a work (an abstraction, not tied to any particular physical medium) or a manifestation (an exemplar of a work in a particular physical medium, such as the online version of a journal that exists in multiple media).
• Whether or not the identifier is actionable in the electronic environment, so that clicking on it takes you directly to the thing being named; the URL is one example of an actionable identifier.
• If actionable, whether or not the identifier is truly persistent: that is, designed to withstand changes in the online location of the content identified. URLs are actionable but not persistent.
• Who drives or regulates the identifier registration process (e.g., the author, publisher, or library community).
• Whether descriptive metadata is registered in association with the identifier.

The following identifiers are the most familiar ones for books and journals in terms of these properties.

ISBN. The International Standard Book Number (ISBN/ISO 2108) is a ten-digit string (e.g., ISBN 0-500-27664-1; the final check digit may also be the letter X) that uniquely identifies each manifestation of a book or non-serial publication. Although sub-parts of the ISBN identify country (or language area) and publisher, ISBN strings as a whole are opaque (“dumb numbers”), non-actionable at present, publisher-driven, and not currently associated with a metadata record. The ISBN standard is now undergoing a revision process that will increase the ISBN string to thirteen digits; in addition, publishers will be encouraged to deposit a core set of metadata as part of the registration process.

For more information on the ISBN, go to http://www.isbn.org/standards/home/index.asp

ISTC. The International Standard Text Code (ISTC/ISO 21047) is a new numbering scheme, mainly but not exclusively for books, under development for unique identification of textual works, as opposed to manifestations. It is intended to be opaque, actionable, and persistent, and potentially to be assigned as soon as a work is conceived by a creator or author. It may potentially be used as an overarching identifier to tie together the various related identifiers registered at the manifestation level. As part of the ISTC registration process, descriptive metadata will be captured by the ISTC registration agency. This metadata will include, at minimum, a title, the name of the author or contributor, a unique identifier for the ISTC registrant, the registration date, and whether the work identified is derived or original.

For more information on the ISTC, go to http://www.nlc-bnc.ca/iso/tc46sc9/istc.htm

ISSN. The International Standard Serial Number (ISO 3297) is an opaque eight-digit number (e.g., 1234-1231; the final check digit may also be X) for unique identification of journals and other serial resources. The same serial in a different physical medium is assigned a different ISSN, and title changes to serials frequently call for new ISSNs as well. ISSN assignment is a regulated process. ISSNs are assigned by ISSN national centers; publishers should contact their national ISSN center to request an ISSN assignment. The National Serials Data Program (NSDP) at the Library of Congress coordinates the U.S. ISSN program.

Each ISSN assigned to a serial publication is registered in an international database (the ISSN Register), along with a relatively rich metadata record. Among the bibliographic elements in these records are ISSN, key title, abbreviated key title, frequency of publication, language, other forms of the title, place of publication, publisher, former title(s), pointers to other language editions and other media editions, and URLs. ISSN records are available in MARC-compatible format.

For more information on the ISSN, go to http://www.issn.org

SICI. The Serial Item and Contribution Identifier standard (SICI) (e.g., SICI: 0002-8231(199412)45:10<>1.1.TX;2-M) is a NISO standard (ANSI/NISO Z39.56-1996) for the unique identification of a serials issue or article, regardless of the distribution medium. SICI is designed to be dynamically constructed and in that sense is a transparent identifier. It is neither actionable nor associated with a metadata record in its current implementation. Due to their strict, derivable format, SICIs can be created and used by anyone involved in serials management, and automated SICI generators have been created for this purpose.

For more information on the SICI, go to http://sunsite.berkeley.edu/SICI

DOI. The Digital Object Identifier (DOI) syntax is a more recent open standard (ANSI/NISO Z39.84-2000). The DOI system is a complete system for implementing persistent identifiers. DOIs themselves are variable-length alphanumeric strings (e.g., doi:10.1101/gr.10.12.1841) assigned by publishers at any level of granularity. Although the DOI is designed to be opaque, the DOI suffix can optionally incorporate other existing identifiers. The DOI is also actionable; one click on a properly implemented DOI gets the user to the location of the content being identified. The DOI is persistent because it is paired with the content object’s electronic address, or URL, in an updateable central directory and published in place of the URL. This avoids broken links while allowing the content to move as needed. The content that a DOI is linked to may take the form of a manifestation (the electronic version of an article, for instance). Even so, the DOI can function as a work-level identifier when it is associated with a rich set of metadata elements describing a work.

Declaration of kernel metadata is in the process of becoming mandatory for all DOIs in the global DOI directory. The requisite metadata follows a carefully designed scheme based on indecs (http://www.indecs.org) to maximize interoperability. This way, the DOI can support a range of applications for electronic content, such as e-commerce, management of rights and permissions, and the creation of learning objects. For example, one of the official DOI registration agencies of the International DOI Foundation, Learning Objects Network (http://www.learningobjectsnetwork.com), is applying DOI functionality to SCORM-compliant learning objects. The Sharable Content Object Reference Model (SCORM) consists of metadata specifications for a range of e-learning content applications. See http://www.adlnet.org for more information.
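The Python sketch below makes the transparent and opaque distinctions above concrete. It covers the check-digit rules behind ISBN-10 and ISSN, the prefix/suffix structure of a DOI, and the derivable layout of a SICI. It is a minimal illustration rather than a validator endorsed by any registration agency, and it rests on these assumptions:
• ISBN-10 and ISSN use modulus-11 check digits, with weights 10 down to 1 for ISBN-10 and 8 down to 2 for ISSN, and X standing for 10.
• A ten-digit ISBN converts to thirteen digits via the EAN "978" prefix.
• The public resolver at https://doi.org/ makes a DOI actionable.
• A SICI splits into item, contribution, and control segments, a simplification of the full standard.

All function names are hypothetical.

```python
import re
from urllib.parse import quote

def _compact(code: str) -> str:
    """Remove hyphens and spaces; upper-case so a check digit 'x' becomes 'X'."""
    return code.replace("-", "").replace(" ", "").upper()

def isbn10_is_valid(isbn: str) -> bool:
    """ISBN-10: digits weighted 10..1 must sum to a multiple of 11 (X = 10, last place only)."""
    s = _compact(isbn)
    if len(s) != 10 or not s[:9].isdigit() or not (s[9].isdigit() or s[9] == "X"):
        return False
    values = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
    return sum(v * w for v, w in zip(values, range(10, 0, -1))) % 11 == 0

def isbn10_to_isbn13(isbn: str) -> str:
    """Prefix '978' to the first nine digits and append a new EAN-13 check digit."""
    if not isbn10_is_valid(isbn):
        raise ValueError(f"not a valid ISBN-10: {isbn!r}")
    core = "978" + _compact(isbn)[:9]
    # EAN-13 weights alternate 1, 3, 1, 3, ... from the left.
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(core))
    return core + str((10 - total % 10) % 10)

def issn_is_valid(issn: str) -> bool:
    """ISSN: first seven digits weighted 8..2; check = (11 - sum % 11) % 11, 10 shown as X."""
    s = _compact(issn)
    if len(s) != 8 or not s[:7].isdigit():
        return False
    total = sum(int(c) * w for c, w in zip(s[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return s[7] == ("X" if check == 10 else str(check))

def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI at the first '/' into its prefix ('10.' + registrant) and suffix."""
    d = doi.strip()
    if d.lower().startswith("doi:"):
        d = d[4:]
    prefix, sep, suffix = d.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix

def doi_to_url(doi: str) -> str:
    """Make a DOI actionable through a resolver proxy (https://doi.org/ assumed)."""
    prefix, suffix = split_doi(doi)
    return "https://doi.org/" + quote(f"{prefix}/{suffix}", safe="/")

# Simplified SICI layout: ISSN(chronology)enumeration<contribution>control
_SICI = re.compile(r"(\d{4}-\d{3}[\dX])\(([^)]*)\)([^<]*)<([^>]*)>(\S+)")

def parse_sici(sici: str) -> dict[str, str]:
    """Split a SICI into segments; the final check character is not verified."""
    m = _SICI.fullmatch(sici.strip())
    if m is None:
        raise ValueError(f"not a SICI: {sici!r}")
    keys = ("issn", "chronology", "enumeration", "contribution", "control")
    return dict(zip(keys, m.groups()))

if __name__ == "__main__":
    print(isbn10_is_valid("0-500-27664-1"), isbn10_to_isbn13("0-500-27664-1"))
    print(issn_is_valid("1234-1231"), split_doi("doi:10.1101/gr.10.12.1841"))
    print(doi_to_url("doi:10.1101/gr.10.12.1841"))
    print(parse_sici("0002-8231(199412)45:10<>1.1.TX;2-M"))
```

The examples reuse identifiers quoted in this guide. The SICI parser shows why that identifier is called transparent: the ISSN, the date (December 1994), and the volume and issue (45:10) are all recoverable from the string itself. By contrast, the ISBN and ISSN functions can only confirm that a number is well formed, not what it names.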

For more information on the DOI, go to http://www.doi.org

This selection of identifier standards currently in use in book and journal publishing indicates a clear trend toward identifiers with the following properties: actionability, persistence, opacity, and association with metadata. Identifiers with these characteristics best meet the demands of the digital medium. While the ISBN, ISSN, and SICI are not currently actionable, they could well be in the future. Actionable, persistent identifiers add value to publications because they enable new functionality and work reliably in the Web environment. Identifiers do not need to be transparent or inherently meaningful if they are associated with descriptive metadata and primarily interpreted by machines. Finally, registration of an identifier along with metadata lays the groundwork for constructing other automated services around the content being identified.

Why Metadata Is Important
Metadata can take many forms, and metadata records can vary tremendously in richness, creating an array of content management and economic models.

What metadata means to the publisher
Publishers benefit in many ways from automating and streamlining their internal metadata practices. In book publishing, it is still common to see employees in different departments re-keying the same descriptive information for different purposes; for instance, when a new contract is logged, when that same manuscript is launched for editing, when its marketing and catalog copy is created, etc. With appropriate back-office tools and procedures in place, a publisher can set up a database of metadata elements compiled from the various departments. This database can feed multiple metadata templates corresponding to the formats required for different purposes, both internal and external. Given such a system, responsibility for validating the data can be easily shared across departments. At the same time, any update to an information element, such as the title or price, is automatically propagated to all outputs.

Supplying structured metadata according to several formats may seem like a huge task. However, the web of mappings among common metadata standards continues to grow, and there are many shared elements across the different standards. For example, the data elements currently proposed for the new ISBN kernel were developed as a subset of ONIX and are a subset of Dublin Core. All of the standards a publisher now encounters are likely to be tagged in XML and to function across several formats.

The benefits of structuring and tagging text apply not only to metadata but to full-text content. Full-text mark-up of books and journals allows them to be readily re-purposed for course-packs, in derivative works requiring a subset or re-ordering of the original content, and as input to emerging archival standards. The key to successful metadata usage is to develop the systems and procedures necessary to maintain and disseminate metadata as an integral part of the publication process. Creating structured metadata as a normal part of the production workflow allows a publisher to provide consistent information about products to all the communities using that information.

What metadata means to the reader
Many of the advantages that publishers reap from effective use of metadata turn out to benefit the reader and research communities as well. For example, the online aggregation of book metadata brought about by centralized Internet bookselling was a boon for publishers, who saw an unprecedented surge in sales of the backlist titles that they no longer promoted through established channels. It was equally an advantage for scholars seeking out those obscure backlist titles. Readers, for the first time, had at their disposal an easy way to search across a comprehensive, cross-publisher database of available books and complete a purchase. The digital medium has made published information easier to disseminate, search, and sell, and metadata plays a critical role in these advantages.

In the publication of journals, cross-publisher metadata has traditionally been aggregated by intermediaries, or secondary publishers. These intermediaries create sophisticated tools and services (e.g., citation indexing and resource discovery) around subject-based databases of bibliographic information and journal article abstracts. The process of compiling this metadata has been substantially automated, although there are still some manual components, such as selecting content for inclusion, classification of content, and writing abstracts where they are not already available.

Abstracting and indexing (A&I) services have been a source of income for publishers who sell metadata or have their own secondary divisions. Publishers also earn income from aggregators that license full-text content or link back to publishers and thereby drive article sales and journal subscriptions. These business models are currently in flux. Many publishers now have their own journal websites where they freely provide bibliographic information, abstracts, tables of contents, and other resources that they may previously have considered proprietary.

From the end-user perspective, metadata and its innovative use by publishers have already transformed the research process. One important metadata-driven trend is toward virtual, or distributed, aggregation of information resources. Researchers who have long relied on specialist databases to mine authoritative information resources in their fields now turn to powerful search engines that index, but do not aggregate, those resources. The more robust the metadata that publishers expose for this purpose, the more they will benefit from this trend. Interlinking of resources is another example of distributed integration in e-journal publishing. Both publisher and researcher benefit from initiatives that use metadata and identifier registration to enable cross-publisher linking without aggregation of any proprietary content. (The term distributed integration is attributed to Brian Schottlaender; see Schottlaender, B., “Portals for Integration and Collaboration” presented at the AAP/PSP Annual Conference, Washington, DC, February 2003.)

Publishers are now cooperating directly with one another, with some exposing not only their metadata but also their full text for search and navigation purposes. In addition, automated tools for the intelligent classification of content have become more available. As a result of these trends, there will be less of a need for manual aggregation of subject-based resources in the future. As publisher-supplied metadata grows to include more semantic information about a publication, conceptually based research tools will also evolve. Standards are also emerging for capturing metadata associated with the individual user (e.g., access rights profiles and personal preferences). As they do, frameworks will be required for structuring how that kind of metadata interacts with the metadata for information resources. A number of initiatives are currently underway to specify, at a high level, how metadata standards for different domains (publications, individuals, e-commerce applications) should interoperate.

(See, for example, http://www.indecs.org, http://www.cores-eu.net, and http://www.w3.org/RDF.)

Metadata is thus both a marketing tool and a way to add functionality to electronic publications. It allows publishers to “open up” their proprietary content for e-commerce and resource discovery applications such as indexing, search, and linking, while maintaining control over their own trading practices.

Book-Oriented Metadata Practices
An explanation of how book wholesalers, retailers, and libraries use metadata will clarify why metadata is becoming critical to the overall success of every publisher. Historically, wholesalers obtained their information about forthcoming titles from visits by publisher sales representatives and/or catalogs. The wholesaler used the information in a publisher’s catalog to update its in-house inventory database manually, re-keying the data elements its system needed to track customer demand and order a title.

This inventory database was often the source of the catalogs and selection lists the wholesaler created and mailed to its customers (mainly libraries, for research and scholarly titles). As wholesalers expanded the number of book titles they stocked or were willing to obtain for their customers, the cost of this re-keying of data increased. At the same time, the shift in technology from microfiche to CD-ROM and then to the Web increased the amount of information that publishers provided on each title.

These factors led wholesalers to seek an electronic means of obtaining title information from publishers. The earliest standard was the Book Industry Systems Advisory Committee (BISAC) Title Status format. This format was eventually superseded by the BISAC X12 832 transaction. Both of these formats are now obsolete, although ONIX, which is being adopted as the standard method of communicating metadata, has not yet entirely replaced them.

Ultimately, wholesalers created web-based applications for their customers that require the detailed range of data included in an ONIX record. At present, wholesalers are using a combination of publisher-provided electronic files and manual keying of data to maintain these applications. ONIX is fast becoming the method wholesalers use to update their web products, and the same wholesalers have been licensing their databases to Internet booksellers for use on bookseller sites. The data that publishers provide to wholesalers, therefore, not only updates the wholesalers’ internal files but also feeds their websites and often those of several Internet booksellers. Like wholesalers, booksellers require detailed information about titles to decide on the initial buy. They also require an easy method of placing basic information in their inventory management systems, and those with websites require rich metadata for their promotional web pages.

Many library suppliers have developed web-based search and order applications that resemble an Internet retailer’s site, with jacket image, table of contents, first chapter, and so on. These sites allow librarians to access considerably more information about a title than could be provided in a catalog. Librarians are also licensing wholesaler and bibliographic databases, such as Bowker’s Books in Print, for use on their internal acquisitions systems and Online Public Access Catalogs (OPACs). They are also beginning to use portions of publishers’ ONIX records to enhance their MARC record data.
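On the publisher side, the alternative to re-keying is to maintain each title’s metadata once and generate every outgoing feed from that single record. A corrected title or a new price then reaches wholesalers, booksellers, and library suppliers alike. The Python sketch below illustrates this single-source pattern under simplifying assumptions. The `TitleRecord` fields, the two output templates, and the price values are hypothetical, and neither template produces a complete Dublin Core or ONIX record.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class TitleRecord:
    """Master record for one title, keyed once and shared by every department.

    The field set is a hypothetical minimum chosen for this sketch.
    """
    isbn: str
    title: str
    author_inverted: str  # "Surname, Forenames", as in ONIX PersonNameInverted
    price: str            # hypothetical: a single list price, currency omitted

def to_dublin_core(rec: TitleRecord) -> dict[str, str]:
    """Render a few Dublin Core elements (element name -> value)."""
    return {"title": rec.title, "creator": rec.author_inverted,
            "identifier": "ISBN " + rec.isbn}

def to_trade_feed(rec: TitleRecord) -> dict[str, str]:
    """Render flat, ONIX-style fields for a wholesaler; not a complete ONIX record."""
    return {"ProductIDType": "02", "IDValue": rec.isbn, "TitleText": rec.title,
            "PersonNameInverted": rec.author_inverted, "PriceAmount": rec.price}

# Every output template is generated from the same master record.
TEMPLATES = {"dublin_core": to_dublin_core, "trade_feed": to_trade_feed}

def publish_all(rec: TitleRecord) -> dict[str, dict[str, str]]:
    """Regenerate every output from the master record."""
    return {name: render(rec) for name, render in TEMPLATES.items()}

if __name__ == "__main__":
    rec = TitleRecord("0816016356", "British English, A to Zed",
                      "Schur, Norman W", "19.95")  # price is a made-up value
    # Updating the master record once changes every generated output.
    updated = replace(rec, price="21.95")
    print(publish_all(updated)["trade_feed"]["PriceAmount"])
```

Adding a new outlet means adding one more template function. It never means keying the title data again, which is the saving the wholesalers described above were looking for.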

ONIX
The ONIX initiative got underway in 1999, when the Association of American Publishers (AAP) brought together the major publishers, wholesalers, online retailers, and book information services personnel. Their aim was a universal, international format in which all trading partners, regardless of their size, could exchange information about books. The working group released ONIX 1.0 in January 2000. Release 2.1 of ONIX is currently in development.

ONIX is now published and maintained by EDItEUR in association with the Book Industry Study Group (BISG, http://www.bisg.org) in the U.S. and Book Industry Communication (BIC) in the U.K., and has become the international standard for book-trade metadata. In addition to the United States and United Kingdom, France, Germany, and Korea have set up national implementation groups; the ONIX DTD has been extended to accommodate the trading practices in these countries.

ONIX comprises both a content specification and an XML DTD. The content specification includes a comprehensive set of carefully defined data elements, code lists, and XML tags. The tags can be either short codes (e.g., <b202>) or text labels (e.g., <TitleType>). XML schemas have also been defined for trial purposes.

ONIX was originally designed for books and other non-serial materials, such as audio and point-of-sale materials produced by book publishers. Its scope has now grown to cover serials (see below), and a version of ONIX has been developed for the video/DVD sector.

ONIX data elements include structured tables of contents, text items (e.g., descriptions, reviews, extracts, author biographies), images (e.g., jackets, author pictures, double-page spreads), links to video, audio, or websites, territorial rights information, price and availability in different markets, and promotional information, as well as comprehensive bibliographic information.

The following examples show part of the same ONIX sample record, the first using plain-text “reference names” as XML tags and the second using short tags:

    <ProductIdentifier>
      <ProductIDType>02</ProductIDType>
      <IDValue>0816016356</IDValue>
    </ProductIdentifier>
    <ProductForm>BB</ProductForm>
    <Title>
      <TitleType>01</TitleType>
      <TitleText textcase="02">British English, A to Zed</TitleText>
    </Title>
    <Contributor>
      <SequenceNumber>1</SequenceNumber>
      <ContributorRole>A01</ContributorRole>
      <PersonNameInverted>Schur, Norman W</PersonNameInverted>
      <BiographicalNote>A Harvard graduate in Latin and Italian literature, Norman Schur attended the University of Rome and the Sorbonne before returning to the United States to study law at Harvard and Columbia Law Schools. Now retired from legal practise, Mr. Schur is a fluent speaker and writer of both British and American English</BiographicalNote>
    </Contributor>

    <productidentifier>
      <b221>02</b221>
      <b244>0816016356</b244>
    </productidentifier>
    <b012>BB</b012>
    <title>
      <b202>01</b202>
      <b203 textcase="02">British English, A to Zed</b203>
    </title>
    <contributor>
      <b034>1</b034>
      <b035>A01</b035>
      <b037>Schur, Norman W</b037>
      <b044>A Harvard graduate in Latin and Italian literature, Norman Schur attended the University of Rome and the Sorbonne before returning to the United States to study law at Harvard and Columbia Law Schools. Now retired from legal practise, Mr. Schur is a fluent speaker and writer of both British and American English</b044>
    </contributor>
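The two forms of the sample record carry identical data, so converting from reference names to short tags is a mechanical renaming of elements. The Python sketch below assumes a mapping that covers only the elements in the sample. The pairings TitleType to b202 and TitleText to b203 appear in the sample itself. The other short tags come from the ONIX 2.x tag list and should be checked against EDItEUR’s documentation before use. Attributes such as `textcase` are copied unchanged, and the `Product` wrapper in the test input is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

# Reference name -> short tag, limited to the elements in the sample record.
# Composite (wrapper) elements use lower-case names in the short-tag form.
SHORT_TAGS = {
    "Product": "product",
    "ProductIdentifier": "productidentifier",
    "ProductIDType": "b221",
    "IDValue": "b244",
    "ProductForm": "b012",
    "Title": "title",
    "TitleType": "b202",
    "TitleText": "b203",
    "Contributor": "contributor",
    "SequenceNumber": "b034",
    "ContributorRole": "b035",
    "PersonNameInverted": "b037",
    "BiographicalNote": "b044",
}

def to_short_tags(elem: ET.Element) -> ET.Element:
    """Return a copy of elem with every known reference name replaced by its short tag."""
    copy = ET.Element(SHORT_TAGS.get(elem.tag, elem.tag), dict(elem.attrib))
    copy.text, copy.tail = elem.text, elem.tail
    for child in elem:
        copy.append(to_short_tags(child))
    return copy

SAMPLE = """<Product>
<ProductIdentifier><ProductIDType>02</ProductIDType><IDValue>0816016356</IDValue></ProductIdentifier>
<ProductForm>BB</ProductForm>
<Title><TitleType>01</TitleType><TitleText textcase="02">British English, A to Zed</TitleText></Title>
</Product>"""

if __name__ == "__main__":
    short = to_short_tags(ET.fromstring(SAMPLE))
    print(ET.tostring(short, encoding="unicode"))
```

Unknown tags pass through unchanged. A real converter would instead work from the complete tag list published with the ONIX DTD.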

Creating an ONIX message involves two basic steps: organizing the data into ONIX-specified fields and storing it in a database; and using an XML software application and the ONIX DTD to organize and tag that data. A single ONIX message may contain data about multiple titles. An ONIX message is transmitted across networks and the Internet the same way as other data, for instance as an email attachment or via FTP. Once an online retailer receives an ONIX message, the same tools (an XML software application and the ONIX DTD) are used to validate the data. From that point, the retailer translates the delivered data into what is seen on a web page.

ONIX differs from other metadata standards in that an ONIX record is very rich, with over 200 data elements, some optional and some required. For example, ISBN, author name, and title are required elements; book reviews and cover image remain optional. In contrast, DCMI uses only fifteen elements, all repeatable and optional. A full ONIX record loaded onto a website provides a searching experience similar to that of browsing the physical book. Just as book retailers and wholesalers came to require an ISBN and a bar code, they will soon require an ONIX record for every new title. Several publishers are already delivering ONIX data feeds to their trading partners.

For more information on ONIX, go to http://www.editeur.org/onix.html

Journal-Oriented Metadata Practices

Journal publishers have been slower than book publishers to converge on their own metadata standards, in part due to a business environment in which metadata was largely the purview of other parties, such as subscription agents, aggregators, and libraries. Although electronic publishing has taken a firm hold in journals publishing, most publishers have focused their energies on their own proprietary journal platforms and formats. This approach is changing as libraries, publishers, and third parties exchange an increasing amount of catalog information, serials subscription data, and other structured data at multiple bibliographic levels (journal, volume, issue, article). It is in this environment that the developers of ONIX have undertaken efforts to extend ONIX to serials.

ONIX for serials

Three new ONIX records specific to serials are currently under review: the Serial Title Record, the Serial Item Record, and the Subscription Package Record. The Serial Title Record is the proposed ONIX format for exchanges of rich catalog information. It provides a readily extensible framework for the description of a journal as a bibliographic item, including such details as the cost of an individual subscription item. The Serial Item Record is the ONIX format for alerting, shipping, library check-in functions, and structured multilevel bibliographic description of serial parts. The Subscription Package Record is the ONIX format for communicating a publisher's or agent's product catalog information about subscription packages; it is used alongside the Serial Title Record, which carries product catalog information about individual serials.

A Serial Title Record file is linkable to an accompanying Serial Item Record file when more complex price information is required, such as the ability to specify "off-the-shelf" or tailored subscription packages of the kind increasingly being offered by academic journal publishers. This linkage could prove invaluable for sales of journals to consortia.

JWP on the exchange of serials subscription information

Taking ONIX for serials as a starting point, NISO and EDItEUR have recently launched a Joint Working Party (JWP) to explore the creation of standard formats for the exchange of serials subscription information. At present, most such exchanges use variable, proprietary formats, except where formats appropriate to a given exchange already exist, such as the MARC 21 bibliographic format for library holdings data. In the future, publishers and others will probably face more pressure to exchange this information in an accurate, efficient, and secure manner. Developing these guidelines also requires standard identifiers for the key elements in the exchange, including parties to the exchange, aggregations, subscription packages, and the journals themselves.

The JWP is currently functioning as three subgroups: one on identifiers, another on publisher-to-library exchanges, and a third on PAMS (Publication Access Management Service)-to-library exchanges. The immediate goals of the JWP are to implement pilot programs in these three areas during 2003 and ultimately to recommend specific enhancements to the ONIX for serials schema.

For more information on the JWP, go to http://www.niso.org/news/SerialsExchange.html

CrossRef

CrossRef is a DOI-based system for the persistent identification of scholarly content and for cross-publisher reference linking to the full text of journal articles. CrossRef DOIs link to publisher response pages, which include the full bibliographic citation and abstract and provide full-text access as determined by the publisher. The publisher response page often includes other linking options, such as pay-per-view access, the journal table of contents and homepage, and associated resources. CrossRef has recently begun adding books and conference proceedings to its linking network.

Publisher members of CrossRef initially deposit a record for a content item that consists of minimal bibliographic metadata: journal title, ISSN, first author, year, volume, issue, page number, DOI, and URL. Depositing metadata with CrossRef involves creating a file formatted according to an XML schema. The following example illustrates an abbreviated metadata record containing both journal-level and article-level elements:

    Applied Physics Letters
    Appl. Phys. Lett.
    00036951
    10773118
    10.1063/aplo
    http://ojps.aip.org/aplo/
    …
    Ann P.
    Shirakawa
    1999
    2268
    10.1063/1.123820
    19990628123304
    http://ojps.aip.org/link/?apl/74/2268/ab

After a publisher deposits a record, CrossRef registers the DOI-URL pair in the central DOI directory and maintains the full metadata set in its metadata database (MDDB). In a separate process, the publisher submits the citations contained in each deposited article to the Reference Resolver, the front-end component of the MDDB that allows for the retrieval of DOIs. Using this method, the publisher can, as part of its electronic production process, add outbound hyperlinks to any of an article's citations that point to content already registered in the CrossRef system.
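
To make the deposit step concrete, here is a minimal Python sketch that assembles the abbreviated record above into XML after checking both ISSNs with the standard ISSN check-digit rule (weights 8 down to 2 on the first seven digits, modulus 11, with a check value of 10 written as "X"). The element names (journal_metadata, journal_article, doi_data, and so on) are illustrative assumptions modeled loosely on CrossRef's deposit schema, not a copy of any schema version; a real deposit must follow the schema CrossRef publishes. Which of the two ISSNs is print and which is electronic is also our assumption.

```python
import xml.etree.ElementTree as ET

def issn_check_digit(first7: str) -> str:
    """Compute an ISSN check digit: weights 8..2, modulus 11, 10 written as 'X'."""
    total = sum(int(d) * w for d, w in zip(first7, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)

def is_valid_issn(issn: str) -> bool:
    """Accept '0003-6951' or '00036951'; reject wrong length or a bad check digit."""
    chars = issn.replace("-", "").upper()
    if len(chars) != 8 or not chars[:7].isdigit():
        return False
    return issn_check_digit(chars[:7]) == chars[7]

def build_deposit(rec: dict) -> str:
    """Serialize an abbreviated journal-level plus article-level record.

    Element names are illustrative assumptions, not a specific schema version.
    """
    for key in ("print_issn", "electronic_issn"):
        if not is_valid_issn(rec[key]):
            raise ValueError(f"invalid ISSN in {key}: {rec[key]}")

    journal = ET.Element("journal")
    meta = ET.SubElement(journal, "journal_metadata")  # journal-level elements
    ET.SubElement(meta, "full_title").text = rec["full_title"]
    ET.SubElement(meta, "abbrev_title").text = rec["abbrev_title"]
    ET.SubElement(meta, "issn", media_type="print").text = rec["print_issn"]
    ET.SubElement(meta, "issn", media_type="electronic").text = rec["electronic_issn"]
    journal_doi = ET.SubElement(meta, "doi_data")
    ET.SubElement(journal_doi, "doi").text = rec["journal_doi"]
    ET.SubElement(journal_doi, "resource").text = rec["journal_url"]

    article = ET.SubElement(journal, "journal_article")  # article-level elements
    person = ET.SubElement(ET.SubElement(article, "contributors"), "person_name", sequence="first")
    ET.SubElement(person, "given_name").text = rec["given_name"]
    ET.SubElement(person, "surname").text = rec["surname"]
    ET.SubElement(ET.SubElement(article, "publication_date"), "year").text = rec["year"]
    ET.SubElement(ET.SubElement(article, "pages"), "first_page").text = rec["first_page"]
    doi_data = ET.SubElement(article, "doi_data")
    ET.SubElement(doi_data, "doi").text = rec["doi"]
    ET.SubElement(doi_data, "timestamp").text = rec["timestamp"]
    ET.SubElement(doi_data, "resource").text = rec["url"]
    return ET.tostring(journal, encoding="unicode")

# Values from the abbreviated example above. Volume and issue are not shown
# in that example, so they are left out here as well.
record = {
    "full_title": "Applied Physics Letters",
    "abbrev_title": "Appl. Phys. Lett.",
    "print_issn": "00036951",
    "electronic_issn": "10773118",
    "journal_doi": "10.1063/aplo",
    "journal_url": "http://ojps.aip.org/aplo/",
    "given_name": "Ann P.",
    "surname": "Shirakawa",
    "year": "1999",
    "first_page": "2268",
    "doi": "10.1063/1.123820",
    "timestamp": "19990628123304",
    "url": "http://ojps.aip.org/link/?apl/74/2268/ab",
}
print(build_deposit(record))
```

Checking the ISSNs before serializing catches transcription errors locally, before the file ever reaches the deposit step.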

If the identified content migrates from one production system to another (e.g., pre-print to post-print), or moves from one publisher to another if a journal (or the publisher itself) changes ownership, the publisher need only update the URL in one place for the DOI to persist. In all these cases the DOI never changes, so all links to that content that have already been made will still function.

The CrossRef Reference Resolver accepts bibliographic metadata and returns the corresponding DOI. Queries use a pipe-delimited format containing ten fields for queries against journal holdings and twelve fields for queries against book and conference proceedings holdings. Queries are submitted interactively through a Web browser interface or programmatically via the system's HTTP interface. The resolver will also accept a DOI as input and return the associated metadata. When a query result is returned, the metadata can be presented either in the same pipe-delimited format or as XML.

For more information on CrossRef, go to http://www.crossref.org

OpenURL and CrossRef. The OpenURL is a mechanism for transporting metadata and identifiers describing a publication for the purpose of context-sensitive linking. The OpenURL is currently on the path toward NISO approval.

A link resolver is a system for linking within an institutional context that can interpret incoming OpenURLs, take the local holdings and access privileges of that institution (usually a library) into account, and display links to appropriate resources. A link resolver allows the library to provide a range of library-configured links and services, including links to the full text, a local catalog to check print holdings, document delivery or interlibrary loan (ILL) services, databases, search engines, etc. For the user working in an institutional context, it is often useful to be directed to resources outside the publisher's site. For example, the institution may not subscribe to the e-journal itself but may still be able to offer the user access to the desired article through an aggregated database or print holdings. In addition, the library may wish to provide a range of linking options beyond what is available at the publisher's website.

Information providers are beginning to implement the OpenURL to enable optimal integration with library linking systems. This has caused some confusion among primary and secondary publishers who use the CrossRef/DOI system for cross-publisher links to full text, because of the mistaken perception that the OpenURL and the DOI are competing standards. They are not. CrossRef and the DOI provide persistent identification of scholarly content and centralized linking to the full text and other resources designated by the publisher. The OpenURL is designed for localized linking and enables library-controlled links to a multiplicity of resources related to a citation.

The OpenURL and DOI work together in several ways. First, the DOI directory itself, where link resolution occurs in the CrossRef system, is OpenURL-enabled. This means that it can recognize a user with access to a local resolver. When such a user clicks on a DOI, the CrossRef system redirects that DOI back to the user's local resolver and at the same time allows the DOI to be used as a key to pull metadata out of the CrossRef database: the metadata needed to create the OpenURL that targets the local link resolver. As a result, the institutional user clicking on a DOI is directed to appropriate resources.

By using the CrossRef/DOI system to identify their content, publishers can make their products OpenURL-aware. Since DOIs can streamline linking and data management processes for publishers, many publishers are beginning to require that the DOI be used as the primary mechanism for linking to full text; link resolvers can then use the CrossRef system to retrieve the DOI if it is not already available from the source (citing) document.

For more information on the OpenURL, go to http://library.caltech.edu/openurl

The Open Archives Initiative

Although the Open Archives Initiative (OAI) got underway as a means of supporting distributed e-print archives with tools for interoperability, a growing number of publishers now recognize its value as a tool for disseminating publisher metadata. The OAI framework for exposing metadata through the OAI Protocol for Metadata Harvesting (OAI-PMH) is entirely independent of the type of underlying content and the economic models surrounding that content.

OAI-PMH defines an easy-to-implement tool for harvesting XML-formatted metadata from content repositories, or servers. Participation can take one of two forms: data providers use OAI-PMH to expose metadata, while service providers use metadata harvested via OAI-PMH to build new services. To quote Clifford Lynch, Executive Director of the Coalition for Networked Information (CNI), OAI-PMH is "simply an interface that a networked server (not necessarily an e-print server) can employ to make metadata describing objects housed at that server available to external applications that wish to collect this metadata." (See Lynch, C., "Metadata Harvesting and the Open Archives Initiative," ARL Bimonthly Report 217, available at http://www.arl.org/newsltr/217/mhp.html.)

For more information, go to http://www.openarchives.org

Conclusion

Metadata has become an essential part of the publication process. Whether an information resource is published in book or journal form, in print or electronic format, metadata is how the content creator or producer advertises its existence. The richer the metadata record, the greater the possibilities.

As the sea of information grows, the tasks of locating, discovering, linking to, searching, repurposing, integrating, tracking, exchanging, or selling a given information resource all tend to become more complex. Good metadata practices reduce some of this complexity and help publishers harness the opportunities that new technologies will bring.

Where To Go From Here

Without recommending specific products or vendors, the following list provides some information resources on electronic publishing that serve as good starting points:

Since 1997, Sheridan Press has published a series of white papers on information technology and publishing, available at http://www.sheridanpress.com/whitepapers.htm.

NISO standards and guides are available to the public without charge from the NISO website: http://www.niso.org. NISO offers workshops and programs throughout the year focusing on standards and good publishing practices.

Both the Society for Scholarly Publishing (http://www.sspnet.org) and the Council of Science Editors (http://www.councilofscienceeditors.org) offer tutorials on electronic publishing topics.

The Columbia Guide to Digital Publishing, edited by William Kasdorf, is available for online browsing at http://www.digitalpublishingguide.com and is an excellent, up-to-date resource on XML, content management, and related workflow issues.

Data Conversion Labs publishes a newsletter called DCLNews at http://www.dclab.com/DCLNews.asp that is available via free subscription. It offers in-depth reports, news briefs, and other information about current educational opportunities and resources in electronic publishing.

The NYU Center for Publishing, part of the School of Continuing and Professional Studies (http://www.scps.nyu.edu/departments/index.jsp), offers classes on ONIX and on technology and publishing.

Compendium of Cited Web Resources

Book Industry Study Group (BISG): http://www.bisg.org
Coalition for Networked Information (CNI): http://www.cni.org
Columbia Guide to Digital Publishing: http://www.digitalpublishingguide.com
CORES Forum on Shared Metadata Vocabularies: http://www.cores-eu.net
Council of Science Editors (CSE): http://www.councilofscienceeditors.org
CrossRef: http://www.crossref.org
DCLNews: http://www.dclab.com/DCLNews.asp
Digital Object Identifier (DOI): http://www.doi.org
Dublin Core Metadata Initiative (DCMI): http://dublincore.org
Extensible Markup Language (XML): http://www.w3.org/XML
International Standard Book Number (ISBN): http://www.isbn.org/standards/home/index.asp
International Standard Serial Number (ISSN): http://www.issn.org
International Standard Text Code (ISTC): http://www.nlc-bnc.ca/iso/tc46sc9/istc.htm
Interoperability of Data in E-Commerce Systems (INDECS): http://www.indecs.org
Learning Objects Network (LON): http://www.learningobjectsnetwork.com
Machine-Readable Cataloging (MARC): http://www.loc.gov/marc
National Information Standards Organization (NISO): http://www.niso.org
NISO-EDItEUR Joint Working Party on the Exchange of Serials Subscription Information: http://www.niso.org/news/SerialsExchange.html
NYU Center for Publishing: http://www.scps.nyu.edu/departments/index.jsp
Online Information Exchange (ONIX): http://www.editeur.org/onix.html
Open Archives Initiative (OAI): http://www.openarchives.org
OpenURL: http://library.caltech.edu/openurl
Serial Item and Contribution Identifier Standard (SICI): http://sunsite.berkeley.edu/SICI
Seybold Reports: http://www.seyboldreports.com
Sharable Content Object Reference Model (SCORM): http://www.adlnet.org
Sheridan Press White Papers: http://www.sheridanpress.com/whitepapers.htm
Society for Scholarly Publishing (SSP): http://www.sspnet.org

About the Authors and Publishers

Amy Brand joined CrossRef as Director of Business Development in April 2001. Her career spans electronic publishing, book publishing, and academia. She has previously held positions at Ingenta, LEA Inc., the University of Pennsylvania, and The MIT Press, where she was an executive editor from 1994 to 2000. She received her doctorate in cognitive science from MIT in 1989. Contact Information: CrossRef, 40 Salem Street, Lynnfield, MA 01940. V: 781-295-0072; F: 781-295-0077; E: [email protected].

Frank Daly, until recently, was Executive Director of the Book Industry Study Group. For more than twenty years, Frank was with Baker & Taylor, Inc. During that time he served in a variety of roles, including Director of Marketing, Public & School Libraries, and Vice President, Business Development. Frank is on the advisory boards of Clarion University, NYU's Center for Publishing, and KnowledgeMax, a corporate intranet provider. He is past President of The American Wholesale Booksellers Association. Frank received his MBA from Fordham University and his BBA from the University of Massachusetts. Contact Information: 30 Tiberon Drive, Holmdel, NJ 07733. V: 732-817-1774; F: 732-817-1774; E: [email protected].

Barbara Meyers, president of Meyers Consulting Services (est. 1983), provides expert advice and experienced operational support to professional societies, scholarly publishers, and their supplier communities in the areas of management, marketing, planning, and research. One of the founders of the Society for Scholarly Publishing (SSP), Barbara currently serves on the SSP Board of Directors and is a past president of the Council of Science Editors (CSE). She holds two degrees from George Washington University: her bachelor's in science journalism and her master's in science, technology, and public policy with a specialization in technology assessment. For additional information on Barbara's background, services, and client list, please visit the MCS web site at http://www.MCSone.com. Contact Information: Meyers Consulting Services, 1836 Metzerott Road, Suite 1003, Adelphi, MD 20783-3448. V: 301-434-6249; F: 301-434-0126; E: [email protected].

NISO Press is the publishing program of the National Information Standards Organization (NISO). NISO, a nonprofit association accredited by the American National Standards Institute (ANSI), identifies, develops, maintains, and publishes technical standards to manage information in our changing and ever-more digital environment. NISO standards apply both traditional and new technologies to the full range of information-related needs, including retrieval, re-purposing, storage, metadata, and preservation. Contact Information: NISO, 4733 Bethesda Avenue, Suite 300, Bethesda, MD 20814. V: 301-654-2512; F: 301-654-1721; E: [email protected]; Website: http://www.niso.org.

The Sheridan Press provides a full range of printing and publishing services and technology innovations to associations, publishers, and university presses within the scientific, technical, and medical journal markets. Contact Information: The Sheridan Press, 450 Fame Avenue, Hanover, PA 17331. V: 717-632-3535; F: 717-633-8900. Website: http://www.sheridanpress.com.

The Sheridan Press
Printing and Publishing Services
450 Fame Avenue
Hanover, Pennsylvania 17331

For more information about The Sheridan Press or to request additional copies of the Metadata Demystified White Paper, call Prudi Showers at 717-632-3535 or contact her by e-mail at [email protected] or fax this form to 717-633-8900.

Name Title

Company

Address

Phone Number Fax Number

E-Mail Address

The Sheridan Press Publications and Literature

I am interested in additional copies of the following White Papers:
_____ Metadata Demystified (in collaboration with NISO Press) (7/03)
_____ Member Recruitment (4/03)
_____ Digital Art (5/02)
_____ Implementing Information Technology Systems (1/02)
_____ Marketing Reprints (10/01)
_____ Marketing Scholarly Journals (5/01)
_____ Digital Archiving in the New Millennium: Developing an Infrastructure (11/00)
_____ Improving Journal Quality with Process Improvement Methods (5/00)
_____ Digital Workflow: Managing the Process Electronically (3/00)
_____ How to Make the Most of Reprints (5/99)
_____ The Future of the Print Journal (2/99)
_____ Outsourcing (6/98)
_____ Archiving (9/97)

I am interested in more information about:
_____ The Sheridan Press
_____ Sheridan Reprints
_____ The Sheridan Group

ISBN 1-880124-59-9