Metadata Basics

Chapter 7: Learning about Metadata
By Jennifer Phillips

/A Introduction

A basic understanding of metadata (its principles, standards, and best practices) can go a long way toward launching your career in digital librarianship. At first metadata may seem like a somewhat mystifying branch of cataloging or an issue of concern primarily to software engineers and computer programmers. But metadata is not an obscure topic just for technical people. Familiarity with the basic principles of metadata is necessary for all people working in digital librarianship, and a solid foundation can help you stand out professionally. If we think of metadata in terms of its relationship to core principles of librarianship, it becomes more approachable.

This chapter will provide an overview of the concepts and define the terminology used in discussions of metadata. It is intended to be a high-level discussion of metadata rather than an explanation of the nuts and bolts of metadata implementation, which will be addressed in Chapter 8. If you come from a technical services background or if you focused on cataloging in library school, many of the ideas here may already be familiar to you. If instead you come from a public services background, or are new to library and information science in general, this chapter will familiarize you with metadata and the issues surrounding it in a library context.

The goal of this chapter is to define metadata in a way that invites you to think about how it pertains to both the public service and technical aspects of digital library work. Another aim is to introduce you to or refresh your memory about categories of metadata and metadata standards, so that you will be able to articulate the importance of metadata for modern libraries.
Demonstrating an understanding of metadata and how it relates to the librarian’s job of assisting in the discovery, access, and use of information resources can be extremely useful when trying to get involved in digital projects.

/A What is metadata?

Metadata is difficult to define briefly, because the term is used for a variety of kinds of information that describe other information. The most commonly used definition is “data about data,” but this is incomplete. To understand metadata in a general sense, it is important to bear in mind a few key points:

• metadata is information or data that is associated with other information resources
• metadata is structured information
• metadata is used to enable a range of functions with respect to the resource it describes

While metadata is a type of information that is always about other information, it can be about any form of information. In other words, metadata can describe information resources of all types, from physical books and images to web sites, audio files, datasets, and software. It can be stored in a database, separate from the resources it explains, or it can be embedded in the digital files it describes. Because it is structured, metadata can be machine processed, and it is therefore fundamental to the way that information resources function and are used in an electronic environment. Finally, part of the definition of metadata should include its purpose, which is to support the description, discovery, use, management, and preservation of information resources.

A few familiar examples of metadata can help clarify the concept and illustrate the contexts in which some forms of metadata have been developed. Most of us have encountered data about digital files that is stored within the files, without necessarily thinking about it as metadata.
For example, the Apple iTunes application for managing music files (MP3s) on a home computer displays songs according to the categories name, artist, album, time, track number, and genre. These are all elements of metadata encoded in the ID3 tag embedded in an MP3 file (at the end of the file in ID3v1, typically at the beginning in ID3v2). This file-based metadata is displayed in the iTunes interface and gives the user the ability to sort and search for songs according to these properties.

Another example, which also illustrates how file-based metadata can be in part system-generated and in part supplied by the user, is the properties of a Microsoft Word document. In Word, you can view characteristics of a file and information about its content. The system-generated information includes the date created, date modified, size, and file type; the metadata the user can supply includes the author, title, subject, keywords, and a description. This metadata allows for input from the user on the one hand, and on the other facilitates system-based operations such as the interaction of the file with the software application or operating system. You can organize, identify, and search for your documents based on both the values you have specified and the automatically generated properties.

Because the metadata associated with an MP3 or Microsoft Word file lets users describe, arrange, search for, and select their files, these everyday examples show how metadata supports basic user tasks.

Metadata has evolved from several different communities including library and information science, records management, database design, and software design. One example of metadata that most librarians are already familiar with is the MARC (Machine-Readable Cataloging) record. MARC is based on a set of rules, the International Standard Bibliographic Description (ISBD), and is designed specifically for bibliographic data to meet the needs of the library community.
A MARC bibliographic record is a source of information about a bibliographic resource (book, serial, sound recording, video recording, etc.), and when you look at an online library catalog you are being presented with a view of MARC records. MARC takes the information that describes the intellectual and physical characteristics of a resource and structures it in a way that allows it to be displayed in catalogs and shared with other systems.

The MARC format for bibliographic records defines the data elements (units of data with specific meaning) and the codes used for encoding bibliographic data. For example, MARC defines the data element “title and statement of responsibility” and puts it in the “245” field. Indicator and subfield codes characterize and further mark up the data contained within the field. The first indicator specifies whether there should be an added entry for the title in the library catalog, and subfield “a” distinguishes the title from the statement of responsibility in subfield “c.” Personal author information goes in the “100” field, and imprint information (publication, distribution, etc.) goes in the “260.” As such, the basic elements for an edition of Herman Melville’s “Moby Dick” would be encoded in MARC as follows:

100 1 $aMelville, Herman,$d1819-1891.
245 10 $aMoby-Dick, or, The whale /$cHerman Melville ; foreword by Nathaniel Philbrick.
260 $aLondon :$bPenguin,$c2009.

Thus structured and encoded, bibliographic metadata can be interpreted and displayed by library system software and exchanged with other agencies, regardless of the language of the content. MARC enables the discovery, retrieval, and use of resources by making them searchable in library catalogs according to a broad set of elements, including the title, statement of responsibility, publication information, physical description, information specific to medium, and subject.

/A What is the purpose of metadata?
Metadata supports the use of information resources in a digital environment. As you consider the relevance of metadata to your career as a digital librarian, it may be useful to think about how metadata reflects the core values of librarianship in general and the principles that underlie library cataloging in particular. There is a clear example of this in the case of digital libraries, where metadata can in part be seen as serving the same purpose as bibliographic records in traditional libraries. Like bibliographic records, metadata should support the generic user tasks of finding, identifying, selecting, and obtaining resources, as defined in IFLA’s Functional Requirements for Bibliographic Records (IFLA Study Group on the Functional Requirements for Bibliographic Records 2008, 8).

A digital library normally consists of collections of digital resources that are made available online through a user interface. The users of such collections may vary. A digital library may be open access and designed for the public, as is the case for a collection of digitized versions of unique, local resources. On the other hand, there may be use restrictions, as in the case of repositories for an organization’s electronic records or subscription-based materials. Regardless of the type of digital collection and user, however, the purpose behind specific data elements can be articulated in terms of supporting user tasks.

Perhaps the most obvious purpose of metadata in this regard is its role in search and discovery. When deciding which metadata elements to employ in a given context, it is vital to consider what use each element will have from the user’s perspective, as well as the search functionality it will support. Due to the ubiquity of search engines like Google that provide a single search box, today’s users are often most comfortable searching by keyword.
Metadata improves the results of this type of search by enabling keyword matching on metadata terms, which have been selected because of their relevance, rather than relying on the possibility of matching words from within the text. More sophisticated queries may include author/creator name and title or title keyword, and since these data elements are the backbone of most descriptive efforts, metadata supports this method of searching in particular.

Beyond being aligned with specific search criteria, metadata also enables browsing and collocation. Metadata pertaining to subject or resource type can allow for multiple resources, sometimes from different contexts, to be automatically associated with each other “on the fly.” To return to the earlier iTunes example, you can use the metadata associated with songs to arrange your music.
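
To make keyword matching and collocation more concrete, here is a minimal Python sketch. It is an illustration only, not how iTunes or any library system actually works: the sample records, the field names, and the helper functions keyword_search and collocate are all assumptions made for this example.

```python
from collections import defaultdict

# Sample records for illustration; the field names are assumed, not a real schema.
SONGS = [
    {"title": "So What", "artist": "Miles Davis", "album": "Kind of Blue", "genre": "Jazz"},
    {"title": "Blue in Green", "artist": "Miles Davis", "album": "Kind of Blue", "genre": "Jazz"},
    {"title": "Blue Monday", "artist": "New Order", "album": "Substance", "genre": "Electronic"},
]


def keyword_search(records, keyword, fields=("title", "artist", "album", "genre")):
    """Return the records in which any chosen metadata field contains the keyword.

    Matching is a case-insensitive substring test against metadata values,
    not against the content of the resource itself.
    """
    needle = keyword.lower()
    return [
        record
        for record in records
        if any(needle in record.get(field, "").lower() for field in fields)
    ]


def collocate(records, field):
    """Group record titles by the value of a single metadata field."""
    groups = defaultdict(list)
    for record in records:
        groups[record.get(field, "unknown")].append(record["title"])
    return dict(groups)


if __name__ == "__main__":
    # Keyword search restricted to the title element.
    for song in keyword_search(SONGS, "blue", fields=("title",)):
        print(song["title"])
    # Browsing view: songs brought together by genre.
    print(collocate(SONGS, "genre"))
```

Restricting the fields argument mirrors the difference between a single search box and a more targeted title or creator search, while grouping on a field such as genre is a simple form of the on-the-fly collocation described above.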
