Understanding Metadata
What is Metadata?
What Does Metadata Do?
Structuring Metadata
Metadata Schemes and Element Sets
  Dublin Core
  TEI and METS
  MODS
  EAD and LOM
About NISO

NISO, a non-profit association accredited by the American National Standards Institute (ANSI), identifies, develops, maintains, and publishes technical standards to manage information in our changing and ever-more digital environment. NISO standards apply both traditional and new technologies to the full range of information-related needs, including retrieval, re-purposing, storage, metadata, and preservation. NISO standards and information about NISO’s activities and membership are featured on the NISO website.
This booklet is available for free on the NISO website (www.niso.org) and in hardcopy from NISO Press.
Published by:
NISO Press
National Information Standards Organization
4733 Bethesda Avenue, Suite 300
Bethesda, MD 20814 USA
Email: [email protected]
Tel: 301-654-2512
Fax: 301-654-1721
URL: www.niso.org
Copyright © 2004 National Information Standards Organization
ISBN: 1-880124-62-9

What Is Metadata?

Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information.

The term metadata is used differently in different communities. Some use it to refer to machine understandable information, while others use it only for records that describe electronic resources. In the library environment, metadata is commonly used for any formal scheme of resource description, applying to any type of object, digital or non-digital. Traditional library cataloging is a form of metadata; MARC 21 and the rule sets used with it, such as AACR2, are metadata standards. Other metadata schemes have been developed to describe various types of textual and non-textual objects including published books, electronic documents, archival finding aids, art objects, educational and training materials, and scientific datasets.

There are three main types of metadata:
• Descriptive metadata describes a resource for purposes such as discovery and identification. It can include elements such as title, abstract, author, and keywords.
• Structural metadata indicates how compound objects are put together, for example, how pages are ordered to form chapters.
• Administrative metadata provides information to help manage a resource, such as when and how it was created, file type and other technical information, and who can access it. There are several subsets of administrative data; two that sometimes are listed as separate metadata types are:
  − Rights management metadata, which deals with intellectual property rights, and
  − Preservation metadata, which contains information needed to archive and preserve a resource.

Metadata can describe resources at any level of aggregation. It can describe a collection, a single resource, or a component part of a larger resource (for example, a photograph in an article). Just as catalogers make decisions about whether a catalog record should be created for a whole set of volumes or for each particular volume in the set, so the metadata creator makes similar decisions. Metadata can also be used for description at any level of the information model laid out in the IFLA (International Federation of Library Associations and Institutions) Functional Requirements for Bibliographic Records: work, expression, manifestation, or item. For example, a metadata record could describe a report, a particular edition of the report, or a specific copy of that edition of the report.

Metadata can be embedded in a digital object or it can be stored separately. Metadata is often embedded in HTML documents and in the headers of image files. Storing metadata with the object it describes ensures the metadata will not be lost, obviates problems of linking between data and metadata, and helps ensure that the metadata and object will be updated together. However, it is impossible to embed metadata in some types of objects (for example, artifacts). Also, storing metadata separately can simplify the management of the metadata itself and facilitate search and retrieval. Therefore, metadata is commonly stored in a database system and linked to the objects described.

What Does Metadata Do?

An important reason for creating descriptive metadata is to facilitate discovery of relevant information. In addition to resource discovery, metadata can help organize electronic resources, facilitate interoperability and legacy resource integration, provide digital identification, and support archiving and preservation.

Resource Discovery

Metadata serves the same functions in resource discovery as good cataloging does by:
• allowing resources to be found by relevant criteria;
• identifying resources;
• bringing similar resources together;
• distinguishing dissimilar resources; and
• giving location information.

Organizing Electronic Resources

As the number of Web-based resources grows exponentially, aggregate sites or portals are increasingly useful in organizing links to resources based on audience or topic. Such lists can be built as static webpages, with the names and locations of the resources “hardcoded” in the HTML. However, it is more efficient and increasingly more common to build these pages dynamically from metadata stored in databases. Various software tools can be used to automatically extract and reformat the information for Web applications.

Interoperability

Describing a resource with metadata allows it to be understood by both humans and machines in ways that promote interoperability. Interoperability is the ability of multiple systems with different hardware and software platforms, data structures, and interfaces to exchange data with minimal loss of content and functionality. Using defined metadata schemes, shared transfer protocols, and crosswalks between schemes, resources across the network can be searched more seamlessly.

Two approaches to interoperability are cross-system search and metadata harvesting. The Z39.50 protocol is commonly used for cross-system search. Z39.50 implementers do not share metadata but map their own search capabilities to a common set of search attributes. A contrasting approach taken by the Open Archives Initiative is for all data providers to translate their native metadata to a common core set of elements and expose this for harvesting. A search service provider then gathers the metadata into a consistent central index to allow cross-repository searching regardless of the metadata formats used by participating repositories.

Digital Identification

Most metadata schemes include elements such as standard numbers to uniquely identify the work or object to which the metadata refers. The location of a digital object may also be given using a file name, URL (Uniform Resource Locator), or some more persistent identifier such as a PURL (Persistent URL) or DOI (Digital Object Identifier). Persistent identifiers are preferred because object locations often change, making the standard URL (and therefore the metadata record) invalid. In addition to the actual elements that point to the object, the metadata can be combined to act as a set of identifying data, differentiating one object from another for validation purposes.

Archiving and Preservation

Most current metadata efforts center on the discovery of recently created resources. However, there is a growing concern that digital resources will not survive in usable form into the future. Digital information is fragile; it can be corrupted or altered, intentionally or unintentionally. It may become unusable as storage media and hardware and software technologies change. Format migration and perhaps emulation of current hardware and software behavior in future hardware and software platforms are strategies for overcoming these challenges.

Metadata is key to ensuring that resources will survive and continue to be accessible into the future. Archiving and preservation require special elements to track the lineage of a digital object (where it came from and how it has changed over time), to detail its physical characteristics, and to document its behavior in order to emulate it on future technologies.

Many organizations internationally have worked on defining metadata schemes for digital preservation, including the National Library of Australia, the British Cedars Project (CURL Exemplars in Digital Archives), and a joint Working Group of OCLC and the Research Libraries Group (RLG). The latter group developed a framework outlining types of preservation metadata. A follow-up group, PREMIS (PREservation Metadata: Implementation Strategies), also sponsored by OCLC and RLG, is developing a set of core elements and strategies for encoding, storage, and management of preservation metadata within a digital preservation system. Many of these initiatives are based on or compatible with the ISO Reference Model for an Open Archival Information System (OAIS).

Structuring Metadata

Metadata schemes (also called schemas) are sets of metadata elements designed for a specific purpose, such as describing a particular type of information resource. The definition or meaning of the elements themselves is known as the semantics of the scheme. The values given to metadata elements are the content. Metadata schemes generally specify names of elements and their semantics. Optionally, they may specify content rules for how content must be formulated (for example, how to identify the main title), representation rules for content (for example, capitalization rules), and allowable content values (for example, terms must be used from a specified controlled vocabulary).

There may also be syntax rules for how the elements and their content should be encoded. A metadata scheme with no prescribed syntax rules is called syntax independent. Metadata can be encoded in any definable syntax. Many current metadata schemes use SGML (Standard Generalized Mark-up Language) or XML (Extensible Mark-up Language). XML, developed by the World Wide Web Consortium (W3C), is a simplified subset of SGML that allows for locally defined tag sets and the easy exchange of structured information. SGML, of which HTML is an application and XML a subset, allows for the richest mark-up of a document. Useful XML tools are becoming widely available as XML plays an increasingly crucial role in the exchange of a variety of data on the Web.

Metadata Schemes and Element Sets

Many different metadata schemes are being developed in a variety of user environments and disciplines. Some of the most common ones are discussed in this section.

Dublin Core

The Dublin Core Metadata Element Set arose from discussions at a 1995 workshop sponsored by OCLC and the National Center for Supercomputing Applications (NCSA). As the workshop was held in Dublin, Ohio, the element set was named the Dublin Core. The continuing development of the Dublin Core and related specifications is managed by the Dublin Core Metadata Initiative (DCMI).

The original objective of the Dublin Core was to define a set of elements that could be used by authors to describe their own Web resources. Faced with a proliferation of electronic resources and the inability of the library profession to catalog all these resources, the goal was to define a few elements and some simple rules that could be applied by noncatalogers. The original 13 core elements were later increased to 15: Title, Creator, Subject, Description, Publisher, Contributor, Date, Type, Format, Identifier, Source, Language, Relation, Coverage, and Rights.

The Dublin Core was developed to be simple and concise, and to describe Web-based documents. However, Dublin Core has been used with other types of materials and in applications demanding some complexity. There has historically been some tension between supporters of a minimalist view, who emphasize the need to keep the elements to a minimum and the semantics and syntax simple, and supporters of a structuralist view, who argue for finer semantic distinctions and more extensibility for particular communities.

These discussions have led to a distinction between qualified and unqualified (or simple) Dublin Core. Qualifiers can be used to refine (narrow the scope of) an element, or to identify the encoding scheme used in representing an element value. The element Date, for example, can be used with the refinement qualifier created to narrow the meaning of the element to the date the object was created. Date can also be used with an encoding scheme qualifier to identify the format in which the date is recorded, for example, following the ISO 8601 standard for representing date and time.

All Dublin Core elements are optional and all are repeatable. The elements may be presented in any order. While the Dublin Core description recommends the use of controlled values for fields where they are appropriate (for example, controlled vocabularies for the Subject field), this is not required. However, working groups have been established to discuss authoritative lists for certain elements such as Resource Type. While Dublin Core leaves content rules to the particular implementation, the DCMI encourages the adoption of application profiles (domain-specific rules) for particular domains such as education and government. An application profile for libraries is being developed by the Libraries Working Group.

Because of its simplicity, the Dublin Core element set is now used by many outside the library community: researchers, museum curators, and music collectors, to name only a few. There are hundreds of projects worldwide that use the Dublin Core either for cataloging or to collect data from the Internet; more than 50 of these have links on the DCMI website. The subjects range from cultural heritage and art to math and physics. Meanwhile the Dublin Core Metadata Initiative has expanded beyond simply maintaining the Dublin Core Metadata Element Set into an organization that describes itself as “dedicated to promoting the widespread adoption of interoperable metadata standards and developing specialized metadata vocabularies for discovery systems.”

Dublin Core Example

Title="Metadata Demystified"
Creator="Brand, Amy"
Creator="Daly, Frank"
Creator="Meyers, Barbara"
Subject="metadata"
Description="Presents an overview of metadata conventions in publishing."
Publisher="NISO Press"
Publisher="The Sheridan Press"
Date="2003-07"
Type="Text"
Format="application/pdf"
Identifier="http://www.niso.org/standards/resources/Metadata_Demystified.pdf"
Language="en"
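The Dublin Core example in this section can be modeled in a few lines of Python. This is only a sketch under stated assumptions, not an official DCMI encoding: the record is held as a list of (element, value) pairs because every element is optional and repeatable, the `check_record` and `to_xml` helpers are hypothetical names, the `record` wrapper element is invented for illustration, and the date check accepts only the YYYY, YYYY-MM, and YYYY-MM-DD forms of ISO 8601.

```python
import re
import xml.etree.ElementTree as ET

# The 15 elements of the unqualified Dublin Core set, as listed in the text.
DC_ELEMENTS = {
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
}

# Namespace URI commonly used for the Dublin Core element set in XML.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Assumption: accept only the YYYY, YYYY-MM and YYYY-MM-DD forms of ISO 8601.
ISO_DATE = re.compile(r"^\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?$")


def check_record(record):
    """Return a list of problems found in a list of (element, value) pairs."""
    problems = []
    for element, value in record:
        if element not in DC_ELEMENTS:
            problems.append(f"unknown element: {element}")
        elif element == "Date" and not ISO_DATE.match(value):
            problems.append(f"date not in an accepted ISO 8601 form: {value}")
    return problems


def to_xml(record):
    """Serialize the record as XML, one child element per (element, value) pair."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("record")  # hypothetical wrapper element
    for element, value in record:
        child = ET.SubElement(root, f"{{{DC_NS}}}{element.lower()}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# A list of pairs (not a dict) keeps repeated elements such as Creator.
record = [
    ("Title", "Metadata Demystified"),
    ("Creator", "Brand, Amy"),
    ("Creator", "Daly, Frank"),
    ("Creator", "Meyers, Barbara"),
    ("Subject", "metadata"),
    ("Publisher", "NISO Press"),
    ("Date", "2003-07"),
    ("Type", "Text"),
    ("Format", "application/pdf"),
    ("Language", "en"),
]

print(check_record(record))
print(to_xml(record))
```

Keeping the pairs in a list rather than a dictionary is a deliberate choice: a dictionary keyed by element name would silently drop the second and third Creator values.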
The Text Encoding Initiative (TEI)

The Text Encoding Initiative is an international project to develop guidelines for marking up electronic texts such as novels, plays, and poetry, primarily to support research in the humanities. In addition to specifying how to encode the text of a work, the TEI Guidelines for Electronic Text Encoding and Interchange also specify a header portion, embedded in the resource, that consists of metadata about the work. The TEI header, like the rest of the TEI, is defined as an SGML DTD (Document Type Definition), a set of tags and rules defined in SGML syntax that describe the structure and elements of a document. This SGML mark-up becomes part of the electronic resource itself. Since the TEI DTD is rather large and complicated in order to apply to a vast range of texts and uses, a simpler subset of the DTD, known as TEI Lite, is commonly used in libraries.

It is assumed that TEI-encoded texts are electronic versions of printed texts. Therefore the TEI Header can be used to record bibliographic information about both the electronic version of the text and about the non-electronic source version. The basic bibliographic information is similar to that recorded in library cataloging and can be mapped to and from MARC. However, there are also elements defined to record details about how the text was transcribed and edited, how mark-up was performed, what revisions were made, and other non-bibliographic facts. Libraries tend to use TEI headers when they have collections of SGML-encoded full text. Some libraries use TEI headers to derive MARC records for their catalogs, while others use MARC records as the basis for creating TEI header descriptions for the source texts.

Metadata in Action

An oral historian makes tape-recordings of interviews with members of a particular ethnic group. Interviewees sign a paper release form giving intellectual property rights to the historian. Most interviewees grant permission to disseminate the interviews in print and electronically, but several restrict publication and dissemination until 25 years after death.

Information about each interview is kept in a database: Interviewer, Interviewee, Date, Place, etc. Each interview follows a questionnaire format. The questionnaire exists as a text file. The tapes, release forms, database, and text file are donated to a library that has a special collection focusing on the particular ethnic group.

The tapes are digitized. Since each interview runs over several tapes, technicians record structural metadata to keep component parts of each interview together. Technicians record administrative metadata such as file names, location of each interview in the files, equipment used, the methods of digitizing and assuring quality and completeness, file formats, etc. Different segments of this metadata allow the audio files to be automatically tracked, accessed, stored, refreshed, and migrated.

An archivist expands the database to include the persistent identifier of each interview, thereby linking the audio file to the descriptive metadata. The names of the data elements are revised to match Dublin Core terminology, including qualifiers used specifically for audio (continued)

Metadata Encoding and Transmission Standard (METS)

The Metadata Encoding and Transmission Standard (METS) was developed to fill the need for a standard data structure for describing complex digital library objects. METS is an XML Schema for creating XML document instances that express the structure of digital library objects, the associated descriptive and administrative metadata, and the names and locations of the files that comprise the digital object.

The metadata necessary for successful management and use of digital objects is both more extensive than and different from the metadata used for managing collections of printed works and other physical materials. Structural metadata is needed to ensure that separately digitized files (for example, different pages of a digitized book) are structured appropriately. Technical metadata is needed for information about the digitization process so that scholars may determine how accurate a reflection of the original the digital version provides. Other technical metadata is required for internal purposes in order to periodically refresh and migrate the data, ensuring the durability of valuable resources.

METS was originally an outgrowth of the Making of America II project, a digitization project of major research libraries that attempted to address these metadata issues, in part by providing an encoding format for metadata for textual and image-based works. The Digital Library Federation (DLF) built on that earlier work to create METS, a standard schema for providing a method for expressing and packaging together descriptive, administrative, and structural metadata for objects within a digital library. Expressed using the XML schema language, METS provides a document format for encoding the metadata necessary for management of digital library objects within a repository and for exchange between repositories.
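The need for structural metadata described above (keeping separately digitized page files in the right order) can be illustrated with a small data model. This is a sketch only and does not follow the METS schema: the `ContentFile` and `StructNode` classes, their fields, and the file paths are all hypothetical, chosen to show a list of content files held apart from a hierarchy that records how they fit together.

```python
from dataclasses import dataclass, field


@dataclass
class ContentFile:
    """One digitized file plus a little technical metadata (hypothetical fields)."""
    file_id: str
    location: str
    mime_type: str


@dataclass
class StructNode:
    """A node in the object's hierarchy, for example a book, a chapter, or a page."""
    label: str
    order: int
    file_ids: list = field(default_factory=list)
    children: list = field(default_factory=list)


def files_in_reading_order(node, files_by_id):
    """Depth-first walk; siblings are visited by their recorded order value."""
    ordered = [files_by_id[file_id] for file_id in node.file_ids]
    for child in sorted(node.children, key=lambda n: n.order):
        ordered.extend(files_in_reading_order(child, files_by_id))
    return ordered


# Three page scans, deliberately listed out of order, as they might sit on disk.
files = [
    ContentFile("f3", "scans/img_0907.tif", "image/tiff"),
    ContentFile("f1", "scans/img_0412.tif", "image/tiff"),
    ContentFile("f2", "scans/img_0388.tif", "image/tiff"),
]
files_by_id = {f.file_id: f for f in files}

# The structural metadata: which file belongs to which page, and in what order.
book = StructNode("Book", 1, children=[
    StructNode("Page 2", 2, file_ids=["f2"]),
    StructNode("Page 3", 3, file_ids=["f3"]),
    StructNode("Page 1", 1, file_ids=["f1"]),
])

for content_file in files_in_reading_order(book, files_by_id):
    print(content_file.location)
```

The file names carry no usable ordering here, so the page sequence can be recovered only from the recorded structure, which is the situation the text describes for digitized books.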
A METS document contains seven major sections:
• METS Header – Contains metadata describing the METS document itself, including such information as creator, editor, etc.
• Descriptive Metadata – Points to descriptive metadata external to the METS document (for example, a MARC record in an OPAC or an Encoded Archival Description finding aid maintained on a webserver), or to internally embedded descriptive metadata, or both.
• Administrative Metadata – Provides information regarding how the files are created and stored, intellectual property rights, the original source object from which the digital library object derives, and the provenance of the files comprising the digital library object.
• File Section – Lists all files containing content that comprise the electronic versions of the digital object.
• Structural Map – Outlines a hierarchical structure for the digital library object and links the elements of that structure to content files and metadata that pertain to each element.
• Structural Links – Allows METS creators to record the existence of hyperlinks between nodes in the hierarchy outlined in the Structural Map.
• Behavior – Associates executable behaviors with content in the METS object.

The METS header, file section, structural map, structural links, and behavior sections are defined within the METS schema. METS is less prescriptive about descriptive and administrative metadata, relying on extension schemas (externally developed metadata schemes) to provide specific elements. The METS Editorial Board has endorsed three descriptive metadata schemes: simple Dublin Core, MARCXML, and MODS (discussed below). For technical metadata the METS website makes available schemas for text and digital still images. The latter standard is called MIX, Metadata for Images in XML Schema, and is based on a proposed NISO standard, Z39.87, Data Dictionary: Technical Metadata for Digital Still Images. Further work is in process on extension schemas for audio, video, and websites. Another current area of concentration for the METS development community is the creation of METS application profiles to give guidance regarding the creation of METS documents for particular object types.

Use of the METS schema is widespread. A list of implementation registries using METS, a tutorial, and other important information can be found on the METS website.

Metadata in Action (continued)

materials. Information on rights and permissions is entered.

An archivist creates an EAD finding aid for the audio collection using the database as the core. Portions of the questionnaire text file are incorporated as a rich source of subject keywords. A MARC record is derived from the EAD finding aid and added to OCLC and RLIN.

A webpage is created where researchers can access the finding aid, search the database, and listen to the audio files. Interviews coded as restricted are invisible to the search program until the date when they become open to the public. Administrative, structural, and descriptive metadata is created for the webpage to hold all the pieces together, allow them to be managed, and allow them to be accessed.

The library participates in a metadata harvesting protocol to provide extracts of local metadata in a common format to a service provider so that information about the collection is automatically included in a number of relevant tools such as catalogs and portals.

The webpage is linked to the library’s website dedicated to resources about the ethnic group, where it is available to researchers in context with archival and visual materials, digitized secondary sources, etc. Administrative, structural, and descriptive metadata at the website level has also been created.

Metadata Object Description Schema (MODS)

The Metadata Object Description Schema (MODS) is a descriptive metadata schema that is a derivative of MARC 21 and intended to either carry selected data from existing MARC 21 records or enable the creation of original resource description records. It includes a subset of MARC fields and uses language-based tags rather than the numeric ones used in MARC 21 records. In some cases, it regroups elements from the MARC 21 bibliographic format. Like METS, MODS is expressed using the XML schema language.

Although the MODS standard can stand on its own, it may also complement other metadata formats. Because of its flexibility and use of XML, MODS may potentially be used as a Z39.50 Next Generation specified format, an extension schema to METS, a metadata set for harvesting, and for creating original resource metadata records in an XML syntax.

Rich description of electronic resources is a particular focus of MODS, which provides some advantages over other metadata
A MODS Record Example

the EAD DTD provides support for both SGML
• Rights, describing the intellectual property rights and use conditions;
• Relation, identifying related objects;
• Annotation, containing comments and the date and author

allow various schemes for transactions related to different genres such as music, journal articles, and books to be able to interchange information, particularly that related to intellectual property rights. In order to support this common framework,

sculpture has its own special requirements. The Art Information Task Force (AITF) developed a conceptual framework for describing and accessing information about objects and images called Categories for the Descriptions of
MPEG Multimedia Metadata

The ISO/IEC Moving Picture Experts Group (MPEG) has developed a suite of standards for coded representation of digital audio and video. Two of the standards address metadata: MPEG-7, Multimedia Content Description Interface (ISO/IEC 15938), and MPEG-21, Multimedia Framework (ISO/IEC 21000).

MPEG-7 defines the metadata elements, structure, and relationships that are used to describe audiovisual objects including still pictures, graphics, 3D models, music, audio, speech, video, or multimedia collections. It is a multi-part standard that addresses:
• Description Tools including Descriptors that define the syntax and the semantics of each metadata element and Description Schemes that specify the structure and semantics of the relationships between the elements.
• A Description Definition Language to define the syntax of the Description Tools, allow the creation of new Description Schemes, and allow the extension and modification of existing Description Schemes.
• System tools, to support storage and transmission, synchronization of descriptions with content, and management and protection of intellectual property.

Descriptors for visual and audio are defined separately using a hierarchy of elements and sub-elements. For visual objects there are descriptors for Basic Structure, Color, Texture, Shape, Motion, Localization, and Face Recognition. Audio descriptors are divided into two categories: low-level descriptors that are common to audio objects across most applications, and high-level descriptors that are specific to particular applications of audio. The cross-application low-level descriptors cover Structures and Features (temporal and spectral). The domain-specific high-level descriptors include such elements as Musical Instrument Timbre, Melody Description, and Spoken Content Description.

The Description Schemes are based on XML, and can be expressed in textual form suitable for editing, searching, filtering, and human readability; or in a binary form for storage, transmission, and streaming delivery. Since the full description of a multimedia object can be quite complex, the standard provides for a Summary Description Scheme geared to browsing and navigation.

The standard envisions that search engines could use MPEG-7 metadata descriptions to identify audiovisual objects in entirely new ways, such as digitizing a musical phrase played on a keyboard and then retrieving a list of musical pieces that contain the sequence of notes; drawing some lines on an electronic drawing tablet and retrieving images with similar graphics; or using a voice excerpt to retrieve related speech files, photographs, video clips, and biographical information of the speaker. These retrieval mechanisms are outside the scope of MPEG-7, but the standards developers wanted to accommodate these futuristic capabilities and have included many interoperability requirements beyond the typical metadata elements.

MPEG-21 was developed to address the need for an overarching framework to ensure interoperability of digital multimedia objects. The multi-part standard is not yet fully completed but is intended to include the following:
• Part 1: Vision, Technologies and Strategy provides the overview of the complete vision and plan for the framework. It was issued as an ISO technical report (ISO/IEC TR 21000-1:2001) and is available as a free download from ISO’s publicly available standards website. A second edition of the vision document is underway to address comments and suggestions received from other organizations following the initial publication.
• Part 2: Digital Item Declaration, issued in 2003, describes a model for defining Digital Items. It includes a description of the syntax and semantics of each of the Digital Item Declaration elements and a corresponding XML schema.
• Part 3: Digital Item Identification, also issued in 2003, describes how to uniquely identify Digital Items and how to link Digital Items with related information such as descriptive metadata.
• Part 4: Intellectual Property Management and Protection is still in development. It is intended to define the framework for ensuring interoperability of intellectual property management tools, including authentication, and accommodates the Rights information defined in the following two parts.
• Part 5: Rights Expression Language, issued in 2004, is a machine-readable language that can declare rights and permissions.
• Part 6: Rights Data Dictionary is still in development. It will define a standard set of terms to be used with the Rights Expression Language. It is also expected to include specifications for mapping and transforming rights metadata terminology. The Rights Data Dictionary and Expression Language are being viewed as models for the handling of intellectual property metadata for applications beyond audiovisual.
• Part 7: Digital Item Adaptation, also in development, is intended to standardize networking and interoperability description tools. Included in this part will be User Characteristic description tools that specify user preferences.

There are some seven additional parts identified and in various stages of development that deal with technical interoperability issues of less specific relevance to metadata. All of the published parts are available from ISO as ISO/IEC 21000-[part#].

Metadata for Datasets

Metadata schemes for datasets are enabling original data in the science and social science fields to be shared in a way that was never possible before the Internet. One of the most well developed element sets is the Federal Geographic Data Committee (FGDC) Content Standard for Digital Geospatial Metadata (CSDGM), officially known as FGDC-STD-001-1998. Geospatial datasets include topographic and demographic data, GIS (geographic information systems), and computer-aided cartography base files. They are used in a wide variety of areas, including soil and land use studies, biodiversity counts, climatology and global change tracking, remote sensing, and satellite imagery. The FGDC Content Standard is required for use with resources created and funded by the U.S. Government and is also being used by many state governments.

An international standard, ISO 19115, Geographic Information—metadata was issued in 2003. A technical amendment that will allow datasets to be both ISO and FGDC compliant is underway along with an implementation model that can be used in conjunction with an XML schema.

A metadata scheme becoming well established in the social and behavioral sciences is the Data Documentation Initiative (DDI) standard for describing social science datasets. The DDI is defined as an XML DTD, and allows for top down hierarchical description of a social science study, the data files resulting from that study, and the variables used in the data files. There is also a header area that uses Dublin Core elements for a high-level description of the DDI document itself.

Extensions and Profiles

Despite the recent development of many of these metadata schemes, most have already been subject to the changes brought about by implementing them in real world situations. These modifications are of two types: extensions and profiles.

An extension is the addition of elements to an already developed scheme to support the description of an information resource of a particular type or subject or to meet the needs of a particular interest group. Extensions increase the number of elements.

Profiles are subsets of a scheme that are implemented by a particular interest group. Profiles can constrain the number of elements that will be used, refine element definitions to describe the specific types of resources more accurately, and specify values that an element can take.

In practice, many applications use both extensions and profiles of base metadata schemes. For example, the National Biological Information Infrastructure (NBII) has developed a Biological Data Profile of the FGDC Content Standard for use with biological information resources. The profile defines an extended set of data for describing biological data, such as the taxonomic name of the organism and its classification in the taxonomic hierarchy.

The U.S. Department of Education’s Gateway to Educational Materials (GEM) project has based their own metadata scheme on the Dublin Core. The GEM profile limits the Dublin Core elements that can be used (for example, Contributor is not allowed) and makes some elements mandatory. GEM also defines additional elements such as Audience, Grade, Quality, and Standards, extending the base Dublin Core set for educational use.

Metadata in Action

A county land planner is studying the impact of new zoning laws on a particular bird species. The study team is composed of an ecologist, hydrologist, civil engineer, and environmental protection specialist.

Remote sensing data for the last 20 years provides a trend analysis of the decrease in wetlands, the bird’s habitat. These datasets have FGDC metadata. The biologists on the study team need to document the results of a field inventory. Using a biological profile to extend the FGDC element set, the biologists add the genus-species name and taxonomic hierarchy. The ecologists are concerned with collection methods and modeling tools. The data related to the changes in human population are documented using a metadata set developed by the Census Bureau.

This study results in a technical report which is assigned Dublin Core metadata by the author. When the technical report is cataloged into the organization’s repository, the Dublin Core elements are used as the basis for automatic generation of a MARC cataloging record. This record is enhanced by the cataloger and included in the library’s online public access catalog.
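As a rough illustration of how a profile and an extension might be checked in software, the following Python sketch models a GEM-like application profile over the base Dublin Core set. The excluded element (Contributor) and the added elements (Audience, Grade, Quality, Standards) come from the GEM description in the text; the class name, the choice of Title and Description as mandatory elements, and the sample record are assumptions made for this example and are not taken from the GEM specification.

```python
from dataclasses import dataclass

# The fifteen elements of the base (unqualified) Dublin Core set.
DUBLIN_CORE = frozenset({
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
})


@dataclass(frozen=True)
class ApplicationProfile:
    """Hypothetical model: a profile constrains a base scheme, an extension adds to it."""
    base: frozenset
    excluded: frozenset = frozenset()    # profile: base elements that may not be used
    extensions: frozenset = frozenset()  # extension: elements added to the base scheme
    mandatory: frozenset = frozenset()   # profile: elements every record must carry

    @property
    def allowed(self) -> frozenset:
        return (self.base - self.excluded) | self.extensions

    def validate(self, record: dict) -> list:
        """Return a list of human-readable problems (empty if the record conforms)."""
        problems = []
        for name in sorted(set(record) - self.allowed):
            problems.append(f"element not allowed: {name}")
        for name in sorted(self.mandatory - set(record)):
            problems.append(f"mandatory element missing: {name}")
        return problems


GEM_LIKE = ApplicationProfile(
    base=DUBLIN_CORE,
    excluded=frozenset({"Contributor"}),
    extensions=frozenset({"Audience", "Grade", "Quality", "Standards"}),
    # Assumption for illustration only: the text does not say which elements are mandatory.
    mandatory=frozenset({"Title", "Description"}),
)

# Invented sample record used only to exercise the check.
sample = {"Title": "Wetland Food Webs", "Contributor": "J. Doe", "Grade": "5"}
problems = GEM_LIKE.validate(sample)
for problem in problems:
    print(problem)
```

In this sketch the sample record is flagged twice: once for using Contributor, which the profile excludes, and once for omitting Description, which the sketch assumes to be mandatory. Grade passes because the extension adds it to the allowed set.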
Creating Metadata

Who creates metadata? The answer to this varies by discipline, the resource being described, the tools available, and the expected outcome, but it is almost always a cooperative effort.

Much basic structural and administrative metadata is supplied by the technical staff who initially digitize or otherwise create the digital object, or is generated through an automated process. For descriptive metadata, it is best in some situations if the originator of the resource provides the information. This is particularly true in the documentation of scientific datasets where the originator has significant understanding of the rationale for the dataset and the uses to which it could be put, and for which there is little if any textual information from which an indexer could work.

However, many projects have found that it is more efficient to have indexers or other information professionals create the descriptive metadata, because the authors or creators of the data do not have the time or the skills. In other cases, a combination of researcher and information professional is used. The researcher may create a skeleton, completing the elements that can be supplied most readily. Then results may be supplemented or reviewed by the information specialist for consistency and compliance with the schema syntax and local guidelines.

Creation Tools

Many metadata project initiatives have developed tools and made them available to others, sometimes for free. A growing number of commercial software tools are also becoming available. Creation tools fall into several categories:

• Templates allow a user to enter the metadata values into pre-set fields that match the element set being used. The template will then generate a formatted set of the element attributes and their corresponding values.

• Mark-up tools will structure the metadata attributes and values into the specified schema language. Most of these tools generate XML or SGML Document Type Definitions (DTD). Some templates include such a mark-up as part of their final translation of the metadata.

• Extraction tools will automatically create metadata from an analysis of the digital resource. These tools are generally limited to textual resources. The quality of the metadata extracted can vary significantly based on the tool’s algorithms as well as the content and structure of the source text. These tools should be considered as an aid to creating metadata. The resulting metadata should always be manually reviewed and edited.

• Conversion tools will translate one metadata format to another. The similarity of elements in the source and target formats will affect how much additional editing and manual input of metadata may be required.

Metadata tools are generally developed to support specific metadata schemas or element sets. The websites for the particular schema will frequently have links to relevant toolsets.

Metadata Quality Control

The creation of metadata automatically or by information originators who are not familiar with cataloging, indexing, or vocabulary control can create quality problems. Mandatory elements may be missing or used incorrectly. Schema syntax may have errors that prevent the metadata from being processed correctly. Metadata content terminology may be inconsistent, making it difficult to locate relevant information.

The Framework of Guidance for Building Good Digital Collections, available on the NISO website, articulates six principles applying to good metadata:

• Good metadata should be appropriate to the materials in the collection, users of the collection, and intended, current and likely use of the digital object.

• Good metadata supports interoperability.

• Good metadata uses standard controlled vocabularies to reflect the what, where, when and who of the content.

• Good metadata includes a clear statement on the conditions and terms of use for the digital object.

• Good metadata records are objects themselves and therefore should have the qualities of archivability, persistence, unique identification, etc. Good metadata should be authoritative and verifiable.

• Good metadata supports the long-term management of objects in collections.

There are a number of ongoing efforts for dealing with the metadata quality challenge:

• Metadata creation tools are being improved with such features as templates, pick lists that limit the selection in a particular field, and improved validation rules.

• Software interoperability programs that can automate the “crosswalk” between different schemas are continuously being developed and refined.

• Content originators are being formally trained in understanding metadata and controlled vocabulary concepts and in the use of metadata-related software tools.

• Existing controlled vocabularies that may have initially been designed for a specific use or a narrow audience are getting broader use and awareness. For example, the Content Types and Subtypes originally defined for MIME email exchange are commonly used as the controlled list for the Dublin Core Format element.

• Communities of users are developing and refining audience-specific metadata schemas, application profiles, controlled vocabularies, and user guidelines. The MODS User Guidelines are a good example of the latter.

Interoperability and Exchange of Metadata

Some people ask: Do we need so many metadata standards? With all the metadata standards, initiatives, extensions, and profiles, how can interoperability be ensured?

It is important to remember that different schemes serve distinct needs and audiences. Complementary schemes can be used to describe the same resource for multiple purposes and to serve a number of user groups. For example, a technical report could have a MARC metadata set in a library’s online catalog, an FGDC description as part of the National Spatial Data Infrastructure Clearinghouse Mechanism, and an embedded set of Dublin Core elements.

[Figure: A Dublin Core description represented in RDF]

virtual collections where resources

descriptions, created at different times for different purposes, can also be linked to each other. RDF is generally expressed in XML.

Metadata Crosswalks

The interoperability and exchange of metadata is further facilitated by metadata crosswalks. A crosswalk is a mapping of the elements, semantics, and syntax from one metadata scheme to those of another.

A crosswalk allows metadata created by one community to be used by another group that employs a different metadata standard. The degree to which these crosswalks are successful at the individual record level depends on the similarity of the two schemes, the granularity of the elements in the target scheme compared to that of the source, and the compatibility of the content rules used to fill the elements of each scheme.
Table 1. Example of Metadata Crosswalk Mapping
Dublin Core EAD MARC 21 Title Element Title
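A crosswalk of this kind can be thought of as a lookup table applied to each record. The Python sketch below is a minimal illustration under stated assumptions: the handful of Dublin Core to MARC 21 field/subfield pairs is an illustrative subset chosen for this example, not a reproduction of Table 1, and the function name and sample record are hypothetical. A production conversion should follow a published crosswalk.

```python
# Illustrative subset of an unqualified Dublin Core -> MARC 21 mapping.
# Each Dublin Core element maps to a (MARC tag, subfield code) pair.
# These pairs are an assumption for this sketch, not the content of Table 1.
CROSSWALK = {
    "Title": ("245", "a"),
    "Subject": ("653", "a"),
    "Description": ("520", "a"),
    "Publisher": ("260", "b"),
    "Date": ("260", "c"),
}


def apply_crosswalk(record: dict) -> tuple:
    """Map a Dublin Core record (element -> list of values) to MARC-style triples.

    Returns (fields, unmapped): fields is a list of (tag, subfield, value)
    triples; unmapped lists source elements with no target in the crosswalk,
    which a cataloger would have to handle manually.
    """
    fields = []
    unmapped = []
    for element, values in record.items():
        target = CROSSWALK.get(element)
        if target is None:
            unmapped.append(element)
            continue
        tag, code = target
        for value in values:
            fields.append((tag, code, value))
    return fields, unmapped


# Invented sample record; "Audience" has no entry in the crosswalk above.
dc_record = {
    "Title": ["Wetland Decline Study"],
    "Subject": ["Wetlands", "Zoning"],
    "Audience": ["Planners"],
}
fields, unmapped = apply_crosswalk(dc_record)
for tag, code, value in fields:
    print(f"{tag} ${code} {value}")
print("needs manual review:", unmapped)
```

Reporting unmapped elements rather than silently dropping them reflects the point made in the text: how well a crosswalk works for an individual record depends on how closely the two schemes align, and any gaps are left for manual editing.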
More Information on Metadata
General Resources

Digital Libraries: Metadata Resources (IFLA)
http://www.ifla.org/II/metadata.htm

A Framework of Guidance for Building Good Digital Collections
http://www.niso.org/framework/forumframework.html

Introduction to Metadata: Pathways to Digital Information
by Martha Baca
http://www.getty.edu/research/conducting_research/standards/intrometadata/index.html

Metadata: Cataloging by Any Other Name
by Jessica Milstead and Susan Feldman
ONLINE, January 1999
http://www.onlinemag.net/OL1999/milstead1.html

Metadata and Its Application
by Brad Eden
Library Technology Reports (September-October 2002)

Metadata Demystified: A Guide for Publishers
by Amy Brand, Frank Daly, Barbara Meyers
NISO Press & The Sheridan Press, 2003, ISBN 1-880125-49-9
http://www.niso.org/standards/resources/Metadata_Demystified.pdf

Metadata Fundamentals for All Librarians
by Priscilla Caplan
ALA, 2003, ISBN: 0-8389-0847-0

Metadata Information Clearinghouse Interactive (MICI)
http://www.metadatainformation.org

Metadata Portals and Multi-standard Projects
by Candy Schwartz
http://web.simmons.edu/~schwartz/meta.html

Metadata Primer – A “How To” Guide on Metadata Implementation [for digital spatial data]
by David Hart and Hugh Phillips
http://www.lic.wisc.edu/metadata/metaprim.htm

Metadata Principles and Practicalities
Duval, Erik, Wayne Hodgins, Stuart Sutton, and Stuart L. Weibel
D-Lib Magazine 8(4) (April 2002)
http://www.dlib.org/dlib/april02/weibel/04weibel.html

Metadata Resources (UKOLN)
http://www.ukoln.ac.uk/metadata/resources

Metadata Standards
http://www.chin.gc.ca/English/Standards/metadata_intro.html

Metadata Standards, Crosswalks, and Standards Organizations
http://staff.library.mun.ca/staff/toolbox/standards.htm

Metadata.net – Projects, Tools & Services, and Schema Registry (Australia)
http://metadata.net/

Preservation Metadata for Digital Objects: A Review of the State of the Art
A White Paper by the OCLC/RLG Working Group on Preservation Metadata, January 31, 2001
www.oclc.org/research/projects/pmwg/presmeta_wp.pdf

Schemes, Initiatives, and Related Sites

Application profiles: mixing and matching metadata schemas
Rachel Heery and Manjula Patel, Ariadne, Issue 25, September 2000.
http://www.ariadne.ac.uk/issue25/app-profiles/intro.html

The Cedars Project (CURL exemplars in digital archives)
http://www.leeds.ac.uk/cedars/metadata.html

CDWA (Categories for the Description of Works of Art)
http://www.getty.edu/research/conducting_research/standards/cdwa/

DDI (Data Documentation Initiative)
http://www.icpsr.umich.edu/DDI/

DOI (Digital Object Identifier)
http://www.doi.org/

Dublin Core Metadata Initiative (DCMI)
http://dublincore.org

EAD (Encoded Archival Description)
http://www.loc.gov/ead/

Environmental Data Registry (EPA)
http://www.epa.gov/edr/

FGDC Content Standard for Digital Geospatial Metadata (CSDGM)
http://www.fgdc.gov/metadata/

Gateway to Educational Materials (GEM)
http://www.geminfo.org/
IFLA Functional Requirements for Bibliographic Records
http://www.ifla.org/VII/s13/frbr/frbr.htm

IMS Global Learning Consortium
http://www.imsglobal.org

OAIS (Open Archival Information System)
http://www.ccsds.org/documents/650x0b1.pdf

ONIX (Online Information Exchange)
http://www.editeur.org/onix.html

Crosswalks and Lists of Crosswalks

All about Crosswalks
http://www.oclc.org/research/projects/mswitch/1_crosswalks.htm

Dublin Core / MARC / GILS
Metadata Registries & Clearinghouses

DCMI Registry Working Group
http://dublincore.org/groups/registry/

DESIRE Metadata Registry
http://desire.ukoln.ac.uk/registry/

Environmental Data Registry
http://www.epa.gov/edr/

FGDC Clearinghouse Registry
http://registry.gsdi.org/

MICI (Metadata Information Clearinghouse Interactive)
http://www.metadatainformation.org/

NBII Metadata Clearinghouse
http://metadata.nbii.gov/

The SCHEMAS Registry
http://www.schemas-forum.org/registry/

Tools for Metadata Creation

DDI Tools
http://www.icpsr.umich.edu/DDI/users/tools.html#a01

Dublin Core tools
http://dublincore.org/tools/

FGDC Metadata Tools
http://www.nbii.gov/datainfo/metadata/tools/

Metadata Software Tools
http://ukoln.bath.ac.uk/metadata/software-tools/

OAI-Specific Tools
http://www.openarchives.org/tools/tools.html

RDF Editors and Tools
http://www.ilrt.bris.ac.uk/discovery/rdf/resources/#sec-tools

TEI Software
http://www.tei-c.org/Software/index.html

Glossary

AACR2 (Anglo-American Cataloging Rules) – A standard set of rules for cataloging library materials. The “2” refers to the second edition.

administrative metadata – metadata related to the use, management, and encoding processes of digital objects over a period of time. Includes the subsets of technical metadata, rights management metadata, and preservation metadata.

ANSI (American National Standards Institute) – administers and coordinates the U.S. voluntary standardization and conformity assessment system.

CDWA (Categories for the Descriptions of Works of Art) – a metadata element set for describing artworks.

crosswalk – a mapping of the elements, semantics, and syntax from one metadata scheme to another.

CSDGM (Content Standard for Digital Geospatial Metadata)

DC (Dublin Core) – a general metadata element set for describing all types of resources.

DDI (Data Documentation Initiative) – a specification for describing social science datasets.

descriptive metadata – metadata that describes a work for purposes of discovery and identification, such as creator, title, and subject.

DLF (Digital Library Federation) – a membership organization dedicated to making digital information widely accessible.

DOI (Digital Object Identifier) – a unique identifier assigned to electronic objects of intellectual property which can be resolved to the object’s location on the Internet.

DTD (Document Type Definition) – a formal description in SGML or XML syntax of the structure (elements, attributes, and entities) to be used for describing the specified document type.

EAD (Encoded Archival Description) – a metadata scheme

extension – an element that is not officially part of a metadata scheme, which is defined for use with that scheme for a particular application.

FGDC (Federal Geographic Data Committee) – a U.S. Federal government interagency committee responsible for developing the National Spatial Data Infrastructure.

GEM (Gateway to Educational Materials) – a U.S. Department of Education initiative that has defined an extension to the Dublin Core element set to accommodate educational resources.

GIS (Geographic Information System) – a computer system for capturing, managing, and displaying data related to positions on the Earth’s surface.

HTML (Hypertext Mark-up Language) – a set of tags and rules derived from SGML used to create hypertext documents for the World Wide Web. Officially, a W3C Recommendation.
ISO (International Organization for Standardization) – the primary international standards development organization.

IEC (International Electrotechnical Commission) – an international standards development organization for all electrical, electronic and related technologies. Co-sponsors with ISO the Joint Technical Committee 1 on Information Technology.

LOM (Learning Object Metadata) – a metadata scheme for technology-supported learning resources.

MARC 21 (MAchine Readable Cataloging) – a formatting, record structure, and encoding standard for electronic bibliographic cataloging records developed by the Library of Congress. The “21” refers to the version of MARC issued in 1998 that integrated the U.S. and Canadian versions of MARC.

MARCXML – a metadata scheme for working with MARC data in a XML environment

metadata – structured information that describes, explains, locates, and otherwise makes it easier to retrieve and use an information resource.

metadata harvesting – a technique for extracting metadata from individual repositories and collecting it in a central catalog

METS (Metadata Encoding and Transmission Standard) – a metadata scheme for complex digital library objects.

MODS (Metadata Object Description Schema) – a metadata scheme for rich description of electronic resources.

MPEG (Moving Pictures Experts Group) – Standards Committee 29, Working Group 11 of ISO/IEC JTC1, which develops standards for digital audio and video. Also refers to a suite of standards developed by the group.

namespace – in RDF, a way to tie a specific use of a metadata element to the scheme where the intended definition is to be found.

NISO (National Information Standards Organization) – a standards development organization, accredited by the American National Standards Institute, that develops library and information-related standards.

ONIX (Online Information Exchange) – a metadata scheme for book bibliographic, trade, and promotional data.

preservation metadata – a form of administrative metadata dealing with the provenance of a resource and its archival management.

profile – a subset of a scheme defined and used by a particular interest group to customize the scheme for its purposes.

PURL (Persistent URL) – a naming and resolution system developed by OCLC utilizing an intermediate redirection service to locate a resource’s URL.

qualifier – an optional sub-element to a Dublin Core element that is used to further refine the element or support a specific encoding scheme.

RDF (Resource Description Framework) – a language for representing metadata about Web resources so it can be exchanged between applications without loss of meaning. Officially, a suite of W3C specifications.

registry – a formal system for the documentation of the element sets, descriptions, semantics, and syntax of one or more metadata schemes.

rights management metadata – a form of administrative metadata dealing with the intellectual property rights of a resource.

scheme (schema) – a metadata element set and rules for using it.

semantics – the names and meanings of metadata elements.

SGML (Standard Generalized Markup Language) – a language used to mark-up electronic documents with tags that define the relationship between the content and the structure. Officially, international standard ISO 8879, Information processing—Text and office systems—Standard Generalized Markup Language (SGML).

structural metadata – metadata that indicates how compound objects are structured, provided to support use of the objects.

syntax – rules for how metadata elements and their content are encoded.

technical metadata – a form of administrative metadata dealing with the creation or storage encoding processes or formats of the resource.

TEI (Text Encoding Initiative) – a metadata scheme for electronic text

URL (Uniform Resource Locator) – A unique address for identifying and locating a resource on the Internet.

VRA (Visual Resources Association) Core – a metadata scheme for describing a visual work and its representations

W3C (World Wide Web Consortium) – an international consortium that develops consensus protocols and specifications to ensure the interoperability of the World Wide Web.

XML (Extensible Mark-up Language) – an application profile of SGML designed for use in Web applications. Officially, a W3C Recommendation.

Z39.50 – a NISO and ISO standard protocol for cross-system search and retrieval. Officially, international standard, ISO 23950, Information Retrieval (Z39.50): Application Service Definition and Protocol Specification, and ANSI/NISO standard Z39.50.
Support the leaders in our community who support NISO as Voting Members:

3M
American Association of Law Libraries
American Chemical Society
American Library Association
American Society for Information Science and Technology
American Society of Indexers
American Theological Library Association
ARMA International
Armed Forces Medical Library
Art Libraries Society of North America
AIIM International
Association of Information and Dissemination Centers
Association of Jewish Libraries
Association of Research Libraries
Auto-Graphics, Inc.
Barnes & Noble, Inc.
Book Industry Communication
California Digital Library
Cambridge Information Group
Checkpoint Systems, Inc.
College Center for Library Automation
Colorado State Library
CrossRef
Davandy, L.L.C.
Docutek Information Systems
Dynix Corporation
EBSCO Information Services
Elsevier Science Inc.
Endeavor Information Systems, Inc.
Entopia, Inc.
ExLibris USA
Fretwell-Downing Informatics
Gale Group
Geac Library Solutions
GIS Information Systems, Inc.
H.W. Wilson Company
Helsinki University Library
Index Data
Infotrieve
Innovative Interfaces, Inc.
Institute for Scientific Information
The International DOI Foundation
Ithaka/JSTOR/ARTstor
John Wiley & Sons, Inc.
KINS, Inc.
Library Binding Institute
Library of Congress
The Library Corporation
Los Alamos National Laboratory
Lucent Technologies
Medical Library Association
MINITEX
Modern Language Association
Motion Picture Association of America
MuseGlobal, Inc.
Music Library Association
National Agricultural Library
National Archives and Records Administration
National Federation of Abstracting and Information Services
National Library of Medicine
National Security Agency
Nylink
OCLC, Inc.
Openly Informatics, Inc.
ProQuest Information and Learning
Random House, Inc.
Recording Industry Association of America
The Research Libraries Group
SAGE Publications
Serials Solutions, Inc.
SIRSI Corporation
Society for Technical Communication
Society of American Archivists
Special Libraries Association
Synapse Corporation
TAGSYS, Inc.
Talis Information Ltd.
Triangle Research Libraries Network
U.S. Department of Commerce, NIST, Office of Information Services
U.S. Department of Defense, DTIC (Defense Technical Information Center)
U.S. Department of Energy, Office of Scientific & Technical Information
U.S. Government Printing Office
U.S. National Commission on Libraries and Information Science
VTLS, Inc.
WebFeat
ISBN 1-880124-62-9