Metadata: A Primer for Indexers

BY DEBBIE OLSON
KEY WORDS / VOL. 17, NO. 1 / JANUARY – MARCH 2009

Debbie Olson provides freelance indexing, library and archival services and lives in Syracuse, NY. She is the current Webmaster for the Web Indexing SIG and has recently audited classes in information architecture and metadata from Syracuse University. She can be reached at www.olson-info.com.

Introduction

Metadata. It’s a term being used widely and often in the library (digital libraries, cataloging), business (knowledge management), and web design (information architecture and taxonomy development) worlds. Its most basic definition, one that you are probably already familiar with, is “information about information.” It’s the mantra of the information world. But here’s a more detailed definition from the National Information Standards Organization (NISO), where the key to metadata is that it is “structured” information (2004):

“Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource.”¹

Its nuances now cast a broader net in terms of meaning and application. It’s partially about a change in terminology stemming from a change in technology, but it’s also about a new way of looking at information. And as standards and best practices evolve, information can be more easily accessed and shared across communities and platforms.

Indexers are already creators of metadata and have important skills and experience to bring to the broader discussion. We can start by developing a basic understanding of the definitions, standards, schemas, applications, and best practices associated with the world of metadata.

Applications for Metadata

Metadata creation is not limited to text-based material. It can be applied to three-dimensional objects, images, audio, video materials, geospatial data, PowerPoint presentations, lesson plans, websites, computer file systems, and just about anything we want to find faster and more efficiently in the future. A few examples include:

• The traditional library card catalog, certainly one of the earliest mainstream forms of metadata that fits the definition.
• “Digital libraries” made up of born-digital information or materials originally in another format brought together online via scanning or other technologies. Such initiatives are seen in traditional library environments, but organizations also use digital library concepts to document their in-house information.
• The information contained in the <meta> element of a web page, e.g. keywords.
• The technical details associated with photos from a digital camera, e.g. width, height, pixels per inch, resolution.
• The information about music downloaded from an online source, e.g. artist, length, genre.
• The properties of a text document, e.g. type of file, location, creator.

Some types of metadata may be automatically embedded in a record, as in the case of photos taken with a digital camera or downloaded music. Metadata can also be proprietary, as in the case of the metadata stored in a text editor, or non-proprietary, such as that stored in the header of an HTML document. Other metadata is added by humans. Tagging photos on a gallery website such as Flickr is one example.²

Types of Metadata

Metadata is generally broken down into four or five categories. Administrative metadata includes rights and preservation metadata (sometimes separate) and contains information regarding the creation, management, and accessibility of the material. Technical metadata (sometimes included with administrative) contains information about how the item or piece of information was generated or transferred, e.g. camera settings for a photograph that are automatically embedded in the file/object. Structural metadata refers to how the information or object is to be compiled, e.g. filenames for chapters of a book. Descriptive metadata, the type of metadata we’re most familiar with and indexers are most likely to work with, includes elements such as titles, genres, authors/creators, and subjects.³

Metadata Standards and Schemas

Metadata standards and schemas provide a way to format and exchange metadata and improve interoperability, or the ability to be understood and shared across information communities and platforms. Metadata standards address how to describe content and what must be included, such as a title, creator, date, and subject. Standards also provide the rules, or syntax, for how values, e.g., the name of a particular title or a subject heading, are constructed or formatted. Metadata schemas, as opposed to schemas for encoding material for the Web such as XML, include specific sets of elements, e.g., title, creator, date, and subject.

In the library community, the Anglo-American Cataloging Rules (AACR) provides the rules to describe the content of a bibliographic record, while MAchine Readable Cataloging (MARC) provides the schema, or set of elements, to mark up and encode this bibliographic information in an electronic format, e.g., 245 is the title field. In the archival community, Describing Archives: A Content Standard (DACS) provides the standards for describing archival materials, while Encoded Archival Description (EAD) provides the set of elements necessary to mark up and encode the material, such as the components of a finding aid, for an online environment.

Current metadata standards and schemas were developed and became more widespread primarily in the mid-1990s with the creation of the Dublin Core Metadata Element Set (DCMES). Sometimes simply known as “Dublin Core,” it is named after Dublin, Ohio, the location where the original workshop convened and where OCLC, the library cataloging services company, is headquartered. The initiative called for a set of categories that could be used to describe electronic resources and involved professionals in the fields of computer science, librarianship, archives and indexing.⁴

Objectives of the workshop included creating a core set of elements and promoting understanding among stakeholders, information communities and users.⁵ The result was a basic set of fifteen elements: contributor, coverage, creator, date, description, format, identifier, language, publisher, relation, rights, source, subject, title and type.⁶ Each element is optional and repeatable and provides enough simplicity and flexibility to be adopted by a variety of communities.

An example of DCMES in the business community can be viewed in the <head> section of the page info and page source codes of the home page for IBM.⁷ The schema is declared as the namespace, where the name of the schema and the web location (URL) where it resides are given. For example, IBM uses six Dublin Core elements that are noted by the “dc” prefix: dc.publisher, dc.language, dc.date, dc.rights, dc.subject, and dc.type. W3C-DTF, a syntax standard for expressing the date and time, is used to format the date, and dc.subject refers to internal IBM taxonomies.

DCMES, however, has become too simplistic for some communities. The fifteen elements were refined and expanded with additional properties, but other standards and schemas have been developed too. A few examples include: Categories for the Description of Works of Art (CDWA) and the VRA Core Categories (VRA) for artwork and other visual media; IEEE Learning Object Metadata (LOM) and Gateway to Educational Materials (GEM) for educational materials; Encoded Archival Description (EAD) for archival finding aids; Content Standard for Digital Geospatial Metadata (CSDGM) for scientific material; and ONline Information eXchange (ONIX) and the Text Encoding Initiative (TEI) for booksellers and publishers. Elements from various schemas may also be combined to create an application profile, a formal set of guidelines on how a particular community is using the elements of one or several schemas.

The public sector is also involved and invested in developing standards and schemas to manage the ever-expanding amount of online government information. The Australian Government Locator Service (AGLS), the e-Government Metadata Standard (e-GMS) (UK), and the Government of Canada Records Management Metadata Standard (GC RMMS) are examples. The U.S. federal government recommends the use of metadata as a part of its best practices for government websites.⁸ Many of these public standards are based on DCMES.

The Role of Controlled Vocabularies

Specific controlled vocabularies, such as thesauri and name and subject authorities, may either be recommended for use with a particular standard or may need to be located, refined, or created from a variety of vocabularies or resources, especially if the terminology is localized or emerging. The Getty Institute’s Art and Architecture Thesaurus (AAT)®, Union List of Artist Names (ULAN)®, Thesaurus of Geographic Names (TGN)®, and Cultural Objects Name Authority (CONA)™ are widely used, as are the Library of Congress’ Name and Subject Authorities.⁹ Other vocabularies, such as the DCMI Type Vocabulary, provide a list of values for a specific element in DCMES, in this case, the genre term to input in the “type” element. It can be adapted or modified for non-DCMES projects.

The Tools of Metadata Creation

Metadata can be input by hand or with the assistance of tools such as templates and editors. Schemas may also have tools created especially for them to assist with creating and encoding metadata for the Web. One example is DC-dot, a Web-based application that will generate DCMES metadata in HTML or XML for embedding in the <head> of a Web page.¹⁰

There are also crosswalks, tools to assist with converting one schema to another through mapping one set of elements to another or via an intermediary, e.g., one schema is mapped to DCMES and then mapped from DCMES to another schema. It is important to keep in mind that with any of these tools a certain amount of editing will be necessary to meet the needs of a particular project or community.

Metadata may also be created through a content management system (CMS), which provides for the authoring, publishing, archiving, and searching of documents. Many corporations are using CMSs to manage websites and internal information,¹¹ and many cultural institutions are using digital management software packages, e.g., CONTENTdm®, to catalog their collections.¹² Open source (free) systems are also available to manage content and metadata.

Indexers as Metadata Creators

In expanding our knowledge and vocabulary to include the world of metadata we can take a step towards not only maintaining but, more importantly, advancing our role in the information world in our own eyes and those of the wider information community. We can begin by familiarizing ourselves with the basics of metadata and its applications, standards, schemas, best practices, and the tools available to assist us with its creation.

Through attending workshops and participating in ventures beyond the indexing world we can actively bring our indexing knowledge and expertise to the broader information table and open ourselves and our profession to new opportunities for collaboration while also expanding and developing new markets for our services.

End Notes

1 Understanding Metadata. Bethesda, MD: National Information Standards Organization. 2004. www.niso.org/publications/press/UnderstandingMetadata.pdf
2 For an example of the application of metadata with regard to photo gallery sites, see Fred Brown’s article, “Metadata Goes Mainstream,” KnowGenesis International Journal for Technical Communication. Vol. 2, Issue 1 (March 2007), pp. 9-10. www.allegrotechindexing.com/metadata.pdf
3 Understanding Metadata. Bethesda, MD: National Information Standards Organization. 2004. www.niso.org/publications/press/UnderstandingMetadata.pdf
4 Dublin Core Metadata Initiative (DCMI). www.dublincore.org/workshops/dc1/. Although the indexing community is listed as involved in the workshop, ASI was not listed on the registration list (also available online at this address). Accessed December 12, 2008.
5 Dublin Core Metadata Initiative (DCMI). www.dublincore.org/workshops/dc1/general.shtml. Accessed December 12, 2008.
6 For details visit: www.dublincore.org/documents/dces/.
7 Visit IBM at www.ibm.com/us/. Accessed December 12, 2008.
8 Federal Web Content Manager’s Advisory Council (US). www.usa.gov/webcontent/managing_content/organizing/metadata.shtml.
9 For more information on controlled vocabularies, their use, creation and links, visit ASI’s Taxonomies and Controlled Vocabulary SIG site at www.taxonomies-sig.org/.
10 DC-dot. www.ukoln.ac.uk/metadata/dcdot/.
11 For more information on content management systems (CMS), see Fred Leise’s article, “Metadata and Content Management Systems: An Introduction for Indexers,” The Indexer, Vol. 24, no. 2 (October 2004), pp. 71-74. Available online at: www.contextualanalysis.com/publications/Indexer_2004_02_Leise%20-%20Final.pdf.
12 For examples of digital library collections using metadata, visit the CONTENTdm in action web page at: www.oclc.org/contentdm/collections/default.htm. You will be able to view the item and its metadata once you choose an individual item within a collection.

Metadata in the Library, Business, Information architecture or Indexing professions

Baca, Murtha, ed. Introduction to Metadata: Pathways to Digital Information, 2nd edition (online version 3.0), 2008. Los Angeles, CA: The J. Paul Getty Trust. Available at: www.getty.edu/research/conducting_research/standards/intrometadata/
Brand, Amy, et al. Metadata Demystified: A Guide for Publishers. The Sheridan Press & NISO Press, July 2003. Available at: www.niso.org/publications/press/Metadata_Demystified.pdf.
Browne, Glenda and Jon Jermey. The Indexing Companion. Cambridge, England: Cambridge University Press, 2007.
Foulonneau, Muriel and Jenn Riley. Metadata for Digital Resources: Implementation, Systems Design and Interoperability. Oxford, England: Chandos Press, 2008. (See Heather Hedden’s review in this issue of Key Words.)
Lambe, Patrick. Organising Knowledge: Taxonomies, Knowledge and Organizational Effectiveness. Oxford, England: Chandos Press, 2007.
Morville, Peter and Louis Rosenfeld. Information Architecture for the World Wide Web, 3rd ed. Sebastopol, CA: O’Reilly Media, Inc., 2007.
Taylor, Arlene G. The Organization of Information, 2nd edition. Westport, CT: Libraries Unlimited, 2004.
Wyman, Pilar L. “Navigating the Future of Technical Communication: STC’s 51st Annual Conference,” Key Words, vol. 12, no. 3 (July-September 2004), pp. 100-104. This conference report summarizes two sessions that discuss the role of indexers in metadata creation.
Zeng, Marcia Lei and Jian Qin. Metadata. New York: Neal-Schuman Publishers, Inc. 2008.

Metadata Standards and Schemas

Australian Government Locator Service (AGLS). www.nla.gov.au/metadata.html.
Categories for the Description of Works of Art (CDWA). The Getty. www.getty.edu/research/conducting_research/standards/.
Content Standard for Digital Geospatial Metadata (CSDGM). Federal Geographic Data Committee. www.fgdc.gov/metadata/csdgm/.
e-Government Metadata Standard (e-GMS) (UK). www.govtalk.gov.uk/schemasstandards/metadata_document.asp?docnum=768.
Federal Web Content Manager’s Advisory Council (US). www.usa.gov/webcontent/managing_content/organizing/metadata.shtml.
Gateway to Educational Materials (GEM). Sponsored by the National Education Association. www.thegateway.org/about/gemingeneral/about-gem/.
Government of Canada Records Management Metadata Standard (GC RMMS). www.collectionscanada.gc.ca/government/products-services/007002-5001-e.html.
Learning Object Metadata (LOM). IEEE. ltsc.ieee.org/wg12/files/LOM_1484_12_1_v1_Final_Draft.pdf.
Library of Congress. www.loc.gov/standards/.
United Kingdom. The e-Government Metadata Standard (e-GMS). www.govtalk.gov.uk/schemasstandards/metadata.asp.
Visual Resources Association (VRA) Core Categories. www.vraweb.org/projects/vracore4/index.html.

Encoding Standards

Encoded Archival Description (EAD). Library of Congress. www.loc.gov/ead/.
Dublin Core Metadata Initiative (DCMI). dublincore.org/.
MAchine Readable Cataloging (MARC). www.loc.gov/marc/.
ONline Information eXchange (ONIX). Maintained by EDItEUR with the Book Industry Communication (UK) and the Book Industry Study Group (US). www.editeur.org/onix.html.
Text Encoding Initiative (TEI). TEI Consortium, eds. www.tei-c.org/index.xml.

Content Standards

Anglo-American Cataloging Rules (AACR). Chicago: American Library Association, 2002.
Describing Archives: A Content Standard (DACS). Chicago: Society of American Archivists, 2004. www.archivists.org/catalog/pubDetail.asp?objectID=1279.
Digital Library for Earth Systems Education. www.dlese.org/library/index.jsp.
Visual Resources Association Foundation. Cataloging Cultural Objects: A Guide to Describing Cultural Works and Their Images (CCO). www.vrafoundation.org/ccoweb/index.htm.

Syntax Standards

Date and Time Formats (W3C-DTF). www.w3.org/TR/NOTE-datetime.
ISO Country Code List. ISO 3166-1193 (E). www.iso.org/iso/country_codes/iso_3166_code_lists/english_country_names_and_code_elements.htm.

Controlled Vocabularies

DCMI Type Vocabulary. dublincore.org/documents/dcmi-type-vocabulary/.
Getty Institute Vocabularies. www.getty.edu/research/conducting_research/vocabularies/.
Library of Congress Name and Subject Authorities. authorities.loc.gov/.
National Library of Medicine. Medical Subject Headings (MeSH). www.nlm.nih.gov/mesh/.

Metadata Tools

DC-dot. www.ukoln.ac.uk/metadata/dcdot/.
Dublin Core tools and software. dublincore.org/tools/.
oXygen (XML Editor). SyncRO soft ltd. www.oxygenxml.com/xml_schema_editor.html

Content Management Systems (CMSs)

CONTENTdm from OCLC (digital collections management). www.contentdm.com/.
Drupal (free). www.drupal.org/.
Joomla (free). www.joomla.org/about-joomla.html.
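
A Worked Sketch: Crosswalk and Dublin Core Tags

Two ideas from this primer, crosswalking one schema to Dublin Core and embedding Dublin Core elements in a web page, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how DC-dot or any real crosswalk tool works: the in-house field names, the sample record, and both helper functions are hypothetical. The lowercase “dc.” prefix follows the IBM example described in the text, and the date is written as YYYY-MM-DD, one of the forms W3C-DTF allows.

```python
from datetime import date
from html import escape

# Hypothetical crosswalk: field names of an imaginary in-house schema
# mapped onto Dublin Core element names. The local names are invented.
LOCAL_TO_DC = {
    "headline": "title",
    "author": "creator",
    "keywords": "subject",
    "issued": "date",
    "kind": "type",
}


def crosswalk(record, mapping):
    """Rename the fields of `record` according to `mapping`.

    Fields with no counterpart in the target schema are dropped, which is
    one reason crosswalked records usually need hand editing afterwards.
    """
    return {mapping[name]: value for name, value in record.items() if name in mapping}


def to_meta_tags(dc_record):
    """Render a Dublin Core record as HTML <meta> tags for a page's <head>."""
    lines = []
    for element, value in dc_record.items():
        # Elements are repeatable, so accept either one value or a list.
        values = value if isinstance(value, list) else [value]
        for item in values:
            if isinstance(item, date):
                item = item.isoformat()  # YYYY-MM-DD
            content = escape(str(item), quote=True)
            lines.append(f'<meta name="dc.{element}" content="{content}">')
    return "\n".join(lines)


# Hypothetical sample record in the in-house schema.
local_record = {
    "headline": "A Sample Article",
    "author": "A. Author",
    "keywords": ["metadata", "indexing"],
    "issued": date(2009, 3, 31),
    "kind": "Text",   # "Text" is a term in the DCMI Type Vocabulary
    "shelf": "B-12",  # no Dublin Core counterpart in the mapping; dropped
}

dc_record = crosswalk(local_record, LOCAL_TO_DC)
print(to_meta_tags(dc_record))
```

The “shelf” field disappears in the crosswalk because the mapping has no Dublin Core counterpart for it, a small instance of the editing that the text says such tools always leave to the metadata creator.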
