Controlled Vocabularies: an Overview
MURTHA BACA
DESCRIPTIVE METADATA WORKSHOP AT REED COLLEGE
MAY 28, 2010

Typology of Data Standards
• Data structure standards (metadata element sets): MARC, EAD, Dublin Core, CDWA, VRA Core Categories
• Data content standards (cataloging rules): AACR (→ RDA), ISBD, CCO, DACS
• Data value standards (vocabularies): LCSH, LCNAF, TGM, AAT, ULAN, TGN, MeSH
• Data format standards (standards expressed in machine-readable form): MARC, MARCXML, MODS, EAD, CDWA Lite XML, Dublin Core Simple XML schema, DC Qualified XML schema, VRA Core XML schema

M. Baca: Overview of Controlled Vocabularies Workshop at Reed College, 2010-05-28 1

What are vocabularies?
• Maps to guide people to information
  - creating / filling
  - searching / researching
  - organizing / classifying / thinking
• Collections of terminology where relationships between terms are represented
• Data value standards (i.e., what is used to “fill” metadata elements/categories or “containers” of information)

What are vocabularies? cont.
“Knowledge bases”: bodies of knowledge represented by language (glossaries, dictionaries, thesauri, word lists)

Types of Terms in Vocabularies
• personal names: Collate, Charles B.
• geographic names: Campbeltown (Argyll and Bute, Scotland, UK)
• object names: clack valve
• corporate names: Cambrian Railways
• iconographic subjects and themes: The Legend of John Henry
• genre terms: political cartoons, fish stories
• multilingual equivalents: flat car (English) = Schienenwagen (German) = platforma (atklata) (Latvian)

What is a controlled vocabulary?
A tool for consistency in the language used in the recording and retrieval of information

What is a controlled vocabulary? cont.
An organized arrangement of words and phrases that are used to index content and/or to retrieve content through navigation or a search
Typically a vocabulary that includes preferred terms and has a limited scope or describes a specific domain

Types of Controlled Vocabularies

Controlled Lists
Simple lists of terms used to control terminology
In a well-constructed controlled list:
• Each term must be unique (no homographs).
• Terms should all be members of the same class.
• Terms should not overlap in meaning.
• Terms should be equal in granularity or specificity.
• Terms should be arranged alphabetically or in another logical order.

Controlled Lists cont.
• May include terms from other controlled vocabulary resources (especially standard published vocabularies)
• For some elements or fields in a database, a controlled list may be sufficient to control terminology, particularly where the terminology for that field is limited and unlikely to have synonyms or ancillary information (examples: artists’ roles in ULAN, place types in TGN).

Subject Headings
Compilations, usually in alphabetical order, that combine separate concepts into a “string,” as in the Library of Congress Subject Headings (LCSH)
  Commercial fishing -- Japanese competition
  Salmon fisheries -- law and legislation -- California

Subject Headings cont.
Pre-coordination of terminology is a characteristic of subject headings; subject headings typically combine several unique concepts together.
  Subject headings--Pictures.
  Pictures--Computer network resources.
  World Wide Web--Subject access.

Authority Files
• Compilations of authorized terms or headings used by a single information system, organization, or consortium for cataloging, indexing, and documentation.
• Main purpose is to regulate usage.
• Include synonyms (“See” references) and related or associated terms (“See also” references).
• Examples include the Library of Congress Name Authority File (LCNAF) and local authorities for names, subjects, etc.
• Authority files may take the form of thesauri, word lists, etc.; in other words, any kind of controlled vocabulary can be used as an authority.

Taxonomies/Classifications
Vocabularies that organize a body of knowledge for a defined domain into conceptual categories, e.g., Nomenclature for Museum Cataloging, ICONCLASS.
  The Greek heroic legends
    Story of Hercules (Heracles)
      Labors of Hercules
        Hercules chokes the Nemean lion
        Hercules kills the Hydra of Lerna
        Hercules captures the Ceryneian hind
        Hercules captures the Cretan bull
  http://www.iconclass.nl/

Thesauri
Compilations of terms representing single concepts. Thesauri explicitly express relationships among terms via a semantic structure.
  <visual works by form>
    dioramas
    diptychs
    medals
    medallions (medals)
    polyptychs
    triptychs

Thesauri cont.
Terms in a thesaurus may have the following three types of relationships:
• Equivalence
• Hierarchical
• Associative

Thesaural Relationships
• Equivalence
  - synonyms, spelling variations, language variations
• Hierarchical
  - broader to narrower
    - whole/part
    - genus/species
• Associative
  - related concepts

Equivalence Relationship: terms/names denote the same thing; a preferred name is used for displays
  Bulgarini, Bartolomeo (Sienese painter, circa 1337-1378)
    Lorenzetti, Ugolino
    Master of the Ovile Madonna
    Ovile Master
  example from ULAN

Equivalence Relationship
  still lifes
  still life
  still-lifes
  still lives
  nature morte
  natura morta
  stilleven
  Stilleben
  vie coye
  ontbijtje
  banketje

Whole/Part Relationship: “children” or narrower terms are part of the parent or broader term
  España (nation)
    Andalucía (region)
      Almería (province)
      Cádiz (province)
      Córdoba (province)
      Granada (province)
      Huelva (province)
      Málaga (province)
      Sevilla (province)

Genus/Species Relationship: “children” represent types of the “parent” or broader term
  funerary sculpture
    brasses
    effigies
    gisants...
    haniwa
    tomb slabs
    ushabti

Associative Relationship: terms are related conceptually, but not necessarily hierarchically
  Descriptor: charterhouses
  Hierarchy: Built Complexes and Districts
  Scope note: Carthusian monasteries.
  Alternate Forms of Speech {ALT}: charterhouse
  Synonyms and spelling variants {UF}: certose, charter houses, chartreuses
  Related concepts: Carthusian (Religions hierarchy)

Vocabularies provide intellectual “paths” that can improve access to information
  Harlem Renaissance
    Negro Renaissance
    New Negro Movement
    Renaissance, Harlem
    Renaissance, Negro
  Example from the AAT
  Jacob Lawrence, Tombstones, 1942

Why do we need vocabularies?
• Because of national and regional differences: lorries vs. trucks, lifts vs. elevators, Tom Thumb golf courses vs. miniature golf courses
• Because of historical vs. contemporary names: Iran vs. Persia vs. Islamic Republic of Iran
• Because of political and social changes: KhoiKhoi vs. Hottentot
• Because of linguistic differences: Titian vs. Tiziano vs. Titien; pottery vs. keramik vs. céramique
• To disambiguate homographs: sinopia (pigment, Materials hierarchy) vs. sinopia (preliminary drawing, Visual Works hierarchy)

Why do we need vocabularies? cont.
Thesaural relationships provide greater research/searching capabilities:
  drawings
    <drawings by function>
      preliminary drawings
        underdrawings
          sinopie

Issues in vocabulary-enhanced searching
• User interfaces are problematic.
• Optimally, controlled vocabularies should be used both on the “back end” and on the “front end” to be most effective.
• Economics: consistent implementation of controlled vocabularies is time- and labor-intensive.
• Vocabulary control is almost non-existent on the open Web at present.

Search “ARES” against Google (returns 1,250,000 pages; none of the first 6 pages are relevant)
Increase precision by ANDing the broader/parent term of ARES, “Major Gods”

“Ares AND Major Gods” now narrows to 506 hits (all of the first 7 pages are relevant)

Examples of standards for data values:
• The Getty Vocabularies
• Library of Congress Name Authority File (LCNAF)
• Library of Congress Subject Headings (LCSH)
• ICONCLASS

The Getty Vocabularies
http://www.getty.edu/research/conducting_research/vocabularies/
Compiled and maintained by the Getty Vocabulary Program
• Union List of Artist Names® (ULAN): 117,600 ‘records’; 257,241 names
• Art & Architecture Thesaurus® (AAT): 33,150 ‘records’; 128,075 terms
• Getty Thesaurus of Geographic Names® (TGN): 911,300 ‘records’; 1,102,200 names
Focus on the visual arts, architecture, & material