
Glossary

abbreviation
A shortened form of a name or term (e.g., Mr. for Mister). See also acronym and initialism.

access point
An entry point to a systematic arrangement of information, specifically an indexed field or heading in a work record, a vocabulary record, or another content object that is formatted and indexed in order to provide access to the information in the record.

acronym
An abbreviation or word formed from the initial letters of a compound term or phrase (e.g., MoMA, for Museum of Modern Art). See also abbreviation and initialism.

ad hoc query
Also called a direct query. A query or report that is constructed when required and that directly accesses data files and fields that are selected only when the query is created. It differs from a predefined report or querying a database through a user interface.

administrative data
In the context of cataloging art, information having to do with the administrative history and care of the work and the history of the catalog record (e.g., insurance value, conservation history, and revision history of the catalog record). See also descriptive data.

administrative entity
In the context of a geographic vocabulary, a political or other administrative body defined by administrative boundaries and conditions, including inhabited places, empires, nations, states, districts, and townships. See also physical feature.

algorithm
In the context of this book, an algorithm is a procedure, a formula, or the rules in a computer program or set of programs, often expressed in algebraic notation, that follow a logical, unambiguous step-by-step process to retrieve a set of results, solve a problem, make a decision, manipulate or alter data, or achieve some other result or state. Although a computer program may be considered one large algorithm, in common usage in computer science, the term typically refers to a small procedure applied recurrently. See also computer program.

alphanumeric classification scheme
A set of controlled codes (letters or numbers or both) that represent concepts or headings and generally have an implied taxonomy that can be surmised from the codes (e.g., the Dewey Decimal Classification system number 735.942). See also chain indexing.

alternate descriptor (AD)
A variant form of a descriptor available for use; usually a singular form or a different part of speech than the descriptor (e.g., lithograph is an alternate descriptor for the plural descriptor lithographs). In thesauri, the relationship indicator for this type of term is AD.

ancestor
In a hierarchy, any record that is a broader context for the record at hand, including parents, grandparents, and all other broader contexts at higher levels; any node in the succession of parent nodes on a path all the way up to the root. See also descendant.
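
As a rough illustration of the ancestor entry, the sketch below walks a toy hierarchy from one record up to the root. The `PARENT` mapping, the `ancestors` function, and the place names are hypothetical; they stand in for whatever structure a real vocabulary system uses.

```python
# Hypothetical toy hierarchy: each record points to its parent (None marks the root).
PARENT = {
    "Columbus": "Bartholomew county",
    "Bartholomew county": "Indiana",
    "Indiana": "United States",
    "United States": None,
}


def ancestors(node, parent=PARENT):
    """Return every broader context of `node`, nearest parent first."""
    chain = []
    current = parent[node]
    while current is not None:
        chain.append(current)
        current = parent[current]  # step one level closer to the root
    return chain


print(ancestors("Columbus"))
```

Each pass through the loop moves to the next parent node, so the result lists broader contexts from nearest to farthest; a root record has no ancestors and yields an empty list.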

antonym
A term that is the opposite in meaning of another term (e.g., roughness is an antonym for smoothness).

application
Also called an application program. A software program designed to accomplish a task for an end user (e.g., word processing or project management), as distinguished from the operating system program that runs the computer itself.

application programming interface (API)
In the context of this book, an online system, source code, and interface that a data provider (e.g., a vocabulary provider or library) employs to allow users to have access to the data. It may be language dependent (designed for a specific programming language) or language independent (works with multiple programming languages).

architect
A person or firm involved in the design or creation of structures or parts of structures that are the result of conscious construction, are of practical use, are relatively stable and permanent, and are of a size and scale appropriate for, but not limited to, habitable buildings.

architectural work
See built work.

architecture
Refers to the built environment that is typically classified as fine art, meaning it is generally considered to have aesthetic value, was designed by an architect, and was constructed with skilled labor. See also built work.

archival group
See group.

art
In the context of this book, refers to the visual arts such as painting, sculpture, drawing, printmaking, photography, ceramics, textiles, and decorative arts of the type and caliber generally collected by museums. Performance art is also included, but the performing arts are not. Note that these are works of visual art of the type collected by art museums. The objects themselves may actually be held by an ethnographic, anthropological, or other museum, or owned by a private collector.

artist
Any person or group of people involved in the design or production of visual arts that are of the type collected by art museums.

ascending order
In the context of a string of hierarchical parents, refers to the display of parents from narrowest to broadest (e.g., Columbus (Bartholomew county, Indiana, United States)). See also descending order.

ASCII
Acronym for the American Standard Code for Information Interchange, a 7-bit character code defining 128 characters used for information interchange, data processing, and communications systems.

associative relationship
In a thesaurus, the relationship between concepts that are closely related conceptually, but the relationship is not hierarchical because it is not whole/part or genus/species. The relationship indicator for this relationship is RT (for related term). See also equivalence relationship and hierarchical relationship.

asymmetric relationship
In the context of a thesaurus, refers to a reciprocal relationship that is different in one direction than it is in the reverse direction (for example, BT/NT, for broader term/narrower term). See also symmetric relationship.

authoritative source
A published source that is based on reliable documentary evidence that is accepted as true by most experts and used as a standard source in a given discipline.

authority file
Also called simply an authority. A file, typically electronic, that serves as a source

of standardized forms of names, terms, titles, etc. Authority files should include references or links from variant forms to preferred forms. The main purpose of an authority is to enforce usage, often requiring users to use only the preferred term for a given concept. Any type of vocabulary can be used as an authority. See also controlled vocabulary and local authority.

authority heading
A preferred, authorized heading used in a vocabulary, particularly in a bibliographic authority file that typically includes a string of names or terms, with additional information as necessary to allow disambiguation between identical headings (e.g., United States--History--Civil War, 1861–1865--Battlefields and United States--History--Civil War, 1861–1865--Campaigns). The types of authority headings used by the Library of Congress are the following: subject, name, title, name/title, and keyword authority headings. See also heading.

authorization
In the context of vocabularies, the process by which the creators of a vocabulary or an oversight group regulate the selection of terms and establishment of relationships in a controlled vocabulary. See also warrant.

automatic indexing
In the context of online retrieval, indexing by the analysis of text or other content using computer algorithms. The focus is on automatic methods used behind the scenes with little or no input from individual searchers, with the exception of relevance feedback. The results tend to be broad and imprecise, as contrasted to human indexing. See also co-occurrence mapping.

autoposting
See up-posting.

batch load
In the context of populating or contributing to vocabulary systems or other databases, refers to moving or manipulating a group of records as a single unit for the purpose of data processing, typically accomplished by the computer without user interaction, as contrasted to entering records manually, one at a time. See also load and processing.

batch processing
See processing.

best match
Also called a weighted term ranking. Refers to a variety of electronic term-matching and ranking methods that attempt to predict the potential relevance of query results by assigning relevance scores and ranking based on comparing search terms to the indexing terms of the target database. See also exact match.

blind reference
In the context of a vocabulary that is being used for indexing or retrieval on a defined data set, refers to a term in the vocabulary that is not linked to any content in the data set. End users should typically not receive blind references in a retrieval situation because they result in a failed search; however, these terms should be retained in structured vocabularies that are used for indexing because they may be needed in the future or in another context.

Boolean operators
Logical operators used as modifiers to refine the relationship between terms in a search. The four most commonly used Boolean operators are AND, OR, NOT, and ADJ (adjacent). They may be used with parentheses and other punctuation to form logical groupings of criteria in queries (e.g., (Castillo OR Rancho) AND Diego).

bound term
A compound term representing a single concept, characterized by the fact that the words almost always occur together and the meaning is lost or altered if the term is split into its component words. See also compound term and lexical unit.

brand name
A trade or proprietary name for a thing or process (e.g., Super Glue).

broadcast searching
See federated searching.

broaden results
To adjust criteria in a search in order to retrieve a larger number of results, typically because the searcher did not find what he or she wanted in an initial narrower search. See also narrow results.

broader term (BT)
Also called a broader context. A vocabulary record to which another record or multiple records are subordinate in a hierarchy. In thesauri, the relationship indicator for this type of term is BT. Variations on the notation include BTG (broader term generic), BTP (broader term partitive), BTI (broader term instance), BT1 (broader term level 1), BT2 (broader term level 2), etc.

browsing
The process whereby a user of a system or Web site visually scans and maneuvers through navigation lists, results lists, hierarchical displays, or other content in order to make a selection, as contrasted to the user entering a search term in a search box. See also searching.

built work
An instance of architecture, which includes structures or parts of structures that are the result of conscious construction, are of practical use, are relatively stable and permanent, and are of a size and scale appropriate for, but not limited to, habitable buildings. Built works in the context of art information are manifestations of the built environment typically classified as fine art, meaning they are generally considered to have aesthetic value, were designed by an architect (whether or not his or her name is known), and were constructed with skilled labor. See also architecture and movable work.

candidate term
Also known as a provisional term. A term under consideration for admission into a controlled vocabulary because of its potential usefulness. See also contribution.

cataloger
In the context of this book, the person who records information in records for works. See also end user and indexer.

cataloging
In the context of this book, the process of describing and indexing a work or image, particularly in a collections management system or other automated system. Cataloging involves the use of prescribed fields of information and rules (e.g., the rules described in CCO and CDWA).

cataloging rules
See editorial rules.

cataloging tool
A system that focuses on content description and labeling output (e.g., wall labels or slide labels), often part of a more complex collection management system.

chain indexing
Also called chain procedure. A technique for indexing that uses a numeric or alphanumeric classification scheme (for example, the Dewey Decimal Classification system) where the entries have meaning beyond simple numeric sequencing (e.g., in Dewey number 735.942, 735 means sculpture after the year 1400 CE, 9 means geographic area, 4 means Europe, and 2 means England).

child
See narrower term.

classification
In the context of this book, the process of arranging works or other content objects systematically in groups or categories of shared similarity according to established criteria and using terms to identify the classes.

classification notation
In a vocabulary, a numeric, alphabetic, or alphanumeric code in a system of codes used to classify or categorize entries; may be used in a hierarchical arrangement to impose a display or sorting order on the lines or levels in the hierarchy (e.g., V, V.PC, V.PE). See also notation.
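
The chain indexing entry above says that each part of a classification number carries meaning. A minimal sketch of that idea follows, using only the segment meanings given in the entry for Dewey number 735.942. The `SEGMENTS` table and the `chain` function are hypothetical, and the code shows only how the notation builds up link by link, not the full chain procedure an indexer would follow.

```python
# Segment meanings are taken from the glossary's example for 735.942.
# SEGMENTS is a hypothetical stand-in for a real classification schedule.
SEGMENTS = [
    ("735", "sculpture after the year 1400 CE"),
    ("9", "geographic area"),
    ("4", "Europe"),
    ("2", "England"),
]


def chain(segments):
    """Return (notation, meaning) pairs, one per link, broadest first."""
    links = []
    digits = ""
    for code, meaning in segments:
        digits += code
        # Dewey-style notation places a decimal point after the third digit.
        notation = digits if len(digits) <= 3 else digits[:3] + "." + digits[3:]
        links.append((notation, meaning))
    return links


for notation, meaning in chain(SEGMENTS):
    print(f"{notation:8} {meaning}")
```

Reading the links from last to first gives the path from the most specific notation back to the broadest class.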

classified display
See hierarchical display.

clustering
In the context of automated data, usually refers to the process of grouping or classifying items or data through automatic or algorithmic means rather than incorporating human judgment.

code
See computer code.

collection
In the context of cataloging art, refers to multiple works that are physically or conceptually arranged together, including the entire set of objects curated by a given museum or other repository.

collection management system (CMS)
A type of database system that allows an institution to control various aspects of its collections, including description (artist, title, measurements, media, style, subject, etc.) as well as administrative information regarding acquisitions, loans, and conservation information.

complex term
A single phrase denoting more than two distinct concepts, which could be broken out and used independently, as defined by the Library of Congress. See also bound term, compound term, and heading.

component
In the context of cataloging art and architecture, a part of a larger item. A component differs from an item in that an item can stand alone as an independent work, but a component typically cannot or does not stand alone (e.g., a panel of a polyptych or a façade of a basilica). See also group and item.

compound term
A term consisting of two or more words. In the context of this book, mention of compound terms generally refers to bound terms, which are compound terms that represent a single concept (e.g., flying buttresses). See also bound term, complex term, and lexical unit.

computer code
Also called code. The machine-readable form, arrangement of data, and instructions of a computer program that are created when a computer program, which was written by a human programmer, is converted into binary code that can be read by a computer.

computer program
Also called a program. A specific set of instructions for ordered operations that result in the completion of a task by the computer; a computer program consists of computer code. While the program is technically a type of data, computer programs are generally considered as separate from the data to which the programs refer (e.g., data would be the terms, scope notes, etc., in a vocabulary record). A program is interactive if it acts when prompted by an action or information supplied by a user, or batch if it automatically runs at a certain time or under certain conditions and then stops after the task is completed. A program is written in a programming language. See also processing.

computer system
See system.

concept
In the context of the AAT and other thesauri comprising generic terms, the subject of the vocabulary record (i.e., the concept to which the terms refer), including abstract concepts; physical attributes such as shape, pattern, and color; style or period; activities; terms for performers of activities; materials; objects; and visual and verbal communication forms. See also discrete concept.

concept record
See record.

conceptual data model
An abstract model or representation of data for a particular domain, business enterprise, field of study, etc., independent of any specific software or information system; usually expressed in terms of entities and relationships. See also logical data model.

content object
In the context of a database, any entity that contains data. A content object can itself be made up of content objects. For example, a journal is a content object made up of individual journal articles, which are themselves content objects. See also information object.

contribution
In the context of controlled vocabularies, a term or record that is submitted for admission into a thesaurus or other vocabulary by an agency or individual outside the group responsible for maintaining the vocabulary; contributions are typically made by users of the vocabulary. See also candidate term.

controlled field
In the context of this book, a field in a record that is not free text, meaning it is specially formatted and often linked to controlled vocabularies (authorities) or controlled lists to allow for successful retrieval. See also free-text field.

controlled format
Rules applied to the field regarding the types of values that may be included (e.g., a controlled measurement's value field would allow only numbers). Fields may have controlled format in addition to being linked to controlled vocabulary, or the controlled format may exist in the absence of any finite controlled list of valid values.

controlled list
A simple list of terms used to control terminology. In a well-constructed controlled list, terms should be unique, members of the same class, not overlapping in meaning, equal in granularity/specificity, and arranged alphabetically or in another logical order. A type of controlled vocabulary.

controlled vocabulary
An organized arrangement of words and phrases used to index content and/or to retrieve content through browsing or searching. A controlled vocabulary typically includes preferred and variant terms and has a limited scope or describes a specific domain.

co-occurrence mapping
Also called co-occurrence clustering. An automated method of compiling groups of terms that tend to occur together in certain contexts and are therefore presumed to be related in some way; the resulting groups of terms are considered to be loosely related and may be used to automatically broaden a user's search or to suggest alternative search terms to users in order to improve search results. See also automatic indexing.

core fields
Also called core elements. In the context of this book, the set of fields representing the fundamental or most important information required for a minimal record, whether the record is a work record or a vocabulary record. See also required fields.

corporate body
In the context of vocabularies discussed in this book, an organized, identifiable group of individuals working together in a particular place and within a defined period of time, whether or not they are legally incorporated (e.g., architectural firms, artist studios, and art repositories).

criteria
In the context of this book, a specific set of limiting conditions used to create a query or select a subset of entries (e.g., a WHERE clause in SQL). See also variable.

cross-database searching
See federated searching.

cross-reference links
See syndetic structure.

crosswalk
A chart or table (visual or virtual) that represents the semantic or technical mapping of fields or data elements in one database, metadata framework, standard, or schema to fields or data elements that have a

similar function or meaning in one or more other databases, frameworks, standards, or schemas (e.g., the artist element in one standard may map to the creator element in another). See also mapping.

cultural heritage
The total corpus of activities and the artifacts of activities that provide a record of the life of a culture. See also material culture.

cultural works
In the context of this book, art and architectural works and other artifacts of cultural significance, including both physical objects and performance art. In related disciplines, the scope could be broader, also including the performing arts.

data
In common usage in computer science, this term is used as a singular noun to refer to information that exists in a form that may be used by a computer, excluding the program code. In other uses, datum is the singular and data is the plural, referring to facts or numbers in a general sense.

database
A structured set of data held in computer storage, especially one that incorporates software to make it accessible in a variety of ways. A database is used to store, query, and retrieve information. It typically comprises a logical collection of interrelated information that is managed as a unit, stored in machine-readable form, and organized and structured as records that are presented in a standardized format in order to allow rapid search and retrieval by a computer. See also system.

database field
Also called a data field. A placeholder for a set of one or more adjacent characters comprising a unit of information in a database, forming one of the searchable items in that database. It is a portion of a structured record, especially a machine-readable record, containing a particular category of information (e.g., term and scope note would be fields included in a vocabulary record). See also field.

database index
Also called a data index. A particular type of data structure that improves the speed of operations in a table by allowing the quick location of particular records based on key column values. Indexes are essential for good database performance. The concept is distinguished from indexing (human indexing) and automatic indexing.

database normalization
See normalization.

database record
See record.

data content
The organization and formatting of the words or terms that form data values.

data elements
The specific categories or types of information that are collected and aggregated in a database.

data preprocessing
See preprocessing.

data processing
See processing.

data structure
A given organization of data, particularly the data elements, the logical relationships between data elements, and the storage allocations for the data.

data table
Sets of data that are organized in a grid or matrix comprising rows and columns.

data values
In the context of this book, the terms, words, or numbers used to populate fields in a work or vocabulary record. See also data content.

decoordination
In the context of a thesaurus, the splitting of a compound term into its component words to stand as individual terms. This would typically happen if a compound term had been added to the thesaurus but was later determined not to be a bound term.

deep Web
See hidden Web.

derivation
Also called modeling. In the context of this book, the process of building a new vocabulary based on an existing vocabulary. In this approach, an appropriate controlled vocabulary is selected as a model for developing controlled terminology for local use, so that the local terms will be interoperable with the larger original vocabulary. See also local authority and microcontrolled vocabulary.

descendant
Also often spelled descendent in the disciplines of computer science and thesaurus construction. In a hierarchy, any record that is a narrower context for the record at hand, including children, grandchildren, and all other narrower contexts at all lower levels; any node in the succession of child nodes on a path all the way down to the tips (leaves) of the hierarchies. See also ancestor.

descending order
In the context of a string of hierarchical parents, the display of parents from broadest to narrowest (e.g., Columbus (United States, Indiana, Bartholomew county)). See also ascending order.

descriptive data
In the context of cataloging art, data intended to describe and identify a work, as contrasted to information necessary for administrative, technical, or accounting purposes. See also administrative data.

descriptor (D)
In a thesaurus, the term recommended to represent the concept in displays and indexing. Also called the main term, postable term, or preferred term in a monolingual thesaurus. A multilingual thesaurus may have multiple descriptors (one in each language represented) but may possibly have only one preferred term for use as a default in displays. In thesauri, the relationship indicator for this type of term is D.

diacritics
Also called diacritical marks. Signs or accent marks found over, under, or through alphabetic letters in many languages (e.g., the umlaut in German, München), used to indicate emphasis or pronunciation, often to distinguish different sounds or values of the same letter or character without the diacritical mark.

digital asset management system (DAMS)
A type of system for organizing digital media assets, such as digital images or video clips, for storage and retrieval. Digital asset management systems sometimes incorporate a descriptive data cataloging component, but they tend to focus on managing workflow for creating digital assets and for managing asset rights, requests, and permissions.

direct mapping
In the context of interoperability of vocabularies, refers to the matching of terms one-to-one in two controlled vocabularies. While the vocabularies need not be the same size or cover exactly the same content, where overlap exists, there should be the same meaning and level of specificity between the two terms in each controlled vocabulary. See also switching.

direct query
See ad hoc query.

disambiguation
In the context of creating and displaying a vocabulary, the use of qualifiers, headings, or other methods to clarify and remove ambiguity between homographs (e.g., Smith, John (English printmaker, 1654–1742) and Smith, John (English architect, 1781–1852)). See also word sense disambiguation.

discrete concept
In the context of a generic concept vocabulary, a discrete thing or idea as opposed to

a subject heading, which often concatenates multiple terms or concepts together in a string. See also concept.

displayed index
An index that is visible and available to end users for browsing. See also nondisplayed index.

display field
In the context of this book, a field intended for viewing by the end user, typically showing data in natural language that is easily read and understood and that can convey nuance and ambiguity. Display information may, in some cases, be concatenated from controlled fields; in other cases, this information is best recorded in free-text fields. See also indexing.

document
In the context of search and retrieval, the combination of a defined, primarily self-contained, machine-readable text or other information and the format in which it is housed.

dominant language
In the context of multilingual vocabularies, the more prominent or original language to which terms in other languages are mapped and in which other fields in the record (e.g., scope notes or date notes) are written. In a purely multilingual vocabulary, no language is dominant, but in a rich and complex vocabulary (e.g., the AAT), a dominant language may be required for practical purposes.

download
See load.

editorial rules
In the context of this book, written rules and guidelines for creators or editors of vocabulary records that dictate how to populate fields and choose or interpret data. They should include which fields are required, how to choose appropriate values for various fields (e.g., how to choose a preferred term), how to choose hierarchical positions, the format and syntax for each field, authorized sources, etc. Analogous rules for catalogers of works are called cataloging rules.

end user
In the context of this book, usually the searcher, client, or patron who retrieves, views, and uses the data in a vocabulary or work record, as distinguished from the editors or catalogers. In the context of systems design, the term refers to any client for whom a database system is designed and used; from that perspective, it could include the editors or catalogers for whom an editorial or cataloging system has been designed.

end-user thesaurus
A thesaurus designed for direct access by searchers rather than for use by indexers. Instead of controlling the terminology, the purpose of an end-user thesaurus is to help searchers find useful terminology for improving, narrowing, and broadening their queries. See also indexer thesaurus.

entity
In the context of computer science, a self-contained piece of data that can be referenced as a unit. In a more general sense, the term is used in this book to refer to a distinct person, place, or concept in a vocabulary.

entity-relationship model
A type of conceptual data model that represents structured data in terms of entities and relationships. An entity-relationship diagram can be used to visually represent information objects and their relationships. Because the constructs used in the entity-relationship model can easily be transformed into relational tables, this type of model is often used in database design.

entry array
A type of display, often used for headings, in which any two or more entries that have the same broader heading (e.g., Religious art--Ancient Egyptian, Religious art--Christian, Religious art--Hindu, etc.) are grouped together vertically under the broader heading. While this is not a true hierarchical display, it may resemble a hierarchical display through use of indentation.

equivalence relationship
In a thesaurus, the relationship between synonymous terms or names for the same concept, typically distinguishing preferred terms (descriptors) and nonpreferred terms (variants or UFs). See also associative relationship and hierarchical relationship.

equivalent term
A term that is considered an equivalent in search-and-retrieval, including not only true synonyms but possibly also near-synonyms and any other terms that are considered closely enough related to be useful in broadening a query; to narrow a query, exact equivalents could be used instead.

exact equivalence
The relationship between synonyms in one language and terms in different languages that have the same usage and meaning. See also inexact equivalence and nonequivalence.

exact match
Electronic term-matching that produces a result that precisely matches the user's query term and does not implement automatic Boolean operators, truncation, proximity ranges, or stemming. In a strictly applied exact match, normalization is not used, so that differences in punctuation, spacing, and diacritics are maintained in the match. See also best match.

exhaustivity
In the context of cataloging and indexing, the degree of depth and breadth that the cataloger uses in assigning indexing terms or writing a description. Measures of greater exhaustivity include the use of a greater number of optional fields and the assignment of a greater number of indexing terms for each field. See also specificity.

expansion
See query expansion.

explode a hierarchy
To retrieve and display all the descendants of any given node, typically in a graphic display.

extension vocabulary
A thesaurus that is created with the intention of, or is later adapted for, linking to another vocabulary that is larger, broader, or more generic; the extension vocabulary is typically linked through node linking, rather than being integrated at many points in the original vocabulary. See also microcontrolled vocabulary, node linking, and satellite vocabulary.

external node
See leaf node.

facet
Also called a faceted display. A fundamental, homogeneous, and mutually exclusive category of information in a thesaurus (e.g., the AAT has seven facets: Associated Concepts, Physical Attributes, Styles and Periods, Agents, Materials, Activities, and Objects).

facet indicator
A node label that designates a facet.

false hit
Also called a false drop. In search and retrieval, an entry in a list of results that does not comply with the user's intended results.

federated searching
Also called broadcast searching, cross-database searching, metasearching, and parallel searching. Performing queries simultaneously across resources that are in different domains and created by different communities. Federated searching may involve searching across multiple databases, different platforms, and varying protocols, thus requiring the application of interoperability between resources and vocabularies.

field
In the context of this book, an area (often mapping to a metadata element in a metadata element set) in the user interface of a system where a discrete unit of information is displayed or the cataloger can enter information. Note: In this context, field is not necessarily equivalent to a database field.
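
The exact match entry above contrasts a strict comparison with one that applies normalization. The sketch below illustrates that difference under simple assumptions: the `normalize`, `exact_match`, and `normalized_match` functions are hypothetical names, and normalization here means only removing diacritics, punctuation, and spacing, the three differences the entry mentions.

```python
import unicodedata


def normalize(term):
    """Remove diacritics, punctuation, and spacing; keep letters and digits."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", term)
    return "".join(
        ch for ch in decomposed
        if ch.isalnum() and not unicodedata.combining(ch)
    )


def exact_match(query, term):
    """Strict comparison: every character must be identical."""
    return query == term


def normalized_match(query, term):
    """Comparison after both strings have been normalized."""
    return normalize(query) == normalize(term)


print(exact_match("Munchen", "München"))
print(normalized_match("Munchen", "München"))
```

A strict exact match treats Munchen and München as different terms, while the normalized comparison treats them as the same; a real retrieval system may apply further rules, such as case folding, that this sketch leaves out.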

filing rules
A set of guidelines that determine how letters, numbers, spaces, and special characters should be processed when assembling an alphabetical or other listing. See also sorting.

first name
Also called a given name. In Western tradition, the name of a person that identifies that individual, typically unique in the immediate family and used with a last name (e.g., Richard in Richard Meier). See also last name and middle name.

flat-file database
A database with a data model designed around a single table, often a single file containing many records that all have exactly the same fields. It is a simpler model than the more highly structured relational and object-oriented models.

flat format
In the context of a thesaurus, an alphabetical display in which only one level of broader contexts and one level of narrower contexts are displayed for each focus record. See also generic structure.

focus
Also known as a head noun for terms and a trunk name for proper names. In the context of a compound term, the noun component that identifies the class of concepts to which the term as a whole refers (e.g., buttresses in the term flying buttresses). In the context of a modified name such as a place name, the part of the name that is not a modifier (e.g., Etna in Mount Etna). See also modifier.

folksonomy
A neologism referring to an assemblage of concepts, which are represented by terms and names (called tags) that are compiled through social tagging, generally on the Web. A folksonomy differs from a taxonomy in that it is not structured hierarchically, and the authors of the folksonomy are typically the casual users of the content rather than professional indexers following standard protocols and using standardized controlled vocabularies.

format
Used in two senses in this book. In the context of cataloging art, the configuration of a work, including technical formats, or the conventional designation for the dimensions or proportion of a work (e.g., cabinet photograph or IMAX). In the context of computer science, the physical layout of a data storage device or the logical structure or composition of a file.

format control
See controlled format.

free-text field
A field that may contain data entered without any vocabulary control or system-defined structure. It may be used to express ambiguity, uncertainty, and nuance in a note. See also controlled field and text.

generic concept
In the context of this book, a concept in a vocabulary that is described by terms other than proper nouns or names (e.g., the type of artwork, such as amphora, or a material, such as terracotta). Generic concepts do not include proper names of persons, organizations, geographic places, named subjects, or named events.

generic posting
In controlled vocabularies, the use of narrower terms as used for terms for a descriptor that is really a broader term in the same vocabulary record. A generic posting is typically used as a time-saving strategy rather than making separate records for all the terms and linking them hierarchically. See also up-posting.

generic structure
A display format for a thesaurus in which all hierarchical levels are displayed by using indentation, codes, or punctuation marks. See also flat format.

genus/species relationship
Also called a generic relationship. A hierarchical relationship in which all children must be a kind of, type of, or manifestation of the parent. The genus/species relationship is the most common hierarchical relationship in thesauri and taxonomies, because it is applicable to a wide range of topics. See also instance relationship and whole/part relationship.

given name
See first name.

gloss
See qualifier.

grandparent
In a thesaurus, the level immediately above the parent of the focus record (e.g., in the following example, Indiana is the grandparent of Columbus: Columbus, Bartholomew county, Indiana, United States).

granularity
See specificity.

group
Also called an archival group or record group. In the context of cataloging works, refers to an aggregate of items that share a common provenance. See also component and item.

group-level cataloging
Describing and assigning indexing terms for a group of works as a whole, typically focusing on the most important or most frequently occurring characteristics in the items of the group. See also item-level cataloging.

guide term
A node label that is not a facet, but is created as a hierarchical level to provide order and structure to thesauri by grouping narrower terms according to a given logic. Guide terms are not used for indexing and are often enclosed in angled brackets or otherwise distinguished from other terms in displays (e.g., ).

hardware
The physical components of a computer system, including those that are mechanical, electronic, magnetic, and electrical, such as disks, disk drives, chips, electronic circuitry, keyboards, monitors, modems, and printers. See also software.

harmonization
In the context of vocabularies and standards, the process of preventing, minimizing, or eliminating technical and content differences and contradictions between standards or vocabularies that have the same or similar scope or that must work interchangeably or in concert.

heading
Also called a label. A string of words comprising a term combined with other information that serves to modify, disambiguate, amplify, or create a context for the main term in displays. Examples include the listing of qualifiers and/or broader contexts for terms (e.g., rhyta (, containers)), place types and administrative broader contexts for place names (e.g., Dayr al-Bahri (deserted settlement) (Qinā governorate, Egypt)), or biographical information for people’s names (e.g., Francesco Aliunno (Italian calligrapher, active 15th century)). See also authority heading, name authority, and subject heading list.

head noun
See focus.

hidden Web
Also called the deep Web or invisible Web. The sum of the Web pages that are not accessible to Web crawlers or robots, usually because they are either dynamically generated by a user querying a database or are password protected or subscription based.

hierarchical display
Also called a classified display or systematic display. In a thesaurus, a graphic arrangement of terms showing broader/narrower relationships through the use of indentation, codes, or another method.

hierarchical relationship
The broader and narrower (parent/child) relationship between two entities

in a thesaurus, namely whole/part (e.g., Montréal is part of Québec), genus/species (e.g., bronze is a type of metal), or instance relationships (e.g., Montréal is an instance of a city). It is the basic structure that creates a hierarchy.

hierarchy
An organization of records related by levels of superordination and subordination. Each record in the hierarchy, except the root, is a narrower context of the record above it. See also monohierarchy, polyhierarchy, and subfacet.

historical term
Also called a historical name. In the context of the vocabularies discussed in this book, a term or name that was used to refer to a person, place, subject, or concept in the past, but in current usage has been replaced with a different term or name (e.g., historical names for St. Petersburg, Russia, are Leningrad and Petrograd).

hits
See results list.

homograph
A term that is spelled the same as another term, but the meanings of the terms are different (e.g., drums can have at least three meanings: components of columns, membranophones, or walls that support a dome). Homographs exist whether or not the terms are pronounced alike. Terms are generally considered homographs despite differences in capitalization, punctuation, or diacritics. See also qualifier.

homophone
A term that is pronounced like another term but spelled differently (e.g., bows and boughs). Homophones are not typically labeled in traditional controlled vocabularies.

human indexing
See indexing.

hyperlink
Also called a hypertext link. In the context of online information, an embedded link that connects different parts of an online document or data set to other parts of the document or to other documents. It is usually indicated by color or other emphasis applied to a word, phrase, icon, or symbol.

hypertext database
A dataset that resides as a collection of online documents with links joining various parts to each other, with access provided via an interactive browser.

Hypertext Markup Language (HTML)
A markup language used to create the layout and presentation of documents for World Wide Web applications.

image
In the context of cataloging art, a visual representation of a work, usually existing in a photomechanical, photographic, or digital format. In a typical visual resources collection, an image is a slide, photograph, or digital file.

indentation
Also called indention. In the context of printing or other displays of typed words or texts, refers to the white or blank space of a fixed width on a row along the right or left margin of a display, as commonly used to indicate the first line in a new paragraph of text. Graduated indentation is used to indicate relationships between parents and their descendants in hierarchical displays of thesauri.

indexer
A person who assigns indexing terms for a work or image, typically the same person as the cataloger. See also cataloger.

indexer thesaurus
A thesaurus designed to control terminology and guide indexers in the choice of terms. See also end-user thesaurus.

indexing
Also called human indexing and manual indexing. In the context of this book, the process of evaluating information and designating indexing terms by using controlled vocabulary that aids in finding and accessing the cultural work record. Refers to indexing done by human labor, not to the automatic parsing of data into a database index (automatic indexing), which is used by a system to speed up search and retrieval.

inexact equivalence
The relationship between synonyms in one language or terms in different languages that have similar or overlapping meaning and usage but are not true synonyms (e.g., floating and flying). See also exact equivalence, nonequivalence, and partial equivalence.

information object
A digital unit or group of units, regardless of type or format, that a computer can address or manipulate as a single discrete object. See also content object.

information processing
See processing.

information retrieval database
Also called an IR database. Any database designed primarily for discovering and retrieving information. The systems that work with IR databases provide the following: a search interface to permit users to compose queries, methods for searching through the target data, viewable or behind-the-scenes indexes, and results displays.

initialism
A set of initials that stand for the full form of a name (e.g., MFA, for Museum of Fine Arts). See also abbreviation and acronym.

instance relationship
A hierarchical relationship in which all children must be an example of a broader context, most commonly seen in vocabularies where proper names are organized by general categories of things or events (e.g., if the proper names of mountains and rivers are organized under the general categories mountains and rivers). See also genus/species relationship and whole/part relationship.

interactive processing
See processing.

internal node
See nonleaf node.

interoperability
In the context of controlled vocabularies, the ability of two or more vocabularies and their systems or components of their systems to map to each other’s data, with the goal of exchanging information or enhancing discovery.

inverse document frequency (IDF)
An automatic ranking method often used in a formula with term frequency in information retrieval and text mining to estimate how important a term is to a set of data and how useful it will be in retrieval.

inverted form
Also called an inverted index. In the context of a controlled vocabulary, the indexing form of a multiple-word name or term, where the last name or trunk portion of the term is listed first, followed by a comma and the descriptive word (e.g., Wren, Christopher, or buttresses, flying). See also natural order form and permuted index.

invisible Web
See hidden Web.

ISO (International Organization for Standardization)
A worldwide voluntary, nontreaty network of national standards institutes of approximately 160 countries. The standards bodies work in partnership with international organizations, governments, industry, business, and consumer representatives to reach consensus, set standards, and promote their use with the goal of facilitating trade and meeting the broader needs of society.

item
In the context of cataloging art, an individual object or work. See also component and group.

item-level cataloging
Describing and assigning indexing terms for individual items in a collection of works. See also group-level cataloging.
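As a rough illustration of the inverse document frequency entry above, the sketch below uses one common textbook formulation, idf(t) = log(N / df(t)), multiplied by a simple relative term frequency. The glossary does not prescribe a specific formula, so this variant, the function names, and the tiny sample collection are all assumptions made for the example.

```python
import math
from collections import Counter


def inverse_document_frequency(term: str, documents: list[list[str]]) -> float:
    """log(N / df): terms found in few documents receive a higher weight.

    Assumption: a term that appears in no document is given a weight of 0.0.
    """
    doc_freq = sum(1 for doc in documents if term in doc)
    if doc_freq == 0:
        return 0.0
    return math.log(len(documents) / doc_freq)


def tf_idf(term: str, document: list[str], documents: list[list[str]]) -> float:
    """Relative term frequency in one document multiplied by the term's IDF."""
    if not document:
        return 0.0
    term_frequency = Counter(document)[term] / len(document)
    return term_frequency * inverse_document_frequency(term, documents)


# Hypothetical indexed records, each reduced to a list of keywords.
documents = [
    ["bronze", "statue"],
    ["bronze", "bell"],
    ["marble", "statue"],
    ["bronze", "vessel"],
]

# "marble" occurs in one record and "bronze" in three, so "marble" is
# weighted as the more discriminating term for retrieval.
print(inverse_document_frequency("marble", documents))
print(inverse_document_frequency("bronze", documents))
print(tf_idf("marble", documents[2], documents))
```

Production systems usually add smoothing or scaling to this basic form, so the numbers here are only meant to show the direction of the effect.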

jargon
A characteristic terminology of a particular group or discipline that is typically not understood by a more general audience.

keyword
In the context of vocabularies, a verbal unit or word of a term that may be used in a search expression (e.g., for the place name Sena Julia, Sena is one keyword and Julia is another). In the broader context of online retrieval, any significant word or phrase in the title, subject headings, or text associated with an information object.

Keyword in Context (KWIC)
A type of automatic indexing in which each word in a text, title, subject heading, string of words, or term becomes an entry word in the index, with the exception of words in stop lists. Variations on KWICs are KWOCs (Keyword Out of Context) and KWACs (Keyword Alongside Context).

keyword index
An index based on individual words (keywords) found in a vocabulary term, text, or other content object.

label
See heading.

language model
A type of automatic indexing based on term weighting and relevance prediction that attempts to predict probable query search terms based on term frequencies within documents and the inverse document frequency of terms across the target data. It is similar to the probabilistic model.

last name
Also called a surname. In Western tradition, the family name used with a first name to identify a person (e.g., Meier in Richard Meier). See also first name and middle name.

latent semantic indexing (LSI)
A form of automatic indexing based on the co-occurrence clustering of terms in combination with content that is associated with these clusters; it attempts to partially address the problem of the variety of terms that can be used to express similar concepts.

Latin 1
A character set (consisting of 191 characters) that is part of a series of ASCII-based character encodings defined in ISO/IEC 8859-1:1998: 8-Bit Single-Byte Coded Graphic Character Sets—Part 1.

latinization
See romanization.

lead-in term
See used for term.

leaf linking
See node linking.

leaf node
Also called an external node. In a thesaurus, a node that has no children, as with the ends or tips of hierarchical trees.

lexeme
A fundamental unit of the words of a language, around which may be clustered a set of words that are different forms of the same word (e.g., paint is the lexeme for paints, painted).

lexical unit
Also called a lexical item. One or more words that refer to a single concept (e.g., flying buttresses or bills of sale). See also bound term and compound term.

lexical variant
A term that is a different word form for another term, caused by spelling differences, grammatical variation, or abbreviations (e.g., watercolor and water-colour). Lexical variants are considered as and grouped with synonyms in a vocabulary record, but they technically differ from synonyms in that synonyms are different terms for the same concept. See also .

link
In the context of this book, any relationship between two vocabulary records, two works, a work and image, or a work or image and an authority. Compare to hyperlink.

literary warrant
Justification for the inclusion of a term in a vocabulary based on published evidence that is sufficient to prove that the form, spelling, usage, and meaning of the term are widely agreed upon in authoritative sources. See also organizational warrant, source, and user warrant.

load
The process of moving or transferring files or software from one disk, computer, or server to another disk, computer, or server. To upload means to transfer from a local computer to a remote computer; to download means to transfer from a remote computer to a local one.

loan word
In the context of a given language, a word that is taken directly from another language (e.g., sotto in su, an Italian phrase used in English to mean painted in correct perspective as if viewed from below).

local authority
An authority developed for local use. Although often compiled from one or more standard authoritative published vocabularies, a local authority enforces preferences and usage pertinent for the local setting. See also authority file and derivation.

locator
In a bibliographic index, the part of an index entry that indicates the location of the book, page, or other resource. In an online index, it may be a hyperlink to the source.

logical data model
A data model that includes all entities and the relationships among them based on the structures identified in a conceptual data model, and that specifies all attributes for each entity. The data is described in as much detail as possible, without regard to how it will be implemented in a specific database. See also conceptual data model.

logical record
See record.

main term
See descriptor.

manual indexing
See indexing.

mapping
A set of correspondences between terms, fields, or element names used for translating data from one standard or vocabulary into another, or as a means of combining terms or data for search and retrieval. See also crosswalk.

markup language
A formal way of annotating a document or collection of digital data using embedded encoding tags to indicate the structure of the document or data file and the contents of its data elements. This markup also provides a computer with information about how to process and display marked-up documents. HTML, XML, and SGML are examples of standardized markup languages.

material culture
A term referring to art together with the broad realm of physical objects and edifices produced by a culture. See also cultural heritage.

metadata
A structured set of descriptive elements used to describe a definable entity. This data may include one or more pieces of information, which can exist as separate physical forms. In the context of art information, metadata includes data associated with information about the creation, physical characteristics, history, location, administration, or preservation of the work.

Metaphone
A phonetic algorithm for matching terms and names by sound, as pronounced in English, by translating words into a standard code or representation. It was developed by Lawrence Philips to address the perceived deficiencies in the Soundex algorithm. Metaphone and its later improvements are available as built-in operators in a number of systems. See also Soundex.
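The Keyword in Context (KWIC) entry earlier in this part of the glossary can be made concrete with a minimal sketch. The function name, the stop list, and the sample titles below are hypothetical; the sketch only shows the core idea that every word outside the stop list becomes an entry word, listed alphabetically with the words that surround it.

```python
def kwic_index(titles: list[str], stop_words: set[str]) -> list[tuple[str, str, str, str]]:
    """Return (sort key, left context, keyword, right context) tuples.

    Assumption: words are separated by whitespace and compared in lowercase;
    a real KWIC routine would also deal with punctuation and diacritics.
    """
    entries = []
    for title in titles:
        words = title.split()
        for position, word in enumerate(words):
            key = word.lower()
            if key in stop_words:
                continue  # words on the stop list never become entry words
            left = " ".join(words[:position])
            right = " ".join(words[position + 1:])
            entries.append((key, left, word, right))
    # Alphabetical order by entry word; ties keep their original order.
    entries.sort(key=lambda entry: entry[0])
    return entries


# Hypothetical titles and stop list for illustration only.
titles = ["Bills of Sale", "Flying Buttresses of Chartres"]
stop_words = {"of", "the", "a", "an"}

for _key, left, keyword, right in kwic_index(titles, stop_words):
    # Right-align the left context so the keywords line up in one column.
    print(f"{left:>25}  {keyword.upper()}  {right}")
```

Keeping the left and right context as separate fields makes it easy to print the keyword in a fixed column, which is the traditional KWIC layout.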

metasearching
See federated searching.

microcontrolled vocabulary
Also called a microthesaurus. A controlled vocabulary that is limited in the range of topics covered but fits within the domain of a larger, broader, or more generic controlled vocabulary. It typically contains highly specialized terms that are not necessarily in the broader controlled vocabulary but that map to the hierarchical structure of the broader controlled vocabulary. See also derivation, extension vocabulary, and satellite vocabulary.

middle name
In Western tradition, any name for a person placed before the last name (surname) but after the first name (e.g., Alan in Richard Alan Meier). See also first name and last name.

minimal description
In the context of cataloging art, a record containing the minimum amount of information in the minimum number of fields or metadata elements.

modeling
See derivation.

modifier
In a compound term or name, the adjectival component that modifies the noun (e.g., flying in flying buttresses; Mount in Mount Etna). See also focus.

monohierarchy
A hierarchy in which each child has only one immediate parent. Distinguished from a polyhierarchy.

monolingual
Expressed in a single language, as distinguished from multilingual. In a monolingual thesaurus, the terms and names are expressed in only one language.

movable work
In the context of cataloging art, any tangible object capable of being moved or conveyed from one place to another, as opposed to real estate or other buildings. Distinguished from built work.

multilingual
Expressed in more than one language, as distinguished from monolingual. In a multilingual thesaurus, terms and other information may be expressed in more than one language.

name authority
An authority containing proper names, most often personal names. See also subject heading list.

narrower term (NT)
Also called narrower context or child. A record to which another record or multiple records are superordinate in a hierarchy (e.g., Brewster chair is a narrower term to armchair). In thesauri, the relationship indicator for this type of term is NT. Variations on the notation include NTG (narrower term generic), NTP (narrower term partitive), NTI (narrower term instance), NT1 (narrower term level 1), NT2 (narrower term level 2), etc.

narrow results
To adjust criteria in a search in order to retrieve a smaller number of more precise results that better match the intention of the searcher. See also broaden results.

natural language
Spoken or written texts, as distinguished from fielded data and controlled vocabulary.

natural order form
In the context of a controlled vocabulary, the form of a multiple-word name or term, where the name or term appears in the form that would be used in speech or a written text (e.g., Christopher Wren or flying buttresses), rather than inverted (as may be appropriate for an index). See also inverted form.

navigation
In the context of search and retrieval, the facility that allows users to move through a controlled vocabulary or other content object by using preestablished links or relationships.

near synonymy
Also called quasi-synonymy. The characteristic of a term with meaning that is regarded as different from another term, but both the terms are treated as equivalents for the purposes of broadening retrieval. See also synonym and true synonymy.

neologism
A term that has been newly invented, or an existing term to which a new meaning is applied, often arising in the professional literature of a discipline.

nickname
A familiar, affectionate, derogatory, or humorous name that is used to refer to a person, place, or corporate body as a replacement for, or in addition to, the real or official name (e.g., Masaccio, meaning “big Tom,” is a nickname for the painter Tommaso Guidi). (In the case of Masaccio, in the ULAN it is the preferred name based on literary warrant.) See also pseudonym.

NISO (National Information Standards Organization)
A nonprofit association that is accredited by the American National Standards Institute (ANSI) and identifies, develops, maintains, and publishes technical standards to manage information.

node
In the context of a thesaurus, any point or record in the hierarchy that is a location at which a branch or individual record (leaf) is attached; thus, the basic conceptual unit used to build hierarchies.

node label
A word or phrase inserted into a hierarchy to indicate the logical classification of the terms beneath it. See also facet indicator and guide term.

node linking
Also called leaf linking. In the context of combining multiple vocabularies, a method that uses various nodes in the hierarchical structure of a source controlled vocabulary to link to more detailed controlled vocabularies that are applicable to a single node of the parent hierarchy. The vocabulary linked to a broader vocabulary in this way is often called an extension vocabulary.

nondisplayed index
A machine-readable index that is not displayed for browsing or other direct access of end users, but is used behind the scenes to improve accuracy or speed in search and retrieval. Such indexes may be created beforehand or on the fly at the time of the query. See also displayed index.

nonequivalence
In mapping one vocabulary to another, the situation where there is no exact match, no term in the second language has partial or inexact equivalence, and there is no combination of descriptors in the second language that would approximate a match. See also exact equivalence and inexact equivalence.

nonleaf node
Also called an internal node. In a hierarchy, a node that links to one or more narrower contexts. See also leaf node.

nonpreferred parent
In a polyhierarchical thesaurus, any parent that is not flagged as preferred for use as a default in displays. See also preferred parent.

nonpreferred term
Also called a nonpreferred name. Any term in a vocabulary record that is not the preferred term, which is the term flagged as preferred for use as default in displays.

normalization
In the context of vocabulary retrieval, normalizing terms through a process of converting a term to its simplest form by removing case sensitivity, spaces, punctuation, and diacritics. It differs from database normalization, which is the process of reducing a complex data structure into its simplest structure, a technique used to eliminate data redundancy by

converting Unicode text into a standardized form, among other things.

notation
For a thesaurus, the alphabetic code used to express term types (D, AD, UF), associative relationship (RT), hierarchical relationships (BT, NT, BTG, NTG, BTP, NTP, BTI, NTI, BT1, BT2, NT1, NT2), and scope notes (SN), among others. See also classification notation.

object
See work.

object-oriented database
A data model where the universe is divided into a framework of classes, with each class containing instances or members (called “objects”). Classes can contain subclasses, members of which inherit the properties of the parent or “superclass.” Rules and algorithms for processing the data are integrated with the data.

online catalog
In the context of art information, a type of system used by end users to search for and view data and images.

ontology
A formal, machine-readable specification of a conceptual model, in which concepts, properties, relationships, functions, constraints, and axioms are all explicitly defined. While an ontology is not technically a controlled vocabulary, it uses one or more controlled vocabularies for a defined domain and expresses the vocabulary in a representative language that has a grammar for using vocabulary terms in an automated way to express something meaningful.

operating system
Also called an operating system program. A software program that runs a computer, as distinguished from an application program, which is designed to accomplish a task for an end user (e.g., word processing).

operational specificity
Also called postings specificity. An automated method that attempts to predict the specificity of terms in a domain based on the number of postings or links to that term in a content object (e.g., a term that is linked to very few content objects is predicted to be highly specific).

organizational warrant
Justification for the inclusion of a term in a vocabulary based on the specialized requirements or jargon of the group or organization that is creating or sponsoring the vocabulary. See also literary warrant and user warrant.

orphan term
In a thesaurus, a record that has no associative or hierarchical relationship to any other term in the thesaurus.

orthography
Correct or proper spelling and form of a word or words, including capitalization, diacritics, and punctuation, based on standard usage or convention.

paradigmatic relationship
Also called a semantic relationship. A relationship between terms or concepts that is permanent and based on a known definition.

parallel searching
See federated searching.

parent
See broader term (BT).

parenthetical qualifier
A qualifier placed in parentheses for display.

parent string
The display of hierarchical parents in a horizontal string, as distinguished from vertical indented displays or displays using notation.

parsing
In processing data, a process where data is broken or filtered into smaller, more distinct units.

partial equivalence
The relationship between terms in two vocabularies where one term has a broader scope but is partially synonymous with the other term. See also exact equivalence and inexact equivalence.

partitive relationship
See whole/part relationship.

patronymic
Also called a patronym. A word or words used with a given name to identify a person; common in early Western personal names when last names were uncommon (e.g., Bartolo di Fredi means “Bartolo, son of Fredi”); may also refer to a surname derived from a paternal ancestor (e.g., Robinson means “son of Robin”).

permuted index
A type of index where individual words of a term are rotated to bring each word of the term into alphabetical order in the term list. See also inverted form.

phonetic matching
A process by which terms are matched to other terms that are presumed to sound like the original term, in an attempt to compensate for users’ misspellings or general variation in spelling of names or terms (e.g., Meier and Meyer are pronounced alike). Phonetic algorithms, such as Soundex, Metaphone, and others, are used for indexing words by their pronunciation.

physical feature
In the context of geographic information, a characteristic of the earth’s surface that has been shaped by natural forces, including continents, mountains, forests, rivers, and oceans. See also administrative entity.

pick list
A user interface feature that allows the user to select from a preset list of terms and is typically used to control vocabulary for indexing or to provide options in a query. A pick list is generally populated with a controlled list.

polyhierarchy
A thesaurus in which any record may be linked to multiple parent records. See also hierarchy.

polyseme
A word or lexical unit (e.g., a compound term) with multiple meanings; known as a homograph in written language and a homophone in spoken language.

postable term
See descriptor.

postcoordination
The process of combining two or more terms at the time of retrieval rather than at the indexing stage; usually uses the Boolean operators AND, OR, or NOT (Baroque AND cathedral) in formulating a query. See also precoordination.

posting
In the context of indexing, any instance of a given indexing term having been assigned to records, documents, or other content objects. Formulas used for predicting the usefulness of terms or methods of retrieval may count the number of postings relative to the target content objects or use the numbers of postings in other statistics.

postings specificity
See operational specificity.

precision
A measure of a search system’s effectiveness in terms of retrieving only relevant results; expressed as the ratio of relevant records or documents retrieved from a database to the total number retrieved in response to the query. A high-precision search means that most of the results retrieved will be relevant; however, a high-precision search will not necessarily retrieve all relevant results. Recall and precision tend to vary inversely (when one goes up, the other usually goes down). See also recall.

precoordination
The formulation of a compound term or multiword heading at the time of indexing, rather than at the time of retrieval. An example of a precoordinated term is Baroque cathedrals; an example of a precoordinated heading is United States—History—Civil War, 1861–1865. See also postcoordination.
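The ratio in the precision entry above, and its usual trade-off with recall, can be shown with a small worked sketch. Precision follows the definition given in the entry; recall is computed with its standard definition (relevant records retrieved divided by all relevant records in the database), which is stated here as an assumption because the recall entry falls outside this part of the glossary. The record identifiers are hypothetical.

```python
def precision(retrieved: set[int], relevant: set[int]) -> float:
    """Relevant records retrieved divided by the total number retrieved."""
    if not retrieved:
        return 0.0
    return len(retrieved & relevant) / len(retrieved)


def recall(retrieved: set[int], relevant: set[int]) -> float:
    """Relevant records retrieved divided by all relevant records (standard definition)."""
    if not relevant:
        return 0.0
    return len(retrieved & relevant) / len(relevant)


# Hypothetical record identifiers: four records are actually relevant.
relevant = {1, 2, 3, 4}

narrow_search = {1, 2}                    # few results, all of them relevant
broad_search = {1, 2, 3, 4, 5, 6, 7, 8}   # every relevant record plus noise

print(precision(narrow_search, relevant), recall(narrow_search, relevant))  # 1.0 0.5
print(precision(broad_search, relevant), recall(broad_search, relevant))    # 0.5 1.0
```

The narrow search illustrates the point made in the entry: a high-precision search returns mostly relevant results but can still miss half of what is relevant.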

predefined report
A report for which the query and the output have been written and made available for repeated use by users; users may be allowed to enter variables that are plugged into the report. See also ad hoc query.

preferred flag
A designation indicating that a term or other data instance is preferred over others of the same type in a record. In addition to a preferred term for the record overall, there may be a preferred indexing name flag for the inverted order version of the term, a preferred display name for the natural order form of the name, a preferred role or preferred place type flagged among a list of roles or place types, and so on.

preferred parent
In a polyhierarchical thesaurus, the broader context that is chosen as conceptually preferred or to serve as the default in hierarchical displays. See also nonpreferred parent.

preferred term
Also called a preferred name. The term designated among all synonyms or lexical variants for a concept to be used as the default term to represent the concept in displays and other situations. In a monolingual thesaurus, the preferred term is also the only descriptor in the record. In a multilingual thesaurus, there may be a descriptor for every language, but there is often only one preferred term for the record as a whole. See also descriptor.

preprocessing
Also called data preprocessing. Preliminary processing or transformation of data in order to facilitate further processing, parsing, etc.

probabilistic model
An automatic relevance and weighting method in which terms in a text or other content object are modeled as random variables so that term frequency and distribution are used to predict the probability of relevance. See also language model.

procedure
Also called a subprogram or subroutine. A relatively independent portion of computer code within a larger computer program that performs a specific task in a series of steps.

processing
Also called data processing or information processing. The manipulation or transformation of data through a series of operations. In batch processing, the operations are grouped together in batches and performed automatically; in interactive processing, the operations are prompted by input from a human programmer or user. See also computer program.

program
See computer program.

programming language
A formal language defined by syntactic and semantic rules and used to write instructions that can be translated into machine language and then executed by a computer (e.g., SQL, C++, C#, Java, Perl).

provisional term
See candidate term.

pseudonym
A false or fictitious name, especially one assumed by an artist, author, or other person to maintain anonymity or to designate an identity for a particular activity, among other reasons (e.g., Le Corbusier is a pseudonym assumed by the architect Charles Édouard Jeanneret). See also nickname.

punctuation
In the context of vocabulary terms, the marks from standard written communication used to clarify, organize, or indicate how a word or words should be read (e.g., hyphen, comma, period, quotation marks, parentheses).

qualifier
A word or phrase used to distinguish a term in a vocabulary from otherwise identical terms that have different meanings. A qualifier is separated from the term, usually by parentheses. It is also called a gloss; although, strictly speaking, a qualifier should be used only with homographs, and a gloss has a more general meaning in the field of linguistics. See also homograph.

quasi-synonymy
See near synonymy.

query
Also called a search. In the context of retrieval, a command to look in a database and find records or other information that meet a specified set of criteria (e.g., select subject_id from term where normalized_term like 'A%' and historic_flag = 'H';). The most precise queries are those that return the fewest false hits.

query expansion (QE)
Reformulating a query in order to return a broader or more comprehensive set of results (e.g., adding synonyms to the user’s search term).

recall
A measure of a search system’s effectiveness in terms of retrieving all results that are possibly relevant, expressed as the ratio of the number of relevant records or documents retrieved over all the relevant records or documents. A high recall search retrieves a comprehensive set of relevant results; however, it also increases the likelihood that marginally relevant content objects will be retrieved. Recall and precision are inverse ratios. See also precision.

reciprocity
In reference to vocabulary records, the characteristic of a two-way relationship in which both entities have mutual dependence, action, or influence on each other. Semantic relationships in controlled vocabularies must be reciprocal, meaning each relationship from one record to another must also be represented by a reciprocal relationship in the other direction. Reciprocal relationships may be symmetric (e.g., RT/RT) or asymmetric (e.g., BT/NT).

record
Also called a logical record. In the context of this book, a conceptual arrangement of fields referring to a vocabulary concept or a work. This is different from a database record, which is one row in a database table or another set of related, contiguous data. See also concept record.

record group
See group.

related term (RT)
A concept that is associatively (not hierarchically) linked to another concept in a thesaurus. In thesauri, the relationship indicator for this type of term is RT. See also associative relationship.

relational table database
Also called a relational database. A database in which data is organized into columns and rows according to specific defined relationships (e.g., in a vocabulary database, a table of terms may be linked to a table for languages).

relationship
In the context of this book, a link between two types of data, records, files, or any two entities of the same or different types in a system or network. See also link.

relationship indicator
A word, code, or other device used in thesauri to identify the semantic relationship between terms (e.g., UF), other fields (e.g., SN), or records (e.g., BT).

relevance
The extent to which information retrieved in a search is judged by the user to meet the criteria of the query.

relevance ranking
Ranking and sorting of query results, typically estimated by an algorithm that calculates the number and weight of occurrences of the search term in the targeted data.

report
An organized set of data presented in a format suitable for viewing or printing,

typically produced by a preestablished query that may or may not have variables that are manipulated by the user.

repository
In the context of art and related disciplines, refers to an institution, agency, or individual that has physical or administrative responsibility for an art object, work of architecture, or cultural object.

required fields
Fields or data elements that are required to meet a standard or the requirements of a system’s operations. See also core fields.

reserved characters
Letters, numbers, or symbols that have special uses or meanings in a programming language.

results list
The records or other data retrieved in response to a query and presented online or in a system in an organized display.

retrieval
In the context of this book, the activity of using a search or other method to find records or other data in a database. See also query.

romanization
Also called latinization. The conversion of a character or word expressed in a non-Roman alphabet or writing system (e.g., Cyrillic or Korean) into the Roman alphabet by means of transcription, transliteration, or a combination of the two methods.

root
Also called root node or top term. The highest level of the hierarchy, from which all branches descend.

rotated listing
See permuted index.

satellite vocabulary
A thesaurus that is created with the intention of, or is later adapted for, linking to another vocabulary that is larger, broader, or more generic; it may be integrated at many points in the original vocabulary. See also extension vocabulary, microcontrolled vocabulary, and node linking.

schema
Also called a scheme. In the context of this book, the organization, structure, and rules for a set of data (e.g., the set of tables, views, indexes, and descriptions for columns in a database, or the organization and description of an XML document).

scope note (SN)
A note explaining the coverage, specialized usage, and meaning of terms. In thesauri, the relationship indicator for this note is SN.

search
See query.

searching or querying
Operations or algorithms intended to determine if one or more data items meet defined criteria or possess a specified property.

see also reference
A type of cross-reference, usually in a printed index, directing the reader to a related term or entry. A see also reference differs from a see reference in that the see also reference is not made between synonyms, but between terms or headings that are more peripherally related.

see reference
A type of cross-reference, usually in a printed index, directing the reader from a nonpreferred term or subject heading to the preferred term or subject heading for the same concept. The term or subject heading at the see reference is a synonym for the preferred term or heading.

semantic linking
A method of linking terms in a vocabulary or larger database according to the meaning of the terms and relationships between terms.

semantic relationship
See paradigmatic relationship.

SGML (Standard Generalized Markup Language)
International Organization for Standardization standard ISO 8879:1986; a markup language first used by the publishing industry for defining, specifying, and creating digital documents that can be delivered, displayed, linked, and manipulated in a system-independent manner. XML and HTML are derived from SGML.

sibling
A concept that shares the same immediate broader context (one level higher) as other concepts. Siblings are subordinate to the same broader concept and are at the same hierarchical level.

single-to-multiple term equivalence
In the context of mapping terms from different vocabularies to each other, the situation that occurs when a term in one vocabulary has no direct match in the second vocabulary, but instead must be mapped to a combination of terms.

social tagging
The decentralized practice and method by which individuals and groups create, manage, and share tags (terms, names, etc.) to annotate and categorize digital resources in an online “social” environment. See also folksonomy.

software
The components of a computer system that are not physical, including programs, procedures, algorithms, and documentation pertaining to the operation of a system and the performance of specific tasks, such as word processing, Web browsers, photo editing, and art cataloging or vocabulary editing. See also hardware.

sorting
In the context of this book, the automated process of organizing a results list, data elements in a record, or other data in a particular sequence based on established criteria or attributes of the data, for example, alphabetically, by parent string, or by an associated date. There may be primary sort criteria and secondary sort criteria (e.g., an algorithm can be formulated to first sort place names in a results list alphabetically, and then, for homographs in the list, to sort by the parent string). See also filing rules.

Soundex
A phonetic algorithm for matching terms and names by sound, as pronounced in English, by translating words into a standard code or representation. It was developed by Robert Russell and Margaret Odell and patented in 1918 and 1922. The National Archives and Records Administration (NARA) maintains the current rule set for the official implementation of Soundex used by the U.S. Government. See also Metaphone.

source
In the context of building vocabularies, a citable reference to a term in the literature that helps establish its form, spelling, usage, and meaning. See also literary warrant.

source authority
In the context of this book, a bibliographic authority file used to control the citations providing warrant for terms in a vocabulary or information in a work record.

source language
In the context of translating or mapping one vocabulary to a vocabulary in another language, the language of the original vocabulary. See also target language.

specialized vocabulary
See microcontrolled vocabulary.

specifications
In the context of designing an information system, the formal, detailed description of user and technical requirements, including specific descriptions of procedures, functions, screens, reports, materials, other features, and hardware. See also user requirements.

specificity
In the context of indexing, the degree of precision or granularity used in assigning terms. Measures of greater specificity include the use of the narrowest applicable indexing term rather than a broader, more generic term. See also exhaustivity.
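The Soundex entry describes translating a name into a standard code so that similar-sounding names match. The following Python sketch shows one commonly published form of the American Soundex rules (first letter kept, consonants mapped to digits, repeated codes collapsed, result padded to four characters). It is a simplified illustration, not the official NARA rule set; the function name soundex and the lookup table CODES are assumptions made for this example.

```python
# Consonant groups and their digit codes, as commonly published for
# American Soundex. Vowels, h, w, and y carry no code.
CODES = {}
for digit, letters in enumerate(
    ["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"], start=1
):
    for ch in letters:
        CODES[ch] = str(digit)


def soundex(name):
    """Return a four-character Soundex-style code (illustrative sketch)."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()        # the first letter is kept as-is
    prev = CODES.get(letters[0], "")   # its code still suppresses a repeat
    for ch in letters[1:]:
        if ch in "hw":
            continue                   # h and w do not separate equal codes
        code = CODES.get(ch, "")       # vowels (and uncoded letters) give ""
        if code and code != prev:
            result += code
        prev = code                    # a vowel resets the run of equal codes
    return (result + "000")[:4]        # pad or truncate to four characters


# The two spellings from the phonetic matching entry receive the same code.
print(soundex("Meier"), soundex("Meyer"))
```

Because both spellings reduce to the same code, a search on one form could retrieve records indexed under the other, which is the behavior the phonetic matching entry describes.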

SQL (Structured Query Language)
A standard command language used with relational databases to perform queries and other tasks.

standard
A vocabulary, set of rules, code of practice, or description of characteristics and parameters that is documented, established by experts, or approved by an authoritative body and widely recognized or employed as an authoritative exemplar of correctness or best practice; used within a discipline or domain in order to promote interoperability and efficiency.

statistical specificity
See operational specificity.

stemming
In the context of mapping terms for search and retrieval, the alteration of a term by automatically truncating or removing common suffixes, word endings, or prefixes in order to find a match, usually applied to sets of related words that are derived from a common root and appear in a variety of grammatical forms (e.g., paint, painting, painted).

stop list
In the context of search and retrieval, words in a vocabulary or target data that are ignored in searching or matching because they occur too frequently or are otherwise of little value in retrieval for a given domain. Common stop lists for a text contain articles, conjunctions, and prepositions, although these words are typically not included in a stop list for a vocabulary.

string syntax
Also called string indexing. The creation of headings by computer algorithm, characterized by headings that are more consistent than the typically idiosyncratic headings created by hand (e.g., the automatic concatenation of a parent string in a heading for a geographic place, such as San Gimignano (Siena province, Tuscany, Italy)).

structure
See data structure.

subfacet
A major conceptual division of a thesaurus that is located near the top of the tree but under a facet. Also called a hierarchy in the AAT, although hierarchy has a more general meaning as well.

subject
In the context of this book, the focus concept of a vocabulary record (e.g., the subject of a ULAN record is a person). Also used to refer to the subject matter (often iconographical content) of what is depicted in or by a work of art or the content of a text.

subject heading list
An alphabetical list of words or phrases used to indicate the content of a text or other thing; characterized by precoordination of terminology, meaning that several unique concepts are combined in a string (e.g., Archaeology and art—China—History—20th century). A type of controlled vocabulary. See also authority heading and heading.

subject indexing
A term typically used in the context of bibliographic cataloging but also applicable to cataloging art; refers to the application of indexing terms to the content of the document, as contrasted to a description of its physical characteristics.

subprogram
See procedure.

subroutine
See procedure.

surface Web
See visible Web.

surname
See last name.

switching
In the context of mapping one vocabulary to another, refers to the use of a third vocabulary (a switching vocabulary) that itself can link to terms in each of the two original controlled vocabularies; useful when the original two vocabularies do not map well directly to each other. See also direct mapping.

symmetric relationship
In the context of a thesaurus, a reciprocal relationship that is the same in both directions (e.g., RT/RT). See also asymmetric relationship and reciprocity.

syndetic structure
Also called cross-reference links. In the context of a vocabulary, refers to the linking of equivalent, broader, narrower, and other related terms so that they can be used as cross-references to each other and to related headings for the purpose of access.

synonym
A term having a different form but exactly or very nearly the same meaning as another term. See also near synonymy and true synonymy. Compare lexical variant.

synonym ring list
A type of controlled vocabulary containing terms that are considered equivalent for the purposes of retrieval but do not necessarily have true synonymy.

synonymy
A type of semantic relation in which two words or terms have the same or very similar meaning. See also near synonymy and true synonymy.

syntax
In the context of this book, the structure of elements in a compound term or name (e.g., last name first, comma, first name, middle initial) or heading; also used to refer to the structure of elements in a search query (e.g., rules for the placement of the Boolean operators OR, AND, or NOT between terms); and analogous to the linguistic structure of elements in a sentence.

synthesis note
A brief preliminary finding, example, or recommendation. This expression was used in the original print publication of the AAT to refer to bottom-of-page notes throughout each subfacet (or hierarchy) that suggested ways in which descriptors from that subfacet could be combined in postcoordination with other descriptors (these recommendations are now found in the AAT Editorial Manual).

system
Also called a computer system. A number of interrelated hardware and software components that work together to store and convert data into information by using electronic processing. In the context of this book, a system for building and maintaining vocabularies, cataloging art, or performing search and retrieval. See also database.

systematic display
See hierarchical display.

table
See data table.

target language
In the context of translating or mapping one vocabulary to a vocabulary in another language, the language into which the original vocabulary is being translated. See also source language.

taxonomy
A classification organized into a hierarchical structure and applicable to a defined domain. Often used to refer to the classification of living organisms according to physical characteristics, but the term and principles can be applied to classification in any discipline. Unlike thesauri, taxonomies do not typically include synonyms and associative relationships. See also folksonomy.

term
A word or group of words representing a single concept; a vocabulary record comprises terms and other information, including relationships, scope notes, sources, etc. Additionally, in the jargon of thesaurus construction, the word term is often used as shorthand to refer to the concept that is represented by that term (e.g., BT and NT actually refer to the relationships between concepts). The distinction between a term in the strict sense and term meaning a record must often be inferred from the context of the discussion.
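The stemming and stop list entries describe two steps that are often applied together before terms are matched. The Python sketch below is a deliberately naive illustration of both; the suffix list, the stop words, and the function names naive_stem and index_terms are assumptions made for this example, and a production system would normally rely on an established stemming algorithm rather than this simple suffix stripping.

```python
# Hypothetical suffixes and stop words, chosen only for this illustration.
SUFFIXES = ("ing", "ed", "s")
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "or"}


def naive_stem(word):
    """Strip one common suffix, keeping at least three characters of stem."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def index_terms(text):
    """Drop stop words, then stem the words that remain."""
    return [naive_stem(t) for t in text.split() if t.lower() not in STOP_WORDS]


# The grammatical forms from the stemming entry reduce to one stem.
print([naive_stem(w) for w in ("paint", "painting", "painted")])
print(index_terms("The painting and the painted panels"))
```

Reducing the three forms to a single stem means that a search on any one of them can match records containing the others, while the stop words contribute nothing to the match.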

term frequency (TF)
An automatic ranking method often used in a formula with inverse document frequency in information retrieval and text mining to measure how important a term is to a set of data and how useful it will be in retrieval.

term record
In the jargon of thesaurus construction, the collection of information associated with a descriptor, including the history of the term, its relationships to other terms and records, etc. In this book, it is referred to as a record (or a concept record) in order to distinguish it from the information that is actually associated only with the term table in a relational database model (e.g., language of the term, contributor of the term).

text
In the context of this book, data that is not vocabulary controlled and generally unstructured beyond the common structure of standard language expressions of characters, words, sentences, or paragraphs. See also free-text field.

thesaurus
A controlled vocabulary arranged in a specific order and characterized by three relationships: equivalence, hierarchical, and associative. Thesauri may be monolingual or multilingual. Their purposes are to promote consistency in the indexing of content and to facilitate searching and browsing.

top term (TT)
See root. In thesauri, the relationship indicator for this type of term is TT.

transcription
In the context of cataloging art, the process of recording a term or text word-for-word and letter-for-letter, including accurately copying capitalization, punctuation, spacing, line breaks, illegible passages, and all other possible aspects of the original (e.g., to accurately express the nuances of an artist’s signature or an ancient architectural inscription). Transcriptions in this context are typically semidiplomatic or seminormalized transcriptions, meaning both substantive and accidental features of the original are retained, but abbreviations are spelled out using brackets or other punctuation to distinguish the original from the editorial content.

translation
The process of changing a term or text from one language into another by interpreting the meaning of the original (source) term and expressing it as an equivalent in the second (target) term (e.g., copper mines in English is translated as mines de cuivre in French).

transliteration
The process of rendering the letters or characters of one alphabet or writing system into the corresponding letters or characters of another alphabet or writing system, generally based on phonetic equivalencies. While a common noun will often be translated, a proper name in a non-Roman alphabet is more often transliterated. There are often multiple standards for transliterating from one writing system to another, thus producing multiple variant names.

tree structure
A controlled vocabulary display format in which the complete hierarchy of records is shown or accessible by clicking. The tree structure may be constructed by assigning a tree number or line number to each record, or by another method. See also hierarchical display.

true synonymy
The characteristic of terms or names that have meanings that are identical or as nearly identical as is possible with language. The purpose of enforcing true synonymy in a vocabulary is to increase precision in indexing and retrieval. See also near synonymy and synonym.

truncation
In searching and matching, the action of cutting off characters in a search term in order to find all terms with a certain common string of characters; typically involves the user employing a wildcard symbol to search for a string of characters no matter what other characters follow (or sometimes, precede) that string (e.g., searching for arch* will retrieve arch, arches, architrave, architecture, architectural history, etc.).

trunk name
See focus.

typography
The font style and size, and arrangement, appearance, and layout of words and texts on a page; in the context of this book, one of the critical elements in designing an end-user display of vocabulary records.

Unicode
A character encoding standard (originally designed as a 16-bit scheme) for representing letters, characters, and diacritical marks in most of the world’s modern scripts.

unique identifier
A number or other string that is associated with a record or piece of data, exists only once in a database, and is used to uniquely identify and disambiguate that record or piece of data from all others in the database.

upload
See load.

up-posting
Also known as autoposting. The automatic generation of search terms or indexing terms by adding broader terms to the specific term requested by a searcher or used by the indexer. See also generic posting.

used for term
Also called a UF. In thesaurus jargon, a term that is not a descriptor and not an alternate descriptor. If the thesaurus is being used as an authority, a used for term is not authorized for indexing. Used for terms typically comprise spelling or grammatical variants of the descriptor or have true synonymy with the descriptor.

user
See end user.

user interface (UI)
The portion of the design and functionality of a cataloging, editorial, search and retrieval, or other system or Web site with which end users interact, including the arrangement of displays, menus, clickable text or images, pagination, etc. A user interface that is easy for users to utilize is called user friendly.

user requirements
In system design, the initial formal explanation of functionalities, displays, and reports expressed from the point of view of the users’ needs and expectations. See also specifications.

user warrant
Justification for a term in a controlled vocabulary based on the frequency of user queries that employ the term. User warrant may be used for terms intended for retrieval but is typically not sufficient warrant for posting a term in a thesaurus used for indexing. See also literary warrant and organizational warrant.

variable
In a query, criteria or factors that may be changed to produce different results (e.g., as may be expressed in a where clause, as the relationship type code in this query: select distinct subjecta_id from associative_rels where rel_type_code = '2110';). See also criteria.

variant term
In a vocabulary, a term that is not the preferred term but refers to the same concept, including used for terms and alternate descriptors.

vector-space model
A method of automatic weighting in retrieval where an algebraic model is used for term frequency and distribution, creating representative vectors in multiple dimensional space; when compared to the vectors of an incoming query, the relevance of results may be predicted.

verbal units (VU)
In linguistics and computer science, the phonemic, morphemic, or grammatical

clauses or units of language or texts, corresponding in part to syllables, letters, or words.

visible Web
The subset of the World Wide Web that is visible to Web browsers and can be indexed by search engines’ Web crawlers or robots, in contrast to pages that are impenetrable by search engines or to data that is generated dynamically.

visual arts
See art.

vocabulary
See controlled vocabulary.

vocabulary control
The process of enforcing the use of certain terminology with the goal of providing consistency and improving retrieval.

warrant
In the context of vocabularies, sources that provide justification for the spelling and usage of a term to refer to a particular usage for a concept, including warrant of publications, common usage by experts of a discipline, or other sources.

Web browser
A software application that enables users to view and interact with information and media files on the Web (e.g., Internet Explorer, Mozilla Firefox, and Safari).

Web site
A collection of related electronic pages (Web pages), generally formatted in HTML and found at a single address where the server computer is identified by a given host name.

weighted term ranking
See best match.

whole/part relationship
Also called a partitive relationship. A hierarchical relationship between a larger entity and a part or component. In the context of cataloging art, it typically refers to a relationship between two work records or two records in a thesaurus (e.g., Florence is part of Tuscany). See also genus/species relationship and instance relationship.

wildcard
Also called a wildcard character or wildcard symbol. In searching, a character or symbol, such as an asterisk or percent sign, that is used to represent any other character or characters in a Boolean query or other string (e.g., the asterisk in Buonar*).

word sense disambiguation (WSD)
In automatic search and retrieval, the problem of determining in which sense a homograph is intended in a given data set or text. See also disambiguation.

work
In the context of this book, a creative product, including architecture; artworks such as paintings, drawings, graphic arts, sculpture, decorative arts, and photographs that are considered to be art; and other cultural artifacts. A work may be a single item or may be made up of many physical parts.

XML (Extensible Markup Language)
A simple, flexible markup language derived from SGML. Originally designed for large-scale electronic publishing, but now playing an increasingly important role in the publication and exchange of a wide variety of data on the Web.
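The truncation and wildcard entries both use the asterisk to stand for any following characters. As a small illustration, Python's standard fnmatch module supports the same asterisk convention, so the arch* and Buonar* examples can be sketched as follows. The function name truncation_search and the sample term list are assumptions made for this example; a real vocabulary system would usually run such a search inside its database.

```python
from fnmatch import fnmatchcase


def truncation_search(pattern, terms):
    """Return the terms matching a wildcard pattern, ignoring case."""
    # Lowercasing both sides makes the comparison case-insensitive.
    return [t for t in terms if fnmatchcase(t.lower(), pattern.lower())]


# Hypothetical term list, including two terms that should not match arch*.
terms = [
    "arch", "arches", "architrave", "architecture",
    "architectural history", "monarch", "Buonarroti",
]

print(truncation_search("arch*", terms))
print(truncation_search("Buonar*", terms))
```

Here the asterisk follows the search string, so terms that merely contain the string elsewhere (such as monarch) are not retrieved; placing the asterisk before the string would instead match characters that precede it.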