Controlled Vocabularies Specification
Ref. Ares(2020)1128555 - 21/02/2020

Deliverable D4.1
Controlled Vocabularies Specification

Lead Beneficiary: LBI
Work Package No. and Title: WP4 Curation and Enrichment of Metadata
Work Package Leader: JLU
Relevant Task: Task 4.1 Development of controlled vocabularies
Task Leader: LBI
Main Author(s): LBI (Ingo Zechner, Jakob Zenzmaier, Alexander Prenninger)
Contributor(s): HUJI (Tobias Ebbrecht-Hartmann, Lital Henig, Noga Stiassny); CERCEC (Irina Tcherneva); MM (Christian Dürr); JLU (Anja Horstmann, Ulrike Koppermann)
Reviewer(s): OFM (Michael Loebenstein); HUJI (Tobias Ebbrecht-Hartmann); TUW-TA (Sebastian Hofstätter); TUW-CVL (Daniel Helm); DFF (Eleonore Emsbach, David Kleingers)
Dissemination Level: Public
Due Date: M14 (2020-02), rescheduled from M12 (2019-12)
Version (No., Date): V1.8, 2020-02-21

Table of Contents

1. INTRODUCTION
2. METHODOLOGICAL BACKGROUND
3. CREATION PROCESSES
   3.1. Primary and secondary sources
   3.2. Vocabularies from peer projects
        EFG
        EFilms
        I-Media-Cities
   3.3. Other existing vocabularies
   3.4. Classification of image relations
   3.5. Meetings and workshops
   3.6. Consolidation and review
4. THE VHH VOCABULARIES: OVERVIEW
   4.1. Reused vocabularies
        EFG vocabularies reused
        EFG vocabularies adapted
        EFG vocabularies partially integrated
        EFG vocabularies not reused
   4.2. Newly created vocabularies
        ContentComponent Terms
        Taxonomy of Relations
        Other TBA Terms
        NTBA Terms
        Names
APPENDIX A. CONTENTCOMPONENT TERMS IN VHH
APPENDIX B. SHOTTECHNIQUE TERMS IN VHH
APPENDIX C. HISTORICREGION NAMES IN VHH

VHH_D4-1_Vocabularies-Specification_v1-8_2020-02-21.docx

1. Introduction

This document summarizes the controlled vocabularies to be used in the VHH project as well as the methodologies and processes applied to create them. It refers to existing vocabularies that are reused in VHH and presents newly created vocabularies that have been developed for the specific objectives of VHH. Together, these two types of vocabularies constitute the VHH Vocabularies, which provide the values for those elements, attributes and relationships within the VHH-EFG Metadata Schema that are defined by controlled vocabularies.

The purpose of this document is to
• provide descriptive and analytical terms for the metadata enrichment of filmic and related non-filmic heritage material;
• inform the implementation of the Digital Asset Management (DAM) component of the Visual History of the Holocaust Media Management and Search Infrastructure (VHH-MMSI);
• serve as a best-practice model for institutions evaluating and creating vocabularies for metadata enrichment.

This is a living document reflecting the outcomes of year 1 of a 4-year project. The VHH Vocabularies will be adapted and extended until MMSI v1 is completed in M20 (August 2020). Some adaptations are expected to occur even after this deadline and will be based on tests of and experiences with metadata enrichment through the VHH-MMSI.

This deliverable is to be used together with the following deliverables:
• D4.2 Metadata Integration Concept (M12)
• D3.1 Definition of Engagement Levels, Usage Modes, and User Types (M12)
• D5.1 Requirements Document (M8)
• D5.3 System Design v1 (M12)

2. Methodological background

Within a metadata schema, controlled vocabularies have two main functions:
• to limit the number of possible values in a field
• to reduce the ambiguities of natural language

However, thoroughly created vocabularies provide not only lists of vocabulary entries but also definitions of each entry’s meaning and of its relation to other entries in the list, a structure that information science now calls an “ontology”. A controlled vocabulary is a glossary, not a dictionary: it does not contain every word or phrase actually used in a specific language but rather a carefully selected and well-defined list from a specific domain of that language, excluding homonyms and synonyms (or rather relating to them). Controlled vocabularies are at the core of subject headings, thesauri and taxonomies.

Indexing has been a key task in library science since its beginnings. Its methods have been adopted as well as adapted by archival science. Indexes are commonly described as lists of words or phrases (traditionally called “headings”) associated with pointers (“locators”), which provide the basis for any targeted search-and-find functionality. An index entry in a library catalog or archival catalog usually refers to a call number (of a book, a file etc.), while an entry in a book index (also called “back-of-the-book index”) usually refers to a page number or any other part of the book. In both cases the index is used to find something: a book on a shelf, a file in a box, or a section in a book.
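The two functions of a controlled vocabulary and the heading/locator structure of an index described above can be illustrated with a minimal Python sketch. It is not part of the VHH-MMSI or the VHH-EFG Metadata Schema: the class `ControlledVocabulary`, the function `build_index`, the sample terms ("close-up", "pan") and the call numbers ("FILM-001", "FILM-002") are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class ControlledVocabulary:
    """Hypothetical sketch: approved terms with definitions, plus
    non-preferred variants (synonyms) that point to an approved term."""

    def __init__(self):
        self.definitions = {}  # approved term -> definition
        self._lookup = {}      # case-folded form -> approved term

    def add_term(self, term, definition, synonyms=()):
        self.definitions[term] = definition
        # Both the approved term and its variants resolve to the approved term.
        for form in (term, *synonyms):
            self._lookup[form.casefold()] = term

    def resolve(self, value):
        """Return the approved term for `value`; reject unapproved values."""
        approved = self._lookup.get(value.casefold())
        if approved is None:
            raise ValueError(f"{value!r} is not an approved term")
        return approved


def build_index(records, vocabulary):
    """Map each heading (approved term) to a sorted list of locators."""
    index = defaultdict(set)
    for locator, terms in records.items():
        for term in terms:
            index[vocabulary.resolve(term)].add(locator)
    return {heading: sorted(locators) for heading, locators in index.items()}


# Invented sample data, not taken from the VHH Vocabularies.
vocab = ControlledVocabulary()
vocab.add_term("close-up", "Shot framing a subject tightly.", synonyms=["closeup"])
vocab.add_term("pan", "Horizontal camera rotation.", synonyms=["panning shot"])

# Locators (here: call numbers) with the terms an indexer assigned to them.
records = {
    "FILM-001": ["closeup", "panning shot"],
    "FILM-002": ["Close-up"],
}
index = build_index(records, vocab)
```

In this sketch, `resolve` limits the possible values of a field to approved terms, and mapping variants onto one approved term reduces ambiguity, so that both sample records end up under the same heading.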
It is also common to distinguish between three main types of indexing languages:¹
• free indexing language: any term (not only from the document) can be used to describe the document
• natural language indexing language: any term from the document in question can be used to describe the document
• controlled indexing language: only approved terms can be used by the indexer to describe the document

¹ https://en.wikipedia.org/wiki/Controlled_vocabulary (31.12.2019)

While the VHH-EFG Metadata Schema gives room to the first two in some of its fields, most of its fields are restricted to controlled indexing language, i.e. controlled vocabularies. This is not due to disregard for the richness and power of natural language but to specific requirements of a scholarly informed project, namely the need to apply terms consistently
• to allow for structured search functionalities (in a tree data structure)
• to make the indexed elements comparable to each other and thus quantifiable

The art of indexing has always consisted of a certain, and sometimes uncertain, balance between the literal repetition of terms used in a document and their subsumption under terms that may or may not be used literally in the document. In that sense, even descriptive terms have an analytical dimension insofar as they generate at least some normalization.

Anticipating and facilitating search queries are the main functions of every index, in the digital as well as in the analog realm. Full-text search may replace the art of indexing in computing. However, even where full-text search is based on an index that contains every term used in a text, technologies are developed and applied to rank terms by their relevance and information gain in order to retrieve meaningful concepts from this text. Controlled vocabularies provide such meaningful concepts.

With images the situation is slightly different.
Unlike with texts, the problem is usually not that there are too many words from which to generate an index with meaningful entries, but that there are no words at all. Indexing images therefore also means verbalizing them. However, there is usually much more information in images than can be verbalized, particularly in moving images. With time-based visual and audiovisual media,