Glossary 1 of 7

Glossary

Many thanks to Marcia Lei Zeng of the School of Library and Information Science at Kent State University, who reviewed the glossary and provided extremely valuable input.

Introduction to Metadata 3.0 ©2008 J. Paul Getty Trust 2 of 7 Introduction to Metadata

AACR (Anglo-American Cataloguing Rules)
A data content standard for describing bibliographic materials. http://www.aacr2.org/.

algorithm
A formula or procedure for solving a problem or carrying out a task. An algorithm is a set of steps in a very specific order, such as a mathematical formula or the instructions in a computer program.

application profile
A set of elements, policies, and guidelines defined for a particular application or community. The elements may be from one or more element sets, thus allowing a given application to meet its functional requirements by using metadata from several element sets, including locally defined elements.

authentication
A human or machine process that verifies that an individual, computer, or information object is who or what it purports to be.

authority file
A file, typically electronic, that serves as a source of standardized forms of names, terms, titles, and so on. Authority files should include references or links from variant forms to preferred forms. For example, in the Library of Congress Name Authority File (LCNAF), “Schiavone, Andrea” is the preferred name form for a Dalmatian artist active in Italy during the sixteenth century, while “Medulić, Andrija,” “Lo Schiavone,” and several other forms are listed as variant names. Authority files regulate usage but also provide additional access points, thus increasing both the precision and the recall of many searches.

back-end database
A database that contains and manages data for an information system, distinct from the presentation or interface components of that system.

CCO (Cataloging Cultural Objects)
A data content standard for describing works of art, architecture, and material culture. http://www.vraweb.org/ccoweb/cco/index..

CDWA (Categories for the Description of Works of Art)
A set of metadata categories and recommendations that may be used to design information systems and to do cataloging for art, architecture, objects of material culture, and archaeological and archival materials. http://www.getty.edu/research/conducting_research/standards/cdwa/.

CDWA Lite
An XML schema for core records for art, architecture, and material culture designed to work with the OAI-PMH; the elements are based on a subset of the full element set of Categories for the Description of Works of Art (CDWA). http://www.getty.edu/research/conducting_research/standards/cdwa/cdwalite.html.

CGI script
A computer program, most frequently written in C, Perl, or a shell script, that uses the Common Gateway Interface (CGI) standard and provides an interactive interface between a user or an external computer application and a World Wide Web server. CGI scripts are most commonly used to develop forms that allow users to submit information to a Web server.

CIDOC CRM (CIDOC Conceptual Reference Model)
An object-oriented ontology for the mediation and interchange of heterogeneous cultural heritage information. http://cidoc.ics.forth.gr/.

client
An application that retrieves and/or renders resources or resource manifestations. Often used to denote a computer or other device connected to a network and equipped with software that enables users to access resources available on another computer connected to the same network, called a server. See also server.

conceptual data model
An abstract model or representation of data for a particular domain, business enterprise, field of study, etc., independent of any specific software or information system. Usually expressed in terms of entities and relationships. See also logical data model.

crosswalk
A chart or table (visual or virtual) that represents the semantic mapping of fields or data elements in one data standard to fields or data elements in another standard that has a similar function or meaning. Crosswalks make it possible to convert data between databases that use different metadata schemes and enable heterogeneous databases to be searched simultaneously with a single query as if they were a single database (semantic interoperability). Also known as field mapping. See also metadata mapping.

DACS (Describing Archives: A Content Standard)
A data content standard for describing archival collections. http://www.archivists.org/catalog/pubDetail.asp?objectID=1279.

data content standard
Rules that determine the vocabulary, syntax, or format of content entered into data fields or metadata elements, for example, Anglo-American Cataloguing Rules (AACR), ISO 8601 (rules for recording date and time), Describing Archives: A Content Standard (DACS), and Cataloging Cultural Objects (CCO).

data provider (OAI nomenclature)
An organization that exposes metadata records in one or more repositories (specially configured servers) for harvesting by service providers.

Deep Web
See Hidden Web.

default values
Values that are assumed or supplied automatically, for example, by a computer system, if a value is not specified.

digital signatures
A form of electronic authentication of a digital document. Digital signatures are created and verified using public key cryptography and serve to tie the document being signed to the signer.

digital surrogate
A digital “copy” of an original work or item, for example, a JPEG or TIFF image of a painting or sculpture or a PDF file of an article or book. In OAI nomenclature, digital surrogates are often referred to as “resources.”

DTD (Document Type Definition)
A collection of markup declarations that define the structure, elements, and attributes that can be used in encoding certain types of documents in SGML or, more commonly, in XML. Examples of DTDs include the EAD DTD, the HTML DTD, and the TEI DTD. XML DTDs are gradually being replaced by the newer XML schemas.

Dublin Core Metadata Element Set (DCMES)
A set of 15 metadata elements that can be assigned to information resources, optimized for resource discovery on the Web. Also often used as a “lowest common denominator” in metadata mapping. http://dublincore.org/documents/dces/.

dynamically generated
Refers to a Web page, metadata record, or other information object that is generated on demand, typically from content stored in a database, and usually either in response to a user’s input or from dynamic data sources that are refreshed periodically. The expression “on the fly” is often used in relation to dynamically generated content.

EAD (Encoded Archival Description)
A data structure standard for encoding archival finding aids in SGML or XML according to the EAD DTD or EAD XML schema, making it possible for the semantic contents of a hierarchically structured finding aid to be machine processed. http://www.loc.gov/ead/.

encryption
An encoding mechanism used to prevent unauthorized users from reading digital information; it is also used for user and document authentication. Only designated users or recipients have the capability to decode encrypted materials.

entity-relationship model
A type of conceptual data model that represents structured data in terms of entities and relationships. An entity-relationship diagram can be used to represent information objects and their relationships visually. Because the constructs used in the entity-relationship model can easily be transformed into relational tables, this type of model is often used in database design.

EXIF (Exchangeable Image File Format)
A specification for an image file format for digital cameras that provides the ability to attach image metadata to JPEG, TIFF, and RIFF images. As of this writing, EXIF is not maintained by any industry or standards organization but is widely used by camera manufacturers. http://www.exif.org/.

field mapping
See crosswalk.

FTP (File Transfer Protocol)
A TCP/IP protocol that allows data files to be copied directly from one computer to another over the Internet.

finding aid
A descriptive tool widely used in archives. Finding aids typically take the form of hierarchical narrative descriptions of cohesive groups of archival records or collections of manuscript materials. Finding aids traditionally were paper documents; EAD is a structured way of expressing finding aids as machine-readable data.

FRBR (Functional Requirements for Bibliographic Records)
A set of requirements and a conceptual entity-relationship model developed by the International Federation of Library Associations and Institutions (IFLA) to support bibliographic access and control. http://www.ifla.org/VII/s13/frbr/frbr.htm.

FRBRoo
A joint initiative of the International Federation of Library Associations and Institutions (IFLA) and the International Council of Museums–International Documentation Committee (ICOM-CIDOC) to create an object-oriented ontology that both captures the semantics of bibliographic information and harmonizes those concepts in common with the CIDOC CRM, thus facilitating information interchange between the museum and library communities. http://cidoc.ics.forth.gr/frbr_inro.html.

folksonomy
An assemblage of concepts, represented by terms and names (called “tags”), that results from social tagging. Note that a folksonomy is not a true taxonomy. See also social tagging, taxonomy.

Google Sitemap
Metadata about the content of a Web site that helps Web crawlers index the site more efficiently and comprehensively. www.google.com/webmasters/sitemaps/.
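The authority file entry above describes how variant name forms link to a single preferred form. The following Python sketch illustrates that idea with the entry's LCNAF example. The dictionary layout, the function names (`build_lookup`, `resolve`, `search_by_creator`), and the sample records are hypothetical. They do not reflect LCNAF's actual data format or any library's API. The sketch resolves every form to the preferred heading, so a search on any variant retrieves all matching records. This is how an authority file adds access points and improves recall.

```python
# Minimal authority-file sketch (illustrative only; not the LCNAF record format).
# Each preferred heading lists its variant forms. A lookup table maps every form,
# compared case-insensitively, back to the preferred heading.

AUTHORITY = {
    # Preferred form and variants taken from the LCNAF example in the glossary.
    "Schiavone, Andrea": ["Medulić, Andrija", "Lo Schiavone"],
}


def build_lookup(authority: dict[str, list[str]]) -> dict[str, str]:
    """Map each preferred or variant form (casefolded) to the preferred form."""
    lookup: dict[str, str] = {}
    for preferred, variants in authority.items():
        for form in (preferred, *variants):
            lookup[form.casefold()] = preferred
    return lookup


def resolve(name: str, lookup: dict[str, str]) -> str:
    """Return the preferred form, or the name unchanged if it is not controlled."""
    return lookup.get(name.strip().casefold(), name)


def search_by_creator(query: str, records: list[dict], lookup: dict[str, str]) -> list[int]:
    """Compare preferred forms, so a query on any variant finds every record."""
    target = resolve(query, lookup)
    return [r["id"] for r in records if resolve(r["creator"], lookup) == target]


LOOKUP = build_lookup(AUTHORITY)

# Hypothetical catalog records that name the same artist in different forms.
RECORDS = [
    {"id": 1, "creator": "Schiavone, Andrea"},
    {"id": 2, "creator": "Medulić, Andrija"},
    {"id": 3, "creator": "Lo Schiavone"},
    {"id": 4, "creator": "Unrelated, Artist"},
]

if __name__ == "__main__":
    print(search_by_creator("lo schiavone", RECORDS, LOOKUP))  # [1, 2, 3]
```

Names that are not in the authority file pass through `resolve` unchanged. An uncontrolled heading such as record 4 therefore still matches only itself.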

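The entity-relationship model entry above notes that E-R constructs transform easily into relational tables. This rough Python sketch uses the standard `sqlite3` module to make that step concrete. It turns two hypothetical entities (`work` and `creator`) and a many-to-many "created by" relationship into three tables. All table names, columns, and rows are invented for the example.

```python
import sqlite3

# Hypothetical E-R model: entities "work" and "creator", plus a many-to-many
# "created by" relationship that carries its own attribute (the creator's role).
SCHEMA = """
CREATE TABLE creator (
    creator_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL
);
CREATE TABLE work (
    work_id INTEGER PRIMARY KEY,
    title   TEXT NOT NULL
);
-- The relationship becomes its own table, with foreign keys to both entities.
CREATE TABLE work_creator (
    work_id    INTEGER NOT NULL REFERENCES work(work_id),
    creator_id INTEGER NOT NULL REFERENCES creator(creator_id),
    role       TEXT,
    PRIMARY KEY (work_id, creator_id)
);
"""


def build_db() -> sqlite3.Connection:
    """Create the tables in memory and load a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO creator VALUES (?, ?)",
                     [(1, "Creator A"), (2, "Creator B")])
    conn.executemany("INSERT INTO work VALUES (?, ?)",
                     [(10, "Work X"), (11, "Work Y")])
    # Work Y has two creators and Creator A has two works: many-to-many.
    conn.executemany("INSERT INTO work_creator VALUES (?, ?, ?)",
                     [(10, 1, "painter"), (11, 1, "designer"), (11, 2, "engraver")])
    return conn


def works_by(conn: sqlite3.Connection, name: str) -> list[str]:
    """Follow the relationship table from a creator to that creator's works."""
    rows = conn.execute(
        """
        SELECT w.title
        FROM work AS w
        JOIN work_creator AS wc ON wc.work_id = w.work_id
        JOIN creator AS c ON c.creator_id = wc.creator_id
        WHERE c.name = ?
        ORDER BY w.title
        """,
        (name,),
    )
    return [title for (title,) in rows]


if __name__ == "__main__":
    conn = build_db()
    print(works_by(conn, "Creator A"))  # ['Work X', 'Work Y']
```

The sketch stores the relationship in its own table rather than as a creator column on `work`. This is the usual relational choice when either side can have many partners and the relationship carries attributes such as a role.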
granularity
The level of detail at which an information object or resource is viewed or described.

harvester (OAI nomenclature)
A computer system that sends OAI-PMH requests to OAI data providers’ repositories and harvests metadata records from them.

header metadata
Metadata embedded in the header part of a digital file.

Hidden Web (also known as Deep Web, Invisible Web)
The sum of the Web pages that are not accessible to Web crawlers. Usually this is because the pages are dynamically generated when a user queries a database, or because they are password-protected or subscription-based.

hostname
An identifier for a specific machine on the Internet. The hostname identifies not only the machine but also its subnet and domain, for example, www.getty.edu. See also domain name.

HTML (HyperText Markup Language)
An SGML-derived markup language used to create documents for World Wide Web applications. HTML has evolved to emphasize design and appearance rather than the representation of document structure and metadata elements.

HTTP
HyperText Transfer Protocol, the standard protocol that enables users with Web browsers to access HTML documents and related media.

hyperlink
An abbreviated reference to a “hypertext link,” a method of creating nonlinear pathways between related digital documents or of linking to related objects such as image or audio files.

information object
A digital item or group of items referred to as a unit, regardless of type or format, that a computer can address or manipulate as a single discrete object.

Internet
A global collection of computer networks that exchange information by the TCP/IP suite of networking protocols.

Internet directory
A thematically organized list of descriptive links to Internet sites, often created by humans who have classified sites by their content. Yahoo! provides numerous such directories.

interoperability
The ability of different information systems to work together, particularly in the correct interpretation of data semantics and functionality. See also semantic interoperability.

Invisible Web
See Hidden Web.

legacy system
An information system that has been developed and modified over a period of time. It has become outdated and difficult and costly to maintain, but it holds important information and involves processes that are deeply ingrained in an organization. Legacy systems are usually replaced eventually by a new hardware and software configuration.

link resolver
Software that uses the OpenURL standard to automatically redirect a user’s request to the most appropriate copy of a networked digital object. Typically, link resolvers are used by libraries to direct their patrons from bibliographic records or abstracts to licensed subscription-based resources such as full-text electronic versions of articles and books. http://www.niso.org/standards/standard_detail.cfm?std_id=783.

logical data model
A data model that includes all entities and the relationships among them based on the structures identified in a conceptual data model and that specifies all attributes for each entity. The data is described in as much detail as possible, without regard to how it will be physically implemented in a specific database.

MARC (Machine-Readable Cataloging format)
A set of standardized data structures for describing bibliographic materials that facilitates cooperative cataloging and data exchange in bibliographic information systems. http://www.loc.gov/marc/.

markup language
A formal way of annotating a document or collection of digital data using embedded encoding tags to indicate the structure of the document or datafile and the contents of its data elements. This markup also provides a computer with information about how to process and display marked-up documents. HTML, XML, and SGML are examples of standardized markup languages.

memory institution
A generic term used to describe an institution that has a responsibility to collect, care for, and provide access to the human record, for example, museums, libraries, and archives.

metadata mapping
A formal identification of equivalent or nearly equivalent metadata elements or groups of metadata elements within different metadata schemas, carried out in order to facilitate interoperability.

metadata mining
The automated extraction of metadata from electronic documents.

metasearch
Searching of diverse databases on diverse platforms with diverse metadata in real time by means of one or more protocols. The NISO MetaSearch Initiative defines metasearch as “search and retrieval to span multiple databases, sources, platforms, protocols, and vendors at once.” Metasearch enables users to enter search criteria once and access several search engines simultaneously. With metasearch, fresh records are always available, because searching is in real time, in a distributed environment. http://www.niso.org/committees/MS_initiative.html.
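The crosswalk and metadata mapping entries describe matching equivalent or nearly equivalent elements across schemas. Here is a minimal Python sketch of applying such a crosswalk to a single record. The local field names and the sample record are hypothetical. Only the target names (`title`, `creator`, `date`, `format`, `subject`) are real Dublin Core elements, reflecting Dublin Core's role as a "lowest common denominator" in mapping.

```python
# Hypothetical crosswalk from an invented local schema to Dublin Core element
# names (title, creator, date, format, and subject are real DCMES elements).
CROSSWALK: dict[str, str] = {
    "object_title": "title",
    "artist": "creator",
    "creation_date": "date",
    "medium": "format",
    "subject_terms": "subject",
    "style_period": "subject",  # two local fields collapse into one DC element
}


def apply_crosswalk(record: dict, crosswalk: dict[str, str]) -> tuple[dict[str, list], list[str]]:
    """Convert a local record to DC elements and list fields with no equivalent."""
    mapped: dict[str, list] = {}
    unmapped: list[str] = []
    for field, value in record.items():
        element = crosswalk.get(field)
        if element is None:
            unmapped.append(field)  # no target element; flag the field for review
            continue
        values = value if isinstance(value, list) else [value]
        mapped.setdefault(element, []).extend(values)
    return mapped, unmapped


# Hypothetical local record.
LOCAL_RECORD = {
    "object_title": "Work X",
    "artist": "Creator A",
    "creation_date": "1550",
    "subject_terms": ["landscape"],
    "style_period": "Mannerism",
    "condition_note": "minor losses",
}

if __name__ == "__main__":
    dc, leftovers = apply_crosswalk(LOCAL_RECORD, CROSSWALK)
    print(dc)
    print(leftovers)  # ['condition_note']
```

Collapsing two local fields into `subject` shows why mappings are often only nearly equivalent: the distinction between the two source fields is lost. The sketch reports fields with no counterpart (here `condition_note`) instead of silently dropping them.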

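The metasearch entry above describes entering a query once and searching several sources simultaneously. The sketch below fans a query out to two in-memory stand-in catalogs using Python's `concurrent.futures.ThreadPoolExecutor`, then merges the hits and tags each one with its source. The catalogs, record fields, and function names are hypothetical. A real metasearch service would query remote databases over protocols such as Z39.50 or SRU. It would also deduplicate and rank the merged results, which this sketch does not attempt.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Record = dict[str, str]
Source = Callable[[str], list[Record]]

# Hypothetical in-memory catalogs standing in for remote databases.
CATALOG_A = [
    {"id": "a1", "title": "Book history primer"},
    {"id": "a2", "title": "Printing presses"},
]
CATALOG_B = [
    {"id": "b1", "title": "A book history reader"},
    {"id": "b2", "title": "Bookbinding"},
]


def make_source(name: str, records: list[Record]) -> Source:
    """Wrap a catalog as a search function that tags each hit with its source."""
    def search(query: str) -> list[Record]:
        q = query.casefold()
        return [dict(r, source=name) for r in records if q in r["title"].casefold()]
    return search


def metasearch(query: str, sources: list[Source]) -> list[Record]:
    """Send one query to every source concurrently and merge the hits."""
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        per_source = list(pool.map(lambda search: search(query), sources))
    merged: list[Record] = []
    for hits in per_source:  # pool.map yields results in the order of `sources`
        merged.extend(hits)
    return merged


SOURCES = [make_source("A", CATALOG_A), make_source("B", CATALOG_B)]

if __name__ == "__main__":
    for hit in metasearch("book history", SOURCES):
        print(hit["source"], hit["id"], hit["title"])
```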
meta tag
An HTML tag that enables metadata to be embedded invisibly on Web pages, for example, Description, Keywords.

meta tag spamming
The deliberate misuse of meta tags in order to attract traffic to a site, for example, by boosting its ranking in search results.

METS (Metadata Encoding and Transmission Standard)
A standard for encoding descriptive, administrative, and structural metadata relating to objects in a digital library, expressed in XML. METS enables the “packaging” of complex digital objects that include a range of metadata as well as related digital surrogates. http://www.loc.gov/standards/mets/.

MODS (Metadata Object Description Schema)
An XML schema for bibliographic records, developed and maintained by the Library of Congress. http://www.loc.gov/standards/mods/.

namespace
The set of unique names used to identify objects within a well-defined domain, particularly relevant for XML applications. An XML Namespace is a W3C recommendation for providing uniquely named elements and attributes in an XML instance. A namespace is declared using the reserved XML attribute xmlns, the value of which must be a URI (Uniform Resource Identifier) reference. For example, the approved DCMI namespace URI for the Dublin Core Metadata Element Set, Version 1.1 (original 15 elements) is http://purl.org/dc/elements/1.1/.

nesting
The way in which subelements may be contained within larger elements, resulting in multiple levels of metadata.

network bandwidth
Derived from the term used to describe the size or “width” of the frequencies used to carry analog communications such as television and radio. For Internet purposes, bandwidth is generally (and incorrectly) used to refer to the rate of data transfer.

OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting)
A protocol used to harvest or collect metadata records from data providers. http://www.openarchives.org/pmh/.

object-oriented
A programming or data modeling methodology that utilizes the notion of classes and their properties. Members (or instances) of a class share the same properties, for example, color or weight. Note, however, that although members of a class all share the same properties, the values of those properties do not need to be the same. Classes can contain subclasses, members of which inherit the properties of the parent or “superclass.”

ontology
A formal, machine-readable specification of a conceptual model, in which concepts, properties, relationships, functions, constraints, and axioms are all explicitly defined.

OPAC (Online Public Access Catalog)
A computerized inventory of a library’s holdings.

Open WorldCat
A subset of the WorldCat union bibliographic database made available by OCLC to certain Web search engines and online book retailers. http://www.oclc.org/worldcat/open/.

PageRank™ (Google)
A proprietary link-analysis algorithm developed by Google founders Larry Page and Sergey Brin to assign a numerical score to each document in a set of hypertext documents based on the number of referring links. The algorithm also takes into account the rank of the referring page, such that a link from a high-ranking page counts more than a link from a low-ranking page. http://www.google.com/technology/.

precision
A measure of search effectiveness expressed as the ratio of relevant records or documents retrieved from a database to the total number retrieved in response to the query. For example, in a database containing 100 records relevant to the topic “book history,” a search retrieving 50 records, 25 of which are relevant to the topic, would have 50 percent precision (25/50). (Definition from ODLIS, Online Dictionary for Library and Information Science, http://lu.com/odlis/.) See also recall.

protocol
A specification, often a standard, that describes how computers communicate with each other, for example, the TCP/IP suite of communication protocols or the OAI-PMH.

RDF (Resource Description Framework)
An application of XML that enables the creation of rich, structured, machine-readable resource descriptions. http://www.w3.org/RDF/.

RDF schema
A set of semantics within a defined namespace for use with specific applications of RDF.

recall
A measure of the effectiveness of a search expressed as the ratio of the number of relevant records or documents retrieved in response to the query to the total number of relevant records or documents in the database. For example, in a database containing 100 records relevant to the topic “book history,” a search retrieving 50 records, 25 of which are relevant to the topic, would have 25 percent recall (25/100). (Definition from ODLIS, Online Dictionary for Library and Information Science, http://lu.com/odlis/.) See also precision.

relevance
The extent to which information retrieved in a search of a library collection or other resource, such as an online catalog or a bibliographic database, is judged by the user to be applicable to (“about”) the
subject of the query. Relevance depends on the searcher’s subjective perception of the degree to which the document fulfills the information need, which may or may not have been expressed fully or with precision in the search statement. Measures of the effectiveness of information retrieval, such as precision and recall, depend on the relevance of search results. (Definition from ODLIS, Online Dictionary for Library and Information Science, http://lu.com/odlis/.)

relevance ranking
The algorithmic process, a feature of many search software applications, by which results in a result set are sorted or ranked according to their relevance. In OPACs, for example, relevance is computed from the number of occurrences of the search term in the retrieved record and the weight assigned to the field(s) in which the search term appears. (Definition from ODLIS, Online Dictionary for Library and Information Science, http://lu.com/odlis/.) Google’s PageRank™ is an example of a relevance ranking algorithm.

resource discovery
The process of searching for specific information objects on the Web.

robot
See Web crawler.

schema
A set of rules for encoding information that supports specific communities of users. Also called “scheme.” The plural forms of the word schema are schemas and schemata. See also XML schema.

schema registry
An authoritative source of names, semantics, and syntaxes for one or more schemas.

screen scraping
A technique in which display data (usually unstructured) is automatically retrieved and extracted, for example, from a Web page.

search engine
A computer program that allows users to search electronic resources. In the context of the World Wide Web, the term usually refers to a program that searches a large index of Web pages generated by an automated Web crawler. See also Web search engine.

semantic interoperability
The ability of different agents, services, and applications to communicate data while ensuring accuracy and preserving the meaning of the data (definition based on Marcia Bates and Mary Niles Maack, Encyclopedia of Library and Information Sciences, 3rd ed. [New York: Marcel Dekker, forthcoming]).

Semantic Web
An evolving, collaborative effort led by the W3C whose goal is to provide a common framework that will allow data to be shared and reused across various applications as well as across enterprise and community boundaries. It derives from the vision of W3C director Sir Tim Berners-Lee, inventor of the World Wide Web, of the Web as a universal medium for data, information, and knowledge exchange.

server
An application that supplies resources or resource manifestations. Often used to refer to a networked computer that acts as a source of data and/or applications used by multiple client computers or devices. See also client.

service provider (OAI nomenclature)
An institution or organization that harvests metadata from data providers and uses the aggregated metadata as a basis for building value-added services.

SGML (Standard Generalized Markup Language)
International Organization for Standardization (ISO) standard ISO/IEC 8879:1986. SGML is a markup language, first used by the publishing industry, for defining, specifying, and creating digital documents that can be delivered, displayed, linked, and manipulated in a system-independent manner. XML and HTML are derived from SGML.

social bookmarking
The decentralized practice and method by which individuals and groups create, classify, store, discover, and share Web bookmarks or “favorites” in an online “social” environment.

social tagging
The decentralized practice and method by which individuals and groups create, manage, and share terms, names, and so on (called tags), to annotate and categorize digital resources in an online “social” environment. A folksonomy is the result of social tagging. Also referred to as collaborative tagging, social classification, social indexing, mob indexing, folk categorization. See also folksonomy, tagging.

spamming
Used in reference to meta tags. The abuse of metadata that creators include in the HTML header area of their Web pages in order to increase the number of visitors to a Web site. Keyword spamming entails repeating keywords multiple times in order to appear at the top of search engine result listings, or listing keywords that are irrelevant to the site in order to attract visitors under false pretenses.

spider
See Web crawler.

SRU/SRW (Search and Retrieve via URL/Search and Retrieve Web Service)
Companion protocols for Web search queries using CQL (Common Query Language). http://www.loc.gov/standards/sru/.

surrogate
See digital surrogate.

tagging
In the context of the Web, the act of associating terms (called tags) with an information object (e.g., a Web page, an image, a streaming video clip), thus describing the item and enabling keyword-based classification and retrieval. Tags (a form of user-generated metadata) from communities of users can be aggregated and analyzed,
providing useful information about the collection of objects with which the tags have been associated. See also social tagging.

taxonomy
An orderly classification that explicitly expresses the relationships, usually hierarchical (e.g., genus/species, whole/part, class/instance), between and among the things being classified.

TCP/IP (Transmission Control Protocol/Internet Protocol)
The standardized suite of network protocols that enables information systems to communicate with other information systems on the Internet, regardless of their computer platforms.

TEI (Text Encoding Initiative)
An international cooperative effort to develop guidelines for standard encoding schemes (i.e., the TEI and TEI Lite DTDs) for literary and linguistic texts. http://www.tei-c.org/.

URI (Uniform Resource Identifier)
A short string that uniquely identifies a resource such as an HTML document, an image, a downloadable file, or a service. URLs and URNs are types of URIs.

URL (Uniform Resource Locator)
A type of URI consisting of an Internet address that tells users how and where to locate a specific file on the World Wide Web. A URL includes not only the name of a file but also the name of the host computer, the directory path to get to that file, and the protocol needed in order to use it. For example, http://www.getty.edu/research/conducting_research/standards/intrometadata/intro.html specifies that the hypertext transfer protocol “http” should be used to retrieve the document intro.html from the host www.getty.edu in the directory research/conducting_research/standards/intrometadata.

URN (Uniform Resource Name)
A type of URI consisting of a unique, location-independent identifier of a file available on the Internet. The file remains accessible by its URN regardless of changes that might occur in its host and directory path. For example, urn:issn:0167-6423 is the URN for the journal Science of Computer Programming.

Visible Web
The subset of the World Wide Web that is visible to Web browsers and indexable by search engines’ Web crawlers. To be accessible to Web crawlers, the pages must be reachable simply by following links (i.e., not generated dynamically in response to user input) and must not be protected by a password.

VRA Core 4.0
An XML schema for describing works of art and architecture and their visual surrogates. http://www.vraweb.org/projects/vracore4/index.html

W3C (World Wide Web Consortium)
The main international standards organization for the World Wide Web.

Web 2.0
A phrase used loosely by the Web development community to refer to a perceived “second generation” of Web technologies and applications. Wikis, folksonomies, gaming, podcasting, blogging, and so on, are all considered Web 2.0 applications.

Web browser
A software application that enables users to view and interact with information and media files on the Web. Internet Explorer, Mozilla Firefox, and Netscape Navigator are examples of Web browsers.

Web crawler (robot, spider)
A software program that systematically traverses the Web, either to generate a searchable index of Web content or to gather statistics.

Web server
A computer that is able to respond to HTTP requests from clients known as Web browsers and return the appropriate HTTP responses, most typically by serving an HTML page.

Web search engine/Internet search engine
A software program that collects data taken from the content of files available on the Web and puts them in an index or database that Web users can search in a variety of ways. The search results provide links back to the pages matching the user’s search in their original location.

wiki
A collaborative Web site that contains pages that any authorized user can edit. Wikis typically retain all former versions of each page, allowing the revision history of a page to be tracked and unwanted revisions to be reversed.

Wikipedia
A free, collaborative, volunteer-driven Web-based encyclopedia that uses wiki software to allow anyone to edit articles. http://en.wikipedia.org/wiki/.

World Wide Web
A vast distributed wide-area client-server architecture for retrieving hypermedia documents over the Internet.

XHTML (Extensible HyperText Markup Language)
A reformulation of HTML in XML.

XML (Extensible Markup Language)
A simple, flexible markup language derived from SGML. Originally designed for large-scale electronic publishing, XML is now playing an increasingly important role in the publication and exchange of a wide variety of data on the Web.

XML schema
A machine-readable definition of the structure, elements, and attributes allowed in a valid instance of a conforming XML document. XML schemas are expressed using the XML Schema Definition language, a W3C standard. http://www.w3.org/TR/xmlschema-0/.

XMP (Extensible Metadata Platform)
A markup language, based on RDF, for recording and embedding metadata about digital assets. Developed by Adobe Systems and supported across the company’s range of software products
and file formats. http://www.adobe.com/products/xmp/index.html.

Z39.50
An ISO 23950 and ANSI/NISO Z39.50 standard information retrieval protocol. Z39.50 is a client/server-based protocol for searching and retrieving information from remote databases.
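The PageRank™ entry above describes scoring each document by its referring links, with each link weighted by the rank of the referring page. Google's production algorithm is proprietary, so this Python sketch implements only the simplified, publicly described formulation using power iteration. The damping factor of 0.85 is the commonly cited value and an assumption here. The four-page link graph is also hypothetical.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> dict[str, float]:
    """Simplified PageRank by power iteration (not Google's production system)."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Rank held by pages with no outgoing links is shared evenly by all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                # A page passes its rank in equal shares along its outgoing links,
                # so a link from a high-ranking page is worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new[target] += share
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical four-page link graph: page -> pages it links to.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}

if __name__ == "__main__":
    scores = pagerank(LINKS)
    for page in sorted(scores, key=scores.get, reverse=True):
        print(page, round(scores[page], 3))
```

Page C ranks highest because three pages link to it. A and B each have one referring page, but A outranks B because A's single link comes from the top-ranked page C. D has no referring links, so it keeps only the baseline score (1 - damping)/n.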

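The precision and recall entries above share a worked example: a database with 100 relevant records and a search that retrieves 50 records, 25 of them relevant. This short Python sketch reproduces that arithmetic with sets of record IDs. The IDs themselves are made up. Returning 0.0 for an empty result or an empty relevant set is an assumed convention, since the definitions do not cover that case.

```python
def precision(retrieved: set, relevant: set) -> float:
    """Relevant records retrieved / all records retrieved."""
    if not retrieved:
        return 0.0  # assumed convention: an empty result has zero precision
    return len(retrieved & relevant) / len(retrieved)


def recall(retrieved: set, relevant: set) -> float:
    """Relevant records retrieved / all relevant records in the database."""
    if not relevant:
        return 0.0  # assumed convention when nothing in the database is relevant
    return len(retrieved & relevant) / len(relevant)


# The glossary's example with hypothetical record IDs:
# 100 relevant records (IDs 0-99) exist in the database.
RELEVANT = set(range(100))
# The search returns 50 records: 25 relevant (IDs 0-24) and 25 not (IDs 1000-1024).
RETRIEVED = set(range(25)) | set(range(1000, 1025))

if __name__ == "__main__":
    print(precision(RETRIEVED, RELEVANT))  # 0.5  (25/50)
    print(recall(RETRIEVED, RELEVANT))     # 0.25 (25/100)
```

The same 25 relevant hits give 50 percent precision but only 25 percent recall. The numerator is shared; the denominators differ (records retrieved versus relevant records in the database).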
http://www.getty.edu/research/conducting_research/standards/intrometadata/
