CHAPTER K Controlled Vocabulary (CV)

K.1 INTRODUCTION

Vocabulary is one of the main attributes of any language. In subject indexing, vocabulary plays a very important role, since the subject matter of each document is represented by words or terms drawn from the vocabulary of the language used in indexing. As indicated earlier, mainly two types of language are used in indexing, viz., uncontrolled or natural language and controlled (artificial) language. The difficulties faced while using natural language in indexing have been discussed in the previous chapter. The concept of controlled vocabulary emerged to obviate those difficulties.

K.2 DEFINITION OF CV

A controlled vocabulary is an authoritative list of terms to be used in indexing (human or automated) [1]. More precisely, it is “an organized arrangement of words and phrases used to index content and/or to retrieve content through browsing or searching” [2]. A controlled vocabulary essentially includes preferred terms and may or may not include variant terms for cross-reference. A controlled vocabulary has “a defined scope or describes a specific domain” [3]. The term “controlled” here signifies that only terms from the list (vocabulary) can be used for indicating the subject of a document while indexing. It also signifies that “if it is used by more than one person, there is control over who adds terms or how terms can be added to the list. The list could grow, but only under defined policies. … The objectives of a controlled vocabulary are to ensure consistency in indexing, tagging or categorizing and to guide the user to where the desired information is” [2].

K.3 CHARACTERISTICS OF CV

The characteristics of different types of controlled vocabulary may vary slightly, but broadly the main characteristics of a controlled vocabulary are:
● It is based on the vocabulary of a natural language, but its size is always smaller than the vocabulary on which it is based;
● It allows only one term out of all the synonyms and quasi-synonyms representing an idea to be used in an index;
● It may allow variants of preferred terms to be used for cross-referencing;
● It avoids homonyms, but where this is not possible at all, qualifiers are added to indicate the context;
● The scope of a term is sometimes deliberately restricted to the selected meaning best suited to the indexing system;
● Spellings, number (singular/plural), and other word forms are standardized;
● A definite rule is followed for compound terms.

Elements of Information Organization and Dissemination © 2017 Amitabha Chatterjee. DOI: http://dx.doi.org/10.1016/B978-0-08-102025-8.00011-9 Published by Elsevier Ltd. All rights reserved.
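Several of the characteristics listed above (one preferred term per idea, standardized word forms, qualifiers for homonyms) can be illustrated with a small lookup table. The sketch below is a minimal illustration assuming a tiny hypothetical vocabulary; the names `PREFERRED`, `QUALIFIED`, and `normalize` are our own and are not taken from any standard.

```python
# Hypothetical entry vocabulary: every variant form (synonym, quasi-synonym,
# spelling or number variant) points to one preferred term.
PREFERRED = {
    "cytology": "CYTOLOGY",
    "cell biology": "CYTOLOGY",
    "microforms": "MICROFORMS",
    "microform": "MICROFORMS",
    "microcopies": "MICROFORMS",
    "microcopy": "MICROFORMS",
}

# A homonym is admitted only together with a qualifier naming its context.
QUALIFIED = {
    ("mercury", "metal"): "MERCURY (METAL)",
    ("mercury", "planet"): "MERCURY (PLANET)",
}


def normalize(term, context=None):
    """Return the preferred form of `term`, or None if it is not authorized.

    Case and internal spacing are standardized before lookup. An unqualified
    homonym such as "mercury" is deliberately rejected.
    """
    key = " ".join(term.lower().split())
    if context is not None:
        return QUALIFIED.get((key, context.lower()))
    return PREFERRED.get(key)


print(normalize("Cell  Biology"))      # CYTOLOGY
print(normalize("Mercury", "planet"))  # MERCURY (PLANET)
print(normalize("Mercury"))            # None: ambiguous without a qualifier
```

Rejecting an unqualified homonym, rather than guessing a sense, mirrors the rule that qualifiers are added where homonyms cannot be avoided.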

K.4 TYPES OF CV

Controlled vocabularies are structured to display the different types of relationships among the terms they contain. The types of controlled vocabulary are distinguished by their increasingly complex structure, and the main types fall in the following sequence of increasing complexity.

[Figure: Types of controlled vocabulary in order of increasing complexity: Classification scheme/Authority list, Synonym ring, Taxonomy, Thesaurus, Ontology. The functions shown are ambiguity control, synonym control, hierarchical relationships, associative relationships, and customized associations.]

Source: ANSI/NISO Z39.19-2005 (ISBN 1-880124-65-3), Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies. (Note: The figure is based on the one proposed by Redmond-Neal [1].)

The different types of controlled vocabulary are introduced below. The thesaurus, being the most widely used alphabetical controlled vocabulary, is discussed in more detail.

K.4.1 Subject Authority List

The simplest form of controlled vocabulary is the subject authority list or file. This is a bare list of the subject headings consistently used by an indexing system, arranged primarily in alphabetical order. It is maintained to prevent synonyms from being used by different indexers working simultaneously in an organization, by the same indexer working at different times, and by different indexers working at different times. Such a list often does not indicate any relationships that may exist between the terms, and is therefore comparatively short.
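Because an authority list is only an alphabetical list with no relationships, checking a proposed heading amounts to an exact lookup. The sketch below assumes a hypothetical five-heading list; when a heading is not found, it returns the headings that would file next to it, so the indexer can look for an existing form to reuse.

```python
import bisect

# Hypothetical subject authority list: a bare alphabetical list of headings,
# with no relationships recorded between them.
AUTHORITY_LIST = sorted([
    "Broadcasting",
    "Cytology",
    "Microforms",
    "Satellite television",
    "Television",
])


def check_heading(heading):
    """Accept a heading only if it is already on the authority list.

    Otherwise return its alphabetical neighbours as candidates to reuse.
    """
    i = bisect.bisect_left(AUTHORITY_LIST, heading)
    if i < len(AUTHORITY_LIST) and AUTHORITY_LIST[i] == heading:
        return True, []
    return False, AUTHORITY_LIST[max(0, i - 1): i + 1]


print(check_heading("Television"))   # (True, [])
print(check_heading("Telecasting"))  # (False, ['Satellite television', 'Television'])
```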

K.4.2 Taxonomy

The word taxonomy means the science of classifying things, traditionally the classification of plants and animals, as in the Linnaean classification. It has now become a popular term for any hierarchical classification or categorization system [2]. In the context of controlled vocabularies, it denotes “a kind of controlled vocabulary that has a hierarchy (broader term/narrower terms), but not necessarily the related-term relationships and other features of a standard thesaurus” [2]. Taxonomies are often displayed in a tree structure, and the terms within a taxonomy are often called “nodes.” A node may be repeated at more than one place within the taxonomy if it has multiple broader terms; this is referred to as polyhierarchy. Another type of taxonomy, with a more limited hierarchy, comprises multiple sub-taxonomies or “facets,” the top-level node of each representing a different type of taxonomy, attribute, or context. This is used in post-coordinated searching, in which the user chooses a combination of nodes, one from each facet. Equivalent synonyms or see references may or may not exist in a taxonomy. If a hierarchy is not too large and can be browsed, and especially if there are polyhierarchies, there is less need for non-preferred variants [4].
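Both taxonomy features described above, polyhierarchy and facets, can be sketched in a few lines. In the example below, which uses a hypothetical animal taxonomy and a hypothetical item collection, a node lists all of its broader nodes, so a polyhierarchical node has more than one path to the top. A post-coordinated search then picks one value per facet and keeps only the items that match all of them.

```python
# Hypothetical taxonomy: each node maps to its broader node(s).
# A node with more than one broader node is polyhierarchical.
BROADER = {
    "Animals": [],
    "Mammals": ["Animals"],
    "Marine animals": ["Animals"],
    "Whales": ["Mammals", "Marine animals"],
    "Dogs": ["Mammals"],
}


def paths_to_top(node):
    """Return every path from the top node down to `node`, one per hierarchy."""
    parents = BROADER[node]
    if not parents:
        return [[node]]
    return [path + [node] for p in parents for path in paths_to_top(p)]


# Hypothetical faceted collection: each item is tagged with one node per facet.
ITEMS = {
    "doc1": {"Animal": "Whales", "Region": "Arctic"},
    "doc2": {"Animal": "Whales", "Region": "Antarctic"},
    "doc3": {"Animal": "Dogs", "Region": "Arctic"},
}


def faceted_search(**choices):
    """Post-coordinated search: keep items matching every chosen facet value."""
    return sorted(
        item for item, facets in ITEMS.items()
        if all(facets.get(f) == v for f, v in choices.items())
    )


print(paths_to_top("Whales"))
# [['Animals', 'Mammals', 'Whales'], ['Animals', 'Marine animals', 'Whales']]
print(faceted_search(Animal="Whales", Region="Arctic"))  # ['doc1']
```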

K.4.3 Subject Heading List

A subject heading list is “a standard list of terms to be used as subject headings, either for the whole field of knowledge or for a limited subject area, including references made to and from each term, notes explaining the scope and usage of certain headings, and occasionally corresponding class numbers” [5]. Such a list is normally arranged alphabetically, with preferred and rejected terms filed in the same sequence. The terms are linked by “See” and “See also” references. The best-known subject heading lists for the whole field of knowledge are the Library of Congress Subject Headings and the Sears List of Subject Headings, while Medical Subject Headings (MeSH) is an example of a subject heading list for a limited subject area. However, most subject heading lists have now adopted a thesaural structure. Further discussion of subject heading lists may be found in any book on library cataloguing or resource description.
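The way “See” and “See also” references work in a single alphabetical sequence can be sketched as follows. The headings are a hypothetical fragment, not taken from LCSH, Sears, or MeSH. A rejected heading carries only a “See” reference to the heading in use, while a heading in use may carry “See also” references to other headings.

```python
# Hypothetical fragment of a subject heading list. Preferred and rejected
# headings are filed together in one alphabetical sequence.
HEADINGS = {
    "Cell biology": {"see": "Cytology"},
    "Cytology": {"see_also": ["Histology", "Microbiology"]},
    "Histology": {"see_also": ["Cytology"]},
    "Microbiology": {},
}


def look_up(heading):
    """Follow a 'See' reference to the heading in use; return its 'See also' links."""
    entry = HEADINGS.get(heading)
    if entry is None:
        return None, []
    if "see" in entry:                  # rejected heading
        heading = entry["see"]
        entry = HEADINGS[heading]
    return heading, entry.get("see_also", [])


# Print the list in filing order.
for h in sorted(HEADINGS):
    entry = HEADINGS[h]
    if "see" in entry:
        print(f"{h} See {entry['see']}")
        continue
    print(h)
    for t in entry.get("see_also", []):
        print(f"    See also {t}")
```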

K.4.4 Classification Scheme

A classification scheme is a list of class terms with corresponding notation, accompanied by an alphabetical index. There are mainly two types of classification scheme: enumerative and faceted. An enumerative classification scheme consists of a single list or schedule of all class terms representing the universe of subjects or a subject domain, while a faceted scheme consists of separate schedules of class terms representing the different facets of the domain concerned. A classification scheme contains a notational vocabulary, while its index represents an alphabetical vocabulary. Further discussion of classification schemes may be found in any book on library classification.
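The difference between the two types of scheme can be shown with toy schedules. All notation below is hypothetical and does not belong to any real scheme; the colon used as a connecting symbol is likewise only an assumption. An enumerative scheme can only return classes it already lists. A faceted scheme synthesizes a class number from separate facet schedules in a fixed citation order, so it can express combinations that were never listed.

```python
# Enumerative scheme (hypothetical notation): every class, simple or
# compound, must already be listed in a single schedule.
ENUMERATIVE = {
    "Medicine": "M",
    "Lungs (Medicine)": "M4",
    "Tuberculosis of the lungs": "M4.21",
}

# Faceted scheme (hypothetical notation): one schedule per facet. A class
# number is synthesized by joining facet notations in a fixed citation order.
FACETS = {
    "Organ": {"Heart": "3", "Lungs": "4"},
    "Problem": {"Tuberculosis": "21", "Cancer": "36"},
}
CITATION_ORDER = ["Organ", "Problem"]
CONNECTOR = ":"  # hypothetical connecting symbol between facets


def synthesize(main_class, **facet_values):
    """Build a class number from a main-class notation and facet terms."""
    parts = [main_class]
    for facet in CITATION_ORDER:
        if facet in facet_values:
            parts.append(FACETS[facet][facet_values[facet]])
    return CONNECTOR.join(parts)


# Alphabetical index: the verbal vocabulary leading to the notation.
INDEX = {term: notation
         for schedule in FACETS.values()
         for term, notation in schedule.items()}

print(ENUMERATIVE.get("Tuberculosis of the heart"))            # None: not enumerated
print(synthesize("M", Organ="Heart", Problem="Tuberculosis"))  # M:3:21
```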

K.4.5 Thesaurus

As mentioned, the thesaurus is the most widely used example of a controlled vocabulary. The word “thesaurus” is of Greek origin, meaning “treasury or storehouse of knowledge” [6]. In modern usage, it denotes a list of terms arranged according to their relationships of ideas [7]. It was Peter Mark Roget who first conceived the idea of such a compilation and brought out in 1852 his Thesaurus of English Words and Phrases for the benefit of writers looking for appropriate words to express their ideas. Roget’s thesaurus had nothing to do with information retrieval, but his novel idea was profitably utilized in the compilation of modern IR thesauri. According to B.C. Vickery, Helen Brownson was the first person to use the term “thesaurus” in the context of IR, in her paper presented at the Dorking Conference on Classification Research in 1957. Hans P. Luhn was possibly the first person to think of an information retrieval thesaurus: he suggested the compilation, for indexing purposes, of “families of notions” and a dictionary of “notional families,” very similar to the principles of Roget [8]. The first thesaurus used in information retrieval was developed at the E. I. du Pont de Nemours Company in the United States around 1959, and since then a large number of IR thesauri have been brought out in different subject fields.

K.4.5.1 Definition of Thesaurus

From the point of view of function, an IR thesaurus is “a terminological control device used in translating from the natural language of documents, by indexer or users into a more constrained ‘system language’ (i.e., documentation language, information language).” From the point of view of structure, it is “a controlled and dynamic vocabulary of semantically and generically related terms which covers a specific domain of knowledge” [9]. According to Kent, it is “a compilation of terms of a given information system’s vocabulary, arranged in some meaningful form and which provides information relating to each term that will enable a user of the information file to predict the relevance of responses to questions when this particular control mechanism is used” [10]. Briefly, it may be defined as a list of descriptors for use in an information retrieval system, arranged in a systematic order and showing the various types of relationship existing between them [11].

K.4.5.2 Difference from Subject Headings List

Both thesauri and subject headings lists control the use and form of index terms and summarize the relationships between terms in an indexing language. But thesauri and subject headings lists achieve these two objectives in different environments. Most of the main subject headings lists are geared to the alphabetical subject approach found in dictionary catalogues. The features that distinguish a thesaurus from a subject headings list are [12]:
● A thesaurus is likely to contain terms that are more specific than those found in a subject headings list;
● A thesaurus tends to avoid inverted terms (such as Art, French);
● Headings in a thesaurus are not subdivided, whereas subdivision is common in traditional subject headings lists;
● The relationship display in a thesaurus is often more extensive than that in subject headings lists;
● Different types of relationship are shown in a thesaurus by the use of BT, NT, and RT, instead of “see also,” which in a subject headings list is frequently used to indicate all relationships, whatever their nature (lately, however, many subject headings lists also use BT, NT, and RT to show relationships);
● A thesaurus often has an additional explicit statement of the structure of the relationships between terms in the form of categorized lists or a graphic display.

K.4.5.3 Similar Tools and Concepts

There are also some other tools, viz., Thesaurofacet and Classaurus, which possess not only all the features of a thesaurus but also some additional features. Thesaurofacet was developed by Jean Aitchison and others for the English Electric Company. It is basically a faceted classification scheme with an alphabetical index in the form of a thesaurus. “The terms in the system appear once in the thesaurus and once in the schedules, the link between the two locations being the notation or class number” [13]. The Classaurus, developed by G. Bhattacharyya of the Documentation Research and Training Centre, Bangalore, has only one section, consisting of separate schedules for different facets, but the schedules incorporate the features of both a classification scheme and a thesaurus. The index of the Classaurus is a usual alphabetical index.

K.4.5.4 Scope and Size of Thesaurus

No ideal scope or size can be prescribed for an IR thesaurus. These will vary depending upon:
● Scope and complexity of the subject to be covered;
● Kind of retrieval objects and data to be processed;
● Intended exhaustivity and specificity of indexing.
However, the scope of a thesaurus must be such that it can serve the specific needs, viewpoints, and priorities of its users.

K.4.5.5 Need and Characteristics of Thesaurus

Any IR system, whether manual or mechanized, requires an articulate vocabulary free from homonyms and synonyms for its efficient functioning. An IR thesaurus fulfills this need. The main characteristics of an IR thesaurus are:
● It provides descriptors to be used in indexing and retrieval;
● It shows the intrinsic semantic relationships existing between the terms.

K.4.5.6 Types of Thesaurus

From the point of view of terminological control, there are mainly two types of thesauri:
● Controlled thesauri: this type of thesaurus allows only one term to denote a concept;
● Free language thesauri: this type uses all the terms found in the relevant literature to denote a concept.
The use of free language thesauri in an IR system may create problems of matching at the search stage, and hence controlled thesauri are mainly used in information retrieval. In a controlled thesaurus, vocabulary control is effected in the following ways [7]:
● Only one term out of all possible synonyms and quasi-synonyms is selected as a descriptor;
● The scope of a term is deliberately restricted to the selected meaning best suited to the indexing system (the scope is clearly indicated in a scope note wherever required);
● Spellings, number (singular/plural), and word forms are standardized;
● A definite rule is followed for compound terms;
● Homographs (homonyms) are avoided as far as possible and, if used at all, are differentiated by qualifiers.
There can be another type of thesaurus, known as a convertible thesaurus or source thesaurus. This thesaurus serves as a “switching” or “reconciling thesaurus” for information interchange purposes.

K.4.5.7 Internal Structure of Thesaurus

The arrangement of the different components of an entry, and the arrangement of the different entries in relation to one another, constitute the structure of a thesaurus. Cross-references make explicit the ways in which entries relate to each other in a network of concepts [14]. An entry in a thesaurus consists of a group of terms led by a descriptor, which is followed by the terms related to it in different ways. The terms in an entry are displayed in the following format:
Descriptor (with scope note wherever needed)
    Synonyms and quasi-synonyms (displaying equivalence relationships)
    Broader terms (displaying hierarchical, superordinate relationships)
    Narrower terms (displaying hierarchical, subordinate relationships)
    Related terms (displaying associative relationships)
    Top term (displaying the broadest class to which the descriptor belongs)
The top term is usually not repeated when all the descriptors belong to the same broad class. The meanings of the above concepts are described below.
● Descriptors: The notion of the descriptor was first introduced by Calvin Mooers, an American pioneer in the field of coordinate indexing. Descriptors are the terms allowed by a thesaurus to be used in indexing. They are authorized and formalized terms or symbols used to represent unambiguously the concepts of documents and queries. Descriptors can be of two types:
    ● Terms denoting concepts and concept combinations, e.g., TELEVISION, SATELLITE TELEVISION;
    ● Terms denoting individual entities (proper name identifiers), e.g., SAMSUNG TELEVISION.
The terms denoting individual entities are often excluded from a thesaurus in order to restrict its size, and because, if needed, they can be supplied by the indexer without any fear of ambiguity.
● Scope note: A scope note clarifies the intended meaning or scope of a term. It is added to a descriptor whenever needed.
● Non-preferred terms: All synonyms and quasi-synonyms that have an equivalence relationship with a descriptor but are not selected for indexing purposes are non-preferred terms. A cross-reference is made from each non-preferred term, e.g., Microcopies USE MICROFORMS.
● Broader/Narrower terms: A term which is superordinate to a descriptor in a hierarchy is a broader term, while a term which is subordinate to a descriptor is a narrower term, e.g., MICROFORMS BT Data media NT Microtransparencies.
● Related terms: Terms which are neither non-preferred, nor broader, nor narrower, but are conceptually related to the descriptor are called related terms.
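The components of an entry described above map naturally onto a record with one field per relationship. The sketch below is one possible representation, not a standard format; the field names and the sample entries are our own. It also shows how the cross-references from non-preferred terms let an indexer arrive at the descriptor.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One thesaurus entry, led by its descriptor (field names are our own)."""
    descriptor: str
    scope_note: str = ""
    non_preferred: list[str] = field(default_factory=list)  # equivalence
    broader: list[str] = field(default_factory=list)        # hierarchical, upward
    narrower: list[str] = field(default_factory=list)       # hierarchical, downward
    related: list[str] = field(default_factory=list)        # associative
    top_term: str = ""


ENTRIES = [
    Entry("MICROFORMS", non_preferred=["Microcopies"],
          broader=["Data media"], narrower=["Microtransparencies"]),
    Entry("TELEVISION", narrower=["SATELLITE TELEVISION"]),
]

DESCRIPTORS = {e.descriptor for e in ENTRIES}
# Cross-references: each non-preferred term points to its descriptor.
USE = {term: e.descriptor for e in ENTRIES for term in e.non_preferred}


def index_term(term):
    """Return the descriptor to assign for `term`, or None if it is unknown."""
    if term in DESCRIPTORS:
        return term
    return USE.get(term)


print(index_term("Microcopies"))  # MICROFORMS
```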

K.4.5.8 Relationships Between Terms

One of the most important functions of a thesaurus is to display how concepts are related. The relationships shown in a thesaurus are broadly of two types, viz., hierarchical and non-hierarchical. A hierarchical relationship is easier to determine than a non-hierarchical one.
● Hierarchical relationship: A hierarchical relationship shows the interrelationship between concepts in a hierarchy. It expresses degrees or levels of superordination and subordination between concepts. This type of relationship is considered to be the basic relationship that differentiates a thesaurus from other controlled lists of terms. A hierarchical relationship can be of three types:
    ● Genus–species (generic) relationship, e.g., SNAKES – COBRA
    ● Hierarchical whole–part relationship, e.g., HUMAN BODY – CHEST
    ● Instance relationship, e.g., CAMERA – NIKON
However, some experts do not consider the whole–part relationship to be hierarchical, while the instance relationship is often not shown, in order to control the size of a thesaurus. Some terms may belong to more than one hierarchy and consequently may be related to more than one broader term and more than one set of narrower terms. Such a relationship is called a polyhierarchical relationship, e.g.,

    MAMMALIA        MARINE ANIMALS
           \          /
             WHALE

● Non-hierarchical relationship: A non-hierarchical relationship can be of two types:
    ● Equivalence (or preferential) relationship: The equivalence relationship is the relationship between preferred and non-preferred terms in an indexing language, in which each of two or more terms is regarded, for indexing purposes, as referring to the same concept. In other words, it is the relationship between synonyms and/or quasi-synonyms. Synonyms are terms having the same meaning and are, therefore, interchangeable, while quasi-synonyms are terms whose meanings are not exactly the same but are regarded as the same for the purpose of indexing. Of a set of equivalent terms, only one is selected as the descriptor, e.g., CYTOLOGY – Cell Biology. It should be ensured that all documents associated with the equivalence set are retrieved whenever needed. Sometimes the relationship between antonyms, i.e., terms with opposite meanings, is also treated as an equivalence relationship. Further, some experts consider the equivalence relationship a separate type of relationship and not a kind of non-hierarchical relationship.
    ● Associative (or affinitive) relationship: This type of relationship cannot be precisely defined. It covers the relationship between pairs of terms which are neither members of an equivalence set nor can be organized in a hierarchy, yet are semantically associated to such an extent that the link between them should be made explicit in a thesaurus. There may be an associative relationship between two kinds of terms:
        – Terms of the same category, e.g., SHIPS – BOATS
        – Terms of different categories, e.g., INDIANS – INDIA.

There are different ways in which two terms can be associated; Neelameghan has identified 29 types of such relationship. However, only those associative relationships need to be shown which are likely to be needed for retrieval in an information retrieval system [19]. Whenever a relationship, whether hierarchical or non-hierarchical, is established between two terms, reciprocal entries must be provided for each term in the thesaurus. For example, if the terms BROADCASTING and TRANSMITTER are considered related, the entry under BROADCASTING should display TRANSMITTER as a related term, and the entry under TRANSMITTER should similarly display BROADCASTING as a related term.
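Because every relationship must be recorded under both terms, the reciprocal entries can be generated mechanically from a one-directional list of relationships. The sketch below uses the abbreviations BT, NT, RT, USE, and UF, which are explained in the next section. The input triples are illustrative examples drawn from this chapter.

```python
from collections import defaultdict

# Reciprocal of each relationship code: BT and NT mirror each other,
# RT is symmetric, and USE is mirrored by UF.
RECIPROCAL = {"BT": "NT", "NT": "BT", "RT": "RT", "USE": "UF", "UF": "USE"}


def add_reciprocals(relations):
    """Given (term, code, other_term) triples, return a thesaurus in which
    every relationship is recorded under both terms."""
    thesaurus = defaultdict(lambda: defaultdict(set))
    for term, code, other in relations:
        thesaurus[term][code].add(other)
        thesaurus[other][RECIPROCAL[code]].add(term)
    return thesaurus


t = add_reciprocals([
    ("BROADCASTING", "RT", "TRANSMITTER"),
    ("WHALE", "BT", "MAMMALIA"),
    ("WHALE", "BT", "MARINE ANIMALS"),
    ("Cell Biology", "USE", "CYTOLOGY"),
])
print(sorted(t["TRANSMITTER"]["RT"]))  # ['BROADCASTING']
print(sorted(t["WHALE"]["BT"]))        # ['MAMMALIA', 'MARINE ANIMALS']
```

Entering each relationship only once and deriving its reciprocal avoids the most common editing error, a relationship shown under one term but missing under the other.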

K.4.5.9 Display of Relations

The relations between terms are displayed in a thesaurus in any of the following ways:
● By prefixing abbreviations;
● By prefixing symbols;
● By graphic methods.

Display Using Abbreviations
UNISIST Guidelines have recommended the following abbreviations [16]:
BT (= Broader Term) to represent a concept of wider connotation
NT (= Narrower Term) to represent a concept of more specific connotation
RT (= Related Term) to denote a term having an associative relationship with the descriptor
SN (= Scope Note) to indicate the note attached to a descriptor restricting the meaning of the term
TT (= Top Term) to represent the broadest class
UF (= Use For) to indicate a non-preferred term
USE to indicate a preferred term or descriptor
[Note: In some thesauri, SEE is used for USE and SEE FOR (SF) for USE FOR (UF).]
UNISIST Guidelines have also recommended that, to distinguish generic relations from whole–part relations, the abbreviations may be modified as follows [15]:
BTG for Broader Term Generic
BTP for Broader Term Partitive
NTG for Narrower Term Generic
NTP for Narrower Term Partitive.
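An entry in an alphabetical display can be produced directly from these abbreviations. The sketch below prints the descriptor in capitals followed by one indented line per related term. The order of the codes is our own choice for illustration and is not prescribed by the UNISIST Guidelines.

```python
# Order in which relationship lines are printed under a descriptor
# (chosen for this sketch; not prescribed by UNISIST).
ORDER = ["SN", "UF", "TT", "BTG", "BTP", "BT", "NTG", "NTP", "NT", "RT"]


def format_entry(descriptor, relations):
    """Render one entry. `relations` maps a code to a list of strings
    (the scope note, too, is given as a one-item list)."""
    lines = [descriptor.upper()]
    for code in ORDER:
        for value in sorted(relations.get(code, [])):
            lines.append(f"  {code} {value}")
    return "\n".join(lines)


def format_cross_reference(non_preferred, descriptor):
    """Render the cross-reference entry filed under a non-preferred term."""
    return f"{non_preferred} USE {descriptor.upper()}"


print(format_entry("Microforms", {"UF": ["Microcopies"], "BT": ["Data media"],
                                  "NT": ["Microtransparencies"]}))
print(format_entry("Human body", {"NTP": ["Chest"]}))
print(format_cross_reference("Microcopies", "microforms"))
```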

Display Using Symbols
A committee of the ISO has recommended the following symbols to indicate the various types of relationships [16]:
Hierarchical relationships
< to precede a broader term
> to precede a narrower term
>P to precede a narrower term (partitive)
Equivalence relationship
= to precede a preferred term (descriptor)
≠ to precede a non-preferred term
Associative relationship
– to precede a related term
Conjunction
& to indicate that terms joined by the symbol should be used in combination to represent a compound concept.

Graphic Display
A number of graphic display devices are used to show the relationships between the terms in a thesaurus, such as the tree structure, arrowgraph, Euler circles, and circular display, of which the first two are more popular.
Tree Structure: In a tree structure, the terms are arranged in a tree showing the hierarchy, as shown below:

[Figure: Tree structure for TELEGRAPH EQUIPMENT, with the narrower terms Telegraph cables, Telegraph receivers, and Telegraph transmitters, and lower-level terms including Coaxial cables, Conductors, Pulse cables, Teleprinters, Teletypewriters, Acoustic receivers, Visual receivers, and Recording receivers.]

Arrowgraph: In an arrowgraph, the related terms are linked by arrows as shown below:

[Figure: Arrowgraph for TELEGRAPH EQUIPMENT, linking by arrows the terms Telegraph cables, Communication cables, Coaxial cables, Conductors, Pulse cables, Telegraph receivers, Acoustic receivers, Visual receivers, Recording receivers, Telegraph transmitters, Teleprinters, and Teletypewriters.]

Euler Diagram: In this representation, each concept is defined by a polygonal domain or “circle.” Synonymous terms, by definition, occupy the same domain, and related subjects have overlapping domains. A subject which is perceived as being entirely contained within the bounds of another will have its “circle” totally within the boundaries of the domain of the broader subject [8], as shown below:

[Figure: Euler diagram in which the circle for Cable TV lies entirely within the circle for Television.]
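The rules of the Euler diagram translate directly into set comparisons if each concept’s domain is modelled as a set. In the sketch below, the domains are hypothetical sets of documents that each term would cover. Identical sets correspond to synonyms, a proper subset to a narrower term, a proper superset to a broader term, and a partial overlap to related terms.

```python
def euler_relation(a, b):
    """Classify the relation of domain `a` to domain `b` (both Python sets)."""
    if a == b:
        return "synonymous"   # same domain
    if a < b:
        return "narrower"     # a lies wholly inside b
    if a > b:
        return "broader"      # a wholly contains b
    if a & b:
        return "related"      # overlapping domains
    return "unrelated"        # disjoint domains


# Hypothetical domains: the documents each term would retrieve.
television = {"d1", "d2", "d3", "d4"}
tv = {"d1", "d2", "d3", "d4"}
cable_tv = {"d2", "d3"}
broadcasting = {"d3", "d4", "d5"}

print(euler_relation(cable_tv, television))      # narrower
print(euler_relation(television, broadcasting))  # related
```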

Circular Display: In this type of display, the central concept is shown as a central circle. Related topics are arranged in concentric circles around the central circle. Concepts which are only remotely related to the central concept will be positioned well away from the central circle [12].

Format of Display
An IR thesaurus may be arranged and presented by one or more of the following methods:
● Alphabetical: descriptors (along with their related terms) and cross-references are arranged in alphabetical order;
● Systematic or classified: descriptors are arranged in their hierarchical order, with levels of hierarchy represented by indentation, dashes, etc.;
● Graphic: the hierarchy is shown by a tree or an arrowgraph.
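The alphabetical and systematic formats can both be generated from the same data. The sketch below uses a hypothetical mini-thesaurus loosely based on the telegraph-equipment example, with a hypothetical non-preferred term. The assignment of narrower terms is ours and only illustrates the technique. The systematic display marks each level of hierarchy with a dot, while the alphabetical display files descriptors and cross-references in one sequence.

```python
# Hypothetical mini-thesaurus; the assignment of narrower terms is ours.
NARROWER = {
    "TELEGRAPH EQUIPMENT": ["Telegraph cables", "Telegraph receivers",
                            "Telegraph transmitters"],
    "Telegraph receivers": ["Acoustic receivers", "Recording receivers",
                            "Visual receivers"],
}
USE_REFS = {"Telegraph sets": "TELEGRAPH EQUIPMENT"}  # hypothetical non-preferred term


def systematic(term, depth=0):
    """Systematic display: one dot per level of the hierarchy."""
    lines = [". " * depth + term]
    for child in NARROWER.get(term, []):
        lines.extend(systematic(child, depth + 1))
    return lines


def alphabetical():
    """Alphabetical display: descriptors and cross-references in one sequence."""
    terms = set(NARROWER) | {t for kids in NARROWER.values() for t in kids}
    entries = [(t, t) for t in terms]
    entries += [(np, f"{np} USE {d}") for np, d in USE_REFS.items()]
    return [text for _, text in sorted(entries, key=lambda e: e[0].lower())]


print("\n".join(systematic("TELEGRAPH EQUIPMENT")))
print("\n".join(alphabetical()))
```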

K.4.5.10 Method of Compilation

Compiling an IR thesaurus is a specialized job requiring a fair knowledge of the subject to be covered as well as familiarity with the methodology of compilation. At the outset it is necessary to acquire a fair knowledge of the subject by studying some standard and representative books and through discussion with subject experts. For learning the compilation methodology, the guidelines brought out by ISO [9], BSI [17], and UNESCO [16], or the manual compiled by Aitchison and Gilchrist [18], may be consulted. The essential steps for compiling a monolingual thesaurus are [11]:
● Delineation of scope: First, the scope of the subject of the proposed thesaurus is to be clearly defined. The scope must be coextensive with that of the information retrieval system for which the thesaurus is being compiled. For a thesaurus meant for general use, the exact scope should be defined by demarcating the boundaries of the subject field, and penumbral subjects should be identified.
● Determination of characteristics: A decision has to be taken regarding the following characteristics of the thesaurus:
    ● The level of specificity;
    ● The level of pre-coordination;
    ● The extent of hierarchical and other relationships;
    ● Auxiliary precision devices (e.g., links, roles) to be used;
    ● Arrangement and layout of the main and auxiliary parts.
● Division of subject field into facets: The main areas or facets of the subject field may be identified with the help of representative books and subject specialists. This helps in term collection by ensuring the inclusion of terms relating to all important areas of the subject field. Existing classification schemes and thesauri on related or broader subjects may be consulted for the purpose.
● Identification of sources: The sources from which terms are to be collected should be identified before starting term collection. Sources may include standard reference tools such as encyclopedias and glossaries; representative books on the subject; available classification schemes and thesauri; primary and secondary periodicals; etc. It is advisable to prepare a main entry for each such source in a standard format, mentioning all the bibliographical details of the document and including, if possible, an abstract. These entries should be arranged in a helpful order, or alphabetically, and numbered serially.
● Collection and selection of terms: Each of the sources may be scanned for relevant terms. Where a concept is denoted by different terms, the term that will serve as the descriptor should be chosen keeping in view the frequency of its use in the literature, its present and possible use in retrieval queries, and its acceptability among subject experts. The currency of the term and its ability to express a particular concept precisely should also be considered. Definite rules should be followed for determining the form of the term. A term record should be prepared for each selected term on a specially designed term record card. At this stage it may not be possible to note down all the information required in the term record; e.g., broader and narrower terms cannot normally be supplied until the hierarchy of the terms is established. In such circumstances gaps may be left, to be filled in at a later stage. Besides, the full bibliographical details of a source need not be repeated in every term record; mentioning the serial number of its main entry may be sufficient.
● Determination of relations: Determining the interrelations between the terms is the most significant task and should be done with caution. For determining hierarchical relationships, a hierarchical tree can be drawn covering all the descriptors, with help from existing classification schedules, thesauri, and subject experts. Associative relationships should be established keeping in view the needs of the information system and with the help of any available guidelines. After all the relationships have been determined, they should be noted down in the appropriate places on the term records.
● Preparation of entries: An entry should now be prepared for each descriptor in the previously determined format, using the necessary abbreviations or symbols as mentioned earlier. For each non-preferred term, a cross-reference entry should also be prepared in a predetermined format, e.g.,
    Microcopies USE MICROFORMS
The descriptors and non-preferred terms may be differentiated typographically, as above.
● Arrangement of main part: When the main part is alphabetical, the entries for the descriptors and non-preferred terms are all arranged in alphabetical order. In the case of a classified or systematic main part, the terms are arranged in a hierarchy, the order of the terms being shown by indentation, dots, or dashes. Scope notes, synonyms, and related terms are added where necessary. For a graphic display, a tree or arrowgraph is drawn showing the relationships of the terms.
● Preparation of auxiliary part: The auxiliary parts necessary to supplement the main part and to meet the needs of the indexing system should now be prepared.
● Final editing: The drafts of the main and auxiliary parts should be tested and checked on the following points:
    ● Word forms and spellings;
    ● Reciprocal entries;
    ● Arrangement of entries;
    ● Indentation, spacing, and layout;
    ● Links between the main and auxiliary parts.
All mistakes discovered should be corrected, and any new relationships found should be added. An explanatory introduction describing the scope, structure, features, and arrangement of the thesaurus, and a “Guide to Users,” should be prepared and prefixed. Finally, the necessary instructions for printing should be added.
● Updating: No thesaurus can be a static document. There should be a mechanism for updating the thesaurus at regular intervals. Updating should be done keeping in view developments in the subject field covered and readers’ queries. Updating may involve changes in existing relationships and the addition of new relationships.

K.4.5.11 Advantages of Thesaurus

Like any other vocabulary control tool, a thesaurus effects vocabulary control in the language used in indexing and information retrieval and helps the indexer in selecting preferred terms. However, the advantages of using an IR thesaurus in comparison with other such tools are:
● It provides more access points than other vocabulary control devices;
● It enables the searcher to find information not only on a specific topic but also on related topics;
● By using indexing terms and search terms from the same thesaurus, the speed of retrieval can be increased.
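The second advantage listed above, finding information on related topics as well, is often realized by expanding a search term through the thesaurus. The sketch below uses hypothetical USE, NT, and RT tables. It maps the searcher’s term to its descriptor and then optionally adds narrower and related descriptors, one level only. A system would then combine the resulting terms with OR.

```python
# Hypothetical thesaurus relations used for expanding a search.
USE = {"Cell biology": "CYTOLOGY"}
NT = {"TELEVISION": ["CABLE TELEVISION", "SATELLITE TELEVISION"]}
RT = {"TELEVISION": ["BROADCASTING"], "CYTOLOGY": ["HISTOLOGY"]}


def expand(term, narrower=True, related=False):
    """Map a search term to its descriptor, then add narrower and/or
    related descriptors (one level of the hierarchy only)."""
    descriptor = USE.get(term, term)
    terms = [descriptor]
    if narrower:
        terms += NT.get(descriptor, [])
    if related:
        terms += RT.get(descriptor, [])
    return terms


print(expand("TELEVISION"))
# ['TELEVISION', 'CABLE TELEVISION', 'SATELLITE TELEVISION']
print(expand("Cell biology", narrower=False, related=True))  # ['CYTOLOGY', 'HISTOLOGY']
```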

K.4.5.12 Use of Thesaurus

According to Rowley, the early thesauri were constructed for use with card-based post-coordinate indexing systems and with early computerized information retrieval systems. These thesauri were typically printed (or typed) and were used alongside the index for which they were designed, to assist with both indexing and searching of the database [12]. Even in the age of online databases, a thesaurus remains important: it can be used both in indexing the documents listed in databases and in retrieving the needed information from them. Many databases have their own online thesauri. Thus a thesaurus can be used in [12]:
● The intellectual assignment of indexing terms to documents as their records enter the system;
● Searching of databases through appropriate search terms selected by users from the thesaurus.
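Both uses listed above depend on indexer and searcher translating their terms into the same descriptors. The sketch below models a post-coordinate index as an inverted file, in the spirit of the card-based systems Rowley mentions. Each descriptor holds the numbers of the documents indexed under it, and a search intersects the postings of all the descriptors requested. The descriptors, the cross-references, and the function names are hypothetical.

```python
from collections import defaultdict

USE = {"Cell biology": "CYTOLOGY", "Microcopies": "MICROFORMS"}  # hypothetical
DESCRIPTORS = {"CYTOLOGY", "MICROFORMS", "PRESERVATION"}


def to_descriptor(term):
    """Translate an indexer's or searcher's term into the system language."""
    term = USE.get(term, term)
    if term not in DESCRIPTORS:
        raise ValueError(f"{term!r} is not in the thesaurus")
    return term


inverted = defaultdict(set)  # descriptor -> document numbers, like term cards


def index_document(doc_id, *terms):
    """Assign descriptors to a document as its record enters the system."""
    for t in terms:
        inverted[to_descriptor(t)].add(doc_id)


def search(*terms):
    """Post-coordinate search: AND together the postings of all descriptors."""
    postings = [inverted[to_descriptor(t)] for t in terms]
    return sorted(set.intersection(*postings)) if postings else []


index_document(1, "Microcopies", "PRESERVATION")
index_document(2, "MICROFORMS")
index_document(3, "Cell biology")
print(search("Microcopies", "PRESERVATION"))  # [1]
```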

K.4.5.13 Evaluation of Thesaurus
The efficiency of a thesaurus depends on its ability to support precise and quick indexing and retrieval of information. Evaluation of a thesaurus is therefore necessary so that its lacunae, if any, can be identified and remedial measures taken. The factors to be considered in evaluating a thesaurus are:
● Specificity of terms;
● Completeness of the thesaurus;
● Level of pre-coordination;
● Word forms, direct entry, and other matters of consistency;
● Extent of linkage;
● Extent of synonyms.
Various quantitative measures have been developed for evaluation work.
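The published quantitative measures are not reproduced here. As a purely illustrative sketch, two of the listed factors can be given simple counts: average hierarchical and associative links per preferred term (extent of linkage) and average non-preferred entries per preferred term (extent of synonyms). These ratios and the dictionary layout are hypothetical stand-ins, not established evaluation metrics.

```python
def simple_profile(thesaurus):
    """Hypothetical ratios for 'extent of linkage' and 'extent of synonyms'."""
    # Preferred terms are taken to be entries that carry no USE reference.
    preferred = [t for t, rel in thesaurus.items() if "USE" not in rel]
    n = len(preferred)
    links = sum(
        len(thesaurus[t].get(tag, ())) for t in preferred for tag in ("BT", "NT", "RT")
    )
    synonyms = sum(len(thesaurus[t].get("UF", ())) for t in preferred)
    return {
        "preferred_terms": n,
        "links_per_term": links / n if n else 0.0,
        "synonyms_per_term": synonyms / n if n else 0.0,
    }


th = {
    "Vehicles": {"NT": {"Cars"}},
    "Cars": {"BT": {"Vehicles"}, "RT": {"Roads"}, "UF": {"Automobiles", "Motor cars"}},
    "Roads": {"RT": {"Cars"}},
    "Automobiles": {"USE": {"Cars"}},
    "Motor cars": {"USE": {"Cars"}},
}
profile = simple_profile(th)
print(profile["preferred_terms"], round(profile["links_per_term"], 2))  # 3 1.33
```

Comparing such figures between successive editions, rather than reading them in isolation, is probably the more useful way to spot gaps.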

K.4.6 Ontology
An ontology, like a thesaurus, is a kind of taxonomy that has structure and specific types of relationships between terms belonging to a domain of knowledge, expressed in a machine-readable format. In an ontology, the types of relationships are greater in number and variety and more specific in their function. Relationships can include, for example, "located in" to relate an organization to a place, "produces/is produced by" to relate a company and its product, and "employer/employed by" to relate a company and a person. Information that, in a simple controlled vocabulary or taxonomy, is conveyed through indexing is embedded in the ontology itself. Ontological relationships are used in more complex information systems, such as the Semantic Web [2,4]. Ontology is discussed in detail in Chapter Y.
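To show how typed relationships carry information that a thesaurus would leave to indexing, the sketch below stores facts as (subject, relation, object) triples and adds the inverse triple automatically for the paired relationships named in the text. The entity names are hypothetical, "located in" is left without an inverse because the text names none, and a plain Python set stands in for the RDF/OWL tooling a real Semantic Web system would typically use.

```python
# Inverse pairs for the relationship types mentioned in the text.
INVERSES = {
    "produces": "is produced by",
    "is produced by": "produces",
    "employer": "employed by",
    "employed by": "employer",
}


class TinyOntology:
    """A set of (subject, relation, object) triples; inverses are added automatically."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, relation, obj):
        self.triples.add((subject, relation, obj))
        inverse = INVERSES.get(relation)
        if inverse is not None:
            self.triples.add((obj, inverse, subject))

    def query(self, subject=None, relation=None, obj=None):
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (subject is None or t[0] == subject)
            and (relation is None or t[1] == relation)
            and (obj is None or t[2] == obj)
        )


onto = TinyOntology()
onto.add("Acme Ltd", "located in", "Kolkata")    # hypothetical organization and place
onto.add("Acme Ltd", "produces", "Widget X")     # hypothetical product
onto.add("Acme Ltd", "employer", "R. Sen")       # hypothetical person
print(onto.query(subject="Widget X"))  # [('Widget X', 'is produced by', 'Acme Ltd')]
```

The query on "Widget X" answers "who makes this?" directly from the stored relationships, without any document having been indexed under both terms.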

K.4.7 Synonym Ring List
A synonym ring is a set of terms that are considered equivalent for the purposes of retrieval. Synonym rings are not used during the indexing process; they are used only during retrieval. Thus, although a synonym ring is considered a type of controlled vocabulary, it plays a somewhat different role from other controlled vocabularies. Use of synonym rings ensures that a concept that can be described by multiple synonymous or quasi-synonymous terms will be retrieved if any one of those terms is used in a search. A synonym ring allows users to access all content objects or database entries containing any one of the terms. Synonym rings are generally used in the interface of an electronic information system, and they provide access to content that is represented in natural, uncontrolled language [18]. A synonym ring may be illustrated in the following way.

Synonym ring: Mental Illness, Mental Sickness, Mental Disorder, Mental Disease, Mental Abnormality, Psychological Disorder

Adapted from ANSI/NISO Z39.19-2005 ISBN: 1-880124-65-3 Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies.

It may be pointed out that synonym rings are used specifically to broaden retrieval (this is often referred to as query expansion). Synonym rings may, therefore, contain near-synonyms with similar or related meanings, rather than being restricted to terms that are true synonyms [3].
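Query expansion with a synonym ring can be sketched in a few lines: the search term is replaced by every member of the ring that contains it, and the documents, written in uncontrolled language, are matched against all of those variants. The sketch below uses the six terms of the mental illness ring; the sample documents are hypothetical, and plain case-insensitive substring matching is a deliberate simplification of what a real retrieval system would do.

```python
# The six terms of the mental illness synonym ring, lower-cased for matching.
MENTAL_ILLNESS_RING = frozenset({
    "mental illness", "mental sickness", "mental disorder",
    "mental disease", "mental abnormality", "psychological disorder",
})


def expand_query(term, rings):
    """Return every term in the ring containing `term`, or just the term itself."""
    t = term.strip().lower()
    for ring in rings:
        if t in ring:
            return set(ring)
    return {t}


def search(docs, term, rings):
    """Naive substring search over uncontrolled text, using the expanded query."""
    variants = expand_query(term, rings)
    return sorted(
        doc_id for doc_id, text in docs.items()
        if any(v in text.lower() for v in variants)
    )


docs = {  # hypothetical document titles
    "D1": "Treatment of mental disorder in adolescents",
    "D2": "Psychological disorder and sleep quality",
    "D3": "Heart disease in older adults",
}
print(search(docs, "Mental Illness", [MENTAL_ILLNESS_RING]))  # ['D1', 'D2']
```

Without the ring, the same search for "Mental Illness" returns nothing, which is exactly the recall loss that synonym rings are meant to prevent. No preferred term is chosen at any point, which is why a synonym ring plays no part in indexing.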

K.4.8 Folksonomy
A folksonomy is a user-generated classification system of web content that allows users to tag their favorite web resources with words or phrases of their own choosing from natural language. These tags (also called concepts, categories, facets, or entities) can be used to classify web resources and to express users' preferences. Folksonomy-based systems allow users to classify web resources by tagging bookmarks, photos, or other web resources and saving them to a public website. Thus, information about web resources and online articles can be shared easily [20]. This concept is discussed in more detail in Chapter Y.
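Because a folksonomy is built from individual tagging acts rather than from an authoritative list, its "vocabulary" emerges from aggregation. The sketch below assumes each tagging act is a hypothetical (user, resource, tag) triple. It lower-cases tags and counts a user only once per tag, both of which are simplifying assumptions, and then reports the most popular tags for a resource.

```python
from collections import Counter


def build_folksonomy(taggings):
    """Count, per resource, how many distinct users applied each (lower-cased) tag."""
    seen = set()
    counts = {}
    for user, resource, tag in taggings:
        key = (user, resource, tag.strip().lower())
        if key in seen:
            continue  # the same user repeating the same tag is counted once
        seen.add(key)
        counts.setdefault(resource, Counter())[key[2]] += 1
    return counts


def top_tags(folksonomy, resource, n=3):
    """Most frequently applied tags for a resource."""
    return folksonomy.get(resource, Counter()).most_common(n)


taggings = [  # hypothetical users, resources, and tags
    ("asha", "photo1", "sunset"),
    ("ravi", "photo1", "Sunset"),
    ("ravi", "photo1", "beach"),
    ("asha", "photo1", "sunset"),
    ("mina", "article7", "indexing"),
]
print(top_tags(build_folksonomy(taggings), "photo1"))  # [('sunset', 2), ('beach', 1)]
```

Nothing here restricts which tags may be used; the only "control" is statistical, which is the contrast with the controlled vocabularies described earlier in this chapter.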

K.5 ADVANTAGES OF CV
So far as subject indexing is concerned, a controlled vocabulary can be more advantageous than natural language vocabulary. The main advantages of a controlled vocabulary are:
● It ensures consistent indexing;
● It helps the indexer select preferred terms;
● It helps achieve high precision in searching.

K.6 DISADVANTAGES OF CV
The disadvantages of using a controlled vocabulary in subject indexing are:
● A controlled vocabulary is likely to be more costly to use than natural language vocabulary;
● The user has to be familiar with the controlled vocabulary scheme to make the best use of the system;
● A controlled vocabulary can quickly become outdated owing to constant developments in the domain concerned.

REFERENCES
[1] A. Redmond-Neal, Building taxonomies (PPT presentation), 2006.
[2] H. Hedden, Taxonomies, thesauri, and controlled vocabularies.
[3] What are controlled vocabularies?
[4] American Society for Indexing, Taxonomy and Controlled Vocabularies SIG, About taxonomies and controlled vocabularies.
[5] American Library Association, ALA Glossary of Library and Information Science, ALA, Chicago, 1983, p. 220.
[6] Shorter Oxford English Dictionary, vol. 2, third ed., Clarendon Press, Oxford, 1975, p. 22.
[7] A. Chatterjee, Thesaurus – an aid to information retrieval, in: P. Dhyani (Ed.), Information Science and Libraries, Atlantic Publishers, New Delhi, 1990, pp. 43–65.
[8] P.M. Roget, Thesaurus of English Words and Phrases, enlarged by John Lewis Roget, Grosset & Dunlap, New York, 1974.
[9] International Organization for Standardization, Guidelines for the establishment and development of monolingual thesauri (ISO 2788:1976, rev. in 1986).
[10] A. Kent, Information Analysis and Retrieval, third ed., Becker and Hayes, New York, 1971, p. 230.
[11] A. Chatterjee, Information retrieval thesaurus, its structure, function and construction, in: S.B. Ghosh, J.N. Satpathy (Eds.), Subject Indexing Systems: Concepts, Methods and Techniques, IASLIC, Calcutta, 1998, pp. 41–65.
[12] J. Rowley, Abstracting and Indexing, second ed., Clive Bingley, London, 1988.
[13] J. Aitchison, et al. (Comp.), Thesaurofacet, English Electric Company, Whetstone, 1969, p. xiv.
[14] A. Gilchrist, The Thesaurus in Retrieval, ASLIB, London, 1971, pp. 4–5.
[15] A. Neelameghan, Non-hierarchical associative relationships, in: DRTC and INSDOC Seminar on Thesaurus in Information Systems: Papers, DRTC, Bangalore, 1975, pp. A1–A7.
[16] UNESCO-PGI, UNISIST, Guidelines for the Establishment and Development of Monolingual Thesauri, second ed., UNESCO, Paris, 1981.
[17] British Standards Institution, Guidelines for the Establishment and Development of Monolingual Thesauri (BS 5723:1979), BSI, London, 1979.
[18] J. Aitchison, A. Gilchrist, Thesaurus Construction: A Practical Manual, ASLIB, London, 1972, p. 48.
[19] Structure of controlled vocabularies.
[20] A. Noruzi, Folksonomies: (Un)controlled vocabulary?, Knowl. Org. 33 (4) (2006) 199–203.