<<

432 Knowl. Org. 39(2012)No.6 G. Baião, G. Â. Borém Lima. Using Topic Maps in Establishing Compatibility ...

Using Topic Maps in Establishing Compatibility of Semantically Structured Hypertext Contents

Guilherme Baião Salgado Silva* and Gercina Ângela Borém de Oliveira Lima** * Universidade Federal de Minas Gerais, Av. Antônio Carlos 6627 Sala: 4053 – Pampulha 31270-010 Belo Horizonte/MG, Brazil ** Universidade Federal de Minas Gerais, Av. Antônio Carlos 6627 Sala: 4053 – Pampulha 31270-010 Belo Horizonte/MG, Brazil

Guilherme Baião Salgado Silva holds a Master’s in Information Science (UFMG, Brazil) and a Bachelor’s in science (Pontifícia Universidade Católica, Brazil), and is a specialist in (Centro Universitário de Belo Horizonte, Brazil). He works as a systems analyst-programmer and manager of in- formation of the Court of Justice of Minas Gerais State (Brazil).

Gercina Ângela Borém Lima is full professor of information science at the Information Science School – UFMG (Brazil). She holds a PhD in Information Science from UFMG (Brazil) and a Master’s in Li- brary Science from Clark Atlanta University (USA). Currently, she is the vice president of ISKO’s Bra- zilian Chapter, the coordinator of the Graduate Program in Information Science at UFMG. She has been funded by a grant from the research agency CNPq to coordinate the Research Group MHTX (models for hypertext organization) since 2010. Current research includes the areas of knowledge or- ganization and hypertext systems, both applied to .

Baião Salgado Silva, Guilherme, and Lima, Gercina Ângela Borém de Oliveira. Using Topic Maps in Es- tablishing Compatibility of Semantically Structured Hypertext Contents. Knowledge Organization. 39(6), 432-445. 30 references.

ABSTRACT: Considering the characteristics of hypertext systems and problems such as cognitive overload and the disorientation of users, this project studies subject hypertext documents that have un- dergone conceptual structuring using facets for content representation and improvement of informa- tion retrieval during navigation. The main objective was to assess the possibility of the application of technology for automating the compatibilization process of these structures. For this purpose, two dissertations from the UFMG Information Science Post-Graduation Program were adopted as samples. Both dissertations had been duly analyzed and structured on the MHTX (Hypertextual Map) prototype database. The faceted structures of both dissertations, which had been represented in conceptual maps, were then converted into topic maps. It was then possible to use the merge property of the topic maps to pro- mote the semantic interrelationship between the maps and, consequently, between the hypertextual information resources proper. The merge results were then analyzed in the light of theories dealing with the compatibilization of languages developed within the realm of information technology and librarianship from the 1960s on. The main goals accomplished were: (a) the de- tailed conceptualization of the merge process of the topic maps, considering the possible compatibilization levels and the appli- cability of this technology in the integration of faceted structures; and (b) the production of a detailed sequence of steps that may be used in the implementation of topic maps based on faceted structures.

Received 10 July 2010; Revised 14 August 2012; Accepted 23 August 2012

Knowl. Org. 39(2012)No.6 433 G. Baião, G. Â. Borém Lima. Using Topic Maps in Establishing Compatibility ...

1.0 Introduction with others, in distinct ways, in each of these struc- tures. The growing use of hypertext systems through the This study is intended to assess, in the light of Web has triggered a series of changes in information theories for the compatibility of languages and vo- organization, storage, and retrieval concepts. With cabularies, topic map standard as a tool for the auto- their non-linear associative structure, hypertext mated integration of heterogeneous faceted concep- documents offer a number of innovations in relation tual structures in hypertextual contents. These maps to traditional printed documents, mainly in regard to may be used for the representation of information us- research and learning processes, since hypertextual ing Knowledge Organisation Systems (KOS), such as reading is predominantly based on navigation. On the thesauri, taxonomies, conceptual maps, and faceted other hand, hypertext systems have some disadvan- classifications (Garshol 2004). tages for users, including cognitive overload and dis- orientation during navigation. 2.0 Topic maps With the goal of mitigating such problems, several hypertext structured navigation systems have been Topic maps form a standard that allows for the im- proposed. These new models are based on the appli- plementation of information representation instru- cation of techniques and methodologies for the or- ments, a standard that is used to describe and navigate ganization of information in the hypertextual seman- through informational objects in digital systems in a tic network and on interfaces that allow for oriented contextualized manner. According to Park (2003), navigation supported by the ability of visualizing the topic maps and the organization of knowledge are global structure of the network in a number of ways, natural partners, since the first one was invented so as such as hierarchical diagrams and conceptual maps. to meet the requirements and solve problems with Some projects, like the Hypertextual Map (MHTX) the latter, mainly in respect to the semantic integra- proposed by Lima (2004), suggest the application of tion (or compatibilization of heterogeneous subject the theory of facet analysis in the analysis of tradi- schemes) and shared construction of conceptualiza- tional documents and their conversion into hypertext- tions (or mental models). For Park, the construction ual documents for later availability in a digital library of topic maps must always be based on theories, of theses and dissertations. However, the hypertext- techniques, and methodologies developed by knowl- ual documents in these repositories have heterogene- edge organization for the construction of intellectual ous semantic structures, unified search requires the structures, such as faceted classification. identification of interrelationships between document As in other information representation instru- structures, a rather complex process. ments (such as thesauri and concept maps, for exam- Therefore, the problematic issue to be dealt with in ple), topic maps are based on the representation of this study is characterized by the existence of distinct subjects (topics) through relations (associations). conceptual structures for the representation of differ- Based on the mapping of subjects and associations, ent hypertextual documents in one same repository “paths” that point to information resources that are (a digital library, for example), and which may ap- relevant for these subjects are constructed. This para- proach subjects that are correlated among themselves digm requires that the topic map’s author think in or even approach the same subject through different terms of topics (subject matter, conversation topics, terms and contexts. For that reason, the creation of specific notions, ideas or concepts) and associate mechanisms that enable the interrelationship between various types of information to a specific topic. For these structures tends to facilitate the retrieval of in- example, a topic called “hardware” can be associated formation by users. with another “topic” called “printer” throught an as- Considering that the subject matter analysis of sociation of type “is a.” The topic “printer,” in turn, each document was conducted based on theory of can be linked to different kinds of information re- facet analysis and supported by literature warranty in sources (occurrences), like an image of a printer or a relation to the documents analyzed, at the end of the text that describe its functionality. process we shall have different controlled vocabular- Lima and Fagundes (2004) note that the original ies being used for the organization of documents. application of this language was the construction of These differences occur mainly due to the possibility indexes and glossaries for documents, which was later of designating different terms for the same subject extended to the Web for the definition of relations be- and to the possibility of one subject being associated tween entities, constituting one structure that makes 434 Knowl. Org. 39(2012)No.6 G. Baião, G. Â. Borém Lima. Using Topic Maps in Establishing Compatibility ...

data search more efficient. Librelotto (2005) lists the components, the first group was named TAO by following as main objectives of topic maps: Pepper and the second group, IFS. Comments will be made on each of these concepts below. a) to structure information resources by means A topic is the representation of a subject. More of mechanisms that are external to these re- generically, a subject could be anything: a person, a sources; concept, an entity, etc. In an ideal representation, the b) to allow for searches that retrieve the desired relation between topics and subjects is of a “one-to- information; and, one” type; that is, each topic represents one and only c) create different views for users or specific ap- one subject, and each subject is represented by a single plications by means of information filters. topic. Within the topic map, any topic is an instance of Garshol (2002) observes that, through topic maps, it one or more “topic types.” For instance, it can be said is possible to create external indexes to describe in- that the “indexer” topic is of type “Person” (the type formation that is present in documents or databases. “Person” being one topic in itself), and that the “sub- According to Ahmed and Moore (2006), the sepa- ject matter analysis” and “translation” topics are of the ration between the conceptual structure and indexed “Process” type and “book” is a “material” type topic. resources presents three remarkable benefits: For each topic in the map, it is possible to define one or more topic names. Librelotto (2005) notes that a) the map serves as one representation, at the the possibility of attributing various names to one highest level, of the indexed information, thus topic may have two objectives: to facilitate the under- facilitating the location of resources and the standing of the meaning of the topic through alterna- conceptual structuring of those resources being tive descriptions and to indicate the use of different represented; names in different contexts, like language, domain, b) ease of topic map fragmentation so as to geographical area, historical period, and so on. Ac- meet different requirements; and, cording to Park (2003), the utilization of multiple c) ease of combination among topic maps that names for one same topic is a fundamental requisite index different sets of resources. for the robustness, scalability, and interoperability of maps. The topic map standard was initially conceived in the In a topic map, associations are meant to express a beginning of the 1990s. Since then, several research relationship between one or more topics. An associa- studies have been conducted with the goal of creating tion of topics is, formally, a linking element that de- norms and standards for their use, as well as of assess- scribes the relations between two or more topics ing their applicability. In the late ’90s, the ISO/IEC (Pepper 2000) and the roles each of them play in that group created an international standard ISO 13250 association. Associations between topics may also be for the formalization of knowledge structures repre- grouped according to their association types, which, sented by this technology. Soon after the publication in turn, are also defined through topics. In these as- of ISO 13250, studies were begun aimed at its adapta- sociations as well, as cited in Pepper (2000), we can tion to XML (eXtensible ), which define also the role of each topic. For this purpose, already stood out in terms of acceptance as a language the association roles element is utilized. For example, for communication among information systems on when there is a need to represent, in a topic map, a the Web (Park and Hunting 2003). At that same time, whole-part type relation between subject matter the independent organization To p i c M a p s . o r g was analysis processes and subject extraction, it is possible founded with the goal of creating an XML specifica- to define not only the type of relation (whole-part), tion for topic maps. XTM 1.0 (XML Topic Maps) ap- but also to define that the whole is the subject matter peared in the beginning of 2001. analysis and the part is the extraction of subjects. According to Pepper (2000), the topic map stan- Contrary to what happens in thesauri and in other dard was created based on three main components traditional classification schemes, topic maps allow (also present in the bibliographical indexes): topics, for the representation of any type of relation between associations, and occurrences. In addition to these subjects. The class-instance type relation is estab- three, Pepper also highlights other important ele- lished by typifying topics, associations, or occur- ments in the topic maps: subject identity, facets, and rences (Pepper 2000). For the remaining types of re- scope. Due to the mnemonic composition of these lations, new associations must be defined. Knowl. Org. 39(2012)No.6 435 G. Baião, G. Â. Borém Lima. Using Topic Maps in Establishing Compatibility ...

In topic maps, an occurrence is any information ambiguities and reducing the chance of errors when that, in some manner, is specified as being relevant for merging topic maps (Pepper 2001). Topic characteris- a certain topic (subject). That is, occurrences allow a tics are understood to be the names, associations, and topic to be related to any number of resources con- occurrences in which the topic takes part. According sidered to be relevant for it. These resources are then to Ahmed and Moore (2006) “scope is the term used called topic occurrences. Ruiz (2005, 86) defines oc- in the topic map norm to refer to a restriction or a currence as “external information resources, linked context in which something is said about a topic.” If by a reference that is useful for its location, which de- no scope is attributed to the name, to the occurrence, fine or exemplify the meaning of the topic.” For Pep- or to the association, this means that all topic charac- per (2001), occurrences may be external to the topic teristics are valid in any situation. The merge process (one Web page about that topic, for instance) or in- is a resource made available by topic maps to integrate ternal (defined in the map proper). As with topics two maps into one, in a manner that guarantees that and associations, occurrences may also be classified topics that represent the same subject in the two maps by means of occurrence types. Therefore, it is possi- are transformed into one single topic in the resulting ble, for example, to distinguish those occurrences map. When this merging takes place, the characteris- that define the topic from those occurrences that ex- tics of the merged topics are to be unified, thus elimi- emplify or instanciate the topic. nating duplication, and kept in the resulting topic As mentioned before, in the second group of topic (names, occurrences, associations, and identifiers). map components defined by Pepper (2000), we will This process can be automated through applica- find the element responsible for the identification of tions that are specific for the manipulation of topic subjects, facets, and scope. maps. These applications use two distinct possibilities The very first of them, the subject identifier, is in- to determine that two topics represent the same sub- tended to guarantee that a subject is represented by ject. The first one occurs when the single subject iden- one single topic in the map. For that purpose, the tifier is the same for two distinct topics, whether the subject can be identified in the topic map through: identifier is an addressable object or is a PSI. The sec- ond occurs when there are one or more names of a a) already existing and addressable electronic re- topic that coincide with one or more names of an- sources, such as an image on a webpage, for in- other. stance, or The second case requires that the homonymy b) a subject indicator created and made available for problem be taken into account, that is, when one this specific purpose, since the subject in question same term represents two or more different subjects. is not addressable, such as a subject “Brazil,” for In order to deal with this problem, applications can instance. take two different paths: in the first situation, the ap- plication presents to the user those topics that were In the latter case, the resource created by a commu- identified with the same name so that the user may nity with the specific goal of providing a subject with decide whether the topic should be merged or not. In one single identification is called a published subject this case, the process is no longer totally automated; indicator or simply PSI. The Topicmaps.org site in the second situation, the application takes into ac- (http://www.topicmaps.org/xtm/), for example, de- count the equivalence of the names only if they are fines a series of PSIs related to electronically non- defined for the same scope, thereby turning it into a addressable subjects, such as languages, countries, and fully automated process. the subjects themselves that are involved in the con- Another merging possibility among map elements struction of topic maps. The second component, the is the use of the mergemap component to specify the facets, is in fact used to describe occur- topics that are to be considered as being one only. rences within a topic map. This resource was adopted The mergemap refers to an external map through a by standard ISO 13250 and eliminated by the XTM URI, thus allowing for the indication, in the map top- standard, since, in the latter, the resource and the ics, of corresponding topics in other maps. property value must be considered topics that are as- Since some topic maps may contain a great quantity sociated by means of “applicable to” type associations of topics and associations, it is currently possible to (Park 2003). find tools that enable the of topic maps The scope is used in topic maps to define the con- through different interfaces: textual, generally made text in which topic characteristics are valid, removing available through HTML pages, graphs, trees or maps 436 Knowl. Org. 39(2012)No.6 G. Baião, G. Â. Borém Lima. Using Topic Maps in Establishing Compatibility ...

of several different types, including tridimensional Gernot Wersig, Eric Coates, Elaine Svenonius, Da- types. Omnigator, for instance, enables the visualiza- gobert Soergel, Linda C. Smith and V. M. Glushkov. tion of maps both in page format and in graphic for- In addition to these authors, Lancaster (1986) points mat (http://www.ontopia.net/omnigator/). Another out the work of H. Neville and R. T. Niehoff, cited in example is the TMNav, which allows for the visualiza- Soergel (1974). tion of maps in several different graphic forms, among In 1971, UNESCO’s UNISIST report dedicated an which is the tree form. TMNav is one of the tools entire section to the conversion and compatibility of available in the TM4J (http://tm4j.org/), which is an indexing instruments. The report defines compatibili- application package for the creation, management, and zation and conversion as follows (Dahlberg 1981, 86): visualization of topic maps. The most recent studies about topic maps concentrate on the implementation The report defines ‘compatibility’ as the quality of resources aimed at making the definition of restric- of systems whose products may be used in an tions in relationships1 and inference rules viable. interchangeable manner, despite the differences in notation, structure, physical support, etc., 3.0 Compatibilization of languages and with no special automated conversion; and controlled vocabularies ‘conversion’ as the process through which in- formation entries are transformed based on Studies concerned with the compatibilization of lan- code transcription, data structure, etc., thus al- guages were started in the 1960s (Dahlberg 1981). lowing for them to be interchangeable between Dahlberg (1981) highlights the work conducted by two or more services or systems that utilize dif- William Hammond and Staffan Rosenborg, Simon M. ferent conventions and media. Newman, Madeline M. Henderson, John S. Moats, and Mary Elizabeth Stevens. One of the main results In the following decade, the , there was little re- reached by the studies at that time was the definition search into the theme (Maniez 1997, 213). At that of some core concepts for the field, such as converti- time, technologies that allowed for the interchange of bility, compatibility, and their respective applications, data among different bases were rather restricted. In mainly within the scope of cooperation among in- the 1990s, studies were then resumed due to the formation centers. William Hammond, in 1965, de- global expansion of information networks, the ease of fined compatibility as: “the capacity of an informa- simultaneous access to different collections and the tion system to accept indexing data and summariza- increased amount of interchange among different da- tion from another system on any subject matter that tabases, most importantly over the Web (Maniez is common to both” (Newman 1965, 7). 1997; Hudon 2004). These studies presented new According to Maniez (1997) the words “integra- definitions for compatibility. Noteworthy among tion,” “harmonization,” “reconciliation,” and “con- them is the definition provided by Gerhard Riesthuis cordance” are usually applied to indicate the concept in a seminar on system integration and compatibiliza- of convergence, that is, the same concept of “com- tion held in 1995, in Varsovia: compatibility means patibilization.” However, Lancaster (1986) considers that, for each term in a language there is a corre- integration of vocabularies as a specific method of sponding term with the same meaning in another lan- compatibilization, in which there is no conceptual guage, and that is why it is possible to convert terms analysis, that is, for a certain user request, all related of one language into another, with no alterations in occurrences (including word variations) and elements meaning (Maniez 1997). Maniez (1997) also com- are shown as a result, without showing the relation or ments that the compatibilization of languages is a the context in which each one of them are present. crucial aspect in any system, Coates (1970) comments that up until then coop- since, for these systems, users are required to use a eration among libraries was based solely on the union language that is compatible with that which was used of catalogues, that is, there were no efficient mecha- for indexing the entries, considering the subjectivity nisms that allowed for the exchange of descriptions of implicit in the indexing process. a subject matter. However, according to Maniez In Brazilian literature, among studies produced (1997), the 1970s witnessed an increase in the produc- since the beginning of the current decade, we high- tion of studies on language compatibility. Some rele- light those conducted by Dr. Maria Luiza de Almeida vant authors of that time cited by Dahlberg (1978) Campos. Campos (2005) cites as the main language were, among others: Hans Wellisch, Verina Horsnell, compatibility theorists in information science: Soer- Knowl. Org. 39(2012)No.6 437 G. Baião, G. Â. Borém Lima. Using Topic Maps in Establishing Compatibility ...

gel (1982), Dahlberg (1981), Neville (1970), and f) the lack of subjects in some of the systems, Glushkov et al. (1978). although this same subject occurs in other sys- According to Zhang (2006), the objective of pro- tems involved. This last obstacle tends to be moting the compatibility of indexing languages is to bigger as the overlaying of subject matters dealt enable users, through a single search strategy, to re- with by those vocabularies involved decreases. trieve information stored in any of the bases or re- positories that comprise a certain information re- In addition to these problems, Lancaster (1986) trieval system. For Rada (1987), the integration2 of comments on problems related to the use of compu- controlled indexing languages may come to serve for tational processing in an attempt to automate the different purposes: compatibilization process and notes, as the main is- sue, the possibility that one same term may have dif- a) to enable users to use a term for a certain ferent meanings. For the same author, another prob- subject and, based on that, allow for the indica- lem is the existence of errors in the form in which the tion of appropriate descriptors in each base; terms are written in some systems. For Doerr (2001), b) to permit the indexing process to be shared the difficulties found in the compatibilization process through the possibility of converting one the- originate mainly in the subjectivity of the choice of saurus into another; descriptors and hierarchical relations during the con- c) to make the identification of relations be- struction of controlled vocabularies. tween different controlled vocabulary subjects The compatibilization of controlled vocabularies possible; and can occur at three levels: terminological (or verbal), d) to increase the vocabulary content. conceptual, and structural (Soergel 1982). At the first one, the terminological level, procedures are based on In 1965, William Hammond had listed the following the comparison of names or descriptors that have compatibilization objectives (Newman 1965): been attributed to the subjects. The second level, conceptual, may be considered the most complex of a) to provide for the dissemination of informa- all, since it is at this level that nearly all problems pre- tion resulting from ongoing searches; viously mentioned are found. Due to this complexity, b) to eliminate the need for doubled indexing contrary to what occurs at the terminological level, and for search report summaries; and compatibilization hardly ever occurs in an automated c) to furnish for the retrieval of reports stored manner at the conceptual level. Also at the conceptual in different repositories, based on the original level, methods for the validation of the previous stage indexing. may be proposed, in view of the fact that homonyms may generate erroneous terminological compatibiliza- For Guinchat and Menou (1994), the compatibility be- tion from the conceptual point of view. In turn, com- tween documentary languages is of utmost importance patibilization at the structural level is responsible for for the exchange of information between units that deal guaranteeing the conformity of those relations estab- with the same or interrelated subject matters. Accord- lished among elements. ing to Soergel (1982), the main obstacles found in the Hudon (2004) mentions a fourth level, the com- controlled vocabulary compatibilization process are: patibilization of subject matter. This level is related to the possibility of representing the same subject mat- a) different levels of precombination of the de- ters in different controlled languages, whether by scriptors used by each of the systems; means of a single element or through the combina- b) different degrees of specificity for subjects tion of two or more of them. Glushkov et al. (1978), that compose each one of the systems; in turn, divided the compatibilization of vocabularies c) different ways to relate one specific term to into two levels: the semantic level, linked to the ca- one single generic term in the case of polyhier- pacity of both to represent the same body of knowl- archies; 3 edge, and the structural level, related to the similarity d) different possibilities for the aggregation of according to internal characteristics, which corre- specific subjects formed by terms that, in an iso- sponds to the terminological, conceptual, and struc- lated manner, do not represent subjects; tural levels mentioned by Soergel (1982). e) the possibility of different connotations for In the compatibilization model proposed by Dahl- one same term (homonyms); and berg (1981), for instance, the three noticeable stages 438 Knowl. Org. 39(2012)No.6 G. Baião, G. Â. Borém Lima. Using Topic Maps in Establishing Compatibility ...

of compatibilization are: verbal (alphabetical com- from distinct vocabularies: (1) exact correspondence, parison matrix constructed in topic 4 of the author’s (2) synonyms, (3) generic to specific, (4), mappable study), conceptual (topics 5.1 to 5.4) and structural to a different pre-coordination level, (6) semantic fac- (5.5 to 5.7). toring, and (6) antonyms. It is important to note, We found different techniques and algorithms for however, that Neville (1970) did not consider the the integration of languages, among which we may treatment of homonyms, except in those cases in highlight mapping, the use of intermediary languages, which at least one of the vocabularies involved con- microthesauri, macrothesauri, and the universal the- strues the occurrence of homonymy, as well as, for saurus. Among these, mapping is the one most often example, a case in which a vocabulary distinguishes found in references to the subject. the elements “tank (war vehicle)” and “tank (con- The mapping consists of determining, for each tainer)” and the other vocabulary presents only the term of a vocabulary, the term with the closest corre- term “tank.” Also with respect to homonymy, it is spondence in another vocabulary. This mapping may worth noting that it is a type of correspondence that be unidirectional or bidirectional. Unidirectional is not as common when we are working with vocabu- mapping identifies, for those terms in a first vocabu- laries from the same knowledge field, as was done in lary X, those with the closest possible equivalence in the study by Neville. a second vocabulary Y. Therefore, all terms from X For Tennis (2001), the work of Neville (1970) will have a correspondent in Y, but not the other way represents a specific type of mapping, the supra- around. For the reverse order also to take place, it is thesauri, which may be defined as a type of mapping necessary to undertake bidirectional mapping, that is, that utilizes intermediary language, in which this lan- also map those equivalent terms in X for each term in guage is generated after an iterative process of con- Y. According to Doerr (2001), mapping is the basic ceptual analysis of the involved thesauri. Therefore, process in all other compatibilization techniques. the supra-thesauri only exist after the analysis of a As for the number of vocabularies involved in- predefined set of thesauri, and that is why its applica- creases, the situation becomes extremely complex, tion is limited to this group. This procedure is not and a more viable alternative is then the use of inter- supported by any literary warrant, as it is replaced by mediary languages. In this case, each vocabulary must the conceptual warrant derived from the involved vo- be mapped bidirectionally with the intermediary lan- cabulary. guage and, in so doing, produce the correspondence Soergel (1974) presents a table for the mapping of among all vocabularies involved. controlled vocabularies in which he suggests five pos- With respect to the main difficulties found during sible levels of conceptual equivalence among elements mapping (whether intermediary languages were used of distinct languages and, for each of these levels, five or not), Lancaster (1986) points out: possible levels of terminological equivalence: same term, distinct terms, and descriptor combinations a) differences between pre-coordination levels (utilizing logical operators). The table proposed by of involved vocabularies; Soergel (1974) is presented in Table 1 below. b) differences between levels of specificity; and The mapping of one vocabulary into another is a c) the lack of corresponding elements in one of tiresome and time-consuming intellectual task (Lan- the vocabularies. caster, 1986). For this reason, some studies were con- ducted in the late 1960s and early 1970s with the pur- For both the first two situations, the solution pro- pose of developing algorithms that would automate at posed by the author is to map one element of a vo- least a part of the process. Some types of mappings cabulary in relation to various elements in the other. that may be realized automatically were identified in Eventually, the solution would be to add the element these algorithms. into one of the vocabularies. This situation occurs in Lancaster (1986) also produces another specific smaller proportions as the overlaying of subject mat- type of technique for the compatibilization of vo- ters covered by both vocabularies increases. cabularies which he calls the integrated vocabulary Neville (1970) presents an example of a step-by- approach and whose purpose is slightly different step mapping of thesauri, which he denominated rec- from that of intermediary languages, as equivalences onciliation of thesauri. Considering the terminologi- among elements of involved vocabularies are defined cal and conceptual models, the author identified six according to the user’s search strategy and, conse- possible correspondence types between elements quently, no new neutral language is created. In those Knowl. Org. 39(2012)No.6 439 G. Baião, G. Â. Borém Lima. Using Topic Maps in Establishing Compatibility ...

Table 1. Terminological and conceptual compatibilization matrix Source: Adapted from SOERGEL, 1974, p. 506. cases, the users themselves are the ones who select case, the categorization of subjects and facets was the comparison strategy: exact equivalence, syno- based on formal categories and subcategories pro- nyms, related terms, number of terms to be exhibited posed by Dahlberg (1978). as results, etc. The search result displayed the occur- It is worth noting that, in addition to the differ- rences at the selected data bases, each one with their ences between the categorization models adopted in specific terms and the reason for which the occur- the construction of each one of the two schemes, the rence was considered. For Lancaster (1986), the com- latter was constructed independently of the scenario patibility matrix proposed by Dahlberg (1981) is an- used in the first one, that is, at no time was there any other example of the integrated vocabulary approach. preoccupation with the compatibility between the two. Therefore, they can be considered an ideal sam- 4.0 Procedures adopted ple for construing the problematic scenario previ- ously described. Two faceted conceptual structures were utilized as the According to Garshol (2004), the three steps for empirical object for the experiment. The first one was the creation of a topic map based on a faceted classifi- developed by Lima (2004) during the implementation cation system are: of the MHTX prototype, presented in her doctorate dissertation. It is the result of the facet analysis of the a) create, for each facet, a “facet” type topic doctoral dissertation by Madalena Martins Lopes named after the facet; Naves, PhD in Information Science from the UFMG b) create, for each highest level term of each Information Science School, from the year 2000. In facet, a “term” type topic, whose name will be that analysis, the set of principles suggested by the the term proper, and associate it to the corre- Spiteri Simplified Model was utilized. sponding facet using a “belonging to facet” type The second one was constructed by Professor association; Gercina Ângela Borém de Oliveira Lima, in partner- c) create, for each term just below the highest ship with Professor Maria Luiza de Almeida Campos, level term, a “term” type topic whose name will both with degrees in librarianship and PhDs in in- be the term proper, and associate it to its father formation science. It resulted from the facet analysis using a “generic/specific” type association. of the doctoral dissertation by Lima (2004). In that 440 Knowl. Org. 39(2012)No.6 G. Baião, G. Â. Borém Lima. Using Topic Maps in Establishing Compatibility ...

In order to maintain the representativeness of the a PSI may be represented by any type of data, the ideal conceptual maps developed by Lima (2004) and avoid situation is that they be in a repository from which semantic loss in the faceted structure, it was neces- they are made available on the Web and that each of sary to make some adaptations to the route suggested their elements be addressable by means of a specific by Garshol (2004): URI. This allows for the reutilization and the collabo- rative construction of ever more complete and more a) instead of creating one single “facet” type comprehensive PSI repositories from the conceptual topic, eight topics were created, each of them point of view. representing one different facet or subfacet In order to assemble a second topic map, the same level, since, in the scheme proposed by Lima procedures were adopted, this time based on a dis- (2004), at each level a different symbol attribute tinct faceted structure that was obtained from the was given in the conceptual map; analysis of the dissertation by Lima (2004). However, b) for each association roles were also defined in this second map, the PSIs were not added, since for the topics involved. In a “belonging to facet” the creation of a PSI for each new topic, as was done type association, for example, one of the topics in the first map, might generate more than one PSI involved takes up the “facet” role and the other for the same subject, in the event a subject was being one the “term” role; dealt with in both maps. That is, the reutilization of c) in those cases in which synonymous terms already published subject matters demands previous were kept (Concept/Idea/Thought, for in- manual conceptual analysis to be performed by the stance), the “topic names” property was used to professional constructing the topic map. Since this indicate that all the terms referred to one same study deals with the compatibilization by means of subject; automated resources already present in the topic d) to indicate the division criteria, “division cri- maps, conducting the analysis for the insertion of teria” type topics were created and, subse- PSIs in the second map would be justifiable. quently, associated to the corresponding facet The software used to aid in the construction, editing through a “divides the facet” type association. and navigation of the topic maps was Topincs , version 1.3.3, which In order to make a “belongs to the facet” type associa- is a free, open code software application for academic tion between those terms subordinate to each of these purposes and which allows for the exportation of maps criteria and the immediately superior facet, it was nec- into XTM files. This resource provides for the visuali- essary to add a third role indicating the division crite- zation of maps created in other topic map editing tools ria with the goal of making explicit which terms that and navigators that support the same standard. are subordinate to the facet belong to which criteria. TMNav was used for graphic visualization of the This means that, in the case of subordinations for maps. This software enables different graphic forms of which there was an explicit division criterion in the visualization, such as hierarchical (in tree format) and original scheme, a ternary relation was generated in- through a hyperbolic map, which, according to Lima volving the facet, the subfacet (whose role was defined (2004) and Silva (2007) is a highly recommended in- as “term”) and the division criteria in question. terface for navigation in hypertextual resources. Following that, occurrences were created for each Several other software programs were analyzed be- one of the topics. Each occurrence pointed to the fore these were selected. However, some presented a HTML page corresponding to the part of Naves’s high level of installation and handling complexity, re- (2000) dissertation that best defined its semantic con- quiring, inclusively, the mastering of a programming tent. Therefore, all links between elements of the map language or code architecture (TM4J, available and the respective information resources that had http://www.techquila.com/tm4j.html), requires broad been created by Lima (2004) were kept. From that knowledge of Java for the installation and compila- point on, it was possible to navigate in context tion of its components). Others, like TMProc4 do not through the topic map. support the XTM standard and require the installa- The next step was the creation of a PSI repository. tion of several other little-known software programs. As previously mentioned, PSIs are especially respon- For the automated merging of the two topic maps sible for the portability of topic maps, for they are in- produced during the previously described stages, the tended to provide one singular identification of a sub- Ontopia OKS Samplers was used. ject with no ambiguities. For that reason, even though Knowl. Org. 39(2012)No.6 441 G. Baião, G. Â. Borém Lima. Using Topic Maps in Establishing Compatibility ...

5.0 Results analysis Symbol Description

The first step of the results analysis was the manual ▲ Terminological and conceptual equivalence; structural divergence; unification of the faceted structures involved. The ■ Terminological variation; conceptual equiva- unified structure resulting from this procedure was lence; structural divergence; used as a parameter for comparison with the topic ◊ Terminological divergence; conceptual di- map obtained after the automated merging of the two vergence only in relation to the level of previously generated maps originating from these specificity; structural equivalence; same structures. For the production of the manually □ Structural equivalence only; unified scheme, the structure related to the disserta- Table 2. Symbols attributed to each of the relation types tion by Lima (2004) was used as the basis. Following identified during manual unification. that, an attempt was made to allocate into this basic structure each of the descriptors from the second Figure 1 shows a small part of the structure resulting scheme, related to the dissertation by Naves (2000), from the unification as a function of the steps de- considering the subject to which each one referred to scribed previously. in each piece of work separately. The identification of the relations, besides allowing During this process, it was possible to identify dif- for a better comparison of results, serves as a justifi- ferent types of relations between the elements of cation for the insertion of each term from the second these two structures. These relations were classified scheme into the first scheme. An example of that is considering the three levels of compatibilization the term “exact sciences” that, as it had been inserted (terminological, conceptual, and structural) presented in the symbol ◊, allows us to deduce that there is a di- by Soergel (1982). The first type of relation identified vergence only in the specificity level. Considering was the terminological, conceptual and structural that the hierarchical chain in which the term now equivalence, that is, the same descriptor present in takes part, one may deduce that it is a subject whose both structures, thus denoting the same concept and specificity level is greater than that of “Natural Real in similar hierarchies. Sciences” and smaller than “Computer Sciences.” A second type of relation was identified among the In order to facilitate the comparison of results from elements of the same terminology and conceptual load both processes (the manual unification of the two with slight variations such as number variation. An- schemes and the automated merging of the topic other type of relation was identified among those ele- maps), it was necessary to navigate throughout the re- ments that were equivalent from the conceptual and sulting topic map using OKS Samplers so as to extract terminological point of view, but were presented under a hierarchical structure in the same format as that of a different subordination in each scheme, which implies the scheme generated during the manual unification. a divergence at the structural level. The fourth type is a At this point, it was noticed that there were topics variation of the latter, due to terminological variations in the resulting topic map that were subordinate to between descriptors. The fifth type was identified more than one “parents” topic, that is, that polyhier- among elements with different descriptors and which archies occurred. This happened in all those cases in were conceptually divergent only in relation to their which two identical descriptors were present at differ- specificity. The last type of relation identified occurred ent levels in the original schemes. This type of relation in those cases in which the elements had no previous had been dealt with using the symbol ▲ in the manual relations, but had their allocation defined so as to pro- unification process. Figure 2 exemplifies that with the vide for the structural compatibility between the “terminology” topic for which distinct subordination schemes involved and the resulting scheme. could be observed. In order to provide for an easier identification of The occurrence of polyhierarchies, allied to other the types of relations in the resulting scheme, the fol- factors that will be commented on later in this article, lowing symbol mapping strategy was adopted: make it evident that merging the maps at the struc- tural level results in a sum or union; that is, for those Symbol Description cases in which the same descriptor occurs in both maps, the two hierarchies in which they occur remain ∆ Structural, conceptual and terminological after merging. equivalence ╤ Terminological variation; structural and By comparing the two schemes, one observes that conceptual equivalence the terminological variations, even when small, ex- 442 Knowl. Org. 39(2012)No.6 G. Baião, G. Â. Borém Lima. Using Topic Maps in Establishing Compatibility ...

Figure 1. Scheme resulting from the manual unification.

Figure 2. Example of polyhierarchy in the topic map erted negative influence on the process of merging fore, at the moment the user, when navigating the maps, as among the characteristics of the topics through the map, reached a topic resulting from the used for the identification of equivalences are the unification of two topics (one from each map used in names that are attributed to them. The “authors” and the input) the two relevant information resources “author” topics, for instance, which in both maps were displayed. represented the same subject, were not unified. This Those cases that were treated with the ◊ symbol means that the relations identified by the ╤ and ■ (terminological divergence, conceptual divergence symbols in the manually unified scheme did not re- only in relation to the specificity level, and structural ceive the treatment expected in the automated merg- equivalence) did not receive any type of treatment in ing. As to those topics that took part in relations the automated process, and were presented in the re- identified by the symbol ∆ (terminological and con- sulting scheme as non-related topics. Finally, the ceptual equivalence), they were duly unified. cases of simply structural equivalence, represented in It is important to note that in those cases in which the manual unification by the symbol □, were duly two topics were unified into one, all properties of treated in the resulting merge topic map mainly both topics such as names, associations, and, mainly, thanks to the terminological compatibility between occurrences were kept in the resulting topic. There- the large categories dealt with in the two schemes. Knowl. Org. 39(2012)No.6 443 G. Baião, G. Â. Borém Lima. Using Topic Maps in Establishing Compatibility ...

6.0 Final considerations and future developments ments may come to propose new subject matter analysis models based on the theory of facet analysis The topic map merge may be defined, within the that may serve as the basis for the construction of scope of interoperability between vocabularies, as an more complete topic maps, under the semantic point automated terminological compatibilization, based on of view, and more prepared for compatibilization topic names; that is, two topics in different maps that through the utilization of scopes, for example. In ad- have one of more common descriptors are unified dition to that, other proposals for the conversion of into one single topic in the resulting map. It is in this traditional IC classification schemes into topic maps topic that the sum or union of the characteristics of may be developed, with the use of types of topics, as- the original topics is deposited. Occasional variations sociations, and roles that are different from those in the utilization of descriptors may influence nega- used in this project. tively the result of the process, even when it is with The occurrence of polyhierarchies in the map re- regard to small variations, such as divergence in the sulting from the automated merging proved to be ef- utilization of a descriptor in the singular or plural. ficient in the sense that it showed different concep- Some future studies may propose the incorporation tual definitions for one same term, the occurrence of of algorithms for the mating of spelling verifications which is clearly possible when we are working with and standards, such as those currently used by information resources bearing different authorships. Google, into topic map manipulation tools, thus pro- This fact acquires even greater relevance in those viding for the treatment of small terminological varia- cases in which the content of the information re- tions or spelling mistakes. The structural compatibili- sources that are being dealt with belong to knowledge zation while merging takes place as a consequence of areas whose terminology is not well defined. Such is occasional unifications at the terminological level. the case with disciplines dealing with bibliographical At the conceptual level, the proposal in the topic classification (Satija, 2000), since, in these fields, au- maps is to use the PSI concept. However, the identifi- thors will use terminologies that differ from those cation of equivalences between one new topic and ex- used in other fields. With no possibilities for poly- isting PSIs is a completely manual process, which re- hierarchies, there would be the risk of some term be- quires intense intellectual effort on the part of the user ing housed under a subordination whose semantic responsible for the new map. That is, the utilization of value does not coincide with any of the occurrences PSIs, which is of fundamental importance if topic that are likely to be accessed by the user. maps are to accomplish their goal of interoperability, is For the same reasons, we must emphasize the im- similar to the process of utilizing intermediary lan- portance of the possibility of attributing various guages for the compatibilization of vocabularies. names to the same topic. Once such equivalences are Research into the compatibilization of vocabularies explicit, even when one any subject is referred to by may come to contribute towards the development of different authors through distinct terms, the literary new topic map interoperability techniques. The great warrant remains. Most specifically in the cases of sys- majority of these studies, whose initial development tems that favor context navigation to the detriment dates back to 1960s, were focused mainly on its appli- of other search techniques, the role literary warrant cation in thesauri. It was only more recently that plays is even more important than that it plays in these studies have begun to be revisited with the goal other systems that prioritize other search techniques. of integrating other forms of information representa- In the case of textual search, for example, the user tion, such as ontologies, for example). warrant becomes essential. The topic maps generated from faceted schemes In relation to the objective of the topic maps, are extremely simple from the point of view of the re- which is to provide for contextualized navigation and sources offered by the maps (endless types of associa- offer greater flexibility for the representation of sub- tions, names of topics, scope, etc.), due to a fragility jects and relationships, it is possible to say that the of semantic explicitation of these schemes: types of technology fulfills its role well. It supports a diversity limited relations, scope, and cardinality are not con- of navigation interfaces (textual, hierarchical, in sev- sidered. In addition, polyhierarchy is not allowed, eral forms of conceptual maps, or in various other since the principles defended by Ranganathan estab- graphic forms), with most of them permitting the lish that the ranking classes must be mutually exclu- user to efficiently access the location of any hypertext sive; that is, no subject from the structure may belong knots in relation to its global structure. Such effi- to more than one class in the rank. Future develop- ciency is founded on mapping directly between the 444 Knowl. Org. 39(2012)No.6 G. Baião, G. Â. Borém Lima. Using Topic Maps in Establishing Compatibility ...

structure represented by the map and the sources of 4. TMProc was developed by the Ontopia group information to which this structure refers. Addition- (http://www.ontopia.net/) and was one of the first ally, there are no restrictions or limits for semantic software applications developed for the manipula- representation; any idea, object or thought, as well as tion of topic maps. any type of relation they may have, may be repre- sented by means of topic maps. The possibility of es- References tablishing limits for the application of certain subjects (scope) and the possibility of representing one same Ahmed, Kal, and Moore, Graham. 2006. An introduc- subject in different hierarchies (polyhierarchy) on in tion to topic maps. Architecture journal July. Avail- the same map contribute even further to its represen- able http://msdn.microsoft.com/en-us/library/aa4 tation flexibility. 80048.aspx. With regard to the of hetero- Campos, Maria Luiza de Almeida. 2005. A prob- geneous schemes and the shared construction of con- lemática da compatibilização terminológica e a in- ceptualizations, which are also considered objectives tegração de ontologias: o papel das definições con- of the topic maps, we observed the confirmation of ceituais. In Anais do VI ENANCIB: Encontro Na- what some language and vocabulary compatibilization cional de Pesquisa em Ciência da Informação, 28-30 researchers had already observed: complexity of the Nov. 2005, Florianópolis. Florianópolis: Universi- compatibilization at the conceptual level, requiring a dade Federal de Santa Catarina. high level of manual interference. From that point of Coates, Eric J. 1970. Switching languages for index- view, the topic map standard comes to simply play the ing. Journal of documentation 26:102-10. role of supporting human analyses and decisions, Colmenero Ruiz, Maria Jesús. 2005. Introducción al since the automation of the process is concentrated at modelo topic maps (ISO/IEC13250:2003). Revista the terminological level. digital de biblioteconomia e ciência da informação With the evolution of an XTM standard, the devel- 3 n1. Available http://www.brapci.ufpr.br/docu opment of languages for specific consultations and mento.?dd0=0000007483&dd1=0f592. the possibility of implementing inference rules, new Dahlberg, Ingetraut. 1978. A referent-oriented, ana- projects will be undertaken in an effort to reassess the lytical concept theory for INTERCONCEPT. In- level of automation of merge procedures. ternational classification 5: 142-51. Dahlberg, Ingetraut. 1981.Toward establishment of Notes compatibility between indexing languages. Interna- tional classification 8: 86-91. 1. Some examples of such restrictions are: “Every per- Doerr, Martin. 2001. Semantic problems of thesaurus son was born somewhere” (considering a map ap- mapping. Journal of digital information 1n8. Avail- proaching both the topics “person” and “place”); able http://journals.tdl.org/jodi/article/viewArticle/ “An animal is always the offspring of another ani- 31/. mal.” Garshol, Lars Marius. 2004. Metadata? thesauri? tax- 2. According to Maniez (1997) the words “integra- onomies? topic maps! making sense of it all. Jour- tion,” “harmonization,” “reconciliation,” and “con- nal of information science 30:378-91. Available cordance” are usually applied to indicate the con- http://www.ontopia.net/topicmaps/materials/ cept of convergence, that is, the same concept of tm-vs-thesauri.. “compatibilization.” However, Lancaster (1986) Garshol, Lars Marius. 2002. What are topic maps. considers integration of vocabularies as a specific XML.com September 11. Available http://www. method of compatibilization, in which there is no .com/pub/a/2002/09/11/topicmaps.html. conceptual analysis, that is, for a certain user re- Glushkov, Victor M., Skorokhod'ko, Ėduard Fe- quest, all related occurrences (including word varia- dorovich, and Strongnii, A.A. 1978. Evaluation of tions) and elements are shown as a result, without the degree of compatibility of information retrieval showing the relation or the context in which each languages of document retrieval systems. Auto- one of them are present. matic documentation and mathematical linguistics 3. Polyhierarchy occurs when a specific term may be 12n1:18-26. subordinate to more than one generic term. The Guinchat, Claire, and Menou, Michel. 1994. Intro- term “geological chemistry,” for example, can be dução geral às ciências e técnicas da informação e subordinate both to chemistry and geology. documentação. 2. ed. Brasília: IBICT. Knowl. Org. 39(2012)No.6 445 G. Baião, G. Â. Borém Lima. Using Topic Maps in Establishing Compatibility ...

Hudon, Michèle. 2004. Conceptual and lexical com- Park, Jack, and Hunting, Sam. 2003. XML topic maps: patibility in thesauri used to describe and access creating and using topic maps for the web. Boston: moving image collections. In Julien, Heidi, and Addison-Wesley. Thompson, Sharon eds., Access to information: tech- Pepper, Steve. 2000. The TAO of topic maps: finding nologies, skills, and socio-political context, proceed- the way in the age of infoglut. In proceedings of ings of the Congress of the Canadian Federation for the XML Europe conference, 2000, Paris. Available the Humanities and Social Sciences, June 3-5, 2004, http://www.ontopia.net/topicmaps/materials/tao. Winnipeg, Manitoba, University of Manitoba. Avail- html. able http://www.cais-acsi.ca /proceedings/2004/ Pepper, Steve, and Moore, Graham. 2001. XML topic hudon_2004.. maps (XTM) 1.0.: Topicmaps.Org Specification. Lancaster, Frederic Wilfrid. 1986. Vocabulary control Available http://www.topicmaps.org/xtm/. for information retrieval. Arlington, VA: Info Re- Rada, Roy, and Martin, Brian K. 1987. Augmenting sources Press. thesauri for information systems. ACM transac- Librelotto, Giovani Rubert. 2005. "Topic maps: da tions on office information systems 5n4: 378-92. sintaxe à semântica." PhD diss., Universidade do Available http://dl.acm.org/citation.cfm?id=42246 Minho. Available http://tede.ibict.br/tde_busca/ &dl=ACM&coll=DL&CFID=144583757&CFT arquivo.php? codArquivo=486. OKEN=16357518. Lima, Carlos Eduardo de, Fagundes, Fabiano. 2004. Satija, Mohinder Partap. 2000. Library classification: Utilização de mapas de tópicos no desen- an essay in terminology. Knowledge organization volvimento de hiperdocumentos educacionais. In 27:221-29. VI Encontro de estudantes de informatica do Silva, Marcel Ferrante. 2007. Estudo comparativo en- estado do tocantins – ENCOINFO 6, 4 e 5 de no- tre interfaces hipertextuais de soft-wares para a vembro de 2004, Palmas, Tocantins. Palmas, To- representação do conhecimento. Ponto de acesso cantins: CEULP/ULBRA. 1n2:3-21. Lima, Gercina Ângela Borém de Oliveira. 2004. Soergel, Dagobert. 1974. Indexing languages and “Mapa hipertextual (MHTX): um modelo para or- thesauri: construction and maintenance. Los Ange- ganização hipertextual de documentos.” PhD diss., les: Melville. Universidade Federal de Minas Gerais, Belo Hori- Soergel, Dagobert. 1982. Compatibility of vocabular- zonte. ies. In Riggs, Fred W. ed., Proceedings of the Maniez, Jacques. 1997. Database merging and the Conta Conference on Conceptual and Termino- compatibility of indexing languages. Knowledge or- logical Analysis in the Social Sciences, May 24-27, ganization 24: 213-24. 1981, Bielefeld, FRG. Frankfurt: Indeks Verlag, pp. Naves, Madalena M. L. 2000. "Fatores interferentes 209-23. no processo de análise de assunto: estudo de caso Tennis, Joseph T. 2001. Layers of meaning: disentan- de indexadores." PhD diss., Universidade Federal gling subject access interoperability. In proceed- de Minas Gerais, Belo Horizonte. ings of the 12th ASIS SIG/CR Classification Re- Neville, H. H. 1970. Feasibility study of a scheme for search Workshop 2001, Washington, DC. Washing- reconciling thesauri covering a commom subject. ton, DC: University of Washington. Journal of documentation 26: 313-36. Zhang, Xueying. 2006. Concept integration of docu- Newman, Simon M. 1965. Information systems com- ment databases using different indexing languages. patibility. Washington, D.C.: Spartan Books. Information processing & management 42:121-35