ABSTRACT

AN EMPIRICAL EVALUATION OF SEMANTIC SIMILARITY MEASURES USING THE WORDNET AND UMLS ONTOLOGIES

by Youbo Wang

Ontologies have been promoted as a sound basis for communication on the Semantic Web, the next generation of the Web. The proliferation of multiple, independently developed ontologies for heterogeneous information systems, however, requires tools that establish semantic interoperability by discovering semantically appropriate mappings between different and independent ontologies. Finding such mappings requires some notion of a measure of semantic relatedness between concepts. Numerous such measures have been proposed for use within an ontology. These measures have been evaluated predominantly through experiments that apply them to word pairs from the WordNet ontology and compare the results to human judgments on the same pairs. In this work, an experimental software testbed that implements the semantic relatedness measures and automates their performance testing is developed to evaluate the validity, performance, and applicability of these measures on two very different ontologies, WordNet and the Unified Medical Language System (UMLS).

AN EMPIRICAL EVALUATION OF SEMANTIC SIMILARITY MEASURES USING THE WORDNET AND UMLS ONTOLOGIES

A Thesis

Submitted to the Faculty of Miami University
in partial fulfillment of the requirements for the degree of
Master of Computer Science
Department of Computer Science & Systems Analysis

by Youbo Wang

Miami University
Oxford, Ohio
2005

Advisor ______Dr. Valerie Cross

Reader ______Dr. James Kiper

Reader ______Dr. Fazli Can

TABLE OF CONTENTS

1 Introduction
2 Ontologies: Key to Semantic Web Success
2.1 Definition
2.2 Types and Uses of Ontologies
2.3 Languages for Ontology Specification and Ontology Editors
2.3.1 Ontology Languages
2.3.2 Ontology Editor
2.4 WordNet and UMLS Ontologies Overview
2.4.1 WordNet Introduction
2.4.2 UMLS Introduction
3 Semantic Similarity
3.1 Motivation
3.2 Semantic Distance, Similarity and Relatedness Measures
3.2.1 Network Distance-Based Approach
3.2.2 Information Content-Based Approaches
3.3 Determining Information Content Measures
4 Other Researchers' WordNet and UMLS Experiments
4.1 WordNet Experiments
4.2 Semantic Similarity Experiments in UMLS
5 Experiments Using the Protégé Testbed
5.1 Overview
5.2 Difficulties in Acquiring WordNet and UMLS Ontologies
5.3 WordNet and UMLS Ontology Structure
5.3.1 WordNet in OWL
5.3.2 UMLS in OWL
5.4 Testbed Description
5.5 WordNet Experiment
5.5.1 Experiment and results
5.5.2 Analysis
5.6 UMLS Experiments
5.6.1 Experiments and results
5.6.2 Analysis

6 Significance
7 Conclusions and Future Work
References
Appendix
Appendix A: Experts' Scores
Appendix B: Semantic Similarities of UMLS Vocabularies
Appendix C: Ranking according to Pearson Correlation
Appendix D: Rankings of each method in three vocabularies


LIST OF TABLES

Table 4.1 11 related concepts in UMLS vocabularies
Table 4.2 Unrelated concepts
Table 5.1 Arguments of WordNet and UMLS experiments
Table 5.2 Classes and their associated properties of the TestCase Ontology
Table 5.3 Information content semantic similarity measures of WordNet experiment
Table 5.4 Network distance-based semantic similarity measures of WordNet experiment
Table 5.5 Least subsumers of three word pairs in WordNet 2.0 and WordNet 1.7
Table 5.6 Correlations with human judgment for WordNet experiment
Table 5.7 Correlations for MSH
Table 5.8 Correlations for SNMI
Table 5.9 Correlations for ICD9CM
Table 5.10 Average correlations for AverageES across all vocabularies
Table 5.11 Ranking of all six methods in each vocabulary according to Pearson Correlation
Table 5.12 Average correlation of different measures
Table 5.13 Correlations of WordNet and UMLS
Table 5.14 Number of pairs in each group
Table 5.15 Number of common pairs in each group

LIST OF FIGURES

Figure 2.1 High-level portions of OpenCyc (CYC)
Figure 2.2 Layer cake structure of the Semantic Web
Figure 2.3 The Protégé 3.1 user interface
Figure 2.4 A WordNet noun hierarchy example
Figure 2.5 MSH concepts and relations example
Figure 2.6 UMLS Semantic Network example
Figure 3.1 An example of a hierarchy
Figure 3.2 A fragment of the WordNet ontology
Figure 5.1 Hierarchical structure of WordNet
Figure 5.2 Hierarchical structure of UMLS vocabularies
Figure 5.3 The interface of the integrated tab plugin
Figure 5.4 The pop-up window used for entering word pairs
Figure 5.5 Screenshots of WordNet and ICD9CM experiments
Figure 5.6 The interface of the Test plugin
Figure 5.7 Hierarchical structure of the TestCase Ontology
Figure 5.8 A cycle example in ICD9CM
Figure 5.9 A cycle example in ICD9CM


ACKNOWLEDGEMENTS

I would like to thank Prof. Valerie Cross for her constant support, valuable advice, and encouragement. I also want to thank Sam Holton for his help, and my thesis committee members, Prof. James Kiper and Prof. Fazli Can, for their guidance and their time in reviewing this work.


1 Introduction

The Semantic Web was introduced to the public in 2001, when Tim Berners-Lee described his vision of it as an extension of the current Web, in which information is given well-defined meaning, better enabling computers and people to work in cooperation (Berners-Lee 2001). The idea is to turn the WWW into a machine-understandable knowledge base by adding "semantic annotations" to web pages in a computer-readable format. These annotations encode the meaning of the web page information viewed by a human reader, and they can be connected to additional information included in ontologies. This additional information enables determining the meaning of the web page's information and allows inference rules and inference engines to perform automated reasoning to derive corresponding information.

In addition to semantic annotations and ontologies, another important component of the Semantic Web is software agents, often viewed as "personal assistants" to human users. The primary responsibilities of agents are to aid in information retrieval, to filter and process information, and to find services needed by their human users. To carry out these tasks, agents look to ontologies for assistance in distinguishing and understanding the concepts conveyed by the semantic annotations. By using ontologies, agents may find new concepts and associate them with ones they already understand. Using the structure of the ontology and the inference rules or axioms within it, they may infer additional knowledge useful to completing their individual assignments. Ontologies, serving as the focal points of the Semantic Web, will permit processing and sharing of information far beyond the WWW's capability of simply exchanging data.

This research focuses on measuring semantic similarity among concepts in ontologies and analyzing the performance of different semantic similarity measures.
The two ontologies used in our research are WordNet 1.7 and UMLS (Unified Medical Language System). Six measures, three information content-based and three network distance-based, were implemented as plugins for the ontology editor Protégé 3.1 (http://protege.stanford.edu). For each ontology, a set of concept pairs was selected, and the semantic similarity of those pairs was measured using each of the six measures. To evaluate our experimental results, the correlations between the measures obtained in our experiments and human experts' scores were computed.

Since the word "ontology" is used in many varying contexts and interpreted in a number of distinctive ways, section 2 first provides some clarification of its meaning, describes the types of ontologies, discusses current representation formats for ontologies, and then introduces WordNet and UMLS, the two ontologies used in this research. Section 3 discusses the motivation for measuring semantic similarity, clarifies the differences among semantic similarity-related concepts, and describes the details of the measures implemented in this research. Section 4 summarizes some previous experiments carried out on WordNet and UMLS. Section 5 describes our experiments, including the Protégé testbed, the difficulties overcome in performing the experiments, implementation details, and results and analysis. Section 6 explains the significance of this research. Section 7 discusses conclusions and future opportunities for continuing this research.
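The evaluation step just described, correlating a measure's scores with human experts' scores over the same concept pairs, can be sketched in a few lines. The following is a minimal illustration of Pearson's r; the score lists are hypothetical and not the thesis's actual data.

```python
# Minimal sketch of the evaluation step: Pearson correlation between a
# similarity measure's scores and human judges' scores for the same pairs.
# The score lists below are illustrative, not the thesis's data.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

measure_scores = [0.9, 0.7, 0.4, 0.1]   # hypothetical similarity scores
human_scores   = [3.8, 3.0, 1.5, 0.3]   # hypothetical expert ratings

print(round(pearson(measure_scores, human_scores), 3))
```

A correlation near 1.0 indicates that the measure ranks the pairs much as the human judges do, which is the evaluation criterion used throughout the experiments.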

2 Ontologies: Key to Semantic Web Success

In philosophy, ontology seeks "a classification that is exhaustive in the sense that all types of entities are included in its classifications, including also the types of relations by which the entities are tied together" (Smith 2001). The word was borrowed by computer science in the 1980s, when John McCarthy used it in his paper "Circumscription: A Form of Nonmonotonic Reasoning" (1986). He pointed out that to build logic-based intelligent systems, "building an ontology of our world" that lists everything that exists is necessary. Sowa (1984) proposed a more explicit explanation: an ontology is "a catalogue of everything that makes up the world, and how it's put together." In 1986, Alexander and his colleagues first applied the word in computer science without overlap with its philosophical sense, suggesting the use of ontological analysis to create a formal specification for a knowledge domain. The word has seen a steady increase in its use in the area of artificial intelligence since 1986.

During the early 1990s, research in specifying, building, maintaining, and using ontologies started to become popular. Ontologies have been used in a variety of areas, including artificial intelligence (AI), information retrieval (IR), database theory, electronic commerce, and knowledge management, for the purposes of reasoning, classification, and facilitating communication and the sharing of information among different systems. The popularity of ontologies is due to their ability to offer a common understanding of a specific domain that can be communicated between people. An ontology can serve as a unifying framework to eliminate the poor communication and lack of interoperability that are often caused by different needs, backgrounds, and contexts within a single organization, or across different organizations and systems with different terminologies, overlapping concepts, and mismatched concepts.

2.1 Definition

In the computer science discipline, many definitions exist for the word ontology. Gruber (1993) was one of the first to provide a succinct and often used definition: "An ontology is an explicit specification of a conceptualization." A conceptualization can be viewed as a set of concepts, their definitions, and their inter-relationships. A concept (sometimes called a class) describes a set of entities in a domain. Each concept may have several properties which describe various features and attributes of the concept. For example, in a newspaper ontology (http://protege.stanford.edu/overview/), the concept editor may have properties such as name, hire date, and phone number. These properties are sometimes called slots. Two kinds of relations are used within ontologies: taxonomic and associative relationships. Taxonomies are organized as a sub/super-concept tree structure using "isa" relationships. Associative relationships connect concepts across tree structures, such as an editor isAssociatedWith a newspaper. Like concepts, relations also have properties; for example, relation properties may include whether a relation is optional or transitive.

An ontology may be intensional or extensional. An intensional ontology includes only the ontology schema or definition. Axioms are part of the intensional ontology and specify constraints on concepts or their instances (Smith and Welty 2001). An extensional ontology includes the instances of an ontology. Instances are occurrences of a concept. An extensional ontology is created by ontology instantiation, which is to insert

instances of the corresponding concepts. Strictly speaking, instances should not occur within an intensional ontology itself, since it is a conceptualization of the domain. The intensional ontology together with its corresponding extensional ontology is known as a knowledge base.
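The intensional/extensional distinction can be made concrete with a toy sketch in Python, reusing the newspaper example from above; all names here are illustrative and not part of any actual ontology.

```python
# Toy rendering of the section's terminology: a concept definition
# (the intensional part) plus its instances (the extensional part)
# together form a knowledge base. All names are illustrative.
from dataclasses import dataclass

@dataclass
class Editor:          # concept (class) with its slots (properties)
    name: str
    hire_date: str
    phone: str

# Extensional part: instances, i.e. occurrences of the concept.
instances = [
    Editor("A. Smith", "1998-03-01", "555-0100"),
    Editor("B. Jones", "2001-07-15", "555-0101"),
]

# Intensional schema plus extensional instances = knowledge base.
knowledge_base = {"schema": Editor, "instances": instances}
print(len(knowledge_base["instances"]))  # 2
```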

2.2 Types and Uses of Ontologies

A classification of ontologies was proposed by van Heijst, Schreiber, and Wielinga (1997). They distinguish ontologies along two dimensions: the amount and type of structure of the conceptualization, and the subject of the conceptualization. With respect to the structure dimension, ontologies can be distinguished with increasing complexity based on the description detail for the concepts and the kinds of relationships supported between concepts. The following categories of ontologies are listed in increasing order of complexity:

• Terminological ontologies, which specify the terms that are used to represent knowledge in the domain of discourse, such as a medical ontology.
• Information ontologies, which specify the structure of concepts by declaring properties for concepts, such as the record structure of a database.
• Knowledge modeling ontologies, which specify conceptualizations of the knowledge and have a richer internal structure.

According to the subject dimension, ontologies can be placed into four categories:

• Application ontologies, which contain all the definitions needed to model the knowledge required for a certain application.
• Domain ontologies, which are defined for conceptualizing a particular domain.
• Generic or upper level ontologies, which define concepts that are generic across many fields, such as state, event, action, etc. Domain ontologies may be mapped to generic ontologies since domain concepts may be specializations of concepts in generic ontologies.
• Representation ontologies, which explicate the conceptualizations that underlie knowledge representation formalisms (Davis 1993).

One of the most famous upper level ontologies is the Cyc Upper Ontology, first released in 1997 by Cycorp as a text file containing the most general concepts

defined in the ontology. Later, in 2002, OpenCyc was released under the GNU Lesser General Public License; it includes most upper level concepts and some mid-level ones. Figure 2.1 illustrates, at a high level, portions of OpenCyc (http://www.opencyc.org/).

Figure 2.1 High-level portions of OpenCyc (http://www.cyc.com/)

Hundreds of different types of ontologies have been developed to facilitate knowledge sharing, filtering, and retrieval. Examples of such ontologies can be found on the WWW at many different web sites: http://www.geneontology.org (Gene Ontology), http://www.geog.buffalo.edu/ncgia/i21/ontology.html (Geographic Ontology), and http://www.w3.org/2001/sw/BestPractices/WNET/wordnet-sw-20040713.html (WordNet ontology); Protégé's web site also maintains a library of ontologies (http://protege.stanford.edu/download/ontologies.html).

2.3 Languages for Ontology Specification and Ontology Editors

The future of the Semantic Web has been linked to the future of ontologies (Kim 2002), but an ontology itself is simply a specification of a certain domain, i.e., an abstract theory. For it to be used on the Semantic Web, it must be implemented using a representation format that is machine readable. Numerous technologies and standards have been used to represent ontologies. This section discusses several of the representation formats developed specifically for use on the Semantic Web.

2.3.1 Ontology Languages

Current ontology languages include XML/XMLS, RDF (Resource Description Framework)/RDFS (RDF Schema), DAML (DARPA Agent Markup Language) + OIL (Ontology Inference Layer), and OWL (Web Ontology Language). Figure 2.2 (Berners-Lee 2000) presents the "layer cake" structure of the Semantic Web. The figure also illustrates the important role of ontologies in the Semantic Web: the ontology vocabulary layer serves as an interface between the three lower levels and the three upper levels.

For the WWW, XML/XMLS is used to represent arbitrary information structures and serves as the de facto standard for data exchange between applications; however, it only defines the syntax for data and cannot impose any semantic constraints on the data. For example, it cannot handle the situation in which different terms represent the same meaning, or the same term describes different meanings in different contexts (W3C 2004). Thus, XML/XMLS alone is not enough to produce machine-readable data. RDF/RDFS, first recommended in 1999, overcomes these problems by describing and interchanging metadata (Powers 2003) to create machine-readable documents. With RDF/RDFS, a class can have subclasses and superclasses associated with it, and each property can have its own subproperties, domains, and ranges. Thus, RDF and RDFS can be used as a representation format for the information content of an ontology. Although RDF/RDFS serves as a representation format for ontologies, it does not provide a means to express specific complex properties of concepts or relations, for example, that a property of a concept is unique, or the cardinality constraints or transitivity of a relation. These advanced properties are essential to performing inferencing and developing logical rules.
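The kind of semantics RDFS adds beyond plain XML can be illustrated with a small sketch: subclass statements entail further subclass statements by transitivity, so a reasoner can derive facts that were never stated explicitly. The triples and class names below are hypothetical.

```python
# Sketch of one piece of RDFS semantics: rdfs:subClassOf is transitive,
# so stated subclass triples entail further (inferred) subclass pairs.
# Triples and class names are illustrative, not from a real vocabulary.
def subclass_closure(triples):
    """All (sub, super) pairs entailed by rdfs:subClassOf transitivity."""
    entailed = {(s, o) for (s, p, o) in triples if p == "rdfs:subClassOf"}
    changed = True
    while changed:                       # repeat until a fixed point
        changed = False
        for (a, b) in list(entailed):
            for (c, d) in list(entailed):
                if b == c and (a, d) not in entailed:
                    entailed.add((a, d))
                    changed = True
    return entailed

triples = [
    ("ex:Editor",   "rdfs:subClassOf", "ex:Employee"),
    ("ex:Employee", "rdfs:subClassOf", "ex:Person"),
]

closure = subclass_closure(triples)
print(("ex:Editor", "ex:Person") in closure)  # True: inferred, never stated
```

This is exactly the sort of entailment XML alone cannot express, and it is still far weaker than the cardinality, uniqueness, and disjointness constraints discussed next for DAML+OIL and OWL.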
DAML was created as an ontology and inference language by extending RDFS with more detailed properties and classes and with greater expressiveness, using simple terms for creating inferences. Its initial version, DAML-ONT, was released in October 2000 (Hendler and McGuinness 2000). While the U.S. government was working on DAML, the European Union IST (Information Society Technologies) program, under the On-To-Knowledge project, was developing the Ontology Inference Layer. OIL merges the commonly used modeling primitives of frame-based languages with the formal semantics and reasoning capabilities of

description logics. The two groups joined forces to develop the DAML+OIL language to express advanced classifications and properties of resources not expressible in RDFS, and released its most current version in March 2001. Subsequently, the Web Ontology (WebONT) Working Group was commissioned with the goal of creating an even more sophisticated ontology language using DAML+OIL as its basis (W3C 2003). The resulting ontology language, OWL, is an extension of DAML+OIL that provides even greater machine interpretability of web page content. For example, the OWL vocabulary can be used to specify disjointness between two classes.

Figure 2.2 Layer cake structure of the Semantic Web

The two foundational standards, the Resource Description Framework (RDF) and the OWL Web Ontology Language, became W3C (World Wide Web Consortium) recommendations in February 2004. Their recommendation status indicates that the W3C considers them ready for widespread adoption.

2.3.2 Ontology Editor

Numerous ontology editors have been developed that permit the user to create, query, and maintain ontologies. Most have common editing and browsing capabilities, and usually include ontology documentation, importation from and exportation to different formats, graphical views, and libraries of existing ontologies. In several surveys of

ontology editors (IST-2000-29243), Protégé, developed at Stanford University, has been favorably described and is becoming the de facto standard in the U.S. The most recent release, Protégé 3.1 (http://protege.stanford.edu/download/registered.html), is freely available and has been used as the foundation of our software development. Figure 2.3 shows the interface of Protégé 3.1 with the WordNet ontology definition displayed.

Protégé's scalability and extensibility permit new functionality to be added through the creation of plugins. The three categories of plugins are backend plugins, slot widgets, and tab plugins. Backend plugins provide the ability to store and import knowledge bases in different formats. Slot widgets are used to display and edit slot values. Tab plugins implement knowledge-based applications by linking with the knowledge base, and are the most popular of the three categories (IST Project IST-2000-29243 2002).

To date, hundreds of plugins have been developed to add new functions to Protégé. Examples include the RDF storage backend, used to create, import, and save RDF/RDFS files; the OWL storage backend, used to load, save, and edit OWL files in Protégé; and a UMLS tab, which allows searching and annotating from the Unified Medical Language System (UMLS). In our research, we developed two tab plugins to calculate various semantic similarity measures between pairs of concepts in the WordNet and UMLS ontologies. These two plugins are described in section 5.4.


Figure 2.3 The Protégé 3.1 user interface

2.4 WordNet and UMLS Ontologies Overview

In our research, we measured the similarity of concept pairs defined in two ontologies: WordNet and UMLS (Unified Medical Language System). Both are terminological ontologies, but they are in different domains. WordNet serves as both an electronic thesaurus and a dictionary of the English language, while UMLS contains a vocabulary specialized for the language of biomedicine and health. Most experiments in measuring semantic similarity have used WordNet; by comparing our results with the results of these previous WordNet experiments, we are able to evaluate the validity of our implementation. The UMLS ontology is used in our research because of its significant influence in facilitating the development of information systems that understand the meaning of biomedical and health language (UMLS Knowledge Sources Documentation 2005AB), and to investigate whether any performance differences exist among the measures when a specialized domain ontology is used. Only a few experiments to evaluate semantic similarity measures have been performed on this ontology, and only using the simplest network distance-based measure (Caviedes and Cimino 2004). The following two sections describe these two ontologies.

2.4.1 WordNet Introduction

WordNet (Miller 1990) is an online lexical reference system developed by the Cognitive Science Laboratory at Princeton University. WordNet 1.0 was released in June 1991. It is a freely available database originally designed as a semantic network and, expanded by the addition of definitions, is now also viewed as a dictionary. The latest version is WordNet 2.0 (http://wordnet.princeton.edu/). The basic unit of WordNet is a word, although it also contains compounds, phrasal verbs, collocations, and idiomatic phrases. In WordNet, the lexicon is divided into five syntactic categories: nouns, verbs, adjectives, adverbs, and function words. Words are grouped into synsets (synonym sets). Each synset represents a lexicalized concept. WordNet contains about 115,000 synsets formed from 150,000 words. Every synset contains a group of synonymous words, and each word may appear in several synsets if it has several senses. The meaning of a synset is described by a gloss. For example, the synset "bird, fowl" is defined as:

bird, fowl -- (the flesh of a bird or fowl (wild or domestic) used as food)

Synsets are linked to each other by different relations. Four main relations link concepts within WordNet:

• Synonymy: If two expressions can be exchanged in a sentence without changing the truth value of the sentence, the two expressions are synonymous.
• Antonymy: A symmetric relation. Word x's antonym is sometimes not-x. For example, ascend and descend are antonyms.
• Hyponymy/Hypernymy: Also called subordination/superordination, subset/superset, or the isa relation. If "an x is a (kind of) y," then x is a hyponym of y. For example, tree is a hyponym of plant, and plant is a hypernym of tree.
• Meronymy/Holonymy: A part-whole (has-a) relation. If "a y has an x," then x is a meronym of y. For example, branch is a meronym of tree, and tree is a holonym of branch.

WordNet does not support relations linking words from different syntactic categories.
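The word/synset organization can be sketched as plain data structures. The second sense of "bird" below is the gloss quoted above; the first gloss is abbreviated, and the overall structure is a toy simplification of WordNet's database.

```python
# Toy model of WordNet's organization: a polysemous word appears in
# several synsets, and words in the same synset are synonyms.
# Structure simplified; glosses abbreviated from WordNet's entries.
synsets = {
    "bird.n.01": {"words": {"bird"},
                  "gloss": "warm-blooded egg-laying vertebrate"},
    "bird.n.02": {"words": {"bird", "fowl"},
                  "gloss": "the flesh of a bird or fowl (wild or domestic) "
                           "used as food"},
}

def senses(word):
    """Synset IDs for every sense of a word."""
    return sorted(sid for sid, s in synsets.items() if word in s["words"])

def synonyms(word):
    """Words sharing at least one synset with the given word."""
    return sorted({w for sid in senses(word)
                     for w in synsets[sid]["words"]} - {word})

print(senses("bird"))    # ['bird.n.01', 'bird.n.02']
print(synonyms("bird"))  # ['fowl']
```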
To some extent, WordNet can be viewed as a traditional dictionary, since it gives glosses and sample sentences for a given synset representing a lexical concept. Unlike a traditional dictionary, however, it does not contain word pronunciation, derivative morphology, etymology, or usage notes (Miller 1990). More importantly, WordNet makes it possible to search a dictionary conceptually rather than alphabetically, because it is organized by word meanings rather than word forms. Because of this organization, WordNet is also called a semantic dictionary. It also qualifies as an upper ontology by including the most general concepts as well as more specialized concepts. Figure 2.4 illustrates one noun root tree, for the lexical concept "entity, physical thing." WordNet contains eight other noun hierarchies, such as abstraction, event, and phenomenon.

Figure 2.4 A WordNet noun hierarchy example

2.4.2 UMLS Introduction

UMLS (http://www.nlm.nih.gov/research/umls/documentation.html) (McCray 2004) is an ontology that combines many distinct terminologies and was created by the National Library of Medicine (NLM) in Bethesda, MD. It helps retrieve information from different biomedicine and health sources. For that purpose, NLM publishes a set of databases called the UMLS Knowledge Sources, which let users read and customize the different sources. The UMLS Knowledge Sources consist of the Metathesaurus, the Semantic Network, and the SPECIALIST Lexicon. These components may be accessed online or

locally by installing the MetamorphoSys program. A license must be obtained first, however, because the vocabularies contained in the Metathesaurus are produced by different copyright holders; a user registers at the official UMLS website (http://umlsks.nlm.nih.gov/kss/servlet/Turbine/template/admin,user,KSS_login.vm) to obtain a UMLS license.

UMLS is built around a large vocabulary database, the Metathesaurus. It contains 1.8 million biomedical and health-related concepts, their various string names, and their relationships. The Metathesaurus comprises more than 100 source vocabularies, including different terminologies, classifications, and thesauri. It is organized by concepts: a concept represents a meaning, and each meaning has several string names associated with it. These groups of names for a concept are similar to WordNet's synsets. The main purpose of the Metathesaurus is to link all string names of the same concept across the different source vocabularies and to identify the relationships between concepts (UMLS Knowledge Sources Documentation 2005AB). All of this information, including concepts, string names, their relationships, and the source vocabulary each belongs to, is defined in Metathesaurus rich release format (RRF) files provided by MetamorphoSys. The two most important files are MRREL.RRF and MRCONSO.RRF. MRCONSO.RRF connects each concept to its string names and source vocabulary. For example, the following row in the MRCONSO table relates the concept C0001175 to the string name "Acquired Immunodeficiency Syndromes" and declares that this concept belongs to the MSH (Medical Subject Headings) source vocabulary.

C0001175|ENG|P|L0001175|VO|S0010340|Y|A0019182||M0000245|D000163|MSH|PM|D000163|Acquired Immunodeficiency Syndromes|0|N||

The MRREL.RRF file provides the relationships between concepts. For example, the following row means that the two concepts C0002372 and C0002371 have the RB relation (C0002371 is broader than C0002372).

C0002372|A0022284|AUI|RB|C0002371|A0022279|AUI||R01983351||MSH|MSH|||N||

There are 15 relationships defined in the source vocabularies: AQ (allowed qualifier), CHD (has child relationship), DEL (deleted concept), PAR (has parent relationship), QB (can be qualified by), RB (has a broader relationship), RL (alike or similar relationship), RN (has a narrower relationship), RO (has a relationship other than synonymous, narrower, or broader), RQ (related and possibly synonymous), RU (related, unspecified), SIB (has sibling relationship), SUBX (concept removed from current subset), SY (source asserted synonymy), and XR (not related). Figure 2.5, developed from interpreting the MRREL file for UMLS, illustrates some concepts and their relations in the MSH source vocabulary.

[Diagram: MSH concepts C0005768, C0008031, C0010068, C0002962, and C0010051 linked by AQ, PAR, and SIB relations]

Figure 2.5 MSH concepts and relations example
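The RRF rows quoted above are plain pipe-delimited text and can be read with a simple split. The sketch below uses those example rows; the field positions are taken directly from them, and a real parser should follow the column lists in the UMLS documentation.

```python
# Sketch of reading the pipe-delimited RRF rows shown above. Field
# positions follow the example rows; a full MRCONSO/MRREL parser
# should use the column definitions in the UMLS documentation.
mrconso_row = ("C0001175|ENG|P|L0001175|VO|S0010340|Y|A0019182||M0000245|"
               "D000163|MSH|PM|D000163|Acquired Immunodeficiency Syndromes|0|N||")
mrrel_row = ("C0002372|A0022284|AUI|RB|C0002371|A0022279|AUI||R01983351||"
             "MSH|MSH|||N||")

# MRCONSO: concept ID, source vocabulary, and string name.
conso = mrconso_row.split("|")
cui, source, name = conso[0], conso[11], conso[14]
print(cui, source, name)

# MRREL: a relation between two concept IDs.
rel = mrrel_row.split("|")
cui1, relation, cui2 = rel[0], rel[3], rel[4]
print(cui1, relation, cui2)  # RB: C0002371 is broader than C0002372
```

A similarity testbed only needs the (cui1, relation, cui2) triples from MRREL to build the concept graph; MRCONSO supplies the human-readable names.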

The Metathesaurus is not an ontology in the formal sense because concepts are not fully interlinked. The UMLS Semantic Network provides an ontological framework for those concepts by assigning each concept a semantic type defined in the Semantic Network. Semantic types are linked by semantic relations in a tree structure. The Semantic Network consists of 135 semantic types and 54 semantic relations. In the Semantic Network, semantic types are nodes and relationships are links between nodes. There are seven major groups of semantic types: organisms, anatomical structures, biologic function, chemicals, events, physical objects, and concepts or ideas. Semantic types are a set of broad subject categories which are assigned to concepts in Metathesaurus to provide a consistent categorization. Each Metathesaurus concept is assigned at least one semantic type. The semantic type should be the most specific one in the hierarchy. The primary semantic relation is the isa relation that establishes the hierarchy of the Semantic Network, and thus, decides the most specific type that is assigned to the Metathesaurus concept. Besides the isa relation, the five categories of non-hierarchical relations are physically related to, spatially related to, temporally related to, functionally related to, and conceptually related to. The relation that is assigned between semantic

types higher in the hierarchy is inherited by all children of those semantic types. Figure 2.6 (UMLS Knowledge Sources Documentation 2005AB) illustrates part of the Semantic Network, showing relations between semantic types. Semantic types and relations are provided by the SRDEF table, and the structure of the network by the SRSTR table.

[Diagram: semantic types such as Organism, Plant, Alga, Fungus, Archaeon, Attribute, Anatomical Structure, Finding, Laboratory or Test Result, and Sign or Symptom, linked by isa relations and by non-isa relations such as part of, property of, and evaluation of]

Figure 2.6 A UMLS Semantic Network example

The UMLS Metathesaurus together with the Semantic Network represents a biomedical knowledge resource that attempts to standardize the semantics of the various terms from different biomedical vocabularies and to capture relationships between those terms, both within and across vocabularies. The SPECIALIST Lexicon contains many biomedical terms and functions as an English lexicon. It is designed to support the SPECIALIST Natural Language Processing (NLP) System. The lexicon consists of lexical entries covering the spelling variants of a word; each entry includes syntactic, morphological, and orthographic information.

3 Semantic Similarity

3.1 Motivation

Determining the semantic similarity between lexical words has a long history in philosophy, psychology, and artificial intelligence. "Semantics," as compared with "syntactics," which refers to the structure of sentences, is the study of the meanings of linguistic expressions. A primary motivation for measuring semantic similarity comes from Natural Language Processing (NLP) applications, such as word sense disambiguation, text summarization and annotation, information extraction and retrieval, automatic indexing, and lexical selection (Budanitsky 1999). For instance, traditional information retrieval techniques use the occurrence of query terms in documents to retrieve related documents. With semantic similarity measures, queries may be reformulated to include a set of words that are semantically related to the query terms, achieving more flexible and precise information retrieval.

Although NLP applications have served as a motivation for measuring semantic similarity, a new and important use of semantic similarity measures is in determining the degree of interoperability across ontologies, establishing mappings between ontologies, and merging and integrating ontologies from various information systems. In environments with multiple information systems, different systems usually have their own ontologies, even within the same domain, and the level of detail and logic varies between ontologies. For the Semantic Web, data comes from different ontologies, and information processing across ontologies is not possible without knowing the semantic mappings between them.

3.2 Semantic Distance, Similarity and Relatedness Measures

Three different kinds of measures have been used: similarity, distance, and relatedness. Semantic relatedness involves several different types of relationships, such as meronymy (the partOf relation, e.g., a wheel is partOf a car), antonymy, synonymy, and subsumption (the subclassOf relation, e.g., a car isa vehicle). Semantic similarity is a special case of relatedness that uses only the synonymy and subsumption relationships. Depending on the context or the importance of a particular kind of relationship, the terms

car and gasoline may appear to be more closely related than the terms car and bicycle, even though car and bicycle are more similar with respect to the subsumption relationship. Much of the research literature supports the view that distance measures the opposite of similarity. Semantic distance can be used both for the distance between related concepts and for the distance between similar concepts. In our research, semantic distance is used for the inverse of both semantic similarity and semantic relatedness. Given a semantic similarity or relatedness value x, semantic distance is defined as (the subscript is used when it is necessary to distinguish between the two):

distsim(x) = semantic similarity^(-1)(x)    (1)
distrel(x) = semantic relatedness^(-1)(x)    (2)

Although there have been numerous proposals for semantic distance, similarity, and relatedness measures, the majority have been based on two underlying approaches: distance-based measures within a network structure and information content-based measures built on a common parent between concepts. Many of these measures were initially defined in the context of the WordNet ontology.

3.2.1 Network Distance-Based Approach

One of the earliest approaches to determining semantic distance in an ontology is to measure the distance between the nodes corresponding to the words or concepts being compared. The number of edges in the shortest path between the two concepts measures the distance between them; the shorter the distance, the more similar the concepts. This is also called simple edge counting. This first approach was based on a hierarchical isa semantic network (Rada 1989). Although this network distance approach is intuitive and direct, it is not sensitive to the depth of the nodes for which a distance is being calculated. For example, in the hierarchy shown in Figure 3.1, the distance between Plant and Animal is the same as the distance between Horse and Zebra, even though the edges between Horse and Zebra intuitively represent much smaller semantic steps than those between Plant and Animal.


[Figure 3.1 depicts a hierarchy rooted at Living thing, with children Animal and Plant; deeper under Animal, Equine has children Horse and Zebra.]

Figure 3.1 An example of a hierarchy

Intuitively, an increase in depth should decrease the distance between two concepts c1 and c2 that are at similar depths in the hierarchy. To overcome this limitation of equal-distance edges, several methods have been proposed to weight each edge. Weights can be determined by the depth, the density of a node in the hierarchy, the type of link, and the strength of an edge link. Richardson and Smeaton (1995) suggested that the greater the density, the closer the distance between the nodes. Sussna (1993) scaled by the greater depth of the two nodes c1 and c2. Leacock and Chodorow (1998) incorporated the total depth of the taxonomy when calculating the distance:

simLC(c1, c2) = -log(len(c1, c2)/(2D))    (3)

where len(c1, c2) is the length of the shortest path between c1 and c2, and D is the overall depth of the taxonomy. Another variation is based on the lowest common subsumer concept c3 between concepts c1 and c2. For example, in Figure 3.2, the lowest common subsumer of nickel and dime is coin, and of dime and credit is medium of exchange. A semantic similarity based on the lowest common subsumer (Wu and Palmer 1994) is given as:

simWP(c1, c2) = 2*len(root, c3)/(len(c1, c3) + len(c2, c3) + 2*len(root, c3))    (4)

The previously discussed measures use only subsumption, i.e., the isa relationship between concepts. Another proposal includes all relation types and is thus considered a measure of semantic relatedness. Two concepts are semantically related if they are connected by a path that is not longer than an arbitrary fixed constant C and whose direction does not change too often (Hirst and St-Onge 1995):

relHS(c1, c2) = C - len(c1, c2) - k*dirChanges(c1, c2)    (5)

where k is another constant, and dirChanges(c1, c2) is the number of changes of direction in the path. This method is similar to the network distance approach in that it assumes a uniform distance for each edge between nodes, regardless of the type of relationship the edge represents or its depth in the hierarchy.
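The network distance measures above reduce to small arithmetic formulas once the graph quantities are known. Below is a minimal Python sketch of equations (3) through (5), assuming the shortest-path length, taxonomy depth, edge counts to the lowest common subsumer, and direction-change count have already been computed from the ontology; the function names are illustrative:

```python
import math

def sim_sec(path_len):
    """Simple edge counting: the edge count is a distance, so one
    common way to turn it into a similarity is its reciprocal."""
    return 1.0 / path_len

def sim_lc(path_len, depth):
    """Leacock-Chodorow (equation 3): -log(len(c1,c2) / (2D))."""
    return -math.log(path_len / (2.0 * depth))

def sim_wp(len_c1_c3, len_c2_c3, len_root_c3):
    """Wu-Palmer (equation 4), given edge counts from c1 and c2 down
    to the lowest common subsumer c3 and from the root to c3."""
    return 2.0 * len_root_c3 / (len_c1_c3 + len_c2_c3 + 2.0 * len_root_c3)

def rel_hs(path_len, dir_changes, c=8, k=1):
    """Hirst-St-Onge relatedness (equation 5); the constants C and k
    used here are arbitrary placeholders."""
    return c - path_len - k * dir_changes
```

Note that simLC grows as the path shortens, while relHS simply penalizes both path length and direction changes linearly.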

3.2.2 Information Content-Based Approaches

The foundation for this approach is the insight that the conceptual similarity between two concepts c1 and c2 may be judged by the degree to which they share information (Resnik 1995): the more information they share, the more similar they are. In an isa hierarchical network, this common information is contained in the most specific concept that subsumes both c1 and c2, the lowest common subsumer c3. The similarity value is the information content value of c3. For example, Figure 3.2 shows a fragment of the WordNet ontology.

[Figure 3.2 depicts a fragment of the WordNet hierarchy rooted at Medium of exchange, containing the concepts Money, Credit, Cash, Coin, Credit Card, Nickel, and Dime; Coin subsumes Nickel and Dime.]

Figure 3.2 A fragment of the WordNet ontology

The most specific superclass or the lowest common subsumer for nickel and dime is coin, and for nickel and credit card is medium of exchange. The semantic similarity between nickel and dime should be determined by the information content of coin and that between nickel and credit card should be determined by the information content of medium of exchange. According to standard information theory (Ross 1976), the information content of a concept c is defined as -log p(c), where p(c) is the probability of encountering an instance of concept c in a certain corpus. As one moves up the taxonomy, p(c) is monotonically nondecreasing. This probability is calculated as:

p(c) = freq(c)/N (6)

where N is the total number of words in the corpus and freq(c) is defined as:

freq(c) = Σ_{n ∈ words(c)} count(n)    (7)

where words(c) is the set of words subsumed by concept c. Since each instance of a concept that occurs in the corpus is counted as an occurrence of its subsuming concepts, as the probability increases, informativeness decreases. From the perspective of semantic similarity, this means that the higher the position of c in the taxonomy, the more abstract c is, and therefore, the less its information content. Thus, c1 and c2 share less information and have a lower semantic similarity. Resnik (1995) defines the semantic similarity between c1 and c2 as:

simRES(c1, c2) = max_{c ∈ S(c1, c2)} [-log p(c)]    (8)

where S(c1, c2) is the set of concepts that subsume both c1 and c2. To maximize equation (8), c is the lowest concept in the isa hierarchy that subsumes both c1 and c2, i.e., the lowest common subsumer. The network distance approach actually captures some of the information content approach indirectly: if the shortest path between c1 and c2 is long, it is necessary to go high in the taxonomy, to a more abstract concept, to find the lowest common subsumer. Compared with the network distance approach, the information content approach proposed by Resnik is less sensitive to the structure of the ontology. However, it takes only the commonality of the two concepts into account while ignoring their differences. If two pairs of concepts in a sub-hierarchy have the same lowest common subsumer, they always have the same semantic similarity. For example, in Figure 3.2, nickel and credit have the same semantic similarity as money and credit, yet intuitively money and credit are more similar than nickel and credit. Approaches to overcome this limitation incorporate an assessment of the differences between the two concepts.
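Equations (6) through (8) can be sketched as follows; the corpus size and concept frequencies below are invented purely for illustration:

```python
import math

def information_content(freq, total):
    """IC(c) = -log p(c) with p(c) = freq(c)/N (equations 6 and 7,
    where freq already sums the counts of all words under c)."""
    return -math.log(freq / total)

def sim_resnik(subsumer_freqs, total):
    """Resnik similarity (equation 8): the maximum IC over the common
    subsumers of c1 and c2, attained at the lowest common subsumer."""
    return max(information_content(f, total) for f in subsumer_freqs)

# Invented counts in a 1,000,000-word corpus: the more specific
# subsumer "coin" is rarer, and thus more informative, than the
# abstract "medium of exchange".
N = 1_000_000
freqs = {"medium of exchange": 5000, "coin": 50}
```

Taking the maximum over the common subsumers selects the rarest (lowest) one, which matches the lowest-common-subsumer reading of equation (8).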
Lin (1998) argued that the similarity between c1 and c2 should be the ratio between the amount of information in common and the amount of information that separately describes c1 and c2. Shared information content in the most specific subsumer c3 is normalized by the sum of the information content of both concepts. The equation is defined as:

simLin(c1, c2) = 2 log p(c3) / [log p(c1) + log p(c2)]    (9)

The approach taken by Jiang and Conrath (1997) attempted to combine the network distance and information content methods by first relying on network distance and then adding information content as a deciding factor. In their approach, the conditional probability of encountering an instance of a child concept given an instance of its parent concept is used to calculate the information content. Therefore, not only the lowest common subsumer but also the two nodes c1 and c2 are taken into account. If probabilities are assigned as described above for Resnik's method, the distance between c1 and c2 with c3 as the lowest common subsumer becomes:

distJC(c1, c2) = 2 log p(c3) - (log p(c1) + log p(c2))    (10)

This approach measures semantic distance, the inverse of similarity. To ensure coherency in the implementation, a linear transformation (Seco, Veale, and Hayes 2004) was applied to obtain similarity values:

simJC(c1, c2) = 1 - (2 log p(c3) - (log p(c1) + log p(c2)))/2    (11)

The major difficulty in calculating semantic similarity measures based on information content is determining the probabilities of concepts using a corpus. For the same word, related to a concept, different corpora yield different probabilities. In addition, in most cases, the frequency count is based only on a word and not on a word sense. For example, the word "proposal" can refer to a marriage proposal or to a written document.
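Equations (9) through (11) can be sketched directly, assuming the concept probabilities p(c1), p(c2), and p(c3) have already been obtained:

```python
import math

def sim_lin(p1, p2, p3):
    """Lin similarity (equation 9): shared IC at the subsumer c3,
    normalized by the sum of the individual ICs."""
    return 2.0 * math.log(p3) / (math.log(p1) + math.log(p2))

def dist_jc(p1, p2, p3):
    """Jiang-Conrath distance (equation 10)."""
    return 2.0 * math.log(p3) - (math.log(p1) + math.log(p2))

def sim_jc(p1, p2, p3):
    """Linear transformation of equation 11: similarity = 1 - distance/2."""
    return 1.0 - dist_jc(p1, p2, p3) / 2.0
```

Since p(c3) is at least as large as p(c1) and p(c2), simLin falls between 0 and 1, and distJC is 0 exactly when a concept is compared with itself.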

3.3 Determining Information Content Measures

The traditional way to determine the information content of each concept is to calculate the probability of the concept by counting noun frequencies in a specified corpus. However, this approach has its disadvantages. First, the probability of the same word varies across different corpora. Second, this method simply counts word occurrences without taking the word sense into account; therefore, the information content for words with more than one sense is not precise. Seco, Veale and Hayes (2004) proposed a novel metric for information content calculation in WordNet. They argued that WordNet itself can be used as a statistical resource, with no need for external ones such as corpora, and creatively exploited the WordNet taxonomy to compute the probabilities needed for semantic similarity calculations. The assumption is that the taxonomy of

WordNet is organized in a meaningful and structured way. Concepts with many hyponyms communicate less information than leaf concepts; therefore, the more hyponyms a concept has, the less information it expresses. Based on these assumptions, the information content of a concept is computed as follows:

ICwn(c) = log[(hypo(c) + 1)/maxwn] / log(1/maxwn) = 1 - log(hypo(c) + 1)/log(maxwn)    (12)

where hypo(c) is the number of hyponyms of concept c and maxwn is the maximum number of concepts in the taxonomy. According to this formula, the information content value of every leaf node is 1, and values decrease monotonically from the leaf nodes toward the root. The evaluation of their approach is described in section 4.1.
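Equation (12) is straightforward to implement; a sketch, using the WordNet noun-hierarchy size of 75804 concepts (mentioned in section 5.3.1) as maxwn:

```python
import math

def ic_wn(hypo_count, max_wn):
    """Intrinsic information content (equation 12):
    IC(c) = 1 - log(hypo(c) + 1) / log(max_wn)."""
    return 1.0 - math.log(hypo_count + 1) / math.log(max_wn)

# Leaf concepts (no hyponyms) get an IC of exactly 1, since
# log(0 + 1) = 0; concepts with more hyponyms score lower.
MAX_WN = 75804
```

No corpus is consulted anywhere: the only inputs are counts taken from the taxonomy itself.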

4 Other Researchers' WordNet and UMLS Experiments

The primary approach used to validate these various measures of semantic similarity, relatedness, and distance has been through experiments that calculate the measures on word pairs. The resulting values are then compared to human judgments on the same pairs. The principal ontology used in these experiments has been WordNet. Recent experiments have used UMLS to evaluate the simple edge counting measure. The previous experiments found in the research literature using these two ontologies are described below.

4.1 WordNet Experiments

To evaluate the performance of different semantic similarity measures, various researchers have carried out experiments using WordNet. Most experiments have used a set of 30 noun pairs (see Tables 5.3 and 5.4) from an experiment by Miller and Charles (1991), in which the similarity of meaning of each of the 30 noun pairs was rated by human judges. For each pair, a semantic similarity is specified in the range 0 to 4, where 0 means no similarity and 4 means perfect synonymy. These 30 noun pairs were extracted from the 65 noun pairs used to obtain synonymy judgments by Rubenstein and Goodenough (1965). For the purpose of evaluation, most experiments assessing the performance of various semantic similarity measures compared their

computational results with human experts' ratings (Budanitsky 1999). After computing the semantic measures for all of the pairs, Pearson coefficients of correlation between the human ratings and the semantic similarity measures were reported. In 1995, Resnik first evaluated his information content method by comparing it with the simple edge counting measure; his results showed that his method had a higher correlation than simple edge counting. Jiang and Conrath (1997) and Lin (1998) evaluated their own revised information content methods using the same test cases as Resnik's WordNet experiments. Jiang and Conrath compared their method with Resnik's method and simple edge counting, and found that their method had the highest correlation of the three, followed by Resnik's method, with simple edge counting the lowest. Lin compared three information content methods, Lin's, Resnik's, and Wu-Palmer's, and ranked their correlations from highest to lowest as Lin's method, Wu-Palmer's method, and Resnik's. Other researchers investigated the use of their particular measures within WordNet for particular applications, such as Hirst-St-Onge (1995) for malapropism detection and Leacock-Chodorow (1997) for word sense identification. To validate the new information content measure and approaches described in section 3.3, experiments were performed that substituted this measure of information content in place of Resnik's information content measure within the semantic similarity measures (Seco, Veale, Hayes 2004). The experiment used the same 30 noun pairs provided by Miller and Charles to validate the semantic similarity measures based on the new information content measure described in section 3.2.2.
The new information content semantic similarity measures produced coefficients of correlation with human judgments that were almost identical to those of the semantic similarity measures using information content based on probabilities obtained from a document corpus (Table 1 in Seco, Veale, and Hayes 2004). The values produced for the same semantic similarity measure on the same word pair differ across the numerous WordNet experiments described above for several reasons. For the information content methods, different corpora were used in the individual experiments to determine word frequencies (Resnik 1995, Jiang and Conrath 1997), so concept probabilities could differ and, therefore, cause differences in the

information content measures. Seco, Veale and Hayes (2004) applied an entirely different way to calculate the information content of each concept, depending on the WordNet ontology itself without referring to any external resource, as described in section 3.3. The previous experiments also used different versions of WordNet: WordNet 1.4 for Resnik (1995), WordNet 1.5 for Jiang and Conrath (1997), and WordNet 2.0 for Seco, Veale and Hayes (2004).

4.2 Semantic Similarity Experiments in UMLS

The majority of research evaluating semantic similarity measures has focused on using WordNet as the ontology, and only recently have others begun to use different ontologies. One of the few experiments assessing semantic similarity in a biomedical domain ontology (Caviedes and Cimino 2004) used the UMLS ontology to examine the performance of only the simple edge-counting (shortest path) distance measure. Their experiments are based on three source vocabularies in the Metathesaurus, described below:

• MSH, the Medical Subject Headings standard terminology.
• SNMI, the Systematized Nomenclature of Medicine, also referred to as SNOMED.
• ICD9CM, the International Classification of Diseases, ninth revision, Clinical Modifications.

The first experiment computed conceptual distances between related concepts within the three previously described vocabularies. As in the WordNet experiments, the calculated conceptual distances were compared against the distance scores provided by three domain experts. The goal was not to evaluate the performance of the shortest path approach but to use this semantic similarity measure to determine which source vocabulary had the best correlation with the human judgment scores. Eleven concepts were used in this experiment, as shown in Table 4.1:

CUI        Name
C0002962   Angina pectoris
C0003811   Arrhythmia
C0007912   Cardiomyopathy, alcoholic
C0010068   Coronary diseases
C0018799   Heart diseases
C0018834   Heartburn
C0018802   Heart failure, congestive
C0000737   Abdominal pain, unspecified site
C0020621   Hypokalemia
C0030631   Passive-aggressive personality disorder
C0035238   Respiratory system abnormalities

Table 4.1 Eleven related concepts in UMLS vocabularies

In this experiment, 55 or fewer conceptual distances (some concepts do not exist in all three vocabularies) were computed between the pairs of concepts listed in Table 4.1 based on the PAR (parent) relation. These distances were calculated in each of the three separate source vocabularies, MSH, SNMI, and ICD9CM, and also in a joint MSH-SNMI vocabulary. The experiment produced the ordering (from best to worst) of MSH, joint MSH-SNMI, ICD9CM, and SNMI based on correlation with expert judgment. In addition, another set of conceptual distances in joint MSH-SNMI was computed based on the RB (broader-than) relation; its correlation was higher than SNMI's and lower than ICD9CM's. The next experiment examined the conceptual distances between unrelated concepts to evaluate the behavior outside the normal range. The concepts used in this part of the experiments are listed in Table 4.2:

CUI        Name
C0878705   Plica syndrome
C0878707   Precipitous drop in hematocrit
C0878752   Abnormal loss of weight and underweight
C0878754   Genetic counseling and testing on procreative management
C0878756   Perpetrator of child and adult abuse
C0878757   Child battering and other maltreatment by father, stepfather, or boyfriend
C0917805   Transient cerebral ischemia
C0917967   Papillary functions, abnormal
C0920296   Reading disorder, developmental
C0936250   Eczema herpeticum
C0949122   Acute laryngitis without mention of obstruction

Table 4.2 Unrelated concepts

The results were a conceptual simple edge counting distance of 7 or larger for those pairs, compared to 6.7 or larger for the human judgment results. The final experiment carried out by Caviedes and Cimino measured the conceptual distance among clusters of concepts in MSH and SNMI. The distances among four clusters, with three concepts in each cluster, were computed and evaluated against expert scores; they recorded a high correlation with the expert scores. These experiments also "re-discovered" the problem of using simple edge-counting distances, i.e., "pairs of closely related general concepts are less similar than pairs of more specific concepts" (Caviedes and Cimino 2004).

5 Experiments Using the Protégé Testbed

5.1 Overview

Currently, most of the experiments to evaluate the performance of semantic similarity measures use WordNet as the test ontology. A few are just beginning to experiment with

medical ontologies such as UMLS. These experiments are performed using software that is dependent on the particular ontology being used to determine semantic similarity. One objective of this research is the development of an experimental testbed and automated methods that permit evaluating the validity and applicability of semantic relatedness measures on a variety of different categories of ontologies and across different domain ontologies. This testbed also permits the efficient development of new measures of semantic relatedness. It is the basis for the experiments comparing the previously described semantic similarity measures on a recent version of WordNet and on subsets of the UMLS ontology. The primary resources used in this research are the Protégé 3.1 ontology editor and the two different ontologies, WordNet and UMLS. Six semantic relatedness measures (specifically semantic similarity measures, since only the isa taxonomic link is used), the Resnik, Lin, Jiang and Conrath (JC), Wu-Palmer (WP), Leacock-Chodorow (LC), and simple edge counting (SEC) measures, have been implemented within Protégé as a tab plugin to compute semantic similarity between concepts in WordNet and between concepts in the MSH, SNMI, and ICD9CM UMLS source vocabularies. The first step in our experiments was to verify our implementation of the six measures. We imported an OWL version of WordNet 1.7 into Protégé 3.1 and used the same 30 noun pairs as previous experiments to test whether our code could obtain results similar to those published in the papers described above. Similar results means close similarity values for each pair, close coefficients of correlation with the Miller and Charles ratings, or the same ranking of all four methods as summarized by Budanitsky and Hirst (2001). When computing the information content of each concept, we used the ontology-derived information content measure described in section 3.3.
Our experiment results were compared to Seco, Veale, and Hayes’s (2004) results. In the second step, we applied our experimental testbed to MSH, SNMI, and ICD9CM in UMLS to evaluate the six semantic similarity measures. Since most of the previous experiments are based on the WordNet ontology, our experiments are the first to investigate the performance of these measures within an entirely different ontology. Our experiments rely on the test cases developed to evaluate the simplest semantic distance based measure, the simple edge-counting measure within the UMLS ontology as

described previously in section 4.2. We evaluate multiple measures from both the network distance-based and the information content-based families, using the information content measure described in section 3.3. After these evaluations, the overall performance of the various semantic similarity measures was analyzed by comparing them within an ontology, within each vocabulary, and across the two different ontologies, WordNet and UMLS.

5.2 Difficulties in Acquiring WordNet and UMLS Ontologies

In undertaking this research, the first step taken was to verify our implementations of the various semantic similarity measures. We began by importing WordNet into Protégé so that the previous experiments of Seco et al. (2004) could be performed using our implementations. Currently, Protégé 3.1 supports several file formats: RDF, OWL, XML, and .pprj (the default Protégé project format). Great effort was required to find suitable formats of WordNet and UMLS that could be loaded into Protégé. For the WordNet ontology, we found that the W3C published several projects that convert WordNet into RDF and OWL formats. Most of them, however, are not suitable for this research: some of those projects are only partially complete, and others do not have the parent (hypernym) and children (hyponym) properties defined for each lexical concept. Those two properties are indispensable in calculating semantic similarity between concepts. After examining all available versions of WordNet, we decided to import an OWL version of WordNet 1.7 implemented by the knOWLer project, Neuchatel University (http://taurus.unine.ch/knowler/wordnet.html). This implementation was selected because it converts WordNet 1.7 completely: not only are all lexical concepts in WordNet 1.7 defined, but all property values of each lexical concept are assigned. Therefore, the hypernym and hyponym relationships needed for calculating semantic similarity measures are available. The UMLS experiments are based on the Metathesaurus. Three source vocabularies, ICD9CM, SNMI, and MSH, which are subsets of the Metathesaurus, were used. We could not find versions of those three source vocabularies in a suitable format that could be imported into Protégé. Therefore, we developed our own OWL versions of these source vocabularies using the MRCONSO and MRREL files.
To produce an importable OWL file for each vocabulary, first, all lexical concepts were

retrieved by reading the SAB (source abbreviation) field from the MRCONSO file; the SAB values for those three source vocabularies are ICD9CM, SNMI, and MSH. MRCONSO also maps the string identifiers (SUI), which are instances of the StringObject class, to their lexical concepts (CUI). The values of each property, specifically the CHD (child) and PAR (parent) properties, were determined by reading the corresponding rows of the MRREL file. We would like to acknowledge the contribution of Sam Holton, who converted the three vocabularies into OWL files. Another difficulty in importing the ontologies was their huge size. WordNet is more than 65MB; of the three UMLS vocabularies, ICD9CM is 22MB, and MSH and SNMI are each more than 100MB. When trying to import WordNet into Protégé, an out-of-memory error consistently occurred. To address this problem, the maximum heap size defined in Protégé was reset to 1.5GB so that more memory could be allocated to the heap for the importing task, but this was still not enough to avoid the error. In our experiments, a computer with 2GB of memory was used to avoid the out-of-memory problem.
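For reference, the heap limit is a JVM setting rather than a Protégé option; assuming Protégé is started through the standard java launcher, the limit is raised with the -Xmx flag. The jar name below is illustrative, and the actual launch script varies by installation:

```shell
# Illustrative only: raise the JVM maximum heap to 1.5 GB before
# importing large OWL files (WordNet, MSH, SNMI). The jar/script
# name depends on the Protege installation.
java -Xmx1500m -jar protege.jar
```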

5.3 WordNet and UMLS Ontology Structure

In our experiments, the WordNet and UMLS vocabularies imported into Protégé 3.1 are expressed as extensional ontologies in OWL format. Their intensional ontologies, as represented in the OWL files, are described below.

5.3.1 WordNet in OWL

In this OWL version of WordNet 1.7, there are ten classes. The hierarchical structure is shown below, where an arrow represents the isa relationship:

[Figure 5.1 depicts the class hierarchy: Owl:Thing at the root with subclasses LexicalConcept and WordObject; the remaining classes (Nouns_and_Adjectives, Nouns_and_Adverbs, Adverb, Adjective, AdjectiveSatellite, Noun, and Verb) fall under LexicalConcept.]

Figure 5.1 Hierarchical structure of WordNet's intensional ontology

All words appearing in WordNet are defined as instances of the WordObject class; there are 140355 instances of WordObject. WordObject properties include antonymOf, inSynsetOf, participleOf, pertainsTO, seeAlso, and Synset. Each WordObject instance is associated with one or more lexical concepts. The LexicalConcept class includes all WordNet synsets; each synset is defined as a lexical concept with a unique ID assigned to it. All lexical concepts are instances of the Adjective, Noun, Verb, or AdjectiveSatellite class. The words used in this experiment are all nouns, so only lexical concepts of the Noun class were involved in the calculations; there are 75804 noun concepts. In the WordNet ontology, all direct parents of a concept are given as values of the HyponymOf (parents) property, and all direct children of a concept are given as values of the HypernymOf (children) property. The multi-valued property wordForm of the LexicalConcept class takes as its values all WordObject IDs associated with the lexical concept; this set of associated WordObjects is referred to as a synset in WordNet. On the interface of the plugins described in section 5.4, the user can choose to enter either the word or the concept ID. If a word is entered, the lexical concept ID corresponding to the synset it belongs to is retrieved. Semantic similarity calculations are performed using lexical concepts. If more than one concept is related to the word, all related concepts are retrieved, and the

29 semantic similarity values of all related concepts are calculated. The highest value is considered as the final semantic similarity measure for the word pair.
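The sense-handling rule just described (compute the similarity for every combination of concepts associated with the two words and keep the highest value) can be sketched as follows, where concept_sim stands in for any of the implemented concept-level measures:

```python
def word_pair_similarity(concepts1, concepts2, concept_sim):
    """Compute the similarity for every combination of the concepts
    (senses) associated with the two words and keep the highest value.

    concepts1, concepts2: lists of lexical-concept IDs for each word
    concept_sim: any concept-level similarity function
    """
    return max(concept_sim(c1, c2)
               for c1 in concepts1
               for c2 in concepts2)

# Toy example: precomputed similarity scores for each concept pair.
toy_sim = {("a", "x"): 0.2, ("a", "y"): 0.9,
           ("b", "x"): 0.5, ("b", "y"): 0.1}
```

With five concepts for one word and two for the other, as in the gem/jewel example, this evaluates 10 combinations and reports the maximum.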

5.3.2 UMLS in OWL

The UMLS experiments were carried out using its Metathesaurus component. The Metathesaurus is a large database containing multilingual vocabularies of biomedical and health-related concepts, concept names, and relationships between concepts. The Metathesaurus includes more than 100 source vocabularies, which are classifications, lists of controlled terms, terminologies, etc. (UMLS Knowledge Sources Documentation 2005AB). In our experiments, semantic similarities between concepts in three source vocabularies, ICD9CM, SNMI, and MSH, were calculated. We followed the hierarchical structure for the Metathesaurus intensional ontology shown below:

[Figure 5.2 depicts the class hierarchy: Owl:Thing at the root with subclasses UMLSConcept and StringObject; CUI falls under UMLSConcept.]

Figure 5.2 Hierarchical structure of the intensional UMLS Metathesaurus

The UMLSConcept class corresponds to the LexicalConcept class in WordNet, which contains all synsets; each synset is considered a concept with a unique ID. The StringObject class stores all string names and their string IDs and corresponds to the WordObject class in WordNet. Each string name is associated with a concept. All string IDs, string names, concept IDs, and properties are extracted from the MRCONSO.RRF and MRREL.RRF tables. The MRCONSO.RRF table relates each string name to its associated concepts and also identifies the source vocabulary (SAB) of each concept. All direct parents of a concept are given as values of the PAR property, and all direct children of a concept are given as values of the CHD property. The CHD and PAR properties extracted from the MRREL.RRF table correspond to HypernymOf and HyponymOf in WordNet.

5.4 Testbed Description

Two plugins were developed as the testbed for this research. One is the integrated plugin, which is used to interactively retrieve the semantic similarity measure of user-entered word pairs. It implements the six semantic similarity measures: simple edge counting (SEC), Leacock-Chodorow (LC), Wu-Palmer (WP), Resnik, Lin, and Jiang and Conrath (JC). Figure 5.3 shows the interface of this plugin. These interfaces were not designed specifically for this research but for calculating the semantic relatedness of an arbitrary ontology that has a structure similar to those described in sections 5.3.1 and 5.3.2. This interface is primarily used for testing new implementations of semantic relatedness measures.

Figure 5.3 The interface of the integrated tab plugin

Because different ontologies have different class names and property names, to make the plugin more generic so that it can be used across different ontologies, several arguments need to be selected from the plugin interface by the user. First, the user should select the class which contains concept instances.

The next two arguments are two properties; both the information content and network distance methods use the values of a parent property and a children property. The parent property is used to find the least subsumer of two concepts and the root nodes of the taxonomy. The children property is used to obtain the information content value of each concept and to find the leaf nodes of the taxonomy. The least subsumer is determined using the following method. After the two concepts related to the word pair are determined, all parents (subsumers) of these two concepts are obtained by recursively searching the values of the parent property of the given concepts: if any of those parents have their own parents, those values are also considered parents of the retrieved concept, and the search ends when the root concept is found. After obtaining all parents of the two concepts, the common parents are saved. For each of those common parents, the information content value is calculated using Seco, Veale and Hayes's (2004) formula discussed in section 3.3. The number of children of a concept is obtained by counting the values of the children property; this is also a recursive process, and the search for children finishes when the leaf concepts are reached. The common parent with the highest information content value is the least subsumer for the information content measures. If Lin's or Jiang and Conrath's measure is selected by the user, the information content value of each concept must also be calculated. For the network distance measures, the least subsumer is the one with the shortest distance between the two concepts. To enter word pairs, the user also needs to select the class that defines all words as instances, and the property of that class that maps a word to its associated concept(s). Then, the user can enter the word pairs either by typing the word or by choosing the word from the list in the pop-up window (shown in Figure 5.4).
All word pairs are listed in the lower-left box below the mapping property. When several concepts are associated with each word in the pair, all combinations of those concepts are returned. As shown in Figure 5.4, five concepts are associated with gem and two with jewel, so ten combinations are returned. The last argument specifies whether any cycles exist in the ontology. Cycles exist in all three UMLS vocabularies, but not in WordNet; different algorithms were implemented to handle the two conditions and are described in Section 5.6.1. The user also needs to select which measures to use from those listed in the right pane of the interface by checking the box in front of each measure. With all arguments set up, the semantic similarity values can be obtained by clicking the "Get Value" button below the measures; the values are then returned in the table below the button.
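The least subsumer search described above can be sketched as follows. This is a minimal illustration assuming a simple dictionary-based taxonomy; all function and variable names are hypothetical, not the plugin's actual code.

```python
import math

def all_ancestors(parents, concept):
    """Recursively collect every ancestor by following the parent property."""
    found, stack = set(), [concept]
    while stack:
        for p in parents.get(stack.pop(), []):
            if p not in found:
                found.add(p)
                stack.append(p)
    return found

def descendant_count(children, concept):
    """Recursively count all descendants by following the children property."""
    found, stack = set(), [concept]
    while stack:
        for c in children.get(stack.pop(), []):
            if c not in found:
                found.add(c)
                stack.append(c)
    return len(found)

def information_content(children, concept, total_concepts):
    """Seco, Veale and Hayes (2004): ic(c) = 1 - log(hypo(c) + 1) / log(N)."""
    return 1.0 - math.log(descendant_count(children, concept) + 1) / math.log(total_concepts)

def least_subsumer(parents, children, c1, c2, total_concepts):
    """The common subsumer with the highest information content."""
    common = (all_ancestors(parents, c1) | {c1}) & (all_ancestors(parents, c2) | {c2})
    return max(common, key=lambda c: information_content(children, c, total_concepts),
               default=None)
```

For a toy four-concept taxonomy animal > mammal > {dog, cat}, the least subsumer of dog and cat is mammal, since its information content exceeds that of the more general animal.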

Figure 5.4 The pop-up window used for entering word pairs

Table 5.1 lists all arguments for the WordNet and UMLS experiments. Figures 5.5 (a) and (b) show screenshots of the ICD9CM and WordNet experiments, respectively, with all arguments entered.

Arguments                          WordNet     UMLS
Concept class                      Noun        CUI
Parent property                    hyponymOf   PAR
Children property                  hypernymOf  CHD
Word class                         WordObject  CUI
Mapping property                   wordForm    StringForm
Does the ontology have any cycle?  No          Yes

Table 5.1 Arguments of the WordNet and UMLS experiments


Figure 5.5 Screenshots of the (a) ICD9CM and (b) WordNet experiments with all arguments entered

The other plugin is a test plugin; its screenshot is shown in Figure 5.6. The user can break the calculations down into several steps and retrieve the results of each step, so this plugin can be used to verify experiment results. As in the integrated plugin, the user still needs to choose several arguments: the concept class, the parent property, and the children property. The functions of the test plugin include retrieving:
• the associated concept(s) of a given word
• the children concepts of a given concept
• the parent concepts of a given concept
• the common parents of two concepts
• the maximum information content value of the common parents
• the information content value of a given concept
• the overall depth of the taxonomy
• the root nodes of the taxonomy


Figure 5.6 The interface of the test plugin

For both the information content and network distance measures, the least subsumer of the word pair is retrieved for the calculations. Both plugins are based on the extensional ontology; in other words, all experiments calculated the similarity of instances, not of classes. We used these two plugins for both the WordNet and UMLS experiments. They are suitable not only for the WordNet and UMLS ontologies, but for all extensional ontologies with a similar terminological structure. This capability is provided in the user interface by selecting the class and the relationships to use for the extensional taxonomic structure. To make the experiments repeatable, another ontology, TestCase, was created to save and load the test cases used in the experiments and to store their results. Figure 5.7 shows the hierarchical structure of this ontology. Classes and their associated properties are listed in Table 5.2. The function of each property is explained as follows:
• measuresName: saves the names of the measures used in the experiment
• ontologyExtensional: if the ontology is an extensional ontology, saves the file name in this property
• ontologyIntensional: if the ontology is an intensional ontology, saves the file name in this property
• relationName: the name of the relation used in the calculation, such as hyponymOf
• similarityValues: for each pair, saves the similarity values of each measure
• pairName1: the name of concept 1
• pairName2: the name of concept 2
In our experiments, all word pairs are saved as instances of the InstancePairs class, so the number of InstancePairs instances equals the number of word pairs. For each test case, there is one instance of Definition and one instance of Measures. By clicking the load button on the integrated interface, the user can load previous word pairs by reading the instances of the InstancePairs class. By clicking the save button, the current word pairs and semantic relatedness values are saved in the TestCase ontology for future use.

Figure 5.7 Hierarchical structure of the TestCase ontology: TestCase contains the classes Definition, Measures, and TestPairs; TestPairs has the subclasses ConceptPairs (intensional) and InstancePairs (extensional)

Class          Properties                                              Function
Definition     measuresName, ontologyExtensional, ontologyIntensional  Save the names of measures and the ontology file
Measures       relationName, similarityValues                          Save the names of relations and the similarity values
TestPairs      pairName1, pairName2
ConceptPairs   pairName1, pairName2                                    Save test pairs of an intensional ontology
InstancePairs  pairName1, pairName2                                    Save test pairs of an extensional ontology

Table 5.2 Classes and their associated properties of the TestCase ontology

5.5 WordNet Experiment

5.5.1 Experiment and results
The main purpose of this experiment is to evaluate the validity of our plugins. The semantic similarity values of 29 of the 30 word pairs in Miller and Charles's (1991) experiment were obtained using three information content-based (IC) measures: Resnik, Lin, and Jiang and Conrath (JC), and three network distance-based (ND) measures: simple edge counting (SEC), Wu-Palmer (WP), and Leacock-Chodorow (LC). Because the word "furnace" is not contained in WordNet 1.7, the semantic similarity value for the pair "furnace" and "stove" could not be obtained in this experiment. These values are also compared to the human judgment scores published by Miller and Charles (1991). Table 5.3 lists the semantic similarity values for the IC measures and Table 5.4 for the ND measures. Columns 3 through 6 are our results and columns 7 through 9 are from Seco, Veale and Hayes (2004).

Word Pair           M&C(1)  Resnik  Lin   JC    Avg IC(2)  ResnikSVH(3)  LinSVH  JCSVH
car, automobile     3.92    0.68    1     1     0.89       0.68          1       1
gem, jewel          3.84    1       1     1     1          1             1       1
journey, voyage     3.84    0.59    0.73  0.79  0.7        0.66          0.84    0.88
boy, lad            3.76    0.58    0.69  0.74  0.67       0.76          0.86    0.88
coast, shore        3.7     0.79    0.98  0.99  0.92       0.78          0.98    0.99
asylum, madhouse    3.61    0.94    0.99  0.99  0.97       0.94          0.97    0.97
magician, wizard    3.5     0.74    1     1     0.91       0.8           1       1
midday, noon        3.42    1       1     1     1          1             1       1
food, fruit         3.08    0.05    0.12  0.63  0.27       0.05          0.13    0.63
bird, cock          3.05    0.4     0.6   0.73  0.57       0.4           0.6     0.73
bird, crane         2.97    0.4     0.6   0.73  0.57       0.4           0.6     0.73
tool, implement     2.95    0.42    0.93  0.97  0.77       0.42          0.93    0.97
brother, monk       2.82    0.19    0.22  0.33  0.24       0.18          0.22    0.33
crane, implement    1.68    0.24    0.37  0.59  0.4        0.24          0.37    0.59
lad, brother        1.66    0.19    0.21  0.28  0.22       0.18          0.2     0.28
journey, car        1.16    0       0     0     0          0.1           0       0
monk, oracle        1.1     0.19    0.22  0.35  0.25       0.18          0.22    0.34
cemetery, woodland  0.95    0.05    0.06  0.19  0.1        0.05          0.06    0.19

(1) M&C: human judgment data obtained by Miller and Charles (1991)
(2) Avg IC: average of the semantic similarity of each concept pair over the three information content measures
(3) SVH: data obtained by Seco, Veale, and Hayes (2004)

Word Pair          M&C   Resnik  Lin   JC    Avg IC  ResnikSVH  LinSVH  JCSVH
food, rooster      0.89  0.05    0.08  0.4   0.18    0.05       0.08    0.4
coast, hill        0.87  0.51    0.63  0.71  0.62    0.5        0.63    0.71
forest, graveyard  0.84  0.05    0.06  0.19  0.1     0.05       0.06    0.19
shore, woodland    0.63  0.08    0.1   0.29  0.16    0.08       0.11    0.3
monk, slave        0.55  0.19    0.23  0.38  0.27    0.18       0.23    0.39
coast, forest      0.42  0.08    0.1   0.28  0.15    0.08       0.1     0.29
lad, wizard        0.42  0.19    0.22  0.35  0.25    0.18       0.21    0.32
chord, smile       0.13  0.26    0.29  0.36  0.3     0.25       0.28    0.35
glass, magician    0.11  0.08    0.11  0.34  0.18    0.18       0.2     0.31
noon, string       0.08  0       0     0     0       0          0       0
rooster, voyage    0.08  0       0     0     0       0          0       0

Table 5.3 Information content semantic similarity measures for the WordNet experiment

Word Pair           M&C   WP    LC    LC (normalized)  SEC   Avg ND(4)
car, automobile     3.92  1     5.83  1                1     1
gem, jewel          3.84  1     5.83  1                1     1
journey, voyage     3.84  0.91  3.09  0.53             0.92  0.79
boy, lad            3.76  0.89  3.53  0.61             0.92  0.81
coast, shore        3.7   0.89  3.53  0.61             0.92  0.81
asylum, madhouse    3.61  0.93  3.53  0.61             0.92  0.82
magician, wizard    3.5   1     5.83  1                1     1
midday, noon        3.42  1     5.83  1                1     1
food, fruit         3.08  0     1.58  0.27             0.46  0.24
bird, cock          3.05  0.93  3.53  0.61             0.92  0.82
bird, crane         2.97  0.82  2.43  0.42             0.77  0.67
tool, implement     2.95  0.89  3.53  0.61             0.92  0.81
brother, monk       2.82  0.44  1.92  0.33             0.62  0.46
crane, implement    1.68  0.6   2.14  0.37             0.69  0.55
lad, brother        1.66  0.5   2.14  0.37             0.69  0.52
journey, car        1.16  0     0     0                0     0
monk, oracle        1.1   0.36  1.58  0.27             0.46  0.36
cemetery, woodland  0.95  0     1.33  0.23             0.31  0.18
food, rooster       0.89  0     0.96  0.16             0     0.05
coast, hill         0.87  0.6   2.14  0.37             0.69  0.55
forest, graveyard   0.84  0     1.33  0.23             0.31  0.18
shore, woodland     0.63  0.29  1.92  0.33             0.62  0.41
monk, slave         0.55  0.5   2.14  0.37             0.69  0.52
coast, forest       0.42  0.25  1.73  0.3              0.54  0.36
lad, wizard         0.42  0.5   2.14  0.37             0.69  0.52
chord, smile        0.13  0.38  0.96  0.16             0.23  0.26
glass, magician     0.11  0     1.58  0.27             0.46  0.24
noon, string        0.08  0     0     0                0     0
rooster, voyage     0.08  0     0     0                0     0

Table 5.4 Network distance-based semantic similarity measures for the WordNet experiment

(4) Avg ND: average of the semantic similarity of each concept pair over the three network distance-based measures
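The three network distance-based measures reported in Table 5.4 can be sketched as below. This is an illustrative reconstruction from the formulas discussed in Section 3.2.1, with hypothetical names; the zero-distance workaround and the distance-to-similarity conversion are explained in the text that follows.

```python
import math

def sec_similarity(distance, max_distance):
    """Simple edge counting, converted from a distance to a similarity (equation 13)."""
    return 1.0 - distance / max_distance

def lc_similarity(distance, overall_depth):
    """Leacock-Chodorow: -log(len / (2 * D)); a distance of 0 (identical
    concepts) is replaced by 0.1 to avoid taking log(0)."""
    return -math.log(max(distance, 0.1) / (2.0 * overall_depth))

def wp_similarity(depth_of_lcs, len1, len2):
    """Wu-Palmer: 2*N3 / (N1 + N2 + 2*N3), where N3 is the depth of the least
    subsumer and N1, N2 the path lengths from each concept down to it."""
    return 2.0 * depth_of_lcs / (len1 + len2 + 2.0 * depth_of_lcs)
```

Assuming an overall noun-taxonomy depth of 17, `lc_similarity(0, 17)` gives -log(0.1/34) ≈ 5.83, which reproduces the value shown for same-synset pairs such as car and automobile in Table 5.4.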

When implementing the network distance-based measures, several issues were considered. First, all of these measures need the length of the path between two concepts c1 and c2, expressed as len(c1,c2); len(c1,c2) is always the shortest path between c1 and c2. If c1 and c2 have no common parents, the similarity between the two concepts is 0. Second, because there are nine different noun trees in WordNet, the least subsumer may be contained in several different trees due to multiple word senses. In this case, the overall depth used in the LC measure is the maximum depth over all related trees, and the length from the root to the least subsumer c3 used in the WP measure is the maximum depth from all roots of the least subsumer. Third, because the simple edge counting method returns a distance between two concepts, while the human scores measure similarity, the distance value was converted to a similarity using the following equation:

Similarity = 1 - distance / maximum of all distance values    (13)

Another problem arises for the LC measure when the minimum distance between concepts is 0, that is, when they are the same concept: the similarity value becomes infinite because log(0) is undefined. To avoid this problem, we arbitrarily used 0.1 instead of 0 as the minimum distance. We also compared our results with Seco, Veale and Hayes's (2004) to see whether they are similar. Most of the word pairs have comparable semantic similarity values except for three: journey and voyage, boy and lad, and glass and magician. For these, the semantic similarity values of Seco, Veale and Hayes are greater than ours; the upper difference is 0.18 for the pair boy and lad, and the lower difference is 0.03 for the pair glass and magician.
The reason is that our experiment used WordNet 1.7 while theirs used WordNet 2.0. The total number of lexical noun concepts differs between the versions: 79,689 in WordNet 2.0 and 75,804 in WordNet 1.7. The hierarchical structure is therefore different and denser in WordNet 2.0. With more concepts, a word pair may have a different least subsumer, probably one lower in the hierarchy, and thus a higher information content and a higher semantic similarity. Table 5.5 presents the least subsumers of those three pairs in the two WordNet versions.

Word pair       Least subsumer in WordNet 1.7  Least subsumer in WordNet 2.0
journey-voyage  travel, traveling, travelling  journey, journeying
boy-lad         male, male person              male, male person
glass-magician  object, physical object        causal agent, cause, causal agency

Table 5.5 Least subsumers of three word pairs in WordNet 2.0 and WordNet 1.7

It is obvious that the least subsumers for the first and last pairs in Table 5.5 are more specific in WordNet 2.0 than in WordNet 1.7 and thus have higher information content. Although the pair boy and lad has the same least subsumer, "male, male person", in both versions, the number of children of this concept differs between versions; this difference in the number of children also produces a different information content value. Table 5.6 lists the correlations of our experimental results and of Seco, Veale and Hayes's results with the human judgment scores. They implemented the three information content semantic similarity measures using their new information content measure described in Section 3.3, and their published correlation values for the network distance-based measures were obtained using WordNet::Similarity, an independent software package developed by Siddharth Patwardhan and Ted Pedersen (http://wn-similarity.sourceforge.net/).

Measure               Correlation (our results)  Correlation (Seco, Veale and Hayes)
Resnik                0.79                       0.77
Lin                   0.83                       0.81
Jiang and Conrath     0.85                       0.84
Avg IC                0.84                       0.83
Wu-Palmer             0.8                        0.74
Leacock-Chodorow      0.79                       0.82
Simple edge counting  0.77                       0.77
Avg ND                0.82                       0.82

Table 5.6 Correlations with human judgment for the WordNet experiment
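The correlations in Table 5.6 (and, as stated in Section 5.6.1, those in the UMLS tables) are Pearson coefficients between a measure's similarity values and the human scores over the word pairs. A minimal sketch of that computation, with illustrative names:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)
```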

From these experimental results, we conclude that our plugins implemented the measures correctly. The values produced by our plugins for the information content measures agree closely with those obtained in the Seco, Veale and Hayes experiments, as can be seen in Table 5.6, except for the three previously explained word pairs. After verifying the validity of our plugins, we carried out another set of experiments based on the UMLS ontology.

5.5.2 Analysis
For the three information content semantic similarity measures, our correlation with the human judgment scores, ranked from highest to lowest, is JC, Lin, and Resnik. This ordering agrees with that of Seco, Veale and Hayes's results. The higher correlation for the JC and Lin measures is expected since, as previously described, Resnik's measure only takes into account the commonality of the two concepts when determining semantic similarity, while both the JC and Lin measures also incorporate the amount of information that separately describes the two concepts. For the three network distance-based measures, the ranking by our correlation from highest to lowest is WP, LC, and SEC. Although there is very little difference in the correlation values, this ordering reflects the sophistication of the normalization methods for the network distances: the WP measure incorporates the distance from the root to the least common subsumer, which is specific to the selected word pair; the LC measure normalizes using the depth of the tree, which is the same for all word pairs in the same tree; and the SEC measure normalizes using the largest distance calculated, which is the same for all word pairs. Comparing all six measures, the JC measure has the highest correlation with human judgment on the WordNet word pairs in both our experiment and that of Seco, Veale and Hayes. In addition, the correlation of Average IC (0.84) is higher than that of Average ND (0.82).

5.6 UMLS Experiments

5.6.1 Experiments and results
In the UMLS experiments, we calculated the same six semantic similarity measures used in the WordNet experiment on 55 pairs of concepts selected from three source vocabularies: MSH, ICD9CM, and SNMI. These were the same pairs used by Caviedes and Cimino (2004), who investigated the performance of the simple edge counting measure on these three UMLS vocabularies. Table 4.1 lists all 11 concepts and their names. The 55 word pairs were constructed by combining concept 1 with the other 10 concepts, concept 2 with the remaining 9 concepts, and so on, giving 55 combinations for 11 concepts. Because the concept C0010068 is not contained in ICD9CM or SNMI, only 45 pairs were calculated for those two source vocabularies. The first step was to import the OWL files created for each vocabulary from the UMLS source files; the three OWL files share the same structure, shown in Figure 5.2. Because the MSH and SNMI OWL files are too large to be imported into Protégé, we deleted all instances of the StringObject class and the values of all properties except PAR and CHD, which define the taxonomic structure of the MSH and SNMI vocabularies. Using this condensed version does not change our results because only the child and parent values are used in the calculations. One problem with the UMLS ontology is that cycles exist in all three vocabularies. These cycles might be considered an inconsistency in the vocabularies, but we had to determine a method of handling them with respect to the network distance-based measures. Figure 5.8 gives an example of a cycle in ICD9CM. The length retrieved between two concepts is always the shortest path; for example, the length between C0728936 and C0340270 is 1 (C0340270 → C0728936), not 2 (C0728936 → C0348668 → C0340270).
If there are several root nodes in a cycle, as shown in Figure 5.9, the overall depth is the longest path from a leaf node to one of the nodes in the cycle. For example, if C0080022 is a leaf node, the overall depth of the taxonomy is 9 (C0080022 → C1256741 → C1256739 → C1135584 → C0038545 → C0282503 → C0920316 → C0021425 → C0021423 → C00221424). In this example the root node is C00221424. For cycles, we defined the root node to be the node farthest from the starting node before reaching a node that already exists in the path.
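One way to compute shortest path lengths safely in the presence of such cycles is a breadth-first search that never revisits a node. The sketch below is illustrative (not the plugin's actual code) and treats the PAR links as undirected edges:

```python
from collections import deque

def shortest_path_length(parents, start, goal):
    """Shortest number of PAR links between two concepts. BFS over an
    undirected view of the PAR relation, so cycles cannot cause infinite
    loops. Returns -1 if the concepts are not connected."""
    adjacency = {}
    for child, its_parents in parents.items():
        for parent in its_parents:
            adjacency.setdefault(child, set()).add(parent)
            adjacency.setdefault(parent, set()).add(child)
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbor in adjacency.get(node, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, dist + 1))
    return -1
```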

Figure 5.8 A cycle example in ICD9CM (PAR links among the concepts C0340270, C0178237, C0348668, and C0728936)

Figure 5.9 A cycle example in ICD9CM with several root nodes (PAR links among the concepts C0282503, C0038545, C0920316, C1135584, C0021425, C1256739, C0021423, C1256741, C00221424, C0080022, and C0020158)

As in the WordNet experiment, the semantic similarity measures were compared with scores from three medical domain experts (Jorge Caviedes provided the experts' scores after we contacted him by email). The three experts were asked to provide a measure of similarity from zero to any arbitrary number for the 55 pairs of concepts; those scores are attached as Appendix A. In Caviedes and Cimino's (2004) experiment, the same scores were compared with results obtained using the SEC semantic similarity measure. In our experiments, for each vocabulary, we compared the values obtained for each of the six measures with the experts' scores. Because the expert scores measure the distance between two concepts, we converted them to similarity values using the following formula, which ensures that all expert scores fall in the range from 0 to 1, where 1 represents the most similarity and 0 represents no similarity at all:

Similarity = 1 - distance / maximum of expert scores    (17)

In addition, for each word pair we calculated the average expert similarity, the average of the three information content similarity measures, and the average of the three network distance-based similarity measures:

AverageES = (expert1 + expert2 + expert3) / 3    (14)

Average IC = (simResnik + simLin + simJiang-Conrath) / 3    (15)

Average ND = (simLeacock-Chodorow + simWu-Palmer + simsimple edge counting) / 3    (16)

The average expert similarity values were also compared with the average of the information content similarity measures and the average of the network distance-based similarity measures. The semantic similarity values obtained for all six measures on each of the three vocabularies are listed in Appendix B. Tables 5.7, 5.8, and 5.9 show the Pearson correlation coefficients between the different semantic similarity measures and the expert similarity values for the three vocabularies. Note that the SEC correlations with the average expert scores reported in the Caviedes and Cimino experiments were 0.77, 0.60, and 0.74 for MSH, SNMI, and ICD9CM, respectively. These values are close to the values reported in the following tables: 0.77, 0.63 and 0.70, respectively.

            Resnik  Lin   JC    Avg IC  SEC   WP    LC    Avg ND
Expert 1    0.67    0.72  0.71  0.71    0.71  0.64  0.74  0.74
Expert 2    0.73    0.77  0.75  0.76    0.73  0.75  0.75  0.76
Expert 3    0.61    0.65  0.58  0.63    0.57  0.61  0.62  0.62
Average ES  0.78    0.83  0.78  0.81    0.77  0.78  0.81  0.81

Table 5.7 Correlations for MSH

            Resnik  Lin   JC    Avg IC  SEC   WP    LC    Avg ND
Expert 1    0.47    0.52  0.5   0.51    0.39  0.41  0.45  0.43
Expert 2    0.87    0.87  0.83  0.87    0.73  0.81  0.73  0.76
Expert 3    0.57    0.59  0.55  0.58    0.42  0.48  0.47  0.47
Average ES  0.79    0.81  0.77  0.8     0.63  0.7   0.68  0.69

Table 5.8 Correlations for SNMI

            Resnik  Lin   JC    Avg IC  SEC   WP    LC    Avg ND
Expert 1    0.62    0.6   0.57  0.61    0.57  0.61  0.57  0.59
Expert 2    0.87    0.86  0.78  0.85    0.71  0.84  0.74  0.77
Expert 3    0.67    0.66  0.54  0.63    0.47  0.63  0.51  0.53
Average ES  0.87    0.86  0.76  0.85    0.7   0.84  0.73  0.76

Table 5.9 Correlations for ICD9CM
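Equations (14) through (17) amount to a simple normalization and averaging step; a hypothetical sketch with illustrative names:

```python
def expert_similarity(distance_score, max_score):
    """Equation (17): convert an expert distance score into a [0, 1] similarity."""
    return 1.0 - distance_score / max_score

def average(values):
    """Equations (14)-(16): average the expert scores or the measure values."""
    return sum(values) / len(values)
```

For instance, with a maximum expert score of 10, a distance of 2.5 maps to a similarity of 0.75.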

Table 5.10 shows, for each measure, the average over the three vocabularies of the Average ES correlations.

                 Resnik  Lin   JC    Avg IC  SEC  WP    LC    Avg ND
Ave Corr Vocabs  0.81    0.83  0.77  0.82    0.7  0.77  0.74  0.75

Table 5.10 Average correlations for Average ES across all vocabularies

An examination of the three tables above shows that, for each vocabulary, expert 2's similarity judgment always has the highest correlation with each measure. For MSH, expert 3's similarity judgment always has the lowest correlation; for SNMI, expert 1's. For ICD9CM the result is mixed: expert 1's similarity judgment has the lowest correlation for two of the three IC similarity measures, and expert 3's has the lowest correlation for two of the three ND similarity measures.

5.6.2 Analysis

5.6.2.1 Comparing results within and across each vocabulary
For the MSH vocabulary, Table 5.7 indicates that the Lin measure always has the highest correlation among the IC semantic similarity measures across all experts and the expert average; among the ND measures, the LC measure has the highest correlation or is tied for highest. For the SNMI vocabulary, Table 5.8 indicates that the Lin measure always has the highest correlation, or is tied for highest, among the IC semantic similarity measures; among the ND measures, the SEC measure has the lowest correlation or is tied for lowest. For the ICD9CM vocabulary, Table 5.9 indicates that the Resnik measure always has the highest correlation among the IC semantic similarity measures; among the ND measures, the WP measure has the highest correlation and the SEC measure has the lowest or is tied for lowest. The following table ranks the measures by their correlation with the average expert similarity judgment and also includes in the ranking the average measures for the two categories of similarity measures, IC and ND.

        MSH  SNMI  ICD9CM  Ave  Rank
Resnik  6    3     1       3.3  (3)
Lin     1    1     2       1.3  (1)
JC      6    4     5.5     5.2  (6)
Avg IC  3    2     3       2.7  (2)
SEC     8    8     8       8.0  (8)
WP      6    5     4       5.0  (5)
LC      3    7     7       5.7  (7)
Avg ND  3    6     5.5     4.8  (4)

Table 5.11 Ranking of the six measures and the two category averages in each vocabulary according to Pearson correlation

The above table and the last row of Table 5.7 show that for the MSH vocabulary the correlation values for all six measures are very comparable. As with the other two vocabularies, however, the SEC measure has the lowest correlation, and the Lin measure has the highest, agreeing with SNMI; for ICD9CM, Lin is a close second. For SNMI and ICD9CM, the set containing the three top-ranking measures is the same: Resnik, Lin, and Avg IC; and the ranking of the three bottom measures is identical: Avg ND, LC, and SEC. To summarize, based on the average rank across all three vocabularies, the measure performing best with respect to correlation with human judgment is Lin, and the worst performing is the SEC measure, which was the only measure evaluated in the Caviedes and Cimino experiments. The IC semantic similarity measures in general perform better than the ND semantic similarity measures on the three UMLS vocabularies. This conclusion is also supported by the fact that, for all vocabularies, the Avg IC measure ranks the same as or substantially higher than the Avg ND measure. To compare the performance of the information content measures and the network distance-based measures, we also averaged the correlations over all IC measures and over all ND measures separately for each vocabulary; the results are shown in the following table:

         IC measures  ND measures
MSH      0.72         0.71
SNMI     0.68         0.58
ICD9CM   0.73         0.66

Table 5.12 Average correlation of the IC and the ND measures for each vocabulary

This table shows that the average correlation of the IC similarity measures is always higher than that of the ND similarity measures, although the difference is slight for MSH. A possible reason is that MSH is much bigger than SNMI and ICD9CM and has a more complex hierarchical structure. Having more layers than ICD9CM and SNMI increases MSH's overall depth, and this greater depth gives the ND measures the ability to better distinguish between the similarities of the concept pairs. Thus, for MSH the ND similarity values have a higher correlation with the experts' scores.

5.6.2.2 Comparing results across UMLS and WordNet
No single semantic similarity measure performs best on both the UMLS vocabularies and WordNet. For UMLS, the Lin measure is the best performer, as seen in Table 5.11, but it is only the second best performer on the WordNet ontology. For WordNet, the JC measure has the highest correlation; however, it ranks 6th for the UMLS ontology when looking at its average ranking across all three vocabularies in Table 5.11. From Tables 5.6 and 5.11, the SEC measure is the worst for both the UMLS vocabularies and WordNet. Among the network distance-based measures, the WP measure is the best for both WordNet and UMLS.

The following table combines the first and second columns of Table 5.6 with the row of Table 5.10 as its third column. Comparing the two categories of semantic similarity measures, the IC measures are in general better than the ND measures. For both the UMLS and WordNet ontologies, the IC-based measures (all three for UMLS, and the Lin and JC measures for WordNet) have correlations that are higher than or equivalent to those of the network distance-based measures; the correlation of the Resnik measure for WordNet is only slightly lower than that of Wu-Palmer. It is interesting to note that the correlation values with human judgment for the two different ontologies are quite close for all measures except JC and WP.

Measure               WordNet Correlation  UMLS Correlation (averages from Table 5.10)
Resnik                0.79                 0.81
Lin                   0.83                 0.83
Jiang and Conrath     0.85                 0.77
Wu-Palmer             0.8                  0.70
Leacock-Chodorow      0.79                 0.77
Simple edge counting  0.77                 0.74

Table 5.13 Correlations of WordNet and UMLS

5.6.2.3 Comparing results across vocabularies
We also compared the similarity of each concept pair across the different vocabularies. For each vocabulary, we selected the semantic similarity measure with the highest correlation: Lin for both MSH and SNMI, and Resnik for ICD9CM. Concept pairs are ranked from the highest to the lowest similarity according to the selected measure for each vocabulary. The comparison is done on the 45 pairs that are common to all vocabularies. Appendix C shows the concept pairs and the rank of each pair based on the pair's semantic similarity value. We wanted to analyze the variation of a concept pair's ranking across the different vocabularies, and thus how similar those vocabularies are.

First, we compared the pairs ranked in the top 20% of each vocabulary, and then increased the group size by 20% each time. This gives 5 groups: top 20%, top 40%, top 60%, top 80%, and 100%, so there should be 9 pairs in the top 20% group, 18 in the top 40% group, 27 in the top 60% group, 36 in the top 80% group, and all 45 pairs in the 100% group. Because there are many tied rankings in each vocabulary, however, the number of pairs in each group cannot be exactly as described: when constructing groups, we put pairs that have the same ranking in the same group. For example, in ICD9CM there are 28 pairs tied at 18th, the lowest ranking in ICD9CM; those pairs were all placed in the 100% group, so we are unable to go from the top 40% to the top 60% but must jump to 100%. Table 5.14 shows the number of pairs in each group, and Table 5.15 shows the number and percentage of common pairs in each group across vocabularies. Table 5.15 shows that the percentage of common pairs in each group is greater than or equal to 50%; the highest value is 88.2%, when comparing the top 40% groups of MSH and ICD9CM. This indicates that the rankings of the tested concept pairs in the vocabularies are quite close to each other.

         Top 20%  Top 40%  Top 60%  Top 80%  100%
MSH      9        18       27       35       45
SNMI     10       20       29                45
ICD9CM   7        17                         45

Table 5.14 Number of pairs in each group
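The tie-aware grouping described above can be sketched as follows, assuming pairs pre-sorted by rank; the names are illustrative, not the actual implementation.

```python
def top_group(ranked_pairs, fraction):
    """ranked_pairs: (pair, rank) tuples sorted by ascending rank.
    Takes the top `fraction` of the pairs, then extends the cut so that
    pairs sharing the same rank always land in the same group."""
    cut = round(len(ranked_pairs) * fraction)
    while 0 < cut < len(ranked_pairs) and ranked_pairs[cut][1] == ranked_pairs[cut - 1][1]:
        cut += 1
    return [pair for pair, _ in ranked_pairs[:cut]]
```

With five pairs ranked 1, 2, 2, 4, 5, the "top 40%" group contains three pairs rather than two, because the two pairs tied at rank 2 cannot be split.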

Vocabularies     Group    Common pairs  Percentage of each vocabulary's group
MSH-SNMI         Top 20%  7             MSH 77.8%, SNMI 70.0%
MSH-SNMI         Top 40%  12            MSH 66.7%, SNMI 60.0%
MSH-SNMI         Top 60%  18            MSH 66.7%, SNMI 62.0%
MSH-ICD9CM       Top 20%  6             MSH 66.7%, ICD9CM 85.7%
MSH-ICD9CM       Top 40%  15            MSH 83.3%, ICD9CM 88.2%
SNMI-ICD9CM      Top 20%  6             SNMI 60.0%, ICD9CM 85.7%
SNMI-ICD9CM      Top 40%  14            SNMI 70.0%, ICD9CM 82.4%
MSH-SNMI-ICD9CM  Top 20%  5             MSH 55.6%, SNMI 50.0%, ICD9CM 71.4%
MSH-SNMI-ICD9CM  Top 40%  10            MSH 55.6%, SNMI 50.0%, ICD9CM 58.0%

Table 5.15 Number of common pairs in each group

6 Significance

Computational models of similarity are beginning to be incorporated into software systems in order to make them appear more intelligent or even creative. In the bioinformatics domain, databases are being developed with vast amounts of textual annotations, and search tools that incorporate semantic similarity measures are needed to exploit these annotations. With growing access to heterogeneous and independent data repositories, an increasingly important task is assessing the semantic similarity or relatedness, or their inverse, semantic distance, between concepts from different systems or domains. Because different agents may use different ontologies, translating a concept in one ontology into a corresponding concept in another is a major problem; using the most appropriate and effective method to measure the semantic similarity between concepts in two different ontologies can improve the semantic interoperability between agents. Realizing the importance of such measures, researchers are investigating methods to evaluate their performance. The majority of this research has focused on using WordNet as the ontology, and only recently have researchers begun to use other ontologies such as the UMLS. Our research is the first to investigate the performance of a suite of measures from both major categories of semantic similarity measures on another significant and well-known ontology, the UMLS. In previous research on measuring semantic relatedness, each experiment developed ontology-dependent software able to calculate semantic similarity measures for that ontology alone. Our research has developed a testbed for assessing such measures on any ontology written in a web ontology language that can be imported into Protégé. This design and implementation provides the flexibility of selecting different types of ontologies and ontologies from different problem domains.
The various measures may be computed in different contexts by selecting the relationship that defines the taxonomic structure for the ontology. We performed the first experiments on measures from both network distance- based and information content categories using the UMLS ontology in addition to WordNet. Results of such experimentation could serve as a basis for developing guidelines for using such measures in a variety of applications. For example, one of the

key research areas in the biomedical domain is mapping GO (the Gene Ontology), which provides a hierarchical vocabulary for the description of genes and gene products, into the UMLS (Lomax and McCray 2004).

The use of semantic similarity, relatedness, and distance measures is also important to NLP applications. For example, in (Budanitsky and Hirst 2001) these measures were the basis for malapropism detection and correction. The general idea is that, given an ontology such as WordNet or another domain ontology, if a word in an article is not semantically related to the nearby words but a spelling variation of it is, that word might be a malapropism. The spelling variation might be the word actually intended for use in the article.

The experimental testbed developed as part of this research can serve as the foundation for further research efforts such as creating application-specific evaluations of semantic relatedness measures, developing new hybrid measures, and investigating semantic relatedness measures for assessing the quality of ontologies. Quality assessment for ontologies is becoming a significant factor to consider when determining the reusability of existing ontologies on the Semantic Web. As previously noted, the success of the Semantic Web is linked to the success of ontologies, and the reuse of ontologies is critical to their success.
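The malapropism-detection idea described above can be sketched in a few lines. This is an illustration of the general strategy only, not the Budanitsky and Hirst implementation: `relatedness` and `spelling_variants` are hypothetical stand-ins for an ontology-backed measure and an edit-distance candidate generator.

```python
def find_malapropisms(words, relatedness, spelling_variants, threshold=0.5):
    """Return (word, suggested_replacement) pairs for suspect words.

    A word is suspect when it is unrelated to every other word in its
    context but one of its spelling variations is related to some of them.
    """
    suggestions = []
    for i, w in enumerate(words):
        context = words[:i] + words[i + 1:]
        # The word already fits its context: not a malapropism.
        if any(relatedness(w, c) >= threshold for c in context):
            continue
        # Otherwise look for a spelling variation that does fit.
        for v in spelling_variants(w):
            if any(relatedness(v, c) >= threshold for c in context):
                suggestions.append((w, v))
                break
    return suggestions
```

With a toy relatedness table in which "dairy" is related to "cow" and "milk" but "diary" is related to neither, the function would flag "diary" and suggest "dairy".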

7 Conclusions and Future Work

In this research, we developed plug-ins for Protégé 3.1 to create a testbed for measuring the semantic similarity of 30 word pairs in the WordNet ontology and 55 concept pairs in three UMLS source vocabularies. Six different measures were implemented, three from the information content-based category and three from the network distance-based category. The results were analyzed by comparing the semantic similarity scores for these pairs to the experts' scores from the previous experiments of Miller and Charles (1991) for WordNet and of Caviedes and Cimino (2004) for the UMLS.

Our experimental results indicate that for WordNet, the JC measure has the highest correlation, followed closely by the LC measure. For the vocabularies of

UMLS, the correlation with the average expert score is greatest for the Lin measure for both MSH and SNMI. For ICD9CM, Resnik's measure has the greatest correlation, followed closely by the Lin measure. For both WordNet and the UMLS, the SEC measure has the lowest correlation. Since there are more than 100 vocabularies in the UMLS, of which only three were used in this research, more experiments would need to be performed with the other vocabularies to discover whether the performance of these measures is the same across all of them.

To analyze the performance of the different measures across different ontologies, we compared the results for WordNet and the three UMLS vocabularies and found that their performance varies. One contributing factor to this difference is that the hierarchical structures of the two ontologies are different. The wider range of correlation values for the measures within the ICD9CM vocabulary also needs to be investigated further. Our current results suggest that the information content-based semantic similarity measures perform better than the network distance-based measures for both WordNet and the UMLS when performance is determined by correlation with human judgment scores of similarity.

In addition to what we have accomplished in this research, several tasks look attractive for future exploration. First, our testbed needs to be suitably expanded for intensional ontologies; to make the testbed more general, new methods could be added to read an ontology's intensional structure and measure semantic similarity on that structure. Second, other UMLS source vocabularies should be used in order to further analyze the performance of the different measures.
Finally, by using more UMLS vocabularies, we can begin to explore both the connections among the different vocabularies, which can be used for ontology merging or matching, and the usefulness of these semantic similarity measures for assessing the quality of an ontology.
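Throughout this work, a measure's performance is determined by the Pearson correlation between its scores and the human or expert judgments for the same pairs. A minimal, self-contained sketch of that computation (the score lists below are illustrative, not thesis data):

```python
import math

def pearson(xs, ys):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical example: one measure's scores vs. averaged expert scores
# for five concept pairs.
measure = [0.67, 0.55, 0.74, 0.45, 0.18]
experts = [3.0, 4.0, 1.0, 5.0, 6.0]
r = pearson(measure, experts)  # values nearer 1.0 indicate better agreement
```

A value of r near 1 indicates the measure ranks pairs much as the experts do; the per-vocabulary rankings in Appendices C and D were derived from exactly this kind of comparison.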

References

1. Alexander, J. H., Freiling, M. J., Shulman, S. J., Staley, J. L., Rehfus, S., and Messick, S. L., Knowledge Level Engineering: Ontological Analysis, Proceedings of AAAI-86, the 5th National Conference on Artificial Intelligence, Los Altos, CA, 1986.
2. Berners-Lee, T. and Hendler, J., The Semantic Web, http://www.sciam.com/2001/0501issue/0501berners-lee.html.
3. Budanitsky, A., Lexical Semantic Relatedness and Its Application in Natural Language Processing, Computer Systems Research Group, University of Toronto, 1999.
4. Budanitsky, A. and Hirst, G., Semantic Distance in WordNet: An Experimental, Application-oriented Evaluation of Five Measures, in Workshop on WordNet and Other Lexical Resources, Second Meeting of the NAACL, Pittsburgh, 2001.
5. Caviedes, J. and Cimino, J., Towards the Development of a Conceptual Distance Metric for the UMLS, Journal of Biomedical Informatics, 37, 77-85, 2004.
6. Cross, V., Fuzzy Semantic Distance Measures between Ontological Concepts, Computer Science and Systems Analysis Department, Miami University.
7. Davis, R., Shrobe, H., and Szolovits, P., What is a Knowledge Representation?, AI Magazine, 17-33, 1993.
8. Gomes, P., Seco, N., Pereira, F. C., Paiva, P., Carreiro, P., Ferreira, J. L., and Bento, C., The Importance of Retrieval in Creative Design Analogies, in Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI'03) Workshop "3rd Workshop on Creative Systems", 2003.
9. Gruber, T. R., A Translation Approach to Portable Ontologies, Knowledge Acquisition, 5, 199-220, 1993.
10. Heijst, G., Schreiber, A., and Wielinga, B. J., Using Explicit Ontologies in KBS Development, Department of Social Science Informatics, University of Amsterdam, 1997.
11. Hendler, J. and McGuinness, D., The DARPA Agent Markup Language, IEEE Intelligent Systems Trends and Controversies, 2000.

12. Hirst, G. and St-Onge, D., Lexical Chains as Representations of Context for the Detection and Correction of Malapropisms, in Fellbaum, 305-332, 1998.
13. IST-2000-29243, Deliverable 1.3: A Survey on Ontology Tools, OntoWeb, May 2002.
14. Jiang, J. and Conrath, D., Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy, in Proceedings of the International Conference on Research in Computational Linguistics (ROCLING X), Taiwan, 1997.
15. Kashyap, V. and Sheth, A. P., Semantic and Schematic Similarities Between Database Objects: A Context-Based Approach, VLDB Journal, 5(4), 276-304, 1996.
16. Kim, H., Predicting How Ontologies for the Semantic Web Will Evolve, Communications of the ACM, February 2002, 48-54, 2002.
17. Leacock, C. and Chodorow, M., Combining Local Context and WordNet Similarity for Word Sense Identification, in Fellbaum, 265-283, 1998.
18. Lin, D., An Information-Theoretic Definition of Similarity, Proceedings of the International Conference on Machine Learning, Madison, Wisconsin, July 1998.
19. Lomax, J. and McCray, A., Mapping the Gene Ontology into the Unified Medical Language System, Comparative and Functional Genomics, 5, 354-361, 2004.
20. McCray, A., An Upper Level Ontology for the Biomedical Domain, Comparative and Functional Genomics, 4, 80-84, 2003.
21. McCarthy, J., Circumscription: A Form of Nonmonotonic Reasoning, Computer Science Department, Stanford University, 1986.
22. Miller, G., Beckwith, R., Fellbaum, C., Gross, D., and Miller, K. J., Introduction to WordNet: An On-line Lexical Database, International Journal of Lexicography, 3(4), 235-244, 1990.
23. Miller, G. and Charles, W., Contextual Correlates of Semantic Similarity, Language and Cognitive Processes, 6(1), 1-28, 1991.
24. Pinto, G. and Martins, J., Ontology Integration: How to Perform the Process, Proceedings of the IJCAI-01 Workshop on Ontologies and Information Sharing, Seattle, USA, August 4-5, 2001.
25. Powers, S., Practical RDF, O'Reilly, July 2003.
26.
Rada, R., Development and Application of a Metric on Semantic Nets, IEEE Transactions on Systems, Man, and Cybernetics, Vol. 19, No. 1, 17-30, 1989.

27. Resnik, P., Semantic Similarity in a Taxonomy: An Information-Based Measure and Its Application to Problems of Ambiguity in Natural Language, Journal of Artificial Intelligence Research, 11, 95-130, 1999.
28. Resnik, P., Using Information Content to Evaluate Semantic Similarity in a Taxonomy, Proceedings of the 14th International Joint Conference on Artificial Intelligence, Vol. 1, 448-453, Montreal, August 1995.
29. Richardson, R. and Smeaton, A. F., Using WordNet in a Knowledge-Based Approach to Information Retrieval, Working Paper CA-0395, School of Computer Applications, Dublin City University, Ireland, 1995.
30. Ross, S. M., A First Course in Probability, Macmillan, 1976.
31. Rubenstein, H. and Goodenough, J. B., Contextual Correlates of Synonymy, Communications of the ACM, 8, 627-633, 1965.
32. Seco, N., Veale, T., and Hayes, J., An Intrinsic Information Content Metric for Semantic Similarity in WordNet, ECAI 2004, 1089-1090, 2004.
33. Smith, B. and Welty, C., Ontology: Towards a New Synthesis, 2nd International Conference on Formal Ontology in Information Systems, October 17-19, 2001.
34. Sowa, J. F., Conceptual Structures: Information Processing in Mind and Machine, Addison-Wesley, 1984.
35. Sussna, M., Word Sense Disambiguation for Free-text Indexing Using a Massive Semantic Network, Proceedings of the Second International Conference on Information and Knowledge Management (CIKM'93), 67-74, 1993.
36. UMLS Knowledge Sources, January 2005 Release (2005AA) documentation, http://www.nlm.nih.gov/research/umls/documentation.html, January 2005.
37. W3C, Web Ontology (WebONT) Working Group Charter, http://www.w3.org/2001/sw/WebOnt/charter, July 2003.
38. W3C, World Wide Web Consortium Issues RDF and OWL Recommendations, http://www.w3.org/, 10 February 2004.
39. Wu, Z. and Palmer, M., Verb Semantics and Lexical Selection, in Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, New Mexico, 133-138, June 1994.

Appendix A: Experts' Scores (Courtesy of Dr. Jorge Caviedes)

#   Word pair            Expert 1 (KW)  Expert 2 (JC)  Expert 3 (MC)
1   C0002962, C0003811   2      1      3
2   C0002962, C0007192   2      1      4
3   C0002962, C0018799   2      0.5    1
4   C0002962, C0018834   1      3      5
5   C0002962, C0018802   2      1      3
6   C0002962, C0000737   2      3      7
7   C0002962, C0020621   3      2      6
8   C0002962, C0030631   3      3      10
9   C0002962, C0035238   3      3      8
10  C0002962, C0010068   1      0.2    1
11  C0003811, C0007192   2      0.5    3
12  C0003811, C0018799   1      0.5    1
13  C0003811, C0018834   3      3      4
14  C0003811, C0018802   2      1      1
15  C0003811, C0000737   3      3      7
16  C0003811, C0020621   3      0.5    2
17  C0003811, C0030631   3      3      10
18  C0003811, C0035238   3      3      8
19  C0003811, C0010068   2      0.7    3
20  C0007192, C0018799   1      0.5    1
21  C0007192, C0018834   2      3      5
22  C0007192, C0018802   2      0.5    2
23  C0007192, C0000737   2      3      7
24  C0007192, C0020621   3      3      4
25  C0007192, C0030631   3      3      10
26  C0007192, C0035238   2      3      8
27  C0007192, C0010068   2      1      4
28  C0018799, C0018834   2      3      4
29  C0018799, C0018802   1      0.5    1
30  C0018799, C0000737   2      3      8
31  C0018799, C0020621   3      2      5
32  C0018799, C0030631   3      3      10
33  C0018799, C0035238   2      3      8
34  C0018799, C0010068   1      0.5    1
35  C0018834, C0018802   2      3      5
36  C0018834, C0000737   2      0.5    1
37  C0018834, C0020621   3      3      5
38  C0018834, C0030631   3      3      5
39  C0018834, C0035238   3      3      5
40  C0018834, C0010068   2      3      5
41  C0018802, C0000737   2      3      6
42  C0018802, C0020621   3      1      4

#   Word pair            Expert 1 (KW)  Expert 2 (JC)  Expert 3 (MC)
43  C0018802, C0030631   3      3      10
44  C0018802, C0035238   2      3      8
45  C0018802, C0010068   2      0.7    1
46  C0000737, C0020621   3      3      6
47  C0000737, C0030631   3      3      5
48  C0000737, C0035238   2      3      10
49  C0000737, C0010068   2      3      7
50  C0020621, C0030631   3      3      0
51  C0020621, C0035238   3      3      10
52  C0020621, C0010068   3      2      6
53  C0030631, C0035238   3      3      10
54  C0030631, C0010068   3      3      10
55  C0035238, C0010068   3      3      8

Appendix B: Semantic Similarities of UMLS Vocabularies

MSH
#   Concept Pair         Resnik  Lin   JC    Avg IC  SEC   WP    LC    Avg ND
1   C0002962 C0003811    0.51  0.67  0.75  0.64  0.56  0.85  1.61  1.00
2   C0002962 C0007192    0.51  0.55  0.58  0.54  0.44  0.81  1.39  0.88
3   C0002962 C0018799    0.51  0.74  0.82  0.69  0.67  0.88  1.90  1.15
4   C0002962 C0018834    0.42  0.45  0.49  0.45  0.44  0.81  1.39  0.88
5   C0002962 C0018802    0.51  0.59  0.65  0.58  0.56  0.85  1.61  1.00
6   C0002962 C0000737    0.66  0.75  0.79  0.73  0.67  0.57  1.90  1.05
7   C0002962 C0020621    0.17  0.18  0.24  0.20  0.00  0.67  0.80  0.49
8   C0002962 C0030631    0.27  0.29  0.34  0.30  0.00  0.67  0.80  0.49
9   C0002962 C0035238    0.17  0.21  0.34  0.24  0.22  0.72  1.05  0.66
10  C0002962 C0010068    0.76  0.94  0.95  0.88  0.89  0.80  3.00  1.56
11  C0003811 C0007192    0.51  0.62  0.68  0.60  0.67  0.88  1.90  1.15
12  C0003811 C0018799    0.51  0.88  0.93  0.77  0.89  0.96  3.00  1.61
13  C0003811 C0018834    0.34  0.42  0.52  0.43  0.44  0.80  1.39  0.88
14  C0003811 C0018802    0.51  0.67  0.75  0.64  0.78  0.92  2.30  1.33
15  C0003811 C0000737    0.17  0.22  0.40  0.27  0.44  0.80  1.39  0.88
16  C0003811 C0020621    0.13  0.15  0.30  0.19  0.22  0.72  1.05  0.66
17  C0003811 C0030631    0.00  0.00  0.18  0.06  0.11  0.67  0.92  0.56
18  C0003811 C0035238    0.17  0.24  0.45  0.29  0.44  0.78  1.39  0.87
19  C0003811 C0010068    0.51  0.72  0.80  0.68  0.67  0.88  1.90  1.15
20  C0007192 C0018799    0.51  0.67  0.75  0.64  0.78  0.92  2.30  1.33
21  C0007192 C0018834    0.17  0.17  0.17  0.17  0.11  0.69  0.92  0.57
22  C0007192 C0018802    0.51  0.55  0.58  0.54  0.67  0.88  1.90  1.15
23  C0007192 C0000737    0.27  0.29  0.32  0.29  0.11  0.69  0.92  0.57
24  C0007192 C0020621    0.17  0.17  0.17  0.17  0.11  0.69  0.92  0.57
25  C0007192 C0030631    0.46  0.46  0.46  0.46  0.33  0.77  1.20  0.77
26  C0007192 C0035238    0.17  0.19  0.28  0.21  0.33  0.75  1.20  0.76
27  C0007192 C0010068    0.51  0.58  0.63  0.57  0.56  0.85  1.61  1.00

#   Concept Pair         Resnik  Lin   JC    Avg IC  SEC   WP    LC    Avg ND
28  C0018799 C0018834    0.17  0.23  0.42  0.27  0.33  0.75  1.20  0.76
29  C0018799 C0018802    0.51  0.74  0.82  0.69  0.89  0.96  3.00  1.61
30  C0018799 C0000737    0.17  0.25  0.47  0.30  0.33  0.75  1.20  0.76
31  C0018799 C0020621    0.17  0.23  0.42  0.27  0.33  0.75  1.20  0.76
32  C0018799 C0030631    0.00  0.00  0.25  0.08  0.22  0.70  1.05  0.66
33  C0018799 C0035238    0.17  0.26  0.52  0.32  0.56  0.82  1.61  0.99
34  C0018799 C0010068    0.51  0.80  0.87  0.73  0.78  0.92  2.30  1.33
35  C0018834 C0018802    0.17  0.18  0.24  0.20  0.22  0.72  1.05  0.66
36  C0018834 C0000737    0.68  0.71  0.73  0.71  0.78  0.92  2.30  1.33
37  C0018834 C0020621    0.17  0.17  0.17  0.17  0.11  0.69  0.92  0.57
38  C0018834 C0030631    0.00  0.00  0.00  0.00  0.00  0.64  0.80  0.48
39  C0018834 C0035238    0.17  0.19  0.28  0.21  0.33  0.75  1.20  0.76
40  C0018834 C0010068    0.17  0.19  0.29  0.22  0.11  0.69  0.92  0.57
41  C0018802 C0000737    0.17  0.20  0.30  0.22  0.22  0.72  1.05  0.66
42  C0018802 C0020621    0.17  0.18  0.24  0.20  0.22  0.72  1.05  0.66
43  C0018802 C0030631    0.00  0.00  0.07  0.02  0.11  0.67  0.92  0.56
44  C0018802 C0035238    0.17  0.21  0.34  0.24  0.44  0.78  1.39  0.87
45  C0018802 C0010068    0.51  0.63  0.70  0.61  0.67  0.88  1.90  1.15
46  C0000737 C0020621    0.17  0.18  0.23  0.19  0.11  0.69  0.92  0.57
47  C0000737 C0030631    0.27  0.29  0.32  0.29  0.11  0.69  0.92  0.57
48  C0000737 C0035238    0.17  0.20  0.33  0.24  0.33  0.75  1.20  0.76
49  C0000737 C0010068    0.17  0.17  0.17  0.17  0.11  0.69  0.92  0.57
50  C0020621 C0030631    0.00  0.00  0.00  0.00  0.00  0.64  0.80  0.48
51  C0020621 C0035238    0.17  0.19  0.28  0.21  0.33  0.75  1.20  0.76
52  C0020621 C0010068    0.17  0.19  0.29  0.22  0.11  0.69  0.92  0.57
53  C0030631 C0035238    0.00  0.00  0.10  0.03  0.22  0.70  1.05  0.66
54  C0030631 C0010068    0.00  0.00  0.12  0.04  0.00  0.64  0.80  0.48
55  C0035238 C0010068    0.17  0.22  0.39  0.26  0.33  0.75  1.20  0.76


SNMI
#   Concept Pair         Resnik  Lin   JC    Avg IC  SEC   WP    LC    Avg ND
1   C0002962 C0003811    0.40  0.57  0.70  0.55  0.60  0.50  2.01  1.04
2   C0002962 C0007192    0.52  0.58  0.62  0.57  0.70  0.67  2.30  1.22
3   C0002962 C0018799    0.52  0.79  0.86  0.72  0.90  0.86  3.40  1.72
4   C0002962 C0018834    0.00  0.00  0.10  0.03  0.20  0.00  1.32  0.51
5   C0002962 C0018802    0.52  0.58  0.62  0.57  0.70  0.67  2.30  1.22
6   C0002962 C0000737    0.00  0.00  0.10  0.03  0.10  0.00  1.20  0.43
7   C0002962 C0020621    0.13  0.14  0.23  0.17  0.40  0.25  1.61  0.75
8   C0002962 C0030631    0.13  0.14  0.23  0.17  0.30  0.22  1.46  0.66
9   C0002962 C0035238    0.13  0.14  0.23  0.17  0.50  0.29  1.79  0.86
10  C0002962 C0010068
11  C0003811 C0007192    0.40  0.50  0.59  0.50  0.50  0.44  1.79  0.91
12  C0003811 C0018799    0.40  0.71  0.83  0.65  0.70  0.57  2.30  1.19
13  C0003811 C0018834    0.00  0.00  0.19  0.06  0.20  0.00  1.32  0.51
14  C0003811 C0018802    0.40  0.50  0.59  0.50  0.50  0.44  1.79  0.91
15  C0003811 C0000737    0.00  0.00  0.19  0.06  0.10  0.00  1.20  0.43
16  C0003811 C0020621    0.13  0.16  0.32  0.20  0.40  0.25  1.61  0.75
17  C0003811 C0030631    0.13  0.16  0.32  0.20  0.30  0.22  1.46  0.66
18  C0003811 C0035238    0.13  0.16  0.32  0.20  0.50  0.29  1.79  0.86
19  C0003811 C0010068
20  C0007192 C0018799    0.52  0.68  0.76  0.65  0.80  0.75  2.71  1.42
21  C0007192 C0018834    0.00  0.00  0.00  0.00  0.10  0.00  1.20  0.43
22  C0007192 C0018802    0.52  0.52  0.52  0.52  0.60  0.60  2.01  1.07
23  C0007192 C0000737    0.00  0.00  0.00  0.00  0.00  0.00  1.10  0.37
24  C0007192 C0020621    0.13  0.13  0.13  0.13  0.30  0.22  1.46  0.66
25  C0007192 C0030631    0.13  0.13  0.13  0.13  0.20  0.20  1.32  0.57
26  C0007192 C0035238    0.13  0.13  0.13  0.13  0.40  0.25  1.61  0.75
27  C0007192 C0010068

#   Concept Pair         Resnik  Lin   JC    Avg IC  SEC   WP    LC    Avg ND
28  C0018799 C0018834    0.00  0.00  0.24  0.08  0.30  0.00  1.46  0.59
29  C0018799 C0018802    0.52  0.68  0.76  0.65  0.80  0.75  2.71  1.42
30  C0018799 C0000737    0.00  0.00  0.24  0.08  0.20  0.00  1.32  0.51
31  C0018799 C0020621    0.13  0.17  0.37  0.22  0.50  0.29  1.79  0.86
32  C0018799 C0030631    0.13  0.17  0.37  0.22  0.40  0.25  1.61  0.75
33  C0018799 C0035238    0.13  0.17  0.37  0.22  0.60  0.33  2.01  0.98
34  C0018799 C0010068
35  C0018834 C0018802    0.00  0.00  0.00  0.00  0.10  0.00  1.20  0.43
36  C0018834 C0000737    0.47  0.47  0.47  0.47  0.50  0.44  1.79  0.91
37  C0018834 C0020621    0.00  0.00  0.00  0.00  0.20  0.00  1.32  0.51
38  C0018834 C0030631    0.00  0.00  0.00  0.00  0.10  0.00  1.20  0.43
39  C0018834 C0035238    0.00  0.00  0.00  0.00  0.30  0.00  1.46  0.59
40  C0018834 C0010068
41  C0018802 C0000737    0.00  0.00  0.00  0.00  0.00  0.00  1.10  0.37
42  C0018802 C0020621    0.13  0.13  0.13  0.13  0.30  0.22  1.46  0.66
43  C0018802 C0030631    0.13  0.13  0.13  0.13  0.20  0.20  1.32  0.57
44  C0018802 C0035238    0.13  0.13  0.13  0.13  0.40  0.25  1.61  0.75
45  C0018802 C0010068
46  C0000737 C0020621    0.00  0.00  0.00  0.00  0.10  0.00  1.20  0.43
47  C0000737 C0030631    0.00  0.00  0.00  0.00  0.00  0.00  1.10  0.37
48  C0000737 C0035238    0.00  0.00  0.00  0.00  0.20  0.00  1.32  0.51
49  C0000737 C0010068
50  C0020621 C0030631    0.13  0.13  0.13  0.13  0.30  0.22  1.46  0.66
51  C0020621 C0035238    0.13  0.13  0.13  0.13  0.50  0.29  1.79  0.86
52  C0020621 C0010068
53  C0030631 C0035238    0.13  0.13  0.13  0.13  0.40  0.25  1.61  0.75
54  C0030631 C0010068
55  C0035238 C0010068


ICD9CM
#   Concept Pair         Resnik  Lin   JC    Avg IC  SEC   WP    LC    Avg ND
1   C0002962 C0003811    0.36  0.46  0.58  0.47  0.56  0.50  1.70  0.92
2   C0002962 C0007192    0.36  0.39  0.43  0.40  0.44  0.44  1.48  0.79
3   C0002962 C0018799    0.36  0.39  0.43  0.40  0.44  0.44  1.48  0.79
4   C0002962 C0018834    0.04  0.05  0.11  0.07  0.22  0.22  1.15  0.53
5   C0002962 C0018802    0.36  0.39  0.43  0.40  0.44  0.44  1.48  0.79
6   C0002962 C0000737    0.04  0.05  0.22  0.11  0.22  0.22  1.15  0.53
7   C0002962 C0020621    0.04  0.05  0.11  0.07  0.22  0.22  1.15  0.53
8   C0002962 C0030631    0.04  0.05  0.11  0.07  0.11  0.20  1.01  0.44
9   C0002962 C0035238    0.04  0.05  0.23  0.11  0.44  0.29  1.48  0.74
10  C0002962 C0010068
11  C0003811 C0007192    0.52  0.61  0.66  0.60  0.67  0.67  1.99  1.11
12  C0003811 C0018799    0.52  0.61  0.66  0.60  0.67  0.67  1.99  1.11
13  C0003811 C0018834    0.04  0.05  0.18  0.09  0.22  0.22  1.15  0.53
14  C0003811 C0018802    0.52  0.61  0.66  0.60  0.67  0.67  1.99  1.11
15  C0003811 C0000737    0.04  0.06  0.29  0.13  0.22  0.22  1.15  0.53
16  C0003811 C0020621    0.04  0.05  0.18  0.09  0.22  0.22  1.15  0.53
17  C0003811 C0030631    0.04  0.05  0.18  0.09  0.11  0.20  1.01  0.44
18  C0003811 C0035238    0.04  0.06  0.30  0.13  0.44  0.29  1.48  0.74
19  C0003811 C0010068
20  C0007192 C0018799    0.52  0.52  0.52  0.52  0.56  0.60  1.70  0.95
21  C0007192 C0018834    0.04  0.04  0.04  0.04  0.11  0.20  1.01  0.44
22  C0007192 C0018802    0.52  0.52  0.52  0.52  0.56  0.60  1.70  0.95
23  C0007192 C0000737    0.04  0.05  0.15  0.08  0.11  0.20  1.01  0.44
24  C0007192 C0020621    0.04  0.04  0.04  0.04  0.11  0.20  1.01  0.44
25  C0007192 C0030631    0.04  0.04  0.04  0.04  0.00  0.18  0.89  0.36
26  C0007192 C0035238    0.04  0.05  0.16  0.08  0.33  0.25  1.30  0.63
27  C0007192 C0010068

#   Concept Pair         Resnik  Lin   JC    Avg IC  SEC   WP    LC    Avg ND
28  C0018799 C0018834    0.04  0.04  0.04  0.04  0.11  0.20  1.01  0.44
29  C0018799 C0018802    0.52  0.52  0.52  0.52  0.56  0.60  1.70  0.95
30  C0018799 C0000737    0.04  0.05  0.15  0.08  0.11  0.20  1.01  0.44
31  C0018799 C0020621    0.04  0.04  0.04  0.04  0.11  0.20  1.01  0.44
32  C0018799 C0030631    0.04  0.04  0.04  0.04  0.00  0.18  0.89  0.36
33  C0018799 C0035238    0.04  0.05  0.16  0.08  0.33  0.25  1.30  0.63
34  C0018799 C0010068
35  C0018834 C0018802    0.04  0.04  0.04  0.04  0.11  0.20  1.01  0.44
36  C0018834 C0000737    0.46  0.51  0.57  0.51  0.56  0.60  1.70  0.95
37  C0018834 C0020621    0.04  0.04  0.04  0.04  0.11  0.20  1.01  0.44
38  C0018834 C0030631    0.04  0.04  0.04  0.04  0.00  0.18  0.89  0.36
39  C0018834 C0035238    0.04  0.05  0.16  0.08  0.33  0.25  1.30  0.63
40  C0018834 C0010068
41  C0018802 C0000737    0.04  0.05  0.15  0.08  0.11  0.20  1.01  0.44
42  C0018802 C0020621    0.04  0.04  0.04  0.04  0.11  0.20  1.01  0.44
43  C0018802 C0030631    0.04  0.04  0.04  0.04  0.00  0.18  0.89  0.36
44  C0018802 C0035238    0.04  0.05  0.16  0.08  0.33  0.25  1.30  0.63
45  C0018802 C0010068
46  C0000737 C0020621    0.04  0.05  0.15  0.08  0.11  0.20  1.01  0.44
47  C0000737 C0030631    0.04  0.05  0.15  0.08  0.00  0.18  0.89  0.36
48  C0000737 C0035238    0.04  0.06  0.28  0.12  0.33  0.25  1.30  0.63
49  C0000737 C0010068
50  C0020621 C0030631    0.04  0.04  0.04  0.04  0.00  0.18  0.89  0.36
51  C0020621 C0035238    0.04  0.05  0.16  0.08  0.33  0.25  1.30  0.63
52  C0020621 C0010068
53  C0030631 C0035238    0.04  0.05  0.16  0.08  0.22  0.22  1.15  0.53
54  C0030631 C0010068
55  C0035238 C0010068

Appendix C: Ranking according to Pearson Correlation

MSH
Pair #  CUI 1     CUI 2     Lin
12      C0003811  C0018799  1
6       C0002962  C0000737  2
3       C0002962  C0018799  3
29      C0018799  C0018802  4
36      C0018834  C0000737  5
20      C0007192  C0018799  6
1       C0002962  C0003811  7
14      C0003811  C0018802  8
11      C0003811  C0007192  9
5       C0002962  C0018802  10
2       C0002962  C0007192  11
22      C0007192  C0018802  12
25      C0007192  C0030631  13
4       C0002962  C0018834  14
13      C0003811  C0018834  15
8       C0002962  C0030631  16
23      C0007192  C0000737  17
47      C0000737  C0030631  17
33      C0018799  C0035238  19
30      C0018799  C0000737  20
18      C0003811  C0035238  21
28      C0018799  C0018834  22
31      C0018799  C0020621  22
15      C0003811  C0000737  24
9       C0002962  C0035238  25
44      C0018802  C0035238  25
48      C0000737  C0035238  27
41      C0018802  C0000737  28
26      C0007192  C0035238  29
39      C0018834  C0035238  29
51      C0020621  C0035238  29
7       C0002962  C0020621  32
35      C0018834  C0018802  32
42      C0018802  C0020621  32
46      C0000737  C0020621  35
21      C0007192  C0018834  36
24      C0007192  C0020621  36
37      C0018834  C0020621  36
16      C0003811  C0020621  39
17      C0003811  C0030631  40
32      C0018799  C0030631  40
38      C0018834  C0030631  40
43      C0018802  C0030631  40
50      C0020621  C0030631  40
53      C0030631  C0035238  40

SNMI
Pair #  CUI 1     CUI 2     Lin-SNMI
3       C0002962  C0018799  1
12      C0003811  C0018799  2
20      C0007192  C0018799  3
29      C0018799  C0018802  3
2       C0002962  C0007192  5
5       C0002962  C0018802  5
1       C0002962  C0003811  7
22      C0007192  C0018802  8
11      C0003811  C0007192  9
14      C0003811  C0018802  9
36      C0018834  C0000737  11
31      C0018799  C0020621  12
32      C0018799  C0030631  12
33      C0018799  C0035238  12
16      C0003811  C0020621  15
17      C0003811  C0030631  15
18      C0003811  C0035238  15
7       C0002962  C0020621  18
8       C0002962  C0030631  18
9       C0002962  C0035238  18
24      C0007192  C0020621  21
25      C0007192  C0030631  21
26      C0007192  C0035238  21
42      C0018802  C0020621  21
43      C0018802  C0030631  21
44      C0018802  C0035238  21
50      C0020621  C0030631  21
51      C0020621  C0035238  21
53      C0030631  C0035238  21
4       C0002962  C0018834  30
6       C0002962  C0000737  30
13      C0003811  C0018834  30
15      C0003811  C0000737  30
21      C0007192  C0018834  30
23      C0007192  C0000737  30
28      C0018799  C0018834  30
30      C0018799  C0000737  30
35      C0018834  C0018802  30
37      C0018834  C0020621  30
38      C0018834  C0030631  30
39      C0018834  C0035238  30
41      C0018802  C0000737  30
46      C0000737  C0020621  30
47      C0000737  C0030631  30
48      C0000737  C0035238  30

ICD9CM
Pair #  CUI 1     CUI 2     Resnik-ICD9CM
11      C0003811  C0007192  1
12      C0003811  C0018799  1
14      C0003811  C0018802  1
20      C0007192  C0018799  1
22      C0007192  C0018802  1
29      C0018799  C0018802  1
36      C0018834  C0000737  7
1       C0002962  C0003811  8
2       C0002962  C0007192  8
3       C0002962  C0018799  8
5       C0002962  C0018802  8
4       C0002962  C0018834  12
6       C0002962  C0000737  12
7       C0002962  C0020621  12
8       C0002962  C0030631  12
9       C0002962  C0035238  12
13      C0003811  C0018834  12
15      C0003811  C0000737  18
16      C0003811  C0020621  18
17      C0003811  C0030631  18
18      C0003811  C0035238  18
21      C0007192  C0018834  18
23      C0007192  C0000737  18
24      C0007192  C0020621  18
25      C0007192  C0030631  18
26      C0007192  C0035238  18
28      C0018799  C0018834  18
30      C0018799  C0000737  18
31      C0018799  C0020621  18
32      C0018799  C0030631  18
33      C0018799  C0035238  18
35      C0018834  C0018802  18
37      C0018834  C0020621  18
38      C0018834  C0030631  18
39      C0018834  C0035238  18
41      C0018802  C0000737  18
42      C0018802  C0020621  18
43      C0018802  C0030631  18
44      C0018802  C0035238  18
46      C0000737  C0020621  18
47      C0000737  C0030631  18
48      C0000737  C0035238  18
50      C0020621  C0030631  18
51      C0020621  C0035238  18
53      C0030631  C0035238  18

Appendix D: Rankings of each method in three vocabularies

MSH
#   Concept Pair         Resnik  Lin  JC  Avg IC  SEC  WP  LC  Avg ND
1   C0002962 C0003811    4   10  9   10  14  12  14  14
2   C0002962 C0007192    4   16  16  16  18  16  18  18
3   C0002962 C0018799    4   5   4   6   8   7   8   8
4   C0002962 C0018834    19  19  20  19  18  16  18  18
5   C0002962 C0018802    4   14  14  14  14  12  14  14
6   C0002962 C0000737    3   4   7   3   8   55  8   13
7   C0002962 C0020621    24  40  42  40  51  48  51  51
8   C0002962 C0030631    21  21  30  22  51  48  51  51
9   C0002962 C0035238    24  31  28  31  33  32  33  33
10  C0002962 C0010068    1   1   1   1   1   18  1   3
11  C0003811 C0007192    8   13  13  13  8   7   8   8
12  C0003811 C0018799    8   2   2   2   1   1   1   1
13  C0003811 C0018834    20  20  19  20  18  18  18  20
14  C0003811 C0018802    8   11  10  11  4   4   4   5
15  C0003811 C0000737    24  29  26  29  18  18  18  20
16  C0003811 C0020621    48  48  34  43  33  32  33  33
17  C0003811 C0030631    49  49  46  50  40  48  40  49
18  C0003811 C0035238    24  26  23  26  18  21  18  22
19  C0003811 C0010068    8   7   6   8   8   7   8   8
20  C0007192 C0018799    8   9   8   9   4   4   4   5
21  C0007192 C0018834    24  44  47  45  40  39  40  40
22  C0007192 C0018802    8   17  17  17  8   7   8   8
23  C0007192 C0000737    22  22  32  24  40  39  40  40
24  C0007192 C0020621    24  44  47  45  40  39  40  40
25  C0007192 C0030631    18  18  22  18  24  23  24  24
26  C0007192 C0035238    24  37  38  37  24  24  24  25
27  C0007192 C0010068    8   15  15  15  14  12  14  14

#   Concept Pair         Resnik  Lin  JC  Avg IC  SEC  WP  LC  Avg ND
28  C0018799 C0018834    24  27  24  27  24  24  24  25
29  C0018799 C0018802    8   6   5   7   1   1   1   1
30  C0018799 C0000737    24  25  21  23  24  24  24  25
31  C0018799 C0020621    24  27  24  27  24  24  24  25
32  C0018799 C0030631    49  49  41  49  33  37  33  38
33  C0018799 C0035238    24  24  18  21  14  15  14  17
34  C0018799 C0010068    8   3   3   4   4   4   4   5
35  C0018834 C0018802    24  40  42  40  33  32  33  33
36  C0018834 C0000737    2   8   11  5   4   3   4   4
37  C0018834 C0020621    24  44  47  45  40  39  40  40
38  C0018834 C0030631    49  49  54  54  51  52  51  53
39  C0018834 C0035238    24  37  38  37  24  24  24  25
40  C0018834 C0010068    24  35  36  35  40  39  40  40
41  C0018802 C0000737    24  34  35  34  33  32  33  33
42  C0018802 C0020621    24  40  42  40  33  32  33  33
43  C0018802 C0030631    49  49  53  53  40  48  40  49
44  C0018802 C0035238    24  31  28  31  18  21  18  22
45  C0018802 C0010068    8   12  12  12  8   7   8   8
46  C0000737 C0020621    24  43  45  44  40  39  40  40
47  C0000737 C0030631    22  22  32  24  40  39  40  40
48  C0000737 C0035238    24  33  31  33  24  24  24  25
49  C0000737 C0010068    24  44  50  48  40  39  40  40
50  C0020621 C0030631    49  49  54  54  51  52  51  53
51  C0020621 C0035238    24  37  38  37  24  24  24  25
52  C0020621 C0010068    24  35  36  35  40  39  40  40
53  C0030631 C0035238    49  49  52  52  33  37  33  38
54  C0030631 C0010068    49  49  51  51  51  52  51  53
55  C0035238 C0010068    24  30  27  30  24  24  24  25


SNMI
#   Concept Pair         Resnik  Lin  JC  Avg IC  SEC  WP  LC  Avg ND
1   C0002962 C0003811    8   7   5   7   7   8   7   8
2   C0002962 C0007192    1   5   6   5   4   4   4   4
3   C0002962 C0018799    1   1   1   1   1   1   1   1
4   C0002962 C0018834    30  30  34  34  30  30  30  32
5   C0002962 C0018802    1   5   6   5   4   4   4   4
6   C0002962 C0000737    30  30  34  34  37  30  37  37
7   C0002962 C0020621    12  18  20  18  17  17  17  17
8   C0002962 C0030631    12  18  20  18  23  23  23  23
9   C0002962 C0035238    12  18  20  18  10  13  10  13
11  C0003811 C0007192    8   9   8   9   10  9   10  10
12  C0003811 C0018799    8   2   2   4   4   7   4   6
13  C0003811 C0018834    30  30  23  32  30  30  30  32
14  C0003811 C0018802    8   9   8   9   10  9   10  10
15  C0003811 C0000737    30  30  23  32  37  30  37  37
16  C0003811 C0020621    12  15  15  15  17  17  17  17
17  C0003811 C0030631    12  15  15  15  23  23  23  23
18  C0003811 C0035238    12  15  15  15  10  13  10  13
20  C0007192 C0018799    1   3   3   2   2   2   2   2
21  C0007192 C0018834    30  30  36  36  37  30  37  37
22  C0007192 C0018802    1   8   10  8   7   6   7   7
23  C0007192 C0000737    30  30  36  36  43  30  43  43
24  C0007192 C0020621    12  21  25  21  23  23  23  23
25  C0007192 C0030631    12  21  25  21  30  28  30  30
26  C0007192 C0035238    12  21  25  21  17  17  17  17
28  C0018799 C0018834    30  30  18  30  23  30  23  28
29  C0018799 C0018802    1   3   3   2   2   2   2   2
30  C0018799 C0000737    30  30  18  30  30  30  30  32
31  C0018799 C0020621    12  12  12  12  10  13  10  13
32  C0018799 C0030631    12  12  12  12  17  17  17  17
33  C0018799 C0035238    12  12  12  12  7   12  7   9
35  C0018834 C0018802    30  30  36  36  37  30  37  37
36  C0018834 C0000737    7   11  11  11  10  9   10  10

#   Concept Pair         Resnik  Lin  JC  Avg IC  SEC  WP  LC  Avg ND
37  C0018834 C0020621    30  30  36  36  30  30  30  32
38  C0018834 C0030631    30  30  36  36  37  30  37  37
39  C0018834 C0035238    30  30  36  36  23  30  23  28
41  C0018802 C0000737    30  30  36  36  43  30  43  43
42  C0018802 C0020621    12  21  25  21  23  23  23  23
43  C0018802 C0030631    12  21  25  21  30  28  30  30
44  C0018802 C0035238    12  21  25  21  17  17  17  17
46  C0000737 C0020621    30  30  36  36  37  30  37  37
47  C0000737 C0030631    30  30  36  36  43  30  43  43
48  C0000737 C0035238    30  30  36  36  30  30  30  32
50  C0020621 C0030631    12  21  25  21  23  23  23  23
51  C0020621 C0035238    12  21  25  21  10  13  10  13
53  C0030631 C0035238    12  21  25  21  17  17  17  17


ICD9CM
#   Concept Pair         Resnik  Lin  JC  Avg IC  SEC  WP  LC  Avg ND
1   C0002962 C0003811    8   8   4   8   4   8   4   8
2   C0002962 C0007192    8   9   9   9   9   9   9   9
3   C0002962 C0018799    8   9   9   9   9   9   9   9
4   C0002962 C0018834    12  31  31  31  20  20  20  20
5   C0002962 C0018802    8   9   9   9   9   9   9   9
6   C0002962 C0000737    12  16  16  16  20  20  20  20
7   C0002962 C0020621    12  31  31  31  20  20  20  20
8   C0002962 C0030631    12  31  31  31  27  27  27  27
9   C0002962 C0035238    12  15  15  15  9   12  9   12
11  C0003811 C0007192    1   1   1   1   1   1   1   1
12  C0003811 C0018799    1   1   1   1   1   1   1   1
13  C0003811 C0018834    12  17  17  17  20  20  20  20
14  C0003811 C0018802    1   1   1   1   1   1   1   1
15  C0003811 C0000737    18  13  13  13  20  20  20  20
16  C0003811 C0020621    18  17  17  18  20  20  20  20
17  C0003811 C0030631    18  17  17  18  27  27  27  27
18  C0003811 C0035238    18  12  12  12  9   12  9   12
20  C0007192 C0018799    1   4   6   4   4   4   4   4
21  C0007192 C0018834    18  37  37  37  27  27  27  27
22  C0007192 C0018802    1   4   6   4   4   4   4   4
23  C0007192 C0000737    18  29  29  29  27  27  27  27
24  C0007192 C0020621    18  37  37  37  27  27  27  27
25  C0007192 C0030631    18  37  37  37  40  40  40  40
26  C0007192 C0035238    18  23  23  23  14  14  14  14
28  C0018799 C0018834    18  37  37  37  27  27  27  27
29  C0018799 C0018802    1   4   6   4   4   4   4   4
30  C0018799 C0000737    18  29  29  29  27  27  27  27
31  C0018799 C0020621    18  37  37  37  27  27  27  27
32  C0018799 C0030631    18  37  37  37  40  40  40  40
33  C0018799 C0035238    18  23  23  23  14  14  14  14
35  C0018834 C0018802    18  37  37  37  27  27  27  27
36  C0018834 C0000737    7   7   5   7   4   4   4   4

#   Concept Pair         Resnik  Lin  JC  Avg IC  SEC  WP  LC  Avg ND
37  C0018834 C0020621    18  37  37  37  27  27  27  27
38  C0018834 C0030631    18  37  37  37  40  40  40  40
39  C0018834 C0035238    18  23  23  23  14  14  14  14
41  C0018802 C0000737    18  26  26  26  27  27  27  27
42  C0018802 C0020621    18  34  34  34  27  27  27  27
43  C0018802 C0030631    18  34  34  34  40  40  40  40
44  C0018802 C0035238    18  20  20  20  14  14  14  14
46  C0000737 C0020621    18  26  26  26  27  27  27  27
47  C0000737 C0030631    18  26  26  26  40  40  40  40
48  C0000737 C0035238    18  14  14  14  14  14  14  14
50  C0020621 C0030631    18  34  34  34  40  40  40  40
51  C0020621 C0035238    18  20  20  20  14  14  14  14
53  C0030631 C0035238    18  20  20  20  20  20  20  20
