ABSTRACT

AN EMPIRICAL EVALUATION OF SEMANTIC SIMILARITY MEASURES USING THE WORDNET AND UMLS ONTOLOGIES

by Youbo Wang

Ontologies have been promoted as a sound basis for communication on the Semantic Web, the next generation of the Web. However, the proliferation of independently developed ontologies for heterogeneous information systems requires tools that enable semantic interoperability by discovering semantically appropriate mappings between different, independent ontologies. Finding these mappings requires a measure of semantic relatedness between concepts. Numerous such measures have been proposed for use within an ontology. They have been evaluated predominantly through experiments that apply them to word pairs from the WordNet ontology and compare the results to human judgments on the same pairs. In this work, an experimental software testbed that implements the semantic relatedness measures and automates their performance testing is developed to evaluate the validity, performance and applicability of these measures on two very different ontologies, WordNet and the Unified Medical Language System (UMLS).

A Thesis Submitted to the Faculty of Miami University in partial fulfillment of the requirements for the degree of Master of Computer Science
Department of Computer Science & Systems Analysis
by Youbo Wang
Miami University
Oxford, Ohio
2005

Advisor: Dr. Valerie Cross
Reader: Dr. James Kiper
Reader: Dr. Fazli Can

TABLE OF CONTENTS

1 Introduction
2 Ontologies: Key to Semantic Web Success
  2.1 Definition
  2.2 Types and Uses of Ontologies
  2.3 Languages for Ontology Specification and Ontology Editors
    2.3.1 Ontology Languages
    2.3.2 Ontology Editor
  2.4 WordNet and UMLS Ontologies Overview
    2.4.1 WordNet Introduction
    2.4.2 UMLS Introduction
3 Semantic Similarity
  3.1 Motivation
  3.2 Semantic Distance, Similarity and Relatedness Measures
    3.2.1 Network Distance-Based Approach
    3.2.2 Information Content-Based Approaches
  3.3 Determining Information Content Measures
4 Other Researchers’ WordNet and UMLS Experiments
  4.1 WordNet Experiments
  4.2 Semantic Similarity Experiments in UMLS
5 Experiments Using the Protégé Testbed
  5.1 Overview
  5.2 Difficulties in Acquiring WordNet and UMLS Ontologies
  5.3 WordNet and UMLS Ontology Structure
    5.3.1 WordNet in OWL
    5.3.2 UMLS in OWL
  5.4 Testbed Description
  5.5 WordNet Experiment
    5.5.1 Experiment and results
    5.5.2 Analysis
  5.6 UMLS Experiments
    5.6.1 Experiments and results
    5.6.2 Analysis
6 Significance
7 Conclusions and Future Work
References
Appendix
  Appendix A: Experts’ Scores
  Appendix B: Semantic Similarities of UMLS Vocabularies
  Appendix C: Ranking according to Pearson Correlation
  Appendix D: Rankings of each method in three vocabularies

LIST OF TABLES

Table 4.1 11 related concepts in UMLS vocabularies
Table 4.2 Unrelated concepts
Table 5.1 Arguments of WordNet and UMLS experiments
Table 5.2 Classes and their associated properties of the TestCase Ontology
Table 5.3 Information content semantic similarity measures of WordNet experiment
Table 5.4 Network distance-based semantic similarity measures of WordNet experiment
Table 5.5 Least subsumers of three word pairs in WordNet 2.0 and WordNet 1.7
Table 5.6 Correlations with human judgment for WordNet experiment
Table 5.7 Correlations for MSH
Table 5.8 Correlations for SNMI
Table 5.9 Correlations for ICD9CM
Table 5.10 Average correlations for AverageES across all vocabularies
Table 5.11 Ranking of all six methods in each vocabulary according to Pearson Correlation
Table 5.12 Average correlation of different measures
Table 5.13 Correlations of WordNet and UMLS
Table 5.14 Number of pairs in each group
Table 5.15 Number of common pairs in each group

LIST OF FIGURES

Figure 2.1 A high-level portion of OpenCyc (CYC)
Figure 2.2 Layer cake structure of the Semantic Web
Figure 2.3 The Protégé 3.1 user interface
Figure 2.4 A WordNet noun hierarchy example
Figure 2.5 MSH concepts and relations example
Figure 2.6 UMLS Semantic Network example
Figure 3.1 An example of a hierarchy
Figure 3.2 A fragment of the WordNet ontology
Figure 5.1 Hierarchical structure of WordNet
Figure 5.2 Hierarchical structure of UMLS vocabularies
Figure 5.3 The interface of integrated tab plugin
Figure 5.4 The pop-up window used for entering word pairs