From a Children's First Dictionary to a Lexical Knowledge Base of Conceptual Graphs
Caroline Barrière
B.Eng. École Polytechnique de Montréal, 1989
M.A.Sc. École Polytechnique de Montréal, 1991

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in the School of Computing Science

© Caroline Barrière 1997
SIMON FRASER UNIVERSITY
June 1997

All rights reserved. This work may not be reproduced in whole or in part, by photocopy or other means, without the permission of the author.

National Library of Canada, Acquisitions and Bibliographic Services

The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats. The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

APPROVAL

Name: Caroline Barrière
Degree: Doctor of Philosophy
Title of thesis: From a Children's First Dictionary to a Lexical Knowledge Base of Conceptual Graphs
Examining Committee:
Lou Hafer, Chair
Dr. Frederick Popowich, Senior Supervisor
Dr. Robert Hadley, Supervisor
Dr. Veronica Dahl, Supervisor
Dr. Jim Delgrande, Examiner
Dr. Randy Goebel, University of Alberta, External Examiner
Date Approved:

Abstract

This thesis aims at building a Lexical Knowledge Base (LKB) that will be useful to a Natural Language Processing (NLP) system by extracting information from a Machine Readable Dictionary (MRD). Our source of knowledge is the American Heritage First Dictionary¹ (AHFD), which contains 1800 entries and is designed for children aged six to eight who are learning the structure and the basic vocabulary of their language. Using a children's dictionary allows us to restrict our vocabulary but still work on general knowledge about day-to-day concepts and actions.

¹ Copyright © 1994 by Houghton Mifflin Company. Reproduced by permission from THE AMERICAN HERITAGE FIRST DICTIONARY.

Our Lexical Knowledge Base contains information extracted from the AHFD and represented using the Conceptual Graph (CG) formalism. The graph definitions explicitly give the information contained in all the noun and verb definitions from the AHFD. Each sentence of each definition is tagged, parsed and automatically transformed into a conceptual graph.

The type hierarchy, extracted automatically from the definitions, groups all the nouns and verbs in the dictionary into a taxonomy. Covert categories are discovered among the definitions and complement the type hierarchy in its role of establishing concept similarity. Covert categories can be thought of as concepts not associated with a dictionary entry, such as "writing instrument" or "device giving time". They allow words to be grouped by criteria other than a common hypernym, and therefore enlarge the space explored when looking for similarity among concepts.

The relation hierarchy, which is built manually, groups the relations used in our CG representation of definitions into subclasses and superclasses. The relations can be prepositions such as in, on or with, or deeper semantic relations such as part-of, material or instrument.
Concept clusters are constructed automatically around a trigger word to put it into a larger context. Its graph representation is joined to the graph representations of other words in the dictionary that are related to it. The set of related words forms a concept cluster, and their graph representation, showing all the relations between them and other related words, is a Concept Clustering Knowledge Graph.

One important aspect of the thesis is the underlying thread of finding similarity through concept and graph comparison as a general way of processing information. The ideas presented in this thesis are implemented in a system called ARC-Concept. We present and discuss the results obtained.

Keywords: Lexical Knowledge Base, Machine Readable Dictionary, Concept Clustering, Type hierarchy, Conceptual Graphs

To my parents, for all their love and encouragement over the years.

Acknowledgments

I thank my advisor Fred Popowich for believing in my good will to pursue my Ph.D. from a distance. I thank him for his moral support, his patience, his helpful comments and ideas on the present work but mostly for his positive approach to life.

I thank Mark Trumbo, my husband, for his encouragement, his love, and his support. I thank him also for the useful discussions we have had relating to this work and for his good review of the whole thesis.

Contents

Abstract
Acknowledgments
List of Tables
List of Figures
1 INTRODUCTION
  1.1 Choosing a dictionary
  1.2 Conceptual graphs
    1.2.1 Representing dictionary definitions
  1.3 The type hierarchy
  1.4 The meaning of a word
    1.4.1 Cruse's view
    1.4.2 Quillian's view
  1.5 Layout of this dissertation
2 FROM A DEFINITION TO A CONCEPTUAL GRAPH
  2.1 Sentence to Conceptual Graph
    2.1.1 Sentence tagging
    2.1.2 Sentence parsing
    2.1.3 From a parse tree to a Conceptual Graph
  2.2 Building a type hierarchy
  2.3 Structural Disambiguation
    2.3.1 Prepositions
    2.3.2 Conjunctions
  2.4 Semantic Disambiguation
    2.4.1 Anaphora resolution
    2.4.2 Word sense disambiguation
    2.4.3 Finding deeper semantic relations
    2.4.4 Conjunction distribution
  2.5 Discussion
3 A LEXICAL KNOWLEDGE BASE
  3.1 Graph definitions
    3.1.1 Definitions as a set of interconnected nodes
    3.1.2 Certainty Information
    3.1.3 In Summary
  3.2 Concept Lattice
    3.2.1 Defining a taxonomy
    3.2.2 Building a taxonomy from a dictionary
    3.2.3 Building a taxonomy from AHFD
    3.2.4 Genus disambiguation
    3.2.5 Comparing the children's taxonomy to WordNet
    3.2.6 Relative importance of hypernyms
    3.2.7 Covert categories
    3.2.8 In Summary
  3.3 Relation taxonomy
  3.4 Concept clusters
    3.4.1 Semantically Significant Words
    3.4.2 Finding common subgraphs
    3.4.3 Joining two graphs
    3.4.4 Clustering procedure
    3.4.5 In Summary
  3.5 Discussion
4 IMPLEMENTATION AND RESULT ANALYSIS
  4.1 CoGITo System
  4.2 From a sentence to a Conceptual Graph
    4.2.1 Example 1: DOUGHNUT
    4.2.2 Example 2: BAT
    4.2.3 Example 3: PIANO
  4.3 Results over the whole dictionary
  4.4 Covert categories
  4.5 Concept Clustering
    4.5.1 Example of integration: POST-OFFICE
    4.5.2 Changing the threshold
    4.5.3 Changing the trigger word within the cluster
    4.5.4 Results on other trigger words
  4.6 Discussion
5 DISCUSSION AND CONCLUSION
  5.1 Dissertation summary
  5.2 Major contributions
    5.2.1 Our use of Conceptual Graphs as a unifying formalism
    5.2.2 Graph similarity for text processing
    5.2.3 LKB construction as a catalyst for future research
    5.2.4 Covert categories
    5.2.5 Concept Clustering
  5.3 Specific problems
  5.4 Future research
    5.4.1 Using the LKB for text processing
  5.5 Conclusion
Appendices
  A Electronic version of AHFD
  B Irregular words and morphological rules
  C Parse rules
  D Parse to Conceptual Graph rules
  E Semantic Relation Transformation Graphs

List of Tables

1.1 Noun definitions from adult's AHD and AHFD
2.1 Words used but not defined in the AHFD
2.2 Words used but defined as different part-of-speech in the AHFD