The Analysis Of Noun Sequences Using Semantic Information Extracted From On-Line Dictionaries


<p><strong>The Analysis of Noun Sequences using Semantic Information Extracted from On-Line Dictionaries</strong></p><p>Lucretia H. Vanderwende</p><p>October 5, 1995</p><p>Technical Report</p><p>MSR-TR-95-57</p><p>Microsoft Research</p><p>Microsoft Corporation</p><p>One Microsoft Way</p><p>Redmond, WA 98052</p><br><h3>The Analysis of Noun Sequences using Semantic Information Extracted from On-Line Dictionaries</h3><p>A Dissertation<br>submitted to the Faculty of the<br>Graduate School of Arts and Sciences<br>of Georgetown University<br>in partial fulfillment of the requirements for the<br>degree of<br>Doctor of Philosophy<br>in Linguistics</p><p>By</p><p>Lucretia H. Vanderwende, M.S.</p><p>Washington, D.C.</p><p>October 5, 1995</p><p><br></p><p>Copyright 1995 by Lucretia Vanderwende</p><p>All Rights Reserved</p><h3>The Analysis of Noun Sequences using Semantic Information Extracted from On-Line Dictionaries</h3><p>Lucretia H. Vanderwende, M.S.</p><p>Mentor: Donald Loritz, Ph.D.</p><p>ABSTRACT</p><p>This dissertation describes a computational system for the automatic analysis of noun sequences in unrestricted text. Noun sequences (also known as noun compounds or complex nominals) have several characteristics which prove to be obstacles to their automatic interpretation. First, the creation of noun sequences is highly productive in English; it is not possible to store all the noun sequences that will be encountered while processing text. Second, their interpretation is not recoverable from syntactic or morphological analysis. Interpreting a noun sequence, i.e., finding the relation between the nouns in a noun sequence, requires semantic information, both in limited domains and in unrestricted text. The semantic analysis in previous computational systems relied heavily on the availability of domain-specific knowledge bases; these have always been hand-coded.
</p><p>In this dissertation, we will describe a new approach to the problem of interpreting noun sequences; we also propose a new classification schema for noun sequences, consisting of 14 basic relations/classes. The approach involves a small set of general rules for interpreting NSs which makes use of semantic information extracted from the definitions in on-line dictionaries; the process for automatically acquiring semantic information will be described in detail. Each general rule can be considered as the configuration of semantic features and attributes on the nouns which provide evidence for a particular noun sequence interpretation; the rules access a set of 28 semantic features and attributes. The rules test relatedness between the semantic information and the nouns in the noun sequence. The score for each rule is not determined by the presence or absence of semantic features and attributes, but by the degree to which the nouns are related.</p><p>The results show that this system interprets 53% of the noun sequences in previously unseen text. An analysis of the results indicates that additional rules are needed and that the semantic information found provides good results, but some semantic information is still missing. For these tests, only information extracted from the definitions was used; on-line dictionaries also contain example sentences which should be exploited, as well as the definitions of words other than those in the noun sequence.</p><h3>Acknowledgements</h3><p>I would like to express my gratitude to everyone who commented on this work as it developed and to everyone who encouraged me to complete this work. I am especially grateful for the trust and guidance that Karen Jensen and George Heidorn have shown me, and for their unbounded dedication to NLP.</p><p>I owe thanks to the members of my dissertation committee, Professors Don Loritz, Karen Jensen and Cathy Ball, and to my professor Walter Cook, S.J. and to the late Francis P.
Dinneen, S.J., and to my fellow students, above all Patty Schmid. Special mention goes to Ron Verheijen, who showed me that linguistics is both challenging and rewarding. </p><p>I owe many thanks to the members of the NLP group at Microsoft Research, Bill Dolan, Karen Jensen, George Heidorn, Joseph Pentheroudakis, and Steve Richardson, and to Diana Peterson – they have endured much. I would also like to thank the Institute of Systems Science (National University of Singapore) and Microsoft Research, each of which provided a wonderful and stimulating environment in which this research could be continued.</p><p>I also want to thank my friends already mentioned, Simonetta Montemagni, Lee Schwartz and Ben Erulkar for their constant support and encouragement. Most of all, I thank my family, Rita and Evert, Everett and Helma, and my husband, Arul Menezes, all of whom never demanded too much nor expected any less. </p><p><br>TABLE OF CONTENTS</p><p>1 Introduction 1</p><p>1.1 What is a Noun Sequence? 7</p><p>1.2 Coverage of noun sequences by SENS 8</p><p>2 Classification of noun sequences 11</p><p>2.1 Overview of previous noun sequence classifications 14</p><p>2.1.1 Jespersen 14</p><p>2.1.2 Lees 17</p><p>2.1.3 Levi 23</p><p>2.1.4 Downing 27</p><p>2.1.5 Finin 29</p><p>2.1.6 Leonard 32</p><p>2.2 The classification used for SENS 36</p><p>2.2.1 Some criteria for classifying noun sequences 39</p><p>2.3 Differences between SENS classification and others 44</p><p>2.3.1 One class elsewhere vs. multiple classes in SENS 47</p><p>2.3.1.1 Where? / When? 47</p><p>2.3.1.2 Whose? / What is it part of? 48</p><p>2.3.1.3 What are its parts? / Made of what? 49</p><p>2.3.1.4 Made of what? / What kind is it? 50</p><p>2.3.1.5 Where? / What for? 51</p><p>2.3.2 One class in SENS vs. multiple classes elsewhere 52</p><p>2.3.2.1 What does it cause? 52</p><p>2.3.2.2 Where? 53</p><p>2.3.2.3 What is it part of? 54</p><p>2.3.2.4 Made of what?
55</p><p>2.3.3 Gaps in the classification schema 56</p><p>2.4 Conclusion 58</p><p>3 SENS: a System for Evaluating Noun Sequences 60</p><p>3.1 The algorithm for SENS 63</p><p>3.1.1 Jensen and Binot’s system for PP attachment 64</p><p>3.1.2 Control structure in the SENS algorithm 67</p><p>3.1.3 General rules for SENS 74</p><p>3.1.3.1 Rules for a ‘Who/What?’ interpretation 81</p><p>3.1.3.2 Rules for a ‘Whom/What?’ interpretation 85</p><p>3.1.3.3 Rules for a ‘Where?’ interpretation 88</p><p>3.1.3.4 Rules for a ‘When?’ interpretation 90</p><p>3.1.3.5 Rules for a ‘Whose?’ interpretation 91</p><p>3.1.3.6 Rules for a ‘What is it part of?’ interpretation 92</p><p>3.1.3.7 Rules for a ‘What are its parts?’ interpretation 94</p><p>3.1.3.8 Rules for a ‘What kind of?’ interpretation 95</p><p>3.1.3.9 Rules for a ‘What with?’ interpretation 96</p><p>3.1.3.10 Rules for a ‘What for?’ interpretation 97</p><p>3.1.3.11 Rules for a ‘Made of what?’ interpretation 100</p><p>3.1.3.12 Rules for a ‘What about?’ interpretation 101</p><p>3.1.3.13 Rules for a ‘What does it cause?’ interpretation 102</p><p>3.1.3.14 Rules for a ‘What causes it?’ interpretation 103</p><p>3.1.4 Matching procedure 103</p><p>3.1.5 Determining the weight for each rule 108</p><p>3.1.5.1 Rule and match weight combined 112</p><p>3.1.6 An extended example: football field 114</p><p>Step 1 115</p><p>Step 2 122</p><p>Step 3 126</p><p>Step 4 129</p><p>3.2 Previous computational approaches to NS 131</p><p>3.2.1 Finin 133</p><p>3.2.2 Importance of role nominals 136</p><p>3.2.3 Leonard 140</p><p>3.2.4 Parallel vs.
serial algorithm 143</p><p>3.3 Conclusion 146</p><p>4 SESEMI: the System for Extracting SEMantic Information 149</p><p>4.0.1 A dictionary entry 151</p><p>4.0.2 Genus and differentia 153</p><p>4.0.3 Semantic relations 154</p><p>4.1 Previous work on on-line dictionaries 158</p><p>4.1.1 Amsler 158</p><p>4.1.2 Chodorow, Byrd and Heidorn 159</p><p>4.1.3 Markowitz, Ahlswede and Evens 160</p><p>4.1.4 Ahlswede and Evens 162</p><p>4.1.5 Jensen and Binot 165</p><p>4.1.6 Alshawi 169</p><p>4.1.7 Slator 172</p><p>4.2 Description of the syntactic analysis 173</p><p>4.2.1 Data structures for MEG 174</p><p>4.2.2 Types of lexical information 176</p><p>4.2.3 Use of subcategorization information 177</p><p>4.2.4 Partial parses 178</p><p>4.2.5 A single parse 181</p><p>4.3 How does SESEMI work? 187</p><p>4.3.1 Example: the PURPOSE relation 189</p><p>4.3.2 VERB-ARGS pattern 192</p><p>4.4 Why not string patterns? 195</p><p>4.4.1 Syntactic information transcends the local environment 199</p><p>4.4.2 Functional information includes logical form 204</p><p>4.5 Proposal for incorporating a reattachment component 211</p><p>4.6 Conclusion 216</p><p>5 SENS and SESEMI: Results 218</p><p>5.1 Results and error analysis in the training corpus 219</p><p>5.1.2 Incorrect rule weight 221</p><p>5.1.3 SESEMI 227</p><p>5.1.4 Syntactic analysis 229</p><p>5.1.5 Dictionary content 231</p><p>5.2 Results and error analysis in test corpora 239</p><p>5.2.1 Incorrect rule weight 243</p><p>5.2.2 Dictionary content 246</p><p>5.3 Conclusion 249</p><p>6 Review and future work 254</p><p>6.1 SENS: the classification schema 254</p><p>6.2 SENS: the rules 257</p><p>6.3 SESEMI: automatic identification of semantic relations 260</p><p>Appendix 1: The NSs in the training corpus 264</p><p>Appendix 2: The NSs in the LOA test corpus 272</p><p>Appendix 3: The NSs in the WSJ test corpus 278</p><p>References 286</p><p><br>LIST OF FIGURES</p><p>Figure 1.
Possible structures for truck driver (see Spencer 1991, p.327-8) 13</p><p>Figure 2. Jespersen’s classification schema 16</p><p>Figure 3. Example of NS derived from a kernel sentence (Lees 1960, p.138) 17</p><p>Figure 4. Lees (1960) classification schema (Lees 1960, p.125) 19</p><p>Figure 5. Lees (1970) classification schema (Lees 1970, p.182-5) 22</p><p>Figure 6. Additional classes of NSs in Lees (1970) 23</p><p>Figure 7. Levi’s classification schema 24</p><p>Figure 8. Semantic features for Levi’s classification (Levi 1978, p.240) 26</p><p>Figure 9. Downing’s classification schema 28</p><p>Figure 10. Finin’s classification schema (Finin 1980, p.18-20) 31</p><p>Figure 11. Leonard’s classification schema (Leonard 1984, p.8-12) 33</p><p>Figure 12. Classification of NSs for SENS 38</p><p>Figure 13. Overview of classification schemas for NSs 45</p><p>Figure 14. PEG parse tree for a syntactically ambiguous sentence, I ate a fish with a fork (Jensen and Binot 1987, p.252). 65</p><p>Figure 15. Possible interpretations for garden party 69</p><p>Figure 16. Control Structure for SENS 73</p><p>Figure 17. Partial semantic frame for the definition of the noun market (L n,1) 75</p><p>Figure 18. Semantic features and attributes used in SENS 76</p><p>Figure 19. Table representation of the ‘Who/What?’ rules 81</p><p>Figure 20. Table representation of the ‘Whom/What?’ rules 85</p><p>Figure 21. Table representation of the ‘Where?’ rules 88</p><p>Figure 22. Table representation of the ‘When?’ rules 90</p><p>Figure 23. Table representation of the ‘Whose?’ rule 91</p><p>Figure 24. Table representation of the ‘What is it part of?’ rules 92</p><p>Figure 25. Table representation of the ‘What are its parts?’ rules 94</p><p>Figure 26. Table representation of the ‘What kind of?’ rules 95</p><p>Figure 27. Table representation of the ‘What with?’ rules 96</p><p>Figure 28. Table representation of the ‘What for?’ rules 97</p><p>Figure 29. 
Table representation of the ‘Made of what?’ rules 100</p><p>Figure 30. Table representation of the ‘What about?’ rules 101</p><p>Figure 31. Table representation of the ‘What does it cause?’ rules 102</p><p>Figure 32. Table representation of the ‘What causes it?’ rules 103</p><p>Figure 33. Semantic network for the ‘Where?’ interpretation of meadow mouse 105</p><p>Figure 34. Semantic network for the ‘Who/What?’ interpretation of cat scratch 106</p><p>Figure 35. Semantic network for the ‘What is the location for?’ interpretation of bird cage 107</p><p>Figure 36. Semantic network for the ‘What is the location for?’ interpretation of refugee camp 108</p><p>Figure 37. Part of the ‘What for?’ relation which tests INSTRUMENT feature (see Figure 28) 111</p><p>Figure 38. Part of the ‘Made of what?’ relation which tests MATERIAL feature (see Figure 29) 111</p><p>Figure 39. Semantic network for bird cage for the ‘Whom/What?’ rule 113</p><p>Figure 40. Semantic network for bird cage for the ‘What is the location for?’ rule. 114</p><p>Figure 41. Control Structure for SENS 115</p><p>Figure 42. Sense frame for field (L n,1) 116</p><p>Figure 43. The head-based rules which test semantic features and attributes present in the sense frame of field (L n,1) 117</p><p>Figure 44. Sense frame for field (L n,2a) 118</p><p>Figure 45. The head-based rule which tests semantic features and attributes present in the sense frame of field (L n,2a) (see Figure 28) 119</p><p>Figure 46. Semantic network for the ‘What is the location for?’ interpretation of football field 120</p><p>Figure 47. Sense frame for field (L n,7) 121</p><p>Figure 48. The head-based rule which tests semantic features and attributes present in the sense frame of field (L n,7) (see Figure 26) 121</p><p>Figure 49. Results for football field after analyzing the head senses of field 122</p><p>Figure 50. Sense frame for football (L n,1) 123</p><p>Figure 51. 
The mod-based rule which tests semantic features and attributes present in the sense frame of football (L n,1) (see Figure 21) 124</p><p>Figure 52. Sense frame for football (L n,2) 124</p><p>Figure 53. The mod-based rule which tests semantic features and attributes present in the sense frame of football (L n,2) (see Figure 20) 125</p><p>Figure 54. Sense frame for football (L n,3) 125</p><p>Figure 55. The mod-based rule which tests semantic features and attributes present in the sense frame of football (L n,3) (see Figure 23) 126</p><p>Figure 56. Results for football field after analyzing the modifier senses of football 126</p><p>Figure 57. Sense frame for field (L v,1) 127</p><p>Figure 58. The head-deverbal rules which test semantic features and attributes present in the sense frame of field (L v,1) (see figures 20 and 19) 127</p><p>Figure 59. Results for football field after analyzing the verbal senses of field 129</p><p>Figure 60. Possible interpretations of NS football field 130</p><p>Figure 61. Isabelle’s representation of the role nominal pilot (Isabelle 1984, p.513) 137</p><p>Figure 62. Leonard’s rule ordering 141</p><p>Figure 63. Partial semantic frame for the definition of the noun market (L n,1) 156</p><p>Figure 64. PEG parse tree for a syntactically ambiguous sentence, 166</p><p>Figure 65. Alshawi’s semantic structure of launch (L v,2) (Alshawi 1989, p. 158) 170</p><p>Figure 66. Alshawi’s semantic structure of hornbeam (L n,1) (Alshawi 1989, p. 171</p><p>Figure 67. Semantic frame for launch (L v,2) produced by SESEMI 172</p><p>Figure 68. MEG syntactic analysis of cellar (L n,1) 174</p><p>Figure 69. Example of MEG record structure 175</p><p>Figure 70. Lexical records for the words room and for 176</p><p>Figure 71. MEG syntactic analysis for mansion (L n) 179</p><p>Figure 72. MEG syntactic analysis of cellar (L n,1) 180</p><p>Figure 73. MEG syntactic analysis for baton (L n,3) 183</p><p>Figure 74. 
MEG syntactic analysis for plantain (L n) 185</p><p>Figure 75. SESEMI semantic frame for plantain (L n) 186</p><p>Figure 76. MEG syntactic analysis of cellar (L n,1) 189</p><p>Figure 77. Semantic frame which results after the PURPOSE pattern applies 190</p><p>Figure 78. Record structure with functional information for PP1 in Figure 76 191</p><p>Figure 79. SESEMI (partial) semantic frame for cellar (L n,1) 191</p><p>Figure 80. MEG (partial) syntactic analysis of market (L n,1) 193</p><p>Figure 81. Partial semantic frame for market (L n,1) 194</p><p>Figure 82. Record structure for VP2 in Figure 80 194</p><p>Figure 83. Semantic frame for market (L n,1) 195</p><p>Figure 84. MEG (partial) syntactic analysis for bear (L n,1) 201</p><p>Figure 85. MEG partial syntactic analysis of avocado (L n) 202</p><p>Figure 86. Record structure for PP1 in Figure 85 203</p><p>Figure 87. SESEMI semantic frame for avocado (L n) 203</p><p>Figure 88. MEG syntactic analysis for groundnut (L n) 206</p><p>Figure 89. Record structures with attribute SemNode, which show the logical 207</p><p>Figure 90. Partial SESEMI semantic frame for groundnut (L n) 208</p><p>Figure 91. Record structure for VP1 in Figure 88 209</p><p>Figure 92. Record structure for VP2 in Figure 88 210</p><p>Figure 93. Complete SESEMI semantic frame for groundnut (L n) 210</p><p>Figure 94. MEG syntactic analysis for plantain (L n) 212</p><p>Figure 95. MEG syntactic analysis of plantain (L n) after revision by the reattachment component 215</p><p>Figure 96. Proposed SESEMI semantic frame for plantain (L n) 215</p><p>Figure 97. Overview of SENS results for the training corpus 219</p><p>Figure 98. SENS interpretations of the NS chocolate bar 222</p><p>Figure 99. Semantic frame for the definition of the noun bar (L n,10a) 223</p>
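<p>The scoring idea summarized in the abstract (general rules that pair a candidate interpretation with semantic evidence, scored by how strongly the two nouns are related rather than by the mere presence of a feature) can be sketched in a few lines. This is an illustrative sketch only, not the actual SENS implementation: the frames, the two rules, the weights, and the simple relatedness test below are all invented for the example.</p>

```python
# Illustrative sketch of rule-based noun sequence scoring (NOT the actual
# SENS system): rules fire when their required semantic feature is present
# on the head noun, and each score is scaled by how strongly the modifier
# is related to the head's semantic frame.

# Hypothetical semantic frames, as might be extracted from dictionary definitions.
FRAMES = {
    "bird": {"features": {"animate"}, "attributes": {"location": {"cage", "nest"}}},
    "cage": {"features": {"instrument", "location"},
             "attributes": {"purpose": {"confine", "bird"}}},
}

# A rule: (interpretation, feature required on the head noun, base weight).
RULES = [
    ("What is the location for?", "location", 0.5),
    ("What with?", "instrument", 0.3),
]

def relatedness(mod, head):
    """Degree to which the modifier appears among the head's attribute values."""
    values = set().union(*FRAMES[head]["attributes"].values())
    return 1.0 if mod in values else 0.5  # a direct mention scores highest

def interpret(mod, head):
    """Score every rule whose required feature is present on the head noun."""
    scores = []
    for interpretation, feature, weight in RULES:
        if feature in FRAMES[head]["features"]:
            scores.append((interpretation, weight * relatedness(mod, head)))
    return sorted(scores, key=lambda s: -s[1])

print(interpret("bird", "cage"))
# The 'Where?'-style reading outranks 'What with?' because "bird" appears
# directly in the purpose attribute of the frame for "cage".
```

<p>The point of the sketch is the final multiplication: both rules fire on <em>bird cage</em>, but the ranking comes from the relatedness factor, mirroring the abstract's claim that scores reflect degree of relatedness rather than feature presence alone.</p>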


Details

  • File Type
    pdf
  • Content Languages
    English
  • File Pages
    319 pages
