<<

A WordNet-based Algorithm for Sense Disambiguation

Xiaobin Li" Stan Szpakowicz and Stan Matwin Department of Department of Computer Science Concordia University University of Ottawa Montreal, Quebec Ottawa, Ontario Canada H3G IMS Canada MN 6N5 xiaobmlQjcs concordia ca {szpak,stan}@csi uottawa ca

Abstract interpretation By design we only need the user to ap• prove the system s findings or prompt it for alternatives We present an algorithm for automatic word Also hy design we limit ourselves to information source* sense disambiguation ba.sed on lexical knowl in the inexpensive and other edge contained in WordNet and on the results lexical sources such as WordNet of surface-syntactic analvsis The algorithm lb part of a system that analyzes texts in or WordNel [Mill* r, 1990 Beckwith et al , 1991] is a very der to acquire knowledge in the presence of as rich source of lexical knowledge Since most entries have little pre-coded semantic knowledge as possi multiple senses, we fare a severe problem of blc On the other hand, we want lo make Hit The motivation for the work described here is the desire besl us* of public-domain information sources to design a word sense disambiguation (WSD) algorithm such as WordNet Rather than depend on large that satisfies the needs of our project (learning from text) amounts of hand-crafted knowledge or statis• without large amounts of hand-crafted knowledge or sta• tical data from large corpora, we use syntac• tistical data from large corpora We concentrate on using tic information and information in WordNet information in WordNet to the fullest extent and mini• and minimize the need for other knowledg< mizing the need for other knowledge sources in the WSD sources in the word sense disambiguation pro• algorithm between (defined cess We propose to guide disambiguation by in the next section) plays an important role in the algo• semantic similarity between words and heuris• rithm We propose several heuristic rules to guide WSD tic rules based on this similarity I lu algorithm Icsted on an unrestricted, real text (the Canadian In• has been applied to the ( anadian Income fax come Tax Guide) this automatic WSD method gives Guide Test results indicate that even on a rela- encouraging results tivelv small text the proposed method produces correct meaning more than 11% of the Word sense disambiguation is essential in natural lan• tim< guage processing Early symbolic methods [Hirst, 1967, Small and Rieger 1962, Wilks, 1975] heavily rely on large amounts of hand-crafted knowledge As a reault, 1 Introduction they can only work in a specific domain To overcome this weakness a variety of statistical WSD methods This work is part of the project that amih at a svner [Brown et a/, 1991, Gale et a/, 1992, Resnik, 1992, gistic integration of Machine Learning and Natural Lan• Schutze, 1992,Charniak, 1993, Lehman, 1994] have been guage Processing The long-t< rm goal of the project is put forward They scale up easily and this makes them a system that performs machine learning on the results useful for large unrestricted corpora One of the most of text analysis to acquire a useful collection of produc• important steps in statistical WSD methods, however, tion rules Because such a svstem should not require is statistically motivated extraction of word-word rela• extensive domain knowledge up front, text analysis is tionships from corpora As Resnik points out [1993], to be dont in a knowledge-scant setting and with mm the corpus may fail to provide sufficient information imal user involvement A domain-independent surface- about relevant word/word relationships and word/word syntactic parser produces an analysis of a text fragment relationships, even those supported by the data, may (usually a ) that undergoes interactive semantic not be the appropriate relationships to look at for some tasks" Most of the statistical methods suffer from this 'Tins work was done while the first author was with the Knowledge Acquisition and Machine Learning group at the limitation Resnik proposes constraining the set of pos• University of Ottawa The authors are gralcful to the Nat sible word classes by using WordNet Rather than im• ural Sciences and Engineering Research Council of Canada proving statistical approaches [Resnik, 1993, Sussna, far having supported this work with a Strategic Grant No 1993, Voorhees, 1993] we propose a completely differ• STR0117764 ent WordNet-based algorithm for WSD

1368 NATURAL LANGUAGE 2 WordNet and Semantic Similarity a child synset property belonging, holding, material WordNet is a lexical with a remarkably broad possession" is not a of its immediate parent coverage One of its most outstanding qualities is a svnset 'possession" at all Using the information about word sense network A word sense node in this net• a parent synset to decide the intended meaning of a word work is a group of strict called synset' which in its immediate child svnset is different from using the is regarded as a basic object in WordNet Each sense information about a child synset to decide the intended of a word is mapped to such a word sens*, node (1 e meaning of a word in its immediate parent synset So a synset) in WordNet and all word sense nodes in here we have divided the semantic similarity between a parent synset and its immediate child synset into two WordNet are linked by a \anety of semantic relation• levels (Level 2 and Level 3) ships, such as IS-A (hypernymy/hyponymy) PART-OF (meronymy/holonymy), synonymy and antonymy The Although we have only applied the algorithm to WSD IS-A relationship is dominant - synsels are grouped by of noun objects in a text (i e that are objects of it into hicrarchies Our algorithm only accesses informa• in sentences analyzed by, the system), it can also be tion about nouns and verbs, for which thtre exist such applied to other noun phrases in a sentence, in particular lexical hierarchies subjects It has become common to use some messure of se• In this approach, we must consider contexts that are mantic similarity between words to support word sense relevant lo our method and tht semantic similarity in disambiguation [Resmk 1992, Schutze, 1992] In this these contexts work, we have adopted the following defintion seman• Tor all practical purposes, the possible senses of a word tic similarity between words is inverselv proportional (o can be found in WordNet, but due to its extremely broad the semantic distance between words in a Wordnet IS coverage most words have multiple word senses A con• A hierarchy By investigating the semantic relationships text must be considered in order to decide a particular word sense The notion of and its use could between two given words in WordNet hierarchies seman- differ widely across WSD methods One may consider tic similarity can be measured and roughly divided into a whole text, a 100-word window, a sentence or some the following four levels specific words nearby, and so on In our work, we as• Level 1 The words are strict synonynis according to sume that a group of words co-occurring in a sentence WordNet one word is in the same synset as the will determine earh other s sense despite each of them other word being multiply ambiguous A simple and effective way is Lo consider as context the verbs that dominate noun Level 2 The words are extended synonyms accord• objects in sentences That is, we investigate -noun ing to WordNet one word is the immediate parent pairs to determine tht intended meaning of noun objects node of the other word in a IS-A hierarchy of Word- in sentences Net When used to get the synonyms of a word, WordNet not only produces the strict synonyms (a In this work, then we focus on investigating two as• synset), but also the immediate parent nodes of this pects of semantic similarity synset in a IS-A hierarchy of WordNet Here, these • The semantic similarity of the noun objects immediate parent nodes are called extended syn- onyms • The semantic similarity of their verb contexts Level 3 Tht words are hyponyms according to Word- 3 Heuristic Rules and Confidence of Net one word is a child node of the other word in a IS-A hierarchy of WordNet Results Level 4 The words have a coordinate relationship in We have set out to determine the intended meaning of a WordNet a word is a sibling node of the other word noun object in a given verb context from all candidate (i e both words have tht same parent node) m a word senses in WordNet of the noun object, select one IS-A hierarchy of WordNet sense that best fits the given verb context Suppose that the algorithm seeks the intended mean• Level 1 concerns the semantic similarity between ing of a noun object NOBJ in its verb context VC (that words inside a synset, Level 2 - Level 4 describe the se• is, the intended meaning of NOBJ in a verb-object pair mantic similarity between words that, belong to different (VC, NOBJ)) NOBJ has n candidate word senses in synsets (a synset which is composed of a group of strict WordNet s(k) means the kth word sense of NOBJ in synonyms is a node in the hierarchy of WordNet) The WordNet, l

LI SZPAKOWICZ,ANDMATWIN 1369 figure 3 Principle of Heuristic HR3

Figure 1 Principle of Heuristic HRl Heuristic Rule 4 (HR4) Search the text for a "such as" structure which follows the verb-object Heuristic Rule 2 (HR2) Search the text for a verb- pair (VC, NOBJ) in the text The object NOBJ' of object pair (VC NOBJ) in which VC has the se• such as" certainh has a synonym or hyponym re• mantic similarity with VC If the word sense of lationship with a word sense of NOBJ in WordNet NOBJ in verb context VC has already been ac• Then consider this word sense as a result (sec Fig 4 quired, then consider this word sense as a result and the example in SI EP 7 of the WSD algorithm (see Fig 2 and the example in STEP 4 of the WSD in the next section) algorithm)

Heuristic Rule 5 (HR5) Search the text for a VP coordinate structure such as either 'VC and VC ' or 'VC or VC " which has the noun object NOBJ Heuristic Rule 3 (HR3) Search the text for a verb If the word sense of NOBJ in the verb context VC object pair (VC , NOBJ ) in which VC ha* the se• has already been acquired, then consider this word mantic similarity with VC and NOBJ' has the ^e sense as a result (see Fig 5 and the example in mantic similarity with a word sense of NOBJ in STEP 8 of the WSD algorithm) WordNet Then consider this word sense as a re• HR4 only aims at one kind of special cases and its sult (see Fig 3 and the example in STEP 5 and coverage is limited HR5, like HR2, also depends on pre• STEP 6 of the WSD algorithm) viously acquired results, but its coverage is more limited Obviously HR2 is not always feasible and it just con• Because the results acquired by applying different tributes to the inference of further results based on the heuristic rules have different accuracy, we have to de• previously acquired results HR3 infers results with a fine the measure of confidence in the result ("CF" for weaker constraint on the semantic similarity between short) according to different heuristic rules and differ• verb-object pairs HR1 is the main heuristic rule in the ent semantic similarity adopted in these rules In our algorithm algorithm, the assignment of CF values intuitively cor• Besides the heuristic rules based on the semantic sim• responds to our notion of decreasing semantic similarity ilarity between verb-object pairs special heuristic rules (the test of the viability of this intuitive assignment has related to the syntactic structure have also been formu• been shown in section 5) The CF values are numbers lated to assiBt word sense disambiguation, see [Grefen- between 0 and 1 The assignment of CF values to levels stette and Hearst, 1992] for a similar approach is arbitrary, but it has served us well so far Because

1370 NATURAL LANGUAGE LI, SZPAKOWICZ, AND MATWIN 1371 STEP 4 ture "such as real estate" can be found in the text which Find in the parsed text a verb-object pair (Wv , Wn) follows 'sell property" in the text (that is, " sell prop- in which Wv has a synonymv, hyponvmy or coordinate erty, such as real estate " ) and the object "real es• relationship with Wv and the word sense of Wn in its tate" of this 'such as" structure is hyponymous with verb context Wv has already been acquired (suppose it sense(property, 1), this sense is given as "any tangible is sense (Wn, k) 1

For example suppose Wn = ' contribution' and Wv — STEP 8 ' prorate' ' contribution" has 6 senses m WordNet A Find in the parsed text a coordinate verb phrase struc verb-object pair ' calculate contribution can be found ture Wv and Wv " or "Wv or Wv' ' whose noun object in the text and ' calculate has a synonym relationship is Wn and the meaning of Wn in its verb context Wv with ' prorate The meaning of ' contribution in its has alreadv been acquired (suppose it is sense(Wn, k) verb context calculate has already been acquired (it is l

1372 NATURAL LANGUAGE wrote off its losses") => transferred property, transferred possession

Correct multiple solutions - more than one reason• able meaning of the noun object has been selected by the WSD algorithm Because some verbs cannot put strong restrictions on their objects, it is possi• ble that several word senses of a noun object are all reasonable in the verb context This test result for the WSD algorithm is encouraging For example, for the verb-object claim charge' , With human's assessment, the word sense disambigua• among 13 candidate word senses in WordNet, two tion has a high accuracy of up lo 72% (one correct so• word senses of "charge' have been selected by the lution and correct multiple solutions) Among the re• algorithm as the intended meaning of charge in maining 28% cases, there are only 4% wrong solutions the verb context 'claim' sense(charge, 1) and The 5% cases with partially correct solutions show how sense(charge, 2) In fact, both are reasonable in word sense ambiguity has been improved bv the WSD the verb context claim algorithm Sense 1 Of course, on the other hand, with 15% correct mul• charge - (a financial liability, such as a tax) tiple solution cases this WSD algorithm is obviously ->liability financial obligation, indebtedness pecu• limited In these cases, a verb context, for example, niary obligation include or ' have is not strong enough to limit the multiple word senses of its noun object to the only Sense 2 intended meaning In addition, even for some single- charge (the price charged for some article or ser• solution cases the only intended meaning obtained by vice) the algorithm is suitable in the verbal context considered => cost price, terms, damage by our algorithm, but is not necessarily appropriate for the whole text In order to produce the only intended Partially correct solutions - more than one word meaning of the noun objects in the whole text, contexts sense of the noun object has been selected by the other than just the verb context should be exploited in WSD algorithm Among those it least, there is the future versions of our algorithm the correct word sense that is reasonable in the verb context There are 19% no-solution cases in the test This means that 593 background verb-object pairs in the text For example for (he verb-object pur ' attach re• are not sufficient to support word sense disambiguation ceipt among three candidate word senses two of these cases The WSD algorithm depends on the rel• word senses of ' receipt have been selected by the evant information in a text, so the more relevant infor• WSD algorithm as the intended meaning of 're• mation in the text the better the WSD result If a ceipt in the verb context' attach" sense (receipt,-2) text is short of relevant information for supporting WSD and sense(receipt 3) In fac( only sense(receipt, 2) process, a good idea is that instead of using relevant is reasonable in the verb context attach' information from the text, the WSD algorithm works based on some supporting information from a large cor• Sense 2 pus (that is, with a large amount of verb-object pairs receipt, acknowledgment, acknowledgement from a large corpus as its background knowledge) => commercial document commercial instrument The sensitivity of results to the specific setting of CFs Sense 3 used has been noticed in the experiment as well The fol• reception receipt, receiving - (the act of receiving) lowing results indicate that indeed the quality decreases => acquisition, acquiring, taking with the decreasing values although mavbe the actual setting could be changed as long as its values are in Wrong solution the meanings of the noun object roughly the same decreasing order for HR1 - HR5 selected by the WSD algorithm are unreasonable for its verb context

No solution no result has been acquired by the algo• rithm For some cases, if no relev ant information for supporting the WSD process of these cases can be found in the text (e g there is no sufficient informa• tion in the text), the algorithm would not produce any solution

The following statistical data have been obtained by running the WSD program to deal with the 397 cases

LI, SZPAKOWICZ AND MATWIN 1373 values in the WSD algorithm is quite acceptable [Resnik, 1992] Resnik, P , "WordNet and Distributional Analysis A Class-based Approach to Lexical Discov• ery", In AAA1 Workshop on Statistically-Based NLP 6 Conclusions Techniques, San Jose, July 1992 In this paper, we have presented a WordNet-based al• [Resnik, 1993] Resnik, P , "Semantic CI asses and Syn• gorithm for word sense disambiguation The algorithm tactic Ambiguity", In Proc of the ARPA Workshop is designed to support text analysis with minimal pre- on Human Language Technology Princeton, 1993 coded knowledge Although the algorithm is assumed to aim at word sense disambiguation of noun objects in a [Schutze, 1992] Schutze, H , "Word Sense Disambigua• text, in fact it can be easily transformed to cover some tion With Sublexical Representations", In AAAI other parts of speech in a text Workshop on statistically-Based NLP Techniques, San The approach focuses on two parts the full utilization Jose, July 1992, 109-113 of the important relationships between words in Word- [Small and Rieger 1982] Small, S L and Rieger, G J , Net and the exploration of WSD heuristic rules based and Comprehending with Word Experts (a on the semantic similarity between words The exper• theory and its realization)', W G Lehnert and M iment described here demonstrates the viability of the H Ringle (ed ), In Strategies for Natural Language WSD algorithm, the rationality of the confidence value Processing, Lawrence Erlbaum Associates, 1982, 89- assignment and show the potential of a WordNet-based 147 method The experiment also shows the limitations of [Sussna, 1993] Subsna, M , "Word Sense Disambigua• considering only verb contexts in the WSD process tion for Free-text Indexing Using a Massive Semantic Future work on WSD will focus on investigating the Network , In ClhM 93, 1993 possibility of the involvement of more complex context in the WSD process and considering the effective com• [Voorhees, 1993] Voorhees E M , "Using WordNet to bination between the current result with verb contexts Disambiguate Word Sense for Text Retrieval" , In Proc and the possible future result with some other contexts ACM 11GIR fly, Pittsburgh, 1993, 171-180 [Wilks , 1975] Wilks Y A , "A Preferential, Pattern- References seeking for Natural Language Inference , In Vol 6 1975 [Beckwith et al, 1991] Beckwith, R , C Fellbaum D Gross and G Miller, "WordNet A Lexical database Organized on Psychohnguistic Principles , In Lexical Acquisition Exploiting On-line Resources to Build a Lawrence Erlbaum Associates, 1991 [Brown et al 199l] Brown, P F S A Delia Pietra V J Delia Pietra and R L Mercer Word Sense Disambiguation Using Statistical Methods In Proc ACL Meeting, Berkeley 1991, 264-270 [Charmak 1991] Charmak E Statistical La nguage Learning MIT Press, 1993 [Gale et al, 1992] ale, W A K W Church and D Yarowsky, "A Method for Disambiguating Word Senses in a Large Corpus", In Computers and the Humanities, 1992 [Grefenstette and Hearst 1992] Grefenstette, G and M Hearst, "A Method for Refining Automatically Discovered Lexical Relations Combining Weak Tech• niques for Stronger Results In A AAI Workshop on Statistically-Based NLP Techniques, San Jose, July 1992 [Hirst, 1987] Hirst, G , Semantic interpretation and the Disambiguation of Ambiguity Cambridge University Press, England 1987 [Lehman, 1994] Lehman, J F , "Toward the Essential Nature of Statistical Knowledge in Sense Disambigua• tion", In Proc AAAF-94, 1994, 734-741 [Miller, 1990] Miller, G , "WordNet An On-line Lexical Database' , In International Journal of , Vol 3, No 4, 1990

1374 NATURAL LANGUAGE