
From: AAAI Technical Report SS-95-01. Compilation copyright © 1995, AAAI (www.aaai.org). All rights reserved.

Lexical Acquisition via Constraint Solving *

Ted Pedersen and Weidong Chen
Department of Computer Science & Engineering
Southern Methodist University
Dallas, TX 75275
{pedersen, wchen}@seas.smu.edu

Abstract

This paper describes a method to automatically acquire the syntactic and semantic classifications of unknown words. Our method reduces the search space of the lexical acquisition problem by utilizing both the left and the right context of the unknown word. Link Grammar provides a convenient framework in which to implement our method.

*Supported in part by the National Science Foundation under Grant No. IRI-9212074.

1 Introduction

A robust Natural Language Processing (NLP) system must be able to process sentences that contain words unknown to its lexicon. The syntactic and semantic properties of unknown words are derived from those of known words in a sentence, assuming that the given sentence is valid.

The underlying linguistic framework plays a critical role in lexical acquisition. Linguistic frameworks can be broadly classified into two groups: those with phrase structure rules and those without. The lexicon of known words and any phrase structure rules that exist determine the size of the search space for the classification of unknown words. In general, the more complex the phrase structure rules, the larger the search space.

This paper explores lexical acquisition in a framework without phrase structure rules. All constraints on the usage of words are integrated into the lexicon. We use a novel lexical representation that explicitly specifies what syntactic and semantic classes of words may appear to the left and to the right of a word in a valid sentence. If all words in a sentence are known, the sentence is valid only if the associated constraints have a solution. Otherwise, constraints are inferred for unknown words that will make the sentence valid.

We choose to use Link Grammar [Sleator and Temperley, 1991] as it provides a convenient means for expressing bidirectional constraints. Among the other frameworks we investigated were Dependency Grammar [Mel'čuk, 1988], Categorial Grammar [Oehrle et al., 1988], and Word Grammar [Hudson, 1984], all of which are lexically based. We selected Link Grammar due to its explicit use of right and left context and the availability of an implementation that includes a 24,000 word lexicon. However, our approach is applicable to any system that integrates bidirectional constraints explicitly in the lexicon.

This paper begins with an introduction to Link Grammar. We then describe the process of acquiring the syntax of unknown words and outline the process of semantic acquisition. We close with a discussion of related work and our plans for the future.

2 Link Grammar

Link Grammar [Sleator and Temperley, 1991] is a context-free linguistic framework that is lexically based. It differs from other context-free grammars in that there are no decomposable constituent structures and its grammar rules are implicit in the lexicon.

Each word in the grammar is defined by a syntactic constraint that is expressed in disjunctive normal form. Each disjunct consists of a pair of ordered lists of the form ((l1, ..., lm-1, lm)(rn, rn-1, ..., r1)), where the left hand list is made up of connectors that must link to words to the left of the word in the sentence, and likewise for the right hand list. Each word can have multiple disjuncts, which implies that it can be used in various syntactic contexts.

The following is a simple example of a Link Grammar:

big, yellow:     (() (A))
car, corn, condor,
gasoline, meat:  ((A,Ds,Os) ())
                 ((A,Ds) (Ss))
                 ((Ds) (Ss))
                 ((Ds,Os) ())
                 ((Os) ())
eats:            ((Ss) (O))
                 ((Ss) ())
the:             (() (D))

Parsing a sentence in Link Grammar consists of choosing one disjunct for each word such that it can be connected to the surrounding words as specified in that disjunct. For a simple example, consider the sequence of words "The condor eats the meat" and the following choices of disjuncts for each word from the lexicon above:

the:     (() (D))
condor:  ((Ds) (Ss))
eats:    ((Ss) (O))
the:     (() (D))
meat:    ((Ds,Os) ())

The following diagram (called a linkage) shows the links among the words that justify the validity of the sentence according to Link Grammar.

            +--------Os-------+
   +--Ds--+---Ss---+   +--Ds--+
   |      |        |   |      |
  the   condor   eats the   meat

In general, a sequence of words is a sentence if it is possible to draw links among the words in such a way that the syntactic constraint of every word is satisfied and all the following meta-rules are observed:

- Planarity: Links drawn above the sentence do not intersect.

- Connectivity: There is a path from any word in the sentence to any other word via the links.

- Ordering: For each disjunct of a word w of the form ((l1, ..., lm-1, lm)(rn, rn-1, ..., r1)), with m ≥ 0 and n ≥ 0, the left hand list of connectors indicates links to words to the left of w, and likewise for the right hand list. In addition, the larger the subscript of a connector, the further away the word with the matching connector is from w.

- Exclusion: No two links may connect the same pair of words.

Parsing in Link Grammar corresponds to constraint solving according to these meta-rules. The objective is to select one disjunct for each word in a sentence that will lead to satisfaction of the meta-rules.

3 Syntactic Acquisition

Syntactic acquisition is the process of mapping an unknown word to a finite set of syntactic categories. In Link Grammar, syntactic categories are represented by the constraints that are expressed as disjuncts. Our lexical acquisition system is not called upon to create or identify new syntactic categories, as we assume that these are already known.

Given a sentence with unknown words, the disjuncts of the unknown words are determined based upon the syntactic constraints of the known words in the sentence. For instance, suppose that snipe is an unknown word in the sentence "The snipe eats meat". The following lists all the choices for the disjuncts of the known words, which come from the lexicon:

the:    (() (D))
snipe:  ((?) (?))
eats:   ((Ss) (O))
        ((Ss) ())
meat:   ((A,Ds,Os) ())
        ((A,Ds) (Ss))
        ((Ds) (Ss))
        ((Ds,Os) ())
        ((Os) ())

It must be determined what disjunct associated with 'snipe' will allow for the selection of a single disjunct for every known word such that each word can have its disjunct satisfied in accordance with the meta-rules previously discussed. There are 10 distinct disjuncts in the above grammar and any one of those could be the proper syntactic category for 'snipe'.

We could attempt to parse by blindly assigning to 'snipe' each of these disjuncts and seeing which leads to a valid linkage. However, this is impractical since more complicated grammars will have hundreds or even thousands of known disjuncts. In fact, in the current 24,000 word lexicon there are approximately 6,500 different syntactic constraints. A blind approach would assign all of these disjuncts to 'snipe' and then attempt to parse. It is possible to greatly reduce the number of candidate disjuncts by analyzing the disjuncts of the known words. Those disjuncts that violate the constraints of the meta-rules are eliminated.

The disjuncts ((A,Ds)(Ss)) and ((Ds)(Ss)) for 'meat' are immediately eliminated, as they can never be satisfied: there are no words to the right of 'meat'.
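This first, purely positional elimination step can be sketched in Python. The sketch below is our own illustration, not the paper's implementation: disjuncts are modelled as pairs of connector tuples, and a disjunct that needs more links to the left (or right) than there are words on that side is dropped.

```python
# Illustrative mini-lexicon from the paper's example; names are the
# paper's connector labels (D, Ss, O, A, Ds, Os).
LEXICON = {
    "the":  [((), ("D",))],
    "eats": [(("Ss",), ("O",)), (("Ss",), ())],
    "meat": [(("A", "Ds", "Os"), ()),
             (("A", "Ds"), ("Ss",)),
             (("Ds",), ("Ss",)),
             (("Ds", "Os"), ()),
             (("Os",), ())],
}

def prune_by_position(words, lexicon):
    """Keep, for each known word, only the disjuncts whose connector
    counts fit the word's position in the sentence; unknown words get
    None, to be inferred later.  Results are keyed by word position."""
    n = len(words)
    out = {}
    for i, w in enumerate(words):
        ds = lexicon.get(w)
        if ds is None:
            out[i] = None                     # unknown word
        else:
            out[i] = [(l, r) for (l, r) in ds
                      if len(l) <= i and len(r) <= n - 1 - i]
    return out

cands = prune_by_position(["the", "snipe", "eats", "meat"], LEXICON)
# 'meat' is the last word, so its two disjuncts with right-hand
# connectors can never be satisfied and are removed.
```

This coarse filter reproduces only the first elimination above; the ordering-based eliminations that follow require reasoning about which words can supply each connector.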

The disjunct ((A,Ds,Os)()) for 'meat' can also be eliminated: if the A connector is to be satisfied, it would have to be satisfied by 'snipe'. The ordering meta-rule then implies that the Ds connector of 'meat' would have to be satisfied by 'the', but then the remaining Os connector of 'meat' would not be satisfiable, since there are no words preceding 'the'. That leaves the disjuncts ((Ds,Os)()) and ((Os)()) as the remaining possibilities for 'meat'. The disjunct ((Ds,Os)()) can be eliminated since the only words that can satisfy the Ds connector are 'the' or 'snipe', and again the ordering meta-rule makes it impossible to satisfy the Os connector. Thus the only remaining candidate disjunct for 'meat' is ((Os)()).

The next word considered is 'eats'. There are two possible disjuncts and neither can be immediately eliminated. The left hand side of each disjunct consists of an Ss connector. This could only be satisfied by 'snipe', which therefore must have an Ss connector in its right hand side. Recall that the left hand side of 'meat' consists of an Os connector. This could be satisfied either by the ((Ss)(O)) disjunct for 'eats' or if the right hand side of 'snipe' consists of (Os,Ss). The left hand side of 'snipe' need only consist of a D connector in order to satisfy the right hand side of 'the'. Thus the disjunct for 'snipe' must be either ((D)(Ss)) or ((D)(Os,Ss)), and we have not eliminated any of the candidate disjuncts for 'eats'. Unfortunately the meta-rules do not allow for the further elimination of candidate disjuncts.

In cases such as this the lexicon is used as a knowledge source to resolve the issue. The disjunct ((D)(Ss)) is selected for the word 'snipe' since it appears in the lexicon and is normally associated with simple nouns. Thus the disjunct ((Ss)(O)) is the only possibility for 'eats'.

The disjunct ((D)(Os,Ss)) does not appear in the lexicon and in fact implies that the word it is associated with is both a noun and a verb. To eliminate such nonsensical combinations of connectors, the lexicon of known words is consulted to see if a theorized disjunct has been used with a known word, and if so it is accepted. The intuition is that even though a word is unknown, it is likely to belong to the same syntactic category as that of some known words. This follows from the assumption that the set of syntactic categories is closed and will not be added to by the lexical acquisition system. For efficiency, these constraints can be used to avoid the generation of nonsensical disjuncts in the first place.

To summarize, the following assignment of disjuncts satisfies the meta-rules and leads to the linkage shown below.

the:    (() (D))
snipe:  ((D) (Ss))
eats:   ((Ss) (O))
meat:   ((Os) ())

   +--Ds--+--Ss--+--Os--+
   |      |      |      |
  the  snipe   eats   meat

4 Semantic Acquisition

Acquisition of lexical semantics is defined in [Berwick, 1983; Granger, 1977; Hastings, 1994; Russell, 1993] as mapping unknown words to known concepts. [Hastings, 1994; Russell, 1993] assume that the knowledge base is a concept hierarchy structured as a tree, where children are more specific concepts than their parents; there are separate hierarchies for nouns and verbs. Rather than using concept hierarchies, [Berwick, 1983; Granger, 1977] used scripts and causal networks to represent a sequence of related events; in their work, lexical acquisition consists of mapping an unknown word into a known sequence of events. We adopt the convention of [Hastings, 1994; Russell, 1993] and attempt to map unknown words into a concept hierarchy.

In order to semantically classify an unknown word, the lexical entries of known words must be augmented with semantic information derived from their actual usage in a variety of contexts. As sentences with no unknown words are parsed, each connector in the syntactic constraints of nouns and verbs is tagged with the noun or verb to which it connects. For instance, given the sentence "The condor eats meat", the nouns and verbs are tagged as follows:

the:     (() (D))
condor:  ((D) (Ss_eats))
eats:    ((Ss_condor) (Os_meat))
meat:    ((Os_eats) ())

When a word occurs in related syntactic contexts, the semantic tags on the syntactic constraints are merged through generalization, using the superclass information contained in the lexicon. Suppose that the following sentences with no unknown words have been processed:

S1: The big cow eats yellow corn.
S2: The condor eats meat.
S3: The car eats gasoline.

The corresponding semantic tags for 'eats' are:

eats:  ((Ss_cow) (Os_corn))
       ((Ss_condor) (Os_meat))
       ((Ss_car) (Os_gasoline))

From sentences S1 and S2 a more general semantic constraint is learned, since 'animal' subsumes 'cow' and 'condor' and 'food' subsumes 'corn' and 'meat'. This knowledge is expressed by:

R1: eats:  ((Ss_animal) (Os_food))
           ((Ss_car) (Os_gasoline))
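The step from the S1 and S2 tags to the first disjunct of R1 relies on two operations over the concept hierarchy: testing whether one concept subsumes another, and finding the most specific concept that subsumes two tags. A minimal sketch follows; the hierarchy here is hypothetical and of our own making, since the paper assumes such a hierarchy exists in the lexicon but does not give one.

```python
# Hypothetical child -> parent concept hierarchy; 'entity' is the root.
PARENT = {"cow": "animal", "condor": "animal",
          "corn": "food", "meat": "food",
          "car": "machine", "gasoline": "fuel",
          "animal": "entity", "food": "entity",
          "machine": "entity", "fuel": "entity"}

def chain(c):
    """Concept c followed by all of its superclasses, up to the root."""
    out = [c]
    while c in PARENT:
        c = PARENT[c]
        out.append(c)
    return out

def subsumes(general, specific):
    """True if 'general' is 'specific' itself or one of its superclasses."""
    return general in chain(specific)

def generalize(c1, c2):
    """Most specific concept subsuming both tags (least common superclass),
    used to merge two usages of a verb into one generalized constraint."""
    up = set(chain(c1))
    for a in chain(c2):
        if a in up:
            return a
    return None
```

Under this hierarchy, merging the S1 and S2 tags gives generalize('cow', 'condor') = 'animal' and generalize('corn', 'meat') = 'food', i.e. the first disjunct of R1, while the S3 usage stays separate because 'gasoline' only generalizes with 'corn' or 'meat' at the uninformative root. The later classification of 'snipe' reduces to the same subsumes test: the usage in R1 whose object tag subsumes the observed object 'meat' is selected.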

The semantic tags applied to the connectors serve as semantic constraints on the words that 'eats' connects to. The first disjunct in the above entry tells us that 'eats' must have a concept that is subsumed by 'animal' to its left and a concept that is subsumed by 'food' to its right.

While the lexicon has no information about an unknown word, it does have the semantic constraints of the known words in the sentence. These are used to infer what the semantic classification of the unknown word should be if the sentence is valid.

No semantic information has been acquired for 'snipe'. If the nouns and verbs in the sentence "The snipe eats meat" are tagged with the nouns and verbs that they connect to, the following is obtained:

the:    (() (D))
snipe:  ((D) (Ss_eats))
eats:   ((Ss_snipe) (Os_meat))
meat:   ((Os_eats) ())

The lexicon has no knowledge of 'snipe', but it does have knowledge of the verb 'eats' that links to 'snipe'. It must be determined which of the two usages of 'eats' described in R1 applies to the usage of 'eats' in "The snipe eats meat". According to the concept hierarchy, 'meat' is subsumed by 'food' whereas 'gasoline' is not. This indicates that the usage ((Ss_animal)(Os_food)) is more appropriate, and that 'snipe' must therefore be tentatively classified as an animal. This classification can be refined as other usages of 'snipe' are encountered.

5 Discussion

There has been extensive work on lexical acquisition. Probabilistic part-of-speech taggers have been successful in identifying the part-of-speech of unknown words [Church, 1988; Weischedel et al., 1993]. These approaches often require large amounts of manually tagged text to use as training data.

Unknown words have been semantically classified using knowledge intensive methods [Berwick, 1983; Granger, 1977; Zernik, 1987]. These assume the availability of scripts or other forms of detailed domain knowledge that must be manually constructed. While they have had considerable success in specific domains, it is difficult to port such systems to new domains without extensive manual customization for the new domain.

Explanation Based Learning has been applied to lexical acquisition [Asker et al., 1992]. A large corpus of text is divided into sentences with unknown words and those without. Those without are parsed and their parse trees form a knowledge base. When a sentence with an unknown word is processed, the system locates the parse tree that most closely resembles it and attempts to infer the properties of unknown words from this tree. This approach assumes that the sentences with known words produce parse trees that will match or cover all of the sentences with unknown words. A limiting factor of this method is the potentially large number of distinct parse trees.

Unification-based grammars have also been brought to bear on the problem of unknown words [Hastings, 1994; Russell, 1993]. These approaches are similar in that properties of unknown words are inferred from the lexicon and phrase structure rules. However, as the underlying parsers work from left to right, it is natural to propagate information from known words to unknown words in the same direction.

The distinctive features of our approach are that all the required knowledge is represented explicitly in the lexicon and that constraint solving is bidirectional. This makes maximal use of the constraints of known words and reduces the search space for determining the properties of unknown words. Link Grammar is not the only way to process grammars bidirectionally; in fact, there is no reason why a more traditional context-free grammar could not be processed bidirectionally [Satta and Stock, 1994].

An implementation is under way to extend the parser of Link Grammar to automatically acquire the syntax and semantics of unknown words. It seems that the disjuncts of each word are a special kind of feature structure. An interesting topic is to integrate feature structures and unification with Link Grammar to allow more expressive handling of semantic information.

References

[Asker et al., 1992] L. Asker, B. Gamback, and C. Samuelsson. EBL2: An approach to automatic lexical acquisition. In Proceedings of the 14th International Conference on Computational Linguistics (COLING-92), pages 1172-1176, Nantes, France, 1992.

[Berwick, 1983] R. Berwick. Learning word meanings from examples. In Proceedings of the 8th International Joint Conference on Artificial Intelligence (IJCAI-83), volume 1, pages 459-461, Karlsruhe, West Germany, August 1983.

[Church, 1988] K. Church. A stochastic parts program and noun phrase parser for unrestricted text. In Proceedings of the Second Conference on Applied Natural Language Processing, pages 136-143, Austin, TX, 1988.

[Granger, 1977] R. Granger. FOUL-UP: A program that figures out meanings of words from context.

In Proceedings of the 5th International Joint Conference on Artificial Intelligence (IJCAI-77), volume 1, pages 172-178, Cambridge, MA, August 1977.

[Hastings, 1994] P. Hastings. Automatic Acquisition of Word Meaning from Context. PhD thesis, The University of Michigan, 1994.

[Hudson, 1984] R. Hudson. Word Grammar. Basil Blackwell, 1984.

[Mel'čuk, 1988] I. A. Mel'čuk. Dependency Syntax: Theory and Practice. State University of New York Press, 1988.

[Oehrle et al., 1988] R. Oehrle, E. Bach, and D. Wheeler, editors. Categorial Grammars and Natural Language Structures. Reidel, Dordrecht, The Netherlands, 1988.

[Russell, 1993] D. Russell. Language Acquisition in a Unification-Based Grammar Processing System Using a Real-World Knowledge Base. PhD thesis, University of Illinois at Urbana-Champaign, 1993.

[Satta and Stock, 1994] G. Satta and O. Stock. Bidirectional context-free grammar parsing for natural language processing. Artificial Intelligence, 69:123-164, 1994.

[Sleator and Temperley, 1991] D. Sleator and D. Temperley. Parsing English with a Link Grammar. Technical Report CMU-CS-91-196, Carnegie Mellon University, October 1991.

[Weischedel et al., 1993] R. Weischedel, M. Meteer, R. Schwartz, L. Ramshaw, and J. Palmucci. Coping with ambiguity and unknown words through probabilistic models. Computational Linguistics, 19(2):359-382, June 1993.

[Zernik, 1987] U. Zernik. Language acquisition: Learning a hierarchy of phrases. In Proceedings of the 10th International Joint Conference on Artificial Intelligence (IJCAI-87), volume 1, pages 125-132, Milan, Italy, August 1987.
