
Parsing English with a Link Grammar

Daniel D. Sleator* and Davy Temperley†

* School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213. email: sleator@cs.cmu.edu
† Music Department, Columbia University, New York, NY 10027. email: dt3@cunixa.cc.columbia.edu

Abstract

We define a new formal grammatical system called a link grammar. A sequence of words is in the language of a link grammar if there is a way to draw links between words in such a way that (1) the local requirements of each word are satisfied, (2) the links do not cross, and (3) the words form a connected graph. We have encoded English grammar into such a system, and written a program (based on new algorithms) for efficiently parsing with a link grammar. The formalism is lexical and makes no explicit use of constituents and categories. The breadth of English phenomena that our system handles is quite large. A number of sophisticated and new techniques were used to allow efficient parsing of this very complex grammar. Our program is written in C, and the entire system may be obtained via anonymous ftp. Several other researchers have begun to use link grammars in their own research.

1 Introduction

Most sentences of most natural languages have the property that if arcs are drawn connecting each pair of words that relate to each other, then the arcs will not cross [10, p. 36]. This well-known phenomenon, which we call planarity, is the basis of link grammars, our new formal language system.

A link grammar consists of a set of words (the terminal symbols of the grammar), each of which has a linking requirement. A sequence of words is a sentence of the language defined by the grammar if there exists a way to draw links among the words so as to satisfy the following conditions:

Planarity: The links do not cross (when drawn above the words).

Connectivity: The links suffice to connect all the words of the sequence together.

Satisfaction: The links satisfy the linking requirements of each word in the sequence.

The linking requirements of each word are contained in a dictionary. To illustrate the linking requirements, the following diagram shows a simple dictionary for the words a, the, cat, snake, Mary, ran, and chased. The linking requirement of each word is represented by the diagram above the word.

[Figure: linking-requirement diagrams for the words a, the, cat, snake, Mary, ran, and chased]

Each of the intricately shaped labeled boxes is a connector. A connector is satisfied by matching it to a compatible connector (one with the appropriate shape, facing in the opposite direction). Exactly one of the connectors attached to a given black dot must be satisfied (the others, if any, must not be used). Thus, cat requires a D connector to its left, and either an O connector to its left or an S connector to its right. Plugging a pair of connectors together corresponds to drawing a link between that pair of words.

The following diagram shows how the linking requirements are satisfied in the sentence The cat chased a snake.

[Figure: connector diagram for "the cat chased a snake"]

(The unused connectors have been suppressed here.) It is easy to see that Mary chased the cat and the cat ran are also sentences of this grammar. The sequence of words the Mary chased cat is not in this language. Any attempt to satisfy the linking requirements leads to a violation of one of the three rules. Here is one attempt:

[Figure: failed attempt to link "the Mary chased cat"]

Similarly ran Mary and cat ran chased are not part of this language.

A set of links that prove that a sequence of words is in the language of a link grammar is called a linkage. From now on we'll use simpler diagrams to illustrate linkages. Here is the simplified form of the diagram showing that the cat chased a snake is part of this language:

[Figure: simplified linkage diagram for "the cat chased a snake"]

We have a succinct, computer-readable notation for expressing the dictionary of linking requirements. The following dictionary encodes the linking requirements of the previous example.

    words        formula
    a the        D+
    snake cat    D- & (O- or S+)
    Mary         O- or S+
    ran          S-
    chased       S- & O+

The linking requirement for each word is expressed as a formula involving the operators & and or, parentheses, and connector names. The + or - suffix on a connector name indicates the direction (relative to the word being defined) in which the matching connector (if any) must lie. The & of two formulas is satisfied by satisfying both the formulas. The or of two formulas requires that exactly one of its formulas be satisfied. The order of the arguments of an & operator is significant. The farther left a connector is in the expression, the nearer the word to which it connects must be. Thus, when using cat as an object, its determiner (to which it is connected with its D- connector) must be closer than the verb (to which it is connected with its O- connector).

We can roughly divide our work on link grammars into three parts: the link grammar formalism and its properties, the construction of a wide-coverage link grammar for English, and efficient algorithms and techniques for parsing link grammars. We now touch briefly on all three of these aspects.

Link grammars are a new and elegant context-free grammatical formalism¹ ², and have a unique combination of useful properties:

¹ Link grammars resemble dependency grammars and categorial grammars. There are also many significant differences. Some light is shed on the relationship in section 6.
² The proof of the context-freeness of link grammars is not included in this paper, but appears in our technical report [14]. Note that context-free systems can differ in many ways, including the ease with which the same grammar can be expressed, the efficiency with which the same grammar can be parsed, and the usefulness of the output of the parser for further processing.

1. In a link grammar each word of the lexicon is given a definition describing how it can be used in a sentence. The grammar is distributed among the words. Such a system is said to be lexical. This has several important advantages. It makes it easier to construct a large grammar, because a change in the definition of a word only affects the grammaticality of sentences involving that word. The grammar can easily be constructed incrementally. Furthermore, expressing the grammar of the irregular verbs of English is easy - there's a separate definition for each word.

Another nice feature of a lexical system is that it allows the construction of useful probabilistic language models. This has led researchers to construct lexical versions of other grammatical systems, such as tree-adjoining grammars [13]. Lafferty and the present authors have also constructed such a probabilistic model for link grammars [11].

2. Unlike a phrase structure grammar, after parsing a sentence with a link grammar, words that are associated semantically and syntactically are directly linked. This makes it easy to enforce agreement, and to gather statistical information about the relationships between words.

3. In English, whether or not a noun needs a determiner is independent of whether it is used as a subject, an object, or as part of a prepositional phrase. The algebraic notation we developed for expressing a link grammar takes advantage of this orthogonality. Any lexical grammatical system, if it is to be used by a human, must have such a capability. In our current on-line dictionary the word cat can be used in 369 different ways, and for the word time this number is 1689. A compact link grammar formula captures this large number of possibilities, and can easily be written and comprehended by a human being.

4. Another interesting property of link grammars is that they have no explicit notion of constituents or categories. In most sentences parsed with our dictionaries, constituents can be seen to emerge as contiguous connected collections of words attached to the rest of the sentence by a particular type of link. For example, in the dictionary above, S links always attach a noun phrase (the connected collection of words at the left end of the link) to a verb (on the right end of the link). O links work in a similar fashion. In these cases the links of a sentence can be viewed as an alternative way of specifying the constituent structure of the sentence. On the other hand, this is not the way we think about link grammars, and we see no advantage in taking that perspective.

Our second result is the construction of a link grammar dictionary for English. The goal we set for ourselves was to make a link grammar that can distinguish, as accurately as possible, syntactically correct English sentences from incorrect ones. We chose a formal or newspaper-style English as our model. The result is a link grammar of roughly eight hundred definitions (formulas) and 25,000 words that captures many phenomena of English grammar. It handles: noun-verb agreement, questions, imperatives, complex and irregular verbs, many types of nouns, past- or present-participles in noun phrases, commas, a variety of adjective types, prepositions, adverbs, relative clauses, possessives, coordinating conjunctions, unbounded dependencies, and many other things.

The third result described in this paper is a program for parsing with link grammars. The program reads in a dictionary (in a form very similar to the tables above) and will parse sentences according to the given grammar. The program does an exhaustive search - it finds every way of parsing the given sequence with the given link grammar. It is based on our own O(n³) algorithm (n is the number of words in the sentence to be parsed). The program also makes use of several very effective data structures and heuristics to speed up parsing. The program is comfortably fast - parsing typical newspaper sentences in a few seconds on a modern workstation.

Both our program (written in ANSI-C) and our dictionary are available via anonymous ftp through the internet.³ Having the program available for experimentation may make it easier to understand this paper.

³ The directory is /usr/sleator/public on the host spade.pc.cs.cmu.edu (128.2.209.226). Our technical reports [14, 11] are also available there.

The organization of this paper

In section 2 we define link grammars more formally and explain the notation and terminology used throughout the rest of the paper. In section 3 we describe the workings of a small link grammar for English. Our O(n³) algorithm is described in section 4, and the data structures and heuristics that make it run fast are described in section 5. In section 6 we explain the relationship between link grammars, dependency grammars, and categorial grammars. We show how to automatically construct a link grammar for a given categorial grammar. This construction allows our efficient parsing algorithms and heuristics to be applied to categorial grammars. Section 7 mentions several other research projects that are based on link grammars.

Space limitations prevent us from presenting details of a number of other aspects of our work. The following paragraphs mention a few of these. More details on all of these matters are contained in our technical report [14].

There are a number of common English phenomena that are not handled by our current system. Our technical report contains a list of these, along with the reason for this state of affairs. The reasons range from the fact that ours is a preliminary system to the fact that some phenomena simply do not fit well into the link grammar framework.

Coordinating conjunctions such as and pose a problem for link grammars. This is because in a sentence like The dog chased and bit Mary there should logically be links between both dog and bit and between chased and Mary. Such links would cross. We have devised a scheme that handles the vast majority of uses of such conjunctions and incorporated it into our program. The existence of such a conjunction in a sentence modifies the grammar of the words in it. The same parsing algorithm is then used on the resulting modified grammar.

Certain other constructs are difficult to handle using only the basic link grammar framework. One example is the non-referential use of it: It is likely that John will go is correct, but The cat is likely that John will go is wrong. It is possible - but awkward - to distinguish between these with a link grammar. To deal with this (and a number of other phenomena), we extended the basic link grammar formalism with a post-processor that begins with a linkage, analyzes its structure, and determines if certain conditions are satisfied. This allows the system to correctly judge a number of subtle distinctions (including that mentioned here).

2 Notation and terminology

2.1 Meta-rules

The link grammar dictionary consists of a collection of entries, each of which defines the linking requirements of one or more words. These requirements are specified by means of a formula of connectors combined by the binary associative operators & and or. Precedence is specified by means of parentheses. Without loss of generality we may assume that a connector is simply a character string ending in + or -.

When a link connects to a word, it is associated with one of the connectors of the formula of that word, and it is said to satisfy that connector. No two links may satisfy the same connector. The connectors at opposite ends of a link must have names that match, and the one on the left must end in + and the one on the right must end in -. In basic link grammars, two connectors match if and only if their strings are the same (up to but not including the final + or -). A more general form of matching will be introduced later.
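In code, the basic matching rule is just a string comparison that ignores the final direction mark. The following C sketch illustrates it (the function name and signature are ours, chosen for illustration; they are not taken from the parser):

    #include <string.h>

    /* Basic matching rule: two connectors match when their names are
       identical up to, but not including, the final '+' or '-'.  The
       separate requirement that the left word's connector end in '+'
       and the right word's in '-' is checked elsewhere. */
    int basic_match(const char *a, const char *b)
    {
        size_t la = strlen(a), lb = strlen(b);

        if (la > 0 && (a[la-1] == '+' || a[la-1] == '-')) la--;
        if (lb > 0 && (b[lb-1] == '+' || b[lb-1] == '-')) lb--;

        return la == lb && strncmp(a, b, la) == 0;
    }

    /* basic_match("D+", "D-") == 1,  basic_match("D+", "O-") == 0 */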
The connectors satisfied by the links must serve to satisfy the whole formula. We define the notion of satisfying a formula recursively. To satisfy the & of two formulas, both formulas must be satisfied. To satisfy the or of two formulas, one of the formulas must be satisfied, and no connectors of the other formula may be satisfied. It is sometimes convenient to use the empty formula ("()"), which is satisfied by being connected to no links.

A sequence of words is a sentence of the language defined by the grammar if there exists a way to draw links among the words so as to satisfy each word's formula, and the following meta-rules:

Planarity: The links are drawn above the sentence and do not cross.

Connectivity: The links suffice to connect all the words of the sequence together.

Ordering: When the connectors of a formula are traversed from left to right, the words to which they connect proceed from near to far. In other words, consider a word, and consider two links connecting that word to words to its left. The link connecting the nearer word (the shorter link) must satisfy a connector appearing to the left (in the formula) of that of the other word. Similarly, a link to the right must satisfy a connector to the left (in the formula) of a longer link to the right.

Exclusion: No two links may connect the same pair of words.

2.2 Disjunctive form

The use of formulas to specify a link grammar dictionary is convenient for creating natural language grammars, but it is cumbersome for mathematical analysis of link grammars, and in describing algorithms for parsing link grammars. We therefore introduce a different way of expressing a link grammar called disjunctive form.

In disjunctive form, each word of the grammar has a set of disjuncts associated with it. Each disjunct corresponds to one particular way of satisfying the requirements of a word. A disjunct consists of two ordered lists of connector names: the left list and the right list. The left list contains connectors that connect to the left of the current word (those connectors end in -), and the right list contains connectors that connect to the right of the current word. A disjunct will be denoted:

    ((L1, L2, ..., Lm) (Rn, Rn-1, ..., R1))

where L1, L2, ..., Lm are the connectors that must connect to the left, and R1, R2, ..., Rn are connectors that must connect to the right. The number of connectors in either list may be zero. The trailing + or - may be omitted from the connector names when using disjunctive form, since the direction is implicit in the form of the disjunct.

To satisfy the linking requirements of a word, one of its disjuncts must be satisfied (and no links may attach to any other disjunct). To satisfy a disjunct all of its connectors must be satisfied by appropriate links. The words to which L1, L2, ... are linked are to the left of the current word, and are monotonically increasing in distance from the current word. The words to which R1, R2, ... are linked are to the right of the current word, and are monotonically increasing in distance from the current word.

It is easy to see how to translate a link grammar in disjunctive form to one in standard form. This can be done simply by rewriting each disjunct as

    (L1- & L2- & ... & Lm- & R1+ & R2+ & ... & Rn+)

and combining all the disjuncts together with the or operator to make an appropriate formula.

It is also easy to translate a formula into a set of disjuncts. This is done by enumerating all ways that the formula can be satisfied. For example, the formula:

    (A- or ()) & D- & (B+ or ()) & (O- or S+)

corresponds to the following eight disjuncts:

    ((A,D) (S,B))
    ((A,D,O) (B))
    ((A,D) (S))
    ((A,D,O) ())
    ((D) (S,B))
    ((D,O) (B))
    ((D) (S))
    ((D,O) ())
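To make this translation concrete, the following C sketch enumerates every way of satisfying a formula given as an expression tree; run on the example formula, it prints the same eight disjuncts (in a different order). The types and names are ours, invented for this illustration, and are not the parser's own.

    #include <stdio.h>
    #include <string.h>

    typedef enum { CONN, AND, OR, EMPTY } Kind;
    typedef struct Exp {                 /* a formula node              */
        Kind kind;
        const char *name;                /* connector name, e.g. "A-"   */
        const struct Exp *a, *b;         /* operands of AND / OR        */
    } Exp;

    #define MAXC 16
    typedef struct {                     /* one disjunct                */
        const char *left[MAXC], *right[MAXC];
        int nl, nr;
    } Disjunct;

    /* formulas still to be satisfied, processed left to right */
    typedef struct Work { const Exp *e; const struct Work *next; } Work;

    static void print_name(const char *s)      /* drop the trailing +/- */
    {
        printf("%.*s", (int)(strlen(s) - 1), s);
    }

    static void print_disjunct(const Disjunct *d)
    {
        int i;
        printf("((");
        for (i = 0; i < d->nl; i++) { if (i) printf(","); print_name(d->left[i]); }
        printf(") (");
        for (i = d->nr - 1; i >= 0; i--) {      /* display as (Rn,...,R1) */
            if (i != d->nr - 1) printf(",");
            print_name(d->right[i]);
        }
        printf("))\n");
    }

    static void expand(const Work *w, Disjunct d)
    {
        if (w == NULL) { print_disjunct(&d); return; }
        const Exp *e = w->e;
        switch (e->kind) {
        case EMPTY:                      /* "()" contributes nothing     */
            expand(w->next, d);
            break;
        case CONN: {                     /* append to the proper list    */
            Disjunct d2 = d;
            if (e->name[strlen(e->name) - 1] == '-') d2.left[d2.nl++] = e->name;
            else                                     d2.right[d2.nr++] = e->name;
            expand(w->next, d2);
            break;
        }
        case AND: {                      /* satisfy a, then b            */
            Work wb = { e->b, w->next };
            Work wa = { e->a, &wb };
            expand(&wa, d);
            break;
        }
        case OR:                         /* exactly one alternative      */
            { Work wa = { e->a, w->next }; expand(&wa, d); }
            { Work wb = { e->b, w->next }; expand(&wb, d); }
            break;
        }
    }

    int main(void)
    {
        /* the example formula: (A- or ()) & D- & (B+ or ()) & (O- or S+) */
        Exp A = { CONN, "A-", 0, 0 }, D = { CONN, "D-", 0, 0 };
        Exp B = { CONN, "B+", 0, 0 }, O = { CONN, "O-", 0, 0 }, S = { CONN, "S+", 0, 0 };
        Exp nil  = { EMPTY, 0, 0, 0 };
        Exp optA = { OR, 0, &A, &nil }, optB = { OR, 0, &B, &nil }, os = { OR, 0, &O, &S };
        Exp f3 = { AND, 0, &optB, &os }, f2 = { AND, 0, &D, &f3 }, f1 = { AND, 0, &optA, &f2 };

        Disjunct none = { {0}, {0}, 0, 0 };
        Work top = { &f1, 0 };
        expand(&top, none);
        return 0;
    }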
2.3 Our dictionary language

To streamline the difficult process of writing the dictionary, we have incorporated several other features into the dictionary language. Examples of all of these features can be found in section 3.

It is useful to consider connector matching rules that are more powerful than simply requiring the strings of the connectors to be identical. The most general matching rule is simply a table - part of the link grammar - that specifies all pairs of connectors that match. The resulting link grammar is still context-free.

In the dictionary presented later in this paper, and in our larger on-line dictionary, we use a matching rule that is slightly more sophisticated than simple string matching. We shall now describe this rule.

A connector name begins with one or more upper case letters followed by a sequence of lower case letters or *s. Each lower case letter (or *) is a subscript. To determine if two connectors match, delete the trailing + or -, and append an infinite sequence of *s to both connectors. The connectors match if and only if these two strings match under the proviso that * matches a lower case letter (or *).

For example, S matches both Sp and Ss, but Sp does not match Ss. Similarly, D*u matches Dmu and Dm, but not Dmc. All four of these connectors match Dm.
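This rule is easy to state in code. Here is a C sketch of it (again with our own, purely illustrative function name):

    #include <stdio.h>
    #include <string.h>

    /* Subscript matching: the initial upper case parts must be identical;
       the subscripts that follow are compared position by position, with a
       missing character behaving like '*', and '*' matching any lower case
       letter (or '*').  The trailing '+' or '-' is ignored. */
    static int subscripts_match(const char *a, const char *b)
    {
        size_t la = strlen(a), lb = strlen(b), i = 0;

        if (la && (a[la-1] == '+' || a[la-1] == '-')) la--;
        if (lb && (b[lb-1] == '+' || b[lb-1] == '-')) lb--;

        while (i < la && i < lb &&
               a[i] >= 'A' && a[i] <= 'Z' && b[i] >= 'A' && b[i] <= 'Z') {
            if (a[i] != b[i]) return 0;
            i++;
        }
        if ((i < la && a[i] >= 'A' && a[i] <= 'Z') ||
            (i < lb && b[i] >= 'A' && b[i] <= 'Z'))
            return 0;                 /* upper case parts differ in length */

        for (; i < la || i < lb; i++) {
            char ca = i < la ? a[i] : '*';
            char cb = i < lb ? b[i] : '*';
            if (ca != '*' && cb != '*' && ca != cb) return 0;
        }
        return 1;
    }

    int main(void)
    {
        printf("%d %d %d\n",
               subscripts_match("S+",   "Sp-"),   /* 1: S matches Sp   */
               subscripts_match("Sp+",  "Ss-"),   /* 0: Sp, Ss differ  */
               subscripts_match("D*u+", "Dmc-")); /* 0: u and c differ */
        return 0;
    }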
Notice that this connector is omitted from proper nouns, is optional on plural nouns, and is mandatory on singular nouns. Also notice (os ss zv�� that the subscripts allow the to act as the de­ y y- terminer for both plural and singular nouns, but the dog arrived with a bone a can only work with the singular nouns. Sim­ ilarly, the S connector is subscripted to ensure verb-noun agreement. The ordering of the terms in these expressions (°�� is often important. For example, the fact that on the dog with a bone arrived nouns, the A- occurs to the left of the D- means that the adjective must be closer to the noun than Notice that, aswith A- connectors on nouns, a the determiner. © is used for M- connectors on nouns and EV- con­ Here are some judgements that can be ren­ nectors on verbs, allowing multiple prepositional dered by what we have described so far: phrases, such as John chased a dog in the park with a stick.

3.3 Participles The M- connector on chased allows it to act as a participle phrase modifying a noun, as shown in these examples.

the ugly black dog chased a big cat lf �s-@ [°•hEV1:� l rs� the dog chased in the park arrived dogs died

° ,s� / �� dogs chase cats the dog chased in the park arrivedl The I connector is used for infinitives, as in: *a dog chase a cat

*black the dog died S I *a/*the Mary chased the cat / Y � *a dogs died John must meet Mary *dog died Notice that the I connector is an alternative to the S connector on plural verb forms. Thus we 3. 2 Prepositions take advantage of the fact that plural verb forms are usually the same as infinitive forms, and in­ The J, M and EV connectors allow prepositional clude them both in a single dictionary entry. phrases. The J connector connects a preposition In a similar way, the T connector is used for to its object. Notice that in nouns, the J- is an pastparticiples. Pastparticiples have a T-; forms alternative to the 0-. This means that a noun of the verb have have a T+. The GI connector is cannot both be an object of a verb and of a prepo­ used for present participles. Present participles sition. The M connector is used when a preposi­ have a GI- connector; formsof the verb have a tional phrase modifies a noun and the EV connec­ GI+. The AI connector is used for predicativebe ad­ tor is used when a prepositional phrase modifies a jectives. Adjectives have a AI - connector; forms verb. The following two examples illustrate this: of have a AI+ connector. be 284 DANIEL SLEATOR AND DAVY TEMPERLEY

3.4 Questions

The SI connector is used for questions where there is subject-verb inversion. On nouns SI- is an alternative to S+, and on invertible verbs (is, has, did, must, etc.) SI+ is an alternative to S-. This allows

[Linkage diagram: did John chase the dog]

Wh- questions work in various different ways; only questions involving who will be discussed here. For subject-type questions, where who is substituting for the subject, who simply has an S+ connector. This allows

[Linkage diagram: who chased the dog]

For object-type questions, where who is substituting for the object, the B connector is used. Transitive verbs have B- connectors as an alternative to their O+ connectors. Who has a B+ connector. This allows

[Linkage diagram: who did John chase]

The following incorrect sentences are rejected:

    *Did John chase
    *Who did John chase Mary
    *John did Mary chase
    *Chased John Mary

The following incorrect construction is accepted. In our on-line system, post-processing is used to eliminate this.

    *Who John chased

3.5 Relative Clauses

For subject-type relative clauses, where the antecedent is acting as the subject of the clause, a B connector serves to connect the noun to the verb of the relative clause. Nouns have a B+ connector. Notice that this is optional; it is also &ed with the S+, SI-, O+, and J+ connectors, meaning that one of these connectors must be used whether or not the noun takes a relative clause. Verbs have a B- connector which is orred with their S- connectors; if a verb is in a subject-type relative clause, it may not make an S connection as well.

For subject-type relative clauses, the relative pronoun who is mandatory. For this purpose, verbs have a Z- connector anded with their B- connector. Who has a Z+ connector; therefore it can fulfill this need. However, it also has a C- connector anded with its Z+ connector; this must connect back to the C+ connector on nouns. This allows the following:

[Linkage diagram: the dog who chased John died]

For object-type relative clauses, the same B+ connector on nouns is used. However, this time it connects to the other B- connector on verbs, the one which is orred with the O+ connector and which is also used for object-type wh- questions. In this case, the relative pronoun who is optional. Notice that nouns have optional C+ and CL- connectors which are anded with their S+ connectors. These are used when the noun is the subject of an object-type relative clause. When who is not present, the C+ connector on the antecedent noun connects directly to the C- on the subject of the relative clause:

[Linkage diagram: the dog John chased died]

When who is present, the C+ on the antecedent connects to the C- on who; this forces the CL+ to connect to the CL- on the subject of the clause:

[Linkage diagram: the dog who John chased died]

This system successfully rejects the following incorrect sentences:

    *The dog chased cats died
    *The dog who chase cats died
    *The dog who John chased cats died
    *The dog John chased cats died
    *The dog who chased died

The following incorrect constructions are accepted, but can be weeded out in post-processing:

    *The dog did John chase died
    *The dog who John died Mary chased died

4 The algorithm

Our algorithm for parsing link grammars is based on dynamic programming. Perhaps its closest relative in the standard literature is the dynamic programming algorithm for finding an optimal triangulation of a convex polygon [2, p. 320]. It tries to build up a linkage (which we'll call a solution in this section) in a top down fashion: it will never add a link (to a partial solution) that is above a link already in the partial solution.

The algorithm is most easily explained by specifying a data structure for representing disjuncts. A disjunct d has pointers to two linked lists of connectors. These pointers are denoted left[d] and right[d]. If c is a connector, then next[c] will denote the next connector after c in its list. The next field of the last connector of a list has the value NIL.

For example, suppose the disjunct d = ((D, O) ()) (using the notation of section 2). Then left[d] would point to the connector O, next[left[d]] would point to the connector D, and next[next[left[d]]] would be NIL. Similarly, right[d] = NIL.
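Written out as C declarations, the structure just described might look like this (the field names are ours, chosen to mirror left[d], right[d], and next[c]; the example builds the disjunct ((D, O) ()) from the text):

    #include <stdio.h>

    typedef struct Connector {
        const char       *string;  /* connector name                      */
        struct Connector *next;    /* next[c]; NULL plays the role of NIL */
    } Connector;

    typedef struct Disjunct {
        Connector *left;           /* left[d]  */
        Connector *right;          /* right[d] */
    } Disjunct;

    int main(void)
    {
        /* the disjunct d = ((D, O) ()) from the example above */
        Connector D = { "D", NULL };
        Connector O = { "O", &D };
        Disjunct  d = { &O, NULL };          /* left[d] -> O -> D -> NIL */

        for (Connector *c = d.left; c != NULL; c = c->next)
            printf("%s ", c->string);        /* prints: O D */
        printf("\n");
        return 0;
    }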
To give some intuition of how the algorithm works, consider the situation after a link has been proposed between a connector l' on word L and a connector r' on word R. (The words of the sequence to be parsed are numbered from 0 to N - 1.) For convenience, we define l and r to be next[l'] and next[r'] respectively. The situation is shown in the following diagram:

[Figure: words L and R, one disjunct on each, and the connector lists l and r]

Here the square boxes above the words L and R represent a data structure node corresponding to the word. The rectangular box above each of these represents one of the (possibly many) disjuncts for the word. The small squares pointed to by the disjuncts represent connectors.

How do we go about extending the partial solution into the region strictly between L and R? (This region will be denoted (L, ..., R).) First of all, if there are no words in this region (i.e. R = L + 1) then the partial solution we've built is certainly invalid if either l ≠ NIL or r ≠ NIL. If l = r = NIL then this region is ok, and we may proceed to construct the rest of the solution.

Now suppose that the region between L and R contains at least one word. In order to attach the words of this region to the rest of the sentence there must be at least one link either from L to some word in this region, or from R to some word in this region (since no word in this region can link to a word outside of the [L, ..., R] range, and something must connect these words to the rest of the sentence).

Since the connector l' has already been used in the solution being constructed, this solution must use the rest of the connectors of the disjunct in which l' resides. The same holds for r'. The only connectors of these disjuncts that can be involved in the (L, ..., R) region are those in the lists beginning with l and r. (The use of any other connector on these disjuncts in this region would violate the ordering requirement.) In fact, all of the connectors of these lists must be used in this region in order to have a satisfactory solution.

Suppose, for the moment, that l is not NIL. We know that this connector must link to some disjunct on some word in the region (L, ..., R). (It can't link to R because of the exclusion rule.) The algorithm tries all possible such words and disjuncts. Suppose it finds a word W and a disjunct d on W such that the connector l matches left[d]. We can now add this link to our partial solution. The situation is shown in the following diagram.

[Figure: words L, W, and R after the link from l to left[d] has been added]

How do we determine if this partial solution can be extended to a full solution? We do this by solving two problems similar to the problem we started with. In particular, we ask if the solution can be extended to the word range (L, ..., W) using the connector lists beginning with next[l] and next[left[d]]. We also ask if the solution can be extended to the word range (W, ..., R) using the connector lists beginning with right[d] and r.

Notice that in the latter case, the problem we are solving seems superficially different: the boundary words have not already been connected together by a link. This difference is actually of no consequence, because the pair of links (to W and to R) play the role that a direct link from W to R would play: (1) they separate the region (W, ..., R) from all the other words, and (2) they serve to connect the words W and R together.

We need to consider one other possibility. That is that there might be a solution with a link between words L and W and a link between words W and R. (This results in a solution where the word/link graph is cyclic.) The algorithm handles this possibility by also attempting to form a link between right[d] and r. If these two match, it does a third recursive call, solving a third problem analogous to our original problem. In this problem the word range is (W, ..., R) and the connector lists to be satisfied begin with next[right[d]] and next[r].

A very similar analysis suffices to handle the case when l is NIL.

The algorithm described has an exponential worst-case running time as a function of N, the number of words in the sequence to be parsed. This can easily be transformed into an efficient dynamic programming algorithm by using memoization ([2, p. 312]). The running time is now bounded by the number of different possible recursive calls multiplied by the time used by each call. A recursive call is completely determined by specifying the pointers l and r. (These uniquely determine L and R.) The cost of a given call is bounded by the total number of disjuncts in the sequence of words. If we let d be the number of disjuncts and c be the number of connectors, then the running time is O(c²d). For a fixed link grammar, d = O(N) and c = O(N), and so the running time is O(N³).

Our technical reports describe this algorithm in more detail. They contain pseudo-code for the algorithm [14], an argument for its correctness [14], and an elegant recurrence for the number of linkages of a sentence [11].
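The recursion just described can be restated in code. The following C sketch is our own illustrative rendering, not the paper's pseudo-code: it only reports whether the region can be completed, it assumes the caller supplies the per-word disjunct lists and a match() predicate (such as the matching rules sketched earlier), and it omits the memoization on (l, r) that yields the O(N³) bound.

    typedef struct Connector { const char *string; struct Connector *next; } Connector;
    typedef struct Disjunct  { Connector *left, *right; struct Disjunct *next; } Disjunct;

    extern Disjunct *word_disjuncts(int w);             /* disjuncts of word w */
    extern int       match(Connector *a, Connector *b); /* connector matching  */

    /* Can the region strictly between words L and R be completed, given the
       still-unused connector lists l (on L, pointing right) and r (on R,
       pointing left)? */
    static int region(int L, int R, Connector *l, Connector *r)
    {
        if (R == L + 1)                      /* no words strictly between */
            return l == NULL && r == NULL;

        for (int W = L + 1; W < R; W++) {
            for (Disjunct *d = word_disjuncts(W); d != NULL; d = d->next) {
                if (l != NULL) {
                    /* l must link into this region; try linking it to left[d] */
                    if (d->left == NULL || !match(l, d->left)) continue;
                    if (!region(L, W, l->next, d->left->next)) continue;
                    /* cyclic case: also link right[d] to r */
                    if (r != NULL && d->right != NULL && match(d->right, r) &&
                        region(W, R, d->right->next, r->next))
                        return 1;
                    /* acyclic case: leave right[d] and r to the subregion */
                    if (region(W, R, d->right, r))
                        return 1;
                } else if (r != NULL) {
                    /* when l is NIL, the first link must come from r */
                    if (d->right == NULL || !match(d->right, r)) continue;
                    if (region(L, W, NULL, d->left) &&
                        region(W, R, d->right->next, r->next))
                        return 1;
                }
            }
        }
        return 0;
    }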

After the algorithm above was implemented, we were interested in seeing how well it would work on sentences taken from newspapers and other natural sources. It quickly became clear that something else was needed to make the algorithm run faster on long sentences.

5 Speeding it up

As pointed out in the introduction, in a link grammar dictionary with significant coverage of English grammar the number of disjuncts on many words gets rather large. Thus, the constant in the analysis at the end of the last section is quite large. We devised and implemented several time-saving schemes that run in conjunction with the algorithm of the previous section.

5.1 Pruning

Our first approach is based on the following observation: In any particular sequence of words to be parsed, most of the disjuncts are irrelevant for the simple reason that they contain a connector that does not match any other connector on a word in the sequence. To be more precise, suppose that a word W has a disjunct d with a connector C in its right list. If no word to the right of W has a connector (pointing to the left) that matches C, then the disjunct d cannot be in any linkage. This disjunct can therefore be deleted without changing the set of linkages. Deleting such a disjunct is called a pruning step. Pruning consists of repeating the pruning step until it can no longer be applied.

The set of disjuncts left (after pruning is complete) is independent of the order in which the steps are applied. (The pruning operation has the Church-Rosser property.) We therefore choose an ordering that can be efficiently implemented. It would be ideal if we could achieve a running time for pruning that is linear in the number of connectors. The scheme we propose satisfies no useful a-priori bound on its running time, but in practice it appears to run in linear time.

A series of sequential passes through the words is made, alternately left-to-right and right-to-left. The two types of passes are analogous, so it suffices to describe the left-to-right pass. The pass processes the words sequentially, starting with word 1. Consider the situation after words 1, ..., W - 1 have been processed. A set S of connectors has been computed. This is the set of connectors that exists on the right lists of the disjuncts of words 1, ..., W - 1 that have not been deleted. To process word W, we consider each disjunct d of W in turn. For each connector c on the left list of d, we search the set S to see if it contains a connector that matches c. If one of the connectors of d matches nothing in S, then we apply the pruning step to d (we remove d). Each right connector of each remaining disjunct of W is now incorporated into the set S. This completes the processing of word W.

The function computed by this left-to-right pass is idempotent, which is another way of saying that doing the operation twice in a row will be the same as doing it once. Therefore if (as we alternate left-to-right and right-to-left passes) a pass (after the first one) does nothing, then all further passes will do nothing. This is how the algorithm decides when to stop.

The data structure used for the set S is simply a hash table, where the hash function only uses the initial upper-case letters of the connector name. This ensures that if two connectors get hashed to different locations, then they definitely don't match.

Although we know of no non-trivial bound on the number of passes, we have never seen a case requiring more than five. Table 1 shows a typical example of the reduction in the number of disjuncts achieved by pruning.

    Initial:      3  12  18  391  296   9  10   3  81  391  20  423  104  391
    after L→R:    3   8  11   67  160   6  10   3  81  163   8  381   25  357
    after R→L:    3   6   5   25   21   3   3   3  25   25   3   25    4   12
    after L→R:    3   6   5   25   21   3   3   3  25   21   3   21    3    8
    after R→L:    3   6   5   25   21   3   3   3  25   21   3   21    3    8
    After P.P.:   2   6   1   13    2   2   2   2   2    6   1    4    2    1

Table 1: This table shows the number of disjuncts remaining on each word of the sentence Now this vision is secular, but deteriorating economies will favor Islamic radicalism. (The first number is for the wall, which has not been described in this paper. Of course the comma also counts as a word.) The fourth pass of pruning has no effect, so pruning stops. The last row in the table shows the number of disjuncts that remain after power pruning.
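A single left-to-right pruning pass can be sketched in C as follows. This is a simplified illustration of the pass described above (our own code, not the parser's): the set S is kept as a flat array rather than the hash table keyed on the initial upper-case letters, and match() is the connector-matching predicate.

    #include <stddef.h>

    typedef struct Connector { const char *string; struct Connector *next; } Connector;
    typedef struct Disjunct  { Connector *left, *right; struct Disjunct *next; } Disjunct;

    extern int match(Connector *a, Connector *b);

    #define MAX_SET 4096

    /* One left-to-right pass over words 0..n_words-1.  word[w] is the singly
       linked disjunct list of word w; a disjunct is unlinked (a pruning step)
       if one of its left connectors matches nothing seen so far.  The return
       value lets the caller alternate left-to-right and right-to-left passes
       until a pass deletes nothing. */
    static int prune_pass_lr(Disjunct *word[], int n_words)
    {
        Connector *S[MAX_SET];           /* right connectors of survivors */
        int ns = 0, deleted = 0;

        for (int w = 0; w < n_words; w++) {
            Disjunct **dp = &word[w];
            while (*dp != NULL) {
                Disjunct *d = *dp;
                int ok = 1;
                for (Connector *c = d->left; c != NULL && ok; c = c->next) {
                    int found = 0;
                    for (int i = 0; i < ns && !found; i++)
                        if (match(S[i], c)) found = 1;
                    ok = found;
                }
                if (!ok) { *dp = d->next; deleted++; }    /* pruning step */
                else     { dp = &d->next; }
            }
            /* add the right connectors of w's surviving disjuncts to S */
            for (Disjunct *d = word[w]; d != NULL; d = d->next)
                for (Connector *c = d->right; c != NULL; c = c->next)
                    if (ns < MAX_SET) S[ns++] = c;
        }
        return deleted;
    }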

5.2 The fast-match data structure

The inner loop in the algorithm described in section 4 searches for a word W and a disjunct d of this word whose first left connector matches l, or whose first right connector matches r. If there were a fast way to find all such disjuncts, significant savings might be achieved. The fast-match data structure, which is based on hashing, does precisely this. The speed-up afforded by this technique is roughly the number of different connector types, which is roughly 30 in our current dictionary.

5.3 Power pruning

Power pruning is a refinement of pruning that takes advantage of the ordering requirement of the connectors of a disjunct, the exclusion rule, and other properties of any valid linkage. It also interacts with the fast-match data structure in a beautiful way. Unfortunately, these details are beyond the scope of this paper.⁴ Table 1 shows the outcome of pruning and power pruning on a typical sentence.

⁴ Although they do appear in our technical report [14].

Each of the refinements described in this section significantly reduced the time required to search for a linkage. The operations of pruning, power pruning, and searching for a linkage all take roughly the same amount of time.

6 Dependency and categorial grammars

6.1 Dependency formalisms

There is a large body of work based on the idea that linguistic analysis can be done by drawing links between words. These systems are variously called dependency syntax [10], dependency grammar [5], [3, 4], or word grammar [6, 7].

In dependency grammar, a grammatical sentence is endowed with a dependency structure, which is very similar to a linkage. This structure, as defined by Melcuk [10], consists of a set of planar directed arcs among the words that form a tree. Each word (except the root word) has an arc out to exactly one other word, and no arc may pass over the root word. In a linkage (as opposed to a dependency structure) the links are labeled, undirected, and may form cycles, and there is no notion of a root word.

Gaifman [5] was the first to actually give a formal method of expressing a dependency grammar. He shows that his model is context-free. Melcuk's definition of a dependency structure, and Gaifman's proof that dependency grammar is context free, imply that there is a very close relationship between these systems and link grammars. This is the case.

It is easy to take a dependency grammar in Gaifman's notation and generate a link grammar that accepts the same language. In this correspondence, the linkage that results from parsing a sentence is the same as the corresponding dependency structure. This means that our algorithm for link parsing can easily be applied to dependency grammars. The number of disjuncts in the resulting link grammar is at most quadratic in the number of rules in the dependency grammar. None of the algorithms that have been described for dependency parsing [3, 15, 7] seem to bear any resemblance to ours. It is therefore plausible to conjecture that our algorithms and techniques could be very useful for directly parsing dependency grammars.

Gaifman's result shows that it is possible to represent a link grammar as a dependency grammar (they're both context-free). But this correspondence is of little use if the parsed structures that result are totally different.

One problem with constructing a dependency grammar that is in direct correspondence with a given link grammar is that a linkage in a link grammar may have cycles, whereas cycles are not allowed in dependency grammar. If we restrict ourselves to acyclic linkages, we run into another problem. This is that there is an exponential blow-up in the number of rules required to express the same grammar. This is because each disjunct of each word in the link grammar requires a separate rule in the dependency grammar.

Gaifman's model is not lexical. The method classifies the words into categories. One word can belong to many categories. Roughly speaking, for each disjunct that occurs in the dictionary, there is a category of all words that have that disjunct. The notation is therefore in a sense orthogonal to the link grammar notation.

We are not aware of any notation for dependency systems that is lexical, or that is as terse and well suited for a natural language grammar as link grammars. There has been work on creating dependency grammars for English [7, 3], but we are not aware of an implementation of a dependency grammar for any natural language that is nearly as sophisticated as ours.

6.2 Categorial grammars

Another grammatical system, known as a categorial grammar [1], bears some resemblance to link grammars. Below we show how to express any categorial grammar concisely as a link grammar. It appears to be more difficult to express a link grammar as a categorial grammar.

Just as in a link grammar, each word of a categorial grammar is associated with one or more symbolic expressions. An expression is either an atomic symbol or a pair of expressions combined with one of two types of binary operators: / and \.

A sentence is in the language defined by the categorial grammar if, after choosing one expression associated with each word, there is a derivation which transforms the chosen sequence of expressions into S, a single expression consisting of a special atomic symbol. The derivation proceeds by combining two neighboring expressions into one using one of the following rules:

    e   e\f          f/e   e
    --------         --------
        f                f

Here e and f are arbitrary expressions, and e\f and f/e are other expressions built using e and f. In both cases the two expressions being combined (the ones shown above the line) must be adjacent in the current sequence of expressions. Each combinational operation produces one expression (the one below the line), and reduces the number of expressions by one. After n - 1 operations have been applied, a sentence of length n has been reduced to one expression.

For example, consider the following categorial grammar [9]:

    Harry:         NP, S/(S\NP)
    likes:         (S\NP)/NP
    peanuts:       NP
    passionately:  (S\NP)\(S\NP)

Here is the derivation of Harry likes peanuts passionately:

    Harry       likes       peanuts   passionately
    S/(S\NP)    (S\NP)/NP   NP        (S\NP)\(S\NP)
    S/(S\NP)    S\NP                  (S\NP)\(S\NP)
    S/(S\NP)    S\NP
    S

The set of languages that can be represented by categorial grammars (as they are described here) is the set of context-free languages [1].⁵ This fact alone sheds no light on the way in which the formalism represents a language. To get a better understanding of the connection between categorial grammars and link grammars, the following paragraphs explain a way to construct a link grammar for a given categorial grammar. The reverse (constructing a categorial grammar from a given link grammar) seems to be more difficult, and we do not know of an elegant way to do this.

⁵ There are other variants of categorial grammars which are mildly context-sensitive [9]. Of course the construction presented here does not work for those languages.

To simplify the construction, we'll use a modified definition of a link grammar called a special link grammar. This differs from an ordinary link grammar in two ways: the links are not allowed to form cycles, and there is a special word at the beginning of each sentence called the wall. The wall will not be viewed as being part of any sentence.

Let d be a categorial grammar expression. We'll show how to build an equivalent link grammar expression E(d). If a word w has the set {d1, d2, ..., dk} of categorial expressions, then we'll give that word the following link grammar expression:

    E(d1) or E(d2) or ... or E(dk)

The function E(·) is defined recursively as follows:

    E(f/e) = f/e-  or  f/e+  or  (e+ & E(f))
    E(e\f) = e\f-  or  e\f+  or  (e- & E(f))
    E(A)   = A-  or  A+

Here A stands for any atomic symbol from the categorial grammar, and A, f/e and e\f are connector names in the link grammar formulas. The wall has the formula S+.

Here is the link grammar corresponding to the categorial grammar above:

    WALL: S+;
    peanuts: NP+ or NP-;
    Harry: (NP- or NP+)
        or (S/[S\NP]- or S/[S\NP]+ or (S\NP+ & (S+ or S-)));
    likes: [S\NP]/NP- or [S\NP]/NP+
        or (NP+ & (S\NP- or S\NP+
        or (S- & (NP- or NP+))));
    passionately: [S\NP]\[S\NP]- or [S\NP]\[S\NP]+
        or (S\NP- & (S\NP- or S\NP+
        or (S- & (NP- or NP+))));

(Here we have replaced parentheses in the categorial grammar expressions with brackets when using them inside of a link grammar expression.) This link grammar gives the following analysis of the sentence shown above:

[Linkage diagram: WALL Harry likes peanuts passionately]

Notice that in this construction, both the size of the link grammar formula, and the number of disjuncts it represents, are linear in the size of the original categorial grammar expressions. This suggests that a very efficient way to parse a categorial grammar would be to transform it to a link grammar, then apply the algorithms and heuristics described in this paper.

7 Remarks

Link grammars have become the basis for several other research projects. John Lafferty [11] proposes to build and automatically tune a probabilistic language model based on link grammars. The proposed model gracefully encompasses trigrams and grammatical constraints in one framework. Andrew Hunt⁶ has developed a new model of the relationship of prosody and syntax based on link grammars. He has implemented the model, and in preliminary tests, the results are much better than with other models. Tom Brehony⁶ has modified our parser to detect the kinds of errors that Francophones make when they write in English.

⁶ Personal communication.

References

[1] Y. Bar-Hillel, Language and Information: Selected Essays on their Theory and Application, Addison-Wesley, 1964.

[2] Cormen, T. H., C. E. Leiserson, and R. L. Rivest, Introduction to Algorithms, MIT Press and McGraw-Hill, 1990.

[3] Fraser, N., "Parsing and dependency grammar," In: Carston, Robyn (ed.), UCL Working Papers in Linguistics 1, University College London, London, 1989, Pages 296-319.

[4] Fraser, N., "Prolegomena to a formal theory of dependency grammar," In: UCL Working Papers in Linguistics 2, University College London, London, 1990, Pages 298-319.

[5] Gaifman, H., "Dependency systems and phrase-structure systems," Information and Control 8, 1965, Pages 304-337.

[6] Hudson, R., Word Grammar, Basil Blackwell, 1984.

[7] Hudson, R., "Towards a computer testable word grammar of English," In: Carston, Robyn (ed.), UCL Working Papers in Linguistics 1, University College London, London, 1989, Pages 321-339.

[8] Hudson, R., English Word Grammar, Basil Blackwell, 1990.

[9] Joshi, A. K., "Natural Language Processing," Science, Volume 253, No. 5025 (Sept. 13, 1991), Pages 1242-1249.

[10] Melcuk, I. A., Dependency Syntax: Theory and Practice, State University of New York Press, 1988.

[11] Lafferty, J., D. Sleator, and D. Temperley, "Grammatical Trigrams: A Probabilistic Model of Link Grammar," Proc. of the 1992 AAAI Fall Symp. on Probabilistic Approaches to Natural Language, and Technical report CMU-CS-92-181, School of Computer Science, Carnegie Mellon University, Sept 1992.

[12] Oehrle, R. T., E. Bach, and D. Wheeler, Editors, Categorial Grammars and Natural Language Structures, D. Reidel Publishing Company, 1988.

[13] Y. Schabes, "Stochastic lexicalized tree-adjoining grammars," In Proceedings of COLING-92, Nantes, France, July 1992.

[14] Sleator, D. D., and D. Temperley, "Parsing English with a Link Grammar," Technical report CMU-CS-91-196, Carnegie Mellon University, School of Computer Science, October 1991.

[15] van Zuijlen, J. M., "Probabilistic methods in dependency grammar parsing," Proceedings of the International Workshop on Parsing Technologies, Carnegie Mellon University, 1989, Pages 142-250.