
An Earley-type recognizer for dependency grammar

Vincenzo Lombardo and Leonardo Lesmo
Dipartimento di Informatica and Centro di Scienza Cognitiva
Università di Torino
c.so Svizzera 185, 10149 Torino, Italy
e-mail: {vincenzo, lesmo}@di.unito.it

Abstract

The paper is a first attempt to fill a gap in the dependency literature, by providing a mathematical result on the complexity of recognition with a dependency grammar. The paper describes an improved Earley-type recognizer with a complexity O(|G|²n³). The improvement is due to a precompilation of the dependency rules into parse tables, that determine the conditions of applicability of two primary actions, predict and scan, used in recognition.

1 Introduction

Dependency and constituency frameworks define different syntactic structures. Dependency syntax describes the structure of a sentence in terms of binary head-modifier (also called dependency) relations on the words of the sentence. A dependency relation is an asymmetric relation between a word called head (governor, parent), and a word called modifier (dependent, daughter). A word in the sentence can play the role of the head in several dependency relations, i.e. it can have several modifiers; but each word can play the role of the modifier exactly once. One special word does not play the role of the modifier in any relation, and it is named the root. The set of the dependency relations that can be defined on a sentence form a tree, called the dependency tree (fig. 1a).

Figure 1. A dependency tree (a) and a p.s. tree (b) for the sentence "The chef cooked a fish". The leftward or rightward orientation of the arrows in the dependency tree represents the order constraints: the modifiers that precede the head stand on its left, the modifiers that follow the head stand on its right.

Although born in the same years, dependency syntax (Tesniere 1959) and constituency, or phrase structure, syntax (Chomsky 1956) (see fig. 1b), have had different impacts. The mainstream of formalisms consists almost exclusively of constituency approaches, but some of the original insights of the dependency tradition have found a role in the constituency formalisms: in particular, the concept of head of a phrase and the use of grammatical relations. The identification of the head within a phrase has been a major point of all the recent frameworks: the X-bar theory (Jackendoff 1977) defines phrases as projections of (pre)terminal symbols, i.e. word categories; in GPSG (Gazdar et al. 1985) and HPSG (Pollard, Sag 1987), each phrase structure rule identifies a head and a related subcategorization within its right-hand side; in HG (Pollard 1984) the head is involved in the so-called head-wrapping operations, which allow the formalism to go beyond the context-free power (Joshi et al. 1991).

Grammatical relations are the primitive entities of relational grammar (Perlmutter 1983) (classified as a dependency-based theory in (Mel'cuk 1988)): subject, object, xcomplement, ... label the dependency relations when the head is a verb. Grammatical relations gained much popularity within the unification formalisms in the early 1980's. FUG (Kay 1979) and LFG (Kaplan, Bresnan 1982) exhibit mechanisms for producing a relational (or functional) structure of the sentence, based on the merging of feature representations.

All the recent constituency formalisms acknowledge the importance of the lexicon, and reduce the amount of information brought by the phrasal categories. The "lexicalization" of context-free grammars (Schabes, Waters 1993) points out many similarities between the two paradigms (Rambow, Joshi 1992). Dependency syntax is an extremely lexicalized framework, because the phrase structure component is totally absent. Like the other lexicalized frameworks, the dependency approach does not produce spurious grammars, and this facility is of a practical interest, especially in writing realistic grammars. For instance, there are no heavily ambiguous, infinitely ambiguous or cyclic dependency grammars (such as S → SS; S → a; S → ε; see (Tomita 1985), pp. 72-73).

Dependency syntax is attractive because of the immediate mapping of dependency structures on the predicate-arguments structure (accessible by the semantic interpreter), and because of the treatment of free-word-order constructs (Sgall et al. 1986) (Mel'cuk 1988) (Hudson 1990). A number of parsers have been developed for some dependency frameworks (Fraser 1989) (Covington 1990) (Kwon, Yoon 1991) (Sleator, Temperley 1993) (Hahn et al. 1994) (Lai, Huang 1995): however, no result on algorithmic efficiency has been published as far as we know. The theoretical worst-case analysis of O(n³) descends from the (weak) equivalence between projective dependency grammars (a restricted class of dependency grammars) and context-free grammars (Gaifman 1965), and not from an actual parsing algorithm.

2 The dependency formalism

A dependency grammar is a quintuple <W, C, S, L, T>, where
W is a finite set of symbols (vocabulary of words of a natural language),
C is a set of syntactic categories (preterminals, in constituency terms),
S is a non-empty set of root categories (S ⊆ C),
L is a set of category assignment rules of the form X: x, where X ∈ C, x ∈ W, and
T is a set of dependency rules of the form X(Y1 Y2 ... Yi-1 # Yi+1 ... Ym), where X ∈ C, Y1 ∈ C, ..., Ym ∈ C, and # is a special symbol that does not belong to C (see fig. 2).
The modifier symbols Yj can take the form Yj*: as usual, this means that an indefinite number of Yj's (zero or more) may appear in an application of the rule¹. In the sample grammar below, this extension allows for several prepositional modifiers under a single verbal or nominal head without introducing intermediate symbols; the predicate-arguments structure is immediately represented by a one-level (flat) dependency structure.

Let x = a1 a2 ... ap ∈ W* be a sentence. A dependency tree of x is a tree such that: 1) the nodes are the symbols ai ∈ W (1 ≤ i ≤ p); 2) the dependency relations among the nodes satisfy the dependency rules in T; 3) the tree satisfies the condition of projectivity with respect to the order of the words in x (fig. 3).

Figure 3. The condition of projectivity.
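The projectivity condition lends itself to a direct procedural check. The following sketch is ours, not part of the paper; the encoding of a dependency tree as an array of head indices is our own assumption. It tests that every word lying between a modifier and its governor is (transitively) dominated by that governor:

```python
def is_projective(heads):
    """heads[i] is the index of the head of word i, or None for the root.

    Projectivity: every word lying between a modifier and its governor
    must be (transitively) dominated by that governor.
    """
    def dominated_by(word, governor):
        # Walk the chain of governors upward from `word` looking for `governor`.
        while word is not None:
            if word == governor:
                return True
            word = heads[word]
        return False

    for m, h in enumerate(heads):
        if h is None:
            continue                      # the root has no governor
        lo, hi = min(m, h), max(m, h)
        if not all(dominated_by(k, h) for k in range(lo + 1, hi)):
            return False
    return True

# "the chef cooked a fish": the->chef, chef->cooked, a->fish, fish->cooked
assert is_projective([1, 2, None, 4, 2])
```

A tree with crossing arcs, e.g. heads = [2, 3, None, 2] (word 0 depends on word 2 while word 1 depends on word 3), fails the check because word 2 lies between word 1 and its governor without being dominated by it.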

Figure 2 - A dependency rule: X is the governor, and Y1, ..., Ym are the dependents of X in the given order (X is in # position).

¹ The use of the Kleene star is a notational change with respect to Gaifman: however, it is not uncommon to allow the symbols on the right hand side of a rule to be regular expressions in order to augment the perspicuity of the syntactic representation, but not the expressive power of the grammar (a similar extension appears in the context-free part of the LFG formalism (Kaplan, Bresnan 1982)).

The condition of projectivity states that a dependent is never separated from its governor by anything other than another dependent (together with its subtree), or by a dependent of its own.

As an example, consider the grammar
G1 = <{I, saw, a, tall, old, man, in, the, park, with, telescope}, {V, N, P, A, D}, {V}, {N: I, V: saw, D: a, A: tall, A: old, N: man, P: in, D: the, N: park, P: with, N: telescope}, T1>,
where T1 is the following set of dependency rules:
1. V(N # P*);
2. V(N # N P*);
3. N(A* # P*);
4. N(D A* # P*);
5. P(# N);
6. A(#);
7. D(#).
For instance, the two rules for the root category V(erb) specify that a verb (V) can dominate one or two nouns and some prepositions (*).

3 Recognition with a dependency grammar

The recognizer is an improved Earley-type algorithm, where the predictive component has been compiled into a set of parse tables. We use two primary actions: predict, that corresponds to the top-down guessing of a category, and scan, that corresponds to the scanning of the current input word. In subsection 3.1 we describe the data structures and the algorithms for translating the dependency rules into the parse tables: the dependency rules for a category are first translated into a transition graph, and then the transition graph is mapped onto a parse table. In subsection 3.2 we present the Earley-type recognizer, that equals the most efficient recognizers for context-free grammar.

3.1 Transition graphs and parse tables

A transition graph is a pair (V, E), where V is a set of vertices called states, and E is a set of directed edges labelled with a category or the symbol #. Given a grammar G = <W, C, S, L, T>, a state of the transition graph for a category Cat ∈ C is a set of dotted strings of the form ".β", where Cat(αβ) ∈ T and α, β ∈ (C ∪ {#})*; an edge is a triple <si, sj, Y>, where si, sj ∈ V and Y ∈ C ∪ {#}. A state that contains the dotted string "." is called final; a final state signals that the recognition of one or more dependency rules has been completed. The following algorithm constructs the transition graph for the category Cat:

function graph (Cat, G):
  initialization
    s0 := ∅;
    for each rule in G of the form Cat(α) do
      s0 := s0 ∪ star(.α)
    endfor;
    V := {s0};
    E := ∅;
  expansion
    repeat
      take a non-marked state s from V;
      mark s;
      for each category Y ∈ C ∪ {#} do
        s' := ∅;
        for each dotted string r = .Yβ in s do
          if Y is starred then s' := s' ∪ star(.Yβ) else s' := s' ∪ {.β}
        endfor;
        V := V ∪ {s'};
        E := E ∪ {<s, s', Y>}
      endfor
    until all states in V are marked;
    graph := <V, E>.

star (dotted-string):
  set-of-strings := {dotted-string};
  repeat
    take a non-marked dotted string ds from set-of-strings;
    mark ds;
    if ds has the form ".Yβ" and Y is starred then
      set-of-strings := set-of-strings ∪ {".β"}
  until all dotted strings in set-of-strings are marked;
  star := set-of-strings.

The initial set of states consists of a single state s0, that contains all the possible strings ".α" such that Cat(α) is a dependency rule. Each string is prefixed with a dot. The marked states are the states that were expanded in a previous step. The expansion of a state s takes into account each symbol Y that immediately follows a dot (Y ∈ C ∪ {#}). Y is a possible continuation to a new state s', that contains the dotted string ".β", where ".Yβ" is a dotted string in s. s' is added to the set of states, and a new edge from s to s' labelled with Y is added to the set of edges. A dotted string of the form .Y*β is treated as a pair of dotted strings {.Y*β, .β}, so as to allow a number of iterations (one or more Y's follow) or no iteration (the first symbol in β follows) in the next step. The function "star" takes into account these cases; the repeat loop accounts for the case when the first symbol of β is starred too. The transition graphs obtained for the five categories of G1 are in fig. 4. Conventionally, we indicate the non-final states as h and the final states as $k, where h and k are integers.

The total number of states of all the transition graphs for a grammar G is at most O(|G|), where |G| is the sum of the lengths of the dependency rules. The length of a dependency rule Cat(α) is the length of α. Starting from the transition graph for a category Cat, we can build the parse table for Cat, i.e. PT_Cat. PT_Cat is an array h × k, where h is the number of states of the transition graph and k is the number of syntactic categories in C. Each row is identified by a pair <State, Final>, where State is the label of a state of the corresponding transition graph; each column is associated with a syntactic category. In order to improve the top-down algorithm we introduce the concept of "first" of a category. The first of the category Cat is the set of categories that appear as leftmost node of a subtree headed by Cat. The first of a category X is computed by a simple procedure that we omit here. The function parse_table computes the parse tables of the various categories. E(t-graph_Cat) returns the set of the edges of the graph t-graph_Cat. The contents of the entries in the parse tables are sets (possibly empty) of predict and scan actions. The initialization step consists in setting all entries of the table to the empty set.
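The transition-graph construction of subsection 3.1 can be sketched directly in Python (a sketch of ours, not the paper's Common Lisp implementation; the tuple encoding of dotted strings and the `*` suffix convention are our own choices). A state is a frozenset of dotted suffixes, and it is final when it contains the empty suffix:

```python
from itertools import chain

def starred(sym):
    return sym.endswith('*')

def star(suffix):
    """Close a dotted suffix under skipping of leading starred symbols."""
    out = {suffix}
    while suffix and starred(suffix[0]):
        suffix = suffix[1:]
        out.add(suffix)
    return out

def build_graph(cat, rules):
    """Transition graph for category `cat`.

    `rules[cat]` is a list of right-hand sides, e.g. ('N', '#', 'P*').
    Returns (initial state, set of states, edges {(state, label): state}).
    """
    s0 = frozenset(chain.from_iterable(star(tuple(r)) for r in rules[cat]))
    states, edges, work = {s0}, {}, [s0]
    while work:
        s = work.pop()
        for y in {ds[0].rstrip('*') for ds in s if ds}:  # symbols after the dot
            nxt = set()
            for ds in s:
                if ds and ds[0].rstrip('*') == y:
                    if starred(ds[0]):
                        nxt |= star(ds)        # iterate: more y's may follow
                    nxt |= star(ds[1:])        # move the dot past y
            nxt = frozenset(nxt)
            edges[(s, y)] = nxt
            if nxt not in states:
                states.add(nxt)
                work.append(nxt)
    return s0, states, edges

RULES = {'V': [('N', '#', 'P*'), ('N', '#', 'N', 'P*')],
         'N': [('A*', '#', 'P*'), ('D', 'A*', '#', 'P*')],
         'P': [('#', 'N')], 'A': [('#',)], 'D': [('#',)]}

s0, states, edges = build_graph('V', RULES)
# For category V of G1 this yields four states (0, 1, $2, $3 in fig. 4),
# two of which are final (they contain the empty dotted string).
```

The self-loops on P in the final states of V implement the P* iteration; the star closure applied at state construction time is what turns .P*β into the pair {.P*β, .β}.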

Fig. 4 - The transition graphs obtained for the grammar G1 (one panel per category: V, N, A, P, D).

Figure 5 - The parse tables for the grammar G1.

parse-table (Cat, t-graph_Cat):
  initialize PT_Cat;
  for each edge <s, s', Y> in E(t-graph_Cat) do
    if Y ∈ C then
      for each category Z ∈ first(Y) do
        PT_Cat(s, Z) := PT_Cat(s, Z) ∪ {<predict(Y), s'>}
      endfor
    elseif Y = # then
      PT_Cat(s, Cat) := PT_Cat(s, Cat) ∪ {<scan, s'>}
    endif
  endfor;
  parse-table := PT_Cat.

The parse tables for the grammar G1 are reported in fig. 5. The firsts are: first(V) = first(N) = {N, A, D}; first(P) = {P}; first(A) = {A}; first(D) = {D}. Note that an entry of a table can contain more than one action, although this does not happen for our simple grammar G1.

3.2 A dependency recognizer

The dependency recognizer exhibits the same data structures of Earley's recognizer (Earley 1970), but takes advantage of the precompilation of the predictive component into the parse tables. In order to recognize a sentence of n words, n+1 sets Si of items are built. An item is a quadruple <Category, State, Position, Depcat>, where the first two elements (Category and State) correspond to a row of the parse table PT_Category, the third element (Position) gives the index i of the set Si where the recognition of a substructure began, and the fourth one (Depcat) is used to request the completion of a substructure headed by Depcat, before continuing in the recognition of the larger structure headed by Category (Depcat = "_" means that the item is not waiting for any completion).
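Stepping back to subsection 3.1, the parse-table construction can be sketched as follows (our Python sketch; the edge-list encoding and the action tuples are our own representation). Each edge labelled with a category Y yields predict actions under every category in first(Y); each #-edge yields a scan action under the head's own category:

```python
def parse_table(cat, edges, first):
    """Build PT_cat from the edges (state, label, next_state) of its
    transition graph; entries are sets of predict/scan actions."""
    pt = {}
    for s, y, s2 in edges:
        if y == '#':                       # head position: scan the word itself
            pt.setdefault((s, cat), set()).add(('scan', s2))
        else:                              # dependent category: predict it
            for z in first[y]:
                pt.setdefault((s, z), set()).add(('predict', y, s2))
    return pt

# The "firsts" for G1, as given in the text.
FIRST = {'V': {'N', 'A', 'D'}, 'N': {'N', 'A', 'D'},
         'P': {'P'}, 'A': {'A'}, 'D': {'D'}}

# Transition graph of category P (rule P(# N)): state 0 --#--> 1 --N--> $2.
edges_P = [(0, '#', 1), (1, 'N', '$2')]
pt_P = parse_table('P', edges_P, FIRST)
```

Here row 0 of PT_P holds the scan action into state 1 under the column P, while row 1 holds a predict of N under each of the input categories N, A and D; all other entries stay empty.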

Sentence: w0 w1 ... wn-1.
initialization
  for each root category Q ∈ S do
    INSERT <Q, 0, 0, _> into S0
  endfor
body
  for i from 0 to n do
    for each item P = <Cat, State, j, _> in Si do
      completer:
        if final(State) then
          for each item <Cat', State', k, Cat> in Sj do
            INSERT <Cat', State', k, _> into Si
          endfor
      predictor:
        if <predict(Cat'), State'> ∈ PT_Cat(State × Inputcat) then
          INSERT <Cat', 0, i, _> into Si;
          INSERT <Cat, State', j, Cat'> into Si
        endif
      scanner:
        if <scan, State'> ∈ PT_Cat(State × Inputcat) then
          INSERT <Cat, State', j, _> into Si+1
        endif
    endfor
  endfor
termination
  if <Q, $h, 0, _> is in Sn, for some root category Q and final state $h, then accept else reject

The external loop of the algorithm cycles on the sets Si; the inner loop cycles on the items of the form <Cat, State, j, _> in Si. At each step of the inner loop, the action(s) given by the entry "State × Inputcat" in the parse table PT_Cat is (are) executed (where Inputcat is one of the categories of the current word). Like in Earley's parser there are three phases: completer, predictor and scanner.

completer: When an item is in a final state (of the form $h), the algorithm looks for the items which represent the beginning of the input portion just analyzed: they are the four-element items contained in the set referred by j. These items are inserted into Si after having set to "null" the fourth element (_).

predictor: "<predict(Cat'), State'>" corresponds to a prediction of the category Cat' as a modifier of the category Cat and to the transition to State', in case a substructure headed by Cat' is actually found. This is modeled by introducing two new items in the set: a) <Cat', 0, i, _>, which represents the initial state of the transition graph of the category Cat', which will span a portion of the input starting at i. In Earley's terms, this item corresponds to all the dotted rules of the form Cat'(.α). b) <Cat, State', j, Cat'>, which represents the arc of the transition graph of the category Cat, entering the state State' and labelled Cat'. In Earley's terms, this item corresponds to a dotted rule of the form Cat(α . Cat' β). The items including a non-null Depcat are just passive receptors waiting to be re-activated later when (and if) the recognition of the hypothesized substructure has successfully completed.

scanner: "<scan, State'>" results in inserting a new item <Cat, State', j, _> into the set Si+1.

Let us trace the recognition of the sentence "I saw a tall old man in the park with a telescope". The first set S0 (fig. 6) includes three items: the first one, <V, 0, 0, _>, is produced by the initialization; the next two, <N, 0, 0, _> and <V, 1, 0, N>, are produced by the predictor (a N-headed subtree beginning in position 0 must be recognized and, in case such a recognition occurs, the governing V can pass to state 1). In S1 the first item, <N, $1, 0, _>, is produced by the scanner: it is the result of advancing on the input string according to the item <N, 0, 0, _> in S0 with an input noun "I" (the entry "0 × N" in the parse table PT_N contains <scan, $1>). The next item, <V, 1, 0, _>, is produced by applying the completer to the item <V, 1, 0, N> in S0. S2 contains the item <V, $2, 0, _>, obtained by the scanner, that advances on the verb "saw". The other items are the result of a double application of the predictor, which, in a sense, builds a "chain" that consists of a noun governed by the root verb and of a determiner governed by that noun; this is the only way, according to the grammar, to accommodate an incoming determiner when a verb is under analysis. The subsequent steps can easily be traced by the reader. The input sentence is accepted because of the appearance in the last set of the item <V, $3, 0, _>, encoding that a structure headed by a verb (i.e. a root category), ending in a final state ($3), and covering all the words from the beginning of the sentence, has been successfully recognized.

Figure 6. Sets of items generated in recognizing "I saw a tall old man in the park with a telescope": S0 [I], S1 [saw], S2 [a], S3 [tall], S4 [old], S5 [man], S6 [in], S7 [the], S8 [park], S9 [with], S10 [a], S11 [telescope], S12.
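The whole pipeline, transition graphs plus the item-based recognition loop, can be condensed into a runnable Python sketch. This is our reconstruction, not the paper's Common Lisp code; for brevity it consults the transition graphs and the first sets directly instead of precompiled parse tables, which yields the same set of items:

```python
from itertools import chain

def star(suffix):
    # Close a dotted suffix under skipping of leading starred symbols.
    out = {suffix}
    while suffix and suffix[0].endswith('*'):
        suffix = suffix[1:]
        out.add(suffix)
    return out

def build_graph(cat, rules):
    """States are frozensets of dotted suffixes; final iff () is a member."""
    s0 = frozenset(chain.from_iterable(star(tuple(r)) for r in rules[cat]))
    edges, work, seen = {}, [s0], {s0}
    while work:
        s = work.pop()
        for y in {ds[0].rstrip('*') for ds in s if ds}:
            nxt = set()
            for ds in s:
                if ds and ds[0].rstrip('*') == y:
                    if ds[0].endswith('*'):
                        nxt |= star(ds)       # the starred symbol may repeat
                    nxt |= star(ds[1:])
            nxt = frozenset(nxt)
            edges[(s, y)] = nxt
            if nxt not in seen:
                seen.add(nxt); work.append(nxt)
    return s0, edges

RULES = {'V': [('N', '#', 'P*'), ('N', '#', 'N', 'P*')],
         'N': [('A*', '#', 'P*'), ('D', 'A*', '#', 'P*')],
         'P': [('#', 'N')], 'A': [('#',)], 'D': [('#',)]}
ROOTS = {'V'}
LEX = {'I': 'N', 'saw': 'V', 'a': 'D', 'tall': 'A', 'old': 'A', 'man': 'N',
       'in': 'P', 'the': 'D', 'park': 'N', 'with': 'P', 'telescope': 'N'}
FIRST = {'V': {'N', 'A', 'D'}, 'N': {'N', 'A', 'D'},
         'P': {'P'}, 'A': {'A'}, 'D': {'D'}}
GRAPHS = {c: build_graph(c, RULES) for c in RULES}

def recognize(words):
    """Items are (Category, State, Position, Depcat); Depcat None means '_'."""
    n = len(words)
    S = [set() for _ in range(n + 1)]
    for q in ROOTS:                                   # initialization
        S[0].add((q, GRAPHS[q][0], 0, None))
    for i in range(n + 1):
        agenda = list(S[i])
        while agenda:
            cat, state, j, dep = agenda.pop()
            if dep is not None:                       # passive receptor
                continue
            if () in state:                           # completer
                for item in list(S[j]):
                    if item[3] == cat:
                        new = (item[0], item[1], item[2], None)
                        if new not in S[i]:
                            S[i].add(new); agenda.append(new)
            if i == n:
                continue
            wcat = LEX[words[i]]
            for (s, y), s2 in GRAPHS[cat][1].items():
                if s != state:
                    continue
                if y == '#':                          # scanner
                    if wcat == cat:
                        S[i + 1].add((cat, s2, j, None))
                elif wcat in FIRST[y]:                # predictor
                    for new in ((y, GRAPHS[y][0], i, None), (cat, s2, j, y)):
                        if new not in S[i]:
                            S[i].add(new); agenda.append(new)
    return any(c in ROOTS and () in st and j == 0 and d is None
               for (c, st, j, d) in S[n])

sentence = "I saw a tall old man in the park with a telescope".split()
```

Calling recognize(sentence) reproduces the trace above and accepts; a fragment such as "I saw a", whose determiner never finds its noun, is rejected because no root item covering the whole input reaches a final state.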

The space complexity of the recognizer is O(|G| n²). Each item is a quadruple <Cat, State, Position, Depcat>: Depcat is a constant of the grammar; the pairs of Cat and State are bounded by O(|G|); Position is bounded by O(n). The number of such quadruples in a set of items is bounded by O(|G| n), and there are n sets of items.

The time complexity of the recognizer is O(|G|² n³). The phases scanner and predictor execute at most O(|G|) actions per item; the items are at most O(|G| n²) and the cost of these two phases for the whole algorithm is O(|G|² n²). The phase completer executes at most one action per pair of items. The variables of such a pair of items are the two states (O(|G|²)), the two sets that contain them (O(n²)), and the two positions (O(n²)). But the pairs considered are not all the possible pairs: one of the sets has the same index as one of the positions, and the complexity of the completer is O(|G|² n³). The phase completer prevails on the other two phases, and the total complexity of the algorithm is O(|G|² n³). Even if the O-analysis is equivalent to Earley's, the phase of precompilation into the parse tables allows us to save a lot of the computation time needed by the predictor. All the possible predictions are precomputed in the transition to a new state. A similar device is presented in (Schabes 1990) for context-free grammars.

4 Conclusion

The paper has described a recognition algorithm for dependency grammar. The dependency formalism is translated into parse tables, that determine the conditions of applicability of the parser actions. The recognizer is an improved Earley-type algorithm, whose performances are comparable to the best recognizers for the context-free grammars, the formalism which is equivalent to the dependency formalism described in this paper. The algorithm has been implemented in Common Lisp and runs under the Unix operating system. The next step in our research will be to relax the condition of projectivity in order to improve the expressive power and to deal with phenomena that go beyond the context-free power. These changes imply the restructuring of some parts of the recognizer, with a plausible increment of the complexity.

References

Chomsky N., Three Models for the Description of Language, IRE Transactions on Information Theory IT-2, 1956, 113-124.
Covington M. A., Parsing Discontinuous Constituents in Dependency Grammar, Computational Linguistics 16, 1990, 234-236.
Covington M. A., An Empirically Motivated Reinterpretation of Dependency Grammar, Res. Rep. AI-1994-01, Univ. of Georgia (also on CompLing server), 1994.
Earley J., An Efficient Context-free Parsing Algorithm, Comm. of the ACM 13, 1970, 94-102.
Fraser N. M., Parsing and Dependency Grammar, UCL Working Papers in Linguistics, 1989, 296-319.
Fraser N. M., Hudson R. A., Inheritance in Word Grammar, Computational Linguistics 18, 1992, 133-158.
Gaifman H., Dependency Systems and Phrase Structure Systems, Information and Control 8, 1965, 304-337.
Gazdar G., Klein E., Pullum G., Sag I., Generalized Phrase Structure Grammar, Basil Blackwell, Oxford, 1985.
Graham S. L., Harrison M. A., Ruzzo W. L., An Improved Context-Free Recognizer, ACM Trans. on Programming Languages and Systems 2, 1980, 415-462.
Hahn U., Schacht S., Broker N., Concurrent, Object-Oriented Natural Language Parsing: The ParseTalk Model, CLIF Report 9/94, Albert-Ludwigs-Universität, Freiburg, Germany, 1994.
Hudson R., English Word Grammar, Basil Blackwell, Oxford, 1990.
Jackendoff R., X-bar Syntax: A Study of Phrase Structure, MIT Press, 1977.
Jacobs P. S., Rau L. F., Innovations in Text Interpretation, Artificial Intelligence Journal 63/1-2, 1993, 143-191.
Joshi A. K., Vijay-Shanker K., Weir D., The Convergence of Mildly Context-Sensitive Grammatical Formalisms, in Sells P., Shieber S., Wasow T. (eds.), Foundational Issues in Natural Language Processing, MIT Press, 1991.
Kaplan R., Bresnan J., Lexical-Functional Grammar: A Formal System for Grammatical Representation, in Bresnan J. (ed.), The Mental Representation of Grammatical Relations, MIT Press, 1982.
Kay M., Functional Grammar, Proc. 5th Meeting of the Berkeley Linguistic Society, 1979, 142-158.
Kwon H., Yoon A., Unification-Based Dependency Parsing of Governor-Final Languages, Proc. IWPT 91, 1991, 182-192.
Lai B. Y. T., Huang C., Dependency Grammar and the Parsing of Chinese Sentences, unpublished manuscript on the CompLing server, 1995.
Mel'cuk I., Dependency Syntax: Theory and Practice, SUNY Press, Albany, 1988.
Perlmutter D. (ed.), Studies in Relational Grammar 1, Univ. of Chicago Press, Chicago, 1983.
Pollard C. J., Generalized Phrase Structure Grammars, Head Grammars, and Natural Language, Ph.D. Thesis, Stanford Univ., 1984.
Pollard C. J., Sag I., An Information-Based Syntax and Semantics, vol. 1: Fundamentals, CSLI Lecture Notes 13, CSLI, Stanford, 1987.
Rambow O., Joshi A., A Formal Look at Dependency Grammars and Phrase-Structure Grammars, with Special Consideration of Word-Order Phenomena, Proc. of the Int. Workshop on the Meaning-Text Theory, Darmstadt, 1992.
Schabes Y., Polynomial Time and Space Shift-Reduce Parsing of Arbitrary Context-Free Grammars, Proc. ACL 90, Pittsburgh (PA), 1990, 106-113.
Schabes Y., Waters R. C., Lexicalized Context-Free Grammars, Proc. ACL 93, 1993, 121-129.
Sgall P., Hajicova E., Panevova J., The Meaning of the Sentence in its Semantic and Pragmatic Aspects, D. Reidel Publ. Co., Dordrecht, 1986.
Sleator D. D., Temperley D., Parsing English with a Link Grammar, Proc. of IWPT 93, 1993, 277-291.
Tesniere L., Elements de Syntaxe Structurale, Klincksieck, Paris, 1959.
Tomita M., Efficient Parsing for Natural Language, Kluwer Acad. Publ., 1985.
