An Earley Parsing Algorithm for Range Concatenation Grammars

Laura Kallmeyer, SFB 441, Universität Tübingen, 72074 Tübingen, Germany
Wolfgang Maier, SFB 441, Universität Tübingen, 72074 Tübingen, Germany
Yannick Parmentier, CNRS - LORIA, Nancy Université, 54506 Vandœuvre, France

Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, pages 9–12, Suntec, Singapore, 4 August 2009. © 2009 ACL and AFNLP

Abstract

We present a CYK and an Earley-style algorithm for parsing Range Concatenation Grammar (RCG), using the deductive parsing framework. The characteristic property of the Earley algorithm is that we use a technique of range boundary constraint propagation to compute the yields of non-terminals as late as possible. Experiments show that, compared to previous approaches, the constraint propagation helps to considerably decrease the number of items in the chart.

1 Introduction

RCGs (Boullier, 2000) have recently received growing interest in natural language processing (Søgaard, 2008; Sagot, 2005; Kallmeyer et al., 2008; Maier and Søgaard, 2008). RCGs generate exactly the class of languages parsable in deterministic polynomial time (Bertsch and Nederhof, 2001). They are in particular more powerful than linear context-free rewriting systems (LCFRS) (Vijay-Shanker et al., 1987). LCFRS is unable to describe certain natural language phenomena that RCGs actually can deal with. One example is long-distance scrambling (Becker et al., 1991; Becker et al., 1992). Other examples are non-semilinear constructions such as case stacking in Old Georgian (Michaelis and Kracht, 1996) and Chinese number names (Radzinski, 1991). Boullier (1999) shows that RCGs can describe the permutations occurring with scrambling and the construction of Chinese number names.

Parsing algorithms for RCG have been introduced by Boullier (2000), who presents a directional top-down parsing algorithm using pseudocode, and Barthélemy et al. (2001), who add an oracle to Boullier's algorithm. The more restricted class of LCFRS has received more attention concerning parsing (Villemonte de la Clergerie, 2002; Burden and Ljunglöf, 2005). This article proposes new CYK and Earley parsers for RCG, formulating them in the framework of parsing as deduction (Shieber et al., 1995). The second section introduces necessary definitions. Section 3 presents a CYK-style algorithm and Section 4 extends this with an Earley-style prediction.

2 Preliminaries

The rules (clauses) of RCGs[1] rewrite predicates ranging over parts of the input by other predicates. E.g., a clause $S(aXb) \to S(X)$ signifies that $S$ is true for a part of the input if this part starts with an $a$, ends with a $b$, and if, furthermore, $S$ is also true for the part between $a$ and $b$.

[1] In this paper, by RCG, we always mean positive RCG; see Boullier (2000) for details.

Definition 1. An RCG $G = \langle N, T, V, P, S \rangle$ consists of a) a finite set of predicates $N$ with an arity function $dim: N \to \mathbb{N} \setminus \{0\}$ where $S \in N$ is the start predicate with $dim(S) = 1$, b) disjoint finite sets of terminals $T$ and variables $V$, c) a finite set $P$ of clauses $\psi_0 \to \psi_1 \ldots \psi_m$, where $m \geq 0$ and each of the $\psi_i$, $0 \leq i \leq m$, is a predicate of the form $A_i(\alpha_1, \ldots, \alpha_{dim(A_i)})$ with $A_i \in N$ and $\alpha_j \in (T \cup V)^*$ for $1 \leq j \leq dim(A_i)$.

Central to RCGs is the notion of ranges on strings.

Definition 2. For every $w = w_1 \ldots w_n$ with $w_i \in T$ ($1 \leq i \leq n$), we define a) $Pos(w) = \{0, \ldots, n\}$. b) $\langle l, r \rangle \in Pos(w) \times Pos(w)$ with $l \leq r$ is a range in $w$. Its yield $\langle l, r \rangle(w)$ is the substring $w_{l+1} \ldots w_r$. c) For two ranges $\rho_1 = \langle l_1, r_1 \rangle$, $\rho_2 = \langle l_2, r_2 \rangle$: if $r_1 = l_2$, then $\rho_1 \cdot \rho_2 = \langle l_1, r_2 \rangle$; otherwise $\rho_1 \cdot \rho_2$ is undefined. d) A vector $\phi = (\langle x_1, y_1 \rangle, \ldots, \langle x_k, y_k \rangle)$ is a range vector of dimension $k$ in $w$ if $\langle x_i, y_i \rangle$ is a range in $w$ for $1 \leq i \leq k$. $\phi(i).l$ (resp. $\phi(i).r$) denotes then the
first (resp. second) component of the $i$th element of $\phi$, that is $x_i$ (resp. $y_i$).

In order to instantiate a clause of the grammar, we need to find ranges for all variables in the clause and for all occurrences of terminals. For convenience, we assume the variables in a clause and the occurrences of terminals to be equipped with distinct subscript indices, starting with 1 and ordered from left to right (where for variables, only the first occurrence is relevant for this order). We introduce a function $\Upsilon: P \to \mathbb{N}$ that gives the maximal index in a clause, and we define $\Upsilon(c, x)$ for a given clause $c$ and $x$ a variable or an occurrence of a terminal as the index of $x$ in $c$.

Definition 3. An instantiation of a clause $c \in P$ with $\Upsilon(c) = j$ w.r.t. some string $w$ is given by a range vector $\phi$ of dimension $j$. Applying $\phi$ to a predicate $A(\vec{\alpha})$ in $c$ maps all occurrences of $x \in (T \cup V)$ with $\Upsilon(c, x) = i$ in $\vec{\alpha}$ to $\phi(i)$. If the result is defined (i.e., the images of adjacent variables can be concatenated), it is called an instantiated predicate and the result of applying $\phi$ to all predicates in $c$, if defined, is called an instantiated clause.

We also introduce range constraint vectors, vectors of pairs of range boundary variables together with a set of constraints on these variables.

Definition 4. Let $V_r = \{r_1, r_2, \ldots\}$ be a set of range boundary variables. A range constraint vector of dimension $k$ is a pair $\langle \vec{\rho}, C \rangle$ where a) $\vec{\rho} \in (V_r^2)^k$; we define $V_r(\vec{\rho})$ as the set of range boundary variables occurring in $\vec{\rho}$. b) $C$ is a set of constraints $c_r$ that have one of the following forms: $r_1 = r_2$, $k = r_1$, $r_1 + k = r_2$, $k \leq r_1$, $r_1 \leq k$, $r_1 \leq r_2$ or $r_1 + k \leq r_2$ for $r_1, r_2 \in V_r(\vec{\rho})$ and $k \in \mathbb{N}$.

We say that a range vector $\phi$ satisfies a range constraint vector $\langle \rho, C \rangle$ iff $\phi$ and $\rho$ are of the same dimension $k$ and there is a function $f: V_r \to \mathbb{N}$ that maps $\rho(i).l$ to $\phi(i).l$ and $\rho(i).r$ to $\phi(i).r$ for all $1 \leq i \leq k$ such that all constraints in $C$ are satisfied. Furthermore, we say that a range constraint vector $\langle \rho, C \rangle$ is satisfiable iff there exists a range vector $\phi$ that satisfies it.

Definition 5. For every clause $c$, we define its range constraint vector $\langle \rho, C \rangle$ w.r.t. a string $w$ with $|w| = n$ as follows: a) $\rho$ has dimension $\Upsilon(c)$ and all range boundary variables in $\rho$ are pairwise different. b) For all $\langle r_1, r_2 \rangle \in \rho$: $0 \leq r_1$, $r_1 \leq r_2$, $r_2 \leq n \in C$. For all occurrences $x$ of terminals in $c$ with $i = \Upsilon(c, x)$: $\rho(i).l + 1 = \rho(i).r \in C$. For all $x, y$ that are variables or occurrences of terminals in $c$ such that $xy$ is a substring of one of the arguments in $c$: $\rho(\Upsilon(c, x)).r = \rho(\Upsilon(c, y)).l \in C$. These are all constraints in $C$.

The range constraint vector of a clause $c$ captures all information about boundaries forming a range, ranges containing only a single terminal, and adjacent variables/terminal occurrences in $c$.

An RCG derivation consists of rewriting instantiated predicates applying instantiated clauses, i.e., in every derivation step $\Gamma_1 \Rightarrow_w \Gamma_2$, we replace the lefthand side of an instantiated clause with its righthand side (w.r.t. a word $w$). The language of an RCG $G$ is the set of strings that can be reduced to the empty word: $L(G) = \{ w \mid S(\langle 0, |w| \rangle) \stackrel{+}{\Rightarrow}_{G,w} \varepsilon \}$.

The expressive power of RCG lies beyond mild context-sensitivity. As an example, consider the RCG from Fig. 3 that generates a language that is not semilinear.

For simplicity, we assume in the following without loss of generality that empty arguments ($\varepsilon$) occur only in clauses whose righthand sides are empty.[2]

[2] Any RCG can be easily transformed into an RCG satisfying this condition: Introduce a new unary predicate $Eps$ with a clause $Eps(\varepsilon) \to \varepsilon$. Then, for every clause $c$ with righthand side not $\varepsilon$, replace every argument $\varepsilon$ that occurs in $c$ with a new variable $X$ (each time a distinct one) and add the predicate $Eps(X)$ to the righthand side of $c$.

3 Directional Bottom-Up Chart Parsing

In our directional CYK algorithm, we move a dot through the righthand side of a clause. We therefore have passive items $[A, \phi]$, where $A$ is a predicate and $\phi$ a range vector of dimension $dim(A)$, and active items. In the latter, while traversing the righthand side of the clause, we keep a record of the left and right boundaries already found for variables and terminal occurrences. This is achieved by successively enriching the range constraint vector of the clause. Active items have the form $[A(\vec{x}) \to \Phi \bullet \Psi, \langle \rho, C \rangle]$ with $A(\vec{x}) \to \Phi\Psi$ a clause, $\Phi\Psi \neq \varepsilon$, $\Upsilon(A(\vec{x}) \to \Phi\Psi) = j$ and $\langle \rho, C \rangle$ a range constraint vector of dimension $j$. We require that $\langle \rho, C \rangle$ be satisfiable.[3]

[3] Items that are distinguished from each other only by a bijection of the range variables are considered equivalent. I.e., if the application of a rule yields a new item such that an equivalent one has already been generated, this new one is not added to the set of partial results.
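
The range constraint vectors of Definitions 4 and 5 can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration and not part of the parsers described in this paper: the tuple encoding `(op, lhs, rhs, k)` for a constraint "lhs + k op rhs", the function names, the explicit `terminals` set, and the brute-force satisfiability test (footnote 4 notes that an efficient implementation needs a proper constraint solver). Matching terminal occurrences against the symbols of the input is not covered.

```python
from itertools import product

def value(term, f):
    """Boundary variables (strings) are looked up in f; integer constants stand for themselves."""
    return f[term] if isinstance(term, str) else term

def holds(constraint, f):
    """Check one constraint (op, lhs, rhs, k), read as 'lhs + k op rhs' with op in {'=', '<='}."""
    op, lhs, rhs, k = constraint
    left, right = value(lhs, f) + k, value(rhs, f)
    return left == right if op == "=" else left <= right

def satisfies(phi, rho, constraints):
    """Definition 4: phi satisfies <rho, C> if mapping rho(i) to phi(i) fulfils every constraint."""
    if len(phi) != len(rho):
        return False
    f = {}
    for (l, r), (lvar, rvar) in zip(phi, rho):
        for var, val in ((lvar, l), (rvar, r)):
            if f.setdefault(var, val) != val:  # a variable cannot take two values
                return False
    return all(holds(c, f) for c in constraints)

def clause_constraint_vector(arguments, terminals, n):
    """Definition 5: range constraint vector of a clause w.r.t. an input of length n.

    arguments: all arguments of all predicates of the clause, each a list of
               symbols (variables or indexed terminal occurrences such as 'a1').
    terminals: the symbols that are terminal occurrences.
    """
    order = []  # symbols ordered by first occurrence, left to right
    for arg in arguments:
        for sym in arg:
            if sym not in order:
                order.append(sym)
    rho = [(f"{x}.l", f"{x}.r") for x in order]
    constraints = []
    for x, (l, r) in zip(order, rho):
        constraints += [("<=", 0, l, 0), ("<=", l, r, 0), ("<=", r, n, 0)]
        if x in terminals:  # a terminal occurrence spans exactly one position
            constraints.append(("=", l, r, 1))
    for arg in arguments:  # adjacent symbols share a boundary
        for x, y in zip(arg, arg[1:]):
            constraints.append(("=", f"{x}.r", f"{y}.l", 0))
    return rho, constraints

def satisfiable(rho, constraints, n):
    """Naive test: enumerate every range vector over positions 0..n (exponential)."""
    for bounds in product(range(n + 1), repeat=2 * len(rho)):
        phi = list(zip(bounds[0::2], bounds[1::2]))
        if satisfies(phi, rho, constraints):
            return True
    return False

# Clause eq(a1 X, a2 Y) -> eq(X, Y) from Fig. 3, for an input of length 2.
rho, C = clause_constraint_vector([["a1", "X"], ["a2", "Y"], ["X"], ["Y"]], {"a1", "a2"}, n=2)
print(satisfies([(0, 1), (1, 1), (1, 2), (2, 2)], rho, C))
```

Adding a constraint to `C` and re-running `satisfiable` mimics, in a very naive way, how an item's constraint set is enriched and then checked during parsing.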

Scan: $\dfrac{}{[A, \phi]}$, where $A(\vec{x}) \to \varepsilon \in P$ with instantiation $\psi$ such that $\psi(A(\vec{x})) = A(\phi)$.

Initialize: $\dfrac{}{[A(\vec{x}) \to \bullet \Phi, \langle \rho, C \rangle]}$, where $A(\vec{x}) \to \Phi \in P$ with range constraint vector $\langle \rho, C \rangle$, $\Phi \neq \varepsilon$.

Complete: $\dfrac{[B, \phi_B], \quad [A(\vec{x}) \to \Phi \bullet B(x_1 \ldots y_1, \ldots, x_k \ldots y_k)\Psi, \langle \rho, C \rangle]}{[A(\vec{x}) \to \Phi B(x_1 \ldots y_1, \ldots, x_k \ldots y_k) \bullet \Psi, \langle \rho, C' \rangle]}$, where $C' = C \cup \{\phi_B(j).l = \rho(\Upsilon(x_j)).l,\ \phi_B(j).r = \rho(\Upsilon(y_j)).r \mid 1 \leq j \leq k\}$.

Convert: $\dfrac{[A(\vec{x}) \to \Psi \bullet, \langle \rho, C \rangle]}{[A, \phi]}$, where $A(\vec{x}) \to \Psi \in P$ with an instantiation $\psi$ that satisfies $\langle \rho, C \rangle$, $\psi(A(\vec{x})) = A(\phi)$.

Goal: $[S, (\langle 0, n \rangle)]$

Figure 1: CYK deduction rules

The deduction rules are shown in Fig. 1. The first rule scans the yields of terminating clauses. Initialize introduces clauses with the dot on the left of the righthand side. Complete moves the dot over a predicate provided a corresponding passive item has been found. Convert turns an active item with the dot at the end into a passive item.

4 The Earley Algorithm

We now add top-down prediction to our algorithm. Active items are as above. Passive items have an additional flag $p$ or $c$ depending on whether the item is predicted or completed, i.e., they either have the form $[A, \langle \rho, C \rangle, p]$ where $\langle \rho, C \rangle$ is a range constraint vector of dimension $dim(A)$, or the form $[A, \phi, c]$ where $\phi$ is a range vector of dimension $dim(A)$.

Initialize: $\dfrac{}{[S, \langle (\langle r_1, r_2 \rangle), \{0 = r_1, n = r_2\} \rangle, p]}$

Predict-rule: $\dfrac{[A, \langle \rho, C \rangle, p]}{[A(x_1 \ldots y_1, \ldots, x_k \ldots y_k) \to \bullet \Psi, \langle \rho', C' \rangle]}$, where $\langle \rho', C' \rangle$ is obtained from the range constraint vector of the clause $A(x_1 \ldots y_1, \ldots, x_k \ldots y_k) \to \Psi$ by taking all constraints from $C$, mapping all $\rho(i).l$ to $\rho'(\Upsilon(x_i)).l$ and all $\rho(i).r$ to $\rho'(\Upsilon(y_i)).r$, and then adding the resulting constraints to the range constraint vector of the clause.

Predict-pred: $\dfrac{[A(\ldots) \to \Phi \bullet B(x_1 \ldots y_1, \ldots, x_k \ldots y_k)\Psi, \langle \rho, C \rangle]}{[B, \langle \rho', C' \rangle, p]}$, where $\rho'(i).l = \rho(\Upsilon(x_i)).l$, $\rho'(i).r = \rho(\Upsilon(y_i)).r$ for all $1 \leq i \leq k$ and $C' = \{c \mid c \in C, c \text{ contains only range variables from } \rho'\}$.

Scan: $\dfrac{[A, \langle \rho, C \rangle, p]}{[A, \phi, c]}$, where $A(\vec{x}) \to \varepsilon \in P$ with an instantiation $\psi$ satisfying $\langle \rho, C \rangle$ such that $\psi(A(\vec{x})) = A(\phi)$.

Figure 2: Earley deduction rules

The deduction rules are listed in Fig. 2. The axiom is the prediction of an $S$ ranging over the entire input (initialize). We have two predict operations: Predict-rule predicts active items with the dot on the left of the righthand side, for a given predicted passive item. Predict-pred predicts a passive item for the predicate following the dot in an active item. Scan is applied whenever a predicted predicate can be derived by an $\varepsilon$-clause. The rules complete and convert are the ones from the CYK algorithm except that we add flags $c$ to the passive items occurring in these rules. The goal is again $[S, (\langle 0, n \rangle), c]$.

To understand how this algorithm works, consider the example in Fig. 3. The crucial property of this algorithm, in contrast to previous approaches, is the dynamic updating of a set of constraints on range boundaries. We can leave range boundaries unspecified and compute their values in a more incremental fashion instead of guessing all ranges of a clause at once at prediction.[4]

[4] Of course, the use of constraints makes comparisons between items more complex and more expensive, which means that for an efficient implementation, an integer-based representation of the constraints and adequate techniques for constraint solving are required.

For evaluation, we have implemented a directional top-down algorithm where range boundaries are guessed at prediction (this is essentially the algorithm described in Boullier (2000)), and the new Earley-style algorithm. The algorithms were tested on different words of the language $L = \{a^{2^n} \mid n \geq 0\}$. Table 1 shows the number of generated items.

Word   Earley   TD      Word   Earley   TD
a^2    15       21      a^16   100      539
a^4    30       55      a^30   155      1666
a^8    55       164     a^32   185      1894
a^9    59       199     a^64   350      6969

Table 1: Items generated by both algorithms (TD = directional top-down)

Clearly, range boundary constraint propagation increases the amount of information transported in single items and thereby considerably decreases the number of generated items.

5 Conclusion and future work

We have presented new CYK and Earley parsing algorithms for the full class of RCG. The crucial difference between previously proposed top-down RCG parsers and the new Earley-style algorithm is that while the former compute all clause instantiations during predict operations, the latter
avoids this using a technique of dynamic updating of a set of constraints on range boundaries. Experiments show that this significantly decreases the number of generated items, which confirms that range boundary constraint propagation is a viable method for a lazy computation of ranges.

The Earley parser could be improved by allowing the predicates of the righthand sides of clauses to be processed in any order, not necessarily from left to right. This way, one could process predicates whose range boundaries are better known first. We plan to include this strategy in future work.

Grammar for $\{a^{2^n} \mid n > 0\}$: $S(XY) \to S(X)\,eq(X,Y)$, $S(a_1) \to \varepsilon$, $eq(a_1X, a_2Y) \to eq(X,Y)$, $eq(a_1, a_2) \to \varepsilon$

Parsing trace for $w = aa$ (item, followed by the rule that produced it):

1. $[S, \langle (\langle r_1, r_2 \rangle), \{0 = r_1, r_1 \leq r_2, 2 = r_2\} \rangle, p]$ (initialize)
2. $[S(XY) \to \bullet S(X)\,eq(X,Y), \{X.l \leq X.r, X.r = Y.l, Y.l \leq Y.r, 0 = X.l, 2 = Y.r\}]$ (predict-rule from 1)
3. $[S, \langle (\langle r_1, r_2 \rangle), \{0 = r_1, r_1 \leq r_2\} \rangle, p]$ (predict-pred from 2)
4. $[S, (\langle 0, 1 \rangle), c]$ (scan from 3)
5. $[S(XY) \to \bullet S(X)\,eq(X,Y), \{X.l \leq X.r, X.r = Y.l, Y.l \leq Y.r, 0 = X.l\}]$ (predict-rule from 3)
6. $[S(XY) \to S(X) \bullet eq(X,Y), \{\ldots, 0 = X.l, 2 = Y.r, 1 = X.r\}]$ (complete 2 with 4)
7. $[S(XY) \to S(X) \bullet eq(X,Y), \{X.l \leq X.r, X.r = Y.l, Y.l \leq Y.r, 0 = X.l, 1 = X.r\}]$ (complete 5 with 4)
8. $[eq, \langle (\langle r_1, r_2 \rangle, \langle r_3, r_4 \rangle), \{r_1 \leq r_2, r_2 = r_3, r_3 \leq r_4, 0 = r_1, 2 = r_4, 1 = r_2\} \rangle]$ (predict-pred from 6)
9. $[eq(a_1X, a_2Y) \to \bullet eq(X,Y), \{a_1.l + 1 = a_1.r, a_1.r = X.l, X.l \leq X.r, a_2.l + 1 = a_2.r, a_2.r = Y.l, Y.l \leq Y.r, X.r = a_2.l, 0 = a_1.l, 1 = X.r, 2 = Y.r\}]$ (predict-rule from 8)
...
10. $[eq, (\langle 0, 1 \rangle, \langle 1, 2 \rangle), c]$ (scan 8)
11. $[S(XY) \to S(X)\,eq(X,Y) \bullet, \{\ldots, 0 = X.l, 2 = Y.r, 1 = X.r, 1 = Y.l\}]$ (complete 6 with 10)
12. $[S, (\langle 0, 2 \rangle), c]$ (convert 11)

Figure 3: Trace of a sample Earley parse

References

François Barthélemy, Pierre Boullier, Philippe Deschamp, and Éric de la Clergerie. 2001. Guided parsing of Range Concatenation Languages. In Proceedings of ACL, pages 42–49.

Tilman Becker, Aravind K. Joshi, and Owen Rambow. 1991. Long-distance scrambling and tree adjoining grammars. In Proceedings of EACL.

Tilman Becker, Owen Rambow, and Michael Niv. 1992. The Derivational Generative Power of Formal Systems or Scrambling is Beyond LCFRS. Technical Report IRCS-92-38, Institute for Research in Cognitive Science, University of Pennsylvania.

E. Bertsch and M.-J. Nederhof. 2001. On the complexity of some extensions of RCG parsing. In Proceedings of IWPT 2001, pages 66–77, Beijing, China.

Pierre Boullier. 1999. Chinese numbers, mix, scrambling, and range concatenation grammars. In Proceedings of EACL, pages 53–60, Bergen, Norway.

Pierre Boullier. 2000. Range concatenation grammars. In Proceedings of IWPT 2000, pages 53–64, Trento.

Håkan Burden and Peter Ljunglöf. 2005. Parsing linear context-free rewriting systems. In Proceedings of IWPT 2005, pages 11–17, Vancouver.

Laura Kallmeyer, Timm Lichte, Wolfgang Maier, Yannick Parmentier, and Johannes Dellert. 2008. Developing an MCTAG for German with an RCG-based parser. In Proceedings of LREC-2008, Marrakech, Morocco.

Wolfgang Maier and Anders Søgaard. 2008. Treebanks and mild context-sensitivity. In Proceedings of the 13th Conference on Formal Grammar 2008, Hamburg, Germany.

Jens Michaelis and Marcus Kracht. 1996. Semilinearity as a Syntactic Invariant. In Logical Aspects of Computational Linguistics, Nancy.

Daniel Radzinski. 1991. Chinese number-names, tree adjoining languages, and mild context-sensitivity. Computational Linguistics, 17:277–299.

Benoît Sagot. 2005. Linguistic facts as predicates over ranges of the sentence. In Proceedings of LACL 05, number 3492 in Lecture Notes in Computer Science, pages 271–286, Bordeaux, France. Springer.

Stuart M. Shieber, Yves Schabes, and Fernando C. N. Pereira. 1995. Principles and implementation of deductive parsing. Journal of Logic Programming, 24(1&2):3–36.

Anders Søgaard. 2008. Range concatenation grammars for translation. In Proceedings of COLING, Manchester, England.

K. Vijay-Shanker, David Weir, and Aravind Joshi. 1987. Characterising structural descriptions used by various formalisms. In Proceedings of ACL.

Éric Villemonte de la Clergerie. 2002. Parsing mildly context-sensitive languages with thread automata. In Proceedings of COLING, Taipei, Taiwan.
