Efficient techniques for parsing with tree automata

Jonas Groschwitz and Alexander Koller
Department of Linguistics, University of Potsdam, Germany
groschwi|[email protected]

Mark Johnson
Department of Computing, Macquarie University, Australia
[email protected]

Abstract

Parsing for a wide variety of grammar formalisms can be performed by intersecting finite tree automata. However, naive implementations of parsing by intersection are very inefficient. We present techniques that speed up tree-automata-based parsing, to the point that it becomes practically feasible on realistic data when applied to context-free, TAG, and graph parsing. For graph parsing, we obtain the best runtimes in the literature.

1 Introduction

Grammar formalisms that go beyond context-free grammars have recently enjoyed renewed attention throughout computational linguistics. Classical grammar formalisms such as TAG (Joshi and Schabes, 1997) and CCG (Steedman, 2001) have been equipped with expressive statistical models, and high-performance parsers have become available (Clark and Curran, 2007; Lewis and Steedman, 2014; Kallmeyer and Maier, 2013). Synchronous grammar formalisms such as synchronous context-free grammars (Chiang, 2007) and tree-to-string transducers (Galley et al., 2004; Graehl et al., 2008; Seemann et al., 2015) are being used as models that incorporate syntactic information in statistical machine translation. Synchronous string-to-tree (Wong and Mooney, 2006) and string-to-graph grammars (Chiang et al., 2013) have been applied to semantic parsing; and so forth.

Each of these grammar formalisms requires its users to develop new algorithms for parsing and training. This comes with challenges that are both practical and theoretical. From a theoretical perspective, many of these algorithms are basically the same, in that they rest upon a CKY-style parsing algorithm which recursively explores substructures of the input object and assigns them nonterminal symbols, but their exact relationship is rarely made explicit. On the practical side, this parsing algorithm and its extensions (e.g. to EM training) have to be implemented and optimized from scratch for each new grammar formalism. Thus, development time is spent on reinventing wheels that are slightly different from previous ones, and the resulting implementations still tend to underperform.

Koller and Kuhlmann (2011) introduced Interpreted Regular Tree Grammars (IRTGs) in order to address this situation. An IRTG represents a language by describing a regular tree language of derivation trees, each of which is mapped to a term over some algebra and evaluated there. Grammars from a wide range of monolingual and synchronous formalisms can be mapped into IRTGs by using different algebras: context-free and tree-adjoining grammars use string algebras of different kinds, graph grammars can be captured by using graph algebras, and so on. In addition, IRTGs come with a universal parsing algorithm based on closure results for tree automata. Implementing and optimizing this parsing algorithm once, one could apply it to all grammar formalisms that can be mapped to IRTG. However, while Koller and Kuhlmann show that asymptotically optimal parsing is possible in theory, it is non-trivial to implement their algorithm optimally.

In this paper, we introduce practical algorithms for the two key operations underlying IRTG parsing: computing the intersection of two tree automata and applying an inverse tree homomorphism to a tree automaton. After defining IRTGs (Section 2), we will first illustrate that a naive bottom-up implementation of the intersection algorithm yields asymptotic parsing complexities that are too high (Section 3). We will then show how the parsing complexity can be improved by combining algebra-specific index data structures with a generic parsing algorithm (Section 4), and by replacing bottom-up with top-down queries (Section 5). In contrast to the naive algorithm, both of these methods achieve the expected asymptotic complexities, e.g. O(n³) for context-free parsing, O(n⁶) for TAG parsing, etc. Furthermore, an evaluation with realistic grammars shows that our algorithms improve practical parsing times with IRTG grammars encoding context-free grammars, tree-adjoining grammars, and graph grammars by orders of magnitude (Section 6). Thus our algorithms make IRTG parsing practically feasible for the first time; for graph parsing, we obtain the fastest reported runtimes.

  automaton rule        homomorphism
  S  → r1(NP, VP)       ∗(x1, x2)
  NP → r2               John
  VP → r3               walks
  VP → r4(VP, NP)       ∗(x1, ∗(on, x2))
  NP → r5               Mars

Figure 1: An example IRTG.

2 Interpreted Regular Tree Grammars

We will first define IRTGs and explain how the universal parsing algorithm for IRTGs works.

2.1 Formal foundations

First, we introduce some fundamental theoretical concepts and notation.

A signature Σ is a finite set of symbols r, f, ..., each of which has an arity ar(r) ≥ 0. A tree t over the signature Σ is a term of the form r(t1, ..., tn), where the ti are trees and r ∈ Σ has arity n. We identify the nodes of t by their Gorn addresses, i.e. paths π ∈ N∗ from the root to the node, and write t(π) for the label of π. We write TΣ for the set of all trees over Σ, and TΣ(Xk) for the trees in which each node either has a label from Σ, or is a leaf labeled with one of the variables {x1, ..., xk}.

A (linear, nondeleting) tree homomorphism h from a signature Σ to a signature ∆ is a mapping h : TΣ → T∆. It is defined by specifying, for each symbol r ∈ Σ of arity k, a term h(r) ∈ T∆(Xk) in which each variable occurs exactly once. This symbol-wise mapping is lifted to entire trees by letting h(r(t1, ..., tk)) = h(r)[h(t1), ..., h(tk)], i.e. by replacing the variable xi in h(r) by the recursively computed value h(ti).

Let ∆ be a signature. A ∆-algebra A consists of a nonempty set A, called the domain, and for each symbol f ∈ ∆ with arity k, a function fA : Aᵏ → A, the operation associated with f. We can evaluate any term t ∈ T∆ to a value ⟦t⟧A ∈ A, by evaluating the operation symbols bottom-up. In this paper, we will be particularly interested in the string algebra E∗ over the finite alphabet E. Its domain is the set of all strings over E. For each symbol a ∈ E, it has a nullary operation symbol a with ⟦a⟧E∗ = a. It also has a single binary operation symbol ∗, such that ∗E∗(w1, w2) is the concatenation of the strings w1 and w2. Thus the term ∗(John, ∗(walks, ∗(on, Mars))) in Fig. 2b evaluates to the string "John walks on Mars".

A finite tree automaton M over the signature Σ is a structure M = (Σ, Q, R, XF), where Q is a finite set of states and XF ∈ Q is a final state. R is a finite set of transition rules of the form X → r(X1, ..., Xk), where the terminal symbol r ∈ Σ is of arity k and X, X1, ..., Xk ∈ Q. A tree automaton can run non-deterministically on a tree t ∈ TΣ by assigning states to the nodes of t bottom-up. If we have t = r(t1, ..., tn), a rule X → r(X1, ..., Xn) in R, and M can assign the state Xi to each ti, written Xi →∗ ti, then we also have X →∗ t. We say that M accepts t if XF →∗ t, and define the language L(M) ⊆ TΣ of M as the (possibly infinite) set of all trees that M accepts. An example of a tree automaton (with states S, NP, etc.) is shown in the "automaton rule" column of Fig. 1. It accepts, among others, the tree τ1 in Fig. 2a.

Tree automata can be defined top-down or bottom-up, and are equivalent to regular tree grammars. The languages that can be accepted by finite tree automata are called the regular tree languages. See e.g. Comon et al. (2008) for details.
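To make these definitions concrete, here is a minimal sketch in Java (the language of the Alto implementation discussed in Section 6; all names are our own illustrations, not Alto's API) of trees over a signature, the lifting of h to entire trees, and evaluation in the string algebra E∗, applied to the example of Fig. 1. Representing strings as space-joined words is a simplifying assumption of this sketch.

    import java.util.*;

    public class HomDemo {
        // A tree r(t1, ..., tn); a leaf "x1", "x2", ... stands for a variable.
        record Tree(String label, List<Tree> children) {
            static Tree node(String label, Tree... ts) { return new Tree(label, List.of(ts)); }
            public String toString() {
                return children.isEmpty() ? label
                    : label + "(" + String.join(", ", children.stream().map(Tree::toString).toList()) + ")";
            }
        }

        // h(r(t1,...,tk)) = h(r)[h(t1),...,h(tk)]: replace variable xi in h(r)
        // by the recursively computed image h(ti).
        static Tree applyHom(Map<String, Tree> h, Tree t) {
            List<Tree> images = t.children().stream().map(c -> applyHom(h, c)).toList();
            return substitute(h.get(t.label()), images);
        }

        static Tree substitute(Tree pattern, List<Tree> images) {
            if (pattern.children().isEmpty() && pattern.label().matches("x\\d+"))
                return images.get(Integer.parseInt(pattern.label().substring(1)) - 1);
            return new Tree(pattern.label(),
                    pattern.children().stream().map(c -> substitute(c, images)).toList());
        }

        // Evaluation in E*: a constant denotes itself, *(w1, w2) concatenates.
        static String evaluate(Tree t) {
            return t.children().isEmpty() ? t.label()
                : evaluate(t.children().get(0)) + " " + evaluate(t.children().get(1));
        }

        public static void main(String[] args) {
            // The homomorphism of Fig. 1: h(r1) = *(x1, x2), h(r2) = John, etc.
            Map<String, Tree> h = Map.of(
                "r1", Tree.node("*", Tree.node("x1"), Tree.node("x2")),
                "r2", Tree.node("John"),
                "r3", Tree.node("walks"),
                "r4", Tree.node("*", Tree.node("x1"), Tree.node("*", Tree.node("on"), Tree.node("x2"))),
                "r5", Tree.node("Mars"));
            // tau1 = r1(r2, r4(r3, r5)), cf. Fig. 2a.
            Tree tau1 = Tree.node("r1", Tree.node("r2"), Tree.node("r4", Tree.node("r3"), Tree.node("r5")));
            Tree term = applyHom(h, tau1);
            System.out.println(term);            // *(John, *(walks, *(on, Mars)))
            System.out.println(evaluate(term));  // John walks on Mars
        }
    }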

[Figure 2 shows the derivation tree τ1 = r1(r2, r4(r3, r5)), the term h(τ1) = ∗(John, ∗(walks, ∗(on, Mars))), and its evaluation to the string "John walks on Mars".]

(a) Tree τ1. (b) Term h(τ1). (c) h(τ1) evaluated in E∗.

Figure 2: The tree τ1, evaluated by the homomorphism h and the algebra E∗.

2.2 Interpreted regular tree grammars

We can combine tree automata, homomorphisms, and algebras into grammars that can describe languages of arbitrary objects, as well as relations between such objects – in a way that inherits many technical properties from context-free grammars, while extending the expressive capacity.

An interpreted regular tree grammar (IRTG, Koller and Kuhlmann (2011)) G = (M, (h1, A1), ..., (hn, An)) consists of a tree automaton M over some signature Σ, together with an arbitrary number n of interpretations (hi, Ai), where each Ai is an algebra over some signature ∆i and each hi is a tree homomorphism from Σ to ∆i. The automaton M describes a language L(M) of derivation trees, which represent abstract syntactic structures. Each derivation tree τ is then interpreted n ways: we map it to a term hi(τ) ∈ T∆i, and then we evaluate hi(τ) to a value ai = ⟦hi(τ)⟧Ai ∈ Ai of the algebra Ai. Thus, the IRTG G defines a language L(G) = {(⟦h1(τ)⟧A1, ..., ⟦hn(τ)⟧An) | τ ∈ L(M)}, which is an n-place relation between the domains of the algebras.

Consider the IRTG shown in Fig. 1. The "automaton rule" column indicates the five rules of M; the final state is S. We already saw the derivation tree τ1 ∈ L(M). G has a single interpretation, into a string algebra E∗, and with a homomorphism that is specified by the "homomorphism" column; for instance, h(r1) = ∗(x1, x2) and h(r2) = John. Applying this homomorphism to τ1, we obtain the term h(τ1) in Fig. 2b. As we saw earlier, this term evaluates in the string algebra to the string "John walks on Mars" (Fig. 2c). Thus this string is an element of L(G).

We assume that no two rules of M use the same terminal symbol; this is generally not required in tree automata, but every IRTG can be brought into this convenient form. Furthermore, we focus (but only for simplicity of presentation) on IRTGs that use a single string-algebra interpretation, as in Fig. 1. Such grammars capture context-free grammars. However, IRTGs can capture a wide variety of grammar formalisms by using different algebras. For instance, an interpretation that uses a TAG string algebra (or TAG derived-tree algebra) models a tree-adjoining grammar (Koller and Kuhlmann, 2012), and an interpretation into an s-graph algebra models a hyperedge replacement graph grammar (HRG, Groschwitz et al. (2015)). By using multiple algebras, IRTGs can also represent synchronous grammars and (bottom-up) tree-to-tree and tree-to-string transducers. In general, any grammar formalism whose grammars describe derivations in terms of a finite set of states can typically be converted into IRTG.

2.3 Parsing IRTGs

Koller and Kuhlmann (2011) present a uniform parsing algorithm for IRTGs based on tree automata. The (monolingual) parsing problem of IRTG consists in determining, for an IRTG G and an input object a ∈ A, a representation of the set parses(a) = {τ ∈ L(M) | ⟦h(τ)⟧A = a}, i.e. of the derivation trees that are grammatically correct and are mapped to a by the interpretation. In the example, we have parses("John walks on Mars") = {τ1}, where τ1 is as above. In general, parses(a) may be infinite, and thus we aim to represent it using a tree automaton Cha with L(Cha) = parses(a), the parse chart of a.

We can compute Cha as follows. First, observe that parses(a) = L(M) ∩ h⁻¹(terms(a)), where h⁻¹(L) = {τ ∈ TΣ | h(τ) ∈ L} (the inverse homomorphic image, or invhom, of L) and terms(a) = {t ∈ T∆ | ⟦t⟧A = a}, i.e. the set of all terms that evaluate to a. Now assume that the algebra A is regularly decomposable, which means that every a ∈ A has a decomposition automaton Da, i.e. there is a tree automaton Da such that L(Da) = terms(a). Because regular tree languages are closed under invhom and intersection, we can then compute a tree automaton Cha by intersecting M with the invhom of Da.

To illustrate the IRTG parsing algorithm, let us compute a chart for the sentence s = "John walks on Mars" with the example grammar of Fig. 1. The states of the decomposition automaton Ds are spans [i, k] of s; the final state is XF = [1, 5]. The automaton contains fourteen rules, including the ones shown in Fig. 3a.
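As an illustration of regular decomposability for the string algebra, the following minimal sketch (our own illustrative code, not the Alto API) enumerates the rules of the decomposition automaton Ds: one nullary rule per word and one ∗-rule per span and split point. For "John walks on Mars" it prints exactly the fourteen rules mentioned above.

    import java.util.*;

    public class Decomposition {
        public static void main(String[] args) {
            String[] words = {"John", "walks", "on", "Mars"};
            int n = words.length;
            List<String> rules = new ArrayList<>();
            // nullary rules for the words: [i, i+1] -> w_i
            for (int i = 1; i <= n; i++)
                rules.add(span(i, i + 1) + " -> " + words[i - 1]);
            // binary rules [i,k] -> *([i,j], [j,k]) for every split point j
            for (int width = 2; width <= n; width++)
                for (int i = 1; i + width <= n + 1; i++) {
                    int k = i + width;
                    for (int j = i + 1; j < k; j++)
                        rules.add(span(i, k) + " -> *(" + span(i, j) + ", " + span(j, k) + ")");
                }
            rules.forEach(System.out::println);  // 14 rules; the final state is [1, 5]
        }

        static String span(int i, int k) { return "[" + i + ", " + k + "]"; }
    }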

(a) Some rules of Ds:
  [1, 5] → ∗([1, 2], [2, 5])
  [2, 5] → ∗([2, 3], [3, 5])
  [3, 5] → ∗([3, 4], [4, 5])
  [3, 4] → on
  [4, 5] → Mars

(b) Some rules of I = h⁻¹(Ds):
  [1, 5] → r1([1, 2], [2, 5])
  [2, 5] → r4([2, 3], [4, 5])
  [2, 4] → r1([2, 3], [3, 4])
  [1, 2] → r2
  [2, 3] → r3

(c) The parse chart Chs:
  S[1, 5] → r1(NP[1, 2], VP[2, 5])
  VP[2, 5] → r4(VP[2, 3], NP[4, 5])
  NP[1, 2] → r2
  VP[2, 3] → r3
  NP[4, 5] → r5

Figure 3: Example rules for the sentence s = “John walks on Mars”

Algorithm 1 Naive bottom-up intersection
1: initialize agenda with state pairs for constants
2: initialize P as empty
3: while agenda is not empty do
4:   T′X′ ← pop(agenda)
5:   add T′X′ to P
6:   for T″X″ ∈ P do
7:     for {T1X1, T2X2} = {T′X′, T″X″} do
8:       for T → r(T1, T2) in ML do
9:         for X → r(X1, X2) in MR do
10:          store TX → r(T1X1, T2X2)
11:          add TX to agenda if new

We can then compute the invhom automaton I, such that L(I) = h⁻¹(L(Ds)). I uses the same states as Ds, but uses terminal symbols from Σ instead of ∆. Some rules of the invhom automaton I in the example are shown in Fig. 3b. Notice that I also contains rules that are not consistent with M, i.e. that would not occur in a grammatical parse of the sentence, such as [2, 4] → r1([2, 3], [3, 4]). Finally, the chart Chs is computed by intersecting M with I (see Fig. 3c). The states of Chs are pairs of states from M and states from I. It accepts τ1, because τ1 ∈ parses(s). Observe the similarity to a traditional context-free parse chart.

3 Bottom-up intersection

Both the practical efficiency of this algorithm and its asymptotic complexity depend crucially on how we compute intersection and invhom. We illustrate this using an overly naive intersection algorithm as a strawman, and then analyze the problem to lay the foundations for the improved algorithms in Sections 4 and 5.

Let's say that we want to compute a tree automaton C for the intersection of a "left" automaton ML and a "right" automaton MR, both over the same signature Σ (we assume binary symbols for simplicity; all algorithms generalize to arbitrary arities). In the application to IRTG parsing, ML is typically the derivation tree automaton (called M above) and MR is the invhom of a decomposition automaton. As in the product construction for finite string automata, the states of C will be pairs TX of states T of ML and states X of MR, and the rules of C will all have the form TX → r(T1X1, ..., TnXn), where T → r(T1, ..., Tn) is a rule in ML, and X → r(X1, ..., Xn) is a rule in MR.

3.1 Naive intersection

A naive bottom-up algorithm is shown in Alg. 1. This algorithm maintains an agenda of state pairs that have been discovered, but not yet explored as children of bottom-up rule applications; and a chart-like set P of all state pairs that have ever been popped off the agenda. The algorithm maintains the invariant that if TX is on the agenda or in P, then T and X are partners (written T ≈ X), i.e. there is a tree t ∈ TΣ such that T →∗ t in ML and X →∗ t in MR.

The agenda is initialized with all state pairs TX for which ML has a rule T → r and MR has a rule X → r for some nullary symbol r ∈ Σ. Then, while there are state pairs left on the agenda, Alg. 1 pops a state pair T′X′ off the agenda and adds it to P; iterates over all state pairs T″X″ in P; and queries ML and MR bottom-up for rules in which these states appear as children (for the invhom automaton, this can be done by substituting the variables in the homomorphic image h(r) with the corresponding states X′ and X″, and running the decomposition automaton on the resulting tree). The iteration in line 7 allows T′ and X′ to be either left or right children in these rules. For each pair of left and right rules, the rules are combined into a rule of C, and the pair of the parent states T and X is added to the agenda.
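To make the discussion concrete, here is a direct transcription of Alg. 1 as a minimal Java sketch, with automata given as plain rule lists (our own illustrative types, not the Alto API).

    import java.util.*;

    public class NaiveIntersection {
        record Rule(String parent, String label, List<String> children) {}
        record Pair(String t, String x) {}   // a state pair TX

        static Set<Rule> intersect(List<Rule> left, List<Rule> right) {
            Set<Rule> result = new LinkedHashSet<>();
            Deque<Pair> agenda = new ArrayDeque<>();
            Set<Pair> seen = new HashSet<>();
            // line 1: initialize the agenda with state pairs for constants
            for (Rule l : left)
                for (Rule r : right)
                    if (l.children().isEmpty() && r.children().isEmpty()
                            && l.label().equals(r.label())) {
                        Pair p = new Pair(l.parent(), r.parent());
                        result.add(new Rule(code(p), l.label(), List.of()));
                        if (seen.add(p)) agenda.push(p);
                    }
            List<Pair> explored = new ArrayList<>();                      // the set P
            while (!agenda.isEmpty()) {                                   // line 3
                Pair p1 = agenda.pop();                                   // line 4
                explored.add(p1);                                         // line 5
                for (Pair p2 : explored)                                  // line 6
                    for (Pair[] kids : new Pair[][] {{p1, p2}, {p2, p1}}) // line 7: both orders
                        for (Rule l : left)                               // line 8: query M_L
                            if (l.children().equals(List.of(kids[0].t(), kids[1].t())))
                                for (Rule r : right)                      // line 9: query M_R
                                    if (r.label().equals(l.label())
                                            && r.children().equals(List.of(kids[0].x(), kids[1].x()))) {
                                        Pair parent = new Pair(l.parent(), r.parent());
                                        result.add(new Rule(code(parent), l.label(),
                                                List.of(code(kids[0]), code(kids[1]))));  // line 10
                                        if (seen.add(parent)) agenda.push(parent);        // line 11
                                    }
            }
            return result;
        }

        static String code(Pair p) { return p.t() + p.x(); }

        public static void main(String[] args) {
            List<Rule> ml = List.of(new Rule("S", "r1", List.of("NP", "VP")),
                    new Rule("NP", "r2", List.of()), new Rule("VP", "r3", List.of()));
            List<Rule> mr = List.of(new Rule("[1,3]", "r1", List.of("[1,2]", "[2,3]")),
                    new Rule("[1,2]", "r2", List.of()), new Rule("[2,3]", "r3", List.of()));
            intersect(ml, mr).forEach(System.out::println);  // chart for "John walks"
        }
    }

The nested loops in lines 6-9 of the sketch are exactly where the complexity problem discussed next arises.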

This naive intersection algorithm yields an asymptotic complexity for IRTG parsing that is higher than expected. Assume, for example, that we are parsing with an IRTG encoding of a context-free grammar, i.e. with a string algebra (as in Fig. 1). Then the states of MR are spans [i, k], i.e. MR has O(n²) states. Once line 4 has picked a span X′ = [i, j], line 6 iterates over all spans X″ = [k, l] that have been discovered so far – including ones in which j ≠ k and i ≠ l. Thus the bottom-up lookup in line 9 is executed O(n⁴) times, most of which will yield no rules. The overall runtime of Alg. 1 is therefore higher than the asymptotic runtime of O(n³) expected for context-free parsing. Similar problems arise for other algebras; for instance, the runtime of Alg. 1 for TAG parsing is O(n⁸) rather than O(n⁶).

3.2 Indexing

In context-free parsing algorithms, such as CKY or Earley, this issue is addressed through appropriate index data structures, which organize P such that the lookup in line 5 only returns state pairs where X″ is of the form [j, k] or [k, i]. This reduces the runtime to cubic.

The idea of obtaining optimal asymptotic complexities in IRTG parsing through appropriate indexing was already mentioned from a theoretical perspective by Koller and Kuhlmann (2011). However, they assumed an optimal indexing data structure as given. In practice, indexing requires algebra-specific knowledge about X″: a CKY-style index only works if we assume that the states of the decomposition automaton are spans (this is not the case in other algebras), and that the only binary operation in the string algebra is ∗, which composes spans in a certain way. Furthermore, in IRTG parsing the rules of the invhom automaton do not directly correspond to algebra operations, but to terms of operations, which further complicates indexing.

In this paper, we incorporate indexing into the intersection algorithm through sibling-finders. A sibling-finder S = S(M, r) for an automaton M and a label r in M's signature is a data structure that supports a single operation, BU(S, i, X′). We require that a call to BU(S, i, X′) returns the set of rules X → r(X1, ..., Xn) of M such that X′ is the i-th child state, and for every j ≠ i, Xj is a state for which we previously called BU(S, j, Xj). Thus a sibling-finder performs a bottom-up rule lookup, changing its state after each call by caching the state and position.

Algorithm 2 Bottom-up intersection with BU
1: initialize agenda with state pairs for constants
2: generate new Sr = S(MR, r) for every r ∈ Σ
3: while agenda is not empty do
4:   T′X′ ← pop(agenda)
5:   for T → r(T1, T2) in ML s.t. Ti = T′ do
6:     for X → r(X1, X2) ∈ BU(Sr, i, X′) do
7:       store rule TX → r(T1X1, T2X2)
8:       add TX to agenda if new

Assume that we have sibling-finders for MR. Then we can modify the naive Alg. 1 to the closely related algorithm shown as Alg. 2. This algorithm maintains the same agenda as Alg. 1, but instead of iterating over all explored partner states T″X″, it first iterates over all rules in ML that have T′ as a child (line 5). In line 6, Alg. 2 then queries the MR sibling-finders – we maintain one for each rule label – for right rules with matching rule label and child positions. Note that because there is only one rule with label r in ML, the sibling-finders implicitly keep track of the partners of T2 we have seen so far. Thus they play the role of a more structured variant of P.

There are a number of ways in which sibling-finders can be implemented. First, they could simply maintain sets chi(Sr, i), where a call to BU(Sr, i, X′) first adds X′ to chi(Sr, i). The query can then iterate over the set chi(Sr, 3−i), to check for each state X″ in that set whether MR actually contains a rule with terminal symbol r and children X′ and X″ (in the right order). This essentially reimplements the behavior of Alg. 1, and comes with the same complexity issues.

Second, we could theoretically iterate over all rules of MR to implement the sibling-finders via a bottom-up index (e.g., a trie) that supports efficient BU queries. However, in IRTG parsing MR is the invhom of a decomposition automaton. Because the decomposition automaton represents all the ways in which the input object can be built recursively out of smaller structures, including ones which will later be rejected by the grammar, such automata can be very large in practice. Thus we would like to work with a lazy representation of MR and avoid iterating over all rules.

4 Efficient bottom-up lookup

Finally, we can exploit the fact that in IRTG parsing, MR is the invhom of a decomposition automaton. Below, we first show how to define algebra-specific sibling-finders for decomposition automata. Then we develop an algebra-independent way to generate invhom sibling-finders out of those for the decomposition automata. These can be plugged into Alg. 2 to achieve the expected parsing complexity.
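Before turning to the algebra-specific versions, the following minimal sketch fixes the sibling-finder contract as a Java interface, together with the naive set-based implementation described in Section 3.2 (illustrative names and types, not the Alto API).

    import java.util.*;
    import java.util.function.BiFunction;

    // The sibling-finder contract: BU(S, i, X') returns the rules in which X'
    // is the i-th child and the sibling was seen in an earlier BU call.
    interface SiblingFinder<S, R> {
        List<R> bu(int i, S state);   // i is 1 or 2 for binary rules
    }

    // The naive implementation: remember every state seen at each child
    // position and re-check all candidate pairs against the automaton. This is
    // correct, but reintroduces the complexity problem of Alg. 1.
    class NaiveSiblingFinder<S, R> implements SiblingFinder<S, R> {
        private final List<Set<S>> chi = List.of(new HashSet<>(), new HashSet<>());
        private final BiFunction<S, S, List<R>> ruleLookup;  // rules with children (x1, x2)

        NaiveSiblingFinder(BiFunction<S, S, List<R>> ruleLookup) { this.ruleLookup = ruleLookup; }

        public List<R> bu(int i, S state) {
            chi.get(i - 1).add(state);            // cache the state and position
            List<R> rules = new ArrayList<>();
            for (S sibling : chi.get(2 - i))      // the states seen at position 3 - i
                rules.addAll(i == 1 ? ruleLookup.apply(state, sibling)
                                    : ruleLookup.apply(sibling, state));
            return rules;
        }
    }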

4.1 Sibling-finders for decomposition automata

First, consider the special case of sibling-finders for a decomposition automaton D. The terminal symbols f of D are the operation symbols of an algebra. If we have information about the operations of this algebra, and how they operate on the states of D, a sibling-finder S = S(D, f) can use indexing specific to the operation f to look up potential siblings, and only for them query D to answer BU(S, i, X).

For instance, a sibling-finder for the '∗' operation of the string algebra may store all states [k, l] for i = 1 under the index l. Thus a lookup BU(S, 2, [l, m]) can directly retrieve siblings from the l-bin, just as a traditional parse chart would. Spans which do not end at l are never considered. Different algebras require different index structures. For instance, sibling-finders for the string-wrapping operation in the TAG string algebra might retrieve all pairs of substrings [k, l, m, o] that wrap around [l, m] instead. Analogous data structures can be defined for the s-graph algebra.

4.2 Sibling-finders for invhom

We can build upon the D-sibling-finders to construct sibling-finders for the invhom I of D. The basic idea is as follows. Consider the term h(r1) = ∗(x1, x2) from Fig. 1. It contains a single operation symbol ∗ (plus variables); the homomorphism only replaces one symbol with another. Thus a sibling-finder S(D, ∗) from the decomposition automaton can directly serve as a sibling-finder S(I, r1). We only need to replace the label ∗ on the returned rules with r1.

In general, the situation is more complicated, because t = h(r) may be a complex term consisting of many algebra operations. In such a case, we construct a separate sibling-finder Sr,π = S(D, t(π)) for each node π with at least two children. For instance, consider the term t = h(r4) in Fig. 1. It contains three nodes which are labeled by algebra operations, two of which are the concatenation ∗. We decorate these with the sibling-finders Sr4,ε and Sr4,2. Each of these is a sibling-finder for the algebra's concatenation operation; but they may have different state because they received different queries.

Algorithm 3 passUpwards(Y, π, i, r)
1: rules ← BU(Sr,π, i, Y)
2: if π = π′k ≠ ε then
3:   for X → f(X1, ..., Xn) ∈ rules do
4:     passUpwards(X, π′, k, r)

We can then construct an invhom sibling-finder Sr = S(I, r), which answers a query BU(Sr, i, X′) in two steps. First, we substitute the variable xi by the state X′ and percolate it upward through t using the D-sibling-finders on the path from xi to the root. If π = π′k is the path to xi, we do this by calling passUpwards(X′, π′, k, r), as defined in Alg. 3. If the local sibling-finder returns rules and we are not at the root yet, we recursively call passUpwards at the parent node π′ with each parent state of these rules.

As we do this, we let each sibling-finder maintain the set of rules it found, indexed by their parent state. This allows us to perform the second step: we traverse t top-down from the root to extract the rules of the invhom automaton that answer the BU query. Recall that BU(Sr, i, X′) should return only rules X → r(X1, X2) where BU(Sr, 3−i, X₃₋ᵢ) was called before. Here, this is guaranteed by having distinct D-sibling-finders Sr,π for every node π of every tree h(r). A final detail is that before the first query for r, we initialize the sibling-finders by calling passUpwards for all the leaves that are labeled by constants.

This process is illustrated in Fig. 4, on the sibling-finder S = S(I, r4) and the input string "John walks on a hill on Mars", parsed with a suitable extension of the IRTG in Fig. 1. The decomposition automaton can accept the word "on" from states [3, 4] and [6, 7], which during initialization are entered into position 1 of the lower D-sibling-finder Sr4,2, indexed by their end positions (a). Alg. 2 may then generate the query BU(S, 2, [4, 6]). This enters [4, 6] into the lower D-sibling-finder, and because there is a state with end position 4 on the left side of this sibling-finder, BU(Sr4,2, 2, [4, 6]) returns a rule with parent [3, 6]. The parent is subsequently entered into the upper sibling-finder (b). Finally, the query BU(S, 1, [2, 3]) enters [2, 3] into the upper D-sibling-finder and discovers its sibling [3, 6] (c). This yields a state X = [2, 6] for the whole phrase "walks on a hill".
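The following minimal sketch implements the end-position index for the string algebra's concatenation described in Section 4.1; the two BU calls in main mirror stages (a) and (b) of Fig. 4 below (illustrative code, not the Alto API).

    import java.util.*;

    public class ConcatSiblingFinder {
        record Span(int from, int to) {}
        record Rule(Span parent, Span left, Span right) {}

        // Left children [k, l] are binned under their end position l, right
        // children [l, m] under their start position l, so a query only ever
        // touches spans that actually meet at the split point.
        private final Map<Integer, List<Span>> leftByEnd = new HashMap<>();
        private final Map<Integer, List<Span>> rightByStart = new HashMap<>();

        public List<Rule> bu(int i, Span span) {
            List<Rule> rules = new ArrayList<>();
            if (i == 1) {
                leftByEnd.computeIfAbsent(span.to(), k -> new ArrayList<>()).add(span);
                for (Span right : rightByStart.getOrDefault(span.to(), List.of()))
                    rules.add(new Rule(new Span(span.from(), right.to()), span, right));
            } else {
                rightByStart.computeIfAbsent(span.from(), k -> new ArrayList<>()).add(span);
                for (Span left : leftByEnd.getOrDefault(span.from(), List.of()))
                    rules.add(new Rule(new Span(left.from(), span.to()), left, span));
            }
            return rules;
        }

        public static void main(String[] args) {
            ConcatSiblingFinder s = new ConcatSiblingFinder();
            s.bu(1, new Span(3, 4));                      // init with one "on" occurrence, cf. Fig. 4a
            System.out.println(s.bu(2, new Span(4, 6)));  // finds the parent [3, 6], cf. Fig. 4b
        }
    }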

[Figure 4 shows the two D-sibling-finders for t = h(r4) = ∗(x1, ∗(on, x2)) – an upper one at the root ∗ and a lower one at the inner ∗ – at three stages. (a) After initialization, the lower sibling-finder holds the states [3, 4] and [6, 7] for "on" at position 1, indexed under their end positions 4 and 7; all other bins are empty. (b) After BU(S, 2, [4, 6]), the lower sibling-finder has combined [3, 4] with [4, 6] and entered the resulting parent [3, 6] at position 2 of the upper sibling-finder, under index 3. (c) After BU(S, 1, [2, 3]), the upper sibling-finder holds [2, 3] at position 1 and finds its sibling [3, 6].]

Figure 4: Three stages of BU on S(I, r4) for the sentence “John walks on a hill on Mars”.

The top-down traversal of the sibling-finders reveals that this state is reached by combining x1 = [2, 3], for which this BU query asked, with x2 = [4, 6], and thus the BU query yields the rule [2, 6] → r4([2, 3], [4, 6]). A subsequent query BU(S, 2, [4, 8]) would yield the rule [2, 8] → r4([2, 3], [4, 8]), and so on.

The overall construction allows us to answer BU queries on invhom automata while making use of algebra-specific index structures. Given suitable index structures, the asymptotic complexity drops down to the expected levels, e.g. O(n³) for IRTGs using the string algebra, O(n⁶) for the TAG string algebra, and so on. This yields a practical algorithm that can be flexibly adapted to new algebras by implementing their sibling-finders.

5 Top-down intersection

Instead of investing into efficient bottom-up queries, we can also explore the use of top-down queries. These ask for all rules with parent state X and terminal symbol r. Such queries completely avoid the problem of finding siblings in MR. An invhom automaton can answer top-down queries for r efficiently by running the decomposition automaton top-down on h(r), collecting child states at the variable nodes. For instance, if we query I from Section 2 top-down for rules with the parent [1, 5] and symbol r1, it will enumerate the rules [1, 5] → r1([1, 2], [2, 5]), [1, 5] → r1([1, 3], [3, 5]), and [1, 5] → r1([1, 4], [4, 5]), without ever considering any other combination of child states.

Algorithm 4 Top-down intersection
1: function expand(X):
2:   if X ∉ visited then
3:     add X to visited
4:     for X → r(X1, X2) in MR do
5:       call expand(Xi) for i = 1, 2
6:       for T → r(T1, T2) s.t. Ti ∈ prt(Xi) do
7:         store rule TX → r(T1X1, T2X2)
8:         add T to prt(X)

This is the idea underlying the intersection algorithm in Alg. 4. It recursively visits states X of MR, collecting for each X a set prt(X) of states T of ML such that T ≈ X. Line 5 ensures that the prt sets have been computed for both child states of the rule X → r(X1, X2). Line 6 then does a bottom-up lookup of ML rules with the terminal symbol r and with child states that are partners of X1 and X2. Applied to our running example, Alg. 4 parses "John walks on Mars" by recursive calls on expand([1, 5]) and expand([2, 5]), following the rules of I top-down. Recursive calls for [2, 3] and [4, 5] establish VP ∈ prt([2, 3]) and NP ∈ prt([4, 5]), which enables the recursive call for [2, 5] to apply r4 in line 6 and consequently add VP to prt([2, 5]) in line 8.

The algorithm mixes top-down queries to MR with bottom-up queries to ML. Line 6 implements the core idea of the CKY parser, in that it performs bottom-up queries on sets of nonterminals that are partners of adjacent spans – but generalized to arbitrary IRTGs instead of just the string algebra. The top-down query to MR in line 4 is bounded by the number of rules that actually exist in MR, which is O(n³) for the string algebra, O(n⁶) in the TAG string algebra, and O(nˢ · 3^{ds} dˢ) for graphs of degree d and treewidth s − 1 in the graph algebra. Thus Alg. 4 achieves the same asymptotic complexity as native parsing algorithms.
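The following minimal sketch transcribes Alg. 4 for automata given as plain rule lists (illustrative Java, our own types rather than the Alto API; a real implementation would index the rules of MR by parent state instead of scanning a list).

    import java.util.*;

    public class TopDownIntersection {
        record Rule(String parent, String label, List<String> children) {}

        List<Rule> left, right;                        // rules of M_L and M_R
        Set<String> visited = new HashSet<>();
        Map<String, Set<String>> prt = new HashMap<>();
        List<Rule> chart = new ArrayList<>();

        TopDownIntersection(List<Rule> left, List<Rule> right) {
            this.left = left; this.right = right;
        }

        void expand(String x) {
            if (!visited.add(x)) return;               // lines 2-3
            for (Rule r : right) {                     // line 4: rules with parent x
                if (!r.parent().equals(x)) continue;
                for (String child : r.children()) expand(child);   // line 5
                for (Rule l : left) {                  // line 6: bottom-up lookup on M_L
                    if (!l.label().equals(r.label())
                            || l.children().size() != r.children().size()) continue;
                    boolean ok = true;
                    for (int i = 0; i < l.children().size(); i++)
                        ok &= partners(r.children().get(i)).contains(l.children().get(i));
                    if (ok) {
                        List<String> kids = new ArrayList<>();
                        for (int i = 0; i < l.children().size(); i++)
                            kids.add(l.children().get(i) + r.children().get(i));
                        chart.add(new Rule(l.parent() + x, l.label(), kids));  // line 7
                        partners(x).add(l.parent());                           // line 8
                    }
                }
            }
        }

        Set<String> partners(String x) { return prt.computeIfAbsent(x, k -> new HashSet<>()); }

        public static void main(String[] args) {
            List<Rule> ml = List.of(new Rule("S", "r1", List.of("NP", "VP")),
                    new Rule("NP", "r2", List.of()), new Rule("VP", "r3", List.of()));
            List<Rule> mr = List.of(new Rule("[1,3]", "r1", List.of("[1,2]", "[2,3]")),
                    new Rule("[1,2]", "r2", List.of()), new Rule("[2,3]", "r3", List.of()));
            TopDownIntersection td = new TopDownIntersection(ml, mr);
            td.expand("[1,3]");                     // start from M_R's final state
            td.chart.forEach(System.out::println);  // chart rules for "John walks"
        }
    }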

[Figures 5 and 6 plot the runtime in ms for computing the complete chart (log scale, 10¹ to 10⁷) against sentence length (0 to 50), for the bottom-up, top-down, condensed top-down, and sibling-finder algorithms.]

Figure 5: Runtimes for context-free parsing.

Figure 6: Runtimes for TAG parsing.

Condensed top-down intersection. One weakness of Alg. 4 is that it iterates over all rules X → r(X1, X2) of MR individually. This can be extremely wasteful when MR is the invhom of a decomposition automaton, because it may contain a great number of rules that have the same states and only differ in the terminal symbol r. For instance, when we encode a context-free grammar as an IRTG, for every rule r of the form A → B C we have h(r) = ∗(x1, x2). The rules of the invhom automaton are the same for all terminal symbols r with the same term h(r). But Alg. 4 iterates over rules r and not over different terms h(r), repeating the exact same computation for every binary rule of the context-free grammar.

To solve this, we define condensed tree automata, which have rules of the form X → ρ(X1, ..., Xn), where ρ ⊆ Σ is a nonempty set of symbols with arity n. A condensed automaton represents the tree automaton which for all condensed rules X → ρ(X1, ..., Xn) and all r ∈ ρ has the rule X → r(X1, ..., Xn). It is straightforward to represent an invhom automaton as a condensed automaton, by determining for each distinct homomorphic image t the set ρt = {r1, ..., rk} of symbols with h(ri) = t.

We can modify Alg. 4 to iterate over condensed rules in line 4, and to iterate in line 6 over the rules T → r(T1, T2) for which Ti ∈ prt(Xi) and r ∈ ρ. This bottom-up query to ML can be answered efficiently from an appropriate index on the rules of ML. Altogether, this condensed intersection algorithm can be dramatically faster than the original version, if the grammar contains many symbols with the same homomorphic image.
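A minimal sketch of this condensation step: group the terminal symbols by their homomorphic image, so that the invhom rules for each distinct term are computed only once. This is illustrative Java, not the Alto API; the extra symbol r6 and the string rendering of terms are our own assumptions, not part of the grammar in Fig. 1.

    import java.util.*;

    public class Condenser {
        public static void main(String[] args) {
            // h as a map from symbols to (a rendering of) their image h(r).
            Map<String, String> h = Map.of(
                    "r1", "*(x1, x2)", "r6", "*(x1, x2)",  // two binary rules, same image
                    "r2", "John", "r5", "Mars");
            Map<String, Set<String>> rho = new TreeMap<>();
            for (Map.Entry<String, String> e : h.entrySet())
                rho.computeIfAbsent(e.getValue(), k -> new TreeSet<>()).add(e.getKey());
            // {*(x1, x2)=[r1, r6], John=[r2], Mars=[r5]}: the invhom rules for
            // *(x1, x2) are computed once and shared by r1 and r6.
            System.out.println(rho);
        }
    }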

6 Evaluation

We compare the runtime performance of the proposed algorithms on practical grammars and inputs, from three very different grammar formalisms: context-free grammars, TAG, and HRG graph grammars. In each setting, we measure the runtime of four algorithms: the naive bottom-up baseline of Section 3; the sibling-finder algorithm from Section 4; and the non-condensed and the condensed version of the top-down algorithm from Section 5. The results are shown in Figures 5, 6 and 7. We measure the runtimes for computing the complete chart, and plot the geometric mean of runtimes for each input size on a log scale.

We measured all runtimes on an Intel Xeon E7-8857 CPU at 3 GHz using Java 8. The JVM was warmed up before the measurements. The parser filtered each grammar automatically, removing all rules whose homomorphic image contained a constant that could not be used for a given input (e.g., a word that did not occur in the sentence).

PCFG. We extracted a binarized context-free grammar with 6929 rules from Section 00 of the Penn Treebank, and parsed the sentences of Section 00 with it. The homomorphism in the corresponding IRTG assigns every terminal symbol a constant or the term ∗(x1, x2), as in Fig. 1. As a consequence, the condensed automaton optimization from Section 5 outperforms all other algorithms, achieving a 100x speedup over the naive bottom-up algorithm at the point where the latter was cancelled.

TAG. We also extracted a tree-adjoining grammar from Section 00 of the PTB as described by Chen and Vijay-Shanker (2000), converted it to an IRTG as described by Koller and Kuhlmann (2012), and binarized it, yielding an IRTG with 26652 rules. Each term h(r) in this grammar represents an entire TAG elementary tree, which means the terms are much more complex than for the PCFG, and there are far fewer terminal symbols with the same homomorphic term. As a consequence, condensing the invhom is much less helpful. However, the sibling-finder algorithm excels at maintaining state information within each elementary tree, yielding a 1000x speedup over the naive bottom-up algorithm at the point where the latter was cancelled.

[Figure 7: Runtimes for graph parsing – runtime in ms (log scale, 10¹ to 10⁷) against node count (1 to 10) for the bottom-up, top-down, condensed top-down, sibling-finder, and GKT 15 parsers.]

Graphs. Finally, we parsed a corpus of graphs instead of strings, using the 13681-rule graph grammar of Groschwitz et al. (2015) to parse the 1258 graphs with up to 10 nodes from the "Little Prince" AMR-Bank (Banarescu et al., 2013). The top-down algorithms are slow in this experiment, confirming Groschwitz et al.'s findings. Again, the sibling-finder algorithm outperforms all other algorithms. Note that Groschwitz et al.'s parser ("GKT 15" in Fig. 7) shares much code with our system. It uses the same decomposition automata, but a less mature version of the sibling-finder method, which fully computes the invhom automaton. Our new system achieves a 9x speedup for parsing the whole corpus, compared to GKT 15.

7 Related Work

Describing parsing algorithms at a high level of abstraction has a long tradition in computational linguistics, e.g. in deductive parsing with parsing schemata (Shieber et al., 1995). A key challenge under this view is to index chart entries so they can be retrieved efficiently, which parallels the situation in automata intersection discussed here. Gómez-Rodríguez et al. (2009) present an algorithm that automatically establishes index structures guaranteeing optimal asymptotic runtime, but that also requires algebra-specific extensions for grammar formalisms going beyond context-free string grammars.

Efficient parsing has also been studied in other generalized grammar formalisms beyond IRTG. Kanazawa (to appear) shows how the parsing problem of Abstract Categorial Grammars (de Groote, 2001) can be translated into Datalog, which enables the use of generic indexing strategies for Datalog to achieve optimal asymptotic complexity. Ranta (2004) discusses parsing for his Grammatical Framework formalism in terms of partial evaluation techniques from functional programming, which are related to the step-by-step evaluation of sibling-finders in Fig. 4. Like the approach of Gómez-Rodríguez et al., these methods have not been evaluated on large-scale grammars and realistic evaluation data, which makes it hard to judge their relative practical merits.

Most work in the tree automata community has a theoretical slant, and there is less research on the efficient implementation of algorithms for tree automata than one would expect; Cleophas (2009) and Lengal et al. (2012) are notable exceptions. Even these tend to be motivated by applications such as specification and verification, where the tree automata are much smaller and much less ambiguous than in computational linguistics. This makes these systems hard to apply directly.

8 Conclusion

We have presented novel algorithms for computing the intersection and the inverse homomorphic image of finite tree automata. These can be used to implement a generic algorithm for IRTG parsing, and apply directly to any grammar formalism that can be represented as an IRTG. An evaluation on practical data from three different grammar formalisms shows consistent speed improvements of several orders of magnitude, and our graph parser has the fastest published runtimes. A Java implementation of our algorithms is available as part of the Alto parser, http://bitbucket.org/tclup/alto.

We focused here purely on symbolic parsing, and on computing complete parse charts. In the presence of a probability model (e.g. for IRTG encodings of PCFGs), our algorithms could be made faster through the use of appropriate pruning techniques. It would also be interesting to combine the strengths of the condensed and sibling-finder algorithms for further efficiency gains.

Acknowledgments. We thank the anonymous reviewers for their comments. We are grateful to Johannes Gontrum for an early implementation of Alg. 4, and to Christoph Teichmann for many fruitful discussions. This work was supported by the DFG grant KO 2916/2-1.

References

Laura Banarescu, Claire Bonial, Shu Cai, Madalina Georgescu, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Martha Palmer, and Nathan Schneider. 2013. Abstract meaning representation for sembanking. In Proceedings of the Linguistic Annotation Workshop (LAW VII-ID).
John Chen and K. Vijay-Shanker. 2000. Automated extraction of TAGs from the Penn Treebank. In Proceedings of IWPT.
David Chiang. 2007. Hierarchical phrase-based translation. Computational Linguistics, 33(2):201–228.
David Chiang, Jacob Andreas, Daniel Bauer, Karl Moritz Hermann, Bevan Jones, and Kevin Knight. 2013. Parsing graphs with hyperedge replacement grammars. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL).
Stephen Clark and James Curran. 2007. Wide-coverage efficient statistical parsing with CCG and log-linear models. Computational Linguistics, 33(4):493–552.
Loek Cleophas. 2009. Forest FIRE and FIRE Wood: Tools for tree automata and tree algorithms. In Proceedings of the Conference on Finite-State Methods and Natural Language Processing (FSMNLP).
Hubert Comon, Max Dauchet, Rémi Gilleron, Florent Jacquemard, Denis Lugiez, Christof Löding, Sophie Tison, and Marc Tommasi. 2008. Tree automata techniques and applications. http://tata.gforge.inria.fr/.
Philippe de Groote. 2001. Towards abstract categorial grammars. In Proceedings of the 39th ACL/10th EACL.
Michel Galley, Mark Hopkins, Kevin Knight, and Daniel Marcu. 2004. What's in a translation rule? In Proceedings of HLT/NAACL.
Carlos Gómez-Rodríguez, Jesús Vilares, and Miguel A. Alonso. 2009. A compiler for parsing schemata. Software: Practice and Experience, 39(5):441–470.
Jonathan Graehl, Kevin Knight, and Jonathan May. 2008. Training tree transducers. Computational Linguistics, 34(3).
Jonas Groschwitz, Alexander Koller, and Christoph Teichmann. 2015. Graph parsing with s-graph grammars. In Proceedings of the 53rd ACL and 7th IJCNLP.
Aravind K. Joshi and Yves Schabes. 1997. Tree-Adjoining Grammars. In G. Rozenberg and A. Salomaa, editors, Handbook of Formal Languages, volume 3, pages 69–123. Springer-Verlag, Berlin.
Laura Kallmeyer and Wolfgang Maier. 2013. Data-driven parsing using probabilistic linear context-free rewriting systems. Computational Linguistics, 39(1):87–119.
Makoto Kanazawa. To appear. Parsing and generation as Datalog query evaluation. IfCoLog Journal of Logics and Their Applications.
Alexander Koller and Marco Kuhlmann. 2011. A generalized view on parsing and translation. In Proceedings of the 12th International Conference on Parsing Technologies (IWPT).
Alexander Koller and Marco Kuhlmann. 2012. Decomposing TAG algorithms using simple algebraizations. In Proceedings of the 11th TAG+ Workshop.
Ondrej Lengal, Jiri Simacek, and Tomas Vojnar. 2012. VATA: A library for efficient manipulation of non-deterministic tree automata. In C. Flanagan and B. König, editors, Tools and Algorithms for the Construction and Analysis of Systems: 18th International Conference, TACAS 2012. Springer.
Mike Lewis and Mark Steedman. 2014. A* CCG parsing with a supertag-factored model. In Proceedings of EMNLP.
Aarne Ranta. 2004. Grammatical Framework: A type-theoretical grammar formalism. Journal of Functional Programming, 14(2):145–189.
Nina Seemann, Fabienne Braune, and Andreas Maletti. 2015. String-to-tree multi bottom-up tree transducers. In Proceedings of the 53rd ACL and 7th IJCNLP.
Stuart M. Shieber, Yves Schabes, and Fernando C. N. Pereira. 1995. Principles and implementation of deductive parsing. The Journal of Logic Programming, 24(1):3–36.
Mark Steedman. 2001. The Syntactic Process. MIT Press, Cambridge, MA.
Yuk Wah Wong and Raymond J. Mooney. 2006. Learning for semantic parsing with statistical machine translation. In Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT/NAACL-2006).
