Parsing Algorithms Based on Tree Automata
Andreas Maletti
Departament de Filologies Romàniques
Universitat Rovira i Virgili, Tarragona, Spain

Giorgio Satta
Department of Information Engineering
University of Padua, Italy

Abstract

We investigate several algorithms related to the parsing problem for weighted automata, under the assumption that the input is a string rather than a tree. This assumption is motivated by several natural language processing applications. We provide algorithms for the computation of parse-forests, best tree probability, inside probability (called partition function), and prefix probability. Our algorithms are obtained by extending to weighted tree automata the Bar-Hillel technique, as defined for context-free grammars.

1 Introduction

Tree automata are finite-state devices that recognize tree languages, that is, sets of trees. There is a growing interest nowadays in the natural language parsing community, and especially in the area of syntax-based machine translation, for probabilistic tree automata (PTA) viewed as suitable representations of grammar models. In fact, probabilistic tree automata are generatively more powerful than probabilistic context-free grammars (PCFGs), when we consider the latter as devices that generate tree languages. This difference can be intuitively understood if we consider that a computation by a PTA uses hidden states, drawn from a finite set, that can be used to transfer information within the tree structure being recognized. As an example, in written English we can empirically observe different distributions in the expansion of so-called noun phrase (NP) nodes, in the contexts of subject and direct-object positions, respectively. This can be easily captured using some states of a PTA that keep a record of the different contexts. In contrast, PCFGs are unable to model these effects, because NP node expansion should be independent of the context in the derivation. This problem for PCFGs is usually solved by resorting to so-called parental annotations (Johnson, 1998), but this, of course, results in a different tree language, since these annotations will appear in the derived tree.

Most of the theoretical work on parsing and estimation based on PTA has assumed that the input is a tree (Graehl et al., 2008), in accordance with the very definition of these devices. However, both in parsing as well as in machine translation, the input is most often represented as a string rather than a tree. When the input is a string, some trick is applied to map the problem back to the case of an input tree. As an example in the context of machine translation, assume a probabilistic tree transducer T as a translation model, and an input string w to be translated. One can then intermediately construct a tree automaton M_w that recognizes the set of all possible trees that have w as yield, with internal nodes from the input alphabet of T. This automaton M_w is further transformed into a tree transducer implementing a partial identity translation, and such a transducer is composed with T (relation composition). This is usually called the 'cascaded' approach. Such an approach can be easily applied also to parsing problems.

In contrast with the cascaded approach above, which may be rather inefficient, in this paper we investigate a more direct technique for parsing strings based on weighted and probabilistic tree automata. We do this by extending to weighted tree automata the well-known Bar-Hillel construction defined for context-free grammars (Bar-Hillel et al., 1964) and for weighted context-free grammars (Nederhof and Satta, 2003). This provides an abstract framework under which several parsing algorithms can be directly derived, based on weighted tree automata. We discuss several applications of our results, including algorithms for the computation of parse-forests, best tree probability, inside probability (called partition function), and prefix probability.

2 Preliminary definitions

Let S be a nonempty set and · be an associative binary operation on S. If S contains an element 1 such that 1 · s = s = s · 1 for every s ∈ S, then (S, ·, 1) is a monoid. A monoid (S, ·, 1) is commutative if the equation s_1 · s_2 = s_2 · s_1 holds for every s_1, s_2 ∈ S. A commutative semiring (S, +, ·, 0, 1) is a nonempty set S on which a binary addition + and a binary multiplication · have been defined such that the following conditions are satisfied:

• (S, +, 0) and (S, ·, 1) are commutative monoids,
• · distributes over + from both sides, and
• s · 0 = 0 = 0 · s for every s ∈ S.
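To make the algebraic setting concrete, the following is a minimal Python sketch of a commutative semiring; it is not part of the paper, and the name Semiring as well as the two example instances (the real semiring used for probabilities and the tropical semiring) are our own illustrative choices.

```python
from dataclasses import dataclass
from typing import Callable, Generic, TypeVar

T = TypeVar("T")

@dataclass(frozen=True)
class Semiring(Generic[T]):
    """A commutative semiring (S, +, ., 0, 1), given by its two operations
    and its two distinguished elements (illustrative helper, not from the paper)."""
    add: Callable[[T, T], T]   # binary addition +
    mul: Callable[[T, T], T]   # binary multiplication .
    zero: T                    # additive identity 0, absorbing for .
    one: T                     # multiplicative identity 1

# Real (probability) semiring: ([0, inf), +, *, 0, 1).
REAL = Semiring(add=lambda a, b: a + b, mul=lambda a, b: a * b, zero=0.0, one=1.0)

# Tropical semiring: (R U {inf}, min, +, inf, 0), the usual choice for best-parse search.
TROPICAL = Semiring(add=min, mul=lambda a, b: a + b, zero=float("inf"), one=0.0)
```

Any structure satisfying the three conditions above can be passed around in the same way; the weighted automata defined next are parameterized by such a semiring.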
A weighted string automaton, abbreviated WSA (Schützenberger, 1961; Eilenberg, 1974), is a system M = (Q, Σ, S, I, ν, F) where

• Q is a finite alphabet of states,
• Σ is a finite alphabet of input symbols,
• S = (S, +, ·, 0, 1) is a semiring,
• I : Q → S assigns initial weights,
• ν : Q × Σ × Q → S assigns a weight to each transition, and
• F : Q → S assigns final weights.

We now proceed with the semantics of M. Let w ∈ Σ* be an input string of length n. For each integer i with 1 ≤ i ≤ n, we write w(i) to denote the i-th character of w. The set Pos(w) of positions of w is {i | 0 ≤ i ≤ n}. A run of M on w is a mapping r : Pos(w) → Q. We denote the set of all such runs by Run_M(w). The weight of a run r ∈ Run_M(w) is

    \mathrm{wt}_M(r) = \prod_{i=1}^{n} \nu(r(i-1), w(i), r(i)) .

We assume the right-hand side of the above equation evaluates to 1 in case n = 0. The WSA M recognizes the mapping¹ M : Σ* → S, which is defined for every w ∈ Σ* of length n by

    M(w) = \sum_{r \in \mathrm{Run}_M(w)} I(r(0)) \cdot \mathrm{wt}_M(r) \cdot F(r(n)) .

¹ We overload the symbol M to denote both an automaton and its recognized mapping. However, the intended meaning will always be clear from the context.
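The two equations above translate almost literally into code. The sketch below is ours, not the paper's algorithm: it computes M(w) by brute-force enumeration of all runs, reusing the Semiring helper from the previous sketch, and all argument names (Q, I, nu, F) are illustrative. A practical implementation would replace the enumeration by the standard forward dynamic program, which is linear in the length of w.

```python
from itertools import product

def wsa_weight(w, Q, semiring, I, nu, F):
    """Weight M(w) assigned by a WSA M = (Q, Sigma, S, I, nu, F) to the string w,
    computed literally as the sum over all runs r : {0, ..., n} -> Q of
    I(r(0)) . wt_M(r) . F(r(n)).  Exponential in |w|; for exposition only."""
    S = semiring
    n = len(w)
    total = S.zero
    for r in product(Q, repeat=n + 1):        # a run assigns a state to every position 0, ..., n
        wt = S.one                            # empty product is 1, covering the case n = 0
        for i in range(1, n + 1):             # wt_M(r) = prod_{i=1..n} nu(r(i-1), w(i), r(i))
            wt = S.mul(wt, nu(r[i - 1], w[i - 1], r[i]))
        total = S.add(total, S.mul(S.mul(I(r[0]), wt), F(r[n])))
    return total
```

For instance, over the REAL semiring and with I, nu, F given as Python functions, wsa_weight("ab", ["q0", "q1"], REAL, I, nu, F) sums the weights of the 8 runs on the two-symbol string "ab".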
In order to define weighted tree automata (Berstel and Reutenauer, 1982; Ésik and Kuich, 2003; Borchardt, 2005), we need to introduce some additional notation. Let Σ be a ranked alphabet, that is, an alphabet whose symbols have an associated arity. We write Σ_k to denote the set of all k-ary symbols in Σ. We use a special symbol e ∈ Σ_0 to syntactically represent the empty string ε. The set of Σ-trees, denoted by T_Σ, is the smallest set satisfying both of the following conditions:

• for every α ∈ Σ_0, the single node labeled α, written α(), is a tree of T_Σ,
• for every σ ∈ Σ_k with k ≥ 1 and for every t_1, ..., t_k ∈ T_Σ, the tree with a root node labeled σ and trees t_1, ..., t_k as its k children, written σ(t_1, ..., t_k), belongs to T_Σ.

As a convention, throughout this paper we assume that σ(t_1, ..., t_k) denotes σ() if k = 0. The size of the tree t ∈ T_Σ, written |t|, is defined as the number of occurrences of symbols from Σ in t. Let t = σ(t_1, ..., t_k). The yield of t is recursively defined by

    \mathrm{yd}(t) = \begin{cases} \sigma & \text{if } \sigma \in \Sigma_0 \setminus \{e\} \\ \varepsilon & \text{if } \sigma = e \\ \mathrm{yd}(t_1) \cdots \mathrm{yd}(t_k) & \text{otherwise.} \end{cases}

The set of positions of t, denoted by Pos(t), is recursively defined by

    \mathrm{Pos}(\sigma(t_1, \ldots, t_k)) = \{\varepsilon\} \cup \{\, iw \mid 1 \le i \le k,\ w \in \mathrm{Pos}(t_i) \,\} .

Note that |t| = |Pos(t)| and that, according to our convention, the above definition provides Pos(σ()) = {ε} when k = 0. We denote the symbol of t at position w by t(w) and its rank by rk_t(w).

A weighted tree automaton (WTA) is a system M = (Q, Σ, S, µ, F) where

• Q is a finite alphabet of states,
• Σ is a finite ranked alphabet of input symbols,
• S = (S, +, ·, 0, 1) is a semiring,
• µ is an indexed family (µ_k)_{k ∈ ℕ} of mappings µ_k : Σ_k → S^{Q × Q^k}, and
• F : Q → S assigns final weights.

In the above definition, Q^k is the set of all strings over Q having length k, with Q^0 = {ε}. Further note that S^{Q × Q^k} is the set of all matrices with elements in S, row index set Q, and column index set Q^k. Correspondingly, we will use the common matrix notation and write instances of µ in the form µ_k(σ)_{q_0, q_1 ⋯ q_k}. Finally, we assume that q_1 ⋯ q_k = ε if k = 0.

We define the semantics also in terms of runs. Let t ∈ T_Σ. A run of M on t is a mapping r : Pos(t) → Q. We denote the set of all such runs by Run_M(t). The weight of a run r ∈ Run_M(t) is

    \mathrm{wt}_M(r) = \prod_{\substack{w \in \mathrm{Pos}(t) \\ \mathrm{rk}_t(w) = k}} \mu_k(t(w))_{r(w),\, r(w1) \cdots r(wk)} .

Note that, according to our convention, the string r(w1) ⋯ r(wk) denotes ε when k = 0. The WTA M recognizes the mapping M : T_Σ → S, which is defined by

    M(t) = \sum_{r \in \mathrm{Run}_M(t)} \mathrm{wt}_M(r) \cdot F(r(\varepsilon))

for every t ∈ T_Σ. We say that t is recognized by M if M(t) ≠ 0.

Figure 1: Input tree t and encoded tree enc(t).
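As an illustration of the tree notation and of the run-based semantics defined above, the sketch below (again ours, with illustrative names, and again reusing the Semiring helper) represents Σ-trees, computes Pos(t) and yd(t), and evaluates M(t) by enumerating all runs. The transition table is modeled as a function mu(symbol, state, child_states) returning the entry µ_k(σ)_{q, q_1⋯q_k}, with k taken to be the length of child_states.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Tree:
    """A Sigma-tree sigma(t_1, ..., t_k); a leaf alpha() simply has no children."""
    label: str
    children: tuple = ()

def positions(t):
    """Pos(t), with positions encoded as tuples of 1-based child indices
    (the root position epsilon is the empty tuple)."""
    yield ()
    for i, child in enumerate(t.children, start=1):
        for p in positions(child):
            yield (i,) + p

def subtree(t, pos):
    """Subtree of t rooted at position pos, so that subtree(t, pos).label is t(pos)."""
    return t if not pos else subtree(t.children[pos[0] - 1], pos[1:])

def yield_of(t):
    """yd(t): left-to-right concatenation of leaf labels, with 'e' standing for epsilon."""
    if not t.children:
        return "" if t.label == "e" else t.label
    return "".join(yield_of(child) for child in t.children)

def wta_weight(t, Q, semiring, mu, F):
    """M(t) = sum over runs r : Pos(t) -> Q of wt_M(r) . F(r(epsilon)), where wt_M(r)
    multiplies mu_k(t(w))[r(w), r(w1)...r(wk)] over all positions w of t.
    Brute-force enumeration of runs, for exposition only."""
    S = semiring
    pos = list(positions(t))
    total = S.zero
    for states in product(Q, repeat=len(pos)):
        r = dict(zip(pos, states))            # one run: a state for every position of t
        wt = S.one
        for w in pos:
            sub = subtree(t, w)
            child_states = tuple(r[w + (i,)] for i in range(1, len(sub.children) + 1))
            wt = S.mul(wt, mu(sub.label, r[w], child_states))
        total = S.add(total, S.mul(wt, F(r[()])))
    return total
```

For example, with t = Tree("σ", (Tree("α"), Tree("β"))) we get yield_of(t) == "αβ", and wta_weight sums wt_M(r) · F(r(ε)) over the |Q|³ runs of this three-node tree.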