A Generalized View on Parsing and Translation

Alexander Koller
Dept. of Linguistics
University of Potsdam, Germany
[email protected]

Marco Kuhlmann
Dept. of Linguistics and Philology
Uppsala University, Sweden
[email protected]

Abstract

We present a formal framework that generalizes a variety of monolingual and synchronous grammar formalisms for parsing and translation. Our framework is based on regular tree grammars that describe derivation trees, which are interpreted in arbitrary algebras. We obtain generic parsing algorithms by exploiting closure properties of regular tree languages.

1 Introduction

Over the past years, grammar formalisms that relate pairs of grammatical structures have received much attention. These formalisms include synchronous grammars (Lewis and Stearns, 1968; Shieber and Schabes, 1990; Shieber, 1994; Rambow and Satta, 1996; Eisner, 2003) and tree transducers (Comon et al., 2007; Graehl et al., 2008). Weighted variants of both families of formalisms have been used for machine translation (Graehl et al., 2008; Chiang, 2007), where one tree represents a parse of a sentence in one language and the other a parse in the other language. Synchronous grammars and tree transducers are also useful as models of the syntax-semantics interface; here one tree represents the syntactic analysis of a sentence and the other the semantic analysis (Shieber and Schabes, 1990; Nesson and Shieber, 2006).

When such a variety of formalisms is available, it is useful to take a step back and look for a generalized model that explains the precise formal relationship between them. There is a long tradition of such research on monolingual grammar formalisms, where e.g. linear context-free rewriting systems (LCFRS, Vijay-Shanker et al. (1987)) generalize various mildly context-sensitive formalisms. However, few such results exist for synchronous formalisms. A notable exception is the work by Shieber (2004), who unified synchronous tree-adjoining grammars with tree transducers.

In this paper, we make two contributions. First, we provide a formal framework, interpreted regular tree grammars, which generalizes synchronous grammars, tree transducers, and LCFRS-style monolingual grammars. A grammar of this formalism consists of a regular tree grammar (RTG, Comon et al. (2007)) defining a language of derivation trees, and an arbitrary number of interpretations which map these trees into objects of arbitrary algebras. This allows us to capture a wide variety of (synchronous and monolingual) grammar formalisms. We can also model heterogeneous synchronous languages, which relate e.g. trees with strings; this is necessary for applications in machine translation (Graehl et al., 2008) and in parsing strings with synchronous tree grammars.

Second, we provide parsing and decoding algorithms for our framework. The key concept that we introduce is that of a regularly decomposable algebra, in which the set of all terms that evaluate to a given object forms a regular tree language. Once an algorithm that computes a compact representation of this language is known, parsing algorithms follow from a generic construction. All important algebras in natural language processing that we are aware of, in particular the standard algebras of strings and trees, are regularly decomposable.

In summary, we obtain a formalism that pulls together much existing research under a common formal framework, and makes it possible to obtain parsers for existing and new formalisms in a modular, universal fashion.

Plan of the paper. The paper is structured as follows. We start by laying the formal foundations in Section 2. We then introduce the framework of interpreted RTGs and illustrate it with some simple examples in Section 3. The generic parsing and decoding algorithms are described in Section 4. Section 5 discusses the role of binarization in our framework. Section 6 shows how interpreted RTGs can be applied to existing grammar formalisms.

Proceedings of the 12th International Conference on Parsing Technologies, pages 2–13, October 5-7, 2011, Dublin City University. © 2011 Association for Computational Linguistics

2 Formal Foundations

For n ≥ 0, we define [n] = {i | 1 ≤ i ≤ n}. A signature is a finite set Σ of function symbols f, each of which has been assigned a non-negative integer called its rank. Given a signature Σ, we can define a (finite constructor) tree over Σ as a finite tree whose nodes are labeled with symbols from Σ such that a node with a label of rank n has exactly n children. We write TΣ for the set of all trees over Σ. Trees can be written as terms; f(t1, ..., tn) stands for the tree with root label f and subtrees t1, ..., tn. The nodes of a tree can be identified by paths π ∈ N* from the root: the root has address ε, and the i-th child of the node at path π has the address πi. We write t(π) for the symbol at path π in the tree t.

A Σ-algebra A consists of a non-empty set A called the domain and, for each symbol f ∈ Σ with rank n, a total function fA : A^n → A, the operation associated with f. We can evaluate a term t ∈ TΣ to an object ⟦t⟧A ∈ A by executing the operations:

⟦f(t1, ..., tn)⟧A = fA(⟦t1⟧A, ..., ⟦tn⟧A).

Sets of trees can be specified by regular tree grammars (RTGs) (Gécseg and Steinby, 1997; Comon et al., 2007). Formally, such a grammar is a structure G = (N, Σ, P, S), where N is a signature of nonterminal symbols, all of which are taken to have rank 0, Σ is a signature of terminal symbols, S ∈ N is a distinguished start symbol, and P is a finite set of productions of the form B → t, where B is a nonterminal symbol and t ∈ T(N∪Σ). The productions of a regular tree grammar are used as rewriting rules on terms. More specifically, the derivation relation of G is defined as follows. Let t1, t2 ∈ T(N∪Σ) be terms. Then G derives t2 from t1 in one step, denoted by t1 ⇒G t2, if there exists a production of the form B → t and t2 can be obtained by replacing an occurrence of B in t1 by t. The (regular) language L(G) generated by G is the set of all terms t ∈ TΣ that can be derived, in zero or more steps, from the term S.

A (tree) homomorphism is a total function h: TΣ → T∆ which expands symbols of Σ into trees over ∆ while following the structure of the input tree. Formally, h is specified by pairs (f, h(f)), where f ∈ Σ is a symbol with some rank n, and h(f) ∈ T(∆ ∪ {x1,...,xn}) is a term with variables. Given t ∈ TΣ, the value of t under h is defined as

h(f(t1, ..., tn)) = h(f) {h(ti)/xi | i ∈ [n]},

where {ti/xi | i ∈ [n]} represents the substitution that replaces all occurrences of xi with the respective ti. A homomorphism is called linear if every term h(f) contains each variable at most once; and a delabeling if every term h(f) is of the form g(x_π(1), ..., x_π(n)), where n is the rank of f and π a permutation of {1, ..., n}.

3 Interpreted Regular Tree Grammars

We will now present a generalized framework for synchronous and monolingual grammars in terms of regular tree grammars, tree homomorphisms, and algebras. We will illustrate the framework with two simple examples here, but many other grammar formalisms can be seen as special cases too, as we will show in Section 6.

3.1 An Introductory Example

The derivation process of a context-free grammar is usually seen as a string-rewriting process in which nonterminals are successively replaced by the right-hand sides of production rules. The actual parse tree is explained as a post-hoc description of the rules that were applied in the derivation.

However, we can alternatively view this as a two-step process which first computes a derivation tree and then interprets it as a string. Say we have the CNF grammar G in Fig. 2, and we want to derive the string w = "Sue watches the man with the telescope". In the first step, we use G to generate a derivation tree like the one in Fig. 2a. The nodes of this tree are labeled with names of the production rules in G; nodes with labels r7 and r3 are licensed by G to be the two children of r1 because r1 has the two nonterminals NP and VP in its right-hand side, and the left-hand sides of r7 and r3 are NP and VP, respectively. In a second step, we can then interpret the derivation tree into w by interpreting each leaf labeled with a terminal production (say, r7) as the string on its right-hand side ("Sue"), and each internal node as a string concatenation operation which arranges the string yields of its subtrees in the order given by the right-hand side of the production rule.

This view differs from the traditional perspective on context-free grammars in that it makes the derivation tree the primary participant in the derivation process. The string is only one particular interpretation of the derivation tree, and instead of a string we could also have interpreted it as some other kind of object. For instance, if we had inter-
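The definitions above (terms, evaluation in an algebra, and tree homomorphisms) can be sketched in a few lines of executable Python. This is an illustrative sketch, not the paper's implementation; the Tree encoding, the rule names r1..r5, and the toy grammar (S → NP VP, VP → V NP, NP → John | Mary, V → loves) are assumptions made only for this example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class Tree:
    """A finite constructor tree: a label and exactly rank-many children."""
    label: str
    children: tuple = ()

def evaluate(t: Tree, ops: Dict[str, Callable]) -> object:
    """Evaluate a term in a Sigma-algebra: [[f(t1..tn)]] = f_A([[t1]],..,[[tn]])."""
    return ops[t.label](*(evaluate(c, ops) for c in t.children))

def apply_hom(t: Tree, h: Dict[str, Tree]) -> Tree:
    """Apply a tree homomorphism given by terms h(f) over variables x1..xn.
    (Assumption: only variables carry labels starting with 'x'.)"""
    def subst(pattern: Tree, args: List[Tree]) -> Tree:
        if pattern.label.startswith("x"):
            return args[int(pattern.label[1:]) - 1]
        return Tree(pattern.label, tuple(subst(c, args) for c in pattern.children))
    return subst(h[t.label], [apply_hom(c, h) for c in t.children])

# String algebra A_s: constants for terminals, binary concatenation 'conc'.
string_ops = {
    "conc": lambda a, b: a + " " + b,
    "John": lambda: "John", "loves": lambda: "loves", "Mary": lambda: "Mary",
}
```

With a homomorphism mapping each binary rule to conc(x1, x2) and each lexical rule to its terminal, the derivation tree r1(r3, r2(r5, r4)) is first expanded to a term over the string algebra and then evaluated to "John loves Mary", mirroring the two-step view described above.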

preted r7 as the tree NP(Sue) and a rule like r1 as an operation that takes the two subtrees and inserts them as the children of a root with label S, etc., the interpretation of a derivation tree would itself be a tree, namely an ordinary parse tree.

While this picture seems complicated for context-free grammars by themselves, the separation into two different generative processes (first the derivation tree, then the string from the derivation tree) is applicable much more widely and, we argue, widely useful. The general picture looks as follows. Consider a regular tree grammar G over a signature Σ, an algebra A with signature ∆, and a homomorphism h: TΣ → T∆. If we apply h to any tree t ∈ L(G), we obtain a term over A, which we can interpret as an element of A. By collecting all such terms, we obtain a language

L_A(G, h) = { ⟦h(t)⟧A | t ∈ L(G) }

of elements of A. This perspective is illustrated in Fig. 1a. We can define an obvious membership problem: given an object a ∈ A, decide whether a ∈ L_A(G, h).

In our example, we interpret derivation trees into strings in the string algebra A_s over some terminal alphabet T: the elements of this algebra are the strings in T*, and we have constants for the elements of T and a binary string concatenation operation ·. Let us use these definitions to make our example of context-free grammars as string-generating devices precise. This is a case with a single interpretation (n = 1), as illustrated in Fig. 1a. We map each rule r into a term over A_s: for a rule r whose right-hand side is ω1 B1 ω2 ... Bn ωn+1 (where the Bi are nonterminals and the ωi are possibly empty strings of terminals), we have h(r) = ω1 · x1 · ω2 · ... · xn · ωn+1. It can be shown that under this construction, L_A(G, h) is exactly the string language generated by the context-free grammar.

Finally, a tree transducer is a device M for describing binary relations between trees; the first tree in each pair is usually seen as the input and the second as the output. Tree transducers generalize string transducers to the tree case and are defined in more detail in (Comon et al., 2007). A useful way of thinking of a tree transducer is in terms of bimorphisms. A bimorphism is a triple B = (h1, G, h2) of an RTG G and two homomorphisms h1, h2; it represents the binary relation {(h1(t), h2(t)) | t ∈ L(G)}.

Figure 1: Our unified perspective on grammar formalisms: (a) ordinary grammar formalisms; (b) synchronous formalisms; (c) multiple "inputs" and "outputs".

Figure 2: A context-free grammar and one of its derivation trees. The example grammar contains the rules S → NP VP, VP → V NP, NP → John, NP → Mary, and V → loves.
aframework grammar Formally,grammarsG =(actly is usingN, a suchstructureΣL(G,P,S ordinary( aGecseg´ grammar)),, the whereframework andgrammarsG context-free string=( Steinby,isNN, ais structureΣ languageaactly signature(G,P,S using 1997; grammarsecseg´ L), ordinary( whereG Comonframeworkand of of), nonter- the the Steinby,N context-freeis originalet string a signature usingmomorphisms,and 1997; language synchronous gram- ordinary grammars Comon of nonter-actly of context-free and et tree-substitutionL the( algebras.Gmomorphisms,and original), synchronous the grammars string We gram- grammars, will and language tree-substitution illustrate algebras. but the of We the grammars, will original illustrate but gram- the Givenparse some tree element of G. WeaGiven couldA, evensome is a interpret elementLA( the,ha same)?WeA,Given is aWe some canLA adapt( element,h a) standard?Wea constructionA, is a (GoguenLA( ,h)?We G =(N,Σ,P,S), where N∈Gis=( a signatureN,Σ,P,S∈ of) nonter-, whereG =(N∈N,andisΣ a synchronous signature,P,Sal.,minal) 2007)., where∈ symbols, of nonter- tree-substitution Formally,NGis all aof signatureandal.,minal which such synchronous 2007). grammars,a symbols,∈ are of grammar nonter- taken Formally, all to tree-substitution but is haveof∈ aand the which suchstructure rank synchronous aG are0 grammar, taken grammars,framework to tree-substitution is have a structure but rank is using muchthe0, ordinary moregrammars,framework general context-free but is using than muchthe this, ordinary more grammars and general we context-free than this, grammars and we can alsoderivation define tree a parsing simultaneouslycan also problem define as a: string For a parsing every and as aele- problemcanet al., alsomar. 1977).: For define Let everyG a beparsing ele- a context-freemar. problem grammar: For with every ele- mar. 
minal symbols, all of whichminal are taken symbols, to have all rank ofminal which0, symbols,framework are takenGΣ all=(is to is haveofaN, muchsignature whichΣ rank,P,S more are0,) of, taken where general terminalframeworkGΣ =(is toN haveaN, than signatureis symbols,Σ a rank is signature,P,S this, much0 and,) of,S where more terminalframework weof nonter-N generalNisis symbols,a a is signature thanandwill much synchronous hint this,S more of at and nonter-N this generalis we at a tree-substitution the thanandwill end synchronous hint ofthis, the at and section. this grammars, we at tree-substitution the end but of the the section. grammars, but the tree using two different interpretation functions. nonterminals N, terminals T , and productions∈ P . ∈ mentΣ is a signatureLA( of,h terminal), computementΣ symbols,is a signature (someLSA( compact ofN,h terminalis)Σ, a computeis a repre-will signature symbols, hintmentminaldistinguished (some at of thisSa symbols, terminalFor atcompactNL the illustration,Ais start( endall symbols,a ,h of of symbol, repre-will)minaldistinguished which the, compute hint section.S symbols, consider are and at takenN thisPForis (somestartis at all ato a the the illustration,haveof finite symbol,will end which compact context-free rank sethint of arethe and of0 at, takensection. thisP repre- considerframeworkis at to a the gram-have finite end rank theis set ofFor much the of0 context-free, section. 
illustration, moreframework general gram- is thanconsider much this, more and the general we context-free than this, and gram- we ∈ G ∈ ∈G ∈∈ G ∈ 3.1 Ordinary grammars3.1 Ordinary grammars sentationdistinguished3.2 of) Interpreted start symbol, Regularsentationdistinguished and P Treeis of) a finite Grammars start set symbol,distinguished of and PsentationΣproductionsWe startisis a start a finite symbol, signature by of) set ofdefining and of the of formP terminalΣproductions ais regularis aB a finite signature symbols,t tree, set where of grammarof the ofSB form terminalisN aB non-.is For symbols,a t, wherewill hintSB atisN thisa non-is at a thewill end hint of the at section. this at the end of the section. mar in Fig. 2a, and→ let’smar say in Fig. we∈ wantG 2a,→ and to parse let’s say∈ themar we in want Fig. to 2a, parse and the let’s say we want to parse the productions of the form B productionst, where ofB theis a form non-productionsB 3.1t, Ordinarywhere ofdistinguishedterminala the stringB formisα symbol,grammars aB non-( startN t and, symbol, whereT3.1distinguishedterminal)t∗, let OrdinaryBT andNntis symbol,(Σα aP.) non-denote Thestartis grammars a and productionsfinite symbol,3.1 thet set string OrdinaryT and ofN ΣP.The Theis grammars a process productionsfinite set of of generatingThe process a string of from generating a context- a string from a context- ∈ ∪ ∈ ∪ ∈ ∪ While this view is unnecessarily→ complex for→ ex- of nonterminalssentence→ “John in α loves, in the Mary”.sentence same order. The “John RTG We lovesin- for3.1 the Mary”. gram- Ordinarysentence TheG grammars RTG3.1 “John for Ordinary the loves gram-G Mary”. grammars The RTG for the gram- terminalparses symbol,A, ,h(a and)=t tTterminalNparsesLΣ(. The) symbol,A, productionsh,h((ta) and)=terminalt= taTTheN. symbol,LΣ process(.productionsof The)parses aregular and productions ofh(t generatingtA,) tree ofT,hN the= grammar(aΣa form)=.The aproductionsof The. 
string aB process regular aret productions from usedLt,( tree whereof asa) generatingthe context- rewritinggrammarB formhThe(ist) aB process rulesare non- a string= usedt,a whereoffree as from. generating rewriting grammarB a context-is a rules non- a stringcanfree be from seen grammar a as context- a two-stepcan be process. seen as a two-step process. plainingG context-free{∈ ∈ grammars∪ G | G alone, theA separa-{∈ }∈ ∪ G | ∈GA ∪ } { →∈ G | A→ } of a regular tree grammar areof used a regular as rewriting tree grammar rulesof a regular arefree used grammar tree asterminalonclude rewritinggrammar terms.mar intoG symbol, containscan More rulesareall be used specifically, (and seen andfree asterminalon rules only)t as rewriting terms. grammar aT two-step productionsN thesuch symbol,mar MoreΣderivation. rulesG Theas process. containscan specifically, and productionsS offree bet therelation seen grammar formT rulesN asr the1Σ a(derivationNP,VP. two-stepTheInG The such acan firstprocess productions beas process.step, relation)mar seen; S of we generatingas contains generate a two-stepTheInr a1( firstprocessNP,VP a rulesderivation string process.step, of we from such generating); generateof aG ascontext-byS aex-derivation string fromr1(ofNP,VP aG context-by ex-); tion into two different generative processes (first G ∈ ∪ →∈ ∪ → → Weon terms. call the More trees specifically, overWeonΣ the terms.derivation callderivation the More trees relationspecifically, trees overon, terms. andΣIn the a thederivation More firstWederivationofA step, aG specifically, callit regularis generates wep defined( the relationA trees generate1 tree,...,A trees the, grammar as andderivation follows. 
the aInm overofderivation) a the, aG derivationfirst whereregular areisΣ Let useddefinedstep, relationderivationitpof treet generates1 as we,t=G rewriting2grammar as treebygenerateA follows.In ex-TN ashown trees firstαrulesareΣ the aisbederivation Let, usedstep, a derivation and intfreepanding1 as we,t Fig. the rewriting2 generategrammarof G 2b.T nonterminalsNit treeby rulesΣ ageneratesex-Gbederivation showncanfreepanding be using seen grammar inof the productionG asFig. nonterminals derivationby a two-step ex-G 2b.can rules. be process. using treeseen The production as shown a two-step in rules. Fig. process. The 2b. derivation, then interpretation) is￿ widely￿ applicable. →￿ ￿ ∈￿ → ∪￿ ∈ ∪ of G is defined as follows.of LetG ist1,t defined2 T asN follows.ΣofbeG ispanding Let definedt1onproduction,t nonterminals2 asterms. follows.TN MoreΣ ofbe LetusingG specifically,, andtpanding1on,t production2A terms.1 T thenonterminalsNA MorederivationmΣ rules.be= specifically,nt Thepanding(α using relation). Note production thenonterminalsderivationIn a first rules. step,using relation The we production generateIn a first arules.derivation step, The we generateof G by aex-derivation of G by ex- trees inInparses particular,(a) thethe derivation derivationtrees in partparses trees can∈ be(a of∪ independent) athe. derivationtrees treesThis in∈ parses of tree∪ a. can(a) the now derivation∈ be··· interpretedThis∪ tree trees can of usinga now. a be homomor- interpretedThis usingtree can a homomor- now be interpreted using a homomor- ofthatG byis defined doing so, as we follows.of viewG is Letp definedast1 a,t symbol2 as follows.TN ofΣ rankbe Let tpanding1,t2 T nonterminalsN Σ be panding using production nonterminals rules. using The production rules. 
The In the case of context-freeIn the grammars, case of context-free it is known grammars,Inphism the caseh it ofwith is context-freeknownh(r1)=phism grammars,x1 ∈hxwith2, ∪h it(rh is3()=r known1)=John∈x1phism,∪ x2, hh(rwith3)=h(Johnr1)=, x1 x2, h(r3)=John, of whether w is a string or some other algebra of ob- nt(α) . The nonterminals and the start· symbol · · | | h h h that thejects. language We formalize of derivationthat ourthe view language trees as follows. is a of regular First, derivation we treethatof trees theetc.are languageis as a for regularmapsG. We of the tree derivation now derivation interpretetc. trees the treemaps treesis ain regular gen- the Fig. derivation 2b tree to theetc. tree inmaps Fig. 2b the to derivation the tree in Fig. 2b to the G languageneed (Comon an interpretative et al.,language component 2007). (Comon It that is defined maps et deriva- al., by 2007). anlanguageeratedterm It is by defined(Comon(Johnover the byloves etstring an al.,) algebra 2007).termMary(over ItJohnover isT defined, whichAlovess, which by) anMary eval-termover(JohnAs, whichloves eval-) Mary over As, which eval- G · · · · · · RTG tionover trees the to signatureterms ofRTG the of relevant productionover object the signature algebra: rule names ofRTG productionweuates denoteover to by rule the theT . signature stringnames The domain “John ofuates production of loves this to the algebra Mary”. string rule is This names “John means lovesuates Mary”. to the This string means “John loves Mary”. This means G G G ∗ of theDefinition context-free 1 Let grammarΣofbe the a signature. context-freeG. For A everyΣ-interpre- grammar produc-ofGthe the.that set For context-free of it every all is strings a produc- derivation over grammarT , andthat tree weG. it have of For is that constants a every derivation string. produc- In tree fact, ofthat that it is string. a derivation In fact, tree of that string. 
In fact, tation is a pair = (h, ), where is a ∆-algebra, for the symbols in T and the empty string, as well tion rule r of theI formtionAA rule r ωAof1A the1 ...A formnωnA+1tionparses ruleω1Ar1(“John...Aof then lovesωn form+1 Mary”Aparses) is(ω“John the1A1 set...A loves thatn Mary”ω containsn+1 ) isparses the( set“John that loves contains Mary”) is the set that contains and h: TΣ T∆ is a homomorphism.→ 2 →as a single binary concatenation→ operation . As (where A and all→Ai are(where nonterminals,A and all andAi theareω nonterminals,i are(whereonlyA this and derivation theall Aωii are nonterminals, tree.only this derivation and• the ω tree.i are only this derivation tree. We then capture the derivation process with a a last step, we use a homorphism rb to map each single regular tree grammar, and connect it with rule of G into a term over the signature of T ∗: (potentially multiple) interpretations as follows: For each production p of the form above, rb(p) is the right-branching tree obtained from decom- Definition 2 An interpreted regular tree grammar posing α into a series of concatenation operations, (IRTG) is a structure G = ( , 1,..., n), n 1, G I I ≥ where the nonterminal Ai is replaced with the vari- where is a regular tree grammar with terminal G able xi. Thus we have constructed an IRTG gram- alphabet Σ, and the i are Σ-interpretations. 2 I mar G = ( , (rb,T ∗)). It can be shown that under Let = (h , ) be the ith interpretation. If we G Ii i Ai this construction L(G) is exactly L(G), the string apply the homomorphism h to any tree t L( ), i ∈ G language of the original grammar. we obtain a term hi(t), which we can evaluate to an Consider the context-free grammar in Fig. 2. object of i. Based on this, we define the language The RTG contains production rules such as A G generated by G as follows. We write t i as a S r (NP, VP); it generates an infinite language I → 1 shorthand for hi(t) . 
Ai of trees, including the derivation trees shown in J K L(G) = t 1 ,..., t n t L( ) Fig. 2a and 2b. These trees can now be interpreted {J h IK I i | ∈ G } using rb with rb(r ) = x x ,1 rb(r ) = Sue, etc. Given this notion, we can define an obvious 1 1 • 2 7 membership problemJ K : ForJ aK given tuple of objects This maps the tree in Fig. 2a to the term ( ( ( ( ( ))))) ~a = a1, . . . , an , is ~a L(G)? We can also de- Sue watches the man with the telescope h i ∈ • • • • • • fine a parsing task: For every element ~a L(G), over the signature of T , which evaluates in the ∈ ∗ compute (some compact representation of) the set algebra T ∗ to the string w mentioned earlier. Simi- parses (~a) = t L( ) i. t = ai larly, rb maps the tree in Fig. 2b to the term G { ∈ G | ∀ Ii } We call the trees in this set the derivation trees of ~a. 1Here and below, we write • in infix notation. J K 4 S t α1 t α2 α3 S @ α @ NP e NP e 1 NP loves NP @ e @ e1 ↓ α α NP ↓ loves NP ↓ 2 3 1 2 John j* Mary m* loves e loves e2 ↓ John Mary j* (a) (b) m* (c)

Figure 3: Synchronous TSG: (a) a lexicon consisting of three tree pairs; (b) a derived tree; (c) a derivation tree.

Similarly, rb maps the tree in Fig. 2b to the term Sue • ((watches • (the • man)) • (with • (the • telescope))). This means that L(G) is a set of strings which includes w. The tree language L(𝒢) contains further trees, which map to strings other than w. Therefore L(G) includes other strings, but the trees in Fig. 2a and 2b are the only two derivation trees of w.

Figure 2: A CFG and two of its derivation trees. The rules are r1: S → NP VP; r2: NP → Det N; r3: VP → V NP; r4: N → N PP; r5: VP → VP PP; r6: PP → P NP; r7: NP → Sue; r8: Det → the; r9: N → man; r10: N → telescope; r11: V → watches; r12: P → with. The two derivation trees are (a) r1(r7, r3(r11, r2(r8, r4(r9, r6(r12, r2(r8, r10)))))) and (b) r1(r7, r5(r3(r11, r2(r8, r9)), r6(r12, r2(r8, r10)))).

3.4 Synchronous Grammars

We can quite naturally represent grammars that describe binary relations between objects, i.e. synchronous grammars, as IRTGs with two interpretations (n = 2). We write these interpretations as IL = (hL, 𝒜L) ("left") and IR = (hR, 𝒜R) ("right"); see Fig. 1b.

Parsing with a synchronous grammar means to compute (a compact representation of) the set of all derivation trees for a given pair (aL, aR) from the set AL × AR. This is precisely the parsing task that we defined in Section 3.2. A related task is decoding, in which we take the grammar G = (𝒢, IL, IR) as a translation device for mapping input objects aL ∈ AL to output objects aR ∈ AR. We define the decoding task as the task of computing, for a given aL ∈ AL, (a compact representation of) the following set, where GL = (𝒢, IL):

    decodesG(aL) = { ⟦t⟧IR | t ∈ parsesGL(aL) }.

To illustrate this with an example, consider the case of synchronous tree-substitution grammars (STSGs, Eisner (2003)). An STSG combines lexicon entries, as shown in Fig. 3a, into larger derived trees by replacing corresponding substitution nodes with trees from other lexicon entries. In the figure, we have marked the correspondence with numeric subscripts. The trees in Fig. 3a can be combined into the derived tree in Fig. 3b in two steps; this process is recorded in the derivation tree in Fig. 3c.

We capture an STSG GS as an IRTG G by interpreting the (regular) language of derivation trees in appropriate tree algebras. The tree algebra T∆ over some signature ∆ consists of all trees over ∆; every symbol f ∈ ∆ of rank m is interpreted as an m-place operation that returns the tree with root symbol f and its arguments as subtrees. To model STSG, we use the two tree algebras over all the symbols occurring in the left and right components of the lexicon entries, respectively. We can obtain an RTG for the derivation trees using a standard construction (Schmitz and Le Roux, 2008; Shieber, 2004); its nonterminals are pairs ⟨AL, AR⟩ of nonterminals occurring in the left and right trees of GS. To encode a lexicon entry α with root nonterminals AL and AR, left substitution nodes A1L, . . . , AnL, and right substitution nodes A1R, . . . , AnR, we add an RTG rule of the form

    ⟨AL, AR⟩ → α(⟨A1L, A1R⟩, . . . , ⟨AnL, AnR⟩).

We also let hL(α) and hR(α) be the left and right tree of α, with substitution nodes replaced by variables; hL and hR interpret derivation trees into derived trees in tree algebras TΣ and T∆ of appropriate (and possibly different) signatures. In the example, we obtain

    ⟨S, t⟩ → α1(⟨NP, e⟩, ⟨NP, e⟩),
    hL(α1) = S(x1, loves, x2), and
    hR(α1) = t(@(@(loves, x2), x1)).

The variables reflect the corresponding substitution nodes. So if we let G = (𝒢, (hL, TΣ), (hR, T∆)), L(G) will be a language of pairs of derived trees, including the pair in Fig. 3b.

Parsing as defined above amounts to computing a common derivation tree for a given pair of derived trees.
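The two interpretations of one derivation tree can be sketched as follows (our own illustration, not the paper's code; the tuple encoding of terms and the helper `apply_hom` are our assumptions). We apply the homomorphisms hL and hR from the STSG example to the derivation tree α1(α2, α3) of Fig. 3c:

```python
# Sketch: a homomorphism maps each derivation-tree symbol to a term with
# variables "x1", "x2", ... standing for the images of its children.

def apply_hom(hom, tree):
    """Apply a tree homomorphism to a derivation tree, bottom-up."""
    image = hom[tree[0]]
    children = [apply_hom(hom, c) for c in tree[1:]]
    def subst(t):
        if isinstance(t, str) and t.startswith("x"):
            return children[int(t[1:]) - 1]  # replace variable by child image
        return (t[0],) + tuple(subst(c) for c in t[1:])
    return subst(image)

# Derivation tree alpha1(alpha2, alpha3) recorded in Fig. 3c.
deriv = ("alpha1", ("alpha2",), ("alpha3",))

# Left interpretation: h_L(alpha1) = S(x1, loves, x2).
h_left = {"alpha1": ("S", "x1", ("loves",), "x2"),
          "alpha2": ("John",), "alpha3": ("Mary",)}

# Right interpretation: h_R(alpha1) = t(@(@(loves, x2), x1)).
h_right = {"alpha1": ("t", ("@", ("@", ("loves",), "x2"), "x1")),
           "alpha2": ("j*",), "alpha3": ("m*",)}

left = apply_hom(h_left, deriv)    # left derived tree
right = apply_hom(h_right, deriv)  # right derived tree
```

The single derivation tree thus yields the pair ⟨S(John, loves, Mary), t(@(@(loves, m*), j*))⟩, which is the kind of element that populates L(G) for a synchronous grammar.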

Given only a left derived tree, the decoding problem is to compute the corresponding right derived trees. However, in an application of STSG to machine translation or semantic construction, we are typically given a string as the left input and want to decode it to a right output tree. We can support this by left-interpreting the derivation trees directly as strings: We use the appropriate string algebra T* (consisting of the symbols of Σ with arity zero) for 𝒜L, and map every lexicon entry to a term that concatenates the string yields of the elementary trees. In the example grammar, we can let h′L(α1) = ((x1 • loves) • x2), h′L(α2) = John, and h′L(α3) = Mary. With this local change, we obtain a new IRTG G′ = (𝒢, (h′L, T*), (hR, T∆)), whose language contains pairs of a (left) string and a (right) tree. One such pair has the string "John loves Mary" as the left and the tree in Fig. 3b as the right component. Therefore decodes("John loves Mary"), i.e. the set of right derived trees that are consistent with the input string, contains the right-hand tree in Fig. 3b.

We conclude this section by remarking that decoding can be easily generalized to n input objects and m output objects, all of which can be taken from different algebras (see Fig. 1c).

4 Algorithms

In the previous section, we have taken a view on parsing and translation in which languages and translations are obtained as the interpretation of regular tree grammars. One advantage of this way of looking at things is that it is possible to define completely generic parsing algorithms by exploiting closure properties of regular tree languages.

4.1 Parsing

The fundamental problem that we must solve is to compute, for a given IRTG G = (𝒢, (h, 𝒜)) and object a ∈ A, a regular tree grammar 𝒢a such that L(𝒢a) = parsesG(a). A parser for IRTGs with multiple interpretations follows from this immediately. Assume that G = (𝒢, I1, . . . , In); then

    parsesG(a1, . . . , an) = ⋂i parses(𝒢,Ii)(ai).

Because regular tree languages are closed under intersection, we can parse the different ai separately and then intersect all the 𝒢ai.

The general idea of our parsing algorithm is as follows. Suppose we were able to compute the set terms𝒜(a) of all possible terms t over 𝒜 that evaluate to a. Then parsesG(a) can be written as h⁻¹(terms𝒜(a)) ∩ L(𝒢). Of course, terms𝒜(a) may be a large or infinite set, so computing it in general algebras is infeasible. But now assume an algebra 𝒜 in which terms𝒜(a) is a regular tree language for every a ∈ A, and in which we can compute, for each a, a regular tree grammar D(a) with L(D(a)) = terms𝒜(a). Since regular tree languages are effectively closed under both inverse homomorphisms and intersections (Comon et al., 2007), we obtain a parsing algorithm which first computes D(a), and then computes h⁻¹(L(D(a))) ∩ L(𝒢) as the grammar 𝒢a. Formally, this can be done for the following class of algebras.

Definition 3 A Σ-algebra 𝒜 is called regularly decomposable if there is a computable function D(·) which maps every object a ∈ A to a regular tree grammar D(a) such that L(D(a)) = terms𝒜(a).

Consider the example of context-free grammars. We have shown in Section 3.3 how these can be seen as an IRTG with an interpretation into T*. The string algebra T* is regularly decomposable because the possible term representations of a string simply correspond to its bracketings: For a string w = w1 · · · wn, the grammar D(w) consists of a rule Ai−1,i → wi for each 1 ≤ i ≤ n, and a rule Ai,k → Ai,j • Aj,k for all 0 ≤ i < j < k ≤ n. In our example "Sue watches the man with the telescope" from Section 3.2, these are rules such as A2,3 → the, A3,4 → man, A2,4 → A2,3 • A3,4, and so on. The grammar generates a tree language consisting of the 132 binary bracketings of the sentence, including the two mentioned in Section 3.3.

Tree algebras are an even simpler example of a regularly decomposable algebra. For a given tree t ∈ TΣ, the grammar D(t) consists of the rules Aπ → f(Aπ1, . . . , Aπn) for all nodes π in t with label f. D(t) generates a language that contains a single tree, namely t itself. Thus we can use the parsing algorithm to parse tree inputs (say, in the context of an STSG) just as easily as string inputs.

4.2 Computing Inverse Homomorphisms

The performance bottleneck of the parsing algorithm is the computation of the inverse homomorphisms. The input of this problem is h and D(a); the task is to compute an RTG H′ that uses terminal symbols from the signature Σ of 𝒢 and the same nonterminals as D(a), such that h(L(H′)) = L(D(a)). This problem is nontrivial.
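The decomposition grammar D(w) for the string algebra can be written down directly from the rule schema above. The following sketch (our own code; the function names are our assumptions) builds the rules of D(w) and counts, bottom-up over spans, how many trees the grammar generates:

```python
# Sketch of the decomposition grammar D(w) for the string algebra:
# one rule A[i-1,i] -> w_i per word, and A[i,k] -> A[i,j] . A[j,k]
# for all 0 <= i < j < k <= n. D(w) generates the binary bracketings of w.

def decomposition_rules(words):
    n = len(words)
    rules = [((i - 1, i), words[i - 1]) for i in range(1, n + 1)]
    rules += [((i, k), ((i, j), (j, k)))
              for i in range(n)
              for j in range(i + 1, n)
              for k in range(j + 1, n + 1)]
    return rules

def count_trees(words):
    """Number of trees D(w) generates from each span, computed bottom-up."""
    n = len(words)
    count = {(i - 1, i): 1 for i in range(1, n + 1)}
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            k = i + width
            count[(i, k)] = sum(count[(i, j)] * count[(j, k)]
                                for j in range(i + 1, k))
    return count[(0, n)]

w = "Sue watches the man with the telescope".split()
print(count_trees(w))  # prints 132, the number of binary bracketings
```

For the seven-word example sentence this yields 132 bracketings (the Catalan number C6), matching the count in the text; the rule ((2, 3), "the") corresponds to A2,3 → the.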

The reason is that h may not be a delabeling, so a term h(f) may have to be parsed by multiple rule applications in D(a) (see e.g. h′L(α1) in Section 3.4), and we cannot simply take the homomorphic pre-images of the production rules of D(a). One approach (Comon et al., 2007) is to generate all possible production rules A → f(B1, . . . , Bm) out of terminals f ∈ Σ and D(a)-nonterminals and check whether A ⇒*D(a) h(f(B1, . . . , Bm)). Unfortunately, this algorithm blindly combines arbitrary tuples of nonterminals. For parsing with context-free grammars in Chomsky normal form, this approach leads to an O(n⁴) parsing algorithm.

The problem can be solved more efficiently by the algorithm in Fig. 4. This algorithm computes an RTG H′ for h⁻¹(L(H)), where H is an RTG in a normal form in which every rule contains a single terminal symbol; bringing a grammar into this form only leads to a linear size increase (Gécseg and Steinby, 1997). The algorithm derives items of the form [f, π, A, σ], stating that H can generate the subtree of h(f)σ at node π if it uses A as the start symbol; the substitution σ is responsible for replacing the variables in h(f) by nonterminal symbols. It starts by guessing all possible instantiations of each variable in h(f) (rule var). It then computes items bottom-up, deriving an item [f, π, A, σ] if there is a rule in H that can combine the nonterminals derived for the children π1, . . . , πn of π into A (rule up). The substitution σ is obtained by merging all mappings in the σ1, . . . , σn; if some σi, σj assign different nonterminals to the same variable, the rule fails.

    h(f)(π) = xi    A ∈ N(D(a))
    ───────────────────────────── (var)
        [f, π, A, {A/xi}]

    A → g(A1, . . . , An) in H    h(f)(π) = g
    [f, π1, A1, σ1]  · · ·  [f, πn, An, σn]
    σ = merge(σ1, . . . , σn) ≠ fail
    ───────────────────────────────────────── (up)
        [f, π, A, σ]

Figure 4: Algorithm for computing h⁻¹(H).

Whenever the algorithm derives an item of the form [f, ε, A, σ] for the root node ε, it has processed a complete tree h(f), and we add a production A → f(σ(x1), . . . , σ(xn)) to H′; for variables xi on which σ is undefined, we let σ(xi) = $ for the special nonterminal $. We also add rules to H′ which generate any tree from TΣ out of $.

The complexity of this algorithm is bounded by the number of instances of the up rule (McAllester, 2002). For parsing with context-free grammars, up is applied to rules of the form Al,r → Al,k • Ak,r of D(w); the premises are [f, π1, Al,k, σ1] and [f, π2, Ak,r, σ2], and the conclusion is [f, π, Al,r, σ]. The substitution σ defines a segmentation of the substring between positions l and r into smaller substrings. So the instances of up are uniquely determined by at most m + 1 string positions, where m is the total number of variables in the tree h(f); the parsing complexity is O(n^(m+1)). By our encoding of context-free grammars into IRTGs, m corresponds to the maximal number of nonterminals in the right-hand side of a production of the original grammar. In particular, the generic algorithm parses Chomsky normal form grammars (where m = 2) in cubic time, as expected.

4.3 Parse Charts

We will now illustrate the operation of the parsing algorithm with our example context-free grammar from Fig. 2 and our example sentence w = "Sue watches the man with the telescope". We first compute D(w), which generates all bracketings of the sentence. Next, we use the algorithm in Fig. 4 to compute a grammar H′ for h⁻¹(L(D(w))), the language of all derivation trees that are mapped by h to a term evaluating to w. H′ contains rules such as A2,4 → r2(A2,3, A3,4), A3,5 → r2(A3,4, A4,5), and A3,4 → man. That is, H′ uses terminal symbols from Σ, but the nonterminals from D(w). Finally, we intersect H′ with 𝒢 to retain only derivation trees that are grammatical according to 𝒢. We obtain a grammar 𝒢w for parses(w), which is shown in Fig. 5 (we have left out unreachable and unproductive rules). The nonterminals of 𝒢w are pairs of the form (N, Ai,k), i.e. nonterminals of 𝒢 and H′; we abbreviate these pairs as Ni,k. Note that L(𝒢w) consists of exactly two trees, the derivation trees shown in Fig. 2.

There is a clear parallel between the RTG in Fig. 5 and a parse chart of the CKY parser for the same input. The RTG describes how to build larger parse items from smaller ones, and provides exactly the same kind of structure sharing for ambiguous sentences that the CKY chart would. For all intents and purposes, the RTG is a parse chart.
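A quick way to see the tree-generating nature of the chart RTG in Fig. 5 is to enumerate its language (our own sketch; the `chart` encoding is an assumption, with nonterminals Ni,k written as strings):

```python
from itertools import product

# The "parse chart" RTG of Fig. 5, encoded as
# nonterminal -> list of (rule_name, child_nonterminals) productions.
chart = {
    "S07":  [("r1", ("NP01", "VP17"))],
    "VP17": [("r3", ("V12", "NP27")), ("r5", ("VP14", "PP47"))],
    "NP27": [("r2", ("Det23", "N37"))],
    "N37":  [("r4", ("N34", "PP47"))],
    "VP14": [("r3", ("V12", "NP24"))],
    "NP24": [("r2", ("Det23", "N34"))],
    "PP47": [("r6", ("P45", "NP57"))],
    "NP57": [("r2", ("Det56", "N67"))],
    "NP01": [("r7", ())], "V12": [("r11", ())], "Det23": [("r8", ())],
    "N34":  [("r9", ())], "P45": [("r12", ())], "Det56": [("r8", ())],
    "N67":  [("r10", ())],
}

def language(nt):
    """Enumerate all derivation trees the chart RTG generates from nt."""
    trees = []
    for rule, children in chart[nt]:
        for combo in product(*(language(c) for c in children)):
            trees.append((rule,) + combo)
    return trees

print(len(language("S07")))  # prints 2
```

The start symbol S0,7 derives exactly two trees, corresponding to the two derivation trees of Fig. 2; the ambiguity is shared in the single nonterminal VP1,7, just as in a CKY chart.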

7 terminals of and positions in the input objects, as G S0,7 → r1(NP0,1, VP1,7) NP5,7 → r2(Det5,6, N6,7) encoded in the nonterminals of D(a1),...,D(an); VP1,7 → r3(V1,2, NP2,7) NP0,1 → r7 the spans [i, k] occurring in CKY parse items sim- VP1,7 → r5(VP1,4, PP4,7)V1,2 → r11 ply happen to be the nonterminals of the D(a) for NP2,7 → r2(Det2,3, N3,7) Det2,3 → r8 the string algebra. N3,7 → r4(N3,4, PP4,7)N3,4 → r9 VP1,4 → r3(V1,2, NP2,4)P4,5 → r12 In fact, we maintain that the fundamental pur- NP2,4 → r2(Det2,3, N3,4) Det5,6 → r8 pose of a chart is to act as a device for generating PP4,7 → r6(P4,5, NP5,7)N6,7 → r10 the set of derivation trees for an input. This tree- generating nature of parse charts is made explicit by Figure 5: A “parse chart” RTG for the sentence “Sue watches the man with the telescope”. modeling them directly as RTGs; the well-known view of parse charts as context-free grammars (Bil- lot and Lang, 1989) captures the same intuition, but 5 Membership and Binarization abuses context-free grammars (which are primarily m string-generating devices) as tree description for- A binarization transforms an -ary grammar into malisms. One difference between the two views an equivalent binary one. Binarization is essen- is that regular tree languages are closed under in- tial for achieving efficient recognition algorithms, O(n3) tersection, which means that parse charts that are in particular the usual time algorithms for O(n6) modeled as RTGs can be easily restricted by ex- context-free grammars, and time recogni- ternal constraints (see Koller and Thater (2010) tion of synchronous context-free grammars. In this for a related approach), whereas this is hard in the section, we discuss binarization in terms of IRTGs. context-free view. 5.1 Context-Free Grammars We start with a discussion of parsing context-free 4.4 Decoding grammars. Let G = ( , (rb,T ∗)) be a CFG as G We conclude this section by explaining how to we defined it in Section 3.3. 
We have shown in solve the decoding problem. Suppose that in the Section 4.2 that our generic parsing algorithm pro- m+1 scenario of Fig. 1c, we have obtained a parse chart cesses a sentence w = w1 . . . wn in time O(n ), ~a for a tuple ~a = a1, . . . , an of inputs, if neces- where m is the maximal number of nonterminal G h i sary by intersecting the individual parse charts a . symbols in the right-hand side of the grammar. To G i Decoding means that we want to compute RTGs achieve the familiar cubic time complexity, an al- for the languages h0 (L( ~a)) where j [m]. The gorithm needs to convert the grammar into a binary j G ∈ actual output objects can be obtained from these form, either explicitly (by converting it to Chomsky languages of terms by evaluating the terms. normal form) or implicitly (as in the case of the

In the case where the homomorphisms hj0 are Earley algorithm, which binarizes on the fly). linear, we can once again exploit closure proper- Strictly speaking, no algorithm that works on the ties: Regular tree languages are closed under the binarized grammar is a parsing algorithm in the application of linear homomorphisms (Comon et sense of ‘parsing’ as we defined it above. Under al., 2007), and therefore we can apply a standard our view of things, such an algorithm does not algorithm to compute the output RTGs from the compute the set parsesG(w) of derivation trees of w parse chart. In the case of non-linear homomor- according to the grammar G, but according to a phisms, the output languages are not necessarily second, binarized grammar G0 = ( 0, (rb,T ∗)). G regular, so decoding exceeds the expressive capac- The binarization then takes the form of a function ity of our framework. However, linear output ho- bin that transforms terms over the signature of the momorphisms are frequent in practice; see e.g. the RTG into terms over the binary signature of the G analysis of synchronous grammar formalisms in RTG . For a binarized grammar, we have m = G0 (Shieber, 2004; Shieber, 2006). Some of the work- 2, and so the parsing complexity is O(n3) plus load of a non-linear homomorphism may also be whatever time it takes to compute G0 from G and w. carried by the output algebra, whose operations Standard binarization techniques of context-free may copy or delete material freely (as long as the grammars are linear in the size of the grammar. algebra remains regularly decomposable). Notice Although binarization does not simplify pars- that non-linear input homomorphisms are covered ing in the sense of this paper, it does simplify by the algorithm in Fig. 4. the membership problem of G: Given a string
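The closure construction used for decoding can be sketched concretely. The following Python sketch is our own illustration, not the authors' implementation: chart rules are represented as (lhs, label, children) triples, and the homomorphism maps each rule label to a term over variables x1, x2, ... Applying a linear homomorphism rule-by-rule yields rules whose right-hand sides are terms with nonterminals at the variable positions; a full construction would flatten non-flat images by introducing fresh nonterminals for inner term nodes.

```python
# Sketch (illustrative representation, not the paper's code): applying a
# linear tree homomorphism h' to a chart RTG to obtain an RTG whose language
# is h'(L(chart)). Terms are nested tuples ("f", t1, ..., tn); variables are
# the strings "x1", "x2", ...

def substitute(term, binding):
    """Replace variable leaves xi in h'(f) by the corresponding child nonterminals."""
    if isinstance(term, str):
        return binding.get(term, term)   # variable -> nonterminal, else unchanged
    return (term[0],) + tuple(substitute(t, binding) for t in term[1:])

def apply_homomorphism(chart_rules, hom):
    """chart_rules: list of (lhs, label, children); hom: label -> term over x1..xn.
    Returns rules (lhs, rhs_term) describing the output tree language."""
    out = []
    for lhs, label, children in chart_rules:
        binding = {f"x{i+1}": nt for i, nt in enumerate(children)}
        out.append((lhs, substitute(hom[label], binding)))
    return out

# Toy chart RTG (cf. the style of Fig. 5) and a homomorphism that spells out
# derived trees; the rule names and trees here are invented for illustration.
chart = [("S01", "r1", ("NP0", "VP1")), ("NP0", "r7", ()), ("VP1", "r11", ())]
hom = {"r1": ("S", "x1", "x2"),
       "r7": ("NP", ("Sue",)),
       "r11": ("VP", ("sleeps",))}
for lhs, rhs in apply_homomorphism(chart, hom):
    print(lhs, "->", rhs)
```

Because the homomorphism is linear, each chart rule produces exactly one output rule, so the output grammar is no larger than the chart itself.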

Given a string w ∈ T∗, is there some derivation tree t ∈ L(𝒢) such that h(t) = w? Because L(G) = L(G'), this question can be decided by testing the emptiness of parses_G'(w), without the need to compute parses_G(w). Furthermore, the set parses_G'(w) is useful not only for deciding membership in L(G), but also for computing other quantities, such as inside probabilities of derivation trees of G.

5.2 Synchronous Context-Free Grammars

Synchronous context-free grammars can be represented as IRTGs along the same lines as the STSG grammars in Section 3.4. The resulting grammar G = (𝒢, (h1, T1∗), (h2, T2∗)) consists of two 'context-free' interpretations of the RTG 𝒢 into string algebras T1∗ and T2∗; as above, the synchronization is ensured by requiring that related strings in T1∗ and T2∗ are interpretations of the same derivation tree t ∈ L(𝒢). As above, we can parse synchronously by parsing separately for the two interpretations and intersecting the results. This yields a parsing complexity for SCFG parsing of O(n1^{m+1} · n2^{m+1}), where n1 and n2 are the lengths of the input strings and m is the rank of the RTG 𝒢. Unlike in the monolingual case, this is now consistent with the result that the membership problem of SCFGs is NP-complete (Satta and Peserico, 2005).

The reason for the intractability of SCFG parsing is that SCFGs, in general, cannot be binarized. However, Huang et al. (2009) define the class of binarizable SCFGs, which can be brought into a weakly equivalent normal form in which all production rules are binary and the membership problem can be solved in time O(n1^3 · n2^3). The key property of binarizable SCFGs, in our terms, is that if r is any production rule pair of the SCFG, h1(r) and h2(r) can be chosen in such a way that they can be transformed into each other by locally swapping the subterms of a node. For instance, an SCFG rule pair ⟨A → A1 A2 A3 A4, B → B3 B4 B2 B1⟩ can be represented by h1(r) = (x1 • x2) • (x3 • x4) and h2(r) = (x3 • x4) • (x2 • x1), and h2(r) can be obtained from h1(r) by swapping the children of the nodes ε and 2. In such a situation, we can binarize the rule ⟨A, B⟩ → r(⟨A1, B1⟩, ..., ⟨A4, B4⟩) in a way that follows the structure of h1(r), e.g.

  ⟨A, B⟩ → r^ε(⟨A^r_1, B^r_1⟩, ⟨A^r_2, B^r_2⟩)
  ⟨A^r_1, B^r_1⟩ → r^1(⟨A1, B1⟩, ⟨A2, B2⟩)
  ⟨A^r_2, B^r_2⟩ → r^2(⟨A3, B3⟩, ⟨A4, B4⟩)

We can then encode the local rotations in two new left and right homomorphisms h1^2 and h2^2, i.e. h1^2(r^ε) = h1^2(r^1) = h1^2(r^2) = h2^2(r^2) = x1 • x2 and h2^2(r^ε) = h2^2(r^1) = x2 • x1. To determine membership of some (a1, a2) in L(G), we compute the pre-images of D(a1) and D(a2) under h1^2 and h2^2 and intersect them with the binarized version, 𝒢2, of 𝒢. This can be done in time O(n1^3 · n2^3).

Figure 6: Binarization.

5.3 A Generalized View on Binarization

The common theme of both examples we have just discussed is that binarization, when it is available, allows us to solve the membership problem in less time than the parsing problem. A lower bound for the membership problem of a tuple ⟨a1, ..., an⟩ of inputs is O(|D(a1)| · ... · |D(an)|), because the pre-images of the D(ai) grammars are at least as big as the grammars themselves, and the intersection algorithm computes the product of these. This means that a membership algorithm is optimal if it achieves this runtime.

As we have illustrated above, the parsing algorithm from Section 4 is not optimal for monolingual context-free membership, because the RTG 𝒢 has a higher rank than D(a), and therefore permits too many combinations of input spans into rules. The binarization constructions above indicate one way towards a generic optimal membership algorithm. Assume that we have algebras 𝒜1, ..., 𝒜n, all of them over signatures with maximum rank k, and an IRTG G = (𝒢, (h1, 𝒜1), ..., (hn, 𝒜n)), where 𝒢 is an RTG over some signature Σ. Assume further that we have some other signature Δ, of maximum rank k, and a homomorphism bin : T_Σ → T_Δ. We can obtain an RTG 𝒢2 with L(𝒢2) = bin(L(𝒢)) as in the SCFG example above. Now assume that there are delabelings h_i^2 : T_Δ → T_𝒜i such that h_i^2(L(𝒢2)) = h_i(L(𝒢)) for all i ∈ [n] (see Fig. 6). Then we can decide membership of a tuple ⟨a1, ..., an⟩ by intersecting 𝒢2 with all the (h_i^2)^{-1}(L(D(ai))). Because the h_i^2 are delabelings, computing the pre-images can be done in linear time; therefore this membership algorithm is optimal.

Notice that if the result of the intersection is the RTG H, then we can obtain parses(a1, ..., an) = bin^{-1}(L(H)); this is where the exponential blowup can happen.

The constructions in Sections 5.1 and 5.2 are both special cases of this generalized approach, which however also maintains a clear connection to the strong generative capacity. It is not obvious to us that the homomorphisms h_i^2 must necessarily be delabelings for the membership algorithm to be optimal. Exploring this landscape, which ties in with the very active current research on binarization, is an interesting direction for future research.

6 Discussion and Related Work

We conclude this paper by discussing how a number of different grammar formalisms from the literature relate to IRTGs, and use this discussion to highlight a number of features of our framework.

6.1 Tree-Adjoining Grammars

We have sketched in Section 3.4 how we can capture tree-substitution grammars by assuming an RTG 𝒢 for the language of derivation trees and a homomorphism into the tree algebra which spells out the derived trees; or alternatively, a homomorphism into the string algebra which computes the string yield. This construction can be generalized to tree-adjoining grammars (Joshi and Schabes, 1997).

Assume first that we are only interested in the string language of the TAG grammar. Unlike in TSG, the string yield of a derivation tree in TAG may be discontinuous. We can model this with an algebra whose elements are strings and pairs of strings, along with a number of different concatenation operators that represent possible ways in which these elements can be combined. (These are a subset of the operations considered by Gómez-Rodríguez et al. (2010).) We can then specify a homomorphism, essentially the binarization procedure that Earley-like TAG parsers do on the fly, that maps derivation trees into terms over this algebra. The TAG string algebra is regularly decomposable, and D(a) can be computed in time O(n^6).

Now consider the case of mapping derivation trees into derived trees. This cannot easily be done by a homomorphic interpretation in an ordinary tree algebra. One way to deal with this, which is taken by Shieber (2006), is to replace homomorphisms by a more complex class of tree translation functions called embedded pushdown tree transducers. A second approach is to interpret homomorphically into a more powerful algebra. This approach is taken by Maletti (2010), who uses an ordinary tree homomorphism to map a derivation tree t into a tree t' of 'building instructions' for a derived tree, and then applies an evaluation function to execute these building instructions and build the TAG derived tree. Maletti's approach fits nicely into our framework if we assume an algebra in which the building instruction symbols are interpreted according to this evaluation function.

Synchronous tree-adjoining grammars (Shieber and Schabes, 1990) can be modeled simply as an RTG with two separate TAG interpretations. We can separately choose to interpret each side as trees or strings, as described in Section 3.4.

6.2 Weighted Tree Transducers

One influential approach to statistical syntax-based machine translation is to use weighted transducers to map parse trees for an input language to parse trees or strings of the output language (Graehl et al., 2008). Bottom-up tree transducers can be modeled in terms of bimorphisms, i.e. triples (h_L, 𝒢, h_R) of an RTG 𝒢 and two tree homomorphisms h_L and h_R that map a derivation t ∈ L(𝒢) into the input tree h_L(t) and the output tree h_R(t) of the transducer (Arnold and Dauchet, 1982). Thus bottom-up transducers fit into the view of Fig. 1b. Although Graehl et al. use extended top-down transducers and not bottom-up transducers, a first inspection of their transducers leads us to believe that nothing hinges on this specific choice for their application. The exact situation bears further investigation.

Graehl et al.'s transducers further differ from the setup we have presented above in that they are weighted, i.e. each derivation step is associated with a numeric weight (e.g., a probability), and we can ask for the optimum derivation for a given input. Our framework can be straightforwardly extended to cover this case by assuming that the RTG 𝒢 is a weighted RTG (wRTG, Knight and Graehl (2005)). The parsing algorithm from Section 4.1 generalizes to an algorithm for computing a weighted chart RTG, from which the best derivation can be extracted efficiently (Knight and Graehl, 2005). Similarly, the decoding algorithm from Section 4.4 can be used to compute a weighted RTG for the output terms, and an algorithm for EM training can be defined directly on the weighted charts. In general, every grammar formalism that can be captured as an IRTG has a canonical weighted variant in this way.
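Extracting the best derivation weight from a weighted chart RTG can be sketched as a simple fixed-point computation. The representation below is our own minimal illustration (rules as (lhs, weight, children) triples); an efficient implementation would use the best-first algorithms of Knight and Graehl (2005) rather than naive iteration.

```python
# Sketch: Viterbi inside weights over a weighted chart RTG. For an acyclic
# chart this naive fixed-point iteration converges to the weight of the best
# derivation from each nonterminal.

def best_weights(rules):
    best = {}
    changed = True
    while changed:
        changed = False
        for lhs, w, children in rules:
            if all(c in best for c in children):
                cand = w
                for c in children:
                    cand *= best[c]      # derivation weight = product of rule weights
                if cand > best.get(lhs, 0.0):
                    best[lhs] = cand
                    changed = True
    return best

# Toy weighted chart (invented weights): two competing derivations for S,
# with weights 0.5 * 0.5 = 0.25 and 0.4 * 0.25 = 0.1.
rules = [("S", 0.5, ("X",)), ("S", 0.4, ("Y",)),
         ("X", 0.5, ()), ("Y", 0.25, ())]
print(best_weights(rules)["S"])   # 0.25
```

Recording, for each nonterminal, which rule achieved the maximum would additionally yield the best derivation tree itself by backtracking.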

As probabilistic grammar formalisms, these assume that all RTG rule applications are statistically independent. That is, the canonical probabilistic version of context-free grammars is PCFG, and the canonical probabilistic version of tree-adjoining grammar is PTAG (Resnik, 1992).

A final point is that Graehl et al. invest considerable effort into defining different versions of their transducer training algorithms for the tree-to-tree and tree-to-string translation cases. The core of their paper, in our terms, is to define synchronous parsing algorithms to compute an RTG of derivation trees for (tree, tree) and (tree, string) input pairs. In their setup, these two cases are formally completely different objects, and they define two separate algorithms for these problems. Our approach is more modular: the training and parsing algorithms can be fully generic, and all that needs to be changed to switch between tree-to-tree and tree-to-string is to replace the algebra and homomorphism on one side, as in Section 3.4. In fact, we are not limited to interpreting derivation trees into strings or trees; by interpreting into the appropriate algebras, we can also describe languages of graphs (Eaton et al., 2007), pictures (Drewes, 2006), 3D models (Bokeloh et al., 2010), and other objects with a suitable algebraic structure.

6.3 Generalized Context-Free Grammars

Finally, the view we advocate here embraces a tradition of grammar formalisms going back to generalized context-free grammar (GCFG, Pollard (1984)), which itself follows research in theoretical computer science (Mezei and Wright, 1967; Goguen et al., 1977). A GCFG grammar can be seen as an RTG over a signature Σ whose trees are evaluated as terms of some Σ-algebra 𝒜. This is a special case of an IRTG, in which the homomorphism is simply the identity function on T_Σ, and the algebra is 𝒜. In fact, we could have equivalently defined an IRTG as an RTG whose trees are interpreted in multiple Σ-algebras; the mediating homomorphisms do not add expressive power.

We go beyond GCFG in three ways. First, the fact that we map the trees described by the RTG into terms of other algebras using different homomorphisms means that we can choose the signatures of these algebras and the RTG freely; in particular, we can reuse common algebras such as T∗ for many different RTGs and homomorphisms. This is especially important in relation to the second advance, which is that we offer a generic parsing algorithm for arbitrary regularly decomposable algebras; because the algebras and RTGs are modular, we can reuse algorithms for computing D(a) even when we change the homomorphism. Finally, we offer a more transparent view on synchronous grammars, which separates the different dimensions clearly.

An important special case of GCFG is that of linear context-free rewrite systems (LCFRS, Vijay-Shanker et al. (1987)). LCFRSs are essentially GCFGs with a "yield" homomorphism that maps objects of 𝒜 to strings or tuples of strings. Therefore every grammar formalism that can be seen as an LCFRS, including certain dependency grammar formalisms (Kuhlmann, 2010), can be phrased as a string-generating IRTG. One particular advantage that our framework has over LCFRS is that we do not need to impose a bound on the length of the string tuples. This makes it possible to model formalisms such as combinatory categorial grammar (Steedman, 2001), which may be arbitrarily discontinuous (Koller and Kuhlmann, 2009).

7 Conclusion

In this paper, we have defined interpreted RTGs, a grammar formalism that generalizes over a wide range of existing formalisms, including various synchronous grammars, tree transducers, and LCFRS. We presented a generic parser for IRTGs; to apply it to a new type of IRTG, we merely need to define how to compute decomposition grammars D(a) for input objects a. This makes it easy to define synchronous grammars that are heterogeneous both in the grammar formalism and in the objects that each of their dimensions describes.

The purpose of our paper was to pull together a variety of existing research and explain it in a new, unified light: we have not shown how to do something that was not possible before, only how to do it in a uniform way. Nonetheless, we expect that future work will benefit from the clarified formal setup we have proposed here. In particular, we believe that the view of parse charts as RTGs may lead to future algorithms which exploit their closure under intersection, e.g. to reduce syntactic ambiguity (Schuler, 2001).

Acknowledgments

We thank the reviewers for their helpful comments, as well as all the colleagues with whom we have discussed this work, especially Martin Kay for a discussion about the relationship to chart parsing.

References

A. Arnold and M. Dauchet. 1982. Morphismes et bimorphismes d'arbres. Theoretical Computer Science, 20(1):33–93.

Sylvie Billot and Bernard Lang. 1989. The structure of shared forests in ambiguous parsing. In Proceedings of the 27th ACL.

M. Bokeloh, M. Wand, and H.-P. Seidel. 2010. A connection between partial symmetry and inverse procedural modeling. In Proceedings of SIGGRAPH.

David Chiang. 2007. Hierarchical phrase-based translation. Computational Linguistics, 33(2):201–228.

H. Comon, M. Dauchet, R. Gilleron, C. Löding, F. Jacquemard, D. Lugiez, S. Tison, and M. Tommasi. 2007. Tree automata techniques and applications. Available at http://www.grappa.univ-lille3.fr/tata.

Frank Drewes. 2006. Grammatical Picture Generation: A Tree-Based Approach. EATCS Series. Springer.

Nancy Eaton, Zoltán Füredi, Alexandr V. Kostochka, and Jozef Skokan. 2007. Tree representations of graphs. European Journal of Combinatorics, 28(4):1087–1098.

Jason Eisner. 2003. Learning non-isomorphic tree mappings for machine translation. In Proceedings of the 41st ACL.

Ferenc Gécseg and Magnus Steinby. 1997. Tree languages. In Grzegorz Rozenberg and Arto Salomaa, editors, Handbook of Formal Languages, volume 3, pages 1–68. Springer.

Joseph A. Goguen, James W. Thatcher, Eric G. Wagner, and Jesse B. Wright. 1977. Initial algebra semantics and continuous algebras. Journal of the ACM, 24(1):68–95.

Carlos Gómez-Rodríguez, Marco Kuhlmann, and Giorgio Satta. 2010. Efficient parsing of well-nested linear context-free rewriting systems. In Proceedings of NAACL-HLT.

Jonathan Graehl, Kevin Knight, and Jonathan May. 2008. Training tree transducers. Computational Linguistics, 34(4):391–427.

Liang Huang, Hao Zhang, Daniel Gildea, and Kevin Knight. 2009. Binarization of synchronous context-free grammars. Computational Linguistics, 35(4):559–595.

Aravind K. Joshi and Yves Schabes. 1997. Tree-adjoining grammars. In Grzegorz Rozenberg and Arto Salomaa, editors, Handbook of Formal Languages, volume 3, pages 69–123. Springer.

Kevin Knight and Jonathan Graehl. 2005. An overview of probabilistic tree transducers for natural language processing. In Computational Linguistics and Intelligent Text Processing, pages 1–24. Springer.

Alexander Koller and Marco Kuhlmann. 2009. Dependency trees and the strong generative capacity of CCG. In Proceedings of the 12th EACL.

Alexander Koller and Stefan Thater. 2010. Computing weakest readings. In Proceedings of the 48th ACL.

Marco Kuhlmann. 2010. Dependency Structures and Lexicalized Grammars: An Algebraic Approach, volume 6270 of Lecture Notes in Computer Science. Springer.

P. M. Lewis and R. E. Stearns. 1968. Syntax-directed transduction. Journal of the ACM, 15(3):465–488.

Andreas Maletti. 2010. A tree transducer model for synchronous tree-adjoining grammars. In Proceedings of the 48th ACL.

David McAllester. 2002. On the complexity analysis of static analyses. Journal of the ACM, 49(4):512–537.

Jorge E. Mezei and Jesse B. Wright. 1967. Algebraic automata and context-free sets. Information and Control, 11(1–2):3–29.

Rebecca Nesson and Stuart M. Shieber. 2006. Simpler TAG semantics through synchronization. In Proceedings of the 11th Conference on Formal Grammar.

Carl J. Pollard. 1984. Generalized Phrase Structure Grammars, Head Grammars, and Natural Language. Ph.D. thesis, Stanford University.

Owen Rambow and Giorgio Satta. 1996. Synchronous models of language. In Proceedings of the 34th ACL.

Philip Resnik. 1992. Probabilistic tree-adjoining grammar as a framework for statistical natural language processing. In Proceedings of COLING.

Giorgio Satta and Enoch Peserico. 2005. Some computational complexity results for synchronous context-free grammars. In Proceedings of HLT/EMNLP.

Sylvain Schmitz and Joseph Le Roux. 2008. Feature unification in TAG derivation trees. In Proceedings of the 9th TAG+ Workshop.

William Schuler. 2001. Computational properties of environment-based disambiguation. In Proceedings of the 39th ACL.

Stuart Shieber and Yves Schabes. 1990. Synchronous tree-adjoining grammars. In Proceedings of the 13th COLING.

Stuart Shieber. 1994. Restricting the weak generative capacity of synchronous tree-adjoining grammars. Computational Intelligence, 10(4):371–386.

Stuart M. Shieber. 2004. Synchronous grammars as tree transducers. In Proceedings of the Seventh International Workshop on Tree Adjoining Grammar and Related Formalisms (TAG+7).

Stuart M. Shieber. 2006. Unifying synchronous tree-adjoining grammars and tree transducers via bimorphisms. In Proceedings of the 11th EACL.

Mark Steedman. 2001. The Syntactic Process. MIT Press.

K. Vijay-Shanker, David J. Weir, and Aravind K. Joshi. 1987. Characterizing structural descriptions produced by various grammatical formalisms. In Proceedings of the 25th ACL.