Parsing Algorithms Based on Tree Automata

Total Pages: 16

File Type: PDF, Size: 1020 KB

Parsing Algorithms Based on Tree Automata

Andreas Maletti
Departament de Filologies Romàniques
Universitat Rovira i Virgili, Tarragona, Spain
[email protected]

Giorgio Satta
Department of Information Engineering
University of Padua, Italy
[email protected]

Abstract

We investigate several algorithms related to the parsing problem for weighted automata, under the assumption that the input is a string rather than a tree. This assumption is motivated by several natural language processing applications. We provide algorithms for the computation of parse-forests, best tree probability, inside probability (called partition function), and prefix probability. Our algorithms are obtained by extending to weighted tree automata the Bar-Hillel technique, as defined for context-free grammars.

1 Introduction

Tree automata are finite-state devices that recognize tree languages, that is, sets of trees. There is a growing interest nowadays in the natural language parsing community, and especially in the area of syntax-based machine translation, for probabilistic tree automata (PTA) viewed as suitable representations of grammar models. In fact, probabilistic tree automata are generatively more powerful than probabilistic context-free grammars (PCFGs), when we consider the latter as devices that generate tree languages. This difference can be intuitively understood if we consider that a computation by a PTA uses hidden states, drawn from a finite set, that can be used to transfer information within the tree structure being recognized. As an example, in written English we can empirically observe different distributions in the expansion of so-called noun phrase (NP) nodes, in the contexts of subject and direct-object positions, respectively. This can be easily captured using some states of a PTA that keep a record of the different contexts. In contrast, PCFGs are unable to model these effects, because NP node expansion should be independent of the context in the derivation. This problem for PCFGs is usually solved by resorting to so-called parental annotations (Johnson, 1998), but this, of course, results in a different tree language, since these annotations will appear in the derived tree.

Most of the theoretical work on parsing and estimation based on PTA has assumed that the input is a tree (Graehl et al., 2008), in accordance with the very definition of these devices. However, both in parsing as well as in machine translation, the input is most often represented as a string rather than a tree. When the input is a string, some trick is applied to map the problem back to the case of an input tree. As an example in the context of machine translation, assume a probabilistic tree transducer T as a translation model, and an input string w to be translated. One can then intermediately construct a tree automaton M_w that recognizes the set of all possible trees that have w as yield, with internal nodes from the input alphabet of T. This automaton M_w is further transformed into a tree transducer implementing a partial identity translation, and such a transducer is composed with T (relation composition). This is usually called the 'cascaded' approach. Such an approach can be easily applied also to parsing problems.

In contrast with the cascaded approach above, which may be rather inefficient, in this paper we investigate a more direct technique for parsing strings based on weighted and probabilistic tree automata. We do this by extending to weighted tree automata the well-known Bar-Hillel construction defined for context-free grammars (Bar-Hillel et al., 1964) and for weighted context-free grammars (Nederhof and Satta, 2003). This provides an abstract framework under which several parsing algorithms can be directly derived, based on weighted tree automata. We discuss several applications of our results, including algorithms for the computation of parse-forests, best tree probability, inside probability (called partition function), and prefix probability.

Proceedings of the 11th International Conference on Parsing Technologies (IWPT), pages 1–12, Paris, October 2009. © 2009 Association for Computational Linguistics

2 Preliminary definitions

Let S be a nonempty set and · be an associative binary operation on S. If S contains an element 1 such that 1 · s = s = s · 1 for every s ∈ S, then (S, ·, 1) is a monoid. A monoid (S, ·, 1) is commutative if the equation s1 · s2 = s2 · s1 holds for every s1, s2 ∈ S. A commutative semiring (S, +, ·, 0, 1) is a nonempty set S on which a binary addition + and a binary multiplication · have been defined such that the following conditions are satisfied:

• (S, +, 0) and (S, ·, 1) are commutative monoids,
• · distributes over + from both sides, and
• s · 0 = 0 = 0 · s for every s ∈ S.

A weighted string automaton, abbreviated WSA (Schützenberger, 1961; Eilenberg, 1974), is a system M = (Q, Σ, S, I, ν, F) where

• Q is a finite alphabet of states,
• Σ is a finite alphabet of input symbols,
• S = (S, +, ·, 0, 1) is a semiring,
• I : Q → S assigns initial weights,
• ν : Q × Σ × Q → S assigns a weight to each transition, and
• F : Q → S assigns final weights.

We now proceed with the semantics of M. Let w ∈ Σ* be an input string of length n. For each integer i with 1 ≤ i ≤ n, we write w(i) to denote the i-th character of w. The set Pos(w) of positions of w is {i | 0 ≤ i ≤ n}. A run of M on w is a mapping r : Pos(w) → Q. We denote the set of all such runs by Run_M(w). The weight of a run r ∈ Run_M(w) is

  wt_M(r) = ∏_{i=1}^{n} ν(r(i−1), w(i), r(i)).

We assume the right-hand side of the above equation evaluates to 1 in case n = 0. The WSA M recognizes the mapping¹ M : Σ* → S, which is defined for every w ∈ Σ* of length n by

  M(w) = ∑_{r ∈ Run_M(w)} I(r(0)) · wt_M(r) · F(r(n)).

¹ We overload the symbol M to denote both an automaton and its recognized mapping. However, the intended meaning will always be clear from the context.
In order to define weighted tree automata (Berstel and Reutenauer, 1982; Ésik and Kuich, 2003; Borchardt, 2005), we need to introduce some additional notation. Let Σ be a ranked alphabet, that is, an alphabet whose symbols have an associated arity. We write Σ_k to denote the set of all k-ary symbols in Σ. We use a special symbol e ∈ Σ_0 to syntactically represent the empty string ε. The set of Σ-trees, denoted by T_Σ, is the smallest set satisfying both of the following conditions:

• for every α ∈ Σ_0, the single node labeled α, written α(), is a tree of T_Σ,
• for every σ ∈ Σ_k with k ≥ 1 and for every t1, ..., tk ∈ T_Σ, the tree with a root node labeled σ and trees t1, ..., tk as its k children, written σ(t1, ..., tk), belongs to T_Σ.

As a convention, throughout this paper we assume that σ(t1, ..., tk) denotes σ() if k = 0. The size of the tree t ∈ T_Σ, written |t|, is defined as the number of occurrences of symbols from Σ in t. Let t = σ(t1, ..., tk). The yield of t is recursively defined by

  yd(t) = σ                    if σ ∈ Σ_0 \ {e}
  yd(t) = ε                    if σ = e
  yd(t) = yd(t1) ··· yd(tk)    otherwise.

The set of positions of t, denoted by Pos(t), is recursively defined by

  Pos(σ(t1, ..., tk)) = {ε} ∪ {iw | 1 ≤ i ≤ k, w ∈ Pos(ti)}.

Note that |t| = |Pos(t)| and, according to our convention, when k = 0 the above definition provides Pos(σ()) = {ε}. We denote the symbol of t at position w by t(w) and its rank by rk_t(w).

A weighted tree automaton (WTA) is a system M = (Q, Σ, S, µ, F) where

• Q is a finite alphabet of states,
• Σ is a finite ranked alphabet of input symbols,
• S = (S, +, ·, 0, 1) is a semiring,
• µ is an indexed family (µ_k)_{k ∈ N} of mappings µ_k : Σ_k → S^{Q × Q^k}, and
• F : Q → S assigns final weights.

In the above definition, Q^k is the set of all strings over Q having length k, with Q^0 = {ε}. Further note that S^{Q × Q^k} is the set of all matrices with elements in S, row index set Q, and column index set Q^k. Correspondingly, we will use the common matrix notation and write instances of µ in the form µ_k(σ)_{q0, q1···qk}. Finally, we assume that q1···qk = ε if k = 0.

We define the semantics also in terms of runs. Let t ∈ T_Σ. A run of M on t is a mapping r : Pos(t) → Q. We denote the set of all such runs by Run_M(t). The weight of a run r ∈ Run_M(t) is

  wt_M(r) = ∏_{w ∈ Pos(t), rk_t(w) = k} µ_k(t(w))_{r(w), r(w1)···r(wk)}.

Note that, according to our convention, the string r(w1)···r(wk) denotes ε when k = 0. The WTA M recognizes the mapping M : T_Σ → S, which is defined by

  M(t) = ∑_{r ∈ Run_M(t)} wt_M(r) · F(r(ε))

for every t ∈ T_Σ. We say that t is recognized by M if M(t) ≠ 0.

Figure 1: Input tree t and encoded tree enc(t). [Figure not reproduced in this text extraction.]
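Analogously, here is a hedged Python sketch of the WTA semantics, with trees represented as (symbol, children) pairs; the helper names and the toy automaton are my own illustration, not part of the paper. Instead of enumerating runs explicitly, it computes, for every state q, the total weight of all runs that label the root with q; by distributivity of · over + in the semiring, this bottom-up computation yields the same value M(t) as the run-based definition.

```python
from itertools import product

def run_weights(t, states, mu):
    """For each state q, total weight of all runs on t whose root is labeled q.
    A tree is a pair (symbol, children); mu maps (symbol, (q1, ..., qk)) to a
    dict {q: mu_k(symbol)[q, q1...qk]} over the real semiring."""
    sym, children = t
    child_vecs = [run_weights(c, states, mu) for c in children]
    out = {q: 0.0 for q in states}
    for qs in product(states, repeat=len(children)):  # states at the k children
        w = 1.0  # product of the children's run weights (empty product = 1)
        for vec, q in zip(child_vecs, qs):
            w *= vec[q]
        for q, entry in mu.get((sym, qs), {}).items():
            out[q] += entry * w
    return out

def wta_weight(t, states, mu, final):
    """M(t) = sum_q (weight of all runs with root state q) * F(q)."""
    vec = run_weights(t, states, mu)
    return sum(vec[q] * final.get(q, 0.0) for q in states)

# Toy WTA over a binary sigma and nullary alpha, beta; illustrative only.
states = ["q", "p"]
mu = {
    ("alpha", ()): {"q": 1.0},
    ("beta", ()): {"p": 1.0},
    ("sigma", ("q", "p")): {"q": 0.5},
}
final = {"q": 1.0}
t = ("sigma", (("alpha", ()), ("beta", ())))
print(wta_weight(t, states, mu, final))  # 0.5
```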
Recommended publications
  • Tree Automata Techniques and Applications
    Tree Automata Techniques and Applications. Hubert Comon, Max Dauchet, Rémi Gilleron, Florent Jacquemard, Denis Lugiez, Sophie Tison, Marc Tommasi.
    Contents: Introduction; Preliminaries; 1 Recognizable Tree Languages and Finite Tree Automata (1.1 Finite Tree Automata; 1.2 The Pumping Lemma for Recognizable Tree Languages; 1.3 Closure Properties of Recognizable Tree Languages; 1.4 Tree Homomorphisms; 1.5 Minimizing Tree Automata; 1.6 Top Down Tree Automata; 1.7 Decision Problems and Their Complexity; 1.8 Exercises; 1.9 Bibliographic Notes); 2 Regular Grammars and Regular Expressions (2.1 Tree Grammar: 2.1.1 Definitions, 2.1.2 Regular tree grammar and recognizable tree languages; 2.2 Regular expressions, Kleene's theorem for tree languages: 2.2.1 Substitution and iteration, 2.2.2 Regular expressions and regular tree languages; 2.3 Regular equations; 2.4 Context-free word languages and regular tree languages; 2.5 Beyond regular tree languages, context-free tree languages: 2.5.1 Context-free tree languages, 2.5.2 IO and OI tree grammars; 2.6 Exercises; 2.7 Bibliographic notes); 3 Automata and n-ary Relations (3.1 Introduction; 3.2 Automata on tuples of finite trees: 3.2.1 Three notions of recognizability, 3.2.2 Examples of the three notions of recognizability, 3.2.3 Comparisons between the three classes, 3.2.4 Closure properties for Rec× and Rec; cylindrification and projection).
  • Capturing CFLs with Tree Adjoining Grammars
    Capturing CFLs with Tree Adjoining Grammars. James Rogers, Dept. of Computer and Information Sciences, University of Delaware, Newark, DE 19716, USA. jrogers@cis.udel.edu
    Abstract: We define a decidable class of TAGs that is strongly equivalent to CFGs and is cubic-time parsable. This class serves to lexicalize CFGs in the same manner as the LCFGs of Schabes and Waters but with considerably less restriction on the form of the grammars. The class provides a normal form for TAGs that generate local sets in much the same way that regular grammars provide a normal form for CFGs that generate regular sets.
    Introduction: We introduce the notion of Regular Form for Tree Adjoining Grammars (TAGs). The class of TAGs that are in regular form is equivalent in strong generative capacity to the Context-Free Grammars, that is, the
    items that occur in that string. There are a number of reasons for exploring lexicalized grammars. Chief among these are linguistic considerations: lexicalized grammars reflect the tendency in many current syntactic theories to have the details of the syntactic structure be projected from the lexicon. There are also practical advantages. All lexicalized grammars are finitely ambiguous and, consequently, recognition for them is decidable. Further, lexicalization supports strategies that can, in practice, improve the speed of recognition algorithms (Schabes et al., 1988). One grammar formalism is said to lexicalize another (Joshi and Schabes, 1992) if for every grammar in the second formalism there is a lexicalized grammar in the first that generates exactly the same set of structures.
  • Deterministic Top-Down Tree Automata: Past, Present, and Future
    Deterministic Top-Down Tree Automata: Past, Present, and Future. Wim Martens (University of Dortmund, Germany), Frank Neven (Hasselt University and Transnational University of Limburg, School for Information Technology, Belgium), Thomas Schwentick (University of Dortmund, Germany).
    Abstract: In strong contrast to their non-deterministic counterparts, deterministic top-down tree automata received little attention in the scientific literature. The aim of this article is to survey recent and less recent results and stipulate new research directions for top-down deterministic tree automata motivated by the advent of the XML data exchange format. In particular, we survey different ranked and unranked top-down tree automata models and discuss expressiveness, closure properties and the complexity of static analysis problems.
    1 Introduction: The goal of this article is to survey some results concerning deterministic top-down tree automata motivated by purely formal language theoretic reasons (past) and by the advent of the data exchange format XML (present). Finally, we outline some new research directions (future).
    The Past. Regular tree languages have been studied in depth ever since their introduction in the late sixties [10]. Just as for regular string languages, regular tree languages form a robust class admitting many closure properties and many equivalent formulations, the most prominent one in the form of tree automata. A striking difference with the string case, where left-to-right equals right-to-left processing, is that top-down is no longer equivalent to bottom-up. In particular, top-down deterministic tree automata are strictly less expressive than their bottom-up counterparts and consequently form a strict subclass of the regular tree languages.
  • Programming Using Automata and Transducers
    Programming Using Automata and Transducers. Loris D'antoni, University of Pennsylvania, [email protected]. Publicly Accessible Penn Dissertations, 2015.
    Recommended Citation: D'antoni, Loris, "Programming Using Automata and Transducers" (2015). Publicly Accessible Penn Dissertations. 1677. https://repository.upenn.edu/edissertations/1677
    Abstract: Automata, the simplest model of computation, have proven to be an effective tool in reasoning about programs that operate over strings. Transducers augment automata to produce outputs and have been used to model string and tree transformations such as natural language translations. The success of these models is primarily due to their closure properties and decidable procedures, but good properties come at the price of limited expressiveness. Concretely, most models only support finite alphabets and can only represent small classes of languages and transformations. We focus on addressing these limitations and bridge the gap between the theory of automata and transducers and complex real-world applications: Can we extend automata and transducer models to operate over structured and infinite alphabets? Can we design languages that hide the complexity of these formalisms? Can we define executable models that can process the input efficiently? First, we introduce succinct models of transducers that can operate over large alphabets and design BEX, a language for analysing string coders. We use BEX to prove the correctness of UTF and BASE64 encoders and decoders.
  • Translations on a Context Free Grammar (A. V. Aho and J. D. Ullman)
    INFORMATION AND CONTROL 19, 439-475 (1971) Translations on a Context Free Grammar A. V. AHO AND J. D. ULLMAN* Bell Telephone Laboratories, Incorporated, Murray Hill, New Jersey 07974 Two schemes for the specification of translations on a context-free grammar are proposed. The first scheme, called a generalized syntax directed translation (GSDT), consists of a context free grammar with a set of semantic rules associated with each production of the grammar. In a GSDT an input word is parsed according to the underlying context free grammar, and at each node of the tree, a finite number of translation strings are computed in terms of the translation strings defined at the descendants of that node. The functional relationship between the length of input and length of output for translations defined by GSDT's is investigated. The second method for the specification of translations is in terms of tree automata--finite automata with output, walking on derivation trees of a context free grammar. It is shown that tree automata provide an exact characterization for those GSDT's with a linear relationship between input and output length. I. SYNTAX DIRECTED TRANSLATION A translation is a set of pairs of strings. One common technique for the specification of translations is to use a context free grammar (CFG) to specify the domain of the translation. An input string is parsed according to this grammar, and the output string associated with this input is specified as a function of the parse tree. A large class of translations can be specified in this manner. Compilers utilizing this principle are termed syntax directed, and several compilers and compiler writing systems have been built around this concept.
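    As a rough illustration of the GSDT idea sketched in this abstract (not code from the paper, and simplified to a single translation string per node rather than the finitely many strings a GSDT allows), the following Python fragment attaches one semantic rule to each production and computes the output string at a node from the output strings of its children; the toy grammar translating infix to postfix is invented for the example.

```python
# Each production carries a rule that builds the node's translation string
# from the translation strings of its children (evaluated bottom-up).
rules = {
    ("E", ("E", "+", "T")): lambda e, plus, t: e + t + "+",  # E -> E + T
    ("E", ("T",)): lambda t: t,                              # E -> T
    ("T", ("a",)): lambda a: "a",                            # T -> a
}

def translate(node):
    """node is (lhs, children) for internal nodes, or a terminal string."""
    if isinstance(node, str):
        return node
    lhs, children = node
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    return rules[(lhs, rhs)](*[translate(c) for c in children])

# Parse tree for "a + a", translated to postfix.
tree = ("E", [("E", [("T", ["a"])]), "+", ("T", ["a"])])
print(translate(tree))  # aa+
```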
  • Efficient Techniques for Parsing with Tree Automata
    Efficient techniques for parsing with tree automata. Jonas Groschwitz and Alexander Koller (Department of Linguistics, University of Potsdam, Germany; groschwi|[email protected]), Mark Johnson (Department of Computing, Macquarie University, Australia; [email protected]).
    Abstract: Parsing for a wide variety of grammar formalisms can be performed by intersecting finite tree automata. However, naive implementations of parsing by intersection are very inefficient. We present techniques that speed up tree-automata-based parsing, to the point that it becomes practically feasible on realistic data when applied to context-free, TAG, and graph parsing. For graph parsing, we obtain the best runtimes in the literature.
    1 Introduction: Grammar formalisms that go beyond context-free grammars have recently enjoyed renewed attention
    ing algorithm which recursively explores substructures of the input object and assigns them nonterminal symbols, but their exact relationship is rarely made explicit. On the practical side, this parsing algorithm and its extensions (e.g. to EM training) have to be implemented and optimized from scratch for each new grammar formalism. Thus, development time is spent on reinventing wheels that are slightly different from previous ones, and the resulting implementations still tend to underperform. Koller and Kuhlmann (2011) introduced Interpreted Regular Tree Grammars (IRTGs) in order to address this situation. An IRTG represents a language by describing a regular language of derivation trees, each of which is mapped to a term
  • An Automata-Based Approach to Pattern Matching
    An Automata-Based Approach to Pattern Matching. Ali Sever, Pfeiffer University, Misenheimer, USA. Email: [email protected]. Intelligent Control and Automation, 2013, 4, 309-312. http://dx.doi.org/10.4236/ica.2013.43036. Published Online August 2013 (http://www.scirp.org/journal/ica). Received January 21, 2013; revised March 4, 2013; accepted March 11, 2013. Copyright © 2013 Ali Sever. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
    Abstract: Due to its importance in security, syntax analysis has found usage in many high-level programming languages. The Lisp language has its share of operations for evaluating regular expressions, but native parsing of Lisp code in this way is unsupported. Matching on lists requires a significantly more complicated model, with a different programmatic approach than that of string matching. This work presents a new automata-based approach centered on a set of functions and macros for identifying sequences of Lisp S-expressions using finite tree automata. The objective is to test that a given list is an element of a given tree language. We use a macro that takes a grammar and generates a function that reads off the leaves of a tree and tries to parse them as a string in a context-free language. The experimental results indicate that this approach is a viable tool for parsing Lisp lists and expressions in the abstract interpretation framework. Keywords: Computation and Automata Theory; Pattern Matching; Regular Languages.
  • Theory and Applications of Tree Languages
    Contributions to the Theory and Applications of Tree Languages. Johanna Högberg. Ph.D. Thesis, May 2007. Department of Computing Science, Umeå University, SE-901 87 Umeå, Sweden.
    Copyright © J. Högberg, 2007, except Paper I © Springer-Verlag, 2003; Paper II © Elsevier, 2007; Paper III © F. Drewes, J. Högberg, 2007; Paper IV © World Scientific, 2007; Paper V © Springer-Verlag, 2007; Paper VI © Springer-Verlag, 2007; Paper VII © Springer-Verlag, 2005; Paper VIII © Springer-Verlag, 2007. ISBN 978-91-7264-336-9, ISSN 0348-0542, UMINF 07.07. Printed by Solfjädern Offset, Umeå, 2007.
    Preface: The study of tree languages was initiated in the late 1960s and was originally motivated by various problems in computational linguistics. It is presently a well-developed branch of formal language theory and has found many uses, including natural language processing, compiler theory, model checking, and tree-based generation. This thesis is, as its name suggests, concerned with theoretical as well as practical aspects of tree languages. It consists of an opening introduction, written to provide motivation and basic concepts and terminology, and eight papers, organised into three parts. The first part treats algorithmic learning of regular tree languages, the second part is concerned with bisimulation minimisation of nondeterministic tree automata, and the third part is devoted to tree-based generation of music. We now summarise the contributions made in each part, and list the papers included therein. More detailed summaries of the individual papers are given in Chapter 3.
  • Automata and Formal Languages II Tree Automata
    Automata and Formal Languages II: Tree Automata. Peter Lammich, SS 2015.
    Overview by Lecture: Apr 14: Slide 3; Apr 21: Slide 2; Apr 28: Slide 4; May 5: Slide 50; May 12: Slide 56; May 19: Slide 64; May 26: Holiday; Jun 02: Slide 79; Jun 09: Slide 90; Jun 16: Slide 106; Jun 23: Slide 108; Jun 30: Slide 116; Jul 7: Slide 137; Jul 14: Slide 148.
    Organizational Issues. Lecture: Tue 10:15 – 11:45, in MI 00.09.38 (Turing). Tutorial (?): Wed 10:15 – 11:45, in MI 00.09.38 (Turing). Weekly homework, will be corrected; hand in before tutorial; discussion during tutorial. Exam: oral, bonus for homework! At least 50% of the homework gives a 0.3/0.4 better grade on the first exam attempt, only if passed without bonus. Material: Tree Automata: Techniques and Applications (TATA), free download at http://tata.gforge.inria.fr/. Conflict with Equational Logic.
    Proposed Content: Finite tree automata, basic theory (TATA Ch. 1); Pumping Lemma, Closure Properties, Homomorphisms, Minimization, ...; Regular tree grammars and regular expressions (TATA Ch. 2); Hedge Automata (TATA Ch. 8); Application: XML-Schema languages; Application: Analysis of Concurrent Programs; Dynamic Pushdown Networks (DPN).
    Table of Contents: 1 Introduction; 2 Basics; 3 Alternative Representations of Regular Languages; 4 Model-Checking Concurrent Systems.
    Tree Automata. Finite automata recognize words, e.g. q0 → a(qF), qF → b(q0): words of alternating a's and b's, ending with a, e.g., aba or abababa. Generalize to trees: q0 → a(q1, q1), q1 → b(q0, q0), q1 → L(): trees with alternating "layers" of a nodes and b nodes.
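    The transition rules quoted above (q0 → a(q1, q1), q1 → b(q0, q0), q1 → L()) can be checked with a few lines of Python; the tree encoding and the function below are my own sketch of the lecture's example, not course material.

```python
# Tree automaton rules: (state, symbol) -> list of allowed child-state tuples.
rules = {
    ("q0", "a"): [("q1", "q1")],  # q0 -> a(q1, q1)
    ("q1", "b"): [("q0", "q0")],  # q1 -> b(q0, q0)
    ("q1", "L"): [()],            # q1 -> L()
}

def accepts(state, tree):
    """tree = (symbol, children); True iff tree is derivable from `state`."""
    sym, children = tree
    for child_states in rules.get((state, sym), []):
        if len(child_states) == len(children) and all(
            accepts(q, c) for q, c in zip(child_states, children)
        ):
            return True
    return False

leaf = ("L", ())
t = ("a", (("b", (("a", (leaf, leaf)), ("a", (leaf, leaf)))), leaf))
print(accepts("q0", t))  # True: a and b layers alternate correctly
```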
  • A Weighted Tree Transducer Toolkit for Syntactic Natural Language Processing Models
    A Weighted Tree Transducer Toolkit for Syntactic Natural Language Processing Models Jonathan May June 2008 Abstract Solutions for many natural language processing problems such as speech recognition, transliteration, and translation have been described as weighted finite-state transducer cascades. The transducer formalism is very useful for researchers, not only for its ability to expose the deep similarities between seemingly disparate models, but also because expressing models in this formalism allows for rapid implementation of real, data-driven systems. Finite-state toolkits can interpret and process transducer chains using generic algorithms and many real-world systems have been built using these toolkits. Current research in NLP makes use of syntax-rich models that are poorly suited to extant transducer toolkits, which process linear input and output. Tree transducers can handle these models, and a weighted tree transducer toolkit with appropriate generic algorithms will lead to the sort of gains in syntax-based modeling that were achieved with string transducer toolkits. This thesis presents Tiburon, a weighted tree transducer toolkit that is highly suited for the processing of syntactic NLP models. Tiburon contains implementations of novel determinization, minimization, and training algorithms that are extensions of classical algorithms suitable for the processing of weighted tree transducers and automata. Tiburon also contains algorithms reflecting previously known results for generation, composition, and projection. We show Tiburon’s effectiveness at rapid reproduction of previously presented syntax-based machine translation and parsing models as well as its utility in constructing and evaluating novel models. Chapter 1 Introduction: Models and Transducers I can’t work without a model.
  • Tree Pushdown Automata
    Tree Pushdown Automata. Karl M. Schimpf (Department of Computer and Information Sciences, University of California, Santa Cruz, California) and Jean H. Gallier (Department of Computer and Information Science, University of Pennsylvania, Philadelphia, Pennsylvania 19104). Journal of Computer and System Sciences 30, 25-40 (1985). Received March 5, 1983; revised June 1, 1984.
    This paper presents a new type of automaton called a tree pushdown automaton (a bottom-up tree automaton augmented with internal memory in the form of a tree, similar to the way a stack is added to a finite state machine to produce a pushdown automaton) and shows that the class of languages recognized by such automata is identical to the class of context-free tree languages. © 1985 Academic Press, Inc.
    I. Introduction: Tree languages are frequently used to model computations based on tree (or hierarchical) structures. For example, one can use tree structures to model the calling structure of a set of procedures and functions such as in derivation trees, syntax directed translation, and recursive schemes [30, 2, 15, 16, 8, 32, 6, 7, 17, 25]. These applications of tree language theory frequently use the methods and results of formal language theory [27, 28, 9, 10, 31]. In particular, these applications can be characterized using context-free tree languages [27, 28, 12, 11, 7]. Other applications include the class of macro (or indexed) languages since the class of languages generated by the yield of context-free tree languages is the class of macro (context-sensitive) string languages [1, 13, 14, 25, 9, 11, 26, 27].
  • When Is a Bottom-Up Deterministic Tree Translation Top-Down Deterministic?
    When Is a Bottom-Up Deterministic Tree Translation Top-Down Deterministic? Sebastian Maneth, Universität Bremen, Germany, [email protected]. Helmut Seidl, TU München, Germany, [email protected].
    Abstract: We consider two natural subclasses of deterministic top-down tree-to-tree transducers, namely, linear and uniform-copying transducers. For both classes we show that it is decidable whether the translation of a transducer with look-ahead can be realized by a transducer without look-ahead. The transducers constructed in this way may still make use of inspection, i.e., have an additional tree automaton restricting the domain. We provide a second procedure which decides whether inspection can be removed and if so, constructs an equivalent transducer without inspection. The construction relies on a fixpoint algorithm that determines inspection requirements and on dedicated earliest normal forms for linear as well as uniform-copying transducers which can be constructed in polynomial time. As a consequence, equivalence of these transducers can be decided in polynomial time. Applying these results to deterministic bottom-up transducers, we obtain that it is decidable whether or not their translations can be realized by deterministic uniform-copying top-down transducers without look-ahead (but with inspection) – or without both look-ahead and inspection.
    2012 ACM Subject Classification: Theory of computation → Models of computation; Theory of computation → Formal languages and automata theory. Keywords and phrases: Top-Down Tree Transducers, Earliest Transformation, Linear Transducers, Uniform-Copying Transducers, Removal of Look-ahead, Removal of Inspection. Digital Object Identifier: 10.4230/LIPIcs.ICALP.2020.134. Category: Track B: Automata, Logic, Semantics, and Theory of Programming.
    1 Introduction: Even though top-down and bottom-up tree transducers are well-studied formalisms that were introduced already in the 1970's (by Rounds [13] and Thatcher [14] independently, and by Thatcher [15], respectively), some fundamental questions have remained open until today.