New Results on Context-Free Tree Languages
Total Page:16
File Type:pdf, Size:1020Kb
Technische Universität Dresden New Results on Context-Free Tree Languages Dissertation zur Erlangung des akademischen Grades Doctor rerum naturalium (Dr. rer. nat.) vorgelegt an der Technischen Universität Dresden Fakultät Informatik eingereicht am 20. Dezember 2017 von Dipl.-Inf. Johannes Osterholzer geboren am 07. Dezember 1984 in Simbach am Inn Gutachter: Prof. Dr.-Ing. habil. Dr. h.c./Univ. Szeged Heiko Vogler, Technische Universität Dresden (Betreuer) Prof. Dr. rer. nat. Andreas Maletti, Universität Leipzig Verteidigt am: 04. Mai 2018, Dresden Acknowledgements It may be a cliché to begin a thesis with the words “This thesis would not have been possible without. ”, followed by an exhaustive list of relatives, colleagues, friends, acquaintances, spiritual and worldly authorities, deities, favorite pet animals, and so forth. I cannot completely spare the reader such a concatenation, but I will try to keep it brief, and assure it comes from the heart. First of all, I am indebted to my parents for their continuous support and patience. As compensation, I wish you many pleasant hours with reading this thesis! Moreover, I want to thank my doctorate supervisor, Heiko Vogler, for his guidance, and for leaving me the freedom to decide on my own where to go next in my research. Thank you very much for this opportunity, Heiko! I also want to thank Andreas Maletti for agreeing to be second assessor for this work. Thank you for taking the time, Andreas! My colleagues at the Chair of Foundations of Programming also deserve my gratitude. In particular, I thank Toni Dietze for being a great office-mate and for the many counterexamples he provided, and Tobias Denkinger for his frequent flashes of inspiration. Of course, I must also thank Kerstin Achtruth, forher helpfulness and for her being the heart and soul of the institute. In the course of Toni’s, Luisa’s, and my research on inverse homomorphic closure of context-free tree languages, we came into contact with André Arnold. In our email correspondence, which we enjoyed very much, he showed us the flaws in our first proof attempts, and encouraged us to keep trying. Merci beaucoup! Finally, I want to thank Luisa for being there for me (which may not always have been easy), for mustering the strength to read all of the following pages (and spotting many mistakes), and of course for 3D printing the (objectively) best son (yet?) in the world! iii Contents This book is divided into chapters, which are divided into sections, which are divided into paragraphs, which are divided into sentences, which are divided into words, which are divided into letters. (Carl Linderholm, Mathematics Made Difficult) Introduction 1 1 Fundamental Notions and Properties 7 1.1 Mathematical Preliminaries . .8 1.1.1 Sets, Relations, and Functions . .8 1.1.2 Algebraic Structures . 11 1.1.3 Principles of Induction . 14 1.2 Formal Languages . 16 1.2.1 Words and Languages . 16 1.2.2 Recognizable Languages . 17 1.2.3 Context-Free Languages . 18 1.2.4 Indexed Languages . 22 1.2.5 Recursively Enumerable Languages and Complexity Classes . 24 1.3 Formal Tree Languages . 30 1.3.1 Trees and Tree Languages . 30 1.3.2 Recognizable Tree Languages . 40 1.3.3 Trees, Tuples, and Structural Induction . 41 1.3.4 Tree Homomorphisms and Tree Transformations . 42 1.4 Weighted Tree Languages and Weighted Tree Transformations . 45 2 Context-Free Tree Languages 47 2.1 Context-Free Tree Grammars . 51 2.1.1 Particular Restrictions . 52 2.1.2 Special Forms . 53 2.1.3 Examples . 54 2.1.4 Elementary Properties of Derivations . 58 2.1.5 Derivation Modes . 62 2.1.6 Linear Context-Free Tree Grammars . 65 v Contents 2.2 Pushdown Tree Automata . 71 2.3 Yield and Path Languages . 77 2.4 Closure Properties . 83 2.5 Complexity of Decision Problems . 86 2.6 Chapter Conclusion . 88 3 Decision Problems of Context-Free Tree Grammars 95 3.1 Space- and Time-Efficient Pushdown Tree Automata . 97 3.1.1 Derivations . 97 3.1.2 Succinct Pushdown Tree Automata . 97 3.1.3 Subdivisions of Symbols and Compact Systems . 99 3.1.4 Representing M] by a Finite Object . 108 3.2 The Uniform Membership Problem . 112 3.2.1 Upper Bound . 112 3.2.2 Lower Bound . 113 3.2.3 Uniform Membership of "-free Indexed Grammars . 115 3.3 The Non-Uniform Membership Problem . 117 3.4 The Infiniteness Problem . 122 3.5 Linear Context-Free Tree Grammars . 123 3.6 Chapter Conclusion . 127 4 Linear Context-Free Tree Languages and Inverse Linear Tree Homomorphisms 129 4.1 Linear Context-Free Tree Languages and Inverse Linear Tree Homomorphisms 132 4.1.1 Notation . 132 4.1.2 The tree language L ................................. 133 4.1.3 A normal form for G ................................ 137 4.1.4 Derivation Trees . 149 4.1.5 Dyck Words and Sequences of Chains . 152 4.1.6 A witness for L(G) = L ............................... 156 4.2 Linear Monadic Context-Free6 Tree Languages and Inverse Homomorphisms . 161 4.3 Chapter Conclusion . 167 5 Synchronous Context-Free Tree Transformations and Pushdown Tree Transducers 169 5.1 Synchronous Context-Free Tree Grammars . 171 5.1.1 Simple Synchronous Context-Free Tree Grammars . 181 5.1.2 Simple Synchronous Context-Free Tree Grammars in Normal Form . 187 5.2 Pushdown Extended Tree Transducers . 190 5.2.1 One-State Transducers . 194 5.2.2 Transducers in Normal Form . 196 5.3 Characterization of Simple Weighted Context-Free Tree Transformations . 199 5.4 Chapter Conclusion . 202 vi Contents 6 Footed and Linear Monadic Context-Free Tree Grammars 203 6.1 Footed and Linear Monadic Context-Free Tree Grammars . 205 6.2 Chapter Conclusion . 211 Conclusion 213 Index 215 Bibliography 221 vii Introduction Character is like a tree and reputation like a shadow. The shadow is what we think of it; the tree is the real thing. (Abraham Lincoln) Context-Free Grammars Context-free grammars (cfg) rank among the fundamental models applied in computer science. Given by a finite number of context-free productions such as A aAb and A " , ! ! they permit describing a possibly infinite set of words, such as the formal language n n a b n N , 2 in finite space. They are an effective representation: there are algorithms to decide many interesting properties of the languages representable by a cfg – and one can even find efficient algorithms for quite some of those problems. Moreover, the context-free languages thus represented are mathematically well-behaved: given an arbitrary context-free language L and a reasonable operation ' on formal languages, in many cases the image '(L) is also context-free. The naturalness of the context-free languages is further underscored by the fact that they appear in many different guises – such as ALGOL-like languages, languages defined by (E)BNF definitions, by (simple) phrase-structure grammars, or by pushdown automata, solutions of particular equation systems, and many more. In summary, since the 1950s there has been much scientific progress on (i) complexity of decision problems of cfg, (ii) closure properties of cfg, and (iii) characterization results for cfg.1 Tree Languages Due to the associativity of monoids, there is not much structure to a word. The word aab is represented likewise by a ab and aa b. However, many topics in computer science necessitate structured data – be it to· represent the· syntax of a program, a structured document such as an XML file, or to symbolize the grammatical structure of a natural language sentence. The 1 And yet, there are still open problems on cfg; compare e.g. [150]. 1 Introduction epitomical means to represent such structure is by trees. For our example, we obtain two distinct trees · · a and b , · · a b a a corresponding to our two conceptions of the structure of aab from above. From the 1970s onward, tree language theory has evolved as a full-fledged subfield of formal language theory. Many well-known results, e.g. on recognizable sets, have been generalized from words to trees. While some generalizations are straightforward, there are also instances where the increment in structure that comes with trees complicates matters, or where properties that hold in the case of word languages are outright false when generalized to the tree case.2 Context-Free Tree Grammars So how to generalize cfg to the realm of trees? This question has been answered by Rounds, who defined a context-free tree grammar (cftg) to be given by a finite set of productions such as A A A A σ S α γ , σ γ , and . ! x1 x2 ! x1 x2 ! x1 x2 α x1 x2 x2 As we see, these productions are still context-free in the sense that each left-hand side contains precisely one nonterminal symbol – in this case, S or A. Since in a cftg, nonterminals may also occur as inner nodes of a tree, we must somehow represent a nonterminal’s subtrees. This role is fulfilled by the symbols x1, x2, . , called variables. In our example, S has no variables, it will therefore occur as a leaf. The nonterminal A has two variables x1 and x2; thus it will occur as a binary inner node. The right-hand side of a cftg production is a tree made up of nonterminal symbols, terminal symbols – in this case, σ, γ and α –, and the variables from the left-hand side, which may occur as leaves only. Given this description, the application of a production of form A(x1,..., xn) % to an occurrence of the nonterminal A in a tree is defined quite naturally: we replace! the occurrence of A by the right-hand side %, and substitute the subtrees of A for the respective occurrences of the variables x1,..., xn.