Two Characterisation Results of Multiple Context-Free Grammars and Their Application to Parsing
Dissertation submitted for the academic degree of Doctor rerum naturalium (Dr. rer. nat.) to the Technische Universität Dresden, Fakultät Informatik, on 5 July 2019 by Dipl.-Inf. Tobias Denkinger, born on 7 December 1989 in Rodewisch im Vogtland.

Reviewers:
• Prof. Dr.-Ing. habil. Dr. h.c./Univ. Szeged Heiko Vogler, Technische Universität Dresden (supervisor)
• Dr. Mark-Jan Nederhof, University of St Andrews

Subject referee:
• Prof. Dr. Laura Kallmeyer, Heinrich-Heine-Universität Düsseldorf

Defended on 27 September 2019 in Dresden.

Acknowledgements

Firstly, I would like to thank my supervisor Heiko Vogler for his patience and for all the hours we spent at the blackboard discussing details I would otherwise have missed. I thank Mark-Jan Nederhof for his advice over the years and for agreeing to review this thesis. I am grateful to my current and former colleagues for creating a great work environment at the Chair for Foundations of Programming. In particular, I thank Toni Dietze for always having time to discuss my problems, no matter how technical they were; Johannes Osterholzer for his guidance and for answering countless questions of mine; my office mates Luisa Herrmann and Thomas Ruprecht for letting me bounce ideas off of them and for challenging my (frequently wrong) assumptions; and Kerstin Achtruth for helping me navigate the arcane ways of a university administration. I thank my proofreaders Frederic Dörband, Kilian Gebhardt, Luisa Herrmann, Richard Mörbitz, Nadine Palme, and Thomas Ruprecht for taking the time and for helping me fix a lot of mistakes. I drew a lot of strength from the loving support and understanding of Nadine Palme during my time as a doctoral candidate; thank you, Nadine!
Finally, I thank my parents for always believing in me, especially when I didn’t.

Contents

Acknowledgements
1 Introduction
2 Preliminaries
  2.1 Basic mathematical concepts
    2.1.1 Sets
    2.1.2 Binary relations
    2.1.3 Strings, sequences, and ranked trees
    2.1.4 Algebras and homomorphisms
    2.1.5 Sorted trees and sorted algebras
    2.1.6 Writing conventions
    2.1.7 Concrete algebras
  2.2 Language devices
    2.2.1 Context-free grammars
    2.2.2 Parallel multiple context-free grammars
    2.2.3 Finite-state automata
    2.2.4 Automata with data storage
    2.2.5 Turing machines
    2.2.6 String homomorphisms
  2.3 Weighted language devices
    2.3.1 Weighted parallel multiple context-free languages
    2.3.2 Weighted finite-state automata
    2.3.3 Weighted automata with data storage
    2.3.4 Weighted string homomorphisms
  2.4 Problems and algorithms
3 An automaton characterisation for weighted MCFLs
  3.1 Introduction
  3.2 Tree-stack automata
    3.2.1 Normal forms
    3.2.2 Restricted tree-stack automata
  3.3 The equivalence of MCFGs and restricted TSAs
    3.3.1 Every MCFG has an equivalent restricted TSA
    3.3.2 Every restricted TSA has an equivalent MCFG
    3.3.3 The theorem and the weighted case
  3.4 Related formalisms
4 Approximation of weighted automata with data storage
  4.1 Introduction
  4.2 Approximation of (unweighted) automata with data storage
    4.2.1 Superset approximations
    4.2.2 Subset approximations
    4.2.3 Potentially incomparable approximations
  4.3 Approximation of multiple context-free languages
  4.4 Approximation of weighted automata with data storage
5 A Chomsky-Schützenberger characterisation of weighted MCFLs
  5.1 Introduction
  5.2 Preliminaries
  5.3 The Chomsky-Schützenberger characterisation
  5.4 Equivalence multiple Dyck languages
    5.4.1 Relation between grammar and equivalence multiple Dyck languages
    5.4.2 Deciding membership in an equivalence multiple Dyck language
    5.4.3 A Chomsky-Schützenberger theorem using equivalence multiple Dyck languages
6 Parsing of natural languages
  6.1 Introduction
  6.2 Parsing weighted PMCFGs using weighted TSAs
  6.3 Coarse-to-fine parsing of weighted automata with storage
    6.3.1 Coarse-to-fine parsing
    6.3.2 The algorithm
    6.3.3 The implementation and practical relevance
  6.4 Chomsky-Schützenberger parsing of weighted MCFGs
    6.4.1 Introduction
    6.4.2 The naïve parser
    6.4.3 The problem: harmful loops
    6.4.4 The modification of ℳ(퐺)
    6.4.5 Factorisable POCMOZs
    6.4.6 Restricted weighted MCFGs
    6.4.7 A modification of isMember
    6.4.8 The algorithm
    6.4.9 The implementation and practical results
    6.4.10 Related parsing approaches
Appendix
A Between type-0 and type-2 languages in the Chomsky hierarchy
B Additional material for Chapter 2
  B.1 Another closure of data storage
  B.2 Goldstine automata
  B.3 Proof of lemma 2.71
C Additional material for Chapter 3
  C.1 Transitions of ℳ(퐺) from example 3.24
References
Index

1 Introduction

The results presented in this thesis are mainly motivated by natural language processing, particularly parsing. Parsing is a process that enhances a string (e.g. a sentence in a natural language) with syntactic structure.¹ Let us consider, for example, the sentence

  I saw the man with the telescope.

We can assign a word class to each word of the sentence:

  I        saw   the         man   with         the         telescope
  pronoun  verb  determiner  noun  preposition  determiner  noun

In the natural language processing community, such word classes are called part-of-speech tags.

¹ There is also semantic parsing, which we will ignore in this thesis.
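The assignment of part-of-speech tags to the words above can be sketched as a simple lexicon lookup. This is only an illustration of the concept, not a tagging method from this thesis; real taggers must resolve ambiguity and handle unknown words. The function name `tag` and the `LEXICON` table are ours.

```python
# A toy part-of-speech tagger: a plain lexicon lookup for the example sentence.
# Purely illustrative; real taggers must handle ambiguous and unknown words.
LEXICON = {
    "I": "pronoun",
    "saw": "verb",
    "the": "determiner",
    "man": "noun",
    "with": "preposition",
    "telescope": "noun",
}

def tag(sentence):
    """Return (word, part-of-speech tag) pairs for a whitespace-split sentence."""
    return [(word, LEXICON[word]) for word in sentence.split()]

for word, pos in tag("I saw the man with the telescope"):
    print(f"{word}\t{pos}")
```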
Now we combine words of the sentence into larger grammatical (in the sense of linguistics) units, so-called constituents. For example, we can combine “the” and “man” into a noun phrase (i.e. a constituent that acts as a noun in a sentence). The constituent “the telescope” and the word “I” can also act as nouns and are thus noun phrases. The combination of a noun phrase and a preposition is called a prepositional phrase. The combination of a verb, a noun phrase, and a prepositional phrase is a verb phrase. Finally, a sentence may consist of a noun phrase (the subject of the clause) and a verb phrase. This grammatical description of the sentence can be visualised as a tree, a so-called constituent tree (shown in figure 1.1). Constituent trees are one kind of syntactic structure.

[Figure 1.1: A constituent tree for “I saw the man with the telescope”; the sentence node dominates a noun phrase and a verb phrase, which in turn dominates a noun phrase and a prepositional phrase.]

The rules of how to build constituents are traditionally given as a formal grammar. For example, the rule “telescope is a noun” can be written as

  NN → telescope

where NN is an abbreviation for noun, and the rule “the combination of a preposition and a noun phrase is called a prepositional phrase” can be written as

  PP → IN NP

where PP, IN, and NP are abbreviations for prepositional phrase, preposition or subordinating conjunction, and noun phrase, respectively. If we collect all the rules that were used in our example, we end up with

  S  → NP VP        NP  → PRP        PRP → I
  VP → VBD NP PP    VBD → saw
  NP → DT NN        DT  → the        NN  → man
  PP → IN NP        IN  → with       NN  → telescope

which constitutes a context-free grammar. Context-free grammars are the traditional model for constituent trees [Cho56; Cha96]. Many linguistic theories of constituency have been devised since 1956; see Müller [Mül18] for an overview.
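The example grammar is small enough to parse by hand, but the idea can be illustrated in code. The following is a minimal sketch, not the parsing algorithm developed in this thesis: a naive CKY-style dynamic program that counts the derivations of a sentence from the start symbol S, generalised to rules with more than two right-hand-side symbols (such as VP → VBD NP PP). All function names are ours.

```python
from functools import lru_cache

# The context-free grammar of the example, as (left-hand side, right-hand side) pairs.
RULES = [
    ("S", ("NP", "VP")),
    ("NP", ("PRP",)),            ("PRP", ("I",)),
    ("VP", ("VBD", "NP", "PP")), ("VBD", ("saw",)),
    ("NP", ("DT", "NN")),        ("DT", ("the",)),
    ("NN", ("man",)),            ("PP", ("IN", "NP")),
    ("IN", ("with",)),           ("NN", ("telescope",)),
]
NONTERMINALS = {lhs for lhs, _ in RULES}

def count_parses(sentence, start="S"):
    """Count the derivations of `sentence` from `start` with a memoised
    CKY-style dynamic program over spans of the sentence."""
    words = tuple(sentence.split())

    @lru_cache(maxsize=None)
    def derive(symbol, i, j):
        # Number of ways `symbol` derives words[i:j].
        if symbol not in NONTERMINALS:  # terminal symbol: must match exactly
            return 1 if (j - i == 1 and words[i] == symbol) else 0
        return sum(cover(rhs, i, j) for lhs, rhs in RULES if lhs == symbol)

    @lru_cache(maxsize=None)
    def cover(rhs, i, j):
        # Number of ways the symbol sequence `rhs` derives words[i:j].
        if not rhs:
            return 1 if i == j else 0
        first, rest = rhs[0], rhs[1:]
        # Each symbol derives at least one word (this grammar has no ε-rules).
        return sum(derive(first, i, k) * cover(rest, k, j)
                   for k in range(i + 1, j - len(rest) + 1))

    return derive(start, 0, len(words))

print(count_parses("I saw the man with the telescope"))  # → 1
```

Note that this grammar admits exactly one derivation of the example sentence, because the prepositional phrase can only attach to the verb phrase; a grammar that also allowed NP → NP PP would make the famous telescope ambiguity visible as a parse count of 2.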
Requirements for a linguistic theory to be used for parsing are:

• a mathematical formalisation of the linguistic theory (we call this mathematical formalisation a grammar formalism),
• the existence of data with annotations that are compatible with the grammar formalism (such data are called corpora), and
• an efficient parsing algorithm for the grammar formalism.

Hence, among others, tree-adjoining grammars [JLT75; SJ88; Chi00], tree substitution grammars [continuous: Sch90; Bod92; discontinuous: CSS11; CSB16], lexical functional grammars [KB82; Rie+01], context-free grammars with latent annotations [MMT05; Pet+06], linear context-free rewriting systems [VWJ87; KM13], combinatory categorial grammars [Ste87; Cla02; LLZ16], and minimalist grammars [Sta97; Tor+19] have been used for parsing. Recently, transition systems were also used to describe constituent trees [Ver14; Mai15; CC17].

Dependency trees are another kind of syntactic structure. They model the dependencies between the words of the given sentence. For example, the word “I” in our sentence is the subject of the word “saw”. This dependency is expressed by an arrow from “saw” to “I” that is labelled with “subject”. The root is usually the verb of the sentence, in our case “saw”. An example of a dependency tree is shown in figure 1.2. To improve the readability of a dependency tree, the sentence is written under the tree, and each word in the sentence is connected to the equally labelled node in the tree by a dashed line.

In this dissertation, we will restrict ourselves to constituent trees.² Furthermore, we restrict ourselves to grammar-based models, since they allow us to use results from formal language theory to improve parsing. More precisely, we will assume that the possible syntactic structures are

² Note that constituent trees can be converted to dependency trees [JN07].
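A dependency tree of the kind just described can be represented compactly as a map from each word position to the position of its head, with labelled arcs. The sketch below is illustrative only: the arcs stated in the text (“saw” is the root, “I” is its subject) come from the thesis, while the remaining head choices and label names are our assumptions, since figure 1.2 is not reproduced here.

```python
# A dependency tree as head/label maps over 1-based word positions (0 = root).
# Only the root arc and the subject arc are stated in the text; the other
# arcs and all label names are illustrative assumptions.
SENTENCE = ["I", "saw", "the", "man", "with", "the", "telescope"]
HEAD = {1: 2, 2: 0, 3: 4, 4: 2, 5: 2, 6: 7, 7: 5}  # position -> head position
LABEL = {1: "subject", 2: "root", 3: "det", 4: "object",
         5: "prep", 6: "det", 7: "pobj"}

def arcs(sentence, head, label):
    """Yield (head word, label, dependent word) triples; the root's head is 'ROOT'."""
    for pos, word in enumerate(sentence, start=1):
        h = head[pos]
        yield ("ROOT" if h == 0 else sentence[h - 1], label[pos], word)

for h, lbl, dep in arcs(SENTENCE, HEAD, LABEL):
    print(f"{h} --{lbl}--> {dep}")
```

This head-index encoding is the standard flat representation of dependency trees (it underlies, for instance, the CoNLL-U corpus format) and makes well-formedness checks such as "exactly one root" a one-line assertion.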