
Two characterisation results of multiple context-free grammars and their application to parsing

Dissertation

submitted in fulfilment of the requirements for the academic degree Doctor rerum naturalium (Dr. rer. nat.)

submitted to Technische Universität Dresden, Fakultät Informatik

submitted on 5 July 2019

by Dipl.-Inf. Tobias Denkinger

born on 7 December 1989 in Rodewisch im Vogtland

Reviewers:

• Prof. Dr.-Ing. habil. Dr. h.c./Univ. Szeged Heiko Vogler, Technische Universität Dresden (supervisor)
• Dr. Mark-Jan Nederhof, University of St Andrews

Subject expert:

• Prof. Dr. Laura Kallmeyer, Heinrich-Heine-Universität Düsseldorf

Defended on 27 September 2019 in Dresden

Acknowledgements

Firstly, I would like to thank my supervisor Heiko Vogler for his patience and for all the hours we spent at the blackboard discussing details I would otherwise have missed. I thank Mark-Jan Nederhof for his advice over the years and for agreeing to review this thesis. I am grateful to my current and former colleagues for creating a great work environment at the Chair for Foundations of Programming. In particular, I thank Toni Dietze for always having time to discuss my problems, no matter how technical they were; Johannes Osterholzer for his guidance and for answering countless questions of mine; my office mates Luisa Herrmann and Thomas Ruprecht for letting me bounce ideas off of them and for challenging my (frequently wrong) assumptions; and Kerstin Achtruth for helping me navigate the arcane ways of a university administration. I thank my proofreaders Frederic Dörband, Kilian Gebhardt, Luisa Herrmann, Richard Mörbitz, Nadine Palme, and Thomas Ruprecht for taking the time and for helping me fix a lot of mistakes. I drew a lot of strength from the loving support and understanding of Nadine Palme during my time as a doctoral candidate; thank you, Nadine! Finally, I thank my parents for always believing in me, especially when I didn’t.


Contents

Acknowledgements

1 Introduction

2 Preliminaries
  2.1 Basic mathematical concepts
    2.1.1 Sets
    2.1.2 Binary relations
    2.1.3 Strings, sequences, and ranked trees
    2.1.4 Algebras and homomorphisms
    2.1.5 Sorted trees and sorted algebras
    2.1.6 Writing conventions
    2.1.7 Concrete algebras
  2.2 Language devices
    2.2.1 Context-free grammars
    2.2.2 Parallel multiple context-free grammars
    2.2.3 Finite-state automata
    2.2.4 Automata with data storage
    2.2.5 Turing machines
    2.2.6 String homomorphisms
  2.3 Weighted language devices
    2.3.1 Weighted parallel multiple context-free languages
    2.3.2 Weighted finite state automata
    2.3.3 Weighted automata with data storage
    2.3.4 Weighted string homomorphisms
  2.4 Problems and algorithms

3 An automaton characterisation for weighted MCFLs
  3.1 Introduction
  3.2 Tree-stack automata
    3.2.1 Normal forms
    3.2.2 Restricted tree-stack automata
  3.3 The equivalence of MCFGs and restricted TSAs
    3.3.1 Every MCFG has an equivalent restricted TSA
    3.3.2 Every restricted TSA has an equivalent MCFG
    3.3.3 The theorem and the weighted case
  3.4 Related formalisms

4 Approximation of weighted automata with data storage
  4.1 Introduction
  4.2 Approximation of (unweighted) automata with data storage
    4.2.1 Superset approximations
    4.2.2 Subset approximations
    4.2.3 Potentially incomparable approximations
  4.3 Approximation of multiple context-free languages
  4.4 Approximation of weighted automata with data storage

5 A Chomsky-Schützenberger characterisation of weighted MCFLs
  5.1 Introduction
  5.2 Preliminaries
  5.3 The Chomsky-Schützenberger characterisation
  5.4 Equivalence multiple Dyck languages
    5.4.1 Relation between grammar and equivalence multiple Dyck languages
    5.4.2 Deciding membership in an equivalence multiple Dyck language
    5.4.3 A Chomsky-Schützenberger theorem using equivalence multiple Dyck languages

6 Parsing of natural languages
  6.1 Introduction
  6.2 Parsing weighted PMCFGs using weighted TSAs
  6.3 Coarse-to-fine parsing of weighted automata with storage
    6.3.1 Coarse-to-fine parsing
    6.3.2 The algorithm
    6.3.3 The implementation and practical relevance
  6.4 Chomsky-Schützenberger parsing of weighted MCFGs
    6.4.1 Introduction
    6.4.2 The naïve parser
    6.4.3 The problem: harmful loops
    6.4.4 The modification of ℳ(퐺)
    6.4.5 Factorisable POCMOZs
    6.4.6 Restricted weighted MCFGs
    6.4.7 A modification of isMember
    6.4.8 The algorithm
    6.4.9 The implementation and practical results
    6.4.10 Related parsing approaches

Appendix

A Between type-0 and type-2 languages in the Chomsky hierarchy

B Additional material for Chapter 2
  B.1 Another closure of data storage
  B.2 Goldstine automata
  B.3 Proof of lemma 2.71

References

Index


1 Introduction

The results presented in this thesis are mainly motivated by natural language processing, particularly parsing. Parsing is a process that enhances a string (e.g. a sentence in a natural language) with syntactic structure.1 Let us consider, for example, the sentence

I saw the man with the telescope.

We can assign a word class to each word of the sentence:

pronoun   verb   determiner   noun   preposition   determiner   noun
I         saw    the          man    with          the          telescope

In the natural language processing community, such word classes are called part-of-speech tags. Now we combine words of the sentence to larger grammatical (in the sense of linguistics) units, so-called constituents. For example, we can combine “the” and “man” to a noun phrase (i.e. a constituent that acts as a noun in a sentence). The constituent “the telescope” and the word “I” can also act as a noun and are thus noun phrases. The combination of a noun phrase and a preposition is called a prepositional phrase. The combination of a verb, a noun phrase, and a prepositional phrase is a verb phrase. Finally, a sentence may consist of a noun phrase (the subject of the clause) and a verb phrase. This grammatical description of the sentence can be visualised as a tree, a so-called constituent tree (shown in figure 1.1). Constituent trees are one kind of syntactic structure. The rules of how to build constituents are traditionally given as a

[Tree diagram omitted: the sentence node dominates a noun phrase (the pronoun “I”) and a verb phrase; the verb phrase dominates the verb “saw”, a noun phrase (“the man”), and a prepositional phrase (“with the telescope”).]

Figure 1.1: A constituent tree

1There is also semantic parsing, which we will ignore in this thesis.


grammar. For example, the rule “telescope is a noun” can be written as

NN → telescope

where NN is an abbreviation for noun, and the rule “the combination of a preposition and a noun phrase is called a prepositional phrase” can be written as

PP → IN NP

where PP, IN, and NP are abbreviations for prepositional phrase, preposition or subordinating conjunction, and noun phrase, respectively. If we collect all the rules that were used in our example, we end up with

푆 → NP VP    NP → PRP    PRP → I
VP → VBD NP PP    VBD → saw    NP → DT NN
DT → the    NN → man    PP → IN NP
IN → with    NN → telescope

which constitutes a context-free grammar. Context-free grammars are the traditional model for constituent trees [Cho56; Cha96]. Many linguistic theories of constituency have been devised since 1956, see Müller [Mül18] for an overview. Requirements for a linguistic theory to be used for parsing are:

• a mathematical formalisation of the linguistic theory (we call this mathematical formalisation a grammar formalism),
• the existence of data with annotations that are compatible with the grammar formalism (such data are called corpora), and
• an efficient parsing algorithm for the grammar formalism.

Hence, among others, tree-adjoining grammars [JLT75; SJ88; Chi00], tree substitution grammars [continuous: Sch90; Bod92; discontinuous: CSS11; CSB16], lexical functional grammars [KB82; Rie+01], context-free grammars with latent annotations [MMT05; Pet+06], linear context-free rewriting systems [VWJ87; KM13], combinatory categorial grammars [Ste87; Cla02; LLZ16], and minimalist grammars [Sta97; Tor+19], have been used for parsing. Recently, transition systems were also used to describe constituent trees [Ver14; Mai15; CC17].

Dependency trees are another kind of syntactic structure. They model the dependencies between words of the given sentence. For example, the word “I” in our sentence is the subject of the word “saw”. This dependency is expressed by an arrow from “saw” to “I” that is labelled with “subject”. The root is usually the verb of the sentence, in our case “saw”. An example of a dependency tree is shown in figure 1.2. To improve readability of a dependency tree, the sentence is written under the tree, and each word in the sentence is connected to the equally labelled node in the tree by a dashed line.

In this dissertation, we will restrict ourselves to constituent trees.2 Furthermore, we restrict ourselves to grammar-based models since they allow us to use results in formal language theory to improve parsing. More precisely, we will assume that the possible syntactic structures are

2Note that constituent trees can be converted to dependency trees [JN07].

[Tree diagram omitted: the root “saw” has edges labelled subject (to “I”), object (to “man”), and modifier (to “telescope”); “man” carries a determiner edge to “the”, and “telescope” a case edge to “with” and a determiner edge to “the”; the sentence “I saw the man with the telescope” is written below and connected to the nodes by dashed lines.]

Figure 1.2: A dependency tree

defined by a multiple context-free grammar [Sek+91].

Why multiple context-free grammars?

The choice of grammar formalism used for parsing a natural language mainly depends on the answers to two questions:

(i) Does the grammar formalism permit an efficient parsing algorithm?
(ii) How closely can the syntactic structures of the natural language be modelled with the grammar formalism?

Let us consider the context-free grammars from before. They can be parsed efficiently: in time cubic in the number of words of the sentence. However, they lack the ability to model what is called discontinuous constituents [MS08] and non-projective dependencies [KS09]. Figure 1.3a shows an example of a constituent tree with discontinuous constituents. The verb phrases “Darüber nachgedacht” and “Darüber nachgedacht werden” are both discontinuous because in the sentence there is the word “muß” in between. Figure 1.3b shows an example of a dependency tree with non-projective dependencies (the node labels and edge labels were omitted). The words that depend on “hearing” are “A hearing on the issue”. These dependencies are not projective because in the sentence, there are the words “is scheduled” in between. Discontinuity and non-projectivity are evident from crossings of edges, highlighted by circles.

The phenomena of discontinuity and non-projectivity occur frequently in natural language corpora, e.g. about 28 percent of all sentences in both the NeGra corpus [Sku+98] and the TIGER corpus [Bra+04] contain discontinuous constituents [MS08, table 6] and about 23 percent of all sentences in the Prague Dependency Treebank3 contain non-projective dependencies [KS09, section 1].

As a response to the inadequacy of context-free grammars, Joshi [Jos85, section 6.3.3] formulated four criteria (which refine our two criteria from before) for a grammar formalism to be suitable for parsing [cf. Kal10, section 1.1]:

(i) it extends context-free grammars,

3https://ufal.mff.cuni.cz/pdt/Corpora/PDT_1.0/


[Tree diagrams omitted: (a) a constituent tree for the German sentence “Darüber muß nachgedacht werden” with the part-of-speech tags PROAV, VMFIN, VVPP, and VAINF and two crossing VP constituents; (b) a dependency tree for “A hearing is scheduled on the issue today” with crossing edges.]

(a) A constituent tree with discontinuous constituents, taken from MS08, figure 3. (b) A dependency tree with non-projective dependencies, taken from KS09, figure 2.

Figure 1.3: Discontinuous constituents and non-projective dependencies

(ii) it is able to describe (a limited amount of) discontinuity/non-projectivity,
(iii) it is polynomially parsable, and
(iv) it only generates languages of constant growth.4

He calls the class of such grammar formalisms mildly context-sensitive grammars. Multiple context-free grammars are mildly context-sensitive [Wei88, section 4.3; Kal10, page 24]. Furthermore, they are at least as expressive as combinatory categorial grammars [VWJ86; WJ88], linear indexed grammars [Vij88], head grammars [Sek+91], string-generating linear context-free rewriting systems [Sek+91],5 finite-copying lexical functional grammars [Sek+93], and minimalist grammars [Mic01b; Mic01a]. A more detailed review of the relations between multiple context-free grammars and similar grammar formalisms is shown in appendix A.

Characterisation results, particularly automaton characterisations, are sometimes used to derive efficient algorithms for various problems in computer science [e.g. in model checking: VW94; or in parsing: Hul11; KM15]. This thesis provides or extends two characterisation results of multiple context-free grammars and investigates their use for parsing natural languages.

Outline

The majority of this thesis is devoted to three theoretical questions:

• Chapter 3: What kind of automata is equivalent to multiple context-free grammars? For this, we will enhance finite state automata with a so-called tree-stack.
• Chapter 4: How can automata that are enhanced with a storage mechanism be approximated by automata with a simpler storage mechanism?
• Chapter 5: How can weighted multiple context-free grammars be decomposed into a homomorphism, a regular language, and a bracket language? Such a decomposition is called a Chomsky-Schützenberger representation.

4Intuitively, constant growth means that each application of a rule of a grammar (in the given formalism) only contributes a bounded number of words to the sentence.
5String-generating linear context-free rewriting systems are in fact equivalent to multiple context-free grammars.

[Diagram omitted: chapter 1 precedes chapter 2, which precedes chapters 3, 4, and 5; section 6.1 precedes sections 6.2, 6.3, and 6.4, which build on chapter 3, chapter 4, and chapter 5, respectively.]

Figure 1.4: Reading this thesis

Based on the answers to those questions, we provide three parsing algorithms for multiple context-free grammars in chapter 6:

• Section 6.2: A parsing algorithm based on tree-stack automata.
• Section 6.3: A parsing algorithm based on tree-stack automata and approximation.
• Section 6.4: A parsing algorithm based on the Chomsky-Schützenberger representation.

This thesis is structured to allow readers to skip over material that they are not interested in. The diagram in figure 1.4 visualises the (partial) order in which the main chapters and sections are intended to be read. An arrow 퐴 → 퐵 denotes that 퐴 should be read before 퐵. Chapter 2 can safely be skipped if the reader is familiar with formal language theory, multiple context-free languages, and weighted languages. The appendix contains material that supplements the other chapters but is not essential for the exposition.


2 Preliminaries

2.1 Basic mathematical concepts

2.1.1 Sets

We will use an intuitive notion of sets in this work. A set is a collection of pairwise different objects; each of those objects is called an element of the set. The empty set, i.e. the set that contains no elements, is denoted by ∅. Every set that is not ∅ is called non-empty. A set that contains exactly one element is called a singleton (set). Let 퐴 be a set. We write 푎 ∈ 퐴 to denote that the object 푎 is an element of 퐴, and 푎′ ∉ 퐴 to denote that the object 푎′ is not an element of 퐴. The set 퐴 is called finite if it has a finite number of elements; otherwise 퐴 is called infinite. The set 퐴 is called countable if its elements can be assigned a natural number such that different elements get different numbers. We call 퐴 uncountable if it is not countable. Note that every finite set is countable and every uncountable set is infinite. The cardinality of 퐴, denoted by |퐴|, is the number of elements of 퐴 if 퐴 is finite, and ∞ otherwise. Let 퐴 and 퐵 be sets. By 퐴 ∪ 퐵, 퐴 ∩ 퐵, and 퐴 ∖ 퐵, we denote the union of 퐴 and 퐵, the intersection of 퐴 and 퐵, and the difference of 퐴 and 퐵 (i.e. the set of all elements of 퐴 that are not elements of 퐵), respectively. We call 퐴 and 퐵 disjoint if 퐴 ∩ 퐵 = ∅. We write 퐴 ⊆ 퐵, 퐴 ⊇ 퐵, 퐴 ⊊ 퐵, or 퐴 ⊋ 퐵 to denote that 퐴 is a subset, a superset, a proper subset, or a proper superset, respectively, of 퐵. The power set of 퐴, denoted by 풫(퐴), is the set of all subsets of 퐴. A partition of 퐴 is a set 픓 ⊆ 풫(퐴) such that the elements of 픓 are non-empty, pairwise disjoint

(i.e. each 픭1, 픭2 ∈ 픓 with 픭1 ≠ 픭2 are disjoint), and cover 퐴 (i.e. ⋃픭∈픓 픭 = 퐴). The elements of a partition 픓 are called cells.

We abbreviate {0, 1} by 픹. The set of natural numbers (including 0) is denoted by ℕ and ℕ ∖ {0} is denoted by ℕ+. For each 푘 ∈ ℕ, we abbreviate {1, …, 푘} by [푘]. Note that [0] = ∅. The sets of real numbers, of non-negative real numbers, and of non-positive real numbers are denoted by ℝ, ℝ≥0, and ℝ≤0, respectively.

We frequently use the set-builder notation, i.e. given an expression 푒 and a logical formula 훷 with free variables 푥1, …, 푥푘, we write {푒 ∣ 훷} to denote the set 푍 defined as follows:

(i) Let 푌 be the set of all tuples (푦1, …, 푦푘) such that the closed formula obtained by replacing, for every 푖 ∈ [푘], every occurrence of 푥푖 in 훷 with 푦푖 is true.

(ii) Then 푍 is the set that contains for each (푦1, …, 푦푘) ∈ 푌 the value obtained by replacing, for every 푖 ∈ [푘], every occurrence of 푥푖 in 푒 with 푦푖.

We sometimes write {푒 ∈ 푈 ∣ 훷} instead of {푒 ∣ (푒 ∈ 푈) ∧ 훷}.

Let 푘 ∈ ℕ and 퐴1, …, 퐴푘 be sets. The Cartesian product of 퐴1, …, 퐴푘, denoted by 퐴1 ×…×퐴푘, is the set {(푎1, …, 푎푘) ∣ 푎1 ∈ 퐴1, …, 푎푘 ∈ 퐴푘}.


2.1.2 Binary relations

Let 퐴 and 퐵 be sets. A binary relation (over 퐴 and 퐵) is a subset of 퐴 × 퐵. Let 푅 be a binary relation over 퐴 and 퐵. The inverse of 푅, denoted by 푅−1, is the binary relation {(푏, 푎) ∈ 퐵 × 퐴 ∣ (푎, 푏) ∈ 푅}. The complement of 푅 (with respect to 퐴 and 퐵), denoted by 푅̸퐴×퐵 (or 푅̸ if 퐴 and 퐵 are clear from the context), is the set (퐴 × 퐵) ∖ 푅. For each 푎 ∈ 퐴, the image of 푎 (in 푅), denoted by 푅(푎), is the set {푏 ∈ 퐵 ∣ (푎, 푏) ∈ 푅}. For each 퐴′ ⊆ 퐴, the image of 퐴′ (in 푅), denoted by 푅(퐴′), is the set ⋃푎′∈퐴′ 푅(푎′). The image of 푅, denoted by img(푅), is the set 푅(퐴). The domain of 푅, denoted by dom(푅), is the set 푅−1(퐵). Now let 퐶 be a set and 푆 ⊆ 퐵 × 퐶 be a binary relation. The (sequential) composition of 푅 and 푆, denoted by 푅 ; 푆, is the binary relation {(푎, 푐) ∈ 퐴 × 퐶 ∣ ∃푏 ∈ 퐵: (푎, 푏) ∈ 푅 ∧ (푏, 푐) ∈ 푆}. Let 퐴′ ⊆ 퐴. The restriction of 푅 to 퐴′, denoted by 푅|퐴′, is the binary relation {(푎, 푏) ∈ 퐴 × 퐵 ∣ (푎, 푏) ∈ 푅, 푎 ∈ 퐴′}.
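To make the operations above concrete, here is a minimal Haskell sketch (all type and function names are mine, not the thesis’s) that represents a finite binary relation as a list of pairs and implements inverse, image, composition, and restriction.

```haskell
import Data.List (nub)

-- a finite binary relation over a and b as a list of pairs
type Rel a b = [(a, b)]

-- the inverse R^{-1}
inverse :: Rel a b -> Rel b a
inverse r = [ (b, a) | (a, b) <- r ]

-- the image R(a) of a single element
image :: Eq a => Rel a b -> a -> [b]
image r a = [ b | (a', b) <- r, a' == a ]

-- the sequential composition R ; S
compose :: (Eq a, Eq b, Eq c) => Rel a b -> Rel b c -> Rel a c
compose r s = nub [ (a, c) | (a, b) <- r, (b', c) <- s, b == b' ]

-- the restriction R restricted to a subset A' of the domain
restrict :: Eq a => Rel a b -> [a] -> Rel a b
restrict r as = [ (a, b) | (a, b) <- r, a `elem` as ]

main :: IO ()
main = print (compose [(1, 'x'), (2, 'y')] [('x', True), ('y', False)])
-- prints [(1,True),(2,False)]
```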

Partial orders and equivalence relations. Let 퐴 be a set. An endorelation (on 퐴) is a binary relation over 퐴 and 퐴. The identity relation of 퐴, denoted by id(퐴), is the endorelation {(푎, 푎) ∣ 푎 ∈ 퐴}. Any subset of id(퐴) is called a partial identity on 퐴. Let 푅 ⊆ 퐴 × 퐴 be an endorelation. We call 푅
• reflexive (w.r.t. 퐴) if id(퐴) ⊆ 푅;
• transitive if 푅 ; 푅 ⊆ 푅;
• symmetric if 푅−1 = 푅;1
• antisymmetric if 푅−1 ∩ 푅 ⊆ id(퐴);
• an equivalence relation (on 퐴) if it is reflexive w.r.t. 퐴, symmetric, and transitive; and
• a partial order (on 퐴) if it is reflexive w.r.t. 퐴, antisymmetric, and transitive.

The transitive closure of 푅, denoted by 푅+, is the smallest transitive superset of 푅. The reflexive, transitive closure of 푅, denoted by 푅∗, is the endorelation 푅+ ∪ id(퐴).

Let 푆 ⊆ 퐴 × 퐴 be an equivalence relation, and 푎 ∈ 퐴. The equivalence class of 푎 (in 푆), denoted by [푎]푆, is the set {푎′ ∈ 퐴 ∣ (푎, 푎′) ∈ 푆}. We may write [푎] instead of [푎]푆 if 푆 is clear from the context. The quotient of 퐴 with respect to 푆, denoted by 퐴/푆, is the set of all equivalence classes in 푆, i.e. {[푎]푆 ∣ 푎 ∈ 퐴}.

Partial orders are usually represented by asymmetric symbols like ⊴, ≤, and ⊑. The inverse of some partial order is represented by the vertical mirror image of the partial order’s symbol, e.g. ⊴−1, ≤−1, and ⊑−1 are written as ⊵, ≥, and ⊒, respectively. Moreover, we strike the horizontal line of the symbol for a partial order when we refer to the binary relation obtained from the partial order by removing the identity, e.g. we write ⪇ instead of ≤ ∖ id.

Partial functions and total functions. Let 퐴 and 퐵 be sets and 푅 ⊆ 퐴 × 퐵 be a binary relation. We call 푅
• functional (w.r.t. 퐴) if |푅(푎)| ≤ 1 for each 푎 ∈ 퐴;
• injective (w.r.t. 퐵) if 푅−1 is functional w.r.t. 퐵;
• total (w.r.t. 퐴) if dom(푅) = 퐴;

1Note that 푅−1 ⊆ 푅 implies 푅−1 = 푅.


• surjective (w.r.t. 퐵) if img(푅) = 퐵;
• a partial function (from 퐴 to 퐵) if it is functional w.r.t. 퐴;
• a (total) function (from 퐴 to 퐵) if it is functional and total w.r.t. 퐴; and
• a bijection (between 퐴 and 퐵) if it is injective and surjective w.r.t. 퐵 and a total function from 퐴 to 퐵.

The set of partial functions from 퐴 to 퐵 is denoted by 퐴 ‧‧➡ 퐵 and the set of total functions from 퐴 to 퐵 is denoted by 퐴 → 퐵. Let 푓 be a partial function from 퐴 to 퐵 and (푎, 푏) ∈ 퐴 × 퐵. The update of 푓 with (푎, 푏), denoted by 푓[푎/푏], is the partial function 푓|퐴∖{푎} ∪ {(푎, 푏)}.

Let 퐴 and 퐼 be sets. An (퐼-indexed) 퐴-family is a function from 퐼 to 퐴. We sometimes define 퐼-indexed families using an expression of the form (푎푖 ∣ 푖 ∈ 퐼) where 푎푖 is some notation that may contain the variable 푖. Then (푎푖 ∣ 푖 ∈ 퐼) is shorthand for {(푖, 푎푖) ∣ 푖 ∈ 퐼}. We call 퐼 the index of (푎푖 ∣ 푖 ∈ 퐼).
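The update operation 푓[푎/푏] defined above can be illustrated with finite maps. The following Haskell sketch (the representation and names are my own choice) models a partial function as a Data.Map and shows that updating overrides any previous value at 푎.

```haskell
import qualified Data.Map as M

-- a partial function as a finite map; keys outside the map are "undefined"
type PartialFn a b = M.Map a b

-- f(a): Nothing plays the role of "undefined"
apply :: Ord a => PartialFn a b -> a -> Maybe b
apply f a = M.lookup a f

-- the update f[a/b]: any previous value at a is discarded
update :: Ord a => PartialFn a b -> a -> b -> PartialFn a b
update f a b = M.insert a b f

main :: IO ()
main = do
  let f = M.fromList [(1, "one"), (2, "two")]
  print (apply f 3)                      -- Nothing (3 is not in dom(f))
  print (apply (update f 3 "three") 3)   -- Just "three"
```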

Writing conventions.
• We sometimes write 푎 푅 푏 rather than (푎, 푏) ∈ 푅.
• We write 푓: 퐴 ‧‧➡ 퐵 and 푔: 퐴 → 퐵 rather than 푓 ∈ 퐴 ‧‧➡ 퐵 and 푔 ∈ 퐴 → 퐵 to denote that 푓 is a partial function from 퐴 to 퐵 and 푔 is a total function from 퐴 to 퐵.
• We write 푓(푎) = 푏 and 푓(푎′) = undefined rather than 푓(푎) = {푏} and 푓(푎′) = ∅ for any partial function 푓: 퐴 ‧‧➡ 퐵, (푎, 푏) ∈ 푓, and 푎′ ∈ 퐴 ∖ dom(푓).

2.1.3 Strings, sequences, and ranked trees

Strings. Let 훴 be a set. The set of (finite) strings (over 훴), denoted by 훴∗, is the set {휎1⋯휎푘 ∣ 푘 ∈ ℕ, 휎1, …, 휎푘 ∈ 훴}. Let 푣 = 휎1⋯휎푘 ∈ 훴∗ with 푘 ∈ ℕ and 휎1, …, 휎푘 ∈ 훴. We call 휎푖 the 푖-th symbol of 푣 for each 푖 ∈ [푘]. The length of 푣, denoted by |푣|, is 푘. The empty string, denoted by 휀, is the element of 훴∗ with length 0. Note that 훴∗ = {휀} if 훴 is empty, 훴∗ is countable if 훴 is countable, and 훴∗ is uncountable if 훴 is uncountable. Now let 푤 = 휎′1⋯휎′푚 ∈ 훴∗ with 푚 ∈ ℕ and 휎′1, …, 휎′푚 ∈ 훴. The concatenation of 푣 and 푤, denoted by 푣 ∘ 푤 or 푣푤, is the string 휎1⋯휎푘휎′1⋯휎′푚. Note that 휀 ∘ 푣 = 푣 = 푣 ∘ 휀.

A (string) language (over 훴) is a subset of 훴∗. Let 퐿 and 퐿′ be languages over 훴, i.e. 퐿 ⊆ 훴∗ and 퐿′ ⊆ 훴∗. The concatenation of 퐿 and 퐿′, denoted by 퐿 ∘ 퐿′, is the set {푣푤 ∣ 푣 ∈ 퐿, 푤 ∈ 퐿′}. Note that {휀} ∘ 퐿 = 퐿 = 퐿 ∘ {휀} and ∅ ∘ 퐿 = ∅ = 퐿 ∘ ∅. Now let 푛 ∈ ℕ. The 푛-th power of 퐿, denoted by 퐿푛, is recursively defined as {휀} if 푛 = 0 and as 퐿 ∘ 퐿푛−1 otherwise. The Kleene-star of 퐿, denoted by 퐿∗, is the language ⋃푛∈ℕ 퐿푛.

Sequences. Intuitively, a sequence is a string that may be infinite. We call them “sequences” instead of “infinite strings” to avoid confusion with (finite) strings. Let 훴 be a set. Formally, a sequence over 훴 is a partial function 푤: ℕ+ ‧‧➡ 훴 such that 푛 ∈ dom(푤) implies 푛 − 1 ∈ dom(푤) for every 푛 ∈ ℕ+. The set of sequences over 훴 is denoted by 훴휔.

Every string 푣 ∈ 훴∗ can be construed as an element of 훴휔 whose domain is [|푣|] and where 푣(푖) is the 푖-th symbol in 푣 for each 푖 ∈ [|푣|]. In particular, 휀 ∈ 훴∗ corresponds to ∅ ∈ 훴휔

and we will write 휀 for both. Now let 푣 ∈ 훴∗ and 푤 ∈ 훴휔. The (휔-)concatenation of 푣 and 푤, denoted by 푣 ∘휔 푤, is the partial function 푣 ∪ {(|푣| + 푖, 푤(푖)) ∣ 푖 ∈ dom(푤)}.

Using 휔-concatenation, we can define four additional functions that allow us to work with sequences:

• For any two sets 훴 and 훥, we define a (higher order) function map: (훴 → 훥) → (훴휔 → 훥휔) that applies a function to every symbol of a given sequence. It is defined for any function 푓: 훴 → 훥 and any sequence 푤 ∈ 훴휔 by

  map(푓)(푤) = 휀 if 푤 = 휀, and
  map(푓)(푤) = 푓(휎) ∘휔 map(푓)(푣) if 푤 = 휎 ∘휔 푣 for some 휎 ∈ 훴, 푣 ∈ 훴휔.

• For any set 훴, we define a function take: ℕ → (훴휔 → 훴∗) that returns a finite prefix of a given sequence. It is defined for any number 푛 ∈ ℕ and any sequence 푤 ∈ 훴휔 by

  take(푛)(푤) = 휀 if 푛 = 0 or 푤 = 휀, and
  take(푛)(푤) = 휎 ∘휔 take(푛 − 1)(푣) if 푛 > 0 and 푤 = 휎 ∘휔 푣 for some 휎 ∈ 훴, 푣 ∈ 훴휔.

• For any set 훴, we define a function filter: 풫(훴) → (훴휔 → 훴휔) that filters a given sequence by a predicate and only retains those symbols of the sequence that fulfil the predicate. It is defined for any predicate 푝 ⊆ 훴 and any sequence 푤 ∈ 훴휔 by

  filter(푝)(푤) = 휀 if 푤 = 휀,
  filter(푝)(푤) = filter(푝)(푣) if 푤 = 휎 ∘휔 푣 for some 휎 ∈ 훴 ∖ 푝, 푣 ∈ 훴휔, and
  filter(푝)(푤) = 휎 ∘휔 filter(푝)(푣) if 푤 = 휎 ∘휔 푣 for some 휎 ∈ 푝, 푣 ∈ 훴휔.

• For any two sets 훴 and 풜, we define a partial function sort: (훴 → 풜) × 풫(풜 × 풜) ‧‧➡ (풫(훴) → 훴휔) that sorts a set of elements from 훴 descendingly according to a function from 훴 → 풜 and a partial order on 풜, returning a sequence. It is defined for any function 휇: 훴 → 풜, any partial order ⊴ ⊆ 풜 × 풜, and any set 푌 ⊆ 훴 by

  sort(휇, ⊴)(푌) = 휀 if 푌 = ∅, and
  sort(휇, ⊴)(푌) = 휎 ∘휔 sort(휇, ⊴)(푌 ∖ {휎}) if 푌 ≠ ∅ and 휎 = argmax^⊴_(휎′ ∈ 푌) 휇(휎′),

where argmax shall be an operator that is fixed within this dissertation and fulfils the following property for any set 푌 ⊆ 훴, function 휇: 훴 → 풜, partial order ⊴ on 풜, and element 휎̄ ∈ 푌: 휇(argmax^⊴_(휎′ ∈ 푌) 휇(휎′)) ⋪ 휇(휎̄). In other words, the value of argmax^⊴_(휎′ ∈ 푌) 휇(휎′) is an element of 푌 for which no larger element (w.r.t. the image under 휇) exists in 푌.
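The four sequence operations have direct analogues on Haskell’s lazy lists, which can model both finite strings and infinite sequences. The sketch below is mine; in particular, it assumes a total order on the weights so that maximumBy can play the role of the argmax operator.

```haskell
import Data.List (maximumBy)
import Data.Ord (comparing)

mapSeq :: (a -> b) -> [a] -> [b]
mapSeq = map                      -- apply a function to every symbol

takeSeq :: Int -> [a] -> [a]
takeSeq = take                    -- finite prefix of a sequence

filterSeq :: (a -> Bool) -> [a] -> [a]
filterSeq = filter                -- keep only symbols satisfying the predicate

-- sort a finite set (here: duplicate-free list) descendingly by mu,
-- assuming the order on the weights is total
sortSeq :: (Eq a, Ord w) => (a -> w) -> [a] -> [a]
sortSeq _  [] = []
sortSeq mu ys = let y = maximumBy (comparing mu) ys
                in  y : sortSeq mu (filter (/= y) ys)

main :: IO ()
main = do
  print (takeSeq 5 (mapSeq (* 2) [1 ..]))   -- [2,4,6,8,10], from an infinite sequence
  print (sortSeq negate [3, 1, 2 :: Int])   -- [1,2,3] (descending by negation)
```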

Ranked trees. A ranked set is a tuple (훺, rk) where 훺 is a set and rk is a function from 훺 to ℕ. We denote the set rk−1(푘) by 훺(푘) for every 푘 ∈ ℕ. Similarly to the case of sets, we call a ranked set (훺, rk)


• finite, infinite, countable, or uncountable, if 훺 is finite, infinite, countable, or uncountable, respectively,
• a subset of some ranked set (훺′, rk′) if 훺 ⊆ 훺′ and rk ⊆ rk′, and
• a superset of some ranked set (훺′, rk′) if 훺 ⊇ 훺′ and rk ⊇ rk′.

We will usually denote the ranked set (훺, rk) only by 훺; then the function rk will be clear from the context or referred to as rk훺.

Now let 훺 be a ranked set. The set of ranked trees over 훺, denoted by Tr훺, is the smallest set 푇 such that 휎(푡1, …, 푡푘) ∈ 푇 for every 푘 ∈ ℕ, 푡1, …, 푡푘 ∈ 푇, and 휎 ∈ 훺(푘). Note that Tr훺 is empty if 훺(0) is empty and Tr훺 is countable if 훺 is countable. If 훺 is uncountable and 훺(0) is non-empty, then Tr훺 is uncountable.

2.1.4 Algebras and homomorphisms

Let 풜 be a set. The set of operations on 풜, denoted by Ops(풜), is the set ⋃푘∈ℕ Ops푘(풜) where Ops푘(풜) = 풜푘 → 풜 for every 푘 ∈ ℕ. Let 푘 ∈ ℕ, 푓 ∈ Ops푘(풜), and ℬ ⊆ 풜. We call ℬ closed under 푓 if 푓(푏1, …, 푏푘) ∈ ℬ for each 푏1, …, 푏푘 ∈ ℬ. Let 풞 ⊆ 풜. The closure of 풞 under 푓, denoted by cl푓(풞), is the smallest superset of 풞 that is closed under 푓.

The elements of Ops푘(풜) are called 푘-ary operations. We write nullary, unary, binary, and ternary instead of 0-ary, 1-ary, 2-ary, and 3-ary, respectively. We do not distinguish between a nullary operation and the element of its image. We usually write binary operations infix, i.e. we write 푎1 ⊚ 푎2 rather than ⊚(푎1, 푎2) for any ⊚ ∈ Ops2(풜) and 푎1, 푎2 ∈ 풜.

Now let ⊙, ⊕ ∈ Ops2(풜) and ퟘ ∈ 풜. We call ퟘ
• absorbing w.r.t. ⊙ if ퟘ ⊙ 푎 = ퟘ = 푎 ⊙ ퟘ for each 푎 ∈ 풜 and
• identity w.r.t. ⊕ if ퟘ ⊕ 푎 = 푎 = 푎 ⊕ ퟘ for each 푎 ∈ 풜.
We call ⊙

• associative if (푎1 ⊙ 푎2) ⊙ 푎3 = 푎1 ⊙ (푎2 ⊙ 푎3) for each 푎1, 푎2, 푎3 ∈ 풜;

• commutative if 푎1 ⊙ 푎2 = 푎2 ⊙ 푎1 for each 푎1, 푎2 ∈ 풜;

• distributive over ⊕ if 푎 ⊙ (푎1 ⊕ 푎2) = (푎 ⊙ 푎1) ⊕ (푎 ⊙ 푎2) for each 푎, 푎1, 푎2 ∈ 풜; and
• idempotent if 푎 ⊙ 푎 = 푎 for each 푎 ∈ 풜.

Now let ퟙ ∈ 풜 be identity w.r.t. ⊙. We call ⊙
• invertible if for each 푎 ∈ 풜 there is an 푎−1 ∈ 풜 such that 푎−1 ⊙ 푎 = ퟙ = 푎 ⊙ 푎−1.

Let 훺 be a ranked set. An 훺-algebra is a tuple (풜, 휃) where 풜 is a set and 휃: 훺 → Ops(풜) such that 휃(휔) ∈ Ops푘(풜) for each 푘 ∈ ℕ and 휔 ∈ 훺(푘). 훺 is called the signature of (풜, 휃) and 풜 is called its carrier. We will usually not distinguish between 휔 and 휃(휔) for any 휔 ∈ 훺.

Now let (풜, 휃) and (ℬ, 휂) be 훺-algebras. A mapping ℎ: 풜 → ℬ is called a homomorphism (between (풜, 휃) and (ℬ, 휂)) if ℎ(휃(휔)(푎1, …, 푎푘)) = 휂(휔)(ℎ(푎1), …, ℎ(푎푘)) for each 푘 ∈ ℕ, 휔 ∈ 훺(푘), and 푎1, …, 푎푘 ∈ 풜.


2.1.5 Sorted trees and sorted algebras

Sorts are a widespread concept in computer science. We may think of sorts as data types in a programming language: Every concrete value has a sort (or data type) and every function requires its arguments to be of fixed sorts (or data types) and outputs a value of some fixed sort (or data type).

Sorted trees. Let 푆 be a set, the elements of which we call sorts. An 푆-sorted set is a tuple (훺, sort) where 훺 is a set and sort is a function from 훺 to 푆. We denote the set sort−1(푠) by 훺푠 for every 푠 ∈ 푆. Analogously to the case of ranked sets, we call an 푆-sorted set (훺, sort)

• finite, infinite, countable, or uncountable, if 훺 is finite, infinite, countable, or uncountable, respectively,
• a subset of some sorted set (훺′, sort′) if 훺 ⊆ 훺′ and sort ⊆ sort′, and
• a superset of some sorted set (훺′, sort′) if 훺 ⊇ 훺′ and sort ⊇ sort′.

We will usually denote the sorted set (훺, sort) only by 훺; then the function sort will be clear from the context or referred to as sort훺.

Now let 푆 be a set and 훺 be an (푆∗ × 푆)-sorted set. The set of sorted trees over 훺, denoted by Ts훺, is the smallest 푆-sorted set 푇 such that 휎(푡1, …, 푡푘) ∈ 푇푠 for every 푘 ∈ ℕ, 푠1, …, 푠푘, 푠 ∈ 푆, 푡1 ∈ 푇푠1, …, 푡푘 ∈ 푇푠푘, and 휎 ∈ 훺(푠1⋯푠푘,푠). Note that Ts훺 is empty if ⋃푠∈푆 훺(휀,푠) is empty and Ts훺 is countable if 훺 is countable. If 훺 is uncountable, then Ts훺 may be uncountable, depending on the sorts of the elements of 훺.

Sorted trees and ranked trees are closely related. Consider a ranked set (훺, rk). Now let 퐼 be the singleton set {휄} and let sortrk: 훺 → 퐼∗ × 퐼 be defined for every 휔 ∈ 훺 as sortrk(휔) = (휄rk(휔), 휄) where 휄rk(휔) stands for the symbol 휄 occurring rk(휔) times. Then the sets Tr(훺,rk) and Ts(훺,sortrk) are the same. For the converse direction, let 푆 be a set and (훺, sort) be an (푆∗ × 푆)-sorted set. We define the function rksort: 훺 → ℕ for each 푢 ∈ 푆∗, 푠 ∈ 푆, and 휔 ∈ 훺(푢,푠) by rksort(휔) = |푢|. It then follows that Ts(훺,sort) ⊆ Tr(훺,rksort).

We fix some notations for sorted trees. Let 푡 = 휔(푡1, …, 푡푘) ∈ Ts훺.

• The set of positions of 푡, denoted by pos(푡), is the subset of (ℕ+)∗ recursively defined by

  pos(푡) = {휀} ∪ {푖휌 ∣ 푖 ∈ [푘], 휌 ∈ pos(푡푖)}.

Now let 휌 ∈ pos(푡).

• The label of 푡 at 휌, denoted by 푡(휌), is the element of 훺 recursively defined by

  푡(휌) = 휔 if 휌 = 휀, and
  푡(휌) = 푡푖(휌′) if 휌 = 푖휌′ for some 푖 ∈ [푘] and 휌′ ∈ pos(푡푖).

In light of this notation, we will sometimes identify the sorted tree 푡 with a partial function from (ℕ+)∗ to 훺 that maps any position 휌 of 푡 to the label of 푡 at 휌 and is undefined for any element of (ℕ+)∗ ∖ pos(푡).


• The subtree of 푡 at 휌, denoted by 푡|휌, is the element of Ts훺 recursively defined by

  푡|휌 = 푡 if 휌 = 휀, and
  푡|휌 = 푡푖|휌′ if 휌 = 푖휌′ for some 푖 ∈ [푘] and 휌′ ∈ pos(푡푖).
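The recursive definitions of pos, the label 푡(휌), and the subtree 푡|휌 translate directly into code. The following Haskell sketch (data type and names mine; sorts are omitted) represents positions as lists of positive integers.

```haskell
-- a tree with node labels of type a
data Tree a = Node a [Tree a] deriving Show

-- pos(t): the root is the empty position
positions :: Tree a -> [[Int]]
positions (Node _ ts) =
  [] : [ i : rho | (i, t) <- zip [1 ..] ts, rho <- positions t ]

-- t(rho): the label at a position (partial, as in the text)
label :: Tree a -> [Int] -> a
label (Node a _)  []        = a
label (Node _ ts) (i : rho) = label (ts !! (i - 1)) rho

-- t|_rho: the subtree at a position
subtree :: Tree a -> [Int] -> Tree a
subtree t           []        = t
subtree (Node _ ts) (i : rho) = subtree (ts !! (i - 1)) rho

main :: IO ()
main = do
  let t = Node 'f' [Node 'a' [], Node 'g' [Node 'b' []]]
  print (positions t)      -- [[],[1],[2],[2,1]]
  print (label t [2, 1])   -- 'b'
  print (subtree t [2])    -- Node 'g' [Node 'b' []]
```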

Sorted algebras. Sorted algebras have been introduced by Higgins [Hig63, called “algebras with operator scheme”, page 117]. We use the notation of Goguen and Meseguer [GM85, definition of “algebra”, page 309]. Let 푆 be a set and 풜 be an 푆-sorted set. The set of 푆-sorted operations on 풜, denoted by Ops푆(풜), is the set ⋃푘∈ℕ,푠1,…,푠푘,푠∈푆 Ops(푠1⋯푠푘,푠)(풜) where Ops(푠1⋯푠푘,푠)(풜) = 풜푠1 × … × 풜푠푘 → 풜푠 for each 푘 ∈ ℕ and 푠1, …, 푠푘, 푠 ∈ 푆. We do not distinguish between 푓 and 푓() ∈ 풜푠 for any 푠 ∈ 푆 and 푓 ∈ Ops(휀,푠)(풜).

Let 푆 be a set and 훺 be an (푆∗ × 푆)-sorted set. An (푆-sorted) 훺-algebra is a tuple (풜, 휃) where 풜 is an 푆-sorted set and 휃: 훺 → Ops푆(풜) such that 휃(휔) ∈ Ops(푠1⋯푠푘,푠)(풜) for each 푘 ∈ ℕ, 푠1, …, 푠푘, 푠 ∈ 푆, and 휔 ∈ 훺(푠1⋯푠푘,푠). Again, 훺 is called the signature of (풜, 휃) and 풜 is called its carrier.

Now let (풜, 휃) and (ℬ, 휂) be 푆-sorted 훺-algebras. A mapping ℎ: 풜 → ℬ is called a homomorphism (between (풜, 휃) and (ℬ, 휂)) if ℎ(휃(휔)(푎1, …, 푎푘)) = 휂(휔)(ℎ(푎1), …, ℎ(푎푘)) for each 푘 ∈ ℕ, 푠1, …, 푠푘, 푠 ∈ 푆, 휔 ∈ 훺(푠1⋯푠푘,푠), and 푎1 ∈ 풜푠1, …, 푎푘 ∈ 풜푠푘.

Example 2.1 (term algebra, corollary to BL70, proposition 15). Let 푆 be a set and 훺 be an (푆∗ × 푆)-sorted set. The 훺-term algebra is the algebra (Ts훺, 휃) where 휃(휔)(푡1, …, 푡푘) = 휔(푡1, …, 푡푘) for each 푠1, …, 푠푘, 푠 ∈ 푆, 푡1 ∈ (Ts훺)푠1, …, 푡푘 ∈ (Ts훺)푠푘, and 휔 ∈ 훺(푠1⋯푠푘,푠). There is a unique homomorphism between the 훺-term algebra and any 푆-sorted 훺-algebra. □
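The unique homomorphism from the term algebra into any other algebra of the same signature (example 2.1) is simply the evaluation of a term: every operator is replaced by its interpretation. A minimal Haskell sketch with an invented arithmetic signature (all names mine):

```haskell
-- a term: an operator applied to a list of argument terms
data Term op = App op [Term op] deriving Show

-- the unique homomorphism from the term algebra into the algebra given
-- by the interpretation theta: evaluate arguments, then apply theta
evalHom :: (op -> [a] -> a) -> Term op -> a
evalHom theta (App op ts) = theta op (map (evalHom theta) ts)

-- a tiny signature with nullary literals and two binary operators
data Op = Lit Int | Plus | Times

theta :: Op -> [Int] -> Int
theta (Lit n) []     = n
theta Plus    [x, y] = x + y
theta Times   [x, y] = x * y
theta _       _      = error "sort/arity mismatch"

main :: IO ()
main = print (evalHom theta
  (App Plus [App (Lit 2) [], App Times [App (Lit 3) [], App (Lit 4) []]]))
-- prints 14
```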

2.1.6 Writing conventions

For brevity, we will make the following writing conventions:

• We write {푒1, …, 푒푘 ∣ 훷} instead of {푒1 ∣ 훷} ∪ … ∪ {푒푘 ∣ 훷}. • If 휔() is a ranked tree or a sorted tree, then we will abbreviate it by 휔. • For some 훺-algebra (풜, 휃) and 휔 ∈ 훺, we usually write 휔 rather than 휃(휔) if 휃 is clear from the context.

• We may write (풜, 휔1, …, 휔푘) rather than (풜, 휃) for an 훺-algebra if 훺 = {휔1, …, 휔푘} and 휃 is clear from the context; in this case we might omit the explicit specification of the signature 훺 and call (풜, 휔1, …, 휔푘) an algebra rather than an 훺-algebra. If we are not interested in the operations of (풜, 휃), we may even just write 풜 to refer to it.

2.1.7 Concrete algebras

Monoids. A monoid is an algebra (풜, ⊙, ퟙ) where ⊙ is an associative binary operation and ퟙ is identity w.r.t. ⊙. Figure 2.2 places monoids among some other group-like algebras (i.e. algebras with one binary operation) that are considered in group theory [Ros68] and in order theory [Grä78]. Note that figure 2.2 is a Hasse diagram (if we ignore the arrow heads and the edge labels) where the relation is the subset relation. Furthermore, the edges of the diagram are


[Diagram omitted: a Hasse diagram of group-like algebras with groupoids at the top, below them semigroups, then monoids and idempotent semigroups, then groups, commutative monoids, and semilattices, and at the bottom Abelian groups and bounded semilattices; the edges are labelled associativity, identity, idempotence, commutativity, and (left and right) inverse.]

Figure 2.2: A diagram of group-like algebras.

directed (from the larger/upper class to the smaller/lower class) and labelled with the property that is additionally required by the smaller/lower class in comparison to the larger/upper class. The edge labels refer to the properties defined in section 2.1.4. In this dissertation, only the monoids and the commutative monoids from figure 2.2 will be relevant. The interested reader shall be referred to Rosenfeld [Ros68, pages 90, 117, and 120] and Grätzer [Grä78, pages 7 and 47] for the definitions of the other group-like algebras shown in the diagram.

Example 2.3 (free monoid, cf. DK09, section 2). Let 훴 be a set. The free monoid (over 훴) is the algebra (훴∗, ∘, 휀). For any monoid (풜, ⊙, ퟙ) and any function 푓: 훴 → 풜, there is a unique superset of 푓 that is a homomorphism between (훴∗, ∘, 휀) and (풜, ⊙, ퟙ). □

Let 푀1 = (풜1, ⊙1, ퟙ1) and 푀2 = (풜2, ⊙2, ퟙ2) be monoids. The product monoid of 푀1 and 푀2, denoted by 푀1 × 푀2, is the monoid (풜1 × 풜2, ⊙1 × ⊙2, (ퟙ1, ퟙ2)) where

  (푎1, 푎2) (⊙1 × ⊙2) (푎1′, 푎2′) = (푎1 ⊙1 푎1′, 푎2 ⊙2 푎2′)

for each 푎1, 푎1′ ∈ 풜1 and 푎2, 푎2′ ∈ 풜2.

Let (풜, ⊕, ퟘ) be a commutative monoid. We define a function ⨁ that takes an indexed 풜-family and returns an element of 풜 such that

  ⨁(푎푖 ∣ 푖 ∈ 퐼) = 푎푖1 ⊕ … ⊕ 푎푖푘

for each finite set 퐼 = {푖1, …, 푖푘} and each 퐼-indexed 풜-family (푎푖 ∣ 푖 ∈ 퐼). (풜, ⊕, ퟘ) is called complete if there is a function ⨁′ that maps every indexed 풜-family to 풜, extends ⊕, and satisfies infinitary associativity and commutativity laws [cf. Gol99, chapter 22; DV13, section 2], i.e. for every set 퐼 and every 퐼-indexed 풜-family (푎푖 ∣ 푖 ∈ 퐼), the following holds:

(i) ⨁′(푎푖 ∣ 푖 ∈ ∅) = ퟘ;
(ii) for every 푗 ∈ 퐼: ⨁′(푎푖 ∣ 푖 ∈ {푗}) = 푎푗;


(iii) for every 푗, 푘 ∈ 퐼 with 푗 ≠ 푘: ⨁′(푎푖 ∣ 푖 ∈ {푗, 푘}) = 푎푗 ⊕ 푎푘; and
(iv) for every set 퐽 and every 퐽-indexed 풫(퐼)-family (ℐ푗 ∣ 푗 ∈ 퐽) where {ℐ푗 ∣ 푗 ∈ 퐽} is a partition of 퐼, we have ⨁′(⨁′(푎푖 ∣ 푖 ∈ ℐ푗) ∣ 푗 ∈ 퐽) = ⨁′(푎푖 ∣ 푖 ∈ 퐼).

We will write ⨁ rather than ⨁′ since ⨁′ ⊇ ⨁. Furthermore, we will write ⨁푖∈퐼 푎푖 rather than ⨁(푎푖 ∣ 푖 ∈ 퐼).
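For finite families, ⨁ is just an iterated fold of ⊕ starting from ퟘ, and the product monoid is the componentwise combination of two monoids. A short Haskell sketch using the standard Monoid class (the correspondence is mine; Haskell’s pair instance happens to coincide with the product monoid defined above):

```haskell
import Data.Monoid (Sum (..))

-- the big sum over a finite family: fold with <>, starting from mempty
bigSum :: Monoid m => [m] -> m
bigSum = mconcat

-- Haskell's pair instance is exactly the product monoid:
-- (a1, a2) <> (a1', a2') = (a1 <> a1', a2 <> a2')
main :: IO ()
main = do
  print (bigSum [Sum 1, Sum 2, Sum 3 :: Sum Int])   -- Sum 6
  print ((Sum 1, [True]) <> (Sum 2, [False]))       -- (Sum 3,[True,False])
```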

Bimonoids and semirings. A bimonoid is an algebra (풜, ⊕, ⊙, ퟘ, ퟙ) where (풜, ⊕, ퟘ) and (풜, ⊙, ퟙ) are both monoids; we call (풜, ⊕, ퟘ) the additive monoid of 풜 and (풜, ⊙, ퟙ) the multiplicative monoid of 풜. A bimonoid (풜, ⊕, ⊙, ퟘ, ퟙ) is called strong if ⊕ is commutative and ퟘ is absorbing w.r.t. ⊙. A strong bimonoid (풜, ⊕, ⊙, ퟘ, ퟙ) is called commutative if ⊙ is commutative. A semiring is a strong bimonoid (풜, ⊕, ⊙, ퟘ, ퟙ) where ⊙ is distributive over ⊕.

Figure 2.4 shows a Hasse diagram (again, ignoring the arrow heads and the edge labels) of ring-like algebras that are considered in ring theory [Ros68; Gol99] and lattice theory [Grä78]. The edges of the diagram are directed and labelled with the properties that are additionally required by the smaller/lower class in comparison to the larger/upper class. The edge labels refer to the properties defined in section 2.1.4 and to absorption laws. The absorption laws are 푎1 ⊕ (푎1 ⊙ 푎2) = 푎1 and 푎1 ⊙ (푎1 ⊕ 푎2) = 푎1 for each 푎1, 푎2 ∈ 풜. Note that the absorption laws together with the existence of identities for ⊕ and ⊙ imply that ⊕ and ⊙ are idempotent (푎 ⊙ 푎 = 푎 ⊙ (푎 ⊕ ퟘ) = 푎 and 푎 ⊕ 푎 = 푎 ⊕ (푎 ⊙ ퟙ) = 푎). Strong bimonoids, commutative strong bimonoids, semirings, and commutative semirings will be relevant in this dissertation; for the definitions of the other ring-like algebras shown in the diagram refer to Rosenfeld [Ros68, p. 205, 207, 210, and 211], Golan [Gol99, p. 1 and 52], and Grätzer [Grä78, p. 3 and 47].

A strong bimonoid is called complete if its additive monoid is complete. A semiring is called complete if it is a complete strong bimonoid and ⨁ satisfies infinitary distributivity laws [cf. Gol99, chapter 22], i.e. for each set 퐼, 퐼-indexed 풜-family (푎푖 ∣ 푖 ∈ 퐼), and 푎 ∈ 풜, the following holds

푎 ⊙ ⨁(푎푖 ∣ 푖 ∈ 퐼) = ⨁(푎 ⊙ 푎푖 ∣ 푖 ∈ 퐼) and ⨁(푎푖 ∣ 푖 ∈ 퐼) ⊙ 푎 = ⨁(푎푖 ⊙ 푎 ∣ 푖 ∈ 퐼).

Example 2.5 (taken from Den17b, example 2.1). We provide a list of concrete algebras [cf. DSV10, example 1] that routinely occur in formal language theory, , and natural language processing, and place them among the classes of ring-like algebras presented above:

• complete commutative semirings:
– the Boolean semiring, (픹, ∨, ∧, 0, 1), where ∨ is the logical disjunction and ∧ is the logical conjunction,

– the probability semiring with ∞, Pr∞ = (ℝ≥0 ∪ {∞}, +, ⋅, 0, 1), where 0 ⋅ ∞ = 0,
– the Viterbi semiring, ([0, 1], max, ⋅, 0, 1),
– the tropical semiring, Trop = (ℝ ∪ {∞}, min, +, ∞, 0),
– the arctic semiring, Arct = (ℝ ∪ {−∞}, max, +, −∞, 0),

• complete semirings:
– any complete commutative semiring,


[Diagram omitted: a Hasse diagram of ring-like algebras with bigroupoids at the top, below them bimonoids, then strong bimonoids and lattices, then commutative strong bimonoids, semirings, and bounded lattices, then commutative semirings, semifields, and rings, and at the bottom commutative rings and fields; the edges are labelled with associativity, commutativity, identities, absorption laws, distributivity, and invertibility requirements.]

Figure 2.4: A diagram of ring-like algebras.


– the semiring of formal languages over 훴, Lang훴 = (풫(훴∗), ∪, ∘, ∅, {휀}), where 훴 is a set,

• commutative semirings:
– any complete commutative semiring,
– the probability semiring, Pr = (ℝ≥0, +, ⋅, 0, 1),

• complete commutative strong bimonoids:
– any complete commutative semiring,
– the tropical bimonoid, Trop′ = (ℝ≥0 ∪ {∞}, +, min, 0, ∞),
– the arctic bimonoid, Arct′ = (ℝ≤0 ∪ {−∞}, +, max, 0, −∞),
– the algebra Pr1 = ([0, 1], +1, ⋅, 0, 1), where 푎 +1 푏 = 푎 + 푏 − 푎 ⋅ 푏 for each 푎, 푏 ∈ [0, 1],
– the algebra Pr2 = ([0, 1], +2, ⋅, 0, 1), where 푎 +2 푏 = min{푎 + 푏, 1} for each 푎, 푏 ∈ [0, 1],

• complete lattices:2
– the lattice Pow퐴 = (풫(퐴), ∪, ∩, ∅, 퐴) for an arbitrary set 퐴, and
– the lattice Div = (ℕ+ ∪ {∞}, lcm, gcd, 1, ∞), where lcm is the least common multiple, gcd is the greatest common divisor, lcm(∞, 푛) = ∞ = lcm(푛, ∞), and gcd(∞, 푛) = 푛 = gcd(푛, ∞) for each 푛 ∈ ℕ+ ∪ {∞},

• strong bimonoids:
– the string semiring [Moh00, page 188], Str훴 = (훴∗ ∪ {∞}, ∧, ∘, ∞, 휀), where 훴 is a set, ∧ calculates the longest common prefix of its arguments, and ∞ ∉ 훴 such that ∞ ∧ 푤 = 푤 = 푤 ∧ ∞ and ∞ ∘ 푤 = ∞ = 푤 ∘ ∞ for each 푤 ∈ 훴∗.3

Note in the list above that the following algebras are not distributive and thus not semirings:

• Trop′ because min{1, 1 + 1} = 1 ≠ 2 = min{1, 1} + min{1, 1},
• Arct′ because max{−1, −1 + (−1)} = −1 ≠ −2 = max{−1, −1} + max{−1, −1},
• Pr1 because 1/2 ⋅ (1 +1 1) = 1/2 ≠ 3/4 = (1/2 ⋅ 1) +1 (1/2 ⋅ 1), and
• Pr2 because 2/3 ⋅ (1 +2 1) = 2/3 ≠ 1 = (2/3 ⋅ 1) +2 (2/3 ⋅ 1).

Lang훴 and Str훴 have concatenation as the binary operation of their multiplicative monoid and are therefore only commutative if |훴| ≤ 1; for |훴| ≥ 2 they are not commutative, since {a} ∘ {b} = {ab} ≠ {ba} = {b} ∘ {a} for any a, b ∈ 훴 with a ≠ b. □
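The distributivity counterexamples for Pr1 and Pr2 can be checked mechanically. A small Haskell sketch of my own (representing [0, 1] by Double; the comparisons below are robust for these particular inputs):

```haskell
-- the two additions of the strong bimonoids Pr1 and Pr2
plus1, plus2 :: Double -> Double -> Double
plus1 a b = a + b - a * b     -- +_1 in Pr1
plus2 a b = min (a + b) 1     -- +_2 in Pr2

-- does multiplication distribute over the given addition at a, b, c?
distributes :: (Double -> Double -> Double) -> Double -> Double -> Double -> Bool
distributes oplus a b c = a * (b `oplus` c) == (a * b) `oplus` (a * c)

main :: IO ()
main = do
  print (distributes plus1 0.5 1 1)     -- False: 1/2 /= 3/4
  print (distributes plus2 (2/3) 1 1)   -- False: 2/3 /= 1
```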

Let (풜, ⊕, ⊙, ퟘ, ퟙ) be a strong bimonoid, 퐵 be a set, and 푓: 퐵 → 풜. The support of 푓, denoted by supp(푓), is the set {푏 ∈ 퐵 ∣ 푓(푏) ≠ ퟘ}.

2Note that any complete lattice is bounded.
3Note that the string semiring is not a semiring because it is not right distributive, i.e. the property

∀푎, 푏, 푐: (푎 ⊕ 푏) ⊙ 푐 = (푎 ⊙ 푐) ⊕ (푏 ⊙ 푐)

does not hold.


2.2 Language devices

A language is a set of strings. A language device is a syntactic object, usually a tuple, that can be written as a finite string. We associate a language ℒ(퐷) with each language device 퐷 and say that 퐷 is a (finite) representation of ℒ(퐷). Two language devices 퐷1 and 퐷2 are called equivalent if their associated languages are the same, i.e. ℒ(퐷1) = ℒ(퐷2). Here, we are interested in two kinds of language devices: grammars and automata. Other language devices also occur in the literature, e.g. regular expressions [Kle51] and logical formulæ [Büc60; Elg61; Tra61; Büc66]. In a grammar, a string is derived starting from an initial non-terminal by repeatedly applying so-called rules. In an automaton, a string is recognised by consuming symbols of the string from left to right while applying so-called transitions starting from an initial configuration.4

2.2.1 Context-free grammars

The grammar class best known in the natural language processing community is the class of context-free grammars, introduced by Chomsky [Cho56, section 3].

Definition 2.6 (context-free grammars, syntax). A context-free grammar (short: CFG) is a tuple

퐺 = (푁, 훴, 푁i, 푅) where
• 푁 and 훴 are disjoint sets whose elements are called non-terminals and terminals, respectively,
• 푁i ⊆ 푁, its elements are called initial non-terminals, and
• 푅 is a finite subset of 푁 × (푁 ∪ 훴)∗, its elements are called rules. □

Definition 2.7 (CFGs, language). Let 퐺 = (푁, 훴, 푁i, 푅) be a CFG.
• The derivation relation of 퐺, denoted by ⇒퐺, is the endorelation on (푁 ∪ 훴)∗ such that 푢퐴푤 ⇒퐺 푢푣푤 for each (퐴, 푣) ∈ 푅 and 푢, 푤 ∈ (푁 ∪ 훴)∗.
• The elements of (푁 ∪ 훴)∗ are called sentential forms.
• The language of 퐺, denoted by ℒ(퐺), is the set {푤 ∈ 훴∗ ∣ ∃푆 ∈ 푁i: 푆 ⇒퐺∗ 푤}. □

In view of the definition of the derivation relation, we might think of a rule (퐴, 푣) from a CFG as a replacement operation where 퐴 is replaced by 푣 in any context (푢, 푤) ∈ (푁 ∪ 훴)∗ × (푁 ∪ 훴)∗. To support this association, we will henceforth write 퐴 → 푣 rather than (퐴, 푣). We write ⇒ rather than ⇒퐺 if 퐺 is clear from the context.

While it is usually required in the literature that the set of non-terminals and the set of terminals are finite, we omitted this requirement. We will also drop this requirement (or similar ones) in our definitions of other language devices and weighted language devices. This leads to a clearer exposition in some places (e.g. at the end of sections 2.2.4 and 2.3.3). The removal of this requirement does not contradict our goal of finding finite representations of languages: Only non-terminals and terminals that occur in the rules of a CFG are relevant for the language of a CFG. Since the set of rules is finite, those sets of relevant non-terminals and terminals

4Some automaton models extend this notion by allowing the automaton to move through the string more freely, e.g. two-way automata [She59; RS59].


are finite as well. Hence there is a finite representation for the language of any CFG. Similar arguments apply to the other language devices and to the weighted language devices presented in this work.

Example 2.8. Let 훴 = {a, ā}. Consider the CFG 퐺 = ({푆}, 훴, {푆}, 푅) where

푅 = { 푆 → 휀, 푆 → a푆ā푆 }.

The string 푤 = aaāāaā is in ℒ(퐺) because

푆 ⇒ a푆ā푆 ⇒ aa푆ā푆ā푆 ⇒ aaā푆ā푆 ⇒ aaāā푆 ⇒ aaāāa푆ā푆 ⇒ aaāāaā푆 ⇒ 푤.

Note that there are 80 different such proofs for 푆 ⇒∗ 푤. The language ℒ(퐺) is the Dyck language with one bracket pair. Dyck languages are defined in definition 5.2. □

The high number of different proofs mentioned in example 2.8 stems from the fact that the derivation relation ⇒ may replace any non-terminal in a sentential form by the right-hand side of a matching rule and thus there is a different proof for each order in which the non-terminals may be replaced. However, since each replacement only depends on the replaced non-terminal (and not on the surrounding non-terminals and terminals), the order of replacements is irrelevant for the resulting string of terminals. We can use a tree to track the rule that is used to replace each non-terminal. Such trees are called derivation trees and they were already used by Chomsky [Cho56]. Using the concept of sorted trees, we can state the definition of derivation trees concisely:

Definition 2.9 (CFGs, derivation trees). Let 퐺 = (푁, 훴, 푁i, 푅) be a CFG and let us construe 푅 as an (푁∗ × 푁)-sorted set where sort(휌) = (퐵1⋯퐵푘, 퐴) for each 휌 = 퐴 → 푢0퐵1푢1⋯퐵푘푢푘 ∈ 푅 with 푢0, …, 푢푘 ∈ 훴∗ and 퐴, 퐵1, …, 퐵푘 ∈ 푁.

• The set of derivation trees of 퐺, denoted by D퐺, is the 푁-sorted set Ts푅.
• The set of complete derivation trees of 퐺, denoted by Dc퐺, is the 푁-sorted set ⋃푆∈푁i (Ts푅)푆.

Now let 푑 = 휌(푑1, …, 푑푘) ∈ D퐺 be a derivation tree with 휌 = 퐴 → 푢0퐵1푢1⋯퐵푘푢푘.

• The yield of 푑 (w.r.t. 퐺), denoted by yield퐺(푑), is recursively defined as the string of terminal symbols 푢0 ∘ yield퐺(푑1) ∘ 푢1 ∘ … ∘ yield퐺(푑푘) ∘ 푢푘. □

We will abbreviate yield퐺 by yield if 퐺 is clear from the context.

The following observation is evident from definitions 2.7 and 2.9, in particular from the fact that ⇒퐺 performs its replacement without regard for the context of the replaced non-terminal and yield performs its recursive call without regard for the context.

Observation 2.10. ℒ(퐺) = {yield(푑) ∣ 푑 ∈ ⋃푆∈푁i (D퐺)푆} for each CFG 퐺 = (푁, 훴, 푁i, 푅). ∎

We continue the previous example with an illustration of derivation trees.

Example 2.8 (continuing from p. 27). For brevity, let 푟1 = (푆 → 휀) and 푟2 = (푆 → a푆ā푆). Now consider the derivation tree 푑 = 푟2(푟2(푟1, 푟1), 푟2(푟1, 푟1)) of 퐺. Its graphical representation is shown in figure 2.11. The yield of 푑 is 푤; in fact, 푑 is the only derivation tree of 퐺 whose yield is 푤. □
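A minimal Haskell sketch of example 2.8 (encoding mine): derivation trees of 퐺 are built from the two rules only, and yield is the recursive function from definition 2.9.

```haskell
-- r1 = S -> ε has no children; r2 = S -> a S ā S has two children
data DTree = R1 | R2 DTree DTree

-- the yield of definition 2.9, specialised to the two rules of G
yield :: DTree -> String
yield R1         = ""
yield (R2 d1 d2) = "a" ++ yield d1 ++ "ā" ++ yield d2

main :: IO ()
main = putStrLn (yield (R2 (R2 R1 R1) (R2 R1 R1)))   -- aaāāaā, the string w
```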


[Tree diagram omitted: the root 푆 → a푆ā푆 has two children, each again 푆 → a푆ā푆, whose children are four leaves 푆 → 휀; the yield is 푤 = aaāāaā.]

Figure 2.11: Derivation tree 푑 with yield 푤, cf. example 2.8.

While derivation trees are concerned with the rules that are applied to replace each non- terminal, there are similar objects that describe which part of a string in the language of a CFG is generated by which non-terminal. These objects are called parse trees and occur frequently in the literature. Figure 2.12 shows a derivation tree with the corresponding parse tree.

[Tree diagrams omitted: the derivation tree 푑 from figure 2.11 and, next to it, the corresponding parse tree, whose inner nodes are labelled 푆 and whose leaves spell out the string 푤.]

Figure 2.12: Derivation tree 푑 and the corresponding parse tree 푡, cf. example 2.8.

It is easy to see from the figure that parse trees are neither ranked trees nor sorted trees. Rather they are trees in the graph-theoretic sense, i.e. graphs with a designated root (marked by being the top-most node) to which each node of the graph has exactly one path. Each node in a parse tree is either labelled with a non-terminal or a terminal of the underlying grammar. The close relation between parse trees and derivation trees is apparent from figure 2.12, in particular, they encode the same information about the derivation process. Since most of the constructions shown in this work will operate on the level of rules, we will henceforth restrict ourselves to derivation trees. Let 훴 be a set and 퐿 ⊆ 훴∗. We call 퐿 context-free (over 훴) if there is a CFG 퐺 such that ℒ(퐺) = 퐿. The set of context-free languages over 훴 is denoted by CFL(훴). We abbreviate CFL(훴) by CFL if 훴 is clear from the context. Note that CFL(훴) contains only languages that make use of no more than finitely many elements of 훴.

2.2.2 Parallel multiple context-free grammars

Seki, Matsumura, Fujii, and Kasami [Sek+91] introduced parallel multiple context-free grammars as a generalisation of context-free grammars.

CFGs and string algebras. Let us first revisit CFGs. We have already seen two views on CFG rules: as a substitution operation and as the nodes of derivation trees. Now we add a third view for good measure: rules as elements of the signature of an algebra. Let us fix the infinite set X = {푥푖 ∣ 푖 ∈ ℕ+} which we call the set of variables. Now let 훴 be a set such that


훴 ∩ X = ∅. The set of string composition representations over 훴, denoted by SCR(훴), is the ranked set (Rep, rk) where Rep = {[푢0푥1푢1…푥푘푢푘] ∣ 푘 ∈ ℕ, 푢0, …, 푢푘 ∈ 훴∗} is a set of strings over {[, ]} ∪ 훴 ∪ X and rk(푐) = 푘 for each 푐 = [푢0푥1푢1…푥푘푢푘] ∈ Rep (푢0, …, 푢푘 ∈ 훴∗).

Definition 2.13 (string algebra). Let 훴 be a set. The string algebra over 훴 is the SCR(훴)-algebra (훴∗, 휑훴) where 휑훴(푐)(푣1, …, 푣푘) = 푢0푣1푢1…푣푘푢푘 for each 푐 = [푢0푥1푢1…푥푘푢푘] ∈ SCR(훴) and 푣1, …, 푣푘 ∈ 훴∗. We define a function hom휑,훴: TrSCR(훴) → 훴∗ where

hom휑,훴(푡) = 휑훴(푐)(hom휑,훴(푡1), …, hom휑,훴(푡푘))

for each 푡 = 푐(푡1, …, 푡푘) ∈ TrSCR(훴).5 □

Now let 푁 and 훴 be finite sets. We define an (푁∗ × 푁)-sorted set 푅 = (푁 × (푁 ∪ 훴)∗, sort) and a function comp푁,훴: 푅 → SCR(훴) such that sort(휌) = (퐵1…퐵푘, 퐴) and comp푁,훴(휌) = [푢0푥1푢1…푥푘푢푘] for each 휌 = (퐴, 푢0퐵1푢1…퐵푘푢푘) ∈ 푅 with 푢0, …, 푢푘 ∈ 훴∗ and 퐴, 퐵1, …, 퐵푘 ∈ 푁. Note that 푅 contains all rules and Ts푅 contains all derivation trees that may occur in a CFG with non-terminals 푁 and terminals 훴. We define a function comp′푁,훴: Ts푅 → TrSCR(훴) where

comp′푁,훴(푑) = (comp푁,훴(휌))(comp′푁,훴(푑1), …, comp′푁,훴(푑푘))

for each 푑 = 휌(푑1, …, 푑푘) ∈ Ts푅.6 We will write comp푁,훴 rather than comp′푁,훴.

The next observation follows from definitions 2.9 and 2.13 and the definition of comp푁,훴.

Proposition 2.14. yield퐺 = comp푁,훴 ; hom휑,훴 for each CFG 퐺 = (푁, 훴, 푁i, 푅).

Proof. As previously noted, yield퐺: Ts푅 → 훴∗, comp푁,훴: Ts푅 → TrSCR(훴), and hom휑,훴: TrSCR(훴) → 훴∗ are each homomorphisms. Hence, comp푁,훴 ; hom휑,훴: Ts푅 → 훴∗ is also a homomorphism. Furthermore, since Ts푅 is a term algebra, there is a unique homomorphism into the string algebra over 훴, cf. example 2.1. Consequently, yield퐺 and comp푁,훴 ; hom휑,훴 must be the same. ∎

In the following, for any 훴, we will no longer distinguish between a string composition representation 푐 over 훴 and 휑훴(푐). Instead we call both objects a string composition (function) and we write 푐 rather than 휑훴(푐).

Furthermore, in light of the decomposition of a rule 휌 = 퐴 → 푢0퐵1푢1…퐵푘푢푘 into its string composition [푢0푥1푢1…푥푘푢푘] and its sort (퐵1…퐵푘, 퐴), we propose an alternative syntax for rules of a CFG:7 휌 = 퐴 → [푢0푥1푢1…푥푘푢푘](퐵1, …, 퐵푘).
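Applying a string composition [푢0푥1푢1…푥푘푢푘] just interleaves the fixed terminal blocks with the argument strings. A small Haskell sketch (the list-based encoding of a composition is mine and assumes as many variable slots as arguments):

```haskell
-- a composition [u0 x1 u1 ... xk uk] encoded as its terminal blocks [u0, ..., uk];
-- applying it to v1, ..., vk yields u0 v1 u1 ... vk uk
apply :: [String] -> [String] -> String
apply (u0 : us) vs = u0 ++ concat (zipWith (++) vs us)
apply []        _  = error "malformed composition"

main :: IO ()
main = putStrLn (apply ["a", "b", ""] ["X", "Y"])
-- prints aXbY, i.e. the composition [a x1 b x2] applied to (X, Y)
```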

String-tuple algebras and parallel multiple context-free grammars. Now we extend the definition of string algebras and of CFGs to be able to deal with tuples. The resulting objects are then called tuple algebras and parallel multiple context-free grammars, respectively.

5The function hom휑,훴 is a homomorphism between the SCR(훴)-term algebra and the string algebra over 훴.
6The function comp′푁,훴 is a homomorphism between the 푅-term algebra and the SCR(훴)-term algebra.
7This syntax coincides with the syntax of a rule of a regular tree grammar [Bra69; cf. notation of Eng15, p. 16] where SCR(훴) is the set of terminals.


Although this is a tedious task that is avoided by most authors in the field of natural language processing, we will do it explicitly in order to highlight the role of sorted algebras in the definition.

Let us fix a set X(푠1…푠푘) = {푥푖,푗 ∣ 푖 ∈ [푘], 푗 ∈ [푠푖]} for each 푘, 푠1, …, 푠푘 ∈ ℕ, and let X = ⋃푘,푠1,…,푠푘∈ℕ X(푠1…푠푘). We call the elements of X variables. Note that X(휀) = ∅. Now let 훴 be a set such that 훴 ∩ X = ∅. The set of (string-)tuple composition representations, denoted by TCR(훴), is the (ℕ∗ × ℕ)-sorted set (Rep, sort) where

Rep = {[푢1, …, 푢푠](푠1…푠푘) ∣ 푘, 푠1, …, 푠푘, 푠 ∈ ℕ, 푢1, …, 푢푠 ∈ (훴 ∪ X(푠1…푠푘))∗}

and sort(푐) = (푠1…푠푘, 푠) for each 푐 = [푢1, …, 푢푠](푠1…푠푘) ∈ Rep. For any tuple composition representation 푐 = [푢1, …, 푢푠](푠1…푠푘), the fanout of 푐, denoted by fanout(푐), is 푠. We call a tuple composition representation [푢1, …, 푢푠](푠1…푠푘)

• linear if each variable from X(푠1…푠푘) occurs at most once in 푢1…푢푠,
• non-deleting if each variable from X(푠1…푠푘) occurs at least once in 푢1…푢푠,
• monotonous if it is linear, non-deleting, and the variables 푥1,1, …, 푥푘,1 occur in ascending order of their first indices in 푢1…푢푠,
• strongly monotonous if it is monotonous and, for each 푖 ∈ [푘], the variables 푥푖,1, …, 푥푖,푠푖 occur in ascending order of their second indices in 푢1…푢푠, and
• terminal-free if 푢1, …, 푢푠 ∈ X∗.

Definition 2.15 (tuple algebra). Let 훴 be a set such that 훴 ∩ X = ∅. A tuple algebra over 훴 is the ℕ-sorted TCR(훴)-algebra (⋃푠∈ℕ (훴∗)푠, 휓훴) where

휓훴(푐)((푣1,1, …, 푣1,푠1), …, (푣푘,1, …, 푣푘,푠푘)) = (푢′1, …, 푢′푠)

for each 푐 = [푢1, …, 푢푠](푠1…푠푘) ∈ TCR(훴) and 푣1,1, …, 푣1,푠1, …, 푣푘,1, …, 푣푘,푠푘 ∈ 훴∗, where 푢′1, …, 푢′푠 are obtained from 푢1, …, 푢푠, respectively, by replacing each occurrence of 푥푖,푗 with 푣푖,푗 for each 푖 ∈ [푘] and 푗 ∈ [푠푖]. We define the function hom휓,훴: TsTCR(훴) → ⋃푠∈ℕ (훴∗)푠 where

hom휓,훴(푡) = 휓훴(푐)(hom휓,훴(푡1), …, hom휓,훴(푡푘))

for each 푡 = 푐(푡1, …, 푡푘) ∈ TsTCR(훴).8 □

In the following, for any 훴, we will no longer distinguish between a tuple composition representation 푐 over 훴 and 휓훴(푐). Instead we call both objects a tuple composition (function) and we write 푐 rather than 휓훴(푐). Also, we will abbreviate [푢1, …, 푢푠](푠1…푠푘) by [푢1, …, 푢푠] whenever 푘 and 푠1, …, 푠푘 are clear from the context.

Definition 2.16 (parallel multiple context-free grammars, syntax). A parallel multiple context-free grammar (short: PMCFG) is a tuple 퐺 = (푁, 훴, 푁i, 푅) where
• 푁 is an ℕ-sorted set, whose elements we call non-terminals,9

8The function hom휓,훴 is a homomorphism between the TCR(훴)-term algebra and the tuple algebra over 훴.
9In the literature, non-terminals with sort 0 are usually not allowed. We include such non-terminals since they are necessary to prove lemma 2.59.


• 훴 is a set disjoint from X, whose elements we call terminals,

• 푁i ⊆ 푁1, its elements are called initial non-terminals, and
• 푅 is a finite (푁∗ × 푁)-sorted subset of 푁 × TCR(훴) × 푁∗ where sort((퐴, 푐, 퐵1⋯퐵푘)) = (퐵1…퐵푘, 퐴) and sort(푐) = (sort(퐵1)…sort(퐵푘), sort(퐴)); the elements of 푅 are called rules. □

Let 퐺 = (푁, 훴, 푁i, 푅) be a PMCFG and 휌 = (퐴, 푐, 퐵1⋯퐵푘) ∈ 푅 with sort(푐) = (푠1⋯푠푘, 푠). The fanout of 휌, denoted by fanout(휌), is 푠. The rank of 휌, denoted by rank(휌), is 푘. For any 푖 ∈ [푘], the 푖-fanout of 휌, denoted by fanout푖(휌), is 푠푖.

Definition 2.17 (PMCFGs, language). Let 퐺 = (푁, 훴, 푁i, 푅) be a PMCFG and 퐴 ∈ 푁. s • The set of derivation trees of 퐺, denoted by D퐺, is the set T푅 of sorted trees over 푅. • The set of complete derivation trees of 퐺, denoted by Dc , is the 푁-sorted set ⋃ (Ts ) . 퐺 푆∈푁i 푅 푆 ∗ Now let 푑 = 휌(푑1, …, 푑푘) ∈ D퐺 with 휌 = (퐴, [푢1, …, 푢푠], 퐵1⋯퐵푘) and let 푤 ∈ 훴 . ∗ 푠 • The yield of 푑, denoted by yield(푑), is the tuple of strings (휋2 ; hom휓,훴)(푑) ∈ (훴 ) 휋 : s → s where 2 T푁×TCR(훴)×푁∗ TTCR(훴) is the function that returns the tree obtained from its argument by replacing every triple in the tree with its second component. c −1 c • The set of complete derivation trees of 퐺 for 푤, denoted by D퐺(푤), is the set yield (푤)∩D퐺.

• The language of 퐴 in 퐺, denoted by ℒ(퐺, 퐴), is the set {yield(푑) ∣ 푑 ∈ (D퐺)퐴} of tuples.
• The language of 퐺, denoted by ℒ(퐺), is the set {yield(푑) ∣ 푑 ∈ Dc퐺}. □

We will usually write 퐴 → [푢1, …, 푢푚](퐵1, …, 퐵ℓ) rather than (퐴, [푢1, …, 푢푚], 퐵1…퐵ℓ) for rules of a PMCFG.10

Example 2.18 (PMCFG). Consider the PMCFG 퐺 = (푁, 훴, {푆}, 푅) where 푁 = {푆, 퐴} with sort(푆) = 1 and sort(퐴) = 2, 훴 = {a, b}, and 푅 contains exactly the following three rules:

푆 → [푥1,1푥1,2](퐴) 퐴 → [푥1,1푥1,1, 푥1,2푥1,2](퐴) 퐴 → [a, b]().

The language of 퐺 is ℒ(퐺) = {a^(2^푛)b^(2^푛) ∣ 푛 ∈ ℕ}, which is not context-free. For each 푛 ∈ ℕ, there is a unique derivation tree 푑푛 ∈ Dc퐺 with yield a^(2^푛)b^(2^푛), which is shown in figure 2.19. □
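To make the tuple-algebra semantics tangible, the following minimal Python sketch (ours, not part of the original text; the encoding of rules is an assumption we introduce for illustration) evaluates a derivation tree of 퐺 bottom-up, i.e. it computes hom휓,훴 restricted to the rules of example 2.18. A variable 푥푖,푗 is encoded as the pair (푖, 푗) and a composition representation as a list of components.

```python
# A rule is (components, rank); a component is a list of symbols, where a
# variable x_{i,j} is the pair (i, j) and a terminal is a plain character.
S_rule = ([[(1, 1), (1, 2)]], 1)                     # S -> [x11 x12](A)
A_rec  = ([[(1, 1), (1, 1)], [(1, 2), (1, 2)]], 1)   # A -> [x11 x11, x12 x12](A)
A_base = ([["a"], ["b"]], 0)                         # A -> [a, b]()

def evaluate(tree):
    """Compute yield(d) of a derivation tree d = (rule, subtrees)."""
    (components, _rank), subtrees = tree
    child_tuples = [evaluate(t) for t in subtrees]
    result = []
    for component in components:
        word = ""
        for symbol in component:
            if isinstance(symbol, tuple):            # variable x_{i,j}
                i, j = symbol
                word += child_tuples[i - 1][j - 1]
            else:                                    # terminal symbol
                word += symbol
        result.append(word)
    return tuple(result)

def d(n):
    """The unique complete derivation tree d_n of example 2.18."""
    tree = (A_base, [])
    for _ in range(n):
        tree = (A_rec, [tree])
    return (S_rule, [tree])

assert evaluate(d(3)) == ("a" * 8 + "b" * 8,)        # yield(d_3) = a^(2^3) b^(2^3)
```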

A PMCFG 퐺 = (푁, 훴, 푁i, 푅) is called
• of fanout 푠 if each rule in 푅 has fanout at most 푠,
• of rank 푘 if each rule in 푅 has rank at most 푘,
• a multiple context-free grammar (short: MCFG) if the second component of each rule in 푅 is linear,
• a string-rewriting linear context-free rewriting system (short: string-LCFRS) if the second component of each rule in 푅 is linear and non-deleting,
• unambiguous if |Dc퐺(푤)| ≤ 1 for each 푤 ∈ 훴∗,

10This syntax coincides with the syntax of a rule of a regular tree grammar [Bra69; cf. notation of Eng15, p. 16] where TCR(훴) is the set of terminals.


[Figure 2.19: Unique derivation tree 푑푛 with yield a^(2^푛)b^(2^푛), cf. example 2.18. The tree consists of the rule 푆 → [푥1,1푥1,2](퐴) at the root, followed by 푛 occurrences of 퐴 → [푥1,1푥1,1, 푥1,2푥1,2](퐴) and the leaf 퐴 → [a, b]().]

• ambiguous if it is not unambiguous,
• finitely ambiguous if Dc퐺(푤) is finite for each 푤 ∈ 훴∗,
• infinitely ambiguous if it is not finitely ambiguous,
• monotonous if the second component of each rule in 푅 is monotonous, and
• strongly monotonous if the second component of each rule in 푅 is strongly monotonous.
Note that 퐺 in example 2.18 is of fanout 2, rank 1, and strongly monotonous; it is neither an MCFG nor a string-LCFRS. MCFGs are expressively equivalent to string-LCFRSs [VWJ87], i.e. for every MCFG there is an equivalent string-LCFRS and vice versa [Sek+91, lemma 2.2]. String-rewriting linear context-free rewriting systems have been intensively studied by the natural language processing community in recent years; see the book of Kallmeyer [Kal10] for an overview.

Example 2.20 (string-LCFRS, taken from Den15, example 4). Consider the MCFG 퐺 = (푁, 훴, {푆}, 푅) where 푁 = {푆, 퐴, 퐵} with sort(푆) = 1 and sort(퐴) = sort(퐵) = 2, 훴 = {a, b, c, d}, and 푅 contains exactly the following five rules:

푆 → [푥1,1푥2,1푥1,2푥2,2](퐴, 퐵) 퐴 → [a푥1,1, c푥1,2](퐴) 퐴 → [휀, 휀]()

퐵 → [b푥1,1, d푥1,2](퐵) 퐵 → [휀, 휀]().

The language of 퐺 is ℒ(퐺) = {a푚b푛c푚d푛 ∣ 푚, 푛 ∈ ℕ}, which is not context-free. For each 푚 ∈ ℕ and 푛 ∈ ℕ, there is a unique derivation tree 푑푚,푛 ∈ Dc퐺 with yield a푚b푛c푚d푛 w.r.t. 퐺, which is shown in figure 2.21. Note that 퐺 is a strongly monotonous string-LCFRS of fanout 2 and rank 2. □

Examples 2.18 and 2.20 were unambiguous PMCFGs. The next example illustrates finite ambiguity.

Example 2.22. Consider the string-LCFRS 퐺 = (푁, 훴, {푆}, 푅) where 푁 = {푆, 퐴} with


[Figure 2.21: Derivation tree 푑푚,푛 with yield a푚b푛c푚d푛 w.r.t. 퐺, cf. example 2.20. Below the root 푆 → [푥1,1푥2,1푥1,2푥2,2](퐴, 퐵) there are 푚 occurrences of 퐴 → [a푥1,1, c푥1,2](퐴) ending in the leaf 퐴 → [휀, 휀]() and 푛 occurrences of 퐵 → [b푥1,1, d푥1,2](퐵) ending in the leaf 퐵 → [휀, 휀]().]

sort(퐴) = 1, 훴 = {a, ā}, and 푅 contains exactly the following five rules:

푆 → [휀]() 푆 → [푥1,1](퐴)

퐴 → [aā]() 퐴 → [a푥1,1ā](퐴) 퐴 → [푥1,1푥2,1](퐴, 퐴)

Note that 퐺 is of fanout 1 and its language is the same as the one from the CFG shown in example 2.8. 퐺 is ambiguous. There are, for example, two derivation trees of 퐺 that yield the string aāaāaā. Both are shown in figure 2.23. However, for each string 푤 ∈ 훴∗, there are only finitely many derivation trees that yield 푤. □

[Figure 2.23: The two derivation trees of 퐺 with yield aāaāaā, cf. example 2.22. Both have the root 푆 → [푥1,1](퐴) and three leaves 퐴 → [aā](), combined by two occurrences of 퐴 → [푥1,1푥2,1](퐴, 퐴); one tree is left-branching, the other right-branching.]

And finally, we show an example of infinite ambiguity:

Example 2.24. Consider the string-LCFRS 퐺 = (푁, 훴, 푁, 푅) of fanout 1 where 푁 = {푆} with sort(푆) = 1, 훴 = {a, ā}, and 푅 contains exactly the following three rules:

푆 → [휀]() 푆 → [a푥1,1ā](푆) 푆 → [푥1,1푥2,1](푆, 푆)

This string-LCFRS again has the same language as the one from the CFG shown in example 2.8. However, it is not finitely ambiguous since each of the countably infinitely many derivation trees built from the two rules 푆 → [휀]() and 푆 → [푥1,1푥2,1](푆, 푆) has yield 휀. □


For each set 훴 and 푘, 푠 ∈ ℕ+, we fix the following sets of grammars and languages:
• (푠, 푘)-PMCFG(훴) denotes the set of PMCFGs of fanout 푠 and of rank 푘 whose terminals are taken from the set 훴,
• (푠, 푘)-PMCFL(훴) = {ℒ(퐺) ∣ 퐺 ∈ (푠, 푘)-PMCFG(훴)},
• 푠-PMCFG(훴) = ⋃푘′∈ℕ+ (푠, 푘′)-PMCFG(훴),
• 푠-PMCFL(훴) = {ℒ(퐺) ∣ 퐺 ∈ 푠-PMCFG(훴)},
• PMCFG(훴) = ⋃푠′∈ℕ+ 푠′-PMCFG(훴),
• PMCFL(훴) = {ℒ(퐺) ∣ 퐺 ∈ PMCFG(훴)},
• (푠, 푘)-MCFG(훴) denotes the set of MCFGs of fanout 푠 and of rank 푘 whose terminals are taken from the set 훴,
• (푠, 푘)-MCFL(훴) = {ℒ(퐺) ∣ 퐺 ∈ (푠, 푘)-MCFG(훴)},
• 푠-MCFG(훴) = ⋃푘′∈ℕ+ (푠, 푘′)-MCFG(훴),
• 푠-MCFL(훴) = {ℒ(퐺) ∣ 퐺 ∈ 푠-MCFG(훴)},
• MCFG(훴) = ⋃푠′∈ℕ+ 푠′-MCFG(훴), and
• MCFL(훴) = {ℒ(퐺) ∣ 퐺 ∈ MCFG(훴)}.
A language 퐿 ⊆ 훴∗ is called multiple context-free if 퐿 ∈ MCFL(훴).

2.2.3 Finite-state automata

Finite-state automata are well-understood. Many known results on finite-state automata were compiled in the books by Hopcroft and Ullman [HU69; HU79] and Hopcroft, Motwani, and Ullman [HMU01]. We will recall the basic definitions here.11

Definition 2.25 (finite-state automata, syntax). A finite-state automaton (short: FSA) is a tuple

ℳ = (푄, 훴, 푄i, 푄f, 푇 ) where
• 푄 and 훴 are sets, whose elements we call states and terminals, respectively,

• 푄i ⊆ 푄, whose elements we call initial states,

• 푄f ⊆ 푄, whose elements we call final states, and
• 푇 is a finite subset of 푄 × 훴∗ × 푄; the elements of 푇 are called transitions. □

The language of a finite-state automaton is defined analogously to that of PMCFGs: by applying a function yield to runs (which take the place of derivation trees).

Definition 2.26 (FSAs, language). Let ℳ = (푄, 훴, 푄i, 푄f, 푇 ) be an FSA.
• A run (of ℳ) is a string (푞1, 푢1, 푞′1)…(푞푘, 푢푘, 푞′푘) ∈ 푇∗ with 푘 ∈ ℕ such that 푞′푖 = 푞푖+1 for each 푖 ∈ [푘 − 1].

• The set of runs of ℳ is denoted by Runsℳ.

Let 휃 = (푞0, 푢1, 푞1)(푞1, 푢2, 푞2)…(푞푘−1, 푢푘, 푞푘) ∈ Runsℳ be a run of ℳ.

11In the nomenclature of Hopcroft, Motwani, and Ullman [HMU01, section 2.3.3], we define non-deterministic finite-state automata with extended transition function.


• We call 휃 accepting if 푞0 ∈ 푄i and 푞푘 ∈ 푄f.
• The set of accepting runs of ℳ is denoted by Runsaccℳ.

• The yield of 휃, denoted by yield(휃), is the string 푢1…푢푘.
• The set of all runs of ℳ that yield 푤, i.e. yield−1(푤) ∩ Runsℳ, is denoted by Runsℳ(푤).
• The set of all accepting runs of ℳ that yield 푤, i.e. yield−1(푤) ∩ Runsaccℳ, is denoted by Runsaccℳ(푤).
• The language recognised by ℳ, denoted by ℒ(ℳ), is {yield(휃) ∣ 휃 ∈ Runsaccℳ}. □

[Figure 2.27: Graphical representation of the automaton ℳ given in example 2.28: a loop labelled a on the initial state 푞a, an edge labelled 휀 from 푞a to the final state 푞b, and a loop labelled b on 푞b.]

Example 2.28 (FSA). Consider the FSA ℳ = (푄, 훴, {푞a}, {푞b}, 푇 ) where 푄 = {푞a, 푞b}, 훴 = {a, b}, and 푇 = {휏1, 휏2, 휏3} with

휏1 = (푞a, a, 푞a) 휏2 = (푞a, 휀, 푞b) 휏3 = (푞b, b, 푞b).

The language of ℳ is {a푚b푛 ∣ 푚, 푛 ∈ ℕ} = {a}∗ ∘ {b}∗. A graphical representation of ℳ is given in figure 2.27 where the states are represented by nodes, any transition (푞, 푢, 푞′) ∈ 푇 is represented by an edge from 푞 to 푞′ with label 푢, the node corresponding to the initial state is marked with an incoming edge from a node named “start”, and the nodes corresponding to the final states are drawn with a double circle. □

A language 퐿 ⊆ 훴∗ is called recognisable (over 훴) if there is an FSA ℳ such that 퐿 = ℒ(ℳ). The set of all recognisable languages over 훴 is denoted by REC(훴), or REC if 훴 is clear from the context.
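Membership in a recognisable language can be tested by a naive search over runs; the following Python sketch (ours, not an efficient automaton library; all identifiers are chosen by us) explores the configurations of the FSA ℳ from example 2.28.

```python
# FSA of example 2.28; transitions read strings (here of length <= 1).
Q_I, Q_F = {"qa"}, {"qb"}
T = [("qa", "a", "qa"), ("qa", "", "qb"), ("qb", "b", "qb")]

def accepts(w):
    """True iff some accepting run of the FSA yields w."""
    configs, seen = {(q, w) for q in Q_I}, set()
    while configs:
        q, rest = configs.pop()
        if (q, rest) in seen:
            continue
        seen.add((q, rest))
        if q in Q_F and rest == "":
            return True
        for (p, u, p2) in T:
            if p == q and rest.startswith(u):
                configs.add((p2, rest[len(u):]))
    return False

assert accepts("aaabb") and not accepts("ba")
```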

An FSA ℳ = (푄, 훴, 푄i, 푄f, 푇 ) is called
• unambiguous if |Runsaccℳ(푤)| ≤ 1 for each 푤 ∈ 훴∗,
• ambiguous if it is not unambiguous, and
• finitely ambiguous if Runsaccℳ(푤) is finite for every 푤 ∈ 훴∗.
The FSA in example 2.28 is unambiguous.

2.2.4 Automata with data storage

Automata with data storage [Gol79; Gol80; cf. also Sco67; Eng86; Eng14; HV16] add to an FSA the ability to manipulate a so-called storage configuration that is taken from a possibly infinite set.12 Let us first look at the definition of data storage and then consider the automata that use it.

12In contrast, only finitely many states are relevant for an FSA: the states that occur in its transitions.


Definition 2.29 (data storage). A data storage is a tuple DS = (퐶, 퐼, 퐶i, 퐶f) where
• 퐶 is a set, whose elements we call storage configurations,
• 퐼 ⊆ 풫(퐶 × 퐶), whose elements we call instructions,

• 퐶i ⊆ 퐶, whose elements we call initial storage configurations, and

• 퐶f ⊆ 퐶, whose elements we call final storage configurations. □

Our data storages are similar to the “storage types” introduced by Engelfriet [Eng86; Eng14] and Engelfriet and Vogler [EV86]; the two differences are that storage types have predicates and that the instructions of storage types are partial functions (instead of arbitrary binary relations). The “data storage types” introduced by Herrmann and Vogler [HV16, section 3] differ from our data storages since they additionally have predicates and their instructions are partial functions (instead of arbitrary binary relations) that depend on the input of the automaton in addition to the current storage configuration (instead of only on the current storage configuration).

Let DS = (퐶, 퐼, 퐶i, 퐶f) be a data storage. We call DS
• finitely non-deterministic if 푖(푐) is finite for each 푖 ∈ 퐼 and 푐 ∈ 퐶,
• boundedly non-deterministic if there is a number 푘 ∈ ℕ such that |푖(푐)| ≤ 푘 holds for each 푖 ∈ 퐼 and 푐 ∈ 퐶, and
• deterministic if |푖(푐)| ≤ 1 for each 푖 ∈ 퐼 and 푐 ∈ 퐶.
In the following, we present three classic examples of data storages.13

Example 2.30 (counting, taken from Eng86; Eng14, definition 3.4). The data storage Count models a simple counting mechanism:

Count = (ℕ, {id(ℕ+), id({0}), inc, dec}, {0}, ℕ)

with inc = {(푛, 푛 + 1) ∣ 푛 ∈ ℕ} = dec−1. Count is deterministic since |id(ℕ+)(푛)| ≤ 1, |id({0})(푛)| ≤ 1, |inc(푛)| = 1, and |dec(푛)| ≤ 1 for each 푛 ∈ ℕ. □

Example 2.31 (pushdown storage, similar to Eng86; Eng14, definition 3.2). Let 훤 be a non- empty set. The data storage PD(훤 ) models pushdown storage:14

PD(훤) = (훤∗, {bottom, pop} ∪ {top(훾), push(훾) ∣ 훾 ∈ 훤}, {휀}, 훤∗) where

• bottom = id({휀}),
• top(훾) = {(훾푤, 훾푤) ∣ 푤 ∈ 훤∗} for each 훾 ∈ 훤,
• pop = {(훾푤, 푤) ∣ 훾 ∈ 훤, 푤 ∈ 훤∗}, and
• push(훾) = {(푤, 훾푤) ∣ 푤 ∈ 훤∗} for each 훾 ∈ 훤.
PD(훤) is deterministic since |push(훾)(푤)| = 1, |id({휀})(푤)| ≤ 1, |top(훾)(푤)| ≤ 1, and |pop(푤)| ≤ 1 for each 푤 ∈ 훤∗ and 훾 ∈ 훤. □
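Data storages can be rendered directly in code by modelling every instruction as a function from a storage configuration to the set of its successor configurations; the Python sketch below (ours, with all identifiers chosen by us) does this for Count and PD(훤) and checks the determinism claims of examples 2.30 and 2.31.

```python
# An instruction maps a storage configuration to the set of its successors;
# determinism means every such set has at most one element.

# Count (example 2.30): configurations are natural numbers.
count_instructions = {
    "id_pos":  lambda n: {n} if n > 0 else set(),
    "id_zero": lambda n: {n} if n == 0 else set(),
    "inc":     lambda n: {n + 1},
    "dec":     lambda n: {n - 1} if n > 0 else set(),
}

# PD(Gamma) (example 2.31): configurations are strings over Gamma,
# with the topmost pushdown symbol written first.
def pushdown_instructions(gamma):
    instructions = {
        "bottom": lambda w: {w} if w == "" else set(),
        "pop":    lambda w: {w[1:]} if w else set(),
    }
    for g in gamma:
        instructions[f"top({g})"] = (lambda w, g=g: {w} if w.startswith(g) else set())
        instructions[f"push({g})"] = (lambda w, g=g: {g + w})
    return instructions

pd = pushdown_instructions({"*"})
assert pd["push(*)"]("") == {"*"} and pd["pop"]("*") == {""}
assert all(len(i(c)) <= 1 for i in count_instructions.values() for c in range(5))
```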

13The predicates of Engelfriet [Eng86; Eng14] are expressed here as partial identities [Den17a, lemma 8].
14In comparison to Engelfriet [Eng86; Eng14], we omit the dedicated “bottom of the pushdown” symbol.


Example 2.32 (tape storage). Let 훤 be a set and □ ∈ 훤. The tape storage on 훤 and □ is the data storage

Tape(훤 ) = ((ℕ+ → 훤 ) × ℕ+, 퐼, {⟨{(푛, □) ∣ 푛 ∈ ℕ+}, 1⟩}, (ℕ+ → 훤 ) × ℕ+) where 퐼 = {read(훾), write(훾) ∣ 훾 ∈ 훤 } ∪ {left, right} and

• read(훾) = id({⟨푡, 푛⟩ ∣ 푡: ℕ+ → 훤 , 푛 ∈ ℕ+, 푡(푛) = 훾}) for each 훾 ∈ 훤,

• write(훾) = {(⟨푡, 푛⟩, ⟨푡[푛/훾], 푛⟩) ∣ 푡: ℕ+ → 훤, 푛 ∈ ℕ+} for each 훾 ∈ 훤,

• left = {(⟨푡, 푛⟩, ⟨푡, 푛 − 1⟩) ∣ 푡: ℕ+ → 훤 , 푛 ∈ ℕ+, 푛 > 1}, and

• right = {(⟨푡, 푛⟩, ⟨푡, 푛 + 1⟩) ∣ 푡: ℕ+ → 훤, 푛 ∈ ℕ+}.
The tape storage is intended to simulate the tape of a Turing machine. The tape of the Turing machine is represented by a function 푡: ℕ+ → 훤. The current square is modelled by a number 푛 ∈ ℕ+. If 푡(푛) = □, then we call the 푛-th square of 푡 blank. The initial configuration is a tape that is all blank and where the current square is one. Any configuration is considered final. The four instructions immediately implement the usual operations on the tape:
• read(훾) checks if the current square is labelled with 훾,
• read(□) checks if the current square is blank,
• write(훾) writes the symbol 훾 to the current square,
• write(□) makes the current square blank,
• left moves the current square to the left, and
• right moves the current square to the right.
The state changes that can be done in a Turing machine will be implemented by state changes in the automaton with data storage and thus need not be represented in the data storage. □

Now we show two examples of data storage that are not deterministic.

Example 2.33 (taken from Den17a, example 4). Let 훤 be a non-empty set. The data storage PD′(훤 ) also models pushdown storage (cf. example 2.31).15 In comparison to PD(훤 ) it has an additional instruction pop∗ that can remove arbitrarily many symbols from the pushdown.

PD′(훤) = (훤∗, {bottom, pop, pop∗} ∪ {top(훾), push(훾) ∣ 훾 ∈ 훤}, {휀}, 훤∗) where pop∗ = {(푢푤, 푤) ∣ 푢, 푤 ∈ 훤∗}. The data storage PD′(훤) is not deterministic since |pop∗(훾)| = |{휀, 훾}| = 2 > 1 for each 훾 ∈ 훤. Also, PD′(훤) is not boundedly non-deterministic since |pop∗(훾^(푛+1))| = |{휀, 훾, 훾², …, 훾^(푛+1)}| = 푛 + 2 for each 푛 ∈ ℕ and 훾 ∈ 훤. However, it is finitely non-deterministic since |pop∗(푤)| = |푤| + 1 < ∞ for each 푤 ∈ 훤∗. □

Example 2.34 (taken from Den17a, example 5). Let 훤 be a non-empty set. The data storage PD″(훤 ) is a third version of pushdown storage (cf. examples 2.31 and 2.33). In comparison to PD(훤 ) it has an additional instruction push(훤 ) that can push any symbol from 훤 on top of the

15This example originated from a comment of an anonymous reviewer of Denkinger [Den17a].


pushdown.

PD″(훤) = (훤∗, {bottom, pop, push(훤)} ∪ {top(훾), push(훾) ∣ 훾 ∈ 훤}, {휀}, 훤∗) where push(훤) = pop−1 = {(푤, 훾푤) ∣ 푤 ∈ 훤∗, 훾 ∈ 훤}. Since |push(훤)(푤)| = |훤|, we have the following implications:
• If |훤| = 1, then PD″(훤) is deterministic.
• If |훤| ≥ 2, then PD″(훤) is not deterministic.
• If 훤 is finite, then PD″(훤) is boundedly non-deterministic.
• If 훤 is infinite, then PD″(훤) is not finitely non-deterministic. □

Definition 2.35 (closure of data storage). Let DS = (퐶, 퐼, 퐶i, 퐶f) be a data storage. The composition closure of DS, denoted by DS∗, is the data storage (퐶, 퐼∗, 퐶i, 퐶f) where 퐼∗ is the smallest set 퐽 such that
• id(퐶) ∈ 퐽,
• 퐼 ⊆ 퐽, and

• if 푖1, 푖2 ∈ 퐽, then 푖1 ; 푖2 ∈ 퐽.
We call DS composition closed if DS∗ = DS. □

Let us now turn our attention to automata with data storage.

Definition 2.36 (automata with data storage, syntax). An automaton with data storage is a

tuple ℳ = (푄, DS, 훴, 푄i, 푄f, 푇 ) where
• 푄 and 훴 are sets, whose elements we call states and terminals, respectively,

• DS = (퐶, 퐼, 퐶i, 퐶f) is a data storage,

• 푄i ⊆ 푄, whose elements we call initial states,

• 푄f ⊆ 푄, whose elements we call final states, and
• 푇 is a finite subset of 푄 × 퐼∗ × 훴∗ × 푄, the elements of which we call transitions. □

For a transition 휏 = (푞, 푖, 푢, 푞′), we call 푞 the source state, 푖 the instruction (even though it is the composition of multiple instructions of the underlying data storage), 푢 the read string, and 푞′ the target state of 휏. For convenience, we sometimes call an automaton with data storage whose data storage is DS and whose terminals are taken from 훴 a (DS, 훴)-automaton.

For the remainder of this section, let ℳ = (푄, DS, 훴, 푄i, 푄f, 푇 ) be an arbitrary (DS, 훴)-automaton and DS = (퐶, 퐼, 퐶i, 퐶f).

A configuration of ℳ is a tuple (푞, 푐, 푤) where 푞 ∈ 푄, 푐 ∈ 퐶, and 푤 ∈ 훴∗. We call 푞 the current state, 푐 the current storage configuration, and 푤 the remaining input of this configuration. Intuitively, the application of some transition 휏 = (푝, 푖, 푢, 푝′) of ℳ to some configuration (푞, 푐, 푤) of ℳ works as follows:
(i) ensure that 푝 = 푞,


(ii) choose (non-deterministically) a storage configuration 푐′ from the set 푖(푐),
(iii) ensure that 푤 = 푢푤′ for some string 푤′ ∈ 훴∗, and
(iv) return the tuple (푝′, 푐′, 푤′) as the successor configuration.
There might be zero or more successor configurations that can be returned by this process, depending on the current state, the remaining input, and the size of the set 푖(푐). If DS is not finitely non-deterministic, there might even be infinitely many successor configurations. Note that, since 푖 is an element of 퐼∗ (which already contains id(퐶)), it would be superfluous to add the identity id(퐶) to the set of instructions of a data storage. For any data storage, we will abbreviate the total identity on the set of its configurations by id. The four steps described above are the basis for the following definition.

Definition 2.37 (automata with data storage, run relation). Let 휏 = (푞, 푖, 푢, 푞′) ∈ 푇. We define the transition relation of 휏 (w.r.t. ℳ), denoted by ⊢휏, as the endorelation on 푄 × 퐶 × 훴∗ such that (푞0, 푐0, 푤0) ⊢휏 (푞1, 푐1, 푤1) if and only if 푞 = 푞0, 푞′ = 푞1, 푐1 ∈ 푖(푐0), and 푤0 = 푢푤1. The transition relation of ℳ, denoted by ⊢ℳ, is ⋃휏∈푇 ⊢휏. □

We will abbreviate the sequential composition ⊢휏1 ; … ; ⊢휏푘 by ⊢휏1…휏푘. Consequently, ⊢휀 is the identity on 푄 × 퐶 × 훴∗ and (⊢ℳ)∗ = ⋃휃∈푇∗ ⊢휃. If we are not interested in the yield of a run, then we may abbreviate “∃푤0, 푤1 ∈ 훴∗: (푞0, 푐0, 푤0) ⊢휃 (푞1, 푐1, 푤1)” by “(푞0, 푐0) ⊢휃 (푞1, 푐1)”. Similarly to our definitions for PMCFGs (definition 2.17) and FSAs (definition 2.26), the language of an automaton with data storage is defined in terms of valid strings of transitions (i.e. runs) and a function yield.

Definition 2.38 (automata with data storage, language).
• A run (of ℳ) is a string (푞1, 푖1, 푢1, 푞′1)…(푞푘, 푖푘, 푢푘, 푞′푘) ∈ 푇∗ such that 푞′푗 = 푞푗+1 for each 푗 ∈ [푘 − 1] and 푖1 ; … ; 푖푘 ≠ ∅.

• The set of all runs of ℳ is denoted by Runsℳ.

Let 휃 = (푞0, 푖1, 푢1, 푞1)(푞1, 푖2, 푢2, 푞2)…(푞푘−1, 푖푘, 푢푘, 푞푘) ∈ Runsℳ be a run of ℳ.

• We call 휃 accepting if 푞0 ∈ 푄i, 푞푘 ∈ 푄f, and (푖1 ; … ; 푖푘) ∩ (퐶i × 퐶f) ≠ ∅.
• The set of all accepting runs of ℳ is denoted by Runsaccℳ.

• The yield of 휃, denoted by yield(휃), is the string 푢1…푢푘.
• The set of all runs of ℳ that yield 푤, i.e. yield−1(푤) ∩ Runsℳ, is denoted by Runsℳ(푤).
• The set of all accepting runs of ℳ that yield 푤, i.e. yield−1(푤) ∩ Runsaccℳ, is denoted by Runsaccℳ(푤).
• The language of ℳ, denoted by ℒ(ℳ), is the set {yield(휃) ∣ 휃 ∈ Runsaccℳ}. □

Note that a string 휃 ∈ 푇∗ is a run of ℳ if and only if the relation ⊢휃 is non-empty. Furthermore, a string 휃 ∈ 푇∗ is an accepting run if and only if there are 푞 ∈ 푄i, 푞′ ∈ 푄f, 푐 ∈ 퐶i, 푐′ ∈ 퐶f, and 푤 ∈ 훴∗ such that (푞, 푐, 푤) ⊢휃 (푞′, 푐′, 휀). Finally, the language of ℳ is the set of all words 푤 ∈ 훴∗ for which there are 휃 ∈ 푇∗, 푞 ∈ 푄i, 푞′ ∈ 푄f, 푐 ∈ 퐶i, and 푐′ ∈ 퐶f such that (푞, 푐, 푤) ⊢휃 (푞′, 푐′, 휀). The set of all languages recognised by (DS, 훴)-automata is denoted by REC(DS, 훴).


[Figure 2.39: Graph of the (PD(훤), 훴)-automaton from example 2.40: loops ⟨push(∗), a⟩ and ⟨pop, ā⟩ on the initial state 1 and an edge ⟨bottom, 휀⟩ from 1 to the final state 2.]

Example 2.40. Let 훴 = {a, ā} and 훤 = {∗}. Consider the (PD(훤), 훴)-automaton ℳ = ({1, 2}, PD(훤), 훴, {1}, {2}, 푇 ) where 푇 contains exactly the following three transitions:

(1, push(∗), a, 1) (1, pop, ā, 1) (1, bottom, 휀, 2).

The graph of ℳ is shown in figure 2.39. The label of each edge in the graph is a tuple that contains the instruction that is executed by the corresponding transition and the string of terminals that is read. ℳ is equivalent to the CFG shown in example 2.8. □
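A generic membership test for automata with data storage only needs the transition relation of definition 2.37; the following self-contained Python sketch (ours; it assumes that the reachable configuration space for the given input is finite) runs the automaton ℳ of example 2.40, with ā rendered as the letter A.

```python
def recognises(w, q_init, q_final, c_init, is_final_config, transitions):
    """Search over configurations (state, storage config, remaining input).

    `transitions` is a list of (q, i, u, q') where i maps a storage
    configuration to the set of its successor configurations.
    """
    todo = {(q, c, w) for q in q_init for c in c_init}
    seen = set()
    while todo:
        q, c, rest = todo.pop()
        if (q, c, rest) in seen:
            continue
        seen.add((q, c, rest))
        if q in q_final and is_final_config(c) and rest == "":
            return True
        for p, i, u, p2 in transitions:
            if p == q and rest.startswith(u):
                todo |= {(p2, c2, rest[len(u):]) for c2 in i(c)}
    return False

# Example 2.40 with Gamma = {*}; the pushdown is a string, top symbol first.
push   = lambda c: {"*" + c}
pop    = lambda c: {c[1:]} if c else set()
bottom = lambda c: {c} if c == "" else set()
T = [("1", push, "a", "1"), ("1", pop, "A", "1"), ("1", bottom, "", "2")]

assert recognises("aaAA", {"1"}, {"2"}, {""}, lambda c: True, T)
assert not recognises("aAa", {"1"}, {"2"}, {""}, lambda c: True, T)
```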

[Figure 2.41: Graph of the automaton with data storage ℳ from example 2.42: loops ⟨push(훤), a⟩ and ⟨push(훤), b⟩ on the initial state 1, an edge ⟨id, #⟩ from 1 to 2, loops ⟨pop(a), a′⟩ and ⟨pop(b), b′⟩ on 2, and an edge ⟨bottom, 휀⟩ from 2 to the final state 3.]

Example 2.42 (taken from Den17a, example 7). Let 훴 = {a, b, #, a′, b′} and 훤 = {a, b}. Recall the data storage PD″(훤 ) from example 2.34 and let ℳ = ([3], PD″(훤 ), 훴, {1}, {3}, 푇 ) where 푇 contains exactly the following six transitions

(1, push(훤), a, 1) (1, push(훤), b, 1) (1, id, #, 2)
(2, pop(a), a′, 2) (2, pop(b), b′, 2) (2, bottom, 휀, 3)

and pop(훾) = top(훾) ; pop for every 훾 ∈ 훤. The graph of ℳ is shown in figure 2.41. The language recognised by ℳ is ℒ(ℳ) = {푢#푣 ∣ 푢 ∈ {a, b}∗, 푣 ∈ {a′, b′}∗, |푢| = |푣|}. The automaton ℳ recognises a given word 푢#푣 (with 푢 ∈ {a, b}∗ and 푣 ∈ {a′, b′}∗) as follows: In state 1, it reads the prefix 푢 and constructs any element of 훤 ∗ of length |푢| on the pushdown non-deterministically. It then reads # and goes to state 2. In state 2, it reads a′ for each a on the pushdown and it reads b′ for each b on the pushdown until the pushdown is empty. Since the


pushdown can contain any string over {a, b} of length |푢|, ℳ can read any string of {a′, b′} of length |푢|, ensuring that |푢| = |푣|. □

An automaton with data storage ℳ = (푄, DS, 훴, 푄i, 푄f, 푇 ) is called
• unambiguous if |Runsaccℳ(푤)| ≤ 1 for each 푤 ∈ 훴∗,
• ambiguous if it is not unambiguous,
• finitely ambiguous if Runsaccℳ(푤) is finite for every 푤 ∈ 훴∗, and
• infinitely ambiguous if it is not finitely ambiguous.
Note that examples 2.40 and 2.42 are both unambiguous.

Normal forms of data storage

Proposition 2.43 (taken from Den17a, proposition 9). For every data storage DS there is a deterministic data storage det(DS) such that the class of (DS, 훴)-recognisable languages is equal to the class of (det(DS), 훴)-recognisable languages.

Proof. Let DS = (퐶, 퐼, 퐶i, 퐶f). Using a power set construction, we obtain the deterministic data storage det(DS) = (풫(퐶), det(퐼), 퐶′i, 퐶′f) where
• det(퐼) = {det(푖) ∣ 푖 ∈ 퐼} with det(푖) = {(푑, 푖(푑)) ∣ 푑 ⊆ 퐶, 푖(푑) ≠ ∅} for every 푖 ∈ 퐼,
• 퐶′i = {퐶i}, and
• 퐶′f = {푑 ∣ 푑 ⊆ 퐶, 푑 ∩ 퐶f ≠ ∅}.
Let ℳ = (푄, DS, 훴, 푄i, 푄f, 푇 ) and ℳ′ = (푄, det(DS), 훴, 푄i, 푄f, 푇 ′) be automata with data storage. We say that ℳ and ℳ′ are related if 푇 ′ = det(푇 ) = {det(휏) ∣ 휏 ∈ 푇 } where det(휏) = (푞, det(푖), 푢, 푞′) for each 휏 = (푞, 푖, 푢, 푞′) ∈ 푇. Clearly, for every (DS, 훴)-automaton there is a (det(DS), 훴)-automaton such that both are related, and vice versa.
Now let ℳ = (푄, DS, 훴, 푄i, 푄f, 푇 ) and ℳ′ = (푄, det(DS), 훴, 푄i, 푄f, det(푇 )) be automata with data storage. Note that ℳ and ℳ′ are related. We extend det: 푇 → det(푇 ) to a function det: 푇∗ → (det(푇 ))∗ by point-wise application. We can show for every 휃 ∈ 푇∗ by induction on the length of 휃 that for each 푞, 푞′ ∈ 푄, 푐, 푐′ ∈ 퐶, and 푤, 푤′ ∈ 훴∗, the following holds:

(푞, 푐, 푤) ⊢휃 (푞′, 푐′, 푤′) ⟺ ∀푑 ∋ 푐: ∃!푑′ ∋ 푐′: (푞, 푑, 푤) ⊢det(휃) (푞′, 푑′, 푤′) (2.1)

The quantification “∃!푑′ ∋ 푐′” should be read as “there is exactly one 푑′ that contains 푐′ as an element”.
Induction base: The induction base is given by the sequence of length 0.
Induction step: We assume that (2.1) holds for all sequences of length 푛. Let 푞, 푞′ ∈ 푄, 푐, 푐′ ∈ 퐶, 푤, 푤′ ∈ 훴∗, 휃 ∈ 푇푛, and 휏 ∈ 푇. We distinguish two cases.
Case 1: (푞, 푐, 푤) ⊢휃휏 (푞′, 푐′, 푤′). Then there are 푞̄ ∈ 푄, 푐̄ ∈ 퐶, and 푢, 푣 ∈ 훴∗ such that 휏 = (푞̄, 푖, 푣, 푞′), 푤 = 푢푣푤′, and (푞, 푐, 푢푣푤′) ⊢휃 (푞̄, 푐̄, 푣푤′) ⊢휏 (푞′, 푐′, 푤′). By (2.1) we know that for every 푑 ∋ 푐 there is exactly one 푑̄ ∋ 푐̄ with (푞, 푑, 푢푣푤′) ⊢det(휃) (푞̄, 푑̄, 푣푤′). It remains to be shown that (푞̄, 푑̄, 푣푤′) ⊢det(휏) (푞′, 푑′, 푤′) for exactly one 푑′ ∋ 푐′. By construction, we know that det(휏) = (푞̄, det(푖), 푣, 푞′). Then (푑̄, 푑′) ∈ det(푖) by the


definition of ⊢det(휏), and 푑′ is unique by construction of det(푖). By the definition of det(푖), the fact that (푐̄, 푐′) ∈ 푖, and 푐̄ ∈ 푑̄, we have that 푐′ ∈ 푑′. Finally, det(휏) clearly goes from state 푞̄ to state 푞′ and reads 푣.
Case 2: ¬(푞, 푐, 푤) ⊢휃휏 (푞′, 푐′, 푤′). This can have two reasons.
Case 2.1: There is no (푞̄, 푐̄, 푤̄) ∈ 푄 × 퐶 × 훴∗ such that (푞, 푐, 푤) ⊢휃 (푞̄, 푐̄, 푤̄). Then it follows by (2.1) that for every 푑 ∋ 푐 there is no 푑̄ ∋ 푐̄ such that (푞, 푑, 푤) ⊢det(휃) (푞̄, 푑̄, 푤̄) and hence there is no 푑′ ∋ 푐′ such that (푞, 푑, 푤) ⊢det(휃휏) (푞′, 푑′, 푤′).
Case 2.2: There are (푞̄, 푐̄, 푢, 푣) ∈ 푄 × 퐶 × 훴∗ × (훴 ∪ {휀}) such that 푤 = 푢푣푤′ and (푞, 푐, 푤) ⊢휃 (푞̄, 푐̄, 푣푤′), but for none of them exists a transition 휏 = (푞̄, 푖, 푣, 푞′) such that (푞̄, 푐̄, 푣푤′) ⊢휏 (푞′, 푐′, 푤′). Then by (2.1) we know that for every 푑 ∋ 푐 there is some 푑̄ ∋ 푐̄ such that (푞, 푑, 푢푣푤′) ⊢det(휃) (푞̄, 푑̄, 푣푤′). It remains to be shown that there is no 푑′ ∋ 푐′ such that (푞̄, 푑̄, 푣푤′) ⊢det(휏) (푞′, 푑′, 푤′). If such a 푑′ were to exist, then there would exist a transition det(휏) = (푞̄, det(푖), 푣, 푞′) ∈ det(푇 ) such that (푑̄, 푑′) ∈ det(푖). Then by the construction there would be some 푐̄ ∈ 푑̄ such that (푐̄, 푐′) ∈ 푖, which contradicts the assumption that there is no transition 휏 = (푞̄, 푖, 푣, 푞′) such that (푞̄, 푐̄, 푣푤′) ⊢휏 (푞′, 푐′, 푤′).
We obtain ℒ(ℳ) = ℒ(ℳ′) from (2.1) and the definitions of 퐶′i and 퐶′f. ∎
For practical reasons it might be preferable to avoid the construction of power sets. The proof of the following proposition shows a construction for boundedly non-deterministic data storages.

Proposition 2.44 (taken from Den17a, proposition 10). Let DS = (퐶, 퐼, 퐶i, 퐶f) be a boundedly non-deterministic data storage. There is a deterministic data storage DS′ with the same set of storage configurations such that the class of (DS, 훴)-recognisable languages is contained in the class of (DS′, 훴)-recognisable languages.

Proof. We construct the deterministic data storage DS′ = (퐶, 퐼′, 퐶i, 퐶f) where 퐼′ is constructed as follows: Let 푖 ∈ 퐼 and let 푖(푐)1, …, 푖(푐)푚푖,푐 be a fixed enumeration of the elements of 푖(푐) for each 푐 ∈ 퐶. Furthermore, let 푘 = max{|푖(푐)| ∣ 푖 ∈ 퐼, 푐 ∈ 퐶}. Since DS is boundedly non-deterministic, the number 푘 is defined. We define for each 푗 ∈ [푘] an instruction 푖′푗 by 푖′푗(푐) = 푖(푐)푗 if 푗 ≤ 푚푖,푐 and 푖′푗(푐) = undefined otherwise. Let 퐼′ contain the instruction 푖′푗 for every 푖 ∈ 퐼 and 푗 ∈ [푘]. Now let ℳ = (푄, DS, 훴, 푄i, 푄f, 푇 ) be a (DS, 훴)-automaton. We construct the (DS′, 훴)-automaton ℳ′ = (푄, DS′, 훴, 푄i, 푄f, 푇 ′) where 푇 ′ contains for every transition 휏 = (푞, 푖, 푢, 푞′) ∈ 푇 and 푗 ∈ [푘] the transition 휏′푗 = (푞, 푖′푗, 푢, 푞′). Then

⊢ℳ = ⋃휏∈푇 ⊢휏 = ⋃휏=(푞,푖,푢,푞′)∈푇 ⋃푗∈[푘] ⊢휏′푗 = ⋃휏′∈푇′ ⊢휏′ = ⊢ℳ′, and thus ℒ(ℳ) = ℒ(ℳ′). ∎
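The instruction-splitting at the heart of this proof is easy to phrase in code. The Python sketch below (ours; it fixes the enumeration of 푖(푐) by sorting, and represents “undefined” by the empty successor set) splits a boundedly non-deterministic instruction into 푘 deterministic ones, illustrated on push(훤) from example 2.34.

```python
def split_instruction(i, k):
    """Return deterministic instructions i'_1, ..., i'_k with
    i(c) = i'_1(c) ∪ ... ∪ i'_k(c) and |i'_j(c)| <= 1 (cf. proposition 2.44)."""
    def make(j):
        def i_j(c):
            successors = sorted(i(c))      # a fixed enumeration of i(c)
            return {successors[j]} if j < len(successors) else set()
        return i_j
    return [make(j) for j in range(k)]

push_any = lambda c: {g + c for g in "ab"}   # push(Gamma) with Gamma = {a, b}
i1, i2 = split_instruction(push_any, 2)
assert i1("x") | i2("x") == push_any("x")
assert all(len(i("x")) <= 1 for i in (i1, i2))
```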

The above construction fails for data storages that are not boundedly non-deterministic. Consider the data storage PD′(훤) from example 2.33. Then there exists no bound 푘pop∗ ∈ ℕ as would be required by the proof. The containment shown in proposition 2.44 is strict as the following example reveals.


[Figure 2.46: Graph of the ((PD†(훤))′, 훴)-automaton ℳ′ from example 2.45: loops ⟨push(a), a⟩ and ⟨push(b), b⟩ on the initial state 1, an edge ⟨id, 휀⟩ from 1 to 2, loops ⟨pop(a), a⟩ and ⟨pop(b), b⟩ on 2, and an edge ⟨bottom, 휀⟩ from 2 to the final state 3.]

Example 2.45 (taken from Den17a, example 11, due to Nederhof [Ned17]). Recall the data storage PD″(훤 ) from example 2.34. Consider the similar data storage

PD†(훤) = (훤∗, {bottom, id, push(훤)} ∪ {pop(훾) ∣ 훾 ∈ 훤}, {휀}, 훤∗) where 훤 is finite and pop(훾) = {(훾푤, 푤) ∣ 푤 ∈ 훤∗} for each 훾 ∈ 훤. We can again think of 훤∗ as a pushdown. Now, starting from PD†(훤), we construct the deterministic data storage (PD†(훤))′ by the construction given in proposition 2.44. We thereby obtain

(PD†(훤))′ = (훤∗, {bottom, id} ∪ {push(훾), pop(훾) ∣ 훾 ∈ 훤}, {휀}, 훤∗).

The only difference between PD†(훤) and (PD†(훤))′ is that the instruction push(훤) is replaced by the |훤| instructions in the set {push(훾) ∣ 훾 ∈ 훤}. Now consider the sets 훴 = {a, b} and 훤 = 훴, and the language 퐿 = {푤푤R ∣ 푤 ∈ 훴∗} where 푤R denotes the reverse of 푤 for each 푤 ∈ 훴∗. The following ((PD†(훤))′, 훴)-automaton ℳ′ recognises 퐿 and thus demonstrates that 퐿 is ((PD†(훤))′, 훴)-recognisable: ℳ′ = ([3], (PD†(훤))′, 훴, {1}, {3}, 푇 ′) with

푇 ′: (1, push(a), a, 1) (1, push(b), b, 1) (1, id, 휀, 2)
(2, pop(a), a, 2) (2, pop(b), b, 2) (2, bottom, 휀, 3).

The graph of ℳ′ is shown in figure 2.46. In state 1, ℳ′ stores the input in reverse on the pushdown until it decides non-deterministically to go to state 2. In state 2, ℳ′ accepts the sequence of symbols that is stored on the pushdown. We can only enter the final state 3 if the pushdown is empty; thus ℳ′ recognises 퐿. On the other hand, there is no (PD†(훤), 훴)-automaton ℳ that recognises 퐿. Assume that some (PD†(훤), 훴)-automaton ℳ recognises 퐿. Then ℳ would have to encode the first half of the input in the pushdown since this unbounded information can not be stored in the states. The only instruction that adds information to the pushdown is push(훤). Thus, in the first half of the input, whenever we read the symbol a, we have to execute push(훤); and whenever we read the symbol b, we also have to execute push(훤). This offers no means of distinguishing the two situations (reading symbol a and reading symbol b) and hence no means of encoding the first half of the input in the pushdown. □

Proposition 2.47 (taken from Den17a, proposition 12). Let DS = (퐶, 퐼, 퐶i, 퐶f) be a data


storage and 퐿 be a (DS, 훴)-recognisable language. If 퐶 is finite, then 퐿 is recognisable (by a finite-state automaton).

Proof. Since 퐶 is finite, DS is boundedly non-deterministic, so by proposition 2.44 we may assume that DS is deterministic. We will use a product construction. In particular, the states of the constructed FSA are elements of 푄 × 퐶.
Let ℳ = (푄, DS, 훴, 푄i, 푄f, 푇 ). We construct the FSA ℳ′ = (푄 × 퐶, 훴, 푄i × 퐶i, 푄f × 퐶f, 푇 ′) where 푇 ′ = {((푞, 푐), 푢, (푞′, 푐′)) ∣ (푞, 푖, 푢, 푞′) ∈ 푇, (푐, 푐′) ∈ 푖}.
Now let 푐 ∈ 퐶, 휏 = (푝, 푖, 푢, 푞) ∈ 푇, and 휏′ = (푝′, 푢′, 푞′) ∈ 푇 ′. We call 휏 and 휏′ 푐-related if 푢′ = 푢, 푝′ = (푝, 푐), and 푞′ = (푞, 푖(푐)). Furthermore, let 휃 ∈ Runsℳ and 휃′ ∈ Runsℳ′. If 휃 = 휀 = 휃′, then we call 휃 and 휃′ 푐-related. If 휃 = 휏휂 and 휃′ = 휏′휂′ for some 휏 = (푝, 푖, 푢, 푞) ∈ 푇 and 휏′ ∈ 푇 ′ such that 휏 and 휏′ are 푐-related and 휂 and 휂′ are 푖(푐)-related, then 휃 and 휃′ are 푐-related. Otherwise, 휃 and 휃′ are not 푐-related.
It is easy to see that for every 푐 ∈ 퐶 and 휃 ∈ Runsℳ, there is a 휃′ ∈ Runsℳ′ such that 휃 and 휃′ are 푐-related. Also, it is easy to see that for every 푐 ∈ 퐶 and 휃′ ∈ Runsℳ′, there is a 휃 ∈ Runsℳ such that 휃 and 휃′ are 푐-related.
Let 푐 ∈ 퐶, 휃 ∈ Runsℳ, and 휃′ ∈ Runsℳ′ such that 휃 and 휃′ are 푐-related. We show by induction on the length of 휃 that the following holds for each 푞, 푞′ ∈ 푄, 푐′ ∈ 퐶, and 푤, 푤′ ∈ 훴∗:

(푞, 푐, 푤) ⊢휃 (푞′, 푐′, 푤′) ⟺ ((푞, 푐), 푤) ⊢휃′ ((푞′, 푐′), 푤′) (2.2)

Induction base: If 휃 = 휀, then 휃′ = 휀 and the claim follows trivially.
Induction step: Let 푐 ∈ 퐶, 휃 ∈ Runsℳ, and 휃′ ∈ Runsℳ′ such that 휃 and 휃′ are 푐-related and (2.2) holds. Furthermore, let 휏 ∈ 푇 and 휏′ ∈ 푇 ′ such that 휃휏 ∈ Runsℳ, 휃′휏′ ∈ Runsℳ′, and 휃휏 and 휃′휏′ are 푐-related. We derive

(푞, 푐, 푤) ⊢휃휏 (푞′, 푐′, 푤′)
⟺ ∃푞̄ ∈ 푄, 푐̄ ∈ 퐶, 푤̄ ∈ 훴∗: (푞, 푐, 푤) ⊢휃 (푞̄, 푐̄, 푤̄) ⊢휏 (푞′, 푐′, 푤′)
⟺ ∃푞̄ ∈ 푄, 푐̄ ∈ 퐶, 푤̄ ∈ 훴∗, 푖 ∈ 퐼: (푞, 푐, 푤) ⊢휃 (푞̄, 푐̄, 푤̄) ∧ 휏 = (푞̄, 푖, 푢, 푞′) ∧ 푖(푐̄) = 푐′
⟺ ∃푞̄ ∈ 푄, 푐̄ ∈ 퐶, 푤̄ ∈ 훴∗, 푖 ∈ 퐼: (푞, 푐, 푤) ⊢휃 (푞̄, 푐̄, 푤̄) ∧ ((푞̄, 푐̄), 푢, (푞′, 푐′)) ∈ 푇 ′ ∧ 푖(푐̄) = 푐′ (by construction of ℳ′)
⟺ ∃푞̄ ∈ 푄, 푐̄ ∈ 퐶, 푤̄ ∈ 훴∗, 푖 ∈ 퐼: (푞, 푐, 푤) ⊢휃 (푞̄, 푐̄, 푤̄) ∧ 휏′ = ((푞̄, 푐̄), 푢, (푞′, 푐′)) ∈ 푇 ′ ∧ 푖(푐̄) = 푐′ (by definition of relatedness)
⟺ ∃푞̄ ∈ 푄, 푐̄ ∈ 퐶, 푤̄ ∈ 훴∗, 푖 ∈ 퐼: ((푞, 푐), 푤) ⊢휃′ ((푞̄, 푐̄), 푤̄) ∧ 휏′ = ((푞̄, 푐̄), 푢, (푞′, 푐′)) ∈ 푇 ′ ∧ 푖(푐̄) = 푐′ (by induction hypothesis)
⟺ ∃푞̄ ∈ 푄, 푐̄ ∈ 퐶, 푤̄ ∈ 훴∗: ((푞, 푐), 푤) ⊢휃′ ((푞̄, 푐̄), 푤̄) ⊢휏′ ((푞′, 푐′), 푤′)
⟺ ((푞, 푐), 푤) ⊢휃′휏′ ((푞′, 푐′), 푤′)


We then derive

ℒ(ℳ)
= {푤 ∈ 훴∗ ∣ Runsaccℳ(푤) ≠ ∅}
= {푤 ∈ 훴∗ ∣ ∃휃 ∈ Runsℳ, 푞 ∈ 푄i, 푞′ ∈ 푄f, 푐 ∈ 퐶i, 푐′ ∈ 퐶f: (푞, 푐, 푤) ⊢휃 (푞′, 푐′, 휀)} (by definition 2.38)
= {푤 ∈ 훴∗ ∣ ∃휃′ ∈ Runsℳ′, 푞 ∈ 푄i, 푞′ ∈ 푄f, 푐 ∈ 퐶i, 푐′ ∈ 퐶f: ((푞, 푐), 푤) ⊢휃′ ((푞′, 푐′), 휀)} (by the above)
= {푤 ∈ 훴∗ ∣ Runsaccℳ′(푤) ≠ ∅} = ℒ(ℳ′). ∎

Instruction normal form

An automaton with data storage is said to be in instruction normal form if each transition contains at most one instruction.

Definition 2.48. Let ℳ = (푄, DS, 훴, 푄i, 푄f, 푇 ) be an automaton with data storage and DS = (퐶, 퐼, 퐶i, 퐶f). We say that ℳ is in instruction normal form if 푇 ⊆ 푄 × (퐼 ∪ {id}) × 훴∗ × 푄. □

Proposition 2.49. Let DS be a data storage and 훴 be a set. For each (DS, 훴)-automaton, there is an equivalent (DS, 훴)-automaton in instruction normal form.

Proof. Let ℳ = (푄, DS, 훴, 푄i, 푄f, 푇 ) be a (DS, 훴)-automaton.
(Construction) We construct the (DS, 훴)-automaton ℳ′ = (푄′, DS, 훴, 푄i, 푄f, 푇 ′) where 푄′ = 푄 ∪ (푇 × ℕ+) and 푇 ′ is the smallest set 푇̄ such that for each 휏 = (푞, 푖, 푢, 푞′) ∈ 푇, the following two statements hold:
(i) If 푖 ∈ {id} ∪ 퐼, then 휏 ∈ 푇̄.

(ii) If 푖 ∉ {id} ∪ 퐼, then there are 푖1, …, 푖푘 ∈ 퐼 such that 푖 = 푖1 ; ⋯ ; 푖푘. Then
• 휏′1 = (푞, 푖1, 푢, ⟨휏, 1⟩) is in 푇̄,
• 휏′푛 = (⟨휏, 푛 − 1⟩, 푖푛, 휀, ⟨휏, 푛⟩) is in 푇̄ for every 푛 ∈ {2, …, 푘 − 1}, and
• 휏′푘 = (⟨휏, 푘 − 1⟩, 푖푘, 휀, 푞′) is in 푇̄.
(Correctness of the construction) Let us consider the function 푔: 푇 → (푇 ′)∗ such that

푔(휏) = { 휏 if 푖 ∈ {id} ∪ 퐼; 휏′1⋯휏′푘 if 푖 = 푖1 ; ⋯ ; 푖푘 with 푖1, …, 푖푘 ∈ 퐼 and 푘 ≥ 2 }

for every 휏 = (푞, 푖, 푢, 푞′) ∈ 푇. Let 푔 also denote the superset of 푔 that is a homomorphism from 푇∗ to (푇 ′)∗. Clearly, for each 휏 ∈ 푇, 푔(휏) is a run in ℳ′ and furthermore, 휏 and 푔(휏) have the same state behaviour and storage behaviour. Hence, Runsℳ′ = {푔(휃) ∣ 휃 ∈ Runsℳ} and since the sets of initial and final states are the same, also Runsaccℳ′ = {푔(휃) ∣ 휃 ∈ Runsaccℳ}. We derive

ℒ(ℳ′) = {yield(휃′) ∣ 휃′ ∈ Runsaccℳ′} (by definition 2.38)
= {yield(푔(휃)) ∣ 휃 ∈ Runsaccℳ} (by the above argument)


= {yield(휃) ∣ 휃 ∈ Runsaccℳ} (by the definitions of 푔 and 푇 ′)
= ℒ(ℳ) ∎

2.2.5 Turing machines

Turing machines were introduced by Turing [Tur37; Tur38] as a model to investigate computability. There are many equivalent models of Turing machines. We recall here a simple form of Turing machines with one semi-infinite tape.

Definition 2.50 (Turing machine, syntax, taken from HU69, section 6.2). A Turing machine is

a tuple ℳ = (푄, 훴, 훤, 푄i, 푄f, 푇 ) where
• 푄 and 훤 are sets, whose elements we call states and tape symbols, respectively,
• there is a designated tape symbol □ in 훤 that is called blank,
• 훴 ⊆ 훤 ∖ {□}, whose elements we call input symbols,

• 푄i ⊆ 푄, whose elements we call initial states,

• 푄f ⊆ 푄, whose elements we call final states, and
• 푇 is a finite subset of 푄 × 훤 × 훤 × {−1, +1} × 푄. □

For the remainder of this section let ℳ = (푄, 훴, 훤, 푄i, 푄f, 푇 ) be an arbitrary Turing machine.

A configuration of ℳ is a tuple (푞, 푡, 푛) ∈ 푄 × (ℕ+ → 훤 ) × ℕ+. We call 푞 the current state, 푡 the current tape, and 푛 the current square of the configuration (푞, 푡, 푛).

Definition 2.51 (Turing machine, language, taken from HU69, section 6.2). Consider the transition 휏 = (푞, 훾, 훾′, 푚, 푞′) ∈ 푇. The transition relation of 휏 (w.r.t. ℳ), denoted by ⊢휏, is the endorelation on 푄 × (ℕ+ → 훤 ) × ℕ+ such that (푞1, 푡1, 푛1) ⊢휏 (푞2, 푡2, 푛2) if and only if
• 푞1 = 푞 and 푞2 = 푞′, and
• 푡1(푛1) = 훾, 푡2 = 푡1[푛1/훾′], and 푛2 = 푛1 + 푚.

The computation relation of ℳ, denoted by ⊢ℳ, is ⋃휏∈푇 ⊢휏. The language of ℳ, denoted by ℒ(ℳ), is the set

{푤 ∈ 훴∗ ∣ ∃푞i ∈ 푄i, 푞f ∈ 푄f, 푡′ ∈ ℕ+ → 훤, 푛′ ∈ ℕ+: (푞i, 푡i(푤), 1) (⊢ℳ)∗ (푞f, 푡′, 푛′)} where 푡i(푤)(푛) = 휎 if 푛 ≤ |푤| and 휎 is the 푛-th symbol of 푤; and 푡i(푤)(푛) = □ if 푛 > |푤|. □

Note that Turing machines and (Tape(훤), 훴)-automata are similar (cf. example 2.32). The only two differences are that
• the input string is initially written on the tape in a Turing machine while it is read during the run in a (Tape(훤), 훴)-automaton and
• each transition in a Turing machine reads, writes, and changes the current square while a transition in a (Tape(훤), 훴)-automaton can perform any combination of those operations.


2.2.6 String homomorphisms

Let 훴 and 훥 be sets and 푔: 훴 → 훥∗. Furthermore, let 푔̂: 훴∗ → 훥∗ be the function that is defined for each 푘 ∈ ℕ and 휎1, …, 휎푘 ∈ 훴 by the equation

푔̂(휎1…휎푘) = 푔(휎1) ∘ … ∘ 푔(휎푘).

We call 푔̂ a (훴, 훥)-string homomorphism.16 We call 푔̂ alphabetic if there is a function ℎ: 훴 → 훥 ∪ {휀} such that ℎ̂ = 푔̂. For any sets 훴 and 훥, we fix the following two sets of functions:
• HOM(훴, 훥) denotes the set of (훴, 훥)-string homomorphisms and
• αHOM(훴, 훥) denotes the set of alphabetic (훴, 훥)-string homomorphisms.
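Since a string homomorphism is determined by its images on single symbols, its lift to 훴∗ is a one-liner; the Python sketch below (ours; 푔 is given as a dict, an encoding we choose for illustration) makes this concrete.

```python
def string_hom(g):
    """Lift g: Sigma -> Delta* to the homomorphism on Sigma*."""
    return lambda w: "".join(g[sigma] for sigma in w)

# Alphabetic would require every image to have length <= 1; here g("a") = "xy"
# has length 2, so this homomorphism is not alphabetic.
g_hat = string_hom({"a": "xy", "b": ""})
assert g_hat("aba") == "xyxy"
```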

Lemma 2.52. Let 훴 be a set, 푠, 푘 ∈ ℕ+ be numbers, and 퐿 ⊆ 훴∗ be a language. The following are equivalent:
(i) 퐿 ∈ (푠, 푘)-MCFL(훴).
(ii) There is a set 훥, an alphabetic homomorphism ℎ ∈ αHOM(훥, 훴), and an unambiguous MCFG 퐺 ∈ (푠, 푘)-MCFG(훥) such that 퐿 = ℎ(ℒ(퐺)).

Proof. This is a corollary of lemma 2.74. ∎

2.3 Weighted language devices

Grammars were, as the name suggests, initially conceived to describe the syntax of sentences in natural languages such as English [Cho59]. Natural languages are inherently ambiguous and so is their syntax. Consider, for example, the numerous meanings of the word “match”. Like many words in the English language, “match” can be used both as a verb (“My socks don’t match.”) and as a noun (“He was no match for her.”). Consequently, grammars are designed to allow ambiguity as well. Chomsky and Schützenberger [CS63, section 2.3] proposed that a weighted language can be associated with each grammar that determines for each word the count of its different syntactic realisations (or derivation trees). Another way of dealing with the ambiguity of grammars is to assign a probability to each syntactic realisation of a given sentence. This probability may indicate how likely it is to observe this syntactic realisation in the real world (or in a subdomain thereof). This is achieved by attaching a probability to each rule of the grammar [Sup72]. The probability of a derivation tree in such a probabilistic grammar is then obtained by multiplying the probabilities of occurrences of rules in the derivation tree. The probability of a word in a probabilistic grammar is the sum of probabilities of all derivation trees of that word. Similarly, we can make an automaton probabilistic by assigning a probability to each transition. The probabilities of runs and of words are then defined analogously to that of derivation trees and of words in probabilistic grammars.

16A (훴, 훥)-string homomorphism is a homomorphism between the monoid (훴∗, ∘, 휀) and the monoid (훥∗, ∘, 휀) in the sense of section 2.1.4.


Multiplicities and probabilities are not the only ways to deal with ambiguity. The weights of rules or transitions may also come from other algebras such as semirings [Goo99], strong bimonoids [DSV10], multioperator monoids [Kui99; SVF09], or valuation monoids [DM11; DV14]. In our definitions, we will use strong bimonoids.

For the remainder of this section, let (풜, ⊕, ⊙, ퟘ, ퟙ) be an arbitrary strong bimonoid.

The central objects of discussion in this section are weighted languages17 and weighted language devices. An 풜-weighted 훴-language is a function 퐿: 훴∗ → 풜 where 훴 is a set. A weighted language device is a syntactic object, usually a tuple, that can be written as a finite string. We associate a weighted language ⟦퐷⟧ with each weighted language device 퐷 and say that 퐷 is a finite representation of ⟦퐷⟧. Two weighted language devices 퐷1 and 퐷2 are called equivalent if their associated weighted languages are the same, i.e. ⟦퐷1⟧ = ⟦퐷2⟧. The following three subsections contain the formal definitions for weighted versions of PMCFGs, FSAs, and automata with data storage based on the above intuition. The subsections will be very similar to each other since we chose our definitions of PMCFGs, FSAs, and automata with data storage in section 2.2 to be analogous to each other.

2.3.1 Weighted parallel multiple context-free languages

Definition 2.53 (weighted PMCFGs, syntax). An 풜-weighted parallel multiple context-free

grammar (short: 풜-wPMCFG) is a tuple 퐺 = (푁, 훴, 푁i, 휇) where 휇 is a function from 푁 × TCR(훴) × 푁∗ to 풜, called weight assignment, and 퐺uw = (푁, 훴, 푁i, supp(휇)) is a PMCFG.18 We call 퐺uw the underlying grammar of 퐺.19 □

Let 퐺 = (푁, 훴, 푁i, 휇) be an 풜-wPMCFG and 푤 ∈ 훴∗. The set of rules of 퐺 is supp(휇). 퐺 inherits many properties from its underlying grammar.
• The set of derivation trees of 퐺, denoted by D퐺, is D퐺uw.
• The set of complete derivation trees of 퐺, denoted by Dc퐺, is Dc퐺uw.

• The language of 퐺, denoted by ℒ(퐺), is ℒ(퐺uw).
• The set of complete derivation trees of 퐺 for 푤, denoted by Dc퐺(푤), is Dc퐺uw(푤).
퐺 is called

• an 풜-weighted multiple context-free grammar (short: 풜-wMCFG) if 퐺uw is an MCFG,
• an 풜-weighted string-rewriting linear context-free rewriting system (short: 풜-string-wLCFRS)

if 퐺uw is a string-LCFRS,

17Weighted languages are often called “formal power series” in the literature [see e.g. Niv69; SS78; KS85, section 1; DK09] as they are a generalisation of power series that was used, for example, before it became relevant for the modelling of natural languages. However, as our intention is the description of natural languages, we will stick with the term “weighted languages”.
18Note that (푁, 훴, 푁i, supp(휇)) being a PMCFG implies that supp(휇) is finite and hence 휇 can be represented finitely.
19The “uw” in 퐺uw stands for unweighted.


• unambiguous if 퐺uw is unambiguous,

• ambiguous if 퐺uw is ambiguous,

• finitely ambiguous if 퐺uw is finitely ambiguous,

• monotonous if 퐺uw is monotonous,

• strongly monotonous if 퐺uw is strongly monotonous, and

• of fanout 푠 if 퐺uw is of fanout 푠.

Definition 2.54 (wPMCFGs, weighted language). Let 퐺 = (푁, 훴, 푁i, 휇) be an 풜-wPMCFG. The weight assignment of 퐺 for derivations, denoted by wt퐺, is the function from D퐺 to 풜 that is recursively defined for every 푑 = 푟(푑1, …, 푑푘) ∈ D퐺 by

wt퐺(푑) = 휇(푟) ⊙ wt퐺(푑1) ⊙ wt퐺(푑2) ⊙ … ⊙ wt퐺(푑푘).

If 퐺 is finitely ambiguous or 풜 is complete,20 we define the weighted language of 퐺, denoted by ⟦퐺⟧, as the function from 훴∗ to 풜 where for each 푤 ∈ 훴∗ we have

⟦퐺⟧(푤) = ⨁푑∈Dc퐺(푤) wt퐺(푑). □

Example 2.55 (wPMCFG, cf. example 2.18). Consider the Pr-wPMCFG 퐺 = (푁, 훴, {푆}, 휇) where 푁 = {푆, 퐴} with sort(푆) = 1 and sort(퐴) = 2, 훴 = {a, b}, |supp(휇)| = 3, and

휌1 = 푆 → [푥1,1푥1,2](퐴) 휇(휌1) = 1

휌2 = 퐴 → [푥1,1푥1,1, 푥1,2푥1,2](퐴) 휇(휌2) = 1/2

휌3 = 퐴 → [a, b]() 휇(휌3) = 1/2.

The underlying grammar of 퐺 is the PMCFG shown in example 2.18. Recall that for each 푛 ∈ ℕ, there is a unique derivation tree 푑푛 ∈ Dc퐺 with yield a^(2^푛)b^(2^푛) w.r.t. 퐺. The weight of such a 푑푛 is wt퐺(푑푛) = 1/2^(푛+1). The weighted language of 퐺 is immediately given by wt퐺 since 퐺 is unambiguous:

⟦퐺⟧(푤) = { 1/2^(푛+1) if 푤 = a^(2^푛)b^(2^푛) for some 푛 ∈ ℕ; 0 otherwise }. □
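The recursion of definition 2.54 can be replayed directly on the derivation trees of example 2.55; the Python sketch below (ours; rule names 휌1, 휌2, 휌3 are written as strings, an encoding we choose) recomputes wt퐺(푑푛) = 1/2^(푛+1).

```python
from fractions import Fraction

mu = {"rho1": Fraction(1), "rho2": Fraction(1, 2), "rho3": Fraction(1, 2)}

def wt(tree):
    """wt_G(d) = mu(rule) * wt_G(d_1) * ... * wt_G(d_k) (definition 2.54)."""
    rule, subtrees = tree
    weight = mu[rule]
    for t in subtrees:
        weight *= wt(t)
    return weight

def deriv(n):
    """The unique derivation tree d_n of example 2.55, as (rule, subtrees)."""
    tree = ("rho3", [])
    for _ in range(n):
        tree = ("rho2", [tree])
    return ("rho1", [tree])

assert wt(deriv(4)) == Fraction(1, 2 ** 5)   # wt_G(d_n) = 1/2^(n+1)
```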

Example 2.56 (wMCFG, cf. example 2.20, taken from Den17a, example 2.5). Consider the Pr-wMCFG 퐺 = (푁, 훴, {푆}, 휇) where 푁 = {푆, 퐴, 퐵} with sort(푆) = 1 and sort(퐴) = sort(퐵) = 2, 훴 = {a, b, c, d}, |supp(휇)| = 5, and

휌1 = 푆 → [푥1,1푥2,1푥1,2푥2,2](퐴, 퐵) 휇(휌1) = 1

휌2 = 퐴 → [a푥1,1, c푥1,2](퐴) 휇(휌2) = 1/2

휌3 = 퐴 → [휀, 휀]() 휇(휌3) = 1/2

20This restriction ensures that the sum below is defined for every 푤 ∈ 훴∗. Note that such a restriction is not necessary for the definition of wt퐺 since our (derivation) trees are always of finite size.


휌4 = 퐵 → [b푥1,1, d푥1,2](퐵) 휇(휌4) = 1/3

휌5 = 퐵 → [휀, 휀]() 휇(휌5) = 2/3.

The underlying grammar of 퐺 is the PMCFG shown in example 2.20. Recall that for each 푚 ∈ ℕ and 푛 ∈ ℕ, there is a unique derivation tree 푑푚,푛 ∈ Dc퐺 with yield a푚b푛c푚d푛 w.r.t. 퐺, which is shown in figure 2.21. The weight of such a 푑푚,푛 is wt퐺(푑푚,푛) = 1/(2^푚 ⋅ 3^(푛+1)). The weighted language of 퐺 is immediately given by wt퐺 since 퐺 is unambiguous:

⟦퐺⟧(푤) = { 1/(2^푚 ⋅ 3^(푛+1)) if 푤 = a푚b푛c푚d푛 for some 푚, 푛 ∈ ℕ; 0 otherwise }. □

Our final example here shows an ambiguous weighted MCFG.

[Figure 2.57: All five derivation trees for a^4 in 퐺, cf. example 2.58: the five binary trees with four leaves 휌2 and three inner nodes 휌1.]

Example 2.58 (ambiguous wMCFG). Consider the Pr-wMCFG 퐺 = (푁, 훴, 푁, 휇) where 푁 = {푆}, 훴 = {a}, |supp(휇)| = 2, and

휌1 = 푆 → [푥1,1푥2,1](푆, 푆) 휇(휌1) = 1

휌2 = 푆 → [a]() 휇(휌2) = 1.

The language of 퐺 is ℒ(퐺) = 훴∗ ∖ {휀}. Figure 2.57 shows all five derivation trees for the string a^4. The weight of every derivation tree is 1. Hence the weighted language of 퐺 returns the number of derivation trees for its argument. In other words, for every a^푛 ∈ 훴∗ it returns the number of binary trees with 푛 leaves, i.e. the Catalan number21 of 푛 − 1:

⟦퐺⟧(a^푛) = ∑푑∈Dc퐺(a^푛) wt퐺(푑) = |Dc퐺(a^푛)| = C푛−1 = (2푛 − 2)! / (푛! ⋅ (푛 − 1)!)

Note that even though we use the probability semiring, the weighted language ⟦퐺⟧ is clearly not a probability distribution. □
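The count of derivation trees from example 2.58 can be checked mechanically; the Python sketch below (ours) counts the binary derivation trees for a^푛 via the split induced by 휌1 and compares the result with the Catalan numbers.

```python
from functools import lru_cache
from math import factorial

@lru_cache(maxsize=None)
def trees(n):
    """Number of derivation trees of G for a^n (example 2.58)."""
    if n == 1:
        return 1                     # only the rule S -> [a]()
    # rho1 splits a^n into a^i and a^(n-i) for 0 < i < n
    return sum(trees(i) * trees(n - i) for i in range(1, n))

def catalan(n):
    """C_n = (2n)! / (n! (n+1)!)."""
    return factorial(2 * n) // (factorial(n) * factorial(n + 1))

assert all(trees(n) == catalan(n - 1) for n in range(1, 10))
assert trees(4) == 5                 # the five trees of figure 2.57
```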

For each finite set 훴, and 푘, 푠 ∈ ℕ+, we fix the following sets of grammars and languages:

21 The Catalan numbers, usually denoted by C1, C2,…, are sequence A000108 in the On-Line Encyclopedia of Integer Sequences (OEIS): https://oeis.org/A000108.


• (푠, 푘)-PMCFG(훴, 풜) denotes the set of 풜-weighted PMCFGs of fanout 푠 and of rank 푘 whose terminals are taken from the set 훴,
• (푠, 푘)-PMCFL(훴, 풜) = {⟦퐺⟧ ∣ 퐺 ∈ (푠, 푘)-PMCFG(훴, 풜)},
• 푠-PMCFG(훴, 풜) = ⋃푘∈ℕ+ (푠, 푘)-PMCFG(훴, 풜),
• 푠-PMCFL(훴, 풜) = {⟦퐺⟧ ∣ 퐺 ∈ 푠-PMCFG(훴, 풜)},
• PMCFG(훴, 풜) = ⋃푠∈ℕ+ 푠-PMCFG(훴, 풜),
• PMCFL(훴, 풜) = {⟦퐺⟧ ∣ 퐺 ∈ PMCFG(훴, 풜)},
• (푠, 푘)-MCFG(훴, 풜) denotes the set of 풜-weighted MCFGs of fanout 푠 and of rank 푘 whose terminals are taken from the set 훴,
• (푠, 푘)-MCFL(훴, 풜) = {⟦퐺⟧ ∣ 퐺 ∈ (푠, 푘)-MCFG(훴, 풜)},
• 푠-MCFG(훴, 풜) = ⋃푘∈ℕ+ (푠, 푘)-MCFG(훴, 풜),
• 푠-MCFL(훴, 풜) = {⟦퐺⟧ ∣ 퐺 ∈ 푠-MCFG(훴, 풜)},
• MCFG(훴, 풜) = ⋃푠∈ℕ+ 푠-MCFG(훴, 풜), and
• MCFL(훴, 풜) = {⟦퐺⟧ ∣ 퐺 ∈ MCFG(훴, 풜)}.
We call a weighted language 퐿: 훴∗ → 풜 multiple context-free if 퐿 ∈ MCFL(훴, 풜).
An 풜-weighted MCFG is called non-deleting if the composition representation that occurs in every rule is non-deleting. Seki, Matsumura, Fujii, and Kasami [Sek+91, lemma 2.2] showed that for every MCFG of fanout 푠 and rank 푘 there is an equivalent non-deleting MCFG of fanout 푠 and rank 푘. We generalise this result to 풜-weighted MCFGs.

Lemma 2.59 (taken from Den17b, lemma 2.6). For every 풜-weighted MCFG of fanout 푠 and rank 푘 there is an equivalent non-deleting 풜-weighted MCFG of fanout 푠 and rank 푘.

Proof idea. The construction by Seki, Matsumura, Fujii, and Kasami [Sek+91, lemma 2.2] for the unweighted case decorates each non-terminal 퐴 of the original MCFG with a set 훹 ⊆ [sort(퐴)] that denotes which components (that the non-terminal 퐴 produces in the original MCFG) should be deleted. Different occurrences of the same non-terminal may be decorated with different sets. We modify the construction of Seki, Matsumura, Fujii, and Kasami [Sek+91] such that it preserves the structure of derivations. Then the weight assignment can be defined in an obvious manner.

Proof. Let 퐺 = (푁, 훴, 푁i, 휇) be an 풜-weighted MCFG of fanout 푠 and rank 푘. Furthermore, let 퐺uw = (푁, 훴, 푁i, 푅) be the underlying grammar of 퐺.

Construction: For every rule 휌 = 퐴 → [푢1, …, 푢푛](퐵1, …, 퐵ℓ) ∈ 푅 and every 푀 ⊆ [푛], let
• 푀푖 = {푗′ ∈ [sort(퐵푖)] ∣ 푥푖,푗′ does not occur in 푢휄1…푢휄푚} for each 푖 ∈ [ℓ],
and let furthermore

푅휌,푀 = {퐴⟨훹⟩ → [푢휄1, …, 푢휄푚](퐵1⟨훹1⟩, …, 퐵ℓ⟨훹ℓ⟩)

∣ 훹푖 ∈ [sort(퐵푖)] ‧‧➡ 푈: dom(훹푖) = 푀푖 for each 푖 ∈ [ℓ]}

be the set of rules where


• 푈 denotes the set of all components that occur in composition representations of 퐺,

• {휄1, …, 휄푚} = [푛] ∖ 푀,

• 휄1 < … < 휄푚, and

• 훹 = {(휄, 푢휄) ∣ 휄 ∈ 푀}.
We construct the MCFG 퐺′uw = (푁′, 훴, {푆⟨∅⟩ ∣ 푆 ∈ 푁i}, 푅′) where
• 푁′ = {퐴⟨훹⟩ ∣ 퐴 ∈ 푁, 훹 ∈ [sort(퐴)] ‧‧➡ 푈} and
• 푅′ = ⋃(푅휌,푀 ∣ 휌 = 퐴 → [푢1, …, 푢푛](퐵1, …, 퐵ℓ) ∈ 푅, 푀 ⊆ [푛]).
The construction of 퐺′uw here is based on the construction for the unweighted case [Sek+91, lemma 2.2, step 2 of procedure 1]. The construction for the unweighted case enhances the left-hand side non-terminal of each rule with the set of indices of components that have been deleted. This construction additionally annotates the actual components (∈ (X ∪ 훴)∗) that have been deleted. Furthermore, we allow deletion of all components of a rule; this construction may therefore create rules of fanout 0.
Intermediate proposition (S1): Let 푔: 푅′ → 푅 such that 푔(휌′) = 휌 if and only if 휌′ ∈ 푅휌,푀 for some 푀. Furthermore, let 푔̂: Ts푅′ → Ts푅 be the function obtained by applying 푔 point-wise. We show the following statement (S1):

For every 퐴 ∈ 푁 and 훹 ∈ [sort(퐴)] → 푈: 푔̂ is a bijection between (Ts푅′)퐴⟨훹⟩ and (Ts푅)퐴.

For that consider the following statement (S2):

For every 퐴 ∈ 푁, 훹 ∈ [sort(퐴)] → 푈, and 푑 ∈ (Ts푅)퐴: there is exactly one 푑′ ∈ (Ts푅′)퐴⟨훹⟩ such that 푔̂(푑′) = 푑.

The statements (S1) and (S2) are clearly equivalent.

Proof of (S1): We prove (S2) by induction on the structure of 푑. Let 푑 = 휌(푑1, …, 푑ℓ) for some 푑1 ∈ (Ts푅)퐵1, …, 푑ℓ ∈ (Ts푅)퐵ℓ and 휌 = 퐴 → [푢1, …, 푢푛](퐵1, …, 퐵ℓ) ∈ 푅. Now let 푀1, …, 푀ℓ be defined as in the construction. Then for each 푖 ∈ [ℓ], the function 훹푖 has domain 푀푖. By construction, for each 푖 ∈ [ℓ] and 푚푖 ∈ 푀푖, we have that 훹푖(푚푖) is the 푚푖-th component of the composition representation of 푑푖(휀). Thus 훹1, …, 훹ℓ are uniquely defined. Consequently, 휌′ ∈ 푅휌,dom(훹), as obtained by the above construction, is uniquely defined. By the induction hypothesis, there are unique derivations 푑′1 ∈ (Ts푅′)퐵1⟨훹1⟩, …, 푑′ℓ ∈ (Ts푅′)퐵ℓ⟨훹ℓ⟩ such that 푔̂(푑′푖) = 푑푖 for each 푖 ∈ [ℓ]. Therefore, 푑′ = 휌′(푑′1, …, 푑′ℓ) is the unique derivation in (Ts푅′)퐴⟨훹⟩ such that 푔̂(푑′) = 푑. Hence (S2) and, consequently, (S1) hold.
Proof of the lemma: We construct the 풜-weighted MCFG 퐺′ = (푁′, 훴, {푆⟨∅⟩ ∣ 푆 ∈ 푁i}, 휇′) where 휇′ = 푔 ; 휇. Since 푔̂ preserves the structure of derivations, we know that 푔̂ ; wt퐺 = wt퐺′. Finally, we derive for every 푤 ∈ 훴∗:

⟦퐺′⟧(푤) = ⨁푑′∈Dc퐺′(푤) wt퐺′(푑′) (by definition 2.54)
= ⨁푑′∈Dc퐺′(푤) wt퐺(푔̂(푑′)) (since 푔̂ ; wt퐺 = wt퐺′)


= ⨁푑∈Dc퐺(푤) ⨁푑′∈Dc퐺′(푤): 푔̂(푑′)=푑 wt퐺(푔̂(푑′)) (since 푔 is a total function)
= ⨁푑∈Dc퐺(푤) ⨁푑′∈Dc퐺′(푤): 푔̂(푑′)=푑 wt퐺(푑)
= ⨁푑∈Dc퐺(푤) wt퐺(푑) (by (S1))
= ⟦퐺⟧(푤) (by definition 2.54).

Fanout and rank are not increased by this construction. ∎

2.3.2 Weighted finite state automata

Definition 2.60 (weighted FSAs, syntax). An 풜-weighted finite state automaton (short: 풜-wFSA) is a tuple ℳ = (푄, 훴, 푄i, 푄f, 휇) where 휇 is a function from 푄 × 훴∗ × 푄 to 풜 (called weight assignment) and ℳuw = (푄, 훴, 푄i, 푄f, supp(휇)) is an FSA. We call ℳuw the underlying automaton of ℳ. □

Let ℳ = (푄, 훴, 푄i, 푄f, 휇) be an 풜-wFSA and 푤 ∈ 훴∗. The set of transitions of ℳ is supp(휇). ℳ inherits many properties from its underlying automaton.
• The set of runs of ℳ, denoted by Runsℳ, is Runsℳuw.
• The set of accepting runs of ℳ, denoted by Runsaccℳ, is Runsaccℳuw.
• The set of runs of ℳ that yield 푤, denoted by Runsℳ(푤), is Runsℳuw(푤).
• The set of accepting runs of ℳ that yield 푤, denoted by Runsaccℳ(푤), is Runsaccℳuw(푤).

• The language recognised by ℳ, denoted by ℒ(ℳ), is ℒ(ℳuw).

• ℳ is called unambiguous if ℳuw is unambiguous.
• ℳ is called ambiguous if it is not unambiguous.

• ℳ is called finitely ambiguous if ℳuw is finitely ambiguous.
• ℳ is called infinitely ambiguous if it is not finitely ambiguous.

Definition 2.61 (wFSAs, weighted language). Let ℳ = (푄, 훴, 푄i, 푄f, 휇) be an 풜-wFSA. The weight assignment of ℳ for runs, denoted by wtℳ, is the function from Runsℳ to 풜 that is defined for every 휃 = 휏1…휏푘 ∈ Runsℳ with 휏1, …, 휏푘 ∈ supp(휇) by

wtℳ(휃) = 휇(휏1) ⊙ … ⊙ 휇(휏푘).

If ℳ is finitely ambiguous or 풜 is complete, we define the weighted language of ℳ, denoted by ⟦ℳ⟧, as the function from 훴∗ to 풜 where for each 푤 ∈ 훴∗ we have

⟦ℳ⟧(푤) = ⨁휃∈Runsaccℳ(푤) wtℳ(휃). □


[Figure 2.62: Graph of the wFSA ℳ shown in example 2.63: a loop ⟨a, {x}⟩ on the initial state 푞a, an edge ⟨휀, {#}⟩ from 푞a to the final state 푞b, and a loop ⟨b, {y}⟩ on 푞b.]

Example 2.63 (wFSA). Consider the Lang훥-wFSA ℳ = (푄, 훴, {푞a}, {푞b}, 휇) where 훥 = {x, y, #}, 푄 = {푞a, 푞b}, 훴 = {a, b}, |supp(휇)| = 3, and

휏1 = (푞a, a, 푞a) 휇(휏1) = {x}

휏2 = (푞a, 휀, 푞b) 휇(휏2) = {#}

휏3 = (푞b, b, 푞b) 휇(휏3) = {y}.

We will visualise wFSAs similarly to automata with data storage by a directed graph whose edges are labelled by tuples, see figure 2.62. The underlying automaton of ℳ is shown in example 2.28. The weighted language of ℳ is given by

⟦ℳ⟧(a푚b푛) = {x} ∘ … ∘ {x} (푚 times) ∘ {#} ∘ {y} ∘ … ∘ {y} (푛 times) = {x^푚#y^푛} for each 푚, 푛 ∈ ℕ and
⟦ℳ⟧(푤) = ∅ for each 푤 ∈ 훴∗ ∖ ℒ(ℳ).

By using the semiring of formal 훥-languages, we have effectively created a device that takes a word in 훴∗ and returns a set of words in 훥∗. In the literature such devices are called (string) transducers, cf. Berstel [Ber79, section III.6]. □
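For a finitely ambiguous wFSA, ⟦ℳ⟧(푤) can be computed by a straightforward search over accepting runs; the Python sketch below (ours; the semiring is passed in via the parameters plus, times, zero, one, and the search assumes that there are no 휀-cycles) reproduces ⟦ℳ⟧(aab) = {xx#y} for the wFSA of example 2.63.

```python
def wfsa_weight(w, q_init, q_final, mu, plus, times, zero, one):
    """Sum, over all accepting runs yielding w, of the product of weights."""
    def go(q, rest, acc):
        total = acc if q in q_final and rest == "" else zero
        for (p, u, p2), weight in mu.items():
            if p == q and rest.startswith(u):
                total = plus(total, go(p2, rest[len(u):], times(acc, weight)))
        return total
    result = zero
    for q in q_init:
        result = plus(result, go(q, w, one))
    return result

# The wFSA of example 2.63 over the semiring of formal languages:
# plus is union, times is element-wise concatenation.
mu = {("qa", "a", "qa"): {"x"}, ("qa", "", "qb"): {"#"}, ("qb", "b", "qb"): {"y"}}
lang_plus = lambda a, b: a | b
lang_times = lambda a, b: {u + v for u in a for v in b}
assert wfsa_weight("aab", {"qa"}, {"qb"}, mu,
                   lang_plus, lang_times, set(), {""}) == {"xx#y"}
```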

2.3.3 Weighted automata with data storage

Definition 2.64 (weighted automata with data storage, syntax). An 풜-weighted automaton with data storage is a tuple ℳ = (푄, DS, 훴, 푄i, 푄f, 휇) where DS = (퐶, 퐼, 퐶i, 퐶f) is a data storage, 휇 is a function from 푄 × 퐼∗ × 훴∗ × 푄 to 풜 (called weight assignment), and ℳuw = (푄, DS, 훴, 푄i, 푄f, supp(휇)) is an automaton with data storage. We call ℳuw the underlying automaton of ℳ. □

For convenience, we sometimes call an 풜-weighted automaton with data storage whose data storage is DS and whose terminals are taken from a set 훴 a (DS, 훴, 풜)-automaton.
Let ℳ = (푄, DS, 훴, 푄i, 푄f, 휇) be a (DS, 훴, 풜)-automaton and 푤 ∈ 훴∗. The set of transitions of ℳ is supp(휇). ℳ inherits many properties from its underlying automaton.
• The set of runs of ℳ, denoted by Runsℳ, is Runsℳuw.
• The set of accepting runs of ℳ, denoted by Runsaccℳ, is Runsaccℳuw.
• The set of runs of ℳ that yield 푤, denoted by Runsℳ(푤), is Runsℳuw(푤).
• The set of accepting runs of ℳ that yield 푤, denoted by Runsaccℳ(푤), is Runsaccℳuw(푤).

• The language recognised by ℳ, denoted by ℒ(ℳ), is ℒ(ℳuw).


• ℳ is called unambiguous if ℳuw is unambiguous.
• ℳ is called ambiguous if it is not unambiguous.

• ℳ is called finitely ambiguous if ℳuw is finitely ambiguous.
• ℳ is called infinitely ambiguous if it is not finitely ambiguous.

Definition 2.65 (weighted automata with data storage, weighted language). Consider the

풜-weighted automaton with data storage ℳ = (푄, DS, 훴, 푄i, 푄f, 휇). The weight assignment of ℳ for runs, denoted by wtℳ, is the function from Runsℳ to 풜 that is defined for every 휃 = 휏1…휏푘 ∈ Runsℳ with 휏1, …, 휏푘 ∈ supp(휇) by

wtℳ(휃) = 휇(휏1) ⊙ … ⊙ 휇(휏푘).

If ℳ is finitely ambiguous or 풜 is complete, we define the weighted language of ℳ, denoted by ⟦ℳ⟧, as the function from 훴∗ to 풜 where for each 푤 ∈ 훴∗ we have

⟦ℳ⟧(푤) = ⨁휃∈Runsaccℳ(푤) wtℳ(휃). □

The set of all weighted languages recognised by (DS, 훴, 풜)-automata is denoted by REC(DS, 훴, 풜).

[Figure 2.66: Graph of the weighted automaton with data storage ℳ from example 2.67: loops ⟨push(훤), a, 1⟩ and ⟨push(훤), b, 1⟩ on the initial state 1, an edge ⟨id, #, 1⟩ from 1 to 2, loops ⟨pop(a), a′, 1⟩ and ⟨pop(b), b′, 1⟩ on 2, and an edge ⟨id({휀}), 휀, 0⟩ from 2 to the final state 3.]

Example 2.67 (weighted automaton with data storage, cf. example 2.42). Consider the Trop-weighted automaton with data storage ℳ = ([3], PD″(훤), 훴, {1}, {3}, 휇) where PD″(훤) is taken from example 2.34, 훤 = {a, b}, 훴 = {a, b, #, a′, b′}, |supp(휇)| = 6, and

휏1 = (1, push(훤 ), a, 1) 휇(휏1) = 1

휏2 = (1, push(훤 ), b, 1) 휇(휏2) = 1

휏3 = (1, id, #, 2) 휇(휏3) = 1
휏4 = (2, pop(a), a′, 2) 휇(휏4) = 1
휏5 = (2, pop(b), b′, 2) 휇(휏5) = 1

휏6 = (2, id({휀}), 휀, 3) 휇(휏6) = 0.

The graph of ℳ is shown in figure 2.66. The weight of each transition is given as the third component of the label of the corresponding edge in the graph. The underlying automaton of


ℳ is given in example 2.42 and hence the language of ℳ is

ℒ(ℳ) = {푢#푣 ∣ 푢 ∈ {a, b}∗, 푣 ∈ {a′, b′}∗, |푢| = |푣|}.

The weighted language of ℳ is given by

⟦ℳ⟧(푤) = min{|휃| − 1 ∣ 휃 ∈ Runsaccℳ(푤)} = |푤| for each 푤 ∈ ℒ(ℳ) and
⟦ℳ⟧(푤) = ∞ for each 푤 ∈ 훴∗ ∖ ℒ(ℳ). □

Normal forms of data storage

We extend proposition 2.43 to the weighted case.

Proposition 2.68 (taken from Den17a, proposition 31). Let det be defined as in proposition 2.43. Then REC(DS, 훴, 풜) = REC(det(DS), 훴, 풜).

Proof. Let ℳ = (푄, DS, 훴, 푄i, 푄f, 휇) and ℳ′ = (푄′, det(DS), 훴, 푄′i, 푄′f, 휇′) be 풜-weighted automata with data storage. We call ℳ and ℳ′ related if ℳuw and ℳ′uw are related (cf. proposition 2.43) and 휇′(det(휏)) = 휇(휏) for each 휏 ∈ supp(휇). Note that det: supp(휇) → supp(휇′) is injective. Clearly, for every (DS, 훴, 풜)-automaton ℳ there is a (det(DS), 훴, 풜)-automaton ℳ′ such that ℳ and ℳ′ are related, and vice versa. It remains to be shown that ⟦ℳ⟧ = ⟦ℳ′⟧. For every 푤 ∈ 훴∗, we derive

⟦ℳ⟧(푤) = ⨁휃∈Runsaccℳ(푤) wtℳ(휃) (by definition 2.65)
= ⨁휃∈Runsaccℳ(푤) wtℳ′(det(휃)) (by the definition of 휇′ and induction on |휃|)
= ⨁휃′∈Runsaccℳ′(푤) wtℳ′(휃′) (since det is a bijection between the sets of accepting runs)
= ⟦ℳ′⟧(푤) (by definition 2.65) ∎

Instruction normal form

Definition 2.69. Let ℳ be a weighted automaton with data storage. We say that ℳ is in instruction normal form if ℳuw is in instruction normal form. □

Proposition 2.70. Let DS be a data storage and 훴 be a set. For every (DS, 훴, 풜)-automaton, there is an equivalent (DS, 훴, 풜)-automaton in instruction normal form.

Proof. Let DS = (퐶, 퐼, 퐶i, 퐶f). Furthermore, let ℳ = (푄, DS, 훴, 푄i, 푄f, 휇) be a (DS, 훴, 풜)-automaton. Now construct ℳ′ = (푄′, DS, 훴, 푄i, 푄f, 휇′) as the (DS, 훴, 풜)-automaton in instruction normal form such that (푄′, DS, 훴, 푄i, 푄f, supp(휇′)) is the (DS, 훴)-automaton in instruction normal form that is constructed by proposition 2.49 from ℳuw and 휇′ is defined by

휇′(휏′) = { 휇(휏)   if there is a 휏 ∈ supp(휇) for which the first transition in 푔(휏) is 휏′
         { ퟙ       otherwise


for every 휏′ ∈ supp(휇′). Then, because ퟙ is the identity w.r.t. ⊙ and by definition 2.65:

푔 ; wtℳ′ = wtℳ.

We derive

⟦ℳ′⟧(푤) = ⨁_{휃′∈Runsaccℳ′(푤)} wtℳ′(휃′)       (by definition 2.65)
         = ⨁_{휃′∈Runsaccℳ′uw(푤)} wtℳ′(휃′)
         = ⨁_{휃∈Runsaccℳuw(푤)} wtℳ′(푔(휃))      (by proposition 2.49)
         = ⨁_{휃∈Runsaccℳ(푤)} wtℳ′(푔(휃))
         = ⨁_{휃∈Runsaccℳ(푤)} wtℳ(휃)            (since 푔 ; wtℳ′ = wtℳ)
         = ⟦ℳ⟧(푤)                                (by definition 2.65)   ∎

2.3.4 Weighted string homomorphisms

Up to the definition of weighted string homomorphisms, the definitions in this section are taken from Droste and Kuich [DK09]. The notation was changed to conform to the rest of this dissertation.

For the remainder of section 2.3.4, let 풜 be a semiring.

Monomials and polynomials. Let 훥 be a set. A (훥-)monomial is a mapping 푓: 훥∗ → 풜 whose support is empty or a singleton. We write 푎.푤 for 푓 if 푓(푤) = 푎 and for each 푤′ ∈ 훥∗∖{푤} we have 푓(푤′) = ퟘ. Note that the monomial {(푤, ퟘ) ∣ 푤 ∈ 훥∗}, which is zero everywhere, can be written as ퟘ.푤 for any 푤 ∈ 훥∗; we will just write ퟘ instead. We denote the set of all 훥-monomials by 풜⟨훥∗⟩. A (훥-)polynomial is a mapping 푓: 훥∗ → 풜 whose support is finite. We denote the set of all 훥-polynomials by 풜⟪훥∗⟫.22

The semiring of weighted languages. Consider the tuple (훥∗ → 풜, ⊞, ⊠, ퟘ, ퟙ.휀) where ⊞ and ⊠ are defined by the equations

(푓 ⊞ 푔)(푤) = 푓(푤) ⊕ 푔(푤)

(푓 ⊠ 푔)(푤) = ⨁_{푤1,푤2∈훥∗ : 푤=푤1∘푤2} 푓(푤1) ⊙ 푔(푤2)

22 We deviate from the standard notation [ÉK09, section 2]. While Ésik and Kuich [ÉK09] use 풜⟪훥∗⟫ to denote an 풜-weighted language over 훥, we already have a notation for that: 훥∗ → 풜. However, we need notations for monomials and polynomials. Therefore, I decided to use 풜⟨훥∗⟩ for monomials (“mono” as in one pair of angled brackets) and 풜⟪훥∗⟫ for polynomials (“poly” as in multiple, i.e. two, pairs of angled brackets).


for each 푓, 푔: 훥∗ → 풜 and 푤 ∈ 훥∗.

Lemma 2.71. The tuple (훥∗ → 풜, ⊞, ⊠, ퟘ, ퟙ.휀) is a semiring.

Proof. The proof is shown in appendix B.3, page 166. Note that, in particular, the distributivity of 풜 is needed to show the associativity of ⊠. ∎

The following four properties can be easily verified:
• 풜⟨훥∗⟩ ⊆ 풜⟪훥∗⟫ ⊆ (훥∗ → 풜).
• 풜⟨훥∗⟩ is closed under ⊠, but not under ⊞.
• 풜⟪훥∗⟫ is closed under ⊞ and ⊠.
• The closure of 풜⟨훥∗⟩ under ⊞ is 풜⟪훥∗⟫, i.e. cl⊞(풜⟨훥∗⟩) = 풜⟪훥∗⟫.
Hence, (풜⟨훥∗⟩, ⊠, ퟙ.휀) is a monoid; we call it the monoid of (훥-)monomials. Furthermore, (풜⟪훥∗⟫, ⊞, ⊠, ퟘ, ퟙ.휀) and (훥∗ → 풜, ⊞, ⊠, ퟘ, ퟙ.휀) are called the semiring of (훥-)polynomials and the semiring of weighted languages (over 훥), respectively.
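As an illustration (mine, not the thesis's), the operations ⊞ and ⊠ can be implemented over the tropical semiring Trop = (ℕ ∪ {∞}, min, +, ∞, 0), representing 훥-polynomials as finite dictionaries from words to weights; absent words implicitly carry the weight ퟘ = ∞.

```python
INF = float("inf")                         # ퟘ of the tropical semiring

def poly_plus(f, g):                       # ⊞: pointwise ⊕, here pointwise min
    return {w: min(f.get(w, INF), g.get(w, INF)) for w in f.keys() | g.keys()}

def poly_times(f, g):                      # ⊠: sum over all splittings 푤 = 푤1 ∘ 푤2
    h = {}
    for w1, a1 in f.items():
        for w2, a2 in g.items():
            h[w1 + w2] = min(h.get(w1 + w2, INF), a1 + a2)
    return h

one = {"": 0}                              # ퟙ.휀
assert poly_times({"a": 1}, poly_plus({"b": 2}, one)) == {"ab": 3, "a": 1}
```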

Weighted homomorphisms. Let 훴 be a set and 푔: 훴 → 풜⟨훥∗⟩. Now let ̂푔 be the function ̂푔: 훴∗ → 풜⟨훥∗⟩ that is defined for each 푘 ∈ ℕ and 휎1, …, 휎푘 ∈ 훴 by the equation

̂푔(휎1⋯휎푘) = 푔(휎1) ⊠ … ⊠ 푔(휎푘).

We call ̂푔 an 풜-weighted (훴, 훥)-string homomorphism.23 Furthermore, let ̂̂푔 be the partial function ̂̂푔: 풫(훴∗) ‧‧➡ (훥∗ → 풜) that is defined by the equation

̂̂푔(퐿) = ⊞_{푤∈퐿} ̂푔(푤)

if 퐿 ⊆ 훴∗ is finite or 풜 is complete. (Otherwise, ̂̂푔(퐿) is undefined.) We call ̂푔 alphabetic if there is a function ℎ: 훴 → (훥 ∪ {휀} → 풜) such that ℎ̂ = ̂푔. We usually write 푔 instead of ̂푔 or ̂̂푔. For any sets 훴 and 훥, we fix the following two sets of functions:
• HOM(훴, 훥, 풜) denotes the set of 풜-weighted (훴, 훥)-string homomorphisms, and
• αHOM(훴, 훥, 풜) denotes the set of alphabetic 풜-weighted (훴, 훥)-string homomorphisms.
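For instance (a sketch under the same tropical-semiring assumption as above), a function 푔 from letters to monomials, represented as (weight, word) pairs, extends to ̂푔 by ⊠-multiplying the images of the letters:

```python
TROP_ONE = (0, "")                     # the monomial ퟙ.휀 over the tropical semiring

def mono_mul(m1, m2):                  # ⊠ on monomials: multiply weights (+), concatenate words
    (a1, w1), (a2, w2) = m1, m2
    return (a1 + a2, w1 + w2)

def hat(g):
    """Lift g: 훴 → monomials to ĝ with ĝ(휎1⋯휎푘) = g(휎1) ⊠ … ⊠ g(휎푘)."""
    def g_hat(word):
        result = TROP_ONE
        for sigma in word:
            result = mono_mul(result, g(sigma))
        return result
    return g_hat

# hypothetical letter images: x ↦ 1.ab and y ↦ 2.휀
# (an alphabetic homomorphism would additionally require image words of length ≤ 1)
g = {"x": (1, "ab"), "y": (2, "")}.__getitem__
assert hat(g)("xyx") == (4, "abab")
```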

The following lemma shows that weighted homomorphisms and weighted alphabetic homomorphisms are each closed under composition.

Lemma 2.73 (taken from Den15, lemma 18). For any sets 훴, 훥, and 훤:
(i) HOM(훴, 훥) ; HOM(훥, 훤 , 풜) ⊆ HOM(훴, 훤 , 풜).
(ii) αHOM(훴, 훥) ; αHOM(훥, 훤 , 풜) ⊆ αHOM(훴, 훤 , 풜).

Proof. The proof is illustrated by figure 2.72.

23An 풜-weighted (훴, 훥)-string homomorphism is a homomorphism between the monoid (훴∗, ∘, 휀) and the monoid (풜⟨훥∗⟩, ⊠, ퟙ.휀) in the sense of section 2.1.4.


[Figure: commuting diagram with the sets 훴∗, 훥∗, and (훤 ∗ → 풜); the arrows are ℎ′1 from 훴∗ to 훥∗ (with ℎ̂′1 = ℎ1), ℎ2 from 훥∗ to (훤 ∗ → 풜), and their composition ℎ′1 ; ℎ2 with ℎ̂′1 ; ℎ2 = ℎ1 ; ℎ2.]

Figure 2.72: Illustration of the proof of lemma 2.73.

For (i): Let ℎ1 ∈ HOM(훴, 훥) and ℎ2 ∈ HOM(훥, 훤 , 풜). Then there is an ℎ′1: 훴 → 훥∗ such that ℎ̂′1 = ℎ1. Since ℎ2(푢) ∈ 풜⟨훤 ∗⟩ for each 푢 ∈ 훥∗, we know that ℎ′1 ; ℎ2 ∈ HOM(훴, 훤 , 풜) and ℎ̂′1 ; ℎ2 = ℎ1 ; ℎ2.
For (ii): Let ℎ1 ∈ αHOM(훴, 훥) and ℎ2 ∈ αHOM(훥, 훤 , 풜). Then there is an ℎ′1: 훴 → (훥 ∪ {휀}) such that ℎ̂′1 = ℎ1. Since ℎ2(img(ℎ′1)) ⊆ (훤 ∪ {휀} → 풜), we know that ℎ′1 ; ℎ2 ∈ αHOM(훴, 훤 , 풜) and hence ℎ1 ; ℎ2 ∈ αHOM(훴, 훤 , 풜). ∎

Decomposition and closure results

The following two lemmas apply the concept of weight separation [compare DV13, lemmas 3 and 4] to weighted MCFGs and weighted automata with data storage, respectively.

Lemma 2.74 (taken from Den15, lemma 15). Let 훴 be a set, 푠, 푘 ∈ ℕ+ be numbers, 풜 be a complete commutative semiring, and 퐿: 훴∗ → 풜 be a weighted language. The following are equivalent:
(i) 퐿 ∈ (푠, 푘)-MCFL(훴, 풜).
(ii) There is a set 훥, a weighted homomorphism ℎ ∈ αHOM(훥, 훴, 풜), and an unambiguous unweighted MCFG 퐺′ ∈ (푠, 푘)-MCFG(훥) such that 퐿 = ℎ(ℒ(퐺′)).

Proof. For (i) ⟹ (ii): Let 퐿: 훴∗ → 풜 be an element of (푠, 푘)-MCFL(훴, 풜). By lemma 2.59, there is a non-deleting 풜-weighted MCFG 퐺 = (푁, 훴, 푁i, 휇) such that ⟦퐺⟧ = 퐿. Now let ℛ(훴, 푁) denote the set of all potential rules over terminals 훴 and non-terminals 푁, i.e.

ℛ(훴, 푁) = {퐴 → 푐(퐵1, …, 퐵푛) ∣ 푛, 푠, 푠1, …, 푠푛 ∈ ℕ+, 푐 ∈ (TCR(훴))(푠1⋯푠푛,푠), 퐴 ∈ 푁푠, 퐵1 ∈ 푁푠1, …, 퐵푛 ∈ 푁푠푛}.

Furthermore, let 훥 = 훴 ∪ {⟦^푗_휌, ⟧^푗_휌 ∣ 휌 ∈ ℛ(훴, 푁), 푗 ∈ [fanout(휌)]}, let ℛ(훥, 푁) be defined analogously to ℛ(훴, 푁), and let 푔: T^s_ℛ(훴,푁) → T^s_ℛ(훥,푁) be a homomorphism given by

푔(휌) = 퐴 → [⟦^1_휌 푢1 ⟧^1_휌, …, ⟦^푚_휌 푢푚 ⟧^푚_휌](퐵1, …, 퐵푛)


for each 휌 = (퐴 → [푢1, …, 푢푚](퐵1, …, 퐵푛)) ∈ ℛ(훴, 푁). We now show that the homomorphism 푔 ; yield: T^s_ℛ(훴,푁) → (훥∗)∗ is injective by structural induction.

Injectivity of 푔 ; yield: Let (푤1, …, 푤푚) ∈ img(푔 ; yield). Then (푤1, …, 푤푚) must be of the form (⟦^1_휌 푣1 ⟧^1_휌, …, ⟦^푚_휌 푣푚 ⟧^푚_휌) for unique 휌 = (퐴 → [푢1, …, 푢푚](퐵1, …, 퐵푛)) ∈ ℛ(훴, 푁) and 푣1, …, 푣푚 ∈ 훥∗ due to the definition of 푔. Furthermore, by the definitions of yield and 푔, there must be 푣⃗1, …, 푣⃗푛 ∈ (훥∗)∗ such that [푢1, …, 푢푚](푣⃗1, …, 푣⃗푛) = (푣1, …, 푣푚). This decomposition into tuples 푣⃗1, …, 푣⃗푛 is unique due to the bracketing of substrings in 푣1, …, 푣푚. By induction hypothesis, we assume that there are unique trees 푑1, …, 푑푛 ∈ T^s_ℛ(훴,푁) with (푔 ; yield)(푑푖) = 푣⃗푖 for each 푖 ∈ [푛]. Hence there is a unique tree 푑 = 휌(푑1, …, 푑푛) such that (푔 ; yield)(푑) = (푤1, …, 푤푚) and therefore 푔 ; yield is injective.

Finally, we define the (unweighted) MCFG 퐺′ = (푁, 훥, 푁i, 푔(supp(휇))) and the 풜-weighted (훥, 훴)-homomorphism ℎ: 훥∗ → 풜⟨훴∗⟩ where

ℎ(훿) = { ퟙ.훿        if 훿 ∈ 훴
       { (휇(휌)).휀   if 훿 = ⟦^1_휌 for some 휌 ∈ ℛ(훴, 푁)
       { ퟙ.휀        otherwise

for each 훿 ∈ 훥. 퐺′ is unambiguous since 푔 ; yield is injective (and 푔 is a function). It remains to be shown that 퐿 = ℎ(ℒ(퐺′)). For this, we derive for every 푤 ∈ 훴∗:

퐿(푤) = ⟦퐺⟧(푤)                                        (by definition of 퐺)
     = ⨁_{푑∈D^c_퐺(푤)} wt퐺(푑)                         (by definition 2.54)
     = ⨁_{푑∈D^c_퐺(푤)} (푔 ; yield ; ℎ)(푑)(푤)          (by definitions of 푔 and ℎ, and definitions 2.54 and 2.17)
     = ⨁_{푢∈(푔;yield)(D^c_퐺(푤))} ℎ(푢)(푤)             (since 푔 ; yield is injective)
     = ⨁_{푢∈ℒ(퐺′)} ℎ(푢)(푤)                           (by definition of 퐺′)
     = (⊞_{푢∈ℒ(퐺′)} ℎ(푢))(푤)                         (by definition of ⊞)
     = ℎ(ℒ(퐺′))(푤)

For (ii) ⟹ (i): Let 퐺 = (푁, 훥, 푁i, 푅) be an unambiguous (푠, 푘)-MCFG and ℎ ∈ αHOM(훥, 훴, 풜). Consider the homomorphisms ℎ1: 훥∗ → 훴∗ and ℎ2: 훥∗ → 풜 where

ℎ1(훿) = { 푤   if 훿 ∈ supp(ℎ) and ℎ(훿) = 푎.푤 for some 푎 ∈ 풜 and 푤 ∈ 훴∗
        { 휀   otherwise
ℎ2(훿) = { 푎   if 훿 ∈ supp(ℎ) and ℎ(훿) = 푎.푤 for some 푎 ∈ 풜 and 푤 ∈ 훴∗
        { ퟘ   otherwise


for each 훿 ∈ 훥. We construct the 풜-weighted (푠, 푘)-MCFG 퐺′ = (푁, 훴, 푁i, 휇) where

휇(휌′) = ⨁_{휌∈푅 : ℎ′1(휌)=휌′} ℎ′2(휌)

with ℎ′1: ℛ(훥, 푁) → ℛ(훴, 푁) being the function that replaces each 훿 ∈ 훥 that occurs in its argument by ℎ1(훿) and ℎ′2: ℛ(훥, 푁) → 풜 being the function that returns the product of the values ℎ2(훿) for each occurrence of 훿 in its argument. Then we derive for each 푤 ∈ 훴∗:

ℎ(ℒ(퐺))(푤) = ⨁_{푢∈ℒ(퐺)} ℎ(푢)(푤)
           = ⨁_{푢∈ℒ(퐺) : ℎ1(푢)=푤} ℎ2(푢)                    (by definitions of ℎ1 and ℎ2)
           = ⨁_{푑∈D^c_퐺 : ℎ1(yield(푑))=푤} ℎ2(yield(푑))     (since 퐺 is unambiguous)
           = ⨁_{푑∈D^c_퐺 : yield(ℎ̂1(푑))=푤} ℎ2(yield(푑))    (where ℎ̂1 extends ℎ1 to trees position-wise)
           = ⨁_{푑′∈D^c_퐺′ : yield(푑′)=푤} wt퐺′(푑′)           (by construction of 휇 and distributivity)
           = ⨁_{푑′∈D^c_퐺′(푤)} wt퐺′(푑′)                      (by definition 2.17)
           = ⟦퐺′⟧(푤)                                          (by definition 2.54)   ∎

Lemma 2.75 (taken from HV15, theorem 6). Let DS be a data storage, 훴 be a set, 풜 be a complete semiring, and 퐿: 훴∗ → 풜 be a weighted language. The following are equivalent:
(i) 퐿 ∈ REC(DS, 훴, 풜).
(ii) There is a set 훥, a weighted homomorphism ℎ ∈ αHOM(훥, 훴, 풜), and an unambiguous, 휀-free (DS, 훥)-automaton ℳ′ such that 퐿 = ℎ(ℒ(ℳ′)).

Note. The basic definitions of Herrmann and Vogler [HV15] differ slightly from the definitions used in this thesis. The original proof of Herrmann and Vogler [HV15] would only require negligible adjustments to work in our context and is thus not (re-)stated here. ∎

2.4 Problems and algorithms

Problems. A problem is a binary relation 푃 ⊆ 퐼 × 푂 where 퐼 and 푂 are arbitrary sets. We call each 푖 ∈ 퐼 an input of 푃 and each 표 ∈ 푂 an output of 푃. Note that for each input 푖 of 푃 there may exist zero or more different outputs 표 ∈ 푂 such that (푖, 표) ∈ 푃. In this thesis, the set 퐼 will often be the Cartesian product of other sets 퐼1, …, 퐼푘. For convenience, we will write down a problem 푃 as follows (or with some variation in wording and typesetting):

Problem 푃
Input: an 푖 ∈ 퐼
Output: an 표 ∈ 푂 such that 퐶(푖, 표)

where 퐶(푖, 표) is a logical formula with free variables 푖 and 표. If we denote 푃 like this, then the binary relation 푃 is defined by 푃 = {(푖, 표) ∈ 퐼 × 푂 ∣ 퐶(푖, 표) is true}. Now let 푓: 퐼 ‧‧➡ 푂 be a partial function. We say that 푓 solves 푃 if 푓 ⊆ 푃 and dom(푓) = dom(푃 ).

Algorithms. An algorithm is a tuple Alg = (푃 , Code) where 푃 ⊆ 퐼 × 푂 is a problem and Code is a string of statements in pseudocode24 whose free variables are declared in the Input of 푃 and where the argument of each return-statement in Code is an expression of type 푂. The function induced by Code, denoted by ⟦Code⟧, is the partial function from 퐼 to 푂 that assigns to each input 푖 ∈ 퐼 the value returned by Code for this input if a return-statement is reached; if no return-statement is reached for 푖 then ⟦Code⟧ is not defined for 푖. We say that Alg is correct if ⟦Code⟧ solves 푃. For convenience, we will write down an algorithm Alg = (푃 , Code) as follows (or with some variation in wording and typesetting):

Algorithm Alg
Input: an 푖 ∈ 퐼
Output: an 표 ∈ 푂 such that 퐶(푖, 표)

stat1
⋮
stat푘

where 퐶(푖, 표) is a logical formula with free variables 푖 and 표. If we denote Alg like this, then Alg = (푃 , Code) where 푃 is given by the Input and Output, as described above, and Code = stat1…stat푘.

Complexities. We briefly recall the 풪-notation for the time and space complexities of algorithms [cf. Cor+09, section 3.1]. Let 푘 ∈ ℕ and 푓 be a function from ℕ^푘 to ℕ. We denote by 풪(푓) the set of functions 푔: ℕ^푘 → ℕ for which there are constants 퐶1, 퐶2 ∈ ℕ+ such that for each 푥⃗ ∈ ℕ^푘 the inequation 0 ≤ 푔(푥⃗) ≤ 퐶1 ⋅ 푓(푥⃗) + 퐶2 holds. Usually, 푓 will be given by an expression 푒 with free variables 푥1, …, 푥푘. In this case, we write 풪(푒) instead of 풪(푓).

The 풪-notation is used to characterise the time complexity or the space complexity of an algorithm. Let 퐼 and 푂 be sets, Alg = (푃 , Code) be an algorithm with 푃 ⊆ 퐼 × 푂, size: 퐼 → ℕ^푘 for some 푘 ∈ ℕ, and 푓: ℕ^푘 → ℕ. Following the Church-Turing thesis [Chu36, Section 7; Tur37, Section 9; cf. also Sip12, Chapter 3], we can translate Alg into an equivalent deterministic Turing machine ℳ. Now, let 푔: ℕ^푘 → ℕ be the function that assigns for every 푥⃗ ∈ ℕ^푘 the maximal number of steps that ℳ executes for any input 푖 ∈ size⁻¹(푥⃗) before halting. Also, let ℎ: ℕ^푘 → ℕ be the function that assigns for every 푥⃗ ∈ ℕ^푘 the maximal number of squares on the tape that ℳ uses for any input 푖 ∈ size⁻¹(푥⃗) before halting. We say that Alg runs in time 풪(푓) if 푔 ∈ 풪(푓). Analogously, we say that Alg runs in space 풪(푓) if ℎ ∈ 풪(푓).

24We assume that the reader is sufficiently familiar with imperative programming languages and their semantics to understand the pseudocode used in this dissertation. In line with this assumption, we will not specify the syntax and semantics of our pseudocode.

3 An automaton characterisation for weighted MCFLs

This chapter is a substantially revised and extended version of T. Denkinger. “An Automata Characterisation for Multiple Context-Free Languages”. In: Proceedings of the 20th International Conference on Developments in Language Theory. Ed. by S. Brlek and C. Reutenauer. 2. Springer Berlin Heidelberg, 2016, pp. 138–150. doi: 10.1007/978-3-662-53132-7_12.

3.1 Introduction

In section 2.2, we have seen several kinds of grammars and automata and their associated language classes,1 e.g.
• context-free grammars generate the class of context-free languages CFL,
• regular expressions [HU79, section 2.5] generate the class of regular languages REG,
• multiple context-free grammars generate the class of multiple context-free languages MCFL,
• finite state automata recognise the class of recognisable languages REC, and
• automata with data storage pushdown (short: pushdown automata) recognise pushdown languages PDL.
It turns out that some of these language classes are the same: REG = REC and CFL = PDL. This observation provides us with two viewpoints on each of the language classes: a generating one (i.e. a grammar class) which tells us how to generate well-formed strings of a language in the language class and a recognising one (i.e. an automaton class) which tells us how to decide for a given string, by traversing it from left to right, if it is well-formed with respect to the language class. Language classes are usually (first) defined in terms of a kind of grammar. Hence, when we find a kind of automaton that recognises such a language class, we call this discovery an automaton characterisation. Figure 3.1 shows a Hasse-diagram that contains the four language classes of the Chomsky hierarchy and three additional language classes together with their associated kinds of automata. Note that the figure associates no kind of automaton with "multiple context-free languages". This section fills that gap with restricted tree-stack automata.

1The term language class will be used for a set of languages to indicate that the languages in the set have something in common, e.g. they have the same kind of generating device. Particularly, the word class does not refer to the homonymous concept in the Zermelo-Fraenkel set theory.


Type 0 recursively enumerable ↪ Turing machines [HU79, section 7.3]

Type 1 context-sensitive languages ↪ linear bounded automata [HU79, theorem 9.5]

multiple context-free languages
indexed languages ↪ nested stack automata [Aho69]

yield languages of TAGs ↪ embedded pushdown automata [Vij88, section 3]

Type 2 context-free languages ↪ pushdown automata [Cho62; Sch63]

Type 3 regular languages ↪ finite state automata [HU79, theorems 2.3 and 2.4]

Figure 3.1: Language classes and their automaton characterisations.

3.2 Tree-stack automata

A stack consists of a pushdown, i.e. a finite string, and a stack pointer, i.e. a position in the pushdown. A tree-stack is a generalisation of a stack in which branching is allowed in the pushdown.2 More precisely, the pushdown (of the stack) is replaced by a tree in the tree-stack, and the stack pointer is then, consequently, a position in this tree. The node that is at the position of the stack pointer is called the current node.

The operations on tree-stacks are similar to those usually defined for stacks, but modified to accommodate branching: We may check if the stack pointer is at the root or we may compare the label at the current node to a specific symbol. Furthermore, we allow the stack pointer to be moved downward (i.e. replaced by the parent position of the current node), upward (i.e. replaced by any child position of the current node), and we may push a symbol to any vacant child position of the current node. If the stack pointer does not point to the root, then we may write at the current node.

Formally, for any set 훤, a tree-stack over 훤 is a tuple [휉, 푝] where
• 휉 is a partial function from ℕ∗+ to 훤, called the tree-pushdown, whose domain is non-empty and prefix-closed, i.e. 푝′푖 ∈ dom(휉) implies 푝′ ∈ dom(휉) for each 푝′ ∈ ℕ∗+ and 푖 ∈ ℕ+,

2We will see later on that in contrast to a stack our tree-stack has no instruction to pop elements from the stack. However, this instruction is not essential. It can be simulated by adding a new symbol ⊥ to the stack alphabet and then executing (set(⊥) ; down) instead of pop and executing (push(푖, 훾) ∪ (up(푖) ; equals(⊥) ; set(훾))) instead of push(푖, 훾), cf. proposition B.2.


and
• 푝 ∈ dom(휉) is called the stack pointer.
We call 휉(푝) the current stack symbol of [휉, 푝]. The set of all tree-stacks over 훤 is denoted by TS(훤 ). For any 훤, we define partial functions, given for each [휉, 푝] ∈ TS(훤 ) as follows:

bottom([휉, 푝]) = { [휉, 푝]      if 푝 = 휀
                 { undefined   otherwise,

equals(훾)([휉, 푝]) = { [휉, 푝]      if 휉(푝) = 훾          for each 훾 ∈ 훤,
                    { undefined   otherwise

down([휉, 푝]) = { [휉, 푝′]     if 푝 = 푝′푖 for some 푝′ ∈ ℕ∗+ and 푖 ∈ ℕ+
               { undefined   if 푝 = 휀,

up(푛)([휉, 푝]) = { [휉, 푝푛]     if 푝푛 ∈ dom(휉)           for each 푛 ∈ ℕ+,
                { undefined   otherwise

push(푛, 훾)([휉, 푝]) = { [휉[푝푛/훾], 푝푛]   if 푝푛 ∉ dom(휉)   for each 푛 ∈ ℕ+ and 훾 ∈ 훤, and
                     { undefined        otherwise

set(훾)([휉, 푝]) = [휉[푝/훾], 푝]                             for each 훾 ∈ 훤.
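The following Python sketch (an illustration of mine, not from the thesis) implements these six partial functions; positions are tuples of positive integers, the tree-pushdown is a dictionary, and None plays the role of "undefined".

```python
class TreeStack:
    """A tree-stack [휉, 푝]: a prefix-closed map from positions (tuples of
    positive integers) to symbols, together with a stack pointer."""

    def __init__(self, tree=None, pointer=()):
        self.tree = tree if tree is not None else {(): "@"}
        self.pointer = pointer

    def bottom(self):
        # defined only if the stack pointer is at the root
        return self if self.pointer == () else None

    def equals(self, gamma):
        # defined only if the current stack symbol is gamma
        return self if self.tree[self.pointer] == gamma else None

    def down(self):
        # move the stack pointer to the parent position; undefined at the root
        return TreeStack(self.tree, self.pointer[:-1]) if self.pointer else None

    def up(self, n):
        # move the stack pointer to the n-th child, if that child exists
        child = self.pointer + (n,)
        return TreeStack(self.tree, child) if child in self.tree else None

    def push(self, n, gamma):
        # add gamma at the vacant n-th child and move the pointer there
        child = self.pointer + (n,)
        if child in self.tree:
            return None
        return TreeStack({**self.tree, child: gamma}, child)

    def set(self, gamma):
        # overwrite the symbol at the current node (total, unlike write(훾) below)
        return TreeStack({**self.tree, self.pointer: gamma}, self.pointer)
```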

In our examples, we will denote a tree-stack [휉, 푝] ∈ TS(훤 ) by writing 휉 as a set and then underlining the unique element of 휉 that has 푝 as its first item. For example, we will abbreviate the tree-stack [{(휀, 훼), (1, 휎), (7, 훽), (11, 훾), (12, 훾)}, 12] by {(휀, 훼), (1, 휎), (7, 훽), (11, 훾), (12, 훾)}.

Example 3.2 (tree-stack). Let 훤 = {훼, 훽, 훾, 휎, 훿} and consider the tree-stack

푡 = {(휀, 훼), (1, 휎), (7, 훽), (11, 훾), (12, 훾)} ∈ TS(훤 ).

Figure 3.3 shows a graphical representation of 푡 on the left. The tree structure is due to the prefix-closedness of the domain of 푡 and the arrow stands for the stack pointer. Furthermore, the figure illustrates the functions defined above. □


Figure 3.3: Illustration of the partial functions push(1, 훽), down, up(1), and set(훿) on a tree- stack, cf. example 3.2.

We will use the tree-stack to define a data storage.


Definition 3.4 (tree-stack storage, based on Den16a, definition 1). Let 훤 be a set that does not contain the symbol @. The tree-stack storage with respect to 훤 is the data storage

TSS(훤 ) = (TS(훤@), 퐼, {{(휀, @)}}, TS(훤@)) where

• 훤@ = 훤 ∪ {@},
• 퐼 = {bottom, equals(훾), down, up(푛), push(푛, 훾), write(훾) ∣ 푛 ∈ ℕ, 훾 ∈ 훤 }, and
• write(훾) = (id(TS(훤 ∪ {@})) ∖ bottom) ; set(훾) for each 훾 ∈ 훤. □

Note that TSS(훤 ) has, in addition to 훤, the extra stack symbol @. Also, write(훾) is undefined (for any 훾 ∈ 훤) if its argument's current node is the root. Let us now consider automata with data storage whose data storage is a tree-stack storage. For brevity, we will call such an automaton a tree-stack automaton (short: TSA).

Example 3.5 (taken from Den16a, example 2). Let 훴 = {a, b, c, d} and 훤 = {∗, #}. Consider the TSA ℳ = ([5], TSS(훤 ), 훴, {1}, {5}, 푇 ) where 푇 contains exactly the following eight transitions:

휏1 = (1 , push(1, ∗) , a , 1) 휏5 = (3 , equals(∗) ; up(1) , c , 3)

휏2 = (1 , push(1, #) ; down , 휀 , 2) 휏6 = (3 , equals(#) ; down , 휀 , 4)

휏3 = (2 , equals(∗) ; down , b , 2) 휏7 = (4 , equals(∗) ; down , d , 4)

휏4 = (2 , bottom ; up(1) , 휀 , 3) 휏8 = (4 , bottom , 휀 , 5).

Consider the following run of ℳ:

(1, {(휀, @)}, a²b²c²d²)
⊢휏1 (1, {(휀, @), (1, ∗)}, ab²c²d²)
⊢휏1 (1, {(휀, @), (1, ∗), (11, ∗)}, b²c²d²)
⊢휏2 (2, {(휀, @), (1, ∗), (11, ∗), (111, #)}, b²c²d²)
⊢휏3휏3 (2, {(휀, @), (1, ∗), (11, ∗), (111, #)}, c²d²)
⊢휏4 (3, {(휀, @), (1, ∗), (11, ∗), (111, #)}, c²d²)
⊢휏5휏5 (3, {(휀, @), (1, ∗), (11, ∗), (111, #)}, d²)
⊢휏6 (4, {(휀, @), (1, ∗), (11, ∗), (111, #)}, d²)
⊢휏7휏7 (4, {(휀, @), (1, ∗), (11, ∗), (111, #)}, 휀)
⊢휏8 (5, {(휀, @), (1, ∗), (11, ∗), (111, #)}, 휀).

It is easy to see that 휏1²휏2휏3²휏4휏5²휏6휏7²휏8 is the only run of ℳ that yields a²b²c²d². Now let 푛 ∈ ℕ and observe the unique run of ℳ that yields aⁿbⁿcⁿdⁿ:

(1, {(휀, @)}, aⁿbⁿcⁿdⁿ)
⊢휏1ⁿ (1, {(휀, @), (1, ∗), …, (1ⁿ, ∗)}, bⁿcⁿdⁿ)
⊢휏2 (2, {(휀, @), (1, ∗), …, (1ⁿ, ∗), (1ⁿ⁺¹, #)}, bⁿcⁿdⁿ)
⊢휏3ⁿ (2, {(휀, @), (1, ∗), …, (1ⁿ, ∗), (1ⁿ⁺¹, #)}, cⁿdⁿ)
⊢휏4 (3, {(휀, @), (1, ∗), …, (1ⁿ, ∗), (1ⁿ⁺¹, #)}, cⁿdⁿ)
⊢휏5ⁿ (3, {(휀, @), (1, ∗), …, (1ⁿ, ∗), (1ⁿ⁺¹, #)}, dⁿ)
⊢휏6 (4, {(휀, @), (1, ∗), …, (1ⁿ, ∗), (1ⁿ⁺¹, #)}, dⁿ)
⊢휏7ⁿ (4, {(휀, @), (1, ∗), …, (1ⁿ, ∗), (1ⁿ⁺¹, #)}, 휀)
⊢휏8 (5, {(휀, @), (1, ∗), …, (1ⁿ, ∗), (1ⁿ⁺¹, #)}, 휀).

The state behaviour ensures that 휏1, 휏2, …, 휏8 occur in that order in any accepting run of ℳ (possibly with repetition of individual transitions). The storage behaviour ensures that the transitions 휏1, 휏3, 휏5, and 휏7 occur equally often in any accepting run of ℳ. Hence, the set of accepting runs of ℳ is

Runsaccℳ = {휏1ⁿ휏2휏3ⁿ휏4휏5ⁿ휏6휏7ⁿ휏8 ∣ 푛 ∈ ℕ}

and the language of ℳ is ℒ(ℳ) = {yield(휏1ⁿ휏2휏3ⁿ휏4휏5ⁿ휏6휏7ⁿ휏8) ∣ 푛 ∈ ℕ} = {aⁿbⁿcⁿdⁿ ∣ 푛 ∈ ℕ}. □
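Using the TreeStack sketch from section 3.2, the first steps of this run can be replayed directly; the helper run below chains instructions and propagates undefinedness (again, just an illustration of mine):

```python
def run(ts, *steps):
    # sequential composition of instructions; None propagates like undefinedness
    for step in steps:
        ts = step(ts)
        if ts is None:
            return None
    return ts

ts = TreeStack()                                         # initial tree-stack {(휀, @)}
ts = run(ts,
         lambda t: t.push(1, "*"),                       # 휏1, reads a
         lambda t: t.push(1, "*"),                       # 휏1, reads a
         lambda t: t.push(1, "#"), lambda t: t.down())   # 휏2, reads 휀
assert ts.tree[(1, 1, 1)] == "#" and ts.pointer == (1, 1)
```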

Note that any tree-stack [휉, 푝] that occurs during a run of ℳ (cf. example 3.5) is monadic, i.e. for each 푝′ ∈ dom(휉) there is at most one 푖 ∈ ℕ such that 푝′푖 ∈ dom(휉). If only monadic tree-stacks occur in the runs of a TSA ℳ, then we call ℳ monadic. The next example shows a TSA that is not monadic.

Example 3.6 (taken from Den16b, example 3.3). Let again 훴 = {a, b, c, d} and 훤 = {∗, #}. Consider the TSA ℳ = ([8], TSS(훤 ), 훴, {1}, {8}, 푇 ) where 푇 contains exactly the following 14 transitions:

휏1 = (1 , push(1, ∗) , a , 1) 휏8 = (4 , bottom ; up(1) , 휀 , 5)

휏2 = (1 , push(1, #) , 휀 , 2) 휏9 = (5 , equals(∗) ; up(1) , c , 5)

휏3 = (2 , down , 휀 , 2) 휏10 = (5 , equals(#) ; down , 휀 , 6)

휏4 = (2 , bottom , 휀 , 3) 휏11 = (6 , down , 휀 , 6)

휏5 = (3 , push(2, ∗) , b , 3) 휏12 = (6 , bottom ; up(2) , 휀 , 7)

휏6 = (3 , push(2, #) , 휀 , 4) 휏13 = (7 , equals(∗) ; up(2) , d , 7)

휏7 = (4 , down , 휀 , 4) 휏14 = (7 , equals(#) , 휀 , 8).


Now, for any 푚, 푛 ∈ ℕ+, consider the following run of ℳ:

(1, {(휀, @)}, aᵐbⁿcᵐdⁿ)
⊢휏1ᵐ (1, {(휀, @), (1, ∗), …, (1ᵐ, ∗)}, bⁿcᵐdⁿ)
⊢휏2 (2, {(휀, @), (1, ∗), …, (1ᵐ, ∗), (1ᵐ⁺¹, #)}, bⁿcᵐdⁿ)
⊢휏3ᵐ⁺¹ (2, {(휀, @), (1, ∗), …, (1ᵐ, ∗), (1ᵐ⁺¹, #)}, bⁿcᵐdⁿ)
⊢휏4 (3, {(휀, @), (1, ∗), …, (1ᵐ, ∗), (1ᵐ⁺¹, #)}, bⁿcᵐdⁿ)
⊢휏5ⁿ (3, {(휀, @), (1, ∗), …, (1ᵐ, ∗), (1ᵐ⁺¹, #), (2, ∗), …, (2ⁿ, ∗)}, cᵐdⁿ)
⊢휏6 (4, {(휀, @), (1, ∗), …, (1ᵐ, ∗), (1ᵐ⁺¹, #), (2, ∗), …, (2ⁿ, ∗), (2ⁿ⁺¹, #)}, cᵐdⁿ)
⊢휏7ⁿ⁺¹ (4, {(휀, @), (1, ∗), …, (1ᵐ, ∗), (1ᵐ⁺¹, #), (2, ∗), …, (2ⁿ, ∗), (2ⁿ⁺¹, #)}, cᵐdⁿ)
⊢휏8 (5, {(휀, @), (1, ∗), …, (1ᵐ, ∗), (1ᵐ⁺¹, #), (2, ∗), …, (2ⁿ, ∗), (2ⁿ⁺¹, #)}, cᵐdⁿ)
⊢휏9ᵐ (5, {(휀, @), (1, ∗), …, (1ᵐ, ∗), (1ᵐ⁺¹, #), (2, ∗), …, (2ⁿ, ∗), (2ⁿ⁺¹, #)}, dⁿ)
⊢휏10 (6, {(휀, @), (1, ∗), …, (1ᵐ, ∗), (1ᵐ⁺¹, #), (2, ∗), …, (2ⁿ, ∗), (2ⁿ⁺¹, #)}, dⁿ)
⊢휏11ᵐ (6, {(휀, @), (1, ∗), …, (1ᵐ, ∗), (1ᵐ⁺¹, #), (2, ∗), …, (2ⁿ, ∗), (2ⁿ⁺¹, #)}, dⁿ)
⊢휏12 (7, {(휀, @), (1, ∗), …, (1ᵐ, ∗), (1ᵐ⁺¹, #), (2, ∗), …, (2ⁿ, ∗), (2ⁿ⁺¹, #)}, dⁿ)
⊢휏13ⁿ (7, {(휀, @), (1, ∗), …, (1ᵐ, ∗), (1ᵐ⁺¹, #), (2, ∗), …, (2ⁿ, ∗), (2ⁿ⁺¹, #)}, 휀)
⊢휏14 (8, {(휀, @), (1, ∗), …, (1ᵐ, ∗), (1ᵐ⁺¹, #), (2, ∗), …, (2ⁿ, ∗), (2ⁿ⁺¹, #)}, 휀)

As in our previous example, the state behaviour ensures that 휏1, 휏2, …, 휏14 occur in that order in any accepting run of ℳ and the storage behaviour ensures that they are repeated as often as in the above run, for some 푚, 푛 ∈ ℕ. The accepting runs of ℳ are therefore

Runsaccℳ = {휏1ᵐ휏2휏3ᵐ⁺¹휏4휏5ⁿ휏6휏7ⁿ⁺¹휏8휏9ᵐ휏10휏11ᵐ휏12휏13ⁿ휏14 ∣ 푚, 푛 ∈ ℕ}

and the language of ℳ is

ℒ(ℳ) = {yield(휏1ᵐ휏2휏3ᵐ⁺¹휏4휏5ⁿ휏6휏7ⁿ⁺¹휏8휏9ᵐ휏10휏11ᵐ휏12휏13ⁿ휏14) ∣ 푚, 푛 ∈ ℕ}
       = {aᵐbⁿcᵐdⁿ ∣ 푚, 푛 ∈ ℕ}. □

3.2.1 Normal forms

In this section, we will consider two possible properties of TSAs that are undesirable with regard to section 3.3 and how they can be avoided:
• the existence of stationary cycles, i.e. non-empty runs that do not change the storage configuration and that start from the same state they end in; and
• acceptance of the automaton with a stack pointer other than 휀.


Stationary cycles

Let ℳ = (푄, TSS(훤 ), 훴, 푄i, 푄f, 푇 ) be a TSA. We define the following abbreviations:

• 퐼|stay denotes the set of all instructions in TSS(훤 ) that do not move the stack pointer, i.e. 퐼|stay = {bottom, equals(훾), write(훾) ∣ 훾 ∈ 훤 },

• 푇 |stay denotes the set of all transitions in ℳ that do not move the stack pointer, i.e. 푇 |stay = 푇 ∩ (푄 × (퐼|stay)∗ × 훴∗ × 푄), and

• Runsℳ|stay denotes the set of all (not necessarily accepting) runs of ℳ that do not move the stack pointer, i.e. Runsℳ|stay = Runsℳ ∩ (푇 |stay)∗.
A run 휃 ∈ Runsℳ is called a stationary cycle if 휃 ≠ 휀 and {휃}∗ ⊆ Runsℳ|stay.

Definition 3.7 (taken from Den16a, definition 3). We call a TSA ℳ (stationary) cycle-free if Runsℳ contains no stationary cycle. □

The TSAs in examples 3.5 and 3.6 are both cycle-free.

Proposition 3.8. Let ℳ = (푄, TSS(훤 ), 훴, 푄i, 푄f, 푇 ) be a TSA. It is decidable in time 풪(|푇 |²) whether ℳ is cycle-free.

Proof. This proof will reduce cycle-freeness of ℳ to depth-first search of a graph. For this purpose, let us briefly recall some basic definitions from graph theory [cf. Cor+09, section B.4]:

A (directed) graph is a tuple (푉 , 퐸) where 푉 is an arbitrary set and 퐸 ⊆ 푉 × 푉. We call the elements of 푉 and 퐸 vertices and edges, respectively. A path in (푉 , 퐸) is a string of edges (푣1, 푣′1)⋯(푣푘, 푣′푘) ∈ 퐸∗ such that 푣′푖 = 푣푖+1 for each 푖 ∈ [푘 − 1]. A path (푣0, 푣1)(푣1, 푣2)⋯(푣푘−1, 푣푘) in (푉 , 퐸) is called a cycle if 푣0 = 푣푘. A graph (푉 , 퐸) is called acyclic if no path in (푉 , 퐸) is a cycle.

Let 푞 ∈ 푄, 훾 ∈ 훤, and 휏 = (푞0, 푖, 푢, 푞1) ∈ 푇 |stay. We say that 푞 occurs in 휏 if 푞0 = 푞 or 푞1 = 푞. We say that 훾 occurs in 휏 if the instruction sequence 푖 contains the instruction equals(훾) or the instruction write(훾). We say that @ occurs in 휏 (regardless of 휏). Now let 푄′ and 훤 ′ be the subsets of 푄 and 훤 ∪ {@}, respectively, that occur in 푇. For each 푞, 푞′ ∈ 푄′ and 훾, 훾′ ∈ 훤 ′, we denote the set

{휏 ∈ 푇 |stay ∣ ∃[휉, 푝], [휉′, 푝] ∈ TS(훤 ∪ {@}), 푤, 푤′ ∈ 훴∗ : (푞, [휉, 푝], 푤) ⊢휏 (푞′, [휉′, 푝], 푤′) ∧ 휉(푝) = 훾 ∧ 휉′(푝) = 훾′}

by 푇 |stay^{(푞,훾)→(푞′,훾′)}. Now we construct a directed graph (푉 , 퐸) where 푉 = 푄′ × 훤 ′ and

퐸 = {((푞, 훾), (푞′, 훾′)) ∣ 푞, 푞′ ∈ 푄′, 훾, 훾′ ∈ 훤 ′, 푇 |stay^{(푞,훾)→(푞′,훾′)} ≠ ∅}.

Clearly, (푉 , 퐸) is acyclic if and only if ℳ is cycle-free. We can determine whether (푉 , 퐸) is acyclic by doing a depth-first search of (푉 , 퐸) (cf. Cormen, Leiserson, Rivest, and Stein [Cor+09, lemma 22.11]). The depth-first search of (푉 , 퐸) can be performed in time 풪(|푉 | + |퐸|) [Cor+09, page 606]. We can see that |푉 | ≤ |푄′ × 훤 ′| ≤ |푇 |² and |퐸| ≤ |푇 |. Hence, cycle-freeness of ℳ is decidable in time 풪(|푇 |² + |푇 |) = 풪(|푇 |²). ∎
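A minimal version of this acyclicity test (my sketch, with the graph given as an adjacency mapping) uses the standard three-colour depth-first search:

```python
def is_acyclic(vertices, edges):
    """Three-colour depth-first search; edges maps a vertex to its successors."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {v: WHITE for v in vertices}

    def visit(v):
        colour[v] = GREY
        for w in edges.get(v, ()):
            if colour[w] == GREY or (colour[w] == WHITE and not visit(w)):
                return False             # a GREY successor closes a cycle
        colour[v] = BLACK
        return True

    return all(colour[v] != WHITE or visit(v) for v in vertices)
```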


Observation 3.9. Let ℳ be a TSA. Then Runsℳ|stay is finite if and only if ℳ is cycle-free. ∎

Stationary cycles may be iterated and nested (into each other) during the run of an automaton. In the following proof, we want to consider only stationary cycles that are not obtained by iteration or nesting of other stationary cycles. We call such stationary cycles primitive.

Definition 3.10. A stationary cycle is called primitive if none of its proper substrings3 are stationary cycles. □

Observation 3.11. Each stationary cycle contains a primitive stationary cycle as a substring. ∎

Observation 3.12. Only finitely many runs of any TSA are primitive stationary cycles. ∎

Next, we show that cycle-freeness is a normal form among TSAs.

Proposition 3.13 (taken from Den16a, lemma 4). For every TSA, there is an equivalent cycle-free TSA.

Proof idea. Given a TSA ℳ, we construct a new TSA ℳ′ as follows: We select a primitive stationary cycle 휃 from the runs of ℳ. We modify 휃 (by modifying the transitions of ℳ) to end with a push-instruction. The resulting run 휃′ is then no longer a (primitive) stationary cycle. The TSA ℳ′ then consists of the unmodified transitions from ℳ and the (modified) transitions that occur in 휃′. We simulate repeating 휃 arbitrarily often by iterating 휃′ the desired number of times and then returning the stack pointer to its original position. This is achieved by placing a dedicated symbol ∗ along the path to the original position with the above-mentioned push-instruction. After the iteration of 휃′ is finished, we go down until the current stack symbol is no longer ∗. We then iterate this construction until the resulting automaton no longer contains primitive stationary cycles.

Proof. Let ℳ = (푄, TSS(훤 ), 훴, 푄i, 푄f, 푇 ) be a TSA that is not cycle-free and let 퐾 ∈ ℕ+ such that ℳ contains no up(퐾)- and no push(퐾, 훾)-instruction (for any 훾 ∈ 훤). Furthermore, let 훤̄ be the set of all elements of 훤 that occur in the elements of 푇 and

휏1⋯휏푛 = (푞0, 푖1, 푢1, 푞1)(푞1, 푖2, 푢2, 푞2)⋯(푞푛−1, 푖푛, 푢푛, 푞푛) ∈ Runsℳ|stay

be a primitive stationary cycle and let 푞 = 푞0 = 푞푛.

(Construction) Let ℳ′ = (푄′, TSS(훤 ′), 훴, 푄i, 푄′f, 푇 ′) be a TSA where
• 푄′ = 푄 ∪ {푞↑, 푞↓, 푞′} for some 푞↑, 푞↓, 푞′ ∉ 푄,
• 훤 ′ = 훤 ∪ {∗} for some ∗ ∉ 훤,
• 푄′f = { 푄f ∪ {푞′}   if 푞 ∈ 푄f
        { 푄f           otherwise, and
• 푇 ′ is the smallest set such that

3 A substring of some string 푤 is called a proper substring of 푤 if it is not equal to 푤.


– 휏 ∈ 푇 ′ for each 휏 ∈ 푇 ∖ {휏푛},
– 휏′푛 = (푞푛−1, 푖푛, 푢푛, 푞′) ∈ 푇 ′,
– 휏′ = (푞′, 푖, 푢, 푝′) ∈ 푇 ′ for each 휏 = (푞, 푖, 푢, 푝′) ∈ 푇 ∖ {휏1},
– 휏^begin_cycle = (푞, ⋃훾∈훤̄ equals(훾), 휀, 푞↑) ∈ 푇 ′,
– 휏↑ = (푞↑, push(퐾, ∗), 푢1⋯푢푛, 푞↑) ∈ 푇 ′,
– 휏↑↓ = (푞↑, id, 휀, 푞↓) ∈ 푇 ′,
– 휏↓ = (푞↓, down, 휀, 푞↓) ∈ 푇 ′, and
– 휏^end_cycle = (푞↓, ⋃훾∈훤̄ equals(훾), 휀, 푞′) ∈ 푇 ′.4

(Correctness of the construction) Let 휃 ∈ Runsℳ. A run 휃′ ∈ Runsℳ′ is obtained from 휃 by the following replacement rule:

Each maximal substring of 휃 that has the form (휏1⋯휏푛)^ℓ (for some ℓ ∈ ℕ+) is replaced by 휏^begin_cycle (휏↑)^ℓ 휏↑↓ (휏↓)^ℓ 휏^end_cycle. Furthermore, the occurrence of a transition 휏 = (푞, 푖, 푢, 푝′) immediately after a maximal (휏1⋯휏푛)^ℓ is replaced by 휏′ = (푞′, 푖, 푢, 푝′).

Clearly, yield(휃) = yield(휃′) and 휃 is accepting if and only if 휃′ is accepting. Also, there are no accepting runs in ℳ′ that are not obtained by the above replacement rule since the states 푞↑, 푞↓, and 푞′ only occur in runs of the form 휏^begin_cycle (휏↑)^ℓ 휏↑↓ (휏↓)^ℓ 휏^end_cycle (for some ℓ ∈ ℕ+). Hence ℒ(ℳ) = ℒ(ℳ′).

(Proof of the proposition) The construction presented above does not introduce any new primitive stationary cycles. Also, the construction removes the primitive stationary cycle 휏1⋯휏푛 that it selects. Since there are only finitely many primitive stationary cycles for any TSA ℳ (observation 3.12), we can obtain a cycle-free TSA ℳ′ by applying the construction multiple (finitely many) times. We then obtain the proposition because the construction preserves the language of the automaton. ∎

Stack normal form

Definition 3.14 (taken from Den16a, definition 5). We say that a TSA ℳ is in stack normal form if the current stack pointer is 휀 whenever ℳ is in a final state. □

The TSA in example 3.5 is in stack normal form; the TSA in example 3.6 is not.

Proposition 3.15 (taken from Den16a, lemma 6). For every TSA there is an equivalent TSA in stack normal form.

Proof idea. Let ℳ be a TSA. We construct an equivalent TSA ℳ′ that is in stack normal form as follows: We introduce a new state 푞f that is not a state in ℳ and make it the only final state

4 We use the instruction ⋃훾∈훤̄ equals(훾) to simplify the exposition even though this instruction does not exist in the data storage TSS(훤 ′). However, due to the fact that 훤̄ is finite and proposition B.2, we could replace 휏^begin_cycle and 휏^end_cycle by transitions that are possible with data storage TSS(훤 ′) without changing the behaviour of the automaton.


in ℳ′. In addition to the transitions of ℳ, ℳ′ has transitions that ensure that there are runs beginning from any final state 푞 of ℳ that perform down-instructions until the stack pointer is 휀 and then go to state 푞f.

Proof. Let ℳ = (푄, TSS(훤 ), 훴, 푄i, 푄f, 푇 ) be a TSA and let 푞down, 푞f ∉ 푄. We construct a TSA ℳ′ = (푄 ∪ {푞down, 푞f}, TSS(훤 ), 훴, 푄i, {푞f}, 푇 ′) where

푇 ′ = 푇 ∪ {(푞, id, 휀, 푞down) ∣ 푞 ∈ 푄f} ∪ {(푞down, down, 휀, 푞down)} ∪ {(푞down, bottom, 휀, 푞f)}.

Since 푞f is only reachable from any element of 푄f (regardless of the current storage configuration), we immediately obtain ℒ(ℳ) = ℒ(ℳ′). Furthermore, ℳ′ is in stack normal form since, due to bottom, 푞f can only be reached when the stack pointer is at the root. ∎

3.2.2 Restricted tree-stack automata

Proposition 3.16. Any recursively enumerable language can be represented by a TSA.

Proof idea. For a given Turing machine, we construct a TSA with a monadic stack. The initial content of the tape is the input of the TSA. The current tape symbol of the Turing machine corresponds to the current stack symbol of the TSA. Every right move of the Turing machine becomes an up-instruction or a push-instruction, every left move becomes a down-instruction, every write becomes a write-instruction, and every read becomes an equals-instruction. ∎

It follows from the Church-Turing thesis [Chu36, section 7; Tur37, section 9; cf. also Sip12, chapter 3] that any TSA can be simulated by a TM. Thus TMs, TSAs, and TSAs with monadic stack are pairwise expressively equivalent. In particular, branching does not increase the expressive power of TSAs and is therefore superfluous in general. However, branching allows us to define a restricted version of TSAs in this section which is then proven to be equivalent to MCFGs in section 3.3.

It is apparent from the above proposition that tree-stack automata are not equivalent to multiple context-free grammars. We therefore introduce restricted tree-stack automata. Intuitively, in an 푠-restricted TSA, the stack pointer is only allowed to enter any position of the tree-stack at most 푠 times from below (i.e. from the parent position).

Let ℳ be a TSA and 휃 = 휏1…휏푘 ∈ Runsaccℳ such that 휏1, …, 휏푘 are transitions of ℳ and 푟1, …, 푟ℓ are the individual instructions (i.e. elements of 퐼 from definition 3.4) that appear in 휏1…휏푘 from left to right.5 The entrance count function of 휃 (w.r.t. ℳ), denoted by 푐ℳ,휃, is the total function from ℕ∗+ to ℕ such that

• 푐ℳ,휃(휀) = 1 and
• for each 푝 ∈ ℕ∗+ and 푖 ∈ ℕ+, 푐ℳ,휃(푝푖) is the number of indices 푗 ∈ [ℓ] such that the stack pointers of the (푗 − 1)-th and 푗-th item in the family6

((푟1 ; … ; 푟푗)({(휀, @)}) ∣ 푗 ∈ {0, …, ℓ})

5 The numbers 푘 and ℓ might be different since a transition may have multiple instructions.
6 This family is defined since 휃 is an accepting run.


of tree-stacks are 푝 and 푝푖, respectively.

The entrance count of 휃 (w.r.t. ℳ), denoted by cℳ(휃), is the number max{푐ℳ,휃(푝) ∣ 푝 ∈ ℕ∗+}. The entrance count of ℳ, denoted by c(ℳ), is the number max{cℳ(휃) ∣ 휃 ∈ Runsaccℳ}.7 For any 푠 ∈ ℕ+, we call ℳ 푠-restricted if c(ℳ) ≤ 푠. The TSAs in examples 3.5 and 3.6 are both 2-restricted.

The constructions in the proofs of propositions 3.13 and 3.15 do not increase the entrance count of the given TSAs. Hence, we obtain the following two corollaries.

Corollary 3.17 (to proposition 3.13, taken from Den16a, lemma 4). Let 푠 ∈ ℕ+. For every 푠-restricted TSA there is an equivalent 푠-restricted cycle-free TSA. ∎

Corollary 3.18 (to proposition 3.15, taken from Den16a, lemma 6). Let 푠 ∈ ℕ+. For every 푠-restricted TSA there is an equivalent 푠-restricted TSA in stack normal form. ∎
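To make the entrance count concrete, the following sketch (an illustration of mine; it assumes the sequence of stack-pointer positions has already been extracted from a run, one entry per instruction) counts, for every position, how often it is entered from its parent:

```python
from collections import Counter

def entrance_count(pointer_trace):
    """Maximum number of times any position is entered from its parent, given
    the successive stack-pointer positions of a run (tuples, starting at ())."""
    counts = Counter({(): 1})            # the root counts as entered once
    for prev, cur in zip(pointer_trace, pointer_trace[1:]):
        if cur[:-1] == prev:             # pointer moved upwards into cur
            counts[cur] += 1
    return max(counts.values())

# pointer positions of the run of example 3.5 on a²b²c²d²
trace = [(), (1,), (1, 1), (1, 1, 1), (1, 1), (1,), (),
         (1,), (1, 1), (1, 1, 1), (1, 1), (1,), ()]
assert entrance_count(trace) == 2        # ℳ is 2-restricted
```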

3.3 The equivalence of MCFGs and restricted TSAs

In this section, we will prove the following theorem:

Theorem 3.19 (taken from Den16a, theorem 18). 푠-MCFL(훴) = 푠-TSL(훴) for any set 훴 and number 푠 ∈ ℕ+.

This is achieved by showing the inclusion 푠-MCFL(훴) ⊆ 푠-TSL(훴) in section 3.3.1 and the inclusion 푠-MCFL(훴) ⊇ 푠-TSL(훴) in section 3.3.2.

3.3.1 Every MCFG has an equivalent restricted TSA

We show that for every MCFG there is an equivalent restricted TSA. The proof is conducted in three steps:
(step 1) We give a construction (construction 3.20) that provides for every PMCFG 퐺 a TSA ℳ(퐺) that is equivalent (proposition 3.33).
(step 2) We show that ℳ(퐺) is restricted whenever 퐺 is an MCFG (lemma 3.34).
(step 3) The conclusion of step 1 and step 2 is that MCFGs can be implemented by restricted TSAs (proposition 3.35).
As a nice feature of this step-wise approach, we can obtain a parser for PMCFGs (cf. section 6.2) with the help of step 1.

Step 1: Implementation of PMCFGs with TSAs

Villemonte de la Clergerie [Vil02b; Vil02a, section 4] showed how to construct for any ordered simple range concatenation grammar [Bou98a; Bou98b; Bou00] an equivalent thread automaton. Since ordered simple range concatenation grammars and thread automata are devices similar to PMCFGs and TSAs, respectively, we base our construction on Villemonte de la Clergerie's idea. However, we additionally have to deal with copying, deletion, and permutation of argument

7 Note that c(ℳ) is at least 1 since 푐ℳ,휃(휀) = 1 for any accepting run 휃.


components. The TSA that we will construct guesses for any input word a corresponding derivation in the MCFG in a top-down left-to-right manner.8

Construction 3.20 (taken from Den16a, construction 7). Let 퐺 = (푁, 훴, 푁i, 푅) be a PMCFG. The automaton with respect to 퐺 is

ℳ(퐺) = (푄, TSS(푄 ∪ 푅), 훴, {푞i}, {푞f}, 푇 )

where 푄 = {⟨휌, 푗, 휅⟩ ∣ 휌 = 퐴 → [푢1, …, 푢푠](퐵1, …, 퐵푘) ∈ 푅, 푗 ∈ [푠], 휅 ∈ {0, …, |푢푗|}} ∪ {푞i, 푞f} and 푇 is the smallest set 푇 ′ such that

• for each 휌 = 푆 → [푢](퐵1, …, 퐵푘) ∈ 푅 with 푆 ∈ 푁i, the following two transitions are in 푇 ′:

initial(휌) = (푞i, push(1, 푞f), 휀, ⟨휌, 1, 0⟩) and

final(휌) = (⟨휌, 1, |푢|⟩, equals(푞f); write(휌) ; down, 휀, 푞f),

• for each 휌 = 퐴 → [푢1, …, 푢푠](퐵1, …, 퐵푘) ∈ 푅, 푗 ∈ [푠], 휅 ∈ [|푢푗|], and 휎 ∈ 훴 for which 휎 is the 휅-th symbol in 푢푗, the following transition is in 푇 ′:

read(휌, 푗, 휅) = (⟨휌, 푗, 휅 − 1⟩, id, 휎, ⟨휌, 푗, 휅⟩), and

• for each 휌 = 퐴 → [푢1, …, 푢푠](퐵1, …, 퐵푘) ∈ 푅, 휌′ = 퐵푖′ → [푣1, …, 푣푠′](퐶1, …, 퐶푘′) ∈ 푅, 푗 ∈ [푠], 푗′ ∈ [푠′], and 휅 ∈ [|푢푗|] such that 푥푖′,푗′ is the 휅-th symbol in 푢푗, the following three transitions are in 푇 ′:

call(휌, 푗, 휅, 휌′) = (⟨휌, 푗, 휅 − 1⟩, push(푖′, ⟨휌, 푗, 휅⟩), 휀, ⟨휌′, 푗′, 0⟩),
resume(휌, 푗, 휅, 휌′) = (⟨휌, 푗, 휅 − 1⟩, up(푖′) ; equals(휌′) ; write(⟨휌, 푗, 휅⟩), 휀, ⟨휌′, 푗′, 0⟩), and
suspend(휌, 푗, 휅, 휌′) = (⟨휌′, 푗′, |푣푗′|⟩, equals(⟨휌, 푗, 휅⟩) ; write(휌′) ; down, 휀, ⟨휌, 푗, 휅⟩). □
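As a rough sketch of how construction 3.20 can be implemented (my encoding, not the thesis's): a rule is a triple of left-hand side, composition representation, and right-hand-side non-terminals; variables 푥푖,푗 are encoded as pairs; and we merely enumerate the names of the transitions, since spelling out source state, instruction, and target state then follows the construction verbatim.

```python
def transition_names(rules, initials):
    """Enumerate the transitions of ℳ(G) by name.  A rule is (lhs, comps, rhs);
    a component is a list whose items are either terminal strings or pairs
    (i, j) encoding the variable x_{i,j} (1-based)."""
    ts = []
    for rho in rules:
        lhs, comps, rhs = rho
        if lhs in initials and len(comps) == 1:
            ts += [("initial", rho), ("final", rho)]
        for j, comp in enumerate(comps, start=1):
            for kappa, sym in enumerate(comp, start=1):
                if isinstance(sym, str):             # terminal: one read transition
                    ts.append(("read", rho, j, kappa))
                else:                                 # variable x_{i',j'}
                    i2, _ = sym
                    for rho2 in rules:                # every rule for the i'-th rhs nonterminal
                        if rho2[0] == rhs[i2 - 1]:
                            ts += [(name, rho, j, kappa, rho2)
                                   for name in ("call", "resume", "suspend")]
    return ts
```

For the grammar 퐺 of example 3.24 below, this enumeration produces exactly the 54 transitions mentioned there (2 initial/final, 4 read, and 48 call/resume/suspend transitions).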

For each rule of a PMCFG, construction 3.20 creates multiple transitions. We illustrate this in the following example.

Example 3.21 (taken from Den16a, example 8). Let 휌 = 퐴 → [a푥1,2, c푥1,1](퐵) and 휌′ = 퐵 → [휀, 휀]() be rules of a PMCFG 퐺 such that neither 퐴 nor 퐵 is an initial non-terminal. Then ℳ(퐺) contains the following transitions for 휌:

read(휌, 1, 1) = (⟨휌, 1, 0⟩, id, a, ⟨휌, 1, 1⟩) and read(휌, 2, 1) = (⟨휌, 2, 0⟩, id, c, ⟨휌, 2, 1⟩)

8This strategy of guessing the rules of the derivation corresponds to Earley parsing [Kan08; Ear70].


[Figure: the rules 휌 = 퐴 → [ • a • 푥1,2 • , • c • 푥1,1 • ](퐵) and 휌′ = 퐵 → [ • , • ](), where each position • is labelled with the corresponding state, e.g. ⟨휌, 1, 0⟩, ⟨휌, 1, 1⟩, ⟨휌, 1, 2⟩, ⟨휌, 2, 1⟩, ⟨휌, 2, 2⟩, ⟨휌′, 1, 0⟩, and ⟨휌′, 2, 0⟩.]

Figure 3.22: Positions in the rules 휌 and 휌′ from example 3.21.

[Figure: the rules 휌 = 퐴 → [ • a • 푥1,2 • , • c • 푥1,1 • ](퐵) and 휌′ = 퐵 → [ • , • ](), connected by arrows between their positions.]

Figure 3.23: Transitions for the rules 휌 and 휌′ from example 3.21 are shown with solid arrows. Horizontal solid arrows signify read transitions of 휌, downward solid arrows signify call and resume transitions that involve 휌 and 휌′, and upward solid arrows signify suspend transitions that involve 휌 and 휌′. Dotted arrows signify the call/resume transitions (downward) and suspend transitions (upward) that involve 휌, but not 휌′.

and the following transitions for the combination of 휌 and 휌′:

call(휌, 1, 2, 휌′) = (⟨휌, 1, 1⟩, push(1, ⟨휌, 1, 2⟩), 휀, ⟨휌′, 2, 0⟩) call(휌, 2, 2, 휌′) = (⟨휌, 2, 1⟩, push(1, ⟨휌, 2, 2⟩), 휀, ⟨휌′, 1, 0⟩)

resume(휌, 1, 2, 휌′) = (⟨휌, 1, 1⟩, up(1) ; equals(휌′); write(⟨휌, 1, 2⟩), 휀, ⟨휌′, 2, 0⟩) resume(휌, 2, 2, 휌′) = (⟨휌, 2, 1⟩, up(1) ; equals(휌′); write(⟨휌, 2, 2⟩), 휀, ⟨휌′, 1, 0⟩)

suspend(휌, 1, 2, 휌′) = (⟨휌′, 2, 0⟩, equals(⟨휌, 1, 2⟩) ; write(휌′); down, 휀, ⟨휌, 1, 2⟩) suspend(휌, 2, 2, 휌′) = (⟨휌′, 1, 0⟩, equals(⟨휌, 2, 2⟩) ; write(휌′); down, 휀, ⟨휌, 2, 2⟩).

We can think of the states of ℳ(퐺) as positions in the rules of 퐺. The positions of 휌 and 휌′ are shown in figure 3.22. Then we can think of the transitions of ℳ(퐺) as arrows between positions in the rules of 퐺, see figure 3.23. □

Even for a small PMCFG 퐺 with, say, five rules, the number of transitions in ℳ(퐺) is large. Therefore, the next example concentrates on a run of ℳ(퐺) rather than on a listing of all transitions.

Example 3.24 (taken from Den16a, example 8). Consider the MCFG 퐺 from example 2.20. For convenience, we repeat its rules:

휌1 = 푆 → [푥1,1푥2,1푥1,2푥2,2](퐴, 퐵) 휌2 = 퐴 → [a푥1,1, c푥1,2](퐴) 휌3 = 퐴 → [휀, 휀]()

휌4 = 퐵 → [b푥1,1, d푥1,2](퐵) 휌5 = 퐵 → [휀, 휀]().


By applying construction 3.20 to 퐺, we obtain a TSA ℳ(퐺). The 54 transitions of ℳ(퐺) are listed in appendix C.1. Figure 3.25 shows the only run of ℳ(퐺) on bd. If we think of the states of ℳ(퐺) as positions in the rules of 퐺 (cf. example 3.21), then each run in ℳ(퐺) corresponds to a traversal through the corresponding derivation of 퐺. This traversal is shown in figure 3.26. □

Next, we are interested in the size of ℳ(퐺) from construction 3.20.

Proposition 3.27. Let 퐺 = (푁, 훴, 푁i, 푅) be a PMCFG. The number of transitions of ℳ(퐺) is in 풪(|푅|² ⋅ ℓ) where ℓ is the maximum length of composition representations that occur in the elements of 푅. If 퐺 is a terminal-separated9 MCFG of fanout 푠 and rank 푘, then the number of transitions of ℳ(퐺) is in 풪(|푅|² ⋅ 푠 ⋅ 푘).

Proof. By analysing the quantifications in construction 3.20, we can infer that there are
• at most |푅| transitions of the form initial(휌),
• at most |푅| transitions of the form final(휌),
• at most |푅| ⋅ ℓ transitions of the form read(휌, 푗, 휅),
• at most |푅| ⋅ ℓ ⋅ |푅| transitions of the form call(휌, 푗, 휅, 휌′),
• at most |푅| ⋅ ℓ ⋅ |푅| transitions of the form resume(휌, 푗, 휅, 휌′), and
• at most |푅| ⋅ ℓ ⋅ |푅| transitions of the form suspend(휌, 푗, 휅, 휌′).
Hence, ℳ(퐺) has at most 2 ⋅ |푅| + |푅| ⋅ ℓ + 3 ⋅ |푅|² ⋅ ℓ transitions. The number of transitions in ℳ(퐺) is therefore in 풪(2 ⋅ |푅| + ℓ ⋅ |푅| + 3 ⋅ ℓ ⋅ |푅|²) = 풪(|푅|² ⋅ ℓ).

If 퐺 is a terminal-separated MCFG, then no variable occurs more than once in a composition representation of 퐺. Therefore, if 퐺 has fanout 푠 and rank 푘, we have ℓ ≤ 푠 ⋅ 푘 and the number of transitions in ℳ(퐺) is in 풪(|푅|² ⋅ 푠 ⋅ 푘). ∎

The following observation is immediately apparent from construction 3.20:

Observation 3.28. Let 퐺 be a PMCFG. The automaton with respect to 퐺 is a cycle-free TSA in stack normal form. ∎

Lemma 3.29 (taken from Den16a, lemma 10). ℒ(퐺) ⊆ ℒ(ℳ(퐺)) for each PMCFG 퐺.

Proof idea. We will show the claim by induction on the structure of derivations of 퐺. In a derivation of 퐺 of the form 푑 = 휌(푑1, …, 푑푘), the components of any yield(푑푖) may occur copied or permuted in yield(푑) due to the composition function in 휌. We model this permutation and repetition with functions 푗푖 (for 푑푖) and 푗 (for 푑). We will construct for any derivation 푑 and any permutation with repetition 푗 of the components of yield(푑), a string of runs of ℳ(퐺) that yield exactly the 푗-permutation of yield(푑). These runs are only allowed to make an excursion upwards in the tree-stack and they have to return to the stack-pointer that they started with.

Proof. Let 퐺 = (푁, 훴, 푁i, 푅) be a PMCFG. We show the following statement by induction on the structure of derivations:

9An MCFG 퐺 is called terminal-separated if each composition representation that occurs in a rule of 퐺 (i) either contains only variables or exactly one terminal symbol and (ii) has no component that is 휀.


(푞i, {(휀, @)}, bd)
⊢initial(휌1) (⟨휌1, 1, 0⟩, {(휀, @), (1, 푞f)}, bd)
⊢call(휌1,1,1,휌3) (⟨휌3, 1, 0⟩, {(휀, @), (1, 푞f), (11, ⟨휌1, 1, 1⟩)}, bd)
⊢suspend(휌1,1,1,휌3) (⟨휌1, 1, 1⟩, {(휀, @), (1, 푞f), (11, 휌3)}, bd)
⊢call(휌1,1,2,휌4) (⟨휌4, 1, 0⟩, {(휀, @), (1, 푞f), (11, 휌3), (12, ⟨휌1, 1, 2⟩)}, bd)
⊢read(휌4,1,1) (⟨휌4, 1, 1⟩, {(휀, @), (1, 푞f), (11, 휌3), (12, ⟨휌1, 1, 2⟩)}, d)
⊢call(휌4,1,2,휌5) (⟨휌5, 1, 0⟩, {(휀, @), (1, 푞f), (11, 휌3), (12, ⟨휌1, 1, 2⟩), (121, ⟨휌4, 1, 2⟩)}, d)
⊢suspend(휌4,1,2,휌5) (⟨휌4, 1, 2⟩, {(휀, @), (1, 푞f), (11, 휌3), (12, ⟨휌1, 1, 2⟩), (121, 휌5)}, d)
⊢suspend(휌1,1,2,휌4) (⟨휌1, 1, 2⟩, {(휀, @), (1, 푞f), (11, 휌3), (12, 휌4), (121, 휌5)}, d)
⊢resume(휌1,1,3,휌3) (⟨휌3, 2, 0⟩, {(휀, @), (1, 푞f), (11, ⟨휌1, 1, 3⟩), (12, 휌4), (121, 휌5)}, d)
⊢suspend(휌1,1,3,휌3) (⟨휌1, 1, 3⟩, {(휀, @), (1, 푞f), (11, 휌3), (12, 휌4), (121, 휌5)}, d)
⊢resume(휌1,1,4,휌4) (⟨휌4, 2, 0⟩, {(휀, @), (1, 푞f), (11, 휌3), (12, ⟨휌1, 1, 4⟩), (121, 휌5)}, d)
⊢read(휌4,2,1) (⟨휌4, 2, 1⟩, {(휀, @), (1, 푞f), (11, 휌3), (12, ⟨휌1, 1, 4⟩), (121, 휌5)}, 휀)
⊢resume(휌4,2,2,휌5) (⟨휌5, 2, 0⟩, {(휀, @), (1, 푞f), (11, 휌3), (12, ⟨휌1, 1, 4⟩), (121, ⟨휌4, 2, 2⟩)}, 휀)
⊢suspend(휌4,2,2,휌5) (⟨휌4, 2, 2⟩, {(휀, @), (1, 푞f), (11, 휌3), (12, ⟨휌1, 1, 4⟩), (121, 휌5)}, 휀)
⊢suspend(휌1,1,4,휌4) (⟨휌1, 1, 4⟩, {(휀, @), (1, 푞f), (11, 휌3), (12, 휌4), (121, 휌5)}, 휀)
⊢final(휌1) (푞f, {(휀, @), (1, 휌1), (11, 휌3), (12, 휌4), (121, 휌5)}, 휀)

Figure 3.25: Run of ℳ(퐺) on bd, cf. example 3.24.

[Figure: the derivation 휌1(휌3, 휌4(휌5)) of 퐺, drawn with the positions of the rules 푆 → [ • 푥1,1 • 푥2,1 • 푥1,2 • 푥2,2 • ](퐴, 퐵), 퐴 → [ • , • ](), 퐵 → [ • b • 푥1,1 • , • d • 푥1,2 • ](퐵), and 퐵 → [ • , • ](), traversed by arrows from 푞i to 푞f.]

Figure 3.26: Representation of the run of ℳ(퐺) shown in figure 3.25 as a traversal through the corresponding derivation of 퐺. Each arrow corresponds to exactly one transition.


(Induction hypothesis) For every derivation 푑 ∈ D퐺 with 푑(휀) = 휌 = 퐴 → [푢1, …, 푢푠](퐵1, …, 퐵푘) and yield(푑) = (푤1, …, 푤푠) for some 푠 ∈ ℕ+ and 푤1, …, 푤푠 ∈ 훴∗, for each number 푚 ∈ ℕ, and for each function 푗: [푚] → [푠], there are runs 휃1, …, 휃푚 ∈ Runsℳ(퐺) such that

(i) (푤푗(1), …, 푤푗(푚)) = (yield(휃1), …, yield(휃푚)) and

(ii) there are tree-stacks [휉1, 휀], …, [휉푚, 휀] such that

(⟨휌, 푗(1), 0⟩, {(휀, @)}) ⊢휃1 (⟨휌, 푗(1), |푢푗(1)|⟩, [휉1, 휀]),
(⟨휌, 푗(2), 0⟩, [휉1, 휀]) ⊢휃2 (⟨휌, 푗(2), |푢푗(2)|⟩, [휉2, 휀]),
⋮
(⟨휌, 푗(푚), 0⟩, [휉푚−1, 휀]) ⊢휃푚 (⟨휌, 푗(푚), |푢푗(푚)|⟩, [휉푚, 휀]).

We abbreviate this property by 푃 (푑, 푗, 휃1, …, 휃푚).
(Induction base) The induction base follows from the induction step by setting 푘 = 0.

(Induction step) Let 푑 = 휌(푑1, …, 푑푘) ∈ D퐺 with 휌 = 퐴 → [푢1, …, 푢푠](퐵1, …, 퐵푘) and yield(푑) = (푤1, …, 푤푠). Let us denote the 휅-th symbol in 푢푛 ∈ (훴 ∪ X)∗ by 훿푛,휅 for each 푛 ∈ [푠] and 휅 ∈ [|푢푛|]. Furthermore, let 푚 ∈ ℕ and 푗: [푚] → [푠]. For every 푖 ∈ [푘], let 푚푖 be the number of occurrences of elements from the set 푋푖 = {푥푖,푗′ ∣ 푗′ ∈ [fanout(퐵푖)]} in the string 푢푗(1)⋯푢푗(푚) and let 푗푖: [푚푖] → [fanout(퐵푖)] such that 푥푖,푗푖(1), …, 푥푖,푗푖(푚푖) occur exactly in that order in 푢푗(1)⋯푢푗(푚). By induction hypothesis, there are, for each 푖 ∈ [푘], runs 휃푖,1, …, 휃푖,푚푖 ∈ Runsℳ(퐺) such that 푃 (푑푖, 푗푖, 휃푖,1, …, 휃푖,푚푖). Consider the set Pos = {⟨ℓ, 휅⟩ ∣ ℓ ∈ [푚], 휅 ∈ [|푢푗(ℓ)|]} and the binary relation ≼ ⊆ Pos × Pos where ⟨ℓ, 휅⟩ ≼ ⟨ℓ′, 휅′⟩ iff (ℓ < ℓ′) ∨ ((ℓ = ℓ′) ∧ (휅 < 휅′)). We define the function occ: Pos → ℕ+ as

occ(⟨ℓ, 휅⟩) = |{⟨ℓ′, 휅′⟩ ∈ Pos ∣ ∃푖, 푛, 푛′ ∈ ℕ+: (훿푗(ℓ),휅 = 푥푖,푛) ∧ (훿푗(ℓ′),휅′ = 푥푖,푛′), ⟨ℓ′, 휅′⟩ ≼ ⟨ℓ, 휅⟩}|.

Intuitively, ⟨ℓ, 휅⟩ signifies the position of the occ(⟨ℓ, 휅⟩)-th occurrence of a variable with first index 푖 (for some 푖) in 푢푗(1)⋯푢푗(푚) among all the occurrences of variables with first index 푖. Now consider the function ℎ: Pos → Runsℳ(퐺) given by

ℎ(⟨ℓ, 휅⟩) = { read(휌, 푗(ℓ), 휅)
            {     if 훿푗(ℓ),휅 ∈ 훴
            { call(휌, 푗(ℓ), 휅, 푑푖(휀)) 휃푖,1 suspend(휌, 푗(ℓ), 휅, 푑푖(휀))
            {     if 훿푗(ℓ),휅 is of the form 푥푖,푛 and occ(⟨ℓ, 휅⟩) = 1
            { resume(휌, 푗(ℓ), 휅, 푑푖(휀)) 휃푖,occ(⟨ℓ,휅⟩) suspend(휌, 푗(ℓ), 휅, 푑푖(휀))
            {     if 훿푗(ℓ),휅 is of the form 푥푖,푛 and occ(⟨ℓ, 휅⟩) ≠ 1

for each ⟨ℓ, 휅⟩ ∈ Pos. For each ℓ ∈ [푚], we construct the run 휃ℓ = ℎ(⟨ℓ, 1⟩)⋯ℎ(⟨ℓ, |푢푗(ℓ)|⟩). It is easy to see from the definition of ℎ that (푤푗(1), …, 푤푗(푚)) = (yield(휃1), …, yield(휃푚)). Furthermore, since item (ii) of the property 푃 holds for the runs 휃푖,1, …, 휃푖,푚푖 for each 푖 ∈ [푘]


and the definition of ℎ only adds push-/up- and down-instructions in a specific manner, we also know that item (ii) of property 푃 holds for the runs 휃1, …, 휃푚. Hence, 푃 (푑, 푗, 휃1, …, 휃푚) holds.

Now let 푑 ∈ D^c_퐺. Note that sort(푑) ∈ 푁i. Then by the inductive proof above there is a run 휃 ∈ Runsℳ(퐺) such that 푃 (푑, {(1, 1)}, 휃). Then the run 휃′ = initial(푑(휀)) 휃 final(푑(휀)) is accepting by construction 3.20. Hence,

ℒ(퐺) = {yield(푑) ∣ 푑 ∈ D^c_퐺} ⊆ {yield(휃′) ∣ 휃′ ∈ Runsaccℳ(퐺)} = ℒ(ℳ(퐺)). ∎

For the inclusion ℒ(ℳ(퐺)) ⊆ ℒ(퐺), we first analyse the form of the runs of ℳ(퐺).

Observation 3.30 (taken from Den16a, lemma 11). Let ℳ(퐺) = (푄, TSS(훤 ), 훴, {푞i}, {푞f}, 푇 ) for some PMCFG 퐺. Furthermore, let 푝 ∈ ℕ∗+ ∖ {휀} and 휏1, …, 휏푛 ∈ 푇 such that 휃 = 휏1⋯휏푛 ∈ Runsℳ(퐺). There is a rule 휑휃(푝) in 퐺 such that, during the run 휃, the automaton ℳ(퐺) is in a state of the form ⟨휑휃(푝), 푗, 휅⟩ ∈ 푄 whenever the stack-pointer is at position 푝.10

Justification. The rule 휑휃(푝) is selected when 푝 is first reached with “call”. Then whenever we enter 푝 with “resume”, a previous “suspend” has stored the rule 휑휃(푝) at position 푝 and “resume” enforces the claimed property. The claimed property is preserved by any “read”. And whenever we enter 푝 with “suspend”, a previous “call” or “resume” has stored an appropriate state in the stack and “suspend” merely jumps back to that state, observing the claimed property. ∎

Using construction 3.20 and observation 3.30, we can further characterise the runs of ℳ(퐺).

Lemma 3.31 (taken from Den16a, observation 12). Let ℳ(퐺) = (푄, TSS(훤 ), 훴, {푞i}, {푞f}, 푇 ) for some PMCFG 퐺, 휃 ∈ Runsℳ(퐺) be a run, and 휏 ∈ 푇 be a transition in 휃. Furthermore, let 푞, 푞′ ∈ 푄, let [휉, 푝], [휉′, 푝푖] ∈ TS(훤 ∪ {@}) with 푖 ∈ ℕ+, and let 휌 = 휑휃(푝푖) be of the form 퐴 → [푢1, …, 푢푠](퐵1, …, 퐵푘).
(i) If (푞, [휉, 푝]) ⊢휏 (푞′, [휉′, 푝푖]) and 푝 = 휀, then
(i.a) 푞 = 푞i, [휉, 푝] = {(휀, @)}, [휉′, 푝푖] = {(휀, @), (1, 푞f)},
(i.b) 푞′ = ⟨휌, 1, 0⟩, and 휏 = initial(휌).
(ii) If (푞′, [휉′, 푝푖]) ⊢휏 (푞, [휉, 푝]) and 푝 = 휀, then
(ii.a) 푖 = 1, 푞 = 휉′(푝푖) = 푞f,
(ii.b) 푞′ = ⟨휌, 1, |푢1|⟩, 휉(푝푖) = 휌, and 휏 = final(휌).
(iii) If (푞, [휉, 푝]) ⊢휏 (푞′, [휉′, 푝푖]), 푝 ≠ 휀, and 푝푖 ∉ dom(휉), then
(iii.a) 푞 ∉ {푞i, 푞f},
(iii.b) 푞′ = ⟨휌, 푗, 0⟩ for some 푗 ∈ [푠], and
(iii.c) 휏 = call(휌′, 푗′, 휅′, 휌) for some 휌′ ∈ 푅 and 푗′, 휅′ ∈ ℕ+.
(iv) If (푞, [휉, 푝]) ⊢휏 (푞′, [휉′, 푝푖]), 푝 ≠ 휀, and 푝푖 ∈ dom(휉), then
(iv.a) 푞 ∉ {푞i, 푞f},

10 We may think of 휑휃 as a partial function from ℕ∗+ ∖ {휀} to the rules of 퐺.


(iv.b) 푞′ = ⟨휌, 푗, 0⟩ for some 푗 ∈ [푠], and
(iv.c) 휏 = resume(휌′, 푗′, 휅′, 휌) for some 휌′ ∈ 푅 and 푗′, 휅′ ∈ ℕ+.
(v) If (푞′, [휉′, 푝푖]) ⊢휏 (푞, [휉, 푝]) and 푝 ≠ 휀, then

(v.a) 푞 ∉ {푞i, 푞f},
(v.b) 푞′ = ⟨휌, 푗, |푢푗|⟩ for some 푗 ∈ [푠], and
(v.c) 휏 = suspend(휌′, 푗′, 휅′, 휌) for some 휌′ ∈ 푅 and 푗′, 휅′ ∈ ℕ+.

Proof. Note that 휌 = 휑휃(푝푖) exists due to observation 3.30.
Item (i): Only transitions of the form initial(휌′), for some rule 휌′, are applicable when 푝 = 휀. This immediately implies item (i.a). Item (i.b) then follows from 휑휃(푝푖) = 휌.
Item (ii): Only transitions of the form final(휌′), for some rule 휌′, can change the stack-pointer to 휀. Together with item (i.a), this implies item (ii.a). Item (ii.b) then follows from 휑휃(푝푖) = 휌.
Items (iii.a), (iv.a) and (v.a): These hold because there are no transitions in ℳ(퐺) that involve 푞i or 푞f, but not the stack pointer 휀.
Items (iii.b), (iv.b) and (v.b): These hold because 휑휃(푝푖) = 휌 and, whenever the stack-pointer is moved upwards (resp. downwards), the target state (resp. source state) of the corresponding transition has 0 as its third component.
Items (iii.c), (iv.c) and (v.c): Follows immediately from items (iii.b), (iv.b) and (v.b). ∎

Lemma 3.32 (taken from Den16a, lemma 13). ℒ(퐺) ⊇ ℒ(ℳ(퐺)) for each PMCFG 퐺 that only has productive non-terminals.11

Proof. Let 퐺 = (푁, 훴, 푁i, 푅), ℳ(퐺) = (푄, TSS(훤 ), 훴, {푞i}, {푞f}, 푇 ), and 휃 = 휏1⋯휏푛 ∈ Runsaccℳ(퐺) be a run with 휏1, …, 휏푛 ∈ 푇. Then there are states 푞1, …, 푞푛−1 ∈ 푄, strings 푤1, …, 푤푛 ∈ 훴∗, and tree-stacks [휉1, 푝1], …, [휉푛−1, 푝푛−1], [휉푛, 휀] ∈ TS(훤 ∪ {@}) such that

(푞i, {(휀, @)}, 푤1…푤푛) ⊢휏1 (푞1, [휉1, 1푝1], 푤2…푤푛) ⊢휏2 … ⊢휏푛−1 (푞푛−1, [휉푛−1, 1푝푛−1], 푤푛) ⊢휏푛 (푞f, [휉푛, 휀], 휀).

We define the function 휑′휃: ℕ∗+ → 푅 as 휑′휃(푝) = 휑휃(1푝) for every 푝 ∈ ℕ∗+ with 1푝 ∈ dom(휑휃) (cf. observation 3.30). Consider the following property, called (†):

For every 푑 ∈ D퐺 with 푑 ⊇ 휑′휃,12 푝 ∈ dom(휑′휃), and every maximal interval [푎, 푏] for which 푝푎, …, 푝푏 have prefix 푝, we have 푤푎⋯푤푏 = yield(푑|푝)푗 where 푞푎 = ⟨휑′휃(푝), 푗, 0⟩ for some 푗 ∈ ℕ+ and yield(푑|푝)푗 stands for the 푗-th component of the tuple yield(푑|푝).

If we set 푝 = 휀 in (†), we obtain 푤2⋯푤푛−1 = yield(푑). By lemma 3.31, we know that 휏1 = initial(푑(휀)) due to item (i.b) and that 휏푛 = final(푑(휀)) due to item (ii.b). Thus 푤1 = 휀 = 푤푛 and therefore 푤1⋯푤푛 = yield(푑).

11 A non-terminal of 퐺 is called productive if there is a rule with this non-terminal that occurs in a derivation of 퐺.
12 A derivation is a sorted tree over rules, which can be identified with a partial function from ℕ∗+ to 푅 (i.e. a binary relation that is functional w.r.t. ℕ∗+).


Proof of (†): We prove (†) by induction on the structure of 푑. Note that 0 < 푎 ≤ 푏 < 푛 because in the initial and final configuration of the run 휃, the stack-pointer is at position 휀, which does not start with 1 as in the definition of 휑′휃. Let 휑′휃(푝) be the rule 퐴 → [푢1, …, 푢푠](퐵1, …, 퐵푘). Consider the run

(푞푎−1, [휉푎−1, 1푝푎−1], 휀) ⊢휏푎 (푞푎, [휉푎, 1푝푎], 푤푎) ⊢휏푎+1 … ⊢휏푏 (푞푏, [휉푏, 1푝푏], 푤푎⋯푤푏).

Since [푎, 푏] is maximal and a transition can add at most one symbol to the stack-pointer, we know that 푝푎 = 푝 = 푝푏. By lemma 3.31, items (iii.b), (iv.b) and (v.b), we also know that 푞푎 = ⟨휑′휃(푝), 푗, 0⟩ and 푞푏 = ⟨휑′휃(푝), 푗, |푢푗|⟩ for some 푗 ∈ [푠]. We define strings 푣1, …, 푣|푢푗| as follows:

푣휅 = { 휎                 if the 휅-th symbol of the string 푢푗 is some 휎 ∈ 훴
     { yield(푑|푝푖)푗′     if the 휅-th symbol of the string 푢푗 is some 푥푖,푗′ ∈ X

for each 휅 ∈ [|푢푗|].

From the definition of 푣1, …, 푣|푢푗| and from the induction hypothesis it follows that 푣1⋯푣|푢푗| = 푤푎⋯푤푏 = yield(푑|푝)푗. ∎

Proposition 3.33 (taken from Den16a, proposition 14). ℒ(퐺) = ℒ(ℳ(퐺)) if 퐺 only has productive non-terminals.

Proof. The claim follows directly from lemmas 3.29 and 3.32. ∎

Step 2: Linearity of 퐺 implies restrictedness of ℳ(퐺)

Lemma 3.34 (taken from Den16a, observation 9). Let 퐺 be an 푠-MCFG. Then ℳ(퐺) is 푠-restricted.

Proof. Let 퐺 = (푁, 훴, 푁i, 푅). Consider some arbitrary position 푝 ∈ ℕ∗+ and number 푖′ ∈ ℕ+. Position 푝푖′ can only be reached from below if the current stack pointer is at position 푝 and if we either execute the transition call(휌, 푗, 휅, 휌′) or the transition resume(휌, 푗, 휅, 휌′) for some 휌 = 퐴 → [푢1, …, 푢푠](퐵1, …, 퐵ℓ) ∈ 푅, 휌′ = 퐵푖′ → [푣1, …, 푣푠′](퐶1, …, 퐶ℓ′) ∈ 푅, 푗 ∈ [푠], 푗′ ∈ [푠′], and 휅 ∈ [|푢푗|] where the 휅-th symbol of 푢푗 is 푥푖′,푗′. For those transitions to be applicable, the automaton has to be in state ⟨휌, 푗, 휅 − 1⟩. Therefore, there are exactly as many states from which we can reach position 푝푖′ as there are occurrences of elements of {푥푖′,1, …, 푥푖′,푠′} in the string 푢1⋯푢푠. Since [푢1, …, 푢푠] is linear, the number of such occurrences is smaller or equal to 푠′ and (since 퐺 is an 푠-MCFG) also smaller or equal to 푠. It is easy to see that in the part of the run where the stack pointer is never below 푝, the states ⟨휌, 푗, 0⟩, …, ⟨휌, 푗, |푢푗|⟩ occur in that order whenever the stack pointer is at 푝 and, in particular, none of those states occur twice. Therefore, we have that 푐ℳ(퐺),휃(푝푖′) ≤ 푠 for every run 휃 of ℳ(퐺) and, since 푝 and 푖′ were chosen arbitrarily, we have that for any non-empty position 푝′ ≠ 휀 and every run 휃 of ℳ(퐺) the inequation 푐ℳ(퐺),휃(푝′) ≤ 푠 holds. Since the position 휀 can never be entered from below, we know that ℳ(퐺) is 푠-restricted. ∎

81 3 An automaton characterisation for weighted MCFLs

Step 3: The conclusion

Proposition 3.35. 푠-MCFL(훴) ⊆ 푠-TSA(훴) for any set 훴 and number 푠 ∈ ℕ+. Proof. It is easy to observe that for each 푠-MCFG(훴), there is an equivalent 푠-MCFG(훴) with only productive non-terminals. Then claim follows directly from proposition 3.33 and lemma 3.34. ∎

3.3.2 Every restricted TSA has an equivalent MCFG We want to show that for any cycle-free 푠-restricted TSA ℳ in instruction normal form and stack normal form, there is an equivalent 푠-MCFG 퐺(ℳ). For this, we first construct an 푠- ′ acc ∗ MCFG 퐺 (ℳ) whose language is the set of accepting runs of ℳ. Clearly, yield: Runsℳ → 훴 is a homomorphism. Hence, we use yield and the closure of MCFGs under homomorphisms to obtain 퐺(ℳ).

Intuition and an example The non-terminals of 퐺′(ℳ) will have the form

′ ′ ⟨푞1, 푞1, …, 푞푚, 푞푚; 훾0, …, 훾푚⟩

′ ′ where 푞1, 푞1…, 푞푚, 푞푚 are states and 훾0, …, 훾푚 are stack symbols of ℳ. Our construction for 퐺′(ℳ) will guarantee the following property:

′ ′ ′ For each number 푚 ∈ [푠], non-terminal ⟨푞1, 푞1, …, 푞푚, 푞푚; 훾0, …, 훾푚⟩ of 퐺 (ℳ), and runs 휃1, …, 휃푚 of ℳ, t.f.a.e. ′ ′ ′ (i) (휃1, …, 휃푚) ∈ ℒ(퐺 (ℳ), ⟨푞1, 푞1, …, 푞푚, 푞푚; 훾0, …, 훾푚⟩).

(ii) 휃1, …, 휃푚 each make only upward excursions (i.e. they each return the stack pointer to the position it started from and let it never go below that position), ′ and for each 푖 ∈ [푚]: 푞푖 is the source state of 휃푖, 푞푖 is the target state of 휃푖, 휃푖 is applicable if the current stack symbol is 훾푖−1, and after the execution of 휃푖, the current stack symbol is 훾푖. Let us start with an example for illustration.

Example 3.38 (based on Den16a, example 15). Let ℳ = (푄, TSS(훤 ), 훴, {1}, {5}, 푇 ) be a ′ ′ ″ ′ ″ ′ TSA where 푄 = [5] ∪ {1 , 2 , 2 , 3 , 3 , 4 }, 푇 = {휏1, …, 휏14}, and the transitions 휏1, …, 휏14 are shown in figure 3.36 (left). An example for an accepting runof ℳ is shown in figure 3.36 (right). Note that ℳ is cycle-free, in stack normal form, and in instruction normal form. Let us consider position 휀 of the stack. The only transitions applicable there are 휏1, 휏2, 휏6, 휏7, 휏9, and 휏14 due to their instructions. Because of the states, all valid runs must start with 휏1 or 휏2 and end with 휏14. Furthermore, 휏7 is always preceded by 휏6; 휏6 is preceded by 휏3 or 휏5; and 휏14 is preceded by 휏11 or 휏13. Thus, each valid run is of one of the forms

′ ′ ′ 휃 = 휏1 휃1 휏5휏6휏7 휃2 휏13휏14 or 휃 = 휏2 휃1 휏3휏6휏7 휃2 휏11휏14

82 3.3 The equivalence of MCFGs and restricted TSAs

(1, {(휀, @)}, abcd) ⊢ (1, {(휀, @), (1, ∗)}, bcd) 휏1 ′ ⊢휏 (1 , {(휀, @), (1, ∗), (11, #)}, bcd) 휏1 = (1, push(1, ∗), a, 1 ) 2 ′ ⊢ (2, {(휀, @), (1, ∗), (11, #)}, bcd) 휏2 = (1, push(1, #), 휀, 1 ) 휏3 휏 = (1′, , 휀, 2 ) ⊢ (2′, {(휀, @), (1, ∗), (11, #)}, cd) 3 down 휏4 휏 = (2, equals(∗), b, 2′ ) ⊢ (2, {(휀, @), (1, ∗), (11, #)}, cd) 4 휏5 휏 = (2′, down, 휀, 2 ) ″ 5 ⊢휏 (2 , {(휀, @), (1, ∗), (11, #)}, cd) ″ 6 휏6 = (2, bottom, 휀, 2 ) ⊢휏 (3, {(휀, @), (1, ∗), (11, #)}, cd) ″ 7 휏7 = (2 , up(1), 휀, 3 ) ′ ⊢휏 (3 , {(휀, @), (1, ∗), (11, #)}, d) 휏 = (3, equals(∗), c, 3′ ) 8 8 ⊢ (3, {(휀, @), (1, ∗), (11, #)}, d) ′ 휏9 휏9 = (3 , up(1), 휀, 3 ) ⊢ (3″, {(휀, @), (1, ∗), (11, #)}, d) ″ 휏10 휏10 = (3, equals(#), 휀, 3 ) ″ ⊢휏 (4, {(휀, @), (1, ∗), (11, #)}, d) 휏11 = (3 , down, 휀, 4 ) 11 ′ ⊢ (4′, {(휀, @), (1, ∗), (11, #)}, 휀) 휏12 = (4, equals(∗), d, 4 ) 휏12 ′ ⊢ (4, {(휀, @), (1, ∗), (11, #)}, 휀) 휏13 = (4 , down, 휀, 4 ) 휏13 휏 = (4, bottom, 휀, 5 ) ⊢ (5, {(휀, @), (1, ∗), (11, #)}, 휀). 14 휏14

Figure 3.36: Set of transitions (left) and an accepting run (right) of ℳ, cf. example 3.38

′ ′ for some runs 휃1, 휃2, 휃1, and 휃2. ′ The target state of 휏1 is 1 and the source state of 휏5 is 2 . Also, 휏1 pushes a ∗ to position 1 ′ which is never changed since ℳ has no write-instructions. Thus, 휃1 must go from state 1 to 2 ′ and from stack symbol ∗ to ∗ at position 1. Similarly, we obtain that 휃2 goes from state 3 to 4 ′ ′ ′ and from stack symbol ∗ to ∗ at position 1; 휃1 goes from state 1 to 1 and from stack symbol # ′ ″ to #; and 휃2 goes from state 3 to 3 and from stack symbol # to #. We call the runs 휃1 and 휃2 linked since they both are executed while the stack pointer is in the same subtree (i.e. the first one). Clearly, linked runs need to be produced by the same non-terminal. A non-terminal will describe the state behaviour and the storage behaviour that is expected from the corresponding ′ ′ linked runs. Hence, the pair (휃1, 휃2) of linked runs gets the non-terminal ⟨1, 2 , 3, 4 ; ∗, ∗, ∗⟩. ′ ′ ′ ′ ″ ′ For (휃1, 휃2), we have the non-terminal ⟨1 , 1 , 3, 3 ; #, #, #⟩. Since 휃 and 휃 both go from state 1 to 5 and from stack symbol @ to @, we construct the following two rules

′ ′ ⟨1, 5; @, @⟩ → [휏1 푥1,1 휏5휏6휏7 푥1,2 휏13휏14](⟨1, 2 , 3, 4 ; ∗, ∗, ∗⟩) and ′ ′ ″ ⟨1, 5; @, @⟩ → [휏2 푥1,1 휏3휏6휏7 푥1,2 휏11휏14](⟨1 , 1 , 3, 3 ; #, #, #⟩).

Next, we explore the non-terminal ⟨1, 2′, 3, 4′; ∗, ∗, ∗⟩, i.e. we search for a run that goes from state 1 to 2′ and from stack symbol ∗ to ∗ and another run that goes from state 3 to 4′ and from stack symbol ∗ to ∗. There are two kinds of such pairs of runs: (휏1휃1휏5휏4, 휏8휏9휃2휏13휏12) ′ ′ ′ ′ ′ ′ and (휏2휃1휏3휏4, 휏8휏9휃2휏11휏12) for some runs 휃1, 휃2, 휃1, and 휃2. The runs 휃1, 휃2, 휃1, and 휃2 of this

83 3 An automaton characterisation for weighted MCFLs

휏10 111 (1′, #) (3, #) (3″, #)

휏2 휏3 휏9 휏11 휏4 휏8 휏12 11 (1, ∗) (2, ∗) (2′, ∗) (3, ∗) (3′, ∗) (4, ∗) (4′, ∗)

휏1 휏5 휏9 휏13 휏4 휏8 휏12 1 (1, ∗) (2, ∗) (2′, ∗) (3, ∗) (3′, ∗) (4, ∗) (4′, ∗)

휏1 휏5 휏7 휏13 휏6 휏14 휀 (1, @) (2, @) (2″, @) (4, @) (5, @)

′ ′ ⟨1, 5; @, @⟩ → [휏1 푥1,1 휏5휏6휏7 푥1,2 휏13휏14](⟨1, 2 , 3, 4 ; ∗, ∗, ∗⟩)

′ ′ ′ ′ ⟨1, 2 , 3, 4 ; ∗, ∗, ∗⟩ → [휏1 푥1,1 휏5휏4, 휏8휏9 푥1,2 휏13휏12](⟨1, 2 , 3, 4 ; ∗, ∗, ∗⟩)

′ ′ ′ ′ ″ ⟨1, 2 , 3, 4 ; ∗, ∗, ∗⟩ → [휏2 푥1,1 휏3휏4, 휏8휏9 푥1,2 휏11휏12](⟨1 , 1 , 3, 3 ; #, #, #⟩)

′ ′ ″ ⟨1 , 1 , 3, 3 ; #, #, #⟩ → [휀, 휏10]()

푆 → [a 푥1,1 푥1,2](퐴)

퐴 → [a 푥1,1 b, c 푥1,2 d](퐴)

퐴 → [푥1,1 b, c 푥1,2 d](퐵)

퐵 → [휀, 휀]()

Figure 3.37: Run of ℳ (top), and the corresponding derivations of 퐺′(ℳ) (middle) and 퐺(ℳ) (bottom, with renamed non-terminals), cf. example 3.38. Matching colors mark the correspondence between (pairs of) runs and rules.

84 3.3 The equivalence of MCFGs and restricted TSAs paragraph exhibit the same state behaviour and storage behaviour as those in the previous paragraph and hence, we have the rules

′ ′ ′ ′ ⟨1, 2 , 3, 4 ; ∗, ∗, ∗⟩ → [휏1 푥1,1 휏5휏4, 휏8휏9 푥1,2 휏13휏12](⟨1, 2 , 3, 4 ; ∗, ∗, ∗⟩) and ′ ′ ′ ′ ″ ⟨1, 2 , 3, 4 ; ∗, ∗, ∗⟩ → [휏2 푥1,1 휏3휏4, 휏8휏9 푥1,2 휏11휏12](⟨1 , 1 , 3, 3 ; #, #, #⟩).

′ ′ ″ For the non-terminal ⟨1 , 1 , 3, 3 ; #, #, #⟩, we can only take the pair (휀, 휏10) of runs and therefore have the rule

′ ′ ″ ⟨1 , 1 , 3, 3 ; #, #, #⟩ → [휀, 휏10]() in 퐺′(ℳ). Let us abbreviate the non-terminals ⟨1, 5; @, @⟩, ⟨1, 2′, 3, 4′; ∗, ∗, ∗⟩, and ⟨1′, 1′, 3, 3″; #, #, #⟩ by 푆, 퐴, and 퐵, respectively. If we replace each transition in the rules of 퐺′(ℳ) by its compo- nent of type 훴∗, then we obtain the MCFG 퐺(ℳ) = ({푆, 퐴, 퐵}, 훴, 푆, 푅) where 푅 contains exactly the following five rules:

푆 → [a 푥1,1 푥1,2](퐴) 퐴 → [a 푥1,1 b, c 푥1,2 d](퐴) 퐵 → [휀, 휀]()

푆 → [푥1,1 푥1,2](퐵) 퐴 → [푥1,1 b, c 푥1,2 d](퐵)

′ Figure 3.37 shows a run of ℳ and the corresponding derivations of 퐺 (ℳ) and 퐺(ℳ). □

The formal construction Let us formalise the analysis of the transitions of a TSA ℳ from example 3.38. This is done in five steps: (step 1) To find out which transitions can be combined, we first group transitions bytheir source and target states and by their instructions. (step 2) We combine the transitions from step 1 to runs that move the stack-pointer either not at all, or one node down, or one node up, or one node down and then one node up. (step 3) We combine the runs from step 2 to tuples of linked runs (linked as described in the above example). (step 4) We form tuples of tuples of linked runs from step 4 in preparation for the construction of rules of an MCFG. (step 5) Finally, we use the tuples of tuples of linked runs from step 5 to define rules of an MCFG.

For the remainder of this section, let ℳ = (푄, TSS(훤 ), 훴, 푄i, 푄f, 푇 ) be an 푠- restricted cycle-free TSA.

Step 1: grouping transitions. We define the following four subsets of 푇: • For any 푞, 푞′ ∈ 푄 and 훾, 훾′ ∈ 훤 ∪ {@}, the set 푇 [푞 → 푞′; 훾 → 훾′] ⊆ 푇 shall contain all the transitions of 푇 that do not move the stack-pointer, have source state 푞, target state

85 3 An automaton characterisation for weighted MCFLs

푞′, and change the stack symbol at the current position from 훾 to 훾′:

푇 [푞 → 푞′; 훾 → 훾′] = {휏 ∈ 푇 ∣ 훾 = 훾′ = @, ∃푢 ∈ 훴∗: 휏 = (푞, bottom, 푢, 푞′)} ∪ {휏 ∈ 푇 ∣ 훾 = 훾′ ≠ @, ∃푢 ∈ 훴∗: 휏 = (푞, equals(훾), 푢, 푞′)} ∪ {휏 ∈ 푇 ∣ 훾 ≠ @, ∃푢 ∈ 훴∗: 휏 = (푞, write(훾′), 푢, 푞′)}.

′ ′ • For any 푞, 푞 ∈ 푄, 푗 ∈ ℕ, and 훽 ∈ 훤, the set 푇 [푞 → 푞 ; ↗푗 훽] ⊆ 푇 shall contain all the transitions of 푇 that have source state 푞, target state 푞′, and push(푗, 훽) as their instruction:

′ ∗ ′ 푇 [푞 → 푞 ; ↗푗 훽] = {휏 ∈ 푇 ∣ ∃푢 ∈ 훴 : 휏 = (푞, push(푗, 훽), 푢, 푞 )}.

′ ′ • For any 푞, 푞 ∈ 푄 and 푗 ∈ ℕ, the set 푇 [푞 → 푞 ; ↗푗] ⊆ 푇 shall contain all the transitions of 푇 that have source state 푞, target state 푞′, and up(푗) as their instruction:

′ ∗ ′ 푇 [푞 → 푞 ; ↗푗] = {휏 ∈ 푇 ∣ ∃푢 ∈ 훴 : 휏 = (푞, up(푗), 푢, 푞 )}.

• For any 푞, 푞′ ∈ 푄, the set 푇 [푞 → 푞′; ↘] ⊆ 푇 shall contain all the transitions of 푇 that have source state 푞, target state 푞′, and down as their instruction:

푇 [푞 → 푞′; ↘] = {휏 ∈ 푇 ∣ ∃푢 ∈ 훴∗: 휏 = (푞, down, 푢, 푞′)}.

Step 2: building runs. We define the following four kinds of subsets of Runsℳ:

′ ′ • For any states 푞, 푞 ∈ 푄 and stack-symbols 훾, 훾 ∈ 훤 ∪ {@}, the set Runsℳ[푞 → ′ ′ 푞 ; 훾 → 훾 ] ⊆ Runsℳ shall contain all runs of ℳ that do not move the stack-pointer, go from state 푞 to state 푞′, and change the stack symbol at the current position from 훾 to 훾′. It is recursively defined as follows:

′ ′ Runsℳ[푞 → 푞 ; 훾 → 훾 ] {휀} if 푞 = 푞′ and 훾 = 훾′ = { } ∅ otherwise ′ ′ ∪ ⋃ Runsℳ[푞 → ̄푞; 훾 → ̄훾] ∘ 푇 [ ̄푞→ 푞 ; ̄훾→ 훾 ]. ̄푞∈푄, ̄훾∈훤∪{@}

′ ′ • For any states 푞, 푞 ∈ 푄, stack symbols 훾, 훾 ∈ 훤 ∪ {@}, 훽 ∈ 훤, and number 푗 ∈ ℕ+, ′ ′ the set Runsℳ[푞 → 푞 ; 훾 → 훾 , ↗푗 훽] shall contain all runs of ℳ that go from state 푞 to 푞′ overall, and go from stack symbol 훾 to 훾′ without moving the stack-pointer and then move the stack-pointer to the 푗-th child which is labelled with stack symbol 훽. It is defined as

′ ′ Runsℳ[푞 → 푞 ; 훾 → 훾 , ↗푗 훽] ′ ′ ′ = ⋃ Runsℳ[푞 → ̄푞; 훾 → 훾 ] ∘ (푇 [ ̄푞→ 푞 ; ↗푗 훽] ∪ 푇 [ ̄푞→ 푞 ; ↗푗]). ̄푞∈푄

86 3.3 The equivalence of MCFGs and restricted TSAs

• down

훾 훾′ 훾 훾′ 푞 푞′ 푞 푞′

′ ′ ′ ′ Runsℳ[푞 → 푞 ; 훾 → 훾 ] Runsℳ[푞 → 푞 ; ↘, 훾 → 훾 ]

훽 훽 (푗, 훽) (푗) up push 훾 훾′ 훾 훾′ 푞 푞′ 푞 푞′

′ ′ Runsℳ[푞 → 푞 ; 훾 → 훾 , ↗푗 훽]

• 훽 • 훽 down (푗, 훽) down (푗) up push 훾 훾′ 훾 훾′ 푞 푞′ 푞 푞′

′ ′ Runsℳ[푞 → 푞 ; ↘, 훾 → 훾 , ↗푗 훽]

Figure 3.39: The four groups of runs from step 2.

′ ′ ′ • For any states 푞, 푞 ∈ 푄 and stack symbols 훾, 훾 ∈ 훤 ∪ {@}, the set Runsℳ[푞 → 푞 ; ↘ , 훾 → 훾′] shall contain all runs of ℳ that go from state 푞 to 푞′ overall, and move the stack-pointer to the parent and then go from stack symbol 훾 to 훾′ without moving the stack-pointer. It is defined as

′ ′ Runsℳ[푞 → 푞 ; ↘, 훾 → 훾 ] ′ ′ = ⋃ 푇 [푞 → ̄푞; ↘] ∘ Runsℳ[ ̄푞→ 푞 ; 훾 → 훾 ]. ̄푞∈푄

′ ′ • For any states 푞, 푞 ∈ 푄, stack symbols 훾, 훾 ∈ 훤 ∪ {@}, 훽 ∈ 훤, and number 푗 ∈ ℕ+, the ′ ′ set Runsℳ[푞 → 푞 ; ↘, 훾 → 훾 , ↗푗 훽] shall contain all runs of ℳ that go from state 푞 to 푞′ overall, and move the stack-pointer to the parent, then go from stack symbol 훾 to 훾′ without moving the stack-pointer, and finally move the stack-pointer to the 푗-th child which is labelled with stack symbol 훽. It is defined as

′ ′ Runsℳ[푞 → 푞 ; ↘, 훾 → 훾 , ↗푗 훽] ′ ′ = ⋃ 푇 [푞 → ̄푞; ↘] ∘ Runsℳ[ ̄푞→ 푞 ; 훾 → 훾 , ↗푗 훽]. ̄푞∈푄

87 3 An automaton characterisation for weighted MCFLs

The notation of the above sets has the form Runsℳ[푋; 푌 ] where 푋 signifies the overall state behaviour and 푌 represents the instructions on the tree-stack. A visual representation of the sets is shown in figure 3.39. Note that each of the sets is finite since ℳ is stationary cycle-free.

Step 3: pairing linked runs. We build tuples of runs from the four kinds of subsets of Runsℳ defined in the previous paragraph. This is achieved by matching the stack behaviour of neighbouring runs. Let 푡 = (휃1, …, 휃ℓ) be a tuple of runs of ℳ, i.e. 휃1, …, 휃ℓ ∈ Runsℳ. We call 푡 admissible if one of the following requirements is satisfied: ′ ′ (R1) ℓ = 1 and 휃1 ∈ Runsℳ[푞1 → 푞1; 훾0 → 훾1] for some 푞1, 푞1 ∈ 푄 and 훾0, 훾1 ∈ 훤 ∪ {@} or ′ ′ (R2) ℓ ≥ 2 and there are 푞1, 푞1, …, 푞ℓ, 푞ℓ ∈ 푄, 훾0, 훾1, …, 훾ℓ ∈ 훤 ∪ {@}, 푗1, …, 푗ℓ ∈ ℕ+, and 훽1, …, 훽ℓ−1 ∈ 훤 such that • 휃 ∈ Runs [푞 → 푞′ ; 훾 → 훾 , ↗ 훽 ], 1 ℳ 1 1 0 1 푗1 1 • for each 푖 ∈ {2, 3, …, ℓ − 1}: 휃 ∈ Runs [푞 → 푞′; ↘, 훾 → 훾 , ↗ 훽 ], and 푖 ℳ 푖 푖 푖−1 푖 푗푖 푖 ′ • 휃ℓ ∈ Runsℳ[푞ℓ → 푞ℓ; ↘, 훾ℓ−1 → 훾ℓ]. adm adm The set of admissible tuples of runs of ℳ is denoted by Runsℳ . Note that Runsℳ is finite since ℳ is restricted and cycle-free and the sets of runs defined in the previous paragraph are each finite. ′ Note that 휃푖휃푖+1 (푖 ∈ [ℓ − 1]) may not be a run in ℳ because 푞푖 and 푞푖+1 may be different. However, there may be a run 휃 in ℳ such that 휃푖휃휃푖+1 is a run in ℳ. Then 휃 would have to go ′ from state 푞푖 to 푞푖+1, and start with stack symbol 훽푖 and with the stack-pointer at the 푗푖-th child ′ position of where 휃푖 starts. We therefore say that there is a ⟨푞푖 → 푞푖+1; 푗푖, 훽푖⟩-gap between 휃푖 ′ and 휃푖+1, or alternatively, that the 푖-th gap in 푡 has kind ⟨푞푖 → 푞푖+1; 푗푖, 훽푖⟩. Furthermore, we ′ say that 푡 has type ⟨푞1 → 푞ℓ; 훾0 → 훾ℓ⟩. We intend to fill the gaps later by tuples of runs (of appropriate types) that are represented by variables. Hence, for any 푦2, 푦3, …, 푦ℓ ∈ X, we write 푡[푦2, 푦3, …, 푦ℓ] instead of 휃1푦2휃2푦3휃3…푦ℓ휃ℓ.

Step 4: pairing pairs of linked runs. The next step is to look at tuples of elements of adm adm Runsℳ . Let 푚 ∈ [푠] and 푇 = (푡1, …, 푡푚) be a tuple where 푡1, …, 푡푚 ∈ Runsℳ and let ℓ1, …, ℓ푚 be the numbers of gaps in 푡1, …, 푡푚, respectively. For each 푗 ∈ [푚] and 휅 ∈ [ℓ푗], let ′ ⟨푞(푗,휅) → 푞(푗,휅); 푖(푗,휅), 훽(푗,휅)⟩ be the kind of the 휅-th gap in 푡푗. Let 휑푇, 휓푇:ℕ+ × ℕ+ ‧‧➡ ℕ+ and 휋푇:ℕ+ × ℕ+ ‧‧➡ ℕ+ × ℕ+ be partial functions such that for every 푗 ∈ [푚] and 휅 ∈ [ℓ푗], the number 푖 is the 휑 (푗, 휅)-th distinct number occurring in 퐼 = 푖 ⋯푖 ⋯푖 ⋯푖 (푗,휅) 푇 (1,1) (1,|푡1|) (푚,1) (푚,|푡푚|) when read left-to-right, 푖(푗,휅) occurs for the 휓푇(푗, 휅)-th time at the element with index (푗, 휅) in 퐼, and 휋푇(푗, 휅) = (휑푇(푗, 휅), 휓푇(푗, 휅)). Moreover let 푘 be the count of distinct numbers in 퐼. We call 푇 admissible if

• the 휅-th run in 푡푗 ends with a push-instruction whenever 휑푇(푗, 휅) = 1 and ′ ′ • there are 푞1, 푞1, …, 푞푚, 푞푚 ∈ 푄 and 훾0, 훾1, …, 훾푚 ∈ 훤 ∪ {@} such that for every 푗 ∈ [푚], ′ we have that 푡푗 is of type ⟨푞푗 → 푞푗; 훾푗−1 → 훾푗⟩. adm adm ⋆ The set of admissible tuples of elements of Runsℳ is denoted by (Runsℳ ) . Note that adm ⋆ adm (Runsℳ ) is finite since Runsℳ is finite and 푚 ≤ 푠.

88 3.3 The equivalence of MCFGs and restricted TSAs

We then say that 푇 has type (퐴; 퐵1, …, 퐵푘), denoted by type(푇 ) = (퐴; 퐵1, …, 퐵푘), where ′ ′ ′ 퐴 = ⟨푞1 → 푞1, …, 푞푚 → 푞푚; 훾0⟩, and for every 휅 ∈ [푘]:

′ ′ 퐵 ′ = ⟨푞 −1 ′ → 푞 , …, 푞 −1 ′ → 푞 ; 훽 −1 ′ ⟩ . 휅 휋 (휅 ,1) −1 ′ 휋 (휅 ,ℓ휅′) −1 ′ 휋 (휅 ,1) 푇 휋푇 (휅 ,1) 푇 휋푇 (휅 ,ℓ휅′) 푇

Step 5: constructing an MCFG. In the following definition, we will construct an MCFG ′ adm ⋆ 퐺 (ℳ) for a given restricted TSA ℳ by creating a rule for each element of (Runsℳ ) .

Definition 3.40 (taken from Den16a, construction 16). Let ℳ = (푄, TSS(훤 ), 훴, 푄i, 푄f, 푇 ) be ′ a cycle-free 푠-restricted TSA in stack normal form. Define the 푠-MCFG 퐺 (ℳ) = (푁, 훴, 푁i, 푅) where adm ⋆ • 푁 = {퐴, 퐵1, …, 퐵푘 ∣ ⟨퐴; 퐵1, …, 퐵푘⟩ is the type of some element of (Runsℳ ) }, ′ ′ • 푁i = {⟨푞 → 푞 ; @⟩ ∣ 푞 ∈ 푄i, 푞 ∈ 푄f}, and adm ⋆ • 푅 contains for every 푇 = (푡1, …, 푡푚) ∈ (Runsℳ ) the rule

퐴 → [푢1, …, 푢푚](퐵1, …, 퐵푘)

where (퐴; 퐵 , …, 퐵 ) is the type of 푇 and 푢 = 푡 [푥 , …, 푥 ] for every 휅 ∈ 1 푘 휅 휅 휋푇(휅,1) 휋푇(휅,ℓ휅) [푚]. □

′ acc Lemma 3.41. ℒ(퐺 (ℳ)) = Runsℳ for any cycle-free 푠-restricted TSA ℳ in stack normal form.

Proof. Let ℳ = (푄, TSS(훤 ), 훴, 푄i, 푄f, 푇 ) be a cycle-free 푠-restricted TSA in stack normal ′ form. Furthermore, let 퐺 (ℳ) = (푁, 훴, 푁i, 푅) be defined as in definition 3.40 and letus abbreviate 훤 ∪ {@} by 훤@. ′ ′ For every 푚 ∈ [푠], 푞1, 푞1, …, 푞푚, 푞푚 ∈ 푄, and 훾 ∈ 훤@, we show the following property, ′ ′ abbreviated by 푃 (⟨푞1 → 푞1, …, 푞푚 → 푞푚; 훾⟩), by induction on the structure of derivations in 퐺′(ℳ):

∗ For every 휃1, …, 휃푚 ∈ 푇 : ′ ′ ′ (휃1, …, 휃푚) ∈ ℒ(퐺 (ℳ), ⟨푞1 → 푞1, …, 푞푚 → 푞푚; 훾⟩) ′ ′ ⟺ there are [휉1, 휀], [휉1, 휀], …, [휉푚, 휀], [휉푚, 휀] ∈ TS(훤@) such that 휉 (휀) = 훾 and ⋀푚−1((푞 , [휉 , 휀]) ⊢ (푞′, [휉′, 휀])) ∧ (휉′(휀) = 휉 (휀)). 1 푗=1 푗 푗 휃푗 푗 푗 푗 푗+1

′ ′ ∗ For this let 푘 ∈ [푠], 푞1, 푞1, …, 푞푚, 푞푚 ∈ 푄, 훾 ∈ 훤, and 휃1, …, 휃푚 ∈ 푇 . Let us abbreviate ⟨푝1 → 푞1, …, 푝푚 → 푞푚; 훾⟩ by 퐴. We derive the following sequence of equivalent statements:

′ (휃1, …, 휃푚) ∈ ℒ(퐺 (ℳ), 퐴)

1 ℓ ⟺ ∃퐴 → [푢1, …, 푢푚](퐵 , …, 퐵 ) ∈ 푅, ∃휃1⃗ ∈ ℒ(퐺′(ℳ), 퐵1), …, 휃ℓ⃗ ∈ ℒ(퐺′(ℳ), 퐵ℓ): 1⃗ ℓ⃗ [푢1, …, 푢푚](휃 , …, 휃 ) = (휃1, …, 휃푚) (by definitions 2.15 and 2.17)

89 3 An automaton characterisation for weighted MCFLs

1 ℓ ⟺ ∃퐴 → [푢1, …, 푢푚](퐵 , …, 퐵 ) ∈ 푅, ∃휃1, …, 휃1 , …, 휃ℓ, …, 휃ℓ ∈ 푇 ∗, 1 푚1 1 푚ℓ ∃[휉1, 휀], [휁1, 휀], …, [휉1 , 휀], [휁1 , 휀], …, [휉ℓ, 휀], [휁ℓ, 휀], …, [휉ℓ , 휀], [휁ℓ , 휀] ∈ TS(훤 ), 1 1 푚1 푚1 1 1 푚ℓ 푚ℓ @ ∃푝1, 푞1, …, 푝1 , 푞1 , …, 푝ℓ, 푞ℓ, …, 푝ℓ , 푞ℓ ∈ 푄: 1 1 푚1 푚1 1 1 푚ℓ 푚ℓ ⋀ 퐵푖 = ⟨푝푖 → 푞푖 , …, 푝푖 → 푞푖 , 휉푖 (휀)⟩ 1 1 푚푖 푚푖 1 푖∈[ℓ] ∧ [푢 , …, 푢 ]((휃1, …, 휃1 ), …, (휃ℓ, …, 휃ℓ )) = (휃 , …, 휃 ) 1 푚 1 푚1 1 푚ℓ 1 푚 푖 푖 푖 푖 푖 푖 ∧ ⋀ ((푝 , [휉 , 휀]) ⊢ 푖 (푞 , [휁 , 휀])) ∧ (휁 (휀) = 휉 (휀)) 휅 휅 휃휅 휅 휅 휅 휅+1 푖∈[ℓ],휅∈[푚푖−1] (by induction hypothesis, i.e. 푃 (퐵1) ∧ ⋯ ∧ 푃 (퐵ℓ))

1 ℓ ⟺ ∃퐴 → [푢1, …, 푢푚](퐵 , …, 퐵 ) ∈ 푅, ∃휃1, …, 휃1 , …, 휃ℓ, …, 휃ℓ ∈ 푇 ∗, 1 푚1 1 푚ℓ

∃[휉1, 휀], [휁1, 휀], …, [휉푚, 휀], [휁푚, 휀] ∈ TS(훤@): [푢 , …, 푢 ]((휃1, …, 휃1 ), …, (휃ℓ, …, 휃ℓ )) = (휃 , …, 휃 ) 1 푚 1 푚1 1 푚ℓ 1 푚

∧ 휉1(휀) = 훾

푚푖−1

∧ ⋀ ((푝푗, [휉푗, 휀]) ⊢ 1 1 ℓ ℓ (푞푗, [휁푗, 휀])) ∧ (휁푗(휀) = 휉푗+1(휀)) [푢푗]((휃 ,…,휃푚 ),…,(휃 ,…,휃푚 )) 푗=1 1 1 1 ℓ (by ⋆)

⟺ ∃[휉1, 휀], [휁1, 휀], …, [휉푚, 휀], [휁푚, 휀] ∈ TS(훤@):

푚푖−1 (휉 (휀) = 훾) ∧ ⋀ ((푞 , [휉 , 휀]) ⊢ (푞′, [휁 , 휀])) ∧ (휁 (휀) = 휉 (휀)) 1 푗 푗 휃푗 푗 푗 푗 푗+1 푗=1 (by definition 3.40)

푖 For (⋆), we use the fact that the 휃휅’s only make upward excursions. Thus there are tree-stacks 휉푗, 휁푗 whose label at position 휀 is consistent with the transitions contained in 푢푗, i.e. when applying the transitions in 푢푗 to 휉푗 the symbol at 휀 changes from 휉푗(휀) to 휁푗(휀), and whose labels in the 푖-th subtree are consistent with trees from 휉푖 , 휁푖 , …, 휉푖 , 휁푖 that correspond (i.e. 1 1 푚푖 푚푖 have the same indices) to the variables 푢푗. The claim of this lemma immediately follows from ⋀ 푃 (푆). ∎ 푆∈푁i

Proposition 3.42 (taken from Den16a, proposition 17). 푠-MCFL(훴) ⊇ 푠-TSA(훴) for any set 훴 and number 푠 ∈ ℕ+.

Proof. There is an MCFG 퐺(ℳ) such that ℒ(퐺) = {⟦휃⟧ ∣ 휃 ∈ ℒ(퐺′(ℳ))} because ⟦⋅⟧ is a homomorphism and 푘-MCFLs are closed under homomorphisms [Sek+91, theorem 3.9]. Then

acc ′ ℒ(ℳ) = {⟦휃⟧ ∣ 휃 ∈ Runsℳ} = {⟦휃⟧ ∣ 휃 ∈ ℒ(퐺 (ℳ))} (by lemma 3.41) = ℒ(퐺(ℳ)) (by def. of 퐺(ℳ)) ∎

90 3.3 The equivalence of MCFGs and restricted TSAs

3.3.3 The theorem and the weighted case Let us repeat theorem 3.19 from page 73 and provide the proof.

Theorem 3.19 (taken from Den16a, theorem 18). 푠-MCFL(훴) = 푠-TSL(훴) for any set 훴 and number 푠 ∈ ℕ+.

Proof. We get “⊆” from proposition 3.35 and “⊇” from proposition 3.42. ∎

By combining the above theorem with the decomposition and closure results from section 2.3.4, we obtain the following theorem:

Theorem 3.43. Let 훴 be a set, 푠 ∈ ℕ+ be a number, and 풜 be a complete commutative semiring. Then 푠-MCFL(훴, 풜) = 푠-TSL(훴, 풜).

Proof. In the following, we will denote • the set of all languages recognised by an 푠-restricted and unambiguous tree-stack au- tomaton with terminals from 훥 by 푠u-TSL(훥); • the set of all languages recognised by an 푠-restricted, unambiguous, and 휀-free tree-stack automaton with terminals from 훥 by 푠u휀-TSL(훥); and • the set of all languages recognised by an unambiguous 푠-MCFG with terminals from 훥 by 푠u-MCFL(훥). Let 퐿: 훴∗ → 풜 be a weighted language. We derive

퐿 ∈ 푠-TSL(훴, 풜) ⟺ ∃set 훥, ℎ ∈ HOM(훥, 훴, 풜), 퐿′ ∈ 푠u휀-TSL(훥): 퐿 = ℎ(퐿′) (by lemma 2.75) ⟺ ∃set 훥′, ℎ ∈ HOM(훥′, 훴, 풜), 퐿′ ∈ 푠u-MCFL(훥′): 퐿 = ℎ(퐿′) (by ⋆)

⟺ 퐿 ∈ 푠-MCFL(훴, 풜) (by lemma 2.74 and ⋆⋆) where (⋆⋆) holds because the constructions for lemma 2.74 preserve fanout 푠. In the remainder, we show (⋆), i.e. HOM(훥, 훥′)(푠u휀-TSL(훥)) = 푠u-MCFL(훥′) where 훥′ = 훥 ∪ {⊥} for some new symbol ⊥ ∉ 훥. • HOM(훥′, 훥)(푠u휀-TSL(훥′)) ⊆ 푠u휀-TSL(훥) ⊆ 푠u-MCFL(훥) follows from theorem 3.19 and the fact that the construction preserves unambiguity. • 푠u-MCFL(훥) ⊆ 푠u-TSL(훥) also follows from theorem 3.19 and the fact that the con- struction preserves unambiguity. • 푠u-TSL(훥) ⊆ HOM(훥′, 훥)(푠u휀-TSL(훥′)) can be shown by replacing in the 훥′-side of the inequation each 휀-transition with a ⊥-transition (that has the same source state, target state, and instruction as the replaced 휀-transition) and choosing an element of HOM(훥′, 훥) that deletes ⊥’s and acts as identity on the subset 훥 ⊆ 훥′. The extra homomorphism on the left-hand side of(⋆) disappears in the chain of equivalences because of lemma 2.73. ∎

91 3 An automaton characterisation for weighted MCFLs

3.4 Related formalisms

Let us compare tree-stack automata to several similar automata formalisms from the literature: • The relation between Turing machines [Tur37; Tur38] and tree-stack automata was dis- cussed at the beginning of section 3.2.2. In short: the classes of languages recognised by tree-stack automata and by Turing machines are the same. • A (one-way non-deterministic) stack automaton [GGH67] has an input tape, a current state, and a stack (which corresponds to our tree stack). The state behaviour and the reading of the input tape are the same in both formalisms. The stack of a stack automaton does not branch which makes it a monadic tree-stack. A stack automaton is only allowed to write to the stack if it is at the top of the stack whereas a tree-stack automaton may always write to the tree stack. Tree-stack automata are strictly more powerful than stack automata because each language recognised by a stack automaton is context-sensitive [HU69, theorem 13.7]. • The register tree pushdown transducers of Filé [Fil86, definition 1.4] also operate with the help of a tree stack: Their configurations contain a derivation tree of a CFGanda designated node 푛 in the tree. They are transducers from (derivation) trees to a so-called semantic domain. To get as close to tree-stack automata as possible, we choose a string- tuple algebra as the semantic domain. The derivation tree is given as the input ofthe transducer. Each node of the derivation tree is associated with assignment rules that define how so-called attributes of the node are calculated from attributes of its parent and its children. One could say that the attribute of a node depends on the attributes from which it is calculated (according to the assignment rules). During the evaluation of the derivation tree, the register tree pushdown transducers follows these dependencies (using a specific evaluation strategy). Therefore, the symbols of the output string-tuple are, in contrast to tree-stack automata, not produced from left to right during the run. • Let us consider deterministic tree-walking transducers (short: DTWTs) [AU71; we use the definition of Wei92]. The input of a DTWT is a parse tree (i.e. a tree of terminals and non-terminals) of an 휀-free context-free grammar.13 The difference between terminals and non-terminals is immaterial for the DTWT, hence the set of all terminals and non- terminals corresponds to the set of stack symbols of a TSA. The output is a string. The instructions allowed on the tree are “stay” (corresponds to “id” in a TSA), “up” (corresponds to “down” in a TSA), and “d(푘)” (for 푘 ∈ ℕ+, corresponds “up(푘)” in a TSA). A DTWT has a transition function (instead of a transition relation, since DTWTs deterministic and total). The transition function maps a tuple of state and stack symbol to a triple ofstate, instruction, and string of output symbols. The inspection of the stack symbol corresponds to an equals-instruction in a TSA. With the help of its states and instructions, a DTWT can check if a given tree is in fact a parse tree of its underlying context-free grammar. Hence, instead of the parse tree to be given to the DTWT, we could consider that it is guessed node-by-node during the run of the DTWT. Adopting this view, we can consider the first visit of a node of the parse treeasa push in the sense of a TSA. A DTWT has no

13A context-free grammar is called 휀-free if it contains no rule with 휀 on the right-hand side.

92 3.4 Related formalisms

aspect that corresponds to the bottom-instruction of a TSA; but “bottom” is not essential: We can simply require that the initial non-terminal 푆 of the underlying CFG does not occur on the right-hand side of any rule and then use an equals(푆)-instruction instead of the bottom-instruction. In conclusion, we can see that DTWTs and TSAs are quite similar. The two differences are that there is no write-instruction in DTWTs and that DTWTs are deterministic. In terms of expressive power, DTWTs and restricted TSAs are equivalent since they are both equivalent to MCFGs (evident from Weir [Wei92, page 138] and theorem 3.19). This is easily explained by the fact that every DTWT can be converted to an equivalent restricted DTWT:14 The only way for a DTWT to be unrestricted is to have a loop that allows itto execute an up-instruction arbitrarily often. But since a DTWT is deterministic, aloop that is once started would never terminate. Hence, all loops can be removed from the DTWT without changing its recognised language. • While our tree-stack automata generalise stack automata by allowing the stack to branch upwards (i.e. the root of the tree is the bottom of the stack), the tree-stack automata of Golubski and Lippe [GL96] generalise specific15 stack automata by allowing the stack to branch downwards (i.e. the root of the tree is the top of the stack). The tree-stack automata of Golubski and Lippe [GL96] have a read-mode in which the stack-pointer may be moved, but the stack may not be modified, and a write-mode in which the stack- pointer is at the root of the tree and the stack can be modified. Our tree-stack automata do not have distinct write- or read-modes: they can always write and always move the stack pointer. Due to the read- and write-modes of Golubski and Lippe [GL96], their tree-stack automata are not equivalent to Turing machines. The tree-stack automata of Golubski and Lippe [GL96] and our restricted tree-stack automata are incomparable with respect to their expressive power. This is evident from theorem 3.19, figure A.1, Golubski and Lippe [GL96, figure 4], and Golubski and Lippe [GL96, example on page 232]. • The thread automata of Villemonte de la Clergerie [Vil02a; Vil02b, section 2] also use a tree-stack: The tree is called a thread store and the stack-pointer is called active thread. Thread automata are introduced as a framework to design chart parsers for diverse gram- mar formalisms (such as tree-adjoining grammars and simple ordered range concatenation grammars). To make the resulting chart parsers efficient, a thread automaton has compo- nents such as a triggering function that are instantiated based on the concrete grammar formalism that we want to parse. The influence of these components on the expressive power of thread automata is not clear to the author. Hence, we omit the comparison of thread automata and tree-stack automata in terms of expressive power.

14The definition is the same as for TSAs: ADTWT ℳ is called restricted if the number of times that a node is entered from below is bounded by a constant for each run of ℳ. 15Golubski and Lippe [GL96] consider stack automata where modifications are only allowed at the top of the stack.

93

4 Approximation of weighted automata with data storage

This chapter is a revised version of sections 4 and 5 of T. Denkinger. “Approximation of Weighted Automata with Storage”. In: Proceedings of the Eighth International Symposium on Games, Automata, Logics and Formal Verification. Vol. 256. Electronic Proceedings in Theoretical Computer Science. Open Publishing Association, Sept. 2017, pp. 91–105. doi: 10.4204/eptcs. 256.7.

4.1 Introduction

Powerful language models make it easy to model the phenomena that occur in natural languages. For example, context-free grammars easily permit (in comparison to finite-state automata) modelling dependencies between constituents that are (arbitrarily) far apart in the sentence. On the other hand, many applications of language models (e.g. in translation systems or in speech recognition systems) require algorithms to be more efficient in time and space than can be achieved with powerful language models. Methods for the approximation of language models allow us to compromise between easy modelling and efficient algorithms: First, a language is modelled with a powerful formalism, such as context-free grammars, resulting in what is called a fine language model. Then, with the help of a so-called approximation strategy, this language model is converted into a less powerful formalism. The result of the conversion is called a coarse language model. The conversion is lossy since the powerful formalism might cover phenomena that the less powerful formalism does not. In particular, there might be strings that are allowed in the fine language model but not in the coarse language model and vice versa. Finally, wecan use the coarse language model in the design of our algorithms, providing us with the desired efficiency. The hope is that the coarse language model is sufficiently close to the finelanguage model to obtain results that are good enough for the specific practical application. (In section 6.3, we will even see an algorithm that allows us to use a coarse language model to increase the efficiency of an algorithm that works on a fine language model. This approach gives usthebest of both worlds: efficiency and powerful formalisms. However, the gain in efficiency ishighly dependent on the choice of the approximation strategy.) In order to approximate context-free grammars it is common (but not exclusive [e.g. Ned00; Cha+06]) to first construct an equivalent pushdown automaton and then approximate this automaton [KT81; Pul86; LL87; BS90; PW91; Eva97; Joh98], e.g. by restricting the height of the pushdown. We extend this idea in section 4.2 to automata with data storage (section 2.2.4). This allows us to use arbitrary data storage (instead of only pushdowns), and thus extends the theory to a wide variety of language classes, e.g. to multiple context-free languages using the automaton characterisation presented in chapter 3. The approximation of multiple context-free

95 4 Approximation of weighted automata with data storage

퐴 퐶 퐶′ 푖 퐴(푖) 퐴 퐶 퐶′

Figure 4.3: Sketch of the construction of 퐴(푖) in definition 4.2.

languages has been studied from a grammar-based point of view [BL05; Cra12]. Section 4.3 complements this with an automaton-based perspective. In section 4.4, we show that the theory presented in section 4.2 can be extended to the weighted setting.

4.2 Approximation of (unweighted) automata with data storage

An approximation strategy maps a data storage to another data storage. It is specified in terms of storage configurations and naturally extended to instructions, data storages, transitions, and automata with data storage.

Definition 4.1 (taken from Den17a, definition 13). Let DS = (퐶, 퐼, 퐶i, 퐶f) be a data storage. ′ ′ An approximation strategy is a partial function 퐴: 퐶 ‧‧➡ 퐶 for some set 퐶 . □

Definition 4.2 (taken from Den17a, definition 13). Let DS = (퐶, 퐼, 퐶i, 퐶f) be a data storage ′ and 퐴: 퐶 ‧‧➡ 퐶 be an approximation strategy. The approximation of DS with respect to 퐴, ′ denoted by 퐴(DS), is the data storage (퐶 , 퐴(퐼), 퐴(퐶i), 퐴(퐶f)) where • 퐴(퐼) = {퐴(푖) ∣ 푖 ∈ 퐼} with 퐴(푖) = 퐴−1 ; 푖 ; 퐴 for every 푖 ∈ 퐼,

• 퐴(퐶i) = {퐴(푐) ∣ 푐 ∈ 퐶i}, and

• 퐴(퐶f) = {퐴(푐) ∣ 푐 ∈ 퐶f}. □ Since we chose 퐴(푖) = 퐴−1 ; 푖 ; 퐴 in definition 4.2, the diagram in figure 4.3 commutes, i.e. 퐴 ; 퐴(푖) = 푖 ; 퐴 for any instruction 푖.1

Definition 4.4 (taken from Den17a, definition 13). Let DS = (퐶, 퐼, 퐶i, 퐶f) be a data storage ′ −1 ′ and 퐴: 퐶 ‧‧➡ 퐶 be an approximation strategy. We call 퐴 DS-proper if (퐴 ; 푖 ; 퐴)(푐 ) is finite ′ ′ for each 푖 ∈ 퐼 and 푐 ∈ 퐶 . □ Example 4.5 (taken from Den17a, example 15). Recall the data storage Count from example 2.30.

Furthermore, consider the approximation strategy 퐴o: ℕ → {odd} ∪ {2푛 ∣ 푛 ∈ ℕ} that assigns to every odd number the value odd and to every even number the number itself. Then 퐴o is not −1 −1 Count-proper since (퐴o ; inc ; 퐴o)(odd) = (퐴o ; dec ; 퐴o)(odd) = {2푛 ∣ 푛 ∈ ℕ} is not finite.

1A (relation) diagram is a directed graph whose vertices represent sets and whose edges represent binary relations between those sets. The closure of a path in such a diagram is obtained by composing the relations represented by the edges on the path in the order indicated by the arrows. We say that a diagram commutes if, for any two vertices, the closures of all paths between the two vertices are the same. A diagram that commutes is called a commutative diagram. You may consult Mendelson [Men62, chapter I, section 7] for further details on diagrams.

96 4.2 Approximation of (unweighted) automata with data storage

퐴eo 퐴eo {0, 2, 4, …} {even} {1, 3, 5, …} {odd}

inc 퐴eo(inc) inc 퐴eo(inc) 퐴eo 퐴eo {1, 3, 5, …} {odd} {2, 4, 6, …} {even}

Figure 4.6: Concrete example for the approximation of inc, cf. example 4.5.

Now consider the approximation strategy 퐴eo: ℕ → {even, odd} that returns odd for every −1 odd number and even otherwise. This approximation strategy is Count-proper since (퐴eo ; −1 −1 −1 inc ; 퐴eo)(even) = (퐴eo ; dec ; 퐴eo)(even) = {odd} and (퐴eo ; inc ; 퐴eo)(odd) = (퐴eo ; dec ; 퐴eo)(odd) = {even} are finite. The construction of 퐴eo(inc) is shown in figure 4.6. □

Definition 4.7 (taken from Den17a, definition 16). Let DS = (퐶, 퐼, 퐶i, 퐶f) be a data storage, ′ ℳ = (푄, DS, 훴, 푄i, 푄f, 푇 ) be an automaton with data storage, and 퐴: 퐶 ‧‧➡ 퐶 be an ap- proximation strategy. The approximation of ℳ with respect to 퐴, denoted by 퐴(ℳ), is the

automaton with data storage (푄, 퐴(DS), 훴, 푄i, 푄f, 퐴(푇 )) where 퐴(푇 ) = {퐴(휏) ∣ 휏 ∈ 푇 } and ′ ′ 퐴(휏) = (푞, 퐴(푖), 푢, 푞 ) for each 휏 = (푞, 푖, 푢, 푞 ) ∈ 푇. □ Example 4.8 (taken from Den17a, example 17). Let 훴 = {a, b}. Consider the automata

ℳ = ([3], Count, 훴, {1}, {3}, 푇 ) and 퐴eo(ℳ) = ([3], 퐴eo(Count), 훴, {1}, {3}, 퐴eo(푇 )) with

푇 : 휏1 = (1, inc , a, 1) 퐴eo(푇 ): 퐴eo(휏1) = (1, toggle , a, 1) 휏2 = (1, dec , b, 2) 퐴eo(휏2) = (1, toggle , b, 2) 휏3 = (2, dec , b, 2) 퐴eo(휏3) = (2, toggle , b, 2) 휏4 = (2, id({0}), 휀, 3) 퐴eo(휏4) = (2, id({even}), 휀, 3) where 퐴eo(inc) = toggle = 퐴eo(dec) with

toggle = {(even, odd), (odd, even)}

and 퐴eo(id({0})) = id({even}) are the instructions of 퐴eo(Count). The graphs of ℳ and 퐴eo(ℳ) are shown in figure 4.9. The word aabb ∈ {a, b}∗ is recognised by both automata:

ℳ: (1, 0, aabb) 퐴eo(ℳ): (1, even, aabb) ⊢ (1, 1, abb) ⊢ (1, odd, abb) 휏1 퐴eo(휏1) ⊢ (1, 2, bb) ⊢ (1, even, bb) 휏1 퐴eo(휏1) ⊢ (2, 1, b) ⊢ (2, odd, b) 휏2 퐴eo(휏2) ⊢ (2, 0, 휀) ⊢ (2, even, 휀) 휏3 퐴eo(휏3) ⊢ (3, 0, 휀) ⊢ (3, even, 휀). 휏4 퐴eo(휏4)

On the other hand, the word bb can be recognised by 퐴eo(ℳ) but not by ℳ:

(1, even, bb) ⊢ (2, odd, b) ⊢ (2, even, 휀) ⊢ (3, even, 휀). 퐴eo(휏2) 퐴eo(휏3) 퐴eo(휏4) □

97 4 Approximation of weighted automata with data storage

⟨inc, a⟩ ⟨dec, b⟩

⟨dec, b⟩ ⟨id({0}), 휀⟩ ℳ: start 1 2 3

⟨toggle, a⟩ ⟨toggle, b⟩

⟨toggle, b⟩ ⟨id({even}), 휀⟩ 퐴eo(ℳ): start 1 2 3

Figure 4.9: The automata ℳ and 퐴eo(ℳ) from example 4.8

퐶 퐶

퐴1 퐴2 퐴1 퐴2 ∃ ∃ 퐶1 퐶2 퐶1 퐶2

(a) 퐴1 is finer than 퐴2 (b) 퐴1 is less partial than 퐴2

Figure 4.12: Illustration of definition 4.13 using commutative diagrams. Dashed arrows represent partial functions, solid arrows represent total functions, dashed arrows with hooks represent injective partial functions, and arrows labelled with ∃ denote that such an arrow exists.

Observation 4.10 (taken from Den17a, observation 18). Let DS = (퐶, 퐼, 퐶i, 퐶f), ℳ be a ̄ ̄ ′ (DS, 훴)-automaton, and 퐴1: 퐶 ‧‧➡ 퐶 and 퐴2: 퐶 ‧‧➡ 퐶 be approximation strategies. Then 퐴2(퐴1(DS)) = (퐴1 ; 퐴2)(DS) and 퐴2(퐴1(ℳ)) = (퐴1 ; 퐴2)(ℳ).

Proof. Follows directly from definitions 4.2 and 4.7. ∎

Observation 4.11 (taken from Den17a, observation 18). Let DS = (퐶, 퐼, 퐶i, 퐶f), ℳ be a ̄ ̄ ′ (DS, 훴)-automaton, and 퐴1: 퐶 ‧‧➡ 퐶 and 퐴2: 퐶 ‧‧➡ 퐶 be approximation strategies. If 퐴1 is DS-proper and 퐴2 is 퐴1(DS)-proper, then 퐴2(퐴1(DS)) is DS-proper.

Proof. Follows directly from definitions 4.2 and 4.4. ∎

We call an approximation strategy total if it is a total function and we call it injective if it is an injective partial function. The distinction between total and injective approximation strategies allows us to define two preorders on approximation strategies (definition 4.13) and provides us with simple criteria to ensure that an approximation strategy leads to a superset (proposition 4.15) or a subset approximation (proposition 4.20).

98 4.2 Approximation of (unweighted) automata with data storage

Definition 4.13 (taken from Den17a, definition 19). Let 퐴1: 퐶 ‧‧➡ 퐶1 and 퐴2: 퐶 ‧‧➡ 퐶2 be approximation strategies. We call 퐴1 finer than 퐴2, denoted by 퐴1 ⪯ 퐴2, if there is a total approximation strategy 퐴: 퐶1 → 퐶2 with 퐴1 ; 퐴 = 퐴2. We call 퐴1 less partial than 퐴2, denoted by 퐴1 ⊑ 퐴2, if there is an injective approximation strategy 퐴: 퐶1 ‧‧➡ 퐶2 with 퐴1 ; 퐴 = 퐴2. □

4.2.1 Superset approximations In this section we will show that total approximation strategies lead to superset approximations.

Lemma 4.14 (taken from Den17a, lemma 20). Let ℳ = (푄, DS, 훴, 푄i, 푄f, 푇 ) be an automaton ′ where DS = (퐶, 퐼, 퐶i, 퐶f), and 퐴: 퐶 → 퐶 be a total approximation strategy. Then for each 휃 ∈ 푇 ∗, 푞, 푞′ ∈ 푄, 푐, 푐′ ∈ 퐶, and 푤, 푤′ ∈ 훴∗, the following holds:

′ ′ ′ ′ ′ ′ (푞, 푐, 푤) ⊢휃 (푞 , 푐 , 푤 ) ⟹ (푞, 퐴(푐), 푤) ⊢퐴(휃) (푞 , 퐴(푐 ), 푤 )

Proof. The claim can be shown by induction on the lengthof 휃. Induction base: The induction base is given by the string 휃 = 휀 of length 0. Induction step: We assume that the above claim holds for all strings 휃 ∈ 푇 ∗ of length 푛. Let ′ ′ ′ ∗ 푛 ′ ′ ′ 푞, 푞 ∈ 푄, 푐, 푐 ∈ 퐶, 푤, 푤 ∈ 훴 , 휃 ∈ 푇 , and 휏 ∈ 푇 such that (푞, 푐, 푤) ⊢휃휏 (푞 , 푐 , 푤 ). Then there are 푢 ∈ 훴∗, 푣 ∈ 훴 ∪ {휀}, and a configuration ( ̄푞, ̄푐, 푣푤′) ∈ 푄 × 퐶 × 훴∗ such that ′ ′ ′ ′ ′ ′ ′ 푤 = 푢푣푤 and (푞, 푐, 푢푣푤 ) ⊢휃 ( ̄푞, ̄푐, 푣푤 ) ⊢휏 (푞 , 푐 , 푤 ). Hence 휏 has the form ( ̄푞, 푖, 푣, 푞 ) ′ ′ for some 푖 ∈ 퐼 with ( ̄푐, 푐 ) ∈ 푖. By induction hypothesis, we have (푞, 퐴(푐), 푢푣푤 ) ⊢퐴(휃) ( ̄푞, 퐴( ̄푐), 푣푤′) and by definition 4.7, we know that 퐴(휏) = ( ̄푞, 퐴(푖), 푣, 푞′) ∈ 퐴(푇 ). Note that 퐴(푐′) ∈ 퐶′ exists since 퐴 is a total function. Now from definition 4.2, we immediately obtain that (퐴( ̄푐), 퐴(푐′)) ∈ 퐴(푖). Since the states and the read string of terminals (or 휀) are taken ′ ′ ′ ′ over in 퐴(휏), we therefore know that ( ̄푞, 퐴( ̄푐), 푣푤 ) ⊢퐴(휏) (푞 , 퐴(푐 ), 푤 ). ∎

Proposition 4.15 (taken from Den17a, theorem 21). Let ℳ be a (DS, 훴)-automaton with DS = ′ (퐶, 퐼, 퐶i, 퐶f) and let 퐴: 퐶 → 퐶 be a total approximation strategy. Then ℒ(퐴(ℳ)) ⊇ ℒ(ℳ).

Proof. The claim follows immediately from lemma 4.14 and the definition of 퐴(ℳ). ∎

Example 4.16 (taken from Den17a, example 22). Recall ℳ and 퐴eo(ℳ) from example 4.8. 푛 푛 푚 푛 Their recognised languages are ℒ(ℳ) = {a b ∣ 푛 ∈ ℕ+} and ℒ(퐴eo(ℳ)) = {a b ∣ 푚 ∈ ℕ, 푛 ∈ ℕ+, 푚 ≡ 푛 mod 2}. Thus, ℒ(퐴eo(ℳ)) is a superset of ℒ(ℳ). □

Corollary 4.17. Let ℳ be a (DS, 훴)-automaton, and 퐴1: 퐶 ‧‧➡ 퐶1 and 퐴2: 퐶 ‧‧➡ 퐶2 be approximation strategies. If 퐴1 is finer than 퐴2, then ℒ(퐴1(ℳ)) ⊆ ℒ(퐴2(ℳ)).

Proof. Since 퐴1 is finer than 퐴2, there is a total approximation strategy 퐴: 퐶1 → 퐶2 such that 퐴1 ; 퐴 = 퐴2. Hence we obtain

ℒ(퐴1(ℳ)) ⊆ ℒ(퐴(퐴1(ℳ))) (by proposition 4.15)

= ℒ((퐴1 ; 퐴)(ℳ)) (by observation 4.10)

= ℒ(퐴2(ℳ)). ∎

99 4 Approximation of weighted automata with data storage

The following example shows four approximation strategies that occur in the literature. The first three approximation strategies approximate a context-free language by a recognisable language (taken from Nederhof [Ned00, Sec. 7]). The fourth approximation strategy approxi- mates a context-free language by another context-free language. It is easy to see that the shown approximation strategies are total and thus lead to superset approximations.

Example 4.18 (taken from Den17a, example 24). Let 훤 be a set and 푘 ∈ ℕ+. (i) Evans [Eva97] proposed to map each pushdown to its top-most element. The same result is achieved by dropping conditions 7 and 8 from Baker [Bak81]. This idea is expressed by ∗ the total approximation strategy 퐴top: 훤 → 훤 ∪ {@} where

@ if 푐 = 휀 퐴 (푐) = { } top 훾 if 푐 is of the form 훾푐′ with 훾 ∈ 훤

∗ for each 푐 ∈ 훤 . 퐴top is PD(훤 )-proper if and only if 훤 is finite. (ii) Bermudez and Schimpf [BS90] proposed to map each pushdown to its top-most 푘 elements. ∗ ∗ This idea is expressed by the total approximation strategy 퐴top,푘: 훤 → {푤 ∈ 훤 ∣ |푤| ≤ 푘} where

푐 if |푐| ≤ 푘 퐴 ,푘(푐) = { } top 푢 if 푐 is of the form 푢푣 with 푢 ∈ 훤 푘 and 푣 ∈ 훤 +

∗ for each 푐 ∈ 훤 . 퐴top,푘 is PD(훤 )-proper if and only if 훤 is finite. (iii) Pereira and Wright [PW91] proposed to map each pushdown to one where no pushdown symbol occurs more than once. To achieve this, they replace in the given pushdown 푤 each sub-string of the form 훾푤′훾 (for some 훾 ∈ 훤 and 푤′ ∈ 훤 ∗) by the symbol 훾: ∗ Consider the total approximation strategy 퐴uniq: 훤 → Seqnr(훤 ) where

• Seqnr(훤 ) denotes the set of all strings over 훤 without repetition and 퐴 (푢훾푤) if 푐 is of the form 푢훾푣훾푤 with 훾 ∈ 훤 • 퐴 (푐) = { uniq } for each 푐 ∈ uniq 푐 otherwise 훤 ∗.

퐴uniq is PD(훤 )-proper if and only if 훤 is finite. (iv) In their coarse-to-fine parsing approach for CFGs, Charniak, Pozar, Vu, Johnson, Elsner, Austerweil, Ellis, Haxton, Hill, Shrivaths, and Moore [Cha+06] propose, given an equiva- lence relation ≡ on the set of non-terminals 푁 of some CFG 퐺, to construct a new CFG 퐺′ whose non-terminals are the equivalence classes of ≡. Let 훴 be the terminal alphabet of 퐺. Say that 푔: 푁 → 푁/≡ is the function that assigns for a nonterminal of 퐺 its corre- sponding equivalence class; and let 푔′: (푁 ∪ 훴)∗ → ((푁/≡) ∪ 훴)∗ be an extension of 푔∪{(휎, 휎) ∣ 휎 ∈ 훴} to strings of terminals and non-terminals. Then ℒ(푔′(ℳ)) = ℒ(퐺′) where ℳ is the (PD(푁 ∪ 훴), 훴)-automaton obtained from 퐺 by the usual construction ′ [HU79, Thm. 5.3]. The approximation strategy 푔 is PD(푁 ∪ 훴)-proper. □

100 4.2 Approximation of (unweighted) automata with data storage

4.2.2 Subset approximations In this section we will show that injective approximation strategies lead to a subset approxima- tion, this is proved by a variation of the proof of proposition 4.15.

Lemma 4.19 (taken from Den17a, lemma 25). Let ℳ = (푄, DS, 훴, 푄i, 푄f, 푇 ) be an automaton ′ where DS = (퐶, 퐼, 퐶i, 퐶f), and 퐴: 퐶 ‧‧➡ 퐶 be an injective approximation strategy. Then ∗ ′ ′ ′ ∗ ′ ′ ′ for each 휃 ∈ 푇 , 푞, 푞 ∈ 푄, 푐, 푐 ∈ img(퐴), and 푤, 푤 ∈ 훴 : (푞, 푐, 푤) ⊢퐴(휃) (푞 , 푐 , 푤 ) ⟹ −1 ′ −1 ′ ′ (푞, 퐴 (푐), 푤) ⊢휃 (푞 , 퐴 (푐 ), 푤 ). Proof idea. The claim can be shown by straightforward induction on the lengthof 휃.

Proof. Induction base: The induction base is given by the string 휃 = 휀 of length 0. Induction step: We assume that the above claim holds for all strings of length 푛. Let 푞, 푞′ ∈ 푄, ′ ′ ∗ 푛 ′ ′ ′ 푐, 푐 ∈ img(퐴), 푤, 푤 ∈ 훴 , 휃 ∈ 푇 , and 휏 ∈ 푇 such that (푞, 푐, 푤) ⊢퐴(휃휏) (푞 , 푐 , 푤 ). Then there are 푢 ∈ 훴∗, 푣 ∈ 훴∗, ̄푞∈ 푄, and ̄푐∈ img(퐴) such that 푤 = 푢푣푤′ and

′ ′ ′ ′ ′ (푞, 푐, 푢푣푤 ) ⊢퐴(휃) ( ̄푞, ̄푐, 푣푤 ) ⊢퐴(휏) (푞 , 푐 , 푤 ).

Hence 퐴(휏) has the form ( ̄푞, 퐴(푖), 푣, 푞′) for some 푖 ∈ 퐼 such that ( ̄푐, 푐′) ∈ 퐴(푖). By induction −1 ′ −1 ′ hypothesis, we have (푞, 퐴 (푐), 푢푣푤 ) ⊢휃 ( ̄푞, 퐴 ( ̄푐), 푣푤 ) and by definition 4.7 we know that 휏 = ( ̄푞, 푖, 푣, 푞′) ∈ 푇. Note that 퐴−1(푐′) is uniquely defined since 푐′ ∈ img(퐴) and 퐴 is injective. Now from definition 4.2, we immediately obtain that (퐴−1( ̄푐), 퐴−1(푐′)) ∈ 푖. Since the states and the read string of terminals are taken over in 퐴(휏), we therefore know that −1 ′ ′ −1 ′ ′ ( ̄푞, 퐴 ( ̄푐), 푣푤 ) ⊢휏 (푞 , 퐴 (푐 ), 푤 ). ∎ Proposition 4.20 (taken from Den17a, theorem 26). Let ℳ be a (DS, 훴)-automaton where ′ DS = (퐶, 퐼, 퐶i, 퐶f) and 퐴: 퐶 ‧‧➡ 퐶 be an injective approximation strategy. Then ℒ(퐴(ℳ)) ⊆ ℒ(ℳ).

Proof. The claim follows immediately from lemma 4.19 and the definition of 퐴(ℳ). ∎

Corollary 4.22 (taken from Den17a, corollary 27). Let ℳ be a (DS, 훴)-automaton, and 퐴1: 퐶 ‧‧➡ 퐶1 and 퐴2: 퐶 ‧‧➡ 퐶2 be approximation strategies. If 퐴1 is less partial than 퐴2, then ℒ(퐴1(ℳ)) ⊇ ℒ(퐴2(ℳ)).

Proof. Since 퐴1 is less partial than 퐴2, we know that there is an injective approximation strategy 퐴 such that 퐴1 ; 퐴 = 퐴2. Hence we obtain

ℒ(퐴1(ℳ)) ⊇ ℒ(퐴(퐴1(ℳ))) (by proposition 4.20)

= ℒ((퐴1 ; 퐴)(ℳ)) (by observation 4.10)

= ℒ(퐴2(ℳ)). ∎

The following example approximates a context-free language with a recognisable language (taken from Nederhof [Ned00, Sec. 7]). It is easy to see that the shown approximation strategy is injective and thus leads to subset approximations.

101 4 Approximation of weighted automata with data storage

퐶1 퐴1

퐶 ∃ implies ℒ(퐴1(ℳ)) ⊆ ℒ(퐴2(ℳ))

퐴2 퐶2 (a) Illustration of corollary 4.17

퐶1 퐴1

퐶 ∃ implies ℒ(퐴1(ℳ)) ⊇ ℒ(퐴2(ℳ))

퐴2 퐶2 (b) Illustration of corollary 4.22

Figure 4.21: Illustrations of corollaries 4.17 and 4.22 using commutative diagrams. As in fig- ure 4.12, dashed arrows represent partial functions, solid arrows represent total functions, dashed arrows with hooks represent injective partial functions, and ar- rows labelled with ∃ denote that such an arrow exists.

Example 4.23 (taken from Den17a, example 28). Let 훤 be a set and 푘 ∈ ℕ+. Krauwer and Tombe [KT81], Pulman [Pul86], and Langendoen and Langsam [LL87] proposed to disallow pushdowns + of a height greater than 푘. This can be achieved by the partial identity 퐴bd,푘: 훤 ‧‧➡ {푤 ∈ 훤 ∣ |푤| ≤ 푘} where 푐 if |푐| ≤ 푘 퐴 (푐) = { } bd,푘 undefined otherwise

∗ for each 푐 ∈ 훤 . Note that 퐴bd,푘 is PD(훤 )-proper. □

4.2.3 Potentially incomparable approximations The following example shows that our framework is also capable of expressing approximation strategies that lead neither to superset nor to subset approximations.

Example 4.24 (taken from Den17a, example 29). Let 훤 be a set and 훥 be a finite set, 푘 ∈ ℕ+, and 푔: 훤 → 훥 be a total function. For pushdown automata with an infinite pushdown alphabet, Johnson [Joh98, end of Section 1.4] proposed to first approximate the infinite pushdown alphabet with a finite set and then restrict the pushdown height to 푘. This can be easily expressed as the composition of two approximation strategies:

+ 퐴incomp,푘: 훤 ‧‧➡ {푤 ∣ 푤 ∈ 훥, |푤| ≤ 푘} 퐴incomp,푘 = ̂푔; 퐴bound,푘 where ̂푔: 훤 ∗ → 훥∗ is the extension of 푔 to strings. Let |훥| < |훤 |. Then ̂푔 is total but not

injective, 퐴bd,푘 is injective but not total, and 퐴incomp,푘 is neither total nor injective. Hence propositions 4.15 and 4.20 provide no further insights into the approximation strategy 퐴incomp,푘.

102 4.3 Approximation of multiple context-free languages

This concurs with the observation of Johnson [Joh98, end of Section 1.4]that 퐴incomp,푘 is not guaranteed to induce either subset or superset approximations. □

4.3 Approximation of multiple context-free languages

Due to the equivalence of pushdown automata and context-free grammars [HU79, Thms. 5.3 and 5.4], the approximation strategies in examples 4.18 and 4.23 can be used for the approx- imation of context-free languages. The framework presented in this chapter together with the automaton characterisation of multiple context-free languages (see chapter 3) allows an automata-theoretic view on the approximation of multiple context-free languages. Consider the following example of a TSA.

Example 4.25 (taken from Den17a, example 36). Let 훴 = {a, b, c}, 훤 = {∗, #}, and ℳ = ([4], TSS(훤 ), 훴, {1}, {4}, 푇 ) be an automaton with data storage where 푇 contains exactly the following seven transitions:

휏1 = (1, push(1, ∗) , a, 1) 휏5 = (2, bottom ; up(1) , 휀, 3) 휏2 = (1, push(1, #) , 휀, 2) 휏6 = (3, equals(∗) ; up(1), c, 3) 휏3 = (2, equals(#) ; down, 휀, 2) 휏7 = (3, equals(#) , 휀, 4) 휏4 = (2, equals(∗) ; down , b, 2)

The runs of ℳ all have a specific form: ℳ executes 휏1 arbitrarily often (say 푛 times) until 푛 푛+1 it executes 휏2, leading to the storage configuration 휁 = {(휀, @), (1, ∗), …, (1 , ∗), (1 , #)} where 1푘 means that 1 is repeated 푘 times. The stack of 휁 is a monadic tree2 where the leaf is labelled with #, the root is labelled with @, and the remaining 푛 nodes are labelled with ∗. The stack pointer of 휁 points to the leaf. From this configuration ℳ executes 휏3 once and 휏4 푛 times (i.e. for each ∗ on the stack), moving the stack pointer to the root. Then ℳ executes 휏5 once and 휏6 푛 times, reaching the leaf again. Finally, ℳ executes 휏6, leading to the final state. Hence 푛 푛 푛 the language of ℳ is ℒ(ℳ) = {a b c ∣ 푛 ∈ ℕ}, which is not context-free. □

The approximation strategies for multiple context-free grammars from the literature canbe translated to TSAs. This is shown in the next example.

Example 4.26 (taken from Den17a, example 37). The following two approximation strategies for multiple context-free languages are taken from the literature. Let 훤 be a set. (i) Cranenburgh [Cra12, section 4] observed that the idea of example 4.18 (iv) also applies to MCFGs. The idea can be applied to tree-stack automata similar to the way it was applied to pushdown automata in example 4.18 (iv). The resulting data storage is still a tree-stack storage. This approximation strategy is total and thus leads to a superset approximation. (ii) Burden and Ljunglöf [BL05, section 4] and Cranenburgh [Cra12, section 4] proposed to split each production of a given MCFG into multiple productions, each of fanout 1. Since the resulting grammar is of fanout 1, it produces a context-free language and can

2A monadic tree is a tree where each node has at most one child.

103 4 Approximation of weighted automata with data storage

(1, push(∗) , a, 1) (1, {(훾, ∗) ∣ 훾 ∈ 훤 ∪ {@}} , a, 1) (1, push(#) , 휀, 2) (1, {(훾, #) ∣ 훾 ∈ 훤 ∪ {@}}, 휀, 2) (2, top(#) ; pop , 휀, 2) (2, {(#, 훾) ∣ 훾 ∈ 훤 ∪ {@}}, 휀, 2) (2, top(∗) ; pop , b, 2) (2, {(∗, 훾) ∣ 훾 ∈ 훤 ∪ {@}} , b, 2) (2, id({휀}) ; push(훤 ), 휀, 3) (2, {(@, 훾) ∣ 훾 ∈ 훤 ∪ {@}}, 휀, 3) (3, top(∗) ; push(훤 ) , c, 3) (3, {(∗, 훾) ∣ 훾 ∈ 훤 ∪ {@}} , c, 3) (3, top(#) ; pop , 휀, 4) (3, {(#, #)} , 휀, 4)

Figure 4.27: Transitions of 퐴cf,훤(ℳ) (left) and (퐴cf,훤 ; 퐴top)(ℳ) (right), taken from Den17a, figure 2.

be recognised by a pushdown automaton. The corresponding approximation strategy in ∗ our framework is 퐴cf,훤: TS(훤 ∪ {@}) → 훤 where

퐴cf,훤(⟨휉, 푛1⋯푛푘⟩) = 휉(푛1)휉(푛1푛2)⋯휉(푛1⋯푛푘)

for every ⟨휉, 푛1⋯푛푘⟩ ∈ TS(훤 ∪ {@}) with 푛1, …, 푛푘 ∈ ℕ+. The resulting data storage is a pushdown storage (see example 2.34). 퐴cf,훤 is total and thus leads to a superset approximation. □

Let us apply the approximation strategies 퐴cf,훤 and 퐴cf,훤 ; 퐴top (examples 4.26 and 4.18) to the TSA ℳ from example 4.25.

Example 4.28 (taken from Den17a, example 38). Let us consider the (TSS(훤 ), 훴)-automaton ℳ from example 4.25. Figure 4.27 shows the transitions of the (PD″(훤 ), 훴)-automaton ″ 퐴cf,훤(ℳ) on the left and the (퐴top(PD (훤 )), 훴)-automaton (퐴cf,훤 ; 퐴top)(ℳ) on the right. 푛 푛 푚 The languages recognised by the two automata are ℒ(퐴cf,훤(ℳ)) = {푎 푏 푐 ∣ 푛, 푚 ∈ ℕ} 푛 푚 푘 and ℒ((퐴cf,훤 ; 퐴top)(ℳ)) = {푎 푏 푐 ∣ 푛, 푚, 푘 ∈ ℕ}. Clearly, ℒ(퐴cf,훤(ℳ)) is a context-free language. Since (퐴cf ; 퐴top)(ℳ) has finitely many storage configurations, its language is recog- nisable by an FSA (see proposition 2.47). □

4.4 Approximation of weighted automata with data storage

For the remainder of this section, let (풜, ⊕, ⊙, ퟘ, ퟙ) be a commutative semir- ing and ⊴ be a partial order on 풜 such that ퟘ ⊴ 푎, 푎1 ⊴ 푎1 ⊕ 푎, and 3 푎1 ⊙ 푎 ⊴ 푎2 ⊙ 푎 for each 푎, 푎1, 푎2 ∈ 풜 with 푎1 ⊴ 푎2.

This section will define a well-behaved extension of approximation of automata withdata storage to a weighted setting. We take “well-behaved” to mean that there are appropriate extensions of propositions 4.15 and 4.20. The following definition gives such an extension.

Definition 4.29 (taken from Den17a, definition 32). Let ℳ = (푄, DS, 훴, 푄i, 푄f, 휇) be an 풜- ′ weighted automaton with data storage where DS = (퐶, 퐼, 퐶i, 퐶f) and let 퐴: 퐶 ‧‧➡ 퐶 be an

3Then (풜, ⊙, ퟙ, ퟘ, ⊴) is what we call a POCMOZ in chapter 6.

104 4.4 Approximation of weighted automata with data storage

approximation strategy. The approximation of ℳ with respect to 퐴, denoted by 퐴(ℳ) is the

풜-weighted automaton with data storage (푄, 퐴(DS), 훴, 푄i, 푄f, 퐴(휇)) where 퐴(DS) is defined as in definition 4.2 and 퐴(휇)(휏 ′) = ⨁ 휇(휏) 휏∈supp(휇):퐴(휏)=휏′ ′ ∗ ∗ for every 휏 ∈ 푄 × 퐴(퐼) × 훴 × 푄. □ Let us state two properties of our definition.

Lemma 4.30 (taken from Den17a, lemma 33). Let ℳ be a (DS, 훴, 풜)-automaton where DS = ′ (퐶, 퐼, 퐶i, 퐶f) and 퐴: 퐶 ‧‧➡ 퐶 be an approximation strategy. Then

′ ′ wt퐴(ℳ)(휃 ) ⊵ ⨁ ′ wtℳ(휃) for every 휃 ∈ Runs퐴(ℳ). 휃∈Runsℳ:퐴(휃)=휃

′ Proof. Let ℳ = (푄, DS, 훴, 푄i, 푄f, 휇). We proof the claim by induction on the length of 휃 . Induction base: Let 휃′ = 휀. We derive

wt퐴(ℳ)(휀) = ퟙ ⊵ ퟙ = wtℳ(휀) = ⨁ wtℳ(휃). 휃∈Runsℳ:퐴(휃)=휀

′ ′ Induction step: Let 휃 ∈ Runs퐴(ℳ) such that the claim holds. Furthermore, let 휏 ∈ 푄 × ∗ ′ ′ 퐴(퐼) × 훴 × 푄 such that 휃 휏 ∈ Runs퐴(ℳ). Then

′ ′ ′ ′ wt퐴(ℳ)(휃 휏 ) = wt퐴(ℳ)(휃 ) ⊙ 퐴(휇)(휏 ) ′ ⊵ ( ⨁ ′ wtℳ(휃)) ⊙ 퐴(휇)(휏 ) 휃∈Runsℳ,퐴(휃)=휃 (by induction hypothesis and definition of ⊴)

= ( ⨁ ′ wtℳ(휃)) ⊙ ( ⨁ ′ 훿(휏)) 휃∈Runsℳ,퐴(휃)=휃 휏∈supp(휇): 퐴(휏)=휏 (by definition 4.29)

= ⨁ ′ ′ wtℳ(휃) ⊙ 휇(휏) 휃∈Runsℳ,휏∈supp(휇): (퐴(휃)=휃 )∧(퐴(휏)=휏 ) (by distributivity of 풜)

⊵ ⨁ ′ ′ wtℳ(휃) ⊙ 휇(휏) 휃∈Runsℳ,휏∈supp(휇): 휃휏∈Runsℳ∧(퐴(휃휏)=휃 휏 ) (by (∗) and definition of ⊴) ̄ = ⨁ ̄ ̄ ′ ′ wtℳ(휃) (by definition 4.29) 휃∈Runsℳ:(퐴(휃)=휃 휏 ) For (∗), we note that the index set of the left sum subsumes that of the right sum andhence ⊵ is justified. ∎

Lemma 4.31 (taken from Den17a, lemma 33). Let ℳ be a (DS, 훴, 풜)-automaton where DS = ′ (퐶, 퐼, 퐶i, 퐶f) and 퐴: 퐶 ‧‧➡ 퐶 be an approximation strategy. If 퐴 is injective, then

′ ′ wt퐴(ℳ)(휃 ) = ⨁ wtℳ(휃) for every 휃 ∈ Runs퐴(ℳ). ′ 휃∈Runsℳ:퐴(휃)=휃

′ Proof. Let ℳ = (푄, DS, 훴, 푄i, 푄f, 휇). We proof the claim by induction on the length of 휃 .

105 4 Approximation of weighted automata with data storage

Induction base: Let 휃′ = 휀. We derive

wt퐴(ℳ)(휀) = ퟙ = wtℳ(휀) = ⨁ wtℳ(휃). 휃∈Runsℳ:퐴(휃)=휀

′ ′ Induction step: Let 휃 ∈ Runs퐴(ℳ) such that the claim holds and let 휏 ∈ supp(퐴(휇)). Then

′ ′ ′ ′ 푤푡퐴(ℳ)(휃 휏 ) = wt퐴(ℳ)(휃 ) ⊙ 퐴(휇)(휏 ) ′ = ( ⨁ ′ wtℳ(휃)) ⊙ 퐴(휇)(휏 ) (by induction hypothesis) 휃∈Runsℳ,퐴(휃)=휃

= ( ⨁ ′ wtℳ(휃)) ⊙ ( ⨁ ′ 휇(휏)) 휃∈Runsℳ,퐴(휃)=휃 휏∈supp(휇):퐴(휏)=휏 (by definition 4.29)

= ⨁ ′ ′ wtℳ(휃) ⊙ 휇(휏) 휃∈Runsℳ,휏∈supp(휇):(퐴(휃)=휃 )∧(퐴(휏)=휏 ) (by distributivity of 풜)

= ⨁ ′ ′ wtℳ(휃) ⊙ 휇(휏) (by (†)) 휃∈Runsℳ,휏∈supp(휇):휃휏∈Runsℳ∧(퐴(휃휏)=휃 휏 ) ̄ = ⨁ ̄ ̄ ′ ′ wtℳ(휃) (by definition 4.29) 휃∈Runsℳ:(퐴(휃)=휃 휏 ) For (†), we propose that the index sets of the left and the right sum are the same. Thisholds ′ ′ true because 퐴 is injective, 휃 휏 is in Runs퐴(ℳ), and hence (by lemma 4.19) each 휃휏 with ′ ′ 퐴(휃휏) = 휃 휏 is in Runsℳ. ∎

The following two propositions are straight-forward generalisations of propositions 4.15 and 4.20: Essentially, “⊆” is replaced by “⊴”. This shows that definition 4.29 is well-behaved in the sense mentioned at the beginning of section 4.4.

Proposition 4.32 (taken from Den17a, theorem 34). Let ℳ be a (DS, 훴, 풜)-automaton where ′ DS = (퐶, 퐼, 퐶i, 퐶f) and 퐴: 퐶 ‧‧➡ 퐶 be an approximation strategy. If 퐴 is total, then ⟦퐴(ℳ)⟧(푤) ⊵ ⟦ℳ⟧(푤) for every 푤 ∈ 훴∗.

Proof. For every 푤 ∈ 훴∗, we derive:

′ ⟦퐴(ℳ)⟧(푤) = ⨁ ′ acc wt퐴(ℳ)(휃 ) (by definition 2.65) 휃 ∈Runs퐴(ℳ)(푤)

⊵ ⨁ ′ acc ⨁ ′ wtℳ(휃) (by (∗)) 휃 ∈Runs퐴(ℳ)(푤) 휃∈Runsℳ:퐴(휃)=휃

= ⨁ ′ acc ⨁ acc ′ wtℳ(휃) (by definition 4.7) 휃 ∈Runs퐴(ℳ)(푤) 휃∈Runsℳ(푤):퐴(휃)=휃

= ⨁ acc wtℳ(휃) (by (†)) 휃∈Runsℳ(푤) = ⟦ℳ⟧(푤) (by definition 2.65) where (∗) follows from lemma 4.30 and the definition of ⊴. For (†), we argue that for each ′ ′ 휃 ∈ Runsℳ(푤) there is exactly one 휃 ∈ Runs퐴(ℳ)(푤) with 퐴(휃) = 휃 since 퐴 is total. Hence the left side and the right side of the equation have exactly the same addends. Then,since ⊕ is commutative, the “=” is justified. ∎

106 4.4 Approximation of weighted automata with data storage

Proposition 4.33 (taken from Den17a, theorem 34). Let ℳ be a (DS, 훴, 풜)-automaton where ′ DS = (퐶, 퐼, 퐶i, 퐶f) and 퐴: 퐶 ‧‧➡ 퐶 be an approximation strategy. If 퐴 is injective, then ⟦퐴(ℳ)⟧(푤) ⊴ ⟦ℳ⟧(푤) for every 푤 ∈ 훴∗.

Proof. For every 푤 ∈ 훴∗, we derive:

′ ⟦퐴(ℳ)⟧(푤) = ⨁ ′ wt퐴(ℳ)(휃 ) (by definition 2.65) 휃 ∈Runs퐴(ℳ)(푤)

= ⨁ ′ ⨁ ′ wtℳ(휃) (by lemma 4.31) 휃 ∈Runs퐴(ℳ)(푤) 휃∈Runsℳ:퐴(휃)=휃

= ⨁ ′ ⨁ ′ wtℳ(휃) (by definition 4.7) 휃 ∈Runs퐴(ℳ)(푤) 휃∈Runsℳ(푤):퐴(휃)=휃

⊴ ⨁ wtℳ(휃) (by (‡)) 휃∈Runsℳ(푤) = ⟦ℳ⟧(푤) (by definition 2.65)

′ For (‡), we argue that for each 휃 ∈ Runsℳ(푤) there is at most one 휃 ∈ Runs퐴(ℳ)(푤) with 퐴(휃) = 휃′ since 퐴 is a partial function. Hence all the addends on the left side of the inequality also occur on the right side. But there may be an addend wtℳ(휃) on the right side which does not occur on the left side because 퐴(휃) is undefined. Then “⊴” is justified by definition of ⊴. ∎

107

5 A Chomsky-Schützenberger characterisation of weighted MCFLs

This chapter is a revised version of T. Denkinger. “A Chomsky-Schützenberger representation for weighted multiple context-free languages”. In: Proceedings of the 12th International Conference on Finite-State Methods and Natural Language Processing. 12. 2015. url: https://www.aclweb. org/anthology/W15-4803 and sections 3 and 4 of T. Denkinger. “Chomsky-Schützenberger parsing for weighted multiple context-free languages”. In: Journal of Language Modelling 5.1 (July 2017), p. 3. doi: 10.15398/jlm.v5i1.159.

5.1 Introduction

The Chomsky-Schützenberger theorem for context-free languages is widely known in formal language theory. Intuitively, it states that context-free languages are essentially well-bracketed strings (modulo a recognisable language and a homomorphism). A set of well-bracketed strings is called a Dyck language; the formal definition is given in definition 5.2. A Dyck language is a language over opening brackets from some set 훥 and closing brackets from some set 훥 where each opening bracket 훿 ∈ 훥 has exactly one closing bracket in 훥, denoted by 훿.̄ The elements of a Dyck language are all the well-matched strings of brackets. We will denote a Dyck language with opening brackets from the set 훥 by D(훥). Let us now recall this classic theorem.

Theorem 5.1 (taken from CS63, proposition 2). Let 훴 be a set and 퐿 ⊆ 훴∗ be a language. The following are equivalent: (i) 퐿 is a context-free language. (ii) There is a set 훥, a recognisable language 푅 ⊆ (훥 ∪ 훥)∗, and a string homomorphism ℎ: (훥 ∪ 훥)∗ → 훴∗ such that 퐿 = ℎ(D(훥) ∩ 푅).1 ∎

After its initial conception, the theorem has been generalised to a variety of settings: • to context-free languages weighted with commutative semirings [SS78, theorem 4.5], • to indexed languages [DPS79, theorems 1 and 2; FV15, theorem 4; FV16, theorem 18], • to tree-adjoining languages [Wei88, lemma 3.5.2], • to multiple context-free languages [YKS10, theorem 3],

1D(훥) denotes the set of well-bracketed words where the opening brackets are taken from 훥 and the closing bracket for each 훿 ∈ 훥 is 훿 ∈ 훥.

109 5 A Chomsky-Schützenberger characterisation of weighted MCFLs

• to context-free languages weighted with unital valuation monoids [DV13, theorem 2; DV14, theorem 5], • to yields of simple context-free tree languages [Kan14, theorem 8.3], and • to automata with storage weighted with unital valuation monoids [HV15, theorem 11]. We generalise the result of Yoshinaka, Kaji, and Seki [YKS10] to the weighted setting (more precisely to complete commutative semirings) using a technique that we call weight separation [as in DV13; DV14] in section 5.3. The “multiple Dyck languages” defined by Yoshinaka, Kaji, and Seki [YKS10] are defined with the help of MCFGs. We call them grammar multiple Dyck languages. We develop in section 5.4 an alternative which is defined independently of MCFGs by an equivalence relation. Consequently, we call these alternative languages equivalence multiple Dyck languages. We show the utility of equivalence multiple Dyck languages by showing that they can replace grammar multiple Dyck languages in the Chomsky-Schützenberger theorem of Yoshinaka, Kaji, and Seki [YKS10].

5.2 Preliminaries

Let 훥 be a set. Formally, 훥 denotes the smallest set that contains 훿 ̄for each 훿 ∈ 훥. Let us first recall the definition of Dyck languages. The cancellation relation w.r.t. 훥, denoted by ≡훥, is the smallest equivalence relation ≡ ⊆ (훥 ∪ 훥)∗ such that

̄ ∗ 푢0훿훿푢1 ≡ 푢0푢1 for each 훿 ∈ 훥 and 푢0, 푢1 ∈ (훥 ∪ 훥) .

Note that ≡훥 is a congruence relation on the free monoid over 훥 ∪ 훥 (cf. example 2.3). Definition 5.2. Let 훥 be a set. The Dyck language w.r.t. 훥, denoted by D(훥), is [휀] . ≡훥 □ The set of all Dyck languages is denoted by DYCK.

Example 5.3. Let 훥 = {a}. We can show by application of the equivalence relation ≡훥 that the string 푤 = aaaā ā ā is in D(훥). In the following chain the relation ≡훥 is used to cancel out the respective underlined parts:

aaaaā ̄ ā ≡훥 aaā ā ≡훥 aā ≡훥 휀 □

Any Dyck language can be represented by a context-free grammar. For example, the language D({a}) is represented by the context-free grammar (cf. example 2.8) with the two rules

푆 → 휀 and 푆 → a푆a푆 ̄ .

Let us recall that the Chomsky-Schützenberger theorem for CFGs represents a context-free language with the help of a string-homomorphism, a regular language, and a Dyck language. The most complex of these three objects is the Dyck language. (A string homomorphism canbe represented by a finite state transducer with one state and a regular language can be represented by a finite state automaton, but a Dyck language can not be represented by any finite state mechanism.) Consequently, Dyck languages are often called the generators of context-free

110 5.2 Preliminaries

languages because any context-free language can be obtained from a Dyck language using only finite state mechanisms. Yoshinaka, Kaji, and Seki [YKS10] identified generators of multiple context-free languages and called them multiple Dyck languages. They are defined with the help of a multiple context-free grammar and hence, in this disseration we will call them grammar multiple Dyck languages.

Definition 5.4 (taken from YKS10, definition 1, adapted to our notation). Let 훥 be a finite ℕ+-sorted set. Furthermore, let 푠 = max훿∈훥 sort훥(훿) and 푘 ≥ 푠. We define

̂ 푖 푖̄ 훥 = {훿 , 훿 ∣ 훿 ∈ 훥, 푖 ∈ [sort훥(훿)]}.

The multiple Dyck grammar with respect to 훥 and 푘 is the (푠, 푘)-MCFG(훥)̂

푘 ̂ 퐺훥 = ({퐴1, …, 퐴푠}, 훥, {퐴1}, 푅) where sort(퐴푖) = 푖 for each 푖 ∈ [푠], and 푅 contains exactly the following rules: (i) for every linear, non-deleting, and terminal-free tuple composition 푐 ∈ TCR(훥)̂ for which there are ℓ ∈ [푘] and 푚1, …, 푚ℓ, 푚 ∈ [푠] with sort(푐) = (푚1⋯푚ℓ, 푚), the rule

퐴 → 푐(퐴 , …, 퐴 ) is in 푅, 푚 푚1 푚ℓ

(ii) for every 푚 ∈ [푠] and 훿 ∈ 훥푚, the rule

1 1̄ 푚 푚̄ 퐴푚 → [훿 푥1,1훿 , …, 훿 푥1,푚훿 ](퐴푚) is in 푅, and

(iii) for every 푚 ∈ [푠], any rule

퐴푚 → [푢1, …, 푢푚](퐴푚)

1 1̄ 1 1̄ with 푢푖 ∈ {훿 훿 푥푖, 푥푖, 푥푖훿 훿 ∣ 훿 ∈ 훥1}, for every 푖 ∈ [푚], is in 푅.

The grammar multiple Dyck language with respect to 훥 and 푘, denoted by mDG(훥, 푘), is 푘 ℒ(퐺훥). We call 푠 the dimension of mDG(훥, 푘) and 푘 the rank of mDG(훥, 푘). The set of grammar multiple Dyck languages of dimension at most 푠 and rank at most 푘 is denoted by mDYCKG(푠, 푘). □

Using these generators, Yoshinaka, Kaji, and Seki [YKS10, theorem 3] showed the following theorem:

Theorem 5.5 (taken from YKS10, theorem 3). Let 훴 be a set, 퐿 ⊆ 훴∗ be a language, 푘 ∈ ℕ, and 푠 ∈ ℕ+. The following are equivalent: (i) 퐿 ∈ (푠, 푘)-MCFL(훴). (ii) There is a set 훥, a recognisable language 푅 ⊆ 훥∗, a grammar multiple Dyck language mD ⊆ 훥∗ of dimension 푠 and rank 푘, and a string homomorphim ℎ: 훥∗ → 훴∗ such that 퐿 = ℎ(mD ∩ 푅). ∎

111 5 A Chomsky-Schützenberger characterisation of weighted MCFLs

In the following, if the elements of 훥 are opening brackets in the conven- tional sense, then we will write for each element of 훥 the closing bracket corresponding to its counter-part in 훥.

Intuitively, the elements of the string language mD ∩ 푅 encode the derivations of an MCFG that generates 퐿. The language 푅 models the local properties of derivations, e.g. that derivations start from an initial non-terminal or that sub-derivations start from the correct non-terminal according to the parent rule. The language mD models long-distance properties, e.g. that for every rule that we start processing, we will eventually finish. In order to allow checking of local and long-distance properties, we use three kinds of brackets:

• the bracket string ⟦휎⟧휎 is an encoding of a terminal symbol 휎 ∈ 훴, 푚 푚 • brackets ⟦휌 and ⟧휌 mark the beginning and the end of sub-strings that correspond to the 푚-th component of the rule 휌, and 푗 푗 • brackets ⟦휌,푖 and ⟧휌,푖 mark the beginning and the end of a sub-string that corresponds to the variable 푥푖,푗 in the rule 휌. The homomorphism ℎ transforms the encoded derivations into the corresponding yields (i.e. into strings over 훴). Using this intuition, we recall the Chomsky-Schützenberger representation of MCFGs.

Definition 5.6 (taken from YKS10, section 3.2). Let 퐺 = (푁, 훴, 푁i, 푅) be an MCFG. The generator set with respect to 퐺 is the ℕ-sorted set

훥 = {⟦휎 ∣ 휎 ∈ 훴} ∪ {⟦휌 ∣ 휌 ∈ 푅} ∪ {⟦휌,푖 ∣ 휌 ∈ 푅, 푖 ∈ [rank(휌)]} where sort(⟦휎) = 1, sort(⟦휌) = fanout(휌), and sort(⟦휌,푖) = fanout푖(휌) for every 휎 ∈ 훴, ∗ 휌 ∈ 푅, and 푖 ∈ rank(휌). For each 푢 = 휎1⋯휎푚 ∈ 훴 (with 푚 ∈ ℕ and 휎1, …, 휎푚 ∈ 훴), we will abbreviate ⟦1 ⟧1 ⋯⟦1 ⟧1 by ̃푢. The grammar multiple Dyck language with respect to 퐺, 휎1 휎1 휎푚 휎푚 denoted by mDG(퐺), is mDG(훥, 푘) where 푘 is the rank of 퐺. ̂ 1 The automaton with respect to 퐺 is the FSA ℳ(퐺) = (푄, 훥, {푆 ∣ 푆 ∈ 푁i}, {푞f}, 푇 ), where

푗 푄 = {퐴 ∣ 퐴 ∈ 푁, 푗 ∈ [sort(퐴)]} ∪ {푞f}

and 푇 contains for every rule 휌 = 퐴 → [푣1, …, 푣푠](퐵1, …, 퐵푘) ∈ 푅 and each 푚 ∈ [푠] (where 푣 is of the form 푢 푥 푢 ⋯푥 푢 with 푢 , …, 푢 ∈ 훴∗) 푚 푚,0 푖(푚,1),푗(푚,1) 푚,1 푖(푚,ℓ푚),푗(푚,ℓ푚) 푚,ℓ푚 푚,0 푚,ℓ푚 exactly the following transitions

푚 푚 푚 (퐴 ,⟦휌 ̃푢푚,0⟧휌 , 푞f) if ℓ푚 = 0, 푚 푚 푗(푚,1) 푗(푚,1) (퐴 ,⟦휌 ̃푢푚,0⟦휌,푖(푚,1), 퐵푖(푚,1) ) if ℓ푚 > 0, 푗(푚,푧) 푗(푚,푧+1) 푗(푚,푧+1) (푞f,⟧휌,푖(푚,푧)̃푢푚,푧⟦휌,푖(푚,푧+1), 퐵푖(푚,푧+1) ) if ℓ푚 > 0, for every 푧 ∈ [ℓ푚 − 1], and

푗푚,ℓ푚 푚 (푞 ,⟧ ̃푢푚,ℓ ⟧휌 , 푞 ) if ℓ푚 > 0. f 휌,푖(푚,ℓ푚) 푚 f The recognisable language with respect to 퐺 is 푅(퐺) = ℒ(ℳ(퐺)).

112 5.3 The Chomsky-Schützenberger characterisation

⟦1 ⟦1 휌1 휌 ,1 1 1 1 ⟦1 ̃푎⟦1 start 푆 퐴 휌2 휌2,1

⟦1 ⟧1 휌3 휌3 2 1 1 1 2 2 {⟧ ⟧휌 ,⟧ ⟧휌 , ⟦ ⟧ 휌1,2 1 휌2,1 2 휌4 휌4 ⟧2 ⟧2 ,⟧1 ⟧1 ,⟧2 ⟧2 } 휌2,1 휌2 휌3,1 휌3 휌3,1 휌3 퐴2 1 2 ⟧ ⟦ 푞f 휌1,2 휌1,1 ⟧2 ⟦2 휌1,1 휌1,2 ⟦2 ̃푐⟦2 휌2 휌2,1 ⟦2 ⟧2 휌5 휌5

퐵2 ⟦1 ⟧1 휌5 휌5 ⟧1 ⟦1 휌1,1 휌1,2 ⟦2 푑⟦̃ 2 휌3 휌 ,1 ⟦1 푏⟦̃ 1 1 3 휌3 휌3,1 퐵

Figure 5.7: The automaton with respect to 퐺 [essentially taken from Den15, figure 2] where 퐺

is taken from example 2.20. The loop on state 푞f is labelled with a set of five strings; it represents a set of five transitions, each reading one of the strings.

The homomorphism with respect to 퐺 is the string-homomorphism hom(퐺): 훥̂∗ → 훴∗ where

1 휎 if 훿 = ⟦휎 for some 휎 ∈ 훴 hom(퐺)(훿) = { } . □ 휀 otherwise

Figure 5.7 shows an example to illustrate the construction of ℳ(퐺) for some MCFG 퐺.

5.3 The Chomsky-Schützenberger characterisation

Using theorem 5.5 and our preparation in section 2.3.4 of the preliminaries, we can immediately extend the unweighted Chomsky-Schützenberger theorem to the weighted setting.

Theorem 5.8 (taken from Den17b, theorem 3.12). Let 훴 be a set, 풜 be a complete commutative ∗ semiring, 퐿: 훴 → 풜 be a weighted language, 푘 ∈ ℕ, and 푠 ∈ ℕ+. The following are equivalent: (i) 퐿 ∈ (푠, 푘)-MCFL(훴, 풜). (ii) There is a set 훥, a recognisable language 푅 ⊆ 훥∗, a grammar multiple Dyck language mD ⊆ 훥∗, and a weighted string homomorphims ℎ: 훥∗ → (훴∗ → 풜) such that 퐿 = ℎ(mD ∩ 푅).

113 5 A Chomsky-Schützenberger characterisation of weighted MCFLs

Proof. We derive the following:

퐿 ∈ (푠, 푘)-MCFL(훴, 풜)

′ ⟺ ∃set 훤 , ℎ2 ∈ αHOM(훤 , 훴, 풜), unambiguous (푠, 푘)-MCFG(훤 ) 퐺 : ′ 퐿 = ℎ2(ℒ(퐺 )) (by lemma 2.74) ∗ ⟺ ∃sets 훥, 훤 , ℎ2 ∈ αHOM(훤 , 훴, 풜), recognisable language 푅 ⊆ 훥 , ∗ grammar multiple Dyck language mD ⊆ 훥 , ℎ1 ∈ αHOM(훥, 훤 ):

퐿 = ℎ2(ℎ1(푅 ∩ mD)) (by theorem 5.5) ⟺ ∃set 훥, ℎ ∈ αHOM(훥, 훴, 풜), recognisable language 푅 ⊆ 훥∗, grammar multiple Dyck language mD ⊆ 훥∗: 퐿 = ℎ(푅 ∩ mD) (by lemma 2.73 item (ii) and because id ∈ αHOM(훤 , 훥) for 훤 = 훥) ∎

5.4 Equivalence multiple Dyck languages

It has been noted by Kanazawa [Kan14, section 1] that multiple Dyck languages were only characterised in terms of MCFGs (see definition 5.4) and lacked an independent definition. Denkinger [Den15, Def. 7] gave a definition of multiple Dyck languages in terms of an equiva- lence relation. Since the definitions are not equivalent (cf. observation 5.15), we refer to them as grammar multiple Dyck languages and equivalence multiple Dyck languages, respectively.2 We demonstrate that equivalence multiple Dyck languages are useful (despite their difference to grammar multiple Dyck languages) by providing a Chomsky-Schützenberger representation that uses them (cf. section 5.4.3).

Definition 5.9 (taken from RD19, section 2, Chomsky-Schützenberger theorem). Let 훥 be a 3 set and 픓 be a partition of 훥. The cancellation relation w.r.t. 픓, denoted by ≡픓, is the smallest ̄ ∗ ̄ ∗ equivalence relation ≡ ⊆ 풫((훥∪훥) )×풫((훥∪훥) ) such that for any cell 픭 = {훿1, …, 훿푘} ∈ 픓 ∗ with |픭| = 푘, strings 푢0, …, 푢푘, 푣1, …, 푣푘 ∈ D(훥), and language 퐿 ⊆ (훥 ∪ 훥) :

{푢0훿1푣1훿1푢1⋯훿푘푣푘훿푘푢푘} ∪ 퐿 ≡ {푢0⋯푢푘, 푣1⋯푣푘} ∪ 퐿. □

̄ ∗ Note that ≡픓 is a congruence relation on the monoid (풫((훥 ∪ 훥) ), ∪, ∅).

Definition 5.10 (taken from Den15, definition 7). Let 훥 be a set and 픓 a partition of 훥. The equivalence multiple Dyck language w.r.t. 픓, denoted by mD≡(픓), is the set ⋃(퐿 ∣ 퐿 ∈ [{휀}]) ⊆ ∗ (훥 ∪ 훥) . The dimension of mD≡(픓) is max픭∈픓|픭|. □

2In the following, equivalence multiple Dyck languages are presented as in Ruprecht and Denkinger [RD19, section 2]. The definition is equivalent to that in Denkinger [Den15]. 3Recall from section 2.1.1 that a partition of 훥 is a subset of 풫(훥) whose elements are non-empty, disjoint, and cover 훥.

114 5.4 Equivalence multiple Dyck languages

The set of equivalence multiple Dyck languages that have a dimension ofatmost 푠 is denoted by mDYCK≡(푠) for any 푠 ∈ ℕ+.

Example 5.11 (taken from Den15, example 8). Let 훥 = {(, ⦅, [, ⟦} and 픓 = {픭1, 픭2} with ̄ ̄ ̄ ̄ 픭1 = {(, ⦅} and 픭2 = {[, ⟦}. We will write ), ⦆, ], and ⟧ instead of (, ⦅, [, and ⟦, respectively. Using the cancellation relation w.r.t. 픓, we see that ⟦()⟧[⦅⦆] and ⟦()⟧[]⟦⟧[⦅⦆] are in mD≡(픓):

⟦()⟧[⦅⦆] ∈ {⟦()⟧[⦅⦆]} ≡픓 {휀, ()⦅⦆} ≡픓 {휀} and

⟦()⟧[]⟦⟧[⦅⦆] ∈ {⟦()⟧[]⟦⟧[⦅⦆]} ≡픓 {()⦅⦆, []⟦⟧} ≡픓 {휀, []⟦⟧} ≡픓 {휀}, where we underlined the brackets that are removed in each step.

But ⟦()⟧⦅[]⦆ is not in mD≡(픓), since no set that contains ⟦()⟧⦅[]⦆ can be reduced to {휀} using the cancellation relation with respect to 픓. To prove this, let 퐿 ⊆ (훥 ∪ 훥)∗ with ⟦()⟧⦅[]⦆ ∈ 퐿.

Case 1: Let 퐿 ∖ {⟦()⟧⦅[]⦆} ≢픓 {휀}. Then 퐿 ∖ {⟦()⟧⦅[]⦆} ≢픓 {휀} since, as a congruence relation, ≡픓 respects set union ∪.

Case 2: Let 퐿 ∖ {⟦()⟧⦅[]⦆} ≡픓 {휀}. Then 퐿 ≡픓 {⟦()⟧⦅[]⦆} since ≡픓 respects set union ∪. But {⟦()⟧⦅[]⦆} can not be further reduced since no choice of cell from 픓 allows a reduction:

Case 2.1: Let us take 픭1 = {훿1, 훿2} = {(, ⦅} ∈ 픓. Then, when we match ⟦()⟧⦅[]⦆ with the

pattern 푢0훿1푣1훿1푢1⋯훿푘푣푘훿푘푢푘, we obtain the following:

⏟⟦ ⏟( ⏟휀 ⏟) ⏟⟧ ⏟⦅ []⏟ ⏟⦆ ⏟휀

푢0 훿 푣1 푢1 훿2 푣2 푢2 1 훿1 훿2

But since 푢0 = ⟦ and 푢1 = ⟧ are not in D(훥), the cancellation relation does not apply. Case 2.2: Let us take 픭2 = {훿1, 훿2} = {[, ⟦} ∈ 픓. Then, when we match ⟦()⟧⦅[]⦆ with the

pattern 푢0훿1푣1훿1푢1⋯훿푘푣푘훿푘푢푘, we obtain the following:

⏟휀 ⏟⟦ ()⏟ ⏟⟧ ⏟⦅ ⏟[ ⏟휀 ⏟] ⏟⦆

푢0 훿 푣 푢1 훿 푣2 푢2 1 1 훿1 2 훿2

But since 푢1 = ⦅ and 푢2 = ⦆ are not in D(훥), the cancellation relation does not apply. □

In general, since ≡픓 respects set union ∪, we can make the following observation.

Observation 5.12. Let 훥 be a set and 픓 be a partition of 훥. Furthermore, let 푤 ∈ (훥 ∪ 훥)∗. The following are equivalent:

(i) 푤 ∈ mD≡(픓), (ii) there is an 퐿 ∈ [{휀}] such that 푤 ∈ 퐿, and ≡픓 (iii) {푤} ∈ [{휀}] . ∎ ≡픓

Furthermore, since the cells 픭 in the partition 픓 of 훥 are sets and hence unordered, we get the following observation.

115 5 A Chomsky-Schützenberger characterisation of weighted MCFLs

Observation 5.13 (taken from Den15, observation 9). Let 훥 be a set and 픓 be a partition of 훥. Furthermore, let 푢1, …, 푢푘 ∈ D(훥) with 푢1⋯푢푘 ∈ mD≡(픓). Then every permutation of 푢1, …, 푢푘 is in mD≡(픓), i.e. for each surjective function 휋: [푘] → [푘], the string 푢휋(1)⋯푢휋(푘) is in mD≡(픓). ∎

Similar to grammar multiple Dyck languages, the equivalence multiple Dyck languages cover the Dyck languages by setting the dimension to 1. Also, they form a (strict) hierarchy with increasing dimension.

Proposition 5.14 (taken from Den15, proposition 13).

DYCK = mDYCK≡(1) ⊊ mDYCK≡(2) ⊊ …

Proof. For DYCK = mDYCK≡(1): Let 훥 be a set. The dimension of some partition 픓 of 훥 is 1 if and only if 픓 = {{훿} ∣ 훿 ∈ 훥}. The claim then follows easily from definition 5.10.

For mDYCK≡(푠 − 1) ⊆ mDYCK≡(푠): Follows directly from the definition of mDYCK≡(푠), in particular the part “of at most dimension 푠”.

For mDYCK≡(푠 − 1) ≠ mDYCK≡(푠): Let 훥 be a set and 픓 be a partition of 훥 such that there is a 픭 in 픓 with size 푠. Furthermore, let 훿1, …, 훿푠 ∈ 훥 with 픭 = {훿1, …, 훿푠}. Assume that there ′ ′ is an mD≡(픓 ) ∈ mDYCK≡(푠 − 1) such that mD≡(픓 ) = mD≡(픓). Since 훿1훿1⋯훿푠훿푠 is in mD≡(픓) and mD≡(픓) has dimension strictly less than 푠, there must be at least two pairwise ′ ′ ′ ′ ′ different cells 픭1, …, 픭푘 in 픓 such that 픭 = 픭1 ∪ ⋯ ∪ 픭푘. Let, without loss of generality, ′ ′ ′ ′ ′ ′ ′ ′ ′ ′ 훿1, …, 훿푖 ∈ 훥 such that 픭1 = {훿1, …, 훿푖} and |픭1| = 푖. Then 푤 = 훿1훿1⋯훿푠훿푠 훿1훿1⋯훿푖훿푖 is in ′ ′ mD≡(픓 ). However, 푤 is not in mD≡(픓) which contradicts the assumption. ∎

5.4.1 Relation between grammar and equivalence multiple Dyck languages It is easy to see from their definition that equivalence multiple Dyck languages are unable to encode the rank 푘 that is encoded into a multiple Dyck grammar. Hence, equivalence multiple Dyck languages and grammar multiple Dyck languages are different language classes.

Observation 5.15 (taken from Den17b, observation 4.6). Let mDG be a grammar multiple Dyck language and mD≡ be an equivalence multiple Dyck language. Then mDG ≠ mD≡. ∎

However, for any grammar multiple Dyck language, we can at least find an equivalence multiple Dyck language that subsumes it.

Proposition 5.16 (taken from Den17b, proposition 4.5). Let 푠, 푘 ∈ ℕ+. For each mDG ∈ mDYCKG(푠, 푘), there is an mD≡ ∈ mDYCK≡(푠) such that mDG ⊆ mD≡.

Proof idea. Since mDG ∈ mDYCKG(푠, 푘), there is an ℕ+-sorted set 훥 with max훿∈훥 sort훥(훿) ≤ 푠 ′ 푖 and mDG = mDG(훥, 푘). Let 훥 = {훿 ∣ 훿 ∈ 훥, 푖 ∈ [sort훥(훿)]}. We construct an equivalence multiple Dyck language mD≡ with at most dimension 푠 such that, if a tuple (푤1, …, 푤푚) can be 푘 ′ generated in 퐺훥 from non-terminal 퐴푚, then 푤1, …, 푤푚 ∈ D(훥 ) and 푤1⋯푤푚 ∈ mD≡. We 푘 prove the correctness of our construction by induction on the structure of derivations of 퐺훥.

116 5.4 Equivalence multiple Dyck languages

Proof. Let mDG ∈ mDYCKG(푠, 푘). Then there is an ℕ+-sorted set 훥 such that mDG = mD(훥, 푘) 푖 ′ and max훿∈훥 sort훥(훿) ≤ 푠. We define 픭훿 = {훿 ∣ 푖 ∈ [sort훥(훿)]} for every 훿 ∈ 훥, 훥 = ⋃훿∈훥 픭훿, and 픓 = {픭훿 ∣ 훿 ∈ 훥}. Clearly, max픭∈픓|픭| ≤ 푠. Thus mD≡(픓) ∈ mDYCK≡(푠). 푘 푘 Let Tup(퐺훥, 퐴푚) denote the set of tuples generated in 퐺훥 when starting with non-terminal 퐴푚 where 퐴푚 is not necessarily initial. In the following, we show that for every 푚 ∈ ′ ′̄ ∗ [max훿∈훥 sort훥(훿)] and 푤1, …, 푤푚 ∈ (훥 ∪ 훥 ) :

푘 ′ (푤1, …, 푤푚) ∈ Tup(퐺훥, 퐴푚) ⟹ 푤1⋯푤푚 ∈ mD≡(픓) ∧ 푤1, …, 푤푚 ∈ D(훥 ). (IH)

푘 푘 It follows from the definitions of Tup and 퐺훥 that (푤1, …, 푤푚) ∈ Tup(퐺훥, 퐴푚) implies that 푚 there are a rule 퐴 → 푐(퐴 , …, 퐴 ) in 퐺푘 and tuples ⃗푢 = (푢1, …, 푢 푖 ) ∈ Tup(퐺푘 , 퐴 ) 푚 푚1 푚ℓ 훥 푖 푖 푖 훥 푚푖 for every 푖 ∈ [ℓ] such that 푐( ⃗푢1, …, ⃗푢ℓ) = (푤1, …, 푤푚). By applying the induction hypothesis ℓ 1 푚1 1 푚ℓ 1 푚1 1 푚ℓ times, we obtain 푢1, …, 푢1 , …, 푢ℓ , …, 푢ℓ ∈ D(훴) and 푢1⋯푢1 , …, 푢ℓ ⋯푢ℓ ∈ mD≡(픓). We 푘 distinguish three cases (each corresponding to one type of rule in 퐺훥): (i) Let 푐 be a linear, non-deleting, and terminal-free tuple composition. Then we have for 1 푚1 1 푚ℓ ∗ ′ every 푖 ∈ [푚] that 푤푖 ∈ {푢1, …, 푢1 , …, 푢ℓ , …, 푢ℓ } and therefore also 푤푖 ∈ D(훥 ). Furthermore, by applying observation 5.13, we obtain 푤1⋯푤푚 ∈ mD≡(픓). 1 1 1̄ 푚 푚 푚̄ (ii) Let 푐 = [훿 푥1훿 , …, 훿 푥1 훿 ]. Then ℓ = 1, 푚1 = 푚, and for every 푖 ∈ [푚] we 1 푖 푖̄ 푖 ′ ′ have 푤푖 = 훿 푢1훿 and since 푢1 ∈ D(훥 ), also 푤푖 ∈ D(훥 ). Furthermore, 푤1⋯푤푚 = 1 1 1̄ 푚 푚 푚̄ 훿 푢1훿 ⋯훿 푢1 훿 ∈ mD≡(픓) due to the cancellation relation. 1 1 1 1̄ 1 1̄ 1 (iii) Let 푐 = [푢1, …, 푢푚] where 푢푖 ∈ {푥푖 , 푥푖 훿 훿 , 훿 훿 푥푖 ∣ 훿 ∈ 훥1} for every 푖 ∈ [푚]. 1 1 1 1̄ 1 1̄ 1 Then 푤푖 ∈ {푢푖 , 푢푖 훿 훿 , 훿 훿 푢푖 ∣ 훿 ∈ 훥1} for every 푖 ∈ [푚], ℓ = 1, and 푚1 = 푚. ′ Since ≡훴 respects string composition ∘, we have that 푤1, …, 푤푚 ∈ D(훥 ). By applying observation 5.13, we have that 푤1⋯푤푚 ∈ mD≡(픓). ∎

5.4.2 Deciding membership in an equivalence multiple Dyck language We give an algorithm to decide membership in an equivalence multiple Dyck language (algo- rithm 5.18). It is closely related to the cancellation relation ≡픓 and thus provides an algorithmic view on equivalence multiple Dyck languages. Algorithm 5.18 works roughly as follows: It is a recursive algorithm. In every call of is- Member with some word 푤, it checks if {푤} can be reduced to {휀} by applications of the cancellation relation. It is sufficient to only check singleton sets since ≡픓 respects set union ∪. Furthermore, we only apply the cancellation relation from left to right (as given in its definition). In order to consider all possible applications of the cancellation relation, isMember checks all decompositions of the input string into non-empty Dyck words. For this purpose, we use a function split (cf. algorithm 5.17).

Description of isMember. In the following, all line numbers refer to algorithm 5.18. We first check if 푤 is in D(훥), e.g. with a context-free grammar (cf. example 2.8) or a pushdown automaton. If 푤 is not in D(훥), it is also not in mD≡(픓) and we return False. Otherwise, we split 푤 into shortest non-empty Dyck words (on line 5), i.e. we compute the tuple of

shortest strings (푣1, …, 푣ℓ) such that 푣1, …, 푣ℓ ∈ D(훥) ∖ {휀} and 훿1푣1훿1⋯훿ℓ푣ℓ훿ℓ = 푤 for some

117 5 A Chomsky-Schützenberger characterisation of weighted MCFLs

Algorithm 5.17 Algorithm to split a word in D(훥) into shortest non-empty strings from D(훥) [taken from Den15, algorithm 2] Input: alphabet 훥, 푤 ∈ D(훴) Output: tuple (푢1, …, 푢ℓ) of shortest strings 푢1, …, 푢ℓ ∈ D(훥) ∖ {휀} such that 푤 = 푢1 ⋯ 푢ℓ

1: function Split(훥, 푤)

2: let 훿1, …, 훿푘 ∈ 훥 ∪ 훥 such that 푤 = 훿1⋯훿푘 3: let pd = 휀 ▷ pd is an empty pushdown 4: let 푗 = 1 and 푢1 = 휀 5: for 푖 ∈ [푘] do 6: append 훿푖 to the end of 푢푗

7: if 훿푖 ∈ 훥 then 8: remove 훿푖 from the end of pd ▷ never fails because 푤 ∈ D(훥) 9: if pd = 휀 then 10: let 푗 = 푗 + 1 and 푢푗 = 휀 11: end if 12: else 13: append 훿푖 to the end of pd 14: end if 15: end for 16: return (푢1, …, 푢푗−1) 17: end function

118 5.4 Equivalence multiple Dyck languages

Algorithm 5.18 Function isMember to decide membership in mD≡(픓) [taken from Den17b, algorithm 2] Input: a partition 픓 of some set 훥 and a string 푤 ∈ (훥 ∪ 훥)∗ Output: True if 푤 ∈ mD≡(픓), False otherwise 1: function isMember(픓, 푤) 2: if 푤 ∉ D(훥) then 3: return False 4: end if 5: let (훿1푣1훿1, …, 훿ℓ푣ℓ훿ℓ) = split(푤) such that 훿1, …, 훿ℓ ∈ 훥 6: let ℐ = ∅ 7: for each 퐼 = {푖 , …, 푖 } ⊆ [ℓ] with {훿 , …, 훿 } ∈ 픓 do 1 푘 푖1 푖푘 8: if isMember(픓, 푣 ⋯푣 ) then 푖1 푖푘 9: add 퐼 as an element to ℐ 10: end if 11: end for 12: for each 퐽 ⊆ ℐ do 13: if 퐽 is a partition of [ℓ] then 14: return True 15: end if 16: end for 17: return False 18: end function

훿1, …, 훿ℓ ∈ 훥. We denote (훿1푣1훿1, …, 훿ℓ푣ℓ훿ℓ) by split(푤). Note that split(푤) can be calculated in time and space linear in |푤| with the help of a (deterministic) pushdown transducer [cf. AU72, Section 3.1.4].4 Since each of these shortest non-empty Dyck words has the form 훿푢훿 for some ∗ 훿 ∈ 훥 and 푣 ∈ (훥 ∪ 훥) , we write (훿1푣1훿1, …, 훿ℓ푣ℓ훿ℓ) for the left-hand side of the assignment on line 5. On lines 6 to 11 we calculate the set ℐ of sets of indices 퐼 = {푖1, …, 푖푘} such that the singleton set {훿 푣 훿 ⋯훿 푣 훿 } can be reduced to {휀} with the cancellation relation. This 푖1 푖1 푖1 푖푘 푖푘 푖푘 reduction is possible if there exists an appropriate cell in 픓 (checked on line 7) and if {푣 ⋯푣 } 푖1 푖푘 can be reduced to {휀} (checked on line 8). Therefore, at the end of line 11, each element of ℐ represents one possible application of the cancellation relation. In order for ≡훴,픓 to reduce

{푤} to {휀}, each component of (훿1푣1훿1, …, 훿ℓ푣ℓ훿ℓ) needs to be reduced (exactly once) in this manner. This is equivalent to a subset of ℐ being a partition of [ℓ] (checked on lines 12 to 16). If no such subset exists, then {푤} is not equivalent to {휀} and we return False on line 17.

Example 5.19 (taken from Den15; Den17b, examples 14 and 4.8, respectively). Recall from

4We initially write “(” on the output tape. Whenever we read an element of 훥, we write that element on the output tape and push it onto the pushdown. Whenever we read an element of 훥, we write this element on the output tape and pop it from the pushdown. Upon reaching the bottom of the pushdown stack, we write a “,” on the output tape. Finally, we write “)” on the output tape. Then the inscription of the output tapeis

“(훿1푣1훿1, …, 훿ℓ푣ℓ훿ℓ)”.

119 5 A Chomsky-Schützenberger characterisation of weighted MCFLs

Table 5.1: Run of algorithm 5.18 on the word ⟦()⟧[⦅⦆], cf. examples 5.19 and 5.11 [taken from Den17b, table 1].

isMember(픓, ⟦()⟧[⦅⦆]) line 5: ℓ = 2, 훿1 = ⟦, 훿2 = [, 푣1 = (), 푣2 = ⦅⦆ line 6: ℐ = ∅ line 7: 퐼 = {1, 2} line 8: isMember(픓, ()⦅⦆) line 5: ℓ = 2, 훿1 = (, 훿2 = ⦅, 푣1 = 휀 = 푣2 line 6: ℐ = ∅ line 7: 퐼 = {1, 2} line 8: isMember(픓, 휀) line 5: ℓ = 0 lines 6 and 11: ℐ = ∅ ▷ no subset of ∅ is an element of 픓 = {{(, ⦅}, {[, ⟦}} line 12: 퐽 = ∅ ▷ 퐽 = ∅ is a partition of [0] = ∅ line 14: return True lines 9 and 11: ℐ = {{1, 2}} line 12: 퐽 = ∅ ▷ 퐽 = ∅ is not a partition of [2] = {1, 2} line 12: 퐽 = {{1, 2}} ▷ 퐽 = {{1, 2}} is a partition of [2] = {1, 2} line 14: return True lines 9 and 11: ℐ = {{1, 2}} line 12: 퐽 = ∅ ▷ 퐽 = ∅ is not a partition of [2] = {1, 2} line 12: 퐽 = {{1, 2}} ▷ 퐽 = {{1, 2}} is a partition of [2] = {1, 2} line 14: return True

example 5.11 the sets 훥 = {(, ⦅, [, ⟦} and 픓 = {픭1, 픭2} with 픭1 = {(, ⦅} and 픭2 = {[, ⟦}. In the following, we use algorithm 5.18 to show that the words ⟦()⟧[⦅⦆] and ⟦()[]⟦⟧[⦅⦆] are in the languages mD≡(픓). For this, we give a record of the variable assignment for a run of our algorithm. We report a subset of the variable assignment at the end of lines 5, 6, 7, 8, 9, 11 and 12 as well as the returned values at the ends of lines 14 and 17. The runs for ⟦()⟧[⦅⦆] and ⟦()[]⟦⟧[⦅⦆] are shown in tables 5.1 and 5.2. Similarly, we show in table 5.3 that the word ⟦()⟧⦅[]⦆ is not in the language mD≡(픓). □

In light of the close relation between algorithm 5.18 and the cancellation relation ≡픓, we omit the proof of correctness.

Proof of termination for algorithm 5.18. We distinguish two cases. Case 1: Let 푤 ∉ D(훴). Then the algorithm terminates on line 3. Case 2: Let 푤 ∈ D(훴). Since 풫([|푤|]) is finite, the loop on lines 7 to 11 considers only finitely many values 퐼. Thus there are only finitely many calls to isMember on line 8 for each recursion. In the call of isMember, the length of the third argument 푣 ⋯푣 is strictly 푖1 푖푘

120 5.4 Equivalence multiple Dyck languages

Table 5.2: Run of algorithm 5.18 on the word ⟦()⟧[]⟦⟧[⦅⦆], cf. examples 5.19 and 5.11 [taken from Den17b, table 2]. isMember(픓, ⟦()⟧[]⟦⟧[⦅⦆]) line 5: ℓ = 4, 훿1 = ⟦ = 훿3, 훿2 = [ = 훿4, 푣1 = (), 푣2 = 휀 = 푣3, 푣4 = ⦅⦆ line 6: ℐ = ∅ line 7: 퐼 = {1, 2} line 8: isMember(픓, ()) line 5: ℓ = 1, 훿1 = (, 푣1 = 휀 lines 6 and 11: ℐ = ∅ ▷ no subset of {훿1} = {(} is an element of 픓 = {{(, ⦅}, {[, ⟦}} line 12: 퐽 = ∅ ▷ 퐽 = ∅ is not a partition of [1] = {1} line 17: return False line 11: ℐ = ∅ line 7: 퐼 = {1, 4} line 8: isMember(픓, ()⦅⦆) line 5: ℓ = 2, 훿1 = (, 훿2 = ⦅, 푣1 = 휀 = 푣2 line 6: ℐ = ∅ line 7: 퐼 = {1, 2} line 8: isMember(픓, 휀) line 5: ℓ = 0 lines 6 and 11: ℐ = ∅ ▷ no subset of ∅ is an element of 픓 = {{(, ⦅}, {[, ⟦}} line 12: 퐽 = ∅ ▷ 퐽 = ∅ is a partition of [0] = ∅ line 14: return True lines 9 and 11: ℐ = {{1, 2}} line 12: 퐽 = ∅ ▷ 퐽 = ∅ is not a partition of [2] = {1, 2} line 12: 퐽 = {{1, 2}} ▷ 퐽 = ∅ is a partition of [2] = {1, 2} line 14: return True lines 9 and 11: ℐ = {{1, 4}} line 7: 퐼 = {2, 3} line 8: isMember(픓, 휀) line 5: ℓ = 0 lines 6 and 11: ℐ = ∅ ▷ no subset of ∅ is an element of 픓 = {{(, ⦅}, {[, ⟦}} line 12: 퐽 = ∅ ▷ 퐽 = ∅ is a partition of [0] = ∅ line 14: return True lines 9 and 11: ℐ = {{1, 4}, {2, 3}} line 7: 퐼 = {3, 4} line 8: isMember(픓, ⦅⦆) line 5: ℓ = 1, 훿1 = ⦅, 푣1 = 휀 lines 6 and 11: ℐ = ∅ ▷ no subset of {훿1} = {⦅} is an element of 픓 = {{(, ⦅}, {[, ⟦}} line 12: 퐽 = ∅ ▷ 퐽 = ∅ is not a partition of [1] = {1} line 17: return False line 11: ℐ = {{1, 4}, {2, 3}} line 12: 퐽 = ∅ ▷ 퐽 = ∅ is not a partition of [4] = {1, 2, 3, 4} line 12: 퐽 = {{1, 4}} ▷ 퐽 = {{1, 4}} is not a partition of [4] = {1, 2, 3, 4} line 12: 퐽 = {{2, 3}} ▷ 퐽 = {{2, 3}} is not a partition of [4] = {1, 2, 3, 4} line 12: 퐽 = {{1, 4}, {2, 3}} ▷ 퐽 = {{1, 4}, {2, 3}} is a partition of [4] = {1, 2, 3, 4} line 14: return True

121 5 A Chomsky-Schützenberger characterisation of weighted MCFLs

Table 5.3: Run of algorithm 5.18 on the word ⟦()⟧⦅[]⦆, cf. examples 5.19 and 5.11.

isMember(픓, ⟦()⟧⦅[]⦆) line 5: ℓ = 2, 훿1 = ⟦, 훿2 = ⦅, 푣1 = (), 푣2 = [] lines 6 and 11: ℐ = ∅ ▷ no subset of {훿1, 훿2} = {⟦, ⦅} is an element of 픓 = {{(, ⦅}, {[, ⟦}} line 12: 퐽 = ∅ ▷ 퐽 = ∅ is not a partition of [2] = {1, 2} line 17: return False

smaller than the length of 푤. Therefore, after a finite number of recursions, the fourth argument passed to isMember becomes the empty word and the algorithm terminates. ∎

Time complexity of isMember. The worst case time complexity of algorithm 5.18 is atleast exponential in a polynomial of the length of the input word. This is due to the cardinality of ℐ and the for-loop on lines 12 to 16. Let 휅 be the number of different cells 픭 ∈ 픓 that occur in 훿1⋯훿ℓ, and let each symbol occur at most 푟 times. Both 휅 and 푟 have upper bound ℓ. Let 푠 be the dimension of 픓. Then there are at most 휅 ⋅ 푟푠−1 ≤ ℓ푠 values of 퐼 considered in the for-loop on lines 7 to 11. Since ℓ < |푤|, we execute this for-loop at most |푤|푠 times. Hence, 푠 ℐ has cardinality at most |푤|푠. Therefore, the for-loop on lines 12 to 16 considers 2|ℐ| ≤ 2|푤| different values of 퐽 in the worst case.

5.4.3 A Chomsky-Schützenberger theorem using equivalence multiple Dyck languages In this section, we will prove the following theorem. Theorem 5.20 (taken from Den15; Den17b, theorems 19 and 4.11, respectively). Let 훴 be a set and 풜 be a commutative semiring. For every 풜-weighted multiple context-free language 퐿: 훴∗ → 풜, there is set 훥, a weighted string homomorphism ℎ: 훥∗ → (훴∗ → 풜), a regular language 푅 ⊆ 훥∗, and an equivalence multiple Dyck language mD ⊆ 훥∗ such that 퐿 = ℎ(mD ∩ 푅). Note that while every weighted multiple context-free language can be decomposed into a regular language, a weighted homomorphism, and an equivalence multiple Dyck language, the converse is not true: Not every homomorphic image of the intersection of a regular language and an equivalence multiple Dyck language is a multiple context-free language. This is easy to see if we recall that every multiple context-free grammar has a fixed rank. Let us select a multiple Dyck grammar with rank 푛. Furthermore, we select 푘 equivalence multiple Dyck 1 푚1 1 푚푘 푗 words 푢1⋯푢1 , …, 푢푘⋯푢푘 (with the 푢푖’s being Dyck words) such that 푘 > 푛. Then each 1 푚1 1 푚푘 permutation of 푢1, …, 푢1 , …, 푢푘, …, 푢푘 is again a multiple Dyck word. We select a primitive permutation. Since 푘 > 푛, the permutation would have to be decomposed in order to be representable by our multiple context-free grammar. This contradicts the permutation being primitive.

122 5.4 Equivalence multiple Dyck languages

Definition 5.21 (taken from Den17b, definition 4.9). Let 퐺 = (푁, 훴, 푁i, 푅) be an MCFG. The equivalence multiple Dyck language w.r.t. 퐺, denoted by mD≡(퐺), is mD≡(픓) where 픓 is the smallest set 푃 such that 1 • 픭휎 = {⟦휎} ∈ 푃 for every 휎 ∈ 훴, 푗 • 픭휌 = {⟦휌 ∣ 푗 ∈ [fanout(휌)]} ∈ 푃 for each 휌 ∈ 푅, and 푗 • 픭휌,푖 = {⟦휌,푖 ∣ 푗 ∈ [fanout푖(휌)]} ∈ 푃 for each 휌 ∈ 푅 and 푖 ∈ [rank(휌)]. □

Lemma 5.22 (taken from Den17b, lemma 4.10). 푅(퐺) ∩ mD≡(퐺) = 푅(퐺) ∩ mDG(퐺) for each MCFG 퐺.

Proof. For (⊆): From definitions 5.6 and 5.21 and proposition 5.16 follows that mDG(퐺) ⊆ mD≡(퐺). By monotonicity of ∩, we then get 푅(퐺) ∩ mD≡(퐺) ⊆ 푅(퐺) ∩ mDG(퐺). For (⊇): It is sufficient to show that mD≡(퐺) ⊇ 푅 ∩ mDG(퐺) which is a consequence of the following statement:

Let 퐵 ∈ 푁 and 푤1, …, 푤푚 be Dyck words such that 푤1⋯푤푚 ∈ mD≡(퐺). If 푤휅 is 휅 recognised in a run from 퐵 to 푞f in ℳ(퐺) for each 휅 ∈ [푚], then (푤1, …, 푤푚) rank(퐺) can be generated from 퐴푚 in 퐺훥 .

Now let 퐵 ∈ 푁 and 푤1, …, 푤푚 be Dyck words such that 푤1⋯푤푚 ∈ mD≡(퐺), and 푤휅 is 휅 recognised in a run from 퐵 to 푞f in ℳ(퐺) for every 휅 ∈ [푚]. By the definitions of ℳ(퐺) and mD≡(퐺), we know that there is some rule 휌 = 퐵 → 푓(퐵1, …, 퐵푘) ∈ 푅 with

푓 = [ 푢 푥 푢 ⋯ 푥 푢 , …, 1,0 푖(1,1),푗(1,1) 1,1 푖(1,푝1),푗(1,푝1) 1,푝1 푢 푥 푢 ⋯ 푥 푢 ] 푚,0 푖(푚,1),푗(푚,1) 푚,1 푖(푚,푝푚),푗(푚,푝푚) 푚,푝푚 such that for every 휅 ∈ [푚] either 휅 휅 (i) 푤휅 = ⟦휌 ̃푢휅,0⟧휌 or 휅 휅 푗(휅,1) 푗(휅,1) 푗(휅,1) 푗(휅,푝휅) 푗(휅,푝휅) 푗(휅,푝휅) (ii) 푤휅 = ⟦ ̃푢휅,0 ⟦ 푣 ⟧ ̃푢휅,1 ⋯ ⟦ 푣 ⟧ ̃푢휅,푝 ⟧ , 휌 휌,푖(휅,1) 푖(휅,1) 휌,푖(휅,1) 휌,푖(휅,푝휅) 푖(휅,푝휅) 휌,푖(휅,푝휅) 휅 휌 푗 푗 and the Dyck word 푣푖 is recognised in a run from 퐵푖 to 푞f in ℳ(퐺) for every 푖 ∈ [푘] and 1 sort(퐵푖) 푗 ∈ [sort(퐵푖)], and 푣푖 ⋯푣푖 ∈ mD≡(퐺) for every 푖 ∈ [푘]. Then by induction hypothesis, sort(퐵 ) (퐺) (푣1, …, 푣 푖 ) can be generated from 퐴 in 퐺rank for every 푖 ∈ [푘]. Using rules of 푖 푖 sort(퐵푖) 훥 types (ii) and (iii) (cf. definition 5.4), 퐴 , …, 퐴 can generate tuples that together sort(퐵1) sort(퐵푘) have exactly the following 푝1 + ⋯ + 푝푚 components: 푗(1,1) 푗(1,1) 푗(1,1) (1) ̃푢1,0 ⟦휌,푖(1,1)푣푖(1,1) ⟧휌,푖(1,1) ̃푢1,1 푗(1,2) 푗(1,2) 푗(1,2) (2) ⟦휌,푖(1,2)푣푖(1,2) ⟧휌,푖(1,2) ̃푢1,2 …

푗(1,푝1) 푗(1,푝1) 푗(1,푝1) (푝1) ⟦ 푣 ⟧ ̃푢1,푝 휌,푖(1,푝1) 푖(1,푝1) 휌,푖(1,푝1) 1 … 푗(푚,1) 푗(푚,1) 푗(푚,1) (푝1 + ⋯ + 푝푚−1 + 1) ̃푢푚,0 ⟦휌,푖(푚,1)푣푖(푚,1) ⟧휌,푖(푚,1) ̃푢푚,1

123 5 A Chomsky-Schützenberger characterisation of weighted MCFLs

푗(푚,2) 푗(푚,2) 푗(푚,2) (푝1 + ⋯ + 푝푚−1 + 2) ⟦휌,푖(푚,2)푣푖(푚,2) ⟧휌,푖(푚,2) ̃푢푚,2 …

푗(푚,푝푚) 푗(푚,푝푚) 푗(푚,푝푚) (푝1 + ⋯ + 푝푚) ⟦ 푣 ⟧ ̃푢푚,푝 . 휌,푖(푚,푝푚) 푖(푚,푝푚) 휌,푖(푚,푝푚) 푚

With the help of these components and a rule of type (i), we can generate from 퐴푚 the tuple ′ ′ ′ 푤1, …, 푤푚 where 푤휅 is the concatenation of components (푝1 +⋯+푝휅−1 +1) to (푝1 +⋯+푝휅) rank(퐺) for each 휅 ∈ [푚]. Using a rule of type (ii), we finally obtain (푤1, …, 푤푚) from 퐴푚 in 퐺훥 . 휅 ′ 휅 (In particular, 푤휅 = ⟦휌 푤휅⟧휌 for each 휅 ∈ [푚].) ∎ Proof of Theorem 5.20. This follows immediately from theorem 5.8 and lemma 5.22. ∎

124 6 Parsing of natural languages

6.1 Introduction

Parsing is a mechanism by which a string (e.g. a program written in a programming language or a sentence written in a natural language) is annotated with syntactic structure (usually abstract syntax trees for programming languages and parse trees or dependency graphs for natural languages). The obtained syntactic structure may serve as an input for other processes suchas compilation, translation, semantic analysis, question answering, and information extraction [cf. JM09, chapter 13]. There are multiple ways to augment a string with syntactic structure. The classical methods for both programming languages and for natural language are grammar-based, i.e. a formal grammar (such as an extended Backus-Naur form or a context-free grammar) is used to model the language. Such methods allow direct modelling of the possible syntactic structures and allow easy debugging and manual fine-tuning since grammars are closely related to the generated syntactic structures. Hence they are state-of-the-art for parsing programming languages. With natural languages, however, (depending on the underlying grammar formalism) grammar-based parsing may be prohibitively slow and resource-consuming for settings such as real-time processing or in embedded devices. This deficit led to the introduction of transition-based methods that are lightning-fast and have a small memory footprint [Cov01; Niv03; Niv08]. The designer of a transition-based parsing method usually defines a handful of so-called transitions that transform the input string into a syntactic structure while reading and writing on some internal data structure (e.g. a pushdown). In each instant, the transition-based parser selects a transition to be applied. The selection mechanism is obtained with the help of various machine learning methods [e.g. YM03; MCP05; GF15; VG17]. Unfortunately, transition-based methods deny their designer direct influence on the generated syntactic structures. Furthermore, as opposed to grammar-based systems, they do not lend themselves to innovation through advances in formal language theory. Therefore, we will only deal with grammar-based parsing inthis dissertation. Also, we will restrict ourselves to constituent parsing, i.e. the production of parse trees (as opposed to dependency graphs). It is desirable for programming languages to be unambiguous, i.e. every program should have exactly one corresponding abstract syntax tree. In natural languages, however, ambiguity occurs naturally. Consider for example the sentence I saw the man with the telescope. which is widely used to illustrate ambiguity. The sentence is both semantically ambiguous (i.e. there are at least two meanings) and syntactically ambiguous (i.e. there are at least two parse trees). The following two semantic interpretations witness this: (i) I saw the man. The man had the telescope.

125 6 Parsing of natural languages

(ii) I used the telescope to see the man. and the two parse trees for the sentence shown in figure 6.1. Note that in the first parse tree,the subtree under PP is attached to the NP that spans the object of the clause. In the secondparse tree, the subtree under PP is attached directly to the VP of the sentence. The first andsecond parse tree corresponds to the first and second semantic interpretation, respectively. It maybe more likely to encounter the second semantic interpretation (together with the second parse tree) in the real world. Different likelihoods are usually modelled by assigning a probability to each parse tree. However, the algorithms presented in this chapter will not only work for probabilities but for any (suitably restricted) partially ordered commutative monoid with zero.

Definition 6.2 (taken from RD19, section 2). A partially ordered commutative monoid with zero (short: POCMOZ) is a tuple (풜, ⊙, ퟙ, ퟘ, ⊴) where (풜, ⊙, ퟙ) is a commutative monoid, ퟘ ∈ 픸 is absorbing w.r.t. ⊙, and ⊴ ⊆ 풜 × 풜 is a partial order such that ퟘ ⊴ 푎 and 푎1 ⊙ 푎2 ⊴ 푎1 for every 푎, 푎1, 푎2 ∈ 풜. □

Example 6.3. Let us find POCMOZs based on the algebras in example 2.5. • The Boolean semiring (픹, ∨, ∧, 0, 1) gives rise to two POCMOZs: – the Boolean POCMOZ I (픹, ∨, 0, 1, ≥) and – the Boolean POCMOZ II (픹, ∧, 1, 0, ≤). • Using the minimum operation min: ℝ×ℝ → ℝ or the maximum operation max: ℝ×ℝ → ℝ as a basis, which are present in the tropical semiring, the arctic semiring, the tropical bimonoid, and the arctic bimonoid, we obtain the following POCMOZs:

– the tropical POCMOZ I (ℝ≥0 ∪ {∞}, min, ∞, 0, ≤) and

– the arctic POCMOZ I (ℝ≤0 ∪ {−∞}, max, −∞, 0, ≥). • Using the addition +: ℝ × ℝ → ℝ as a basis, which is present in the probability semiring (with and without ∞), the tropical semiring, the arctic semiring, the tropical bimonoid, and the arctic bimonoid, we obtain the following POCMOZs: – the tropical POCMOZ II (ℝ ∪ {∞}, +, 0, ∞, ≥) and – the arctic POCMOZ II (ℝ ∪ {−∞}, +, 0, −∞, ≤). • Using the multiplication ⋅: ℝ × ℝ → ℝ as a basis, which is present in the probability semiring (with and without ∞), the algebras Pr1 and Pr2, and the Viterbi semiring, we obtain the following POCMOZ: – the Viterbi POCMOZ ([0, 1], ⋅, 1, 0, ≤).

• Using the operations +1: ℝ × ℝ → ℝ or +2: ℝ × ℝ → ℝ as a basis, which are present in the algebras Pr1 and Pr2, we obtain the following POCMOZs:

– the +1-POCMOZ ([0, 1], +1, 0, 1, ≥) and

– the +2-POCMOZ ([0, 1], +2, 0, 1, ≥). • Using the operation ∧: 훴∗ × 훴∗ → 훴∗ that calculates the longest common prefix of two strings over some set 훴, we obtain the following POCMOZ:

126 6.1 Introduction

S

NP VP

PRP VBD NP

I saw NP PP

DT NN IN NP

the man with DT NN

the telescope S

NP VP

PRP VBD NP PP

I saw DT NN IN NP

the man with DT NN

the telescope

Figure 6.1: Syntactic ambiguity of the sentence „I saw the man with the telescope“. (See table 6.1 for information on the labels of the inner nodes.)

Table 6.1: Excerpt of the part-of-speech tags and the syntactic tags used in the Penn treebank [TMS03].

tag – description S – clause NP – noun phrase

tags VP – verb phrase

syntactic PP – prepositional phrase PRP – personal pronoun VBD – verb, past tense DT – determiner tags NN – noun, singular or mass

part-of-speech IN – preposition or subordinating conjunction

127 6 Parsing of natural languages

– the prefix POCMOZ (훴∗ ∪ {∞}, ∧, ∞, 휀, ≼) where ∞ ∉ 훴 such that for each 푢, 푣 ∈ 훴∗: ∞ ∧ ∞ = ∞, ∞ ∧ 푢 = 푢, 푢 ≼ ∞, and 푢 ≼ 푣 iff 푢 is a prefix of 푣. • No POCMOZ is based on concatenation ∘: 훴∗ × 훴∗ → 훴∗ since it is not commutative. • Any bounded lattice (퐿, ∨, ∧, ⊥, ⊤) where ∨ is the lattice join and ∧ is the lattice meet, gives rise to (at least) the following two POCMOZs:

– the ∨-POCMOZ (퐿, ∨, ⊥, ⊤, ⊒) where, for each 푙1, 푙2 ∈ 퐿, 푙1 ⊒ 푙2 if and only if 푙1 ∨ 푙2 = 푙1, and

– the ∧-POCMOZ (퐿, ∧, ⊤, ⊥, ⊑) where, for each 푙1, 푙2 ∈ 퐿, 푙1 ⊑ 푙2 if and only if 푙1 ∧ 푙2 = 푙1.

Note that this includes two POCMOZs based on Pow퐴 for any set 퐴 and two POCMOZs based on Div. • The lattice Div = (ℕ, lcm, gcd, 1, 0) gives rise to two additional POCMOZs:

– the Div-POCMOZ I (ℕ+ ∪ {∞}, lcm, 1, ∞, ≥) and

– the Div-POCMOZ II (ℕ+ ∪ {∞}, gcd, ∞, 1, ≤). □ Let (풜, ⊙, ퟘ, ퟙ, ⊴) be a POCMOZ, 퐵 be a set, and 푓: 퐵 → 풜. The support of 푓, denoted by supp(푓), is the set {푏 ∈ 퐵 ∣ 푓(푏) ≠ ퟘ}. Note that for each POCMOZ (풜, ⊙, ퟘ, ퟙ, ⊴) there exists a corresponding strong bimonoid (풜, ⊕, ⊙, ퟘ, ퟙ) where ⊕ is some operation on 풜.

For the rest of this chapter, let (풜, ⊙, ퟘ, ퟙ, ⊴) be an arbitrary POCMOZ.

The use of a POCMOZ enables us to calculate those parse trees thatare best, e.g. most likely to occur in the real world. Selecting the best, say 푛, elements from some set w.r.t. some weight- assigning function is formally described by the 푛-best function. Since the 푛 best elements of a set may not be uniquely defined, the 푛-best function returns the set of all possible strings of 푛 best elements. Definition 6.4 (taken from Den17b, definition 5.2). Let 훺 be a set, 푓: 훺 → 풜, and 푛 ∈ ℕ. The 푛-best function w.r.t. 푓, denoted by 푛-best(푓), is a function from 풫(훺) → 풫(훺∗) where for ′ ′ ′ every 훺 ⊆ 훺, 푘 ∈ [푛], and 휔1, …, 휔푘 ∈ 훺 we have that 휔1⋯휔푘 ∈ 푛-best(푓)(훺 ) if and only if the following four conditions hold: (i) 푘 = min{푛, |훺′|},

(ii) 휔1, …, 휔푘 are pairwise different,

(iii) 푓(휔푖) ⋪ 푓(휔푗) for each 푖, 푗 with 1 ≤ 푖 < 푗 ≤ 푘, and ′ (iv) 푓(휔푘) ⋪ 푓(휔) for each 휔 ∈ 훺 ∖ {휔1, …, 휔푘}. □ We can use the functions sort and take to solve the 푛-best parsing problem. Observation 6.5 (taken from Den17b, observation 5.4). Let 퐺 be an 풜-weighted MCFG over some set 훴 of terminals, 푛 ∈ ℕ, and 푤 ∈ 훴∗. Then

c c (sort(wt퐺); take(푛))(D퐺(푤)) ∈ 푛-best(wt퐺)(D퐺(푤)). ∎

128 6.2 Parsing weighted PMCFGs using weighted TSAs

푛-best parsing problem for 풜-weighted MCFGs [taken from Den17b, definition 5.3] Input: • an 풜-weighted MCFG 퐺 over some set 훴 of terminals

• an integer 푛 ∈ ℕ+ • a string 푤 ∈ 훴∗

c Output: • an element of 푛-best(wt퐺)(D퐺(푤))

However, this is not very efficient. Depending on the implementation of sort and take, this method may not even terminate. Each of the following three sections discusses a method to solve the 푛-best parsing problem.

6.2 Parsing weighted PMCFGs using weighted TSAs

With the help of chapter 3, we can design a parsing algorithm for weighted non-deleting PMCFGs.1 The parsing algorithm is based on two observations.

Observation 6.6. The search space of the accepting runs of a weighted automaton withdata storage ℳ can be represented as a (possibly infinite) weighted directed graph whose vertices are the configurations of ℳ (i.e. triples of states, storage configurations, and strings of terminal symbols of ℳ), edges are the transitions in the support of the weight assignment of ℳ, and edge weights are given by the weight assignment of ℳ. ∎

This observation allows us to find the accepting runs ofaword 푤 in a weighted automaton with data storage ℳ using a variant of the Dijkstra algorithm; shown in algorithm 6.7. The initial vertices of our graph are those with an initial state, an initial storage configuration, and the string 푤 that shall be parsed. Those initial vertices are enhanced with an initial weight and the empty run and added to the agenda Ag on line 3. Then, as long as we have not found enough runs yet (line 4), we select the best item from the agenda (lines 5 and 6), expand it according to the transitions in supp(휇) (line 10), and add the new items to the agenda (line 12). In addition to the configuration of ℳ, we keep track of the weight (the fourth component of the items in Ag) and the run (the fifth component). If the best item (according to the fourth component) in the agenda contains a final configuration of ℳ (in components 1–3), then we add the corresponding run (fourth component) to our output (line 8). Note that Dijkstra-Parse may return a string that is shorter than 푛. Also, the for-loop on line 11 is only guaranteed to terminate if DS is finitely non-deterministic.

1The automata characterisation of chapter 3 requires that 풜 is a complete commutative semiring. However, in this section it suffices that there is a bijection between the complete derivation trees ofagrammar 퐺 and the accepting runs of the corresponding automaton ℳ(퐺) that preserves the yield and the weight. The property 푃 from lemma 3.29 together with (†) from lemma 3.32 attest a bijection that preserves the yield. (Also, a grammar always has a grammar with only productive non-terminals that is equivalent.) Furthermore, since we are not interested in the weight of a word in a weighted language but only in the weight of derivations, we can ignore all properties that pertain to ⊕ in this chapter. Hence it suffices that 풜 is a POCMOZ.

129 6 Parsing of natural languages

Algorithm 6.7 Dijkstra 푛-best parsing for weighted automata with data storage

Input: • a (DS, 훴, 풜)-automaton ℳ = (푄, DS, 훴, 푄i, 푄f, 휇) with DS = (퐶, 퐼, 퐶i, 퐶f)

• an integer 푛 ∈ ℕ+ • a string 푤 ∈ 훴∗

acc Output: • an element of 푛-best(wtℳ)(Runsℳ(푤)) 1: function Dijkstra-Parse(ℳ, 푛, 푤) 2: let 푋 = ∅ ▷ 푋 is the set of runs of ℳ on 푤 that were already found

3: let Ag = {(푞i, 푐i, 푤, ퟙ, 휀) ∣ 푞i ∈ 푄i, 푐i ∈ 퐶i} ▷ fill agenda with initial configurations 4: while |푋| < 푛 and Ag ≠ ∅ do 5: let (푞, 푐, 푢, 푎, 휃) be the greatest element of Ag with respect to the fourth component 6: let Ag = Ag ∖ {(푞, 푐, 푢, 푎, 휃)}

7: if 푞 ∈ 푄f and 푐 ∈ 퐶f and 푢 = 휀 then 8: let 푋 = 푋 ∪ {휃} 9: end if 10: for each 휏 ∈ supp(휇) do ′ ′ ′ 11: for each (푞 , 푐 , 푢 ) ∈ ⊢휏((푞, 푐, 푢)) do 12: let Ag = Ag ∪ {(푞′, 푐′, 푢′, 푎 ⊙ 휇(휏), 휃휏)} 13: end for 14: end for 15: end while 16: return sort(wtℳ, ⊴)(푋) 17: end function

130 6.3 Coarse-to-fine parsing of weighted automata with storage

Observation 6.8 (taken from the end of Den16a, section 4.1). Let 퐺 be a non-deleting PMCFG. Furthermore, let [휉, 휀] be a storage configuration of ℳ(퐺), cf. construction 3.20, after recognis- ing some word 푤 and let 휉|1 denote the first subtree of 휉, defined by the equation 휉|1(휌) = 휉(1휌) ∗ 2 for each 휌 ∈ ℕ with 1휌 ∈ dom(휉). Then 휉|1 is a complete derivation tree of 푤 in 퐺. ∎

This observation allows us to derive an algorithm that uses algorithm 6.7 to derive aparsing algorithm for weighted non-deleting PMCFGs; shown in algorithm 6.9. We use construction 3.20 to create for our given PMCFG 퐺 an equivalent tree-stack automaton (line 2). Then we use algorithm 6.7 to calculate best 푛 runs of ℳ(퐺) (line 3). For each of those runs, we can calculate the (unique) storage configuration that the automaton has at the end of the run when starting from an initial configuration (line 5). Note that any run is only applicable to exactly initial configurations and hence the 휉푖s are uniquely defined. Finally, we use observation 6.8 to obtain the derivation tree 휉푖|1 corresponding to each 휉푖 (line 7).

Algorithm 6.9 TSA-based parsing for weighted non-deleting PMCFGs

Input: • an 풜-weighted non-deleting PMCFG 퐺 = (푁, 훴, 푁i, 휇)

• an integer 푛 ∈ ℕ+ • a string 푤 ∈ 훴∗

c Output: • an element of 푛-best(wt퐺)(D퐺(푤)) 1: function PMCFG-Parse(퐺, 푛, 푤) 2: let ℳ(퐺) be obtained from 퐺 by construction 3.20 3: let 휃1⋯휃푚 = Dijkstra-Parse(ℳ(퐺), 푛, 푤) ▷ 푚 may be smaller than 푛 4: for each 푖 ∈ [푚] do 5: let [휉푖, 휀] be obtained from {(휀, @)} by applying the instructions that occur in 휃푖 6: end for 7: return 휉1|1⋯휉푚|1 8: end function

The author implemented algorithms 6.7 and 6.9 as part of Rustomata.3 While the algorithms work in principle, they are (unsurprisingly) too time and space intensive for practical purposes. However, an alternative to the Dijkstra-like algorithm that also uses the automaton characteri- sation is given in section 6.3.

6.3 Coarse-to-fine parsing of weighted automata with storage

This section is a revised version of section 6 of T. Denkinger. “Approximation of Weighted Automata with Storage”. In: Proceedings of the Eighth International Symposium on Games, Automata, Logics and Formal Verification. Vol. 256. Electronic Proceedings in Theoretical

2If 퐺 were not non-deleting, then there may be subtrees in a complete derivation 푑 of 푤 in 퐺 that do not contribute

to 푤 and hence will not occur in any 휉|1. 3https://github.com/tud-fop/rustomata

131 6 Parsing of natural languages

Computer Science. Open Publishing Association, Sept. 2017, pp. 91–105. doi: 10.4204/eptcs. 256.7.

6.3.1 Coarse-to-fine parsing We will provide an algorithm that solves the 푛-best parsing problem, i.e. an algorithm that outputs for any given automaton ℛ with data storage, natural number 푛, and string 푤, a string of 푛 best runs of ℛ on 푤. Coarse-to-fine parsing [Cha+06] employs a simpler (i.e. easier to parse) automaton (with data storage) ℛ′ to parse 푤 and uses the runs of ℛ′ on 푤 to narrow down the search space for the runs of ℛ on 푤. To ensure that there are runs of ℛ′ on 푤 whenever there are runs of ℛ on 푤, we require that ℒ(ℛ′) ⊇ ℒ(ℛ). The automaton ℛ′ is obtained by superset approximation. In particular, we let ℛ′ = 퐴(ℛ) for some total approximation strategy 퐴.

6.3.2 The algorithm Algorithm 6.10 describes coarse-to-fine 푛-best parsing for weighted automata with data storage. The algorithm starts with aset 푋 that is empty (line 2) and a set 푌 that contains all the runs of 퐴(ℳ) on 푤 (line 3). Then, as long as 푋 has less than 푛 elements or an element of 푌 is greater than the smallest element in 푋 with respect to their weights (line 4), we take the greatest element 휃′ of 푌 (line 8), remove it from 푌 (line 9), calculate the corresponding strings 휃 of transitions from ℳ (line 11), and add 휃 to 푋 if 휃 is a run of ℳ (line 12). The use of 퐴(ℳ) as defined in section 4.4 requires that the weights are taken from acom- mutative semiring that is also a POCMOZ. Formally, the use of 퐴(ℳ) is still justified, because each POCMOZ can be extended to a commutative semiring by (arbitrarily) extending the partial order ⊴ to a total order ⊴′ and then defining the addition operation as the maximum w.r.t. ⊴′. In practice, the user of this coarse-to-fine parsing framework may choose the addition operation to best optimise the implementation for their use case, as long as the resulting algebra is still a commutative semiring.

Idea for the proof of correctness of algorithm 6.10. Correctness is shown by inspecting the two conditions of the while-loop. Because of the first condition, we put at least 푛 elements in 푋 (if there are that many, cf. line 5). The second condition requires that we keep looping as long as there are still better runs in 푌 than the 푛-th best run in 푋. Hence, we terminate only if 푋 has at least 푛 runs (or all runs) and every run in 푌 is worse than the 푛-th best run in 푋. From lemma 4.30 follows that for each 휃′ ∈ 푌, every run in 퐴−1(휃′) is not better than 휃′. Hence, when every run in 푌 is worse than the 푛-th best run in 푋, then also every run in 퐴−1(푌 ) is worse than the 푛-th best run in 푋. In other words, there is no longer a run in 퐴−1(푌 ) that would need to be included in an 푛-best list of runs.

Initialisation of 푌. We can restrict the automaton 퐴(ℳ) to the input 푤 with the usual product construction. The set of runs of the resulting product automaton (let uscallit ℳ퐴,푤) can be mapped onto Runs퐴(ℳ)(푤) by some projection 휑. Hence ℳ퐴,푤 (finitely) represents Runs퐴(ℳ)(푤). The automaton ℳ퐴,푤 can be construed as a (not necessarily finite) graph 퐺퐴,푤 with the ℳ퐴,푤-configurations as nodes. The edges shall be labelled with the images ofthe

132 6.3 Coarse-to-fine parsing of weighted automata with storage

Algorithm 6.10 Coarse-to-fine 푛-best parsing for weighted automata with data storage [taken from Den17a, algorithm 3] Input: • a total approximation strategy 퐴: 퐶 → 퐶′

• a (DS, 훴, 풜)-automaton ℳ where DS = (퐶, 퐼, 퐶i, 퐶f)

• an integer 푛 ∈ ℕ+ • a string 푤 ∈ 훴∗

acc Output: • an element of 푛-best(wtℳ)(Runsℳ(푤)) 1: function CTF-Parse(퐴, ℳ, 푛, 푤) 2: let 푋 = ∅ ▷ 푋 is the set of runs of ℳ on 푤 that were already found acc 3: let 푌 = Runs퐴(ℳ)(푤) ▷ 푌 is the set of runs of 퐴(ℳ) on 푤 not yet considered 4: while |푋| < 푛 or else the weight of the 푛-th best element of 푋 is smaller (⊲) than the weight of the best element of 푌 do 5: if 푌 = ∅ then 6: break 7: end if ′ 8: let 휃 = greatest element of 푌 with respect to the image under wt퐴(ℳ) 9: let 푌 = 푌 ∖ {휃′} 10: for each 휃 ∈ 퐴−1(휃′) that is a string of transitions in ℳ do acc 11: if 휃 ∈ Runsℳ then ▷ it is sufficient to only check the storage behaviour of 휃 12: let 푋 = 푋 ∪ {휃} 13: end if 14: end for 15: end while 16: return a string of 푛 greatest elements of 푋 with respect to the image under wtℳ in descending order 17: end function

corresponding transitions of ℳ퐴,푤 under 휑. Then the paths (i.e. sequences of edge labels) in 퐺퐴,푤 from the initial ℳ퐴,푤-configuration to all the final ℳ퐴,푤-configurations are exactly the elements of Runs퐴(ℳ)(푤). Those paths can be enumerated in descending order of their weights using a variant of Dijkstra’s algorithm (similar to algorithm 6.7). This provides us with a method ′ ′ to compute max휃′∈푌 wt퐴(ℳ)(휃 ) on line 4 and 휃 on line 8. One requirement for this method to be effective is that 퐴(DS) is finitely non-deterministic, or, equivalently, 퐴 is DS-proper.

Example 6.11 (taken from Den17a, example 39). Let 훤 = {a, b, c}, 훴 = 훤 ∪ {#}, 풜 ∗ be the tropical POCMOZ II (ℝ ∪ {∞}, +, 0, ∞, ≥), and 퐴#: 훤 → ℕ with 퐴#(푢) = |푢|. Note that 퐴#(PD(훤 )) = Count. Now consider the 풜-weighted automata with data stor- ′ age ℳ = ([3], PD(훤 ), 훴, {1}, {3}, 휇) and 퐴#(ℳ) = ([3], Count, 훴, {1}, {3}, 휇 ) where ′ ′ ′ ′ ′ ′ supp(휇) = {휏1, …, 휏8} and supp(휇 ) = {휏1, 휏23, 휏4, 휏5, …, 휏8} with

133 6 Parsing of natural languages

퐴# ′ 휏1 = (1, push(a), a , 1) 휏1 = (1, inc , a , 1) 휏 = (1, ( ) , 휀 , 1) 2 push b ′ 휏23 = (1, inc , 휀 , 1) 휏3 = (1, push(c) , 휀 , 1) ′ 휏4 = (1, id , #, 2) 휏4 = (1, id , #, 2) ′ 휏5 = (2, top(a); pop, a , 2) 휏5 = (2, id(ℕ+); dec, a , 2) ′ 휏6 = (2, top(b); pop, b , 2) 휏6 = (2, id(ℕ+); dec, b , 2) ′ 휏7 = (2, top(c); pop, c , 2) 휏7 = (2, id(ℕ+); dec, c , 2) ′ 휏8 = (2, bottom , 휀 , 3) 휏8 = (2, id({0}) , 휀 , 3)

′ and 휇(휏푖) = 1 = 휇 (휏푖) for each 푖 ∈ [8]. The language ℒ(ℳ) contains exactly the strings of the form a푘#푤 for which 푘 ∈ ℕ, 푤 ∈ ∗ {a, b, c} , and a occurs 푘 times in 푤. The language ℒ(퐴#(ℳ)) contains exactly the strings of the form a푘#푤 for which 푘 ∈ ℕ, 푤 ∈ {a, b, c}∗, and |푤| ≥ 푘. We use algorithm 6.10 to obtain the 1-best run of 푤 = a#ba: On line 8, we get 휃′ = ′ ′ ′ ′ ′ ′ 휏1휏23휏4휏6휏5휏8 (the only run of 퐴#(ℳ) on 푤). Then there are only two possible values for 휃 on line 11, namely 휃1 = 휏1휏2휏4휏7휏5휏8 and 휃2 = 휏1휏3휏4휏7휏5휏8 of which only 휃2 is a run of ℳ, hence the algorithm returns the string (of runs) that contains only the run 휃2. □

6.3.3 The implementation and practical relevance A proof-of-concept implementation of the coarse-to-fine parsing shown in algorithm 6.10 was done by Korn [Kor17] in Rustomata.4 He did experiments on small portions of the NeGra corpus (subsets of up to 20 sentences). Depending on the approximation strategies that were used, he could show a decrease of parse time by 25 % to 50 % in comparison to the method shown in section 6.2. A decrease in the quality of the parses did not occur, possibly because of the low size of the used corpora. His measurements showed that the decrease of parse time steepens with increased corpus size. He conjectured that this trend continues beyond corpus sizes of 20 sentences.

The implementation of algorithm 6.10 uses algorithm 6.7 to compute Runs퐴(ℳ)(푤). In particular, no optimisations (like dynamic programming, heuristic search, or approximative search) were used. Hence, the implementation can not be run on sufficiently large corpora to allow for a substantiated guess on the viability of automaton-based coarse-to-fine parsing. Optimising the implementation and evaluating it on large corpora is open for future work. In particular, the implementation should be compared to (at least) state-of-the-art grammar-based parsers. Even if it turns out that this parsing approach is not viable in practice, it still provides a theoretical framework of characterising coarse-to-fine parsing pipelines: by means of approxi- mation strategies. This is relevant even for describing grammar-based coarse-to-fine parsing because the used grammar models usually have an automaton characterisation.

⁴ https://github.com/tud-fop/rustomata


6.4 Chomsky-Schützenberger parsing of weighted MCFGs

This is a revised version of sections 5 and 6 of T. Denkinger. “Chomsky-Schützenberger parsing for weighted multiple context-free languages”. In: Journal of Language Modelling 5.1 (July 2017), p. 3. doi: 10.15398/jlm.v5i1.159.

6.4.1 Introduction

Hulden [Hul11] introduced Chomsky-Schützenberger parsing for context-free grammars. The general idea is to use the decomposition of context-free grammars provided by the Chomsky-Schützenberger representation of context-free languages (repeated below) to derive a parsing algorithm.

Theorem 5.1 (taken from CS63, proposition 2). Let 훴 be a set and 퐿 ⊆ 훴∗ be a language. The following are equivalent:
(i) 퐿 is a context-free language.
(ii) There is a set 훥, a recognisable language $R \subseteq (\Delta \cup \overline{\Delta})^*$, and a string homomorphism $h\colon (\Delta \cup \overline{\Delta})^* \to \Sigma^*$ such that $L = h(\mathrm{D}(\Delta) \cap R)$.⁵ ∎

For this purpose, Hulden [Hul11] extracts from a finite state automaton that represents 푅 all the strings that are in D(훥), which enables him to describe a parsing algorithm for (weighted) context-free grammars. This section will introduce an algorithm that uses the Chomsky- Schützenberger representation of (unweighted) multiple context-free languages (theorem 5.5) to derive a parsing algorithm for weighted multiple context-free grammars.

Theorem 5.5 (taken from YKS10, theorem 3). Let 훴 be a set, 퐿 ⊆ 훴∗ be a language, 푘 ∈ ℕ, and 푠 ∈ ℕ₊. The following are equivalent:
(i) 퐿 ∈ (푠, 푘)-MCFL(훴).
(ii) There is a set 훥, a recognisable language 푅 ⊆ 훥∗, a grammar multiple Dyck language mD ⊆ 훥∗ of dimension 푠 and rank 푘, and a string homomorphism ℎ: 훥∗ → 훴∗ such that 퐿 = ℎ(mD ∩ 푅). ∎

The following corollary analyses the construction of theorem 5.5 [see YKS10, theorem 3] and allows us to convert the elements of 푅 ∩ mD to derivation trees. Recall the notations 푅(퐺) and hom(퐺) from definition 5.6 and the notation mD≡(퐺) from definition 5.21.

Corollary 6.12 (taken from Den17b, corollary 3.9). Let 퐺 be an MCFG. There exists a bijective function $\mathrm{toBr}\colon \mathrm{D}^{\mathrm{c}}_G \to R(G) \cap \mathrm{mD}_{\equiv}(G)$ such that yield = toBr ; hom(퐺).

⁵ D(훥) denotes the set of well-bracketed words where the opening brackets are taken from 훥 and the closing bracket for each $\delta \in \Delta$ is $\overline{\delta} \in \overline{\Delta}$.


Proof. The constructions in lemmas 1 and 3 of Yoshinaka, Kaji, and Seki [YKS10] already hint at such a bijective function. We will merely point out this function $\mathrm{toBr}\colon \mathrm{D}^{\mathrm{c}}_G \to R(G) \cap \mathrm{mD}_{\mathrm{G}}(G)$ and its inverse $\mathrm{fromBr}\colon R(G) \cap \mathrm{mD}_{\mathrm{G}}(G) \to \mathrm{D}^{\mathrm{c}}_G$ here. Let 퐺 = (푁, 훴, 푁i, 푅), let 훥 be the generator set with respect to 퐺, and let 푘 = rank(퐺).

We examine the proof of Yoshinaka, Kaji, and Seki [YKS10, lemma 1], cf. definition 5.6. They construct for every rule $A \to f(B_1, \dots, B_\ell) \in R$ and all tuples $\bar{\tau}_1, \dots, \bar{\tau}_\ell$ that are generated by $B_1, \dots, B_\ell$ in $G^k_\Delta$, respectively, a tuple $\bar{u} = (u_1, \dots, u_m)$ that is generated from 퐴 in $G^k_\Delta$. For each $j \in [m]$, ℳ(퐺) recognises $u_j$ on the way from $A^j$ to $q_{\mathrm{f}}$, and $f(\hom(G)(\bar{\tau}_1), \dots, \hom(G)(\bar{\tau}_\ell)) = \hom(G)(\bar{u})$, where hom(퐺) is applied to tuples component-wise. Now, we look at any initial non-terminal 푆 ∈ 푁i. Then $\bar{u}$ has only one component and this construction can be conceived as a function $\mathrm{toBr}\colon \mathrm{D}^{\mathrm{c}}_G \to R(G) \cap \mathrm{mD}_{\mathrm{G}}(G)$ such that toBr ; hom(퐺) = yield.

In lemma 3, Yoshinaka, Kaji, and Seki [YKS10] give a construction for the converse direction by recursion on the structure of derivations in $G^k_\Delta$. In a similar way as above, we view this construction as a function $\mathrm{fromBr}\colon R(G) \cap \mathrm{mD}_{\mathrm{G}}(G) \to \mathrm{D}^{\mathrm{c}}_G$ such that fromBr ; yield = hom(퐺). Then fromBr ; toBr ; hom(퐺) = hom(퐺), and hence toBr⁻¹ = fromBr.

Finally, let us point out that $R(G) \cap \mathrm{mD}_{\mathrm{G}}(G) = R(G) \cap \mathrm{mD}_{\equiv}(G)$ by lemma 5.22, and hence $\mathrm{toBr}\colon \mathrm{D}^{\mathrm{c}}_G \to R(G) \cap \mathrm{mD}_{\equiv}(G)$ and $\mathrm{fromBr}\colon R(G) \cap \mathrm{mD}_{\equiv}(G) \to \mathrm{D}^{\mathrm{c}}_G$. ∎

6.4.2 The naïve parser

Let 퐺 = (푁, 훴, 푁i, 휇) be a non-deleting 풜-weighted MCFG and 푤 ∈ 훴∗. Corollary 6.12 provides a decomposition of $\mathrm{D}^{\mathrm{c}}_G(w)$:

$$
\begin{aligned}
\mathrm{D}^{\mathrm{c}}_G(w) &= \mathrm{yield}^{-1}(w) \cap \mathrm{D}^{\mathrm{c}}_G && \text{(by definition 2.17)} \\
&= \mathrm{fromBr}(\mathrm{toBr}(\mathrm{yield}^{-1}(w) \cap \mathrm{D}^{\mathrm{c}}_G)) && \text{(since } \mathrm{toBr}^{-1} = \mathrm{fromBr}\text{)} \\
&= \mathrm{fromBr}(\mathrm{toBr}(\mathrm{yield}^{-1}(w)) \cap \mathrm{toBr}(\mathrm{D}^{\mathrm{c}}_G)) \\
&= \mathrm{fromBr}(\hom(G_{\mathrm{uw}})^{-1}(w) \cap \mathrm{toBr}(\mathrm{D}^{\mathrm{c}}_G)) && \text{(by corollary 6.12)} \\
&= \mathrm{fromBr}(\hom(G_{\mathrm{uw}})^{-1}(w) \cap R(G_{\mathrm{uw}}) \cap \mathrm{mD}_{\equiv}(G_{\mathrm{uw}})) && \text{(by corollary 6.12)} \\
&= (\hom(G_{\mathrm{uw}})^{-1} ; (\cap R(G_{\mathrm{uw}})) ; (\cap \mathrm{mD}_{\equiv}(G_{\mathrm{uw}})) ; \mathrm{fromBr})(w)
\end{aligned}
$$

where (∩푅(퐺uw)) and (∩mD≡(퐺uw)) are functions that intersect their argument with 푅(퐺uw) and mD≡(퐺uw), respectively.

In the next step, we find a function 휈퐺: 훥∗ → 풜 such that toBr ; 휈퐺 = wt퐺. This can be achieved along the lines of the definition of ℎ in the proof of lemma 2.74 by letting 휈퐺 be a homomorphism from the free monoid on 훥 to the POCMOZ 풜 such that

$$
\nu_G(\delta) = \begin{cases} \mu(\rho) & \text{if } \delta = \text{⟦}^1_\rho \text{ for some rule } \rho \in \mathrm{supp}(\mu) \\ \mathbb{1} & \text{otherwise} \end{cases}
$$

for each 훿 ∈ 훥. Using this function 휈퐺, we apply 푛-best parsing to the previously derived


Figure 6.15: An FSA that recognises the domain language of hom(퐺) with respect to the string ac where 퐺 is taken from example 2.20. Note that 훥휀 denotes the set of (opening or closing) brackets that are mapped to 휀 by hom(퐺). This figure is taken from Den17b, figure 6.

decomposition of $\mathrm{D}^{\mathrm{c}}_G(w)$:

$$
\begin{aligned}
&(\mathrm{sort}(\mathrm{wt}_G, \trianglelefteq) ; \mathrm{take}(n))(\mathrm{D}^{\mathrm{c}}_G(w)) \\
&= (\hom(G_{\mathrm{uw}})^{-1} ; (\cap R(G_{\mathrm{uw}})) ; (\cap \mathrm{mD}_{\equiv}(G_{\mathrm{uw}})) ; \mathrm{fromBr} ; \mathrm{sort}(\mathrm{wt}_G, \trianglelefteq) ; \mathrm{take}(n))(w) \\
&\qquad \text{(by the above derivation)} \\
&= (\hom(G_{\mathrm{uw}})^{-1} ; (\cap R(G_{\mathrm{uw}})) ; (\cap \mathrm{mD}_{\equiv}(G_{\mathrm{uw}})) ; \mathrm{sort}(\nu_G, \trianglelefteq) ; \mathrm{map}(\mathrm{fromBr}) ; \mathrm{take}(n))(w) \\
&\qquad \text{(since } \mathrm{toBr} ; \nu_G = \mathrm{wt}_G\text{)} \\
&= (\hom(G_{\mathrm{uw}})^{-1} ; (\cap R(G_{\mathrm{uw}})) ; \mathrm{sort}(\nu_G, \trianglelefteq) ; \mathrm{filter}(\mathrm{mD}_{\equiv}(G_{\mathrm{uw}})) ; \mathrm{map}(\mathrm{fromBr}) ; \mathrm{take}(n))(w).
\end{aligned}
$$

Definition 6.13. Let 퐺 be a non-deleting 풜-weighted MCFG and 푛 ∈ ℕ. The naïve Chomsky-Schützenberger 푛-best parser with respect to 퐺 is the function

$$
\hom(G_{\mathrm{uw}})^{-1} ; (\cap R(G_{\mathrm{uw}})) ; \mathrm{sort}(\nu_G, \trianglelefteq) ; \mathrm{filter}(\mathrm{mD}_{\equiv}(G_{\mathrm{uw}})) ; \mathrm{map}(\mathrm{fromBr}) ; \mathrm{take}(n). \qquad \square
$$

Implementation details. The parser description in definition 6.13 is quite abstract (i.e. many details are left out). Let us state these details here; a small sketch illustrating items (i) to (iii) follows this list.

(i) hom(퐺uw)⁻¹(푤) is a recognisable language and may be implemented as a finite state automaton.

Definition 6.14 (taken from Den17b, definition 5.5). Let 훥 be a finite set and 훴 be a set. Furthermore, let ℎ: 훥∗ → 훴∗ be an alphabetic homomorphism, 휎₁, …, 휎ₙ ∈ 훴, and 푤 = 휎₁⋯휎ₙ. Consider the function 푔: 훥 → 훴 ∪ {휀} for which $\hat{g} = h$. The domain language of ℎ with respect to 푤, denoted by $\mathrm{domL}_h(w)$, is the following set:

$$
(g^{-1}(\varepsilon))^* \circ g^{-1}(\sigma_1) \circ (g^{-1}(\varepsilon))^* \circ \dots \circ g^{-1}(\sigma_n) \circ (g^{-1}(\varepsilon))^* \qquad \square
$$

Note that $g^{-1}(\sigma_1), \dots, g^{-1}(\sigma_n), g^{-1}(\varepsilon)$ in the above definition are all finite sets (whereas $\mathrm{domL}_h(w)$ may not be finite). Figure 6.15 shows an example for definition 6.14. The following observation follows from the definition of domL and the definition of string homomorphisms.

Observation 6.16. Let 훥 be a finite set and 훴 be a set. Furthermore, let ℎ: 훥∗ → 훴∗ be an alphabetic homomorphism and 푤 ∈ 훴∗. Then $\mathrm{domL}_h(w) = h^{-1}(w)$. ∎

Lemma 6.17 (taken from Den17b, lemma 5.6). Let 훥 be a finite set and 훴 be a set.


Furthermore, let ℎ: 훥∗ → 훴∗ be an alphabetic homomorphism and 푤 ∈ 훴∗. Then ℎ⁻¹(푤) is regular.

Proof. Due to observation 6.16, it suffices to show that $\mathrm{domL}_h(w)$ is regular. For this, let 푔: 훥 → 훴 ∪ {휀} be the function such that $\hat{g} = h$ and let 휎₁, …, 휎ₙ ∈ 훴 such that 푤 = 휎₁⋯휎ₙ. Since the sets $g^{-1}(\sigma_1), \dots, g^{-1}(\sigma_n), g^{-1}(\varepsilon)$ are all finite, they are also recognisable languages [HU69, theorem 3.7]. Since recognisable languages are closed under Kleene star, $(g^{-1}(\varepsilon))^*$ is a recognisable language [HU69, theorem 3.9]. Furthermore, recognisable languages are closed under concatenation [HU69, theorem 3.7] and thus

$$
\mathrm{domL}_h(w) = (g^{-1}(\varepsilon))^* \circ g^{-1}(\sigma_1) \circ (g^{-1}(\varepsilon))^* \circ \dots \circ g^{-1}(\sigma_n) \circ (g^{-1}(\varepsilon))^*
$$

is recognisable. ∎

(ii) hom(퐺uw)⁻¹(푤) ∩ 푅(퐺uw) is the intersection of two recognisable languages and can be obtained via the product construction of two finite state automata [cf. HU69, theorem 3.6].

An example for 푅(퐺uw) is given in figure 6.20 (top) and the product automaton obtained from it and figure 6.15 is shown in figure 6.18 (for now, let us ignore that some edges are bold).

(iii) sort(휈퐺, ⊴)(hom(퐺uw)⁻¹(푤) ∩ 푅(퐺uw)) can be implemented as an iterator that enumerates the strings in the set hom(퐺uw)⁻¹(푤) ∩ 푅(퐺uw) ordered by 휈퐺 in a Dijkstra-like fashion (similar to algorithm 6.7).

(iv) With the operations filter(mD≡(퐺uw)) and map(fromBr), we apply the predicate mD≡(퐺uw) and the function fromBr element-wise to the output of the iterator, forming a new iterator.

(v) filter(mD≡(퐺uw)) is implemented with the help of isMember (algorithm 5.18).

(vi) take(푛) consumes its input iterator lazily, i.e. it evaluates only the prefix of the input sequence that is needed to generate the output.

The rest of section 6.4 concerns itself with the termination of the outlined algorithm.
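The following Python sketch makes items (i) and (ii) concrete: it builds the domain-language automaton of definition 6.14 for a word 푤 and intersects two automata via the product construction. The representation of an FSA as a transition list and the function names are my own illustration, not the thesis' implementation; `g` is assumed to map each bracket to its (possibly empty) image under the alphabetic homomorphism.

```python
def doml_automaton(g, w):
    """FSA for domL_h(w), cf. definition 6.14 and figure 6.15: states are
    0..len(w); a symbol with image epsilon loops on every state, a symbol
    whose image is the next input letter advances by one state."""
    transitions = []
    for i in range(len(w) + 1):
        for symbol, image in g.items():
            if image == '':
                transitions.append((i, symbol, i))
            elif i < len(w) and image == w[i]:
                transitions.append((i, symbol, i + 1))
    return transitions, 0, {len(w)}

def product(fsa1, fsa2):
    """Product construction for the intersection of two FSA languages
    (item (ii)): both components must read the same symbol."""
    t1, init1, finals1 = fsa1
    t2, init2, finals2 = fsa2
    transitions = [((p1, p2), a, (q1, q2))
                   for (p1, a, q1) in t1
                   for (p2, b, q2) in t2 if a == b]
    finals = {(f1, f2) for f1 in finals1 for f2 in finals2}
    return transitions, (init1, init2), finals
```

Combined with a best-first enumeration of accepted strings (as in the sketch in section 6.3), this yields exactly the iterator described in item (iii).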

6.4.3 The problem: harmful loops

A crucial part of the algorithm is the enumeration of the elements of hom(퐺uw)⁻¹(푤) ∩ 푅(퐺uw). This is done using a Dijkstra-like algorithm on the graph of an FSA representing hom(퐺uw)⁻¹(푤) ∩ 푅(퐺uw). The weights determine the order in which states of the FSA are visited. Hence, any loops in the graph that have weight ퟙ may prevent the Dijkstra algorithm from ever reaching the final state. We call a loop in the graph of ℳ(퐺uw) harmful if, for some string 푤, it causes the graph of the FSA for hom(퐺uw)⁻¹(푤) ∩ 푅(퐺uw) (obtained by product construction) to have loops of weight ퟙ.

Definition 6.19 (taken from Den17b, definition 5.8). Let 훴 and 훥 be sets, ℳ = (푄, 훥, 푄i, 푄f, 푇) be an FSA, and ℎ: 훥∗ → 훴∗ and 휈: 훥∗ → 풜 be homomorphisms. A run

$$
(q_0, u_1, q_1)(q_1, u_2, q_2) \cdots (q_{k-1}, u_k, q_k)
$$


Figure 6.18: Automaton for the intersection of hom(퐺uw)⁻¹(ac) and 푅(퐺uw) where the weighted MCFG 퐺 is taken from example 2.56; obtained via product construction. (Loops of weight ퟙ under 휈퐺 are printed in bold, cf. section 6.4.3.)

in ℳ is called harmful (with respect to 휈 and ℎ) if

• 푞1, …, 푞푘 are pairwise different states,

• 푞0 = 푞푘,

• ℎ(푢1) = … = ℎ(푢푘) = 휀, and

• 휈(푢₁) = … = 휈(푢ₖ) = ퟙ. □

We can easily see that the FSA in figure 6.20 (top) has eight harmful loops, printed in bold. This causes 14 loops of weight ퟙ (also printed in bold) in figure 6.18. To ensure that the Chomsky-Schützenberger parser terminates if a derivation tree exists for the given string in the wMCFG, we need to eliminate all harmful loops.⁶ We achieve this with the following three steps:
• We modify the construction of ℳ(퐺), cf. definition 5.6, in section 6.4.4.
• We provide a restriction to the weight structure (see section 6.4.5).
• We provide a restriction to the grammar (see section 6.4.6).
Furthermore, we provide an optimisation that is not necessary for termination, but seems nevertheless prudent.
• We optimise isMember for the case that the grammar has a specific form (see section 6.4.7).
The restrictions to grammar and weight structure turn out not to be problematic in practice. Finally, in section 6.4.8, we present the Chomsky-Schützenberger parsing algorithm.
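By definition 6.19, harmful loops exist exactly if the subgraph consisting of the transitions (푞, 푢, 푞′) with ℎ(푢) = 휀 and 휈(푢) = ퟙ contains a cycle (every cycle contains a simple one). The following Python sketch, not part of the thesis, checks this; the transition representation and the callables `h`, `nu` are invented for illustration.

```python
def has_harmful_loop(transitions, h, nu, one):
    """Check for a cycle in the subgraph of transitions whose label has
    image epsilon under h and weight `one` under nu (cf. definition 6.19)."""
    graph = {}
    for q, u, q2 in transitions:
        if h(u) == '' and nu(u) == one:
            graph.setdefault(q, set()).add(q2)
    # iteratively strip states without outgoing edges into the remaining
    # set; anything that survives must lie on a cycle
    changed = True
    while changed:
        changed = False
        for q in list(graph):
            graph[q] = {q2 for q2 in graph[q] if q2 in graph}
            if not graph[q]:
                del graph[q]
                changed = True
    return bool(graph)
```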

6.4.4 The modification of ℳ(퐺)

Consider an MCFG 퐺 with the sorted set 푁 of non-terminals. Then, intuitively, ℳ(퐺) has two kinds of states:

(i) For every 퐴 ∈ 푁 and every 푗 ∈ [sort(퐴)], there is a state $A^j$ that signifies that the automaton is about to process the 푗-th component of a tuple of strings generated in 퐺 by starting from the non-terminal 퐴.

(ii) There is a state $q_{\mathrm{f}}$ that signifies that the automaton just finished processing some component of a tuple of strings generated in 퐺 by starting from any non-terminal.

We will split up the state $q_{\mathrm{f}}$ in ℳ(퐺) to formalise the following intuition:

(ii') For every 퐴 ∈ 푁 and every 푗 ∈ [sort(퐴)], there is a state $\overline{A^j}$ that signifies that the automaton just finished processing the 푗-th component of a tuple of strings generated in 퐺 by starting from the non-terminal 퐴.

The resulting automaton will be denoted by ℳ′(퐺). Figure 6.20 contrasts ℳ(퐺) and ℳ′(퐺). Note that ℳ′(퐺) only has four harmful loops (whereas ℳ(퐺) has eight). Hence the automaton for hom(퐺)⁻¹(푤) ∩ ℒ(ℳ′(퐺)), shown in figure 6.21, has only four loops of weight ퟙ (printed in bold); the automaton for hom(퐺)⁻¹(푤) ∩ 푅(퐺), see figure 6.18, has 14.

Definition 6.22 (taken from Den17b, definition 5.15). Let 퐺 = (푁, 훴, 푁i, 푅) be an MCFG and recall the definitions of 훥 and 훥̂ from definitions 5.6 and 5.4. The modified automaton with

⁶ An additional modification is required to make the parser terminate even if no derivation tree exists.


Figure 6.20: The automaton with respect to 퐺uw (top, repeated from figure 5.7) and the modified automaton with respect to 퐺uw (bottom, taken from Den17b, figure 7) where 퐺 is taken from example 2.56.


Figure 6.21: Automaton for the intersection of hom(퐺uw)⁻¹(ac) and ℒ(ℳ′(퐺uw)) where 퐺 is taken from example 2.56; obtained via the product construction. Loops of weight ퟙ under 휈퐺 are printed in bold [taken from Den17b, figure 8].

respect to 퐺, denoted by ℳ′(퐺), is the FSA $(Q \cup \overline{Q},\ \hat{\Delta},\ \{S^1 \mid S \in N_{\mathrm{i}}\},\ \{\overline{S^1} \mid S \in N_{\mathrm{i}}\},\ T)$ where

$$
Q = \{A^j \mid A \in N,\ j \in [\mathrm{sort}(A)]\} \qquad \overline{Q} = \{\overline{A^j} \mid A \in N,\ j \in [\mathrm{sort}(A)]\}
$$

and 푇 contains for every rule $\rho = A \to [v_1, \dots, v_s](B_1, \dots, B_k) \in R$ and each $m \in [s]$ (where $v_m$ is of the form $u_{m,0}\, x^{j(m,1)}_{i(m,1)}\, u_{m,1} \cdots x^{j(m,\ell_m)}_{i(m,\ell_m)}\, u_{m,\ell_m}$ with $u_{m,0}, \dots, u_{m,\ell_m} \in \Sigma^*$) exactly the following transitions:

$$
\begin{aligned}
&(A^m,\ \text{⟦}^m_\rho\, \tilde{u}_{m,0}\, \text{⟧}^m_\rho,\ \overline{A^m}) && \text{if } \ell_m = 0, \\
&(A^m,\ \text{⟦}^m_\rho\, \tilde{u}_{m,0}\, \text{⟦}^{j(m,1)}_{\rho,i(m,1)},\ B^{j(m,1)}_{i(m,1)}) && \text{if } \ell_m > 0, \\
&(\overline{B^{j(m,z-1)}_{i(m,z-1)}},\ \text{⟧}^{j(m,z-1)}_{\rho,i(m,z-1)}\, \tilde{u}_{m,z-1}\, \text{⟦}^{j(m,z)}_{\rho,i(m,z)},\ B^{j(m,z)}_{i(m,z)}) && \text{if } \ell_m > 0 \text{, for every } z \in [\ell_m] \text{ with } z \geq 2 \text{, and} \\
&(\overline{B^{j(m,\ell_m)}_{i(m,\ell_m)}},\ \text{⟧}^{j(m,\ell_m)}_{\rho,i(m,\ell_m)}\, \tilde{u}_{m,\ell_m}\, \text{⟧}^m_\rho,\ \overline{A^m}) && \text{if } \ell_m > 0.
\end{aligned}
$$

The modified regular language with respect to 퐺, denoted by 푅′(퐺), is ℒ(ℳ′(퐺)). □

The next lemma ensures that we can use 푅′(퐺) for 푅 in theorem 5.20.

Lemma 6.23 (taken from Den17b, lemma 5.16). Let 퐺 be an MCFG. Then

$$
R(G) \cap \mathrm{mD}_{\equiv}(G) = R'(G) \cap \mathrm{mD}_{\equiv}(G).
$$


Proof. Let 퐺 = (푁, 훴, 푁i, 푅) and let 훥 be the generator alphabet with respect to 퐺.

For (⊇): We show that 푅(퐺) ⊇ 푅′(퐺). Let

$$
\theta = (q_0, u_1, q_1)(q_1, u_2, q_2) \cdots (q_{m-1}, u_m, q_m)
$$

be an accepting run in ℳ′(퐺). We define the string

$$
\theta' = (t(q_0), u_1, t(q_1))(t(q_1), u_2, t(q_2)) \cdots (t(q_{m-1}), u_m, t(q_m))
$$

where

$$
t(q) = \begin{cases} q & \text{if } q = A^j \text{ for some } A \in N \text{ and } j \in [\mathrm{sort}(A)] \\ q_{\mathrm{f}} & \text{otherwise.} \end{cases}
$$

Clearly, for every transition (푞, 푢, 푞′) in ℳ′(퐺), there is a transition (푡(푞), 푢, 푡(푞′)) in ℳ(퐺). Since, for any 푆 ∈ 푁i, the state $\overline{S^1}$ is a final state in ℳ′(퐺) and $t(\overline{S^1}) = q_{\mathrm{f}}$ is a final state in ℳ(퐺), we have that 휃′ is an accepting run in ℳ(퐺).

For (⊆): Since there exists an injective function from $\mathrm{D}^{\mathrm{c}}_G$ to 푅(퐺) ∩ mD≡(퐺) (corollary 6.12), it suffices to show that yield(D퐺) ⊆ hom(퐺)(푅′(퐺)). We prove the following for every 퐴 ∈ 푁 and 푑 ∈ D퐺(퐴) by induction on 푑:

Let yield(푑) = (푢₁, …, 푢ₘ). There are $v_1, \dots, v_m \in (\Delta \cup \overline{\Delta})^*$ such that, for every $j \in [m]$, hom(퐺)(푣ⱼ) = 푢ⱼ and there is a run in ℳ′(퐺) that reads 푣ⱼ and goes from $A^j$ to $\overline{A^j}$.

This statement implies the claim.

Induction base: Let 푑 = 휌 = 퐴 → [푢₁, …, 푢ₘ]() ∈ 푅. By construction, there is a transition $(A^j, \text{⟦}^j_\rho \tilde{u}_j \text{⟧}^j_\rho, \overline{A^j})$ in ℳ′(퐺) for every $j \in [m]$. Clearly, $\hom(G)(\text{⟦}^j_\rho \tilde{u}_j \text{⟧}^j_\rho) = u_j$ and ℳ′(퐺) recognises $\text{⟦}^j_\rho \tilde{u}_j \text{⟧}^j_\rho$ from $A^j$ to $\overline{A^j}$.

Induction step: Let $d = \rho(d_1, \dots, d_k)$ with $\rho = A \to [u_1, \dots, u_m](B_1, \dots, B_k)$ and $m_i = \mathrm{fanout}(B_i)$ for every $i \in [k]$. By the induction hypothesis there are $v^i_1, \dots, v^i_{m_i}$ for every $i \in [k]$ such that $(\hom(G)(v^i_1), \dots, \hom(G)(v^i_{m_i})) = \mathrm{yield}(d_i)$ and ℳ′(퐺) recognises $v^i_j$ from $B^j_i$ to $\overline{B^j_i}$ for every $j \in [m_i]$. By the existence of 휌 in 푅 and definition 6.22, we can construct runs in ℳ′(퐺) from $A^1$ to $\overline{A^1}$, …, from $A^m$ to $\overline{A^m}$, recognising $v_1, \dots, v_m$, respectively, such that $(\hom(G)(v_1), \dots, \hom(G)(v_m)) = \mathrm{yield}(d)$. ∎

6.4.5 Factorisable POCMOZs

Definition 6.24 (taken from Den17b, definition 5.14). We call 풜 factorisable if for every $a \in \mathcal{A} \setminus \{\mathbb{0}, \mathbb{1}\}$ and natural number 푘 ≥ 2, there are 푎₁, …, 푎ₖ ∈ 풜 such that

• 푎1, …, 푎푘 ∉ {푎, ퟘ, ퟙ},

• 푎 ⊴ 푎1, …, 푎 ⊴ 푎푘, and

• 푎1 ⊙ … ⊙ 푎푘 = 푎.

We then call the string 푎1 ⊙ … ⊙ 푎푘 a 푘-factorisation of 푎.


We call 풜 trivially factorisable if 풜 ∖ {ퟘ, ퟙ} is empty. Furthermore, we call 풜 non-trivially factorisable if it is factorisable but not trivially factorisable. □
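As a concrete illustration (not from the thesis), the following Python sketch computes the canonical 푘-factorisation of table 6.2 for the Viterbi POCMOZ ([0, 1], ⋅, 1, 0, ≤): every 푎 ∈ (0, 1) splits into 푘 copies of its 푘-th root, and each factor lies strictly between 푎 and 1, as definition 6.24 requires.

```python
def viterbi_k_factorisation(a, k):
    """k-factorisation in the Viterbi POCMOZ ([0,1], *, 1, 0, <=):
    a = a**(1/k) * ... * a**(1/k), with k identical factors."""
    assert 0.0 < a < 1.0 and k >= 2, "only non-trivial elements are factorised"
    factor = a ** (1.0 / k)
    # each factor is strictly between a and 1, so a_i is not in {a, 0, 1}
    # and a <= a_i holds with respect to the order <=
    return [factor] * k
```

For instance, `viterbi_k_factorisation(0.5, 4)` returns four copies of 0.5**0.25 ≈ 0.8409, matching the factorisations fixed in example 6.26.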

Table 6.2 shows some examples of POCMOZs together with an appropriate factorisation. In order to remove the remaining four harmful loops from figure 6.20 (bottom), we give an alternative to the function 휈 presented in section 6.4.2. The idea is to not assign the weight 휇(휌) only to the symbol $\text{⟦}^1_\rho$, but to factorise it and assign a factor to each of the symbols $\text{⟦}^1_\rho, \text{⟧}^1_\rho, \dots, \text{⟦}^{\mathrm{fanout}(\rho)}_\rho, \text{⟧}^{\mathrm{fanout}(\rho)}_\rho$.

Definition 6.25 (taken from Den17b, definition 5.17). Let 풜 be factorisable. Furthermore, let 퐺 = (푁, 훴, 푁i, 휇) be an 풜-weighted MCFG and 훥 be the generator alphabet with respect to 퐺. For every $\rho = A \to [u_1, \dots, u_m](B_1, \dots, B_k) \in \mathrm{supp}(\mu)$, we fix a factorisation $a_{\rho,1} \odot \cdots \odot a_{\rho,2m}$ of 휇(휌) if it exists (i.e. 휇(휌) ≠ ퟙ); otherwise let $a_{\rho,1} = \dots = a_{\rho,2m} = \mathbb{1}$. We define the homomorphism 휈′퐺: 훥∗ → 풜 from the free monoid on 훥 to the factorisable POCMOZ 풜 such that

$$
\nu'_G(\delta) = \begin{cases} a_{\rho,2j-1} & \text{if } \delta \text{ is of the form } \text{⟦}^j_\rho \\ a_{\rho,2j} & \text{if } \delta \text{ is of the form } \text{⟧}^j_\rho \\ \mathbb{1} & \text{otherwise} \end{cases}
$$

for each 훿 ∈ 훥. □

Example 6.26 (taken from Den17b, example 5.18). Recall the wMCFG 퐺 from example 2.56. First, we fix factorisations of the weights in 퐺 as shown in table 6.2 for the probability POCMOZ:

$$
\begin{aligned}
\text{for } \rho_2 \text{ and } \rho_4\colon\quad 1/2 &= \sqrt[4]{1/2} \cdot \sqrt[4]{1/2} \cdot \sqrt[4]{1/2} \cdot \sqrt[4]{1/2} \\
\text{for } \rho_3\colon\quad 1/3 &= \sqrt[4]{1/3} \cdot \sqrt[4]{1/3} \cdot \sqrt[4]{1/3} \cdot \sqrt[4]{1/3} \\
\text{for } \rho_5\colon\quad 2/3 &= \sqrt[4]{2/3} \cdot \sqrt[4]{2/3} \cdot \sqrt[4]{2/3} \cdot \sqrt[4]{2/3}
\end{aligned}
$$

Then 휈′퐺 is given as follows:

$$
\nu'_G(\delta) = \begin{cases} \sqrt[4]{1/2} & \text{if } \delta \in \{\text{⟦}^j_\rho, \text{⟧}^j_\rho \mid \rho \in \{\rho_2, \rho_4\},\ j \in [2]\} \\ \sqrt[4]{1/3} & \text{if } \delta \in \{\text{⟦}^j_{\rho_3}, \text{⟧}^j_{\rho_3} \mid j \in [2]\} \\ \sqrt[4]{2/3} & \text{if } \delta \in \{\text{⟦}^j_{\rho_5}, \text{⟧}^j_{\rho_5} \mid j \in [2]\} \\ 1 & \text{otherwise} \end{cases} \qquad \square
$$

We examine figures 6.21 and 6.20 again, but this time we use 휈′퐺 from example 6.26 instead of 휈퐺. Then there are no loops of weight ퟙ in figure 6.21 and no harmful loops in figure 6.20 (bottom), respectively.

Lemma 6.27 (taken from Den17b, lemma 5.20). Let 풜 be factorisable and 퐺 = (푁, 훴, 푁i, 휇) be an 풜-weighted MCFG. Then 휈′퐺(푢) = 휈퐺(푢) for every 푢 ∈ 푅′(퐺uw) ∩ mD≡(퐺uw).


Table 6.2: The POCMOZs from example 6.3. A POCMOZ (풜, ⊙, ퟙ, ퟘ, ⊴) may be trivially factorisable. If 풜 is non-trivially factorisable, then a 푘-factorisation of some 푎 ∈ 풜 ∖ {ퟙ, ퟘ} is given.

example POCMOZ       (풜, ⊙, ퟙ, ퟘ, ⊴)                    푘-factorisation of 푎
Boolean POCMOZ I     (픹, ∨, 1, 0, ≥)                     trivially factorisable
Boolean POCMOZ II    (픹, ∧, 0, 1, ≤)                     trivially factorisable
tropical POCMOZ I    (ℝ≥0 ∪ {∞}, min, ∞, 0, ≤)           not factorisable
arctic POCMOZ I      (ℝ≤0 ∪ {−∞}, max, −∞, 0, ≥)         not factorisable
tropical POCMOZ II   ([0, ∞), +, 0, ∞, ≥)                푎/푘 + ⋯ + 푎/푘
arctic POCMOZ II     ((−∞, 0], +, 0, −∞, ≤)              푎/푘 + ⋯ + 푎/푘
Viterbi POCMOZ       ([0, 1], ⋅, 1, 0, ≤)                $\sqrt[k]{a} \cdot \dots \cdot \sqrt[k]{a}$
+₁-POCMOZ            ([0, 1], +₁, 0, 1, ≥)               $\underbrace{a/k +_1 \dots +_1 a/k}_{=:\,s} +_1 \frac{a-s}{1-s}$
+₂-POCMOZ            ([0, 1], +₂, 0, 1, ≥)               푎/푘 +₂ ⋯ +₂ 푎/푘
prefix POCMOZ        (훴∗ ∪ {∞}, ∧, ∞, 휀, ≼)              not factorisable for 훴 ≠ ∅
∨-POCMOZ             (퐿, ∨, ⊥, ⊤, ⊒)                     not factorisable for 퐿 ≠ {⊤, ⊥}
∧-POCMOZ             (퐿, ∧, ⊤, ⊥, ⊑)                     not factorisable for 퐿 ≠ {⊤, ⊥}
Div-POCMOZ I         (ℕ₊ ∪ {∞}, lcm, 1, ∞, ≥)            not factorisable
Div-POCMOZ II        (ℕ₊ ∪ {∞}, gcd, ∞, 1, ≤)            not factorisable


Proof. Let $\rho \in \mathrm{supp}(\mu)$ be an arbitrary production, $m = \mathrm{fanout}(\rho)$, and let $\nu'_G(\text{⟦}^j_\rho) = a_{2j-1}$ and $\nu'_G(\text{⟧}^j_\rho) = a_{2j}$ for every $j \in [m]$. By the definition of the cell $\mathfrak{p}_\rho$ (cf. definition 5.21) and the cancellation relation, we know that for every symbol $\text{⟦}^j_\rho$ there must occur corresponding symbols $\text{⟦}^1_\rho, \dots, \text{⟦}^{j-1}_\rho, \text{⟦}^{j+1}_\rho, \dots, \text{⟦}^m_\rho$ and $\text{⟧}^1_\rho, \dots, \text{⟧}^m_\rho$ in mD≡(퐺uw). Then from the definition of 휈′퐺 it follows that

$$
\underbrace{\nu'_G(\text{⟦}^1_\rho)}_{=a_1} \odot \underbrace{\nu'_G(\text{⟧}^1_\rho)}_{=a_2} \odot \dots \odot \underbrace{\nu'_G(\text{⟦}^m_\rho)}_{=a_{2m-1}} \odot \underbrace{\nu'_G(\text{⟧}^m_\rho)}_{=a_{2m}}
$$

is 휇(휌) and thus exactly the weight assigned to those symbols by 휈퐺. ∎

Unfortunately, it does not hold for every non-deleting wMCFG 퐺 that 휈′퐺 together with 푅′(퐺uw) avoids harmful loops (consider, for example, any 퐺 that contains a rule of the form 퐴 → [푥₁,₁](퐴) with a non-terminal 퐴 of fanout 1 and with weight ퟙ). This is only achieved for restricted wMCFGs, as shown in section 6.4.6.

6.4.6 Restricted weighted MCFGs

Definition 6.28 (taken from Den17b, definition 5.9). An 풜-weighted MCFG 퐺 = (푁, 훴, 푁i, 휇) is called restricted if there is no derivation 푑 in 퐺 and position 푛1⋯푛푘 ∈ pos(푑) such that

• 푛1, …, 푛푘 ∈ ℕ+,

• 푑(휀) = 푑(푛1⋯푛푘),

• 푑(휀), 푑(푛1), 푑(푛1푛2), …, 푑(푛1⋯푛푘) are pairwise different, and

• 휇(푑(휀)) = 휇(푑(푛₁)) = 휇(푑(푛₁푛₂)) = … = 휇(푑(푛₁⋯푛ₖ)) = ퟙ. □

Restricted weighted MCFGs are strictly less powerful than (unrestricted) weighted MCFGs, as the next example shows.

Example 6.29 (taken from Den17b, example 5.10). Let us consider an arbitrary 픹-weighted MCFG 퐺 and let 푚 be the number of rules in 퐺. Assume that ℒ(퐺) is not finite. Then there are derivations in 퐺 of arbitrary height. It is clear that every derivation 푑 in 퐺 with a height greater than 푚 + 1 must have positions 푝, 푝푝′ ∈ pos(푑) such that 푑(푝) = 푑(푝푝′). Then, since 퐺 has weights from 픹, we know that ퟙ is assigned to every production in 퐺 and thus 퐺 is not restricted. □

Restricted weighted MCFGs are still useful in practice, as the following two propositions show.

Definition 6.30 (taken from Den17b, definition 5.11). An 풜-weighted MCFG is called proper if 풜 is the probability POCMOZ and for each non-terminal 퐴 the sum (using the usual addition in ℝ) of the weights of all productions with left-hand side 퐴 is ퟙ. □

Proposition 6.31 (taken from Den17b, observation 5.12). Every proper weighted MCFG is restricted.

Proof. Assume that 퐺 is proper but not restricted. Then there is a derivation 푑 in 퐺 and a position 푝 ∈ pos(푑) such that the weights of all productions along the path from the root to


position 푝 in 푑 are ퟙ and 푑(휀) = 푑(푝) = 휌. All productions along the path from the root to position 푝 are unique for their respective left-hand side non-terminals since 퐺 is proper. This means that every derivation 푑′ with the rule 휌 at the root has the position 푝 and 휌 = 푑′(휀) = 푑′(푝). But then {휀, 푝, 푝푝, 푝푝푝, …} ⊆ pos(푑) and hence 푑 is not a (finite) term, which contradicts our definition of a derivation. ∎

If we extract a weighted MCFG from a corpus and assign the weights by maximum-likelihood estimation [as, for example, in KM13, p. 107], then we obtain a weighted MCFG that is proper and therefore restricted. The next proposition allows us to enrich an unweighted MCFG with a weight structure to make it suitable for Chomsky-Schützenberger parsing.

Proposition 6.32 (taken from Den17b, observation 5.13). For every MCFG 퐺, there is a restricted weighted MCFG 퐺′ such that ℒ(퐺) = ℒ(퐺′).

Proof. This can be done by assigning to each derivation $d \in \mathrm{D}^{\mathrm{c}}_G$ its size (i.e. its number of rules), i.e. $\mathrm{wt}_{G'}(d) = |\mathrm{pos}(d)|$. Then for each string 푤, we assign the smallest value of a derivation for 푤, i.e. $⟦G'⟧(w) = \min\{\mathrm{wt}_{G'}(d) \mid d \in \mathrm{D}^{\mathrm{c}}_G(w)\}$. To achieve that, we choose the POCMOZ (ℕ ∪ {∞}, +, 0, ∞, ≤) as the weight algebra for 퐺′, use the productions of 퐺 as the support of the weight assignment of 퐺′, and give every production of 퐺 the weight 1. Then no production of 퐺′ has ퟙ (which is 0 in the above-mentioned POCMOZ) as its weight and 퐺′ is therefore restricted. Furthermore, since $G = G'_{\mathrm{uw}}$ (cf. definition 2.53), we know that ℒ(퐺) = ℒ(퐺′). ∎

It turns out that for restricted wMCFGs 퐺, the function 휈′퐺 together with the language 푅′(퐺uw) is sufficient to eliminate harmful loops.

Lemma 6.33 (taken from Den17b, lemma 5.19). Let 풜 be factorisable and 퐺 = (푁, 훴, 푁i, 휇) be a restricted 풜-weighted MCFG that only has productive non-terminals. Then ℳ′(퐺uw) has no harmful loops with respect to 휈′퐺.

Proof. We show the claim by contradiction. For this, assume that the run

$$
(q_0, u_1, q_1)(q_1, u_2, q_2) \cdots (q_{k-1}, u_k, q_k)
$$

is a harmful loop in ℳ′(퐺uw) with respect to 휈′퐺. Then 푞₀ = 푞ₖ and 휈′퐺(푢₁) = … = 휈′퐺(푢ₖ) = ퟙ. We now distinguish two cases:

Case 1: Let $q_0 = q_k \in Q$ (with 푄 as defined in definition 6.22). Furthermore, let $I = \{i_0, \dots, i_m\}$ be the maximal subset of {0, …, 푘} such that

(i) $0 = i_0 < \dots < i_m = k$,
(ii) for every $i \in I$, we have that $q_i$ is of the form $B_i^{j_i}$ for some $B_i \in N$ and $j_i \in [\mathrm{sort}(B_i)]$, and
(iii) for every $\kappa \in [m]$, we have that $B_{i_\kappa}$ occurs on the right-hand side of $\rho_{i_{\kappa-1}}$ where $\text{⟦}^{j_{i_{\kappa-1}}}_{\rho_{i_{\kappa-1}}}$ is read in the transition that leaves state $q_{i_{\kappa-1}}$.


Since every non-terminal in 퐺 is productive, there is a derivation 푑 in 퐺 and a position $n_1 \cdots n_m$ in 푑 such that $n_1, \dots, n_m \in \mathbb{N}_+$ and $d(\varepsilon) = \rho_{i_1}$, $d(n_1) = \rho_{i_2}$, …, $d(n_1 \cdots n_{m-1}) = \rho_{i_m}$, and $d(n_1 \cdots n_m) = \rho_{i_1}$. For every $i \in I$, we know that $\mu(\rho_i) = \mathbb{1}$ since $\nu'_G(\text{⟦}^{j_i}_{\rho_i}) = \mathbb{1}$. This contradicts 퐺 being restricted.

Case 2: Let $q_0 = q_k \in \overline{Q}$ (with $\overline{Q}$ as defined in definition 6.22). Furthermore, let $I = \{i_0, \dots, i_m\}$ be the maximal subset of {0, …, 푘} such that

(i) $0 = i_0 < \dots < i_m = k$,
(ii) for every $i \in I$, we have that $q_i$ is of the form $\overline{B_i^{j_i}}$ for some $B_i \in N$ and $j_i \in [\mathrm{sort}(B_i)]$, and
(iii) for every $\kappa \in [m]$, we have that $B_{i_\kappa}$ occurs on the right-hand side of $\rho_{i_{\kappa-1}}$ where $\text{⟧}^{j_{i_{\kappa-1}}}_{\rho_{i_{\kappa-1}}}$ is read in the transition that reaches $q_{i_{\kappa-1}}$.

Since every non-terminal in 퐺 is productive, there is a derivation 푑 and a position $n_1 \cdots n_m$ in 푑 such that $n_1, \dots, n_m \in \mathbb{N}_+$ and $d(\varepsilon) = \rho_{i_m}$, $d(n_1) = \rho_{i_{m-1}}$, …, $d(n_1 \cdots n_{m-1}) = \rho_{i_1}$, and $d(n_1 \cdots n_m) = \rho_{i_m}$. For every $i \in I$, we know that $\mu(\rho_i) = \mathbb{1}$ since $\nu'_G(\text{⟧}^{j_i}_{\rho_i}) = \mathbb{1}$. This contradicts 퐺 being restricted. ∎

6.4.7 A modification of isMember

Although algorithm 5.18 is at least exponential in a polynomial of the length of the input word, it becomes quadratic if we only accept input words of a specific form. The parsing algorithm presented in section 6.4.8 will only consider words of that form.

Let 훥 be a set and 픓 be a partition of 훥. Furthermore, let $w \in \mathrm{D}(\Delta)$ and $(\delta_1 v_1 \overline{\delta_1}, \dots, \delta_\ell v_\ell \overline{\delta_\ell}) = \mathrm{split}(w)$. For every 훿 ∈ 훥, we define $\mathrm{occ}_\delta(w) = |\{i \in [\ell] \mid \delta_i = \delta\}|$ and for every 픭 ∈ 픓, we define $\mathrm{occ}_{\mathfrak{p}}(w) = \max\{\mathrm{occ}_\delta(w) \mid \delta \in \mathfrak{p}\}$.

Definition 6.34 (taken from Den17b, section 4.1, properties of isMember). Let 훥 be a set, 픓 be a partition of 훥, and $w \in (\Delta \cup \overline{\Delta})^*$. We call 푤 픓-simple (inductively defined) if, whenever the cancellation relation (cf. definition 5.9) can be applied to 푤 to cancel an occurrence of the cell 픭 (where 픭 has more than one element), then there is only one such occurrence and the strings 푢₀⋯푢ₖ and 푣₁⋯푣ₖ (from the definition of the cancellation relation) are also 픓-simple. □

In order to modify isMember to recognise 푤 only if it is 픓-simple, we add a check whether there is a cell 픭 ∈ 픓 for which |픭| ≥ 2 and $\mathrm{occ}_{\mathfrak{p}}(w) > 1$, see algorithm 6.35, lines 6 to 8. If this is the case, then we return False; otherwise, we continue. Note that this check can be done in time linear in the length of 푤 (more precisely: linear in ℓ, since the output of split(푤) can be used for that purpose). Let us call the function obtained in this manner isMember', see algorithm 6.35.

Then the sets 퐼 that isMember' considers in the for-loop on lines 10 to 14 are pairwise disjoint. This means that each 푣ᵢ (for 푖 ∈ [ℓ]) occurs in at most one recursive call on line 11. Then the elements of ℐ are always pairwise disjoint and we only need to consider 퐽 = ℐ in the for-loop on lines 15 to 19. We can decide in time 풪(ℓ) whether ℐ is a partition of [ℓ]. Lines 2 to 5 can be computed in time 풪(|푤|). Since ℓ < |푤|, we know that for each call of isMember', we have to invest time linear in the length of the second argument. The maximum depth of recursion is |푤|/2


Algorithm 6.35 Function isMember' to decide whether a string is 픓-simple and in mD≡(픓)
Input: a partition 픓 of some set 훥 and a string $w \in (\Delta \cup \overline{\Delta})^*$
Output: True if 푤 is 픓-simple and in mD≡(픓), False otherwise
1: function isMember'(픓, 푤)
2:   if 푤 ∉ D(훥) then
3:     return False
4:   end if
5:   let $(\delta_1 v_1 \overline{\delta_1}, \dots, \delta_\ell v_\ell \overline{\delta_\ell}) = \mathrm{split}(w)$ such that 훿₁, …, 훿ℓ ∈ 훥
6:   if there is a 픭 ∈ 픓 with |픭| ≥ 2 and $\mathrm{occ}_{\mathfrak{p}}(w) > 1$ then
7:     return False
8:   end if
9:   let ℐ = ∅
10:  for each $I = \{i_1, \dots, i_k\} \subseteq [\ell]$ with $\{\delta_{i_1}, \dots, \delta_{i_k}\} \in \mathfrak{P}$ do
11:    if isMember'(픓, $v_{i_1} \cdots v_{i_k}$) then
12:      add 퐼 as an element to ℐ
13:    end if
14:  end for
15:  for each 퐽 ⊆ ℐ do
16:    if 퐽 is a partition of [ℓ] then
17:      return True
18:    end if
19:  end for
20:  return False
21: end function

because the second argument in the call on line 11 has length at most |푤| − 2. For every recursion depth, the sum of the lengths of all second arguments is at most |푤| because each 푣ᵢ (for 푖 ∈ [ℓ]) occurs in at most one recursive call on line 11. Therefore isMember'(픓, 푤) can be calculated in time 풪(|푤|²).
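The linear-time pre-check on lines 6 to 8 can be written down directly. The following Python fragment illustrates that single step; the representation of split(푤) as a list of (opening bracket, infix) pairs and the `cell_of` map are my own assumptions, not the thesis' data structures.

```python
from collections import Counter

def violates_simplicity(split_w, cell_of):
    """Lines 6-8 of algorithm 6.35: reject if occ_p(w) > 1 for some cell p
    with |p| >= 2, i.e. some opening bracket delta of a multi-element cell
    opens more than one component of split(w).
    `split_w` is [(delta_i, v_i), ...]; `cell_of[delta]` is delta's cell."""
    occ = Counter()
    for delta, _infix in split_w:
        occ[delta] += 1
        if len(cell_of[delta]) >= 2 and occ[delta] > 1:
            return True
    return False
```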

6.4.8 The algorithm

Now let us present our parser and show that it is correct.

Definition 6.36 (taken from Den17b, definition 5.21). Let 퐺 be a restricted non-deleting 풜-weighted MCFG and 푛 ∈ ℕ. The Chomsky-Schützenberger 푛-best parser with respect to 퐺, denoted by CS-parse(퐺, 푛), is the function

$$
\hom(G_{\mathrm{uw}})^{-1} ; (\cap R'(G_{\mathrm{uw}})) ; \mathrm{sort}(\nu'_G, \trianglelefteq) ; \mathrm{filter}(\mathrm{mD}_{\equiv}(G_{\mathrm{uw}})) ; \mathrm{map}(\mathrm{fromBr}) ; \mathrm{take}(n). \qquad \square
$$

Theorem 6.37 (taken from Den17b, 5.22). CS-parse is correct, i.e. it solves the 푛-best parsing problem for restricted non-deleting weighted MCFGs (see page 129).

Proof. This follows immediately from observation 6.5, the derivation in section 6.4.2, as well as


lemmas 6.23 and 6.27. ∎

We give an algorithm that adds implementation details to CS-parse. In order to achieve termination of the algorithm, we add a threshold 휗 ∈ 풜 for the weight of the found derivations. Let 퐺 be a grammar and 푤 be a string of terminals from 퐺. Then the algorithm will ignore elements of hom(퐺uw)⁻¹(푤) ∩ 푅′(퐺uw) whose value with respect to 휈′퐺 is smaller than 휗. Therefore, if 푤 has a complete derivation in 퐺 whose weight is below 휗, then our algorithm will not find it. Thus the algorithm is not correct, but only an approximation of CS-parse.

Reachable thresholds. In the following, we will call the elements of ℒ(ℳ) (with ℳ as constructed in algorithm 6.38) candidates. In order for the algorithm to terminate, it has to be possible to eventually obtain candidates whose image under 휈′퐺 is less than 휗. Since we cannot easily determine which loops of ℳ the algorithm will repeat unboundedly often, we simply require that there is a power of the weight of each loop that is smaller than 휗. For this, we collect the weights of all the loops in a set ℬ. Such a set is said to be admissible if it does not contain ퟙ. Since we use restricted wMCFGs, it is guaranteed that the weights of all loops of ℳ form an admissible set.

Definition 6.39. Let ℬ ⊆ 풜 and 휗 ∈ 풜 ∖ {ퟘ}. We call 휗 (ℬ, ⊙, ⊴)-reachable (short: ℬ-reachable) if for each 푏 ∈ ℬ, there is a number $k \in \mathbb{N}_+$ such that $\bigodot_{i=1}^{k} b \trianglelefteq \vartheta$. □

The following observation follows from ퟙ being an identity.

Observation 6.40. Let 휗 ∈ 풜 ∖ {ퟘ}. Then 휗 is {ퟙ}-reachable if and only if 휗 = ퟙ. ∎

In the probability POCMOZ, every threshold is reachable for any admissible set.

Proposition 6.41. Let ℬ ⊆ [0, 1) and 휗 ∈ (0, 1]. Then 휗 is (ℬ, ⋅, ≤)-reachable.

Proof. Let 푏 ∈ ℬ and $k = \lceil \log_b \vartheta \rceil$. Then $b^k \leq \vartheta$. ∎

However, there are POCMOZs in which some thresholds may not be reachable from any admissible set.

Example 6.42. Consider the POCMOZ $([0,1]^2, \cdot^2, (1\;1)^{\mathrm{T}}, (0\;0)^{\mathrm{T}}, \leq^2)$ where

$$
\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} \cdot^2 \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} = \begin{pmatrix} a_1 \cdot b_1 \\ a_2 \cdot b_2 \end{pmatrix}
\qquad \text{and} \qquad
\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} \leq^2 \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} \iff (a_1 < b_1) \lor ((a_1 = b_1) \land (a_2 \leq b_2))
$$

for any $a_1, a_2, b_1, b_2 \in [0, 1]$. Now consider the threshold $\vartheta = (0.5\;0.5)^{\mathrm{T}}$ and the set $\mathcal{B} = \{(1\;0.5)^{\mathrm{T}}\}$. Clearly, ℬ is admissible. However, there is no $k \in \mathbb{N}_+$ such that

$$
\underbrace{\begin{pmatrix} 1 \\ 0.5 \end{pmatrix} \cdot^2 \dots \cdot^2 \begin{pmatrix} 1 \\ 0.5 \end{pmatrix}}_{k \text{ times}} = \begin{pmatrix} 1 \\ 0.5^k \end{pmatrix} \leq^2 \begin{pmatrix} 0.5 \\ 0.5 \end{pmatrix}.
$$

Hence, 휗 is not (ℬ, ⋅², ≤²)-reachable. □
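For POCMOZs whose multiplication and order are computable, reachability of a threshold for a finite ℬ can be tested by iterated multiplication. The following Python sketch (my own illustration, not from the thesis) does this for the probability POCMOZ, where proposition 6.41 even gives the exponent in closed form.

```python
import math

def probability_reachability_exponents(bs, theta):
    """For the probability POCMOZ ([0,1], *, <=) and theta in (0,1]:
    return for every b in bs the smallest k in N_+ with b**k <= theta
    (cf. proposition 6.41), or None if bs is not admissible."""
    exponents = {}
    for b in bs:
        if b >= 1.0:
            return None          # 1 in B: a theta < 1 is never reachable
        if b == 0.0:
            exponents[b] = 1     # 0 <= theta holds immediately
        else:
            # b < 1, so b**k is decreasing in k; k = ceil(log_b theta)
            exponents[b] = max(1, math.ceil(math.log(theta, b)))
    return exponents
```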


Algorithm 6.38 Chomsky-Schützenberger parsing algorithm for wMCFGs [taken from Den17b, algorithm 3]
Input: • a threshold 휗 ∈ 풜  ▷ (풜, ⊙, ퟙ, ퟘ, ⊴) is factorisable, ퟘ ⊴ 휗, and ퟘ ≠ 휗
       • a restricted 풜-weighted MCFG 퐺  ▷ 퐺 has terminals from a set 훴
       • a number 푛 ∈ ℕ
       • a string 푤 ∈ 훴∗
Output: an element of $n\text{-best}(\mathrm{wt}_G)(\{d \in \mathrm{D}^{\mathrm{c}}_G \mid \vartheta \trianglelefteq \mathrm{wt}_G(d)\})$
1: function CS-Parse(휗, 퐺, 푛, 푤)
2:   construct the automaton ℳ′(퐺uw)
3:   let mD≡(픓) = mD≡(퐺uw)
4:   construct an automaton ℳ푤 such that ℒ(ℳ푤) = hom(퐺uw)⁻¹(푤)
5:   construct ℳ as the product FSA of ℳ′(퐺uw) and ℳ푤
6:   let 퐶 = ∅  ▷ initialise the set of already considered candidates
7:   let 푃 = 휀  ▷ initialise the string of output parses
8:   while hasNextCandidate(퐺, ℳ, 퐶) ∧ |푃| < 푛 do
9:     let (퐶, 푢) = nextCandidate(퐺, ℳ, 퐶)  ▷ obtain a new candidate
10:    if isMember'(픓, 푢) then
11:      let 푑 = fromBr(푢)  ▷ convert the candidate 푢 to a derivation
12:      append 푑 to the end of 푃
13:    end if
14:  end while
15:  return 푃
16: end function

17: function hasNextCandidate(퐺, ℳ, 퐶)
18:   if there is a 푢 ∈ ℒ(ℳ) ∖ 퐶 such that 휗 ⊴ 휈′퐺(푢) then
19:     return True
20:   else
21:     return False
22:   end if
23: end function

24: function nextCandidate(퐺, ℳ, 퐶)
25:   let 푢 be an element of ℒ(ℳ) ∖ 퐶 whose image under 휈′퐺 is maximal w.r.t. ⊴
26:   return (퐶 ∪ {푢}, 푢)
27: end function
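Read operationally, lines 6 to 15 are a filtered best-first enumeration. Here is a compact Python rendering of that loop, assuming the pieces from the earlier sketches (an ordered candidate iterator and a membership test); it is an illustration under those assumptions, not the thesis' implementation.

```python
def cs_parse(candidates, is_member, from_br, n):
    """Lines 6-15 of algorithm 6.38: lazily filter an ordered candidate
    stream by multiple-Dyck membership and convert hits to derivations.
    `candidates` is assumed to already respect the threshold, i.e. to
    stop once the candidate weight drops below theta (hasNextCandidate)."""
    parses = []
    for u in candidates:               # best-first, as in nextCandidate
        if is_member(u):               # line 10: isMember'
            parses.append(from_br(u))  # lines 11-12
            if len(parses) == n:
                break
    return parses
```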


Table 6.3: The first eight paths (sorted by their image under 휈′퐺) in the product of ℳ′(퐺uw) and ℳ푤 and whether the corresponding candidate 푢ᵢ is in mD≡(퐺uw), taken from Den17b, table 4.

푖   path corresponding to 푢ᵢ                         푢ᵢ ∈ mD≡(퐺uw)?
1   without using any loops                          no
2   use the loop of (퐴¹, 1)                          no
3   use the loop of (퐴², 2)                          no
4   use the loop of (퐵¹, 1)                          no
5   use the loop of ($\overline{B^2}$, 2)            no
6   use the loop of (퐴¹, 1) twice                    no
7   use the loops of (퐴¹, 1) and (퐴², 2)             yes
8   use the loop of (퐴², 2) twice                    no

If we restrict ourselves to reachable thresholds, then algorithm 6.38 terminates.

Theorem 6.43 (termination of algorithm 6.38). Let 풜 be a factorisable POCMOZ, 휗 ∈ 풜 ∖ {ퟘ}, 퐺 be an 풜-weighted MCFG with terminals from the set 훴, 푛 ∈ ℕ, and 푤 ∈ 훴∗. Furthermore, let ℬ be the set of weights with respect to 휈′퐺 of the loops in ℳ′(퐺uw) that read an element of hom(퐺uw)⁻¹(휀). Let us call such loops suspicious. If 휗 is ℬ-reachable, then the call CS-parse(휗, 퐺, 푛, 푤) terminates.

Proof (essentially taken from Den17b, page 46). By lemma 6.33, ℳ′(퐺uw) contains no harmful loops with respect to 휈′퐺. Then, by definition 6.19, ℬ does not contain ퟙ. Then, since 휗 is ℬ-reachable and by definition 6.39, we know that for every 푏 ∈ ℬ there is a number $k(b) \in \mathbb{N}_+$ such that $b^{k(b)} \trianglelefteq \vartheta$. Let $k = \max_{b \in \mathcal{B}} k(b)$. There are only finitely many runs in the product FSA of ℳ′(퐺uw) and ℳ푤 that contain every suspicious loop of ℳ′(퐺uw) at most 푘 times.⁷ Hence hasNextCandidate is False after a finite number of iterations of the while-loop, and algorithm 6.38 terminates. ∎

Example 6.44 (taken from Den17b, example 5.23). Consider the wMCFG 퐺 from example 2.56 and the string 푤 = ac. We find the derivations of 푤 in 퐺 using algorithm 6.38. The product ℳ of ℳ′(퐺uw) and ℳ푤 is shown in figure 6.21 (for the product construction, see HU79, after theorem 3.3). It suffices for the while-loop to consider at most 8 candidates to find the (only) derivation of 푤 in 퐺, as shown in table 6.3. The candidates themselves are not shown; instead we see

⁷ This holds even though there may be infinitely many suspicious loops. (In fact, there are either zero suspicious loops or infinitely many, since for any suspicious loop $\theta \in \mathrm{Runs}_{\mathcal{M}'(G_{\mathrm{uw}})}$, the run 휃휃 is again a suspicious loop.) The reason is that after considering each primitive suspicious loop (i.e. a suspicious loop that does not contain another suspicious loop as a substring) 푘 times, the algorithm cannot consider another (not necessarily primitive) suspicious loop without considering at least one primitive suspicious loop at least 푘 + 1 times.


their respective weights under 휈′퐺, their corresponding path in the graphical representation of ℳ, and whether they are in mD≡(퐺uw). Candidate 푢₇ is exactly toBr(휌₁(휌₂(휌₃), 휌₅)). □

6.4.9 The implementation and practical results

The viability of algorithm 6.38 for natural language processing has been investigated by Thomas Ruprecht in his Master's thesis [Rup18]. He found that various optimisations and approximations are necessary to make it feasible. With those optimisations and approximations, the implementation⁸ was comparable in speed and accuracy to state-of-the-art wMCFG parsers (such as disco-dop [CSB16]). Detailed findings on this can be found in our joint paper [RD19].

6.4.10 Related parsing approaches

An established approach to speed up the parsing of MCFGs for practical applications is to use a formalism with lower parsing complexity than MCFGs to guide the exploration of the search space. In the following, we will focus on four such approaches. The parsers in Barthélemy, Boullier, Deschamp, and Clergerie [Bar+01], Burden and Ljunglöf [BL05], and Cranenburgh [Cra12] work as follows: Say, we want to parse a given word 푤 with a grammar 퐺 of a formalism 퐴. We first construct a grammar (or automaton) 퐺′ in a formalism 퐵 that has a lower parsing complexity than 퐴. This can be done offline. Then, we parse 푤 with 퐺′. Lastly, we parse 푤 with 퐺, but while doing so, we consult the parses of 푤 in 퐺′ to guide the exploration of the search space (of possible parses). The three papers differ in their choice of formalisms for 퐺 and 퐺′, and in their use of the parses of 푤 in 퐺′ while parsing 푤 in 퐺:

(i) Barthélemy, Boullier, Deschamp, and Clergerie [Bar+01] have a positive range concatenation grammar (short: PRCG) [Bou98a] of arbitrary arity for 퐺 and use a PRCG of arity 1 for 퐺′. They extract from the parse forest 퐹 of 푤 in 퐺′ a so-called guiding structure and query this structure while parsing 푤 in 퐺. The guiding structure can range from a set of instantiated clauses that occur in 퐹 to 퐹 itself. In their experiments they used as a guiding structure the function that assigns to each instantiated clause the number of its occurrences in 퐹.

(ii) Burden and Ljunglöf [BL05, section 4] have a linear context-free rewriting system (short: LCFRS) [VWJ87] for 퐺 and a context-free grammar for 퐺′. They use deductive parsing. The parse chart 퐶′ of 푤 in 퐺′ is created. While creating the parse chart of 푤 in 퐺, only items are created that are consistent with the items in 퐶′. The algorithm is therefore an instance of coarse-to-fine parsing [Cha+06].

(iii) Cranenburgh [Cra12] has a probabilistic LCFRS (of arbitrary fanout) for 퐺 and a probabilistic LCFRS of fanout 1 for 퐺′. Like Burden and Ljunglöf [BL05], he uses deductive parsing: First, a parse chart 퐶′ of 푤 in 퐺′ is created. Then the probabilities of 퐺′ are used to restrict 퐶′ to the 푛 best parses, obtaining a new parse chart 퐶̂; this step is called pruning. A value of 푛 = 50 was used in the experiments. Then, while creating the parse chart of 푤 in 퐺, only items are created that are consistent with the items in 퐶̂. The algorithm is an instance of coarse-to-fine parsing.

⁸ The implementation is part of Rustomata, which can be found under https://github.com/tud-fop/rustomata.


Kallmeyer and Maier [KM15] present a different approach:

(iv) They construct an FSA 퐺′ as the predict/resume-closure of the thread stores of a thread automaton [Vil02b], where the thread automaton is constructed from the given LCFRS 퐺. The addresses in the thread stores are represented by regular expressions to keep the set of states of 퐺′ finite. Then, a parse table is read off of 퐺′. As opposed to items (i), (ii) and (iii), 푤 is not parsed with 퐺′. Instead, while parsing 푤 with 퐺 using a shift-reduce parser, the parse table is consulted directly at each shift or reduce operation to determine the successor state. Their algorithm is an instance of LR-parsing.

With the Chomsky-Schützenberger parsing presented in this section, we construct from the given weighted LCFRS 퐺 three devices (instead of just one): the deterministic FSA ℳ′(퐺uw) together with the weight assignment 휈′퐺, the congruence multiple Dyck language mD≡(퐺uw), and the alphabetic homomorphism hom(퐺uw). For the given word 푤 we construct a deterministic FSA, let us call it ℳ, that recognises hom(퐺uw)⁻¹(푤) ∩ ℒ(ℳ′(퐺uw)). Constructing ℳ is an additional pre-processing step in comparison to items (i), (ii), (iii) and (iv). In contrast to item (iii), we do not use the weight assignment 휈′퐺 for pruning. We instead use it to enumerate the elements of ℒ(ℳ) in increasing order of their costs. Finally, we filter the list of those elements with mD≡(퐺uw). Note that 푤 is never actually parsed with 퐺 as in items (i), (ii), (iii) and (iv).

Appendix


A Between type-0 and type-2 languages in the Chomsky hierarchy

This chapter gives a list of grammar formalisms, their language classes, and the relations between those classes. We consider grammar formalisms whose expressiveness lies between that of type-0 languages (i.e. recursively enumerable languages) and type-2 languages (i.e. context-free languages) of the Chomsky-hierarchy [cf. Cho59, theorem 9]. The author makes no claim of completeness of this list. Furthermore, we only consider string languages (as opposed to tree languages, graph languages). No definitions or proofs are given in this chapter; the reader may consult the referenced literature for details. We consider the following grammar formalisms and some of their variants: • context-free grammars [Cho56, paragraph 3.2], • context-sensitive grammars [Cho59, restriction 1], • unrestricted grammars [Cho59, section 2], • indexed grammars [Aho68, section 2] (we also consider the linear variant [GP85]), • OI macro grammars [Fis68a, definition 3.7; Fis68b, definition 2.2.12] (we also consider the non-duplicating variant [Fis68b, page 2-15]), • OI context-free tree grammars [Rou69, page 145] (we only consider their yields), • tree-adjoining grammars [JLT75, definition 2.7] (we only consider their yields), • multi-component tree-adjoining grammars [JLT75, definition 7.1] (we only consider the yields of the set-local variant [Wei88, section 4.5]), • lexical functional grammars [KB82] (we only consider the variants restricted [Nis92, definition 2.1] and finite copying [Sek+93, page 135]), • head grammars [Pol84], • combinatory categorial grammars [Ste87], • linear context-free rewriting systems [VWJ87, section 4.1] (we only consider the string variant), • multiple context-free grammars [Sek+91, section 2.2] (we also consider the variants well- nested [Kan09, page 316] and fanout-2 [the fanout is called 푚 in Sek+91, section 2.2]), • parallel multiple context-free grammars [Sek+91, section 2.2], • coupled context-free grammars [Gua92; HP96, definition 4], • unordered scattered context grammars [RS94, definition 1] (we only consider the local variant [RS94, definition 2]), • minimalist grammars [Sta97, section 1],


• literal movement grammars [Gro97, definition 2], and
• range concatenation grammars [Bou98a, section 2] (we only consider the variant simple [Bou98b, definition 9]).

There are pairs of equivalent formalisms among them (and their variants).¹ Equivalent grammar formalisms generate the same language class:
• The class of recursively enumerable languages (short: REL) is generated
  – by unrestricted grammars [Cho59, theorem 2],
  – by restricted lexical functional grammars [NSK92], and
  – by literal movement grammars [Gro97, section 2.1].
• The class of context-sensitive languages (short: CSL) is generated by context-sensitive grammars.
• The class of parallel multiple context-free languages (short: PMCFL) is generated by parallel multiple context-free grammars.
• The class of multiple context-free languages (short: MCFL) is generated
  – by multiple context-free grammars,
  – as yields of set-local multi-component tree-adjoining grammars [JVW90, section 6],
  – by string linear context-free rewriting systems [Sek+91, section 1 and lemma 2.2],
  – by finite copying lexical functional grammars [Sek+93, theorem 8.1],
  – by local unordered scattered context grammars [RS94, theorem 6],
  – by simple range concatenation grammars [Bou98b, 15], and
  – by minimalist grammars [Mic01b, section 4; Mic01a, section 4].
• The class of indexed languages (short: IL) is generated
  – by indexed grammars,
  – by OI macro grammars [Fis68a, theorem 5.3; Fis68b, theorem 4.2.8], and
  – as the yields of OI context-free tree grammars [Rou70, page 113].
• The class of well-nested multiple context-free languages (short: wnMCFL) is generated
  – by well-nested multiple context-free grammars,
  – by coupled context-free grammars [Kan09, section 1], and
  – by non-duplicating OI macro grammars [Kan09, section 1].
• The class of 2-multiple context-free languages (short: 2-MCFL) is generated by multiple context-free grammars of fanout at most 2.
• The class of head languages (short: HL) is generated
  – by head grammars,

¹ Grammar formalisms 풟₁ and 풟₂ are equivalent if for each grammar in 풟₁ there is an equivalent grammar in 풟₂.

  – by well-nested multiple context-free grammars of fanout 2,
  – by linear indexed grammars [for a definition of linear, see GP85; Vij88, sections 3.3.2 and 3.3.3],
  – as yields of tree-adjoining grammars [VWJ86, sections 2 and 3; Sek+91, lemma 4.10], and
  – by combinatory categorial grammars [WJ88, section 3; Wei88, section 5.2.2].
• The class of context-free languages (short: CFL) is generated by context-free grammars.

Figure A.1 shows a diagram containing the above mentioned nine language classes.
• An arrow from 퐴 to 퐵 denotes that 퐴 is a proper superset of 퐵. Transitive arrows are left out.
• A dashed edge between 퐴 and 퐵 denotes that 퐴 and 퐵 are incomparable, denoted by 퐴 ⚭ 퐵, i.e. 퐴 ∖ 퐵 and 퐵 ∖ 퐴 are both non-empty. Dashed edges are labelled with the symbol ⚭.
• A dotted edge between 퐴 and 퐵 denotes that the relation between 퐴 and 퐵 is unknown to the author. Dotted edges are labelled with a question mark.

The arrows and dashed edges are justified in the following:
• CSL ⊊ REL follows from Chomsky [Cho59, theorem 9].
• PMCFL ⊊ CSL was shown by Seki, Matsumura, Fujii, and Kasami [Sek+91, theorem 3.1].
• IL ⊊ CSL follows from Aho [Aho68, theorem 5.2].
• MCFL ⊊ PMCFL:
  – MCFL ⊆ PMCFL holds since each multiple context-free grammar is a parallel multiple context-free grammar.
  – PMCFL ∖ MCFL ≠ ∅ was shown by Seki, Matsumura, Fujii, and Kasami [Sek+91, theorem 3.6].
• MCFL ⚭ IL:
  – IL ∖ MCFL contains (at least) the language $\{a^{2^n} \mid n \in \mathbb{N}\}$ [Wei88, example 2.7.3].
  – The fact that MCFL ∖ IL contains (at least) the language 퐿₃ from Staudacher [Sta93, equation 5.3] (and is thus non-empty) was stated by Michaelis [Mic09, proposition 35].
• wnMCFL ⊊ MCFL:
  – wnMCFL ⊆ MCFL holds since each well-nested multiple context-free grammar is a multiple context-free grammar.
  – The strictness follows from 2-MCFL ∖ wnMCFL ≠ ∅ (see below) and since each multiple context-free grammar of fanout 2 is a multiple context-free grammar.
• wnMCFL ⊊ IL:
  – wnMCFL ⊆ IL holds since each non-duplicating OI macro grammar is an OI macro grammar.


Figure A.1: Relations of the language classes generated by mildly context-sensitive and related grammar formalisms (type-0: REL; type-1: CSL; then PMCFL, MCFL, IL, 2-MCFL, wnMCFL, HL; type-2: CFL).

  – IL ∖ wnMCFL ≠ ∅ follows from wnMCFL ⊆ MCFL and IL ∖ MCFL ≠ ∅.
• 2-MCFL ⊊ MCFL is a consequence of Seki, Matsumura, Fujii, and Kasami [Sek+91, theorem 3.4].
• 2-MCFL ⚭ wnMCFL:
  – 2-MCFL ∖ wnMCFL ≠ ∅ was shown by Kanazawa and Salvati [KS10, corollary 10].
  – wnMCFL ∖ 2-MCFL contains (at least) the language $\{a^k b^k c^k d^k e^k f^k \mid k \in \mathbb{N}\}$ and is therefore non-empty.
• HL ⊊ 2-MCFL is a consequence of Seki, Matsumura, Fujii, and Kasami [Sek+91, corollary 4.16].
• HL ⊊ wnMCFL was shown by Hotz and Pitsch [HP96, theorem 1].
• CFL ⊊ HL was shown by Joshi, Levy, and Takahashi [JLT75, corollary 3.1].

As far as the author knows, the relation between PMCFL and IL is currently unknown. However, since MCFL ⊆ PMCFL and MCFL ∖ IL ≠ ∅, we already know that PMCFL ∖ IL ≠ ∅.


B Additional material for Chapter 2

B.1 Another closure of data storage

We have established that the expressive power of automata with data storage DS does not decrease if we require the instructions that occur in the automaton to be composed of at most one instruction from DS (see proposition 2.49). This proposition also holds for the weighted case (proposition 2.70). However, we can go even further: We can allow the instructions to be closed under composition and set union.

Definition B.1 ((;, ∪)-closure of data storage). Let DS = (퐶, 퐼, 퐶i, 퐶f) be a data storage. The (;, ∪)-closure of DS, denoted by DS†, is the data storage (퐶, 퐼†, 퐶i, 퐶f) where 퐼† is the smallest set 퐽 such that
• ∅ ∈ 퐽 and id ∈ 퐽,
• 퐼 ⊆ 퐽, and

• for each 푖₁, 푖₂ ∈ 퐽, the instructions 푖₁ ; 푖₂ and 푖₁ ∪ 푖₂ are also in 퐽. □

This does not increase the expressive power in the unweighted case.

Proposition B.2. Let DS be a data storage and 훴 be a set. Then REC(DS, 훴) = REC(DS†, 훴).

Proof. Let DS = (퐶, 퐼, 퐶i, 퐶f).

(For REC(DS, 훴) ⊆ REC(DS†, 훴)) Since 퐼 ⊆ 퐼†, every (DS, 훴)-automaton is also a (DS†, 훴)-automaton. Hence, "⊆" trivially holds.

(For REC(DS, 훴) ⊇ REC(DS†, 훴)) Let ℳ = (푄, DS†, 훴, 푄i, 푄f, 푇) be a (DS†, 훴)-automaton.

(Construction) We construct the tuple ℳ′ = (푄, DS, 훴, 푄i, 푄f, 푇′) where 푇′ is the smallest set 푇̄ such that for each 휏 = (푞, 푖, 푢, 푞′) ∈ 푇, the following two statements hold:
(i) If 푖 ∈ 퐼∗, then 휏 ∈ 푇̄.
(ii) If 푖 ∉ 퐼∗, then, since (퐼†, ∪, ;, ∅, id) is a semiring, there are 푖₁, …, 푖ₖ ∈ 퐼∗ such that 푖 = 푖₁ ∪ ⋯ ∪ 푖ₖ. Then the transition (푞, 푖ₙ, 푢, 푞′) is in 푇̄ for each 푛 ∈ [푘].
Then ℳ′ is a (DS, 훴)-automaton.

(Correctness of the construction) We show $\vdash_{\mathcal{M}} = \vdash_{\mathcal{M}'}$. Let (푞, 푐, 푤), (푞′, 푐′, 푤′) ∈ 푄 × 퐶 × 훴∗. Then

$$
\begin{aligned}
&(q, c, w) \vdash_{\mathcal{M}} (q', c', w') \\
&\iff \exists (q, i, u, q') \in T\colon ((c, c') \in i) \land (w = u w') && \text{(by definition 2.37)} \\
&\iff \exists (q, i, u, q') \in T,\ \exists i_1, \dots, i_k \in I^*\colon ((c, c') \in i) \land (w = u w') \land (i = i_1 \cup \dots \cup i_k) \\
&\qquad \text{(since } (I^\dagger, \cup, ;, \emptyset, \mathrm{id}) \text{ is a semiring)} \\
&\iff \exists (q, i_1, u, q'), \dots, (q, i_k, u, q') \in T'\colon ((c, c') \in (i_1 \cup \dots \cup i_k)) \land (w = u w') && \text{(by construction)} \\
&\iff \exists (q, i_1, u, q'), \dots, (q, i_k, u, q') \in T',\ \exists n \in [k]\colon ((c, c') \in i_n) \land (w = u w') \\
&\iff (q, c, w) \vdash_{\mathcal{M}'} (q', c', w') && \text{(by definition 2.37)}
\end{aligned}
$$

Then it follows by definition 2.38 that ℒ(ℳ) = ℒ(ℳ′). ∎
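The construction in the proof can be phrased operationally: every transition whose instruction is a union is split into one transition per member of the union. The Python sketch below illustrates this flattening step under the assumption that instructions are represented as lists of composition-only alternatives; the representation is mine, not the thesis' data structure.

```python
def flatten_transitions(transitions):
    """Split each transition (q, i, u, q2), where the instruction i is
    given as a list of composition-only alternatives i = i_1 U ... U i_k,
    into k transitions (q, i_n, u, q2) -- the construction in prop. B.2."""
    flat = []
    for q, alternatives, u, q2 in transitions:
        for i_n in alternatives:
            flat.append((q, i_n, u, q2))
    return flat

# a hypothetical pushdown example: one transition whose instruction is
# push(a) U push(b) becomes two composition-only transitions
example = [(1, ["push(a)", "push(b)"], "", 1)]
assert flatten_transitions(example) == [(1, "push(a)", "", 1),
                                        (1, "push(b)", "", 1)]
```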

It is easy to see that the above proposition also holds in the weighted case if we take an idempotent semiring.

Observation B.3. Let DS be a data storage, 훴 be a set, and 풜 be an idempotent semiring. Then REC(DS, 훴, 풜) = REC(DS†, 훴, 풜). ∎

However, if idempotency or distributivity are not given, the author conjectures that this property does not hold.

Conjecture B.4. Let DS be a data storage and 훴 be a set. There is a strong bimonoid 풜 such that REC(DS, 훴, 풜) ⊊ REC(DS†, 훴, 풜). ∎

B.2 Goldstine automata

In the following, we will briefly outline a more algebraic view on automata with data storage. This is done by essentially converting Goldstine [Gol80] to our notation. First, note that the graph of ℳ (cf. example 2.42) in figure 2.41 is the same as the graph of the FSA $\mathcal{M}_{\mathrm{fsa}} = ([3], R^* \times \Sigma^*, \{1\}, \{3\}, T_{\mathrm{fsa}})$ where

푇fsa: (1, ⟨push(훤), a⟩, 1), (1, ⟨push(훤), b⟩, 1), (1, ⟨id(훤∗), #⟩, 2), (2, ⟨pop(a), a′⟩, 2), (2, ⟨pop(b), b′⟩, 2), (2, ⟨id({휀}), 휀⟩, 3).

The language of ℳfsa is

$$
\mathcal{L}(\mathcal{M}_{\mathrm{fsa}}) = \{\langle \mathrm{push}(\Gamma), \mathrm{a}\rangle, \langle \mathrm{push}(\Gamma), \mathrm{b}\rangle\}^* \circ \{\langle \mathrm{id}(\Gamma^*), \#\rangle\} \circ \{\langle \mathrm{pop}(\mathrm{a}), \mathrm{a}'\rangle, \langle \mathrm{pop}(\mathrm{b}), \mathrm{b}'\rangle\}^* \circ \{\langle \mathrm{id}(\{\varepsilon\}), \varepsilon\rangle\}.
$$

Now let us interpret the elements of ℒ(ℳfsa) over the product monoid $R^* \times \Sigma^*$:

$$
\begin{aligned}
⟦\mathcal{L}(\mathcal{M}_{\mathrm{fsa}})⟧_{R^* \times \Sigma^*} &= \{ \langle (\mathrm{push}(\Gamma))^m ; \mathrm{id}(\Gamma^*) ; r ; \mathrm{id}(\{\varepsilon\}),\ w \# w' \rangle \mid m, n \in \mathbb{N},\ r \in \{\mathrm{pop}(\mathrm{a}), \mathrm{pop}(\mathrm{b})\}^n, \\
&\qquad\qquad w \in \{\mathrm{a}, \mathrm{b}\}^m,\ w' \in \{\mathrm{a}', \mathrm{b}'\}^n \} \\
&= \{ \langle (\mathrm{push}(\Gamma))^m ; r ; \mathrm{id}(\{\varepsilon\}),\ w \# w' \rangle \mid m, n \in \mathbb{N},\ r \in \{\mathrm{pop}(\mathrm{a}), \mathrm{pop}(\mathrm{b})\}^n, \\
&\qquad\qquad w \in \{\mathrm{a}, \mathrm{b}\}^m,\ w' \in \{\mathrm{a}', \mathrm{b}'\}^n \}
\end{aligned}
$$

The set ⟦ℒ(ℳfsa)⟧푅∗×훴∗ contains strings of instructions together with the corresponding input strings. From this set, we only select the strings whose corresponding string of instructions is

valid, i.e. goes from initial storage configurations to final storage configurations (in our case from {휀} to {a, b}∗). This leaves us with an alternative definition for the language of ℳ:

$$
\mathcal{L}(\mathcal{M}) = \{ w \in \Sigma^* \mid \exists \langle r, w \rangle \in ⟦\mathcal{L}(\mathcal{M}_{\mathrm{fsa}})⟧_{R^* \times \Sigma^*}\colon (\{\varepsilon\} \times \{\mathrm{a}, \mathrm{b}\}^*) \cap r \neq \emptyset \}
$$

This alternative definition holds for arbitrary automata with data storage [Gol80]:

Proposition B.5 (Goldstine [Gol80, page 120]). Let ℳ = (푄, DS, 훴, 푄i, 푄f, 푇) be an automaton with data storage, DS = (퐶, 푅, 퐶i, 퐶f), and $\mathcal{M}_{\mathrm{fsa}} = (Q, R^* \times \Sigma^*, Q_{\mathrm{i}}, Q_{\mathrm{f}}, T')$ where $T' = \{(q, \langle r, \sigma \rangle, q') \mid (q, r, \sigma, q') \in T\}$. Then

$$
\mathcal{L}(\mathcal{M}) = \{ w \in \Sigma^* \mid \exists \langle r, w \rangle \in ⟦\mathcal{L}(\mathcal{M}_{\mathrm{fsa}})⟧_{R^* \times \Sigma^*}\colon (C_{\mathrm{i}} \times C_{\mathrm{f}}) \cap r \neq \emptyset \}. \qquad \blacksquare
$$

Goldstine goes even one step further and states that the language ℒ(ℳfsa) may be defined in terms of any language device that generates recognisable languages, e.g. a … or a … . He consequently proposes that an automaton ℳ with data storage consists only of a data storage DS = (퐶, 푅, 퐶i, 퐶f) and a regular language $L \subseteq (R^* \times \Sigma^*)^*$ for some set 훴, and that the language of ℳ should be defined as

$$
\mathcal{L}(\mathcal{M}) = \{ w \in \Sigma^* \mid \exists \langle r, w \rangle \in ⟦L⟧_{R^* \times \Sigma^*}\colon (C_{\mathrm{i}} \times C_{\mathrm{f}}) \cap r \neq \emptyset \}.
$$

The following example shows how proposition B.5 can be used to obtain the language of an automaton with data storage.

Example 3.5 (continuing from p. 66). Example 3.5 on p. 66 illustrates how the language of a TSA can be obtained through analysing runs. Alternatively, we can use proposition B.5 to first determine the state behaviour by giving an appropriate regular subset of $(R^* \times \Sigma^*)^*$:

$$
\begin{aligned}
\mathcal{L}(\mathcal{M}_{\mathrm{fsa}}) = {} & \{\langle \mathrm{push}(1, *), \mathrm{a}\rangle\}^* \circ \{\langle \mathrm{push}(1, \#) ; \mathrm{down}, \varepsilon\rangle\} \circ \{\langle \mathrm{equals}(*) ; \mathrm{down}, \mathrm{b}\rangle\}^* \\
& \circ \{\langle \mathrm{bottom} ; \mathrm{up}(1), \varepsilon\rangle\} \circ \{\langle \mathrm{equals}(*) ; \mathrm{up}(1), \mathrm{c}\rangle\}^* \circ \{\langle \mathrm{equals}(\#) ; \mathrm{down}, \varepsilon\rangle\} \\
& \circ \{\langle \mathrm{equals}(*) ; \mathrm{down}, \mathrm{d}\rangle\}^* \circ \{\langle \mathrm{bottom}, \varepsilon\rangle\}.
\end{aligned}
$$

Then we interpret the elements of this subset in the product monoid $R^* \times \Sigma^*$:

$$
⟦\mathcal{L}(\mathcal{M}_{\mathrm{fsa}})⟧_{R^* \times \Sigma^*} = \{\langle r_{\alpha,\beta,\gamma,\delta},\ \mathrm{a}^\alpha \mathrm{b}^\beta \mathrm{c}^\gamma \mathrm{d}^\delta \rangle \mid \alpha, \beta, \gamma, \delta \in \mathbb{N}\}
$$

where

$$
\begin{aligned}
r_{\alpha,\beta,\gamma,\delta} = {} & (\mathrm{push}(1, *))^\alpha ; \mathrm{push}(1, \#) ; \mathrm{down} ; (\mathrm{equals}(*) ; \mathrm{down})^\beta \\
& ; \mathrm{bottom} ; \mathrm{up}(1) ; (\mathrm{equals}(*) ; \mathrm{up}(1))^\gamma \\
& ; \mathrm{equals}(\#) ; \mathrm{down} ; (\mathrm{equals}(*) ; \mathrm{down})^\delta ; \mathrm{bottom}.
\end{aligned}
$$

Lastly, we determine the storage behaviour by solving the inequation

$$
(\{\{(\varepsilon, @)\}\} \times \mathrm{TS}(\Gamma \cup \{@\})) \cap r_{\alpha,\beta,\gamma,\delta} \neq \emptyset.
$$


We obtain the set {(훼, 훽, 훾, 훿) ∣ 훼 = 훽 = 훾 = 훿 ∈ ℕ} of solutions and therefore the language of ℳ is $\mathcal{L}(\mathcal{M}) = \{\mathrm{a}^n \mathrm{b}^n \mathrm{c}^n \mathrm{d}^n \mid n \in \mathbb{N}\}$, which agrees with the approach using accepting runs. □

B.3 Proof of lemma 2.71

Lemma 2.71. The tuple $(\Delta^* \to \mathcal{A}, \boxplus, \boxtimes, \mathbb{0}, \mathbb{1}.\varepsilon)$ is a semiring.

Proof. The lemma is proven by verifying the following seven properties of the tuple:

(Closedness) This is easily verified.

(⊞ is associative and commutative) Follows from ⊕ being associative and commutative.

(ퟘ is identity for ⊞) Follows from the fact that ퟘ is identity for ⊕.

(⊠ is associative) Let 푓, 푔, ℎ ∈ 풜⟪훥∗⟫ and 푤 ∈ 훥∗. We derive

$$
\begin{aligned}
&((f \boxtimes g) \boxtimes h)(w) \\
&= \textstyle\bigoplus_{u, w_3 \in \Delta^*\colon w = u w_3} (f \boxtimes g)(u) \odot h(w_3) && \text{(by definition of ⊠)} \\
&= \textstyle\bigoplus_{u, w_3 \in \Delta^*\colon w = u w_3} \big( \bigoplus_{w_1, w_2 \in \Delta^*\colon u = w_1 w_2} f(w_1) \odot g(w_2) \big) \odot h(w_3) && \text{(by definition of ⊠)} \\
&= \textstyle\bigoplus_{u, w_3 \in \Delta^*\colon w = u w_3} \bigoplus_{w_1, w_2 \in \Delta^*\colon u = w_1 w_2} f(w_1) \odot g(w_2) \odot h(w_3) && \text{(by distributivity of ⊙ over ⊕)} \\
&= \textstyle\bigoplus_{w_1, w_2, w_3 \in \Delta^*\colon w = w_1 w_2 w_3} f(w_1) \odot g(w_2) \odot h(w_3) && \text{(by commutativity of ⊕)} \\
&= \textstyle\bigoplus_{w_1, v \in \Delta^*\colon w = w_1 v} \bigoplus_{w_2, w_3 \in \Delta^*\colon v = w_2 w_3} f(w_1) \odot g(w_2) \odot h(w_3) && \text{(by commutativity of ⊕)} \\
&= \textstyle\bigoplus_{w_1, v \in \Delta^*\colon w = w_1 v} f(w_1) \odot \bigoplus_{w_2, w_3 \in \Delta^*\colon v = w_2 w_3} g(w_2) \odot h(w_3) && \text{(by distributivity of ⊙ over ⊕)} \\
&= \textstyle\bigoplus_{w_1, v \in \Delta^*\colon w = w_1 v} f(w_1) \odot (g \boxtimes h)(v) && \text{(by definition of ⊠)} \\
&= (f \boxtimes (g \boxtimes h))(w) && \text{(by definition of ⊠)}
\end{aligned}
$$

(ퟙ.휀 is identity for ⊠) Follows from ퟙ and 휀 being identities for ⊙ and ∘, respectively.

(ퟘ is absorbing w.r.t. ⊠) Follows from the fact that ퟘ is absorbing w.r.t. ⊙.

(⊠ distributes over ⊞) Let 푓, 푔, ℎ ∈ 풜⟪훥∗⟫ and 푤 ∈ 훥∗. We derive

(푓 ⊠ (푔 ⊞ ℎ))(푤)
= ⨁_{푤1,푤2∈훥∗: 푤=푤1푤2} 푓(푤1) ⊙ (푔 ⊞ ℎ)(푤2)   (by definition of ⊠)
= ⨁_{푤1,푤2∈훥∗: 푤=푤1푤2} 푓(푤1) ⊙ (푔(푤2) ⊕ ℎ(푤2))   (by definition of ⊞)
= ⨁_{푤1,푤2∈훥∗: 푤=푤1푤2} (푓(푤1) ⊙ 푔(푤2)) ⊕ (푓(푤1) ⊙ ℎ(푤2))   (by distributivity of ⊙ over ⊕)
= (⨁_{푤1,푤2∈훥∗: 푤=푤1푤2} 푓(푤1) ⊙ 푔(푤2)) ⊕ (⨁_{푤1,푤2∈훥∗: 푤=푤1푤2} 푓(푤1) ⊙ ℎ(푤2))   (by commutativity of ⊕)
= (푓 ⊠ 푔)(푤) ⊕ (푓 ⊠ ℎ)(푤)   (by definition of ⊠)
= ((푓 ⊠ 푔) ⊞ (푓 ⊠ ℎ))(푤)   (by definition of ⊞) ∎
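The two derivations can also be spot-checked mechanically on weighted languages with finite support. The following Python sketch is an illustration only: it instantiates 풜 with the semiring of non-negative reals, represents a weighted language as a dict from strings to weights, and uses the ad-hoc names boxplus and boxtimes for ⊞ and ⊠.

def boxplus(f, g):
    """Pointwise sum (⊞)."""
    h = dict(f)
    for w, a in g.items():
        h[w] = h.get(w, 0) + a
    return h

def boxtimes(f, g):
    """Cauchy product (⊠): sum over all splittings w = w1 w2."""
    h = {}
    for w1, a in f.items():
        for w2, b in g.items():
            w = w1 + w2
            h[w] = h.get(w, 0) + a * b
    return h

ONE = {"": 1}   # the monomial 𝟙.휀, identity for ⊠
ZERO = {}       # 𝟘, the weighted language with empty support

f = {"a": 2, "ab": 1}
g = {"b": 3}
h = {"": 1, "c": 5}

# spot-check the two derived identities (and the unit law) on finite examples
assert boxtimes(boxtimes(f, g), h) == boxtimes(f, boxtimes(g, h))
assert boxtimes(f, boxplus(g, h)) == boxplus(boxtimes(f, g), boxtimes(f, h))
assert boxtimes(f, ONE) == f

Since ⊠ is the Cauchy product, both sides of each identity range over the same finite set of splittings of each string, which is exactly what the derivations above establish in general.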


C Additional material for Chapter 3

C.1 Transitions of ℳ(퐺) from example 3.24

The 54 transitions of ℳ(퐺) (cf. example 3.24 and construction 3.20) are shown in the table below. For convenience, we recall the rules of 퐺:

휌1 = 푆 → [푥1,1 푥2,1 푥1,2 푥2,2](퐴, 퐵)
휌2 = 퐴 → [a푥1,1, c푥1,2](퐴)
휌3 = 퐴 → [휀, 휀]()
휌4 = 퐵 → [b푥1,1, d푥1,2](퐵)
휌5 = 퐵 → [휀, 휀]().

#   abbreviation             transition

1   initial(휌1)              (푞i, push(1, 푞f), 휀, ⟨휌1, 1, 0⟩)
2   final(휌1)                (⟨휌1, 1, 4⟩, equals(푞f) ; set(휌1) ; down, 휀, 푞f)

3   read(휌2, 1, 1)           (⟨휌2, 1, 0⟩, id, a, ⟨휌2, 1, 1⟩)
4   read(휌2, 2, 1)           (⟨휌2, 2, 0⟩, id, c, ⟨휌2, 2, 1⟩)
5   read(휌4, 1, 1)           (⟨휌4, 1, 0⟩, id, b, ⟨휌4, 1, 1⟩)
6   read(휌4, 2, 1)           (⟨휌4, 2, 0⟩, id, d, ⟨휌4, 2, 1⟩)

7   call(휌1, 1, 1, 휌2)       (⟨휌1, 1, 0⟩, push(1, ⟨휌1, 1, 1⟩), 휀, ⟨휌2, 1, 0⟩)
8   resume(휌1, 1, 1, 휌2)     (⟨휌1, 1, 0⟩, up(1) ; equals(휌2) ; set(⟨휌1, 1, 1⟩), 휀, ⟨휌2, 1, 0⟩)
9   suspend(휌1, 1, 1, 휌2)    (⟨휌2, 1, 2⟩, equals(⟨휌1, 1, 1⟩) ; set(휌2) ; down, 휀, ⟨휌1, 1, 1⟩)

10  call(휌1, 1, 1, 휌3)       (⟨휌1, 1, 0⟩, push(1, ⟨휌1, 1, 1⟩), 휀, ⟨휌3, 1, 0⟩)
11  resume(휌1, 1, 1, 휌3)     (⟨휌1, 1, 0⟩, up(1) ; equals(휌3) ; set(⟨휌1, 1, 1⟩), 휀, ⟨휌3, 1, 0⟩)
12  suspend(휌1, 1, 1, 휌3)    (⟨휌3, 1, 0⟩, equals(⟨휌1, 1, 1⟩) ; set(휌3) ; down, 휀, ⟨휌1, 1, 1⟩)

13  call(휌1, 1, 2, 휌4)       (⟨휌1, 1, 1⟩, push(2, ⟨휌1, 1, 2⟩), 휀, ⟨휌4, 1, 0⟩)
14  resume(휌1, 1, 2, 휌4)     (⟨휌1, 1, 1⟩, up(2) ; equals(휌4) ; set(⟨휌1, 1, 2⟩), 휀, ⟨휌4, 1, 0⟩)
15  suspend(휌1, 1, 2, 휌4)    (⟨휌4, 1, 2⟩, equals(⟨휌1, 1, 2⟩) ; set(휌4) ; down, 휀, ⟨휌1, 1, 2⟩)

16  call(휌1, 1, 2, 휌5)       (⟨휌1, 1, 1⟩, push(2, ⟨휌1, 1, 2⟩), 휀, ⟨휌5, 1, 0⟩)
17  resume(휌1, 1, 2, 휌5)     (⟨휌1, 1, 1⟩, up(2) ; equals(휌5) ; set(⟨휌1, 1, 2⟩), 휀, ⟨휌5, 1, 0⟩)
18  suspend(휌1, 1, 2, 휌5)    (⟨휌5, 1, 0⟩, equals(⟨휌1, 1, 2⟩) ; set(휌5) ; down, 휀, ⟨휌1, 1, 2⟩)

19  call(휌1, 1, 3, 휌2)       (⟨휌1, 1, 2⟩, push(1, ⟨휌1, 1, 3⟩), 휀, ⟨휌2, 2, 0⟩)
20  resume(휌1, 1, 3, 휌2)     (⟨휌1, 1, 2⟩, up(1) ; equals(휌2) ; set(⟨휌1, 1, 3⟩), 휀, ⟨휌2, 2, 0⟩)
21  suspend(휌1, 1, 3, 휌2)    (⟨휌2, 2, 2⟩, equals(⟨휌1, 1, 3⟩) ; set(휌2) ; down, 휀, ⟨휌1, 1, 3⟩)

22  call(휌1, 1, 3, 휌3)       (⟨휌1, 1, 2⟩, push(1, ⟨휌1, 1, 3⟩), 휀, ⟨휌3, 2, 0⟩)
23  resume(휌1, 1, 3, 휌3)     (⟨휌1, 1, 2⟩, up(1) ; equals(휌3) ; set(⟨휌1, 1, 3⟩), 휀, ⟨휌3, 2, 0⟩)
24  suspend(휌1, 1, 3, 휌3)    (⟨휌3, 2, 0⟩, equals(⟨휌1, 1, 3⟩) ; set(휌3) ; down, 휀, ⟨휌1, 1, 3⟩)

25  call(휌1, 1, 4, 휌4)       (⟨휌1, 1, 3⟩, push(2, ⟨휌1, 1, 4⟩), 휀, ⟨휌4, 2, 0⟩)
26  resume(휌1, 1, 4, 휌4)     (⟨휌1, 1, 3⟩, up(2) ; equals(휌4) ; set(⟨휌1, 1, 4⟩), 휀, ⟨휌4, 2, 0⟩)
27  suspend(휌1, 1, 4, 휌4)    (⟨휌4, 2, 2⟩, equals(⟨휌1, 1, 4⟩) ; set(휌4) ; down, 휀, ⟨휌1, 1, 4⟩)

28  call(휌1, 1, 4, 휌5)       (⟨휌1, 1, 3⟩, push(2, ⟨휌1, 1, 4⟩), 휀, ⟨휌5, 2, 0⟩)
29  resume(휌1, 1, 4, 휌5)     (⟨휌1, 1, 3⟩, up(2) ; equals(휌5) ; set(⟨휌1, 1, 4⟩), 휀, ⟨휌5, 2, 0⟩)
30  suspend(휌1, 1, 4, 휌5)    (⟨휌5, 2, 0⟩, equals(⟨휌1, 1, 4⟩) ; set(휌5) ; down, 휀, ⟨휌1, 1, 4⟩)

31  call(휌2, 1, 2, 휌2)       (⟨휌2, 1, 1⟩, push(1, ⟨휌2, 1, 2⟩), 휀, ⟨휌2, 1, 0⟩)
32  resume(휌2, 1, 2, 휌2)     (⟨휌2, 1, 1⟩, up(1) ; equals(휌2) ; set(⟨휌2, 1, 2⟩), 휀, ⟨휌2, 1, 0⟩)
33  suspend(휌2, 1, 2, 휌2)    (⟨휌2, 1, 2⟩, equals(⟨휌2, 1, 2⟩) ; set(휌2) ; down, 휀, ⟨휌2, 1, 2⟩)

34  call(휌2, 1, 2, 휌3)       (⟨휌2, 1, 1⟩, push(1, ⟨휌2, 1, 2⟩), 휀, ⟨휌3, 1, 0⟩)
35  resume(휌2, 1, 2, 휌3)     (⟨휌2, 1, 1⟩, up(1) ; equals(휌3) ; set(⟨휌2, 1, 2⟩), 휀, ⟨휌3, 1, 0⟩)
36  suspend(휌2, 1, 2, 휌3)    (⟨휌3, 1, 0⟩, equals(⟨휌2, 1, 2⟩) ; set(휌3) ; down, 휀, ⟨휌2, 1, 2⟩)

37  call(휌2, 2, 2, 휌2)       (⟨휌2, 2, 1⟩, push(2, ⟨휌2, 2, 2⟩), 휀, ⟨휌2, 2, 0⟩)
38  resume(휌2, 2, 2, 휌2)     (⟨휌2, 2, 1⟩, up(2) ; equals(휌2) ; set(⟨휌2, 2, 2⟩), 휀, ⟨휌2, 2, 0⟩)
39  suspend(휌2, 2, 2, 휌2)    (⟨휌2, 2, 2⟩, equals(⟨휌2, 2, 2⟩) ; set(휌2) ; down, 휀, ⟨휌2, 2, 2⟩)

40  call(휌2, 2, 2, 휌3)       (⟨휌2, 2, 1⟩, push(2, ⟨휌2, 2, 2⟩), 휀, ⟨휌3, 2, 0⟩)
41  resume(휌2, 2, 2, 휌3)     (⟨휌2, 2, 1⟩, up(2) ; equals(휌3) ; set(⟨휌2, 2, 2⟩), 휀, ⟨휌3, 2, 0⟩)
42  suspend(휌2, 2, 2, 휌3)    (⟨휌3, 2, 0⟩, equals(⟨휌2, 2, 2⟩) ; set(휌3) ; down, 휀, ⟨휌2, 2, 2⟩)

43  call(휌4, 1, 2, 휌4)       (⟨휌4, 1, 1⟩, push(1, ⟨휌4, 1, 2⟩), 휀, ⟨휌4, 1, 0⟩)
44  resume(휌4, 1, 2, 휌4)     (⟨휌4, 1, 1⟩, up(1) ; equals(휌4) ; set(⟨휌4, 1, 2⟩), 휀, ⟨휌4, 1, 0⟩)
45  suspend(휌4, 1, 2, 휌4)    (⟨휌4, 1, 2⟩, equals(⟨휌4, 1, 2⟩) ; set(휌4) ; down, 휀, ⟨휌4, 1, 2⟩)

46  call(휌4, 1, 2, 휌5)       (⟨휌4, 1, 1⟩, push(1, ⟨휌4, 1, 2⟩), 휀, ⟨휌5, 1, 0⟩)
47  resume(휌4, 1, 2, 휌5)     (⟨휌4, 1, 1⟩, up(1) ; equals(휌5) ; set(⟨휌4, 1, 2⟩), 휀, ⟨휌5, 1, 0⟩)
48  suspend(휌4, 1, 2, 휌5)    (⟨휌5, 1, 0⟩, equals(⟨휌4, 1, 2⟩) ; set(휌5) ; down, 휀, ⟨휌4, 1, 2⟩)

49  call(휌4, 2, 2, 휌4)       (⟨휌4, 2, 1⟩, push(2, ⟨휌4, 2, 2⟩), 휀, ⟨휌4, 2, 0⟩)
50  resume(휌4, 2, 2, 휌4)     (⟨휌4, 2, 1⟩, up(2) ; equals(휌4) ; set(⟨휌4, 2, 2⟩), 휀, ⟨휌4, 2, 0⟩)
51  suspend(휌4, 2, 2, 휌4)    (⟨휌4, 2, 2⟩, equals(⟨휌4, 2, 2⟩) ; set(휌4) ; down, 휀, ⟨휌4, 2, 2⟩)

52  call(휌4, 2, 2, 휌5)       (⟨휌4, 2, 1⟩, push(2, ⟨휌4, 2, 2⟩), 휀, ⟨휌5, 2, 0⟩)
53  resume(휌4, 2, 2, 휌5)     (⟨휌4, 2, 1⟩, up(2) ; equals(휌5) ; set(⟨휌4, 2, 2⟩), 휀, ⟨휌5, 2, 0⟩)
54  suspend(휌4, 2, 2, 휌5)    (⟨휌5, 2, 0⟩, equals(⟨휌4, 2, 2⟩) ; set(휌5) ; down, 휀, ⟨휌4, 2, 2⟩)
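As a plausibility check, the abbreviations in the table can be enumerated mechanically from the rules of 퐺: every terminal occurrence yields one read transition, every variable occurrence yields a call, resume, and suspend transition for each rule of the corresponding non-terminal, and the initial rule additionally yields initial and final. The following Python sketch uses an ad-hoc encoding of the rules that is not taken from the thesis; it merely reproduces the count of 54.

# rules of G: lhs, list of argument non-terminals, and components, where
# ("t", s) is a terminal occurrence and ("v", j) refers to the j-th argument
G = {
    "rho1": ("S", ["A", "B"], [[("v", 1), ("v", 2), ("v", 1), ("v", 2)]]),
    "rho2": ("A", ["A"], [[("t", "a"), ("v", 1)], [("t", "c"), ("v", 1)]]),
    "rho3": ("A", [], [[], []]),
    "rho4": ("B", ["B"], [[("t", "b"), ("v", 1)], [("t", "d"), ("v", 1)]]),
    "rho5": ("B", [], [[], []]),
}

by_lhs = {}
for name, (lhs, _, _) in G.items():
    by_lhs.setdefault(lhs, []).append(name)

abbrevs = ["initial(rho1)", "final(rho1)"]
for name, (_, args, comps) in G.items():
    for c, comp in enumerate(comps, start=1):
        for p, item in enumerate(comp, start=1):
            if item[0] == "t":        # terminal occurrence: one read transition
                abbrevs.append(f"read({name}, {c}, {p})")
            else:                     # variable occurrence: one triple per callee
                for callee in by_lhs[args[item[1] - 1]]:
                    for kind in ("call", "resume", "suspend"):
                        abbrevs.append(f"{kind}({name}, {c}, {p}, {callee})")

assert len(abbrevs) == 54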

Bibliography

[Aho68] A. V. Aho. “Indexed grammars – an extension of context-free grammars”. In: Journal of the Association for Computing Machinery 15.4 (Oct. 1968), pp. 647–671. issn: 0004-5411. doi: 10.1145/321479.321488.
[Aho69] A. V. Aho. “Nested Stack Automata”. In: Journal of the ACM 16.3 (July 1969), pp. 383–406. issn: 0004-5411. doi: 10.1145/321526.321529.
[AU71] A. V. Aho and J. D. Ullman. “Translations on a context free grammar”. In: Information and Control 19.5 (Dec. 1971), pp. 439–475. doi: 10.1016/s0019-9958(71)90706-6.
[AU72] A. V. Aho and J. D. Ullman. The Theory of Parsing, Translation, and Compiling. Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1972. isbn: 0-13-914556-7. url: http://dl.acm.org/citation.cfm?id=SERIES11430.578789.
[Bak81] T. P. Baker. “Extending lookahead for LR parsers”. In: Journal of Computer and System Sciences 22.2 (Apr. 1981), pp. 243–259. doi: 10.1016/0022-0000(81)90030-1.
[Bar+01] F. Barthélemy, P. Boullier, P. Deschamp, and É. V. de la Clergerie. “Guided Parsing of Range Concatenation Languages”. In: Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics. 2001. url: http://aclweb.org/anthology/P01-1007.
[Ber79] J. Berstel. Transductions and Context-Free Languages. Leitfäden der angewandten Mathematik und Mechanik. Stuttgart, Germany: Teubner, 1979. isbn: 3-519-02340-7.
[BL05] H. Burden and P. Ljunglöf. “Parsing Linear Context-free Rewriting Systems”. In: Proceedings of the Ninth International Workshop on Parsing Technology. Ed. by H. Bunt, R. Malouf, and A. Lavie. Parsing ’05. Vancouver, British Columbia, Canada: Association for Computational Linguistics, Oct. 9, 2005, pp. 11–17. url: http://dl.acm.org/citation.cfm?id=1654494.1654496.
[BL70] G. Birkhoff and J. D. Lipson. “Heterogeneous algebras”. In: Journal of Combinatorial Theory 8.1 (Jan. 1970), pp. 115–133.
[Bod92] R. Bod. “A Computational Model Of Language Performance: Data Oriented Parsing”. In: 14th International Conference on Computational Linguistics (COLING 1992). 1992, pp. 855–859. url: http://aclweb.org/anthology/C92-3126.
[Bou00] P. Boullier. “Range Concatenation Grammars”. In: Proceedings of the Sixth International Workshop on Parsing Technology (IWPT 2000). Ed. by J. Carroll. 2000. url: http://www.sussex.ac.uk/Users/johnca/iwpt2000/boullier-iwpt2000.ps.gz.


[Bou98a] P. Boullier. “A generalization of mildly context-sensitive formalisms”. In: Proceedings of the fourth international workshop on tree adjoining grammars and related formalisms (TAG+ 4). Citeseer. 1998, pp. 17–20.
[Bou98b] P. Boullier. Proposal for a Natural Language Processing Syntactic Backbone. Rapport de recherche RR-3342. Institut national de recherche en informatique et en automatique (INRIA), Jan. 1998. url: http://hal.inria.fr/inria-00073347.
[Bra+04] S. Brants, S. Dipper, P. Eisenberg, S. Hansen-Schirra, E. König, W. Lezius, C. Rohrer, G. Smith, and H. Uszkoreit. “TIGER: Linguistic Interpretation of a German Corpus”. In: Research on Language and Computation 2.4 (Dec. 2004), pp. 597–620. doi: 10.1007/s11168-004-7431-3.
[Bra69] W. S. Brainerd. “Tree Generating Regular Systems”. In: Information and Control 14.2 (1969), pp. 217–231. doi: 10.1016/S0019-9958(69)90065-5.
[BS90] M. E. Bermudez and K. M. Schimpf. “Practical arbitrary lookahead LR parsing”. In: Journal of Computer and System Sciences 41.2 (Oct. 1990), pp. 230–250. doi: 10.1016/0022-0000(90)90037-l.
[Büc60] J. R. Büchi. “Weak Second-Order Arithmetic and Finite Automata”. In: Zeitschrift für mathematische Logik und Grundlagen der Mathematik 6.1-6 (1960), pp. 66–92. issn: 1521-3870. doi: 10.1002/malq.19600060105.
[Büc66] J. R. Büchi. “On a Decision Method in Restricted Second Order Arithmetic”. In: Proceeding of the 1960 International Congress on Logic, Methodology and Philosophy of Science. Symposium on Decision Problems. Elsevier, 1966, pp. 1–11. doi: 10.1016/s0049-237x(09)70564-6.
[CC17] M. Coavoux and B. Crabbé. “Incremental Discontinuous Phrase Structure Parsing with the GAP Transition”. In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers. Valencia, Spain: Association for Computational Linguistics, 2017, pp. 1259–1270. url: http://aclweb.org/anthology/E17-1118.
[Cha+06] E. Charniak, M. Pozar, T. Vu, M. Johnson, M. Elsner, J. Austerweil, D. Ellis, I. Haxton, C. Hill, R. Shrivaths, and J. Moore. “Multilevel coarse-to-fine PCFG parsing”. In: Proceedings of the Main Conference on Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics (NAACL HLT). Association for Computational Linguistics (ACL), 2006. doi: 10.3115/1220835.1220857.
[Cha96] E. Charniak. “Tree-Bank Grammars”. In: Proceedings of the Thirteenth National Conference on Artificial Intelligence and Eighth Innovative Applications of Artificial Intelligence Conference, AAAI 96, IAAI 96, Portland, Oregon, USA, August 4-8, 1996, Volume 2. 1996, pp. 1031–1036. url: http://www.aaai.org/Library/AAAI/1996/aaai96-153.php.


[Chi00] D. Chiang. “Statistical parsing with an automatically-extracted tree adjoining grammar”. In: Proceedings of the 38th Annual Meeting on Association for Computational Linguistics - ACL ’00. Association for Computational Linguistics, 2000. doi: 10.3115/1075218.1075276.
[Cho56] N. Chomsky. “Three models for the description of language”. In: IEEE Transactions on Information Theory 2.3 (Sept. 1956), pp. 113–124. doi: 10.1109/tit.1956.1056813.
[Cho59] N. Chomsky. “On certain formal properties of grammars”. In: Information and Control 2.2 (1959), pp. 137–167. doi: 10.1016/S0019-9958(59)90362-6.
[Cho62] N. Chomsky. “Formal properties of grammars”. In: Handbook of Mathematical Psychology. Ed. by R. D. Luce, R. R. Bush, and E. Galanter. Vol. 2. 1962.
[Chu36] A. Church. “An Unsolvable Problem of Elementary Number Theory”. In: American Journal of Mathematics 58.2 (Apr. 1936), pp. 345–363. url: https://www.jstor.org/stable/2371045.
[Cla02] S. Clark. “Supertagging for Combinatory Categorial Grammar”. In: Proceedings of the Sixth International Workshop on Tree Adjoining Grammar and Related Frameworks (TAG+6). Universitá di Venezia: Association for Computational Linguistics, May 2002, pp. 19–24. url: https://www.aclweb.org/anthology/W02-2203.
[Cor+09] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. 3rd ed. The MIT Press, 2009.
[Cov01] M. A. Covington. “A fundamental algorithm for dependency parsing”. In: Proceedings of the 39th annual ACM southeast conference. 2001, pp. 95–102.
[Cra12] A. van Cranenburgh. “Efficient Parsing with Linear Context-free Rewriting Systems”. In: Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics. Ed. by W. Daelemans. EACL ’12. Avignon, France: Association for Computational Linguistics, Apr. 23, 2012, pp. 460–470. isbn: 978-1-937284-19-0. url: http://dl.acm.org/citation.cfm?id=2380816.2380873.
[CS63] N. Chomsky and M. P. Schützenberger. “The algebraic theory of context-free languages”. In: Computer Programming and Formal Systems, Studies in Logic (1963), pp. 118–161. doi: 10.1016/S0049-237X(09)70104-1.
[CSB16] A. van Cranenburgh, R. Scha, and R. Bod. “Data-Oriented Parsing with Discontinuous Constituents and Function Tags”. In: Journal of Language Modelling 4.1 (Apr. 2016), p. 57. doi: 10.15398/jlm.v4i1.100.
[CSS11] A. van Cranenburgh, R. Scha, and F. Sangati. “Discontinuous Data-oriented Parsing: A Mildly Context-sensitive All-fragments Grammar”. In: Proceedings of the Second Workshop on Statistical Parsing of Morphologically-Rich Languages. Ed. by D. Seddah, R. Tsarfaty, and J. Foster. SPMRL ’11. Dublin, Ireland: Association for Computational Linguistics, Oct. 6, 2011, pp. 34–44. isbn: 978-1-932432-73-2. url: http://dl.acm.org/citation.cfm?id=2206359.2206364.


[Den15] T. Denkinger. “A Chomsky-Schützenberger representation for weighted multiple context-free languages”. In: Proceedings of the 12th International Conference on Finite-State Methods and Natural Language Processing. 12. 2015. url: https://www.aclweb.org/anthology/W15-4803.
[Den16a] T. Denkinger. “An Automata Characterisation for Multiple Context-Free Languages”. In: Proceedings of the 20th International Conference on Developments in Language Theory. Ed. by S. Brlek and C. Reutenauer. 2. Springer Berlin Heidelberg, 2016, pp. 138–150. doi: 10.1007/978-3-662-53132-7_12.
[Den16b] T. Denkinger. “An automata characterisation for multiple context-free languages”. In: Computing Research Repository (June 2016). eprint: 1606.02975.
[Den17a] T. Denkinger. “Approximation of Weighted Automata with Storage”. In: Proceedings of the Eighth International Symposium on Games, Automata, Logics and Formal Verification. Vol. 256. Electronic Proceedings in Theoretical Computer Science. Open Publishing Association, Sept. 2017, pp. 91–105. doi: 10.4204/eptcs.256.7.
[Den17b] T. Denkinger. “Chomsky-Schützenberger parsing for weighted multiple context-free languages”. In: Journal of Language Modelling 5.1 (July 2017), p. 3. doi: 10.15398/jlm.v5i1.159.
[DK09] M. Droste and W. Kuich. “Semirings and Formal Power Series”. In: Handbook of Weighted Automata. Ed. by M. Droste, W. Kuich, and H. Vogler. Monographs in Theoretical Computer Science. Springer Berlin Heidelberg, 2009, pp. 3–28. doi: 10.1007/978-3-642-01492-5_1.
[DM11] M. Droste and I. Meinecke. “Weighted automata and regular expressions over valuation monoids”. In: International Journal of Foundations of Computer Science 22.08 (2011), pp. 1829–1844. doi: 10.1142/S0129054111009069.
[DPS79] J. Duske, R. Parchmann, and J. Specht. “A Homomorphic Characterization of Indexed Languages”. In: Elektronische Informationsverarbeitung und Kybernetik 15.4 (1979), pp. 187–195.
[DSV10] M. Droste, T. Stüber, and H. Vogler. “Weighted finite automata over strong bimonoids”. In: Information Sciences 180.1 (2010), pp. 156–166. doi: 10.1016/j.ins.2009.09.003.
[DV13] M. Droste and H. Vogler. “The Chomsky-Schützenberger Theorem for Quantitative Context-Free Languages”. In: Developments in Language Theory. Ed. by M.-P. Béal and O. Carton. Vol. 7907. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2013, pp. 203–214. isbn: 978-3-642-38770-8. doi: 10.1007/978-3-642-38771-5_19.
[DV14] M. Droste and H. Vogler. “The Chomsky-Schützenberger theorem for quantitative context-free languages”. In: International Journal of Foundations of Computer Science 25.08 (Dec. 2014), pp. 955–969. issn: 1793-6373. doi: 10.1142/s0129054114400176.
[Ear70] J. Earley. “An efficient context-free parsing algorithm”. In: Communications of the ACM 13.2 (Feb. 1970), pp. 94–102. doi: 10.1145/362007.362035.


[ÉK09] Z. Ésik and W. Kuich. “Finite Automata”. In: Handbook of Weighted Automata. Springer Berlin Heidelberg, 2009, pp. 69–104. doi: 10.1007/978-3-642-01492-5_3.
[Elg61] C. C. Elgot. “Decision problems of finite automata design and related arithmetics”. In: Transactions of the American Mathematical Society 98.1 (Jan. 1961), pp. 21–21. doi: 10.1090/s0002-9947-1961-0139530-9.
[Eng14] J. Engelfriet. “Context-free grammars with storage”. In: CoRR (2014). eprint: 1408.0683. url: http://arxiv.org/pdf/1408.0683.
[Eng15] J. Engelfriet. “Tree Automata and Tree Grammars”. In: CoRR (Oct. 2015). eprint: 1510.02036.
[Eng86] J. Engelfriet. Context-free grammars with storage. Tech. rep. I86-11. Leiden University, 1986.
[EV86] J. Engelfriet and H. Vogler. “Pushdown machines for the macro tree transducer”. In: Theoretical Computer Science 42.0 (1986), pp. 251–368. issn: 0304-3975. doi: 10.1016/0304-3975(86)90052-6. url: http://www.sciencedirect.com/science/article/pii/0304397586900526.
[Eva97] E. G. Evans. “Approximating context-free grammars with a finite-state calculus”. In: Proceedings of the 8th Conference of the European chapter of the Association for Computational Linguistics. Association for Computational Linguistics (ACL), 1997. doi: 10.3115/979617.979675.
[Fil86] G. Filé. “Machines for attribute grammars”. In: Information and Control 69.1-3 (Apr. 1986), pp. 41–124. issn: 0019-9958. doi: 10.1016/s0019-9958(86)80043-2.
[Fis68a] M. J. Fischer. “Grammars with macro-like productions”. In: IEEE Conference Record of 9th Annual Symposium on Switching and Automata Theory. IEEE. 1968, pp. 131–142. doi: 10.1109/SWAT.1968.12.
[Fis68b] M. J. Fischer. “Grammars with macro-like productions”. PhD thesis. Harvard University, 1968.
[FV15] S. Fratani and E. M. Voundy. “Context-free characterization of Indexed Languages”. In: CoRR abs/1409.6112 (2015). eprint: 1409.6112. url: http://arxiv.org/abs/1409.6112.
[FV16] S. Fratani and E. M. Voundy. “Homomorphic Characterizations of Indexed Languages”. In: Proceedings of the 10th International Conference on Language and Automata Theory and Applications (LATA 2015). Ed. by A.-H. Dediu, J. Janoušek, C. Martín-Vide, and B. Truthe. Springer Science + Business Media, 2016, pp. 359–370. doi: 10.1007/978-3-319-30000-9_28.
[GF15] C. Gómez-Rodríguez and D. Fernández-González. “An Efficient Dynamic Oracle for Unrestricted Non-Projective Parsing”. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Association for Computational Linguistics, 2015. doi: 10.3115/v1/p15-2042.


[GGH67] S. Ginsburg, S. A. Greibach, and M. A. Harrison. “One-way stack automata”. In: Journal of the ACM 14.2 (Apr. 1967), pp. 389–418. doi: 10.1145/321386.321403.
[GL96] W. Golubski and W.-M. Lippe. “Tree-stack automata”. In: Mathematical Systems Theory 29.3 (June 1996), pp. 227–244. issn: 1433-0490. doi: 10.1007/BF01201277. url: http://dx.doi.org/10.1007/BF01201277.
[GM85] J. A. Goguen and J. Meseguer. “Completeness of many-sorted equational logic”. In: Houston Journal of Mathematics 11.3 (1985). url: https://www.math.uh.edu/~hjm/vol11-3.html.
[Gol79] J. Goldstine. “A rational theory of AFLs”. In: Automata, Languages and Programming. Springer Berlin Heidelberg, 1979, pp. 271–281. doi: 10.1007/3-540-09510-1_21.
[Gol80] J. Goldstine. “Formal languages and their relation to automata: What Hopcroft & Ullman didn’t tell us”. In: Formal Language Theory: Perspectives and Open Problems. Academic Press, New York, 1980, pp. 109–140. isbn: 978-1483236414.
[Gol99] J. S. Golan. Semirings and their Applications. Springer Netherlands, 1999. isbn: 978-0-7923-5786-5. doi: 10.1007/978-94-015-9333-5.
[Goo99] J. Goodman. “Semiring Parsing”. In: Computational Linguistics 25 (4 Dec. 1999), pp. 573–605. issn: 0891-2017. url: http://dl.acm.org/citation.cfm?id=973226.973230.
[GP85] G. Gazdar and G. K. Pullum. “Computationally Relevant Properties of Natural Languages and Their Grammars”. In: The Formal Complexity of Natural Language. Springer Netherlands, 1985, pp. 387–437. doi: 10.1007/978-94-009-3401-6_17.
[Grä78] G. Grätzer. General lattice theory. Pure and applied mathematics. Academic Press, 1978. isbn: 0-12-295750-4.
[Gro97] A. V. Groenink. “Mild Context-Sensitivity and Tuple-Based Generalizations of Context-Free Grammar”. In: Linguistics and Philosophy 20.6 (1997), pp. 607–636. doi: 10.1023/a:1005376413354.
[Gua92] Y. Guan. “Klammergrammatiken, Netzgrammatiken und Interpretationen von Netzen”. PhD thesis. University of Saarbrücken, 1992.
[Hig63] P. J. Higgins. “Algebras with a Scheme of Operators”. In: Mathematische Nachrichten 27.1-2 (1963), pp. 115–132. doi: 10.1002/mana.19630270108.
[HMU01] J. E. Hopcroft, R. Motwani, and J. D. Ullman. Introduction to Automata Theory, Languages, and Computation. 2nd. Addison-Wesley, 2001.
[HP96] G. Hotz and G. Pitsch. “On parsing coupled-context-free languages”. In: Theoretical Computer Science 161.1–2 (1996), pp. 205–233. issn: 0304-3975. doi: 10.1016/0304-3975(95)00114-X. url: http://www.sciencedirect.com/science/article/pii/030439759500114X.
[HU69] J. E. Hopcroft and J. D. Ullman. Formal languages and their relation to automata. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1969.


[HU79] J. E. Hopcroft and J. D. Ullman. Introduction to Automata Theory, Languages and Computation. 1st ed. Addison-Wesley, 1979.
[Hul11] M. Hulden. “Parsing CFGs and PCFGs with a Chomsky-Schützenberger Representation”. In: Human Language Technology. Challenges for Computer Science and Linguistics. Ed. by Z. Vetulani. Vol. 6562. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2011, pp. 151–160. isbn: 978-3-642-20094-6. doi: 10.1007/978-3-642-20095-3_14.
[HV15] L. Herrmann and H. Vogler. “A Chomsky-Schützenberger Theorem for Weighted Automata with Storage”. In: Proceedings of the 6th International Conference on Algebraic Informatics (CAI 2015). Ed. by A. Maletti. Vol. 9270. Springer International Publishing, 2015, pp. 90–102. isbn: 978-3-319-23021-4. doi: 10.1007/978-3-319-23021-4_11.
[HV16] L. Herrmann and H. Vogler. “Weighted Symbolic Automata with Data Storage”. In: Developments in Language Theory. Springer Berlin Heidelberg, 2016, pp. 203–215. doi: 10.1007/978-3-662-53132-7_17.
[JLT75] A. K. Joshi, L. S. Levy, and M. Takahashi. “Tree adjunct grammars”. In: Journal of Computer and System Sciences 10.1 (1975), pp. 136–163. issn: 0022-0000. doi: 10.1016/S0022-0000(75)80019-5. url: http://www.sciencedirect.com/science/article/pii/S0022000075800195.
[JM09] D. Jurafsky and J. H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Second. Pearson Prentice Hall, 2009. isbn: 978-0-13-504196-3.
[JN07] R. Johansson and P. Nugues. “Extended Constituent-to-Dependency Conversion for English”. In: Proceedings of the 16th Nordic Conference of Computational Linguistics (NODALIDA 2007). Tartu, Estonia: University of Tartu, Estonia, May 2007, pp. 105–112. url: https://www.aclweb.org/anthology/W07-2416.
[Joh98] M. Johnson. “Finite-state approximation of constraint-based grammars using left-corner grammar transforms”. In: Proceedings of the 17th international conference on Computational linguistics. Association for Computational Linguistics (ACL), 1998. doi: 10.3115/980451.980948.
[Jos85] A. K. Joshi. “How much context-sensitivity is needed for characterizing structural descriptions: Tree adjoining grammars”. In: Natural language parsing: Psychological, computational and theoretical perspectives. (1985).
[JVW90] A. K. Joshi, K. Vijay-Shanker, and D. J. Weir. The convergence of mildly context-sensitive grammar formalisms. Tech. rep. University of Pennsylvania, 1990.
[Kal10] L. Kallmeyer. Parsing beyond context-free grammars. Springer, 2010. doi: 10.1007/978-3-642-14846-0.


[Kan08] M. Kanazawa. “A Prefix-Correct Earley Recognizer for Multiple Context-Free Grammars”. In: Proceedings of the Ninth International Workshop on Tree Adjoining Grammar and Related Frameworks (TAG+9). Tübingen, Germany: Association for Computational Linguistics, 2008, pp. 49–56. url: http://aclweb.org/anthology/W08-2307.
[Kan09] M. Kanazawa. “The pumping lemma for well-nested multiple context-free languages”. In: Developments in Language Theory. Springer. 2009, pp. 312–325. doi: 10.1007/978-3-642-02737-6_25.
[Kan14] M. Kanazawa. “Multidimensional trees and a Chomsky-Schützenberger-Weir representation theorem for simple context-free tree grammars”. In: Journal of Logic and Computation (June 2014). issn: 1465-363X. doi: 10.1093/logcom/exu043.
[KB82] R. M. Kaplan and J. Bresnan. “Lexical-functional grammar: A formal system for grammatical representation”. In: Formal Issues in Lexical-Functional Grammar (1982), pp. 29–130.
[Kle51] S. C. Kleene. Representation of events in nerve nets and finite automata. Tech. rep. Santa Monica, CA: Rand Corporation, Project Air Force, 1951.
[KM13] L. Kallmeyer and W. Maier. “Data-driven parsing using probabilistic linear context-free rewriting systems”. In: Computational Linguistics 39.1 (2013), pp. 87–119. doi: 10.1162/COLI_a_00136.
[KM15] L. Kallmeyer and W. Maier. “LR Parsing for LCFRS”. In: Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 2015. doi: 10.3115/v1/n15-1134.
[Kor17] M. Korn. Coarse-to-fine recognition for weighted tree-stack automata. Technischer Bericht für Profilmodul (INF-PM-FPG). Technische Universität Dresden, Oct. 6, 2017.
[KS09] M. Kuhlmann and G. Satta. “Treebank Grammar Techniques for Non-projective Dependency Parsing”. In: Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics. EACL ’09. Athens, Greece: Association for Computational Linguistics, 2009, pp. 478–486. url: http://dl.acm.org/citation.cfm?id=1609067.1609120.
[KS10] M. Kanazawa and S. Salvati. “The copying power of well-nested multiple context-free grammars”. In: Language and Automata Theory and Applications. Ed. by A.-H. Dediu, H. Fernau, and C. Martín-Vide. Springer, 2010, pp. 344–355. doi: 10.1007/978-3-642-13089-2_29.
[KS85] W. Kuich and A. Salomaa. Semirings, Automata, Languages. Berlin, Heidelberg: Springer-Verlag, 1985. isbn: 0387137165.
[KT81] S. Krauwer and L. des Tombe. “Transducers and Grammars as Theories of Language”. In: Theoretical Linguistics 8.1-3 (1981). doi: 10.1515/thli.1981.8.1-3.173.
[Kui99] W. Kuich. “Linear systems of equations and automata on distributive multioperator monoids”. In: Contributions to general algebra 12 (1999).


[LL87] D. T. Langendoen and Y. Langsam. “On the design of finite transducers for parsing phrase-structure languages”. In: Mathematics of Language (1987), pp. 191–235.
[LLZ16] M. Lewis, K. Lee, and L. Zettlemoyer. “LSTM CCG Parsing”. In: Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 2016. doi: 10.18653/v1/n16-1026.
[Mai15] W. Maier. “Discontinuous Incremental Shift-reduce Parsing”. In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Beijing, China: Association for Computational Linguistics, July 2015, pp. 1202–1212. url: http://www.aclweb.org/anthology/P15-1116.
[MCP05] R. McDonald, K. Crammer, and F. Pereira. “Online large-margin training of dependency parsers”. In: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics - ACL ’05. Association for Computational Linguistics, 2005. doi: 10.3115/1219840.1219852.
[Men62] B. Mendelson. Introduction to Topology. College Mathematics Series. Boston: Allyn and Bacon, Inc., 1962.
[Mic01a] J. Michaelis. “Derivational Minimalism Is Mildly Context-Sensitive”. English. In: Logical Aspects of Computational Linguistics. Ed. by M. Moortgat. Vol. 2014. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2001, pp. 179–198. isbn: 978-3-540-42251-8. doi: 10.1007/3-540-45738-0_11.
[Mic01b] J. Michaelis. “Transforming Linear Context-Free Rewriting Systems into Minimalist Grammars”. English. In: Logical Aspects of Computational Linguistics. Ed. by P. Groote, G. Morrill, and C. Retoré. Vol. 2099. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2001, pp. 228–244. isbn: 978-3-540-42273-0. doi: 10.1007/3-540-48199-0_14.
[Mic09] J. Michaelis. “An additional observation on strict derivational minimalism”. In: Proceedings of the 10th conference on Formal Grammar and the 9th Meeting on Mathematics of Language (FG-MoL 2005). Citeseer, 2009, pp. 101–111.
[MMT05] T. Matsuzaki, Y. Miyao, and J. Tsujii. “Probabilistic CFG with latent annotations”. In: Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics - ACL ’05. Association for Computational Linguistics, 2005. doi: 10.3115/1219840.1219850.
[Moh00] M. Mohri. “Minimization algorithms for sequential transducers”. In: Theoretical Computer Science 234.1–2 (Mar. 2000), pp. 177–201. issn: 0304-3975. doi: 10.1016/s0304-3975(98)00115-7.
[MS08] W. Maier and A. Søgaard. “Treebanks and mild context-sensitivity”. In: Proceedings of Formal Grammar. 2008, p. 61. url: http://web.stanford.edu/group/cslipublications/cslipublications/FG/2008/maier.pdf.


[Mül18] S. Müller. Grammatical Theory: From Transformational Grammar To Constraint-Based Approaches. Second Revised And Extended Edition. Zenodo, 2018. doi: 10.5281/zenodo.1193241.
[Ned00] M.-J. Nederhof. “Regular approximation of CFLs: a grammatical view”. In: Advances in Probabilistic and other Parsing Technologies. Ed. by H. Bunt and A. Nijholt. Springer, 2000, pp. 221–241. doi: 10.1007/978-94-015-9470-7_12.
[Ned17] M.-J. Nederhof. Personal communication. 2017.
[Nis92] T. Nishino. “Relating attribute grammars and lexical-functional grammars”. In: Information Sciences 66.1-2 (Dec. 1992), pp. 1–22. doi: 10.1016/0020-0255(92)90084-l.
[Niv03] J. Nivre. “An Efficient Algorithm for Projective Dependency Parsing”. In: Proceedings of the 8th International Workshop on Parsing Technologies. 2003, pp. 149–160.
[Niv08] J. Nivre. “Algorithms for Deterministic Incremental Dependency Parsing”. In: Computational Linguistics 34.4 (Dec. 2008), pp. 513–553. doi: 10.1162/coli.07-056-r1-07-027.
[Niv69] I. Niven. “Formal Power Series”. In: The American Mathematical Monthly 76.8 (Oct. 1969), pp. 871–889. doi: 10.1080/00029890.1969.12000359.
[NSK92] R. Nakanishi, H. Seki, and T. Kasami. “On the Generative Capacity of Lexical-Functional Grammars”. In: IEICE Transactions on Information and Systems E75-D.4 (July 25, 1992), pp. 509–516. issn: 0916-8532.
[Pet+06] S. Petrov, L. Barrett, R. Thibaux, and D. Klein. “Learning accurate, compact, and interpretable tree annotation”. In: Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the ACL - ACL ’06. Association for Computational Linguistics, 2006. doi: 10.3115/1220175.1220230.
[Pol84] C. J. Pollard. “Generalized phrase structure grammars, head grammars, and natural language”. PhD thesis. Stanford University, 1984, p. 248. url: http://searchworks.stanford.edu/view/1095753.
[Pul86] S. G. Pulman. “Grammars, parsers, and memory limitations”. In: Language and Cognitive Processes 1.3 (July 1986), pp. 197–225. doi: 10.1080/01690968608407061.
[PW91] F. C. N. Pereira and R. N. Wright. “Finite-state approximation of phrase structure grammars”. In: Proceedings of the 29th annual meeting on Association for Computational Linguistics (ACL’91). Association for Computational Linguistics (ACL), 1991. doi: 10.3115/981344.981376. url: http://dl.acm.org/citation.cfm?doid=981344.981376.
[RD19] T. Ruprecht and T. Denkinger. “Implementation of a Chomsky-Schützenberger n-best parser for weighted multiple context-free grammars”. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Minneapolis, Minnesota: Association for Computational Linguistics, June 2019, pp. 178–191. url: https://www.aclweb.org/anthology/papers/N/N19/N19-1016.


[Rie+01] S. Riezler, T. H. King, R. M. Kaplan, R. Crouch, J. T. Maxwell, and M. Johnson. “Parsing the wall street journal using a Lexical-Functional Grammar and discriminative estimation techniques”. In: Proceedings of the 40th Annual Meeting on Association for Computational Linguistics - ACL ’02. Association for Computational Linguistics, 2001. doi: 10.3115/1073083.1073129.
[Ros68] A. Rosenfeld. An introduction to algebraic structures. Holden-Day, 1968.
[Rou69] W. C. Rounds. “Context-free grammars on trees”. In: Proceedings of the first annual ACM symposium on theory of computing (STOC ’69). Association for Computing Machinery (ACM), 1969. doi: 10.1145/800169.805428.
[Rou70] W. C. Rounds. “Tree-oriented Proofs of Some Theorems on Context-free and Indexed Languages”. In: Proceedings of the Second Annual Association for Computing Machinery Symposium on Theory of Computing. STOC ’70. Northampton, Massachusetts, USA: Association for Computing Machinery, 1970, pp. 109–116. doi: 10.1145/800161.805156.
[RS59] M. O. Rabin and D. Scott. “Finite Automata and Their Decision Problems”. In: IBM Journal of Research and Development 3.2 (Apr. 1959), pp. 114–125. doi: 10.1147/rd.32.0114.
[RS94] O. Rambow and G. Satta. A two-dimensional hierarchy for parallel rewriting systems. Tech. rep. Institute for Research in Cognitive Science, 1994.
[Rup18] T. Ruprecht. “Implementation and evaluation of k-best Chomsky-Schützenberger parsing for weighted multiple context-free grammars”. MA thesis. Technische Universität Dresden, Mar. 21, 2018. url: https://forschungsinfo.tu-dresden.de/detail/abschlussarbeit/41842.
[Sch63] M. P. Schützenberger. “On context-free languages and push-down automata”. In: Information and Control 6.3 (1963), pp. 246–264. doi: 10.1016/S0019-9958(63)90306-1.
[Sch90] R. Scha. “Language theory and language technology; competence and performance”. Dutch. In: Computertoepassingen in de Neerlandistiek (1990), pp. 7–22.
[Sco67] D. Scott. “Some definitional suggestions for automata theory”. In: Journal of Computer and System Sciences 1.2 (Aug. 1967), pp. 187–212. issn: 0022-0000. doi: 10.1016/s0022-0000(67)80014-x.
[Sek+91] H. Seki, T. Matsumura, M. Fujii, and T. Kasami. “On multiple context-free grammars”. In: Theoretical Computer Science 88.2 (1991), pp. 191–229. issn: 0304-3975. doi: 10.1016/0304-3975(91)90374-B.
[Sek+93] H. Seki, R. Nakanishi, Y. Kaji, S. Ando, and T. Kasami. “Parallel Multiple Context-free Grammars, Finite-state Translation Systems, and Polynomial-time Recognizable Subclasses of Lexical-functional Grammars”. In: Proceedings of the 31st Annual Meeting on Association for Computational Linguistics. ACL ’93. Columbus, Ohio: Association for Computational Linguistics, 1993, pp. 130–139. doi: 10.3115/981574.981592.


[She59] J. C. Shepherdson. “The Reduction of Two-Way Automata to One-Way Automata”. In: IBM Journal of Research and Development 3.2 (Apr. 1959), pp. 198–200. doi: 10.1147/rd.32.0198.
[Sip12] M. Sipser. Introduction to the Theory of Computation. Third. Cengage Learning, June 27, 2012. 458 pp. isbn: 113318779X.
[SJ88] Y. Schabes and A. K. Joshi. “An Earley-type parsing algorithm for Tree Adjoining Grammars”. In: Proceedings of the 26th annual meeting on Association for Computational Linguistics. Association for Computational Linguistics, 1988. doi: 10.3115/982023.982055.
[Sku+98] W. Skut, T. Brants, B. Krenn, and H. Uszkoreit. “A Linguistically Interpreted Corpus of German Newspaper Text”. In: Proceedings of the 10th European Summer School in Logic, Language and Information. Workshop on Recent Advances in Corpus Annotation. 1998.
[SS78] A. Salomaa and M. Soittola. Automata-Theoretic Aspects of Formal Power Series. Ed. by F. L. Bauer and D. Gries. Springer New York, 1978. isbn: 978-1-4612-6266-4. doi: 10.1007/978-1-4612-6264-0.
[Sta93] P. Staudacher. “New frontiers beyond context-freeness: DI-Grammars and DI-Automata”. In: Proceedings of the sixth conference on European chapter of the Association for Computational Linguistics (1993). doi: 10.3115/976744.976786.
[Sta97] E. Stabler. “Derivational minimalism”. In: Logical Aspects of Computational Linguistics. Ed. by C. Retoré. Vol. 1328. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 1997, pp. 68–95. isbn: 978-3-540-63700-4. doi: 10.1007/BFb0052152.
[Ste87] M. Steedman. “Combinatory grammars and parasitic gaps”. In: Natural Language and Linguistic Theory 5.3 (Aug. 1987), pp. 403–439. doi: 10.1007/bf00134555.
[Sup72] P. Suppes. “Probabilistic Grammars for Natural Languages”. In: Semantics of Natural Language. Springer Netherlands, 1972, pp. 741–762. doi: 10.1007/978-94-010-2557-7_25.
[SVF09] T. Stüber, H. Vogler, and Z. Fülöp. “Decomposition of weighted multioperator tree automata”. In: International Journal of Foundations of Computer Science 20.02 (2009), pp. 221–245. doi: 10.1142/S012905410900653X.
[TMS03] A. Taylor, M. Marcus, and B. Santorini. “The Penn Treebank: An Overview”. In: Treebanks. Springer Netherlands, 2003, pp. 5–22. doi: 10.1007/978-94-010-0201-1_1.
[Tor+19] J. Torr, M. Stanojević, M. Steedman, and S. Cohen. “Wide-Coverage Neural A* Parsing for Minimalist Grammars”. In: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Florence, Italy: Association for Computational Linguistics, 2019.
[Tra61] B. A. Trakhtenbrot. “Finite automata and the logic of single-place predicates”. In: Doklady Akademii Nauk SSSR 140 (2 1961), pp. 326–329.


[Tur37] A. M. Turing. “On Computable Numbers, with an Application to the Entscheidungsproblem”. In: Proceedings of the London Mathematical Society s2-42.1 (1937), pp. 230–265. doi: 10.1112/plms/s2-42.1.230.
[Tur38] A. M. Turing. “On Computable Numbers, with an Application to the Entscheidungsproblem. A Correction”. In: Proceedings of the London Mathematical Society s2-43.1 (1938), pp. 544–546. doi: 10.1112/plms/s2-43.6.544.
[Ver14] Y. Versley. “Experiments with Easy-first nonprojective constituent parsing”. In: Proceedings of the First Joint Workshop on Statistical Parsing of Morphologically Rich Languages and Syntactic Analysis of Non-Canonical Languages. Dublin, Ireland: Dublin City University, Aug. 2014, pp. 39–53. url: https://www.aclweb.org/anthology/W14-6104.
[VG17] D. Vilares and C. Gómez-Rodríguez. “A non-projective greedy dependency parser with bidirectional LSTMs”. In: Proceedings of the CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. Vancouver, Canada: Association for Computational Linguistics, Aug. 2017, pp. 152–162. isbn: 978-1-945626-70-8. url: http://www.aclweb.org/anthology/K17-3016.
[Vij88] K. Vijay-Shanker. “A study of tree adjoining grammars”. PhD thesis. University of Pennsylvania, 1988. url: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.401.1695&rep=rep1&type=pdf.
[Vil02a] É. Villemonte de la Clergerie. “Parsing MCS languages with thread automata”. In: Proceedings of the Sixth International Workshop on Tree Adjoining Grammars and Related Formalisms. 6. 2002, pp. 101–108.
[Vil02b] É. Villemonte de la Clergerie. “Parsing Mildly Context-Sensitive Languages with Thread Automata”. In: Proceedings of the 19th International Conference on Computational Linguistics. Vol. 1. 19. Taipei, Taiwan: Association for Computational Linguistics, 2002, pp. 1–7. doi: 10.3115/1072228.1072256.
[VW94] M. Vardi and P. Wolper. “Reasoning about Infinite Computations”. In: Information and Computation 115.1 (Nov. 1994), pp. 1–37. doi: 10.1006/inco.1994.1092.
[VWJ86] K. Vijay-Shanker, D. J. Weir, and A. K. Joshi. “Tree adjoining and head wrapping”. In: Proceedings of the 11th conference on Computational linguistics. Association for Computational Linguistics. 1986, pp. 202–207. doi: 10.3115/991365.991425.
[VWJ87] K. Vijay-Shanker, D. J. Weir, and A. K. Joshi. “Characterizing Structural Descriptions Produced by Various Grammatical Formalisms”. In: Proceedings of the 25th Annual Meeting on Association for Computational Linguistics. ACL ’87. Stanford, California: Association for Computational Linguistics, 1987, pp. 104–111. doi: 10.3115/981175.981190.
[Wei88] D. J. Weir. “Characterizing mildly context-sensitive grammar formalisms”. PhD thesis. University of Pennsylvania, 1988. url: http://repository.upenn.edu/dissertations/AAI8908403.


[Wei92] D. J. Weir. “Linear Context-free Rewriting Systems and Deterministic Tree-walking Transducers”. In: Proceedings of the 30th Annual Meeting on Association for Computational Linguistics. ACL ’92. Newark, Delaware: Association for Computational Linguistics, 1992, pp. 136–143. doi: 10.3115/981967.981985.
[WJ88] D. J. Weir and A. K. Joshi. “Combinatory categorial grammars: Generative power and relationship to linear context-free rewriting systems”. In: Proceedings of the 26th annual meeting on Association for Computational Linguistics. Association for Computational Linguistics. 1988, pp. 278–285. doi: 10.3115/982023.982057.
[YKS10] R. Yoshinaka, Y. Kaji, and H. Seki. “Chomsky-Schützenberger-type characterization of multiple context-free languages”. In: Language and Automata Theory and Applications. Ed. by A.-H. Dediu, H. Fernau, and C. Martín-Vide. Springer, 2010, pp. 596–607. isbn: 978-3-642-13088-5. doi: 10.1007/978-3-642-13089-2_50.
[YM03] H. Yamada and Y. Matsumoto. “Statistical Dependency Analysis with Support Vector Machines”. In: Proceedings of the 8th International Workshop on Parsing Technology. 2003, pp. 195–206.

Index

(relation) diagram, 96
(;, ∪)-closure of DS, 163
풪-notation, 62
푖-fanout, 31
휔-concatenation, 18
absorbing, 19
absorption laws, 23
accepting run, 35, 39, 53, 54
acyclic graph, 69
additive monoid, 23
admissible, 88, 150
admissible tuple of runs, 88
algebra, 19
algorithm, 62
alphabetic string homomorphism, 47
ambiguous, 32, 35, 41, 49, 53, 55
antisymmetric, 16
approximation, 95–97
approximation strategy, 95, 96
arctic bimonoid, 25
arctic semiring, 23
argmax, 18
arity, 19
associative, 19
automaton characterisation, 63
automaton with data storage, 38
automaton with respect to 퐺, 112
bijection, 17
bimonoid, 23
binary relation, 16
Boolean semiring, 23
boundedly non-deterministic, 36
cancellation relation, 110, 114
candidate, 150
cardinality, 15
carrier, 19, 21
Cartesian product, 15
cell, 15
Chomsky hierarchy, 63
closed, 19
closure, 19, 96
coarse, 95
coarse-to-fine parsing, 153
commutative, 19
commutative diagram, 96
commutative strong bimonoid, 23
commute, 96
complement, 16
complete, 22, 23
complete derivation tree, 48
complete derivation trees, 27, 31
composition, 16
composition closed, 38
composition closure, 38
computation relation, 46
concatenation, 17
configuration, 38, 46
context-free (over 훴), 28
context-free grammar, 26
countable, 19, 20
countable set, 15
current stack symbol, 65
current state, 38
current storage configuration, 38
cycle, 69
data storage, 36
derivation relation, 26

derivation tree, 48
derivation trees, 27, 31
deterministic, 36
deterministic tree walking transducer, 92
dimension, 111, 114
disjoint, 15
distributive, 19
domain, 16
domain language, 137
Dyck language, 110
edge, 69
element, 15
empty set, 15
empty string, 17
endorelation, 16
entrance count, 72
equivalence class, 16
equivalence multiple Dyck language, 114
equivalence relation, 16
equivalent, 26, 48, 158
factorisable, 143
fanout, 30, 31, 49
filter, 18
final state, 34, 38
final storage configuration, 36
fine, 95
finer than, 99
finite, 19, 20
finite representation, 48
finite set, 15
finite-state automaton, 34
finitely ambiguous, 32, 35, 41, 49, 53, 55
finitely non-deterministic, 36
free monoid, 22
function, 17
functional, 16
gap, 88
generator set, 112
grammar-based parsing, 125
graph, 69
guide, 153
guiding structure, 153
harmful loop, 140
homomorphism, 19, 21
idempotent, 19
identity, 19
identity relation, 16
image, 16
index, 17
indexed family, 17
infinite, 19, 20
infinite set, 15
infinitely ambiguous, 32, 41, 53, 55
initial non-terminal, 26, 31
initial state, 34, 38
initial storage configuration, 36
injective, 16
instruction, 36
inverse, 16
invertible, 19
kind, 88
Kleene-star, 17
language, 17, 26, 31, 35, 39, 46, 48, 53, 54
language class, 63
language device, 26
length, 17
less partial than, 99
linear, 30
linear context-free rewriting system, 31
linked, 83
map, 18
mildly context-sensitive grammars, 12
monadic, 67
monoid, 21
monoid of monomials, 58
monomial, 57
monotonous, 30, 32, 49
multiple context-free, 34, 51
multiple context-free grammar, 31
multiple Dyck grammar, 111
multiple Dyck language, 111
multiplicative monoid, 23

n-best, 128
naïve Chomsky-Schützenberger parser, 137, 149
natural numbers, 15
non-deleting, 30, 51
non-empty set, 15
non-negative real numbers, 15
non-positive real numbers, 15
non-terminals, 26, 30
non-trivially factorisable, 144
operation, 19
parallel multiple context-free grammar, 30
parse table, 154
parse tree, 28
partial function, 17
partial identity, 16
partial order, 16
partition, 15
path, 69
PMCFL, 34, 51
POCMOZ, 126
polynomial, 57
power, 17
power set, 15
prefix-closed, 64
primitive stationary cycle, 70
primitive suspicious loop, 152
probability semiring, 25
probability semiring with ∞, 23
product monoid, 22
pruning, 153
pushdown automaton, 63
quotient, 16
rank, 31, 111
ranked set, 18
ranked trees, 19
reachable, 150
real numbers, 15
recognisable, 35
reflexive, 16
reflexive, transitive closure, 16
remaining input, 38
representation, 26
restricted tree-stack automata, 73
restricted wMCFG, 146
restriction, 16
rules, 26, 31, 48
run, 34, 39, 53, 54
semantic ambiguity, 125
semiring, 23
semiring of polynomials, 58
semiring of weighted languages, 58
sentential forms, 26
sequence, 17
set, 15
set-builder, 15
signature, 19, 21
simple, 148
singleton set, 15
sort, 18
sorted algebra, 21
sorted operation, 21
sorted set, 20
sorted trees, 20
source state, 38
space complexity, 62
stack automaton, 92
stack normal form, 71
stack pointer, 65
state, 34, 38
stationary cycle, 69
storage configuration, 36
string algebra, 29
string composition representation, 29
string homomorphism, 47
strings, 17
strong bimonoid, 23
strongly monotonous, 30, 32, 49
subset, 19, 20
successor configuration, 39
superset, 19, 20
support, 25, 128
surjective, 17
suspicious loop, 152


symmetric, 16
syntactic ambiguity, 125
take, 18
tape storage, 37
term algebra, 21
terminal, 34, 38
terminal-free, 30
terminals, 26, 31
time complexity, 62
total, 16
total function, 17
transition, 34, 38
transition relation, 39, 46
transition-based parsing, 125
transitions, 53, 54
transitive, 16
transitive closure, 16
tree-pushdown, 64
tree-stack, 64
tree-stack automaton, 66
tree-stack storage, 66
trivially factorisable, 144
tropical bimonoid, 25
tropical semiring, 23
tuple algebra, 30
tuple composition, 30
Turing machine, 46
type, 88
unambiguous, 31, 35, 41, 49, 53, 55
uncountable, 19, 20
uncountable set, 15
underlying automaton, 53, 54
underlying grammar, 48
update, 17
variables, 28, 30
vertex, 69
Viterbi semiring, 23
weight assignment, 48, 49, 53–55
weighted alphabetic string homomorphism, 58
weighted automaton with data storage, 54
weighted FSA, 53
weighted language, 48, 49, 53, 55
weighted language device, 48
weighted linear context-free rewriting system, 48
weighted multiple context-free grammar, 48
weighted parallel multiple context-free grammar, 48
weighted string homomorphism, 58
yield, 27, 31, 35, 39

List of variable names

For convenience, we provide a list of variable names that are used consistently for specific concepts in this thesis. The variable names may occur with various decorations (such as subscripts, superscripts, prime, hat, bar, etc.).

Basics:
• sort: 푠 ∈ 푆
• binary operation: ⊕, ⊙
• tree: 푡
• algebra: 풜, ℬ, 풞
• (weighted) language: 퐿
• (weighted) homomorphism: ℎ
• (weighted) language device: 퐷

Automata:
• run: 휃
• (weighted) automaton: ℳ
• data storage: DS
• state: 푝, 푞 ∈ 푄
• storage configuration: 푐 ∈ 퐶
• initial state: 푞i ∈ 푄i
• initial storage configuration: 푐i ∈ 퐶i
• final state: 푞f ∈ 푄f
• final storage configuration: 푐f ∈ 퐶f
• terminal symbol: 휎 ∈ 훴, 훿 ∈ 훥
• instruction: 푖 ∈ 퐼
• transition: 휏 ∈ 푇
• stack/pushdown symbol: 훾 ∈ 훤
• weight assignment: 휇
• tree stack: [휉, 푝], [휁, 푝]

Grammars:
• weight assignment: 휇
• (weighted) grammar: 퐺
• derivation tree: 푑
• non-terminal: 퐴, 퐵 ∈ 푁
• fanout: 푠
• initial non-terminal: 푆 ∈ 푁i
• rank: 푘
• terminal symbol: 휎 ∈ 훴, 훿 ∈ 훥
• composition function: 푐 = [푢1, …, 푢푚]
• rule: 휌 ∈ 푅
