University of Oxford Mathematical Institute

Decision Procedures for Guarded Logics

Kevin Kappelmann Wolfson College supervised by Prof. Michael Benedikt

A thesis submitted in partial fulfilment for the degree of arXiv:1911.03679v2 [cs.LO] 25 Mar 2021 MSc in Mathematics and Foundations of Computer Science

Trinity 2018

To my father, Hans-Jürgen Kappelmann †16.01.2018

Acknowledgments

I want to thank my supervisor Prof. Michael Benedikt for all his support, especially his great suggestions for structuring this thesis, appropriately abstracting and presenting the results, and introducing me to an interesting research topic. I also want to thank all my friends that visited me during my overwhelming year in Oxford: Alex, Bene, Cedric, Domi B., Jonny, Liz, Lukas, Netti, Niklas, Philipp, Silja, Tom, and Vali. Words cannot describe how thankful I am to have you as part of my life. Moreover, I want to thank all my new friends that I met in Oxford, especially Domi W., Philip, and Vasilii, and all my extremely talented MFoCS coursemates for our invaluable discussions, exchange of ideas, and, most importantly, collective suffering and consumption of biscuits in the common room. Finally, I want to thank my mother, Erika, for always believing in me and her uncondi- tional support and love.

Abstract

An important class of decidable first-order logic fragments are those satisfying a guard- edness condition, such as the guarded fragment (GF). Usually, decidability for these logics is closely linked to the tree-like model property – the fact that satisfying models can be taken to have tree-like form. Decision procedures for the guarded fragment based on the tree-like model property are difficult to implement. An alternative approach, based on restricting first-order , has been proposed, and this shows more promise from the point of view of implementation. In this work, we connect the tree-like model property of the guarded fragment with the resolution-based approach. We derive efficient resolution-based rewriting algorithms that solve the Quantifier-Free Query Answering Prob- lem under Guarded Tuple Generating Dependencies (GTGDs) and Disjunctive Guarded Tuple Generating Dependencies (DisGTGDs). The Query Answering Problem for these classes subsumes many cases of GF satisfiability. Our algorithms, in addition to mak- ing the connection to the tree-like model property clear, give a natural account of the selection and ordering strategies used by resolution procedures for the guarded fragment. We also believe that our rewriting algorithm for the special case of GTGDs may prove itself valuable in practice as it does not require any Skolemisation step and its theoretical runtime outperforms those of known GF resolution procedures in case of fixed dependencies. Moreover, we show a novel normalisation procedure for the widely used chase procedure in case of (disjunctive) GTGDs, which could be useful for future studies.

Contents

1. Introduction1 1.1. Motivation ...... 1 1.2. Contributions ...... 2 1.3. Thesis Outline ...... 2

2. Preliminaries5 2.1. First-Order Logic ...... 5 2.1.1. Unification ...... 7 2.1.2. Resolution ...... 8 2.2. The Guarded Fragment ...... 8 2.3. The Relational Model, Queries, and Dependencies ...... 9 2.4. Problem Statement: Query Answering under DisTGDs ...... 11 2.5. Decision Procedures for Atomic Rewritings ...... 13 2.5.1. A Decision Procedure for Atomic Rewritings of TGDs ...... 13 2.5.2. A Decision Procedure for Atomic Rewritings of DisTGDs . . . . . 14 2.6. Proof Machinery – Chasing the Truth ...... 14 2.6.1. The Chase ...... 14 2.6.2. The Disjunctive Chase ...... 18

3. The One-Pass Property 21 3.1. One-Pass Chases ...... 21 3.2. One-Pass Disjunctive Chases ...... 24

4. Quantifier-Free Query Answering under GTGDs 29 4.1. Normal Forms ...... 29 4.2. An Abstract Property for Completeness ...... 31 4.3. The Simple Saturation ...... 34 4.4. The Guarded Saturation ...... 37 4.4.1. Correctness ...... 39 4.4.2. Complexity Analysis ...... 43

5. Quantifier-Free Query Answering under DisGTGDs 47 5.1. An Abstract Property for Completeness ...... 47 5.2. The Disjunctive Guarded Saturation ...... 51 5.2.1. Correctness ...... 57 5.2.2. Complexity Analysis ...... 62

6. Comparison with GF Resolution 65 6.1. GF Satisfiability vs. the Quantifier-Free Query Answering Problem . . . . 65

vii Contents

6.2. GF Resolution vs. the (Disjunctive) Guarded Saturation ...... 66

7. Conclusion 69 7.1. Related Work ...... 69 7.2. Summary and Future Work ...... 69

Appendix A. Proofs for Section 4.3 71

Bibliography 75

Notation 77

Index 79

viii 1. Introduction

1.1. Motivation

The satisfiability problem for first-order logic is known to be undecidable, as shown by Church[1936] and Turing[1937]. Nevertheless, some expressive fragments of first-order logic have been shown to be decidable. One very rich family of decidable fragments is the family of so-called guarded logics. Many guarded languages enjoy desirable characteristics, such as the finite model property, Craig interpolation, and the tree-like model property [Grädel, 1999]. The historically first representative of the guarded family is the guarded fragment (GF), introduced by Andréka et al.[1998]. Its satisfiability problem has been solved in a variety of ways, and arguably the most natural one is to use the tree-like model property of GF:

• One can use the tree-like model property to reduce to the monadic second-order logic over trees, which was shown decidable by Rabin[1969].

• A refinement of above argument is to directly generate an automaton that accepts codes of tree-like models. One then checks for non-emptiness of the language of the automaton.

Both techniques, however, fall short in different ways. The first approach is clearly infeasible in practice as it relies on Rabin’s decision procedure for the monadic second-order logic over trees, which has non-elementary complexity; that is, its complexity is not bounded by any power of exponentials. The second approach has complexity bounded by two exponentials, which is tight in the worst-case. But the building of the automaton always hits the worst-case time complexity. An approach that seems superficially quite different is to use resolution, as outlined by De Nivelle and De Rijke[2003] and Ganzinger and De Nivelle[1999]. This approach gives rise to a deterministic algorithm that has the possibility of stopping much earlier than the worst-case bounds on typical cases. However, it is not clear how it connects to the tree-like model property exploited in the techniques described above. The existing completeness proofs rely on general results on resolution with selection [Ganzinger and De Nivelle, 1999], or a resolution game for non-liftable orders [De Nivelle and De Rijke, 2003], techniques that seem very distinct from the tree-like model property. The goal of this thesis is to give a new resolution based algorithm for a special case of GF satisfiability. We will prove completeness of the method via analysis of the tree-like model property.

1 1. Introduction

1.2. Contributions

We work in this thesis not directly with the guarded fragment, but with some fragments that make the connection with the tree-like model property clearer. We use Disjunctive Guarded Tuple Generating Dependencies (DisGTGDs), which represent a guarded version of disjunctive, existential Datalog. Our main contribution is a completeness proof of a resolution-based rewriting algorithm from GTGDs to Datalog and from DisGTGDs to disjunctive Datalog that directly uses the tree-like model property. This language does not comprise all of GF, but it captures a substantial fragment that enjoys a wide range of applications in database theory (e.g. deductive databases and data exchange) and artificial intelligence (e.g. knowledge representation and reasoning) [Gottlob et al., 2012, Bourhis et al., 2016]. To this end, we will contribute to an important technique used for query evaluation and containment, called the chase, that has found its way into numerous studies, such as [Calì and Martinenghi, 2010, Baget et al., 2011, Benedikt et al., 2015], to name but a few. More specifically, we show in Chapter3 that the chase not only admits a tree-like property for GTGDs, but can also be tamed to admit a one-pass behaviour in which chase inferences are more closely related to the tree structure. This result is then generalised to show an analagous one-pass property for disjunctive chases under DisGTGDs. We then use this close connection to formulate an algorithm-independent property that implies completeness of any procedure satisfying the property. This is done independently for GTGDs in Section 4.2 and then naturally extended to DisGTGDs in Section 5.1. Due to its algorithm-independent formulation, these properties can be used for work beyond this thesis. Following that, we derive two theoretically optimal GTGD rewriting algorithms in Sections 4.3 and 4.4. Notably, their theoretical runtime outperforms those of known GF resolution procedures in the case of a fixed set of dependencies. We then generalise our ideas in Section 5.2 to derive an efficient DisGTGD rewriting algorithm. The completeness of all algorithms is shown by means of our abstract completeness properties, which explains the connection to the tree-like model property. Moreover, our algorithms give a natural account of the selection and ordering strategies used by resolution procedures for the guarded fragment.

1.3. Thesis Outline

We start this thesis by briefly recalling the required background for our upcoming algorithms and proofs. In particular, we give a brief account of first-order logic and introduce our considered fragments, namely the guarded fragment in Section 2.2, and the relational model and DisGTGDs in Section 2.3. We then formulate the problem statement and explain our general solution approach in Section 2.4. Our approach will split the problem into a rewriting part, and a simpler, second step, which can be solved by well-known saturation processes described in Section 2.5. In Section 2.6, we then add an essential proof technique to our toolbox – the chase. It will be a common pattern throughout this thesis that we perform an initial investigation

2 1.3. Thesis Outline for the simpler language of GTGDs and then extend our results to the more expressive disjunctive case. That being said, our main work begins in Chapter3, where we first prove a normalisation result for the chase under GTGDs, which we call the one-pass property. We then naturally extend this property to the disjunctive chase to conclude the chapter. In Chapter4, we present an abstract property that implies completeness of GTGD rewriting algorithms. The completeness proof based on these properties will make use of the one-pass property shown in the previous chapter. We then present a simple resolution- based algorithm in Section 4.3 that satisfies the formulated completeness property; however, this simple algorithm experiences a serious unification issue, and we thus derive a second, improved version in Section 4.4 that solves this issue. Chapter5 then generalises and extends our results to the disjunctive case in an analogous way; that is, we again first formulate an abstract completeness property and then derive a resolution-based DisGTGD rewriting algorithm that satisfies this property. We then compare our algorithms with the known resolution procedures for the guarded fragment in Chapter6. In particular, we examine which fragments of GF we can cover with our chosen framework. To conclude the thesis, we give a brief summary of our results as well as some final thoughts about possible future developments and related work.

3

2. Preliminaries

In this chapter, we first introduce some background knowledge and define basic notions used throughout this thesis. We then set our problem statement and explain our general solution idea in Section 2.4. Our approach will split the problem into two parts, the second of which can be solved by very simple, well-known saturation processes that are described in Section 2.5. Finally, we introduce some required machinery that we will use for the correctness proofs of our upcoming algorithms.

2.1. First-Order Logic

First-order logic is a formal language that allows, for example, the formalisation of reasoning, axiomatisation of mathematical systems, and specification of software and hardware systems. All problems considered in this thesis are formulated in first-order logic, and we assume a basic knowledge of its syntax and semantics. Introductions can be found, for example, in [Abiteboul et al., 1995, Hodges, 1997]. In this section, we briefly introduce some syntactic conventions and recall some (perhaps less well-known) concepts and results, namely the notion of a homomorphism, Skolemisation, and Herbrand’s Theorem. To speak about the syntax of first-order logic, we fix an infinite set of variables 푥1, 푥2,... , denoted by 풱, and a signature 휎 = (풞, ℱ, ℛ), where

• 풞 is a set of constants 푐1, 푐2,... ,

• ℱ is a set of function symbols 푓1, 푓2,... with associated arities 푎푓푖 ≥ 1, and

• ℛ is a set of relation symbols 푅1, 푅2,... with associated arities 푎푅푖 ≥ 1.

We denote the set of terms by 풯 and often abbreviate a sequence of terms 푡1, . . . , 푡푛 as ⃗푡; we also allow ourselves to treat ⃗푡 as a set when convenient. Given a formula 휙, we denote the set of free variables of 휙 by vars(휙) and the set of used constants of 휙 by consts(휙). Similarly, we define vars(푡), consts(푡) for the set of used variables and constants of a term 푡. A formula without free variables is also called a sentence, and a formula 휙 = 푅(푡1, . . . , 푡푛), where 푅 ∈ ℛ, is called an atom. Unless otherwise noted, we write 휙(⃗푥) to indicate that the set of free variables of 휙 is among ⃗푥. That is, we write vars(휙) ⊆ ⃗푥, and analogously, 휙(⃗푐) to indicate that the set of constants of 휙 is among ⃗푐. These definitions are lifted in ⋃︀ the natural way to sets of formulas 푆, for example vars(푆) := 휙∈푆 vars(휙). Given a signature 휎 and a sentence 휙, in order to speak about the semantics of 휙, we also have to fix a structure 풜 consisting of

• a set of elements dom(풜), called the domain of 풜,

• an assignment 푐풜 ∈ dom(풜) for any 푐 ∈ 풞,

5 2. Preliminaries

• an assignment 푓 풜 : dom(풜)푎 → dom(풜) for any 푓 ∈ ℱ of arity 푎, and • an assignment 푅풜 ⊆ dom(풜)푎 for any 푅 ∈ ℛ of arity 푎. Definition 2.1.1 (Satisfiability). Given a structure 풜 and sentence 휙, we say that 풜 satisfies 휙 or 풜 models 휙, written 풜 |= 휙, if the interpretation of 휙 under 풜 evaluates to “true” using the usual semantics.1 We say that 휙 is satisfiable if there is some structure 풜 that models 휙, and we call 휙 valid if every structure is a model for 휙. These definitions are lifted to set of sentences 푆 in the usual way, for example, we write 풜 |= 푆 if 풜 |= 휙 for every 휙 ∈ 푆. Lastly, given two sets of sentences 푆 and 푆′, we say that 푆 entails 푆′, written 푆 |= 푆′, if every model 풜 of 푆 is also a model of 푆′. Given two 휎-structures 풜, ℬ, there are different notions that allow us to compare the structures. One of them is that of a homomorphism. Intuitively, a homomorphism between 풜 and ℬ is a map that preserves constants, functions, and relations from 풜 to ℬ. More formally: Definition 2.1.2 (Homomorphism). Let 풜, ℬ be two 휎-structures. A homomorphism from 풜 to ℬ is a map ℎ : dom(풜) → dom(ℬ) such that: (a) For any constant 푐 of 휎, we have ℎ(푐풜) = 푐ℬ. (b) For any function 푓 of 휎 and elements ⃗푎 ∈ dom(풜), we have ℎ(푓 풜(⃗푎)) = 푓 ℬ(ℎ(⃗푎)). (c) For any relation 푅 of 휎 and elements ⃗푎 ∈ dom(풜), we have 푅풜(⃗푎) =⇒ 푅ℬ(ℎ(⃗푎)). For convenience, we write ℎ : 풜 → ℬ if we have a homomorphism from 풜 to ℬ. Next, we recall the idea of Skolemisation, a technique commonly applied in automated theorem proving. Skolemisation is used to remove existential variables of a given first-order formula, and hence gives rise to a normal form transformation in which all quantifications are universal. The basic idea is to replace any existential variable 푦 that is in the scope of universal quantifiers binding variables ⃗푥 by a functional term 푓푦(⃗푥), where 푓푦 is a fresh function symbol. The fresh functions 푓푦 are commonly called Skolem functions. Example 2.1.3. The Skolemisation of

∀푥1∃푦1∀푥2∃푦2[푅(푥1, 푦2) ∧ 푃 (푦1, 푥1, 푥2)] is

∀푥1∀푥2[푅(푥1, 푓푦2 (푥1, 푥2)) ∧ 푃 (푓푦1 (푥1), 푥1, 푥2)]. Theorem 2.1.4 ([Hodges, 1997, Chapter 3]). A formula 휙 is satisfiable if and only if its Skolemisation 휙′ is satisfiable. Lastly, we state a simple form of a fundamental result of : Herbrand’s Theorem. It allows a reduction of first-order logic validity to validity of propositional logic. We will make use of it in our procedure given in Section 2.5.2, but we will not require it for any main result of this thesis. Theorem 2.1.5 (Herbrand[1930]) . A set of clauses 푆 is unsatisfiable if and only if there is a finite unsatisfiable set 푆′ of ground instances of clauses in 푆. 1see, for example, [Abiteboul et al., 1995, Section 2.4] for a formal definition.

6 2.1. First-Order Logic

2.1.1. Unification An important step in our algorithms will be the unification of atoms. First-order unification has its origin in the seminal paper of Robinson[1965] that introduced the resolution calculus of first-order logic. In this section, we introduce basic definitions and results from the literature that we will later make us of.

Definition 2.1.6 (Substitution). A substitution is a partial function 휌 : 풱 → 풯 .

Given a substitution 휌, we let dom(휌) denote the domain of 휌 and assume 푥휌 := 휌(푥) = 푥 for 푥∈ / dom(휌). Given two substitutions 휌, 휌′, we write 휌휌′ for the composition of the ′ ′ substitutions; that is, 휌휌 := 휌 ∘휌. Moreover, we write [푡1/푥1, . . . , 푡푛/푥푛] for a substitution 휌 defined by 푥푖휌 := 푡푖 for 1 ≤ 푖 ≤ 푛. Lastly, given a formula 휙(⃗푥), we write 휙휌 for the formula obtained by replacing any free variable 푥푖 in 휙 by 휌(푥푖).

Example 2.1.7. We have

푅(푥1, 푥2)[푦/푥1, 푓(푐)/푥2] = 푅(푦, 푓(푐)),

푅(푥1, 푥2)[푥2/푥1] = 푅(푥2, 푥2),

푅(푥1, 푥2)[푥2/푥1, 푥1/푥2] = 푅(푥2, 푥1),

푅(푥1, 푥2)[푥2/푥1][푥1/푥2] = 푅(푥1, 푥1).

Definition 2.1.8 (Renaming). A substitution 휌 is a renaming if 휌 is injective on dom(휌) and 푥휌 is a variable for any 푥 ∈ dom(휌). Lemma 2.1.9. A substitution 휌 is a renaming if and only if it has an inverse substitution.

Proof. If 휌 is a renaming, then 휌−1 is a partial function on 풯 since 휌 is injective, and since 푥휌 is a variable for any 푥 ∈ dom(휌), it is a substitution. If 휌 has an inverse substitution 휌−1, then 휌 must be injective, and for any 푥 ∈ dom(휌), 푥휌 must be a variable as 휌−1 is defined on 푥휌 and 휌−1 is a substitution.

Definition 2.1.10 (Unifier). A unifier of atoms 퐴1, . . . , 퐴푛 and 퐵1, . . . , 퐵푛 is a substitu- tion 휃 satisfying 퐴푖휃 = 퐵푖휃 for 1 ≤ 푖 ≤ 푛.

In theory, there might be many different unifiers of atoms 퐴1, . . . , 퐴푛 and 퐵1, . . . , 퐵푛. Practically, however, one does not want to consider the vast set of all possible unifiers. Luckily, it often suffices to only consider one special unifier:

Definition 2.1.11 (Most general unifier). A unifier 휃 of atoms 퐴1, . . . , 퐴푛 and 퐵1, . . . , 퐵푛 is called a most general unifier (mgu) if for any unifier 휌 of 퐴1, . . . , 퐴푛 and 퐵1, . . . , 퐵푛, there is a substitution 훿 such that 휌 = 휃훿.

Lemma 2.1.12. For any unifiable atoms 퐴1, . . . , 퐴푛 and 퐵1, . . . , 퐵푛, the mgu is unique up to renaming.

Proof. Let 휃, 휃′ be two mgus of the given atoms. Then there are substitutions 훿, 훿′ such that 휃 = 휃′훿′ and 휃′ = 휃훿. Thus, 휃 = 휃훿훿′ and 휃′ = 휃′훿′훿, i.e. 훿훿′ = 푖푑 and 훿′훿 = 푖푑. Hence, 훿, 훿′ are mutual inverse bijections, and thus renamings by Lemma 2.1.9.

7 2. Preliminaries

The problem of obtaining an mgu for given atoms 퐴1, . . . , 퐴푛 and 퐵1, . . . , 퐵푛 is rather old and many different algorithms, some even linear, exist.

Lemma 2.1.13. The mgu of atoms 퐴1, . . . , 퐴푛 and 퐵1, . . . , 퐵푛 can be obtained in time (︀∑︀푛 )︀ 풪 푖=1 size(퐴푖) + size(퐵푖) , where size(퐴) is the encoding size of an atom 퐴. Proof. A linear procedure can be found, for example, in [Paterson and Wegman, 1976].

Some of our algorithms will only have to deal with function-free atoms, for which the following lemma will be a useful helper.

Lemma 2.1.14. Assume 퐴1, . . . , 퐴푛 and 퐵1, . . . , 퐵푛 are unifiable and function-free, and let 휃 be an mgu. Then all 퐴푖휃 = 퐵푖휃 are function-free.

Proof. Assume otherwise. Pick some variable 푥. Since all given atoms are function-free and unifiable, the substitution 휃′ defined by 푣휃′ := 푥 for any 푣 ∈ 풱 is a unifier of given atoms ′ ′ ′ and all 퐴푖휃 = 퐵푖휃 are function-free. Thus, there is a substitution 훿 such that 휃 = 휃훿. Pick some 퐴푖 such that 퐴푖휃 contains a functional subterm 푓(⃗푡 ). Then 푓(⃗푡 )훿 = 푓(⃗푡훿) and ′ hence, 퐴푖휃 = 퐴푖휃훿 is functional, a contradiction.

2.1.2. Resolution

In this section, we very briefly recall first-order logic resolution. A detailed introduction can be found in [Bachmair and Ganzinger, 2001]. Resolution, as introduced by Robinson[1965], is a complete theorem proving method for the unsatisfiability problem of first-order logic, and undeniably, it builds the foundation of some of the most successful automated theorem proving procedures. Resolution operates on a formula in , usually represented by a set of clauses. A clause ′ ′ 퐶 = {¬퐿1,..., ¬퐿푛, 퐿1, . . . , 퐿푚} corresponds to a disjunction of (negated) atoms, also called literals. The most important inference rule of the resolution calculus is the binary resolution rule: 퐶 ∪ {퐿} 퐷 ∪ {¬퐿′} (퐶 ∪ 퐷)휃 where 퐶, 퐷 are clauses and 휃 is an mgu of 퐿 and 퐿′. In our work, we will usually work with clauses that contain at least one negative literal. ′ ′ Note that any such clause 퐶 = {¬퐿1,..., ¬퐿푛, 퐿1, . . . , 퐿푚} can be written as an inference ⋀︀푛 ⋁︀푚 ′ rule 푖=1 퐿푖 → 푖=1 퐿푖. There are many extensions of the standard resolution calculus, such as ordered resolution, paramodulation, and superposition, that increase the expressiveness or efficiency of the calculus, but we will not have to directly deal with the rules of these extensions.

2.2. The Guarded Fragment

In this section, we briefly introduce the guarded fragment (GF). The core idea of guarded logics is to “guard” certain operations by using atoms in the language. In the case of GF, the operations being guarded are the universal and existential quantification of variables.

8 2.3. The Relational Model, Queries, and Dependencies

Definition 2.2.1. The formulas of GF are defined inductively as follows:

(a) The atomic formulas of GF are ⊤, ⊥, any function-free atom 푅(⃗푥,⃗푐), and any equality atom 푡1 ≈ 푡1 for 푡1, 푡2 ∈ 풱 ∪ 풞. (b) If 휙, 휓 are in GF, then ¬휙, 휙 ∧ 휓, and 휙 ∨ 휓 are in GF. (c) If 휙 and 퐺 are in GF and 퐺 is an atom with vars(휙) ⊆ vars(퐺), then ∃⃗푥(퐺 ∧ 휙) and ∀⃗푥(퐺 → 휙) are in GF for any finite ⃗푥 ⊆ 풱. The atom 퐺 is called the guard of 휙.

It is noteworthy that the original definition of GF in [Andréka et al., 1998] does not allow constants in its formulas. This feature has been added by Grädel[1999], who at the same time also proved the following result:

Lemma 2.2.2 ([Grädel, 1999]). For any GF sentence 휙, one can construct in polynomial time a GF sentence 휙′ without constants that is satisfiable if and only if 휙 is satisfiable.

Proof. Let ⃗푐 = {푐1, . . . , 푐푘} = consts(휙). For every 푚-ary relation symbol 푅 of 휙, introduce a new (푘 + 푚)-ary relation symbol 푅*. Further, let 푃 be a 푘-ary fresh relation symbol. Then the sentence 휙′ := ∃⃗푐 (︀푃 (⃗푐) ∧ 휙[푅*(⃗푐,⃗푥)/푅(⃗푥)])︀ is in GF, where by abuse of notation, 휙[푅*(⃗푐,⃗푥)/푅(⃗푥)] is the formula obtained by replacing any atom of the form 푅(⃗푥) in 휙 by the corresponding atom 푅*(⃗푐,⃗푥). The constants ⃗푐 are now variables in 휙′, and it is clear that 휙 and 휙′ are satisfiable over the same domains.

In particular, the resolution procedures given by De Nivelle and De Rijke[2003] and Ganzinger and De Nivelle[1999] use a constant-free definition of GF.

Example 2.2.3. The following sentences are in GF:

∀푥[푥 ≈ 푥 → 푅(푥)], 푅(푐1, 푐2) ∧ (¬푃 (푐3) ∨ ¬푆(푐1, 푐2)), [︀ ]︀ ∀푥1, 푥2 푅(푥1, 푥2) → ¬푅(푥2, 푥1) , ∃푦1[퐺(푦1) ∧ ¬푅(푦1)], [︀ ]︀ [︀ ]︀ ∀푥1, 푥2 푅(푥1, 푥2) → ∃푦[푅(푥2, 푦) ∧ ⊤] , ∃푦 퐺(푦) ∧ [∀푥(푅(푥, 푦) → 푥 ≈ 푦)] .

A typical sentence that is not expressible in GF is the simple sentence ∀푥1, 푥2 푅(푥1, 푥2) or the transitivity axiom ∀푥1, 푥2, 푥3[푅(푥1, 푥2) ∧ 푅(푥2, 푥3) → 푅(푥1, 푥3)].

2.3. The Relational Model, Queries, and Dependencies

As mentioned before, we will not directly work with general GF formulas, but instead consider problems that are closely related to database querying, for which we fix a function- free first-order signature 휎 = (풞, ∅, ℛ). In the spirit of database theory, we introduce some special terminology in this section. The set ℛ of relational symbols is called a relational schema.A fact is a function-free atom 푅(⃗푐) without free variables. An instance of a relation 푅 ∈ ℛ is a set of facts of the form 푅(⃗푐). An instance 퐼 of a relational schema ℛ is a union of instances for each 푅 ∈ ℛ. We also call such an instance a database 풟. Note that every instance 퐼 can be considered as a 휎-structure with domain 풞 mapping any constant 푐 to the element of the same name.

9 2. Preliminaries

Since 퐼 is just a set of facts, we write consts(퐼) for the set of constants that occur in at least one fact of 퐼; that is, consts(퐼) := {푐 | ∃퐹 ∈ 퐼 : 푐 ∈ consts(퐹 )}. Given a database 풟, we naturally want to query it. One simple query fragment is given by the class of conjunctive queries. Although these queries are very simple, they constitute the vast majority of relational database queries arising in practice [Abiteboul et al., 1995]. A conjunctive query (CQ) over a database 풟 with schema ℛ is a formula of the form

푄(⃗푥) = ∃⃗푦 (푅1 ∧ ... ∧ 푅푛), with 푅푖 ∈ ℛ and each 푅푖 only uses arguments in ⃗푥, ⃗푦, or constants from consts(풟). If 푄 contains no free variables, we call 푄 a boolean conjunctive query (BCQ), and if 푄 is a single fact, we call 푄 an atomic query. A union of conjunctive queries (UCQ) is a disjunction of CQs. Similarly, a union of boolean conjunctive queries (UBCQ) is a disjunction of BCQs. By abuse of notation, we sometimes consider a UCQ (UBCQ) 푄 as a set of CQs (BCQs). That is, we write 푄푖 ∈ 푄 to mean “푄푖 is a disjunct occurring in 푄”. Finally, we often want to enforce certain constraints on databases. Practically, such constraints can come in a variety of forms, such as functional dependencies, referential integrity constraints, inclusion dependencies, join dependencies, etc. Intuitively, many constraints are based on the idea that whenever some tuples satisfy some condition, then some other tuples must also exist or some values must be equal, as described by Beeri and Vardi[1984]. They took this thought and gave a unified first-order logic framework for database constraints. One prominent part of this framework is constituted by the class of tuple-generating dependencies. A tuple-generating dependency (TGD) is a function-free first-order sentence of the form

휏 = ∀⃗푥[︀훽(⃗푥) → ∃⃗푦휂(⃗푥,⃗푦)]︀, where 훽 and 휂 are conjunctions of constant-free atoms, and vars(훽) = ⃗푥. The conjunction 훽 is referred to as the body and the conjunction 휂 as the head of 휏. A disjunctive TGD (DisTGD) is a function-free first-order sentence of the form

푛 ⋁︁ ∀⃗푥[훽(⃗푥) → ∃⃗푦푖 휂푖(⃗푥,⃗푦푖)], 푖 where 훽 and each 휂푖 are conjunctions of constant-free atoms, and vars(훽) = ⃗푥. We refer to th 휂푖 as the 푖 head conjunction of 휏, and call 휂푖 a non-full head conjunction if it contains some existential variable. The variables 푥푖 ∈ vars(훽) that also occur in some head conjunction of the DisTGD are called the exported variables. We will use ⃗푥푖 to refer to the set of exported variables of the 푖th head conjunction of a DisTGD. A full disjunctive TGD is one that has no existentials in the head. If it is also a TGD, we call it a full TGD. A disjunctive guarded TGD (DisGTGD) is one where there is a single atom in 훽 that contains all of the variables of 훽. Such an atom is called the guard of 휏. If 휏 is also a TGD, we call it a guarded TGD (GTGD).

10 2.4. Problem Statement: Query Answering under DisTGDs

To avoid redundancy, we treat the body, each head conjunction, and the head of a (disjunctive) TGD as sets. For brevity, we will omit the universal quantifiers in front of (disjunctive) TGDs and assume a minimal quantification for all occurring variables. For example, the DisGTGD [︀ (︀ )︀]︀ ∀푥1 퐵(푥1) ∧ 퐵(푥1) → ∃푦1, 푦2 퐻(푦2, 푥1) ∨ ∃푦1, 푦2 퐻(푦2, 푥1) ∧ 퐻(푦2, 푥1) will be treated as the pair (︁ {︀ }︀)︁ (︁ {︀ }︀)︁ {퐵(푥1), 퐵(푥1)}, {퐻(푦2, 푥1)}, {퐻(푦2, 푥1), 퐻(푦2, 푥1)} = {퐵(푥1)}, {퐻(푦2, 푥1)} and hence be simplified to [︀ ]︀ ∀푥1 퐵(푥1) → ∃푦2 퐻(푦2, 푥1) and then be written as 퐵(푥1) → ∃푦2 퐻(푦2, 푥1).

2.4. Problem Statement: Query Answering under DisTGDs

Given some database 풟 and a set of DisTGDs Σ, we can consider possible extensions of 풟 that satisfy the dependencies Σ; that is, databases 풟′ ⊇ 풟 satisfying 풟′ |= Σ. Moreover, given a query 푄, we then might not only wonder whether some extension of 풟 satisfies 푄, but whether all extensions will satisfy 푄. This thought captures the general idea of our problem statement, of which we now give a formal definition: Definition 2.4.1 (Query Answering Problem). Given a database 풟, a set of (disjunctive) TGDs Σ, a UCQ 푄(⃗푥), and some ⃗푐 ⊆ consts(풟), the Query Answering Problem is to decide whether the UBCQ 푄(⃗푐) holds in every possible world for 풟, Σ. That is, we want to decide whether 풟, Σ |= 푄(⃗푐). If 푄 is a quantifier-free UCQ, we will call it the Quantifier-Free Query Answering Problem. In this work, we are interested in theories that consist of (disjunctive) GTGDs Σ and a quantifier-free UCQ 푄. It is noteworthy that this problem is easily expressible as a satisfiability problem in GF: Lemma 2.4.2. For any database 풟, set of DisGTGDs Σ, and quantifier-free UBCQ 푄, we can reduce the Quantifier-Free Query Answering Problem to the satisfiability problem of GF in polynomial time. Proof. It is well know that 풟, Σ |= 푄 if and only if 풟, Σ, ¬푄 is unsatisfiable. Note that 풟 and ¬푄 are in GF as they only consist of boolean combinations of ground atoms. Now any DisGTGD can be transformed into its single-head normal form, as will be described in Definition 5.2.2 (a DisGTGD is single-headed if each head conjunction contains exactly one atom). Any single-headed DisGTGD of the form

[︃ 푛 푚 ]︃ ⋀︁ ⋁︁ ∀⃗푥 퐺(⃗푥) ∧ 퐵푖(⃗푥) → ∃⃗푦푖 퐻푖(⃗푥,⃗푦푖) , 푖=1 푖=1

11 2. Preliminaries can be rewritten into GF by rewriting implications and using De Morgan’s law for quantifiers in the usual way to obtain a GF sentence of the form

[︃ 푛 푚 ]︃ ⋀︁ ⋀︁ ¬∃⃗푥 퐺(⃗푥) ∧ 퐵푖(⃗푥) ∧ ¬∃⃗푦푖 퐻푖(⃗푥,⃗푦푖) , 푖=1 푖=1 noting that ∃⃗푦푖 퐻푖(⃗푥,⃗푦) ≡ ∃⃗푦푖 [퐻푖(⃗푥,⃗푦)∧⊤] and the guard atom 퐺(⃗푥) contains each 푥푖. While the satisfiability problem of GF is usually considered for an arbitrary sentence 휙, our problem setting is strongly motivated by its connection to database querying. Our input consists of three independent parts – database, dependencies, and query – and it is common in practice to ask many different queries while fixing the database and constraints. This motivates a solution approach that also separates the concerns between dependencies and queries. We hence follow a two step approach: The first step performs an atomic rewriting of given dependencies, independent of any query and even database. Definition 2.4.3 (Atomic rewriting). Given a set of (disjunctive) TGDs Σ, an atomic rewriting of Σ is a finite set of full (disjunctive) TGDs Σ′ that have the same answer for any quantifier-free UBCQ 푄 and database 풟 as Σ. Example 2.4.4. Given the database 풟 = {푅(푐, 푑), 푃 (푑)}, a set of DisTGDs Σ

푅(푥1, 푥2) → ∃푦1 푆(푥1, 푦1), (︀ )︀ 푆(푥1, 푥2) ∧ 푃 (푥3) → 푇 (푥1) ∨ 푇 (푥1) ∧ 푈(푥1) ,

푅(푥1, 푥2) ∧ 푇 (푥1) → 푈(푥2), and query 푄 = 푈(푥), we have 풟, Σ |= 푄(푑) but 풟, Σ ̸|= 푄(푐). The rules

푅(푥1, 푥2) ∧ 푃 (푥3) → 푇 (푥1) ∨ 푇 (푥1) ∧ 푈(푥1)

푅(푥1, 푥2) ∧ 푇 (푥1) → 푈(푥2) are an atomic rewriting of Σ. The rewriting step reduces the problem to the case of full (disjunctive) TGDs. Using these full rules, the problem can then easily be solved by simple, well-known saturation algorithms for any database 풟 and query 푄, as will be described in Sections 2.5.1 and 2.5.2. This two-step approach does not only separate concerns, but is also believed to be one of the most promising approaches to achieve scalability of expressive query languages by reusing existing optimised database technologies Ahmetaj et al.[2018]. Our main goal, thus, is to derive a procedure that converts a set of (disjunctive) GTGDs into an atomic rewriting. In order to verify the correctness of such a process, we will have to show that the returned answer Σ′ satisfies two key properties of an atomic rewriting: soundness and completeness. Definition 2.4.5 (Completeness). Let Σ, Σ′ be two sets of (disjunctive) TGDs. We say that Σ′ is complete (with respect to Σ) if 풟, Σ |= 푄 implies 풟, Σ′ |= 푄. for any database 풟 and quantifier-free UBCQ 푄.

12 2.5. Decision Procedures for Atomic Rewritings

Definition 2.4.6 (Soundness). Let Σ, Σ′ be two sets of (disjunctive) TGDs. We say that Σ′ is sound (with respect to Σ) if 풟, Σ′ |= 푄 implies 풟, Σ |= 푄 for any database 풟 and quantifier-free UBCQ 푄. By plugging in the given definitions, we obtain the following lemma: Lemma 2.4.7. Let Σ, Σ′ be two sets of (disjunctive) TGDs. Then Σ′ is an atomic rewriting of Σ if and only if Σ′ is finite, full, sound, and complete. Proof. Follows straight from Definition 2.4.3.

We hence want to look for ways that, given some DisGTGDs Σ, return a finite, complete, and sound set of full DisTGDs Σ′. We will do this first for GTGDs in Chapter4, to give the idea, and then extend to DisGTGDs in Chapter5.

2.5. Decision Procedures for Atomic Rewritings

To motivate our solution approach, the next two sections explain how any atomic rewriting can be used as a decision procedure for the Quantifier-Free Query Answering Problem. Even though we will only be interested in theories that consist of (disjunctive) GTGDs, the procedures described in Sections 2.5.1 and 2.5.2 work more generally for atomic rewritings of (disjunctive) TGDs.

2.5.1. A Decision Procedure for Atomic Rewritings of TGDs As has been mentioned, in the course of our work, we will first derive an atomic rewriting procedure for GTGDs. Once this will be done, we can easily obtain a decision procedure as follows: Proposition 2.5.1. Let 풟 be a database, Σ be a set of TGDs, Σ′ be an atomic rewriting of Σ, let 푄 be a quantifier-free UBCQ, and 푤 be the maximum number of variables in any rule in Σ′. Then Σ′ provides a decision procedure for the Quantifier-Free Query Answering Problem under Σ. The complexity of the procedure is polynomial in 푐푤 and size(풟, Σ′, 푄). Proof. Since Σ′ is an atomic rewriting, we know that 푄 follows from 풟 and Σ if and only if 푄 follows from 풟 and Σ′. Note that Σ′ is a set of full TGDs. Such rules correspond to a language prominently known as Datalog, see for example [Abiteboul et al., 1995, Chapter 12]. We have thus reduced to the Datalog Query Answering Problem, which is EXPTIME-complete in 푤 [Dantsin et al., 2001]. For the sake of completeness, we give an easy, non-optimal bottom-up approach.2 A more efficient approach can be found, for example, in [Abiteboul et al., 1995, Chapter 12–13]. Let 푐 be the number of constants in 풟. For any 휏 ∈ Σ′, we have at most 푐푤 different ground instantiations. For any instantiation, we can check whether all body facts are contained in 풟. If this is the case, we add all head facts to 풟. We only have to fire every rule at most once; hence, this process takes at most |Σ′|푐푤 iterations. Once we reach the fixpoint of this saturation, we only need to check if all facts of some 푄푖 ∈ 푄 are contained in the final set of facts. In total, the complexity is polynomial in 푐푤 and size(풟, Σ′, 푄). 2The approach constructs the full chase of 풟 under Σ′, as will be described in Section 2.6.1.

13 2. Preliminaries

2.5.2. A Decision Procedure for Atomic Rewritings of DisTGDs

Once we generalise our results to obtain an atomic rewriting procedure for DisGTGDs, we can again obtain a decision procedure in a simple way:

Proposition 2.5.2. Let 풟 be a database, Σ be a set of DisTGDs, Σ′ be an atomic rewriting of Σ, let 푄 be a quantifier-free UBCQ, 푛 be the number of relation symbols in 풟, Σ′, 푄, let 푐 be the number of constants in 풟, and 푎 be the maximum arity of any predicate in Σ′. Then Σ′ provides a decision procedure for the Quantifier-Free Query Answering Problem under Σ. The complexity of the procedure is polynomial in 2푛푐푎 and size(풟, Σ′, 푄).

Proof. Since Σ′ is an atomic rewriting, we know that 푄 follows from 풟 and Σ if and only if 푄 follows from 풟 and Σ′. Note that Σ′ is a set of full DisTGDs. Such rules can be transformed into their single-head normal form (as will be described in Definition 5.2.2) so that each head conjunction of each rule contains exactly one atom. The resulting rules correspond to a language prominently known as disjunctive Datalog, see for example [Eiter et al., 1997]. We have thus reduced to the Disjunctive Datalog Query Answering Problem, which is coNEXPTIME-complete [Hustadt et al., 2007]. Again, for the sake of completeness, we give an easy, non-optimal approach by reducing the problem to the propositional unsatisfiability problem. A more efficient approach can be found, for example, in [Cumbo et al., 2004]. We know that 풟, Σ′ |= 푄 if and only if 풟, Σ′, ¬푄 is unsatisfiable. Now any 퐹 ∈ 풟 ⋀︀푛 corresponds to the ground clause {퐹 } and the negation of any ( 푖=1 푅푖) = 푄푗 ∈ 푄 to the ⋀︀푛 ⋁︀푚 ground clause {¬푅1,..., ¬푅푛}. Lastly, any single-headed rule 푖=1 퐵푖 → 푖=1 퐻푖 corre- sponds to a clause {¬퐵1,..., ¬퐵푛, 퐻1, . . . , 퐻푚}. By Herbrand’s Theorem (Theorem 2.1.5), it suffices to consider all ground instantiations of these clauses with constants from 풟. We then simply use the propositional resolution calculus (see for example [Fitting, 2012]) on given clauses to obtain our answer. There are at most 2푛푐푎 ground atoms, and hence at 푎 푎 most 22푛푐 ground clauses. Thus, the resolution process stops after at most 22푛푐 iterations. 푎 It is easy to check that every step in each iteration can be done in time polynomial in 2푛푐 and size(풟, Σ′, 푄). Hence, we obtain the claimed complexity bound.

2.6. Proof Machinery – Chasing the Truth

In order to show that some set Σ′ is indeed an atomic rewriting of some (disjunctive) GTGDs Σ, we need some method that allows us to compare the answers under Σ and Σ′ for any database 풟 and query 푄. One such method is given by a procedure called the chase. In its simple form described in Section 2.6.1, the chase operates on an instance 퐼 and a set of TGDs Σ; however, extensions to DisTGDs exist and will be covered in Section 2.6.2.

2.6.1. The Chase

The chase, first introduced by Maier et al.[1979], is a longstanding technique capable of checking implications of data dependencies in database systems. Simply speaking, the chase repairs a database with respect to some set of TGDs so that the resulting database

14 2.6. Proof Machinery – Chasing the Truth

satisfies all TGDs. It takes some original instance 퐼0 (i.e. some database 풟) and modifies the instance by a sequence of chase steps. Let 퐼 be an instance and 휏 = 훽(⃗푥) → ∃⃗푦휂(⃗푥,⃗푦) be a TGD with ⃗푦 possibly empty. A trigger for 휏 in 퐼 is a mapping ℎ : ⃗푥 → consts(퐼) such that ℎ(훽(⃗푥)) := 훽(ℎ(⃗푥)) ⊆ 퐼. Moreover, an active trigger for 휏 in 퐼 is a trigger ℎ for 휏 in 퐼 such that no extension ℎ′ : ⃗푥 ∪ ⃗푦 → consts(퐼) of ℎ with ℎ′(휂(⃗푥,⃗푦)) ⊆ 퐼 exists. Remark 2.6.1. Note that a trigger ℎ is just a unifier of all atoms in 훽 with some facts in 퐼. Let ℎ be an active trigger for 휏 in 퐼. Applying a chase step for 휏 and ℎ to 퐼 extends 퐼 ′(︀ )︀ ′ ′ with facts of the conjunction ℎ 휂(⃗푥,⃗푦) , where ℎ is an extension of ℎ and ℎ (푦푖) is a fresh constant that does not occur in consts(퐼) for each 푦푖 ∈ ⃗푦. We also say that 휏 based on ℎ fires in 퐼, or shorter, 휏 fires in 퐼 if there is some trigger for which 휏 fires in 퐼. For 퐼0 an instance and Σ a set of TGDs, a chase sequence for 퐼0 and Σ is a possibly infinite sequence 퐼0,... such that for each 푖 > 0, instance 퐼푖 (if it exists) is obtained from 퐼푖−1 by applying a successful chase step with some 휏 ∈ Σ and an active trigger ℎ for 휏 in 퐼푖−1. For a UBCQ 푄 over 풟, a finite chase sequence 퐼0(=풟), 퐼1, . . . , 퐼푛 for 풟 and Σ is a chase proof from 풟 and Σ of 푄 if 퐼푛 |= 푄. The following result shows that the chase is a complete proof system: whenever 푄 logically follows from 풟 and Σ, there is a chase proof. Theorem 2.6.2 (Fagin et al.[2005]) . A UBCQ 푄 follows from a database 풟 and a set of TGDs Σ if and only if there is a chase proof of 푄 from 풟 and Σ.

The proof involves showing that an infinite chase sequence has a homomorphism to every possible world for 풟, Σ – one says that the chase is a universal model. It is a simple model-theoretic result that all existential positive first-order formulas are preserved under homomorphisms (by a basic induction on the structure of existential formulas).3 Putting both results together, it can be derived that a UBCQ is satisfied in all models if and only if it is satisfied in its universal model, the chase. In fact, there exist two different versions of the chase in the literature, called the restricted and the oblivious chase. The above described procedure corresponds to the former version. The oblivious chase differs in that it relaxes the conditions of a chase step: it removes the requirement that the trigger of a chase step must be active. In other words, the oblivious chase “forgets” to check whether a TGD is already satisfied by the given instance. This might seem like a downside first; however, using the oblivious chase in place of the restricted chase sometimes facilitates proofs. Fortunately, both versions can be used equivalently, as shown by the following theorem. Theorem 2.6.3 (Calì et al.[2013]) . For any database 풟, set of TGDs Σ, and UBCQ 푄, there is a restricted chase proof of 푄 from 풟 and Σ if and only if there is an oblivious chase proof of 푄 from 풟 and Σ.

Proof sketch. It is easy to show that there is a homomorphism from the oblivious chase into the restricted one. Consequently, the oblivious chase is a universal model. 3In fact, the famous Homomorphism Preservation Theorem strengthens this result: a first-order formula is preserved under homomorphisms on all structures if and only if it is equivalent to an existential positive first-order formula.

15 2. Preliminaries

Henceforth, we collectively refer to both versions as the chase unless otherwise noted, and we will make use of the oblivious chase whenever needed.

Example 2.6.4. Given the database 풟 = {푅(푐, 푑)}, a set of TGDs Σ

푅(푥1, 푥2) → ∃푦 푆(푥1, 푦),

푆(푥1, 푥2) → 푇 (푥1) ∧ 푈(푥2),

푅(푥1, 푥2) ∧ 푈(푥3) → 푃 (푥2, 푥3), and the CQ 푄(푥) = ∃푦 푃 (푥, 푦), the sequence

퐼0 = {푅(푐, 푑)}, 퐼1 = 퐼0 ∪ {푆(푐, 푑1)}, 퐼2 = 퐼1 ∪ {푇 (푐), 푈(푑1)}, 퐼3 = 퐼2 ∪ {푃 (푑, 푑1)} is a chase proof from 풟 and Σ of 푄(푑). Note that 퐼3 is also the fixpoint of the restricted chase. Since, for example, 푈(푑) ∈/ 퐼3, we can also conclude that 풟, Σ ̸|= 푈(푑).

Even though the chase offers a complete proof system, in its general form, it is merely a sequence of collected facts derived from some database 풟 and TGDs Σ and does not reveal any special underlying structure or interesting properties so far. In the case of GTGDs, however, we can build the chase as a tree-like structure – an important observation that we will exploit in our algorithms. Before we show this tree-like structure property, we introduce two definitions:

Definition 2.6.5 (Guarded subsets). Given a set of facts 푆, a guarded subset 푈 of 푆 is a subset of 푆 containing a fact 퐺 that contains every constant of a fact of 푈. We call 퐺 the guard of 푈.

Definition 2.6.6 (Guardedly-completeness). Given two set of facts 푆 and 푈, we say that 푈 is guardedly-complete with respect to 푆 if for any fact 퐹 of 푆, whenever 퐹 is guarded by a fact in 푈, then 퐹 is also in 푈. If 푈 ⊆ 푆, we also say that 푈 is guardedly-complete in 푆.

Example 2.6.7. Let 퐹1 = {푅(푐, 푑), 푆(푒)} and 퐹2 = {푇 (푐, 푑, 푓), 푈(푓)}. Then 퐹1 is not guarded, 퐹2 is guarded with guard 푇 (푐, 푑, 푓), 퐹1 is guardedly-complete with respect to 퐹2, and 퐹2 is not guardedly-complete with respect to 퐹1 since 푅(푐, 푑) ∈/ 퐹2.

We are now ready to show the tree-like structure property.

Definition 2.6.8 (Guarded tree decomposition). A guarded tree decomposition of an instance 퐼 is a directed tree 푇 = (푉, 퐸) and a function 푓 : 푉 → 풫(퐼), where 풫(퐼) is the power set of 퐼, such that the following hold:

(a) For any 푣 ∈ 푉 , 푓(푣) is guardedly-complete in 퐼. (b) For any 퐹 ∈ 퐼, there is 푣 ∈ 푉 such that 퐹 ∈ 푓(푣). (c) For any 푐 ∈ consts(퐼), the set of nodes 푆푐 := {푣 ∈ 푉 | 푐 ∈ consts(푓(푣))} induces a subtree of 푇 .

We also write 퐹푣 for the set of facts associated to 푣; that is, we set 퐹푣 := 푓(푣). The sets 퐹푣 are called the bags of the decomposition.

16 2.6. Proof Machinery – Chasing the Truth

The bottom line about guarded tree decompositions is that whenever a bag 퐹푣 guards a fact 퐹 ∈ 퐼, then 퐹 is also in the bag.

Proposition 2.6.9. Given a chase sequence 퐼0, . . . , 퐼푛 for a set of GTGDs Σ, each instance 퐼푖 can be decomposed into a tree 푇푖 = (푉푖, 퐸푖) with root 푟 such that consts(퐹푟) = consts(퐼0) and for any 푣 ≠ 푟, consts(퐹푣) ⊆ ℎ(⃗푥,⃗푦) for a trigger ℎ of some rule 훽(⃗푥) → ⃗푦휂(⃗푥,⃗푦) fired in 퐼푗 for some 푗 < 푖.

In fact, we will construct the chase in such a way that once a node 푣 appears in some 퐼푖, it will never disappear, but its bag may grow at further steps. We refer to the initial − 푖 contents of 푣 as the birth facts of 푣, denoted by 퐹푣 , and write 퐹푣 for the bag of 푣 at stage 푖 of the chase.

Proof of Proposition 2.6.9. Our construction will be by induction on the length of the sequence. The following inductive invariant will follow from the guardedly-completeness of 푖−1 any 퐹푣 : For any trigger ℎ that fires for some rule 휏 with body 훽, the image ℎ(훽) lies in some 푖−1 퐹푣 (we say that the chase step is triggered in 푣). Our construction proceeds in the following way: • The trivial chase sequence has a single root node, associated with the initial in- stance 퐼0.

• Suppose 퐼푖 is formed by firing a chase step with a full GTGD 휏 in 퐼푖−1 based on a 푖−1 trigger ℎ. By the inductive invariant, the image of the trigger lies in 퐹푣 for some node 푣. We add the resulting facts to 푣.

• Suppose 퐼푖 is formed by firing a chase step with a non-full GTGD 휏 in 퐼푖−1 based on a trigger ℎ. Again, by the inductive invariant, the image of the trigger lies in 푖−1 ′ 푖 퐹푣 for some node 푣. We create a new node 푣 as a child of 푣, and let 퐹푣′ initially contain the newly generated facts. As we have a guarded tree decomposition for step 푖 − 1 and 푣′ is a child of 푣, the new construction still satisfies Property 2.6.8.(c). • In either of the above two cases, after firing the chase step, we propagate every fact 퐹1 that is guarded by another fact 퐹2 in 퐼푖 to every node 푣 containing 퐹2. This 푖 propagation allows us to maintain the guardedly-completeness of any 퐹푣. We have to show that the invariant indeed holds at any step 푖: Let ℎ be the trigger for the GTGD 휏 that fires at step 푖. Since 휏 is guarded, there is an atom 퐺(⃗푥) in the body 훽(⃗푥) of 휏 containing all body variables. Let 푣 be a node at step 푖 − 1 containing the fact (︀ )︀ 푖−1 (︀ )︀ ℎ 퐺(⃗푥) . By construction, 퐹푣 is guardedly-complete in 퐼푖−1, and hence all facts ℎ 훽(⃗푥) 푖−1 푖−1 must be contained in 퐹푣 , i.e. the image of ℎ lies in 퐹푣 . Example 2.6.10. Given the database 풟 = {푅(푐, 푑)} and a set of GTGDs Σ

푅(푥1, 푥2) → ∃푦 푆(푥1, 푦), 푅(푥1, 푥2) → ∃푦 푇 (푥1, 푥2, 푦),

푇 (푥1, 푥2, 푥3) → ∃푦 푈(푥1, 푥2, 푦), 푈(푥1, 푥2, 푥3) → 푃 (푥2),

푇 (푥1, 푥2, 푥3) ∧ 푃 (푥2) → 푀(푥1), 푆(푥1, 푥2) ∧ 푀(푥1) → ∃푦 푁(푥1, 푦),

17 2. Preliminaries the sequence

퐼0 = {푅(푐, 푑)}, 퐼1 = 퐼0 ∪ {푆(푐, 푑1)}, 퐼2 = 퐼1 ∪ {푇 (푐, 푑, 푑2)}, 퐼3 = 퐼2 ∪ {푈(푐, 푑, 푑3)},

퐼4 = 퐼3 ∪ {푃 (푑)}, 퐼5 = 퐼4 ∪ {푀(푐)}, 퐼6 = 퐼5 ∪ {푁(푐, 푑4)} is a chase sequence for 풟 and Σ. The corresponding tree-like chase is depicted in Figure 2.1.

퐼3 푅(푐, 푑) 푅(푐, 푑) 퐼1 퐼2 퐼0 푅(푐, 푑) 푅(푐, 푑) 푆(푐, 푑1) 푇 (푐, 푑, 푑2) 푆(푐, 푑1) 푇 (푐, 푑, 푑2) 푅(푐, 푑)

푆(푐, 푑1) 푆(푐, 푑1) 푇 (푐, 푑, 푑2) 푈(푐, 푑, 푑3) 푈(푐, 푑, 푑3), 푃 (푑)

퐼4 푅(푐, 푑), 푃 (푑) 푅(푐, 푑), 푃 (푑) 퐼5 푅(푐, 푑), 푃 (푑), 푀(푐)

푆(푐, 푑1) 푇 (푐, 푑, 푑2), 푃 (푑) 푆(푐, 푑1) 푇 (푐, 푑, 푑2), 푃 (푑), 푀(푐) 푆(푐, 푑1), 푀(푐) 푇 (푐, 푑, 푑2), 푃 (푑), 푀(푐)

푈(푐, 푑, 푑3), 푃 (푑) 푈(푐, 푑, 푑3), 푃 (푑) 푈(푐, 푑, 푑3), 푃 (푑), 푀(푐)

푅(푐, 푑), 푃 (푑), 푀(푐) 퐼6 푅(푐, 푑), 푃 (푑), 푀(푐)

푆(푐, 푑1), 푀(푐) 푇 (푐, 푑, 푑2), 푃 (푑), 푀(푐) 푆(푐, 푑1), 푀(푐) 푇 (푐, 푑, 푑2), 푃 (푑), 푀(푐)

푁(푐, 푑4) 푈(푐, 푑, 푑3), 푃 (푑), 푀(푐) 푁(푐, 푑4), 푀(푐) 푈(푐, 푑, 푑3), 푃 (푑), 푀(푐)

Figure 2.1.: Tree-like chase for Example 2.6.10; triggered nodes are marked bold and red, and nodes with missing, non-propagated facts are marked dashed and blue.

2.6.2. The Disjunctive Chase

The preceding discussion extends to disjunctive TGDs. The analogue of the chase for DisTGDs is the disjunctive chase [Deutsch et al., 2008]. As before, we have the possibility to consider a restricted and oblivious version, but since both versions can again be used equivalently, we only define the latter. ⋁︀푛 Let 퐼 be a set of facts, 휏 = 훽 → 푖=1 ∃⃗푦푖 휂푖 be a DisTGD, and ℎ be a trigger for 휏 in 퐼. A disjunctive chase step for 휏 and ℎ to 퐼 creates instances 퐼1 . . . 퐼푛, one for each head conjunction of 휏, such that instance 퐼푖 is formed by applying an ordinary chase step for 훽 → ∃⃗푦푖 휂푖 and ℎ to 퐼. For 퐼0 an instance and Σ a set of DisTGDs, a chase tree for 퐼 and Σ is a possibly infinite tree 푇 with root 퐼0 such that for every node 퐼, the children of 퐼 are obtained by applying a disjunctive chase step for some 휏 ∈ Σ to 퐼. For any chase tree 푇 , we let lvs(푇 ) denote the set of leaves of 푇 . A path 퐼0, . . . , 퐿 with 퐿 ∈ lvs(푇 ) is called a thread of 푇 . Note that every thread 퐼0, . . . , 퐼푛 can be considered as a chase sequence.

18 2.6. Proof Machinery – Chasing the Truth

For a UBCQ 푄 over 풟, a finite chase tree 푇 for 풟 and Σ is a chase proof from 풟 and Σ of 푄 if every thread 퐼0, . . . , 퐼푛 is a chase proof of 푄. We write 푇 |= 푄 if 푇 is a chase proof of 푄. The following result shows that the disjunctive chase is a complete proof system:

Theorem 2.6.11 (Deutsch et al.[2008]) . A UBCQ 푄 follows from a database 풟 and a set of DisTGDs Σ if and only if there is a chase proof of 푄 from 풟 and Σ.

The idea of the proof is the same as for Theorem 2.6.2: one shows that an infinite chase tree has a homomorphism to every possible world for 풟, Σ and makes use of the fact that homomorphisms preserve UBCQs.

Example 2.6.12. Given the database 풟 = {푅(푐, 푑)}, a set of DisGTGDs Σ

푅(푥1, 푥2) → ∃푦 푆(푥1, 푦) ∨ ∃푦 푇 (푥1, 푥2, 푦), 푇 (푥1, 푥2, 푥3) → ∃푦 푈(푥1, 푥2, 푦),

푈(푥1, 푥2, 푥3) → 푀(푥1, 푥2) ∨ 푃 (푥2), 푇 (푥1, 푥2, 푥3) ∧ 푃 (푥2) → 푀(푥1, 푥1),

푆(푥1, 푥2) → 푀(푥1, 푥1), and the BCQ 푄 = ∃푦 푀(푐, 푦), the tree depicted in Figure 2.2 is a chase proof from 풟 and Σ of 푄 as every leaf models 푄.

푅(푐, 푑) 푇2 푇1 푇 0 푅(푐, 푑) 푅(푐, 푑), 푆(푐, 푑1) 푅(푐, 푑), 푇 (푐, 푑, 푑2) 푅(푐, 푑) 푅(푐, 푑), 푆(푐, 푑1) 푅(푐, 푑), 푇 (푐, 푑, 푑2) 푅(푐, 푑), 푇 (푐, 푑, 푑2), 푈(푐, 푑, 푑3)

푅(푐, 푑) 푇4 푇3

푅(푐, 푑) 푅(푐, 푑), 푆(푐, 푑1) 푅(푐, 푑), 푇 (푐, 푑, 푑2)

푅(푐, 푑), 푆(푐, 푑1) 푅(푐, 푑), 푇 (푐, 푑, 푑2) 푅(푐, 푑), 푇 (푐, 푑, 푑2), 푈(푐, 푑, 푑3)

푅(푐, 푑), 푇 (푐, 푑, 푑2), 푈(푐, 푑, 푑3) 푅(푐, 푑), 푇 (푐, 푑, 푑2), 푅(푐, 푑), 푇 (푐, 푑, 푑2), 푈(푐, 푑, 푑3), 푀(푐, 푑) 푈(푐, 푑, 푑3), 푃 (푑)

푅(푐, 푑), 푇 (푐, 푑, 푑2), 푅(푐, 푑), 푇 (푐, 푑, 푑2), 푈(푐, 푑, 푑3), 푀(푐, 푑) 푈(푐, 푑, 푑3), 푃 (푑) 푅(푐, 푑), 푇 (푐, 푑, 푑2), 푈(푐, 푑, 푑3), 푃 (푑), 푀(푐, 푐)

푅(푐, 푑) 푇5

푅(푐, 푑), 푆(푐, 푑1) 푅(푐, 푑), 푇 (푐, 푑, 푑2)

푅(푐, 푑), 푆(푐, 푑1), 푀(푐, 푐) 푅(푐, 푑), 푇 (푐, 푑, 푑2), 푈(푐, 푑, 푑3)

푅(푐, 푑), 푇 (푐, 푑, 푑2), 푅(푐, 푑), 푇 (푐, 푑, 푑2), 푅(푐, 푑), 푇 (푐, 푑, 푑2), 푈(푐, 푑, 푑3), 푈(푐, 푑, 푑3), 푀(푐, 푑) 푈(푐, 푑, 푑3), 푃 (푑) 푃 (푑), 푀(푐, 푐)

Figure 2.2.: Chase tree for Example 2.6.12; instances that will be extended are marked bold and red.

19 2. Preliminaries

As we did for GTGDs, we can again obtain a tree-like structure property for chase trees.

Proposition 2.6.13. Given a chase tree 푇 for a set of DisGTGDs Σ, for any thread 퐼0, . . . , 퐼푛 of 푇 , each instance 퐼푖 can be decomposed into a tree 푇푖 = (푉푖, 퐸푖) with root 푟 such that consts(퐹푟) = consts(퐼0) and for any 푣 ≠ 푟, consts(퐹푣) ⊆ ℎ(⃗푥,⃗푦푘) for a trigger ℎ ⋁︀푚 of some rule 훽(⃗푥) → 푖=1 ⃗푦푖 휂푖(⃗푥,⃗푦) fired in 퐼푗 for some 푗 < 푖 and 푘 ∈ {1, . . . , 푚}. Proof. Apply Proposition 2.6.9 to every thread of 푇 .

We will call such a chase tree tree-like; that is, a chase tree is tree-like if each thread in it is tree-like. Given such a tree-like chase tree 푇 , we will make use of a few additional definitions in order to conveniently modify and talk about the structure of 푇 :

• As 푇 is created by a sequence of chase steps, we can refer to the chase tree created by only firing the first 푖 chase steps of 푇 by 푇푖.

• Given a node 푁 of 푇 , we write 푇 푁 for the subtree of 푇 rooted at 푁. Note that 푇 푁 is a chase tree with root 푁.

• Given a node 푁 of 푇 and a node 푣 in 푁, we write 푁푣 for the bag of 푣 in 푁. This definition is lifted to sets of nodes 푆 by setting 푆푣 := {푁푣 | 푁 ∈ 푆}.

• Given a set of nodes 푆 of 푇 , we write cut(푇, 푆) for the chase tree obtained from 푇 by cutting off the children of all nodes in 푆. If 푆 is a singleton {푁}, we also write cut(푇, 푁) := cut(푇, {푁}).

20 3. The One-Pass Property

The tree-like chases presented so far are experiencing a chaotic behaviour in the sense that triggers can potentially fire in any node of the (internal) tree at any time. It will be useful for our upcoming completeness proofs to tame this behaviour by further normalising the chase proofs. This idea comes from Amarilli and Benedikt[2018], although the proofs that are being normalised there are more specialised than general (disjunctive) GTGDs. We will do this first for chase sequences and then extend to chase trees.

3.1. One-Pass Chases

Let us start with the formal definition of our desired one-pass property.

Definition 3.1.1 (One-pass chase sequence). We say that a tree-like chase sequence 퐼0, . . . , 퐼푛 is one-pass if for every node 푣 present at some stage 퐼푖, if at some stage 푗 > 푖 we perform a chase step triggered in a node outside of the subtree of 푣, then for all 푘 with 푗 ≤ 푘 < 푛, the 푘th step does not modify the subtree of 푣.

Example 3.1.2. The chase depicted in Figure 2.1 is not one-pass. From step four to five, we propagate a fact not only to ancestors that have a suitable guard, but also to a child and a sibling that have a guard for it. In the next step, we make crucial use of this propagated fact in the sibling. In order to obtain a one-pass chase, instead of propagating facts to all nodes in the tree, we can first only propagate facts to ancestors and then recursively re-create all subtrees, starting from the node that triggered the chase step up to the root. This idea is depicted in Figure 3.1, and it is the key idea of the proof of Proposition 3.1.6.

Recall that in a tree-like chase sequence, we have a sequence of trees, where each node is associated with a set of facts and each fact is inferred from facts in prior trees. The usefulness of a one-pass chase is that the inferences we make are more closely related to the tree structure. This thought is captured in Lemma 3.1.4. To prove the lemma, we first give the following definition.

Definition 3.1.3. Let 푣 be a node in a tree 푇 . We say that a node 푤 is a non-strict ancestor of 푣 if 푣 = 푤 or 푤 is an ancestor of 푣 in 푇 . Similarly, we say that 푤 is a non-strict descendant of 푣 if 푣 = 푤 or 푤 is an descendant of 푣 in 푇 .

Lemma 3.1.4. Suppose 퐼0, . . . , 퐼푛 is a one-pass chase sequence and 푣 is a node created in 퐼푖. Suppose 푑 is a non-strict descendant of 푣 in 퐼푗 with 푗 ≥ 푖, and 푑 adds on facts in 퐼푗. 푗 푖 Then we have a one-pass chase proof of 퐹푑 from 퐹푣 using steps from 푖, . . . , 푗 − 1.

21 3. The One-Pass Property

′ 푅(푐, 푑), 푃 (푑) 퐼4 푅(푐, 푑), 푃 (푑) 푅(푐, 푑), 푃 (푑), 푀(푐) 퐼4

푆(푐, 푑1) 푇 (푐, 푑, 푑2), 푃 (푑) 푆(푐, 푑1) 푇 (푐, 푑, 푑2), 푃 (푑), 푀(푐) 푆(푐, 푑1) 푇 (푐, 푑, 푑2), 푃 (푑), 푀(푐)

푈(푐, 푑, 푑3), 푃 (푑) 푈(푐, 푑, 푑3), 푃 (푑) 푈(푐, 푑, 푑3), 푃 (푑)

′′ 푅(푐, 푑), 푃 (푑), 푀(푐) 퐼4 푅(푐, 푑), 푃 (푑), 푀(푐) 퐼5

푆(푐, 푑1) 푇 (푐, 푑, 푑2), 푃 (푑), 푀(푐) 푆(푐, 푑1) 푇 (푐, 푑, 푑2), 푃 (푑), 푀(푐) 푆(푐, 푑5), 푀(푐)

푈(푐, 푑, 푑3), 푃 (푑) 푈(푐, 푑, 푑4), 푃 (푑), 푀(푐) 푈(푐, 푑, 푑3), 푃 (푑) 푈(푐, 푑, 푑4), 푃 (푑), 푀(푐)

푅(푐, 푑), 푃 (푑), 푀(푐) 푅(푐, 푑), 푃 (푑), 푀(푐) 퐼6

푆(푐, 푑1) 푇 (푐, 푑, 푑2), 푃 (푑), 푀(푐) 푆(푐, 푑5), 푀(푐) 푆(푐, 푑1) 푇 (푐, 푑, 푑2), 푃 (푑), 푀(푐) 푆(푐, 푑5), 푀(푐)

푈(푐, 푑, 푑3), 푃 (푑) 푈(푐, 푑, 푑4), 푃 (푑), 푀(푐) 푁(푐, 푑6) 푈(푐, 푑, 푑3), 푃 (푑) 푈(푐, 푑, 푑4), 푃 (푑), 푀(푐) 푁(푐, 푑6), 푀(푐)

Figure 3.1.: We can modify the chase from Figure 2.1 to obtain a one-pass chase. Triggered nodes are marked bold and red, and nodes with missing, non-propagated facts are marked dashed and blue if one-pass propagation is allowed, and dotted and brown if one-pass propagation is prohibited and a new copy has not been created so far.

푗 푖 Proof. We prove the claim by induction on 푗 − 푖. If 푖 = 푗, we have 퐹푑 = 퐹푣, and hence we 푗 푖 do not need any steps to create 퐹푑 from 퐹푣. Now assume 푗 > 푖. Let 푤 be the node that was used to generate the new facts 퐹 . Since the chase is one-pass, 푤 must be a non-strict descendant of 푑, or 푤 is the parent that generated 푑 at step 푗; in any case, 푤 is a non-strict descendant of 푣. Let 푘 < 푗 be the last time a fact was added to 푤. By the inductive hypothesis, we have a one-pass chase proof 푘 푗−1 푖 of 퐹푤 = 퐹푤 from 퐹푣 using steps from 푖, . . . , 푘 − 1. Further, we have a chase step from 푗−1 퐹푤 to 퐹 . The claim now follows by transitivity.

Corollary 3.1.5. Suppose 퐼0, . . . , 퐼푛 is a one-pass chase sequence, 푣 is a node in some 퐼푖 that was created in 퐼푘, and 푑 is non-strict descendant of 푣 in 퐼푗 with 푗 ≥ 푖. Then we have 푗 푖 a one-pass chase proof of 퐹푑 from 퐹푣 using steps from 푘, . . . , 푗 − 1. Proof. Let 푙 ≤ 푗 be the last time a fact was added to 푑. By Lemma 3.1.4, we have a 푙 푗 푘 푘 푖 one-pass chase proof of 퐹푑 = 퐹푑 from 퐹푣 using steps from 푘, . . . , 푙 − 1. Since 퐹푣 ⊆ 퐹푣 and 푗 푖 푙 ≤ 푗, we also have a one-pass chase proof of 퐹푑 from 퐹푣 using steps from 푘, . . . , 푗 − 1. As a next step, we want to show that we can indeed assume the existence of a one-pass chase proof whenever we have a chase proof for some UBCQ 푄. We achieve this by constructing a suitable one-pass chase sequence from a given arbitrary chase sequence as sketched in Example 3.1.2.

Proposition 3.1.6. For every tree-like chase sequence 퐼0, . . . , 퐼푛, there is a one-pass chase sequence 퐼0(=퐼0), 퐼1,..., 퐼푚 such that there is a homomorphism ℎ : 퐼푛 → 퐼푚 with ℎ(푐) = 푐 for any 푐 ∈ consts(퐼0).

22 3.1. One-Pass Chases

푖 푖 Proof. We write 퐹푣 for the set of facts of some node 푣 in 퐼푖 and 퐹푣 for the set of facts of some node 푣 in 퐼푖. We also show that ℎ preserves the tree structure of 퐼푛; that is, for any subtree 푇 in 퐼푛, ℎ(푇 ) is a subtree in 퐼푚 graph-isomorphic to 푇 . This allows us to write 푚 푛 푣 := ℎ(푣) to refer to the node 푣 in 퐼푚 with 퐹푣 = ℎ(퐹푣 ) for any node 푣 in 퐼푛. We prove the claim by induction on 푛. For 푛 = 0, we simply set 퐼0 := 퐼0 and let ℎ be the identity. Assume 푛 > 0. Then by the inductive hypothesis, we get a one-pass chase sequence 퐼0(=퐼0), 퐼1,..., 퐼푚 such that 퐼푛−1 is homomorphic to 퐼푚 via ℎ. Assume the chase step from 퐼푛−1 to 퐼푛 fires in 푣. First, we use the inductive one-pass chase to generate a copy of the subtree of 푣 (with respect to 퐼푛−1) starting from the root of 퐼푚; more precisely, we re-fire all rules used to create the subtree of 푣 but use fresh constants 푐′ for any 푐 ∈ consts(퐼푚) ∖ consts(퐼0) in this chase while any constant in consts(퐼0) stays unchanged. Call the copy of 푣 in this subtree 푣′. We then fire the chase step in 푣′ but only propagate new facts to ancestors. Since some newly generated facts might have propagated to non-ancestor nodes 푤 of 푣, we also have to create new copies of these non-ancestor nodes in our one-pass chase. Further, we have to update ℎ to preserve the tree structure of 퐼푛. We achieve this by modifying the subtree of any non-strict ancestor of 푣′ in the following way: Beginning from 푝0 := 푣, we will create copies of all facts occurring in the subtree of 푝푖 in 퐼푛 that are still missing in our one-pass chase. For this, let 퐼0,..., 퐼푚* be the ′ one-pass chase created so far, 푝푖 be the copy of 푝푖 in 퐼푚, and 푝푖 be the non-strict ancestor ′ of 푣 corresponding to 푝푖. By the inductive hypothesis and Corollary 3.1.5, we have a 푝 퐹 − 푝 퐹 − one-pass chase proof of the subtree of 푖 from 푝푖 (the initial set of facts of 푖). As 푝푖 is 푚* ′ 퐹 ′ 푝푖 homomorphic to 푝푖 , we can also fire the steps of this chase proof starting from in ′ ′ 퐼푚* to create all missing facts in the subtree of 푝푖 (while again using fresh constants 푐 for any 푐 ∈ consts(퐼푚* ) ∖ consts(퐼0)). We then let 푝푖+1 be the parent of 푝푖 and proceed with 푝푖+1. Finally, for any node 푢 in 퐼푛, we let ℎ map to the newest copy of 푢 in 퐼푚* . As all newly generated facts were propagated to all ancestors of 푣′, ℎ preserved the tree structure of 퐼푛−1, and we re-created the missing facts of all subtrees, ℎ(푢) will be a copy of 푢 in 퐼푚* . Further, ℎ preserves the tree structure of 퐼푛 as we re-created all subtrees.

We have to show that our construction indeed preserves answers for BCQs.

′ ′ Lemma 3.1.7. Let 퐼0, . . . , 퐼푛 and 퐼0(=퐼0), . . . , 퐼푚 be two chase sequences such that there ′ is a homomorphism ℎ : 퐼푛 → 퐼푚 with ℎ(푐) = 푐 for any 푐 ∈ consts(퐼0). Then, for any ′ ′ BCQ 푄, if 퐼0, . . . , 퐼푛 is a chase proof of 푄, then 퐼0, . . . , 퐼푚 is a chase proof of 푄.

Proof. We show a stronger claim: for any CQ 푄(⃗푥) and ⃗푐 ⊆ consts(퐼푛), 퐼푛 |= 푄(⃗푐) implies ′ 퐼푚 |= ℎ(푄(⃗푐)). Since ℎ is the identity on consts(퐼0), this implies our goal. We show the claim by induction on the number of existential quantifiers of 푄, denoted by 푘. ′ If 푘 = 0, using that ℎ : 퐼푛 → 퐼푚 is a homomorphism, we get

퐼푛 |= 푄(⃗푐) ⇐⇒ 퐼푛 |= 퐴(⃗푐) for any atom 퐴 in 푄 ′ =⇒ 퐼푚 |= ℎ(퐴(⃗푐)) for any atom 퐴 in 푄 ′ ⇐⇒ 퐼푚 |= ℎ(푄(⃗푐)).

23 3. The One-Pass Property

′ ′ ′ Assume 푘 > 0. That is 푄(⃗푥) ≡ ∃푦 푄 (⃗푥,푦) with 푄 a CQ. If 퐼푛 |= 푄(⃗푐), then there is 푐 ∈ ′ ′ ′ ′ ′ consts(퐼푛) such that 퐼푛 |= 푄 (⃗푐,푐 ). Thus, by the inductive hypothesis, 퐼푚 |= ℎ(푄 (⃗푐,푐 )), ′ and hence 퐼푚 |= ℎ(푄(⃗푐)).

Finally, we derive our desired goal: Theorem 3.1.8. A UBCQ 푄 has a chase proof from 풟 and GTGDs Σ if and only if it has a one-pass chase proof from 풟 and Σ.

Proof. If we have one-pass chase proof of 푄, we also have a chase proof of 푄 since every one-pass chase proof is a chase proof. Assume we have a chase proof 퐼0, . . . , 퐼푛 of 푄 and let 퐼0,..., 퐼푚 be the corresponding one-pass chase sequence as created in Proposition 3.1.6. Then 퐼푛 |= 푄푖 for some 푄푖 ∈ 푄. Hence, by Lemma 3.1.7, 퐼푚 |= 푄푖, and thus 퐼푚 |= 푄.

3.2. One-Pass Disjunctive Chases

Next, we generalise our idea to disjunctive chases. We proceed as in the previous section.

Definition 3.2.1 (One-pass chase tree). We say that a chase tree is one-pass if each thread in it is a one-pass chase sequence.

Example 3.2.2. The tree-like chase tree for Example 2.6.12, depicted in Figure 3.2, is not one-pass. From step three to four, we propagate the fact 푀(푐, 푐) to a child that has a suitable guard.

Again, we want to construct a suitable one-pass chase tree 푇 from a given arbitrary chase tree 푇 . Since a chase tree models some UBCQ 푄 if and only if every leaf of the tree models 푄, we want to construct 푇 in a way such that for any leaf 퐿 of 푇 , there is at least one leaf 퐿 of 푇 that is homomorphic to 퐿. The basic idea is to lift the proof of Proposition 3.1.6. That is, we first only propagate facts to ancestors and then recursively re-create all subtrees. The introduction of disjunctions, however, causes some difficulties: whenever we want to create a copy of some subtree in a node of 푇 , we might generate additional threads in 푇 independent of the given subtree. We thus have to consider these additional threads in our proof.

Proposition 3.2.3. For every tree-like chase tree 푇 with root 퐼0, there is a one-pass chase tree 푇 with root 퐼0 such that for any leaf 퐿 in 푇 , there is a leaf 퐿 in 푇 and a homomorphism ℎ퐿 : 퐿 → 퐿 with ℎ(푐) = 푐 for any 푐 ∈ consts(퐼0).

Proof. As we did for Proposition 3.1.6, we also show that each ℎ퐿 preserves the tree structure of 퐿. 퐿 ∈ lvs(푇 ). We prove the claim by induction on the number of steps used to create 푇 , denoted by 푛. For 푛 = 0, we set 퐼0 := 퐼0 and let ℎ be the identity. For 푛 > 0, by the inductive hypothesis, we get a one-pass chase tree 푇 such that for any 퐿 ∈ lvs(푇 ), there is some 퐿 ∈ lvs(푇푛−1) and constructed homomorphism ℎ퐿 : 퐿 → 퐿. Assume the chase step from 푇푛−1 to 푇푛 fires in node 푣 in 퐿, creating leaves 퐿1, . . . , 퐿푚, and let 푆 be the set of all leaves 퐿 ∈ lvs(푇 ) such that 퐿 is homomorphic to 퐿 via some

24 3.2. One-Pass Disjunctive Chases

푅(푐, 푑) 푇2 푅(푐, 푑) 푇1 푇0 푅(푐, 푑) 푅(푐, 푑) 푅(푐, 푑) 푅(푐, 푑) 푅(푐, 푑) 푅(푐, 푑) 푇 (푐, 푑, 푑2), 푅(푐, 푑) 푆(푐, 푑1) 푇 (푐, 푑, 푑2), 푅(푐, 푑) 푆(푐, 푑1) 푇 (푐, 푑, 푑2), 푅(푐, 푑) 푈(푐, 푑, 푑3), 푅(푐, 푑)

푅(푐, 푑) 푇4 푅(푐, 푑) 푇3 푅(푐, 푑) 푅(푐, 푑) 푅(푐, 푑) 푅(푐, 푑) 푅(푐, 푑) 푇 (푐, 푑, 푑2), 푅(푐, 푑) 푆(푐, 푑1) 푇 (푐, 푑, 푑2), 푅(푐, 푑) 푆(푐, 푑1) 푇 (푐, 푑, 푑2), 푅(푐, 푑) 푈(푐, 푑, 푑3), 푅(푐, 푑)

푅(푐, 푑) 푅(푐, 푑), 푀(푐, 푑) 푅(푐, 푑), 푃 (푑)

푇 (푐, 푑, 푑2), 푅(푐, 푑) 푇 (푐, 푑, 푑2), 푅(푐, 푑), 푀(푐, 푑) 푇 (푐, 푑, 푑2), 푅(푐, 푑), 푃 (푑)

푈(푐, 푑, 푑3), 푅(푐, 푑) 푈(푐, 푑, 푑3), 푅(푐, 푑), 푀(푐, 푑) 푈(푐, 푑, 푑3), 푅(푐, 푑), 푃 (푑)

푅(푐, 푑), 푀(푐, 푑) 푅(푐, 푑), 푃 (푑) 푅(푐, 푑), 푃 (푑), 푀(푐, 푐)

푇 (푐, 푑, 푑2), 푅(푐, 푑), 푀(푐, 푑) 푇 (푐, 푑, 푑2), 푅(푐, 푑), 푃 (푑) 푇 (푐, 푑, 푑2), 푅(푐, 푑), 푃 (푑), 푀(푐, 푐)

푈(푐, 푑, 푑3), 푅(푐, 푑), 푀(푐, 푑) 푈(푐, 푑, 푑3), 푅(푐, 푑), 푃 (푑) 푈(푐, 푑, 푑3), 푅(푐, 푑), 푃 (푑), 푀(푐, 푐)

푅(푐, 푑) 푇5 푅(푐, 푑) 푅(푐, 푑) 푅(푐, 푑) 푇 (푐, 푑, 푑2), 푅(푐, 푑) 푆(푐, 푑1) 푇 (푐, 푑, 푑2), 푅(푐, 푑) 푈(푐, 푑, 푑3), 푅(푐, 푑) 푅(푐, 푑), 푀(푐, 푐) 푅(푐, 푑), 푀(푐, 푑) 푅(푐, 푑), 푃 (푑) 푅(푐, 푑), 푃 (푑), 푀(푐, 푐) 푆(푐, 푑1), 푀(푐, 푐) 푇 (푐, 푑, 푑2), 푅(푐, 푑), 푀(푐, 푑) 푇 (푐, 푑, 푑2), 푅(푐, 푑), 푃 (푑) 푇 (푐, 푑, 푑2), 푅(푐, 푑), 푃 (푑), 푀(푐, 푐)

푈(푐, 푑, 푑3), 푅(푐, 푑), 푀(푐, 푑) 푈(푐, 푑, 푑3), 푅(푐, 푑), 푃 (푑) 푈(푐, 푑, 푑3), 푅(푐, 푑), 푃 (푑), 푀(푐, 푐)

Figure 3.2.: Tree-like chase tree corresponding to Figure 2.2; triggered nodes are marked bold and red.

constructed ℎ퐿. If 푆 = ∅, there is nothing to do as each leaf in 푇 still has a homomorphism from some leaf in 푇푛. Hence, assume 푆 ̸= ∅. First, we want to create a copy of the subtree of 푣 (with respect to 퐿) using the inductive one-pass chase. For this, for any 퐿 ∈ 푆, let 푁퐿 be the last node in the thread 퐼0,..., 퐿 in which the subtree of the node 푣 in 퐿 corresponding to 푣 added on some fact. Let 푆푣 be the set consisting of all such nodes 푁퐿. Note that 푆푣 ̸= ∅ since 푆 ̸= ∅. We now re-fire ′ from any 퐿 ∈ 푆 all rules that created the tree cut(푇 , 푆푣) but use fresh constants 푐 for any constant 푐 ∈ consts(푇 ) ∖ consts(퐼0) in this chase while any constant in consts(퐼0) stays ′ unchanged. Let 푇 be the resulting chase tree. Note that any descendant 퐷 of some 퐿 ∈ 푆 in 푇 ′ is just an extension of 퐿, and hence, 퐿 is still homomorphic to 퐷 via ℎ퐿 (we will make use of this fact in a step below). Even ′ ′ ′ more, for any new leaf 퐿 in 푇 , there is either some 퐿 ∈ lvs(푇푛) that is homomorphic ′ ′ ′ to 퐿 (if 퐿 got created by a branch that was not affected by 푆푣), or 퐿 contains a fresh

25 3. The One-Pass Property copy of the subtree of 푣, created by a one-pass proof and eligible for further one-pass chase ′ steps. Let 푆′ consist of all leaves in 푇 for which the latter is the case. Note that 푆′ ̸= ∅ as 푆푣 ̸= ∅. ′ ′ For any 퐿 ∈ 푆 , we then fire the chase step that extended 푇푛−1 to 푇푛 in the fresh ′ copy 푣 of 푣, creating new leaves 퐿1,..., 퐿푚 that we associate to 퐿1, . . . , 퐿푚, but only * propagate new facts to ancestors in each 퐿푖. Let 푇 be the chase tree created so far. As we did for Proposition 3.1.6, we now want to recursively re-create subtrees in each 퐿푖 and ℎ 퐿 create some 퐿푖 that preserves the tree structure of 푖. We do this by iteratively updating * 푇 for 1 ≤ 푖 ≤ 푚 in the following way: * Beginning from 푝0 := 푣, for any leaf 퐿푖 in 푇 associated to 퐿푖, let 푝푗 be the copy of 푝푗 ′ in 푇 as given by ℎ퐿 (퐿푖 is a descendant of some 퐿 ∈ 푆), and 푝푗 be the non-strict ancestor 푣′ 푝 푁 푇 푝 of corresponding to 푗. Further, let 퐿푖 be the node in in which 푗 got created and 푀 푁 ,..., 퐿 푝 퐿푖 be the last node in 퐿푖 in which the subtree of 푗 added on a fact. Then let 푆 푁 푆 푀 푁 consist of all such nodes 퐿푖 , and 푀 consist of all such nodes 퐿푖 . ′ For each 퐿푖, we would now like to create all missing facts in the subtree of 푝푗 by re-firing (︀ *푁퐿 )︀ cut 푇 푖 , 푀 steps from 퐿푖 ; however, this might create new independent threads with ′ leaves that are again associated to 퐿푖 but in which the subtree of 푝푗 is still missing facts. To solve this issue, for 푈 ∈ {푆푁 , 푆푀 }, we first remove any node from 푈 that has an ancestor in 푈. Then, for each 푁 ∈ 푆푁 , let 푀푁 consist of all 푀 ∈ 푆푀 such that 푁 is an ancestor of 푀. Fix some 푁 ∈ 푆푁 . (︀ *푁 )︀ Claim 1. cut 푇 , 푀푁 contains no child of any 퐿 ∈ 푆.

Subproof. Let 퐿 ∈ 푆 and assume 퐿 is a non-strict descendant of 푁. There is at least one 퐿 퐿 푁 , 푀 푁 푆 푁 descendant 푖 associated to 푖 with corresponding 퐿푖 퐿푖 . As is in 푁 , 퐿푖 must 푁 푀 be a non-strict descendant of , and hence, also 퐿푖 must be a non-strict descendant of 푁 푀 푆 푆 . Thus, either 퐿푖 or one of its ancestors is in 푀 . All children of nodes in 푀 are removed, and consequently, no child of 퐿 occurs in 푇푁 . 

Now, for any 퐿푖 associated to 퐿푖 that is a descendant of 푁, we re-fire the steps that (︀ *푁 )︀ ′ ′ created cut 푇 , 푀푁 starting from 푝푗 to create all missing facts in the subtree of 푝푗 . (︀ *푁 )︀ ′ Since cut 푇 , 푀푁 contains no children of any 퐿 ∈ 푆, for any newly created leaf 퐿 in ′ ′ ′ this process, there is either some 퐿 ∈ lvs(푇푛) that is homomorphic to 퐿 , or 퐿 is associated ′ to 퐿푖 and the subtree of 푝푗 got repaired by the just fired chase steps. Once we processed any 푁 ∈ 푆푁 , we let 푝푗+1 be the parent of 푝푗 and proceed with 푝푗+1. * Finally, for any leaf 퐿푖 associated to some 퐿푖 in the final 푇 , and any node 푢 in 퐿푖, we * ℎ 푢 푇 ℎ let 퐿푖 map to the newest copy of in . Like in Proposition 3.1.6, each constructed 퐿푖 will be a homomorphism that preserves the tree structure of 퐿푖.

Theorem 3.2.4. A UBCQ 푄 has a chase proof from 풟 and DisGTGDs Σ if and only if it has a one-pass chase proof from 풟 and Σ.

Proof. If we have one-pass chase proof of 푄, we also have a chase proof of 푄 since every one-pass chase proof is a chase proof.

26 3.2. One-Pass Disjunctive Chases

Assume we have a chase proof 푇 of 푄, and let 푇 be the corresponding one-pass chase tree as created in Proposition 3.2.3. Take a leaf 퐿 in 푇 and let 퐿 be the leaf in 푇 that is homomorphic to 퐿. Since 퐿 |= 푄, we have 퐿 |= 푄푖 for some 푄푖 ∈ 푄. Hence, by Lemma 3.1.7, 퐿 |= 푄푖, and thus 퐿 |= 푄.

27

4. Quantifier-Free Query Answering under GTGDs

In this chapter, we present two procedures for GTGDs that terminate and output an atomic rewriting. To accomplish this, we first introduce some normal forms. Then, we describe an abstract property that implies completeness of a rewriting algorithm, which we call evolve-completeness. We then present two sound saturation processes that are evolve- complete and hence complete. For ease of exposition, we first introduce a non-optimal, but arguably more intuitive algorithm, which we call the simple saturation. We then discuss problems arising with this approach and introduce an improved version, which we call the guarded saturation.

4.1. Normal Forms

We introduce two normal forms for TGDs that facilitate subsequent proofs and ensure termination of our rewriting algorithms. Firstly, we want to normalise the variables of our set of TGDs to avoid duplicate isomorphic rules in our rewriting. Definition 4.1.1 (TGD variable normal form). Given a TGD 휏 with body 훽 and head th 휂, we say that 휏 is in variable normal form if vars(훽) = ⃗푥, vars(휂) ∖ ⃗푥 = ⃗푦, 푥푖 is the 푖 th lexicographic variable occurring in 훽, and 푦푖 the 푖 lexicographic variable occurring in 휂. We write VNF(휏) for the unique TGD obtained by rewriting 휏 into variable normal form, and given a set of TGDs Σ, we write VNF(Σ) := {VNF(휏) | 휏 ∈ Σ} for the variable normal form of Σ. Example 4.1.2. The TGD [︀ (︀ )︀]︀ ∀푥3, 푦2, 푥1 퐵(푦2, 푥1, 푥3) → ∃푦1, 푧1, 푦2 퐻1(푥1, 푧1, 푦1, 푦2) ∧ 퐻2(푦1, 푦2) is not in variable normal form. Its variable normal form is [︀ (︀ )︀]︀ ∀푥1, 푥2, 푥3 퐵(푥1, 푥2, 푥3) → ∃푦1, 푦2, 푦3 퐻1(푥2, 푦1, 푦2, 푦3) ∧ 퐻2(푦2, 푦3) .

Lemma 4.1.3. Any TGD 휏 can be rewritten into VNF(휏) in linear time.

Proof. First rename the universal and existential variables of 휏 into disjoint sets 푉∀, 푉∃ ′ ′ and obtain the corresponding 휏 . Then scan 휏 left to right to get an order on 푉∀ and 푉∃, ′ respectively, and choose fresh variables 푣1, . . . , 푣|푉∀|, 푤1, . . . , 푤|푉∃|. Then rename 휏 using 휌 defined by {︃ th 푣푖, if 푥 is the 푖 variable in 푉∀ 푥휌 := . th 푤푖, if 푥 is the 푖 variable in 푉∃

29 4. Quantifier-Free Query Answering under GTGDs

′ ′ ′ ′ Finally, rename 휏 휌 using 휌 defined by 푣푖휌 := 푥푖 and 푤푖휌 := 푦푖. All steps can clearly be done in linear time.

Secondly, we want to avoid “mixed” rules; that is, non-full TGDs which become full rules when selecting a subset of head atoms.

Definition 4.1.4 (TGD head normal form). We say that a TGD 휏 is in head normal form if 휏 is full or every head atom contains at least one existential variable. Given a set of TGDs Σ, the head normal form HNF(Σ) of Σ is obtained by replacing every rule 푛 ⋀︁ 훽(⃗푥) → ∃⃗푦 퐻푖(⃗푥,⃗푦), 푖=1 that is not in head normal form by two rules ⋀︁ 훽(⃗푥) → ∃⃗푦 퐻푖(⃗푥,⃗푦), 푖∈퐼 ⋀︁ 훽(⃗푥) → 퐻푖(⃗푥), 푖∈({1,...,푛}∖퐼) where 퐼 := {푖 | vars(퐻푖) ∩ ⃗푦 ̸= ∅}. This replacement can be done in time linear in ∑︀ size(Σ) := 휏∈Σ size(휏). Lemma 4.1.5. Given a database 풟, a set of TGDs Σ, and a UBCQ 푄, we have 풟, Σ |= 푄 if and only if 풟, HNF(Σ) |= 푄.

Proof. By Theorem 2.6.2, it suffices to show that 푄 has a chase proof from 풟 using Σ if and only if it has a chase proof from 풟 using HNF(Σ). Any 휏 ∈ Σ ∩ HNF(Σ) can fire equivalently in both chases. If we fire a 휏 ∈ Σ ∖ HNF(Σ) in a chase proof using Σ, we can simply fire both rules from HNF(휏) in a chase proof using HNF(Σ). Conversely, if we fire a 휏 ∈ HNF(Σ) ∖ Σ in a chase proof using HNF(Σ), we can simply fire the rule from which 휏 was created to get a chase proof using Σ.

The reason for the head normal form transformation is to simplify our correctness proofs and to minimise the number of variables occurring in each head of a rule. This in return will facilitate required unification steps in our upcoming rewriting algorithms.

Definition 4.1.6 (TGD width). Given a TGD 휏 with body 훽 and head 휂, we define

(a) the body width of 휏 as bwidth(휏) := |vars(훽)|, (b) the head width of 휏 as hwidth(휏) := |vars(휂)|, and (c) the width of 휏 as width(휏) := max{bwidth(휏), hwidth(휏)}.

The definitions are extended to set of TGDs Σ by taking the maximum of any 휏 ∈ Σ, e.g. width(Σ) := max{width(휏) | 휏 ∈ Σ}.

Example 4.1.7. The TGD [︀ ]︀ ∀푥1, 푥2 퐵(푥1, 푥2) → ∃푦1 퐻1(푥1, 푦1) ∧ 퐻2(푥2)

30 4.2. An Abstract Property for Completeness has width 3, whereas its head normal form [︀ ]︀ ∀푥1, 푥2 퐵(푥1, 푥2) → ∃푦1 퐻1(푥1, 푦1) , [︀ ]︀ ∀푥1, 푥2 퐵(푥1, 푥2) → 퐻2(푥2) has width 2. Lemma 4.1.8. For any set of TGDs Σ, we have hwidth(HNF(Σ)) ≤ hwidth(Σ).

Proof. Any head of a rule in HNF(Σ) is a subset of a head of a rule in Σ.

4.2. An Abstract Property for Completeness

Recall the structure of a tree-like chase from some database 풟 and set of GTGDs Σ.A tree-like chase models some UBCQ 푄 if and only if all the facts of some 푄푖 ∈ 푄 are contained in the root node 푟 of the final instance of the chase. Also recall that an atomic rewriting Σ′ must be a set of full rules. Consequently, in order to obtain completeness of Σ′, we must ensure that whenever a fact 퐹 relevant to 푟 is created by a sequence of chase steps using Σ, 퐹 can also be created by a sequence of full rules using Σ′. It is evident that 퐹 cannot only be created by firing a full rule in 푟, but also by firing a rule in a descendant of 푟, as can be seen in Figure 2.1. To derive sufficient completeness properties for Σ′, we can consider these two cases separately. If we fire a full rule 휏 ∈ Σ in 푟, we can simply add 휏 to Σ′ and then fire 휏 in the chase using Σ′. Our first property will thus require that every full rule of Σ will be contained in Σ′. If, however, we fire a rule 휏 in a descendant 푑 of 푟, we cannot create a copy of 푑 in a chase using Σ′ (as Σ′ is full). We hence need a second property which ensures that whenever a fact relevant to 푟 is created by firing some 휏 ∈ Σ in a descendant of 푟, the fact can also be created by a sequence of full rules in Σ′. Due to an inductive argument (Lemma 4.2.4), we can even relax this property a bit further and focus on facts created in a child of 푟 instead of general descendants. These thoughts capture the general idea of the following properties: Definition 4.2.1 (Evolve-completeness). Let Σ be a set of GTGDs and Σ′ be a set of full TGDs. We say that Σ′ is evolve-complete with respect to Σ if it satisfies the following properties: (a) For every full 휏 ∈ HNF(Σ), we have 휏 ∈ Σ′ (modulo renaming). (b) Let 퐼0, . . . , 퐼푛+1 be a chase, let 푟 be the node inside 퐼0, let 휏푟 ∈ HNF(Σ) be non-full ′ and 휏1, . . . , 휏푛 ∈ Σ be full such that

(i) firing 휏푟 in 퐼0 creates 퐼1, (ii) firing 휏푖 in 푣 in 퐼푖 creates 퐼푖+1, where 푣 is the child of 푟. ′ ′ 푛 ′ 푛+1 Then there exists a chase 퐼0, . . . , 퐼푚 from 퐹푟 and Σ of 퐹푟 . An illustration of Property 4.2.1.(b) can be found in Figure 4.1. Before we prove that our properties indeed imply completeness of Σ′, we introduce an apparent strengthening of Property 4.2.1.(b).

31 4. Quantifier-Free Query Answering under GTGDs

푟 1 푛 푛+1 퐹푟 퐼1 푟 퐹푟 퐼푛 푟 퐹푟 퐼푛+1 0 푟 퐹푟 퐼0 ... 1 푛 푛+1 푣 퐹푣 푣 퐹푣 푣 퐹푣

푛 ′ 푛 ′ ... 푛 ′ 푛+1 ′ 푟 퐹푟 퐼0 푟 퐹푟 ∪ 푆1 퐼1 푟 퐹푟 ∪ 푆푚−1 퐼푚−1 푟 퐹푟 퐼푚

Figure 4.1.: Illustration of Property 4.2.1.(b); the upper chase is given by assumption and the 푛+1 푛 bottom chase is obtained from Property 4.2.1.(b), where 푆1 ⊆ · · · ⊆ 푆푚−1 ⊆ (퐹푟 ∖ 퐹푟 ). Triggered nodes are marked bold and red, and the nodes’ names are written on the left.

Definition 4.2.2 (Strong evolve-completeness). Let Σ be a set of GTGDs and Σ′ be a set of full TGDs. We say that Σ′ is strongly evolve-complete with respect to Σ if it satisfies Property 4.2.1.(a) and the following strengthening of Property 4.2.1.(b):

(a) Let 퐼0, . . . , 퐼푛+1 be a chase, let 푟 be the node inside 퐼0, let 휏푟 ∈ HNF(Σ) be non-full ′ and 휏1, . . . , 휏푛 ∈ Σ be full such that

(i) firing 휏푟 in 퐼0 creates 퐼1, (ii) firing 휏푖 in 푣 in 퐼푖 creates 퐼푖+1, where 푣 is the child of 푟. ′ ′ ′ 푛+1 Then there exists a chase 퐼0, . . . , 퐼푚 from 퐼0 and Σ of 퐹푟 .

It turns out, however, that both properties are equivalent: Lemma 4.2.3. Σ′ is evolve-complete with respect to Σ if and only if Σ′ is strongly evolve- complete with respect to Σ.

푛+1 Proof. Right to left is a strengthening: if there is a chase from 퐼0 of 퐹푟 , then there is 푛 also one from 퐹푟 ⊇ 퐼0. We prove the other direction by induction on 푛. For 푛 = 0, there is nothing to do 1 since 휏푟 is in head normal form and non-full, and thus 퐼0 = 퐹푟 . ′ ′ Assume 푛 > 0. By the inductive hypothesis, we get a chase 퐼0, . . . , 퐼푚 from 퐼0 and ′ 푛 푛 Σ of 퐹푟 . Now we apply Property 4.2.1.(b) on the chase 퐹푟 , 퐼1, . . . , 퐼푛+1 to get a chase ′ ′ 푛 ′ 푛+1 퐼푚+1 . . . , 퐼푚+푚′ from 퐹푟 and Σ of 퐹푟 . Hence, by connecting the chases, we get a chase ′ 푛 from 퐼0 and Σ of 퐹푟 .

Due to the just shown equivalence, we subsequently use the notions of evolve-completeness and strong evolve-completeness interchangeably. Given a set of GTGDs Σ, we now show that every evolve-complete set Σ′ is in fact complete with respect to Σ; that is, any quantifier-free UBCQ following from Σ will also follow from Σ′. We first prove the following key lifting lemma that relies on the one-pass chase property. Lemma 4.2.4. Let Σ be a set of GTGDs, let Σ′ be evolve-complete with respect to Σ, and let 퐼0, . . . , 퐼푛 be a one-pass chase using HNF(Σ). Assume a node 푣 is created at step 푖 > 0 by firing a rule 휏푝 in parent 푝 and let 푗 ≥ 푖 such that 푣 adds on facts. Then there is a ′ ′ 푖−1 ′ 푗 chase 퐼0, . . . , 퐼푚 from 퐹푝 and Σ of 퐹푝 .

32 4.2. An Abstract Property for Completeness

* * 푖 ′ 푗 Proof. We additionally show that there is a chase 퐼0 , . . . , 퐼푘 from 퐹푣 and Σ of 퐹푣 . We prove the claim by induction on 푗 − 푖. In case 푖 = 푗, there is nothing to do since 휏푝 is in head normal form. 푖 푗 In case 푖 > 푗, we have a one-pass chase proof from 퐹푣 to 퐹푣 by Lemma 3.1.4. Hence, we can assume that the new facts relevant to 푣 are generated by some 휏 ∈ HNF(Σ) acting on some node 푑 that is a non-strict descendant of 푣. * * First consider the case 푣 = 푑. By the inductive hypothesis, we get a chase 퐼0 , . . . , 퐼푘 푖 ′ 푗−1 ′ from 퐹푣 and Σ of 퐹푣 . Since 푣 adds on facts in step 푗, 휏 must be full, and hence 휏 ∈ Σ * * (modulo renaming) by Property 4.2.1.(a). Thus, we can fire 휏 in 퐼푘 to obtain 퐼푘+1 with * 푗 푖−1 * * 퐼푘+1 |= 퐹푣 . Now we can apply Property 4.2.2.(a) on 퐹푝 , 퐼0 , . . . , 퐼푘+1 to obtain the ′ ′ desired 퐼0, . . . , 퐼푚. Next consider the case where 푑 is a descendant of 푣. Let 푐 be the first child of 푣 on the * * path to 푑, created at time 푘 > 푖. By the inductive hypothesis, we get a chase 퐼0 , . . . , 퐼푘 푖 ′ 푘−1 * * from 퐹푣 and Σ of 퐹푣 . Also by the inductive hypothesis, we get a chase 퐼푘+1, . . . , 퐼푘′ 푘−1 ′ 푗 from 퐹푣 and Σ of 퐹푣 . Now we can connect both chases and then proceed as before. ′ ′ That is, we use Property 4.2.2.(a) to obtain the desired 퐼0, . . . , 퐼푚. The completeness property now follows by a simple induction. Proposition 4.2.5 (Completeness). Let Σ be a set of GTGDs and Σ′ be evolve-complete with respect to Σ. Then Σ′ is complete with respect to Σ.

Proof. By Theorems 2.6.2 and 3.1.8 and Lemma 4.1.5, it suffices to show that whenever a quantifier-free UBCQ 푄 has a one-pass chase proof 퐼0, . . . , 퐼푛 from 풟 and HNF(Σ), then 푄 also follows from 풟 and Σ′. Let 푟 be the root inside the one-pass chase proof. Since 푛 퐼푛 |= 푄 if and only if 퐼푛 |= 푄푖 for some 푄푖 ∈ 푄 if and only if 푄푖 ⊆ 퐹푟 , it suffices to show ′ ′ ′ ′ 푛 that we can create a chase 퐼0, . . . , 퐼푚 from from 풟 and Σ with 퐼푚 |= 퐹푟 . We show the claim by induction on 푛. The case 푛 = 0 is vacuously true. In case 푛 > 0, let 휏 ∈ Σ be the rule firing at step 푛 in some node 푑, and assume that 푟 ′ ′ adds on a fact in 퐼푛; otherwise, we can just use the chase 퐼0, . . . , 퐼푚 obtained by the inductive hypothesis. If 푑 = 푣, then 휏 must be full, and thus, 휏 ∈ Σ′ by Property 4.2.1.(a). Moreover, by the ′ ′ ′ 푛−1 ′ inductive hypothesis, we get 퐼0, . . . , 퐼푚 such that 퐼푚 |= 퐹푟 . Hence, we can fire 휏 in 퐼푚 ′ ′ 푛 to obtain 퐼푚+1 with 퐼푚+1 |= 퐹푟 . Otherwise, let 푐 be the first child of the root on the path to 푑, created at step 푘 < 푛. ′ ′ ′ 푘−1 By the inductive hypothesis, we get 퐼0, . . . , 퐼푚 with 퐼푚 |= 퐹푟 . Further, by Lemma 4.2.4, ′ ′ 푘−1 ′ ′ 푛 we get 퐼푚+1, . . . , 퐼푚+푚′ from 퐹푟 and Σ with 퐼푚+푚′ |= 퐹푟 . Hence, by connecting the ′ 푛 chases, we get a chase from 풟 and Σ of 퐹푟 . Corollary 4.2.6. Let Σ be a set of GTGDs and Σ′ be finite, sound, and evolve-complete with respect to Σ. Then Σ′ is an atomic rewriting of Σ.

Proof. Σ′ is full by Definition 4.2.1, finite and sound by assumption, complete by Proposi- tion 4.2.5, and hence an atomic rewriting by Lemma 2.4.7.

Having a sufficient completeness property at hand, we next introduce two sound rewriting algorithms that fulfil this property.

33 4. Quantifier-Free Query Answering under GTGDs

4.3. The Simple Saturation

There are different ways to achieve the properties defined in the previous section. Our first algorithm will be based on the ideas of the following example.

Example 4.3.1. Given the database 풟 = {푅(푐), 푆(푐)} and a set of GTGDs Σ

휏1 = 푅(푥1) → ∃푦1, 푦2 푇 (푥1, 푦1, 푦2), 휏2 = 푇 (푥1, 푥2, 푥3) → ∃푦 푈(푥1, 푥2, 푦),

휏3 = 푈(푥1, 푥2, 푥3) → 푃 (푥1) ∧ 푉 (푥1, 푥2), 휏4 = 푇 (푥1, 푥2, 푥3) ∧ 푉 (푥1, 푥2) ∧ 푆(푥1) → 푀(푥1), the tree-like chase depicted in Figure 4.2 is a chase proof of 푄 = 푃 (푐) ∧ 푀(푐). Let us call the root node of this chase 푟, its child 푐, and the leaf node 푙.

푟 푅(푐), 푆(푐) 퐼2 퐼1 퐼0 푟 푅(푐), 푆(푐) 푐 푇 (푐, 푑1, 푑2), 푅(푐), 푆(푐) 푟 푅(푐), 푆(푐)

푐 푇 (푐, 푑1, 푑2), 푅(푐), 푆(푐) 푙 푈(푐, 푑1, 푑3), 푅(푐), 푆(푐)

푟 푅(푐), 푆(푐), 푃 (푐) 퐼3 푟 푅(푐), 푃 (푐), 푆(푐), 푀(푐) 퐼4

푐 푇 (푐, 푑1, 푑2), 푅(푐), 푆(푐), 푃 (푐), 푉 (푐, 푑1) 푐 푇 (푐, 푑1, 푑2), 푅(푐), 푆(푐), 푃 (푐), 푉 (푐, 푑1), 푀(푐)

푙 푈(푐, 푑1, 푑3), 푅(푐), 푆(푐), 푃 (푐), 푉 (푐, 푑1) 푙 푈(푐, 푑1, 푑3), 푅(푐), 푆(푐), 푃 (푐), 푉 (푐, 푑1), 푀(푐)

Figure 4.2.: Tree-like chase for Example 4.3.1; triggered nodes are marked bold and red, node names are labelled at the left.

In the third step, 푟 adds on a fact 푃 (푐) by firing the full 휏3 in 푙. In order to get a corre- sponding full rule that can be fired in 푟, we first lift 휏3 so that it can be fired in the parent ′ of 푙 by composing it with the non-full 휏2 to obtain 휏1 := 푇 (푥1, 푥2, 푥3) → 푃 (푥1) ∧ 푉 (푥1, 푥2). ′ ′ Similarly, we can now compose the non-full 휏1 and full 휏1 to obtain the full 휏2 := 푅(푥1) → 푃 (푥1), which can now be fired in 푟. This idea of composing non-full and full rules to lift triggers of rules to ancestor nodes will be captured by the (ORIGINAL)-rule in the following algorithm. Continuing with our chase, 푟 adds on another fact 푀(푐) in step four by firing 휏4 in 푐. Directly composing 휏1 with 휏4 as we did before, however, does not result in a full rule since the variable 푥2 of the premise 푉 (푥1, 푥2) in 휏4 will be mapped to the existential variable ′ 푦1 of 휏1. Instead, we note that we already obtained the rule 휏1 that can be fired in 푐 ′ and discharge this premise. Hence, we first compose the full 휏1 and 휏4 to obtain the full ′ 휏3 := 푇 (푥1, 푥2, 푥3) ∧ 푆(푥1) → 푀(푥1). This idea of composing two full rules to discharge premises will be captured by the (COMPOSITION)-rule in the following algorithm. Finally, ′ ′ composing the non-full 휏1 and full 휏3 gives the full 휏4 := 푅(푥1) ∧ 푆(푥1) → 푀(푥1), which can now be fired in 푟. Lastly note that any child of 푟 uses at most hwidth(Σ) many constants, which allows us to restrict our resolution process to only derive rules with up to hwidth(Σ) many variables.

34 4.3. The Simple Saturation

This restriction will ensure termination.

Using these observations, we now present a simple saturation process for GTGDs that returns an atomic rewriting.

Definition 4.3.2 (Simple saturation). Given a set of GTGDs Σ, the simple saturation SSat(Σ) is defined as the closure of all full rules in VNF(HNF(Σ)) under the following inference rules:

• (COMPOSITION). We can resolve two full TGDs: Assume we have two full 휏, 휏 ′ ∈ SSat(Σ) of the form1

훽(⃗푥) → 휂(⃗푥), 훽′(⃗푧) → 휂′(⃗푧)

that use distinct variables (otherwise, perform a renaming). Suppose there is a uni- ′ fier 휃 of some atom in 휂 and an atom in 훽 such that ⃗푥휃∪⃗푧휃 ⊆ {푥1, . . . , 푥hwidth(HNF(Σ))}. (︀ ′ ′ )︀ Then we add the full VNF 훽휃 ∪ (훽 휃 ∖ 휂휃) → 휂 휃 .

• (ORIGINAL). We can resolve a non-full with a full TGD if this generates a full TGD: Assume we have a non-full 휏 ∈ HNF(Σ) and a full 휏 ′ ∈ SSat(Σ) of the form

훽(⃗푥) → ∃⃗푦휂(⃗푥,⃗푦), 훽′(⃗푧) → 휂′(⃗푧)

that use distinct variables (otherwise, perform a renaming). Suppose there is a unifier 휃 of some atoms in 휂 with some atoms in 훽′ such that 1. 휃 is the identity on ⃗푦, 2. 휃 only maps to variables, (︀ ′ )︀ 3. vars 훽 휃 ∖ 휂휃 ⊆ ⃗푥휃, and 4. ⃗푥휃 ∩ ⃗푦 = ∅.2 Define 휂′′ := {퐻′ ∈ 휂′ | vars(퐻′휃) ∩ ⃗푦 = ∅}. ′′ (︀ ′ ′′ )︀ Then, if 휂 is non-empty, we add the full VNF 훽휃 ∪ (훽 휃 ∖ 휂휃) → 휂 휃 .

As we will derive another, more efficient rewriting for GTGDs, we only state the key lemmas needed to prove the correctness of the simple saturation here. The full proofs can be found in AppendixA. In the following, let 푛 be the number of predicate symbols in Σ, 푎 be the maximum arity of any predicate, and let 푤 denote the width of HNF(Σ).

1Update 25.03.2021: fixed typo: previous versions took 휏, 휏 ′ from Σ rather than SSat(Σ); proofs were already done using the right definition. 2Update 25.03.2021: this fourth constraint was missing in previous versions of (ORIGINAL); the proofs were not affected by this change.

35 4. Quantifier-Free Query Answering under GTGDs

Lemma 4.3.3. Any 휎 ∈ SSat(Σ) is a full TGD in variable normal form with width(휎) ≤ 푤.

The next properties will imply evolve-completeness of SSat(Σ). Lemma 4.3.4. SSat(Σ) has the following properties:

(a) For any 휏 ∈ HNF(Σ), we have VNF(휏) ∈ SSat(Σ). ′ (b) Let 퐼0, 퐼1, 퐼2 be a chase from SSat(Σ) and 휏, 휏 ∈ SSat(Σ) be full such that

(i) |consts(퐼0)| ≤ hwidth(HNF(Σ)), (ii) firing 휏 in 퐼0 creates 퐼1, ′ (iii) firing 휏 in 퐼1 creates 퐼2. * * Then there exists 휏 ∈ SSat(Σ) such that firing 휏 in 퐼0 creates 퐼0 ∪ (퐼2 ∖ 퐼1). ′ (c) Let 퐼0, 퐼1, 퐼2 be a chase, 휏 ∈ HNF(Σ) be non-full, and 휏 ∈ SSat(Σ) be full such that

(i) firing 휏 in the root 푟 of 퐼0 creates 퐼1 and a new node 푣, ′ 1 (ii) firing 휏 in 퐹푣 creates 퐼2, 2 (iii) 퐹푟 ∖ 퐼0 ̸= ∅. * * 2 Then there exists 휏 ∈ SSat(Σ) such that firing 휏 in 퐼0 creates 퐹푟 . Proposition 4.3.5 (Completeness). SSat(Σ) is evolve-complete with respect to Σ. Lemma 4.3.6. |SSat(Σ)| ≤ 22푛푤푎 .

Lemma 4.3.7. The simple saturation algorithm terminates in PTIME for bounded 푤, 푛, 푎, EXPTIME for bounded 푎, and 2EXPTIME otherwise.

Theorem 4.3.8. The simple saturation algorithm provides a decision procedure for the Quantifier-Free Query Answering Problem under GTGDs. The procedure terminates in PTIME for fixed Σ, EXPTIME for bounded 푎, and 2EXPTIME otherwise.

We now highlight a serious weak spot of the simple saturation, which then motivates us to derive an improved rewriting algorithm for GTGDs in Section 4.4.

Example 4.3.9. Given the database 풟 = {푅(푐, 푑), 푆(푐)} and a set of GTGDs Σ (︀ )︀ 휏1 = 푅(푥1, 푥2) → ∃푦1, 푦2 푆(푥1, 푥2, 푦1, 푦2) ∧ 푇 (푥1, 푥2, 푦2) ,

휏2 = 푆(푥1, 푥2, 푥3, 푥4) → 푈(푥4),

휏3 = 푇 (푧1, 푧2, 푧3) ∧ 푈(푧3) → 푃 (푧1), it is clear that 풟, Σ |= 푃 (푐). Let us consider the steps performed by the simple saturation. In the first iteration, (ORIGINAL) does not create any new rules since the composition of 휏1, 휏2 does not contain head atoms without existential variables and composing 휏1, 휏3 creates an existential variable in 푈(푧3). Taking the (COMPOSITION)-rule, we can clearly compose 휏2, 휏3; however, it is not so clear which unifier 휃 one should consider for this. One natural way is to restrict 휃 to be an mgu of chosen atoms. In our case, we obtain the mgu 휃 = [푥4/푧3], and the resulting resolvent corresponds to 푆(푥1, 푥2, 푥3, 푥4) ∧ 푇 (푧1, 푧2, 푥4) → 푃 (푧1). But this rule contains more than hwidth(Σ) = 4 variables and hence will not be derived by (COMPOSITION). Eliminating the upper bound on the number of variables is no solution

36 4.4. The Guarded Saturation as this restriction is needed for termination purposes. Instead, every possible unifier that keeps this upper bound must be considered, leading to a series of derived rules including

푆(푥1, 푥2, 푥3, 푥4) ∧ 푇 (푥1, 푥2, 푥4) → 푃 (푥1), 푆(푥1, 푥2, 푥3, 푥4) ∧ 푇 (푥2, 푥1, 푥4) → 푃 (푥2),

푆(푥1, 푥2, 푥3, 푥4) ∧ 푇 (푥1, 푥3, 푥4) → 푃 (푧1), 푆(푥1, 푥2, 푥3, 푥4) ∧ 푇 (푥3, 푥1, 푥4) → 푃 (푥3), and so forth, which is infeasible in practice.

We hence continue our journey to find a more efficient rewriting algorithm for GTGDs.

4.4. The Guarded Saturation

Again, we describe the basic ideas of the upcoming algorithm by means of an example.

Example 4.4.1. Let us revisit the tree-like chase from Example 4.3.1, depicted in Fig- ure 4.2, that uses the GTGDs

휏1 = 푅(푥1) → ∃푦1, 푦2 푇 (푥1, 푦1, 푦2), 휏2 = 푇 (푥1, 푥2, 푥3) → ∃푦 푈(푥1, 푥2, 푦),

휏3 = 푈(푥1, 푥2, 푥3) → 푃 (푥1) ∧ 푉 (푥1, 푥2), 휏4 = 푇 (푥1, 푥2, 푥3) ∧ 푉 (푥1, 푥2) ∧ 푆(푥1) → 푀(푥1),

As before, we first lift the full 휏3 by composing it with the non-full 휏2 to obtain the full ′ 휏1 := 푇 (푥1, 푥2, 푥3) → 푃 (푥1) ∧ 푉 (푥1, 푥2). We then continue by composing the non-full 휏1 ′ ′ with 휏1; this time, however, we do not only derive the full 휏2 := 푅(푥1) → 푃 (푥1) at this step, * (︀ )︀ but also add the non-full 휏1 := 푅(푥1) → ∃푦1, 푦2 푇 (푥1, 푦1, 푦2) ∧ 푉 (푥1, 푦1) . The idea is to collect the derivable non-full head atoms so that we can eliminate the (COMPOSITION)- step of our previous algorithm. Indeed, instead of first discharging the premise 푉 (푥1, 푥2) ′ * from 휏4 by composing it with 휏1, we can now directly compose 휏1 with 휏4 to obtain the ′ desired full 휏4 := 푅(푥1) ∧ 푆(푥1) → 푀(푥1).

Remark 4.4.2. The unification issue of Example 4.3.9 now disappears: we first compose ′ (︀ )︀ 휏1, 휏2 to obtain 휏1 := 푅(푥1, 푥2) → ∃푦1, 푦2 푆(푥1, 푥2, 푦1, 푦2) ∧ 푇 (푥1, 푥2, 푦2) ∧ 푈(푦2) , and ′ ′ then compose 휏1, 휏3 to obtain 휏2 := 푅(푥1, 푥2) → 푃 (푥1). Using these observations, we now present our improved rewriting algorithm for GTGDs.

Definition 4.4.3 (Guarded saturation). We define the following inference rule:

• (EVOLVE). We can compose a non-full with a full TGD: Assume we have a non-full 휏 and a full 휏 ′ of the form

훽(⃗푥) → ∃⃗푦휂(⃗푥,⃗푦), 훽′(⃗푧) → 휂′(⃗푧),

and assume that 휏, 휏 ′ use distinct variables (otherwise, perform a renaming). Sup- ′ ′ ′ pose 휃 is an mgu of atoms 푆 := (퐻1, . . . , 퐻푛) and 푆 := (퐵1, . . . , 퐵푛) that is the

37 4. Quantifier-Free Query Answering under GTGDs

3 ′ ′ identity on ⃗푦, where 퐻푖 ∈ 휂, 퐵푖 ∈ 훽 for 1 ≤ 푖 ≤ 푛. Moreover, assume that the following existential variable check (evc) is satisfied:

Any atom 퐵′ ∈ 훽′ with vars(퐵′휃) ∩ ⃗푦 ̸= ∅ occurs in 푆′ and ⃗푥휃 ∩ ⃗푦 = ∅.

Now define

′′ (︀ (︀ ′ ′ ′ )︀)︀ 훽 := 훽 ∪ 훽 ∖ {퐵1, . . . , 퐵푛} 휃, 휂′′ := (휂 ∪ 휂′)휃.

We then add the rule(s) VNF(HNF(훽′′ → ∃⃗푦휂′′)).

Given a set of GTGDs Σ, let GSat∃(Σ) be the closure of VNF(HNF(Σ)) under the given inference rule. The guarded saturation GSat(Σ) is obtained from GSat∃(Σ) by dropping all non-full rules.

Before we verify the correctness of the saturation, we derive two properties that follow from our restrictions imposed on (EVOLVE).

Lemma 4.4.4. Assume we perform an (EVOLVE) inference on 휏 and 휏 ′, 휏 is in head normal form, and 휏 ′ is guarded by 퐺′. Then 퐺′ occurs in 푆′.

Proof. Let 퐵′ be an atom in 푆′ that unifies with some 퐻 in 푆. As 휏 is in head normal ′ form and 휃 is the identity on ⃗푦, we derive that 퐻휃 contains some 푦푖. As 퐵 unifies with 퐻, we get vars(퐵′휃) ∩ ⃗푦 ≠ ∅. Finally, since 퐺′ guards 휏 ′, we get vars(퐵′휃) ⊆ vars(퐺′휃) and hence vars(퐺′휃) ∩ ⃗푦 ̸= ∅. Thus, by the evc, 퐺′ occurs in 푆′.

Proposition 4.4.5. Let 푤푏 := bwidth(HNF(Σ)), and 푤ℎ := hwidth(HNF(Σ)). Then any 휎 ∈ GSat∃(Σ) is a GTGD in variable normal form and head normal form, bwidth(휎) ≤ 푤푏, and hwidth(휎) ≤ 푤ℎ.

Proof. We prove the claim inductively. Clearly, all rules in VNF(HNF(Σ)) satisfy the conditions and all derived rules will be in variable and head normal form due to the final VNF(HNF(·)) application. Assume we create 휎 ∈ VNF(HNF(훽′′ → ∃⃗푦휂′′)) by (EVOLVE) from some non-full ′ ′ ′ 휏 = 훽 → ∃⃗푦휂 and full 휏 = 훽 → 휂 in GSat∃(Σ). By the inductive hypothesis, 휏 is in head normal form and 휏 ′ is guarded by some 퐺′. Hence, by Lemma 4.4.4, 퐺′ occurs in 푆′. Let 퐻 be the atom in 푆 that is unified with 퐺′ using 휃. By Lemma 2.1.14, 휃 does not introduce functions in 휎. As 퐺′ guards 휏 ′, 퐻휃 = 퐺′휃, and 휃 is the identity on ⃗푦, we get

vars(휂′휃) ⊆ vars(훽′휃) = vars(퐺′휃) = vars(퐻휃) ⊆ vars(휂휃) ⊆ vars(훽휃) ∪ ⃗푦.

From this, we can derive that vars(훽′′) ⊆ vars(훽휃) ∪ ⃗푦 and vars(휂′′) = vars(휂휃). Since the evc is satisfied, the former refines to vars(훽′′) = vars(훽휃). By the inductive hypothesis, we have bwidth(휏) ≤ 푤푏 and hwidth(휏) ≤ 푤ℎ, and hence bwidth(휎) ≤ 푤푏 and hwidth(휎) ≤ 푤ℎ. Finally, again by the inductive hypothesis, 휏 is guarded by some 퐺, which will also be a guard for 휎. 3Such an mgu can be obtained by treating ⃗푦 as constants.

38 4.4. The Guarded Saturation

4.4.1. Correctness To prove the correctness of our algorithm, we have to verify that our rewriting is sound and complete. We begin with the former.

Lemma 4.4.6. GSat∃(Σ) is sound with respect to Σ.

Proof. By Lemma 4.1.5, it suffices to show that GSat∃(Σ) is sound with respect to HNF(Σ). For this, it suffices to show that HNF(Σ) |= GSat∃(Σ). Assume we derive 휎 ∈ VNF(HNF(훽′′ → ∃⃗푦휂′′)) from some 휏 = 훽(⃗푥) → ∃⃗푦휂 and 휏 ′ = 훽′(⃗푧) → 휂′ using ′ ′ ′ (EVOLVE). Let 퐻1, . . . , 퐻푛 and 퐵1, . . . , 퐵푛 be the resolved atoms in 휂 and 훽 , respectively, let 휃 be the used mgu, and let 풜 be a structure with 풜 |= 휏, 휏 ′. Again by Lemma 4.1.5, it suffices to show that 풜 |= 휎′, where 휎′ := 훽′′ → ∃⃗푦휂′′. Without loss of generality, assume 휎′ is in variable normal form. Then by Proposition 4.4.5 (and its proof), we know that (a) 휎′ is function-free (as it is a GTGD),

(b) 푥푖휃 = 푥푗푖 for any 푥푖 ∈ ⃗푥, and (c) 푧푖휃 ∈ ⃗푥 or 푧푖휃 ∈ ⃗푦 for any 푧푖 ∈ ⃗푧. 푧 , . . . , 푧 ⃗푧 푧 휃 = 푥 푧 , . . . , 푧 Let 푙1 푙푚 be all variables in with 푙푖 푘푖 , and let 푙푚+1 푙푚+푚′ be the remaining variables with 푧푙푚+푖 휃 = 푦푘푚+푖 . ′′ Now assume 풜 |= 훽 [푐1/푥1, . . . , 푐푎/푥푎] for some 푐1, . . . , 푐푎 ∈ dom(풜). Then by definition of 훽′′, we get

풜 |= 훽[푐푗1 /푥1, . . . , 푐푗푎 /푥푎], ′ ′ ′ 풜 |= (훽 ∖ {퐵1, . . . , 퐵푛})[푐푘1 /푧푙1 , . . . , 푐푘푚 /푧푙푚 ]. (4.1)

The former implies 풜 |= ∃⃗푦휂[푐푗1 /푥1, . . . , 푐푗푎 /푥푎], and thus

′ ′ 풜 |= 휂[푐푗1 /푥1, . . . , 푐푗푎 /푥푎, 푑1/푦1, . . . , 푑푎 /푦푎 ] for some 푑1, . . . , 푑푎′ ∈ dom(풜). This in return implies

풜 |= 휂휃[푐1/푥1, . . . , 푐푎/푥푎, 푑1/푦1, . . . , 푑푎′ /푦푎′ ]. (4.2)

′ As 퐻푖휃 = 퐵푖휃, we get

풜 |= 퐵 [푐 /푧 , . . . , 푐 /푧 , 푑 /푧 , . . . , 푑 /푧 ] 푖 푘1 푙1 푘푚 푙푚 푘푚+1 푙푚+1 푘푚+푚′ 푙푚+푚′ for 1 ≤ 푖 ≤ 푛. Together with (4.1), we can thus derive that

풜 |= 훽′[푐 /푧 , . . . , 푐 /푧 , 푑 /푧 , . . . , 푑 /푧 ], 푘1 푙1 푘푚 푙푚 푘푚+1 푙푚+1 푘푚+푚′ 푙푚+푚′ and consequently

풜 |= 휂′[푐 /푧 , . . . , 푐 /푧 , 푑 /푧 , . . . , 푑 /푧 ]. 푘1 푙1 푘푚 푙푚 푘푚+1 푙푚+1 푘푚+푚′ 푙푚+푚′

This implies that

′ 풜 |= 휂 휃[푐1/푥1, . . . , 푐푎/푥푎, 푑1/푦1, . . . , 푑푎′ /푦푎′ ]. (4.3)

39 4. Quantifier-Free Query Answering under GTGDs

Now by (4.2) and (4.3), we get

′′ 풜 |= 휂 [푐1/푥1, . . . , 푐푎/푥푎, 푑1/푦1, . . . , 푑푎′ /푦푎′ ],

′′ and thus 풜 |= ∃⃗푦휂 [푐1/푥1, . . . , 푐푎/푥푎].

Corollary 4.4.7 (Soundness). GSat(Σ) is sound with respect to Σ.

Proof. By Lemma 4.4.6, GSat∃(Σ) is sound with respect to Σ, and hence, so is GSat(Σ) ⊆ GSat∃(Σ).

The next lemma will be required multiple times during our completeness proof and complexity analysis. Lemma 4.4.8. Let 휃, 휃′ be substitutions that are the identity on ⃗푦, and let 훿 be a substitution such that 휃′ = 휃훿. Then 훿 is the identity on ⃗푦.

′ Proof. Assume otherwise. Then there is 푦푖 ∈ ⃗푦 with 푦푖훿 ̸= 푦푖. Now as 휃, 휃 are the identity ′ on ⃗푦, we get 푦푖 = 푦푖휃 = 푦푖휃훿 = 푦푖훿 ̸= 푦푖, a contradiction.

To show completeness, we will have to make use of the following properties.

Lemma 4.4.9. GSat∃(Σ) has the following properties:

(a) For every 휏 ∈ HNF(Σ), we have VNF(휏) ∈ GSat∃(Σ). ′ (b) Let 퐼0, 퐼1, 퐼2 be a chase from GSat∃(Σ), 휏 be non-full, and 휏 be full such that

(i) firing 휏 in the root 푟 of 퐼0 creates 퐼1 and a new node 푣, ′ 1 (ii) firing 휏 in 퐹푣 creates 퐼2, * * Then there exists a non-full 휏∃ ∈ GSat∃(Σ) such that firing 휏∃ ∈ GSat∃(Σ) in 퐼0 2 2 2 * creates 퐹푣 ∖ 퐹푟 . Furthermore, if 퐹푟 ̸= 퐼0, then there exists a full 휏 ∈ GSat∃(Σ) such * 2 that firing 휏 in 퐼0 creates 퐹푟 ∖ 퐼0.

Proof. The first property follows from the base case of the algorithm. For the second 1 ⃗ property, without loss of generality, assume consts(퐼0) = ⃗푐 and consts(퐹푣 ) = ⃗푐 ∪ 푑, where ⃗ ⃗푐 ⊆ ⃗푐, and 푑 are fresh constants. We have 휏 = 훽(⃗푥) → ∃⃗푦휂(⃗푥,⃗푦), 휏 ′ = 훽′(⃗푧) → 휂′(⃗푧), and 휏 is in head normal form by Proposition 4.4.5. Let ℎ, ℎ′ be the triggers used for 휏, 휏 ′. ′ ′ * * ′ If ℎ (훽 ) ⊆ 퐼0, we simply set 휏∃ := 휏 and 휏 := 휏 ; If, however, the chase step makes use of ′ ′ ′ some facts in 퐼1 ∖ 퐼0, let 퐵1, . . . , 퐵푛 be all atoms in 훽 such that for each 1 ≤ 푖 ≤ 푛, there ′ ′ is 퐻푖 ∈ 휂 with ℎ(퐻푖) = ℎ (퐵푖). Then the substitution 휃 defined by ⎧ 푥푖, if 푣 ∈ ⃗푥 and ℎ(푣) = 푐푖 ⎨⎪ ′ 푣휃 := 푥푖, if 푣 ∈ ⃗푧 and ℎ (푣) = 푐푖 ⎪ ′ ⎩푦푖, if 푣 ∈ ⃗푧 and ℎ (푣) = ℎ(푦푖)

40 4.4. The Guarded Saturation

′ ′ is a unifier of 퐻1, . . . , 퐻푛 and 퐵1, . . . , 퐵푛 that is the identity on ⃗푦 and ⃗푥휃 ∩ ⃗푦 = ∅. Hence, * ′ ′ there is an mgu 휃 of 퐻1, . . . , 퐻푛 and 퐵1, . . . , 퐵푛 that is the identity on ⃗푦 (by treating ⃗푦 as constants). Thus, there is a substitution 훿 such that 휃* = 휃훿. By Lemma 4.4.8, 훿 is the identity on ⃗푦. Claim 1. ⃗푥휃* ∩ ⃗푦 = ∅.

* Subproof. Assume otherwise. Then for some 푥푖 ∈ ⃗푥, we get 푥푖휃 = 푦푗, and thus * 푥푖휃 = 푥푖휃 훿 = 푦푗훿∈ / ⃗푦 since ⃗푥휃 ∩ ⃗푦 = ∅, contradicting the fact that 훿 is the identity on ⃗푦. 

′ ′ ′ * ′ ′ Claim 2. Any 퐵 ∈ 훽 with vars(퐵 휃 ) ∩ ⃗푦 ̸= ∅ occurs in 퐵1, . . . , 퐵푛.

Subproof. If vars(퐵′휃*) ∩ ⃗푦 ≠ ∅, then also vars(퐵′휃) = vars(퐵′휃*훿) ∩ ⃗푦 ̸= ∅ as 훿 is the identity on ⃗푦. Thus, by construction of 휃, consts(ℎ′(퐵′)) ∩ 푑⃗ ̸= ∅, and hence, ℎ′(퐵′) ∈ 퐼1 ∖ 퐼0 ⊆ ℎ(휂). 

Using Claim1 and2, we can derive that 휃* satisfies the evc. Now define

′′ (︀ (︀ ′ ′ ′ )︀)︀ * 훽 := 훽 ∪ 훽 ∖ {퐵1, . . . , 퐵푛} 휃 , 휂′ := {퐻′ ∈ 휂′ | vars(퐻′휃*) ∩ ⃗푦 = ∅}.  Then using (EVOLVE), we get the non-full

휏 * = (︀훽* → ∃⃗푦휂*)︀ = VNF(︀훽′′ → ∃⃗푦 (︀휂 ∪ (휂′ ∖ 휂′ ))︀휃*)︀, ∃ ∃  휂′ ̸= ∅ and, if  , the full 휏 * = (훽* → 휂*) = VNF(︀훽′′ → 휂′ 휃*)︀.  * Let 휌 be the renaming used by VNF(·) for 휏∃ . Note that 휌 is the identity on ⃗푦 as 휏 is in variable normal form and 휃* is the identity on ⃗푦. Thus, 휌 is also the renaming used by * ′′ ′′ ′′ VNF(·) for 휏 . Let ℎ be the trigger defined by ℎ (푥푖) := 푐푖 and ℎ (푦푖) = ℎ(푦푖), and set ℎ* := ℎ′′ ∘ 훿 ∘ 휌−1.

′′ Claim 3. For any 푣 ∈ ⃗푥 ∪ ⃗푦, we have ℎ (푣휃) = ℎ(푣), and for any 푧푖 ∈ ⃗푧, we have ′′ ′ ℎ (푧푖휃) = ℎ (푧푖).

Subproof. Pick some 푥푖 ∈ ⃗푥 and assume ℎ(푥푖) = 푐푗. Then 푥푖휃 = 푥푗 by definition of 휃. ′′ ′′ ′′ ′′ Further, by definition of ℎ , ℎ (푥푗) = 푐푗. Hence, ℎ (푥푖휃) = ℎ (푥푗) = 푐푗 = ℎ(푥푖). ′′ ′′ Now pick some 푦푖 ∈ ⃗푦. Then 푦푖휃 = 푦푖, and by definition of ℎ , ℎ (푦푖) = ℎ(푦푖). Hence, ′′ ′′ ℎ (푦푖휃) = ℎ (푦푖) = ℎ(푦푖). ′ Lastly, pick some 푧푖 ∈ ⃗푧. The case ℎ (푧푖) = 푐푗 is analogous to the first case. In case ′ ℎ (푧푖) = 푑푗, there must be 푦푘 ∈ ⃗푦 with ℎ(푦푘) = 푑푗. Then 푧푖휃 = 푦푘 by definition of 휃. Hence, ′′ ′′ ′ ℎ (푧푖휃) = ℎ (푦푘) = ℎ(푦푘) = 푑푗 = ℎ (푧푖). 

* * Claim 4. ℎ (훽 ) ⊆ 퐼0.

41 4. Quantifier-Free Query Answering under GTGDs

Subproof. We have

* * ′′ −1 * ′′ ′′ ′′(︁(︀ (︀ ′ ′ ′ )︀)︀ * )︁ ℎ (훽 ) = ℎ ∘ 훿 ∘ 휌 (훽 ) = ℎ ∘ 훿(훽 ) = ℎ 훽 ∪ 훽 ∖ {퐵1, . . . , 퐵푛} 휃 훿 ′′(︁(︀ (︀ ′ ′ ′ )︀)︀ )︁ ′′(︁ (︀ ′ ′ ′ )︀ )︁ = ℎ 훽 ∪ 훽 ∖ {퐵1, . . . , 퐵푛} 휃 = ℎ 훽휃 ∪ 훽 ∖ {퐵1, . . . , 퐵푛} 휃

′′ ′′(︀(︀ ′ ′ ′ )︀ )︀ 퐶푙푎푖푚 3 ′(︀ ′ ′ ′ )︀ = ℎ (훽휃) ∪ ℎ 훽 ∖ {퐵1, . . . , 퐵푛} 휃 = ℎ(훽) ∪ ℎ 훽 ∖ {퐵1, . . . , 퐵푛} ⊆ 퐼0 ∪ (퐼1 ∖ ℎ(휂)) = 퐼0,

′ ′ ′ ′ where the last step follows from the choice of 퐵1, . . . , 퐵푛; that is, if ℎ (퐵 ) ∈ 퐼1 ∖ 퐼0 ⊆ ℎ(휂) ′ ′ ′ ′ ′ for some 퐵 ∈ 훽 , then 퐵 occurs in 퐵1, . . . , 퐵푛.  * * 2 2 Claim 5. ℎ (휂∃) = 퐹푣 ∖ 퐹푟 . Subproof. We have (︁ )︁ (︁ )︁ ℎ*(휂*) = ℎ′′ ∘ 훿 ∘ 휌−1(휂*) = ℎ′′ (︀휂 ∪ (휂′ ∖ 휂′ ))︀휃*훿 = ℎ′′ (︀휂 ∪ (휂′ ∖ 휂′ ))︀휃 ∃ ∃   = ℎ′′(휂휃) ∪ ℎ′′((휂′ ∖ 휂′ )휃) 퐶푙푎푖푚= 3 ℎ(휂) ∪ ℎ′(휂′ ∖ 휂′ ) = (퐼 ∖ 퐼 ) ∪ ℎ′(휂′ ∖ 휂′ ).   1 0  퐻′ ∈ 휂′ 퐻′ ∈ (휂′ ∖ 휂′ ) vars(퐻′휃*) ∩ ⃗푦 ̸= ∅ Now for any , we have  if and only if if and only if ′ ′ ⃗ ′ ′ 2 consts(ℎ (퐻 )) ∩ 푑 ̸= ∅ if and only if ℎ (퐻 ) ∈/ 퐹푟 , and hence

(퐼 ∖ 퐼 ) ∪ ℎ′(휂′ ∖ 휂′ ) = (퐼 ∖ 퐼 ) ∪ (︀(퐼 ∖ 퐼 ) ∖ 퐹 2)︀ = 퐹 2 ∖ 퐹 2. 1 0  1 0 2 1 푟 푣 푟  * * 2 Claim 6. ℎ (휂 ) = 퐹푟 ∖ 퐼0.

Subproof. We have

ℎ*(휂*) = ℎ′′ ∘ 훿 ∘ 휌−1(휂*) = ℎ′′(휂′ 휃*훿) = ℎ′′(휂′ 휃) 퐶푙푎푖푚= 3 ℎ′(휂′ ).    퐻′ ∈ 휂′ 퐻′ ∈ 휂′ vars(퐻′휃*) ∩ ⃗푦 = ∅ Now for any , we have  if and only if if and only if ′ ′ ⃗ ′ ′ 2 consts(ℎ (퐻 )) ∩ 푑 = ∅ if and only if ℎ (퐻 ) ∈ 퐹푟 , and hence

ℎ′(휂′ ) = (퐼 ∖ 퐼 ) ∩ 퐹 2 = 퐹 2 ∖ 퐼 .  2 1 푟 푟 0  * * 2 2 Using Claim4 to6, we derive that firing 휏∃ based on ℎ in 퐼0 creates 퐹푣 ∖ 퐹푟 and * * 2 firing 휏 based on ℎ in 퐼0 creates 퐹푟 ∖ 퐼0.

Proposition 4.4.10 (Completeness). GSat(Σ) is evolve-complete with respect to Σ.

Proof. To show Property 4.2.1.(a), we use Lemma 4.4.9.(a) to see that VNF(휏) ∈ GSat∃(Σ) for every full 휏 ∈ HNF(Σ), and hence, VNF(휏) ∈ GSat(Σ). For Property 4.2.2.(a), we additionally prove that there is a non-full 휏 ∈ GSat∃(Σ) such 푛 푛+1 푛+1 that firing 휏∃ in 퐹푟 creates 퐹푣 ∖ 퐹푟 . We prove the claim by induction on 푛. For 푛 = 0, we can simply set 휏∃ := 휏푟 using that 휏푟 is in head normal form. ′ ′ If 푛 > 0, by the inductive hypothesis, we get a chase 퐼0, . . . , 퐼푚 from 퐼0 and GSat(Σ) 푛 푛−1 푛 푛 of 퐹푟 and a non-full 휏∃ ∈ GSat∃(Σ) such that firing 휏∃ in 퐹푟 creates 퐹푣 ∖ 퐹푟 . Let

42 4.4. The Guarded Saturation

푛 ′′ ′′ 푛 ′′ 퐹푟 , 퐼1 , 퐼2 be the chase obtained by firing 휏∃ in 퐹푟 and 휏푛 in 퐼1 . Then we can apply 푛 ′′ ′′ ′ 푛+1 푛 Lemma 4.4.9.(b) on 퐹푟 , 퐼1 , 퐼2 to obtain a non-full 휏∃ ∈ GSat∃(Σ) and, if 퐹푟 ∖ 퐹푟 ̸= ∅, a ′ ′ 푛 푛+1 푛+1 ′ 푛 full 휏 ∈ GSat(Σ) such that firing 휏∃ in 퐹푟 creates 퐹푣 ∖ 퐹푟 and firing 휏 in 퐹푟 creates 푛+1 푛 ′ ′ 푛+1 퐹푟 ∖ 퐹푟 . Hence, we can fire 휏 in 퐼푚 to create 퐹푟 .

4.4.2. Complexity Analysis

In this section, we discuss the computational aspects and complexity of the algorithm. We first derive some useful properties that we can exploit in our computations.

Lemma 4.4.11. (EVOLVE) inferences are permutation independent of 푆, 푆′. More ′ ′ ′ precisely, for any 푆 = 퐻1, . . . , 퐻푛, 푆 = 퐵1, . . . , 퐵푛, and permutation 휎 ∈ 푆푛, where 푆푛 is the symmetric group with n elements, the rules derived using (EVOLVE) under 푆, 푆′ and ′ ′ 퐻휎(1), . . . , 퐻휎(푛), 퐵휎(1), . . . , 퐵휎(푛) coincide.

Proof. The evc does not consider the order of the sequences and the mgus of permuted sequences coincide.

′ Lemma 4.4.12. Assume we perform an (EVOLVE) inference on 휏, 휏 ∈ GSat∃(Σ). Then vars(퐵′휃) ∩ ⃗푦 ̸= ∅ for any 퐵′ in 푆′.

Proof. By Proposition 4.4.5, 휏 is in HNF. Using that 휃 is the identity on ⃗푦, we get that ′ ′ any atom in 휂휃 contains some 푦푖. As any 퐵 in 푆 unifies with some atom in 휂 using 휃, this implies vars(퐵′휃) ∩ ⃗푦 ̸= ∅.

′ Corollary 4.4.13. Assume we perform an (EVOLVE) inference on 휏, 휏 ∈ GSat∃(Σ). Then 퐵′ occurs in 푆′ if and only if vars(퐵′휃) ∩ ⃗푦 ̸= ∅.

Proof. Left to right is Lemma 4.4.12. Right to left follows from the evc.

Although we know that any guard 퐺′ of 휏 ′ must occur in 푆′ by Lemma 4.4.4, it is not yet clear how one determines the remaining atoms of 푆, 푆′ needed for an (EVOLVE) inference. The next proposition (and its proof) will be a useful helper in this regard.

Proposition 4.4.14. Assume we want to use (EVOLVE) on 휏 = 훽(⃗푥) → ∃⃗푦휂 and ′ ′ ′ ′ ′ 휏 = 훽 → 휂 in GSat∃(Σ) while unifying the guard 퐺 ∈ 훽 with some 퐻 ∈ 휂. Then there exists at most one sequence 푆′ (up to permutation) such that we can satisfy all restrictions imposed on (EVOLVE).

Proof. Assume some 푆, 푆′, 휃 satisfy all restrictions for an (EVOLVE) inference and 휃 ′ unifies 퐺 with 퐻. Due to Lemma 4.4.11, we can assume that 푆 = 퐻, 퐻2, . . . , 퐻푛 and ′ ′ ′ ′ ′ 푆 = 퐺 , 퐵2, . . . , 퐵푛. Since 퐻휃 = 퐺 휃 and 휃 is the identity on ⃗푦, there must be an mgu of 퐺′ and 퐻 that is the identity on ⃗푦 (by treating ⃗푦 as constants). Fix such an mgu, call it 휃*. Then there is 훿 such that 휃 = 휃*훿, and by Lemma 4.4.8, 훿 must be the identity on ⃗푦.

Claim 1. For any 푣 ∈ vars(훽′), we have 푣휃 ∈ ⃗푦 if and only if 푣휃* ∈ ⃗푦.

43 4. Quantifier-Free Query Answering under GTGDs

Subproof. Since 퐺′ guards 휏 ′, we have 푣휃* ∈ vars(퐺′휃*) = vars(퐻휃*) ⊆ ⃗푥휃* ∪ ⃗푦. Now if 푣휃* ∈ ⃗푦, then also 푣휃*훿 = 푣휃 ∈ ⃗푦 as 훿 is the identity on ⃗푦. If 푣휃* ∈ ⃗푥휃′, then * * * 푣휃 훿 ∈ ⃗푥휃 훿 = ⃗푥휃, and as ⃗푥휃 ∩ ⃗푦 = ∅, this implies 푣휃 훿 = 푣휃∈ / ⃗푦.  By Corollary 4.4.13, 퐵′ occurs in 푆′ if and only if vars(퐵′휃) ∩ ⃗푦 ̸= ∅. Thus, by Claim1, we can exactly obtain 푆′ (up to permutation) by selecting all atoms 퐵′ ∈ 훽′ with vars(퐵′휃*) ∩ ⃗푦 ̸= ∅. Now 휃* was chosen arbitrarily, and hence, the choice of 푆′ is unique once we fix 퐺′ and 퐻.

Let us continue with the complexity of our saturation algorithm. In the following, let 푛 be the number of predicate symbols in Σ, 푎 be the maximum arity of any predicate in Σ, 푤푏 := bwidth(HNF(Σ)), and 푤ℎ := hwidth(HNF(Σ)). Lemma 4.4.15. Any TGD 휏 in variable normal form contains at most 푛(bwidth(휏)푎 + hwidth(휏)푎) atoms.

Proof. For any fixed predicate symbol, we have at most bwidth(휏)푎 and hwidth(휏)푎 different instantiations in the body and head of 휏, respectively.

푛(푤푎+푤푎) Corollary 4.4.16. |GSat∃(Σ)| ≤ 2 푏 ℎ .

Proof. By Proposition 4.4.5, GSat∃(Σ) is a set of GTGDs in variable normal form with bwidth(휏) ≤ 푤푏 and hwidth(휏) ≤ 푤ℎ. A TGD consists of a subset of all possible body and head atoms. Now the claim follows by Lemma 4.4.15. Corollary 4.4.17. The guarded saturation GSat(Σ) is an atomic rewriting of Σ.

Proof. By Corollary 4.2.6 and 4.4.7 and Proposition 4.4.10.

′ Lemma 4.4.18. Let 휏, 휏 ∈ GSat∃(Σ). Then all (EVOLVE)-resolvents of 휏 = 훽 → 휂, ′ ′ ′ 휏 = 훽 → 휂 can be obtained in PTIME for bounded 푤푏, 푛, 푎, EXPTIME for bounded 푎, and 2EXPTIME otherwise.

Proof. Let 퐺′ ∈ 훽′ be a guard. Using our observations in Proposition 4.4.14, we first check for any 퐻 ∈ 휂, whether there is an mgu 휃* of 퐺′ and 퐻 that is the identity on ⃗푦 (by treating ′ ′ ′ ′ ′ * ⃗푦 as constants). We then select all atoms 퐵2, . . . , 퐵푚 ∈ 훽 ∖ {퐺 } with vars(퐵푖휃 ) ∩ ⃗푦 ̸= ∅. These first steps can be done in linear time using Lemma 2.1.13. Also note that the ′ ′ ′ ′ ′ exact choice of 퐺 and the order of 푆 := 퐺 , 퐵2, . . . , 퐵푚 are irrelevant by Lemmas 4.4.4 and 4.4.11. ′ ′ Now for any 퐵푖 ̸= 퐺 , we have at most |휂| possible counterparts in 휂. We check for any possible combination if we can extend 휃* to an mgu 휃 of the sequences that is the identity on ⃗푦 (again, by treating ⃗푦 as constants) and ⃗푥휃 ∩ ⃗푦 = ∅. Finally, we resolve the rules and normalise the result to obtain a new rule. Using Lemmas 2.1.13 and 4.1.3 and Definition 4.1.4, these steps can again be done in linear time. Since 휏 contains at 푎 ′ ′ 푎 most 푠 := 푛푤푏 body atoms and 휏 at most 푠 := 푛푤ℎ head atoms, we have to repeat above 푠 steps at most 푠′ times. Hence, we obtain the claimed complexity bounds. Lemma 4.4.19. The guarded saturation algorithm terminates in PTIME for bounded 푤, 푛, 푎, EXPTIME for bounded 푎, and 2EXPTIME otherwise.

44 4.4. The Guarded Saturation

Proof. The initial variable and head normal form transformations can be done in linear time by Lemma 4.1.3 and Definition 4.1.4. The algorithm then takes all pairs in GSat∃(Σ) and performs all (EVOLVE)-inferences. By Corollary 4.4.16, this process takes at most 푐 := 푛(푤푎+푤푎) 2 2 푏 ℎ iterations, and at any point, there are at most 풪(푐 ) pairs. By Lemma 4.4.18, the complexity for each 휏, 휏 ′ is at most as high as the stated bounds. Thus, we obtain the claimed complexity bounds.

Finally, we can combine our results to obtain a decision procedure. Theorem 4.4.20. The guarded saturation algorithm provides a decision procedure for the Quantifier-Free Query Answering Problem under GTGDs. The procedure terminates in PTIME for fixed Σ, EXPTIME for bounded 푎, and 2EXPTIME otherwise.

Proof. By Proposition 4.4.5, the number of variables for each 휏 ∈ GSat(Σ) is bounded by 푤푏. Now the claim follows from Proposition 2.5.1, Corollary 4.4.17, and Lemma 4.4.19.

Moreover, our results match the known lower bounds from the literature, and hence, our algorithm is theoretically optimal. In fact, the lower bounds already hold for a simpler problem setting in which only atomic queries are considered:

Theorem 4.4.21 ([Calì et al., 2013]). Given a set of GTGDs Σ, the Atomic Query An- swering Problem under Σ is PTIME-complete for fixed Σ, EXPTIME-complete for bounded 푎, and 2EXPTIME-complete in general. It remains 2EXPTIME-complete even if 푛 is bounded.

Remark 4.4.22. Calì et al.[2013] in fact state that the Atomic Query Answering Problem already lowers to PTIME in case of bounded 푎 and 푛; however, they only sketch this result for the case of single-headed GTGDs. Later on, they extend their results to the case of multiple head atoms, but neglect the case of bounded 푎 and 푛. In particular, we believe that the number of clouds, as described in their paper, is still exponential in case of bounded 푎 and 푛 due to the unrestricted number of existential variables. We hence do not include this result, but leave it as an open question whether the problem indeed lowers to PTIME in this case.

45

5. Quantifier-Free Query Answering under DisGTGDs

In this chapter, we extend the results that we derived for GTGDs to DisGTGDs. The structure of this chapter follows that of the previous one: we first introduce a property that will imply completeness and then present a sound resolution-based algorithm that satisfies this property.

5.1. An Abstract Property for Completeness

Recall that a DisTGD is of the form 푛 ⋁︁ 훽(⃗푥) → ∃⃗푦푖 휂푖(⃗푥,⃗푦푖). 푖=1

We might first want to generalise our head normal form definition to DisTGDs to minimise the number of body variables occurring in an existential head conjunction; however, due to the introduction of disjunctions, we might lose completeness by naively splitting the head into a non-full and full part like we did for TGDs.

Example 5.1.1. Consider the DisTGDs Σ

퐵(푥) → ∃푦(︀푃 (푥, 푦) ∧ 푅(푥))︀ ∨ ∃푦(︀푆(푥, 푦) ∧ 푇 (푥))︀,

푆(푥1, 푥2) ∧ 푇 (푥1) → 푅(푥1) and the set of rules Σ′ obtained by naively generalising our head normal form

퐵(푥) → ∃푦 푃 (푥, 푦) ∨ ∃푦 푆(푥, 푦), 퐵(푥) → 푅(푥) ∨ 푇 (푥),

푆(푥1, 푥2) ∧ 푇 (푥1) → 푅(푥1).

Let 풟 := {퐵(푐)}. Then 풟, Σ |= 푅(푐) but 풟, Σ′ ̸|= 푅(푐) since, for example, the structure ℳ := {퐵(푐), 푃 (푐, 푑), 푇 (푐)} models 풟, Σ′ but not 푅(푐). Note, however, that all rules in Σ′ are consequences of rules in Σ. That is, Σ′ is a sound rewriting.

We thus do not impose an analogous head normal form on DisTGDs. Instead, we introduce the notion of immediate full consequences.

47 5. Quantifier-Free Query Answering under DisGTGDs

Definition 5.1.2 (DisTGD immediate full consequences). Given a DisTGD

푛 ⋁︁ 휏 = 훽(⃗푥) → ∃⃗푦푖휂푖(⃗푥,⃗푦푖), 푖=1

′ ′ let 휂푖 consist of all atoms 퐻 ∈ 휂푖 that contain no 푦푖, and suppose that each 휂푖 is non-empty. ⋁︀푛 ′ Then the immediate full consequence of 휏 is defined by IFC(휏) := 훽(⃗푥) → 푖=1 휂푖(⃗푥). Given a set of DisTGDs Σ, we write IFC(Σ) for the set of immediate full consequences following from all rules in Σ. Example 5.1.3. The immediate full consequence of (︀ )︀ 퐵(푥) → ∃푦 퐻1(푥, 푦) ∧ 퐻2(푥) ∧ 퐻3(푥) ∨ 퐻4(푥) is (︀ )︀ 퐵(푥) → 퐻2(푥) ∧ 퐻3(푥) ∨ 퐻4(푥). Lemma 5.1.4. Let Σ be a set of DisTGDs. Then IFC(Σ) is sound with respect to Σ. (︀ ⋁︀푛 )︀ Proof. Let 휏 = 훽(⃗푥) → 푖=1 ∃⃗푦푖 휂푖(⃗푥,⃗푦푖) ∈ Σ such that IFC(휏) exists, and let 풜 be a structure with 풜 |= 휏. We have to show that 풜 |= IFC(휏). Assume 풜 |= 훽(⃗푐) for some ⃗ ⃗푐 ⊆ dom(풜). Then 풜 |= ∃⃗푦푖 휂푖(⃗푐,⃗푦푖) for some 휂푖. Hence, there is 푑 ⊆ dom(풜) such that ⃗ ′ ′ 풜 |= 퐻(⃗푐, 푑 ) for any 퐻 ∈ 휂푖. Thus, 풜 |= 휂푖(⃗푐), where 휂푖 consist of all atoms 퐻 ∈ 휂푖 that ′ contain no 푦푖, and hence 풜 |= 휂푖(⃗푐). Consequently, 풜 |= IFC(휏). As we did for GTGDs, we now want to derive a set of properties that will imply completeness of a rewriting with respect to some set of DisGTGDs. To achieve this, we generalise our previous thoughts from tree-like chase sequences to tree-like chase trees. Recall the structure of a tree-like chase tree from some database 풟 and set of DisGTGDs Σ. A tree-like chase tree 푇 models some UBCQ 푄 if and only if all the facts of some 푄푖 ∈ 푄 are contained in the root node 푟 of every leaf of 푇 . To obtain sufficient completeness properties for a full set of DisTGDs Σ′, we can again separate our concerns into two cases: either some rule fired in 푟 creates new facts relevant to 푟 in every new node of 푇 , or a 푖 non-full rule fired in 푟 creates new children 푣1, . . . , 푣푛 of 푟 and there is a chase tree 푇 푖 from any child 푣푖 such that 푟 adds on a fact in every leaf of 푇 . The first case is again rather simple: we just let Σ′ contain all immediate full consequences of Σ. For the second case, it again suffices to consider a simpler situation in which each 푇 푖 is created by firing full rules only. By an inductive argument (Lemma 5.1.7), we can then again lift the property for this situation to general chase trees. The following definition will be quite useful for the formal definition of our abstract properties and our upcoming proofs. Definition 5.1.5 (Query of a chase tree). Given a chase tree 푇 rooted at 푁 with node 푟, 푇 the query of 푇 is defined by 푄 := lvs(푇 )푟.

Definition 5.1.6 (Evolve-completeness). Let Σ be a set of DisGTGDs, and Σ′ be a set of full DisTGDs. We say that Σ′ is evolve-complete with respect to Σ if it satisfies the following properties:

48 5.1. An Abstract Property for Completeness

(a) For every 휏 ∈ IFC(Σ), we have 휏 ∈ Σ′ (modulo renaming). (b) Let 푇 be a chase tree with root 푁, let 푟 be the node inside 푁, let 푁 1, . . . , 푁 푛+푚 be the children of 푁, and 휏푟 ∈ Σ be non-full such that

(i) 휏푟 is the first rule fired in 푇 , (ii) the first 푛 head conjunctions of 휏푟 are non-full, 푁 푖 ′ (iii) for 1 ≤ 푖 ≤ 푛, 푇 was created by firing full rules from Σ in 푣푖 only, where 푣푖 is the new child of 푟 in 푁 푖, (iv) 푁 푛+1, . . . , 푁 푛+푚 have no subtrees. Then there exists a chase tree 푇 ′ from 푁 and Σ′ with 푇 ′ |= 푄푇 .

In the following proofs, we will frequently assume that whenever a node 푁 푖 in 푇 is 푖 created by a non-full head conjunction, its subtree 푇 푁 is fully expanded before expanding the subtree of any sibling 푁 푛+1, . . . , 푁 푛+푚 that got created by a full head conjunction. This can be done without loss of generality since the final structure of 푇 does not depend on the order used to expand the subtrees. We again show that any evolve-complete set is in fact complete by first proving a key lifting lemma that relies on the one-pass chase property. Lemma 5.1.7. Let Σ be a set of DisGTGDs, let Σ′ be evolve-complete with respect to Σ, let 푇 be a one-pass chase tree from Σ, let 푁 be a node in 푇 , let 푝 be a node inside 푁, let 1 푛+푚 푁 , . . . , 푁 be the children of 푁, and 휏푝 ∈ Σ be non-full such that

(i) 휏푝 was fired in node 푝 of 푁, (ii) the first 푛 head conjunctions of 휏푝 are non-full, 푁 푖 (iii) for 1 ≤ 푖 ≤ 푛, all leaves of 푇 added on facts in 푣푖, where 푣푖 is the new child of 푝 in 푁 푖, (iv) 푁 푛+1, . . . , 푁 푛+푚 have no subtrees.

′ ′ ′ (︀ 푁 )︀ Then there exists a chase tree 푇 from 푁푝 and Σ such that 푇 |= lvs 푇 푝. Proof. We can assume that whenever a non-full rule is fired to create 푇 , every subtree of a non-full head conjunction is fully expanded first. We additionally show for 1 ≤ 푖 ≤ 푛, 푖 푖 ′ 푖 (︀ 푁 푖 )︀ that there exists a chase tree 푇 from 푁 and Σ with 푇 |= lvs 푇 . We prove the 푣푖 푣푖 claim by induction on the number of steps used to create 푇 푁푝 , denoted by 푘. In case ′ 1 푚 푘 = 1, we initialise 푇 with the single node 푁푝, and if 푁 , . . . , 푁 added on facts in 푝, we ′ additionally obtain IFC(휏푝) ∈ Σ by Property 5.1.6.(a) that we then fire in 푁푝. 푁 푗 In case 푘 > 1, let 휏 ∈ Σ be the rule that got fired in step 푘 in a leaf 퐿 of 푇푘−1 for some 푗 ∈ {1, . . . , 푛}. Due to one-pass, we can assume that the new facts relevant to 푣푗 are generated by acting on some node 푑 that is a non-strict descendant of 푣푗. 푖 First consider the case 푣푗 = 푑. By the inductive hypothesis, we get a chase tree 푇 푖 ′ 푖 (︀ 푁 푖 )︀ from 푁 and Σ with 푇 |= lvs 푇 for 1 ≤ 푖 ≤ 푛. Since all new leaves added on facts 푣푖 푘−1 푣푖 in 푣푗, there is IFC(휏) ∈ Σ creating the new facts relevant to 푣푗, and by Property 5.1.6.(a), ′ ′ 푗 ′ IFC(휏) ∈ Σ . Hence, we can fire IFC(휏) in any leaf 퐿 of 푇 with 퐿 |= 퐿푣푗 to obtain a 푗 ′ 푗 ′ 푗 ′ (︀ 푁 푗 )︀ ′ chase tree (푇 ) from 푁푣 and Σ with (푇 ) |= lvs 푇 . Now we obtain the desired 푇 푗 푘 푣푗 by using Property 5.1.6.(b).

49 5. Quantifier-Free Query Answering under DisGTGDs

Next consider the case where 푑 is a descendant of 푣푗. Let 푐 be the first child of 푣푗 on ′ (︀ 푁 ′)︀ the path to 푑, created by firing a non-full rule in node 푁 . Set 푇푁 ′ := cut 푇푘−1, 푁 . By 푖 푖 ′ 푖 (︀ 푁 푖 )︀ the inductive hypothesis, we get a chase tree 푇 from 푁 and Σ with 푇 |= lvs 푇 ′ 푣푖 푁 푣푖 ′ ′ for 1 ≤ 푖 ≤ 푛. Let (푁 ′)1,..., (푁 ′)푛 +푚 be the children of 푁 ′, where the first 푛′ nodes are created by non-full head conjunctions. Also by the inductive hypothesis, we get a ′ ′ ′ ′ (︀ 푁 ′ )︀ chase tree 푇 from 푁 and Σ with 푇 |= lvs 푇 , where we used the assumption 푣푗 푘 푣푗 that subtrees rooted at nodes created by a non-full head conjunction are fully expanded ′ ′ 푗 ′ ′ 푗 ′ first. Hence, by attaching 푇 to any leaf 퐿 of 푇 with 퐿 |= 푁푣푗 , we obtain (푇 ) with 푗 ′ (︀ 푁 푗 )︀ (푇 ) |= lvs 푇 . Finally, we again use Property 5.1.6.(b) to obtain the desired chase 푘 푣푗 ′ tree from 푁푝 and Σ .

The completeness property again follows by a simple induction.

Proposition 5.1.8 (Completeness). Let Σ be a set of DisGTGDs and Σ′ be evolve-complete with respect to Σ. Then Σ′ is complete with respect to Σ.

Proof. By Theorems 2.6.2 and 3.2.4, it suffices to show that whenever there is a one-pass chase tree 푇 from 풟 and Σ for a quantifier-free UBCQ 푄, then 푄 also follows from 풟 and Σ′. Let 푟 be the single node of the root of 푇 and 퐿1, . . . , 퐿푚 be the leaves of 푇 . Since 푖 푖 푇 |= 푄 if and only if 퐿푟 |= 푄푗푖 for each 퐿 and some 푄푗푖 ∈ 푄, it suffices to show that we can create a chase tree 푇 ′ from 풟 and Σ′ with 푇 ′ |= 푄푇 . We can again assume that subtrees rooted at nodes created by a non-full head conjunction are fully expanded first. We show the claim by induction on the number of steps fired to create 푇 , denoted by 푛. In case 푛 = 0, we just set 푇 ′ := 푇 . ′ ′ 푇푛−1 Assume 푛 > 0. By the inductive hypothesis, we get 푇푛−1 such that 푇푛−1 |= 푄 . We ′ ′ 푇푛 have to create 푇푛 such that 푇푛 |= 푄 . Let 휏 ∈ Σ be the rule firing at step 푛 in some node 푑 of some 퐿 ∈ lvs(푇푛−1), and assume that 푟 adds on a fact in every new leaf; otherwise, ′ ′ we can simply set 푇푛 := 푇푛−1. If 푑 = 푟, then the new facts relevant to 푟 can be created by IFC(휏). By Property 5.1.6.(a), ′ ′ ′ ′ ′ IFC(휏) ∈ Σ , and thus, we can fire IFC(휏) in any leaf 퐿 of 푇푛−1 with 퐿 |= 퐿푟 to obtain 푇푛 ′ 푇푛 with 푇푛 |= 푄 . Otherwise, let 푐 be the first child of 푟 on the path to 푑, created by firing a rule in node 푁 with children 푁 1, . . . , 푁 푘+푙, where the first 푘 nodes are created by non-full head ′′ ′ ′′ (︀ 푁 )︀ conjunctions. By Lemma 5.1.7, there is a chase tree 푇 from 푁푟 and Σ with 푇 |= lvs 푇푛 푟, where we used the assumption that subtrees rooted at nodes created by a non-full head conjunction are fully expanded first. Further, by the inductive hypothesis, we get 푇 ′ with ′ cut(푇푛−1,푁) ′′ ′ ′ ′ ′ 푇 |= 푄 . Thus, by attaching 푇 to any leaf 퐿 of 푇 with 퐿 |= 푁푟, we obtain 푇푛 ′ 푇푛 with 푇푛 |= 푄 .

Corollary 5.1.9. Let Σ be a set of DisGTGDs and Σ′ be finite, sound, and evolve-complete with respect to Σ. Then Σ′ is an atomic rewriting of Σ.

Proof. Σ′ is full by Definition 5.1.6, finite and sound by assumption, complete by Proposi- tion 5.1.8, and hence an atomic rewriting by Lemma 2.4.7.

50 5.2. The Disjunctive Guarded Saturation

5.2. The Disjunctive Guarded Saturation

In this section, we present a procedure for DisGTGDs that returns an atomic rewriting. Of course, we want to reuse some ideas established for our previous rewriting algorithms. Unfortunately, simply generalising our guarded saturation from GTGDs to DisGTGDs leads to a combinatorial explosion: Example 5.2.1. Given the DisGTGDs (︀ )︀ 휏1 = 푅(푥1) → ∃푦 푆(푥1, 푦) ∧ 푇 (푥1) ∨ 푈(푥1),

휏2 = 푆(푥1, 푥2) → 푉 (푥1) ∨ 푊 (푥2), the composition of 휏1, 휏2 amounts to (︀ )︀ 푅(푥1) → ∃푦 푆(푥1, 푦) ∧ 푇 (푥1) ∧ (푉 (푥1) ∨ 푊 (푦)) ∨ 푈(푥1) (︀ )︀ (︀ )︀ ≡ 푅(푥1) → ∃푦 푆(푥1, 푦) ∧ 푇 (푥1) ∧ 푉 (푥1) ∨ ∃푦 푆(푥1, 푦) ∧ 푇 (푥1) ∧ 푊 (푦) ∨ 푈(푥1).

It is evident that the head of a derived rule can grow explosively. In general, the head of a rule can consist of any possible boolean combination of atoms (in disjunctive normal form). Using Lemma 4.4.15, the number of different atoms using 푤 variables is bounded by 푠 := 2푛푤푎, where 푛 is the number of predicate symbols and 푎 the maximum arity of any predicate. The head of a DisGTGD consists of a set of head conjunctions, which in 푠 turn consist of a set of atoms; consequently, this results in a triple exponential 22 bound on the number of different heads. To avoid this explosion, we have to introduce two new concepts. The first is given by another normal form transformation. Definition 5.2.2 (Single head normal form). We say that a DisTGD 휏 is in single head normal form or single-headed if every head conjunction of 휏 consists of exactly one atom. Given a set of DisTGDs Σ, the single head normal form SHNF(Σ) of Σ is obtained by replacing every rule 푛 푛 ⋁︁ ⋀︁푖 훽(⃗푥) → ∃⃗푦푖 퐻푖,푗(⃗푥,⃗푦푖) 푖=1 푗=1 in Σ by 푛 ⋁︁ 훽(⃗푥) → ∃⃗푦푖 푅푖(⃗푥푖, ⃗푦푖), 푖=1 and, for any 1 ≤ 푖 ≤ 푛,

푅푖(⃗푥푖, ⃗푦푖) → 퐻푖,1(⃗푥푖, ⃗푦푖), . .

푅푖(⃗푥푖, ⃗푦푖) → 퐻푖,푛푖 (⃗푥푖, ⃗푦푖), where 푅푖 ∈/ ℛ is a fresh relation symbol and ⃗푦푖 might be empty. This replacement can be done in time linear in size(Σ).

51 5. Quantifier-Free Query Answering under DisGTGDs

Example 5.2.3. The single head normal form of (︀ )︀ 퐵(푥1, 푥2) → ∃푦1, 푦2 퐻1(푦1, 푥1, 푦2) ∧ 퐻2(푦2) ∨ ∃푦1 퐻3(푦1, 푥2) is

퐵(푥1, 푥2) → ∃푦1, 푦2 푅1(푥1, 푦1, 푦2) ∨ ∃푦1 푅2(푥2, 푦1),

푅1(푥1, 푦1, 푦2) → 퐻1(푦1, 푥1, 푦2),

푅1(푥1, 푦1, 푦2) → 퐻2(푦2),

푅2(푥2, 푦1) → 퐻3(푦1, 푥2).

Lemma 5.2.4. For any database 풟, set of DisTGDs Σ, and UBCQ 푄, we have 풟, Σ |= 푄 if and only if 풟, SHNF(Σ) |= 푄.

Proof. If we fire a rule 휏 in a chase using Σ, we can simply fire all rules in SHNF(휏) to create the corresponding facts in a chase using SHNF(Σ). Conversely, if we fire a rule 휏 ∈ SHNF(Σ) ∖ Σ in a chase using SHNF(Σ), we can fire the rule 휏 ′ ∈ Σ that produced 휏 to create all corresponding facts that use relations in ℛ in a chase using Σ. All other facts with relations not in ℛ produced by 휏 can only trigger further rules in SHNF(휏 ′) that create facts which have then already been added in the chase using Σ. Since, by definition, 푄 only uses relations in ℛ, this finishes the proof.

⋁︀푛 In the following, given a single-headed DisTGD 휏 = 훽(⃗푥) → 휂 with 휂 = 푖=1 ⃗푦푖 퐻푖(⃗푥,⃗푦푖), instead of treating 휂 as the set {{퐻푖} | 1 ≤ 푖 ≤ 푛}, we treat 휂 as the set {퐻푖 | 1 ≤ 푖 ≤ 푛}. The single head normal form transformation alone is not yet enough as our previous (EVOLVE)-rule collects all head atoms of both resolved rules and hence does not preserve single-headedness. As a running example, take the single-headed DisGTGDs

휏1 = 푅(푥) → ∃푦 푃 (푥, 푦),

휏2 = 푃 (푥1, 푥2) → 푆(푥1, 푥2),

휏3 = 푃 (푥1, 푥2) ∧ 푆(푥1, 푥2) → 푇 (푥1).

′ (︀ )︀ Then composing 휏1, 휏2 results in 휏1 = 푅(푥) → ∃푦 푃 (푥, 푦) ∧ 푆(푥, 푦) , which then can be ′ ′ composed with 휏3 to obtain 휏2 = 푅(푥) → 푇 (푥). But unfortunately, 휏1 is not single-headed. If we want to preserve single-headedness, we can no longer collect all derivable non-full head atoms in a single rule. Instead, when composing 휏1, 휏2, let us try to just derive the * ′ single-headed 휏1 = 푅(푥) → ∃푦 푆(푥, 푦). It is then not possible, however, to derive 휏2 by * * composing 휏3 with 휏1 and 휏1 . We lost the information that 휏1 and 휏1 can make use of the same witnesses for their existential variables. To prevent this loss of information, we introduce the Skolem normal form of DisGTGDs (see also Section 2.1).

⋁︀푛 Definition 5.2.5 (Skolemisation). Given a DisTGD 휏 = 훽(⃗푥) → 푖=1 ∃⃗푦푖 휂푖(⃗푥,⃗푦푖), the Skolem normal form Sk(휏) of 휏 is obtained by replacing every existential 푦푗 ∈ ⃗푦푖 by a term 푓푖,푗(⃗푥), where 푓푖,푗 is a fresh function symbol. An atom that contains a function is called

52 5.2. The Disjunctive Guarded Saturation functional, and a rule is called functional if it contains a functional atom. The definition is lifted to set of DisTGDs by setting Sk(Σ) := {Sk(휏) | 휏 ∈ Σ}. Example 5.2.6. The Skolemisation of (︀ )︀ 퐵(푥1, 푥2) → ∃푦1, 푦2 퐻1(푦1, 푥1, 푦2) ∧ 퐻2(푦2) ∨ ∃푦1 퐻3(푦1, 푥2) is (︀ )︀ 퐵(푥1, 푥2) → 퐻1(푓1,1(푥1, 푥2), 푥1, 푓1,2(푥1, 푥2)) ∧ 퐻2(푓1,2(푥1, 푥2)) ∨ 퐻3(푓2,1(푥1, 푥2), 푥2).

Note that the Skolemisation of DisTGDs only creates atoms of a simple form. Definition 5.2.7 (Shallow term). A term 푡 is called shallow if 푡 contains no nested function. Definition 5.2.8 (Simple atom/fact/set). An atom 푅 is called simple if 푅 only contains shallow terms. If 푅 is also ground, we call it a simple fact. A set 푆 is called simple if every 푅 ∈ 푆 is a simple fact. Given a simple set 푆, we define 푆+ := {푅 ∈ 푆 | R is functional} and 푆− := {푅 ∈ 푆 | R is non-functional}. Given a family of simple sets 푈, we lift this definition to 푈+ := {푆+ | 푆 ∈ 푈} and 푈− := {푆− | 푆 ∈ 푈}. Coming back to our running example, we now start with a Skolemised set of rules

휏1 = 푅(푥) → 푃 (푥, 푓(푥)),

휏2 = 푃 (푥1, 푥2) → 푆(푥1, 푥2),

휏3 = 푃 (푥1, 푥2) ∧ 푆(푥1, 푥2) → 푇 (푥1).

′ Then composing 휏1 and 휏2 results in 휏1 = 푅(푥) → 푆(푥, 푓(푥)), and composing 휏1 with 휏3 ′ ′ results in 휏2 = 푅(푥) ∧ 푆(푥, 푓(푥)) → 푇 (푥). The functional premise 푆(푥, 푓(푥)) in 휏2 can ′ ′ ′ then be discharged by composing 휏1 and 휏2 to obtain 휏3 = 푅(푥) → 푇 (푥). This process gives the general idea of the upcoming algorithm, which will be described shortly. But first, we examine which kind of rules we might obtain during this process. Since DisTGDs are function-free by definition, we are not dealing with pure DisTGDs anymore; instead, we will consider rules of a more general form, namely

(︃ 푛 )︃ ⋁︁ ∀⃗푥 훽(⃗푥) → 휂푖(⃗푥) , 푖=1 where 훽 and each 휂푖 are conjunctions of simple atoms in which every functional term contains all variables of the rule, and further, 훽 contains a function-free guard. We refer to such rules as guarded simple rules. Note that the Skolemisation of non-full DisGTGDs only produces guarded simple rules with a non-functional body and functional head. Conversely, a guarded simple rule with non-functional body and functional head corresponds to a non-full DisGTGD via “de- Skolemisation”. For this reason, we call such rules non-full. Similarly, we call a function-free guarded simple rule full as they correspond to full DisGTGDs. We now extend some of our definitions to guarded simple rules.

53 5. Quantifier-Free Query Answering under DisGTGDs

⋁︀푛 Definition 5.2.9 (Variable normal form). Given a guarded simple rule 휏 = 훽 → 푖=1 휂푖, 푡ℎ we say that 휏 is in variable normal form if 푥푖 is the 푖 lexicographic variable occurring in 훽. We write VNF(휏) and VNF(Σ) for the variable normal form of some guarded simple rule 휏 and set of guarded simple rules Σ, respectively.

Example 5.2.10. The rule [︀ (︀ )︀ ]︀ ∀푥3, 푦2 퐵(푦2, 푥3) → 퐻1(푓(푥3, 푦2), 푦2, 푥3) ∧ 퐻2(푥3) ∨ 퐻3(푦2, 푔(푥3, 푦2)) is not in variable normal form. Its variable normal form is [︀ (︀ )︀ ]︀ ∀푥1, 푥2 퐵(푥1, 푥2) → 퐻1(푓(푥2, 푥1), 푥1, 푥2) ∧ 퐻2(푥2) ∨ 퐻3(푥1, 푔(푥2, 푥1)) .

Lemma 5.2.11. Any guarded simple rule 휏 can be rewritten into VNF(휏) in linear time.

Proof. Simply extend Lemma 4.1.3 to multiple head conjunctions.

Definition 5.2.12. Given a DisTGD (or a guarded simple rule) 휏 with body 훽 and head 휂 = {휂1, . . . , 휂푛}, we define

(a) the body width of 휏 as bwidth(휏) := |vars(훽)|, th (b) the 푖 head width of 휏 as hwidth푖(휏) := |vars(휂푖)|, (c) the head width of 휏 as hwidth(휏) := max{hwidth푖(휏)}, and (d) the width of 휏 as width(휏) := max{bwidth(휏), hwidth(휏)}.

These definitions are naturally extended to sets of DisTGDs (or guarded simple rules). Lemma 5.2.13. For any guarded simple rule 휏, we have width(휏) = bwidth(휏).

Proof. Any variable in the head of 휏 occurs in the guard of 휏.

Definition 5.2.14 (Immediate full consequences). Given a guarded simple rule 휏 with non- functional body, the immediate full consequence of 휏 is obtained by taking the immediate full consequence of the DisGTGD corresponding to 휏 via de-Skolemisation.

Example 5.2.15. The immediate full consequence of (︀ )︀ 퐵(푥) → 퐻1(푥, 푓(푥)) ∧ 퐻2(푥) ∧ 퐻3(푥) ∨ 퐻4(푥) is (︀ )︀ 퐵(푥) → 퐻2(푥) ∧ 퐻3(푥) ∨ 퐻4(푥). Lemma 5.2.16. Let Σ be a set of guarded simple rules. Then IFC(Σ) is sound with respect to Σ.

Proof. Analagous to Lemma 5.1.4.

Finally, we are ready to define our rewriting algorithm.

Definition 5.2.17 (Disjunctive guarded saturation). We define the following inference rule on single-headed guarded simple rules:

54 5.2. The Disjunctive Guarded Saturation

• (D-EVOLVE). Assume we have some non-full 휏 = 훽 → 휂, a functional atom 퐻 ∈ 휂, a rule 휏 ′ = 훽′ → 휂′, and some 퐵′ ∈ 훽′ such that (a) 휏 ′ is full and 퐵′ is a guard, or (b) 퐵′ is functional, and assume that 휏, 휏 ′ use distinct variables (otherwise, perform a renaming). Sup- pose 휃 is an mgu of 퐻 and 퐵′ and define

훽′′ := (︀훽 ∪ (︀훽′ ∖ {퐵′})︀)︀휃, 휂′′ := (︀(︀휂 ∖ {퐻})︀ ∪ 휂′)︀휃.

We then add the rule VNF(훽′′ → 휂′′).

Given a set of DisGTGDs Σ, let DGSat∃(Σ) be the closure of VNF(Sk(SHNF(Σ))) under the given inference rule. The disjunctive guarded saturation is then defined as DGSat(Σ) := IFC(DGSat∃(Σ)). Before we verify the correctness of the saturation, we show that all derived rules are of a common form.

Proposition 5.2.18. Any 휎 ∈ DGSat∃(Σ) is a single-headed guarded simple rule in variable normal form and width(휎) ≤ bwidth(SHNF(Σ)). Proof. We prove the claim inductively. Clearly, all rules in VNF(Sk(SHNF(Σ))) satisfy the conditions, and all derived rules will be in variable normal form due to the final VNF(·) application. Assume we create 휎 = VNF(훽′′ → 휂′′) by (D-EVOLVE) from some non-full 휏 = 훽 → 휂 ′ ′ ′ ′ and some guarded simple 휏 = 훽 → 휂 in DGSat∃(Σ). Let 퐻, 퐵 be the resolved atoms in 휂, 훽′, respectively, let 휃 be the mgu of 퐻, 퐵′, and without loss of generality, assume vars(훽) = ⃗푥 and vars(훽′) = ⃗푦. Claim 1. 휎 is single-headed

Subproof. By assumption, 휏 and 휏 ′ are single-headed, and as 휂′′ ⊆ (휂 ∪ 휂′)휃, also 휎 will be single-headed.  Claim 2. 퐻 and 퐵′ contain all variables of 휏 and 휏 ′, respectively.

Subproof. 퐻 is functional and hence contains all variables of 휏. 퐵′ is either a guard of 휏 ′ ′ ′ or functional, and hence 퐵 contains all variables of 휏 .  Since 퐻휃 = 퐵′휃, it follows that vars(⃗푥휃) = vars(⃗푦휃). Claim 3. 퐻휃 and 퐵′휃 are simple.

′ ′ ′ Subproof. Let 퐻 = 푅(푡1, . . . , 푡푎) and 퐵 = 푅(푡1, . . . , 푡푎) for some relation symbol 푅, and pick a fresh variable 푧. We consider two cases: either 퐵′ is a function-free guard, or 퐵′ is functional. Assume the former. We define a substitution 휃′ by {︃ 푧, if 푣 ∈ ⃗푥 푣휃′ := . ′ ′ 푡푖휃 , if 푣 = 푡푖 ∈ ⃗푦

55 5. Quantifier-Free Query Answering under DisGTGDs

′ ′ ′ ′ ′ We show that 휃 is well-defined; that is, if 푦푘 = 푡푖 = 푡푗, then 푡푖휃 = 푡푗휃 . If 푡푖, 푡푗 ∈ ⃗푥, ′ ′ clearly 푡푖휃 = 푡푗휃 . Assume 푡푖 = 푓1(⃗푥) and 푡푗 = 푓2(⃗푥). As 휃 is a unifier, we have ′ ′ ′ ′ 푡푖휃 = 푡푖휃 = 푡푗휃 = 푡푗휃, and hence 푓1 = 푓2. Consequently, 푡푖휃 = 푡푗휃 . Lastly, we check the cases 푡푖 ∈ ⃗푥 and 푡푗 = 푓(⃗푥), or 푡푗 ∈ ⃗푥 and 푡푖 = 푓(⃗푥). Without loss of generality, assume ′ ′ the former (the other case is symmetric). Then 푡푖휃 = 푡푖휃 = 푡푗휃 = 푡푗휃 = 푓(⃗푥)휃. Moreover, vars(푡푗) = ⃗푥 and 푡푖 ∈ ⃗푥. But then 푡푖휃 ̸= 푓(⃗푥)휃, a contradiction. Consequently, the case ′ ′ ′ ′ 푡푖 ∈ ⃗푥 and 푡푗 = 푓(⃗푥) cannot emerge. Hence, 휃 is well-defined. Moreover, 푡푖휃 = 푡푖휃 by definition. That is, 휃′ unifies 퐻 and 퐵′, and further, 퐻휃′ and 퐵′휃′ are simple. Now 휃′ = 휃훿 for some substitution 훿, and hence, 퐻휃 and 퐵′휃 must be simple. Now assume 퐵′ is functional. We proceed similarly as in the previous case. We define a substitution 휃′ defined by 푣휃′ := 푧 for 푣 ∈ ⃗푥 ∪ ⃗푦 and claim that 퐻휃′ = 퐵′휃; that is, ′ ′ ′ ′ 푡푖휃 = 푡푖휃 for 1 ≤ 푖 ≤ 푎. Clearly, the claim holds for 푡푖 ∈ ⃗푥 and 푡푖 ∈ ⃗푦. If 푡푖 = 푓1(⃗푥) ′ ′ ′ ′ and 푡푖 = 푓2(⃗푦), then again 푓1 = 푓2 using that 휃 is a unifier, and consequently, 푡푖휃 = 푡푖휃 . ′ ′ Lastly, we check the cases 푡푖 ∈ ⃗푥 and 푡푖 = 푓(⃗푦), or 푡푖 = 푓(⃗푥) and 푡푖 ∈ ⃗푦. Without loss ′ of generality, assume the former. Then 푡푖휃 = 푡푖휃 = 푓(⃗푦)휃. As 퐻 is functional, there is ′ ′ ′ 푗 ∈ {1, . . . , 푎} with 푡푗 = 푓 (⃗푥), and thus 푡푗휃 = 푡푗휃 = 푓 (⃗푥)휃. We have two cases: ′ ′ Assume 푡푗 = 푓 (⃗푦). Since vars(푡푗) = ⃗푥, this implies 푡푖휃 = 푦푘휃 for some 푦푘 ∈ ⃗푦. Hence, ′ ′ 푦푘휃 = 푡푖휃 = 푡푖휃 = 푓(⃗푦)휃. But also 푦푘휃 ̸= 푓(⃗푦)휃, a contradiction. Now assume 푡푗 = 푦푘. ′ ′ As 푡푖 ∈ ⃗푥 and 푦푘휃 = 푡푗휃 = 푓 (⃗푥휃), we get that 푡푖휃 is a subterm in 푦푘휃. But we also have ′ 푡푖휃 = 푡푖휃 = 푓(⃗푦휃), leading to a contradiction. ′ ′ Consequently, the case 푡푖 ∈ ⃗푥 and 푡푖 = 푓(⃗푦) cannot emerge. Hence, 휃 unifies 퐻 and 퐵′, and further, 퐻휃′ and 퐵′휃′ are simple. Thus, as before, 퐻휃 and 퐵′휃 must also be simple.  Claim 4. If 퐴 ∈ {퐻, 퐵′} is functional, then vars(퐴)휃 is non-functional.

Subproof. Let 푡 be a functional term in 퐴. Then 푡 contains all variables of the corresponding rule. Since 퐴휃 is simple, 푡휃 is shallow, and thus vars(퐴)휃 is non-functional.  As 퐻 is functional, Claim4 implies that vars(⃗푥휃) = ⃗푥휃, and hence

width(휎) = |vars(⃗푥휃)| = |⃗푥휃| ≤ |⃗푥| = width(휏) ≤ bwidth(SHNF(Σ)).

Claim 5. 휎 is simple.

Subproof. By Claim4, ⃗푥휃 is non-functional, and thus, 휏휃 is simple. Similarly, if 퐵′ is functional, ⃗푦휃 is non-functional, and thus 휏 ′휃 is simple. If 퐵′ is non-functional, then 휏 ′ is full and 퐵′ is a guard containing all variables of 휏 ′. Now 퐵′휃 is simple by Claim3, and ′ hence also 휏 휃 must be simple.  Claim 6. Every functional term in 휎 contains all variables.

Subproof. As ⃗푥휃 is non-functional, all functional terms 푡 in 휏휃 still contain all variables of vars(⃗푥휃) = vars(⃗푦휃). If 퐵′ is functional, then ⃗푦휃 is non-functional and the same reasoning applies. If 휏 is full and 퐵′ is a guard, let 푡 be a functional term in 휏 ′휃. Then there must ′ ′ be 푦푖 ∈ ⃗푦 = vars(퐵 ) such that 푦푖휃 = 푡. As 퐵 휃 = 퐻휃, 푡 occurs in 휏휃, and hence, by our first case, contains all variables. 

56 5.2. The Disjunctive Guarded Saturation

Claim 7. 휎 contains a function-free guard.

Subproof. Let 퐺 be the non-functional guard of 휏. Then 퐺휃 is not removed in 휎. As ⃗푥휃 is non-functional, 퐺휃 is non-functional, and further, 퐺휃 contains all the variables of ⃗푥휃 = vars(⃗푦휃).  Now by Claim1 and5 to7, 휎 is a single-headed guarded simple rule.

5.2.1. Correctness To prove the correctness of the disjunctive guarded saturation, we again first verify that the algorithm is sound.

Lemma 5.2.19. DGSat∃(Σ) is sound with respect to SHNF(Σ).

Proof. By Theorem 2.1.4, it suffices to show that DGSat∃(Σ) is sound with respect to Sk(SHNF(Σ)). For this, it suffices to show that Sk(SHNF(Σ)) |= DGSat∃(Σ), which is just a special case of the soundness proof for standard first-order logic resolution [Robinson, 1965]. A detailed, formalised proof can be found in the Masters thesis of Schlichtkrull [2015].

Corollary 5.2.20 (Soundness). DGSat(Σ) is sound with respect to SHNF(Σ).

Proof. By Lemma 5.2.19, DGSat∃ is sound with respect to SHNF(Σ), and by Lemma 5.2.16, DGSat(Σ) = IFC(DGSat∃(Σ)) is sound with respect to DGSat∃(Σ).

To prove (evolve)-completeness of the algorithm, we want to talk about chase steps using full as well as non-full rules from DGSat∃(Σ). The non-full rules of DGSat∃(Σ), however, do not contain existential variables for which a fresh constant can be introduced in a chase step; instead, each existential variable got replaced by a fresh Skolem function. We thus generalise the notion of chase steps in the natural way. That is, we allow a trigger ℎ to map to ground terms with constants from consts(퐼). In fact, during our completeness proof, we will only have to consider chase trees that introduce simple facts, like the one in the following example. Example 5.2.21. Given the database 풟 = {푅(푐, 푑)} and a set of guarded simple rules Σ

푅(푥1, 푥2) → 푆(푥1, 푓1,1(푥1, 푥2)) ∨ 푇 (푥1, 푥2),

푇 (푥1, 푥2) → 푈(푥1, 푥2, 푓2,1(푥1, 푥2)),

푈(푥1, 푥2, 푥3) → 푀(푥1) ∨ 푃 (푥2),

푆(푥1, 푥2) → 푀(푥2), the tree depicted in Figure 5.1 is a chase tree from 풟 and Σ. Now recall the definition of evolve-completeness: Definition 5.1.6 (Evolve-completeness). Let Σ be a set of DisGTGDs, and Σ′ be a set of full DisTGDs. We say that Σ′ is evolve-complete with respect to Σ if it satisfies the following properties:

57 5. Quantifier-Free Query Answering under DisGTGDs

푅(푐, 푑) 푇 푇1 2 푇 0 푅(푐, 푑) 푅(푐, 푑), 푆(푐, 푓1,1(푐, 푑)) 푅(푐, 푑), 푇 (푐, 푑) 푅(푐, 푑) 푅(푐, 푑), 푆(푐, 푓1,1(푐, 푑)) 푅(푐, 푑), 푇 (푐, 푑) 푅(푐, 푑), 푇 (푐, 푑), 푈(푐, 푑, 푓2,1(푐, 푑))

푅(푐, 푑) 푇4 푅(푐, 푑) 푇3

푅(푐, 푑), 푆(푐, 푓1,1(푐, 푑)) 푅(푐, 푑), 푇 (푐, 푑) 푅(푐, 푑), 푆(푐, 푓1,1(푐, 푑)) 푅(푐, 푑), 푇 (푐, 푑)

푅(푐, 푑), 푆(푐, 푓1,1(푐, 푑)), 푅(푐, 푑), 푇 (푐, 푑), 푈(푐, 푑, 푓2,1(푐, 푑)) 푅(푐, 푑), 푇 (푐, 푑), 푈(푐, 푑, 푓2,1(푐, 푑)) 푀(푓1,1(푐, 푑))

푅(푐, 푑), 푇 (푐, 푑), 푅(푐, 푑), 푇 (푐, 푑), 푅(푐, 푑), 푇 (푐, 푑), 푅(푐, 푑), 푇 (푐, 푑), 푈(푐, 푑, 푓2,1(푐, 푑)), 푀(푐) 푈(푐, 푑, 푓2,1(푐, 푑)), 푃 (푑) 푈(푐, 푑, 푓2,1(푐, 푑)), 푀(푐) 푈(푐, 푑, 푓2,1(푐, 푑)), 푃 (푑)

Figure 5.1.: Chase tree for Example 5.2.21; nodes that will be extended are marked bold and red.

(a) For every 휏 ∈ IFC(Σ), we have 휏 ∈ Σ′ (modulo renaming). (b) Let 푇 be a chase tree with root 푁, let 푟 be the node inside 푁, let 푁 1, . . . , 푁 푛+푚 be the children of 푁, and 휏푟 ∈ Σ be non-full such that

(i) 휏푟 is the first rule fired in 푇 , (ii) the first 푛 head conjunctions of 휏푟 are non-full, 푁 푖 ′ (iii) for 1 ≤ 푖 ≤ 푛, 푇 was created by firing full rules from Σ in 푣푖 only, where 푣푖 is the new child of 푟 in 푁 푖, (iv) 푁 푛+1, . . . , 푁 푛+푚 have no subtrees. Then there exists a chase tree 푇 ′ from 푁 and Σ′ with 푇 ′ |= 푄푇 .

The first property again follows immediately:

Lemma 5.2.22. For every 휏 ∈ IFC(SHNF(Σ)), we have VNF(휏) ∈ DGSat∃(Σ).

Proof. Follows from the base case and final step of the algorithm.

The proof of the second property requires two lemmas. Broadly speaking, the first lemma shows that whenever a rule 휏 triggers in a set of simple facts 푁, we can create a corresponding rule 휏 * that triggers in a set of function-free facts and whose consequences contain those of 휏, provided that we have a rule 휏퐹 that creates 퐹 and triggers in a function-free set of facts for each 퐹 ∈ 푁+:

Lemma 5.2.23. Let 푀, 푁, 푆 be sets, 퐹1, . . . , 퐹푛 be simple facts, 휏 ∈ DGSat∃(Σ), and for any 퐹 ∈ 푁+, let 휏퐹 ∈ DGSat∃(Σ) be non-full such that

(i) 푀 ⊇ 푁− and 푀 is function-free, (ii) 푁, 푆 are simple and consts(푁), consts(푆) ⊆ consts(푀), (iii) 휏 = 훽 → 휂 is full or has a functional body atom, ⋁︀푛 (iv) firing 휏 in 푁 produces 푖=1 퐹푖, ⋁︀ ′ firing 휏 in 푀 produces 퐹 ∨ ′ 퐹 for some 푆 ⊆ 푆 (v) 퐹 퐹 ∈푆퐹 퐹

58 5.2. The Disjunctive Guarded Saturation

* * ⋁︀푛 ⋁︀ ′ Then there exists 휏 ∈ DGSat∃(Σ) such that firing 휏 in 푀 produces 푖=1 퐹푖 ∨ 퐹 ′∈푆′ 퐹 for some 푆′ ⊆ 푆.

Proof. We show the claim by induction on 푘 := |푈|, where 푈 := {퐵 ∈ 훽 | ℎ(퐵) ∈ 푁+}. In case 푘 = 0, we just set 휏 * := 휏. In case 푘 > 0, let ℎ be the trigger obtained from Assumption (iv) for 휏. Claim 1. There is 퐵 ∈ 푈 such that 휏 is full and 퐵 guards 휏, or 퐵 is functional.

Subproof. If there is a functional 퐵 ∈ 푈, we are done. Otherwise, 휏 must be full. Let 퐺 be the guard of 휏. As 푘 > 0, ℎ unifies some variable in 휏 with a functional term. Hence, ℎ(퐺) is functional, and thus 퐺 ∈ 푈. 

Fix some 퐵 ∈ 푈 obtained by the just proven claim and set 퐹 := ℎ(퐵). Pick the corresponding 휏퐹 = 훽퐹 → 휂퐹 and the trigger ℎ퐹 obtained from Assumption (v). Then there is 퐻퐹 ∈ 휂퐹 such that ℎ퐹 (퐻퐹 ) = ℎ(퐵). Without loss of generality, assume vars(훽퐹 ) = ⃗푥, vars(훽) = ⃗푦, and ℎ퐹 (⃗푥) = ⃗푐. Then the substitution 휃 defined by ⎧ 푥푖, if 푣 ∈ ⃗푥 and ℎ퐹 (푣) = 푐푖 ⎨⎪ 푣휃 := 푥푖, if 푣 ∈ ⃗푦 and ℎ(푣) = 푐푖 ⎩⎪ 푓(푥푖1 , . . . , 푥푖푎 ), if 푣 ∈ ⃗푦 and ℎ(푣) = 푓(푐푖1 , . . . , 푐푖푎 )

* is a unifier of 퐻퐹 and 퐵. Hence, there is an mgu 휃 of 퐻 and 퐵, and thus, there is a substitution 훿 such that 휃* = 휃훿.

Claim 2. 퐻퐹 is a functional atom.

Subproof. By Assumption (i), 푀 is function-free, and hence, ℎ퐹 (⃗푥) is function-free; how- ever, 퐹 = ℎ퐹 (퐻퐹 ) is functional, and thus 퐻퐹 must be functional. 

Using Claim1 and2, we can see that all requirements for (D-EVOLVE) on 휏퐹 , 휏 are * * * ′ ′ satisfied. Thus, we obtain 휏 = (훽 → 휂 ) = VNF(훽 → 휂 ) ∈ DGSat∃(Σ), where

′ (︀ (︀ )︀)︀ * 훽 = 훽퐹 ∪ 훽 ∖ {퐵} 휃 , ′ (︀(︀ )︀ )︀ * 휂 = 휂퐹 ∖ {퐻퐹 } ∪ 휂 휃 .

Let 휌 be the renaming used by VNF(·) for 훽′ → 휂′, let ℎ′ be the trigger defined by ′ * ′ −1 ℎ (푥푖) := 푐푖, and set ℎ := ℎ ∘ 훿 ∘ 휌 . ′ Claim 3. For any 푥푖 ∈ ⃗푥, we have ℎ (푥푖휃) = ℎ퐹 (푥푖), and for any 푦푖 ∈ ⃗푦, we have ′ ℎ (푦푖휃) = ℎ(푦푖).

Subproof. First note that dom(휃) = ⃗푥 ∪ ⃗푦 by Assumptions (i) and (ii). Pick some 푥푖 ∈ ⃗푥 ′ and assume ℎ퐹 (푥푖) = 푐푗. Then 푥푖휃 = 푥푗 by definition of 휃. Further, by definition of ℎ , ′ ′ ′ ℎ (푥푗) = 푐푗. Hence, ℎ (푥푖휃) = ℎ (푥푗) = 푐푗 = ℎ퐹 (푥푖). Now pick some 푦푖 ∈ ⃗푦. The case ℎ(푦푖) = 푐푗 is analogous to the previous one. ′ In case ℎ(푦푖) = 푓(푐푗1 , . . . , 푐푗푎 ), we get 푦푖휃 = 푓(푥푗1 , . . . , 푥푗푎 ) and ℎ (푓(푥푗1 , . . . , 푥푗푎 )) =

푓(푐푗1 , . . . , 푐푗푎 ). 

59 5. Quantifier-Free Query Answering under DisGTGDs

* * ⋁︀푛 ⋁︀ ′ Claim 4. Firing 휏 based on ℎ in 푀 ∪ 푁 produces 퐹 ∨ ′ 퐹 . 푖=1 푖 퐹 ∈푆퐹 Subproof. We have

* * ′ −1 * ′ ′ ′(︁(︀ (︀ )︀)︀ * )︁ ′(︀ * )︀ ℎ (훽 ) = ℎ ∘ 훿 ∘ 휌 (훽 ) = ℎ ∘ 훿(훽 ) = ℎ 훽퐹 ∪ 훽 ∖ {퐵} 휃 훿 ⊆ ℎ (훽퐹 ∪ 훽)휃 훿

′(︀ )︀ ′ ′ ′ 퐶푙푎푖푚 3 = ℎ (훽퐹 ∪ 훽)휃 = ℎ (훽퐹 휃 ∪ 훽휃) = ℎ (훽퐹 휃) ∪ ℎ (훽휃) = ℎ퐹 (훽퐹 ) ∪ ℎ(훽) ⊆ 푀 ∪ 푁. and

* * ′ −1 * ′ ′ ′(︁(︀(︀ )︀ )︀ )︁ ℎ (휂 ) = ℎ ∘ 훿 ∘ 휌 (휂 ) = ℎ ∘ 훿(휂 ) = ℎ 휂퐹 ∖ {퐻퐹 } ∪ 휂 휃

퐶푙푎푖푚 3 ′(︀(︀ )︀ )︀ ′(︀(︀ )︀ )︀ = ℎ 휂퐹 ∖ {퐻퐹 } 휃 ∪ ℎ(휂) = ℎ 휂퐹 ∖ {퐻퐹 } 휃 ∪ {퐹1, . . . , 퐹푛} 퐶푙푎푖푚 3 (︀ )︀ = ℎ퐹 휂퐹 ∖ {퐻퐹 } ∪ {퐹1, . . . , 퐹푛} = 푆퐹 ∪ {퐹1, . . . , 퐹푛}. 

* * * * Now if ℎ (휏 ) ⊆ 푀, we are done. Hence assume there is 퐵 ∈ 훽 with ℎ (퐵) ∈ 푁+. * * * * Claim 5. For any 퐵 ∈ 훽 with ℎ (퐵) ∈ 푁+, we have 퐵 ∈ (훽 ∖ {퐵})휃 . Moreover, 휏 is full or has a functional body atom.

* Subproof. By Assumption (i), 푀 is function-free. Hence, no atom in 훽퐹 휃 unifies with an * * atom in 푁+ via ℎ , and thus 퐵 ∈ (훽 ∖ {퐵})휃 . * * Next, pick some 퐵 ∈ 훽 with ℎ (퐵) ∈ 푁+. Now either 퐵 is functional, and we are done, or ℎ* maps some 푥 ∈ vars(퐵) to a functional term. For the sake of contradiction, assume 휏 * is non-full. Then there is a functional head atom 퐻 in 휏 * that contains 푥. Hence, ℎ*(퐻) * is non-simple, contradicting the fact that ℎ (휂) is simple.  We have |(훽 ∖ {퐵})휃*| < 푘. Thus, we can apply the inductive hypothesis on 휏 *, which finishes the proof.

The second lemma basically provides an inductive proof of Property 5.1.6.(b); however, in order to make the induction work, we have to show a more general claim.

Lemma 5.2.24. Let 푇 be a chase tree from DGSat∃(Σ) with root 푁, let 푀, 푆 be sets, for any 퐹 ∈ 푁+, let 휏퐹 ∈ DGSat∃(Σ) be non-full, and if the first fired rule is non-full, let 휏 ∈ DGSat∃(Σ) such that (i) the first rule fired in 푇 is full or non-full, and all subsequent rules are full, (ii) 푀 ⊇ 푁− and 푀 is function-free, (iii) 푁, 푆 are simple and consts(푁), consts(푆) ⊆ consts(푀), ⋁︀ ′ (iv) firing 휏퐹 in 푀 produces 퐹 ∨ 퐹 ′∈푆 퐹 for some 푆퐹 ⊆ 푆, 퐹 ⋁︀푛 (v) if the first fired rule is non-full, it produces 푖=1 퐹푖 for some simple 퐹1, . . . , 퐹푛, (vi) 휏 is non-full, ⋁︀푛 ⋁︀ ′ (vii) firing 휏 in 푁− produces 푖=1 퐹푖 ∨ 퐹 ′∈푆′ 퐹 for some 푆 ⊆ 푆. * * * Then there exists a chase tree 푇 from 푀 and DGSat∃(Σ) such that for any 퐿 ∈ lvs(푇 ), * * (a) 퐿 |= lvs(푇 )− and 퐿 is function-free, or

60 5.2. The Disjunctive Guarded Saturation

* ⋃︀ (b) 퐿 = {퐹 } ∪ 푈퐿* for some 퐹 ∈ 푆 and 푈퐿* ⊆ 푀 ∪ 퐿∈lvs(푇 ) 퐿−. Proof. We prove the claim by induction on the number of steps used to create 푇 , denoted by 푘. In case 푘 = 0, we let 푇 * be the single node 푀. ⋁︀푛 In case 푘 > 0, let 휎 be the first rule fired in 푇푘, producing 푖=1 퐹푖. ′ ′ ⋁︀푛 푖 ⋁︀ ′ Claim 1. There is 휎 ∈ DGSat∃(Σ) such that firing 휎 in 푀 produces 푖=1 퐹 ∨ 퐹 ′∈푆′ 퐹 for some 푆′ ⊆ 푆.

Subproof. If 휎 is non-full, we can pick 휏 by Assumption (vii). If 휎 is full, by Lemma 5.2.23, ′ ′ ⋁︀푛 푖 ⋁︀ there is 휎 ∈ DGSat∃(Σ) such that firing 휎 in 푀 produces 푖=1 퐹 ∨ 퐹 ∈푆′ 퐹 for some ′ 푆 ⊆ 푆.  Let 푇 * be the chase tree created by firing 휎′ in 푀. We now iteratively modify 푇 * for 0 ≤ 푖 ≤ 푛 while maintaining the following invariant: After step 푖, for any 퐿* ∈ lvs(푇 *), we have * 푖 * 푖 : ⋃︀푖 (︀ 푁 푗 )︀ (1) 퐿 |= 푈 and 퐿 is function-free, where 푈 = 푗=1 lvs 푇푘 −, or * 푖 푖 ⋃︀ (2) 퐿 = {퐹 } ∪ 푈퐿* for some 퐹 ∈ ({퐹푖+1, . . . , 퐹푛} ∪ 푆) and 푈퐿* ⊆ 푀 ∪ 퐿∈푈 푖 퐿. Note that our initial 푇 * already satisfies the invariant for 푖 = 0. In case 푖 > 0, for any * * * 푖−1 * leaf 퐿 in 푇 with 퐿 = {퐹푖} ∪ 푈퐿* , let 푁퐿* be the ancestor of 퐿 in which some rule 휏퐿* 퐹 ∨ ⋁︀ 퐹 푉 * 푁 * producing 푖 퐹 ∈푉퐿* was fired for some set 퐿 . We modify the subtree of any 퐿 in the following way: 푖−1 Pick some 푁퐿* with maximum distance to the root. Due to our invariant, 푈퐿* is function-free, and hence, so is 푁퐿* . As a consequence, 푉퐿* is simple. 푖 푖 Claim 2. For any 퐹 ∈ 푁+, there is a non-full 휏퐹 such that firing 휏퐹 in 푁+ produces ⋁︀ ′ 퐹 ∨ ′ 퐹 for some 푆 ⊆ 푆 ∪ 푉 * . 퐹 ∈푆퐹 퐹 퐿 푖 Subproof. For any 퐹 ∈ 푁+ ∩ 푁+, there is a rule by Assumption (iv), and if 퐹푖 is functional, * we can additionally use the non-full 휏퐿 as 휏퐹푖 . 

푁푖 ′ 푖 Thus, we can apply the inductive hypothesis on 푇푘 to obtain a chase tree (푇 ) from ′ ′ 푖 ′ 푁 푖 ′ * (︀ )︀ 푁퐿 and DGSat∃(Σ) such that for any 퐿 ∈ lvs((푇 ) ), either 퐿 |= lvs 푇푘 − and 퐿 is ′ function-free, or 퐿 = {퐹 } ∪ 푈퐿′ for some 퐹 ∈ (푉퐿* ∪ 푆) and (︃ )︃ (︃ )︃ (︃ )︃ ⋃︁ ⋃︁ ⋃︁ ⋃︁ 푈퐿′ ⊆ 푁퐿* ∪ 퐿− ⊆ 푀 ∪ 퐿 ∪ 퐿− ⊆ 푀 ∪ 퐿 . 푖−1 푖 (︀ 푁푖 )︀ 퐿∈푈 (︀ 푁푖 )︀ 퐿∈푈 퐿∈lvs 푇푘 퐿∈lvs 푇

′ ′ 푖 ′ * For any 퐿 ∈ lvs((푇 ) ) with 퐿 = {퐹 } ∪ 푈퐿′ and 퐹 ∈ 푉퐿* , there is a child 푁퐹 of 푁퐿* in 푇 * containing 퐹 . As 푇 satisfies the invariant for 푖 − 1 and due to maximality of 푁퐿* , the leaves of (푇 *)푁퐹 already satisfy the invariant for step 푖. Hence, we can attach (푇 *)푁퐹 to any such 퐿′ to obtain (푇 *)푖 that satisfies the invariant for step 푖. Finally, we attach * 푖 (푇 ) to 푁퐿* and proceed with the next choice for 푁퐿* . Once all 푁퐿* are processed, we continue with step 푖 + 1. * 푛 Once step 푛 is finished, we obtain the desired chase tree 푇 , noting that 푈 = lvs(푇 )−.

61 5. Quantifier-Free Query Answering under DisGTGDs

Proposition 5.2.25 (Completeness). DGSat(Σ) is evolve-complete with respect to SHNF(Σ).

Proof. Property 5.1.6.(a) is Lemma 5.2.22. For Property 5.1.6.(b), note that 푁 is a set of facts, which means 푁+ = ∅. Hence, we can instantiate Lemma 5.2.24 (with 푀 = 푁, 푆 = ∅) * * * to obtain a chase tree 푇 from 푁 and DGSat∃(Σ) such that 퐿 |= lvs(푇 )− and 퐿 is function-free for any 퐿* ∈ lvs(푇 *). Thus, 푇 * must be created from full rules only, and any such rule is in DGSat(Σ). Hence, 푇 * is a chase tree from 푁 and DGSat(Σ) with 푇 * |= 푄푇 .

5.2.2. Complexity Analysis

Let us now examine the complexity bounds of our algorithm. In the following, let 푛 be the number of predicate symbols in SHNF(Σ), 푚 be the number of Skolem functions in Sk(SHNF(Σ)), 푎 be the maximum arity of any predicate in SHNF(Σ), and 푤 := bwidth(SHNF(Σ)).

Lemma 5.2.26. The maximum arity of any function in Sk(SHNF(Σ)) is bounded by 푤.

Proof. Follows straight from Definition 5.2.5.

푎푤 푚 Lemma 5.2.27. Any 휏 ∈ DGSat∃(Σ) contains at most 2푛푤 푎 atoms.

Proof. By Proposition 5.2.18, 휏 is a single-headed guarded simple rule in variable normal form with width(휏) ≤ 푤. For any fixed predicate symbol, we can thus choose at most 푎 function symbols and 푎 · 푤 variables. Further, any atom can occur in the body or the head of 휏. Hence, 휏 contains at most 2푛푤푎푤푚푎 atoms.

2푛푤푎푤푎푚 Corollary 5.2.28. |DGSat∃(Σ)| ≤ 2 .

Proof. A rule consists of a subset of all possible body and head atoms. Now the claim follows by Lemma 5.2.27.

Corollary 5.2.29 (Correctness). The disjunctive guarded saturation DGSat(Σ) is an atomic rewriting of Σ.

Proof. By Corollary 5.1.9 and 5.2.20 and Proposition 5.2.25, DGSat(Σ) is an atomic rewriting of SHNF(Σ), and hence, DGSat(Σ) is an atomic rewriting of Σ by Lemma 5.2.4.

′ Lemma 5.2.30. Let 휏, 휏 ∈ DGSat∃(Σ). Then all (D-EVOLVE)-resolvents of 휏 = 훽 → 휂, 휏 ′ = 훽′ → 휂′ can be obtained in time doubly exponential in 푎, 푚, 푤.

Proof. Let 퐻 ∈ 휂 be functional and 퐵′ ∈ 훽′ be functional or a guard. We check whether there is an mgu 휃 of 퐻 and 퐵′, and if this is the case, we resolve the rules and normalise the result to obtain a new rule. Using Lemmas 2.1.13 and 5.2.11, these steps can be done in 풪(size(휏) + size(휏 ′)). Since 휏 contains at most 푠 := 푛푤푎푤푎푚 body atoms and 휏 ′ at most 푠 head atoms, we have to repeat this step at most 푠푠 times. Hence, the complexity is doubly exponential in 푎, 푚, 푤.

62 5.2. The Disjunctive Guarded Saturation

Lemma 5.2.31. The disjunctive guarded saturation algorithm terminates in time doubly exponential in 푎, 푚, 푤.

Proof. The initial Skolemisation and variable and single head normal from transformations ′ can be done in linear time. The algorithm then takes all pairs 휏, 휏 in GSat∃(Σ) and performs 푎푤 푚 all (D-EVOLVE)-inferences. By Corollary 5.2.28, this process takes at most 푐 := 22푛푤 푎 iterations, and at any point, there are at most 풪(푐2) pairs. By Lemma 5.2.30, the complexity for each 휏, 휏 ′ is doubly exponential in 푎, 푚, 푤, which finishes the proof.

Finally, we can combine our results to obtain a decision procedure. Theorem 5.2.32. The disjunctive guarded saturation algorithm provides a decision proce- dure for the Quantifier-Free Query Answering Problem under DisGTGDs. The procedure terminates in EXPTIME for fixed Σ and 2EXPTIME otherwise.

Proof. The size of 푎, 푚, 푤 is linear in size(Σ). Now the claim follows from Proposition 2.5.2, Corollary 5.2.29, and Lemma 5.2.31.

Once more, our bounds already match for atomic queries:

Theorem 5.2.33 ([Gottlob et al., 2012]). Given a set of DisGTGDs Σ, the Atomic Query Answering Problem under Σ is EXPTIME-complete for fixed Σ, and 2EXPTIME in general.

63

6. Comparison with GF Resolution

As we now have different decision procedures for the Quantifier-Free Query Answering Problem under (disjunctive) GTGDs at our hand, we naturally want to compare them with existing procedures from the literature. Recall that our work’s main motivation was to explain completeness of the resolution procedures for the guarded fragment, as described by De Nivelle and De Rijke[2003] and Ganzinger and De Nivelle[1999], by using the tree-like model property. We hence first examine which fragments of GF we can cover with our chosen framework and then elaborate how our algorithms compare with the GF resolution procedures.

6.1. GF Satisfiability vs. the Quantifier-Free Query Answering Problem

We have already shown in Lemma 2.4.2 that the Quantifier-Free Query Answering Problem under DisGTGDs reduces to the GF satisfiability problem. To identify which fragment of GF we can capture with our framework, we start comparing our algorithms with the GF resolution procedures given in [De Nivelle and De Rijke, 2003, Ganzinger and De Nivelle, 1999]. As mentioned before, both procedures there use a constant-free definition of GF, the latter with equality. Since we do not support equality in our procedures, we focus on GF sentences without equality and constants. Both GF procedures apply an initial clausal normal form transformation to a given GF sentence 휙 that creates clauses of the following form:

Definition 6.1.1 (Guarded simple clause). A clause 퐶 is a guarded simple clause if

(a) each literal (¬)퐿 is simple, (b) every functional subterm in 퐶 contains all the variables of 퐶 and no constants, and (c) if 퐶 is non-ground, 퐶 contains a non-functional negative literal that contains all the variables of 퐶.

Remark 6.1.2. The definitions in referred articles are slightly more general as they also account for clauses that are derivable during the resolution process. Which fragment of guarded simple clauses can we capture with our framework consisting of a database 풟, DisGTGDs Σ, and a quantifier-free UBCQ 푄? Fix a set 푆 of guarded simple clauses, let 푆푄 consist of all purely negative, ground 퐶 = {¬퐿1,..., ¬퐿푛} ∈ 푆, − − and set 푆 := (푆 ∖ 푆푄). Then 푆 is unsatisfiable if and only if 푆 |= ¬푆푄, where ¬푆푄 is obtained by negating the formula corresponding to the clause set 푆푄. Note that ¬푆푄 is a positive, ground formula in disjunctive normal form; hence, ¬푆푄 can be modelled ′ ′ − by 푄. Furthermore, any ground 퐶 = {¬퐿1,..., ¬퐿푛, 퐿1, . . . , 퐿푚} ∈ 푆 containing at least

65 6. Comparison with GF Resolution one positive literal and using constants ⃗푐 can be modelled by adding a full DisGTGD ⋀︀푛 ⋁︀푚 ′ 푅(⃗푥) ∧ 푖=1 퐿푖(⃗푥) → 푖=1 퐿푖(⃗푥) to Σ and a fact 푅(⃗푐) to 풟, where 푅 is a fresh relation symbol of arity |⃗푐 |. What about non-ground clauses? Recall that the Skolemisation of a set of single-headed DisGTGD results in a set of guarded simple rules with non-functional bodies. Moreover, any introduced Skolem function cannot occur in more than one rule. Consequently, we are able to model sets 푈 ⊆ 푆− of non-ground clauses in which any clause 퐶 does not contain a functional negative literal and any function occurs in at most one clause. To see this, take ′ ′ a clause 퐶 = {¬퐿1,..., ¬퐿푛, 퐿1, . . . , 퐿푚} ∈ 푈 using functions 푓1, . . . , 푓푘 and variables ⃗푥. ⋀︀푛 ⋁︀푚 ′ Then 퐶 corresponds to the rule 푖=1 퐿푖(⃗푥) → 푖=1 퐿푖(⃗푥), which by Theorem 2.1.4 is satisfiable if and only if the rule

푛 (︃ 푚 )︃ ⋀︁ ⋁︁ ′ 퐿푖(⃗푥) → ∃푦1, . . . , 푦푘 퐿푖(⃗푥)[푦1/푓1(⃗푥), . . . , 푦푘/푓푘(⃗푥)] 푖=1 푖=1 푛 푚 ⋀︁ ⋁︁ ′ ≡ 퐿푖(⃗푥) → ∃푦1, . . . , 푦푘 퐿푖(⃗푥)[푦1/푓1(⃗푥), . . . , 푦푘/푓푘(⃗푥)] (6.1) 푖=1 푖=1

′ is satisfiable, where 퐿푖(⃗푥)[푦1/푓1(⃗푥), . . . , 푦푘/푓푘(⃗푥)] is the formula obtained by replacing any ′ term 푓푗(⃗푥) in 퐿푖 by the variable 푦푗. Now Sentence (6.1) is a DisGTGD, and hence 푈 can be modelled by Σ. Note that the disjunctive guarded saturation in Section 5.2 is defined for a larger set of clauses (for example, guarded simple clauses with functional bodies); however, our completeness proof assumes that the initial set of rules corresponds to a set of DisGTGDs. Thus, any 푆− containing a functional negative literal or a function that occurs in more than one clause is not reducible to our algorithm using our given results.

6.2. GF Resolution vs. the (Disjunctive) Guarded Saturation

In this section, we want to elaborate and demonstrate the differences between our derived algorithms and the known GF resolution procedures when tackling the Quantifier-Free Query Answering Problem under (disjunctive) GTGDs. In the case of DisGTGDs, the GF resolution procedures proceed the same in both referred articles. In fact, they apply the same selection and ordering strategy as we did for the disjunctive guarded saturation. That is, a literal 퐿 is selected in a clause 퐶 if and only if 퐿 is functional and negative, 퐿 is functional and 퐶 contains no functional negative literal, or 퐿 is a guard and 퐶 is non-functional. This fact is, in some way, surprising as we started from a different, more specialised problem setting, progressively developed a suitable selection and ordering strategy, but ended up with an identical result. At the same time, this means that our work gives a natural account of why chosen selection and ordering strategies work by considering the tree-like model property. Since our disjunctive guarded saturation algorithm coincides with considered GF resolu- tion approaches, we subsequently focus on the guarded saturation algorithm. Regarding the runtime complexities, both the guarded saturation and the known GF resolution procedures are theoretically optimal in the general case and the case of bounded arities; however, in case of fixed Σ, our algorithm stays theoretically optimal (PTIME) while the

66 6.2. GF Resolution vs. the (Disjunctive) Guarded Saturation

GF resolution procedures’ complexities are exponential in the number of constants in 풟 in this case (see [Ganzinger and De Nivelle, 1999, Theorem 4.3] and [De Nivelle and De Rijke, 2003, Theorem 3.22]). A second difference is that the guarded saturation does not require any Skolemisation step. We demonstrate the differences between the guarded saturation and the GF resolution procedures by means of a simple example. This example is not meant to show any differences in performance. An in-depth performance comparison of both approaches is open for future studies.

Example 6.2.1. Consider the set of GTGDs (︀ )︀ 휏1 = 푅(푥1, 푥2) → ∃푦1, 푦2 푆(푥1, 푥2, 푦1, 푦2) ∧ 푇 (푥1, 푥2, 푦2) ,

휏2 = 푆(푥1, 푥2, 푥3, 푥4) → 푈(푥4),

휏3 = 푇 (푧1, 푧2, 푧3) ∧ 푈(푧3) → 푃 (푧1).

Then the guarded saturation first composes 휏1, 휏2 to obtain

* (︀ )︀ 휏1 := 푅(푥1, 푥2) → ∃푦1, 푦2 푆(푥1, 푥2, 푦1, 푦2) ∧ 푇 (푥1, 푥2, 푦2) ∧ 푈(푦2) ,

* * and then composes 휏1 , 휏3 to obtain 휏2 := 푅(푥1, 푥2) → 푃 (푥1). The GF resolution procedures, on the other hand, first Skolemise 휏1 to obtain

′ (︀ )︀ (︀ )︀ 휏1 := 푅(푥1, 푥2) → 푆 푥1, 푥2, 푓1(푥1, 푥2), 푓2(푥1, 푥2) ∧ 푇 푥1, 푥2, 푓2(푥1, 푥2) ,

′ ′ (︀ )︀ ′ then resolve 휏1 and 휏2 to obtain 휏2 := 푅(푥1, 푥2) → 푈 푓2(푥1, 푥2) , then resolve 휏1 and 휏3 to obtain ′ 휏3 := 푅(푥1, 푥2) ∧ 푈(푓2(푥1, 푥2)) → 푃 (푥1), ′ ′ ′ and finally resolve 휏2 and 휏3 to obtain 휏4 := 푅(푥1, 푥2) → 푃 (푥1).

67

7. Conclusion

7.1. Related Work

Our work was motivated by the work of De Nivelle and De Rijke[2003] and Ganzinger and De Nivelle[1999] that gave a resolution-based decision procedure for the guarded fragment. To the best of our knowledge, there has been no other attempt to relate resolution-based procedures to the tree-like model property for any guarded logic. There has been considerable work on rewritings for classes of TGDs. Bárány et al. [2013] give a rewriting algorithm for frontier-guarded TGDs, an extension of GTGDs. The algorithm is based on the tree-like model property, but works by generating all guarded consequences of a certain shape, rather than using resolution. This approach is not amenable to implementation. Gottlob et al.[2014] present an algorithm for another extension of GTGDs, namely weakly-guarded TGDs. The algorithm is similar to the one presented in this work, although again the approach will produce many unnecessary rules. No proofs are given in the conference paper Gottlob et al.[2014], and neither Gottlob et al.[2014] nor Bárány et al.[2013] deal with disjunctive TGDs. A very recent rewriting approach for DisGTGDs is given by Ahmetaj et al.[2018]. They use a game-theoretic correctness proof of their translation, and most notably, they achieve a polynomial runtime in case of DisGTGDs of bounded width. Their translation, however, adds a vast number of rules and fresh predicate symbols, in particular in case of rules with unbounded width. Moreover, many reasoning problems for description logics [Baader et al., 2003] are special cases of the GF satisfiability and DisGTGD Query Answering Problem [Calì et al., 2009, Hustadt et al., 2004]. These logics, however, are restricted to unary and binary relations, whereas GF and DisGTGDs have no such limitation.

7.2. Summary and Future Work

Our main goal was to connect the resolution-based approaches for GF satisfiability [De Niv- elle and De Rijke, 2003, Ganzinger and De Nivelle, 1999] with the tree-like model property. We decided to work with the language of DisGTGDs and its Quantifier-Free Query Answer- ing Problem as this setting makes the connection with the tree-like model property clearer than the GF satisfiability problem. After setting our problem statement, we introduced the chase and its disjunctive generalisation in Section 2.6 and explained how the chase can be used to construct a universal tree-like model. In Chapter3, we gave a new approach to normalise the chase in the case of GTGDs to admit a one-pass behaviour. We then generalised this result to the disjunctive chase. This one-pass behaviour makes the connection between inferences and the tree structure

69 7. Conclusion more closely related, and it was a key ingredient for our completeness proofs that use the tree-like model property. Moreover, it motivated the definition of algorithm-independent completeness properties for GTGD rewritings in Section 4.2 and DisGTGD rewritings in Section 5.1, which can thus be used beyond this thesis. As noted in Section 1.2, the chase is a strong tool that made its way into numerous studies. We believe that the one-pass property is not just a tool useful for our purposes, but that it can give valuable insights for future investigations and new decision procedures, and that it can greatly facilitate proofs relying on the chase procedure. We then derived two theoretically optimal1 decision procedures for the case of GTGDs. The first approach – the simple saturation – experiences an unpleasant unification issue, making it useless for practical purposes. We then fixed this issue in our second procedure – the guarded saturation. We discovered that the guarded saturation can experience a better runtime in the case of fixed dependencies than the existing GF resolution procedures. Moreover the guarded saturation does not require any Skolemisation step. We highlighted the differences between our approach and the GF resolution procedures in Section 6.2 by means of a simple example. A practical implementation of our procedure and an extensive comparison with the resolution approaches for GF is an open topic worth investigating. We then extended our results and derived an efficient procedure for the case of DisGTGDs – the disjunctive guarded saturation. We then observed that the disjunc- tive guarded saturation coincides with the resolution approaches for GF. Our work thus gives a natural account of why the selection and ordering strategies of referred GF resolution procedures work by considering the tree-like model property. Last but not least, we investigated which fragment of GF we can capture with our chosen framework. We discovered that our approach’s main restriction is the limitation of functions to positive literals and single clauses. We hence did not demystify the resolution approach for all of GF, but we made a first step capturing a substantial fragment – not to mention that our chosen framework has many practical applications on its own, as already said in Section 1.2. This clearly leaves the door open for further investigations that extend our results to a more extensive fragment of GF. As the disjunctive guarded saturation coincides with the resolution approaches for GF, it is imaginable that a generalisation of our completeness proof already unveils such an extension. Another step forward would be the integration of equality in our framework. This could potentially be done with the use of so-called equality-generating dependencies and a generalised version of the chase, as described in [Deutsch et al., 2008]. Lastly, just like the tree-like model property, our approach must not be restricted to GF. Many guarded logics would benefit from practical, effective decision procedures, and we believe that ideas similar to ours can prove itself valuable for that matter. In particular, we believe that extensions to frontier-guarded and weakly-guarded TGDs [Baget et al., 2011] are a plausible next step, and perhaps even the guarded negation fragment [Bárány et al., 2011] lies within the realms of possibility.

1As stated in Remark 4.4.22, there is an open question whether our procedures are optimal in case of bounded arities and relation symbols, which requires further investigations.

70 Appendix A. Proofs for Section 4.3

Lemma 4.3.3. Any 휎 ∈ SSat(Σ) is a full TGD in variable normal form with width(휎) ≤ 푤. Proof. Clearly, all rules in VNF(HNF(Σ)) satisfy the conditions and all derived rules will be in variable normal form due to the final VNF(·) application. If we perform a (COMPOSITION) step to derive 휎, then 휎 will only contain variables among {푥1, . . . , 푥hwidth(HNF(Σ))}, and hence width(휎) ≤ 푤. Since the rules used to derive 휎 are full, 휎 will also be full. If we perform an (ORIGINAL) step to derive 휎 = 훽′′ → 휂′′, the used rules 휏, 휏 ′ are of the form

훽(⃗푥) → ∃⃗푦휂(⃗푥,⃗푦), 훽′(⃗푧) → 휂′(⃗푧).

′′ (︀ ′ )︀ By definition, 휂 contains no 푦푖. Moreover, vars 훽 휃 ∖ 휂휃 ⊆ ⃗푥휃 and ⃗푥휃 ∩ ⃗푦 = ∅. So neither 휂′′ nor 훽′′ contains existentials, and thus 휎 will be full. As 휏 ∈ HNF(Σ), we have |⃗푥휃| ≤ 푤, and hence width(휎) ≤ 푤. Lemma 4.3.4. SSat(Σ) has the following properties: (a) For any 휏 ∈ HNF(Σ), we have VNF(휏) ∈ SSat(Σ). ′ (b) Let 퐼0, 퐼1, 퐼2 be a chase from SSat(Σ) and 휏, 휏 ∈ SSat(Σ) be full such that

(i) |consts(퐼0)| ≤ hwidth(HNF(Σ)), (ii) firing 휏 in 퐼0 creates 퐼1, ′ (iii) firing 휏 in 퐼1 creates 퐼2. * * Then there exists 휏 ∈ SSat(Σ) such that firing 휏 in 퐼0 creates 퐼0 ∪ (퐼2 ∖ 퐼1). ′ (c) Let 퐼0, 퐼1, 퐼2 be a chase, 휏 ∈ HNF(Σ) be non-full, and 휏 ∈ SSat(Σ) be full such that

(i) firing 휏 in the root 푟 of 퐼0 creates 퐼1 and a new node 푣, ′ 1 (ii) firing 휏 in 퐹푣 creates 퐼2, 2 (iii) 퐹푟 ∖ 퐼0 ̸= ∅. * * 2 Then there exists 휏 ∈ SSat(Σ) such that firing 휏 in 퐼0 creates 퐹푟 . Proof. The first property follows from the base case and final step of the algorithm. For the second property, we know that 휏, 휏 ′ are of the form

훽(⃗푥) → 휂(⃗푥), 훽′(⃗푧) → 휂′(⃗푧).

71 Appendix A. Proofs for Section 4.3

′ Without loss of generality, assume that consts(퐼0) ⊆ {푐1, . . . , 푐hwidth(HNF(Σ))}, and let ℎ, ℎ ′ ′ ′ * ′ be the triggers used for 휏, 휏 . If ℎ (훽 ) ⊆ 퐼0, setting 휏 := 휏 does the job. If, however, the chase step makes use of some facts in 퐼1 ∖ 퐼0, then define a unifier 휃 by {︃ 푥푖, if 푣 ∈ ⃗푥 and ℎ(푣) = 푐푖 푣휃 := . ′ 푥푖, if 푣 ∈ ⃗푧 and ℎ (푣) = 푐푖

′ ′ Note that ⃗푥휃 ∪ ⃗푧휃 ⊆ {푥1, . . . , 푥hwidth(HNF(Σ))}. Further, for any atom 퐵 ∈ 훽 with ′ ′ ′ ′ ℎ (퐵 ) ∈ 퐼1∖퐼0, there is an atom 퐻 ∈ 휂 with ℎ(퐻) = ℎ (퐵 ), and hence, by construction of 휃, ′ * (︀ ′ ′ )︀ 퐻휃 = 퐵 휃. Now by (COMPOSITION), we get the full 휏 = VNF 훽휃 ∪ (훽 휃 ∖ 휂휃) → 휂 휃 in SSat(Σ). Let 휌 be the renaming used by VNF(·), let ℎ′′ be the trigger defined by ′′ * ′′ −1 * * ℎ (푥푖) := 푐푖, and set ℎ := ℎ ∘ 휌 . Then firing 휏 based on ℎ in 퐼0 creates 퐼0 ∪ (퐼2 ∖ 퐼1). Similarly, for the last property, we know that 휏, 휏 ′ are of the form

훽(⃗푥) → ∃⃗푦휂(⃗푥,⃗푦) 훽′(⃗푧) → 휂′(⃗푧).

Let ℎ, ℎ′ be the triggers used for 휏, 휏 ′, and without loss of generality, assume that ℎ(훽) ′ * ′ only uses constants among 푐1, . . . , 푐bwidth(휏). Again, if ℎ(훽 ) ⊆ 퐼0, setting 휏 := 휏 does the job. Otherwise, we define a unifier 휃 by ⎧ 푥푖, if 푣 ∈ ⃗푥 and ℎ(푣) = 푐푖 ⎨⎪ ′ 푣휃 := 푥푖, if 푣 ∈ ⃗푧 and ℎ (푣) = 푐푖 . ⎪ ′ ⎩푦푖, if 푣 ∈ ⃗푧 and ℎ (푣) = ℎ(푦푖)

Note that consts(ℎ′(훽′)) ⊆ consts(ℎ(휂)) ⊆ consts(ℎ(훽)) ∪ ℎ(⃗푦), and hence dom(휃) = ⃗푥 ∪ ⃗푧. Further, for any 퐵′ ∈ 훽′ with vars(퐵′휃) ∩ ⃗푦 ̸= ∅, there must be 퐻 ∈ 휂 with ℎ(퐻) = ℎ′(퐵′), and hence, by construction of 휃, 퐻휃 = 퐵′휃. Consequently, vars(훽′휃 ∖ 휂휃) ⊆ ⃗푥휃. More- over, ⃗푥휃 ⊆ ⃗푥, and hence ⃗푥휃 ∩ ⃗푦 = ∅. Now set 휂′′ := {퐻′ ∈ 휂′ | vars(퐻′휃) ∩ ⃗푦 = ∅} and note that 휂′′ ̸= ∅ by Assumption (iii). Then by (ORIGINAL), we get the full * (︀ ′ ′′ )︀ 휏 = VNF 훽휃 ∪ (훽 휃 ∖ 휂휃) → 휂 휃 in SSat(Σ). Let 휌 be the renaming used by VNF(·), ′′ ′′ * ′′ −1 * let ℎ be the trigger defined by ℎ (푥푖) := 푐푖, and set ℎ := ℎ ∘ 휌 . Then firing 휏 based * 2 on ℎ in 퐼0 creates 퐹푟 .

Corollary A.1. Let 퐼0, . . . , 퐼푛 be a chase from SSat(Σ) and 휏1, . . . , 휏푛 ∈ SSat(Σ) be full such that |consts(퐼0)| ≤ hwidth(HNF(Σ)) and firing 휏푖 in 퐼푖−1 creates 퐼푖. Then there exists * * 휏 ∈ SSat(Σ) such that firing 휏 in 퐼0 creates 퐼0 ∪ (퐼푛 ∖ 퐼푛−1).

* Proof. We prove the claim by induction on 푛. For 푛 = 1, we just set 휏 := 휏1. ′ For 푛 > 1, by applying Lemma 4.3.4.(b) on 퐼푛−2, 퐼푛−1, 퐼푛 and 휏푛−1, 휏푛, we get 휏푛−1 ∈ ′ SSat(Σ) such that firing 휏푛−1 in 퐼푛−2 creates 퐼푛−2 ∪ (퐼푛 ∖ 퐼푛−1). Now we can apply the ′ * inductive hypothesis on 퐼0, . . . , 퐼푛−2, 퐼푛−2 ∪ (퐼푛 ∖ 퐼푛−1) and 휏1, . . . , 휏푛−2, 휏푛−1 to get 휏 ∈ * (︀ )︀ SSat(Σ) such that firing 휏 in 퐼0 creates 퐼0∪ (퐼푛−2∪(퐼푛∖퐼푛−1))∖퐼푛−2 = 퐼0∪(퐼푛∖퐼푛−1).

For the next proposition, recall the definition of evolve-completeness.

72 Definition 4.2.1 (Evolve-completeness). Let Σ be a set of GTGDs and Σ′ be a set of full TGDs. We say that Σ′ is evolve-complete with respect to Σ if it satisfies the following properties:

(a) For every full 휏 ∈ HNF(Σ), we have 휏 ∈ Σ′ (modulo renaming). (b) Let 퐼0, . . . , 퐼푛+1 be a chase, let 푟 be the node inside 퐼0, let 휏푟 ∈ HNF(Σ) be non-full ′ and 휏1, . . . , 휏푛 ∈ Σ be full such that

(i) firing 휏푟 in 퐼0 creates 퐼1, (ii) firing 휏푖 in 푣 in 퐼푖 creates 퐼푖+1, where 푣 is the child of 푟. ′ ′ 푛 ′ 푛+1 Then there exists a chase 퐼0, . . . , 퐼푚 from 퐹푟 and Σ of 퐹푟 .

An illustration of Property 4.2.1.(b) can be found in Figure 4.1.

Proposition 4.3.5 (Completeness). SSat(Σ) is evolve-complete with respect to Σ.

Proof. Property 4.2.1.(a) follows from Lemma 4.3.4.(a). For Property 4.2.1.(b), we first ′ ′ 1 1 푛+1 푛 use Corollary A.1 to obtain 휏 ∈ SSat(Σ) firing 휏 in 퐹푣 creates 퐹푣 ∪ (퐹푣 ∖ 퐹푣 ). Note 1 ′ that this is possible since |consts(퐹푣 )| = hwidth(휏푟) ≤ hwidth(HNF(Σ)). Let ℎ, ℎ be the ′ 푛 ′ ′ 푛 triggers used by 휏, 휏 , and let 퐹푟 , 퐼1, 퐼2 be the chase obtained by firing 휏 in 퐹푟 based on ′ ′ 1 1 ′ ′ ℎ, creating a new node 푣 , and firing 휏 in 퐹푣′ ⊇ 퐹푣 based on ℎ . Let 푟 be the root node ′ 1 푛+1 푛 2 푛+1 in this chase. We have 퐼2 ⊇ 퐹푣 ∪ (퐹푣 ∖ 퐹푣 ), and hence 퐹푟′ ⊇ 퐹푟 . Now, by applying 푛 ′ ′ ′ * * 푛 Lemma 4.3.4.(c) on 퐹푟 , 퐼1, 퐼2 and 휏, 휏 , we get 휏 ∈ SSat(Σ) such that firing 휏 in 퐹푟 2 푛 푛+1 creates 퐹푟′ . Thus, we have a chase from 퐹푟 and SSat(Σ) of 퐹푟 . Lemma 4.3.6. |SSat(Σ)| ≤ 22푛푤푎 .

Proof. By Lemma 4.3.3, SSat(Σ) is a set of full TGDs in variable normal form of width at most 푤. Now the claim follows by Lemma 4.4.15.

Lemma 4.3.7. The simple saturation algorithm terminates in PTIME for bounded 푤, 푛, 푎, EXPTIME for bounded 푎, and 2EXPTIME otherwise.

Proof. The initial variable and head normal form transformations can be done in lin- ear time by Lemma 4.1.3 and Definition 4.1.4. The algorithm then takes all pairs in HNF(Σ) × SSat(Σ) and SSat(Σ)2 to derive all possible resolvents of any pair. By 푎 Lemma 4.3.6, this process takes at most 푐 := 22푛푤 iterations, and at any point, there are at most 풪(푐2 + |Σ|푐) pairs. At any step, we have the possibility to unify any head and body atom pair of the two 푎 considered rules. By Lemmas 4.4.15 and 4.3.3, there are at most (푛푤푎)푛푤 such pairs. Unfortunately, we do not only consider mgus, but all unifiers. There are 푤2푤 potential unifiers. If two atoms unify, we resolve the rules and normalise the result to obtain a new rule. Using Lemmas 2.1.13 and 4.1.3, these steps can be done in linear time. Thus, we obtain the claimed complexity bounds.

Theorem 4.3.8. The simple saturation algorithm provides a decision procedure for the Quantifier-Free Query Answering Problem under GTGDs. The procedure terminates in PTIME for fixed Σ, EXPTIME for bounded 푎, and 2EXPTIME otherwise.

73 Appendix A. Proofs for Section 4.3

Proof. Since SSat(Σ) is evolve-complete, it is complete by Proposition 4.2.5. It is easy to show that the algorithm is sound. Now the claim follows by Proposition 2.5.1 and Lemma 4.3.7.

Note that the matching lower bounds can be found in Theorem 4.4.21.

74 Bibliography

Serge Abiteboul, Richard Hull, and Victor Vianu. Foundations of databases: the logical level. Addison-Wesley Longman Publishing Co., Inc., 1995.

Shqiponja Ahmetaj, Magdalena Ortiz, and Mantas Simkus. Rewriting guarded existential rules into small datalog programs. In LIPIcs-Leibniz International Proceedings in Infor- matics, volume 98, pages 4:1–4:24. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, 2018.

Antoine Amarilli and Michael Benedikt. When can we answer queries using result-bounded data interfaces? In Proceedings of the 35th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems, pages 281–293. ACM, 2018. For a more detailed version, but with some known bugs in it, see https://arxiv.org/abs/1706.07936.

Hajnal Andréka, István Németi, and Johan van Benthem. Modal languages and bounded fragments of predicate logic. Journal of Philosophical Logic, 27(3):217–274, 1998.

Franz Baader, Diego Calvanese, Deborah McGuinness, Peter Patel-Schneider, and Daniele Nardi. The Description Logic Handbook: Theory, Implementation and Applications. Cambridge university press, 2003.

Leo Bachmair and Harald Ganzinger. Resolution Theorem Proving. Handbook of automated reasoning, 1:19–99, 2001.

Jean-François Baget, Marie-Laure Mugnier, Sebastian Rudolph, and Michaël Thomazo. Walking the complexity lines for generalized guarded existential rules. In IJCAI Proceedings-International Joint Conference on Artificial Intelligence, volume 22, page 712, 2011.

Vince Bárány, Balder Ten Cate, and Luc Segoufin. Guarded negation. In International Colloquium on Automata, Languages, and Programming, pages 356–367. Springer, 2011.

Vince Bárány, Michael Benedikt, and Balder Ten Cate. Rewriting Guarded Negation Queries. In International Symposium on Mathematical Foundations of Computer Science, pages 98–110. Springer, 2013.

Catriel Beeri and Moshe Y Vardi. A Proof Procedure for Data Dependencies. Journal of the ACM (JACM), 31(4):718–741, 1984.

Michael Benedikt, Julien Leblay, and Efthymia Tsamoura. Querying with access patterns and integrity constraints. Proceedings of the VLDB Endowment, 8(6):690–701, 2015.

75 Bibliography

Pierre Bourhis, Marco Manna, Michael Morak, and Andreas Pieris. Guarded-based disjunctive tuple-generating dependencies. ACM Transactions on Database Systems (TODS), 41(4):27, 2016.

Andrea Calì and Davide Martinenghi. Querying incomplete data over extended er schemata. Theory and Practice of Logic Programming, 10(3):291–329, 2010.

Andrea Calì, Georg Gottlob, and Thomas Lukasiewicz. Datalog±: a unified approach to ontologies and integrity constraints. In Proceedings of the 12th International Conference on Database Theory, pages 14–30. ACM, 2009.

Andrea Calì, Georg Gottlob, and Michael Kifer. Taming the Infinite Chase: Query Answering under Expressive Relational Constraints. Journal of Artificial Intelligence Research, 48:115–174, 2013.

Alonzo Church. An unsolvable problem of elementary number theory. American journal of mathematics, 58(2):345–363, 1936.

Chiara Cumbo, Wolfgang Faber, Gianluigi Greco, and Nicola Leone. Enhancing the magic-set method for disjunctive datalog programs. In International Conference on Logic Programming, pages 371–385. Springer, 2004.

Evgeny Dantsin, Thomas Eiter, Georg Gottlob, and Andrei Voronkov. Complexity and expressive power of logic programming. ACM Computing Surveys (CSUR), 33(3): 374–425, 2001.

Hans De Nivelle and Maarten De Rijke. Deciding the guarded fragments by resolution. Journal of Symbolic Computation, 35(1):21–58, 2003.

Alin Deutsch, Alan Nash, and Jeff Remmel. The chase revisited. In Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 149–158. ACM, 2008.

Thomas Eiter, Georg Gottlob, and Heikki Mannila. Disjunctive datalog. ACM Transactions on Database Systems (TODS), 22(3):364–418, 1997.

Ronald Fagin, Phokion G Kolaitis, Renée J Miller, and Lucian Popa. Data exchange: semantics and query answering, 2005.

Melvin Fitting. First-order logic and automated theorem proving. Springer Science & Business Media, 2012.

Harald Ganzinger and Hans De Nivelle. A superposition decision procedure for the guarded fragment with equality. In Logic in Computer Science, 1999. Proceedings. 14th Symposium on, pages 295–303. IEEE, 1999.

Georg Gottlob, Marco Manna, Michael Morak, and Andreas Pieris. On the Complexity of Ontological Reasoning under Disjunctive Existential Rules. In International Symposium on Mathematical Foundations of Computer Science, pages 1–18. Springer, 2012.

76 Bibliography

Georg Gottlob, Sebastian Rudolph, and Mantas Simkus. Expressiveness of guarded existential rule languages. In Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pages 27–38. ACM, 2014.

Erich Grädel. On the restraining power of guards. The Journal of Symbolic Logic, 64(4): 1719–1742, 1999.

Jacques Herbrand. Recherches sur la théorie de la démonstration. 1930.

Wilfrid Hodges. A shorter . Cambridge university press, 1997.

Ullrich Hustadt, Boris Motik, and Ulrike Sattler. Reducing shiq-description logic to disjunctive datalog programs. KR, 4:152–162, 2004.

Ullrich Hustadt, Boris Motik, and Ulrike Sattler. Reasoning in description logics by a reduction to disjunctive datalog. Journal of Automated Reasoning, 39(3):351–384, 2007.

David Maier, Alberto O Mendelzon, and Yehoshua Sagiv. Testing implications of data dependencies. ACM Transactions on Database Systems (TODS), 4(4):455–469, 1979.

M. S. Paterson and M. N. Wegman. Linear unification. In Proceedings of the Eighth Annual ACM Symposium on Theory of Computing, STOC ’76, pages 181–186, New York, NY, USA, 1976. ACM.

Michael O Rabin. Decidability of second-order theories and automata on infinite trees. Transactions of the american Mathematical Society, 141:1–35, 1969.

John Alan Robinson. A machine-oriented logic based on the resolution principle. Journal of the ACM (JACM), 12(1):23–41, 1965.

Anders Schlichtkrull. Formalization of Resolution Calculus in Isabelle, 2015. Available at https://people.compute.dtu.dk/andschl/Thesis.pdf.

Alan M Turing. On computable numbers, with an application to the entscheidungsproblem. Proceedings of the London mathematical society, 2(1):230–265, 1937.

77

Notation

− 퐹푣 , 17 DGSat∃(Σ), 55 푖 퐹푣, 17 GSat(Σ), 38 푁푣, 20 GSat∃(Σ), 38 푄푇 , 48 HNF(Σ), 30 푆+, 53 IFC(Σ), 48 푆−, 53 SHNF(Σ), 51 푇 푁 , 20 SSat(Σ), 35 푇푖, 20 Sk(Σ), 53 [푡푖/푥푗],7 VNF(Σ), 29, 54 풜 |= 휙,6 bwidth(Σ), 30, 54 풞,5 consts(휙),5 풟,9 cut(푇, 푆), 20 ℛ,5 dom(풜),5 풯 ,5 hwidth(Σ), 30, 54 풱,5 lvs(푇 ), 18 ⃗푥푖, 10 size(Σ), 30 ⃗푡,5 vars(휙),5 DGSat(Σ), 55 width(Σ), 30, 54

79

Index

atom,5 GTGD, 31, 73 atomic query, 10 existential variable check, 38 atomic rewriting, 12 fact,9 bag, 16 full disjunctive TGD, 10 BCQ, see boolean conjunctive query full TGD, 10 birth facts, 17 GF, see guarded fragment body, 10 GTGD, see guarded TGD body width guard, 10, 16 DisTGD, 54 guarded fragment,8 guarded simple rule, 54 guarded saturation, 38 TGD, 30 guarded simple rules, 53 boolean conjunctive query, 10 guarded subset, 16 chase, 14 guarded TGD, 10 chase proof guarded tree decomposition, 16 DisTGD, 19 guardedly-complete, 16 TGD, 15 head, 10 chase sequence, 15 head conjunction, 10 chase step, 15 head normal form, 30 chase tree, 18 head width clause,8 DisTGD, 54 completeness, 12 guarded simple rule, 54 conjunctive query, 10 TGD, 30 see CQ, conjunctive query homomorphism,6 database,9 immediate full consequence, 48, 54 DisGTGD, see disjunctive guarded TGD instance,9 disjunctive chase, 18 disjunctive chase step, 18 literals,8 disjunctive guarded saturation, 55 mgu, see most general unifier disjunctive guarded TGD, 10 most general unifier,7 disjunctive TGD, 10 DisTGD, see disjunctive TGD non-full head conjunction, 10 non-strict ancestor (descendant), 21 evc, see existential variable check evolve-completeness one-pass chase sequence, 21 DisGTGD, 48, 57 one-pass chase tree, 24

81 INDEX

Quantifier-Free Query Answering thread, 18 Problem, 11 trigger, 15 tuple-generating dependency, 10 relational schema,9 renaming,7 UBCQ, see union of boolean conjunctive queries shallow term, 53 UCQ, see union of conjunctive queries simple unifier,7 atom, 53 union of boolean conjunctive queries, 10 fact, 53 union of conjunctive queries, 10 set, 53 simple saturation, 35 variable normal form single head normal form, 51 guarded simple rule, 54 Skolem normal form, 52 TGD, 29 Skolemisation,6, 52 soundness, 13 width substitution,7 DisTGD, 54 guarded simple rule, 54 TGD, see tuple-generating dependency TGD, 30

82