Datalog on Infinite Structures

Datalog on Infinite Structures DISSERTATION zur Erlangung des akademischen Grades doctor rerum naturalium (Dr. rer. nat.) im Fach Informatik eingereicht an der Mathematisch-Naturwissenschaftlichen Fakultät II Humboldt-Universität zu Berlin von Herr Dipl.-Math. Goetz Schwandtner geboren am 10.09.1976 in Mainz Präsident der Humboldt-Universität zu Berlin: Prof. Dr. Christoph Markschies Dekan der Mathematisch-Naturwissenschaftlichen Fakultät II: Prof. Dr. Wolfgang Coy Gutachter: 1. Prof. Dr. Martin Grohe 2. Prof. Dr. Nicole Schweikardt 3. Prof. Dr. Thomas Schwentick Tag der mündlichen Prüfung: 24. Oktober 2008 Abstract Datalog is the relational variant of logic programming and has become a standard query language in database theory. The (program) complexity of datalog in its main context so far, on finite databases, is well known to be in EXPTIME. We research the complexity of datalog on infinite databases, motivated by possible applications of datalog to infinite structures (e.g. linear orders) in temporal and spatial reasoning on one hand and the upcoming interest in infinite structures in problems related to datalog, like constraint satisfaction problems: Unlike datalog on finite databases, on infinite structures the computations may take infinitely long, leading to the undecidability of datalog on some infinite structures. But even in the decidable cases datalog on infinite structures may have arbitrarily high complexity, and because of this result, we research some structures with the lowest complexity of datalog on infinite structures: Datalog on linear orders (also dense or discrete, with and without constants, even colored) and tree orders has EXPTIME-complete complexity. To achieve the upper bound on these structures, we introduce a tool set specialized for datalog on orders: Order types, distance types and type disjoint programs. The type concept yields a finite representation of the infinite program results, which could also be of interest for practical applications. We create special type disjoint versions of the programs allowing to solve datalog without the recursion inherent in each datalog program. A transfer of our methods shows that constraint satisfaction problems on infinite structures occur with arbitrarily high time complexity, like datalog. Keywords: Datalog, Infinite Structures, Complexity, Decidability Zusammenfassung Datalog ist die relationale Variante der logischen Programmierung und ist eine Stan- dard-Abfragesprache in der Datenbankentheorie geworden. Die Programmkomplexi- tät von Datalog im bisherigen Hauptanwendungsgebiet, auf endlichen Strukturen, ist bekanntermassen in EXPTIME. Wir untersuchen die Komplexität von Datalog auf unendlichen Strukturen, motiviert durch mögliche Anwendungen von Datalog auf unendlichen Strukturen (z.B. linearen Ordnungen) im zeitlichen und räumlichen Schliessen, aber auch durch das aufkommende Interesse an unendlichen Strukturen bei verwandten theoretischen Problemen, wie Constraint Satisfaction Problems (CSP): Im Gegensatz zu endlichen Strukturen können Datalog-Berechnungen auf unendlichen Strukturen unendlich lange dauern, was zur Unentscheidbarkeit von Datalog auf unendlichen Strukturen führen kann. Aber auch in den entscheidbaren Fällen kann die Komplexität von Datalog auf unendlichen Strukturen beliebig hoch sein. Im Hinblick auf dieses Ergebnis widmen wir uns dann unendlichen Strukturen mit der niedrigsten Komplexität von Datalog: Wir zeigen, dass Datalog auf linearen Ordnun- gen (auch dichte und diskrete, mit oder ohne Konstanten und sogar gefärbte) und Baumordnungen EXPTIME-vollständig ist. Für die Bestimmung der oberen Schranke werden Werkzeuge für Datalog auf Ordnungen eingeführt: Ordnungstypen, Abstandstypen und typdisjunkte Program- me. Die Typkonzepte liefern eine endliche Beschreibung der unendlichen Program- mergebnisse und könnten auch für praktische Anwendungen von Interesse sein. Wir erzeugen spezielle typdisjunkte Programme, die sich ohne Rekursion lösen lassen. Ein Transfer unserer Methoden auf CSPs zeigt, dass CSPs auf unendlichen Struk- turen mit beliebig hoher Zeitkomplexität vorkommen, wie Datalog. Schlagwörter: Datalog, Unendliche Strukturen, Komplexität, Entscheidbarkeit Acknowledgements I want to thank my thesis advisor Martin Grohe for guiding me through the world of datalog on infinite structures in many valuable discussions and with many helpful suggestions and comments on my ideas. I thank Nicole Schweikardt for several interesting discussions leading to new ideas and views on the research presented in this thesis. I am grateful to Mark Weyer for his careful proofreading of most of my ideas and pointing out mistakes in the proofs, and of course for his suggestions about possible solutions. Thanks go to Thomas Gottron and Kord Eickmeyer for proofreading parts of this thesis. Although he has not participated in the work at this thesis directly, I want to thank my former advisor Clemens Lautemann. With his interesting lectures and seminars and in numerous discussions he has aroused my curiosity about questions in computer science, especially in complexity theory and logics. Sad, that he passed away so soon! I am thankful to my colleagues at the computer science department at the Jo- hannes Gutenberg-University in Mainz, especially Elmar Schömer, to enable me to finish my research project after the sudden loss of my former advisor and also Thomas Schwentick for suggesting Martin as new advisor. Last but not least I thank my parents for their continuing support. v vi Contents 1 Introduction1 1.1 Published Results.............................5 1.2 Outline of This Thesis..........................5 2 Datalog, Infinite Structures9 2.1 Notations from Logics and Model Theory................9 2.1.1 Structures.............................9 2.1.2 Homomorphisms......................... 10 2.2 Datalog.................................. 10 2.2.1 Datalog Computation....................... 10 2.2.2 Program Parameters....................... 13 2.2.3 Datalog and Logics........................ 13 2.2.4 Datalog and Negation...................... 13 2.3 Datalog Problems............................. 14 2.4 Linear Orders and Successor Structures................. 15 2.5 Representation of Infinite Structures.................. 16 2.5.1 Representations and Datalog................... 16 2.5.2 Computational Model Theory.................. 19 2.6 Allen’s Interval Algebra......................... 21 3 Lower Bounds for Datalog 25 3.1 Universal Turing Machine Simulation.................. 25 3.2 EXPTIME-hard Lower Bound...................... 27 3.3 Summary of this Chapter......................... 29 4 Undecidability of Datalog on Infinite Structures 31 4.1 Successor Structures........................... 31 4.2 Successor-Like Structures......................... 32 4.2.1 Beyond Decidability....................... 35 4.2.2 Undecidability and Boundedness................ 36 4.3 Structures for Complexity Functions.................. 37 4.3.1 Lower Bounds........................... 38 4.3.2 Datalog Games.......................... 40 4.4 Summary of this Chapter......................... 46 vii 5 Types and Upper Bounds 49 5.1 Distance Types.............................. 50 5.2 Simple Cases............................... 56 5.2.1 Non-Strict Orders......................... 57 5.2.2 Monadic Datalog......................... 57 5.3 DATALOG-NONEMPTINESS on Orders................ 59 5.4 Boundedness and DATALOG-TUPLE on Orders............ 63 5.5 Datalog on Dense Orders......................... 70 5.6 Datalog on Orders with Constants.................... 73 5.7 Negation in Datalog Rules........................ 75 5.8 Representations and DATALOG-TUPLE................ 76 5.8.1 Monadic Datalog......................... 76 5.8.2 Linear Orders........................... 77 5.8.3 Dense Linear Orders....................... 77 5.8.4 Linear Orders With Constants.................. 77 5.8.5 Decidable Structures....................... 77 5.9 Summary of this Chapter......................... 78 6 Datalog on Colored Orders 81 6.1 Colored Orders.............................. 82 6.1.1 Colored Orders and Types.................... 83 6.1.2 Interval Types........................... 84 6.1.3 Modifying Datalog Programs.................. 95 6.2 Orders with one Monadic Relation and Constants........... 96 6.3 Colored Orders With Constants..................... 104 6.3.1 A Partition Lemma for Infinite Linear Orders......... 105 6.3.2 Monadic Memberships...................... 108 6.3.3 Structure Transformations.................... 108 6.3.4 Program Transformations.................... 113 6.4 Negation in Datalog Rules........................ 119 6.5 Colored Orders and DATALOG-TUPLE................ 120 6.6 Summary of this Chapter......................... 122 7 Datalog on Tree Orders 125 7.1 Preliminaries............................... 126 7.1.1 Tree Orders............................ 126 7.1.2 DATLOG-NONEMPTINESS and Homomorphisms...... 127 7.2 Upper Bound for Tree Orders...................... 128 7.3 Upper Bounds for Tree Orders with Constants............. 129 7.4 Tree Orders with Dense Parts...................... 135 7.5 Tree Order Variants............................ 137 7.6 DATALOG-TUPLE............................ 138 7.7 Tree Orders and Negation........................ 139 7.8 Summary of this Chapter......................... 139 viii 8 Constraint Satisfaction 141 8.1 CSPs.................................... 141 8.2 Transfer of Order Results to CSP.................... 144 8.3 EXPTIME-Undecidable Non-Dichotomy...............

Datalog on Infinite Structures

Dedalus: Datalog in Time and Space

GRAPH DATABASE THEORY Comparing Graph and Relational Data Models

A Deductive Database with Datalog and SQL Query Languages

DATALOG and Constraints

Data Definition Language (Ddl)

Constraint-Based Synthesis of Datalog Programs

Datalog Logic As a Query Language a Logical Rule Anatomy of a Rule Anatomy of a Rule Sub-Goals Are Atoms

A Dictionary of DBMS Terms

Logical Query Languages

Shadow Theory: Data Model Design for Data Integration

Mapping OCL As a Query and Constraint Language Traduciendo OCL Como Lenguaje De Consultas Y Restricciones

A Theory of Regular Queries