PGQL: a Property Graph Query Language

Total Page:16

File Type:pdf, Size:1020Kb

PGQL: a Property Graph Query Language PGQL: a Property Graph Query Language Oskar van Rest Sungpack Hong Jinha Kim Oracle Labs Oracle Labs Oracle Labs [email protected] [email protected] [email protected] Xuming Meng Hassan Chafi Oracle Labs Oracle Labs [email protected] hassan.chafi@oracle.com ABSTRACT need has arisen for a well-designed query language for graphs that Graph-based approaches to data analysis have become more supports not only typical SQL-like queries over structured data, but widespread, which has given need for a query language for also graph-style queries for reachability analysis, path finding, and graphs. Such a graph query language needs not only SQL-like graph construction. It is important that such a query language is functionality for querying structured data, but also intrinsic general enough, but also has a right level between expressive power support for typical graph-style applications: reachability anal- and ability to process queries efficiently on large-scale data. ysis, path finding and graph construction. Various graph query languages have been proposed in the past. We propose a new query language for the popular Property Most notably, there is SPARQL, which is the standard query lan- Graph (PG) data model: the Property Graph Query Language guage for the RDF (Resource Description Framework) graph data (PGQL). PGQL is based on the paradigm of graph pattern model, and which has been adopted by several graph databases [1, matching, closely follows syntactic structures of SQL, and pro- 7, 12]. However, the RDF data model regularizes the graph repre- vides regular path queries with conditions on labels and prop- sentation as set of triples (or edges), which means that even con- erties to allow for reachability and path finding queries. Be- stant literals are encoded as graph vertices. Such artificial vertices sides intrinsic vertex, edge and path types, PGQL also has the make it hard to express graph queries in a natural way. A more nat- graph as intrinsic type and allows for graph construction and ural data model is the Property Graph (PG) data model, in which query composition. vertices and edges in a graph can be associated with arbitrary prop- erties as key-value pairs. The PG data model has been adopted by various graph databases [5, 8, 11, 4] and graph processing frame- CCS Concepts works [6, 2], and has formed the basis of recently proposed lan- •Information systems ! Query languages for non-relational guages for graph algorithms [22, 10, 23] and graph queries [3, 21]. engines; A notable query language for the PG data model is Cypher [3], which is, just like SPARQL, based on graph pattern matching, an Keywords elegant way of defining patterns in graphs. However, Cypher is missing some fundamental graph querying functionalities, namely, Graph query languages, graph-structured data, property graphs regular path queries and graph construction. To overcome the limitations of existing graph query languages, 1. INTRODUCTION we designed a new query language for the PG data model: the In recent years, both the data management community and the Property Graph Query Language (PGQL). PGQL combines graph data mining community have been paying a lot of attention to the pattern matching with SQL-like syntax and functionality and has graph-based approaches in which graphs are used as fundamental full-blown support for regular path queries and graph construction. representation for data analysis. The graph data model allows for Because its syntax is SQL-like, the language is intuitive to use for promising analyses over topology, in addition to analyses over data existing SQL users. Furthermore, PGQL queries return a “result stored in topology, which can be performed by standard relational set” with variables and bindings, just like in SQL. This means that mechanisms. queries can be naturally nested inside SQL queries. In PGQL, a Essential for typical data analyses is a query language to query graph pattern is not only composed of vertices and edges, but also data in a high-level and productive manner. SQL, having been of paths, such that users can perform reachability analyses or find widely adopted as the query language for relational databases, il- paths in a graph. Supported is the well-studied class of regular path lustrates very well the importance of query languages in data man- queries, but not just with conditions on labels of edges, but with agement. Now graph technology has become more widespread, the conditions on labels as well as properties of vertices and edges along paths. PGQL also has an intrinsic graph type to support Permission to make digital or hard copies of all or part of this work for graph transformation applications, allowing one or more graphs to personal or classroom use is granted without fee provided that copies are be constructed and returned from a query. not made or distributed for profit or commercial advantage and that copies Specifically, our contributions are as follows: (i) the design of bear this notice and the full citation on the first page. To copy otherwise, to PGQL, an intuitive high-level pattern-matching query language for republish, to post on servers or to redistribute to lists, requires prior special the PG data model, which closely follows SQL’s syntactic struc- permission and/or a fee. tures and provides SQL-like functionality as part of the solution Proceedings of the Fourth International Workshop on Graph Data Manage- (ii) ment Experience and Systems (GRADES 2016), June 24, 2016, Redwood (Section 3), the design of a syntax for regular path queries for Shores, USA PG graphs with conditions on labels and properties of vertices and Copyright 2016 ACM XXXXXXXXXXXX ...$15.00. edges along paths (Section 4), (iii) an elegant solution to graph 1 SELECT friend.name, friend.age construction inside a query language, allowing for graphs to be re- 2 //'Andy' 12(query solutions in comment) 3 FROM snGraph turned from a query and for graph queries to be composed (Sec- 4 WHERE tion 5). 5 (x WITH name = 'Paul') -[:likes]-> (friend), 6 (y WITH name = 'Amber') -/:likes*1..2/-> (friend), 7 x.age >= 2 * friend.age 2. RELATED WORK 8 ORDER BY friend.name There are two types of languages for graphs. First, there are the Figure 1: Example PGQL query, returning the friends of Paul that graph pattern-matching query languages that are similar to PGQL. are also friends, or friends of friends, of Amber. Only friends that Most notably, SPARQL [9] is the standard query language for the are at least twice as young are returned. RDF model. While SPARQL supports reachability by means of regular path queries (RPQs), actual paths that satisfy an RPQ can- not be returned from a query. Furthermore, conditions that compare 3. PATTERN MATCHING QUERIES vertices and edges along paths cannot be specified. Earlier propos- PGQL is based on the paradigm of graph pattern matching, which als [17, 16, 19, 27], which were also based on a labeled graph, is an elegant way of defining pattern in graphs (Section 3.1). It suffered from similar limitations. Cypher [3], the query language combines this with SQL-like syntax and also provides SQL-like of the graph database Neo4j, was the first pattern-matching query functionality as part of the solution (Section 3.2). language to target the PG model. While it has an intrinsic path type, path query support is based on finding all simple trails (i.e. paths 3.1 Graph Pattern Matching with non-repeating edges). However, evaluation of such queries A PGQL query is composed of the three clauses SELECT, FROM often requires iteration over the entire set of trails, which is an op- and WHERE, which are followed by optional solution modifier clauses eration that is exponential in the size of the graph and therefore such as ORDER BY, GROUP BY, and LIMIT. The FROM clause can also unsuitable for large graphs. Furthermore, Cypher does not have be omitted when there is only one graph instance. Additionally, an intrinsic graph type to support graph construction applications. PGQL includes special operators for graph pattern matching: ver- GraphQL [21] is a pattern-matching query language based on a PG- tex matching, edge matching and path matching. These matching like model. It supports recursion in such a way that not only paths operators are placed in the WHERE clause along with predicates over but even entire graph patterns can be the unit of repetition. How- vertices and edges. ever, the output of a GraphQL query is a set of graphs rather than Figure 1 shows an example PGQL query which finds patterns a result set with variables and bindings. In practice we are often from a data graph named snGraph (line 3). In the WHERE clause, interested in the data (i.e. properties) inside vertices and edges and the query matches a vertex x that has a property name with value such a lanugage does not allow you to extract such information. ’Paul’. The vertex x has an edge with the label ’likes’. The PQL [24] is a pattern-matching query language for biological net- destination vertex of this edge is referred to as vertex friend (line works with SQL-like syntax. Its WHERE clause supports reachability 5). Similarly, the query matches another vertex y with its name be- patterns, while actual paths are constructed in the SELECT clause ing ’Amber’. The vertex y is also connected to the vertex friend using pairs of bound variables from the WHERE clause. but through a path; the path is composed only of edges with label Second, there are the more general purpose imperative graph ’likes’ and the (hop-)length of the path is between 1 and 2 in- languages.
Recommended publications
  • Large Scale Querying and Processing for Property Graphs Phd Symposium∗
    Large Scale Querying and Processing for Property Graphs PhD Symposium∗ Mohamed Ragab Data Systems Group, University of Tartu Tartu, Estonia [email protected] ABSTRACT Recently, large scale graph data management, querying and pro- cessing have experienced a renaissance in several timely applica- tion domains (e.g., social networks, bibliographical networks and knowledge graphs). However, these applications still introduce new challenges with large-scale graph processing. Therefore, recently, we have witnessed a remarkable growth in the preva- lence of work on graph processing in both academia and industry. Querying and processing large graphs is an interesting and chal- lenging task. Recently, several centralized/distributed large-scale graph processing frameworks have been developed. However, they mainly focus on batch graph analytics. On the other hand, the state-of-the-art graph databases can’t sustain for distributed Figure 1: A simple example of a Property Graph efficient querying for large graphs with complex queries. Inpar- ticular, online large scale graph querying engines are still limited. In this paper, we present a research plan shipped with the state- graph data following the core principles of relational database systems [10]. Popular Graph databases include Neo4j1, Titan2, of-the-art techniques for large-scale property graph querying and 3 4 processing. We present our goals and initial results for querying ArangoDB and HyperGraphDB among many others. and processing large property graphs based on the emerging and In general, graphs can be represented in different data mod- promising Apache Spark framework, a defacto standard platform els [1]. In practice, the two most commonly-used graph data models are: Edge-Directed/Labelled graph (e.g.
    [Show full text]
  • Dedalus: Datalog in Time and Space
    Dedalus: Datalog in Time and Space Peter Alvaro1, William R. Marczak1, Neil Conway1, Joseph M. Hellerstein1, David Maier2, and Russell Sears3 1 University of California, Berkeley {palvaro,wrm,nrc,hellerstein}@cs.berkeley.edu 2 Portland State University [email protected] 3 Yahoo! Research [email protected] Abstract. Recent research has explored using Datalog-based languages to ex- press a distributed system as a set of logical invariants. Two properties of dis- tributed systems proved difficult to model in Datalog. First, the state of any such system evolves with its execution. Second, deductions in these systems may be arbitrarily delayed, dropped, or reordered by the unreliable network links they must traverse. Previous efforts addressed the former by extending Datalog to in- clude updates, key constraints, persistence and events, and the latter by assuming ordered and reliable delivery while ignoring delay. These details have a semantics outside Datalog, which increases the complexity of the language and its interpre- tation, and forces programmers to think operationally. We argue that the missing component from these previous languages is a notion of time. In this paper we present Dedalus, a foundation language for programming and reasoning about distributed systems. Dedalus reduces to a subset of Datalog with negation, aggregate functions, successor and choice, and adds an explicit notion of logical time to the language. We show that Dedalus provides a declarative foundation for the two signature features of distributed systems: mutable state, and asynchronous processing and communication. Given these two features, we address two important properties of programs in a domain-specific manner: a no- tion of safety appropriate to non-terminating computations, and stratified mono- tonic reasoning with negation over time.
    [Show full text]
  • A Deductive Database with Datalog and SQL Query Languages
    A Deductive Database with Datalog and SQL Query Languages FernandoFernando SSááenzenz PPéérez,rez, RafaelRafael CaballeroCaballero andand YolandaYolanda GarcGarcííaa--RuizRuiz GrupoGrupo dede ProgramaciProgramacióónn DeclarativaDeclarativa (GPD)(GPD) UniversidadUniversidad ComplutenseComplutense dede MadridMadrid (Spain)(Spain) 12/5/2011 APLAS 2011 1 ~ ContentsContents 1.1. IntroductionIntroduction 2.2. QueryQuery LanguagesLanguages 3.3. IntegrityIntegrity ConstraintsConstraints 4.4. DuplicatesDuplicates 5.5. OuterOuter JoinsJoins 6.6. AggregatesAggregates 7.7. DebuggersDebuggers andand TracersTracers 8.8. SQLSQL TestTest CaseCase GeneratorGenerator 9.9. ConclusionsConclusions 12/5/2011 APLAS 2011 2 ~ 1.1. IntroductionIntroduction SomeSome concepts:concepts: DatabaseDatabase (DB)(DB) DatabaseDatabase ManagementManagement SystemSystem (DBMS)(DBMS) DataData modelmodel (Abstract) data structures Operations Constraints 12/5/2011 APLAS 2011 3 ~ IntroductionIntroduction DeDe--factofacto standardstandard technologiestechnologies inin databases:databases: “Relational” model SQL But,But, aa currentcurrent trendtrend towardstowards deductivedeductive databases:databases: Datalog 2.0 Conference The resurgence of Datalog inin academiaacademia andand industryindustry Ontologies Semantic Web Social networks Policy languages 12/5/2011 APLAS 2011 4 ~ Introduction.Introduction. SystemsSystems Classic academic deductive systems: LDL++ (UCLA) CORAL (Univ. of Wisconsin) NAIL! (Stanford University) Ongoing developments Recent commercial
    [Show full text]
  • DATALOG and Constraints
    7. DATALOG and Constraints Peter Z. Revesz 7.1 Introduction Recursion is an important feature to express many natural queries. The most studied recursive query language for databases is called DATALOG, an ab- breviation for "Database logic programs." In Section 2.8 we gave the basic definitions for DATALOG, and we also saw that the language is not closed for important constraint classes such as linear equalities. We have also seen some results on restricting the language, and the resulting expressive power, in Section 4.4.1. In this chapter, we study the evaluation of DATALOG with constraints in more detail , focusing on classes of constraints for which the language is closed. In Section 7.2, we present a general evaluation method for DATALOG with constraints. In Section 7.3, we consider termination of query evaluation and define safety and data complexity. In the subsequent sections, we discuss the evaluation of DATALOG queries with particular types of constraints, and their applications. 7.2 Evaluation of DATALOG with Constraints The original definitions of recursion in Section 2.8 discussed unrestricted relations. We now consider constraint relations, and first show how DATALOG queries with constraints can be evaluated. Assume that we have a rule of the form Ro(xl, · · · , Xk) :- R1 (xl,l, · · ·, Xl ,k 1 ), • • ·, Rn(Xn,l, .. ·, Xn ,kn), '¢ , as well as, for i = 1, ... , n , facts (in the database, or derived) of the form where 'if;; is a conjunction of constraints. A constraint rule application of the above rule with these facts as input produces the following derived fact: Ro(x1 , ..
    [Show full text]
  • The Query Translation Landscape: a Survey
    The Query Translation Landscape: a Survey Mohamed Nadjib Mami1, Damien Graux2,1, Harsh Thakkar3, Simon Scerri1, Sren Auer4,5, and Jens Lehmann1,3 1Enterprise Information Systems, Fraunhofer IAIS, St. Augustin & Dresden, Germany 2ADAPT Centre, Trinity College of Dublin, Ireland 3Smart Data Analytics group, University of Bonn, Germany 4TIB Leibniz Information Centre for Science and Technology, Germany 5L3S Research Center, Leibniz University of Hannover, Germany October 2019 Abstract Whereas the availability of data has seen a manyfold increase in past years, its value can be only shown if the data variety is effectively tackled —one of the prominent Big Data challenges. The lack of data interoperability limits the potential of its collective use for novel applications. Achieving interoperability through the full transformation and integration of diverse data structures remains an ideal that is hard, if not impossible, to achieve. Instead, methods that can simultaneously interpret different types of data available in different data structures and formats have been explored. On the other hand, many query languages have been designed to enable users to interact with the data, from relational, to object-oriented, to hierarchical, to the multitude emerging NoSQL languages. Therefore, the interoperability issue could be solved not by enforcing physical data transformation, but by looking at techniques that are able to query heterogeneous sources using one uniform language. Both industry and research communities have been keen to develop such techniques, which require the translation of a chosen ’universal’ query language to the various data model specific query languages that make the underlying data accessible. In this article, we survey more than forty query translation methods and tools for popular query languages, and classify them according to eight criteria.
    [Show full text]
  • Constraint-Based Synthesis of Datalog Programs
    Constraint-Based Synthesis of Datalog Programs B Aws Albarghouthi1( ), Paraschos Koutris1, Mayur Naik2, and Calvin Smith1 1 University of Wisconsin–Madison, Madison, USA [email protected] 2 Unviersity of Pennsylvania, Philadelphia, USA Abstract. We study the problem of synthesizing recursive Datalog pro- grams from examples. We propose a constraint-based synthesis approach that uses an smt solver to efficiently navigate the space of Datalog pro- grams and their corresponding derivation trees. We demonstrate our technique’s ability to synthesize a range of graph-manipulating recursive programs from a small number of examples. In addition, we demonstrate our technique’s potential for use in automatic construction of program analyses from example programs and desired analysis output. 1 Introduction The program synthesis problem—as studied in verification and AI—involves con- structing an executable program that satisfies a specification. Recently, there has been a surge of interest in programming by example (pbe), where the specification is a set of input–output examples that the program should satisfy [2,7,11,14,22]. The primary motivations behind pbe have been to (i) allow end users with no programming knowledge to automatically construct desired computations by supplying examples, and (ii) enable automatic construction and repair of pro- grams from tests, e.g., in test-driven development [23]. In this paper, we present a constraint-based approach to synthesizing Data- log programs from examples. A Datalog program is comprised of a set of Horn clauses encoding monotone, recursive constraints between relations. Our pri- mary motivation in targeting Datalog is to expand the range of synthesizable programs to the new domains addressed by Datalog.
    [Show full text]
  • Graph Types and Language Interoperation
    March 2019 W3C workshop in Berlin on graph data management standards Graph types and Language interoperation Peter Furniss, Alastair Green, Hannes Voigt Neo4j Query Languages Standards and Research Team 11 January 2019 We are actively involved in the openCypher community, SQL/PGQ standards process and the ISO GQL (Graph Query Language) initiative. In these venues we in Neo4j are working towards the goals of a single industry-standard graph query language (GQL) for the property graph data model. We ​ ​ feel it is important that GQL should inter-operate well with other languages (for property graphs, and for other data models). Hannes is a co-author of the SIGMOD 2017 G-CORE paper and a co-author of the ​ ​ recent book Querying Graphs (Synthesis Lectures on Data Management). ​ ​ Alastair heads the Neo4j query languages team, is the author of the The GQL ​ Manifesto and of Working towards a New Work Item for GQL, to complement SQL ​ ​ PGQ. ​ Peter has worked on the design and early implementations of property graph typing in the Cypher for Apache Spark project. He is a former editor of OSI/TP and OASIS ​ ​ Business Transaction Protocol standards. Along with Hannes and Alastair, Peter has centrally contributed to proposals for SQL/PGQ Property Graph Schema. ​ We would like to contribute to or help lead discussion on two linked topics. ​ ​ Property Graph Types The information content of a classic Chen Entity-Relationship model, in combination with “mixin” multiple inheritance of structured data types, allows a very concise and flexible expression of the type of a property graph. A named (catalogued) graph type is composed of node and edge types.
    [Show full text]
  • Evaluating a Graph Query Language for Human-Robot Interaction Data in Smart Environments
    Evaluating a Graph Query Language for Human-Robot Interaction Data in Smart Environments Norman K¨oster1, Sebastian Wrede12, and Philipp Cimiano1 1 Cluster of Excellence Center in Cognitive Interactive Technology (CITEC) 2 Research Institute for Cognition and Robotics (CoR-Lab), Bielefeld University, Bielefeld Germany fnkoester,swrede,[email protected], Abstract. Solutions for efficient querying of long-term human-robot in- teraction data require in-depth knowledge of the involved domains and represents a very difficult and error prone task due to the inherent (sys- tem) complexity. Developers require detailed knowledge with respect to the different underlying data schemata, semantic mappings, and most im- portantly the query language used by the storage system (e.g. SPARQL, SQL, or general purpose language interfaces/APIs). While for instance database developers are familiar with technical aspects of query lan- guages, application developers typically lack the specific knowledge to efficiently work with complex database management systems. Addressing this gap, in this paper we describe a model-driven software development based approach to create a long term storage system to be employed in the domain of embodied interaction in smart environments (EISE). The targeted EISE scenario features a smart environment (i.e. smart home) in which multiple agents (a mobile autonomous robot and two virtual agents) interact with humans to support them in daily activities. To support this we created a language using Jetbrains MPS to model the high level EISE domain w.r.t. the occurring interactions as a graph com- posed of nodes and their according relationships. Further, we reused and improved capabilities of a previously created language to represent the graph query language Cypher.
    [Show full text]
  • Datalog Logic As a Query Language a Logical Rule Anatomy of a Rule Anatomy of a Rule Sub-Goals Are Atoms
    Logic As a Query Language If-then logical rules have been used in Datalog many systems. Most important today: EII (Enterprise Information Integration). Logical Rules Nonrecursive rules are equivalent to the core relational algebra. Recursion Recursive rules extend relational SQL-99 Recursion algebra --- have been used to add recursion to SQL-99. 1 2 A Logical Rule Anatomy of a Rule Our first example of a rule uses the Happy(d) <- Frequents(d,rest) AND relations: Likes(d,soda) AND Sells(rest,soda,p) Frequents(customer,rest), Likes(customer,soda), and Sells(rest,soda,price). The rule is a query asking for “happy” customers --- those that frequent a rest that serves a soda that they like. 3 4 Anatomy of a Rule sub-goals Are Atoms Happy(d) <- Frequents(d,rest) AND An atom is a predicate, or relation Likes(d,soda) AND Sells(rest,soda,p) name with variables or constants as arguments. Head = “consequent,” Body = “antecedent” = The head is an atom; the body is the a single sub-goal AND of sub-goals. AND of one or more atoms. Read this Convention: Predicates begin with a symbol “if” capital, variables begin with lower-case. 5 6 1 Example: Atom Example: Atom Sells(rest, soda, p) Sells(rest, soda, p) The predicate Arguments are = name of a variables relation 7 8 Interpreting Rules Interpreting Rules A variable appearing in the head is Rule meaning: called distinguished ; The head is true of the distinguished otherwise it is nondistinguished. variables if there exist values of the nondistinguished variables that make all sub-goals of the body true.
    [Show full text]
  • Logical Query Languages
    Logical Query Languages Motivatio n: 1. Logical rules extend more naturally to recursive queries than do es relational algebra. ✦ Used in SQL recursion. 2. Logical rules form the basis for many information-integration systems and applications. 1 Datalog Example , beer Likesdrinker Sellsbar , beer , price Frequentsdrinker , bar Happyd <- Frequentsd,bar AND Likesd,beer AND Sellsbar,beer,p Ab oveisarule. Left side = head. Right side = body = AND of subgoals. Head and subgoals are atoms. ✦ Atom = predicate and arguments. ✦ Predicate = relation name or arithmetic predicate, e.g. <. ✦ Arguments are variables or constants. Subgoals not head may optionally b e negated by NOT. 2 Meaning of Rules Head is true of its arguments if there exist values for local variables those in b o dy, not in head that make all of the subgoals true. If no negation or arithmetic comparisons, just natural join the subgoals and pro ject onto the head variables. Example Ab ove rule equivalenttoHappyd = Frequents ./ Likes ./ Sells dr ink er 3 Evaluation of Rules Two, dual, approaches: 1. Variable-based : Consider all p ossible assignments of values to variables. If all subgoals are true, add the head to the result relation. 2. Tuple-based : Consider all assignments of tuples to subgoals that make each subgoal true. If the variables are assigned consistent values, add the head to the result. Example: Variable-Based Assignment Sx,y <- Rx,z AND Rz,y AND NOT Rx,y R = B A 1 2 2 3 4 Only assignments that make rst subgoal true: 1. x ! 1, z ! 2. 2. x ! 2, z ! 3. In case 1, y ! 3 makes second subgoal true.
    [Show full text]
  • Flexible Querying of Graph-Structured Data
    Flexible Querying of Graph-Structured Data Petra Selmer July 2017 A thesis submitted in fulfillment of the requirements for the degree of Doctor of Philosophy Department of Computer Science and Information Systems Birkbeck, University of London Declaration This thesis is the result of my own work, except where explicitly acknowledged in the text. Petra Selmer July 23, 2017 Abstract Given the heterogeneity of complex graph data on the web, such as RDF linked data, it is likely that a user wishing to query such data will lack full knowledge of the structure of the data and of its irregularities. Hence, providing flexible querying capabilities that assist users in formulating their information-seeking requirements is highly desirable. The query language we adopt in this thesis comprises conjunctions of regular path queries, thus encompassing recent extensions to SPARQL to allow for querying paths in graphs using regular expressions (SPARQL 1.1). To this language we add three operators: APPROX, supporting standard notions of query approximation based on edit distance; RELAX, which performs query relaxation based on RDFS inference rules; and FLEX, which simultaneously applies both approximation and relaxation to a query conjunct, providing even greater flexibility for users. We undertake a detailed theoretical investigation of query approximation, query relaxation, and their combination. We show how these notions can be integrated into a single theoretical framework and we provide incremental evaluation algorithms that run in polynomial time in the size of the query and the data | provided the queries are acyclic and contain a fixed number of head variables | returning answers in ranked order of their `distance' from the original query.
    [Show full text]
  • Datalog on Infinite Structures
    Datalog on Infinite Structures DISSERTATION zur Erlangung des akademischen Grades doctor rerum naturalium (Dr. rer. nat.) im Fach Informatik eingereicht an der Mathematisch-Naturwissenschaftlichen Fakultät II Humboldt-Universität zu Berlin von Herr Dipl.-Math. Goetz Schwandtner geboren am 10.09.1976 in Mainz Präsident der Humboldt-Universität zu Berlin: Prof. Dr. Christoph Markschies Dekan der Mathematisch-Naturwissenschaftlichen Fakultät II: Prof. Dr. Wolfgang Coy Gutachter: 1. Prof. Dr. Martin Grohe 2. Prof. Dr. Nicole Schweikardt 3. Prof. Dr. Thomas Schwentick Tag der mündlichen Prüfung: 24. Oktober 2008 Abstract Datalog is the relational variant of logic programming and has become a standard query language in database theory. The (program) complexity of datalog in its main context so far, on finite databases, is well known to be in EXPTIME. We research the complexity of datalog on infinite databases, motivated by possible applications of datalog to infinite structures (e.g. linear orders) in temporal and spatial reasoning on one hand and the upcoming interest in infinite structures in problems related to datalog, like constraint satisfaction problems: Unlike datalog on finite databases, on infinite structures the computations may take infinitely long, leading to the undecidability of datalog on some infinite struc- tures. But even in the decidable cases datalog on infinite structures may have ar- bitrarily high complexity, and because of this result, we research some structures with the lowest complexity of datalog on infinite structures: Datalog on linear orders (also dense or discrete, with and without constants, even colored) and tree orders has EXPTIME-complete complexity. To achieve the upper bound on these structures, we introduce a tool set specialized for datalog on orders: Order types, distance types and type disjoint programs.
    [Show full text]