Database Theory
Lecture 1: Introduction / Relational data model
Markus Krötzsch
TU Dresden, 4 April 2016

Course information
• 14 lectures: Thursday, DS4 (13:00–14:30)
• 13 exercise classes: Monday, DS3 (11:10–12:40) → taught by Francesco Kriegel
• Oral examination (details based on applicable examination regulations)
• Course homepage (dates, slides, exercise sheets): https://ddll.inf.tu-dresden.de/web/Database_Theory_%28SS2016%29/en

Aims of the course
Obtain an understanding of key topics in database theory with a special focus on query formalisms:
• Relational data model
• Basic and advanced query languages
• Expressive power of query languages
• Complexity of query answering + some algorithmic approaches
• Modelling with constraints
Connect databases with other advanced topics in logic/KR/formal methods.

Literature, prerequisites, related courses
• Serge Abiteboul, Richard Hull, Victor Vianu: Foundations of Databases. Addison-Wesley. 1994.
  – Available at http://webdam.inria.fr/Alice/
  – Slight deviations in the lecture
  – Further literature will be given for advanced topics
• Prerequisites: basics of first-order logic, Turing machines, worst-case complexity
• Related courses at TUD:
  – Advanced Logic
  – Foundations of Semantic Web Technologies
  – Introduction to Logic Programming
  – Introduction to Constraint Programming
  – Datenbanken (Grundlagen)
  – Intelligent Information Systems

What is a database?
A Database Management System (DBMS) is software for managing collections of data.
→ highly important class of software systems
→ major role in industry and in research
→ extremely wide variety of concepts and implementations
General three-level architecture of a DBMS:
• External Level: application-specific user views
• Logical Level: abstract data model, independent of implementation; conceptual view
• Physical Level: data structures and algorithms; platform-specific
In this lecture: focus on the logical view for the relational data model.

What is a database? (2)
Basic functionality of a DBMS:
• Schema definition: specify how data should be logically organised
• Update: insert/delete/update stored data
• Query: retrieve stored data or information derived from it
• Administration: user rights management, configuration, recovery, data export, etc.
Many related concerns:
• Persistence: data retained when the DBMS is shut down
• Optimisation: ensure maximal efficiency
• Scalability: cope with increasing loads by adding resources
• Concurrency: support many update and query operations in parallel
• Distribution: combine data from several locations
• Interfaces: APIs, query languages, update languages, etc.
• ...
In this lecture: schema, query languages, some optimisation.

Overview
1. Introduction / Relational data model
2. First-order queries
3. Complexity of query answering
4. Complexity of FO query answering
5. Conjunctive queries
6. Tree-like conjunctive queries
7. Query optimisation
8. Conjunctive Query Optimisation / First-Order Expressiveness
9. First-Order Expressiveness / Introduction to Datalog
10. Expressive Power and Complexity of Datalog
11. Optimisation and Evaluation of Datalog
12. Evaluation of Datalog (2)
13. Graph Databases and Path Queries
14. Outlook: database theory in practice
See the course homepage for more information and materials.

The Relational Data Model

Database = collection of tables
Lines:
Line | Type
85 | bus
3 | tram
F1 | ferry
... | ...

Stops:
SID | Stop | Accessible
17 | Hauptbahnhof | true
42 | Helmholtzstr. | true
57 | Stadtgutstr. | true
123 | Gustav-Freytag-Str. | false
... | ... | ...

Connect:
From | To | Line
57 | 42 | 85
17 | 789 | 3
... | ... | ...

Every table has a schema:
• Lines[Line:string, Type:string]
• Stops[SID:int, Stop:string, Accessible:bool]
• Connect[From:int, To:int, Line:string]

Towards a formal definition of “table”
A table row has one value for each column.
→ row = function from the attributes of the table schema to specific values
Example: the row
SID | Stop | Accessible
42 | Helmholtzstr. | true
can be represented by the function
$f\colon \{SID \mapsto 42,\ Stop \mapsto \text{"Helmholtzstr."},\ Accessible \mapsto \text{true}\}$

Database = set of tables
Let dom (“domain”) be the (infinite) set of conceivable values in tables.
For simplicity, we drop the datatypes of database columns and assume that each column uses the same datatype, which supports all values in dom.
Definition
• A relation schema $R[U]$ consists of a relation name $R$ and a finite set $U$ of attributes ($|U|$ is the arity of $R[U]$)
• A table for $R[U]$ is a finite set of functions from $U$ to dom
• A database instance $\mathcal{I}$ is a finite set of tables
Note: we disregard the order and multiplicity of rows.
Tables are also called relation instances. The table with relation schema $R[U]$ in the database instance $\mathcal{I}$ is written $R^{\mathcal{I}}$.

Database = set of relations
Observation: attribute names don't matter. Instead of the function
$\{SID \mapsto 42,\ Stop \mapsto \text{"Helmholtzstr."},\ Accessible \mapsto \text{true}\}$
we could also use a tuple:
$\langle 42, \text{"Helmholtzstr."}, \text{true} \rangle$
Necessary assumption: attributes have a fixed order.
Definition
• A relation schema $R[U]$ is defined as before
• A table for $R[U]$ is a finite subset of $\mathrm{dom}^{|U|}$
• A database instance $\mathcal{I}$ is a finite set of tables
Recall that a subset of $\mathrm{dom}^{|U|}$ is just a $|U|$-ary relation. Sets of relations are also called relational structures.

Database = interpretation of first-order logic
Recall:
• First-order logic is based on predicate symbols with a fixed arity (we won't need function symbols here)
• An interpretation of first-order logic is a pair $\mathcal{I} = \langle \Delta^{\mathcal{I}}, \cdot^{\mathcal{I}} \rangle$, where
  – $\Delta^{\mathcal{I}}$ is a set (the domain of interpretation)
  – $\cdot^{\mathcal{I}}$ maps n-ary predicates $p$ to n-ary relations $p^{\mathcal{I}} \subseteq (\Delta^{\mathcal{I}})^n$
This is (almost) a database instance!
• domain of interpretation $\Delta^{\mathcal{I}}$ = database domain dom
• predicate symbol = relation name
• interpretation of a predicate symbol (if finite!) = table
• finite first-order logic interpretation = database instance

Database = set of facts
Another convenient way to write databases:
Lines(85, "bus")
Lines(F1, "ferry")
Stops(42, "Helmholtzstr.", true)
...
Definition
A fact is an expression $p(t_1, \ldots, t_n)$ where
• $p$ is an n-ary predicate symbol
• $t_1, \ldots, t_n$ are constant symbols
A database instance is a finite set of facts.
When these facts are interpreted logically, their least model is again the database instance (viewed as a first-order logic interpretation).
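The perspectives above can be made concrete with a small Python sketch that uses the example data from the slides. It is an illustration under stated assumptions, not part of the lecture material. Python values stand in for dom. A row (a function from attributes to values) is stored as a frozenset of (attribute, value) pairs so that a table can be a genuine set. A fact $p(t_1, \ldots, t_n)$ is encoded as the pair (p, (t1, ..., tn)). The helper names row, to_unnamed, to_facts and to_instance are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Hashable, Set, Tuple

Value = Hashable                          # stand-in for dom (assumption)
Row = FrozenSet[Tuple[str, Value]]        # named perspective: function U -> dom
Relation = Set[Tuple[Value, ...]]         # unnamed perspective: subset of dom^|U|
Fact = Tuple[str, Tuple[Value, ...]]      # p(t1, ..., tn) stored as (p, (t1, ..., tn))


def row(**assignment: Value) -> Row:
    """Build the row {attribute -> value, ...} (hypothetical helper)."""
    return frozenset(assignment.items())


# A table for Stops[SID, Stop, Accessible] is a finite *set* of rows,
# so row order and duplicate rows are disregarded.
stops_named: Set[Row] = {
    row(SID=17, Stop="Hauptbahnhof", Accessible=True),
    row(SID=42, Stop="Helmholtzstr.", Accessible=True),
    row(SID=42, Stop="Helmholtzstr.", Accessible=True),  # duplicate, absorbed by the set
    row(SID=123, Stop="Gustav-Freytag-Str.", Accessible=False),
}


def to_unnamed(table: Set[Row], order: Tuple[str, ...]) -> Relation:
    """Fix an attribute order and turn every function into a tuple."""
    return {tuple(dict(r)[a] for a in order) for r in table}


def to_facts(instance: Dict[str, Relation]) -> Set[Fact]:
    """Write every tuple of every relation R^I as a fact R(t1, ..., tn)."""
    return {(name, t) for name, rel in instance.items() for t in rel}


def to_instance(facts: Set[Fact]) -> Dict[str, Relation]:
    """Group a finite set of facts back into one relation per predicate symbol."""
    instance: Dict[str, Relation] = defaultdict(set)
    for name, t in facts:
        instance[name].add(t)
    return dict(instance)


stops = to_unnamed(stops_named, ("SID", "Stop", "Accessible"))
db = {"Lines": {("85", "bus"), ("3", "tram"), ("F1", "ferry")}, "Stops": stops}
facts = to_facts(db)
assert to_instance(facts) == db  # no information is lost in the round trip
```

The round trip only shows that, for finite data, the set-of-relations view and the set-of-facts view carry the same information. It does not model the logical least-model semantics mentioned above.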
Visualising relations
Binary relations (sets of pairs) can be viewed as directed graphs.
Example:
Source | Target
1 | 2
1 | 3
2 | 5
3 | 2
3 | 4
4 | 3
5 | 3
Many binary tables in one graph? Use the table name to label edges!

Database = hypergraph
What to do with tables of arity ≠ 2?
→ generalise graphs to hypergraphs
Definition
A hypergraph is a triple $\langle V, E, \rho \rangle$, where
• $V$ is a set of vertices
• $E$ is a set of edge names
• $\rho$ maps each edge name $e \in E$ to an n-ary relation $\rho(e) \subseteq V^n$
In other words: finite hypergraphs are databases.

Summary: the relational model
Relational databases are everywhere:
• sets of tables with named attributes (“named perspective”)
• sets of relations (“unnamed perspective”)
• first-order logic interpretations
• sets of logical facts (ground atoms)
• hypergraphs (and graphs as a special case)
→ all restricted to finite sets
Important elements of the theory of relational databases are very widely applicable, including to many data models that are not the classical relational one (e.g., graph databases, RDF databases, XML databases).

The Relational Algebra

Relational Algebra Queries
Query language based on a set of operations on databases.
Each operation refers to one or more tables and produces another table.
(We often simplify notation and write a table name rather than a table instance.)
Main operations of the named perspective:
• Selection σ
• Projection π
• Join ⋈
• Renaming δ
• Difference −
• Union ∪
• Intersection ∩

Selection
“Find all bus lines”: $\sigma_{Type=\text{"bus"}}\,Lines$
“Find all connections that begin and end in the same stop”: $\sigma_{From=To}\,Connect$
Definition
The selection operator has the form $\sigma_{n=m}$, where
• $n$ is an attribute name
• $m$ is an attribute name or a constant value
Consider a table $R^{\mathcal{I}}$ for $R[U]$.
• For $m$ a constant value: $\sigma_{n=m}(R^{\mathcal{I}}) = \{ f \in R^{\mathcal{I}} \mid f(n) = m \}$
• For $m$ an attribute name: $\sigma_{n=m}(R^{\mathcal{I}}) = \{ f \in R^{\mathcal{I}} \mid f(n) = f(m) \}$
This is only defined if $U$ contains the required attribute names.

Projection
“Find all possible types of lines”: $\pi_{Type}\,Lines$
“Find all pairs of adjacent stops on line 85”: $\pi_{From,To}(\sigma_{Line=\text{"85"}}\,Connect)$
Definition
The projection operator has the form $\pi_{a_1,\ldots,a_n}$, where each $a_i$ is an attribute name. Consider a table $R^{\mathcal{I}}$ for $R[U]$.
$\pi_{a_1,\ldots,a_n}(R^{\mathcal{I}}) = \{ f|_{\{a_1,\ldots,a_n\}} \mid f \in R^{\mathcal{I}} \}$
that is, each row $f$ is restricted to the attributes $a_1, \ldots, a_n$.

Natural join
“Find all connections and their type of line”: $Connect \bowtie Lines$
Connect:
From | To | Line
57 | 42 | 85
17 | 789 | 3
... | ... | ...

Lines:
Line | Type
85 | bus
3 | tram
F1 | ferry
... | ...

Connect ⋈ Lines:
From | To | Line | Type
57 | 42 | 85 | bus
17 | 789 | 3 | tram
... | ... | ... | ...

Definition
The natural join operator has the form $\bowtie$.
Consider tables $R^{\mathcal{I}}$ for $R[U]$ and $S^{\mathcal{I}}$ for $S[V]$.
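The selection, projection and join operators above can be sketched in Python on the named perspective. This is a minimal illustration, not an implementation taken from the course. Rows are frozensets of (attribute, value) pairs, and tables are sets of rows. Python cannot tell a constant from an attribute name in $\sigma_{n=m}$, so selection is split into two hypothetical helpers, select_const and select_attr. The slide's definition of the natural join is cut off above. Here natural_join follows the standard textbook definition: $R^{\mathcal{I}} \bowtie S^{\mathcal{I}}$ contains every function $f$ on $U \cup V$ whose restriction to $U$ lies in $R^{\mathcal{I}}$ and whose restriction to $V$ lies in $S^{\mathcal{I}}$.

```python
from typing import FrozenSet, Hashable, Set, Tuple

# Named perspective: a row is a function from attribute names to values,
# stored as a frozenset of (attribute, value) pairs; a table is a set of rows.
Row = FrozenSet[Tuple[str, Hashable]]
Table = Set[Row]


def row(**assignment: Hashable) -> Row:
    """Build the row {attribute -> value, ...} (hypothetical helper)."""
    return frozenset(assignment.items())


def select_const(table: Table, n: str, m: Hashable) -> Table:
    """sigma_{n=m} for a constant m: keep the rows f with f(n) = m.
    A KeyError signals that n is not an attribute of the table."""
    return {f for f in table if dict(f)[n] == m}


def select_attr(table: Table, n: str, m: str) -> Table:
    """sigma_{n=m} for an attribute name m: keep the rows f with f(n) = f(m)."""
    return {f for f in table if dict(f)[n] == dict(f)[m]}


def project(table: Table, *attrs: str) -> Table:
    """pi_{a1,...,an}: restrict each row to attrs; equal restrictions merge,
    because tables are sets. A KeyError signals an unknown attribute."""
    result: Table = set()
    for f in table:
        values = dict(f)
        result.add(frozenset((a, values[a]) for a in attrs))
    return result


def natural_join(r: Table, s: Table) -> Table:
    """Merge every pair of rows that agree on all shared attributes
    (with no shared attributes, this is the Cartesian product)."""
    result: Table = set()
    for f in r:
        for g in s:
            df, dg = dict(f), dict(g)
            if all(df[a] == dg[a] for a in df.keys() & dg.keys()):
                result.add(frozenset({**df, **dg}.items()))
    return result


lines = {row(Line="85", Type="bus"), row(Line="3", Type="tram"), row(Line="F1", Type="ferry")}
connect = {row(From=57, To=42, Line="85"), row(From=17, To=789, Line="3")}

bus_lines = select_const(lines, "Type", "bus")      # sigma_{Type="bus"} Lines
loops = select_attr(connect, "From", "To")          # sigma_{From=To} Connect
types = project(lines, "Type")                      # pi_{Type} Lines
adjacent_85 = project(select_const(connect, "Line", "85"), "From", "To")
joined = natural_join(connect, lines)               # Connect ⋈ Lines
```

The nested loops keep the sketch close to the definition. For efficiency, practical systems typically use other join strategies, such as hash joins or sort-merge joins.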