An Introduction to Relational Database Theory

Total Page:16

File Type:pdf, Size:1020Kb

An Introduction to Relational Database Theory An Introduction to Relational Database Theory Hugh Darwen Download free books at Hugh Darwen An Introduction to Relational Database Theory Download free ebooks at bookboon.com 2 An Introduction to Relational Database Theory © 2010 Hugh Darwen & Ventus Publishing ApS ISBN 978-87-7681-500-4 This book is dedicated to the researchers at IBM United Kingdom’s Scientific Research Centre, Peterlee, UK, in the 1970s, who designed and implemented the relational database language, ISBL, that has been my guide ever since. Download free ebooks at bookboon.com 3 An Introduction to Relational Database Theory Contents Contents Preface 10 1. Introduction 14 1.1 Introduction 14 1.2 What Is a Database? 14 1.3 “Organized Collection of Symbols” 15 1.4 “To Be Interpreted as a True Account” 16 1.5 “Collection of Variables” 17 1.6 What Is a Relational Database? 18 1.7 “Relation” Not Equal to “Table” 19 1.8 Anatomy of a Relation 21 1.9 What Is a DBMS? 22 1.10 What Is a Database Language? 23 1.11 What Does a DBMS Do? 23 1.12 Creating and Destroying Variables 24 1.13 Taking Note of Integrity Rules 26 1.14 Taking Note of Authorisations 27 Bli med i NITO studentene og få: Jobbsøkerkurs Be`iijhb\ib\q>Qjbnºfi\_ Epmd_dnf]dno\i_ #a)`fnjh\m]`d_nfjiom\fo$ F\mmd`m`q`dg`_idib A\bgdb`i`ooq`mf Please click the advert Odgb\ibodg-+++]`_mdaonbmpkk`m G`nh`mjbh`g__`bdiikidoj)ij*nop_`io Download free ebooks at bookboon.com 4 An Introduction to Relational Database Theory Contents 1.15 Updating Variables 27 1.16 Providing Results of Queries 30 EXERCISE 32 2. Values, Types, Variables, Operators 33 2.1 Introduction 33 2.2 Anatomy of A Command 33 2.3 Important Distinctions 36 2.4 A Closer Look at a Read-Only Operator (+) 37 2.5 Read-only Operators in Tutorial D 37 2.6 What Is a Type? 40 2.7 What Is a Type Used For? 42 2.8 The Type of a Relation 42 2.9 Relation Literals 43 2.10 Types and Representations 47 2.11 What Is a Variable? 49 2.12 Updating a Variable 50 2.13 Conclusion 52 EXERCISES 54 Getting Started with Rel 55 Ren vannkraft siden 1898 Ny teknologi for å utnytte vannkraften maksimalt Norges største leverandør av fjernvarme Europas mest moderne driftssentral Norges største netteier Størst innen strømsalg VI ER PÅ ... Les mer på hafslund.no Please click the advert – ER DU? Nytenkning, kompetanse og muligheter gjør Hafslund til en spennende arbeidsplass. ` Gå inn på hafslund.no/jobb Download free ebooks at bookboon.com 5 An Introduction to Relational Database Theory Contents 3. Predicates and Propositions 63 3.1 Introduction 63 3.2 What Is a Predicate? 63 3.3 Substitution and Instantiation 68 3.4 How a Relation Represents an Extension 69 3.5 Deriving Predicates from Predicates 75 EXERCISES 84 4. Relational Algebra – The Foundation 85 4.1 Introduction 85 4.2 Relations and Predicates 88 4.3 Relational Operators and Logical Operators 89 4.4 JOIN and AND 90 4.5 RENAME 93 4.6 Projection and Existential Quantification 97 4.7 Restriction and AND 103 4.8 Extension and AND 106 4.9 UNION and OR 108 4.10 Semidifference and NOT 111 Please click the advert Er du student? Les mer her www.statnett.no/no/Jobb-og-karriere/Studenter Download free ebooks at bookboon.com 6 An Introduction to Relational Database Theory Contents 4.11 Concluding Remarks 114 EXERCISES 115 Working with a Database in Rel 117 5. Building on The Foundation 120 5.1 Introduction 120 5.2 Semijoin and Composition 121 5.3 Aggregate Operators 125 5.4 Relations within a Relation 129 5.5 Using Aggregate Operators with Nested Relations 131 5.6 SUMMARIZE 132 5.7 GROUP and UNGROUP 135 5.8 WRAP and UNWRAP 139 5.9 Relation Comparison 141 5.10 Other Operators on Relations and Tuples 146 EXERCISES 148 6. Constraints and Updating 150 6.1 Introduction 150 6.2 A Closer Look at Constraints and Consistency 151 Exploration and production in Norway Please click the advert Download free ebooks at bookboon.com 7 An Introduction to Relational Database Theory Contents 6.3 Expressing Constraint Conditions 152 6.4 Useful Shorthands for Expressing Constraints 159 6.5 Updating Relvars 165 EXERCISES 172 7. Database Design I: Projection-Join Normalization 173 7.1 Introduction 173 7.2 Avoiding Redundancy 173 7.3 Join Dependencies 175 7.4 Fifth Normal Form 183 7.5 Functional Dependencies 188 7.6 Keys 193 7.7 The Role of FDs and Keys in Optimization 194 7.8 Boyce-Codd Normal Form (BCNF) 195 7.9 JDs Not Arising from FDs 203 EXERCISES 208 8. Database Design II: Other Issues 211 8.1 Group-Ungroup and Wrap-Unwrap Normalization 211 ^ ^ Please click the advert ^ Download free ebooks at bookboon.com 8 An Introduction to Relational Database Theory Contents 8.2 Restriction-Union Normalization 217 8.3 Surrogate Keys 219 8.4 Representing “Entity Subtypes” 221 Appendix A: Tutorial D Version 2 224 Appendix B: References and Bibliography 228 Notes 230 Please click the advert Download free ebooks at bookboon.com 9 An Introduction to Relational Database Theory Preface Preface This book introduces you to the theory of relational databases, focusing on the application of that theory to the design of computer languages that properly embrace it. The book is intended for those studying relational databases as part of a degree course in Information Technology (IT). Relational database theory, originally proposed by Edgar F. Codd in 1969, is a topic in Computer Science. Codd’s seminal paper (1970) was entitled A Relational Model of Data for Large Shared Data Banks (reference [5] in Appendix B). An introductory course on relational databases offered by a university’s Computer Science (or similarly named) department is typically broadly divided into a theory component and what we might call an “industrial” component. The “industrial” component typically teaches the language, SQL (Structured Query Languagei), that is widely used in the industry for database purposes, and it might also teach other topics of current significance in the industry. Although this book is only about the theory, I hope it will be interesting and helpful to you even if your course’s main thrust is industrial. The book is directly based on a course of nine lectures delivered annually to undergraduates at the University of Warwick, England, as part of a 14-lecture module entitled Fundamentals of Relational Databases. The remaining five lectures of that module are on SQL. We encourage our students to compare and contrast SQL with what they have learned in the theory part. We explain that study of the theory, and an example of a computer language based on that theory, should: enable them to understand the technology that is based on it, and how to use that technology (even if it is only loosely based on the theory, as is the case with SQL systems); provide a basis for evaluating and criticizing the current state of the art; illustrate of some of the generally accepted principles of good computer language design; equip those who might be motivated in their future careers to bring about change for the better in the database industry. Examples and exercises in this book all use a language, Tutorial D, invented by the author and C.J. Date for the express purpose of teaching the subject matter at hand. Implementations of Tutorial D, which is described in reference [11], are available as free software on the Web. The one we use at the University of Warwick is called Rel, made by Dave Voorhis of the University of Derby. Rel is freely available at http://dbappbuilder.sourceforge.net/Rel.html. This book is accompanied by Exercises in Relational Database Theory, in which the exercises given at the end of each chapter (except the last) are copied and a few further exercises have been added. Sample solutions to all the exercises are provided and the reader is strongly recommended to study these solutions (preferably after attempting the exercises!). The book consists of eight chapters and two appendixes, as follows. Download free ebooks at bookboon.com 10 An Introduction to Relational Database Theory Preface Chapter 1, Introduction, is based on my first lecture and gives a broad overview of what a database is, what a relational database is, what a database management system (DBMS) is, what a DBMS is expected to do, and how a relational DBMS does those things. In Chapter 2, Values, Types, Variables, Operators, based on my second lecture, we look at the four fundamental concepts on which most computer languages are based. We acquire some useful terminology to help us talk about these concepts in a precise way, and we begin to see how the concepts apply to relational database languages in particular. Relational database theory is based very closely on logic. Fortunately, perhaps, in-depth knowledge and understanding of logic are not needed. Chapter 3, Predicates and Propositions, based on my third lecture, teaches just enough of that subject for our present purposes, without using too much formal notation. Chapter 4, Relational Algebra – The Foundation, based on material from lectures 4 and 5, describes the set of operators that is commonly accepted as forming a suitable basis for writing a special kind of expression that is used for various purposes in connection with a relational databasenotably, queries and constraints. Chapter 5, Building on The Foundation, describes additional operators that are defined in Tutorial D (lectures 5-6) to illustrate some of the additional kinds of things that are needed in a relational database language for practical purposes.
Recommended publications
  • Using Relational Databases in the Engineering Repository Systems
    USING RELATIONAL DATABASES IN THE ENGINEERING REPOSITORY SYSTEMS Erki Eessaar Department of Informatics, Tallinn University of Technology, Raja 15,12618 Tallinn, Estonia Keywords: Relational data model, Object-relational data model, Repository, Metamodeling. Abstract: Repository system can be built on top of the database management system (DBMS). DBMSs that use relational data model are usually not considered powerful enough for this purpose. In this paper, we analyze these claims and conclude that they are caused by the shortages of SQL standard and inadequate implementations of the relational model in the current DBMSs. Problems that are presented in the paper make usage of the DBMSs in the repository systems more difficult. This paper also explains that relational system that follows the rules of the Third Manifesto is suitable for creating repository system and presents possible design alternatives. 1 INTRODUCTION technologies in one data model is ROSE (Hardwick & Spooner, 1989) that is experimental data "A repository is a shared database of information management system for the interactive engineering about the engineered artifacts." (Bernstein, 1998) applications. Bernstein (2003) envisions that object- These artifacts can be software engineering artifacts relational systems are good platform for the model like models and patterns. Repository system contains management systems. ORIENT (Zhang et al., 2001) a repository manager and a repository (database) and SFB-501 Reuse Repository (Mahnke & Ritter, (Bernstein, 1998). Bernstein (1998) explains that 2002) are examples of the repository systems that repository manager provides services for modeling, use a commercial ORDBMS. ORDBMS in this case retrieving, and managing objects in the repository is a system which uses a database language that and therefore must offer functions of the Database conforms to SQL:1999 or later standard.
    [Show full text]
  • Sixth Normal Form
    3 I January 2015 www.ijraset.com Volume 3 Issue I, January 2015 ISSN: 2321-9653 International Journal for Research in Applied Science & Engineering Technology (IJRASET) Sixth Normal Form Neha1, Sanjeev Kumar2 1M.Tech, 2Assistant Professor, Department of CSE, Shri Balwant College of Engineering &Technology, DCRUST University Abstract – Sixth Normal Form (6NF) is a term used in relational database theory by Christopher Date to describe databases which decompose relational variables to irreducible elements. While this form may be unimportant for non-temporal data, it is certainly important when maintaining data containing temporal variables of a point-in-time or interval nature. With the advent of Data Warehousing 2.0 (DW 2.0), there is now an increased emphasis on using fully-temporalized databases in the context of data warehousing, in particular with next generation approaches such as Anchor Modeling . In this paper, we will explore the concepts of temporal data, 6NF conceptual database models, and their relationship with DW 2.0. Further, we will also evaluate Anchor Modeling as a conceptual design method in which to capture temporal data. Using these concepts, we will indicate a path forward for evaluating a possible translation of 6NF-compliant data into an eXtensible Markup Language (XML) Schema for the purpose of describing and presenting such data to disparate systems in a structured format suitable for data exchange. Keywords :, 6NF,SQL,DKNF,XML,Semantic data change, Valid Time, Transaction Time, DFM I. INTRODUCTION Normalization is the process of restructuring the logical data model of a database to eliminate redundancy, organize data efficiently and reduce repeating data and to reduce the potential for anomalies during data operations.
    [Show full text]
  • Translating Data Between Xml Schema and 6Nf Conceptual Models
    Georgia Southern University Digital Commons@Georgia Southern Electronic Theses and Dissertations Graduate Studies, Jack N. Averitt College of Spring 2012 Translating Data Between Xml Schema and 6Nf Conceptual Models Curtis Maitland Knowles Follow this and additional works at: https://digitalcommons.georgiasouthern.edu/etd Recommended Citation Knowles, Curtis Maitland, "Translating Data Between Xml Schema and 6Nf Conceptual Models" (2012). Electronic Theses and Dissertations. 688. https://digitalcommons.georgiasouthern.edu/etd/688 This thesis (open access) is brought to you for free and open access by the Graduate Studies, Jack N. Averitt College of at Digital Commons@Georgia Southern. It has been accepted for inclusion in Electronic Theses and Dissertations by an authorized administrator of Digital Commons@Georgia Southern. For more information, please contact [email protected]. 1 TRANSLATING DATA BETWEEN XML SCHEMA AND 6NF CONCEPTUAL MODELS by CURTIS M. KNOWLES (Under the Direction of Vladan Jovanovic) ABSTRACT Sixth Normal Form (6NF) is a term used in relational database theory by Christopher Date to describe databases which decompose relational variables to irreducible elements. While this form may be unimportant for non-temporal data, it is certainly important for data containing temporal variables of a point-in-time or interval nature. With the advent of Data Warehousing 2.0 (DW 2.0), there is now an increased emphasis on using fully-temporalized databases in the context of data warehousing, in particular with approaches such as the Anchor Model and Data Vault. In this work, we will explore the concepts of temporal data, 6NF conceptual database models, and their relationship with DW 2.0. Further, we will evaluate the Anchor Model and Data Vault as design methods in which to capture temporal data.
    [Show full text]
  • Persistent Staging Area Models for Data Warehouses
    https://doi.org/10.48009/1_iis_2012_121-132 Issues in Information Systems Volume 13, Issue 1, pp. 121-132, 2012 PERSISTENT STAGING AREA MODELS FOR DATA WAREHOUSES V. Jovanovic, Georgia Southern University, Statesboro, USA , [email protected] I. Bojicic, Fakultet Organizacioninh Nauka, Beograd, Serbia [email protected] C. Knowles, Georgia Southern University, Statesboro, USA, [email protected] M. Pavlic, Sveuciliste Rijeka/Odjel Informatike, Rijeka, Croatia, [email protected] ABSTRACT The paper’s scope is conceptual modeling of the Data Warehouse (DW) staging area seen as a permanent system of records of interest to regulatory/auditing compliance. The context is design of enterprise data warehouses compliant with the next generation of DW architecture that is DW 2.0. We address temporal DW aspect and review relevant alternative models i.e. compliant with DW 2.0, namely the Data Vault (DV) and the Anchor Model (AM). The principal novelty is a Conceptual DV Model and its comparison with the Anchor Model. Keywords: Conceptual Data Vault, Persistent Staging Area, Data Warehouse, Anchor Model, System of Records, Staging Area Design, DW Compliance, Next Generation DW Architecture, DW 2.0 INTRODUCTION The new DW architecture 2.0 [14] is an attempt to define a standard for the next generation of Data Warehouses (DW). One of the explicit requirements in the DW 2.0 is to support changes over time (while preserving the entire history of data and its structure). A DW 2.0 staging area may be explicitly recognized as a System of Records [15] or an Enterprise DW, see [4], and [20].
    [Show full text]
  • Database Normalization
    Outline Data Redundancy Normalization and Denormalization Normal Forms Database Management Systems Database Normalization Malay Bhattacharyya Assistant Professor Machine Intelligence Unit and Centre for Artificial Intelligence and Machine Learning Indian Statistical Institute, Kolkata February, 2020 Malay Bhattacharyya Database Management Systems Outline Data Redundancy Normalization and Denormalization Normal Forms 1 Data Redundancy 2 Normalization and Denormalization 3 Normal Forms First Normal Form Second Normal Form Third Normal Form Boyce-Codd Normal Form Elementary Key Normal Form Fourth Normal Form Fifth Normal Form Domain Key Normal Form Sixth Normal Form Malay Bhattacharyya Database Management Systems These issues can be addressed by decomposing the database { normalization forces this!!! Outline Data Redundancy Normalization and Denormalization Normal Forms Redundancy in databases Redundancy in a database denotes the repetition of stored data Redundancy might cause various anomalies and problems pertaining to storage requirements: Insertion anomalies: It may be impossible to store certain information without storing some other, unrelated information. Deletion anomalies: It may be impossible to delete certain information without losing some other, unrelated information. Update anomalies: If one copy of such repeated data is updated, all copies need to be updated to prevent inconsistency. Increasing storage requirements: The storage requirements may increase over time. Malay Bhattacharyya Database Management Systems Outline Data Redundancy Normalization and Denormalization Normal Forms Redundancy in databases Redundancy in a database denotes the repetition of stored data Redundancy might cause various anomalies and problems pertaining to storage requirements: Insertion anomalies: It may be impossible to store certain information without storing some other, unrelated information. Deletion anomalies: It may be impossible to delete certain information without losing some other, unrelated information.
    [Show full text]
  • A Developer's Guide to Data Modeling for SQL Server
    Praise for A Developer’s Guide to Data Modeling for SQL Server “Eric and Joshua do an excellent job explaining the importance of data modeling and how to do it correctly. Rather than relying only on academic concepts, they use real-world ex- amples to illustrate the important concepts that many database and application develop- ers tend to ignore. The writing style is conversational and accessible to both database design novices and seasoned pros alike. Readers who are responsible for designing, imple- menting, and managing databases will benefit greatly from Joshua’s and Eric’s expertise.” —Anil Desai, Consultant, Anil Desai, Inc. “Almost every IT project involves data storage of some kind, and for most that means a relational database management system (RDBMS). This book is written for a database- centric audience (database modelers, architects, designers, developers, etc.). The authors do a great job of showing us how to take a project from its initial stages of requirements gathering all the way through to implementation. Along the way we learn how to handle some of the real-world design issues that typically surface as we go through the process. “The bottom line here is simple. This is the book you want to have just finished read- ing when your boss says ‘We have a new project I would like your help with.’” —Ronald Landers, Technical Consultant, IT Professionals, Inc. “The Data Model is the foundation of the application. I’m pleased to see additional books being written to address this critical phase. This book presents a balanced and pragmatic view with the right priorities to get your SQL server project off to a great start and a long life.” —Paul Nielsen, SQL Server MVP, SQLServerBible.com “This is a truly excellent introduction to the database design methodology that will work for both novices and advanced designers.
    [Show full text]
  • GRAPH DATABASE THEORY Comparing Graph and Relational Data Models
    GRAPH DATABASE THEORY Comparing Graph and Relational Data Models Sridhar Ramachandran LambdaZen © 2015 Contents Introduction .................................................................................................................................................. 3 Relational Data Model .............................................................................................................................. 3 Graph databases ....................................................................................................................................... 3 Graph Schemas ............................................................................................................................................. 4 Selecting vertex labels .............................................................................................................................. 4 Examples of label selection ....................................................................................................................... 4 Drawing a graph schema ........................................................................................................................... 6 Summary ................................................................................................................................................... 7 Converting ER models to graph schemas...................................................................................................... 9 ER models and diagrams ..........................................................................................................................
    [Show full text]
  • Data Definition Language (Ddl)
    DATA DEFINITION LANGUAGE (DDL) CREATE CREATE SCHEMA AUTHORISATION Authentication: process the DBMS uses to verify that only registered users access the database - If using an enterprise RDBMS, you must be authenticated by the RDBMS - To be authenticated, you must log on to the RDBMS using an ID and password created by the database administrator - Every user ID is associated with a database schema Schema: a logical group of database objects that are related to each other - A schema belongs to a single user or application - A single database can hold multiple schemas that belong to different users or applications - Enforce a level of security by allowing each user to only see the tables that belong to them Syntax: CREATE SCHEMA AUTHORIZATION {creator}; - Command must be issued by the user who owns the schema o Eg. If you log on as JONES, you can only use CREATE SCHEMA AUTHORIZATION JONES; CREATE TABLE Syntax: CREATE TABLE table_name ( column1 data type [constraint], column2 data type [constraint], PRIMARY KEY(column1, column2), FOREIGN KEY(column2) REFERENCES table_name2; ); CREATE TABLE AS You can create a new table based on selected columns and rows of an existing table. The new table will copy the attribute names, data characteristics and rows of the original table. Example of creating a new table from components of another table: CREATE TABLE project AS SELECT emp_proj_code AS proj_code emp_proj_name AS proj_name emp_proj_desc AS proj_description emp_proj_date AS proj_start_date emp_proj_man AS proj_manager FROM employee; 3 CONSTRAINTS There are 2 types of constraints: - Column constraint – created with the column definition o Applies to a single column o Syntactically clearer and more meaningful o Can be expressed as a table constraint - Table constraint – created when you use the CONTRAINT keyword o Can apply to multiple columns in a table o Can be given a meaningful name and therefore modified by referencing its name o Cannot be expressed as a column constraint NOT NULL This constraint can only be a column constraint and cannot be named.
    [Show full text]
  • A Dictionary of DBMS Terms
    A Dictionary of DBMS Terms Access Plan Access plans are generated by the optimization component to implement queries submitted by users. ACID Properties ACID properties are transaction properties supported by DBMSs. ACID is an acronym for atomic, consistent, isolated, and durable. Address A location in memory where data are stored and can be retrieved. Aggregation Aggregation is the process of compiling information on an object, thereby abstracting a higher-level object. Aggregate Function A function that produces a single result based on the contents of an entire set of table rows. Alias Alias refers to the process of renaming a record. It is alternative name used for an attribute. 700 A Dictionary of DBMS Terms Anomaly The inconsistency that may result when a user attempts to update a table that contains redundant data. ANSI American National Standards Institute, one of the groups responsible for SQL standards. Application Program Interface (API) A set of functions in a particular programming language is used by a client that interfaces to a software system. ARIES ARIES is a recovery algorithm used by the recovery manager which is invoked after a crash. Armstrong’s Axioms Set of inference rules based on set of axioms that permit the algebraic mani- pulation of dependencies. Armstrong’s axioms enable the discovery of minimal cover of a set of functional dependencies. Associative Entity Type A weak entity type that depends on two or more entity types for its primary key. Attribute The differing data items within a relation. An attribute is a named column of a relation. Authorization The operation that verifies the permissions and access rights granted to a user.
    [Show full text]
  • Shadow Theory: Data Model Design for Data Integration
    Shadow Theory: Data Model Design for Data Integration Jason T. Liu* (Draft. Date:9/12/2012) ABSTRACT In information ecosystems, semantic heterogeneity is known as General Terms Algorithms, Design, Human Factors, Theory. the root issue for the difficulties of data integration, and the Relational Model is not designed for addressing such challenges, i.e. to re-use data that is modeled from other sources in the local Keywords data model design. Although researchers have proposed many Data integration, information ecosystems, shadows, meanings, different approaches, and software vendors have designed tools to mental entity, semantic heterogeneity, tags, help the data integration task, it remains an art relying on human labors. Due to lacks of a comprehensive theory to guide the overall modeling process, the quality of the integrated data heavily depends on the data integrators’ experiences. 1. Introduction Based on observations of practical issues for enterprise customer Data integration is essential for most enterprise systems that rely data integration, we believe that the needed solution is a new data on data from multiple sources. Lenzerini defined data integration model. This new data model needs to manage the difficulties of as to combine data residing at different sources in order to provide semantic heterogeneity: different data collected from different users with a unified view of these data [1]. Specifically, the goal perspectives about the same subject matter can naturally be of Enterprise Information Integration (EII) is to provide a uniform inconsistencies or even in conflicts. In other words, existing data access to multiple data sources without having to load the data models are designed to support single version of the truth, and into a central place [2].
    [Show full text]
  • Datalog on Infinite Structures
    Datalog on Infinite Structures DISSERTATION zur Erlangung des akademischen Grades doctor rerum naturalium (Dr. rer. nat.) im Fach Informatik eingereicht an der Mathematisch-Naturwissenschaftlichen Fakultät II Humboldt-Universität zu Berlin von Herr Dipl.-Math. Goetz Schwandtner geboren am 10.09.1976 in Mainz Präsident der Humboldt-Universität zu Berlin: Prof. Dr. Christoph Markschies Dekan der Mathematisch-Naturwissenschaftlichen Fakultät II: Prof. Dr. Wolfgang Coy Gutachter: 1. Prof. Dr. Martin Grohe 2. Prof. Dr. Nicole Schweikardt 3. Prof. Dr. Thomas Schwentick Tag der mündlichen Prüfung: 24. Oktober 2008 Abstract Datalog is the relational variant of logic programming and has become a standard query language in database theory. The (program) complexity of datalog in its main context so far, on finite databases, is well known to be in EXPTIME. We research the complexity of datalog on infinite databases, motivated by possible applications of datalog to infinite structures (e.g. linear orders) in temporal and spatial reasoning on one hand and the upcoming interest in infinite structures in problems related to datalog, like constraint satisfaction problems: Unlike datalog on finite databases, on infinite structures the computations may take infinitely long, leading to the undecidability of datalog on some infinite struc- tures. But even in the decidable cases datalog on infinite structures may have ar- bitrarily high complexity, and because of this result, we research some structures with the lowest complexity of datalog on infinite structures: Datalog on linear orders (also dense or discrete, with and without constants, even colored) and tree orders has EXPTIME-complete complexity. To achieve the upper bound on these structures, we introduce a tool set specialized for datalog on orders: Order types, distance types and type disjoint programs.
    [Show full text]
  • A Theory of Regular Queries
    A Theory of Regular Queries Moshe Y. Vardi Rice University [email protected] ABSTRACT [22], he proposed using first-order logic as a declarative database A major theme in relational database theory is navigat- query language [23]. This led to the development of both ing the tradeoff between expressiveness and tractability for SEQUEL [16] and QUEL [54], as practical relational query query languages, where the query-containment problem is languages realizing Codd's idea, ultimately giving in 1986 considered a benchmark of tractability. The query class rise to the SQL standard [27], which has continued to evolve UCQ, consisting off unions of conjunctive queries, is a frag- over the past 30 years. ment of first-order logic that has a decidable query contain- Starting in the late 1970s, Codd's definition of first-order ment problem, but its expressiveness is limited. Extend- logic as \expressively complete" came under serious criti- ing UCQ with recursion yields Datalog, an expressive query cism [4], and various proposals emerged on how to extend language that has been studied extensively and has recently its expressive power [4, 6, 17], whose essence was to extend become popular in application areas such as declarative net- first-order logic with recursion, which was added to SQL in working. Unfortunately, Datalog has an undecidable query 1999 by way of common table expressions [29]. On the re- containment problem. Identifying a fragment of Datalog search side, the most popular relational language embodying that is expressive enough for applications but has a decid- recursion is Datalog [15], describe in more details below.
    [Show full text]