Logic and Databases: a Deductive Approach INTRODUCTION

Logic and Databases: A Deductive Approach HERVI~ GALLAIRE Compagnie G~n~rale d'I~lectricit~ Laboratoire de Marcoussis, Marcoussis, France JACK MINKER University of Maryland, Computer Science Department, College Park, Maryland JEAN-MARIE NICOLAS ONERA-CERT, D~partement d'In[ormatique, Toulouse, France The purpose of this paper is to show that logic provides a convenient formalism for studying classical database problems. There are two main parts to the paper, devoted respectively to conventional databases and deductive databases. In the first part, we focus on query languages, integrity modeling and maintenance, query optimization, and data dependencies. The second part deals mainly with the representation and manipulation of deduced facts and incomplete information. Categories and Subject Descriptors: H.2.1 [Database Management]: Logical Desigum data models; H.2.3 [Database Management]: Languages--query languages; H.2.4 [Database Management]: Systems--query processing General Terms: Deductive Databases, Indefinite Data, Logic and Databases, Null Values, Relational Databases INTRODUCTION Ullman 1982] or other formal theories such as mathematical logic as their framework. As emphasized by Codd [1982], theoretical The purpose of this paper is to provide an database studies form a fundamental basis overview and a survey of a subfield of logic for the development of homogeneous and as it is applied to databases. We are mostly sound database management systems concerned with the application of logic to (DBMS), which offer sophisticated capa- databases, where logic may be used both as bilities for data handling. A comprehensive an inference system and as a representation study of the many problems that exist in language; we primarily consider relational databases requires a precise formalization type databases. Some important efforts in so that detailed analyses can be carried out the application of other aspects of logic and satisfactory solutions can be obtained. theory to databases (e.g., see Maier [1983], Most of the formal database studies that Ullman [1982], and the references provided are under way at present are concerned there) or those that deal with nonrelational with the relational data model introduced (i.e., hierarchical and network) databases by Codd [1970], and use either a specially {e.g., see Jacobs [1982] and Jacobs et al. developed database theory [Maier 1983; [1982]) are not covered here. The use of logic for knowledge represen- Current address of Herv~ Gallaire and Jean-Marie Nicolas: European Computer-Industry Research tation and manipulation is primarily due to Centre (ECRC), Arabellastrasse 17, D-8000 Miinchen the work of Green [1969]. His work was the 81, FRG. basis of various studies that led to so-called Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission. © 1984 ACM 0360-0300/84/0600-0153 $00.75 Computing Surveys, Vol. 16, No. 2, June 1984 154 • H. Gallaire, J. Minker, and J.-M. Nicolas CONTENTS tional databases. Its use in query languages, integrity modeling and maintenance, query evaluation, and database schema analysis is described. In Section 2 we show how logic INTRODUCTION extends relational databases to permit de- 1.1 Relational Model duction and describe how logic provides a 1.2 Mathematical Logic sound basis for a proper treatment and 1.3 Databases Viewed through Logic understanding of null values and incom- 1. CONVENTIONAL DATABASES 1.1 Query Languages plete information. 1.2 Integrity Constraints In the remainder of this introduction, we 1.3 Query Optimization first describe the main concepts of the re- 1.4 Database Design lational data model. Following this, we 2. DEDUCTIVE DATABASES 2.1 Definition of Deductive specify what is meant by mathematical or Logic Databases logic, focusing on logic relevant to data- 2.2 Definite Deductive Databases (DDDBs) bases rather than logic in general. Finally, 2.3 Indefinite Deductive Databases (IDDDBs) we briefly introduce two ways in which 2.4 Logic Databases databases can be considered from the view- 3. CONCLUSION ACKNOWLEDGMENTS point of logic. REFERENCES v 1.1 Relational Model question-answering systems, which are To define a relational model we need concerned mainly with a highly deductive some concepts. A domain is a usually finite manipulation of a small set of facts, and set of values. The Cartesian product of thus require an inferential mechanism pro- domains D1 .... , Dn is denoted by D1 vided by logic. Similar techniques have × ... x Dn and is the set of all tuples been adapted to databases to handle large (Xl, ..., x,) such that for any i, i = 1 ..... sets of facts, negative information, open n (xl ~ D1). A relation is any subset of the queries, and other specific database topics. Cartesian product of one or more domains. These techniques have given rise to what A database {instance) is a finite set of finite is called deductive databases. However, the relations. By a finite relation we mean that use of logic to study databases is not re- the extension of the relation {i.e., the total- stricted to providing deductive capabilities ity of all tuples that can appear in a rela- in a DBMS; the pioneering work of Kuhns tion) is finite. The arity of a relation R C [1967, 1970] also uses logic for conventional D1 × ... × D, is n. One may envision a databases to characterize answers to quer- relation to be a table of values. Names are ies. generally associated with the columns of Aside from the introduction and conclu- these tables; these names are called attri- sion to this paper, there are two main sec- butes. Values of an attribute associated tions, which are devoted respectively to with column i of a relation are taken from conventional databases and deductive da- domain Di. A relation R with attributes A1, tabases. In this introduction we provide .... An defines a relation scheme denoted background material to familiarize the as R(A1, ..., An), whereas the specific re- reader with the terminology used through- lation R (i.e., the relation with specified out the paper, introducing the reader to tuples) is said to be an instance or extension concepts in relational databases, to the area of the relation scheme. of mathematical logic, and to the basic re- Not all instances of a relation scheme lationships between logic and databases. have meaningful interpretations; that is, Section 1 is an extended and revised ver- they do not correspond to valid sets of data sion of material that appeared in GaUaire according to the intended semantics of the [1981]. This material shows how logic pro- database. One therefore introduces a set of vides a formalism for databases, and how constraints, referred to as integrity con- this formalism has been applied to conven- straints, associated with a relation scheme Computing Surveys, Vol. 16, No. 2, June 1984 Logic and Databases: A Deductive Approach • 155 to ensure that the database meets the in- tors, -~ (not), & (and), V (or), --~ (implica- tended semantics. Integrity constraints tion), ~ (equivalence), and (4) quantifiers, may involve interrelationships between re- V (for all), 3 (there exists). Throughout the lations. paper we use lowercase letters from the To summarize, a database scheme con- start of the alphabet to represent constants sists of a collection of relation schemes (a, b, c .... ), those from the end of the together with a set of integrity constraints. alphabet to represent variables (u, v, w, x, A database instance, also called a database y, z), and letters such as (/, g, h, ...) to state, is a collection of relation instances, denote functions. one for each relation in the database A term is defined recursively to be a scheme. A database state is said to be valid constant or a variable, or if f is an n-ary if all relation instances that it contains function and tl, ..., tn are terms, then obey the integrity constraints. In this pa- f(h, ..., tn) is a term. There are no other per, values in database/relation instances terms. We usually assume that a term in are referred to as elements, constants, or the context of databases is function free; individuals, depending on the context. that is, it is either a constant or a variable. To manipulate data in a relational data- If P is an n-ary predicate symbol and tl, base, a language is introduced. One may .... tn are terms, then P(h, ..., tn) is an introduce an algebraic language based on atomic formula. An atomic formula or its algebraic operators, or a calculus language, negation is a literal. Well-formed formulas which we discuss in the following section. (wffs) are defined recursively as follows. An In an algebraic language we need only two atomic formula is a wff. If Wl and w2 are operators for our purposes: the project and wffs, then -n(wl), (Wl) V (w2), (Wl) & (w2), join operators. Given a relation R, and X a (wl) --~ (w2), and (Wl) ~ (We) are wffs. A set of attributes of R, then the projection of closed wff is one that does not contain any R on X is {s[X][s E R}, where siX] is a free variable (i.e., it contains only quanti- tuple constructed from s by keeping all and fied variables and constants). only those components that belong to at- In dealing with wffs it is sometimes con- tributes in X. Given two relations R and S, venient to place them in a normal form. A the natural join R * S is formed by com- wff is in prenex normal form if all quanti- puting the Cartesian product, R × S, se- tiers appear in front of the formula.

Load more