Database Query Languages Embedded in the Typed Lambda Calculus
Total Page:16
File Type:pdf, Size:1020Kb
Information and Computation IC2571 information and computation 127, 117144 (1996) article no. 0055 ViewDatabase metadata, citation and Query similar papers Languages at core.ac.uk Embedded in the Typed Lambda Calculusbrought* to you by CORE provided by Elsevier - Publisher Connector Gerd G. Hillebrand- and Paris C. Kanellakis- Department of Computer Science, Brown University, Box 1910, Providence, Rhode Island 02912 and Harry G. Mairson Department of Computer Science, Brandeis University, Waltham, Massachusetts 02254 picture is somewhat complex. If inputs and outputs are We investigate the expressive power of the typed *-calculus when Church numerals typed as Int (where Int#({ Ä {) Ä { Ä { expressing computations over finite structures, i.e., databases. We for some fixed {), Schwichtenberg [42] and Statman show that the simply typed *-calculus can express various database showed that the expressible multi-argument functions of query languages such as the relational algebra, fixpoint logic, and the complex object algebra. In our embeddings, inputs and outputs are type (Int, ..., Int) Ä Int (or equivalently, Int Ä }}} Ä *-terms encoding databases, and a program expressing a query is a Int Ä Int) are exactly the extended polynomials, i.e., the *-term which types when applied to an input and reduces to an output. functions generated by 0 and 1 using the operations addi- Our embeddings have the additional property that PTIME computable tion, multiplication and conditional. If inputs and outputs queries are expressible by programs that, when applied to an input, are Church numerals given more complex types than Int, reduce to an output in a PTIME sequence of reduction steps. Under our database input-output conventions, all elementary queries are express- exponentiation and predecessor can also be expressed. ible in the typed *-calculus and the PTIME queries are expressible in the However, Statman (as quoted in [16]) showed that order-5 (order-4) fragment of the typed *-calculus (with equality). equality, ordering, and subtraction are not expressible in ] 1996 Academic Press, Inc. TLC for any typing of Church numerals. These classical expressibility results have cast a negative and slightly confusing light on the possible encodings of computa- 1. INTRODUCTION tions in TLC. Simple types have been criticized for limiting flexibility in programs, but they have also been criticized for TLC Motivation and Background. The simply typed limiting expressibility. These criticisms have provided some *-calculus of Church [12] (typed *-calculus or TLC for motivation for examining more powerful typed calculi, such as short) with its syntax and operational semantics is an essen- the GirardReynolds second-order *-calculus [17, 41] tial part of any functional programming language. For (adding polymorphism via type quantification) or Milner's example, TLC is a core subset of all state-of-the-art func- ML [22, 38] (adding monomorphic fixpoints and let- tional languages such as ML, Haskell, and Miranda. TLC polymorphism). We believe that the criticism of TLC together with let-polymorphism [22, 38] is often infor- inflexibility is justified, although hard to quantify. The criticism mally referred to as core-ML. In this paper, we investigate the of TLC expressibility is unjustified, and a theme of this paper expressive power of TLC from the point of view of expressing is to quantify how rich a framework TLC is for expressing computations over finite structures. In other words, we study computations, provided that the right setting is chosen. the ability of TLC to express database queries. In fact, it is well known that provably hard decision Our interest in database computations is in marked con- problems can be embedded into TLC. This follows from a trast to the classical approach to TLC expressibility, which theorem of Statman that deciding equivalence of normal considers computations over Church numerals (see, e.g., forms of two well-typed *-terms is not elementary recursive [4, 16, 42]). There are several results characterizing the [43]. The proof in [43] uses a result of Meyer concerning expressive power of TLC over Church numerals, but the the complexity of decision problems in higher-order type theory [37]. A simple proof of both these results appears in * A preliminary version of the work presented here appeared in [25]. [35]. However, there are a number of difficulties when one - Research supported by ONR Contract N00014-91-J-4052, ARPA Order 8225. tries to turn these proofs into frameworks for computations. Research supported by NSF Grant CCR-9216185, ONR Grant They do not separate the fixed program (representing a N00014-93-1-1015, and the Tyson Foundation. function) from the variable data (representing the input). 117 0890-5401Â96 18.00 Copyright 1996 by Academic Press, Inc. All rights of reproduction in any form reserved. File: 643J 257101 . By:MB . Date:06:08:96 . Time:18:15 LOP8M. V8.0. Page 01:01 Codes: 7109 Signs: 4729 . Length: 60 pic 11 pts, 257 mm 118 HILLEBRAND, KANELLAKIS, AND MAIRSON They use computational overkill for lower complexity a variety of semantics (e.g., inflationary) to express the classes. Specifically, the Powerset construction, crucial to all various fixpoint logics. of the proofs, adds exponential factors to the computation. Complex object databases have been proposed as a signifi- For example, a simulation of quadratic time is forced to cant extension of relational databases, with many practical take at least an exponential number of reduction steps in applications; see [2] for a recent survey. Well-known these constructions. languages in this area are the complex object algebras One way of avoiding the anomalies associated with with or without Powerset of [1]. For the analysis of representations over Church numerals was recently expressibility of the complex object algebra with Powerset demonstrated by Leivant and Marion [33] for an ``impure'' we refer to [26, 31] and without Powerset to [39]. Note version of TLC. By augmenting the simply typed *-calculus that, although Powerset in [1] is powerful (as are the with a pairing operator and a ``bottom tier'' consisting of the second order queries in [10, 15, 46), it is an impractical free algebra of words over [0, 1] with associated construc- primitive, and much attention has been given to algebras tor, destructor, and discriminator functions, they obtain a without Powerset for PTIME queries. simple characterization of the computational complexity An elegant way of manipulating complex object class PTIME. databases, related to our paper, is based on functional In this paper we re-examine the question of representing programming. There has been some practical work in functions in the ``pure'' TLC, as opposed to ``impure'' ver- database query languages in this area, e.g., the early FQL sions. However, we use encodings of finite models or finite language of [8] and the more recent work on structural first-order relational structures (databases for short) instead recursion as a query language [6, 7, 28]. One important dif- of Church numerals. Thus, we are changing the problem ference of the framework developed here from [6, 7, 28] is from encoding numerical functions to encoding generic set that we use the ``pure'' TLC without any added recursion functions, i.e., database queries. For our input and output operators (the equality predicate and atomic constants used databases we use standard techniques of encoding lists and in our presentation are not essential for our results). tuples in the typed *-calculus. Queries are then encoded as Contributions. The topic of this paper is how to embed well-typed *-terms that apply to encodings of input rela- database query languages in the typed *-calculus with (and tions and reduce to an encoding of the output relation. For without) atomic constants and an equality predicate on notational convenience, we use TLC=, the typed *-calculus these constants. We consider three requirements: with atomic constants and an equality on them, and the associated $-reduction of [4, 12]. This is not essential for (1) inputs and outputs are *-terms encoding finite sets our analysis. In Section 2.4, we show how to encode atomic of tuples for relational databases, or more arbitrary com- constants and equality in TLC. binations of finite sets and tuples for complex object Our change of data representation, i.e., the framework of databases; finite model theory instead of arithmetic on Church (2) programs are *-terms which type when applied to numerals, has some interesting consequences, because it an input, reduce to an output, and express input-output takes us into the realm of database query languages. functions that are database queries of interest; (3) there is a reduction strategy for the application of DB Motivation and Background. Database query the program to the input, such that the output normal form languages have been motivated by Codd's work on rela- is produced efficiently by the reductions (i.e., in time poly- tional databases [14] and have been studied in the context nomial in the size of the input) when the functions expressed of finite model theory, e.g., [911, 15, 27, 46, 48]. Database are PTIME queries. queries, i.e., the functions from finite models to finite models expressed in various languages, have been classified based Requirements (1)(2) above give us embeddings, in the on their complexity. The most commonly used measure is sense of expressing queries of interest. The additional the one of data complexity of [10, 48],