Type Inference, Type Improvement, and Type Simplification in a Language with User-Defined Polymorphic Relational Operators

by

Lajos Pál Nagy

Master of Science in Computer Science Technical University of Budapest 2000

A dissertation submitted to Florida Institute of Technology in partial fulfillment of the requirements for the degree of

Doctor of Philosophy in Computer Science

Melbourne, Florida
May, 2007

Copyright © 2007 Lajos Pál Nagy

All Rights Reserved

The author grants permission to make single copies.

We the undersigned committee hereby approve the attached thesis

Type Inference, Type Improvement, and Type Simplification in a Language with User-Defined Polymorphic Relational Operators

by Lajos Pál Nagy

Ryan Stansifer, Ph.D.
Major Advisor
Associate Professor, Computer Sciences

Philip J. Bernhard, Ph.D.
Committee Member
Associate Professor, Computer Sciences

Philip K. Chan, Ph.D.
Committee Member
Associate Professor, Computer Sciences

Jewgeni H. Dshalalow, Dr.Sci.
Committee Member
Professor, Mathematics

William D. Shoaff, Ph.D.
Associate Professor and Head, Computer Sciences

Abstract

Type Inference, Type Improvement, and Type Simplification in a Language with User-Defined Polymorphic Relational Operators

by

Lajos Pál Nagy

Major Advisor: Ryan Stansifer, Ph.D.

The overarching goal of the current thesis is to pave the road towards a comprehensive solution to the decades-old problem of integrating databases and programming languages. For this purpose, we propose a record calculus as an extension of an ML-style functional programming language core. In particular, we describe: (1) a set of polymorphic record operations that are expressive enough to define the operators of the relational algebra; (2) a type system, together with a type inference algorithm, based on the theory of qualified types, to correctly capture the types of said polymorphic record operations; (3) an algorithm for checking the consistency (satisfiability of predicates) of the inferred types; (4) an algorithm for improving and simplifying types; and (5) an outline of an approach to explaining type errors in the resulting type system in an informative way.

Table of Contents

Acknowledgment
Dedication

Chapter 1: Introduction
  1.1 The Great Chasm
    1.1.1 Rationale for Functional Programming
    1.1.2 Rationale for the Relational Model
  1.2 Problem Statement
    1.2.1 Main Challenges
  1.3 Contributions
  1.4 Outline of Dissertation

Chapter 2: Related Work
  2.1 Database Programming Languages
    2.1.1 Orthogonal Persistence
    2.1.2 The Relational Approach
  2.2 Deductive Databases
    2.2.1 A Simple Deductive Database
    2.2.2 Deductive Databases versus Functional Database Programming
  2.3 Record Calculi and Systems with Polymorphic Relational Operators
    2.3.1 Record Support in Mainstream Languages
    2.3.2 Standard Record Subtyping
    2.3.3 Rows and Unchecked Row Extension
    2.3.4 Present/Absent Flags
    2.3.5 Generalized Rows with Subtyping
    2.3.6 Record Concatenation with Disjointness Predicates
    2.3.7 Kinded Record Types and Machiavelli
    2.3.8 Qualified Types and Rows
    2.3.9 The Language D
    2.3.10 Type Inference for the Relational Algebra
    2.3.11 HaskellDB: Strongly Typed Database Access for Haskell
    2.3.12 Heterogeneous Collections for Haskell

Chapter 3: Language Syntax and Semantics
  3.1 Language Design Considerations
    3.1.1 The Notion of Record and the Omission of Variants
    3.1.2 First-class Labels
  3.2 Basic Record Operations
  3.3 Formal Semantics
  3.4 Sets and Relations
    3.4.1 Relation Headings
  3.5 Defining Relational Algebra Operators
    3.5.1 Sample Relational Algebra Queries

Chapter 4: Type System
  4.1 Kinds
  4.2 Types
  4.3 Type-level Operations and Relations
  4.4 Type Predicates and Qualified Types
  4.5 Typing Basic Record Operations
  4.6 Typing Rules and Type Inference
    4.6.1 Substitution and Unification
  4.7 Examples of Inferred Types

Chapter 5: Checking Satisfiability of Predicates
  5.1 Definition of Satisfiability
  5.2 Mapping to Set Expressions
  5.3 A Simplifying Language Restriction
  5.4 The Algorithm Q
    5.4.1 Pseudo Code for Algorithm Q
  5.5 A Normal Form for Set Expressions
  5.6 Solving Set Constraints
  5.7 Selecting Base Sets
  5.8 Checking Field Type Constraints
  5.9 Handling Nested Records
  5.10 Soundness and Completeness
  5.11 Complexity of Algorithm Q
  5.12 Sample Run
  5.13 Summary

Chapter 6: Type Improvement and Simplification
  6.1 Type Improvement
    6.1.1 Representative Cases
  6.2 Algorithm for Type Improvement
    6.2.1 Finding the Improving Substitution sℓ (Field Types)
    6.2.2 Finding the Improving Substitution s⦃⦄ (Empty Row)
    6.2.3 Finding the Improving Substitution s≃ (Same Row)
  6.3 Type Simplification
    6.3.1 Representative Cases
  6.4 Algorithm for Type Simplification
    6.4.1 Identifying Reachable Predicates
    6.4.2 Identifying Constructor Predicates
    6.4.3 Identifying Relevant Predicates
    6.4.4 Putting It All Together

Chapter 7: Explaining Type Errors
  7.1 Explaining Type Errors in Polymorphic Languages
  7.2 Type Errors and Qualified Types
    7.2.1 Showing the Origins of Predicates
    7.2.2 Identifying Conflicting Predicates
    7.2.3 Revealing the Contradiction

Chapter 8: Conclusions
  8.1 Future Work

Appendix A: Overview of the Relational Model and Algebra
  A.1 Union
  A.2 Intersection
  A.3 Difference
  A.4 Cartesian Product
  A.5 (Natural) Join
  A.6 Restriction
  A.7 Projection
  A.8 Division
  A.9 ‘Non-Standard’ Relational Operators
    A.9.1 Projecting Away and Renaming Attributes
    A.9.2 Variations on join: semijoin, antijoin, and compose
    A.9.3 Improved division: the Small Divide
    A.9.4 Extension

Appendix B: Proofs

Acknowledgment

First of all, I would like to thank my parents for their patience and understanding. I know how hard it must have been to let their only son go to a faraway country, on the other side of the Atlantic. I thank my sisters, Emi, Éva, and Julcsi, for their encouragement and support.

I thank all the members of my doctoral committee. I thank Dr. Chan for pushing me harder and demanding focus and clarity. Special thanks go to my major advisor, Dr. Stansifer, whose knowledge, patience, professionalism, eye for detail, and, last but certainly not least, constant encouragement were indispensable in helping me write this dissertation.

I would like to thank my friend, Attila, who always lent a sympathetic ear whenever I hit obstacles in my research (and that happened a lot).

I thank Dr. Imre Paulovits, who helped me to get to the United States and provided me with a lot of advice on how to conduct world-class research.

Finally, I thank Yoshiko, who believed in me all the way, and wanted me to succeed. (She is also pretty, by the way.)

Dedication

To My Grandparents

Chapter 1: Introduction

Databases as separate entities of information systems emerged in the 1960s, when it became obvious that the common task of handling shared and persistent data is best dealt with by a dedicated component, a Database Management System. Up until then, each application program did its own data management using the facilities provided by the file system. Though this separation of duties proved to be extremely beneficial in the long run, it nevertheless created its own set of problems—most of them stemming from the difficulties of interfacing programming languages with databases.

1.1 The Great Chasm

Programs that access data in a database are referred to as database applications. Traditionally, database applications are written in some high-level programming language (often called the host language in this context), and use a data sub-language (the embedded language) to query and update data in the database. (This description holds true even in the context of modern object-oriented languages, although various persistence and object/relational mapping solutions blur the language boundaries somewhat.) It has long been recognized that this dualism of languages (or ‘impedance mismatch’, as it has become infamously known) causes several problems:

1. differences between the primitive and complex data types of the host and the embedded languages require constant translation (mapping) between two different data representations (implementing this translation is often both laborious and error-prone);

2. there are quite different approaches to the optimization of programs as opposed to the optimization of database operations, with very little work existing on how to perform them jointly;

3. most databases, nowadays almost always relational, support, in the form of primitive operations, ‘set-at-a-time’ (bulk) processing of data, while most programming languages, especially for update operations, support only ‘tuple-at-a-time’ (iterative) processing;

4. the host language is disconnected from the syntax and semantics of the embedded language, which prevents compile-time checking of database operations; and

5. when dealing with concurrent access to shared data, programming languages often assume cooperation between different entities (that are considered friendly, creating only inadvertent conflicts), while databases by default assume competition among different entities (that are considered hostile, creating deliberate conflicts), which leads to widely differing views on such issues as transaction management.

Since the late 1970s, there have been several attempts at addressing (or just formulating) the ‘impedance mismatch’ problem (see [Atkinson and Buneman, 1987] for an older but quite broad survey, and [Cook and Ibrahim, 2005] for a more recent, albeit somewhat narrower, one).

Most, if not all, solutions address this problem of language dualism by coming up with a single, unified language, a Database Programming Language (DBPL), that can be used for both computation and data manipulation [Bancilhon and Buneman, 1990; Date and Darwen, 1998].

We believe in the general correctness of the unified language approach to solving the ‘impedance mismatch’ problem, and consider the combination of functional programming and the relational model of data the most promising candidate for such a unified language. In accordance with this, we propose a record calculus (as an extension of an implicitly-typed lambda calculus with let-bound polymorphism) that is powerful enough to express the ‘standard’ relational algebra operators and also to permit the definition of novel ones. Although far from a full-fledged database programming language proposal, we hope that by exploring a particularly complex and interesting segment of the design space, we can contribute to the design of more ambitious programming languages.

Quick Note on Terminology

The basic building block of the relational model, the relation, is defined as a set of tuples. In database theory, a tuple (or labelled tuple) is a finite map from attribute names to attribute values. The programming languages community has different names for the same concepts: a record is a finite map from field names (or labels) to values. In this thesis we will mostly use programming language terminology, unless the topic is directly related to database theory. Also, slightly abusing standard usage, we will call operations on relations relational operators.

1.1.1 Rationale for Functional Programming

Functional programming languages are important members of a broader family of declarative languages. These languages are characterized by their relative closeness to mathematics both in their syntax and their semantics. Functional languages are based on the lambda calculus, which makes formal reasoning about the properties of programs much easier than in the case of imperative languages. The history of functional programming has also proven several language features extremely useful, each of which we were determined to preserve while designing our system:

1. Static Type Checking and Type Inference The ability to statically type check programs without extensive explicit type annotations [Cardelli, 1997] makes for very concise programs while retaining the advantages of static type checking. Coupled with side-effect free programming, it is very often true of functional programs that “if it compiles, then it is correct.”

2. Higher-Order Functions Functions are first-class citizens in a functional language, so they can be passed around and manipulated just like ordinary values. This ability to define and use higher-order functions (functions that operate on functions) allows for powerful abstraction mechanisms that greatly enhance the expressive power of the language.

3. Polymorphism (or Generic Programming) In a polymorphic functional language, it is possible to define functions that operate on arbitrary types while retaining static type checking and type inference. Polymorphism facilitates the ‘Once and Only Once’ principle of software engineering, which puts great value on avoiding duplicated functionality.

4. Referential Transparency In a pure [Sabry, 1998] functional language, functions behave like their counterparts in mathematics; that is, they denote a value, and invoking a function with the same list of arguments always yields the same value. In other words, a language of pure functional programs is referentially transparent, which allows developers to use powerful equational reasoning when thinking about programs. For example, in a referentially transparent language, the expression let x = f(y) in g(x) + g(x) is always equivalent in value and effect to the expression g(f(y)) + g(f(y)), which cannot be said of languages that permit functions with side-effects (a small illustration follows the list).
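As a concrete illustration, here is a minimal, self-contained Haskell sketch of our own (not taken from the thesis; the names twice, f, and g are purely illustrative):

    -- 'twice' is a higher-order, polymorphic function: it works at any
    -- type a, and its type could have been inferred without the signature.
    twice :: (a -> a) -> a -> a
    twice h = h . h

    f :: Int -> Int
    f y = y + 1

    g :: Int -> Int
    g x = x * x

    main :: IO ()
    main = do
      let y = 3
          x = f y
      -- Referential transparency licenses equational reasoning:
      -- 'let x = f y in g x + g x' and 'g (f y) + g (f y)' must agree.
      print (g x + g x)          -- 32
      print (g (f y) + g (f y))  -- 32, by substituting f y for x
      print (twice g 2)          -- 16: g passed around as an ordinary value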

1.1.2 Rationale for the Relational Model

In the early 1970s when Codd first introduced the relational model of data [Codd, 1970], it was not entirely obvious that relational databases would be ubiquitous thirty years later. Several factors contributed to the eventual success of the relational model:

1. The relational model, as a formal system of logic, is equivalent in expressive power to first-order predicate logic (without function symbols) restricted to non-recursive Horn clauses [van Emde Boas-Lubsen and van Emde Boas, 1998]. In addition, any theory in the relational model is always guaranteed to be finite (safe, in relational terminology).

2. Complex, ad hoc queries can be formulated using only the operators of the relational algebra, which stands in sharp contrast with the more programmatic, ‘pointer-chasing’ style of querying in network and object-oriented databases.

3. Unlike first-order predicate logic (where resolution is exponential), relational algebra does have an efficient evaluation algorithm (polynomial in the size of the input relations).

4. A significant amount of research has gone into the optimization of relational queries, both at compile time (query re-writing) and at execution time (indexes, scheduling).

5. Transaction semantics can be clearly defined in the relational model, while it creates serious theoretical difficulties in other data models.

1.2 Problem Statement

The algebraic approach to defining relational queries (relational algebra) is purely functional by nature, thus it seems natural to use the functional paradigm combined with the relational model of data as the basis of a database programming language:

The goal of this thesis is to define polymorphic relational operators from a small set of primitives in a functional programming language with full compile-time type checking, type inference, polymorphism, and higher-order functions.

We emphasize full compile-time type checking (catching type errors at the definition site, if the particular definition can be proven to be erroneous) to set our proposal apart from systems (like [Buneman and Ohori, 1996] and [Makholm and Wells, 2005]) that do only partial compile-time type checking (catching more expensive type errors only at call sites, thus potentially accepting erroneous, albeit unused, definitions).

Another important difference between previous systems and our proposal is that we set out to find a handful of primitive operations in terms of which most polymorphic relational operations could be defined, as opposed to trying to canonize in the language a fixed set of primitive relational operators (see [Buneman and Ohori, 1996] and [Van den Bussche and Waller, 1999]). We consider this latter approach unsatisfactory and decided against it. The syntax and semantics of the language, the typing rules, and the type inference algorithm (not to mention proofs of soundness and completeness of the system) all become unduly complex when primitive relational operators must be treated as special cases every time. Even if one is willing to accept this additional complexity, it turns out that the list of polymorphic relational operators is practically open-ended, so every time a new relational operator is introduced that cannot be expressed using the built-in ones, one has to change the language definition together with the typing rules and the type checking algorithm (and every proof concerning the type system and the type inference algorithm has to be redone as well).

1.2.1 Main Challenges

The following are the main challenges one has to face when designing any language with the properties laid out in the problem statement:

Value-Dependent Types The result type of a polymorphic relational operator often depends on the value of its operands in a non-trivial way (for example, in the case of natural join, the heading of the result is the union of the headings of the operands—see Section A.5 for details). As a result, the type of natural join, along with most other polymorphic relational operators, cannot be expressed using only the standard Milner [Damas and Milner, 1982] type system that forms the basis of all modern functional programming languages.
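To see the difficulty at the value level, consider the following Haskell sketch (our illustration, not the calculus developed in this thesis), in which records are untyped finite maps. Natural join is a few lines of code, but the heading of its result, being the union of the operand headings, is a fact about run-time values that a Damas–Milner type cannot express:

    import qualified Data.Map as M

    -- Dynamically typed stand-ins: a record maps attribute names to
    -- (string) values; a relation is a list of records.
    type Record   = M.Map String String
    type Relation = [Record]

    -- Natural join: pair up records that agree on all common attributes.
    -- The heading of each result record is the union of the two input
    -- headings, visible here only at run time, never in the type.
    naturalJoin :: Relation -> Relation -> Relation
    naturalJoin r s =
      [ M.union t u
      | t <- r
      , u <- s
      , and [ v == w | (k, v) <- M.toList t, Just w <- [M.lookup k u] ]
      ]

For instance, joining a relation with heading {S#, City} against one with heading {S#, P#} yields records with heading {S#, City, P#}, regardless of what the types say.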

NP-hard Type Checking Another problem is that static type checking in a language that supports polymorphic natural join has been proven NP-complete [Ohori and Buneman, 1988]. Even type checking for the seemingly weaker system of symmetric record concatenation and field selection was proven to be NP-complete [Makholm and Wells, 2005]. This is a challenge because a programming language is of little practical use unless the type checking of moderately sized programs can be done in a reasonable amount of time, that is, fast.

First-Class Attribute (Record Label) Sets The relational operator project has two operands: a set of attribute names and a relation. One faces a difficulty when trying to add project to a functional language, because it is not clear how to represent sets of attributes both at the value and the type level. Again, the problem with project is that the result type of the operation depends on the value of its operands. For example, the following relational expression using project is ill-typed, regardless of the type of relation r: π{A,B}(π{C,D}(r)). Notice that the individual invocations of project are well-typed in the Milner type system; that is, the actual arguments are of the correct types: a set of attributes and a relation. Nevertheless, because of the actual values of the arguments, the expression as a whole is ill-typed.
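Continuing the same hypothetical sketch, project is equally easy to write with run-time checks, and running the composed expression above shows exactly where the error surfaces: only when the outer projection inspects the values produced by the inner one, rather than at compile time.

    import qualified Data.Map as M
    import qualified Data.Set as S

    type Record   = M.Map String String
    type Relation = [Record]

    -- Projection on a first-class set of attribute names, with the type
    -- error of the example above deferred to a run-time check.
    project :: S.Set String -> Relation -> Relation
    project as rel
      | all (\t -> as `S.isSubsetOf` M.keysSet t) rel =
          [ M.filterWithKey (\k _ -> k `S.member` as) t | t <- rel ]
      | otherwise = error "project: attribute missing from heading"

    -- project (S.fromList ["A","B"]) (project (S.fromList ["C","D"]) r)
    -- fails at run time for any non-empty r containing A, B, C, and D:
    -- the inner projection strips A and B before the outer one needs them.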

1.3 Contributions

The main contributions of this dissertation are:

1. A carefully chosen collection of basic record operations, complete with formal semantics, that are expressive enough to define polymorphic relational operators.

2. A polymorphic type system (with a standard type inference algorithm) that is expressive enough to capture all the type constraints necessary to guarantee compile-time checking of all basic record operations.

3. An algorithm for checking the satisfiability of type constraints generated by the type inference algorithm.

4. Algorithms for improving and simplifying inferred types, where type improvement is guaranteed to find the principal satisfiable type.

5. An outline of an algorithm for explaining type errors in the face of polymorphic relational operators that can give rise to arbitrary systems of set constraints.

1.4 Outline of Dissertation

Chapter 2 summarizes related work in the area, with emphasis on various record calculi and systems with polymorphic relational operators. In Chapter 3, we describe the syntax and formal semantics of the core language and the basic record operations. In Chapter 4, we describe the type system of the language and define the types of the basic record operations; Chapter 4 also contains the description of a standard type inference algorithm for the type system. In Chapter 5, we develop an algorithm for checking the satisfiability of type constraints generated by the type inference algorithm. Chapter 6 formalizes the notions of type improvement and simplification, and also describes algorithms for performing them. In Chapter 7, we analyze how best to explain type errors in the system, and propose an outline of an algorithm for doing so. Chapter 8 concludes the dissertation by summarizing the results and pointing out possible future work. Appendix A gives an overview of the relational model and algebra, and all the proofs are collected in Appendix B.

Chapter 2: Related Work

In this chapter we look at related work in the main problem areas identified earlier. To place our contribution into perspective, we first have to establish a broader context for our efforts. Since attempts at addressing interfacing issues between databases and programming languages date back to the early 1960s, it is no wonder that the work done in this somewhat loosely defined area has been immense. Therefore, we do not even pretend to give a comprehensive account of all previous work, but rather settle for a general overview that will hopefully help position the current thesis in relation to other major research directions in the area. After establishing the broader context, we can move on to the narrower and more specific topic of describing related work in the areas of record calculi and polymorphic relational operators, where we take the chance to elaborate on some technologies mentioned only in passing earlier. As a technical side note, we remark that since in the programming language community ‘labeled tuples’ are almost exclusively referred to as ‘records’, we will ourselves revert to this terminology when discussing previous work in this area.

2.1 Database Programming Languages

The idea behind Database Programming Languages (DBPLs) is to make the manipulation of persistent data (definition/storage/retrieval) an integral part of the programming language [Bancilhon and Buneman, 1990]. (An early survey of the area can be found in [Atkinson and Buneman, 1987]. For a more recent problem statement and analysis, see [Cook and Ibrahim, 2005].)

It is important to mention here that the term Database Programming Language is not generally accepted (or recognized) as the one that correctly describes solution attempts aiming at the integration of databases and programming languages, and, as a result, not all research efforts place themselves into the DBPL camp or realize that they belong there. As for the programming paradigm and the underlying persistence mechanism to use in designing an integrated language, there has never been any real consensus among the researchers and practitioners of the DBPL field. Nevertheless, it is possible to distinguish two chief research directions (with several variations each) that have dominated the field over the last two decades or so.

2.1.1 Orthogonal Persistence

One direction, often called the orthogonal persistence (as in “persistence is orthogonal to type”) approach, deals with persistence from the point of view of programming languages [Atkinson et al., 1990]. This movement hopes to close the gap between databases and programming languages by removing the need for a database management system (often implicitly understood to be a relational database management system) as a stand-alone system component, through programming language extensions for persistence. Its advocates usually emphasize the primacy, or at the very least the needs, of application development over those of data management. In an ideal language with orthogonal persistence, volatile and persistent values are indistinguishable by their types, created and used by the application in the exact same way, and the actual persistence mechanism (reading data from and writing data to secondary storage) is completely transparent from the point of view of the programmer. Combining orthogonal persistence with object-oriented programming led to the emergence of Object Database Management Systems (ODBMSs) [Dittrich, 1991]. ODBMSs support the storage and retrieval of objects that encapsulate both data and its operations, but are usually tied to a single object-oriented programming language, commonly C++ [Bartels and Robie, 1992]. Research in the ODBMS field flourished in the early 1990s, but interest began to wane as prototypical systems refused to measure up to relational systems and the long sought-after theoretical foundations did not materialize [Kim, 1991]. The Enterprise Java Bean (EJB) technology [Ran et al., 2001], although larger in scope, can also be categorized as an attempt to add orthogonal persistence to the Java programming language, showing striking similarities to the ODBMS movement both in its general approach and in its general failure to meet expectations [Johnson, 2004].

Orthogonal persistence, even after decades of research, still suffers from several unsolved, or inadequately addressed, issues that prevent it from replacing relational technology, as originally hoped. There are several reasons for this failure to meet expectations:

• lack of a formal mathematical foundation (comparable to the relational model of Codd [Codd, 1970]) for the description and manipulation of persistent data;

• lack of, or poor support for, a simple but expressive query language for writing ad hoc queries, if ad hoc queries are supported at all;

• inability, or difficulty, of sharing data between applications, due to the fact that persistent data is often tightly coupled with a particular programming language and paradigm;

• data manipulation that is chiefly performed in a navigational (‘pointer chasing’) rather than a declarative (for example, relational algebra) manner, despite the fact that navigational databases (network and hierarchical) were proven long ago to be inferior, both in theory and in practice, to declarative (relational) databases [Date and Codd, 1975];

• difficulty in handling bulk data, or handling it efficiently, a fact which is often due to the ‘tuple-at-a-time’ processing nature of general-purpose programming languages;

• theoretical and practical difficulty in utilizing both compile-time and run-time query optimization techniques, which are now commonplace in relational databases (and which have proved indispensable in achieving acceptable performance); and finally

• lack of, or limited support for (let alone formal foundations), features like multiple users, transaction isolation, integrity constraints, distribution, security, or views, all of which are now taken for granted in relational databases.

2.1.2 The Relational Approach

Efforts that fall under the umbrella of the relational approach accept the relational model and its embodiment, the relational database management system (RDBMS), as the general foundation for handling persistent data. As opposed to orthogonal persistence, languages belonging to the relational approach insist that only certain data types and run-time constructs, namely relations, can be made persistent. Thus, the emphasis is on developing a smooth interaction between the programming language and the underlying RDBMS, both in terms of run-time performance and in terms of harmonized type systems. In essence, the relational approach is just the next logical step from the traditional embedded data sub-language approach that has been in use since the 1970s, and which is still the most popular way of accessing data in a database (despite its well-known shortcomings). Proposals in the area tend to extend mainstream languages with relations and, occasionally, relational operators, as in Pascal/R [Schmidt, 1977], Modula/R [Reimer, 1984], or Machiavelli (based on ML) [Buneman and Ohori, 1996].

Adding support for records and relational algebra is often impossible without introducing new language constructs and modifying the target language's type system. Some languages are more amenable to extension than others. Typically, languages with some level of support for records and record operations handle the extension with more ease (Pascal) than those without (Java). Recently, the language C# has been the target of such an extension effort under the name of Language INtegrated Query (LINQ) [Torgersen, 2006], a remarkable effort considering that C# is an object-oriented language with no direct support for records or record operations. The functional language Haskell has such an expressive type system that it was possible to define a relational extension as a language library in the project HaskellDB [Leijen and Meijer, 1999], a comprehensive, type-safe database access library similar in spirit to LINQ (of which it can be considered a predecessor). Although more of an exercise in pushing Haskell's type classes to the limit, strongly-typed heterogeneous collections (the HList library) [Kiselyov et al., 2004] proved to be capable of expressing all practically conceivable polymorphic record and relational operations in a statically type-safe manner.

There is one language (or, more precisely, a list of language prescriptions and proscriptions) proposed by Date and Darwen in the Third Manifesto [Date and Darwen, 1998] that deserves extra attention, because it was designed from the ground up to alleviate some of the perceived problems of database programming languages plagued by the ‘SQL legacy’, that is, the general shortcomings of SQL both as a language in itself and as an implementation of relational algebra [Date, 1990; Date and Darwen, 1992b, 1995; Date et al., 1998]. The proposal describes a strongly-typed relational algebra language, correcting many of SQL's mistakes along the way, where attributes in relations can be of arbitrarily complex, including user-defined, types. It also emphasizes the importance of a unified language, that is, supporting general computation and database access in a single language. There exist several implementations, including a full-fledged commercial one, that are based on the principles put forward in the Third Manifesto, a rarity in the realm of DBPLs.

2.2 Deductive Databases

The field of Deductive Databases (DDB) tries to combine logic programming, as in Prolog, with relational technology [Minker, 1997]. Strictly speaking, a Deductive Database can be categorized as a Database Programming Language, but the area has such markedly different points of interest and body of research that it is best discussed in its own section. Reiter [Reiter, 1982] was the first to describe relational theory in terms of a logic system, pointing out that it can be described as a sub-theory of first-order predicate calculus, more precisely, as a system of function- and recursion-free Horn clauses. The realization that predicate logic, with certain restrictions, can be used for programming [Kowalski, 1974] led to the development of logic programming languages, most prominently Prolog, which is also based on Horn clauses. It seemed natural to combine the elegance and declarative nature of logic programming with the robustness and high performance of relational databases. In the combined system, predicate logic would serve as the unifying ‘lingua franca’, used simultaneously for application programming, expressing database queries, and specifying integrity constraints. After all, seen from a logic programming point of view, a relational database is nothing but a logic theory, where relations correspond to predicates, and the tuples belonging to a relation correspond to unit clauses of the predicates. Relational algebra queries can be directly translated to goal predicates, where the result of the relational query corresponds to those variable substitutions that make the goal predicate true under the logic theory represented by the database. This new approach was christened Deductive Database since it is based on deduction (logic inference) and database technology.

2.2.1 A Simple Deductive Database

To demonstrate the feasibility of using logic as a database programming language and to compare it with relational theory, we present in this section a simple database together with some sample queries. The relational database consists of only two relations, PARENT and FEMALE (this example is taken from [Colomb, 1998]):

PARENT                 FEMALE

Older    Younger       Name
jane     lois          jane
jim      jane          lois

The same database expressed as a logical theory with unit clauses (using Prolog syntax):

parent(jane, lois).
parent(jim, jane).
female(jane).
female(lois).

We can now define a view (in effect, a stored query) for the mother–child relation (using Tutorial D syntax [Date and Darwen, 1998]):

MOTHER := (PARENT rename {Older as Mother, Younger as Child})
          join (FEMALE rename {Name as Mother})

The same view defined using logic:

mother(Mother, Child) :-
    female(Mother),
    parent(Mother, Child).

However, DDBs are strictly more powerful than relational algebra, since they support recursion, as demonstrated by the following ancestor predicate:

ancestor(Older, Younger) :-
    parent(Older, Younger).
ancestor(Older, Younger) :-
    parent(Older, Intermediate),
    ancestor(Intermediate, Younger).

The ancestor predicate cannot be defined using standard relational algebra, and although the SQL standard has lately embraced recursive queries [Melton and Simon, 2001], recursion remains something of an afterthought that does not fit nicely with the rest of the language. Contrast this with the fact that DDBs have always put great emphasis on maintaining certain desirable properties of database queries, like finiteness and unequivocal semantics, which amounts to allowing only safe forms of recursion (ones that cannot lead to infinite results or undecidability).
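The safe, bottom-up flavor of recursion that DDBs insist on is also easy to mimic in a functional language. The following Haskell sketch of our own (reusing the parent facts above) computes ancestor as a least fixed point, which necessarily terminates on a finite database:

    import Data.List (nub)

    -- The PARENT relation as a finite list of (older, younger) facts.
    parent :: [(String, String)]
    parent = [("jane", "lois"), ("jim", "jane")]

    -- Bottom-up evaluation of the recursive ancestor predicate: iterate
    -- the rule until no new facts appear. Termination is guaranteed
    -- because the universe of candidate pairs is finite.
    ancestor :: [(String, String)] -> [(String, String)]
    ancestor db = go db
      where
        step facts = nub (facts ++
          [ (o, y) | (o, i) <- db, (i', y) <- facts, i == i' ])
        go facts =
          let facts' = step facts
          in if length facts' == length facts then facts else go facts'

    -- ancestor parent == [("jane","lois"),("jim","jane"),("jim","lois")]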

2.2.2 Deductive Databases versus Functional Database Programming

Since logic programs and databases are indistinguishable in a DDB, which is a very appealing feature, most theoretical and practical results can be applied to both, which has made research in the area very fruitful. If we add to this the fact that DDB technology can directly leverage decades of research done in various areas of logic, the question arises: why choose functional programming, as opposed to logic programming, as the basis for a unified language for database programming? Despite being theoretically elegant and producing immensely exciting results over the last three decades, the field of Deductive Databases still suffers from some serious deficiencies, both theoretical and practical, which have prevented, among other things, the development of a single commercially successful DDB as of this writing.

Functional programming is based on the lambda calculus which, besides having a well-defined and straightforward evaluation method, easily handles higher-order functions and has no difficulty whatsoever in accommodating relational operations. DDBs, on the other hand, have long suffered from the difficulties created by any attempt to effectively combine the standard top-down, ‘tuple-at-a-time’ execution of logic programs, needed for application code that might include functions and general recursion, with the bottom-up, ‘set-at-a-time’ evaluation of relational queries, which emphasizes run-time optimization, finite results, and performance [Ramakrishnan and Sudarshan, 1991]. In practice, the two evaluation strategies interact poorly. An additional theoretical difficulty is that different evaluation strategies can result in different semantics, and the differences among these strategies can rarely be described as simply as ‘eager versus lazy’ in functional programming.

Logic programming also has some difficulty supporting higher-order predicates, a feature that has proved enormously useful in functional programming in the form of higher-order functions. The main reason for this is that logic programming is based on the well-understood first-order predicate calculus, and higher-order predicates would lead out of this system, thus preventing the use of its elegant and effective inference procedure (resolution). Despite this, some logic programming languages [Somogyi et al., 1994], Mercury in particular, do support higher-order predicates, but in a limited fashion. Although not centrally important, one must nevertheless point out that higher-order functions fit harmoniously into the syntax of functional programs, which cannot be said of higher-order predicates in logic programs.

Finally, when attributes in relations must be referenced by position, which in fact is the case if they are represented as logic predicates, one loses an important element of data independence, that is, the ability to add attributes to or remove them from relations without breaking database programs that do not depend on the affected attributes. From the point of view of reducing programmer errors, attribute names make explicit which attribute one refers to, while in the positional setting of logic programming it is easy to confuse attributes, even in a typeful logic programming language like Mercury (for example, although predicates with arity in the dozens are rare in logic programming, they are not at all uncommon in relational databases). We close this section with an example that demonstrates the potential drawbacks (from a software engineering point of view) of having to refer to attributes by position instead of by name (the first view is defined using Prolog and the second using Tutorial D). In both cases we assume that a relation COURSE exists with several attributes (CourseNo, Subject, etc.):

course_title(CourseNo, Title) :-
    course(CourseNo, Subject, Title, Term, Instructor,
           Credit, Campus, Room, Hours).

COURSE_TITLE := COURSE { CourseNo, Title }

When using logic programming, we are forced to enumerate all attributes of the relation and we also need to keep in mind which position refers to which attribute (say, that Title is the third attribute). This is not the case when using Tutorial D (that is, relational algebra).

2.3 Record Calculi and Systems with Polymorphic Relational Operators

Historically, the research community has spent a serious amount of effort on designing record calculi. Each record calculus proposed in the literature differs in the trade-offs it makes in terms of the level of polymorphism, the basic operations, and the complexity of the resulting type system. Polymorphic relational operators put an additional strain on the type system of statically typed languages because the result type of most relational operations depends on the types of their operands in a non-trivial way. Also, the complexity of typing polymorphic relational operators is known to be NP-hard, irrespective of the actual type system being used [Vansummeren, 2005]. Since relations are just sets of records, work done in the area of record calculi is naturally connected with work done on language extensions involving polymorphic relational operators. In the following, we present a detailed overview of related work relevant to the subject of the current thesis. In our presentation, we try to progress from less expressive systems to more expressive ones. Naturally, not all systems are strict supersets of, or even meaningfully comparable to, other systems, so our ordering is at times somewhat arbitrary.

As a technical note on our usage of terminology: in order to differentiate records from simple lookup tables (also called associative arrays, hash maps, dictionaries, etc.), we insist that all record operations be checked for type correctness at compile time; that is, we require static typing of record operations, otherwise the language construct in question does not qualify as a record in our view.

2.3.1 Record Support in Mainstream Languages

In most mainstream languages that provide records (for example, C, C++, Java, C#, ML), operations on records are usually limited to a single one: field selection. In addition, field selection is not polymorphic; that is, the compiler needs to know the type of each record expression at compile time in order to decide whether field selections are type correct. Some of these mainstream systems are further limited by the lack of light-weight records, that is, the ability to construct records ‘on the fly’ using record literals. For example, in C [Kernighan and Ritchie, 1978], each record type (called ‘struct’ in C) needs to be declared before it can be used (Pascal behaves similarly), and even with pre-declared records, support for programmer-friendly literals (initializers in C parlance) has only recently made it into the standard [Meyers, 2000].

The following example, where we try to select the field color from the record argument r, illustrates Standard ML's [Milner et al., 1990] inability to type polymorphic field selection:

- fun f r = #color r;
stdIn: ... Error: unresolved flex record
   (can't tell what fields there are besides #color)

In object-oriented languages (Java, C++, C#), objects can serve as records, and field selection can be made somewhat polymorphic through the use of inheritance or interfaces. It is important to note, though, that the polymorphism of field selection does not come from the name of the field, but rather depends on the type of the object being used. Just as in C, record (in this case, object) types need to be declared before being used. The following erroneous Java code segment demonstrates that the presence and absence of record fields are determined by the nominal type of the record (object) and not by their structural presence, thus limiting the polymorphism of record operations:

class A {
    public int x;

    public void test() {
        A a = new A();
        Object o = a;
        a.x = 2; // Accepted by the compiler: the type of ‘a’
                 // indicates the presence of the field ‘x’.
        o.x = 3; // Rejected by the compiler, despite the fact that
                 // the field ‘x’ is present in the object ‘o’.
    }
}

2.3.2 Standard Record Subtyping

A popular way of handling polymorphic record operations is to define subtyping between records [Pierce, 2002]. The intuition is that a function that expects a record with certain fields should accept any record that has those fields plus, possibly, some others. Polymorphic field selection thus involves a subtyping constraint:

    (r.l) :: ∀α. ∀ρ ≤ ⦃l : α⦄. Rec ρ → α

This approach has several undesirable properties. First, subtyping constraints fail to retain information on other fields in the record. Second, adding subtyping to a polymorphic type system is known to complicate type inference and interacts poorly with other useful language features, like overloading [Pierce, 2002]. Finally, other useful record operations, like extension or concatenation, cannot be described in terms of record subtyping only.

2.3.3 Rows and Unchecked Row Extension

Wand introduced the concept of rows as the basis for the recursive definition of record types in [Wand, 1987]. A row in Wand's system is either the empty row ⦃⦄, or an extension ⦃l : τ | ρ⦄ of a row ρ with the label and type pair (l : τ). The type of field selection in Wand's system is:

    (_.l) :: ∀α. ∀ρ. Rec ⦃l : α | ρ⦄ → α

Also, it is now possible to express record extension:

    ⟨l = _ | _⟩ :: ∀α. ∀ρ. α → Rec ρ → Rec ⦃l : α | ρ⦄

The problem with Wand’s system is that record operations are unchecked. For example, it is possible to extend any record with any label. One consequence of this is that some programs do not have principal types [Wand, 1988].

2.3.4 Present/Absent Flags

Rémy also used rows to handle labels not pertinent to the current operation, but he also introduced flags to keep track of which labels need to be present in (or absent from) a given row [Rémy, 1989]. In his system, the type of record extension would be:

    ⟨l = _ | _⟩ :: ∀α. ∀ρ. α → Rec ⦃l : abs | ρ⦄ → Rec ⦃l : pre(α) | ρ⦄

The type checker is now able to deny access to undefined fields, or to deny the extension of a record with a label that it already has. Rémy further developed his system in [Rémy, 1992] to include both symmetric (records are disjoint) and asymmetric (records might overlap) record concatenation. However, his method was to translate programs with record concatenation into programs with record extension, which limited the expressiveness of his system. For example, the following expression (taken from [Rémy, 1992]) cannot be typed in Rémy's system Π‖ because of ML's restrictions on polymorphism:

    let reverse r s = if true then ⟨r ‖ s⟩ else ⟨s ‖ r⟩
    in reverse ⟨a = 1⟩ ⟨b = 2⟩

In other words, under certain conditions, symmetric record concatenation was no longer commutative, a serious limitation of the system in our view.

2.3.5 Generalized Rows with Subtyping

Pottier [Pottier, 2003] described a conditional type system with subtyping constraints and generalized rows, complete with a polynomial-time constraint solver, with the restriction that rows had to be ground. The paper suggested the following constrained type schemes for some polymorphic record operations (where the empty record has type {∂Abs}, α ranges over variables of sort Type and kind type, ϕ ranges over variables of sort Row and kind field, ≤ denotes a subtyping constraint, and the dot (.) separates the type constraints from the type):

    ∀αϕ [{ℓ} : ϕ ≤ ∂(Pre α)] . {ϕ} → α                        (field selection for label ℓ)

    ∀αϕ₁ϕ₂ [{ℓ} : ∂(Pre α) ≤ ϕ₂
            ∧ (L\ℓ) : ϕ₁ ≤ ϕ₂] . {ϕ₁} → α → {ϕ₂}              (non-strict extension with label ℓ)

    ∀ϕ₁ϕ₂ϕ₃ [L : Abs ≤ ϕ₁ ? ϕ₂ ≤ ϕ₃
            ∧ L : Abs ≤ ϕ₂ ? ϕ₁ ≤ ϕ₃
            ∧ L : Pre ≤ ϕ₁ ? ϕ₂ ≤ Abs] . {ϕ₁} → {ϕ₂} → {ϕ₃}   (symmetric concatenation)

The type scheme above for record extension does not prevent the extension of a record with a field that it already has, hence it is called ‘non-strict’ extension. Pottier did not discuss the applicability of his system for describing polymorphic relational operators.

2.3.6 Record Concatenation with Disjointness Predicates

Harper and Pierce examined a second-order system λ‖ with symmetric record concatenation in [Harper and Pierce, 1991]. Rows were constructed using concatenation instead of the usual extension: a row was either the empty row, a singleton label–value pair, or the concatenation of two rows. In their system, the type predicate r₁ # r₂ meant that the rows r₁ and r₂ are disjoint. The type of label selection in their system was:

    (_.l) :: ∀α. ∀ρ. (ρ # ⦃l : α⦄) ⇒ Rec (⦃l : α⦄ ‖ ρ) → α

Being second-order, λ‖ includes both explicit type abstraction and type application: record types need to be passed as arguments to functions with polymorphic record operations. Also, the system did not include a type inference algorithm.

Makholm and Wells in [Makholm and Wells, 2005] described their system for mixin modules, based on the language ⋈, with symmetric record concatenation. They concluded that full polymorphic type checking is NP-complete, but also showed that type checking becomes polynomial when one ignores expressions that are either ‘dead’ (their result will never be needed) or ‘sleeping’ (their result will only be used if put into a larger context, like unused function definitions).

2.3.7 Kinded Record Types and Machiavelli

Ohori [Ohori, 1995] described a type system with polymorphic field selection, together with type inference and an efficient compilation method for polymorphic record operations using numerical label offsets. The basic idea is to assign records to different ‘kinds’ (types of types) based on the set of fields they are expected to contain. For example, field selection in this system would have the type:

    (_.l) :: ∀α. ∀r :: ⦃l : α⦄. Rec r → α

This kinded type system of records formed the basis of the language Machiavelli [Buneman and Ohori, 1996], which was itself an extension of ML. Machiavelli was designed as a true database programming language with direct support for polymorphic relational operators. The extensions of interest to us include: (1) sets and set operations; (2) relations as sets of records; and (3) the relational operators join and project as primitives of the language. Machiavelli also supported extensible variants, recursive record types, and a generalized version of the join operator that could operate on data types other than relations. The following example shows a function that selects young employees from a polymorphic relational variable (‘select ... from ... where’ is Machiavelli's syntax for set comprehension):

-> fun young r = select [Name=x.Name] from x <- r where x.Age < 25;

>> val young = fn: {a:: [Name: b, Age: int]} -> {[Name: b]}

Machiavelli inferred the most general type for the polymorphic function young restricting the input parameter to relations that have fields Name and Age. The return type of the function is a set of records with a single Name field.

Machiavelli also defines the relational operator join as a primitive function of the language and assigns it a special conditional type:

>> val join = fn : (a * b) -> c where { c = jointype(a, b) }

-> join ({[SSN=123, Name="Smith"]},
         {[SSN=123, Car=Porsche], [SSN=123, Car=bmw]});
>> val it = {[SSN=123, Name="Smith", Car=Porsche],
             [SSN=123, Name="Smith", Car=bmw]}
          : {[SSN : int, Name : string, Car : carmake]}

The condition c = jointype(a, b) requires the type variable c to be in a certain relation with the type variables a and b: the result of the join operation is a relation whose heading is the union of the headings of the relations being joined, and this is the constraint that Machiavelli captures with its conditional typing. However, if either a or b is unbound at compile time, then Machiavelli cannot compute the value of c, so it might accept programs that contain type errors.

In other words, Machiavelli does not check the satisfiability of type conditions. Interestingly, this is not a serious limitation in practice: whenever an expression whose evaluation involves join is actually used, the type variables are required to be bound, so the compiler can check the type conditions. In other words, Machiavelli restricts the type checking of polymorphic relational operators in a similar way as ⋈ did with record concatenation in Section 2.3.6: Machiavelli does not check ‘dead’ or ‘sleeping’ code. For example, the following function is accepted by Machiavelli although it is ill-typed:

-> fun tricky (r,s) =
     union (select x.Name from x <- join(r,s)
                   where x.Salary > 100000,
            select y.Name from y <- s
                   where y.Salary = "High");

The problem with tricky is that if s has a Salary attribute of type string then the join of r and s cannot have a Salary attribute of type int. Machiavelli could not catch this error because r and s are polymorphic, so the jointype of the relations r and s cannot be computed at compile time.

Machiavelli also introduces a generalized project operator that takes an arbitrary expression and a ground type (for which equality is well-defined) and projects the expression on the given type. For example, the following expression projects a record on one of its labels:

-> project([SSN=123, Name="Smith", Car=Porsche], [Name : String]);

>> val it = [Name="Smith"] : [Name : string]

Although it might seem so, the function project does not itself introduce first-class labels, because project is part of Machiavelli's syntax. The best way to think of project in Machiavelli is as a type cast operator whose correctness is checked at compile time. Needless to say, this means that there has to be a special typing rule in the type system for dealing with project.

Although Machiavelli introduced the relational operators join and project as primitives of the language, it suffers from the same problem as any language that tries to fix the list of polymorphic relational operators: there will always be relational operators that cannot be expressed using the pre-defined ones. In the case of Machiavelli, neither the relational operator great divide [Date and Darwen, 1992a] nor compose [Codd, 1970] can be expressed using the primitives.
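The following Haskell sketch of our own (reusing the untyped finite-map representation from Chapter 1) illustrates the alternative pursued in this thesis: once join and a ‘project away’ operation are available, compose becomes a two-line user-level definition, with no change to the language required.

    import qualified Data.Map as M
    import qualified Data.Set as S

    type Record   = M.Map String String
    type Relation = [Record]

    -- Run-time-checked stand-ins for the typed primitives.
    joinRel :: Relation -> Relation -> Relation
    joinRel r s = [ M.union t u | t <- r, u <- s
                  , and [ v == w | (k, v) <- M.toList t
                                 , Just w <- [M.lookup k u] ] ]

    -- 'Project away' drops the named attributes instead of keeping them.
    projectAway :: S.Set String -> Relation -> Relation
    projectAway as = map (M.filterWithKey (\k _ -> k `S.notMember` as))

    heading :: Relation -> S.Set String
    heading = S.unions . map M.keysSet

    -- Codd's compose: join, then project away the shared attributes.
    compose :: Relation -> Relation -> Relation
    compose r s =
      projectAway (heading r `S.intersection` heading s) (joinRel r s)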

2.3.8 Qualified Types and Rows

A system for record extension based on rows is presented by Gaster and Jones in [Gaster and Jones, 1996]. The system presented in the paper is an adaptation of Jones's theory of qualified types [Jones, 1992], a comprehensive framework for using constraints (predicates) on types to restrict the applicability of polymorphic functions. Gaster and Jones's system can infer the types of expressions with polymorphic field selection, field deletion, and record extension, and can also check the satisfiability of the arising type constraints in polynomial time, but it cannot express polymorphic relational operators. Similarly to Ohori's system [Ohori, 1995], the authors described an effective compilation method that calculates label offsets from the lacks predicates that appear in the types of expressions with polymorphic record operations. The type system of the current thesis is a direct extension of the system presented in [Gaster and Jones, 1996].
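Readers who know Haskell have already used qualified types: class constraints are the canonical instance of Jones's theory, and the row predicates of Gaster and Jones (and of this thesis) reuse the same machinery. A minimal, purely illustrative example of our own:

    -- The predicate 'Eq a' qualifies the type: 'member' may be used at
    -- any type a at which the predicate is satisfiable (an Eq instance
    -- exists), and the compiler checks this at every use site.
    member :: Eq a => a -> [a] -> Bool
    member x = any (== x)

    -- By analogy, field selection in a qualified-types record calculus
    -- carries a row predicate (e.g. 'row r lacks label l'); inference
    -- collects such predicates, and a separate procedure decides
    -- whether they are satisfiable.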

2.3.9 The Language D

Date and Darwen presented a comprehensive language proposal in [Date and Darwen, 1998] that they tentatively named D. The proposal mainly consisted of prescriptions and proscriptions for various language features that the authors deemed desirable in a modern database programming language based on relational algebra. In contrast with the orthogonal persistence approach, only relational variables (representing relations in an external relational database) can be made persistent. For demonstration purposes, the authors introduced the language Tutorial D, which embodies the language principles they put forward. The design of Tutorial D aims at fixing some of the chief mistakes committed by SQL, but it also includes some novel contributions.

Tutorial D is an explicitly-typed language with full compile-time type checking. Attributes in relations can be of arbitrary types, which means that besides primitive types, user-defined types can also be stored in the database. The language does not support variants or recursive tuple types. The following Tutorial D code segment defines a relational variable (or relvar) person:

VAR person REAL Relation { id       Integer,
                           name     String,
                           address  Tuple { street String,
                                            city   String,
                                            state  StateCode,
                                            zip    ZipCode },
                           location GPSCoord }
    KEY { id };

The above definition demonstrates several features of Tutorial D: (1) explicit type signatures; (2) arbitrary user-defined types in relvars (the attribute location is of type GPSCoord, which is, presumably, a user-defined abstract data type); and (3) nested tuple types (the type of attribute address is a tuple type with attributes street, city, etc.).

Relational operators are part of the language definition with special syntax and typing rules.

Due to the explicitly typed nature of the language, the type correctness of relational operator applications can always be checked in polynomial time during compilation (in other words, the language does not support polymorphic type inference). Also, as opposed to SQL, relational queries can be arbitrarily nested. An important contribution of Tutorial D was that it invigorated interest in relational algebra with its concise and elegant syntax and semantics, an interest that had begun to wane as the majority came to identify relational algebra with its most widespread, and less than flawless, implementation, SQL. As such, the language Tutorial D, and the language design principles advocated by the Third Manifesto, were a major inspiration for the current thesis. To convey an impression of the novelty of its approach, we list some examples that show Tutorial D expressions along with their SQL counterparts. Throughout the examples, we use Date's familiar suppliers-parts database [Date, 1999]:

S { Sname, City }                      SELECT Sname, City FROM S

S JOIN SP                              SELECT S.S#, S.Sname, S.Status, S.City, SP.P#, SP.Qty
                                       FROM S JOIN SP ON S.S# = SP.S#

S WHERE City = "London" { Sname }      SELECT Sname FROM S WHERE City = "London"

EXTEND SP ADD (Qty + 10) AS AltQty     SELECT S#, P#, Qty, Qty + 10 AS AltQty FROM SP

2.3.10 Type Inference for the Relational Algebra

Bussche and Waller in [Van den Bussche and Waller, 1999] directly address the problem of type inference for polymorphic relational expressions. However, the target of type inference is ‘pure’ relational algebra, that is, an ‘out-of-context’ version which is not embedded in any programming language. The type of a relational variable is simply a set of attribute names, and no further types are assigned to the attributes themselves. Type inference is aimed at deriving type formulas (including boolean formulas on attributes) that describe all the possible schemas under which a given relational expression is well-typed. If the type formula is unsatisfiable, then the relational expression contains a type error and there is no schema under which it is well-typed.

The following example is taken from the same paper and shows a relational expression e = σA<5(r ⋈ s) ⋈ ((r × u) − v) together with the inferred type formula that captures the constraints on the relational variables in expression e:

    r : a1 a3
    s : a3 a4 a5          e : a1 a2 a3 a4 a5
    u : a2 a4       ↦     A : true
    v : a1 a2 a3 a4

    A : (r ∨ s) ∧ (r ⇒ v) ∧ ¬(r ∧ u)

The interpretation of the above type formula is that if each ai is assigned some set of attributes (that does not include the attribute A and is disjoint from all other aj's) and the constraint on the attribute A is true (where A : r ∨ s is an abbreviation for A ∈ r ∨ A ∈ s), then the type of expression e is the union a1 ∪ a2 ∪ a3 ∪ a4 ∪ a5 and A must be in e.

Van den Bussche and Waller presented a type inference algorithm that can derive type formulas for arbitrary relational algebra expressions. However, their system was not directly aimed at solving the polymorphic type inference problem for relational algebra expressions in the context of a functional language: they ignore the types of attributes, and they do not introduce tuples and tuple operations into their language. In a follow-up paper [Van den Bussche and Vansummeren, 2005], they extended their previous system with attribute selection and set comprehension and used rows to describe the attribute types in tuples. The paper described a type inference algorithm that generates type formulas, but no algorithm was provided for checking the satisfiability of these formulas. Their new system, like their earlier one, lacked the ability to define new relational operators.

Inspired by the work of Van den Bussche and Waller, Nagy and Stansifer [Nagy and Stansifer, 2005] described a functional language with polymorphic relational operators (experience with said system influenced the design of the system presented in the current thesis). Their approach was based on the type formulas introduced by Van den Bussche and Waller, but they incorporated constraints on attribute types into their system. The problem with their system is that the constraint-solving phase was only hinted at, and it is unclear how row unification (unifying the types of matching attributes) is supposed to be carried out. As an additional problem, the type inference algorithm is quite complex and does not lend itself easily to correctness proofs.

2.3.11 HaskellDB: Strongly Typed Database Access for Haskell

Leijen and Meijer presented HaskellDB in [Leijen and Meijer, 1999] as an exercise in designing domain-specific language extensions using Haskell's powerful abstraction mechanisms and expressive type system. The original design of HaskellDB relied on a language extension specific to Hugs [Jones and Peterson, 1999] (a Haskell implementation), sometimes referred to as Trex, that supported a system of extensible records based on the work of Gaster and Jones [Gaster and Jones, 1996]. In later versions of HaskellDB, this dependence on a Hugs extension was removed to make the system more standards compliant (at the cost of losing some of its original elegance).
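For readers unfamiliar with Trex, the following small example (ours, in Hugs-specific syntax rather than standard Haskell) conveys the flavor of the extension:

-- Record literal, extension, and selection in Trex (Hugs only):
point  = (x = 3.0, y = 2.0)     -- type: Rec (x :: Double, y :: Double)
point3 = (z = 1.0 | point)      -- record extension
dist r = sqrt (#x r * #x r + #y r * #y r)
-- Hugs infers a row-polymorphic type with 'lacks' constraints:
--   dist :: (r\x, r\y) => Rec (x :: Double, y :: Double | r) -> Double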

The main idea behind HaskellDB is that instead of having the programmer build SQL queries by the traditional method of string concatenation, thus losing any effective chance of ensuring syntactic and semantic correctness at compile time, the programmer is given facilities for building queries in a type-aware abstract syntax tree format. To make the construction of syntactically incorrect queries impossible, the authors used Haskell's algebraic data types to describe the abstract syntax of relational algebra (actually, of SQL, but due to the high level of abstraction the difference mattered little). HaskellDB further improved on this idea by embellishing the algebraic data types with phantom types, thus preventing the construction of semantically incorrect queries as well.

Next, we give a hint as to how this was achieved, using excerpts from the HaskellDB code base:

data PrimExpr            -- Data type for primitive expressions.
  = BinExpr BinOp PrimExpr PrimExpr
  | UnExpr  UnOp  PrimExpr
  | ConstExpr String

data BinOp               -- Data type for binary operations.
  = OpEq | OpAnd | OpPlus | ...

Writing queries directly in abstract syntax is a bit inconvenient, but thanks to Haskell, it is possible to provide combinators that correspond to the usual SQL operators:

constant :: Show a => a -> PrimExpr
(.+.)    :: PrimExpr -> PrimExpr -> PrimExpr
(.AND.)  :: PrimExpr -> PrimExpr -> PrimExpr
(.==.)   :: PrimExpr -> PrimExpr -> PrimExpr

Using the above definitions, it is still possible to build semantically incorrect expressions, like the following:

constant "3" .+. constant "b"

Phantom types take care of this problem by including the type that the abstract syntax tree is supposed to represent in the type of the expression node (notice how the type variable a in Expr a does not appear on the right-hand side of the definition, hence the name 'phantom type'):

data Expr a = Expr PrimExpr

constant :: Show a => a -> Expr a
(.+.)    :: Expr Int -> Expr Int -> Expr Int
(.AND.)  :: Expr Bool -> Expr Bool -> Expr Bool
(.==.)   :: Eq a => Expr a -> Expr a -> Expr Bool
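To see the mechanism in isolation, here is a minimal self-contained sketch in the same style (ours, not an excerpt from HaskellDB):

-- One untyped representation, wrapped in a phantom-typed front end:
data PrimExpr = BinExpr String PrimExpr PrimExpr | ConstExpr String

newtype Expr a = Expr PrimExpr   -- 'a' never occurs on the right-hand side

constant :: Show a => a -> Expr a
constant x = Expr (ConstExpr (show x))

(.+.) :: Expr Int -> Expr Int -> Expr Int
Expr x .+. Expr y = Expr (BinExpr "+" x y)

ok :: Expr Int
ok = constant (3 :: Int) .+. constant 4
-- bad = constant (3 :: Int) .+. constant "b"   -- rejected: Expr String vs Expr Int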

To handle arbitrary relational expressions, the authors introduced a comprehension-based monad that represents the computation expressed by the query. Like most SQL implementations, which are also comprehension based, HaskellDB cannot concisely express natural join, but instead relies on the programmer to explicitly enumerate each and every attribute that has to appear in the result. The way queries are constructed using these monadic combinators turned out to be surprisingly intuitive, especially to anyone familiar with Haskell's list comprehensions, as demonstrated by the following example (we also show the same query in Tutorial D syntax):

do { r <- table s
   ; p <- table sp
   ; restrict (r!city .==. constant "London")
   ; restrict (r!s# .==. p!s#)
   ; project (sname = r!sname, p# = p!p#, qty = p!qty)
   }

S JOIN SP WHERE City = "London" { Sname, P#, Qty }

The use of monadic combinators made it possible to treat relational queries as first-class values, with the additional benefit of the ability to serialize access to the external database, a basic feature of monads [Wadler, 1993]. The fact that queries were represented by their abstract syntax trees also made it possible to apply traditional techniques at run time. To ensure full static type checking, HaskellDB required the definition of the database schema to be available at compile time in the form of a separate Haskell module (which was typically generated by tools that could extract schema information from the database).

2.3.12 Heterogeneous Collections for Haskell

Kiselyov et al. [Kiselyov et al., 2004] presented an encoding of collections (more particularly, lists) for Haskell whose elements are not restricted to a single type. The encoding relies heavily on Haskell's extensible class system, which is itself based on the theory of qualified types [Jones, 1992]. To demonstrate the expressive power of their system, the authors presented strongly typed encodings, using Haskell's class system to represent type constraints, of an unprecedented variety of polymorphic record operations, including, among others: (1) extension; (2) field deletion; (3) symmetric concatenation; (4) asking for the set of labels (the heading) of a record and performing set operations on headings; (5) combining a list of labels and a list of values into a record; and (6) projecting a record on a set of labels. In addition, field labels are first-class citizens in their system (due to the fact that they are encoded as singleton types). The following example demonstrates the construction of a record describing the cow Angus, where we first construct the record labels:

data Cow = Cow   -- Type used as label name space.

-- Definition of record labels.
key   = firstLabel Cow "key"
name  = nextLabel key "name"
breed = nextLabel name "breed"
price = nextLabel breed "price"

-- Definition of a record.
angus =  key   .=. (42 :: Integer)
     .*. name  .=. "Angus"
     .*. breed .=. Cow
     .*. emptyRecord

Just to give a taste of the level of Haskell type magic that goes on in the background, we show the definitions, starting from that of heterogeneous lists, that lead up to the (.*.) combinator:

data HNil      = HNil        deriving (Eq, Show, Read)
data HCons e l = HCons e l   deriving (Eq, Show, Read)

class HList l
instance HList HNil
instance HList l => HList (HCons e l)

(.*.) :: HList l => e -> l -> HCons e l
(.*.) = HCons
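Given these definitions, heterogeneous lists are built just like ordinary ones, with the type growing to record the type of every element (a small usage example of ours; we parenthesize explicitly since the excerpt above omits the fixity declaration for (.*.)):

-- A list holding a Bool, a String, and an Int, reflected in its type:
mixed :: HCons Bool (HCons String (HCons Int HNil))
mixed = True .*. ("Angus" .*. ((42 :: Int) .*. HNil))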

Actually, the encoding of records uses type-level naturals annotated with a string for the label name. Name spaces (represented by singleton types like Cow above) are used to prevent conflicts between naturals when they are used as record labels. As the encoding relies heavily on Haskell's class system, whose error-reporting capability is far from perfect, special tricks had to be used to improve error messages. For this purpose, the vacuous Haskell class Fail was introduced and later used in instances of special error-reporting classes. Erroneous situations were represented by requiring the compiler to derive an instance of an error-reporting class (which it cannot, since the only instance that would provide it depends on the vacuous class Fail), thus forcing the compiler to produce a more useful failure indication. The following example gives a hint as to the nature of this method:

class Fail x            -- no methods, no instances!
data TypeNotFound e     -- no values, no operations!

instance Fail (TypeNotFound e) => HOccurs e HNil
  where hOccurs = undefined

Now, if we ask for an integer from a heterogeneous list that only contains a single boolean value, the Haskell interpreter gives a more useful error message (similar error messages were defined for the erroneous record operations):

ghci-or-hugs> hOccurs (HCons True HNil) :: Int
No instance for (Fail (TypeNotFound Int))

Naturally, as the record encodings rely completely on the capabilities of available Haskell compilers, no special-purpose constraint satisfaction algorithm was provided for checking the type correctness of polymorphic record operations in general. This, however, did not prove to be a problem: whenever such operations are applied to actual arguments, type variables are instantiated to actual types, so the compiler can check the satisfiability of the type constraints.

Chapter 3 Language Syntax and Semantics

In this chapter, we describe the syntax and the semantics of the language. We begin with the syntax of the core language and of the basic record operations, followed by a section on the considerations that shaped our design choices. Next, we provide a detailed description of the basic record operations, together with usage examples. We continue with the formal definition of the language semantics through evaluation rules (which we include chiefly for the sake of completeness). After introducing sets and relations, we conclude the chapter with the development of relational algebra from basic record operations and sets, complete with examples.

The term language is an extension of core-ML, that is, an implicitly-typed λ-calculus with let-bound polymorphism. Figure 3.1 defines the syntax of terms where the standard core and the record extensions are separated by a line (both variables x and labels l draw their values from a countable set of names L). For the purposes of defining formal semantics, we also define values in Figure 3.2 as a subset of terms.

3.1 Language Design Considerations

Our goal was to design a language that can not only express polymorphic relational operators, but that also allows the definition of user-defined ones. A decision had to be made as to what primitives, or basic operations, need to be present in the language in order to achieve this goal.

t ::=                            term

      x                          variable
      c                          constant
      λx.t                       abstraction
      t t                        application
      let x = t in t             let
      --------------------------------------------
      ⟨l1 = t, ..., ln = t⟩      record literal
      ⟨l = t | t⟩                record extension
      t·l                        field selection
      ⟨t ∥ t⟩                    record concatenation
      ⟨t\t⟩                      record difference
      t!l                        field deletion
      t·[t]                      record projection

Figure 3.1: Syntax of Terms

v ::=                            value

      c                          constant value
      λx.t                       abstraction value
      ⟨l1 = v, ..., ln = v⟩      record value

Figure 3.2: Value Terms

Since relations are sets of records (tuples are called records in the programming language community), it was clear from the beginning that the language would have to be based on some form of record calculus. The particular set of basic record operations we eventually chose was arrived at through experimentation, where we strived for a minimal set of operations that could express all the original relational operators and was powerful enough to define new ones. As will be demonstrated later, the chosen basic operations are sufficiently expressive to describe user-defined polymorphic relational operators. The question arises whether each is also necessary, that is, whether the set of basic operations is minimal. Curiously, the answer is no. Not all basic operations are strictly necessary, and there is a proper subset of the basic operations that has the same expressive power as the original set. Why include the additional operators then? The short answer is: for software engineering reasons. Some operations, though expressible in terms of more basic ones, nevertheless allow us to put additional constraints on their operands. For example, although field deletion can be expressed using record difference, a separate field deletion operation allows us to require the presence of the field to be deleted, something that can help catch typos in record labels (otherwise, an attempt to delete a non-existent field would always succeed, instead of resulting in an error).

3.1.1 The Notion of Record and the Omission of Variants

A record is a collection of values (known as fields), possibly of different types, each of which is associated with a distinct label drawn from the heading of the record. A record is thus a product of values of possibly different types. The dual of product is sum, and the dual of record is called variant (or tagged union). A variant is one particular value, tagged by a label, from some fixed set of possibly different types. As a result of this theoretical duality, records and variants are often introduced side by side and treated similarly in some systems, for example in [Buneman and Ohori, 1996; Gaster and Jones, 1996; Leijen, 2004]. During the preliminary design phase, a conscious decision was made to omit variants from the system, because variants, as opposed to records, play a marginal role in relational algebra, our main topic of interest. It remains an interesting piece of future work to consider the implications of adding variants to the current system.

3.1.2 First-class Labels

In a language with first-class record labels we have the ability to treat record labels as ordinary values that can be passed in as function arguments, stored in data structures, or serve as return values from functions. A system with first-class labels was described in [Leijen, 2004]. Unfortunately, the ability to pass around record labels as ordinary values does not, in itself, allow us to define polymorphic relational operators. Hence, after due consideration, it was decided to omit first-class labels from the current system, mainly because the costs they would incur (chiefly, losing the ability to derive principal types) seemed to outweigh their possible advantages (intersection types, type-selective functions, etc.). Thus, in the current system record labels are part of the syntax of the language.

3.2 Basic Record Operations

In this section we introduce the basic record operations. A word of caution before we proceed: it is important to realize that the particular syntax chosen for the basic operations is, to a large extent, irrelevant from a theoretical point of view. Nevertheless, syntax does play an important role in the practical usefulness of any programming language, so we strived for forms that respect good language design principles and aesthetics. The decisions on operator associativity, precedence, and syntactic form were governed by a search for economy of expression, readability, and, above all, the somewhat elusive notion of 'conceptual integrity' (as advocated in [Brooks, 1995]).

Operation             Shorthand        Expansion

Heading Literal       ⟨l1, ..., ln⟩    ⟨l1 = (), ..., ln = ()⟩
Record Restriction    (e1![e2])        ⟨e1\(e1·[e2])⟩

Table 3.1: Notational Shorthands

We defer the discussion of types to the next chapter so that we can concentrate here on the meaning of, and the rationale for, each basic operation. Using the basic operations, it is possible to define other useful record operations that are common enough to merit their own notational shorthands. A summary of these derived operations is presented in Table 3.1. A detailed description of the operations themselves follows:

• Record Extension. Record extension 'e = ⟨l = e1 | e2⟩' is used to extend records with new fields. Record extension will also play an important role in type-checking record literals, since, at the conceptual level, record extension is used to recursively build records, starting from the empty record. In order to be well-defined, the expression e2 must evaluate to a record that lacks the label l. The result of record extension e is a record that defines the same mapping as the record e2, and in addition it maps the label l to e1. Examples of behavior (the symbol '⇝' denotes evaluation):

  ⟨a = 1 | ⟨⟩⟩ ⇝ ⟨a = 1⟩
  ⟨b = 2 | ⟨a = 1⟩⟩ ⇝ ⟨b = 2, a = 1⟩
  ⟨b = 2 | ⟨c = "A", a = 1⟩⟩ ⇝ ⟨b = 2, c = "A", a = 1⟩
  ⟨b = 2 | ⟨b = 1⟩⟩ ⇝ UNDEFINED
  ⟨b = 2 | ⟨a = 42, b = 1⟩⟩ ⇝ UNDEFINED

• Field Selection. Field selection 'e·l' is used to access the field l in record e. In order to be well-defined, the expression e must evaluate to a record that has the label l. The result of field selection is the value of the specified field in record e. In the syntax used throughout the thesis, field selection is left-associative and has higher precedence than function application; for example, 'f x·a·b' means 'f ((x·a)·b)' and not '((f x)·a)·b'. Examples of behavior:

  ⟨a = 42⟩·a ⇝ 42
  ⟨c = "A", a = 1⟩·c ⇝ "A"
  ⟨b = 1⟩·a ⇝ UNDEFINED
  ⟨b = 2, c = 42⟩·a ⇝ UNDEFINED

• Record Concatenation. Record concatenation 'e = ⟨e1 ∥ e2⟩' is used to merge two records. This operation should more precisely be called symmetric record concatenation, since the two records are required to be disjoint. Because the records to be merged are disjoint, the operation is commutative (hence the name 'symmetric'). In order to be well-defined, both expressions e1 and e2 must evaluate to records and they must have disjoint headings (sets of labels). The result of the concatenation of records e1 and e2 is a record that defines the same mapping as the record e1 (e2) when restricted to the labels of e1 (e2). The empty record is a unit element for this operation. Examples of behavior:

  ⟨⟨a = 1⟩ ∥ ⟨⟩⟩ ⇝ ⟨a = 1⟩
  ⟨⟨⟩ ∥ ⟨a = 1⟩⟩ ⇝ ⟨a = 1⟩
  ⟨⟨b = 2⟩ ∥ ⟨a = 1⟩⟩ ⇝ ⟨b = 2, a = 1⟩
  ⟨⟨b = 2⟩ ∥ ⟨b = 1⟩⟩ ⇝ UNDEFINED
  ⟨⟨b = 2⟩ ∥ ⟨a = 42, b = 1⟩⟩ ⇝ UNDEFINED

• Record Difference. Probably the most versatile of the basic operations, record difference 'e = ⟨e1\e2⟩' is used to throw away those fields from e1 that also appear in e2. The field values of the second operand play no role in the operation, only its labels. The only requirement for this operation to be well-defined is that both expressions e1 and e2 must evaluate to records. The result of the operation is a record that defines the same mapping as the record e1 but restricted to those labels that do not appear in e2. The empty record is a right unit element for this operation. Examples of behavior:

  ⟨⟨a = 1⟩\⟨⟩⟩ ⇝ ⟨a = 1⟩
  ⟨⟨⟩\⟨a = 1⟩⟩ ⇝ ⟨⟩
  ⟨⟨b = 2⟩\⟨a = 1⟩⟩ ⇝ ⟨b = 2⟩
  ⟨⟨b = 2, a = "A"⟩\⟨a = 1⟩⟩ ⇝ ⟨b = 2⟩
  ⟨⟨b = 2, a = "A"⟩\⟨a = "Yoshiko"⟩⟩ ⇝ ⟨b = 2⟩
  ⟨⟨b = 2, a = 1971⟩\⟨a = "Yoshiko"⟩⟩ ⇝ ⟨b = 2⟩

• Field Deletion. To remove a field l from a record e we use field deletion 'e!l'. This operation is not strictly necessary, since it can be expressed using record difference in the following way: e!l ≡ ⟨e\⟨l = ()⟩⟩. The reason it was included among the basic operations is that, in contrast with record difference, it requires the presence of the field to be removed, and thus it is not defined on all operands. From a software engineering point of view, it is useful if we can signal potential programmer errors as early as possible. Trying to remove a non-existent field can be the sign of a typo in the name of the field, so it is better reported. If the programmer decides that the operation is in fact correct, then field deletion can always be rewritten using the more forgiving record difference. In order to be well-defined, the expression e must evaluate to a record that has the label l. The result of field deletion is a record that defines the same mapping as e but is undefined for the label l. In the syntax used throughout the thesis, field deletion is left-associative and has higher precedence than function application (but the same as field selection); for example, 'f x!a!b' means 'f ((x!a)!b)' and not '((f x)!a)!b'. Examples of behavior:

  ⟨a = 42⟩!a ⇝ ⟨⟩
  ⟨c = "A", a = 1⟩!c ⇝ ⟨a = 1⟩
  ⟨b = 3, c = "A", a = 1⟩!c!b ⇝ ⟨a = 1⟩
  ⟨b = 1⟩!a ⇝ UNDEFINED
  ⟨b = 2, c = 42⟩!a ⇝ UNDEFINED

• Record Projection. This operation is analogous to the projection operation of relational algebra, and was introduced so that the subset relation between the headings of the operands could be enforced. Record projection 'e = e1·[e2]' is used to project the first record e1 on the heading of the second record e2. Similarly to field deletion, record projection is not strictly required, since it can also be expressed using record difference: e1·[e2] ≡ ⟨e1\⟨e1\e2⟩⟩. The reasons for including it are similar to those for field deletion, that is, software engineering considerations. In order to be well-defined, both expressions e1 and e2 must evaluate to records and the heading (set of labels) of e2 must be a subset of the heading of e1. The result of projecting record e1 on e2 is a record that defines the same mapping as e1 restricted to the labels that appear in e2. It is important to note that the field values of record e2 play no part in the operation. Heading literals (see Table 3.1), often used in record projection, are simply records whose field values are of no importance (since we are only interested in the set of labels they define). In the syntax used throughout the thesis, record projection is left-associative and has higher precedence than function application; for example, 'f x·[y]·[z]' means 'f ((x·[y])·[z])' and not '((f x)·[y])·[z]'. Also, notice the deliberate similarity in notation between field selection (projecting on a single label) and record projection (projecting on a set of labels). Examples of behavior:

  ⟨a = 1⟩·[⟨⟩] ⇝ ⟨⟩
  ⟨b = 2, a = 1⟩·[⟨a = 23⟩] ⇝ ⟨a = 1⟩
  ⟨b = 2, a = 1⟩·[⟨b⟩] ⇝ ⟨b = 2⟩
  ⟨b = 2, a = 1⟩·[⟨a = 42, b = "B"⟩] ⇝ ⟨b = 2, a = 1⟩
  ⟨b = 2, a = 1⟩·[⟨c = "C"⟩] ⇝ UNDEFINED
  ⟨b = 2, a = 1⟩·[⟨a = 3, c = "C"⟩] ⇝ UNDEFINED

To showcase the expressive power of the basic record operations, in the closing example we show how default values for missing fields can be supplied in a statically type-safe manner:

  default ≡ λt.let d = ⟨a = 7⟩ in ⟨t ∥ ⟨d\t⟩⟩

  default ⟨a = 2, c = True⟩ ⇝ ⟨a = 2, c = True⟩
  default ⟨b = 5⟩ ⇝ ⟨a = 7, b = 5⟩
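The same idiom can be imitated in ordinary Haskell by modelling records as finite maps from labels to values, at the cost of the static type safety just demonstrated (the following sketch, including the name withDefault, is ours):

import qualified Data.Map as Map

-- withDefault d t computes ⟨t ∥ ⟨d\t⟩⟩: the fields of t, plus any defaults
-- from d whose labels are missing from t.
withDefault :: Map.Map String v -> Map.Map String v -> Map.Map String v
withDefault d t = t `Map.union` (d `Map.difference` t)

-- withDefault (Map.fromList [("a",7)]) (Map.fromList [("b",5)])
--   evaluates to fromList [("a",7),("b",5)]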

3.3 Formal Semantics

The formal meaning of programs in the language is defined through operational semantics, that is, using evaluation rules. It is with regard to these evaluation rules that we can say that "well-typed programs don't go wrong": if a term is well-typed according to the type system described in the next chapter, then the evaluation of the term will always result in a value. Figure 3.3 describes the standard evaluation rules [Pierce, 2002] for an implicitly-typed λ-calculus with an additional let construct. The evaluation rules follow a lazy, or more precisely 'call-by-name', evaluation strategy; that is, we always reduce the leftmost, outermost redex (reducible expression) and we never reduce inside abstractions. The substitutions used in rules (E-AppAbs) and (E-Let) are standard capture-avoiding substitutions, as defined in [Pierce, 2002]. Figures 3.4, 3.5, and 3.6 formalize the meaning of the basic record operations described earlier (notice how record projection and field deletion are defined in terms of record difference).

We remark that in the evaluation rules we make use of our earlier definition of values and require certain terms to actually be values in order to enforce evaluation order, for example in (E-Diff2). Also, although we could have defined it directly, we chose to define record concatenation as a series of record extensions to simplify the presentation.

             t1 ⇝ t1′
(E-App)     ----------------------
             (t1 t2) ⇝ (t1′ t2)

(E-AppAbs)   ((λx.t1) t2) ⇝ [x ↦ t2]t1

(E-Let)      let x = t1 in t2 ⇝ [x ↦ t1]t2

Figure 3.3: Evaluation Rules for the Core Language

             tj ⇝ tj′
(E-Rec)     ------------------------------------------------------------------------
             ⟨l1 = t1, ..., lj = tj, ..., ln = tn⟩ ⇝ ⟨l1 = t1, ..., lj = tj′, ..., ln = tn⟩

             t2 ⇝ t2′
(E-Ext1)    --------------------------------
             ⟨l = t1 | t2⟩ ⇝ ⟨l = t1 | t2′⟩

             lx ∉ {l1, ..., ln}
(E-Ext2)    -------------------------------------------------------------------
             ⟨lx = tx | ⟨l1 = t1, ..., ln = tn⟩⟩ ⇝ ⟨lx = tx, l1 = t1, ..., ln = tn⟩

             t1 ⇝ t1′
(E-Conc1)   ----------------------------
             ⟨t1 ∥ t2⟩ ⇝ ⟨t1′ ∥ t2⟩

(E-Conc2)    ⟨⟨l1 = t1, ..., ln = tn⟩ ∥ tx⟩ ⇝ ⟨l1 = t1 | ... ⟨ln = tn | tx⟩ ...⟩

Figure 3.4: Evaluation Rules for Basic Record Operations

             t1 ⇝ t1′
(E-Diff1)   ------------------------
             ⟨t1\t2⟩ ⇝ ⟨t1′\t2⟩

             t2 ⇝ t2′
(E-Diff2)   ------------------------
             ⟨v1\t2⟩ ⇝ ⟨v1\t2′⟩

             l1ˣ ∈ {l1ʸ, ..., lmʸ}
(E-Diff3)   -------------------------------------------------------------
             ⟨⟨l1ˣ = t1ˣ, ..., lnˣ = tnˣ⟩\⟨l1ʸ = t1ʸ, ..., lmʸ = tmʸ⟩⟩
               ⇝ ⟨⟨l2ˣ = t2ˣ, ..., lnˣ = tnˣ⟩\⟨l1ʸ = t1ʸ, ..., lmʸ = tmʸ⟩⟩

             l1ˣ ∉ {l1ʸ, ..., lmʸ}
(E-Diff4)   -------------------------------------------------------------
             ⟨⟨l1ˣ = t1ˣ, ..., lnˣ = tnˣ⟩\⟨l1ʸ = t1ʸ, ..., lmʸ = tmʸ⟩⟩
               ⇝ ⟨l1ˣ = t1ˣ | ⟨⟨l2ˣ = t2ˣ, ..., lnˣ = tnˣ⟩\⟨l1ʸ = t1ʸ, ..., lmʸ = tmʸ⟩⟩⟩

Figure 3.5: Evaluation Rules for Basic Record Operations (Continued)

             t1 ⇝ t1′
(E-Del1)    ------------------
             t1!l ⇝ t1′!l

             l ∈ {l1, ..., ln}
(E-Del2)    --------------------------------------------------------------
             ⟨l1 = t1, ..., ln = tn⟩!l ⇝ ⟨⟨l1 = t1, ..., ln = tn⟩\⟨l = ()⟩⟩

             t1 ⇝ t1′
(E-Proj1)   ------------------------
             t1·[t2] ⇝ t1′·[t2]

             t2 ⇝ t2′
(E-Proj2)   ------------------------
             v1·[t2] ⇝ v1·[t2′]

             {l1ʸ, ..., lmʸ} ⊆ {l1ˣ, ..., lnˣ}
(E-Proj3)   ---------------------------------------------------------------------
             ⟨l1ˣ = t1ˣ, ..., lnˣ = tnˣ⟩·[⟨l1ʸ = t1ʸ, ..., lmʸ = tmʸ⟩]
               ⇝ ⟨⟨l1ˣ = t1ˣ, ..., lnˣ = tnˣ⟩\⟨⟨l1ˣ = t1ˣ, ..., lnˣ = tnˣ⟩\⟨l1ʸ = t1ʸ, ..., lmʸ = tmʸ⟩⟩⟩

Figure 3.6: Evaluation Rules for Basic Record Operations (Continued)

3.4 Sets and Relations

Sets and set operations are not central to the current thesis, but they are required in any practical implementation and are also convenient for presenting the examples. Hence, we assume that the standard set operations, together with an empty set constant, are defined among the constants of the language. In addition to having the standard set operations defined among the constants, it is highly desirable to provide special syntax for set comprehensions, since they greatly simplify the definition of relational operators. The syntactic extensions for sets and set comprehensions are described in Figure 3.7. Although we will not provide formal evaluation rules for the standard set operations (we assume that they are provided externally), we will nevertheless give the translation rules for set comprehension to clarify its meaning. The translation rules (based on the translation rules for list comprehensions in Haskell [Jones et al., 2003]) are presented in Figure 3.8 (the set operation cunionMap is analogous to the list operation concatMap in Haskell's standard prelude).


q ::=                      qualifier

      x ← t                generator
      t                    guard

t ::=                      term

      ...                  ...
      {t, ..., t}          set literal
      {t | q, ..., q}      set comprehension

Figure 3.7: Extra Terms for Sets and Set Comprehension

(SetComp1)  {t1 | True} ⇝ {t1}

(SetComp2)  {t1 | q1} ⇝ {t1 | q1, True}

(SetComp3)  {t1 | x ← t2, q1, ..., qn} ⇝ cunionMap (λx.{t1 | q1, ..., qn}) t2

(SetComp4)  {t1 | t2, q1, ..., qn} ⇝ if t2 then {t1 | q1, ..., qn} else {}

Figure 3.8: Translation Rules for Set Comprehension
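The operation cunionMap is left abstract here; one plausible Haskell reading over Data.Set (the concrete definition below is our assumption) is:

import qualified Data.Set as Set

-- Set analogue of concatMap: map each element to a set, then take the union.
cunionMap :: Ord b => (a -> Set.Set b) -> Set.Set a -> Set.Set b
cunionMap f = Set.unions . map f . Set.toList

-- Under this reading, {x + 1 | x ← s, even x} translates, by (SetComp3)
-- and (SetComp4), to:
--   cunionMap (\x -> if even x then Set.singleton (x + 1) else Set.empty) s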

A relation is simply a set of records. However, due to the way sets are typically implemented, there is a technical issue that must be addressed when introducing relations into a programming language. This issue, which is somewhat marginal to the current thesis but which nevertheless merits some attention, is the question of computable equality between values. For example, it is well known that the problem of comparing two function values for equality is undecidable (a reason why programming language implementations of sets normally cannot handle sets of functions). As a result, only values of types on which equality is defined and computable are allowed as members of sets. Some set implementations go even further when, for reasons of efficiency, they require that only values of types with an ordering defined on them be used as members of sets. It follows that record values in a relation must be comparable to each other, at least for equality. For example, in [Buneman and Ohori, 1996] the authors achieved this by restricting field types to description types of ML (types for which equality is defined) and then defining equality between records recursively as equality on fields.

While acknowledging the problem, we would rather not provide a complete solution for it within the framework of the current thesis. This is simply because doing so would require erecting a large scaffolding, in the form of various advanced programming language features, that would be completely extraneous to our presentation. All the same, we envision that in a practical implementation, something along the lines of Haskell's type class system [Jones et al., 2003] could be used to make records members of the standard Ord type class (types with an ordering defined on them), with the understanding, of course, that only records whose field types are themselves in Ord would be members of Ord. Using type classes would also have the advantage that records with function values in their fields would not be outlawed, unless, of course, we tried to use them in relations.

3.4.1 Relation Headings

The definition of certain relational operators requires access to the headings of their operands.

Since the heading of a relation is simply the heading of any one of its records, it might seem an easy task to get the heading of a relation. The problem is caused by empty relations, since there is no member record to provide the heading. The solution is to provide access to the type of the relation through a controlled form of type reflection. The operation relation heading accesses this type and returns a record whose heading is the same as that of the relation in question. We will use the notation r∗ to denote the heading of relation r. Readers with a keen eye for detail will have noticed that this kind of type reflection implies that relation values must carry their type information with them during execution. In fact, this is what we propose, using, again, Haskell's type classes to provide the necessary scaffolding.

3.5 Defining Relational Algebra Operators

In this section we demonstrate the expressive power of the chosen set of basic record operations by providing definitions for the standard (and also for some non-standard) relational operators. The standard set operations of relational algebra (union, intersection, and difference) are considered to be defined among the constants of the language, so we omit their definitions (with the exception of the Cartesian product, which we define explicitly). To aid understanding, some of the more involved definitions are introduced with brief explanatory remarks.

• Cartesian Product. The definition of this operator is straightforward. Notice, though, that the application of the record concatenation operator requires that the operands of the relations be disjoint:

  times ≡ λr.λs.{⟨ts ∥ tr⟩ | ts ← s, tr ← r}

• Restriction. The function parameter f, representing the restriction criterion, must return a boolean value when applied to a record from the relation r:

  restrict ≡ λf.λr.{tr | tr ← r, f tr}

• Projection. The record parameter h represents the heading on which we want to project the relation r. The application of record projection guarantees that h is actually a subset of the heading of r:

  project ≡ λh.λr.{tr·[h] | tr ← r}

• Join. The local variable h represents the intersection of the headings of relations r and s. This common set of attributes is then used to match pairs of records from the two relations. Incidentally, this comparison for equality also implies that both relations must have the same types for their common attributes:

  join ≡ λr.λs.let h = ⟨r∗\⟨r∗\s∗⟩⟩ in
               {⟨tr ∥ ⟨ts\tr⟩⟩ | ts ← s, tr ← r, tr·[h] == ts·[h]}

• Division. The local variable h represents the set of attributes unique to relation r (its computation also enforces the constraint that the heading of s must be a subset of that of r). Otherwise, the computation of division is based directly on its textbook definition:

  divide ≡ λr.λs.let h = r∗![s∗] in
                 let w1 = project h r in
                 let w2 = times w1 s in
                 let w3 = project h (minus w2 r) in
                 minus w1 w3

• All But ... Using the terminology of [Date and Darwen, 1998], we name the operation of projecting away a set of attributes from a relation allbut. The set of attributes projected away must form a subset of the heading of the input relation:

  allbut ≡ λh.λr.{tr![h] | tr ← r}

• Composition. The definition of this operator is straightforward: we simply project away the common attributes from the result of the join:

  compose ≡ λr.λs.let h = ⟨r∗\⟨r∗\s∗⟩⟩ in allbut h (join r s)

• Small Divide. The definition of this ternary operator is lifted almost verbatim from [Date and Darwen, 1998]:

  small ≡ λq.λr.λs.
          let w1 = times q r in
          let w2 = minus w1 s in
          minus q (project q∗ w2)

• Aggregation. In order to calculate various summaries over relations, we introduce the groupby operator, which partitions a given relation r by grouping records that match on the supplied heading h. A user-supplied aggregation function g is then applied to each partition to calculate summaries (typical aggregation functions are count, sum, average, minimum, and maximum):

  groupby ≡ λg.λh.λr.
            let p = λv.{u![h] | u ← r, u·[h] == v}
            in {g t (p t) | t ← project h r}
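To make the semantics of these definitions easy to experiment with, the following Haskell sketch (all names are ours) transcribes a few of them into a dynamically-typed setting, with tuples as finite maps and relations as sets of such maps; the static row-typing discipline developed in this thesis is deliberately absent here:

import qualified Data.Map as Map
import qualified Data.Set as Set

type Tuple v = Map.Map String v   -- a record: labels to field values
type Rel v   = Set.Set (Tuple v)  -- a relation: a set of records

-- Cartesian product; disjointness of the headings is assumed, not checked.
times :: Ord v => Rel v -> Rel v -> Rel v
times r s = Set.fromList
  [ Map.union tr ts | ts <- Set.toList s, tr <- Set.toList r ]

restrict :: (Tuple v -> Bool) -> Rel v -> Rel v
restrict = Set.filter

-- Projection on a heading; the subset check is again left to the caller.
project :: Ord v => [String] -> Rel v -> Rel v
project h = Set.map (Map.filterWithKey (\l _ -> l `elem` h))

-- Natural join: merge pairs of tuples that agree on their common labels.
join :: Ord v => Rel v -> Rel v -> Rel v
join r s = Set.fromList
  [ Map.union tr ts
  | ts <- Set.toList s, tr <- Set.toList r
  , and [ Map.lookup l tr == Map.lookup l ts
        | l <- Map.keys (Map.intersection tr ts) ] ]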

3.5.1 Sample Relational Algebra Queries

Having defined relational operators, we put them to use in this section to form complex queries over a toy relational database. We show the queries together with their results. The sample database consists of three relations: the relation emps, describing employees; the relation depts, describing departments; and the relation projs, describing projects. Their definitions are as follows:

depts ≡ {
  ⟨dname = "CSE", deptno = 1⟩,
  ⟨dname = "PHY", deptno = 3⟩
}

emps ≡ {
  ⟨ename = "Smith", age = 34, deptno = 1, empno = 1⟩,
  ⟨ename = "Jones", age = 28, deptno = 3, empno = 2⟩,
  ⟨ename = "Adams", age = 42, deptno = 3, empno = 3⟩
}

projs ≡ {
  ⟨pname = "Laser", empno = 1⟩,
  ⟨pname = "Robot", empno = 3⟩,
  ⟨pname = "Robot", empno = 1⟩
}

• Find the names of employees who are under 40:

  q1 ≡ let c = λt.t·age < 40
       in project ⟨ename⟩ (restrict c emps)

  q1 ⇝ {⟨ename = "Jones"⟩, ⟨ename = "Smith"⟩}

• Find the names of employees who work in the Physics department:

  q2 ≡ let c = λt.t·dname == "PHY"
       in project ⟨ename⟩ (restrict c (join emps depts))

  q2 ⇝ {⟨ename = "Jones"⟩, ⟨ename = "Adams"⟩}

• Find the names of employees who work on all projects:

  q3 ≡ let empnos = divide projs (project ⟨pname⟩ projs)
       in project ⟨ename⟩ (join emps empnos)

  q3 ⇝ {⟨ename = "Smith"⟩}

• Count the number of employees per department (we assume that the function size gives the size of a given set):

  q4 ≡ let cnt = λt.λs.⟨count = size s | t⟩
       in project ⟨dname, count⟩ (join depts (groupby cnt ⟨deptno⟩ emps))

  q4 ⇝ {⟨dname = "PHY", count = 2⟩, ⟨dname = "CSE", count = 1⟩}

• Find the names of employees who are not assigned to any projects at all:

  q5 ≡ let empnos = minus (project ⟨empno⟩ emps) (project ⟨empno⟩ projs)
       in project ⟨ename⟩ (join emps empnos)

  q5 ⇝ {⟨ename = "Jones"⟩}

Chapter 4 Type System

In this chapter, we present the type system of the language and provide types for the basic record operations. The type system we develop here is an application of the theory of qualified types by Jones [Jones, 1992] and is a direct extension of a system for extensible records and variants [Gaster and Jones, 1996] that uses rows [Wand, 1987] to describe record types. The theory of qualified types extends the standard Milner type system [Damas and Milner, 1982] of polymorphic types with predicates on types to constrain (qualify) the possible instantiations of type variables. It also comes with a nice formal development, typing rules, a type inference algorithm, and proofs of soundness for the main components. The presentation of the type system follows similar sections of [Gaster and Jones, 1996; Leijen, 2004].

4.1 Kinds

We will use the kind system to distinguish between different kinds of type constructors and to ensure that types are well-formed. The set of kinds is defined by the following grammar:

κ ::= ∗          the kind of all types
    | row        the kind of all rows
    | κ → κ      arrow kinds

Terms of the language have types of kind ∗. Arrow kinds are used in type constructors like polymorphic sets or lists.

4.2 Types

We define the set Cκ of type constructors of kind κ as:

Cκ ::=  χκ             constants
     |  ακ             variables
     |  Cκ′→κ Cκ′      applications

Types are now defined simply as τ ::= C∗. We assume that the following initial set of type constructors is defined (presented in distfix notation):

Int        ::: ∗                  integers
Bool       ::: ∗                  booleans
Unit       ::: ∗                  unit
{ }        ::: ∗ → ∗              set type
→          ::: ∗ → ∗ → ∗          function space
⟦⟧         ::: row                empty row
⟦l : | ⟧   ::: ∗ → row → row      row extension
Rec        ::: row → ∗            record type

Types are well-formed only when type constructors are fully applied (we forbid the partial application of type constructors) to arguments of the correct kinds. For example:

• The result of applying the type constructor → of kind ∗ → ∗ → ∗ to the types Int of kind ∗ and Bool of kind ∗ is Int → Bool. In general, function types are constructed by applying the function type constructor → to arguments τ1 and τ2, both of kind ∗. Also, instead of writing → τ1 τ2 for the result of the application, it is customary to use the infix notation τ1 → τ2 (where the → operator associates to the right).

• The result of applying the type constructor Rec of kind row → ∗ to the empty row ⟦⟧ of kind row is the record type Rec ⟦⟧ of kind ∗.

• The result of applying the row extension type constructor to a type τ of kind ∗ and a row r of kind row is the row ⟦l : τ | r⟧. Since there are no first-class labels in the language, we are actually talking about a family of type constructors, one for each possible label. Notice that the syntactic construction of rows permits the building of ill-formed rows (rows with duplicate labels), something that we will have to prevent using type predicates.

4.3 Type-level Operations and Relations

In this section we introduce some type-level operations and relations on rows that are needed in the rest of the thesis, some of which have already appeared elsewhere [Gaster and Jones, 1996; Leijen, 2004]. An important property of the operations and relations presented in this section is that they are all simple, in the sense that they have polynomial complexity. In the following definitions, as in the rest of the thesis, the meta variable r stands for type expressions of kind row.

The syntactic order of labels is irrelevant in deciding equality between rows; formally:

  ⟦l1 : τ1 | ⟦l2 : τ2 | r⟧⟧ = ⟦l2 : τ2 | ⟦l1 : τ1 | r⟧⟧

Label deletion removes a label l from a row:

  ⟦l : τ | r⟧ − l = r
  ⟦l1 : τ1 | r⟧ − l2 = ⟦l1 : τ1 | r − l2⟧    (l1 ≠ l2)

Row concatenation is defined as:

  r ∥ ⟦⟧ = r
  r1 ∥ ⟦l : τ | r2⟧ = ⟦l : τ | r1⟧ ∥ r2

Difference of rows is defined as:

  r\⟦⟧ = r
  r1\⟦l : τ | r2⟧ = (r1 − l)\r2

Intersection is then simply r1 ∩ r2 ≡ r1\(r1\r2). Notice, though, that due to the way row difference is defined, row intersection is not commutative: the field types of the resulting row 'come from' the first operand.

Much like [Gaster and Jones, 1996], we define a membership relation (l : τ) in r, which holds if the label l with type τ appears in row r:

  (l : τ) in ⟦l : τ | r⟧

   (l1 : τ1) in r    l1 ≠ l2
  ----------------------------
   (l1 : τ1) in ⟦l2 : τ2 | r⟧
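At the meta level these are ordinary list manipulations. The following Haskell sketch (all names, and the toy stand-in for the type language, are ours) mirrors the equations above; note how interR inherits its field types from the first operand, making the non-commutativity of intersection visible:

type Label = String
data Ty = TyInt | TyBool | TyVar String deriving (Eq, Show)
type Row = [(Label, Ty)]    -- a well-formed row has distinct labels

delL :: Row -> Label -> Row                 -- r − l
delL r l = [ p | p@(l', _) <- r, l' /= l ]

conc :: Row -> Row -> Row                   -- r1 ∥ r2
conc = (++)

diffR :: Row -> Row -> Row                  -- r1 \ r2
diffR r1 r2 = foldl delL r1 (map fst r2)

interR :: Row -> Row -> Row                 -- r1 ∩ r2 ≡ r1 \ (r1 \ r2)
interR r1 r2 = diffR r1 (diffR r1 r2)       -- field types come from r1

memb :: (Label, Ty) -> Row -> Bool          -- (l : τ) in r
memb (l, t) r = lookup l r == Just t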

π ::=                      predicate

      r ≃ r                row equality
      --------------------------------------------
      r lacks l            lacks (for each label l)
      r has (l : τ)        has (for each label l)
      r # r                disjoint
      r ≤ r                subset

Figure 4.1: Syntax of Type Predicates

4.4 Type Predicates and Qualified Types

Basic record operations are well-defined only if the types of their operands satisfy certain constraints. Following the theory of qualified types [Jones, 1992], we use predicates on types to capture these constraints. The syntax of predicates is presented in Figure 4.1.

The predicate row equality is separated by a line from the rest for a good reason: all the other predicates (which serve only as notational shorthands) can be expressed in terms of row equality. Why include these 'derived' predicates then? The answer is that they make for more readable (programmer-friendly) predicates and, more importantly, they are of great help in type simplification, as we will see in Chapter 6. An explanation of the meaning of each predicate follows:

• The row equality predicate requires two rows to be equal, up to permutation of labels.

• The lacks predicate requires the absence of a label from a row.

• The has predicate requires the presence of a label with a specific type in a row.

• The disjoint predicate requires two rows to have disjoint headings.

• The subset predicate requires the heading of one row to be a subset of the heading of some other row.

Predicate    Shorthand        Expansion

lacks        r lacks l        r ≃ r − l
has          r has (l : τ)    ⟦l : τ | ⟦⟧⟧ ≃ r\(r − l)
disjoint     r1 # r2          r1 ≃ r1\r2
subset       r1 ≤ r2          ⟦⟧ ≃ r1\r2

Table 4.1: Expansion of Derived Predicates

The formal semantics of predicates is defined by the entailment relation ⊩ in Figure 4.2. We only have to formalize the row equality predicate, since all the other predicates are just notational shorthands that expand to row equality predicates, as defined in Table 4.1. Notice how we make use of the type-level operations and relations on rows, introduced in Section 4.3, to define the expansions of the derived predicates and also to formalize row equality in the entailment relation. The definition of row equality is based on that in [Gaster and Jones, 1996].

A derivation of a predicate π from a finite set of predicates P, written P ⊩ π, proves that whenever all predicates in P hold, the predicate π must hold as well. The entailment relation extends naturally to finite sets of predicates: when all predicates in a finite set P hold, then all predicates in a finite set Q hold as well, provided there is a derivation from P for each predicate in Q. Formally: P ⊩ Q ≡ ∀π ∈ Q. P ⊩ π.

               π ∈ P
(taut)        --------
               P ⊩ π

(rowEqEmpty)   P ⊩ ⟦⟧ ≃ ⟦⟧

(rowEqVar)     P ⊩ ρ ≃ ρ

               (l : τ) in r1    P ⊩ r2 ≃ (r1 − l)
(rowEqHead)   -------------------------------------
               P ⊩ r1 ≃ ⟦l : τ | r2⟧

Figure 4.2: Entailment Relation


The theory of qualified types allows us to use any set of predicates, as long as we can prove that the entailment relation is monotone, transitive, and closed under substitution:

Theorem 4.4.1. The entailment relation ⊩ of Figure 4.2 is:

1. monotone: Q ⊆ P ⇒ P ⊩ Q,

2. transitive: P ⊩ T ∧ T ⊩ Q ⇒ P ⊩ Q, and

3. closed under substitution: P ⊩ Q ⇒ sP ⊩ sQ.

Now we are in a position to introduce qualified types: types that are qualified by a set of type predicates. As in [Gaster and Jones, 1996], we distinguish between types τ, described in Section 4.2, and type schemes σ (types with universally quantified type variables). Formally:

  ϕ ::= τ | π ⇒ ϕ        qualified types
  σ ::= ϕ | ∀α.σ         type schemes

Since the order of predicates and quantified type variables does not matter, we introduce the following abbreviations:

  π1 ⇒ ... ⇒ πn ⇒ τ ≡ π̄ ⇒ τ ≡ P ⇒ τ
  ∀α1. ... ∀αn.ϕ ≡ ∀ᾱ.ϕ

Universal quantification, hence polymorphism, is restricted by constraints on types, captured by the set of predicates P.

As in [Leijen, 2004], for qualified types to be well-formed we require that only row variables or the empty row appear in types, and not arbitrary row expressions. This is an important technical necessity, because the unification algorithm used during type inference does not know how to unify arbitrary row expressions (such as row concatenations). Row expressions thus appear only in type predicates. For example, instead of writing something like Rec (ρ1 ∥ ρ2), we have to write ρ3 ≃ (ρ1 ∥ ρ2) ⇒ Rec ρ3.

Operation               Surface Syntax      Core Syntax

Empty Record            ⟨⟩                  c⟨⟩
Record Extension        ⟨l = e1 | e2⟩       ((cl= e1) e2)
Field Selection         (e·l)               (c·l e)
Record Concatenation    ⟨e1 ∥ e2⟩           ((c∥ e1) e2)
Record Difference       ⟨e1\e2⟩             ((c\ e1) e2)
Field Deletion          (e!l)               (c!l e)
Record Projection       (e1·[e2])           ((c[] e1) e2)

Table 4.2: Translations of Record Operations to Core Syntax

4.5 Typing Basic Record Operations

The theory of qualified types assumes that the term language is core-ML, that is, an implicitly- typed lambda calculus with let-bound polymorphism. Thus, the special syntactic constructs for records and record operations have to be translated to core syntax before type-checking can take place. The translation rules are presented in Table 4.2, with the understanding that record literals are treated as series of record extensions, that is, where each record literal hl1 = e1, ..., ln = eni is treated as the equivalent expression hl1 = e1 | ...hln = en | hii...i.

There is one constant function corresponding to each record operation in the translation, and it is the types of these constants that will capture the meaning of record operations for type- checking purposes. Notice, that during the translation, record labels disappear from the syntax of the language as they get incorporated into the names of the constants corresponding to the 70

Operation (Constant)          Predicates                         Type

Empty Record (c⟨⟩)                                               Rec ⟦⟧
Record Extension (cl=)        ρ2 ≃ ⟦l : α | ρ1⟧, ρ1 lacks l      α → Rec ρ1 → Rec ρ2
Field Selection (c·l)         ρ1 has (l : α)                     Rec ρ1 → α
Record Concatenation (c∥)     ρ3 ≃ ρ1 ∥ ρ2, ρ1 # ρ2              Rec ρ1 → Rec ρ2 → Rec ρ3
Record Difference (c\)        ρ3 ≃ ρ1\ρ2                         Rec ρ1 → Rec ρ2 → Rec ρ3
Field Deletion (c!l)          ρ1 has (l : α), ρ2 ≃ ρ1 − l        Rec ρ1 → Rec ρ2
Record Projection (c[])       ρ3 ≃ ρ1 ∩ ρ2, ρ2 ≤ ρ1              Rec ρ1 → Rec ρ2 → Rec ρ3

Table 4.3: Qualified Types for the Basic Record Operations

This clarifies what we meant when we said that record operations referring to record labels form a family of operations (one for each label).

The qualified type schemes (predicates together with types) for the constants representing the basic record operations, introduced by the translation to core syntax, are presented in Table 4.3 (all type variables in the predicates and in the types are assumed to be universally quantified). Notice how, in the type predicates for the basic operations, the construction of rows at the type level mirrors the construction of records at the value level. This property of the predicates will play an important part in checking the satisfiability of predicates in Chapter 5.

4.6 Typing Rules and Type Inference

In this section we present the typing rules and the type inference algorithm W for the theory of qualified types as developed by Jones [Jones, 1992]. A type assignment A is a finite map from term variables x to type schemes σ:

A ::= {x1 : σ1, ..., xn : σn}

The notation Ax stands for the type assignment A with the assumption on x removed. We abbreviate Ax ∪ {x : σ} as Ax, x : σ. Figure 4.3 shows the standard typing rules for the theory of qualified types (ftv returns the set of free type variables). The typing judgement P | A ⊢ e : σ asserts that the expression e has type scheme σ when the predicates in P are satisfied and the types of the free variables in e are given by the type assignment A. Notice that the typing rules treat sets of predicates as 'black boxes'; that is, the type inference algorithm is neutral towards the concrete system of predicates being used. This 'pluggability' is the system's greatest advantage, since the same set of typing rules (and the same type inference algorithm) can be used without modification in different contexts.

4.6.1 Substitution and Unification

A substitution is a map from type variables to type constructors that is the identity function on all but a finite set of variables. We restrict ourselves to kind-preserving substitutions, that is, substitutions that map type variables to type constructors of the same kind. For a substitution that maps a type variable α to a type constructor C we write [α ↦ C], for the identity substitution we write id, and for the composition of substitutions s and t we write st.

(const)   P | A ⊢ c : σc

           (x : σ) ∈ A
(var)     ---------------
           P | A ⊢ x : σ

           P | A ⊢ e : τ′ → τ    P | A ⊢ e′ : τ′
(→E)      ----------------------------------------
           P | A ⊢ e e′ : τ

           P | Ax, x : τ′ ⊢ e : τ
(→I)      --------------------------
           P | A ⊢ λx.e : τ′ → τ

           P | A ⊢ e : π ⇒ ϕ    P ⊩ π
(⇒E)      -----------------------------
           P | A ⊢ e : ϕ

           P ∪ {π} | A ⊢ e : ϕ
(⇒I)      ------------------------
           P | A ⊢ e : π ⇒ ϕ

           P | A ⊢ e : ∀α.σ
(∀E)      ------------------------
           P | A ⊢ e : [α ↦ τ]σ

           P | A ⊢ e : σ    α ∉ ftv(A) ∪ ftv(P)
(∀I)      ---------------------------------------
           P | A ⊢ e : ∀α.σ

           P | A ⊢ e : σ    Q | Ax, x : σ ⊢ e′ : τ
(let)     -------------------------------------------
           P ∪ Q | A ⊢ let x = e in e′ : τ

Figure 4.3: Typing Rules

A substitution s is called a unifier of type constructors C1 and C2 if sC1 = sC2. A unifier s is a most general unifier if for every unifier u of C1 and C2 there exists a substitution t such that u = ts. Since types are represented as Herbrand terms, we can use the standard kind-preserving unification algorithm, shown in Figure 4.4, to calculate most general unifiers. We write C1 ∼u C2 for the most general unifier u of C1 and C2.

  C ∼id C

   α ∉ ftv(C)            α ∉ ftv(C)
  --------------        --------------
   α ∼[α↦C] C            C ∼[α↦C] α

   C1 ∼u D1    uC2 ∼u′ uD2
  ---------------------------
   C1C2 ∼u′u D1D2

Figure 4.4: Kind Preserving Unification

The following theorem is by Robinson [Robinson, 1965]:

Theorem 4.6.1. The algorithm in Figure 4.4 returns a most general unifier whenever one exists. It fails precisely when no such unifier exists.

With the given unification algorithm, we can directly use the type inference algorithm W for qualified types by Jones [Jones, 1992]. For completeness, we include the type inference algorithm in Figure 4.5. The type inference rules can be interpreted as an attribute grammar in which each judgment of the form P | sA ⊢W e : τ is a semantic rule where the assignment A and the expression e are inherited, while the predicates P, the substitution s, and the type τ are synthesized. The algorithm W calculates principal types with respect to the typing rules presented in Figure 4.3. In the (letW) rule, we use a generalization function that quantifies the free variables not present in the type assignment:

  gen(A, ϕ) = ∀ᾱ.ϕ   where ᾱ = ftv(ϕ)\ftv(A)

           (x : ∀ᾱ.P ⇒ τ) ∈ A    fresh(β̄)    s = [ᾱ ↦ β̄]
(varW)    ---------------------------------------------------
           sP | A ⊢W x : sτ

           P | sA ⊢W e : τ    Q | tsA ⊢W e′ : τ′
           (tτ) ∼u (τ′ → α)    fresh(α)
(→EW)     ------------------------------------------
           u(tP ∪ Q) | utsA ⊢W e e′ : uα

           P | sAx, x : α ⊢W e : τ    fresh(α)
(→IW)     ---------------------------------------
           P | sA ⊢W λx.e : sα → τ

           P | sA ⊢W e : τ    σ = gen(sA, P ⇒ τ)
           Q | tsAx, x : σ ⊢W e′ : τ′
(letW)    -------------------------------------------
           Q | tsA ⊢W let x = e in e′ : τ′

Figure 4.5: Type Inference Algorithm W

Jones proves that the type inference algorithm W in Figure 4.5 is both sound and complete with respect to the typing rules in Figure 4.3 [Jones, 1992]:

Theorem 4.6.2. The algorithm in Figure 4.5 infers a principal type for a given expression e and assignment A. It fails precisely when no type exists for e under the assignment A.

4.7 Examples of Inferred Types

To give a taste of the type system, in this section we provide some expressions along with their inferred types (separated from the expression by a single line). As the type inference algorithm does not check the satisfiability of predicates (a problem addressed in detail in the next chapter), some expressions, though incorrect, have types inferred for them. More complex expressions (like join) also demonstrate how unwieldy inferred types can grow, and thus the need for type improvement and type simplification.

1. Selecting a field from a record literal:

   ⟨a = 2, b = True⟩·a
   --------------------------------------------------------------------------
   P = ρ1 has (a : Int), ρ2 lacks b, ρ1 ≃ ⟦b : Bool | ρ2⟧, ⟦⟧ lacks a, ρ2 ≃ ⟦a : Int | ⟦⟧⟧
   τ = Int

2. Concatenating two records, then extending the result:

   λx.λy.⟨a = 7 | ⟨x ∥ y⟩⟩
   --------------------------------------------------------------------------
   P = ρ1 lacks a, ρ2 ≃ ⟦a : Int | ρ1⟧, ρ1 ≃ (ρ3 ∥ ρ4), ρ3 # ρ4
   τ = Rec ρ3 → (Rec ρ4 → Rec ρ2)

3. A quite complex expression, join:

   join ≡ λr.λs.let h = ⟨r∗\⟨r∗\s∗⟩⟩ in
                {⟨tr ∥ ⟨ts\tr⟩⟩ | ts ← s, tr ← r, tr·[h] == ts·[h]}
   --------------------------------------------------------------------------
   P = ρ4 ≃ (ρ1\ρ8), ρ8 ≃ (ρ1\ρ2), ρ5 ≃ (ρ1\ρ6), ρ6 ≃ (ρ1\ρ4), ρ4 ≤ ρ1, ρ4 ≃ (ρ1\ρ8),
       ρ8 ≃ (ρ1\ρ2), ρ5 ≃ (ρ2\ρ7), ρ7 ≃ (ρ2\ρ4), ρ4 ≤ ρ2, ρ4 ≃ (ρ1\ρ8), ρ8 ≃ (ρ1\ρ2),
       ρ3 ≃ (ρ1 ∥ ρ9), ρ1 # ρ9, ρ9 ≃ (ρ2\ρ1)
   τ = {Rec ρ1} → {Rec ρ2} → {Rec ρ3}

Chapter 5 Checking Satisfiability of Predicates

The type inference algorithm W ignores the satisfiability of predicates. This is a serious omission, since the power of the language derives in part from the expressiveness of type predicates. Hence, we begin the chapter by formalizing the notion of satisfiability and stating that the problem of checking satisfiability is NP-complete both in the number of row variables and in the number of labels. In order to remove the exponential complexity in the number of labels, we introduce a language restriction, but argue that the majority of useful programs remain valid under the restriction. Next, we develop an algorithm (algorithm Q) for checking satisfiability in the restricted system. The algorithm consists of two main parts: (1) a set constraint solver, and (2) a field type constraint solver. Nested records require special treatment, so we follow the general description of algorithm Q with a section on how to handle them. We conclude the chapter with a semi-formal complexity analysis, where we mainly argue for the practical usefulness of the algorithm despite its exponential worst-case complexity.

Let us begin our discussion on satisfiability with a motivating example. Take the following expression:

  e ≡ ⟨a = 1⟩·b

We are trying to select a field from a record literal that does not possess that field; that is, we violate the precondition of the field selection operation. If we attempted to evaluate the expression e using the evaluation rules presented in Chapter 3, we would get stuck before reaching a value term; hence, the expression e is ill-typed. The problem is not that field selection is applied to something other than a record (it is applied to a record), but rather that the record in question does not have the field b. The standard Milner type system can express the first constraint, that field selection must be applied to a record, but it cannot express the second one, that the field being selected must be present in the record. Type predicates are used to capture these additional constraints on types, constraints that cannot be expressed in the Milner type system alone. Let us look at the type inferred by the type inference algorithm W for the expression e:

  ρ1 ≃ ⟦a : Int | ⟦⟧⟧, ρ1 has (b : α) ⇒ α

Since the type inference algorithm is an extension of the standard Damas-Milner one, if the algorithm succeeds in inferring a type for an expression, then none of the constraints expressible in the Milner type system are violated by the expression (a basic property of type inference). Nevertheless, it is still possible that some constraints, expressed using type predicates, are violated by the expression in question. In the case of the concrete example, it is easy to see that there is no instantiation of the row variable ρ1 that simultaneously satisfies both predicates: that ρ1 must be equal to ⟦a : Int | ⟦⟧⟧ and that ρ1 must have the field b. In other words, the type predicates inferred for the expression e are unsatisfiable (a notion we formalize later). Informally, when a set of predicates is unsatisfiable, it means that the preconditions of some basic record operations are violated (and vice versa).

In the rest of the chapter, we formalize the notion of satisfiability of predicates, propose a language restriction to eliminate one source of exponential complexity in checking satisfiability, describe the algorithm Q for checking the satisfiability of predicates, and analyze the complexity of the proposed algorithm.

5.1 Definition of Satisfiability

In this section we formalize the notion of satisfiability of predicates and prove that the problem of checking satisfiability is NP-complete. We define the satisfiability of a set of predicates P as:

sat P iff ∃u. u ∈ {s | s ∈ Subst, ⊨ sP}

That is, a set of predicates P is satisfiable if there is at least one instantiation of row variables that makes all predicates in the set true (that is, entailed by the entailment relation).

Theorem 5.1.1. Given a term e, where P | sA ⊢W e : τ, the problem sat P is NP-complete.

Even more importantly, we observe, based on the reductions used in the proof, that the decision procedure is exponential both in the number of row variables and in the number of labels mentioned in P.

We could now add the requirement sat P to the (letW) rule of the type inference algorithm W to ensure satisfiability of predicates for inferred types. This extension of the algorithm, necessary as it might seem, is nevertheless a non-obvious step, since it changes the complexity of type-checking from the Damas-Milner 'quasi-linear' (that is, exponential in theory, but overwhelmingly linear in practice) to NP-complete. However, checking the satisfiability of predicates is polynomial if all record operations are performed on operands with ground types, which is always the case in expressions that can cause the evaluation of record operations. Exploiting this fact, it is possible to keep the type-checking algorithm polynomial by accepting programs with type errors that are guaranteed not to interfere with evaluation, an approach taken both in [Buneman and Ohori, 1996] and [Makholm and Wells, 2005]. We do not side-step the exponential complexity caused by type-checking polymorphic record operations; rather, by introducing certain restrictions, we offer a system that is almost as expressive as the unrestricted system, and though still NP-complete, we argue that the exponential complexity rarely arises in practice.

5.2 Mapping to Set Expressions

Rows describe sets of labels and predicates on rows can be thought of as set constraints. We would like to formalize this intuition by introducing a mapping φ from row expressions to set expressions. The language of set expressions that is used as the target of the mapping is:

e ::= ∅ | {l} | x | (e ∪ e) | (e ∩ e) | (e\e)

With the above definition in mind, we define the mapping φ from row expressions to set expressions inductively (notice the convention of mapping row variable ρi to set variable ρi′):

φ(⦃⦄) = ∅
φ(ρi) = ρi′
φ(r − l) = φ(r)\{l}
φ(⦃l : τ | r⦄) = {l} ∪ φ(r)
φ(r1 ∥ r2) = φ(r1) ∪ φ(r2)
φ(r1\r2) = φ(r1)\φ(r2)

The following examples show the results of applying φ to various row expressions:

Row Expression                          Set Expression (φ)
ρ1 ∥ ρ2                                 ρ1′ ∪ ρ2′
⦃a : Int | ⦃b : Bool | ρ1⦄⦄             {a} ∪ ({b} ∪ ρ1′)
ρ1 ∥ ⦃a : Int | ⦃⦄⦄                     ρ1′ ∪ ({a} ∪ ∅)
(ρ1 − b) ∥ (ρ1\⦃a : Int | ρ2⦄)          (ρ1′\{b}) ∪ (ρ1′\({a} ∪ ρ2′))
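To make the mapping concrete, the following Haskell sketch implements φ over assumed datatypes for row and set expressions (the constructor names are ours, and field types are omitted, since φ discards them):

data Row
  = EmptyRow              -- the empty row ⦃⦄
  | RowVar String         -- a row variable ρi
  | Remove Row String     -- r − l
  | Extend String Row     -- ⦃l : τ | r⦄, with the field type omitted
  | Concat Row Row        -- r1 ∥ r2
  | Minus Row Row         -- r1 \ r2

data SetExpr
  = Empty | Lbl String | SetVar String
  | Union SetExpr SetExpr | Inter SetExpr SetExpr | Diff SetExpr SetExpr
  deriving Show

-- The mapping φ: a row variable ρi goes to the set variable ρi′.
phi :: Row -> SetExpr
phi EmptyRow     = Empty
phi (RowVar v)   = SetVar (v ++ "'")
phi (Remove r l) = Diff (phi r) (Lbl l)
phi (Extend l r) = Union (Lbl l) (phi r)
phi (Concat r s) = Union (phi r) (phi s)
phi (Minus r s)  = Diff (phi r) (phi s)

For instance, phi (Concat (RowVar "rho1") (Extend "a" EmptyRow)) yields Union (SetVar "rho1'") (Union (Lbl "a") Empty), mirroring the third row of the table above.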

We extend φ to substitutions (restricted to row variables):

φ(s) = [φ(ρi) ↦ φ(ri)]

... and predicates (we treat all derived predicates in their expanded form, hence we only have to handle row equality):

φ(r1 ≃ r2) = (φ(r1) = φ(r2))

Thus φ(s) is a mapping from set variables to set expressions, while φ(P) is a system of set constraints. An important property of our mapping is expressed by the following lemma:

Lemma 5.2.1. Under all substitutions s, for a row expression r and a label ℓ, ℓ ∈ φ(sr) if and only if (ℓ : τ) in sr for some τ.

Informally, what the lemma says is that the mapping φ is well-behaved, in the sense that the presence/absence of fields in row expressions is preserved by the mapping.

5.3 A Simplifying Language Restriction

Theorem 5.1.1 asserts that checking the satisfiability of predicates is exponential in the number of labels appearing (mentioned) in the predicates. Our goal in this section is to remove this source of complexity from the system, for it would be unreasonable to put an a priori upper bound on the number of labels appearing in expressions.

One obvious solution would be to require all fields with the same label to have the same type, in the manner of Haskell. Although this would achieve our goal (we do not prove this here), it would make the system too inflexible. From the programmer’s point of view, unrelated field selections would be forced to have the same type, greatly reducing the usefulness of the system.

Our solution is to require only those field selections that are potentially related through record construction to have the same type. We try to make this restriction clear through the following example, which, though type-correct under the unrestricted system, violates our restriction:

f ≡ λt.λu.λv.⟨t ∥ u⟩·a == 10 ∧ ⟨t ∥ v⟩·a == True

We require the two field selections to have the same type because they can potentially refer to the same field, as is the case when t has the field a. On the other hand, the following, modified, example passes our restriction, since now t cannot have the field a (we extend t with the field a) so the two field selections are not related:

f′ ≡ λt.λu.λv.⟨t ∥ u⟩·a == 10 ∧ ⟨t ∥ v⟩·a == True

∧ ⟨a = 7 | t⟩ == ⟨a = 7, b = 5⟩

Extending t with the field a is just a roundabout way of saying that the record t cannot have the field a. Naturally, in a full-fledged language with type ascription, there is no need for such workarounds; instead, the programmer could achieve the same effect by simply ascribing the intended type to the given term (in this case t). For example (using Haskell-like syntax):

the given term (in this case t). For example (using Haskell-like syntax):

f′′ ≡ λt.λu.λv.⟨(t :: ρ lacks a ⇒ Rec ρ) ∥ u⟩·a == 10 ∧ ⟨t ∥ v⟩·a == True

With the above restriction imposed, checking satisfiability of predicates is polynomial in the number of labels. We will show this by presenting an algorithm with the required complexity in the sections that follow. Formally, satisfiability in the restricted system, as captured by the algorithm for satisfiability, relates to satisfiability in the unrestricted system as:

satALG P iff sat P ∧ ∀s′ ∈ ⌈P′⌉. ∃s ∈ ⌈P⌉. φ(s′) = φ(s)

where we get P′ from P by replacing all occurrences of base types χ with type variables αi, fresh for each occurrence. In effect, P′ is a form of P where the field type constraints have been relaxed. Informally, what the definition says is that for each satisfying substitution for P′ there must exist a satisfying substitution for P that assigns the same set of labels to each row variable.

Since satALG P implies sat P, but sat P does not imply satALG P, the algorithm for satisfiability is sound but not complete with respect to the unrestricted system.

With the restriction imposed, the algorithm still correctly type-checks all examples provided in the thesis, including the definitions of polymorphic relational operators, and a host of useful programs with record and relational operations. Actually, we claim even more: we claim that the restriction, in some sense, improves the language by rejecting programs that are rarely useful in practice.

5.4 The Algorithm Q

Type predicates can be unsatisfiable either because there are conflicting constraints on the presence/absence of record fields, or because there are conflicting type constraints on record fields. Thus, we will first check if φ(P) is satisfiable (since obviously ¬sat φ(P) implies ¬sat P), then, using information gathered from the previous step, we will move on to check field type constraints. Special care needs to be taken to handle nested records, which we will address after discussing the first two phases. The algorithm Q we are about to describe implements the language restriction explained in Section 5.3.

There already exist several decision procedures to solve a system of set constraints [Aiken and Wimmers, 1992; Aiken, 1994]; however, none of them has all of the properties that we find important. Our algorithm exhibits all of them: (1) mixing set constraint solving with unification, (2) producing a normal form that can be directly used for type simplification and improvement, (3) being compositional, or resumable, that is, adding a new set constraint does not require starting over, (4) having good error-reporting capabilities by clearly identifying conflicting constraints, and (5) exploiting assumptions on external database schemas (explained in Section 5.11).

5.4.1 Pseudo Code for Algorithm Q

The steps of the algorithm for satisfiability are summarized in the following high-level outline:

1. Map row predicates P to set constraints φ(P)

2. Identify independent (base) variables in φ(P)

3. Calculate normal form (ψ̂) for set expressions in φ(P)

4. Solve set constraints φ(P), now in normal form (or fail)

5. Build row construction graph G from P

6. Identify connected components in G using the output of Step 4

7. Unify field types belonging to the same component (or fail)

8. If row variables were unified in Step 7, add new set constraints and resume at Step 4; otherwise finish

5.5 A Normal Form for Set Expressions

Let L be the universal set of all labels with base sets H1, ..., Hn ⊆ L. We define 2ⁿ regions R0, ..., R_{2ⁿ−1} by R_j = ⋂_i Ĥi, where Ĥi = Hi if the ith bit is set in the binary representation of j, and Ĥi = H̄i otherwise. Informally, regions are just the contiguous areas of a Venn diagram, as illustrated in Figure 5.1 and Table 5.1 for two base sets.

Figure 5.1: Regions Defined by Base Sets H1 and H2

R0 = H̄1 ∩ H̄2

R1 = H1 ∩ H̄2

R2 = H̄1 ∩ H2

R3 = H1 ∩ H2

Table 5.1: Regions Defined by Base Sets H1 and H2

The following theorem can be found in many textbooks on set theory:

Theorem 5.5.1. For base sets H1, ..., Hn and regions R0, ..., R_{2ⁿ−1} defined as above:

1. L = ⋃_i Ri,

2. regions are pair-wise disjoint, and

3. any set expression on H1, ..., Hn, built using only set union, intersection, complement, and difference, is equivalent to the union of some regions ⋃_{i∈I} Ri.

An important corollary of the theorem, one that will be exploited later, is that, since the regions partition L, each label ℓ belongs to exactly one of the regions Ri.

We define a mapping ψB from homogeneous set expressions (set expressions without label constants) to provisional normal forms, where B is a set of base sets H1, ..., Hn, and R0, ..., R_{2ⁿ−1} are the regions defined on the base sets:

ψB(∅) = ∅
ψB(Hi) = ⋃_{i∈I} Ri where I = {k | Rk ⊆ Hi}
ψB(e ∩ f) = ψB(e) ∩ ψB(f) = ⋃_{i∈(I^e∩I^f)} Ri
ψB(e ∪ f) = ψB(e) ∪ ψB(f) = ⋃_{i∈(I^e∪I^f)} Ri
ψB(e\f) = ψB(e)\ψB(f) = ⋃_{i∈(I^e\I^f)} Ri

It is important to note that we use the definitions of regions to decide whether Rj ⊆ Hk holds for some region Rj and some base set Hk (it is easy to check that Rj ⊆ Hk holds if and only if Rj = ... ∩ Hk ∩ ...).

We remark here that provisional normal forms can be effectively implemented by their index sets (I) represented as bit vectors (of length 2n) thus allowing the performance of set opera- tions on provisional normal forms as bit-wise logical operations on bit vectors. Bit vectors also provide a succinct way of representing provisional normal forms (a useful fact in discussing ex- amples). For example, in bit vector notation, 0101 stands for R , that is, a region appears Si∈{1,3} i in the union if the corresponding bit is set in the bit vector. Further examples (assuming there are two base sets, hence, four regions) are presented in Table 5.2.
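The following Haskell sketch illustrates this encoding (a sketch under the assumption of at most six base sets, so that all 2ⁿ region bits fit in a Word64; all names are ours):

import Data.Bits (setBit, testBit, (.&.), (.|.), complement)
import Data.Word (Word64)

-- Bit i of the vector represents region Ri; a provisional normal form
-- is the union of the regions whose bits are set.
type RegionSet = Word64

-- Normal form of base set Hi (base sets indexed from 0 here): region Rj
-- lies inside Hi exactly when bit i is set in the binary representation
-- of j, by the definition of regions.
baseSetNF :: Int -> Int -> RegionSet
baseSetNF n i = foldl setBit 0 [ j | j <- [0 .. 2 ^ n - 1], testBit j i ]

-- Set operations on provisional normal forms are bit-wise operations.
unionNF, interNF, diffNF :: RegionSet -> RegionSet -> RegionSet
unionNF = (.|.)
interNF = (.&.)
diffNF a b = a .&. complement b

With two base sets, baseSetNF 2 0 sets exactly the bits of regions R1 and R3, matching the bit vector 0101 given for H1 in Table 5.2.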

Set Expression    Provisional Normal Form (ψB)    Bit Vector
∅                 ⋃_{i∈∅} Ri                      0000
H1                ⋃_{i∈{1,3}} Ri                  0101
H2                ⋃_{i∈{2,3}} Ri                  0011
H1 ∪ H2           ⋃_{i∈{1,2,3}} Ri                0111
H1 ∩ H2           ⋃_{i∈{3}} Ri                    0001
H1\H2             ⋃_{i∈{1}} Ri                    0100
H̄1                ⋃_{i∈{0,2}} Ri                  1010

Table 5.2: Examples for Provisional Normal Forms

Provisional normal forms cannot handle label constants in set expressions. For this reason, we extend our normal form with label presence index sets Iℓ for each label ℓ ∈ LP, where LP stands for the set of labels appearing in P. These presence sets, like index sets, are determined by the set expression at hand. The format of our normal form thus becomes:

ψ̂B(e) = ⋃_{i∈I^e} (Ri\LP) ∪ ⋃_{ℓ∈LP} ⋃_{i∈I^e_ℓ} (Ri ∩ {ℓ})

We define ψ̂B inductively as:

For ψ̂B(∅), let I = ∅, with Iℓ = ∅ for all ℓ ∈ LP.

For ψ̂B({l}), let I = ∅, with Il = {0, ..., 2ⁿ − 1} and Iℓ = ∅ for all ℓ ∈ LP with ℓ ≠ l.

For ψ̂B(Hj), let I = {k | Rk ⊆ Hj}, with Iℓ = I for all ℓ ∈ LP.

ψ̂B(e ∩ f) = ψ̂B(e) ∩ ψ̂B(f) = ⋃_{i∈(I^e∩I^f)} (Ri\LP) ∪ ⋃_{ℓ∈LP} ⋃_{i∈(I^e_ℓ∩I^f_ℓ)} (Ri ∩ {ℓ})

The cases for ψ̂B(e ∪ f) etc. are defined analogously.

An important property of the normal form defined by ψ̂ is that the union (intersection, etc.) of two expressions in normal form is again in normal form. Furthermore, the following lemma guarantees that ψ̂ preserves the meaning of set expressions:

Lemma 5.5.2. For a set expression e and label ℓ, ℓ ∈ e iff ℓ ∈ ψ̂B(e).

The index set I^e_ℓ for a label ℓ in the normal form ψ̂B(e) of a set expression e plays an important part in the algorithm, since it tells us exactly which one of the regions a label must belong to in order to belong to a set expression:

Lemma 5.5.3. For a set expression e and a label ℓ ∈ LP, ℓ ∈ e iff ℓ ∈ ⋃_{i∈I^e_ℓ} Ri, where I^e_ℓ is defined by the normal form ψ̂B(e).

Although formidable looking, normal forms can also be concisely represented by their index sets (I together with Iℓ for each ℓ ∈ LP) represented as a collection of bit vectors (of length 2ⁿ), in other words, as a bit matrix, thus allowing the performance of set operations on normal forms as bit-wise logical operations on these bit matrices. Furthermore, in the bit matrix notation we will use the generic label name '*' to mark the index set for all unmentioned labels in P.
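In the same sketch style, a bit matrix can be represented as a map from labels to region bit vectors, with "*" as one of the keys; set operations then lift pointwise (RegionSet as in the previous sketch; the representation is an assumption of ours, not a prescription):

import qualified Data.Map as Map
import Data.Bits ((.&.), (.|.))
import Data.Word (Word64)

type RegionSet = Word64
type Label     = String

-- A normal form as a bit matrix: the index set I is stored under the
-- generic label "*", and each label ℓ in LP has its presence set Iℓ.
type BitMatrix = Map.Map Label RegionSet

-- Pointwise operations, assuming both matrices are defined over the
-- same keys (the labels of LP plus "*").
unionNF, interNF :: BitMatrix -> BitMatrix -> BitMatrix
unionNF = Map.unionWith (.|.)
interNF = Map.intersectionWith (.&.)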

Examples for normal forms in bit matrix notation are presented in Table 5.3 for the predicates:

P = ρ3 ≃ ⦃a : Int | ρ1⦄, ρ4 ≃ ρ3\ρ2

where there are two base sets (ρ1′ and ρ2′), and also LP = {a}, that is, only the label a is mentioned in P.

Set Expression       Normal Form                                          Bit Matrix
∅                    ⋃_{i∈{}} (Ri\{a}) ∪ ⋃_{i∈{}} (Ri ∩ {a})              a:0000 *:0000
ρ1′                  ⋃_{i∈{1,3}} (Ri\{a}) ∪ ⋃_{i∈{1,3}} (Ri ∩ {a})        a:0101 *:0101
ρ2′                  ⋃_{i∈{2,3}} (Ri\{a}) ∪ ⋃_{i∈{2,3}} (Ri ∩ {a})        a:0011 *:0011
{a} ∪ ρ1′            ⋃_{i∈{1,3}} (Ri\{a}) ∪ ⋃_{i∈{0,1,2,3}} (Ri ∩ {a})    a:1111 *:0101
({a} ∪ ρ1′)\ρ2′      ⋃_{i∈{1}} (Ri\{a}) ∪ ⋃_{i∈{0,1}} (Ri ∩ {a})          a:1100 *:0100

Table 5.3: Examples for Normal Forms in Bit Matrix Notation

5.6 Solving Set Constraints

If no labels appear in the set of constraints P, then P is trivially satisfiable by the substitution that maps each row variable to the empty row (an observation also made in [Van den Bussche and Waller, 1999]).

Otherwise, φ(P) is a collection of set equality constraints of the form e = f that we now set out to solve. Using Lemma 5.5.2, e = f iff ψ̂B(e) = ψ̂B(f). Now, using Lemma 5.5.3, ℓ ∈ e iff ℓ ∈ ⋃_{i∈I^e_ℓ} Ri, and similarly ℓ ∈ f iff ℓ ∈ ⋃_{i∈I^f_ℓ} Ri. Now suppose ℓ ∈ Rk where k ∈ I^e_ℓ and k ∉ I^f_ℓ. This would imply that ℓ ∈ e while ℓ ∉ f, which would violate the constraint e = f. Thus, it must be the case that ℓ ∉ Rk. In general, ℓ cannot belong to any of the regions in I^e_ℓ ⊎ I^f_ℓ (their symmetric difference) without violating the constraint e = f.

The last observation leads us to an algorithm to check the satisfiability of the set constraints φ(P) as follows:

1. Initialize Θ(ℓ) to {0, ..., 2ⁿ − 1} for each ℓ ∈ LP.

2. For each set equality constraint (e = f) ∈ φ(P), and for each ℓ ∈ LP, perform the update Θ(ℓ) ← Θ(ℓ)\(I^e_ℓ ⊎ I^f_ℓ), where I^e_ℓ and I^f_ℓ are defined by ψ̂B(e) and ψ̂B(f), respectively.

3. If Θ(ℓ) = ∅ for any ℓ, then φ(P) is not satisfiable; otherwise it is satisfiable.

Θ(ℓ) tells us which regions the label ℓ can be in. Before processing any of the set equalities, ℓ can be in any of the regions, hence the initial value for Θ(ℓ) is the set of indexes of all regions.

By processing the set equalities the algorithm ‘narrows down’ the set of regions each label can belong to. If Θ(ℓ) = ∅, then ℓ cannot be in any of the regions, a contradiction, meaning that the constraints are inconsistent. From an error-reporting point of view, the algorithm can report

(1) exactly which labels have inconsistent constraints on them (those for which Θ(ℓ) = ∅), and (2) exactly which set equalities are responsible for the inconsistency (those that caused the update of some Θ(ℓ) to ∅).
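A sketch of the narrowing step over the bit matrix representation (the index sets of both sides of an equality are assumed to come from their computed normal forms):

import qualified Data.Map as Map
import Data.Bits ((.&.), complement, xor)
import Data.Word (Word64)

type RegionSet = Word64
type Label     = String
type Theta     = Map.Map Label RegionSet

-- One update for a set equality e = f: a label must avoid every region
-- on which the index sets of e and f disagree (their symmetric
-- difference, computed here with xor).
narrow :: Map.Map Label RegionSet    -- index sets I^e_ℓ
       -> Map.Map Label RegionSet    -- index sets I^f_ℓ
       -> Theta -> Theta
narrow ie if' = Map.mapWithKey step
  where
    step l thetaL =
      let conflict = Map.findWithDefault 0 l ie `xor` Map.findWithDefault 0 l if'
      in  thetaL .&. complement conflict

-- After all equalities have been processed, an empty Θ(ℓ) signals
-- inconsistent constraints on the label ℓ.
unsatisfiable :: Theta -> Bool
unsatisfiable = any (== 0) . Map.elems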

Observe that the global constraints captured by Θ can also be represented by a set expression in normal form (using the special name '*' for unmentioned labels):

ψ̂(Θ) = ⋃_{i∈Θ(∗)} (Ri\LP) ∪ ⋃_{ℓ∈LP} ⋃_{i∈Θ(ℓ)} (Ri ∩ {ℓ})

This will be convenient in Chapter 6 when we introduce cleansed normal forms using Θ.

5.7 Selecting Base Sets

In order to solve set constraints, we need to convert set expressions to normal form. But to do that we first need to select base sets in terms of which all set expressions can be expressed. The naïve approach would be to take all the variables (that stand for unknown sets) appearing in φ(P) as base sets, but since the number of regions is exponential in the number of base sets, it is crucial to keep their number small.

A closer examination of the structure of predicates generated by the base operations reveals that most row variables in P only name other row expressions and thus could be completely eliminated by substitution.

1. Initialize Deps(X) to {vs(e) | (X = e) ∈ φ(P)} for all X ∈ vs(φ(P)), and initialize BVP to vs(φ(P)).

2. If there is V ∈ BVP with a dependency set d ∈ Deps(V) such that d ≠ ∅ and V ∉ d (a non-recursive dependency), then

(a) update BVP ← BVP\{V},

(b) and for all X ∈ BVP, update

Deps(X) ← {d ∈ Deps(X) | V ∉ d} ∪ {dX\{V} ∪ dV | dX ∈ Deps(X), V ∈ dX, dV ∈ Deps(V), V ∉ dV}

3. If BVP has changed then go to Step 2, otherwise stop.

Figure 5.2: Algorithm for Identifying Base Variables

Those variables that cannot be substituted away, the independent variables, will form a suitable collection of base sets. We make a slight shift of terminology now as we start talking about base variables instead of base sets. The reason for this shift is that we would like to emphasize the fact that base sets will correspond to certain set variables in φ(P) and that we are going to identify these set variables by analyzing the set constraints in φ(P).

We assume, without loss of generality, that the right-hand side of all equations in φ(P) is either a single variable or the empty set. Let vs(e) stand for the set of variables appearing in e. The set of independent variables in φ(P) can be identified (actually, approximated) by the algorithm presented in Figure 5.2.

When the algorithm terminates, BVP will be the set of independent variables that can now be used as base variables in calculating normal forms for set expressions (of course, set equalities must now be processed in order of their dependencies to ensure that the normal form of a given variable is available in expressions that use that variable).

To clarify how the algorithm works, we present a sample run of the algorithm on the set of constraints {A = B, B = C ∪ D, A = A ∩ C}. Deps(X) describes the set of dependency sets for variable X, while BVP is our current approximation of base variables. Since the dependencies of variables that are no longer in BVP are never considered again, we will only show dependencies for variables that are still in BVP. The initial values of Deps and BVP are:

BVP = {A, B, C, D}

Deps(A) = {{B}, {A, C}}

Deps(B) = {{C, D}}

Deps(C) = {}

Deps(D) = {}

Next, in Step 2 of the algorithm, we discover that the variable A has a dependency {B} that is not empty and does not have A as a member. After performing the updates described in Step 2.a and Step 2.b, the new values of Deps and BVP are:

BVP = {B, C, D}

Deps(B) = {{C, D}}

Deps(C) = {}

Deps(D) = {}

Our next candidate for elimination is variable B since it has a non-empty, non-recursive dependency. After performing the necessary updates:

BVP = {C, D}

Deps(C) = {}

Deps(D) = {}

There are no more suitable candidates for substituting away, so we return our approximation of base variables, the set {C, D}. We talk about an approximation, since it can be the case that a variable we identified as a base variable is in fact not an independent variable, that is, it can be expressed in terms of other variables. However, this is not a serious problem, since all we care about is to quickly select a suitably small set of base variables in terms of which all other variables can be expressed using our normal form.
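The elimination loop of Figure 5.2 is straightforward to transcribe; the following Haskell sketch (over assumed map-based types) returns the remaining variables, our approximation of the base variables:

import qualified Data.Map as Map
import qualified Data.Set as Set

type Var  = String
type Deps = Map.Map Var [Set.Set Var]

-- Repeatedly eliminate a variable with a non-empty, non-recursive
-- dependency set, rewriting the remaining dependencies as in Step 2.
baseVars :: Deps -> Set.Set Var
baseVars deps =
  case [ v | (v, ds) <- Map.toList deps
           , any (\d -> not (Set.null d) && v `Set.notMember` d) ds ] of
    []      -> Map.keysSet deps
    (v : _) -> baseVars (Map.map rewrite (Map.delete v deps))
      where
        dvs = [ d | d <- Map.findWithDefault [] v deps
                  , v `Set.notMember` d ]
        rewrite ds =
             [ d | d <- ds, v `Set.notMember` d ]
          ++ [ Set.delete v d `Set.union` dv
             | d <- ds, v `Set.member` d, dv <- dvs ]

On the sample constraints above, with Deps initialized as shown, this returns the set {C, D}, just as in the walk-through.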

5.8 Checking Field Type Constraints

Type predicates on rows reflect the way records are constructed using basic record operations.

Thus, it is possible to build a 'goes into', or row construction, graph G, showing which records contribute fields to some other record, by analyzing the predicates. If we learn the type of a field in a given record, we can deduce, by looking at the graph, that some other record must also have the same type for the given field. For example, in the record expression e = ⟨x ∥ y⟩, if the field a has type Int in x, then it must also have the same type in e. Notice that field type information flows in the other direction as well: if we discover the type of e·a, then we have learned the type of a in x or y (since only one of them can have a); hence G is undirected.

The row equality predicates in P that we are interested in when building our graph G are of one of the following forms:

1. ρi ≃ ⦃ℓ : τ | ρj⦄

2. ρi ≃ (ρj ∥ ρk)

3. ρi ≃ (ρj\ρk)

The vertices of the graph G are row variables, and an edge between two vertices signals a potential equality of some field types in the connected rows. Formally, G = (V, E), where V = ftv^row(P), and E is defined as follows:

(ρi ≃ ⦃ℓ : τ | ρj⦄) ∈ P ⇒ (ρi, ρj) ∈ E

(ρi ≃ (ρj ∥ ρk)) ∈ P ⇒ (ρi, ρj) ∈ E ∧ (ρi, ρk) ∈ E

(ρi ≃ (ρj\ρk)) ∈ P ⇒ (ρi, ρj) ∈ E

Vertices are labelled with field type information as follows:

ρi(ℓ) = {τ | (ρi ≃ ⦃ℓ : τ | ρj⦄) ∈ P}

The row construction graph G is used to determine which field types must unify. Given a label ℓ and connected vertices ρ1 and ρ2, if ℓ ∈ φ(ρ1) and also ℓ ∈ φ(ρ2), then the field ℓ must have the same type in both rows, that is, all types in ρ1(ℓ) ∪ ρ2(ℓ) must unify. In general, for a given label ℓ, if we can decide whether ℓ ∈ φ(ρi) for each vertex ρi, then we can identify connected components C^ℓ_j in G where ρi ∈ C^ℓ_j implies ℓ ∈ φ(ρi). Vertices that belong to the same component C^ℓ_j must have the same field type for ℓ, that is, all types in ⋃_{ρi∈C^ℓ_j} ρi(ℓ) must unify.

In order to identify the connected components C^ℓ_j, we need to be able to answer the question whether ℓ ∈ φ(ρi). More precisely, we are asking the question: does 'ρi has field ℓ in all valid instantiations of ρi' follow from the set of type predicates P? We can answer this question by taking advantage of a by-product of the set constraint solving algorithm in Section 5.6, namely Θ(ℓ), which tells us which regions ℓ can belong to. Using Lemma 5.5.3, ℓ ∈ ψ̂B(φ(ρi)) if and only if ℓ belongs to one of the regions in the index set I^{φ(ρi)}_ℓ.

Now the question 'is ℓ in φ(ρi)?' can be answered in three different ways:

1. If Θ(ℓ) ⊆ I^{φ(ρi)}_ℓ, then yes.

2. If Θ(ℓ) ∩ I^{φ(ρi)}_ℓ = ∅, then no.

3. Otherwise, maybe.

In order to proceed without resorting to back-tracking (that is, branching on every maybe answer), and thus without exponential complexity in the number of labels, our algorithm takes a maybe answer for a yes and considers ambiguous cases as if the label ℓ had positively appeared in the given row. This greatly simplifies the algorithm and is at the heart of the language restriction discussed earlier.
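The three-valued test can be sketched as follows (over the bit vector representation from earlier; the conservative treatment of the maybe answer is exactly the language restriction at work):

import Data.Bits ((.&.), complement)
import Data.Word (Word64)

type RegionSet = Word64

data Presence = Yes | No | Unknown deriving (Eq, Show)

-- Does the label ℓ occur in φ(ρi)?  theta is Θ(ℓ); idx is the index set
-- of ℓ in the normal form of φ(ρi).
hasLabel :: RegionSet -> RegionSet -> Presence
hasLabel theta idx
  | theta .&. complement idx == 0 = Yes      -- Θ(ℓ) ⊆ I^{φ(ρi)}_ℓ
  | theta .&. idx == 0            = No       -- Θ(ℓ) ∩ I^{φ(ρi)}_ℓ = ∅
  | otherwise                     = Unknown  -- maybe

-- The restricted algorithm treats a maybe answer as a yes when deciding
-- which field types to unify.
treatAsPresent :: Presence -> Bool
treatAsPresent p = p /= No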

Once we have identified all the field types that must unify, we perform unification; if it fails, then we can conclude that the predicates in P are not satisfiable due to conflicting field type constraints. In case of success, the resulting substitution sP (referred to as the improving substitution) will play an important part in type improvement, as discussed in Chapter 6.

Figure 5.3: Connected Components for Field a

To further understanding, in Figure 5.3 we show a row construction graph with the connected components for field a encircled. The graph is generated by the following expression:

if z¹·a then ⟨x² ∥ z¹⟩³·a else g ⟨a = 2 | x²⟩⁴ ⟨x² ∥ y⁵⟩⁶

where numerical superscripts link vertices to their corresponding record expressions. The resulting substitution in this case is sP = [α ↦ Bool].

5.9 Handling Nested Records

Consider the following example using nested records:

e ≡ if x == y then f ((x·a)·b) else g ⟨b = 2 | y·a⟩

Although not immediately obvious, the expression e is ill-typed since it has conflicting presence/absence constraints on the field b. Both x and y are records (we select the field a from both) of the same type (we compare them for equality), meaning that the field a must have the same type in both of them. However, given the same record type for field a, we cannot simultaneously select from it (requiring its presence) and extend it with (requiring its absence) the field b.

The problem is that in the set constraint solving phase, the algorithm does not yet know the important constraint that the record types for field a must be the same. This additional constraint is discovered only in the field type constraint checking phase.

To correctly handle nested records, we have to continue analyzing the improving substitution sP returned by the field type constraint checking phase. The basic idea is that if sP maps one row variable ρx to another ρy, then we have to enforce the additional set constraint φ(ρx) = φ(ρy).

Formally, we define the additional set constraints as

K = {φ(ρx) = φ(ρy) | [ρx ↦ ρy] ∈ sP}.

We can process the additional set constraints K without re-checking all previous set constraints.

If the system of set constraints φ(P) extended with K is unsatisfiable, we fail. Otherwise, we apply the improving substitution sP to P and move on to the field type constraint checking phase.

We repeat this cycle until K = ∅, which is guaranteed to happen since each substitution reduces the number of type variables in P. If the algorithm successfully terminates, the improving substitution eventually returned is defined to be the composition of the improving substitutions from each cycle.
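The cycle can be sketched as the following loop, abstracted over the actual solver (all type parameters here are placeholders; oneCycle stands for one pass of set constraint solving followed by field type unification, returning that pass's improving substitution together with the induced constraints K):

-- A sketch of the nested-record loop; 'oneCycle' is assumed to fail, or
-- to return this cycle's improving substitution together with the new
-- set constraints K derived from row-to-row mappings.
solveNested
  :: (preds -> Either err (subst, [k]))   -- one cycle of algorithm Q
  -> (subst -> preds -> preds)            -- apply a substitution to P
  -> (subst -> subst -> subst)            -- compose substitutions
  -> subst                                -- identity substitution
  -> preds
  -> Either err subst
solveNested oneCycle apply compose = go
  where
    go acc p = do
      (s, k) <- oneCycle p
      let acc' = compose s acc
      if null k
        then Right acc'            -- K is empty: done
        else go acc' (apply s p)   -- apply sP to P and repeat the cycle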

Nested records raise the possibility of recursive record types. Take the following expression:

er ≡ λt.t·a == t

The type of the field a in the record t is the same as the type of t itself. There is nothing in the design or the assumptions of algorithm Q that would prevent it from handling recursive record types, and indeed it does handle them correctly. It is an altogether different question whether recursive record types serve any useful purpose and whether they should be included in the language. It remains as future work to consider the pros and cons of recursive record types in the context of a purely functional database programming language.

5.10 Soundness and Completeness

The following theorem states that algorithm Q, described in the previous sections, correctly identifies sets of predicates that are not satisfiable (with regard to the entailment relation presented in Section 4.4):

Theorem 5.10.1 (Soundness). If ¬sat P then Algorithm Q will report failure.

Completeness, because of the language restriction presented in Section 5.3, is a trickier business. As it stands, algorithm Q is deliberately not complete, since it rejects programs that are well-typed but violate the language restriction. However, it is possible to modify the type system in such a way that it coincides with the algorithm. The main idea is to use two rows (not just one) to describe record types: one for field presence and one for field types. Then we can modify the type of record concatenation so that it requires all fields (present or absent) in its operands to have the same type:

⟨ ∥ ⟩ :: ρ3 ≃ ρ1 ∥ ρ2, ρ1#ρ2 ⇒

Rec ρ1 ρ4 → Rec ρ2 ρ4 → Rec ρ3 ρ4

With the modified type system in mind, we can state the following theorem:

Theorem 5.10.2 (Completeness). Assuming a modified type system with two rows per record type, if sat P then Algorithm Q will succeed.

5.11 Complexity of Algorithm Q

The determining factors in the complexity of constraint satisfiability checking are

1. the number of predicates (p),

2. the number of base variables (n ≤ p), and

3. the number of labels (l ≤ p).

Each predicate is of size O(1) (the typing of the basic operations guarantees this). Furthermore, we assume that field types are of size O(1) so that unification can be considered a unit operation.

Next, we proceed with a detailed complexity analysis of the algorithm.

Converting to set constraints is O(p). Identifying base variables is O(p²): in each iteration (requiring O(p) steps) we get rid of one variable, and there can be at most p variables. Calculating the normal forms and processing the predicates is O(2ⁿlp), since each bit matrix used to represent normal forms is of size O(2ⁿl) (there are 2ⁿ regions and there is one presence set for each label). Building the row construction graph G is O(p) (assuming an efficient graph representation using hash tables and adjacency lists), since it consists of analyzing each predicate and building the graph. Finding the connected components in G for each label (with checking label presence for each vertex) is again O(2ⁿlp) (each presence check is O(2ⁿ) with O(p) vertices and O(l) labels, while finding l connected components using depth-first search is O(lp)). Without nested records, the complexity of the algorithm is thus O(2ⁿlp). Since the presence of nested records might require restarting the algorithm, in the worst case O(p) times (once for each row variable), we conclude that the complexity of algorithm Q is O(2ⁿlp²).

The algorithm is clearly exponential in the number of base variables (which determines the number of regions used in describing normal forms), so we would like to put forward arguments as to why we rarely expect this exponential complexity to arise in practice. The gist of our argument for practicality hinges on the fact that the number of base variables is proportional to the number of function parameters from polymorphic function definitions. To see this, all one has to consider is that independent row variables (ones that are not the result of some type-level operation) can only originate from polymorphic function arguments (informally, uncertainty must come from the 'outside'). Functions that take even as many as a dozen arguments are quite rare in practice, so we expect the algorithm to perform well on average programs. Our expectation is also supported by the experience we gained using the implementation of the algorithm. Also, we would like to reiterate the importance of the language restriction introduced in Section 5.3, which removed one source of exponential complexity (in the number of labels).

In database programming, it is quite common to refer to relations defined in some external database. With the language presented here, it now becomes possible to do this without access to database schema information, relying on the type-checking algorithm to enforce that external relations are used in a consistent manner. From a complexity point of view, this poses a new problem, since database applications regularly use dozens (if not hundreds) of external relations, so our previous argument for practicality (an upper bound on polymorphic function arguments) no longer holds. However, this only has to be the case if we treat database schemas as completely unstructured, which would ignore the useful fact that most relational databases are normalized (to some extent, at least). In fact, if we impose the strict condition on database schemas that any two relations can have at most one attribute in common, then the number of distinct regions becomes quadratic in the number of external relations, making the type-checking algorithm practical again. Severe as it might seem, the restriction is not completely unrealistic, since database schemas in third normal form and using numeric surrogate keys often already satisfy the condition, and if not, can be brought into conformance by normalizing, introducing surrogate keys, and renaming attributes.

5.12 Sample Run

In this section we provide a sample run of the algorithm on a simple expression to clarify each step of the algorithm. The expression we are going to analyze is the following ill-typed expression:

λx.x!a == x!b

The inferred type is:

P = ρ1 has (a : α), ρ2 ≃ ρ1 − a, ρ1 has (b : β), ρ2 ≃ ρ1 − b

τ = Rec ρ1 → Bool

First, we convert the predicates to their expanded forms:

P = ⦃a : α | ⦃⦄⦄ ≃ ρ1\(ρ1 − a), ρ2 ≃ ρ1 − a, ⦃b : β | ⦃⦄⦄ ≃ ρ1\(ρ1 − b), ρ2 ≃ ρ1 − b

Next, we convert the predicates to set constraints φ(P):

φ(P) = {a} ∪ ∅ = ρ1′\(ρ1′\{a}), ρ2′ ≃ ρ1′\{a}, {b} ∪ ∅ = ρ1′\(ρ1′\{b}), ρ2′ ≃ ρ1′\{b}

By analyzing the set constraints, we can identify the set of base variables, in this case: B = {ρ1′}.

Set Expression          Bit Matrix
∅                       a:00 b:00 *:00
{a}                     a:11 b:00 *:00
{b}                     a:00 b:11 *:00
ρ1′                     a:01 b:01 *:01
(ρ2′ =) ρ1′\{a}         a:00 b:01 *:01
ρ1′\(ρ1′\{a})           a:01 b:00 *:00
(ρ2′ =) ρ1′\{b}         a:01 b:00 *:01
ρ1′\(ρ1′\{b})           a:00 b:01 *:00

Table 5.4: Normal Forms in Bit Matrix Notation

In order to process the set constraints, we need to calculate the normal forms of the set expressions, presented in Table 5.4. There is one base variable, which defines two regions, and there are two labels (a and b) mentioned in P, which means that our bit matrices are relatively small.

All that remains is to process the set constraints, now with both sides of each equation in normal form. Initially, there are no constraints on the presence of labels in any of the regions, so Θ(a) = 11 and Θ(b) = 11. Next, we process the set constraints (order is not important):

1. First, let us process the constraint {a} ∪ ∅ = ρ1′\(ρ1′\{a}), now in normal form:

a:11 b:00 *:00 = a:01 b:00 *:00

Thus, we learn that label a cannot appear in region R0 (since that would violate the equation), and we update Θ(a) to 01.

2. Processing the set constraint {b} ∪ ∅ = ρ1′\(ρ1′\{b}) is analogous to the previous step, with the result of updating Θ(b) to 01.

3. Finally, we process the set constraint (ρ2′ =) ρ1′\{a} = ρ1′\{b} (= ρ2′):

a:00 b:01 *:01 = a:01 b:00 *:01

From this we learn that neither a nor b can be in region R1. Thus, we update Θ(a) to 00 and Θ(b) to 00.

Now that we have processed all the set constraints, we can conclude, by looking at Θ, that they are unsatisfiable since both Θ(a) = ∅ and Θ(b) = ∅. Not only that, we can report that we have inconsistent presence/absence constraints for labels a and b.

5.13 Summary

In this chapter we described algorithm Q, an algorithm for checking the satisfiability of predicates derived by the type inference algorithm W. The internal data structures built by algorithm Q, especially Θ and G, will play an important role in the coming chapters, since they capture useful information about the structure of the predicates, the relation between row variables, and the 'relevance' of each predicate.

Chapter 6 Type Improvement and Simplification

The types inferred by the type inference algorithm W (presented in Chapter 4) are not always as accurate or concise as they could be. In this chapter we formalize the notions of accuracy and conciseness of qualified types, and develop algorithms that improve and simplify said types.

The idea of treating type improvement and simplification as orthogonal to type inference in the framework of the theory of qualified types was discussed by Jones in [Jones, 1995]. The definitions (but not the algorithms, since they are specific to our system) presented in this chapter were inspired by Jones’s work.

6.1 Type Improvement

We will begin with a motivating example. Take the following expression:

e ≡ λx.λf.7 + f (x·a) (x·a)

The type inferred for expression e is:

P ⇒ τ ≡ ρ1 has (a : α), ρ1 has (a : β) ⇒ Rec ρ1 → (α → β → Int) → Int

By analyzing the structure of e, we can conclude that the two input parameters of the function f must be of the same type, that of the field a in the record x. Thus, a more accurate type for expression e would be (P′ is a set, thus repeated predicates have been removed):

P′ ⇒ τ′ ≡ ρ1 has (a : α) ⇒ Rec ρ1 → (α → α → Int) → Int

The original principal type τ is not as accurate as it could be, because the type inference algorithm does not (since it cannot) take into consideration certain additional type equality con- straints (here, equality of field types) that follow from the type predicates P. As a result, there are instances of τ where the type predicates P are not satisfiable, for example, those instances that map type variables α and β to conflicting types. Thus, although the principal type τ is strictly more general than the improved type τ′, this additional generality is illusory since we are forced to map type variables α and β to the same type anyway in order to satisfy the type predicates P.

A more accurate (or improved) type is thus one that takes into account at least some of the type equality constraints that follow from the type predicates, but are hidden from the type inference algorithm. The type that has taken all such constraints into account is the principal satisfiable type. Next, we develop these notions formally.

Recall from Section 4.4 that for a type to be well-formed only row variables (or the empty row) can appear in it, and not arbitrary row expressions. According to this rule, the types Rec ⦃⦄ and Rec ρ1 → Rec ρ2 are well-formed, while the types Rec (ρ1 − l) and Rec (ρ1 ∥ ρ2) are not. In order to preserve the well-formedness of types during type improvement, we define a row-restricted substitution as a substitution that maps row variables to other row variables (or to the empty row): that is, a row-restricted substitution does not map row variables to arbitrary row expressions. For example, the substitution [ρ1 ↦ ρ2] is row-restricted, while the substitution [ρ1 ↦ ⦃a : Int | ρ2⦄] is not.

Next, we define the set of satisfiable instances of a type τ with regard to predicates P as:

⌊τ⌋P = {sτ | s ∈ Subst, ⊨ sP}

We call a row-restricted substitution s an improving substitution, and the type sτ an improved type, if the substitution does not change the set of satisfiable instances of the type:

⌊sτ⌋sP = ⌊τ⌋P

We call the improving substitution s the principal improving substitution, and the type sτ the principal satisfiable type, if for all improving substitutions u there exists a row-restricted substitution t such that uτ = tsτ. Observe that the principal satisfiable type is unique (up to renaming of type variables, of course), but the principal improving substitution is not necessarily so (for example, if [α ↦ β] improves the type α → β → Int, so does [β ↦ α]). This hardly matters in practice, since any principal improving substitution will give the principal satisfiable type.

Jones states in [Jones, 1995] that in general (under an arbitrary system of predicates) it can be undecidable to find the principal satisfiable type, and sometimes it does not even exist. Fortunately, this is not the case with the type system presented in this thesis. For all well-typed expressions of the language, a principal satisfiable type does exist, and it can be found using the algorithm we describe later (in Section 6.2).

6.1.1 Representative Cases

Before we develop the algorithm for type improvement, we present three examples that embody the three main 'sources' of additional type equality constraints that are hidden from the type inference algorithm, and thus are not reflected in the inferred type.

Field Type Constraints

Take the following expression:

e1 ≡ λx.λy.if ⟨x ∥ y⟩·a == 3 then x·a else y·b

The type inferred for expression e1 is:

ρ3 has (a : Int), ρ3 ≃ (ρ1 ∥ ρ2), ρ1#ρ2, ρ1 has (a : α), ρ2 has (b : α)

⇒ Rec ρ1 → Rec ρ2 → α

In this example, we select the field a from the concatenation of records x and y to compare it to an integer (thus pinning down its type). Hence, the field a must have type Int in either x or y (since they are disjoint). But we also select a from x, meaning that x must have a, which, together with our previous observation, implies that the expression x·a has type Int. Thus, we can conclude from the constraints on field types that the return type of the function e1 cannot be anything but Int. The row-restricted substitution [α ↦ Int] is therefore an improving substitution. As a matter of fact, the improving substitution s1 = [α ↦ Int] also happens to be a principal improving substitution, since it takes all type equality constraints into account.

The type inference algorithm is too general to perform the kind of reasoning we did in the previous paragraph, and as a result the constraint that the type variable α is in fact equal to Int is hidden from it. In general, when two field selections refer to the same record field, the resulting types must unify. As the above example tries to demonstrate, it is often a non-trivial task (since it can involve reasoning with arbitrary record expressions) to decide whether two field selections actually refer to the same field. Field type equality constraints are thus potential sources of improving substitutions.

Empty Row Constraints

A markedly different kind of ‘hidden’ constraint is when one row variable is constrained to be equal to the empty row by the type predicates. Take, for example, the following expression:

e2 ≡ λx.⟨x ∥ x⟩

The type inferred for expression e2 is:

ρ2 ≃ (ρ1 ∥ ρ1), ρ1#ρ1

⇒ Rec ρ1 → Rec ρ2

In this expression, we concatenate the record x to itself. Record concatenation requires its operands to be disjoint, so ρ1, the type of x, must be a row that is disjoint with itself. There is only one row that is disjoint with itself, and that is the empty row ⦃⦄; thus the type of both the record x and the return value is the empty record Rec ⦃⦄. As a consequence, the improving substitution in this case is s2 = [ρ1 ↦ ⦃⦄, ρ2 ↦ ⦃⦄] (which also happens to be a principal one).

Clearly, the case represented by the expression e2 has nothing to do with field type constraints (no fields are even mentioned in it); rather, it is the result of a field presence/absence constraint, namely, disjointness. In general, type predicates can constrain a particular row variable to be equal to the empty row under any valid instantiation of the type, and we can then improve the type by substituting the empty row for said row variable. Discovering empty row constraints is again outside the scope of the type inference algorithm, and thus it too provides a potential opportunity for type improvement.

Same Row Constraints

Type predicates can constrain two row variables to be the same in ways that are not accessible to the type inference algorithm. Consider the following expression:

e3 ≡ λx.λf. f ⟨a = 7 | x⟩ ⟨a = 2 | x⟩

The type inferred for expression e3 is:

ρ1 lacks a, ρ2 ≃ ⦃a : Int | ρ1⦄, ρ3 ≃ ⦃a : Int | ρ1⦄

⇒ Rec ρ1 → (Rec ρ2 → Rec ρ3 → α) → α

The two input parameters of the function f are extensions of the same record x with the field a of type Int; therefore, the parameters should be of the same type. We could draw the same conclusion by looking at the type predicates and realizing that the right-hand sides of the row equalities defining the row variables ρ2 and ρ3 are exactly the same. Using our observations, we can improve the type inferred for e3 by the improving substitution s3 = [ρ3 ↦ ρ2], which is also a principal improving substitution.

In general, by analyzing the type predicates it can sometimes be concluded that some row variables must always refer to the same row (because, for example, they consist of the same fields and they have matching types for all of their fields). In these cases, we can improve the type by substituting away some row variables by other (equivalent) row variables. As in the previous cases, the type inference algorithm is not aware of these additional type constraints, so they can be exploited for potential type improvement.

6.2 Algorithm for Type Improvement

In order to find the principal satisfiable type, we will have to take into consideration all type equivalence constraints implied by the type predicates. The three main sources of these additional type constraints (field type, empty row, and same row) were outlined in the previous section, so now we can concentrate on identifying the principal improving substitution that takes all three sources into account. We proceed by breaking the problem up into finding three improving substitutions: sℓ, s⦃⦄, and s≃, that capture the consequences of all the field type, empty row, and same row constraints, respectively. The composition of the three substitutions s = sℓ s⦃⦄ s≃ will be a principal improving substitution which, when applied to the type, will yield the principal satisfiable type. In the rest of the section we assume that we are trying to improve the qualified type P ⇒ τ and that it has already been established (using the algorithm described in Chapter 5) that P is satisfiable.

6.2.1 Finding the Improving Substitution sℓ (Field Types)

Finding the substitution sℓ is easy, since it has already been done by the algorithm for checking the satisfiability of predicates, that is, algorithm Q. Among other things, algorithm Q must also check field type constraints (since they might be unsatisfiable) by trying to unify types that refer to the same field (see Section 5.8). The resulting substitution is exactly the one that we are looking for, because it takes all field type constraints into account.

6.2.2 Finding the Improving Substitution s⦃⦄ (Empty Row)

In order to decide which row variables are necessarily equal to the empty row, we will again take advantage of information gathered by algorithm Q. Take example e2 from Section 6.1.1:

e2 ≡ λx.⟨x ∥ x⟩

The set of predicates for expression e2 is:

P ≡ ρ2 ≃ (ρ1 ∥ ρ1), ρ1#ρ1

The system of set constraints φ(P) is (with the disjointness predicate expanded for processing):

φ(P) ≡ ρ2′ = ρ1′ ∪ ρ1′, ρ1′ = ρ1′\ρ2′

Analyzing P we can see that there is one base variable, ρ1′ (since ρ2′ can be expressed in terms of ρ1′), and thus two regions: R1, the complement of ρ1′, and R2 = ρ1′. Next, we convert the set expressions to normal form (notice that, in order to calculate the normal form of ρ2′, we use the normal form of ρ1′):

Set Expression        Normal Form
ρ1′                   *:01
ρ2′ (= ρ1′ ∪ ρ1′)     *:01
ρ1′\ρ2′               *:00

When algorithm Q tries to satisfy the set constraint ρ1′ = ρ1′\ρ2′ (in normal form: *:01 = *:00), it discovers that the region R2 must be empty, since it appears on the left-hand side but not on the right-hand side. In other words, the constraint on unmentioned labels is that they cannot appear in region R2. Algorithm Q records this as Θ = (*:10). Now, if region R2 is constrained to be empty, then it can be removed from the normal form of ρ1′ without changing its meaning, yielding the normal form *:00. But *:00 is exactly the normal form of the empty set! Thus, we can conclude that the set ρ1′ must be the empty set in order to satisfy the constraints in φ(P), and, consequently, the row ρ1 must be equal to the empty row in order to satisfy the predicates in P.

In general, we would like to be able to tell, simply by looking at its normal form, whether a set (and thus the corresponding row) is constrained to be empty. As the previous example shows, in order to do this, we have to take into consideration the global label presence/absence constraints, captured by the algorithm Q. When we apply the consequences of the global label presence/absence constraints to a normal form, we say we cleanse it, turning it into a cleansed normal form. Formally, for some set expression e, we define the cleansed normal form as:

ψ̂ᶜ(e) = ψ̂(e) ∩ Θ

For example, the cleansed normal form of ρ2′ is *:00 (that is, unsurprisingly, ρ2′ is also empty).

Obviously, cleansing normal forms only makes sense once we have processed all set constraints, that is, after algorithm Q has finished (since Θ must reflect all label presence/absence constraints).

Now, if the cleansed normal form of a set is equal to the empty set, then the corresponding row variable is necessarily equal to the empty row and can be safely substituted away. Formally, we define the improving substitution s⦃⦄ as (where ρ ∈ ftv^row(P)):

s⦃⦄ = {[ρ ↦ ⦃⦄] | ψ̂ᶜ(φ(ρ)) = ψ̂(∅)}
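Reusing the bit matrix sketch from Chapter 5, cleansing and the empty-row test can be written as follows (a sketch; the representation is an assumption of ours):

import qualified Data.Map as Map
import Data.Bits ((.&.))
import Data.Word (Word64)

type RegionSet = Word64
type Label     = String
type BitMatrix = Map.Map Label RegionSet

-- Cleansing intersects a normal form with the global constraints Θ,
-- removing regions that are known to be empty.
cleanse :: BitMatrix -> BitMatrix -> BitMatrix
cleanse theta = Map.intersectionWith (.&.) theta

-- A row variable is improved to the empty row exactly when the cleansed
-- normal form of its set variable denotes the empty set.
improvesToEmpty :: BitMatrix -> BitMatrix -> Bool
improvesToEmpty theta nf = all (== 0) (Map.elems (cleanse theta nf))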

6.2.3 Finding the Improving Substitution s≃ (Same Row)

A simple way of finding rows that are necessarily the same would be to compare their normal forms for equality. After all, one might reasonably expect that two set expressions whose normal forms match represent the same set. This is in fact the case, but, unfortunately, there are cases where two set expressions represent the same set, yet their normal forms do not match. For example, the set expressions e1 = X ∪ {a} and e2 = {a} (assuming X is the only base variable) have normal forms (a:11, *:01) and (a:11, *:00), respectively. Now, if the set X is constrained to be the empty set, then it is easy to see that the sets e1 and e2 are exactly the same. Thus, when looking for sets that are the same, what we need to compare is not the normal forms, but rather the cleansed normal forms (since they take global field presence/absence constraints into consideration).

When the cleansed normal forms of two set variables are the same, it means that under all substitutions they contain the same set of labels. The problem is that this does not guarantee that the corresponding rows also have matching field types. Therefore, in order to decide row equality, we have to take field types into consideration as well. This is possible, since during satisfiability checking algorithm Q builds a row construction graph G (see Section 5.8) which contains information on field types in individual rows. As part of checking field type constraints, the algorithm also calculates the connected components C^ℓ_j for each field ℓ. Rows belonging to a component C^ℓ_j must have the same, unique type τ(C^ℓ_j) for field ℓ (these unique field types are determined through unification, whose result, the substitution sℓ, we discussed in Section 6.2.1).

With the aforementioned in mind, after algorithm Q has finished, it now makes sense to ask for the type of field ℓ in row ρ (for which type we will use the notation ρ·ℓ to mirror field selection on records). Formally:

ρ·ℓ = τ(C^ℓ_k) where ρ ∈ C^ℓ_k

Two row variables whose corresponding set variables have matching cleansed normal forms and whose field types are the same for all labels mentioned in P are considered the same, and one can be safely substituted away with the other. The following is the definition of the improving substitution s≃ (where ρx, ρy ∈ ftv^row(P)):

s≃ = {[ρx ↦ ρy] | ψ̂ᶜ(φ(ρx)) = ψ̂ᶜ(φ(ρy)), ∀ℓ ∈ Labels(P). ρx·ℓ = ρy·ℓ}

Actually, the above definition is incorrect as stated, since it generates mappings between any two pairs in a set of equivalent row variables. Thus, if ρ3 is equal to ρ2 and also to ρ1, it will include both the mapping [ρ3 ↦ ρ2] and the mapping [ρ3 ↦ ρ1]. To make the substitution well-defined, we assume there exists some arbitrary ordering on row variables (for example, lexicographical), and require that, if there are several mappings for a row variable in s≃, it should map to the smallest one (according to the chosen ordering).
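A sketch of the resulting test (cnf is assumed to give the cleansed normal form of a row variable's set variable, and fieldType the type ρ·ℓ when it is determined; both are outputs of algorithm Q):

import qualified Data.Map as Map
import Data.Word (Word64)

type RegionSet = Word64
type Label     = String
type BitMatrix = Map.Map Label RegionSet

-- Two row variables are identified when their cleansed normal forms
-- match and their field types agree on every label mentioned in P.
sameRow :: Eq ty
        => (rowVar -> BitMatrix)          -- cleansed normal form
        -> (rowVar -> Label -> Maybe ty)  -- ρ·ℓ, when determined
        -> [Label]                        -- labels mentioned in P
        -> rowVar -> rowVar -> Bool
sameRow cnf fieldType labels x y =
     cnf x == cnf y
  && all (\l -> fieldType x l == fieldType y l) labels

Turning this pairwise test into a well-defined substitution then amounts to mapping each variable in an equivalence class to its smallest member under the chosen ordering.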

To further understanding, we walk through the steps of finding the improving substitution for the example presented in Section 6.1.1:

e3 ≡ λx.λf. f ⟨a = 7 | x⟩ ⟨a = 2 | x⟩

The type predicates P inferred for expression e3 are:

P ≡ ρ1 lacks a, ρ2 ≃ ⦃a : Int | ρ1⦄, ρ3 ≃ ⦃a : Int | ρ1⦄

The system of set constraints φ(P) is (with the lacks predicate expanded for processing):

φ(P) ≡ ρ1′ = ρ1′\{a}, ρ2′ = {a} ∪ ρ1′, ρ3′ = {a} ∪ ρ1′

The only base variable is ρ1′, so the normal forms of the relevant set expressions are as follows:

Set Expression          Normal Form
ρ1′                     a:01 *:01
ρ1′\{a}                 a:00 *:01
ρ2′ (= {a} ∪ ρ1′)       a:11 *:01
ρ3′ (= {a} ∪ ρ1′)       a:11 *:01

After algorithm Q has processed the constraints in φ(P), the global constraints are Θ = (a:10, *:11). (The only relevant constraint in this case is ρ1′ = ρ1′\{a}, that is, the fact that ρ1 lacks the field a.)

Using the value of Θ we can calculate the cleansed normal forms of the set variables:

Set Expression          Cleansed Normal Form
ρ1′                     a:00 *:01
ρ2′                     a:10 *:01
ρ3′                     a:10 *:01

By looking at the cleansed normal forms, we can conclude that the sets ρ2′ and ρ3′ represent the same set of labels, and, consequently, the rows ρ2 and ρ3 have the same set of fields.

Figure 6.1: Row Construction Graph, Expression: λx.λf. f ⟨a = 7 | x¹⟩² ⟨a = 2 | x¹⟩³

Next, we have to check whether the field types of rows ρ2 and ρ3 match. The row construction graph (with vertices labelled with types for field a and connected components encircled) for the predicates P is presented in Figure 6.1 (as in Section 5.8, numerical superscripts link vertices to their corresponding record expressions). By looking at the picture, we can conclude that the type of field a in ρ2 is Int, that is, ρ2·a = Int, and, similarly, ρ3·a = Int. Thus, the rows ρ2 and ρ3 are indeed the same, so we can substitute one away with the other using the improving substitution [ρ3 ↦ ρ2].

6.3 Type Simplification

While experimenting with the language and the type system, one quickly discovers that quite often the inferred type includes predicates that are superfluous. Informally, a superfluous predicate does not capture any relevant information, that is, it does not put any additional constraint on the type variables that is not already captured by other predicates. Take the following, motivating example:

e1 ≡ ⟨a = 7 | ⟨⟩⟩

The type inferred for e1 is:

⦃⦄ lacks a, ρ1 ≃ ⦃a : Int | ⦃⦄⦄ ⇒ Rec ρ1

Obviously, the predicate ⦃⦄ lacks a is trivially true under all instantiations, thus it does not constrain the set of satisfiable instances in any way. Therefore, the following type is equivalent to the previous one:

ρ1 ≃ ⦃a : Int | ⦃⦄⦄ ⇒ Rec ρ1

Formally, two sets of predicates P and Q are equivalent if they define the same set of satisfiable instances (this definition slightly differs from the one offered by Jones in [Jones, 1995]):

P ∼ Q iff ⌊τ⌋P = ⌊τ⌋Q

A predicate π is thus superfluous if P ∼ P\{π}. A set of predicates P is minimal if there is no predicate in P that is superfluous.

Although, in general, it is difficult to say under what conditions one set of predicates should be considered simpler than another (different metrics give different results, not to mention one's subjective taste), we nevertheless decided to settle on one particular measure: the number of predicates in the set. Thus, in this thesis, type simplification will always mean (unless otherwise stated) disposing of superfluous predicates and thus reducing the number of predicates in the type. Formally, the set of predicates P is simpler than Q, written as P ≤ Q, if P ⊆ Q and P ∼ Q. Since finding a minimal set of predicates is easier than finding the smallest minimal set (because finding the smallest set would require enumerating all minimal subsets of P), what we will be looking for during type simplification is not the minimum (smallest), but rather a minimal (irreducible) set of predicates. Thus, there will be no 'principal simplified type.'
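Since we settle for a minimal rather than a minimum set, a simple greedy sweep suffices; the following sketch abstracts the equivalence check P ∼ P\{π} into an assumed oracle:

-- Greedy simplification: drop each predicate that the remaining ones
-- render superfluous. 'superfluous p rest' is an assumed oracle
-- deciding whether rest is equivalent to p : rest. A single
-- left-to-right sweep yields a minimal set here; the sweep can be
-- repeated until nothing is dropped, if needed.
minimizePreds :: (pred -> [pred] -> Bool) -> [pred] -> [pred]
minimizePreds superfluous = go []
  where
    go kept []       = reverse kept
    go kept (p : ps)
      | superfluous p (reverse kept ++ ps) = go kept ps
      | otherwise                          = go (p : kept) ps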

From now on, when we present inferred types, we will assume that satisfiability checking (described in Chapter 5) and type improvement (described in Sections 6.1 and 6.2) have already taken place; that is, the presented types will always be principal satisfiable types.

6.3.1 Representative Cases

In this section, we present several examples of type simplification, where we arrive at a minimal set of predicates using semi-formal reasoning. Bear in mind that our goal is not to give an exhaustive list of cases to be handled later, but rather to demonstrate the various ways a predicate can become superfluous.

Constant Predicates

In some cases, every row variable mentioned in the predicates can be substituted away using row equality predicates. Take the following example:

e1 ≡ ha = 7 | hb = 2 | hiii·b

The improved type for expression e1 is:

ρ1 has (b : Int), ρ1 ≃ La : Int | ρ2M, ρ2 lacks a,

ρ2 ≃ Lb : Int | LMM, LM lacks b

⇒ Int

In this example, we select a field from a record literal that is completely determined at compile-time. Intuitively, if the operation is type-correct, the type of the expression should simply be the type of the selected field without any type predicates whatsoever. More formally, observe that there are no independent row variables in the predicate set: ρ2 is Lb : Int | LMM, and ρ1 (an extension of ρ2) is La : Int | Lb : Int | LMMM. In other words, we are dealing with constant predicates that are necessarily true (since we already checked that they are satisfiable). Necessarily true predicates do not capture any relevant type constraints, hence the type of expression e1 after type simplification is simply Int.

Predicate Cancellation

Sometimes, the precondition of some basic operation is guaranteed by the postcondition of some other basic operation. In this situation, the predicate expressing the precondition becomes superfluous. Take the following example:

e2 ≡ λx.ha = 2 | x!ai·a

The improved type for expression e2 is:

(1) ρ2 lacks a, ρ2 ≃ ρ1 − a , ρ1 has (a : α),

(2) ρ3 has (a : Int), ρ3 ≃ La : Int | ρ2M

⇒ Rec ρ1 → Int

In this example, there are two instances of predicate cancellation (cancelling pairs of predicates highlighted): (1) we extend the record x!a with the field a that the record is guaranteed to lack, and (2) we select from the record ha = 2 | ...i the field a that the record is guaranteed to have.

Actually, the only constraint expression e2 puts on record x is that it must have the field a. Consequently, the simplified type of expression e2 is: ρ1 has (a : α) ⇒ Rec ρ1 → Int.

Unreachable Predicates

A quite common case is when some predicates become superfluous because the type variables they constrain are no longer reachable (a notion we will make more precise later) from the Milner type. Take the following example:

e3 ≡ λx.let f = λy.hx k yi in x

The improved type for expression e3 is:

ρ3 ≃ (ρ1 k ρ2), ρ1#ρ2 ⇒ Rec ρ1 → Rec ρ1

In expression e3, inside the function f , we concatenate the record x to the record y, but there is no way to supply the value of y from the outside world since f never escapes the scope of the let expression. A closer examination of the situation reveals that the row variable ρ2 is not mentioned in the type and cannot be constructed from ρ1, that is, ρ2 is unreachable (and so is ρ3). As a result, the type predicates, although they are not constant, do not constrain the type of record x in any way. The simplified type of e3 is thus only Rec ρ1 → Rec ρ1.

Parallel Construction

When a row is constructed in several different ways from the same rows, it often happens that only certain predicates are required to describe the construction of the row, making predicates that lie on ‘parallel construction paths’ superfluous. Take the following example:

e4 ≡ λx.λy.if True then hx k yi else hhx\yi k yi

The improved type for expression e4 is:

ρ3 ≃ (ρ1 k ρ2), ρ1#ρ2, ρ3 ≃ (ρ4 k ρ2), ρ4#ρ2, ρ4 ≃ (ρ1\ρ2)

⇒ Rec ρ1 → Rec ρ2 → Rec ρ3

Predicates mirror the construction of records at the level of rows. For instance, the row ρ3 is constructed in two different ways (as a result of constructing the same record in the two branches of the if expression). Observe, however, that only one way of construction is actually needed in this situation, because in both cases the resulting record is the concatenation of the records x and y. Consequently, during type simplification, we have a choice between two minimal sets of predicates (representing the two different ways of construction):

• ρ3 ≃ (ρ1 k ρ2), ρ1#ρ2 ⇒ Rec ρ1 → Rec ρ2 → Rec ρ3

• ρ3 ≃ (ρ4 k ρ2), ρ1#ρ2, ρ4 ≃ (ρ1\ρ2) ⇒ Rec ρ1 → Rec ρ2 → Rec ρ3

The type simplification algorithm halts as soon as it has found a minimal set of predicates, and there is no guarantee that it will find the smallest possible set. Thus, the actual minimal set chosen by the algorithm can be considered somewhat of an accident of implementation.

6.4 Algorithm for Type Simplification

The goal of type simplification is to find a minimal set of predicates by removing superfluous predicates from a given set of predicates P. The algorithm for type simplification consists of the following steps: (1) identify reachable predicates using the Milner type; (2) identify relevant predicates among reachable predicates; and (3) provide support using constructor predicates for unsupported row variables appearing in the relevant predicates or in the Milner type. In the rest of the chapter, we assume that satisfiability checking and type improvement have already taken place before attempting type simplification.
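The three steps compose into a simple pipeline. The following Haskell sketch merely fixes the shape of that pipeline; all names and types are hypothetical stand-ins, and the three step functions are elaborated in the subsections that follow.

-- A sketch of the simplification pipeline (all names hypothetical).
type Type = String  -- placeholder for the Milner type
type Pred = String  -- placeholder for a row predicate

reachable :: Type -> [Pred] -> [Pred]  -- step (1), Section 6.4.1
reachable = undefined

relevant :: [Pred] -> [Pred]           -- step (2), Section 6.4.3
relevant = undefined

supportDangling :: Type -> [Pred] -> [Pred] -> [Pred]  -- step (3), Section 6.4.4
supportDangling = undefined

-- Steps (1)-(3) applied in order to the improved predicate set.
simplify :: Type -> [Pred] -> [Pred]
simplify tau preds = supportDangling tau (relevant (reachable tau preds)) preds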

6.4.1 Identifying Reachable Predicates

Informally, reachable predicates are those predicates that may directly or indirectly constrain some type variables appearing in the Milner type. Predicates that are not reachable are guaran- teed to be superfluous, since they capture constraints on type variables that cannot be instantiated through unification. Take the following expressions and their improved types:

Expression                            Improved Type

e1 ≡ ha = 7 | hii·a                   P1 ≡ ρ1 has (a : Int), LM lacks a, ρ1 ≃ La : Int | LMM
                                      τ1 ≡ Int

e2 ≡ λx.x!a                           P2 ≡ ρ1 has (a : α), ρ2 ≃ ρ1 − a
                                      τ2 ≡ Rec ρ1 → Rec ρ2

e3 ≡ λx.let f = λy.hx k yi in x·a     P3 ≡ ρ3 ≃ (ρ1 k ρ2), ρ1#ρ2, ρ1 has (a : α)
                                      τ3 ≡ Rec ρ1 → α

For expression e1, there are no type variables in the Milner type τ1 (since Int is a monotype), so no predicate in P1 is reachable. For expression e2, both predicates in P2 are reachable since they constrain row variables ρ1 and ρ2 appearing in the Milner type τ2. As for expression e3, only the predicate ρ1 has (a : α) is reachable.

To better understand why unreachable predicates are necessarily superfluous, take for example the predicate ρ1#ρ2 from P3. Since ρ2 is unknown, and there is no way that it can become known, what the predicate ρ1#ρ2 says is that row ρ1 is disjoint from some other row ρ2. But we do not know anything about row ρ2! Hence, we can safely ignore the predicate ρ1#ρ2, since it is always true that row ρ1 is disjoint from some row ρ2.

We can calculate the set of reachable predicates Rch ⊆ P, relative to some type τ, using the following algorithm (where rvs is the set of currently reachable row variables, and predicates like lacks or has are processed in their expanded form):

1. Initialize Rch = ∅, and rvs = ftvrow(τ).

2. If there is a predicate π = (r1 ≃ r2) ∈ P such that the row expression r1 is not a single row variable and ∅ ≠ ftvrow(r1) ⊆ rvs (or the same holds for the row expression r2), then update:

(a) Rch ← Rch ∪ {π}

(b) rvs ← rvs ∪ ftvrow(π)

3. If Rch has changed in the last iteration, go to Step 2, otherwise, halt.

The restriction in Step 2 of the algorithm, that we do not consider a row equality predicate reachable if the reachable side consists of only a single row variable, makes the definition of reachable predicates much tighter. For example, without the restriction, the predicate π = ρ1 ≃ (ρ2 k ρ3) would be considered reachable even if only ρ1 appears in the Milner type, which is undesirable, since predicate π is completely irrelevant unless we know something about both ρ2 and ρ3. Also, notice that, according to the algorithm, if a predicate does not contain row variables (the set of free row variables on both sides of the row equality is empty), then it cannot be reachable. Finally, if there are no row variables in the Milner type τ, then the set of reachable predicates is necessarily empty.
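The fixed-point computation above transcribes directly into Haskell. In the sketch below, the types are simplified stand-ins: a row expression is reduced to its shape (a single row variable or anything compound) together with its free row variables, which is all the algorithm inspects; the actual implementation would operate on the real row expression syntax.

import qualified Data.Set as Set
import Data.Set (Set)

type RowVar = String

-- A row expression, reduced to what reachability needs to know.
data RowExpr = Var RowVar            -- a single row variable
             | Compound (Set RowVar) -- any other row expression
  deriving (Eq, Ord, Show)

data Pred = RowEq RowExpr RowExpr    -- r1 ~ r2 (lacks/has expanded)
  deriving (Eq, Ord, Show)

frv :: RowExpr -> Set RowVar
frv (Var v)       = Set.singleton v
frv (Compound vs) = vs

-- A side makes a predicate reachable only if it is not a single row
-- variable and its (nonempty) free row variables are all reachable.
sideReaches :: Set RowVar -> RowExpr -> Bool
sideReaches _   (Var _) = False
sideReaches rvs r       = not (Set.null vs) && vs `Set.isSubsetOf` rvs
  where vs = frv r

-- Iterate Step 2 until no new predicate is added (Step 3).
reachablePreds :: Set RowVar -> [Pred] -> Set Pred
reachablePreds rvs0 preds = go Set.empty rvs0
  where
    go rch rvs
      | null new  = rch
      | otherwise = go rch' rvs'
      where
        new  = [ p | p@(RowEq r1 r2) <- preds
                   , not (p `Set.member` rch)
                   , sideReaches rvs r1 || sideReaches rvs r2 ]
        rch' = rch `Set.union` Set.fromList new
        rvs' = rvs `Set.union`
               Set.unions [ frv r1 `Set.union` frv r2 | RowEq r1 r2 <- new ]

The first argument is ftvrow(τ); for expression e1 above it is empty, so the result is immediately the empty set, as expected.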

6.4.2 Identifying Constructor Predicates

In Section 5.7 we presented an algorithm for identifying base variables in a set of predicates.

Informally, base variables are independent row variables, that is, row variables in terms of which all other derived (non-base) row variables can be expressed. Take for example the following set of predicates:

P ≡ ρ3 ≃ ρ1 − a, ρ1 has (a : α), ρ4 ≃ ρ2 k ρ1, ρ2#ρ1

The base variables in P are ρ1 and ρ2, while the derived variables are ρ3 and ρ4. The predicates Ctr ⊆ P that are used to 'construct' all derived row variables are called constructor predicates. In the above example, there are two constructor predicates: Ctr = {ρ3 ≃ ρ1 − a, ρ4 ≃ ρ2 k ρ1}.

We remark that, since the set of base variables is not unique (explained in Section 5.7), neither is the set of constructor predicates. For instance, take the predicates ρ1 ≃ ρ2 − a, ρ2 ≃ La : α | ρ1M: either row variable ρ1 or ρ2 can be used as the base variable, the choice being arbitrary.

6.4.3 Identifying Relevant Predicates

As algorithm Q processes set equality constraints in φ(P) (see Section 5.6), it records the consequences (in the form of disallowing the presence of certain labels in certain regions) of said constraints. However, not all constraints convey new information about the problem. Only when the normal forms on the two sides of a set equality constraint are different can we learn something new. In the case of constructor predicates, however, the two sides always have the same normal form, since we use one side to define the other side (a derived variable). In a sense, constructor predicates are assumed to be true, and we check the remaining predicates against them. Now, if the processing of a non-constructor predicate leaves the global presence/absence constraints Θ unchanged, then the predicate in question is irrelevant, in the sense that its consequences follow from the predicates processed before it. Unfortunately, since it depends on the processing order, this approach does not, in itself, necessarily find a minimal set of relevant predicates.

In order to find a minimal set of relevant predicates, we have to extend Θ so that we can capture the evolution of label presence/absence constraints. The idea is that after each update of Θ, we also record which predicate caused the change (this information will also form the basis of error reporting in Chapter 7). Actually, we need to do more: when updating Θ with the consequences of some predicate π, we need to check whether it subsumes the consequences of some previous predicate (which we are now able to do, using the log of updates to Θ). Consider the following set of predicates:

P ≡ ρ3 ≃ ρ1 k ρ2, ρ1 lacks a, ρ3 lacks a

Set Expression       Normal Form
ρ1′                  a: 0101   *: 0101
ρ2′                  a: 0011   *: 0011
ρ3′ (= ρ1′ ∪ ρ2′)    a: 0111   *: 0111
ρ1′\{a}              a: 0000   *: 0101
ρ3′\{a}              a: 0000   *: 0111

Table 6.1: Normal Forms for Set Expressions in P

Since ρ1 is a subset of ρ3, the predicate ρ3 lacks a subsumes the predicate ρ1 lacks a. To see how we can discover this fact, take a look at the steps taken by algorithm Q:

1. Convert predicates P to set constraints φ(P) = {ρ3′ = ρ1′ ∪ ρ2′, ρ1′ = ρ1′\{a}, ρ3′ = ρ3′\{a}}.

2. Identify the set of base variables as BV = {ρ1, ρ2}. The set of constructor predicates is thus Ctr = {ρ3 ≃ ρ1 k ρ2}.

3. Calculate the normal forms of the set expressions in Table 6.1 (we process the lacks predicates in their expanded form).

4. After processing the constraint π1 ≡ ρ1′ = ρ1′\{a}, the global constraints are Θ = (a: 1010, *: 1111). (We do not have to process the constructor predicate ρ3′ = ρ1′ ∪ ρ2′, since we used it to define the normal form of ρ3′.) In other words, we learned that the field a cannot be in regions R1 and R3, that is, in ρ1′.

5. Next, the constraint π2 ≡ ρ3′ = ρ3′\{a} changes the global constraints to Θ = (a: 1000, *: 1111). However, the information gathered from the constraint π2 subsumes that of constraint π1. This is because the set of 'forbidden' regions {1, 3} implied by π1 for label a is a subset of the set of forbidden regions {1, 2, 3} implied by π2 for the same label. The constraint π2 captures all the constraints of π1, thus it makes π1 superfluous.

We can now define relevant predicates Rel ⊆ Rch as those non-constructor predicates that cause an update to the global constraints Θ during set constraint solving and whose effect on the global constraints is not subsumed by some other predicate (which we ensure by keeping track of the evolution of global constraints).
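A sketch of the bookkeeping this requires is given below: each non-constructor predicate is logged together with the forbidden regions it implied for each label, and a predicate is kept only if no single other predicate subsumes its entire contribution. All types here are simplified, hypothetical stand-ins for the structures used by algorithm Q.

import qualified Data.Map as Map
import qualified Data.Set as Set
import Data.Map (Map)
import Data.Set (Set)

type Label  = String
type Region = Int
type PredId = Int

-- Per predicate: the regions in which each label was newly forbidden.
type Contrib = Map Label (Set Region)
type Log     = [(PredId, Contrib)]

-- c is subsumed by c' if, for every label c constrains, c' forbids a
-- superset of the same regions (cf. {1,3} versus {1,2,3} above).
subsumedBy :: Contrib -> Contrib -> Bool
subsumedBy c c' =
  and [ maybe False (rs `Set.isSubsetOf`) (Map.lookup l c')
      | (l, rs) <- Map.toList c ]

-- Relevant predicates: those whose contribution is not subsumed by
-- the contribution of any single other logged predicate.
relevantIds :: Log -> [PredId]
relevantIds logEntries =
  [ pid
  | (pid, contrib) <- logEntries
  , not (any (\(pid', contrib') -> pid' /= pid
                                && contrib `subsumedBy` contrib')
             logEntries) ]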

6.4.4 Putting It All Together

After we have identified the relevant predicates, we are still not done, as the last step is to provide support for row variables mentioned in the relevant predicates or in the Milner type. The problem is that relevant predicates, by definition, exclude constructor predicates, so using only relevant predicates in the simplified type might result in 'dangling' row variables, that is, row variables without support (a way of construction). For instance, consider the expression e ≡ λx.x!a. The improved type for expression e is P ⇒ τ ≡ ρ2 ≃ ρ1 − a, ρ1 has (a : α) ⇒ Rec ρ1 → Rec ρ2. The row variable ρ1 is the only base variable, and the predicate ρ2 ≃ ρ1 − a is the only constructor predicate. The only relevant predicate in this case is Rel = ρ1 has (a : α). However, it would be incorrect to use ρ1 has (a : α) ⇒ Rec ρ1 → Rec ρ2 as the simplified type, since it leaves row variable ρ2 dangling. The problem is that P ≁ Rel, which violates the fundamental requirement of type simplification, that it should not alter the set of satisfiable instances.

The way to handle dangling row variables is to use constructor predicates to provide support for them. Since constructor predicates, by definition, are able to construct all derived (non-base) row variables from base row variables, we can use them to construct dangling row variables. In the following algorithm, P′ denotes the simplified set of predicates being built, drv is the set of dangling row variables, BV is the set of base variables, Ctr is the set of constructor predicates, and Rel is the set of relevant predicates. We also assume that each constructor predicate has the row variable on its left-hand side:

1. Initialize P′ = Rel and drv = (ftvrow(Rel) ∪ ftvrow(τ))\BV.

2. If drv = ∅, then halt.

3. Pick a row variable ρ ∈ drv and the predicate π = (ρ ≃ r) ∈ Ctr, and update:

(a) P′ ← P′ ∪{π}

(b) drv ← (drv ∪ ftvrow(π))\{ρ}\BV

4. Go to Step 2.
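The loop transcribes directly into Haskell. In this sketch the constructor predicates are assumed to be keyed by the row variable on their left-hand side, all types are simplified stand-ins, and a 'done' set guards against revisiting a variable whose constructor predicate has already been added.

import qualified Data.Set as Set
import Data.Set (Set)

type RowVar = String
type Pred   = String  -- placeholder for the real predicate type

-- Constructor predicates keyed by their left-hand side row variable,
-- paired with the free row variables of their right-hand side.
type Ctr = [(RowVar, (Pred, Set RowVar))]

supportDangling
  :: Set RowVar  -- base variables BV
  -> Ctr         -- constructor predicates
  -> Set RowVar  -- ftvrow(Rel) together with ftvrow(tau)
  -> [Pred]      -- relevant predicates Rel
  -> [Pred]
supportDangling bv ctr vars rel =
  go rel Set.empty (vars `Set.difference` bv)
  where
    go acc done drv = case Set.minView drv of
      Nothing -> acc
      Just (v, rest)
        | v `Set.member` done -> go acc done rest
        | otherwise ->
            case lookup v ctr of
              Nothing       -> go acc done rest  -- v already supported
              Just (p, rhs) ->
                go (p : acc) (Set.insert v done)
                   (rest `Set.union` (rhs `Set.difference` bv))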

Observe that, in the initialization step, we do not consider base row variables as dangling, since, by definition, they cannot (and need not) be constructed from other row variables.

When the above algorithm finishes, P′ will be a minimal set of predicates that defines the same set of satisfiable instances as P, formally, P′ ≤ P and there is no Q ⊂ P′ such that Q ∼ P.

We call P′ ⇒ τ the simplified type.

Chapter 7 Explaining Type Errors

In this chapter, we turn to the problem of generating informative error messages for ill-typed expressions. We do not present a fully-fledged algorithm (it could well be the subject of a separate dissertation on its own), only a proposal for explaining complex errors using an interactive approach. There has been a lot of research on how to generate useful error messages in polymorphic languages, including a proposal for an interactive Q&A-like system by Beaven and Stansifer in [Beaven and Stansifer, 1993]. Pure unification-based type inference algorithms are actually constraint solvers where the constraints are all type equality constraints. In a qualified type system, the type checking algorithm has to deal with both type equality constraints and predicates on row variables. Type errors in these systems translate to an unsatisfiable, conflicting constraint set. The chief task of the type checker is to report the cause of the conflict in a way that the programmer will find useful in locating the error in her program. The type system of the language presented in this thesis is an extension of that of ML, thus it felt natural to base the type explanation algorithm on one designed for ML [Beaven and Stansifer, 1993], which we briefly discuss in the next section.

7.1 Explaining Type Errors in Polymorphic Languages

The type inference algorithm W presented in Section 4.6, like most type checking algorithms, is syntax directed and compositional: the type of an expression is calculated bottom-up from the types of its subexpressions. When type inference fails in ML, the reason for failure is always a unification failure: two types that should be the same fail to unify. The result of successful type unification is a substitution that summarizes the consequences of type equalities. The type assignment provides the context in which type inference takes place, one of its roles being the communication of type constraints between different subexpressions of an expression, that is, between different branches of the abstract syntax tree. In practice, during type checking there is always an initial context (type assignment) which contains type bindings for the primitives and library functions of the language.

The general approach taken in [Beaven and Stansifer, 1993] is to augment the data structures used by the type inference algorithm with information that can later be used by explanation functions to pinpoint the cause of type errors. Nodes in the abstract syntax tree are annotated with the types inferred for their subexpressions by the type inference algorithm. This way we can know what types were unified as a consequence of each expression. Nodes that represent identifiers are treated somewhat differently, because we also have to keep track of the type variable renamings that take place whenever a generic type variable (introduced by let-bound polymorphism) in the type of the identifier is instantiated. Substitutions are represented as a list of atomic bindings of type variables to types. This ensures that there is always a single cause for any binding, simplifying error explanation. Each atomic binding is annotated with a pointer into the abstract syntax tree to the node that caused the unification call that resulted in the binding in question. This way we can trace the evolution of a type to its final form.

Two explanation functions were defined in [Beaven and Stansifer, 1993]: Why and How. The function Why tries to explain why a certain expression was assigned the type it was assigned, while the function How attempts to answer the question of how a type variable got assigned a certain type. The function Why uses structural information from the annotated abstract syntax tree to answer questions, while the function How uses annotated atomic bindings to follow the evolution of types (representing information flow between different parts of an expression). Rather than providing a detailed treatment of the explanation functions, we adhere to the old adage of "a picture is worth a thousand words" (not a Chinese proverb!) and present a sample interaction with the system in Figure 7.1 (taken from [Beaven and Stansifer, 1993]).

The expression that caused the type error (attempting to add an integer to a boolean) for which we are seeking an explanation is the following:

(fn a => +((fn b => if b then b else a) true, 3))

The explanation functions in the original paper were not interactive (they simply generated a depth-first traversal of all the reasons leading to a type error), but they could easily be converted to ask the user which branch of the explanation should be explored next. Obviously, as programs get larger, this Q&A style becomes more advantageous.

7.2 Type Errors and Qualified Types

The theory of qualified types extends the standard Milner type with type predicates that constrain the instantiation of universally quantified type variables (see Chapter 4).

A type error was detected in the application ‘(+ (#,3))’.

The domain of the function ‘+’ is not unifiable with the type of the argument ‘((# true) ,3)’.

Domain of function is ‘(int*int)’.

The argument has type ‘(bool*int)’.

**Why does the function ‘+’ have type ‘((int*int)->int)’?

The identifier ‘+’ was assigned type ‘((int*int)->int)’ as part of the initial environment.

**Why does the argument ‘((# true) ,3)’ have type ‘(bool*int)’?

Type of the pair ‘((# true) ,3)’ is determined by type of each element.

**Why does the first element ‘((fn b => #) true)’ have type ‘bool’?

The type of an application is the range of the function.

The function ‘(fn b => if #)’ has type ‘(bool->bool)’.

**Why does the function ‘(fn b => if #)’ have type ‘(bool->bool)’?

The type of a function definition is determined by the type inferred for

the formal parameter ‘b’ and by the type of the function body ‘if b then b else a’.

**Why does formal parameter ‘b’ have type ‘bool’?

The type of a formal parameter is inferred from its use.

The variable ‘b’ was initially assigned type variable ‘’b’.

**How did type variable ‘’b’ come to be bound to type ‘bool’?

This binding arose during analysis of ‘if b then b else a’.

Since the expression ‘b’ must have type ‘bool’ and the type variable ‘’b’ is its type,

then the type variable ‘’b’ must stand for ‘bool’.

**Why does the expression ‘b’ have type ‘bool’?

[Elided.]

**Why does the function’s body ‘if b then b else a’ have type ‘bool’?

The type of a conditional is determined by the types of its branches, which must unify.

**Why does the "then" branch ‘b’ have type ‘bool’? [Elided.]

**Why does the "else" branch ‘a’ have type ‘bool’? [Elided.]

**Why does the second element ‘3’ have type ‘int’?

All integer constants have type ‘int’.

Figure 7.1: Explanation of a Type Error

The ability to statically type check basic record operations derives from the fact that we can precisely describe their types using type predicates. As a direct consequence, the type inference algorithm W alone is not able to decide whether an expression is ill-typed or not. In other words, being accepted by algorithm W is a necessary but not sufficient condition for well-typedness. It is the task of the satisfiability checking algorithm Q (see Chapter 5) to ensure that there exists an instantiation of type variables that makes all type predicates true. If there is no such instantiation, then the type predicates are unsatisfiable, and the expression should be rejected by the type checker. When an expression is ill-typed because of a type unification failure, we can use the approach presented in [Beaven and Stansifer, 1993] to explain the type error. On the other hand, if it is ill-typed because of the unsatisfiability of predicates, we need a different approach.

The problem of explaining why a set of type predicates (in our case, row predicates) is unsatisfiable can be broken into the following, somewhat orthogonal, subproblems: (1) showing which expression originally introduced the predicates in the final predicate set; (2) identifying a conflicting set of predicates; and (3) revealing, step by step, the conflict (contradiction) in the conflicting set.

Before we develop algorithms for addressing these problems, we would like to emphasize that what we are trying to do is explain type errors caused by the unsatisfiability of predicates. The reason for this is that, if the type inference algorithm W itself fails, then the approach described in [Beaven and Stansifer, 1993] is, without the slightest modification, 100 per cent applicable to the task of explaining type errors in our system (another triumph of modularity for the theory of qualified types). That is, in the cases we will handle, the type inference algorithm has successfully derived a qualified type for the ill-typed expression in question.

7.2.1 Showing the Origins of Predicates

Each type predicate that appears in the final set of predicates returned by the type inference algorithm must have been introduced by some subexpression. After its original introduction, a type predicate can change several times as the type variables in the predicate go through an evolution similar to the one experienced by type variables in the type assignment. The approach in [Beaven and Stansifer, 1993] takes care of keeping track of how type variables get bound to their final types, so all that needs to be done is to augment type predicates with a pointer into the abstract syntax tree to the expression that introduced them into the current predicate set, and a list of instantiations of the generic type variables used in the predicate. This way each predicate can be traced back to its original source and form, and then, using information on the evolution of type variables, can be shown to acquire its final form.
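A sketch of the annotation this calls for is shown below; all names are hypothetical, and the predicate representation is deliberately simplified.

-- A sketch of provenance annotation for predicates. Each predicate
-- carries a pointer to the syntax tree node that introduced it, plus
-- the instantiations of generic type variables performed there.
type NodeId = Int     -- index of a node in the annotated syntax tree
type TyVar  = String
type Label  = String

data Pred = Lacks TyVar Label          -- simplified predicate forms
          | Has TyVar Label TyVar
  deriving Show

data AnnPred = AnnPred
  { predicate      :: Pred
  , introducedAt   :: NodeId            -- target of a 'Where' question
  , instantiations :: [(TyVar, TyVar)]  -- generic variable renamings
  } deriving Show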

The information about where a predicate originated becomes interesting when reviewing the set of predicates that caused the conflict. The investigating user could then be offered explanations of how each predicate came into being and acquired its current form. The function Where could be used to answer the question of where a certain predicate comes from.

The rosy situation described above is complicated a bit by the fact that predicates introduced by different expressions can 'collapse' into a single predicate as the result of type substitutions. For example, the predicate set ρ1 lacks a, ρ2 lacks a becomes simply ρ1 lacks a if we apply the substitution [ρ2 ↦ ρ1] to it. In other words, some predicates in the final set might actually stand for several predicates, introduced at different points of the program. Can this create a problem? As it turns out, hardly. After all, the main problem is that the expressions in the program generated a conflicting set of predicates, and it does not really matter if more than one expression contributed a certain predicate to the final set, as long as we can link the predicate to at least one of its contributors, which we can.

7.2.2 Identifying Conflicting Predicates

From the point of view of error explanation, it is beneficial to be able to work with a minimal set of conflicting predicates (that is, one of which no proper subset is unsatisfiable). To reveal the contradiction in a larger set of predicates, it is useful to be presented with a smaller subset that still exhibits the conflict. Recall that we have already faced a similar problem during type simplification (see Chapter 6), when we were trying to identify a minimal set of predicates that expresses the same constraints as some larger set. As it turns out, the exact same algorithm, without modification, works when the predicates in question are not satisfiable. This is due to the particular way algorithm Q identifies unsatisfiable sets of predicates: when there are conflicting presence/absence constraints on a particular field, it manifests itself as a global constraint telling us that the particular field cannot appear in any of the regions. From the point of view of type simplification, this is a perfectly legitimate constraint, so it is meaningful to ask: "what is a minimal subset of predicates that still expresses the same (contradictory) constraints on the field in question?" When there are several fields with conflicting presence/absence constraints, we can identify separate, and most likely different, conflicting sets for each. The user could then choose the field for which we present and explain the conflicting set of predicates.

7.2.3 Revealing the Contradiction

After identifying a minimal conflicting set of predicates for some field ℓ, the next step is to show why the predicates are in conflict. As it turns out, this is simpler than it might seem at first. After all, when there is a conflict, it means that there is no instantiation of row variables that would satisfy the predicates. Because the conflict concerns the presence/absence of fields in rows, it is possible to provide a step-by-step explanation by making assumptions on the base row variables, then proceeding with the evaluation of constructor predicates, until some relevant predicate is violated (using the classification of predicates introduced in Chapter 6). In other words, we depend on algorithm Q and the algorithms used for type simplification to break up the predicates into various categories so that we can explain the contradiction. Informally, base row variables are like independent variables in a system of equations, and we reveal that the equations are unsolvable by considering all possible combinations of values for the independent variables.

Luckily, since we are only interested in the presence/absence of a particular field ℓ (we are not even interested in its type), the number of possibilities is only two per row variable. Take, for example, the following ill-typed expression (we concatenate the record x with itself, forcing it to be the empty record, so it cannot have the field a):

e1 ≡ λx.hx k xi·a

The type inferred for expression e1:

P1 ≡ ρ2 has (a : α), ρ2 ≃ (ρ1 k ρ1), ρ1#ρ1

τ1 ≡ Rec ρ1 → α

The conflicting set in this case is the whole of P1. To clarify how the error explanation could work, we present an extract from a hypothetical type error explanation session in Figure 7.2.

[Elided.]

The following predicate set is in conflict for field ‘a’:

(a) r2 has (a:a)

(b) r2 = (r1 || r1)

(c) r1#r1

Explanation:

(1) r1 has (a:b) Assumption.

Contradiction! (1) + (c)

(1) r1 lacks a Assumption.

(2) r2 = (r1 || r1) (b)

(3) r2 lacks a (1) + (2)

Contradiction! (3) + (a)

Figure 7.2: Explaining Predicate Conflicts Using Assumptions

It is not always necessary to make assumptions when explaining the contradiction in a set of type predicates. Actually, in most cases, the contradiction can be revealed without complicating the explanation with any assumptions at all. The reason for this is that there can be trivial type predicates that simply require the presence/absence of a particular field in a base row variable.

In that case, there is no need to make the opposite assumption since it will obviously fail. Take the following ill-typed expression:

e2 ≡ λx.λy.hx k yi·a + x·a == y·a

[Elided.]

The following predicate set is in conflict for field ‘a’:

(a) r1#r2

(b) r1 has (a:Int)

(c) r2 has (a:Int)

Explanation:

(1) r1 has (a:Int) (b)

(2) r2 has (a:Int) (c)

Contradiction! (1) + (2) + (a)

Figure 7.3: Explaining Predicate Conflicts Without Assumptions

The type inferred for expression e2:

P2 ≡ ρ3 has (a : Int), ρ3 ≃ (ρ1 k ρ2), ρ1#ρ2, ρ1 has (a : Int), ρ2 has (a : Int)

τ2 ≡ Rec ρ1 → Rec ρ2 → Bool

The expression is ill-typed since we are selecting the field a from both records x and y, ensuring that they are not disjoint, while attempting to concatenate them. The two base row variables in this case are ρ1 and ρ2, but there is no need to make any assumptions during the explanation of the conflict, since there are two trivial has predicates that put field a in both ρ1 and ρ2. We present an extract from a hypothetical type error explanation session for e2 in Figure 7.3 that does not use assumptions.

The hypothetical error explanation sessions presented in Figures 7.2 and 7.3 look suspiciously like formal proofs, where we start with a set of axioms and inference rules and try to arrive at a contradiction. In fact, this is the case, and although the task of deriving proofs like the ones presented might seem daunting, closer examination reveals that our 'proofs' are always of an extremely simple structure. To see this, first observe that given the base row variables we can always use constructor predicates to calculate the values of derived row variables. Second, we are only dealing with the presence/absence of a single field at any time, so practically we are working inside a propositional logic system, which greatly simplifies the application of inference rules. Finally, the set we are working with is conflicting, so every instantiation of row variables is guaranteed to make at least one predicate false.
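The search just described amounts to enumerating the 2^n presence/absence assumptions on the n base row variables for the fixed field, evaluating the constructor predicates under each assumption, and reporting the first violated predicate. The following Haskell sketch shows this shape; the types, the Check forms, and the construct parameter (standing for the evaluation of constructor predicates) are illustrative assumptions, not the actual implementation.

-- For one fixed field, a row is just a Bool: does it contain the field?
type RowVar = String
type Env    = [(RowVar, Bool)]  -- presence of the field per row variable

data Check = MustHave RowVar | MustLack RowVar | Disjoint RowVar RowVar

evalCheck :: Env -> Check -> Bool
evalCheck env c = case c of
  MustHave v     -> look v
  MustLack v     -> not (look v)
  Disjoint v1 v2 -> not (look v1 && look v2)
  where look v = maybe False id (lookup v env)

-- All 2^n assumptions on the base variables.
assumptions :: [RowVar] -> [Env]
assumptions []       = [[]]
assumptions (v : vs) = [ (v, b) : env | b <- [False, True]
                                      , env <- assumptions vs ]

-- Just a witness per assumption if every assumption violates some
-- check (a contradiction); Nothing if some assumption satisfies all.
explain :: [RowVar] -> (Env -> Env) -> [Check] -> Maybe [(Env, Check)]
explain baseVars construct checks = mapM violated (assumptions baseVars)
  where
    violated env0 =
      let env = construct env0  -- evaluate constructor predicates
      in case [ c | c <- checks, not (evalCheck env c) ] of
           []      -> Nothing   -- a satisfying instantiation exists
           (c : _) -> Just (env0, c)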

Chapter 8 Conclusions

We have presented a language that supports polymorphic record operations through basic record operations, such as field selection, field deletion, record extension, symmetric record concatenation, record difference, and record projection. We have demonstrated, through examples, how the basic operations can be used to define polymorphic relational operators such as join and divide. We have described the type system of the language, an application of the theory of qualified types with ML-style type inference, and presented a set of type predicates that describe the type-correct usage of the basic operations. We have shown that checking the satisfiability of type predicates is exponential, not only in the number of row variables, but also in the number of labels. To eliminate exponential complexity in the number of labels, thus making the language useful for practical purposes, we have imposed a restriction on the set of valid expressions in the language. We have argued, based on experience with the implementation, that the restriction poses no obstacle to writing the kind of programs the language was intended for. We have presented a predicate satisfiability checking algorithm for the restricted language. The complexity of the algorithm is polynomial in the number of record labels and in the number of predicates, and exponential only in the number of independent row variables. The latter, we have argued, is proportional to the number of function arguments in polymorphic function definitions, a relatively small number, thus making the algorithm practical. To make inferred types more accurate and concise, we have developed algorithms for improving and simplifying types. We have illustrated how type improvement can make types more precise, and also that type simplification finds a minimal set of type predicates. In the last chapter, we have outlined an approach to explaining type errors in the presence of complex record operations as an extension of a type explanation solution for ML. The type inference algorithm, the predicate satisfiability checking algorithm, and the type improvement and simplification algorithms have all been implemented in Haskell and thoroughly tested, which has greatly increased our confidence in the results we have presented.

8.1 Future Work

Although some work has been done on the implementation of the interactive type explanation algorithm presented in Chapter 7, it is far from complete and definitely not yet an integral part of the system. As with all interactive type error explanation systems, only experience with actual users can tell us how useful it really is, and suggest points for improvement.

More research needs to be done on how to efficiently implement polymorphic record operations, something that must be addressed before any practical implementation of the language is attempted. It is known that for the record calculus of Gaster and Jones [Gaster and Jones, 1996] there exists an efficient compilation method, based on the fact that lacks predicates can be translated to integer offsets into records for runtime field access, allowing complete type erasure at compile-time. Unfortunately, it is unlikely that such a simple and efficient method of compilation can be found for the stronger record calculus of the system in the current thesis.

Bibliography

Alexander Aiken. Set constraints: Results, applications, and future directions. In PPCP'94: Principles and Practice of Constraint Programming, Proceedings, pages 326–335. Springer, 1994. ISBN 3-540-58601-6.

Alexander Aiken and Edward L. Wimmers. Solving systems of set constraints. In LICS'92: IEEE Symposium on Logic in Computer Science, pages 329–340. IEEE Press, 1992.

Malcolm P. Atkinson and Peter Buneman. Types and persistence in database programming languages. ACM Computing Surveys, 19(2):105–170, 1987. ISSN 0360-0300.

Malcolm P. Atkinson, P. J. Bailey, K. J. Chisholm, P. W. Cockshott, and R. Morrison. An approach to persistent programming. In Readings in Object-Oriented Database Systems, pages 141–146. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 1990.

François Bancilhon and Peter Buneman, editors. Advances in Database Programming Languages. ACM Press, New York, NY, 1990.

D. Bartels and J. Robie. Persistent objects and object-oriented databases for C++. C++ Report, 4(7):49–50, 52–56, 1992.

Mike Beaven and Ryan Stansifer. Explaining type errors in polymorphic languages. ACM Letters on Programming Languages and Systems (LOPLAS), 2(1-4):17–30, 1993. ISSN 1057-4514.

Frederick P. Brooks. The Mythical Man-Month: Essays on Software Engineering. Addison-Wesley, Reading, MA, 1995.

Peter Buneman and Atsushi Ohori. Polymorphism and type inference in database programming. ACM Transactions on Database Systems, 21(1):30–76, 1996. ISSN 0362-5915.

Luca Cardelli. Type systems. In Allen B. Tucker, editor, The Computer Science and Engineering Handbook. CRC Press, Boca Raton, FL, 1997.

Edgar F. Codd. A relational model of data for large shared data banks. Communications of the ACM, 13(6):377–387, 1970. ISSN 0001-0782.

Robert M. Colomb. Deductive Databases and their Applications. Taylor & Francis, London, UK, 1998.

William R. Cook and Ali H. Ibrahim. Integrating programming languages & databases: What's the problem? http://www.cs.utexas.edu/~wcook/projects/dbpl/, 2005.

Luís Damas and Robin Milner. Principal type-schemes for functional programs. In POPL '82: Conference Record of the Ninth Annual ACM Symposium on Principles of Programming Languages, pages 207–212, 1982.

Chris J. Date. Relational Database: Writings 1985–1989. Addison-Wesley, 1990.

Chris J. Date. Introduction to Database Systems (7th edition). Addison-Wesley, 1999.

Chris J. Date and Edgar F. Codd. The relational and network approaches: Comparison of the application programming interfaces. In FIDET '74: Proceedings of the 1974 ACM SIGFIDET (now SIGMOD) Workshop on Data Description, Access and Control, pages 83–113, New York, NY, USA, 1975. ACM Press.

Chris J. Date and Hugh Darwen. Into the great divide. In Relational Database Writings: 1989–1991, pages 155–168. Addison-Wesley, Reading, MA, 1992a.

Chris J. Date and Hugh Darwen. Relational Database Writings: 1989–1991. Addison-Wesley, Reading, MA, 1992b.

Chris J. Date and Hugh Darwen. Relational Database Writings: 1991–1994. Addison-Wesley, 1995.

Chris J. Date and Hugh Darwen. Foundation for Object / Relational Databases: The Third Manifesto. Addison-Wesley, 1998.

Chris J. Date, Hugh Darwen, and David McGoveran. Relational Database Writings: 1994–1997. Addison-Wesley, 1998.

Klaus R. Dittrich. Object-oriented database systems: The notion and the issues. In On Object-Oriented Database System, pages 3–12. Springer-Verlag, 1991.

Benedict R. Gaster and Mark P. Jones. A Polymorphic Type System for Extensible Records and Variants. Technical Report NOTTCS-TR-96-3, Department of Computer Science, Nottingham, 1996.

Robert Harper and Benjamin Pierce. A record calculus based on symmetric concatenation. In POPL '91: Principles of Programming Languages, Proceedings, pages 131–142. ACM Press, 1991.

Rod Johnson. J2EE Development Without EJB. Hungry Minds Inc, U.S., 2004.

Mark P. Jones. A theory of qualified types. In ESOP'92: European Symposium on Programming, Proceedings, pages 287–306. Springer, 1992.

Mark P. Jones. Simplifying and improving qualified types. In Functional Programming Languages and Computer Architecture, pages 160–169, 1995.

Mark P. Jones and John C. Peterson. Hugs 98: A functional programming system based on Haskell 98 – user manual, 1999. URL http://citeseer.ist.psu.edu/334009.html.

Simon Peyton Jones et al. Haskell 98 Language and Libraries, the Revised Report. Cambridge University Press, 2003.

Brian W. Kernighan and Dennis M. Ritchie. The C Programming Language. Prentice Hall, 1978.

Won Kim. Object-oriented database systems: Strengths and weaknesses. Journal of Object-Oriented Programming, pages 21–29, July 1991.

Oleg Kiselyov, Ralf Lämmel, and Keean Schupke. Strongly typed heterogeneous collections. In Haskell '04: Proceedings of the ACM SIGPLAN Workshop on Haskell, pages 96–107. ACM Press, 2004.

Robert Kowalski. Predicate logic as a programming language. Information Processing, 74:569–574, 1974.

Daan Leijen. First-class labels for extensible rows. Technical Report UU-CS-2004-51, Department of Computer Science, Universiteit Utrecht, 2004.

Daan Leijen and Erik Meijer. Domain specific embedded compilers. In DSL'99: 2nd USENIX Conference on Domain Specific Languages, pages 109–122, 1999. Also appeared in ACM SIGPLAN Notices 35, 1 (Jan. 2000).

Henning Makholm and J. B. Wells. Type inference, principal typings, and let-polymorphism for first-class mixin modules. In ICFP'05: International Conference on Functional Programming, Proceedings, pages 156–167. ACM Press, 2005.

Jim Melton and Alan R. Simon. SQL:1999 – Understanding Relational Language Components. Morgan Kaufmann, 2001.

Randy Meyers. The new C: Introducing C99. C/C++ Users Journal, 18(10):49–53, October 2000.

Robin Milner, Mads Tofte, and Robert Harper. The Definition of Standard ML. MIT Press, 1990.

Jack Minker. Logic and databases: Past, present, and future. AI Magazine, 18(3):21–47, 1997.

Lajos Nagy and Ryan Stansifer. Polymorphic type inference for the relational algebra in the functional database programming language Neon. In Proceedings of the 2006 ACM Symposium on Applied Computing, pages 673–679, New York, NY, USA, 2005. ACM Press.

Atsushi Ohori. A polymorphic record calculus and its compilation. ACM Transactions on Programming Languages and Systems, 17(6):844–895, 1995. ISSN 0164-0925.

Atsushi Ohori and Peter Buneman. Type inference in a database programming language. In LFP '88: LISP and Functional Programming, Proceedings, pages 174–183. ACM Press, 1988.

Benjamin Pierce. Types and Programming Languages. The MIT Press, 2002.

François Pottier. A constraint-based presentation and generalization of rows. In LICS'03: Logic in Computer Science, Proceedings, pages 331–340, 2003.

Raghu Ramakrishnan and S. Sudarshan. Top-down vs. bottom-up revisited. In Vijay Saraswat and Kazunori Ueda, editors, Proceedings of the 1991 International Symposium on Logic Programming, pages 321–335. MIT, 1991.

Shuping Ran, Paul Brebner, and Ian Gorton. The rigorous evaluation of Enterprise Java Bean technology. In ICOIN'01: International Conference on Information Networking, Proceedings, page 93, 2001.

Manuel Reimer. Implementation of the database programming language Modula/R on the personal computer Lilith. Software—Practice and Experience, 14(10):945–956, October 1984.

Raymond Reiter. Towards a logical reconstruction of relational database theory. In On Conceptual Modelling, pages 191–233. Springer-Verlag, 1982.

Didier Rémy. Type checking records and variants in a natural extension of ML. In POPL '89: Proceedings of the 16th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 77–88, New York, NY, USA, 1989. ACM Press.

Didier Rémy. Typing record concatenation for free. In POPL'92: Principles of Programming Languages, Proceedings, pages 166–176. ACM Press, 1992.

J. A. Robinson. A machine-oriented logic based on the resolution principle. Journal of the ACM, 12(1):23–41, 1965. ISSN 0004-5411.

Amr Sabry. What is a purely functional language? Journal of Functional Programming, 8(1):1–22, 1998.

Joachim W. Schmidt. Some high level language constructs for data of type relation. ACM Transactions on Database Systems, 2(3):247, 1977.

Zoltan Somogyi, Fergus Henderson, and Thomas Conway. The implementation of Mercury, an efficient purely declarative logic programming language. In ILPS Workshop: Implementation Techniques for Logic Programming Languages, 1994.

Mads Torgersen. Language integrated query: unified querying across data sources and programming languages. In OOPSLA '06: Companion to the 21st ACM SIGPLAN Conference on Object-Oriented Programming Systems, Languages, and Applications, pages 736–737, New York, NY, USA, 2006. ACM Press.

Jan Van den Bussche and Stijn Vansummeren. Polymorphic type inference for the named nested relational calculus. ACM Transactions on Computational Logic, 2005.

Jan Van den Bussche and Emmanuel Waller. Type inference in the polymorphic relational algebra. In PODS'99: Principles of Database Systems, Proceedings, pages 80–90. ACM Press, 1999.

Ghica van Emde Boas-Lubsen and Peter van Emde Boas. Compiling Horn-clause rules in IBM's Business System 12: an early experiment in declarativeness. In SOFSEM '98: Proceedings of the 25th Conference on Current Trends in Theory and Practice of Informatics, pages 68–88, London, UK, 1998. Springer-Verlag. ISBN 3-540-65260-4.

Stijn Vansummeren. On the complexity of deciding typability in the relational algebra. Acta Informatica, 41(6):367–381, 2005. ISSN 0001-5903.

Philip Wadler. Monads for functional programming. In Program Design Calculi: Proceedings of the 1992 Marktoberdorf International Summer School. Springer-Verlag, 1993.

Mitchell Wand. Complete type inference for simple objects. In LICS'87: Logic in Computer Science, Proceedings, pages 37–44. IEEE Press, 1987.

Mitchell Wand. Corrigendum: Complete type inference for simple objects. In IEEE Symposium on Logic in Computer Science (LICS), Edinburgh, Scotland, page 132, 1988.

Appendix A Overview of the Relational Model and Algebra

In the relational model a tuple is a mapping from attributes to values of a specific type. The heading of a tuple is a mapping from attributes to types, thus the domain of a heading is a set of attributes. A relation is a finite set of tuples with matching headings. Since all tuples in a relation have the same heading, we define the heading of a relation to be that of its tuples.

An integral part of the relational model is the relational algebra, a set of operations on relations. Codd, in his seminal paper [Codd, 1970], identified eight relational operators, sometimes referred to as the original relational algebra. One of Codd's main concerns was ensuring the safety of the relational algebra, meaning that, given finite input relations, no relational algebra expression should result in an infinite relation. The safety property holds for the algebra introduced by Codd, and it is traditionally required of newly invented operators as well. Throughout the decades, authors have introduced several extended versions of the original relational operators along with brand new ones. In this description, we will focus on the original set, as introduced by Codd (which will nevertheless not prevent us from discussing some newer operators as well), because the list of potential relational operators is open-ended. It is traditional [Date, 1999] to divide the original relational operators into two groups: (1) the traditional set operators (union, intersection, difference, and Cartesian product), and (2) the special relational operators (restrict, project, join, and divide).

In the following, h(t) stands for the heading of the tuple t, and h(r) stands for the heading of the relation r. We write t[h] for the projection of the tuple t on the heading h. The expression t.a denotes the value of attribute a in the tuple t.

A.1 Union

Relational union is a slightly restricted form of standard set union in the sense that we require the heading of the two operands to be the same. Formally:

r ∪ s ≡ {t | t ∈ r ∨ t ∈ s} where h(r) = h(s)

An example for union (the traditional way to represent relations pictorially is in tabular format):

A
Name   Age   City
Jones  30    Melbourne
Jones  45    Miami

B
Name   Age   City
Jones  45    Miami
Adams  27    Orlando

A ∪ B
Name   Age   City
Jones  45    Miami
Jones  30    Melbourne
Adams  27    Orlando

A.2 Intersection

Relational intersection is a slightly restricted form of standard set intersection in the sense that we require the heading of the two operands to be the same. Formally:

r ∩ s ≡ {t | t ∈ r ∧ t ∈ s} where h(r) = h(s)

An example for intersection:

A
Name   Age   City
Smith  30    Melbourne
Jones  45    Miami

B
Name   Age   City
Jones  45    Miami
Adams  27    Orlando
Jones  37    Melbourne

A ∩ B
Name   Age   City
Jones  45    Miami

A.3 Difference

Relational difference is a slightly restricted form of standard set difference in the sense that we require the heading of the two operands to be the same. Formally:

r\s ≡ {t | t ∈ r ∧ t ∉ s} where h(r) = h(s)

An example for difference:

A
Name   Age   City
Smith  30    Melbourne
Jones  45    Miami

B
Name   Age   City
Jones  45    Miami
Adams  27    Orlando
Baker  37    Melbourne

A\B
Name   Age   City
Smith  30    Melbourne

A.4 Cartesian Product

Relational Cartesian product is similar to the Cartesian product defined on sets, with the important difference that the headings of the operands must be disjoint. Another important difference is that, unlike the traditional Cartesian product, the relational version is commutative. The reason for this is that tuples are represented as mappings, so there is no ordering on the attributes; thus the concatenation of two tuples is actually the union of two mappings, a commutative operation.

Formally:

r × s ≡ {t | t[h(r)] ∈ r ∧ t[h(s)] ∈ s} where h(r) ∩ h(s) = ∅

An example for Cartesian product:

A
Name   Age
Smith  30
Jones  45

B
City
Orlando
Melbourne

A × B
Name   Age   City
Smith  30    Melbourne
Jones  45    Melbourne
Smith  30    Orlando
Jones  45    Orlando

A.5 (Natural) Join

Relational join is a special relational operator with no corresponding set operator. The result of joining two input relations is an output relation whose heading is the union of the headings of the input relations and that contains the 'matching' tuples from both input relations, that is, those tuples that have the same values for their common attributes. Traditionally, when the required relation between common attributes is that their values must be equal, we call the operation natural join, or equi-join. Unless otherwise stated, when talking about join we will mean natural join. It is interesting to note here that the definition of join is actually the same as that of Cartesian product, with the restriction that the headings of the operands be disjoint removed:

r ⋈ s ≡ {t | t[h(r)] ∈ r ∧ t[h(s)] ∈ s}

An example for join:

A
Name   Age
Smith  30
Jones  45

B
Name   Car
Jones  Ford
Jones  Porsche
Adams  Toyota
Smith  Suzuki

A ⋈ B
Name   Age   Car
Smith  30    Suzuki
Jones  45    Ford
Jones  45    Porsche
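The formal definitions translate almost directly into executable code once tuples are represented as finite mappings. The following Haskell sketch (the representation is our own illustration, not a definition from the thesis) implements natural join this way; applied to the relations A and B above, join yields exactly the three tuples shown.

import qualified Data.Map as Map
import Data.Map (Map)

type Attr  = String
type Tuple = Map Attr String  -- attribute -> value (values simplified)
type Rel   = [Tuple]

-- Two tuples match if they agree on all of their common attributes.
matches :: Tuple -> Tuple -> Bool
matches t1 t2 = and (Map.elems (Map.intersectionWith (==) t1 t2))

-- Joining matching tuples is simply the union of the two mappings.
join :: Rel -> Rel -> Rel
join r s = [ Map.union t1 t2 | t1 <- r, t2 <- s, matches t1 t2 ]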

A.6 Restriction

Relational restriction takes a relation and a condition and then selects those tuples from the relation that satisfy the given condition. The condition is represented by a function θ from tuples to the logical values true or false. To guarantee finiteness, it is traditional to require θ to be effectively computable. When introducing the relational algebra, it is customary to restrict θ to conditional expressions built using only attribute names, simple comparison operators (=, ≠, >, etc.), and logical connectives (∧, ∨, etc.). The formal definition of restriction is:

σθ(r) ≡ {t | t ∈ r ∧ θ(t)}

An example for restriction:

A

Name Age City

Jones 45 Miami

Adams 27 Orlando

Baker 37 Melbourne

σ(Age>30 ∧ City≠Miami)(A)

Name Age City

Baker 37 Melbourne

A.7 Projection

Relational projection takes an input relation and a set of attribute names and yields an output relation that consist of the tuples of the input relation projected on the given set of attribute names. It is commonly required that the given set of attribute names be a subset of the heading of the input relation. For this reason, projection can be thought of as ‘vertical restriction’ (in the tabular depiction of relations, attributes form columns). The formal definition is:

π{a1,...,an}(r) ≡ {t[{a1, ..., an}] | t ∈ r} where {a1, ..., an} ⊆ h(r)

An example for projection:

A

Name Age City

Smith 30 Melbourne

Jones 45 Miami

Baker 37 Melbourne

π{City}(A)

City

Melbourne

Miami

A.8 Division

Relational division is a special relational operator with no corresponding set operator. The semantics of division seems a bit involved at first, but it is actually quite simple and useful in practice. The relational operator division takes two relations, the dividend and the divisor. It is only defined for relations where the heading of the divisor is a subset of that of the dividend.

The result of division is an output relation whose heading is the difference of the headings of the dividend and the divisor, and that contains exactly those tuples which, when extended with any tuple from the divisor, appear in the dividend in their extended form. In practice, division is often used to formulate queries involving the requirement 'all ...', for example, "Employees who work on all projects". The formal definition of division is:

r ÷ s ≡ {t | ∀ts ∈ s. ∃tr ∈ r. tr[h(s)] = ts ∧ tr[h(r)\h(s)] = t} where h(s) ⊂ h(r)

An example for division:

A
Name   Car
Jones  Ford
Jones  Toyota
Adams  Suzuki
Smith  Ford

B
Car
Toyota
Ford

A ÷ B
Name
Jones
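Division, too, is short under the same illustrative tuple representation as in the join sketch above: a candidate is any projection of a dividend tuple on h(r)\h(s), and it survives if extending it with every divisor tuple lands back in the dividend.

import qualified Data.Map as Map
import Data.Map (Map)
import Data.List (nub)

type Attr  = String
type Tuple = Map Attr String
type Rel   = [Tuple]

-- t[h]: projection of a tuple on a set of attributes.
project :: [Attr] -> Tuple -> Tuple
project h = Map.filterWithKey (\k _ -> k `elem` h)

divide :: Rel -> Rel -> Rel
divide r s =
  let hs    = if null s then [] else Map.keys (head s)  -- h(s)
      hr    = if null r then [] else Map.keys (head r)  -- h(r)
      hout  = filter (`notElem` hs) hr                  -- h(r)\h(s)
      cands = nub (map (project hout) r)                -- candidates
  in [ t | t <- cands, all (\ts -> Map.union t ts `elem` r) s ]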

A.9 ‘Non-Standard’ Relational Operators

In this section we attempt to give a taste of the kind of diversity in relational operators that any language designer must face when attempting to incorporate the relational algebra into a general-purpose programming language. We can only hope to give a sample of the various relational operators that have been proposed in the literature throughout the years, each proposal being either a variation on or an improvement of a previous operator, with the occasional proposal for a truly novel one. We would like to emphasize that the list presented here is not complete, neither can it be. There is ongoing research in the area, with new relational operators being proposed and older ones falling out of favor. Thus, our goal here is merely to demonstrate, by enumerating operators that could each legitimately claim to be included in any practical implementation of the relational algebra, the inherent limitations of any approach that attempts to support only some specific set of relational operators as opposed to providing the means to express those operators.

A.9.1 Projecting Away and Renaming Attributes

It is customary to define an operation for projecting away attributes analogously to projection. As in the case of projection, we will again require that the set of attributes being projected away be a subset of the heading of the relation in question. Formally:

π̂{a1,...,an}(r) ≡ {t[h(r)\{a1, ..., an}] | t ∈ r} where {a1, ..., an} ⊆ h(r)

Although omitted by Codd in his original paper, it soon became obvious that renaming of attributes is an important operation, so it is almost always included in any implementation of the relational algebra. For the operation to be well defined, the heading of the relation being renamed must contain the old attribute name, and the new attribute name must not appear in the heading. Formally:

ρa/b(r) ≡ {t[a/b] | t ∈ r} where a ∈ h(r) and b ∉ h(r)

A.9.2 Variations on join: semijoin, antijoin, and compose

The composition of relations r and s is defined to be the join of r and s with the common attributes projected away from the result of the join. Formally:

r ⋈̂ s ≡ π̂(h(r)∩h(s))(r ⋈ s)

The semijoin of relations r and s, written as r ⋉ s, is defined similarly to join, except that we project the result of the join on the heading of the first operand, r. Formally:

r ⋉ s ≡ π(h(r))(r ⋈ s)

The antijoin of relations r and s, written as r ⊲ s, is the dual of semijoin in the sense that it is the restriction of r to all the tuples that do not have a matching pair in s. Formally:

r ⊲ s ≡ {t[h(r)] | t[h(r)] ∈ r ∧ t[h(s)] ∉ s}

A.9.3 Improved division: the Small Divide

Date and Darwen describe their improved version of the division operator in [Date and Darwen, 1998]. Codd's original version only worked on relations where the heading of the divisor was a subset of that of the dividend. It also gave somewhat counterintuitive results when faced with pathological cases. For example, when asking for suppliers that supply all purple parts (an example taken from [Date and Darwen, 1998]), and there are no purple parts in our inventory, logically speaking, suppliers that do not supply any parts at all should also be returned. To handle cases like this, a generalized, ternary version of division, called the Small Divide, was introduced, defined in the following way:

gd(q, r, s) ≡ q\π(h(q))((q × r)\s) where h(q) ∩ h(r) = ∅ and h(s) = h(q) ∪ h(r)

A.9.4 Extension

A common operation in relational algebra is to extend a relation with a new attribute whose value is computed from other attributes for each tuple. The way to do this is to provide a function that takes a tuple and yields the value of the new attribute (again, the function should be effectively computable to ensure the safety of the resulting algebra). Formally:

ext(r, a, f) ≡ {t | t[h(r)] ∈ r ∧ t.a = f(t)} where a ∉ h(r)

An example of extension:

A

    Name     Age   Monthly
    Jones    45    4,500
    Adams    27    3,200
    Baker    37    4,000

ext(A, Yearly, (λt. t·Monthly ∗ 12))

    Name     Age   Monthly   Yearly
    Jones    45    4,500     54,000
    Adams    27    3,200     38,400
    Baker    37    4,000     48,000
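
In the toy set-of-maps model of the previous sketches, extension is a one-liner; the example below recreates the Yearly column under the simplifying assumption that all values are stored as strings, so numbers must be read and shown explicitly:

    import qualified Data.Map as Map
    import qualified Data.Set as Set

    type Tuple    = Map.Map String String
    type Relation = Set.Set Tuple

    -- ext a f r: add attribute a, computed by f, to every tuple of r;
    -- only well defined when a is not already in the heading of r.
    ext :: String -> (Tuple -> String) -> Relation -> Relation
    ext a f = Set.map (\t -> Map.insert a (f t) t)

    employees :: Relation
    employees = Set.fromList
      [ Map.fromList [("Name","Jones"), ("Age","45"), ("Monthly","4500")]
      , Map.fromList [("Name","Adams"), ("Age","27"), ("Monthly","3200")]
      , Map.fromList [("Name","Baker"), ("Age","37"), ("Monthly","4000")] ]

    -- ext(A, Yearly, \t -> t.Monthly * 12)
    withYearly :: Relation
    withYearly =
      ext "Yearly" (\t -> show (12 * (read (t Map.! "Monthly") :: Int))) employees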

Appendix B Proofs

Theorem 5.1.1. Given a term e, where P | sA ⊢^W e : τ, the problem sat P is NP-complete.

Proof. Membership in NP follows from the observation that, given a ground instance of P, all predicates on rows can be checked in polynomial time.

Completeness is shown by reduction from SAT. We provide two different reductions: one that uses a constant number of distinct labels, and one that uses a constant number of polymorphic variables. This way we demonstrate that the exponential complexity of satisfiability checking can be caused either by the number of polymorphic variables or by the number of distinct labels.

Complexity in Number of Variables The reduction presented here uses only a constant num- ber of distinct labels (actually, just one) to show the complexity of satisfiability checking purely in the number of polymorphic variables.

First, we define auxiliary functions for heading union, intersection, and complement (relative to the arbitrarily chosen base set ⟨a⟩):

union x y = ⟨⟨x\y⟩ ∥ y⟩

intersection x y = ⟨x\⟨x\y⟩⟩

complement e = ⟨⟨a⟩\e⟩

The corresponding types of the auxiliary functions show that operations performed at the value level are correctly mirrored at the type level by appropriate operations and predicates on row variables:

union :: ρ3 ≃ ρ1\ρ2, ρ4 ≃ ρ3 ∥ ρ2, ρ3 # ρ2 ⇒ Rec ρ1 → Rec ρ2 → Rec ρ4

intersection :: ρ3 ≃ ρ1\ρ2, ρ4 ≃ ρ1\ρ3 ⇒ Rec ρ1 → Rec ρ2 → Rec ρ4

complement :: ρ2 ≃ ⦅a⦆, ρ3 ≃ ρ2\ρ1 ⇒ Rec ρ1 → Rec ρ3

Using the definitions of row operations (Section 4.3) and type predicates (Section 4.4) it is easy to verify that the auxiliary functions do perform the corresponding set operations both at the value and at the type level. For example, the type of union ⟨a⟩ ⟨⟩ will be ⦅a⦆.

Using the auxiliary functions we can now define a reduction from an arbitrary boolean formula F with boolean variables {b1, b2, ..., bn} to a corresponding term e^F with free variables {x1, x2, ..., xn} as follows:

e^F ≡ Tr(F) == Tr(true) where

Tr(true) = ⟨a⟩
Tr(false) = ⟨⟩
Tr(bi) = xi
Tr(φ1 ∨ φ2) = (union Tr(φ1) Tr(φ2))
Tr(φ1 ∧ φ2) = (intersection Tr(φ1) Tr(φ2))
Tr(¬φ) = (complement Tr(φ))

The translation from F to e^F is clearly polynomial. Also, since each auxiliary function is well-typed and is used in a type-correct way, it must be the case that Pv | sA ⊢^W e^F : Bool, where Pv captures the type constraints on row variables, and sA provides the type assignment for the free variables xi in e^F.

We now prove that sat Pv if and only if F is satisfiable. First, observe that Tr is an isomorphism between two boolean algebras, logical operations and set operations, where the type ⦅⦆ corresponds to the empty set and ⦅a⦆ to the universal set. If F is satisfiable, then there is a truth assignment M which maps each bi to either true or false such that M(F) = true. Since Tr is an isomorphism, a type assignment A where A(xi) has the type of Tr(M(bi)) will ensure that Tr(F) has the type of Tr(true), proving that sat Pv. The other direction is proved in a symmetric way.
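
The isomorphism at the heart of this argument can be checked mechanically. The following Haskell sketch (the Formula datatype and all names are ours, for illustration) replays the reduction at the level of plain label sets over the one-label universe {a}:

    import qualified Data.Set as Set

    -- Headings over the one-label universe {a}: the empty set plays
    -- false (the row ⦅⦆) and {a} plays true (the row ⦅a⦆).
    type H = Set.Set String

    top, bot :: H
    top = Set.singleton "a"  -- Tr(true)
    bot = Set.empty          -- Tr(false)

    hUnion, hInter :: H -> H -> H
    hUnion x y = (x `Set.difference` y) `Set.union` y       -- ⟨⟨x\y⟩ ∥ y⟩
    hInter x y = x `Set.difference` (x `Set.difference` y)  -- ⟨x\⟨x\y⟩⟩

    hCompl :: H -> H
    hCompl = Set.difference top                             -- ⟨⟨a⟩\e⟩

    data Formula = T | F | Var Int
                 | Formula :\/: Formula
                 | Formula :/\: Formula
                 | Not Formula

    -- Tr, relative to an assignment m of headings to variables.
    tr :: (Int -> H) -> Formula -> H
    tr _ T          = top
    tr _ F          = bot
    tr m (Var i)    = m i
    tr m (p :\/: q) = hUnion (tr m p) (tr m q)
    tr m (p :/\: q) = hInter (tr m p) (tr m q)
    tr m (Not p)    = hCompl (tr m p)

A formula F is then satisfiable exactly when some assignment m makes tr m F equal to top, which is the set-level image of the type-level question sat Pv.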

Complexity in Number of Labels We now provide a reduction from an arbitrary boolean formula F to a term e^F that uses only a constant number of variables (in this case, only four).

Without loss of generality we assume that F uses only the operations negation and conjunction.

We assume the following constants to be defined:

b :: Bool → Bool

i :: Int → Int

∗ :: β1 → β2 → β2 (infix)

The constants b and i will be used to 'fix' the type of record fields, while the function '∗' will be used to build a single expression out of a set of expressions. We introduce four variables

{w1, w2, w3, w4} and define the following shorthands on them:

x1 ≡ ⟨w1 ∥ w2⟩
x2 ≡ ⟨w1 ∥ w3⟩
x3 ≡ ⟨w2 ∥ w4⟩
x4 ≡ ⟨w3 ∥ w4⟩

To simplify the presentation of our argument, we will use tables to show field types in record expressions, where rows correspond to record expressions, columns to fields, and cells show the type of the given field in the given record expression. Also, we will use the convention of referring to the type of field li in x1 as αi. For example, the constraints on field types generated by the expression:

var(li) ≡ (x1·li) ∗ (b x2·li) ∗ (i x3·li)

can be represented by the following table:

          li
    x1    αi
    x2    Bool
    x3    Int

Because the records w1 and w2 are disjoint (we concatenate them in x1 ≡ ⟨w1 ∥ w2⟩) and l is in x1, it follows that l must be in exactly one of w1 and w2. If l ∈ w1, then it has to have type Bool in x1, because x2·l ≡ ⟨w1 ∥ w3⟩·l has type Bool. Similarly, if l is in w2, then it must have type

Int in x1. Labels introduced using this technique will correspond to the boolean variables and subexpressions of the formula F, where the type Bool (Int) in x1 for a given label l will represent the logical value false (true).

Next we show how to represent negation and conjunction. Negation is a simple matter, once we realize that for any given label l, the type of l in x4 is the ‘negation’ of its type in x1, that is x1·l has type Bool if and only if x4·l has type Int (and x1·l has type Int if and only if x4·l has type

Bool). Exploiting this, we can define the ‘negation’ of a label l1, that is, a label l2 where x1 ·l2 has type Int if and only if x1 ·l1 has type Bool:

not(l1, l2) ≡ (x4 ·l1 == x1 ·l2)

Conjunction is a bit more involved. Suppose we would like to represent the 'conjunction' of labels l1 and l2. That is, we would like to define a label l3 where x1·l3 has type Int if and only if x1·l1 and x1·l2 both have type Int. We claim that the following expression achieves exactly this:

and(l1, l2, l3) ≡ e1 ∗ e2 ∗ e3

where (ℓx, ℓy fresh)

e1 ≡ (x1 ·l1 == x1 ·ℓx) ∗ (i x2 ·ℓx) ∗ (x1 ·l3 == x3 ·ℓx)

e2 ≡ (x1 ·l2 == x1 ·ℓy) ∗ (i x2 ·ℓy) ∗ (x1 ·l3 == x3 ·ℓy)

e3 ≡ (x1 ·l3) ∗ (x1 ·l1 == x2 ·l3) ∗ (x1 ·l2 == x3 ·l3)

To better understand what is going on, we show the tabular representation of the field type constraints:

          ℓx     ℓy     l3
    x1    α1     α2     α3
    x2    Int    Int    α1
    x3    α3     α3     α2

First, observe that, through the construction of the xi's, in any given column the type appearing in the first row must be equal to one of the types that appear in the second or the third row.

With this in mind, we first show that if either α1 or α2 is Bool, then α3 must also be Bool. For, if α1 is Bool, then α3 must also be Bool, since α1 is equal to either Int or α3 in column ℓx. Similarly, if α2 is Bool, then α3 is also Bool, because of column ℓy. Next, if both α1 and α2 are Int, then α3 must be Int, because of column l3. On the other hand, if α3 is Bool, then either α1 or α2 must be Bool, because of column l3. And finally, if α3 is Int, then both α1 and α2 must be Int, because of columns ℓx and ℓy.
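
Since each label lives in exactly one half of the relevant concatenation, each column of the table requires its first-row type to agree with its second-row or its third-row type. The case analysis above is small enough to check exhaustively; the following Haskell sketch (with ad hoc names, ours for illustration) enumerates all field-type assignments and confirms that the gadget behaves as conjunction:

    -- The two possible field types in the encoding.
    data Ty = B | I deriving (Eq, Show)

    -- A column (r1, r2, r3) is realizable when the x1 entry agrees
    -- with the x2 entry or with the x3 entry.
    colOK :: (Ty, Ty, Ty) -> Bool
    colOK (r1, r2, r3) = r1 == r2 || r1 == r3

    -- Over all realizable assignments, a3 is Int exactly when both
    -- a1 and a2 are Int; gadgetOK evaluates to True.
    gadgetOK :: Bool
    gadgetOK = and
      [ (a3 == I) == (a1 == I && a2 == I)
      | a1 <- [B, I], a2 <- [B, I], a3 <- [B, I]
      , colOK (a1, I, a3)     -- column lx
      , colOK (a2, I, a3)     -- column ly
      , colOK (a3, a1, a2) ]  -- column l3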

Now we are ready to define a reduction from a boolean formula F with boolean variables {b1, b2, ..., bn} to a corresponding expression e^F. One technical remark before the formal definition: the translation of a boolean subexpression φ returns a pair of an expression and a record label by which the translated subexpression can subsequently be referred to. During the translation we introduce fresh record labels for subexpressions as needed, except for the boolean variables bi, for which we use the fixed record labels li, and for true and false, for which we use the fixed labels lt and lf respectively:

Tr(true) = (i x1 ·lt, lt)

Tr(false) = (b x1·lf, lf)

Tr(bi) = (var(li), li)

Tr(¬φ) = (eφ ∗ not(ℓφ, ℓ), ℓ) (ℓ fresh) where

(eφ, ℓφ) = Tr(φ)

Tr(φ1 ∧ φ2) = (eφ1 ∗ eφ2 ∗ and(ℓφ1 , ℓφ2 , ℓ), ℓ) (ℓ fresh) where

(eφ1 , ℓφ1 ) = Tr(φ1)

(eφ2, ℓφ2) = Tr(φ2)

Using Tr, we define e^F as:

e^F ≡ e ∗ (i x1·ℓF) where (e, ℓF) = Tr(F)

The translation from F to e^F is clearly polynomial. Also, it is easily verified that e^F has an inferred type, since the functions b, i, ∗, and (==) are all used in a type-correct way.

This means that Pl | sA ⊢^W e^F : Int, where Pl captures the type constraints on field types, and sA provides the type assignment for the free variables {w1, w2, w3, w4} in e^F.

We now prove that sat Pl if and only if F is satisfiable. If F is satisfiable, then there is a truth assignment M that makes F true. Let us define a type assignment A where li has type Int in w1 if M(bi) = true, and otherwise li has type Bool in w2. Since we demonstrated earlier that the expressions for negation and conjunction are isomorphic to the corresponding logical operations at the level of field types (with Bool representing false and Int representing true), it follows that there can be no inconsistencies among the field type constraints in Pl; that is, Pl is satisfiable. The other direction is proved in a symmetric way.

Lemma 5.5.2. For a set expression e and label ℓ, ℓ ∈ e iff ℓ ∈ ψ̂B(e).

Proof. We proceed by induction on the form of the set expression e. (Recall that each label ℓ belongs to exactly one region Ri.)

If e = ∅ then, by definition of ψ̂B:

ψ̂B(∅) = ⋃_{i∈∅}(Ri\LP) ∪ ⋃_{ℓ∈LP} ⋃_{i∈∅}(Ri ∩ {ℓ}) = ∅

If e = {ℓ} then, by definition of ψ̂B:

ψ̂B({ℓ}) = ⋃_{i∈∅}(Ri\LP) ∪ ⋃_{i∈{0,...,2^n−1}}(Ri ∩ {ℓ}) = L ∩ {ℓ} = {ℓ}

If e = Hj then, by definition of ψ̂B:

ψ̂B(Hj) = ⋃_{i∈{k | Rk⊆Hj}}(Ri\LP) ∪ ⋃_{ℓ∈LP} ⋃_{i∈{k | Rk⊆Hj}}(Ri ∩ {ℓ}) = (Hj\LP) ∪ ⋃_{ℓ∈LP}(Hj ∩ {ℓ}) = (Hj\LP) ∪ (Hj ∩ LP) = Hj

If e = f ∩ g, then by the induction hypothesis the lemma holds for ψ̂B(f) and ψ̂B(g), which, combined with the definition of intersection on sets, yields that the lemma holds for ψ̂B(e).

We handle set union and difference analogously to set intersection. 

Lemma 5.5.3. For a set expression e and a label ℓ ∈ LP, ℓ ∈ e iff ℓ ∈ ⋃_{i∈I^e_ℓ} Ri, where I^e_ℓ is defined by the normal form ψ̂B(e).

Proof. Recall the definition of ψ̂B(e) (we name the parts so that we can refer back to them):

N1 = ⋃_{i∈I^e}(Ri\LP)

N2 = ⋃_{ℓ∈LP} ⋃_{i∈I^e_ℓ}(Ri ∩ {ℓ})

ψ̂B(e) = N1 ∪ N2

Using Lemma 5.5.2, ℓ ∈ e ⇔ ℓ ∈ ψ̂B(e).

Also, since ℓ ∈ LP, obviously ℓ ∉ N1; thus ℓ ∈ ψ̂B(e) ⇔ ℓ ∈ N2 ⇔ ℓ ∈ ⋃_{i∈I^e_ℓ}(Ri ∩ {ℓ}). But ℓ ∈ ⋃_{i∈I^e_ℓ}(Ri ∩ {ℓ}) ⇔ ℓ ∈ ⋃_{i∈I^e_ℓ} Ri, which is exactly what we set out to prove.

Theorem 5.10.1 (Soundness). If ¬sat P then Algorithm Q will report failure.

Sketch. Recall that sat P means that there is a substitution s that makes sP true.

First, from Lemma 5.2.1, if φ(P) is not satisfiable, then P cannot be satisfiable, since they both put the same constraints on label presence and absence. Therefore, reusing the argument from Section 5.6, if the algorithm finds the system of set constraints φ(P) unsatisfiable, then P cannot be satisfiable.

Second, if φ(P) is satisfiable, P can still be unsatisfiable due to inconsistent field type con- straints. Algorithm Q takes field type constraints into account by analyzing the row construction graph G. Now, would it be possible for algorithm Q to find P satisfiable, when in fact it is not?

That would mean that some field type equalities were not taken into account when analyzing graph G. But this is not possible, since all relations between rows have been taken into account when building the graph. Thus, we can conclude that algorithm Q will fail if P is not satisfiable.