Understanding and Evolving the ML Module System

Derek Dreyer
May 2005
CMU-CS-05-131

School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213

Thesis Committee:
Robert Harper (co-chair)
Karl Crary (co-chair)
Peter Lee
David MacQueen (University of Chicago)

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Copyright © 2005 Derek Dreyer

This research was sponsored in part by the National Science Foundation under grant CCR-0121633 and EIA-9706572, and the US Air Force under grant F19628-95-C-0050 and a generous fellowship. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either expressed or implied, of any sponsoring institution, the U.S. government or any other entity.

Keywords: ML, module systems, type systems, functors, abstract data types, lambda calculus, recursive modules, singleton kinds

Abstract

The ML module system stands as a high-water mark of programming language support for data abstraction. Nevertheless, it is not in a fully evolved state. One prominent weakness is that module interdependencies in ML are restricted to be acyclic, which means that mutually recursive functions and data types must be written in the same module even if they belong conceptually in different modules. Existing efforts to remedy this limitation either involve drastic changes to the notion of what a module is, or fail to allow mutually recursive modules to hide type information from one another. Another issue is that there are several dialects of ML, and the module systems of these dialects differ in subtle yet semantically significant ways that have been difficult to account for in any rigorous way. It is important to come to a clear assessment of the existing design space and consolidate what is meant by "the ML module system" before embarking on such a major extension as recursive modules.
In this dissertation I contribute to the understanding and evolution of the ML module system by: (1) developing a unifying account of the ML module system in which existing variants may be understood as subsystems that pick and choose different features, (2) exploring how to extend ML with recursive modules in a way that does not inhibit data abstraction, and (3) incorporating the understanding gained from (1) and (2) into the design of a new, evolved dialect of ML. I formalize the language of part (3) using the framework of Harper and Stone, in which the meanings of "external" ML programs are interpreted by translation into an "internal" type system. In my exploration of the recursive module problem, I also propose a type system for statically detecting whether or not recursive module definitions are "safe" (that is, whether they can be evaluated without referring to one another prematurely), thus enabling more efficient compilation of recursive modules. Future work remains, however, with regard to type inference and type system complexity, before my proposal can be feasibly incorporated into ML.

For Constance and Benard Dreyer, the most loving and supportive parents in the world

Acknowledgments

First and foremost, I would like to thank my advisors, Bob Harper and Karl Crary. Many of the ideas in this thesis were developed together with them, and my character as a researcher has been influenced to a large degree by their rigorous approach to programming language research and their profound sense of aesthetics. I would also like to thank Peter Lee for his warm guidance as my former advisor and his continual encouragement of all my endeavors, and Dave MacQueen for many interesting discussions and for his extremely careful and thorough perusal of this dissertation.

There are many other friends and colleagues whom I would like to thank for making my experience at CMU such a memorable (and long!) journey.
To name a few: Melissa and Umut Acar, Mihai and Raluca Budiu, Sharon Burks, Franklin Chen, Andrés Cladera, Catherine Copetas, Kathy Copic, Nathaniel Daw, Mark Fuhs, Anna Goldenberg, Beth and Jeff Helzner, Heather Hendrickson, Rose Hoberman, Yan Karklin, Cathy Kelley, Adam Klivans, Sue Lee, Adriana Moscatelli, Mike Murphy, Tom Murphy VII, Aleks and Emi Nanevski, Leaf Petersen, Martha Petersen, Frank Pfenning, Chris Richards, Chuck Rosenberg, Dan Spoonhower, Chris Stone, Dave Swasey, Desney Tan, Joe Vanderwaart, Dave Walker, Kevin Watkins.

I would like to give special thanks to Aleks Nanevski. I am very lucky to have had such an intellectually stimulating (if occasionally infuriating) officemate for seven whole years. I will always remember our spirited discussions about life, love, politics and programming languages with great fondness.

I would also like to thank Kevin Watkins for initiating, and Dan Spoonhower for revamping, the ConCert reading group. One of the strengths of CMU is its strong graduate student community, and it was wonderful to be able to hash out the details of research papers with such an informed and motivated group of colleagues.

Finally, I would like to thank Rose Amanda Hoberman for all the love, companionship and delicious food she has given me over the past four and a half years, and my parents, Constance and Benard, for all the love and encouragement they have given me throughout my life.

Contents

Introduction

I Understanding the ML Module System

1 The Design Space of ML Modules
  1.1 Key Features of the ML Module System
    1.1.1 Structures and Signatures
    1.1.2 Data Abstraction via Sealing and Functors
    1.1.3 Translucent Signatures
  1.2 Key Points and Axes in the Design Space of ML Modules
    1.2.1 Precursors to Translucency
    1.2.2 First-Class vs. Second-Class, Higher-Order vs. First-Order
    1.2.3 Harper and Lillibridge's First-Class Modules
    1.2.4 SML/NJ's Higher-Order Functors
    1.2.5 Leroy's Applicative Functors
    1.2.6 The Importance of Generativity
    1.2.7 Supporting Both Applicative and Generative Functors
    1.2.8 Notions of Module Equivalence
    1.2.9 Conclusion

2 A Unifying Account of ML Modules
  2.1 An Analysis of ML-Style Modularity
    2.1.1 Projectibility and Purity
    2.1.2 Phase Separation
    2.1.3 Module Equivalence
    2.1.4 Total vs. Partial Functors
    2.1.5 Sealing as a Form of Information Hiding
    2.1.6 Squeezing the Balloon
    2.1.7 Projectibility and Transparency
  2.2 Fruits of the Analysis
    2.2.1 Understanding the Existing ML Module System Designs
    2.2.2 A Unifying Design
    2.2.3 A Modular Design
  2.3 Comparison With a Previous Version of This Account

3 A Type System for ML Modules: Core Language
  3.1 Type Constructors and Kinds
    3.1.1 Syntax
    3.1.2 Static Semantics
    3.1.3 Basic Structural Properties
    3.1.4 Other Declarative Properties
    3.1.5 Admissible Rules
    3.1.6 Kind Checking and Synthesis
    3.1.7 Deciding Constructor Equivalence
  3.2 Terms
    3.2.1 Syntax
    3.2.2 Static Semantics
    3.2.3 Declarative Properties
    3.2.4 Type Checking and Synthesis
    3.2.5 Dynamic Semantics and Type Safety

4 A Type System for ML Modules: Module Language
  4.1 Signatures
    4.1.1 Syntax
    4.1.2 Static Semantics
    4.1.3 Declarative Properties
    4.1.4 Signature Phase-Splitting
  4.2 Modules
    4.2.1 Syntax
    4.2.2 Projectible Modules
    4.2.3 Static Semantics
    4.2.4 Declarative Properties
    4.2.5 Signature Checking and Synthesis
    4.2.6 The Avoidance Problem
    4.2.7 Module Phase-Splitting

II Recursive Modules

5 The Recursive Module Problem
  5.1 Motivating Examples
  5.2 Key Issues in the Design of a Recursive Module Extension
    5.2.1 Dynamic Semantics
    5.2.2 Recursively Dependent Signatures
    5.2.3 The Double Vision Problem
    5.2.4 Separate Compilation
  5.3 Existing Approaches to Recursive Modules
    5.3.1 A Foundational Account
    5.3.2 Moscow ML
    5.3.3 O'Caml
    5.3.4 Units
    5.3.5 Mixins
  5.4 A New Approach
    5.4.1 Overview
    5.4.2 Elaboration of Recursive Modules

6 Type-Theoretic Extensions for Recursive Modules
  6.1 Constructor-Language Extensions
  6.2 Term-Language Extensions
  6.3 Signature-Language Extensions
  6.4 Module-Language Extensions

7 Safe Recursion
  7.1 Evaluability
    7.1.1 The Evaluability Judgment
    7.1.2 A Total/Partial Distinction
    7.1.3 Limitations of the Total/Partial Distinction
...
