A Language for Generic Programming
Total Page:16
File Type:pdf, Size:1020Kb
A LANGUAGE FOR GENERIC PROGRAMMING Jeremy G. Siek Submitted to the faculty of the University Graduate School in partial fulfillment of the requirements for the degree Doctor of Philosophy in the Department of Computer Science Indiana University August, 2005 Accepted by the Graduate Faculty, Indiana University, in partial fulfillment of the require- ments for the degree of Doctor of Philosophy. Andrew Lumsdaine, Ph.D. R. Kent Dybvig, Ph.D. Daniel P. Friedman, Ph.D. Steven D. Johnson, Ph.D. Amr Sabry, Ph.D. August, 2005 ii Copyright 2005 Jeremy G. Siek All rights reserved iii This dissertation is dedicated to my wonderful wife Katie who was next to me every step of this journey. iv Acknowledgements First and foremost I thank my parents for all their love and for teaching me to enjoy learn- ing. I especially thank my wife Katie for her support and understanding through this long and sometimes stressful process. I also thank Katie for insisting on good error messages for ! My advisor, Andrew Lumsdaine, deserves many thanks for his support and guidance G and for keeping the faith as I undertook this long journey away from scientific computing and into the field of programming languages. I thank my thesis committee: R. Kent Dybvig, Daniel P. Friedman, Steven D. Johnson, and Amr Sabry for their advice and encourage- ment. A special thanks goes to Ronald Garcia, Christopher Mueller, and Douglas Gregor for carefully editing and catching the many many times when I accidentally skipped over the important stuff. Thanks to Jaakko and Jeremiah for hours of stimulating discussions and arguments concerning separate compilation and concept-based overloading. Thanks to David Abrahams for countless hours spent debating the merits of one design over another while jogging through the hinterlands of Norway. Thanks to Alexander Stepanov and David Musser for getting all this started, and thank you for the encouragement over the years. Thanks to Matthew Austern, his book Generic Programming in the STL was both an inspi- ration and an invaluable reference. Thanks to Beman Dawes and everyone involved with the Boost libraries. The collective experience from Boost was vital in the creation of this thesis. Thanks to Vincent Cremet and Martin Odersky for answering questions about Scala and virtual types. v Abstract The past decade of software library construction has demonstrated that the discipline of generic programming is an effective approach to the design and implementation of large- scale software libraries. At the heart of generic programming is a semi-formal interface specification language for generic components. Many programming languages have fea- tures for describing interfaces, but none of them match the generic programming specifi- cation language, and none are as suitable for specifying generic components. This lack of language support impedes the current practice of generic programming. In this dissertation I present and evaluate the design of a new programming language, named (for generic), G that integrates the generic programming specification language with the type system and features of a full programming language. The design of is based on my experiences, G and those of colleagues, in the construction of generic libraries over the past decade. The design space for programming languages is large, thus this experience is vital in guiding choices among the many tradeoffs. The design of emphasizes modularity because generic G programming is inherently about composing separately developed components. In this dis- sertation I demonstrate that the design is implementable by constructing a compiler for G (translating to C++) and show the suitability of for generic programming with prototypes G of the Standard Template Library and the Boost Graph Library in . I formalize the essential G features of in a small language and prove type soundness. G vi Contents Chapter 1. Introduction 1 1.1. Lowering the cost of developing generic components 5 1.2. Lowering the cost of reusing generic components 7 1.3. : a language for generic programming 9 G 1.4. Related work in programming language research 10 1.5. Claims and evaluation 11 1.6. Road map 12 Chapter 2. Generic programming and the STL 13 2.1. An example of generic programming 15 2.2. Survey of generic programming in the STL 21 2.3. Relation to other methodologies 49 2.4. Summary 54 Chapter 3. The language design space for generics 55 3.1. Preliminary design choices 55 3.2. Subtyping versus type parameterization 59 3.3. Parametric versus macro-like type parameterization 65 3.4. Concepts: organizing type requirements 73 3.5. Nominal versus structural conformance 83 3.6. Constrained polymorphism 85 vii 3.7. Summary 89 Chapter 4. The design of 90 G 4.1. Generic functions 92 4.2. Concepts 95 4.3. Models 96 4.4. Modules 98 4.5. Type equality 98 4.6. Function application and implicit instantiation 101 4.7. Function overloading and concept-based overloading 108 4.8. Generic user-defined types 110 4.9. Function expressions 112 4.10. Summary 114 Chapter 5. The definition and compilation of 118 G 5.1. Overview of the translation to C++ 120 5.2. A definitional compiler for 133 G 5.3. Compiler implementation details 160 5.4. Summary 162 Chapter 6. Case studies: generic libraries in 163 G 6.1. The Standard Template Library 164 6.2. The Boost Graph Library 174 6.3. Summary 180 Chapter 7. Type Safety of FG 184 7.1. FG = System F + concepts, models, and constraints 185 7.2. Translation of FG to System F 192 7.3. Isabelle/Isar formalization 201 7.4. Associated types and same-type constraints 204 7.5. Summary 213 viii Chapter 8. Conclusion 215 Appendix A. Grammar of 219 G A.1. Type expressions 220 A.2. Declarations 221 A.3. Statements and expressions 221 A.4. Derived forms 223 Appendix B. Definition of FG 224 Appendix. Bibliography 228 Appendix. Index 247 ix Software production in the large would be enormously helped by the availability of spectra of high quality routines, quite as mechanical design is abetted by the existence of families of structural shapes, screws or resistors... One could not stock 300 sine routines unless they were all in some sense instances of just a few models, highly parameterized, in which all but a few parameters were intended to be permanently bound before run time. One might call these early-bound parameters ‘sale time’ parameters... Choice of Data structures... this delicate matter requires careful plan- ning so that algorithms be as insensitive to changes of data structure as possible. When radically different structures are useful for similar problems (e.g., incidence matrix and list representations for graphs), several algorithms may be required. M. Douglas McIlroy, 1969 [126] 1 Introduction A decade or two ago computers were primarily the tools of specialists and the toys of hobby- ists. Now they are a part of everyday life: they are used to create the family photo albums, make travel reservations, communicate with friends, and get directions for a trip. Despite the advances in computer science and software engineering, computers still must be told what to do in excruciating detail. Thus, the production of software to control our computers is an important endeavor, one that affects more and more aspects of our lives. Producing software is hard: massive amounts of time and money go into creating the software applications we use today. This cost affects the prices we pay for shrink wrapped software and factors into the prices of many other goods and services. Further, software 1 1. INTRODUCTION 2 quality affects our lives: buggy software is a constant annoyance and software bugs some- times cause or contribute to more serious harm The 1968 NATO Conference on Software Engineering popularized the terms “software crisis” and “software engineering”. The crisis they faced was widespread difficulties in the construction of large software systems such as IBM’s OS/360 and the SABRE airline reservation system [64, 154]. The conference attendees felt it was time for programmers and managers to get more serious about the process of producing software. McIlroy gave an invited talk entitled Mass-produced Software Components [126]. In this talk he proposed the systematic creation of reusable software components as a solution to the software crisis. The idea was that most software products are created from building blocks that are quite similar, so software productivity would be increased if a standard set of blocks could be shared among many software products. Barnes and Bollinger define a simple equation that summarizes the savings that can be achieved through software reuse [15]. Let D stand for the cost of developing a reusable component and n be the number of uses of the component. The savings is calculated by: n (1) (Ci Ri) D ! − # − "i=1 where Ci is the cost of writing code from scratch to solve a problem and Ri is the cost of reusing the component. A particularly interesting aspect of this equation is that if Ci > Ri, then as n tends to infinity so does the savings from reuse. On the other hand, if n is small, then the benefits of reuse may be outweighed by the initial investment D of developing the reusable component. Studies by Margono and Rhoads have shown that a typical value for D is twice the cost of building a non-reusable version of the component [125]. In addition to the savings in software production, reuse can increase software quality. One of the reasons given by Lim [116] is that the more a piece of software is used, the faster the bugs in the software are found and fixed. Further, the bugs need only be fixed in one place, in the reusable component, and then all uses of the component benefit from the increase in quality.