ABSTRACT Combining Static and Dynamic Typing in Ruby Michael
Total Page:16
File Type:pdf, Size:1020Kb
ABSTRACT Title of dissertation: Combining Static and Dynamic Typing in Ruby Michael Furr Doctor of Philosophy, 2009 Dissertation directed by: Professor Jeffrey S. Foster Department of Computer Science Many popular scripting languages such as Ruby, Python, and Perl are dynam- ically typed. Dynamic typing provides many advantages such as terse, flexible code and the ability to use highly dynamic language constructs, such as an eval method that evaluates a string as program text. However these dynamic features have tra- ditionally obstructed static analyses leaving the programmer without the benefits of static typing, including early error detection and the documentation provided by type annotations. In this dissertation, we present Diamondback Ruby (DRuby), a tool that blends static and dynamic typing for Ruby. DRuby provides a type language that is rich enough to precisely type Ruby code, without unneeded complexity. DRuby uses static type inference to automatically discover type errors in Ruby programs and provides a type annotation language that serves as verified documentation of a method's behavior. When necessary, these annotations can be checked dynamically using runtime contracts. This allows statically and dynamically checked code to safely coexist, and any runtime errors are properly blamed on dynamic code. To handle dynamic features such as eval, DRuby includes a novel dynamic analysis and transformation that gathers per-application profiles of dynamic feature usage via a program's test suite. Based on these profiles, DRuby transforms the program before applying its type inference algorithm, enforcing type safety for dynamic constructs. By leveraging a program's test suite, our technique gives the programmer an easy to understand trade-off: the more dynamic features covered by their tests, the more static checking is achieved. We evaluated DRuby on a benchmark suite of sample Ruby programs. We found that our profile-guided analysis and type inference algorithms worked well, discovering several previously unknown type errors. Furthermore, our results give us insight into what kind of Ruby code programmers \want" to write but is not easily amenable to traditional static typing. This dissertation shows that it is possible to effectively integrate static typing into Ruby without losing the feel of a dynamic language. Combining Static and Dynamic Typing in Ruby by Michael Furr Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park in partial fulfillment of the requirements for the degree of Doctor of Philosophy 2009 Advisory Committee: Professor Jeffrey S. Foster, Chair/Advisor Professor Michael Hicks Professor Neil Spring Professor Vibha Sazawal Professor Bruce Jacob, Dean's Representative c Copyright by Michael Furr 2009 Dedication To my wife, Laura. ii Table of Contents List of Figures vi 1 Introduction 1 1.1 Thesis . .3 1.2 Contributions . .4 1.2.1 RIL, a Ruby Analysis Framework . .4 1.2.2 Static Types for Ruby . .5 1.2.3 Profiled-Guided Analysis of Dynamic Features . .6 2 A Framework for Ruby Analysis 7 2.1 An Introduction to Ruby . .9 2.2 Parsing Ruby . 12 2.2.1 Language Ambiguities . 13 2.2.2 A GLR Ruby Parser . 15 2.3 The Ruby Intermediate Language . 20 2.3.1 Eliminating Redundant Constructs . 20 2.3.2 Linearization . 22 2.3.3 Materializing Implicit Constructs . 25 2.4 Additional Libraries . 27 2.4.1 Dataflow . 27 2.4.2 Pretty Printer . 28 2.4.3 Scope Resolution . 30 2.4.4 Visitor . 30 2.4.5 File Loading . 31 2.4.6 Runtime Instrumentation . 32 2.5 Using RIL . 33 2.5.1 Eliminating Nil Errors . 33 2.5.2 Dataflow Analysis . 36 2.5.3 Dynamic Analysis . 39 2.6 Related Work . 43 3 A Static Type System for Ruby 46 3.1 Overview . 46 3.2 Static Types for Ruby . 48 3.2.1 Basic Types and Type Annotations . 48 3.2.2 Intersection Types . 50 3.2.3 Optional Arguments and Varargs . 51 3.2.4 Union Types . 52 3.2.5 Object Types . 53 3.2.6 F-Bounded Polymorphism . 54 3.2.7 The self type . 55 3.2.8 Abstract Classes and Mixins . 56 3.2.9 Tuple Types . 57 iii 3.2.10 First Class Methods . 58 3.2.11 Types for Variables and Nil . 60 3.2.12 Unsupported features . 61 3.2.13 Cast Insertion . 62 3.3 MiniRuby ................................ 64 3.3.1 Source Language . 65 3.3.2 Type Checking . 67 3.3.3 Type Inference . 76 3.4 Implementation . 78 3.5 Experimental Evaluation . 79 3.5.1 Experimental Results . 82 3.6 Related Work . 86 4 Analyzing Dynamic Features 90 4.1 Motivation . 91 4.2 Overview . 96 4.3 Dynamic Features in DynRuby ..................... 98 4.3.1 An Instrumented Semantics . 100 4.3.2 Translating Away Dynamic Features . 102 4.3.3 Safe Evaluation . 107 4.3.4 Formal Properties . 109 4.4 Implementation . 111 4.4.1 Additional Dynamic Constructs . 112 4.4.2 Implementing safe eval ...................... 114 4.5 Profiling Effectiveness . 117 4.5.1 Dynamic Feature Usage . 117 4.5.2 Categorizing Dynamic Features . 118 4.6 Type Inference . 122 4.6.1 Performance and Type Errors . 125 4.6.2 Changes for Static Typing . 127 4.7 Threats to Validity . 134 4.8 Related Work . 135 5 Future Work 139 5.1 Type System Improvements . 139 5.2 Profiling Improvements . 143 6 Conclusions 145 A RIL Example Source Code 148 B Proofs for MiniRuby 153 B.1 Static Semantics . 153 B.2 Dynamic Semantics . 154 B.3 Soundness . 158 iv C Proofs for DynRuby 174 C.1 Type Checking Rules . 174 C.2 Complete Formalism and Proofs . 177 C.2.1 Translation Faithfulness . 177 C.2.2 Type Soundness . 190 Bibliography 197 v List of Figures 2.1 Sample Ruby code . 10 2.2 Example GLR Code . 16 2.3 Disambiguation example . 18 2.4 Nested Assignment . 22 2.5 RIL Linearization Example . 24 2.6 Dynamic Instrumentation Architecture . 31 3.1 MiniRuby ................................ 66 3.2 Type Checking Rules for Expressions . 68 3.3 Type Checking Rules for Definitions . 72 3.4 Subtyping Judgments . 75 3.5 Type Inference Rules (Updates Only) . 76 3.6 Type Inference Results . 80 4.1 Using require with dynamically computed strings . 91 4.2 Example of require from Rubygems package manager . 92 4.3 Use of send to initialize fields . 93 4.4 Defining methods with eval ........................ 94 4.5 Intercepting calls with method missing .................. 95 4.6 DynRuby source language . 97 4.7 Instrumented operational semantics (partial) . 100 4.8 Transformation to static constructs (partial) . 103 4.9 Safe evaluation rules (partial) . 107 4.10 Dynamic feature profiling data from benchmarks . 116 vi 4.11 Categorization of profiled dynamic features . 121 4.12 Type inference results . 123 4.13 Changes needed for static typing . 124 B.1 Type Checking Rules for Values . 154 B.2 Big-step Operational Semantics for Expressions (1/2) . 155 B.3 Big-step Operational Semantics for Expressions(2/2) . 156 B.4 Big-step Operational Semantics for Definitions . 158 C.1 Type checking rules for DynRuby (selected rules) . 175 C.2 Instrumented big-step operational semantics for DynRuby (exclud- ing blame and error rules) . 178 C.3 Transformation to static constructs (complete) . 179 C.4 Safe evaluation rules (complete) . 180 C.5 Additional operational semantics rule wrapped expressions (1/2) . 181 C.6 Additional operational semantics rule wrapped expressions (2/2) . 182 C.7 Type checking rules for DynRuby (complete) . 183 C.8 Type checking rules for DynRuby (complete) . 184 vii Chapter 1 Introduction Over the last decade a new wave of programming languages such as Perl [81], Python [79], and Ruby [74] have gained popularity. These languages are distin- guished by their strong support for regular expressions and string manipulation, terse syntax, comprehensive libraries, and expressive language constructs. Com- bined, these features aim to make solving common programming tasks as easy as possible. In particular, these features ease the development of prototypes, where producing a implementation quickly is often more important than producing a pro- gram that is correct for every possible input. These languages are also dynamically typed, meaning that no program is ever prevented from being executed. If a type error occurs at runtime, such as evaluating true + 3, the program will be aborted with an error message. However, if this expression is not evaluated, no error will be emitted. This can.