Type Inference for Ruby Using the Cartesian Product Algorithm

Ecstatic – Type Inference for Ruby Using the Cartesian Product Algorithm Spring 2007 Master’s Thesis (SW10) by Martin Madsen, Peter Sørensen, and Kristian Kristensen Department of Computer Science Aalborg University Ruby, Ruby, Ruby, Ruby Ahaa-ahaa-ahaa Do ya, do ya, do ya, do ya Ahaa-ahaa-ahaa Know what ya doing, doing to me? Ahaa-ahaa-ahaa Ruby, Ruby, Ruby, Ruby Ahaa-ahaa-ahaa – “Ruby” by Kaiser Chiefs Aalborg University Department of Computer Science, Frederik Bajers Vej 7E, DK-9220 Aalborg East Title: Ecstatic – Type Inference for Ruby Using the Cartesian Product Abstract Algorithm (CPA) This master’s thesis documents Ecstatic Project Period: – a type inference tool for the Ruby pro- Master’s Thesis (SW10), gramming language. Ecstatic is based on Spring 2007 the Cartesian Product Algorithm (CPA), Project Group: which was originally developed for use D618A in the Self language. Group Members: Martin Madsen, [email protected] The major contributions of this thesis are: the Ecstatic tool that can infer precise and accurate types of arbitrary Ruby pro- Peter Sørensen, [email protected] grams. By implementing CPA we con- firm that the algorithm can be retrofitted for a new language. Utilizing RDoc we Kristian Kristensen, [email protected] devise a method for handling Ruby core and foreign code both implemented in C. Supervisor: Using Ecstatic a number of experiments Kurt Nørmark, were performed that gained insights into [email protected] the degree of polymorphism employed in Ruby programs. We present an ap- proach for unit testing a type inference system. We compare Ruby to Smalltalk and Self, and conclude that their seman- tics are similar. Copies: 7 Pages: 118 With Appendices: 146 Finished: June 14th, 2007 Signatures Martin Madsen Peter Sørensen Kristian Kristensen III Contents 1 Introduction 1 2 Ruby and Dynamic Object Oriented Languages 5 2.1 Ruby . .5 2.1.1 Classes and Objects . .7 2.1.2 Methods and Messages . .8 2.1.3 Modules and Mixins . .9 2.1.4 Core Classes and Modules . 10 2.1.5 Attributes and Dynamic Programming . 11 2.1.6 Code Blocks . 13 2.2 Programming Languages Similar to Ruby . 14 2.2.1 Historical Background and Language Relations . 15 2.2.2 Language Similarities and Discussion . 15 2.3 Summary . 22 3 Understanding Types and Type Inference Algorithms 23 3.1 Definition of Types and Related Concepts . 23 3.1.1 Types in Ruby . 24 3.1.2 Polymorphism . 26 3.1.3 Type Checking . 27 3.1.4 Type Inference . 28 3.2 The Hindley-Milner Algorithm . 29 3.3 The Cartesian Product Algorithm (CPA) . 31 3.4 Summary . 43 4 Problem Statement 45 4.1 Hypotheses . 45 4.2 Goals . 48 4.3 Challenges . 48 4.4 Requirements . 49 4.5 Quality Requirements . 50 4.6 Usage Scenarios . 51 4.7 Summary . 52 5 Implementation of Ecstatic 53 5.1 The Type Inference System . 53 V CONTENTS 5.1.1 System Overview . 53 5.1.2 RubySim . 55 5.1.3 Parsing Ruby . 59 5.1.4 The Controller . 61 5.2 Methods and Templates . 65 5.2.1 Optional Arguments . 68 5.2.2 Propagation Through Templates . 68 5.3 Keeping Track of self ......................... 72 5.4 Types and Grouping . 73 5.5 The State of Ecstatic . 73 5.6 Summary . 76 6 Testing Ecstatic 77 6.1 Research on Compiler Validation . 78 6.2 Relation to Type Inference . 80 6.3 Validation Suite . 80 6.3.1 Test Suite Framework . 81 6.3.2 Results . 84 6.3.3 Reflection . 84 6.4 Summary . 86 7 Experiments 89 7.1 Data Collection . 91 7.2 Results . 92 7.2.1 Use of Variables . 92 7.2.2 Use of Instance Variables . 94 7.2.3 Degree of Parametric Polymorphism . 95 7.2.4 Running Time . 95 7.2.5 NoMethodErrors......................... 95 7.3 Data Critique . 96 7.4 Summary . 97 8 Discussion 99 8.1 Experiences With the Ruby Language . 99 8.1.1 Implementing Blocks . 100 8.1.2 Dynamic Programming . 101 8.2 Discussion of the Hypotheses . 103 8.3 Experiences Gained . 106 8.3.1 Goals . 106 8.3.2 Challenges . 106 8.3.3 Requirements . 107 8.3.4 Quality Requirements . 108 8.3.5 Usage Scenarios . 108 8.4 Summary . 109 VI CONTENTS 9 Conclusion 111 Bibliography 112 A Acronyms 119 B RDoc Extractor 121 C Sample Unit Test 125 D Experiments 127 D.1 BMConverter . 127 D.2 Canna2Skk . 132 D.3 e...................................... 134 D.4 FreeRIDE . 135 D.5 ICalc . 137 D.6 Markovnames . 139 D.7 Quickey . 141 D.8 Roman . 143 D.9 Projects with Error Conditions . 145 VII CHAPTER 1 Introduction Ruby is a dynamic programming language, and therefore defers as many deci- sions as possible to runtime. Dynamic programming languages tend to have a high degree of flexibility and expressiveness. These factors makes a language like Ruby preferred by some programmers. Although Ruby has existed since 1995, it is still relatively new to the Western world and almost unknown to the academic world. The language comes from Japan and most of the early documentation was in Japanese. Since then En- glish resources have slowly emerged. With the release of the Ruby on Rails web framework in 2004 the language have gained more attention, and subsequently many books have been published. O ’Reilly (a publisher of technical books) publishes a book sales list every quarter. Sales of Ruby books have increased in the last years, and surpasses the sales of languages like Perl and Python. TIOBE Software [55] publishes a list once a month of the most popular programming languages. Their list from June 2007 places Ruby on a 10th place. They have statistics from June 2002, and Ruby has featured growth since then. Sun Microsystems support a project called JRuby [49] and Microsoft’s has pre- released IronRuby [50]. The projects implement the Ruby language on the Java Virtual Machine (JVM) and the Common Language Runtime (CLR), respectively. These two announcements illustrate the increasing momentum behind Ruby. After visiting RailsConf 2007, Thorup [54] notes his impression of the adoption of Ruby. By his observation, Ruby is primarily used in start-up shops and in PHP shops wanting to switch to Ruby on Rails Hansson [28]. He believes that the reason is primarily, that developers in these companies are more free to make technology choices compared to big companies. The nature of dynamic programming languages often require their programs to be type checked dynamically. A dynamic type check happens at runtime. This implies that a programmer must run a program to get any feedback about its behaviour and possible errors. The need to constantly run the program while programming is undesired for several reasons. Furthermore, feedback from running a program is often limited and only covers part of the code. Additional and possibly better feedback can often be provided by analysing the code statically. One aspect of static code analysis is type inference, in which 1 Chapter 1. Introduction types are ascribed to variables, expressions, etc. in the code. Utilizing type inference, the programmer can get valuable feedback on how types flow through the program and, thus, discover errors or wrong behaviour before runtime. This thesis couples the field of type inference and the Ruby programming language to create tool that does type inference. The developed tool is called Ec- static and it uses the Cartesian Product Algorithm (CPA) to analyze the types and their flow through a Ruby program. Ecstatic is used to conduct a number of experiments on real life Ruby programs. These experiments form the basis of a discussion of the value of type inference on Ruby and our developed tool. In the following two sections we present the contributions of this thesis. Major Contributions Ecstatic: We have implemented a tool that performs type inference on Ruby programs using the Cartesian Product Algorithm (CPA). CPA Works on Ruby: We have implemented CPA to work on Ruby programs. The CPA was developed for use on the Self programming language, so it was not immediately apparent that it would work on Ruby. Experiences in Implementing CPA: We have gained a number of insights and considerable experience in implementing CPA on Ruby. Some of these details and concepts are not readily available in Agesen [1]’s work on CPA. Foreign Code Inclusion: We present a method for extracting type information from the Ruby Core’s RDoc. This enables us to perform more precise type inference, because we are aware of the types of the built-in libraries. Experiments on Ruby Code: With Ecstatic we have conducted a number of experiments on Ruby programs found at the Ruby Application Archive (RAA). This enables us to collect a set of statistics on how Ruby programs are written, including statistics on data and parametric polymorphism. Testing a Type Inference (TI) System: Based on compiler validation, we present a method for testing the type inference system. The tests are based on Ruby source code samples and unit tests based on an extension to JUnit. Comparing Ruby, Self, and Smalltalk: We perform a language comparison between Ruby, Self, and Smalltalk. We conclude that the three languages are very similar on a semantic level. Although the similarities between Ruby, Self, and Smalltalk are often presented, a more thorough comparison has not been done before. 2 Minor Contributions Ideas for Future Research: We present four hypotheses regarding Ruby programs and programmers. Three of these are considerably broad, and are hence not confirmed or rejected in this thesis. However, they can be con- sidered ideas for future work and research.

Type Inference for Ruby Using the Cartesian Product Algorithm

C Constants and Literals Integer Literals Floating-Point Literals

PYTHON NOTES What Is Python?

Variables and Calculations

Regular Expressions the Following Are Sections of Different Programming Language Specifications

CS414-2004S-01 Compiler Basics & Lexical Analysis 1 01-0: Syllabus

AAE 875 – Fundamentals of Object Oriented Programming and Data Analytics

XL C/C++: Language Reference the Static Storage Class Specifier

C Language O Level / a Level

Modern C++ Tutorial: C++11/14/17/20 on the Fly

CS 1622: Lexical Analysis Lexical Analysis Tokens Why Tokens

C++ CONSTANTS/LITERALS Rialspo Int.Co M/Cplusplus/Cpp Co Nstants Literals.Htm Copyrig Ht © Tutorialspoint.Com

CS414-2017S-01 Compiler Basics & Lexical Analysis 1