
Jaakko Rinta-Filppula

IS STATIC TYPE CHECKING WORTH IT?

On the pros and cons of adding a static type checker to an existing codebase

Master of Science Thesis
Faculty of Information Technology and Communication Sciences
Examiners: Prof. Kari Systä, D.Sc. Matti Rintala
April 2021

ABSTRACT

Jaakko Rinta-Filppula: Is static type checking worth it?
Master of Science Thesis
Tampere University
Master's Programme in Information Technology
April 2021

Even though type systems are a well-researched topic in theoretical computer science, there are relatively few studies on the differences between static and dynamic type checking in software development. There are controlled studies that focus on specific aspects of development such as speed and code maintainability, but due to their nature these studies consider rather small codebases. This thesis augments the results of the previous research by gathering real-world data on the experiences of using static type checkers designed for dynamically typed languages.

The goal of the thesis is to find out whether any benefits are observed when using these static type checking tools and whether those benefits justify any possible drawbacks. To answer these questions, an online survey was conducted to gather the experiences of a total of 138 software developers. In addition to the online survey, the author performed an experiment where he adopted Sorbet, a static type checker for Ruby, in a large codebase (approximately 80,000 lines of code) that multiple developers are working on simultaneously.

In the survey, almost everyone who had used these tools said that the tools are at least somewhat beneficial, and a majority said that using a static type checker is very beneficial. Using a static type checker was found to improve all of the different areas of development that were asked about in the survey: code reliability, development speed, API usability, maintainability, working with unfamiliar parts of the code, and confidence in refactoring and writing new code. Overall the responses were very positive, with only a small number of negative answers.

In the exploratory experiment performed by the author, the results were also positive. Even though there were problems that prevented the author from using the full potential of the type checker, previously undiscovered problems in the codebase were surfaced, and the process of setting up and working with the tools was pleasant. All in all, it seems that using these kinds of static type checkers with dynamically typed languages can provide significant benefits and improve the software development process.

Keywords: static typing, dynamic typing, gradual typing, type systems, type checkers, software development

The originality of this thesis has been checked using the Turnitin OriginalityCheck service.

TIIVISTELMÄ

Jaakko Rinta-Filppula: Is static type checking worth it?
Master of Science Thesis
Tampere University
Master's Programme in Information Technology
April 2021

Even though type systems and their theory have been studied extensively in computer science, there are relatively few studies on the effects of static and dynamic type checking on software development. Controlled comparison experiments targeting different areas of development have been conducted, but due to the nature of this research method they concern rather small codebases. This thesis extends the results of previous studies by collecting developers' experiences of using static type checkers designed for dynamically typed programming languages.

The goal of the thesis is to find out whether there is an observable benefit in using these type checkers, and to evaluate the magnitude of the benefits compared to the possible drawbacks. These questions are addressed with a survey that gathered software developers' experiences with type checking tools; a total of 138 participants responded. In addition to the online survey, the author performed an experiment in which he adopted a tool called Sorbet in a large Ruby project of approximately 80,000 lines of code, on which multiple developers work simultaneously.

Almost all of the survey respondents said that type checking tools are at least somewhat beneficial, and the majority classified the tools as very beneficial. A static type checker was perceived to improve all of the areas of development covered in the survey: code reliability, development speed, API usability, maintainability, working with previously unfamiliar parts of the code, and confidence when refactoring and writing new code. Overall the survey responses were very positive and there were very few negative experiences.

The results of the author's own experiment were also positive. Despite the problems that prevented the full use of the type checker, previously undiscovered problems surfaced from the codebase. In addition, setting up and using the tool was pleasant. All in all, the use of such static type checkers developed for dynamically typed programming languages can bring significant benefits and improve the software development process.

Keywords: static typing, dynamic typing, gradual typing, type systems, type checkers, software development

The originality of this publication has been checked using the Turnitin OriginalityCheck service.

PREFACE

Student syndrome refers to planned procrastination, when, for example, a student will only start to apply themselves to an assignment at the last possible moment before its deadline. (Wikipedia, Student syndrome)

This thesis is a testament to the above definition of the student syndrome. Some, including myself, may have doubted whether it would ever be finished at all. Nevertheless, with the right amount of deadline-induced stress, anything seems to be possible.

I would like to thank Flockler and especially my boss, Toni Hopponen, for providing me with the opportunity to work on this thesis. Thank you for allowing me to allocate company time for the thesis even though it is only loosely related to work. I want to thank the examiners Prof. Kari Systä and D.Sc. Matti Rintala for providing guidance throughout the thesis process. A special thanks to my friends Mika Kuitunen and Sakari Kapanen for giving their opinions on the online survey questions; your help was much appreciated.

Last but absolutely not least, I want to thank you, Kaisa, for supporting me through this sometimes overwhelming and desperate process. Your continued encouragement, help and proofreading have been vital to the success of this thesis. Thank you for pushing me to write when I didn’t feel like writing at all, and for bringing so much joy to everyday life to counterbalance the thesis work.

Tampere, 26th April 2021

Jaakko Rinta-Filppula

CONTENTS

1. Introduction
2. Type systems
   2.1 Static checking
   2.2 Dynamic checking
   2.3 Type conversions and type safety
3. Static type checkers for dynamically typed languages
   3.1 Different design approaches
   3.2 Ruby
   3.3 JavaScript
   3.4 Python
4. Studies on static vs. dynamic type systems
   4.1 Controlled studies
   4.2 Studies on existing codebases
   4.3 Other studies
   4.4 Summary of existing studies
5. Methodology
   5.1 Exploratory testing
   5.2 Online survey
6. Results and discussion
   6.1 Exploratory testing results
   6.2 Online survey results
   6.3 Summary of results and discussion
7. Conclusion
References
Appendix A: Static type checker survey

LIST OF FIGURES

5.1 Flowchart of the survey

6.1 Years of programming experience
6.2 What programming languages do you know?
6.3 Number of people working on the codebase
6.4 Type checking coverage
6.5 Experiences of using static type checkers
6.6 Expectations (E) of static type checking effects compared to experiences (X) of using them

LIST OF TABLES

2.1 Examples of languages with different type system characteristics. Adapted from [15].

6.1 File strictness level statistics at the start and end of the exploratory testing
6.2 Call-site-level typedness statistics
6.3 Codebase size statistics (lines of code)

LIST OF PROGRAMS AND ALGORITHMS

2.1 An example C++ program with a type error
2.2 Resulting compiler error when trying to compile program 2.1
2.3 An online store payment example in Python
2.4 Resulting runtime type error when trying to execute program 2.3
2.5 Problematic line from program 2.3 converted to JavaScript
3.1 An example of type annotations with custom syntax (TypeScript)
3.2 An example of type annotations as comments in JavaScript using Google Closure Compiler
3.3 An example of type annotations using existing language syntax in Ruby with Sorbet
3.4 An example of type annotations using a separate file (Ruby and RBS)
6.1 Using safe navigation operator to prevent a crash in Ruby

LIST OF TERMS AND ABBREVIATIONS

call-site-level typedness a Sorbet metric that measures the number of method call sites where the type of the receiver is statically known

data type a collection of values with similar properties

dynamic type checking method of type checking that happens at runtime

file-level typedness a Sorbet metric that measures the number of files with each sigil

gem an external Ruby package or library

gradual typing a typing scheme where only part of the code is statically checked

RBI file a Ruby file used by Sorbet to provide type annotations separately from the main code file

RBS a language for describing the structure of Ruby programs

sigil a special comment used by Sorbet to denote a file's type checking strictness level

static type checking method of type checking that happens at compile time

type cast an explicit type conversion

type coercion an implicit type conversion

type conversion the act of converting one data type to another

type system a predefined set of types and rules for using them, provided by the programming language

1. INTRODUCTION

Type systems are a well-studied area in computer science. There are many formalizations and theories on which real-world implementations are built. Compared to the amount of theoretical work, there is relatively little empirical research done on the impact of different kinds of type systems on the software development process. Furthermore, due to being controlled studies, many of these empirical works consider rather small codebases. This thesis contributes to this field by studying the use of static type checkers in large, real-world codebases.

The aim of this work is to experiment with adding a static type checker to an existing large codebase and to find out what the benefits and drawbacks of this are. Additionally, an online survey is conducted to find out other people’s experiences with similar transitions. The author’s own experiences will be reflected against the results of this survey. The following research questions are formulated:

1. Are there perceived benefits in using a static type checker for a dynamically typed language?
2. What are the possible downsides of using one, and do the benefits outweigh them?

This thesis work has been done with the support of Flockler1 where the author is currently employed. Flockler develops a web-based platform for social media content curation. The backend of this platform is developed in Ruby using the Ruby on Rails framework. Working with a language like Ruby that is dynamic and makes heavy use of its metaprogramming features can be fun and really fast when prototyping a new project or a feature. On the other hand, it can sometimes cause difficulties when working with large existing codebases. Motivation for this experiment was driven by the author's interest in type systems and their benefits as well as a desire to improve the code quality and developer experience at Flockler. These factors led to the author's decision to experiment with Sorbet [25] to gradually add type checks to the company's existing codebases.

The backend of the Flockler platform consists of two large Ruby on Rails applications, the larger one having around 80,000 lines of code. In a large codebase, there is often a need for someone to work on a feature or a bugfix in a part of the code they are not familiar with.

1https://flockler.com

Due to the large amount of code, the lack of comments and the dynamic and metaprogramming-heavy nature of the Ruby language, it can take a long time to find exactly what part of the code needs to be worked on. Research by Mayer et al. [14] shows that static type checking can help with working on undocumented codebases.

In addition to helping with working on unfamiliar parts of the code, introducing static type checks to the codebase could improve the reliability of the product by eliminating type errors before the code is run. In the past at Flockler, type errors have slipped through the test suite and code review only to be found in production where they affect the end customers. Both codebases have relatively extensive test suites that catch a subset of the possible defects, as well as staging environments where changes can be manually tested before they are deployed to production. However, the test suite of the larger application is slow to run (around 15 minutes) and is almost never fully run on developers’ machines. It is common for the developers to run only a subset of the suite they think best covers the code being worked on. All of the tests are run on a continuous integration service after pushing to GitHub. This means that even simple type errors might not be noticed before pushing the code and waiting for the tests to run, causing slowness in the development workflow. Having a quicker tool to catch these simple type errors would be a great addition to complement the test suite.

This thesis is structured as follows. Chapter 2 gives a basic introduction to programming language type systems and outlines some of the stated benefits and drawbacks between them. Chapter 3 gives an overview of some of the available tooling for adding static type checking to Ruby, JavaScript and Python, and discusses the design choices made in tools of this type in general. Chapter 4 presents some of the existing studies on the effects of type systems. Chapter 5 introduces the methodologies used in this study. It describes the questions in the conducted online survey and the reasoning behind them, and documents the process of the author's own experiment. Results of the online survey and the exploratory process are presented and discussed in Chapter 6. Finally, Chapter 7 concludes the thesis.

2. TYPE SYSTEMS

Programming languages are used to manipulate data in one way or another. This is why every programming language has a way of distinguishing between different types of data. Types and the type system are integral parts of any programming language and there are many differences between typing in different languages. A data type is, according to Gabbrielli, “a homogeneous collection of values, effectively presented, equipped with a set of operations which manipulate these values”. [6] Expanding on each of these properties in that definition we can say that a (data) type in a programming language has three properties:

1. Values of the same type share structural properties, which makes them homogeneous.
2. There are operations that can be performed on the values of a given type. An example could be extracting the first letter of a string.
3. Values of a certain type can be presented effectively. For example, some real numbers cannot be effectively presented since they are infinitely long and cannot be constructed using any algorithm. [6]

A type system in a programming language provides the predefined set of types, a way for the programmer to define new types, rules for using the types such as compatibility, equivalence and inference, as well as the method of checking the type constraints [6]. The method of checking here means whether the type constraints are checked statically at compile time or dynamically at runtime. As Pierce points out in [15], terms like “dynamically typed” aren't really accurate and a more fitting term would be “dynamically checked”. In this work statically/dynamically typed and checked are used interchangeably.

Having different data types and being able to define new ones allows the programmer to represent the concepts in the application better than just using a few primitive types. When used well, the use of distinct types can also act as documentation for the code. In a sense, type annotations act like comments in the code but with the added benefit of staying up to date when the code changes. [6, 15] Having type information available at module interfaces also acts as an overview of the operations provided by the module [15].

2.1 Static checking

In static type checking, the compiler checks that a program satisfies all of the constraints set by the language type system before the code is run. For example, a program that tries to assign a string to a variable declared with an integer type would not pass the type checker and thus not compile. [6] Trying to compile the example C++ program 2.1, which tries to assign a string value to a variable whose type is int (integer), results in the error described in program 2.2.

1 #include <string>
2
3 int main(int argc, char const *argv[])
4 {
5     int count = 0;
6     std::string two_str = "two";
7
8     count = two_str;
9
10     return 0;
11 }
Program 2.1. An example C++ program with a type error

code/type_error.cc: In function 'int main(int, const char**)':
code/type_error.cc:8:11: error: cannot convert 'std::string' {aka 'std::__cxx11::basic_string<char>'} to 'int' in assignment
    8 |   count = two_str;
      |           ^~~~~~~
      |           |
      |           std::string {aka std::__cxx11::basic_string<char>}
Program 2.2. Resulting compiler error when trying to compile program 2.1

The main selling point for static type checking is that it will prevent certain programming errors from ever making it to production. In addition to catching type errors early, with static checking the sources of the errors can be identified more accurately than with runtime checks. [15] Type systems vary in expressiveness and in what kinds of errors they can catch. For example, a simple type system could check that when assigning a value to a variable the type of the value matches the declared type of the variable. In a more powerful type system, such as that of Idris [12], the type checker can check for example that two lists passed to a function have the same length.

Even the most powerful type checker is not a silver bullet for getting rid of errors in programs. As Gabbrielli and Martini [6] point out, even if a program is free of type errors, it can still have logic errors. Furthermore, they also give dangling reference (or use-after-free) errors as an example of errors that are not always prevented by the use of a static type checker.1 Because static type checkers are, by definition, executed before runtime, they might sometimes mark programs as erroneous even if these programs would never actually produce any errors at runtime. For example, a type error within an if statement branch that would never be executed is still reported. [6, 15]
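To illustrate, the following is a minimal sketch of such a conservative report using Ruby and Sorbet, the type checker examined later in this thesis; the method, values and threshold are hypothetical, and the example assumes Sorbet's standard library signatures:

# typed: true
require 'sorbet-runtime'
extend T::Sig

sig {params(n: Integer).returns(Integer)}
def scale(n)
  if n > 1_000_000
    # Even if no caller ever passes a number this large, the branch is
    # still checked: Sorbet reports that Integer#* does not accept a
    # String argument.
    n * "oops"
  else
    n * 2
  end
end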

A static type checker can also function as an excellent maintenance tool when refactoring code. When changing a type definition, for example, there is no need to manually find all the places where it is used; instead, the type checker will show errors wherever the old definition is used. [15]
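For illustration, a sketch in Ruby with Sorbet (the method and data are hypothetical): suppose a lookup method is refactored so that it may now return nil. Every call site that still assumes a plain String is then flagged statically instead of failing at runtime.

# typed: true
require 'sorbet-runtime'
extend T::Sig

NAMES = T.let({1 => "Ada"}, T::Hash[Integer, String])

# Previously declared as .returns(String); after the refactoring the
# signature admits nil for unknown ids.
sig {params(id: Integer).returns(T.nilable(String))}
def find_name(id)
  NAMES[id]
end

find_name(2).upcase
# Sorbet error: Method `upcase` does not exist on `NilClass` component
# of `T.nilable(String)`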

2.2 Dynamic checking

In dynamic type checking the type constraints are checked at runtime. A value is stored in memory together with a description of its type. At runtime, whenever an operation is applied, the types of the operands are first checked for correctness. In a compiled language, these checks can be inserted by the compiler. Having to check the types before each operation makes the program less efficient compared to a statically checked program where such checks aren't needed. [6] This isn't usually a major concern on modern hardware where most applications can run fast enough even when written in a slightly less efficient language [32].

Program 2.3 shows an example Python program that models an online store transaction. It retrieves money from a customer’s bank account and sends the same amount to a shop. Before sending the money to the shop, the program writes a line to a log describing what it is doing.

1 payment_amount = 100
2
3 account = BankAccount()
4 account.withdraw(payment_amount)
5
6 shop = Shop()
7 log("Sending " + payment_amount + " euros to the shop")
8 shop.send_money(payment_amount)
Program 2.3. An online store payment example in Python

1Rust [22] is an example of a language where these kinds of errors are prevented by the type checker using a concept called ownership.

Running the program produces the error shown in program 2.4 and the execution stops. The error comes from trying to add the payment amount to the log line. Since the error happens before the money is sent to the shop, the result of the execution is that the customer is charged but the shop never receives the payment. Although this example is admittedly contrived and in this particular case the error could easily be detected with automated tests, it is still rather easy to imagine errors similar to this slipping out into production in a more complicated setting.

Traceback (most recent call last):
  File "code/dynamic_type_error.py", line 7, in <module>
    log("Sending " + payment_amount + " euros to the shop")
TypeError: can only concatenate str (not "int") to str
Program 2.4. Resulting runtime type error when trying to execute program 2.3

Dynamic type checking is often found in interpreted languages. This makes sense since interpreted languages usually don't have a separate compilation step where the static type checking would take place. According to Tratt and Wuyts [32] these kinds of dynamically typed languages were previously thought to be suitable mainly for small scripts and tools but gained more widespread popularity during the rise of the web. Nowadays dynamically typed languages such as Python, Ruby, JavaScript and PHP are amongst the most popular programming languages according to the 2020 Stack Overflow Developer Survey [28].

2.3 Type conversions and type safety

Recreating the problematic line from the previous Python example (program 2.3) in a JavaScript console yields a different result than the original example:

> let paymentAmount = 100;
> console.log('Sending ' + paymentAmount + ' euros to the shop');
Sending 100 euros to the shop
Program 2.5. Problematic line from program 2.3 converted to JavaScript

As we can see, in JavaScript, adding a number to a string is not a problem. Python and JavaScript are both dynamically typed languages but they seem to behave quite differently in this situation. The example works in JavaScript because it automatically converts the operands to the same type, in this case, the number to a string.

Different types can be compatible with each other. This compatibility is what often determines whether or not an expression is correct from the typing standpoint. One way to achieve type compatibility is to provide a way to convert from type A to type B, a type conversion. A type conversion can be implicit (coercion), meaning it is automatically performed without the programmer specifying it, or explicit (cast), in which case the conversion is marked in the source code. [6] Some programming languages allow more implicit type conversions than others. This is why the previously shown Python example (program 2.3) produces an error but the JavaScript example (program 2.5) does not: JavaScript coerces the number to a string before doing the concatenation while Python does not automatically perform this conversion. To get the Python example working, the integer would have to be explicitly cast to a string using the str function2. In this particular example case, having implicit type conversions would have been beneficial since the program would not have crashed and the transaction would have been processed correctly, but this is not always the case. Type coercions can lead to very confusing behavior and subtle bugs that are hard to detect. For example, JavaScript performs a lot of implicit type conversions. One example of this is how numbers and strings interact in arithmetic expressions: 2 * "3" will produce 6 but the result of 3 + "3" is "33". These kinds of idiosyncrasies combined with dynamic type checking can lead to bugs that are hard to debug.
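Ruby, the language used in the experimental part of this thesis, sits at the stricter end of this spectrum: it performs no implicit conversion between integers and strings, so the conversion has to be written as an explicit cast. A small illustration with arbitrary values:

1 + "2"        # => TypeError (String can't be coerced into Integer)
1 + "2".to_i   # => 3, after an explicit cast to an integer
"1" + 2.to_s   # => "12", after an explicit cast to a string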

Type conversions are related to a concept called type safety. Type safety determines how, if at all, different types can interact with each other. Programming languages can be categorized into safe or strongly typed and unsafe or weakly typed based on their type safety. In a strongly typed (or type safe) language “no program can violate the distinctions between types defined in that language” [6]. Pierce [15] describes type safety as follows: “a safe language is one that protects its own abstractions”. Type safety is orthogonal to the method of type checking, meaning for example that a strongly typed language can be either statically or dynamically checked [15]. Table 2.1 illustrates this distinction.

Table 2.1. Examples of languages with different type system characteristics. Adapted from [15].

                  Statically checked    Dynamically checked
Strong            Haskell, Java         Python, Ruby
Weak              C, C++                JavaScript

Often languages with stronger type systems allow for fewer implicit type conversions. Having to write explicit type casts makes it harder to accidentally create errors, which is precisely the point of a strong type system. Pierce notes that even strongly typed languages rarely prevent all possible type misuse. In fact, these languages often provide a way to bypass the type system. [15] Some examples of these so-called type system escape hatches are the unsafeCoerce function3 in Haskell and the unsafe keyword4 in Rust.

2Both of the languages provide better ways to achieve the desired output, for example the string.format function in Python and template strings in JavaScript. Simple concatenation is used here for illustrative purposes.
3https://hackage.haskell.org/package/base-4.15.0.0/docs/Unsafe-Coerce.html#v:unsafeCoerce
4https://doc.rust-lang.org/std/keyword.unsafe.html

3. STATIC TYPE CHECKERS FOR DYNAMICALLY TYPED LANGUAGES

The decision of type checking strategy is made when designing the programming language. As Pierce [15] points out, the language design and type system should be done together. The design of the type system affects the design of the language as a whole, both in terms of syntactic choices and the features provided by the language [15]. Still, popular dynamically checked languages such as JavaScript, Python and Ruby have had static type checkers developed for them. These tools can be used to achieve what is called gradual typing [24], which means that some parts of the program are statically type checked and some are not. Interestingly, the fact that many of the tools are developed by big companies using these languages at a large scale seems to hint that using completely dynamically typed languages is not desirable in large codebases with many developers.
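As a concrete illustration of gradual typing, the sketch below shows how Sorbet (presented in Section 3.2) controls the checking strictness per file with a special comment; the method itself is an arbitrary example.

# typed: true
require 'sorbet-runtime'

# The `# typed: true` sigil on the first line opts this file in to full
# static checking, while files marked `# typed: false` are only checked
# for syntax and constant resolution errors. A codebase can mix files
# at different strictness levels and tighten them one by one.
extend T::Sig

sig {params(name: String).returns(String)}
def greet(name)
  "Hello, #{name}!"
end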

This chapter will discuss the different design decisions made in these tools and give an overview of a few of the most popular projects in this space. JavaScript and Python were chosen as example languages because they are the two most popular dynamically typed languages in the 2020 Stack Overflow Developer Survey [28]. Ruby is also quite high on that list, but more importantly, it is the language the author is using in his own experiment.

3.1 Different design approaches

Just like traditional static type systems, these third-party tools come in many different flavours with varying implementation strategies and feature sets. While many of the tools support type inference, it is also necessary to provide at least some type annotations that the type checker can use as a guide. Perhaps the most interesting design decision regarding these tools is how they have chosen to implement type annotations. Often the language syntax is not designed to accommodate type annotations since they are not really necessary in a dynamically typed language. Options for adding type annotation support to the language include: extending the language syntax, annotating types using special comments, using existing language features to implement type annotations, and providing type annotations in a file separate from the main code. All of these have their own pros and cons and some tools even use more than one of them simultaneously.

Extending the language syntax is arguably a rather elegant solution because it makes the type annotations an integral part of the code. However, creating new syntax means that the parser for that language will stop working since it does not recognize the new and unfamiliar syntax, unless this new syntax is made part of the official language syntax specification. This leads to these kinds of solutions having to do some sort of transformation step on the code before it can be run, making it effectively a new language. For example, program 3.1 uses TypeScript and requires a compilation step from TypeScript to JavaScript before it can be run. Sometimes adding a feature as prevalent as type annotations nicely to existing syntax might not be possible, e.g. if the syntax is already complicated and there are no suitable constructs left.

1 function add(a: number, b: number): number {
2   return a + b;
3 }
Program 3.1. An example of type annotations with custom syntax (TypeScript)

Adding type annotations in comments is a rather simple and flexible solution. There is no need to remove the comments before running the program as they are ignored, and the type annotation syntax can be anything since it cannot interfere with other language features. An example of this approach can be seen in program 3.2 which uses Google Closure Compiler1 to annotate JavaScript code. A major drawback of this method is that comments are usually ignored by other language tooling such as syntax highlighting, code formatters etc., which leads to a poor developer experience. Implementing type annotations using existing language features removes the tooling issue from the equation: annotations being nothing but regular language syntax means that highlighting and other tools should work fine with them. Also, these kinds of annotations can be used to provide additional runtime checks if desired. The problem with using existing syntax is that it can make the annotations rather verbose and not as elegant as dedicated syntax. An example of annotations using existing language features is presented in program 3.3 with Ruby and Sorbet [25].

1https://developers.google.com/closure/compiler/ 11

1 /**
2  * @param {number} a
3  * @param {number} b
4  * @return {number}
5  */
6 function add(a, b) {
7   return a + b;
8 }
Program 3.2. An example of type annotations as comments in JavaScript using Google Closure Compiler

1 extend T::Sig
2
3 sig {params(a: Numeric, b: Numeric).returns(Numeric)}
4 def add(a, b)
5   a + b
6 end
Program 3.3. An example of type annotations using existing language syntax in Ruby with Sorbet

It is also possible to provide type annotations in a separate file alongside the main code file, as seen in the example program 3.4 that uses Ruby and RBS2. This approach does not limit the syntax of these type annotations in any way since the annotation files do not need to be read when running the program. It can be tedious to keep the separate annotation file in sync with the code, and this method could mean doubling the file count in a project. Some of these maintenance tasks can of course be automated. This way of annotating provides an opportunity for the community to create type annotations for existing libraries, which helps to make the type checking more accurate in projects that use these libraries. Many of the tools listed in the following sections have an option for separate annotation files and a repository of annotations for existing libraries.

2https://github.com/ruby/rbs 12

1 # Main code file math.rb:
2 def add(a, b)
3   a + b
4 end
5
6 # Type annotation file math.rbs
7 def add: (Numeric a, Numeric b) -> Numeric
Program 3.4. An example of type annotations using a separate file (Ruby and RBS)

The example programs illustrate the differences between the approaches quite well. For example, the TypeScript syntax in program 3.1 is much more succinct in comparison with the style Sorbet uses (program 3.3). Every style has its own characteristics and they prioritize different aspects of the development workflow as a whole.

3.2 Ruby

There have been many different projects aiming to provide a type checking system for Ruby. One of the earliest examples is Diamondback Ruby (DRuby) which extends Ruby with static type checks, type inference and type annotations [3, 5]. Another example is Ruby Type Checker (rtc) that adds support for type annotations and runtime checks for those annotations [20].

More recent developments in this space include Sorbet [25] and Steep3 as well as the introduction of type annotations in Ruby version 3 in the form of the RBS language [21]. Ruby takes an interesting approach to type annotations: the type signatures for methods are written in a separate file using the RBS language. However, there is no first-party type checker; the idea is instead to provide a standardized way to define the types that third-party type checkers can then use. Steep already uses the RBS format and Sorbet might support it in the future. In addition to the separate type signature definition file, Sorbet also supports inline type annotations using a domain specific language that is valid Ruby code.

Sorbet was chosen for this master's thesis project because of its maturity, extensive feature set, excellent documentation and development backed by Stripe4, a large company using it in production. The methodology used in this project is described in detail in Chapter 5. Sorbet's type system is quite powerful, with for example control flow-sensitive typing, exhaustiveness checking, abstract classes and interfaces, as well as experimental support for shape types. There is also a large collection of premade type signatures for common gems5 in the sorbet-typed6 project, as well as the sorbet-rails7 gem for better integration with the Ruby on Rails framework. [25]

3https://github.com/soutaro/steep
4https://stripe.com
5External libraries are called gems in Ruby
6https://github.com/sorbet/sorbet-typed
7https://github.com/chanzuckerberg/sorbet-rails/
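The sketch below illustrates two of these features, flow-sensitive typing and exhaustiveness checking; the method names and types are the author's own arbitrary examples.

# typed: true
require 'sorbet-runtime'
extend T::Sig

sig {params(value: T.nilable(String)).returns(Integer)}
def length_of(value)
  return 0 if value.nil?
  # Flow-sensitive typing: after the nil check above, Sorbet narrows
  # value from T.nilable(String) to String, so .length is allowed.
  value.length
end

sig {params(input: T.any(Symbol, Integer)).returns(String)}
def describe(input)
  case input
  when Symbol then "a symbol"
  when Integer then "an integer"
  else
    # Exhaustiveness checking: T.absurd makes Sorbet verify that every
    # member of the union type has been handled, so adding a new member
    # to the union produces a static error here.
    T.absurd(input)
  end
end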

3.3 JavaScript

JavaScript has been the most popular programming language in the Stack Overflow Developer Surveys for eight years in a row [28]. It is the language for writing web applications and it is used more and more for desktop applications as well through projects like Electron. Therefore, it is no wonder that companies and the developer community have taken interest in creating additional features and tooling such as static type checking for it.

Three of the most popular static JavaScript type checkers are currently TypeScript8 developed by Microsoft, Flow9 developed by Facebook and the previously mentioned Closure Compiler developed by Google. In contrast to Sorbet for Ruby, where type annotations are valid Ruby code, Flow and TypeScript extend the JavaScript syntax with type annotations. This results in the need for an additional compilation step that removes these annotations before the code can be run on a standard JavaScript interpreter. Closure Compiler opts to have the annotations as special comments, thus not altering the language syntax. Technically, TypeScript can be considered its own language, but since all JavaScript programs are also valid TypeScript, it can effectively function like Flow or any other type checker. The DefinitelyTyped project10 provides a collection of TypeScript declarations for existing JavaScript packages that don't have them included, similarly to the previously mentioned sorbet-typed project.

8https://www.typescriptlang.org/
9https://flow.org/
10https://definitelytyped.org/

3.4 Python

Standard Python syntax has had support for type hints since version 3.5. These type hints are not checked or enforced by the Python runtime but exist to be used by third-party tools. [33] This approach of leaving the actual use of the type annotations to external tools is similar to the Ruby 3 approach.

Probably the most popular Python type checker is mypy11. Other Python type checkers include Pyright12, Pyre13, Pytype14 and the one built into the PyCharm IDE15. All of these support the official Python type annotation syntax and offer similar capabilities to each other. Equivalent to sorbet-typed and DefinitelyTyped, Python's Typeshed16 project contains typed stubs for the Python standard library and many third-party packages.

11https://mypy-lang.org
12https://github.com/Microsoft/pyright
13https://pyre-check.org/
14https://github.com/google/pytype
15https://www.jetbrains.com/pycharm/

16https://github.com/python/typeshed

4. STUDIES ON STATIC VS. DYNAMIC TYPE SYSTEMS

The question of whether one type system is better than another is often debated among programmers. Proponents of static type systems often argue that static type systems help prevent errors and that writing programs using them therefore results in better quality software. Those in favor of dynamic type systems might say that static type systems make the development process slower without noticeable benefits over automated testing. Given how much this topic is argued about on the internet and in other forums, there is relatively little scientific research done about it. This chapter will introduce some of those studies. The studies are divided into three categories: controlled experiments, studies where existing source code was collected and analyzed, and other studies which do not fall into either of the previous categories.

4.1 Controlled studies

One of the earliest empirical studies regarding type systems was done by Gannon in 1977 [7]. In this experiment, the participants programmed the same program twice, once in a statically typed language and once in a “typeless” language. The participants were divided into two groups, where one group first completed the task using the statically typed language and then in the typeless language, and vice versa for the other group. The number of errors and their occurrences were compared between the two languages. The study found that the statically typed language produced more reliable programs. [7] The results of this study are not very useful in determining the effectiveness of static type checking, as the program in the task required string manipulation but the typeless language lacked a string data type altogether. This resulted in participants having to manually manipulate the data, which caused most of the errors in that language. [17]

Prechelt and Tichy conducted an empirical study [17] in 1998 where they investigated the benefits of type checking procedure arguments. They discovered that type checking reduced the number of bugs left in the final programs, reduced bug lifetimes during the development, and increased productivity. They do note, however, that these results are not generalizable to every situation. [17] Dan Luu questions the significance of these results in his literature review [29], saying that the measured differences between type systems are far smaller than the difference between subjects' programming abilities.

Daly et al. performed a small study [1] in 2009 where they explored the impact of static typing in the Ruby programming language. In this study, four Ruby programmers were tasked to write two different programs, a sudoku solver and a maze solver. For each of the participants one task was randomly selected to use DRuby [3] and the other was done using the standard Ruby interpreter. Information about the errors found during the development was gathered and analysed, and participants were also interviewed afterwards. The authors conclude that no observable benefit of using DRuby over the standard interpreter was found. The participants said that they relied on their memory and naming conventions for the type information, as well as Ruby's interactive shell and reflection capabilities to find information on available methods and their type signatures. [1] The programming tasks in the study were quite small, which makes it possible to remember most of the type signatures of the program. This would become increasingly harder as the program grows, so tools such as DRuby might be more useful in larger codebases.

Stefan Hanenberg, together with other authors, has been conducting several empirical studies on the differences between static and dynamic type systems. Studies by Hanenberg [9], by Stuchlik and Hanenberg [30] and by Kleinschmager [13] investigated the effect of type systems on the development speed. The effect of static typing on the usability of APIs was studied by Mayer et al. [14], by Endrikat et al. [4] and by Spiza and Hanenberg [27]. Impact on the maintainability of software was also studied by Hanenberg et al. in [10].

Hanenberg started his experiment series with a study [9] concerning the effects of a static type system on the development speed of a program. He conducted a controlled study with 49 participants where the participants had to implement a text scanner program (a sanitizer and a tokenizer) and then expand that to a simplified parser for the Java programming language. A custom programming language was developed for this study to mitigate the differences between participants' experiences with the language and the differences in the documentation of existing languages. There were two versions of the language, a statically typed one and a dynamically typed one; half of the participants used one and the other half used the other. The languages were taught to the participants before they completed the tasks. In the first part of the experiment (the scanner), the dynamically typed language was faster to develop in, but in the second part no significant difference was measured. This would seem to indicate that dynamically typed languages are quicker to develop in for smaller projects. The author notes that this experiment considers only the development time and does not address any other possible differences between the types of languages. [9]

Stuchlik and Hanenberg continued researching the effects of type systems on development speed in their 2011 study [30]. In this study, they focused on how the type casts needed in a statically typed language affect the development speed. In the experiment, 21 subjects completed five programming tasks in both Java (statically typed) and a subset of Groovy (dynamically typed). They recorded a significant impact in favor of the dynamically typed language when comparing the sum of all tasks. Measured separately, in three of the five tasks dynamic typing came out ahead, but in the two remaining tasks no significant difference was measured. [30]

Kleinschmager also studied in [13] the development speed difference between statically and dynamically typed languages. There were 36 participants and they were given nine tasks to complete in both Java and Groovy. The participants were divided into two groups: one starting with Java and one starting with Groovy. The main hypothesis in the study was that solving a task that requires using an undocumented API is faster in a statically typed language than in a dynamically typed one. This hypothesis was confirmed and a significant positive effect on development time was observed when using a statically typed language. There were two other hypotheses regarding the time it takes to fix errors but those were rejected due to difficulties in analysing the results, which were caused by the learning effect. [13]

Mayer et al.'s study [14] was performed to find out if there is a difference in required development effort between statically and dynamically typed languages when working with an undocumented API. They also looked into whether or not the complexity of the tasks has any effect on the results. Development speed was used as a measurement of the effort required to complete the programming tasks. Two groups of participants had to complete all five tasks twice: once in each programming language, each group starting with a different language. The authors conclude that

1. dynamic type checking might reduce development time when the tasks are on the easier side and
2. static type systems are beneficial when the task requires identifying many different classes.

They also note that the tasks at hand affect the results and there is no simple, generalizable answer to static vs. dynamic typing. [14] Similarly, Spiza and Hanenberg investigated in [27] whether type annotations, without static type checking, can improve the usability of APIs. Similar to other studies, this study also had participants divided into two groups where one group started with type annotations and the other without them, and the roles were switched for the second round. Again, development speed was used to measure the usability. Participants were given four tasks where they had to fill in the implementation of a method stub which required using multiple existing interfaces. In one of the tasks, the required API had wrong type annotations. The authors chose Dart1 as the programming language for the experiment since it allowed having dynamically checked code with or without type annotations2. The results show that having correct type annotations does improve the usability of an API, while having incorrect type annotations can slow down the development significantly. [27]

1https://dart.dev
2Dart version 1 allowed this kind of optional type checking while the current version, Dart 2, is always statically type checked.

Hanenberg et al. studied in [10] if static type systems help in software maintainability. They wanted to find out if static typing helps when extending existing code without any documentation, and if it helps when fixing programming errors. There were 33 participants who were divided into two groups: one starting with Java and one starting with Groovy. They all had to perform nine tasks in total on an existing piece of software, a small turn-based game. The tasks were divided into three categories: class identification tasks (five tasks), type error fixing tasks (two tasks) and semantic error fixing tasks (two tasks). Development time was used as a measurement. The results of the study show that static type systems help when using new classes (class identification tasks) and when fixing type errors, but no difference in fixing semantic errors was detected. They also noticed that when using a static type system, the participants made fewer file switches in the class identification tasks and type error fixing tasks, which they think contributed to the shorter development time. [10]

Endrikat et al. studied in [4] how documentation and type systems affect the usability of APIs. Their goal was to answer the following questions:

1. Do developers actually use free-text documentation?
2. Does development time (including time spent reading the documentation) vary between static typing and dynamic typing, with or without documentation?
3. Is there a difference in the actual programming time between the type checking and documentation combinations?

Like in the previously mentioned Spiza and Hanenberg study [27], this study also used development speed as the measurement for the usability and Dart as the programming language. The 25 participants were divided into four groups: static typing with and without documentation, and dynamic typing with and without documentation. The authors found that a static type system as well as documentation improved the usability of APIs and that, when both of them are used together, the effect is more significant. [4]

Harlin et al. performed a controlled study [11] where they investigated the effects of static type systems on developing a program from scratch and on debugging programs. In the study, 14 participants were split into two groups: a statically typed group who used C# and a dynamically typed group who used PHP. Both groups had to complete two programming tasks and two debugging tasks. In the programming tasks, the participants had to develop a simple data validation program and a more complex file encryption program, and their performance was measured in completed requirement points (features). In the debugging tasks, the participants were given two programs, similar to the ones they had created, that had different kinds of bugs in them, and they had to find and fix those. The results show that there was no significant difference between the languages in the programming tasks. In the debugging tasks, the programmers using the statically typed language were able to fix more bugs than the other group, and this difference was more pronounced in the more complex encryption program. [11]

4.2 Studies on existing codebases

Delorey et al. [2] and Ray et al. [18] have both performed studies where they investigate the differences between programming languages using data collected from open source projects. Delorey et al. focused on productivity differences between languages. They collected data from 9,999 repositories hosted on SourceForge3 and analyzed the differences in “estimated average annual productions” between ten programming languages. Their results show differences between programming languages in annual programmer productivity. Interestingly, the statically typed languages in the study (Java, C, C++, C# and Pascal) ranked higher in their productivity metric than any of the dynamically typed languages in the study (JavaScript, Perl, Tcl, Python and PHP). [2]

Ray et al. collected the 50 most starred4 projects of 17 different programming languages from GitHub5 and analyzed those that had more than 28 commits. They categorized programming languages based on the programming paradigm, the method of type checking, memory management and whether or not implicit type conversions are allowed. They then calculated the number of different kinds of bugs found in the projects by searching for different keywords and phrases in the commit histories of the projects. In the end, their results suggest that, for example, statically typed languages are better than dynamically typed ones and that not allowing implicit type conversions leads to fewer bugs. [18]

Dan Luu noted in his literature review that Ray et al.'s study has some rather severe problems with the project selection: for example, none of the top three TypeScript projects were actually TypeScript projects [29]. Luu's review was done on an earlier version of the study [19] and it seems that some of these problems have been fixed since then, but not all of them. For example, the second and third most popular Perl projects are still a JavaScript project and a small helper script for Ruby on Rails development. Also, NodeJS is listed as a JavaScript project even though it is actually a C++ project with a lot of JavaScript files for testing purposes. Taking these into consideration, the results of the study should be taken with a grain of salt. However, when it comes to static vs. dynamic type checking, they seem to be in line with other studies, namely [7, 17].

3https://sourceforge.net
4Users can “star” a project on GitHub to express interest in it.
5https://github.com

With a similar approach to Delorey et al. [2] and Ray et al. [19], Souza and Figueiredo investigated the use of type annotations in Groovy programs in their 2014 study [26]. Groovy is a hybrid between a statically and dynamically typed language where type checking is dynamic by default but the programmer has the option to mark certain parts of the code to be statically type checked. Type annotations can be added to the dynamically checked code but those annotations will be ignored by the compiler. The authors collected the source code of 6,638 Groovy projects on GitHub and analyzed how and where type annotations are used and what factors affect their use. Their results show that:

1. Static typing is used more often on module interfaces, public method and constructor declarations, and on protected fields. Private declarations and local variables are not typed as often. The authors speculate that this is caused by type annotations acting as documentation as well as allowing the development environment to provide better code assistance.
2. Type annotations are used less often in tests and scripts compared to the main classes of the program.
3. Programmers who are used to programming mainly in statically typed languages use type annotations more often than those used to dynamically typed languages.
4. The size or age of the project does not seem to affect the use of types.
5. Type annotations are used less in the parts of the code that are changed more frequently. [26]

The results regarding type annotations as documentation (1) are similar to those obtained by Spiza and Hanenberg in their study [27]. The lesser use of types in more frequently modified parts of the code would suggest that programmers prefer the assumed development speed increase of dynamic typing to the documentation and maintainability benefits of type annotations [26].

Gao et al. [8] conducted an experiment to find out how many bugs adding a static type checker could have prevented in JavaScript codebases. They collected a set of closed bug reports from JavaScript projects on GitHub. They then found the commit that had fixed a bug and reverted the state of the codebase to how it was right before the fix was applied. After that, type annotations were added to the erroneous piece of code and the code was run through the TypeScript and Flow type checkers to see if those tools would have caught the bug had there been type annotations present. The results of the study show that out of the 400 bugs tested, TypeScript and Flow were each able to detect 60 bugs, with 57 of them detectable by both type checkers. [8]

4.3 Other studies

In 2000, Prechelt studied [16] the differences between seven different programming languages divided into two categories: scripting languages (Perl, Python, Rexx and Tcl) and conventional languages (C, C++ and Java). These languages were compared both individually and as the aforementioned groups regarding program runtime, memory consumption, reliability, working time and productivity, the length of the program, and program structure. A total of 80 implementations of the same program in different languages were sourced. The C, C++ and Java programs were obtained through a controlled study and the rest of the programs were done by volunteers from newsgroups. The study found that it was much faster to write the programs, at least small ones, in a scripting language than in a conventional compiled language. Meanwhile, no significant difference in the reliability of the resulting programs was found between the language categories. [16] It is notable that it is difficult to draw any generalizable conclusions from the study since the test conditions were different between the two groups [29]. The scripting language group had access to the task descriptions many days before doing the actual programming work and their work time was self-monitored and self-reported. The other group, on the other hand, did the tasks in a controlled setting with limited time and accurate work time tracking. [16]

Sagonas and Luna performed an experiment [23] where they took an existing, large codebase written in Erlang (which is dynamically typed) and gradually added type checks to it using the Dialyzer tool. Dialyzer is a static analyzer that can detect different kinds of defects in the source code, including type errors. Furthermore, Dialyzer's type checking can be improved by adding type annotations to the code. The authors converted “type annotation” comments into real annotations checked by the static analyzer and found about a fifth of them to be erroneous. This process was done in parts, as adding all of the annotations at once could produce a lot of errors which would make them harder to go through and fix. In addition to finding invalid annotations, they discovered some bugs that hadn't been caught by the automated test suite. The authors state that even though this kind of tool is not a silver bullet for software quality, it is useful for catching some programming errors as well as for documentation and maintainability purposes. [23] This study has a quite similar methodology to the exploratory part presented in this thesis.

4.4 Summary of existing studies

Looking at the existing studies, it is impossible to declare one typing scheme absolutely better than the other. Some studies [9, 16, 30] came to the conclusion that it is faster to develop with dynamic typing, while others [2, 13] found static typing to be faster. It is important to remember that in most of these studies the programs in question were rather small, which could favor dynamic typing. The results of Mayer et al.'s study [14] showed that static typing became faster to develop with when the tasks grew in complexity. Kleinschmager's study on using undocumented APIs [13] is interesting in how it relates to today's software development environment: codebases are changing fast, and teams are becoming increasingly distributed around the world, which makes communicating changes more difficult. It often happens that the code is not documented as well as one would like, if at all. This seems to be an area where static typing could bring significant benefits, as suggested in the study.

In aspects other than pure development speed, static typing seems to have an edge over dynamic typing. It was found to be more reliable (in terms of the number of bugs introduced) in studies like [7, 17], as well as to improve the maintainability and usability of a codebase in studies such as [4, 10, 11]. Even just having type annotations without any static type checking seemed to improve API usability [27]. Given the previously mentioned aspects of today's development environments, it is quite easy to see how these usability and maintainability improvements can have a significant positive effect on the development process and product reliability. It is also interesting to see that the studies on using a static type checker in a dynamically typed language [8, 23] had positive results: in the case of [8] the type checker caught 15% of the bugs, and in [23] it was able to find new bugs in a mature codebase that automated tests had missed.

5. METHODOLOGY

This study was divided into two parts: exploratory testing of a third-party static type checker and an online survey on developers' experiences of using third-party static type checkers. In the exploratory testing part, the author went through the process of setting up and adopting Sorbet [25] in a large Ruby codebase, taking notes of the bugs found and of other problems or otherwise noteworthy things. Since the target of the exploratory part is only a single codebase and a single tool, it alone is not very representative of these tools overall. Because of this, the online survey was conducted to augment the results of the exploratory study with other people's experiences of using similar tools. The details of the exploratory testing and the online survey are described in Sections 5.1 and 5.2 respectively.

5.1 Exploratory testing

The plan for the exploratory part of the study was to set up Sorbet in either of the two Ruby on Rails codebases at Flockler and gradually increase the type checking coverage to see whether any bugs could be discovered. Originally, the smaller codebase was considered for the study because it used a newer version of Ruby on Rails that would work with the current version of the sorbet-rails gem. However, between starting the thesis work and getting to the point where this portion of the thesis could be started, work had gone into upgrading the Rails version used by the larger project. The upgrade process was far enough along that experimenting with Sorbet could be done on the larger codebase, which had more variety and therefore could provide more comprehensive results.

The initial setup of Sorbet consists of installing the gem and running an initialization command. This initialization process looks at all the Ruby files in the codebase and determines, for each file, the highest possible type checking strictness level that will not produce any type errors. It also pulls in third-party type signature definitions (RBI files) from the sorbet-typed project and generates placeholder definitions for dynamically defined constructs. The Sorbet documentation [25] lists the five strictness levels available:

• ignore: The file is completely ignored by Sorbet.
• false: Sorbet reports only syntax errors and missing constant errors in these files.
• true: "Normal type errors", such as calling methods with the wrong number of arguments or calling non-existent methods, are reported.
• strict: All methods, constants and instance variables in the file must have explicit type signatures.
• strong: The special "untyped" type is not allowed in the file. This level is used mainly in RBI files and empty class definitions. [25]

The strictness level of a file is defined in a special comment, called a sigil, at the top of the file, e.g. # typed: strict. The Sorbet documentation has some suggestions for adopting it in an existing codebase. It suggests that in the beginning the main focus should be on getting as many of the files as possible to strictness level true without worrying too much about type signatures. [33] Following the advice given in the documentation, the plan of action was:

1. Pick a file with strictness level ignore and change it to level false.
2. Run the type checker.
3. Fix and take notes of any errors reported.
   (a) If the errors are not fixable for some reason, roll back to the previous level.
4. Repeat from step 1 until there are no more files to convert from this strictness level, then move on to converting the files to the next strictness level (e.g. false to true).
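To make the adoption loop concrete, here is a minimal sketch of a single iteration (the file, class and error message are hypothetical illustrations; the sigil syntax and the srb tc command follow the Sorbet documentation):

# Step 1: raise the sigil at the top of the file from `# typed: false`
# to `# typed: true`.

# typed: true
class ReportFormatter
  def format(report)
    report.to_s.upcase
  end

  def self.run
    # Step 2: running `bundle exec srb tc` now type checks this file.
    # The typo below is reported with an error along the lines of
    # "Method `fromat` does not exist on `ReportFormatter`", which is the
    # kind of "normal type error" caught at level `true`.
    new.fromat("quarterly")
  end
end

# Step 3: fix the error (rename the call to `format`) and re-run the type
# checker, or roll the sigil back to `# typed: false` if it cannot be fixed yet.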

Since the codebase is quite large, about 80,000 lines of Ruby code, it was not feasible to enable static checking on all files in the time available. Files to enable the checks on were chosen based on their complexity and perceived importance. For example, relatively simple utility classes are good targets because they do not usually contain a lot of metaprogramming tricks and are possibly used by many other parts of the code. Also, abstract base classes that are meant to be (partially) overridden are of high value because they define the interfaces other classes are supposed to implement. Some files, such as database schema migrations, test scripts and other miscellaneous utility scripts, were excluded altogether since they do not contain any code that is run in production. This resulted in a little under 2,400 files to be analyzed by Sorbet. A total of approximately five working days were spent on improving the type checking level across the codebase.

5.2 Online survey

Because of the short timeframe and the limited number of engineers participating in the exploratory testing, an online survey was designed to complement it. An online survey was chosen because it made it possible to gather quite a lot of both quantitative and qualitative data on the subject in a relatively short time. The goal of the survey was to find out the programming community's experiences in using static type checkers designed for dynamically typed languages. The survey did not consider only Ruby and Sorbet but any such tool, with the assumption that the experienced effects would be comparable across different languages. This section introduces the structure and the questions of the survey and gives the reasoning behind them.

The survey was implemented using an anonymous Google Form which was open from March 9th 2021 to March 31st 2021. The link to the survey was posted on the Koodiklinikka Slack community1 and on Reddit's r/programming subreddit2. There were 25 questions in total divided into four sections: background information, reasons for not using a static type checker, experiences in using a type checker, and experiences in setting up and adopting the use of a static type checker. Most of the questions were multiple choice or simple number inputs that the participants were required to answer. Four of the questions were optional open questions asking for clarifications or additional information regarding a preceding question. The sections the participant had to answer varied based on their answers. The decision tree describing which sections the participant saw is shown in Figure 5.1. The questions in the branching points have been shortened to make them fit better. The full list of the questions and the answer choices is provided in Appendix A.

Figure 5.1. Flowchart of the survey (Start → Background information → "Has used static type checkers?": if No, questions about not using the tools, then End; if Yes, questions about using the tools → "Has set up a type checker?": if Yes, questions about setting up the tools, then End; if No, End)

1 https://koodiklinikka.fi
2 https://reddit.com/r/programming

The first section of the survey asked for background information regarding the participant's programming experience. This section consisted of four questions. The participants were asked to rate their programming skill level on a scale of beginner–intermediate–expert, their years of experience, whether or not they write code professionally, and which programming languages they know.

Those who answered that they hadn't used any of the tools were asked only four more questions. The point of these questions was to get some insight into why some people have chosen, or been forced, not to use a static type checker in their development. The first question was "Why haven't you used a static type checker?" and it had a couple of selectable answers, such as "There is no such tool available for my language of choice" and "I don't think they are useful", and an option to provide other reasons. The next question was "Which type checkers have you heard of?" with a choice of a few popular tools and the option of specifying other or none. The last question was whether the participant would consider using a static type checker in the future or not, with the option to justify the answer.

Those who had used a static type checker continued on to the section about using the type checkers. First, they were asked which type checkers they have used, with a few select type checkers as options and the ability to add others not in the list. Next, they were asked to estimate the size of the largest codebase they have used a type checker in. There is evidence [11, 14] that static type checking might be more useful in a more complex codebase, and this question could be used to gauge the effect of the codebase size on the perceived benefits of a type checker. The next two questions were about the percentage of the codebase that was type checked and how many people were working on the codebase. The fifth question was a matrix question where the participants were asked about the effects of using a static type checker on the following areas: code reliability, development speed, confidence when writing new code, confidence in refactoring existing code, API usability, codebase maintainability, and working with unfamiliar parts of the code. Each of these areas was rated on a discrete scale from 1 to 5 where the numbers meant

1. significantly decreased
2. somewhat decreased
3. no effect
4. somewhat improved
5. significantly improved.

Some of these areas were chosen because they had been the focus of previous studies, namely code reliability in [7, 16, 17], development speed in [9, 13, 16, 30], API usability in [4, 14, 27], and maintainability in [10]. The rest of them were areas the author thought could be affected by the use of a static type checker. The last questions in the type checker usage section were about the overall usefulness of the tools and whether or not the size of the codebase affects it.

The last section had questions about setting up and adopting a type checker. First, there was a free text answer to describe the setup and adoption process of the type checker, followed by a multiple choice question about the amount of work required for the setup. Next, there was a matrix question about the expected effects of a static type checker on the same areas as in the previous section's question on the observed effects. This question was added so that the expectations and the actual effects could be compared, as this could provide some interesting insights, such as a static type checker "disappointing" the user in some respects. The section, and thus the whole survey, concluded with questions about how well the static type checker met the participant's expectations and whether or not using it justifies the effort needed to set it up and adopt it. Both of these questions had an optional follow-up question where the choices could be justified.

6. RESULTS AND DISCUSSION

This chapter presents the results of the exploratory testing and the online survey in Sections 6.1 and 6.2 respectively. In Section 6.3, these results are discussed and compared with each other and with the previous studies analysed in Chapter 4. Some potential flaws in the study and possible future research ideas are also discussed.

6.1 Exploratory testing results

The experience of using Sorbet was positive during the exploratory testing. The setup process was quite smooth after leaving out unnecessary files to make the process faster. Enabling type checking in more and more files gradually means that the developer is not faced with an endless list of type errors at once, which would be very difficult to handle. Instead, errors can be handled one file at a time, which makes the process manageable.

When using Sorbet, each file is annotated with a special comment, called a sigil, at the beginning of the file. This annotation determines the type checking strictness level for the file. The available strictness levels, from the laxest to the strictest, are: ignore, false, true, strict and strong. Basically, files that are set to level true or higher are statically checked by Sorbet.1 As stated before in Section 5.1, the goal was to get as much of the codebase statically type checked as possible, in other words, to have files at sigil true or higher. This was done by first moving files from the ignore sigil to false and then from false to true. Whenever the strictness level is increased, there is a possibility of the type check failing and thus possibly revealing a bug in the code. At the end of this experiment, Sorbet is not yet in use in the production branch of the application, and the author has been the only developer working on the adoption.

There are two key metrics that Sorbet can automatically export after type checking: the number of files with each sigil, called file-level typedness in Sorbet terminology, and what is called call-site-level typedness. File-level typedness is simply the number of files annotated with each of the different strictness levels. Call-site-level typedness describes how many of the method calls in the codebase are made on an object whose type is known to Sorbet. The call-site-level measurement consists of two parts: the total number of method call sites in files that are # typed: true or higher, and the number of those call sites where the type of the method call's target is known (the call site is typed). Tracking file-level typedness is intuitive: the more files are type checked the better. However, static checking can be enabled for a file and then bypassed at individual call sites using various escape hatches. This is where call-site-level typedness comes into play. It measures the actual number of type checked method call sites and thus represents the typing coverage of the codebase more accurately than file-level typedness. Tracking call-site-level typedness helps to reduce the number of untyped variables, which provides more accurate type checking. [31]

1 Strictness levels are explained in more detail in Section 5.1.
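To illustrate the difference between the two metrics, consider the following sketch (a hypothetical class; T.unsafe is one of Sorbet's documented escape hatches). Both calls live in a # typed: true file and therefore count towards file-level typedness, but only the first call site is typed:

# typed: true
require 'sorbet-runtime'

class Greeter
  extend T::Sig

  sig { params(name: String).returns(String) }
  def greet(name)
    "Hello, #{name}!"
  end
end

greeter = Greeter.new
greeter.greet('world')            # typed call site: the receiver's type is known
T.unsafe(greeter).greet('world')  # untyped call site: T.unsafe erases the type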

Table 6.1 lists the number and percentage of files with each sigil and the total number of files. The “Initial” column reports these numbers right after running Sorbet’s initialization command and the “End” column shows the numbers after the exploratory testing phase was completed. The “Difference” column shows the difference between the starting point and the resulting state. As can be seen from the table, the number of unchecked files (ignore and false) has gone down quite significantly.

Table 6.1. File strictness level statistics at the start and end of the exploratory testing

                     Initial            End                Difference
Strictness level     File count    %    File count    %    File count    %
Ignore                       36  1.5             7  0.3           −29  −80.6
False                       645 26.9           561 23.4           −84  −13.0
True                       1085 45.3          1174 49.0            89    8.2
Strict                      619 25.8           642 26.8            23    3.7
Strong                       10  0.4            13  0.5             3   30.0
Total                      2395               2397

Call-site-level typedness statistics are shown in Table 6.2. It lists the number and percentage of typed call sites, and the total number of call sites in the type checked files. As more files are converted to strictness level true or higher, the number of call sites goes up. Just like the percentage of type checked files, the percentage of typed call sites has also gone up, but not by as much.

Table 6.2. Call-site-level typedness statistics

              Initial            End
Call sites    Count       %      Count       %
Typed         21 561   42.3      26 758   43.6
Total         50 921             61 329

Even though the number of unchecked files did go down, there are still quite a few of them left in the codebase. A big portion of these files are ActiveRecord model definitions. These models represent the objects in the application's database, and Rails automatically generates, among other things, getter and setter methods for the database columns on the class. Since these methods are dynamically created at runtime, Sorbet does not know about them when the static type checks are run. This makes it really difficult to annotate these files with a sigil higher than false without additional tooling. There is a gem called sorbet-rails2 that can automatically generate type signatures for these dynamically defined methods. Unfortunately, the author encountered an error when trying to set the gem up that could not be resolved at the time. The root cause of the error is still unknown, but it is most likely a conflict with some other gem used in the project. Once this problem is solved, the file-level typedness can be improved significantly.
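To illustrate the problem (a hypothetical model; the error text is approximate), a model file raised to level true fails because the column accessor only comes into existence at runtime:

# typed: true
# `title` is a database column, so ActiveRecord defines Article#title
# dynamically at runtime. Static analysis cannot see the method, so Sorbet
# reports something like "Method `title` does not exist on `Article`"
# unless an RBI file (e.g. one generated by sorbet-rails) declares it.
class Article < ApplicationRecord
end

Article.new.title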

The most common error found by Sorbet in this experiment was calling methods on variables that might be nil. Depending on the situation, the fix for this sort of error was either to add a Sorbet type assertion T.must to the call or to use Ruby's safe navigation operator (&.). Using the T.must assertion will raise an error at runtime if the value is nil. Calling a non-existent method on a nil value would also raise an error, so the runtime behaviour does not really change. However, the type assertion has the added benefit that, after that line, Sorbet can assume the value can no longer be nil. The safe navigation operator works by returning nil if the receiver of the method call is nil and otherwise returning the result of the method call normally. Using the safe navigation operator to prevent a crash at the call site by returning nil instead can be useful in, for example, cases where the value is checked later, such as the one demonstrated in Program 6.1.

2 https://github.com/chanzuckerberg/sorbet-rails/

def video_embed_url_from_video_url(url)
  uri = URI.parse(url)
  # The parsed URI might not contain the path part and therefore
  # the uri.path might be nil. Without the safe navigation operator
  # (&.) the program would crash on the next line in that case.
  video_id = uri.path&.split('/')&.[](1)
  return nil if video_id.nil?

  return "https://embed.videohosting.service/#{video_id}"
end

Program 6.1. Using safe navigation operator to prevent a crash in Ruby
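For comparison, a minimal sketch (with a hypothetical helper method) of the T.must alternative discussed above: the assertion raises at runtime if the value is nil, and after it Sorbet treats the value as non-nil.

require 'sorbet-runtime'
require 'uri'

def video_id_from_url(url)
  uri = URI.parse(url)
  # T.must raises at runtime if uri.path is nil, replacing a NoMethodError
  # further down with an explicit failure at the assertion point.
  path = T.must(uri.path)
  # After the assertion, Sorbet treats `path` as a String rather than
  # T.nilable(String), so the call below type checks.
  path.split('/')[1]
end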

One feature of Sorbet that proved rather useful was the ability to mark a class's methods as abstract. This will cause the static type checking to fail if a class inheriting the abstract methods does not implement them. There were a couple of logical places to use this feature in the codebase. For example, there is a common base class for fetching social media content that is then specialized for each social media platform using inheritance. Each of the platform-specific subclasses is required to provide certain methods. Marking these methods as abstract makes sure the methods are implemented and helps the developer see what needs to be done in case support for a new platform is required. Declaring these interfaces revealed that some of the subclasses did not implement all of the methods that the base class expected. These methods were never actually called for those implementations, but this is a very error-prone situation that should not exist.
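A minimal sketch of this pattern (the class and method names are hypothetical, not the codebase's actual ones):

# typed: true
require 'sorbet-runtime'

class ContentFetcher
  extend T::Sig
  extend T::Helpers
  abstract!

  # Every platform-specific subclass must implement this method.
  sig { abstract.params(account_id: String).returns(T::Array[String]) }
  def fetch_posts(account_id); end
end

class InstagramFetcher < ContentFetcher
  extend T::Sig

  sig { override.params(account_id: String).returns(T::Array[String]) }
  def fetch_posts(account_id)
    [] # the platform-specific fetching logic would go here
  end
end

# If InstagramFetcher omitted fetch_posts, running the type checker would
# fail with an error along the lines of "Missing definition for abstract
# method ContentFetcher#fetch_posts".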

For two important libraries used in the codebase, Koala3 and the Twitter gem4, there were no ready-made type signature files in the sorbet-typed repository. This prevented changing some rather important files to # typed: true. Fortunately, both of these gems included YARD documentation5 in the source code, which meant that a gem called sord6 could be used to generate the method signatures based on the YARD documentation comments. For both of the gems, the generated signatures required some minor tweaking by hand, caused by, for example, typos in the documentation comments. Overall, the experience was good and sord proved crucial in improving the typedness of the codebase. These generated method signatures will most likely be contributed back to the sorbet-typed repository at a later time.
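For illustration, a hypothetical YARD-documented method (not taken from either gem) and, roughly, the Sorbet signature that sord derives from the tags:

# Input: a method documented with YARD tags, as it might appear in a gem.
class Client
  # @param user_id [String]
  # @return [Array<String>, nil]
  def follower_names(user_id); end
end

# Output: approximately the RBI signature sord generates from those tags.
class Client
  extend T::Sig

  sig { params(user_id: String).returns(T.nilable(T::Array[String])) }
  def follower_names(user_id); end
end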

Sorbet also found a few instances of calls to methods that no longer exist. These were not that critical since those parts of the code are obviously no longer used, as that would already have caused the program to crash in production. It is, however, important to minimize the amount of dead code in the codebase because it reduces the amount of noise and makes spotting the actually relevant parts easier.

3 https://github.com/arsduo/koala
4 https://github.com/borc/twitter
5 https://yardoc.org/
6 https://github.com/AaronC81/sord

In the process of adopting Sorbet, two errors were discovered in the method signatures it provides for the Ruby standard library. One of them was an invalid return type for the Resolv::DNS#getaddresses method and the other was a missing add_certificate method on OpenSSL::SSL::SSLContext. Pull requests fixing these errors were opened on the Sorbet GitHub repository, and one of them has already been merged. A new version of Sorbet is released nightly, so these fixes could be taken advantage of quickly.
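To give an idea of what such a fix involves (a hedged sketch, not the exact upstream RBI), the standard library signatures are plain Ruby sig declarations with empty method bodies, so adding the missing method amounts to something like:

# In an RBI file, only the signatures matter; method bodies are left empty.
class OpenSSL::SSL::SSLContext
  sig do
    params(
      certificate: OpenSSL::X509::Certificate,
      key: OpenSSL::PKey::PKey,
      extra_certs: T.nilable(T::Array[OpenSSL::X509::Certificate])
    ).returns(T.self_type)
  end
  def add_certificate(certificate, key, extra_certs = nil); end
end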

6.2 Online survey results

A total of N = 138 answers were received in the online survey. Because there were a few days between posting the link to Koodiklinikka and posting it to Reddit, it could be observed that most of the answers, approximately a hundred, came from Reddit users.

A vast majority (94.2%) of the participants were professional programmers. Not many beginner programmers answered the survey: 65.2% of the participants rated their programming skill level as "expert", 31.2% as "intermediate", and only 3.6% as "beginner". The distribution of years of programming experience is shown in Figure 6.1.

Figure 6.1. Years of programming experience (distribution over the categories: less than 5 years, 5–10, 11–15, 16–20, and more than 20 years)

As expected, almost everyone was familiar with at least one dynamically typed programming language. JavaScript and Python were the best known programming languages, with most of the participants knowing either or both of them. The programming language statistics are shown in Figure 6.2, where all languages with fewer than five answers are grouped together under "other".

Figure 6.2. What programming languages do you know? (number of answers: JavaScript or TypeScript 115, Python 104, Java 95, C 74, C++ 74, C# 61, PHP 58, Ruby 33, Rust 26, Go 14, Haskell 14, Scala 10, Clojure 6, Perl 5, other 55)

Out of the 138 participants, 110 (79.7%) reported having used static type checkers for dynamically typed languages (referred to as the USE group) and 28 (20.3%) had not used them (the NUSE group). When asked for reasons for not using these tools, about half of the NUSE group said that they either hadn't used a suitable programming language or that there is no such tool available for the language they are using. Other popular reasons included the tools being too difficult or complicated to use or slowing down development. Of the NUSE group, 71.4% said that they would consider using a static type checker in the future. These participants mostly justified their choice by saying that a type checker helps to reduce errors and that type annotations improve code documentation. One participant said that they would use a static type checker if their codebase grew to a size where the type checking would be beneficial.

“They seem like a good way to prevent bugs at compile time and annotate code for clarity.”

“Having type correctness helps to reason about the code more easily.”

The reasons for not considering using a static type checker in the future included it being a "hack", involving too much hassle, and preferring to use an actual statically typed language if there was a need for static type checking.

As stated before, a majority of the participants (110 out of 138) had used static type checkers for dynamically typed languages. The most used tools were TypeScript (86.4%), mypy (31.8%), Flow (16.4%) and Pyright (8.2%), which aligns well with the fact that JavaScript and Python were the most popular languages among the participants. The codebase size statistics given in Table 6.3 show that the codebases where the tools were used are of significant size.

Table 6.3. Codebase size statistics (lines of code)

Average    Median    95th percentile
357 943    50 000    1 000 000

The amount of code, together with the distribution of the number of people working on the codebases (Figure 6.3), suggests that these were quite large, real-world projects. Based on the answers, type checking is often applied to large parts of the application, with 63.6% of the answers saying that at least three quarters of the codebase was type checked (Figure 6.4).

Figure 6.3. Number of people working on the codebase (categories: 1, 2–5, 6–10, 11–20, 21–50, and more than 50)

Figure 6.4. Type checking coverage (categories: 0–25%, 26–50%, 51–75%, 76–100%, and 100%)

A little over half (51.8%) of the USE group thought the size of the codebase does not affect the usefulness of a static type checker. It was thought to be more useful in larger codebases by 40.9% of the respondents, and the rest of the participants did not have experience from multiple projects. None of the participants said type checking would be more useful in a smaller codebase. A common argument for larger projects benefiting more than small ones was that having to keep track of the types mentally becomes harder as the codebase grows (mentioned by roughly 40% of the answers). In addition to reducing cognitive load, helping with refactoring was mentioned in many of the answers.

“The type checker reduces the amount needed to remember. The extra avail- able brain space is then available for use in a larger program.”

“In smaller codebases refactors are usually smaller and it is possible for one developer to know all the functionalities of the software. In large project it is pretty much impossible for one developer to remember every use case for each piece of code, so type checker really helps.”

Overall, the participants' experiences of using a static type checker have been very positive. As seen in Figure 6.5, type checking is considered to improve the development process by the majority of the participants in every area covered by the survey. Development speed is the most controversial area, and even there over half of the participants think that a type checker at least somewhat improves it.

When asked for an overall opinion on the benefits of static type checkers, 78.2% of the participants thought that using one has been very beneficial and 19.1% said it has been somewhat beneficial. Only 2.7% reported that using a type checker has not been beneficial at all.

Figure 6.5. Experiences of using static type checkers (percentage of answers, from "significantly decreased" to "significantly improved", for each area: API usability, code reliability, development speed, codebase maintainability, confidence when writing new code, confidence in refactoring existing code, and working with unfamiliar parts of the code)

Out of the USE group, 64.5% (71 participants) had experience in setting up and adopting a static type checker in an existing project (referred to as the SETUP group). They were asked further questions regarding the work and effort required in the setup as well as questions about their initial expectations. Most participants (56.3%) said that the amount of setup work required was "moderate", 25.4% said "minimal" and 18.3% "significant".

Figure 6.6 compares the participants' expectations with their experiences of the effects of static type checking. In the figure, the left bar (marked with an E) of each area represents the distribution of the expectations and the right bar (marked with an X) represents the experiences. Only the experiences of the 71 participants in the SETUP group are taken into account. As can be seen, the expectations were mostly positive but conservative compared to the experiences. The experiences have significantly higher percentages of "significantly improved" answers in many of the areas.

Figure 6.6. Expectations (E) of static type checking effects compared to experiences (X) of using them (same areas and scale as in Figure 6.5)

A majority of the answers (56.3%) reported that the participants' expectations of the static type checking effects were completely met, 39.4% said they were somewhat met, and only 3 answers (4.2%) said that the expectations were not met at all. These numbers seem to correspond with the comparison of the expectations to the experiences in Figure 6.6. The reasons for not being completely satisfied with the results included technical limitations, such as the type checkers not supporting all language features or reporting false errors, and workflow-related issues, like difficulties in setting the tools up or stubborn developers purposefully working around the type checker.

“MyPy is rough, and Python isn’t designed for it. The language has features it doesn’t support well and libraries take advantage of them.”

"Introducing types to highly untyped project will cause problems, always, expect them. It will decrease productivity, it will take an enormous effort."

Almost everyone (90.1%) in the SETUP group thought that the effects of using a static type checker justify the effort required to set it up and adopt the type system. Many argued that the setup process is easy and that using these tools should therefore be a trivial choice. Other arguments included improved maintainability, development speed and confidence gains, and surfacing bugs early.

“You’d have to be an idiot to not use such a simple tool that can eliminate such a large proportion of bugs.”

6.3 Summary of results and discussion

Overall, both parts of the study are considered successful, and they produced good data on the subject. Especially the online survey proved to be a good source of information. There were some unexpected problems with the exploratory testing part, but it still provided valuable insights into using third-party static type checkers.

The exploratory testing part was mostly successful. The experience with Sorbet was positive, and the tool shows a lot of promise for wider adoption in the future. It was easy to set up, and there were really no drawbacks to using it. Being a gradual type checker, it can by definition be enabled in only certain parts of the code if necessary. This means that parts that require more effort before they can take advantage of static type checking can be handled later and do not block the adoption process, which contributes to the ease of setup. The results of the exploratory testing regarding the adoption and setup support the corresponding results from the survey: the majority of the survey participants reported that the setup effort was not significant and that this effort was justified by the benefits the tools provide.

As stated before, most of the errors found by Sorbet were related to variables possibly being nil. These errors can sometimes be hard to spot, so it is good to have a tool that can detect them. Some of the reported errors might have been false positives in the sense that there might be a nil check before the method call that Sorbet does not yet know about. Nevertheless, it is better to handle these cases as close to the failure point as possible, since the code constantly evolves and new call sites may appear where the check is forgotten. In addition to possibly nil variables, Sorbet also found calls to undefined methods. Abstract classes and methods also proved to be useful in describing common interfaces.

Unfortunately, the results of the exploratory study were not as good as the author had hoped for. This was caused by the inability to actually get Sorbet into production as well as the aforementioned problems related to the ActiveRecord models and the sorbet-rails gem. Because the Rails version was too old, it was not possible to put Sorbet into the production branch just yet. This meant that the benefits in day-to-day development could not be assessed during this study. The dynamic nature of ActiveRecord models meant that static checks could not be enabled for those files. The sorbet-rails gem, which is designed to help with this exact problem, did not work with the codebase under study. The model files contain a lot of the business logic of the application, meaning that a big portion of the application is still lacking static type checking. Had these files been type checked, the number of bugs found would probably have been noticeably higher.

The online survey had a decent turnout with 138 participants representing varying skill levels and experience. The reported codebase sizes were significant compared to the ones used in most of the previous studies reviewed in Chapter 4. Overall, the survey data shows that programmers' experiences of using static type checkers are positive. It is interesting to see that even though the participants' expectations of the static type checkers were mostly positive, the actual experiences surpassed those expectations.

Static type checkers were reported to improve all of the surveyed parts of the development process. According to the results, they seem to be most beneficial in improving confidence in refactoring existing code as well as writing new code, and in improving codebase maintainability as a whole. The responses seem to support the conclusions of the previous controlled studies: code reliability is improved as in studies [7, 17], development speed is increased as in [2, 13], and codebase maintainability and usability got better, as studies [4, 10, 11, 27] also found. Development speed was the most controversial area in the survey, and it was also the only area where static typing "lost" in the reviewed previous studies. These results suggest that the results from the controlled studies translate well into real-world development. As for the reasons why static type checking seems to help in the development process: quite a few of the respondents reported that the assistance of a static type checker helps to reduce the cognitive load on the programmer.

A flaw in the survey is that static type checking, improved type safety and having additional type annotations get somewhat mixed up in it. Using these tools usually involves all of these factors, which makes it impossible to separate how each of them contributes to the overall effects. For example, it is not possible to determine whether the improvements in API usability are actually due to static type checking or just due to having more explicit type annotations. On the other hand, it would be quite difficult for a participant to separate these factors and judge them individually. Attributing the results to static type checking alone is therefore somewhat flawed.

Further research on real-world codebases and developer experiences is needed to strengthen the results of the existing studies. Conducting more surveys with a better focus on different aspects of these gradual type checking tools could provide better insights into their effects on the software development process. The tools could also be studied in a controlled environment to get more reliable data. The problem with controlled experiments, however, is that it is difficult to use really large codebases in them.

7. CONCLUSION

This thesis studied the practicality of using static type checkers designed for dynamically typed languages. Data on the subject was gathered through an online survey and exploratory testing performed by the author. In the online survey, developers were asked about their opinions on static type checkers for dynamic languages. In the exploratory part, the author evaluated setting up and using a static type checker in a large Ruby codebase. The data from the two parts were compared with each other and with the results of previous studies on the differences between type systems. At the beginning, the following research questions were established:

1. Are there perceived benefits in using a static type checker for a dynamically typed language?
2. What are the possible downsides of using one, and do the benefits outweigh them?

The data from the online survey (N = 138) shows that the majority of the developers find using static type checkers for dynamically typed languages beneficial. The static type checker was reported to be beneficial by the majority of the participants in all the areas of development surveyed. Development speed was the area with the most controversy among the responses. There are still some rough edges with the tools, but those seem to be quite minor. Almost everyone who had experience in adopting these kinds of tools said that the benefits of using them justify the work needed to set them up.

The author's own experiences in using Sorbet, a static type checker for Ruby, support the results of the online survey. Setting up Sorbet in a large codebase was rather trivial and did not require a huge amount of time. Even with a relatively short time spent on converting files to static type checking, it was possible to reveal potential errors that had previously gone unnoticed. There were some problems related to additional Sorbet tooling that would be required to take full advantage of it in the codebase. This resulted in fewer findings than the author had hoped for.

The results obtained from this study are in line with many of the previously conducted controlled studies by other authors. This suggests that the results of those studies could also apply to larger, real-world applications. Further research could include more comprehensive surveys and controlled studies with static type checkers for dynamically typed languages.

For the first research question, the answer is that, yes, there are indeed perceived benefits to using a static type checker. For the second question, it can be said that there are some downsides to using them, such as the additional work required to adopt the tools. However, all the data suggests that these downsides are minor compared to the gained benefits.

Based on the results of the study, it seems that there is little to no reason not to use a static type checker tool. The fact that many of the tools can be enabled incrementally means that there is not necessarily a big upfront time investment required. Especially in a larger codebase with multiple people working on it, the benefits should greatly outweigh the possible drawbacks.

REFERENCES

[1] Daly, M. T., Sazawal, V. and Foster, J. S. Work In Progress: an Empirical Study of Static Typing in Ruby. (2009). Citeseer.
[2] Delorey, D. P., Knutson, C. D. and Chun, S. Do Programming Languages Affect Productivity? A Case Study Using Data from Open Source Projects. First International Workshop on Emerging Trends in FLOSS Research and Development (FLOSS'07: ICSE Workshops 2007). 2007, pp. 8–8.
[3] DRuby - Home. URL: http://www.cs.umd.edu/projects/PL/druby/ (visited on 07/09/2020).
[4] Endrikat, S., Hanenberg, S., Robbes, R. and Stefik, A. How do API documentation and static typing affect API usability? Proceedings of the 36th International Conference on Software Engineering. ICSE 2014. New York, NY, USA: Association for Computing Machinery, 2014, pp. 632–642.
[5] Furr, M., An, J.-h., Foster, J. S. and Hicks, M. Static type inference for Ruby. Proceedings of the 2009 ACM symposium on Applied Computing. SAC '09. Honolulu, Hawaii: Association for Computing Machinery, 2009, pp. 1859–1866.
[6] Gabbrielli, M. and Martini, S. Structuring Data. Programming Languages: Principles and Paradigms. Ed. by M. Gabbrielli and S. Martini. Undergraduate Topics in Computer Science. London: Springer, 2010, pp. 197–263.
[7] Gannon, J. D. An experimental evaluation of data type conventions. Communications of the ACM 20.8 (1977), pp. 584–595.
[8] Gao, Z., Bird, C. and Barr, E. T. To Type or Not to Type: Quantifying Detectable Bugs in JavaScript. 2017 IEEE/ACM 39th International Conference on Software Engineering (ICSE). 2017, pp. 758–769.
[9] Hanenberg, S. An experiment about static and dynamic type systems: doubts about the positive impact of static type systems on development time. ACM SIGPLAN Notices 45.10 (2010), pp. 22–35.
[10] Hanenberg, S., Kleinschmager, S., Robbes, R., Tanter, É. and Stefik, A. An empirical study on the impact of static typing on software maintainability. Empirical Software Engineering 19.5 (2014), pp. 1335–1382.
[11] Harlin, I. R., Washizaki, H. and Fukazawa, Y. Impact of Using a Static-type System in Computer Programming. Journal of Software 12.8 (2017), pp. 598–611.

[12] Idris: A Language for Type-Driven Development. URL: https://www.idris-lang.org/index.html (visited on 10/01/2020).
[13] Kleinschmager, S. Can static type systems speed up programming? An experimental evaluation of static and dynamic type systems. Hamburg, Germany: Diplomica Verlag, 2012.
[14] Mayer, C., Hanenberg, S., Robbes, R., Tanter, É. and Stefik, A. An empirical study of the influence of static type systems on the usability of undocumented software. Proceedings of the ACM international conference on Object oriented programming systems languages and applications. OOPSLA '12. Tucson, Arizona, USA: Association for Computing Machinery, 2012, pp. 683–702.
[15] Pierce, B. C. Types and Programming Languages. Cambridge, Massachusetts: MIT Press, 2002.
[16] Prechelt, L. An empirical comparison of seven programming languages. Computer 33.10 (2000), pp. 23–29.
[17] Prechelt, L. and Tichy, W. F. A controlled experiment to assess the benefits of procedure argument type checking. IEEE Transactions on Software Engineering 24.4 (1998), pp. 302–312.
[18] Ray, B., Posnett, D., Devanbu, P. and Filkov, V. A large-scale study of programming languages and code quality in GitHub. Communications of the ACM 60.10 (2017), pp. 91–100.
[19] Ray, B., Posnett, D., Filkov, V. and Devanbu, P. A large scale study of programming languages and code quality in github. Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering. FSE 2014. New York, NY, USA: Association for Computing Machinery, 2014, pp. 155–165.
[20] Ren, B. M., Toman, J., Strickland, T. S. and Foster, J. S. The ruby type checker. Proceedings of the 28th Annual ACM Symposium on Applied Computing. SAC '13. Coimbra, Portugal: Association for Computing Machinery, 2013, pp. 1565–1572.
[21] Ruby 3.0.0 Released. URL: https://www.ruby-lang.org/en/news/2020/12/25/ruby-3-0-0-released/ (visited on 03/17/2021).
[22] Rust Programming Language. URL: https://www.rust-lang.org/ (visited on 10/08/2020).
[23] Sagonas, K. and Luna, D. Gradual typing of erlang programs: a wrangler experience. Proceedings of the 7th ACM SIGPLAN workshop on ERLANG. ERLANG '08. New York, NY, USA: Association for Computing Machinery, 2008, pp. 73–82.
[24] Siek, J. What is Gradual Typing. 2014. URL: https://wphomes.soic.indiana.edu/jsiek/what-is-gradual-typing/ (visited on 03/21/2021).
[25] Sorbet · A static type checker for Ruby. URL: https://sorbet.org/ (visited on 07/16/2020).

[26] Souza, C. and Figueiredo, E. How do programmers use optional typing? An empirical study. Proceedings of the 13th international conference on Modularity. MODULARITY '14. Lugano, Switzerland: Association for Computing Machinery, 2014, pp. 109–120.
[27] Spiza, S. and Hanenberg, S. Type names without static type checking already improve the usability of APIs (as long as the type names are correct): an empirical study. Proceedings of the 13th international conference on Modularity. MODULARITY '14. Lugano, Switzerland: Association for Computing Machinery, 2014, pp. 99–108.
[28] Stack Overflow Developer Survey 2020. Stack Overflow. URL: https://insights.stackoverflow.com/survey/2020 (visited on 10/29/2020).
[29] Static v. dynamic languages literature review. URL: https://danluu.com/empirical-pl/ (visited on 10/01/2020).
[30] Stuchlik, A. and Hanenberg, S. Static vs. dynamic type systems: an empirical study about the relationship between type casts and development time. Proceedings of the 7th symposium on Dynamic languages. DLS '11. Portland, Oregon, USA: Association for Computing Machinery, 2011, pp. 97–106.
[31] Tracking Adoption with Metrics · Sorbet. URL: https://sorbet.org/ (visited on 04/18/2021).
[32] Tratt, L. and Wuyts, R. Guest Editors' Introduction: Dynamically Typed Languages. IEEE Software 24.5 (2007), pp. 28–30.
[33] typing — Support for type hints — Python 3.9.2 documentation. URL: https://docs.python.org/3/library/typing.html (visited on 03/13/2021).

APPENDIX A: STATIC TYPE CHECKER SURVEY

This survey is part of a Master's Thesis where I am investigating the benefits and drawbacks of using static type checking tools designed for languages that weren't originally designed with static type checking. Examples of such tools include Flow for JavaScript, Sorbet for Ruby and mypy for Python. (TypeScript is also included in this category, even if it can be considered a separate language, because it can be used in the same way as the other tools.)

Filling out the survey will take about 6-10 minutes and I encourage you to answer even if you haven’t used these tools before.

All responses are anonymous, no personal information is collected and individual responses cannot be distinguished from the results.

Thank you for your time!

Background information

How would you characterize your programming skill level?

⊚ Expert
⊚ Intermediate
⊚ Beginner

How many years of programming experience do you have?

⊚ Less than 5 years
⊚ 5–10 years
⊚ 11–15 years
⊚ 16–20 years
⊚ More than 20 years


Do you program professionally?

⊚ Yes
⊚ No

Which programming languages do you know?

□ JavaScript or TypeScript
□ Python
□ PHP
□ Ruby
□ C
□ C++
□ Java
□ C#
□ Other: ______

Have you used static type checkers for dynamically typed languages? E.g. TypeScript, Flow, Sorbet

⊚ Yes (jump to "Using type checkers")
⊚ No (jump to "Haven't used static type checkers")

Haven’t used static type checkers

Why haven’t you used a static type checker?

□ I haven't used a suitable programming language
□ There is no such tool available for my language of choice
□ I don't think they are useful
□ I think using them is too difficult
□ I think using them slows down development
□ Other: ______

Which type checkers have you heard of?

□ None
□ TypeScript
□ Flow
□ Sorbet
□ Steep
□ mypy
□ Pyright
□ Phan
□ Dialyzer
□ Other: ______

Would you consider using a static type checker in the future?

⊚ Yes
⊚ No

Why? Why not?

End of survey

Using type checkers

Which type checkers have you used?

□ TypeScript
□ Flow
□ Sorbet
□ Steep
□ mypy
□ Pyright
□ Phan
□ Dialyzer
□ Other: ______

What is the largest codebase you have used a static type checker in? Please estimate the LoC

What percentage of the codebase was type checked? Please choose a range or input a more accurate value if available

⊚ 0–25
⊚ 26–50
⊚ 51–75
⊚ 76–100
⊚ Other: ______

How many people were there working on the codebase?

⊚ 1
⊚ 2–5
⊚ 6–10
⊚ 11–20
⊚ 21–50
⊚ More than 50

In your experience, how has using a static type checker affected the following areas? 1 = Significantly decreased, 2 = Somewhat decreased, 3 = No effect, 4 = Somewhat improved, 5 = Significantly improved

                                            1   2   3   4   5
Code reliability                            ⊚   ⊚   ⊚   ⊚   ⊚
Development speed                           ⊚   ⊚   ⊚   ⊚   ⊚
Confidence when writing new code            ⊚   ⊚   ⊚   ⊚   ⊚
Confidence in refactoring existing code     ⊚   ⊚   ⊚   ⊚   ⊚
API usability                               ⊚   ⊚   ⊚   ⊚   ⊚
Codebase maintainability                    ⊚   ⊚   ⊚   ⊚   ⊚
Working with unfamiliar parts of the code   ⊚   ⊚   ⊚   ⊚   ⊚

Overall, how beneficial has using a static type checker been in your opinion?

⊚ Not at all
⊚ Somewhat
⊚ Very beneficial

If you have used static type checkers in multiple projects, do you think the size of the codebase affects the usefulness of a static type checker?

⊚ It is more useful in a larger codebase
⊚ The size of the codebase doesn't matter
⊚ It is more useful in a smaller codebase
⊚ No experience from multiple projects

Why do you think that is?

Do you have experience in setting up a type checker in an existing codebase?

⊚ Yes (move on to "Adopting static type checking")
⊚ No (End of survey)

Adopting static type checking

This section aims to collect information on the process of adopting static type checking in an existing project. Please consider both the actual installation and setup of the tool as well as the work that needs to be put in afterwards to see any results.

What did the setup of the type checker consist of in your case?

How would you rate the amount of work required to introduce static type checking?

⊚ Minimal
⊚ Moderate
⊚ Significant

How did you expect the type checker to affect the following areas? 1 = Significantly decrease, 2 = Somewhat decrease, 3 = No effect, 4 = Somewhat improve, 5 = Significantly improve

                                            1   2   3   4   5
Code reliability                            ⊚   ⊚   ⊚   ⊚   ⊚
Development speed                           ⊚   ⊚   ⊚   ⊚   ⊚
Confidence when writing new code            ⊚   ⊚   ⊚   ⊚   ⊚
Confidence in refactoring existing code     ⊚   ⊚   ⊚   ⊚   ⊚
API usability                               ⊚   ⊚   ⊚   ⊚   ⊚
Codebase maintainability                    ⊚   ⊚   ⊚   ⊚   ⊚
Working with unfamiliar parts of the code   ⊚   ⊚   ⊚   ⊚   ⊚

How well were your expectations met?

⊚ Not at all
⊚ Somewhat
⊚ Completely

Why? Why not?

Do you think the effects of using a static type checker justify the effort put into adopting it?

⊚ Yes
⊚ No

Why? Why not?

End of survey