Automated Techniques for Creation and Maintenance of Typescript Declaration Files
Total Page:16
File Type:pdf, Size:1020Kb
Automated Techniques for Creation and Maintenance of TypeScript Declaration Files Erik Krogh Kristensen PhD Dissertation Department of Computer Science Aarhus University Denmark Automated Techniques for Creation and Maintenance of TypeScript Declaration Files A Dissertation Presented to the Faculty of Science and Technology of Aarhus University in Partial Fulfillment of the Requirements for the PhD Degree by Erik Krogh Kristensen August 8, 2019 Abstract JavaScript was initially a scripting language for creating small interactive web pages. However, massive applications, both web pages and server ap- plications, are increasingly being developed using JavaScript. The dynamic nature of the JavaScript language makes it hard to create proper tooling with features such as auto-completion and code-navigation. TypeScript is a JavaScript superset that adds, on top of JavaScript, an optional static type system, which facilitates features such as auto-completion, code navigation, and detection of type errors. However, many TypeScript applications still use untyped libraries written in JavaScript. Developers or users of these JavaScript libraries can choose to write TypeScript declaration files, which provide API models of the libraries and are used to type-check TypeScript applications. These declaration files are, however, written manually and of- ten not by the original authors of the library, which leads to mistakes that can misguide TypeScript application developers and ultimately cause bugs. The goal of this thesis is to design automated techniques for assisting in the development of TypeScript declaration files. This thesis identifies several challenges faced by developers of TypeScript declaration files and tackles these challenges using techniques from programming language re- search. Type inference is used to create new, and update existing, decla- ration files. Automated testing is used to detect errors in declaration files. Finally, data-flow analysis and a novel concept of reasonably most-general- clients are used to verify the absence of errors in declaration files. Each of the techniques is used to improve the quality of real-world declaration files. i Resumé JavaScript startede som et scriptingsprog til at lave små interaktive hjemmesider. Dog udvikles større applikationer, både hjemmesider og andre applikationer, i stigende grad ved brug af JavaScript. JavaScript- sprogets dynamiske karakter gør det vanskeligt at skabe fornuftige udviklingsmiljøer med funktioner som automatisk kode-fuldendelse og kode-navigation. TypeScript er et superset af JavaScript, der udover JavaScript tilføjer et valgfrit system af statiske typer, der muliggør funktioner som automatisk kode-fuldendelse, kode-navigation og detektion af typefejl. Mange TypeScript-applikationer bruger dog stadig ikke-typede biblioteker skrevet i JavaScript. Udviklere eller brugere af disse JavaScript-biblioteker kan vælge at skrive TypeScript-deklarationsfiler, der leverer API-modeller af bibliotekerne og bruges til at kontrollere typerne i TypeScript-applikationer. Disse deklarationsfiler er dog skrevet manuelt og ofte ikke af de originale forfattere af biblioteket, hvilket kan medføre fejltagelser, der kan vildlede TypeScript-applikationsudviklere og i sidste ende forårsage fejl. Målet med denne afhandling er at designe automatiserede teknikker til at assistere i udviklingen af TypeScript-deklarationsfiler. Denne afhandling identificerer flere udfordringer, som udviklere af TypeScript- deklarationsfiler står overfor, og håndterer disse udfordringer ved hjælp af teknikker fra programmeringssprogforskning. Type inferens bruges til at oprette nye og opdatere eksisterende deklarationsfiler. Automatisk testning bruges til at opdage fejl i deklarationsfiler. Endeligt bruges datastrømningsanalyse og et nyt koncept af rimelige-mest-generelle klienter til at verificere fraværet af fejl i deklarationsfiler. Hver af teknikkerne bruges til at forbedre kvaliteten af faktisk benyttede deklarationsfiler. iii Acknowledgments I would like to thank the following people for their support and guidance during my Ph.D. studies. Anders Møller for being an attentive and hard-working supervisor, for helping me navigate the world of academia, for letting me run with my ideas, and for significantly improving the quality of my writing. Esben An- dreasen for all the excellent office discussions we had. Gianluca Mezzetti for being a coding guru without whom this project would have taken longer and been worse. Søren Eller Thomsen for dragging me to the coffee machine for a refill and otherwise interrupting my work. The rest of the Programming Languages research group and various people from Aarhus University for providing an excellent work and study environment with room for learn- ing, discussions, and beer. Eric Bodden and his research group for the great time I had while visiting the Heinz Nixdorf Institut. Finally, my family for supporting me despite not being sure what I really do. Erik Krogh Kristensen, Aarhus, August 8, 2019. v Contents Abstract i Resumé iii Acknowledgments v Contents vii I Overview 1 1 Introduction 3 1.1 Challenges . 7 1.2 Thesis statement . 8 1.3 Contributions . 8 1.4 Outline . 9 2 JavaScript and TypeScript 11 2.1 JavaScript . 11 2.2 TypeScript . 19 2.3 TypeScript Declaration Files . 30 2.4 Errors in TypeScript Declaration Files . 31 3 Automated Program Analysis 35 3.1 Type Analysis . 35 3.2 Unification-Based Type Inference . 36 3.3 Subtyping-Based Type Inference . 40 3.4 Data-Flow Analysis . 44 3.5 Dynamic Analysis . 50 4 Conclusion 55 4.1 Future work . 57 vii viii CONTENTS II Publications 59 5 Inference and Evolution of TypeScript Declaration Files 61 5.1 Abstract . 61 5.2 Introduction . 61 5.3 Motivating Examples . 63 5.4 TSinfer: Inference of Initial Type Declarations . 66 5.5 TSevolve: Evolution of Type Declarations . 69 5.6 Implementation . 69 5.7 Experimental Evaluation . 73 5.8 Related Work . 82 5.9 Conclusion . 82 6 Type Test Scripts for TypeScript Testing 83 6.1 Abstract . 83 6.2 Introduction . 84 6.3 Motivating Examples . 86 6.4 Basic Approach . 89 6.5 Challenges and Design Choices . 93 6.6 Soundness and (Conditional) Completeness . 102 6.7 Experimental Evaluation . 104 6.8 Related Work . 114 6.9 Conclusion . 116 6.10 Appendix . 117 7 Reasonably-Most-General Clients for JavaScript Library Analysis 119 7.1 Abstract . 119 7.2 Introduction . 119 7.3 Motivating example . 122 7.4 Reasonably-Most-General Clients . 125 7.5 Abstract Domains in Static Type Analysis . 130 7.6 Using RMGCs in Static Type Analysis . 130 7.7 Evaluation . 136 7.8 Evaluation on larger benchmarks . 142 7.9 Related Work . 144 7.10 Conclusion . 145 Bibliography 147 Part I Overview 1 Chapter 1 Introduction Optionally typed languages, which mix static and dynamic typing, are in- creasingly being used by developers, with languages such as TypeScript, Python, and Groovy seeing an increase in contributors within a year on GitHub of 90%, 50%, and 40% respectively [4]. Many developers prefer to use TypeScript or Python with both being in the top 3 most loved languages according to Stack Overflow [7]. These optionally typed languages allow programmers to mix dynami- cally typed code, where type checking is deferred until runtime, with stati- cally typed code, which can be type-checked before the program is executed. Some of the patterns used in dynamic programming languages can be very hard to express in statically typed languages [30], however, the tool support for these dynamic languages often lacks compared to statically typed lan- guages. By mixing dynamic and static types, an optionally typed language gives programmers the option of using the programming style that fits the best when solving a problem. Developers writing typed applications in optionally typed languages of- ten want to use existing untyped libraries, and in languages such as Type- Script or Python typed API models describing the behavior of the untyped libraries are often available. These API models facilitate type checking of typed applications that use the untyped libraries, and the API models en- able precise auto-completion and code-navigation in IDEs. However, the API models are ignored at runtime, where the untyped library is executed without any regard for the types in the API models. For TypeScript and Python, these API models are called declaration files and stub files, respec- tively. Both languages have official repositories containing API models for a large collection of libraries.1 1For Python this collection is called typeshed [15], and for TypeScript the collection is called DefinitelyTyped [14]. 3 4 CHAPTER 1. INTRODUCTION Some optionally typed languages are also gradually typed. A gradually typed language adds, on top of being optionally typed, type checks at the boundaries between statically and dynamically typed code [102, 103]. These type checks detect at runtime whether the value of a variable matches the type annotation for the variable. Programmers can, due to the type checks, trust type annotations on variables to be correct at runtime, and that no type errors will happen inside fully annotated code where dynamic types are not used. Most optionally typed languages, including all previously mentioned, are not gradually typed, as these languages do not insert type checks on all boundaries between statically and dynamically typed code. What sets a gradually typed language apart from languages such as Groovy and Dart, which has type checks at variable assignments, is that a gradually typed language can track blame on higher-order functions,