
Höllische Programmiersprachen, Master's Seminar, Winter Semester 2014/2015: Optional Types

Michael Lux, Technische Universität München, supervised by Ralf Vogler, 18.12.2014

Abstract

This work is about optional typing, an approach that brings advantages of static typing, such as high efficiency and early error detection, to dynamically typed programming languages without losing the advantages and dynamics of the latter. As an example, it will be shown how recent JavaScript development tools make use of optional typing to realize those advantages in practice.

1 Introduction

In the long history of software development, many programming languages have become established. As many of them target specific application fields, whereas others are meant to be as generic as possible, there is a broad variety of classes of programming languages today: imperative, declarative, object-oriented, and functional languages, to name just a few. One criterion for telling different kinds of programming languages apart, maybe one of the most obvious ones, is the so-called type system of a programming language.

A type system is a syntactic method for automatically checking the absence of certain erroneous behaviors by classifying program phrases according to the kinds of values they compute [12]. There are many different type systems, but ultimately all programming languages can be roughly divided into dynamically and statically typed languages [14]. Both kinds of type systems have their own advantages and disadvantages in terms of usability, type safety, performance, and many other aspects.

This work is about the use of optional typing, an approach that brings some advantages of static typing to dynamically typed programming languages by offering the opportunity to define types statically (either via new language constructs or via hints expressed with regular features of the programming language) within the code whenever this is desired. Optional typing can be seen as an umbrella term for a number of related concepts such as gradual typing, soft typing, and pluggable typing [14]. It is difficult to give an exact definition of optional typing, as the whole topic is still an active area of research [14].

In this work, the advantages and disadvantages of dynamic type systems compared to static type systems are briefly explained first. Secondly, two different approaches to introducing optional (static) types in JavaScript are presented, namely asm.js [5], a JavaScript subset for type hinting, and Flow [4], a meta-language for optional typing. Then, the focus is on two important advantages that can be achieved using optional types: performance improvements (or, more exactly, improved runtime efficiency) and early detection of errors (i.e. type errors) prior to execution. Finally, the work concludes with some considerations about the usefulness of optional typing for application development, using JavaScript as the example.

2 (Dis)Advantages of Dynamic Type Systems

The difference between statically and dynamically typed languages is that the former can check the correctness of the types of variables and parameters at compile time, whereas the latter can only check those types at runtime [14]. Both statically and dynamically typed programming languages have a long, rich history. The first programming language considered a high-level programming language was FORTRAN, released in 1957 [10]. FORTRAN is, as might be intuitively assumed, a statically typed language. However, the first dynamically typed language, LISP, was released only shortly afterwards, in 1958 [9]. This shows that even at a time when computational power was orders of magnitude lower than that of today's computers, there were already good reasons to use dynamic typing despite its drawbacks in terms of performance.

2.1 Advantages

The first very obvious advantage of dynamic typing is the shallower learning curve for programming beginners. While advanced programmers are familiar with typing and related concepts, it can be quite confusing for beginners to even tell floating-point numbers and integers apart, not to mention type conversion, the representation of boolean values or strings, and so on.

Another point is the vast flexibility of dynamic typing. Imagine a function that sums up an array of values. In a statically typed language without generics, one has to define one function for each data type, e.g. one for arrays of integer values, another one for arrays of float values, and so on. In a dynamically typed language, the function has to be written only once and can be used with any type of data for which it semantically makes sense. The same applies to so-called duck typing [2], where objects with the same set of visible methods and attributes can be used interchangeably, no matter which class they have been derived from.

This leads to the finding that dynamic typing also enables, in general, faster development and efficient (i.e. fast) prototyping. With static typing, programmers always have to consider which type of data is best suited for the specific application, and set up something that might be called a type contract. With dynamic typing, on the other hand, programmers can just write down the code without thinking about types, and later adapt relevant portions of the code as required.
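As a rough illustration of the duck typing idea, the following JavaScript sketch calls the same function on unrelated objects. The names Duck, Robot and makeItSpeak are hypothetical and chosen only for this example.

```javascript
// Two unrelated constructors; the names are hypothetical examples.
function Duck() {}
Duck.prototype.speak = function () { return "Quack"; };

function Robot() {}
Robot.prototype.speak = function () { return "Beep"; };

// Works with any object that offers a speak() method,
// regardless of the constructor it was created with.
function makeItSpeak(thing) {
    return thing.speak();
}

makeItSpeak(new Duck());                              // "Quack"
makeItSpeak(new Robot());                             // "Beep"
makeItSpeak({ speak: function () { return "Hi"; } }); // "Hi"
```

No common base class or interface declaration is needed; the only requirement is that the method exists when the call is executed.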

2.2 Disadvantages

The most obvious disadvantage of dynamic typing is a loss of efficiency. In statically typed languages, compilers can exploit the type information to produce more efficient code [1] and don't have to insert runtime type checks into the code. This might not be an essential issue for many applications, but it can become a problem when efficiency matters, for example in scientific computing. In section 4, we will explain how optional typing can help to alleviate this issue.

As dynamic type systems don't enforce types at compile time, misinterpreting the meaning of a function or accidentally passing parameters of the wrong type is also a severe issue that may result in unexpected behavior. This gets much worse when the programming language is weakly typed, i.e. performs a lot of automatic type conversion based on assumptions about the most likely meaning of an expression. Consider the piece of JavaScript code given in Figure 1.

    function sum(array) {
        var sum = 0;
        // "var" keeps the loop counter local instead of creating an implicit global
        for (var i = 0; i < array.length; i++) {
            sum += array[i];
        }
        return sum;
    }

    sum([1, 2, 3]);   // result: 6
    sum(["1", 2, 3]); // result: "0123"

Figure 1: sum function with invocations

The first call to sum() with integer parameters yields the (most likely) expected result, whereas the second one might not be what the programmer expected. This happens because the + operator in JavaScript is used ambiguously for arithmetic addition as well as string concatenation. Such errors may be fixed with stronger type systems. In Python, for instance, such code would raise a TypeError, as all numeric values must be explicitly converted to strings before being concatenated there. Of course, this fixes only half of the problem, as errors still can't be detected before execution. Moreover, there are much more complex errors of this kind, like returning null in PHP or undefined in JavaScript to indicate an error, whereas the programmer expects an object and doesn't consider possible error states. It will also be shown how optional typing can help with this issue.

Furthermore, as Figure 1 already suggests, typing errors in dynamically typed languages can be very hard to track down. Technically, the second call to the sum() function returns a perfectly valid value. This value may be processed further at different places and finally cause either a wrong output or a crash at some other point in time, where the root cause of the failure is anything but obvious. One worst-case scenario is shown in Figure 2, where the result is even silently forced back to the originally expected type. Such invocations mask the error, making it almost impossible to detect the cause of wrong results.
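The "missing object" kind of error can be sketched in JavaScript as follows. The lookup table users and the functions findUser and userAge are hypothetical; the point is only that the failure surfaces at runtime, when the property access is actually executed.

```javascript
// Hypothetical lookup table and helpers, used only for illustration.
var users = { alice: { name: "Alice", age: 30 } };

// Returns the user object, or null to signal "not found".
function findUser(id) {
    return Object.prototype.hasOwnProperty.call(users, id) ? users[id] : null;
}

// The caller assumes that an object always comes back.
function userAge(id) {
    return findUser(id).age;
}

userAge("alice"); // 30

var caught = null;
try {
    userAge("bob"); // findUser returns null here
} catch (e) {
    caught = e;     // a TypeError, raised only when this line runs
}
```

Nothing in the source text of userAge marks the possible null value, so the mistake stays invisible until a lookup fails.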

    sum(["1", 2, 3]) - 3; // result: 120

Figure 2: really bad invocation of sum
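One way to keep such an error from being masked, short of static typing, is to check types at runtime and fail immediately. The following sketch is an assumption for illustration and not part of the original example; sumStrict is a hypothetical name. It rejects non-numeric elements instead of silently concatenating strings.

```javascript
// Hypothetical fail-fast variant of the sum function from Figure 1.
function sumStrict(array) {
    var total = 0;
    for (var i = 0; i < array.length; i++) {
        if (typeof array[i] !== "number") {
            throw new TypeError("element " + i + " is not a number: " + typeof array[i]);
        }
        total += array[i];
    }
    return total;
}

sumStrict([1, 2, 3]); // 6

var message = null;
try {
    sumStrict(["1", 2, 3]);
} catch (e) {
    message = e.message; // "element 0 is not a number: string"
}
```

The check localizes the failure, but it still only runs at execution time and costs extra work on every call, which is the gap that optional typing aims to close.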

3 Optional Typing in Practice

In the past, there has been a lot of research on how to utilize the benefits of static typing in dynamically typed languages, but without integration of these approaches into any mainstream programming language [14]. Over the past few years, however, approaches have begun to emerge that try to adopt this idea in practice. This section takes a look at two of them: asm.js [5], which was developed by Mozilla, and Flow [4], which was developed by Facebook.

3.1 Optional Typing in asm.js

Asm.js exploits the ECMAScript definitions to enforce certain native types on variables and parameters using standard JavaScript operators. For instance, ECMA-262 defines the bitwise OR operation to be performed on 32-bit integer values, which means that an expression like x|0 corresponds to the 32-bit integer value of x. As a subset of JavaScript, asm.js maintains full compatibility with existing JavaScript interpreters according to the ECMAScript standard [6]. A very simple asm.js script is shown in Figure 3. Here, the bitwise OR operation with the neutral element 0 is used to tell an asm.js-aware JavaScript interpreter that the inputs as well as the return value of the function have to be 32-bit integer values.

    function compiledCalculation() {
        var x = f() | 0;    // x is a 32-bit value
        var y = g() | 0;    // so is y
        return (x + y) | 0; // 32-bit addition, no type or overflow checks
    }

Figure 3: asm.js example code
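To make the coercion idiom more concrete, the following is a minimal sketch of a complete asm.js-style module. The module name IntMath and the function add are hypothetical; the "use asm" directive and the three module parameters (stdlib, foreign, heap) follow the asm.js draft [5]. An engine without asm.js support simply runs the module as ordinary JavaScript with the same results.

```javascript
// Hypothetical asm.js-style module; only the |0 idiom is taken from Figure 3.
function IntMath(stdlib, foreign, heap) {
    "use asm";

    function add(a, b) {
        a = a | 0;          // parameter annotation: a is a 32-bit integer
        b = b | 0;          // parameter annotation: b is a 32-bit integer
        return (a + b) | 0; // result is truncated to a 32-bit integer
    }

    return { add: add };
}

// Instantiate the module; the foreign object and the heap are unused here.
var intMath = IntMath(globalThis, {}, new ArrayBuffer(0x10000));

intMath.add(2, 3);          // 5
intMath.add(2147483647, 1); // -2147483648, since the sum wraps around at 32 bits
```

The second call shows that the annotation is not a mere comment: the |0 coercion changes the result whenever a value leaves the 32-bit integer range.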

This ECMAScript-compatible notation comes at a cost: to take advantage of the speed improvements, programmers must provide consistent, correct types for both input values and return values. If the programmer fails to annotate the code correctly, the code will still work, but without the performance benefits of asm.js. Of course, a validator might be used to check the asm.js code for validity, but this makes the workflow even more uncomfortable.

At this point, Emscripten [16] comes into play. The primary idea of Emscripten is to provide a transformation path from statically typed, native languages like C(++) to JavaScript [15]. To be as flexible and efficient as possible, Emscripten does not directly transpile any high-level programming language to JavaScript. Instead, Emscripten was created as an LLVM-to-JavaScript compiler that translates LLVM [7] bytecode into JavaScript. The LLVM code can be generated by tools like Clang [8] (for C/C++), which compile source code to LLVM bytecode. The usual transformation path from C(++) to JavaScript is shown in Figure 4.

Figure 4: asm.js compilation and execution pipeline [13]

3.2 Optional Typing with Flow

Flow is a static type checker for JavaScript, released by Facebook Inc. in 2014. Its main goal is to find errors in JavaScript code with little programmer effort, relying heavily on type inference to automatically detect the types of variables and parameters that are not explicitly declared wherever possible [4].

When types cannot be properly inferred, or the programmer wants to explicitly specify a certain parameter type or return type, they can use Flow's annotation syntax as shown in Figure 5.

    /* @flow */

    function somefunc(a: string, b: number): number {
      return a.length * b;
    }

    somefunc("foo", 42);

Figure 5: Flow example code

In contrast to asm.js, Flow doesn't introduce types other than those available in JavaScript itself. Flow allows common automatic conversions like string + 1 and even less common ones like 1 + true, maintaining maximal compatibility with the operations allowed in JavaScript itself. The type annotations used in Flow are not valid JavaScript syntax according to the current ECMAScript [6] standard, but a subset of an extended JavaScript derivative called JSX [3]. Therefore, files annotated with Flow need to be transpiled into pure JavaScript. The workflow recommended by the Flow authors is to use the react-tools node.js package for the transpilation.
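Conceptually, the transpilation step only has to strip the type annotations. The following sketch shows what plain JavaScript output for Figure 5 might look like; the exact output of react-tools is not shown in this work and may differ in formatting.

```javascript
// Figure 5 with the Flow annotations removed (assumed shape of the output).
function somefunc(a, b) {
    return a.length * b;
}

somefunc("foo", 42); // 3 * 42 = 126
```

The resulting code carries no type information at all, so the checks performed by Flow have no runtime cost.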

4 Advantages through Optional Typing

Optional typing is used to achieve various advantages that are not found in purely dynamically typed languages. The focus of the two presented approaches differs, however: while asm.js aims at performance improvements, Flow specifically targets the detection of type errors prior to execution. Both approaches have in common that they may be applied to portions of the code only, usually at the level of so-called modules, and sometimes even only to single variables and functions/methods, whereas the other parts of the code stay dynamically typed, leaving the programmer the full flexibility that dynamic typing brings. This property also led to the term gradual typing, which is often used in this context.

4.1 Performance

Modern JavaScript engines, starting in late 2008/early 2009 with V8, TraceMonkey and Nitro, have become sufficiently fast to run large code bases [15]. Nevertheless, the very dynamic nature of JavaScript, especially its type system, limits optimization by the just-in-time (JIT) compilers of JavaScript engines, making it difficult or impossible to run computation-intensive tasks at speeds comparable to native code. For instance, in most JavaScript programs there will be situations where it is impossible to tell before execution whether the + operator has to perform an arithmetic operation or a string concatenation (compare Figure 1). In such cases, runtime checks are needed that test whether the operands are numbers and then perform an arithmetic operation on them, or a string concatenation (plus a possible conversion of the other operand to a string) otherwise.

As can be seen in Figure 6, optional typing with asm.js has the potential to be game-changing here. The asm.js code generated with Emscripten performs many times better than normal JavaScript implementations, achieving approximately half the speed of its native counterparts in early tests [15]. This huge performance increase arises from the fact that the knowledge about types makes many expensive runtime checks and alternative code paths redundant, and enables the JavaScript engine to efficiently compile the code ahead of time (AOT). Another article at Mozilla, published in late 2013, reports improvements that yield even better performance, with asm.js running only 1.5 times slower than native code created with Clang [11].

Figure 6: asm.js performance [15]
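The runtime dispatch that an engine has to perform for an untyped + can be pictured roughly as follows. This is a simplified sketch and not the algorithm of any particular engine: genericAdd is a hypothetical name, and the sketch only covers numbers, strings and booleans, ignoring objects and other special cases of the ECMAScript specification.

```javascript
// Simplified model of the checks behind a dynamically typed "+".
function genericAdd(left, right) {
    // Fast path: both operands are already numbers.
    if (typeof left === "number" && typeof right === "number") {
        return left + right;
    }
    // If either operand is a string, the other one is converted and concatenated.
    if (typeof left === "string" || typeof right === "string") {
        return String(left) + String(right);
    }
    // Remaining primitives (e.g. booleans) are converted to numbers first.
    return Number(left) + Number(right);
}

genericAdd(1, 2);    // 3
genericAdd("1", 2);  // "12"
genericAdd(1, true); // 2
```

When the types are known in advance, as in code annotated with |0, the engine can emit the first branch directly and drop the checks and the other branches entirely.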

4.2 Error Detection

As has been shown in subsection 3.2, static typing is not necessarily excessively complex. Approaches like Flow can be used to ensure correct typing of critical modules and to enforce certain types when this is required. For instance, the JSX code shown in Figure 7 declares the return type string, but the function returns a value of type number. When Flow is run on that code, it reports an error like

    <file>:4:10,21: number
    This type is incompatible with:
      <file>:3:42,47: string

Fixing this error as shown in Figure 8 resolves this problem.

    /* @flow */

    function somefunc(a: string, b: number): string {
      return a.length * b;
    }

    somefunc("foo", 42);

Figure 7: wrong Flow example code

    /* @flow */

    function somefunc(a: string, b: number): string {
      return (a.length * b).toString();
    }

    somefunc("foo", 42);

Figure 8: corrected Flow example code

Finally, Flow is also a gradual type system [4], which means that one can easily mix statically typed modules, annotated with /* @flow */, with modules without Flow type checks.

5 Conclusion

Optional typing can bring the best of two worlds, static typing and dynamic typing, together. However, as nothing comes entirely for free, it should always be carefully considered whether optional typing makes sense for a particular application. For JavaScript applications, we can conclude that there are three main factors that influence the choice of a suitable strategy:

• Existing code base: Does the project start from scratch, or is there already (native) code that can be compiled to LLVM bytecode?

• Performance requirements: Does the application do heavy low-level computations (e.g. numerical algorithms)?

• Type safety requirements: Should the application be protected against type errors (at the cost of a slightly more complicated development workflow)?

The advisable strategy depending on these questions is shown in Figure 9. If there already exists a large code base in an LLVM-translatable language, or the code should do lots of low-level computations, asm.js will most likely be the best choice because of its high performance. If the code is required to be type-safe or generally as little error-prone as possible, Flow offers this with little additional effort for the programmer.

Figure 9: JavaScript development strategy flowchart

As currently none of the tools presented in this work integrate seamlessly into the development process, it should always be considered whether the expected increase in code quality offered by optional typing justifies the additional effort. Especially when the programmers are not familiar with the concept of typing, the confusion caused by a complicated workflow might do more harm than good.

References

[1] Robert Cartwright and Mike Fagan. Soft Typing. In Proceedings of the ACM SIGPLAN 1991 Conference on Programming Language Design and Implementation, volume 26 of PLDI '91, pages 278-292, New York, NY, USA, 1991. ACM.
[2] Ravi Chugh, Patrick M. Rondon, and Ranjit Jhala. Nested Refinements: A Logic for Duck Typing. SIGPLAN Not., 47(1):231-244, 2012. URL: http://doi.acm.org/10.1145/2103621.2103686, doi:10.1145/2103621.2103686.
[3] DeNA Co., Ltd. et al. JSX, 2012. URL: https://jsx.github.io/.
[4] Facebook Inc. Flow: A static type checker for JavaScript, 2014. URL: http://flowtype.org/.
[5] David Herman, Luke Wagner, and Alon Zakai. asm.js: Working Draft, 2014. URL: http://asmjs.org/spec/latest/.
[6] Ecma International. ECMA-262, ECMAScript Language Specification, Edition 5.1. URL: http://www.ecma-international.org/publications/standards/Ecma-262.htm.
[7] C. Lattner and V. Adve. LLVM: a compilation framework for lifelong program analysis & transformation. In Code Generation and Optimization, 2004. CGO 2004. International Symposium on, pages 75-86, 2004. doi:10.1109/CGO.2004.1281665.
[8] Chris Lattner. LLVM and Clang: Next Generation Compiler Technology: BSDCan 2008, 2008. URL: http://www.llvm.org/pubs/2008-05-17-BSDCan-LLVMIntro.pdf.
[9] J. McCarthy. History of Lisp, 1979. URL: http://www.csse.monash.edu.au/courseware/cse3323/CSE3323-2002/lisp.pdf.
[10] John C. McPherson. Early Computers and Computing Institutions. Annals of the History of Computing, 6(1):15-16, 1984. doi:10.1109/MAHC.1984.10005.
[11] Robert Nyman. Gap between asm.js and native performance gets even narrower with float32 optimizations, 2013. URL: https://hacks.mozilla.org/2013/12/gap-between-asm-js-and-native-performance-gets-even-narrower-with-float32-optimizations/.
[12] Benjamin C. Pierce. Types and programming languages. MIT Press, Cambridge, Mass., 2002.
[13] John Resig. Asm.js: The JavaScript Compile Target, 2013. URL: http://ejohn.org/blog/asmjs-javascript-compile-target/.
[14] Laurence Tratt. Dynamically Typed Languages, 2009. URL: http://eprints.bournemouth.ac.uk/10668/1/tratt__dynamically_typed_languages.pdf.
[15] Alon Zakai. Big Web App? - Compile it!, 07.11.2013. URL: https://kripken.github.io/mloc_emscripten_talk/.
[16] Alon Zakai. Emscripten: An LLVM-to-JavaScript Compiler, 2011. URL: http://gitlocker.org/root/kripken-emscripten/raw/master/src/relooper/paper.pdf.
