js.rs – A Rustic JavaScript Interpreter CIS Department Senior Design 2015-2016 1

Terry Sun Sam Rossi [email protected] [email protected] April 25, 2016

Abstract trol flow modifying statements. We aimed for correctness in any language feature that JavaScript is an incredibly widespread lan- we implemented. guage, running on virtually every modern computer and browser, and interpreters such This project aimed for breadth of coverage: as NodeJS allow JavaScript to be used as a Js.rs supports more common language fea- server-side language. Unfortunately, modern tures at the cost of excluding certain edge implementations of JavaScript engines are cases. Additionally, Js.rs does not focus on typically written in C/C++, languages re- achieving optimal performance nor memory liant on manual memory management. This usage, as it is a proof-of-concept project. results in countless memory leaks, bugs, and security vulnerabilities related to memory mis-management. 2 Background

Js.rs is a prototype server-side JavaScript in- 2.1 Rust terpreter in Rust, a new systems program- ming language for building programs with Rust is a spear- strong memory safety guarantees and speeds headed by . It is a general-purpose comparable to C++. Our interpreter runs programming language emphasizing memory code either from source files or an interac- safety and speed concerns. Rust’s initial sta- tive REPL (read-evaluate-print-loop), sim- ble release (version 1.0) was released in May ilar to the functionality of existing server- 2015. side JavaScript interpreters. We intend to demonstrate the viability of using Rust to Rust guarantees memory safety in a unique implement JavaScript by implementing a way compared to other commonly used pro- core subset of language features. To that gramming languages. The compiler stati- end, we’ve tested our coverage using Google’s cally analyzes the source code, tracking the Sputnik test suite, an ECMAScript 5 confor- “lifetime” of any heap-allocated data. When mance test suite. all existent pointers to a piece of data has gone out of scope (e.g. at the end of a func- tion), then the compiler determines that the 1 Introduction data is no longer alive and the allocated space can be freed. This memory manage- Js.rs is a prototype JavaScript interpreter, ment system prevents many classic memory targeting core language features. This in- errors by disallowing the programmer from cludes variable assignment, expression lit- accessing uninitialized or already-freed mem- erals, unary and binary operators, function ory. calls, code blocks, and different types of con- 1Advised by Dr. Zdancewic ([email protected])

1 Compare this system to other memory-safe system. JavaScript has many functional fea- languages, which rely on garbage collection tures, such as first class functions, anony- to find and free memory which is no longer mous functions, and closures. relevant to the running program. Garbage collection incurs significant runtime perfor- JavaScript was first developed at mance costs. by Brendan Eich 1995 for inclusion with their internet browser. Today, JavaScript Rust code compiles down to a native exe- is defined by the ECMAScript specification, cutable. The Rust compiler uses an LLVM- first released in 1997 by the European Com- backend to emit assembly, taking advantage puter Manufacturer Association (ECMA) in of LLVM’s extensive optimization options. response to the emergence of a few forks of In addition, there is no runtime associated the JavaScript implementations. The cur- with executing Rust binaries; Rust is not run rent version is ECMAScript 6, released in by a virtual machine (e.g. Java), nor does it June 2015; ECMAScript 7 is still under de- use garbage collection during execution. velopment.

Rust also provides high-level programming JavaScript has been deployed as a server- language features, such as an type system side language since late 1995. It has gained based on interfaces (known as “traits”) and popularity as a server development language generics, funtional programming and clo- since Node.js was released in 2006. sures, and a set of concurrency primitives. This makes it a very attractive languages There are many JavaScript resources avail- for people who are concerned about the per- able online. We primarily referenced formance of their program, but wish to use the Mozilla Developer Network’s official something more ergonomic than C or C++. JavaScript documentation[4], which provide The Rust standard library is written with breakdowns of many pieces of JavaScript both performance and ergonomics in mind. behavior. Additionally, we made use of a conformance test suite named Sputnik, re- 2.2 JavaScript leased by Google in 2009. Sputnik tests the ECMAScript 5 standard, but includes only JavaScript is an extremely widely used pro- those features which were present in EC- gramming language. It is one of the core MAScript 3. This arrangement was chosen languages used on the Internet; it is included due to the presence of numerous ambigui- on the majority of the websites across the ties in the third specification, which were re- Internet, and is supported by all major web solved in ECMAScript 5. browsers. In addition, it has many desktop applications, e.g., used as a scripting lan- guage, or embedded in PDFs. 3 Design

JavaScript is a high-level, imperative pro- 3.1 Parser gramming language. It is typically inter- preted, and is loosely typed; any variable The first part of the interpreter is the parser, can be passed into any function or opera- takes in a string of JavaScript and generates tor. The type system is composed of seven an Abstract Syntax Tree (AST). An AST primitives types (Number, Boolean, null, contains the structure and the content of a undefined, String, Object, Symbol). Cus- program, but not the specific syntactic com- tom types can be created by inheriting from ponents such as whitespacing. Object, using a prototype-based inheritance

2 We wrote our parser using a parser genera- Type System tor, a common pattern in building parsers. This consists of defining the grammar of In JavaScript, a value can either be located the language (i.e. the valid tokens of the on the stack or on the heap. Primitive values language and the valid sequences of those (e.g. numbers, boolean) are located on the tokens which have semantic meaning in the stack, whereas as all references (such as ob- language); the parser generator takes this jects or anything created from a constructor) grammar specification as input and gener- are located on the heap. To model this prop- ates code to parse the grammar in the target erly, our type system used a pair of types for language (in this case, Rust). Parser genera- each value. JsVar contains the type of the tors tend to be a bit slower in practice than value, as well as any stack-related value (such writing an equivalent parser manually, but as the Rust number or boolean containing using a parser generate greatly increases the the value itself); additionally, each JsVar rate of development due to the ease of use. can optionally be paired with a JsPtrEnum, We opted to use a parser generator rather which contains a reference to any heap al- than writing a custom parser, as we wanted located data that the value may require. to focus on language implementation rather Each JsVar also contains an optional unique than performance. binding, which acts as a lookup identifier for use with heap datatype provided by French Press. The two most widely used parser generators are YACC and Bison, which are implemented in C. Neither of these would be suitable for For example, for the object { x: 3.5, y: our project, as we intended to use only Rust "foo" }, the JsVar would indicate that the libraries in our interpreter to maintain the value is of type “object”. The JsPtrEnum for safety guarantees. After some research, we the object would contain a HashMap with the decided to use LALRPOP, a pure Rust LR(1) keys "x" and "y". The value for "x" would parser generator[2]. be a JsVar of type “number” and with float- ing point value 3.5, and the value for "y" would be a JsVar with the type “string” and a unique binding which can be used to look up the heap data. 3.2 Garbage Collector

JavaScript is a garbage collected language, 3.3 Runtime as it relies on a runtime system to man- age the memory used by a running program. The “runtime” is the most substantial por- This runtime must allocate memory when re- tion of the interpreter, as it is what accepts quested by the language, and free memory the AST as input and evaluates the language when it is determined that allocated data semantics represented by each node. The is no longer accessible by the running pro- runtime is broken down into many modu- gram. We used a garbage collection library, lar components; each module handles a par- French Press. French Press was being devel- ticular type of expression or functionality, oped concurrency by David Mally, a UPenn such as coercion between JavaScript types Masters student, for his Masters thesis. Js.rs or evaluation of literal expressions. Then, is tightly coupled with French Press, sharing higher-level components handle larger pieces a common library defining a shared represen- of code, such as code blocks embedded within tation of the JavaScript type system. function definitions or try-catch constructs.

3 Functions, Scoping, and Closures Assignment statements

One of the most complex parts of imple- There are two ways to assign values to vari- menting the runtime was ensuring that it able bindings in JavaScript. A declaration exhibited the correct behavior with regards (var x = y) will create a variable in the cur- to functions and scoping. For simple cases, rent scope and assign the value y to it. An French Press handled the correct scoping for assignment (x = y) will modify the variable local variables. However, in the case of clo- x in the current scope or any parent scope sures, the standard scoping rules would not if it already exists; otherwise, it will be allo- suffice. The following code sample demon- cated in the global program scope and set to strates such a case: the value y. Additionally, a variable can be declared but not set (var x), which sets it to functionf(){ the default value undefined. varx = 0; return function(){ returnx++; Literal expressions }; } Literal expressions evaluate to the value they varg=f(); represent, and are defined in JavaScript for console.log(g()); each of the primitive types. For null, console.log(g()); undefined, boolean, and numeric literals, we simply create a variable containing the Normally, when a function returns, all of its appropriate value. For more complex lit- local variables are no longer in scope, so the erals which must be allocated on the heap garbage collector can deallocate the memory (Strings, Objects, and Arrays), we first eval- associated them. However, in the above ex- uate the expression by constructing the ap- ample, when f is called, it returns a new propriate type. For Strings, this includes function which has access to x. This means parsing escaped values and translating es- that x cannot be garbage collected at least caped Unicode values to their corresponding until after g is no longer in scope. In order Unicode characters. For Objects and Arrays, to correctly execute code cases like this, we each sub-component in the literal must be it- had to add functionality for our code to de- eratively parsed, and the value constructed. tect when a function is a closure and inform Then, we allocate heap space and store the the garbage collector of the special status of value. that function’s scope. • Boolean: true, false

4 Results • Numeric: e.g. -4, 7.17, NaN

We implemented a significant subset of fea- • null tures which provide more than enough cover- age to write interesting and useful programs. • undefined

4.1 Language Features • String: e.g. "abc\u1234", 'foo bar'

We successfully implemented parsing and • Object: e.g. { x:3, y:{ z:"x" }} evaluation of the following types of expres- sions and statements: • Array: e.g. [1, "hello", x]

4 Operator expressions Function-related expressions

JavaScript provides a set of binary (a + b) Functions are stored into the current scope and unary operators (++a) to define ex- as a special meta-level type which contains pressions based on one or more variables. the function name and the AST node for the JavaScript does not enforce type bounds at function body. When a function is called, the all, so Js.rs must perform type coercion when interpreter allocates a new scope as a child of an expression contains values of two different the current scope, and will then execute the types. For example, in a logical and state- semantics stored in the function’s AST node. ment, both expressions are first coerced to an boolean value before the logical and is As discussed above, anonymous functions applied. may contain references to its parent scope, creating a closure environment. This may involve operator overloading, as in • Named function definition: e.g. the case of +: in most cases, the interpreter function f(x) { return x + 1; } must coerce both arguments to Number be- fore evaluating the expression. However, if • Anonymous function definition: e.g. either argument is a String, then both argu- function(x, y) { return x + y; } ments should instead be coerced to Strings • Function calls: e.g. foo(a, b) and the operation results in the concatenta- tion of those two Strings. Object-related expressions

Additionally, all binary operators can be Js.rs supports simple Object functionality as used as assignment operators (e.g., a += b. a key-value store. Js.rs does not support the This application in Js.rs is handled by the Prototype inheritance model. parser, which will produce an AST node • Instance variable access: e.g. foo.bar identical to the a = a + b case. • Key-indexed access: e.g. foo[bar] • Incrementation: ++, -- • Constructors: e.g. new foobar(x, y)

• Bitwise logical: &, |,, ^ Control-flow statements Many JavaScript constructions change the • instanceof execution flow of the program. Typically, the interpreter will linearly execute the AST • typeof nodes as returned by the parser. However, the interpreter may traverse into previously • Arithmetic: +, -, *, /, % encountered nodes (e.g., a function call), re- peat the current node until a condition is met • Boolean logical: &&, ||, ! (e.g., for and while loops), or skip certain AST nodes (e.g., an if statement). • Shifts: >>>, >>, << • if/else if/else , loops • Inequalties: >, >=, <, <= • while for • break, continue • Equalities: ==, !=, ===, !== • return • Assignment: =, +=, &&=, ... • throw, try/catch/finally

5 Standard library ferent metrics to analyze the coverage of our interpreter. We implemented a small standard library for our interpreter based on some of the most Category-based coverage widely-used features provided by the official JavaScript standard library. Sputnik defines several categories of tests, each with various depths of subcategories (for example, the “Expressions” category Printing contains, among others, a “Postfix Expres- JavaScript interpreters typically provide a sions” subcategory, which in turn contains global console object, through which pro- the subcategories “Postfix Increment Oper- grams can accept input and provide output ator” and “Postfix Decrement Operator”). to the developer console. We implemented Overall, there are 111 leaf categories (i.e. an abbreviated console containing only the categories which do not contain other cate- log function, which coerces its first argument gories). to a string value and outputs it to the screen. We considered the number of leaf categories in which we had passed some of the tests. Of Prototypes the 111 categories, we had coverage in 73 of JavaScript packages a number of built-in pro- the categories, or 65.8%. This indicates that totypes, such as String, Array, Object, and we covered a sizable portion of the languages Function. A value of a given prototype can features. This metric represents breadth of be created with the new keyword, e.g., new JavaScript that out interpreter can handle. Object(). Js.rs implements many of these prototypes as simply wrapper types around Raw coverage the primitive , , and String Number Boolean Sputnik provides a total of 2427 distinct types. We implemented the keyword by new tests. In the end, Js.rs passes 18.2% of those executing a call to the appropriate construc- tests. Js.rs implements a large subset of the tor. commonly used JavaScript features; how- ever, the Sputnik test suite does not isolate Additionally, Js.rs packages a basic Array individual features for each test but com- prototype, with support for constructing Ar- monly includes many parts of the language ray objects from literals, push, accessing ele- in a single test. Furthermore, a single edge ments by index, and re-sizing by setting the case may be present multiple times under length property. several unrelated feature categories.

4.2 Specification Coverage Some features which we did not implement include: To test how complete our coverage of the JavaScript standard was, we built a frame- • The global Arguments object work to run the Google Sputnik test suite[5] • The conditional operator ( ?: ) on our interpreter. Sputnik targets the ECMAScript 5 standard, but includes only • The void operator those features which were present in EC- MAScript 3. This arrangement was chosen • The delete operator due to the presence of numerous ambigui- • The in operator ties in the third specification, which were re- solved in ECMAScript 5. We used two dif- • The with statement

6 5 Ethics and Privacy to use a custom lexer. By default, LAL- RPOP will tokenize the input by splitting Js.rs is developed as a free and open source on any whitespace (and throwing out the software project, hosted on GitHub. Poten- whitespace itself). While this would work tial users do not have to blindly trust that well in many scenarios, JavaScript inter- the interpreter is not malicious, but can au- preters typically allow newlines to be used dit it for themselves. As running this project in place of trailing semicolons. Addition- on a JavaScript program would require re- ally, correctly parsing single-line comments vealing the full source to the interpreter, it is logically requires the detection of newline important that Js.rs does not contain any in- characters. In order to replicate this behav- formation exfiltration systems or other back- ior as closely as possible with LALRPOP, doors. we resorted to preprocessing the code run through js.rs before parsing it. Like the issue with compilation speed, we likely would have 6 Discussion solved this by writing a custom parser if we LALRPOP errata had had enough time. While LALRPOP generally worked quite well for our purposes, there were a few issues Rust Package Ecosystem we ran into during the development process. Overall, we found that the Rust ecosystem was lacking. It was missing many pack- Compilation speed ages that we were looking for, such as a Readline library that was compatible with While parser generators greatly facilitate de- the stable Rust toolchain. (Ultimately, we velopment, they tend to be less performant used Rustyline[3], which only worked with than custom parsers specifically written for the Nightly release channel.) With regards the source language. Although we were not to stability, almost every major dependency heavily concerned with the code execution we used in our project had compilation errors speed of our interpreter, we were fairly incon- at some point during the development pe- venienced by how long it would take LAL- riod, whether due to breaking changes in the RPOP to generate our parser. As LALR- Rust compiler from upstream, or due to bugs POP only had a single author who was quite introduced in the dependency themselves. busy with other things, we understood that we would not be able to expect as good per- formance as more mainstream parser gen- Multi-package architecture erators. Given more time to improve js.rs, we likely would have implemented a custom One of the choices we made early on in the parser to alleviate the issue. That being said, development of js.rs was to split the inter- several non-optimization related updates to preter into multiple packages. Although this LALRPOP released during the course of our originally was intended to separate the code project greatly improved the development of French Press from the rest of the inter- process, including the addition of human- preter, soon decided to split out other parts readable error messages for shift-reduce and of the interpreter as well. We ended up with reduce-reduce conflicts. four different packages, namely a “common” package with code needed by all other pack- ages, French Press, the parser, and the run- Lexing time. Although this logical separation made Unlike more mainstream parser generators, sense in the abstract, it quickly became cum- LALRPOP does not provide any built-in way bersome to ensure that each crate’s depen-

7 dencies were in sync (e.g. that the parser terpreter, covering roughly 65% of categories and the runtime both used the same ver- in the Sputnik conformance test suite and sion of the common package). Moreover, a significant portion of core language fea- due to the slow speed at which the parser tures. We have set up a robust interpreter compiled, changes to the common package framework, which could be easily extended became rather undesirable, as each change to handle further edge cases and less-often would require the parser to recompile as well. used features. Js.rs packages a simple ob- In retrospect, using separate modules within ject system and implemented a basic Array a single package rather than multiple pack- prototype; we believe it would be straight- ages would have significantly increased our forward to implement the full prototype in- development efficiency. heritance model and to implement additional prototypes found in the standard library. 7 Conclusion We were dissatisfied with the current Rust In the end, we found that Rust was an en- ecosystem, but we believe that the availabil- joyable and robust language to build an in- ity and stability of libraries will improve as terpreter in. The high-level abstractions and Rust continues to mature as a programming type system that Rust provides made it easy language. to bulld the structure of our interpreter, and Rust’s standard library helped set up On the whole, we believe that Js.rs achieved a thorough error handling system in the in- our original goal of building a proof of con- teprreter. cept JavaScript interpreter. It is an easily extensible framework that could be expanded We believe we have achieved our goal of im- on in the future, showing that Rust is a pow- plementing a proof of concept JavaScript in- erful language suitable for such a project.

8 References

[1] Official Rust Language website https://www.rust-lang.org/ [2] LALRPOP, LR(1) parser generator for Rust https://github.com/nikomatsakis/lalrpop [3] Rustyline, a pure-Rust implementation of Readline https://github.com/kkawakam/rustyline

[4] Mozilla Developer Network JavaScript Documentation https://developer.mozilla.org/en- US/docs/Web/JavaScript [5] ECMAScript conformance test suite https://code.google.com/archive/p/sputniktests/

9