A Rustic Javascript Interpreter CIS Department Senior Design 2015-2016 1
Total Page:16
File Type:pdf, Size:1020Kb
js.rs { A Rustic JavaScript Interpreter CIS Department Senior Design 2015-2016 1 Terry Sun Sam Rossi [email protected] [email protected] April 25, 2016 Abstract trol flow modifying statements. We aimed for correctness in any language feature that JavaScript is an incredibly widespread lan- we implemented. guage, running on virtually every modern computer and browser, and interpreters such This project aimed for breadth of coverage: as NodeJS allow JavaScript to be used as a Js.rs supports more common language fea- server-side language. Unfortunately, modern tures at the cost of excluding certain edge implementations of JavaScript engines are cases. Additionally, Js.rs does not focus on typically written in C/C++, languages re- achieving optimal performance nor memory liant on manual memory management. This usage, as it is a proof-of-concept project. results in countless memory leaks, bugs, and security vulnerabilities related to memory mis-management. 2 Background Js.rs is a prototype server-side JavaScript in- 2.1 Rust terpreter in Rust, a new systems program- ming language for building programs with Rust is a programming language spear- strong memory safety guarantees and speeds headed by Mozilla. It is a general-purpose comparable to C++. Our interpreter runs programming language emphasizing memory code either from source files or an interac- safety and speed concerns. Rust's initial sta- tive REPL (read-evaluate-print-loop), sim- ble release (version 1.0) was released in May ilar to the functionality of existing server- 2015. side JavaScript interpreters. We intend to demonstrate the viability of using Rust to Rust guarantees memory safety in a unique implement JavaScript by implementing a way compared to other commonly used pro- core subset of language features. To that gramming languages. The compiler stati- end, we've tested our coverage using Google's cally analyzes the source code, tracking the Sputnik test suite, an ECMAScript 5 confor- \lifetime" of any heap-allocated data. When mance test suite. all existent pointers to a piece of data has gone out of scope (e.g. at the end of a func- tion), then the compiler determines that the 1 Introduction data is no longer alive and the allocated space can be freed. This memory manage- Js.rs is a prototype JavaScript interpreter, ment system prevents many classic memory targeting core language features. This in- errors by disallowing the programmer from cludes variable assignment, expression lit- accessing uninitialized or already-freed mem- erals, unary and binary operators, function ory. calls, code blocks, and different types of con- 1Advised by Dr. Zdancewic ([email protected]) 1 Compare this system to other memory-safe system. JavaScript has many functional fea- languages, which rely on garbage collection tures, such as first class functions, anony- to find and free memory which is no longer mous functions, and closures. relevant to the running program. Garbage collection incurs significant runtime perfor- JavaScript was first developed at Netscape mance costs. by Brendan Eich 1995 for inclusion with their internet browser. Today, JavaScript Rust code compiles down to a native exe- is defined by the ECMAScript specification, cutable. The Rust compiler uses an LLVM- first released in 1997 by the European Com- backend to emit assembly, taking advantage puter Manufacturer Association (ECMA) in of LLVM's extensive optimization options. response to the emergence of a few forks of In addition, there is no runtime associated the JavaScript implementations. The cur- with executing Rust binaries; Rust is not run rent version is ECMAScript 6, released in by a virtual machine (e.g. Java), nor does it June 2015; ECMAScript 7 is still under de- use garbage collection during execution. velopment. Rust also provides high-level programming JavaScript has been deployed as a server- language features, such as an type system side language since late 1995. It has gained based on interfaces (known as \traits") and popularity as a server development language generics, funtional programming and clo- since Node.js was released in 2006. sures, and a set of concurrency primitives. This makes it a very attractive languages There are many JavaScript resources avail- for people who are concerned about the per- able online. We primarily referenced formance of their program, but wish to use the Mozilla Developer Network's official something more ergonomic than C or C++. JavaScript documentation[4], which provide The Rust standard library is written with breakdowns of many pieces of JavaScript both performance and ergonomics in mind. behavior. Additionally, we made use of a conformance test suite named Sputnik, re- 2.2 JavaScript leased by Google in 2009. Sputnik tests the ECMAScript 5 standard, but includes only JavaScript is an extremely widely used pro- those features which were present in EC- gramming language. It is one of the core MAScript 3. This arrangement was chosen languages used on the Internet; it is included due to the presence of numerous ambigui- on the majority of the websites across the ties in the third specification, which were re- Internet, and is supported by all major web solved in ECMAScript 5. browsers. In addition, it has many desktop applications, e.g., used as a scripting lan- guage, or embedded in PDFs. 3 Design JavaScript is a high-level, imperative pro- 3.1 Parser gramming language. It is typically inter- preted, and is loosely typed; any variable The first part of the interpreter is the parser, can be passed into any function or opera- takes in a string of JavaScript and generates tor. The type system is composed of seven an Abstract Syntax Tree (AST). An AST primitives types (Number, Boolean, null, contains the structure and the content of a undefined, String, Object, Symbol). Cus- program, but not the specific syntactic com- tom types can be created by inheriting from ponents such as whitespacing. Object, using a prototype-based inheritance 2 We wrote our parser using a parser genera- Type System tor, a common pattern in building parsers. This consists of defining the grammar of In JavaScript, a value can either be located the language (i.e. the valid tokens of the on the stack or on the heap. Primitive values language and the valid sequences of those (e.g. numbers, boolean) are located on the tokens which have semantic meaning in the stack, whereas as all references (such as ob- language); the parser generator takes this jects or anything created from a constructor) grammar specification as input and gener- are located on the heap. To model this prop- ates code to parse the grammar in the target erly, our type system used a pair of types for language (in this case, Rust). Parser genera- each value. JsVar contains the type of the tors tend to be a bit slower in practice than value, as well as any stack-related value (such writing an equivalent parser manually, but as the Rust number or boolean containing using a parser generate greatly increases the the value itself); additionally, each JsVar rate of development due to the ease of use. can optionally be paired with a JsPtrEnum, We opted to use a parser generator rather which contains a reference to any heap al- than writing a custom parser, as we wanted located data that the value may require. to focus on language implementation rather Each JsVar also contains an optional unique than performance. binding, which acts as a lookup identifier for use with heap datatype provided by French Press. The two most widely used parser generators are YACC and Bison, which are implemented in C. Neither of these would be suitable for For example, for the object f x: 3.5, y: our project, as we intended to use only Rust "foo" g, the JsVar would indicate that the libraries in our interpreter to maintain the value is of type \object". The JsPtrEnum for safety guarantees. After some research, we the object would contain a HashMap with the decided to use LALRPOP, a pure Rust LR(1) keys "x" and "y". The value for "x" would parser generator[2]. be a JsVar of type \number" and with float- ing point value 3.5, and the value for "y" would be a JsVar with the type \string" and a unique binding which can be used to look up the heap data. 3.2 Garbage Collector JavaScript is a garbage collected language, 3.3 Runtime as it relies on a runtime system to man- age the memory used by a running program. The \runtime" is the most substantial por- This runtime must allocate memory when re- tion of the interpreter, as it is what accepts quested by the language, and free memory the AST as input and evaluates the language when it is determined that allocated data semantics represented by each node. The is no longer accessible by the running pro- runtime is broken down into many modu- gram. We used a garbage collection library, lar components; each module handles a par- French Press. French Press was being devel- ticular type of expression or functionality, oped concurrency by David Mally, a UPenn such as coercion between JavaScript types Masters student, for his Masters thesis. Js.rs or evaluation of literal expressions. Then, is tightly coupled with French Press, sharing higher-level components handle larger pieces a common library defining a shared represen- of code, such as code blocks embedded within tation of the JavaScript type system. function definitions or try-catch constructs. 3 Functions, Scoping, and Closures Assignment statements One of the most complex parts of imple- There are two ways to assign values to vari- menting the runtime was ensuring that it able bindings in JavaScript. A declaration exhibited the correct behavior with regards (var x = y) will create a variable in the cur- to functions and scoping.