Optimizing R Language Execution via Aggressive Speculation

Lukas Stadler (Oracle Labs, AT), Adam Welc (Oracle Labs, US), Christian Humer (Oracle Labs, CH), Mick Jordan (Oracle Labs, US)
{lukas.stadler,adam.welc,christian.humer,mick.jordan}@oracle.com

Abstract

The R language, from the point of view of language design and implementation, is a unique combination of various programming language concepts. It has functional characteristics like lazy evaluation of arguments, but also allows expressions to have arbitrary side effects. Many runtime data structures, for example variable scopes and functions, are accessible and can be modified while a program executes. Several different object models allow for structured programming, but the object models can interact in surprising ways with each other and with the base operations of R.

R works well in practice, but it is complex, and it is a challenge for language developers trying to improve on the current state-of-the-art, which is the reference implementation – GNU R. The goal of this work is to demonstrate that, given the right approach and the right set of tools, it is possible to create an implementation of the R language that provides significantly better performance while keeping compatibility with the original implementation.

In this paper we describe novel optimizations backed up by aggressive speculation techniques and implemented within FastR, an alternative R language implementation, utilizing Truffle – a JVM-based language development framework developed at Oracle Labs. We also provide experimental evidence demonstrating the effectiveness of these optimizations in comparison with GNU R, as well as the Renjin and TERR implementations of the R language.

Categories and Subject Descriptors D.3.4 [Programming Languages]: Processors—Optimization, Run-time environments

Keywords R language, Truffle, Graal, optimization

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
DLS'16, November 1, 2016, Amsterdam, Netherlands. © 2016 ACM. 978-1-4503-4445-6/16/11...$15.00
http://dx.doi.org/10.1145/2989225.2989236

1. Introduction

The R language is an open-source descendant of the S language, and was created by Ross Ihaka and Robert Gentleman at the University of Auckland in the 1990s [9]. Its predecessor, S, was created by John Chambers et al. at Bell Labs in the 1970s, and began as a set of macros simplifying access to Fortran's statistical libraries [2]. Over the years, the language has become increasingly popular, with its applications including data discovery and analysis as well as machine learning, over 11,000 packages in the CRAN¹ and Bioconductor² repositories, and its user base exceeding 2 million [18]. At the same time, the language has also grown quite complex, becoming a mix of many different programming language paradigms in sometimes unusual and surprising combinations. For example, R's syntax is superficially similar to languages like Java or JavaScript, but at the same time all arguments are lazily evaluated. It is both functional, in that functions in R are first-class data structures and it is, in practice, largely side effect free, but it is also object oriented and those side effects that do occur are largely uncontrolled (in particular, due to lack of support in the type system). R also supports deep introspection and allows for extensive modifications to runtime data structures, for example, iterating and modifying symbol lookup chains at runtime.

¹ http://cran.r-project.org
² http://www.bioconductor.org

For over 20 years R had only one reference implementation, GNU R, and it is only recently that alternative implementations with varying goals and underlying assumptions started to emerge (see Section 7 for details). GNU R is the most mature implementation and has worked well in practice, but has some limitations (e.g., slow execution of R code due to it being interpreted) that make it less suited to face some of the emerging challenges, particularly in the realm of so-called big data.

In this paper we present FastR³, an alternative implementation of R that efficiently implements all important and difficult to optimize features of the language. FastR is built on top of Truffle [27], a framework for developing programming languages utilizing a JVM (Java Virtual Machine).

³ We share the name and part of the code base with our (mostly spiritual) predecessor, called Purdue-FastR [11, 17]. This earlier prototype is now largely abandoned and it did not implement many important language features, such as the S3 object model that is extensively used in R libraries.

Our main contribution is to demonstrate that, even when considering a complicated set of language features, aggressive speculation expressed in terms of only a small set of fundamental techniques, if properly supported by the underlying infrastructure, can be very effective in delivering a highly performant language runtime, especially when combined with a carefully selected and implemented set of language-level optimizations. In particular:

• We describe how different facets of speculation, that is caching, system-wide assumptions and specialization, help in developing novel optimizations of R's lazy evaluation in the presence of largely unrestricted side effects.
• We show how the same techniques are effective in optimizing R's symbol lookups, function and method calls, and vector access operations.
• We demonstrate the effectiveness of our lazy evaluation optimizations and present an overall performance comparison between FastR and other implementations, based on experimental evaluation using a variety of benchmarks.

In our set of benchmarks, FastR is on average ~97x faster than GNU R (with ~7x geomean speedup).

2. The Challenge

On the surface, R looks similar to other popular programming languages such as Java or JavaScript. For example, consider the following function:

    function(x) {
      for (i in 1:10000) {
        x[i] <- i
      }
    }

With <- and [] being R's assignment operator and indexing operator, respectively, and 1:10000 representing a sequence of numbers between 1 and 10000, execution of this statement will result in populating some indexable data structure x with consecutive numbers. The two main indexable data types in R are vectors and lists, the former containing elements of the same primitive type (e.g., a number), and the latter potentially containing elements of different types. Operations on vectors of numbers are very frequent in R and, as such, it is expected that they are performed efficiently. The example above, assuming x to represent a vector of numbers, would ideally have similar performance to an array assignment in Java.

Unfortunately, syntax is where the similarities between R and languages like Java or even more dynamic languages like JavaScript end. In particular, function arguments are lazily evaluated in R, so that each argument (e.g., argument x in the example above) is represented by a promise⁴ – an internal data structure storing the expression and the evaluation environment used to compute concrete argument values when needed. This complicates the implementation, not only due to the overhead of creating promises throughout the call chains, but also due to function code being "polluted" by the code needed to obtain concrete argument values (see Section 5.2 for more details).

Another complication is that the assignment operator, the indexing operator, curly braces and all keywords are implemented as functions⁵. These functions can potentially have side-effects, and their definitions can change at any time during execution of the loop. For example, looking up the assignment function <- could lead to different results in different iterations of the loop. Furthermore, because variable scoping in R is dynamic and can be modified at the language level (and these modifications can be triggered as a side-effect of a function execution), it cannot be trivially guaranteed that x is going to point to the same data structure throughout the entire execution of the loop. Finally, the function implementing the subset operator is generic, so that a specific function to execute is chosen at runtime based on the existence and content of the vector's attributes. These attributes, which are meta-data that can be associated with any data structure, can change freely, and the chosen function can be arbitrarily complex (see Section 5.3.1 for additional details).

This paper will demonstrate that using aggressive speculation techniques, as well as the right tool chain (described in the following section), it is possible to remove many of the R language's runtime overheads and improve on the current state-of-the-art.

3. Truffle Overview

Truffle [27] is a framework for implementing language runtimes in Java utilizing JVM services (e.g., memory management). Its API provides many useful building blocks for creating a language runtime, for example:

• base node classes for building an Abstract Syntax Tree (AST) interpreter
• a domain specific language (Truffle DSL) that helps in building specialized node classes and simplifies construction of general-purpose inline caches
• support for efficient management of variable scopes (called frames)
• source code management and interfaces for tools like debuggers

In a Truffle-based interpreter, the ASTs that represent a program will start out in an uninitialized state and specialize according to the actual behavior at runtime. This means that the trees only contain implementations of operations for data types that have been encountered at runtime. For example, if up to a certain point the "+" operation was only executed with integer arguments, and is then executed on floating-point values, the framework will rewrite the operation in the tree to a more generic implementation that can handle both cases. As long as language developers are careful to provide

⁴ In other lazily evaluated languages, such as Haskell, this is also called a thunk.
⁵ The special syntax is syntactic sugar that is removed by the parser.

progressively more generic operation implementations, the tree will stabilize eventually.

Truffle also provides assumptions, which can be used to guard important system-wide conditions. They start out in the "valid" state and can be invalidated explicitly if a condition they are guarding is about to be broken. The main property of assumptions is that checking them is very cheap (potentially free under certain circumstances), while invalidating them is a very expensive operation. This can be used to speculate on conditions that are very likely to be true, but which cannot be statically proven so. As long as the number of failing assumptions is small, the tradeoff between cheap checking and expensive invalidation is beneficial.

Truffle can be enhanced by employing a just-in-time compiler called Graal [27]. Graal can take the ASTs produced by a Truffle language and compile them to efficient machine code. It does so via partial evaluation, also called the Second Futamura Projection [7]. The important property of this process is that it generates a compiler from the interpreter, so that language implementations on top of Truffle do not need a separate custom-built compiler.

Compilation to native code happens only after the AST stabilizes and so-called "hot paths", that is portions of the application code that are executed most frequently, have been identified in the course of interpreted execution. This compilation strategy makes language implementations using Truffle and Graal particularly well suited for executing long-running applications, which aligns well with FastR's focus on peak performance (as opposed to startup performance).

Truffle provides an additional mechanism to language developers, so-called profiles. They allow for passing additional information to the compiler to help it generate high-performance native code, at a small additional cost to the interpreted code. Numerous different types of profiles exist, with the most important ones being condition profiles and type profiles. Condition profiles are used to remember conditional branches that were never taken (to help the compiler generate more concise code). Type profiles remember dynamic type information and help in call devirtualization.

Another integral part of achieving high performance is Graal's ability to deoptimize and recompile native code. Even if the execution stabilizes on a certain set of discovered types and valid assumptions, it cannot be guaranteed that in the future no new types are introduced and no assumptions are violated. If the framework detects such a situation, native code can be invalidated with the execution falling back to the interpreter, and recompiled after execution stabilizes again.

4. FastR Overview

FastR is an open source⁶ implementation of R in Java on top of the Truffle framework. As with all Truffle-based language implementations, it is at heart an AST interpreter that specializes the tree at runtime. It reuses code from GNU R where appropriate, either translated into Java or called as C or Fortran code through the R native interface.

⁶ https://github.com/graalvm/fastr

The goal of FastR is to be a complete and fully GNU R compatible implementation of the R language, including development tools such as the standard debugger support, as well as support for third-party R packages available in public repositories. Since third-party packages are such a big part of the R ecosystem, it is crucial for FastR to be able to install and load these in exactly the same way as GNU R from the user's perspective. Installing non-trivial packages exercises most of R's sophisticated language features, so that FastR had to reach a significant level of compatibility with GNU R in order to be able to do so.

Performance-wise, the main focus of FastR is to accelerate execution of R code. Currently, a typical workflow for complex long-running R applications is to prototype an algorithm in R and then identify and re-write performance-critical portions of the R code in C/C++. This is time consuming, error-prone, and difficult to test for correctness (e.g., due to differences in how languages handle floating point computations), and will hopefully become unnecessary with projects like FastR providing more efficient execution of R code. Nevertheless, since many packages include code written in C/C++ or Fortran, FastR must implement the interface for calling to and receiving callbacks from such code. Unfortunately the native interface is very complex – it exposes the internals of the GNU R implementation and assumes that an implementation is in C/C++ and can freely share memory. This is problematic for a Java implementation as Java has very strict rules on the use of data outside the JVM as established by its Java Native Interface (JNI).

At the time of writing, FastR implements all the major features of the R language, including but not limited to supporting all user-facing R data structures, lazy evaluation, and both S3 and S4 object models. The main remaining missing features are full graphics support as well as selected builtin functions and parts of the native interface. Nevertheless, FastR is complete to the point that it can correctly install over 2000 of the CRAN packages as well as run (in parallel) unmodified selected production applications. The FastR code base also includes approximately 10,000 unit tests which compare individual operations against the GNU R reference implementation. Some of the tests were written by hand and some were generated automatically by the testR [23] project.

To summarize, since FastR implements the majority of the important language constructs, its semantic compatibility with GNU R is high, but more work is required on completeness as, for example, many R applications rely on packages with native components that FastR does not yet support.

5. R Language Optimizations

Implementing the R programming language is a challenging task for VM and runtime engineers. It includes many programming language features and paradigms that were assembled with little concern for efficient and compiled execution. In this section we show how, despite R's complexity, the majority of language performance problems can be mitigated via aggressive speculation, where the runtime system optimizes an executing program based on a certain set of predictions (e.g., that the number of different argument types for a given function call is small), but is also prepared to handle the general case, even if not as efficiently. In FastR we distinguish three major speculation techniques:

Inline Caches In dynamic languages it is often hard or impossible to determine the target of a specific call at compile time, so that every call is in principle polymorphic. In general, the number of call targets is unbounded, but we can speculate that the number of targets encountered at runtime is small, and that they can be cached to avoid expensive lookups and call preparation.

Inline caches, though typically associated with function or method call optimizations, do not have to be restricted to call sites – they are a generic concept that can be applied in all places where the number of options is unbounded in theory, but in practice only a small set of concrete options will appear at any one place.

The so-called guards, used to determine whether a specific inline cache entry applies in the current situation, are not restricted to simple equality or type checks. However, all inline caches use guards whose runtime cost is negligible compared to the actual operation they are used for.

Assumptions Inexpensive assumptions (see also Section 3) can be used to speculate that certain system-wide conditions which need to be checked frequently will hold throughout a program's execution.

The underlying system tries to make checking assumptions as fast as possible, while invalidating an assumption may be an expensive operation.

The use of assumptions needs to be carefully designed so that the overall system stabilizes to a state where no more invalidations occur.

Specialization Specialization refers to the principle of dividing the implementation of an operation into smaller pieces intended to handle specific cases, and speculating that only a limited set of code paths will be explored during program execution, which can greatly simplify the work of an optimizing runtime and compiler. A common use of this is type specialization, with one code path for each data type, but an implementation may also specialize (and have separate code paths) for positive or negative values, values below or above a certain threshold, etc.

These fundamental techniques permeate the entire implementation of FastR, starting with symbol lookup, through lazy evaluation, to other performance-critical implementation components such as function calls and vector accesses.

5.1 Symbol Lookup

The seemingly simple task of looking up a symbol representing a variable or function name is a complex operation in R, due to the way the lookup scopes are chained as user-accessible and user-modifiable first-class objects.

A mapping between symbols and their values in R is represented by an environment. If a symbol is not found in an environment, lookup continues automatically in the enclosing environment. Certain environments are pre-defined at startup, for example the base environment containing definitions of builtin functions and system variables, or the global environment representing the outermost scope for user-defined variables and functions. A linked list of (enclosing) environments starts at the global environment, continues with package environments containing symbols defined by currently loaded packages, and ends at the base environment. This list of environments is called the search path, and all lookups for basic operations like <- (assignment) or length (builtin function returning the length of an R data structure) will search through all of them before finding the target function in the base environment.

New environments are created upon each function call to create a scope for the function's local variables. The function environment gets its enclosing environment from the function's enclosing environment. A function's enclosing environment is initially taken from the point where the function was created (this can be an outer function environment, or the global environment at the top level), but it can be modified freely.

The main difference between R and other superficially similar languages, such as Java or JavaScript, is that environments can be explicitly added to and removed from the search path, for example by loading or unloading R packages. Furthermore, R programmers can create new environments and use the builtin functions attach and detach to manipulate the search path. Modifications to variables in most cases only modify the current environment, but R also provides variable access capabilities that can reach out to the outer scope(s), for example, the super assignment operator <<- that assigns in the enclosing scope. The combination of all these features makes symbol lookup very dynamic and difficult to optimize – a naïve implementation would perform a full lookup each time irrespective of whether the symbol binding or the search path has changed or not.

FastR uses Truffle facilities so that the actual contents of the environment (frame at the Truffle level) and the set of names that may be available in an environment (called frame descriptor and consisting of frame slots) are separate entities. A function, for example, creates a new environment for each call, but the frame descriptor is shared by all these environments. While new environments are created all the time, only a limited number of frame descriptors is created, and the set of names they may contain will stabilize over time.
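The separation between frames and frame descriptors described above can be sketched in plain Java. This is only an illustration under stated assumptions, not FastR or Truffle code: the classes Descriptor, Frame and ReadLocal and all of their members are hypothetical names invented for this example. It shows a variable read that resolves a name to a slot index once per descriptor and afterwards reads the value by index from any frame sharing that descriptor.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-ins for illustration; these are NOT the Truffle classes.
public class FrameSketch {

    /** The set of names a function's environments may hold; shared by all of its frames. */
    static final class Descriptor {
        private final Map<String, Integer> slots = new HashMap<>();

        /** Registers a name (if new) and returns its slot index. */
        int addSlot(String name) {
            Integer slot = slots.get(name);
            if (slot == null) {
                slot = slots.size();
                slots.put(name, slot);
            }
            return slot;
        }

        /** Name-based lookup of an existing slot; this is the slow path. */
        int slotOf(String name) {
            Integer slot = slots.get(name);
            if (slot == null) {
                throw new IllegalArgumentException("unknown name: " + name);
            }
            return slot;
        }

        int size() {
            return slots.size();
        }
    }

    /** The values of one function invocation, laid out as described by a Descriptor. */
    static final class Frame {
        final Descriptor descriptor;
        final Object[] values;

        Frame(Descriptor descriptor) {
            this.descriptor = descriptor;
            // Simplification: the descriptor is assumed complete before frames are created.
            this.values = new Object[descriptor.size()];
        }
    }

    /** A local-variable read that resolves the slot index once and then reuses it. */
    static final class ReadLocal {
        private final String name;
        private Descriptor cachedDescriptor; // guard for the cached slot index
        private int cachedSlot;
        int slowLookups; // counts name-based lookups, for demonstration only

        ReadLocal(String name) {
            this.name = name;
        }

        Object execute(Frame frame) {
            if (frame.descriptor != cachedDescriptor) {
                // Slow path: taken the first time a descriptor is seen.
                cachedSlot = frame.descriptor.slotOf(name);
                cachedDescriptor = frame.descriptor;
                slowLookups++;
            }
            // Fast path: plain array access, no name lookup.
            return frame.values[cachedSlot];
        }
    }

    public static void main(String[] args) {
        Descriptor descriptor = new Descriptor();
        int x = descriptor.addSlot("x");
        descriptor.addSlot("i");

        ReadLocal readX = new ReadLocal("x");
        // Each call creates a new frame, but all frames share one descriptor.
        for (int call = 1; call <= 3; call++) {
            Frame frame = new Frame(descriptor);
            frame.values[x] = call * 10;
            System.out.println(readX.execute(frame));
        }
        System.out.println("slow lookups: " + readX.slowLookups);
    }
}
```

Running main prints 10, 20 and 30, followed by "slow lookups: 1": three frames were created, but the name was resolved only once because the guard compares descriptors rather than frames. A real runtime would also have to handle descriptors that grow over time, which this sketch deliberately leaves out.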

Given that functions always use the same frame descriptor, the mapping between the name of a local variable and the actual position in the frame only needs to be determined the first time the function is executed, and can be cached for subsequent executions. The optimizing Truffle runtime can elide allocating the function environment altogether in many cases through escape analysis [20].

In order to accelerate the R language's symbol lookup process while remaining faithful to its semantics, for non-local lookups FastR creates Truffle assumptions for a number of system-wide conditions:

• For a given frame descriptor, an assumption asserts that there is only a single known environment using the frame descriptor. Constructing a second environment with the same layout will invalidate this assumption.
• For a given frame descriptor, an assumption asserts that the frame descriptor of enclosing environments does not change. For example, the frame descriptor of the environment of an inner function will always have as an enclosing frame descriptor the frame descriptor of the environment of the outer function. While the actual environments change, the frame descriptors and their relationships usually stay the same.
• For a given frame descriptor, an assumption asserts that specific names are not available in this frame descriptor.
• For a given frame descriptor (which has a single, known instance – see the first condition), frame slot-specific assumptions assert that the symbol-to-value binding does not change, that is, a given symbol always corresponds to the same value.

Inserting new names into an environment, changing the nesting of environments, or changing values inside environments can potentially invalidate these assumptions. These are either very infrequent operations, or the invalidation only happens the first time they are executed, so that overall the time needed for invalidating assumptions is negligible.

Consider the following piece of R code that defines function f1 at the global scope and f2 as an inner function in f1:

    f1 <- function(x) {
      f2 <- function(x) {
        length(x)
      }
      f2(x)
    }
    f1(42)

Note that the result of the last statement in a function is the return value of this function. When the call to f1 is executed, it will in turn call f2 and return the length of the parameter (in this case – 1). In order to find the actual function that the symbol length is bound to in f2, FastR will discover and check the following assumptions:

• length is not available in f2's frame descriptor
• the enclosing frame descriptor of f2 is that of f1
• length is not available in f1's frame descriptor
• the enclosing frame descriptor of f1 is the frame descriptor of the global environment
• length is not available in the frame descriptor of the global environment
• ... (these steps continue through the remainder of the search path until reaching the base environment)
• length is available, with a specific known value (the length primitive), in the base environment

Almost all non-local lookups in R code can be satisfied by only checking assumptions. The first time a lookup is executed, FastR collects the set of assumptions needed to guarantee that the lookup result remains valid. For subsequent executions, only the assumptions need to be checked, which does not incur any runtime cost as soon as the code is optimized.

If FastR detects that a symbol lookup is not stable and cannot be fulfilled using assumptions, it will fall back to doing the full lookup every time. This is a rare case in practice.

5.2 Lazy Evaluation

R uses a call-by-need lazy argument evaluation strategy – a promise (see Section 2), consisting of a code snippet and its evaluation environment, is used to compute a concrete value of an argument. The actual value of the promise is calculated (forced) as late as possible and only if it is needed. A promise is only forced once, and the computed value is cached within the promise.

A straightforward implementation of call-by-need, such as that of GNU R, is rather simple, but as other researchers [4] observed, it is typically less efficient than eager argument evaluation strategies, such as call-by-value. Creating promises and going through an additional level of indirection to acquire the actual value is a noticeable overhead, but for optimizing runtimes the main problem is that every point in the program that can potentially force a promise becomes a call site that can execute arbitrary code.

There are two basic approaches to bringing the performance of lazy evaluation closer to that of eager evaluation but, as we discuss in Section 7, they cannot be used with FastR. They either use static analysis or rely on language features to control side-effects, while FastR uses JIT compilation and R has largely uncontrolled, if infrequent, side effects.

The main idea underlying FastR's approach to implementing lazy evaluation is that not all promises are created equal, and different promise categories require different optimization approaches.

5.2.1 Categorizing Promises

We distinguish the following promise categories:

Eager promises represent arguments where a local variable is used as a parameter:

    x <- 17; foo(x)

Indirect ("promised") promises represent arguments that were not forced yet and are passed as parameters to subsequent function calls:

    bar <- function(x) { foo(x) }

Default ("complex") promises represent all other promises, which can contain arbitrary code that needs to be evaluated to obtain the concrete argument value:

    foo(x + bar(y))

Even though there are no systematic studies on what kinds of promises dominate in R code, the authors' experience working with R applications and standard libraries suggests that even such a simple classification creates opportunities for improving lazy evaluation performance, and this paper seeks to back up this hypothesis with experimental results.

5.2.2 Implementing Promises

FastR focuses its efforts on eager promises, since they provide the largest potential for optimizations: the overhead related to these promises can be removed almost completely. When FastR determines that the argument to a function call is an eager promise, that is, it passes along a local variable, it evaluates the local variable and stores the value of the argument in an eager promise. While this evaluation clearly has no side-effects, the eagerly evaluated value is correct at the point when the promise is forced only if the input variable involved in the argument value computation has not changed in the meantime (e.g., through use of the <<- super assignment operator). The eager value also cannot be trusted any more if functions that hand out references to the environment, like environment(), are used. We monitor these invariants by associating a Truffle assumption with each eager promise – if an input variable changes or the environment escapes, the assumption is invalidated, thus invalidating the pre-computed value. In this rare case, the value must be re-computed from the complete promise.

Whenever we need to actually evaluate an eager promise, we need its evaluation environment. Since storing the actual environment in eager promises would break many optimizations (e.g., Truffle frames can no longer be "virtualized" and turned into native stack frames), and thus make eager promises less efficient, FastR stores a marker that can be used to locate (by traversing the execution frame stack) the correct environment. If the eager promise assumption holds, the performance of eager promises is close to call-by-value, because all that is needed is a small amount of additional context for each parameter. On the other hand, on the slow path (after re-computation is triggered), evaluation of eager promises degenerates to that of default promises, and the call site that originally generated the eager promise is instructed to generate default promises.

The implementation of default promises is straightforward and conceptually similar to that of GNU R. A default promise contains a Truffle node representing the argument expression along with the associated (caller) environment. In this case FastR pays the cost associated with passing along the environment, but it can still accelerate the actual computation of the value at the point where the promise is forced. FastR creates inline caches for points in the code that repeatedly evaluate promises with similar expressions. This allows the optimizing Truffle runtime to incorporate the code for the specific expressions, thereby exposing more opportunities for further optimizations.

Indirect promises, while technically being an instance of eager promises (in that their creation does not involve any expensive operations, such as passing along the environment), in reality are simply wrappers around default and eager promises and their evaluation performance is that of the promise they are wrapping.

5.3 Function and Method Calls

From a certain point of view, R is a functional language in that functions in R are first-class citizens of the language – they can be stored in variables, passed as arguments, returned from other functions, etc. As described in Section 2, many language constructs that in other languages are purely syntactical (e.g., curly braces enclosing a code block) are expressed as function calls in R. Fortunately, since functions are stored in variables, function lookup proceeds similarly to the "generic" symbol lookup described in Section 5.1, with the difference being that non-function data structures are skipped during lookup. Consequently, if neither the variable storing the function nor the search path changes, the function definition to be used at a given call site remains constant.

The function calling procedure is non-trivial: not only are there many different function types, but R also uses a complicated algorithm for matching a function's parameters to its arguments. Therefore we specialize the function calling procedure based on the type of function being called and utilize caching of argument signatures⁷ to accelerate the argument matching procedure.

⁷ As opposed to the static notion of function signature, we use the term argument signature to describe a dynamic ordered list of arguments used for a given function invocation.

R language function types include "regular" functions implemented in R, builtin functions implemented in the host language of the R language implementation, and native functions implemented externally in one of the "native" languages (Fortran, C or C++). Different function types are called differently, and the code handling their invocations is carefully tailored to handle different cases. For example, invocations of native functions and of most of the builtin functions require their arguments to be evaluated (i.e., promises representing arguments to be forced), and certain types of builtin functions use a different simplified version of the argument matching algorithm.

This is additionally complicated by the fact that, in addition to being functional, R is also object-oriented, with the most popular object model, that is S3, being tightly integrated with the function execution machinery⁸.

5.3.1 S3 Object Model

As mentioned in Section 2, any R data structure, for example a vector, can have a list of attributes, that is name-value pairs, associated with it. S3 method dispatch is based on the content of one such attribute, called class. To illustrate how method dispatch interleaves with regular function invocations, consider the following example, where we first create a one-element vector and then define a "regular" function returning the content of this vector increased by 10:

    v <- structure(42)
    f <- function(x) { x + 10 }

Unsurprisingly, calling f on vector v, f(v), results in generating the value 52. However, after we redefine f to be a generic function:

    f <- function(x) { UseMethod('f') }

we can create a vector with a single class attribute and an S3 method (which is, for all intents and purposes, also an R function) that will be chosen by the runtime (via the UseMethod builtin) to execute if its argument has the same class attribute as the method's suffix (after the dot):

    v <- structure(42, class='foo')
    f.foo <- function(x) { x - 10 }

In this case, the same invocation uses the generic function, calls the defined method, and generates the value 32. This is called generic dispatch, and not only "regular" R functions can serve as generic functions, but also builtin functions, for example those representing vector access and arithmetic operations. This category of builtins implicitly performs a dispatch largely similar to UseMethod.

necessary, and function located via symbol lookup (often by analyzing assumptions only) can be used directly. In case a class attribute was found, the lookup progresses along a well-defined sequence of function names that need to be considered. For example, if the + operation is executed with arguments of class foo, then +.foo (generic dispatch) and Ops.foo (group generic dispatch since + belongs to the Ops group) need to be looked up, and only if none of these functions exists will the actual primitive operation be executed.

FastR builds an inline cache based on the classes that were seen before, so that the list of function names to be looked up can be precomputed. The lookups themselves can usually all be fulfilled using assumptions, so that the S3 dispatch can subsequently be executed very quickly.

5.3.2 Argument Matching

The argument matching algorithm in R is unusually complicated – it involves positional arguments, named arguments (with partial name matching), default arguments, and variadic arguments. The algorithm, which creates a plan for reordering the arguments from the argument signature at the call site to the formal signature of the target function, consists of four phases:

• Match named arguments that have an exact match in the formal signature.
• Match named arguments with a partial (but unambiguous) match in the formal signature.
• Fill the remaining formal arguments with the remaining actual arguments (positional matches).
• Put all remaining arguments into the variadic argument (expressed by ...) if present.

For example, consider the following function definition (..1 and ..2 retrieve the first and second argument from the list of variadic arguments):

    f <- function(a, b, carg, ..., d=6) {
      list(a, b, carg, ..1, ..2, d)
    }

The function invocation f(b=2, 1, c=3, 4, 5) will re-
sult in a list consisting of consecutive numbers between 1 R also supports group generic dispatch, where program- and 6: b and carg arguments are matched first (with c being mers can define methods that will be executed for groups of a partial, but unambiguous, match), followed by value 1 be- (builtin) functions instead of for just a single function as in ing matched to the first unmatched argument a, then values the example above. The function groups are predefined (e.g., 4 and 5 are matched to the variadic argument, and finally the functions representing arithmetic operators) and can be use- default value is used for argument d. ful when defining operations for custom data structures. In GNU R, the argument matching algorithm is run for When the UseMethod builtin function is called, or a prim- every function invocation. FastR optimizes argument match- itive operation with generic dispatch is encountered, FastR ing in several ways. Argument signatures, which consist of needs to perform S3 lookup. If the argument does not have an ordered list of argument names, are interned in a global a class attribute, then in most cases no additional action is data structure, so that comparing for equality of signatures is simply an identity comparison between signature objects. In 8 Another object model available in R, that is S4, is a much later and less tightly integrated addition. Even though we implement it in FastR, we will places where FastR performs the argument matching algo- not discuss it further as its performance is considered less critical to the rithm, it also creates an inline cache of argument and for- point of some companies discouraging its use in their code [8]. mal signatures. If a subsequent function invocations uses

the same signatures, the same argument permutation can be reused. An inline cache size of four elements is enough to elide falling back to the complex matching logic in all but an insignificant number of cases.

5.4 Vector Accesses

R’s main data types are vectors; there are no scalar numeric data types, that is, even the number 42 is a 1-element numeric vector. In order to make these vector data types as useful as possible, vector access operations are very powerful and support many different filtering and selection use cases. Elements of a vector can be extracted and replaced using dynamic single and multi-dimensional (for accesses to matrices and multi-dimensional arrays) indexes. Indexes can consist of multiple (for multi-element accesses) logical values, positive or negative integer values, or string values. Vector accesses are very common in R, which is why FastR goes to great lengths to optimize these operations using inline caching and specialization.

As a first level of optimization we split the operation using an inline cache for patterns that are usually stable for a particular vector access operation. For example, consider the following two-dimensional target vector (i.e., a matrix) vec with named columns and rows:

         col1 col2
    row1    1    3
    row2    2    4

and the following two-dimensional value vector val:

         col1 col2
    row1    7   42

In order to replace both values of the target vector in “row1” with the content of the value vector, we would use the vec["row1", 1:2] <- val replacement operation. For these kinds of vector replace operation, we cache for each combination of:

• type of the target vector (in this case – double)
• number of dimensions of the value vector (in this case – 2)
• types of values used for indexing (in this case – String for the first index and double for the second)
• type of the value vector (in this case – double)
• type of the replacement operation (R uses two different replacement operators, with slightly different semantics: [<- and [[<-)

Please note that the example is only one of many different vector access variants, and in order to make the code for all these combinations maintainable, we develop these operations against a set of generic interfaces. By injecting a concrete type from the inline cache, the Truffle partial evaluator is able to remove all expensive interface calls reliably in the optimized code.

The guards for the vector inline cache are based on how vector accesses are used in R, so that most vector operations have just a single entry in the inline cache (see Section 6.4 for detailed numbers). However, in cases where the operation is extremely polymorphic, changing types and dimensions continuously, we fall back to using the least recently used (LRU) caching strategy. The LRU cache does not become stable, so the operation will run in a more generic and less efficient mode in such cases.

FastR tries to keep the code for vector access operations as small as possible at runtime, by hiding expensive pieces of functionality behind small and efficient checks. If one of these checks fails, FastR will deoptimize and recompile to accommodate the new situation, but will not insert a new line into the inline cache. For example, FastR speculates on the vector length to always be constant if the vector length is below a certain threshold⁹. This exposes additional optimization opportunities, for example unrolling loops iterating over the content of a vector. If FastR encounters multiple target vector length values or values which are greater than the threshold, it assumes a dynamic value for the target vector length. We found the following list of aggressive specializations beneficial for the vector replace operation:

• target vector is not shared and can be reused
• constant length of target, index and value vectors if below a certain threshold
• constant number of positions selected if below or equal to a certain threshold
• dimension indices contain NA (“not available”) values
• has only positive or only negative vector indices
• has no zero vector indices
• for string based indexing: string indices mapped to the same integer indices
• has not seen any error condition (e.g., out of bounds)

⁹ Currently the threshold is experimentally set to 4.

In addition, many special checks are needed in order to provide exactly the same behavior as GNU R in corner cases. While these corner cases might seem unimportant, they are exercised by existing R code, and can lead to hard-to-debug problems if ignored.

6. Experimental Results

In our performance evaluation we demonstrate the effects of our lazy evaluation optimizations in Section 6.3. We also pitch FastR against the current state-of-the-art – in Section 6.5 we compare the following R implementations and configurations:

• GNU R “base” – a default configuration of GNU R using AST interpretation
• GNU R “BC” – a configuration of GNU R using the bytecode interpreter
• Renjin [3] – an alternative implementation of R executing R programs on top of a JVM
• TERR (TIBCO Enterprise Runtime for R) [22] – an alternative native implementation of the R language runtime in C/C++
• FastR

The reason for choosing Renjin and TERR for our performance comparison is that they are the most complete “production-ready” non-GNU R-based alternative R implementations, with all three platforms (including FastR) offering a similar level of R language semantics compatibility.

6.1 Benchmarks

To the best of our knowledge, there exists no official R benchmark suite and, additionally, many existing R applications rely on packages with native components that FastR cannot yet load and run. We then follow the route taken by other researchers [11] that used the shootout [6] and b25 [24] benchmark suites, and also modified the latter so that it aggregates final computation results¹⁰.

¹⁰ Otherwise, due to lazy evaluation, actual computation may be skipped.

The shootout suite benchmarks are R implementations of problems created for the Computer Language Benchmarks Game [6] and consist of small applications (such as binary tree manipulation or n-body simulation – see the benchmark description [6] for details), consisting mostly of R code and stressing different aspects of the language implementation. The b25 benchmark suite is divided into 3 subgroups consisting of 5 benchmarks each. The first two subgroups (matcal and matfunc) involve matrix calculations and spend the majority of time in builtin functions and in native code and are, as such, not particularly interesting from the perspective of improving performance of R code. We include the performance numbers for these two groups merely to indicate that they perform similarly in FastR and GNU R, even though in some cases the FastR implementation involves crossing the Java-to-native boundary. The last benchmark subgroup (prog) represents simple R computation tasks (such as matrix transposition or finding grand common divisors – see the benchmark description [24] for details).

In the plots, we abbreviate the names of the shootout benchmarks as follows: bt (binary-trees), fn (fannkuch-redux), fa (fasta), fr (fasta-redux), kn (k-nucleotide), ma (mandelbrot), nb (n-body), pd (pidigits), rd (regex-dna), rc (reverse-complement), and sn (spectral-norm). For b25 benchmarks we use mc, mf, and pr for the matcal, matfunc, and prog subgroups.

Figure 1. Promise statistics (y-axis: # promises / time; x-axis: shootout benchmarks bt to sn and b25 benchmarks mc1 to pr5).

Figure 2. Impact of lazy evaluation optimizations (y-axis: speedup over unoptimized, log scale; bars: EAGER ONLY, CACHE ONLY, EAGER & CACHE; x-axis: shootout benchmarks and b25 prog benchmarks, plus AVG and GEO).

6.2 Configuration

The machine we used for running our experiments is a 2.70GHz Intel Xeon E5-2697 v2 (Ivy Bridge) with 24 cores between two CPU sockets, featuring 264G RAM and running Oracle Linux Server 6.5 (a derivative of Red Hat Enterprise Linux Server 6.5), with Linux kernel version 2.6.32. Please
note that while the machine is parallel, both the R language and the benchmarks in both our suites are inherently sequential, and the benefits of a parallel architecture can only be observed if the language implementation supports implicit parallelism. Currently only Renjin claims existence of such support, with both GNU R and FastR focusing on sequential performance.

All benchmarks, except for the runs used to gather statistics, are executed within a common harness, with each benchmark application being executed in a separate process. For each benchmark application, the harness executes a certain number of “warmup” iterations followed by a “measurement” run. The intention here is to present the peak-performance numbers, since FastR is targeting long-running applications where warmup-time effects (such as compilation cost) are negligible when compared to total execution time. For all performance runs (Figure 2, Figure 3, and Figure 4) we ran each benchmark five times and plot an average over all runs with 95% confidence intervals. Since the amount of jitter was small, and since all these figures are plotted on a logarithmic scale, confidence intervals are only really visible in Figure 2. We use GNU R version 3.2.4, a version of FastR compatible with GNU R 3.2.4, Renjin version 0.8.2050, and TERR version 4.1.1. We use jdk1.8.0_60 for FastR and Renjin executions. We evaluate different R implementations in their default configurations, much like those that would be most likely used by end users, with the only exception being setting 6G max heap size for Renjin and FastR runs. We did, however, also run a configuration where GNU R, TERR, and FastR use the same version of native LAPACK and BLAS libraries¹¹ and the results were virtually the same as with the default configuration runs.

¹¹ Despite our best efforts we could not make Renjin use the native libraries.

6.3 Lazy Evaluation Optimizations

Lazy evaluation overheads are meaningful with respect to the total execution time of an application only if said application creates a significant number of promises that need to be maintained and evaluated. In particular, applications that spend most of the time in native code and/or executing builtin functions, such as those that belong to b25’s matcal and matfunc benchmark groups, are unlikely to suffer from lazy evaluation performance problems and, consequently, unlikely to benefit from lazy evaluation optimizations. In order to estimate optimization potential, we ran each benchmark only once, measured the number of promises created throughout its execution, and normalized this number with respect to the benchmark’s execution time. As we can see in Figure 1, the b25’s matcal (mc) and matfunc (mf) benchmark groups indeed provide almost no opportunity for performance improvement. Consequently, we report lazy evaluation optimization numbers only for the shootout benchmarks and the prog category of the b25 benchmarks.

In Figure 2 we present the speedup of different promise optimization configurations over the unoptimized case, plotted on a log scale (values greater than 1 indicate speedups and smaller than 1 indicate slowdowns). The right-most bar (EAGER & CACHE) represents the impact of both inline caching and of utilizing eager promises. We observe only a few minor slowdowns resulting from applying the optimizations, with speedups over the unoptimized implementation reaching over ∼8x and average performance doubled (with geometric mean at over ∼1.6x). The left-most (EAGER ONLY) bar represents a configuration that only enables eager promises and the middle bar (CACHE ONLY) represents a configuration that only enables inline caching – their analysis indicates that one size does not fit all. Sometimes the benefit of caching is more pronounced, as evidenced by shootout’s k-nucleotide (kn) benchmark, and sometimes it is eager promises that provide the majority of the speedup, as demonstrated by the execution times of shootout’s binary-trees (bt) benchmark.

6.4 Vector Access Inline Cache

For our vector access optimizations to be effective we rely on inline caches to remain stable. Therefore we counted the number of inline cache entries for replace and extract operations for the b25 and shootout benchmarks. In total we found 96 replace operations and 1182 extract operations. 74 (∼77%) of the replace and 1127 (∼95%) of the extract operations required only a single entry in the inline cache. We counted two entries for 16 (∼17%) replace and 43 (∼4%) extract operations, three entries for two (∼2%) replace and 6 (∼0.5%) extract operations and four entries for two (∼2%) replace and three (∼0.3%) extract operations. The main observation here is that when running the benchmarks, the inline cache has never overflowed (in fact, the fifth and last entry in the cache was unused for all operations), and none of the operations used in the benchmarks required the runtime to fall back to the LRU caching technique.

6.5 FastR vs the World

While understanding the impact of specific optimizations can be very useful, the real test of our system is its performance against the current state-of-the-art. Before we proceed to describing the performance numbers, it is important to re-iterate that the current main goal of the FastR project is to accelerate execution of R code – in particular, we defer further research on improving performance of native code to related work. In Figure 3 and Figure 4 we demonstrate the speedup of GNU R’s bytecode interpreter, Renjin, TERR and FastR over “base” GNU R, plotted on a log scale (again, values greater than 1 indicate speedups and smaller than 1 indicate slowdowns). As we can see, GNU R’s bytecode interpreter brings modest performance gains, with ∼1.7 average (∼1.5 geomean) speedup over “base” GNU R for shootout benchmarks and ∼1.2 average (∼1.1 geomean) speedup for b25 benchmarks. Performance of Renjin is unstable, with significant speedups for some benchmarks (such as b25’s prog-1 (pr1) and prog-2 (pr2) benchmarks in Figure 4), but also dramatic slowdowns, such as for b25’s prog-4 (pr4) benchmark in Figure 4. Renjin is also unable to run some benchmarks: shootout’s regex-dna (rd) benchmark and 3 out of 5 b25’s matcal (mc) benchmarks. Overall, Renjin performs worse than “base” GNU R on shootout benchmarks (∼0.7 average and ∼0.4 geomean) and its average speedup on b25 benchmarks is ∼3.3 (geomean – ∼1.3). Judging from results of benchmark executions, TERR appears to be the main competitor of FastR in terms of performance, with TERR’s average score on b25 of ∼20.1x average speedup (geomean at ∼7.0) and FastR’s average speedup of ∼15.7x (and geomean at ∼2.4). What works in TERR’s favor here are benchmarks (e.g., b25’s matcal (mc) and matfunc (mf) benchmarks) that contain mostly calls to builtin and native functions. TERR, as we suspect (it is closed source), heavily utilizes optimized custom implementations of builtins used by benchmarks and it clearly currently has an advantage over FastR in terms of crossing the native boundary. However, when considering the remaining benchmarks, not only does FastR perform better than TERR on average, but on shootout benchmarks TERR exhibits slowdowns with respect to “base” GNU R (∼0.6 average and ∼0.6 geomean). At the same time, FastR suffers no significant slowdowns, and some of its performance gains are considerable, reaching into thousands times speedup over “base” GNU R. The outliers clearly inflate FastR’s average performance gain (∼208.7x speedup on shootout and ∼15.7x speedup on b25 benchmarks), but even geometric means across the board (∼30.8x and ∼2.4 speedups, respectively) are arguably impressive considering that we support full R language semantics with all its idiosyncrasies.

Figure 3. Performance numbers for shootout benchmark suite (y-axis: speedup over GNU R “base”, log scale; bars: GNU R BC, RENJIN, TERR, FASTR; x-axis: shootout benchmarks, plus AVG and GEO).

Figure 4. Performance numbers for b25 benchmark suite (y-axis: speedup over GNU R “base”, log scale; bars: GNU R BC, RENJIN, TERR, FASTR; x-axis: mc1 to pr5, plus AVG and GEO).

In terms of larger R programs, our focus so far was on the internal production applications which we cannot describe and evaluate in detail at the moment, but which FastR executes up to 3x faster than GNU R.

7. Related Work

In addition to FastR, Oracle Labs is developing Truffle-based implementations for JavaScript and Ruby [27]. Truffle-based implementations of Python [26] and SOM [14], a Smalltalk-like language, have been developed by third-parties.

There are several alternate implementations of R in addition to FastR. TERR [22] is a closed source clean-room implementation of R (in C/C++) aimed at enterprise-grade applications. Renjin [3] is a JVM-based interpreter for R, and follows the path taken by several other languages, for example JRuby [10], in targeting the JVM bytecode interface. However, it is challenging to map semantics of languages that differ significantly from Java to the JVM bytecode interface, as noted by the JRuby project, which is now successfully experimenting with a Truffle-based solution.

In addition to “pure” GNU R, a few derivatives of this project now exist. One example is CXXR [19] – an effort started in 2008 to clean up the implementation by refactoring it into C++. After lying dormant for several years it has recently been picked up by Google and renamed rho [15], with a focus on improving memory management and performance using LLVM [12]. pqR [16] (“pretty quick R”) is another GNU R variant that tries to improve on performance by various interpreter modifications. The last category of alternative R implementations are research prototypes that implement only a portion of the full R language semantics. These include Purdue-FastR [17], a fast simplified R interpreter, and Riposte [21], an experimental R implementation that focuses on optimizing vector access operations.

Other frameworks aim to provide similar benefits to Truffle/Graal, the most notable being RPython [1], a statically typed subset of Python that uses trace-based JIT compilation to eliminate most of the interpreter overhead. RPython has been used to implement Python and other dynamic languages, for example JavaScript and Ruby.

There is also a large body of work on optimizing performance of lazy evaluations. Arguably the most relevant work (in particular, approaches relying on static analysis [5, 25] are ill-suited for FastR) was done in Haskell and attempts to bridge the gap between lazy evaluation and eager evaluation by optimistically trying eager evaluation [4, 13]. These approaches involve complicated and often expensive methods of restarting computations upon eager computation failure, but their main drawback from FastR’s perspective is that they rely on language mechanisms to control side-effects.

8. Conclusions

The R language, with its rich set of features and highly dynamic nature, presents unique challenges to language developers, and few traditional compiler techniques can be applied if compatibility with the complete set of dynamic language features is needed. In this paper, we showed how FastR uses the concepts of inline caching, assumptions and specialization extensively to extract the stability needed for efficient execution of such a dynamic language. Experimental results show that FastR greatly benefits from this – it can accelerate R code execution by orders of magnitude while keeping the original language semantics.

References

[1] D. Ancona, M. Ancona, A. Cuni, and N. D. Matsakis. RPython: A step towards reconciling dynamically and statically typed OO languages. In DLS 2007, 2007.
[2] R. A. Becker, J. M. Chambers, and A. R. Wilks. The new S language: a programming environment for data analysis and graphics. Chapman and Hall/CRC, 1988.
[3] BeDataDriven. Renjin, 2016. URL http://www.renjin.org/.
[4] R. Ennals and S. P. Jones. Optimistic evaluation: an adaptive evaluation strategy for non-strict programs. In ICFP 2003, 2003.
[5] K.-F. Faxén. Cheap eagerness: Speculative evaluation in a lazy functional language. In ICFP 2000, 2000.
[6] B. Fulgham and D. Bagley. The computer language benchmarks game, 2013. URL http://benchmarksgame.alioth.debian.org.
[7] Y. Futamura. Partial evaluation of computation process – an approach to a compiler-compiler. Higher Order Symbol. Comput., 12(4), 1999.
[8] Google. Google’s R style guide. URL https://google.github.io/styleguide/Rguide.xml.
[9] R. Ihaka and R. Gentleman. R: A language for data analysis and graphics. Journal of Computational and Graphical Statistics, 5(3), 1996.
[10] JRuby. JRuby: The Ruby programming language on the JVM, 2016. URL http://jruby.org.
[11] T. Kalibera, P. Maj, F. Morandat, and J. Vitek. A fast abstract syntax tree interpreter for R. In VEE 2015, 2015.
[12] C. Lattner and V. Adve. LLVM: A compilation framework for lifelong program analysis & transformation. In CGO 2004, 2004.
[13] J.-W. Maessen. Eager Haskell: resource-bounded execution yields efficient iteration. In ICFP 2002, 2002.
[14] S. Marr and S. Ducasse. Tracing vs. partial evaluation: Comparing meta-compilation approaches for self-optimizing interpreters. In OOPSLA 2015, 2015.
[15] K. Millar. rho, 2016. URL https://github.com/kmillar/rho.
[16] R. Neal. pqr, 2015. URL http://www.pqr-project.org/.
[17] Purdue. purdue-fastr, 2015. URL https://github.com/allr/purdue-fastr.
[18] . Revolution Analytics: a 5-minute history. In UseR!, 2014.
[19] A. Runnalls. CXXR, 2008. URL https://www.cs.kent.ac.uk/projects/cxxr/.
[20] L. Stadler, T. Würthinger, and H. Mössenböck. Partial escape analysis and scalar replacement for Java. In CGO ’14, 2014.
[21] J. Talbot, Z. DeVito, and P. Hanrahan. Riposte: A trace-driven compiler and parallel VM for vector code in R. In PACT 2012, 2012.
[22] TIBCO. TERR, 2016. URL https://docs.tibco.com/products/tibco-enterprise-runtime-for-r.
[23] R. Tsegelskyi and J. Vitek. TestR: generating unit tests for R internals. In UseR!, 2014. URL http://user2014.stat.ucla.edu/abstracts/talks/129_Tsegelskyi.pdf.
[24] S. Urbanek. R benchmark 2.5, 2008. URL http://r.research.att.com/benchmarks.
[25] P. Wadler and R. J. M. Hughes. Projections for strictness analysis. In Proceedings of the Functional Programming Languages and Computer Architecture, 1987.
[26] C. Wimmer and S. Brunthaler. ZipPy on Truffle: A fast and simple implementation of Python. In SPLASH Demo 2013, 2013.
[27] T. Würthinger, C. Wimmer, A. Wöß, L. Stadler, G. Duboscq, C. Humer, G. Richards, D. Simon, and M. Wolczko. One VM to rule them all. In SPLASH Onward! 2013, 2013.
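
Illustrative sketch for Section 5.3.1. The S3 lookup order described there (the specific method +.foo first, then the group generic Ops.foo, and only then the primitive operation) can be observed directly in GNU R. The snippet below is a minimal sketch, not code from FastR: the class name foo follows the paper's example, while the method bodies and the result names r_primitive, r_group, r_specific, and r_times are assumptions made purely for illustration.

```r
# Sketch of the S3 lookup order for `+` on an object of class "foo".
# Method bodies and result names are illustrative assumptions.

v <- structure(42, class = "foo")

# 1. Neither `+.foo` nor `Ops.foo` exists yet: the primitive `+` runs.
#    unclass() drops the class attribute so only the number remains.
r_primitive <- unclass(v + 10)

# 2. Only the group generic exists: `Ops.foo` handles every operator of
#    the Ops group; .Generic holds the name of the operator being called.
Ops.foo <- function(e1, e2) paste("Ops.foo handled", .Generic)
r_group <- v + 10

# 3. A specific method takes precedence over the group generic for `+`,
#    while other operators of the group still reach `Ops.foo`.
`+.foo` <- function(e1, e2) "+.foo handled +"
r_specific <- v + 10
r_times <- v * 2
```

Because the sequence of candidate names depends only on the operator and on the class attribute, a per-call-site cache keyed by the classes seen so far, as the section describes, can skip the repeated name construction and lookup.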

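
Illustrative sketch for Section 5.3.2. The four matching phases (exact names, partial names, positional, variadic) can be checked in GNU R with the function from that section. The snippet below is a minimal sketch: f is the paper's function, whereas the helper g and the names result and matched_names are hypothetical and exist only to expose the matching through the standard match.call() function.

```r
# The function from Section 5.3.2; ..1 and ..2 read the first and second
# variadic arguments.
f <- function(a, b, carg, ..., d = 6) {
  list(a, b, carg, ..1, ..2, d)
}

# b matches exactly, c partially matches carg, 1 fills a positionally,
# 4 and 5 go into `...`, and d falls back to its default value.
result <- f(b = 2, 1, c = 3, 4, 5)

# Hypothetical helper with the same formals: match.call() returns the call
# with the supplied arguments spelled out by their full formal names.
g <- function(a, b, carg, ..., d = 6) match.call()
matched_names <- names(as.list(g(b = 2, 1, c = 3, 4, 5)))[-1]
```

The reordering from call-site order to formal order depends only on the argument names at the call site and on the formals of the target function, which is why a permutation computed once for a given pair of signatures can be reused by later calls with the same signatures.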