MATJUICE: A MATLAB TO JAVASCRIPT STATIC COMPILER

by Vincent Foley-Bourgon

School of Computer Science McGill University, Montréal

A THESIS SUBMITTED TO THE FACULTY OF GRADUATE STUDIES AND RESEARCH IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF SCIENCE

©2016 Vincent Foley-Bourgon

Abstract

A large number of scientists, engineers, and researchers in fields as varied as physics, musicology, biology, and statistics use MATLAB daily as part of their work. These users appreciate the conciseness and expressiveness of the MATLAB language, the impressive number of powerful matrix operations and visualization functions, the easy-to-use IDE, and its interactive environment.

At the same time, the web platform keeps growing and innovating. At the center of this evolution is the JavaScript language. Though it was initially used only for simple tasks in web pages such as form validation, JavaScript is today the driving technology behind extremely powerful and complex applications such as Google Maps, the diagram tool draw.io, and the presentation tool Prezi.

One very desirable property of web applications is their universality; whether it’s the smart phone in our pocket, the laptop on our desk, or the powerful workstation in our lab, all these devices have a modern web browser that can execute an application on the web. The advantage for end-users is that they can use their favorite tools from the device of their choice and wherever they are without fear of compatibility issues. The developers of these applications also benefit by being able to deploy and update applications multiple times per day at a low cost.

MatJuice is a tool to connect MATLAB users to the web: it automatically translates MATLAB code into JavaScript. Scientists need not spend time manually converting their applications to JavaScript, nor become experts in web technologies to publish the fruit of their labor on the web. This thesis will present MatJuice, discuss the challenges of converting from one dynamic language to another, how to handle the differences in semantics, and how to make the output code fast.

Résumé

A large number of scientists, engineers, and researchers in fields as varied as physics, musicology, biology, and statistics use MATLAB in their daily work. These users appreciate the conciseness and expressiveness of the MATLAB language, the impressive number of matrix operations and visualization functions, and the ease of use and interactivity of the working environment.

At the same time, the web keeps growing and innovating. At the center of this dazzling evolution is the JavaScript language. Although it was initially used for simple tasks such as form validation, JavaScript is today the technology that drives powerful and complex applications such as Google Maps, the diagram editor draw.io, and the presentation tool Prezi.

A highly desirable quality of web applications is their universality; whether it is the smart phone in our pocket, the laptop on our desk, or the powerful workstation in our lab, all these machines have access to a modern browser from which applications can be run on the web. The advantage for users is that they can use their favorite tools from the device of their choice, wherever they want, without fear of compatibility problems. The developers of these applications also benefit by being able to deploy and update their applications several times a day at low cost.

MatJuice is a tool that connects MATLAB users to the web: it enables the automatic translation of MATLAB code into JavaScript. Scientists thus need not spend precious time manually converting their programs to JavaScript, nor become experts in web technologies to publish the fruit of their labor on the web. This thesis presents MatJuice and discusses the challenges of converting from one dynamic language to another, how to handle the differing semantics of the two languages, and how to make the generated code fast.

Acknowledgements

Thank you Prof. Laurie Hendren for helping me make this project as good as it could be. Thank you for your patience, your constant encouragement and for fostering a friendly and relaxed atmosphere in your lab.

Thank you Prof. Clark Verbrugge for taking time out of your sabbatical to review my thesis.

Thank you members of the Sable lab for the work we did together, and more importantly for the laughs we shared: Erick Lavoie, Hanfeng Chen, Prabhjot Sandhu, Rahul Garg, Faiz Khan, Sameer Jagdale, Lei Lopez, Valerie Saunders Duncan, Sujay Kathrotia, Vineet Kumar, Matthieu Dubet, Alex Krolik, Ismail Badawi, Xu Li.

Thank you students of COMP-520 and COMP-302 that I had the pleasure to T.A. for your hard work and for making me experience the most rewarding job I ever held.

Thank you Mom and Dad for your unconditional support and your love.

Table of Contents

Abstract
Résumé
Acknowledgements
Table of Contents
List of Figures

1 Introduction
    1.1 Contributions
    1.2 Organization

2 Background
    2.1 McLab
    2.2 Tamer
        2.2.1 TameIR
        2.2.2 Analysis framework
        2.2.3 Analyses
    2.3 JastAdd
    2.4 JavaScript
        2.4.1 Typed arrays
        2.4.2 sweet.js

3 MatJuice compiler structure
    3.1 Overview
    3.2 Data representation
    3.3 TameIR to JavaScript
        3.3.1 Functions
        3.3.2 Assignment statements
        3.3.3 Flow control statements
        3.3.4 Operators
    3.4 Standard library
        3.4.1 sweet.js

4 Points-to analysis
    4.1 Motivation
    4.2 Motivating example
        4.2.1 Input function parameters
        4.2.2 Aliasing statements
        4.2.3 Output function parameters
        4.2.4 Summary
    4.3 Points-to analysis
        4.3.1 Transformation
        4.3.2 Analysis-transformation loop
        4.3.3 Output Parameters Copy

5 Rich points-to abstraction
    5.1 Motivating example: part two
    5.2 Analysis components
    5.3 Implementation in MatJuice
        5.3.1 Representation of points-to sets
        5.3.2 Memory site cache
        5.3.3 New TameIR statement

6 Copy insertion
    6.1 Copy insertion
    6.2 Points-to-based Copy Insertion
        6.2.1 General process
        6.2.2 Local variables
        6.2.3 Output parameters
    6.3 Copying input parameters
        6.3.1 Analysis
        6.3.2 Transformation
        6.3.3 Possible improvements
    6.4 MatJuice implementation
        6.4.1 No implicit returns
        6.4.2 Local variables and output parameters
        6.4.3 Input parameters

7 Copy insertion evaluation
    7.1 Instrumentation and methodology
        7.1.1 Instrumentation of MatJuice
        7.1.2 Instrumentation of GNU Octave
        7.1.3 Benchmark suite
        7.1.4 Methodology
    7.2 Naive copy vs. copy insertion
    7.3 Comparison with Octave
    7.4 Conclusion

8 Performance evaluation
    8.1 Benchmarks
        8.1.1 Executing the benchmarks
    8.2 Experimental setup
    8.3 Results
        8.3.1 High-performance numeric routines
        8.3.2 Costly array accesses

9 Related Work
    9.1 Other MATLAB compilers
        9.1.1 MATLAB Coder
        9.1.2 FALCON
        9.1.3 Mc2For
        9.1.4 MiX10
        9.1.5 Differences with MatJuice
    9.2 Compilers targeting JavaScript
    9.3 Numerical libraries for JavaScript
        9.3.1 McNumJS
        9.3.2 Ndarray
        9.3.3 numeric.js

10 Conclusion and Future Work
    10.1 Conclusion
    10.2 Future work

Bibliography

List of Figures

3.1 Overview of the MatJuice workflow
3.2 Writing into an array
3.3 Reading from an array
3.4 Function calls
3.5 Copy statement
3.6 Translating return from MATLAB to JavaScript
3.7 If/else statement
3.8 While statement
3.9 For loops
3.10 Translation of for loops
3.11 Operator transformation
3.12 Function to operator mapping
3.13 Implementation of the plus() function between an array and a scalar
3.14 Macro-expanded implementation of plus()

4.1 Value vs reference semantics

5.1 Ordering of two sets of triples
5.2 Lattice for Listing 5.1

7.1 List of benchmarks
7.2 MatJuice without copy insertion (Naive) vs. MatJuice with copy insertion (CI)
7.3 Total and average size of copies
7.4 Number of array copies performed at run-time († The figures in this column come from [1])

8.1 Hardware and software details
8.2 Benchmark results (times in seconds)
8.3 Inlined translation of TIRStmtArrayGet
8.4 Inlined array accesses (times in seconds)
8.5 Profile outputs of bubble, makechange and matmul

Chapter 1

Introduction

MatJuice is a compiler project to bridge together two different worlds: the world of scientists and engineers who use MATLAB to solve numerical problems in their fields and the world of the web, where JavaScript is the lingua franca. MatJuice is a new backend for the McLAB [2] compiler framework and is able to translate MATLAB code into JavaScript code.

The vision for this project is to enable MATLAB users to make their programs available on the web without requiring them to become experts in web technologies or to port their code to JavaScript manually. Most consumer-level computing devices these days (smart phones, tablets, laptops, workstations) have a web browser, which allows their users to instantly access applications without having to install any other piece of software on their machine. This makes the web an excellent platform to make a project available to billions of potential users.

Another goal of this project is to leverage the countless hours of engineering that have gone into making JavaScript implementations as fast as possible. For example, it was demonstrated in Khan et al. [3] that numerical code in JavaScript that used typed arrays is on average no more than twice as slow as native code written in C.

MatJuice can also be used to integrate MATLAB components in a larger web project. For example, one may decide to use MATLAB to perform scientific computations and then use the D3.js [4] library to render the results in a web page.

Although one may think that a translation from one dynamic language to another, as is the case here, should be relatively easy and straightforward, one quickly discovers a number of issues that render the task more complicated than initially believed. The author of this thesis was certainly surprised by the number of difficulties that can be easily missed or overlooked. The main challenges encountered during the project were:

Value semantics: arrays have value semantics in MATLAB while in JavaScript they have reference semantics; this difference is so important and pervasive that we dedicate three chapters of this thesis to explaining how MatJuice addresses this issue correctly and with minimal loss of performance.

Standard library: the MATLAB standard library contains hundreds of functions for manipulating scalars, vectors, and matrices; many of these functions behave differently depending on the number of input arguments (e.g. the ones function creates a square matrix of 1s if passed only one argument; with two or more, the input arguments determine the dimension and shape of the matrix). Porting these functions to JavaScript can be difficult and tedious.

Performance issues: JavaScript was not conceived as a language to perform heavy numerical calculations and so it is important that the generated code and the library code use JavaScript features that can yield good performance for numerical code. One important way to get better performance for array computations is to use typed arrays.

Operational differences: a number of base operations have different behaviours in MATLAB and JavaScript. Two examples of such differences are: MATLAB arrays are one-indexed while JavaScript arrays are zero-indexed, and writing to an index out of bounds grows the array in MATLAB, whereas a JavaScript typed array has a fixed length and cannot grow in place.

JIT variance: JavaScript implementations have just-in-time (JIT) compilers that are sensitive to a number of parameters such as the number of local variables, the number of instructions in a function, etc. Generating code that is tuned for a given JIT is a hard problem, and one not tackled by MatJuice; we simply generate “normal-looking” JavaScript.
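To make the value-semantics challenge concrete, the following minimal sketch shows the behaviour that has to be bridged. It is plain hand-written JavaScript, not MatJuice output, and the variable names are purely illustrative.

```javascript
// In MATLAB, B = A followed by B(1) = 10 leaves A unchanged.
// In JavaScript, assignment only copies the reference to the array.
const a = new Float64Array([1, 2, 3]);
const alias = a;          // same underlying storage as `a`
alias[0] = 10;            // the write is also visible through `a`

// An explicit copy restores MATLAB-like value semantics.
const copy = a.slice();   // new typed array with the same contents
copy[1] = 20;             // does not affect `a`

console.log(a[0], a[1]);  // prints: 10 2
```

Copying on every assignment would be correct but wasteful, which is why deciding where a copy is really needed is the subject of the later chapters.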

1.1 Contributions

In addition to a working MATLAB-to-JavaScript compiler, the MatJuice project brings forth the following contributions:

Copy insertion transformation: MATLAB has value semantics, JavaScript has reference semantics; rather than copying arrays every time they are passed to functions or assigned to other variables, MatJuice is able to bridge the semantic gap of the two languages by inserting copies for array variables when necessary, resulting in code that can run faster.

Intra-procedural points-to analysis: the transformation above is made possible by a novel dataflow analysis that adopts an unusual trick: rather than considering the semantics of the source language, it computes points-to information using the semantics of the target language.

Copy insertion performance analysis: we investigate the performance impact of our copy insertion transformation against two other techniques for achieving copy semantics: a naive copy-everything approach (compile-time technique) and a copy-on-write approach (execution-time technique).

JavaScript MATLAB library: a number of important MATLAB functions have been manually re-implemented in JavaScript. JavaScript’s prototype-based object model is used to unify the treatment of scalars and arrays, while the sweet.js macro system is used to abstract over common array operations without incurring the cost of extra function calls.

Performance benchmarks: we compare the performance of MatJuice-generated JavaScript running in Firefox and Chrome with MathWorks’ MATLAB.


1.2 Organization

This thesis is organized as follows.

Chapter 2: we present the tools and technologies that MatJuice builds upon, including the McLAB project and the Tamer framework.

Chapter 3: we present the overall structure of the compiler; in particular we explain how the different TameIR statements are converted into JavaScript.

Chapter 4: we present a simplified version of MatJuice’s points-to analysis, a way to compute which variables might point to the same memory locations. This chapter is useful for readers who are not familiar with the semantic differences of MATLAB and JavaScript or with data-flow analyses. Readers knowledgeable in these topics may safely skip it.

Chapter 5: we present the complete points-to analysis that MatJuice uses; we explain the analysis using sets and discuss how and why maps have been used in the Java implementation.

Chapter 6: we explain how the points-to analysis results are used to insert copies in a MATLAB function when necessary in order to preserve value semantics.

Chapter 7: MatJuice’s copy insertion transformation is put to the test; we examine the speed-up gained over a naive approach. We also compare the number of copies performed during execution between MatJuice and Octave.

Chapter 8: we evaluate the performance of the programs produced by MatJuice in Firefox and Chrome and compare them against MathWorks’ MATLAB 2015b.

Chapter 9: we present some related work, notably other MATLAB backends and other projects that target JavaScript.

Chapter 10: we summarize the major points of this thesis and offer some suggestions of future work.

Chapter 2

Background

In this chapter we present some of the tools MatJuice uses and builds upon to compile a MATLAB program, and we discuss some of the design decisions we have made in implementing MatJuice. In Section 2.1 we present the McLAB project, and in Section 2.2 we look in more detail at one component of McLAB, namely the Tamer framework. Section 2.3 presents JastAdd, a domain-specific language for creating and manipulating abstract syntax trees. In Section 2.4 we talk about some of the salient JavaScript technologies used in MatJuice, in particular typed arrays, data structures available in recent JavaScript implementations that are useful for generating fast numerical code.

2.1 McLab

McLAB [2] is an umbrella project that regroups a large number of compiler-related projects for the MATLAB language. Among these are a front-end for scanning and parsing MATLAB code [5]; the high-level analysis framework McSAF [6]; the lower-level analysis framework Tamer [7]; the code generators Mc2For [8] and MiX10 [9] for generating Fortran and X10 respectively; AspectMatlab [10], a source-to-source compiler that extends MATLAB with aspect-oriented constructs; a suite of matrix-related analyses; McLabWeb, a web-based IDE [11]; etc.

MatJuice is part of this project and uses many of the components of McLAB:

• MatJuice uses McLAB’s front-end to scan and parse MATLAB source files and report errors if the source program is syntactically invalid;

• MatJuice uses the Tamer framework to implement its points-to analysis described in Chapter 4 and Chapter 5;

• MatJuice uses one of the numerous visitor patterns to implement its copy insertion transformation, described in Chapter 6;

• MatJuice re-uses some of the analyses created for other back-ends, such as the shape analysis (obtaining the size and dimensions of a matrix at a given program point), in its code generator;

• MatJuice uses the use-def analysis of the McSAF framework in the implementation of its input parameter copy described in Section 6.3.

Had those components not existed, the implementation of MatJuice would have been a much longer and harder endeavour. The Tamer framework, in particular, was instrumental in the creation of MatJuice.

2.2 Tamer

The Tamer framework is part of the McLAB project; it is such an important building block of MatJuice that it deserves its own section.

The two most important components of Tamer for the implementation of MatJuice are its 3-address code intermediate representation and its analysis framework.


2.2.1 TameIR

The Tamer intermediate representation, called TameIR, is a structured 3-address code representation; it is structured because, rather than being a linear list of instructions with labels and jumps for control flow, it is organized as a tree. TameIR was created with the explicit goal of making it easier to create static compiler back-ends, which is why we chose it for MatJuice. Most TameIR nodes actually extend (through the usual class inheritance mechanism) nodes from the McSAF AST. The main differences between this IR and the McSAF AST are that TameIR has fewer node types and that the children of its nodes are typically variables or constants rather than arbitrary expressions. This limited subset of the MATLAB language is especially attractive for writing back-ends: indeed, in MatJuice we need only these TameIR nodes to compile MATLAB programs into JavaScript:

• Functions

• Assignment statements

– Array set (e.g. A(i) = e)

– Array get (e.g. x = A(i))

– Call statement (e.g. [a b] = f(x))

– Literal assignment (e.g. pi = 3.14)

– Copy statement (e.g. x = y)

• If/else statements

• While loops

• For loops

• break and continue

• Return statements


Some MATLAB constructs are translated into more basic constructs when the TameIR representation of the program is created; for example, the switch/case statement becomes a series of if/else statements in TameIR, and short-circuit operators, such as logical “or” (||) and logical “and” (&&), are split into a series of if/else statements. Other statements are helpfully disambiguated. Notably, array accesses and function calls, which have the same syntax and share the same node type in the McSAF AST, are split into two different node types in TameIR: TIRArrayGetStmt and TIRCallStmt. This is extremely useful when generating code in a language where the syntax and semantics of those two constructs are different.
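The lowering of a short-circuit operator can be pictured with a small sketch. This is hand-written JavaScript rather than TameIR or actual Tamer output; the function name `loweredOr` and its thunk parameters are hypothetical and only serve to show that the right operand ends up inside an else branch.

```javascript
// Hypothetical illustration of how `c = left() || right()` can be
// expressed with if/else only, in a three-address style.
function loweredOr(left, right) {
  let c;
  const t0 = left();        // evaluate the left operand first
  if (t0) {
    c = true;               // the right operand is never evaluated
  } else {
    const t1 = right();     // evaluated only when the left operand is false
    c = t1 ? true : false;
  }
  return c;
}

let rightCalls = 0;
const result = loweredOr(() => 1, () => { rightCalls += 1; return 0; });
console.log(result, rightCalls);  // prints: true 0
```

Because each operand is bound to its own temporary, every statement in the lowered form has only variables or constants as children, which matches the shape of IR described above.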

2.2.2 Analysis framework

Tamer builds upon McSAF to provide visitors to traverse a TameIR program and perform data-flow analyses. It is the Tamer framework that is responsible for passing data forward or backward between nodes and for checking that a fixed point has been reached. A number of useful analyses have been implemented using this framework, including Mc2For’s shape analysis [12] (determining the size and dimensions of a matrix) and MiX10’s IntegerOkay analysis [9] (determining whether a floating-point variable can safely be replaced by an integer variable).

Tamer also provides simpler visitors (i.e. ones that do not transmit data-flow information between nodes) that are useful for transforming the IR, e.g. adding new nodes.

In MatJuice we use the Tamer analysis and transformation classes for two purposes:

• To perform our novel points-to data-flow analysis;

• To insert copies, when necessary, in the body of a function.

2.2.3 Analyses

MatJuice reuses a number of analyses written for Tamer.


Shape analysis: this inter-procedural analysis examines the source code of the input MATLAB program and determines the shape of the variables at each program point. This analysis can tell if a variable is a scalar or an array, and in the case of arrays can tell us the number of dimensions and the size of each dimension. The result of this analysis is used heavily in the MatJuice code generator as will be explained in Chapter 3.

Range analysis: this analysis allows the compiler writer to know statically the possible values that a variable may contain. The MatJuice code generator uses this information to generate for loops, as explained in more detail in Section 3.3.3.

2.3 JastAdd

JastAdd [13] is a tool for specifying and creating abstract syntax trees (AST). The programmer describes the structure of the AST using a declarative domain-specific language similar to EBNF, and a command-line tool consumes this specification and generates Java source files that implement it. The resulting Java code contains methods for creating new nodes, modifying existing nodes, listing children nodes, inserting/deleting nodes in the tree, etc. The MatJuice code generator traverses the TameIR representation of the program and uses the methods provided by JastAdd to create a semantically equivalent JavaScript AST.

JastAdd also provides a mechanism to add custom methods to the AST nodes; we use this mechanism to implement a pretty printer. Once the TameIR representation has been converted into a JavaScript AST, we can emit JavaScript code to a file by using our custom pretty printer.

2.4 JavaScript

MatJuice uses JavaScript as its target language; the output code is mostly regular JavaScript code of the kind a programmer would write by hand. Section 3.3 gives more details on the translation from TameIR to JavaScript. We do want to point out two JavaScript technologies that are important when generating the output program: typed arrays and sweet.js.

2.4.1 Typed arrays

In JavaScript, arrays are actually objects: an array is an object containing property slots named 0, 1, ..., n − 1. The array can be resized by adding or removing some of these properties, and there is no guarantee that the elements are contiguous in memory. This makes for a flexible data structure at the cost of being less efficient both in memory and in processing time.

Recently, many JavaScript implementations have incorporated typed arrays, which are similar to arrays found in languages like C or Pascal: all the elements have the same type (i.e. same width) and are contiguous in memory. Studies have shown that typed arrays can significantly improve the performance of numerical code [3]. We have thus decided to use them in MatJuice. All non-scalars are represented by a one-dimensional typed array; a property called mj_size records the number of rows and columns. Row and column vectors are represented as 2D matrices and have the dimensions [1,n] and [n,1] respectively. As in MathWorks’ MATLAB, matrices in MatJuice are represented in column-major format.
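As a rough illustration of this layout, the sketch below stores a 2 × 3 matrix in a one-dimensional Float64Array in column-major order. The helper `fromRows` is hypothetical, and attaching the shape as a plain `mj_size` property is an assumption made for the example; the actual MatJuice library may record the shape differently.

```javascript
// Hypothetical helper: build a column-major typed array from nested rows.
function fromRows(rows) {
  const nrows = rows.length;
  const ncols = rows[0].length;
  const data = new Float64Array(nrows * ncols);
  for (let j = 0; j < ncols; j++) {
    for (let i = 0; i < nrows; i++) {
      // Elements of the same column are adjacent in memory.
      data[j * nrows + i] = rows[i][j];
    }
  }
  data.mj_size = [nrows, ncols]; // assumed way of recording the shape
  return data;
}

const m = fromRows([[1, 2, 3],
                    [4, 5, 6]]);
console.log(Array.from(m)); // [1, 4, 2, 5, 3, 6]
```

With this layout a row vector built from `[[7, 8, 9]]` has the shape [1, 3], in line with the convention described above.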

2.4.2 sweet.js

Sweet.js [14] is a hygienic macro system for JavaScript. The MatJuice standard library makes heavy use of macros in order to abstract away repetitive code without incurring the cost of a function call. For example, a number of MATLAB builtin functions accept an array and a scalar and apply the operation between every element of the array and the scalar; by using macros rather than functions, we avoid the overhead of function calls. Section 3.4.1 has more details on this.

Chapter 3

MatJuice compiler structure

In this chapter we discuss how a MATLAB program becomes a JavaScript program. Section 3.1 presents a high-level overview of MatJuice and how its components interact. Section 3.3 explains how the different TameIR statement types are translated into JavaScript. Section 3.4 presents the MatJuice standard library, how it is structured, and how it is implemented.

3.1 Overview

MatJuice has a pipeline architecture as shown in Figure 3.1; in that diagram the square boxes represent processes, the round boxes represent data, and the paper-like boxes represent on-disk data. The shaded boxes are sub-systems of the compiler chain that are part of MatJuice. This diagram is quite large and complex, so let’s look at the main steps that a MATLAB program goes through before it becomes a JavaScript program.

Figure 3.1 Overview of the MatJuice workflow

Scanning and parsing: The user program (called foo.m in the diagram) is fed into the McLAB front-end. The front-end will scan and parse this program, along with its dependencies; e.g. if the function foo calls a function bar, the front-end will also scan and parse bar.m. This process creates an in-memory representation of the program, the McSAF AST.

Tamer: The McSAF AST is then passed to Tamer, which produces two data structures: (1) the TameIR for the program, (2) analysis results for each program point.

Points-to analysis: Each TameIR function is analyzed individually and points-to information is computed at every program point.

Copy insertion: The copy insertion transformation takes the original TameIR function along with its points-to information and inserts explicit copies of arrays where they are necessary.

Code generation: This new TameIR function and the shape and range information are passed to the code generator to generate an in-memory JavaScript AST. The structure of this AST was defined in Javascript.ast and the JastAdd tool created the corresponding Java class definitions.

Pretty printer: The JavaScript AST is passed to the pretty printer to emit the final JavaScript code; the pretty printer is written as an aspect over the JavaScript AST.

Standard library: The MatJuice standard library is written in sweet.js, a DSL that adds macros to JavaScript. The standard library is passed into the sjs command-line tool to generate the expanded JavaScript code.

Concatenation: The code for the standard library and the JavaScript code for the original MATLAB program are concatenated together and written into the file foo.js.

3.2 Data representation

In MATLAB, scalars and arrays are the same thing: a scalar is actually a 1 × 1 matrix and it can be handled using the same array operations (e.g. indexing) that are used on larger matrices. When translating MATLAB into JavaScript, one viable strategy would be to follow this approach and use typed arrays for all data types.

We did a few performance experiments at the beginning of the project and found that this approach, though simple, would incur a very noticeable performance hit over using regular JavaScript numbers. The slowdowns depended on the programs, but on average the implementation using typed arrays for scalars was around two times slower.

We have decided in MatJuice that scalars would be represented using JavaScript’s Number data type and non-scalars would be represented using typed arrays. The choice of JavaScript’s Number class to represent MATLAB scalars is quite natural since both data types respect the IEEE-754 specification for floating-point numbers [15, 16]. We use JavaScript’s object prototype system to implement array-like methods on numbers (e.g. one can write x = 3; x.mj_size();) so that even if we use two different data structures to represent arrays and scalars, they can be manipulated uniformly in the standard library.
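The idea can be sketched as follows. The method name `mj_size` comes from the text above, but the method bodies, the `shape` property, and the `numel` helper are assumptions made for illustration rather than the actual library code.

```javascript
// Scalars: every JavaScript number behaves like a 1 x 1 matrix.
Number.prototype.mj_size = function () {
  return [1, 1];
};

// Arrays: report the shape stored alongside the data
// (the `shape` property is hypothetical).
Float64Array.prototype.mj_size = function () {
  return this.shape;
};

// Library code can now query the shape without testing the type first.
function numel(v) {
  const size = v.mj_size();
  return size.reduce((acc, n) => acc * n, 1);
}

const x = 3;
const A = new Float64Array(6);
A.shape = [2, 3];
console.log(numel(x), numel(A));  // prints: 1 6
```

The design choice here is that the dispatch on "scalar or array" is done by the prototype chain instead of by explicit type tests scattered through the library.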


An important limitation of MatJuice in its current state is that it only supports double-precision floating point numbers. The JavaScript language has no support for single-precision floating point numbers nor for integer types; we have therefore concentrated our efforts on supporting double-precision floating point numbers, the default data type of MATLAB. A future version of MatJuice should include support for different numerical types.

3.3 TameIR to JavaScript

In this section we describe how the different TameIR statements are translated into JavaScript by the MatJuice code generator. For a number of nodes (e.g. break and while), the translation is quite simple and direct, while for other nodes (e.g. return and for) the generator needs to do more work to accommodate the semantic differences between the two languages.

3.3.1 Functions

Function declarations are a little different in MATLAB and in JavaScript. Here are some of the notable differences:

• Local variables in JavaScript need to be declared with the var keyword; otherwise, assigning to them creates a global variable.

• MATLAB declares the names and the number of output values for the function; JavaScript has no such declaration.

• If control flow falls off the end of the function body, in MATLAB the output parameters are returned to the caller; in JavaScript, falling off the end of the function yields no value back to the caller.

MatJuice addresses all these issues. During the generation of a function’s body, the local variables and temporaries (variables introduced by Tamer) are stored in a set; once the statements of the body have been generated, a declaration for each variable in the set is added at the beginning of the function.

When the code generator reaches a return statement, it examines the output parameter declaration of the current MATLAB function and generates the proper return statement. The calling conventions of MatJuice are explained in more detail in the upcoming sections on the translation of function call nodes and return statement nodes.

Finally, we apply one last small transformation to ensure that the control flow of a JavaScript function never returns by falling off the end of the body: once the MatJuice code generator has finished creating the code for the function’s body, it adds a return statement at the end of the function to ensure that the function always returns the output parameters to the caller. MatJuice does no analysis to determine if that statement can be reached and in some cases this could be dead code.
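Put together, these adjustments might produce output of the following shape. The MATLAB function in the comment and the JavaScript below are a hand-written illustration with made-up names, not actual MatJuice output; in particular, the scalar operations are written with native operators for brevity.

```javascript
// MATLAB source (illustrative):
//   function y = clamp_positive(x)
//     if x < 0
//       y = 0;
//       return;
//     end
//     y = x;
//   end
function clamp_positive(x) {
  var y;            // declaration emitted at the top of the function
  if (x < 0) {
    y = 0;
    return y;       // MATLAB `return` yields the declared output parameter
  }
  y = x;
  return y;         // appended so control never falls off the end
}

console.log(clamp_positive(-2), clamp_positive(5));  // prints: 0 5
```

In this example the trailing return is reachable; in a function whose every path already ends in a return, the same appended statement would be dead code, as noted above.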

3.3.2 Assignment statements

Array set

MATLAB users can modify arrays by writing to a single cell or to a sub-array. These assignments are transformed into method calls in JavaScript as shown in Figure 3.2. The mj_set method call is used when all the indices are scalars. Its first argument is the expression to store in the array, and its second argument is a list of indices. We use a list of indices to make mj_set more general and usable in multiple dimensions, rather than having a specialized version for the lower dimensions. The method mj_set is responsible for converting the one-based indices into a zero-based linear array access, for reporting an error if the index is negative, and for resizing the array if the computed index is greater than the current length of the array. The mj_slice_set method is used if at least one index is a colon expression, i.e., if the user wants to write multiple values in the array at once. The MATLAB colon operator is translated into the function call mc_colon in JavaScript. We implement mj_slice_set by making repeated calls to mj_set. It is Tamer’s shape analysis that allows the code generator to determine which method call to generate.

    % Matlab             // JavaScript
    A(i) = e        ===> A.mj_set(e, [i]);
    A(i, j) = e     ===> A.mj_set(e, [i, j]);
    A(i:j) = e      ===> A.mj_slice_set(e, [mc_colon(i, j)]);

Figure 3.2 Writing into an array
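The responsibilities attributed to mj_set can be sketched as follows. This is only an illustration under stated assumptions, not the MatJuice implementation: the SketchArray class, its fields, the column-major layout (chosen because it is MATLAB's layout) and the zero-padding growth policy are all hypothetical, and the sketch rejects every index below 1.

```javascript
// Hypothetical stand-in for a MatJuice array: a flat buffer plus a shape.
// Column-major storage is an assumption made for this sketch.
class SketchArray {
  constructor(shape) {
    this.shape = shape.slice();
    const numel = shape.reduce((a, b) => a * b, 1);
    this.data = new Array(numel).fill(0);
  }

  // Convert a list of one-based indices into a zero-based linear offset.
  linearIndex(indices) {
    let offset = 0;
    let stride = 1;
    for (let k = 0; k < indices.length; k++) {
      if (indices[k] < 1) {
        throw new RangeError("index must be a positive integer");
      }
      offset += (indices[k] - 1) * stride;
      stride *= this.shape[k] !== undefined ? this.shape[k] : 1;
    }
    return offset;
  }

  // Sketch of mj_set: store the value, growing the buffer when needed.
  // Simplification: the buffer is padded with zeros and shape is left as-is.
  mj_set(value, indices) {
    const offset = this.linearIndex(indices);
    while (this.data.length <= offset) {
      this.data.push(0);
    }
    this.data[offset] = value;
    return this;
  }
}
```

With this sketch, a write such as A(2, 3) = 7 on a 2 × 3 array would be expressed as `new SketchArray([2, 3]).mj_set(7, [2, 3])`.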

Array get

The dual of the array write operation is the read statement; MATLAB users can either read a single element or a slice of elements from an array. The result of the read operation is stored in a variable. This statement is transformed into a method call in JavaScript as shown in Figure 3.3. Similarly to array writes, the method mj_get is used when all indices are scalars and mj_slice_get is used when at least one index is a colon expression; the latter is implemented by making repeated calls to mj_get. These methods report an error if the computed index is out of bounds.

    % Matlab             // JavaScript
    x = A(i)        ===> x = A.mj_get([i]);
    x = A(i, j)     ===> x = A.mj_get([i, j]);
    x = A(i:j)      ===> x = A.mj_slice_get([mc_colon(i, j)]);

Figure 3.3 Reading from an array
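As an illustration of how a slice read can be built from repeated element reads, consider the following sketch. It is restricted to one-dimensional arrays, and the SketchVector class and the bodies of mc_colon, mj_get and mj_slice_get are assumptions made for the example; only the names come from the text.

```javascript
// Hypothetical helper: inclusive range lo..hi with step 1, like MATLAB's lo:hi.
function mc_colon(lo, hi) {
  const out = [];
  for (let v = lo; v <= hi; v++) out.push(v);
  return out;
}

// Sketch of a one-dimensional array wrapper (an assumption for this example).
class SketchVector {
  constructor(values) {
    this.data = values.slice();
  }

  // Read one element using a one-based index; report out-of-bounds reads.
  mj_get(indices) {
    const i = indices[0];
    if (i < 1 || i > this.data.length) {
      throw new RangeError("index out of bounds: " + i);
    }
    return this.data[i - 1];
  }

  // Read a slice by calling mj_get once per selected index.
  mj_slice_get(indexLists) {
    const picked = indexLists[0].map((i) => this.mj_get([i]));
    return new SketchVector(picked);
  }
}
```

Here the MATLAB statement x = A(2:3) would correspond to `x = A.mj_slice_get([mc_colon(2, 3)])`.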

Function calls

MATLAB function calls are represented in TameIR by a single node, TIRCallStmt. Among the children of this node is a list of identifiers for the left-hand side. In MatJuice, we produce a different code template depending on the number of identifiers in that list as shown in Figure 3.4; by specializing calls with zero or one lvalues, we avoid the creation of an array object and the extraction of its components, thus making those two common cases more efficient. This calling convention is supported by our translation of the return statement described in the next section.

    % Matlab             // JavaScript
    f(x)            ===> f(x);
    a = f(x)        ===> a = f(x);
    [a b] = f(x)    ===> temp0 = f(x);
                         a = temp0[0];
                         b = temp0[1];

Figure 3.4 Function calls

Copy statement

TameIR has one copy statement, the node type TIRCopyStmt; in MatJuice we have created a subclass of this node and called it MJCopyStmt. The code generator considers a TIRCopyStmt to be an assignment by reference and MJCopyStmt does an assignment by value.

In MATLAB, variables are copied by doing an element-by-element copy of the right-hand side of an assignment; sometimes, doing a full copy is unnecessary and we can just copy a pointer instead. It is the responsibility of the points-to analysis and copy insertion transformation, described in Chapter 5 and Chapter 6 respectively, to select the appropriate node type and insert it into the TameIR representation of the function’s body. The code generator itself simply translates each node type as shown in Figure 3.5.

    % Matlab        // JavaScript
    x = y      ===> x = y;              // TIRCopyStmt
    x = y      ===> x = y.mj_clone();   // MJCopyStmt

Figure 3.5 Copy statement

3.3.3 Flow control statements

Break and continue statements

The MATLAB control flow instructions break and continue have the same semantics as in JavaScript: the former exits the current loop while the latter terminates the current iteration. We simply generate them as-is in JavaScript.

Return statement

The return instruction works differently in MATLAB and in JavaScript. In JavaScript there are two forms:

• return: return to the caller yielding no value;

• return expr: return to the caller, yielding the value of expr.

In MATLAB, return never takes an argument; its behaviour depends on the signature of the function. If the function has defined no output parameters, the function returns and no value is yielded back to the caller. If the function has one or more output parameters, the function returns and the values contained in the output parameter variables are yielded back to the caller.

Figure 3.6 shows three MATLAB patterns and their translation in JavaScript. The patterns for zero or one output parameters would be covered by the template emitted for multiple return values; however, we have specialized them to avoid the cost of creating an array object (and unpacking this array on the caller side) when it’s unnecessary. This calling convention is supported by the function call statement described in the previous section.

    % Matlab                      // JavaScript
    function f(x)            ===> function f(x) {
        ...                           ...
        return;                       return;
    end                           }

    function y = f(x)        ===> function f(x) {
        ...                           ...
        return;                       return y;
    end                           }

    function [y z] = f(x)    ===> function f(x) {
        ...                           ...
        return;                       temp0 = [y, z]
    end                               return temp0;
                                  }

Figure 3.6 Translating return from MATLAB to JavaScript
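The calling convention can be exercised end to end with a small hand-written example. The functions logValue, square and divmod below are hypothetical and only follow the three templates of the figure; they are not output produced by MatJuice.

```javascript
// Zero outputs: plain return, nothing for the caller to unpack.
function logValue(x) {
  return;
}

// One output: the value is returned directly, no array is allocated.
function square(x) {
  var y = x * x;
  return y;
}

// Two outputs: the values are packed into an array before returning ...
function divmod(a, b) {
  var q = Math.floor(a / b);
  var r = a - q * b;
  var temp0 = [q, r];
  return temp0;
}

// ... and the caller unpacks them, as for the MATLAB call [q r] = divmod(17, 5).
var temp1 = divmod(17, 5);
var q = temp1[0];
var r = temp1[1];
```

Only the multiple-output case pays for an array allocation and two indexed reads, which is the cost the specialized templates avoid in the other two cases.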

If/else statement

A MATLAB if/else construct is transformed into an equivalent JavaScript if/else statement. The translation is shown in Figure 3.7. The if/elseif/else statement of MATLAB does not exist in TameIR as it is transformed into a cascade of equivalent if/else statements. If the condition is an array in MATLAB, the test acts as a for-all construct; we translate to JavaScript by making a function call that performs the same test.

While loop

A MATLAB while loop has the same semantics as in JavaScript; if the condition expression is true, the body of the loop is executed. MatJuice simply transforms the loop into a JavaScript while statement. Similarly to the if statement before, if the condition is an array, we call mj_forall to assert that all the elements are non-zero.

    % Matlab          // JavaScript
    if scalar    ===> if (scalar) {
        ...               ...
    else              } else {
        ...               ...
    end               }

    if array     ===> temp = mj_forall(array);
        ...           if (temp) {
    end                   ...
                      }

Figure 3.7 If/else statement

    % Matlab           // JavaScript
    while scalar  ===> while (scalar) {
        ...                ...
    end                }

    while array   ===> while (mj_forall(array)) {
        ...                ...
    end                }

Figure 3.8 While statement
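The for-all test used for array conditions in if and while statements might look like the sketch below. Only the name mj_forall comes from the text; the body is an assumption that works on a flat JavaScript array of numbers and follows the MATLAB rule that an empty array counts as false.

```javascript
// Sketch of a for-all test: true only when the array is non-empty and
// every element is non-zero. The body is assumed, not taken from MatJuice.
function mj_forall(array) {
  for (let i = 0; i < array.length; i++) {
    if (array[i] === 0) return false;
  }
  return array.length > 0;
}

// Usage in the shape of the while translation: the loop runs while every
// element of `remaining` is non-zero.
var remaining = [2, 1, 3];
var iterations = 0;
while (mj_forall(remaining)) {
  remaining = remaining.map(function (v) { return v - 1; });
  iterations += 1;
}
```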

For loop

The translation of a for loop from MATLAB to JavaScript requires special attention: the semantics of for in the two languages are different and an implementer must be careful to preserve the MATLAB semantics in the translation. In this section, we will look at the different forms a MATLAB loop can have, what the semantics are in each case, and how MatJuice does the translation to JavaScript.

MATLAB for loops

For loops in TameIR have two syntactic forms: in the simple one, the user specifies only the lower and upper bound; in the complete form, the user specifies the bounds and also an increment. In the simple form, the increment has an implicit value of 1. Note that in MATLAB loops can have more forms, but the translation from the McSAF AST to TameIR normalizes all loops to those two forms.

    % Simple for loop
    for i = 1:10
        disp(i);
    end
    % Output: 1 2 3 4 5 6 7 8 9 10.

    % Complete for loop
    for i = 0:4:15
        disp(i);
    end
    % Output: 0 4 8 12.

Figure 3.9 For loops

The code examples in Figure 3.9 illustrate some of the issues that a compiler writer must address.

• The lower and upper bounds define a closed interval, e.g. in the simple loop above, i goes from 1 to 10 inclusively;

• In the simple form, if the upper bound is smaller than the lower bound, the loop body is never executed;

• In the complete form, if the increment is 0, the body of the loop is never executed;

• In the complete form, if the lower bound is greater than the upper bound and the increment is positive, the loop is never executed;

• In the complete form, if the lower bound is smaller than the upper bound and the increment is negative, the loop is never executed;

• In the complete form (lo : step : hi), if there does not exist an integer k such that lo + k · step = hi, the iteration variable never takes the value hi: the loop ends after the last value lo + k · step that does not go past hi.

Statically-known increment

In the previous section, we categorized loops as being simple or complete; we can also categorize loops along another axis, namely whether the value of the increment is known statically. The value of a statically-known increment can be inferred from the source code alone. Knowing the value of the increment statically allows the compiler to select the correct comparison operation; for instance, if it is known that the increment is positive, the code generator can output the comparison operator <=.

If the value of the increment is not known at compile-time, a series of extra tests is generated in the JavaScript code to select the appropriate comparison operation. If this occurs inside a tight loop, the extra runtime cost can be significant.

Translation

The translation of a for loop with a statically-known increment is straightforward, as the code generator is able to choose the correct comparison operation. Figure 3.10 shows the translation of a simple for loop where the increment is 1. In the same figure, the value of the complete loop’s increment is not known statically and the code generator must produce extra code to select the proper comparison function.
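The run-time selection of the comparison can be checked against the loop semantics listed earlier with a small sketch. The helper loopValues is hypothetical: it wraps the translation scheme for a statically-unknown increment in a function and records the values taken by the loop variable. Floating-point subtleties of non-integer increments are ignored here.

```javascript
// Collect the values taken by the loop variable of `for i = lo:step:hi`,
// selecting the comparison at run time because the step is not known statically.
function loopValues(lo, step, hi) {
  var cmp = function (x, y) { return x <= y; };           // positive step
  if (step === 0) cmp = function () { return false; };    // body never runs
  if (step < 0) cmp = function (x, y) { return x >= y; }; // counting down
  var seen = [];
  for (var i = lo; cmp(i, hi); i = i + step) {
    seen.push(i);
  }
  return seen;
}
```

For instance, `loopValues(0, 4, 15)` yields the values 0, 4, 8 and 12, and `loopValues(10, 1, 0)` yields no values at all, in agreement with the cases enumerated above.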

    % Simple loop with a statically-known increment
    for i = 1:10     ===> for (i = 1; i <= 10; i = i+1) {
        disp(i);              disp(i);
    end                   }

    % Complete loop with a statically-unknown increment
    for i = 10:x:0   ===> cmp = function(x,y) { return x <= y; }
        disp(i);          if (x === 0) cmp = function() { return false; }
    end                   if (x < 0) cmp = function(x,y) { return x >= y; }
                          for (i = 10; cmp(i, 0); i = i+x) {
                              disp(i);
                          }

Figure 3.10 Translation of for loops

3.3.4 Operators

The operators in MATLAB such as + or * are actually syntactic sugar for functions; indeed 3 + x and plus(3, x) are completely equivalent. These operations are represented in TameIR using function call nodes. However, if during code generation we find a call to a function for which an operator exists in JavaScript, and the arguments of that call are scalars, we replace the function call with a unary or binary operator as shown in Figure 3.11. Empirical testing showed that inlining the operators improved performance. The full list of functions that are translated to JavaScript operators is available in Figure 3.12.

    x = plus(y, z)  ===> x = y + z   // if y and z are scalars

Figure 3.11 Operator transformation

    MATLAB function    JavaScript operator
    plus               +
    minus              -
    times              *
    mtimes             *
    mrdivide           /
    rdivide            /
    le                 <=
    lt                 <
    ge                 >=
    gt                 >
    eq                 ===
    ne                 !==
    uminus             -
    not                !

Figure 3.12 Function to operator mapping

3.4 Standard library

The MatJuice standard library consists of a number of MATLAB functions ported, by hand, to JavaScript. MATLAB functions often accept either scalar or array input parameters and behave differently based on the shapes; for example, the MATLAB plus works as follows:

• If the two arguments are scalars, their sum is returned;

• if one argument is a scalar and the other is an array, the scalar is added to every element of the array;

• if the two arguments are arrays, their elements are summed pair-wise; an error is reported if the two arrays’ shapes differ.

MatJuice deals with this issue in two ways: (1) when a function is quite common, e.g. arithmetic and comparison operations, we create a specialized version for all the possible shape combinations. For instance, in the case of plus we have a function for the pair scalar/scalar, one for the pair scalar/array, one for array/scalar, and one for array/array. These are named plus_SS, plus_SM, plus_MS, and plus_MM respectively. (2) For other, less common functions, we have a single function and the function examines at runtime the shape of its input arguments and proceeds accordingly. The code generator has a list of built-ins that should be specialized and generates the correct call expression when reaching a function.

By proceeding in this manner, we allow ourselves to have better performance for very common functions (e.g. arithmetic and comparison operations), while not requiring that all functions be implemented this way.
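The two strategies can be contrasted with a short sketch. The names plus_SS, plus_SM, plus_MS and plus_MM come from the text; their bodies, the use of plain JavaScript arrays, and the function plus_generic are assumptions made for illustration.

```javascript
// Specialised versions: the shapes are known at compile time, so the body
// does not need to test them. Arrays are plain JavaScript arrays here.
function plus_SS(x, y) { return x + y; }
function plus_MS(m, x) { return m.map(function (v) { return v + x; }); }
function plus_SM(x, m) { return plus_MS(m, x); }
function plus_MM(a, b) {
  if (a.length !== b.length) throw new Error("shape mismatch");
  return a.map(function (v, i) { return v + b[i]; });
}

// Generic version: a single entry point that inspects its arguments at
// run time, as described for the less common functions.
function plus_generic(a, b) {
  var aIsArray = Array.isArray(a);
  var bIsArray = Array.isArray(b);
  if (aIsArray && bIsArray) return plus_MM(a, b);
  if (aIsArray) return plus_MS(a, b);
  if (bIsArray) return plus_SM(a, b);
  return plus_SS(a, b);
}
```

When the shapes are known statically, the code generator can emit a direct call such as `plus_MS(m, 3)` and skip the shape tests that `plus_generic(m, 3)` performs on every call.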

Given the huge number of functions available in MATLAB, we have not implemented them all, instead focusing on common functions and the ones necessary for our benchmarks. These include:

• Arithmetic operations

• Comparison operations

• Logical operations

• Numerical functions (e.g. abs, floor, ceil, etc.)

• Basic matrix functions (e.g. matrix multiplication, transposition, etc.)

• Random number generation functions

One important difference between the MatJuice standard library and the MathWorks MATLAB standard library is that the MatJuice library is implemented entirely in JavaScript whereas the MATLAB library functions can call functions written in Assembly, C, or Fortran that are part of a linear algebra package such as LAPACK. This makes many operations much faster in MATLAB than in MatJuice as we’ll see in Chapter 8. If browsers had access to native numerical libraries, the performance of numerical JavaScript programs would improve immensely.

An interesting new project for MatJuice, and indeed for all other backends, would be to create an entire MATLAB standard library written directly in MATLAB. This library would need to define a number of basic operations and implement all other functions in terms of these basic operations (or functions that use only the basic operations). Such a project would reduce the creation of a fully-functional backend to two tasks: the translation of TameIR nodes to a given target language and the implementation of a number of basic operations in the target language.

3.4.1 sweet.js

The standard library is not written in plain JavaScript, but rather in a super-set of JavaScript called sweet.js that adds hygienic macros to JavaScript. In order to reduce the amount of boiler-plate code during the implementation, especially for functions that we specialize, we have created macros, such as elemwise and pairwise, to abstract away common patterns. The macro elemwise applies a function or an operator to every element in an array while pairwise applies a function between two elements of two arrays.

We have decided to use a macro system rather than higher-order functions because early experiments showed that in tightly nested loops, code that made heavy use of closures was much slower than if the code was inlined.
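The two styles being compared can be sketched in plain JavaScript. The functions below are hypothetical and operate on plain arrays; they only show the structural difference between passing the per-element operation as a closure and writing it directly in the loop body, and make no claim about the measured speed of either.

```javascript
// Closure-based style: the per-element operation is passed as a function
// and called once per element.
function elemwiseClosure(m, op) {
  var out = new Array(m.length);
  for (var i = 0; i < m.length; i++) {
    out[i] = op(m[i]);
  }
  return out;
}

function plusClosure(m, x) {
  return elemwiseClosure(m, function (v) { return v + x; });
}

// Inlined style: the shape of code a macro expansion produces, with the
// operation written in the loop body and no function call per element.
function plusInlined(m, x) {
  var out = new Array(m.length);
  for (var i = 0; i < m.length; i++) {
    out[i] = m[i] + x;
  }
  return out;
}
```

Both versions compute the same result; the macro system lets the library be written almost as concisely as the first while generating code shaped like the second.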

The code sample in Figure 3.13 shows the implementation of the plus function between an array and a scalar; the body of the function creates an output array and calls the elemwise macro, which iterates over m, adds x to each element, and stores the result in out. The macro-expanded code can be seen in Figure 3.14.

    function mc_plus_MS(m, x) {
        var out = mj_new_from(m);
        elemwise(out <= m + x);
        return out;
    }

Figure 3.13 Implementation of the plus() function between an array and a scalar

    function mc_plus_MS(m_2, x_2) {
        var out_2 = mj_new_from(m_2);
        for (var i = 1, N = m_2.mj_numel(); i <= N; ++i) {
            out_2.mj_set(m_2.mj_get([i]) + x_2, [i]);
        }
        ;
        return out_2;
    }

Figure 3.14 Macro-expanded implementation of plus()

Chapter 4

Points-to analysis

In this chapter and in the next, we present one of the core contributions of this thesis, an intra-procedural dataflow points-to analysis. The results of this analysis are used during the code generation phase to insert copy statements when necessary and to leave them out when they can be shown to be unnecessary. In this chapter, we explain the analysis using a simplified and easier-to-understand abstraction; we shall rely on our understanding of programming to do the transformation without being explicit as to how this transformation is to be performed programmatically. In Chapter 5, we will present a richer abstraction that takes into consideration how the transformation will be done by a computer, how this enriched abstraction helps the transformation phase, and the details of how we implemented this rich abstraction efficiently in Java.

4.1 Motivation

One important semantic difference between MATLAB and JavaScript is how these languages perform variable assignment, pass input parameters into functions and return results from functions. In MATLAB, these operations are done by value, meaning that a complete copy of the operand is performed before the operation is made. In JavaScript, arrays and objects are passed by reference: the memory locations are passed around rather than complete copies of the objects.

Let’s look at the example of assignment. In MATLAB, the statement B = A means “make a copy of the array A and assign this copy to B.” In JavaScript, the same statement means “take the address referred to by A and assign this address to B.” Figure 4.1 illustrates this difference graphically. This semantic difference affects statements that write into A or B (i.e. mutate the array items). In MATLAB, if a user writes the value v at the first index of B (B(1) = v), the first element of A remains unchanged because the two arrays are stored in different memory regions. In JavaScript however, after performing the equivalent statement (B[0] = v;), fetching the first element of A would yield v because A and B share the same memory for their content.

    Matlab                          JavaScript

    A --> [1, 2, 3]                 A --> [1, 2, 3]
    B --> [1, 2, 3]                 B --> (same array as A)

    A = [1, 2, 3];                  var A = [1, 2, 3];
    B = A;                          var B = A;

Figure 4.1 Value vs reference semantics
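The difference can be observed directly in JavaScript. The snippet below is a minimal demonstration with plain arrays; the variable names are chosen for the example, and slice() stands in for whatever copy operation a translator would emit.

```javascript
// Reference semantics: B is another name for the array that A refers to.
var A = [1, 2, 3];
var B = A;
B[0] = 42;
var aliasedFirst = A[0]; // the write through B is visible through A

// Emulating MATLAB's value semantics with an explicit copy.
var A2 = [1, 2, 3];
var B2 = A2.slice(); // a shallow copy suffices for a flat array of numbers
B2[0] = 42;
var copiedFirst = A2[0]; // A2 is unaffected by the write through B2
```

After running this code, aliasedFirst holds 42 while copiedFirst still holds 1.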

To ensure a proper translation from MATLAB to JavaScript, the copy semantics of MATLAB must be respected. The obvious approach is to generate JavaScript code that makes copies of all array variables when they are assigned into other variables or passed in and out of functions. Though simple and obviously correct, this strategy pays a heavy run-time price by making copies even when they aren’t needed. For example, in a matrix multiplication of the form C = mtimes(A, B), the variables A and B are not modified during the multiplication, so copying them is very wasteful, especially if the matrices are large.

Another strategy is to implement a copy-on-write mechanism at run-time: arrays are copied and passed to functions by reference, and when a write statement is about to modify an array variable, a copy is performed if necessary. This approach is used in the MathWorks implementation of MATLAB [17] and in GNU Octave [18]. Its main drawback is that a more complex run-time system must be designed, and many extra operations (i.e. reference count updates) must be performed during the execution of the program. For MatJuice, we have decided against this approach in order to keep the whole system simpler and to avoid introducing extra run-time operations that could prevent the JavaScript JIT from optimizing a function.

Lameed and Hendren [19] proposed an inter-procedural analysis to remove unnecessary array copies; their code generator took the naïve approach of inserting copies everywhere and a subsequent transformation removed the copies that were proven to be unnecessary. This inter-procedural analysis was implemented in the JIT compiler for MATLAB, McVM [20].

In MatJuice, we propose a new approach to this problem: we assume that MATLAB has the reference semantics of JavaScript, our target language, and we generate code without any copy statements. We then use an intra-procedural points-to analysis to compute the aliasing relationships that exist between the different variables of a function and insert the copy statements that are necessary to obtain the value semantics of MATLAB. We’ve implemented our analysis in an ahead-of-time compiler, but it could certainly be implemented in a JIT compiler as well, where the speed of intra-procedural analyses is desirable.

In the implementation of our points-to analysis and copy insertion transformation, we aim to achieve a number of goals. The first and most important goal is that after applying the transformation, the JavaScript program must respect the MATLAB copy semantics. A second goal is that our transformation should not add an excessive number of copies; in addition to the copies necessary to respect the MATLAB semantics, a number of copies may be conservatively inserted when it is impossible to determine statically if a copy truly is needed or not. We would like to keep the number of such extraneous copies to a minimum. Finally, we want the analysis to be intra-procedural: such analyses are cheaper in time and memory than inter-procedural analyses and allow the programmer to reason about functions in isolation.

4.2 Motivating example

In this section we will look at a MATLAB function, assume it has the reference semantics of JavaScript, and observe where we need to insert copy statements to ensure that at the end the function has the copy semantics of MATLAB.

     1  function y = f(A, B)
     2      if B(1) < 0
     3          B(1) = -B(1);
     4      end
     5      C = A;
     6      C(1) = 2*C(2);
     7      if A(1)
     8          y = C+B;
     9      else
    10          y = A;
    11      end
    12  end

Listing 4.1 Initial function

4.2.1 Input function parameters

Let us first address input function parameters. In Listing 4.1, due to our pass-by-reference assumptions, the write statement into B (line 3) would cause the array argument in the caller to also mutate. We avoid this situation by making a copy of B before we write to it. An obvious place to insert such a copy is at the beginning of the function. However, we notice that the write statement occurs within a conditional statement (lines 2 to 4) with no else branch: we can avoid the execution of unnecessary copies at run-time by inserting the copy inside the if block.

The complete details of input parameter copy are discussed in Section 6.3.

     1  function y = f(A, B)
     2      if B(1) < 0
     3          B = copy(B);
     4          B(1) = -B(1);
     5      end
     6      C = A;
     7      C(1) = 2*C(2);
     8      if A(1)
     9          y = C+B;
    10      else
    11          y = A;
    12      end
    13  end

Listing 4.2 After input parameter copy

4.2.2 Aliasing statements

Aliasing refers to a situation where at least two variables refer to the same memory location. When aliasing is combined with mutation statements, such as array write statements, modifications through one variable are also visible through the other variables that point to the same memory. To respect the MATLAB semantics, when a write is performed to a variable that is possibly aliased, copies are inserted after all the statements where that variable could have become aliased.

In Listing 4.2, the statement C = A; (line 6) is an aliasing statement: after its execution the variables A and C point to the same memory location (again, we assume reference semantics). The next statement (line 7) is a write statement that modifies C. We don’t want that modification to be visible from A. To prevent this situation, we add a copy of C after line 6.

     1  function y = f(A, B)
     2      if B(1) < 0
     3          B = copy(B);
     4          B(1) = -B(1);
     5      end
     6      C = A;
     7      C = copy(C);
     8      C(1) = 2*C(2);
     9      if A(1)
    10          y = C+B;
    11      else
    12          y = A;
    13      end
    14  end

Listing 4.3 After copy insertion

4.2.3 Output function parameters

Let us consider what might happen inside another function calling f; a programmer may write a statement such as Z = f(X, Y). If within f the execution path went through line 12 of Listing 4.3, then inside the caller Z and X would be aliases, because within f the input parameter A and the output parameter y are aliases and we return y by reference. To avoid this situation, if at a return point an output parameter is possibly aliased to externally allocated memory (e.g. an input parameter or a global) or to another output parameter, we add a copy for that parameter.

     1  function y = f(A, B)
     2      if B(1) < 0
     3          B = copy(B);
     4          B(1) = -B(1);
     5      end
     6      C = A;
     7      C = copy(C);
     8      C(1) = 2*C(2);
     9      if A(1)
    10          y = C+B;
    11      else
    12          y = A;
    13          y = copy(y);
    14      end
    15  end

Listing 4.4 After output parameter copy

4.2.4 Summary

The final version of function f (Listing 4.4) now respects the MATLAB semantics:

• The array write into B no longer affects the actual parameter in the caller.

• A and C are no longer aliases when an array write into C is performed.

• The output parameter y is copied to ensure that no aliasing occurs in the caller when assigning the result of f to a variable.

In addition, we’ve been able to avoid making some unnecessary copies:

• B is only copied when the condition on line 2 is true.

• y is only copied when the condition on line 9 is false.

We have thus been able to achieve the goals stated at the beginning of this section: the copies that were necessary to emulate the value semantics of MATLAB have been properly inserted, a number of copies that the naïve approach would have inserted have been avoided, and we’ve been able to do this by looking only at the body of function f.
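To see the effect of the three inserted copies, the final version of f can be translated by hand into JavaScript. The sketch below is not MatJuice output: it uses plain arrays with zero-based indices, and the helpers copy and plus are hypothetical stand-ins for the library routines.

```javascript
// Hypothetical helpers: a shallow array copy and an element-wise sum that
// allocates a fresh result array.
function copy(a) { return a.slice(); }
function plus(a, b) { return a.map(function (v, i) { return v + b[i]; }); }

// Hand translation of the final version of f (zero-based indices).
function f(A, B) {
  var y;
  if (B[0] < 0) {
    B = copy(B);        // input parameter copy
    B[0] = -B[0];
  }
  var C = A;
  C = copy(C);          // break the alias with A before writing into C
  C[0] = 2 * C[1];
  if (A[0]) {
    y = plus(C, B);     // fresh array, no copy needed
  } else {
    y = A;
    y = copy(y);        // output parameter copy
  }
  return y;
}
```

Calling `f(X, Y)` with `X = [0, 5]` and `Y = [-1, 2]` takes the else branch and returns a fresh array equal to `[0, 5]`, leaving both X and Y untouched in the caller.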

4.3 Points-to analysis

In order to determine where to insert copy statements, MatJuice implements an intra-procedural dataflow analysis to track points-to relationships. Each variable is associated with a points-to set, a set containing abstract memory locations that the variable may point to. If a write statement modifies a variable A that is possibly aliased (meaning that there exists at least one other variable B with the same memory site in its points-to set), all the program points that introduced aliasing of A are found and a copy statement is added.

This chapter presents our points-to analysis using a simplified abstraction; this allows us to explain the analysis at a higher-level and using well-known set operators without being concerned with performance issues. In Chapter 5, we discuss how to enrich this abstraction to allow an automated process to insert copies and how MatJuice implements this richer abstraction. Let us first describe the essential components of the analysis.

Approximation We approximate points-to relationships with a set of pairs. The first component of a pair is a variable name and the second component is a memory site. If a variable v may point to more than one memory site, there are multiple pairs in the set whose first component is v.

A memory site is an abstract object that represents one or more zones in memory that have been allocated by a given statement.

Definition Let v be a variable defined at a program point d. We say that at a program point p, v points to a memory site m if there exists at least one path from d to p with no redefinition of v.

Direction The points-to analysis is a forward analysis; information is propagated from statement nodes to their successors.

Merge operation A merge node (i.e. a node with multiple predecessors) combines the information from its predecessors with the set union operation (∪).

Starting approximations The out set of the entry node of a function is the set containing pairs mapping all the input parameters to a single, common external memory site, i.e. $\mathit{out}(\mathit{ENTRY}) = \{(p, \mathit{EXTERNAL}) \mid p \in \mathit{input\_params}\}$. For every other statement $S_i$, the initial approximation is the empty set, i.e. $\mathit{out}(S_i) = \{\}$.

Flow equations Dataflow information is passed from one node to another by the formula $\mathit{out}(S_i) = (\mathit{in}(S_i) - \mathit{kill}(S_i)) \cup \mathit{gen}(S_i)$. The definitions of the kill and gen sets are described below.

Assign statement An assignment statement of the form A = B removes all pairs for the variable A from the flow set and adds pairs from A to all the memory sites that B may point to.

• $\mathit{kill}(A = B) = \{(A, m) \mid (A, m) \in \mathit{in}(S_i)\}$

• $\mathit{gen}(A = B) = \{(A, m) \mid (B, m) \in \mathit{in}(S_i)\}$

Assign literal An assignment statement of the form A = x where x is a literal removes all pairs for the variable A from the flow set and adds no new information.

• $\mathit{kill}(A = x) = \{(A, m) \mid (A, m) \in \mathit{in}(S_i)\}$

• $\mathit{gen}(A = x) = \{\}$

Function call A function call statement of the form $[r_1\ r_2\ \ldots\ r_n] = f(a_1, a_2, \ldots, a_k)$ removes all the mappings for every variable on the left-hand side, and for each one creates a mapping to a new, unique memory site. (We guarantee that the return values of a function are not aliased.)

• $\mathit{kill}([r_1\ r_2\ \ldots\ r_n] = f(a_1, a_2, \ldots, a_k)) = \{(r_i, m) \mid i \in 1..n,\ (r_i, m) \in \mathit{in}(S_i)\}$

• $\mathit{gen}([r_1\ r_2\ \ldots\ r_n] = f(a_1, a_2, \ldots, a_k)) = \{(r_i, \text{new memory site}) \mid i \in 1..n\}$

Other statements All other statements just let the information flow through, unchanged.

• $\mathit{kill}(S_i) = \{\}$

• $\mathit{gen}(S_i) = \{\}$
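The flow equations above can be sketched in a few lines of JavaScript. This is an illustration only (the implementation described in this thesis is written in Java): the encoding of a flowset as a Set of "variable@site" strings, the statement record format, and the names pair, varOf, siteOf, freshSite, transfer and merge are all assumptions made for the example.

```javascript
// A flowset is modelled as a Set of "variable@site" strings (assumed encoding).
function pair(v, m) { return v + "@" + m; }
function varOf(p) { return p.split("@")[0]; }
function siteOf(p) { return p.split("@")[1]; }

var siteCounter = 0;
function freshSite() { siteCounter += 1; return "m" + siteCounter; }

// out(S) = (in(S) - kill(S)) ∪ gen(S) for the statement kinds in the text.
// stmt is one of (hypothetical record format):
//   { kind: "copy", lhs, rhs }   for A = B
//   { kind: "literal", lhs }     for A = x
//   { kind: "call", lhs: [...] } for [r1 ... rn] = f(...)
//   { kind: "other" }            for everything else
function transfer(stmt, inSet) {
  var killed = [];
  if (stmt.kind === "copy" || stmt.kind === "literal") killed = [stmt.lhs];
  if (stmt.kind === "call") killed = stmt.lhs;

  var out = new Set();
  inSet.forEach(function (p) {
    if (killed.indexOf(varOf(p)) === -1) out.add(p);   // in - kill
  });
  if (stmt.kind === "copy") {
    inSet.forEach(function (p) {                        // gen for A = B
      if (varOf(p) === stmt.rhs) out.add(pair(stmt.lhs, siteOf(p)));
    });
  }
  if (stmt.kind === "call") {                           // gen for a call
    stmt.lhs.forEach(function (r) { out.add(pair(r, freshSite())); });
  }
  return out;
}

// Merge at a join point: set union of the predecessors' out sets.
function merge(a, b) {
  var out = new Set(a);
  b.forEach(function (p) { out.add(p); });
  return out;
}
```

Applying transfer to the two branches of an if statement and then calling merge on the two results gives the flowset at the join point.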

The simple function in Listing 4.5 is annotated with comments that show how dataflow information is propagated from one statement to another and how it is merged.

     1  function f()            % {}
     2      A = zeros(10);      % {(A, m1)}
     3      B = ones(10);       % {(A, m1), (B, m2)}
     4      if condition        % {(A, m1), (B, m2)}
     5          C = A;          % {(A, m1), (B, m2), (C, m1)}
     6      else                % {(A, m1), (B, m2)}
     7          C = B;          % {(A, m1), (B, m2), (C, m2)}
     8      end                 % {(A, m1), (B, m2), (C, m1), (C, m2)}
     9      C(1) = 42;          % {(A, m1), (B, m2), (C, m1), (C, m2)}
    10  end                     % {(A, m1), (B, m2), (C, m1), (C, m2)}

Listing 4.5 Simple MATLAB program and points-to information

In the example above, we have a function f taking no arguments; therefore the flowset of its entry node is empty. On line 2, we create a new matrix of size 10 × 10; the out flowset of this statement now contains the pair (A,m1), meaning that the variable A may point to a physical memory location represented by the abstract memory site m1. On line 3, another allocation for a matrix is performed. We add the pair (B,m2) to the out flowset of statement 3. Note that we use a new memory site because the physical memory location allocated on line 3 will be different from the physical memory location allocated on line 2.

On line 5, we assign the address of A to the variable C; A and C now point to the same physical memory location, and we denote this in our flowset by creating pairs between C and all the memory sites of A (in this case, there is only one, m1). An analogous operation occurs on line 7 between B and C.

Line 8 is a merge point: we must combine the flowset computed in the then branch of the if statement with the flowset computed in the else branch. We do this by taking the union of both flowsets, and now at this point we know that A may point to m1, B may point to m2, and C may point to m1 or to m2.

With the points-to information fully computed for every program point in function f, we are now ready to insert the copy statements necessary to obtain the value semantics of MATLAB.

4.3.1 Transformation

The transformation takes a function and its points-to information and produces a new version of the function with the necessary copy statements. It traverses the statements of the function, looking for array write statements, i.e. a statement of the form “v(i1, ..., ik) = e”. When such a statement is found, the transformer inspects the dataflow information at that program point to see if v is possibly aliased.

Definition 1. A variable v is possibly aliased if there exists at least one memory site m for which the pairs (v,m) and (u,m) exist (with v ≠ u).

Definition 2. A variable v is possibly aliased to a variable u if there exists at least one memory site m for which the pairs (v,m) and (u,m) exist.
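The two definitions translate almost directly into code. In the sketch below, a flowset is modelled as an array of [variable, site] pairs; this encoding and the function names possiblyAliasedTo and possiblyAliased are assumptions made for the example.

```javascript
// Definition 2: v is possibly aliased to u if some memory site m appears
// in both a pair (v, m) and a pair (u, m).
function possiblyAliasedTo(flowset, v, u) {
  return flowset.some(function (p) {
    return p[0] === v && flowset.some(function (q) {
      return q[0] === u && q[1] === p[1];
    });
  });
}

// Definition 1: v is possibly aliased if it is possibly aliased to some
// variable u different from v.
function possiblyAliased(flowset, v) {
  return flowset.some(function (q) {
    return q[0] !== v && possiblyAliasedTo(flowset, v, q[0]);
  });
}
```

With the flowset `[["A", "m1"], ["B", "m2"], ["C", "m1"], ["C", "m2"]]`, the call `possiblyAliased(flowset, "C")` is true because C shares m1 with A and m2 with B, while `possiblyAliasedTo(flowset, "A", "B")` is false.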

If the array variable v is possibly aliased, we find all the assignment statements involving v (i.e. statements of the form “v = B” or “A = v”) for the memory sites that are aliased and insert copy statements for v after each one of them. (The details on how to efficiently find the assignment statements will be discussed in Chapter 5.)

In Listing 4.5, an array write statement occurs on line 9 on the variable C. In the dataflow information at that point, we see that C is possibly aliased to A (they share the m1 memory site) and to B (they share the m2 memory site). For each of these memory sites, we find the assignment statements that introduced that aliasing. The aliasing between A and C is introduced on line 5 and for B and C on line 7. A copy of C is added after both lines:

     1  function f()
     2      A = zeros(10);
     3      B = ones(10);
     4      if condition
     5          C = A;
     6          C = copy(C);
     7      else
     8          C = B;
     9          C = copy(C);
    10      end
    11      C(1) = 42;
    12  end

Listing 4.6 Simple MATLAB program after copy insertion

4.3.2 Analysis-transformation loop

During the transformation phase, once copy statements for a variable have been added, the transformation is terminated and the points-to analysis is run again. This ensures that the transformer always has access to the most up-to-date points-to information. Listing 4.7 shows the points-to information obtained after running the points-to analysis on the function of Listing 4.6.

This ping-pong process between analysis and transformation is guaranteed to terminate: the number of possibly aliased variables monotonically decreases at every iteration. In the worst case, copies will be added after every assignment statement in the function, after which no transformation will occur and the process will terminate.

In Listing 4.7, we can see in the flowset of line 11 that C may point to m3 or to m4, and that no other variable is associated with those memory sites. C is therefore not possibly aliased and it is safe to modify the array it points to.

     1  function f()        % {}
     2    A = zeros(10);    % {(A, m1)}
     3    B = ones(10);     % {(A, m1) (B, m2)}
     4    if condition      % {(A, m1) (B, m2)}
     5      C = A;          % {(A, m1) (B, m2) (C, m1)}
     6      C = copy(C);    % {(A, m1) (B, m2) (C, m3)}
     7    else              % {(A, m1) (B, m2)}
     8      C = B;          % {(A, m1) (B, m2) (C, m2)}
     9      C = copy(C);    % {(A, m1) (B, m2) (C, m4)}
    10    end               % {(A, m1) (B, m2) (C, m3) (C, m4)}
    11    C(1) = 42;        % {(A, m1) (B, m2) (C, m3) (C, m4)}
    12  end                 % {(A, m1) (B, m2) (C, m3) (C, m4)}

Listing 4.7 Simple MATLAB program after second run of the points-to analysis

4.3.3 Output Parameters Copy

We can re-use the points-to analysis to find when it is necessary to copy output parameters.

At every return statement, we inspect the dataflow information to determine if an output parameter is possibly aliased to another output parameter or to externally allocated memory, i.e. an input parameter or a global variable that hasn’t been copied yet. If it is, we find the aliasing statements involving that parameter and add the necessary copy statements. We then proceed as described in Section 4.3.2: the transformation is terminated and the analysis is re-executed.

We do not need to handle the case where a function returns when execution reaches the end of the function; as we noted in Section 3.3.1, an explicit return statement is added at the end of every function. Because the function can no longer implicitly return, the transformation for return statements now covers the case where the original function returns by falling off the end.

Chapter 5

Rich points-to abstraction

In Section 4.2.2 and Section 4.2.3, we presented an example MATLAB function, assumed assignment of arrays was done by reference like in JavaScript, and showed where we needed to insert copy statements to obtain the value semantics of MATLAB. However, we did not explain how MatJuice finds where those copy statements should be added; we relied on our understanding as programmers to “know” where to put them. In this chapter, we enrich the abstraction of Chapter 4 and, in doing so, put in place all the information necessary for adding copies by an automated process.

The key difference of this richer abstraction is that instead of approximating a set of (v,m) pairs (where v is a variable and m is a memory site), we approximate a set of (v,m,s) triples. The new component, s, is the set of assignment statements that caused the variable v to become aliased with respect to the memory site m.

This new abstraction is presented by means of an example in Section 5.1 and described in detail in Section 5.2. The important points of the implementation in MatJuice are presented in Section 5.3. The transformation that uses this new analysis is the subject of Chapter 6.

5.1 Motivating example: part two

To get an intuitive feel of how this new, richer abstraction works, we’ll look at the example from Listing 4.5, and explain the new approximations at each program point.

In Listing 5.1, each statement is annotated with the result of the richer points-to analysis. Each set contains triples. The first component of the triple is a variable name; the second component is a memory site that the variable may point to; the third component is a set of statements where that variable became aliased with respect to that memory site. As before, if a variable v may point to multiple memory sites, the set contains multiple triples whose first component is v.

In this example, for simplicity and to fit within the page, the sets of aliasing statements use line numbers to refer to other statements; in the actual implementation, we use pointers to AST nodes.

     1  function f()        % {}
     2    A = zeros(10);    % {(A, m1, {})}
     3    B = ones(10);     % {(A, m1, {}) (B, m2, {})}
     4    if condition      % {(A, m1, {}) (B, m2, {})}
     5      C = A;          % {(A, m1, {5}) (B, m2, {}) (C, m1, {5})}
     6    else              % {(A, m1, {}) (B, m2, {})}
     7      C = B;          % {(A, m1, {}) (B, m2, {7}) (C, m2, {7})}
     8    end               % {(A, m1, {5}) (B, m2, {7}) (C, m1, {5}) (C, m2, {7})}
     9    C(1) = 42;        % {(A, m1, {5}) (B, m2, {7}) (C, m1, {5}) (C, m2, {7})}
    10  end                 % {(A, m1, {5}) (B, m2, {7}) (C, m1, {5}) (C, m2, {7})}

Listing 5.1 Rich points-to information

On lines 2 and 3, memory is allocated for A and B respectively. After the execution of line 3, A may point to m1 and has not been aliased yet and B may point to m2 and has not been aliased either, hence the empty set of aliasing statements in both triples.

On line 5, C becomes aliased to A and a new triple is added in the flowset for the variable C. In addition, we record that it was at that program point that A and C started pointing to the same memory by adding the current statement to the aliasing sets of A and C. On line 7, the same scenario occurs between B and C.

On line 8, a merge point, the set computed in the then branch is combined with the set computed in the else branch. The details of this merge operation are fully explained in Section 5.2. After the merging operation is complete, we obtain the following information:

• A may point to m1 and it became aliased on line 5;

• B may point to m2 and it became aliased on line 7;

• C may point to m1 or to m2; it became aliased with respect to those memory sites on lines 5 and 7 respectively.

On line 9, the content of C is modified. When the transformation phase reaches this point, it looks up the information for C and finds that it is possibly aliased to A via m1 and to B via m2. The sets associated with C indicate where copies of C must be inserted: one copy after statement 5 and one copy after statement 7. Listing 5.2 shows the analysis information once the copies have been added and the analysis executed again.

     1  function f()        % {}
     2    A = zeros(10);    % {(A, m1, {})}
     3    B = ones(10);     % {(A, m1, {}) (B, m2, {})}
     4    if condition      % {(A, m1, {}) (B, m2, {})}
     5      C = A;          % {(A, m1, {5}) (B, m2, {}) (C, m1, {5})}
     6      C = copy(C);    % {(A, m1, {}) (B, m2, {}) (C, m3, {})}
     7    else              % {(A, m1, {}) (B, m2, {})}
     8      C = B;          % {(A, m1, {}) (B, m2, {7}) (C, m2, {7})}
     9      C = copy(C);    % {(A, m1, {}) (B, m2, {}) (C, m4, {})}
    10    end               % {(A, m1, {}) (B, m2, {}) (C, m3, {}) (C, m4, {})}
    11    C(1) = 42;        % {(A, m1, {}) (B, m2, {}) (C, m3, {}) (C, m4, {})}
    12  end                 % {(A, m1, {}) (B, m2, {}) (C, m3, {}) (C, m4, {})}

Listing 5.2 Rich points-to information after copy

After the execution of the copy statement on line 6, C is associated with a fresh memory site, m3. In addition, statement 5 is removed from the aliasing statements of the triples for A and for C. After the execution of the copy statement on line 9, the same happens between B and C.

On line 10, at the merge point, we obtain a set that says that:

• A may point to m1;

• B may point to m2;

• C may point to m3 or to m4.

When code for this new function is generated, we have the assurance that whether C points to m3 or to m4, it is the only variable that may point to those two locations and thus the assignment of line 11 will not affect the content of other variables. No other copy statements are needed and we now have the MATLAB value semantics.

5.2 Analysis components

In this section, we present our points-to analysis using the same six components as in Section 4.3. As this abstraction is more complex than the one from Chapter 4, so are the rules for making information flow between nodes.

Approximation We approximate points-to relationships with a set of triples. The first component of a triple is a variable name v; the second component is a memory site m; the third component is a set of statements where v became aliased with respect to m. The statements where a variable becomes aliased are assignment statements of the form v = u; v becomes aliased with u by pointing to u’s memory site and u becomes aliased because a new variable (v) points to its memory.

If a variable v may point to more than one memory site, there are multiple triples in the set whose first component is v.

Definition Let v be a variable defined at a program point d. We say that at a given program point p, the variable v points to a memory site m if there exists at least one path from d to p with no redefinition of v.

Direction The points-to analysis is a forward analysis; information is propagated from statement nodes to their successors.

Merge operation A merge node (i.e. a node with multiple predecessors) combines the information from two predecessor nodes P1 and P2 in three steps. We shall call “corresponding triples” a triple (v,m,s) in P1 and a triple (v′,m′,s′) in P2 if v = v′ and m = m′.

• If P1 and P2 respectively contain the corresponding triples (v,m,s) and (v,m,s′), the triple (v,m,s ∪ s′) is added to the output set;

• Triples in P1 that have no corresponding triple in P2 are added as is to the output set;

• Triples in P2 that have no corresponding triple in P1 are added as is to the output set.

We can express these rules formally using the equation that follows.

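\[
\begin{aligned}
\mathit{out}(S) = {} & \{(v, m, s \cup s') \mid (v, m, s) \in \mathit{out}(P_1),\ (v, m, s') \in \mathit{out}(P_2)\} \\
\cup\ & \{(v, m, s) \mid (v, m, s) \in \mathit{out}(P_1),\ (v, m, *) \notin \mathit{out}(P_2)\} \\
\cup\ & \{(v, m, s) \mid (v, m, s) \in \mathit{out}(P_2),\ (v, m, *) \notin \mathit{out}(P_1)\}
\end{aligned}
\]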
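The merge rules can be sketched in Python as follows. This is a minimal illustration rather than the MatJuice implementation: we assume a flow set is stored as a dictionary that maps a (variable, memory site) pair to a frozenset of statement identifiers, and the name merge is ours.

```python
def merge(p1, p2):
    """Merge two flow sets given as dicts {(variable, site): frozenset(stmts)}."""
    out = {}
    for key in p1.keys() | p2.keys():
        # Corresponding triples get the union of their statement sets;
        # a triple present on one side only is copied unchanged.
        out[key] = p1.get(key, frozenset()) | p2.get(key, frozenset())
    return out


# Flow sets at the end of the two branches of Listing 5.1 (lines 5 and 7).
then_branch = {("A", "m1"): frozenset({5}), ("B", "m2"): frozenset(),
               ("C", "m1"): frozenset({5})}
else_branch = {("A", "m1"): frozenset(), ("B", "m2"): frozenset({7}),
               ("C", "m2"): frozenset({7})}
merged = merge(then_branch, else_branch)
```

With the two branch sets of Listing 5.1 as input, merged holds the four triples shown on line 8 of that listing.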

Starting approximations The out set of the entry node of a function is the set containing triples that map the array input parameters to a common external memory site and an empty set of aliasing statements, i.e. out(ENTRY) = {(p,EXTERNAL,{}) | p ∈ input_params}.

For every other statement Si, the initial approximation is the empty set, i.e. out(Si) = {}.

Flow equations In the previous chapter, we defined the out set of a program point by removing a kill set and adding a gen set. We could use the same strategy with our sets of triples; however, we found that this approach makes the notation long-winded and hard to understand. Therefore, each flow equation is going to define out(S) directly in terms of in(S), with the help of some helper definitions. This style is closer to what a programmer would write in an actual implementation.

Assign statement An assignment statement S of the form A = B creates an output set containing the following triples:

• For every memory m associated with B, we create a triple (A,m,{S}): A may point to any memory site that B may point to and A became aliased at S. The previous triples for A are discarded.

• For every triple (B,m,s), we remove all the statements that involved A from s and we add the current statement S.

• For every triple (v,m,s) where v is neither A nor B, we remove all the statements that involved A from s.

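We can express these rules using the following set equations¹:

\[
\begin{aligned}
\mathit{aliasingStmts}_A &= \bigcup \{ s \mid (A, *, s) \in \mathit{in}(S) \} \\
\mathit{memsites}_B &= \{ m \mid (B, m, *) \in \mathit{in}(S) \} \\
\mathit{out}(S) &= \{ (A, m, \{S\}) \mid m \in \mathit{memsites}_B \} \\
&\quad \cup \{ (B, m, (s - \mathit{aliasingStmts}_A) \cup \{S\}) \mid (B, m, s) \in \mathit{in}(S) \} \\
&\quad \cup \{ (v, m, s - \mathit{aliasingStmts}_A) \mid (v, m, s) \in \mathit{in}(S),\ v \neq A,\ v \neq B \}
\end{aligned}
\]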
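As a sketch of how this flow equation could be written in code, the Python function below computes out(S) for an assignment a = b. It assumes the same dictionary representation as before ((variable, memory site) pairs mapped to frozensets of statement identifiers) and assumes a and b are distinct variables; the name flow_assign is hypothetical.

```python
def flow_assign(in_set, stmt, a, b):
    """Flow equation for the statement `stmt` of the form a = b (a != b)."""
    # Statements that previously aliased `a`; they are stale once `a` is redefined.
    aliasing_a = frozenset().union(*(s for (v, _), s in in_set.items() if v == a))
    sites_b = {m for (v, m) in in_set if v == b}

    out = {}
    for (v, m), s in in_set.items():
        if v == a:
            continue  # previous triples for `a` are discarded
        kept = s - aliasing_a
        # Triples of `b` additionally record the current statement.
        out[(v, m)] = (kept | {stmt}) if v == b else kept
    for m in sites_b:
        out[(a, m)] = frozenset({stmt})  # `a` now points wherever `b` may point
    return out


# Line 5 of Listing 5.1: C = A.
before = {("A", "m1"): frozenset(), ("B", "m2"): frozenset()}
after = flow_assign(before, 5, "C", "A")
```

Applied to line 5 of Listing 5.1, the function yields the triples (A, m1, {5}), (B, m2, {}) and (C, m1, {5}), as annotated in the listing.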

Assign literal An assign statement S of the form A = x where x is a scalar literal creates the following output set:

• All the triples (v,m,s) where v is not A are included in the output set and we remove all the statements that involved A from s.

• No triples for A are included.

\[
\begin{aligned}
\mathit{aliasingStmts}_A &= \bigcup \{ s \mid (A, *, s) \in \mathit{in}(S) \} \\
\mathit{out}(S) &= \{ (v, m, s - \mathit{aliasingStmts}_A) \mid (v, m, s) \in \mathit{in}(S),\ v \neq A \}
\end{aligned}
\]

¹ We use the asterisk (*) to indicate a component of a triple that is unused in an equation.

Function calls A function call statement S of the form [r1, r2, ..., rn] = f(a1, a2, ..., ak) is similar to the assignment statement except that instead of having a single variable on the left-hand side, we possibly have multiple ones.

• A triple (ri,m,{}) is added to out(S) for every variable ri on the left-hand side. The memory site m is a function of the statement and the variable: given the same statement and variable, we should obtain the same memory site, and that memory site must be unique (i.e. there is no other statement-variable pair that yields the same memory site).

• The triples (v,m,s), where v is not one of the left-hand side variables, from in(S) are included and we remove all statements involving any of the ri from s.

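\[
\begin{aligned}
\mathit{newMemSite}(\mathit{stmt}, \mathit{var}) &= \text{unique memory site for the pair } (\mathit{stmt}, \mathit{var}) \\
\mathit{aliasingStmts} &= \bigcup_{i=1}^{n} \Big( \bigcup \{ s \mid (r_i, *, s) \in \mathit{in}(S) \} \Big) \\
\mathit{out}(S) &= \{ (r_i, \mathit{newMemSite}(S, r_i), \{\}) \mid i \in 1..n \} \\
&\quad \cup \{ (v, m, s - \mathit{aliasingStmts}) \mid (v, m, s) \in \mathit{in}(S),\ v \notin \{r_1, \ldots, r_n\} \}
\end{aligned}
\]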

The function newMemSite is necessary in order for a fixed point to be reached, i.e. that the sets computed in iteration n are all equal to the sets computed in iteration n − 1. If an entirely new memory site was given at every iteration, the fixed point procedure would diverge. We will explain how MatJuice deals with this issue in Section 5.3.2.
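The following Python sketch illustrates the function call equation and why a deterministic newMemSite matters. It is not the MatJuice code: the flow set is again assumed to be a dictionary from (variable, memory site) pairs to frozensets of statement identifiers, and a memory site is modelled as a tuple derived from the statement and the variable, which is one simple way to make it both unique and repeatable.

```python
def new_mem_site(stmt, var):
    """Deterministic, unique site for a (statement, variable) pair."""
    return ("site", stmt, var)


def flow_call(in_set, stmt, results):
    """Flow equation for `stmt` of the form [r1, ..., rn] = f(...)."""
    results = set(results)
    # Aliasing statements of all left-hand side variables become stale.
    stale = frozenset().union(*(s for (v, _), s in in_set.items() if v in results))
    out = {(v, m): s - stale for (v, m), s in in_set.items() if v not in results}
    for r in results:
        out[(r, new_mem_site(stmt, r))] = frozenset()
    return out


in_set = {("z", "m1"): frozenset({3}), ("a", "m1"): frozenset({3})}
first = flow_call(in_set, 4, ["a", "b"])
second = flow_call(in_set, 4, ["a", "b"])
```

Because new_mem_site depends only on its arguments, first and second are equal, which is the property the fixed-point iteration relies on.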

Other statements All other statements S let the information flow through unchanged.

out(S) = in(S)

Ordering Two sets of triples, S1 and S2, are ordered as described in Figure 5.1. S1 is considered to be smaller than S2 if all the triples in S1 have a corresponding triple in S2, and if the set of statements of S1’s triples are subsets of the set of statements of their corresponding triple in S2. The lattice in Figure 5.2 shows the ordering of sets from the example in Listing 5.1.

\[
S_1 \sqsubseteq S_2 \implies \forall (v, m, s) \in S_1 : \exists (v', m', s') \in S_2 : v = v' \land m = m' \land s \subseteq s'
\]

Figure 5.1 Ordering of two sets of triples
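Reading the condition of Figure 5.1 as the definition of the ordering, a comparison function can be sketched in Python. As in the earlier sketches, the dictionary representation of a flow set and the name leq are assumptions made for illustration.

```python
def leq(s1, s2):
    """True if s1 is below or equal to s2 in the ordering of Figure 5.1.

    Both arguments map (variable, site) pairs to frozensets of statements.
    """
    return all(key in s2 and stmts <= s2[key] for key, stmts in s1.items())


bottom = {}
only_a = {("A", "m1"): frozenset()}
only_b = {("B", "m2"): frozenset()}
a_and_c = {("A", "m1"): frozenset({5}), ("C", "m1"): frozenset({5})}
```

Here bottom is below every set, only_a is below a_and_c, and only_a and only_b are incomparable, which matches the shape of a lattice in which sets grow by gaining triples or aliasing statements.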

    ⊤
    ...
    {(A,m1,{5}), (B,m2,{7}), (C,m1,{5}), (C,m2,{7})}
    {(A,m1,{5}), (C,m1,{5})}   ...   {(B,m2,{7}), (C,m2,{7})}
    {(A,m1,{})}   ...   {(B,m2,{})}
    {}

Figure 5.2 Lattice for Listing 5.1

5.3 Implementation in MatJuice

The points-to analysis described in the previous section has been implemented in the MatJuice backend. In this section, we will examine some of the implementation details.

5.3.1 Representation of points-to sets

The points-to analysis of MatJuice does not use sets of triples as described before, but instead a nested map; the abstraction has the shape Map<variable, Map<memory site, Set<statement>>>. This implementation choice was made to make looking up the information related to one variable an O(1) operation rather than an O(n) operation.

This difference in the representation of the approximations changes nothing about the analysis itself. In fact, this map type can be shown to be isomorphic to the type Set<Triple<variable, memory site, Set<statement>>> by writing a pair of functions that convert from one representation to the other, as shown in Listing 5.3.

    // Convert from Map to Set
    function toSet(map):
        out = new Set()
        foreach v in map.keySet():
            foreach m in map.get(v).keySet():
                out.add(new Triple(v, m, map.get(v).get(m)))
        return out

    // Convert from Set to Map
    function toMap(set):
        out = new Map()
        foreach triple in set:
            v = triple[0]
            m = triple[1]
            s = triple[2]

            vMemSites = out.getOrDefault(v, new Map())
            mStmts = vMemSites.getOrDefault(m, new Set())

            // s is itself a set of statements, so its elements are added
            mStmts.addAll(s)
            vMemSites.put(m, mStmts)
            out.put(v, vMemSites)
        return out

Listing 5.3 Converting from map to set and from set to map

5.3.2 Memory site cache

As we mentioned in the explanation of the flow equation for function calls (Section 5.2), in order to reach a fixed point in the analysis phase, it is necessary that we associate the same memory sites with the output parameters of the call from one iteration to the next. If we failed to do this, the memory sites would keep changing and the analysis phase would never terminate, i.e. never reach a point where the sets computed in the previous iteration are equal to the sets computed in the current iteration.

In our description of the flow equation, we used a function newMemSite for this purpose. In MatJuice, the analysis phase maintains a cache to implement this function: the keys are pairs made of a variable and a statement, and the values are memory sites. It is necessary to use both a variable name and a statement as a key to support function calls that assign to multiple values, e.g. S : [a, b] = f(x): our map will contain one entry for the pair (a,S) and another entry for the pair (b,S).

When we create the output set for a function call statement, we first look in our map for a memory site that was allocated in a previous phase for a given variable for that statement. If such a memory site is found, we use it in our output set; if no memory site is found, we create a new one, add it to the cache and use it in our output set.
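The lookup-or-allocate behaviour described above can be sketched in Python as follows. The class name, the method name and the way fresh sites are named are all hypothetical; the sketch only shows the caching idea.

```python
class MemSiteCache:
    """Hands out one memory site per (statement, variable) pair."""

    def __init__(self):
        self._sites = {}

    def site_for(self, stmt, var):
        key = (stmt, var)
        if key not in self._sites:
            # First request for this pair: allocate a fresh site and remember it.
            self._sites[key] = f"m{len(self._sites) + 1}"
        return self._sites[key]


cache = MemSiteCache()
site_a = cache.site_for("S", "a")
site_b = cache.site_for("S", "b")
site_a_again = cache.site_for("S", "a")
```

For the call S : [a, b] = f(x), the two output variables receive distinct sites, and asking again for (S, a) in a later iteration returns the site allocated the first time.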

5.3.3 New TameIR statement

In the code examples shown so far, we have represented the copy of a variable using a function call. While we could have also represented copies with function calls in MatJuice, we have instead decided to subclass the TIRCopyStmt node type and create a new instruction called MJCopyStmt. This has the benefit that the methods defined for TIRCopyStmt, such as analysis visitors and pretty printers, also work for MJCopyStmt.

In the code generator, a TIRCopyStmt node is translated to JavaScript as a simple assignment (e.g. A = B in MATLAB becomes A = B in JavaScript) while an MJCopyStmt node is translated to A = A.mj_clone(). The method mj_clone is part of the MatJuice standard library and performs a full copy of the array.

For the points-to analysis, the flow equation for an MJCopyStmt node is very similar to that of a function call. If an MJCopyStmt node S has the form A = copy(A), then this is the output set generated for that statement:

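\[
\begin{aligned}
\mathit{aliasing}_A &= \bigcup \{ s \mid (A, *, s) \in \mathit{in}(S) \} \\
\mathit{out}(S) &= \{ (A, \mathit{newMemSite}(S, A), \{\}) \} \cup \{ (v, m, s - \mathit{aliasing}_A) \mid (v, m, s) \in \mathit{in}(S),\ v \neq A \}
\end{aligned}
\]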

Chapter 6

Copy insertion

Chapter 4 and Chapter 5 explained how we can use an intra-procedural data flow analysis to statically estimate which variables might point to the same blocks of memory during the execution of a given MATLAB function. In this chapter, we show how MatJuice uses this information to insert appropriate copy statements to ensure that a translation to JavaScript preserves the copy semantics of MATLAB.

In this chapter, Section 6.1 introduces the invariants that must be respected for a transformation to be correct; Section 6.2 explains how the rich points-to information is used to insert copies for local variables and output parameters; Section 6.3 explains how MatJuice inserts copies for input parameters using an analysis based on DefUse chains; Section 6.4 presents some of the details of the MatJuice implementation.

6.1 Copy insertion

Copy insertion is the compiler phase wherein a MATLAB function is transformed such that the variable copies necessary to respect the value semantics of MATLAB are added before the function is translated to JavaScript. Three invariants hold in the transformed function:

1. The input parameters are not mutated inside the function; writes are performed on copies of the parameters;

2. At all return points, the output parameters point to locally-allocated memory (i.e. not to the EXTERNAL memory site of input parameters and globals) and they are not aliased to other output parameters;

3. The write statements inside the function are performed on local variables that have no possible aliases.

Let’s explain why invariants 1 and 2 are necessary in the context of an intra-procedural analysis and show how this differs from the constraints of a similar inter-procedural analysis.

Consider the following function call: [a, b] = f(x). The value semantics of MATLAB assure us that the value of the input parameter x will be the same before and after the call to f; if f needs to modify x, it must first make a copy of x and leave the original memory untouched. A valid transformation of f must therefore detect if there exists a path where x is mutated and add the necessary copy. An inter-procedural analysis could detect that modifying x is acceptable if it can be shown that x is never used after the invocation of f.

The semantics of MATLAB also assure us that modifying a has no impact on b (and vice-versa), nor on x or any other variable in scope. To ensure that this is the case, the transformation of f should make sure that a and b point to memory allocated within the body of f (to prevent them from pointing to memory that the analysis has no knowledge of) and that they are not aliased to one another. Again, in an inter-procedural analysis, with the knowledge of how a and b are used, the transformation could decide that it’s correct for them to be possibly aliased, but in an intra-procedural analysis where that information is unknown, we must be conservative and make them point to their own memory sites.

6.2 Points-to-based Copy Insertion

In this section, we will explain how the points-to information computed in the rich points-to analysis phase is used to insert copies of local variables and output parameters to obtain invariants 2 and 3.

6.2.1 General process

The points-to-based copy insertion transformation is invoked at the beginning of MatJuice’s code generation phase. Before a MATLAB function is translated to JavaScript, it goes through the following loop:

1. Apply the points-to analysis to the function;

2. Add copy statements for locals and output parameters to the function;

3. If at least one copy statement was added, go back to step 1.
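The three steps above can be sketched as a small driver loop in Python. The two callbacks are hypothetical stand-ins for the points-to analysis and for the transformation that adds copies for one variable; only the control flow is meant to be illustrative.

```python
def insert_copies_until_stable(function, analyze, add_copies_for_one_variable):
    """Alternate analysis and transformation until no copy is added.

    `analyze` returns points-to information for `function`;
    `add_copies_for_one_variable` mutates `function` and returns True
    if it inserted at least one copy statement. Both are assumptions.
    """
    iterations = 0
    while True:
        iterations += 1
        info = analyze(function)                            # step 1
        if not add_copies_for_one_variable(function, info):  # step 2
            return iterations                               # nothing added: stop
        # step 3: at least one copy was added, so analyze again


# Toy stand-ins: the "function" only tracks how many variables still need copies.
toy_function = {"pending": 2}

def toy_analyze(fn):
    return fn["pending"]

def toy_add_copies(fn, info):
    if info == 0:
        return False
    fn["pending"] -= 1
    return True

rounds = insert_copies_until_stable(toy_function, toy_analyze, toy_add_copies)
```

With two variables to fix, the loop runs the analysis three times: twice followed by a transformation, and a final time that confirms no further copies are needed.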

At each iteration, after copies for one variable have been added, we terminate the transformation process and do the points-to analysis again. To explain why, let’s look at the simple example in Listing 6.1.

    1  function f()       % {}
    2    A = zeros(2);    % {(A, m1, {})}
    3    B = A;           % {(A, m1, {3}) (B, m1, {3})}
    4    A(1,1) = 10;     % {(A, m1, {3}) (B, m1, {3})}
    5    B(2,1) = 20;     % {(A, m1, {3}) (B, m1, {3})}
    6  end

Listing 6.1 Copying one variable at a time

If the copy insertion tried to add copies for all variables at once, it would find that statement 4 writes into A which is possibly aliased to B and therefore it would add a copy after line 3. This copy would break the aliasing relationship that exists between A and B as they now point to distinct memory locations. However, if the transformation were allowed to continue, by looking at the now stale information at statement 5, it would find that A and B are still possibly aliased and add a copy for B after line 3, thus inserting an unnecessary operation.

This is why once copies for a variable have been added, we terminate the transformation process and the points-to analysis is re-run: to always have the most up-to-date points-to information to avoid doing copies when we don’t have to.

This looping process of performing analysis and inserting copies is guaranteed to terminate: in the worst case, a copy statement is added after every assignment statement. At that point, no variables will be possibly aliased and no more copies will be inserted.

6.2.2 Local variables

Statements that modify arrays are the points in a function where the difference between value and reference semantics changes the program’s meaning: with reference semantics, the modification of one variable can affect the content of other variables. As the description of invariant 3 suggests, we want the target of an array write to be the only variable to point to its associated memory sites.

The copy insertion transformation traverses the statements of a function; when an array write statement of the form A(index) = expr is found, we inspect the points-to information at that program point. If we find that A is possibly aliased, meaning that it shares at least one memory site with another variable, we insert a copy for A after every statement contained in the set of aliasing statements of the offending memory sites.
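As a sketch of this step, the Python function below returns the statements after which a copy of the written variable should be inserted. It assumes the flow set at the write statement is a dictionary from (variable, memory site) pairs to frozensets of statement identifiers; the function name is ours.

```python
def copy_points_for_write(flow_set, target):
    """Statements after which `target = copy(target)` should be inserted.

    `flow_set` describes the program point of an array write into `target`.
    """
    points = set()
    for (v, m), stmts in flow_set.items():
        if v != target:
            continue
        # A site is offending if some other variable may also point to it.
        shared = any(u != target and site == m for (u, site) in flow_set)
        if shared:
            points |= stmts
    return points


# Flow set at line 9 of Listing 5.1 (the write C(1) = 42).
at_write = {("A", "m1"): frozenset({5}), ("B", "m2"): frozenset({7}),
            ("C", "m1"): frozenset({5}), ("C", "m2"): frozenset({7})}
# Flow set at line 11 of Listing 5.2, after the copies have been inserted.
after_copies = {("A", "m1"): frozenset(), ("B", "m2"): frozenset(),
                ("C", "m3"): frozenset(), ("C", "m4"): frozenset()}
```

For the first flow set the function returns the statements 5 and 7; for the second it returns the empty set, so no further copy is inserted.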

In the example in Listing 5.1, a write statement is found on line 9 (C(1) = 42;). The variable C may point to m1 (aliased to A) or to m2 (aliased to B). The set of statements associated with the pair (C, m1) was {5}, i.e. C = A;, and a copy of C was added after this assignment. Similarly, a copy statement was added after the assignment C = B; on line 7, resulting in Listing 5.2.

This transformation inserts copies only when it finds a write statement; if some variables are possibly aliased but no write statement is performed on any of them, no copies will be inserted. This is in line with one of our goals, namely that we should not insert unnecessary copies.

6.2.3 Output parameters

When the copy insertion transformation finds a return statement, it inspects the points-to information of the output parameters. If an output parameter p may point to externally allocated memory or is possibly aliased to another output parameter, copies of p are inserted in the function at the program points indicated by the aliasing statements set of the offending memory sites.
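The check performed at a return statement can be sketched in Python. The sketch assumes the flow set is given as a collection of (variable, memory site) pairs and that externally allocated memory is represented by a single site named EXTERNAL, as in the starting approximation of Section 5.2; the function name is hypothetical.

```python
EXTERNAL = "EXTERNAL"


def outputs_needing_copy(flow_set, output_params):
    """Output parameters that violate invariant 2 at a return statement."""
    outputs = set(output_params)
    offending = set()
    for (v, m) in flow_set:
        if v not in outputs:
            continue
        shares_with_output = any(
            u in outputs and u != v and site == m for (u, site) in flow_set
        )
        if m == EXTERNAL or shares_with_output:
            offending.add(v)
    return offending


# a and b both alias the local z: they are aliased to one another.
aliased_outputs = {("z", "m1"): None, ("a", "m1"): None, ("b", "m1"): None}
# After a has been copied, b only shares memory with the local z.
after_copy = {("z", "m1"): None, ("a", "m2"): None, ("b", "m1"): None}
```

Note that sharing memory with a local variable such as z is allowed by invariant 2; only external memory and other output parameters are considered offending.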

As explained in Section 6.2.1, once copies for a variable are inserted, the transformation phase is terminated and the points-to analysis is run again. As the small example in Listing 6.2 shows, after a copy for the output parameter a has been added on line 4, at the return statement on line 6, neither of the output parameters a or b may point to externally-allocated memory, nor are they possibly aliased to one another. This, again, shows why it’s important to obtain fresh analysis results after inserting copies for one variable: adding all copies in one pass would have inserted a second, unnecessary copy statement for b after line 5.

    1  function [a, b] = f(x)
    2    z = zeros(10);    % {(z, m1, {})}
    3    a = z;            % {(z, m1, {3}) (a, m1, {3})}
    4    a = copy(a);      % {(z, m1, {}) (a, m2, {})}
    5    b = z;            % {(z, m1, {5}) (a, m2, {}) (b, m1, {5})}
    6    return;           % {(z, m1, {5}) (a, m2, {}) (b, m1, {5})}
    7  end

Listing 6.2 Breaking aliasing between output parameters

6.3 Copying input parameters

In this section, we describe how to find the input parameters that need to be copied and where the appropriate copy statements should be inserted.

6.3.1 Analysis

Unlike local variables and output parameters, a simple one-pass traversal over the IR is sufficient to find all the statements that write to an array input parameter. A write statement A(i) = e possibly modifies an input parameter p if p is contained in the UseDef set of A.

The analysis computes for each input parameter the set of statements that write to it directly or indirectly. The pseudo-code in Listing 6.3 describes an algorithm to compute that information.

    analysis_info = new Map()

    foreach input parameter p:
        write_statements = new Set()
        foreach statement s:
            if s has the form "A(i) = e" and p ∈ UseDef(A):
                write_statements.add(s)
        analysis_info.put(p, write_statements)

Listing 6.3 Parameter mutation analysis
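A runnable counterpart of Listing 6.3 might look as follows in Python. The inputs are deliberately simplified and are assumptions of this sketch: statements are (identifier, written variable) pairs, with None for statements that are not array writes, and the UseDef information is a plain dictionary from a variable to the input parameters it may derive from.

```python
def parameter_mutations(input_params, statements, use_def):
    """Map each input parameter to the array writes that may modify it."""
    info = {}
    for p in input_params:
        info[p] = {
            stmt_id
            for stmt_id, written in statements
            if written is not None and p in use_def.get(written, set())
        }
    return info


# Hypothetical function f(A, X): statement 2 is "B = A", statements 3 and 4
# are the array writes "A(1) = 0" and "B(2) = 1"; X is never written.
statements = [(2, None), (3, "A"), (4, "B")]
use_def = {"A": {"A"}, "B": {"A"}}
mutations = parameter_mutations(["A", "X"], statements, use_def)
```

In this example both writes may modify the input parameter A (the second one indirectly, through B), while X has an empty set of write statements and therefore needs no copy.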

6.3.2 Transformation

Unlike the copies we made for locals and output parameters, only one copy statement is added for each input parameter that is possibly modified in the function. An input parameter is possibly modified if its associated set of write statements as computed above is non-empty.

A copy statement is added to the inner-most block that is a common ancestor to all the statements in the set associated with the input parameter. In addition, this block must not be inside a loop. By putting the copy as deep within the function as possible, we can avoid making unnecessary copies when the dynamic control flow doesn’t go through any of the write statements for a given input parameter. Putting the copies outside of loops is necessary to respect the semantics of the original program.
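One way to sketch this placement rule in Python is to describe each write statement by the list of blocks that enclose it, outermost first, and to take the longest common prefix of those lists, cut before the first loop. The representation of blocks as (name, is_loop) pairs and the function name are assumptions of this sketch; it also assumes at least one write statement and a non-loop function body as the outermost block.

```python
def copy_block(write_paths):
    """Choose the block that should receive the copy of an input parameter.

    Each path lists the enclosing blocks of one write statement, outermost
    first, as (block_name, is_loop) pairs.
    """
    # Longest common prefix of all paths: the common ancestor blocks.
    common = []
    for level in zip(*write_paths):
        if any(block != level[0] for block in level):
            break
        common.append(level[0])
    # The copy must not be placed in a loop: stop before the first loop block.
    chosen = []
    for name, is_loop in common:
        if is_loop:
            break
        chosen.append(name)
    return chosen[-1]


in_loop = [("f", False), ("if1.then", False), ("while1", True)]
in_then = [("f", False), ("if1.then", False)]
in_else = [("f", False), ("if1.else", False)]
```

Two writes inside the same then branch (one of them in a nested loop) lead to a copy in that branch, whereas writes in both branches of the if statement force the copy into the function body.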

6.3.3 Possible improvements

In this section, we present two improvements that could help reduce the number of copies of input parameters further. These transformations are not implemented in MatJuice.

Writes within loops

The parameter copy insertion transformation could be modified to insert more than just copy statements. Consider the function in Listing 6.4.

    function f(A)
      A = copy(A);
      while cond
        A(1) = 0;
      end
    end

Listing 6.4 Naïve copy insertion for loops

In this code sample, we have added a copy statement outside the loop; however, the copy will be performed even if the loop is never actually executed. A more astute transformation is shown in Listing 6.5.

In this improved transformation, we wrap the while loop inside an if statement with the same condition. We would therefore only copy A if the loop is going to be executed at least once.

    function f(A)
      if cond
        A = copy(A);
        while cond
          A(1) = 0;
        end
      end
    end

Listing 6.5 Smarter copy insertion for loops

The major difficulty with this transformation is figuring out if this is a safe transformation to do, e.g. cond may have side effects.

Multiple copies

Consider the function in Listing 6.6: it contains an if statement with 1000 branches. This function writes to A in only two of those branches. Our input parameter copy insertion adds a single copy statement outside the if statement. If those two branches represent a small percentage of all dynamic path executions, a large number of unneeded copies may be performed. A dynamic compiler could determine if it is worthwhile to remove the single copy outside the if block and insert two separate copies in the appropriate branches.

    function f(A)
      if A(1) == 0
        A(1) = 42;
      elseif A(1) == 1
        A(1) = 43;
      ...
      elseif A(999) == 999
        case_999();
      else
        default();
      end
    end

Listing 6.6 If statement with large branching factor

6.4 MatJuice implementation

In this section, we give some of the technical details of the implementation of copy insertion in MatJuice.

6.4.1 No implicit returns

If a MATLAB function would return implicitly by falling off the end of its body, MatJuice adds an explicit return statement node on that path. This change doesn’t alter the semantics of the function, and making all the return points explicit simplifies the implementation of the transformation that adds copies of output parameters.

6.4.2 Local variables and output parameters

MatJuice uses the visitor pattern interface TIRAbstractNodeCaseHandler provided by Tamer to implement copy insertion based on the results of the points-to analysis. This visitor does a depth-first traversal of a TIRFunction node. Our transformation modifies the function in place rather than creating a completely new one. When a copy statement is inserted, a boolean field, addedCopy, is set; it notifies the visitor to terminate its traversal and return to the caller.

Our implementation of the Tamer visitor overrides two methods, caseTIRArraySetStmt and caseTIRReturnStmt; these correspond to the statements we said were of interest in Section 6.2.2 and Section 6.2.3. In those overridden methods, we inspect the points-to analysis information, determine if copy statements need to be added, and add MJCopyStmt nodes at the appropriate program points when required.

6.4.3 Input parameters

To find which input parameters need to be copied, MatJuice combines McLab’s UseDefDefUseChain analysis with Tamer’s depth-first visitor. At an array write statement, if the variable being modified is in the DU chain of an input parameter p, we add that statement to a list of statements that modify p. This process is performed for every array input parameter.

The transformation inspects the lists associated with each input parameter. If the list is not empty (i.e. the parameter is modified and must be copied), we find the inner-most block in which to enclose the copy. This inner-most block must also not be a loop nor contained inside a loop.

Chapter 7

Copy insertion evaluation

In this chapter, we evaluate how well the copy insertion transformation of MatJuice performs. We are interested in two metrics:

Metric 1 the number of copies performed at run-time;

Metric 2 the impact of those copies on execution time.

This chapter is organized as follows. In Section 7.1 we describe how we’ve instrumented MatJuice and GNU Octave to obtain the number of array copies performed during the execution of a benchmark; we also detail our evaluation methodology in that section. In Section 7.2 we evaluate metrics 1 and 2 by having MatJuice generate two output programs: one that performs copy insertion and one that naively copies every array. In Section 7.3 we compare MatJuice’s copy insertion transformation with a copy-on-write strategy; because MathWorks’ MATLAB is proprietary software, we will instead compare against GNU Octave 4.0, a free software implementation compatible with MATLAB. Because it is difficult to meaningfully measure the performance impact of a single implementation detail in two different systems, we will only compare the number of copies performed at run-time and not compare the execution times of the two systems.

7.1 Instrumentation and methodology

In this section, we describe the changes that were made to the implementations of MatJuice and Octave to measure the number of copies, we present our suite of benchmarks, how they were executed and measured, and we give the details of the machine that executed the benchmarks.

7.1.1 Instrumentation of MatJuice

In order to know the number of copies performed during the execution of a benchmark, we have implemented two enhancements in MatJuice.

The first enhancement is in the code generator; a new command-line option, --copy-insertion=, has been added to control how the original MATLAB program is translated to JavaScript. If the flag is true (the default), the rich points-to analysis and the copy insertion for input parameters, locals and output parameters are performed. If the flag is false, the code generator does not execute those phases and changes how it generates code:

• A copy of the array input parameters is added at the beginning of a function’s body.

• A MATLAB statement of the form A = B is translated to A = B.mj_clone(), meaning that a full copy of the right-hand side is performed, even if unnecessary.

The second enhancement was implemented in the standard library: two global counters were added, one to count the number of arrays that were fully copied and one to count the total number of elements that were copied. The mj_clone method updates those counters every time it’s called. To make sure that those counts represent the number of copies as they appear in the program, we have made sure that no library function calls mj_clone (which would inflate the number of copies); when some did, we modified them to use a different method.
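The two enhancements can be illustrated with a minimal sketch, assuming arrays are represented as Float64Array objects. Only the name mj_clone comes from the text; the counter names copyCount and elementCount, the function f_naive, and the body of mj_clone are hypothetical and may differ from MatJuice's actual library.

```javascript
// Hypothetical global counters; the names are illustrative only.
let copyCount = 0;     // number of arrays fully copied
let elementCount = 0;  // total number of elements copied

// Sketch of an instrumented mj_clone method on typed arrays.
Object.defineProperty(Float64Array.prototype, 'mj_clone', {
  value: function () {
    copyCount += 1;
    elementCount += this.length;
    return new Float64Array(this); // full element-wise copy
  },
  writable: true,
  configurable: true
});

// Possible shape of the naive translation (copy insertion disabled) of
//   function B = f(A)
//     B = A;
//     B(1) = 0;
//   end
function f_naive(A) {
  A = A.mj_clone();      // array input parameter copied on entry
  let B = A.mj_clone();  // "B = A" always clones the right-hand side
  B[0] = 0;              // MATLAB index 1 is JavaScript index 0
  return B;
}
```

Calling f_naive once on a three-element array increments copyCount by two and elementCount by six, even though a single copy would be enough to preserve MATLAB's value semantics here.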

7.1.2 Instrumentation of GNU Octave

For Octave, we took the C++ code for version 4.0 and modified the core library. In particular, we’ve instrumented the make_unique method inside liboctave/array/Array.h to count the number of times this method calls the copy constructor of a non-scalar array, i.e. when the number of elements of the array is greater than 1.

When testing this instrumentation, we found that during the initialization of Octave itself, 533 such invocations of the copy constructors were made. Therefore, we subtract 533 from the number reported by our instrumentation to obtain the number of copies performed at run-time during the execution of the benchmark code.

7.1.3 Benchmark suite

We will use the set of benchmarks described in Table 7.1 for our evaluation. We chose these benchmarks because they cover a wide variety of problems from numerical computing. Also, a subset of our benchmarks have been used in the past by Lameed and Hendren [1] for evaluating their inter-procedural copy removal transformation.

7.1.4 Methodology

The benchmarks presented in Table 7.1 were compiled to JavaScript by MatJuice and executed 10 times. Each benchmark accepts a scale factor that can be used to control how long a benchmark takes to finish. For some benchmarks, such as bubble or matmul, the scale factor determines the size of the input; for others, the scale factor indicates the number of times to execute the benchmark. We have selected scale factors such that each benchmark ran long enough (at least half a second) so that the measurements are not significantly affected by other processes that may run on the machine at the same time.

A Python script executed each benchmark 10 times and reported the number of copies, the average execution time and standard deviation for each benchmark. The benchmarks were run on a Linux machine running Ubuntu 14.04 LTS; we used Node.js 4.3.0 and GNU Octave 4.0.

Benchmark   Source                Description
babai       MATLAB file exchange  Compute the Babai estimation for an integer least square problem
bubble      McLab                 Bubble sort, an O(n^2) sorting algorithm
capr        Chalmers University   Compute the capacitance of a transmission line using finite difference and Gauss-Seidel iteration
clos        Otter project         Compute the transitive closure of a directed graph
collatz     McLab                 Test the Collatz conjecture up to a given integer
crni        Falcon project        Compute the Crank-Nicolson solution to the one-dimensional heat equation
dich        Falcon project        Compute the Dirichlet solution to Laplace’s equation
fdtd        EEK 170               Apply the Finite Difference Time Domain (FDTD) technique on a hexahedral cavity with conducting walls
fft         Press et al.          Compute the discrete Fourier transform
fiff        Falcon project        Compute the finite-difference solution to a given wave equation
lgdr        Unknown               Compute the normalized, orthonormal Legendre polynomials Pn(x) for all degrees up to and including n and their first and second derivatives
makechange  McLab                 Compute the ways to make change for a given amount using dynamic programming
matmul      McLab                 Naïve O(n^3) matrix multiplication
mcpi        McLab                 Calculate π by the Monte Carlo method
nb1d        Otter project         Simulate the 1-dimensional n-body problem
numprime    Burkardt and Cliff    Count the number of primes up to a given integer

Figure 7.1 List of benchmarks
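As a rough illustration of the statistics reported for each benchmark, the sketch below computes the mean and standard deviation of a list of timings. It is not the Python script used for the measurements: the function name summarize is hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not say which variant was reported.

```javascript
// Summarize repeated timings (in seconds) as mean and standard deviation.
// Assumption: sample standard deviation, i.e. an (n - 1) denominator.
// With a single timing the standard deviation is NaN (0 / 0).
function summarize(times) {
  const n = times.length;
  const mean = times.reduce((sum, t) => sum + t, 0) / n;
  const squaredError = times.reduce((sum, t) => sum + (t - mean) ** 2, 0);
  return { mean: mean, stddev: Math.sqrt(squaredError / (n - 1)) };
}
```

For the methodology described here, times would hold the 10 execution times of one benchmark.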

7.2 Naive copy vs. copy insertion

In this section, we examine how MatJuice’s copy insertion improves over a naive always-copy strategy. Table 7.2 gives the figures we’ve obtained. The “Naive” columns report the figures when copies are always performed and the “CI” columns report the figures when points-to analysis and copy insertion are performed. The “copies” column shows how many times the mj_clone method was called during execution.

                        MatJuice (Naive)                 MatJuice (CI)
Benchmark (scale)           Time (s)  Copies  Copies/s       Time (s)  Copies  Copies/s  Speedup
babai (2000)             0.58 ± 0.00    4000   6896.55    0.56 ± 0.01       0      0.00     1.04
bubble (10,000)          3.49 ± 0.03       2      0.57    3.47 ± 0.01       1      0.29     1.01
capr (5)                 8.41 ± 0.02  150000  17835.91    7.57 ± 0.01   50000   6605.02     1.11
clos (1)                19.10 ± 0.02       1      0.05   19.37 ± 0.03       0      0.00     0.99
collatz (1,000,000)      2.46 ± 0.05       0      0.00    2.48 ± 0.04       0      0.00     0.99
crni (5)                16.15 ± 0.02   45980   2847.06   15.93 ± 0.04   22990   1443.19     1.01
dich (5)                 4.96 ± 0.01       0      0.00    4.97 ± 0.02       0      0.00     1.00
fdtd (1)                19.14 ± 0.59     600     31.35   17.32 ± 0.56       0      0.00     1.11
fft (9)                  1.17 ± 0.01       2      1.71    1.16 ± 0.00       1      0.86     1.01
fiff (5)                 5.53 ± 0.03       0      0.00    5.52 ± 0.02       0      0.00     1.00
lgdr (1000)              1.02 ± 0.01    3000   2941.18    1.02 ± 0.01       0      0.00     1.00
makechange (2000)        1.00 ± 0.04    2001   2001.00    0.89 ± 0.00       0      0.00     1.12
matmul (400)             1.67 ± 0.01       2      1.20    1.62 ± 0.01       0      0.00     1.03
mcpi (1,000,000)         1.67 ± 0.07       0      0.00    1.67 ± 0.06       0      0.00     1.00
nb1d (5)                 6.34 ± 0.05  198202  31262.15    5.40 ± 0.04       0      0.00     1.17
numprime (5,000,000)     4.28 ± 0.12       0      0.00    4.41 ± 0.09       0      0.00     0.97

Figure 7.2 MatJuice without copy insertion (Naive) vs. MatJuice with copy insertion (CI)

The first thing to notice from those figures is that the number of copies when copy insertion is enabled is never higher, and it is lower whenever the naive strategy performs any copy at all. For some benchmarks (e.g. bubble, clos, fft, and matmul) it goes from a handful of copies to one or zero; in those cases, the naive implementation copies the array input parameters while the copy insertion analysis is able to determine that a copy is unnecessary. The small number of copies is due to the nature of the benchmarks: some compute a scalar value (e.g. clos, numprime) while others don’t copy arrays during their execution (e.g. bubble, matmul).

For other benchmarks such as capr, crni, and nb1d, the difference is more drastic: the number of copies is being reduced by tens of thousands in the more extreme cases. For capr and crni, the reduction in copies comes solely from the input parameter analysis being able to determine that some of the input parameters are not modified during the execution of the function and thus need not be copied. For fdtd, the conversion from MATLAB to TameIR introduces array copies; MatJuice’s points-to analysis is able to determine that these copies are actually read-only (i.e. no write is ever performed on these arrays) and so the copies that the naive strategy introduces are never inserted. For nb1d, both the input parameter analysis and the points-to analysis determine that the copies are unnecessary.

Some benchmarks, such as babai and lgdr, show a large reduction in the number of copies, but this is due to the scale factor that repeats the benchmark many times over. A single execution of babai performs two copies and one execution of lgdr performs three.

Even at this relatively small scale, the effects of the copy insertion transformation are apparent: for fdtd and capr, the smarter translation gives a speedup of 11%, and nb1d obtains a 17% speedup. In the case of crni, the speedup is a more modest 1%. We also observe a correlation between the “copies per second” column and the “speedup” column; the most significant speedups came from benchmarks that were copy-intensive, such as capr, makechange, and nb1d.
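The derived columns discussed above can be recomputed from the raw measurements. The sketch below assumes, based on the reported values, that “copies per second” is the copy count divided by the execution time and that the speedup is the naive time divided by the copy insertion time; the function names are hypothetical.

```javascript
// Derived columns of the naive vs. copy insertion comparison.
// Assumption: these formulas match how the reported columns were computed.
function copiesPerSecond(copies, seconds) {
  return copies / seconds;
}

function speedup(naiveSeconds, ciSeconds) {
  return naiveSeconds / ciSeconds;
}

// capr row: naive run takes 8.41 s with 150000 copies, CI run takes 7.57 s.
const caprRate = copiesPerSecond(150000, 8.41).toFixed(2); // "17835.91"
const caprSpeedup = speedup(8.41, 7.57).toFixed(2);        // "1.11"
```

A speedup above 1 means that the program generated with copy insertion ran faster than the naive one.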

We were interested in measuring the speed of copies in JavaScript. To this end, we’ve implemented a small program that times how long it takes to copy an array of double-precision floating point numbers equivalent to the number of elements copied in a single iteration of four benchmarks: capr, crni, fdtd, and nb1d. The results are given in Figure 7.3.

               MatJuice (Naive)         MatJuice (CI)
Benchmark          Total   Mean          Total   Mean   Time (s)
capr          31,500,000   1050     10,500,000   1050      0.078
crni          21,146,202   2300     10,575,400   2300      0.041
fdtd           4,735,000   7892              0    N/A      0.031
nb1d           1,383,600     30              0    N/A      0.005

Figure 7.3 Total and average size of copies

The “total” columns report how many double-precision floating point numbers were copied during a single execution of a benchmark (i.e. scale = 1), and the “mean” columns report, on average, how many elements were copied per call to mj_clone. The last column, “time”, is the most interesting data point; we took the difference between the naive total size and the copy insertion total size (e.g. for capr, 31,500,000 − 10,500,000 = 21,000,000) and report the time it takes to copy a typed array of that size. In all four cases, a full copy takes less than 80 milliseconds. This shows that even though copies can be performed very efficiently in JavaScript, generating code that reduces the number of such operations is an important optimization as shown in Figure 7.2.
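A timing program of this kind could look like the following sketch. It is not the program used to obtain the figures above: the function name timeCopy is hypothetical, and Date.now() is used only because it is available in both Node.js and browsers, which limits the resolution to about one millisecond.

```javascript
// Time a full copy of a Float64Array holding n doubles.
// Assumption: a typed-array copy constructor stands in for mj_clone.
function timeCopy(n) {
  const source = new Float64Array(n);
  for (let i = 0; i < n; i++) {
    source[i] = i; // fill with non-zero data before timing
  }
  const start = Date.now();
  const copy = new Float64Array(source); // the copy being measured
  const elapsedMs = Date.now() - start;
  return { source: source, copy: copy, elapsedMs: elapsedMs };
}
```

For capr, the call would be timeCopy(21000000), which allocates roughly 168 MB for each of the two arrays.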

7.3 Comparison with Octave

In this section, we will look at how MatJuice’s compile-time approach of performing copies compares to a popular run-time approach, namely copy on write. Copy on write is a system where an array is copied when it is written into and only if it is aliased. We have used GNU Octave 4.0 as a comparison point because MathWorks’ MATLAB is proprietary and closed source and thus it is not possible to get the source code and instrument it.
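To make the run-time approach concrete, here is a minimal sketch of copy on write using a reference count. It is a hypothetical illustration, not Octave's implementation: the class CowArray and all of its members are invented for this example.

```javascript
// Hypothetical copy-on-write array: aliases share one buffer and a
// reference count; a write copies the buffer only when it is shared.
class CowArray {
  constructor(data, shared) {
    this.shared = shared || { data: Float64Array.from(data), refs: 1 };
  }

  // Models the MATLAB assignment B = A: no element is copied.
  alias() {
    this.shared.refs += 1;
    return new CowArray(null, this.shared);
  }

  get(i) {
    return this.shared.data[i];
  }

  set(i, value) {
    if (this.shared.refs > 1) {
      // The buffer is aliased: detach from it and copy before writing.
      this.shared.refs -= 1;
      this.shared = { data: new Float64Array(this.shared.data), refs: 1 };
      CowArray.copies += 1;
    }
    this.shared.data[i] = value;
  }
}
CowArray.copies = 0; // number of full copies performed so far
```

Every write pays for the reference-count check, whether or not a copy ends up being made. That bookkeeping is the run-time machinery a compile-time approach avoids.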

In Table 7.4, we show the number of copies performed at run-time by GNU Octave and MatJuice with and without copy insertion. We have only included the number of copies and not the execution times as it would be impossible to meaningfully compare them.

A number of the benchmarks were used in Lameed & Hendren [1]. In that paper, the authors computed a lower bound on the number of necessary copies, using AspectMatlab, to obtain the value semantics of MATLAB. For the benchmarks we share in common with them, we’ve included the number they computed.

Benchmark (scale)   Octave  MatJuice (Naive)  MatJuice (CI)  Lower Bound †
babai (1)                2                 2              0            N/A
bubble (1000)            1                 1              2            N/A
capr (1)             10002             30000          10000          10000
clos (1)                 0                 1              0              0
collatz (1000)           0                 0              0            N/A
crni (1)              4601              9196           4598           4598
dich (1)                 2                 0              0              0
fdtd (1)                 0               600              0              0
fft (1)                  1                 2              1              1
fiff (1)                 0                 0              0            N/A
lgdr (1)                 0                 3              0            N/A
makechange (1000)        0                 2              0            N/A
matmul (256)             0                 2              0            N/A
mcpi (1)                 0                 2              0            N/A
nb1d (1)                 8              4601              0              0
numprime (1)             0                 0              0            N/A

† The figures in this column come from [1]

Figure 7.4 Number of array copies performed at run-time

We notice in the table above that the number of copies performed by the copy-on-write system and the copy-insertion system are extremely similar: in 11 of the benchmarks, they are identical and in 5 benchmarks copy insertion actually performs fewer copies. Copy insertion is showing itself to be as good as a system that knows at run-time whether an array is aliased or not. In addition, for the benchmarks where we know the lower bound of the number of copies, we see that the number of copies performed by MatJuice with copy insertion is equal to that bound. This is a very strong indication of the effectiveness of MatJuice’s technique.

7.4 Conclusion

The figures obtained in this chapter show that MatJuice’s points-to analysis and copy insertion transformation can dramatically reduce the number of copies that need to be performed at run-time. In fact, we’ve seen that MatJuice is able to match the number of copies of a run-time system and matches the lower bound of copies necessary for a number of our benchmarks. These results show quite clearly that an intra-procedural approach that adds no extra run-time machinery is suitable for the kind of numerical problems typical of MATLAB. Also, by not needing a reference counting system in the output program, we save CPU cycles and introduce less code that may prevent a JIT compiler from performing aggressive optimizations.

Chapter 8

Performance evaluation

In this chapter, we compare the performance of MatJuice-generated JavaScript running in two modern web browsers, Firefox and Chrome, against MATLAB code running in MathWorks’ implementation. In Section 8.1 we introduce the benchmarks that we’ll use for our experiment and discuss the third-party tools that we used to execute them. Section 8.2 gives the details on the hardware and software that we used to run the benchmarks. Section 8.3 presents the results of our experiment and offers explanations on the figures we obtained.

8.1 Benchmarks

To measure the performance of MatJuice and MATLAB, we reuse the benchmarks from the previous chapter (Table 7.1).

8.1.1 Executing the benchmarks

To execute the benchmarks, we integrated them into Ostrich2 [21]. Ostrich2 is a suite of numerical benchmarks for languages such as C, MATLAB, JavaScript and Python. Ostrich2 builds upon the Wu Wei benchmarking toolkit [22], and offers a number of features that were quite useful to benchmark MatJuice; in particular, Ostrich2’s ability to properly launch a web browser, execute a benchmark, obtain the time information, and shut down the browser cleanly was instrumental in running tests efficiently. Ostrich2 is also able to interact with other systems, such as MathWorks’ MATLAB, so we were able to run our entire benchmark completely automatically.

To integrate the benchmarks into Ostrich2, we had to modify them to adopt the conventions of the suite:

• For each benchmark we created a meta-file describing the nature of the benchmark, the auxiliary files necessary for compiling, and the size of the inputs;

• Calls to the timing functions tic and toc were inserted inside the benchmark itself rather than the benchmark being timed by an external tool (e.g. Unix’s time(1) command);

• A print statement was added to the benchmarks to report the execution time in the JSON format that Wu Wei expects;

• A new compiler description file for MatJuice was added.

Once the infrastructure was in place, we used Ostrich2 and Wu Wei to run each benchmark 10 times in Chrome, Firefox and MathWorks’ MATLAB, and we used Wu Wei’s report command to obtain the timings. Ostrich2 also supports GNU Octave; however, we found its performance too slow to be worthy of inclusion.

8.2 Experimental setup

In the previous chapter, we used Node.js to count the number of copies that each benchmark performed. In this chapter, the JavaScript benchmarks are run directly inside a web browser. We are using Mozilla Firefox 41 and Google Chrome 48, which were the latest stable versions at the time the benchmarks were executed. We selected those browsers for a number of reasons: (1) they are cross-platform, running natively on Windows, Mac and Linux, (2) they are generally considered to be the two fastest browsers on the market at the moment, (3) they are the two most popular browsers on the market at the moment.

For MATLAB, we are using MathWorks’ MATLAB 2015b, the latest stable version at the time we executed the benchmarks. MathWorks’ MATLAB is the reference implementation for the MATLAB language and the most popular MATLAB implementation, so it makes sense that we use it for our tests.

The full software details, as well as the operating system version and the specification of the hardware are summarized in Table 8.1.

Hardware
  CPU                Intel Core i7 @ 3.60GHz (8 cores)
  Cache size         3 MiB
  Memory             8 GiB

Software
  OS                 Ubuntu 14.04 LTS
  Kernel             Linux 3.16
  Firefox            41
  Chrome             48
  V8                 4.8
  MathWorks MATLAB   2015b

Figure 8.1 Hardware and software details

8.3 Results

With the details of the benchmarks and the tools used to execute the benchmarks out of the way, let us look at the time Chrome, Firefox and MathWorks’ MATLAB took to execute the benchmarks. The figures are shown in Table 8.2; the times are in seconds and are the average time over 10 executions. We’ve included a “ratio” column along with the timings for JavaScript; a ratio below 1 means that JavaScript was faster than MATLAB and a ratio above 1 means that JavaScript was slower. A cell in magenta denotes a time where JavaScript is slower than MATLAB, and a cell in green denotes a time where JavaScript is faster.

                      MatJuice (Chrome)  MatJuice (Firefox)     MATLAB
Benchmark (scale)     Time (s)    Ratio   Time (s)    Ratio   Time (s)
babai (2000)              1.13     6.27       0.76     4.22       0.18
bubble (10,000)           3.92     2.78       4.20     2.97       1.41
capr (5)                  7.63     3.54       7.62     3.54       2.15
clos (1)                 20.29    88.21      23.35   101.52       0.23
collatz (1,000,000)       2.60     0.41       1.02     0.16       6.31
crni (5)                 16.14     7.57      15.38     7.22       2.13
dich (5)                  5.54     4.07       4.08     3.00       1.36
fdtd (1)                 24.26   269.55       9.62   106.88       0.09
fft (9)                   1.52     4.90       0.97     3.12       0.31
fiff (5)                  6.09     3.12       5.37     2.75       1.95
lgdr (1000)               0.45     2.64       0.25     1.47       0.17
makechange (2000)         1.20     3.63       1.15     3.48       0.33
matmul (400)              2.00     2.81       2.04     2.87       0.71
mcpi (1,000,000)          1.47     0.62       1.53     0.64       2.37
nb1d (5)                  2.96     2.81       3.64     3.46       1.05
numprime (5,000,000)      3.77     0.08       4.11     0.09      43.53

Figure 8.2 Benchmark results (times in seconds)

The first thing one notices is that out of 16 benchmarks, MathWorks’ MATLAB is faster in 13 of them. Let’s discuss the performance differences between JavaScript and MATLAB and attempt to explain them.

8.3.1 High-performance numeric routines

For many benchmarks, it was expected that MATLAB would be faster: the MATLAB implementation delegates expensive array operations (e.g. matrix arithmetic) to a native library of highly-optimized routines written in compiled languages such as C and Fortran. In JavaScript, the equivalent operations are written in JavaScript itself; as JavaScript is unable to interact with native libraries, it cannot benefit from the effort and engineering that went into high-performance libraries. In a few benchmarks the time difference was quite high:

clos: this benchmark includes the multiplication of two 450 × 450 matrices in a loop;

crni: this benchmark assigns the transpose of a vector into a matrix slice;

fdtd: this benchmark has a loop in which a series of matrix additions, subtractions and multiplications are performed.

To get an idea of the importance of high-performance routines on the execution time of the MATLAB benchmarks, we replaced the native matrix multiply in clos with the MATLAB implementation of matrix multiplication from the matmul benchmark. The average time for computing clos in MathWorks’ MATLAB went from 0.23 seconds to 16.3 seconds, a slowdown of about 70x!
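For reference, a matrix multiplication written in JavaScript itself, as opposed to one delegated to a native library, could look like the sketch below. It assumes square matrices stored in column-major order in Float64Array objects; the function name matmul is reused from the benchmark for convenience, but this is not the code of the benchmark or of MatJuice's library.

```javascript
// Naive O(n^3) product of two n-by-n matrices stored column-major:
// element (i, j), with 0-based indices, lives at index i + n * j.
function matmul(A, B, n) {
  const C = new Float64Array(n * n);
  for (let j = 0; j < n; j++) {
    for (let i = 0; i < n; i++) {
      let sum = 0;
      for (let k = 0; k < n; k++) {
        sum += A[i + n * k] * B[k + n * j];
      }
      C[i + n * j] = sum;
    }
  }
  return C;
}
```

Every multiply-add in the inner loop goes through the JavaScript engine, whereas a native routine can rely on blocking and vectorization tuned for the hardware.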

8.3.2 Costly array accesses

As we discussed in Section 3.3, MATLAB array accesses are transformed into function calls (mj_get and mj_set) in the JavaScript output. In a number of benchmarks, we have found that these basic operations make up a large percentage of the total execution time.

In particular, we have profiled the benchmarks bubble, makechange and matmul. These benchmarks don’t use constructs other than loops and array accesses and one would expect the JavaScript implementations to be competitive with MATLAB. To our surprise, they were all slower than in MATLAB.

We have used Firefox’s developer tools to profile these three benchmarks. We’ve captured screenshots from the profiler’s output in Figure 8.5. For bubble, array access represents 23.46% of the total execution time, for makechange the percentage is 31.24% and for matmul, 24.75%. These figures are very high for such simple and common operations.

Clearly, our strategy for reading and writing into arrays can be a major bottleneck in the performance of our generated code. To address this embarrassing issue, we have implemented an alternative way of translating the TIRStmtArrayGet and TIRStmtArraySet nodes when all the indices are scalars (slicing operations are still done with function calls).

By using Tamer’s shape information, we can translate these statements directly into JavaScript array accesses. If the dimensions of the array are known statically, we can directly insert the constants into the expression for computing the index; otherwise we make a call to the mj_stride method to obtain the size of a given dimension. The code in Listing 8.3 shows how TIRStmtArrayGet is translated; the code for TIRStmtArraySet is analogous, but not shown to save space.

    % A is a 3x4 matrix
    % MATLAB             // JavaScript
    x = A(i, j)   ===>   x = A[(i-1) + 3*(j-1)]
    y = A(k)      ===>   y = A[(k-1)]

    % A is a two-dimensional matrix with unknown shape
    % MATLAB             // JavaScript
    x = A(i, j)   ===>   x = A[(i-1) + A.mj_stride()[1]*(j-1)]
    x = A(k)      ===>   x = A[(k-1)]

Figure 8.3 Inlined translation of TIRStmtArrayGet
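The difference between the two translation strategies can be tried out with a small sketch, assuming a 3x4 matrix stored column-major in a Float64Array. The function getViaCall is a hypothetical stand-in for a library accessor; the real mj_get may have a different signature and may perform additional checks.

```javascript
// Hypothetical accessor in the style of a library call; indices are
// 1-based, as in MATLAB, and the row count is passed explicitly.
function getViaCall(A, rows, i, j) {
  return A[(i - 1) + rows * (j - 1)];
}

// Build a 3x4 matrix, stored column-major, with A(i, j) = 10*i + j.
const rows = 3;
const cols = 4;
const A = new Float64Array(rows * cols);
for (let j = 1; j <= cols; j++) {
  for (let i = 1; i <= rows; i++) {
    A[(i - 1) + rows * (j - 1)] = 10 * i + j;
  }
}

// Function-call form of x = A(2, 3).
const viaCall = getViaCall(A, rows, 2, 3);

// Inlined form of x = A(2, 3), with the row count 3 known statically.
const inlined = A[(2 - 1) + 3 * (3 - 1)];
```

Both forms read the same element; the inlined one leaves the JIT compiler with a plain typed-array access and simple integer arithmetic.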

With this new translation strategy in place, we re-compiled the benchmarks and ran them again. The updated results are shown in Table 8.4. We can now see that Chrome and Firefox are faster in 10 of the 16 benchmarks, more than half of the suite. In particular, the bubble, makechange, and matmul benchmarks are now all faster in JavaScript than in MATLAB, and they’ve improved over the previous translation strategy by factors close to an order of magnitude (8x, 10.9x, and 15.4x for Chrome; 11.3x, 6.4x, and 17x for Firefox).

These new results confirm our intuition that leveraging the hard work that went into the JavaScript JIT compilers and VMs is a valid strategy for getting MATLAB code to run in the browsers with acceptable performance.

                      MatJuice (Chrome)  MatJuice (Firefox)     MATLAB
Benchmark (scale)     Time (s)    Ratio   Time (s)    Ratio   Time (s)
babai (2000)              1.10     6.11       0.77     4.28       0.18
bubble (10,000)           0.49     0.35       0.37     0.26       1.41
capr (5)                  0.86     0.40       1.26     0.58       2.16
clos (1)                 20.41    92.77      23.68   107.64       0.22
collatz (1,000,000)       2.66     0.42       1.02     0.16       6.32
crni (5)                  7.92     3.70       8.41     3.93       2.14
dich (5)                  0.87     0.64       0.31     0.23       1.36
fdtd (1)                 23.99   266.56       9.58   106.44       0.09
fft (9)                   0.13     0.42       0.10     0.32       0.31
fiff (5)                  0.48     0.25       0.88     0.45       1.94
lgdr (1000)               0.44     2.44       0.24     1.33       0.18
makechange (2000)         0.11     0.32       0.18     0.53       0.34
matmul (400)              0.13     0.19       0.12     0.17       0.70
mcpi (1,000,000)          1.43     0.61       1.57     0.67       2.35
nb1d (5)                  2.92     2.78       3.67     3.50       1.05
numprime (5,000,000)      3.82     0.09       4.10     0.09      43.56

Figure 8.4 Inlined array accesses (times in seconds)

Figure 8.5 Profile outputs of bubble, makechange and matmul: (a) Bubble sort, (b) Make change, (c) Matrix multiply

Chapter 9

Related Work

In Section 9.1, we present a number of other MATLAB-related projects, mostly other source-to-source compilers, and how they influenced MatJuice. In Section 9.2, we present a number of compiler projects that target JavaScript. Finally, in Section 9.3 we look at some JavaScript libraries that provide numeric routines.

9.1 Other MATLAB compilers

In this section, we shall look at other source-to-source compilers for MATLAB.

9.1.1 MATLAB Coder

MATLAB Coder [23] is a commercial product by MathWorks, the makers of MATLAB. It is an ahead-of-time, source-to-source compiler that generates readable and portable C and C++ code. MATLAB Coder can be used to integrate MATLAB code in a C or C++ application, and it can also be used to accelerate computationally intensive functions.

9.1.2 FALCON

FALCON [24, 25] is an ahead-of-time, source-to-source compiler accepting a MATLAB program as input and producing an equivalent Fortran 90 program. FALCON uses type inference algorithms developed for the array language APL and the set language SETL. It takes the MATLAB functions and scripts of the input program and inlines them into a single function. A number of the benchmarks we presented in Chapter 7 and Chapter 8 were also used to evaluate FALCON, such as dich, fiff, and crni.

9.1.3 Mc2For

Mc2For [8] is one of the numerous backends available from the McLAB project. Like MatJuice, it is an ahead-of-time, source-to-source compiler; it accepts a MATLAB program as input and produces a Fortran 95 program as output. Similarly to MatJuice, it uses the Tamer framework and TameIR as its intermediate representation of choice. Unlike MatJuice, Mc2For also uses the Tamer+ utility which accumulates the three-address code statements of Tamer and groups them together to regain the original expressions that the programmer wrote. This is extremely helpful to the readability of the output program.

The shape analysis that MatJuice uses was first developed for Mc2For. In particular, a domain-specific language was created to describe the shape of the input parameters accepted by the different MATLAB built-in functions and the shape of the output parameters.

9.1.4 MiX10

MiX10 [9] is also a backend available under the umbrella of the McLAB project. It is an ahead-of-time, source-to-source compiler that consumes a MATLAB program and produces an X10 program as output. X10 is a language designed by IBM for high-performance computing; the current compiler translates X10 code to Java or C++.

One of the contributions of MiX10 is the IntegerOkay analysis. Experimental results showed that when a MATLAB program was converted to X10 and then to C++, the resulting program was slower than if the same program was converted to X10 and then to Java. As this went against common wisdom and experience, the developers investigated and found that using floating point numbers when integers could be used was the source of the slowdown in C++. They thus created an analysis that could tell if it was safe to represent a given variable using an integer rather than a floating-point number.

Since JavaScript doesn’t have integers in its language definition, we have not used MiX10’s IntegerOkay analysis in MatJuice.

9.1.5 Differences with MatJuice

The compiler projects mentioned above and MatJuice differ in a few notable areas.

Target language: MatJuice targets JavaScript, a dynamically-typed language, while the other projects target statically-typed languages (C, C++, Fortran, and X10). One particular point where this difference was helpful to us was in the handling of function parameters: rather than having multiple versions for a scalar parameter and an array parameter, we can use the same function and rely on the common methods that we’ve implemented for the classes Number and Float64Array.

JIT target: The compilers above all target languages that are typically implemented with an ahead-of-time compiler while JavaScript is typically implemented with a just-in-time compiler. This can make the performance of the generated JavaScript more sensitive to variations in the output code (e.g. if the number of statements in a function exceeds the threshold for a particular JIT compiler).

Copy insertion: The projects mentioned before perform a full copy of arrays when they are assigned to other variables; MatJuice distinguishes itself by being smarter and making copies only when they are shown to be necessary.
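The point made under “Target language” can be illustrated with a short sketch in which the same method is defined on both Number and Float64Array, so that one generated function accepts either kind of argument. The method name mj_numel and the function describe are hypothetical; they are not taken from MatJuice's library.

```javascript
// Hypothetical shared method, defined once for scalars and once for arrays.
Object.defineProperty(Number.prototype, 'mj_numel', {
  value: function () { return 1; },
  writable: true,
  configurable: true
});
Object.defineProperty(Float64Array.prototype, 'mj_numel', {
  value: function () { return this.length; },
  writable: true,
  configurable: true
});

// A single function serves both scalar and array arguments; no
// per-type specialization is needed as it would be in C or Fortran.
function describe(x) {
  return x.mj_numel() === 1 ? 'scalar' : 'array';
}
```

A one-element array is reported as a scalar here, which mirrors MATLAB's view of scalars as 1x1 matrices.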

9.2 Compilers targeting JavaScript

In the past years, a dizzying number of compilers and compiler-related tools that target JavaScript have been created. A GitHub page [26] maintained by the authors of the CoffeeScript project [27] lists a large number of such projects.

Of these projects, we find a number of new programming languages that were created to improve on the state of the art for web programming:

CoffeeScript [27]: a clean, indentation-based language that maps one-to-one to JavaScript;

TypeScript [28]: a superset of JavaScript by Microsoft that adds optional static typing and class-based object-oriented programming to JavaScript;

Elm [29]: a purely functional programming language for building browser-based user interfaces by using functional reactive programming;

Dart [30]: an object-oriented programming language by Google that compiles to JavaScript.

In the list of projects, we also find a number of backends for existing programming languages, such as emscripten [31] (C, C++), js_of_ocaml [32] (OCaml), ghcjs [33] (Haskell), ClojureScript [34] (Clojure), FunScript [35] (F#), Opal [36] (Ruby), and many others. These projects, like MatJuice, exist to make programs written in those languages accessible in a web browser without requiring a complete rewrite.

Despite the impressive number of projects that target JavaScript, numerical languages are notably absent: we find no tools for MATLAB, Octave, Julia, or R. This is the niche that MatJuice hopes to fill.

9.3 Numerical libraries for JavaScript

When MatJuice was first started, we found no numerical library upon which to build and thus wrote our own implementations of the necessary MATLAB built-ins. Since then, a number of projects have appeared to offer better and faster numerical routines to JavaScript.

9.3.1 McNumJS

McNumJS [37] is a numerical library developed at Sable lab that aims to be easy to use by providing an API similar to that of NumPy. It also tries to be performant by using typed arrays and using patterns of asm.js to gain an extra boost of performance. By using these techniques, McNumJS was faster than other numerical libraries by a factor of at least 1.8.

9.3.2 Ndarray

Ndarray [38] is a modular multi-dimensional array library for JavaScript. At its core, it provides the fundamental methods for reading and writing an element in an array, retrieving information on the array (e.g. dimensions, number of elements); a growing ecosystem of numerical methods adds more advanced operations to those arrays for performing linear algebra, calculus, or graphic operations. Ndarray uses typed arrays to have good performance. It also features methods to specify whether a matrix should be represented in row-major or column-major format.

9.3.3 numeric.js

Numeric.js [39] is a library that allows a programmer to perform sophisticated numerical operations in the browser. Numeric.js offers interesting functionality, such as the workshop, a MATLAB-like interactive environment that can be used for plotting. Due to its age, matrices in numeric.js are represented using regular arrays instead of typed arrays.


Chapter 10

Conclusion and Future Work

10.1 Conclusion

In this thesis, we have presented MatJuice, an ahead-of-time, source-to-source compiler from MATLAB to JavaScript. By leveraging tools from the McLAB project and from the Tamer framework, we are able to automatically translate MATLAB programs into equivalent JavaScript programs. Such a tool is useful to scientists and engineers who want to integrate their work into a web application without having to manually port their code to JavaScript.

As part of its transformation process, the MatJuice compiler features a novel points-to analysis and an associated copy insertion transformation that emulate MATLAB’s copy semantics without performing a full copy of an array at each assignment (Chapter 4, Chapter 5, Chapter 6). For every function in a program, MatJuice is able to correctly determine at every program point whether a full array copy is necessary. We have found that this intra-procedural approach is, predictably, always better than a naive approach that inserts copies everywhere, both in the number of copies performed at run-time and in execution time. More interestingly, we have found that it is as good as, and in some cases better than, the copy-on-write system of GNU Octave, a free software implementation compatible with MATLAB. In fact, we have found, when comparing our transformation with previous work, that MatJuice is often (and in the case of our benchmarks, always) able to insert the minimum number of copies necessary to respect the MATLAB semantics (Chapter 7).
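As a reminder of the problem that copy insertion solves, the following sketch contrasts a translation that ignores aliasing with one that copies before writing to a possibly shared array. It is an illustration only: the function names are hypothetical, the code is hand-written rather than MatJuice output, and the exact placement of copies chosen by the compiler is the one described in the chapters cited above.

```javascript
// MATLAB source being modelled (value semantics):
//   a = [1 2 3];  b = a;  b(1) = 10;   % a must still be [1 2 3]

// Naive translation: `b` aliases `a`, so the write leaks into `a`.
function withoutCopy() {
  const a = new Float64Array([1, 2, 3]);
  const b = a;          // both names refer to the same storage
  b[0] = 10;
  return Array.from(a); // [10, 2, 3], which violates MATLAB semantics
}

// Translation with a copy inserted before the write to an aliased array.
function withCopy() {
  const a = new Float64Array([1, 2, 3]);
  let b = a;            // sharing is harmless until one side is written
  b = b.slice();        // copy: `a` and `b` may point to the same array
  b[0] = 10;
  return Array.from(a); // [1, 2, 3], as in MATLAB
}

// No aliasing: the array has a single name, so no copy is required.
function noCopyNeeded() {
  const a = new Float64Array([1, 2, 3]);
  a[0] = 10;
  return Array.from(a); // [10, 2, 3], the intended in-place update
}
```

A naive strategy would copy in all three cases, whereas an analysis that tracks which variables may share storage only needs the copy in the second one.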

In the benchmarking chapter (Chapter 8), we compared the execution time of 16 numerical programs; when a simple translation strategy for accessing arrays was used, we found that the MatJuice-generated JavaScript was slower than MathWorks’ MATLAB in 13 out of 16 benchmarks. We addressed this problem by implementing a smarter translation of the Tamer array access statements. With the updated code generator, we found that the MatJuice-generated JavaScript was faster than MATLAB in 10 out of 16 benchmarks. We also saw that in many benchmarks, MATLAB’s ability to call into native libraries was instrumental to its performance.

10.2 Future work

In this section, we offer suggestions for work that could be done with MatJuice and McLAB to improve the project further.

Support for more data types: at the moment MatJuice supports only double-precision floating-point numbers, as this is the only numeric type natively available in JavaScript. Although floats represent the vast majority of use cases in MATLAB, it would be good to add support for integer types and for complex numbers as well.

More complete library: the number of built-in functions available to MatJuice is small in comparison to the thousands of functions available in MATLAB. We implemented only the common arithmetic and logical operations as well as the matrix routines required for running our benchmarks. To support a greater number of programs, more built-in functions are needed. These functions could either be written directly in JavaScript or be adapted from an existing numeric library (such as ndarray). This latter option is especially attractive: interest in numeric computing in JavaScript is rising, and so is the number of contributors to ndarray.

Alternatively, the matrix operations could be written in MATLAB using a basic MATLAB subset and compiled by MatJuice. This approach would also benefit other McLAB backends by lowering the effort required to support many operations.

Liveness analysis: our points-to analysis does not currently consider liveness (i.e. whether the current value of a variable is still needed) when performing copy insertion. If liveness information were combined with our points-to information, it would be possible to determine that even though two variables point to the same memory, a copy is not necessary because one of them is dead.
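The liveness suggestion can be illustrated with a small hand-written sketch. The two functions below are hypothetical translations of the same MATLAB fragment, not MatJuice output; each returns the computed result together with the number of full array copies it performed, so that the saving is visible.

```javascript
// MATLAB source being modelled:
//   a = zeros(1, n);  b = a;  b(1) = 1;  r = sum(b);   % `a` is never read again

function sum(xs) {
  let total = 0;
  for (const x of xs) total += x;
  return total;
}

// Points-to information alone: `a` and `b` alias, so a copy precedes the write.
function pointsToOnly(n) {
  let copies = 0;
  const a = new Float64Array(n);
  let b = a;
  b = b.slice();        // full copy of n elements
  copies++;
  b[0] = 1;
  return { result: sum(b), copies };
}

// Points-to plus liveness: `a` is dead after `b = a`, so nobody can observe
// the write through `a` and the array can be updated in place.
function pointsToAndLiveness(n) {
  const copies = 0;
  const a = new Float64Array(n);
  const b = a;
  b[0] = 1;
  return { result: sum(b), copies };
}

console.log(pointsToOnly(4));        // { result: 1, copies: 1 }
console.log(pointsToAndLiveness(4)); // { result: 1, copies: 0 }
```

Both versions compute the same value; the second simply avoids a copy whose only purpose would be to protect a variable that is never read again.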

In addition to the work suggested above, JavaScript and the web evolve at an extremely fast pace, and many new technologies emerge that could be of interest for MatJuice. One such technology is WebAssembly [40], a low-level language designed as a target for compilers. The goal of WebAssembly is to make web applications running in the browser competitive in performance with native applications. At the time of writing this thesis, WebAssembly is still in the experimental stages, but it shows promise, and the makers of the major browsers (Mozilla, Google, Apple, and Microsoft) all seem enthusiastic about this new technology.

Another technology that would be a game changer for compilers of numerical languages that target the web is a high-performance array library comparable to LAPACK. Notably, we saw in Section 8.3.1 that using a well-tuned matrix multiplication routine can improve the execution time of a benchmark by a factor of 70.
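For reference, the kind of routine that such a library would replace looks roughly like the sketch below: a straightforward triple loop over typed arrays. The function name and the column-major layout (element (i, j) of a matrix with r rows stored at index i + j * r) are assumptions made for this illustration, not a description of the MatJuice built-in.

```javascript
// Naive O(m*n*p) product C = A * B on column-major Float64Arrays.
// A is m-by-n, B is n-by-p, and the result C is m-by-p.
function matmul(A, B, m, n, p) {
  const C = new Float64Array(m * p); // zero-initialised
  for (let j = 0; j < p; j++) {
    for (let k = 0; k < n; k++) {
      const b = B[k + j * n];
      // The inner loop walks down one column of A and of C, so both
      // arrays are accessed contiguously.
      for (let i = 0; i < m; i++) {
        C[i + j * m] += A[i + k * m] * b;
      }
    }
  }
  return C;
}

// [1 2; 3 4] * [5 6; 7 8] = [19 22; 43 50], all stored column by column.
const A = new Float64Array([1, 3, 2, 4]);
const B = new Float64Array([5, 7, 6, 8]);
console.log(Array.from(matmul(A, B, 2, 2, 2))); // [19, 43, 22, 50]
```

A tuned library routine would typically add techniques such as blocking for cache locality on top of this baseline.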


Bibliography

[1] N. Lameed and L. Hendren, “Staged Static Techniques to Efficiently Implement Array Copy Semantics in a MATLAB JIT Compiler,” in Proceedings of the 20th International Conference on Compiler Construction, CC 2011, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2011, Springer Berlin / Heidelberg, mar 2011.

[2] A. Casey, J. Li, J. Doherty, M. Chevalier-Boisvert, T. Aslam, A. Dubrau, N. Lameed, A. Aslam, R. Garg, S. Radpour, O. S. Belanger, L. Hendren, and C. Verbrugge, “McLab: an extensible compiler toolkit for MATLAB and related languages,” in Proceedings of the Third C* Conference on Computer Science and Software Engineering, C3S2E ’10, (New York, NY, USA), pp. 114–117, ACM, 2010.

[3] Faiz Khan, Vincent Foley-Bourgon, Sujay Kathrotia, Erick Lavoie, Laurie Hendren, “Using JavaScript and WebCL for Numerical Computations: A Comparative Study of Native and Web Technologies,” Tech. Rep. SABLE-TR-2014-2, Sable Research Group, School of Computer Science, McGill University, Montréal, Québec, Canada, 2014.

[4] S. Teller, Data Visualization with D3.js. Packt Publishing, 2013.

[5] A. Casey and L. Hendren, “MetaLexer: a modular lexical specification language,” in Proceedings of the tenth international conference on Aspect-oriented software development, AOSD ’11, (New York, NY, USA), pp. 7–18, ACM, 2011.

[6] J. Doherty, “McSAF: An extensible static analysis framework for the MATLAB language,” Master’s thesis, August 2011.

[7] A. W. Dubrau and L. J. Hendren, “Taming matlab,” in Proceedings of the ACM international conference on Object oriented programming systems languages and applications, OOPSLA ’12, (New York, NY, USA), pp. 503–522, ACM, 2012.

[8] X. Li, “Mc2for: A tool for automatically translating matlab to fortran 95,” pp. 234–243, IEEE, 2014.

[9] V. Kumar, “Mix10: Compiling matlab to x10 for high performance,” Master’s thesis, April 2014.

[10] A. Bodzay and L. Hendren, “Aspectmatlab++: Annotations, types, and aspects for scientists,” in Proceedings of the 14th International Conference on Modularity, MODULARITY 2015, (New York, NY, USA), pp. 41–54, ACM, 2015.

[11] S. L. et al., “McLab-Web.” https://github.com/Sable/McLab-Web.

[12] L. Xu, “Mc2for: A matlab to fortran 95 compiler,” Master’s thesis, McGill University, April 2014.

[13] T. Ekman and G. Hedin, “The jastadd system — modular extensible compiler construction,” Science of Computer Programming, vol. 69, no. 1–3, pp. 14–26, 2007. Special issue on Experimental Software and Toolkits.

[14] Mozilla, “sweet.js.” http://sweetjs.org/.

[15] MathWorks, “MATLAB: Floating-Point Numbers.” http://www.mathworks.com/help/matlab/matlab_prog/floating-point-numbers.html.

[16] E. International, “ECMAScript Language Specification, §4.3.19.” http://www.ecma-international.org/ecma-262/5.1/#sec-4.3.19.

[17] L. Shure, “MATLAB: Memory Management for Functions and Variables.” http://blogs.mathworks.com/loren/2006/05/10/memory-management-for-functions-and-variables/.

[18] GNU, “Octave: Miscellaneous Techniques.” https://www.gnu.org/software/octave/doc/interpreter/Miscellaneous-Techniques.html.

[19] N. Lameed and L. Hendren, “Staged Static Techniques to Efficiently Implement Array Copy Semantics in a MATLAB JIT Compiler,” in Proceedings of the 20th International Conference on Compiler Construction, CC 2011, Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2011, Springer Berlin / Heidelberg, Mar 2011.

[20] M. Chevalier-Boisvert, “McVM: an Optimizing Virtual Machine for the MATLAB Programming Language,” Master’s thesis, August 2009.

[21] Sable Lab, “Ostrich2.” https://github.com/Sable/Ostrich2.

[22] Sable Lab, “Wu Wei Benchmarking Toolkit.” https://github.com/Sable/wu-wei-benchmarking-toolkit/.

[23] MathWorks, “Matlab Coder.” http://www.mathworks.com/products/matlab-coder/.

[24] L. De Rose and D. Padua, “A matlab to fortran 90 translator and its effectiveness,” in Proceedings of the 10th International Conference on Supercomputing, ICS ’96, (New York, NY, USA), pp. 309–316, ACM, 1996.

[25] L. D. Rose, K. Gallivan, E. Gallopoulos, B. A. Marsolf, and D. A. Padua, “Falcon: A matlab interactive restructuring compiler,” in Proceedings of the 8th International Workshop on Languages and Compilers for Parallel Computing, LCPC ’95, (London, UK), pp. 269–288, Springer-Verlag, 1996.

[26] J. Ashkenas, “Languages that compile to JavaScript.” https://github.com/jashkenas/coffeescript/wiki/list-of-languages-that-compile-to-js.

[27] J. Ashkenas, “CoffeeScript.” http://coffeescript.org/.

[28] Microsoft, “TypeScript.” https://www.typescriptlang.org/.

[29] Evan Czaplicki, “Elm.” http://elm-lang.org/.

[30] Google, “Dart.” https://www.dartlang.org/.

[31] The emscripten authors, “Emscripten.” https://github.com/kripken/emscripten.

[32] Ocsigen authors, “js_of_ocaml.” https://github.com/ocsigen/js_of_ocaml.

[33] Hamish Mackenzie, Victor Nazarov, Luite Stegeman, “ghcjs.” https://github.com/ghcjs/ghcjs.

[34] Rich Hickey, “ClojureScript.” https://github.com/clojure/clojurescript.

[35] Zach Bray, Tomas Petricek, “FunScript.” http://funscript.info/.

[36] Adam Beynon, “Opal.” http://opalrb.org/.

[37] S. Kathrotia, “Mcnumjs: A javascript library for numerical computations,” Master’s thesis, McGill University, April 2015.

[38] Mikola Lysenko, “Ndarray.” https://github.com/scijs/ndarray.

[39] Sébastien Loisel, “Numeric Javascript.” http://www.numericjs.com/.

[40] Wagner, Bastien et al., “WebAssembly.” https://webassembly.github.io/.
