Implementation of a Graph Coloring Register Allocator for the Graal Compiler

Master Thesis to obtain the academic degree of Diplom-Ingenieur in the Master's Program Computer Science

Submitted by: Florian Schröckeneder
Submitted at: Institute for System Software
Supervisor: Prof. Dr. Dr. h.c. Hanspeter Mössenböck
Co-Supervisor: DI Josef Eisl

July 2017

JOHANNES KEPLER UNIVERSITY LINZ
Altenbergerstraße 69, 4040 Linz, Österreich
www.jku.at
DVR 0093696

Abstract

Register allocation is crucial for the performance of modern compilers. In this thesis we implemented a graph coloring register allocator and compared it to a linear scan approach. In cases where code quality is important, our approach potentially has an advantage. In cases where compile time is more important, the other approach has the advantage.

In this thesis we first explain the two register allocation approaches and take a look at where register allocation fits into the compilation process. Then we look at our version of register allocation and show a complete example of how it works.

For the comparison of the two approaches we use two benchmark suites. From the results of this comparison we learned that linear scan can perform better if it is highly optimized and the graph coloring allocator is not. In order to use the full advantages of graph coloring, a lot of further optimizations are needed.

From what we have learned creating this thesis, it is beneficial to have the possibility to choose between different kinds of register allocation. Depending on the situation, better code performance can be more important than a shorter compile time. But to really make a difference in practice, a lot of optimization effort has to be put into an implementation.

Kurzfassung

Register allocation is crucial for the performance of modern compilers. In this thesis we implemented one register allocation approach, namely graph coloring, and compared it with a linear scan approach. In cases where the quality of the code is important, our approach potentially has advantages. In cases where compile time is more important, the other approach has advantages.

In this thesis we first present both register allocation approaches and look at where exactly in the compilation process register allocation takes place. We then present our version of register allocation and show a complete example of how it works.

For the comparison of the two approaches we use two benchmark suites. From the results of this comparison we learned that a highly optimized linear scan approach achieves better peak performance than an unoptimized graph coloring approach. To exploit the advantages of graph coloring, further optimizations are necessary.

While creating this thesis we learned that it is beneficial to have the possibility to choose between different kinds of register allocation. Depending on the situation, better code performance can be more important than a shorter compile time. But to really make a difference in practice, a lot of optimization effort has to be put into an implementation.

Contents

1 Introduction 1
   1.1 Variables, Values and Live Ranges 1
   1.2 Register allocation 1
       1.2.1 Graph Coloring 2
       1.2.2 Linear Scan 2
   1.3 Trade-off between allocation methods 3
   1.4 Structure of this Thesis 4

2 System Overview 5
   2.1 Java VM 5
   2.2 HotSpot VM 5
   2.3 Graal 6
       2.3.1 Compilation with Graal 6
   2.4 Static Single Assignment Form 7

3 Implementation 8
   3.1 Chaitin-Briggs Algorithm 8
       3.1.1 Data Structure 9
       3.1.2 Renumber 9
       3.1.3 Build 10
       3.1.4 Coalesce 10
       3.1.5 Spill Costs 10
       3.1.6 Optimistic coloring 11
           3.1.6.1 Simplify 11
           3.1.6.2 Select 12
       3.1.7 Spill Code 13
   3.2 Graph Coloring Register Allocation For Graal 13
       3.2.1 Liveness Analysis 14
           3.2.1.1 Number Instructions 14
           3.2.1.2 Build Local Live Sets 15
           3.2.1.3 Build Global Live Sets 15
           3.2.1.4 Build Intervals 16
       3.2.2 Build Graph 17
       3.2.3 Coloring 19
           3.2.3.1 Simplify 19
           3.2.3.2 Select 20
       3.2.4 Spill Values 21
       3.2.5 After Spill 22
       3.2.6 Assign Location 23
       3.2.7 Phi-Resolution 24
       3.2.8 Data-Structure 25

4 Case Study 28
   4.1 Test Case 28
   4.2 LIR Code of the Test Case 28
   4.3 After lifetime analysis 29
   4.4 After build graph 31
   4.5 After simplify 31
   4.6 After spill 32
   4.7 After select 33
   4.8 After assign locations 33
   4.9 After resolve data flow 35

5 Evaluation 37
   5.1 Evaluation environment and benchmark 37
   5.2 DaCapo 39
   5.3 Peak Performance 39
   5.4 Compile Time 39
   5.5 ScalaDacapo 40
   5.6 Peak Performance 40
   5.7 Compile Time 40

6 Related Work 42

7 Future Work 44
   7.1 Spilling Strategy 44
   7.2 Heuristic for Choosing a Spill Candidate 45
   7.3 Identify Performance Issues 45

8 Conclusion 46

9 Acknowledgement 47

Bibliography 51

Chapter 1

Introduction

Computer programs can have an arbitrary number of variables, but processors only have a limited number of hardware registers. This means that some variables have to be kept in memory. The goal of register allocation is to find an assignment that keeps important variables in registers [4].

1.1 Variables, Values and Live Ranges

In this thesis we differentiate between these three notions. Variables can hold values, and these values have live ranges. A variable has a type, such as integer or floating point. A value is assigned to one or more variables. A live range starts at the definition of the value and ends at its last use.

1.2 Register allocation

Register allocation is a part of the compilation process. Its task is to decide which values to keep in hardware registers at each point of the generated code. Accessing a register is faster than accessing memory, which means the decision to keep a value in memory has a direct effect on run-time performance [4]. Different ways to make this decision exist. We focused on one way to solve this issue, the graph coloring approach. Another approach is linear scan.

1.2.1 Graph Coloring

Graph coloring is a method that assigns colors to all nodes of a graph. Any two nodes that are connected via an edge cannot receive the same color. Finding a coloring for a graph is NP-complete in general, which is why we need heuristics to find an approximation. We can treat register allocation as such a graph coloring problem by creating an interference graph. The interference graph contains a node for each value. If one value is alive at the same time as another one, they interfere with each other. In that case, we add an edge between the nodes of these values to the graph. Based on these interferences we can assign values to registers. We can only assign two values to the same register if they do not interfere with each other [6, 7]. Figure 1.1 and Figure 1.2 show an example of a colored interference graph.

Figure 1.1: Interference Example
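The coloring rule described above can be sketched in a few lines. The following is an illustrative greedy coloring over an adjacency-list interference graph; the class and method names are our own and are not Graal's API, and a production allocator would order nodes far more carefully.

```java
import java.util.*;

// Sketch of greedy interference-graph coloring (illustrative names,
// not Graal's implementation). Nodes are 0..n-1, adj holds the
// interference edges, k is the number of registers; -1 = no color found.
public class GreedyColoring {
    public static int[] color(List<Set<Integer>> adj, int k) {
        int n = adj.size();
        int[] color = new int[n];
        Arrays.fill(color, -1);
        for (int node = 0; node < n; node++) {
            boolean[] used = new boolean[k];
            // A color is blocked if an already-colored neighbour has it.
            for (int neighbour : adj.get(node)) {
                if (color[neighbour] >= 0) used[color[neighbour]] = true;
            }
            for (int c = 0; c < k; c++) {
                if (!used[c]) { color[node] = c; break; }
            }
        }
        return color;
    }

    public static void main(String[] args) {
        // Five values forming a chain of interferences, two registers.
        List<Set<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < 5; i++) adj.add(new HashSet<>());
        int[][] edges = {{0, 1}, {1, 2}, {2, 3}, {3, 4}};
        for (int[] e : edges) { adj.get(e[0]).add(e[1]); adj.get(e[1]).add(e[0]); }
        System.out.println(Arrays.toString(color(adj, 2)));
    }
}
```

Neighbouring nodes always end up with different colors; whether all nodes receive a color depends on the graph and on the visiting order, which is exactly why the later chapters spend so much effort on the simplify/select ordering.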

1.2.2 Linear Scan

Linear scan does not use an interference graph but allocates registers in a single scan over the live ranges of the values. The algorithm keeps these live ranges in a sorted list and jumps from start point to start point. In every step it keeps a list of alive values and checks whether the live range that begins at this point interferes with them [18]. Figure 1.3 shows how linear scan performs the allocation for the code in Figure 1.1. We again assume two registers. At point A the two registers contain v1 and v3. Then we jump to point B. v1 is no longer alive here, but v4 is. We continue to point C, where v2 becomes alive. The registers now contain v4 and v2. At point D v1 gets a register and v2 stays in the other one. And finally, at point E, v2 and v5 are assigned to registers.

Figure 1.2: Interference graph with five values and two registers

Figure 1.3: Linear scan example
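The single scan can be sketched as follows. This is a minimal, illustrative version of the classical linear scan idea (sorted intervals, an active list, expiring old intervals); it is not Graal's implementation, and for brevity it simply leaves an interval unassigned instead of choosing a spill victim.

```java
import java.util.*;

// Minimal linear-scan sketch (illustrative, not Graal's allocator).
// Each interval is {start, end}; the result maps interval index to a
// register index, with -1 meaning the interval got no register.
public class LinearScanSketch {
    public static int[] allocate(int[][] intervals, int k) {
        Integer[] order = new Integer[intervals.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingInt(i -> intervals[i][0]));

        int[] reg = new int[intervals.length];
        Arrays.fill(reg, -1);
        Deque<Integer> free = new ArrayDeque<>();
        for (int r = 0; r < k; r++) free.add(r);
        List<Integer> active = new ArrayList<>(); // intervals holding a register

        for (int i : order) {
            // Expire intervals that ended before the current start point.
            for (Iterator<Integer> it = active.iterator(); it.hasNext(); ) {
                int a = it.next();
                if (intervals[a][1] < intervals[i][0]) { free.add(reg[a]); it.remove(); }
            }
            if (!free.isEmpty()) { reg[i] = free.poll(); active.add(i); }
            // else: a real allocator would spill the interval ending last.
        }
        return reg;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(allocate(new int[][]{{0, 4}, {2, 6}, {5, 8}}, 2)));
    }
}
```

Note how each interval is visited exactly once, which is the source of linear scan's compile-time advantage discussed in the next section.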

1.3 Trade-off between allocation methods

The graph coloring algorithm passes over the live ranges more than once in order to build and maintain the graph. Linear scan makes the register assignment in a single pass over the live ranges and is therefore faster with regard to compile time. On the other hand, linear scan makes its assignment decisions based on local knowledge and does not reevaluate these decisions [12]. Graph coloring makes its decisions based on local and global knowledge and reevaluates them, which can lead to better allocation and spill decisions and therefore to better runtime performance.

Just-in-time compilers often use linear scan because of the faster compile time. Just-in-time compilers compile code during the execution time of the program, when it is needed. However, we can think of instances where better code is more important than compile time. One example would be the compilation of time-critical methods on a server that are compiled once but executed often.

For these reasons it makes sense to extend Graal with an additional allocation method. Graal is a just-in-time compiler for Java, written in Java [8]. We provide Graal with a graph coloring algorithm.

1.4 Structure of this Thesis

The rest of the thesis is structured as follows. Chapter 2 explains the structure of the whole system and where our work fits in. In Chapter 3 we explain our solution in detail and how it works. In Chapter 4 we show a non-trivial example of the results of our work. In Chapter 5 we evaluate our results. Chapter 6 and Chapter 7 deal with related and future work. Chapter 8 concludes the thesis and summarizes our work and the results.

Chapter 2

System Overview

This thesis is implemented for the OpenJDK Graal project [8]. In order to see where this work fits in, we take a look at the whole system in this chapter.

2.1 Java VM

The Java Virtual Machine (JVM) is part of a program execution environment which works identically on a variety of different platforms. It works as an abstract machine and has a specific set of instructions, the Java bytecode [14]. Every Java program is translated into this bytecode and executed on a JVM. The JVM can perform this bytecode execution by interpreting the bytecode or by translating it to machine code via just-in-time compilation.

2.2 HotSpot VM

The Java HotSpot VM [16] is an implementation of the Java VM. It has an interpreter and two just-in-time compilers, a server compiler and a client compiler.

The HotSpot client compiler [13, 16] aims to reduce the time and memory needed for compilation. Therefore the HotSpot client compiler uses a linear scan register allocator. It focuses on local code quality and less on global optimizations.

The HotSpot server compiler [16, 17] performs global optimizations and aims for code performance. This makes the server compiler slower with regard to compile time. This compiler uses a graph coloring register allocation approach.

Another important aspect of the HotSpot VM is tiered compilation. At first the program code is executed by the interpreter. The VM analyzes the code during execution and avoids the compilation of code that is infrequently executed. This lets the compiler focus on more performance-critical parts of the code, called hotspots. These parts are then compiled by either the client or the server compiler [16].

2.3 Graal

The Graal VM is a modified version of the HotSpot VM. Graal is an alternative compiler for the HotSpot VM. The Graal VM reuses all HotSpot components that are not used for compilation, such as the interpreter and the garbage collector. Graal is an aggressively optimizing compiler and often makes assumptions about the running program [20]. Figure 2.1 shows where Graal fits in. Graal compiles Java bytecode and is written in Java.

Figure 2.1: Graal and HotSpot [20]

2.3.1 Compilation with Graal

The Graal compilation process consists of a front end and a back end. The front end is used for optimizations and has a graph IR (intermediate representation). Our work belongs to the back end, where register allocation and code generation take place. In the back end the source code is in the form of a low-level IR (LIR) [20]. The LIR consists of a control flow graph with basic blocks. Each basic block contains a list of instructions. The LIR is in SSA form and already target-specific.

2.4 Static Single Assignment Form

Static single assignment (SSA) form is an intermediate representation of the source code in which each variable is written exactly once. A control flow graph acts as the intermediate representation; it is in SSA form if every value has only one definition but can have multiple uses [11].

If multiple branches reach a basic block, it cannot be statically determined which value definition is valid. Figure 2.2 shows an example where x has two definitions. In this example z is defined as x + y, but we do not know at this point which branch the program will take. So we cannot decide whether x will hold the value of the first or the second definition. In order to make this decision we need φ-functions. In the example, the φ-function makes this decision dynamically and creates a new value called x3. x3 now holds the correct value and the algorithm can use it from that point on.

(a) Example of code not in SSA form (b) Example of SSA form with φ-function

Figure 2.2: Transformation to SSA form

Chapter 3

Implementation

In this chapter we are going to take a look at register allocation via graph coloring. Our algorithm differs in some aspects from the original algorithm our work is based on. This is partly because we reuse some code from the existing linear scan register allocator of Graal, and also due to the LIR structure and the information it provides. In order to understand our work and these differences, we are going to look at the Chaitin-Briggs algorithm first and then take a look at our approach.

3.1 Chaitin-Briggs Algorithm

Figure 3.1 shows Chaitin's algorithm and Figure 3.2 shows the structure of the Chaitin-Briggs algorithm. It consists of seven steps and maintains an interference graph. We are going to take a look at each step in order to better understand what the algorithm does.

Figure 3.1: Chaitin's Algorithm

Figure 3.2: Chaitin-Briggs Algorithm

3.1.1 Data Structure

The interference graph is the main data structure and it has two different representations. The first representation is a bit matrix, where only the half below the diagonal is used in order to save memory. This can be done because the bit matrix contains a row and a column for every node, and a bit is set to one if an edge exists between two nodes. This means that the upper half is just a mirror of the lower half, and we can leave it out without losing information. The second representation is a list of adjacency vectors. The algorithm uses the bit matrix for fast random access to the nodes and the list of adjacency vectors for sequential traversal of the graph. The length of a vector is the number of neighbours, which is the degree of the node. The algorithm stores this information and does not need to recompute it [6].
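The lower-triangular trick can be sketched as follows. The indexing formula is the standard one for a triangular matrix; the class is illustrative and not taken from the Chaitin implementation.

```java
import java.util.BitSet;

// Sketch of the lower-triangular bit matrix described above (illustrative).
// For two distinct nodes the edge bit is stored exactly once, below the
// diagonal, so roughly half the space of a full n*n matrix is needed.
public class TriangularBitMatrix {
    private final BitSet bits;

    public TriangularBitMatrix(int nodes) {
        bits = new BitSet(nodes * (nodes - 1) / 2);
    }

    // Row hi, column lo (hi > lo) maps to position hi*(hi-1)/2 + lo.
    private static int index(int a, int b) {
        int hi = Math.max(a, b), lo = Math.min(a, b);
        return hi * (hi - 1) / 2 + lo;
    }

    public void addEdge(int a, int b) { bits.set(index(a, b)); }

    public boolean interferes(int a, int b) { return bits.get(index(a, b)); }
}
```

Because `index` sorts its two arguments, `interferes(a, b)` and `interferes(b, a)` look up the same bit, which is exactly the symmetry that makes the upper half redundant.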

3.1.2 Renumber

This step of the algorithm creates a new live range for each definition point and assigns a unique name to it. A live range that reaches a use point is unioned with every other live range that reaches this point. This union is a set consisting of all live ranges that are alive at this point. The Briggs paper [4] mentions that this is implemented in their algorithm as an instance of the classical disjoint-set union-find problem [21].
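A minimal disjoint-set structure of the kind referred to above can be sketched as follows; this is the textbook union-find with path compression, shown only to illustrate the operation the renumber step relies on, not the thesis's actual code.

```java
// Minimal union-find sketch for merging live ranges that reach the same
// point (illustrative; the Briggs implementation details differ).
public class LiveRangeUnionFind {
    private final int[] parent;

    public LiveRangeUnionFind(int n) {
        parent = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i; // each range is its own set
    }

    // Find the representative of x's set, halving paths as we go.
    public int find(int x) {
        while (parent[x] != x) { parent[x] = parent[parent[x]]; x = parent[x]; }
        return x;
    }

    // Merge the sets containing a and b.
    public void union(int a, int b) { parent[find(a)] = find(b); }
}
```

After unioning, all live ranges that reach a common use point share one representative, which is what gives the merged range its single unique name.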

3.1.3 Build

The build step constructs the interference graph using two passes over the code. The first pass calculates the degree of every node using the bit matrix. The second pass fills the adjacency vectors. The two-pass approach is used because non-segmented adjacency vectors can be handled more quickly. A node is added to the interference graph for every live range. An edge between two nodes is added if the two live ranges represented by the nodes intersect [6].

3.1.4 Coalesce

The coalescing step reduces the number of live ranges where possible. The algorithm combines two live ranges if the definition of one live range is a copy of the other (i.e. a move instruction) and they do not interfere at any other point. Combining live ranges reduces unnecessary moves: if the same register is assigned to the two live ranges, there is no need to copy the value from one register into another. However, coalescing also leads to more complex graphs with nodes of higher degrees. A graph that was colorable before coalescing may no longer be colorable afterwards [4].

In Figure 3.3 the two live ranges x and y can be coalesced. Figure 3.4 shows how coalescing affects the interference graph. Since one node represents one live range, the nodes of the coalesced live ranges are combined into a new node. The new node interferes with every node that either x or y interfered with. The increase in the number of interferences can make the graph uncolorable.

3.1.5 Spill Costs

This part of the algorithm estimates the run-time cost that would be incurred if the live range were spilled. The costs consist of the number of loads and stores added to the code, each weighted by Equation 3.1, where c denotes the cost of the load or store operation on the target architecture and d denotes the loop nesting depth of the instruction. This algorithm computes the spill costs of every node in advance [4].

c · 10^d    (3.1)
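Equation 3.1 can be applied per spill instruction and summed over a live range, as sketched below; the inputs are illustrative (each entry of `depths` stands for one load or store that spilling would insert).

```java
// Sketch of the spill-cost estimate from Equation 3.1: every load/store
// costs c * 10^d, where d is its loop nesting depth. Inputs illustrative.
public class SpillCost {
    public static double cost(int[] depths, double c) {
        double sum = 0;
        for (int d : depths) {
            sum += c * Math.pow(10, d); // deeply nested accesses dominate
        }
        return sum;
    }
}
```

The factor 10^d makes a single access inside a doubly nested loop a hundred times as expensive as one at the outermost level, so the heuristic strongly avoids spilling values used in inner loops.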

Figure 3.3: Coalescing example

3.1.6 Optimistic coloring

The coloring heuristic is where the Briggs algorithm [4] differs from Chaitin's [6]. The original algorithm used a pessimistic coloring approach, which makes the decision to spill a live range in the simplify phase. Optimistic coloring, as proposed by Briggs, moves this decision into the select step and only marks nodes as spill candidates. This allows the algorithm to reconsider a bad spill decision. Figure 3.5 depicts an example where optimistic coloring finds a coloring but the pessimistic approach does not. In this example every node has a degree equal to k, where k denotes the number of available registers and therefore the number of colors we want to use. A pessimistic approach would choose to spill one node, even though a coloring could be found.

3.1.6.1 Simplify

The simplify step starts by removing nodes with a degree less than k from the interference graph and placing them on a stack. It chooses these nodes in arbitrary order. After that, the interference graph only contains nodes that have a degree ≥ k. The algorithm chooses one node based on the ratio of spill costs divided by the current degree of the node and marks it as a spill candidate. Simplify then removes the spill candidate from the interference graph and pushes it onto the stack. Although the node has a degree greater than or equal to k, it could still receive a color if two of its neighbours are assigned the same color. By removing the node, simplify reduces the degree of the remaining nodes that interfere with the spill candidate.

(a) One node for each live range (b) Combined node for x and y after coalescing

Figure 3.4: Coalescing of two live ranges

Now the algorithm can continue removing nodes with a degree less than k or choose another spill candidate. Simplify continues until the interference graph is empty and all nodes are on the stack [4]. The example in Figure 3.6 and Figure 3.7 shows a situation where we remove nodes under the assumption k = 2. First we remove z, because it is the only node with a degree < k. After that, we choose a spill candidate, y in this case, remove it and push it onto the stack. Then the algorithm continues by removing the remaining nodes, which now have a degree < k.
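The simplify loop can be sketched as follows. The names and the spill-cost array are illustrative; the cost/degree ratio matches the rule described above, but this is a simplified stand-in rather than the Briggs implementation.

```java
import java.util.*;

// Sketch of the simplify phase: repeatedly remove nodes of degree < k;
// when none remain, remove the node with the lowest spillCost/degree
// ratio as a spill candidate (illustrative, not the Briggs code).
public class SimplifySketch {
    public static Deque<Integer> simplify(List<Set<Integer>> adj, double[] spillCost, int k) {
        int n = adj.size();
        List<Set<Integer>> g = new ArrayList<>();       // mutable working copy
        for (Set<Integer> s : adj) g.add(new HashSet<>(s));
        boolean[] removed = new boolean[n];
        Deque<Integer> stack = new ArrayDeque<>();

        for (int done = 0; done < n; done++) {
            int pick = -1;
            double best = Double.MAX_VALUE;
            for (int v = 0; v < n; v++) {
                if (removed[v]) continue;
                if (g.get(v).size() < k) { pick = v; break; } // trivially colorable
                double ratio = spillCost[v] / g.get(v).size();
                if (ratio < best) { best = ratio; pick = v; } // spill candidate
            }
            removed[pick] = true;
            for (int w : g.get(pick)) g.get(w).remove(pick); // lower neighbours' degrees
            stack.push(pick);
        }
        return stack; // top of stack = last node removed
    }
}
```

Removing a node, whether trivially colorable or a spill candidate, lowers its neighbours' degrees, which is what lets the loop make progress until the graph is empty.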

3.1.6.2 Select

The assignment of colors and the decision to spill live ranges take place in the select phase. This step takes a node from the stack, checks if a color is available, and adds the node back to the graph. A color is available if it is not assigned to any neighbour of the node. If the select step finds that no color is left, the node is not added to the graph and remains uncolored. This can only happen to spill candidates. The select step ends when the stack is empty. If any nodes remain uncolored in this step, they have to be spilled. If every node is colored, the allocator is finished at this point [4]. In Figure 3.8 we continue the example from Figure 3.6 and Figure 3.7 and color the interference graph. Although simplify marked y as a spill candidate, we can find a color that is available.
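The select phase can be sketched as the mirror image of simplify: pop nodes in reverse removal order and give each the lowest color its already-colored neighbours do not use. As before, the names are illustrative, not the Briggs implementation.

```java
import java.util.*;

// Sketch of the select phase: pop nodes from the simplify stack and give
// each the lowest color not used by an already-colored neighbour. Nodes
// that find no color stay at -1 and must be spilled (illustrative).
public class SelectSketch {
    public static int[] select(List<Set<Integer>> adj, Deque<Integer> stack, int k) {
        int[] color = new int[adj.size()];
        Arrays.fill(color, -1);
        while (!stack.isEmpty()) {
            int v = stack.pop();
            boolean[] used = new boolean[k];
            for (int w : adj.get(v)) {
                if (color[w] >= 0) used[color[w]] = true;
            }
            for (int c = 0; c < k; c++) {
                if (!used[c]) { color[v] = c; break; }
            }
        }
        return color;
    }
}
```

On the diamond-shaped graph of Figure 3.5 this optimistic pass colors all four nodes with k = 2, because the two neighbours of the "uncolorable" node happen to receive the same color, which is exactly the situation a pessimistic approach gives up on.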

Figure 3.5: Diamond shaped graph, k=2

3.1.7 Spill Code

This step inserts spill code for nodes that could not be colored by the select phase. Spilling a live range means it is kept in memory rather than in a register. A live range is spilled by inserting a store after every definition and a load before every use [4]. This splits the live range into multiple short live ranges. If a live range has been spilled, the algorithm starts again from the beginning with the renumber step. Figure 3.9 shows an example of spill code added for the live range x. The first part shows the code before spilling, the second part shows the code after spilling.

3.2 Graph Coloring Register Allocation For Graal

Our graph coloring register allocator consists of seven main steps. Some of these steps are divided into smaller steps, but we still consider them one step overall. Figure 3.10 shows the steps of our algorithm and their order. In this chapter of the thesis we are going to look at every step in detail.

(a) Interference graph before simplify (b) z is removed and put on the stack

Figure 3.6: Simplify example, k=2

3.2.1 Liveness Analysis

The liveness analysis is reused from the linear scan register allocator that Graal already has. Figure 3.11 shows the parts of the liveness analysis and their order of execution. The first three parts use the almost unaltered code of the linear scan allocator, whereas the fourth step is slightly modified to fit the needs of our own implementation of intervals. The reason for reusing code from linear scan is that it is already in use and proven to do what we need.

3.2.1.1 Number Instructions

This method iterates over the LIR instructions in a single forward pass and assigns an id to each instruction. These ids start at zero and are incremented by two: the first instruction of the LIR code gets the id zero, the id of the second instruction is set to two, and so on. We increment the ids by two because we have two operand modes, called def and use. This step is necessary to determine the start and end points of the live ranges, the use positions, the definition positions, and also the positions in the code for inserting spill instructions.
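The numbering scheme can be sketched as follows; the class is an illustrative stand-in, not Graal's LIR data model.

```java
// Sketch of the instruction numbering scheme: ids start at zero and are
// incremented by two, leaving room between instructions for the two
// operand modes (def and use). Illustrative, not Graal's classes.
public class InstructionNumbering {
    public static int[] assignIds(int instructionCount) {
        int[] ids = new int[instructionCount];
        for (int i = 0; i < instructionCount; i++) {
            ids[i] = 2 * i; // 0, 2, 4, ...
        }
        return ids;
    }

    // Recover an instruction's index within its block from its id; this
    // inverse mapping is what the spill phase later uses (see Listing 3.1).
    public static int indexOf(int id, int firstIdOfBlock) {
        return (id - firstIdOfBlock) / 2;
    }
}
```

Because the ids are evenly spaced, converting between an id and a position in the instruction list is a single subtraction and division, with no search required.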

(a) Empty graph, nodes on stack

Figure 3.7: Simplify example, k=2

3.2.1.2 Build Local Live Sets

This method iterates over the instructions from back to front and creates and fills two bit sets for every basic block of the LIR code. The first set is for the values that are live in that block, the second for values that are terminated in that block. For every use position of a value the algorithm sets a live bit, and for every definition position it sets a kill bit. This works because, looking at the LIR code from back to front, a value is alive in every block in which it has a use position until its definition is reached. In order to set these bits for every block, the algorithm has to run over the LIR code once. Its structure is shown in Figure 3.12.
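For a single block, the two bit sets can be sketched as below. The instruction encoding ({def, use} operand numbers, -1 for none) is an illustrative stand-in for the LIR, not Graal's representation.

```java
import java.util.BitSet;

// Sketch of building the local live sets for one block, iterating the
// instructions back to front. Each instruction is {def, use} with -1
// meaning "no such operand" (illustrative stand-in for the LIR).
public class LocalLiveSets {
    // Returns {liveGen, liveKill} for the block.
    public static BitSet[] build(int[][] instructions, int numValues) {
        BitSet liveGen = new BitSet(numValues);
        BitSet liveKill = new BitSet(numValues);
        for (int i = instructions.length - 1; i >= 0; i--) {
            int def = instructions[i][0], use = instructions[i][1];
            if (use >= 0) liveGen.set(use);   // live backwards from the use...
            if (def >= 0) {                   // ...until its definition is reached
                liveKill.set(def);
                liveGen.clear(def);           // defined here, so not live into the block
            }
        }
        return new BitSet[]{liveGen, liveKill};
    }
}
```

A value that is both defined and used inside the block ends up only in the kill set, so it does not leak into the block's live-in set in the next step.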

3.2.1.3 Build Global Live Sets

The step to build global live sets determines which values are alive at the beginning and at the end of each block [22]. This means it looks at the edges between the blocks. The algorithm creates another two bit sets per block, one for values that are alive after the block is executed (LiveOut) and one for values alive before (LiveIn). This is done by going over the blocks in reverse order. Figure 3.13 shows how the algorithm works. LiveOut is the union of the sets of values that are alive at the beginning of the successor blocks. For computing LiveIn, the algorithm uses LiveGen and LiveKill, the bit sets created in the step before: LiveIn is the union of LiveGen with those values of LiveOut that are not in LiveKill.

(a) Empty graph, nodes on stack (b) Colored graph

Figure 3.8: Select example, k=2

The algorithm has to iterate over the blocks more than once, because a change in the bit sets can influence the sets of a previous block. Thanks to the previous step of computing the local live sets, the algorithm does not have to iterate over the instructions, only over the blocks.
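The dataflow iteration described above can be sketched as a fixpoint loop over BitSets; the method and parameter names are illustrative, not Graal's.

```java
import java.util.BitSet;

// Sketch of the global live-set computation: LiveOut(B) is the union of
// LiveIn over B's successors, and LiveIn(B) = LiveGen(B) ∪ (LiveOut(B)
// minus LiveKill(B)). Blocks are visited in reverse order until nothing
// changes (a fixpoint). Illustrative names, not Graal's implementation.
public class GlobalLiveSets {
    public static void solve(int[][] successors, BitSet[] gen, BitSet[] kill,
                             BitSet[] liveIn, BitSet[] liveOut) {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int b = successors.length - 1; b >= 0; b--) {
                BitSet out = new BitSet();
                for (int s : successors[b]) out.or(liveIn[s]);
                BitSet in = (BitSet) out.clone();
                in.andNot(kill[b]); // remove values defined in this block
                in.or(gen[b]);      // add values used before their definition
                if (!in.equals(liveIn[b]) || !out.equals(liveOut[b])) {
                    liveIn[b] = in;
                    liveOut[b] = out;
                    changed = true;
                }
            }
        }
    }
}
```

Visiting the blocks in reverse order means most of the propagation happens in the first pass; further passes are only needed when loops feed liveness information backwards.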

3.2.1.4 Build Intervals

The build intervals step creates an interval for every value and determines its live ranges. Since the LIR code is already in SSA form, a value has one definition point and might have multiple uses, but it can still have more than one live range. An interval is a representation of a single value and holds all the information about the value, such as its operand id, live ranges, and use positions.

The algorithm iterates once over the LIR code from bottom to top. It starts by creating an interval for every value that is alive at the end of the last block and extending their live ranges to start at the beginning of the block [22]. Then the algorithm proceeds to iterate over the instructions. If a use position of a value is found, its live range is extended to the beginning of the block; if the live range already reaches that far, it is merged. If the value is not alive, a new live range is added that ends at this point, with its start point set to the beginning of the block. After that, the use position is also stored in a list in the interval. If a definition of a value is found, its live range is set to start here instead of at the beginning of the block [22].

(a) Code before spilling (b) Code after spilling

Figure 3.9: Spilling example

Figure 3.10: Graph coloring register allocator for Graal

In short, the algorithm tracks the live ranges of a value from the last use position and extends them until the definition is reached. Figure 3.14 illustrates the algorithm. On the first encounter of a value, an interval is created. In addition, the algorithm adds a use position for every caller-saved register if the current instruction destroys them.

3.2.2 Build Graph

This part of the allocator builds the interference graph. For every interval, it creates a node in the graph. Since there is one interval for each value, every node in the interference graph represents one value. After adding the node to the graph, the algorithm checks if the value interferes with another value. Two values interfere if one of their live ranges overlaps with a live range of the other. As an example, consider Figure 3.15: if v1 has a live range that is defined at instruction 2 and used at instruction 4, and v2 has a live range that is defined at instruction 3 and used at instruction 5, they interfere. If two live ranges interfere, an edge between their corresponding nodes is added to the interference graph.

Figure 3.11: Structure of liveness analysis

Figure 3.12: Build local live sets

Figure 3.16 shows the idea of the algorithm. What we have not mentioned so far is that there is more than one interference graph, depending on the number of different register sets, e.g. integer and floating point registers. Values can only interfere with each other if they can occupy the same registers on the target architecture. We create one graph per register category, with no interferences between values that belong to different categories. For example, if an architecture has an integer register set and a floating point register set, the algorithm creates two interference graphs. Graal provides information about the target architecture and the values, so we know which value belongs to which register set. For this reason we store a register category number in the intervals.
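The interference test described above can be sketched as follows. Live ranges are represented here as simple {from, to} pairs of instruction ids; this encoding and the class are illustrative, not the thesis's interval data structure.

```java
// Sketch of the interference test used while building the graph: two
// values interfere if they belong to the same register category and any
// of their live ranges overlap. Ranges are {from, to} pairs of
// instruction ids (illustrative, not the thesis's interval class).
public class InterferenceCheck {
    public static boolean rangesOverlap(int[] a, int[] b) {
        return a[0] < b[1] && b[0] < a[1];
    }

    public static boolean interfere(int[][] rangesA, int[][] rangesB,
                                    int categoryA, int categoryB) {
        if (categoryA != categoryB) return false; // e.g. integer vs. floating point
        for (int[] ra : rangesA)
            for (int[] rb : rangesB)
                if (rangesOverlap(ra, rb)) return true;
        return false;
    }
}
```

With the example from the text, a range [2, 4] and a range [3, 5] overlap and therefore interfere, whereas the same ranges in different register categories never do, which is why the per-category graphs can be built and colored independently.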

Figure 3.13: Build global live sets

3.2.3 Coloring

The coloring part of the algorithm also consists of two phases and closely follows the implementation described in the Briggs paper [4]. However, the differences are in the details. For example, the Briggs paper does not mention fixed registers, but we have to take them into account. We handle fixed registers by leaving them in the graph: we ignore interferences between two fixed registers but take interferences with values into account. We can do this because fixed registers do not need to be assigned to another register, and two different registers are two different colors by definition and can therefore share an edge.

Before coloring the nodes of the interference graph, we need to determine k, the number of allocatable registers for a register category. We get an array of all allocatable registers of the target architecture from Graal. Our algorithm sorts this array into the different register categories. The number of registers in each category is then the k for the corresponding interference graph.

3.2.3.1 Simplify

Figure 3.14: Build intervals

Simplify determines which nodes can be colored trivially and which nodes may need to be spilled. Fixed registers are not removed or marked as spill candidates. When simplify chooses a spill candidate, it uses costs to determine the best candidate. The metric we use for computing the spill costs is the number of use positions divided by the current degree of the node. We choose the node with the lowest cost as spill candidate. Simplify is finished when only nodes of fixed registers remain in the interference graph and no more nodes were removed during the last run. These remaining nodes represent a register and are therefore already colored by definition. Figure 3.17 shows the structure of simplify.

3.2.3.2 Select

This step assigns the actual colors to the nodes and therefore determines which value occupies which register on the target architecture. The select step also decides which values are spilled. The algorithm, as shown in Figure 3.18, starts by popping a node from the stack. In the next step it checks if a color is available by going through the neighbouring nodes to see what colors they have. If a color is not assigned to one of the neighbours, it is assigned to the node and the node is added to the interference graph. If no color is available, the node is not added to the graph but put aside for spilling. This is where select decides to spill a value [4]. The algorithm continues from the beginning until the stack is empty. At this point the algorithm has either found a coloring and can go on to the resolve data flow step, or it did not find a coloring and has to spill the values of all uncolored nodes.

Figure 3.15: Build Graph Example

Figure 3.16: Build graph

3.2.4 Spill Values

The spill step receives a list of nodes that could not be colored by select. The values of these nodes need to be spilled. The interval contains the use positions, the live ranges, and the definition position of its value. Figure 3.19 shows the algorithm in detail. First, the algorithm creates a new stack slot, assigns it to the interval, and deletes all live ranges of the value. Next, the algorithm inserts a move instruction to the stack slot after the value's definition and a move instruction from the stack slot before every use, if the instruction needs a register [23]. For the algorithm to find the right position, it first gets the block of LIR code where the use occurs, then it finds the index at which to insert the move instruction. Listing 3.1 shows how to find the right position to insert a move. In some cases the target architecture can use a stack slot directly and does not need a register; in these cases we don't insert a move instruction. This can only be the case for use positions; the move instruction after the definition is always added. The information about the need for a register is stored with the use position in the interval.

Figure 3.17: Simplify

Consider for example Figure 3.20, with a use position at instruction 334. This instruction is in LIR block B3, which begins with instruction 330 and ends with instruction 340. We find the right block by looking at the first and the last instruction ids of every block until the use position lies between them. We can then compute the index by subtracting the id of the first instruction from the use position and dividing the result by two: (334 − 330) / 2 = 2. The use position in the example therefore occurs at index 2 and we can insert the move afterwards. After inserting a move, the algorithm adds a live range to the value that is only one instruction long. The algorithm then continues with the next use position. When it is finished, it goes on to the next value until it has handled every node that the select step could not color.

3.2.5 After Spill

By inserting instructions into the LIR code, the algorithm breaks the numbering of the instructions, but not their order. If we keep that in mind, we do not have to number them again. Inserted instructions are assigned an id of minus one. If we find an instruction with the id -1, we know that this instruction has been inserted by us, and we just continue to the next instruction until we find the right index. The live ranges of the added instructions are also not a problem, because they are live at the instruction, not before. This means that if we add two move instructions at the same index, they will

Figure 3.18: Select

int findIndex(int pos) {
    Block b = getBlock(pos);
    int index = pos - b.firstInstruction.id;
    index = index / 2;

    while (b.getInstruction(index).id == -1) {
        index++;
    }

    return index;
}
Listing 3.1: Finding spill position

interfere with each other but not with values from the instruction before, because their live ranges are inserted to begin at the use position. Figure 3.21 shows an example with inserted spill moves. Not numbering the instructions again has another big advantage: we do not need to start over with a complete lifetime analysis, provided that the intervals are also kept. The life ranges of values that are not spilled do not change, and neither does any use position. After the spill code is inserted we simply throw away the graph and rebuild it from the intervals by going back to the build graph step.

3.2.6 Assign Location

The assign location step traverses the LIR code once and replaces every value with its assigned register. The algorithm, as shown in Figure 3.22, does this by looking at every LIR instruction. If the instruction contains values, the algorithm gets the corresponding interval. If the value is not spilled, it is replaced by the assigned register. If the value has been spilled, the assign location step looks for the use position

Figure 3.19: Spill values

that matches the current instruction. If this use position requires a register, the value is replaced by the assigned register. If the use position only has should-have-register priority, the value is replaced by its stack slot.
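The decision made per instruction can be sketched as follows. Again this is a simplified model with our own names, not the actual Graal code.

```java
import java.util.*;

// Sketch of the assign-location decision for a single value (names are ours).
class AssignSketch {
    record Use(int pos, boolean needsRegister) {}

    // What a value is replaced with at a given instruction: unspilled values
    // always get their register; spilled values get the register only where
    // the use requires one, and the stack slot otherwise.
    static String locationAt(boolean spilled, String register, String stackSlot,
                             List<Use> uses, int pos) {
        if (!spilled) return register;
        for (Use use : uses) {
            if (use.pos() == pos && use.needsRegister()) return register;
        }
        return stackSlot; // should-have-register priority falls back to the slot
    }
}
```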

3.2.7 Phi-Resolution

Phi resolution is the purpose of the resolve data flow step. This step of the algorithm is reused from the existing linear scan register allocator and adapted to our needs. It is necessary to resolve the phi instructions. At this point we know which value is in which register at the point of a phi instruction. Phi instructions occur at the edges between basic blocks. They are resolved by adding moves at the end of a block. These moves are necessary to match the contents of registers to

(a) Index before spilling (b) Index after spilling

Figure 3.20: Spilling example

what the successor blocks are expecting to find. We use a move resolver to add these moves, to make sure we are not overwriting a register before its content is moved to where it belongs. Another reason for using the move resolver is to break up cyclic dependencies.
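A move resolver of this kind can be sketched as a parallel-move sequentializer. The sketch below is ours, not the Graal move resolver: it orders the moves so that no source is overwritten before it is read, and breaks cycles by saving one source to a scratch location.

```java
import java.util.*;

// Sketch of a move resolver for the moves inserted at block edges
// (a simplified parallel-move sequentializer; names are ours, not Graal's).
class MoveResolverSketch {
    // moves: destination -> source. Emits ordered "dst <- src" strings.
    static List<String> resolve(Map<String, String> moves) {
        Map<String, String> pending = new LinkedHashMap<>(moves);
        List<String> out = new ArrayList<>();
        while (!pending.isEmpty()) {
            boolean progress = false;
            for (Iterator<Map.Entry<String, String>> it = pending.entrySet().iterator(); it.hasNext();) {
                Map.Entry<String, String> m = it.next();
                // Safe to emit if no remaining move still reads the destination.
                if (!pending.containsValue(m.getKey())) {
                    out.add(m.getKey() + " <- " + m.getValue());
                    it.remove();
                    progress = true;
                }
            }
            if (!progress) {
                // Every destination is still needed as a source: a cycle.
                // Save one source to a scratch location to break it.
                String src = pending.entrySet().iterator().next().getValue();
                out.add("scratch <- " + src);
                for (Map.Entry<String, String> e : pending.entrySet()) {
                    if (e.getValue().equals(src)) e.setValue("scratch");
                }
            }
        }
        return out;
    }
}
```

For a swap of r1 and r2 this emits a save to scratch, the surviving move, and a restore from scratch, which is exactly the cycle-breaking behaviour described above.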

3.2.8 Data-Structure

Our algorithm is driven by two data structures, the interference graph and the intervals. The interference graph uses two representations, and the intervals are used to store additional information on the values. The two representations of the graph are similar to those in the Chaitin paper [6]. The first representation is a bit matrix, which is realized by an array of bit sets with the id of the node as index. As shown in the first graph of Figure 3.23, an interference between two nodes is stored in the node with the higher id. For example, in the provided graph there is an interference between v0 and v1 and an interference between v1 and v2. The second representation of the interference graph is an array of adjacency vectors, as shown in the second graph of Figure 3.23. Every vector contains all the neighbours of the corresponding node. The third data structure is an array that contains the intervals.
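The two graph representations can be sketched as follows; the class shape is ours and only illustrates the scheme described above.

```java
import java.util.*;

// Sketch of the two interference graph representations (names are ours).
class InterferenceGraphSketch {
    final BitSet[] matrix;               // bit matrix: row = node with higher id
    final List<List<Integer>> adjacency; // adjacency vectors, one per node

    InterferenceGraphSketch(int nodes) {
        matrix = new BitSet[nodes];
        adjacency = new ArrayList<>();
        for (int i = 0; i < nodes; i++) {
            matrix[i] = new BitSet(nodes);
            adjacency.add(new ArrayList<>());
        }
    }

    // An interference is stored in the bit set of the node with the higher id;
    // the adjacency vectors store it symmetrically on both nodes.
    void addInterference(int a, int b) {
        matrix[Math.max(a, b)].set(Math.min(a, b));
        adjacency.get(a).add(b);
        adjacency.get(b).add(a);
    }

    boolean interferes(int a, int b) {
        return matrix[Math.max(a, b)].get(Math.min(a, b));
    }
}
```

The bit matrix gives a constant-time interference test, while the adjacency vectors let simplify and select walk a node's neighbours without scanning a whole matrix row.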

(a) Code before spilling (b) Code after spilling

Figure 3.21: Spilling example

Figure 3.22: Assign Locations

(a) Graph represented as bit matrix (b) Graph represented as array of adjacency vectors

(c) Array of intervals

Figure 3.23: Data structure

Chapter 4

Case Study

In this chapter we present how our algorithm works on a non-trivial example, from the starting point of our algorithm until the end result. The test case is executed on an AMD64 architecture with two register categories.

4.1 Test Case

The example we chose to demonstrate our algorithm is the SpillLoopPhiVariableAtDefinition test case, which is part of the JUnit tests that come with the Graal source code. The SpillLoopPhiVariableAtDefinition class can be found in the com.oracle.graal.jtt.loop package. Listing 4.1 shows the Java code of the test case. We chose this example because it is not trivial, so our algorithm has to spill values, and at the same time it produces code and graphs of a size that is reasonable to show in this thesis.

We see in the Java code of the test case that it calls methods from GraalDirectives. Especially interesting for us is the spillRegisters() call, which forces the allocator to spill values at the point of the call. The bindToRegister() call just creates a use position that requires a register for the value.

4.2 LIR Code of the Test Case

Our algorithm starts with the LIR code and the control flow graph we get from the front end. Figure 4.1 shows the control flow graph for the SpillLoopPhiVariableAtDefinition test case. Listing 4.2 shows the LIR code. The code we show here is quite long, so we are only going to show the parts that change between the steps of the algorithm. At the end we look at the whole code again,

public class SpillLoopPhiVariableAtDefinition extends JTTTest {

    public static int test(int arg) {
        int count = arg;
        for (int i = 0; i < arg; i++) {
            GraalDirectives.bindToRegister(count);
            GraalDirectives.spillRegisters();
            GraalDirectives.bindToRegister(count);
            if (i == 0) {
                GraalDirectives.spillRegisters();
                continue;
            }
            GraalDirectives.spillRegisters();
            count++;
        }
        return count;
    }
}
Listing 4.1: Test case Java code

in order to see the result. The code shown in Listing 4.2 is from after the numbering step; as we can see, all instructions are already numbered. Our algorithm only changes the numbering in the LIR code during the lifetime analysis.

Figure 4.1: Control flow graph of the test case.

4.3 After lifetime analysis

This step of our algorithm calculates a set of live ranges for every value, stored in an array of intervals. The live ranges of the values from our test case are shown in Figure 4.2. Live ranges of fixed registers are shown in gray and those of values in blue. From these live ranges we can also see how the spillRegisters() call forces our allocator to spill values in blocks B2, B3 and B4. At every point in the LIR code where this call occurs, it sets a temporary live range for every fixed register.

B0 -> B1 [-1, -1]
_nr__instruction______(LIR)
 0 [rsi|d, rbp|q] = LABEL size: 2 align: false label: ?
 2 v5|q = MOVE rbp|q moveKind: QWORD
 4 [] = HOTSPOTLOCKSTACK frameMapBuilder: com.oracle.graal.lir.amd64.AMD64FrameMapBuilder@4667ae56 slotKind: QWORD
 6 v0|d = MOVE rsi|d moveKind: DWORD
 8 JUMP ~[v0|d, int[0|0x0]] size: 2 destination: B0 -> B1

B1 <- B0, B3, B4 -> B2, B5 [-1, -1] llh (loop 0 depth 1)
_nr__instruction______(LIR)
10 [v1|d, v2|d] = LABEL size: 2 align: true label: ?
12 CMP (x: v0|d, y: v2|d) size: DWORD
14 BRANCH ~[] size: 0 condition: > trueDestinationProbability: 0.5 trueDestination: B1 -> B2 falseDestination: B1 -> B5

B2 <- B1 -> B3, B4 [-1, -1] (loop 0 depth 1)
_nr__instruction______(LIR)
16 [] = LABEL size: 0 align: false label: ?
18 BINDTOREGISTER v1|d
20 SPILLREGISTERS
22 BINDTOREGISTER v1|d
24 v3|d = INC v2|d size: DWORD
26 TEST (x: v2|d, y: v2|d) size: DWORD
28 BRANCH ~[] size: 0 condition: = trueDestinationProbability: 0.5 trueDestination: B2 -> B3 falseDestination: B2 -> B4

B4 <- B2 -> B1 [-1, -1] lle (loop 0 depth 1)
_nr__instruction______(LIR)
36 [] = LABEL size: 0 align: false label: ?
38 SPILLREGISTERS
40 v4|d = INC v1|d size: DWORD
42 JUMP ~[v4|d, v3|d] size: 2 destination: B4 -> B1

B3 <- B2 -> B1 [-1, -1] lle (loop 0 depth 1)
_nr__instruction______(LIR)
30 [] = LABEL size: 0 align: false label: ?
32 SPILLREGISTERS
34 JUMP ~[v1|d, v3|d] size: 2 destination: B3 -> B1

B5 <- B1 [-1, -1]
_nr__instruction______(LIR)
44 [] = LABEL size: 0 align: false label: ?
46 rax|d = MOVE v1|d moveKind: DWORD
48 RETURN (savedRbp: v5|q, value: rax|d, ~outgoingValues: []) size: 0 isStub: false scratchForSafepointOnReturn: rcx config: HotSpotVMConfig
Listing 4.2: Test case LIR code before allocation

Figure 4.2: Live ranges of the test case after lifetime analysis

4.4 After build graph

In the build graph step, we take the intervals and their contained life ranges to produce our interference graphs. Since the machine we are running this test case on provides two register types, we get two interference graphs: one for integer registers and one for floating point registers. As we can see in Figure 4.3, these graphs tend to get rather large, even for small examples like our test case. We do not look at the floating point graph because it contains only fixed registers and there is no need to color them. We also do not insert edges between nodes of fixed registers, because they are registers by definition and each already represents a different color.

4.5 After simplify

Figure 4.4 shows the reduced graph after the simplify step, containing only fixed registers, and the stack containing the nodes of the values that need to be colored. We can see that v0, v1, v2, v3 and v5 are marked as spill candidates.

Figure 4.3: Interference graph for integer registers of test case

(a) Interference graph for the test case after simplify (b) Stack for the test case after simplify

Figure 4.4: Interference graph and stack for the test case after simplify

4.6 After spill

In the select step the algorithm realizes that there is no color available for v0, v1, v2, v3 and v5, so they need to be spilled. Listing 4.3 shows the LIR code after the spill step. The spill moves to the stack were inserted after the definitions of v5, v0 and v3 at instruction ids 2, 6 and 24. v1 and v2 are defined at instruction number 10, which is a phi instruction. Phi instructions can use stack slots directly, so we do not need to move a register to the stack. In block B1, v0 is restored before its usage at instruction number 12. In block B2, v1 is restored before its usages at instruction numbers 18 and 22, and v2 is restored before its usage at instruction number 26. Values only need to be restored before usages at instructions that require registers; they do not need to be restored before instructions that can handle stack slots. Figure 4.5 depicts the live ranges after spilling.

B0 -> B1 [-1, -1]
_nr__instruction______(LIR)
 0 [rsi|d, rbp|q] = LABEL size: 2 align: false label: ?
 2 v5|q = MOVE rbp|q moveKind: QWORD
-1 vstack:4|q = MOVE v5|q moveKind: QWORD
 4 [] = HOTSPOTLOCKSTACK frameMapBuilder: com.oracle.graal.lir.amd64.AMD64FrameMapBuilder@4667ae56 slotKind: QWORD
 6 v0|d = MOVE rsi|d moveKind: DWORD
-1 vstack:0|d = MOVE v0|d moveKind: DWORD
 8 JUMP ~[v0|d, int[0|0x0]] size: 2 destination: B0 -> B1

B1 <- B0, B3, B4 -> B2, B5 [-1, -1] llh (loop 0 depth 1)
_nr__instruction______(LIR)
10 [v1|d, v2|d] = LABEL size: 2 align: true label: ?
-1 v0|d = MOVE vstack:0|d moveKind: DWORD
12 CMP (x: v0|d, y: v2|d) size: DWORD
14 BRANCH ~[] size: 0 condition: > trueDestinationProbability: 0.5 trueDestination: B1 -> B2 falseDestination: B1 -> B5

B2 <- B1 -> B3, B4 [-1, -1] (loop 0 depth 1)
_nr__instruction______(LIR)
16 [] = LABEL size: 0 align: false label: ?
-1 v1|d = MOVE vstack:1|d moveKind: DWORD
18 BINDTOREGISTER v1|d
20 SPILLREGISTERS
-1 v1|d = MOVE vstack:1|d moveKind: DWORD
22 BINDTOREGISTER v1|d
24 v3|d = INC v2|d size: DWORD
-1 vstack:3|d = MOVE v3|d moveKind: DWORD
-1 v2|d = MOVE vstack:2|d moveKind: DWORD
26 TEST (x: v2|d, y: v2|d) size: DWORD
28 BRANCH ~[] size: 0 condition: = trueDestinationProbability: 0.5 trueDestination: B2 -> B3 falseDestination: B2 -> B4

B4 <- B2 -> B1 [-1, -1] lle (loop 0 depth 1) ...
B3 <- B2 -> B1 [-1, -1] lle (loop 0 depth 1) ...
B5 <- B1 [-1, -1] ...
Listing 4.3: Test case LIR code after spilling

4.7 After select

After spilling we rebuild the graph and repeat the coloring steps. This time we find a coloring; Figure 4.6 shows the colored graph. We can see in the graph that two nodes that share an edge never have the same color. Every color represents a single hardware register. What we also see is that the new interference graph has fewer edges than before: the values v1, v2, v3 and v4 do not have any interference in the colored graph.

4.8 After assign locations

After select has colored the graph, the assign locations step replaces every value by its register. This is shown in Listing 4.4. In cases where a spilled value does not need a register, it is replaced by its assigned stack slot. For example, v1 and v2 in block B1 at instruction number 10 are replaced by their stack slots.

B0 -> B1 [-1, -1]
_nr__instruction______(LIR)
 0 [rsi|d, rbp|q] = LABEL size: 2 align: false label: ?
 2 r10|q = MOVE rbp|q moveKind: QWORD
-1 vstack:4|q = MOVE r10|q moveKind: QWORD
 4 [] = HOTSPOTLOCKSTACK frameMapBuilder: com.oracle.graal.lir.amd64.AMD64FrameMapBuilder@4667ae56 slotKind: QWORD
 6 r10|d = MOVE rsi|d moveKind: DWORD
-1 vstack:0|d = MOVE r10|d moveKind: DWORD
 8 JUMP ~[vstack:0|d, int[0|0x0]] size: 2 destination: B0 -> B1

B1 <- B0, B3, B4 -> B2, B5 [-1, -1] llh (loop 0 depth 1)
_nr__instruction______(LIR)
10 [vstack:1|d, vstack:2|d] = LABEL size: 2 align: true label: ?
-1 r10|d = MOVE vstack:0|d moveKind: DWORD
12 CMP (x: r10|d, y: vstack:2|d) size: DWORD
14 BRANCH ~[] size: 0 condition: > trueDestinationProbability: 0.5 trueDestination: B1 -> B2 falseDestination: B1 -> B5

B2 <- B1 -> B3, B4 [-1, -1] (loop 0 depth 1)
_nr__instruction______(LIR)
16 [] = LABEL size: 0 align: false label: ?
-1 r10|d = MOVE vstack:1|d moveKind: DWORD
18 BINDTOREGISTER r10|d
20 SPILLREGISTERS
-1 r10|d = MOVE vstack:1|d moveKind: DWORD
22 BINDTOREGISTER r10|d
24 r10|d = INC vstack:2|d size: DWORD
-1 vstack:3|d = MOVE r10|d moveKind: DWORD
-1 r10|d = MOVE vstack:2|d moveKind: DWORD
26 TEST (x: r10|d, y: r10|d) size: DWORD
28 BRANCH ~[] size: 0 condition: = trueDestinationProbability: 0.5 trueDestination: B2 -> B3 falseDestination: B2 -> B4

B4 <- B2 -> B1 [-1, -1] lle (loop 0 depth 1)
_nr__instruction______(LIR)
36 [] = LABEL size: 0 align: false label: ?
38 SPILLREGISTERS
40 r10|d = INC vstack:1|d size: DWORD
42 JUMP ~[r10|d, vstack:3|d] size: 2 destination: B4 -> B1

B3 <- B2 -> B1 [-1, -1] lle (loop 0 depth 1)
_nr__instruction______(LIR)
30 [] = LABEL size: 0 align: false label: ?
32 SPILLREGISTERS
34 JUMP ~[vstack:1|d, vstack:3|d] size: 2 destination: B3 -> B1

B5 <- B1 [-1, -1]
_nr__instruction______(LIR)
44 [] = LABEL size: 0 align: false label: ?
46 rax|d = MOVE vstack:1|d moveKind: DWORD
48 RETURN (savedRbp: vstack:4|q, value: rax|d, ~outgoingValues: []) size: 0 isStub: false scratchForSafepointOnReturn: rcx config: HotSpotVMConfig
Listing 4.4: Test case LIR code after assigning of registers

Figure 4.5: Live ranges of the test case after spilling

Figure 4.6: Colored interference graph for our test case

4.9 After resolve data flow

The last part of our algorithm resolves the phi instructions that are still within the LIR code. In Listing 4.6 we can see that the algorithm inserts moves at the end of blocks B0, B4 and B3. Phi instructions always occur in pairs at the edges of basic blocks: a phi instruction at the end of a block corresponds to a phi instruction at the beginning of its successor block.

Listing 4.5 shows an example from our test case. Instruction number 8 in block B0 holds the results of the block. Instruction number 10 in block B1 holds the place where block B1 expects the results of its predecessors. In order to resolve the phi instructions, the algorithm moves the results of block B0 to where B1 expects them, as shown in Listing 4.6.

B0 -> B1 [-1, -1]
_nr__instruction______(LIR)
...
 8 JUMP ~[vstack:0|d, int[0|0x0]] size: 2 destination: B0 -> B1

B1 <- B0, B3, B4 -> B2, B5 [-1, -1] llh (loop 0 depth 1)
_nr__instruction______(LIR)
10 [vstack:1|d, vstack:2|d] = LABEL size: 2 align: true label: ?
...
Listing 4.5: Phi instruction

B0 -> B1 [-1, -1]
_nr__instruction______(LIR)
 0 [rsi|d, rbp|q] = LABEL size: 2 align: false label: ?
 2 r10|q = MOVE rbp|q moveKind: QWORD
-1 vstack:4|q = MOVE r10|q moveKind: QWORD
 4 [] = HOTSPOTLOCKSTACK frameMapBuilder: com.oracle.graal.lir.amd64.AMD64FrameMapBuilder@4667ae56 slotKind: QWORD
 6 r10|d = MOVE rsi|d moveKind: DWORD
-1 vstack:0|d = MOVE r10|d moveKind: DWORD
-1 vstack:2|d = MOVE input: int[0|0x0]
-1 vstack:1|d = STACKMOVE (input: vstack:0|d, ~backupSlot: vstack:5|q) scratch: rax
 8 JUMP ~[] size: 0 destination: B0 -> B1

B1 <- B0, B3, B4 -> B2, B5 [-1, -1] llh (loop 0 depth 1) ...
B2 <- B1 -> B3, B4 [-1, -1] (loop 0 depth 1) ...
B4 <- B2 -> B1 [-1, -1] lle (loop 0 depth 1)
_nr__instruction______(LIR)
36 [] = LABEL size: 0 align: false label: ?
38 SPILLREGISTERS
40 r10|d = INC vstack:1|d size: DWORD
-1 vstack:2|d = STACKMOVE (input: vstack:3|d, ~backupSlot: vstack:5|q) scratch: rax
-1 vstack:1|d = MOVE r10|d moveKind: DWORD
42 JUMP ~[] size: 0 destination: B4 -> B1

B3 <- B2 -> B1 [-1, -1] lle (loop 0 depth 1)
_nr__instruction______(LIR)
30 [] = LABEL size: 0 align: false label: ?
32 SPILLREGISTERS
-1 vstack:2|d = STACKMOVE (input: vstack:3|d, ~backupSlot: vstack:5|q) scratch: rax
34 JUMP ~[] size: 0 destination: B3 -> B1

B5 <- B1 [-1, -1] ...
Listing 4.6: Test case LIR after resolving Phi instructions

Chapter 5

Evaluation

In this chapter we evaluate how well our allocator works by comparing it to the existing linear scan register allocator. Our allocator provides a basic graph coloring approach to register allocation in Graal; the linear scan allocator, on the other hand, is highly optimized and thoroughly tested. What we expect to find is that we are not too far behind in terms of code performance, but significantly slower in terms of compile time.

5.1 Evaluation environment and benchmark

We run this evaluation on an Intel Core 2 Quad CPU Q6600 with 4 GB of RAM, running Ubuntu 14.04 as operating system. For measuring the performance we use the DaCapo [2] benchmarks and the Scala DaCapo [19] benchmarks. Since the benchmarks must be executed for some time until all methods have been compiled by Graal, we run a number of warmup iterations before collecting the result; the time of the last iteration is our result. We repeat this ten times and obtain ten results, which we then compare with the results of the same number of runs with the linear scan allocator. Table 5.1 shows the number of warmup iterations and the results of the peak performance measurements, and Table 5.2 shows the results of the compile time measurements. We run these tests with the code from commit cf439d2 from https://bitbucket.org/f_schroecki/graphcoloring/. For stability we also use the option -XX:JVMCIThreads=1 for all benchmarks.

We then normalize our results by dividing each number by the linear scan average, in order to display them in the following graphs. We also calculate a composite mean by computing the geometric mean of all normalized graph coloring results. This allows us to see the average difference between the two allocators.
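The composite mean computation can be sketched as follows; the class and method names are ours and only illustrate the normalization described above.

```java
// Sketch of the normalization and composite mean used in the evaluation
// (names are ours).
class CompositeMeanSketch {
    // Each graph coloring mean is divided by the corresponding linear scan
    // mean; the composite mean is the geometric mean of these ratios,
    // computed via logarithms to avoid overflow on long products.
    static double compositeMean(double[] gcMeans, double[] lscanMeans) {
        double logSum = 0.0;
        for (int i = 0; i < gcMeans.length; i++) {
            logSum += Math.log(gcMeans[i] / lscanMeans[i]);
        }
        return Math.exp(logSum / gcMeans.length);
    }
}
```

A composite mean of 1.0 would mean both allocators perform equally on average; values above 1.0 favor linear scan.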

Table 5.1: Peak performance measurements DaCapo and Scala DaCapo (ms)

benchmark | warm-ups | mean GC | mean Lscan | std deviation GC | std deviation Lscan
avrora | 20 | 33826.506 | 33456.454 | 170.165 | 86.525
batik | 30 | 1910.270 | 1790.052 | 215.143 | 224.091
fop | 50 | 447.644 | 382.603 | 32.606 | 40.999
jython | 30 | 5761.770 | 4807.824 | 545.350 | 446.505
luindex | 25 | 3226.005 | 2948.347 | 527.602 | 444.418
lusearch | 15 | 3174.800 | 3093.319 | 38.798 | 45.102
pmd | 15 | 5453.429 | 5464.347 | 314.620 | 333.652
sunflow | 20 | 4238.649 | 3364.383 | 166.782 | 143.113
tomcat | 20 | 3016.944 | 2625.430 | 56.777 | 62.750
xalan | 25 | 4944.346 | 4527.838 | 119.766 | 107.226
actors | 20 | 9090.634 | 8857.324 | 129.812 | 98.000
apparat | 20 | 26529.165 | 24809.948 | 1783.263 | 1439.273
factorie | 10 | 37384.071 | 33523.434 | 5351.856 | 2139.445
kiama | 25 | 2032.455 | 2020.297 | 319.354 | 222.746
scalac | 30 | 2966.090 | 2421.155 | 355.317 | 216.280
scaladoc | 25 | 4647.106 | 3684.904 | 423.604 | 488.410
scalap | 30 | 2607.116 | 2596.708 | 398.346 | 374.122
scalariform | 30 | 8111.191 | 8937.400 | 1274.634 | 927.892
scalaxb | 40 | 831.307 | 731.428 | 145.881 | 91.651
tmt | 30 | 9509.057 | 8416.926 | 416.422 | 281.894

Table 5.2: Compile time measurements DaCapo and Scala DaCapo (ms)

benchmark | mean GC | mean Lscan | std deviation GC | std deviation Lscan
avrora | 1579.715 | 563.969 | 39.111 | 6.215
batik | 28254.273 | 1974.360 | 1040.884 | 124.476
fop | 11944.187 | 2296.175 | 517.883 | 114.562
jython | 80418.537 | 14794.340 | 6303.278 | 2687.232
luindex | 8086.893 | 1165.610 | 1239.754 | 87.355
lusearch | 3449.553 | 1048.378 | 135.663 | 141.383
pmd | 26529.165 | 3508.696 | 2658.580 | 436.031
sunflow | 3733.121 | 913.241 | 513.616 | 70.485
tomcat | 23860.985 | 4113.378 | 2268.288 | 81.289
xalan | 13877.165 | 1571.836 | 523.774 | 140.700
actors | 3494.690 | 1055.742 | 204.790 | 38.559
apparat | 9966.652 | 1817.105 | 2579.016 | 86.058
factorie | 3105.717 | 775.881 | 807.667 | 38.625
kiama | 8751.671 | 1374.712 | 470.356 | 76.440
scalac | 37646.678 | 6342.320 | 2676.057 | 173.519
scaladoc | 24245.831 | 4007.809 | 2798.199 | 163.480
scalap | 2793.358 | 724.151 | 192.629 | 26.670
scalariform | 11956.137 | 1861.243 | 1588.473 | 231.026
scalaxb | 6310.688 | 1320.809 | 544.879 | 65.349
tmt | 2930.710 | 1033.803 | 223.776 | 53.502

5.2 DaCapo

5.3 Peak Performance

Here we show the peak performance results of the DaCapo benchmarks. As expected, our performance is not too far off, but linear scan shows overall better performance. Computing our composite mean, we see that the linear scan allocator performs on average 1.100 times better. Figure 5.1 shows the comparison of linear scan and graph coloring for the individual benchmarks.

Figure 5.1: Dacapo performance comparison

5.4 Compile Time

As expected, linear scan clearly outperforms our graph coloring allocator with regard to compile time, as we can see in Figure 5.2. Our calculated composite mean here is 5.745, which shows that the linear scan allocator is significantly faster.

Figure 5.2: Dacapo compile time comparison

5.5 ScalaDacapo

5.6 Peak Performance

The ScalaDacapo benchmarks show quite similar results. We get close to the performance of linear scan, but it is still a bit better, as we can see in Figure 5.3. Our composite mean for these benchmarks is 1.078.

5.7 Compile Time

The results of the compile time comparison also hold no surprises. As expected and shown in Figure 5.4, linear scan is clearly faster than graph coloring. Our composite mean in this case is 4.719.

Figure 5.3: ScalaDacapo performance comparison

Figure 5.4: ScalaDacapo compile time comparison

Chapter 6

Related Work

This chapter reviews the papers that are most relevant for our work. The implementation of a graph coloring register allocator was first described in the papers by Chaitin et al. [6,7]. The underlying principles described in these papers are still the basis of graph coloring in register allocation today. They describe the two data structures for the interference graph and how to build it. The concept of interference has not changed since the papers were released. The papers also deal with situations where a coloring was not found and spilling is needed. Chaitin showed that finding a coloring for a graph is NP-complete in general and uses heuristics to solve this issue.

We based our work mostly on the papers by Briggs et al. [3,4]. These papers build on the Chaitin papers but improve some aspects of them. They introduced the concept of optimistic coloring to register allocation, which prevents unnecessary spilling and delays the spilling decision to a later stage of the algorithm. The Briggs papers also describe the use of the SSA form to discover values with multiple live ranges.

A paper by Cooper et al. [9] describes the construction of an interference graph. This paper suggests multiple smaller graphs instead of a single graph. Since modern architectures mostly use different kinds of registers, values can only exist in one type of register and therefore cannot interfere with values of other types. This concept of multiple graphs was the obvious choice for our implementation.

Another paper, by Mössenböck [15], deals with the SSA form in the Java HotSpot Client Compiler. It describes the use of the SSA form and also the construction of an interference graph in modern compilers. Other papers by Buchwald et al. [5] and Cytron et al. [11] also gave us a better understanding of the SSA form.

Papers that we also consider important for this work deal with optimizations of graph coloring register allocators. A paper by Cooper and Simpson [10] describes an approach to splitting live ranges in graph coloring register allocators. The algorithm introduced in the paper uses an additional data structure called a containment graph.

Another paper by Bergner et al. [1] that deals with optimization is about interference region spilling. This optimization is interesting because it leads to better spill code. Spilling is limited to the actual points where the value interferes: a value is spilled at the point in the code where it has an interference and is restored after the interference ends. The concepts of interference region spilling and splitting can be combined.

Also important for our thesis is the paper by Poletto and Sarkar [18], which describes register allocation with linear scan. Another paper by Wimmer and Mössenböck [23] extends the linear scan algorithm with a method for splitting intervals. A further paper by Wimmer and Franz [22] describes the use of the SSA form with a linear scan register allocator. This paper is especially relevant since Graal uses an implementation of this approach.

Chapter 7

Future Work

In this chapter we describe how our work can be further improved in the future in order to deliver better performance. We have shown that our implementation is functional but lags behind the linear scan allocator in terms of performance.

7.1 Spilling Strategy

The main disadvantage of our allocator is the spilling strategy. As explained in previous chapters, we spill whole values. To improve the spilling strategy we suggest implementing a splitting approach. Our current data structure is suited for this improvement. What we need to do in order to implement splitting is to extend the Interval class: if the allocator decides to split an interval, we need to store that information in the interval. We suggest doing that by storing an array of the new intervals inside the original one.

Every part of the split interval needs a unique id. Our solution for that would be to take the highest value id we have and add one. We also need a mapping to identify which split interval belongs to which original interval. We then need to take the node of the original interval out of the graph and add the split intervals to it. What is then left to do is to adapt the handling of the intervals at every point in the code where an interval is accessed.
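The proposed extension could be sketched as follows. This is purely speculative future work: the class shape, field names and id scheme below are ours and do not exist in the current implementation.

```java
import java.util.*;

// Hedged sketch of the proposed Interval extension for splitting
// (class shape and names are ours, not existing Graal code).
class IntervalSplitSketch {
    static class Interval {
        final int id;
        final List<Interval> children = new ArrayList<>(); // split parts
        Interval parent;                                    // back-mapping

        Interval(int id) { this.id = id; }
    }

    // Assumed to start above the highest existing value id.
    static int nextId = 100;

    // Splitting stores the new interval inside the original one, gives it
    // a fresh unique id and records the mapping back to the original.
    static Interval split(Interval original) {
        Interval part = new Interval(nextId++);
        part.parent = original;
        original.children.add(part);
        return part;
    }
}
```

With this mapping in place, the original interval's node could be removed from the graph and replaced by one node per split part, as described above.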

Another improvement to our spilling strategy would be interference region spilling. For this optimization we need a method to identify the regions where an interval overlaps with its neighbors. This can be done by comparing the live ranges of the intervals. Then we would have to add the identified region to the interval and only spill it there in the spill method.

Both splitting and interference region spilling can be used together, but we would need a method to decide which one to use.

7.2 Heuristic for Choosing a Spill Candidate

We choose a spill candidate based on the number of use positions of the value and the degree of the node. We incorporate the degree because we want to avoid spilling insignificant nodes that are only live in a short range and do not have a high impact when spilled. We chose this method because it is easy to compute, does not need a lot of resources and gives us some basis for making beneficial spill decisions. However, it is possible that there is a stronger heuristic for choosing an optimal spill candidate. In order to improve the heuristic, further experimentation and comparison of benchmarks is needed.
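A heuristic of this shape can be sketched as follows. The exact weighting is not spelled out above, so the cost function below (use positions divided by degree, in the spirit of Chaitin's cost/degree heuristic) is an assumption of ours, not necessarily what the implementation computes.

```java
// Sketch of a spill-candidate heuristic combining use count and degree
// (the cost function is assumed; names are ours).
class SpillHeuristicSketch {
    // Lower cost = better spill candidate: few use positions make the value
    // cheap to spill, and a high degree means spilling it frees many
    // neighbours in the interference graph.
    static int chooseSpillCandidate(int[] useCount, int[] degree) {
        int best = 0;
        double bestCost = Double.MAX_VALUE;
        for (int node = 0; node < useCount.length; node++) {
            double cost = (double) useCount[node] / Math.max(1, degree[node]);
            if (cost < bestCost) { bestCost = cost; best = node; }
        }
        return best;
    }
}
```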

7.3 Identify Performance Issues

After optimizing the allocator we need to compare it again to the linear scan allocator; it should then achieve at least the same performance. If the linear scan allocator still achieves better performance than the graph coloring allocator in some cases, we need to identify these cases and analyze the code that is produced.

Chapter 8

Conclusion

In this thesis we gave an overview of register allocation and an introduction to Graal. We extended the Graal compiler with a graph coloring register allocator.

The thesis presents a functional prototype. We compared our graph coloring allocator to the linear scan allocator in the evaluation chapter, using the DaCapo and the Scala DaCapo benchmark suites.

The evaluation showed that the graph coloring allocator we implemented in this thesis is not that far behind the linear scan allocator in terms of performance. The remaining gap is due to the high optimization effort put into the linear scan allocator and to our simple spilling strategy.

By further optimizing the spilling strategy in the future, we are confident that the performance of our allocator will be able to match or even surpass that of the linear scan allocator. Our implementation is a rudimentary form of a graph coloring allocator; in order to perform sufficiently in practice, more optimization effort is needed.

If these optimizations are implemented in the future, Graal could benefit from being extended with this type of register allocation.

Chapter 9

Acknowledgement

First I would like to thank my advisor DI Josef Eisl for his endless patience and repeated explanations, which allowed me to complete this thesis.

I would also like to thank my family and friends for their moral support and understanding. When I worked on weekends or at night I could always count on words of encouragement.

List of Figures

1.1 Interference Example
1.2 Interference Graph with five values and two registers
1.3 Linear scan example

2.1 Graal and HotSpot [20]
2.2 Transformation to SSA form

3.1 Chaitin's Algorithm
3.2 Chaitin-Briggs Algorithm
3.3 Coalescing example
3.4 Coalescing of two live ranges
3.5 Diamond shaped graph, k=2
3.6 Simplify example, k=2
3.7 Simplify example, k=2
3.8 Select example, k=2
3.9 Spilling example
3.10 Graph coloring register allocator for Graal
3.11 Structure of liveness analysis
3.12 Build local live sets
3.13 Build global live sets
3.14 Build intervals
3.15 Build Graph Example
3.16 Build graph
3.17 Simplify
3.18 Select
3.19 Spill values
3.20 Spilling example
3.21 Spilling example
3.22 Assign Locations
3.23 Data structure

4.1 Control flow graph of the test case
4.2 Live ranges of the test case after lifetime analysis
4.3 Interference graph for integer registers of the test case
4.4 Interference graph and stack for the test case after simplify
4.5 Live ranges of the test case after spilling
4.6 Colored interference graph for our test case

5.1 DaCapo performance comparison
5.2 DaCapo compile time comparison
5.3 Scala DaCapo performance comparison
5.4 Scala DaCapo compile time comparison

Listings

3.1 Finding spill position

4.1 Test case Java code
4.2 Test case LIR code before allocation
4.3 Test case LIR code after spilling
4.4 Test case LIR code after assigning of registers
4.5 Phi instruction
4.6 Test case LIR after resolving Phi instructions

Bibliography

[1] Peter Bergner, Peter Dahl, David Engebretsen, and Matthew O’Keefe. Spill code minimization via interference region spilling. 1997.

[2] S. M. Blackburn, R. Garner, C. Hoffman, A. M. Khan, K. S. McKinley, R. Bentzur, A. Diwan, D. Feinberg, D. Frampton, S. Z. Guyer, M. Hirzel, A. Hosking, M. Jump, H. Lee, J. E. B. Moss, A. Phansalkar, D. Stefanović, T. VanDrunen, D. von Dincklage, and B. Wiedermann. The DaCapo benchmarks: Java benchmarking development and analysis. In OOPSLA '06: Proceedings of the 21st annual ACM SIGPLAN conference on Object-Oriented Programming, Systems, Languages, and Applications, pages 169–190, New York, NY, USA, October 2006. ACM Press.

[3] Preston Briggs, Keith D. Cooper, Ken Kennedy, and Linda Torczon. Coloring heuristics for register allocation. 1989.

[4] Preston Briggs, Keith D. Cooper, and Linda Torczon. Improvements to graph coloring register allocation. 1994.

[5] Sebastian Buchwald, Denis Lohner, and Sebastian Ullrich. Verified construction of static single assignment form. 2016.

[6] Gregory J. Chaitin. Register allocation & spilling via graph coloring. 1982.

[7] Gregory J. Chaitin, Marc A. Auslander, Ashok K. Chandra, John Cocke, Martin E. Hopkins, and Peter W. Markstein. Register allocation via coloring. 1981.

[8] OpenJDK Community. Graal project. 2016. http://openjdk.java.net/projects/graal.

[9] Keith D. Cooper, Timothy J. Harvey, and Linda Torczon. How to build an interference graph. 1988.

[10] Keith D. Cooper and L. Taylor Simpson. Live range splitting in a graph coloring register allocator. 2005.

[11] Ron Cytron, Jeanne Ferrante, Barry K. Rosen, Mark N. Wegman, and F. Kenneth Zadeck. Efficiently computing static single assignment form and the control dependence graph. 1991. DOI: 10.1145/115372.115320.

[12] Alkis Evlogimenos. Improvements to linear scan register allocation. 2004.

[13] Thomas Kotzmann, Christian Wimmer, Hanspeter Mössenböck, Thomas Rodriguez, Kenneth Russell, and David Cox. The Java HotSpot client compiler. 2001.

[14] Tim Lindholm, Frank Yellin, Gilad Bracha, and Alex Buckley. The Java Virtual Machine Specification, Java SE 8 Edition. 2015.

[15] Hanspeter Mössenböck. Adding static single assignment form and a graph coloring register allocator to the Java HotSpot client compiler. 2000.

[16] Sun Microsystems / Oracle. The Java HotSpot performance engine architecture. Visited August 2016. http://www.oracle.com/technetwork/java/whitepaper-135217.html.

[17] Michael Paleczny, Christopher Vick, and Cliff Click. The Java HotSpot server compiler. 2001.

[18] Massimiliano Poletto and Vivek Sarkar. Linear scan register allocation. 1999.

[19] Andreas Sewe, Mira Mezini, Aibek Sarimbekov, and Walter Binder. Da Capo con Scala: Design and analysis of a Scala benchmark suite for the Java virtual machine. In Proceedings of the 26th Conference on Object-Oriented Programming, Systems, Languages and Applications, OOPSLA '11, pages 657–676, New York, NY, USA, 2011. ACM.

[20] Lukas Stadler, Thomas Würthinger, and Hanspeter Mössenböck. Partial escape analysis and scalar replacement for Java. 2014.

[21] Robert Endre Tarjan. Efficiency of a good but not linear set union algorithm. 1975.

[22] Christian Wimmer and Michael Franz. Linear scan register allocation on SSA form. 2010.

[23] Christian Wimmer and Hanspeter Mössenböck. Optimized interval splitting in a linear scan register allocator. 2005.

Personal Information

Name: Schröckeneder, Florian
Address: Anastasius-Grün-Straße 16, 4020 Linz
Telephone: 0664/2756994
Email: [email protected]
Nationality: Austrian
Date of Birth: 15 October 1988

Professional Experience

2006–today: Various side jobs unrelated to my profession
2008–2009: Military service, Linz, Austria

Education

2003–2008: Bundeshandelsakademie, Schärding, Austria
2009–2014: BSc in Computer Science, Johannes Kepler University, Linz, Austria
2014–2016: MSc in Computer Science, Johannes Kepler University, Linz, Austria

Other Interests

Hobbies: Hiking, Archery, Fishing, Motorcycling

Statutory Declaration (Eidesstattliche Erklärung)

I hereby declare under oath that I have written this master's thesis independently and without outside assistance, that I have used no sources or aids other than those indicated, and that all passages taken verbatim or in substance from other sources are marked as such. This master's thesis is identical to the electronically submitted text document.

Linz, July 31, 2017

Florian Schröckeneder.