
IT 19 060
Degree project 30 credits (Examensarbete 30 hp)
September 2019

Adapting a Constraint-Based Tool to a New VLIW Architecture

Martin Kjellin

Department of Information Technology

Abstract

Adapting a Constraint-Based Compiler Tool to a New VLIW Architecture

Martin Kjellin

The compiler tool Unison uses combinatorial optimisation to perform integrated register allocation and instruction scheduling, and is designed to be possible to adapt to different processor architectures. Black Arrow is a VLIW (very long instruction word) architecture designed for signal processing within the field of mobile communication technology. Black Arrow has structural traits that complicate the register allocation and instruction scheduling process. Most important of these is that functionally equivalent execution units within the processor are not topologically equivalent, since their distances to other execution units vary. This means that an instruction schedule must indicate not only in what cycle each instruction is to be executed, but also the execution unit that will execute it. This thesis presents and evaluates a version of Unison that has been adapted to Black Arrow. A central point in the adaptation is how the problem of assigning each Black Arrow instruction to a specific execution unit is solved through the use of Unison's ability to choose between several alternative instructions implementing the same operation. The evaluation shows that the Black Arrow version of Unison, when optimising for minimal execution time and a variant of minimal code size, is able to find optimal solutions to the register allocation and instruction scheduling problems for several common integer algorithms, within reasonable time.

Supervisor: Karl-Filip Faxén
Reviewer: Pierre Flener
Examiner: Mats Daniels
IT 19 060
Printed by: Reprocentralen ITC

Contents

Acknowledgements

1 Introduction
  1.1 Purpose
  1.2 Structure

2 Black Arrow
  2.1 Datapath
  2.2 Instruction Set
  2.3 Bach and Kudzul-C

3 Combinatorial Optimisation
  3.1 Constraint Programming
  3.2 Lazy Clause Generation

4 Unison
  4.1 Code Import
  4.2 Linearisation
  4.3 Extension with Optional Copy Operations
  4.4 Augmentation with Alternative Temporaries
  4.5 Constraint Model
  4.6 Solving

5 Target Description
  5.1 Instruction Set
  5.2 Choosing Execution Unit
  5.3 Optional Copy Operations
  5.4 Global Interconnects
  5.5 Resources
  5.6 Latencies and Serialisation
  5.7 Read and Write Capabilities

6 Toolchain
  6.1 Preprocessing
  6.2 Unison's processing
  6.3 Postprocessing

7 Evaluation
  7.1 Test Programs
  7.2 Test Setup
  7.3 Optimising for Minimal Execution Time
  7.4 Optimising for Minimal Code Size
    7.4.1 Optimising with Constant-Size Bundles
    7.4.2 Optimising with Varying-Size Bundles
  7.5 Discussion

8 Related Work
  8.1 Combinatorial Optimisation Approaches
  8.2 Cluster Allocation

9 Future Work
  9.1 Instruction Set
  9.2 Choice of Execution Units
  9.3 Further Investigations

Bibliography

Acknowledgements

This thesis has only one author, but several others have given invaluable help during the long process of writing it and performing the investigation reported in it. My supervisor Karl-Filip Faxén has guided me in the world of processors and helped me understand how Black Arrow works and how it can be modelled. Furthermore, we have had many interesting discussions about other important subjects, such as politics and music. Pierre Flener has been a very interested and encouraging reviewer and has given me valuable advice on how to improve the thesis text. Roberto Castañeda Lozano has willingly helped me understand all important details of Unison. His helpfulness is even more remarkable considering that he has no formal responsibility for this thesis. Roberto was also the one who proposed using Unison's mechanism for alternative instructions to assign instructions to the execution units of Black Arrow, an idea that is very important in this thesis. Per Starbäck made the LaTeX document class stpthesis, which was used to give the thesis a beautiful layout. I am most grateful to them all! Last but not least, I would like to thank Sofia Nordin, who, apart from being a very dear friend, was the one who helped me get in touch with Karl-Filip and thus with the idea of the investigation presented in the thesis.

1 Introduction

A compiler consists of two main parts: a front-end and a back-end. While the front-end analyses the source program that is about to be compiled and produces an intermediate representation of it, the back-end produces a target program (often in some assembly language) from the intermediate representation [1, p. 4]. Two important tasks performed by a compiler back-end are register allocation and instruction scheduling [1, pp. 505–506]. Register allocation is the act of assigning temporaries (program variables) to registers or memory at each point in the program, while instruction scheduling is the act of ordering the instructions of the program to achieve high throughput (number of instructions executed per time unit). These tasks can be performed using heuristic or systematic methods. Given enough time, the latter methods produce optimal register allocations and instruction schedules, in relation to some objective such as minimal code size or minimal (estimated) execution time for the compiled program.

The instruction scheduling performed by the compiler is especially important for so-called VLIW (very long instruction word) architectures. The defining characteristic of such an architecture is that each instruction word encodes all the operations to be issued in a single cycle. This means that the compiler, during instruction scheduling, must decide which operations are to be issued in parallel, instead of relying on hardware to do this at runtime [1, p. 710].

1.1 Purpose

The work presented in this thesis has been performed at RISE SICS, a government-owned Swedish research institute within applied information and communication technology. The compiler tool Unison [12], which has been developed in collaboration between RISE SICS, the communication technology company Ericsson, and KTH Royal Institute of Technology, uses combinatorial optimisation to perform integrated register allocation and instruction scheduling. Unison can be integrated with the back-end of the compiler LLVM [32] and is designed to be possible to adapt to different processor architectures.

In another collaboration between RISE SICS and Ericsson, a new VLIW architecture called Black Arrow has been developed [22, 23]. Black Arrow, which is intended to be used for signal processing within the field of mobile communication technology, is a programmable accelerator, that is, a processor designed to provide as much as possible of both the flexibility of a digital signal processor (DSP) and the performance of a hardware accelerator (a processor specialised in efficiently calculating certain mathematical functions). As part of the Black Arrow project, an architecture-specific compiler called Bach (Black Arrow Compiler in Haskell), which is not based on

methods for combinatorial optimisation, has been developed [22].

When compared to LLVM's default algorithms for register allocation and instruction scheduling, Unison has already been shown to produce code of similar or better quality for a few architectures, among these the VLIW architecture Hexagon [12]. However, the Black Arrow architecture has structural traits that complicate the instruction scheduling and register allocation problems, and it has not been studied before whether Unison can handle these traits and, if so, produce better code than Bach. The purpose of this study is therefore to:

• Investigate whether Unison can be adapted to support the Black Arrow architecture.

• If so, evaluate three aspects of the performance of the adapted version of Unison:

– The quality of the produced code (in terms of code size and estimated execution time).
– The speed of the instruction scheduling and register allocation process.
– The impact on quality and speed of using different solvers, techniques and strategies for combinatorial optimisation.

As will be shown, it is indeed possible to adapt Unison to the Black Arrow architecture, so that the tool can find optimal solutions to the instruction scheduling and register allocation problems for Black Arrow, in a way that provides reasonable processing times for several common integer algorithms, when the optimisation concerns minimal execution time and a variant of minimal code size. However, the evaluation also shows that other programs and another code-size optimisation criterion make the optimisation significantly harder for Unison.

1.2 Structure

As a necessary background to the adaptation of Unison for the Black Arrow architecture, this introduction is followed by descriptions of the Black Arrow architecture (Chapter 2), two techniques for combinatorial optimisation that can be used by Unison (Chapter 3), and Unison's instruction scheduling and register allocation process (Chapter 4). After these background chapters, the target description that adapts Unison to Black Arrow and that has been developed as part of this study is presented in Chapter 5. The practical process of transforming a program representation produced by Bach to one where Unison has performed instruction scheduling and register allocation is described in Chapter 6, before the adapted version of Unison is evaluated in Chapter 7. The thesis ends with a discussion of some related work in Chapter 8 and suggestions for future work in Chapter 9.

2 Black Arrow

The most recent version of the Black Arrow architecture is (in September 2019) version 1.0 [23]. This version differs in some details (for example, in the system) from version 0.5 [22], which is the one that is studied in this thesis, but the basic characteristics of the architecture have not been changed. The reason for studying the older version is mainly practical: the latest version of the Bach compiler conforms to version 0.5 of the architecture, something that is also true for an accompanying simulator capable of running the compiled code. This makes it possible to test programs to be handled by Unison for correctness and, given some additional processing (see Section 6.1), to produce input for Unison. Furthermore, the register allocation and instruction schedule produced by Unison can be compared to those produced by Bach, as part of the evaluation of Unison. The difference between the two versions of Black Arrow is modest enough for the conclusions of this thesis to be valid for both. Neither version has so far been implemented in hardware.

This chapter describes the datapath configuration (Section 2.1) and instruction set (Section 2.2) that will be studied here, as well as Bach and the Kudzul-C language (Section 2.3).

2.1 Datapath

The Black Arrow programmable accelerator consists of a number of tiles (Figure 2.1), each of which can be regarded as a separate processor. (Several tiles may cooperate in the execution of a program. This is, however, out of the scope of this study, which investigates the register allocation and instruction scheduling problem for a single tile.) Each tile has a datapath (execution units and registers) and a local data memory. There is also a common memory, which can be accessed by all tiles.

The Black Arrow architecture allows different configurations of the datapath, that is, the set of execution units may vary, as well as the order of the units. In the configuration studied in this thesis, each tile has thirteen execution units (see Figure 2.1 and Table 2.1). Each unit belongs to one of five different unit types (BR, MEM, ALU, DIST, and SHIFT), each of which is specialised in executing a certain set of instructions (see Section 2.2). This configuration cannot handle floating-point numbers, since this would require a specialised floating-point unit. All numbers that are handled are therefore integers.

Every execution unit Ei (i ∈ {0,1,...,12}) has a main subunit Mi, which executes all instructions that are available on Ei, except wreg (register write) instructions. Every unit Ej except the BR unit (j ∈ {1,2,...,12}) also has a result register ej, which is the destination of the result (if any) of any instruction that is executed by the main subunit.

[Figure 2.1: The local data memory and the execution units Ei (i ∈ {0,1,...,12}) of a Black Arrow tile with the studied configuration. (Adapted from [23].) The unit types are E0: BR, E1: MEM, E2: ALU, E3: DIST, E4: ALU, E5: SHIFT, E6: DIST, E7: ALU, E8: DIST, E9: ALU, E10: SHIFT, E11: DIST, E12: ALU.]

[Table 2.1: Characteristics of the execution units of a Black Arrow tile: their types, the latencies of the instructions executed by their main subunits, and their neighbours.]

Additionally, every unit Ek that is not a DIST unit (k ∈ {0,1,2,4,5,7,9,10,12}) has sixteen file registers rk,ℓ (ℓ ∈ {0,1,...,15}) and one register write subunit Wk, which executes wreg instructions. Such an instruction copies a value from some result register to one of the file registers rk,ℓ. Each subunit can start the execution of one instruction per cycle. Most instructions have a latency of one cycle, that is, the execution is finished at the end of the cycle in which it was started. The only exception to this is that the instructions executed by the main subunit of the MEM unit have a latency of two cycles.

The main subunit Mk (k ∈ {0,1,2,4,5,7,9,10,12}) can read the file registers of unit Ek. However, at most two operands of an instruction may be read from the file registers, since the file register bank has only two read ports. Furthermore, the main subunit Mi (i ∈ {0,1,...,12}) can read the result registers of a set of units, the so-called neighbours of Ei. In Table 2.1, each row shows the neighbours of a unit Ei. (Note that the neighbour relation is not symmetric – unit A being a neighbour of unit B does not necessarily entail that B is a neighbour of unit A.) Every unit that is not a BR unit or a DIST unit is a neighbour of itself. If unit Ei is a DIST unit, then the only unit that has it as a neighbour is ALU unit Ei+1.

Since there are two global interconnects, the main subunit Mi can also use values that have not been produced by any of the neighbours of unit Ei. However, this requires a copy operation to be performed, something that requires one cycle extra in addition to the latency of the instruction that computed the value. A value in the result register of a DIST unit cannot be accessed over the global interconnects. The register write subunit Wk (k ∈ {0,1,2,4,5,7,9,10,12}) can read the result registers of the neighbours of unit Ek directly and, using the global interconnects and given an extra copy operation, values in the result registers of the other units.

The described structure, where each execution unit has its own set of file registers and has faster access to results produced by its neighbours than to results produced by other execution units, means that the Black Arrow architecture resembles a clustered VLIW architecture. In such an architecture, the execution units are divided into clusters, each of which has its own register bank [26, pp. 145–148]. The reason for using clustering is that letting all execution units simultaneously access a common register file (by equipping the register file with a large number of ports), in order to make it possible for all units to operate in parallel, would increase the physical size of the register file and its connections and result in longer access times to values in registers. In a clustered architecture, communication (access to produced values) is faster within a cluster and slower between clusters, since in the latter case, values have to be copied between clusters. In the Black Arrow architecture, each unit could be said to form a cluster of its own, considering that the values in its file registers are not directly available to other units. Copying a value (between neighbours or over a global interconnect) could then be seen as copying between clusters.
However, since every unit has (some of) its neighbours in common with other units and the neighbour relation might actually be symmetric, the Black Arrow units can also be said to form partially overlapping ‘fuzzy’ clusters of execution units with fast access to results produced by neighbours. This structure complicates instruction scheduling and register allocation. On the one hand, the potential of Black Arrow to execute several instructions in parallel should be used as much as possible, since this decreases the execution time of the compiled program and also might decrease its size. On the other hand, some highly parallel choices of execution units for a group of instructions might mean that some computed values must be made available over the global interconnects, something that requires

both extra copy operations and the use of a limited resource (the global interconnects). Together, these two factors are important in deciding what an optimal solution will look like. As will be described in Section 5.2 and Section 5.3, this study evaluates an adapted version of Unison where the choice of execution units and the use of the global interconnects are handled using Unison's mechanism for alternative instructions and optional copy operations (see Section 4.3).
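To make the cost of using a global interconnect concrete, consider a small illustrative example (the latencies follow the description above, but the scenario is hypothetical). A value computed by an instruction i issued in cycle c becomes readable by the neighbours of the producing unit once the latency has elapsed, while a non-neighbour must additionally wait for a copy operation:

\[
t_{\text{neighbour}} = c + \mathrm{lat}(i), \qquad t_{\text{non-neighbour}} \geq c + \mathrm{lat}(i) + 1.
\]

For a ld instruction (latency 2) issued in cycle 0, a neighbour of the MEM unit can thus read the result in cycle 2, whereas a unit that is not a neighbour can read it in cycle 3 at the earliest.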

2.2 Instruction Set

Table 2.2 gives an overview of the instructions that are allowed to occur in the input to the Black Arrow version of Unison implemented in this study. This instruction set is a subset of the one defined in version 0.5 of the Black Arrow architecture. The instructions that have been excluded are related to cooperation between several tiles (excluded because such cooperation is not part of this study, as mentioned in Section 2.1) and to floating-point calculations (excluded because such calculations cannot be handled by the studied datapath configuration, as also mentioned in Section 2.1). For each instruction, Table 2.2 indicates what unit type(s) can execute the instruction, what data type(s) can be handled by the instruction, and what operation the instruction performs.

The data types handled by the Black Arrow version of Unison belong to three different groups. The first group consists of scalar types, namely 8-bit, 16-bit, and 32-bit signed and unsigned integers. The second group consists of machine vector types. A machine vector has a size of 16, 32, 64, or 128 bits and consists of 2, 4, 8, or 16 scalar values. The name of a scalar or machine vector data type consists of the number of elements (scalar values) included, an s or a u that indicates signed or unsigned integers, respectively, and the size of each of the elements. For example, 4s16 indicates a machine vector consisting of four 16-bit signed integers, and 1s32 indicates a 32-bit signed integer (a scalar). The scalar and machine vector data types are listed in Table 2.3. The third data type group consists of a 128-bit word type, indicated by w. The full name of a Black Arrow instruction, in the input to Unison, consists of one of the instruction names from Table 2.2 and one of the data types from Table 2.3, joined by a dot. For example, the iadd.1s32 instruction adds two signed 32-bit integers.

2.3 Bach and Kudzul-C

To be compiled by the Bach compiler, programs must be implemented in the programming language Kudzul-C, which is basically a subset of C [22]. The most important characteristics of Kudzul-C (in relation to standard C) are the following (a small illustrative sketch is given after the list):

• Each program must consist of only one function. The return type of the function must be void.

• Local variables (declared within the only function of the program) are by default allocated in the local data memory. If the declaration of a variable is prepended by the specifier register, then the variable is instead allocated in a register. Global variables are allocated in common memory.

instruction  unit type(s)          data type(s)    description
brz          BR                    1s32, 1u32, w   branch if zero
brnz         BR                    1s32, 1u32      branch if not zero
brlz         BR                    1s32, 1u32      branch if less than zero
brlez        BR                    1s32, 1u32      branch if less than or equal to zero
brgz         BR                    1s32, 1u32      branch if greater than zero
brgez        BR                    1s32, 1u32      branch if greater than or equal to zero
brgout       BR                    none            branch if read from common memory is not finished
cmGetD       MEM                   none            get data from common memory to local memory
ld           MEM                   all             load
st           MEM                   all             store
const        ALU                   all             construct new temporary
iadd         ALU                   all             addition
isub         ALU                   all             subtraction
imul         ALU                   all             multiplication
idiv         ALU                   all             division
iadds        ALU                   all             saturating addition
isubs        ALU                   all             saturating subtraction
imin         ALU                   all             minimum
imax         ALU                   all             maximum
imask        ALU                   all             mask
imaxr        ALU                   all             maximum of a vector
iminr        ALU                   all             minimum of a vector
iand         ALU                   all             bitwise and
ior          ALU                   all             bitwise or
ixor         ALU                   all             bitwise exclusive or
inot         ALU                   all             bitwise not
iless        ALU                   all             less than
ilessEq      ALU                   all             less than or equal to
igtr         ALU                   all             greater than
igtrEq       ALU                   all             greater than or equal to
iequal       ALU                   all             equal to
inotEqual    ALU                   all             not equal to
print        ALU                   all             print
printLn      ALU                   all             print, adding a newline
vprint       ALU                   all             print vector
vprintLn     ALU                   all             print vector, adding a newline
printSep     ALU                   none            print a separator
printEol     ALU                   none            print a newline
printCyc     ALU                   none            print the current value of the cycle counter
constVec     ALU                   8u16            construct a machine vector from scalars
blend        ALU, SHIFT            all             construct a vector from the elements of two others
icopy        ALU, SHIFT            w               copy a temporary to another
dist         DIST                  all             set each element in a vector to a scalar or to 0
shuffle      SHIFT                 all             rearrange vector elements
ishr         SHIFT                 all             shift right
ishl         SHIFT                 all             shift left
wreg         BR, MEM, ALU, SHIFT   w               register write
RET          none                  none            return

Table 2.2: Instruction set implemented by the Black Arrow version of Unison. The print instructions are not supposed to be implemented in hardware, but are useful when programs are run using the simulator. The RET instruction is included because Unison requires all functions to end with a return instruction.

                        total size (bits)
element type       8      16      32      64      128
8-bit signed       1s8    2s8     4s8     8s8     16s8
16-bit signed             1s16    2s16    4s16    8s16
32-bit signed                     1s32    2s32    4s32
8-bit unsigned     1u8    2u8     4u8     8u8     16u8
16-bit unsigned           1u16    2u16    4u16    8u16
32-bit unsigned                   1u32    2u32    4u32

Table 2.3: Scalar and machine vector data types handled by the Black Arrow version of Unison.

• One-armed and two-armed if statements can be used in Kudzul-C programs, but not switch statements, break statements, and return statements. Furthermore, while loops, do loops, and a restricted form of for loops can be used.

• The operators available in standard C can also be used in Kudzul-C, except modulus (%), logical and (&&), logical or (||), and update (+=, -=, ...).

• When the compiled program is run using the simulator that accompanies Bach, arguments to the only function of the program can be passed from the command line. Each command-line argument is either a scalar or the name of a file containing an array of scalars. Each scalar argument is passed to the function in a file register. Each array is read into common memory, and a pointer to the array is passed to the function in a file register.
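As an illustration of these restrictions, the following is a minimal sketch of what a Kudzul-C program might look like. The program is hypothetical (the exact dialect accepted by Bach is not specified here, and the function and variable names are invented), but it respects the rules above: a single function with return type void, register-allocated locals, a while loop, and no update operators.

    /* Hypothetical Kudzul-C sketch; Kudzul-C is basically a subset of C.
       The single function has return type void, and its arguments (a scalar
       and a pointer to an array in common memory) are passed in file
       registers when the program is run using the simulator. */
    void sum(int n, int *data)
    {
        register int acc = 0;    /* 'register' requests a file register;    */
        register int i = 0;      /* locals otherwise go to local data memory */

        while (i < n) {          /* while loops are allowed                  */
            acc = acc + data[i]; /* update operators such as += are not      */
            i = i + 1;
        }
    }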

3 Combinatorial Optimisation

In a constraint problem, each of a number of variables must be assigned a value according to some constraints. There are two types of constraint problems: constraint satisfaction problems and constrained optimisation problems. Formally, these types of problems can be described as follows (as in [4, pp. 8–10, 43–44] and [27]). A constraint satisfaction problem is a triple ⟨V, D, C⟩ where:

• V = ⟨v1, ..., vk⟩ is a finite sequence of decision variables (sometimes called just variables)

• D = ⟨D1, ..., Dk⟩ is a finite sequence of domains for the decision variables: Di is the set of possible values for vi

• C = {c1,...,cm} is a finite set of constraints on V

Each constraint cj denotes a subset of the Cartesian product D1 × ··· × Dk. In other words, each constraint defines a set of allowed combinations of values for the decision variables. A tuple of values d = ⟨d1, ..., dk⟩ satisfies a constraint cj ∈ C if d ∈ cj, and d is a solution to the constraint satisfaction problem ⟨V, D, C⟩ if it satisfies every constraint in C. Solving a constraint satisfaction problem means finding one or several (possibly all) solutions to the problem.

A constrained optimisation problem is a quadruple ⟨V, D, C, f⟩ where ⟨V, D, C⟩ is a constraint satisfaction problem and f : ⟨D1, ..., Dk⟩ → R is an objective function. Solving a constrained optimisation problem means finding a solution d such that f(d) is optimal (minimal or maximal, depending on the problem).

Often, the domains for the decision variables of a constraint satisfaction problem or a constrained optimisation problem are finite. Such problems are called combinatorial problems, and the field of study they belong to is called combinatorial optimisation.

Two technologies for combinatorial optimisation that are used by Unison, constraint programming (Section 3.1) and lazy clause generation (Section 3.2), will now be described.
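To make these definitions concrete, here is a small example (constructed for this presentation, not taken from any of the cited works):

\[
V = \langle v_1, v_2 \rangle, \quad
D = \langle \{1,2,3\}, \{1,2,3\} \rangle, \quad
C = \{ v_1 < v_2 \}, \quad
f(\langle d_1, d_2 \rangle) = d_1 + d_2.
\]

The constraint satisfaction problem ⟨V, D, C⟩ has the solutions ⟨1, 2⟩, ⟨1, 3⟩, and ⟨2, 3⟩. If f is to be minimised, the constrained optimisation problem ⟨V, D, C, f⟩ has the single optimal solution ⟨1, 2⟩, with f(⟨1, 2⟩) = 3.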

3.1 Constraint Programming

Constraint programming (see, for example, [4]) is a technology for constraint problem solving where the problems are declaratively modelled using variables and constraints on these variables (like in the formal definition of constraint problems) and then solved using algorithms for efficiently finding solutions to the problems. In constraint programming, a constraint problem is solved through a process that alternates inference and search. Each variable of the problem has a current domain,

initialised to its declared domain. Inference takes the form of constraint propagation, performed by algorithms called propagators. Every propagator is tied to a constraint and reduces the current domains of (some of) the decision variables by taking away some of the values that are disallowed by that constraint, given the current domains of the other decision variables. There might exist several propagators for a certain constraint, and each of these might enforce a different form of consistency.

Search in constraint programming involves two choices: how to construct the search tree and how to explore it. The construction is done on-the-fly during exploration, through the process of branching, and determines the structure of the search tree. The branching is performed by a brancher, which selects a variable (using a variable selection strategy) and one or several values from the current domain of the selected variable (using a value selection strategy). The brancher then uses the selected variable and value(s) to construct guesses (decisions), each of which has a propagator. These branch propagators are tried one by one. For example, there could be two branches, where the left branch (the one first tried) means that the selected variable is equal to the maximum value in its current domain, while the right branch means that the selected variable is not equal to that value. Often, the resulting search tree is explored in depth-first order. If a failure arises during the exploration (that is, if the current domain of some variable becomes empty), then the exploration backtracks to the nearest higher node in the search tree where there are one or several decisions that have not yet been tried, and the next decision is tried.
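Continuing the small example from the beginning of this chapter (again constructed for illustration), a propagator for the constraint v1 < v2 can reduce the current domains as follows:

\[
D(v_1) = \{1,2,3\},\; D(v_2) = \{1,2,3\}
\quad\longrightarrow\quad
D(v_1) = \{1,2\},\; D(v_2) = \{2,3\},
\]

since v1 = 3 would leave no possible value for v2, and v2 = 1 would leave no possible value for v1. A brancher could then try the decision v1 = 1 in the left branch and v1 ≠ 1 in the right branch, after which propagation runs again in each branch.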

3.2 Lazy Clause Generation

Lazy clause generation [36] is a technique that combines constraint programming and Boolean satisfiability solving (SAT). In SAT, the goal is to find an assignment of truth values (TRUE or FALSE) to the variables of a Boolean formula that satisfies the formula (makes it true), or to prove that no such assignment exists. Often, the formula is supposed to be in conjunctive normal form (CNF), which means that the formula is a conjunction of clauses, where a clause is a disjunction of literals. In turn, a literal is a Boolean variable or a negated such variable. For example, (x1 ∨ ¬x2 ∨ x3) ∧ (x2 ∨ x4) is a Boolean formula in CNF. A satisfying variable assignment for this formula is x1 = FALSE, x2 = FALSE, x3 = FALSE, x4 = TRUE.

Many SAT solvers employ conflict-driven clause learning (CDCL) [40], which builds upon the older Davis–Putnam–Logemann–Loveland (DPLL) algorithm [20]. Both CDCL and DPLL, like constraint programming solvers, find solutions by alternating search and inference. The basic inference principle employed by them is called Boolean constraint propagation (BCP), which is an iterated application of the rule that if only one unassigned literal remains in a clause (such a clause is called a unit clause) and all other literals of the clause have been assigned FALSE, then the remaining literal must be assigned TRUE.

While solving, CDCL solvers build an implication graph. This graph records which unit clause and which assignments to other variables triggered each assignment made by BCP. For each assignment, the graph also records at what level in the search tree it was made. Through search and inference, a CDCL solver gradually increases the number of assigned variables in a SAT instance. During this process, the solver might detect a conflict, that is, that the set of assignments that have been made so far cannot be part of a complete solution. When a conflict has been detected, the

solver uses the implication graph to identify the cause of the conflict, that is, which combination of previous assignments during search led to the conflict. The solver backtracks to the node in the search tree (possibly several levels away) where the most recent of these assignments was made, and it also adds a conflict-induced clause to the instance. Such a clause is constructed as the disjunction of the negations of the assignments that were found to be the cause of the conflict. (That is, the clause expresses that at least one of the assignments must be inverted for the conflict not to occur.) For example, if the cause of a conflict is x1 = FALSE, x2 = TRUE, then the added clause will be (x1 ∨ ¬x2). The addition of the conflict-induced clause guarantees that the solver will not retry the combination of assignments that led to the conflict. The ability to avoid previously discarded assignments (nogood learning) and to backtrack several levels in one step (backjumping) increases the performance of the CDCL solvers.

The idea behind lazy clause generation, as originally presented in [36], is that a combination of SAT solving and constraint programming could exhibit the advantages of both techniques (nogood learning and backjumping from SAT, high-level modelling and specialised propagators from constraint programming). The domains of the constraint programming decision variables are encoded using Boolean clauses, where each Boolean variable corresponds to a constraint on the domain of a variable. The constraint programming propagators are mapped into clauses that encode the changes to the domains of the decision variables that are made by the propagators. The clauses associated with propagators are added to the SAT problem lazily, that is, not from the beginning but when the domains of the constraint programming variables exhibit certain characteristics. The domain characteristics triggering the addition of clauses are on a very concrete level, as are the changes encoded by the clauses. For example, a clause can encode that given certain bounds on the domains of some variables, the domain of some other variable should be reduced in a certain way. A propagator is often associated with a large number of such clauses. Adding clauses lazily instead of from the beginning diminishes the risk of exceeding the number of variables and clauses that can be handled by the SAT solver.

In the original implementation of lazy clause generation, described in [36], a limited constraint programming propagation engine is added to a SAT solver. In the solving process, the SAT solver is run until it reaches a fixpoint of Boolean constraint propagation (that is, a state where repeated application would not cause any changes). For each Boolean variable that has been assigned a value, a corresponding change is made to the domain of the constraint programming variable to which it is associated. The propagators that are affected by these changes are executed, clauses corresponding to the domain changes made by the propagators are added to the SAT model, and the process is repeated. When a conflict is detected, the process is reversed: for each Boolean variable that is unassigned (in the backtracking process), the corresponding change to the domain of the associated constraint programming variable is undone. After the original implementation of lazy clause generation was presented, the concept has also been implemented with the roles of constraint programming propagation and SAT solving reversed [25].
That is, instead of using a constraint programming propagator as a generator of clauses for a SAT solver, a SAT solver is used as a propagator in a constraint programming solver.
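As a small illustration of Boolean constraint propagation on the example formula above (a trace constructed for this presentation), suppose that search assigns x1 = FALSE and x3 = FALSE. The first clause then becomes a unit clause whose only unassigned literal is ¬x2, so BCP assigns x2 = FALSE; this in turn makes the second clause a unit clause:

\[
(x_1 \vee \neg x_2 \vee x_3) \wedge (x_2 \vee x_4)
\;\Rightarrow\;
x_2 = \mathrm{FALSE}
\;\Rightarrow\;
x_4 = \mathrm{TRUE},
\]

and a satisfying assignment has been found without any further search.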

4 Unison

From a representation of a function (where a compiler has already performed instruction selection, that is, selected the processor instructions that will implement the compiled function) and a target description providing information about a certain architecture (see Chapter 5), Unison produces parameters for a constraint model of the register allocation and instruction scheduling problems. This model is then solved using a generic solver. This process differs from the one in traditional compilers, where heuristic methods are used and the problems of register allocation and instruction scheduling are solved separately.

The process by which Unison performs register allocation and instruction scheduling will now be described step by step according to [10], [11], [13], and [15]. This process begins with code import (Section 4.1) and continues with linearisation of the code (Section 4.2), extension with optional copy operations (Section 4.3), augmentation with alternative temporaries (Section 4.4), and production of parameters for a constraint model (Section 4.5), before ending with the solving of the model (Section 4.6).

4.1 Code Import

Unison takes as input a function in the machine intermediate representation (MIR) format of LLVM. The code must consist of maximal basic blocks of instructions. In a basic block (or block for short), only the beginning (entry) can be the destination of branch instructions, and the end is the only place where branch instructions can occur. The maximality requirement means that out of two consecutive blocks, either the first block must end with a branch instruction, or the second one must be the destination of a branch instruction. In Figure 4.1, the MIR format is exemplified by the Black Arrow program (function) fibonacci. (As its name suggests, fibonacci calculates the nth Fibonacci number, where n is given as an argument to the program.) In the MIR format, basic blocks are indicated with bb.i and each name of a temporary begins with a %. In the following, fibonacci will be used as an example to demonstrate the various code transformations performed by Unison. In order to save space, only the first two basic blocks will be shown after some of the transformations.

Another requirement on the input is that the code must be in static single assignment (SSA) form, which means that temporaries are only defined (assigned to) once [2, 38]. Therefore, if the function originally contains a temporary that is assigned to several times, then this temporary must be split into several temporaries. Also, a so-called φ (phi) function must be inserted at the beginning of every basic block in which a use of some original temporary could refer to more than one definition of that temporary. Such situations occur when the flow of control can reach the block along several paths. The φ function selects the correct definition depending on where the

21 --- | ; ModuleID = (...) ... --- name: fibonacci_no-sw-pipe body: | bb.0: liveins: %r2_0, %r4_0, %r5_0 %0 = COPY %r5_0 %1 = const.1u32 1 %2 = const.1u32 1 %3 = const.1u32 0 %4 = const.1u32 1 %5 = const.1u32 1 %6 = const.1s32 0

bb.1: %7 = PHI %16, %bb.2, %6, %bb.0 %8 = PHI %15, %bb.2, %5, %bb.0 %9 = PHI %14, %bb.2, %4, %bb.0 %11 = PHI %13, %bb.2, %3, %bb.0 %12 = iless.1s32 %7, %0 brz.1s32 %12, %bb.3

bb.2: %13 = icopy.w %9 %14 = icopy.w %8 %15 = iadd.1s32 %13, %14 %16 = iadd.1s32 %7, 1 brnz.1u32 1, %bb.1

bb.3: exit printLn.1s32 %11 RET

...

Figure 4.1: The fibonacci program in the LLVM MIR format.

flow of control came from. In the MIR code, φ functions are indicated by PHI.

As a first step in the register allocation and instruction scheduling process performed by Unison, the MIR function is transformed into Unison's own intermediate representation (IR) format. Figure 4.2 shows the first two basic blocks of the fibonacci program in this format. In this representation, each block consists of a number of operations (indexed with o_i). Each operation can be implemented by one or several instructions. The reason that some of the original instructions are replaced by a set of instructions will be explained in Section 5.2. Another notable difference from the MIR format is that each name of a temporary begins with a t.

4.2 Linearisation

When an SSA function has been imported, it is transformed to linear static single assignment (LSSA) form. The difference between LSSA and ordinary SSA is that a temporary in LSSA is never used outside the block where it is defined. Each SSA temporary that is live (might be used by a future instruction in the function) in more than one block is therefore split into two or more temporaries in the LSSA version. Two LSSA temporaries from different blocks that are based on the same SSA temporary are related by a congruence. Figure 4.3 shows how every temporary that is live is indicated in the lists at the in and out operations in the beginning and end of each block, and how the congruence relations between LSSA temporaries from different blocks are given in the list of adjacent temporaries.

22 b0 (entry, freq: 1): o0: [t0:r2_0,t1:r4_0,t2:r5_0,t3:r1_15] <- (in) [] o1: [] <- (kill) [t0,t1] o2: [t4] <- {const.1u32_i_ALU_2_main, const.1u32_i_ALU_4_main, const.1u32_i_ALU_7_main, const.1u32_i_ALU_9_main, const.1u32_i_ALU_12_main} [1] o3: [] <- (kill) [t4] o4: [t5] <- {const.1u32_i_ALU_2_main, [...], const.1u32_i_ALU_12_main} [1] o5: [] <- (kill) [t5] o6: [t6] <- {const.1u32_i_ALU_2_main, [...], const.1u32_i_ALU_12_main} [0] o7: [t7] <- {const.1u32_i_ALU_2_main, [...], const.1u32_i_ALU_12_main} [1] o8: [t8] <- {const.1u32_i_ALU_2_main, [...], const.1u32_i_ALU_12_main} [1] o9: [t9] <- {const.1s32_i_ALU_2_main, [...], const.1s32_i_ALU_12_main} [0] o10: [] <- (out) [] b1 (freq: 10): o11: [] <- (in) [] o12: [t10] <- (phi) [t18,b2,t9,b0] o13: [t11] <- (phi) [t17,b2,t8,b0] o14: [t12] <- (phi) [t16,b2,t7,b0] o15: [t13] <- (phi) [t15,b2,t6,b0] o16: [t14] <- {iless.1s32_reg_reg_ALU_2_main, iless.1s32_reg_reg_ALU_4_main, iless.1s32_reg_reg_ALU_7_main, iless.1s32_reg_reg_ALU_9_main, iless.1s32_reg_reg_ALU_12_main} [t10,t2] o17: [] <- brz.1s32_reg_label_BR_0_main [t14,b3] o18: [] <- (out) []

Figure 4.2: The first two basic blocks of the fibonacci program of Figure 4.1 in Unison’s IR format.

b0 (entry, freq: 1): o0: [t0:r2_0,t1:r4_0,t2:r5_0,t3:r1_15] <- (in) [] o1: [] <- (kill) [t0,t1] o2: [t4] <- {const.1u32_i_ALU_2_main, [...], const.1u32_i_ALU_12_main} [1] o3: [] <- (kill) [t4] o4: [t5] <- {const.1u32_i_ALU_2_main, [...], const.1u32_i_ALU_12_main} [1] o5: [] <- (kill) [t5] o6: [t6] <- {const.1u32_i_ALU_2_main, [...], const.1u32_i_ALU_12_main} [0] o7: [t7] <- {const.1u32_i_ALU_2_main, [...], const.1u32_i_ALU_12_main} [1] o8: [t8] <- {const.1u32_i_ALU_2_main, [...], const.1u32_i_ALU_12_main} [1] o9: [t9] <- {const.1s32_i_ALU_2_main, [...], const.1s32_i_ALU_12_main} [0] o10: [] <- (out) [t2,t3,t6,t7,t8,t9] b1 (freq: 10): o11: [t10,t11,t12,t13,t14,t15] <- (in) [] o12: [t16] <- {iless.1s32_reg_reg_ALU_2_main, [...], iless.1s32_reg_reg_ALU_12_main} [t10,t14] o13: [] <- brz.1s32_reg_label_BR_0_main [t16,b3] o14: [] <- (out) [t10,t11,t12,t13,t14,t15] [...] adjacent: t2 -> t14, t3 -> t15, t6 -> t13, t7 -> t12, t8 -> t11, t9 -> t10

Figure 4.3: The first two basic blocks of the fibonacci program of Figure 4.1 in LSSA form.

4.3 Extension with Optional Copy Operations

Despite its name, register allocation means assigning each (live) temporary to a certain register or to memory at each point in the program. The reason that it might be necessary to spill (store) a temporary to memory, instead of keeping it in a register during its whole live range (the period during which it is live), is that the number of live temporaries might exceed the number of available registers. Even if a temporary

never has to be spilled, it might be necessary or desirable to copy (move) it between registers during its live range. For example, some registers might be unavailable to some execution units (as is the case for Black Arrow). Unison's way of handling the need for spilling of temporaries and copying of temporaries between registers is to extend the LSSA function with optional copy operations, similarly to a method described in [3]. Such an operation represents a possibility of copying the value of a temporary to a new temporary. Each optional copy operation is associated with a number of alternative instructions and will be implemented by one of these. An alternative instruction may be a copy instruction (for register-to-register copying), a store instruction (for spilling), a load instruction (for reloading spilled temporaries), or the null instruction (indicating that the operation is deactivated and thus that no copying is performed). The new temporaries that are the result of the optional copy operations can be assigned to registers or to memory locations. Both memory locations and actual registers are modelled as registers by Unison. Spilling and copying are thus handled in a unified way.

Where to insert the optional copy operations depends on the architecture. In a load/store architecture (like Black Arrow), instructions (apart from load and store instructions) cannot use operands that reside in memory, but only operands in registers and immediates (values that are encoded directly in instructions). For such an architecture, the basic principle is to insert a copy operation in two types of locations. After every operation (in the LSSA function) that defines a temporary, an operation associated with store and copy instructions (for that temporary) is inserted. Before every operation that uses a temporary, an operation associated with load and copy instructions is inserted. This principle enables spilling and reloading of temporaries, as well as copying of temporaries between registers. Figure 4.4 shows the first two basic blocks of the fibonacci program after extension with optional copy operations. How the principle for insertion of optional copy operations is applied in the Black Arrow case will be described in detail in Section 5.3.

4.4 Augmentation with Alternative Temporaries

When a function is extended with optional copy operations, many possibilities for spilling and copying of temporaries are introduced. However, many of these possibilities will not be used in the final version of the function, where instruction scheduling and register allocation have been performed. In fact, as long as no negative effects arise, it is desirable to avoid as much spilling and copying as possible, as this saves instructions and execution time. In other words, there is a need for coalescing (assignment to the same register) of temporaries that are related by copy operations and thus hold the same value, which makes it possible to deactivate the corresponding copy operations, and for spill code optimisation (removal of unnecessary store and load instructions).

Practically, both coalescing and spill code optimisation entail that some temporaries are substituted by other temporaries, thus making the (copy, load, or store) operations defining the substituted temporaries unnecessary. For this to be possible, it cannot be predefined which temporaries are accessed by the different operations, as is generally the case in combinatorial models of register allocation developed before Unison. Unison avoids such predefinitions by introducing alternative temporaries. The idea is to let operations use and define operands instead of temporaries, and to

24 b0 (entry, freq: 1): o0: [t0:r2_0,t1:r4_0,t2:r5_0,t3:r1_15] <- (in) [] o1: [t4] <- {-, CP_reg_ALU_2_main, CP_reg_ALU_4_main, CP_reg_ALU_7_main, CP_reg_ALU_9_main, CP_reg_ALU_12_main, CP_reg_SHIFT_5_main, CP_reg_SHIFT_10_main, WRITE_resglob_BR_0_write, WRITE_resglob_MEM_1_write, WRITE_resglob_ALU_2_write, WRITE_resglob_ALU_4_write, WRITE_resglob_SHIFT_5_write, WRITE_resglob_ALU_7_write, WRITE_resglob_ALU_9_write, WRITE_resglob_SHIFT_10_write, WRITE_resglob_ALU_12_write, GLOBAL, STORE} [t2] o2: [] <- (kill) [t0,t1] o3: [t5] <- {const.1u32_i_ALU_2_main, [...], const.1u32_i_ALU_12_main} [1] o4: [] <- (kill) [t5] o5: [t6] <- {const.1u32_i_ALU_2_main, [...], const.1u32_i_ALU_12_main} [1] o6: [] <- (kill) [t6] o7: [t7] <- {const.1u32_i_ALU_2_main, [...], const.1u32_i_ALU_12_main} [0] o8: [t8] <- {-, CP_reg_ALU_2_main, [...], STORE} [t7] o9: [t9] <- {const.1u32_i_ALU_2_main, [...], const.1u32_i_ALU_12_main} [1] o10: [t10] <- {-, CP_reg_ALU_2_main, [...], STORE} [t9] o11: [t11] <- {const.1u32_i_ALU_2_main, [...], const.1u32_i_ALU_12_main} [1] o12: [t12] <- {-, CP_reg_ALU_2_main, [...], STORE} [t11] o13: [t13] <- {const.1s32_i_ALU_2_main, [...], const.1s32_i_ALU_12_main} [0] o14: [t14] <- {-, CP_reg_ALU_2_main, [...], STORE} [t13] o15: [t15] <- {-, CP_reg_ALU_2_main, CP_reg_ALU_4_main, CP_reg_ALU_7_main, CP_reg_ALU_9_main, CP_reg_ALU_12_main, CP_reg_SHIFT_5_main, CP_reg_SHIFT_10_main, WRITE_resglob_BR_0_write, WRITE_resglob_MEM_1_write, WRITE_resglob_ALU_2_write, WRITE_resglob_ALU_4_write, WRITE_resglob_SHIFT_5_write, WRITE_resglob_ALU_7_write, WRITE_resglob_ALU_9_write, WRITE_resglob_SHIFT_10_write, WRITE_resglob_ALU_12_write, GLOBAL, LOAD} [t4] o16: [t16] <- {-, CP_reg_ALU_2_main, [...], LOAD} [t8] o17: [t17] <- {-, CP_reg_ALU_2_main, [...], LOAD} [t10] o18: [t18] <- {-, CP_reg_ALU_2_main, [...], LOAD} [t12] o19: [t19] <- {-, CP_reg_ALU_2_main, [...], LOAD} [t14] o20: [] <- (out) [t15,t3,t16,t17,t18,t19] b1 (freq: 10): o21: [t20,t21,t22,t23,t24,t25] <- (in) [] o22: [t26] <- {-, CP_reg_ALU_2_main, [...], STORE} [t20] o23: [t27] <- {-, CP_reg_ALU_2_main, [...], STORE} [t21] o24: [t28] <- {-, CP_reg_ALU_2_main, [...], STORE} [t22] o25: [t29] <- {-, CP_reg_ALU_2_main, [...], STORE} [t23] o26: [t30] <- {-, CP_reg_ALU_2_main, [...], STORE} [t24] o27: [t31] <- {-, CP_reg_ALU_2_main, [...], LOAD} [t26] o28: [t32] <- {-, CP_reg_ALU_2_main, [...], LOAD} [t30] o29: [t33] <- {iless.1s32_reg_reg_ALU_2_main, [...], iless.1s32_reg_reg_ALU_12_main} [t31,t32] o30: [t34] <- {-, CP_reg_ALU_2_main, [...], STORE} [t33] o31: [t35] <- {-, CP_reg_ALU_2_main, [...], LOAD} [t34] o32: [t36] <- {-, CP_reg_ALU_2_main, [...], LOAD} [t26] o33: [t37] <- {-, CP_reg_ALU_2_main, [...], LOAD} [t27] o34: [t38] <- {-, CP_reg_ALU_2_main, [...], LOAD} [t28] o35: [t39] <- {-, CP_reg_ALU_2_main, [...], LOAD} [t29] o36: [t40] <- {-, CP_reg_ALU_2_main, [...], LOAD} [t30] o37: [] <- brz.1s32_reg_label_BR_0_main [t35,b3] o38: [] <- (out) [t36,t37,t38,t39,t40,t25] [...] adjacent: t3 -> t25, t15 -> t24, t16 -> t23, t17 -> t22, t18 -> t21, t19 -> t20

Figure 4.4: The first two basic blocks of the fibonacci program of Figure 4.1 extended with optional copy operations.

connect each operand to one or several alternative temporaries that (because of copy operations) hold the same value. When the constraint model is solved, one of the alternative temporaries is chosen for each operand, and each temporary is assigned to a register or to memory. If the same temporary is chosen for two or more operands, then this process leads to coalescing of (the original) temporaries or to spill code optimisation. An operand is either a use operand or a definition operand, depending on whether it is used or defined by its operation.

The first two basic blocks of the fibonacci program after augmentation with alternative temporaries can be found in Figure 4.5. Each occurrence of a temporary has been replaced by a unique operand (that is, the operand does not occur anywhere else in the function), and one or several temporaries are connected to each operand. For example, operation o29 uses operand p60, which is connected to temporaries t20, t26, t31, and t36. For an optional copy operation, such as o1, one of the temporaries connected to the definition operand is a null temporary, corresponding to the case that the null instruction is chosen to implement the operation (something that means that the operation is inactive and thus that no instruction corresponding to the operation is present in the output from Unison). Note that some of the alternative temporaries that are connected to an operand may not be defined until later in the code (as is the case for t36 and t40 in operation o29).

4.5 Constraint Model

From a target description and the function representation augmented with alternative temporaries, Unison produces parameters for a constraint model of the register allocation and instruction scheduling problems. Two versions of the model (both of which are part of Unison) will be discussed and evaluated here: one for the constraint programming solver Gecode [39] (implemented in C++) and one implemented in the constraint modelling language MiniZinc [35], which gives access to a large number of solvers that use different solving technologies. The two model versions have the same parameters, variables, main constraints, and objective function, but are associated with different solving strategies. Next, the main common characteristics of the two model versions will be described according to [11] and [15]. Then, in Section 4.6, the solving strategies will be presented.

26 b0 (entry, freq: 1): o0: [p0{t0}:r2_0,p1{t1}:r4_0,p2{t2}:r5_0,p3{t3}:r1_15] <- (in) [] o1: [p5{-, t4}] <- {-, CP_reg_ALU_2_main, [...], STORE} [p4{-, t2}] (reads: [control]) o2: [] <- (kill) [p6{t0},p7{t1}] o3: [p8{t5}] <- {const.1u32_i_ALU_2_main, [...], const.1u32_i_ALU_12_main} [1] (reads: [control]) o4: [] <- (kill) [p9{t5}] o5: [p10{t6}] <- {const.1u32_i_ALU_2_main, [...], const.1u32_i_ALU_12_main} [1] (reads: [control]) o6: [] <- (kill) [p11{t6}] o7: [p12{t7}] <- {const.1u32_i_ALU_2_main, [...], const.1u32_i_ALU_12_main} [0] (reads: [control]) o8: [p14{-, t8}] <- {-, CP_reg_ALU_2_main, [...], STORE} [p13{-, t7}] (reads: [control]) o9: [p15{t9}] <- {const.1u32_i_ALU_2_main, [...], const.1u32_i_ALU_12_main} [1] (reads: [control]) o10: [p17{-, t10}] <- {-, CP_reg_ALU_2_main, [...], STORE} [p16{-, t9}] (reads: [control]) o11: [p18{t11}] <- {const.1u32_i_ALU_2_main, [...], const.1u32_i_ALU_12_main} [1] (reads: [control]) o12: [p20{-, t12}] <- {-, CP_reg_ALU_2_main, [...], STORE} [p19{-, t11}] (reads: [control]) o13: [p21{t13}] <- {const.1s32_i_ALU_2_main, [...], const.1s32_i_ALU_12_main} [0] (reads: [control]) o14: [p23{-, t14}] <- {-, CP_reg_ALU_2_main, [...], STORE} [p22{-, t13}] (reads: [control]) o15: [p25{-, t15}] <- {-, CP_reg_ALU_2_main, [...], LOAD} [p24{-, t2, t4}] (reads: [control]) o16: [p27{-, t16}] <- {-, CP_reg_ALU_2_main, [...], LOAD} [p26{-, t7, t8}] (reads: [control]) o17: [p29{-, t17}] <- {-, CP_reg_ALU_2_main, [...], LOAD} [p28{-, t9, t10}] (reads: [control]) o18: [p31{-, t18}] <- {-, CP_reg_ALU_2_main, [...], LOAD} [p30{-, t11, t12}] (reads: [control]) o19: [p33{-, t19}] <- {-, CP_reg_ALU_2_main, [...], LOAD} [p32{-, t13, t14}] (reads: [control]) o20: [] <- (out) [p34{t2, t4, t15},p35{t3},p36{t7, t8, t16},p37{t9, t10, t17}, p38{t11, t12, t18},p39{t13, t14, t19}] b1 (freq: 10): o21: [p40{t20},p41{t21},p42{t22},p43{t23},p44{t24},p45{t25}] <- (in) [] o22: [p47{-, t26}] <- {-, CP_reg_ALU_2_main, [...], STORE} [p46{-, t20}] (reads: [control]) o23: [p49{-, t27}] <- {-, CP_reg_ALU_2_main, [...], STORE} [p48{-, t21}] (reads: [control]) o24: [p51{-, t28}] <- {-, CP_reg_ALU_2_main, [...], STORE} [p50{-, t22}] (reads: [control]) o25: [p53{-, t29}] <- {-, CP_reg_ALU_2_main, [...], STORE} [p52{-, t23}] (reads: [control]) o26: [p55{-, t30}] <- {-, CP_reg_ALU_2_main, [...], STORE} [p54{-, t24}] (reads: [control]) o27: [p57{-, t31}] <- {-, CP_reg_ALU_2_main, [...], LOAD} [p56{-, t20, t26}] (reads: [control]) o28: [p59{-, t32}] <- {-, CP_reg_ALU_2_main, [...], LOAD} [p58{-, t24, t30}] (reads: [control]) o29: [p62{t33}] <- {iless.1s32_reg_reg_ALU_2_main, [...], iless.1s32_reg_reg_ALU_12_main} [p60{t20, t26, t31, t36}, p61{t24, t30, t32, t40}] (reads: [control]) o30: [p64{-, t34}] <- {-, CP_reg_ALU_2_main, [...], STORE} [p63{-, t33}] (reads: [control]) o31: [p66{-, t35}] <- {-, CP_reg_ALU_2_main, [...], LOAD} [p65{-, t33, t34}] (reads: [control]) o32: [p68{-, t36}] <- {-, CP_reg_ALU_2_main, [...], LOAD} [p67{-, t20, t26}] (reads: [control]) o33: [p70{-, t37}] <- {-, CP_reg_ALU_2_main, [...], LOAD} [p69{-, t21, t27}] (reads: [control]) o34: [p72{-, t38}] <- {-, CP_reg_ALU_2_main, [...], LOAD} [p71{-, t22, t28}] (reads: [control]) o35: [p74{-, t39}] <- {-, CP_reg_ALU_2_main, [...], LOAD} [p73{-, t23, t29}] (reads: [control]) o36: [p76{-, t40}] <- {-, CP_reg_ALU_2_main, [...], LOAD} [p75{-, t24, t30}] (reads: [control]) o37: [] <- brz.1s32_reg_label_BR_0_main [p77{t33, t34, t35},b3] (writes: [control,pc]) o38: [] <- (out) [p78{t20, t26, t31, t36},p79{t21, t27, t37},p80{t22, t28, t38}, 
p81{t23, t29, t39},p82{t24, t30, t32, t40},p83{t25}] [...] adjacent: p34 -> p44, p35 -> p45, p36 -> p43, p37 -> p42, p38 -> p41, p39 -> p40

Figure 4.5: The first two basic blocks of the fibonacci program of Figure 4.1 augmented with alternative temporaries.

Parameters

The parameters that describe a program and a processor are as follows:

B             the set of blocks
O             the set of operations
P             the set of operands
T             the set of temporaries
operands(o)   the operands of operation o
use(p)        true if p is a use operand
definer(t)    the operation defining (the definition operand connected to) temporary t
users(t)      the operations using (a use operand connected to) temporary t
T(b)          the temporaries in block b
O(b)          the operations in block b
class(i, p)   the register class (see Section 5.7) of operand p when its operation is implemented by instruction i
width(t)      the number of register atoms (smallest parts of a register that can be referenced by an operation) occupied by temporary t (always 1 in the Black Arrow case)
lat(i)        the latency with which instruction i defines its resulting temporaries
R             the set of processor resources
con(i, r)     the number of units of resource r that are consumed by instruction i
dur(i, r)     the number of cycles during which resource r is consumed by instruction i (the duration)
cap(r)        the capacity of resource r (the total number of units available)
freq(b)       the estimated execution frequency of block b

Variables

The decision variables of the models are as follows:

a_o    true if operation o is active
i_o    the instruction that implements operation o
ℓ_t    true if temporary t is live
ℓs_t   the start cycle of the live range of temporary t
ℓe_t   the end cycle of the live range of temporary t
r_t    the register to which temporary t is assigned
y_p    the temporary that is connected to operand p
c_o    the issue cycle of operation o, counted from the beginning of its block

Register Allocation Constraints

• A temporary t is live if and only if it is connected to a use operand p:

ℓ_t ⇐⇒ ∃p ∈ P : (use(p) ∧ y_p = t)   ∀t ∈ T

• The definer of temporary t is active if and only if t is live:

a_definer(t) ⇐⇒ ℓ_t   ∀t ∈ T

• Each operand of an active operation is connected to a temporary that is not the null temporary (⊥):

a_o ⇐⇒ y_p ≠ ⊥   ∀o ∈ O, ∀p ∈ operands(o)

• Each active operation is implemented by an instruction that is not the null instruction (⊥):

a_o ⇐⇒ i_o ≠ ⊥   ∀o ∈ O

• The temporary that is connected to operand p is assigned to a register in the register class determined by the instruction that implements the operation of p:

r_{y_p} ∈ class(i_o, p)   ∀o ∈ O, ∀p ∈ operands(o)

• Temporaries are assigned to registers in such a way that they do not interfere. Each live temporary corresponds to a rectangle in a two-dimensional area where one dimension represents the register (atoms) and the other dimension represents time in clock cycles. Using the so-called global constraint disjoint2, these rectangles are required not to overlap (a small checker illustrating this condition is sketched after this list of constraints):

disjoint2({⟨r_t, r_t + width(t) · ℓ_t, ℓs_t, ℓe_t⟩ : t ∈ T(b)})   ∀b ∈ B

• If an operand p is preassigned to a certain register s (written p ▷ s), then so is the temporary connected to p:

r_{y_p} = s   ∀p ∈ P : p ▷ s

• If two operands are congruent, then the temporaries connected to them are assigned to the same register:

r_{y_p} = r_{y_q}   ∀p, q ∈ P : p ≡ q
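To make the geometric reading of disjoint2 concrete, the following minimal Python sketch (illustrative only, not Unison code; all names are invented here) checks the non-overlap condition for a set of candidate live ranges:

    # Illustration: a live temporary t occupies the rectangle
    # [r_t, r_t + width(t)) x [ls_t, le_t) in register space x time.
    # disjoint2 holds for a block if no two such rectangles overlap.

    def rectangles_overlap(a, b):
        """a, b are (reg_lo, reg_hi, start, end) half-open rectangles."""
        regs_overlap = a[0] < b[1] and b[0] < a[1]
        time_overlap = a[2] < b[3] and b[2] < a[3]
        return regs_overlap and time_overlap

    def check_disjoint2(temps):
        """temps: list of dicts with keys reg, width, live, start, end."""
        rects = [(t["reg"], t["reg"] + t["width"] * t["live"], t["start"], t["end"])
                 for t in temps]
        return all(not rectangles_overlap(a, b)
                   for i, a in enumerate(rects) for b in rects[i + 1:])

    # Two live one-atom temporaries sharing a register must have disjoint live ranges:
    assert check_disjoint2([
        {"reg": 0, "width": 1, "live": 1, "start": 0, "end": 2},
        {"reg": 0, "width": 1, "live": 1, "start": 2, "end": 5},
    ])

Note how the factor width(t) · ℓ_t gives a dead temporary a zero-width rectangle, so it can never conflict with anything.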

Instruction Scheduling Constraints

• If temporary t is live, then its live range starts at the issue cycle of its definer:

ℓ_t ⇒ ℓs_t = c_definer(t)   ∀t ∈ T

• If temporary t is live, then its live range ends with the issue cycle of its last user:

ℓ_t ⇒ ℓe_t = max_{o ∈ users(t)} c_o   ∀t ∈ T

• If operation o is active and uses temporary yp, then it can at the earliest be issued at the issue cycle plus the latency of the definer of yp:

a_o ⇒ c_o ≥ c_definer(y_p) + lat(i_definer(y_p))   ∀o ∈ O, ∀p ∈ operands(o) : use(p)

• Using the global constraint cumulative, it is required that no processor resource is ever over-utilised, that is, that the capacity of each resource is never exceeded (see the sketch after this list):

cumulative({⟨c_o, con(i_o, r), dur(i_o, r)⟩ : o ∈ O(b)}, cap(r))   ∀b ∈ B, ∀r ∈ R
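Similarly, the cumulative condition for one block and one resource can be checked by summing consumption cycle by cycle, as in this illustrative Python sketch (again not Unison code):

    # cumulative holds for a block and a resource r if, in every cycle, the
    # summed consumption of the operations running in that cycle is at most cap(r).

    def check_cumulative(tasks, capacity):
        """tasks: list of (issue_cycle, consumption, duration) triples."""
        if not tasks:
            return True
        horizon = max(c + d for c, _, d in tasks)
        for cycle in range(horizon):
            used = sum(con for c, con, d in tasks if c <= cycle < c + d)
            if used > capacity:
                return False
        return True

    # Each Black Arrow subunit has capacity 1 and all durations are 1, so two
    # instructions on the same subunit must be issued in different cycles:
    assert check_cumulative([(0, 1, 1), (1, 1, 1)], capacity=1)
    assert not check_cumulative([(0, 1, 1), (0, 1, 1)], capacity=1)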

Optimisation Criteria

The model uses minimisation of the following generic objective function to capture different optimisation criteria:

∑_{b ∈ B} weight(b) × cost(b)

Currently, two different optimisation criteria are available: minimal (estimated) execution time and minimal code size. When the optimisation criterion is minimal execution time, weight(b) is set to freq(b), while cost(b) is set to max_{o ∈ O(b) : a_o} c_o. Unison can use per-block execution frequencies given in the input, or estimate the frequencies itself. The estimation is based on the assumption that each loop is executed ten times [9]; a doubly nested loop is thus assumed to execute 100 times. When the optimisation criterion is minimal code size, weight(b) is set to 1, while cost(b) is set to ∑_{o ∈ O(b)} con(i_o, bw), where bw is a bundle-width resource, which represents the bits used to encode the instructions of a bundle (a group of instructions scheduled to be executed in the same cycle) in a binary representation, as discussed in Section 5.5.
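As an illustration of the generic objective, the two criteria differ only in how weight(b) and cost(b) are instantiated; the following Python sketch assumes a simplified per-block summary (the field names are invented):

    def objective(blocks, criterion):
        """blocks: list of dicts with keys freq, last_issue_cycle, bundle_width_units."""
        total = 0
        for b in blocks:
            if criterion == "speed":       # minimal estimated execution time
                weight, cost = b["freq"], b["last_issue_cycle"]
            else:                          # minimal code size
                weight, cost = 1, b["bundle_width_units"]
            total += weight * cost
        return total

    # A block inside a loop (estimated freq 10) dominates the speed objective:
    blocks = [{"freq": 1, "last_issue_cycle": 3, "bundle_width_units": 6},
              {"freq": 10, "last_issue_cycle": 4, "bundle_width_units": 8}]
    print(objective(blocks, "speed"))  # 1*3 + 10*4 = 43
    print(objective(blocks, "size"))   # 6 + 8 = 14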

4.6 Solving

As mentioned in Section 4.5, the two versions of the constraint model are associated with different solving strategies. These strategies will now be described according to [13]. In both versions of the model, instruction scheduling is local (deals with each basic block separately), while register allocation is global (takes the whole function into account). However, since the modelled functions are always in LSSA form (see Section 4.2), the basic blocks are only linked by the constraints assigning (temporaries that are connected to) congruent operands to the same register. Therefore, if all operands involved in congruence relations (global operands) are assigned to registers, then the remaining parts of the register allocation problem can be solved locally (independently for each block).

This observation is the basis of the main solving strategy used for the Gecode version of the model. In this decomposition-based strategy, the solver first finds a valid register assignment for the global operands (the master problem). Then, for each basic block b, the solver finds a solution that satisfies all constraints except the congruence constraints and that minimises cost(b) (the subproblem of the block). The obtained solutions (to the master problem and the subproblems) are combined into a solution s of the whole problem, and the corresponding value of the objective function is calculated. Then, constraints are added that require the value of the objective function to be less than that for s and that forbid the just-found solution to the master problem, before the solving process is repeated. This continues until an optimal solution to the whole problem is found or the solver times out. The solver then returns the last solution to the whole problem (if any such solution has been found), whether it is optimal or not.

When no solution to the whole problem can be found using the decomposition-based strategy, a monolithic strategy for the Gecode version of the model is tried. This strategy considers the whole problem at once instead of decomposing it into a master problem and several subproblems. The idea behind this is that the solver might repeatedly find register assignments for the global operands that cannot be part of a solution to the whole problem, which means that it might be better to

consider the whole problem at once, at least if the problem is not too large [9]. The monolithic strategy is also the only solving strategy used for the MiniZinc version of the model.

Within both the decomposition-based and the monolithic solving strategies, different variable and value selection strategies are possible. In the decomposition-based case, each subproblem is solved using five combinations of variable and value selection strategies in parallel. Some combinations have a variable selection strategy where operations are handled one by one (so that several decisions for a certain operation, such as assignment of instructions and temporaries, are taken in immediate connection to each other), while others assign values to all variables of a certain type, such as the instruction variables, for all operations before moving on to the next variable type. Similarly, different value selection strategies might mean that operations are scheduled as early as possible, or that an operation that defines a certain temporary is scheduled as close as possible to the operations that use the same temporary. Each combination of variable and value selection strategies works on its own copy of the problem, but takes advantage of the other combinations by adding constraints that are based on the cost of solutions that these have found.

As regards the monolithic solving strategy for the MiniZinc version of the model, Unison lets the user choose between two combinations of variable and value selection strategies, bottom-up and top-down. The main difference is that in the bottom-up case, the solver first branches on the objective function variable, starting with the smallest value in its domain, before branching on other variables, while in the top-down case, no branching on the objective function variable is done. Since the goal is to minimise the objective function (see Section 4.5), this means that in the bottom-up case, the solution that is found first is always an optimal solution.
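The decomposition-based loop can be pictured with a self-contained toy like the following; real Unison solves the master problem and the subproblems as Gecode constraint problems and forbids revisited master solutions with added constraints instead of enumerating, so everything below (names included) is purely illustrative:

    from itertools import product

    GLOBAL_OPERANDS = ["p34", "p35"]          # illustrative operand names
    REGISTERS = ["e2", "e4", "g0"]

    def solve_subproblems(assignment):
        """Stub standing in for the per-block subproblems: here, two global
        operands in the same register count as interference."""
        if len(set(assignment.values())) < len(assignment):
            return None                        # no block solution exists
        return sum(REGISTERS.index(r) for r in assignment.values())  # fake cost

    best_cost, best = float("inf"), None
    for regs in product(REGISTERS, repeat=len(GLOBAL_OPERANDS)):
        master = dict(zip(GLOBAL_OPERANDS, regs))   # one master solution
        cost = solve_subproblems(master)            # solve all subproblems
        if cost is not None and cost < best_cost:   # keep improving solutions only,
            best_cost, best = cost, master          # mirroring the added cost bound
    print(best, best_cost)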

5 Target Description

Adapting Unison to a new architecture means providing a target description, which consists of a number of Haskell functions that give Unison the necessary information about the architecture for which it is going to perform register allocation and instruction scheduling. Many of these functions can be generated by the tool SpecsGen (provided together with Unison) from a file in YAML format [7], where all the instructions of the architecture are listed and their characteristics are indicated. In the Black Arrow adaptation described here, the YAML file and one of the functions in the target description are in turn generated by a specially developed Python program, while the remaining functions in the target description have been implemented by hand. From the target description and an input file containing a representation of a function, Unison produces parameters for a constraint model, as described in Chapter 4. This chapter will describe the problems that arise in the adaptation of Unison for Black Arrow, how these problems have been solved, and how the solutions are reflected in the target description. A central point in the adaptation is how the problem of assigning each Black Arrow instruction to a specific execution unit is solved through the use of Unison's ability to choose between several alternative instructions implementing the same operation (see Section 5.2). Also described are the instruction set used by the Black Arrow version of Unison (Section 5.1), the insertion of optional copy operations in the code (Section 5.3), the modelling of the global interconnects (Section 5.4), the handling of the limited resources of the architecture (Section 5.5), the enforcing of latencies and serialisations of instructions (Section 5.6), and the enforcing of read and write capabilities of the subunits (Section 5.7).

5.1 Instruction Set

Unison requires each operand (use operand or definition operand) of the instructions in the YAML file and the target description to belong to exactly one of three operand types: register, bound, or label. A register operand is, as the name indicates, an operand in a register (or in memory, as memory locations are modelled as registers by Unison). All definition operands and many use operands are register operands. A bound operand (which is always a use operand) is the same thing as an immediate. A label operand (which is also always a use operand), finally, is the name of a basic block and indicates the destination of a branch instruction. Since each operand of an instruction handled by Unison must belong to exactly one of the three operand types, several instructions not present in the real Black Arrow instruction set must be added to the instruction set before it can be handled by Unison. The result of the addition of instructions indicating operand types is that the size of the instruction set increases significantly. The size of the instruction set is also

iadd.1s32_reg_reg_ALU_2_main
iadd.1s32_reg_reg_ALU_4_main
iadd.1s32_reg_reg_ALU_7_main
iadd.1s32_reg_reg_ALU_9_main
iadd.1s32_reg_reg_ALU_12_main
iadd.1s32_i_reg_ALU_2_main
iadd.1s32_i_reg_ALU_4_main
iadd.1s32_i_reg_ALU_7_main
iadd.1s32_i_reg_ALU_9_main
iadd.1s32_i_reg_ALU_12_main
iadd.1s32_reg_i_ALU_2_main
iadd.1s32_reg_i_ALU_4_main
iadd.1s32_reg_i_ALU_7_main
iadd.1s32_reg_i_ALU_9_main
iadd.1s32_reg_i_ALU_12_main
iadd.1s32_i_i_ALU_2_main
iadd.1s32_i_i_ALU_4_main
iadd.1s32_i_i_ALU_7_main
iadd.1s32_i_i_ALU_9_main
iadd.1s32_i_i_ALU_12_main

Table 5.1: Instructions based on iadd.1s32.

increased by a need to introduce instructions that are tied to (can only be executed by) specific execution units, as will be discussed in Section 5.2. For example, the original instruction iadd.1s32, which can be executed by any ALU unit and whose use operands can be both register operands and bound operands, gives rise to the 20 instructions listed in Table 5.1. In the names of instructions that are added to the instruction set, the types of the use operands are indicated in order: reg or resglob for register operands (where the choice between reg and resglob depends on which register class can be used for the operand, as will be discussed in Section 5.7), i for bound operands, and label for label operands. Even when each operand of an original instruction can belong to just one type, an instruction where the operand types are indicated is added to the instruction set. This is not required by Unison, but might be helpful to someone who wants to study the instruction set or Unison's processing. Figure 5.1 shows three entries from the YAML file for Black Arrow: the original instruction iadd.1s32 along with the added instructions iadd.1s32_reg_reg_ALU_2_main and iadd.1s32_i_reg_ALU_2_main, which are tied to the main subunit of execution unit 2 (an ALU unit). The types of the operands are not only indicated in the names of the instructions, but also in the list of operands in each entry in the file. The entries in the YAML file also contain information about other features of the instructions. Each instruction is either a branch instruction, a copy instruction, or a linear instruction. Branch instructions affect control flow, copy instructions are pseudo instructions that are used internally by Unison but never occur in the output (see Section 5.3 and Section 5.4), and linear instructions are all other instructions. There is also information about the register class of each register operand, such as In2 and Result2 (Section 5.7), as well as about side effects (Section 5.6), and itinerary

- id: iadd.1s32
  type: linear
  operands:
    - dst: [register, def, Result2]
    - src0: [register, use, In2]
    - src1: [register, use, In2]
  uses: [src0, src1]
  defines: [dst]
  size: 1
  affects:
  affected-by:
  itinerary: NoItinerary

- id: iadd.1s32_reg_reg_ALU_2_main
  parent: iadd.1s32
  operands:
    - dst: [register, def, Result2]
    - src0: [register, use, In2]
    - src1: [register, use, In2]
  uses: [src0, src1]
  defines: [dst]
  itinerary: ALU_2_main_itinerary

[...]

- id: iadd.1s32_i_reg_ALU_2_main
  parent: iadd.1s32
  operands:
    - dst: [register, def, Result2]
    - src0: bound
    - src1: [register, use, In2]
  uses: [src0, src1]
  defines: [dst]
  itinerary: ALU_2_main_itinerary

Figure 5.1: Three entries from the YAML file that lists the Black Arrow instructions.

(resources used, Section 5.5) of each instruction. If an instruction has an indicated parent (for example, iadd.1s32 is the parent of iadd.1s32_reg_reg_ALU_2_main and iadd.1s32_i_reg_ALU_2_main), then it has the same feature values as the parent, unless another value is explicitly indicated.
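The YAML file is said above to be generated by a specially developed Python program, but that program is not shown in the thesis. Purely as an illustration of the fan-out, a sketch along these lines (the function names and entry layout are assumptions, not the actual generator) would produce the 20 variants of Table 5.1:

    from itertools import product

    ALU_UNITS = [2, 4, 7, 9, 12]        # the ALU execution units in a tile

    def expand(op, use_operand_kinds, units):
        """Yield one YAML-style entry per operand-type/unit combination."""
        for kinds in product(*use_operand_kinds):   # e.g. ('reg', 'i')
            for unit in units:
                name = f"{op}_{'_'.join(kinds)}_ALU_{unit}_main"
                yield {"id": name, "parent": op,
                       "itinerary": f"ALU_{unit}_main_itinerary"}

    entries = list(expand("iadd.1s32", [("reg", "i"), ("reg", "i")], ALU_UNITS))
    print(len(entries))        # 2 * 2 * 5 = 20 variants, as in Table 5.1
    print(entries[0]["id"])    # iadd.1s32_reg_reg_ALU_2_main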

5.2 Choosing Execution Unit

As indicated in Table 2.1, each Black Arrow tile contains several ALU units, DIST units, and SHIFT units. If all execution units that belong to the same unit type were completely equivalent, then each unit type could correspond to a resource in the target description. The capacity of each of these resources would then be the number of units of that type. However, while the units within a type are indeed functionally equivalent (that is, they can execute the same set of instructions), they are not topologically equivalent (that is, they do not have the same set of neighbours). Since the result of one instruction is often used by another, a solution to the instruction scheduling problem for a Black Arrow program must indicate which unit is supposed to execute each instruction. Just making sure that the execution resources are not over-utilised is not enough, since this could cause a result to end up in a register where it would not be available for an instruction needing it (at least not without the addition of copy

instructions that would not be necessary in an optimal solution). Unfortunately, Unison is not designed to handle this situation. The solution to this problem that has been implemented and evaluated in this study is based on Unison's ability to choose between several alternative instructions implementing the same operation. As described in Section 4.3, this mechanism was originally intended for alternative instructions related to optional copy operations. However, it can also be used to handle the problem of assigning instructions to execution units. Therefore, the original Black Arrow instruction set has been augmented with instructions that are tied to certain execution units. During code import, each instruction in the input to Unison, unless it is a RET instruction, is replaced by a set of alternative instructions, each indicating execution unit and subunit as well as operand types. (As indicated in the caption of Table 2.2, the RET instruction is not a real Black Arrow instruction, but must be present at the end of the input for Unison to work correctly.) For example, if the instruction in the input is iadd.1s32, the first use operand is an immediate, and the second use operand is a temporary, then the replacing set of instructions consists of iadd.1s32_i_reg_ALU_2_main, iadd.1s32_i_reg_ALU_4_main, iadd.1s32_i_reg_ALU_7_main, iadd.1s32_i_reg_ALU_9_main, and iadd.1s32_i_reg_ALU_12_main. The replacement is made using the function transforms in the target description. Later in the processing, as part of solving the constraint model, Unison chooses one instruction from each set, such that an optimal solution is produced. That is, in the output from Unison, no instructions come from the original instruction set. Instead, all instructions belong to the set of instructions added to the original instruction set.

5.3 Optional Copy Operations

The insertion of optional copy operations is handled by the copies function in the target description. The alternative instructions associated with the inserted operations are copy pseudo instructions. At the end of Unison's processing, when it is known from the solution to the constraint problem which instruction (if any) will implement each optional copy operation, the fromCopy function in the target description replaces the pseudo instructions with ordinary linear instructions. An optional copy operation is inserted after every definition and before every use of a temporary. A pseudo instruction that is associated with an operation inserted after a definition can be:

• a CP instruction that copies the value of the temporary to the result register of an ALU or a SHIFT unit

• a WRITE instruction that writes the value to a file register of a non-DIST unit

• a GLOBAL instruction that makes the value available on one of the global interconnects (see Section 5.4)

• a STORE instruction that stores the value to memory

• the null instruction (-)

copy instruction                  linear instruction
CP_reg_ALU_2_main                 icopy.w_reg_ALU_2_main
CP_reg_ALU_4_main                 icopy.w_reg_ALU_4_main
CP_reg_ALU_7_main                 icopy.w_reg_ALU_7_main
CP_reg_ALU_9_main                 icopy.w_reg_ALU_9_main
CP_reg_ALU_12_main                icopy.w_reg_ALU_12_main
CP_reg_SHIFT_5_main               icopy.w_reg_SHIFT_5_main
CP_reg_SHIFT_10_main              icopy.w_reg_SHIFT_10_main
WRITE_resglob_BR_0_write          wreg.w_resglob_BR_0_write
WRITE_resglob_MEM_1_write         wreg.w_resglob_MEM_1_write
WRITE_resglob_ALU_2_write         wreg.w_resglob_ALU_2_write
WRITE_resglob_ALU_4_write         wreg.w_resglob_ALU_4_write
WRITE_resglob_SHIFT_5_write       wreg.w_resglob_SHIFT_5_write
WRITE_resglob_ALU_7_write         wreg.w_resglob_ALU_7_write
WRITE_resglob_ALU_9_write         wreg.w_resglob_ALU_9_write
WRITE_resglob_SHIFT_10_write      wreg.w_resglob_SHIFT_10_write
WRITE_resglob_ALU_12_write        wreg.w_resglob_ALU_12_write
GLOBAL                            gl.w
STORE                             st.w_sp

Table 5.2: Copy pseudo instructions added after each definition of a temporary, and corresponding linear instructions.

The set of instructions added before each use is similar, but with the STORE instruction replaced by a LOAD instruction that loads the value of the temporary from memory. The complete sets of added instructions can be found in Table 5.2 and Table 5.3, along with the linear instructions they are replaced with by fromCopy.

These rules for the insertion of optional copy operations mean that a temporary defined by an instruction in the MIR code can be moved (copied) at most two times between its definition and its use, where the copy operation inserted after the definition provides the first move opportunity and the operation inserted before the use provides the second. Solutions to the instruction scheduling and register allocation problems that would require more than two moves between definition and use are thus not even considered. Theoretically, this might make Unison miss optimal solutions. In practice, however, this is probably not an important limitation. The large number of available registers and the short distances within a tile (considering the neighbour relations and the possibility to use the global interconnects) make it unlikely that allowing values to be moved around by long chains of copy operations would provide any useful additional opportunities to make values available to certain execution units at certain cycles.

5.4 Global Interconnects

For an execution unit to be able to use a value that has been computed by a non-neighbour, one of the two global interconnects must be used. This requires a copy operation to be performed, which takes one cycle and thus adds to the time required to compute the value (see Section 2.1). In the Black Arrow version of

copy instruction                  linear instruction
CP_reg_ALU_2_main                 icopy.w_reg_ALU_2_main
CP_reg_ALU_4_main                 icopy.w_reg_ALU_4_main
CP_reg_ALU_7_main                 icopy.w_reg_ALU_7_main
CP_reg_ALU_9_main                 icopy.w_reg_ALU_9_main
CP_reg_ALU_12_main                icopy.w_reg_ALU_12_main
CP_reg_SHIFT_5_main               icopy.w_reg_SHIFT_5_main
CP_reg_SHIFT_10_main              icopy.w_reg_SHIFT_10_main
WRITE_resglob_BR_0_write          wreg.w_resglob_BR_0_write
WRITE_resglob_MEM_1_write         wreg.w_resglob_MEM_1_write
WRITE_resglob_ALU_2_write         wreg.w_resglob_ALU_2_write
WRITE_resglob_ALU_4_write         wreg.w_resglob_ALU_4_write
WRITE_resglob_SHIFT_5_write       wreg.w_resglob_SHIFT_5_write
WRITE_resglob_ALU_7_write         wreg.w_resglob_ALU_7_write
WRITE_resglob_ALU_9_write         wreg.w_resglob_ALU_9_write
WRITE_resglob_SHIFT_10_write      wreg.w_resglob_SHIFT_10_write
WRITE_resglob_ALU_12_write        wreg.w_resglob_ALU_12_write
GLOBAL                            gl.w
LOAD                              ld.w_sp

Table 5.3: Copy pseudo instructions added before each use of a temporary, and corresponding linear instructions.

Unison, two global registers, g0 and g1, have been introduced to model the two global interconnects. Also, two extra instructions have been introduced: one copy pseudo instruction, GLOBAL, and one linear instruction that might be present in the output from Unison but not in Black Arrow assembly code, gl.w. The GLOBAL and gl.w instructions copy a value from any result register that belongs to a non-DIST unit (result register ek, where k ∈ {1,2,4,5,7,9,10,12}) to one of the global registers. The absence of such instructions from the Black Arrow assembly code is a simplification [24]. As will be described in Section 6.3, the assembly code suggests that execution units can directly access values over the global interconnects (given an extra cycle). However, the binary representation of the code must contain some implementation of a global copy operation, corresponding to gl.w. In this respect, the output from Unison is thus closer than the Black Arrow assembly code to what is needed for a hardware implementation of Black Arrow. This way of modelling the global interconnects means that the GLOBAL instruction is included among the alternative instructions associated with each optional copy oper- ation, that GLOBAL is replaced by gl.w during the further processing (see Section 5.3), and that the global registers g0 and g1 are included in all In and ResGlob register classes (see Section 5.7), thereby making them readable for all main subunits and write subunits. Since gl.w instructions cannot be part of Black Arrow assembly code, this modelling also means that transformation of output from Unison to assembly code would, among other things, require the gl.w instructions to be removed and all instructions reading a global register to be modified to read the register in which the value was originally placed (see Section 6.3).

5.5 Resources

Register allocation and instruction scheduling aim to use registers and execution units efficiently, while not over-utilising these resources. Spilling and register-to-register copying are the two available ways to avoid over-utilisation of registers. As regards execution units, the Black Arrow version of Unison treats each subunit (main subunit or register write subunit) as a separate resource. Each of these resources has a capacity of one unit. Almost every instruction uses one unit of the resource corresponding to the subunit that executes it. The only exceptions are four instructions that have been added to the instruction set as part of the modelling of Black Arrow: GLOBAL, gl.w, NOP (indicating that all execution units are idle during one cycle), and RET. In the case of gl.w, this is because its counterpart in the binary code representation is not supposed to be executed by any of the execution units. For the other three instructions, the reason is that they will not be executed at all, but have only been added to make the allocation and scheduling process work correctly.

As mentioned in Section 4.5, there is also a bundle-width resource, which has a capacity of 24 units. Every instruction except RET uses one unit of the bundle-width resource. RET uses all 24 units of this resource, so that the instruction will always be placed in a bundle (cycle) of its own and not interfere with the scheduling of the instructions that will actually be executed by Black Arrow. All in all, this resource scheme means that at most 24 instructions can be issued in each cycle: one in each of the 13 main subunits, one in each of the 9 register write subunits, and one for each of the two global interconnects.

Resources will not be over-utilised even if the bundle-width resource is dropped from the target description, but the resource is necessary when Unison is to optimise for minimal code size. Then, Unison minimises the consumption of the bundle-width resource (see Section 4.5). Since all Black Arrow instructions except RET use one unit each of the resource, this is equivalent to minimising the number of instructions in the output. As mentioned in Section 4.5, the bundle-width resource is supposed to represent the bits used to encode instructions in a binary representation. Not all Black Arrow instructions will have the same size (in number of bits) in a future binary representation [24], and instructions will certainly not be encoded using just one bit each. However, since the details of the binary representation have not yet been decided, the simplifying assumption made here is that every instruction is encoded using the same (unknown) number of bits. It is then easiest to let each instruction consume one unit of the bundle-width resource. The total amount consumed could, if needed, be multiplied by some estimated factor representing the average number of bits per instruction to get an idea of how many bits would be needed for the binary representation of a program.

In the target description, each instruction has an itinerary, which indicates how many units of which resources are used, and for how long. In the Black Arrow case, all resources are used during just one cycle. That is, all subunits can start executing a new instruction each cycle.
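As a rough illustration of this resource scheme, the following Python sketch (the subunit naming and data layout are mine, not Unison's) tabulates the capacities and per-instruction consumption described above:

    MAIN_SUBUNITS = ["BR_0", "MEM_1", "ALU_2", "DIST_3", "ALU_4", "SHIFT_5", "DIST_6",
                     "ALU_7", "DIST_8", "ALU_9", "SHIFT_10", "DIST_11", "ALU_12"]
    WRITE_SUBUNITS = ["BR_0", "MEM_1", "ALU_2", "ALU_4", "SHIFT_5",
                      "ALU_7", "ALU_9", "SHIFT_10", "ALU_12"]

    capacity = {f"{u}_main": 1 for u in MAIN_SUBUNITS}     # 13 main subunits
    capacity.update({f"{u}_write": 1 for u in WRITE_SUBUNITS})  # 9 write subunits
    capacity["bundle_width"] = 24

    def consumption(instr):
        """One unit of the executing subunit plus one bundle-width unit, with the
        special cases listed above: GLOBAL, gl.w, NOP and RET consume no subunit,
        and RET fills the whole bundle so that it gets a cycle of its own."""
        con = {}
        if instr["name"] not in ("GLOBAL", "gl.w", "NOP", "RET"):
            con[instr["subunit"]] = 1
        con["bundle_width"] = 24 if instr["name"] == "RET" else 1
        return con

    print(consumption({"name": "iadd.1s32_reg_reg_ALU_2_main", "subunit": "ALU_2_main"}))
    # {'ALU_2_main': 1, 'bundle_width': 1}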

5.6 Latencies and Serialisation

The target description for Black Arrow implements rules that enforce the latencies indicated in Table 2.1 (two cycles for instructions executed by the main subunit of

the MEM unit, one cycle for all other instructions), so that no instruction will try to use a result that has not yet been produced. Furthermore, the target description uses Unison's ability to enforce serialisations (preserve the instruction order in the input) through rules indicating that certain instructions read from or write to objects that represent side effects of the instructions. For Black Arrow, there is only one such object, representing memory, which is written to by the store instructions and read from by the load instructions. This means that all dependencies through memory between such instructions are respected, so that unintended values will never be used.

Unlike Unison, Bach does not respect dependencies through memory unless there is a block boundary between the instructions or the programmer has included a call to a special sync() function in the Kudzul-C program [22]. That is, Bach puts the burden of enforcing serialisations on the programmer, but also potentially offers faster execution of the compiled program in case there is no real dependency between the instructions. Unison makes things easier for the programmer, but since the whole memory is regarded as one object, execution might be slower than necessary. For example, Unison enforces that no load instruction may be issued until two cycles (due to latency) after the latest store instruction was issued, regardless of whether the instructions access the same memory location.

5.7 Read and Write Capabilities

As mentioned in Section 5.1, the YAML file and the target description contain information about the register class of each register operand of each instruction in the instruction set. A register class is a set of registers, and its purpose is to indicate which registers can be used for the operands that are connected to it. This is how the read and write capabilities of the different subunits, discussed in Section 2.1, are enforced. Register classes might be overlapping. That is, a certain register might belong to several register classes. The register classes used in the target description for Black Arrow are listed in Table 5.4 and will now be described.

Each In register class consists of the registers that can be used for the use register operands of the instructions executed by a certain main subunit. (Using the terminology of Section 5.1, these operands are reg operands.) As described in Section 2.1, this means that each such class consists of (a) the result registers of the neighbours of the execution unit in question, (b) the global registers, and (c) the file registers of the execution unit (in case it has file registers, which is not the case for DIST units).

The registers of a ResGlob register class can be used for the use register operand of the (WRITE and wreg.w) instructions executed by a certain write subunit. This means that the class consists of (a) the result registers of the neighbours of the execution unit in question and (b) the global registers. Since DIST units do not have write subunits, there are no ResGlob register classes for these units. The registers of a ResGlob class are also used for one (freely chosen) operand among the use register operands of any instruction (executed by a certain main subunit) that has three such operands. This prohibits the use of a file register for one of the operands, and is how the requirement that at most two operands of an instruction may be read from the file registers is implemented. Using the terminology of Section 5.1, the operands for which the registers of the ResGlob register class are used are resglob operands.

The GlobalSrcRegs register class consists of the result registers that belong to a non-DIST unit. These registers can be used for the use operand of a GLOBAL or gl.w

register class                        registers                                      use

In_i                                  e_j (j ∈ neighbours(E_i)),                     use operands of instructions
(i ∈ {0,1,2,4,5,7,9,10,12})           g0, g1,                                        executed by M_i
                                      r_{i,k} (k ∈ {0,1,...,15})

In_i                                  e_j (j ∈ neighbours(E_i)),                     use operands of instructions
(i ∈ {3,6,8,11})                      g0, g1                                         executed by M_i

ResGlob_i                             e_j (j ∈ neighbours(E_i)),                     use operand of instructions
(i ∈ {0,1,2,4,5,7,9,10,12})           g0, g1                                         executed by W_i, use operand of
                                                                                     some instructions executed by M_i

GlobalSrcRegs                         e_i (i ∈ {0,1,2,4,5,7,9,10,12})                use operand of GLOBAL and
                                                                                     gl.w instructions

Result_i (i ∈ {1,2,...,12})           e_i                                            definition operand of instructions
                                                                                     executed by M_i

WregDstRegs_i                         r_{i,j} (j ∈ {0,1,...,15})                     definition operand of instructions
(i ∈ {0,1,2,4,5,7,9,10,12})                                                          executed by W_i

Globals                               g0, g1                                         definition operand of GLOBAL
                                                                                     and gl.w instructions

BAMemory                              memory locations                               definition operand of STORE
                                                                                     instructions, use operand of
                                                                                     LOAD instructions

StackPointerReg                       r_{1,15}                                       use operand of st.w_sp and
                                                                                     ld.w_sp instructions

AllRegs                               all registers                                  internally

Table 5.4: The Black Arrow register classes, their registers, and their use.

instruction, that is, for the value that is copied to a global register (see Section 5.4).

Each Result register class consists of the result register of a certain execution unit. The register is used for the definition operand of the instructions executed by the main subunit of that execution unit. Similarly, each WregDstRegs register class consists of the registers that can be used for the definition operand of the (WRITE and wreg.w) instructions executed by the write subunit of a certain execution unit. This means that the class consists of all the file registers of that execution unit, since this is where WRITE and wreg.w instructions copy values. There are no WregDstRegs for the DIST units, for the same reason as in the ResGlob case.

The registers of the Globals register class can be used for the definition operand of a GLOBAL or gl.w instruction. That is, the registers in the class are the two global registers. The BAMemory register class consists of memory locations. (As mentioned in Section 4.3, memory locations are modelled as registers by Unison.) The 'registers' in this class are used for the definition operand of STORE instructions and the use operand of LOAD instructions.

The StackPointerReg register class consists of file register number 15 of the MEM execution unit. This register is reserved to hold a pointer to the start address of the stack where temporaries are spilled. It is used for one of the use operands of the st.w_sp and ld.w_sp instructions, which replace the STORE and LOAD pseudo instructions in the output from Unison (see Section 5.3). The use of this particular register for this purpose is not required by the Black Arrow architecture. However, Unison needs the pointer to be in some fixed register, and this one is suitable since it may be read by the MEM execution unit and can be reserved, unlike the result registers and the global registers, whose contents are repeatedly overwritten. The AllRegs register class, finally, consists of all available registers and is used internally by Unison.
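As an illustration of how the In classes in Table 5.4 are put together, consider the following Python sketch; the neighbour map is a placeholder, not the actual Black Arrow topology of Chapter 2:

    NEIGHBOURS = {2: [0, 1, 3], 3: [2, 4]}     # placeholder topology only
    DIST_UNITS = {3, 6, 8, 11}                 # DIST units have no file registers

    def in_class(i):
        regs = [f"e{j}" for j in NEIGHBOURS[i]]    # result registers of neighbours
        regs += ["g0", "g1"]                       # the global registers
        if i not in DIST_UNITS:
            regs += [f"r{i}_{k}" for k in range(16)]   # the unit's own file registers
        return regs

    print(in_class(3))   # a DIST unit: neighbours' result registers plus g0 and g1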

6 Toolchain

In order to go from a program representation produced by Bach to one where Unison has performed instruction scheduling and register allocation, several practical steps must be performed. Additional steps are needed to produce a representation that can be run by the simulator that accompanies Bach. This chapter will summarise this process, from the preprocessing of data from Bach (Section 6.1), through the processing performed by Unison itself (Section 6.2), to the postprocessing of the output from Unison (Section 6.3). The transformation of output from Unison to a binary representation that can be run by a hardware implementation of Black Arrow is interesting but will not be discussed here, since the structure of such representations is not yet decided.

6.1 Preprocessing

Bach does not, in any stage of its processing, produce output in the LLVM MIR format, which is the input format required by Unison (see Section 4.1). Therefore, a conversion to this format is a necessary preprocessing step. The input to the conversion process is a function in the so-called IL format of Bach. The main difference is that in the IL format, control structures (such as loops and conditional constructs) are explicit (see Figure 6.1 for an example), while in the LLVM MIR format, they are represented by basic blocks and branch instructions. The first step in the conversion process is therefore to convert the IL version of the function to the MIR format. As part of this study, a Python program that performs this step was constructed. The program, which is called il2mir.py, uses a stack structure to keep track of the hierarchically organised control structures, pushing to the stack when entering a new structure and popping from it when leaving a structure. When a function has been converted to MIR format, it is transformed to SSA form using another Python program constructed as part of this study, create-ssa.py. Apart from some minor details, the algorithm implemented by the program is the one described by Cytron et al. [19]. The algorithm operates on a control-flow graph (CFG) of the function, where each node of the graph corresponds to a basic block of the function. The algorithm builds upon three important concepts. If a node Y can only be reached if execution has first passed through another node X, then it is said that X strictly dominates Y. Similarly, X is said to dominate Y if X strictly dominates Y or if X and Y are the same node. Finally, the dominance frontier of a node X is defined as “the set of all CFG nodes Y such that X dominates a predecessor of Y but does not strictly dominate Y”[19, p. 465]. Here, a predecessor is an immediate predecessor in the control-flow graph. In other words, the nodes in the dominance frontier are the earliest ones where other definitions may ‘compete’ with one made in X. For each definition of a temporary, the algorithm guarantees that there is a φ

function name local_base self v_1
# num statements = 7
v_2 <- 1
v_3 <- 1
v_4 <- 0
v_5 <- 1
v_6 <- 1
for v_7 to v_1 step 1 do
# num statements = 3
v_4 <- v_5
v_5 <- v_6
v_6 <- iadd.1s32 v_4 v_5
done
printLn.1s32 v_4

Figure 6.1: The fibonacci program of Figure 4.1 in the IL format of Bach.

function (see Section 4.1) for that temporary at every node in the dominance frontier of the node where the definition occurs. The implemented SSA algorithm inserts all necessary φ functions, but may also introduce φ functions that are unnecessary, in the sense that the new temporaries that they define are never used. Such unnecessary φ functions often occur in the output from create-ssa.py, something that was found to slow down Unison's processing significantly. Therefore, another Python program, clean-phi.py, was constructed as part of this study. It removes all unnecessary φ functions, taking into account that if a certain φ function is found to be necessary, then this may in turn make other φ functions necessary, since they define temporaries used by the first φ function. In some cases, an additional preprocessing step is needed in order to make the test program (function) fully conform to the requirements of Unison. These requirements, which are currently enforced by manual editing, are:

• As discussed in Section 4.1, all basic blocks must be maximal. That is, out of two consecutive blocks, either the first one must end with a branch, or the second one must be the goal of a branch.

• If a basic block is empty, then a φ function must have the block among its operands. That is, one of the cases considered by the φ function must be that the control flow comes from the block in question.

• Any block that just contains an unconditional branch to the following block must be removed.

The first requirement is a common one in compiler design [30, p. 388], and has therefore been used in the construction of Unison [9]. The other requirements are probably side effects of the fact that Unison has been developed to handle input produced by LLVM, where violations of these requirements never occur [9]. Therefore, they did not surface as requirements until Unison was applied to functions including such violations, which are sometimes produced when output from Bach is processed by the programs developed in this study.
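The dominance-frontier concept that drives φ insertion in create-ssa.py can be computed compactly once immediate dominators are known. The following Python sketch uses the simplification popularised by Cooper, Harvey, and Kennedy, which yields the same sets as the definition quoted above from Cytron et al.; the thesis describes create-ssa.py only at the level of the original paper, so this is an illustration, not its actual code:

    def dominance_frontiers(preds, idom):
        """preds: node -> list of predecessors; idom: node -> immediate dominator."""
        df = {n: set() for n in preds}
        for n, ps in preds.items():
            if len(ps) >= 2:                 # only join points appear in frontiers
                for p in ps:
                    runner = p
                    while runner != idom[n]: # walk up to, but not including, idom(n)
                        df[runner].add(n)
                        runner = idom[runner]
        return df

    # Diamond CFG: 0 -> 1, 0 -> 2, 1 -> 3, 2 -> 3. Definitions on the two
    # branches 'compete' at node 3, so DF(1) = DF(2) = {3}: a φ function belongs there.
    preds = {0: [], 1: [0], 2: [0], 3: [1, 2]}
    idom = {0: 0, 1: 0, 2: 0, 3: 0}
    print(dominance_frontiers(preds, idom))   # {0: set(), 1: {3}, 2: {3}, 3: set()}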

6.2 Unison’s processing

The function representation that results from the preprocessing described in the previous section becomes the input to Unison. From this input and the target description of Black Arrow, Unison produces parameters for the constraint model, through the process described in Chapter 4. The model is solved, resulting (given enough time) in an optimal solution to the instruction scheduling and register allocation problems for Black Arrow and the function in question. Finally, Unison produces a representation of the function in LLVM MIR format where instruction scheduling and register allocation are according to the solution provided by the solver. Figure 6.2 shows a version of the fibonacci program in MIR format that is optimal with respect to estimated execution time. In comparison with the non-optimised (input) version (see Figure 4.1), the following differences are noteworthy:

• The execution frequency of each block, estimated and used by Unison to find a solution with minimal estimated execution time, is indicated.

• Unison has grouped the instructions into bundles, each consisting of the instructions that will be issued in a certain cycle.

• Two of the function arguments (in registers r2,0 and r4,0), which are not actually used by the function (as indicated already by the lack of an ‘activating’ COPY instruction in Figure 4.1), are no longer present.

• The successors of each block (in the control-flow graph) are indicated.

• Several copy instructions (of type icopy.w and gl.w) have been inserted.

• Each original instruction has been replaced by an instruction indicating operand types as well as execution unit and subunit.

• Temporaries have been assigned to registers.

6.3 Postprocessing

The simulator that accompanies Bach cannot directly run the code produced by Unison. In order to make the code executable by the simulator, some postprocessing would be necessary. This postprocessing has not been implemented as part of this study, since it was not deemed necessary to do so in order to draw relevant conclusions. There are four main differences that would have to be taken into account in the postprocessing. First, the output from Unison is in the LLVM MIR format, while the output from Bach is in a special Black Arrow assembly format that can be read by the simulator (see Figure 6.3, where the assembly code that corresponds to the MIR code in Figure 6.2 is shown). The meaning of the Black Arrow assembly code is as follows [22]. The beginning of each cycle (bundle) is indicated by an asterisk. The asterisk is followed, one instruction per line, by all instructions that will be issued in that cycle. All instruction lines except those for wreg instructions have the form

P Op D N C1 K1 ... CN KN

45 --- | ; ModuleID = (...)

...
---
name: fibonacci_no-sw-pipe
body: |
  bb.0 (freq 1):
    successors: %bb.1(1)

BUNDLE {
  %e5 = icopy.w_reg_SHIFT_5_main %r5_0
  %e2 = const.1u32_i_ALU_2_main 1
  %e4 = const.1u32_i_ALU_4_main 1
  %e7 = const.1u32_i_ALU_7_main 0
  %e9 = const.1u32_i_ALU_9_main 1
}
BUNDLE {
  %e10 = icopy.w_reg_SHIFT_10_main %e9
  %e9 = const.1u32_i_ALU_9_main 1
  %e2 = const.1s32_i_ALU_2_main 0
}

bb.1 (freq 10):
  successors: %bb.3(1), %bb.2(2)

BUNDLE {
  %g0 = gl.w %e10
  %e4 = iless.1s32_reg_reg_ALU_4_main %e2, %e5
}
brz.1s32_reg_label_BR_0_main %e4, %bb.3

bb.2 (freq 10):
  successors: %bb.1(1)

BUNDLE {
  %e7 = icopy.w_reg_ALU_7_main %g0
  %e10 = icopy.w_reg_SHIFT_10_main %e9
  %e2 = iadd.1s32_reg_i_ALU_2_main %e2, 1
}
BUNDLE {
  %e9 = iadd.1s32_reg_reg_ALU_9_main %e7, %e10
  brnz.1u32_i_label_BR_0_main 1, %bb.1
}

bb.3 (freq 1):

printLn.1s32_reg_ALU_7_main %e7
RET

...

Figure 6.2: The fibonacci program of Figure 4.1 in the LLVM MIR format, when optimised by Unison for minimal estimated execution time.

1 1 13 BR MEM ALU DIST ALU SHIFT DIST ALU DIST ALU SHIFT DIST ALU
1 i32
*
2 icopy u128 1 r 0
*
*
4 icopy u128 1 r 0
*
*
5 icopy u128 1 r 0
*
5 wreg 0 b 5
7 wreg 0 b 5
*
*
9 const u32 1 i 1
2 const u32 1 i 1
4 const u32 1 i 0
12 const u32 1 i 1
7 const u32 1 i 1
*
4 wreg 1 b 4
2 wreg 0 b 2
*
4 wreg 0 g 9
*
7 igtrEq s32 2 r 0 i 1
5 icopy s32 1 r 0
*
5 wreg 0 b 5
0 brz s32 2 b 7 i 22
*
2 icopy u128 1 r 0
*
*
4 icopy u128 1 r 0
5 wreg 1 g 2
*
2 wreg 0 b 4
*
5 icopy u128 1 r 0
*
7 wreg 0 b 5
*
*
2 icopy u128 1 r 0
5 icopy u128 1 r 1
*
7 idtx s32 2 r 0 i 1
4 iadd s32 2 b 5 b 2
4 wreg 1 b 5
2 wreg 0 b 2
*
4 wreg 0 b 4
5 wreg 0 b 7
0 brgz s32 2 b 7 i 12
*
*
*
4 printLn s32 1 r 1

Figure 6.3: The Black Arrow assembly code produced by Bach for the fibonacci program of Figure 4.1.

where P indicates the unit that will execute the instruction, Op is the name of the instruction, D is the data type (in a format that is somewhat different from the one used in the IL code and therefore in Table 2.3), and N is the number of operands. Finally, Ci and Ki together indicate operand i: Ci can be either r (file register), b (result register of a neighbour), g (result register of a non-neighbour, accessed via one of the global interconnects, in reality requiring an extra copy operation), or i (integer immediate), while Ki is the associated value. For example, the line

7 igtrEq s32 2 r 0 i 1

instructs main subunit M7 to test if the 32-bit signed integer in file register r7,0 is greater than or equal to 1. Instruction lines for wreg instructions have the form

P wreg R C K

where P (as before) indicates the execution unit, R is the file register to write to, and C and K together indicate from where to read the value that is to be written. For example, the line

4 wreg 0 g 9

instructs register write subunit W4 to read the value in result register e9 over a global interconnect and to write the value to file register r4,0.

Second, the output from Unison uses instructions whose names indicate operand types and execution units, but this information is superfluous once instruction scheduling and register allocation have been performed, and these instruction names are not known by Bach or the simulator. Therefore, this added information is not present in the Black Arrow assembly code.

Third, as mentioned in Section 5.4, gl.w instructions cannot be part of Black Arrow assembly code. Therefore, the postprocessing would have to remove all such instructions. Also, the postprocessing would have to modify each instruction in the output from Unison that reads a global register, making the instruction refer directly to the result register from which gl.w copied the value in question.

Fourth, an occurrence of the NOP instruction in the output from Unison, indicating that no actual instructions will be issued in the cycle in question, corresponds to a blank line in the Black Arrow assembly code, while the RET instruction has no counterpart at all in the assembly code.
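Although the postprocessing has not been implemented, parsing the two line formats described above is mechanical. The following Python sketch (illustrative only; the field names are mine) reads both kinds of lines:

    def parse_line(line):
        tok = line.split()
        if tok[1] == "wreg":                       # P wreg R C K
            return {"unit": int(tok[0]), "op": "wreg", "dst_file_reg": int(tok[2]),
                    "src": (tok[3], int(tok[4]))}
        p, op, dtype, n = int(tok[0]), tok[1], tok[2], int(tok[3])
        operands = [(tok[4 + 2 * i], int(tok[5 + 2 * i])) for i in range(n)]
        return {"unit": p, "op": op, "type": dtype, "operands": operands}

    print(parse_line("7 igtrEq s32 2 r 0 i 1"))
    # {'unit': 7, 'op': 'igtrEq', 'type': 's32', 'operands': [('r', 0), ('i', 1)]}
    print(parse_line("4 wreg 0 g 9"))
    # {'unit': 4, 'op': 'wreg', 'dst_file_reg': 0, 'src': ('g', 9)}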

7 Evaluation

The Black Arrow version of Unison has been evaluated using several test programs (Section 7.1) and a well-defined test setup (Section 7.2). The purpose of the evaluation is to investigate three aspects of Unison's performance. First, the quality of the produced code: how fast (Section 7.3) or small (Section 7.4) is the code produced by Unison (when optimising for minimal estimated execution time and minimal code size, respectively), and how good are these results in comparison to those of Bach? Second, the speed of the scheduling and allocation process: how much time is required to find (possibly optimal) solutions, and is this time reasonable in relation to the code quality? Third, the impact of different techniques and strategies for solving: which version of the constraint model (and associated solver and solving strategy; see Section 4.5 and Section 4.6) gives the best solutions to Black Arrow problems in the shortest time? At the end of the chapter (Section 7.5), all three aspects of the results of the evaluation will be discussed.

7.1 Test Programs

The test programs used in the evaluation can be divided into two sets. The main set is called Set A and consists of ten programs, each of which is an implementation of a common algorithm or a combination of two such algorithms (see Table 7.1). The programs in Set A were implemented in Kudzul-C (see Section 2.3), either from scratch or through adaptation of algorithms from [18] or programs from [29], as indicated in Table 7.1. The Kudzul-C programs were converted to the IL format by Bach and then to the LLVM MIR format (according to the process described in Section 6.1) before they were used as input to Unison. The algorithms implemented by the programs in Set A do not belong to the signal processing field (the intended application field of Black Arrow), but were chosen for several other reasons: they process integer data (which is the type of data that the studied version of Black Arrow can handle), they can rather easily be implemented in just one function (which is what Kudzul-C allows and also the largest unit that is considered in Unison's processing), and they include different (combinations of) control structures. As Table 7.1 shows, the programs are somewhat limited in size (number of instructions, number of blocks, and number of instructions per block), but even so they should provide some insight into the performance of Unison when it is run on Black Arrow code. Set B complements Set A with ten automatically generated programs (see Table 7.2). These contain no control structures and thus each program consists of only one basic block. When given as input to Unison, the programs contain only const, iadd, isub, and print instructions. Through a series of additions and subtractions of start values (both hardcoded and given as arguments) and intermediate values, each

program                                            instructions  blocks  instructions  instructions in
                                                                         per block     largest block
breadth-first search (bfs) [18]                    94            23      4.09          18
counting sort (cs) [18]                            93            18      5.17          30
fibonacci (fib)                                    14            4       3.50          6
heapsort (hs) [29]                                 174           37      4.70          28
heapsort + matrix multiplication (hs + mm) [29]    239           54      4.43          28
matrix multiplication (mm)                         70            20      3.50          24
mergesort (ms) [29]                                145           49      2.96          10
mergesort + matrix multiplication (ms + mm) [29]   210           66      3.18          24
quicksort (qs) [29]                                112           21      5.33          18
quicksort + matrix multiplication (qs + mm) [29]   178           38      4.68          24

Table 7.1: Test programs in Set A. All figures refer to the LLVM MIR version of the program used as input to Unison. Sources of algorithms and programs are indicated by [18] and [29].

program  instructions
A        69
B        73
C        66
D        78
E        63
F        59
G        70
H        72
I        62
J        67

Table 7.2: Test programs in Set B, all consisting of one block each. All figures refer to the LLVM MIR version of the program used as input to Unison.

program calculates and prints an integer value. The programs are not that large in comparison to the ones in Set A, but since they consist of only one block each, the blocks are larger than in the programs in Set A. Since code that will be run on Black Arrow will probably often have rather large blocks [24], and since the programs in Set B (as will be shown) have a larger potential for parallel execution, Set B is an important complement to Set A, even though the programs in the set do not implement any useful algorithms.

7.2 Test Setup

The evaluation concerns the Unison version of 11 September 2017, available in a repository at https://github.com/unison-code/unison. As described in Section 4.5 and Section 4.6, two versions of the Unison constraint model have been evaluated and compared to each other: one Gecode version (using a decomposition-based solving strategy, with a monolithic strategy as a fallback) and one MiniZinc version (using a

monolithic solving strategy and either a bottom-up or a top-down strategy for variable and value selection). The Gecode version of the model was solved using Gecode 6.0.0, while the MiniZinc version was solved using the lazy clause generation solver Chuffed [17] (bundled with MiniZinc 2.1.6). The reason for using Chuffed to solve the MiniZinc version of the model is that this solver has previously been able to efficiently solve constraint problems posed by Unison [8]. The versions of Unison, Gecode, MiniZinc, and Chuffed that were used in the evaluation are not the most recent ones at the time of writing, but it is reasonable to believe that changes in later versions are small enough for the main conclusions of the evaluation to still be valid.

The solvers were mostly run with the standard settings provided by Unison. Based on previous experience of the Unison developers [8], Chuffed was instructed to do free search, that is, to alternate between the user-specified search (bottom-up or top-down) and so-called activity-based search at search restarts. The time-out for Chuffed was set to one hour. Gecode provides more fine-grained possibilities to set time limits, something that in general is an advantage, but also somewhat complicates comparisons between the two solvers. Here, the decomposition-based strategy in Gecode was given a global time budget of 400 milliseconds per operation (for the master problem) and a local time limit of 150 seconds (for the subproblems). The time budget for the monolithic strategy in Gecode was 10 milliseconds per operation.

Since preliminary tests showed that solving times were stable between runs, the evaluation results presented here are based on just one run per combination of solver, test program, and optimisation criterion. There is always some variation in solving times between runs, but since the main point of the evaluation is not the exact solving times for certain combinations, but rather to give an overall picture of the performance of the different solvers and solving strategies for different test programs and optimisation criteria, it should not be a problem that this variability is not taken into account.

Preliminary tests also showed that Unison sometimes produces incorrect results (for example, RET instructions placed in the middle of functions or impossibly low values of the objective function) when processing programs consisting of just one block. Therefore, Unison was instructed to split every program in Set B into two blocks, the first one consisting of the first 40 instructions and the second one of the remaining instructions, to optimise each block, and then to combine the solutions for the two blocks. This procedure decreases the difference in block size between Set A and Set B and might lead to suboptimal solutions. However, the largest blocks in Set B are still larger than the largest blocks in Set A, so Set B still earns its place in the evaluation.

Bach can use modulo scheduling [37], which is a compiler technique to achieve software pipelining (interleaving of loop iterations to reduce execution time). Since modulo scheduling is not implemented in Unison, this functionality was turned off in Bach, in order to make comparisons between Bach and Unison easier and fairer. All tests were run under Ubuntu 16.04.1 LTS on a computer equipped with a 2.67 GHz four-core Intel Core i5-750 processor and 15.6 GiB of RAM. Three different optimisation criteria were used: minimal (estimated) execution time and two variants of minimal code size.
These criteria will be further described in Section 7.3 and Section 7.4.

7.3 Optimising for Minimal Execution Time

The optimisation for minimal estimated execution time uses the objective function and the execution frequency estimates discussed in Section 4.5. When optimising for minimal execution time, Chuffed using the bottom-up strategy needs less than 20 minutes to solve the instruction scheduling and register allocation problems for each of the programs in Set A to optimality (marked by bold values of the objective function in Table 7.3). For most of the programs, the time needed is much shorter than that. (The times indicated in Table 7.3 include only the solving process described in Section 4.6, not any other processing.) If the solving is done using Gecode or the top-down strategy of Chuffed, however, then the solving process often times out before an optimal solution, or a solution at all, is found. For every program for which an optimal solution was not found (except one), Gecode was run for more than 40 minutes (2 400 seconds), often considerably more, before timing out (or before using up all available memory).

The results for Set B are only partly similar to those for Set A (see Table 7.4). Chuffed using the bottom-up strategy manages to find optimal solutions for four of the programs in less than four minutes each, but for the other programs, no solution at all is found within one hour. (Chuffed using the top-down strategy was not applied to the programs in Set B, since the results for Set A indicated that it was unlikely that this would result in any new solutions.) Gecode finds solutions for five programs (the same programs as Chuffed, and one additional), but none of these are optimal, and for all of them, Gecode uses considerably more time than Chuffed. Also, the Gecode process is often killed before timing out, because it has used up all available memory. However, Gecode reports a lower bound on the value of the objective function (see Table 7.4), even if no solution is found. For the programs for which Chuffed finds an optimal solution, the lower bound is equal to, or very close to, the optimal value. This suggests that lower bounds found by Gecode are rather tight and thus give a good idea about optimal values of the objective function.

Tests where the time limits for Gecode were considerably increased for some of the programs in Set A did not result in any new solutions, so it seems unlikely that Gecode can produce more solutions without using unreasonable amounts of time. Of course, it might be possible to improve the results of Gecode and the top-down strategy of Chuffed by other adjustments of model or settings, but given the presented results and the fact that programs could be larger than the ones used in this evaluation (and thus generally require more solving time), the bottom-up strategy of Chuffed seems to be the strongest candidate for efficient solving.

As mentioned in Section 4.5, the objective function when Unison optimises for minimal estimated execution time is the sum of the issue cycle (within the block) of the last operation in each block, weighted by an estimated execution frequency of the block. However, this last operation is not visible in the input to or the output from Unison. Instead, what is considered the last operation is an out operation that is only used internally by Unison and that gets a cycle of its own. (Examples of out operations and corresponding in operations in the beginning of blocks can be seen, for example, in Figure 4.2.) Each such cycle increases the value of the objective function.
Furthermore, Unison eliminates empty blocks (present in the input) from the output, but these are still present during the optimisation. This means that each empty block also increases the value of the objective function.

program   Chuffed bottom-up      Chuffed top-down      Gecode
          best value   time (s)  best value  time (s)  best value   time (s)  end cause
bfs         3 849         54.8   –           time out     4 683       656.5   time out
cs            729         37.3   729          1 644.4       774     2 621.4   time out
fib            66          3.9   66               3.6        66        13.5
hs          6 922        181.6   –           time out     7 176     5 446.8   time out
hs + mm    25 293        408.9   –           time out    27 880    14 460.7   time out
mm         18 396         40.8   –           time out    19 560     3 257.2   time out
ms         73 413        301.4   –           time out    81 545    11 078.4   time out
ms + mm    91 784      1 083.7   –           time out    –         13 247.7   out of memory
qs          2 806         74.0   –           time out    –          4 570.7   time out
qs + mm    21 176        209.3   –           time out    –          8 417.9   time out

Table 7.3: Solving times and best achieved values of the objective function for the test programs in Set A, when optimising for minimal execution time. Values of the objective function that are proven to be optimal are shown in bold.

program   Chuffed bottom-up      Gecode
          best value   time (s)  best value  time (s)  end cause      lower bound
A         –            time out  –            3 107.8  time out                23
B         –            time out  –            4 642.1  out of memory           27
C         –            time out  113          5 222.7  out of memory           23
D         –            time out  –            3 841.2  out of memory           25
E         38              101.3  107          5 513.3  out of memory           38
F         –            time out  –            3 125.0  time out                24
G         32              204.7  134          6 126.8  time out                32
H         28               77.5  131          5 963.0  out of memory           27
I         –            time out  –            5 389.8  out of memory           19
J         22              222.4  94           6 125.3  time out                21

Table 7.4: Solving times and best achieved values of the objective function for the test programs in Set B, when optimising for minimal execution time or minimal code size (assuming constant-size bundles). Values of the objective function that are proven to be optimal are shown in bold.

In order to make it easier to compare the results achieved by Bach and Unison, Table 7.5 shows what the value of the objective function for the optimal solutions produced by Unison for the programs in Set A would be without the out operations and the empty blocks. The numbers in Table 7.5 also take into account that Black Arrow assembly code never contains RET instructions. Rather, such instructions are artefacts of Unison's processing. Therefore, they were removed before the values in Table 7.5 were calculated, as were bundles that were made unnecessary by this removal of instructions. Bundles were considered unnecessary if they did not contain any instructions and were not needed as targets of branch instructions. Since there is only one RET instruction per program and it is uncommon for bundles with RET instructions not to be needed as branch targets, the removal of RET instructions is a much less important adjustment than the ones concerning out operations and empty blocks. Table 7.5 also shows the value of the same adjusted objective function for the output from Bach.
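
To make the adjustment concrete, the following sketch shows one way it could be computed (the data layout, function name, and instruction mnemonic are hypothetical, and branch-target information is ignored here):

```python
def adjusted_objective(blocks):
    """Recompute the execution-time objective without Unison's internal
    artefacts: out-operation cycles, empty blocks, and RET-only bundles.

    `blocks` is a list of (freq, bundles) pairs, where `bundles` is the
    block's list of cycles, each cycle being a list of instruction
    mnemonics with the in/out operations already stripped.
    """
    total = 0
    for freq, bundles in blocks:
        # Drop empty bundles and bundles holding only a RET; whether an
        # empty bundle is needed as a branch target is not modelled.
        kept = [cycle for cycle in bundles
                if cycle and not all(instr == "RET" for instr in cycle)]
        total += freq * len(kept)  # empty blocks contribute nothing
    return total

# A block executed 10 times with one real bundle, one empty bundle,
# and one RET-only bundle counts as 10.
assert adjusted_objective([(10, [["add.0.0"], [], ["RET"]])]) == 10
```
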

program      Bach     Unison   relative difference
bfs         3 687      2 953   19.9 %
cs            707        621   12.2 %
fib           115         43   62.6 %
hs          6 228      5 607   10.0 %
hs + mm    24 289     21 405   11.9 %
mm         18 093     15 811   12.6 %
ms         85 763     55 148   35.7 %
ms + mm   103 824     70 946   31.7 %
qs          2 977      2 191   26.4 %
qs + mm    21 039     17 988   14.5 %

Table 7.5: The value of an adjusted objective function (minimal execution time), when applied to the output from Bach and Unison for the test programs in Set A, and the difference between the values for Bach and Unison, in relation to the value for Bach.
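
For example, the relative difference for bfs in Table 7.5 is obtained as

\[
\frac{3\,687 - 2\,953}{3\,687} \approx 0.199 = 19.9\,\%.
\]
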

Under the assumption that the objective function gives a good estimation of the execution times of the processed programs (that is, under the assumption that the estimated execution frequencies are fairly correct), Unison seems to achieve better results than Bach. As Table 7.5 shows, the execution times for the programs in Set A are estimated to decrease by at least 10 percent, and in some cases by more than 30 percent, if Unison is used. For the programs in Set B, the value of the adjusted objective function is equal to the number of bundles (since the only block is assumed to be executed just once), which can be found in Table 7.6 for the optimal solutions produced by Chuffed. The figures for Set B indicate an even larger decrease in execution time than the ones for Set A.

One factor behind the differences in result between Bach and Unison is that Unison minimises the value of an objective function that is very similar to the one used in Table 7.5, while Bach calculates a lower bound on the number of bundles, without any frequency estimations, and strives to find a schedule that achieves this bound [24]. There is, of course, a link between the number of bundles and the execution time, but since Bach only indirectly strives to minimise the execution time, it is not very surprising that the results it achieves are worse than those of Unison. As will be discussed shortly, another factor behind the differences in result between Bach and Unison is different assumptions about how long values persist in result registers, something that leads to Bach having to schedule more instructions. This factor makes the comparison between Bach and Unison less fair.

In order to understand how Unison achieves shorter estimated execution times, it is informative to study the characteristics of the output from Bach and Unison that are presented in Table 7.6 and Table 7.7. (Refer to Figure 6.2 and Figure 6.3 for examples of actual code produced by Bach and Unison.) The first thing to note is that the output from Unison consistently contains both fewer bundles and fewer instructions than the output from Bach. (In order to make comparisons easier, the instruction numbers for Unison exclude NOP and RET instructions, since such instructions are never present in the Black Arrow assembly code produced by Bach, where a cycle in which all execution units are idle is simply left empty.) The instruction numbers for the Black Arrow assembly code produced by Bach have been adjusted to compensate for the fact that this code does not contain any instructions corresponding to the gl.w instructions that are present in the output from Unison, even though the binary representation must contain some implementation of such an operation, as described in Section 5.4. Instead, the Black Arrow assembly code contains direct references to values that are accessed over a global interconnect (see Section 6.3), but respects the latency restrictions (and thus the number of bundles) associated with such accesses. That is, it should always be possible to add operations corresponding to gl.w when going from Black Arrow assembly code to a binary representation, without increasing the number of bundles.

In Table 7.6 and Table 7.7, each number listed for Bach under “globals” is a sum over all bundles of the number of unique registers accessed over a global interconnect in each bundle of the program. This number, which has also been added to the total number of instructions for the Bach version of the program, indicates the number of global copy instructions corresponding to gl.w that would have to be added to the code, under the assumption that values do not persist in result registers, but are only available if they were placed there in the cycle before they are needed. This is the general assumption of Bach [24], while Unison assumes that values persist in all registers until they are overwritten. For Unison, the numbers listed under “globals” simply indicate the number of gl.w instructions. (All numbers for Unison in Table 7.6 and Table 7.7 concern Chuffed using the bottom-up strategy – different strategies may produce different solutions with the same value of the objective function.)

The figures presented in Table 7.6 and Table 7.7 have not been corrected for some factors not directly related to instruction scheduling and register allocation that increase the difference in number of instructions between Bach and Unison. One such factor is how control structures are translated to instructions during the conversion from the IL format to the LLVM MIR format (see Section 6.1). However, most of the difference in number of instructions between Bach and Unison cannot be explained by this. The main explanation can instead be found in the rightmost part of Table 7.6 and Table 7.7: Unison produces code with considerably fewer wreg instructions than Bach does. The code produced by Unison also contains fewer global copy instructions (for all programs but one) and fewer icopy instructions (for more than half of the programs). Most of these instructions are added by Bach and Unison during the processing – the LLVM MIR versions of the test programs that are used as input to Unison contain no wreg instructions or global copy instructions and only a few icopy instructions.

One possible conclusion is thus that Unison's optimisation process means that instructions are scheduled and allocated to execution units, and temporaries are allocated to registers, in such a way that needed values are more often available when and where they are needed, without additional copy operations, something that also leads to fewer bundles (if the number of instructions per bundle is not changed). For example, Bach adds lots of wreg and copy instructions when handling loops and conditional statements [24]. However, it must be noted that the instruction numbers are also affected by the different assumptions about the persistence of values in result registers. Unlike Unison, Bach has to move values from result registers to keep them for more than one cycle.
One way of doing this is to copy the values to file registers (where Bach assumes that they persist until they are overwritten) using wreg instructions, something that is probably an important factor behind the large number of wreg instructions produced by Bach.

Apart from the instruction numbers, a reason for the difference in estimated execution time between programs processed by Unison and Bach is that Unison manages to place more instructions in each bundle.

           bundles             instructions          instrs/bundle   globals    icopy     wreg
program    B   U   decrease    B    U   decrease     B     U         B   U      B   U     B    U
E          44  35  20.5 %      93   74  20.4 %       2.11  2.11       6   8     4   2     20   1
G          47  29  38.3 %      116  92  20.7 %       2.47  3.17      12   9     7   9     27   4
H          48  25  47.9 %      125  90  28.0 %       2.60  3.60      20  13     8   4     25   1
J          31  19  38.7 %      108  88  18.5 %       3.48  4.63      15   9     6   6     20   6

Table 7.6: Characteristics of the output of Unison (U), when optimising the test programs in Set B for minimal execution time or minimal code size (assuming constant-size bundles) with Chuffed using the bottom-up strategy, and of the output of Bach (B).

           bundles               instructions           instrs/bundle   globals    icopy     wreg
program    B    U    decrease    B    U    decrease     B     U         B   U      B   U     B     U
bfs        114   64  43.9 %      208  135  35.1 %       1.82  2.11      29  11     19  24    57    1
cs         104   72  30.8 %      207  122  41.1 %       1.99  1.69      29   8     19  21    59    0
fib         25    7  72.0 %       35   17  51.4 %       1.40  2.43       2   1      9   4    13    0
hs         171  126  26.3 %      337  232  31.2 %       1.97  1.84      46  20     20  37    83    6
hs + mm    259  174  32.8 %      496  322  35.1 %       1.92  1.85      66  25     38  51   134   12
mm         102   52  49.0 %      174   92  47.1 %       1.71  1.77      22  10     22   8    55    4
ms         191  104  45.5 %      328  186  43.3 %       1.72  1.79      32  23     28  12   104    7
ms + mm    279  152  45.5 %      489  275  43.8 %       1.75  1.81      53  35     46  20   156   11
qs         106   67  36.8 %      234  154  34.2 %       2.21  2.30      35  21     14  12    63    7
qs + mm    195  114  41.5 %      386  240  37.8 %       1.98  2.11      53  27     31  22   110   11

Table 7.7: Characteristics of the output of Unison (U), when optimising the test programs in Set A for minimal execution time or minimal code size (assuming constant-size bundles), and of the output of Bach (B).

For Set A, Unison achieves on average 1.97 instructions per bundle (arithmetic mean), while Bach reaches 1.85. For Set B, the corresponding figures are 3.38 and 2.67. However, if the fibonacci program, which is considerably shorter than all the others and which exhibits an unusually large difference between Unison and Bach, is excluded, then the difference for Set A almost disappears: 1.92 instructions per bundle for Unison and 1.90 for Bach.

It is interesting to note that the number of instructions per bundle is rather small for all the programs, considering that a bundle can theoretically contain 24 instructions (see Section 5.5). On the one hand, this means that the number of instructions per bundle may be considered an effect of the test programs rather than of the tool used, that the test programs do not fully use Black Arrow's potential for parallel execution, and that it is probably possible to construct programs with more parallelism. On the other hand, there is presumably a limit to how many of the execution units a (real-world) program can keep busy at the same time.
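
As a sanity check, the Set A averages above can be recomputed from the instructions-per-bundle columns of Table 7.7 (a minimal sketch; the values are simply copied from the table):

```python
# Instructions per bundle for Set A, copied from Table 7.7.
bach   = {"bfs": 1.82, "cs": 1.99, "fib": 1.40, "hs": 1.97, "hs+mm": 1.92,
          "mm": 1.71, "ms": 1.72, "ms+mm": 1.75, "qs": 2.21, "qs+mm": 1.98}
unison = {"bfs": 2.11, "cs": 1.69, "fib": 2.43, "hs": 1.84, "hs+mm": 1.85,
          "mm": 1.77, "ms": 1.79, "ms+mm": 1.81, "qs": 2.30, "qs+mm": 2.11}

def mean(values):
    return sum(values) / len(values)

print(round(mean(unison.values()), 2), round(mean(bach.values()), 2))
# 1.97 1.85

# Excluding the fibonacci program, the difference almost disappears.
print(round(mean([v for k, v in unison.items() if k != "fib"]), 2),
      round(mean([v for k, v in bach.items() if k != "fib"]), 2))
# 1.92 1.9
```
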

7.4 Optimising for Minimal Code Size

As discussed in Section 4.5, one of two built-in optimisation criteria in Unison is minimal code size. However, Unison's definition of code size is not compatible with the intended binary representation of Black Arrow instructions.

Unison's definition is based on the assumption that the binary representation of an instruction bundle can vary in size depending on the number of instructions included. That is, the instruction bundle of a cycle where more subunits are supposed to execute instructions would be encoded using more bits than a bundle of a cycle where fewer subunits are active (at least provided that each instruction is encoded using the same number of bits). Black Arrow, on the other hand, is not supposed to use such varying-size bundles. As mentioned in Section 5.5, the binary representation of Black Arrow instructions has not yet been fully decided, but in order not to complicate the hardware, the intention is to let each bundle have the same size in the binary representation [24], regardless of the number of executing subunits.

Using such a representation, optimising for minimal code size is equivalent to optimising for minimal number of bundles. In turn, optimising for minimal number of bundles is equivalent to optimising for minimal execution time, provided that the execution frequency of each block is set to 1 (instead of an estimated frequency). The reason for this is that when Unison optimises for minimal execution time, the cost of each block is set to the highest cycle number of any operation (that is, to the number of bundles) in the block (see Section 4.5). As mentioned in Section 7.3, Bach works in a similar way: it tries to achieve a lower bound on the number of bundles.
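
Using the notation from Section 7.3, the constant-size code-size objective is thus the execution-time objective with every block frequency fixed to one:

\[
\mathit{size}_{\mathrm{const}} \;=\; \sum_{b \,\in\, \mathit{Blocks}} c_{\mathit{out}(b)} \;=\; \mathit{cost}\,\big|_{\mathit{freq}(b)\,=\,1}.
\]
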
The results of optimising using the two definitions of minimal code size will now be presented. The results for constant-size bundles can be found in Section 7.4.1 and the results for varying-size bundles in Section 7.4.2.

7.4.1 Optimising with Constant-Size Bundles

Table 7.8 summarises the results obtained by Unison when optimising the programs in Set A for minimal number of bundles (that is, for minimal code size assuming constant-size bundles). Apart from the fact that all block execution frequencies were set to 1, all settings and ways of measuring were the same as in the optimisation for minimal execution time summarised in Table 7.3. (For the programs in Set B, the results obtained from optimising for minimal execution time are valid also with respect to the number of bundles, since the optimisation for time assumes that the only block in each of these programs has an execution frequency of 1.)

As Table 7.8 shows, Chuffed using the bottom-up strategy finds optimal solutions for each program in about the same time as when optimising for minimal execution time. The differences are somewhat larger for Chuffed using the top-down strategy and for Gecode (no solutions found in two additional cases, more varying solving times), but still, the similarities between the results presented in Table 7.3 and Table 7.8 are striking. This is not surprising, considering that the objective function is in practice the same, except for the block frequency factor.

In fact, the similarities are even larger than Table 7.3 and Table 7.8 reveal. It turns out that for the programs tested here, the optimal solutions found by Chuffed using the bottom-up strategy are identical, regardless of whether the optimisation concerns minimal execution time or minimal code size with constant-size bundles. In other words, any solution for one of these programs that is optimal with respect to one of the objectives is optimal also with respect to the other. This is not very surprising, considering that execution frequencies are on the block level and that instructions are not moved between blocks.

program   Chuffed bottom-up     Chuffed top-down      Gecode
          best value  time (s)  best value  time (s)  best value  time (s)  end cause
bfs        87            54.7   –           time out  100           657.9   time out
cs         90            37.3   –           time out   99         2 122.8   time out
fib        12             4.1   12               3.9   12            16.9
hs        163           183.1   –           time out  174         5 862.1   time out
hs + mm   228           401.7   –           time out  251        14 404.5   time out
mm         72            42.8   –           time out   85         2 823.5   time out
ms        153           300.8   –           time out  –          12 102.9   time out
ms + mm   218         1 206.8   –           time out  –          15 775.0   out of memory
qs         88            77.2   –           time out  –           5 011.1   time out
qs + mm   152           209.7   –           time out  –           8 304.6   time out

Table 7.8: Solving times and best achieved values of the objective function for the test programs in Set A, when optimising for minimal code size, assuming constant-size bundles. Values of the objective function that are proven to be optimal are shown in bold.

That is, different ways of adding instructions such as icopy, wreg, and gl.w (resulting in different numbers of bundles in the blocks) are the only factor that could make the frequency of any block significant in distinguishing between optimal and non-optimal solutions. Apparently, the frequencies are totally insignificant for the programs tested here.

This means that all figures in Table 7.7 are valid not only for optimisation for minimal execution time, but also for optimisation for minimal code size with constant-size bundles. Most importantly, the bundle numbers for the Unison solutions that are given there are values of an adjusted objective function for minimal code size, where out operations, empty blocks, RET instructions, and unnecessary bundles are not included, as described in Section 7.3.

Table 7.6 and Table 7.7 show that the output from Unison generally contains considerably fewer bundles than the output from Bach. The decrease is always more than 20 percent, and for half of the programs, the decrease is more than 40 percent. As in the optimisation for minimal execution time, the better results achieved by Unison are due to smaller total numbers of instructions and, in many cases, larger numbers of instructions per bundle. However, as described previously, Unison has an advantage in the assumption that values persist in all registers until they are overwritten, since this means that fewer instructions are needed.

As a middle ground between using constant-size and varying-size bundles, it might be possible to use a compressed binary representation where the content of several bundles can be fit into one constant-size group of bits. Considering that a bundle can theoretically contain 24 instructions (see Section 5.5), while the number of instructions per bundle is not above five in any of the Unison solutions, there seems to exist a large potential for such compression, at least for the programs used in this evaluation.

Optimising for minimal code size assuming constant-size bundles instead of varying-size ones is beneficial in two ways: first, of course, because bundles are actually planned to have a constant size in the binary representation; second, because it turns out that the constant-size optimisation is much easier for Unison than the varying-size one (as indicated by the time needed to find an optimal solution). However, the optimisation results for varying-size bundles are also interesting, because they can indicate how large the potential is to shrink the binary code by using varying-size bundles. Furthermore, the difference in solving times and its possible causes are interesting in their own right.

program   Chuffed bottom-up     Chuffed top-down      Gecode                  lower   optimal solution
          best value  time (s)  best value  time (s)  best value  time (s)    bound   w.r.t. execution time
bfs       –           time out  –           time out  1 303        3 927.8    1 242   1 263
cs        1 008           39.8  –           time out  1 034        1 618.8    1 007   1 010
fib       233              4.5  233              3.9  233             57.5            233
hs        –           time out  –           time out  –            6 479.8    2 011   2 032
hs + mm   –           time out  –           time out  –           15 928.3    2 901   2 938
mm        –           time out  –           time out  1 121        2 380.7    1 065   1 076
ms        –           time out  –           time out  –           11 671.9    2 541   2 562
ms + mm   –           time out  –           time out  –           13 103.1    3 431   3 467
qs        –           time out  –           time out  –            4 501.4    1 168   1 186
qs + mm   –           time out  –           time out  –            8 443.7    2 059   2 088

Table 7.9: Solving times and best achieved values of the objective function for the test programs in Set A, when optimising for minimal code size, assuming varying-size bundles. Values of the objective function that are proven to be optimal are shown in bold.

Therefore, the results of optimising for minimal code size with varying-size bundles will now be presented and discussed.

7.4.2 Optimising with Varying-Size Bundles

The problems of optimising Black Arrow code using Unison's standard definition of code size (where varying-size bundles are assumed) are apparent from the numbers given in Table 7.9 and Table 7.10. The test runs summarised there were done using the same time limits and other settings as the ones presented in Table 7.3 and Table 7.4, except that Unison optimised for minimal code size instead of minimal execution time.

The difference in performance in comparison with the previously discussed optimisation criteria is most striking for Chuffed using the bottom-up strategy, which here only managed to find solutions for two of the programs in Set A (both of which were optimal) and for one of the programs in Set B within the time-out of one hour. In the same time, Chuffed using the top-down strategy could only find a solution for one of the programs in Set A. Gecode found solutions (one of which was optimal) for four of the programs in Set A, all in less than or a little more than one hour, and for five of the programs in Set B. Among these solutions, two for Set A and four for Set B concerned programs for which Chuffed could find no solutions at all. In the cases where both Chuffed and Gecode found solutions, Gecode needed more time without finding better solutions than Chuffed.

As in the previously described tests, the time limits used led to very differing solving times between programs for Gecode. However, for all programs for which no optimal solution could be found, except two in Set A and two in Set B, Gecode was run for more than one hour. Given that Chuffed was run with a time-out of one hour, it might thus be argued that the results of Gecode and Chuffed are to some extent comparable.

program   Chuffed bottom-up      Gecode
          best value   time (s)  best value  time (s)  end cause      lower bound
A         –            time out  –            3 117.3  time out               192
B         –            time out  –            5 136.1  out of memory          196
C         –            time out  343          5 985.4  out of memory          188
D         –            time out  –            3 820.5  out of memory          201
E         249          time out  285          6 100.4  time out               184
F         –            time out  –            3 125.3  time out               182
G         –            time out  318          6 130.0  time out               193
H         –            time out  341          5 742.3  out of memory          195
I         –            time out  –            5 556.8  time out               186
J         –            time out  312          5 580.8  out of memory          190

Table 7.10: Solving times and best achieved values of the objective function for the test programs in Set B, when optimising for minimal code size, assuming varying-size bundles.

Here, every Black Arrow instruction is assumed to be encoded using the same number of bits, even though this will not be the case in a future binary representation (see Section 5.5). This means that optimising for minimal code size with varying-size bundles is equivalent to optimising for minimal number of instructions. However, the rather large values of the objective function given in Table 7.9 and Table 7.10 indicate the consumption of the bundle-width resource, which is not the same as the number of real Black Arrow instructions in the output. This is because the objective function also includes instructions that are only relevant to Unison: one RET instruction per program and, for each block in the input to Unison (see Table 7.1), one in and one out instruction (see Section 7.3).

As mentioned in Section 5.5, a RET instruction consumes all 24 units of the bundle-width resource, to guarantee that it will be alone in its cycle. This is also true for the in and out instructions, so each occurrence of one of these three instructions increases the value of the objective function by 24. Since there is always one RET instruction in each program and the number of blocks in the input is fixed, this does not affect the ranking of the solutions, but it means that the number of instructions (apart from RET, in, and out) is only a small part of the value of the objective function. Also, it is important to know that Unison does not include NOP instructions in the objective function, even though a NOP instruction consumes one unit of the bundle-width resource. This is equivalent to saying that these instructions will not require any space in a binary encoding, something that might not be realistic.
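
Summarising these width rules in our notation, the varying-size objective is the total consumption of the bundle-width resource:

\[
\mathit{size}_{\mathrm{var}} \;=\; \sum_{i \,\in\, \mathit{Insns}} w(i),
\qquad
w(i) \;=\;
\begin{cases}
24 & \text{if } i \text{ is a RET, in, or out instruction,}\\
0 & \text{if } i \text{ is a NOP,}\\
1 & \text{otherwise.}
\end{cases}
\]
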
Instead of letting Unison optimise for minimal code size with varying-size bundles, something that apparently is very time-consuming, it is possible to optimise for minimal execution time and then evaluate the found solutions using the objective function for minimal code size (or by counting the instructions in the solutions). The advantage of this is, of course, that optimisation for minimal execution time can be done much faster (see Table 7.3), while the apparent drawback is that the solutions are not guaranteed to be optimal. The rightmost column of Table 7.9 shows, for each program, the value of the code-size objective function for an optimal solution (the one found with Chuffed using the bottom-up strategy) with respect to minimal execution time.

Since optimal solutions with respect to minimal code size have not been found for most of the programs, it is generally not known how far from the optimum the solutions found using the objective function for minimal execution time are. However, the lower bounds on the values of the objective function (reported by Gecode, see Table 7.9) indicate that it might be possible to decrease the number of instructions in the programs in Set A by up to 15 percent, as can be seen for the breadth-first search program, for which the optimal execution-time solution consists of 135 instructions (see Table 7.7) and the difference between the value of the objective function (1 263) and the lower bound (1 242) is 21. The largest absolute difference (37) between lower bound and value of the objective function for any program in Set A concerns the combination of heapsort and matrix multiplication. It is an open question whether it is actually possible to find solutions with that many fewer instructions, but in a situation where memory is a critical resource, it might be worth spending extra solving time to find a solution that achieves, or at least comes closer to, the lower bound.

In other cases, it might be enough that the number of instructions is decreased already by the use of Unison instead of Bach (even though this, as previously discussed, is partly due to different assumptions about how long values persist in registers) and that the execution-time solutions are often better (consist of fewer instructions) than the solutions found in a reasonable time using the objective function for minimal code size. This can be seen in Table 7.11 and Table 7.12, where the instruction numbers for Unison's execution-time solutions and for Bach are repeated from Table 7.7 and Table 7.6, and the corresponding numbers for the best solutions found using the code-size objective function are given. Table 7.11 and Table 7.12 also show the number of NOP instructions for the Unison solutions and the number of empty bundles (corresponding to NOP instructions) for the Bach solutions (not considering that some of the bundles that are claimed to be empty would in fact be non-empty if explicit global copy instructions were added to the Bach solutions).

We see that the instruction numbers are consistently smaller for the Unison solutions for Set A, and that this difference between Bach and Unison would still hold given a binary encoding where NOP instructions (empty bundles) require space – even if all the bundles that are claimed to be empty in the Bach solutions were in fact non-empty. For Set B, the solutions found by Unison when optimising for minimal execution time have fewer instructions than those found by Bach, while the solutions found by Unison when optimising for minimal code size have more instructions. (Since the processing was interrupted because all memory was used, it is not known how many NOP instructions there are in the latter solutions for the H and J programs.)

           instructions                NOPs (empty bundles)
program    B    U (time)  U (size)     B    U (time)  U (size)
bfs        208  135       175          18    4         3
cs         207  122       120          20    8         8
fib         35   17        17           7    0         0
hs         337  232       –            26   16        –
hs + mm    496  322       –            46   17        –
mm         174   92       137          25    1        12
ms         328  186       –            26    9        –
ms + mm    489  275       –            46   10        –
qs         234  154       –            12    3        –
qs + mm    386  240       –            32    4        –

Table 7.11: Numbers of instructions (excluding NOP instructions) and numbers of NOP instructions or empty bundles for the test programs in Set A – for Bach's solutions, for Unison's optimal solutions with respect to execution time, and for Unison's best solutions with respect to code size (varying-size bundles).

           instructions                NOPs (empty bundles)
program    B    U (time)  U (size)     B    U (time)  U (size)
E           93   74       165           3    0        15
G          116   92       198           6    0         8
H          125   90       221           7    0        –
J          108   88       192           4    0        –

Table 7.12: Numbers of instructions (excluding NOP instructions) and numbers of NOP instructions or empty bundles for the test programs in Set B for which Unison could find an optimal solution with respect to execution time. The figures concern Bach's solutions, Unison's optimal solutions with respect to execution time, and Unison's best solutions with respect to code size (varying-size bundles).

7.5 Discussion

As stated in the beginning of this chapter, this evaluation aims to investigate three aspects of Unison's performance: code quality, solving time, and the impact of different techniques and strategies for solving. As regards code quality and solving time, the most important conclusion is that, according to the tests performed on real algorithms (Set A), Unison can indeed, within times that should in many cases be acceptable, produce optimal solutions to instruction scheduling and register allocation problems for Black Arrow. This conclusion is valid for two similar criteria: minimal estimated execution time and minimal code size, assuming constant-size bundles. In the case of optimisation for minimal code size, assuming varying-size bundles, reaching optimal solutions within reasonable time is harder, but a shortcut to seemingly good solutions with respect to this goal is to optimise for minimal execution time instead.

The positive results of testing Unison on real algorithms contrast with the tests on the generated programs in Set B, for which Unison often has a hard time finding solutions. The programs in Set B have larger blocks and a larger potential for parallel execution than the ones in Set A. This indicates that real-world code, which is not unlikely to have these characteristics, might be harder to handle than the programs in Set A. Since the block is an important unit of solving (defining the subproblems to be solved, as discussed in Section 4.6), a hypothesis is that the main cause of the difficulties is the size of the blocks. However, this hypothesis is challenged by two facts. First, Chuffed quickly found optimal solutions with respect to execution time for four of the programs in Set B, even though these were about the same size as the others in the set. Second, there is no apparent connection between block size and solving time in the results of applying Chuffed using the bottom-up strategy to the programs in Set A, where solutions were found for all programs in the case of two of the optimisation criteria (see Table 7.1, Table 7.3, and Table 7.8). If just one of the factors in Table 7.1 must be chosen as explanation for the solving times for the programs in Set A, then the total number of instructions and the number of blocks seem to be much stronger candidates than the average number of instructions per block and the number of instructions in the largest block.

In a previous study, Unison was evaluated on three processor architectures: Hexagon V4 (Hexagon), ARMv6 (ARM), and MIPS32 (MIPS) [13]. Hexagon is a VLIW digital signal processor and thus has some similarities with Black Arrow. In that study, Gecode and Chuffed were run in parallel under Linux on a computer with an Intel Xeon E5-2680 processor with 12 hyperthreaded cores and 64 GB of RAM (thus a better machine than the one on which the evaluation of the Black Arrow version of Unison was performed). Both solvers were run with a time limit of 15 minutes (900 seconds). For Hexagon, an optimal solution was found for 75 % of the functions within the time limit (by at least one of the solvers). For ARM and MIPS, the corresponding figures were lower. While the majority of the functions for which optimal solutions were found had less than 100 instructions (and are thus called small functions in the study), optimal solutions were also found for many medium-sized functions (between 100 and 1 000 instructions). (None of the functions in the study are included in Set A or Set B.) For Hexagon, the largest function solved to optimality consisted of 647 instructions, while the corresponding figures for ARM and MIPS were 330 and 186, respectively. These figures are aggregated results from optimisation for minimal execution time and minimal code size (in Unison's standard sense, corresponding to varying-size bundles), since the results were similar for the two optimisation criteria.

Comparing these results for Hexagon, ARM, and MIPS with those for Black Arrow is not a straightforward task, since the two evaluations were made using different test programs and the solvers were run on computers with different specifications. However, at least in the case of optimisation of the programs in Set A for minimal execution time or minimal code size (assuming constant-size bundles), the results for Black Arrow are encouraging. Using the terminology from [13], all small functions in Set A were solved to optimality within 15 minutes, as were all but one of the medium-sized functions. This could be taken as an indication that the performance of Unison is not necessarily severely degraded when its ability to choose between alternative instructions is used not only for optional copy operations but also for all instructions in the input. However, even the largest functions in Set A are relatively small, and this conclusion might not hold for larger functions.

In comparison with Bach, Unison seems to be successful in finding solutions where unnecessary instructions, mainly performing different types of copying of values, are avoided, so that smaller and faster code can be produced. However, this conclusion can be questioned, since the difference between Bach and Unison in this respect is at least partly due to different assumptions about the persistence of values in registers. Regardless of this, code produced by Unison often shows other signs of higher quality, such as a smaller number of NOP instructions (empty bundles). (Note, however, that this difference is at least partly caused by the fact that the code produced by Bach does not contain explicit global copy instructions.) On the other hand, Bach might be able to produce faster code than shown in this evaluation by using its ability to perform software pipelining of loops.

Another aspect that is important to take into account when comparing Bach and Unison is that Bach is much faster. Bach compiles a program such as the ones used in this evaluation in less than a second on the same machine as Unison was run on. Note that this concerns the complete compilation process, while Unison generally needs much more time just to perform instruction scheduling and register allocation.
Given the current status of Bach and the Black Arrow version of Unison, it is thus, as is often the case, necessary to choose between short processing times and optimality guarantees.

In general, the evaluation indicates that using the lazy clause generation solver Chuffed and the bottom-up strategy to solve the MiniZinc version of the constraint model is the fastest way (among the evaluated alternatives) to reach optimal solutions (or even, in many cases, any solutions at all) to instruction scheduling and register allocation problems for Black Arrow. That is, using the decomposition-based solving strategy that is the main strategy for the Gecode version of the constraint model does not seem to pay off here, even though it in principle seems like a good idea to solve smaller problems one by one instead of one large problem. This might be related to the fact that each execution unit can only access results from a limited number of other execution units, unless a global interconnect is used. Therefore, when solving the master problem, the solver might repeatedly select registers for the global operands that make it hard or impossible to find solutions to the subproblems for one or more blocks. It might simply be the case that the decomposition-based solving does not allow enough information to be exchanged between the master problem and the subproblems for a complete solution to be found efficiently.

However, it might also be the case that the different solvers and solving strategies are best suited for different types of problems. One indication of this is that Gecode seems to be able to find solutions (although not optimal ones) to some of the problems (concerning programs from Set B or optimisation for minimal code size, assuming varying-size bundles) where Chuffed is not successful. (Note, however, that the different methods of restricting the solving time make comparisons between the solvers harder.) Also, it might be possible to find not yet tried settings for the solvers that would significantly change their performance. Finally, note that programs (functions) that have other characteristics than the ones in Set A and Set B (for example, more instructions) might have unforeseen effects on the relative performance of the different solvers and strategies.

8 Related Work

In this chapter, some previous work that is relevant to this study will be discussed. First, some tools that, like Unison, use combinatorial optimisation to perform integrated instruction scheduling and register allocation will be presented (Section 8.1). Then, the problem of assigning instructions to clusters in clustered VLIW architectures, which has similarities to the problem of assigning Black Arrow instructions to execution units, will be discussed (Section 8.2).

8.1 Combinatorial Optimisation Approaches

Many researchers have applied combinatorial optimisation techniques to the instruction scheduling and register allocation problems. Castañeda Lozano & Schulte [14] provide an extensive overview of the field, which is summarised here. The authors present several approaches that, like Unison, integrate instruction scheduling and register allocation. Unlike Unison, most of these approaches are based on integer programming, a combinatorial optimisation technique where the constraints are linear inequalities, the objective function is linear, and the variables range over integers.

An early and interesting example is the work done by Wilson et al. [41], which not only integrates instruction scheduling and register allocation, but also includes instruction selection. Like Unison, their tool inserts optional spill and copy operations. This study was followed by the work of Gebotys [28], which also integrates instruction selection but, unlike Wilson et al., performs only instruction compaction (that is, keeps a fixed sequence of instructions and decides which successive instructions to schedule in parallel) instead of full instruction scheduling. The same year, another study was made by Chang et al. [16], who include register spilling using optional load and store instructions in one of their models, but exclude instruction selection. Bashford & Leupers [5] present a tool (for instruction selection, register allocation, and instruction scheduling) that is unusual in that it, like Unison, is based on constraint programming and not on integer programming. Later studies include those done by Kästner [31], Nagarakatte & Govindarajan [33] (both including instruction scheduling and register assignment, but not instruction selection), and Eriksson & Kessler [21] (including instruction selection and a comparison of an integer programming model and a heuristic based on genetic programming).

The general trend is towards tools that are able to handle more subproblems of the instruction scheduling and register allocation problems, with Unison claimed to be the first combinatorial optimisation approach that can do all the program transformations used in state-of-the-art compilers, while scaling to medium-sized functions and generating executable code [13].

8.2 Cluster Allocation

As mentioned in Section 2.1, one way of describing the Black Arrow architecture is to say that it resembles a clustered VLIW architecture where each cluster consists of just one execution unit. Like the execution units in a clustered VLIW, the Black Arrow units have quick access to their own registers, but they can also quickly access the result registers of their neighbours. Using values produced by more distant units requires the values to be copied.

For clustered VLIW architectures, an important problem is cluster allocation or spatial scheduling, that is, to decide on which cluster each instruction is to be executed, considering both that the available computing resources should be used as efficiently as possible and that computed values should be available where and when needed, without unnecessary copy operations. This is similar to the problem of choosing execution unit (doing unit allocation) for a Black Arrow instruction among several functionally equivalent ones, especially considering that the clusters in clustered VLIW architectures are often (but not always) homogeneous, that is, that each cluster has the same structure (set of execution units) as the other clusters and thus is functionally equivalent to them.

It seems that the idea of using alternative instructions to do cluster allocation (as this study does for unit allocation) has not been studied. However, several other methods have been tried, some based on combinatorial optimisation, others not. Examples are the studies by Nagpal & Srikant [34] (homogeneous and heterogeneous clusters, not based on combinatorial optimisation), Eriksson & Kessler [21] (homogeneous clusters, integer programming), and Beg & van Beek [6] (homogeneous clusters, constraint programming).

9 Future Work

The purpose of this study has been to investigate whether it is possible to adapt Unison to the Black Arrow architecture and, if so, to evaluate the adapted version. In Chapter 5, a way of adapting Unison to Black Arrow was presented, and in Chapter 7, this adaptation was evaluated on several Black Arrow programs. Thus, an adaptation of Unison to Black Arrow is indeed possible, and according to the evaluation, the adapted version of Unison can produce code that is better than that produced by Bach (in terms of code size and estimated execution time). However, improvements are certainly possible.

In this final chapter, some possible further adjustments to the Black Arrow version of Unison will be discussed, as will the possible consequences of these adjustments for the performance of Unison and for the validity of the conclusions from the evaluation. These adjustments concern the instruction set (Section 9.1) and the way of choosing execution units for the instructions (Section 9.2). Finally, some further investigations of the behaviour of Unison when run on Black Arrow code will be suggested (Section 9.3).

9.1 Instruction Set

The configuration of the Black Arrow datapath studied here does not include any floating-point execution units, and thus the instruction set includes no instructions handling floating-point numbers, even though such instructions are often needed in programs performing signal processing. An important question is therefore how valid the conclusions about Unison's performance would be if the instruction set were expanded to include floating-point instructions. Since the instructions handled by Unison indicate data type as well as operand types and execution unit, the number of instructions in the instruction set would increase significantly. However, this change would not in itself increase the number of alternative instructions per operation, and therefore, solving times should not be affected. Unfortunately, the time needed to compile Unison itself depends heavily on the size of the target description and thereby on the size of the instruction set. Once the target description has been finalised and Unison has been compiled, this should not be a problem, but during development, when the target description changes and developers might want to repeatedly recompile and test Unison, the time needed for compilation (which can easily be on the order of hours) is important.

This study has only investigated one configuration of the Black Arrow datapath, even though the architecture allows variations in the set of execution units and their order. It might be interesting to create Unison versions for other configurations, too. These configurations might have more execution units than the one studied here, even though a very large number of units might not be useful.

If the number of execution units increased, so would the size of the instruction set (since instructions indicate execution unit). Once again, this would increase the time needed to compile Unison. Additionally, solving would probably be harder, since there would be more units, and thereby more alternative instructions, to choose from for every operation associated with any of the types of the added units. A datapath configuration with a larger number of execution units might also mean that more long moves and a more frequent use of the global interconnects would be needed, something that might make it harder to find solutions.

Considering the potential negative effects of making Unison handle a larger instruction set, it is desirable to keep the set as small as possible. One way of doing this is to make sure that it does not include any instructions that are not necessary in order to handle the data received from the compiler (Bach, in this case). When the Black Arrow version of Unison was developed, it was sometimes unclear exactly which combinations of data type and operand types might be associated with a certain instruction in the IL code produced by Bach. In order not to risk that Unison would be unable to handle code from Bach, the main principle was to include the unclear cases in the instruction set, something that probably means that some of the instructions in the set will never be used and could be removed. Such a removal of instructions would require a closer study of some parts of the implementation of Bach.

9.2 Choice of Execution Units

The evaluated way of assigning each instruction in a Black Arrow program to a certain execution unit is based on Unison's ability to choose between several instructions implementing the same operation. This ability was originally intended to be used only for alternative instructions related to optional copy operations. Unison is not 'aware' of the execution units and their neighbour relations, but only of the subunit resource consumed by each instruction and the register class of each operand. (The indication of units and subunits in the names of the instructions is merely mnemonic – from Unison's point of view, the names are arbitrary, as long as each instruction has a unique name.)

This means that, in a pair of operations where one produces a value consumed by the other (and where there is no implemented copy operation in between), the choice of instruction (and thereby, execution unit) for the consuming operation is constrained by the choice of instruction for the producing operation (and vice versa), via the fact that the result register of the execution unit where the producing instruction is executed must be in the register class of the use operand of the consuming instruction. However, there is no propagation based on the neighbour relations that directly rules out some choices of instruction (execution unit) for one of the operations once the instruction of the other has been chosen.

It is possible that explicitly taking execution units and neighbour relations into account would lead to faster solving. Unison lets the user specify ad hoc constraints in addition to the ones in the basic constraint model. A possible extension of the work presented here would be to investigate whether this mechanism can be used to model execution units and their neighbour relations and, if so, whether this leads to shorter solving times.
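
Informally, such a constraint might take the following shape (the notation is ours and is not Unison's actual ad hoc constraint syntax):

\[
\forall (p, c) \in \mathit{DirectUses}:\quad
\mathit{unit}(p) \,\in\, N(\mathit{unit}(c)) \cup \{\mathit{unit}(c)\},
\]

where DirectUses is the set of producer–consumer operation pairs with no copy operation in between, unit(o) is the execution unit chosen for operation o, and N(u) is the set of units whose result registers unit u can read directly.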

Another possibility might be to make corresponding changes to the basic constraint model. However, this might make the model too specific to Black Arrow, something that would not be in line with the idea that Unison should not be tied to a certain architecture but should be possible to adapt to different ones.

9.3 Further Investigations

While the evaluation showed that the adapted version of Unison can perform instruction scheduling and register allocation in reasonable time for many of the test programs, it also indicated some important problems. Optimisation for minimal code size, assuming varying-size bundles, seems to lead to impractically long solving times. Also, it is relatively easy to generate programs that make Unison's processing slow, regardless of the optimisation criterion used, and it might be questioned how representative the programs that were processed relatively quickly are for Black Arrow code in general.

It would therefore be interesting to construct a larger set of test programs and to investigate in more detail which characteristics of the code make solving hard. How important are, for example, the number of instructions, the number of blocks, the number of instructions per block, and the potential for parallel execution? Such investigations might lead to ideas about ways to improve the performance of the Black Arrow version of Unison that have not yet been tried. Apart from the changes discussed in Section 9.1 and Section 9.2, possible adjustments include switching solver, improving the target description, changing the solver settings, and splitting blocks to make subproblems smaller.

Bibliography

[1] A. V. Aho, M. S. Lam, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, and Tools (Second Edition). Pearson Education Limited, Harlow, United Kingdom, 2014.

[2] B. Alpern, M. N. Wegman, and F. K. Zadeck. Detecting Equality of Variables in Programs. In Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 1988, pages 1–11, New York, NY, USA, 1988. ACM.

[3] A. W. Appel and L. George. Optimal Spilling for CISC Machines with Few Registers. In Proceedings of the ACM SIGPLAN 2001 Conference on Programming Language Design and Implementation, PLDI 2001, pages 243–253, New York, NY, USA, 2001. ACM.

[4] K. Apt. Principles of Constraint Programming. Cambridge University Press, Cambridge, United Kingdom, 2003.

[5] S. Bashford and R. Leupers. Phase-Coupled Mapping of Data Flow Graphs to Irregular Data Paths. Design Automation for Embedded Systems, 4(2):119–165, Mar 1999.

[6] M. Beg and P. van Beek. A Constraint Programming Approach for Integrated Spatial and Temporal Scheduling for Clustered Architectures. ACM Transactions on Embedded Computing Systems (TECS), 13(1):14:1–14:23, Sept. 2013.

[7] O. Ben-Kiki, C. Evans, and I. döt Net. YAML Ain’t Markup Language (YAML) Version 1.2, 2009. https://yaml.org/spec/1.2/spec.html.

[8] M. Carlsson. Personal communication, 2017.

[9] R. Castañeda Lozano. Personal communication, 2018.

[10] R. Castañeda Lozano, M. Carlsson, F. Drejhammar, and C. Schulte. Constraint-Based Register Allocation and Instruction Scheduling. In Proceedings of the 18th International Conference on Principles and Practice of Constraint Programming (CP 2012), volume 7514 of Lecture Notes in Computer Science, pages 750–766, Berlin and Heidelberg, Germany, 2012. Springer.

[11] R. Castañeda Lozano, M. Carlsson, G. Hjort Blindell, and C. Schulte. Combinatorial Spill Code Optimization and Ultimate Coalescing. In Proceedings of the 2014 SIGPLAN/SIGBED Conference on Languages, Compilers and Tools for Embedded Systems, LCTES 2014, pages 23–32, New York, NY, USA, 2014. ACM.

[12] R. Castañeda Lozano, M. Carlsson, G. Hjort Blindell, and C. Schulte. Register Allocation and Instruction Scheduling in Unison. In Proceedings of the 25th International Conference on Compiler Construction, CC 2016, pages 263–264, New York, NY, USA, 2016. ACM.

[13] R. Castañeda Lozano, M. Carlsson, G. Hjort Blindell, and C. Schulte. Combinatorial Register Allocation and Instruction Scheduling. ACM Transactions on Programming Languages and Systems, 41(3):17:1–17:53, July 2019.

[14] R. Castañeda Lozano and C. Schulte. Survey on Combinatorial Register Allocation and Instruction Scheduling. ACM Computing Surveys, 52(3):62:1–62:50, June 2019.

[15] R. Castañeda Lozano. The Unison Manual. RISE SICS, 2019. https://unison-code.github.io/doc/manual.pdf.

[16] C.-M. Chang, C.-M. Chen, and C.-T. King. Using Integer Linear Programming for Instruction Scheduling and Register Allocation in Multi-Issue Processors. Computers & Mathematics with Applications, 34(9):1–14, 1997.

[17] G. Chu, P. J. Stuckey, A. Schutt, T. Ehlers, G. Gange, and K. Francis. Chuffed, a lazy clause generation solver, 2018. https://github.com/chuffed/chuffed.

[18] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms (Third Edition). The MIT Press, Cambridge, MA, USA and London, England, 2009.

[19] R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck. Efficiently Computing Static Single Assignment Form and the Control Dependence Graph. ACM Transactions on Programming Languages and Systems, 13(4):451–490, Oct. 1991.

[20] M. Davis, G. Logemann, and D. Loveland. A Machine Program for Theorem-proving. Communications of the ACM, 5(7):394–397, July 1962.

[21] M. Eriksson and C. Kessler. Integrated Code Generation for Loops. ACM Transactions on Embedded Computing Systems (TECS), 11S(1):19:1–19:24, June 2012.

[22] K.-F. Faxén. PRAX Tools version 0.5 User's Guide. RISE SICS, 2017. Unpublished.

[23] K.-F. Faxén. The Black Arrow 1.0 Instruction Set Architecture. RISE SICS, 2017. Unpublished.

[24] K.-F. Faxén. Personal communication, 2018–2019.

[25] T. Feydy and P. J. Stuckey. Lazy Clause Generation Reengineered. In Proceedings of the 15th International Conference on Principles and Practice of Constraint Programming (CP 2009), volume 5732 of Lecture Notes in Computer Science, pages 352–366, Berlin and Heidelberg, Germany, 2009. Springer-Verlag.

[26] J. A. Fisher, P. Faraboschi, and C. Young. Embedded Computing: A VLIW Approach to Architecture, Compilers, and Tools. Morgan Kaufmann (Elsevier), San Francisco, CA, USA, 2005.

[27] P. Flener. Topic 13: Consistency (lecture slides for the course Combinatorial Optimisation and Constraint Programming). Uppsala University, 2018. http://user.it.uu.se/~pierref/courses/COCP/slides/T13-Consistency.pdf.

[28] C. H. Gebotys. An Efficient Model for DSP Code Generation: Performance, Code Size, Estimated Energy. In Proceedings of the 10th International Symposium on System Synthesis, ISSS 1997, pages 41–47, Washington, DC, USA, 1997. IEEE Computer Society.

[29] GeeksforGeeks. GeeksforGeeks, 2019. https://www.geeksforgeeks.org.

[30] D. Grune, K. van Reeuwijk, H. E. Bal, C. J. Jacobs, and K. Langendoen. Modern Compiler Design (Second Edition). Springer, New York, Heidelberg, Dordrecht, London, 2012.

[31] D. Kästner. PROPAN: A Retargetable System for Postpass Optimisations and Analyses. In Proceedings of the ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems (LCTES 2000), volume 1985 of Lecture Notes in Computer Science, pages 63–80, Berlin and Heidelberg, Germany, 2001. Springer-Verlag.

[32] C. Lattner and V. Adve. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In Proceedings of the International Symposium on Code Generation and Optimization: Feedback-directed and Runtime Optimiza- tion, CGO 2004, pages 75–86, Washington, DC, USA, 2004. IEEE Computer Society.

[33] S. G. Nagarakatte and R. Govindarajan. Register Allocation and Optimal Spill Code Scheduling in Software Pipelined Loops Using 0-1 Integer Linear Programming Formulation. In S. Krishnamurthi and M. Odersky, editors, Compiler Construction, volume 4420 of Lecture Notes in Computer Science, pages 126–140, Berlin and Heidelberg, Germany, 2007. Springer-Verlag.

[34] R. Nagpal and Y. N. Srikant. Integrated Temporal and Spatial Scheduling for Extended Operand Clustered VLIW Processors. In Proceedings of the 1st Conference on Computing Frontiers, CF 2004, pages 457–470, New York, NY, USA, 2004. ACM.

[35] N. Nethercote, P. J. Stuckey, R. Becket, S. Brand, G. J. Duck, and G. Tack. MiniZinc: Towards a Standard CP Modelling Language. In Proceedings of the 13th International Conference on Principles and Practice of Constraint Programming (CP 2007), volume 4741 of Lecture Notes in Computer Science, pages 529–543, Berlin and Heidelberg, Germany, 2007. Springer-Verlag.

[36] O. Ohrimenko, P. J. Stuckey, and M. Codish. Propagation = Lazy Clause Generation. In Proceedings of the 13th International Conference on Principles and Practice of Constraint Programming (CP 2007), volume 4741 of Lecture Notes in Computer Science, pages 544–558, Berlin and Heidelberg, Germany, 2007. Springer-Verlag.

[37] B. R. Rau and C. D. Glaeser. Some Scheduling Techniques and an Easily Schedulable Horizontal Architecture for High Performance Scientific Computing. In Proceedings of the 14th Annual Workshop on Microprogramming, MICRO 14, pages 183–198, Piscataway, NJ, USA, 1981. IEEE Press.

[38] B. K. Rosen, M. N. Wegman, and F. K. Zadeck. Global Value Numbers and Redundant Computations. In Proceedings of the 15th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 1988, pages 12–27, New York, NY, USA, 1988. ACM.

[39] C. Schulte, G. Tack, and M. Z. Lagerkvist. Modeling and Programming with Gecode, 2019. https://www.gecode.org/doc-latest/MPG.pdf.

[40] J. P. M. Silva and K. A. Sakallah. GRASP–A New Search Algorithm for Satisfiability. In Proceedings of the 1996 IEEE/ACM International Conference on Computer-aided Design, ICCAD 1996, pages 220–227, Washington, DC, USA, 1996. IEEE Computer Society.

[41] T. Wilson, G. Grewal, B. Halley, and D. Banerji. An Integrated Approach to Retargetable Code Generation. In Proceedings of the 7th International Symposium on High-Level Synthesis, ISSS 1994, pages 70–75, Washington, DC, USA, 1994. IEEE Computer Society.
