Computer Engineering 2009 Mekelweg 4, 2628 CD Delft The Netherlands http://ce.et.tudelft.nl/

MSc THESIS Porting the GCC to a VLIW Vector Processor

Roel Trienekens

Abstract

Applications running on embedded DSPs are becoming increasingly complex, while the demands on speed and power continue to grow. One method of meeting these demands is to move some of the processor complexity from hardware to the compiler. This increases the importance of the role of the compiler. This thesis describes how we ported the GNU Compiler Collection (GCC) to the Embedded Vector Processor (EVP). GCC is a widely used open source compiler framework, which brings several advantages. Its open source nature gives the compiler developer insight into its inner workings and provides the means to change every aspect of the compiler. This allows great freedom in applying compiler optimizations as well as the ability to adapt the compiler to changes in the architecture on short notice. GCC has a large supporting community, delivering quick and accurate feedback, and allows the compiler developer to benefit from the improvements contributed by its members. The EVP is a Very Long Instruction Word (VLIW) vector processor, developed at NXP Semiconductors and now property of ST-Ericsson. It is an embedded processor used in mobile communications to handle GSM, UMTS, 3G and 4G standards, amongst others. The goal of this project was to provide proof of concept that an EVP back end for GCC can be written that (1) supports all the vector types of the EVP, (2) supports custom operations in the form of the EVP-C intrinsic operations (an extension to the C language designed for the EVP) and (3) takes advantage of features of the EVP such as post-increment and -decrement addressing, predicated execution and the ability to schedule operations on several different functional units. In this report we describe the implementation of support for vector data types and registers and support for EVP-C intrinsics. The Deterministic Finite Automaton instruction scheduler in GCC was adapted to schedule VLIW code.
In addition, support was implemented to schedule semantically equivalent operations on different functional units using different instruction mnemonics. For this we devised a new approach that allows us to maintain scheduling freedom while at the same time being able to split the instruction scheduling automaton into several smaller automata, in order to avoid excessive build time and compilation time. We tested the compiler using the DejaGnu testing framework, the EEMBC Telecom benchmark suite and a Fast Fourier Transform benchmark which uses EVP-C intrinsics. The experimental results obtained show that the compiler generates correct code of high quality.

Faculty of Electrical Engineering, Mathematics and Computer Science

Porting the GCC Compiler to a VLIW Vector Processor

THESIS

submitted in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

in

COMPUTER ENGINEERING

by

Roel Trienekens born in Gouda, The Netherlands

Computer Engineering Department of Electrical Engineering Faculty of Electrical Engineering, Mathematics and Computer Science Delft University of Technology

Porting the GCC Compiler to a VLIW Vector Processor

by Roel Trienekens


Laboratory : Computer Engineering Codenumber : CE-MS-2009-19

Committee Members :

Advisor: Alex Turjan, ST-Ericsson

Chairperson: Kees Goossens, CE, TU Delft

Member: Ben Juurlink, CE, TU Delft

Member: Koen Langendoen, CS, TU Delft

To my family and my girlfriend, for their support and kicking my butt when I needed it the most.

Contents

List of Figures

List of Tables

Acknowledgements

1 Introduction
  1.1 Background and Related Work
  1.2 Motivation
  1.3 Goals
  1.4 Contributions

2 Fundamentals
  2.1 EVP
    2.1.1 Architecture
    2.1.2 EVP architecture features
  2.2 EVP-C
  2.3 The GCC compiler
    2.3.1 Building the GCC compiler
    2.3.2 Intermediate representations
    2.3.3 Passes of the compiler
  2.4 Summary

3 Implementation
  3.1 Specifying the machine properties
    3.1.1 General machine properties
    3.1.2 Registers
    3.1.3 Register classes
    3.1.4 Machine modes
  3.2 Expanding Tree-SSA into RTL
    3.2.1 Defining expansion rules
    3.2.2 Function calls
    3.2.3 Function argument passing
    3.2.4 Function prologue and epilogue generation
  3.3 VLIW Scheduling
    3.3.1 The Deterministic Finite Automaton
    3.3.2 Defining the resource usage patterns
    3.3.3 Modeling the long immediate field
    3.3.4 Scheduling semantically equivalent operations
    3.3.5 Specifying scheduling priorities
    3.3.6 Inter-basic block scheduling
    3.3.7 Resource conflict avoidance
  3.4 Emitting assembly code
  3.5 Implementing Target-Specific Features
    3.5.1 Branch delay scheduling
    3.5.2 Post-increment addressing
    3.5.3 Hardware loops
    3.5.4 Alias analysis
    3.5.5 If-conversion
  3.6 Summary

4 Experimental Results
  4.1 Correctness benchmarks
  4.2 Performance benchmarks
  4.3 Impact of scheduling semantically equivalent operations on different functional units
  4.4 Exploration of the possibilities of autovectorization

5 Conclusions and Recommendations
  5.1 Impact of the Open Source nature of GCC
  5.2 Contributions
  5.3 Recommendations for future work

List of Figures

2.1 EVP functional block diagram [33]
2.2 Storage of values in double and triple vectors
2.3 Files used in building the GCC compiler executable
2.4 Compiler flow of the GCC compiler

3.1 EVP-GCC Stack frame layout
3.2 Loop restructuring to enable hardware loop generation

4.1 Basic block layout of inner loop

List of Tables

3.1 Contents of the register classes
3.2 Machine Modes used in EVP-GCC
3.3 Return value of the evp_hard_regno_mode_ok_func
3.4 Valid characters to follow the % symbol in the assembly template
3.5 Valid characters to specify the output of an operand
3.6 Incorrect branch delay scheduling

4.1 Cycle count with and without scheduling semantically equivalent operations

Acknowledgements

I would like to thank the guys at ST-Ericsson for granting me the opportunity to perform my internship and work on this project, especially Alex Turjan and Dmitry Cheresiz, for all their advice, feedback, patience and help during my time in Eindhoven and Delft. I would also like to thank Tom de Vries for his feedback and advice during the many lunch breaks at the High Tech Campus. My thanks also go out to Ben Juurlink for his help as my supervisor, and to Kees Goossens and Koen Langendoen for taking part in my graduation committee. Finally I would like to thank Anca Molnos for all the hours she put into proofreading my thesis and her advice for improvements.

Roel Trienekens Delft, The Netherlands July 3, 2009

1 Introduction

Runtime performance, power usage and chip area are major concerns in embedded processor design. Minimizing the complexity of the hardware in general improves these metrics. However, there tends to be a conservation of complexity in computer architecture [1]: decreasing the complexity of the hardware in general results in increased complexity of the compiler. In other words, tasks that the processor cannot handle (such as stalling the pipeline when a data or resource hazard is detected) must be handled by the compiler. The EVP has an exposed pipeline, which means that when an instruction is issued that requires a hardware block already in use by another instruction, the result is undefined. Therefore the compiler must take care not to issue an instruction whose resource usage conflicts with an instruction issued in the same or a previous clock cycle.

The EVP is a Very Long Instruction Word (VLIW) processor, which means it issues multiple instructions in a single clock cycle. The VLIW approach takes advantage of the instruction level parallelism (ILP) of a program that is run on the processor. The more ILP a program exhibits, the fewer clock cycles are required to execute it, thereby improving the runtime performance of the processor. Decreasing execution time by exploiting ILP in general also results in lower power usage, as the hardware is used more efficiently. Besides being a VLIW architecture, the EVP can also perform operations on vectors of data. This exploits the data level parallelism (DLP) of executed programs. The programs for which the EVP is targeted generally exhibit a high degree of DLP; 3G and 4G protocols, for example, have to process large streams of data [2][3].

Because programming VLIW assembly code directly is a difficult task taking many man-hours, programming for the EVP is done using an extension to the standard C language called EVP-C [34].
This extension has intrinsic operations that map almost directly onto assembly instructions, allowing the programmer to code efficiently, while at the same time leaving the difficult tasks of instruction scheduling and register allocation to the compiler. Using EVP-C a programmer will in general be able to produce more efficient code than using 'regular' C.

The GNU Compiler Collection (GCC) is an open source compiler framework that is widely used and supports many languages and architectures. GCC consists of a front end, which parses the input code and transforms it into a language-independent representation; a middle end, which runs a number of language- and target-independent optimization passes that transform the statements in the intermediate representation into a more optimized stream of instructions; and a back end, which performs optimizations that are specific to the target architecture as well as instruction scheduling, register allocation and assembly output generation.

In this chapter we will briefly discuss the concepts and related work relevant to this project and we will outline the goals of this project. The structure of this chapter is as follows. In section 1.1 we will discuss the background of VLIW and vector processors and the GCC compiler framework. We will also briefly discuss some of the previous work done regarding these topics. In section 1.2

we will describe the motivation for the work done in this thesis. Section 1.3 will outline the goals we want to reach during the course of this project. In section 1.4 we will outline which parts of the work were implemented as part of this thesis project, and which parts were handled by the rest of the team working on the EVP-GCC project during the same time.

1.1 Background and Related Work

In this section we will discuss the background of VLIW processors, vector processors and the GCC compiler, and we will give a brief overview of previous work related to these concepts.

Very Long Instruction Word processors. The earliest processors were capable of executing only very simple instructions, one at a time. As improvements were made, processors were able to handle increasingly complex instructions, which at some point became so complex that designing the control circuits was very difficult. This difficulty was overcome by using simpler instructions, called microcode, which resided in special high-speed memory and were translated on the fly into a series of machine instructions. This approach, called microprogramming, was proposed by M. Wilkes in 1951 [4]. One form of microprogramming is horizontal microprogramming, which involves issuing multiple micro-instructions (the smallest unit of microcode) in parallel. A drawback of microprogramming is that it needed to be hand-coded and was very machine-specific (not portable). Based on this concept, in 1981 A.E. Charlesworth proposed a processor architecture that contained multiple functional units which could each execute separate instructions [5]. Building on this, J. Fisher proposed a system which would encode horizontal micro-instructions into one single Very Long Instruction Word [6]. Based on the trace scheduling technique he had proposed earlier [7], he presented the architecture of the ELI-512, the first VLIW processor. Short for Enormous Longword Instructions, it was a processor which at the time of publication used 1200 bits in instruction encoding. Two companies founded in the mid-1980s, Multiflow by Joseph Fisher amongst others and Cydrome by Bob Rau and colleagues, both produced several VLIW processors, the first delivered in 1987.
These early VLIW processors were not a commercial success and both companies closed in the 1990s, although some concepts first introduced then can still be found in modern-day VLIW architectures. This decade, interest in VLIW architectures has increased again, resulting in a number of VLIW processors, amongst which Intel's IA-64 (or Itanium), Texas Instruments' TMS320C62x and C6000, the SHARC by Analog Devices, the TriMedia by Philips, ST's ST200 and of course the EVP.

Vector processors. The first work on vector processors was performed at Westinghouse Electric Corporation in the Solomon project. The first implementation of this vector processor consisted of a single CPU to which several math co-processors were added. All co-processors were fed a single common instruction, but they operated on different data. This was the first effective implementation of a Single Instruction Multiple Data (SIMD) machine. This work was discontinued, but picked up by the University of Illinois, where the ILLIAC IV was developed. The ILLIAC used a separate ALU for each data element, and because of this it is more often classified as massively parallel computing than as vector processing. Although this processor was not considered a success, it proved that the underlying concept was sound. The first true vector machines were the CDC STAR-100 and the Texas Instruments Advanced Scientific Computer (ASC). Because decoding vector instructions took a long time, these machines did not yet deliver the speedup that could theoretically be achieved, partly because the data comprising the vectors had to be loaded directly from memory before every vector operation.

This was especially costly because most typical workloads operate on the same vector data multiple times. The Cray-1, developed by Cray Research in 1976, avoided this problem by adding vector registers to the architecture; it was the first processor to do so. This allowed for fast execution of vector instructions. Many modern-day processors have at least some capability of performing vector operations, mostly implemented through a separate vector unit that runs beside the scalar CPU.

The GNU Compiler Collection. The GNU (GNU's Not Unix) project was started by Richard Stallman, who sought to create an operating system that would emulate Unix without the limitations that copyrighted software brings. For this project Stallman needed a compiler, and for this he turned to the Free University Amsterdam, which at the time had developed the Vrije Universiteit Compiler Kit (VUCK). He was told that 'although the University was free, the compiler was not'. He then obtained the source code of the Pastel compiler, a compiler which supported, and was written in, Pascal. He wrote a C front end for it, but discovered that the compiler required too many resources for him to port it effectively, and he decided to build his own C compiler from scratch. This compiler was later called the GNU C Compiler (GCC); as new front ends were added later on, the name was changed to GNU Compiler Collection [22], retaining the same acronym. To ensure that the source code of GCC remained free and that every user would have the freedom to modify and further distribute it, Stallman founded the Free Software Foundation [20] (FSF) in 1985. The FSF is 'owner' of all GNU software, and distributes all the software under a number of licenses, the most important being the GNU General Public License [21] (GPL). This license decrees that all software published under the GPL can be freely modified and redistributed, as long as any resulting software also falls under the GPL.
This availability is known as free or open source software, and the distribution model is known as copyleft, a term that emphasizes protecting the freedom of the user, as opposed to copyright, which emphasizes the limitations on the user while protecting the rights of the original authors. At the time of this writing, the current release of GCC is version 4.4.0, with development under way on version 4.5.0. Throughout the years a number of different language front ends have been added, as well as ports to different architectures. In addition, each new release featured a number of changes which improved the performance and compile time of code compiled by GCC, most notably the introduction of the middle end and the Tree-SSA representation [12] used therein. The middle end contains a number of language- and target-independent optimization passes.

GCC was originally designed for single-issue processors with hardware interlocks. This means the compiler emits one assembly instruction per clock cycle. The instruction scheduler takes into account dependencies between instructions in order to reduce the need for pipeline stalls. However, in case no instruction can be issued, GCC relies on the processor hardware to generate the stalls, and therefore the generated assembly code does not contain any nops.

The most notable VLIW architectures for which a GCC back end was developed are the Itanium architecture by Intel [8] and the TMS320-C6000 (C6x) by Texas Instruments [9]. The GCC back end for the C6x was developed at the Faculty of Computer Science at Chemnitz University of Technology by Jan Parthey and Adrian Strätling [10][11]. Neither of these architectures contains vector registers besides those needed for multimedia extensions.

Since version 3.1 GCC also supports vector types. These were added to support multimedia extensions on certain architectures without the programmer having to resort to inline assembly code. This
support can also be used to implement regular vector registers (not just those used for multimedia extensions).

1.2 Motivation

GCC is an interesting candidate compiler to target to the EVP for a number of reasons. First, it is free, as mentioned earlier, which gives full access to the source code and the freedom to modify it in any way the compiler developers might like. Second, GCC has a large community actively developing the compiler, which leads to a large number of ongoing projects to improve it. This also allows easy feedback directly to and from the developers of a particular piece of the compiler, which in turn allows a better understanding of the inner workings and the ideas behind certain design choices. Third, GCC recently gained support for autovectorization of code, developed by the IBM team at the Haifa Labs [13][14][15][16]. Autovectorization analyzes loops in the code and tries to rewrite them using vectors, allowing a vector processor to profit from its vector processing capabilities. Fourth, direct access to the source code allows the compiler developer to adapt the compiler to any changes made to the architecture on short notice. Beyond tracking architecture changes, this also enables design space exploration for upcoming versions of the processor, potentially improving the quality of future versions of the processor.

1.3 Goals

The considerations presented in the previous section have resulted in the development of an EVP back end for GCC (EVP-GCC). At the start of this project EVP-GCC only supported scalar code. The goal of the project can be summarized as follows:

• The goal of this project is to extend the current version of the EVP-GCC compiler to include support for vector instructions and EVP-C intrinsic operations, and to evaluate the performance relative to the current EVP production compiler.

In order to achieve this goal a number of subgoals can be defined:

• Extend the compiler to support vector data types and vector registers.

• Extend the compiler to support the EVP-C extension to the C language by means of implementing support for a number of EVP-C intrinsic operations. The specific instructions that must be implemented are all EVP-C intrinsics used in the vector benchmark code.

• The EVP supports post-increment and post-decrement load and store operations. Support for generation of these instructions must be implemented in the compiler.

• Some operations can be issued on several different functional units. Support for all alternatives of these instructions must be implemented.

• Enable the autovectorization in GCC to vectorize loops in out-of-the-box benchmark code and measure the performance increase in terms of cycle count compared to non-vectorized code.

• The generated code for the benchmarks should be correct.

• Measure the performance of the generated code for the chosen benchmark.

In order to achieve these goals, benchmarks were selected for both the scalar part and the vector part. The version of EVP-GCC available at the start of our project had been benchmarked using the EEMBC Telecom suite [26], a set of benchmarks consisting of applications relevant to the mobile and telecommunication market, and therefore relevant to the EVP. This suite was also used to test autovectorization of out-of-the-box code. For the vector part we used an application that performs an FFT on a number of data sets.

1.4 Contributions

In this section we will give an overview of which topics detailed in this report were implemented as part of this thesis project. Work done by the other members of the team is also presented for completeness. The following topics were implemented as part of the thesis project:

• We added the vector data types for all different vector registers present in the EVP. All vectors used in the benchmark code have a width of 256 bits, so support for any other vector sizes was not implemented as part of this thesis work. The implementation of vector data types and registers is detailed in section 3.1.2.

• Adding vector registers caused problems with misaligned loads and stores in the function prologue and epilogue generation. This is the code generated at the beginning and end of a function that saves to and restores from the stack the register values the function overwrites, so that values needed after the function returns are not lost. We added support to fix this problem. This is presented in section 3.2.4.

• We implemented support for all vector intrinsics needed to successfully compile the FFT code used to benchmark the vector part of the compiler. This is detailed in section 3.2.1.3.

• We have implemented a novel approach to schedule semantically equivalent instructions on several different functional units, based on the instructions already scheduled in the same cycle. Due to the complexity of the generated finite state automaton, implementing this in the way it is traditionally handled in GCC resulted in a state machine that was far too large. Our approach enabled us to split the scheduling automaton into several smaller automata, without reducing the freedom of the scheduler. This was one of the main parts of this thesis and is detailed in section 3.3.4.

• We added support for the post-increment and -decrement addressing offered by the EVP. This is detailed in section 3.5.2.

• We added verbose comments to the assembly code emitted by the compiler. This allows programmers to interpret the generated assembly code, as well as the scheduling choices the compiler made, more efficiently. This is described in section 3.4.

• We changed the way register constraints in instruction matching patterns (the rules for allocating a register or class of registers in an instruction) were defined in the existing EVP-GCC compiler to use the define_constraint construct instead of defining constraints through macros. This is detailed in section 3.2.1.1.

• Several improvements were made in the branch delay scheduler, detailed in section 3.5.1.

• To ensure correctness of the compiler with the newly added features, a number of test cases were added. This is described in section 4.1.

• Many of these additions required defining new macros in the EVP back end. These macros are detailed in section 3.1.

• The inter-basic block scheduling of the existing EVP-GCC compiler did not take into account resource constraints. The div_to_end_bb_pass was added to fix this. This is detailed in section 3.3.7.

The rest of the team working on EVP-GCC also made a number of improvements that are not part of this thesis project, but are discussed briefly in this report for completeness. These are the following:

• Support for hardware loops. This is a method of specifying the beginning, end and number of iterations of a loop in order to avoid having to fill delay slots for each jump back to the beginning of the loop. This is detailed in section 3.5.3.

• Address-based alias analysis was added. This approach, detailed in [30], is at the moment not implemented in GCC, but results in better alias analysis, which in turn yields better-performing code. This is described in section 3.5.4.

• The existing if-conversion was improved. If-conversion is a technique where branches resulting from if-statements are replaced by instructions that are always executed, but whose result is only written to a register if the condition register holds a true value. Details on this can be found in section 3.5.5.

• Inter-basic block scheduling was enabled. This is a method of instruction scheduling that takes into account instructions in all basic blocks that are predecessors of the basic block currently being scheduled. This is detailed in section 3.3.6.

This document is split up into five chapters. Chapter 2 gives an overview of the processor architecture, the compiler framework and the EVP-C language extensions. Chapter 3 describes the changes made to the back end to implement all features of the EVP processor. Following this, chapter 4 gives the experimental results of the correctness benchmarks, and presents performance measurements of the GCC compiler with the newly implemented scheduler compared to GCC without this addition. We will also present the results of our experiment with using the autovectorizer to vectorize code from the EEMBC Telecom benchmark for the AltiVec back end. Due to contractual obligations it is not possible to publish the results of the performance benchmarks compared to the current production compiler used by ST-Ericsson; this information is supplied separately and is only available to the members of the graduation committee. Chapter 5 contains conclusions and recommendations for further work, as well as an overview of the impact of the open source nature of GCC on our project.

2 Fundamentals

In this chapter we will give an overview of the fundamentals of the architecture of the EVP (section 2.1), the additions to the C standard that allow more efficient application coding, called EVP-C (section 2.2), and a global overview of GCC (section 2.3).

2.1 EVP

The Embedded Vector Processor (EVP) [31][32] is an embedded VLIW vector DSP that was developed inside NXP Semiconductors, and is now property of ST-Ericsson. The EVP was developed to support 3G and 4G standards, for use in SoC solutions for the mobile and wireless markets. 3G and 4G standards require the processing of streams of data, and for this the EVP supports operations on vectors with up to 16 elements of up to 40 bits in width. This section explains the organization of the EVP and its capabilities.

2.1.1 Architecture

This section gives an overview of the architecture of the EVP. First we describe the functional units and registers of the EVP, followed by the layout of data in vectors; we then discuss the VLIW aspect of the EVP architecture and the assembly syntax.

2.1.1.1 Functional units

An overview of the functional units is given in Figure 2.1. The functional units in the EVP can be divided into four blocks.

PCU: The Program Control Unit (PCU) handles decoding of the VLIW instructions, takes care of branching and keeps track of the state of hardware loops. Because the EVP features an exposed (non-interlocked) pipeline, there is no hardware to stall parts of the pipeline. Instead, this task is left to the compiler. The way in which this is handled in GCC is explained in section 3.3. The interlocking hardware is omitted mainly for design simplification, but its absence also reduces the chip area of the EVP, which in turn reduces power usage, an important aspect of embedded processor design. The PCU also uses parts of the SDCU to perform conditional execution, in order to support conditional branches and calls.


Figure 2.1: EVP functional block diagram [33]

SDCU: The Scalar Data Computation Unit (SDCU) executes the scalar instructions. It consists of two register files (for details on all register files, see section 2.1.1.2), a Scalar Arithmetic Logic Unit (SALU) and a Predicate Arithmetic Logic Unit (PALU), which executes logical operations on the 1-bit predicate registers. There is also a separate Scalar Multiply/Accumulate (SMAC) unit, which handles multiplication and multiply-accumulate operations. Finally there is the Scalar Load/Store Unit (SLSU), which takes care of all loads and stores for scalar and predicate registers.

VDCU: The Vector Data Computation Unit (VDCU) is dedicated to all operations that involve vectors. It consists of multiple vector register files, a Vector Arithmetic Logic Unit (VALU) and a Vector Mask Arithmetic Logic Unit (VMALU). These last two handle all arithmetic and logical operations on vector and vector mask registers. The Vector Multiply/Accumulate Unit (VMAC) handles multiplication with or without accumulation on vector registers. The Intra Vector Unit (IVU) takes care of operations within a single vector (taking the maximum of all elements of a vector, for example). The Vector Shuffle Unit (VSHU) shuffles the elements of a vector according to either a predetermined pattern or the values inside a vsp register. There is also the Code Generation Unit (CGU), which generates application-specific code that is inserted into the instruction stream to improve performance of the core, for example scrambling or channelization codes such as those in the UMTS standard, part of the 3G standard[2]. The VDCU also has a Vector Load/Store Unit (VLSU), which handles all loads and stores of regular vectors and vector masks to and from the data memory.

ACU: The Address Calculation Unit (ACU) performs all calculations on the pointer and offset registers to compute memory addresses for the VLSU and SLSU.

A connection between the SDCU and the VDCU also exists to enable broadcasting of a scalar value to a vector: a vector whose elements all equal the value in a given scalar register can be used as an operand in a vector operation.

2.1.1.2 Registers

The EVP has several different register files. For the scalar part there is the regular scalar register file. This consists of 32 registers that are 16 bits wide, numbered r0 through r31. These can be addressed as single registers, when 8- or 16-bit operands are needed, or as pairs, when operations on 32-bit data are required. In the latter case the pair must consist of two consecutive registers of which the lower one is even-numbered.

There is also a predicate register file, which holds 8 predicate registers, numbered p0 through p7, of which p0 is always true and cannot be changed. A predicate register pN can be appended to almost any instruction to make its execution conditional on the contents of pN.

There are five different vector register files, starting with the vr register file, which holds 16 regular data vector registers. These vectors are 256 bits wide and are named vr0 through vr15. A vector can hold 32 8-bit values, 16 16-bit values or 8 32-bit values. Vectors can also be used together to form 512-bit vectors, which can hold 16 32-bit values, or 640-bit vectors, which hold 16 40-bit values. In these cases the vectors used must be consecutive and the number of the lowest register must be a multiple of 2 or 4, respectively.

The second vector register file is the vm register file. This contains 8 32-bit vector mask registers, vm0 through vm7. A vector mask is used to make a vector operation conditional: if a vector mask register is specified in an instruction, the result of the operation is only written to the destination register for those 8-bit fields whose corresponding bit in the vm register is set. Similar to the scalar predicate registers, the value of vm0 is fixed, with all bits set to true.

The ir register file and im register file contain registers that hold the results of intra-vector operations on vr and vm registers respectively. The ir registers are 256 bits wide and the im registers 64 bits.
The im registers consist of 32 2-bit fields. Both register files contain 2 registers of their respective type, named ir0, ir1, im0 and im1. The vsp register file holds 16 vector shuffle pattern (vsp) registers. These registers are 64 bits wide and up to 4 can be used in vector shuffle operations on the VSHU. There are also 4 registers available for the CGU, but because the work in this project does not involve the Code Generation Unit, we will not go into any further detail.

The ACU contains three register files. The pointer register file holds 16 pointer registers, ptr0 through ptr15. The offset register file holds 8 offset registers, named ofs0 through ofs7. There is also a size register file, containing the 8 registers size0 through size7. All these registers are 32 bits wide, but since the memory of the EVP only requires 24 bits, any arithmetic on ptr or ofs registers will cause the most significant 8 bits to be cleared after the operation.

The EVP uses a number of addressing modes that must be supported by the compiler. A value can be loaded from or stored to memory using direct addressing, where the address of the memory location is an immediate value in the instruction. It is also possible to address a memory location by a value in a register; this is called register indirect addressing. A third method of specifying the address of a memory location is base + offset addressing. In this mode an instruction contains a pointer register and an offset register, whose values are added to obtain the address, after which the load or store is performed. The offset register can be replaced by an immediate value. Finally, the EVP supports post-increment base + offset and post-decrement base + offset addressing modes.
These modes are similar to the base + offset addressing mode, but here the value in the pointer register is additionally updated with the value in the offset register or the immediate value, whichever is applicable, after the access. We will refer to these last two modes as post-increment or post-decrement addressing.
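The effect of the post-increment mode can be sketched in plain C. The function names and the step-in-elements convention below are illustrative only, not part of the EVP toolchain; the point is that the load uses the current pointer value and the pointer is advanced afterwards, which the EVP encodes in a single instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Plain-C model of post-increment base + offset addressing: the load uses
 * the current pointer value, after which the pointer is advanced by the
 * step held in an "offset register". */
static int16_t load_post_increment(const int16_t **ptr, int32_t ofs)
{
    const int16_t *p = *ptr;   /* base: current pointer value      */
    int16_t value = *p;        /* the load itself                  */
    *ptr = p + ofs;            /* pointer updated after the access */
    return value;
}

/* Sum every 'step'-th element of an array, the strided access pattern
 * that the post-increment mode captures. */
static int32_t strided_sum(const int16_t *data, int n, int32_t step)
{
    int32_t sum = 0;
    const int16_t *p = data;
    for (int i = 0; i < n; i++)
        sum += load_post_increment(&p, step);
    return sum;
}
```

In compiled EVP code the pointer update costs no extra instruction, which is why the back end should recognize this pattern (see section 3.5).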

Finally, the PCU also contains several registers. The program counter (pc) keeps track of the address of the next instruction to be fetched, and the return address (ra) register holds the address to which the current function should return after it finishes execution. There are also three additional registers: lsa, lea and lc. These hold the start address, end address and iteration count, respectively, of any hardware loops the program is in. For more information on hardware loops, see section 2.1.2.2.

2.1.1.3 Vectors

As mentioned in section 2.1.1.2 the main registers used in the EVP are 256 bits wide and can be used in a number of modes. Depending on the size of the elements, the vectors contain 8, 16 or 32 elements: in 8-bit mode a vr register holds 32 elements, in 16-bit mode 16 elements, and in 32-bit mode 8 elements. The EVP is also capable of storing larger numbers, by storing 16 32-bit values in adjacent vector registers. For example, in the case of multiplication of two vr registers each containing 16 16-bit numbers, the result is 16 32-bit numbers. This data cannot be stored in a single vr register, therefore it is stored in two adjacent vr registers. The least significant 16 bits of each of the results are stored in the lower numbered register, while the most significant 16 bits go into the higher numbered register. A multiply-accumulate operation on two 16-bit operands and one 32-bit operand can produce an even larger number. For this reason the EVP also supports vectors with 40-bit elements. Bits 0-15 are stored in the lowest numbered vector, bits 16-31 in the middle vector and bits 32-39 in the highest numbered vector, as depicted in Figure 2.2.
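The split of widening multiply results over two adjacent registers can be modelled in a few lines of plain C. This is an illustrative sketch assuming two's complement 16-bit elements; the names are ours, not EVP assembly.

```c
#include <assert.h>
#include <stdint.h>

#define VLEN 16  /* elements per vr register in 16-bit mode */

/* Model of a widening 16x16 -> 32-bit vector multiply whose results are
 * split over two adjacent vr registers: 'lo' receives bits 0-15 of each
 * product, 'hi' receives bits 16-31. */
static void vmul_16_widening(const int16_t a[VLEN], const int16_t b[VLEN],
                             uint16_t lo[VLEN], int16_t hi[VLEN])
{
    for (int i = 0; i < VLEN; i++) {
        int32_t product = (int32_t)a[i] * (int32_t)b[i];
        lo[i] = (uint16_t)(product & 0xFFFF);  /* lower numbered register  */
        hi[i] = (int16_t)(product >> 16);      /* higher numbered register */
    }
}

/* Reassemble one 32-bit element from its two halves. */
static int32_t element_32(uint16_t lo, int16_t hi)
{
    return (int32_t)((uint32_t)hi << 16) | lo;
}
```

The 40-bit triple-vector case works the same way, with a third register holding bits 32-39.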

Figure 2.2: Storage of values in double and triple vectors

The EVP supports fixed point numbers as well, in 1.7, 1.15, 1.31 and 9.31 formats. The number before the point signifies the number of bits of the integer part of the fixed point number, and the number after the point signifies the width of the fraction field. The value of a fixed point number is effectively the value of the number when represented as an integer, divided by 2^n, where n is the width of the fraction field; e.g. the value of a 1.7 fixed point number is the value of the total eight bits represented as an integer divided by 2^7, and ranges from 0 to 2 − 2^−7. These fixed point numbers are stored similarly to 8-, 16-, 32- and 40-bit integers respectively. 1.31 fixed point numbers can be stored in either one or two vector registers, whereas the 9.31 format is always stored in three vector registers. The EVP also supports fixed point complex numbers. All fixed point formats are available as complex numbers, from a pair of 1.7 to a pair of 9.31 fixed point numbers. These are stored in the same number of vector registers as their regular fixed point counterparts, only they hold half the number of elements.
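The value rule for these formats can be written out directly in C. This is only an illustration of the divide-by-2^n interpretation described above; the function names are ours.

```c
#include <assert.h>
#include <stdint.h>

/* Value of a 1.7 fixed point number: the 8 raw bits interpreted as an
 * integer, divided by 2^7 (giving the range 0 .. 2 - 2^-7). */
static double fixed_1_7_value(uint8_t raw)
{
    return (double)raw / 128.0;   /* 2^7 = 128 */
}

/* The same rule for the 16-bit 1.15 format: divide by 2^15. */
static double fixed_1_15_value(uint16_t raw)
{
    return (double)raw / 32768.0; /* 2^15 = 32768 */
}
```

For example, the raw byte 64 in 1.7 format represents 64 / 128 = 0.5, and the maximum byte 255 represents 2 − 2^−7.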

2.1.1.4 VLIW

The EVP is a Very Long Instruction Word (VLIW) processor[25]. Multiple independent instructions are packed into one very long instruction word. The compiler handles register allocation and schedules the instructions into VLIW words. It takes care that the functional units required by all instructions in a VLIW word are free in the cycle in which they are needed. VLIW processors take advantage of Instruction Level Parallelism (ILP) in the code, and many target applications for the EVP exhibit a high amount of ILP. It is the job of the compiler to take maximal advantage of this form of parallelism. The EVP is capable of processing up to 13 different operations every clock cycle, provided that no two operations use the same resource. For example, a VLIW word has only one long immediate field, a field that contains a constant value coded directly in the assembly code. Therefore two instructions that both use the long immediate field cannot be scheduled in the same VLIW instruction word.

2.1.1.5 Assembly

The assembly code that is generated by the compiler consists of an opcode along with up to three operands and a predication operand. The assembly syntax also allows comments, preceded by //, as in C++. An instruction that is to be executed in parallel with the instruction before it has to be preceded by ||. For more information on how the assembly is emitted see section 3.4.

2.1.2 EVP architecture features

The EVP employs several techniques to exploit parallelism and improve performance. In this section we will briefly discuss the techniques that are important in the scope of this project.

2.1.2.1 Predicated and masked execution

In order to minimize costly branch instructions, the EVP supports predicated execution: the conditional execution of instructions, achieved by appending a predicate register to an instruction. Using this technique, control dependencies can be changed into data dependencies, which are far less costly, because branch instructions require a large number of delay slots. The EVP architecture allows predication of all instructions except for jump and branch instructions.
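The transformation of a control dependency into a data dependency can be illustrated in plain C. The predicated version below is a sketch of what the hardware achieves, not actual EVP code: on the EVP the select would be an instruction guarded by a pN register.

```c
#include <assert.h>
#include <stdint.h>

/* Branching form: a control dependency. */
static int32_t abs_branch(int32_t x)
{
    if (x < 0)
        return -x;
    return x;
}

/* Predicated form: the comparison produces a predicate, both candidate
 * results are computed unconditionally, and the predicate only selects
 * which value is written back. */
static int32_t abs_predicated(int32_t x)
{
    int predicate = (x < 0);         /* would live in a predicate register */
    int32_t negated = -x;            /* executed unconditionally           */
    return predicate ? negated : x;  /* conditional write-back             */
}
```

Both forms compute the same result, but the predicated form contains no branch and therefore no delay slots.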

2.1.2.2 Hardware loops

Another technique to avoid extensive generation of branches and their accompanying delay slots is the support of a hardware construct that predefines the boundaries of loops. These hardware loops are implemented through use of an instruction that specifies the start and end address of a loop as well as the number of iterations. This allows the processor to fetch the instruction at the beginning of the loop directly after fetching the instruction at the end, without the need for any delay slots. This allows the compiler to exploit the ILP of the code optimally.

2.2 EVP-C

As mentioned before, runtime performance and power usage are two important concerns in embedded processor design. The code that is executed on these processors typically consists of small kernels that are executed many times, and supporting code surrounding the kernels that takes up only a small percentage of the total execution time. Since this code is typically designed once and not changed much afterwards, it is hand-written and heavily optimized before compilation. This optimization is facilitated by extending the C language with a number of intrinsic functions that map directly to assembly instructions. This superset of ANSI C is called EVP-C[34]. With EVP-C the programming of an algorithm comes close to writing actual EVP assembly code, without the programmer having to take into account aspects like register allocation and scheduling. To give an example, to execute an addition of two vectors using 16-bit elements the EVP-C statement

v3 = evp_vadd_16(v1, v2);

maps directly onto the assembly instruction

vadd16 vrD, vrA, vrB
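The element-wise semantics of such an intrinsic can be modelled in plain C. The struct and function names below are our own illustration, and the wrap-around behaviour is an assumption; the real intrinsic of course compiles to a single vadd16 instruction rather than a loop.

```c
#include <assert.h>
#include <stdint.h>

#define VLEN 16  /* a vr register holds 16 elements in 16-bit mode */

typedef struct { int16_t e[VLEN]; } vec16;

/* Reference model of the element-wise 16-bit addition that the
 * evp_vadd_16 intrinsic maps onto. */
static vec16 model_vadd_16(vec16 a, vec16 b)
{
    vec16 r;
    for (int i = 0; i < VLEN; i++)
        r.e[i] = (int16_t)((uint16_t)a.e[i] + (uint16_t)b.e[i]);
    return r;
}
```

Such reference models are also useful when checking the correctness of the code the back end generates for an intrinsic.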

EVP-C also defines a number of pragmas: compiler directives that guide the compiler towards more efficient scheduling or register allocation, or that pass along information which prevents the generation of certain blocks of code that are not needed (for example, code to catch cases where the number of loop iterations is zero). In the FFT benchmark used in this project the following pragmas are used:

__DataNoInit
__UnitDefaultSectionData( name )
__UnitDefaultSectionConst( name )
__UnitDefaultSectionNoInit( name )
__UnitDefaultSectionFun( name )
__DataSection( name )
__FunSection( name )
__DataAt( ofs )
__FunReturnAt( ofs )
__M1 __M2 __M3 __M4 __M5 __M6 __M7 __M8
__LoopNoHwLoop
__LoopNoSWP
__LoopIterMin( n )
__LoopIterMax( n )
__FunHwLoopLevels( l )
__FunHwUseLoopLevels( l )

We will discuss the use of those pragmas that are important in the scope of this project. One important pragma is the __DataAt( ... ) pragma, which tells the compiler to store a certain variable in a certain register. It is used in the FFT to mark variables that contain the step size of the data that is loaded from the array containing the input data. Storing this step size in a set of general purpose registers would require a large number of moves from these registers to an offset register before a pointer value can be updated. To prevent generation of these moves, the pragma __DataAt(ofs) is used, which tells the compiler to allocate this variable to an offset register. Another set of pragmas that improve performance are the __Loop... pragmas. These specify that the loop following the pragma has a minimum or maximum number of iterations (thereby eliminating the need to check at runtime whether the loop variable is non-zero), or aid in, or disable, software pipelining for a specific loop. The pragmas that bring the most improvement are the memory qualifiers __Mn. These qualifiers tell the compiler that two pointers with different qualifiers can never point to the same memory location. This aids the alias analysis and prevents the generation of unnecessary dependencies between these pointers, thereby improving performance. As an example consider the following code:

int *ptrA;
int *ptrB;
int *ptrC;
for (i = 0; i < 10; i++) {
    *ptrA++ = *ptrB++ + *ptrC++;
    *ptrA++ = *ptrB++ - *ptrC++;
}

This loop adds the value to which pointer ptrB points to the value at ptrC and stores the result at the location pointed to by ptrA, after which the pointers are incremented. This process is repeated, but now the values at the locations pointed to by (the new values of) ptrB and ptrC are subtracted. Without any aliasing information, there exists a read-after-write dependency between ptrB and ptrA: the contents of the memory location the second statement loads through ptrB could be changed when the result of the first statement is written to the memory location pointed to by ptrA (this is the case if ptrB is equal to ptrA - 1). This dependency prohibits the two assignments from being scheduled in parallel. Using the EVP-C memory qualifiers the programmer can tell the compiler that these pointers never point to the same memory location by changing the code listed above to:

__M1 int *ptrA;
__M2 int *ptrB;
__M3 int *ptrC;
for (i = 0; i < 10; i++) {
    *ptrA++ = *ptrB++ + *ptrC++;
    *ptrA++ = *ptrB++ - *ptrC++;
}

This information now allows the compiler to schedule these two assignments in parallel while still maintaining correctness, thereby increasing performance of the code.
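For comparison, standard C99 offers the restrict qualifier, which makes a similar (though less fine-grained) non-aliasing promise to the compiler. The sketch below mirrors the loop above; it is an illustration of the same idea, not EVP-C.

```c
#include <assert.h>

/* The C99 restrict qualifier promises that the pointed-to objects do not
 * overlap, removing the read-after-write dependency between the two
 * statements in the loop body, much like the EVP-C __Mn qualifiers. */
static void add_sub(int *restrict dst, const int *restrict src1,
                    const int *restrict src2, int n)
{
    for (int i = 0; i < n; i++) {
        *dst++ = *src1++ + *src2++;  /* sum of the current pair        */
        *dst++ = *src1++ - *src2++;  /* difference of the next pair    */
    }
}
```

As with the __Mn qualifiers, the promise is the programmer's responsibility: if the arrays do overlap, the behaviour is undefined.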

2.3 The GCC compiler

The entire GCC compiler toolchain, which processes code written in a certain programming language and generates a binary executable file, consists of a number of different stages. Building GCC generates a compiler driver that executes the following stages consecutively.

Preprocessor The preprocessor handles the expansion of preprocessor directives and macros (#define), and inclusion (#include) of other files in a source file. The C preprocessor is called cpp and generates preprocessed C sources.

Compiler The compiler itself is generated as the executable file cc1. This executable processes the intermediate files passed to it by the preprocessor and generates an assembly file.

Assembler The assembler processes the assembly sources and translates them to binary object code in object files for the target processor architecture. Besides the instructions, an object file can also contain debugging information. The assembler is invoked by executing the vasm file.

Linker In a final step the linker links one or more object files together into an executable. This generated executable is the code executed by the EVP processor. The linker executable in our setup is called vlink.

To simulate the executables generated by the compiler toolchain, the in-house simulator vclsim was used.

2.3.1 Building the GCC compiler

To build the GCC compiler, we need the GCC sources, which can be downloaded from the GCC website[22]. For this project, we used GCC version 4.3.1, which was released in June 2008. The GCC sources are placed in the source directory, the machine specific files are stored in the target directory, and the build directory contains the GCC-generated files as well as the compiler driver and compiler executable. To build the compiler executable, GCC generates a number of header and source files inside the build directory. Among these files are all type declarations as well as a number of supporting macros, the generated source code for the instruction scheduler, and functions that recognize RTL instructions, generate the RTL instructions and generate the assembly code. An overview of which files are used to build the compiler is given in figure 2.3.

Figure 2.3: Files used in building the GCC compiler executable

2.3.2 Intermediate representations

The GCC compiler consists of several different stages that transform code from one representation to another. All stages the compiler goes through to transform input C source code to assembly code are depicted in figure 2.4. Starting with the source code, the program is transformed into different representations in the following order:

GENERIC The front-end does the language specific transformation from source code to an intermediate language independent tree representation called GENERIC.

GIMPLE The GENERIC representation is then simplified to the GIMPLE representation, which splits complex expressions into multiple smaller expressions. Note that at the moment, the C and C++ front ends directly emit GIMPLE trees. As a general rule every GIMPLE expression has a maximum of three operands; there are some exceptions, like function calls and built-in operations.
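Gimplification can be illustrated by writing the three-address form out by hand in C. The temporaries t1 and t2 below are illustrative names; real GIMPLE temporaries are compiler-generated (and look like D.1234).

```c
#include <assert.h>

/* A source-level expression with four operands and two operators. */
static int expr_source(int a, int b, int c, int d)
{
    return a * b + c * d;
}

/* The same expression after gimplification: each statement has at most
 * three operands, so the compiler introduces temporaries. */
static int expr_gimple(int a, int b, int c, int d)
{
    int t1 = a * b;   /* t1 = a * b       */
    int t2 = c * d;   /* t2 = c * d       */
    return t1 + t2;   /* result = t1 + t2 */
}
```

The two functions are semantically identical; the second merely makes every intermediate value explicit, which simplifies later analysis and optimization.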

SSA Following this, the GIMPLE tree is transformed into the Static Single Assignment (SSA) representation[29]. This representation is based on the premise that every variable inside a program is assigned only once; if a variable is assigned a new value, Tree-SSA considers this a new variable. It introduces a new construct called the PHI node, which is used when the value of a variable cannot be determined at compile time (for example following an if-statement). A new variable is then declared and is assigned the value PHI <value1, value2>, which means the value in the new variable is either value1 or value2. The Tree-SSA form is the intermediate representation on which most of GCC's optimization passes operate. This part of the compiler is called the middle-end.

Figure 2.4: Compiler flow of the GCC compiler
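The effect of a PHI node can be made concrete in C. The names x_1, x_2, x_3 below are illustrative SSA names, and the conditional expression plays the role of the PHI merge.

```c
#include <assert.h>

/* Before SSA: 'x' is assigned more than once. */
static int clamp_plain(int x)
{
    if (x > 100)
        x = 100;
    return x;
}

/* In SSA form every assignment creates a fresh name, and a PHI node
 * merges the values flowing in from the two predecessor blocks:
 *
 *   x_1 = x;
 *   if (x_1 > 100) x_2 = 100;
 *   x_3 = PHI <x_1, x_2>;   // x_2 if the then-block ran, else x_1
 *   return x_3;
 */
static int clamp_ssa(int x)
{
    int x_1 = x;
    int x_2 = 100;                      /* value from the then-block    */
    int x_3 = (x_1 > 100) ? x_2 : x_1;  /* the PHI merge, made explicit */
    return x_3;
}
```

Both functions compute the same result; the SSA version just names each value exactly once, which is what enables the flow-sensitive analyses in the middle-end.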

RTL When all passes in the middle-end are finished, the RTL generation pass transforms the Tree-SSA representation into the machine-specific Register Transfer Language (RTL) representation. This representation has a LISP-like syntax in which every expression should represent a single instruction on the target machine.

assembly code The final pass generates assembly code suitable for the target architecture from the RTL representation.

2.3.3 Passes of the compiler

All transformations from the source code to the assembly output are done by passes of the GCC compiler. Each pass implements a necessary (e.g. register allocation or instruction scheduling) or optional (e.g. dead store elimination or loop optimization) transformation of a certain representation of the code. In this section we will give a brief description of the important passes in the GCC compiler.

Passes are added and removed from one version to the next. The first real ’pass’ of the compiler is running the parser, which produces the intermediate GENERIC format, which in turn is reduced to GIMPLE representation through a process commonly called gimplification. The C language front end directly emits GIMPLE intermediate representation. Most passes in the compiler operate on the GIMPLE or RTL representation. Since GCC 4.3.1 invokes over 200 compiler passes, we will only outline the ones relevant to the project here.

2.3.3.1 Passes that operate on the GIMPLE intermediate representation

Remove Useless Statements This pass removes obvious dead code statements. Simple things like if-statements with a con- stant condition are evaluated and replaced by the corresponding block, and unreachable code is removed. This is done early so all subsequent passes have an easier time applying their optimizations.

Lower Control Flow In this pass several constructs are simplified in order to avoid unnecessarily complex control flow graphs. If-statements are converted into a form where they hold only two GOTO statements: one for the then-block and one for the else-block. This results in reduced complexity of the control flow graph, allowing subsequent passes to recognize opportunities for optimization more effectively.

Build Control Flow Graph Here the control flow graph is built. All functions are split into a number of basic blocks and all control flow edges are defined.

Enter Static Single Assignment form The last pass that operates on the GIMPLE representation transforms all statements into Static Single Assignment form.

2.3.3.2 Passes that operate on the Static Single Assignment representation

The following passes all operate on the Tree-SSA form:

Forward Propagation of Single-Use Variables This pass substitutes variables that are only used once into the expression which uses them and tries to simplify the resulting expression. This results in lower register pressure, which can avoid the need for register spilling during register allocation.

Copy Renaming This pass tries to rename compiler-generated temporary variables so they can be coalesced later. This also brings user-defined variables closer to the place where they are used. This assists in debugging the application.

PHI Node Optimizations This pass tries to rewrite PHI node statements into conditional code, or even remove them entirely if possible. This reduces the number of jumps in the code.

May-Alias Optimization This pass does flow-sensitive ’points-to’ alias analysis of the code, marking which areas of the memory may be addressed by multiple pointers. This information is important because a value pointed to by one pointer may be changed by another. This introduces a number of dependencies that must be respected.

Dead Store Elimination The dead store elimination pass removes any stores whose value is not used before it is over- written by another store.

Tail Recursion Elimination When a function includes a recursive call at the end, this can be rewritten into loop form. This pass detects such occurrences and replaces them by a loop. The overhead for a loop is much smaller than for a function call, resulting in an improvement in runtime performance.
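The transformation can be sketched in C. The tail-recursive function and its loop form below are our own illustration of what the pass produces; GCC performs the rewrite on the GIMPLE representation, not on source code.

```c
#include <assert.h>
#include <stdint.h>

/* Tail-recursive sum of 1..n: the recursive call is the last action. */
static uint64_t sum_recursive(uint32_t n, uint64_t acc)
{
    if (n == 0)
        return acc;
    return sum_recursive(n - 1, acc + n);   /* tail call */
}

/* The loop form tail recursion elimination produces: the call is replaced
 * by updating the parameters and jumping back to the top of the body. */
static uint64_t sum_loop(uint32_t n, uint64_t acc)
{
    for (;;) {
        if (n == 0)
            return acc;
        acc += n;   /* acc takes the value of the call's second argument */
        n -= 1;     /* n takes the value of the call's first argument    */
    }
}
```

Besides removing call overhead, the loop form uses constant stack space, whereas the recursive form would otherwise grow the stack by one frame per iteration.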

Forward Store Motion This pass moves stores and assignments as close as possible to the point where they are used. This reduces the live range of the variable and improves register allocation performance.

Redundancy Elimination Redundancy Elimination removes expressions that are evaluated multiple times, and substitutes each later occurrence with the previously computed value, thereby reducing the number of instructions in the instructions stream.

Loop Optimization This pass (or actually several passes) performs a number of optimizations that improve the execution time of code within loops, like loop invariant motion (moving code not dependent on the current loop iteration out of the loop body), some induction variable optimizations, loop unrolling and vectorization. All these optimizations reduce either the number of instructions in the loop body or the number of iterations of a loop.
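Loop invariant motion, the first of the optimizations listed above, can be written out by hand in C. The "after" version shows the hoisting the pass performs internally; the names are illustrative.

```c
#include <assert.h>

/* Before: 'scale * offset' does not depend on the loop iteration, yet is
 * recomputed every time around the loop. */
static int sum_scaled(const int *v, int n, int scale, int offset)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += v[i] + scale * offset;
    return sum;
}

/* After loop invariant motion: the invariant expression is computed once
 * before the loop. */
static int sum_scaled_licm(const int *v, int n, int scale, int offset)
{
    int sum = 0;
    int invariant = scale * offset;   /* hoisted out of the loop body */
    for (int i = 0; i < n; i++)
        sum += v[i] + invariant;
    return sum;
}
```

The transformation removes one multiplication per iteration while leaving the result unchanged.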

Tree-level If-Conversion for the Vectorizer This pass applies if-conversion to statements inside the loop body to enable the vectorizer to parallelize more loops.

Control Dependence Dead Code Elimination This pass performs dead code elimination on control flow statements. For example jumps to a label that directly follows the jump instruction are removed.

Small Loop Unrolling This pass completely unrolls loops that iterate a small number of times.

Reassociation This pass rewrites some expressions into a form that can more easily be optimized at a later stage, for example through vectorization.

2.3.3.3 Passes that operate on the RTL representation

The resulting optimized Tree-SSA code is now converted to the machine-specific RTL representation, on which a number of further optimizations are performed.

RTL generation The RTL generation pass converts all SSA statements into RTL expressions that correspond to machine instructions on the target architecture. This is done through expansion rules (see section 3.2.1).

Common Subexpression Elimination This pass performs the elimination of (sub)expressions that are identical by calculating the result once and substituting it in the expressions that use it.

Loop Optimization Another set of loop optimizations is performed, similar to the pass which operates on the SSA representation.

If-Conversion This pass performs if-conversion, substituting simple if-statements by conditional instructions. This removes the jump instructions resulting from these if-statements.

Web Construction To improve the performance of other passes, web construction splits the independent uses of each pseudo-register. Common subexpression elimination and register allocation can benefit from this transformation.

Life Analysis This pass calculates which pseudo-registers are live at every point in the program. This in- formation is needed for register allocation and some other optimization passes. If possible, it eliminates instructions which set pseudo-registers that are never used.

Instruction Combination The instruction combination pass merges two or three instruction that are connected via data flow into a simpler or less expensive instruction. This pass effectively performs peephole op- timization on instructions connected in the control flow graph but that are not necessarily con- secutive in the instruction stream.

Register Movement This pass checks all pseudo-registers against the constraints in the instruction pattern and in all cases where reloading the register would result in the generation of a register-to-register move, tries to change the register in order to avoid the generation of this move.

Instruction Scheduling This pass performs instruction scheduling of the code: the ordering of instructions such that the code executes correctly and efficiently on the target machine. Instruction scheduling must not violate any data dependencies between the different instructions. For non-interlocked machines like the EVP, the scheduler must also avoid any resource conflicts, since there is no hardware to stall the pipeline; for these machines the main goal of the scheduler is to ensure correctness. Interlocked machines resolve such resource conflicts in hardware by stalling during execution, so for them this is less of an issue. For this project, scheduling also involves packing the instructions into VLIW instruction words. To obtain the highest performance in terms of cycle count this means scheduling as many instructions as possible per clock cycle without violating any constraints or dependencies.

Register Allocation The register allocation pass consists of several stages. It first tries to find the preferred register for each pseudo-register, it then proceeds to allocate hard registers to all pseudo-registers, start- ing on a local level (i.e. within a basic block) followed by global register allocation. The reload pass finishes register allocation by assigning each pseudo-register a hard register, or in case no free hard registers are available, a stack slot.

Target Defined Optimizations This pass allows the back end to call a number of target-specific optimization passes, which perform any optimizations or make changes necessary for generating correct assembly instructions. This is where the passes we add in this thesis are inserted. These are evp_remove_jumps_pass, which replaces a jump to another jump instruction by a jump to the target of the second jump; evp_div_to_end_bb_pass, which avoids resource conflicts in inter-basic block scheduling (detailed in section 3.3.7); the evp_delay_slot_filler_pass, which fills the delay slots of branch and jump instructions (see section 3.5.1); the evp_hloop_pass, which performs the generation of hardware loops (see section 3.5.3); and the evp_inter_bb_scheduler, which performs inter-basic block scheduling (see section 3.3.6).

Final This is the last pass, which actually generates the assembly instructions based on the assembly output templates of the patterns of all instructions in the instruction stream. Any other assembly code that needs to be emitted (comments, values in memory, etc.) is also written here.

2.4 Summary

In this chapter we presented the basic layout of the EVP: we described the functional units, their purpose and the way they are interconnected. We discussed the registers of the EVP and the way vector data is stored in memory. We also discussed techniques featured by the EVP architecture that require support from the compiler, hardware loops and predicated execution, as well as the fact that the EVP is a VLIW architecture. Furthermore we discussed the basics of the EVP-C extension to the C language, which allows programmers to write C code that closely resembles assembly code without having to take into account instruction scheduling and register allocation. Finally we presented the basic structure of the GCC compiler and the different intermediate representations, and we briefly outlined the more important optimization passes.

3 Implementation

In this chapter we will describe all the additions and changes made to the EVP back end to allow it to correctly and efficiently generate assembly code for the EVP. In section 3.1 we will first describe how we define the EVP machine properties like registers, register classes and machine modes. Following this, in section 3.2, we will describe how the transformation from Tree-SSA representation to RTL instructions is implemented. The largest part of this project, enabling the VLIW scheduler, is detailed in section 3.3. We will describe how the EVP assembly is emitted in section 3.4. In section 3.5 we will detail how some target-specific features like hardware loops, branch delay scheduling and post-increment addressing are implemented.

3.1 Specifying the machine properties

This section details how all properties of the EVP, like registers, register classes and machine modes are defined.

3.1.1 General machine properties

All the specific properties of the target must be defined by means of target macros in the EVP back end. These macros are defined in the evp.h file in the target directory. First of all GCC needs to know specifics like the width and availability of certain data types, and even the number of bits in the smallest available unit, a byte in this case. The sizes of the different data types in our back end are defined through the following macros:

#define BITS_BIG_ENDIAN      1
#define BYTES_BIG_ENDIAN     0
#define WORDS_BIG_ENDIAN     0
#define UNITS_PER_WORD       2
#define UNITS_PER_SIMD_WORD  32

GCC defines a UNIT as the smallest addressable memory unit, for the EVP this is a byte, so the default value for BITS_PER_UNIT can be used, which is 8. From this the width of different words is defined. The macro UNITS_PER_SIMD_WORD defines the number of units the autovectorizer can put in one vector, or SIMD word. The endianness is defined so GCC knows which byte should be loaded (or stored) when only the high or low part of a multi-byte word is required. The ANSI C standard does not put strict requirements on the width of all C data types, so these need to be defined. This is done through use of the following macros:


#define SHORT_TYPE_SIZE       BITS_PER_WORD
#define INT_TYPE_SIZE         BITS_PER_WORD
#define LONG_TYPE_SIZE        (2 * BITS_PER_WORD)
#define LONG_LONG_TYPE_SIZE   (4 * BITS_PER_WORD)
#define FLOAT_TYPE_SIZE       32
#define DOUBLE_TYPE_SIZE      32
#define LONG_DOUBLE_TYPE_SIZE 64
#define TARGET_FLOAT_FORMAT   IEEE_FLOAT_FORMAT

The EVP has a 24-bit wide address bus, but 32-bit wide pointer registers. The arithmetic on pointer registers is 24-bit however, as explained in section 3.1.2. The POINTER_SIZE macro defines the width of pointers in GCC, and although the pointers contain 24-bit values, they are stored in 32-bit words in memory. Therefore the width of a pointer is defined as follows:

#define POINTER_SIZE 32

To distinguish them from other 32-bit data types, GCC internally uses a different data type for pointers. These data types are called machine modes; for more details see section 3.1.4.

3.1.2 Registers

Internally, GCC numbers all hard registers, starting with 0. In the back end (the evp.h file in the target directory) we therefore need to define which registers correspond to which numbers. For this the macro REGISTER_NAMES must be defined to hold a C initializer which contains all register names as they are referred to in the assembly output. This is an array initializer containing all register names in the following order:

• 32 general purpose registers

• 8 predicate registers

• 16 pointer registers

• the return address register

• 3 loop control registers

• 8 offset registers

• 16 general purpose vector registers

• 8 vector mask registers

• 2 intra vector registers

• 2 intra vector mask registers

• 16 vector shuffle pattern registers

• 4 CGU registers

Up to the register allocation phase all RTL expressions assume the availability of an infinite number of registers. GCC uses all register numbers above the last hard register as virtual registers, or pseudo-registers. The register allocator maps all pseudo-registers onto hard registers in the reload pass.

Now that all registers are defined, we need to mark the registers that are fixed, i.e. those registers that the register allocator cannot make use of during register allocation. These fixed registers include the stack pointer, frame pointer, argument pointer and any registers that have a fixed value and cannot be written to, such as p0 and vm0. The macro FIXED_REGISTERS is, similar to REGISTER_NAMES, an initializer that sets a '1' for all hard registers that are fixed, and a '0' otherwise. In the EVP port the registers whose values are hard-coded (p0, vm0), the special purpose register ra, all loop control registers, the pointer registers ptr13, ptr14 and ptr15, as well as r30 and r31, are marked as fixed.

The three pointer registers are used for the argument pointer, frame pointer and stack pointer, respectively. In many cases the argument pointer and frame pointer can be expressed in terms of the stack pointer. Only when an application uses dynamic stack space allocation (using alloca) can the correlation between these different pointers not be determined at compile time, and are these registers needed. At the moment a subset of standard C is implemented, and alloca is not supported. This means that these two registers could be made available for register allocation. Because the test programs currently used do not exhibit high register pressure for the pointer registers, marking ptr13 and ptr14 as fixed does not incur a performance penalty. The register pair r31r30 is reserved for moves between ofs and ptr registers that are generated during the reload phase. This will be explained further in section 3.2.1.1. The predicate register p7 is also reserved. It needs to remain free to allow the if-conversion pass to use it. For more details on if-conversion see section 3.5.5.

In addition to defining the fixed registers, we need to define for each register whether its value is preserved across a call or not. These call-used registers are defined in the macro CALL_USED_REGISTERS, and must include at least all fixed registers. This macro contains all registers that can be freely used within a function. Registers that have a '0' in CALL_USED_REGISTERS must be saved upon function entry (in the function prologue) and restored upon returning (in the epilogue) if they are used within a function. Similarly, if a register contains a live value and a function call is made, code is generated to save this value on the stack. This save and restore process is detailed further in section 3.2.4. For the EVP port for GCC, we stick to the same calling convention the current production compiler uses, as detailed in [35]. This way it is possible to link to libraries compiled with the production compiler and ensure correctness. This calling convention specifies that the following registers are call-used:

• r0 - r15

• p1 - p3

• ptr0 - ptr3

• ptr8 - ptr11

• ofs0 - ofs4

• vr0 - vr7

• vm1 - vm3

• ir0 - ir1

• im0 - im1

• vsp0 - vsp7

In addition to these registers, we also mark the vector registers vr8 through vr15 as call-used. This is done because the GCC register allocator prefers registers defined as call-used over registers not defined as call-used. Programs executed on the EVP mainly consist of code that spends the majority of its time in loops and only a small portion in the function prologue and epilogue. Therefore the performance hit incurred by allocating a call-used register (the saves take place outside the main loops) is marginal compared to the loss in performance that results from sub-optimal register allocation (which affects the inner loops, where register pressure can be high). This loss is caused by false dependencies that are introduced by re-using a certain register too close to the end of its previous live range, thereby generating an unnecessary write-after-read dependency that causes extra latency. Marking these extra registers still maintains correctness when GCC-compiled programs are linked with libraries compiled with the production compiler. At most it generates superfluous register saves: if a value in one of the registers vr8 - vr15 is live before a library call, it will be unnecessarily saved before the call.

3.1.3 Register classes

In order to properly distinguish the different types of registers, different register classes are defined by means of the reg_class enumeration type. Table 3.1 depicts which registers belong to which register classes. In conjunction with reg_class, an array of C string constants must be defined that holds the name of each register class. These strings are identical to the names used in the reg_class enumeration type. In order to connect the register classes with the previously defined hard registers, a bitmap is defined for each register class, containing a '1' in each position that corresponds to a register belonging to that class. As an example, the entry for the general purpose vector registers in the reg_class array is the following:

{0x00000000, 0x00000000, 0x000ffff0, 0x00000000} , /* VR */\

For this example, this means the registers numbered 68 through 83 belong to the VR_REGS register class. In order to enhance register allocation, we define the macros BASE_REG_CLASS and INDEX_REG_CLASS to PTR_REGS and OFS_REGS respectively. The register allocator will actively try to load registers that are used for the base part in base-plus-offset addressing into a ptr register, which avoids unnecessary moves. The same is applicable for the offset part, which we want in an ofs register as much as possible.

Register Class   Contents
NO_REGS          Empty register class
R_REGS           General purpose registers
P_REGS           Predicate registers
PTR_REGS         Pointer registers
RA_REGS          Return address register
LSA_REGS         Loop start address registers
LEA_REGS         Loop end address registers
LC_REGS          Loop counter registers
OFS_REGS         Offset registers
VR_REGS          General purpose vector registers
VM_REGS          Vector mask registers
IR_REGS          Intra vector registers
IM_REGS          Intra vector mask registers
VSP_REGS         Vector shuffle pattern registers
CGU_REGS         CGU registers
ALL_REGS         All registers

Table 3.1: Contents of the register classes

3.1.4 Machine modes

GCC uses the notion of machine modes to refer to a data object by its size and the representation used [24]. A number of machine modes are predefined, and there is room for target-specific modes. These modes are defined by the file insn-modes.h in the GCC build directory, which is generated by functions in genmodes.c in the GCC source directory. insn-modes.h defines the standard GCC modes from the file machmode.def in the GCC source directory, as well as target-specific modes. The target-specific modes are defined in the EVP back end, in the file evp-modes.def in the target directory. By default, GCC generates bit, integer, floating point, fixed-point, complex integer, complex floating point and decimal floating point modes. In total 52 modes are predefined [24]. The EVP supports operations on 1-, 8-, 16- and 32-bit scalars. The modes corresponding to these scalar sizes are depicted in table 3.2. The modes are named for their size relative to a four-byte integer. The 32-bit integer mode is named SImode, for Single Integer mode, the 16-bit mode is called HImode, or Half Integer mode, and the 8-bit mode is named QImode, for Quarter Integer mode. The 1-bit mode, needed for the predicate registers, is called BImode, for Bit mode. In addition, we need to introduce several extra modes for the EVP target. First of all there are the pointer registers which, as mentioned in section 3.1.2, are 32 bits wide but address a 24-bit memory space. For this a Partial Single Integer mode is defined, for which GCC has built-in support. This PSImode is defined by adding the line

PARTIAL_INT_MODE (SI);

to evp-modes.def. Secondly, to implement the vector part of the port we need to define the different vector modes supported by the EVP. This is done by adding the lines

Machine Mode   Width in bits
BImode         1
QImode         8
HImode         16
SImode         32
PSImode        32
V32QImode      32 × 8
V16HImode      16 × 16
V8SImode       8 × 32
V16SImode      16 × 32

Table 3.2: Machine Modes used in EVP-GCC.

VECTOR_MODE (INT, SI, 8);
VECTOR_MODE (INT, HI, 16);
VECTOR_MODE (INT, QI, 32);
VECTOR_MODE (INT, SI, 16);

to evp-modes.def. GCC translates this into definitions for V8SImode, V16HImode, V32QImode and V16SImode. The first argument specifies the type of vector, all integers in our case. The second argument specifies the size of each element in the vector, whereas the last argument specifies the number of those elements. This results in three 256-bit wide vectors, consisting of either 32 QImode elements, 16 HImode elements or 8 SImode elements. The final vector is a 512-bit vector, consisting of 16 32-bit elements¹. All modes relevant to the EVP port are listed in table 3.2.

In order to tie the machine modes to the register classes, the macro HARD_REGNO_MODE_OK (REGNO, MODE) returns true if a value of machine mode MODE can be stored in register REGNO. This macro is tied to the function evp_hard_regno_mode_ok_func, which returns 1 or 0 according to table 3.3. An entry E in table 3.3 signifies a mode for which only even-numbered hard registers are valid. evp_hard_regno_mode_ok_func checks for this and returns a 1 in case of even-numbered registers, and a 0 otherwise. For some register classes², multiple registers are required to hold a value of a certain machine mode. The macro HARD_REGNO_NREGS (REGNO, MODE) is set to return the number of registers needed to store a value of mode MODE in register REGNO. Closely related to this is the macro CLASS_MAX_NREGS (class, mode), which should return the maximum number of registers of class class needed to hold a value of mode mode. This information assists the register allocator with handling multi-word moves during the reload phase. Values in a certain machine mode sometimes need to be accessed in a different mode. Most of the time this incurs extra overhead because an intermediate operation is required, such as a sign-extend or zero-extend. Some values can be accessed in a different mode without any intervention.
These modes are called tieable, and we let GCC know by defining the macro MODES_TIEABLE_P (mode1, mode2). This macro returns true when all values residing in

¹This port of the EVP does not include full support for vectors larger than 256 bits. Although the mode V16SImode is defined, we do not use it in this project.
²Those marked with an E in table 3.3.

            Register Class
Mode        R   P   PTR  OFS  VR  VM  IR  IM  VSP  CGU
BImode      0   1   0    0    0   0   0   0   0    0
QImode      1   0   0    0    0   0   0   0   0    0
HImode      1   0   0    0    0   0   0   0   0    0
SImode      E   0   0    0    0   0   0   0   0    0
PSImode     E   0   1    1    0   0   0   0   0    0
V32QImode   0   0   0    0    0   1   0   0   0    0
V16HImode   0   0   0    0    1   0   1   1   0    0
V8SImode    0   0   0    0    0   0   0   0   1    0
V16SImode   0   0   0    0    E   0   0   0   0    0

Table 3.3: Return value of the evp_hard_regno_mode_ok_func.

valid registers for mode1 can also be accessed in those same registers in mode2. In effect this is equal to all combinations of modes in table 3.3 whose rows are identical. Besides the trivial case where mode1 equals mode2, this only holds when mode1 and mode2 are QImode and HImode. Knowing the interchangeability of these two modes improves register allocator performance. When all vector data types and registers are implemented, there are two ways vectors can be used in C code:

• Allow GCC to generate vector operations from a series of scalar operations that operate on scalar data types. This is called vectorization. Support for vectorization is included in GCC, and we performed a number of experiments to see how much performance improvement vectorization can bring on our benchmark code and what steps need to be taken to allow GCC to effectively vectorize this code. The results of these experiments show that vectorization will not bring much improvement without support for masked vector execution (predicated execution with vector registers), which is not yet implemented in EVP-GCC. These experiments are presented in more detail in section 4.3.

• Generate vector operations from EVP-C intrinsic operations. This is the method currently used in product code for the EVP. The vector benchmarks used in this project also make use of EVP-C intrinsics to generate vector operations.

3.2 Expanding Tree-SSA into RTL

This section describes how the transformation from Tree-SSA level representation to RTL representation is made. Section 3.2.1 describes how the expansion rules for the EVP back end are defined. Section 3.2.2 details the implementation of function calls. Argument passing is discussed in section 3.2.3, and function prologue and epilogue expansion is detailed in section 3.2.4.

3.2.1 Defining expansion rules

To make the transition from the list of SSA statements generated by the GCC middle-end to the list of RTL instructions used in the back end of the compiler, GCC requires expansion rules to be set. GCC uses a series of standard names for these expansion rules, which are automatically used (if available) during the RTL generation pass. Expansion rules are defined by an RTL expression of the following form:

(define_expand <name>
  <RTL template>
  <condition>
  <preparation statements>)

The name field contains a name for the expansion rule. Each rule should have a name, otherwise the expansion rule will never be called. This name comes in the form of a string, and should match one of the standard names. It is used by the RTL generation pass to expand the corresponding SSA statement into RTL. For example, my_int1_3 = my_int1; which is a move from one integer variable to another (both HImode in our configuration), will use the expansion rule movhi, which generates the correct RTL expression for a move from one HImode pseudo-register to another. It is also possible to define an expansion rule with a non-standard name; it will not be called from the GCC mainline, but can be used by the back end where needed. For example, we use a call to the generated function gen_movhi_salu_or_slsu to generate a move from one scalar register to another which can be issued on either the SALU or SLSU. How this is done exactly is explained in more detail in section 3.3.4. The RTL template field contains the template for the RTL expression that should be generated. Continuing with the example above, for the movhi rule this is

(set (match_operand:HI 0 "general_register_operand" "")
     (match_operand:HI 1 "general_register_operand" ""))

These two match_operand expressions match the zeroth and first operands of the SSA statement to a register that should be valid for both the mode (HI) and the predicate (see section 3.2.1.5). The third field is a condition for when this expansion rule should be used. This is a boolean expression which will be evaluated when the rule is used. An empty expression here always evaluates to true. The last field is a block of C code that is evaluated before the expansion is done. This code can make changes to the operands that are used, or emit extra RTL instructions into the instruction stream. Two special macros can be used in this block of code: FAIL, which treats the rule as if the condition field returned false, and DONE, which ends the generation of the pattern without emitting the RTL in the template field. Otherwise, the RTL template will be emitted after the preparation statements are executed. Most expansion rules are relatively straightforward. There are a few exceptions, which we discuss here.

3.2.1.1 Expansion of unsupported move instructions

The EVP does not support all types of move instructions. One such instruction is a move between two memory locations. If the expansion rule for a move is called and both operands are memory locations, regular expansion will result in an unrecognized RTL instruction. Therefore the preparation statements include checks for cases where both operands are memory locations; in that case a new pseudo-register is generated and two moves are emitted into the instruction stream: one move from the source memory location to the new pseudo-register, and one from the pseudo-register to the target memory location. This code is added to all move expansion rules. For example, the following lines have been added to the preparation statements for the movhi expansion rule:

if ((reload_in_progress | reload_completed) == 0
    && !register_operand (operands[0], HImode)
    && !register_operand (operands[1], HImode))
  {
    rtx temp = force_reg (HImode, operands[1]);
    temp = emit_move_insn (operands[0], temp);
    DONE;
  }

First both operands, which are contained in the rtx array operands[], are checked. If neither of them is in a register or pseudo-register, we use the built-in GCC function force_reg to force the source value into a register, and emit two move instructions into the instruction stream. By doing so we generate a new pseudo-register, which will hold the value that needs to be copied and still resides in memory at the location operands[1] points to. We must take care not to do so after the reload phase, because after register allocation we cannot generate any new pseudo-registers. Therefore the check (reload_in_progress | reload_completed) must evaluate to zero. These two global variables are set by the reload pass to signify that reloading the pseudo-registers has started and no new pseudo-registers should be generated.

3.2.1.2 Expanding vector instructions

To implement the vector instructions, we first define the expansion rules. The move instructions require rules for the standard pattern names movv32qi, movv16hi and movv8si. These rules are automatically selected whenever an SSA statement which contains a vector move is encountered. For example the expansion rule for a vector move with machine mode V16HImode is:

(define_expand "movv16hi"
  [(set (match_operand:V16HI 0 "nonimmediate_operand" "")
        (match_operand:V16HI 1 "nonimmediate_operand" ""))]
  ""
{
  evp_vector_expand_move (V16HImode, operands);
  DONE;
})

The RTL template in this rule is not relevant; all vector moves are expanded via the function evp_vector_expand_move. This function takes care of the expansion, and catches the case where both operands do not reside in a register or pseudo-register, similar to the preparation statements for scalar moves described in section 3.2.1.1.

3.2.1.3 Expanding Vector Intrinsics

To support all the EVP-C vector intrinsics, expansion rules for all 1406 intrinsics have to be added to the back end. In the testbench used, only 99 are employed, so these are the ones implemented. For all these intrinsics, expansion rules are defined in the files vbuiltins.md and builtins.md. All intrinsics are expanded into an RTL expression that makes use of the unspec operation, which denotes an operation whose functionality is unspecified. As an example, we give the expansion rule for the EVP-C intrinsic evp_vadd_16, which returns the addition of two vectors assuming 16 bit elements. Inside the back end files, all intrinsics are referred to as builtins, and the evp_ part of the intrinsic names is replaced with __builtin_ in the evp.h header file which should be included in all files containing EVP-C intrinsics that are to be compiled with EVP-GCC.

(define_expand "__builtin_vadd_16"
  [(set (match_operand:V16HI 0 "vr_register_operand" "=Yvr")
        (unspec:V16HI [(match_operand:V16HI 1 "vr_register_operand" "Yvr")
                       (match_operand:V16HI 2 "vr_register_operand" "Yvr")]
                      UNSPEC___builtin_vadd_16))]
  ""
  "")

The unspec construct takes a vector with a number of arguments (two in this case) and a number to distinguish this RTL expression from other unspec expressions. The term UNSPEC___builtin_vadd_16 is a constant defined earlier. These constants, one for each intrinsic, are defined using a define_constants statement in vbuiltins.md.

(define_constants
  [(UNSPEC___builtin_vstore   1000)
   (UNSPEC___builtin_p_vstore 1001)
   (UNSPEC___builtin_vadd_8   1002)
   (UNSPEC___builtin_vadd_16  1003)
   ...
   (UNSPEC___builtin_vmov_ptr_on_vlsu 1083)])

These constants start counting from 1000 onward for vector intrinsics, and from 2000 onward for scalar intrinsics. The expansion rule above expands the SSA statement (taken from the testbench file fft_btail_vvi.c)

v_0109 = __builtin_vadd_16 (v_0, v_4);

into the following RTL expression:

(insn 104 103 105 4 fft_btail_vvi.c:241
  (set (reg:V16HI 261)
       (unspec:V16HI [(reg/v:V16HI 175 [ v_0109 ])
                      (reg/v:V16HI 169 [ v_2115 ])] 1003))
  -1 (nil))

Besides defining the expansion rules and instruction patterns, we have to register the builtin functions with GCC. This is done in two target hooks, TARGET_INIT_BUILTINS and TARGET_EXPAND_BUILTIN. To simplify adding intrinsics to the target, we generate this initialization and expansion through the use of two definition files, builtins.def and vbuiltins.def, for the scalar and vector builtins, respectively. These files contain a simple definition for all builtins according to their EVP-C syntax, and store information like the intrinsic name, builtin code, name of the corresponding expansion rule, and the arity and types of the arguments needed. Continuing with the example, we give the definition of the previously mentioned evp_vadd_16 intrinsic in vbuiltins.def:

DEF_BUILTIN ("__builtin_vadd_16", BUILTIN___builtin_vadd_16,
             __builtin_vadd_16, 2,
             V16HI_type_node, V16HI_type_node, V16HI_type_node)

This definition contains all the information needed to implement an intrinsic operation. Depending on the required information, the macro DEF_BUILTIN can be defined locally in a way that extracts only the fields required. GCC handles all builtins as function calls, and has to recognize them as such. For this the target hook TARGET_INIT_BUILTINS must be defined, which initializes all builtins. This hook is tied to evp_init_builtins, which contains the following:

void
evp_init_builtins (void)
{
#define DEF_BUILTIN(name, code, rtl_name, arity, ...) \
  add_builtin_function (name, build_function_type_list (__VA_ARGS__, NULL_TREE), \
                        code, BUILT_IN_MD, NULL, NULL_TREE);

#include "builtins.def"
#undef DEF_BUILTIN

  evp_vector_init_builtins ();
}

When the DEF_BUILTIN macro is expanded, this results in a call to the GCC front-end function add_builtin_function, which adds the builtin to the internal GCC hash table so it can be recognized based on its name (the first field of the DEF_BUILTIN construct). It also calls build_function_type_list, which returns a tree of the types of all arguments of the builtin. When all scalar builtins are initialized and can be recognized by the compiler, evp_init_builtins() calls a separate function that initializes the vector builtins.

void
evp_vector_init_builtins (void)
{
  V32QI_type_node = build_vector_type (intQI_type_node, 32);
  V16HI_type_node = build_vector_type (intHI_type_node, 16);
  V16SI_type_node = build_vector_type (intSI_type_node, 16);
  V8SI_type_node  = build_vector_type (intSI_type_node, 8);

  unsigned_V32QI_type_node = build_vector_type (unsigned_intQI_type_node, 32);
  unsigned_V16HI_type_node = build_vector_type (unsigned_intHI_type_node, 16);
  unsigned_V16SI_type_node = build_vector_type (unsigned_intSI_type_node, 16);
  unsigned_V8SI_type_node  = build_vector_type (unsigned_intSI_type_node, 8);

  p_V32QI_type_node = build_pointer_type (V32QI_type_node);
  p_V16HI_type_node = build_pointer_type (V16HI_type_node);
  p_V16SI_type_node = build_pointer_type (V16SI_type_node);
  p_V8SI_type_node  = build_pointer_type (V8SI_type_node);

#define DEF_BUILTIN(name, code, rtl_name, arity, ...) \
  add_builtin_function (name, build_function_type_list (__VA_ARGS__, NULL_TREE), \
                        code, BUILT_IN_MD, NULL, NULL_TREE);

#include "vbuiltins.def"
#undef DEF_BUILTIN
}

This function does the same thing for the vector builtins as evp_init_builtins does for scalar builtins, but a few extra types are needed, since the target-specific types are not known to GCC. For this a few new types are defined. For each vector mode a type node is generated by calling the GCC function build_vector_type, supplying the type of each element and the number of elements as arguments. When the compiler encounters a builtin during the expansion pass and it is recognized as such, it invokes the target hook TARGET_EXPAND_BUILTIN, which calls the function evp_expand_builtin().

rtx
evp_expand_builtin (tree exp, rtx target, rtx subtarget ATTRIBUTE_UNUSED,
                    enum machine_mode mode ATTRIBUTE_UNUSED,
                    int ignore ATTRIBUTE_UNUSED)
{
  tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
  tree arglist = CALL_EXPR_ARGS (exp);
  unsigned int fcode = DECL_FUNCTION_CODE (fndecl);

  if (evp_vector_builtin_p (fcode))
    return evp_vector_expand_builtin (exp, target, subtarget, mode, ignore);

#define DEF_BUILTIN(name, code, rtl_name, arity, ...) \
  if (fcode == code) \
    { \
      return evp_expand_nary_builtin (arity, CODE_FOR_##rtl_name, \
                                      arglist, target); \
    } \
  else

#include "builtins.def"
  return 0;
#undef DEF_BUILTIN

  return 0;
}

This function first extracts the arguments and the code of the builtin, and then checks whether we are dealing with a vector or a scalar builtin. In case of a vector builtin, the function evp_vector_expand_builtin is called (which is a copy of this function, except that it includes vbuiltins.def). Then, by including the right builtin definition file and testing the function code of the builtin against the builtin code defined in the .def file, the function evp_expand_nary_builtin is called and supplied with the arity, builtin code, arguments and target operand. evp_expand_nary_builtin does the real expansion to RTL for all builtins.

rtx
evp_expand_nary_builtin (int arity, int icode, tree arglist, rtx target)
{
  rtx pat, op[arity];
  tree aux, args[arity];
  enum machine_mode target_mode, modes[arity];
  int i;

  aux = arglist;
  for (i = 0; i < arity; i++, aux = TREE_CHAIN (aux))
    args[i] = TREE_VALUE (aux);

  target_mode = insn_data[icode].operand[0].mode;
  for (i = 0; i < arity; i++)
    modes[i] = insn_data[icode].operand[i + 1].mode;

  for (i = 0; i < arity; i++)
    op[i] = expand_expr (args[i], NULL_RTX, modes[i], 0);

  for (i = 0; i < arity; i++)
    if (VECTOR_MODE_P (modes[i]))
      op[i] = evp_safe_vector_operand (op[i], modes[i]);

  if (target == NULL_RTX)
    target = gen_reg_rtx (target_mode);

  for (i = 0; i < arity; i++)
    {
      if (GET_CODE (op[i]) == SUBREG && GET_MODE (op[i]) == QImode
          && (icode == CODE_FOR___builtin_cmpsi_eq
              || icode == CODE_FOR___builtin_cmpsi_lte
              || icode == CODE_FOR___builtin_cmphi_eq
              || icode == CODE_FOR___builtin_cmphi_lte))
        {
          rtx bi_target = gen_reg_rtx (BImode);

          op[i] = bi_target;
          target = op[i];
        }
      else if (GET_CODE (op[i]) == SUBREG && GET_MODE (op[i]) == QImode
               && modes[i] == BImode)
        {
          rtx bi_target = gen_reg_rtx (BImode);

          emit_move_insn (bi_target,
                          gen_rtx_SUBREG (BImode, SUBREG_REG (op[i]),
                                          SUBREG_BYTE (op[i])));
          op[i] = bi_target;
        }
      else if (GET_MODE (op[i]) == modes[i] || GET_MODE (op[i]) == VOIDmode)
        op[i] = force_reg (modes[i], op[i]);
    }

  if (arity == 1)
    pat = GEN_FCN (icode) (target, op[0]);
  else if (arity == 2)
    pat = GEN_FCN (icode) (target, op[0], op[1]);
  else if (arity == 3)
    pat = GEN_FCN (icode) (target, op[0], op[1], op[2]);
  else if (arity == 4)
    pat = GEN_FCN (icode) (target, op[0], op[1], op[2], op[3]);
  else if (arity == 5)
    pat = GEN_FCN (icode) (target, op[0], op[1], op[2], op[3], op[4]);
  else if (arity == 6)
    pat = GEN_FCN (icode) (target, op[0], op[1], op[2], op[3], op[4], op[5]);
  else if (arity == 7)
    pat = GEN_FCN (icode) (target, op[0], op[1], op[2], op[3], op[4],
                           op[5], op[6]);
  else
    error ("no expand builtin for this arity!!");

  if (!pat)
    return 0;
  emit_insn (pat);

  return target;
}

This function extracts the arguments of the builtin and stores them in the arity-sized array args[], extracts the modes of all the operands and stores them in the array modes[], and builds an array op[] with the operands of the builtin. Following this, if no target was supplied, a target register of the proper mode is generated. Then all arguments to the builtin are forced into registers via the GCC force_reg function. In some cases, forcing the register is not enough. By default GCC treats booleans as typecast ints, so the operand supplied to a builtin that needs a BImode operand is a SUBREG of an HImode register, and simply forcing this into BImode will not work. Therefore we catch these cases and manually generate a move from the current register into a BImode register. After all arguments are of a mode expected by the define_expand rules, the generated expansion function corresponding to this builtin is called by means of the GCC macro GEN_FCN (icode), which results in the generation of an RTL expression that matches the define_insn of the builtin we are currently expanding. If this RTL expression is generated, it is then emitted into the instruction stream.

3.2.1.4 Defining the instruction patterns

When all SSA statements are expanded, each RTL instruction in the instruction stream should cor- respond to a machine instruction. These machine instructions are defined separately using the define_insn construct in the machine description files. These instruction patterns are similar to the expansion rules, with the following fields:

(define_insn "name"
  [RTL template]
  "condition"
  "output template"
  [instruction attributes])

The first three fields are the same as for an expansion rule, with the remark that the name field is optional. GCC assigns all patterns a number, and defines for each named pattern a constant CODE_FOR_name. This constant can be used to recognize instructions in back end functions. Alternatively, a pattern can be named with a string preceded by a '*'. This is effectively the same as an unnamed pattern, but the name is printed in debugging dumps, which assists with debugging the compiler. The output template field contains a string (or a block of C code returning a string) that is used when the assembly code is generated at the end of compilation. For more information on this, see section 3.4. The (optional) instruction attributes field contains attributes for the operation. Attributes are defined in the machine description files (see section 3.2.1.7) and consist of enumeration types containing the different possible values for the attribute. One of the uses of these attributes is to mark the usage of functional units for the scheduler (see section 3.3.2). These instruction patterns are defined for all possible RTL expressions that can be generated by the RTL generation pass. To continue the example of the evp_vadd_16 instruction described in section 3.2.1.3, the instruction pattern is defined as follows:

(define_insn "*__builtin_vadd_16"
  [(set (match_operand:V16HI 0 "vr_register_operand" "=Yvr")
        (unspec:V16HI [(match_operand:V16HI 1 "vr_register_operand" "Yvr")
                       (match_operand:V16HI 2 "vr_register_operand" "Yvr")]
                      UNSPEC___builtin_vadd_16))]
  ""
  "vadd16 %0, %1, %2%# evp_vadd_16"
  [(set_attr "vector_type" "valu_1")]
)

Each match_operand rule in the RTL template specifies the valid operands to this instruction. The last two fields contain a predicate and the constraints. The predicate is a rule that returns true or false based on the mode of the operand and its register class. The predicates used in EVP-GCC are detailed in section 3.2.1.5. In this case vr_register_operand checks if the operand is in a vr register (or a pseudo-register if we are still before the reload pass). The constraint Yvr also denotes a vr register. The ’=’ symbol marks this operand as an output operand, i.e. it is being written to in this instruction. This information is used in optimization passes like global common subexpression elimination and dead code elimination. The constraints defined for the EVP are detailed in section 3.2.1.6. A predicate can cover more than one register class. Generally the constraints should cover all possible combinations of these register classes. One such case is the load of a PSImode value from memory into a register. A value of this mode can reside in two general purpose registers, one pointer or one offset register.

(define_insn "psi_simple_load"
  [(set (match_operand:PSI 0 "psi_register_operand" "=b,b,Z,Z,d,d")
        (mem:PSI (match_operand:PSI 1 "simple_addr_operand" "i,b,i,b,i,b")))]
  ""
  "load %0, %1 %)%#"
  [(set_attr "type" "load")
   (set_attr "mode" "SI")
   (set_attr "length" "3")
   (set_attr "long_imm" "yes,no,yes,no,yes,no")])

Here the predicate psi_register_operand is valid for PSImode in the three different register classes mentioned above, and simple_addr_operand accepts a pointer register or an immediate address. To reflect this in the constraints, each constraint field is a comma-separated list covering all possible combinations of these registers and immediates. The first pair of constraints to match the operands determines the output of the assembly statement. The attributes can also change depending on which set of constraints is matched: in this case, if the second operand is a pointer register, the long_imm attribute is set to no; if it is an immediate address, it is set to yes.

As a general rule, all permutations of the predicates should be represented in the constraints field. If not all permutations are valid, the instruction pattern must be split up into multiple patterns. One such case is the move from one intra-vector mask register to another. The corresponding mode is V16HImode, the same as for a vr register, yet an im register does not contain the same number of bits as a vr register. The GCC register allocator is not aware of this, however, and tries to spill vr registers into im registers instead of into memory. The move back from the im register to the vr register then results in a loss of data. Therefore the instruction pattern for a move between V16HImode operands is split into two instruction patterns:

(define_insn "move_v16hi"
  [(set (match_operand:V16HI 0 "vr_or_ir_register_operand" "=Yvr,Yvr,Yir,Yir")
        (match_operand:V16HI 1 "vr_or_ir_register_operand" "Yvr,Yir,Yvr,Yir"))]
  ""
  "vmove %0, %1%)%#"
  [(set_attr "vector_type" "valu_1")
   (set_attr "length" "1")
   (set_attr "long_imm" "no")]
)

(define_insn "move_vm_to_vm_v32qi"
  [(set (match_operand:V32QI 0 "vm_register_operand" "=Yvm")
        (match_operand:V32QI 1 "vm_register_operand" "Yvm"))]
  ""
  "vmand %0, %1, vm0%)%#"
  [(set_attr "vector_type" "vmalu_1")
   (set_attr "length" "1")
   (set_attr "long_imm" "no")]
)

This way the reload pass will recognize there is no available move from vr to im registers and back, and will spill onto the stack.

3.2.1.5 Defining the predicates

Not every operand is valid for every RTL expression; we cannot, for example, load a scalar from an address referenced by a vector. Therefore all operands in an RTL expression need to be checked and allowed or disallowed as necessary. For this GCC uses predicates and constraints. Predicates are checked when GCC expands a statement into an RTL expression or when an RTL expression is matched to a certain define_insn. GCC has a number of built-in standard predicates, and for more specific tests target-specific predicates can be defined. The standard GCC predicates are[24]:

immediate_operand
const_int_operand
const_double_operand
register_operand
pmode_register_operand
scratch_operand
memory_operand
address_operand
indirect_operand
push_operand
pop_operand
nonmemory_operand
general_operand

These built-in predicates are all implemented like functions in GCC, with the operand and the mode of the operand passed as arguments. As said earlier, target-specific predicates also need to be defined. To define them, we use the define_predicate construct, which consists of the name of the predicate, an RTL expression and an optional block of C code, all of which should return true if an operand matches the predicate. As an example, in order to construct the predicate general_register_operand we add the following lines to the predicates.md file:

(define_predicate "general_register_operand"
  (match_code "reg,subreg")
{
  if (! register_operand(op, mode))
    return 0;
  if (GET_CODE (op) == SUBREG)
    op = SUBREG_REG (op);
  return GENERAL_REGNO(REGNO(op)) || PSEUDO_REGNO(REGNO(op));
})

This checks if the operand code is either REG or SUBREG, and then evaluates the C code, which returns true if and only if the operand satisfies the register_operand predicate and the register number of the operand is either an r register or a pseudo-register. In a similar way the following predicates are defined:

general_register_operand
Matches any general purpose (r) register or pseudo-register that is of QImode, HImode, SImode or PSImode. This predicate also matches subregisters which meet these requirements.

strict_general_register_operand
Identical to general_register_operand but doesn't match pseudo-registers.

predicate_register_operand
Matches any predicate register or pseudo-register that is of BImode.

base_register_operand
Matches any register that is either a ptr register or a pseudo-register and is of PSImode.

strict_base_register_operand
Identical to base_register_operand but doesn't accept pseudo-registers.

index_register_operand
Matches any ofs register or pseudo-register of PSImode.

strict_index_register_operand
Identical to index_register_operand but doesn't accept pseudo-registers.

loop_counter_register_operand
Only matches either the lc register or a pseudo-register.

r_register_operand
Checks whether the operand is an r register or pseudo-register, but does not accept subregisters.

r_reg_or_imm_operand
Matches any operand that matches either the r_register_operand or the standard const_int predicate.

simple_addr_operand
This predicate matches any operand that matches either the base_register_operand, const_int or symbol_ref predicate. This is any address operand that refers to a valid memory location to load/store a value to/from.
simple_addr_operand1
Identical to simple_addr_operand, but accepts no ptr registers. This means immediate addresses only.

address_operand1
Matches any memory address of either a label (inside program memory) or a constant memory location (inside data memory).

immediate_addr_plus_immediate
Matches all addresses that come in the form of an immediate plus an offset that is also an immediate. This predicate is only used in the nonaddress_operand predicate, to conveniently exclude these constructions.

nonaddress_operand
Matches any immediate or register operand that is not a symbol reference or label reference and does not match the immediate_addr_plus_immediate predicate.

reg_or_const_int_operand
Matches any operand that matches either the register_operand or the const_int predicate.

register_or_simple_addr_operand
This predicate matches everything that matches either the register_operand or base_register_operand predicate. In other words, any r, ptr or pseudo-register that is of QImode, HImode, SImode or PSImode.

general_register_or_index_operand
general_or_base_register_operand
immediate_or_general_register_operand
immediate_index_general_register_operand
base_immediate_operand
base_or_index_operand
base_index_or_immediate_operand
These predicates are all combinations of general_register_operand, base_register_operand, index_register_operand and const_int. They are inclusive ors, so a match for either sub-predicate returns a match for the combined predicate.

psi_register_operand
This predicate matches all possible register operands that can hold a PSImode value. This is a combination of base_register_operand, index_register_operand and general_register_operand.

offset_operand
Can be either a ptr, ofs, immediate or PSImode pseudo-register operand, with the exception of constant references to a label or symbol. Basically any operand that can function as an offset in pointer-offset addressing mode.
offset_operand1
Similar to offset_operand, but cannot match ptr registers.

offset_ptr_update
Any operand that a pointer can be updated with matches this predicate, either in a regular ptr_update instruction or with post-increment loads or stores. This means anything that either matches index_register_operand or has a const_int code, with the exception of SYMBOL_REFs or LABEL_REFs, as these can only be assigned to a pointer through a move.

general_or_ra_register_operand
base_or_ra_register_operand
base_index_or_ra_register_operand
These three predicates are combinations of the ra register with the general_register_operand, base_register_operand and base_or_index_operand predicates respectively.

evp_call_operand
This predicate is valid for any operand that can be a target for a call instruction.

Not all of these predicates are implemented with (match_operand ...) or similar expressions. For some predicates a supporting function named evp_ok_for_... was defined inside evp.c. This was done for predicate, pointer, offset, loop counter and general purpose registers. As an example, we give the evp_reg_ok_for_base_p function here:

int evp_reg_ok_for_base_p(rtx op, enum machine_mode mode, int strict)
{
  if (GET_CODE(op) != REG)
    return false;
  if (mode != Pmode)
    return false;
  if (strict)
    return PTR_REGNO (REGNO (op));
  else
    return (PTR_REGNO (REGNO (op)) || PSEUDO_REGNO (REGNO (op)));
}

Here the op argument is the operand to be checked, the mode argument is the machine mode the operand should have, and strict is a boolean that signifies whether pseudo-registers are rejected. strict is set to true for the strict_ variants of the predicates and false otherwise.

The vector register predicates in vpredicates.md are rather straightforward.

vr_register_operand
vm_register_operand
ir_register_operand
im_register_operand
vsp_register_operand
cgu_register_operand
These predicates simply check whether the operand is a certain type of register. For each vector register file a supporting function evp_ok_for_regno_..._p is defined and called every time the predicate is checked.

vr_or_ir_register_operand
This predicate is used in those instructions that can accept either a vr or an ir register.

allequal_vector_const_operand
allequal_vector_const_int_operand
allequal_vector_const_int_0_operand
allequal_vector_const_int_2_operand
These four predicates check whether a vector operand is made up of all identical constants. allequal_vector_const_int_0_operand and allequal_vector_const_int_2_operand check if this value is equal to zero or two respectively. These two predicates are used for the vector clear instruction and for the vsub_div2 and vadd_div2 instructions, which add or subtract two vectors and divide the result by 2.
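The strict/non-strict distinction above can be illustrated with a small stand-alone sketch. The register-number layout and helper names below are made up for illustration and do not reflect the actual EVP register numbering:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical register numbering: ptr registers occupy hard register
   numbers 16..31, and anything at or above 64 is a pseudo-register. */
#define FIRST_PTR_REGNO    16
#define LAST_PTR_REGNO     31
#define FIRST_PSEUDO_REGNO 64

static bool ptr_regno_p(int regno) {
    return regno >= FIRST_PTR_REGNO && regno <= LAST_PTR_REGNO;
}

static bool pseudo_regno_p(int regno) {
    return regno >= FIRST_PSEUDO_REGNO;
}

/* Before reload (non-strict) a pseudo-register is still acceptable as a
   base register; after reload (strict) only a hard ptr register is. */
static bool reg_ok_for_base_p(int regno, bool strict) {
    if (strict)
        return ptr_regno_p(regno);
    return ptr_regno_p(regno) || pseudo_regno_p(regno);
}
```

The same two-mode shape recurs in all the strict_ predicate pairs listed above: the only difference between a predicate and its strict_ variant is whether the pseudo-register branch is taken.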

3.2.1.6 Defining Constraints

When the predicates inside a define_insn are matched, there can still be differences in the instruction attributes or the assembly mnemonic that has to be emitted for a specific set of operands. Take for example the movpsi_general2 instruction:

(define_insn "movpsi_general2"
  [(set (match_operand:PSI 0 "general_or_base_register_operand" "=b,d")
        (match_operand:PSI 1 "address_operand1" ","))]
  ""
  "@
   move_slsu %0, %1 %)%#
   move %0, %1 %)%#"
  [(set_attr "type" "slsu_s1,alu")
   (set_attr "mode" "SI")
   (set_attr "length" "1")
   (set_attr "long_imm" "yes,yes")]
)

Here we see a set RTL expression that moves an operand matching the address_operand1 predicate to one matching general_or_base_register_operand. Since the target operand can be either a pointer or a general purpose register and the assembly mnemonic is different for each, GCC needs a way to make this distinction. For this the constraints are specified. For the target operand in this case this is "=b,d": the = signifies that this operand is an output operand and is changed in this instruction, the b matches any ptr register, and the d any r register. The "," string in the constraints field of the source operand matches any register or constant. When, for example, the compiler has generated a PSImode move of the address of the global variable foo to the register pair r1r0, the second set of constraints is met, and in the assembly generation phase GCC uses the second alternative in the assembly output template and correctly outputs the instruction

move r1r0, @foo

The constraints not only select among assembly output alternatives; we can also specify attributes based on which alternative is selected. As seen above, we have specified the type attribute as slsu_s1 for the first alternative and alu for the second. If only one value is given, it applies to all alternatives. The type attribute helps to determine on which functional unit an operation is to be scheduled; for more on this, see section 3.3.2. GCC has several predefined constraints, of which we only use the i constraint, which signifies any constant immediate value, including symbolic constants that are not known until compile time. To specify further constraints GCC provides the define_constraint and define_register_constraint constructs. These target-specific constraints are defined in the file constraints.md, included from the main machine description file, evp.md. The define_register_constraint construct allows us to directly tie a register class to a constraint letter or letters, which we have done for most register classes:

(define_register_constraint "d" "R_REGS" "")
(define_register_constraint "j" "P_REGS" "")
(define_register_constraint "b" "PTR_REGS" "")
(define_register_constraint "y" "RA_REGS" "")
(define_register_constraint "Z" "OFS_REGS" "")
(define_register_constraint "T" "LC_REGS" "")

(define_register_constraint "Yvr" "VR_REGS" "")
(define_register_constraint "Yvm" "VM_REGS" "")
(define_register_constraint "Yvs" "VSP_REGS" "")
(define_register_constraint "Yir" "IR_REGS" "")
(define_register_constraint "Yim" "IM_REGS" "")
(define_register_constraint "Ycg" "CGU_REGS" "")

These are all letters not in use by GCC itself, kept free specifically so that a back end can create its own target-specific constraints. Because the EVP has a complex instruction coding scheme, the size of an immediate argument in the instruction stream can influence the usage of functional units in the EVP. Only one instruction per cycle can make use of the long immediate field of the instruction word, and depending on the instruction the maximum size or value of an immediate determines whether or not using this long immediate field is necessary. To avoid using this field whenever possible, we have to be able to specify exactly which immediates GCC can and should use. To this end we define thirteen new constraints, all pertaining to these immediates.

C0 A constant with the value 0.

C1 A constant with the value 1.

I A 32-bit unsigned integer (i.e. between 0 and 0xFFFFFFFF).

J A 24-bit signed integer (between -0x800000 and 0x7FFFFF).

K A 24-bit unsigned integer (between 0 and 0xFFFFFF).

L A 16-bit unsigned integer (between 0 and 0xFFFF).

M A 16-bit signed integer (between -0x8000 and 0x7FFF).

N A constant integer in the range [-32, 31].

O A constant integer in the range [-16, 15].

P A power of two (with a positive or negative sign) in the range [-32, 32].

U3 A constant integer in the range [0, 7].

U4 A constant integer in the range [0, 15].

U5 A constant integer in the range [0, 31].

Most constraints can be handled by a boolean expression, but for the P constraint a new function is introduced, evp_LSIMM_immediate (int), which returns whether or not a constant satisfies the P constraint. When GCC checks whether a certain alternative is valid, it returns the first alternative that fits. This means that if one set of constraints is a subset of another, the alternative that is optimal from a scheduling point of view should occur first in the constraints field of the RTL.
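A possible implementation of the P constraint check, in the spirit of evp_LSIMM_immediate, could look as follows. The exact semantics of the real function are not given here, so this is a sketch under the assumption that any signed power of two with absolute value at most 32 qualifies:

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch: accept any value whose absolute value is a power of two
   no greater than 32 (i.e. +-1, +-2, +-4, ..., +-32). */
static bool p_constraint_ok(int value) {
    int v = value < 0 ? -value : value;
    if (v == 0 || v > 32)
        return false;
    return (v & (v - 1)) == 0;   /* exactly one bit set */
}
```

The bit trick in the last line relies on the fact that a positive power of two has exactly one bit set, so clearing its lowest set bit yields zero.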

3.2.1.7 Defining the instruction attributes

As mentioned in section 3.2.1.4, each instruction definition can have several instruction attributes. These attributes can be queried through a function generated by GCC; for example, the function used to query the type attribute of an instruction is get_attr_type(rtx insn). To check the returned value, an enumeration type is defined containing all possible values of the attribute. Querying the type attribute of an instruction that has ptr_ofs_move as its type will result in the value corresponding to TYPE_PTR_OFS_MOVE. To define these attributes we use the define_attr construct, which takes as arguments the name of the attribute, the possible values and a default value which is applied if the attribute is not specified in the instruction pattern. For example, the long_imm attribute is specified as follows:

(define_attr "long_imm" "yes,no" (const_string "no"))
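The generated accessor can be pictured with a small stand-in: GCC emits an enumeration of the attribute values and a get_attr_<name> function per attribute. The struct and function body below are simplified stand-ins, not GCC's generated code:

```c
#include <assert.h>

/* Stand-in for the enumeration GCC generates from the type attribute. */
enum attr_type { TYPE_ALU, TYPE_LOAD, TYPE_PTR_OFS_MOVE };

/* Simplified stand-in for an insn carrying its attribute value; the real
   get_attr_type recovers this from the matched instruction pattern. */
struct insn { enum attr_type type; };

static enum attr_type get_attr_type(const struct insn *i) {
    return i->type;
}
```

Back end code can then switch over the enumeration values rather than comparing strings, which is both faster and checked by the compiler.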

3.2.2 Function calls

To expand function calls GCC has the predefined pattern names call and call_value, which are expanded when a function call is encountered in Tree-SSA. The call pattern expands a function call that doesn't return a value, whereas call_value is used for functions that return a value. GCC passes three arguments to the define_expand: the first is the function to call, the second the number of bytes passed as a const_int, and the third the number of registers that are used as operands. In case of a non-void function a fourth argument is passed to signify the hard register that should contain the return value. The function evp_expand_call handles function call expansion. Several things must be taken care of for a function call. In case of a function pointer, the function address first has to be moved to two r registers containing the address in SImode, because the EVP only accepts either an immediate address or a pair of r registers as operands to a call assembly instruction. In case of a function that is passed a variable number of arguments on the stack (the printf function from the standard C library, for example) we need to make the register signified in next_arg_reg point to the last argument that was pushed onto the stack, and then emit the call instruction in parallel with a clobber and a use expression, to tell the compiler that the value in next_arg_reg may not be valid after returning from the call and that the value in next_arg_reg is used inside the callee function, so it should not be optimized away. When a non-stdargs function is called, we simply emit an RTL expression of the form

(set (match_operand 0 "register_operand" "=d,d")
     (call (mem (match_operand:SI 1 "immediate_or_general_register_operand" "r,i"))
           (const_int 0)))

for a function returning a value, and one of the form

(call (mem (match_operand:PSI 0 "immediate_or_general_register_operand" "r,i"))
      (const_int 0))

for calls to functions not returning anything. To support the parallel RTL expression generated for stdarg functions, instruction patterns are also written that recognize the instruction with the use and clobber expressions.

3.2.3 Function argument passing

In order to pass arguments to callee functions there is a specific convention to be followed. For this we use the same convention the production compiler currently uses[35]. Arguments can only be passed in call-used registers (see section 3.1.2). For each register type, the lower-numbered registers are used first, with the exception of pointer registers, for which the order is: ptr0, ptr8, ptr1, ptr9, ptr2, ptr10, ptr3 and ptr11. To get all arguments in their correct registers, GCC calls the target hook FUNCTION_ARG for each argument that the function being called requires. To assist with this, a target-specific struct CUMULATIVE_ARGS must be defined and initialized through INIT_CUMULATIVE_ARGS. This struct contains all necessary information about the function call, such as the function declaration tree, whether or not it is a library call, the number of arguments left to pass and the number of arguments already passed in each register type. The steps in evp_function_arg are:

evp_function_arg {
  if (argument cannot be passed in a register)
    return NULL_RTX
  else if (there is no free register available of the required type)
    return NULL_RTX
  else
    increase number of registers used
    return the register used
}

GCC then emits a move of the required value to the register returned or, if no register was returned, pushes the argument onto the stack.
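The pointer-register ordering described above can be sketched as a small table-driven helper. The function name and the stack-fallback return value are illustrative, not the actual evp_function_arg interface:

```c
#include <assert.h>

/* Hand out ptr registers in the calling-convention order ptr0, ptr8,
   ptr1, ptr9, ptr2, ptr10, ptr3, ptr11; -1 means "pass on the stack".
   ptr_args_used plays the role of the counter kept in CUMULATIVE_ARGS. */
static const int ptr_arg_order[8] = { 0, 8, 1, 9, 2, 10, 3, 11 };

static int next_ptr_arg_reg(int *ptr_args_used) {
    if (*ptr_args_used >= 8)
        return -1;
    return ptr_arg_order[(*ptr_args_used)++];
}
```

A lookup table keeps the interleaved ptr0/ptr8 ordering out of the control flow, which mirrors how a CUMULATIVE_ARGS counter per register type is all the state the hook needs.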

3.2.4 Function prologue and epilogue generation

The function prologue and epilogue should contain all context save and restore instructions. This includes saving the old frame pointer on the stack. These instructions are emitted in the functions evp_expand_prologue and evp_expand_epilogue. These functions are called from the prologue and epilogue expansion rules.

Figure 3.1: EVP-GCC Stack frame layout

The layout of the stack frame, as per the calling convention of the production compiler, is depicted in figure 3.1. First the prologue and epilogue expansion functions call the supporting function evp_stack_info, which calculates for all hard registers whether they should be saved on the stack or not, and at which offset from the stack pointer. This information is stored in a struct containing an array which in turn holds a struct for each register. After some initialization, the evp_stack_info function checks whether the return address register and old frame pointer need to be stored. For the ra register, this is a simple check against the variable current_function_is_leaf, and the frame pointer needs to be stored if the size of the frame is non-zero. Following this, space is reserved on the stack for local variables that need to be stored during function execution; a simple call to get_frame_size() determines the space needed. After this, the start of the area where the call-saved registers are stored is aligned to a 32-byte boundary. The function now traverses all registers to check if they should be saved on the stack and at which offset. Registers should be saved if they are not marked as call-used (see section 3.1.2) and if they are live within the function. This liveness check is done via a call to the GCC mainline function df_regs_ever_live_p, which returns whether a register is ever live inside the current function. If both conditions hold, the variable offset, which holds the current offset relative to the stack pointer, is increased by the size of the register and stored in the reg_offset array at the location of the corresponding register. Similarly, a field in the save_p array is set denoting that the register should be saved, and the field in the reg_mode array stores the mode of the register.
To minimize the number of context save/restore instructions, we can merge the instructions that load or store two consecutive r registers into one. To facilitate this, we first check each even r register to see if it, together with the next odd register, is live and not call-used. If this is the case, the even register is marked to be saved and restored as an SImode value, and the following odd-numbered register is marked as not saved. A separate variable keeps track of all registers marked in this way so they are skipped when traversed individually. Another problem that arises with the layout of the stack frame is the fact that regular vector loads and stores require the data to be aligned to a 256-bit boundary3. In order to meet this requirement, the offset value is aligned to a multiple of 32 bytes before the vector register offsets are calculated. This guarantees alignment of all vector registers to the required boundaries.
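The alignment step above amounts to rounding the running stack offset up to the next multiple of 32; this sketch assumes offsets are counted in bytes:

```c
#include <assert.h>

/* Round a stack offset up to the next 32-byte (256-bit) boundary, as
   done before the vector-register save area offsets are calculated. */
static int align_up_32(int offset) {
    return (offset + 31) & ~31;
}
```

Because 32 is a power of two, the add-and-mask form avoids a division and is the idiomatic way such alignment is computed in compiler back ends.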

3.3 VLIW Scheduling

Another major task in the EVP-GCC port was adapting the instruction scheduler of the compiler. The EVP is a VLIW processor, which means that several instructions can be scheduled in a single clock cycle and executed in parallel[25]. This section describes the steps taken to implement VLIW scheduling in GCC.

3.3.1 The Deterministic Finite Automaton

The scheduling pass of GCC makes use of a Deterministic Finite Automaton (DFA) as proposed by Vladimir Makarov in [28]. This DFA scheduler determines if a new instruction can be scheduled based on the current state of the pipeline, which contains information on the state (occupied or free) of all functional units in all pipeline stages. The scheduler maintains a linked list of instructions whose data dependencies have been resolved (the ready list) and tries to schedule the instruction with the highest priority. The priority of all instructions is determined at the start of a new cycle and is based on whether or not an instruction is part of the critical path to the end of the basic block. Instruction priority can also be modified through the TARGET_SCHED_ADJUST_PRIORITY target hook (for more details on changes in instruction priorities, see section 3.3.5). If a new instruction can be scheduled, the finite state machine advances to a new state which reflects the state of the pipeline after the instruction is scheduled in the current cycle. If an instruction cannot be scheduled due to resource conflicts, it is delayed, and the scheduler tries to schedule another instruction from the ready list. If the ready list is empty or all functional units are occupied, the scheduler advances a cycle, recalculates priorities and continues scheduling. This repeats until all instructions in a basic block are scheduled. To model the processor, we have to define all functional units and the resource reservation patterns, and match each instruction to a specific pattern.
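The cycle-by-cycle behaviour described above can be condensed into a toy list scheduler. Instructions here are reduced to the single functional unit they need, priorities to their order in the ready list, and the DFA state to a bitmask of occupied units; all of this is a deliberate simplification of what GCC actually does:

```c
#include <assert.h>

/* Toy list scheduler: ready_units[i] is the one functional unit
   instruction i needs; cycle_of[] must be preset to -1 by the caller.
   The "DFA state" is just a bitmask of units occupied in the current
   cycle, cleared when the scheduler advances a cycle.
   Returns the total number of cycles used. */
static int schedule(const int *ready_units, int n, int *cycle_of) {
    unsigned busy = 0;
    int cycle = 0, scheduled = 0;
    while (scheduled < n) {
        for (int i = 0; i < n; i++) {
            if (cycle_of[i] >= 0)
                continue;                    /* already placed */
            unsigned bit = 1u << ready_units[i];
            if (!(busy & bit)) {             /* unit free this cycle */
                busy |= bit;
                cycle_of[i] = cycle;
                scheduled++;
            }
        }
        if (scheduled < n) {                 /* nothing more fits: advance */
            cycle++;
            busy = 0;
        }
    }
    return cycle + 1;
}
```

Two instructions wanting the same unit end up in different cycles, while instructions wanting different units share a cycle, which is the essence of the resource-conflict check the DFA performs.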

3The EVP also supports unaligned memory accesses, but they have a higher latency and use additional functional units. Therefore they should be avoided if possible.

3.3.2 Defining the resource usage patterns

Besides data dependencies, a limiting factor for the number of instructions that can be scheduled in parallel is the availability of free functional units to schedule the instructions on. In order to model the usage of these functional units, each instruction has a certain time shape, which defines the occupation of functional units in different pipeline stages. These time shapes are divided into classes. Each instruction belongs to a certain class. The EVP pipeline has a depth of 9 and consists of the following stages:

FET1 FET2 FET3 DEC1 DEC2 DEC3 RRF EXE1 EXE2 EXE3 EXE4

Note that an instruction can stay in some stages for more than one cycle. Because the resource usage in the first five stages (instruction fetch and decode) is identical for all instruction classes, we will model the resource reservations starting with the DEC3 stage. The functional units of a processor are defined through the define_cpu_unit construct. For example, to model the PALU functional unit, we define

(define_cpu_unit "r_palu" "evp")

This construct takes a name and the automaton it is part of. Resource reservation patterns are modeled using two constructs: the patterns themselves are defined through a define_reservation construct, and instructions are matched to a reservation pattern via the define_insn_reservation construct. To define the pattern for the reservation class which uses the PALU we write

(define_reservation "c_palu" "nothing * 2, r_palu, nothing * 3")

This signifies that the c_palu reservation class uses the r_palu functional unit in the third cycle (the EXE1 stage). In all other cycles no units are occupied. In order to match instructions to certain reservation patterns, we use the type attribute defined in the attributes vector of all instructions. For vector instructions this is the vector_type attribute. The matching rule is then set up as follows:

(define_insn_reservation "palu" 1 (eq_attr "type" "palu") "c_palu")

The first field defines the name of this match, which is only used in debugging dumps, and the second field specifies the latency of the instruction. The third field is an expression that matches the type attribute; in this case, if the type attribute is equal to palu, it matches this expression. When a value is written to a register (for an instruction with latency 1) it can typically be used by a following instruction two cycles later (the value is ready at the end of the writeback stage, while the value is read at the start of the read stage). The EVP contains an extensive bypass network, which allows the read stage to fetch the value from the bypass network instead of from the register file, where it would only be available a cycle later. When defining the latency of instructions through the second field of the define_insn_reservation construct, this bypass network is already taken into account. However, not all register files are connected in such a way. For these cases we have to tell the compiler that data dependencies between some instruction classes are different from the standard value defined above. This is managed through the define_bypass construct. For example, an instruction of the slsu_s1 class cannot access a value calculated by a salu_move class instruction through the bypass network, so the latency has to be increased to 2 through the following bypass definition:

(define_bypass 2 "salu_move" "slsu_s1")

The first field denotes the new latency followed by the source and destination instruction classes.
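Conceptually, the effect of a define_bypass entry is a table lookup that overrides the default producer-to-consumer latency. The following C sketch illustrates this (it is not GCC's generated code; the class names and latencies simply mirror the salu_move to slsu_s1 example above):

```c
#include <string.h>

/* Hypothetical sketch of bypass lookup.  Each entry overrides the default
   latency between a producer and a consumer instruction class. */
struct bypass { const char *producer, *consumer; int latency; };

static const struct bypass bypasses[] = {
    /* (define_bypass 2 "salu_move" "slsu_s1") */
    { "salu_move", "slsu_s1", 2 },
};

/* Default latency comes from define_insn_reservation (1 in the text's
   example); a matching bypass entry replaces it. */
int insn_latency_between(const char *producer, const char *consumer,
                         int default_latency)
{
    size_t i;
    for (i = 0; i < sizeof bypasses / sizeof bypasses[0]; i++)
        if (!strcmp(bypasses[i].producer, producer)
            && !strcmp(bypasses[i].consumer, consumer))
            return bypasses[i].latency;
    return default_latency;
}
```

Any producer/consumer pair without a bypass entry keeps the default latency from its define_insn_reservation.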

3.3.3 Modeling the long immediate field

This method of describing the pipeline can do more than just model functional units. The way instructions are encoded in the EVP allows only one instruction per VLIW word to use the long immediate field of the opcode. This puts an extra constraint on scheduling. To model this long immediate constraint, an extra 'unit' is added to the pipeline description. This unit, called long_imm_unit, is used in several extra reservation patterns which are added next to the already existing ones. To model a single-cycle SLSU instruction with one operand being a long immediate, the following reservation pattern is introduced next to the c_slsu_1 pattern:

(define_reservation "c_slsu_1" "nothing, r_slsu, nothing * 4")
(define_reservation "c_slsu_1_long_imm" "long_imm_unit, r_slsu, nothing * 4")

This new reservation pattern now occupies the long immediate field in the DEC3 stage. Other instructions that also require this field now cannot be scheduled in the same cycle, thereby guaranteeing correctness.
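To make the cycle-by-cycle meaning of these reservations concrete, the following C sketch (an illustration, not GCC's generated automaton) spells out the c_palu, c_slsu_1 and c_slsu_1_long_imm patterns as the unit occupied in each cycle, and checks whether two instructions issued in the same VLIW word collide on a unit. The shared long_imm_unit is what excludes two long-immediate instructions from a single cycle:

```c
#include <string.h>

/* Each reservation pattern is the unit occupied per pipeline cycle,
   "nothing" meaning no unit; values transcribe the patterns above. */
#define CYCLES 6

static const char *c_palu[CYCLES] =
    { "nothing", "nothing", "r_palu", "nothing", "nothing", "nothing" };
static const char *c_slsu_1[CYCLES] =
    { "nothing", "r_slsu", "nothing", "nothing", "nothing", "nothing" };
static const char *c_slsu_1_long_imm[CYCLES] =
    { "long_imm_unit", "r_slsu", "nothing", "nothing", "nothing", "nothing" };

/* Two instructions issued in the same cycle conflict if they claim the
   same unit in the same pipeline stage. */
int reservations_conflict(const char *a[], const char *b[])
{
    int c;
    for (c = 0; c < CYCLES; c++)
        if (strcmp(a[c], "nothing") != 0 && strcmp(a[c], b[c]) == 0)
            return 1;
    return 0;
}
```

Note that two c_slsu_1_long_imm instructions conflict twice over: on the long_imm_unit and on r_slsu itself, while a PALU and an SLSU instruction can share a VLIW word freely.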

3.3.4 Scheduling semantically equivalent operations

This section covers scheduling semantically equivalent operations, with different mnemonics, on different functional units. Another complication in targeting the GCC scheduler was the fact that the EVP allows semantically equivalent operations to be scheduled on several different functional units. For example, moving data between two general purpose registers can be issued with either the c_salu_1 or the c_slsu_s1 reservation pattern. For this move instruction the instruction class c_hi_move is introduced. This class is defined as the OR of the c_salu_1 and c_slsu_s1 reservation patterns:

(define_reservation "c_hi_move" "c_salu_1 | c_slsu_s1")
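The choice between the two alternatives of the c_hi_move reservation above can be sketched in plain C. This is an illustration of the behaviour, not generated GCC code; the busy-unit bitmask is an assumption made for the example:

```c
#include <stddef.h>
#include <string.h>

/* Units a c_hi_move can be placed on, as a busy-set bitmask. */
enum { R_SALU = 1 << 0, R_SLSU = 1 << 1 };

/* Deterministic choice: always try the first alternative (c_salu_1),
   fall back to the second (c_slsu_s1) only when the first unit is
   already reserved in this cycle.  Returns NULL if neither fits. */
const char *choose_hi_move_unit(unsigned busy_units)
{
    if (!(busy_units & R_SALU))
        return "r_salu";   /* first alternative: c_salu_1 */
    if (!(busy_units & R_SLSU))
        return "r_slsu";   /* second alternative: c_slsu_s1 */
    return NULL;           /* no unit free this cycle */
}
```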

The scheduling automaton generated when these new instruction classes are introduced has a deterministic behavior, meaning that it will always try to match the first reservation (c_salu_1 in this case), and only if it cannot schedule the instruction on that unit will it try the second pattern4. Scheduling semantically equivalent operations on different functional units improves the performance of the generated code (cycle count-wise), but involves some complication. The complication is due to the fact that GCC emits an assembly mnemonic based on which alternative of an

4The scheduler can also generate a non-deterministic automaton, in which the choice for a specific pattern is delayed until the last moment. This could potentially improve the performance, but requires a redesign of the evp_automaton_query function.

RTL instruction an expression belongs to. This alternative is selected based only on the RTL constraints, which do not provide resource allocation information. In order to encode this information, an extra RTL expression is emitted in parallel with all instructions that can be issued on different units. This expression is of the form (clobber (match_operand N "const_int")). The value of this operand can now be used to include the information in the RTL instruction. With this information the generated assembly can depend on the functional unit on which the instruction is issued, for example for the movhi_salu_or_slsu instruction:

(define_insn "movhi_salu_or_slsu"
  [(set (match_operand:HI 0 "r_register_operand" "=d")
        (match_operand:HI 1 "r_register_operand" " d"))
   (clobber (match_operand 2 "const_int_operand" "=i"))]
  ""
  "*
{
  switch (INTVAL(operands[2]))
    {
    case 33: return \"move_slsu %0, %1 %)%#\";
    case 22: return \"move %0, %1 %)%#\";
    case 2:  return \"FAIL: value is still 2\";
    default: break;
    }
  fatal_insn(\"something went wrong...\", insn);
}"
  [(set_attr "type" "hi_move")
   (set_attr "mode" "HI")
   (set_attr "length" "1")
   (set_attr "long_imm" "no")]
)

The information contained in operand 2 (the clobber expression) is used to emit the correct assembly instruction. This value is set in the function evp_automaton_query, whose implementation is detailed later in this section. In this way all instructions that have multiple scheduling options are processed, but this puts a heavy burden on the scheduling automaton. When all instructions were implemented, the generation time of the scheduling automaton at compiler build time grew very large5. In order to cut down the build time of the compiler, the automaton was split into two separate automata: all functional units belonging to the SDCU and ACU are grouped in the scalar automaton, and all functional units of the VDCU in the vector automaton. This split resulted in two much smaller automata, and except for the long immediate unit, all instruction reservation patterns consist of units from a single automaton. The only problem here was the movpsi_slsu_vlsu instruction, which can issue a pointer update instruction on either the VLSU or the SLSU. We cannot define an instruction reservation in which the alternatives come from different automata: the rules for splitting automata state that if a unit from a given automaton is present in one alternative for a given cycle, a unit from the same automaton should be present in all alternatives for that cycle [28]. We therefore required another approach to schedule these semantically equivalent operations. For this we implemented the evp_automaton_query function, which is tied to the target hook

5At the moment the number of states in the two automata is roughly 37,000 combined, and the number of arcs roughly 250,000. For a single automaton these numbers are estimated to be at least 1,000 times larger.

TARGET_SCHED_DFA_NEW_CYCLE, which is called every time a new instruction is about to be scheduled. This function is passed the instruction to be scheduled as an RTL expression. In case this instruction can be scheduled on multiple different functional units, we create a number of new instructions, one for each alternative functional unit. We assign each newly created instruction one of the alternative resource usage patterns and consequently attempt to schedule them on the current state of the instruction scheduling automata. This is done via a call to the function internal_state_transition, which makes the transition in the automata states as if the current instruction were scheduled. If internal_state_transition signifies that the instruction can be scheduled, we then proceed to trick the compiler into thinking the original instruction belongs to the instruction class (and has the corresponding resource reservation pattern) of the alternative that could be scheduled. The compiler maintains an array that contains for each instruction the instruction class to which it belongs, and we simply change that entry through the following code:

int uid = INSN_UID(ready);
dfa_insn_codes[uid] = internal_dfa_insn_code(insn_copy);

This first retrieves the unique id number of the instruction and then changes the entry in the dfa_insn_codes[] array to match the instruction code from insn_copy, which is the copy of the original instruction that belongs to the alternative that can be scheduled. The values stored in dfa_insn_codes[] are codes that represent the internal DFA codes of all instruction classes. The code corresponding to the instruction class we require is obtained via a call to internal_dfa_insn_code. When these values are set, the last thing evp_automaton_query does is set the value in the clobber field of the original instruction to match the right key, which allows the assembly generation pass to emit the correct mnemonic. This implementation is not constrained by the rule that all resources used by the alternatives of a given instruction must come from the same automaton. This allows much more freedom in splitting the scheduling automata, and avoids large automata which increase the build time and compile time of the compiler. Adding these clobber expressions to the instruction patterns required significant rewriting of all instructions that can be scheduled on multiple functional units. These include the pointer or offset moves described above and moves of HImode values, but also vector moves and moves of QImode and SImode values. To generate the new instruction patterns, the preparation statements of the expansion rules (or functions called in these preparation statements, see section 3.2.1) were modified to emit the new form of RTL expressions. As an example, the following lines were added at the end of the code in the movhi expansion rule:

if (register_operand(operand0, HImode)
    && register_operand(operand1, HImode)
    && GET_CODE(operands[0]) == REG
    && GET_CODE(operands[1]) == REG)
  {
    emit_insn(gen_movhi_salu_or_slsu(operand0, operand1, GEN_INT(2)));
    DONE;
  }
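The overall flow, trying alternatives and committing a key that later selects the mnemonic, can be summarized in a small C sketch. This is an illustration of the mechanism, not the actual back-end code; the busy-unit bitmask is an assumption, while the key values 22, 33 and 2 come from the movhi_salu_or_slsu pattern above:

```c
#include <stddef.h>
#include <string.h>

/* Keys stored in the clobber operand: 22 = SALU, 33 = SLSU,
   2 = not yet decided (the default). */
enum { KEY_UNSET = 2, KEY_SALU = 22, KEY_SLSU = 33 };
enum { BUSY_SALU = 1 << 0, BUSY_SLSU = 1 << 1 };

/* Stand-in for evp_automaton_query: return the key of the first
   alternative whose unit is free this cycle, or KEY_UNSET if none is. */
int query_hi_move_key(unsigned busy_units)
{
    if (!(busy_units & BUSY_SALU))
        return KEY_SALU;
    if (!(busy_units & BUSY_SLSU))
        return KEY_SLSU;
    return KEY_UNSET;
}

/* The switch from the pattern's output template, as a plain function. */
const char *hi_move_template(int key)
{
    switch (key) {
    case KEY_SALU: return "move %0, %1";
    case KEY_SLSU: return "move_slsu %0, %1";
    default:       return NULL;   /* key still 2: scheduling failed */
    }
}
```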

This results in the generation of an instruction whose RTL expression matches the movhi_salu_or_slsu instruction pattern described previously. This is one of the reasons to name an instruction pattern: if a pattern is named, GCC generates a function gen_<pattern-name> which takes all required operands as arguments and returns an instruction matching the corresponding pattern. An additional problem that arose is that when instructions accompanied by these clobbers are register allocated, it can occur that the instruction is transformed into a move between a register and a memory location (e.g. when a spill is needed). In this case the instruction no longer matches any instruction pattern, because all load and store instructions are SET expressions without a clobber in parallel. To support the cases where this can occur we added additional instruction patterns to the machine description. To stay with the case of a move of an HImode value, the new instruction pattern for a simple load is:

(define_insn "hi_simple_load"
  [(set (match_operand:HI 0 "general_register_operand" "=d,d")
        (mem:HI (match_operand:PSI 1 "simple_addr_operand" "i,b")))]
  ""
  "load %0, %1 %)%#"
  [(set_attr "type" "load")
   (set_attr "mode" "HI")
   (set_attr "length" "3")
   (set_attr "long_imm" "yes,no")])

(define_insn "hi_simple_load_clobber"
  [(set (match_operand:HI 0 "general_register_operand" "=d,d")
        (mem:HI (match_operand:PSI 1 "simple_addr_operand" "i,b")))
   (clobber (match_operand:SI 2 "const_int_operand" "=i,i"))]
  ""
  "load %0, %1 %)%#"
  [(set_attr "type" "load")
   (set_attr "mode" "HI")
   (set_attr "length" "3")
   (set_attr "long_imm" "yes,no")])

The first pattern is that of the regular load instruction; the second is the pattern that has to be added. A total of 49 new patterns were added to the machine description.

3.3.5 Specifying scheduling priorities

Every time a new scheduling cycle is started, GCC orders the instructions in the ready list and tries to schedule instructions with a higher priority before other instructions. These priorities are based on how close an instruction is to the critical path in the basic block. The target hook TARGET_SCHED_ADJUST_PRIORITY is called when these priorities are determined. This allows the back end to give preference to certain instructions. To improve branch delay scheduling we want the instruction loading the target of a branch instruction scheduled as early as possible (see section 3.5.1), so that a maximum number of instructions is available to be moved into a branch delay slot later on, without compromising the minimum latency between the load instruction and the branch instruction. The following lines are therefore added to the TARGET_SCHED_ADJUST_PRIORITY target hook:

if (evp_use_registers_p(insn, p_registers_p)
    || evp_use_registers_p(insn, ra_register_p))
  return (priority << 3);

The check to see if an instruction contains a predicate register is there because we also want the maximum distance between a conditional jump and the instruction where the corresponding predicate register is set.
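The effect of the hook lines above can be captured in a minimal sketch. The two boolean flags stand in for the evp_use_registers_p checks, which are back-end functions not reproduced here:

```c
/* Sketch of the priority adjustment: instructions touching pointer or
   return-address registers get their priority shifted left by 3, so
   they sort to the front of the ready list and are scheduled as early
   as possible. */
int evp_adjusted_priority(int priority, int uses_p_reg, int uses_ra_reg)
{
    if (uses_p_reg || uses_ra_reg)
        return priority << 3;   /* boost by a factor of 8 */
    return priority;
}
```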

3.3.6 Inter-basic block scheduling

The GCC scheduler schedules all basic blocks individually, but the EVP has some instructions with a long latency, such as division, multiplication and loads. This can cause dependencies between two instructions that are not within the same basic block. To make sure these dependencies are respected we introduce a new pass in the machine dependent reorg phase. This pass, called evp_inter_bb_scheduler, traverses all basic blocks, identifies these dependencies and inserts nops where needed. The function performs the following steps:

evp_inter_bb_scheduler
{
  for (all basic blocks)
  {
    for (all hard registers)
    {
      if (register is live in basic block)
      {
        for (all predecessor basic blocks)
        {
          if (register is live at end of predecessor basic block)
          {
            set DISTANCE to number of cycles latency continues in current basic block
          }
        }
      }
    }
    Insert DISTANCE nops at the beginning of current basic block
  }
}

In this function we traverse all hard registers, checking for each basic block if they are live upon entry. This is done using the GCC function df_get_live_in, which returns a bitmap of all registers that are live upon basic block entry. We check the bit corresponding to the current hard register and, if set, we check all instructions in the current basic block. We then check all possible predecessor basic blocks to see if the hard register is live upon exit. For those basic blocks we traverse all instructions once again to find any instructions that set this register. Now we have a number of instruction pairs that depend on each other, and for each we check if the number of cycles between the set and the use of the register is equal to the latency needed. This latency is calculated through insn_latency1, a target-specific function that internally calls the GCC-generated function internal_insn_latency, which checks the minimum latency between two instructions. If the current latency is smaller than needed, the difference is stored in the distance variable. In case one of the predecessor basic blocks is empty, a predecessor of that block can cause problems. For this a check is made for each predecessor basic block, and if it is empty, all of its predecessors are checked as well. If for a particular instruction the distance variable is non-zero at the end, we insert distance nops to satisfy all the inter-basic block dependencies. To calculate the latency needed between two instructions, it is not enough to simply take the length attribute of each instruction. Bypasses defined in the machine description can cause this value to differ depending on the consumer instruction. For the scheduler, GCC internally generates the function insn_latency, which also takes all these bypasses into account. One drawback is the fact that insn_latency does more than just checking the latency.
It can return a latency of 0 incorrectly, depending on the instruction scheduled in the current cycle. Internally, insn_latency calls internal_insn_latency, and this function returns the value we need. In order to make internal_insn_latency accessible from our back end we had to make a minor change in the mainline. In the file genautomata.c we removed the keyword static from the generated function declaration of internal_insn_latency, so the function is also available outside its own source file.
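The nop computation at the heart of evp_inter_bb_scheduler reduces to a simple shortfall calculation. The sketch below is illustrative; the function name is ours, and the required latency would in reality come from internal_insn_latency:

```c
/* For a cross-block producer/consumer pair: the number of nops to
   insert at block entry is the shortfall between the latency the
   pipeline requires and the cycles already separating the two
   instructions.  Nops are only ever added, never removed. */
int interbb_nops_needed(int required_latency, int cycles_already_between)
{
    int distance = required_latency - cycles_already_between;
    return distance > 0 ? distance : 0;
}
```

When several register dependencies cross the same block boundary, the pass effectively keeps the maximum of these per-pair values as the final distance.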

3.3.7 Resource conflict avoidance

Besides violating dependency constraints, resource conflicts can also be a problem when an instruction in a basic block does not finish before the next basic block begins. One instruction likely to cause such a problem is the div instruction, which has the time shape

(define_reservation "c_salu_div"
  "nothing, r_salu_rp, r_salu + r_salu_div + r_salu_rp,
   r_salu_div * 4, (r_salu_div + r_salu_r_wp) * 5")

Especially the use of the SALU read port in the final cycles can cause problems with other SALU instructions scheduled in a following basic block. To counter this, another pass was added to the machine dependent optimizations phase. In this pass, called evp_div_to_end_bb_pass, all basic blocks are traversed and checked for instructions with the c_salu_div time shape. When such an instruction is encountered, the number of instructions until the end of the basic block is counted and checked against the minimum number needed to guarantee correct execution. If the distance is not sufficient, nops are inserted right after the div instruction.

static void
evp_div_to_end_bb_pass(void)
{
  basic_block cur_bb;
  rtx cur_insn, tmp_nop;
  const int dist_required = 10;
  int dist, i;

  compute_bb_for_insn();

  FOR_EACH_BB(cur_bb)
  {
    FOR_BB_INSNS(cur_bb, cur_insn)
    {
      if (recog_memoized(cur_insn) > 0
          && NONJUMP_INSN_P(cur_insn)
          && get_attr_type(cur_insn) == TYPE_DIV)
      {
        dist = insn_cycle_count_left(cur_bb, cur_insn);

        if (dist < dist_required)
        {
          rtx end_of_cycle = cur_insn;

          while (INSN_P(NEXT_INSN(end_of_cycle))
                 && GET_MODE(NEXT_INSN(end_of_cycle)) != TImode)
            end_of_cycle = NEXT_INSN(end_of_cycle);

          for (i = 0; i < dist_required - dist; i++)
          {
            tmp_nop = make_insn_raw(const0_rtx);
            add_insn_after(tmp_nop, end_of_cycle, cur_bb);
            PUT_MODE(tmp_nop, TImode);
          }
        }
      }
    }
  }
  return;
}

symbol   resulting output
%        %
#        comments containing information about the current instruction
)        a predicate register in case of a predicated instruction

Table 3.4: Valid characters to follow the % symbol in the assembly template.

Because increasing the time between two instructions does not cause any previously valid dependencies to be violated, we can safely add these nops to the instruction stream. At the moment the div_to_end_bb_pass does not check whether instructions in subsequent basic blocks actually use the SALU read port, so this approach is a conservative one. Performance could be improved by involving the instructions in subsequent basic blocks in deciding the number of nops to insert.

3.4 Emitting assembly code

The goal of the compiler is to output EVP assembly code, ready to be processed by the EVP assembler. Several aspects need to be handled in order to correctly output the assembly. First, all basic symbols need to be defined, such as the syntax describing a data section, a read-only section, the start of a function, labels and so on. These are defined through a number of macros, for example DATA_SECTION_ASM_OP, which in the EVP back end is defined as "\t.data \"readwrite\"". To support the VLIW aspect, each instruction that is issued in parallel with the instruction on the previous line is preceded by a || symbol. This symbol is emitted by the target hook final_prescan_insn. This hook is called before the assembly string in an RTL rule is emitted and processed, which allows some last-minute changes or additions to the assembly. To generate these parallel instructions, each instruction is checked for its mode, and for all instructions that are not TImode (i.e. that do not signify the start of a new cycle), || is emitted. After this the string (possibly returned by the C statement) from the instruction definition is processed character by character. Every time a % symbol is encountered, the output depends on the symbol following it. Any number results in the output of the corresponding operand, but several other symbols are also valid, as defined in the macro PRINT_OPERAND_PUNCT_VALID. These are listed in table 3.4. For every % followed by a number, GCC outputs the corresponding operand. The exact formatting of these operands can differ depending on the mode of the operand. For this the PRINT_OPERAND macro is used, which is tied to the function of the same name, which correctly outputs the operand. It is also possible to format the output of an operand by inserting a letter between the % and the digit. Valid letters for this are listed in table 3.5.

symbol   resulting formatting
C        comparison operator
N        reverse of comparison operator
H        upper 16 bits of integer
L        lower 16 bits of integer
S        symbol reference
A        high part of SImode register
B        low part of SImode register
U        unsigned integer
G        signed integer

Table 3.5: Valid characters to specify the output of an operand.

In order to help both the programmer and the compiler developer analyze the generated code, a comment containing some details is emitted at the end of each assembly instruction line. To properly align these comments, we keep track of every character that is emitted into the output stream. For this a global variable current_pos_in_line is introduced, which is reset whenever a new instruction is processed for output. GCC has the target macro ASM_OUTPUT_OPCODE available, which (when defined) is called just before GCC processes the assembly field in the instruction rtx. As a parameter a pointer is passed that can be incremented by the hook and then returned to the caller, and the compiler will resume outputting the string from the returned pointer location. In this case we merely step over all characters in the string, incrementing the value of current_pos_in_line by 1 for each character that is emitted 1-to-1 to the output stream, and skipping all other characters (like %0 or %B1). Finally 8 is added for the leading tab and the function returns with the character pointer unchanged. Each time a register name, constant, label or symbol reference is emitted, this variable is also increased by the length of the just-emitted string. This is done by converting the value to a string, if it is not one already, via a call to sprintf, and then taking its length using the strlen function. When the comment symbol is then processed, the difference between the alignment position and the number of bytes already emitted is filled up with spaces and a comment with the following syntax is generated:

|| ptr_update ptr15, -64 // i-m- SLSU fft_btail_ii.c:104

The comment details the functional unit on which the instruction is executed, along with four flags, and the file name and line number that this instruction originated from. The first flag displays an i if the instruction uses the long immediate field. The third flag is an m if the instruction is one that can be issued on multiple functional units, and the fourth flag is a c in case the instruction uses post-increment addressing. The second flag is currently not in use, but adding new flags is little work. The information for these flags can easily be extracted from the instruction, but the problem here is that the PRINT_OPERAND macro, which processes the output of these comments, is not passed the RTL instruction, and hence cannot extract this information. To supply this information several global variables are introduced, which are set in final_prescan_insn. Here the long_imm attribute is checked, and the c flag is set if the vector_type attribute is equal to either vlsu_ld_post or vlsu_1_post. The file name and line number are stored by GCC inside each rtx, and can be extracted by simply calling the GCC functions insn_file() and insn_line(). To easily pass the functional unit information an enumeration type containing all functional units is defined, and a global variable of that type is set based on the type or vector_type attribute of the current instruction. In case the instruction can be issued on multiple units (i.e. has type ptr_update, ptr_ofs_move, hi_move or si_move, or vector type vmove) the key is extracted and based on this value the correct functional unit is determined. The flag m is set if the key is not equal to 2 (the default value). When PRINT_OPERAND outputs the comments, it retrieves the information stored in these global variables to determine the correct values for the functional unit, flags, file name and line number to be printed.
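The character counting done for alignment can be sketched in isolation. This is a simplified stand-in for the ASM_OUTPUT_OPCODE bookkeeping described above, with an assumed directive syntax of a % followed by an optional uppercase format letter and one more character; operand text is ignored here because it is counted separately when actually printed:

```c
/* Count the column reached after emitting an assembly template:
   8 for the leading tab, plus one per character emitted 1-to-1,
   skipping %-directives such as %0, %B1, %) and %#. */
int opcode_column(const char *templ)
{
    int pos = 8;                     /* leading tab */
    const char *p = templ;
    while (*p) {
        if (*p == '%') {
            p++;                     /* '%' itself is never printed */
            if (*p >= 'A' && *p <= 'Z')
                p++;                 /* optional format letter (%B1) */
            if (*p)
                p++;                 /* operand digit or punctuation */
        } else {
            pos++;
            p++;
        }
    }
    return pos;
}
```

The comment emitter would then pad with spaces from this column up to the fixed alignment column before printing the // comment.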
Another aid for the compiler developer in seeing whether a certain change results in a performance improvement is a cycle count added to the assembly output. This is done by adding the following lines to the final_prescan_insn function:

if (GET_MODE(insn) == TImode
    && insn_code != CODE_FOR_doloop_begin
    && insn_code != CODE_FOR_dohi
    && insn_code != CODE_FOR_dosi)
{
  if (BLOCK_NUM(insn) != curr_bb)
  {
    /* Start of new basic block */
    curr_bb = BLOCK_NUM(insn);
    insn_counter = 1;
    fprintf(asm_out_file, "// [%d]\n", insn_counter++);
  }
  else /* Still in same BB */
  {
    fprintf(asm_out_file, "// [%d]\n", insn_counter++);
  }
}

This lets the compiler developer or programmer see immediately how many cycles a performance-critical loop or other section of code contains. All these improvements result in verbose assembly code; as an example, this is a snippet from one of the generated assembly files for the FFT benchmark, fft_btail_ii.asm:

// [13]
   cmpneq p1, r10, 0                     // ---- SALU fft_btail_ii.c:459
|| vshu_bfy vr4, vr3, +4, vm3           // ---- VSHU fft_btail_ii.c:405 evp_mo_vshu_fftbfy_32
|| vsub16 vr2, vr13, vr14               // ---- VALU fft_btail_ii.c:374 evp_vsub_16
// [14]
   vshu_bfy vr3, vr6, +4, vm1           // ---- VSHU fft_btail_ii.c:404 evp_mo_vshu_fftbfy_32
|| vmul_rnd_satc16 vr13, vr2, vr12      // ---- VMAC fft_btail_ii.c:376 evp_vmul_rndsat_cf16
|| move ptr1, r1r0, p1                  // ---- SALU fft_btail_ii.c:462 evp_po_u32_to_ptr
|| ptr_update ptr8, -4, p1              // --m- SLSU fft_btail_ii.c:459 evp_p_add_ptr
// [15]
   vshu_bfy vr9, vr11, +8, vm2          // ---- VSHU fft_btail_ii.c:359 evp_mo_vshu_fftbfy_64
|| vadd16 vr5, vr4, vr3                 // ---- VALU fft_btail_ii.c:481 evp_vadd_16
|| vmove_vlsu8 vr11, vr10               // --m- VLSU fft_btail_ii.c:408
|| move ptr9, r3r2, p1                  // ---- SALU fft_btail_ii.c:461 evp_po_u32_to_ptr
// [16]
   vadd16 vr14, vr1, vr9                // ---- VALU fft_btail_ii.c:379 evp_vadd_16
|| move r9r8, r1r0                      // ---- SALU fft_btail_ii.c:442 evp_mov_32
|| vstore_post_update ptr2, ofs1, vr5   // ---c VLSU fft_btail_ii.c:482
|| vmove_vshu8 vr5, vr7                 // --m- VSHU fft_btail_ii.c:414

3.5 Implementing Target-Specific Features

This section details how some features of the EVP are enabled in the EVP-GCC compiler. The implementation of post-increment addressing is an entirely new feature built as part of this project, and is detailed in section 3.5.2. In sections 3.5.3, 3.5.4 and 3.5.5 we briefly discuss hardware loop generation, address-based alias analysis and if-conversion. These three parts were either already in place at the start of the project, or were added and developed by the other members of the team working on EVP-GCC during the course of this project. They are therefore discussed only briefly here, as they fall outside the scope of this thesis project. Branch delay scheduling was partly in place at the start of the project, but was improved and extended as part of the MSc project. We start by outlining the features that we addressed during the course of this project, followed by the work done by the other members of the team.

3.5.1 Branch delay scheduling

Due to the depth of the EVP pipeline, all call and branch instructions are followed by a number of instructions that are always executed. The number of these branch delay slots depends on whether the target of the branch is an immediate address or is stored in a register. GCC supports scheduling with branch delay slots, but because it was not designed to support exposed pipeline architectures (architectures without hardware interlocks), the GCC branch delay scheduler does not insert nops into the instruction stream. This causes it to generate incorrect code for the EVP. To avoid this, branch delay scheduling is implemented in a machine-specific pass; GCC allows the back end to include target-specific passes in the machine dependent optimizations pass. The function evp_fill_delay_slots implements this pass. It moves all call or branch instructions backward through the instruction stream in order to push as many instructions as possible before them into the delay slots, without breaking any dependencies or causing any resource conflicts. As far as dependencies between other instructions go, increasing the distance between two instructions never causes a conflict, so we can safely move the jump instruction backward. The delay slot filler pass traverses all instructions, and when a call or branch instruction is encountered, we move backward through the instruction stream. All instructions encountered can be moved into the branch delay slots as long as the following constraints are met.

1 Instructions that are part of previous basic blocks cannot be moved into a branch delay slot because it cannot be guaranteed that these instructions would be executed before the call or branch instruction in question. Nor can instructions already in branch delay slots of previous branch instructions be moved into a delay slot of another branch. This would effectively move the branch instruction into a delay slot of another branch, which is not allowed.

2 Moving an instruction into a delay slot must not violate any previously satisfied data dependencies. Since increasing the distance between two instructions never violates any dependencies between them, this poses no problem.

3 Resource constraints must also not be violated when moving an instruction into a delay slot. In most cases this will not cause any problems, but for several instructions it might cause conflicts. The only cases where a problem occurs is when two instructions with different resource usage patterns use the same resource in different cycles. The only case where this may occur

at the moment is a division instruction, which uses the c_salu_div reservation pattern (see section 3.3.7) and occupies the r_salu_r_wp in its last cycles. To avoid any conflicts this resource is also reserved one cycle before it is actually needed. To give an example, suppose an instruction with the c_salu_1 reservation pattern is scheduled 5 cycles later than a div instruction. Now this instruction is moved into a branch delay slot. This means the instruction now effectively uses the r_salu_r_wp 6 cycles after the div instruction uses it, which results in a resource conflict. If we reserve the functional unit one cycle earlier than actually needed in the c_salu_div reservation pattern, increasing the distance will still result in a correct schedule.

4 Finally, any instruction that sets the register holding the branch target cannot be moved into a delay slot, because this would violate the data dependency between the branch instruction and the instruction that sets the register holding the target. This problem was initially solved by not filling any further branch delay slots once an instruction was encountered that set the target register. Later on we improved this by taking into account the distance required between these two instructions and calculating the number of instructions that could be moved into the delay slots without violating this constraint. We describe below how this was implemented.

Any instruction that satisfies all the above criteria can now be moved into a delay slot. First the number of instructions that do not violate any of the conditions above is calculated, and then the branch instruction is moved to before the last instruction that is moved into a delay slot. The remaining delay slots are filled with nops. To check the first constraint, we stop checking instructions when we encounter a note that marks the beginning of a basic block. These notes are generated automatically by GCC. To meet the second condition we must somehow mark instructions that are moved into branch delay slots. This is done through the introduction of a new instruction pattern:

(define_insn "mark_for_delay_slot_filler"
  [(const_int -1)]
  ""
  "// ------"
)

This mark is emitted after the last instruction in the delay slots. This way, if we walk backwards over the instructions and encounter a mark_for_delay_slot_filler instruction, we stop. This guarantees that no instructions are moved from a delay slot of one branch into the delay slot of another. The third condition is satisfied through changes in the resource reservation patterns, as mentioned earlier. The last condition is satisfied by checking each instruction we encounter for the register that contains the branch target. If the branch target is an immediate address, this condition is automatically met. Originally6, only up to the number of instructions that could be moved into a branch delay slot were checked. In rare cases this could result in a violation of the data dependency. Table 3.6 describes a situation where moving 7 instructions into branch delay slots causes this problem, without the branch delay scheduler actually encountering an instruction that sets the register containing the branch target.

⁶ The state of the compiler at the moment this project was started.

Code before BDS    (Incorrect) code after BDS
load ra
insn1
insn2
insn3
insn4
insn5
insn6
insn7              load ra
insn8              insn1
branch ra          branch ra
                   insn2
                   insn3
                   insn4
                   insn5
                   insn6
                   insn7
                   insn8

Table 3.6: Incorrect branch delay scheduling.

Instructions 2 through 8 are checked, satisfy all conditions, and are moved into delay slots, leaving the load ra instruction too close to the branch instruction. The value in ra is therefore not yet loaded when the branch target is calculated, and the program counter is set to the old value of ra, which is the instruction after the branch delay slots of the last call made. This puts the processor in an endless loop. To solve this we extended the check to cover up to 4 more cycles than the number of instructions that can be moved into a delay slot. If the instruction setting the register in question is now encountered, we calculate the number of instructions required between the two and step forward over that many instructions again, until we are at the earliest instruction that can safely be moved into a delay slot.
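The distance check described above can be sketched as a small stand-alone C function. This is our own simplification with hypothetical names (safe_delay_fill and its parameters do not appear in the actual EVP-GCC code); positions are counted backwards from the branch, so in the Table 3.6 scenario load ra sits at position 9 and, with a required distance of 4 cycles, only 5 of the 7 candidate instructions may be moved.

```c
#include <assert.h>

/* Hypothetical sketch of the branch-delay-slot distance check.
 *
 * n_slots:      number of delay slots behind the branch
 * n_candidates: instructions before the branch that satisfy all
 *               other conditions and could be moved
 * setter_pos:   position of the instruction that sets the
 *               branch-target register, counted backwards from the
 *               branch (1 = directly before it); 0 if the branch
 *               target is an immediate address
 * min_distance: instructions required between that setter and the
 *               branch (the load latency)
 */
static int safe_delay_fill(int n_slots, int n_candidates,
                           int setter_pos, int min_distance)
{
    int fill = n_candidates < n_slots ? n_candidates : n_slots;
    if (setter_pos > 0) {
        /* Moving k instructions into the slots shortens the
         * setter-to-branch distance by k; keep it >= min_distance. */
        int max_safe = setter_pos - min_distance;
        if (max_safe < 0)
            max_safe = 0;
        if (fill > max_safe)
            fill = max_safe;
    }
    return fill;
}
```

With an immediate branch target (setter_pos 0) all slots can be filled; in the Table 3.6 situation the function stops at 5.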

3.5.2 Post-increment addressing

The EVP supports post-increment addressing, meaning a load or store and a pointer update can be performed in a single instruction. This is a considerable performance improvement⁷: when the code to be compiled uses a pointer walk, it can be achieved in half the number of instructions. Where normally a [v]load_ofs or [v]store_ofs and a [v]ptr_update instruction are needed, occupying two SLSU slots, two VLSU slots or one of each, the same effect can now be achieved in a single instruction that uses the same resources as a single [v]load_ofs.

⁷ Depending on the source code, of course.

Initially, support for post-increment and post-decrement addressing was implemented by defining peephole patterns, which allow the peephole optimization pass to merge two subsequent instructions that match a certain RTL expression and operands into a single RTL expression with the same functionality. The peephole optimizer has one limitation, however: it only merges two operations that immediately follow each other in the instruction stream. This means that two vector load instructions followed by two pointer update instructions only result in one post-increment load. This behaviour is intentional, and allows the peephole optimizer to merge any two consecutive machine instructions, including two instructions that are entirely unrelated from a data-flow point of view. This limitation prevents full use of the post-increment capabilities of the EVP, so another way had to be found to implement this feature.

GCC has built-in support for pre-increment, post-increment, pre-decrement and post-decrement addressing that can be enabled by setting the macros

#define HAVE_POST_INCREMENT 1
#define HAVE_POST_DECREMENT 1
#define HAVE_POST_MODIFY_DISP 1
#define HAVE_POST_MODIFY_REG 1

in evp.h. The HAVE_POST_INCREMENT and HAVE_POST_DECREMENT macros enable the generation of instructions that load a value and then update the base register with a value that is the size of the register loaded from memory. To enable post-update instructions with different values, the macro HAVE_POST_MODIFY_DISP is set; this allows an arbitrary constant value to be used for post-updating the base register. By setting the macro HAVE_POST_MODIFY_REG we allow GCC to generate RTL that uses a register to hold the value that is added to or subtracted from the base register. With these enabled, GCC will generate these kinds of RTL instructions in the auto-inc-dec pass. This pass tries to combine two separate instructions into one, as in the following example from the compilation of fft_btail_vvi.c. Two instructions after the first combine pass

(insn 157 155 158 5 fft_btail_vvi.c:287
    (parallel [
        (set (mem:V16HI (reg/v/f:PSI 188 [ outa ]) [2 S32 A256])
             (reg:V16HI 285))
        (clobber (const_int 2 [0x2]))
    ]) 271 {__vec_store_ptr_v16hi_clobber_pattern}
    (expr_list:REG_DEAD (reg:V16HI 285)
        (nil)))

(insn 158 157 159 5 fft_btail_vvi.c:287
    (parallel [
        (set (reg/v/f:PSI 136 [ outa149 ])
             (plus:PSI (reg/v/f:PSI 188 [ outa ])
                 (const_int 32 [0x20])))
        (clobber (const_int 2 [0x2]))
    ]) 444 {addpsi_3333}
    (expr_list:REG_DEAD (reg/v/f:PSI 188 [ outa ])
        (nil)))

become a single instruction during the auto-inc-dec pass:

(insn 157 256 255 5 fft_btail_vvi.c:287
    (parallel [
        (set (mem:V16HI (post_inc:PSI (reg/v/f:PSI 136 [ outa149 ])) [2 S32 A256])
             (reg:V16HI 285))
        (clobber (const_int 2 [0x2]))
    ]) 282 {vec_store_post_inc_v16hi_clobber}
    (expr_list:REG_INC (reg/v/f:PSI 136 [ outa149 ])
        (expr_list:REG_DEAD (reg:V16HI 285)
            (nil))))

The clobber side-effect in the instruction is needed by the scheduling pass; for more information see section 3.3.4. It has no influence on the generation of post-increment or -decrement instructions, as long as we also define the post_inc instructions (in this case) with a clobber in parallel. If we did not, GCC would see this as losing information during the combination, and would not generate the combined instruction. The define_insn needed to enable the transformation (for this example) is the following:

(define_insn "vec_store_post_inc_v16hi_clobber" [(set (mem:V16HI (post_inc:PSI (match_operand:PSI 0 "base_register_operand" "b,b"))) (match_operand:V16HI 1 "vr_or_ir_register_operand" "Yvr,Yir")) (clobber (match_operand:SI 2 "const_int_operand" "=i,i")) ] "" "vstore_post_update %0, +32, %1%#" [(set_attr "vector_type" "vlsu_1_post") (set_attr "long_imm" "no")] )

Since these instructions are only generated during the auto-inc-dec pass, there is no need for an expansion rule for these post-modify patterns. A complication of this implementation is that the post-increment instructions have a latency of 3. However, this latency only applies to the load of the (vector) register; the post-incremented value of the pointer register is ready the next cycle. We therefore had to add another bypass definition (see section 3.3.2) to the machine description file, specifying the latency between the post-increment load instructions and all instruction classes that can take the incremented pointer register as an input operand. To give an example where the post-increment addressing mode brings improvement, consider the following code: a pointer walk loading four vectors from an array whose address is contained in register ptr0, along with updating four other pointer registers with the values in ofs1 through ofs4.

\\ [1] vload vr0, ptr0          // VLSU
    || ptr_update ptr0, +32     // SLSU
\\ [2] vload vr1, ptr0          // VLSU
    || ptr_update ptr0, +32     // SLSU
\\ [3] vload vr2, ptr0          // VLSU
    || ptr_update ptr0, +32     // SLSU
\\ [4] vload vr3, ptr0          // VLSU
    || ptr_update ptr0, +32     // SLSU
\\ [5] ptr_update ptr1, ofs1    // SLSU
    || vptr_update ptr2, ofs2   // VLSU
\\ [6] ptr_update ptr3, ofs3    // SLSU
    || vptr_update ptr4, ofs4   // VLSU

Introducing post-increment addressing generates the following code:

\\ [1] vload_post_update vr0, ptr0, +32   // VLSU
    || ptr_update ptr1, ofs1              // SLSU
\\ [2] vload_post_update vr1, ptr0, +32   // VLSU
    || ptr_update ptr2, ofs2              // SLSU
\\ [3] vload_post_update vr2, ptr0, +32   // VLSU
    || ptr_update ptr3, ofs3              // SLSU
\\ [4] vload_post_update vr3, ptr0, +32   // VLSU
    || ptr_update ptr4, ofs4              // SLSU

As we can see, using the vload_post_update instruction frees up the SLSU unit to perform the pointer updates, reducing the total number of operations from 12 to 8 and the number of cycles from 6 to 4. For completeness, we will also discuss the implementation of support for hardware loop generation, if-conversion and address-based alias analysis. This work was not part of the thesis project, but was carried out by the other members of the EVP team.
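The kind of source loop that benefits from post-increment addressing can be shown in plain C. The following is a scalar analogue (our own example, not EVP-C; sum_stride is a hypothetical name): each iteration performs a load followed by a fixed-stride pointer update, exactly the pattern the auto-inc-dec pass folds into a single load-with-post-update.

```c
#include <assert.h>

/* A pointer walk of the kind the auto-inc-dec pass can turn into
 * post-increment loads: each iteration loads through p and then
 * advances p by a fixed stride, which maps onto one
 * load-with-post-update instruction on the EVP.  (Scalar C
 * analogue; the EVP version would walk 16-element vectors.) */
static int sum_stride(const int *p, int n, int stride)
{
    int sum = 0;
    while (n-- > 0) {
        sum += *p;   /* load ...                        */
        p += stride; /* ... then post-update the pointer */
    }
    return sum;
}

/* Small driver used for the example below. */
static int sum_stride_demo(void)
{
    const int data[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    return sum_stride(data, 4, 2); /* sums elements 0, 2, 4, 6 */
}
```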

3.5.3 Hardware loops

As mentioned earlier, every branch and jump instruction on the EVP has a number of delay slots that are always executed. Because there are not always enough instructions available to fill all these delay slots, due to data dependencies or a limited number of instructions in the basic block, nops need to be inserted in one or more of them, causing a significant decrease in performance. To avoid this, the EVP has several special instructions that make loops more efficient. These instructions store the number of iterations and the start and end address of a loop in registers. This way, the jump-back address at the end of the loop and whether the branch is taken, which would normally cost 5 cycles to determine, are known in advance, eliminating the need for branch delay slots.

GCC has built-in support for the generation of hardware loops. The pass pass_rtl_doloop marks the start and end of these hardware loops by generating two instructions called doloop_begin and doloop_end. These patterns are supplied a number of operands, which contain the number of iterations (or the register that holds this value at runtime), the label to jump back to, and the estimated number of iterations if it can be determined by the compiler. This information can be used to emit several different preparation instructions if necessary.

GCC generates hardware loops in a do-while format, meaning the test condition is evaluated at the end of the loop body. The EVP expects hardware loops in a while-do format, where the condition is evaluated before the first iteration of the loop. This requires some rewriting of the hardware loops generated by GCC, which is done in the machine dependent optimizations pass. Here we implement a pass called evp_hloop_pass, which checks the validity of the hardware loop through the supporting function evp_hloop_valid_p. This function checks the loop body for the absence of any jump instructions.
The EVP ISA specifies that no jump instructions may be contained in the body of a hardware loop, nor may any jump from outside the loop body target an address inside a hardware loop. This function also reorders the control flow in such a way that a valid hardware loop can be generated when previous optimizations have ordered the basic blocks such that the generation of a hardware loop is not possible. To give a quick impression of the reordering required to achieve valid control flow for hardware loop generation, consider the following example: a simple loop that uses an internal variable that can easily be expressed in terms of the loop counter. The loop optimizations pass (more specifically, the induction variable optimization performed in that pass) rewrites the loop to the order depicted in the left-hand side of figure 3.2. Because basic block 4 is not empty, we cannot directly turn this into a hardware loop by adding the appropriate instructions before and after the loop body.

Figure 3.2: Loop restructuring to enable hardware loop generation

After a transformation to the order depicted at the right-hand side of figure 3.2, the hardware loop can be generated by adding instructions before basic block 3 and after basic block 4. To enable this a copy of basic block 4 is made and moved before basic block 3. This basic block 4’ has outgoing edges to basic blocks 5 and 3, just like basic block 4 but is executed only once. This transformation allows the generation of a low-overhead hardware loop in the dotted area.
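The do-while versus while-do distinction that evp_hloop_pass has to bridge is easiest to see at a trip count of zero. The following C fragment is an illustration only (the real pass rewrites RTL, not C, and the function names are ours): the do-while shape that GCC emits runs the body once even for zero iterations, while the while-do shape the EVP loop hardware expects runs it zero times.

```c
#include <assert.h>

/* do-while form (GCC's doloop output): condition tested at the
 * end of the body, so the body always runs at least once. */
static int body_runs_do_while(int n)
{
    int runs = 0;
    do {
        runs++;
    } while (--n > 0);
    return runs;
}

/* while-do form (what the EVP loop hardware expects): condition
 * tested before the first iteration. */
static int body_runs_while_do(int n)
{
    int runs = 0;
    while (n-- > 0)
        runs++;
    return runs;
}
```

For any positive trip count the two agree; only the zero-iteration case differs, which is what the rewrite must preserve.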

3.5.4 Alias analysis

In order to improve performance, several passes that operate on the RTL level use alias analysis information, which determines whether two different memory accesses might point to the same memory location, generating a data dependency between those instructions. Most of the time this information cannot be determined exactly, but several methods exist to determine that two memory accesses definitely do not point to the same address. On the GIMPLE level, GCC performs type-based alias analysis. This method disambiguates two memory accesses using the rule, specified in the ANSI C standard, that two pointers pointing to different types can never point to the same memory location [36]. Alias analysis information is also calculated on the RTL level, in the form of flow-sensitive points-to alias analysis. This method determines for each variable which memory space it points to; two pointers that point to different memory spaces cannot alias. The passes operating on the RTL level could benefit from more powerful alias analysis, and for this, address-based alias analysis has been introduced, as proposed in [30]. This method was implemented by the other members of the team working on EVP-GCC, and discussing it falls outside the scope of this project.
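The type-based rule used at the GIMPLE level can be illustrated with a small C fragment (our own example; the function names are hypothetical): because the two pointers have incompatible pointed-to types, the compiler may assume the store through b leaves *a untouched, and is free to reuse the previously loaded value instead of reloading it.

```c
#include <assert.h>

/* Type-based alias analysis in a nutshell: a points to int, b to
 * float, so ANSI C lets the compiler assume the store through b
 * cannot modify *a.  The second read of *a may therefore be
 * replaced by the value already loaded into 'first'.
 * (Illustration of the rule, not of GCC's internal representation.) */
static int tbaa_example(int *a, float *b)
{
    int first = *a;    /* load *a once                     */
    *b = 1.0f;         /* assumed not to touch *a          */
    return first + *a; /* compiler may reuse 'first' here  */
}

/* Driver with genuinely disjoint objects, so the assumption holds. */
static int tbaa_demo(void)
{
    int x = 5;
    float y = 0.0f;
    return tbaa_example(&x, &y);
}
```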

3.5.5 If-conversion

Another feature supported by the EVP architecture is predicated execution: conditional execution of an instruction, implemented by specifying a predicate register in the assembly instruction. This way, if statements in the code can be translated to predicated assembly instructions. For example, loading a value from memory only if a certain expression evaluates to true avoids branches in the assembly code and the accompanying overhead. This can increase performance significantly, because many branch delay slots, which possibly cannot all be filled with instructions and would thus introduce unnecessary nops, are avoided. Generating these predicated instructions for if-statements is called if-conversion. As an example, consider a simple if-statement:

if (a == 1)
    b = 2;
a = 3;

Without if-conversion, this would result in the following assembly code (r0 holds a, r1 holds b):

cmpneq p1, r0, 1
br .L0, p1
nop
nop
nop
nop
nop
move r1, 2
.L0
move r0, 3

When if-conversion is applied, the same code would result in:

cmpeq p1, r0, 1
move r1, 2, p1
move r0, 3

As we can see, a branch instruction and 5 delay slots are avoided, which can yield a large increase in performance. If the if-statement is part of a long piece of code that allows many instructions to be moved into delay slots this penalty is reduced, but at least one extra cycle for the branch is still needed. GCC supports if-conversion and its implementation was already in place at the start of this project; we therefore do not discuss the exact implementation beyond the details mentioned here.
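At the C level, the effect of if-conversion can be mimicked by turning the conditional store into an unconditional select, which is what a predicated move r1, 2, p1 implements in hardware. This is our own illustration (GCC performs the transformation on RTL, and the function names are hypothetical):

```c
#include <assert.h>

/* Branchless equivalent of:  if (a == 1) b = 2;  a = 3;
 * The select is always evaluated; its result only changes b when
 * the condition holds, mirroring a predicated move. */
static void step(int *a, int *b)
{
    *b = (*a == 1) ? 2 : *b; /* predicated move: commits only when p1 */
    *a = 3;                  /* unconditional move                    */
}

/* Driver: returns the final value of b for given initial a and b. */
static int step_demo(int a0, int b0)
{
    step(&a0, &b0);
    return b0;
}
```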

3.6 Summary

In this chapter we detailed the steps taken to specify all machine properties in the back end, the additions needed to enable expansion of Tree-SSA statements into RTL for vector operations, as well as the changes made to the generation of function calls, function argument passing and function prologue and epilogue generation. We presented how the instruction scheduler was extended to schedule vector instructions and how the scheduler manages to schedule instructions that contain operations that can be performed on several different functional units. We presented a new approach to the scheduling of instructions which enabled us to split the finite state automaton generated by the instruction scheduler, even though information from all automata was required for scheduling some instructions. We presented the way assembly is emitted and our additions to the verbose comments in the assembly code, as well as our work on branch delay scheduling and post-increment addressing. We also included the work done by the other members of the team for completeness.

4 Experimental Results

In this chapter we discuss the results obtained from all benchmarking done on the EVP-GCC compiler. First we discuss the correctness of the generated code through application of a modified version of the DejaGnu testsuite in section 4.1. Due to contractual obligations with the company that produces the current production compiler, we cannot disclose any comparison of the performance of benchmarks run with GCC against the production compiler in this document. This information will be supplied to the members of the graduation committee in a separate document. We also present the experimental results of a measurement performed by running the vector benchmark with the EVP-GCC compiler, comparing the new implementation that allows the scheduling of semantically equivalent instructions on different functional units against the compiler without this addition. This is presented in section 4.3. Finally, we present the findings of the experiment we performed regarding the possible improvements autovectorization could bring when applied to the EEMBC Telecom benchmark code. These experiments were performed on the AltiVec back end, and are presented in section 4.4.

4.1 Correctness benchmarks

The EVP-GCC compiler needs to be checked for correctness of the generated code. Builds of the GCC compiler are often validated using the open-source DejaGnu [27] testing framework. To validate the EVP-GCC compiler, a modified version of the internal GCC testsuite was written and run using DejaGnu. These testcases are supplemented with the benchmarks from the EEMBC Telecom benchmarking suite, which are also used for scalar performance benchmarking. To run the modified GCC testsuite we supply the compiler executable cc1, which is the main compiler binary, and use the already available in-house assembler, linker and simulator. For correctness benchmarking of the vector part, 78 additional testcases containing all implemented EVP-C intrinsics were hand-written and added to the EVP-GCC testsuite. As an example, the testcase for the EVP-C intrinsic evp_vadd_16 is the following:

/* { dg-do run } */
/* { dg-options "-O2 -Iconfig/evp/include" } */

#include "evp.h"

extern void exit (int);
extern void abort (void);

v_t v1 = evp_v_init_16(1, 0, 0, 0, 0, 0, 0, 0, \
                       0, 0, 0, 0, 0, 0, 0, 0);

v_t v2 = evp_v_init_16(1, 2, 3, 4, 5, 6, 7, 8, \
                       9, 10, 11, 12, 13, 14, 15, 16);

v_t v3 = evp_v_init_16(61, 60, 59, 58, 57, 56, 55, 54, \
                       53, 52, 51, 50, 49, 48, 47, 46);

int main()
{
  int i;

  v1 = evp_vadd_16(v2, v3);

  for (i = 0; i < 16; i++) {
    int16_t p_elem = evp_vget_elem_16(v1, i);
    if (p_elem != 62)
      abort();
  }

  return 0;
}

This testcase simply adds two vectors, checks each individual element and aborts (exits with a non-zero return value) when one of the elements is not correct. Including the vector testfiles, a total of 1039 files are used for testing, running a total of 4546 tests. The results of running the EVP-GCC testsuite are listed below. Note that passed tests do not appear in the output. Failed tests are denoted by FAIL, and tests that pass unexpectedly (tests that should fail but pass instead) are marked with XPASS.

FAIL: gcc.dg-modified3/20020312-2.c execution test
FAIL: gcc.dg-modified3/20031223-1.c (test for excess errors)
FAIL: gcc.dg-modified3/20040311-2.c (internal compiler error)
FAIL: gcc.dg-modified3/20040311-2.c (test for excess errors)
FAIL: gcc.dg-modified3/20041219-1.c (test for excess errors)
FAIL: gcc.dg-modified3/20050121-1.c (test for warnings, line 8)
FAIL: gcc.dg-modified3/20050121-1.c (test for excess errors)
FAIL: gcc.dg-modified3/940510-1.c (test for excess errors)
FAIL: gcc.dg-modified3/asm-fs-1.c scan-assembler-not \\*_bar
FAIL: gcc.dg-modified3/bitfld-12.c (test for errors, line 10)
FAIL: gcc.dg-modified3/bitfld-12.c (test for excess errors)
FAIL: gcc.dg-modified3/builtin-apply1.c (internal compiler error)
FAIL: gcc.dg-modified3/builtin-apply1.c (test for excess errors)
FAIL: gcc.dg-modified3/builtin-return-1.c (internal compiler error)
FAIL: gcc.dg-modified3/builtin-return-1.c (test for excess errors)
FAIL: gcc.dg-modified3/builtins-13.c (test for excess errors)
FAIL: gcc.dg-modified3/c99-bool-1.c (test for excess errors)
FAIL: gcc.dg-modified3/cleanup-5.c (internal compiler error)
FAIL: gcc.dg-modified3/cleanup-5.c (test for excess errors)
FAIL: gcc.dg-modified3/ftrapv-1.c (test for excess errors)
FAIL: gcc.dg-modified3/ftrapv-2.c (test for excess errors)
FAIL: gcc.dg-modified3/funcorder.c scan-assembler-not link_error
FAIL: gcc.dg-modified3/init-string-1.c (test for excess errors)
FAIL: gcc.dg-modified3/inline-1.c scan-assembler-not xyzzy
FAIL: gcc.dg-modified3/intmax_t-1.c (test for excess errors)
FAIL: gcc.dg-modified3/nrv1.c execution test
XPASS: gcc.dg-modified3/overflow-warn-3.c constant (test for bogus messages, line 42)
XPASS: gcc.dg-modified3/overflow-warn-4.c constant (test for bogus messages, line 42)
FAIL: gcc.dg-modified3/pr16973.c (internal compiler error)
FAIL: gcc.dg-modified3/pr16973.c (test for excess errors)
FAIL: gcc.dg-modified3/sibcall-3.c execution test
FAIL: gcc.dg-modified3/sibcall-4.c execution test
FAIL: gcc.dg-modified3/simd-3.c (internal compiler error)
FAIL: gcc.dg-modified3/simd-3.c (test for excess errors)
FAIL: gcc.dg-modified3/struct-by-value-1.c execution test
FAIL: gcc.dg-modified3/test1.c execution test
FAIL: gcc.dg-modified3/transparent-union-1.c (test for excess errors)
FAIL: gcc.dg-modified3/transparent-union-2.c (test for excess errors)
FAIL: gcc.dg-modified3/transparent-union-4.c (test for warnings, line 9)
FAIL: gcc.dg-modified3/transparent-union-4.c (test for excess errors)
FAIL: gcc.dg-modified3/transparent-union-5.c (test for excess errors)
FAIL: gcc.dg-modified3/ucnid-2.c (test for excess errors)
FAIL: gcc.dg-modified3/uninit-1.c uninitialized variable warning (test for bogus messages, line 16)
FAIL: gcc.dg-modified3/uninit-3.c uninitialized variable warning (test for bogus messages, line 11)
FAIL: gcc.dg-modified3/uninit-9.c uninitialized variable warning (test for bogus messages, line 26)

=== gcc Summary ===

# of expected passes        4481
# of unexpected failures      43
# of unexpected successes      2
# of expected failures        30
# of unresolved testcases      8
# of unsupported tests       151

We have a total of 43 unexpected failures in the testsuite, which can be attributed to a number of causes. Most failures are due to C constructs the target does not support, for example the sibling call construct, or to move instructions not yet implemented, for example in the builtin-apply1 and builtin-return-1 testcases. These tests use the GCC built-in functions __builtin_apply and __builtin_return, which try to save all registers on the stack, even those for which no load and store instruction patterns have been written. Other failures are due to the fact that not all data types in the EVP back end have the same width as data types in regular C, so macros containing the maximum and minimum values for these data types do not contain the correct values. There are also a number of testcases that cause spurious warnings, and some testcases that fail on execution due to some small bugs still present in the compiler. However, these failures are all non-critical for the correctness of the compiled benchmarks.

4.2 Performance benchmarks

Due to contractual obligations, the sections containing the scalar and vector benchmark results will be supplied to the thesis committee separately from this report.

4.3 Impact of scheduling semantically equivalent operations on differ- ent functional units

In this section we present the experimental results of a comparison between the EVP-GCC compiler that can schedule semantically equivalent operations on multiple functional units and the EVP-GCC compiler without this feature. For example, a move from one HImode register to another can be scheduled on either the SALU or the SLSU. The way the benchmarks are run to obtain these numbers was presented in the section supplied separately from this report; because those sections are not publicly available, we repeat the methods used here. To test the performance of the vector support added to GCC, a benchmark was chosen that makes heavy use of vector data. This benchmark is a variable-point FFT algorithm, run on six data sets of different sizes.

benchmark    without     with    improvement
FFT_128          594      574       - 3.37 %
FFT_256         1010      986       - 2.38 %
FFT_512         2144     2032       - 5.23 %
FFT_1024        4358     4134       - 5.14 %
FFT_2048        8926     8606       - 3.59 %
FFT_4096       20044    18892       - 5.75 %
Total          37076    35224       - 5.00 %

Table 4.1: Cycle count with and without scheduling semantically equivalent operations

The data sets consist of 128, 256, 512, 1024, 2048 and 4096 data points. These FFTs are written in EVP-C, and use several pragmas that are not yet supported by GCC (see section 2.2 of the main report), such as specifying to which register class a variable should belong, or specifying a minimum number of iterations for a loop. The following pragmas are used in this benchmark, and are removed via #defines when the code is compiled with GCC:

__DataNoInit
__UnitDefaultSectionData( name )
__UnitDefaultSectionConst( name )
__UnitDefaultSectionNoInit( name )
__UnitDefaultSectionFun( name )
__DataSection( name )
__FunSection( name )
__DataAt( ofs )
__FunReturnAt( ofs )
__M1 __M2 __M3 __M4 __M5 __M6 __M7 __M8
__LoopNoHwLoop
__LoopNoSWP
__LoopIterMin( n )
__LoopIterMax( n )
__FunHwLoopLevels( l )
__FunHwUseLoopLevels( l )

All FFTs are run with the flags -DSATURATE, to enable saturated adds, and -DSWPS, which enables manual software pipelining in the btail_ii file.

The simulator (vclsim) is run with the environment variable $EVP_NO_CACHE defined, which causes the simulator to run the code without checking for pipeline stalls when the scalar cache reports a miss. This is done to examine the results of the compiler in a more isolated environment. Performance hits due to cache misses are a factor that is not taken into account at the time of scheduling, and can therefore sometimes influence the results in unexpected ways. To eliminate this factor the scalar cache is turned off. In some cases it can even occur that inserting a nop at a certain point in a program results in a cache hit instead of a miss for an instruction later on, thereby actually improving performance instead of decreasing it as would be expected.

The performance in terms of cycle count of the EVP-GCC compiler with and without this feature is listed in table 4.1. The column marked 'with' contains the results for the compiler with the ability to schedule semantically equivalent operations on different functional units, and the column marked 'without' contains the cycle counts of the compiler without this feature. We can see that for all FFT sizes in the benchmark, the availability of multiple units to schedule these operations on results in a gain of between 2% and 6%. The total execution time (measured by running the entire benchmark) is reduced by 5%. This indicates that our approach to scheduling semantically equivalent instructions on multiple functional units brings a significant advantage.
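The improvement figures in table 4.1 follow from a straightforward relative-difference computation. As a check, the following small C helper (our own code; the function name is hypothetical) reproduces the percentages from the cycle counts in the table:

```c
#include <assert.h>

/* Relative cycle-count improvement as reported in table 4.1:
 * 100 * (without - with) / without, in percent. */
static double improvement_pct(int without, int with)
{
    return 100.0 * (double)(without - with) / (double)without;
}
```

For FFT_128 this gives 100 * (594 - 574) / 594, roughly 3.37%, and for the totals 100 * (37076 - 35224) / 37076, roughly 5.00%.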

4.4 Exploration of the possibilities of autovectorization

In this section we discuss enabling the autovectorizer of GCC. Autovectorization is a process whereby the compiler transforms a loop that contains scalar operations on consecutive data elements in memory (an array, for example) into a loop that performs the same operations on vectors of data. Autovectorization was first proposed in [13] by Dorit Naishlos and was worked on in the autovectorization branch [17]. A branch is a copy of a certain version of the compiler (the version at the branch point) to which certain additions or changes are made, allowing compiler developers to implement a new feature without causing the main version of the compiler to become unstable. When the features are tested and the branch is stable, the branch can be merged to mainline, which means the features are implemented in the GCC mainline and will be included in the next version to be released. The current progress of the autovectorization branch can be found at [17]; at present it can only vectorize loops whose body is a single basic block. The loops in the EEMBC Telecom benchmark files implement several algorithms (autocorrelation, convolutional encoder, bit allocation, FFT, inverse FFT and a Viterbi decoder) which operate on large amounts of data in small inner loops. We tested the autovectorizer by compiling the EEMBC Telecom code for the AltiVec [18] back end, as this is a vector architecture and should be able to benefit from autovectorization. We discovered that autovectorization of the EEMBC Telecom benchmark code was restricted to a few data initialization loops, which constitute only a small part of the total execution time of the code. The loops that take the most time to execute (the kernels) contain operations that the autovectorizer cannot effectively transform to a vector representation.
This is due to the fact that the kernels contain conditional statements, which result in a loop body that consists of multiple basic blocks. Since the operations do not operate on all data in a certain data set, the vectorizer cannot effectively find a way to rewrite them into vector operations. For example, a loop that updates only the even-numbered elements of an array cannot simply be replaced by a vector operation on vectors containing a consecutive block of array elements, as the odd-numbered elements would be updated as well. We will give a small example. Consider the following code that operates on two arrays A and B, and produces the array C, which contains Ai + Bi for all even-numbered elements and just Ai for all odd-numbered elements. A loop performing this operation could consist of the following code:

for (i = 0; i < array_length; i++) {
    if ((i & 1) == 0)
        C[i] = A[i] + B[i];
    else
        C[i] = A[i];
}

When compiled, this results in a loop with three basic blocks, as shown in figure 4.1, in which the loop body consists of basic blocks 1, 2 and 3.

Figure 4.1: Basic block layout of inner loop (basic blocks 0 through 3, with blocks 1, 2 and 3 forming the loop body)

As mentioned earlier, the autovectorizer cannot vectorize this loop. If this loop could be transformed into a loop with just a single basic block as the loop body, it could be vectorized (provided the equivalent vector operations exist for the operations in the loop body, of course). This transformation can be done using predicated execution, which is further explained in section 3.5.5, but the AltiVec architecture does not contain these operations [18]. The EVP does, however, and we could rewrite the loop to the following code:

for (i = 0; i < array_length; i++) {
    C[i] = A[i];
    if ((i & 1) == 0)
        C[i] = C[i] + B[i];
}
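The rewritten loop above maps onto masked vector execution, where a mask register selects which lanes commit their result. The mask semantics can be emulated in scalar C; this is our own sketch, not EVP-C, and the names are ours:

```c
#include <assert.h>

/* Scalar emulation of masked vector execution: the addition is
 * evaluated for every element, but only lanes whose mask entry is 1
 * commit a result.  This is the vector counterpart of predicated
 * execution that the EVP provides but AltiVec lacks. */
static void masked_add(int *c, const int *a, const int *b,
                       const int *mask, int n)
{
    int i;
    for (i = 0; i < n; i++) {
        c[i] = a[i];             /* unconditional part        */
        if (mask[i])
            c[i] = c[i] + b[i];  /* commit only masked lanes  */
    }
}

/* Driver matching the even-elements example: mask enables lanes 0 and 2. */
static int masked_demo(int idx)
{
    int a[4] = {1, 2, 3, 4};
    int b[4] = {10, 10, 10, 10};
    int mask[4] = {1, 0, 1, 0};
    int c[4];
    masked_add(c, a, b, mask, 4);
    return c[idx];
}
```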

The conditional assignment to C[i] denotes a predicated operation. This operation is always executed, but with the added side effect that the result is only stored to the target register if the condition expression evaluates to true. Because this line results in a single RTL instruction that is always executed, the loop body becomes a single basic block and can be vectorized. The vector equivalent of predicated execution is masked execution, where an instruction is passed a mask register, which contains a 1 in all locations where the result should be written to the target register and a 0 otherwise. This means the autovectorizer can bring significant improvement when compiling out-of-the-box code, but it requires the addition of support for vector registers and masked execution. Since none of this was in place at the start of the project and had to be implemented, it was deemed too large a project to complete within the scope of an MSc. thesis project. We therefore did not continue implementing this, and it is listed as future work (see section 5.3).

5 Conclusions and Recommendations

In this report we presented the implementation of vector support in the EVP port of the GCC compiler framework. In this chapter we give a summary of the issues addressed in this thesis. In Chapter 1 we gave an overview of the background of VLIW architectures, vector processors and the GCC compiler framework. We also discussed previous work done in this field and listed a number of challenges addressed in this report, and we gave an overview of which contributions were part of the MSc. thesis project and which were made by other members of the team working on the EVP-GCC compiler. Chapter 2 detailed the inner workings of the EVP, GCC and the EVP-C language extension. We presented an overview of all functional units in the EVP, the available registers and the way vector data is organized. In addition we briefly discussed the VLIW aspect of the architecture as well as other features of the processor, such as predicated execution and hardware loops. We briefly discussed the basics of the EVP-C extensions to the C language, gave an overview of the inner workings of GCC and discussed some of the more important passes of the compiler. In Chapter 3 we presented the steps taken to reach all the goals set out in Chapter 1. We presented how the basic machine properties of the EVP were defined in the back end, including the implementation of the vector data types, register classes and machine modes, and how register classes were matched to the registers of the EVP. We presented how Tree-SSA statements are expanded to RTL instructions, and described the expansion rules and instruction patterns that had to be defined in the EVP back end. We also addressed the expansion of vector instructions and the way intrinsics are handled. This resulted in a compiler that is capable of translating all EVP-C intrinsic operations in the FFT benchmark code to the appropriate RTL instructions.
We presented the way predicates and constraints are defined. This includes the new approach taken to implement the constraints, where we replaced the previously used macros by a construct that allows us to define the constraints directly. We presented the implementation of passing arguments to a function and placing a function's return value in the correct register. The generation of assembly code was extended to include verbose comments, and the generation of post-increment and -decrement loads and stores was implemented. We also presented the extension of GCC's scheduling pass to generate VLIW instructions, and we implemented a novel approach to schedule some semantically equivalent instructions on different functional units. This was necessary because using the method provided by GCC resulted in a finite state machine for the scheduler that grew too large.

Work done by other members of the team working on EVP-GCC is also presented for completeness. This consists of adding support for if-conversion, the generation of hardware loops and the implementation of a different method of alias analysis, namely address-based alias analysis.

Chapter 4 presented the experimental results obtained from running the DejaGnu testsuite, extended with some new testcases, to test the correctness of the generated code, and the performance in terms of cycle count of the scalar EEMBC Telecom benchmark and, for the vector part, an FFT benchmark. The latter two results are not included in this thesis; because of contractual obligations we cannot disclose this information, so the results are supplied to the graduation committee separately. We also presented the results of a comparison between the EVP-GCC compiler capable of scheduling semantically equivalent operations on different functional units and the EVP-GCC compiler without this feature. We measured an improvement of 2-5% in cycle counts for the vector benchmark. We also outlined the experiments done with the AltiVec back end to see the effect of autovectorization on out-of-the-box code of the EEMBC Telecom benchmark suite. We found that the autovectorizer could not bring much improvement without also improving the if-conversion capabilities of the compiler. Improving if-conversion to a level where a large number of inner loops of the benchmark code could be vectorized was estimated to be too large a project for this thesis.

5.1 Impact of the Open Source nature of GCC

In this section we discuss the aspects of open source, or free, software and the impact they have had over the course of this project. Using a compiler licensed under the GNU GPL has several advantages:

• The compiler developer has access to the complete source code, and this brings a number of advantages.

– It allows the compiler developer to understand the inner workings of the compiler. Full insight into the inner workings of the compiler allows the developer to understand exactly what is happening and why certain results are achieved. This aspect was very important when the new approach of scheduling semantically equivalent operations on different functional units was implemented. Insight into the functions that perform the state transitions for the instruction scheduler allowed us to see why implementing this using the regular method as described in [28] resulted in a state machine that grew too large. It also allowed us to use some of the functions in the mainline to change the instruction code of instructions, as detailed in section 3.3.4. Had we not had access to the complete GCC source code, implementation of this new scheduling approach would not have been possible.

– The compiler developer has the ability to change the inner workings of the compiler. The GPL allows a compiler developer not only to read the source code, but also to modify it where desired. This aspect was important when the address-based alias analysis was added. This method of alias analysis was not part of GCC, but being able to add it increased the performance of the EVP-GCC compiler.

– It allows the compiler to be quickly adapted to changes in the architecture. During the course of this project we worked with a single version of the EVP architecture. As there were no changes to the architecture during the project, there was no need to adapt the back end to any changes.

– It allows easy design space exploration. As our main focus was to provide proof that GCC could be used as a compiler framework for the EVP, we did no design space exploration for future versions of the EVP.

• GCC has a large supporting community. The large number of people working on GCC and the quick responses to questions asked through the GCC mailing list allowed us to discuss the problems with the scheduler not only with its author but with anyone in the community who wished to join in [37]. This helped us gain insight into the instruction scheduling process and discover its strengths and limitations.

• We can benefit from research work done by others. The autovectorizer is one example of a research project started by IBM at the Haifa Labs that found its way into the GCC mainline. This research and the knowledge gained there will be of use to us when the EVP back end can be adapted to allow for autovectorization in the future.

There is also an aspect of the open source nature of GCC that might cause problems: the requirement under the GNU GPL that modified free software must also be made available under the GPL when published. Although we did not encounter any problems over the course of this project, there might come a point where code in the back end reveals or hints at details of the architecture that ST-Ericsson does not want to disclose.

5.2 Contributions

In Chapter 1 we listed the goals for this project. In this section we will discuss these goals and their solutions. All the issues addressed below are part of our work on this project. Work done by the other members of the team is listed later. The main goal that we set was listed as:

• The goal of this project is to extend the current version of the EVP-GCC compiler to include support for vector instructions and EVP-C intrinsic operations, and to evaluate the performance relative to the current EVP production compiler.

In order to achieve this we defined a number of subgoals:

• Extend the compiler to support vector data types and vector registers. As we presented in Chapter 3, a number of new data types were defined in the back end. The vector registers were added to the macros defining the registers, and three new machine modes were introduced: V32QImode, V16HImode and V8SImode. These modes are mapped to the vector register classes in such a way that register classes between which data can be moved losslessly via a simple move instruction share the same mode. This method also allows the future addition of extra modes without any significant rewriting.
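The relation between these three modes can be sketched in plain C: they are three element-wise views of the same 256-bit vector register. The union below is illustrative only (the type name is ours, not taken from the back end); it shows why a move between the corresponding register classes is lossless — all three views occupy exactly 32 bytes.

```c
#include <stdint.h>

/* Illustrative sketch, not back-end code: the three machine modes
 * V32QImode, V16HImode and V8SImode are three views of one 256-bit
 * vector register.  A move between them copies the same 32 bytes. */
typedef union {
    int8_t  v32qi[32];  /* 32 x  8-bit elements (V32QImode) */
    int16_t v16hi[16];  /* 16 x 16-bit elements (V16HImode) */
    int32_t v8si[8];    /*  8 x 32-bit elements (V8SImode)  */
} evp_vector256;
```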

• Extend the compiler to support the EVP-C extension to the C language by implementing support for a number of EVP-C intrinsic operations; specifically, all EVP-C intrinsics used in the vector benchmark code. For all EVP-C intrinsic operations that occur in the FFT benchmark, expansion rules and instruction patterns were written, which allows full compilation of the FFT benchmark. This is detailed in section 3.2.1.3.

• The EVP supports post-increment and post-decrement load and store operations. Support for generation of these instructions must be implemented in the compiler.

Using the auto-inc-dec optimization pass we implemented support for generating post-increment and post-decrement addressing instructions. This required adding some extra bypasses to correctly define the latency of the instructions for the different registers (i.e. loading the value into the vector register has a latency of 3, while the pointer update has a latency of only 1).
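The source pattern the auto-inc-dec pass looks for can be sketched as a load followed by a pointer update of the element size, which is then merged into a single post-increment access. The function below is an illustrative example of such a candidate loop, not EVP code:

```c
#include <stdint.h>

/* Illustrative candidate for post-increment addressing: each iteration
 * performs a load (*p) immediately followed by a pointer update (p++),
 * the pair the auto-inc-dec pass merges into one post-increment load. */
int32_t sum_postinc(const int32_t *p, int n)
{
    int32_t sum = 0;
    while (n-- > 0)
        sum += *p++;   /* load, then advance pointer by element size */
    return sum;
}
```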

• Some operations can be issued on several different functional units. Support for all alternatives of these instructions must be enabled in GCC's instruction scheduler. Implementing support for this aspect was the major part of this project. The resource usage patterns for all the different instructions resulted in a state machine for the scheduler that was too large. We were unable to split the automaton into several smaller ones due to the large number of interdependencies between the resource usage patterns. We presented a new approach to split the state machine without losing the ability to schedule these semantically equivalent instructions on all of the functional units that can perform the operation. We showed how the automaton could be split by building a supporting function that attempts to schedule the alternative that resides in the other automaton and, when successful, alters the information in the scheduler to reflect this scheduling choice. This allowed us to make full use of all available alternatives while retaining two relatively small automata. We measured a gain of 2-5% on the vector benchmarks when this feature is enabled.
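The split-automaton idea can be sketched in a strongly simplified form. The real DFA states in GCC are far richer, but modelling each automaton as a bitmask of busy functional units is enough to show the shape of the supporting function: if the preferred unit in the first automaton is taken, try the semantically equivalent alternative in the second automaton and record which one was chosen. All names here are ours, for illustration only:

```c
#include <stdbool.h>

/* Simplified model: each automaton tracks the busy functional units of
 * its half of the machine as a bitmask for the current cycle. */
typedef struct {
    unsigned busy;  /* bit i set = functional unit i is occupied */
} automaton;

/* Reserve `unit` in automaton `a`; true on success. */
bool try_issue(automaton *a, unsigned unit)
{
    if (a->busy & (1u << unit))
        return false;
    a->busy |= 1u << unit;
    return true;
}

/* Issue an operation that has an equivalent alternative in the other
 * automaton.  On success *chosen records the alternative taken (0 = unit
 * in automaton a, 1 = unit in automaton b), mirroring how the supporting
 * function alters the scheduler's bookkeeping. */
bool issue_with_alternative(automaton *a, unsigned unit_a,
                            automaton *b, unsigned unit_b, int *chosen)
{
    if (try_issue(a, unit_a)) { *chosen = 0; return true; }
    if (try_issue(b, unit_b)) { *chosen = 1; return true; }
    return false;  /* both alternatives occupied this cycle */
}
```

The two bitmasks stay independent, which is the point of the split: neither automaton needs states for the other's resource usage patterns.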

• Enable autovectorization in GCC to vectorize loops in out-of-the-box benchmark code and measure the performance increase in terms of cycle count compared to non-vectorized code. Preliminary research was performed on the EEMBC Telecom code using the AltiVec back end (the EVP back end did not support vector instructions at the time). This research indicated that enabling autovectorization did not bring much performance improvement when compiling out-of-the-box EEMBC Telecom benchmarks, at least not until the compiler is able to perform masked vector execution. Bringing masked vector execution up to a level that would allow the vectorizer to bring significant improvement when compiling out-of-the-box EEMBC Telecom benchmarks required more time than was available in the scope of this thesis project, and this remains future work.
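Masked vector execution, as described in Chapter 4, can be simulated lane by lane in scalar C. The sketch below (illustrative names, not an EVP instruction) computes the operation unconditionally for every lane but gates the write-back on the mask — the vector analogue of the predicated update `if (i & 2) C[i] = C[i] + B[i];`:

```c
/* Scalar simulation of masked vector execution: the addition is executed
 * for every lane, but the result reaches the destination only in lanes
 * where the mask register holds a 1. */
void masked_add(int *c, const int *b, const unsigned char *mask, int lanes)
{
    for (int i = 0; i < lanes; i++) {
        int result = c[i] + b[i];   /* executed unconditionally */
        if (mask[i])                /* write-back gated by the mask */
            c[i] = result;
    }
}
```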

• Measure the performance of the generated code for the chosen benchmark. We executed the code from the EEMBC Telecom benchmark and the FFT benchmark, which resulted in the numbers presented in Chapter 4. From these numbers we can conclude that the improvements made to the EVP-GCC compiler result in well-scheduled, efficient code.

• The generated code for the benchmarks should be correct. We have shown in Chapter 4 that the number of failures in the DejaGnu testsuite, extended with the extra testcases for the EVP-C intrinsics, stands at 43. This is approximately equal to the number of failures of the compiler at the beginning of this project. As explained in Chapter 4, the remaining failing tests are non-critical for the correct execution of the benchmarks.

5.3 Recommendations for future work

While we have reached our goal of adding support for vector instructions and EVP-C intrinsics to the EVP-GCC compiler, a number of issues must be resolved before EVP-GCC is a mature compiler. These issues, related to the feature-completeness of the compiler, are the following:

• Support for all EVP-C intrinsics. So far only the intrinsics used in the FFT benchmark are implemented in the compiler, and this was done by hand. To add support for all intrinsics, this process should be automated: the machine description files for the intrinsics should be generated from the database containing all EVP-C intrinsics and instructions.

• Adding support for other vector sizes. At the moment EVP-GCC only supports vectors that are 256 bits wide. To support other vector sizes, vector types should also be defined for the remaining vector widths.

• Adding support for circular pointers. The EVP supports an addressing mode where a pointer pointing to the last element of an array automatically points to the first element when incremented. Support must be added to GCC to recognize such pointers and generate the correct assembly code for them.
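What the hardware does implicitly for circular pointers can be modelled explicitly in C. The helper below is an illustrative sketch (not an existing GCC or EVP interface) of the wrap-around increment the compiler would need to recognize:

```c
/* Model of circular addressing: incrementing a pointer past the last
 * element of the buffer wraps it back to the first element. */
const int *circ_inc(const int *p, const int *base, int len)
{
    p++;
    if (p == base + len)   /* stepped past the last element: wrap */
        p = base;
    return p;
}
```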

• Implementing support for the EVP-C pragmas in GCC. At the moment all EVP-C pragmas are ignored by GCC. Implementing the functionality of some of these pragmas could bring further improvement to the performance of EVP-GCC. One example is the memory space qualifiers __M1, __M2 and so on. Allowing these memory qualifiers to aid the alias analysis could result in fewer data dependencies, which in turn should improve the performance of the resulting code. For example, at the moment all operations that load data residing in a global variable passed to a function are still dependent on all context-save/restore instructions. This causes the instruction scheduler to generate code that does not achieve the performance that would be possible if this information were available. The programmer knows these pointers do not alias, but has no effective way of passing this information to the compiler.

• Adding support for floating point data types. Because the EVP only supports fixed point arith- metic at the moment, EVP-GCC does not support floating point operations. Future versions of the EVP may well include floating point support, in which case support should be added to the compiler.

We also have a number of suggestions for future research work. These are listed below:

• Measure the performance on larger applications. The benchmark used to measure the performance of the vector part is small compared to the applications that commonly run on the EVP. It would be interesting to see how GCC fares on a large application such as a 3G application; this would provide a better indication of the performance of GCC than the benchmarks used in this project.

• Explore scheduling priorities. At the moment, instructions which use the return address register are given priority in order to maximize the distance between the load instruction and the jump instruction, so that as many instructions as possible can be moved into delay slots. An exploration could be done of the effect on performance of granting different instruction classes scheduling priority over other classes (e.g. scheduling a move from one scalar register to another on the SLSU instead of the SALU if both options are available).

• Enabling the Integrated Register Allocator. From version 4.4.0 onward, GCC includes a new register allocator next to the one included in previous versions. It would be interesting to mea- sure the performance of the Integrated Register Allocator as opposed to the one currently used.

• Enabling the autovectorizer. GCC includes support for autovectorization of scalar code. If the EVP back end enabled the autovectorizer to vectorize code containing conditional statements, the performance could be measured on out-of-the-box EEMBC Telecom benchmark code.

• Adding customizability for architecture exploration. From a design point of view it is very valuable to be able to change parameters of the architecture. Changing the number of registers of certain classes, adding extra move instructions, changing vector widths, adding functional units, changing time shapes or adding support for pre-increment addressing could all yield interesting data to aid the design of future generations of the EVP. To this end, a number of macros that are now hard-coded could easily be rewritten to allow a research team to tweak these settings.

• Enable the DFA scheduler to generate a non-deterministic automaton. At the moment the scheduler uses a deterministic scheduling approach: the first alternative (of operations that can be scheduled on multiple different functional units) that can be scheduled is scheduled. Allowing the automaton to be non-deterministic gives the scheduler greater freedom, although additional measures have to be taken to detect afterwards which alternative was selected by the scheduler. The production compiler currently uses a non-deterministic scheduler; the code to detect the choices made by the scheduler is available in-house and is not part of the (licensed) compiler framework. This code could therefore be adapted to work with GCC and improve the performance of the scheduler.

Bibliography

[1] J.P. Grossman, Compiler and Architectural Techniques for Improving the Effectiveness of VLIW Compilation, International Conference on Computer Design, 2000

[2] International Telecommunication Union, International Mobile Telecommunications-2000 Standard, http://www.itu.int/

[3] International Telecommunication Union, International Mobile Telecommunications-Advanced (IMT-Advanced), http://www.itu.int/

[4] M.V. Wilkes and J.B. Stringer, "Microprogramming and the Design of the Control Circuits in an Electronic Digital Computer," Proceedings of the Cambridge Philosophical Society, v. 49, 1953, pp. 230-238

[5] A.E. Charlesworth, An Approach to Scientific Array Processing: The Architectural Design of the AP-120B/FPS-164 Family, Computer, September 1981, pp. 18-57

[6] J.A. Fisher, Very Long Instruction Word Architectures and the ELI-512, Proceedings of the 10th Symposium on Computer Architecture, June 1983, pp. 140-150

[7] J.A. Fisher, Trace Scheduling: A Technique for Global Microcode Compaction, IEEE Transactions on Computers, Vol. C-30, July 1981, pp. 478-490

[8] Intel Corporation, Intel Itanium 2 Processor Hardware Developer’s Manual, http://developer.intel.com/design/itanium2/manuals/251109.htm

[9] Texas Instruments, TMS320C6000 CPU and Instruction Set Reference Guide, http://www.ti.com/

[10] J. Parthey, Porting the GCC-Backend to a VLIW-Architecture, Chemnitz University of Technology, March 2004

[11] A. Strätling, Optimizing the GCC Suite for a VLIW Architecture, Chemnitz University of Technology, November 2004

[12] D. Melnik, S. Gaissaryan, A. Monakov, D. Zhurikhin, An Approach for Data Propagation from Tree SSA to RTL, Proceedings of the International Workshop on GCC for Research in Embedded and Parallel Systems (GREPS), Brasov, Romania, 2007

[13] D. Naishlos, Autovectorization in GCC, GCC Summit, June 2004, ftp://gcc.gnu.org/pub/gcc/summit/2004/Autovectorization.pdf

[14] D. Nuzman and R. Henderson, Multi-platform Auto-vectorization, The 4th Annual International Symposium on Code Generation and Optimization, March 2006

[15] D. Nuzman, I. Rosen, A. Zaks, Auto-Vectorization of Interleaved Data for SIMD, ACM SIGPLAN 2006 Conference on Programming Language Design and Implementation (PLDI), June 2006

[16] D. Nuzman and A. Zaks, Autovectorization in GCC - Two Years Later, GCC Summit, June 2006, http://www.gccsummit.org/2006/2006-GCC-Summit-Proceedings.pdf


[17] Auto-vectorization in GCC, the autovect-branch, http://gcc.gnu.org/projects/tree- ssa/vectorization.html

[18] Power ISA v.2.03, http://www.power.org/resources/downloads/PowerISA_203.Public.pdf

[19] S.K. Gupta and N. Sharma, Alias Analysis for Intermediate Code, Proceedings of the GCC Developer's Summit, 2003, pp. 71-78

[20] Free Software Foundation, http://www.fsf.com/

[21] GNU General Public License v3, Free Software Foundation, http://www.gnu.org/copyleft/gpl.html

[22] GNU Compiler Collection (v4.3.1), Free Software Foundation, June 2008, http://gcc.gnu.org/

[23] The GNU Operating System, http://www.gnu.org/

[24] GCC Internals Manual (v4.4.0), Free Software Foundation, http://gcc.gnu.org/onlinedocs/

[25] John Hennessy and David Patterson, Computer Architecture, Fourth Edition: A Quantitative Approach, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.

[26] EEMBC, http://www.eembc.com/

[27] DejaGnu Framework, http://www.gnu.org/software/dejagnu/

[28] V. Makarov. The Finite State Automaton Based Pipeline Hazard Recognizer and Instruction Scheduler in GCC. Proceedings of the GCC Developer’s Summit, 2003.

[29] D. Novillo, Tree SSA - A New Optimization Infrastructure for GCC, 2003 GCC Developer’s Summit, pages 181-193, Ottawa, Canada, May 2003.

[30] Sanjiv K. Gupta and Naveen Sharma. Alias Analysis for Intermediate Code. Proceedings of the GCC Developer’s Summit, 2003, pages 71-78.

[31] J. Smeets and K. Moerman, Philips Semiconductors Adelante VD32040 Architecture: An Em- bedded Vector Processor for Low Power DSP Applications. Proceedings of the GSPx, 2005

[32] C.H. van Berkel, F. Heinle, P.P.E. Meuwissen, K. Moerman, and M. Weiss. Vector Processing as an Enabler for Software-Defined Radio in Handheld Devices. EURASIP Journal on Applied Signal Processing 2005:16, pp. 2613-2625

[33] NXP Adelante 32-bit embedded vector processor VD32040, NXP Semiconductors, http://my.semiconductors.philips.com/acrobat/literature/9397/75015342.pdf

[34] Adelante SDK for VD3204x, EVP-C Language Reference Manual Rev. 01.06, July 2008.

[35] Adelante SDK for VD3204x, C-compiler User Manual Rev. 01.03, May 2008.

[36] The ANSI C Standard, American National Standard Institute, 1990.

[37] GCC Mailing List Archives, http://gcc.gnu.org/ml/gcc/