LLVM-CHiMPS: Compilation Environment for FPGAs Using LLVM Compiler Infrastructure and CHiMPS Computational Model
Seung J. Lee1, David K. Raila2, Volodymyr V. Kindratenko1
1) National Center for Supercomputing Applications (NCSA)
2) University of Illinois at Urbana-Champaign (UIUC)
[email protected], [email protected], [email protected]

Abstract

The CHiMPS (Compiling High level language to Massively Pipelined System) system, developed by Xilinx, is gaining popularity due to its convenient computational model and architecture for field programmable gate array computing. The CHiMPS system utilizes CHiMPS target language as an intermediate representation to bridge between the high level language and the data flow architecture generated from it. However, the CHiMPS frontend currently does not provide many commonly used optimizations and has some use restrictions. In this paper we present an alternative compiler environment based on the low level virtual machine compiler environment, extended to generate CHiMPS target language code for the CHiMPS architecture. Our implementation provides good support for global optimizations and analysis and overcomes many limitations of the original Xilinx CHiMPS compiler. Simulation results from codes based on this approach are shown to outperform those obtained with the original CHiMPS compiler.

1. Introduction

1.1. CHiMPS

Recently, systems have been developed that employ Field Programmable Gate Arrays (FPGAs) as application accelerators. In order to execute applications on such systems, the application programmer extracts a computationally intensive kernel from the application, reorganizes data structures to accommodate the memory subsystem, and rewrites the kernel in the highly specialized environment for FPGA execution. The CHiMPS (Compiling High level language to Massively Pipelined System) system [1], developed by Xilinx, employs a unique approach by imposing a computational model on the FPGA subsystem, treating it as a virtualized hardware architecture that is convenient for high level code compilation. The CHiMPS compiler transforms high level language (HLL) codes, currently written in ANSI C, to CHiMPS target language (CTL), which is combined with runtime support for the FPGA runtime environment and assembled using the CHiMPS assembler and FPGA design tools to a hardware data flow implementation [2]. The instructions defined in CTL closely resemble a traditional microprocessor instruction set, which is convenient for compiling and optimizing and should be familiar to programmers [3, 4]. This effectively bridges the gap between hardware design and traditional software development and only imposes minor costs related to runtime overhead support to virtualize the architecture.

1.2. LLVM

Low level virtual machine (LLVM) compiler [5], originally developed at the University of Illinois, is now an open source compiler project that is aimed at supporting global optimizations. It has many attractive features for program analysis and optimization. LLVM has a GCC based C and C++ frontend as well as its own frontend, Clang, that offers modular and reusable components for building compilers that are target-independent and easy to use. This reduces the time and effort to build a particular compiler. Static backends already implemented in LLVM, including X86, X86-64, PowerPC 32/64, ARM and others, share those components. A notable feature of LLVM is its low level static single assignment (SSA) form virtual instruction set, which is a low-level intermediate representation (IR) that uses RISC-like instructions. The IR makes LLVM versatile and flexible as well as language-independent, and is amenable to data flow representations.

1.3. LLVM in CHiMPS compilation flow

The current LCC based frontend of the CHiMPS compiler [1] is known to have some serious drawbacks, as shown in Section 2. In the work presented in this paper, LLVM is employed to address these shortcomings and to investigate the ability to do high level program transformations that better support the CHiMPS architecture. Figure 1 illustrates the compilation flows of the original Xilinx CHiMPS. The LLVM backend part is aimed at substituting for the CHiMPS HLL compiler. In the following sections, the implementation details are discussed.

Figure 1. Compilation flow of Xilinx CHiMPS compiler.

1.4. Related work

Several research and commercial HLL to FPGA compilers are currently available [19-27]. Although their implementations differ significantly, the general framework of compilation strategies is the same: high level source code is translated into a low-level hardware language, such as VHDL or Verilog.

The idea of the frontend optimization strategy using LLVM has been proven by the Trident compiler [19]. Trident’s framework is similar to LLVM-CHiMPS compilation except that the Trident compiler is used instead of the CHiMPS backend. Also, in our LLVM-CHiMPS compiler a more aggressive set of optimization techniques is implemented in the LLVM frontend, whereas in Trident’s approach many optimizations are pushed down to Trident’s backend.

The ROCCC compiler [20] is another example of a compiler that translates HLL code into highly parallel and optimized circuits. Instead of using an LLVM frontend and CHiMPS backend, the ROCCC compiler is built on the SUIF2 and Machine-SUIF [29, 30] platforms, with the SUIF2 platform being used to process optimizations while the Machine-SUIF platform is used to further specialize for the hardware and generate VHDL.

On the commercial side, several HLL compilers are available, including Mitrion-C [26], Impulse-C [25], DIME-C [27], and MAP-C [28]. These compilers, however, are proprietary and support only a few reconfigurable computing systems. In our LLVM-CHiMPS approach, we take advantage of the optimization techniques implemented in the open-source LLVM compiler and of the FPGA hardware knowledge applied to the backend CHiMPS compiler developed by Xilinx.

2. Examples of known limitations in Xilinx CHiMPS compiler

The original Xilinx CHiMPS compiler has some limitations that make it inconvenient and difficult to program for. For example, legal high level expressions in HLL code sometimes need to be paraphrased for CHiMPS compilation. Consider the ANSI C code shown in Figure 2.

    char* foo (int select, char* brTid, char* brFid)
    {
        if (select)
            return brTid;
        return brFid;
    }

Figure 2. Simple code that fails in Xilinx CHiMPS compiler

This simple code fails because two return statements are used in a single function, and although this is legal in ANSI C, the Xilinx CHiMPS frontend does not support multiple returns.

Figure 3a shows the only form of for statement that is fully supported by the Xilinx CHiMPS compiler. All other uses of the for statement, such as those shown in Figure 3b, are not guaranteed to produce correct code [16].

    (a) for (i = 0; i < n; i++) { }

    (b) for (i=0; i<=n; i++) { }
        for (i=0; i<n; i+=2) { }
        for (i=1; i<n; i++) { }

Figure 3. (a) Supported and (b) unsupported expressions of for loop construct

A very significant weakness of the Xilinx CHiMPS compiler frontend is that no optimizations are carried out during compilation [17]. Figure 4 depicts an example of a simple summation from 0 to 9 that yields 45 and is not optimized at all by the CHiMPS frontend. The CTL code emitted is almost directly related to the source code, although this kind of simple high level code is commonly optimized at compilation time, as shown in the LLVM IR, the right column in Figure 4.

    Source code:

        int foo() {
            int n;
            int k=0;
            for (n=0; n<10; n++)
                k+=n;
            return k;
        }

    CTL from Xilinx CHiMPS compiler:

        Enter foo;
        reg k, n
        add 0;k
        reg temp0:1
        nfor l0;10;n
        add k,n;k
        end l0
        exit foo; k

    LLVM IR:

        define i32 @foo() nounwind {
        entry:
            ret i32 45
        }

Figure 4. Xilinx CHiMPS compiler does not employ common optimization techniques.

These serious drawbacks in CHiMPS can be addressed by using LLVM as a frontend, which offers more stable compilation with numerous global optimizations and the potential for significant program transformations.

3. From LLVM to CHiMPS

As shown in Figure 1, our approach is to use LLVM as the frontend to generate CTL code instead of the Xilinx CHiMPS frontend. This section describes how LLVM is modified to meet this goal. LLVM has its own virtual instruction set, which is a low-level IR that uses RISC-like target independent instructions. Every LLVM backend for a specific target is based on this LLVM IR.

In the following sections, an overall comparison of instructions at two different levels is presented. While LLVM instructions are low level, CHiMPS target language defines low level as well as some high level instructions, such as for and nfor, which are similar to the for statement in the C language. The difference between these two levels is critical in this implementation of LLVM-CHiMPS, and so it needs to be discussed in more detail. Readers are directed to the LLVM and CHiMPS documentation [2, 3, 5] for more detailed syntax, semantics and examples of the instructions for CHiMPS and LLVM.

3.1. Implementation of low level representations in CHiMPS

LLVM is intended to support global optimization, which is one of the motivations for using it in this work. Additionally, LLVM’s simple RISC-like instructions are quite similar to CHiMPS instructions, which are also similar to traditional microprocessor instructions, so there is a good connection between these models and a good starting point for LLVM-CHiMPS.

    (a) CHiMPS
        Integer arithmetic
        - add, sub, multiply, divide, cmp
        Floating-point arithmetic
        - i2f, f2i, fadd, fsub, fmultiply, fdivide, fcmp
        Logical operations
        - and, or, xor, not, shl, shr

    (b) LLVM
        Binary operations
        - add, sub, mul, udiv, sdiv, fdiv, urem, srem, frem
        Bitwise binary operations
        - shl, lshr, ashr, and, or, xor
        Other operations
        - icmp, fcmp, …
        Conversion operations
        - sitofp, fptosi, …

Figure 5. Arithmetic and logical instructions in (a) CHiMPS and (b) LLVM

    Pseudo-instructions
    - reg, enter, exit, assign, foreign, call

Figure 6. CHiMPS pseudo-instructions

    (a) CHiMPS
        Standard memory access instructions
        - memread, memwrite

    (b) LLVM
        Memory access and addressing operations
        - load, store, …

Figure 7. Memory access instructions in (a) CHiMPS and (b) LLVM

call are easily associated with call instruction in LLVM

CHiMPS also has common instructions for arithmetic and logical operations.