FPGA-Based Coprocessors for Clouds

JOSIP POPOVIC, B.Sc. EE

A thesis submitted to the Faculty of Graduate and Postdoctoral Affairs in partial fulfillment of the requirements for the degree of

MASTER OF APPLIED SCIENCE IN ELECTRICAL AND COMPUTER ENGINEERING

Ottawa-Carleton Institute for Electrical and Computer Engineering Faculty of Engineering and Design Department of Systems and Computer Engineering Carleton University Ottawa, Ontario, Canada

©2013 Josip Popovic Abstract

Computer Clouds are growing in size and customer base. Lately their growth has been limited or at least influenced by hosted server’s power consumption.

FPGA is a good contender for providing task offloading to CPU based servers. However hard coded FPGA solutions are not flexible enough to support a variety of applications being used in Clouds. FPGA soft processors provide the required flexibility but not the speed. High Level Synthesis requires FPGA reconfiguration.

The solution to enabling use of FPGAs in Clouds seems to be logical: use a fundamentally simple and small (flexibility) that runs code that is as close to hardware fixed function architecture as possible (speed).

The above set of requirements has led to the creation of hardware architectures that improve the existing state-of-the-art with features applicable to the Cloud computing environment.

A real life design inspired by requirements from financial industry is implemented on a FPGA.

ii

Table of Contents

Abstract ii

Table of Figures vii

Table of Tables xi

List of Acronyms and Definitions xii

List of FPGA Terms xiv

Chapter 1 - Introduction 1 1.1 Motivation ...... 1 1.2 Cloud Power Consumption ...... 1 1.3 Reconfigurable Coprocessors in Clouds ...... 3 1.4 Computationally intensive applications in Clouds ...... 4 1.5 Developers ...... 6 1.6 Objectives ...... 7 1.7 Overview of the Contributions ...... 8 1.8 Summary of Results ...... 9 1.9 Thesis Outline ...... 9

Chapter 2 - Background 11 2.1 What is Cloud Computing? ...... 12 2.2 Cloud Acceleration Technologies ...... 13 2.3 FPGA Design Methodologies ...... 15 2.3.1 Custom design ...... 15 2.3.2 Multiprocessing with embedded cores ...... 16 2.3.3 Virtual Hardware ...... 17 2.3.4 High Level Synthesis ...... 18 2.3.5 Academic Solutions ...... 28 2.4 FPGA-Based Coprocessor Elements ...... 34 2.4.1 Interconnect Architectures ...... 34 2.4.2 Controller ...... 43 2.4.3 Compiler ...... 44 2.4.4 Memory Subsystem ...... 45 2.5 Server Operating System Jitter ...... 45 2.6 Power Consumption in Integrated Circuits ...... 46 2.7 Amdahl's Law ...... 47 2.8 Implemented Application ...... 47 2.8.1 Goals ...... 47 2.8.2 Options Trading ...... 47 iii

2.8.3 Black-Scholes ...... 47 2.9 i5-3570 CPU ...... 49 2.10 Development Board...... 50 2.11 PCIe core and Software Driver ...... 51 2.11.1 Speedy PCIe Core ...... 51 2.11.2 Speedy PCIe Software Driver ...... 53

Chapter 3 - Requirements, Thesis Scope and Contributions 55 3.1 Analysis of the literature and requirement definition ...... 56 3.1.1 Methodology Comparison ...... 56 3.1.2 Cloud Coprocessor Architecture Requirements ...... 58 3.1.3 Basis of Cloud Coprocessor Architectures ...... 60 3.2 Thesis Scope ...... 61 3.3 Contributions ...... 62 3.3.1 Processing Pipeline ...... 62 3.3.2 Instruction Stages ...... 64 3.3.3 Instruction Format ...... 65 3.3.4 Crossbar Architecture ...... 65 3.3.5 Electron versus NISC ...... 65 3.3.6 Electron versus FlexCore ...... 67 3.3.7 Electron and Variable Processing Latencies ...... 69 3.3.8 Processing Tile ...... 69 3.3.9 Experimental Results ...... 70

Chapter 4 - FPGA-based Coprocessor Architecture 71 4.1 System Level Architecture ...... 71 4.1.1 Hardware ...... 71 4.1.2 Software ...... 72 4.2 System Design Flow ...... 73 4.3 Processing Element Interconnect ...... 74 4.3.1 Processing Element Crossbar Implementation ...... 74 4.3.2 Processing Element 2D-Mesh Implementation ...... 92 4.3.3 Summary...... 97 4.4 Processing Tile ...... 98 4.4.1 Control and Data Flow ...... 98 4.4.2 Memory Hierarchy ...... 100 4.4.3 System Controller...... 101 4.4.4 Data Memory ...... 101 4.4.5 Router ...... 101 4.4.6 Kernel ...... 101 4.5 Electron ...... 102 4.5.1 Electron Topology ...... 102 4.5.2 DynaPath ...... 104 4.6 Processing Elements ...... 106 4.7 Coprocessor Granularity and Parallelism ...... 109 4.8 Application Task Mapping to Kernels ...... 110 iv

4.8.1 Serial and Parallel Scheduled Kernels ...... 110 4.8.2 Computation to Communication Ratio ...... 113

Chapter 5 - FPGA-based Coprocessor Implementation 121 5.1 FPGA Design ...... 121 5.2 Programming ...... 123 5.2.1 Sequential versus run-to-completion model ...... 123 5.2.2 Black-Scholes Pseudo Code ...... 123 5.2.3 Nano-Code VLCW Instruction ...... 125 5.2.4 Micro-code versus Nano-code ...... 127 5.2.5 Nano-Code Compilation ...... 128 5.2.6 System and Kernel Scheduling ...... 132 5.3 System Components – Throughput Overview ...... 135 5.4 Backend ...... 139 5.4.1 FPGA Resources ...... 139 5.4.2 Application Logic – 6 Processing Tiles ...... 139 5.4.3 Processing Tile ...... 140 5.4.4 Summary...... 141

Chapter 6 - Experimental, Achievable and Theoretical Performance 142 6.1 System Information Flow ...... 143 6.2 Read Transfer Modes ...... 148 6.2.1 Low Latency (LL) Mode ...... 149 6.2.2 Low Interrupt (LI) Mode ...... 149 6.2.3 Low Latency and Low Interrupt Modes – Comparison ...... 149 6.3 Black-Scholes Experimental Results ...... 151 6.3.1 CPU Implementation ...... 151 6.3.2 FPGA Implementation ...... 154 6.3.3 CPU and FPGA Implementation – Comparison ...... 155 6.4 Achievable System Throughput ...... 156 6.4.1 Kernel Scheduling and Clock Rate ...... 157 6.4.2 Adding Processing Tiles ...... 159 6.4.3 PCIe Throughput Increase ...... 160 6.4.4 PCIe and SpeedyPCIe Cores ...... 162 6.4.5 Software Driver Considerations ...... 163 6.5 CPU and Achievable FPGA Throughput - Comparison ...... 163 6.6 FPGA Theoretical Throughput ...... 163 6.7 CPU and FPGA – Power, Performance and Cost ...... 165 6.8 Summary ...... 170

Chapter 7 - Conclusions and Future Work 174 7.1 Conclusions ...... 174 7.2 Summary of Contributions ...... 174 7.3 Future Work ...... 176 7.3.1 Low latency PCIe Core ...... 176 7.3.2 Monte Carlo Analysis ...... 177 v

7.3.3 Nano-code Compilation ...... 177 7.3.4 Processing Elements Optimization ...... 177 7.3.5 Tools and Algorithms for resource estimation and template matching ...... 178

References 179

Appendix A – Speedy PCIe 193 Slave Write ...... 193 Slave Read ...... 193 Slave Read – Speedy PCIe extra read ...... 193

vi

Table of Figures

Figure 1 - NISC architecture [66] ...... 28 Figure 2 - ISA Instruction Format ...... 29 Figure 3 - NISC instruction word example [68] ...... 30 Figure 4 – Basic Crossbar ...... 35 Figure 5 - VOQs ...... 36 Figure 6 - Crossbar ...... 38 Figure 7 – Sparse Crossbar ...... 39 Figure 8 - Buffered Crossbar...... 40 Figure 9 – Sparse Buffered Crossbar ...... 40 Figure 10 - 2D-mesh ...... 42 Figure 11 - 2D-mesh, Information Flow ...... 42 Figure 12 - Torus 2D-mesh ...... 43 Figure 13 – Xilinx ML-605 Development Board ...... 51 Figure 14 – Speedy PCIe topology ...... 52 Figure 15 – Device Manager...... 53 Figure 16 – Speedy PCIe Example Design Memory Map ...... 54 Figure 17 – RISC and Electron Pipeline ...... 63 Figure 18 – FPGA Coprocessor ...... 71 Figure 19 - System Flow Diagram...... 74 Figure 20 - Processing Element Crossbar (PXB) ...... 75 Figure 21 - Multiply and accumulate concurrent and streaming calculations...... 79 Figure 22 - Multiply and accumulate concurrent and streaming calculations...... 80 Figure 23 - Multiply and accumulate concurrent and streaming calculations - timing ...... 81 Figure 24 - Multiply and accumulate concurrent and streaming calculations...... 82 Figure 25 - Multiply and accumulate concurrent and streaming calculations...... 83 Figure 26 - Multiply and accumulate concurrent and streaming calculations...... 84 Figure 27 - Equivalent Circuit for PXB MAC example ...... 85 vii

Figure 28 - PXB example for multiply and accumulate concurrent and streaming calculations ...... 87 Figure 29 – Resource Optimized PXB – Sparse PXB ...... 88 Figure 30 - Crossbar Input Multiplexing ...... 89 Figure 31 – PXB Number of Control Signals ...... 91 Figure 32 - P2DM and matrix multiplication example ...... 93 Figure 33 - P2DM and matrix multiplication example ...... 95 Figure 34 – 2D Mesh Number of Control Signals ...... 97 Figure 35 – Processing Tile Control and Data Flow ...... 99 Figure 36 – Processing Tile NUMA ...... 100 Figure 37 – Electron Topology ...... 102 Figure 38 – Electron DynaPath ...... 105 Figure 39 – Pipelined versus Non-Pipelined Primitives ...... 107 Figure 40 – Pipelined versus Non-Pipelined Primitives ...... 108 Figure 41 – Processing Time versus PE multithreading ...... 109 Figure 42 – Processing Tile Parallelism ...... 110 Figure 43 – Black-Scholes Throughput for 2 FPGA Configurations...... 112 Figure 44 – Amdahl’s Law applicable to FPGA Coprocessor ...... 113 Figure 45 - 10k by 10k matrix multiplication time and external memory system bandwidth limitation ...... 117 Figure 46 - Increasing Computation Workload Minimizes Communication Speed Wall Effect ...... 118 Figure 47 – Kernel Pipeline ...... 119 Figure 48 – Kernel Pipeline – Timing Diagram ...... 120 Figure 49 – System Level Topology ...... 121 Figure 50 – Black-Scholes mapped to hardware Kernels ...... 125 Figure 51 –VLCW example, Instruction Width 57-bits ...... 126 Figure 52 –Microcode versus nano-code ...... 128 Figure 53 –K2 nano-code ...... 130 Figure 54 – Processing Tile Kernel Scheduling ...... 133 Figure 55 – System Throughput Overview ...... 137 viii

Figure 56 – Kernel BW required versus Speedy PCIe BW Offered ...... 138 Figure 57 – FPGA Resource Utilisation ...... 139 Figure 58 – System Level – Flow of Information ...... 143 Figure 59 – Hardware-Software Interaction – Case (a) ...... 145 Figure 60 – Hardware-Software Interaction in Low Latency Mode – Case (b) ...... 146 Figure 61 – Hardware-Software Interaction in Low Interrupt Mode – Case () ...... 147 Figure 62 –Low Latency Mode - Waveform ...... 149 Figure 63 –Low Interrupt Mode - Waveform ...... 149 Figure 64 - Low Latency versus Low Interrupt mode ...... 150 Figure 65 – i5-3570loading for one (first on the left), two, three and four threads enabled ...... 152 Figure 66 – Best and Worst Case Processing Throughput, turbo boost mode On ...... 152 Figure 67 – One, two, three or four Threaded Black-Scholes Processing Throughput, turbo boost is On ...... 153 Figure 68 – Four Threaded Black-Scholes Processing Throughput Variation, turbo boost is On ...... 154 Figure 69 – One Processing Tile Best and Worst Case Experimental Processing Throughput ...... 155 Figure 70 – FPGA (instantiated 6-Processing Tiles), single CPU core software and one PT - Processing Throughput ...... 156 Figure 71 – Black-Scholes FPGA Throughput Increase: PCIe Gen 1.0 from x4 to x8 ...... 161

Figure 72 – Black-Scholes FPGA Throughput Increase: PCIe Gen 1.0 x8 to Gen 2 (ISF2.5) and

3 (ISF4) x8 ...... 162 Figure 73 – Achievable FPGA and i5-3570 CPU Processing Throughput ...... 163 Figure 74 – Single PT (Achievable and Theoretical Maximum) versus Single CPU thread/core Throughput ...... 164 Figure 75 – Intel i5-3570 CPU power and frequency during single threaded Black-Scholes calculations, turbo boost mode is On ...... 165 Figure 76 – Intel i5-3570 CPU Power for idle, one, two, three and four threaded Black- Scholes, turbo boost mode is turned On ...... 166 ix

Figure 77 – Intel i5-3570 CPU Power for idle, one, two, three and four threaded Black- Scholes, turbo boost mode is turned Off ...... 167 Figure 78 – Intel i5-3570 CPU for one, two, three and four threaded Black-Scholes, turbo boost mode is turned On/Off ...... 168 Figure 79 – Xilinx XC6VLX240T power consumed during Black-Scholes calculations ...... 168 Figure 80 – Single Processing Tile and single CPU core Black-Scholes Computational Throughput...... 171 Figure 81 – FPGA and CPU Power Consumption, CPU turbo boost mode is turned On ...... 172 Figure 82 – FPGA and CPU Performance per Watt ...... 173 Figure 83 – Slave Write - Waveform ...... 193 Figure 84 – Slave Read - Waveform ...... 193 Figure 85 – Speedy PCIe 32x 16-DW slave read ...... 194 Figure 86 – Speedy PCIe 36x 16-DW slave read ...... 194 Figure 87 – Speedy PCIe Bug ...... 195

x

Table of Tables

Table 1 - FPGA Design Methodology Comparison ...... 58 Table 2 - Number of PXB Control Signals ...... 91 Table 3 - Matrix Multiplication ...... 94 Table 4 – Number of P2DM Control Signals...... 96 Table 5 – Processing Subtasks in Clock Cycles (CC) ...... 111 Table 6 - FPGA resources and Processing Speed for a Matrix Multiplication ...... 115 Table 7 - Processing Elements Resource Usage ...... 140 Table 8 – Adding Processing Tiles ...... 160

xi

List of Acronyms and Definitions

ASIC Application Specific Integrated Circuit

ASIP Application Specific Instruction set Processor

Cloud An off-premise computing site. Compare to on-premise Data Center, a

computing site within an organization.

CoP Coprocessor

CPLD Complex Programmable Logic Device

CPU Central Processing Unit, main processor

CUDA Compute Unified Device Architecture

DynaPath Dynamically configurable Data Path. DynaPath offers interconnect

services required for parallel processing. Electron uses DynaPath.

Electron Processor architecture proposed in this thesis.

FPGA Field Programmable Gate Array

FSM Finite State Machine

GPGPU General Purpose application development for GPUs

HDL Hardware Description Language

HLL High Level Language

HLS High Level Synthesis

HOL Head of Line, often used as “HOL Blocking”

HPC High Performance Computing

ILP Instruction Level Parallelism

ISA Instruction Set Architecture

xii

Kernel A highly parallel part of application to be offloaded to FPGA. In this

thesis we refer to an Electron processor running nano-code as a

Kernel.

LUT An n-bit Look-Up Table. A hardware module used by FPGAs. It can

encode any n-bit Boolean function as truth tables.

MOPS Million of Operation per Second. In this thesis “operation” refers to one

execution of Black-Scholes equation.

NUMA Non-Uniform Memory Access

PBX Processing Element Crossbar

PE Processing Element

RISC Reduced Instruction Set Computer

RTL Register Transfer Level

SIMD Single Instruction Multiple Data

SOC System on Chip

Soft IP An Intellectual Property, a reusable 3rd party design that can be

synthesised for any semiconductor process

TIE Tensilica Instruction Extension

VHDL VHISC Hardware Description Language

VHSIC Very High Speed Integrated Chip

VLCW Very Long Control Word

VLIW Very Long Instruction Word

VLSI Very Large Scale Integration

xiii

List of FPGA Terms

The following is a list of important FPGA related terms used in this thesis.

Reconfigurable logic

Reconfigurable logic technologies such as FPGA allow hardware logic (gates) updates. At the same time hardware functionality is changed.

Reconfiguration

Reconfiguration is a process of changing FPGA logic. Reconfiguration requires a new FPGA configuration file. This new configuration file is created using FPGA synthesis and floor planning tools. While the new configuration file is downloaded to FPGA, FPGA becomes unusable and requires a reset after the configuration file is downloaded. The process of reconfiguration takes 100s of milliseconds. Creating the reconfiguration file may take hours.

Controllable hardware

Programmability is a word often associated with a processor running code; an example is a RISC processor running ISA instructions. In this project we use words “controllable”, “controllable ”, “hardware controller” and “controllable hardware code” or just “control code”.

In the proposed architecture the controllability is a feature provided by FPGA hardware where a change of FPGA (BRAM) memory content creates different functionality mapped on the existing FPGA hardware via a controllable interconnect. This memory content is called “control code”. FPGA logic that performs data processing is called “controllable hardware” and it is made from a controllable interconnect (DynaPath) and controllable Processing Elements. A “hardware controller” or just “controller” is a scheduling state machine that executes control code and controls the hardware. For every new application xiv the control code is changed on the fly (FPGA reconfiguration not required). The controller is a fixed function hardware module.

Instead of “control code” term “nano code” is used. The word “nano” comes from the fact that this code is lower than micro code, where micro code is often used as the lowest level code in a hierarchy of programs. The nano code is running on the lowest level and controls hardware fixed functions.

Control code synthesis

The control code is created from a High Level Language (HLL) application code. HLL application code requires mapping to the existing FPGA controllable fixed functions. The process of mapping a HLL application code to existing FPGA hardware is called control code synthesis.

Controllable hardware templates

One FPGA instantiated controllable fixed function hardware or a set of available controllable hardware may not be sufficient, or may not have the most optimum set of features for every HLL application. Therefore a set of fixed function hardware is made available in the form of pre-synthesised templates (Virtual Hardware). If required an instantiated hardware component is replaced by a different template. The process of replacing templates is called the “partial reconfiguration”.

xv

Chapter 1 - Introduction

1.1 Motivation

The motivation of this thesis is reduction of electrical energy consumed by the Cloud Computing installations.

Energy consumed by Clouds comes from two interconnecting sources: Cloud servers and the server room cooling. Minimizing server’s power consumption would affect the cooling requirements. Consequently the Cloud reliability and financial bottom line would be positively improved. Environment would benefit from reduced carbon emissions from coal, a still dominant energy source.

In this thesis we focus on architectures and methodologies that would enable FPGA-based coprocessors to be added to Clouds. The architecture we arrived to is implemented on a development board and its benefits evaluated.

1.2 Cloud Power Consumption

Energy consumption in Clouds is growing [1]. It was indicated that two Google searches performed on a desktop PC generate enough CO2 emissions to boil a cup of water [2]. Data Center energy consumption doubles every 5 years [3], while in 2005 the combined power and cooling required by Data Centers was 1% of the world’s electricity [4]. Koomey reported that global data centers in 2010 accounted for 1.1% to 1.5% of total electricity use while in the US that number was between 1.7 and 2.2% [5]. Mills stated that the wireless Cloud, a system made from wireless devices and Clouds consume 10% of World’s energy [1]. If Clouds were a country based on their energy consumption they would rate as the 5th country, just in between Japan and India. A quote from this report “Data centers are the factories of the 21st century information age”. Waste majority of this energy comes from coal - a non environment friendly and not unlimited source of energy.

1

High Performance Business and Technical Computing applications could be accelerated by FPGAs. These applications represent 20% of the server market. Power saved or speedup achieved with FPGA-based coprocessors versus running software application on servers varies. Many researchers reported that FPGAs consume fraction of the server power. 2-fold achieved speedup from FPGAs is not uncommon.

The following calculation cannot be taken as accurate; rather it is a guidance to give us some understanding of opportunities at hand.

Where:

WorldEnergySaved – a decrease in World Clouds and Data Center energy consumed if FPGA coprocessors are used instead of servers, totalDC – portion of the World energy consumed by Clouds and Data Centers, appOffloaded – percent of applications running in Clouds and Data Centers that are possible to offload, server – Energy consumed by servers. Here 100% is used since we are looking relative to coprocessor energy consumption. coprocessor - energy consumed by FPGA coprocessors as portion of energy consumed by servers. Here 10% is used. This number also includes the fact that coprocessors decrease the processing time.

From the above WorldEnergySaved equation, if FPGA coprocessors are used in Clouds/Data Centers the global world energy consumption would be decreased by 0.27%. Currently FPGAs are not widely used in Clouds. No significant proof of their presence was found. Yet opportunities appear to be huge.

Due to these high energy requirements, Clouds are being built near power plants. In some US states Clouds have a limited supply of energy which they cannot exceed. No effort is spared in reducing power requirements driven by the growth of users taking advantage of the convenience offered by Clouds.

2

One option to reduce power consumed by Cloud servers is to introduce hardware coprocessor blades. These machines would augment the server’s CPUs with tasks where CPUs are not efficient. CPUs are built for flexibility; they can more or less do everything - therefore they lack hardware optimization techniques. Examples where CPUs could be offloaded and accelerated are found in the financial industry. Financial institutions often batch process massive amounts of data where many segments of complex equations could be processed in parallel or “for loops” unwrapped and executed on stream processing elements. These features are not available on CPUs.

FPGA-based coprocessors can perform these calculations in less time, consuming less power. Financial institution gains are twofold: more processing in less time and Cloud costs saving.

1.3 Reconfigurable Coprocessors in Clouds

Reconfigurable coprocessors have been researched and in use for some time. However they are still not widely deployed in Clouds [3]. An often quoted reason for this is lack of flow where SW programmers could take advantage of reconfigurable coprocessors without needing to take extra steps to learn how to use them.

There are two major motivating factors to add reconfigurable coprocessors to Clouds:

1. To do more in less time 2. To save electrical power

Two major enabling technologies are:

1. An efficient, FPGA-based, on-fly reprogrammable and reconfigurable platform enabling accelerated computation 2. A software centric flow where software designers can take advantage of reconfigurable platforms without knowing how they operate

Some of current attempts to address software centric flow include:

3

1. Soft multiprocessor cores residing on FPGAs. An embedded or cross-compiler creates code that runs on soft cores. This approach offers flexibility but not the processing throughput. 2. C-to-HDL tools. Application C code is transformed to Hardware Description Language (HDL). Standard FPGA flow is used to port HDL to FPGA hardware. This approach suffers from overhead time required to synthesize and configure FPGA. 3. Custom designed hardware with an interface to C-code. HDL development is time consuming, often measured in months. 4. Architectures with programmable control path and controllable and/or reconfigurable data path.

These and a few other methodologies are further analysed in Section 2.3.

1.4 Computationally intensive applications in Clouds

CPUs are not the best choice for applications requiring numerous calculations, such as Monte Carlo computations. Most generic server CPUs employ only one double precision floating point unit1. Random number generation is done is software – inherently slow. More complex equations get – CPUs need to do more work sequentially. Even multi-core CPUs cannot execute calculations well in parallel due to a requirement to exchange intermediate results among cores.

The solution to the above is in offloading the host CPU from tasks that require arithmetic processing of massive amount of data by exploiting parallel reconfigurable platforms such as FPGAs. FPGAs can be tailored to best match calculation requirements. They can also employ many arithmetic primitives required for parallel and pipelined processing as well as reconfigurable interconnects required for intermediate data exchange.

Application just being computation intensive may not be enough. They also need to have high computation-to-communication ratio [6]. FPGA coprocessors need processing data

1 Some specialty processors such IBM Power7 is an exception to this rule providing 3 floating point units. 4 downloaded from the host system, time that represents the communication part of the computation-to-communication effectively reducing this ratio.

FPGAs lend themselves well to two main groups of CPU offloading tasks: Cloud maintenance specific and applications running in Clouds.

Cloud maintenance specific:

Wang et al. developed pvFPGA – a paravirtualized solution that directly virtualizes FPGA accelerators on x86 platforms [7]. The goal of pvFPGA is to speedup applications on the Cloud and it is built to work with Xen.

Virtual Machine (VM) migration - this is a process of moving VMs between physical servers. Purpose of the VM migration is Cloud system maintenance, load balance and server consolidation, fault tolerance and more [8].

The VM migration could be used for energy saving reasons. In this case VMs would be migrated to servers so some other servers would be freed up and could be placed into a sleep mode saving the Cloud energy. VM could be migrated to Clouds powered by renewable energy or to Clouds that require less cooling.

Problems with VM migration are: time required (service disruption), data security and energy consumed [9]. If VM are migrated over WAN networks (slower network speed) amount of VM data being migrated directly correlates with energy consumed. FPGA coprocessors could be used to improve all of the above VM migration issues. These coprocessors could run compression [10] and security algorithms [8] yet they consume less energy than servers. FPGAs fit well into streaming data processing. These FPGA features are likely to be used with compression and security algorithms.

Section 1.2 indicates that the following and similar applications take 20% of the Cloud throughput:

 Financial industry based modeling and High Frequency Trading [11][4][12][13][14][15][16][17][18],  Real time process modeling (human physiology or chemical reactions [19]),

5

 Scientific computations ([20][21][22])  CAD (SPICE analog electronic circuit simulation tool, ASIC design and simulation tools [23] [24] [25]) or  Random process driven modeling (Monte Carlo used in financial simulations, SPICE [23][26])

1.5 Software Developers

Financial and scientific institutions have pools of highly skilled software developers specialized in their respective domains. They write proprietary code without the required knowledge or necessary development time allocated to deal with parallel hardware specifics - for that they rely on compilers, operating systems and underlying hardware platforms. Taking advantage of parallel hardware is well outside of their expertise.

There are two main approaches that could bridge software development expertise and reconfigurable hardware:

1. Use FPGAs as a platform to instantiate a number of soft processor cores (some examples are NIOS, Tensilica Xtensa extensible cores) or 2. Use tools that translate C to /VHDL code suitable for FPGAs

Using soft processor cores as building blocks of a multi-core FPGA is currently a preferred solution, though inherently very slow. However, the same problem as with multi-core CPUs still exists: how to make sections of equations execute in parallel, how to exchange intermediate results and how to synchronize individual processors. Xtensa processors offer more flexibility versus traditional FPGA soft processors but they come with royalty fees, expertise in Tensilica TIE language is required, resulting processor need re-synthesis.

Use of C to gate or C to HDL tools (Forte, Catapult, C2S, Synphony, ImpulseC etc) is limited to for-loop algorithms, such as FIR [27] and do not do well with complex calculations. Complex calculations require state machine or light-weight processor support to schedule individual operations. Besides, to be used these tools require significant modifications to the existing application code [28].

6

We see the following two approaches as alternatives to the above:

1. Allow software designers to keep using standard C language where they would use pragma-operators to indicate segments of code that need to be offloaded. Dedicated pre-compiler-like tools would pre-process C code and map sections of it to hardware coprocessors. A requirement from software developers would be to analyze code and look for calculations that appear frequently and take most of the execution time [29]. A similar programming model is provided by OpenMP and by Convey Computer HC-1, an FPGA-based coprocessor. A step forward would be the development of a pre-compiler tool that would benchmark value of offloading parts of a code versus executing the same on the CPU. 2. Use of high level languages (HLL) that are meant to be used for parallel hardware, such as C or OpenCL. C and OpenCL libraries would be augmented with specially designed functions taking advantage of hardware coprocessors. Software developers need to have some awareness of hardware and write C or OpenCL programs accordingly.

Section 2.4.3 provides more information on this HLL to FPGA code methodology.

1.6 Objectives

Use of FPGAs in Clouds is a new research area. FPGAs consume a fraction of the power consumed by CPUs. However power savings alone may not be a sufficiently strong argument for Cloud providers to invest in FPGA-based machines. FPGAs can provide comparable or faster processing times than CPUs for data-parallel applications.

The thesis objectives are:

 Define requirements that would allow use of FPGAs in Clouds  Suggest FPGA design methodologies that could be used in Clouds  Define FPGA hardware architectures: o capable of accelerating applications beyond what is offered by CPUs, o flexible therefore they can run a significant portion of applications running in Clouds

7

o provide low processing jitter in comparison to CPU based servers  Show experimentally that the selected architecture provides application acceleration and lower power consumption versus a CPU  Suggest FPGA level and system level techniques to improve computation throughput

A special focus is given, but not limited to, financial applications.

1.7 Overview of the Contributions

FPGA-based hardware architecture with programmable control path, controllable and optionally partially reconfigurable data path is the main topic of this thesis. This architecture provides required flexibility and run time acceleration for a variety of applications running in Cloud computing environment

The following contributions are made:

 An in-depth analysis of the existing FPGA design methodologies and their applicability to Cloud Computing was conducted.  A set of Cloud coprocessor hardware architecture requirements were established.  A Very Long Control Word (VLCW) processor named Electron is developed based on Cloud requirements. Electron is a low level processor that architecturally extends work done by the research community with new features applicable to Cloud computing: use of low level VLCW control word, it collapses 5-7 RISC processor pipeline stages to only 3 stages and supports multi-core processing. Multiple Electron instantiations form a multi-core structure called a Processing Tile.  A unique use of a sparse output buffered crossbar in the Electron data path. The crossbar provides operand buffering effectively addressing a problem of variable processing elements latencies. The crossbar also allows execution of multiple parallel operations. The crossbar itself, crossbar control signals, registers and Processing Elements attached to it form a dynamically configurable data path – DynaPath.  FPGA resource usage is analysed and techniques for balancing processing throughput and logic resources analysed.

8

 Electron has been implemented on a Xilinx development board. Practical implementation includes a multiprocessor system with 24 Electron processors. The board is connected to the PC using a PCIe bus.  A number of experimental runs were performed comparing FPGA and CPU computation throughput and power consumption. FPGA throughput includes all communication overhead such as PCIe transfers.  A system level throughput analysis revealed the development board limitations and from it a set of processing throughput improvements is defined.

1.8 Summary of Results

In the practical part of this thesis (see Chapter 5) two implementations of Black-Scholes equation are created, a multi-threaded software implementation running on the host CPU and a hardware implementation running on the FPGA development board. The FPGA implementation is based on an architecture developed in this thesis. Results obtained by the two implementations are compared using two metrics:

 Acceleration: execution speed expressed in Million Operations per Second [MOPS] and  Power: absolute power consumed, Processing per Watt and Processing per Watt per Dollar

In this thesis an operation refers to one Black-Scholes equation computation. If some of most obvious ML605 limitations are removed results show that a Virtex 6 based ML605 development board has the throughput slightly larger than a contemporary 4-core Intel i5- 3570 CPU running a multi-threaded software solution (turbo boost mode turned On). At the same time the FPGA offers 9.6 times better Performance per Watt.

1.9 Thesis Outline

Chapter 2 - Background gives a review of industry and academic FPGA design methodologies followed by a list of architecture requirements for Cloud. Based on these requirements an analysis of relevant FPGA architectural components is given. The Black- Scholes equation is presented as it is used in the practical part of this thesis.

9

Chapter 3 – Requirements, Thesis Scope and Contributions provides benefits and disadvantages of a number FPGA design methodologies, coprocessor requirements for Cloud environment, thesis scope and a list of thesis contributions to the current state-of- the-art.

Chapter 4 – FPGA-Based Coprocessor Architecture describes a proposed architecture that we expect to solve most of the issues and obstacles in successful deployment of FPGA- based coprocessors in Clouds. An important section in this chapter is the “Achievable System Throughput”. In it we show how experimental results obtained on a development board presented here can be significantly improved.

Chapter 5 – FPGA Based Coprocessor Implementation presents all implementation details.

Chapter 6 – Experimental, Achievable and Theoretical Performance summarises all obtained results and compares CPU and FPGA-based solutions

Chapter 7 – Conclusions and Future Work gives ideas to improve and bring automation into the proposed flow

10

Chapter 2 - Background

In this section we present information required to understand sections following this one. The following topics are presented:

What is Cloud Computing? - Basic facts about Cloud Computing are provided.

Cloud Acceleration Technologies – CPU/GPU/FPGA technologie facts.

FPGA Design Methodologies - A review of the current FPGA design methodologies, followed by the methodology comparative analysis.

FPGA-Based Coprocessor Elements – This section provides an analysis of components required by the selected coprocessor hardware architecture.

Server Operating System Jitter – Here we describe one of objectives of this project and why it is important for Clouds.

Power Consumption in Integrated Circuits – In this section we provide information required to estimate power consumption increase due to the clock rate increase.

Amdahl's Law – Relevant information for this important guidance is given.

Implemented Application – Here we give all details relevant to application developed in the practical part of the thesis. i5-3570 CPU – This CPU is used for CPU-to-FPGA comparison.

Development Board – Gives an overview of the ML605 development board we use in the practical part of this thesis.

PCIe core and Software Driver – This section describes open core IPs we use in this thesis.

11

2.1 What is Cloud Computing?

Cloud computing providers offer computing hardware and software as a service delivered over a network [30]. The three main services offered by Cloud computing are application (Google Docs, Microsoft Office 360), storage (Dropbox, S3) and connectivity (Webex, Yahoo email) [31]. However there are many other types of services that belong to the three main ones such as Software as a service (SAAS), Storage as a service (STaaS) or Desktop as a service (DssS). Examples of Cloud computing are [32]:

 Email on the go: Microsoft, Yahoo, Google etc.  Data storage: DropBox, Humyo, Microsoft’s SkyDrive, S3 from Amazon etc.  Collaboration tools: Webex, Google Docs, Mikogo, Vyew etc.  Virtual office: Google’s online suite, Ajax13, ThinkFree, Microsoft’s Office 365 etc.  Extra processing power on demand: Amazon’s EC2, Elastichosts, Rackspace Cloud, NASA’s Nebula platform etc.

The most common quoted reason by businesses for using Clouds is to shift from IT capital2 to operational expenses3. 75% of 572 businesses surveyed have adopted Cloud, while 90% expect to do so in the next few years [33].

Naturally, it is of paramount importance to Cloud computing providers to cut into operational expenses such as electricity and computing resources. Both of these can be addressed by using platforms that offer low power alternative or more processing power versus CPUs. The heterogeneous computing platforms allow more applications to be serviced in any given time thereby directly affecting the Clouds’ ability to meet peak demands.

2 Software licenses, servers and networking equipment 3 Corporations pay for what they use 12

2.2 Cloud Acceleration Technologies

Clouds are populated with CPU based servers. In recent years hybrid systems consisting of CPUs and GPUs are gaining popularity. CPUs have a small number of powerful processing cores while GPUs are opposite having large number of smaller cores. CPUs are running serial program sections and GPU are running parallel program sections. These systems have found their use in general purpose scientific and engineering applications. Besides the accelerations they offer, GPUs popularity comes from the tool standardization and software design methodology that are similar to the standard C language development tools and programming model.

FPGAs with their available pipelining, parallelism and memory capacity can compete with GPUs while offering significant power consumption reduction [34]. The big question is how to make FPGAs more user-friendly for non Hardware Description Language (HDL) experts therefore they could use FPGAs without learning computer architectures or HDLs. The preferred way would be to have CPU/FPGA task partitioning and “hardware creation” done within the single programming environment [35] – similar to CPU/GPU task partitioning model.

Hardware coprocessors could be based on Graphical Processors (GPU) or FPGAs. FPGAs are comparable in speed to GPUs but consume less energy. Kestur et al. measured 2.7 to 293 times better energy efficiency on large data sets versus a CUDA GPU based platform [34]. It should be noted that they measured much better speed on CUDA for large data sets. However it appears that this is due to the application, complete data set is first downloaded to the FPGA before it is being processed4 (latency hiding model was not used). Chey et al. analyzed and compared GPU and FPGA-based solutions [11]. This paper turned out to be very influential and inspired JP Morgan to partner with, and later to buy a 20% stake of Maxeler, an HPC (High Performance Computing) company. Maxeler has since developed custom FPGA solutions for JP Morgan [36]. Chen at al. created a real life case when

4 Although this architecture is much easier to implement, it certainly greatly and adversely affects the system performance 13

$330/year could be saved per server if an FPGA coprocessor was used [37]. High Performance Business and Technical Computing (financial modelling, scientific computation and data analytics) applications could be offloaded and accelerated by FPGAs, representing 20% of the server market. Betkaoui et al. compared GPU with Convey Computer HC1, an FPGA-based coprocessor [38]. They concluded that GPU performed better with floating-point tests while HC-1 gave better results for non-sequential memory access tests. HC-1 provided superior performance and energy efficiency for embarrassingly parallel applications such as Monte Carlo Asian option pricing. Skalicky et al. compared CPU, GPU and FPGA performance [39]. In the conclusion it was indicated that all three technologies have distinctive area where they dominate.

Based on the above and other papers it appears that CPU, FPGA and GPUs could coexist [40]. Each technology has preferred application domains where they dominate over other technologies. However FPGA provides much better power savings over GPU and CPU.

Similar comparisons are made between FPGA hardware solutions (fixed function design) and FPGA embedded soft core based design [2]. Results demonstrated that the fixed function design consumed 300% less power than Altera NIOS based solution.

In order to save power in Clouds major server manufacturers plan to use many core chips based on low power Arm cores. Software compilers, application programs and operating systems need updates to take advantage of these chips. Arm based processors may not necessarily bring major speedup. Even if calculations that require accelerations can be divided and mapped to individual cores, there is still a problem of synchronizing cores efficiently [6].

The following are some of the major disadvantages of using FPGAs in Clouds:

1. Time required for downloading raw and uploading processed data from/to the host system. This time is not required if processing is done on the host system. 2. Required hardware expertise to use them. FPGA development expertise is very specific and often not available by the application software development houses.

14

3. Time required by the FPGA design flow methodologies that require synthesis, floor- planning, place and route followed by a device reconfiguration

These disadvantages need to be addressed by a chosen architecture. Section 3.1.2 considers the above disadvantages and proposes mechanisms to avoid them.

2.3 FPGA Design Methodologies

In this section some of the main FPGA design methodologies are presented:

 Custom (an application dedicated) design,  Embedded processing cores,  Design that supports pre-synthesised hardware templates (Virtual Hardware),  High Level Synthesis and  Academic solutions.

All of the documented FPGA design methodologies are analysed and their advantages and disadvantages identified as it pertains to the FPGA usage in Clouds. As a result of this analysis in the Section 3.1.2 we propose a set of features that FPGA architecture should support.

2.3.1 Custom design Custom designs are designs dedicated to a particular application such as a DSP algorithm. Custom designs approach taking full advantage of the main FPGA features s pipelining and parallelism by defining the design’s building blocks for exact purpose, such as optimized multiply and add modules, and often further optimizing logic to minimize resource usage or to support high clock rates.

Stojilovic et al. suggest a domain specific architecture [41]. It consists of a carefully crafted data path with processing elements (computation components) connected in a deep pipeline. Each pipeline stage has a number of processing elements. Selection of the processing elements in each stage is done statically for a given domain of applications in mind. This approach certainly is a step in correct direction since it offers some but not quite sufficient flexibility required for the multitude of applications running on Clouds. This

15 approach could be extended as a generic design with commonly used processing elements strategically selected in each pipeline stage. Due to the variety of applications running on Cloud this architecture would have one or more of the following issues:

 Still not have enough flexibility,  Complex scheduling state machines,  Low compute density due to poorly utilized processing elements.

Custom designs require great level of expertise in computer architectures and HDL programming while the development time is very long [20]. Fact is that custom design lack flexibility we are interested in for the Cloud environment; therefore we deem the custom design not to be applicable by its the definition.

2.3.2 Multiprocessing with embedded cores An obvious way of bridging the gap between high level languages and FPGA is use of soft processor cores. A number of design houses offer soft processor cores and development tools:

 Arm Cortex – this technology is available on Xilinx and Altera FPGAs  Altera - NIOS  LEON – European Space Research and Technology Centre (can be used without licence fees)  OpenSPARC T1 – Sun Microsystem, open-source  Tensilica – offers configurable and extendable cores  Xilinx – MicroBlaze

Complex computing machines developed in [42] and [19] are based on a high number of light processing cores (many-core, 30 to 400 resp.).

Processing cores can be interconnected with a crossbar [42]. Cores have private, local and global memory therefore exchange of data among cores is possible.

Individual processing cores do not need to be efficiently coupled if the device is geared towards parallel and decoupled computations required in modeling applications such as

16 human physiology or chemical reactions [19]. Simple processing cores based on NISC were developed and produced much better results versus MicroBlaze based application [19].

Results available from above and other papers seem to indicate that many-core based FPGA machines offer better power with similar features versus GPU or CPU based machines. However the processor architecture plays the major role. Standard processor cores such as NIOS or MicroBlaze do not fit into this category due to their low speed.

More on disadvantages of RISC processors versus Electron can be found in Section Chapter 3 - Requirements, Thesis Scope and Contributions.

2.3.3 Virtual Hardware The hardware template based design is a process where a pre-defined and pre-synthesised hardware block can be added to a FPGA via the partial device reconfiguration. Templates are stored outside of the FPGA, on an on-board flash device or even the host memory, effectively offering run-time reconfiguration by using Virtual Hardware model [43] [35].

Templates could be:

 Hardware architectures performing a particular algorithmic task, such as speed optimized fused multiply and add operation. There could be a number of these hardware architectures, each having different latency, multi-threading capabilities and are of different size. In this thesis we are considering algorithmic Processing Elements for Virtual Hardware.  Application specific accelerators, such a complex search algorithm. These accelerators are fixed function designs in a form of a pipeline or a state machine.

Groza et al. presented a Hardware Operating System (HOS) methodology [43]. HOS is a run time partial reconfiguration environment capable of managing parallel hardware blocks versus sequentially executing software processes controlled by a CPU and an OS.

FPGA taking advantage of the Virtual Hardware need to support a predefined way of interfacing (signalling) that the templates and FPGA host logic support. This prerequisite is

17 rather possible to meet if templates are arithmetic processing elements connected to a crossbar - the crossbar offers a standard interface.

2.3.4 High Level Synthesis In this section we analyse High Level Synthesis (HLS) and High Level Languages (HLL). They are tightly coupled. HLS tool providers may require a specific coding style or coding constructs – therefore they support specific HLLs.

2.3.4.1 High Level Languages and High Level Synthesis Tools In this section we look into tools that translate C5 language programs into RTL. This is a very dynamic and currently active landscape with a number of manufacturers attempting to design a tool flow providing features and performance comparable to custom HDL design. These C language variants are referred to as High Level Languages (HLL) while the design flow is part of High Level Synthesis (HLS) flow.

HLLs for reconfigurable logic attempt to provide experts from many scientific fields with a familiar development environment and programming language. Within this development environment they could control or create FPGA-based devices using familiar programming syntax and semantics. FPGA features and resources would be accessed by a high level of abstraction programming constructs [44]. If this knowledge bridging is successful:

1. We could eliminate a design step which requires domain field experts to provide their perfected algorithms to FPGA experts for HDL coding (important). 2. Existing domain specific code base could be reused and accelerated on FPGA platforms. It is highly desirable to be able to compile legacy code with a minimum or preferably no change. Many lines of well written code have already been used, tested and proven in practice.

What is expected from HLS tools is to:

5 Beside C language, Fortran tools are available but not considered here 18

 Extract processes from the C source code suitable for independent and potentially parallel hardware threads. These threads could come in forms such as: pipelined data paths or state machines or a combination of the two (important).  If processes can run in parallel, create a number of threads optimizing required speed of execution and FPGA resources. More threads employed, faster program execution (important).  If threads are independent but not parallel (operation dependencies), create a pipelined (sequential) architecture, where independent threads feed each other (important).  In all of the above cases, synchronize thread execution and enable processed data and state (intermediate results) to be interchanged among threads (important).  Output synthesisable HDL (RTL).

In addition to the above major features, a successful HLS flow would:

 Besides being good at creating fixed function hardware (e.g. DSP) it is expected from these tools to: o Create the instruction controlled data paths [45] (important). Currently HLS tools can only create FSMs6 that control DSP functions (referred to as the “data path scheduling” used for latency and throughput control). o Create designs that are comparably area efficient as custom designs [46]. To this point this is not the case.  Impose a minimum set of rules on C code development. Some tools require the use of pragmas, special data types or they support a limited set of C language (important).  It is highly desirable to be able to compile legacy code with a minimum or preferably no change.  Provide specialized and optimized HDL libraries, for example single precision floating point library.

6 Information verified for CatapultC (Mentor/Calypto) and Vivado (Xilinx/AutoESL) 19

 Maximise use of expensive operations via resource sharing among threads (important),  Reduce bit-width to minimum required for a given task

The above is an extensive set of features, not all are currently supported by HLS tools. Section 2.3.4.2 provides criticism related to HLS. Above items marked with (important) are those we in this thesis see as critical. They are added to Section 3.1.2. Most of the marked items are implemented in the practical part of this thesis.

Some of the former and current HLL players are:

 ACL – Altera OpenCL  CARTE by SRC Computers  CATAPULT-C by  Cynthesizer by Forte  DIME-C by Nallatech  HANDEL-C by Celoxica  by Impulse Accelerated Technologies  ACS by Maxeler7  System-C, by Open SystemC Initiative 

2.3.4.1.1 ACL - Altera OpenCL ACL creates hardware from OpenCL kernels. It also creates the memory sub-system and provides interconnect infrastructure that connects the memory subsystem and PCIe to the custom kernel logic. ACL is a system generating tool [47]. It generates the host to FPGA communication system, internal/external memory communication as well as the application itself. Gate synthesis, Place & Route and Timing Analysis are all part of the same flow.

7 Oscar Mencer, Maxeler CEO, is author of [48] and [4]. There is no direct evidence Maxeler is using ACS 20

CPU programs are sequential. FPGAs are massively parallel. What ACL offers is a programming model to represent CPU and FPGA (heterogeneous system). Specific to ACL, a complete application code rewrite is required but coding does not need to be cycle accurate, therefore making ACL much easier to use than HDL.

2.3.4.1.2 Carte Carte is a tool set developed by SRC Computers [48]. Carte takes C or Fortran code, translates and maps it to Implicit or Explicit execution units. Implicit computing refers to code executed on a processor while Explicit refers to sections of HLL code that is hardware accelerated on reconfigurable logic devices such as FPGA or CPLD.

Code that is mapped to reconfigurable logic is run on logic specifically created for an application during compile time. This logic needs to be synthesised to a bitmap and this bitmap need to be downloaded to the reconfigurable device.

Carte tool set needs to be used with SRC Computers designed hardware.

2.3.4.1.3 Catapult-C Catapult-C was developed by Mentor Graphics. In August 2011 Mentor Graphics sold the tool to Calypto Design Systems.

Catapult-C provides HLL/HLS environment the most frequently used in the industry. It is used for algorithm and control logic8 development, produces RTL targeted for ASIC or FPGA platforms. It can synthesise ANSI C/C++ without code extensions9. Pointer, classes and operator overloading is supported.

Its GUI allows analysis of the hardware circuit being created, and supports cycle based, RTL-based and gate-level based simulations. In addition Catapult-C can be integrated with SystemC allowing formal verification between generated RTL and C++ programs. This formal verification flow is of high importance for ASIC development.

8 Such as Finite State Machines 9 Meaning legacy code could be synthesized but potentially not fully optimized 21

2.3.4.1.4 Cynthesizer Cynthesizer is a HLS tool developed by [49]. It takes SystemC as input. Cynthesizer can span the complete design flow with one source code (ESL to RTL-to GDSII) therefore providing confidence in design early on.

Since Cynthesizer produces RTL it does not depend on any platform and can be used in ASIC or FPGA development.

2.3.4.1.5 DIME-C DIME-C is an HDL tool developed and distributed by Nallatech. It can be used as a part of DIMETalk design flow or as a standalone C to VHDL compiler for Xilinx Virtex 4 and 5 FPGAs. If used as a part of DIMETalk libraries are provided for accessing Intel Front Side Bus.

Nallatech indicates that DIME-C should be used to design and optimize algorithms10. In addition the tool supports: subset of ANSI-C, pipelined and parallelised designs, algorithmic libraries as well as single and double precision floating point numbers. Pointer support is not provided. Legacy code may need to be modified before using DIME-C.

2.3.4.1.6 Handel-C Handel-C is owned and sold by Mentor Graphics [50]. Originally it was created in 2008 by Celoxica. Celoxica was acquired by Agility who ceased to operate in 2009. In 2009 Mentor Graphics purchased Agility’s C synthesis technology.

This tool flow is used to accelerate development, optimization and acceleration of DSP algorithms. The main tool feature is that the same HLL code can be compiled to a number of design languages (such as Verilog), synthesised and executed on a CPU or on embedded

10 Algorithms such as DSP. This statement can be applied to almost all HLL tools. State machine support or arbitration appears to be better if done in HDL. 22 systems. Mentor Graphics provides embedded systems and libraries11 optimized for Handle-C however the tool itself can be used for any FPGA platform.

2.3.4.1.7 Impulse C (Streams-C) Impulse C is owned and sold by Impulse Accelerated Technologies [51]. Streams-C HLL tool developed at Los Alamos National Laboratory [52][53] was licensed to Impulse Accelerated Technologies under the name Impulse C.

Impulse C is a C-to-Gates tool that allows an HLL application to be accelerated by compiling it into a number of parallel processes that can run on an FPGA, multiple FPGAs, or a combination of FPGAs, embedded microprocessors and DSP processors [54]. In order for an application to run on this mix of FPGA/processor systems, Impulse C provides features such as communication and process synchronization via shared memories, signals, message passing or streams. This model is referred to as dataflow or stream processing, and it takes advantage of FIFOs to accomplish inter-process communication and process synchronization.

Impulse C comes with math and image processing libraries. The tool is not dedicated to any particular ASIC/FPGA-based products.

2.3.4.1.8 Maxeler Maxeler is a commercial enterprise offering specialised HPC platforms [4][55][56]. These high end computers are built from FPGAs. The FPGAs contain hardware processing elements connected in parallel and in a pipelined fashion; these processing elements allow for streaming data processing. Maxeler uses a Ford assembly plant analogy to describe how dataflow processing takes place: each of the assembly plant stations provides specific processing capabilities while data flows through the plant.

11 IO libraries for example are heavily dependent on FPGA platform. MG provides them for their RC Series platforms.

The Maxeler toolset takes Java application code as input; the Java code is compiled with Maxeler's proprietary MaxCompiler. The compiler outputs a configuration file that describes the processing elements and how they are interconnected. Intermediate results obtained by individual processing elements are forwarded to a downstream processing element rather than being written to an off-chip memory.

Maxeler products are used in applications requiring computation acceleration, such as the Oil & Gas industry, financial analytics, low-latency securities trading and scientific computations.

J.P. Morgan, a financial industry giant, is a significant Maxeler shareholder. This is significant proof of the value that HPC FPGA-based acceleration and an HLL design flow have to offer. However, the fact that the Maxeler toolset works only with Maxeler hardware cannot be neglected.

2.3.4.1.9 System-C SystemC is an open standard language used, among other applications, for HLS. It has similar semantics to VHDL/Verilog12. Its main features are support for concurrent processes, C++ and user defined data types, clock cycle accuracy and 4-level logic (1, 0, X and Z). A number of silicon tool vendors (Cynthesizer, for example) support SystemC as an HLL input for synthesis or for verification purposes. However, it appears that its main use is in ESL due to its cycle and transaction level accuracy.

2.3.4.1.10 Vivado Vivado is a design suite offered by Xilinx. It includes Vivado HLS (re-branded AutoESL), an HLS tool that translates a high level language (C or C++) to RTL. Berkeley Design Technology, Navarro et al. and Hiraiwa et al. use Vivado HLS exclusively in DSP applications [57][58][59]. The design goal is to replace a microprocessor and DSP processor with an FPGA while improving performance and lowering consumed energy. Vivado HLS produces a significant improvement in processing speed. Berkeley Design Technology reported a forty-fold performance increase in a video DSP application versus a DSP processor. A hand coded HDL implementation will produce similar throughput while requiring 3 to 4 times fewer FPGA resources. However, the HDL implementation will take 3 to 4 times longer to develop. When an update (design exploration or a new standard) of a DSP algorithm is required, the combination of Vivado HLS and FPGA will offer the required flexibility.

12 This apparently is one of the reasons why SystemC never made it to the HDL user community. VHDL/Verilog developers did not want to switch from HDL to SystemC since SystemC operated on a similar level of abstraction.

Reusing code developed for the CPU will not lead to a practical HLS design. Therefore manual, iterative restructuring of the CPU code is a mandatory step [57].
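To make the restructuring step concrete, the following is a minimal, hypothetical sketch of the style of code Vivado HLS works well with: a fixed trip count loop with explicit pipelining and interface pragmas rather than pointer-chasing CPU code. The function name, the array size and the pragma settings are illustrative assumptions, not an example taken from [57].

```cpp
#define N 1024

// Hypothetical kernel restructured for HLS: fixed trip count, simple array
// indexing and an explicit request to pipeline the loop. Names, the size N
// and the pragma choices are assumptions for illustration only.
void scale_offset(const float in[N], float out[N], float gain, float offset) {
#pragma HLS INTERFACE ap_memory port=in
#pragma HLS INTERFACE ap_memory port=out
    for (int i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1
        // No loop-carried dependency, so the tool can start one new
        // iteration per clock once the pipeline is full.
        out[i] = gain * in[i] + offset;
    }
}
```

Equivalent CPU code written with pointer arithmetic and data-dependent loop bounds would typically still synthesise, but without the throughput that the restructuring and pragmas unlock.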

2.3.4.1.11 Academic HLS Tools In this section a few academic solutions are presented. It is worth noting that a number of the commercial solutions described in Section 2.3.4.1 were launched in research communities (Maxeler and Impulse C, for example).

2.3.4.1.11.1 Open Reconfigurable Computer Language (OpenRCL) OpenRCL developers claim that the OpenRCL compiler does not require modification to the original C code [42]. The compiler requires neither the use of pragmas in the application C program nor the use of hardware-like semantics that other academic/commercial tools require. However, the toolset is geared towards specific hardware consisting of parallel Processing Elements that are based on a standard ISA processor with reconfigurable datapath width and an extensible ISA. Individual processing elements exchange data via a crossbar or globally accessible memory hierarchies.

The design flow includes:

 Main C/C++/Fortran GCC front end, LLVM and code generator and linker optimization of the main control code,
 OpenCL Kernel code GCC front end, Kernel memory access optimization, Kernel scheduling and code generator and linker

2.3.4.1.11.2 SA-C Single Assignment C (SA-C) is an HLL compiler initiative by Colorado University [60]. SA-C is a compiler for a variant of the C language that maps HLL programs onto FPGAs. SA-C compiles C programs to an FPGA configuration file, generates the host code to download the configuration to the FPGA, downloads the data and run-time parameters, starts the coprocessor and uploads the results.

In addition, for more sophisticated FPGA users, SA-C can generate intermediate data flow graphs that can be viewed with tools provided.

2.3.4.1.11.3 Silicon to OpenCL (SOpenCL) Falcao et al. [23] and Owaida et al. [24] describe the SOpenCL (Silicon to OpenCL) toolset. The tool has two main processing steps. The first step converts OpenCL kernels into a manageable number of kernels so they can be mapped to an FPGA. This step is a source-to-source code translation.

The second step maps the code from step one to an FPGA architecture template. A template is a top-level HDL instantiation and contains interconnect structures such as multiplexers and buffers. The exact template selected is based on the C code program requirements. The template is instantiated on an FPGA. Next, SOpenCL translates the C code to synthesizable HDL (a high level synthesis flow). What follows is a partial FPGA reconfiguration.

2.3.4.1.11.4 C-to-Verilog (C2Verilog) C-to-Verilog is a free, open source, online C compiler [61]. C code is copied and passed to the C2Verilog web site, which results in the creation of optimized and synthesisable Verilog code.

2.3.4.1.11.5 FPGA CUDA (FCUDA) In this design approach a CUDA application is taken through an FCUDA flow producing synthesisable RTL code [62]. FCUDA takes advantage of the fine and coarse grain parallelism available from the CUDA programming model. The flow has 4 main steps:

 CUDA code is source-to-source translated to FCUDA annotated code.
 FCUDA annotated code is compiled by the FCUDA compiler, preserving coarse-grained parallelism (thread-block level).
 The AutoPilot C code synthesis tool is used to create RTL. AutoPilot extracts fine grain parallelism and ports it into the created RTL.
 An FPGA-based RTL-to-gate synthesis tool produces the final implementation.

2.3.4.2 High Level Synthesis Criticism Curreri et al. and Sanchez-Roman et al. reported their experience with Impulse C [63][64][65]. Sanchez-Roman et al. started with legacy C code developed for a 2D Navier-Stokes solver used in Computational Fluid Dynamics (CFD). The FPGA platform is the XtremeData XD2000i In-Socket Accelerator. This accelerator uses the CPU Front Side Bus, reducing the host CPU to FPGA communication overhead. They achieved a 22-times speedup versus the software implementation and an order of magnitude energy savings13.

They concluded that HLS does reduce development time versus HDLs (by a factor of 34 times) but that manual C code optimizations are a prerequisite. C code profiling was used to find bottlenecks and the sections of C code to implement in the FPGA. This step was followed by changing the C code:

 Data storage had to be manually changed for parallel hardware processing,
 Manual transformation of equations to extract common factors,
 Removing as many division operations as possible. For example, if a common denominator was used, its inverse was calculated once only and a multiplication operation was used (see the sketch below).
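The third change is easy to illustrate. The sketch below is our own illustration of the idea (not the authors' code): a shared denominator is inverted once and the divisions become multiplications, trading expensive hardware dividers for cheap multipliers.

```cpp
struct Terms { float a, b, c; };

// Before: three divisions per call - in hardware, three divider instances
// or a shared divider that becomes a throughput bottleneck.
static inline Terms normalize_div(Terms t, float denom) {
    return { t.a / denom, t.b / denom, t.c / denom };
}

// After: the inverse of the common denominator is calculated once only and
// reused, so only one divider (or reciprocal unit) is needed.
static inline Terms normalize_mul(Terms t, float denom) {
    const float inv = 1.0f / denom;
    return { t.a * inv, t.b * inv, t.c * inv };
}
```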

Although it is understandable that the authors had to make manual changes related to data storage, it is surprising that Impulse C could not perform the equation and division related transformations.

Another reported Impulse C issue is that the tool would aggressively insert pipelines to achieve one result per clock even though the memory sub-system could not deliver data at the same rate (Section 4.8). At the same time, the selection of mathematical primitives was not adequate; for example, multiplying or dividing by two was implemented by instantiating a regular multiplier or divider.

13 Provided speedup numbers do not include Host to FPGA communication time. This paper is not about CPU acceleration.

Lin et al. claim that HLS tools (Catapult-C, C2Verilog, Impulse C, etc.) still require significant hardware expertise, expertise that is much different from standard C code development [42]. In [19] it was reported that the same application designed with an industry HLS tool generated substantially more logic than a custom design.

2.3.5 Academic Solutions Next we document a number of academic solutions and ideas that could contribute to successful FPGA deployment in the Cloud.

2.3.5.1 No Instruction Set Computer (NISC) If the NISC [66] processor used an instruction set, it would belong to the ASIP architecture family. NISC has a custom data path designed to meet application requirements, just as ASIPs do, however NISC does not have an instruction set. On the other hand, the NISC data path has a number of processing elements, versus a standard RISC architecture which has only one ALU.

Figure 1 shows NISC architecture [66]:

Figure 1 - NISC architecture [66]

NISC has separate control and data paths. The control path is constructed from CMEM (Control Memory), PC (Program Counter) and AG (Address Generator). The data path is constructed from RF (Register File), PE (Processing Elements such as ALU, Multipliers, Dividers and Comparators), Mem (local Memory) and hardwired interconnects.

In a traditional ISA processor it is the compiler's responsibility [67] to maximize the processor's data path utilization by performing code optimizations, such as register renaming or scheduling long latency instructions. In the NISC architecture, a designer is given an opportunity to customize the data path while the NISC compiler generates and schedules clock accurate instructions.

In order to understand NISC instructions a standard ISA instruction format is shown in Figure 2. Consider the following example:

Rd(r24w) = Rs(r0w) + Rt(r8w)

Where: Rd – the destination register, in this case the register whose byte address is 24 and whose width is one word (4 bytes starting from 24 and ending at 27); Rs (made from registers 0, 1, 2 and 3) and Rt (registers 8, 9, 10 and 11) are the two source registers.

This instruction would be represented in the processor’s instruction memory as:

Figure 2 - ISA Instruction Format (four 8-bit fields: Opcode, Rs, Rt, Rd), where the Opcode (operational code) is an addition operation.

In the above case an ISA processor's decoder would decode the instruction's Opcode, create control signals and output them to the processor's execution pipe.

In the NISC architecture the above instruction would be represented exactly as an ISA decoder stage would output it: as a set of control signals [68], demonstrating that the compiler directly controls the data path components.

These control signals are connected to a customized processor data path. Therefore the instruction format is unique for a given data path, allowing room for clock speed and throughput optimizations while providing quick-turnaround experimentation with different data paths.

Figure 3 - NISC instruction word example [68]

From the above figure:

 MuxN_sel – a 2-bit control word configuring multiplexor N
 RN_load – a 1-bit control signal used to latch an input to register N.
 RF_r/wN – a 3-bit control word that controls Register File port N. Note the Register File is a 4-port memory.
 const0 – a 16-bit constant (a sketch of such a control word follows)
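As an illustration only, the control word of Figure 3 can be pictured as a packed record of such fields. The field widths follow the list above, but the field order, the number of multiplexers and registers, and the packing are assumptions, not the published NISC format.

```cpp
#include <cstdint>

// Hypothetical NISC-style control word built from the field widths listed
// above (2-bit mux selects, 1-bit register loads, 3-bit Register File port
// controls, one 16-bit constant). Field counts and order are assumptions.
struct NiscControlWord {
    uint32_t mux0_sel : 2;   // configures multiplexor 0
    uint32_t mux1_sel : 2;   // configures multiplexor 1
    uint32_t r0_load  : 1;   // latches an input into register 0
    uint32_t r1_load  : 1;   // latches an input into register 1
    uint32_t rf_rw0   : 3;   // controls Register File port 0 (4-port memory)
    uint32_t rf_rw1   : 3;   // controls Register File port 1
    uint32_t const0   : 16;  // 16-bit constant driven onto the data path
};

// The compiler emits one such word per clock; there is no decode stage
// between the control memory and the data path wires.
```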

The NISC compiler reads the application C code and creates a data flow graph (DFG) from it. DFG operations are scheduled with clock cycle accuracy while DFG operators are bound to data path components. NISC application examples are provided in [69] [70] [71] [72] [73] [67]. The NISC processor's data path is optimized for a given application and benchmarked against RISC processors. Clock count, power and energy improvements are documented.

Papers [68] [74] [75] document NISC design flow, NISC toolset, automated data path trimming and other optimisation techniques.

It should be noted that NISC does not employ any type of handshaking between processing elements in order to signal if data is available for processing. NISC architecture relies on control and data path latencies. These numbers have to be constant and known a priori. If latencies change, control logic and control code need to be recompiled.

2.3.5.2 FlexCore The FlexCore architecture is very similar to NISC [76]. It is based on a full size interconnect that connects all functional units with each other or with themselves. Functional units (Register Array, Load/Store unit, ALU) are exposed to a controller. This controller runs code that controls all functional units and the data flow among them.

The FlexCore authors reported a 40% speedup over a RISC GPP (General Purpose Processor) for a number of embedded application domain benchmarks. They also note that their instruction memory is not very efficient in terms of size. The reason for this is that FlexCore uses a VLIW concept called Native-ISA (N-ISA); an N-ISA instruction is an already decoded instruction controlling all functional units.

2.3.5.3 Variable Processing Latencies This section reviews suggested solutions to the problem of variable processing latencies. Variable processing latencies are a scheduling problem where computations completed in different parallel branches need to be used as operands and therefore have to be clock aligned.

Hong et al. suggest an architecture where Processing Elements (PEs) are interconnected by buffers [77]. The buffers provide processing latency synchronization between data producers and consumers. The complete system is controlled by a global counter based controller.

Hong et al. point to differing latencies among PEs (there are different numbers of pipeline stages or the PEs use different clock frequencies) [78]. There are two approaches to synchronizing PEs: centralized controllers (long wiring issue; a small change in one PE creates churn all over the design) or distributed controllers14. If the DFG path allows freedom in re-arranging operations, path delays are equalized, minimizing buffer size requirements. If PE latency allows, one could run PEs at a lower clock rate since this could allow for architecture choices that consume less power. Buffer controllers need to schedule buffer writes and reads to minimize system latency, to prevent overflow (by activating reads) and to prevent underflow (by activating writes).

Hong et al. also proposed that PEs could be processors or fixed function logic [79]. System level synchronization among PEs is achieved with buffers. Fixed function PE updates require FPGA reconfiguration. The reconfiguration can be static or dynamic. A static configuration is loaded during system initialization. Dynamic reconfiguration is defined as global or partial; partial reconfiguration is done with distributed memories. The hardware needs to be efficiently partitioned to allow for meaningful partial reconfiguration.

14 In the practical part of this thesis a combination of the two is used

Hong et al. [80] suggest a design approach to address issues with current DSP processors. DSP processors are highly pipelined and support some level of concurrency, but they are still not suitable for algorithms with a different structure such as particle filtering: computationally complex algorithms where integration is performed using Monte Carlo techniques. The goal is to design programmable particle filtering logic15 that works for more than one particle filtering algorithm while outperforming implementations of these filters on DSP processors. In the suggested design approach PEs are interconnected with distributed buffers, controllers and multiplexers. Individual PEs can be switched in and out depending on the particle filtering algorithm.

Hong et al. [81] extracted processing functionality from an application data flow graph and mapped it to a number of processing blocks. Processing blocks are implemented as soft processor cores or fixed functions. The main challenge presented by processor cores is that, due to different code paths, core processing time is not known a priori. Therefore a mechanism for exchanging processed data between processing cores and fixed function logic is required. Implementing buffers and buffer synchronization methods is examined as a solution to this problem. Buffers are controlled by local (distributed) controllers while the complete system is synchronized by a centralised controller. Processor cores interface with the global controller. The global controller schedules the local buffer controllers. The main purpose of this project was processing block scheduling and synchronization, not reconfiguration; soft cores should offer most of this functionality.

15 Domain processor

2.3.5.4 Other relevant projects In [82], FPGA logic is synthesised from an application data flow graph. Individual Data Flow Graph (DFG) nodes are designed as processing elements of particular complexities and processing latencies. Processing elements are defined as one of: combinational logic, single-cycle sequential logic or multi-cycle sequential logic. Combinational logic and single-cycle sequential logic have execution times of zero or one clock (respectively), while different multi-cycle sequential logic elements have fixed execution times of different values. In order to synchronize the execution of individual processing elements, centralized or distributed controllers are used. Controllers are implemented as counters with predefined terminal count values. In order to create a pipeline, registers are added to the outputs of processing elements with shorter latency than the relevant processing elements in the same DFG node. Here it should be noted that this design approach uses fixed data paths and controllers and therefore lacks configurability and programmability.

In [41], a data path is generated for multiple applications, then merged and optimized. Data path synthesis is done based on application DFGs. If a new application needs new data path elements, they will be added. This suggests that these new applications would need to be from the same application domain (DSP filters, for example). The addition of new PEs is done via FPGA reconfiguration memories (bit map update). A lattice-like interconnect is recommended since it has a good chance of successfully accommodating new PEs.

Requirements for the successful implementation of a frequently used DSP reduction operation over large data sets are given by Lin et al. [21]. The considered reduction operation is made from a number of simpler and separate operations. In order to speed up the processing time it is required to implement the operations concurrently. These concurrent operations may have different processing latencies, thereby requiring synchronization. Concurrent operations are synchronised using buffers. This paper does not cover FPGA reconfiguration and programmability, however it provides a very interesting analysis of the DSP reduction operation and of buffer based synchronization of concurrent operations.


2.4 FPGA-Based Coprocessor Elements

In this section we briefly describe the main elements of the coprocessor architecture developed in this thesis.

2.4.1 Interconnect Architectures In this section we look into two major interconnect architectures: a crossbar and a 2-D mesh [83].

There are a number of crossbar switch architectures [84]. Most of them belong to one, or a combination, of the following: Input Queued (IQ), Output Queued (OQ), Combined Input and Output Queued (CIOQ) or buffered crossbars. Furthermore, message passing from source to destination ports is controlled by an algorithm. Algorithms are executed by distributed hardware modules, often located in the source and destination ports. These separate hardware modules communicate with each other via request/grant protocols. Algorithm examples are Input-Output Round Robin (RR-RR), Longest Queue First (LQF) and Oldest Cell First (OCF) [85].

One of the challenges crossbar designers are faced with is maximizing the number of input to output connections. Crossbar scheduling is done in steps. In the first step all input port schedulers select one of the output ports they have requests for. Similarly, the output schedulers select one of the inputs from the first step and acknowledge their selection back to the input schedulers. In the final step the input schedulers select one of the ports acknowledged in step two. Chances are realistic that several output schedulers will select the same input; therefore the crossbar throughput for non-uniform traffic can never be 100%. Making each output scheduler aware of the other output schedulers would theoretically help, but is hard to accomplish in practice. iSLIP is a well-known scheduling algorithm used in industry.
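For illustration, a single request/grant/accept iteration in the spirit of iSLIP can be sketched as below. This is a simplified model written for this review only (the full iSLIP pointer-update rules and multiple iterations per cell time are omitted); the port count N, the data structures and the pointer policy are assumptions.

```cpp
#include <array>
#include <bitset>
#include <cstddef>
#include <optional>

// Simplified request/grant/accept crossbar scheduling iteration.
// request[in][out] == true means input 'in' has data queued for output 'out'.
constexpr std::size_t N = 4;

struct CrossbarScheduler {
    std::array<std::bitset<N>, N> request{};   // per-input request vectors
    std::array<std::size_t, N> grantPtr{};     // per-output round-robin pointer
    std::array<std::size_t, N> acceptPtr{};    // per-input round-robin pointer

    // One scheduling iteration; returns match[in] = matched output, if any.
    std::array<std::optional<std::size_t>, N> iterate() {
        // Grant phase: every output grants one requesting input.
        std::array<std::optional<std::size_t>, N> grantedInput{};
        for (std::size_t out = 0; out < N; ++out)
            for (std::size_t k = 0; k < N; ++k) {
                std::size_t in = (grantPtr[out] + k) % N;
                if (request[in][out]) { grantedInput[out] = in; break; }
            }
        // Accept phase: every input accepts at most one granting output,
        // which is why several grants to the same input cannot all be used.
        std::array<std::optional<std::size_t>, N> match{};
        for (std::size_t in = 0; in < N; ++in)
            for (std::size_t k = 0; k < N; ++k) {
                std::size_t out = (acceptPtr[in] + k) % N;
                if (grantedInput[out] && *grantedInput[out] == in) {
                    match[in] = out;
                    acceptPtr[in] = (out + 1) % N;   // simplified pointer update
                    grantPtr[out] = (in + 1) % N;
                    break;
                }
            }
        return match;
    }
};
```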

It should be noted that most SoC messages sent through interconnects (Networks-on-Chip or NoCs) – requests, responses or packets – require correct ordering at the destinations. NoCs, most of the time, cannot guarantee message ordering from different sources to the same destination. To do so they would become very inefficient due to frequent Head of Line (HOL) blocking.

Message ordering at the destinations is established by the destinations themselves. These destinations are often complex coprocessors with a large amount of memory available for reordering.

In this thesis the message ordering is done by a centralized interconnect controller. Having a centralized controller impacts the selection of crossbar architecture. We prefer a centrally controlled crossbar over distributed algorithmic control due to the resulting processing scheduling simplifications.

The following sections present a few crossbar architectures going from simplest to more complex.

2.4.1.1 Crossbar

2.4.1.1.1 Basic Crossbar The basic crossbar simply connects every source (S) to all destinations (D). As soon as Source 0 (S0) is done processing, it forwards results to the relevant destination (one of destinations D0 to Dm).

Figure 4 – Basic Crossbar


The main disadvantage of this crossbar is that any destination may flow control the whole system. This head of line (HOL) blocking case occurs if, for example, D0 asserts flow control to S0 while S0 has data destined for D0 at its head; the destination Dm is then blocked as well.

2.4.1.1.2 Crossbar with Virtual Output Queuing An N x M crossbar is a structure interconnecting N inputs to M outputs. Source nodes support Virtual Output Queues (VOQs): a source can have messages for more than one of the M outputs, and there is one VOQ per destination. VOQs prevent the HOL blocking condition.

Figure 5 - VOQs (dmk = data for Dm, part k)

From Figure 5, the Source Processing Element S0 provides interleaved back-to-back data words to any of the destinations D0 to Dm. If any of the destinations periodically flow controls S0, it only flow controls its own VOQ – other destinations are unaffected.
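A minimal software model of this behaviour, written only to illustrate the queuing discipline (the names, M and the simple service order are assumptions, not the thesis RTL), looks as follows:

```cpp
#include <array>
#include <cstdint>
#include <deque>

// Virtual Output Queuing at one source port: one FIFO per destination, so a
// destination that asserts flow control only stalls its own queue.
constexpr std::size_t M = 4;
using Word = std::uint64_t;

struct SourcePort {
    std::array<std::deque<Word>, M> voq;   // one virtual output queue per destination

    void enqueue(std::size_t dest, Word w) { voq[dest].push_back(w); }

    // Serve the first destination that is not flow-controlled and has data;
    // a blocked destination cannot block traffic headed elsewhere.
    bool serve(const std::array<bool, M>& flowControlled, std::size_t& dest, Word& w) {
        for (std::size_t d = 0; d < M; ++d) {
            if (!flowControlled[d] && !voq[d].empty()) {
                dest = d;
                w = voq[d].front();
                voq[d].pop_front();
                return true;
            }
        }
        return false;   // everything ready to send is currently flow controlled
    }
};
```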

Source processing elements could process a number of operands and send complete single-cycle or partial multiple-cycle messages (results) to the required destinations connected to the crossbar output ports. The multiple-cycle messages for a destination could be interleaved with messages to different destinations. Assume the source supports multi-threading: source ports do not need to wait until a complete result is evaluated, head-of-line blocking other processing activities.


This source multi-threading sounds promising but comes with the following problems:

 Operation dependencies are impossible to preserve. From the above figure, D0 and Dm may complete their processing at arbitrary moments.
 The crossbar latency due to the multi-step scheduling is neither predictable nor consistent. As an example, consider that D0 and Dm provide results to another processing element Dy. Scheduling Dy by a centralized controller is not trivial.

The gross effect of the above two issues is that the VOQ processing time and latency may actually be worse than those of a more plausible solution using a simpler crossbar.

2.4.1.1.3 Non-Buffered Crossbar (Processing Element Crossbar) In this section we present a crossbar architecture suitable for centralized control by the Electron, with Processing Elements as sources and destinations. This crossbar does not use VOQs, for the reasons described above. Instead, each destination can capture one word per source. Therefore, if a destination needs two operands, it can capture a low latency operand and wait until the second operand becomes available. This approach offers simple scheduling, good performance and good resource utilization.

The following figure shows a 4x4 crossbar [83]. Outputs from P0 to P3 (source ports) can be connected back to inputs P0 to P3 (destination ports). As an example, consider the case where sources P0 to P3 output data for P3; P3 can then select operands from any input – including its own output. Note that the cross-points, shown as circles, are not physical entities (routers, for example); they just represent busses connecting source and destination ports.


Figure 6 - Crossbar

In order to reduce the number of crossbar wires, a sparse crossbar may be an acceptable solution (Figure 7). In this crossbar, outputs of processing elements cannot be connected back to themselves through the crossbar; rather, a local connection takes care of looping back a PE output to its input. This greatly simplifies the crossbar: the 4x4 sparse crossbar has 12 rather than 16 cross points compared with the non-sparse crossbar.


Figure 7 – Sparse Crossbar

Due to the operation dependencies the scheduling needs to be done from one centralized place.

These two crossbars could work with multi-threaded PEs but the PE multi-threading is inefficient.

2.4.1.1.4 Buffered Crossbar Next we look into high performance buffered crossbars [86][87][88][89]. The buffered crossbar has FIFOs in all crossbar cross-points. Buffered crossbar scheduling is done in two independent steps, making the port assignment scheduling much simpler than in the crossbar with VOQs. In the first step each source port selects a non-full cross-point FIFO and writes to it. In the second step each destination port selects a non-empty cross-point FIFO and reads from it. Therefore all input and all output schedulers work independently and in parallel with each other. All schedulers are Round Robin based. Figure 8 shows a full and Figure 9 a sparse buffered crossbar. Note that the resource saving offered by the sparse buffered crossbar could be significant: each removed cross-point connection is a FIFO.


Figure 8 - Buffered Crossbar

Figure 9 – Sparse Buffered Crossbar

The buffered crossbar would work well with multi-threaded PEs. However input and output schedulers can no longer be Round Robin. Due to the operation dependencies the scheduling would need to be done from one centralized place.


2.4.1.1.5 Crossbar Summary In the previous text non-buffered and buffered crossbars are introduced, including their sparse crossbar versions. The sparse crossbar optionally could use local loopback within PEs.

Which crossbar type to select depends on the PE latencies. If all Processing Elements have the same or similar latencies, the non-buffered crossbar will perform well, providing similar performance to the buffered one – but at lower implementation cost.

A destination Processing Element may require two inputs to perform a calculation task (an addition, multiplication). If two source Processing Elements have different latencies at the destination Processing Element, the buffered crossbar may improve the system throughput. Low latency source Processing Elements could perform a number of calculations and push results to cross point FIFOs belonging to a number of destination PEs. In the non-buffered crossbar, the low latency source PE would need to hold the current output so it could be latency matched with a high latency source Processing Element at the destination PE.

In the practical part of this thesis we opted for an output buffered sparse crossbar implementation due to its scheduling simplicity and low logic resource requirements. The throughput was not compromised. The long latency PEs, if operation dependencies allow it, are scheduled first. In our crossbar implementation all PEs have a one-word-deep FIFO per input port; a low latency input can be captured and held until the second operand is available (see the sketch below).
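The per-input capture can be pictured with the following sketch. It is an illustration of the capture-and-hold idea only; in the actual design the Electron controller knows all latencies and schedules the capture and consume cycles explicitly rather than testing readiness at run time, and the operation shown is a stand-in for the PE's real function.

```cpp
#include <cstdint>
#include <optional>

// Two-operand PE behind the sparse crossbar: one captured word per input
// port, so an early (low latency) operand is held until the late one arrives.
using Word = std::uint64_t;

struct TwoOperandPE {
    std::optional<Word> opA;   // one-word capture slot for input port A
    std::optional<Word> opB;   // one-word capture slot for input port B

    void captureA(Word w) { if (!opA) opA = w; }   // hold until consumed
    void captureB(Word w) { if (!opB) opB = w; }

    // Produce a result only when both operands are present, freeing the slots.
    std::optional<Word> tryFire() {
        if (opA && opB) {
            Word result = *opA + *opB;   // stand-in for the PE's real operation
            opA.reset();
            opB.reset();
            return result;
        }
        return std::nullopt;
    }
};
```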

2.4.1.2 2D-mesh Crossbar based interconnects may present routing issues for a larger number of input and output nodes. An alternative comes in the form of 2D mesh structures [83][90][91][92]. A 2D mesh offers more pipelining and therefore easier backend timing closure.

A typical 2D 4x4 mesh NoC (Network on Chip) is shown in Figure 10. The 16 NoC processing elements (shown as square blocks) are interconnected with routers. Each router (shown in the circle on the right side), except the edge routers, has 4 input and 4 output ports, allowing any router output to be connected to any input. NoC processing elements can be connected to any other NoC element via a number of paths, each path having different delays.

Figure 10 - 2D-mesh

Figure 11 depicts a variety of connections available through the NoC. Operands OP10 and OP11 are brought to the processing element M2. Operands OP00 and OP01 are routed to M1 in a similar way. The M1 output is routed to A1. The outputs from A1 and M2 are connected to A2.

Figure 11 - 2D-mesh, Information Flow


Figure 12 depicts a 2D torus mesh. In this NoC architecture speed of boundary connections is improved by providing extra sets of wires.

Figure 12 - Torus 2D-mesh

2.4.1.3 Crossbar versus 2D-Mesh The 2D mesh is by nature a pipelined (regular) structure and therefore presents fewer chip backend routing and timing issues than the crossbar. However, the 2D mesh introduces extra clock cycles of latency between source and destination ports compared with a crossbar.

In this project a 2x2 2D mesh is used to interconnect coarse grained processing Kernels, while the crossbar is used within Kernels (Electron instantiations) to interconnect Processing Elements. The crossbars required by Electron are 5x5 or smaller, therefore backend routing and timing closure are not an issue.

2.4.2 Controller The controller needs to provide all control signals required by the interconnect structure, Processing Elements, register file and other system components. The number of control signals required depends on the interconnect size and type and could be a few dozen. The Very Long Control Word (VLCW) architecture is capable of carrying this many control signals. The main VLCW features [93] are:

 The system is controlled by one control unit. This control unit issues one VLCW instruction per clock cycle.


 Every VLCW instruction is composed of a number of independent yet coupled operations.
 Operations within a VLCW instruction execute in a small and predictable number of clock cycles.
 For the type of VLCW architecture being considered here we allow the compiler to have full access to all hardware components – therefore making the hardware less complex. Simpler hardware allows for more controller instantiations as well as lower power consumption.

Sections 4.4 and 4.5 describe in detail the controller named Electron developed for this thesis.

2.4.3 Compiler The compiler is a software tool capable of:

 Parsing the application program looking for opportunities to offload the program execution. This parsing engine could rely on compiler "hints" embedded in the application source code (pragmas) by the programmer and/or look for the opportunities on its own.
 Creating a Data Flow Graph (DFG) or a pseudo-code of the application code that requires acceleration.
 Translating the above extracted information into a "cost function". This cost function is used to determine if the application code offload would be beneficial (a sketch follows this list). The compiler needs to compare execution time on the CPU and FPGA. Execution estimations are not straightforward. For a server, this time is influenced by the CPU type as well as the clock rate and the amount of available data/instruction cache. The size of the CPU's cache is a very important parameter. Cache size versus the size of the data that requires processing will determine the data cache thrash rate.
 Considering how often the calculation is performed. If many application program runs are required, the cost of partially reconfiguring the FPGA with a new component from the Virtual Hardware library is perhaps acceptable.
 Creating the code required by the FPGA and downloading the code to the FPGA.


 Creating functional calls for the application program and interfacing them with the FPGA driver.
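As an illustration of the cost function idea, the offload decision can be reduced to comparing total CPU time against total FPGA time including per-run communication and a one-time (partial) reconfiguration cost. The structure, parameter names and linear model below are hypothetical; a real compiler would calibrate them per CPU, host-to-FPGA link and FPGA build.

```cpp
// Hypothetical offload "cost function" sketch. All fields and the simple
// linear model are assumptions made for illustration, not the thesis tool.
struct OffloadEstimate {
    double cpuTimePerRun;        // estimated CPU execution time per run [s]
    double fpgaTimePerRun;       // estimated FPGA kernel time per run [s]
    double transferTimePerRun;   // host <-> FPGA data movement per run [s]
    double reconfigTime;         // partial reconfiguration, paid once [s]
    unsigned long runs;          // how often the calculation is performed
};

// Offload is beneficial if the total FPGA cost (including the one-time
// reconfiguration and the per-run communication) beats the CPU cost.
bool shouldOffload(const OffloadEstimate& e) {
    double cpuTotal  = e.cpuTimePerRun * e.runs;
    double fpgaTotal = e.reconfigTime +
                       (e.fpgaTimePerRun + e.transferTimePerRun) * e.runs;
    return fpgaTotal < cpuTotal;
}
```

The more often the calculation is repeated, the more the one-time reconfiguration cost is amortised, which is exactly the fourth point in the list above.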

2.4.4 Memory Subsystem Memory Subsystem throughput is defined by the memory hierarchy. This hierarchy is made from:

 Host memory (PC system memory)
 Host to FPGA interconnect (PCIe, Ethernet)
 FPGA board memory (DDR/2/3)
 FPGA device memory slices mapped to FPGA BRAMs

This memory subsystem needs to be designed so that it does not limit the FPGA processing speed. How much memory bandwidth is required depends on the FPGA processing capability – determined by the clock rate, the FPGA degree of concurrency (number of parallel processing engines – kernels) and the rate of memory references. If processing kernels take multiple clocks to complete their workload with fewer memory references, the input data bandwidth requirements are lower.
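As a rough, illustrative estimate (not a formula taken from the thesis), the required input bandwidth can be written as:

$$BW_{in} \approx N_{kernels} \cdot \frac{W_{op} \cdot b_{word} \cdot f_{clk}}{C_{result}}$$

where $N_{kernels}$ is the number of parallel kernels, $W_{op}$ the number of input words consumed per result, $b_{word}$ the word width in bits, $f_{clk}$ the kernel clock rate and $C_{result}$ the number of clocks a kernel needs per result. The more clocks a kernel spends per result, the lower the bandwidth demand, which restates the sentence above in equation form.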

When possible, repeated memory references should be done to a faster memory system sub-component (BRAM versus DDR).

In this project we use Coarse and Fine grained Kernels (see Section 4.7) where data is moved from centralized FPGA memory to processing Kernel’s General Purpose Registers (GPRs – flip-flop based memory structure) for lower latency access. The centralised FPGA data memory is populated via the PCIe link. A similar model is used in streaming processors [94].

2.5 Server Operating System Jitter

Operating System (OS) jitter is variability in an application's processing run time [95]. For the same application workload, for example a series of calculations, the application would complete in a variable and unpredictable time. The reasons for this behaviour are many, for example background daemons waking up, other application events or hardware interrupts.

Therefore minimizing the Operating System caused application run time variability is of a high importance.

2.6 Power Consumption in Integrated Circuits

Integrated circuit power consumption comes from three components [96]: static, dynamic and IO (Input Output) power.

Static power is the power consumed when no signals are toggling. This power component is caused by leakage current and in modern chips it is a major power consumption source. Static power depends on the circuit supply voltage, the transistor threshold voltage and the transistor technology.

Dynamic power is the power consumed due to signal toggling: logic gates charge and discharge capacitive loads and, at the same time, logic gate transistors create short circuit currents. Dynamic power can be described with the following equation:

$$P_{dynamic} = (C \cdot V^2 + V \cdot Q_{sc}) \cdot F \cdot A$$

Where C – capacitive load, V – circuit voltage supply, Q_sc – charge carried by the transistor short circuit current, F – operating frequency and A – signal activity rate.

The IO power is the power consumed by off chip drivers, high speed transceivers and other general purpose IOs.


2.7 Amdahl's Law

Amdahl's law defines how much a system can be improved if only a part of the system is improved [97][98]. Amdahl's law is used to compute the theoretical system speedup for a given application and number of processing cores. The law states that the system speedup is limited by the sequential part of the application.
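In its common form, for an application whose parallelisable fraction is p running on N cores, the speedup is:

$$S(N) = \frac{1}{(1-p) + \dfrac{p}{N}}, \qquad \lim_{N \to \infty} S(N) = \frac{1}{1-p}$$

so even with an unlimited number of cores the sequential fraction (1 − p) caps the achievable speedup.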

Over the time since it was published Amdahl’s Law sparked many debates related to system performance obtained by multiprocessors built with a few powerful cores or by many small cores. This debate is still going on.

2.8 Implemented Application

2.8.1 Goals The project goal is to provide enough evidence that the chosen architecture, based on the Electron processor and a crossbar, can provide the flexibility and speed required to compete with a server CPU based implementation.

The problem at hand is the implementation of the Black-Scholes equation used in the mathematical modeling of financial markets.

2.8.2 Options Trading Options are contracts giving rights, but no obligations, to one party to perform a transaction with another party under specified terms. A Call option provides the option holder the right to purchase an underlier at a specified price. A European option can be exercised only on the option expiration day.

2.8.3 Black-Scholes Equations used in this project are based on work done by Fischer Black and Myron Scholes, followed by work done by Robert Merton [99]. For this work they were awarded the Nobel Prize. The formulas developed are used to determine theoretical estimates of the Put and Call prices of European options over time. High-Frequency Trading, trading executed by computers collocated by trading houses, relies on financial models based on the Black-Scholes formulas. In this project we implemented the Call price model.

Variables used in these formulas are:
s = the price of the underlying stock
k = the strike price
r = the continuously compounded risk free interest rate
q = the continuously compounded annual dividend yield
t = the time in years until the expiration of the option

σ = the implied volatility for the underlying stock

Φ = the standard normal cumulative distribution function

Call price:

$$C = s \, e^{-q t} \, \Phi(d_1) - k \, e^{-r t} \, \Phi(d_2)$$

Where:

$$d_1 = \frac{\ln(s/k) + (r - q + \sigma^2/2)\, t}{\sigma \sqrt{t}}, \qquad d_2 = d_1 - \sigma \sqrt{t}$$

d1 = the number of standard deviations from the mean of the distribution of stock prices; it is a measure of the spot price relative to the mean stock price
d2 = the strike price relative to the mean stock price

Formula for the standard normal cumulative distribution function [100] (note that it is sequential in nature, see Section 5.2.2), valid for x ≥ 0:

$$\Phi(x) \approx 1 - \varphi(x) \left( b_1 w + b_2 w^2 + b_3 w^3 + b_4 w^4 + b_5 w^5 \right), \qquad w = \frac{1}{1 + b_0 x}$$

with the symmetry Φ(−x) = 1 − Φ(x) used for negative arguments.

Where:

φ(x) is the standard normal distribution, defined as:

$$\varphi(x) = \frac{1}{\sqrt{2\pi}} \, e^{-x^2/2}$$

or:

$$\varphi(x) = rsqrt2pi \cdot e^{-x^2/2}$$

Constants: b0 = 0.2316419, b1 = 0.319381530, b2 = −0.356563782, b3 = 1.781477937, b4 = −1.821255978, b5 = 1.330274429, rsqrt2pi = 3.98942291736602783203125E-1
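The following is a plain C++ reference sketch of the Call price model as reconstructed above, using the listed constants for the polynomial CDF approximation. It is meant only as a software check of the arithmetic, not as the FPGA implementation described later in the thesis.

```cpp
#include <cmath>

namespace bs {

constexpr double b0 = 0.2316419;
constexpr double b1 = 0.319381530;
constexpr double b2 = -0.356563782;
constexpr double b3 = 1.781477937;
constexpr double b4 = -1.821255978;
constexpr double b5 = 1.330274429;
constexpr double rsqrt2pi = 3.98942291736602783203125e-1;   // ~1/sqrt(2*pi)

// Standard normal CDF, Phi(x): sequential polynomial evaluation, symmetry
// used for negative arguments.
inline double normalCdf(double x) {
    const double ax   = std::fabs(x);
    const double w    = 1.0 / (1.0 + b0 * ax);
    const double pdf  = rsqrt2pi * std::exp(-0.5 * ax * ax);
    const double poly = w * (b1 + w * (b2 + w * (b3 + w * (b4 + w * b5))));
    const double cdf  = 1.0 - pdf * poly;          // valid for x >= 0
    return (x >= 0.0) ? cdf : 1.0 - cdf;
}

// European Call price with continuous dividend yield q.
inline double callPrice(double s, double k, double r, double q,
                        double t, double sigma) {
    const double sigSqrtT = sigma * std::sqrt(t);
    const double d1 = (std::log(s / k) + (r - q + 0.5 * sigma * sigma) * t) / sigSqrtT;
    const double d2 = d1 - sigSqrtT;
    return s * std::exp(-q * t) * normalCdf(d1)
         - k * std::exp(-r * t) * normalCdf(d2);
}

}  // namespace bs
```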

2.9 i5-3570 CPU

In this thesis the FPGA performance is compared with an Intel i5-3570 CPU. This CPU is a superscalar processor. It has 4 cores with 2 threads each. It is an Ivy Bridge part running at 3.4GHz (3.8GHz Turbo Boost), designed in a 22nm process node. The CPU has 6MB of L3 cache, 4 x 256KB of L2 cache (256KB per core), 4 x 32KB of data and 4 x 32KB of instruction cache (32KB per core). There is a SIMD unit, therefore the i5-3570 supports streaming extensions. In addition it supports encryption and vector extensions.

Being superscalar, the i5-3570 can execute more than one instruction in the same clock cycle. Instruction scheduling is dynamic – the CPU hardware schedules instructions. Executing multiple instructions requires the CPU to have a number of execution units, such as an Arithmetic Logic Unit or a Floating Point Unit.

Due to the dynamic scheduling, the i5-3570 consumes more power than simpler statically scheduled processors, such as a VLIW processor.

2.10 Development Board

The ML605 is a Xilinx Virtex-6 FPGA (XC6VLX240T-1FFG1156) Evaluation Board. The board can be used standalone or as a PCIe plug-in card in PCI Express® Gen1 8-lane (x8) and Gen2 4-lane (x4) modes. In this project the board was used in Gen2 4-lane mode. In this configuration the PCIe interconnect and the Xilinx FPGA PCIe core can deliver an effective 16 Gbps (4 lanes × 5 GT/s with 8b/10b line coding). This effective throughput is further reduced by the transaction layer protocol overhead.


Figure 13 – Xilinx ML-605 Development Board

During the hardware and test software development a debug software tool called PCITree16 was used. It is worth noting that PCITree works only on 32-bit Windows.

2.11 PCIe core and Software Driver

In the practical part of this thesis we use Speedy PCIe core and a Windows software driver. Both IPs are developed by Ray Bittner from Microsoft Research [101][102] and are freely distributed for research purposes. The PCIe core is designed for Xilinx ML605 development board.

2.11.1 Speedy PCIe Core Speedy PCIe FPGA core (or “the core”) interfaces with Xilinx PCI core on one side and User Logic on the other side. The core provides all functionality required to handle PCIe protocols. It defines an interface to the user logic. From Figure 14 note the 3 main modules:

1. Xilinx PCIe Core
2. Speedy PCIe Core and
3. The “Transaction Controller and Processing Complex”. This is a custom design developed for this project.

16 Available from http://www.pcitree.de/download.html

Figure 14 – Speedy PCIe topology (interfaces run at 64b @ 250 MHz, 256b @ 200 MHz and 512b @ 200 MHz towards the Transaction Controller and Processing Complex)

The “Transaction Controller and Processing Complex” replaced the Speedy PCIe provided example design, which interfaced with the external ML605 DDR3 memory.

Besides providing the PCIe related features, the Speedy PCIe Core was designed:

 To be flexible, so that a variety of host system offload research could be done,
 For an easy backend (synthesis, meeting timing, place and route) and
 For speed compatibility with future PCIe standards.

The above requirements resulted in a Speedy PCIe architecture with a very wide datapath of 512 bits and a low clock rate of only 200 MHz. For information on how this low clock rate affects the throughput numbers obtained in this thesis see Sections 5.2.6.1 and 6.4.


2.11.2 Speedy PCIe Software Driver Speedy PCIe comes with a Windows software driver in the form of source code and precompiled binaries. In this project only the 32-bit drivers were used.

Figure 15 – Device Manager

C++ test programs are provided with the Speedy PCIe suite – a useful starting point for custom test program development. The test programs write to and read from the external DDR memory using the following memory map:


Figure 16 – Speedy PCIe Example Design Memory Map

The Transaction Controller FPGA module is mapped to the DDR3 memory space, since it replaced the DDR3 related FPGA modules. The provided test programs were substantially modified to fit this application.

In Section 6.1 we describe some of the driver’s limitations such as the “read” blocking.


Chapter 3 - Requirements, Thesis Scope and Contributions

Clouds are major power consumers. We would like to determine whether heterogeneous computing can provide an adequate alternative with lower energy consumption than CPU-only servers, and which applicable technologies, hardware architectures and methodologies would enable heterogeneous computing in Clouds.

The energy equation:

$$Energy = Power \times Time$$

Therefore, in order to reduce energy consumption, this new technology needs to use less Power and/or run in less Time than server CPUs. However, the effect of Time is twofold: the first effect is energy savings and the second is Cloud throughput. The throughput directly affects the Cloud's ability to service more consumers – and this translates to the Cloud bottom line.

In this thesis it was identified that at least 20% of applications running in Clouds could be offloaded from CPUs and potentially accelerated by FPGA technology. These applications are parallel in nature and have a high computation-to-communication ratio. The computation-to-communication ratio determines which applications can take advantage of concurrent processing done on FPGAs [6]. FPGA coprocessors need the processing data downloaded from the host system to the FPGA; this time represents the communication part of the ratio. Therefore we need to increase the computation part using coarse-grained architectures.

FPGAs consume less power than server CPU or GPU technology. For this thesis the FPGA technology is selected due to its lower power consumption, the speedup offered (parallel processing) and its flexibility (full and partial reconfiguration). To utilize these FPGA features we still need applicable hardware architectures.


In this section the following topics are included:

Analysis of the literature and requirement definition – This section is based on the Background section. Benefits and disadvantages of a number of FPGA design methodologies are provided and the methodologies are compared. Next we define a set of requirements for the successful deployment of FPGAs in Clouds. What follows is an analysis of an FPGA coprocessor architecture that overcomes the deficiencies of the current FPGA design methodologies.

Thesis Scope – In this section we list all steps required to enable use of FPGAs in Clouds and what steps are covered by the thesis.

Contributions – Here we list thesis contributions to the current state-of-the-art.

3.1 Analysis of the literature and requirement definition

3.1.1 Methodology Comparison In Section 2.3 we analysed a number of FPGA design methodologies. The analysis considers Clouds as the computing environment. Cloud computing could be greatly enhanced through improved performance and lower power consumption; FPGAs offer both features.

To improve the application programmer’s experience the goal is to have a single programming environment where CPU and acceleration hardware would be programmed at the same time from high level languages such as C. Although it is hard to argue with the single programming environment goal, the technology required to get there is not immediately clear.

Custom designs offer the best performance and best power utilization, but at the expense of poor flexibility and long development time.

Embedded RISC FPGA cores offer unparalleled flexibility, but at the expense of higher power consumption and lower performance due to their fetch-decode-execute architecture.


The concept of Virtual Hardware offered with pre-synthesised hardware templates opens new possibilities for a faster device “update” via a partial reconfiguration (Section 2.3.3).

High Level Synthesis appears to be great at what it was invented to do – create highly optimised, sometimes larger than expected, fixed function circuits. If these circuits are to be used in the Cloud environment, FPGA reconfiguration is mandatory: every new acceleration task would require running these tools, followed by synthesis and floor planning of the new circuit and finally device reconfiguration.

The following table gives methodology comparison. Terminology used:

 Electron is the architecture developed in this thesis
 RC – Reconfiguration
 FPGA RC – Complete device reconfiguration
 PRC – Partial reconfiguration. Depending on what is being reconfigured, FPGA sections are not operational.
 Synthesis, P&R – FPGA backend steps, time consuming. A new FPGA load is being created. This step is not suitable for Cloud application.
 Virtual HW – Does the methodology offer use of pre-synthesised HW templates. Switching in a new template requires partial reconfiguration only, not the Synthesis and P&R steps.
 Multi-core – Is the methodology feasible for multi-core processing support
 Operational during RC – If the device is being reconfigured, is it out of service
 Flexibility – How versatile the methodology is
 (V)ery (P)oor, (P)oor, (G)ood, (V)ery (G)ood, (M)aybe, (O)ptional, (Y)es, (N)o


Table 1 - FPGA Design Methodology Comparison

Design Methodology | FPGA RC | PRC | Synthesis, P&R | Virtual HW | Multi-core | Operational during RC | Flexibility
Custom             | Y       | N   | Y              | N          | N          | Y                     | VP
HLS                | Y       | Y   | Y              | Y          | N          | M                     | P
HOS                | M       | Y   | M              | Y          | N          | M                     | G
NISC               | Y       | N   | Y              | N          | N          | O                     | G
Electron           | N       | O   | N              | Y          | Y          | O                     | VG

The Electron architecture will be described later. However, from the table we see that Electron is the best architectural choice for Clouds. It offers flexibility, it supports multi-core processing, and it gives the opportunity to optionally add a new Virtual Hardware component and partially reconfigure the FPGA. At the same time it does not need FPGA (device level) reconfiguration.

3.1.2 Cloud Coprocessor Architecture Requirements The solution that offers the expected flexibility and speed for the set of Cloud applications listed in Section 1.4 seems to be the use of light processing cores with features similar to those presented in Section 2.3.4.1. These processors would be accompanied by a controllable interconnect and, as an option, a partially reconfigurable data path. They offer coarse-grained data parallelism with features that improve on fully pipelined architectures (see the discussion related to Stojilovic et al. [41], Section 2.3.1).

What follows is a distilled list of features that an acceleration architecture would need to support for the Cloud computing environment.

System level:

 Do not require FPGA reconfiguration in order to provide processing features when switching in new applications. Rather a fast update of the “processing functionality” such as a program code update is sufficient.


 Allow for optional partial reconfiguration of parts of the FPGA for increased performance.
 Minimize processing latency by activating processing kernels as soon as sufficient data is available on the FPGA (latency hiding). Compare this feature to architectures that first store complete sets of host provided data to a local memory.
 Have the processing time stable and repeatable given the same workload and set of processing requirements (low processing time jitter).

Processing model (see Sections 2.3.1, 2.3.4.1 and 4.8):

 Support coarse-grained data parallelism (computation-to-communication ratio).
 Offer Coarse and Fine grained processing capabilities.
 Support parallel hardware threads (processing time speedup).
 Support locally interconnected processing kernels capable of run-until-completion or sequential processing models.
 Data exchange and kernel scheduling done by a programmable controller.

Hardware architecture level:

 Are capable of controlling parallel data paths.
 VLCW hardware related complexity is passed to the compiler.
 Do not belong to the instruction fetch-decode-execute architecture (the RISC family) (*). Rather they belong to a "VLIW stream – dataflow control" architecture.
 Do not have a standard RISC instruction set (*). A standard instruction set requires decoding, and the decoded instruction concept cannot represent the numerous possible crossbar states well.
 Run VLCW code containing control words. These control words define the crossbar state (how PEs are interconnected by the crossbar).
 Have operand processing and result forwarding done in the controllable interconnect based data path.
 The data path, made from processing elements and interconnect, allows data flow processing.


(*) More on disadvantages of RISC processors versus Electron can be found in Section 3.3.1.

3.1.3 Basis of Cloud Coprocessor Architectures Based on the features presented in Section 3.1.2 we define the basis of a coprocessor architecture that overcomes the deficiencies of the methodologies described in Sections 2.3.1, 2.3.2 and 2.3.4.

To the best of our knowledge most FPGA-based coprocessors are created as fixed function designs and are therefore optimized for one application (e.g. a DSP algorithm). Solutions that are flexible are based on RISC processing cores and therefore have limited processing throughput. A Cloud-worthy architecture is somewhere between fixed function and RISC architectures, with performance similar to fixed function logic and the flexibility of a RISC.

The following is a list of architectures and/or techniques that stand out and whose benefits will be incorporated into the practical part of this project. Rather than giving architectural details, where applicable, links to relevant chapters are provided to avoid text duplication:

The No Instruction Set Computer (NISC) is described in detail in Section 2.3.5.1. NISC is used as the basis of the architecture developed in this thesis.

The Streaming Processors are VLIW processor cores. They use the concepts of a streaming register file and a flexible interconnect within the processing core (ALU), both of which are of interest.

The variable processing latencies can be addressed by implementing buffers in processing streams. These buffers are used to schedule and clock cycle align streams with different latencies, Section 2.3.5.3.

For the flexible interconnect we consider using the well known and researched crossbar and 2D mesh (Section 2.4.1). Both of these interconnect architectures are further optimized for our application. These optimized crossbars minimize GPR port access bandwidth as well as the number of crossbar ports.

Hennessy et al. used the computation to communication ratio [6] to determine which applications can take advantage of concurrent processing (see Section 1.4). Such applications have higher computation than communication requirements. Therefore we focus on coarse-grained data parallel architectures.

For the problem at hand we focus on the financial industry [12] [13] [103] [104]. These applications could be offloaded from CPUs and accelerated with FPGAs.

Architectures we are interested in have a set of processing elements connected by a flexible interconnect. This interconnect can be controlled during run-time, therefore we call it a controllable interconnect. The controllable interconnect is made from wires, cross point multiplexers and a cycle accurate controller. The cycle accurate controller runs code that controls this interconnecting structure. The control code is created by a software compiler.

3.2 Thesis Scope

In this thesis we propose a complete framework that would enable use of FPGAs in Clouds. This framework consists of:

1. FPGA hardware architectures.
2. Process of mapping a computation problem into a few computational Kernels.
3. A software compiler capable of creating code executable on proposed hardware architectures.
4. A software compiler capable of creating system level code. This code is executed on proposed hardware architecture and controls and schedules a multi-core system.
5. Process of selecting parallel or sequential Kernel processing model in the case of a multi-core system.
6. Pre-synthesised hardware templates, Virtual Hardware and template libraries supported by FPGA partial reconfiguration.
7. Software drivers required to communicate with the FPGA-based coprocessor and to deal with Virtualization tasks.

Work described in item 1 is the main topic of this thesis, including the thesis practical part.

Work described in items 2, 3 and 4 is well covered in the thesis, but it is performed manually. All required steps are presented in the relevant sections and the outcome of the manual compilation is used in the thesis practical part. Creating the software tool required for these steps would be an elaborate and serious undertaking. It would allow in-depth research into optimization techniques that would lead to better system performance.

Work described in items 5, 6 and 7 is only mentioned as an integral part of an FPGA toolset. Creating the software tools required for these steps would also be an elaborate and serious undertaking. Especially for item 6, before creating such a software tool, substantial research is required to determine which templates (processing elements) would reside permanently on the FPGA and which would be provided as Virtual Hardware. This research would need to take a close look at the Cloud applications of interest, such as financial and scientific computations.

3.3 Contributions

Section 2.3 presents a detailed review of the existing industrial and academic FPGA design methodologies. These methodologies were analysed from the point of view of the speedup offered and of flexibility. Based on the findings, it was determined that the two academic solutions presented in Sections 2.3.5.1 and 2.3.5.3 are a good basis for further work. This work materialised into a newly developed processor named Electron.

Electron is a hardware architecture with a programmable control path and a novel controllable and optionally reconfigurable data path, which we call DynaPath. The controllable and reconfigurable data path is built from a unique use of a crossbar and from the notion of Virtual Hardware (pre-synthesised hardware templates).

Let us now list the contributions here and then describe them one by one.

3.3.1 Processing Pipeline

Figure 17 presents the 6-stage RISC (top) and the Electron (bottom) processing pipelines. For a description of the basic 5-stage processing pipeline see Hennessy et al. [6].

From the figure: IF – Instruction Fetch, ID – Instruction Decode, OF – Operand Fetch, EX- Execution Unit, WB – Write Back (there could be more than one stage) and CS – Code Stream.


Figure 17 – RISC and Electron pipelines (RISC: IF → ID → OF → EX → WB0 → WB1; Electron: CS → OF → EX)

Electron has only three stages: Code Stream, Operand Fetch and Execution. That is two stages fewer than the basic processing pipeline, or three to four stages fewer than advanced RISC pipelines. Fewer stages contribute to processor simplicity (explained in Section 3.3.2), shorter latency and a smaller processor.

Latency

The following example assumes a serialised program section. In this case processing unit needs to wait for a result to be evaluated before the next operation is executed.

The RISC pipeline latency is twice as long. If the average processing element latency is 8 clocks, the total Electron/RISC processing latency, made up of the processor pipeline latency and the processing element latency, is:
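A hedged reconstruction of the computation, assuming the 3-stage Electron and 6-stage RISC pipelines of Figure 17 and the 8-clock processing element latency stated above:

T_Electron = 3 + 8 = 11 CC
T_RISC = 6 + 8 = 14 CC
(T_RISC - T_Electron) / T_RISC = 3/14 ≈ 0.21, i.e. roughly a 21-22% reduction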

Therefore Electron has an approximately 22% shorter processing latency. This shorter latency has a cumulative positive effect on program execution time.


Size

A 6-stage RISC pipeline has roughly twice as many flip-flops as Electron (hundreds of flip-flops), indicating that two Electron pipelines could fit in the place of one RISC pipeline.

3.3.2 Instruction Stages

The Electron pipeline is simpler than a RISC processor's pipeline. These simplifications minimize the Electron processor size, due to the removed pipeline stages (explained in the previous section) as well as the complex logic cones associated with the removed hardware resources. The removed logic cones also allow Electron to run at a faster clock rate than a RISC processor. The simplifications are as follows:

The Electron CS stage corresponds to the RISC IF and ID stages; Electron and RISC both have an OF stage; and the Electron EX stage corresponds to the RISC EX and WB stages.

The CS stage streams Electron instructions; they do not require decoding. An Electron instruction directly controls the processing elements, the crossbar, the registers and so on; one can view an Electron instruction as already decoded. Decoding an instruction with a variable size and location of fields presents a challenge [105]. Electron's approach offers a processor simplification due to the removed ID stage (improving processor size and achievable clock rate) and the ability to control multiple hardware modules. It should be noted that some instruction-level complexity is passed to the compiler.

The Electron EX stage is made from a crossbar and all the hardware blocks connected to it (DynaPath). The crossbar also provides the write-back features. The WB stage is used to pass data back to EX (intermediate results) in addition to writing it to the GPRs. The more WB stages are available, the lower the required GPR access rate, since intermediate data can be found in one of the WB stages. If the GPRs are located in an SRAM, the number of SRAM ports is directly related to the GPR access rate, and the number of ports significantly affects the SRAM size; if a processor has more than one data path, this effect is even stronger. If the SRAM access rate can be reduced, the GPR SRAM can be simpler and smaller. The Electron crossbar has buffers at the processing element inputs and outputs, so the GPR access rate is drastically reduced; intermediate data can stay in the crossbar, often much longer than RISC WB stages would allow. An important consideration for RISC WB stages is the requirement to compare operand addresses from the OF and WB paths: the more WB stages, the more complex this logic gets. The pass-back feature in RISC is complex, requires logic resources and is hardware controlled; in Electron this feature is under the compiler's control.

3.3.3 Instruction Format

Electron does not use an Instruction Set Architecture (ISA); rather, it uses a Very Long Control Word (VLCW). The VLCW terminology was developed for this thesis. An ISA instruction uses one opcode, two operands and one result destination to express only one unit of work per clock cycle (for example ADD or LOAD). The Electron data path can perform a number of units of work per clock cycle. An ISA also requires an Instruction Decode (ID) unit; the ID adds one stage to the processor, making pass-back decisions more complex.

The VLCW directly controls a number of data flows inside the crossbar. Thanks to the available hardware (crossbar and Processing Elements), the Electron VLCW can easily be extended to compound instructions, such as a for-loop multiply-add.
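As a purely illustrative sketch of the idea (the actual 57-bit field layout is defined in Section 5.2.3, not here), the difference between decoding an ISA instruction and applying a VLCW can be modelled in a few lines of Python; all field and port names below are hypothetical:

# A hypothetical, much-simplified VLCW model: every field is already a
# hardware control signal, so no Instruction Decode stage is needed.
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class VLCW:
    cps_select: Dict[str, str]        # egress port -> crossbar source driven this cycle
    gpr_read: Optional[int] = None    # GPR address placed on the crossbar (None = no read)
    gpr_write: Optional[int] = None   # GPR address written from the crossbar (None = no write)
    wait: int = 0                     # WAIT count: hold the current DynaPath state this many cycles

# One control word can express several units of work in the same clock cycle,
# e.g. feed a multiplier from two memory slices while an adder accumulates:
word = VLCW(
    cps_select={"MUL0.a": "RX_SLICE0", "MUL0.b": "RX_SLICE1", "ADD0.a": "MUL0.out"},
    gpr_write=3,
)
print(word)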

3.3.4 Crossbar Architecture

The crossbar adds the classical processor pass-back feature and operates as the Execution unit.

A number of crossbar architectures were analysed in search of an optimum solution for throughput, logic resources and backend timing. Two types of interconnect were selected and unique features added: one for Processing Tiles (2D mesh) and one for Electron (crossbar). The unique features are:

• In the crossbar, a local loopback at the destination port was added. This in effect allowed for crossbar simplifications (a sparse crossbar).
• Dataflow processing is supported.
• Hardware templates (Virtual Hardware) are used via partial reconfiguration.

3.3.5 Electron versus NISC

Gajski et al. suggested the No Instruction Set Computer (NISC) processor [66]. Electron uses the main concepts from this work but adds new ideas to it:


• A crossbar was added as the main processor data-path interconnect. The crossbar connects all processor resources and replaces the shared buses used in NISC. Shared buses present interconnect scheduling issues; to improve the situation NISC uses a local PE interconnect, where two side-by-side PEs are interconnected in one direction. Consequently, local interconnects may work for one application but not for another.
• In the NISC architecture the designer customizes the data path to a given application. In contrast, Electron provides a controllable data-path interconnect, giving the compiler the opportunity to use the Electron processor efficiently for a given application. Passing processor-utilization complexity to the compiler is more of a RISC style, but Electron is not a RISC architecture.
• The multi-port register file (General Purpose Registers – GPR) is replaced by a 1R1W register file. For small memories, adding a port doubles the size of the memory, so this modification to NISC leads to a smaller Electron implementation. If more than one PE needs to access the GPR write port, nano-code can schedule the write operations sequentially while the write data waits for scheduling in the crossbar flops.
• Eventual 1R1W register file port collisions are eliminated by an innovative data path based on a crossbar. The crossbar itself can store intermediate data and forward it where required. In NISC, all intermediate data must go to and from the register file.
• NISC is a no-instruction architecture. Electron is a VLCW architecture, effectively adding the ability to have instructions such as WAIT or SKIP.
• Instead of an instruction fetch unit, Electron uses a Code Stream unit. The Code Stream unit allows observability into the current and next instruction(s), which allows better branch handling.
• Introduction of Special Registers (SR). NISC stores constants in the processor control code memory: a 38-bit control word with a 16-bit constant field that takes 42% of the control word. It is very unlikely that there would be more than a few opportunities to use this field per program execution, yet the field is part of every Code Memory (CMEM) control word. Using SRs instead reduces the control code memory width by 42%.


Frequently used constants (e.g. 0x0 or 0xFFFF_FFFF) are stored there. If a program requires other, application-dependent constants, they can be provided via the data memory and stored in GPRs for the program lifetime.
• Further code-space compression is achieved by adding a WAIT instruction.
• NISC does not give a clear path to multi-core configurations. The following features were therefore newly developed:
  o The Electron data path allows cores to exchange data. A number of Electrons can be integrated in a multi-core structure called a Processing Tile.
  o Electron can be used as a kernel processor or a system processor. As a kernel processor it executes the application code; as a system processor it schedules the individual kernels of a multi-core configuration for parallel or sequential processing.
  o A novelty versus NISC is the ability to use shared Processing Elements. The system controller schedules the use of the shared Processing Elements.
  o Further to the multi-core configuration, a strategy for coarsening kernels was developed. This strategy is based on the previously mentioned computation-to-communication ratio.
  o A multi-core configuration may require memory protection. This is accomplished by the system controller controlling how intermediate data is forwarded from kernel to kernel, rather than by the kernel running the application code.
  o Processing Tiles can be used in Single Instruction Multiple Data (SIMD) mode.
  o Processing Tiles and Kernels support coarse- and fine-grained compute models. In addition, Kernels support Instruction Level Parallelism (ILP).

3.3.6 Electron versus FlexCore

FlexCore is similar to NISC [76]. Electron also provides improvements over FlexCore:

• FlexCore's instruction width is 91 bits, while Electron's instruction width is 57 bits (Section 5.2.3); Electron therefore provides a roughly 38% smaller instruction memory footprint than FlexCore ((91 − 57)/91 ≈ 0.37).


• FlexCore uses a 32-bit immediate value. Electron instead has the choice of taking immediate values from the Special Register storage or of overloading an instruction with an immediate value. Instructions often need a "1", an "all ones" or a "0" value to perform a calculation. For other instructions that use an immediate value there is no need to provide register file addresses (GPR addresses in the Electron architecture), which provides the opportunity to overload the address fields with an immediate value. Lastly, as a last resort, Electron can insert an instruction that provides only an immediate value; only in this case is a clock cycle gap created.
• It was noted above that the FlexCore code memory requires a large footprint. The footprint comes from the instruction width and from so-called "static code". The static code is required because FlexCore, just like Electron, is clock-cycle accurate: if functional units are processing data and the results are not yet available, FlexCore needs static instructions. It was already determined that the Electron instruction width is 38% smaller. For the static-code problem, Electron supports a WAIT instruction, an "idle" instruction executed for a programmable number of loops, so it takes only one code memory word.
• It was also noted above that the FlexCore code memory requires large instruction bandwidth. This requirement comes from the clock-cycle-accurate nature of FlexCore, the same as for Electron: an instruction is needed every clock cycle to control the state of the crossbar. Electron solves this problem by:
  o registering and latching the DynaPath control signals while WAIT instructions are executed, effectively minimizing the instruction bandwidth, and
  o utilizing a highly efficient Code Stream code memory unit.
• FlexCore uses a full crossbar: every functional unit can be connected to any other, including itself, over the crossbar. The Electron DynaPath crossbar is a sparse crossbar; Processing Elements can return data to themselves via an internal link. This internal link can also be used for reduction-type operations such as accumulate.


• FlexCore does not use any storage within the crossbar; variable processing latency issues therefore need to be solved by the register file or the Load/Store unit. This approach may create port contention issues.
• Multiprocessing architectures are not addressed at all; Electron offers the Processing Tile.
• Shared Processing Elements are not used.
• There is no concept of memory slices.

3.3.7 Electron and Variable Processing Latencies

Hong et al. suggested the use of buffers in systems where individual hardware modules have different processing speeds and/or latencies [77][78][79][80][81]. This work was adapted, modified and applied to the Electron crossbar:

• The Electron crossbar implements buffers at its input ports. The buffers work as Virtual Output Queues (VOQ). However, Electron does not need buffers to synchronise processing elements: since Electron is clock-cycle accurate, it knows when data will appear at a node. It rather uses shallow buffers to achieve better utilization of the processing elements. Faster processing elements can be used several times while slower ones are still working on previous data. This approach is of particular significance when a processing element needs to capture and hold an operand from a fast processing element while waiting for the second operand from a slow processing element that is still busy.
• Results from all Processing Tiles are synchronized by the implemented buffers before being transferred by DMA to the host system via the PCIe link.
• Using deep buffers and introducing a flow control mechanism (XOFF or credit based) may introduce jitter in the processing time. Therefore Electron uses a cycle-accurate central control and shallow buffers.

3.3.8 Processing Tile

The concepts of the Processing Tile and shared Processing Elements are described in Section 3.3.5.


3.3.9 Experimental Results

In order to experimentally confirm the usefulness of this research, the following steps were taken:

• System-level and Kernel nano-code was developed for the Black-Scholes equation commonly used in the financial industry.
• A software implementation of the Black-Scholes equation running on a PC was developed (a sketch of such a reference implementation is given below).
• Experiments were carried out and the Electron and software results were compared.
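As an illustration only, a minimal sketch of the kind of CPU-side reference implementation mentioned above, using the standard closed-form European call price; parameter names are generic and are not taken from the thesis code:

# Minimal Black-Scholes European call price, used here only as an
# illustration of the CPU reference computation.
from math import log, sqrt, exp, erf

def norm_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_scholes_call(S, K, T, r, sigma):
    """S: spot price, K: strike, T: time to expiry (years),
    r: risk-free rate, sigma: volatility."""
    d1 = (log(S / K) + (r + 0.5 * sigma * sigma) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S * norm_cdf(d1) - K * exp(-r * T) * norm_cdf(d2)

print(black_scholes_call(S=100.0, K=105.0, T=0.5, r=0.01, sigma=0.2))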

FPGA-based coprocessors require raw data to be downloaded from the host and results to be uploaded back to the host system. Low-latency and low-interrupt strategies were developed for this purpose.

A number of software tools are required by the technology described above. The actual tools were not developed; instead, the tool know-how was considered and applied manually to prove its practicality.


Chapter 4 - FPGA-based Coprocessor Architecture

4.1 System Level Architecture

4.1.1 Hardware

In this section we present a plug-in coprocessor card. This card could be added to an existing server residing in the Cloud or to a separate coprocessor chassis. Either way, the card uses PCIe as the main interconnect with the outside world.

The following figure depicts the conceptual architecture of an FPGA-based coprocessor.

Figure 18 – FPGA Coprocessor (an FPGA plug-in PCIe card with Flash memory holding the Virtual Hardware, a PCIe core, Rx/Tx DMA engines, Rx/Tx memories and Processing Tiles containing the processing cores: Electron and DynaPath)

The main FPGA Coprocessor components and their functions are listed below.


• The Flash Memory contains the Virtual Hardware, a library of pre-synthesised and ready-to-use coprocessor hardware components such as Processing Elements (see the list of contributions, Section 3.3.4). The components are used for partial FPGA reconfiguration.
• The PCIe core and DMA engine offload the Processing Tiles from PCIe-related transport tasks.
• The receive and transmit memories store the data that needs processing and the results produced by the Processing Tiles.
• The FPGA coprocessor hosts a number of Processing Tiles (PT). Each PT executes processing code concurrently. If all PTs execute the same processing code, they provide a Single Instruction Multiple Data (SIMD) processing model. PTs cannot exchange data among themselves (the PT is listed in the contributions, Section 3.3.8).
• Kernels, the Electron instantiations belonging to a PT, can operate in parallel or serially. If connected serially, they create a processing pipeline where each Kernel does a part of the processing task; Kernels can exchange intermediate data. If Kernels operate in parallel to each other, they provide a SIMD processing model. Electron is listed in the contributions, Sections 3.3.1 through 3.3.6.
• The PT interconnect connects the Kernels as well as the other parts of the system that belong to the PT. This interconnect is based on a 2D Mesh NoC.
• Kernels are processors that execute the offloaded application code. Each Kernel is made from an interconnecting structure (the Electron interconnect is listed in the contributions, Sections 3.3.4 and 3.3.7), a controller (processor), code memory and a number of Processing Elements (algorithmic primitives such as multipliers or adders).
• Processing Tiles and Kernels offer coarse-grained, fine-grained and instruction-level parallelism.

4.1.2 Software

A software layer residing on the host system is required to communicate with the coprocessor and to support virtualization tasks. For virtualization purposes this software layer would dedicate Processing Tiles to different host applications and/or execute host applications to completion before switching in new applications.

The only virtualization support provided by Electron itself is to reset all data/state structures, protecting host applications running on the coprocessor from seeing data that belongs to other applications.

4.2 System Design Flow

The system design flow diagram is shown in Figure 19. Some of the diagram steps are done in parallel. For example step 4 does not need to wait for the step 3 to be completed.

1. A C compiler creates the application program DFG/CFGs or a pseudo-code (explained in the sections that follow).
2. A custom program parses the DFG/CFG/pseudo-code, estimating the required resources (Processing Elements – PE) and the possibilities for concurrent data processing (type of PEs, serialized or concurrent Kernel design methodologies). Here we look for the best template match versus partial reconfiguration time. One set of templates exists in the on-board Flash memory; another set exists as a library of templates on the host computer. Templates provide the Virtualised Hardware model.
3. If adding a new template is the chosen option, the FPGA is partially reconfigured from the Flash device.
4. The DFG/CFG/pseudo-code is compiled to the FPGA controllers' control code.
5. The control code and data are downloaded to the FPGA, followed by starting the FPGA.
6. After processing completes and the results are made available to the host, processing of the next application code starts.
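A hedged sketch of how this flow could be driven from the host; every function and attribute name below is a hypothetical placeholder (in this thesis these steps are carried out manually):

# Hypothetical host-side orchestration of the Figure 19 flow.
def run_offload(application_source, fpga, flash_templates, host_templates):
    graph = compile_to_dfg_cfg(application_source)             # step 1
    plan = estimate_resources_and_match_templates(             # step 2
        graph, on_board=flash_templates, on_host=host_templates)
    if plan.needs_new_template:                                # step 3
        fpga.partial_reconfigure(plan.template)
    nano_code = compile_to_nano_code(graph, plan)              # step 4
    fpga.download(data=plan.application_data, code=nano_code)  # step 5
    fpga.start()
    fpga.wait_for_msi()                                        # step 6: interrupt on completion
    return fpga.read_results()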


Figure 19 - System Flow Diagram (steps 1-6: application DFG/CFG/pseudo-code generation; PT resource estimation and on/off-board template matching; optional DynaPath partial reconfiguration; compilation to nano-code; memory-mapped transfer of application data, distributed controllers' nano-code and the start signal; and waiting for the MSI from the FPGA)

4.3 Processing Element Interconnect

In this section we analyse the Crossbar and the 2D Mesh and how they can be used as the Processing Element interconnect. The goal is to build an interconnect with data-flow processing capabilities.

4.3.1 Processing Element Crossbar Implementation

Fundamental concepts of crossbars are given in Section 2.4.1.1. This section gives Processing Element Crossbar (PXB) implementation details. The PXB is of fundamental importance: we need to determine what features it offers, its complexity, and how to minimize that complexity without affecting performance.

Figure 20 - Processing Element Crossbar (PXB): Rx Data Memory slices and GPR, Special Registers, Cross Points (CP), Cross Point Selectors (CPS), Ingress Buffers (IBF), Retiming Pipelines (RP), Egress Buffers (EBF), Processing Elements PE0-PE3, and the Tx Data Memory and GPR

4.3.1.1 Description Figure 20 depicts a Processing Element Crossbar, description follows:

Outputs from all Processing Elements (PE) are connected to the left PXB side via optional Ingress Buffers (IBF) or directly to PXB input flops. This PXB side is referred to as the ingress side.

Inputs of all PEs are connected to PXB Egress Buffers (EBF). This PXB side is referred to as the egress side.


The role of the IBFs and EBFs is to help with input and output (respectively) clock cycle alignment, which may be required due to differences in PE and PXB latencies. In some applications the IBFs and EBFs could also be used for clock-domain-crossing purposes.

Optional Retiming Pipelines (RP) are placed on the ingress and egress sides of the PXB. The role of these pipelines is to improve the physical design (PD), allowing for faster clock rates. For some applications the IBFs and RPs may converge so that only RPs or only IBFs are used; a similar comment applies to the RPs and EBFs.

Crossbar Cross Points (CPs) represent connections between the PXB ingress ports (source) and the PXB egress side ports (destinations).

For example, Special Registers (SRs) have one output bus. This bus connects to N PEs via CPs. For every Processing Element SRs connect to only one PE input port. The CPs could be a set of wires, a set of flops or a FIFO.

Cross Point Selectors (CPS) are multiplexors placed on the PXB egress side. Their role is to select one of the possible CP busses connected to this PE. Figure 20 shows that the PE0 left input port has 6 CPs connected to its CPS. Therefore this CPS can select one of the 6 input busses.

In the practical part of this project all PEs are designed as fixed function components. All PEs have known and fixed latency therefore they lend themselves well to streaming architectures.

The receive (Rx) data memory is a system-level component used to store incoming data provided by the host. To allow concurrent processing, the Rx memory is built from a number of slices. The number of slices and the data width are determined by several criteria: PXB data width, number of PEs, host system bus (PCIe) width, etc. Having more slices helps processing concurrency but increases the number of PXB ports and therefore PXB complexity.

Transmit (Tx) memory is a system level component used to store results.


The Processing Element Crossbar contains all IBFs/EBFs, RPs, CPs and crossbar wires. It is desirable to minimize the number of the crossbar ports or Cross Points to minimize FPGA/ASIC resource and area usage.

Special Registers (SR) can be used to store miscellaneous constants, results of compare operations or the status of I/O operations (pass, fail, data integrity errors). These values can alter program execution; for example, if the status value is FAIL, further processing is aborted. Special Registers can be written and read in parallel, which is much different from the GPRs (the GPRs are not shown in the previous figure). It is expected that there are more GPRs than SRs, which requires the GPR data structure to be built from SRAMs; SRAMs have limitations such as the number of ports and port contention. The following is an example of Special Register content:

SR[0] (ZERO), value == 32’d0

SR[1] (ONES), value == 32’hFFFF_FFFF

SR[2:15]: registers storing miscellaneous values, results of compare operations and PE status; these could be defined at compile time.

4.3.1.2 Resources

Registers (flip-flops) and cross point selectors are the major PXB components; therefore we would like to know the total resources they take.

For the above shown crossbar architecture:

FlopCount = ((Ports + Slices + 1) + ((Ports × 2) + Slices)) × BusWidth   [1]

Assuming a 3-slice Data Memory and 32-bit wide ports:

FlopCount = ((4 + 3 + 1) + ((4 × 2) + 3)) × 32 = 608   [2]

Assuming the cross point selector (CPS) architecture defined in Section 4.3.1.4.2, we calculate the number of CPS inputs:


CPSinputs = (Ports + Slices + 1) × BusWidth   [3]

CPSinputs = (4 + 3 + 1) × 32 = 256   [4]

Therefore each CPS has 256 inputs, grouped as 8 × 32. This number corresponds to 256 AND gates and 256 OR gates per CPS. The above PXB has 8 CPSes; therefore all CPSes require about 2000 (2-input) AND and 2000 (2-input) OR gates.

A 2-input AND or OR gate has (2 × numberOfInputs) + 2 = 6 transistors, so the total is 4000 × 6 = 24k transistors.

Assuming that an average flip-flop consists of eight NOR gates, one flip-flop corresponds to 8 × 4 = 32 transistors. Therefore 24k transistors correspond to about 750 flops.

Summary: the PXB requires approximately 1360 equivalent flops (608 + 750). As shown in Section 5.4.3, just one multiplier takes 679 flops. Therefore, focusing design effort on selecting the PEs, their number, type and features is more important than minimising the crossbar. A similar conclusion was reached for the coprocessor implemented in the practical part of this thesis (Section 5.4.4).
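The arithmetic above can be reproduced with a few lines of Python (a sketch; the constants follow the assumptions stated in this section):

# Reproduces the PXB resource estimate of Section 4.3.1.2.
ports, slices, bus_width = 4, 3, 32

flops = ((ports + slices + 1) + (2 * ports + slices)) * bus_width   # eq. [1]/[2]
cps_inputs = (ports + slices + 1) * bus_width                       # eq. [3]/[4]
gates = 2 * cps_inputs * 8            # AND + OR gates over the 8 CPSes
transistors = gates * 6               # 6 transistors per 2-input AND/OR gate
flop_equivalent = transistors // 32   # 32 transistors per average flip-flop

print(flops, cps_inputs, flop_equivalent, flops + flop_equivalent)
# -> 608 256 768 1376 (the text rounds 4096 gates down to ~4000, giving ~750 and ~1360)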

4.3.1.3 Multiply and Accumulate

As an example, consider a DSP multiply-and-accumulate (MAC) reduction operation, acc = a1·b1 + a2·b2 + … + an·bn. MAC is the basis of a number of DSP algorithms, FIR (Finite Impulse Response) filters and the FFT (Fast Fourier Transform) among them.

The MAC function allows concurrent and streaming calculations. The following figure depicts a relevant PXB configuration.


Figure 21 - Multiply and accumulate, concurrent and streaming calculations: PXB with Rx Data Memory slices and GPR, Special Registers, MUL0, ADD0, MUL1, ADD1, and the Tx Data Memory and GPR

The PXB contains two pairs of multiplier and adder PEs. Each ADD PE has a feedback path on one of its input ports.

The following figure shows a sequence of streaming operations:


Figure 22 - Multiply and accumulate, concurrent and streaming calculations: data-flow steps 1-4 through the PXB (Rx slices to MUL0, MUL0 to ADD0, ADD0 loopback)

1. Processing data is read out from the Rx Memory. The data progresses as a stream to both operands of MUL0 (therefore the Rx Data Memory needs at least two slices). After a delay equal to the RP (1 CC, clock cycle), the EBF (1 CC), the multiplier primitive (latency Lm CCs) and the output flop stage, data emerges from MUL0. The MUL is built as a pipelined multiplier, so after this initial delay results are available on every clock.
2. The MUL0 output data passes through the MUL0 IBF and RP into the PXB.
3. The MUL0 output data is routed to ADD0 input A; input B has the value 0.
4. The ADD0 output is looped back to the ADD0 input B port; ADD0 therefore works as an accumulator.

The following timing diagram gives all the relevant clock cycles:


Figure 23 - Multiply and accumulate, concurrent and streaming calculations - timing (clock cycles 0 to 11 + Lm; rows: Slice Data Out, MUL0 in, MUL0 out, ADD0 in, ADD0 out; Lm = pipelined multiplier latency)

1. In this timing diagram, "Slice Data Out" is the processing data output by two Rx Data Memory slices.
2. The slice processing data is pushed into the PXB. After a PXB interconnect delay of 3 CCs (RP, RP and EBF), the data is available at the MUL0 input ports in the form of the B and C operands (bn, cn).
3. The multiplier primitive block is designed as a pipeline. For an assumed MUL latency of Lm CCs, the data reaches the MUL output stage after (Lm + 1) CCs (there is a pipeline stage after the multiplier primitive within the MUL module).
4. The MUL0 output data is switched through the PXB (via IBF, RP, RP and EBF) and connected to ADD0.
5. For an assumed ADD latency of 1 CC, the first MAC data phase is available at the ADD0 output (mac0).

Therefore it takes Lm + 11 CC for the first MAC data phase to become available.

The process described above shows only one processing stream. It can be shown that a MAC operation can be divided into more than one concurrent flow, effectively creating a Single Instruction Multiple Data (SIMD) processing flow. Each flow processes a different section of the data stream; after all data-stream sections are done, their respective partial results need to be added.

Figure 24 shows a second processing stream running in parallel to the one described above; its processing data is read out from a different Rx Memory slice.

Figure 24 - Multiply and accumulate, concurrent and streaming calculations: a second parallel stream using MUL1 and ADD1

MAC results from these two streams need to be combined into the final results, Figure 25:


Figure 25 - Multiply and accumulate, concurrent and streaming calculations: combining the two partial MAC results

The final MAC result is written to Tx Data Memory, Figure 26:


Figure 26 - Multiply and accumulate, concurrent and streaming calculations: writing the final MAC result to the Tx Data Memory

An equivalent circuit for the above example is shown on Figure 27:


Figure 27 - Equivalent circuit for the PXB MAC example (two multiplier/accumulator streams whose partial results are combined by a final adder)

In the following analysis we assume that the multiplier and the adder are pipelined and can accept new operands on every clock (multithreading = 1). In this case the MAC calculation time can be expressed by the following equation:

Tmac = (Lrsxb + Lm + Lpxb + Ladd + dataset/concurrency) + (Lpxb + Ladd) + Ltsxb   [5]

Legend:

Tmac – total number of clock cycles [CC] required

Lrsxb – receive memory slice latency through the PXB

Lm – multiplier (MUL) latency

Lpxb – PE latency through the PXB

Ladd – adder (ADD) latency

dataset – processing data set size

concurrency – number of concurrent calculations

Ltsxb – transmit memory slice latency through the PXB

Assuming a dataset of 5000, Lm=5, Ladd=1 and the above analysed MAC PXB configuration:


Tmac = (3 + (5 + 1) + 4 + (1 + 1) + 5000/2) + (4 + (1 + 1)) + 3   [6]

Tmac = Tcalculation + Toverhead   [7]

Tmac = 2500 + 24   [8]

It can be seen that:

1. The larger the data set, the less important the PXB interconnect and PE latencies are.
2. The more parallel processing is available, the more noticeable the PXB interconnect and PE latencies become.
3. The PXB interconnect and PE latencies affect the calculation time at the beginning (initial) and at the end (settling) of the streaming data processing.

The above summary, expressed as a generic delay formula:

Tcalculation = Linitial + dataset/concurrency + Lsettling   [9]

At this point it is worth noting that a simple modification could improve how the MAC processing takes place: rather than being routed to the adder via the PXB, the multiplier data could be delivered over a direct, local MUL-to-ADD connection. However, this modification would only shorten the initial latencies, meaning that for a large data set it would not create significant benefits. This case is shown in Figure 28 with red arrows:


Figure 28 - PXB example for multiply and accumulate, concurrent and streaming calculations, with a direct local MUL-to-ADD connection (DSP PXB template)
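The delay model of equations [5]-[8] can be checked with a short script; the latencies are the values assumed above (a sketch, not the thesis tooling):

# MAC calculation time per equations [5]-[8] of Section 4.3.1.3.
def t_mac(dataset, concurrency, l_rsxb=3, l_mul=5, l_pxb=4, l_add=1, l_tsxb=3):
    # The +1 terms model the output register stage after the MUL/ADD primitives.
    initial = l_rsxb + (l_mul + 1) + l_pxb + (l_add + 1)
    settling = l_pxb + (l_add + 1) + l_tsxb
    calculation = dataset // concurrency
    return calculation + initial + settling

print(t_mac(dataset=5000, concurrency=2))   # -> 2524, i.e. 2500 + 24 clock cycles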

Section 4.6 extends this section with an analysis of the effect of PE multithreading on processing speed.

4.3.1.4 Optimizations

4.3.1.4.1 Sparse Crossbar

It was previously indicated that minimizing the number of PXB ports would reduce PXB resources. However, that may not always be possible without losing the ability to accelerate many applications.

The example in Figure 29 shows how to reduce PXB resources, possibly at the cost of reduced bandwidth for some applications.


Figure 29 – Resource-optimized PXB (sparse PXB) with selected Cross Points removed

• CP – It is unlikely that a PE would use both operands from the same PXB port. For example, PEn may use one operand from PEm, but it is unlikely to use both; therefore one CP can be removed. This CP optimization reduces the number of wires on a CPS by close to one half.
• CP – An optimization that does not compromise PXB performance at all is the removal of the SR-to-Tx-Data-Memory and Rx-to-Tx-Data-Memory CPs. This optimization is already implemented.
• EBF – These buffers may help by providing extra operation-scheduling flexibility. However, they are less necessary on long-latency PEs; it is expected that EBFs would be used on short-latency PEs.
• The number of Rx/Tx data slices could be reduced from N down to 1, without loss of performance, for applications that require serial processing of data.


4.3.1.4.2 Crossbar Selectors Crossbar selectors (multiplexers) need to select one input from a number of inputs. Their design is important to avoid FPGA backend timing issues (signal propagation delay exceeding the clock period).

The following selector circuit is driven by one-hot schedulers; the depicted selectors therefore lead to the lowest resource usage.

Figure 30 - Crossbar input multiplexing (a one-hot interconnect controller selects among the PE0 ... PE3 sources for a PE0 input)

4.3.1.5 Control

In this section we determine the number of signals required to control an N × N PXB. The non-optimized PXB is assumed to be as shown in Figure 20; for the optimized PXB refer to Figure 29. In both crossbars pass-through CPs are used (wires only).

PXB channels are defined by ingress-to-egress connections; no control signals are required for the individual CPs. On the ingress side, "valid data" signals are required. On the egress side, an input is selected by controlling the egress-side multiplexers (CPSes).

Non-optimized PXB, Figure 20, ingress side signals:

Special Registers: N × 5 bits, assuming 32 special registers

PEs: N × (1 bit write-to-buffer + 1 bit read-from-buffer) = N × 2 bits


Rx Data Memory slices: N × 16 bits, assuming a 64k address space

Non-optimized PXB egress side signals:

PEs: N × (N bits for the multiplexer one-hot select + 1 bit write-to-buffer + 1 bit read-from-buffer) = N × (N + 2) bits

Tx Data Memory slices: N × 16 bits

Therefore the total number of control signals is:

Total (a) = 5N + 2N + 16N + N × (N + 2) + 16N = N² + 41N

(for example, N = 4 gives 180 signals, matching Table 2).

The control signals for the optimized PXB (Figure 29) are found following the same process. For the optimized PXB we assume that:

• all Rx and all Tx data slices use the same address bus,
• each PE connects to N/2 PEs, and
• the selection of the PE connected to an egress port is done by a decoded bus rather than a one-hot bus.

Table 2 and Figure 31 give the required number of control signals for the non-optimized (a) and optimized (b) PXB with N = 3..16. N is driven by the number of processing elements attached to the interconnect; it is very unlikely that N would exceed 8. We use a maximum N of 16 to show the analysis trend.

Most, if not all, PXB variants would have a number of control signals somewhere between the non-optimized and optimized versions. For example, some applications may want to use more than one address bus and pass the complexity of packing the processing data to the compiler.


Table 2 - Number of PXB Control Signals

          Non-optimized (a)                          Optimized (b)
 N    Ctrl     Memory (address)   Total (a)    Ctrl    Memory (address)   Total (b)
 3     36            96              132        20            32              52
 4     52           128              180        20            32              52
 5     70           160              230        33            32              65
 6     90           192              282        33            32              65
 7    112           224              336        44            32              76
 8    136           256              392        44            32              76
 9    162           288              450        60            32              92
10    190           320              510        60            32              92
11    220           352              572        72            32             104
12    252           384              636        72            32             104
13    286           416              702        84            32             116
14    322           448              770        84            32             116
15    360           480              840        96            32             128
16    400           512              912        96            32             128
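As a cross-check, the non-optimized column of Table 2 follows directly from the per-port figures listed above; the optimized column additionally depends on encoding choices that are not reproduced in this small sketch:

# Non-optimized PXB control signal count (Table 2, columns (a)).
def pxb_signals_non_optimized(n):
    sr = 5 * n                 # special register select per egress port (32 SRs)
    pe_ingress = 2 * n         # write/read buffer strobes per PE
    pe_egress = n * (n + 2)    # one-hot CPS select plus buffer strobes per PE
    memory = 16 * n + 16 * n   # Rx and Tx data-memory slice addresses (64k space)
    return sr + pe_ingress + pe_egress, memory

for n in (3, 4, 8, 16):
    ctrl, mem = pxb_signals_non_optimized(n)
    print(n, ctrl, mem, ctrl + mem)   # e.g. 4 -> 52 128 180, matching Table 2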

Figure 31 – PXB number of control signals versus N (NoC dimension), for the non-optimized (a) and optimized (b) PXB


We therefore conclude that a careful analysis of which PXB connections are actually required may lead to significant PXB resource savings. In the above case, the resource-optimized PXB has significantly reduced interconnect resources while keeping its flexibility almost intact for most applications.

Section 5.2.3 gives an example of the VLCW used in the practical part of this project; the presented VLCW is only 57 bits wide.

4.3.2 Processing Element 2D-Mesh Implementation

In this section we look into using a 2D mesh [90][91] as the Processing Element interconnect (P2DM – Processing Element 2D mesh). The fundamental concepts of the 2D mesh are given in Section 2.4.1.2. The processing example used is matrix multiplication. The processing elements are coarsened: each contains a multiplier and an adder, together providing the multiply-and-accumulate (MAC) functionality frequently used in DSP applications.

Coarsening the processing elements brings the major advantage of minimising traffic within the interconnect. At the same time, we run the risk of having PEs that are not generic.

4.3.2.1 Array Multiplication

Here we consider multiplying matrices A and B, with matrix C as the result. The matrix size is p × p and the relevant equations are:

c11 = a11·b11 + a12·b21 + a13·b31 + … + a1p·bp1
c12 = a11·b12 + a12·b22 + a13·b32 + … + a1p·bp2
c13 = a11·b13 + a12·b23 + a13·b33 + … + a1p·bp3
c14 = a11·b14 + a12·b24 + a13·b34 + … + a1p·bp4
…
c1p = a11·b1p + a12·b2p + a13·b3p + … + a1p·bpp
…
c21 = a21·b11 + a22·b21 + a23·b31 + … + a2p·bp1


In generic form: cij = ai1·b1j + ai2·b2j + ai3·b3j + … + aip·bpj

Next we describe a simple yet effective algorithm that allows streaming matrix multiplication.

One way to evaluate the above equations in a dataflow fashion is to process consecutive C elements (c11, c12, c13, etc.) concurrently. All relevant A and B data is streamed in and the C elements are calculated. Figure 32 shows an example where 4 C elements are calculated in parallel: c11 (MAC 1), c12 (MAC 2), c13 (MAC 3) and c14 (MAC 4):

Figure 32 - P2DM matrix multiplication example: four B memory slices (Slices 1-4) feed MAC 1-4, a single A memory slice streams the a-elements, and the results c1.(1,2,3,4) are written to the C memory


In Figure 32, 4 MAC processing elements and 6 memory slices are used. The B memory is made from 4 memory slices, each providing processing data concurrently; using memory slices provides the memory speedup required for concurrent MAC operations. The A memory is made from one slice, so only one "a" element is available in any clock cycle. The C result memory is made from one slice and relies on an atomic write of the 4 results (c11, c12, c13 and c14).

The following table describes the matrix multiplication algorithm. Note that in any given processing cycle n ϵ [1:p], all MACs use the same A array element; this way we do not need as many memory slices for A as we need for B. The value ABpc represents the previous-cycle accumulator value that is added to the multiplication result of the current cycle.

Table 3 - Matrix Multiplication

Clock cycle      1          2                 3                 ...   p
MAC 1 (c11)   a11·b11   a12·b21 + AB1pc   a13·b31 + AB1pc   ...   a1p·bp1 + AB1pc
MAC 2 (c12)   a11·b12   a12·b22 + AB2pc   a13·b32 + AB2pc   ...   a1p·bp2 + AB2pc
MAC 3 (c13)   a11·b13   a12·b23 + AB3pc   a13·b33 + AB3pc   ...   a1p·bp3 + AB3pc
MAC 4 (c14)   a11·b14   a12·b24 + AB4pc   a13·b34 + AB4pc   ...   a1p·bp4 + AB4pc
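A behavioural sketch of the schedule in Table 3, written in plain Python; it processes one MAC row exactly as the table does (one shared a-element per cycle, one B slice per MAC lane):

# Behavioural model of the Table 3 schedule: the MACs compute c[i][j0..j0+lanes-1]
# by streaming one shared a-element and one b-element per lane per clock cycle.
def mac_row(A, B, i, j0, lanes=4):
    p = len(A)
    acc = [0] * lanes                        # one accumulator per MAC
    for n in range(p):                       # clock cycles 1..p
        a = A[i][n]                          # same A element for all MACs this cycle
        for lane in range(lanes):
            acc[lane] += a * B[n][j0 + lane] # each MAC reads its own B slice
    return acc                               # = [c[i][j0], ..., c[i][j0+lanes-1]]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(mac_row(A, B, i=0, j0=0, lanes=2))     # -> [19, 22]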

Memory slices are populated with data based on NoC configuration and number of relevant processing elements.

The above NoC configuration can be extended with a second NoC row containing MACs 5, 6, 7 and 8. These MACs need a new set of B slices but reuse the same A slice as the top four MACs. See Figure 33:


Figure 33 - P2DM matrix multiplication example extended with a second row of MACs (MAC 5-8), fed by additional B memory slices (Slices 5-8) and the shared A slice

Adding MACs and slices in this way can continue until all available FPGA resources are used. However, there is a limit to the number of slices, since it depends on the number of BRAMs as well as on how quickly the BRAMs can be overwritten with new A and B data.

4.3.2.2 Control Here we determine a number of control signals required for non-optimized and optimized 2D-Mesh.

Non-optimized:

• All router ports can be accessed at the same time.
• From Figure 10, each of the cross-point routers needs 3 control signals per port: two bits to define the ingress port (left, right or middle) and one signal to indicate whether the data is registered. This gives 12 control signals per router.


• All memory slices feeding data to the same 2D mesh row use the same address; therefore there are N + 1 address buses (refer to Figure 33).

For the optimized architecture we assume:

• Router ports are programmed on back-to-back clocks.
• The CP routers contain registers that hold the interconnection state. Therefore we need enough signals to identify a router port (log2(totalPorts)), two control signals to define the ingress port and one signal to indicate whether the data is registered.
• All memory slices use the same address bus.

Table 4 and Figure 34 give the required number of control signals for the non-optimized (a) and optimized (b) interconnect with N = 3..16. As indicated previously for the PXB, N is driven by the number of processing elements attached to the interconnect; it is very unlikely that N would exceed 8.

Table 4 – Number of P2DM Control Signals

              Non-optimized (a)                                       Optimized (b)
 N    CP Router   Ctrl      Memory (address)   Total (a)    Ctrl    Memory (address)   Total (b)
       Ports      Signals
 3        56        168            64              232         9           32              41
 4       120        360            80              440        10           32              42
 5       208        624            96              720        11           32              43
 6       320        960           112             1072        12           32              44
 7       456       1368           128             1496        12           32              44
 8       616       1848           144             1992        13           32              45
 9       800       2400           160             2560        13           32              45
10      1008       3024           176             3200        13           32              45
11      1240       3720           192             3912        14           32              46
12      1496       4488           208             4696        14           32              46
13      1776       5328           224             5552        14           32              46
14      2080       6240           240             6480        15           32              47
15      2408       7224           256             7480        15           32              47
16      2760       8280           272             8552        15           32              47
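A small cross-check of Table 4, taking the router-port counts in the first column as given and applying the control-signal rules stated above (3 signals per port for the non-optimized case; ceil(log2(ports)) + 3 for the optimized case):

# Cross-check of Table 4 from the stated control-signal rules.
from math import ceil, log2

router_ports = {3: 56, 4: 120, 8: 616, 16: 2760}   # sample rows of Table 4

for n, ports in router_ports.items():
    non_opt_ctrl = 3 * ports           # 3 control signals per router port
    non_opt_mem = 16 * (n + 1)         # N + 1 shared 16-bit address buses (see Figure 33)
    opt_ctrl = ceil(log2(ports)) + 3   # encoded port id + 2-bit ingress select + 1 register bit
    opt_mem = 32                       # shared address buses
    print(n, non_opt_ctrl + non_opt_mem, opt_ctrl + opt_mem)
    # e.g. n=4 -> 440 and 42, matching Table 4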


Figure 34 – 2D mesh number of control signals versus N (NoC dimension), for the non-optimized (a) and optimized (b) interconnect

Table 4, column "Non-optimized (a) Ctrl Signals", gives the number of router control signals if every router can be controlled and configured concurrently. The ability to change all router configurations concurrently comes at the cost of a huge number of control signals, making such a 2D mesh impractical for N larger than a few.

Table 4, column "Optimized (b) Ctrl Signals", gives the number of control signals if the routers are configured in different clock cycles, so that the control word selecting the router ports can be encoded. However, the optimized 2D mesh would then require thousands of clock cycles to configure all router ports.

Therefore a middle ground between practicality and performance is required.

4.3.3 Summary

In this section we analysed the Crossbar and the 2D Mesh interconnect. Both interconnects provide data-flow processing capabilities. The 2D mesh is better in terms of the number of concurrent messages travelling through it, thanks to its pipelined underlying structure. However, this comes at the expense of a high number of control signals, making the control word potentially too wide to be practical; and if the number of control signals is reduced, the 2D mesh may be too slow when its state is changed.

The crossbar itself is smaller than the 2D mesh, requires fewer control signals, allows data-flow processing and provides sufficient concurrent interconnectivity for the application at hand.

Therefore in this thesis the 2D Mesh is used for a system level interconnect where a limited number of Kernels (Electrons) are instantiated, while the Crossbar is used as the processing element interconnect within an Electron.

4.4 Processing Tile

Processing Tile (PT) is a fully functional and self-contained multi-processor entity. In this project each Processing Tile hosts 4 Electron processing cores, called Kernels.

4.4.1 Control and Data Flow From Figure 35:

• The Processing Tile instantiates a System Controller. The System Controller is connected to all PT components via control buses.
• Each Kernel contains an Electron Controller and a DynaPath (GPRs, SRs, PXB and PEs).
• The Data Memory contains the data that requires processing. The Data Memory output (thick lines represent data buses) is connected to all Kernels via routers R1 and R2.
• Routers R1 and R2 allow data from the Data Memory, as well as data from the Kernel GPRs, to be passed throughout the PT. The PT interconnect is a 2D mesh.
• The final result is output from the PT via router R2 (arrow "Result").

Processing Tile control and data flows are shown on Figure 35 and explained in the text that follows:


Figure 35 – Processing Tile control and data flow (System Controller, Kernels A-D, Data Memory, routers R1 and R2, and the Result output; labels 1a-4 correspond to the steps below)

1. The System Controller:
   a. sends a write strobe and a GPR address to Kernel C,
   b. sends a read strobe and an address to the Data Memory,
   c. sends configuration signals to R1, and
   d. sends configuration signals to R2.
2. The Data Memory:
   a. outputs data based on the read address provided by the System Controller,
   b. the data travels to R1 and from R1 to R2, and
   c. the data finally reaches Kernel C, where it is written to the GPR address provided by the System Controller.
3. The System Controller sends a start signal to Kernel C.
4. Kernel C outputs its result to the Result FIFO (see Figure 49).
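The same sequence, summarized as straight-line pseudo-Python; every method name is a hypothetical placeholder standing in for the nano-code the System Controller actually runs:

# Hypothetical model of the Figure 35 control/data flow.
def feed_kernel_and_start(system_ctrl, data_memory, r1, r2, kernel_c, src_addr, gpr_addr):
    system_ctrl.send_gpr_write_strobe(kernel_c, gpr_addr)   # 1a
    system_ctrl.send_read_strobe(data_memory, src_addr)     # 1b
    system_ctrl.configure(r1)                               # 1c
    system_ctrl.configure(r2)                               # 1d
    word = data_memory.read(src_addr)                       # 2a
    word = r2.forward(r1.forward(word))                     # 2b: Data Memory -> R1 -> R2
    kernel_c.gpr_write(gpr_addr, word)                      # 2c
    system_ctrl.start(kernel_c)                             # 3
    return kernel_c.result_fifo                             # 4: result goes to the Result FIFO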

If the Kernels work as a sequential pipeline, they store their respective results in their GPRs; the System Controller reads the result GPRs and forwards the data to the next Kernel's GPRs for further processing.


4.4.2 Memory Hierarchy

The Processing Tile memory hierarchy follows a Non-Uniform Memory Access (NUMA) model. Each Processing Tile Kernel has its own local memory (GPRs); the GPRs are low-latency memories. Each Processing Tile has one instance of a shared memory (the Data Memory) used by all Kernels.

Kernels can access data from a number of data structures. The access latency to these data structures varies:

• Kernels read input data from their respective GPRs and write intermediate results back to the same local GPRs.
• Kernels can indirectly access intermediate data from GPRs belonging to a different Kernel; this latency is larger than the latency to the local GPRs.
• Kernels can indirectly access data from the shared memory; the shared-memory access latency is larger than the latency to the local GPRs or to another Kernel's GPRs.

Figure 36 – Processing Tile NUMA hierarchy (local GPR memories per Kernel, the shared Data Memory, the System Controller and routers R1/R2)


The System Controller takes the role of a memory controller. It reads the shared memory and writes data to Kernel GPRs. It also reads intermediate data from Kernels and writes this data to different Kernels.

4.4.3 System Controller

The System Controller, after being awakened by a higher entity such as the Transfer Controller, generates all the timing and scheduling event signals required to:

1. control reading of the Data Memory,
2. control the Routers by connecting Router inputs and outputs,
3. transfer data from the Data Memory to the Kernels' General Purpose Registers (GPR),
4. transfer intermediate data between Kernel GPRs, and
5. transfer the final results from the Kernel GPRs to the Processing Tile output (Result arrow).

The System Controller is based on Electron architecture; therefore it runs nano-code as well.

4.4.4 Data Memory The Data Memory is built as 1R1W memory. The write port is used by a higher order entity such as the Transfer Controller. The read port is controlled by the nano-code running on the System Controller.

4.4.5 Router The Router (Section 2.4.1.2) is a structure providing 4 input and 4 output ports. What input and output ports are connected is controlled by the nano-code running on the System Controller. The Routers are used to move processing data, intermediate data and final results throughout the Processing Tile.

4.4.6 Kernel Kernels are instantiations of the Electron processor. Each Kernel executes part of a program and passes intermediate data to the next Kernel.

In this project each Processing Tile instantiates 4 Kernels. The goal is to map the host application's acceleration task onto the 4 Kernels for the best overall program execution. As will be shown in Section 5.2.6, dividing the host application's acceleration task so that each Kernel executes its part of the program with a similar time delay provides the best overall performance (Section 4.8.2.2).

4.5 Electron

4.5.1 Electron Topology

Figure 37 – Electron Topology (Code Memory, Code Stream unit, Electron Controller, Wait Counter, and the DynaPath: General Purpose Registers, Special Registers, PXB, Operand Capturing, Processing Elements and Result Capturing; the numbered labels 1-12 correspond to the steps below)

1. The System Controller writes application data to the GPRs. The GPRs are organized as a 1W1R memory. Both ports are shared by the System Controller and by Electron itself. This sharing presents no processing bandwidth issues, since most of the time only one of the two controllers needs to access the GPRs.

2. System Controller starts Code Stream controller


3. Code Stream controller starts reading Code Memory by providing an address.

4. Code Memory outputs instruction. Code Stream unit does partial instruction decoding looking for a program Branch instruction or program Done.

5. If instruction is not a branch instruction, Code Stream unit just forwards instructions to Electron control state machine.

6. Electron processor’s state machine parses instructions and forwards clock cycle accurate control signals fanning them out to modules that require them.

6a. Electron creates an address and a read/write signal. If a GPR read was requested, the GPR data is output and placed on PXB. If GPR write was requested PXB data (intermediate data or the final result) is written to a GPR.

6b. Special Registers (SR) are accessed similarly to GPRs, but SRs can be accessed in parallel. Special Registers store values often required for equation evaluation: 0, 1.0, -1.0, 0.5, etc. Properly specified Special Register values can make the existing processing elements more usable; for example, multiplying a value by 0.5 replaces a divide by two.

6c. The Operand Capture stores the operands required by the Processing Elements. Operands can be values provided by the Special Registers, the GPRs or the Processing Elements via the PXB. These values may have different arrival times; therefore they may need to be stored until used. This module can be a FIFO or just a one-register flip-flop stage. In this project flip-flop based capturing is used for all Processing Element inputs.

6d. Processing Elements is a pool of mathematical primitives or potentially more complex coprocessors. Operands required by Processing Elements are forwarded by Operand Capturing logic.

6e. The Result Capture stores the results (final results or intermediate data) created by the Processing Elements. Processing Element results may have different arrival times; therefore they need to be stored until used. This module can be a FIFO (as simple as a one-entry-deep FIFO) or avoided completely. In this project a flop-based capturing stage is implemented on all Processing Element outputs.


7. Captured Processing Elements results are made available for further processing via PXB.

8. The final result(s) is written to GPR. The write operation is controlled by the Electron.

9. If an instruction requires Electron to wait (due to Processing Element latencies), Electron loads the Wait Counter with the value provided by the instruction. This feature significantly reduces the code space and the code memory size.

10. Electron may encounter a branch instruction (a program "if" statement). Whether the branch is taken may depend on a value currently being evaluated by a Processing Element or on a GPR value; in either case the branch decision is the result of some sort of compare operation. The compare operands can be intermediate data available from the Processing Elements, the Special Registers ("greater than 0", for example) or the GPRs.

11. If the instruction is a branch, the Code Stream unit forwards the instruction and stops executing the program until the Electron Controller tells it whether the branch is taken. If the branch is taken, the Electron Controller also provides a relative address; the Code Stream unit adds the relative address to the current address to arrive at the new program address. This approach allows multi-branch evaluation (not implemented in the current hardware).

12. The final result is read out by System Controller from Electron GPR.
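A behavioural sketch of the Code Stream / controller loop described in steps 3-11; this is plain Python modelling only (the dynapath object and the instruction fields are hypothetical), not the RTL:

# Hypothetical behavioural model of the Electron code-stream loop:
# WAIT holds the DynaPath state, branches add a relative offset to the PC.
def run_electron(code_memory, dynapath):
    pc = 0
    while pc < len(code_memory):
        instr = code_memory[pc]                       # steps 3/4: stream, no decode stage
        if instr.get("done"):
            break
        if instr.get("wait", 0):                      # step 9: programmable idle loops
            dynapath.hold_state(cycles=instr["wait"])
        elif "branch_offset" in instr:                # steps 10/11: compare-driven branch
            if dynapath.compare_result():
                pc += instr["branch_offset"]
                continue
        else:
            dynapath.apply(instr)                     # step 6: fields drive the data path
        pc += 1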

4.5.2 DynaPath DynaPath (Dynamically configurable Data Path) consists of the processing element crossbar (Sections 2.4.1), control signals, Special and General Purpose Registers and all Processing Elements attached to the crossbar.

DynaPath allows us to change how the Electron data path is structured on a clock-by-clock basis.


Figure 38 – Electron DynaPath (Special Registers, GPR and PEIW-wrapped Processing Elements: Divider, Multiplier, Adder, Exponent, Absolute and Compare; all PEIWs, the GPR and the Special Registers are controlled by the Electron Controller, and the compare output is routed to the Electron Controller for branch instruction support)

From Figure 38, the General Purpose Registers (GPRs) are used to store processing data, intermediate data (intermediate-level results) and final results. There are 16 32-bit GPRs. The GPRs are designed as a register-based memory with one write and one read port; both ports are shared with the System Controller. Since the System and Electron controllers do not access the GPRs at the same time, port sharing presents no problem. The final results obtained by the Kernel are available to the System Controller via the GPR read port.

The Processing Element Interface Wrapper (PEIW) provides all PE interfacing services and connects directly to the crossbar. Under the Electron's direction, the crossbar input selectors located in the PEIW select one of the many crossbar outputs, independently for each of the two PEIW input ports. This independent input selection is used to deal with unbalanced PE latencies: we capture one operand and hold it until the second operand is available. The output flops are not strictly required; if used, their purpose is to make timing closure easier. A PE can loop its output back to one of its inputs. The loopback functionality could be achieved via the crossbar; however, in this project we opted for the sparse crossbar and a PE-internal loopback.

The compare PE is used to support a branch instruction. The compare output is sent back to the Electron Controller. The controller uses this signal to determine how to increment the program counter. The compare output is the only signal that goes outside of PXB.

4.6 Processing Elements

In order to increase the project productivity all algorithmic Processing Elements are created from Xilinx ISE Core Generator toolset.

Xilinx floating point primitives are used as part of hardware kernels called DynaPath. These primitives could be generated to offer:

• Pipelined processing, where new operands can be accepted on every clock and results are output on back-to-back clocks after the initial latency. These primitives tend to be large.
• Non-pipelined primitives, where new operands can be processed only once the result of the previous operands has been output. These primitives are smaller than their pipelined versions (per the Xilinx datasheet, a floating-point primitive that can output new results every second clock is about half the size of one that can output results on every clock). Furthermore, these primitives can come in a short-latency version.
• Something between the pipelined and non-pipelined versions.

At first sight, the use of fully pipelined primitives would allow very powerful processing kernels to be built, but they would be large, so fewer kernels could be instantiated inside an FPGA. However, a case can be made that only a limited set of applications could take full advantage of such primitives; those applications would need to be highly regular in nature, such as multiply and accumulate.

21 As per the Xilinx datasheet, a floating point primitive that can output new results on every second clock is two times smaller than one that can output a result on every clock.

As an example consider Figure 39.

[Figure 39 diagram: primitives P0 and P1 feed P2; P2 and P3 feed P4.]

Figure 39 – Pipelined versus Non-Pipelined Primitives

From the figure, assume operation P3 takes place only once for the kernel’s run-time. Its output is used by P4. P4 also uses output from P2. P2 is connected to P0 and P1 and uses P0 and P1 outputs as operands. Having P3 fully pipelined would be wasteful since this primitive is used only once.

As another example consider Figure 40. This time, outputs from P0 and P1 are used by P2 and P5 – but at different times. Therefore building P0 and P1 as pipelined primitives is beneficial.


[Figure 40 diagram: primitives P0 and P1 feed both P2 and P5.]

Figure 40 – Pipelined versus Non-Pipelined Primitives

For the purpose of this project, a preliminary analysis of the Black-Scholes formula revealed that, due to sub-operation dependencies, there are not many opportunities to use processing elements in a pipelined fashion. The only processing element that could take advantage of a pipelined version proved to be the multiplier. Therefore a design choice was made to make all processing elements except the multiplier non-pipelined.

It is clearly understood that this generalization is not acceptable for all applications. Further study would need to be performed, or a device could be built in which some Processing Tiles use primitives with different properties, thereby offering an optimum between logic size, flexibility and processing speed.

Next consider the MAC (multiply and accumulate) example provided in Section 4.3.1.3. There we analysed performance with the multiplier and the adder pipelined with a multithreading of 1 (a new result available on every consecutive clock cycle). Figure 41 displays how processing time is affected by PE multithreading. As an example, if the multithreading is four, the multiplier can accept a new set of input operands every four clocks. In this case the processing time is four times longer than with a multithreading of one. It should be noted that multithreading also affects latency; however, in this example latency can be neglected (Section 4.3.1.3).
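A minimal sketch of the relationship plotted in Figure 41 follows; the operand count and pipeline latency used here are assumed example values, not figures taken from Section 4.3.1.3.

#include <stdio.h>

/* Estimated MAC run time in clock cycles for a pipelined PE whose
 * multithreading interval is 'interval' (a new operand pair can be
 * issued every 'interval' clocks). 'nOps' and 'latency' are assumed
 * example values, not figures taken from the thesis. */
static unsigned long mac_cycles(unsigned long nOps,
                                unsigned interval,
                                unsigned latency)
{
    return nOps * interval + latency;   /* latency is paid only once */
}

int main(void)
{
    const unsigned long nOps = 2500;    /* assumed workload size    */
    const unsigned latency = 10;        /* assumed pipeline latency */

    for (unsigned m = 1; m <= 8; m++)
        printf("multithreading %u -> %lu cycles\n",
               m, mac_cycles(nOps, m, latency));
    return 0;
}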

[Figure 41 chart: processing time in clock cycles (0 to 25000) versus PE multithreading (1 to 8).]

Figure 41 – Processing Time versus PE multithreading

Summary: The main point of this section is that the Processing Element selection and features are of high importance. They affect DynaPath size and consequently number of available Kernels (Section 5.4). They affect Kernel performance indirectly (size and number of available Kernels) and directly (multithreading effect).

4.7 Coprocessor Granularity and Parallelism

On the Coprocessor level, if all Processing Tiles and Kernels execute the same program, the complete Coprocessor works as a SIMD device.

The Processing Tile (PT) provides three levels of combined granularity and parallelism [106]. They are shown in Figure 42. All three levels are statically controlled by the nano-code compiler.

Coarse grained – the System Controller schedules all Kernels, taking the role of the program's outer "for" loop. The System Controller hosts the counters required by the "for" loop. In the practical part of this thesis, Chapter 5, System Controllers execute the outer "for" loops.


Fine grained – processing Kernels provide fine-level parallelism. They can execute the program's inner "for" loops, smaller subroutines or subtasks. The System Controller controls and schedules the Kernels. In the practical part of this thesis, Kernels execute subtasks.

Instruction Level Parallelism (ILP) – every individual Kernel contains parallel data paths controlled by a VLCW control word. A number of instructions (arithmetic operations) can be executed in parallel. Besides ILP Kernels offer SIMD concept, see Section 4.3.1.3.

[Figure 42 diagram: the System Controller (coarse grained) schedules Kernels A to D (fine grained); each Kernel provides ILP; the Kernels share the Data/Result Memory.]

Figure 42 – Processing Tile Parallelism

4.8 Application Task Mapping to Kernels

In this section we look into mapping an application into Kernels. We are interested in performance obtained by two different FPGA configurations as well as how Computation to Communication ratio influences FPGA application task partitioning.

4.8.1 Serial and Parallel Scheduled Kernels

Section 2.7 deals with Amdahl's law. There it is indicated that the law sparked debates over multi-core processor architectures. Here we look into this particular implication of Amdahl's law. In addition, we define two regions and analyse how Amdahl's law applies to the Coprocessor described in this thesis. The regions are defined by the relation between the sequential and parallel application sections.


4.8.1.1 Many versus few processing cores

We are interested in the performance obtained by two different FPGA configurations. Both configurations are based on the same FPGA architecture with 6 Processing Tiles, each having 4 Kernels. Configurations:

1. The application is divided over all 24 available FPGA Kernels, where each Kernel runs the complete task until completion. We consider this a "many small cores" architecture.

2. The application is divided over all available FPGA Processing Tiles. Each Processing Tile runs the complete task until completion while its Kernels run application sub-tasks. We consider this a "few powerful cores" architecture. For more information see Section 4.8.2.2.

We assume the Black-Scholes computation. For FPGA Configuration 1 the complete Black-Scholes computation is run by each Kernel. For FPGA Configuration 2 see Figure 50 and Section 5.2.6. There it is shown that Processing Tile Kernel 1 takes ST1 (K1), Kernels 2/3 take ST2/ST3 (K2A/K2B) and Kernel 4 (K3) takes ST4. Kernels 2 and 3 execute in parallel.

We also assume that many workloads are available (this is mandatory for CPU off-board Coprocessors, which are meant to do "for" loop processing) and that the memory subsystem does not limit the availability of new processing tasks.

Table 5 shows how the Black-Scholes computation is divided into four subtasks (ST1 to ST4). ST2 and ST3 are of the same length; they perform the same processing. Next we consider three Scenarios (SC1 to SC3). SC1 describes the case from the practical part of this thesis (see Chapter 5). In SC2, subtasks ST2 and ST3 are shortened from 136 to 90 cycles. In SC3 the subtasks are rebalanced and of the same length.

Table 5 – Processing Subtasks in Clock Cycles (CC)

Subtask / Scenario   ST1   ST2   ST3   ST4   Total [CC]
SC1                  104   136   136    82   458
SC2                  104    90    90    82   366
SC3                  100   100   100   100   400


Figure 43 depicts the FPGA Configuration 1 (bars PSC1, PSC2 and PSC3) and Configuration 2 (bars SSC1, SSC2 and SSC3) throughput results expressed in Millions of Operations per Second, where an operation is one Black-Scholes evaluation.

Legend:

PSCn, P stands for Parallel, each Kernel processes application in parallel and until completion. SCn stands for Scenario one, two or three.

SSCn, S-stands for Serial, each Kernel from a Processing Tile processes the application subtask. SCn stands for Scenario one, two or three.

[Figure 43 chart: Black-Scholes throughput in MOPS for the serial (SSC1-SSC3) and parallel (PSC1-PSC3) scheduled Kernels, plotted for 4, 5 and 6 Processing Tiles.]

Figure 43 – Black-Scholes Throughput for 2 FPGA Configurations

From Figure 43 we conclude that, regardless of the number of Kernels/Processing Tiles, Configuration 1 outperforms Configuration 2 in all cases except when the subtasks are of the same length (cases PSC3 and SSC3). In that case both Configurations produce the same calculation throughput. The reason Configuration 1 performs better is a consequence of how the serial Kernels are scheduled in Configuration 2: the slowest Kernel determines the speed of the complete Processing Tile. A new processing task cannot be scheduled until the slowest Kernel is done, so we have a Kernel run-time rounding effect. Configuration 1, on the other hand, does not have this rounding effect; its Kernels run exactly as long as required.
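The rounding effect can be captured with a small throughput model, sketched below under the assumption that router and GPR overheads are ignored, so the absolute MOPS values are illustrative rather than measured.

#include <stdio.h>

/* Throughput model for the two FPGA configurations of Section 4.8.1.1.
 * Overheads (router, GPR access) are ignored. */
#define FREQ_HZ   200e6     /* Processing Tile clock */
#define TILES     6
#define KERNELS   (TILES * 4)

static double parallel_mops(double total_cc)      /* Configuration 1 */
{
    return KERNELS * FREQ_HZ / total_cc / 1e6;
}

static double serial_mops(double longest_cc)      /* Configuration 2 */
{
    return TILES * FREQ_HZ / longest_cc / 1e6;
}

int main(void)
{
    /* Scenario SC1 from Table 5: ST1..ST4 = 104, 136, 136, 82 cycles. */
    printf("SC1: parallel %.1f MOPS, serial %.1f MOPS\n",
           parallel_mops(104 + 136 + 136 + 82), serial_mops(136));
    /* Scenario SC3: four balanced subtasks of 100 cycles each. */
    printf("SC3: parallel %.1f MOPS, serial %.1f MOPS\n",
           parallel_mops(400), serial_mops(100));
    return 0;
}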


4.8.1.2 Amdahl's Law Applicability

For Amdahl's law applicability to an FPGA Coprocessor see Figure 44. This figure shows two cases:

Case 1 - parallel code section (Tp) is shorter than serial (Ts). In this case adding more parallel capabilities to reduce Tp would lead to no overall throughput improvement. The serial code section dictates when a new set of data can be scheduled in (see Sections 4.8.2.2 and 5.2.6).

Case 2 - parallel code section (Tp) is longer than serial (Ts). In this case adding more parallel capabilities to reduce Tp would lead to the overall throughput improvement.

[Figure 44 diagram: two timelines, one with Tp < Ts and one with Tp > Ts.]

Figure 44 – Amdahl’s Law applicable to FPGA Coprocessor

4.8.2 Computation to Communication Ratio

If a memory subsystem were 100% efficient it would provide new data on every clock. In that case the processing subsystem could be designed as a pure sequential pipeline, and this pipeline would produce a result on every clock.

However, most memory subsystems are not able to deliver data on every clock. A well planned design should have the memory subsystem and the processing logic matched in speed. If they are not matched:

 Processing is too slow – the memory subsystem will be flow controlled
 Processing is too fast – we have an excess of processing resources

In this section we examine "Computation to Communication" [6] ratio scenarios. First we analyse an application that allows parallel processing, followed by an application that requires sequential processing.


4.8.2.1 Parallel Processing

We show that increasing the computing power benefits the application processing time up to the point where the communication speed wall is reached. Further processing speed improvements can then be made by increasing the computation workload: coarsening the processing kernels and organizing the memory subsystem so that more processing can be done for the same amount of data.

In a spreadsheet that follows we use matrix multiplication22 as an example although the complete analysis is generic. We consider FPGA resources such as the number of processing kernels and the clock rate (computation) versus the external memory bandwidth (communication).

As will be shown, at one point the communication speed limits the computation speed. This is the point where the communication infrastructure cannot deliver enough data to keep all FPGA resources operating concurrently.

22 Matrix multiplication allows concurrent processing.

Table 6 - FPGA resources and Processing Speed for a Matrix Multiplication

Kernels   Theoretical Computation Time/BW            DDR3 Achievable
          Clocks       Time [s]     [Maps]           Time [s]
 1        1.00E+12     5000.0        200.0           5000.0
 2        5.00E+11     2500.0        400.0           2500.0
 3        3.33E+11     1666.7        600.0           1666.7
 4        2.50E+11     1250.0        800.0           1250.0
 5        2.00E+11     1000.0       1000.0           1250.0
 6        1.67E+11      833.3       1200.0           1250.0
 7        1.43E+11      714.3       1400.0           1250.0
 8        1.25E+11      625.0       1600.0           1250.0
 9        1.11E+11      555.6       1800.0           1250.0
10        1.00E+11      500.0       2000.0           1250.0
11        9.09E+10      454.5       2200.0           1250.0
12        8.33E+10      416.7       2400.0           1250.0
13        7.69E+10      384.6       2600.0           1250.0
14        7.14E+10      357.1       2800.0           1250.0
15        6.67E+10      333.3       3000.0           1250.0
16        6.25E+10      312.5       3200.0           1250.0

DDR3 External Memory: Devices 1, Width 16, 25600 [Mbps], 800 [Maps]

Legend: FPGA clock 5.00E-09 s (5 nS); Matrix dimension 10000; Matrix elements 100 M; Data Width 32 bits; Maps = Million Accesses per Second.

From the above spreadsheet:

 The following computation parameters are varied: the number of available computation kernels and the system clock rate. With a fixed communication speed (800 MHz DDR3), we differentiate two major computation corner cases: (a) slow processing speed: system clock 5 nS (200 MHz), the case shown in Table 6; (b) fast processing speed: system clock 2.5 nS (400 MHz).
 For the given matrix size and number of kernels we calculate the Theoretical Computation Time/BW. The [Maps] column defines the required communication throughput (external memory access rate) so that the FPGA processing engine is not stalled.
 For the external memory transfer speed defined under DDR3 External Memory we calculate the external memory bandwidth in [Maps]. DDR3 technology overhead (refresh rate, bank open/close or tRC requirements) is not considered.
 For the given matrix size, number of kernels and FPGA clock rate we calculate the matrix multiplication DDR3 Achievable Time based on what the external memory can offer in terms of speed.
 For each of the two FPGA clock rate scenarios we calculate the DDR3 Achievable Time, i.e. the computation time where the communication speed limits the processing time. For a 5 nS FPGA clock the external memory bandwidth limits the processing time to a minimum of 1250 [s] if 4 or more kernels are used. (A code sketch of this calculation follows the list.)
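The spreadsheet rows of Table 6 can be reproduced with the short model below (a sketch that, like the table, ignores DDR3 protocol overhead and assumes one external memory access per multiply-accumulate); the ops_per_datum argument anticipates the workload coarsening discussed with Figure 46.

#include <stdio.h>

/* Computation-versus-communication model behind Table 6 and Figures 45/46.
 * 10000x10000 matrix multiply: 100 M result elements, 10000 multiply-
 * accumulates each, 32-bit data, DDR3-1600 x16 => 800 million 32-bit
 * accesses per second (Maps). */
#define TOTAL_OPS  1.0e12    /* 100e6 elements x 10000 MACs             */
#define DDR3_MAPS  800e6     /* accesses/s the external memory supplies */

/* Time if the FPGA kernels were never starved of data. */
static double compute_time(double clk_ns, int kernels)
{
    return TOTAL_OPS * clk_ns * 1e-9 / kernels;
}

/* Time needed just to stream operands, assuming each datum fetched from
 * DDR3 feeds 'ops_per_datum' operations (the coarsening of Figure 46). */
static double comm_time(double ops_per_datum)
{
    return TOTAL_OPS / ops_per_datum / DDR3_MAPS;
}

int main(void)
{
    for (int k = 1; k <= 16; k++) {
        double tc = compute_time(5.0, k);   /* 5 nS clock, as in Table 6 */
        double tm = comm_time(1.0);         /* 1 operation per datum     */
        printf("%2d kernels: achievable time %.1f s\n", k, tc > tm ? tc : tm);
    }
    return 0;
}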

The following chart gives us a graphical version of the above defined spreadsheet.


[Figure 45 chart: calculation time in seconds versus number of kernels (1 to 16), for DDR3-800 x16 and FPGA clocks of 2.5 and 5 nS.]

Figure 45 - 10k by 10k matrix multiplication time and external memory system bandwidth limitation

Our desire is to reduce the processing time as much as possible. Adding more computation power to FPGA will shorten this time however after one point increasing computation capabilities has diminishing or limited effects due to communication speed limits.

From the graph shown in Figure 45, going from 200 MHz (5 nS) up to 400 MHz (2.5 nS) halves the processing time (a 50% decrease) when the number of Kernels is 1 or 2, as expected from the clock rate change.

If the number of Kernels is 3, the two times faster clock gives only a 25% processing time decrease. A further increase in the number of Kernels does not yield any improvement: the improvement in computation speed is limited by the communication speed.


To improve FPGA processing efficiency it is desirable that the acceleration task performs several operations23 per set of input data, so that the effect of the interconnect bottleneck is minimized. In other words, adding more workload to the computation part of the "Computation to Communication" equation has the positive effect of minimizing the acceleration system's dependency on the communication speed.

However this increase of computation workload may come with the cost of more complex algorithms. This cost is passed to the application parser and control code compiler. Their task is to optimize and structure input data as well as to group processing operations into kernels.

On the graph shown on Figure 46, it is assumed that Kernels for each set of data execute 1, 2, 4 and 8 operations - Kernel’s workload is increased by two, four or eight times.

[Figure 46 chart: calculation time in seconds versus number of kernels (1 to 16), for DDR3-800 x16, a 2.5 nS FPGA clock and Kernel coarse levels of 1, 2, 4 and 8 operations per data set.]

Figure 46 - Increasing Computation Workload Minimizes Communication Speed Wall Effect

23 As an example, instead of calculating a matrix multiplication followed by a matrix inversion, find an algorithm that can do the multiplication and the inversion at the same time on a subset of the data.

What this graph tells us is that increasing the kernel workload decreases the system's dependency on the communication speed wall.

Summary:

Computation time and communication speed are tightly coupled. Increasing computation capabilities above the kernel-to-kernel or kernel-to-memory communication bandwidth brings no advantage. However, increasing the computation workload for the same amount of processing data avoids the limiting effect of the communication speed wall. Increasing the workload may require enhanced software pre-processing algorithms and coarser grained processing elements (kernels).

4.8.2.2 Sequential Processing

Not all computation tasks can be divided into parallel sub-tasks. Some are sequential due to operation dependencies. Amdahl's Law tells us that such a sequential task determines the application speedup factor. Some serial tasks can be divided into serial sub-tasks by creating a pipeline of processing units (Kernels), Figure 47. Each Kernel performs a different serial sub-task and passes data and/or intermediate data to the next Kernel. The last Kernel outputs the final result.

[Figure 47 diagram: the Data Memory Subsystem feeds Kernel 1; Kernels 1 to n pass processing and intermediate data down the pipeline; Kernel n outputs the results.]

Figure 47 – Kernel Pipeline

The following figure depicts Kernel processing in time.


[Figure 48 timing diagram: data sets Data 0, Data 1 and Data 2 arrive every Td; each data set traverses Kernels K1 to Kn over the total workload time Twl; Results 0, 1 and 2 emerge in sequence.]

Figure 48 – Kernel Pipeline – Timing Diagram

Therefore the number of Kernels required is N = Twl / Td (rounded up), where Twl is the total workload time per data set and Td is the interval at which new data sets arrive.

It is unlikely that a processing task can be subdivided into N sub-tasks of equal duration. Two cases can be differentiated:

 The longest Kernel time is shorter than or equal to Td. This case is not a problem; we simply allow each Kernel to run for the Td duration. The figure above depicts this case.
 The longest Kernel takes more time than Td. This case is a problem, since the faster Kernels now need to be scheduled to run as long as the longest Kernel. As a consequence we need to slow down reading from the Memory Subsystem; in effect we insert bubbles in the pipeline.

In conclusion, we would like to divide the processing workload into Kernels of similar duration, where each Kernel individually has a run time equal to or less than Td.

For more analysis of how many Kernels to instantiate refer to Section 5.3. There we use our experimental setup and determine the maximum number of Kernels such that our processing speed does not unnecessarily exceed the input data bandwidth.
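The two cases above can be checked with the short sketch below; the sub-task durations reuse the Black-Scholes kernel times from Section 5.2.6, while Td is an assumed value chosen only for illustration.

#include <math.h>
#include <stdio.h>

/* Pipeline sizing sketch for Section 4.8.2.2. Twl is the total serial
 * workload per data set and Td the interval at which the memory
 * subsystem delivers a new data set (both in clock cycles). */
int main(void)
{
    const double Td = 120.0;                          /* assumed value         */
    const double subtask[] = { 104.0, 136.0, 78.0 };  /* per-kernel run times  */
    const int n = sizeof subtask / sizeof subtask[0];

    double Twl = 0.0, longest = 0.0;
    for (int i = 0; i < n; i++) {
        Twl += subtask[i];
        if (subtask[i] > longest) longest = subtask[i];
    }

    printf("kernels needed for full rate: %.0f\n", ceil(Twl / Td));
    if (longest <= Td)
        printf("longest kernel (%.0f) fits in Td, no pipeline bubbles\n", longest);
    else
        printf("longest kernel (%.0f) exceeds Td, input stalled by %.0f cycles\n",
               longest, longest - Td);
    return 0;
}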


Chapter 5 - FPGA-based Coprocessor Implementation

5.1 FPGA Design

In this section we give a system level operation description. The system diagram shown in Figure 49 is based on the FPGA Coprocessor from Figure 1824.

[Figure 49 diagram: the Host connects over a PCIe slot (1.0, x8) to the PCIe Core and SpeedyPCIe Core on the FPGA; the Transfer Controller sits between SpeedyPCIe and Processing Tiles 1 to N over 32-bit interfaces; Slave Write data flows toward the tiles and Slave Read data returns from Result FIFOs 1 to N; the numbered arrows 1-10 correspond to the steps below.]

Figure 49 – System Level Topology

24 Result FIFOs are shown outside of the Processing Tiles for easier data flow presentation.

1. The host runs an application program. The application program requires acceleration and places a request to the FPGA. This request is made as a Slave Write request carrying Control information and the Data that requires processing. An example of Control information is the size of the request and a Start flag.

2. The Write request reaches the PCIe Core (Xilinx IP) instantiated on the FPGA. The PCIe Core takes care of the low level PCIe protocol tasks.

3. Speedy PCIe (an open source IP) receives the Write data and forwards it to the Transfer Controller. This block handles a number of tasks, such as the DMA required to pass data to and from the FPGA.

4. The Transfer Controller is a block designed for this project. It parses the Control information, splits the Data to be processed and delivers the split Data to the Processing Tiles. Each Processing Tile (PT) performs the same processing task; therefore they provide a speedup equal to the number of tiles, in this case N. However, Tiles do not have to be used for parallel processing; they could run different sub-tasks in a sequential fashion. The focus in this thesis is on parallel processing at the PT level.

5. Processing Tiles, developed for this project, are multi-core modules based on the Electron processor. Data sent to a Processing Tile is processed, and the results are output and stored in Result FIFOs. Result FIFOs are temporary storage; they keep results until a Slave Read request is issued by the Host. At the same time, Result FIFOs provide buffering elasticity, which is required if the processing time (nano-code execution path) differs between tiles (e.g. as a result of branch statements).

6. If a Slave Read request has been placed by the Host, the Transfer Controller simultaneously reads all Result FIFOs and performs the required bus width conversions.

7. The Transfer Controller forwards the results to the Speedy PCIe Core following the Speedy PCIe interface protocol.

8. Speedy PCIe forwards the Slave Read data to the PCIe Core.

9. The PCIe Core forwards the Slave Read data to the Host.

10. The Slave Read data is stored in the Host's system memory, making it available to the application program.


5.2 Programming

5.2.1 Sequential versus run-to-completion model

There are two fundamental ways to run programs:

 Sequential programming model: a few Kernels work on different program sections in a sequential manner.
 Run-to-completion model: each Kernel runs the program to completion.

Each model has different benefits. Electron Processing Tiles could be organized to run either one; however, we use the sequential model in this thesis. For more information see Section 4.8.

5.2.2 Black-Scholes Pseudo Code

In order to maximize the use of Processing Tile resources the Black-Scholes equation is divided into 3 separate units. Each unit is hand compiled into a nano-code program. These 3 individual programs are mapped to run on 4 Electrons (one complete Processing Tile); two Electrons run the same program in parallel. In this project we use the term hardware Kernel, or just Kernel, to describe an Electron running nano-code. Therefore there are 4 Kernels, 2 of which execute the same program: K1, K2A, K2B and K3. Kernels K2A and K2B run the same program.

Programs are created so that each Kernel will process a part of the problem. Program output is intermediate data used by the next Kernel. Therefore processing is sequential.

In the following text, pseudo-code for computation of the Black-Scholes formula is given.

For Black-Scholes formula nomenclatures used in the following text refer to Section 2.8.3.

Kernel 1 (K1): sqrtT = sqrt(t); d1 = (ln(S / K) + (r + 0.5 * v * v) * t) / (v * sqrtT); d2 = d1 - v * sqrtT;


Kernel 2 (K2A and K2B):

This Kernel takes the output d (d1 or d2) from K1 and computes the Cumulative Normal Distribution (CND) as: t = 1.0 / (1.0 + b0 * fabs(d));

CND = rsqrt2pi * exp(-0.5 * d * d) * (t * (b1 + t * (b2 + t * (b3 + t * (b4 + t * b5)))));
if (d > 0)

CND = 1.0 - CND;

This kernel is instantiated two times: once for output d1 from K1 (processed by K2A) and once for output d2 from K1 (processed by K2B). K2A outputs CND1 and K2B outputs CND2. Also note the use of the if statement above; Electron nano-code supports the branch instruction. It should be noted that K2 is the most demanding Kernel due to the serialized nature of computing the CND value, which closely resembles an unrolled for loop.

Kernel 3 (K3): expRT = exp(-r * t); expDT = exp(-d * t); price = CND1 * S * expDT - CND2 * K * expRT;

Output of this Kernel is the final Black-Scholes result – Call Price.
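For reference, the three kernels above assemble into the following plain C routine (a sketch: the constants b0 to b5 and rsqrt2pi are the commonly used Abramowitz-Stegun values, which the thesis keeps in data memory rather than in code, and the rate stored as d in Data Memory is renamed q here to avoid clashing with d1/d2).

#include <math.h>
#include <stdio.h>

/* Constants assumed from the standard Abramowitz-Stegun CND
 * approximation; the thesis loads b0..b5 from data memory. */
static const double b0 = 0.2316419,   b1 = 0.319381530, b2 = -0.356563782,
                    b3 = 1.781477937, b4 = -1.821255978, b5 = 1.330274429;
static const double rsqrt2pi = 0.3989422804014327;   /* 1/sqrt(2*pi) */

/* Kernel 2 (K2A/K2B): cumulative normal distribution of d. */
static double cnd(double d)
{
    double t = 1.0 / (1.0 + b0 * fabs(d));
    double c = rsqrt2pi * exp(-0.5 * d * d) *
               (t * (b1 + t * (b2 + t * (b3 + t * (b4 + t * b5)))));
    return (d > 0.0) ? 1.0 - c : c;
}

/* K1, then K2A/K2B, then K3, chained as in the kernel decomposition above. */
static double black_scholes_call(double S, double K, double t,
                                 double r, double q, double v)
{
    /* Kernel 1 */
    double sqrtT = sqrt(t);
    double d1 = (log(S / K) + (r + 0.5 * v * v) * t) / (v * sqrtT);
    double d2 = d1 - v * sqrtT;

    /* Kernels 2A and 2B (run in parallel on the FPGA) */
    double cnd1 = cnd(d1);
    double cnd2 = cnd(d2);

    /* Kernel 3 */
    double expRT = exp(-r * t);
    double expQT = exp(-q * t);
    return cnd1 * S * expQT - cnd2 * K * expRT;
}

int main(void)
{
    /* illustrative inputs only */
    printf("call price: %f\n", black_scholes_call(100.0, 95.0, 0.5, 0.02, 0.0, 0.3));
    return 0;
}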

Sequential Kernel processing of the Black-Scholes equation is depicted in the following figure.


[Figure 50 diagram: the Data Memory (S, K, T, r, d, v, b0 to b5) feeds K1; K1 feeds K2A and K2B in parallel; K2A and K2B feed K3, which outputs the Price.]

Figure 50 – Black-Scholes mapped to hardware Kernels

5.2.3 Nano-Code VLCW Instruction

Nano-code is a low level program that is loaded into the Electron's Code Memory, see Section 4.5.1. Sections 3.3.6, 4.3.1.5 and 4.3.2.2 explain the importance of shorter instruction formats, namely the smaller instruction memory footprint (compared to classical VLIW architectures, Section 3.3.6) and reduced wire congestion. The Code Stream unit reads the Code Memory and streams instructions to the Electron's main Controller. The Electron's Controller executes the instructions and provides the control signals for DynaPath.

The following figure depicts a nano-code VLCW instruction. Each instruction word describes data and control information flow on every clock cycle. We use Kernel K2 as an example (Section 5.2.2). The only difference between Kernels is the number and type of processing elements connected to PXB.

Each K2 instruction controls up to 8 hardware components: 6 floating point arithmetic components, one GPR read/write and one SR read in the same clock cycle. Instructions can have many combinations of instruction fields. They may:

 Instruct the Electron to provide certain inputs to a PE,
 Write a PE output to a GPR,


 Return PE results to the inputs of the same PE, or
 Control the PXB so that PEs are interconnected in a predefined data flow path.

[Figure 51 field breakdown (8b, 8b, 8b, 9b, 8b, 4b, 4b, 8b): load calculation results to GPR(n); Absolute control (latch operand and operand location); Exponent control (latch operand and operand location); Comparator control (latch operands A/B and operand locations); Adder control (add/subtract, latch operands A/B and operand locations); Multiplier control (latch operands A/B and operand locations); Divider control (latch operands A/B and operand locations); Program flow control (done, branch, increment PC, wait) and Special Register access.]

Figure 51 –VLCW example, Instruction Width 57-bits

The Program Flow field controls how the program is executed. It may allow a Program Counter increment, the execution of a branch, a simple wait statement (used to compress the program image, thereby saving instruction memory) or the termination of program execution. The branch statement uses the output of the comparator to decide whether or not to take the branch path. K2 executes the following C pseudo-code:

if (d > 0)
CND = 1.0 - CND;
else
CND = CND;

The equivalent pseudo code:

// bring value d to the Compare Processing Element input via PXB and instruct the
// Compare PE to perform a greaterThanZero comparison. Value 0x0 is stored in SRs:
peCompare (d, 0);
// Wait 2 clock cycles for the comparison result:
wait (2);
// change the Program Counter based on the compare result:
branch (true - PC=PC1, false PC=done);

The nano-code wait instruction, as shown above, is used to clock-align PE inputs and outputs. It works like a software delay call, except that the nano-code wait delays the execution of the next nano-code instruction by a number of clocks rather than by an amount of time.

Consequently, the wait instruction can be used to compress code space. If, for example, the program needs to wait N clocks for the PEs to produce results, we can use wait (N), thereby compressing the nano-code from N instructions to only one. This in turn decreases the amount of Code Memory required.
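The effect of the wait field on the code memory address can be illustrated with a small software model (a sketch with hypothetical field names and encoding; the real Controller decodes the 57-bit VLCW in hardware): the address is held while the wait count drains, so N idle clocks occupy a single stored instruction.

#include <stdio.h>

/* Hypothetical, simplified view of the Program Flow field of a VLCW
 * word; the field names and encoding are illustrative, not the thesis's. */
enum flow { FLOW_INC, FLOW_WAIT, FLOW_BRANCH, FLOW_DONE };

struct vlcw {
    enum flow flow;   /* program flow control        */
    unsigned  arg;    /* wait count or branch offset */
    /* PE / GPR / SR control fields omitted          */
};

static unsigned pc;         /* code memory address   */
static unsigned wait_left;  /* clocks left in a wait */

/* Called once per clock; returns 0 while the controller is idling on a
 * wait, so N idle clocks are covered by a single stored instruction. */
static int tick(const struct vlcw *code, int cmp, int *done)
{
    if (wait_left) { wait_left--; return 0; }
    const struct vlcw *ins = &code[pc];
    switch (ins->flow) {
    case FLOW_WAIT:   wait_left = ins->arg; pc++;  break;
    case FLOW_BRANCH: pc += cmp ? ins->arg : 1;    break;
    case FLOW_DONE:   *done = 1;                   break;
    default:          pc++;                        break;   /* FLOW_INC */
    }
    return 1;
}

int main(void)
{
    /* three words: issue, wait 4 clocks, done */
    const struct vlcw prog[] = { {FLOW_INC, 0}, {FLOW_WAIT, 4}, {FLOW_DONE, 0} };
    int done = 0;
    for (unsigned clk = 0; !done && clk < 20; clk++) {
        int issued = tick(prog, 0, &done);
        printf("clk %2u: %s, next addr %u\n", clk, issued ? "issue" : "wait", pc);
    }
    return 0;
}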

It is worth noting that the system controller also uses nano-code instructions.

5.2.4 Micro-code versus Nano-code

Microcode is a layer of hardware-level instructions used to implement a higher level machine code program [107]. It allows updates to how the CPU executes instructions without affecting the machine level code. Microcode is used by Complex Instruction Set Computers (CISC). CISC processors contain the high speed memory and controller hardware required to translate microcode into the CPU specific hardware logic control signals.

Nano-code is a concept used in this thesis. Nano-code directly controls FPGA hardware; there is no nano-code translation. Nano-code is not translated by the Electron hardware; it gets compiled for the exact hardware.

Figure 52 depicts the differences between the micro and nano-code used in this thesis.


[Figure 52 diagram: in the microcode flow, a high level language is compiled to microcode, which the CPU Microcode Control Unit and Memory translate into CPU circuit level signals for the CPU hardware circuits; in the nano-code flow, a high level language is compiled by the nano-code compiler directly into FPGA circuit level signals for the FPGA hardware circuits.]

Figure 52 –Microcode versus nano-code

5.2.5 Nano-Code Compilation

In this project the process of compilation to nano-code is manual25. The compilation consists of:

 Reducing a complex computation problem into smaller computation units. These units are self contained but may use results obtained by other units. The final result of this stage is the pseudo-code, shown as an example in Section 5.2.2.
 The computation problem reduction may be influenced by the existing hardware. For example, each Processing Tile has 4 Electron processors, where each processor has access to a certain number and type of processing elements. There can be a number of solutions to the same problem - some could produce lower latency, some higher throughput.
 The pseudo-code computation units are mapped onto the available hardware of each Electron processor. This mapping needs to be done in a clock cycle accurate fashion. For manual compilation, a spreadsheet is used. This process is very tedious since it involves clock cycle aligning of delays from Processing Elements (PEs) so that outputs from PEs arrive clock cycle aligned at the inputs of another PE. PE, PXB and block RAM latencies must all be accounted for.
 The last step involves translating the human readable tables into nano-code instructions.

25 This fact limits exploration possibilities.

5.2.5.1 Kernel Nano-Code Table

In this section we use Kernel 2 as an example. The following clock cycle accurate K2 table (only a section of which is shown) is used to create the K2 nano-code. The complete nano-code for K2 executes in 136 clocks. Note that rows with multiple consecutive wait instructions are hidden (an Excel feature) for presentation convenience.


Clock Cycle   Code memory Addr   Operations (Operation 1/2/3 columns merged)
 0            0                  read gpr (12)
 1            1
 2            2                  mul1_d_d = mul1(GPR, GPR); abs1_d = abs1(GPR); read gpr (3)
 3            3
 4            4                  mul1_0p23_abs1 = mul1(GPR, abs1)
 5            5
 6-9                             wait cnt
 10           6                  read special reg (0.5)
 11           7                  mul1 port A = special reg (0.5)
 12           8                  mul1_m0p5_mul1 = mul1(0.5, mul1)
 13           9                  read special reg (1.0)
 14           10                 add1_1p0_mul1 = add1(1.0, mul1); div1 port A = special reg (1.0)
 15           11
 16-18                           wait cnt
 19           12                 K = div1_1p0_add1 = div1(1.0, add1)
 20           13
 21                              wait cnt
 22           14                 exp1_mul1 = exp1(mul1)
 23           15
 24-44                           wait cnt
 45           16                 read gpr (11)
 46           17
 47           18                 mul1_RSQRT2PI_exp1 = mul1(GPR, exp1); read gpr (10)
 48           19
 49           20                 Operation 2 done, K value from div1 to GPR(15); mul1 = mul1(div1, GPR)
 50           21
 51-55                           wait cnt
 56           22                 read gpr (9)
 57           23                 Operation 1 done, mul1 to GPR(13)
 58           24                 add1 port A = GPR
 59           25                 add1 = (add1, mul1)
 60           26
 61                              wait cnt
 62           27                 read gpr(15)
 63           28
 64           29                 mul1 = mul1(GPR, add1)
 65           30

Figure 53 –K2 nano-code

From Figure 53:

The Clock Cycle column gives the exact clock cycle when an operation (or operations) takes place. The operation can be the fetching of operands (for example, in clock cycle 0 a GPR read from address 12, as shown above) or scheduling a PE output and connecting it to a PE input (see clock cycle 14, where the adder takes the Special Register containing operand 1.0 and adds it to the output of the multiplier; the Special Register read request was made in clock cycle 13).


The Code memory Addr column is the physical address where a VLCW instruction is stored. Note from the table that when the wait instruction is used this column is not incremented, i.e. code compression is performed.

The operations listed for a clock cycle all take place in parallel (in the original spreadsheet they are spread over Operation 1/2/3 columns, chosen purely for clarity and convenience).

The goal of the table is to issue as many operations as possible to the available hardware without breaking any computation/operation dependencies. From it, it also becomes obvious which PEs would benefit from being multi-threaded: PEs used many times should be multi-threaded. The multi-threading option significantly enlarges a Processing Element (Section 4.6), yet it may reduce processing time.

For example the multiplier is used in clock cycles 2, 4, 12, 47, 49 and 64. The multiplier latency is 10 clocks and a new operation can be executed on every 2nd clock. Therefore the multiplier is generated as a multi-threaded component while the nano-code table is constructed to take advantage of the multiplier’s features.

On the other side the divider is used only once in this kernel (clock cycle 19). Therefore, having a divider that offers multi-threading would be wasteful.

5.2.5.2 System Controller Nano-Code Table

It was previously indicated that the System Controller provides all PT top level (DMEM, Kernels and router) scheduling. The System Controller is built as an Electron processor and therefore runs nano-code.

The System Controller:

 Reads the DMEM, providing equation constants as well as the data to be processed to all kernels. DMEM data is distributed to the kernels via the PT routers (Section 4.4) and written to the kernel GPRs.
 Once all data is distributed, the kernels are booted in the sequence required for serial kernel scheduling.


 K1 results are read from the K1 GPR and written to the K2A and K2B GPRs. K2A and K2B results are read from the relevant K2A/B kernel GPRs and written to the K3 kernel GPRs. Once K3 is done, the final result is sent out from the PT.
 The System Controller is capable of running a selectable number of loops. In each loop a new set of data is processed and new results are created. The number of required loops is determined by the nano-code compiler and is part of the System Controller's nano-code.

5.2.6 System and Kernel Scheduling

It was previously explained that a sequential programming model is used in this project (Section 4.8.2.2). Kernels 2A and 2B run in parallel (Section 4.8.2.1); they are part of the sequential model that offers speedup. Individual kernels process input data, create intermediate data and feed it to the next kernel in the sequence. As soon as a kernel is done processing, it takes a new set of data and the processing cycle starts from the beginning.

A scheduling problem arises if two consecutive kernels, N and M, where N precedes M, have different processing times. If kernel N is faster than M, it cannot start processing a new set of data before M can accept its output. Consequently, kernel N cannot take advantage of its higher speed. Obviously, dividing computation problems into appropriately sized kernels is of crucial importance.

This problem could be remedied by introducing rate adaptation FIFOs between N and M – therefore N could keep processing at its speed. This approach would create no benefits if N and M are part of the same processing algorithm – the slowest kernel determines the speed of all kernels. However freeing up kernel N sooner and repurposing it for a different task, if the processing algorithm could allow this, would improve system processing time.

Figure 54 shows all three Black-Scholes kernels, their interrelation and the equivalent processing time. Color coding is used to indicate time of consecutive kernel runs.


[Figure 54 diagram: the processing throughput interval is set by K2 (A,B); the processing latency spans K1, K2 (A,B) and K3 plus overhead; consecutive runs (1st, 2nd, 3rd) overlap in a pipelined fashion. Kernel run times: K1 = 104 CC, K2 = 136 CC, K3 = 78 CC; total Router/GPR overhead = 32 CC.]

Figure 54 – Processing Tile Kernel Scheduling


5.2.6.1 Processing Throughput

The processing throughput is defined as the number of completed operations per unit time, where an operation is one evaluated Black-Scholes equation:

throughput = frequency / (longestKernel + overhead)

frequency is the FPGA clock rate, 200 MHz. longestKernel, in clock cycles, is defined by K2; the overhead is the time, expressed in clock cycles, taken to:

 Set all router connections,
 Transfer all processing data,
 Transfer intermediate data from a kernel to the next kernel via the routers,
 Write to and read from the GPRs.

Therefore the PT throughput is:

throughput = 200 MHz / (136 + 32) CC = 1.19 MOPS

where an operation in [MOPS] stands for one Black-Scholes calculation. The above values do not include PC to/from FPGA data transfer delays.

5.2.6.2 Processing latency

Processing latency is the time between one set of inputs being presented for evaluation and the corresponding result becoming available at the output. The latency is driven by the Kernel scheduling (which in turn is driven by the longest Kernel processing time) and by the overhead, with N the number of Kernels, in this case 3 serial Kernels.

Therefore it takes 372 clocks for the result of an evaluated Black-Scholes equation to appear on the PT output port. This number does not include PC to/from FPGA data transfer delays.

5.2.6.3 Summary

The processing latency as well as the processing throughput equation indicate that the kernel with the longest processing time is the main performance constraint (Section 4.8.2.2).

K2 takes the longest time due to its serialised nature of iterative multiply and add operations (Section 5.2.2). Adding more computation resources would not improve speed; selecting add and multiply PEs with the lowest latency available is the key.

When applicable, serializing short dependent kernels and running them in parallel with an independent and long running kernel would be beneficial. If K1 and K3 could run in parallel with K2 the above calculated throughput and latency would be almost 2 times better.

Kernels are interconnected with the 2D-Mesh NoC. NoC router delay, kernel GPR access or DMEM access time add to the processing overhead. In this project routers are designed for minimum pipeline delay.

If applicable, calculations with the longest delay, such as exponential or divide operations, should be initiated as soon as possible. Due to the length of these operations, dividing the computation problem into processing kernels based on the longest operations should be considered.

5.3 System Components – Throughput Overview

The following figure depicts the system throughput capabilities. It was indicated in Section 2.11 that Speedy PCIe core limits the system performance due to its architectural choices. Section 6.4 lists available steps to improve the system speed.

ML605 board supports the PCIe 1.0 x8.

The PCIe Fabric is a device placed on the PC motherboard. In this project PCIe Gen 1 with 8 lanes is used. Given a 2.5 GT/s (Giga Transfers per Second) throughput per lane, it provides 16 GT/s duplex throughput for all 8 lanes; the 20% overhead due to 8b/10b coding has already been removed from this number.

PCIe Core is a Xilinx IP instantiated on the FPGA. With 8 active lanes it produces 1600 MB/s (Mega Bytes per Second). This number needs to be reduced by PCIe transport layer protocol overhead. This overhead depends on the transfer size.

The Speedy PCIe core is an open core design that allows interfacing with the Xilinx PCIe core. It is a flexible solution offered for free to the research community. In order to provide enough bandwidth to the application logic and to support future PCIe versions without changing the Speedy PCIe core, its interface exceeds the PCIe Gen 1 bandwidth. This is accomplished by widening the interface to the application logic (Processing Tiles) to 64 bytes. The interface runs at 200 MHz and is capable of bursting data at 12800 MB/s duplex rates. This exceeds the available throughput to/from the PCIe core; therefore the sustained Speedy PCIe throughput is what PCIe Gen 1 x8 can produce, and that is 1600 MB/s.

Due to the FPGA size restriction the FPGA application logic is made of 6 Processing Tiles (PT). The input from Speedy PCIe to each PT is 4 bytes wide and each PT receives input data in parallel. Therefore 24 of the 64 bytes provided by Speedy PCIe are actually used and 40 bytes are not. The Speedy PCIe core could interface to a maximum of 16 PTs; therefore Speedy PCIe offers (1600 MB/s)/16 interfaces = 100 MB/s per interface.

It was explained in Section 5.2.6.1 that a PT in our experimental setup takes 168 clocks to evaluate one Black-Scholes equation. Each PT needs six 4 B words to produce one result; therefore the input throughput requirement per PT is 28.6 MB/s. PTs produce one 32-bit result every 168 clocks; therefore the output throughput per PT is 4.8 MB/s.
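These per-tile figures follow directly from the numbers quoted above, as the short calculation below shows (a sketch; transaction layer overhead is ignored).

#include <stdio.h>

int main(void)
{
    const double clk_s  = 5e-9;    /* 200 MHz Processing Tile clock   */
    const double cycles = 168.0;   /* clocks per Black-Scholes result */
    const double in_B   = 6 * 4;   /* six 4-byte operands per data set */
    const double out_B  = 4;       /* one 32-bit result               */

    printf("input  per PT: %.1f MB/s\n", in_B  / (cycles * clk_s) / 1e6);
    printf("output per PT: %.1f MB/s\n", out_B / (cycles * clk_s) / 1e6);
    return 0;
}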


[Figure 55 diagram: the PCIe Fabric (16 GT/s, PCIe Gen 1 x8) feeds the PCIe Core (1600 MB/s), which feeds the SpeedyPCIe Core (16 interfaces, each 4 B wide, so each interface/PT gets 1600/16 = 100 MB/s); PT0 to PT15 each need 24 B per Black-Scholes computation interval, i.e. 24/(168 x 5 nS) = 28.6 MB/s.]

Figure 55 – System Throughput Overview

At this point we conclude that for our experimental setup Processing Tiles are not bandwidth limited by PCIe or Speedy PCIe Core throughputs. One PT gets 100MB/s throughput (minus transaction layer overhead) while PTs need 28.6 MB/s.

However it should be noted the above conclusion is correct for the experimental setup. Section 6.4 explains how and why the experimental setup is limited by ML605 development board (clock rate limitation imposed by SpeedyPCIe). Based on numbers provided in Section 6.4, if the SpeedyPCIe limitation is removed one PT needs close to 96MB/s.

In Section 4.8.2.2 we gave an equation, N = Twl / Td, that can be used to calculate the number of Kernels for a given workload and input data throughput.

Let us perform a basic analysis of what would happen if the Black-Scholes equation is divided into a large number of serialized Kernels. In this case PTs will need increased input bandwidth.


Section 5.2.6 shows that the total experimental Kernel time is 104 + 136 + 78 = 318 clocks (Twl = 318 x 5 nS). Let us assume that this 318 clock time does not change when the number of Kernels is changed and that the time per Kernel is equal to:

Td = Twl / N = (318 x 5 nS) / N

Therefore the required input data bandwidth per PT is:

BW required = 24 B / Td = 24 B x N / (318 x 5 nS)

From Figure 55 BW offered by SpeedyPCIe is 100MB/s.
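Figure 56 can be regenerated from the quantities above, as sketched below under the same assumptions as the text: the 318 clock workload splits into N equally long kernels and each data set is 24 bytes.

#include <stdio.h>

int main(void)
{
    const double clk_s   = 5e-9;    /* 200 MHz clock                    */
    const double twl_cc  = 318.0;   /* total Black-Scholes kernel time  */
    const double set_B   = 24.0;    /* bytes per data set               */
    const double offered = 100e6;   /* SpeedyPCIe bandwidth per PT, B/s */

    for (int n = 1; n <= 8; n++) {
        double td_s = (twl_cc / n) * clk_s;   /* per-kernel time    */
        double required = set_B / td_s;       /* input B/s per PT   */
        printf("%d kernels: %6.1f MB/s required (%s)\n", n, required / 1e6,
               required > offered ? "exceeds offered" : "within offered");
    }
    return 0;
}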

[Figure 56 chart: PT bandwidth required versus SpeedyPCIe bandwidth offered (MB/s) as the number of serialized Kernels goes from 1 to 8.]

Figure 56 – Kernel BW required versus Speedy PCIe BW Offered

Figure 56 illustrates how PT input bandwidth requirements increase with the number of serialized Kernels. Even for the experimental setup if the number of Kernels is 7 or more, PT input bandwidth requirements (BW required) exceed what SpeedyPCIe can offer (BW offered).


5.4 Backend

5.4.1 FPGA Resources

The Processing Tiles run at a 200 MHz clock rate. This clock rate was selected based on the Speedy PCIe core. The following figure gives an overview of the resources taken by the FPGA PCIe cores (PCIe Core and Speedy PCIe core) as well as by the 6 Processing Tiles.

Figure 57 – FPGA Resource Utilisation

5.4.2 Application Logic – 6 Processing Tiles

Resource utilization for 6 Processing Tiles:

FLOP_LATCH  113651
LUT         119180
MUXFX         5365
CARRY        97506

For illustration purposes, the 6 Processing Tiles take 113651 flops, or about 1/2 of all used flops, and 119180 LUTs (29795 Slices), or 93% of all used LUTs.


5.4.3 Processing Tile

Resource utilization for 1 Processing Tile:

FLOP_LATCH  18194
LUT         19654
MUXFX         889
CARRY       16083

Table 7 depicts resources taken by PEs:

Table 7 - Processing Elements Resource Usage

Processing Element / Parameter   Logarithm    Compare   Divider     Adder        SQRT        Multiplier
Flops                            1623             8      404        230          316          679
LUT                              1541            61      501        521          336          623
Carry                            2036            42      281        319          223         1211
Others                            238 (DMEM)      -       16 (DMEM)  28 (MUXFX)   14 (DMEM)    43 (DMEM)

If all PE resources are added together and multiplied by 4 (every PT uses 4 PXBs) we see that approximately 72% of PT resources are from PEs.

In Section 4.6 it was explained that FPGA resources taken by algorithmic primitives directly correlate with the latency and throughput. In this project all primitives except the multiplier were generated for low resource utilization (without adversely affecting the system performance).

The fact that the PEs, despite being generated for low resource utilization, still take almost 3/4 of a PT tells us that the System and Electron controllers, their respective crossbars and the PE wrappers are not significant resource contributors. Furthermore, if new PEs were added, the controllers would take an even lower percentage of the PT resources, for two reasons:

 PEs are the major resource contributors, and
 Controllers do not scale linearly in size with new PEs – and that is one of the benefits of this implementation.

5.4.4 Summary

The previous sections indicated that PEs take close to 72% of a PT. In Section 5.2.5.1 a clock cycle accurate table was given showing how the PEs are used. On some clock cycles we have 3 operations running in parallel, utilizing 3 PEs. However, some PEs are used only once during the program run time. For example, the logarithm PE is used only once, for a duration of 26 clocks, over the program's run time of 104 clocks. These numbers do not lead to a good compute density [108].

Obviously a good way to improve the compute density is to share large and exotic PEs so all Kernels within the PT have access to it. In this case the System Controller would schedule use of shared PEs. Shared PEs would be connected to the System Level interconnect just as the Kernels are via routers.

In this section we do not go any deeper into the subject of the compute density due to its huge solution space. Rather we indicate the compute density as one of the constraints to meet for successful deployment of the Electron architecture in the Cloud.


Chapter 6 - Experimental, Achievable and Theoretical Performance

In this chapter we present the experimental throughput performance obtained on an FPGA development board and the experimental throughput performance obtained on a CPU. The FPGA and the CPU run, respectively, hardware and software implementations of the Black-Scholes computation. The FPGA and CPU throughputs are compared.

Next we analyse and present opportunities to speed up the FPGA design if a number of limitations, such as ML605 hardware specifics, are removed. We will only be considering improvements with an experimentally or theoretically quantifiable contribution. They lead us to achievable FPGA throughput. The FPGA and CPU throughputs are compared one more time. Finally the maximum, or theoretical, throughput is presented. This maximum throughput presents an upper bound we would like to approach.

Finally we measure FPGA and CPU power consumptions and compare the two.

Equipment used:

FPGA - Xilinx ML605, Virtex 6 based development board. PCIe Gen 1.0 x8 is used. A set of 6 Processing Tiles was successfully implemented. In total there are 24 Kernels on the FPGA running at 200MHz clock rate.

PC - Gigabyte Z77-HD3 motherboard with 8-GB system RAM, processor i5-3570 running at 3.4GHz - a contemporary PC per today’s measures.

Experimental Measurements - Methodology:

The performance measure used is Millions of Operations per Second (MOPS) achieved by the FPGA or the PC, where an operation is one execution of the Black-Scholes equation.


In the FPGA application, raw data required by the Black-Scholes formula needs to be transferred from the PC to FPGA. This is followed by the data processing on FPGA and finally results are returned to the PC. Raw data and result transfers between FPGA and PC are part of provided throughput numbers.

6.1 System Information Flow

[Figure 58 diagram: on the Host PC, the Test Program and the SpeedyPCIe Driver exchange data with the System Memory and the PCIe Controller (software side); the PCIe Controller connects over the PCIe slot (1.0, x8) to the ML605 FPGA; the numbered arrows correspond to the steps below.]

Figure 58 – System Level – Flow of Information

1. Test Program starts. This program populates System Memory with data to be processed. The “data” is made of N x data sets, where each data set will require a result (a Black-Scholes equation evaluation) to be returned by the Transfer Controller residing on the ML605 FPGA back to the Test Program.


2. The Test Program makes a Write request to the Speedy PCIe driver by passing the size of the data and a pointer to the System Memory location where the data resides.

3. The Speedy PCIe driver:
a. Initiates a DMA transfer from the System Memory via the PCIe controller all the way down to the ML605.
b. As soon as all data is written to the ML605, the driver passes program control back to the Test Program by returning the Write request (step 2) status.

4. The data reaches the ML605 PCIe core and the Speedy PCIe core. The Transfer Controller receives the data from the Speedy PCIe core, forwards it to all Processing Tiles and initiates processing.

5. The Processing Tiles start processing the received data.

6. As soon as Speedy PCIe has returned the Write status (step 3b), the Test Program issues a Read request for exactly as many results as the previously provided data requires. Note that this Read request cannot be made until the Write status has been provided; the Write request is therefore a blocking operation. If the processing data is of sufficient size, the Read request arrives while the data sets are still being processed. Since the PCIe interconnect is much faster than the Processing Tiles (Section 5.3), having separate Tx and Rx DMA controllers would not change the throughput numbers for large data sets, but it could for small data sets.

7. The Transfer Controller (an FPGA hardware block, see Figure 49) sends results back as part of the DMA response to the Read request. The Transfer Controller keeps sending results back until all data sets are processed.

8. Next:
a. The results provided by the Transfer Controller are forwarded to the Speedy PCIe driver by the PCIe controller as part of the DMA.
b. After all results have been transferred via the System Memory, the Speedy PCIe driver sends a status to the Test Program. The Read request is therefore a blocking operation, due to how the Speedy PCIe driver operates: after a read operation is posted, no new writes can be made until all data has been received.


[Figure 59 sequence diagram: numbered exchanges 1-5 between the Test Program (Host) and the Transfer Controller (FPGA), as described below.]

Figure 59 – Hardware-Software Interaction – Case (a)

Case (a), Figure 59: Test Program makes a request to FPGA to process one data set. Transfer Controller works in Low Latency or Low Interrupt Mode.

1. The Test Program initiates a Write with one data set (six 32-bit numbers). A Processing Tile running the Black-Scholes nano-code requires one data set to calculate one result.

2. The ML605 PCIe core acknowledges the Write request and forwards the write data to the Processing Tile.

3. The Processing Tile starts processing the just received data.

4. The Test Program initiates a Read request for one result.

5. The Processing Tile has completed the processing and is ready to respond to the Read request and provide a result via the Transfer Controller. If the Processing Tile had not completed processing, the Read request would be delayed and responded to after the Processing Tile is done.


[Figure 60 sequence diagram: numbered exchanges 1-7 between the Test Program (Host) and the Transfer Controller (FPGA) in Low Latency Mode, with the Write split into m transfers, as described below.]

Figure 60 – Hardware-Software Interaction in Low Latency Mode – Case (b)

Case (b), Figure 60: Test Program makes a request to FPGA to process N data sets. Transfer Controller works in the Low Latency Mode.

1. The Test Program initiates a Write with N data sets. Each data set will be used by the Processing Tile running the Black-Scholes nano-code to create one result. Based on the size of the write data, the Speedy PCIe driver may choose to send this data in a certain number of transfers; in this example the number of transfers is "m".

2. The Processing Tile starts processing the just received data. Each vertical arrow represents the creation of one result. Results are stored internally on the FPGA and forwarded to the Test Program after a Read request is placed.

3. The ML605 PCIe core acknowledges the completion of the write request(s).

4. After the Write request has been completed, the Test Program initiates a Read request for N results.

5. The Processing Tile responds to the Read request by forwarding all results completed up to this point.

6. As every new result is generated, the Processing Tile sends it to the Host.

7. The previous step continues until all results have been forwarded to the Test Program.

[Figure 61 sequence diagram: numbered exchanges 1-5 between the Test Program (Host) and the Transfer Controller (FPGA) in Low Interrupt Mode, with the Write split into m transfers, as described below.]

Figure 61 – Hardware-Software Interaction in Low Interrupt Mode – Case (c)

Case (c), Figure 61: Test Program makes a request to FPGA to process N data sets. Transfer Controller works in Low Interrupt Mode.


1. The Test Program initiates a Write with N data sets. Each data set will be used by the Processing Tile to create one result. Based on the size of the write data, the Speedy PCIe driver may choose to send this data in a certain number of transfers; in this example the number of transfers is "m".

2. The Processing Tile starts processing the just received data. Each arrow represents the creation of one result. Results are stored internally on the FPGA and forwarded to the Test Program after a Read request is placed.

3. The ML605 PCIe core acknowledges the write requests.

4. After the Write request has been completed, the Test Program initiates a Read request for N results.

5. The Processing Tile responds to the Read request by forwarding the results after all calculations have been completed. The number of PCIe transfers is under the Speedy PCIe core's control; the observed Speedy PCIe transfer size is 1k result words, followed by a turnaround time before the next transfer.

6.2 Read Transfer Modes

The Transfer Controller (TC) was built to support two transfer modes: Low Latency (LL) and Low Interrupt (LI). The mode is selectable during regular operation by writing to an FPGA TC register.

In LL mode the TC reads the Result FIFOs as soon as there is any data in them. As shown previously, this approach results in frequent short transfers from the ML605 to the Host. Therefore, in this mode results are made available with the lowest latency but at a high interrupt rate.

In LI mode TC reads Results FIFO after all previously requested calculations are performed. In this mode the interrupt level (number of interrupts) is low; however results have somewhat higher latency in arriving at the Host.


6.2.1 Low Latency (LL) Mode

The following figure depicts LL Mode. Reads are requested by the host as soon as the raw data is sent to the TC (not shown in the diagram). The TC sends results (READ_DATA_VALID_H) as soon as they are ready.

Figure 62 –Low Latency Mode - Waveform

6.2.2 Low Interrupt (LI) Mode

The following figure depicts this case. Reads are requested by the host as soon as the raw data is sent to the TC (not shown in the diagram), the same as in the LL case above. The TC sends results (READ_DATA_VALID_H) after all processing results are ready and stored in the Result FIFO.

Figure 63 –Low Interrupt Mode - Waveform

6.2.3 Low Latency and Low Interrupt Modes – Comparison

It is expected that LL mode is faster than LI mode, since LI mode needs to wait until processing is completed before it starts sending data to the host. The following drawing shows that the expected longer total time due to the longer transfer time appears to be within the OS noise (jitter).


Figure 64 shows total execution time, including the time when processing data is transferred via DMA to ML605, data is processed and results are sent via DMA transfer back to the host.

[Figure 64 chart: FPGA run time in seconds (up to 1.4E-03) versus the number of calculations (4 to 964), showing the maximum observed run times for LL and LI modes.]

Figure 64 - Low Latency versus Low Interrupt mode

To conclude this section:

 If the OS interrupt rate is of concern, LI mode should be used. In many multi-core applications software developers dedicate one CPU core to interrupt processing, so minimizing interrupts may not be a system level constraint of top importance.
 If low processing latency is imperative, as it is in financial industry applications, LL mode should be used. How beneficial LL mode is may depend on the size of the raw data (the PCIe protocol is more efficient with larger data transfers), the PCIe transfer size and other system level constraints; a holistic approach is required.
 LL and LI modes take almost the same time (within OS jitter) to process the data. The reason is that the last result, regardless of the mode, determines the system processing speed. Since the PCIe communication link has higher throughput than the Processing Tiles, the Processing Tiles themselves determine the system speed - and that is the last Black-Scholes computation workload. Any calculation before the last is delivered faster in low latency mode than it would be in low interrupt mode.

6.3 Black-Scholes Experimental Results

Next we look into software (CPU) and hardware (FPGA) implementations of Black-Scholes equations and compare the two.

6.3.1 CPU Implementation

A C based program was written to emulate the same test bench as is used for the FPGA implementation. In this test the size of the data set is varied from four to 1000 in steps of four. Each data set produces one result - one Black-Scholes evaluation. Ten measurements are performed for each data point and the execution time results are averaged. This approach was taken to minimize the variability caused by short term Operating System (OS) jitter and to make the software and FPGA result comparison more meaningful.

A number of the above described tests were performed to gain further insight into how software implementation performance varies. These variations are caused by the long term OS jitter (Section 2.5). In both of the above tests the number of Black-Scholes evaluations is divided by the obtained execution times to arrive at the processing throughput (or just the throughput), expressed in Million Operations per Second (MOPS) of Black-Scholes computation.

A further set of experiments was done running the software Black-Scholes application with the intention of determining if and how the Intel CPU schedules the calculations. The Intel processor used is an i5-3570, a 4-core processor running at 3.4 GHz. If the CPU turbo boost mode is enabled, the clock rate can go as high as 3.8 GHz.

The test program is written in Visual Studio 2010 using OpenMP libraries; the operating system is Windows 7, 32-bit. The test program is compiled to run with one, two, three or four threads, and the turbo boost mode is turned On or Off.
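The original Visual Studio test program is not listed in the thesis; the following is a minimal sketch, in portable C with OpenMP, of the measurement loop described above (data-set sizes from 4 to 1000 in steps of 4, ten averaged runs per point, throughput reported in MOPS). The option parameters and the output format are illustrative assumptions only.

    /* Sketch of a Black-Scholes throughput benchmark (not the thesis test program). */
    #include <math.h>
    #include <omp.h>
    #include <stdio.h>

    static double cnd(double x)                 /* cumulative standard normal via erfc */
    {
        return 0.5 * erfc(-x / sqrt(2.0));
    }

    static double bs_call(double S, double K, double r, double v, double T)
    {
        double d1 = (log(S / K) + (r + 0.5 * v * v) * T) / (v * sqrt(T));
        double d2 = d1 - v * sqrt(T);
        return S * cnd(d1) - K * exp(-r * T) * cnd(d2);   /* European call price */
    }

    int main(void)
    {
        enum { RUNS = 10 };                     /* 10 measurements per data point, averaged */
        volatile double sink = 0.0;             /* keeps the compiler from removing the work */
        for (int n = 4; n <= 1000; n += 4) {    /* data-set size: 4 to 1000 in steps of 4 */
            double elapsed = 0.0;
            for (int run = 0; run < RUNS; ++run) {
                double acc = 0.0;
                double t0 = omp_get_wtime();
                #pragma omp parallel for reduction(+:acc)  /* thread count via OMP_NUM_THREADS */
                for (int i = 0; i < n; ++i)
                    acc += bs_call(100.0 + (i % 7), 100.0, 0.05, 0.2, 1.0);
                elapsed += omp_get_wtime() - t0;
                sink += acc;
            }
            /* throughput = evaluations per second, expressed in MOPS */
            printf("%4d calculations: %.3f MOPS\n", n, (double)(n * RUNS) / elapsed / 1e6);
        }
        (void)sink;
        return 0;
    }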


The following four snapshots show how the i5-3570 is loaded when a one-, two-, three- or four-threaded Black-Scholes program is run. Notice that the CPU usage increases by approximately 25% for each new thread turned on.

Figure 65 – Intel i5-3570 loading26 for one (first on the left), two, three and four threads enabled

Figure 66 displays the best case (BC) and worst case (WC) observed software throughput when only one thread (one CPU core) is used (shown as "THs=1", i.e. the number of threads is one).

[Chart: Million of Black-Scholes Operations per Second for 1 Thread – MOPS versus number of calculations; series: THs=1 BC, THs=1 WC]

Figure 66 – Best and Worst Case Processing Throughput, turbo boost mode On

26 Captured by "All CPU Meter" V4.7.3

For a single thread the throughput approaches 3.5 MOPS when the number of performed calculations is above 350. In this region the effect of OS short- and long-term jitter appears to be negligible.

Figure 67 shows the Black-Scholes throughput when a one- (THs=1), two-, three- or four-threaded Black-Scholes program is run.

[Chart: Million of Black-Scholes Operations per Second for 1 to 4 Threads – MOPS versus number of calculations; series: THs=1 to THs=4]

Figure 67 – One, two, three or four Threaded Black-Scholes Processing Throughput, turbo boost is On

Observations:

 One-, two-, three- and four-threaded programs achieve 3.5, 6.3, 9.3 and 12.3 MOPS respectively.
 Three- and four-threaded programs have a somewhat slower turn-on time; notice the step-shaped throughput curve. This could be attributed to multi-threaded program scheduling and/or the CPU clock rate still being switched between 3.4 and 3.8 GHz. The four-threaded program has a longer transition than the three-threaded program.
 The variance of the execution time of the three-threaded and especially the four-threaded programs is larger than that of the single-threaded program. A significant drop in performance is noticed for the four-threaded program, shown in Figure 68.
 The performance increase going from a one-threaded to a two-, three- or four-threaded program is not an integer multiple of the single-thread throughput. The reason is that the processor keeps the clock rate high if only a single thread is active (turbo boost mode is turned on, clock rate is 3.8 GHz), while for three or four threads the processor switches between 3.4 and 3.8 GHz to keep the processor die temperature under control.

Figure 68 shows variations in the four-threaded program throughput; three runs are shown. The initial step-like turn-on time is caused by processor thread/core scheduling and the CPU clock rate, while the performance drops are likely caused by the fact that the i5 processor used has only 4 cores, so any OS scheduling activity causes erratic processor performance.

[Chart: Variability in Million of Black-Scholes Operations per Second, 4 Threads – MOPS versus number of calculations; series: THs=4 (1), THs=4 (2), THs=4 (3)]

Figure 68 – Four Threaded Black-Scholes Processing Throughput Variation, turbo boost is On

6.3.2 FPGA Implementation

A test bench functionally similar to the one used in the software performance testing is created for the FPGA. Results for one Processing Tile are provided in Figure 69.


[Chart: MOPS versus number of calculations; series: FPGA BC, FPGA WC]

Figure 69 – One Processing Tile Best and Worst Case Experimental Processing Throughput

Contrary to the software implementation, the processing throughput for the same number of tests consistently displays very low, if any, variability; the Worst Case (WC) and Best Case (BC) numbers do not differ by much. The maximum processing throughput is achieved at a somewhat lower number of performed calculations than in software: about 300 for the FPGA versus 350 for the software implementation.

Results include all system latencies such as PCIe latency.

6.3.3 CPU and FPGA Implementation – Comparison

In this section we compare processing throughput expressed in MOPS (Black-Scholes computations). Throughput numbers are provided for:

 1 Processing Tile,
 6 Processing Tiles and
 Single CPU core best case (BC) and worst case (WC) software implementation.


[Chart: MOPS versus number of calculations; series: FPGA BC, THs=1 BC, THs=1 WC, One PT]

Figure 70 – FPGA (instantiated 6-Processing Tiles), single CPU core software and one PT - Processing Throughput

From the figure:

 For applications requiring 170+ calculations, the 6-Processing-Tile implementation (FPGA BC) outperforms the best case single-core software implementation.
 Due to CPU processing uncertainties, the 6-Processing-Tile implementation (FPGA BC) outperforms the worst case software throughput (THs=1, WC).
 The single-core CPU implementation is matched by a 4-Processing-Tile FPGA implementation27.

FPGA results include all system latencies such as PCIe latency.

6.4 Achievable System Throughput

The system performance can be significantly improved over what is obtained on the ML605 development board. The following FPGA throughput improvements are available: kernel scheduling and clock rate, adding more Processing Tiles, increasing the PCIe throughput, lowering the PCIe core latencies and software driver considerations.

27 4-PT implementation is not shown on the diagram

Throughput improvements obtained are independent – they all can be applied at the same time.

6.4.1 Kernel Scheduling and Clock Rate

The following is a list of opportunities:

 Section 2.11 indicated that Speedy PCIe provides an interface to custom logic that is 512 bits wide and runs at 200 MHz. This clock rate puts an upper bound on the clock rate used by the hardware blocks connected to Speedy PCIe (Electron and Processing Tiles). Based on the Electron controller and interconnect design (fully pipelined) as well as the Xilinx IPs used, it is determined that the clock rate could be in the 400 MHz – 450 MHz range.
 The kernel scheduling shown in Figure 54 could be changed. Figure 54 is just one way that the Black-Scholes equation can be mapped to Processing Tile Kernels; Section 5.2.6.3 lists some of the opportunities. It should be noted that partitioning the application into kernels and compiling the nano-code are entirely manual processes in this thesis, so experimenting with alternatives is not easy.
 Section 2.7 discusses Amdahl's law and what limits the system speedup in a multiprocessor system: the speedup is limited by the sequential part of an algorithm. It is therefore of high interest to find and improve the sequential part of Black-Scholes processing; by reducing this time, the system as a whole would benefit. Section 5.2 shows that Kernel 2 processes the sequential part of the Black-Scholes equation. This part takes 136 clocks, of which 32 are taken by the division operation (23% of the kernel processing time). Section 7.3.1 suggests using custom designed processing elements. A similar comment can be made for the 4 consecutive multiply and add operations: they take 60 clocks, or 15 clocks per operation. Considering how frequently multiply and add are used, it would be worthwhile to design a more optimised component for this commonly used processing.
 Figure 54 shows that the Kernel scheduling has 32 clocks dedicated to system related tasks such as passing processing or intermediate data between kernel GPRs (23% of the longest kernel processing time). This time could be further reduced; for example, the system controller could distribute processing data ahead of time for more than one run.

Section 5.2.6.1 defined the system processing throughput for the current design. Based on the above opportunities, achievable processing throughput is:

where:
a = clocks removed by improving the division processing elements; here we assume this number would be reduced from 32 to 16 clocks
b = clocks removed by replacing multiply and add components with a fused multiply-add (such as Xilinx XtremeDSP) component; here we assume this number would be reduced from 60 to 16 clocks (4 multiply-add operations)
c = clocks removed by lowering overhead due to passing processing or intermediate data; here we assume this number would be reduced from 32 to 16 clocks

Achievable PT throughput is:

Note that the above achievableProcessingThroughput also defines the maximum PT throughput: if the other system components (such as the PCIe interconnect) were not included, or if their contribution were much smaller than the processing time itself, the achievableProcessingThroughput would become the maximumProcessingThroughput one PT could offer.


Next we define a Scheduling Speedup Factor (SSF) or improvement in processing throughput as:

Therefore the system throughput due to the scheduling speedup can be improved by 3.7 times.
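The throughput and SSF equations above did not survive extraction into this copy. A sketch of the arithmetic, consistent with the numbers quoted in this section (a 136-clock Kernel 2 plus 32 clocks of scheduling overhead, clock reductions a = 16, b = 44 and c = 16, the 200 MHz experimental clock and the assumed 400 MHz achievable clock), is:

\begin{align*}
\text{currentClocks} &= 136 + 32 = 168\\
\text{achievableClocks} &= 168 - a - b - c = 168 - 16 - 44 - 16 = 92\\
\text{currentProcessingThroughput} &\approx \frac{200\ \text{MHz}}{168} \approx 1.19\ \text{MOPS}\\
\text{achievableProcessingThroughput} &\approx \frac{400\ \text{MHz}}{92} \approx 4.35\ \text{MOPS}\\
\text{SSF} &= \frac{4.35}{1.19} \approx 3.7
\end{align*}

which reproduces the 3.7x improvement quoted above.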

6.4.2 Adding Processing Tiles

In this section we determine how to increase the number of processing tiles. There are two direct ways:

 Share selected processing elements by all Kernels within a Processing Tile
 Use a larger FPGA

As indicated in Table 7 of Section 5.4.3, the Logarithm Processing Element with 1541 LUTs is the largest of all Processing Elements. There are 4 instances of this hardware block per Processing Tile, and a complete PT has 19654 LUTs. If all Kernels shared one Logarithm, the PT size would be reduced by three instances, or 3 x 1541 = 4623 LUTs; a PT with a shared Logarithm would therefore have 15031 LUTs.

The ML605 development board is built with a Virtex 6 XC6VLX240T FPGA device, an average size device from the V6 family. Table 8 gives the number of PTs that fit in each Virtex 6 device if the Logarithm hardware block is not shared (PT LUTs) or is shared per PT (PT Shared PE LUTs); the XC6VLX240T is marked as the "base line":


Table 8 – Adding Processing Tiles

Virtex 6                          XC6VLX240T     XC6VLX365T   XC6VLX550T   XC6VLX760
Available Slices                  37680          56880        85920        118560
Available LUTs                    150720         227520       343680       474240
PT LUTs                           19654          19654        19654        19654
PT Shared PE LUTs                 15031          15031        15031        15031
Max # of PTs                      7 (base line)  11           17           24
Max # of PTs (shared PE)          10             15           22           31
Increase in # of PT               1.00           1.57         2.43         3.43
Increase in # of PT (shared PE)   1.43           2.14         3.14         4.43
Processing Tile Speedup Factor    PTSF1          PTSF2        PTSF3        PTSF4

From Table 8 we see that the maximum increase in the number of PTs is 4.43 times, for the case when the Logarithm hardware block is shared per PT and we move from the XC6VLX240T to the XC6VLX760 device. We call this increase in the number of processing tiles the Processing Tile Speedup Factor or PTSF.
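The table entries appear to follow from a simple resource division; for example, for the XC6VLX760 with a shared Logarithm per PT:

\[
\text{maxPTs} = \left\lfloor \frac{474240}{15031} \right\rfloor = 31, \qquad \text{PTSF}_4 = \frac{31}{7} \approx 4.43
\]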

6.4.3 PCIe Throughput Increase

The ML605 uses PCIe Gen 1.0 x8 as the host system interconnect. Adding more lanes would speed up the most significant system delay component – the host-to-coprocessor-and-back latency. This latency affects the throughput since it is part of all the calculations provided in Section 6.3.

A set of experiments was performed to evaluate how PCIe throughput affects the FPGA Black-Scholes evaluation speed. We define the following two terms:

 Interconnect Speedup Factor (ISF) as a change of the application run time due to the PCIe throughput increase and
 PCIe Throughput Increase Factor (PCIeTIF)

Figure 71 presents experimentally obtained data. It depicts the throughput increase when PCIe Gen 1.0 x4 is changed to PCIe Gen 1.0 x8. The throughput results presented are therefore for doubled PCIe throughput (PCIeTIF=2, which we name PCIeTIF1). The Black-Scholes calculation speed (ISF) is increased by between 25% (1.25 times) and 100% (2 times). We name this ISF as ISF1.

[Chart: throughput increase [fold] versus number of calculations; series: ISF1 (G1 x4 to x8)]

Figure 71 – Black-Scholes FPGA Throughput Increase: PCIe Gen 1.0 from x4 to x8

Armed with the above defined terminology and ISF1 numerical values from Figure 71 we proceed to estimate ISF if we upgrade the ML605 development board PCIe link from Gen 1.0 x8 to Gen 2.0 x8 (maximum available on Virtex 6 device, equivalent to PCIeTIF=2.5) or Gen 3.0 x8 (maximum available on Virtex 7 device, equivalent to PCIeTIF=4).

The Black-Scholes equation evaluation would increase in speed by:

ISF2.5, Gen 1.0 x8 to Gen 2.0 x8:

ISF2.5 / PCIeTIF2.5 = ISF1 / PCIeTIF1 or

ISF2.5 = ISF1 x (PCIeTIF2.5 / PCIeTIF1)

Gen 1.0 x8 to Gen 3.0 x8:

ISF4 / PCIeTIF4 = ISF1 / PCIeTIF1 or

ISF4 = ISF1 x (PCIeTIF4 / PCIeTIF1)


Figure 72 shows the above calculated ISF charts.

[Chart: throughput increase [fold] versus number of calculations; series: ISF4, ISF2.5]

Figure 72 – Black-Scholes FPGA Throughput Increase: PCIe Gen 1.0 x8 to Gen 2 (ISF2.5) and 3 (ISF4) x8

Calculated average numbers are:

ISF2.5 = 1.55

ISF4 = 2.48
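These averages are consistent with the scaling relations above: with PCIeTIF1 = 2, an average ISF1 of roughly 1.24 (implied by the Figure 71 data) gives

\[
\text{ISF}_{2.5} \approx 1.24 \times \frac{2.5}{2} \approx 1.55, \qquad \text{ISF}_{4} \approx 1.24 \times \frac{4}{2} \approx 2.48 .
\]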

In Section 6.5 we use ISF2.5 only.

6.4.4 Xilinx PCIe and SpeedyPCIe Cores

Both PCIe cores contribute significantly to the system processing latency. Processing data as well as result data need to be transferred via DMA to and from the Processing Tiles, effectively passing through the two PCIe cores twice.

The SpeedyPCIe core is free intellectual property created for the research community; low latency was not necessarily a primary design goal.

In this thesis we did not factor in any calculation throughput improvements related to the PCIe cores. Section 6.6 sheds some light on what Electron could do if PCIe latencies are minimized.


6.4.5 Software Driver Considerations

The SpeedyPCIe driver is free intellectual property created for the research community; its speed and latency were not necessarily primary design goals. Some known driver issues are documented in Section 6.1.

In this thesis we did not factor in any driver performance disadvantages.

6.5 CPU and Achievable FPGA Throughput - Comparison

In this section we present the achievable FPGA throughput based on the analysis presented in Sections 6.4.1, 6.4.2 and 6.4.3, where it was shown that the processing throughput can be increased by the combined SSF, PTSF and ISF (ISF2.5 only) speedup factors. For multi-threaded CPU performance refer to Section 6.3.1.

Figure 73 shows the achievable FPGA throughput for various FPGA devices. The ML605 development board uses the XC6VLX240T; only the Xilinx Virtex 6 family is considered. The four-threaded CPU performs the worst, the XC6VLX240T FPGA is just above the CPU, while the XC6VLX760 has the highest MOPS.

[Chart: MOPS versus number of calculations; series: XC6VLX760, XC6VLX550T, XC6VLX365T, XC6VLX240T, THs=4 (3)]

Figure 73 – Achievable FPGA and i5-3570 CPU Processing Throughput

6.6 FPGA Theoretical Throughput

In this section we provide throughput comparison for one Processing Tile for the following cases:


 Achievable – this is the experimentally observed (FPGA) PT throughput with ISF (Section 6.4.3) and SSF speedups applied (Section 6.4.1),
 THs=1, BC – the single CPU thread/core best case (Figure 66 gives the single thread CPU throughput), an experimentally measured throughput and
 Maximum – (FPGA) PT theoretical maximum (Section 6.4.1). In this case the system throughput is not hindered by the PCIe throughput/latency, two PCIe cores and their latency or software driver specifics.

[Chart: MOPS versus number of calculations; series: Maximum, THs=1 BC, Achievable]

Figure 74 – Single PT (Achievable and Theoretical Maximum) versus Single CPU thread/core Throughput

Figure 74 shows evidence of the potential of Electron Processing Tiles and FPGAs. One PT could offer more MOPS ("Maximum") than a CPU core ("THs=1, BC").

From the figure we see:

 Cloud applications that require minimum Host – Coprocessor communication are the best candidates for offloading. They could be significantly accelerated and approach the “Maximum” performance (examples are applications based on the Monte Carlo analysis, Section 7.3.2).


 The difference between the "Achievable" and "Maximum" numbers is mostly caused by the PCIe cores and the software driver; therefore investing development time in the PCIe cores and the software driver is worthwhile.

6.7 CPU and FPGA – Power, Performance and Cost

In this section we examine the measured power consumed by the CPU and the FPGA while processing the Black-Scholes equation. Next we calculate "Performance per Watt" and "Performance per Watt per Dollar" for the CPU and the FPGA, followed by a comparison of the numbers.

Figure 75 presents the Intel CPU power consumption for a single-threaded application. The CPU static power is 7W and when executing Black-Scholes calculations the power goes up to 26W – an increase of 19W, or 270%.

[Chart: CPU power [W] and frequency [GHz] versus time (100 ms per tick)]

Figure 75 – Intel i5-3570 CPU power and frequency28 during single threaded Black-Scholes calculations, turbo boost mode is On

Adding a level of multi-threading increases the CPU performance by an amount approximately equal to the single-core performance; refer to Section 6.3.1 for the exact analysis. Figure 76 and Figure 77 show that the CPU power is incremented by only a (significant) fraction of the single-core power as the level of multi-threading is increased. It appears that even if only a single core is loaded, the clock rate for all 4 cores is increased. This would imply that adding multi-threading does not multiply the power consumption by the level of threading; rather, the power is fractionally increased by the extra processing activity such as register state toggling, memory accesses and more.

28 Data captured by Intel Power Gadget tool. For comparison, a 4-core video conversion takes 43W.

[Chart: CPU power [W] for Idle, THs=1, THs=2, THs=3 and THs=4]

Figure 76 – Intel i5-3570 CPU Power for idle, one, two, three and four threaded Black-Scholes, turbo boost mode is turned On


[Chart: CPU power [W] for Idle, THs=1, THs=2, THs=3 and THs=4]

Figure 77 – Intel i5-3570 CPU Power for idle, one, two, three and four threaded Black-Scholes, turbo boost mode is turned Off

Figure 78 shows CPU Performance per Watt (PPW), expressed as the number of MOPS of Black-Scholes computations divided by the consumed power in Watts. Note that PPW is almost identical whether the turbo boost mode is turned On or Off; if turned On, the application runs faster but consumes more power.


[Chart: CPU Performance per Watt [MOPS/W] for TH=1 to TH=4, turbo boost On and Off]

Figure 78 – Intel i5-3570 CPU Performance per Watt for one, two, three and four threaded Black-Scholes, turbo boost mode is turned On/Off

Figure 79 presents measured FPGA power consumption. FPGA static power is 4.35W and when executing Black-Scholes calculations power goes to 4.7W – an increase of 350mW or 7.4%.

Figure 79 – Xilinx XC6VLX240T power consumed29 during Black-Scholes calculations

Next we determine Performance per Watt (ppw) for FPGA and CPU and compare the two.

The Performance per Watt analysis is done for the section of Figure 73 where the FPGA (XC6VLX240T) and CPU MOPS (Million Operations Per Second, where 1 operation is one Black-Scholes equation computation) curves are constant (at 668 Black-Scholes calculations).

29 Captured by Xilinx ChipScope Pro

The CPU Performance per Watt is based on MOPS performance from Figure 73 and power from Figure 76:

The FPGA throughput shown in Figure 73 is based on a 400 MHz clock (Section 6.4.1), while the power consumption reported in Figure 79 is for the 200 MHz clock of the experimental Electron implementation. Therefore the power needs to be adjusted for this increased clock rate.

From Section 2.6 we see that out of three power consumption components only the dynamic part needs to be adjusted. The equation for dynamic power:
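The expression itself is missing from this copy; the standard CMOS dynamic power form, which Section 2.6 presumably uses, is

\[
P_{dynamic} = \alpha \cdot C_{L} \cdot V_{DD}^{2} \cdot f
\]

where alpha is the switching activity, C_L the switched capacitance, V_DD the supply voltage and f the clock frequency.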

In this equation only the frequency changes: it is two times higher, therefore the dynamic power will be two times larger. From Figure 79:

Therefore:
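The numeric adjustment is likewise missing; assuming the 0.35 W increment observed in Figure 79 is entirely dynamic power, it would read

\[
P_{dyn}(200\ \text{MHz}) = 4.70 - 4.35 = 0.35\ \text{W}, \qquad
P_{dyn}(400\ \text{MHz}) \approx 0.70\ \text{W}, \qquad
P_{FPGA}(400\ \text{MHz}) \approx 4.35 + 0.70 = 5.05\ \text{W}.
\]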

This new dynamic power for 400 MHz includes both logic gates and clock trees. The MOPS number is shown in Figure 73. FPGA Performance per Watt:

Next we calculate FPGA to CPU Performance per Watt ratio:

Therefore the XC6VLX240T FPGA solution has 9.6 times better Performance per Watt versus the CPU.


Another way of comparing processors is Performance per Watt per Dollar (ppwp$). In this case:

30

31

Finally:

Therefore even after the initial investment, an FPGA-based solution still comes out ahead of a CPU. This number does not include the thermal or power supply costs associated with the CPU solution; if these costs were considered, they would tip the results even more towards the Electron and FPGAs.

6.8 Summary

The Black-Scholes equation is implemented on a CPU and an FPGA. The ML605 development board presented a number of performance limitations.

In this thesis it was identified that at least 20% of applications running in Clouds could be offloaded from CPUs by FPGA technology. These applications are parallel in nature and have a high computation-to-communication ratio. The computation-to-communication ratio determines which applications can take advantage of concurrent processing done on FPGAs [6]. FPGA coprocessors need processing data downloaded from the host system to the FPGA, and results need to be uploaded from the FPGA to the host system. The download and upload times represent the communication part of the computation-to-communication ratio.

30 Source: Canadian Computer
31 Source: Digi-Key

Next, an analysis of the ML605 and technology-related limitations (intellectual property such as the SpeedyPCIe core) is performed, and a set of improvements that would increase throughput is determined.

We applied some of the specified improvements to the experimentally obtained data arriving at the achievable throughput. Improvements are selected based on how easily their contribution can be proven and quantified experimentally (PCIe link speed) or theoretically (revisited Kernel scheduling or adding new Processing Tiles).

The remaining improvements are only listed as opportunities for further research (the Xilinx PCIe core or, more likely, the SpeedyPCIe core, followed by the software driver); their implementation is considered to be beyond the scope of this thesis.

To gain an understanding of available processing capacity offered by Electron and Processing Tiles we show what they can do if not limited by the environment they are used in. We call this “the theoretical throughput” or “maximum throughput”.

[Chart: Processing Tile and CPU Core Throughput – MOPS for PT Experimental, PT Achievable, PT Theoretical and CPU Core Experimental]

Figure 80 – Single Processing Tile and single CPU core Black-Scholes Computational Throughput

Figure 80 shows the obtained throughput numbers expressed in Million Operations per Second (MOPS) of Black-Scholes computations for a single Processing Tile and a single CPU core. Based on the experimental data (severely limited by the ML605 development board), the FPGA needs to instantiate 4.4 Processing Tiles to be ahead of a single CPU core, or 2.3 PTs if the achievable throughput numbers are used. A single PT not limited by the system (the theoretical throughput) offers a 26% improvement over a single CPU core. Section 7.3.2 gives an example of an application whose throughput would be significantly higher than "PT Achievable", approaching "PT Theoretical".

The difference between the achievable and theoretical PT throughput is significant (2.9 times). From the system perspective, the host PC cannot ever see this theoretical throughput. However, we use the theoretical PT throughput to signify the performance offered by the suggested technology as well as to see opportunities for further research (Section 7.3.1). Monte-Carlo based applications are very good candidates. These applications do not require a constant stream of processing data from the host.

Next we look into FPGA and CPU power consumption.

[Chart: power consumption [W] for FPGA and CPU]

Figure 81 – FPGA and CPU Power Consumption, CPU turbo boost mode is turned On

Figure 81 shows, in perspective, the FPGA and CPU power consumed while performing Black-Scholes computations. The FPGA consumes 8.96 times less power. However, we still need to know how this power corresponds to consumed energy: a higher-power device may in theory use less energy if it completes its processing in less time.


To determine this "energy consumed" quality we use Performance per Watt as the criterion.

[Chart: Performance per Watt [MOPS/W] for FPGA and for CPU with TH=1 to TH=4, turbo boost On and Off]

Figure 82 – FPGA and CPU Performance per Watt

Figure 82 shows Performance per Watt (PPW) for the FPGA and the CPU while performing Black-Scholes computations. "TH" stands for the number of threads (CPU cores) and "TB" indicates whether the CPU turbo boost mode is turned On or Off. The FPGA offers 9.6 times better PPW compared to a four-threaded program using the whole CPU. It is also important to note that the FPGA PPW is based on the FPGA achievable throughput; any further throughput improvement would shift this relation even more in favour of the FPGA technology and the suggested Electron architecture.


Chapter 7 - Conclusions and Future Work

7.1 Conclusions

Computer Clouds are major power consumers. It was reported that in 2010 global data centers accounted for 1.1% to 1.5% of total electricity use while in the US that number is between 1.7% and 2.2%.

FPGA technology offers better acceleration and lower power consumption than server CPUs. There are multiple ways of using FPGAs to develop coprocessors; however, when used in Cloud computing the solution space gets significantly smaller. Clouds service a multitude of applications, therefore they require flexible hardware architectures as well as improved processing throughput. This flexibility comes from architectures that can do many things but (a) do not require reconfiguration, (b) minimize the cases when reconfiguration is required, or (c) if reconfiguration is required, minimize its scope and duration (partial reconfiguration). The process of reconfiguration and other design constraints, as well as FPGA design methodologies applicable to Clouds, are analysed and presented.

Currently available FPGA commercial and academic tools and architectures are reviewed. Some of them could meet a sub-set of criteria defined in this project for successful deployment in Clouds. However a more complete architecture is required.

A hardware architecture based on the Electron processor running nano-code on FPGA technology was proposed. This architecture is adaptable to a variety of computational problems. A case study is presented and the advantages of the architecture are demonstrated. This architecture meets all of the coprocessor Cloud deployment criteria.

7.2 Summary of Contributions

In this project it was identified how Cloud energy consumption can be reduced. FPGA technology is selected due to its low power consumption, speedup offered and flexibility.


Requirements for FPGA-based coprocessors are defined. Based on this set of criteria a unique hardware architecture is developed. This architecture is FPGA-based with programmable control paths and controllable and reconfigurable data paths.

This architecture is constructed from the Electron processor and a controllable data path supported by a crossbar. Electron is a Very Long Control Word (VLCW) processor with a Code Stream unit supporting, among other things, branch instructions. A number of crossbar architectures are analysed based on offered throughput, logic resources and backend timing. Two types of crossbars are selected, one for Processing Tiles (2D mesh) and one for the Electron (sparse crossbar).

The Electron runs nano-code. The nano-code VLCW instruction format is defined. The code size compression technique is developed and implemented.

An example design was created based on 6 Processing Tiles each containing 4 Electron Processors. The Processing Tile system-level or Electron kernel nano-code could be built for parallel or sequential processing. In this project each of the 6 tiles is running the same program. The Electron kernel code uses the sequential model.

FPGA resource usage is analysed. The Electron processor and the crossbar take only 25% of the Electron sub-system; the remaining logic resources are taken by processing elements. Constraints for the correct estimate of processing element features (architecture, multi-threading, number of instances) are defined.

System level and Kernel nano-code was developed for the Black-Scholes equation commonly used in the financial industry.

Besides the Verilog RTL code, the verification test bench is developed to test the integrity of the RTL and nano-code.

A software implementation of the Black-Scholes equation running on a PC is developed. A software communication infrastructure consisting of a PCIe driver (design reuse) and test programs is developed. These programs are capable of DMA write and read transfers to the FPGA board running the Black-Scholes nano-code.


Experiments were carried out and the Electron-based design proved to be capable of competing with a CPU software-based solution running on a PC. Measured results showed that the Virtex 6 (XC6VLX240T) Electron-based solution is more stable (smaller jitter), more predictable and slightly faster than the i5-3570 CPU software-based solution, while offering 9.6x better Performance per Watt.

It is possible to design software tools, such as the nano-code compiler; the required tool design methods and constraints are specified. The hardware architecture based on the Electron supports virtual hardware via pre-designed templates. This virtual hardware model provides unlimited flexibility, therefore meeting the needs of the many applications run on Clouds.

Therefore Electron could find its use in the Cloud. It meets all criteria identified in this thesis for successful deployment in the Cloud Computing Environment.

7.3 Future Work

The following are examples of how this work may be continued.

7.3.1 Low Latency PCIe Core

Section 6.8 indicated opportunities to improve the system throughput by improving the PCIe cores. There are two PCIe cores used in this project: the Xilinx low-level PCIe core, and the SpeedyPCIe core with its associated software driver. The SpeedyPCIe core offers DMA capabilities and is designed with consideration for future Xilinx FPGA families and for newer, faster PCIe generations. This design generalization does not deliver the best performance, in particular low latency.

A new core whose main design constraint is low latency is an excellent research area. This core should also offer compile-time configuration capabilities that provide exactly as many interfaces as there are Processing Tiles on the FPGA.

Additionally the software driver that would go with this core should be designed to offer non-blocking write/read capabilities.


7.3.2 Monte Carlo Analysis

One of the computing applications used in the financial industry is regression testing (Section 6.6). A number of calculations are performed in an overnight run, trying to emulate the market conditions that took place the day before. The result of these regression test cases is a set of parameters used to fine-tune the equations used the next day during automatic security and option trading.

Monte Carlo analysis relies on pseudo-random number generation, which is done faster in hardware than in software. An Electron-based FPGA solution would contribute to regression analysis because of:

 Inherited Electron benefits such as parallel processing,
 Faster pseudo-random number generation required for Monte Carlo and
 Low PC-FPGA communication overhead required for regression testing

The processing acceleration described above, plus the lower electricity consumption, provides benefits over the software-based CPU regression solution.
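As an illustration only (not part of the thesis implementation), the following C sketch shows the kind of kernel such an offload would run: a small xorshift64 pseudo-random generator driving a Monte Carlo estimate of a European call price. All option parameters, the path count and the seed are arbitrary assumptions.

    /* Illustrative Monte Carlo option-pricing kernel with a hardware-friendly PRNG. */
    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>

    static uint64_t xorshift64(uint64_t *s)      /* Marsaglia xorshift64 PRNG step */
    {
        uint64_t x = *s;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        return *s = x;
    }

    static double uniform01(uint64_t *s)         /* uniform sample in (0, 1) */
    {
        return ((xorshift64(s) >> 11) + 0.5) * (1.0 / 9007199254740992.0);  /* / 2^53 */
    }

    int main(void)
    {
        const double S = 100.0, K = 100.0, r = 0.05, v = 0.2, T = 1.0;  /* illustrative option */
        const double TWO_PI = 6.283185307179586;
        const long   N = 1000000;                /* number of simulated price paths */
        uint64_t seed = 0x2545F4914F6CDD1DULL;
        double sum = 0.0;

        for (long i = 0; i < N; ++i) {
            /* Box-Muller: one standard normal sample per path */
            double u1 = uniform01(&seed), u2 = uniform01(&seed);
            double z  = sqrt(-2.0 * log(u1)) * cos(TWO_PI * u2);
            /* terminal price under geometric Brownian motion, then the call payoff */
            double ST = S * exp((r - 0.5 * v * v) * T + v * sqrt(T) * z);
            sum += (ST > K) ? (ST - K) : 0.0;
        }
        printf("Monte Carlo call price estimate: %.4f\n", exp(-r * T) * sum / (double)N);
        return 0;
    }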

7.3.3 Nano-code Compilation

In this project nano-code compilation was done by hand (Section 5.2.5). However, the documented steps could be used for automatic code compilation.

There is a huge body of research dedicated to graph theory, and this research is used when compiling code to the underlying hardware. However, a simpler alternative method could also be used: the computational problem is broken into pseudo-code based on equation parentheses and operation dependencies, so in its simplest form the compiler becomes a parsing script. The offloaded calculations would become function calls executed from the main program, and the functions would be executed on the FPGA by the Electron.
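A sketch of this parsing-script idea is shown below: a tiny recursive-descent parser, written here in C, that flattens an arithmetic expression into dependency-ordered three-address pseudo-ops of the kind a nano-code generator could consume. The grammar, the "load" pseudo-op and the temporary-register naming are illustrative assumptions, not the thesis tool.

    /* Illustrative only: flatten an expression into three-address pseudo-ops. */
    #include <ctype.h>
    #include <stdio.h>
    #include <stdlib.h>

    static const char *p;                 /* cursor into the expression string */
    static int temp_id;                   /* running temporary-register counter */

    static int expr(void);                /* grammar: expr   -> term (('+'|'-') term)*     */
    static int term(void);                /*          term   -> factor (('*'|'/') factor)* */
    static int factor(void);              /*          factor -> '(' expr ')' | letter      */

    static void skipws(void) { while (isspace((unsigned char)*p)) p++; }

    static void emit(char op, int dst, int lhs, int rhs)
    {
        printf("t%d = t%d %c t%d\n", dst, lhs, op, rhs);
    }

    static int factor(void)
    {
        skipws();
        if (*p == '(') {                  /* parenthesised sub-expression */
            p++;
            int t = expr();
            skipws();
            if (*p == ')') p++;
            return t;
        }
        if (isalpha((unsigned char)*p)) { /* single-letter operand becomes a load */
            int t = temp_id++;
            printf("t%d = load %c\n", t, *p++);
            return t;
        }
        fprintf(stderr, "parse error near '%s'\n", p);
        exit(1);
    }

    static int term(void)
    {
        int lhs = factor();
        for (skipws(); *p == '*' || *p == '/'; skipws()) {
            char op = *p++;
            int rhs = factor();
            int t = temp_id++;
            emit(op, t, lhs, rhs);
            lhs = t;
        }
        return lhs;
    }

    static int expr(void)
    {
        int lhs = term();
        for (skipws(); *p == '+' || *p == '-'; skipws()) {
            char op = *p++;
            int rhs = term();
            int t = temp_id++;
            emit(op, t, lhs, rhs);
            lhs = t;
        }
        return lhs;
    }

    int main(void)
    {
        p = "(a + b) * (c - d) / e";      /* example expression to flatten */
        int result = expr();
        printf("result in t%d\n", result);
        return 0;
    }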

7.3.4 Processing Elements Optimization

It was shown in Section 5.2 that Kernel 2 takes 136 clocks, of which 32 clocks are taken by the division operation (23% of the kernel processing time). Significant improvements could be achieved with custom-designed versions of commonly used processing elements (multiply-add, or even fundamental operations such as division). As a consequence these processing elements would grow in size; this could be remedied by using one such processing element per PT, a technique suggested in Section 5.4.4.

A library of such modules, in the form of pre-synthesised hardware modules, could be created (virtual hardware); see Section 2.3.3.

7.3.5 Tools and Algorithms for Resource Estimation and Template Matching

It was shown in Section 5.4.3 that Electron Processing Elements could take 72% of Electron resources. Processing Elements (PE) are major contributors to processing throughput and Processing Tile size. Therefore the selection of Processing Elements and their features is an important task.

In this context a template is a member of a hardware architecture library (Section 2.3.3). The library contains pre-synthesized PEs representing a Virtual Hardware model.

A tool that parses the application code to be offloaded and accelerated would determine the PXB and PE requirements. A performance comparison of the existing PEs (already available on the FPGA) against all the template library elements is then carried out. This comparison takes into consideration the reconfiguration speed as well as the size and duration of the computational problem. If a sufficiently faster solution can be obtained by replacing the existing PXB and/or PEs with a different template, the FPGA is partially reconfigured with the new PEs.

Once the above steps are completed, nano-code for the existing hardware is generated and downloaded to the Electron's code memory.
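As a sketch of the template-selection step only, the following C fragment encodes one possible cost model: reconfigure only if the time saved on the remaining workload outweighs the partial-reconfiguration cost by a margin, subject to a LUT budget. The structure fields, the numbers and the 10% margin are illustrative assumptions, not a specified design.

    /* Hypothetical cost model for the template-matching step. */
    #include <stdio.h>

    struct pe_template {
        const char *name;
        double kernel_time_s;       /* estimated time per kernel invocation with this PE set */
        double reconfig_time_s;     /* partial-reconfiguration time (0 for the resident PEs) */
        int    luts;                /* resource footprint, for a capacity check */
    };

    /* Pick the template that minimises total time for the remaining work,
     * subject to the available LUT budget and a "sufficiently faster" margin. */
    static const struct pe_template *
    select_template(const struct pe_template *t, int n, long remaining_kernels,
                    int lut_budget, double margin)
    {
        const struct pe_template *best = &t[0];           /* t[0]: PEs already on the FPGA */
        double best_time = t[0].kernel_time_s * remaining_kernels;
        for (int i = 1; i < n; ++i) {
            if (t[i].luts > lut_budget)
                continue;                                  /* candidate does not fit */
            double total = t[i].reconfig_time_s + t[i].kernel_time_s * remaining_kernels;
            if (total * (1.0 + margin) < best_time) {      /* must be sufficiently faster */
                best = &t[i];
                best_time = total;
            }
        }
        return best;
    }

    int main(void)
    {
        const struct pe_template lib[] = {                 /* illustrative numbers only */
            { "resident",      4.6e-7, 0.00, 19654 },
            { "fused-mul-add", 2.3e-7, 0.12, 21000 },
            { "shared-log",    3.1e-7, 0.10, 15031 },
        };
        const struct pe_template *pick = select_template(lib, 3, 2000000L, 150720, 0.10);
        printf("selected template: %s\n", pick->name);
        return 0;
    }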


References

[1] M. Mills, "The Cloud Begins with Coal," 8 2013. [Online]. Available: http://www.tech-pundit.com/wp- content/uploads/2013/07/Cloud_Begins_With_Coal.pdf?c761ac. [Accessed 17 11 2013].

[2] A. Renbi, L. Lindh and J. Delsing, "Non-Instruction Fetch-Based Architecture Reduces Almost 100 Percent of the Dynamic Power and Energy," in IEEE/ACM Int'l Conference on Green Computing and Communications (GreenCom) & Int'l Conference on Cyber, Physical and Social Computing (CPSCom), Hangzhou, 2010.

[3] A. Madhavapeddy and S. Singh, "Reconfigurable Data Processing for Clouds," in IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), Salt Lake City, UT , 2011.

[4] S. Weston, J.-T. Marin, J. Spooner, O. Pell and O. Mencer, "Accelerating the computation of portfolios of tranched credit derivatives," in IEEE Workshop High Performance Computational Finance (WHPCF), New Orleans, LA, USA , 2010.

[5] J. G. Koomey, "Growth In Data Center Electricity Use 2005 To 2010," 2011. [Online]. Available: http://www.koomey.com/research.html. [Accessed 26 11 2013].

[6] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach, 4th ed., Morgan Kaufmann, 2007.

[7] W. Wang, M. Bolic and J. Parri, "pvFPGA: Accessing an FPGA-based hardware accelerator in a paravirtualized environment," in International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), Montreal, QC, 2013.


[8] Academia.edu Share Research, "http://www.academia.edu/760613/Survey_of_Virtual_Machine_Migration_Techniques," Academia, [Online]. Available: http://www.academia.edu/760613/Survey_of_Virtual_Machine_Migration_Techniques. [Accessed 28 12 2013].

[9] D. Aikema, A. Mirtchovski, C. Kiddle and R. Simmonds, "Green cloud VM migration: Power use analysis," in International Green Computing Conference (IGCC), San Jose, CA , 2012.

[10] H. Jin, L. Deng, S. Wu, X. Shi and X. Pan, "Live virtual machine migration with adaptive, memory compression," in IEEE International Conference on Cluster Computing and Workshops, New Orleans, LA , 2009.

[11] S. Chey, J. Liz, J. W. Sheaffery, K. Skadrony and J. Lachz, "Accelerating Compute- Intensive Applications with GPUs and FPGAs," in Symposium on Application Specific Processors, Anaheim, CA , 2008.

[12] D. B. Thomas, "Acceleration of financial Monte-Carlo simulations using FPGAs," IEEE Workshop on High Performance Computational Finance (WHPCF), 2010.

[13] D. B. Thomas, W. Luk and G. W. Morris, "FPGA accelerated low-latency market data feed processing," 17th IEEE Symposium on High Performance Interconnec, 2009.

[14] A. Irturk, B. Benson, N. Laptev and R. Kastner, "FPGA Acceleration of Mean Variance Framework for Optimal Asset Allocation," in Workshop on High Performance Computational Finance, Austin, TX , 2009.

[15] B. Mackin and N. Woods, "FPGA Acceleration in HPC," Xtreme Data, 2006. [Online]. Available: http://www.hypertransport.org/docs/wp/FPGA_Acceleration_in_HPC_Nov06.pdf. [Accessed 10 11 2013].


[16] M. Chlistalla, "High Frequency Trading," Deutsche Bank Research, 7 February 2011. [Online]. Available: http://www.dbresearch.com/PROD/DBR_INTERNET_EN- PROD/PROD0000000000269468.pdf. [Accessed 10 11 2013].

[17] C. Leber, B. Geib and H. Litz, "High Frequency Trading Acceleration using FPGAs," in International Conference on Field Programmable Logic and Applications (FPL), Chania, 2011.

[18] R. Pottathuparambil, J. Coyne, J. Allred, W. Lynch and V. Natoli, "Low-Latency FPGA Based Financial Data Feed Handler," in FCCM '11 Proceedings of the 2011 IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines , Washington, DC, USA, 2011.

[19] C. Huang, F. Vahid and T. Givargis, "A Custom FPGA Processor for Physical Model Ordinary Differential Equation Solving," IEEE Embedded Systems Letters, vol. 3, no. 4, pp. 113 - 116, 2011.

[20] R. Weber, A. Gothandaraman, R. Hinde and G. Peterson, "Comparing Hardware Accelerators in Scientific Applications: A Case Study," IEEE Transactions on Parallel and Distributed Systems, vol. 22, no. 1, pp. 58-68, 2011.

[21] M. Lin, S. Cheng and J. Wawrzynek, "Cascading Deep Pipelines to Achieve High Throughput in Numerical Reduction Operations," in International Conference on Reconfigurable Computing, Quintana Roo , 2010.

[22] J. Lee, L. Shannon, M. Yedlin and G. Margrave, "A multi-FPGA application-specific architecture for accelerating a floating point Fourier Integral Operator," in International Conference on Application-Specific Systems, Architectures and Processors, 2008, Leuven , 2008.

[23] G. Falcao, M. Owaida, D. Novo, M. Purnaprajna, N. Bellas, C. D. Antonopoulos, G. Karakonstantis, A. Burg and P. Ienne, "Shortening design time through multiplatform simulations with a portable OpenCL golden-model: the LDPC decoder case," in IEEE 20th International Symposium on Field-Programmable Custom Computing Machines, Toronto, ON, 2012.

[24] M. Owaida, N. Bellas, K. Daloukas and C. Antonopoulos, "Synthesis of Platform Architectures from OpenCL Programs," in IEEE 19th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), Salt Lake City, UT, 2011.

[25] N. Kapre and A. DeHon, "Accelerating SPICE Model-Evaluation using FPGAs," in 17th IEEE Symposium on Field Programmable Custom Computing Machines, Napa, CA , 2009.

[26] Altera, "Implementing FPGA Design with the OpenCL Standard," November 2013. [Online]. Available: http://www.altera.com/literature/wp/wp-01173-opencl.pdf. [Accessed 26 11 2013].

[27] P. McLellan, "EDPS: SoC FPGAs," SemiWiki, 04 September 2012. [Online]. Available: http://www.semiwiki.com/forum/content/1155-edps-soc-fpgas.html#comments. [Accessed 11 November 2013].

[28] B. Betkaoui, D. Thomas and W. Luk, "Comparing performance and energy efficiency of FPGAs and GPUs for high productivity computing," in International Conference on Field-Programmable Technology (FPT), Beijing, 2010.

[29] Steve Casselman, DRC Computer Corp, "EE Times," 25 7 2007. [Online]. Available: http://www.eetimes.com/document.asp?doc_id=1274588. [Accessed 10 11 2013].

[30] Wikipedia, "Wikipedia," Wikipedia, 10 11 2013. [Online]. Available: http://en.wikipedia.org/wiki/Cloud_computing. [Accessed 10 11 2013].

[31] Wikinvest, "Wikinvest," Wikinvest, [Online]. Available: http://www.wikinvest.com/concept/Cloud_Computing. [Accessed 10 11 2013].


[32] P. Rid, "Technobuffalo," 17 1 2010. [Online]. Available: http://www.technobuffalo.com/internet/five-examples-of-cloud-computing/. [Accessed 10 11 2013].

[33] I. G. B. S. E. Report, "The power of cloud Driving business model innovation," IT World, 2012.

[34] S. Kestur, J. Davis and O. Williams, "BLAS Comparison on FPGA, CPU and GPU," in IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2010, Lixouri, Kefalonia , 2010.

[35] K. Compton and S. Hauck, "Reconfigurable Computing: A Survey of Systems and Software," ACM Computing Surveys, vol. 34, no. 2, p. 171–210, 2002.

[36] JP Morgan Chase, "FPGA Accelerators at JP Morgan Chase," 13 7 2011. [Online]. Available: http://www.youtube.com/watch?v=9NqX1ETADn0. [Accessed 11 November 2013].

[37] D. Chen and D. Singh, "Using OpenCL to evaluate the efficiency of CPUS, GPUS and FPGAS for information filtering," in 22nd International Conference on Field Programmable Logic and Applications (FPL), Oslo, 2012.

[38] B. Betkaoui, D. Thomas and W. Luk, "Comparing Performance and Energy Efficiency of FPGAs and GPUs for High Productivity Computing," in International Conference on Field-Programmable Technology (FPT), Beijing, 2010.

[39] S. Skalicky, S. Lopez, M. Łukowiak, J. Letendre and D. Gasser, "Linear algebra computations in heterogeneous systems," in IEEE 24th International Conference on Application-Specific Systems, Architectures and Processors (ASAP), Washington, DC , 2013.

[40] Intel, "Heterogeneous Computing in the Cloud: Crunching Big Data and Democratizing HPC Access for the Life Sciences," 12 July 2012. [Online]. Available: http://www.intel.com/content/dam/www/public/us/en/documents/white-papers/xeon-phi-life-sciences-computing-paper.pdf. [Accessed 5 1 2014].

[41] M. Stojilovic, D. Novo, L. Saranovac, P. Brisk and P. Ienne, "Selective flexibility: Breaking the rigidity of datapath merging," Design, Automation & Test in Europe Conference & Exhibition, 2012.

[42] M. Lin, I. Lebedev and J. Wawrzynek, "OpenRCL: Low-Power High-Performance Computing with Reconfigurable Devices," in International Conference on Field Programmable Logic and Applications (FPL), Milano , 2010.

[43] V. Groza and R. Abielmona, "What next? A hardware operating system?," in Proceedings of the 21st IEEE Instrumentation and Measurement Technology Conference, 2004.

[44] B. Holland, M. Vacas, V. Aggarwal, R. DeVille, I. Troxel and A. D. George, "Survey of C-based Application Mapping Tools for Reconfigurable Computing," 2005. [Online]. Available: https://wiki.ittc.ku.edu/eecs700/images/b/b8/Holland.pdf. [Accessed 30 11 2013].

[45] P. Karlström, Z. Wenbiao, W. Chinghan and D. Liu, "Design of PIONEER: A case study using NoGap," in 2010 Asia Pacific Conference on Postgraduate Research in Microelectronics and Electronics (PrimeAsia), Shanghai, 2010.

[46] A. Kritikakou, F. Catthoor, G. Athanasiou, V. Kelefouras and C. Goutis, "A template- based methodology for efficient microprocessor and FPGA accelerator co-design," in International Conference on Embedded Computer System, Samos, 2012.

[47] A. Nanda, "Accelerate Performance and Design Productivity with OpenCL on Altera FPGAs," Altera, 2012. [Online]. Available: http://www.altera.com/education/webcasts/all/source-files/wc-2012-opencl/player.html?GSA_pos=2&WT.oss_r=1&WT.oss=opencl. [Accessed 16 11 2013].

[48] SRC Computers, "SRC Technology," SRC Computers, [Online]. Available: http://www.srccomp.com/techpubs/techoverview.asp. [Accessed 10 11 2013].

[49] Forte Design Systems, "Cynthesizer 5," Forte Design Systems, [Online]. Available: http://www.forteds.com/products/cynthesizer.asp. [Accessed 10 11 2013].

[50] Mentor Graphics, "Handel-C Synthesis Methodology," Mentor Graphics, [Online]. Available: http://www.mentor.com/products/fpga/handel-c/. [Accessed 10 11 2013].

[51] Impulse Accelerated, "Impulse CoDeveloper C-to-FPGA Tools," impulseaccelerated, [Online]. Available: http://www.impulseaccelerated.com/products_universal.htm. [Accessed 10 11 2013].

[52] R. Goering, "Open-source C compiler targets FPGAs," EE Times, 18 November 2002. [Online]. Available: http://www.eetimes.com/document.asp?doc_id=1200189. [Accessed 10 11 2013].

[53] M. Gokhale, J. Frigo, C. Ahrens and M. Popkin, "Sc2 C-to-FPGA Compiler," MIT, [Online]. Available: http://www.ll.mit.edu/HPEC/agendas/proc02/presentations/pdfs/gokhale.PDF. [Accessed 10 11 2013].

[54] D. Pellerin, Practical FPGA Programming in C, 1st ed., Prentice Hall, 2005.

[55] O. Mencer, "ASC: A Stream Compiler for Computing With FPGAs," IEEE Transactions on computer-aided design of integrated circuits and systems, vol. 25, no. 9, pp. 1603 - 1617 , September 2006.

[56] Maxeler, "Maximum Performance Computing," Maxeler, [Online]. Available: http://www.maxeler.com. [Accessed 10 11 2013].


[57] Berkeley Design Technology, Inc, "High-Level Synthesis Tools for Xilinx FPGAs," 2010. [Online]. Available: http://www.xilinx.com/technology/dsp/BDTI_techpaper.pdf. [Accessed 10 11 2013].

[58] D. Navarro, O. Lucia, L. Barragan, I. Urriza and O. Jimenez, "High-Level Synthesis for Accelerating the FPGA Implementation of Computationally Demanding," IEEE Transactions on Industrial Informatics, vol. 9, no. 3, pp. 1371 - 1379 , 2013.

[59] J. Hiraiwa and H. Amano, "An FPGA Implementation of Reconfigurable Real-Time Vision Architecture," in 27th International Conference on Advanced Information Networking and Applications, Barcelona, 2013.

[60] A. W. Bohm, "Compiling high-level programs to FPGA configurations," University, Colorado State, [Online]. Available: http://www.cs.colostate.edu/cameron/index.html. [Accessed 10 11 2013].

[61] N. Rotem, "Compile Your C code into Verilog," 2009. [Online]. Available: http://www.c-to-verilog.com/. [Accessed 10 11 2013].

[62] A. Papakonstantinou, K. Gururaj, J. Stratton, D. Chen, J. Cong and W.-M. Hwu, "FCUDA: Enabling efficient compilation of CUDA kernels onto FPGAs," in IEEE 7th Symposium on Application Specific Processors, San Francisco, CA , 2009.

[63] J. Curreri, S. Koehler, B. Holland and A. George, "Performance Analysis with High- Level Languages for High-Performance Reconfigurable Computing," in 16th International Symposium on Field-Programmable Custom Computing Machines, Palo Alto, CA , 2008.

[64] D. Sanchez-Roman, G. Sutter, S. Lopez-Buedo, I. Gonzalez, F. Gomez-Arribas, J. Aracil and F. Palacios, "High-Level Languages and Floating-Point Arithmetic for FPGA-Based CFD Simulations," in IEEE Design & Test of Computers, 2011.


[65] D. Sanchez-Roman, G. Sutter, S. Lopez-Buedo, I. Gonzalez, F. Gomez-Arribas and J. Aracil, "An Euler solver accelerator in FPGA for computational fluid dynamics applications," in VII Southern Conference on Programmable Logic (SPL), 2011.

[66] D. Gajski and M. Reshadi, "A cycle-accurate compilation algorithm for custom pipelined datapaths," in Hardware/Software Codesign and System Synthesis, Jersey City, NJ, USA , 2005.

[67] D. Ivosevic and V. Sruk, "Evaluation of embedded processor based BDD implementation," in Proceedings of the 33rd International Convention MIPRO, 2010 Proceedings of the 33rd International Convention, Opatija, Croatia , 2010.

[68] B. Gorjiara and D. Gajski, "Automatic architecture refinement techniques for customizing processing elements," in Design Automation Conference, Anaheim, CA , 2008.

[69] B. Gorjiara and D. Gajski, "Custom using NISC: a case-study on DCT algorithm," Embedded Systems for Real-Time Multimedia, pp. 55 - 60 , 2005.

[70] M. Reshadi, B. Gorjiara and D. Gajski, "Utilizing horizontal and vertical parallelism with a no-instruction-set compiler for custom datapaths," in 2005 IEEE International Conference on Computer Design: VLSI in Computers and Processors, 2005.

[71] B. Gorjiara, M. Reshadi and D. Gajski, "Designing a custom architecture for DCT using NISC technology," in Asia and South Pacific Conference on Design Automation, Yokohama, 2006.

[72] M. Reshadi, B. Gorjara and D. Gajski, "C-Based Design Flow: A Case Study on G.729A for Voice over Internet Protocol (VoIP)," in Design Automation Conference, 2008. DAC 2008. 45th ACM/IEEE , Anaheim, CA , 2008.

[73] D. Ivosevic and V. Sruk, "Automated modeling of custom processors for DCT algorithm," in Proceedings of the 34th International Convention MIPRO, Opatija, Croatia, 2011.

[74] J. Trajkovic and D. Gajski, "Custom Processor Core Construction from C Code," in Symposium on Application Specific Processors, 2008. SASP 2008., Anaheim, CA , 2008.

[75] B. Gorjiara, "Controlling LEDs and Switches on a Xilinx Vertix-4 Video Starter Kit Using NIOS COmponents," Donald Bren School of Information and Computer Sciences, [Online]. Available: http://www.ics.uci.edu/~nisc/designs/XilinxBoard_VideoStarterKit/LEDs- Switches/LEDs-and-Switches.pdf. [Accessed 10 11 2013].

[76] M. Thuresson, M. Sjalander, M. Bjork, L. Svensson, P. Larsson-Edefors and P. Stenstrom, "FlexCore: Utilizing Exposed Datapath Control for Efficient Computing," in International Conference on Embedded Computer Systems: Architectures, Modeling and Simulation, IC-SAMOS 2007., Samos , 2007.

[77] S. Hong, Sadasivam and X. L. M., "Post-generation of overall execution controller for data centric signal processing algorithms," Proceedings 7th International Conference on Signal Processing, vol. 1, pp. 551 - 554, 2004.

[78] X. Liang, A. Athalye and S. Hong, "Dynamic coarse grain dataflow reconfiguration technique for real-time systems design," IEEE International Symposium on Circuits and Systems, vol. 4, pp. 3511 - 3514, 2005.

[79] X. Liang, A. Athalye and S. Hong, "Equalizing data-path for processing speed determination in block level pipelining," IEEE International Symposium on Circuits and Systems, vol. 2, pp. 1646 - 1649, 2005.

[80] S. Hong, J. Lee, A. Athalye, P. Djuric and W.-D. Cho, "Design Methodology for Domain Specific Parameterizable Particle Filter Realizations," IEEE Transactions on Circuits and Systems, vol. 54, no. 9, pp. 1987 - 2000, 2007.

[81] W. Chun, S. Yoon and S. Hong, "Buffer Controller-Based Multiple Processing Element Utilization for Dataflow Synthesis," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 19, no. 7, pp. 1249 - 1262, 2011.

[82] H. Jung, K. Lee and S. Ha, "Efficient hardware controller synthesis for synchronous dataflow graph in system level design," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 10, no. 4, pp. 423 - 428 , 2002.

[83] D. A. Patterson and J. L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, Morgan Kaufmann, 2012.

[84] S.-T. Chuang, A. Goel, N. McKeown and B. Prabhakar, "Matching output queueing with a combined input output queued switch," in Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies, New York, NY , 1999.

[85] Y. Zheng and C. Shao, "An Efficient Round-Robin Algorithm for Combined Input- Crosspoint-Queued Switches," in Joint International Conference on Autonomic and Autonomous Systems and International Conference on Networking and Services, Papeete, Tahiti , 2005.

[86] T. Javidi, R. Magill and T. Hrabik, "A High Throughput Scheduling Algorithm for a Buffered Crossbar Switch Fabric," IEEE International Conference on Communications, vol. 5, pp. 1586 - 1591, 2001.

[87] S.-T. Chuang, S. Iyer and N. McKeown, "Practical algorithms for performance guarantees in buffered crossbars," Proceedings IEEE INFOCOM 2005. 24th Annual Joint Conference of the IEEE Computer and Communications Societies. , vol. 2, pp. 981 - 991, 2005.

[88] M. Lin and N. McKeown, "The Throughput of a Buffered Crossbar Switch," IEEE Communications Letters, vol. 9, no. 5, pp. 465 - 467, 2005.

[89] J. Turner, "Strong Performance Guarantees for Asynchronous Buffered Crossbar," IEEE/ACM Transactions on Networking, vol. 17, no. 4, pp. 1017 - 1028 , 2009.

[90] M. Modarressi, A. Tavakkol and H. Sarbazi-Azad, "Application-Aware Topology Reconfiguration for On-Chip Networks," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 19, no. 11, pp. 2010 - 2022 , November 2011.

[91] M. Stensgaard and J. Lyngby Sparso, "ReNoC: A Network-on-Chip Architecture with Reconfigurable Topology," Second ACM/IEEE International Symposium on Networks-on-Chip, 2008. NoCS 2008, pp. 55-64, April 2008.

[92] D. Greaves and S. Singh, "Distributing C# methods and threads over Ethernet- connected FPGAs using Kiwi," in 9th IEEE/ACM International Conference on Formal Methods and Models for Codesign (MEMOCODE), Cambridge, 2011.

[93] J. A. Fisher, "Very Long Instruction Word Architectures And The Eli-512," Solid-State Circuits Magazine, IEEE, vol. 1, no. 2, pp. 23-33, 1988.

[94] S. Rixner, Stream Processor Architecture, 1st ed., Springer, 2001.

[95] IBM, PowerLinuxTeam, "Practical experiences with OS Jitter," IBM, PowerLinuxTeam, April 2013. [Online]. Available: http://www.ibm.com/developerworks/. [Accessed 10 11 2013].

[96] Altera Corporation , "Reducing Power Consumption and Increasing Bandwidth on 28-nm FPGAs," March 2012. [Online]. Available: http://www.altera.com/literature/wp/wp-01148-stxv-power-consumption.pdf. [Accessed 13 11 2013].

[97] Wikipedia, "Amdahl's Law," Wikipedia, [Online]. Available: http://en.wikipedia.org/wiki/Amdahl%27s_law. [Accessed 27 December 2013].


[98] M. Hill and M. Marty, "Amdahl's Law in the Multicore Era," Computer, vol. 41, no. 7, pp. 33-38, 2008.

[99] RiskEncyclopedia, "Merton (1973) Option Pricing Formula," RiskEncyclopedia, [Online]. Available: http://riskencyclopedia.com/articles/merton_1973/. [Accessed 10 11 2013].

[100] Wikipedia, "Numerical approximations for the normal CDF," Wikipedia, [Online]. Available: https://en.wikipedia.org/wiki/Normal_distribution. [Accessed 10 11 2013].

[101] R. Bittner, "Speedy bus mastering PCI express," in International Conference on Field Programmable Logic and Applications, Oslo, 2012.

[102] R. Bittner, "Speedy PCIe User Guide v1.0," 2012. [Online]. Available: http://www.google.ca/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&v ed=0CCsQFjAA&url=http%3A%2F%2Fresearch.microsoft.com%2Fpubs%2F1928 84%2F20130513%2520Cluster%2520Computing%2520Journal%2520Springer% 2520Final%252010.1007_s10586-013-0280-9.pdf&ei=de6bUo. [Accessed 01 12 2013].

[103] D. B. Thomas and W. Luk, "An FPGA-Specific Algorithm for Direct Generation of Multi-Variate Gaussian Random Numbers," in 21st IEEE International Conference on Application-specific Systems Architectures and Processors (ASAP), Rennes, France , 2010.

[104] A. Tse, D. Thomas, K. Tsoi and W. Luk, "Dynamic scheduling Monte-Carlo framework for multi-accelerator heterogeneous clusters," in International Conference on Field-Programmable Technology (FPT), Beijing , 2010.

[105] I. Godard, "Drinking from the Firehose: How the Mill CPU Decodes 30+ Instructions per Cycle," Stanford Seminar, 4 6 2013. [Online]. Available: http://www.youtube.com/watch?v=LgLNyMAi-0I. [Accessed 20 11 2013].

[106] S. Keckler, W. Dally, D. Maskit, N. Carter, A. Chang and W. Lee, "Exploiting fine-grain thread level parallelism on the MIT multi-ALU processor," in The 25th Annual International Symposium on Computer Architecture, Barcelona, 1998.

[107] Wikipedia, "Microcode," Wikipedia, [Online]. Available: http://en.wikipedia.org/wiki/Microcode. [Accessed 27 12 2013].

[108] K. Ovtcharov, I. Tili and J. Steffan, "A Multithreaded VLIW Soft Processor Family," in IEEE 21st Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), Seattle, WA , 2013.


Appendix A – Speedy PCIe

In this section some Speedy PCIe operation specifics are documented.

Slave Write

The bus master initiates a slave write PCIe transfer with 24x 16-DW bursts, totaling 1536 bytes. A 16-DW transfer is the smallest unit of transfer available from Speedy PCIe (driven by the Speedy PCIe architecture).

Figure 83 – Slave Write - Waveform

Slave Read

Figure 84 – Slave Read - Waveform

Slave Read – Speedy PCIe extra read

Speedy PCIe works very well. However, there is an issue a user needs to be aware of: under some conditions the FPGA IP or software driver creates an extra read. This extra read cannot be serviced by the Processing Tiles, since it is a read for non-existing data, and therefore it cannot be completed. This would cause the whole application to block, with the software driver waiting for the extra read to complete.

Here is an example. A slave read request for 512 DWs was placed. There are 32 "expected" reads (32x 16-DW bursts totaling 2 KB). Approximately 125 clock cycles (5 ns clock period) after the expected reads there is an extra read request. It never gets completed since there is nothing to read.

Figure 85 – Speedy PCIe 32x 16-DW slave read

This unconfirmed Speedy PCIe "bug" was addressed by implementing a workaround within the FPGA: if an extra slave read is requested for Processing Tile results that are never calculated by the FPGA, the FPGA responds with 32'hBEEF_BEEF data.

There is an approximately 600 ns penalty on top of the expected slave read latency, affecting to a degree some of the Processing Tile throughput results.
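A sketch of how the host side might cope with this workaround (not the actual SpeedyPCIe driver code): any 32-bit result word equal to the 0xBEEF_BEEF filler returned for the spurious read is dropped before the results are consumed. The buffer layout and sizes are illustrative assumptions.

    /* Hypothetical host-side filtering of the 0xBEEFBEEF filler words. */
    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>

    #define SPEEDY_FILLER 0xBEEFBEEFu

    /* Compact a DMA read buffer in place, dropping filler words;
     * returns the number of valid result words kept. */
    static size_t drop_filler(uint32_t *buf, size_t words)
    {
        size_t kept = 0;
        for (size_t i = 0; i < words; ++i)
            if (buf[i] != SPEEDY_FILLER)
                buf[kept++] = buf[i];
        return kept;
    }

    int main(void)
    {
        uint32_t dma[6] = { 1, 2, SPEEDY_FILLER, 3, SPEEDY_FILLER, 4 };
        size_t n = drop_filler(dma, 6);
        printf("%zu valid words\n", n);   /* prints: 4 valid words */
        return 0;
    }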

Figure 86 – Speedy PCIe 36x 16-DW slave read

For some reason, and not always, an extra write would take place. DDR3_CMD_DATA_MASK_L, when low, indicates that the data are valid. Two red ovals depict two moments where the same data is written twice to the same address. This behaviour makes the use of a FIFO inconvenient, since addresses need to be compared (the previous with the new) to ensure that no double writes take place.


Figure 87 – Speedy PCIe Bug
