A Fixed Frequency FPGA Architecture by Nicholas Croyle Weaver BA
Total Page:16
File Type:pdf, Size:1020Kb
The SFRA: A Fixed Frequency FPGA Architecture by Nicholas Croyle Weaver B.A. (University of California at Berkeley) 1995 A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the GRADUATE DIVISION of the UNIVERSITY of CALIFORNIA at BERKELEY Committee in charge: Professor John Wawrzynek, Chair Professor John Kubiatowicz Professor Steven Brenner 2003 The dissertation of Nicholas Croyle Weaver is approved: Chair Date Date Date University of California at Berkeley 2003 The SFRA: A Fixed Frequency FPGA Architecture Copyright 2003 by Nicholas Croyle Weaver 1 Abstract The SFRA: A Fixed Frequency FPGA Architecture by Nicholas Croyle Weaver Doctor of Philosophy in Computer Science University of California at Berkeley Professor John Wawrzynek, Chair Field Programmable Gate Arrays (FPGAs) are synchronous digital devices used to realize digital designs on a programmable fabric. Conventional FPGAs use design dependent clocking, so the resulting clock frequency is dependent on the user design and the mapping process. An alternative approach, a Fixed-Frequency FPGA, has an intrinsic clock rate at which all designs will operate after automatic or manual modification. Fixed-Frequency FPGAs offer significant advantages in computational throughput, throughput per unit area, and ease of integration as a computational coprocessor in a system on a chip. Previous Fixed-Frequency FPGAs either suffered from restrictive interconnects, application restrictions, or issues arising from the need for new toolflows. This thesis proposes a new interconnect topology, a “Corner Turn” network, which main- tains the placement properties and upstream toolflows of a conventional FPGA while allowing effi- cient, pipelined, fixed frequency operation. Global routing for this topology uses efficient, polyno- mial time heuristics and complete searches. Detailed routing uses channel-independent packing. C-slow retiming, a process of increasing the throughput by interleaving independent streams of execution, is used to automatically modify designs so they operate at the array’s intrinsic clock frequency. An additional C-slowing tool improves designs targeting conventional FPGAs to isolate the benefits of this transformation from those arising from fixed frequency operation. This seman- tic transformation can even be applied to a microprocessor, automatically creating an interleaved multithreaded architecture. Since the Corner Turn topology maintains conventional placement properties, this thesis defines a Fixed-Frequency FPGA architecture which is placement and tool compatible with Xil- 2 inx Virtex FPGAs. By defining the architecture, estimating the area and performance cost, and providing routing and retiming tools, this thesis directly compares the costs and benefits of a Fixed- Frequency FPGA with a commercial FPGA. Professor John Wawrzynek Dissertation Committee Chair iii To my grandparents, who made this all possible. iv Contents List of Figures ix List of Tables xiii 1 Introduction 1 1.1 What are FPGAs? . 2 1.2 Why FPGAs as Computational Devices? . 4 1.3 Why Fixed-Frequency FPGAs? . 5 1.4 Why the SFRA? . 7 1.4.1 Why a Corner-Turn Interconnect? . 7 1.4.2 Why C-slow Retiming? . 9 1.4.3 The Complete Research Toolflow . 10 1.4.4 This Work’s Contributions . 11 1.5 Overview of Thesis . 12 2 Related Work 14 2.1 FPGAs . 14 2.1.1 Unpipelined interconnect, variable frequency . 15 2.1.2 Unpipelined Interconnect Studies . 18 2.1.3 Pipelined Interconnect, Variable Frequency . 19 2.1.4 Pipelined, fixed frequency . 20 2.2 FPGA Tools . 21 2.2.1 General Placement . 21 2.2.2 Datapath Placement . 22 2.2.3 Routing . 23 2.2.4 Retiming . 23 3 Benchmarks 25 3.1 AES Encryption . 25 3.2 Smith Waterman . 27 3.3 Synthetic microprocessor datapath . 29 3.4 LEON Synthesized SPARC core . 30 3.5 Summary . 31 v I The SFRA Architecture 32 4 The Corner-Turn FPGA Topology 34 4.1 The Corner-Turn Interconnect . 34 4.2 Summary . 38 5 The SFRA: A Pipelined, Corner-Turn FPGA 39 5.1 The SFRA Architecture . 39 5.2 The SFRA Toolflow . 43 5.3 Summary . 44 6 Retiming Registers 45 6.1 Input Retiming . 45 6.2 Output Retiming . 46 6.3 Mixed Retiming . 48 6.4 Deterministic Routing Delays . 48 6.5 Area Costs . 50 6.6 Retiming and the SFRA . 51 6.7 Summary . 51 7 The Layout of the SFRA 52 7.1 Layout Strategy . 52 7.1.1 SRAM Cell . 55 7.1.2 Output Buffers . 57 7.1.3 Input Buffers . 58 7.1.4 Interconnect Buffers . 59 7.2 Overall Area . 60 7.3 Timing Results . 61 7.4 Summary . 61 8 Routing a Corner-Turn Interconnect 63 8.1 Interaction with Retiming . 64 8.2 The Routing Process . 64 8.2.1 Direct Routing . 65 8.2.2 Fanout Routing . 65 8.2.3 Pushrouting . 66 8.2.4 Zig-zag (2 turn) routing . 67 8.2.5 Randomized Ripup and Reroute . 68 8.2.6 Detailed Routing . 68 8.3 Routing Quality & Analysis . 69 8.3.1 AES . 69 8.3.2 Smith/Waterman . 71 8.3.3 Synthetic Microprocessor Datapath . 72 8.3.4 Leon 1 . 74 8.4 Summary . 75 vi 9 Other Implications of the Corner-Turn Interconnect 76 9.1 Defect Tolerance . 76 9.2 Memory Blocks . 78 9.3 Hardware Assisted Routing . 79 9.4 Heterogeneous channels . 80 9.5 Summary . 80 II Retiming 81 10 Retiming, C-slow Retiming, and Repipelining 83 10.1 Retiming . 84 10.2 Repipelining . 87 10.3 C-slow Retiming . 88 10.4 Retiming for Conventional FPGAs . 90 10.5 Retiming for Fixed Frequency FPGAs . 91 10.6 Previous Retiming Tools . 93 10.7 Summary . 94 11 Applying C-Slow Retiming to Microprocessor Architectures 95 11.1 Introduction . 95 11.2 C-slowing a Microprocessor . 96 11.2.1 Register File . 98 11.2.2 TLB . 99 11.2.3 Caches and Memory . 99 11.2.4 Branch Prediction . 100 11.2.5 Control Registers and Machine State . 100 11.2.6 Interrupts . 101 11.2.7 Interthread Synchronization . 101 11.2.8 Power Consumption . 101 11.3 C-slowing LEON . 102 11.4 Summary . 103 12 The Effects of Placement and Hand C-slowing 104 12.1 The Effects of Hand Placement and C-slow Retiming . 104 12.2 AES Results . 106 12.3 Smith/Waterman results . ..