Are Field-Programmable Gate Arrays Ready for the Mainstream?

Total Page:16

File Type:pdf, Size:1020Kb

Are Field-Programmable Gate Arrays Ready for the Mainstream? [3B2-9] mmi2011060058.3d 23/11/011 16:51 Page 58 Prolegomena ...............................................................................................................................Department Editors: Kevin Rudd and Kevin Skadron ................................................................................... Are Field-Programmable Gate Arrays Ready for the Mainstream? GREG STITT University of Florida .......For more than a decade, will attempt to answer this question by configurable logic blocks (CLBs), multi- researchers have shown that field- discussing the major barriers that have pliers, digital signal processors, and on- programmable gate arrays (FPGAs) can prevented more widespread FPGA chip RAMs. Some embedded-systems- accelerate a wide variety of software, usage, in addition to surveying research specialized FPGAs, such as Xilinx’s in some cases by several orders of mag- and necessary innovations that could po- Virtex-5Q FXT (http://www.xilinx.com/ nitude compared to state-of-the-art tentially eliminate these barriers. products/silicon-devices/fpga/virtex-5q/ microprocessors.1-3 Despite this perfor- fxt.htm), even contain microprocessor mance advantage, FPGA accelerators FPGA architectures cores. In addition, for FPGAs without have not yet been accepted by main- To understand the barriers to main- these microprocessors, designers can stream application designers and instead stream usage, we must first understand use the lookup tables to implement remain a niche technology used mainly how FPGA architectures differ from mul- numerous soft processors—such as in embedded systems. ticore and GPU architectures. The most Xilinx’s MicroBlaze Soft Processor By contrast, GPUs have quickly fundamental difference is that multi- (http://www.xilinx.com/tools/microblaze. gained wide popularity. Although GPUs cores and GPUs provide general-purpose htm) and Altera’s Nios II Processor often provide better performance, espe- or specialized processors to execute par- (http://www.altera.com/devices/processor/ cially for floating-point-intensive applica- allel threads, whereas FPGA architec- nios2/ni2-index.html)—in some cases tions, FPGAs commonly have significant tures implement digital circuits by even more than 1,000.8 Although such advantages for bit-level and integer oper- providing numerous lookup tables imple- processors make an FPGA look similar ations.4,5 In addition, FPGAs often have mented in small RAMs. Lookup tables to a GPU or multicore, soft processors better energy efficiency,6 in some cases implement combinational logic by storing are an order of magnitude slower than by as much as 10 times.5 A motivating the corresponding truth table and using most dedicated processors, which has example is the FPGA-based Novo-G sys- the logic inputs as the address into the resulted in niche usage (for example, tem,7 which provides performance for lookup table. Figure 1 shows a simple embedded systems and multicore emu- computational biology applications that’s example of a 32-bit adder decomposed lation8). For mainstream computing, comparable to the Roadrunner and into 32 full adder circuits, which synthe- soft processors would likely be used as Jaguar supercomputers, while only con- sis tools might map onto 32 lookup controllers for custom pipelined circuits suming 8 kW compared to several tables. Similarly, FPGAs enable sequen- in other parts of the FPGA. megawatts. tial circuits by providing flip-flops along To combine these resources into a With the importance of energy effi- lookup table outputs. By providing hun- larger circuit, FPGAs provide reconfigura- ciency growing rapidly, these surprising dreds of thousands of lookup tables ble interconnect similar to Figure 2. In results raise the following question: if and flip-flops, FPGAs can implement between each row and column of FPGAs offer such significant advantages massively parallel circuits. resources (such as lookup tables and over alternative devices, why are they Modern FPGAs also commonly have CLBs), FPGAs contain numerous routing still a niche technology? This column coarser-grained resources such as tracks, which are wires that carry signals .............................................................. 58 Published by the IEEE Computer Society 0272-1732/11/$26.00 c 2011 IEEE [3B2-9] mmi2011060058.3d 23/11/011 16:51 Page 59 compilers in addition to codesign tools pipelined data path in Figure 3b that fully + 32-bit adder for integrating the processors with cus- unrolls the inner loop, resulting in 64 mul- tom circuits in other parts of the FPGA. tiplications and 63 additions (that is, one 32 full adders For FPGAs in PCI Express accelerator iteration of the outer loop) each cycle. boards, board vendors provide a commu- Other parts of the circuit (such as control, nication API for transferring data to and buffers, and memory) are not shown. from software code running on a host As you would expect, designing RTL 32 LUTs processor. circuits is time consuming. For the exam- LUT . LUT ple in Figure 3b, designers must specify Design methodologies the entire structure of the data path and Figure 1. Field-programmable gate and programming models must also define control for reading arrays (FPGAs) perform computation To use an FPGA for accelerating soft- inputs from memories into buffers, stall- by implementing the corresponding ware, designers typically start by profil- ing the data path when buffers are full logic (for example, a 32-bit adder) ing software, identifying the most or empty, writing outputs to memory, using lookup tables (LUTs). For this computationally intensive regions, and and so on. For a typical FPGA board, a example, synthesis tools divide the then implementing those regions as cir- complete RTL implementation of the sev- 32-bit adder into smaller circuits (for cuits in the FPGA. eral lines of C code in Figure 3a can re- example, 32 full adders) and then To implement a region of code on quire more than 1,000 lines of VHDL. map those circuits onto lookup tables the FPGA, there are several possible Such complexity leads to the frequent with the same or larger numbers of programming models. The most com- question: if compilers can extract parallel- inputs and outputs. mon model is to manually convert the ism from high-level code for multicores code into a semantically equivalent and GPUs, why can’t they do the same RTL circuit, which designers typically for FPGAs? For more than a decade, specify using hardware-description lan- researchers have worked toward this across the chip. Connection boxes pro- guages (HDLs) such as VHSIC Hard- goal with high-level synthesis tools.9 vide programmable connections between ware Description Language (VHDL) or Commercial high-level synthesis tools are resource I/O and routing tracks. Similarly, Verilog. Figure 3a shows an example becoming increasingly common (for ex- switch boxes provide programmable of a high-level loop and a corresponding ample, AutoPilot, Catapult C, Impulse C), connections between routing tracks where each row and column intersect, which allows for tools to route a signal to any destination on the device. Such an architecture is commonly called an island-style fabric, where computational resources are analogous to islands in a sea of interconnect. LUT, CLB, DSP, To implement an application on this Mult, RAM, etc. architecture, FPGA tools typically take a Switch box register-transfer-level (RTL) circuit and perform technology mapping to convert Connection box the circuit into device resources (for ex- ample, converting an adder to lookup tables in Figure 1), followed by place- ment to map each technology-mapped component onto physical resources, and finally routing to program the inter- connect to implement all connections in Figure 2. A typical FPGA architecture. FPGAs contain numerous routing tracks, the circuit. When using processors, ei- which carry signals across the chip between computational resources (for ther soft or hard, device vendors provide example, configurable logic blocks [CLBs], multipliers [Mult], and digital signal tool frameworks such as Xilinx’s EDK processors [DSPs]). Connection boxes provide programmable connections (Embedded Development Kit) and between resource I/O and routing tracks, while switch boxes provide Altera’s SOPC (System on a Programma- connections between routing tracks only. ble Chip) Builder that provide software .................................................................... NOVEMBER/DECEMBER 2011 59 [3B2-9] mmi2011060058.3d 23/11/011 16:51 Page 60 .......................................................................................................................................................................................................................... PROLEGOMENA void filter(float a[], Such limitations aren’t unique to FPGA usage by inhibiting the emergence float b[], float c[]) { for(int i¼0; FPGAs. GPUs have similar problems of a ‘‘killer app’’ supporting high sales i < OUTPUT_SIZE; iþþ){ with these same higher-level software volumes that subsidize development of c[i]¼0; for (int j¼0; j < 64; constructs and also require a significant new features and provide economies of jþþ){ amount of parallelism. One key difference scale to lower prices. Whereas GPUs c[i] þ¼ a[iþj]*b[j]; } between FPGA and GPU amenability is became widespread owing to computer } that high-performance FPGA applications graphics’ popularity in desktops, laptops, } generally require deep pipelines, such as and gaming consoles, FPGAs evolved (a) the example in Figure
Recommended publications
  • An Introduction to High-Level Synthesis
    High-Level Synthesis An Introduction to High-Level Synthesis Philippe Coussy Michael Meredith Universite´ de Bretagne-Sud, Lab-STICC Forte Design Systems Daniel D. Gajski Andres Takach University of California, Irvine Mentor Graphics today would even think of program- Editor’s note: ming a complex software application High-level synthesis raises the design abstraction level and allows rapid gener- solely by using an assembly language. ation of optimized RTL hardware for performance, area, and power require- In the hardware domain, specification ments. This article gives an overview of state-of-the-art HLS techniques and languages and design methodologies tools. 1,2 ÀÀTim Cheng, Editor in Chief have evolved similarly. For this reason, until the late 1960s, ICs were designed, optimized, and laid out by hand. Simula- THE GROWING CAPABILITIES of silicon technology tion at the gate level appeared in the early 1970s, and and the increasing complexity of applications in re- cycle-based simulation became available by 1979. Tech- cent decades have forced design methodologies niques introduced during the 1980s included place-and- and tools to move to higher abstraction levels. Raising route, schematic circuit capture, formal verification, the abstraction levels and accelerating automation of and static timing analysis. Hardware description lan- both the synthesis and the verification processes have guages (HDLs), such as Verilog (1986) and VHDL for this reason always been key factors in the evolu- (1987), have enabled wide adoption of simulation tion of the design process, which in turn has allowed tools. These HDLs have also served as inputs to logic designers to explore the design space efficiently and synthesis tools leading to the definition of their synthe- rapidly.
    [Show full text]
  • Automatic Generation of Hardware for Custom Instructions
    Automatic Generation of Hardware for Custom Instructions by Philip Ioan Necsulescu Thesis submitted to the Faculty of Graduate and Postdoctoral Studies in partial fulfillment of the requirements for the degree of Master of Science in Systems Science Interdisciplinary studies Faculty of Graduate and Postdoctoral Studies © Philip Ioan Necsulescu, Ottawa, Canada, 2011 Table of Contents List of Figures ................................................................................................................................................ 3 Acknowledgements ....................................................................................................................................... 4 Abstract ......................................................................................................................................................... 5 Glossary ......................................................................................................................................................... 6 1 - Introduction ............................................................................................................................................. 8 2 - Background (Literature Review) ............................................................................................................ 12 2.1 – Applications of Custom Instructions .............................................................................................. 12 2.2 – Comparison of Automated HDL Generation Porjects ...................................................................
    [Show full text]
  • Xcell Journal Issue 50, Fall 2004
    ISSUE 50, FALL 2004ISSUE 50, FALL XCELL JOURNAL XILINX, INC. Issue 50 Fall 2004 XcellXcelljournaljournal THETHE AUTHORITATIVEAUTHORITATIVE JOURNALJOURNAL FORFOR PROGRAMMABLEPROGRAMMABLE LOGICLOGIC USERSUSERS MEMORYMEMORY DESIGNDESIGN Streaming Data at 10 Gbps Control Your QDR Designs PARTNERSHIP 20 Years of Partnership Author! Author! Programmable WorldWorld 20042004 SOFTWARE Algorithmic C Synthesis The Need for Speed MANUFACTURING Lower PCB Mfg. Costs Optimize PCB Routability R COVER STORY FPGAs on Mars The New SPARTAN™-3 Make It You r ASIC The world’s lowest-cost FPGAs Spartan-3 Platform FPGAs deliver everything you need at the price you want. Leading the way in 90nm process technology, the new Spartan-3 devices are driving down costs in a huge range of high-capability, cost-sensitive applications. With the industry’s widest density range in its class — 50K to 5 Million gates — the Spartan-3 family gives you unbeatable value and flexibility. Lots of features … without compromising on price Check it out. You get 18x18 embedded multipliers for XtremeDSP™ processing in a low-cost FPGA. Our unique staggered pad technology delivers a ton of I/Os for total connectivity solutions. Plus our XCITE technology improves signal integrity, while eliminating hundreds of resistors to simplify board layout and reduce your bill of materials. With the lowest cost per I/O and lowest cost per logic cell, Spartan-3 Platform FPGAs are the perfect fit for any design … and any budget. MAKE IT YOUR ASIC The Programmable Logic CompanySM For more information visit www.xilinx.com/spartan3 Pb-free devices available now ©2004 Xilinx, Inc., 2100 Logic Drive, San Jose, CA 95124.
    [Show full text]
  • Catapult DS 1-10 2PG Datasheet US.Qxd
    C-B ase d D e s i g n Catapult C Synthesis D ATASHEET Major product features: • Mixed datapath and control logic synthesis from both pure ANSI C++ and SystemC • Multi-abstraction synthesis supports untimed, transaction-level, and cycle-accurate modeling styles • Full-chip synthesis capabilities including pipelined multi-block subsystems and SoC interconnects • Power, performance, and area exploration and optimization • Push button generation of RTL verification infrastructure • Advanced top-down and bottom-up hierarchical design management Catapult C Synthesis produces high-quality RTL implementations from abstract • Full and accurate control over design specifications written in C++ or SystemC, dramatically reducing design and interfaces with Interface Synthesis verification efforts. technology and Modular IO • Interactive and incremental design Tackle Complexity, Accelerate Time to RTL, Reduce Verification Effort methodology achieves fastest path to optimal hardware Traditional hardware design methods that require manual RTL development and debugging are too time consuming and error prone for today’s complex designs. • Fine-grain control for superior The Catapult® C Synthesis tool empowers designers to use industry standard ANSI quality of results C++ and SystemC to describe functional intent, and move up to a more productive • Built-in analysis tools including abstraction level. From these high-level descriptions Catapult generates production Gantt charts, critical path viewer, and quality RTL. With this approach, full hierarchical systems comprised of both cross-probing control blocks and algorithmic units are implemented automatically, eliminating the typical coding errors and bugs introduced by manual flows. By speeding time • Silicon vendor certified synthesis to RTL and automating the generation of bug free RTL, the Catapult C Synthesis libraries and integration with RTL tool significantly reduces the time to verified RTL.
    [Show full text]
  • Embedded Processing Using Fpgas Agenda
    Embedded Processing Using FPGAs Agenda • Why FPGA Platform Based Embedded Processing • Embedded Use Models And Their FPGA Based Solutions • Architecture/Topology Choices • A Reoccurring Question: Hardware Or Software • Reconfigurable Hardware • Tool Flows For FPGA Based Embedded Systems 2 - Embedded Processing using FPGAs www.xilinx.com Agenda • Why FPGA Platform Based Embedded Processing • Embedded Use Models And Their FPGA Based Solutions • Architecture/Topology Choices • A Reoccurring Question: Hardware Or Software • Reconfigurable Hardware • Tool Flows For FPGA Based Embedded Systems 3 - Embedded Processing using FPGAs www.xilinx.com Why Use Processors In the First Place • Microcontrollers (µC) and Microprocessors (µP) Provide a Higher Level of Design Abstraction – Most µC functions can be implemented using VHDL or Verilog - downsides are parallelism & complexity – Using C/C++ abstraction & serial execution make certain functions much easier to implement in a µC • Discrete µCs are Inexpensive and Widely Used – µCs have years of momentum and software designers have vast experience using them 4 - Embedded Processing using FPGAs www.xilinx.com Why Embedded Design using FPGAs In Addition To The Universal Drive Towards Smaller Cheaper Faster With Less Power…. 1 Difficult to Find the Required Mix of Peripherals in Off the Shelf (OTS) Microcontroller Solutions •2 Selecting a Single Processor Core with Long Term Solution Viability is Difficult at Best •3 Without Direct Ownership of the Processing Solution, Obsolescence is Always a Concern
    [Show full text]
  • An Introduction to High-Level Synthesis
    [3B2-8] mdt2009040008.3d 17/7/09 13:24 Page 8 High-Level Synthesis An Introduction to High-Level Synthesis Philippe Coussy Michael Meredith Universite´ de Bretagne-Sud, Lab-STICC Forte Design Systems Daniel D. Gajski Andres Takach University of California, Irvine Mentor Graphics today would even think of program- Editor’s note: ming a complex software application High-level synthesis raises the design abstraction level and allows rapid gener- solely by using an assembly language. ation of optimized RTL hardware for performance, area, and power require- In the hardware domain, specification ments. This article gives an overview of state-of-the-art HLS techniques and languages and design methodologies tools. 1,2 Tim Cheng, Editor in Chief have evolved similarly. For this reason, ÀÀ until the late 1960s, ICs were designed, optimized, and laid out by hand. Simula- THE GROWING CAPABILITIES of silicon technology tion at the gate level appeared in the early 1970s, and and the increasing complexity of applications in re- cycle-based simulation became available by 1979. Tech- cent decades have forced design methodologies niques introducedduring the 1980s included place-and- and tools to move to higher abstraction levels. Raising route, schematic circuit capture, formal verification, the abstraction levels and accelerating automation of and static timing analysis. Hardware description lan- both the synthesis and the verification processes have guages (HDLs), such as Verilog (1986) and VHDL for this reason always been key factors in the evolu- (1987), have enabled wide adoption of simulation tion of the design process, which in turn has allowed tools. These HDLs have also served as inputs to logic designers to explore the design space efficiently and synthesis tools leading to the definition of their synthe- rapidly.
    [Show full text]
  • High Level Synthesis with Catapult
    High Level Synthesis with Catapult 8.0 Richard Langridge European AE Manager 21 st January 2015 Calypto Overview • Background – Founded in 2002 – SLEC released 2005 & PowerPro 2006 – Acquired Mentor’s Catapult HLS technology and team in 2011 • Operations – Headquarters in San Jose with R&D in California, Oregon, NOIDA India – Sales and support worldwide, with new support office opened in Korea • Technology – Patented Deep Sequential Analysis Technology for verification and power optimization – Industry leading High-Level Synthesis technology – 30 patents granted; 17 pending • Customers – Over 130 customers worldwide • FY14 Results - Record revenue 2 Calypto Design Systems Calypto Design Systems’ Platforms Catapult HL Design & Verification Platform PowerPro RTL Low Power Platform 3 Calypto Design Systems Why Teams Want to Adopt HLS • Accelerate design time by working at higher level of void func (short a[N], abstraction for (int i=0; i<N; i++) { if (cond) – z+=a[i]*b[i]; New features added in days not weeks else – Address complexity through abstraction • Cut verification costs with 1000x speedup vs RTL – Faster design simulation in higher level languages – Easier functional verification and debug • Determine optimal microarchitecture – Rapidly explore multiple options for optimal Power Performance Area (PPA) • Facilitate collaboration, reuse and creation of derivatives – Technology and architectural neutral design descriptions RTL are easily shared, modified and retargeted – HLS becoming an IP enabler 4 Calypto Design Systems Catapult Delivers
    [Show full text]
  • HDL and Programming Languages ■ 6 Languages ■ 6.1 Analogue Circuit Design ■ 6.2 Digital Circuit Design ■ 6.3 Printed Circuit Board Design ■ 7 See Also
    Hardware description language - Wikipedia, the free encyclopedia 페이지 1 / 11 Hardware description language From Wikipedia, the free encyclopedia In electronics, a hardware description language or HDL is any language from a class of computer languages, specification languages, or modeling languages for formal description and design of electronic circuits, and most-commonly, digital logic. It can describe the circuit's operation, its design and organization, and tests to verify its operation by means of simulation.[citation needed] HDLs are standard text-based expressions of the spatial and temporal structure and behaviour of electronic systems. Like concurrent programming languages, HDL syntax and semantics includes explicit notations for expressing concurrency. However, in contrast to most software programming languages, HDLs also include an explicit notion of time, which is a primary attribute of hardware. Languages whose only characteristic is to express circuit connectivity between a hierarchy of blocks are properly classified as netlist languages used on electric computer-aided design (CAD). HDLs are used to write executable specifications of some piece of hardware. A simulation program, designed to implement the underlying semantics of the language statements, coupled with simulating the progress of time, provides the hardware designer with the ability to model a piece of hardware before it is created physically. It is this executability that gives HDLs the illusion of being programming languages, when they are more-precisely classed as specification languages or modeling languages. Simulators capable of supporting discrete-event (digital) and continuous-time (analog) modeling exist, and HDLs targeted for each are available. It is certainly possible to represent hardware semantics using traditional programming languages such as C++, although to function such programs must be augmented with extensive and unwieldy class libraries.
    [Show full text]
  • Automatic Instruction-Set Architecture Synthesis for VLIW Processor Cores in the ASAM Project
    Automatic Instruction-set Architecture Synthesis for VLIW Processor Cores in the ASAM Project Roel Jordansa,∗, Lech J´o´zwiaka, Henk Corporaala, Rosilde Corvinob aEindhoven University of Technology, Postbus 513, 5600MB Eindhoven, The Netherlands bIntel Benelux B.V., Capronilaan 37, 1119NG Schiphol-Rijk Abstract The design of high-performance application-specific multi-core processor systems still is a time consuming task which involves many manual steps and decisions that need to be performed by experienced design engineers. The ASAM project sought to change this by proposing an auto- matic architecture synthesis and mapping flow aimed at the design of such application specific instruction-set processor (ASIP) systems. The ASAM flow separated the design problem into two cooperating exploration levels, known as the macro-level and micro-level exploration. This paper presents an overview of the micro-level exploration level, which is concerned with the anal- ysis and design of individual processors within the overall multi-core design starting at the initial exploration stages but continuing up to the selection of the final design of the individual proces- sors within the system. The designed processors use a combination of very-long instruction-word (VLIW), single-instruction multiple-data (SIMD), and complex custom DSP-like operations in or- der to provide an area- and energy-efficient and high-performance execution of the program parts assigned to the processor node. In this paper we present an overview of how the micro-level design space exploration interacts with the macro-level, how early performance estimates are used within the ASAM flow to determine the tasks executed by each processor node, and how an initial processor design is then proposed and refined into a highly specialized VLIW ASIP.
    [Show full text]
  • Advances in Architectures and Tools for Fpgas and Their Impact on the Design of Complex Systems for Particle Physics
    Advances in Architectures and Tools for FPGAs and their Impact on the Design of Complex Systems for Particle Physics Anthony Gregersona, Amin Farmahini-Farahania, William Plishkerb, Zaipeng Xiea, Katherine Comptona, Shuvra Bhattacharyyab, Michael Schultea a University of Wisconsin - Madison b University of Maryland - College Park fagregerson, farmahinifar, [email protected] fplishker, [email protected] fcompton, [email protected] Abstract these trends has been a rapid adoption of FPGAs in HEP elec- tronics. A large proportion of the electronics in the Compact The continual improvement of semiconductor technology has Muon Solenoid Level-1 Trigger, for example, are based on FP- provided rapid advancements in device frequency and density. GAs, and many of the remaining ASICs are scheduled to be Designers of electronics systems for high-energy physics (HEP) replaced with FPGAs in proposed upgrades [1]. have benefited from these advancements, transitioning many de- signs from fixed-function ASICs to more flexible FPGA-based Improvements in FPGA technology are not likely to end platforms. Today’s FPGA devices provide a significantly higher soon. Today’s high-density FPGAs are based on a 40-nm sil- amount of resources than those available during the initial Large icon process and already contain an order of magnitude more Hadron Collider design phase. To take advantage of the ca- logic than the FPGAs available at planning stage of the Large pabilities of future FPGAs in the next generation of HEP ex- Hadron Collider’s electronics. 32 and 22 nm silicon process periments, designers must not only anticipate further improve- technologies have already been demonstrated to be feasible; ments in FPGA hardware, but must also adopt design tools and as FPGAs migrate to these improved technologies their logic methodologies that can scale along with that hardware.
    [Show full text]
  • An Automated Flow to Generate Hardware Computing Nodes from C for an FPGA-Based MPI Computing Network
    An Automated Flow to Generate Hardware Computing Nodes from C for an FPGA-Based MPI Computing Network by D.Y. Wang A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF BACHELOR OF APPLIED SCIENCE DIVISION OF ENGINEERING SCIENCE FACULTY OF APPLIED SCIENCE AND ENGINEERING UNIVERSITY OF TORONTO Supervisor: Paul Chow April 2008 Abstract Recently there have been initiatives from both the industry and academia to explore the use of FPGA-based application-specific hardware acceleration in high-performance computing platforms as traditional supercomputers based on clusters of generic CPUs fail to scale to meet the growing demand of computation-intensive applications due to limitations in power consumption and costs. Research has shown that a heteroge- neous system built on FPGAs exclusively that uses a combination of different types of computing nodes including embedded processors and application-specific hardware accelerators is a scalable way to use FPGAs for high-performance computing. An ex- ample of such a system is the TMD [11], which also uses a message-passing network to connect the computing nodes. However, the difficulty in designing high-speed hardware modules efficiently from software descriptions is preventing FPGA-based systems from being widely adopted by software developers. In this project, an auto- mated tool flow is proposed to fill this gap. The AUTO flow is developed to auto- matically generate a hardware computing node from a C program that can be used directly in the TMD system. As an example application, a Jacobi heat-equation solver is implemented in a TMD system where a soft processor is replaced by a hardware computing node generated using the AUTO flow.
    [Show full text]
  • Advances in Architectures and Tools for Fpgas and Their Impact on the Design of Complex Systems for Particle Physics
    Advances in Architectures and Tools for FPGAs and their Impact on the Design of Complex Systems for Particle Physics Anthony Gregersona, Amin Farmahini-Farahania, William Plishkerb, Zaipeng Xiea, Katherine Comptona, Shuvra Bhattacharyyab, Michael Schultea a University of Wisconsin - Madison b University of Maryland - College Park fagregerson, farmahinifar, [email protected] fplishker, [email protected] fcompton, [email protected] Abstract these trends has been a rapid adoption of FPGAs in HEP elec- tronics. A large proportion of the electronics in the Compact The continual improvement of semiconductor technology has Muon Solenoid Level-1 Trigger, for example, are based on FP- provided rapid advancements in device frequency and density. GAs, and many of the remaining ASICs are scheduled to be Designers of electronics systems for high-energy physics (HEP) replaced with FPGAs in proposed upgrades [1]. have benefited from these advancements, transitioning many de- signs from fixed-function ASICs to more flexible FPGA-based Improvements in FPGA technology are not likely to end platforms. Today’s FPGA devices provide a significantly higher soon. Today’s high-density FPGAs are based on a 40-nm sil- amount of resources than those available during the initial Large icon process and already contain an order of magnitude more Hadron Collider design phase. To take advantage of the ca- logic than the FPGAs available at planning stage of the Large pabilities of future FPGAs in the next generation of HEP ex- Hadron Collider’s electronics. 32 and 22 nm silicon process periments, designers must not only anticipate further improve- technologies have already been demonstrated to be feasible; ments in FPGA hardware, but must also adopt design tools and as FPGAs migrate to these improved technologies their logic methodologies that can scale along with that hardware.
    [Show full text]