This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCAD.2015.2513673, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TCAD, VOL. X, NO. Y, DECEMBER 2015

A Survey and Evaluation of FPGA High-Level Synthesis Tools

Razvan Nane, Member, IEEE, Vlad-Mihai Sima, Christian Pilato, Member, IEEE, Jongsok Choi, Student Member, IEEE, Blair Fort, Student Member, IEEE, Andrew Canis, Student Member, IEEE, Yu Ting Chen, Student Member, IEEE, Hsuan Hsiao, Student Member, IEEE, Stephen Brown, Member, IEEE, Fabrizio Ferrandi, Member, IEEE, Jason Anderson, Member, IEEE, Koen Bertels, Member, IEEE

Abstract—High-level synthesis (HLS) is increasingly popular for the design of high-performance and energy-efficient heterogeneous systems, shortening time-to-market and addressing today's system complexity. HLS allows designers to work at a higher level of abstraction by using a software program to specify the hardware functionality. Additionally, HLS is particularly interesting for designing FPGA circuits, where hardware implementations can be easily refined and replaced in the target device. Recent years have seen much activity in the HLS research community, with a plethora of HLS tool offerings, from both industry and academia. All these tools may have different input languages, perform different internal optimizations, and produce results of different quality, even for the very same input description. Hence, it is challenging to compare their performance and understand which is the best for the hardware to be implemented. We present a comprehensive analysis of recent HLS tools, as well as an overview of the areas of active interest in the HLS research community. We also present a first-published methodology to evaluate different HLS tools. We use our methodology to compare one commercial and three academic tools on a common set of C benchmarks, aiming at performing an in-depth evaluation in terms of performance and use of resources.

Index Terms—High-Level Synthesis, Survey, Evaluation, Comparison, FPGA, DWARV, LegUp, Bambu.

R. Nane, V.M. Sima, and K. Bertels are with Delft University of Technology, The Netherlands.
C. Pilato was with Politecnico di Milano, Italy, and is now with Columbia University, NY, USA. F. Ferrandi is with Politecnico di Milano, Italy.
J. Choi, B. Fort, A. Canis, Y.T. Chen, H. Hsiao, S. Brown, and J. Anderson are with University of Toronto, Canada.
Copyright (c) 2015 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending an email to [email protected].

I. INTRODUCTION

Clock frequency scaling in processors stalled in the middle of the last decade, and in recent years, an alternative approach for high-throughput and energy-efficient processing is based on heterogeneity, where designers integrate software processors and application-specific customized hardware for acceleration, each tailored towards specific tasks [1]. Although specialized hardware has the potential to provide huge acceleration at a fraction of a processor's energy, the main drawback is related to its design. On one hand, describing these components in a hardware description language (HDL) (e.g. VHDL or Verilog) allows the designer to adopt existing tools for RTL and logic synthesis into the target technology. On the other hand, this requires the designer to specify functionality at a low level of abstraction, where cycle-by-cycle behavior is completely specified. The use of such languages requires advanced hardware expertise, besides being cumbersome to develop in. This leads to longer development times that can critically impact the time-to-market.

An interesting solution to realize such heterogeneity and, at the same time, address the time-to-market problem is the combination of reconfigurable hardware architectures, such as field-programmable gate arrays (FPGAs), and high-level synthesis (HLS) tools [2]. FPGAs are integrated circuits that can be configured by the end user to implement digital circuits. Most FPGAs are also reconfigurable, allowing a relatively quick refinement and optimization of a hardware design with no additional manufacturing costs. The designer can modify the HDL description of the components and then use an FPGA vendor toolchain for the synthesis of the bitstream to configure the FPGA. HLS tools start from a high-level software programmable language (HLL) (e.g. C, C++, SystemC) to automatically produce a circuit specification in HDL that performs the same function. HLS offers benefits to software engineers, enabling them to reap some of the speed and energy benefits of hardware without actually having to build up hardware expertise. HLS also offers benefits to hardware engineers, by allowing them to design systems faster at a high level of abstraction and rapidly explore the design space. This is crucial in the design of complex systems [3] and especially suitable for FPGA design, where many alternative implementations can be easily generated, deployed onto the target device, and compared. Recent developments in the FPGA industry, such as Microsoft's application of FPGAs in Bing search acceleration [4] and the forthcoming acquisition of Altera by Intel, further underscore the need for using FPGAs as computing platforms with high-level software-amenable design methodologies. HLS has also been recently applied to a variety of applications (e.g. medical imaging, convolutional neural networks, machine learning), with significant benefits in terms of performance and energy consumption [5].

Although HLS tools seem to efficiently mitigate the problem of creating the hardware description, automatically generating hardware from software is not easy, and a wide range of different approaches have been developed. One approach is to adapt the high-level language to a specific application domain (e.g. dataflow languages for describing streaming applications). These HLS tools can leverage dedicated optimizations or micro-architectural solutions for the specific

0278-0070 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


domain. However, the algorithm designer, who is usually a software engineer, has to understand how to properly update the code. This approach is usually time-consuming and error-prone. For this reason, some HLS tools offer complete support for a standard high-level language, such as C, giving complete freedom to the algorithm designer. Understanding the current HLS research directions, the different HLS tools available and their capabilities is a difficult challenge, and a thoughtful analysis covering all these aspects is lacking in the literature. For example, [6] was a small survey of existing HLS tools with a static comparison (on criteria such as the documentation available or the learning curve) of their features and user experience. However, the tools were not applied to benchmarks, nor were the results produced by the tools compared. Indeed, given an application to be implemented as a hardware accelerator, it is crucial to understand which HLS tool better fits the characteristics of the algorithm. For this reason, we believe that it is important to have a comprehensive survey of recent HLS tools and research directions, as well as a systematic method to evaluate different tools on the same benchmarks in order to analyze the results.

In this paper, we present a thorough analysis of HLS tools, current HLS research thrusts, as well as a detailed way to evaluate different state-of-the-art tools (both academic and commercial) on performance and resource usage. The three academic tools considered are DWARV [7], BAMBU [8] and LEGUP [9] – tools under active development in three institutions and whose developers are co-authoring this paper. The contributions of this work are:
• A thorough evaluation of past and present HLS tools (Section II).
• A description of HLS optimizations and problems wherein active research is underway (Section III).
• The first-published comprehensive in-depth evaluation (Section IV) and discussion (Section V) of selected commercial and academic HLS tools in terms of performance and area metrics.

This analysis shows that industry and academia are closely progressing together towards efficient methods to automatically design application-specific customized hardware accelerators. However, several research challenges remain open.

II. OVERVIEW OF HIGH-LEVEL SYNTHESIS TOOLS

In this section, we present an overview of academic and commercial HLS tools. The presentation will be done according to a classification of the design input language, as shown in Fig. 1. We distinguish between two major categories, namely tools that accept domain-specific languages (DSLs) and tools that are based on general-purpose programmable languages (GPLs). DSLs are split into new languages invented specially for a particular tool-flow and GPL-based dialects, which are languages based on a GPL (e.g. C) extended with specific constructs to convey particular hardware information to the tool. Under each category, the corresponding tools are listed in green, red or blue fonts, where green represents a tool that is in use, red implies the tool is abandoned, and blue implies N/A, meaning that no information is currently known about its status. Furthermore, the bullet type, defined in the figure's legend, denotes the target application domain for which the tool can be used. Finally, tool names which are underlined in the figure mean that the tool also supports SystemC as input. Using DSLs or SystemC raises challenges for adoption of HLS by software developers. In this section, due to space limitations, we describe only the unique features of each tool. For general information (e.g. target application domain, support for floating/fixed-point arithmetic, automatic testbench generation), the reader is referred to Table I. We first introduce the academic HLS tools evaluated in this study, before moving on to highlight features of other HLS tools available in the community (either commercial or academic).

A. Academic HLS Tools Evaluated in This Study

DWARV [7] is an academic HLS compiler developed at Delft University of Technology. The tool is based on the CoSy commercial compiler infrastructure [10] developed by ACE. The characteristics of DWARV are directly related to the advantages of using CoSy: a modular and robust back-end that is easily extensible with new optimizations.

BAMBU [8] is an academic HLS tool developed at Politecnico di Milano and first released in 2012. BAMBU is able to generate different Pareto-optimal implementations to trade off latency against resource requirements, and to support hardware/software partitioning for designing complex heterogeneous platforms. Its modular organization allows the evaluation of new algorithms, architectures, or synthesis methods. BAMBU leverages the GCC compiler's many compiler-based optimizations and implements a novel memory architecture to efficiently support complex constructs of the C language (e.g., function calls, pointers, multi-dimensional arrays, structs) [11]. It is also able to support different data types, including floating-point arithmetic, in order to generate optimized micro-architectures.

LEGUP [9] is a research compiler developed at the University of Toronto, first released in 2011 and currently in its fourth public release. It accepts a C-language program as input and operates in one of two ways: 1) it synthesizes the entire C program to hardware, or 2) it synthesizes the program to a hybrid system comprising a processor (a MIPS soft processor or ARM) and one or more hardware accelerators. In the latter flow, the user designates which C functions to implement as accelerators. Communication between the MIPS/ARM and accelerators is through Altera's memory-mapped on-chip bus interface (LEGUP is designed specifically to target a variety of Altera FPGA families). For hardware synthesis, most of the C language is supported, with the exception of dynamic memory allocation and recursion. LEGUP is built within the open-source LLVM compiler framework [12]. Compared to other HLS tools, LEGUP has several unique features. It supports Pthreads and OpenMP, where parallel software threads are automatically synthesized into parallel-operating hardware. Automated bitwidth reduction can be invoked to shrink datapath widths based on compile-time (static) variable range and bitmask analysis. Multi-cycle path analysis and register removal are also supported, wherein LEGUP eliminates

0278-0070 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TCAD.2015.2513673, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TCAD, VOL. X, NO. Y, DECEMBER 2015 3

[Figure: classification diagram. The legend marks each tool's status (In Use, Abandoned, or N/A) and its application domain (all domains, imaging, streaming, stream/image, loop/pipeline, DSP, dataflow, .NET, DSE). Domain-specific languages – new languages: CyberWorkBench (BDL), Bluespec (BSV), PipeRench (DIL), HercuLeS (NAC); C-extended languages: CoDeveloper (ImpulseC), DK Design Suite (HandelC), SA-C (SA-C), Garp (C pragmas), Napa-C (C pragmas), eXCite (CSP pragmas), ROCCC (C extended). Generic languages – procedural languages: Vivado HLS, CatapultC, CtoS, SPARK, CHC, LegUp, Bambu, GAUT, Trident, CtoVerilog, C2H, SynphHLS, MATCH, AccelDSP, CHiMPS, DEFACTO, gcc2verilog; object-oriented languages: Maxeler (MaxJ), KIWI (C#), SeaCucumber (Java), Cynthesizer (SystemC).]
Fig. 1: Classification of High-Level Synthesis Tools Based on the Input Language.

TABLE I: Overview of High-Level Synthesis Tools.

Status | Compiler | Owner | License | Input | Output | Year | Domain | TestBench | FP | FixP
In Use | eXCite | Y Explorations | Commercial | C | VHDL/Verilog | 2001 | All | Yes | No | Yes
In Use | CoDeveloper | Impulse Accelerated | Commercial | Impulse-C | VHDL/Verilog | 2003 | Image/Streaming | Yes | Yes | No
In Use | Catapult-C | Calypto Design Systems | Commercial | C/C++/SystemC | VHDL/Verilog/SystemC | 2004 | All | Yes | No | Yes
In Use | Cynthesizer | FORTE | Commercial | SystemC | Verilog | 2004 | All | Yes | Yes | Yes
In Use | Bluespec | BlueSpec Inc. | Commercial | BSV | SystemVerilog | 2007 | All | No | No | No
In Use | CHC | Altium | Commercial | C subset | VHDL/Verilog | 2008 | All | No | Yes | Yes
In Use | CtoS | Cadence | Commercial | SystemC/TLM/C++ | Verilog/SystemC | 2008 | All | Only cycle-accurate | No | Yes
In Use | DK Design Suite | Mentor Graphics | Commercial | Handel-C | VHDL/Verilog | 2009 | Streaming | No | No | Yes
In Use | GAUT | U. Bretagne | Academic | C/C++ | VHDL | 2010 | DSP | Yes | No | Yes
In Use | MaxCompiler | Maxeler | Commercial | MaxJ | RTL | 2010 | DataFlow | No | Yes | No
In Use | ROCCC | Jacquard Comp. | Commercial | C subset | VHDL | 2010 | Streaming | No | Yes | No
In Use | Synphony C | Synopsys | Commercial | C/C++ | VHDL/Verilog/SystemC | 2010 | All | Yes | No | Yes
In Use | CyberWorkBench | NEC | Commercial | BDL | VHDL/Verilog | 2011 | All | Cycle/Formal | Yes | Yes
In Use | LegUp | U. Toronto | Academic | C | Verilog | 2011 | All | Yes | Yes | No
In Use | Bambu | PoliMi | Academic | C | Verilog | 2012 | All | Yes | Yes | No
In Use | DWARV | TU. Delft | Academic | C subset | VHDL | 2012 | All | Yes | Yes | Yes
In Use | VivadoHLS | Xilinx | Commercial | C/C++/SystemC | VHDL/Verilog/SystemC | 2013 | All | Yes | Yes | Yes
N/A | Trident | Los Alamos NL | Academic | C subset | VHDL | 2007 | Scientific | No | Yes | No
N/A | CHiMPS | U. Washington | Academic | C | VHDL | 2008 | All | No | No | No
N/A | Kiwi | U. Cambridge | Academic | C# | Verilog | 2008 | .NET | No | No | No
N/A | gcc2verilog [45] | U. Korea | Academic | C | Verilog | 2011 | All | No | No | No
N/A | HercuLeS | Ajax Compiler | Commercial | C/NAC | VHDL | 2012 | All | Yes | Yes | Yes
Abandoned | Napa-C | Sarnoff Corp. | Academic | C subset | VHDL/Verilog | 1998 | Loop | No | No | No
Abandoned | DEFACTO | U. South Calif. | Academic | C | RTL | 1999 | DSE | No | No | No
Abandoned | Garp | U. Berkeley | Academic | C subset | bitstream | 2000 | Loop | No | No | No
Abandoned | MATCH | U. Northwest | Academic | MATLAB | VHDL | 2000 | Image | No | No | No
Abandoned | PipeRench | U. Carnegie M. | Academic | DIL | bitstream | 2000 | Stream | No | No | No
Abandoned | SeaCucumber | U. Brigham Y. | Academic | Java | EDIF | 2002 | All | No | Yes | Yes
Abandoned | SA-C | U. Colorado | Academic | SA-C | VHDL | 2003 | Image | No | No | No
Abandoned | SPARK | U. Cal. Irvine | Academic | C | VHDL | 2003 | Control | No | No | No
Abandoned | AccelDSP | Xilinx | Commercial | MATLAB | VHDL/Verilog | 2006 | DSP | Yes | Yes | Yes
Abandoned | C2H | Altera | Commercial | C | VHDL/Verilog | 2006 | All | No | No | No
Abandoned | CtoVerilog | U. Haifa | Academic | C | Verilog | 2008 | All | No | No | No



registers on some paths permitted more than a single cycle to complete, generating constraints for the back-end of the toolflow accordingly.

B. Other HLS Tools

CyberWorkBench (CWB) [13] is a set of synthesis, verification and simulation tools intended for system-level design. The tool input is the Behavioral Description Language (BDL), i.e. a superset of the C language extended with constructs to express hardware concepts. Examples of such constructs are user-defined bitwidths for variables, synchronization, explicit clock boundary specification, and concurrency.

Bluespec Compiler (BSC) [14] is a tool that uses Bluespec System Verilog (BSV) as the design language. BSV is a high-level functional HDL based on Verilog and inspired by Haskell, where modules are implemented as a set of rules using Verilog syntax. The rules are called Guarded Atomic Actions and express behaviour in the form of concurrently cooperating FSMs [15]. Using this language, and implicitly the BSC tool, requires developers with specific expertise.

PipeRench [16] was an early reconfigurable computing project. The PipeRench compiler was intended solely for generating reconfigurable pipelines in stream-based media applications. The source language is a dataflow intermediate language, which is a single-assignment language with C operators. The output of the tool is a bitstream to configure the reconfigurable pipeline in the target circuit.

HercuLeS [17] is a compiler that uses an NAC (N-address code) IR (intermediate representation), which is a new typed-assembly language created by a front-end available through GCC Gimple. The work deals only with complete applications targeting FPGAs.

CoDeveloper [18] is the HLS design environment provided by Impulse Accelerated Technologies. Impulse-C is based on a C-language subset to which it adds communicating sequential processes (CSP)-style extensions. These extensions are required for parallel programming of mixed processor and FPGA platforms. Because the basic principle of the CSP programming model consists of processes that have to be independently synchronized and streams for inter-process communication, the application domain is limited primarily to image processing and streaming applications.

DK Design Suite [19] uses Handel-C as the design language, which is based on a rich subset of the C language extended with hardware-specific language constructs. The user, however, needs to specify timing requirements and to describe the parallelization and synchronization segments in the code explicitly. In addition, the data mapping to different memories has to be performed manually. Because of these language additions, the user needs advanced hardware knowledge.

Single-Assignment C (SA-C) [20] is a compiler that uses a C language variant in which variables can be set only once. This work provided the inspiration for the later ROCCC compiler. The language introduces new syntactical constructs that require application rewriting.

The Garp [21] project's main goal was to accelerate loops of general-purpose software applications. It accepts C as input and generates hardware code for the loop.

The Napa-C [22] project was one of the first to consider high-level compilation for systems which contain both a microprocessor and reconfigurable logic. The NAPA C compiler, implemented in SUIF and targeting National Semiconductor's NAPA1000 chip, performed semantic analysis of the pragma-annotated program and co-synthesized a conventional program executable for the processor and a configuration bitstream.

In eXCite [23], communication channels have to be inserted manually to describe the communication between the software and hardware. These channels can be streaming, blocking or indexed (e.g. for handling arrays). Different types of communication between the software and hardware parts (e.g. streaming, shared memory) are possible.

The ROCCC [24] project focused mainly on the parallelization of heavy-compute-density applications having little control. This restricts its application domain to streaming applications, and it means that the input C is limited to a subset of the C language. For example, only perfectly nested loops with fixed stride, operating on integer arrays, are allowed.

Catapult-C [25] is a commercial high-level synthesis tool initially oriented towards ASIC hardware developers; however, it now targets both FPGAs and ASICs. It offers flexibility in choosing the target technology and external libraries, setting the design clock frequency, and mapping function parameters to either register, RAM, ROM or streaming interfaces.

C-to-Silicon (CtoS) [26], offered by Cadence, supports both control- and dataflow applications. Since it accepts SystemC as input, it is possible to accurately specify different interface types, from simple function array parameters to cycle-accurate transmission protocols.

SPARK [27] was targeted to multimedia and image processing applications along with control-intensive microprocessor functional blocks. The compiler generated synthesizable VHDL that could be mapped to both ASICs and FPGAs.

The C to Hardware Compiler [28] generates hardware to be offloaded onto an Application Specific Processor (ASP) core, for which verification has to be done manually by loading and executing the generated design on an Altium Desktop NanoBoard NB2DSK01.

A distinct feature of the GAUT [29] project is that, besides the processing accelerator, it can generate both communication and memory units. A testbench is also automatically generated to apply stimuli to the design and to analyze the results for validation purposes. Fixed-point arithmetic is supported through the Algorithmic C class library.

Trident [30] is a compiler that is an offshoot of an earlier project called Sea Cucumber [31]. It generates VHDL-based accelerators for scientific applications operating on floating-point data, starting from a C-language program. Its strength is in allowing users to select floating-point operators from a variety of standard libraries, such as FPLibrary and Quixilica, or to import their own.

C2H [32] was an HLS tool offered by Altera Corporation since 2006. The tool is technology dependent, generating accelerators that can only communicate via an Altera Avalon bus with an Altera NIOS II configurable soft processor. Furthermore, using this tool required advanced hardware design knowledge in order to configure and connect the accelerators



to the rest of the system – tasks performed in Altera's development environment.

Synphony C [33], formerly PICO [34], is an HLS tool for hardware DSP design offered by Synopsys. The tool can support both streaming and memory interfaces and allows performance-related optimizations to be fine-tuned (e.g. loop unrolling, loop pipelining). Floating-point operations are not permitted, but the programmer can use fixed-point arithmetic. Comparison results published by BDTi [35] showed that performance and area metrics for Synphony-produced circuits are comparable with those obtained with AutoESL (the product that became Vivado HLS when acquired by Xilinx).

The goal of the MATCH [36] software system was to translate and map MATLAB code to heterogeneous computing platforms for signal and image processing applications. The MATCH technology was later transferred to a startup company, AccelChip [37], bought in 2006 by Xilinx but discontinued in 2010. The tool was one of the few on the market that started from a MATLAB input description to generate VHDL or Verilog. A key feature of the product was automatic conversion of floating-point to fixed-point arithmetic.

The CHiMPS compiler [38] targets applications for high-performance computing. The distinctive feature of CHiMPS is its many-cache, a hardware model that adapts the hundreds of small, independent FPGA memories to the specific memory needs of an application. This allows for many simultaneous memory operations per clock cycle to boost performance.

DEFACTO [39] is one of the early design environments that proposed hardware/software co-design solutions as an answer to increasing demands for computational power. DEFACTO is composed of a series of tools, such as a profiler, a partitioner, and software and hardware compilers, to perform fast design space exploration given a set of design constraints.

MaxCompiler [40] is a data-flow-specific HLS tool. The compiler accepts MaxJ, a Java-based language, as input and generates synthesizable code for the hardware data-flow engines provided by Maxeler's hardware platform.

The Kiwi [41] programming library and its associated synthesis system generate FPGA co-processors (in Verilog) from C# programs. Kiwi allows the programmer to use parallel constructs such as events, monitors and threads, which are closer to hardware concepts than classical software constructs.

Sea Cucumber [31] is a Java-based compiler that generates EDIF netlists and adopts the standard Java thread model, augmented with a communication model based on CSP.

Cynthesizer [42], recently acquired by Cadence, includes formal verification between RTL and gates, power analysis, and several optimizations, such as support for floating-point operations with IEEE-754 single/double precision.

Vivado HLS [43], formerly AutoPilot [44], was developed initially by AutoESL until it was acquired by Xilinx in 2011. The new, improved product, which is based on LLVM, was released in early 2013 and includes a complete design environment with abundant features to fine-tune the generation process from HLL to HDL. C, C++ and SystemC are accepted as input, and hardware modules are generated in VHDL, Verilog and SystemC. During the compilation process, it is possible to apply different optimizations, such as operation chaining, loop pipelining, and loop unrolling. Furthermore, different parameter mappings to memory can be specified. Streaming and shared-memory interfaces are both supported to simplify accelerator integration.

III. HIGH-LEVEL SYNTHESIS (HLS) OPTIMIZATIONS

HLS tools feature several optimizations to improve the performance of the accelerators. Some of them are borrowed from the compiler community, while others are specific to hardware design. In this section, we discuss these HLS optimizations, which are also current research trends for the HLS community.

A. Operation Chaining

Operation chaining is an optimization that performs operation scheduling within the target clock period. This requires the tool to "chain" two combinational operators together in a single cycle in such a way that false paths are avoided [46]. Concretely, if two operations are dependent in the data-flow graph and they can both complete execution in a time smaller than the target clock period, then they can be scheduled in the same cycle; otherwise, at least two cycles are needed to finish execution, along with a register for the intermediate result. Generally, chaining reduces the number of cycles in the schedule, improving performance and reducing the global number of registers in the circuit. However, this is highly technology dependent and requires an accurate characterization of the resource library (see Section III-E).

B. Bitwidth Analysis and Optimization

Bitwidth optimization is a transformation that aims to reduce the number of bits required by datapath operators. This is a very important optimization because it impacts all non-functional requirements (e.g. performance, area, power) of a design, without affecting its behavior. Differently from GPP compilers, which are designed to target a processor with a fixed-size datapath (usually 32 or 64 bits), a hardware compiler can exploit specialization by generating custom-size operators (i.e. functional units) and registers. As a direct consequence, we can select the minimal number of bits required for an operation and/or storage of the specific algorithm, which in turn leads to minimal space used for registers, smaller functional units that translate into less area and power, and shorter critical paths. However, this analysis cannot usually be completely automated, since it often requires specific knowledge of the algorithm and the input datasets.

C. Memory Space Allocation

FPGAs contain multiple memory banks in the form of distributed block RAMs (BRAMs) across the device. This allows the designer to partition and map software data structures onto dedicated BRAMs in order to implement fast memory accesses at low cost. As a result, the scheduler can perform multiple memory operations in one cycle once it is able to statically determine that they access different memories in the same cycle without contention. This feature is similar to the allocation of different memory spaces used in the



embedded systems domain. Using multiple BRAMs increases the available parallelism. On the other hand, these memory elements have a limited number of memory ports, and the customization of memory accesses may require the creation of an efficient multi-bank architecture to avoid limiting the performance [47].

D. Loop Optimizations

Hardware acceleration is particularly important for algorithms with compute-intensive loops. Loop pipelining is a key performance optimization for loops implemented in hardware. This optimization exploits loop-level parallelism by allowing a loop iteration to start before the completion of its predecessor, provided that data dependences are satisfied. The concept is related to software pipelining [48], which has been widely applied in VLIW processors. A key concept in loop pipelining is the initiation interval (II), which is the number of clock cycles between successive loop iterations. For high throughput, it is desirable that II be as small as possible, ideally one, which implies that a new loop iteration is started every cycle. Achieving the minimum II for a given design can be impeded by two factors: 1) resource constraints, and 2) loop-carried dependencies. Regarding resource constraints, consider a scenario where a loop body contains three load operations and one store operation. In this case, achieving an II of less than two is impossible if the memory has two ports, since each loop iteration has four memory operations. For this reason, loop optimizations are frequently combined with a multi-bank architecture to fully exploit the parallelism [47]. With respect to loop-carried dependencies, if a loop iteration depends on a result computed in a prior iteration, that data dependency may restrict the ability to reduce II, as it may be necessary to delay commencing an iteration until its dependent data has been computed.

For example, DWARV leverages CoSy to implement loop pipelining. The heuristic applied is based on swing modulo scheduling [49], which considers operation latencies between loop instructions to move conflicting instructions and reduce the II. However, due to the high availability of resources in FPGAs, the loop pipelining algorithm for hardware generation can be relaxed. This can be accomplished by fixing the II to a desired value, i.e. one based on a required design throughput, and then generating enough hardware (e.g. registers) to accommodate the particular II.

Recent research has focused on loop pipelining for nested loops. Consider, for example, a doubly-nested loop whose outermost loop (with induction variable i) iterates 100 times, and whose innermost one (with induction variable j) iterates up to i times. The iteration space traversed by i and j can be viewed as a polyhedron (in this case, a triangle) and analytically analyzed with the polyhedral model [50]. Applying loop transformations (e.g. exchanging the outer and inner loop) results in different polyhedra, and potentially different IIs. Polyhedral-based optimizations have been applied to synthesize memory architectures [51], improve throughput [52], and optimize resource usage [53].

E. Hardware Resource Library

In the process of HLS, in order to generate an efficient implementation that meets timing requirements while minimizing the use of resources, it is essential to determine how to implement each operation. Specifically, the front-end phase first inspects the given behavioral specification and identifies operation characteristics, such as the type of each operation (e.g. arithmetic or non-arithmetic), its operand types (e.g. integer, float), and its bit-width. At this stage, some operations may benefit from specific optimizations. For example, multiplications or divisions by a constant are typically transformed into operations that use only shifts and adds [54], [55] in order to improve area and timing. All these characteristics are then used during the module allocation phase, where the resulting operations are associated with functional units contained in the resource library [46]. This heavily impacts the use of resources and the timing of the resulting circuit. Hence, the proper composition of such a library and its characterization is crucial for efficient HLS.

The library of functional units can be quite rich and may contain several implementations for each single operation. On one hand, the library usually includes resources that are specific to the technology provider (e.g. the FPGA vendor). Some of these resources may leverage vendor-specific intrinsics or IP generators. In this case the module allocation will exploit resources that have been explicitly tailored and optimized for the specific target. This approach is usually adopted by HLS tools that are specific to particular FPGA vendors (e.g. [43]). The library may also contain resources that are expressed as templates in a standard hardware description language (i.e. Verilog or VHDL). These templates can be retargeted and customized based on characteristics of the target technology, as in FloPoCo [56]. In this case, the underlying logic synthesis tool can determine the best architecture to implement each function. For example, multipliers can be mapped either onto dedicated DSP blocks or implemented with LUTs.

To perform aggressive optimizations, each component of the library needs to be annotated with information useful during the entire HLS process, such as resource occupation and latency for executing the operations. There are several approaches to library characterization. The first approach performs a rapid logic synthesis during the scheduling and binding of the operations to determine the most suitable candidate resources, as in Cadence's C-to-Silicon [57]. However, this approach has a high cost in terms of computation time, especially when HLS is repeatedly performed for the same target. An alternative approach is to pre-characterize all resources in advance, as done in BAMBU [8]. The performance estimation starts with a generic template of the functional unit, which can be parametric with respect to bit-widths and pipeline stages. Latency and resource occupation are then obtained by synthesizing each configuration and storing the results in the library. Mathematical models can be built on top of these actual synthesis values [58], [59]. Additionally, this information can also be coupled with delays obtained after the place-and-route phase. This may improve the maximum frequency and the design latency, and it makes the HLS results


more predictable [60].

F. Speculation and Code Motion

Most HLS scheduling techniques can extract parallelism only within the same control region (i.e. the same CDFG basic block). This can limit the performance of the resulting accelerator, especially in control-intensive designs. Speculation is a code-motion technique that allows operations to be moved along their execution traces, possibly anticipating them before the conditional constructs that control their execution [61], [62], [63], [64]. A software compiler is less likely to use this technique since, on a sequential machine, it may delay the overall execution with computations that are unnecessary in certain cases. In hardware, however, speculated operations can often be executed in parallel with the rest of the operations. Their results will simply be maintained or discarded based on later-computed branch outcomes.

G. Exploiting Spatial Parallelism

A primary mechanism through which hardware may provide higher speed than a software implementation is by instantiating multiple hardware units that execute concurrently (spatial parallelism). HLS tools can extract fine-grained instruction-level parallelism by analyzing data dependencies, and loop-level parallelism via loop pipelining. It is nevertheless difficult to automatically extract large amounts of coarse-grained parallelism, as the challenges therein are akin to those faced by an auto-parallelizing software compiler. A question that arises, therefore, is how to specify hardware parallelism to an HLS tool whose input is a software programming language. With many HLS tools, a designer synthesizes an accelerator and then manually writes RTL that instantiates multiple instances of the synthesized core, steering input/output data to/from each accordingly. However, this approach is error prone and requires hardware expertise.

An alternative approach is to support the synthesis of software parallelization paradigms. LEGUP supports the HLS of pthreads and OpenMP [65], which are two standard ways of expressing parallelism in C programs, widely used by software engineers. The general idea is to synthesize parallel software threads into an equal number of parallel-operating hardware units. With pthreads, a user can express both task-level and data-level spatial parallelism. In the former, each hardware unit may perform a different function, and in the latter, multiple hardware units perform the same function on different portions of an input dataset. LEGUP also supports the synthesis of two standard pthreads synchronization constructs: mutexes and barriers. With OpenMP, the authors have focused on supporting the aspects of the standard that target loop parallelization, e.g. an N-iteration loop with no loop-carried dependencies can be split into pieces that are executed in parallel by concurrently operating hardware units. An interesting aspect of LEGUP is the support for nested parallelism: threads forking threads. Here, the threads initially forked within a program may themselves fork other threads, or contain OpenMP parallelization constructs. A limitation of the LEGUP work is that the number of parallel hardware units instantiated must exactly match the number of software threads forked, since in hardware there is no support for context switching.

Altera has taken a different approach with their OpenCL SDK [66], which supports HLS of OpenCL programs. The OpenCL language is a variant of C and is heavily used for parallel programming of graphics processing units (GPUs). With OpenCL, one generally launches hundreds or thousands of threads that are relatively fine-grained, for example, each computing a vector dot product, or even an individual scalar multiplication. Altera synthesizes OpenCL into a deeply pipelined FPGA circuit that connects to an x86-based host processor over PCIe. The support for OpenCL HLS allows Altera to compete directly with GPU vendors, who have been gaining traction in the high-performance computing market.

H. If-Conversion

If-conversion [67] is a well-known software transformation that enables predicated execution, i.e. an instruction is executed only when its predicate or guard evaluates to true. The main objective of this transformation is to schedule in parallel instructions from disjoint execution paths created by selective statements (e.g. if statements). The goals are twofold. First, it increases the number of parallel operations. Second, it facilitates pipelining by removing control dependencies within the loop, which may shorten the loop body schedule. In software, this leads to a 34% performance improvement, on average [68]. However, if-conversion should be enabled only when the branches require a balanced number of cycles to complete execution. When this is not the case, predicated execution incurs a slowdown in execution time because, if the shorter branch is taken, useless instructions belonging to the longer, unselected branch need to be checked before the execution of useful instructions can resume. Therefore, different algorithms have been proposed to decide when it is beneficial to apply if-conversion and when the typical conditional-jump approach should be followed. For example, in [69], a generic model to select the fastest implementation for if-then-else statements is proposed. This selection is done according to the number of implicated if-statements, as well as the balance characteristics. The approach for selecting if-conversion on a case-by-case basis changes for hardware compilers generating an FPGA hardware circuit. This is because resources can be allocated as needed (subject to area or power constraints), and therefore branches can be scheduled in a manner that does not affect the branch-minimal schedule. Data and control instructions can be executed in parallel, and "jumps" to the end of the if-statement can be inserted to short-cut the execution of (useless) longer-branch instructions when a shorter path is taken. This was demonstrated in [70] (incorporated in DWARV), which proposed a lightweight if-conversion scheme adapted for hardware generation. Furthermore, the work showed that such a lightweight predicated scheme is beneficial for hardware compilers, with performance always at least as good as when no if-conversion is enabled.

IV. EVALUATION OF HIGH-LEVEL SYNTHESIS TOOLS

In this section, we define a common environment to evaluate four HLS tools, i.e. one commercial and three academic:


DWARV, BAMBU, and LEGUP. With LEGUP we target the fastest speedgrade of the Altera Stratix V family [71], while with the other tools we target the fastest speedgrade of the Xilinx Virtex-7 family [72]. Both Stratix V and Virtex-7 are 28nm state-of-the-art high-performance FPGAs fabricated by TSMC. The primary combinational logic element in both architectures is a dual-output 6-input look-up-table (LUT).

Three metrics are used to evaluate circuit performance: maximum frequency (Fmax in MHz), cycle latency (i.e. the number of clock cycles needed for a benchmark to complete the computation), and wall-clock time (minimum clock period × cycle latency). Clock period (and the corresponding Fmax) is extracted from post-routing static timing analysis. Cycle latency is evaluated by simulating the resulting RTL circuits using ModelSim (discussed further below). We do not include the HLS tool execution times in the evaluations, as this time is negligible in comparison with the synthesis, mapping, placement and routing time.

To evaluate area, we consider logic, DSP and memory usage. For logic area, in Xilinx devices, we report the total number of fracturable 6-LUTs, each of which can be used to implement any single function of up to 6 variables, or any two functions that together use at most 5 distinct variables. For Altera, we report the total number of used adaptive logic modules (ALMs), each of which contains one fracturable 6-LUT that can implement any single function of up to 6 variables, any two 4-variable functions, a 5- and a 3-variable function, and several other dual-function combinations. With respect to DSP usage, we consider the DSP units in Altera and Xilinx devices to be roughly equivalent (Xilinx devices contain hardened 25×18 multipliers, whereas Altera devices contain hardened 18×18 multipliers). For memory, we report the total number of dedicated blocks used (e.g. BRAMs), which are equivalent to 18Kb in Virtex-7 and 20Kb in Stratix V.

A. Potential Sources of Inaccuracy

Although we have endeavored to make the comparison between tools as fair as possible, we discuss potential sources of inaccuracy to better understand the results. First, the evaluated HLS tools are built within different compilers (e.g. BAMBU is built within GCC, LEGUP within LLVM, and DWARV within CoSy), and target different FPGA devices. It is thus impossible to perfectly isolate variations in circuit area and speed attributable to the HLS tools versus other criteria. Each compiler framework has a different set of optimizations that execute before HLS, with potentially considerable impact on HLS results. Likewise, we expect that Altera's RTL and logic synthesis, placement and routing are different than those within Xilinx's tools. Moreover, while the chosen Virtex-7 and Stratix V are fabricated in the same TSMC process, there are differences in the FPGA architecture itself. For example, as mentioned above, the fracturable 6-LUTs in Altera FPGAs are more flexible than the fracturable 6-LUTs in Xilinx FPGAs, owing to the Altera ALMs having more inputs. This may vary the final resource requirements for the accelerators. Finally, although we have selected the fastest speedgrade for each vendor's device, we cannot be sure whether the fraction of die binned in Xilinx's fastest speedgrade is the same as that for Altera, because this information is kept proprietary by the vendors.

Other differences relate to tool assumptions, e.g. about memory implementation. For each benchmark kernel, some data is kept local to the kernel (i.e. in BRAMs instantiated within the module), whereas other data is considered "global", kept outside the kernel and accessed via a memory controller. As an example, in LEGUP, data is considered local when, at compile time, it can be proven to be accessed solely within the kernel (e.g. an array declared within the kernel itself and used as a scratch pad). The various tools evaluated do not necessarily make the same decisions regarding which data is kept local versus global. The performance and area numbers reported reflect the kernel itself and do not include the global memory. The rationale behind this decision is to focus the results on the HLS-generated portion of the circuit, rather than on the integration with the rest of the system.

TABLE II: Benchmark Characteristics & Target Frequencies for Optimized Flow (MHz).

Benchmark     Domain    Source    BAMBU  DWARV  LEGUP  Commercial
adpcm encode  Comm      CHStone   333    150    333    400
aes encrypt   Encrypt   CHStone   250    200    333    363
aes decrypt   Encrypt   CHStone   250    200    1000   312
gsm           Comm      CHStone   200    150    333    400
sha           Encrypt   CHStone   200    200    333    400
blowfish      Encrypt   CHStone   250    200    200    400
dfadd         Arith     CHStone   250    200    333    400
dfdiv         Arith     CHStone   250    150    200    400
dfmul         Arith     CHStone   250    150    200    400
dfsin         Arith     CHStone   250    100    200    303
jpeg          Media     CHStone   250    N/A    1000   400
mips          Compute   CHStone   400    300    200    400
motion        Media     CHStone   250    150    200    ERR
satd          Compute   DWARV     455    100    100    400
sobel         Media     DWARV     500    300    1000   285
bellmanford   Compute   DWARV     500    300    333    400
matrix        Arith     Bambu     250    300    250    400

B. Benchmark Overview

The synthesized benchmark kernels are listed in Table II, where the second and third columns give the application domain of the corresponding kernel, as well as its source. Most of the kernels have been extracted from the C-language CHStone benchmark suite [73], with the remainder coming from DWARV and BAMBU. The selected functions originate from different application domains and are both control-flow and data-flow dominated, as we aim at evaluating generic (non-application-specific) HLS tools.

An important aspect of the benchmarks used in this study is that input and golden output vectors are available for each program. Hence, it is possible to "execute" each benchmark with the built-in input vectors, both in software and in the HLS-generated RTL using ModelSim. The RTL simulation permits extraction of the total cycle count, and enables functional correctness checking.

C. HLS Evaluation

We performed two sets of experiments to evaluate the compilers.
In the first experiment, we executed each tool in a


TABLE III: Optimizations Used (Letter in () Refers to Subsection in Section III). v: USED; x: UNUSED.

HLS Tool    Compiler Framework  Target FPGA  OC(A)  BA(B)  MS(C)  LO(D)  HwRL(E)  Sp(F)  SP(G)  IC(H)
LEGUP       LLVM                Altera       v      v      v      v      x        x      v      v
BAMBU       GCC                 Xilinx       v      v      v      x      v        v      x      v
DWARV       CoSy                Xilinx       v      v      v      v      x        x      x      v
Commercial  Unknown             Xilinx       v      v      v      v      v        v      v      v

“push-button” manner using all of its default settings, which we refer to as standard-optimization. The first experiment thus represents what a user would see running the HLS tools “out of the box”. We used the following default target frequencies: 250 MHz for BAMBU, 150 MHz for DWARV, and 200 MHz for LEGUP. For the commercial tool, we decided to use a default frequency of 400 MHz. In the second experiment, we manually optimized the programs and constraints for the specific tools (by using compiler flags and code annotations to enable various optimizations) to generate performance-optimized implementations. Table III lists, for each tool, the optimizations enabled in this second experiment. As we do not have access to the source code of the commercial tool, its list is based on observations of the available options and on inspection of the generated code. The last four columns of Table II show the HLS target frequencies used for the optimized experiment. It should be noted that there is no strict correlation between these and the actual post-place-and-route frequencies obtained after implementing the designs (shown in Tables IV and V), because the vendor-provided back-end tools perform the actual mapping, placing and routing steps. This is explained by the inherently approximate timing models used in HLS. The target frequency used as input to the HLS tools should be regarded only as an indication of how much operation chaining can be performed. As a rule of thumb, in order to implement a design at some frequency, one should target a higher frequency in HLS.

Table IV shows the performance metrics (number of cycles, maximum frequency after place & route, and wall-clock time) obtained in the standard-optimization scenario, while Table V shows the same performance metrics obtained in the performance-optimized scenario. The ERR entries denote errors that prevented us from obtaining complete results for the corresponding benchmarks (e.g. a compiler segmentation error). Observe that geometric mean data is included at the bottom of the tables. Two rows of geomean are shown: the first includes only those benchmarks for which all tools were successful; the second includes all benchmarks, and is shown for BAMBU and LEGUP. In the standard-optimization results in Table IV, we see that the commercial tool is able to achieve the highest Fmax; BAMBU implementations have the lowest cycle latencies; and BAMBU and LEGUP deliver roughly the same (and lowest) average wall-clock time. However, we also observe that no single tool delivers superior results for all benchmarks. For example, while DWARV does not provide the lowest wall-clock time on average, it produced the best results (among the academic tools) for several benchmarks, including aes decrypt and bellmanford.

For the performance-optimized results in Table V, a key takeaway is that performance is drastically improved when the constraints and source code input to the HLS tools are tuned. For the commercial tool, geomean wall-clock time is reduced from 37.1 to 19.9 µs (1.9×) in the optimized results. For BAMBU, DWARV and LEGUP, the wall-clock time reductions in the optimized flow are 1.6×, 1.7× and 2×, respectively, on average (comparing values in the GEOMEAN rows of the tables). It is interesting that, for all the tools, the average performance improvements in the optimized flow were roughly the same. From this, we conclude that one can expect a ~1.6-2× performance improvement, on average, from tuning the code and constraints provided to HLS. We also observe that, from the performance angle, the academic tools are comparable to the commercial tool. BAMBU and LEGUP, in particular, deliver superior wall-clock time to the commercial tool, on average.

For completeness, the area-related metrics are shown in Tables VI and VII for the standard and optimized flows, respectively. Comparisons between LEGUP and the other tools are more difficult in this case, owing to architectural differences between Stratix V and Virtex-7. Among the flows that target Xilinx, the commercial HLS tool delivers considerably more compact implementations than the academic tools (much smaller LUT consumption), since we anticipate that it implements more technology-oriented optimizations. For all flows (including LEGUP), we observe that, in the performance-optimized flow, more resources are used to effectively improve performance.

V. DISCUSSION FROM THE TOOL PERSPECTIVE

In this section, we describe the results for the academic HLS tools from a tool-specific viewpoint and highlight techniques used to improve performance in each tool.

A. Bambu

BAMBU leverages GCC to perform classical code optimizations, such as loop unrolling, constant propagation, etc. To simplify the use of the tool for software designers, its interface has been designed such that the designer can use the same compilation flags and directives that would be given to GCC. In the standard-optimization case, the compiler optimization level passed to GCC is -O3, without any modifications to the source code of the benchmarks. In the performance-optimized study, the source code was modified only in the case of sobel, where we used the same version modified by the LEGUP team. Loop unrolling was used for adpcm, matrix and sha. On three benchmarks (gsm, matrix and sobel), GCC vectorization produced a better wall-clock time, while function inlining was useful for gsm, dfadd, dfsin, aes encrypt and aes decrypt.

BAMBU's front-end phase also implements operation transformations that are specific to HLS, e.g. by transforming


TABLE IV: Standard-Optimization Performance Results. Fmax is Reported in MHz, Wall-clock in µs.

              Commercial                    BAMBU                       DWARV                       LEGUP
Benchmark     Cycles    Fmax   Wall-clock   Cycles   Fmax    Wall-clock  Cycles    Fmax    Wall-clock  Cycles    Fmax    Wall-clock
adpcm encode  27250     281    96.87        11179    232     48.14       24454     183     133.67      7883      245     32.12
aes encrypt   3976      345    11.54        1574     252     6.25        5135      201     25.60       1564      395     3.96
aes decrypt   5461      322    16.95        2766     260     10.64       2579      255     10.11       7367      313     23.56
gsm           5244      347    15.12        2805     200     14.01       6866      186     36.90       3966      273     14.52
sha           197867    327    605.08       111762   259     431.62      71163     253     281.52      168886    250     676.90
blowfish      101010    397    254.65       57590    288     200.24      70200     251     280.03      75010     468     160.22
dfadd         552       332    1.66         404      275     1.47        465       215     2.16        650       252     2.58
dfdiv         2068      281    7.35         1925     222     8.65        2274      179     12.69       2046      183     11.20
dfmul         200       281    0.71         174      259     0.67        293       154     1.90        209       186     1.12
dfsin         57564     247    233.08       56021    223     251.09      64428     134     481.02      57858     189     305.79
jpeg          994945    208    4776.73      662380   217     3057.55     748707    ERR     ERR         1128109   220     5126.14
mips          4199      281    14.93        4043     259     15.60       8320      370     22.51       5989      487     12.30
motion        ERR       ERR    ERR          127      287     0.44        152       163     0.93        66        338     0.20
satd          87        383    0.23         27       232     0.12        57        134     0.42        46        288     0.16
sobel         45261481  330    137142.29    5983199  276     21665.16    23934323  340     70295.11    7561317   336     22502.58
bellmanford   2838      447    6.35         3218     227     14.17       2319      360     6.44        2444      332     7.37
matrix        363585    281    1292.54      198690   282     704.75      297026    391     759.79      101442    401     253.00
GEOMEAN       11918.22  321.6  37.06        6754.93  248.51  27.1821     10373.59  226.34  45.83       8039.75   292.514 27.48
GEOMEAN (ALL) -         -      -            5681.2   246.67  23.03       -         -       -           6922.61   284.285 24.35

TABLE V: Performance-Optimized Results. Fmax is Reported in MHz, Wall-clock in µs.

              Commercial                    BAMBU                       DWARV                       LEGUP
Benchmark     Cycles    Fmax   Wall-clock   Cycles   Fmax    Wall-clock  Cycles    Fmax    Wall-clock  Cycles    Fmax    Wall-clock
adpcm encode  12350     281    43.90        7077     258     27.40       9122      148     61.47       6635      348     19.06
aes encrypt   3735      331    11.29        1485     249     5.96        3282      250     13.13       1191      408     2.92
aes decrypt   3923      307    12.77        2585     254     10.17       2579      255     10.11       4847      319     15.19
gsm           3584      347    10.34        2128     180     11.83       7308      333     21.92       1931      262     7.36
sha           124339    329    377.87       51399    203     253.35      71163     253     281.52      81786     219     256.74
blowfish      96460     350    275.68       57590    288     200.24      70200     251     280.03      64480     536     120.32
dfadd         552       332    1.66         370      243     1.52        465       215     2.16        319       258     1.24
dfdiv         2068      281    7.35         1374     240     5.73        2846      263     10.83       942       161     5.85
dfmul         200       281    0.71         162      253     0.64        293       154     1.90        105       183     0.57
dfsin         57564     247    233.08       38802    233     166.69      90662     333     271.99      22233     135     165.02
jpeg          602725    209    2882.83      662380   217     3057.55     706151    ERR     ERR         1182092   255     4639.66
mips          4199      281    14.93        5783     411     14.06       8320      370     22.51       5989      487     12.30
motion        ERR       ERR    ERR          127      285     0.45        122       167     0.73        66        338     0.20
satd          27        497    0.05         36       442     0.08        54        473     0.11        42        289     0.15
sobel         2475541   330    7495.94      3641402  480     7585.04     3648547   287     12696.94    1565741   489     3201.92
bellmanford   2607      408    6.38         4779     509     9.38        2319      360     6.44        1036      418     2.48
matrix        16408     281    58.33        6178     238     25.90       36162     386     93.73       19003     345     55.01
GEOMEAN       6396.3    320.8  19.9         4704.6   283.9   16.6        7509.6    275.8   27.2        4185.8    299.9   13.6
GEOMEAN (ALL) -         -      -            5089.0   279.5   18.2        -         -       -           4570.2    299.1   14.9

TABLE VI: Standard-Optimization Area Results.

              Commercial                  BAMBU                     DWARV                     LEGUP
Benchmark     LUTp     BRAMB18  DSP48s    LUTp     BRAMB18  DSP48s  LUTp     BRAMB18  DSP48s  ALMs     M20K   DSPs
adpcm encode  4319     0        68        19931    52       64      5626     18       6       2490     0      43
aes encrypt   5802     6        1         8485     4        0       15699    16       3       4263     8      0
aes decrypt   6098     4        1         8747     4        1       12733    16       3       4297     14     0
gsm           5271     8        49        11864    10       75      6442     0        8       4311     1      51
sha           2161     16       0         4213     12       0       10012    0        0       6398     26     0
blowfish      2226     0        0         6837     0        0       7739     0        0       1679     0      0
dfadd         7409     0        0         7250     0        0       7334     0        0       2812     1      0
dfdiv         15107    0        24        11757    0        24      13934    1        40      4679     4      42
dfmul         3070     0        16        3430     0        16      14157    1        40      1464     1      28
dfsin         22719    0        43        21892    0        59      30616    43       43      9099     3      72
jpeg          16192    25       1         46757    154      26      ERR      ERR      ERR     16276    41     85
mips          1963     3        8         2501     0        8       3904     3        20      1319     0      15
motion        ERR      ERR      ERR       2776     2        0       45826    6        0       6788     0      0
satd          790      0        0         4425     0        0       1411     0        0       2004     0      0
sobel         792      0        6         3106     0        28      1160     0        12      1241     0      36
bellmanford   485      0        0         1046     0        0       633      0        0       493      0      0
matrix        175      0        3         551      0        3       471      0        3       225      0      2
GEOMEAN       2711.75  1.84     4.57      5253.60  2.15     5.30    5148.72  2.43     4.88    2197.66  2.01   5.67
GEOMEAN (ALL) -        -        -         5754.49  2.76     5.28    -        -        -       2641.94  2.30   6.00


TABLE VII: Performance-Optimized Area Results.

              Commercial                  BAMBU                     DWARV                     LEGUP
Benchmark     LUTp     BRAMB18  DSP48s    LUTp     BRAMB18  DSP48s  LUTp     BRAMB18  DSP48s  ALMs     M20K   DSPs
adpcm encode  5325     0        116       10546    2        81      13416    0        6       2903     0      57
aes encrypt   5798     6        1         9793     2        1       15699    16       3       3199     0      0
aes decrypt   6370     4        1         12927    2        3       12733    16       3       4894     18     0
gsm           8970     11       49        29646    16       316     6442     0        8       3442     3      59
sha           13105    16       0         14819    12       0       10012    0        0       28289    12     0
blowfish      3433     0        0         6799     0        0       7739     0        0       1648     0      0
dfadd         7409     0        0         6413     0        0       7334     0        0       3506     0      0
dfdiv         15107    0        24        7673     1        76      16209    1        40      16895    9      126
dfmul         3070     0        16        3001     0        16      14157    1        40      1866     0      28
dfsin         22719    0        43        21538    1        111     30616    43       43      10857    3      72
jpeg          16099    25       1         46757    154      26      ERR      ERR      ERR     16669    41     85
mips          1963     3        8         2305     0        8       3904     3        20      1319     0      15
motion        ERR      ERR      ERR       2678     1        0       49414    6        0       6788     0      0
satd          1704     0        0         2447     2        0       3037     0        0       1959     0      0
sobel         1015     0        3         722      0        0       2877     0        3       698      0      0
bellmanford   1127     0        0         717      0        0       633      0        0       528      1      0
matrix        3406     0        96        10531    0        384     7110     0        3       3747     0      68
GEOMEAN       4575.89  1.88     5.70      5925.67  1.71     7.95    7384.52  2.00     4.45    3159.64  1.92   6.25
GEOMEAN (ALL) -        -        -         6385.85  2.16     7.54    -        -        -       3644.68  2.21   6.54

multiplications and divisions, which are usually very expensive in hardware. BAMBU maps 64-bit divisions onto a C library function implementing the Newton-Raphson algorithm for integer division. This leads to a higher number of DSPs required by dfdiv and dfsin in the standard-optimization case. BAMBU also supports floating-point operations, since it interfaces with the FloPoCo library [56].

All functional units are pre-characterized for multiple combinations of target devices, bit-widths, and pipeline stages. Hence, BAMBU implements a technology-aware scheduler to perform aggressive operation chaining and code motion. This reduces the total number of clock cycles, while respecting the given timing constraint. Trimming of the address bus was useful for bellmanford, matrix, satd, and sobel.

Finally, BAMBU adopts a novel architecture for memory accesses [11]. Specifically, BAMBU builds a hierarchical datapath directly connected to a dual-port BRAM whenever a local aggregated or a global scalar/aggregate data type is used by the kernel and whenever the accesses can be determined at compile time. In this case, multiple memory accesses can be performed in parallel. Otherwise, the memories are interconnected so that it is also possible to support dynamic resolution of the addresses. Indeed, the same memory infrastructure can be natively connected to external components (e.g. a local scratch-pad memory or cache) or directly to the bus to access off-chip memory. Finally, if the kernel has pointers as parameters, it assumes that the objects referred to are allocated on dual-port BRAMs.

The optimized results obtained for blowfish and jpeg are the same as those obtained in the first study, since we were not able to identify different options to improve the results.

B. DWARV

Since DWARV is based on CoSy [10], one of its main advantages is the flexibility to easily exploit standard and custom optimizations. The framework contains 255 transformation and optimization passes, available in the form of stand-alone engines. For the standard-evaluation experiment, the most important optimizations that DWARV uses are if-conversion, operation chaining, multiple memories, and a simple bit-width analysis (i.e. based only on standard integer types). For the performance-optimized runs, pragmas were added to enable loop unrolling. However, not all framework optimizations are yet fully integrated in the HLS flow.

One of the DWARV restrictions is that it does not support global variables. As a result, the CHStone benchmarks, which rely heavily on global variables, had to be rewritten to transform global variables into function parameters passed by reference. Besides the effort needed to rewrite code accessing global memory, some global optimizations across functions are not considered. Another limitation is a mismatch between the clock period targeted by operation chaining and the selection of IP cores in the target technology (e.g. for a divider unit), which are not (re)generated on request based on a target frequency. Operation chaining is set to a specific target frequency for each benchmark (as shown in Table II); however, this can differ significantly from the frequency achievable with the instantiated IP cores available in DWARV's IP library, as shown for example in the dfxxx kernels. DWARV targets mostly small and medium-size kernels. It thus generates a central FSM and always maps local arrays to distributed logic. This is a problem for large kernels such as the jpeg benchmark, which could not be mapped in the available area on the target platform. Another minor limitation is the transformation – in the compiler back-end – of switch constructs into if-else constructs. Generating lower-level switch constructs would improve the aes, mips, and jpeg kernels, which contain multiple switch statements.

C. LegUp

Several methods exist for optimizing LEGUP-produced circuits: automatic LLVM compiler optimizations [12], user-defined directives for activating various hardware-specific features, and source code modifications. Since LEGUP is built within LLVM, users can utilize LLVM optimization passes



with minimal effort. In the context of hardware circuits, for the performance-optimized runs, function inlining and loop unrolling provided benefits across multiple benchmarks. Function inlining allows the hardware scheduler to exploit more instruction-level parallelism and to simplify the FSM. Similarly, loop unrolling exposes more parallelism across loop iterations. The performance boost associated with inlining and unrolling generally comes at the cost of increased area.

LEGUP also offers many hardware optimizations that users can activate by means of Tcl directives, such as activating loop pipelining or changing the target clock period. Loop pipelining allows consecutive iterations of a loop to begin execution before the previous iteration has completed, reducing the overall number of clock cycles. Longer clock periods permit more chaining, reducing cycle latency. If the reduction in cycle latency exceeds the amount by which the clock period lengthens, wall-clock time will also be improved.

Manual source code modifications can be made to assist LEGUP in inferring parallelism within the program. One such modification is to convert single-threaded execution into multi-threaded execution using Pthreads/OpenMP, whereby LEGUP synthesizes the multiple parallel threads into parallel hardware accelerators. This optimization was applied to all of the df benchmarks, in which a set of inputs is applied to a kernel in a data-parallel fashion – there are no dependencies between the inputs. Such a situation is particularly desirable for LEGUP's multi-threading synthesis: multiple identical hardware kernels are instantiated, each operating in parallel on disjoint subsets of the input data.

VI. CONCLUDING REMARKS

To the authors' knowledge, this paper represents the first broad evaluation of several HLS tools. We presented an extensive survey and categorization of past and present hardware compilers, and then described the optimizations on which recent and ongoing research in the HLS community is focused. We experimentally evaluated three academic HLS tools, BAMBU, DWARV, and LEGUP, against a commercial tool. The methodology aims at providing a fair comparison of the tools, even though they are built within different compiler frameworks and target different FPGA families. The results show that each HLS tool can significantly improve performance with benchmark-specific optimizations and constraints. However, software engineers need to take into account that the optimizations necessary to realize high performance in hardware (e.g. enabling loop pipelining, removing control flow) differ significantly from software-oriented ones (e.g. data re-organization for cache locality).

Overall, the performance results showed that academic and commercial HLS tools are not drastically far apart in terms of quality, and that no single tool produced the best results for all benchmarks. Despite this, it should be noted that the commercial compiler supports more features – multiple input and output languages, and customization of the generated kernels in terms of interface types, memory bank usage, throughput, etc. – while at the same time being more robust than the academic tools.

REFERENCES

[1] S. Borkar and A. A. Chien. The Future of Microprocessors. Communications of the ACM, vol. 54, pp. 67–77, May 2011.
[2] P. Coussy and A. Morawiec. High-Level Synthesis: from Algorithm to Digital Circuit. Springer, 2008.
[3] H.-Y. Liu, M. Petracca, and L. P. Carloni. Compositional System-Level Design Exploration with Planning of High-Level Synthesis. In IEEE/ACM DATE, pp. 641–646, 2012.
[4] A. Putnam, A. Caulfield, et al. A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services. In IEEE/ACM ISCA, pp. 13–24, 2014.
[5] C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks. In ACM FPGA, pp. 161–170, 2015.
[6] W. Meeus, K. Van Beeck, T. Goedemé, J. Meel, and D. Stroobandt. An Overview of Today's High-Level Synthesis Tools. In Design Automation for Embedded Systems, pp. 31–51, 2012.
[7] R. Nane, V.-M. Sima, B. Olivier, R. Meeuws, Y. Yankova, and K. Bertels. DWARV 2.0: A CoSy-based C-to-VHDL Hardware Compiler. In FPL, pp. 619–622, 2012.
[8] C. Pilato and F. Ferrandi. Bambu: A Modular Framework for the High-Level Synthesis of Memory-intensive Applications. In FPL, pp. 1–4, 2013.
[9] A. Canis, J. Choi, M. Aldham, V. Zhang, A. Kammoona, J. H. Anderson, S. Brown, and T. Czajkowski. LegUp: High-Level Synthesis for FPGA-based Processor/Accelerator Systems. In ACM FPGA, pp. 33–36, 2011.
[10] ACE – Associated Compiler Experts. CoSy. [Online]. Available: http://www.ace.nl
[11] C. Pilato, F. Ferrandi, and D. Sciuto. A Design Methodology to Implement Memory Accesses in High-Level Synthesis. In IEEE/ACM CODES+ISSS, pp. 49–58, 2011.
[12] C. Lattner and V. Adve. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In IEEE/ACM CGO, pp. 75–88, 2004.
[13] K. Wakabayashi and T. Okamoto. C-based SoC Design Flow and EDA Tools: An ASIC and System Vendor Perspective. In IEEE TCAD, vol. 19, no. 12, pp. 1507–1522, 2000.
[14] BlueSpec Inc. High-Level Synthesis Tools. [Online]. Available: http://bluespec.com/high-level-synthesis-tools.html
[15] R. Nikhil. Bluespec System Verilog: Efficient, Correct RTL from High-Level Specifications. In IEEE/ACM MEMOCODE, pp. 69–70, 2004.
[16] S. C. Goldstein, H. Schmit, M. Budiu, S. Cadambi, M. Moe, and R. R. Taylor. PipeRench: A Reconfigurable Architecture and Compiler. In IEEE Computer, pp. 70–77, 2000.
[17] N. Kavvadias and K. Masselos. Automated Synthesis of FSMD-Based Accelerators for Hardware Compilation. In IEEE ASAP, pp. 157–160, 2012.
[18] Impulse Accelerated Technologies. Impulse CoDeveloper C-to-FPGA Tools. [Online]. Available: http://www.impulseaccelerated.com/products universal.htm
[19] Mentor Graphics. DK Design Suite: Handel-C to FPGA for Algorithm Design. [Online]. Available: http://www.mentor.com/products/fpga/handel-c/dk-design-suite
[20] W. A. Najjar, W. Bohm, B. A. Draper, J. Hammes, R. Rinker, J. R. Beveridge, M. Chawathe, and C. Ross. High-Level Language Abstraction for Reconfigurable Computing. In IEEE Computer, pp. 63–69, 2003.
[21] T. Callahan, J. Hauser, R. John, and J. Wawrzynek. The Garp Architecture and C Compiler. In IEEE Computer, pp. 62–69, 2000.
[22] M. B. Gokhale and J. M. Stone. NAPA C: Compiling for a Hybrid RISC/FPGA Architecture. In IEEE FCCM, pp. 126–135, 1998.
[23] Y Explorations. eXCite: C to RTL Behavioral Synthesis. [Online]. Available: http://www.yxi.com/products.php
[24] J. Villarreal, A. Park, W. Najjar, and R. Halstead. Designing Modular Hardware Accelerators in C with ROCCC 2.0. In IEEE FCCM, pp. 127–134, 2010.
[25] Calypto Design Systems. Catapult: Product Family Overview. [Online]. Available: http://calypto.com/en/products/catapult/overview
[26] Cadence. C-to-Silicon Compiler. [Online]. Available: http://www.cadence.com/products/sd/silicon compiler/pages/default.aspx
[27] S. Gupta, N. Dutt, R. Gupta, and A. Nicolau. SPARK: A High-Level Synthesis Framework for Applying Parallelizing Compiler Transformations. In VLSI Design, pp. 461–466, 2003.
[28] Altium. Altium Designer: A Unified Solution. [Online]. Available: http://www.altium.com/en/products/altium-designer
[29] Université de Bretagne-Sud. GAUT – High-Level Synthesis Tool from C to RTL. [Online]. Available: http://hls-labsticc.univ-ubs.fr/
[30] J. L. Tripp, M. B. Gokhale, and K. D. Peterson. Trident: From High-Level Language to Hardware Circuitry. In IEEE Computer, pp. 28–37, 2007.



[31] J. L. Tripp, P. A. Jackson, and B. Hutchings. Sea Cucumber: A Synthesizing Compiler for FPGAs. In FPL, pp. 875–885, 2002.
[32] Altera. C2H Compiler – Discontinued. [Online]. Available: www.altera.com/literature/pcn/pdn1208.pdf
[33] Synopsys. Synphony C Compiler. [Online]. Available: https://www.synopsys.com/Tools/Implementation/RTLSynthesis/Pp./SynphonyC-Compiler.aspx
[34] V. Kathail, S. Aditya, R. Schreiber, B. Rau, D. Cronquist, and M. Sivaraman. PICO (Program In, Chip Out): Automatically Designing Custom Computers. In IEEE Computer, pp. 39–47, 2012.
[35] BDTi. BDTI High-Level Synthesis Tool Certification Program Results. [Online]. Available: http://www.bdti.com/Resources/BenchmarkResults/HLSTCP
[36] Northwestern University. MATCH Compiler. [Online]. Available: http://www.ece.northwestern.edu/cpdc/Match/
[37] Xilinx. AccelDSP Synthesis Tool. [Online]. Available: http://www.xilinx.com/tools/acceldsp.htm
[38] A. Putnam, D. Bennett, E. Dellinger, J. Mason, P. Sundararajan, and S. Eggers. CHiMPS: A C-Level Compilation Flow for Hybrid CPU-FPGA Architectures. In FPL, pp. 173–178, 2008.
[39] K. Bondalapati, P. Diniz, P. Duncan, J. Granacki, M. Hall, R. Jain, and H. Ziegler. DEFACTO: A Design Environment for Adaptive Computing Technology. In IEEE RAW, 1999.
[40] Maxeler Technologies. MaxCompiler. [Online]. Available: https://www.maxeler.com/products/software/maxcompiler/
[41] D. Greaves and S. Singh. Kiwi: Synthesis of FPGA Circuits from Parallel Programs. In IEEE FCCM, pp. 3–12, 2008.
[42] Forte Design Systems. Cynthesizer 5. [Online]. Available: http://www.forteds.com/products/cynthesizer.asp
[43] Xilinx Inc. Vivado Design Suite – Vivado HLS. [Online]. Available: http://www.xilinx.com/products/design-tools/vivado/index.htm
[44] J. Cong, B. Liu, S. Neuendorffer, J. Noguera, K. Vissers, and Z. Zhang. High-Level Synthesis for FPGAs: From Prototyping to Deployment. In IEEE TCAD, pp. 473–491, 2011.
[45] G. Nguyen Thi Huong and S. W. Kim. GCC2Verilog Compiler Toolset for Complete Translation of C Programming Language into Verilog HDL. In ETRI Journal, vol. 33, no. 5, pp. 731–740, 2011.
[46] L. Stok. Data Path Synthesis. Integration, vol. 18, no. 1, pp. 1–71, 1994.
[47] C. Pilato, P. Mantovani, G. Di Guglielmo, and L. Carloni. System-Level Memory Optimization for High-Level Synthesis of Component-based SoCs. In IEEE CODES+ISSS, 2014.
[48] B. R. Rau. Iterative Modulo Scheduling: An Algorithm for Software Pipelining Loops. In IEEE/ACM MICRO, pp. 63–74, 1994.
[49] J. Llosa, A. González, E. Ayguadé, and M. Valero. Swing Modulo Scheduling: A Lifetime-Sensitive Approach. In IEEE/ACM PACT, 1996.
[50] M. Benabderrahmane, L. Pouchet, A. Cohen, and C. Bastoul. The Polyhedral Model Is More Widely Applicable Than You Think. In Compiler Construction, pp. 208–303, 2010.
[51] Y. Wang, P. Li, and J. Cong. Theory and Algorithm for Generalized Memory Partitioning in High-Level Synthesis. In ACM FPGA, pp. 199–208, 2014.
[52] W. Zuo, P. Li, D. Chen, L. Pouchet, S. Zhong, and J. Cong. Improving Polyhedral Code Generation for High-Level Synthesis. In IEEE CODES+ISSS, 2013.
[53] J. Cong, M. Huang, and P. Zhang. Combining Computation and Communication Optimizations in System Synthesis for Streaming Applications. In ACM FPGA, pp. 213–222, 2014.
[54] F. de Dinechin. Multiplication by Rational Constants. IEEE TCAS II: Express Briefs, vol. 59, pp. 98–102, Feb. 2012.
[55] M. Kumm, M. Hardieck, J. Willkomm, P. Zipf, and U. Meyer-Baese. Multiple Constant Multiplication with Ternary Adders. In FPL, pp. 1–8, 2013.
[56] F. de Dinechin and B. Pasca. Designing Custom Arithmetic Data Paths with FloPoCo. IEEE Design & Test of Computers, vol. 28, pp. 18–27, 2011.
[57] A. Kondratyev, L. Lavagno, M. Meyer, and Y. Watanabe. Exploiting Area/Delay Tradeoffs in High-Level Synthesis. In IEEE/ACM DATE, pp. 1024–1029, 2012.
[58] T. Jiang, X. Tang, and P. Banerjee. Macro-Models for High Level Area and Power Estimation on FPGAs. In IEEE GLSVLSI, pp. 162–165, 2004.
[59] D. Zaretsky, G. Mittal, R. Dick, and P. Banerjee. Balanced Scheduling and Operation Chaining in High-Level Synthesis for FPGA Designs. In IEEE ISQED, pp. 595–601, 2007.
[60] H. Zheng, S. T. Gurumani, K. Rupnow, and D. Chen. Fast and Effective Placement and Routing Directed High-Level Synthesis for FPGAs. In ACM FPGA, pp. 1–10, 2014.
[61] G. Lakshminarayana, A. Raghunathan, and N. Jha. Incorporating Speculative Execution into Scheduling of Control-Flow Intensive Behavioral Descriptions. In IEEE/ACM DAC, pp. 108–113, 1998.
[62] S. Gupta, N. Savoiu, N. Dutt, R. Gupta, and A. Nicolau. Conditional Speculation and Its Effects on Performance and Area for High-Level Synthesis. In ACM ISSS, pp. 171–176, 2001.
[63] R. Cordone, F. Ferrandi, M. D. Santambrogio, G. Palermo, and D. Sciuto. Using Speculative Computation and Parallelizing Techniques to Improve Scheduling of Control Based Designs. In IEEE/ACM ASP-DAC, pp. 898–904, 2006.
[64] H. Zheng, Q. Liu, J. Li, D. Chen, and Z. Wang. A Gradual Scheduling Framework for Problem Size Reduction and Cross Basic Block Parallelism Exploitation in High-Level Synthesis. In IEEE/ACM ASP-DAC, pp. 780–786, 2013.
[65] J. Choi, S. Brown, and J. Anderson. From Software Threads to Parallel Hardware in High-Level Synthesis for FPGAs. In IEEE FPT, pp. 270–277, Dec. 2013.
[66] D. Singh, T. Czajkowski, and A. Ling. Harnessing the Power of FPGAs Using Altera's OpenCL Compiler. In ACM FPGA, pp. 5–6, 2013.
[67] S. A. Mahlke, R. E. Hank, R. A. Bringmann, J. C. Gyllenhaal, D. M. Gallagher, and W.-m. W. Hwu. Characterizing the Impact of Predicated Execution on Branch Prediction. In ACM/IEEE MICRO, pp. 217–227, 1994.
[68] N. J. Warter, D. M. Lavery, and W.-M. W. Hwu. The Benefit of Predicated Execution for Software Pipelining. In Hawaii International Conference on System Sciences, 1993.
[69] M. Hohenauer, F. Engel, R. Leupers, G. Ascheid, H. Meyr, and G. Bette. Retargetable Code Optimization for Predicated Execution. In IEEE/ACM DATE, pp. 1492–1497, 2008.
[70] R. Nane, V.-M. Sima, and K. Bertels. A Lightweight Speculative and Predicative Scheme for Hardware Execution. In IEEE ReConFig, pp. 1–6, 2012.
[71] Altera Corp. Stratix V FPGA Data Sheet, 2015.
[72] Xilinx Inc. Virtex-7 FPGA Data Sheet, 2015.
[73] Y. Hara, H. Tomiyama, S. Honda, and H. Takada. Proposal and Quantitative Analysis of the CHStone Benchmark Program Suite for Practical C-based High-Level Synthesis. In Information Processing, pp. 242–254, 2009.

Razvan Nane is a Postdoctoral Researcher at Delft University of Technology. He received his Ph.D. in Computer Engineering from Delft University of Technology, The Netherlands, in 2014. He is the main developer of the DWARV C-to-VHDL hardware compiler. His main research interests are high-level synthesis for reconfigurable architectures, hardware/software co-design methods for heterogeneous systems, and compilation and simulation techniques for emerging memristor-based in-memory computing high-performance architectures.

Vlad-Mihai Sima received his Ph.D. in Computer Engineering from the Delft University of Technology, The Netherlands, in 2013 and his M.Sc. in Computer Science and Engineering from Universitatea Politehnica, Bucharest, Romania, in 2006. During his Ph.D., and while a researcher at TU Delft, he was closely involved in various research related to high-level synthesis for heterogeneous architectures, namely the DWARV C-to-VHDL compiler. His main research interests include high-performance computing, high-level synthesis, and heterogeneous architectures. Following his appointment with TU Delft, he is currently the head of research at Bluebee, a startup company focusing on providing high-performance solutions for the genomics market.



Christian Pilato (S’08–M’12) is a Postdoctoral Research Scientist at Columbia University, New York, NY, USA. He received his Ph.D. in Information Technology from Politecnico di Milano, Milan, Italy, in 2011, where he was a research associate until 2013. During this period, he was one of the main developers of the BAMBU HLS tool. His research interests include high-level synthesis with emphasis on memory aspects, system-level design of heterogeneous architectures for energy-efficient high performance, along with issues related to physical design. He is actively involved in the technical program committees of several EDA conferences (e.g. DATE, FPL) and is a member of IEEE and ACM.

Jongsok Choi (S’12) is a Ph.D. candidate in the Department of Electrical and Computer Engineering at the University of Toronto (U of T), Toronto, ON, Canada. He received the M.A.Sc. degree from the U of T, and the B.A.Sc. degree from the University of Waterloo, Waterloo, ON, Canada. He has been the primary developer of the system-level functionality within LEGUP, including the support for Pthreads/OpenMP-driven hardware parallelization, and the automatic generation of hybrid processor/accelerator systems. Jongsok has previously worked at Altera, Qualcomm, Marvell Semiconductor, and STMicroelectronics.

Hsuan Hsiao (S’15) received the B.A.Sc. in computer engineering from the University of Toronto (U of T), Toronto, ON, Canada, in 2014. She is currently an M.A.Sc. candidate in the Department of Electrical and Computer Engineering at the U of T. Her research interests lie in the area of high-level synthesis with reduced-width datapaths.

Stephen Brown (M’90) is a Professor with the University of Toronto, Toronto, ON, Canada, and is the Director of the Worldwide University Program at Altera. From 2000 to 2008, he was the Director of Research and Development with the Altera Toronto Technology Center. He is an author of more than 70 scientific publications and is the co-author of three textbooks: Fundamentals of Digital Logic with VHDL Design (New York: McGraw-Hill, 2004), Fundamentals of Digital Logic with Verilog Design (New York: McGraw-Hill, 2002), and Field-Programmable Gate Arrays (Kluwer Academic, 1992). His current research interests include computer-aided design algorithms, field-programmable very large scale integration technology, and computer architecture. He received the Canadian Natural Sciences and Engineering Research Council's 1992 Doctoral Prize for the Best Doctoral Thesis in Canada. He has received a number of awards for excellence in teaching.

Fabrizio Ferrandi (M’95) received his Laurea (cum laude) in Electronic Engineering in 1992 and the Ph.D. degree in Information and Automation Engineering (Computer Engineering) from the Politecnico di Milano, Milan, Italy, in 1997. He joined the faculty of Politecnico di Milano in 2002, where he is currently an Associate Professor at the Dipartimento di Elettronica, Informazione e Bioingegneria. He has published over 100 papers. His research interests include synthesis, verification, simulation, and testing of digital circuits and systems. Fabrizio Ferrandi is a member of IEEE, the IEEE Computer Society, the Test Technology Technical Committee, and the European Design and Automation Association (EDAA).

Blair Fort (S’06) received the B.A.Sc. degree in Engineering Science and the M.A.Sc. degree in electrical and computer engineering from the University of Toronto (U of T), Toronto, ON, Canada, in 2004 and 2006, respectively. He is currently pursuing the Ph.D. degree in electrical and computer engineering at the U of T. He has been a member of the technical staff at Altera Corporation's Toronto Technology Centre since 2006.

Jason Anderson (S’96–M’05) received the B.Sc. degree in computer engineering from the University of Manitoba, Winnipeg, MB, Canada, and the M.A.Sc. and Ph.D. degrees in electrical and computer engineering from the University of Toronto (U of T), Toronto, ON, Canada. He joined the Field-Programmable Gate Array (FPGA) Implementation Tools Group, Xilinx, Inc., San Jose, CA, USA, in 1997, where he was involved in placement, routing, and synthesis. He is currently an Associate Professor with the Department of Electrical and Computer Engineering, U of T, and holds the Jeffrey Skoll Endowed Chair in Software Engineering. He has authored over 70 papers in refereed conference proceedings and journals, and holds 26 issued U.S. patents. His current research interests include computer-aided design, architecture, and circuits for FPGAs.

Andrew Canis (S’06) received the Ph.D. degree in computer engineering in 2015 from the University of Toronto, Toronto, ON, Canada. He received the B.A.Sc. degree in computer engineering in 2008 from the University of Waterloo, Waterloo, ON, Canada. During his studies, he held internships at Altera, Sun Microsystems Labs, and Oracle Labs, where he researched circuit optimization algorithms. Currently, he is a co-founder and the chief executive officer of LegUp Computing Inc. Dr. Canis received the Natural Sciences and Engineering Research Council of Canada Alexander Graham Bell Canada Graduate Scholarship for both his Ph.D. and M.A.Sc. studies. His research interests include high-level synthesis, reconfigurable computing, embedded system-on-chip design, and electronic design automation for FPGAs.

Koen Bertels is Professor and Head of the Computer Engineering Laboratory at Delft University of Technology. His research focuses on heterogeneous multicore computing, investigating topics ranging from compiler technology and runtime support to architecture. He recently started working on quantum computing as a principal investigator in the QuTech research center. He served as general and program chair for various conferences such as FPL, RAW, and ARC. He has co-authored more than 30 journal papers and more than 150 conference papers.

Yu Ting Chen (S’15) is a second-year M.A.Sc. student at the University of Toronto, Toronto, ON, Canada. She is currently working with the LEGUP team, and her research focuses on improving memory performance and reducing the memory bottleneck in LEGUP-generated hardware for parallel programs.

0278-0070 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.