C/VHDL Codesign for LHCb VELO Zero Suppression Algorithms
Manfred Muecke
European Organization for Nuclear Research CERN, CH-1211 Geneva 23, Switzerland
LHCb-PROC-2005-008, 04/06/2005

Abstract: We present a workflow to generate cycle-accurate C and VHDL code from one common description to accelerate and unify algorithm implementation and simulation for real-time DSP applications in particle detectors. We use Confluence as the description language, which compiles into C code (for simulation) and VHDL code (for implementation on FPGAs). We demonstrate the improved portability and simulation speed while assuring bit- and cycle-accuracy. Our approach solves the problem of having to maintain two source codes (for simulation and implementation) in parallel and therefore minimizes workload and the danger of inconsistent models. This is crucial in an environment where the algorithm implementation is expected to cycle through several design iterations.

Index Terms: Hardware design languages, Design automation

I. INTRODUCTION

FPGAs have become a popular choice for digital signal (pre)processing in physics experiments, as they provide high-speed custom digital signal processing (DSP) while offering the ability to upgrade algorithms during the lifetime of the detector. Designs for FPGAs are typically described in VHDL or Verilog.

To evaluate a detector's overall performance, however, a simulation is usually set up showing the effects of all foreseen data processing on some given data. These data sets are usually large compared to typical test patterns in hardware design.

Here we encounter a mismatch between the abilities of design tools for FPGAs and the needs of large simulation packages integrating modules from a diverse community. The language of choice in high energy physics is typically C or C++, and the code is expected to be as portable as possible.

To integrate signal processing tasks performed on FPGAs and described in a hardware description language (HDL), a model of the FPGA's functionality written in a high-level language has to be provided to the simulation. Recoding the implemented solution in C/C++ is an often chosen solution. This, however, lays the ground for inconsistencies and increased cost of code maintenance.

Therefore we explore the possibility of deriving both VHDL code for FPGA implementation and C code for simulation from one common source description.

II. REQUIREMENTS

After an initial algorithm specification and coding phase, optimization often takes place at the RTL level to minimize resource usage and sharing within the FPGA. It is, however, expected that even after optimization a modification to the original algorithm might be suggested due to improved system or physics knowledge. The chosen language should support a high-level description of datapath-oriented designs yet provide the ability to control the generated hardware transparently, so that it can serve both foreseen tasks, i.e. specification mapping and RTL-level resource optimization.

As the algorithms have to be integrated in a carefully timed system, the ability to specify the latency as a design parameter is mandatory.

Based on the observations described above, we have defined the main requirements for a potential tool:
1) abstract, DSP-oriented description capabilities
2) transparent hardware generation at RTL level
3) output of synthesizable VHDL
4) possibility to specify latency
5) output of a fast-executing model which can be easily integrated into other modules written in C/C++.

Generation of a fast-executing C model from a VHDL description would satisfy requirements 2-5. To the best of our knowledge, a tool allowing the generation of independent C code from a VHDL description does not exist.

Another possibility to satisfy requirements 2-4 would be to interface to the VHDL simulator through its PLI (Programming Language Interface). This, however, usually still requires a license to be available and limits portability. Requirement 5 cannot be met with VHDL cosimulation, as the simulator's execution speed is typically low. This is mainly due to the extensive modeling abilities of VHDL: simulators have to sacrifice simulation speed for full VHDL compliance.

III. RELATED WORK

Several solutions exist for generating RTL models from C code.

SystemC [1] is a modeling language based on C++, which it extends with additional libraries to provide constructs for RTL and behavioral modeling. The main differences between SystemC and Confluence are their language paradigm (imperative vs. functional) and the fact that Confluence derives all target models from one common netlist description.

Mentor Graphics CatapultC [2] generates VHDL from pure C++, aiming especially at dataflow-oriented applications. Celoxica offers its Handel-C language for the same purpose.

There are several Simulink blocksets available allowing the generation of HDL code from a graphical system representation. This approach is inherently limited to the respective field of application foreseen during creation of the blockset.

IV. CONFLUENCE

Confluence [3] is a functional language aimed at the design of synchronous systems. Systems are described at register transfer level (RTL), yet the description is more compact and generic than VHDL. This is possible because the restriction to synchronous systems allows implied clock, enable and reset domains, while the abilities of a functional language allow for very generic programming.

The Confluence tool chain allows the generation of bit- and cycle-accurate VHDL, Verilog and C models. Looking at the Confluence tool flow in more detail, one can separate two steps:
1) The Confluence description is elaborated, recursive constructs are resolved, and the result is written into an RTL netlist.
2) The netlist writer parses the generated netlist and writes a model implementing the respective functionality in the chosen target language (C, VHDL, Verilog, JHDL, NuSMV).

The Confluence compiler is open source and freely available for academic use, allowing a proper code inspection of the compiler and giving a certain independence from further developments.

The generated VHDL models have been observed to be synthesizable without further modifications.

V. MEASUREMENTS

The VELO subdetector [4] of the LHCb experiment applies a common mode subtraction on sets of digitized data samples. We have implemented a linear common mode algorithm [1] in Confluence using a linear fit to compute the mean, slope and deviation over a set of 32 10-bit samples. From the Confluence description we have generated both C and VHDL code.

The generated VHDL code together with a small testbench has been simulated using ModelTech ModelSim, and the execution time of the simulation for 1,000,000 cycles has been measured.

The generated C code has been embedded in a small test program to provide test vectors. The program has been compiled using gcc and simulated for 1,000,000 cycles while measuring its execution time.

A slightly different yet functionally equivalent hand-coded implementation of the algorithm has also been simulated and the execution time of the simulation has been measured.

All three models have been fed with an identical data set and the output has been compared to guarantee functional identity.

Both VHDL models have been synthesized using Altera's Quartus II 5.0 software. All results are reported in Table 1.

The difference in used FPGA resources stems mainly from the slightly differing algorithm implementations. The excessive use of memory bits in the hand-coded version is due to a square operation which has been implemented as a look-up table, whereas in the Confluence implementation a multiplication operator has been used, causing the increased use of DSP blocks. This is also expected to be the reason for the differing VHDL simulation times.

The execution time for the C code is below one second, showing the potential speedup compared to VHDL simulation. The paramount advantage of the generated C code is, however, its portability, as it no longer depends on a VHDL simulator and its inherent platform and license requirements.

Table 1: Measurement Results

LCMS                           Hand-coded VHDL   Confluence C   Confluence VHDL
Resources                      235 LEs           -              329 LEs
(LE, memory bits)              7680 bits                        1088 bits
                               2 9bDSP                          6 9bDSP
VHDL simulation time           ~120 s            -              ~80 s
(ModelSim, 1,000,000 cycles,
2.6 GHz P4)
C runtime (gcc)                -                 <1 s           -

REFERENCES

[1] P. R. Panda, "SystemC - a modeling platform supporting multiple design abstractions," Proceedings of the 14th International Symposium on Systems Synthesis, pp. 75-80, 2001.
[2] Andres Takach, Bryan Bowyer, Thomas Bollaert, "C Based Hardware Design for Wireless Applications," Proceedings of the Conference on Design, Automation and Test in Europe, Vol. 3, pp. 124-129, 2005.
[3] Tom Hawkins, "Confluence Language Reference Manual," available at: http://www.confluent.org
[4] "LHCb VELO (Vertex Locator): Technical Design Report," available at: http://cdsweb.cern.ch/search.py?recid=504321
[5] Guido Haefeli, "Contribution to the development of the acquisition electronics for the LHCb experiment," PhD Thesis, EPFL Lausanne, 2004.
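
For illustration, the kind of computation described in Section V (a linear fit over a set of 32 10-bit samples yielding mean, slope and deviation, followed by subtraction of the fitted line) can be sketched in integer-only C. This is a minimal sketch under stated assumptions and is neither the C code generated by Confluence nor the hand-coded VHDL algorithm: the names `lcms`, `lcms_result` and `div_round`, the centred index t = 2i - 31, the single round-to-nearest division and the use of the sum of squared residuals as "deviation" are all choices made for this example.

```c
#include <assert.h>
#include <stdint.h>

#define N_SAMPLES 32      /* samples per set, as stated in Section V */
#define SUM_T2    10912   /* sum of (2i - 31)^2 for i = 0..31 (= 32 * 341) */

/* Hypothetical result record; the field names are not taken from the paper. */
typedef struct {
    int32_t sum;        /* S = sum of samples; mean = S / 32 */
    int32_t slope_num;  /* T = sum of t_i * x_i; slope per index = 2 * T / SUM_T2 */
    int64_t deviation;  /* sum of squared residuals after subtraction */
} lcms_result;

/* Integer division rounded to nearest, symmetric around zero. */
static int32_t div_round(int32_t num, int32_t den)
{
    assert(den > 0);
    return (num >= 0) ? (num + den / 2) / den : -((-num + den / 2) / den);
}

/* Fit a straight line through 32 samples, subtract it, and report the fit. */
lcms_result lcms(const uint16_t in[N_SAMPLES], int16_t out[N_SAMPLES])
{
    lcms_result r = {0, 0, 0};

    /* First pass: accumulate the two sums needed for the least-squares fit. */
    for (int i = 0; i < N_SAMPLES; i++) {
        int32_t t = 2 * i - (N_SAMPLES - 1);  /* centred index: -31, -29, ..., 31 */
        r.sum += in[i];
        r.slope_num += t * (int32_t)in[i];
    }

    /* Second pass: evaluate fit_i = S/32 + T*t_i/SUM_T2 over the common
       denominator SUM_T2, so that only one rounding step occurs per sample. */
    for (int i = 0; i < N_SAMPLES; i++) {
        int32_t t = 2 * i - (N_SAMPLES - 1);
        int32_t fit = div_round(r.sum * (SUM_T2 / N_SAMPLES) + r.slope_num * t,
                                SUM_T2);
        int32_t res = (int32_t)in[i] - fit;
        out[i] = (int16_t)res;
        r.deviation += (int64_t)res * res;
    }
    return r;
}
```

With 10-bit inputs all intermediate sums stay well within 32 bits, so the same arithmetic could be checked bit for bit against an RTL model. A data set that is an exact straight line should come back with all residuals and the deviation equal to zero, which makes a convenient first test vector.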