LHCb-PROC-2005-008 04/06/2005 source description. description. source implementation and C-code forsimulation from one common for FPGA VHDL-code both derive to possibility increased cost of codemaintenance.we Therefore explore the and ground for inconsistencies the however lays This solution. chosen often an is C/C++ in solution the implemented Recoding the simulation. to provided be to has level language inwritten ahigh- model functionality (HDL), of the a FPGA's FPGAs and described hardware in a description language C++ and the codeisto beas expected portable as possible. language of choice in high energy physics is typicallyorC packages frommodules integrating community. a diverse The design tools for FPGAs the and needs of large simulation design. hardware in typical testpatterns usually to large compared data processinggiven some on data. sets These data are foreseen all of effects the showing up set usually is simulation VHDL . or of the detector. Designs for FPGAs aretypically described in algorithms offering during upgrade life-time ability the the to high-speed custom digitalsignal processingwhile (DSP) F iterations. design several through cycle to expected is implementation algorithm where environment an in crucial is This models. of inconsistent danger the and workload minimizes therefore codessimulation (for and implementation) inparallel and source two maintain to having of problem the solves approach Our bit- and cycle-accuracy. assuring while speed simulation ondemonstrate FPGAs). We the improved portability and VHDL into (for and (for code implementation C-code simulation) DSP applicationsparticle in detectors. real-time for simulation and algorithm implementation unify and accelerate to description common one from VHDL code and To integrateTo signal processingtasks being performed on mismatch of we abilities the between a Here encounter evaluateTo adetector's performance overall however, a Index Terms compiles which language description as Confluence use We Abstract (pre)processing in Physics experiments they as provide PGA —We present a workflow to generate cycle-accurate C cycle-accurate togenerate aworkflow —We present S have become apopular choice for digitalsignal —Hardware design languages, Design automation C/VHDL forLHCb Codesign VELO zero European OrganizationEuropean forRese Nuclear I.

I NTRODUCTION suppression algorithms

Manfred Muecke arch CERN,CH-1211 Geneva 23Switzerland definedmain the requirements for apotential tool: is mandatory. parameter asadesign latency specify the to system, ability the mapping and resource RTL-level optimization. assureuse forforeseen the i.e.specification both tasks to hardwaregenerated transparently control to the ability high-level description of datapath-oriented designsyet provide or physicsknowledge. chosen The language should support a original algorithmmight besuggested due to improved system modificationexpected, optimization that after a even tothe however is It FPGA. the within -sharing and resource-usage minimize to RTL-level onthe place takes often optimization full VHDL compliance. compliance. full VHDL for speed sacrifice simulation have to Simulators of VHDL. abilities modeling extensive the to mainly due is which low typically is speed execution simulator's the as cosimulation not 5 Requirement portability. met can withbe VHDL the limits and available being alicense still requires (ProgrammingLanguage This Interface). however usually PLI its through simulator VHDL the interface to Another possibility to satisfyrequirements 2-4would beto code from aVHDL description does not exist. C- independent of generation the allowing tool a knowledge descriptionwould satisfy requirements To 2-5. our best VHDL a from C-model executing fast a of Generation Based on the observations described above,we have be to have algorithms the As phase and coding specification algorithm After initial an 5) 4) 3) 2) 1)

integrated intomodules other written C/C++. in easily be can which model executing fast of output specify latency to possibility VHDL, synthesizable of output levelgenerationhardware atRTL transparent capabilities description DSP-oriented abstract, II.

R EQUIREMENTS integrated in a carefully timed in acarefullyintegrated timed

measured. III. RELATED WORK Several solutions exist for generating RTL-models from C- The generated C-code has been embedded in a small test code. program to provide testvectors. The program has been SystemC [1] is a modeling language based on C++ which it compiled using gcc, been simulated for 1,000,000 cycles while extends with additional libraries to provide constructs for RTL measuring its execution time. and behavioral modeling. The main differences between A slightly different yet functional equivalent hand coded SystemC and Confluence are their language paradigm implementation of the algorithm has also been simulated and (imperative vs. functional) and the fact that Confluence the execution time of the simulation has been measured. derives all target models from one common netlist description. All three models have been fed with an identical data set and the output has been compared to guarantee functional identity. MentorGraphics CatapultC [2] generates VHDL from pure Both VHDL models have been synthesized using 's C++ especially aiming at dataflow oriented applications. Quartus II 5.0 software. All results are reported in Table 1. Celoxica offers its Handel-C language for the same purpose. The difference in used FPGA resources stems mainly from the slightly differing algorithm implementations. The excessive There are several Simulink blocksets available allowing the use of memory bits in the handcoded version is due to a square generation of HDL code from a graphical system operation which has been implemented as a look-up table representation. This approach is inherently limited to the whereas in the Confluence implementation a multiplication respective field of application foreseen during creation of the operator has been used causing the increased use of DSP- blockset. blocks. This is also expected to be the reason for the differing VHDL simulation times. IV. CONFLUENCE The execution time for the C-code is below one second Confluence [3] is a functional language aiming at the design showing the potential speedup compared to simulation. The of synchronous systems. Systems are described at register paramount advantage of the generated C-code is however its transfer level (RTL), yet the description is more compact and portability as it does no longer depend on a VHDL simulator generic than VHDL. This is possible due to the restriction on and its inherent platform and license requirements. synchronous systems allowing implied clock-, enable- and reset-domains while the abilities of a functional language allow for very generic programming. Table 1: Measurement Results The Confluence tool chain allows the generation of bit- and cycle-accurate VHDL, Verilog and C models. Looking at the Hand Confluence tool flow in more detail, one can separate two steps: LCMS coded 1) The Confluence description is elaborated, recursive VHDL C VHDL constucts are resolved and the result is written into a 235 LEs, 329 LEs, Resources 7680 bits, 1088 bits, RTL netlist. (LE, memory bits) - 2) The netlist writer parses the generated netlist and writes 2 9bDSP 6 9bDSP VHDL simulation 1 000 cycles a model implementing the respective functionality in time (ModelSim) P4 2,6GHz ~120s - ~80s the chosen target language (C, VHDL, Verilog, JHDl,

C runtime NuSMV). (gcc) - <1s -

The Confluence compiler is open source and freely available for academic use, allowing a proper code inspection of the compiler and giving a certain independence from further REFERENCES developments. The generated VHDL models have been observed to be [1] Panda, P.R., "SystemC - a modeling platform supporting multiple design abstractions," Proceedings of the 14th international symposium on synthesizeable without further modifications. Systems synthesis, pp.75-80, 2001 [2] Andres Takach, Bryan Bowyer, Thomas Bollaert, "C Based Hardware V. MEASUREMENTS Design for Wireless Applications," Proceedings of the conference on Design, Automation and Test in Europe, Vol.3, pp.124 - 129, 2005 The VELO subdetector [4] of the LHCb experiment applies [3] Tom Hawkins, “Confluence Language Reference Manual,” Available at: a common mode subtraction on sets of digitized data samples. http://www.confluent.org [4] "LHCb VELO (Vertex Locator): Technical Design report," Available at We have implemented a linear common mode algorithm [1] in http://cdsweb.cern.ch/search.py?recid=504321 Confluence using a linear fit to compute the mean, slope and [5] Guido Haefeli, "Contribution to the development of the acquisition deviation over a set of 32 10bit samples. From the Confluence electronics for the LHCb experiment," PhD Thesis, EPFL Lausanne, description we have generated both C- and VHDL-code. 2004 The generated VHDL-code together with a small testbench has been simulated using ModelTech ModelSim and the execution time of the simulation for 1,000,000 cycles has been