
Extended Abstract: Effective simulation and debugging fora high-level hardware language using software compilers Clément Pit-Claudel, Thomas Bourgeat, Stella Lau, Arvind, Adam Chlipala MIT CSAIL 1 Motivation 2 Limitations of the State of the Art Hardware designs are expressed in a spectrum of languages Architects currently have three main options for simulation: ranging from low-level RTL (Verilog [15] or VHDL [8]) to high-level non-architectural simulation, hardware simula- sequential software languages with annotations for high- tion using an FPGA, and cycle-accurate simulation on a level hardware synthesis (Vivado HLS [17], Handel-C [6], CPU. High-level, non-architectural simulation [1, 13] is fast. Clash [12], etc.). Different points on this scale entail different However, the models do not cycle-accurately represent syn- trade-offs. Verilog offers limited programming abstractions thesizable designs and have an unpredictable cost model. and composability and thus is tedious to write and debug, but Simulating on an FPGA is both fast and cycle-accurate. How- it provides developers fine-grained control over the result- ever, the synthesis, placement, and routing flow to compile a ing circuits. HLS systems, i.e., the hardware design systems design to run on an FPGA is slow, and so iterating on a design that start from software languages to generate Verilog, offer is time-intensive. Furthermore, debugging on an FPGA is rich abstractions and excellent debugging and simulation challenging: even printf, the simplest tool used in debugging, facilities but poor control over generated circuits. This is not is unavailable. surprising: the sequential computation model of software Hence, we focus on cycle-accurate simulation on a CPU. languages is deeply at odds with the hardware computation Verilog is a common language for writing RTL and describ- models, which try to run all parts of a circuit in parallel all ing circuits cycle-accurately. However, Verilog (analogously the time. to assembly) is intrinsicially hard to debug. Furthermore, Rule-based languages, such as Bluespec [9], Kôika [2], Verilog is often auto-generated, which makes it even more and Kami [3], offer an interesting middle ground, with both inscrutable: translations are not designed for readability, and predictable performance and high-level, usable semantics. high-level abstractions are lost in translation. Programmers Rule-based designs describe the manipulation of (hardware) debugging Verilog mainly use printf and assertion debug- state elements using state-transforming atomic rules, which ging, or they examine low-level waveforms. This is primitive (appear to) execute sequentially. An RTL compiler for such in comparison to software debugging, which offers powerful a rule-based system introduces concurrency by translating tools like gdb [14], reverse debugging [4], and code profiling. rules into individual circuits that run in parallel preserving Tooling for RHDLs is also limited. Bluesim [10] (a one-rule-at-a-time semantics. Bluespec-level cycle-based simulator) generates models While significant effort has been dedicated to synthesizing for simulation after multiple passes of compilation, which high-quality hardware from Bluespec designs, comparatively leads to hard-to-read models. Furthermore, its performance little effort has been expended on simulation, debugging, is poor, and so it is common to first compile to RTL and and testing of rule-based designs: these tasks are typically then simulate the resulting circuit. This approach again performed at the generated-Verilog level. This is non-ideal suffers from lack of readability and high-level abstraction: in for two reasons: particular, there is limited support for exploring executions 1. generated circuits suffer from poor readability due to below the cycle granularity. compiler optimizations breaking high-level abstrac- In summary, with the state of the art, models generated tions, worsening the debugging experience, and for simulation are not simultaneously designed for cycle- 2. generated circuits are optimized for the hardware exe- accurate semantics, readability/debuggability, and simula- cution model where concurrency is free, rather than tion performance. the mostly sequential model of CPUs—this is ineffi- cient for simulation. Our work is motivated by the idea that architecture hob- 3 Key Insights byists and researchers should be able to create hardware Circuits compiled from high-level HDLs are optimized for designs easily: we want to enable architects to write in high- the concurrent execution model of hardware, rather than the level hardware-description languages without sacrificing sequential model of CPUs. The hardware model has a tradeoff any of (1) the ability to generate efficient, synthesizable cir- between area and critical path: often you get faster circuits cuits with predictable performance characteristics, (2) direct with a shorter critical path by doing potentially redundant visbility of high-level abstractions when debugging, and (3) work. Thus, a significant amount of code in Verilog is “dead” simulation performance (in fact, we can leverage high-level at any given point. Modern simulators have been unable to semantic properties to simulate faster). exploit this, and many signals are evaluated unnecessarily. Clément Pit-Claudel, Thomas Bourgeat, Stella Lau, Arvind, Adam Chlipala As an example, rule-based languages have an “early-exit” by both Kôika and a commercial Bluespec compiler from semantics, with the compiler dynamically tracking whether equivalent designs. As Kôika has not yet been used to design a rule “fails” at any point in its execution (e.g. by conflicting very large systems, our benchmarks are all small to medium- with another rule resulting in breakage of the one-rule-at-a- sized. While we do not believe that the results would be time semantics). A circuit does not have this luxury: after different, more experience would be needed to generalize a failure, the rest of the circuitry is still there, and it com- the claims to very large designs. putes all intermediate values of the rules. By specializing Debugging and design-exploration methodology. The our compilers for simulation, we can leverage this high-level case studies (Section 4.2) demonstrate novel and interesting information to “early-exit” to avoid simulating dead code. ways to apply software tools to hardware designs: code- One key insight is the following: simulating a high-level coverage tools operating at the source-line level provide a design and synthesizing hardware from a circuit description wealth of architectural data without adding a single hardware are fundamentally different activities that deserve funda- counter (getting, for example, branch misprediction counts mentally different compilers making fundamentally different for free); reverse debuggers and hardware watchpoints al- assumptions and compilation choices. Thus, we should sepa- low quick bug-finding without consulting waveforms; and rate the simulation and synthesis pipelines to allow compiler sub-cycle, step-by-step debugging of readable models lets specialization. Furthermore, by compiling to C++, we can one explore an unfamiliar design interactively, at a level leverage existing compilers and toolchains (e.g. gdb, gcov [7], matching the original Kôika semantics. Ultimately, lever- gprof [5], rr [11]) to get fast and debuggable software models aging existing software toolchains enables a whole range for free (as compared to compiling directly to machine code, of novel and interesting hardware debugging and design which fails to take advantage of existing compiler optimiza- exploration methodologies. tions and would be unusable for debugging). The result is an improved hardware-development work- Contributions. flow: architects write in an RHDL with convenient abstrac- 1. We show how using completely separate toolchains for tions, compile to a readable, cycle-accurate C++ model, debug software simulation and hardware synthesis leads to and profile with standard software tools, then synthesize to faster simulation and improved debugging experience. RTL. 2. We describe techniques to build fast software models of rule-based designs, using lightweight transactions. 4 Main Artifacts 3. We show that rule-based designs are amenable to We present a methodology for better simulation of RHDLs, heavy optimization through static analysis that ex- leading to a better hardware design experience, and an open- ploits the simplicity of the input language. source compiler to C++ for the Kôika RHDL (Cuttlesim). 4. We give concrete evidence of the value of this approach We evaluate the performance of Cuttlesim by benchmark- using simulation performance benchmarks and debug- ing embedded processor designs and DSP building blocks; ging and design-exploration case studies. see Table 1. We compare primarily against Verilator, an open- source, state-of-the-art Verilog simulator1. To evaluate the 6 Citation for Most Influential Paper hardware design and debugging process, we conduct a series Award of case studies (functional-correctness-debugging of a cache This paper presented the last piece of a new hardware- coherence protocol, functional validation of a design using development toolchain based on completely separating simu- randomized testing, performance-debugging of an embed- lation and synthesis pipelines, enabling the design of circuits ded processor core, and design exploration adding a branch with predictable performance, a debugging experience on
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages3 Page
-
File Size-