The Challenges of Hardware Synthesis from C-Like Languages

Stephen A. Edwards∗
Department of Computer Science
Columbia University, New York

∗ http://www.cs.columbia.edu/~sedwards
Edwards is supported by an NSF CAREER award, a grant from Intel Corporation, an award from the SRC, and from New York State’s NYSTAR program.

Abstract

The relentless increase in the complexity of integrated circuits we can fabricate imposes a continuing need for ways to describe complex hardware succinctly. Because of their ubiquity and flexibility, many have proposed to use the C and C++ languages as specification languages for digital hardware. Yet, tools based on this idea have seen little commercial interest. In this paper, I argue that C/C++ is a poor choice for specifying hardware for synthesis and suggest a set of criteria that the next successful hardware description language should have.

1 Introduction

Familiarity is the main reason C-like languages have been proposed for hardware synthesis. Synthesize hardware from C, proponents claim, and we will effectively turn every C programmer into a hardware designer. Another common motivation is hardware/software codesign: today’s systems are very often implemented as a mix of hardware and software, and it is often unclear at the outset which portions to implement in hardware and which can be software. Here, the claim is that using a single language for both simplifies the migration task.

In this paper, I argue that these claims are largely false and that C is a poor choice for specifying hardware. On the contrary, I claim that the semantics of C-like imperative languages are so distant from those of hardware that C-like thinking is actually detrimental to hardware design.

Executing a given piece of C code on a traditional sequential processor can be thought of as synthesizing hardware from C, but the techniques presented here instead strive for more customized implementations that exploit more parallelism, hardware’s main advantage. Unfortunately, as discussed below, the C language has no support for parallelism and as such either the synthesis tool is responsible for finding it (a very difficult task) or the designer is forced to modify the algorithm and insert explicit parallelism. Neither solution is particularly satisfactory and the latter deviates significantly from the objective of trivially turning C programmers into hardware designers.

The thesis of this paper is that giving C programmers tools is not enough to turn them into reasonable hardware designers. I show that efficient hardware is usually impossible to describe in an unmodified C-like language because the language inhibits the specification or automatic inference of adequate concurrency, timing, types, and communication. The most successful C-like languages, in fact, bear little syntactic or semantic resemblance to C, effectively forcing users to learn a new language anyway. As a result, techniques for synthesizing hardware from C either generate inefficient hardware or propose a language that merely adopts part of C syntax.

For space reasons, this paper is focused purely on the use of C-like languages for synthesis. I deliberately omit discussion of other important uses of a design language, such as validation and algorithm exploration. C-like languages are much more compelling for these tasks, and one in particular (SystemC) is now widely used, as are many ad hoc variants.

2 A Short History of C

Dennis Ritchie developed C in the early 1970s [18] based on experience with Ken Thompson’s B language, which had itself evolved from Martin Richards’ BCPL [17]. Ritchie describes all three as “close to the machine” in the sense that their abstractions are very similar to the data types and operations supplied by conventional processors.

A core principle in BCPL is its undifferentiated array-of-words memory model. Integers, pointers, and characters are all represented in a single word; the language is effectively typeless. This made perfect sense on the word-addressed machines BCPL was targeting at the time, but wasn’t acceptable for the byte-addressed PDP-11 on which C was first developed.

Ritchie modified BCPL’s “array of words” model to add the familiar character, integer, and floating-point types now supported by virtually every general-purpose processor. Ritchie considers C’s treatment of arrays to be characteristic of the language. Unlike other languages that have explicit array types, arrays in C are almost a side-effect of its pointer semantics. While this model leads to simple, very efficient implementations, Ritchie observes that the prevalence of pointers in C means that compilers must use careful dataflow techniques to avoid aliasing problems while applying optimizations.

Ritchie lists a number of infelicities in the language caused by historical accident. For example, the use of break to separate cases in switch statements arose because Ritchie copied an early version of BCPL; later versions used endcase. The precedence of bitwise-AND is lower than the equality operator because the logical-AND operator was added later.

Many aspects of C are greatly simplified from their BCPL counterparts because of limited memory on the PDP-11 (24K, of which 12 were devoted to the nascent Unix kernel). For example, BCPL allowed arbitrary control-flow statements to be embedded within expressions. This facility does not exist in C because limited memory demanded a one-pass compiler.

Thus, C has at least four defining characteristics: a set of types that correspond with what the processor directly manipulates, pointers instead of a first-class array type, a number of language constructs that are historical accidents, and many others that are due to memory restrictions.

These characteristics are troubling when synthesizing hardware from C. Variable-width integers are natural in hardware, yet C only supports four sizes, all larger than a byte. C’s memory model is a large, undifferentiated array of bytes, yet many small, varied memories are most effective in hardware. Finally, modern compilers can assume available memory is easily 10 000 times larger than what was available to Ritchie.

3 C-like Hardware Synthesis Languages

A variety of C-like hardware languages have been proposed since the late 1980s (Table 1). This section describes each in roughly chronological order. See also De Micheli [3].

Language               Comment
Cones [25]             Early, combinational only
HardwareC [12]         Behavioral synthesis-centric
Transmogrifier C [7]   Limited scope
SystemC [8]            Verilog in C++
Ocapi [19]             Algorithmic structural descriptions
C2Verilog [23]         Comprehensive; company defunct
Cyber [26]             Little made public (NEC)
Handel-C [2]           C with CSP (Celoxica)
SpecC [6]              Resolutely refinement-based
Bach C [10]            Untimed semantics (Sharp)
CASH [1]               Synthesizes asynchronous circuits

Table 1: The languages/compilers considered in this paper.

Stroud et al.’s Cones [25] was one of the earliest. From a strict subset of C, it synthesized single functions into combinational blocks. It could handle conditionals; loops, which it unrolled; and arrays treated as bit vectors.

Ku and De Micheli developed HardwareC [12] for input to their Olympus synthesis system [4]. It is a behavioral hardware language with a C-like syntax and has extensive support for hardware-like structure and hierarchy.

Galloway’s Transmogrifier C [7] is a fairly small C subset.

[...] system. Supplied classes provide mechanisms for specifying datapaths, finite-state machines, and similar constructs (Paško et al. [16] describe adding an extension for RAM interfaces). These data structures are then translated into languages such as Verilog and passed to conventional synthesis tools. Lipton et al.’s PDL++ system [14] takes a very similar approach.

The C2Verilog compiler developed at CompiLogic (later called C Level Design and, since November 2001, part of Synopsys) is one of the few that can claim broad support of ANSI C. It can translate pointers, recursion, dynamic memory allocation, and other thorny C constructs. Soderman and Panchul describe their compiler in a pair of 1998 papers [23, 24] and hold a broad patent covering C-to-Verilog-like translation [15] that describes their compiler in detail.

NEC’s Cyber system [26] accepts a C variant dubbed BDL. While details of the language are not publicly available, the available information suggests it is a somewhat ad hoc translation that probably puts restrictions on the language, insists its users write C in particular idioms, and probably requires a fair amount of coaxing from users.

Celoxica’s Handel-C [2] is a C variant that extends the language with constructs for parallel statements and OCCAM-like rendezvous communication. Handel-C’s timing model is uniquely simple: each assignment statement takes one cycle.

Gajski et al.’s SpecC language [6] is a superset of ANSI C augmented with many system- and hardware-modeling constructs, including ones for finite-state machines, concurrency, pipelining, and structure. The latest language reference manual [5] lists thirty-three new keywords. SpecC imposes a refinement methodology. As such, the whole language is not directly synthesizable, but through a series of both manual and automated rewrites, a SpecC description can be refined into one that can be synthesized.

Like Handel-C, Sharp’s Bach C [10] is an ANSI-C variant with explicit concurrency and rendezvous-style communication. However, Bach C only imposes sequencing rather than assigning a particular number of cycles to each operation. And although it supports arrays, it does not support pointers.

Budiu et al.’s CASH compiler [1] is unique among the C synthesizers because it generates asynchronous hardware. It accepts ANSI C, identifies instruction-level parallelism, and generates an asynchronous dataflow circuit.
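
As a rough illustration of the Cones-style treatment described in this section (a single function, loops unrolled, a value treated as a bit vector), the sketch below shows a C function with a compile-time loop bound next to its fully unrolled equivalent. This is plain ANSI-style C written for illustration only; it is not Cones's actual input dialect or output, and the names WIDTH, parity_loop, and parity_unrolled are hypothetical.

```c
#include <stdint.h>

/* Hypothetical fixed width. A bound known at compile time is what
 * makes complete unrolling possible. */
#define WIDTH 8

/* Loop form: XOR together the low WIDTH bits of v, treating v as a
 * bit vector. Returns 1 if an odd number of bits are set, else 0. */
unsigned parity_loop(uint8_t v)
{
    unsigned p = 0;
    int i;
    for (i = 0; i < WIDTH; i++)
        p ^= (v >> i) & 1u;
    return p;
}

/* Unrolled form: the loop above with every iteration written out.
 * With no loop and no state, the body is one expression, which a
 * synthesis tool could map to a chain of seven XOR gates. */
unsigned parity_unrolled(uint8_t v)
{
    return ((v >> 0) & 1u) ^ ((v >> 1) & 1u) ^
           ((v >> 2) & 1u) ^ ((v >> 3) & 1u) ^
           ((v >> 4) & 1u) ^ ((v >> 5) & 1u) ^
           ((v >> 6) & 1u) ^ ((v >> 7) & 1u);
}
```

The two functions compute the same value for every 8-bit input. The unrolled form only exists because the loop bound is a constant; a data-dependent bound could not be flattened into a purely combinational block this way.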
