Hardware Construction in Chisel

Hardware Construction in Chisel Huy Vo Electrical Engineering and Computer Sciences University of California at Berkeley Technical Report No. UCB/EECS-2013-98 http://www.eecs.berkeley.edu/Pubs/TechRpts/2013/EECS-2013-98.html May 17, 2013 Copyright © 2013, by the author(s). All rights reserved. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission. Contents 1 Introduction 3 1.1 Related Works . 3 1.2 Acknowledgments . 4 2 Chisel 5 2.1 Node . 5 2.2 Component . 6 2.3 Backend . 7 3 Basic Hardware Construction in Chisel 9 3.1 ListLookup . 9 3.2 Vec . 10 4 Advanced Hardware Construction in Chisel 12 4.1 ListLookup . 12 4.2 Vec . 13 5 Elaboration Time Hardware Construction 15 5.1 Elaboration Time Hardware Construction Interface . 15 5.2 When . 15 5.3 SRAM Backend . 16 5.4 Automatic Pipeline Synthesis . 17 6 Conclusion and Future Work 22 1 List of Figures 1 Chisel Node . 5 2 2-Input Mux . 6 3 Chisel Literal Examples . 6 4 Hierarchical 4-Input Mux . 7 5 Chisel Backend . 8 6 ListLookup as Chisel Node . 9 7 Vec as Chisel Node . 10 8 ListLookup as Scala code and basic Chisel nodes . 12 9 Vec as Scala code and basic Chisel nodes . 13 10 Elaboration Time Interface . 15 11 When Statement . 16 12 5-stage CPU Datapath . 17 13 Resolving hazards through interlocks . 19 14 Resolving hazards through speculation and bypassing . 20 2 1 Introduction Chisel [2] is designed to be a hardware construction language so any design in Chisel will always have a well defined mapping to low level hardware blocks. Chisel’s core programming features allow users to accurately describe hardware circuits and can be easily extended for higher level hardware designs. This paper categorizes Chisel hardware construction into three levels and discusses the merits and drawbacks of each level. We first introduce the core features of the Chisel hardware construction language in Chapter 2. All hardware design in Chisel boils down to writing a Scala program that builds up a directed graph of Nodes which are Scala objects that map to basic circuit elements. Chisel provides basic operators for constructing and connecting nodes into subgraphs. The subgraphs can be organized into Components which can be connected to other Components to form larger and more complex graphs. Chisel code can be mapped to many different backends. We currently support a C++ backend for high speed simulation and a Verilog backend to leverage existing ASIC tools. Chapter 3 presents the lowest level of hardware construction in Chisel. We consider this the lowest level since Chisel code is written at the granularity of Chisel’s most basic element, Nodes. At this hardware construction level, Chisel user code looks identical to the targeted backend code. This can be advantageous if the backend has syntax that the tools can use to synthesize efficient hardware. The downside to this is that syntax in one backend might not have a parallel in another backend. Moving up the abstraction ladder, Chapter 4 presents the next level of Chisel hardware construction in which the user writes code using Scala functions that generate hardware. Although the Chisel code might look similar to Chisel from Chapter 3, the resulting Chisel graph is much denser since this hardware construction level only uses the basic features of Chisel. Higher level hardware facilities are abstracted away into Scala functions that implement the same functionality using only basic Chisel Nodes. Although we no longer leverage backend tool support for special syntax, this approach has the advantage of decoupling Chisel code from the backends. The last level of hardware construction, Chapter 5, analyzes and transforms Chisel graphs to produce different Chisel graphs. This hardware construction level is similar to the hardware construction level of Chapter 4 in that users still write Scala programs to build hardware. The difference is that these functions are invoked during elaboration time to modify a graph instead of during runtime to build the graph. This hardware construction level is useful for circuit designs that are difficult or undesirable to build during runtime. 1.1 Related Works The most widely used hardware description languages (HDL), Verilog and VHDL, are actually ill-suited for designing efficient, high-performance hardware. These languages lack the powerful abstraction facilities that are readily available in modern programming languages. Without these facilities, hardware designers using these languages are relegated to writing undescriptive and non-reusable hardware modules which dramatically impedes the most important aspect of building efficient hardware: design space exploration. One way to circumvent this issue is to design hardware from within a specific domain. Esterel [3] is synchronous programming language that uses event-based statement to describe reactive systems. Bluespec [4] uses guarded atomic actions to describe state change. Spiral [5] generates hardware for 3 linear signal transformations from an input problem description. Although these languages allow designers to write more expressive hardware, they only work well when used for task that matches their model and will fall short when used for general purpose hardware construction. 1.2 Acknowledgments The Chisel HDL is an ongoing project. Jonathan Bachrach is the original author. I joined the project in 2011 where I worked with Jonathan and Brian Richards to port an existing FPU library from Verilog to Chisel. I continued working with Jonathan to implement Chisel’s type system. In 2012 I work with Yunsup Lee to port an existing vector machine to Chisel. During this time I developed Vec to better express the hardware in the vector machine. I eventually moved Vec into Chisel’s library. Since then Jonathan has added support for bidirectional wires to Vec and Andrew Waterman has added new functionality to Vec such as contains and forAll methods. Jonathan is responsible for the original implementation of Chisel’s when statement. Andrew implemented Chisel’s elaboration time interface. Yunsup, Andrew, and I have written various SRAM backends for the chips we built. The automatic pipeline synthesis tool is CS294-88 class project that I developed with Wenyu Tang. 4 2 Chisel Chisel is a domain-specific language embedded in the Scala programming language [7] so it is really just a library of Scala functions and data structures. Someone writing Chisel code is essentially writing a Scala program that uses the provided functions and data structures to construct hardware. This chapter introduces the core components of the Chisel hardware construction language; readers interested in a more a detailed specification of Chisel should consult the Chisel manual [1] which can be found on the Chisel website. 2.1 Node Hardware circuits are represented in Chisel as directed graphs of Nodes, which are the base class from which all circuit elements are derived. Figure 1a shows the Chisel class hierarchy and Figure 1b shows the API for the Node class. Types. Chisel has its own type system that is maintained independent of the Scala type system. Figure 1a shows the Chisel data types and their hierarchy (everything rooted at Data). Bits is used to represent raw collection of bits, Bool is used to represent Boolean literals, and Num is used to represent numbers. Note that Num has two subclasses, Fix and UFix which are used for signed and unsigned numbers respectively. Node abstract class Node { // user assigned name of Node Updateable Lit Op val name: String Data Mem Reg // incoming edges val input: ArrayBuffer[Node] Bits Bool Num // width inferance def inferWidth: Int Fix UFix } (a) Hierarchy (b) API Figure 1: Chisel Node Arithmetic and Logic. Arithmetic and logic operations are represented with Op nodes. The Chisel manual has an exhaustive list of supported operations. Op nodes are constructed whenever a user invokes the corresponding type node member function for the operation. For example, suppose we have two Bits nodes, a and b, and want to connect them to the inputs of an AND gate. The Chisel code that will do this is a & b which is actually syntactic sugar for a.&(b). The & function that Bits defines will instantiate an Op node to represent the & and put the two Bits nodes, a and b, into the Op node’s input. The example Chisel code in Figure 2 shows Bits nodes and Op nodes interacting to create a 2-input multiplexor built out of & and | gates. In this example, a, b, and sel are incoming Bits. Invoking a & sel will create a new Op and connect that node to a Bits since both inputs are Bits. The output of the circuit, c, will be the Bits connected to the | Op. Literals The Lit node works in conjunction with type Nodes to represent literals. Figure 3 shows example Chisel code for instantiating literals. True and false literals are represented as Bools 5 a Bits Op(&) Bits sel Bits Op(|) Bits c Op(&) Bits b Bits val c = a & sel | b & ~sel Figure 2: 2-Input Mux connected to Lit nodes 1 and 0 respectively. Arbitrary bit strings can be represented with Bits Nodes and numbers can be represented with Fix and UFix Nodes, all of which are connected to Lit Nodes with the appropriate value. Registers The Reg node, Chisel most basic state element, represents a positive-edge-triggered flip-flop. Reg nodes, unlike the previous nodes, are manually instantiated by the user.

Hardware Construction in Chisel

PIPELINING and ASSOCIATED TIMING ISSUES Introduction: While

2.5 Classification of Parallel Computers

Microprocessor Architecture

Instruction Pipelining (1 of 7)

Review of Computer Architecture

CS152: Computer Systems Architecture Pipelining

Threading SIMD and MIMD in the Multicore Context the Ultrasparc T2

Pipelined Computations

Trends in Processor Architecture

Reverse Engineering X86 Processor Microcode

Chap. 9 Pipeline and Vector Processing

Chapter 4 Data-Level Parallelism in Vector, SIMD, and GPU Architectures