Design Principles for Packet Deparsers on Fpgas

Total Page:16

File Type:pdf, Size:1020Kb

Design Principles for Packet Deparsers on Fpgas Design Principles for Packet Deparsers on FPGAs Thomas Luinaud Jeferson Santiago da Silva [email protected] [email protected] Polytechnique Montréal Kaloom Inc. Montréal, Canada Montréal, Canada J.M. Pierre Langlois Yvon Savaria [email protected] [email protected] Polytechnique Montréal Polytechnique Montréal Montréal, Canada Montréal, Canada ABSTRACT KEYWORDS The P4 language has drastically changed the networking field as it Packet deparsers; Graph optimization; P4 language allows to quickly describe and implement new networking appli- ACM Reference Format: cations. Although a large variety of applications can be described Thomas Luinaud, Jeferson Santiago da Silva, J.M. Pierre Langlois, and Yvon with the P4 language, current programmable switch architectures Savaria. 2021. Design Principles for Packet Deparsers on FPGAs. In Proceed- impose significant constraints on P4 programs. To address this ings of the 2021 ACM/SIGDA International Symposium on Field Programmable shortcoming, FPGAs have been explored as potential targets for P4 Gate Arrays (FPGA ’21), February 28-March 2, 2021, Virtual Event, USA. ACM, applications. P4 applications are described using three abstractions: New York, NY, USA, 7 pages. https://doi.org/10.1145/3431920.3439303 a packet parser, match-action tables, and a packet deparser, which reassembles the output packet with the result of the match-action tables. While implementations of packet parsers and match-action 1 INTRODUCTION tables on FPGAs have been widely covered in the literature, no gen- The P4 Domain Specific Language (DSL) [6] has reshaped the net- eral design principles have been presented for the packet deparser. working domain as it allows describing custom packet forwarding Indeed, implementing a high-speed and efficient deparser on FPGAs applications with much flexibility. There has been a growing in- remains an open issue because it requires a large amount of inter- terest in using FPGAs to offload networking tasks. For instance, connections and the architecture must be tailored to a P4 program. Microsoft already deploy FPGAs in their data centers to implement As a result, in several works where a P4 application is implemented the data plane of Azure servers [8]. In-network computing is an- on FPGAs, the deparser consumes a significant proportion of chip other avenue where FPGAs have recently been considered [20]. resources. Hence, in this paper, we address this issue by presenting In addition, several recent works exploit FPGA reconfigurability design principles for efficient and high-speed deparsers on FPGAs. to create programmable data planes and implement P4 applica- As an artifact, we introduce a tool that generates an efficient vendor- tions [5, 13, 21]. agnostic deparser architecture from a P4 program. Our design has As presented in Figure 1, a P4 application comprises three ab- been validated and simulated with a cocotb-based framework. The stractions: the packet parser, the processing stage (match-action resulting architecture is implemented on Xilinx Ultrascale+ FPGAs tables), and the packet deparser (§2.2). While designs of efficient and supports a throughput of more than 200 Gbps while reducing packet parsers on FPGA have been widely explored [3, 5, 19], little resource usage by almost 10× compared to other solutions. effort has been dedicated to the implementation of efficient packet deparsers. First, to the best of our knowledge, only a single paper CCS CONCEPTS covers this topic [7]. However, Cabal et al. [7] report only the FPGA resource consumption for a 100 Gbps packet deparser, while the • Hardware ! Reconfigurable logic applications; • Computer design principles and microarchitectural details are not covered. arXiv:2103.07750v1 [cs.AR] 13 Mar 2021 systems organization ! Reconfigurable computing; High- Second, as Luinaud et al. [16] have observed, a packet deparser can level language architectures; • Networks ! Programming in- consume more than 80% of the resources needed to implement a terfaces. complete pipeline, which can jeopardize the ability of FPGAs to implement more complex P4 applications. This paper introduces an open-source solution to generate effi- cient and high-speed packet deparsers on FPGAs. It lays the foun- Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed dations for the design principles of packet deparsers on FPGAs. It for profit or commercial advantage and that copies bear this notice and the full citation comprises an architecture and a compiler to generate a deparser on the first page. Copyrights for components of this work owned by others than ACM from a P4 program. must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a The deparser compiler (§4) is described in Python and generates fee. Request permissions from [email protected]. synthesizable VHDL code for the proposed deparser architecture. FPGA ’21, February 28-March 2, 2021, Virtual Event, USA The generated architecture leverages the inherent configurability © 2021 Association for Computing Machinery. ACM ISBN 978-1-4503-8218-2/21/02...$15.00 of FPGAs to avoid hardware constructs that cannot be efficiently https://doi.org/10.1145/3431920.3439303 implemented on FPGAs, such as crossbars or barrel shifters [1, 25]. PHV_data PHV PHV_valid Pkt_in Pkt_out shifters Pkt_out PHV PHV Selector Payload Payload Parser shifters Deparser Processing Figure 2: Deparser architecture overview Packet Payload Because the sequence of emit statements determines the header Figure 1: Considered switch architecture emission order, and because the validity bit can be altered by previ- ous control blocks, the deparser must be able to insert or remove headers at runtime. The simulation environment is based on cocotb [11], which al- lows using several off-the-shelf Python packages, such as Scapy, 2.2 P4 Program Components to generate test cases. In addition, it is possible to connect the This paper considers the switch structure proposed by Benáček design under test with virtual network interfaces [12]. As a re- et al. [5], composed of three parts: a parser, a processing part and a sult, behavioural validation can be done using the P4 behavioural deparser, as presented in Figure 1. reference model [17]. The generated architecture has been evaluated for a variety of Parser. The parser takes as input a packet and generates a Packet packet headers. The evaluations show that the generated deparser Header Vector (PHV) and a packet payload. In our design, we as- supports more than 200 Gbps packet throughput while reducing sume that the PHV is composed of two parts : the PHV_data bus the resource usage by more than 10× compared to state-of-the-art containing header data and a bitmap vector, the PHV_valid bus, in- solutions (§5). dicating the validity bit of each header component. We also assume The contributions of this paper are as follows: that the packet payload is sent through a streaming bus with the • A deparser architecture that leverages FPGA configurabil- first byte at position 0. ity (§3); Processing. The processing part takes as input the PHV from • An open-source P4-to-VHDL packet deparser compiler (§4); the parser and outputs a modified PHV, which is forwarded tothe and, deparser. The operations on the PHV can either be header data • A simulation environment based on cocotb to simplify de- modifications or header validity bit alteration. parser verification (§5). Deparser. The deparser block takes as input the PHV from the processing part and the payload from the parser. It outputs the 2 PACKET PROCESSING packet to be sent on a streaming bus. This section introduces the P4 language and how P4 program com- ponents are organized to describe packet processing. 3 DEPARSER ARCHITECTURE PRINCIPLES In this section we cover the deparser architecture principles. First, 2.1 P4 language we introduce a deparser abstract machine. Second, we cover the P4 [6] is an imperative DSL used to describe custom packet pro- deparser I/O signals. Third, we present the microarchitecture of cessing on programmable data planes. the proposed deparser. All our design choices use the inherent configurability of the FPGAs and provide configurable blocks for 2.1.1 Overview of P4 programs. There are four components that the deparser compiler. structure a P4 program: header, parser, control, and switch.A header is a structure composed of fields of specific width anda 3.1 Deparser Abstract Machine validity bit. A struct of headers is used to define the set of headers that can be processed by a P4 program. The parser block expresses The deparser abstract machine is described in Figure 2. In this in which order and how to extract packet headers. The Control architecture, we assume that the PHV is buffered and arrives at the block describes the operations to perform on headers. deparser together with a PHV valid vector. Algorithm 1 presents the pseudo-code for the PHV shifters mod- 2.1.2 Control Operations. In a control block, multiple operation ule while Algorithm 2 illustrates the payload shifter. types can be performed to modify headers. Two specific opera- The main limiting factors to implement a deparser on FPGAs are tions are of interest for the deparser, setValid and setInvalid, the high number of interconnections and barrel shifters required which can be used to set a header validity bit to valid or invalid, for header insertion. To limit the use of these blocks, we construct respectively. a new packet based on the header and the payload. Thus, as the P4 In P4, control blocks also implement the deparsing logic. These deparsing logic can entirely be inferred at compile time and since blocks are composed of a series of emit statements. First, the order FPGAs are reconfigurable, we tailor the deparser architecture toa of these statements determines in which order headers are emitted.
Recommended publications
  • Federation Toolflow
    Federation: Open Source SoC Design Methodology Jack Koenig, Staff Engineer Aug 3, 2019 Hardware Trends: Compute Needs are Changing e.g., machine learning But, CPUs are not getting faster! Source: Medium, Entering the world of Machine Learning 2 Hardware Trends: Custom Hardware to the Rescue! Custom Chips But, custom chip development GPUs (e.g., Google TPU) costs are too high! 3 How did turn into a $1B acquisition with only 13 employees?* 4 * https://www.sigarch.org/open-source-hardware-stone-soups-and-not-stone-statues-please Why do silicon projects need experts from at least 14+ disciplines just to get started? 6 <Your Name Here> Tech Stack Tech Stack Readily-Available Reusable Technology from techstacks.io Federation Tools Infrastructure Cloud Services 7 We Should Copy “Innovations” from the Software Industry open-source standards abstraction and code reuse composability and APIs productivity tooling commodity infrastructure 8 npm Enables Modular Web Development with Reusable Packages 9 Source: https://go.tiny.cloud/blog/a-guide-to-npm-the-node-js-package-manager/ npm Enables Javascript as Most Popular Programming Language • 830.000 of packages • 10 million users • 30 billion packages downloads per month • 90% of Web built of NPM packages • NPM used everywhere: client, server, mobile, IoT • Open source libraries availability is the major driving force for Language adoption [Leo2013] proven by NPM • 70+ programming languages can be transpiled into JS, WebAssembly and published on NPM 10 Federation: the action of forming states or organizations into a single group with centralized control, within which smaller divisions have some degree of internal autonomy 11 Federation: a suite of open-source tools that SiFive is using to orchestrate modular SoC design workflows 12 Federation Tools Enable a Modular SoC Design Methodology Federation Tools IP Package 1 IP Package 2 IP Package 3 1.
    [Show full text]
  • Co-Simulation Between Cλash and Traditional Hdls
    MASTER THESIS CO-SIMULATION BETWEEN CλASH AND TRADITIONAL HDLS Author: John Verheij Faculty of Electrical Engineering, Mathematics and Computer Science (EEMCS) Computer Architecture for Embedded Systems (CAES) Exam committee: Dr. Ir. C.P.R. Baaij Dr. Ir. J. Kuper Dr. Ir. J.F. Broenink Ir. E. Molenkamp August 19, 2016 Abstract CλaSH is a functional hardware description language (HDL) developed at the CAES group of the University of Twente. CλaSH borrows both the syntax and semantics from the general-purpose functional programming language Haskell, meaning that circuit de- signers can define their circuits with regular Haskell syntax. CλaSH contains a compiler for compiling circuits to traditional hardware description languages, like VHDL, Verilog, and SystemVerilog. Currently, compiling to traditional HDLs is one-way, meaning that CλaSH has no simulation options with the traditional HDLs. Co-simulation could be used to simulate designs which are defined in multiple lan- guages. With co-simulation it should be possible to use CλaSH as a verification language (test-bench) for traditional HDLs. Furthermore, circuits defined in traditional HDLs, can be used and simulated within CλaSH. In this thesis, research is done on the co-simulation of CλaSH and traditional HDLs. Traditional hardware description languages are standardized and include an interface to communicate with foreign languages. This interface can be used to include foreign func- tions, or to make verification and co-simulation possible. Because CλaSH also has possibilities to communicate with foreign languages, through Haskell foreign function interface (FFI), it is possible to set up co-simulation. The Verilog Procedural Interface (VPI), as defined in the IEEE 1364 standard, is used to set-up the communication and to control a Verilog simulator.
    [Show full text]
  • Pyvsc: Systemverilog-Style Constraints and Coverage in Python
    PyVSC: SystemVerilog-Style Constraints, and Coverage in Python M. Ballance Milwaukie, OR, USA [email protected] Abstract—Constrained-randomization and functional Coverage is captured as a set of individual coverpoint coverage are key elements in the widely-used SystemVerilog-based instances, and is not organized into covergroup types and verification flow. The use of Python in functional verification is instances in the same way that SystemVerilog does. growing in popularity, but Python has historically lacked support for the constraint and coverage features provided by B. SystemC SystemVerilog. This paper describes PyVSC, a library that SystemC has been one of the longer-lasting languages used provides these features. for modeling and verifying hardware that is built as an embedded domain-specific language within a general-purpose Keywords—functional verification, Python, constrained programming language (C++). While libraries for constraints random, functional coverage and coverage targeting SystemC are not directly applicable in I. INTRODUCTION Python, it’s worthwhile considering approaches taken. Verification flows based on constrained-random stimulus The SystemC Verification Library (SCV) provides basic generation and functional coverage collection have become randomizations and constraints, using a Binary Decision dominant where SystemVerilog is used for the testbench Diagram (BDD) solver. BDD-based constraint solvers are environment. Consequently, verification engineers have known to have performance and capacity challenges, especially significant experience with the constructs and patterns provided with sizable constraint spaces. No support for capturing by SystemVerilog. functional coverage is provided. There is growing interest in making use of Python in The CRAVE [4] library sought to bring higher performance verification environments – either as a complete verification and capacity to randomization in SystemC using Satifiability environment or as an adjunct to a SystemVerilog verification Modulo Theories (SMT) constraint solvers.
    [Show full text]
  • PGAS Communication for Heterogeneous Clusters with Fpgas
    PGAS Communication for Heterogeneous Clusters with FPGAs by Varun Sharma A thesis submitted in conformity with the requirements for the degree of Master of Applied Science Graduate Department of Electrical and Computer Engineering University of Toronto © Copyright 2020 by Varun Sharma Abstract PGAS Communication for Heterogeneous Clusters with FPGAs Varun Sharma Master of Applied Science Graduate Department of Electrical and Computer Engineering University of Toronto 2020 This work presents a heterogeneous communication library for generic clusters of processors and FPGAs. This library, Shoal, supports the partitioned global address space (PGAS) memory model for applications. PGAS is a shared memory model for clusters that creates a distinction between local and remote memory accesses. Through Shoal and its common application programming interface for hardware and software, applications can be more freely migrated to the optimal platform and deployed onto dynamic cluster topologies. The library is tested using a thorough suite of microbenchmarks to establish latency and throughput performance. We also show an implementation of the Jacobi method that demonstrates the ease with which applications can be moved between platforms to yield faster run times. ii Acknowledgements It takes a village to raise a child and about as many people to raise a thesis as well. Foremost, I would like to thank my supervisor, Paul Chow, for all that he’s done for me over the course of the four years that I’ve known him thus far. Thank you for accepting me as your student for my undergraduate thesis and not immediately rejecting me when I asked to pursue a Masters under your supervision as well.
    [Show full text]
  • Lastlayer:Towards Hardware and Software Continuous Integration
    1 LastLayer:Towards Hardware and Software Continuous Integration Luis Vega, Jared Roesch, Joseph McMahan, Luis Ceze. Paul G. Allen School of Computer Science & Engineering, University of Washington F Abstract—This paper presents LastLayer, an open-source tool1 that enables hardware and software continuous integration and simulation. Python Compared to traditional testing approaches based on the register transfer Scala level (RTL) abstraction, LastLayer provides a mechanism for testing RTL input signals Python Verilog designs with any programming language that supports the C RTL DSLs hardware foreign function interface (CFFI). Furthermore, it supports a generic C Scala RTL input signals design Verilog ports interface that allows external programs convenient access to storage RTL DSLs RTL output signals hardware resources such as registers and memories in the design as well as control design VHDL ports over the hardware simulation. Moreover, LastLayer achieves this software Verilog RTL output signals hardware tests integration without requiring any hardware modification and automatically VHDL generates language bindings for these storage resources according to user specification. Using LastLayer, we evaluated two representative hardware tests Python (a) Traditional RTL simulationDPI integration examples: a hardware adder written in Verilog operating reg_a Java DPIDPI over NumPy arrays, and a ReLu vector-accelerator written in Chisel Python reg_amem_a processing tensors from PyTorch. Racket Java DPIDPI CFFI mem_a mem_b Index terms— hardware simulation, hardware language inter- HaskellRacket DPI CFFI DPI mem_b LastLayer reg_b operability, agile hardware design HaskellRust DPI LastLayer reg_b JavaScriptRust hardware design 1 INTRODUCTION softwareJavaScript tests hardware design software tests From RISC-V [9] processors to domain-specific accelerators, both industry and academia have produced an outstanding number of (b) LastLayer simulation open-source hardware projects.
    [Show full text]
  • Cocotb: a Python-Based Digital Logic Verification Framework
    Cocotb: a Python-based digital logic verification framework Ben Rosser University of Pennsylvania December 11, 2018 Ben Rosser (Penn) Cocotb for CERN Microelectronics December 11, 2018 1 / 41 Introduction Cocotb is a library for digital logic verification in Python. Coroutine cosimulation testbench. Provides Python interface to control standard RTL simulators (Cadence, Questa, VCS, etc.) Offers an alternative to using Verilog/SystemVerilog/VHDL framework for verification. At Penn, we chose to use cocotb instead of UVM for verification last summer. This talk will cover: What cocotb is, what its name means, and how it works. What are the potential advantages of cocotb over UVM or a traditional testbench? What has Penn’s experience been like using cocotb? Ben Rosser (Penn) Cocotb for CERN Microelectronics December 11, 2018 2 / 41 Introduction: Disclaimer Disclaimer: I am not a UVM expert! My background is in physics and computer science, not hardware or firmware design. During my first year at Penn, I spent some time trying to learn UVM, but we decided to try out cocotb soon afterward. I can talk about what made us want an alternative to UVM, but I cannot provide a perfect comparison between the two. Ben Rosser (Penn) Cocotb for CERN Microelectronics December 11, 2018 3 / 41 What is Cocotb? Ben Rosser (Penn) Cocotb for CERN Microelectronics December 11, 2018 4 / 41 Approaches to Verification Why not just use Verilog or VHDL for verification? Hardware description languages are great for designing hardware or firmware. But hardware design and verification are different problems. Using the same language for both might not be optimal.
    [Show full text]
  • Latest Updates
    The PoC-Library Documentation Release 1.2.0 Patrick Lehmann, Thomas B. Preusser, Martin Zabel Sep 11, 2018 Introduction I Introduction1 1 What is PoC? 5 1.1 What is the History of PoC?.....................................6 1.2 Which Tool Chains are supported?..................................6 1.3 Why should I use PoC?.......................................7 1.4 Who uses PoC?............................................7 2 Quick Start Guide 9 2.1 Requirements and Dependencies...................................9 2.2 Download............................................... 10 2.3 Configuring PoC on a Local System................................. 10 2.4 Integration.............................................. 10 2.5 Run a Simulation........................................... 12 2.6 Run a Synthesis........................................... 13 2.7 Updating............................................... 13 3 Get Involved 15 3.1 Report a Bug............................................. 15 3.2 Feature Request........................................... 15 3.3 Talk to us on Gitter.......................................... 15 3.4 Contributers License Agreement................................... 16 3.5 Contribute to PoC.......................................... 16 3.6 Give us Feedback........................................... 18 3.7 List of Contributers.......................................... 18 4 Apache License 2.0 19 4.1 1. Definitions............................................. 19 4.2 2. Grant of Copyright License....................................
    [Show full text]
  • Cocotb: a Python-Based Digital Logic Verification Framework
    Cocotb: a Python-based digital logic verification framework Ben Rosser University of Pennsylvania December 10, 2019 Ben Rosser (Penn) Cocotb Overview December 10, 2019 1 / 20 Introduction Cocotb is a library for digital logic verification in Python. Coroutine cosimulation testbench. Provides Python interface to control standard RTL simulators (Cadence, Questa, VCS, etc.) Offers an alternative to using Verilog/SystemVerilog/VHDL framework for verification. At Penn, we used cocotb to verify ITk Strips ASICs (HCCStar and AMAC). We found it to be a very powerful tool for verification! Would highly recommend using cocotb for other projects. This talk: part of a seminar I gave at CERN (https://indico.cern.ch/event/776422/): What cocotb is, what its name means, and how it works. What are the potential advantages of cocotb over UVM or a traditional testbench? What has Penn’s experience been like using cocotb? Ben Rosser (Penn) Cocotb Overview December 10, 2019 2 / 20 Approaches to Verification Why not just use Verilog or VHDL for verification? Hardware description languages are great for designing hardware or firmware. But hardware design and verification are different problems. Using the same language for both might not be optimal. Verification testbenches are software, not hardware. Higher level language concepts (like OOP) are useful when writing complex testbenches. Two possible ways to improve the situation: Add higher level programming features to a hardware description language. Use an existing general purpose language for verification. SystemVerilog is the first approach: simulation-only OOP language features. UVM (Universal Verification Methodology) libraries written in SystemVerilog. UVVM is a similar VHDL library; less familiar with it.
    [Show full text]
  • TFG-3030 DELGADO DELGADO, JOSÉ LUIS.Pdf
    Proyecto Fin de Carrera Grado de Ingeniería en Tecnologías Industriales Estudio sobre inyección de fallos en simulaciones Verilog/VHDL utilizando Cocotb y software libre Autor: Jose Luis Delgado Delgado Tutor: Hipólito Guzmán Miranda Equation Chapter 1 Section 1 Dpto. de Ingeniería Electrónica Escuela Técnica Superior de Ingeniería Universidad de Sevilla Sevilla, 20 20 Estudio sobre inyección de fallos en simulaciones Verilog/VHDL utilizando Cocotb y software libre 2 Estudio sobre inyección de fallos en simulaciones Verilog/VHDL utilizando Cocotb y software libre Proyecto Fin de Carrera Ingeniería Industrial Estudio sobre inyección de fallos en simulaciones Verilog/VHDL utilizando Cocotb y software libre Autor: Jose Luis Delgado Delgado Tutor: Hipólito Guzmán Miranda Profesor titular Dpto. de Ingeniería Electrónica Escuela Técnica Superior de Ingeniería Universidad de Sevilla Sevilla, 2020 3 Estudio sobre inyección de fallos en simulaciones Verilog/VHDL utilizando Cocotb y software libre 4 Estudio sobre inyección de fallos en simulaciones Verilog/VHDL utilizando Cocotb y software libre Proyecto Fin de Carrera: Estudio sobre inyección de fallos en simulaciones Verilog/VHDL utilizando Cocotb y software libre Autor: Jose Luis Delgado Delgado Tutor: Hipólito Guzmán Miranda El tribunal nombrado para juzgar el Proyecto arriba indicado, compuesto por los siguientes miembros: Presidente: Vocales: Secretario: Acuerdan otorgarle la calificación de: Sevilla, 2020 El Secretario del Tribunal 5 Estudio sobre inyección de fallos en simulaciones Verilog/VHDL
    [Show full text]
  • Verification of Readout Electronics in the ATLAS Itk Strips Detector
    Verification of Readout Electronics in the ATLAS ITk Strips Detector Benjamin John Rosser Department of Physics and Astronomy University of Pennsylvania Abstract: Particle physics detectors increasingly make use of custom FPGA firmware and application- specific integrated circuits (ASICs) for data readout and triggering. As these designs become more complex, it is important to ensure that they are simulated under realistic operating conditions before beginning fabrication. One tool to assist with the development of such designs is cocotb, an open source digital logic verification framework. Using cocotb, verification can be done at high level using the Python programming language, allowing sophisticated data flow simulations to be conducted and issues to be identified early in the design phase. Cocotb was used successfully in the development of a testbench for several custom ASICs for the ATLAS ITk Strips detector, which found and resolved many problems during the development of the chips. Talk presented at the 2019 Meeting of the Division of Particles and Fields of the American Physical Society (DPF2019), July 29{August 2, 2019, Northeastern University, Boston, C1907293. 1 Introduction Work is currently underway in preparation for the High Luminosity upgrade to the Large Hadron Collider (HL-LHC), scheduled to begin running in 2026. This upgrade project will increase the number of proton-proton collisions by a factor of 3.5. By the end of the HL-LHC's run in the late 2030s, an order of magnitude more data will have been collected. In order to run at this higher collision rate, the ATLAS detector [1] requires a number of upgrades to enable faster data processing and increased radiation tolerance.
    [Show full text]
  • Fuzzing Hardware Like Software
    Fuzzing Hardware Like Software Timothy Trippel, Kang G. Shin Alex Chernyakhovsky, Garret Kelly, Matthew Hicks Computer Science & Engineering Dominic Rizzo Computer Science University of Michigan OpenTitan Virginia Tech Ann Arbor, MI Google, LLC Blacksburg, VA ftrippel,[email protected] Cambridge, MA [email protected] fachernya,gdk,[email protected] Abstract—Hardware flaws are permanent and potent: hard- Software Hardware ware cannot be patched once fabricated, and any flaws may undermine even formally verified software executing on top. Custom Consequently, verification time dominates implementation time. Test Coverage The gold standard in hardware Design Verification (DV) is Generator concentrated at two extremes: random dynamic verification and Tracing TB DUT formal verification. Both techniques struggle to root out the PriorWork subtle flaws in complex hardware that often manifest as security vulnerabilities. The root problem with random verification is its undirected nature, making it inefficient, while formal verification Software SW is constrained by the state-space explosion problem, making it Fuzzer à infeasible to apply to complex designs. What is needed is a HW HW Generic TB solution that is directed, yet under-constrained. Fuzzing HW DUT Model DUT Instead of making incremental improvements to existing hard- ware verification approaches, we leverage the observation that =Inject Coverage Tracing Instrumentation existing software fuzzers already provide such a solution; we adapt it for hardware verification, thus leveraging existing—more Fig. 1. Fuzzing Hardware Like Software. Unlike prior Coverage Directed Test Generation (CDG) techniques [15]–[18], we advocate for fuzzing soft- advanced—software verification tools. Specifically, we translate ware models of hardware directly, with a generic harness (testbench) and RTL hardware to a software model and fuzz that model.
    [Show full text]
  • Fault: a Python Embedded Domain-Specific Language For
    tifact r Comp * * let fault: A Python Embedded Domain-Specific nt e * A A te is W E s e n C l * l o D C o * V * c u e A m s E Language For Metaprogramming Portable u e e C n R t v e o d t * y * s E a a l d u e a Hardware Verification Components t Lenny Truong, Steven Herbst, Rajsekhar Setaluri, Makai Mann, Ross Daly, Keyi Zhang, Caleb Donovick, Daniel Stanley, Mark Horowitz, Clark Barrett, and Pat Hanrahan Stanford University, Stanford CA 94305, USA Abstract. While hardware generators have drastically improved design productivity, they have introduced new challenges for the task of veri- fication. To effectively cover the functionality of a sophisticated gener- ator, verification engineers require tools that provide the flexibility of metaprogramming. However, flexibility alone is not enough; components must also be portable in order to encourage the proliferation of verifica- tion libraries as well as enable new methodologies. This paper introduces fault, a Python embedded hardware verification language that aims to empower design teams to realize the full potential of generators. 1 Introduction The new golden age of computer architecture relies on advances in the design and implementation of computer-aided design (CAD) tools that enhance produc- tivity [11, 21]. While hardware generators have become much more powerful in recent years, the capabilities of verification tools have not improved at the same pace [12]. This paper introduces fault,1 a domain-specific language (DSL) that aims to enable the construction of flexible and portable verification components, thus helping to realize the full potential of hardware generators.
    [Show full text]