Techniques for Enabling In-System Observation-based Debug of High-Level Synthesis Circuits on FPGAs

by

Jeffrey Goeders

BASc Computer Engineering, University of Toronto, 2010
MASc Computer Engineering, The University of British Columbia, 2012

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

Doctor of Philosophy

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Electrical and Computer Engineering)

The University of British Columbia
(Vancouver)

September 2016

© Jeffrey Goeders, 2016

Abstract

High-level synthesis (HLS) is a rapidly growing design methodology that allows designers to create digital circuits using a software-like specification language. HLS promises to increase the productivity of hardware designers in the face of steadily increasing circuit sizes, and broaden the availability of hardware acceleration, allowing software designers to reap the benefits of hardware implementation. One roadblock to HLS adoption is the lack of an in-system debugging infrastructure. Existing debug technologies are limited to software emulation and cannot be used to find bugs that only occur in the final operating environment. This dissertation investigates techniques for observing HLS circuits, allowing designers to debug the circuit in the context of the original source code, while it executes at-speed in the normal operating environment.

This dissertation comprises four major contributions toward this goal. First, we develop a debugging framework that provides users with a basic software-like debug experience, including single-stepping, breakpoints and variable inspection. This is accomplished by automatically inserting specialized debug instrumentation into the user’s circuit, allowing an external debugger to observe the circuit. Debugging at-speed is made possible by recording circuit execution in on-chip memories and retrieving the data for offline analysis.
The second contribution contains several techniques to optimize this data capture logic. Program analysis is performed to develop circuitry that is tailored to the user’s individual design, capturing a 127x longer execution trace than an embedded logic analyzer.

The third contribution presents debugging techniques for multithreaded HLS systems. We develop a technique to observe only user-selected points in the program, allowing the designer to sacrifice complete observability in order to observe specific points over a longer period of execution. We present an algorithm to allow hardware threads to share signal-tracing resources, increasing the captured execution trace by 4x for an eight-thread system.

The final contribution is a metric to measure observability in HLS circuits. We use the metric to explore trade-offs introduced by recent in-system debugging techniques, and show how different approaches affect the likelihood that variable values will be available to the user, and the duration of execution that can be captured.

Preface

The contributions presented in this thesis have been published in journals and conference proceedings [1, 2, 4, 6], as well as a workshop paper [3], and a book chapter [5].

Content from Chapter 3 was published as a conference paper [1] and a workshop paper [3]. Large portions of Chapter 4 were published in paper [2]. Papers [1] and [2] were combined and extended with the remaining content from Chapter 4 as the journal article in [6]. These contributions were prototyped in an open-source academic tool, which was described in the book chapter in [5]. Content from Chapter 5 was published as a conference paper [4]. Content from Chapter 6 was also published as a conference paper [7].

In all of these contributions, I was primarily responsible for conducting the research, prototyping techniques, and performing experiments. This was done under the guidance and direction of my advisor, Dr. Steve Wilton.
Dr. Wilton also provided editorial support for all manuscripts.

[1] Jeffrey Goeders and Steven J. E. Wilton. “Effective FPGA debug for high-level synthesis generated circuits”. In: International Conference on Field Programmable Logic and Applications. Sept. 2014, pp. 1–8.
[2] Jeffrey Goeders and Steven J. E. Wilton. “Using Dynamic Signal-Tracing to Debug Compiler-Optimized HLS Circuits on FPGAs”. In: International Symposium on Field-Programmable Custom Computing Machines. May 2015, pp. 127–134.
[3] Jeffrey Goeders and Steven J. E. Wilton. “Allowing Software Developers to Debug HLS Hardware”. In: International Workshop on FPGAs for Software Programmers. Aug. 2015.
[4] Jeffrey Goeders and Steven J. E. Wilton. “Using Round-Robin Tracepoints to Debug Multithreaded HLS Circuits on FPGAs”. In: International Conference on Field-Programmable Technology. Dec. 2015, pp. 40–47.
[5] Andrew Canis, Jongsok Choi, Blair Fort, Bain Syrowik, Ruo Long Lian, Yuting Chen, Hsuan Hsiao, Jeffrey Goeders, Stephen Brown, and Jason Anderson. “LegUp high-level synthesis”. In: Chapter in FPGAs for Software Engineers. Springer, 2016.
[6] Jeffrey Goeders and Steven J. E. Wilton. “Signal-Tracing Techniques for In-System FPGA Debugging of High-Level Synthesis Circuits”. In: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2016). To be published in 2016.
[7] Jeffrey Goeders and Steven J. E. Wilton. “Quantifying Observability for In-System Debug of High-Level Synthesis Circuits”. In: International Conference on Field Programmable Logic and Applications. Aug. 2016.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Glossary
Acknowledgments
1 Introduction
  1.1 Motivation
    1.1.1 The Increasing Demand for High-Level Synthesis
    1.1.2 The Need for In-System Debugging of HLS Circuits
  1.2 Challenges and Objectives
    1.2.1 Challenges of Source-Level, In-System Debugging
    1.2.2 Research Objectives
  1.3 Contributions
    1.3.1 A Source-Level, In-System HLS Debugging Framework
    1.3.2 Optimizing Data Capture
    1.3.3 Debugging Parallel Systems
    1.3.4 Quantifying Observability
  1.4 Organization
2 Background
  2.1 Current Approaches for Debugging HLS Circuits
    2.1.1 Debugging and Validation Scenarios
    2.1.2 Debugging by Software Emulation
    2.1.3 Debugging by Hardware Simulation
    2.1.4 In-System Hardware Debugging
    2.1.5 Filling a Need: Source-Level, In-System Debugging
  2.2 In-System Hardware Debug Techniques
    2.2.1 Scan-based Debugging
    2.2.2 Trace-based Debugging
  2.3 High-Level Synthesis
    2.3.1 History and Motivation
    2.3.2 Present-Day HLS Tools
    2.3.3 The HLS Flow
  2.4 Related Work
    2.4.1 In-System HLS Debugging
    2.4.2 Debugging Optimized Software
  2.5 Approaches and Assumptions In This Work
    2.5.1 Context with Related Work
    2.5.2 Fault Model
  2.6 Summary
3 The Debugging Framework
  3.1 The Debugging Framework
    3.1.1 Adding Debug to the HLS Flow
    3.1.2 Modes of Operation
    3.1.3 Observability-Based Debug
    3.1.4 Debugging Context
  3.2 Mapping Software to Hardware
    3.2.1 Scope of Mapping
    3.2.2 Control Flow
    3.2.3 Data Flow
    3.2.4 Required Modifications to the HLS Tool
    3.2.5 Properties of Benchmark Circuits
  3.3 Circuit Instrumentation
    3.3.1 Instrumentation Components
    3.3.2 Required Modifications to the HLS Tool
  3.4 The Debugger Application
    3.4.1 Gantt Chart
    3.4.2 Debug Modes
    3.4.3 Instruction-Level Parallelism
    3.4.4 IR Instructions
    3.4.5 Compiler Optimizations
  3.5 Summary
4 Optimizing Data Capture
  4.1 Baseline Architecture
  4.2 Split Trace Buffer Architecture
  4.3 Control Trace Buffer Optimizations
  4.4 Datapath Registers Trace Buffer Optimizations
    4.4.1 Dynamic Tracing Architecture
    4.4.2 Delay-Worst Signal-Trace Scheduling
    4.4.3 Delay-All Signal-Trace Scheduling
    4.4.4 Dual-Ported Memory Signal-Trace Scheduling
    4.4.5 Dynamic Tracing Results
  4.5 Memory-Update Trace Buffer Optimization
    4.5.1 Case 1 – Tracing a Single Memory Controller
    4.5.2 Case 2 – Multiple Memory Controllers, Combined Tracing with the Datapath Registers
    4.5.3 Case 3 – Substitute Memory Controller Signals with IR Registers
    4.5.4 Case 4 – When Possible, Use Memory Controller Signals to Deduce IR Signals
  4.6 Challenges of a Split Buffer Architecture
    4.6.1 Event Ordering