Downloads the Trace History
Total Page:16
File Type:pdf, Size:1020Kb
Accelerating In-System Debug of High-Level Synthesis Generated Circuits on Field-Programmable Gate Arrays using Incremental Compilation Techniques by Pavan Kumar Bussa B.Tech Electrical Engineering, Indian Institute of Technology Jodhpur, 2015 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF APPLIED SCIENCE in The Faculty of Graduate and Postdoctoral Studies (Electrical and Computer Engineering) THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver) August 2017 c Pavan Kumar Bussa, 2017 Abstract High-Level Synthesis (HLS) has emerged as a promising technology that allows designers to create a digital hardware circuit using a high-level lan- guage like C, allowing even software developers to obtain the benefits of hardware implementation. HLS will only be successful if it is accompanied by a suitable debug ecosystem. There are existing debugging methodologies based on software simulation, however, these are not suitable for finding bugs which occur only during the actual execution of the circuit. Recent efforts have presented in-system debug techniques which allow a designer to debug an implementation, running on a Field-Programmable Gate Ar- ray (FPGA) at its actual speed, in the context of the original source code. These techniques typically add instrumentation to store a history of all user variables in a design on-chip. To maximize the effectiveness of the limited on-chip memory and to simplify the debug instrumentation logic, it is de- sirable to store only selected user variables. Unfortunately, this may lead to multiple debug runs. In existing frameworks, changing the variables to be stored between runs changes the debug instrumentation circuitry. This requires a complete recompilation of the design before reprogramming it on an FPGA. In this thesis, we quantify the benefits of recording fewer variables and solve the problem of lengthy full compilations in each debug run using incre- mental compilation techniques present in the commercial FPGA CAD tools. We propose two promising debug flows that use this technology to reduce the debug turn-around time for an in-system debug framework. The first flow, in which the user circuit and instrumentation are co-optimized during compilation, gives the fastest debug clock speeds but suffers in user circuit performance once the debug instrumentation is removed. In the second flow, ii Abstract the optimization of the user circuit is sacrosanct. It is placed and routed first without having any constraints and the debug instrumentation is added later leading to the fastest user circuit clock speeds, but performance suffers slightly during debug. Using either flow, we achieve 40% reduction in debug turn-around times, on average. iii Lay Summary Designing modern digital electronic systems can be expensive and time con- suming. Ensuring a design does not contain errors is especially challenging. This task, known as debugging, is often hampered by the fact that there is limited visibility into the internal operation of a circuit. Recent work has proposed methods to enhance this visibility. These methodologies often involve running the circuit many times; each run requires significant setup (compilation) time in which the design is automatically instrumented with different observation circuitry. In this thesis, we show how this compilation time can be dramatically reduced. The key insight is that the vast major- ity of the circuit does not change between runs; we present techniques that allow the compilation tool to focus only on the parts of the circuit that do change. This leads to significantly faster debug, potentially lowering the cost of producing working digital systems. iv Preface This thesis is related to the PhD dissertation of Dr. Jeffrey Goeders, a recent graduate from UBC who is now an Assistant Professor at Brigham Young University, Utah. He helped me in setting up the framework, answered all my technical questions and pointed me to the right place to start. This work has been accepted as a poster paper titled "Accelerating In- System FPGA Debug of High-Level Synthesis Circuits using Incremental Compilation Techniques" at the International Conference on Field-Programmable Logic and Applications 2017 (FPL'17) and will be published in the confer- ence proceedings. I was primarily responsible for conducting the research, performing the experiments and summarizing the results. This was done under the guidance of my advisor Dr. Steve Wilton. Dr. Wilton and Dr. Goeders also provided editorial support for all of my submitted works. v Table of Contents Abstract ................................. ii Lay Summary .............................. iv Preface ..................................v Table of Contents ............................ vi List of Tables .............................. ix List of Figures ..............................x Acknowledgements ........................... xi 1 Introduction .............................1 1.1 Motivation . .1 1.2 High Level Synthesis . .2 1.3 HLS Debug . .3 1.3.1 Software-like Debug . .3 1.3.2 RTL Simulation . .4 1.3.3 In-system Debugging . .4 1.4 Selective Variable Tracing . .6 1.5 Incremental HLS Debug . .7 1.6 Contributions . .8 1.7 Thesis Outline . 10 2 Background ............................. 12 2.1 Overview . 12 vi Table of Contents 2.2 HLS Flow . 12 2.3 HLS Debugging Techniques . 15 2.3.1 In-system Debugging Approaches . 16 2.3.2 Source Level In-system Debugging Approaches . 19 2.4 Incremental debugging Approaches . 24 2.5 Summary . 32 3 The Debugging Framework .................... 34 3.1 Overview . 34 3.2 Framework . 34 3.2.1 Debug Instrumentation . 35 3.3 Trace Buffer Optimizations . 37 3.3.1 Control Flow Optimization . 38 3.3.2 Dynamic Tracing of Datapath Register Signals . 39 3.4 Debug Flow . 41 3.5 Selective Variable Tracing . 42 3.6 Summary . 43 4 Incremental Debug Flows ..................... 46 4.1 Overview . 46 4.2 Incremental Flows . 47 4.2.1 A Naive Incremental Flow . 47 4.2.2 Incremental Flow with Permanent Taps . 48 4.2.3 Incremental Flow with Permanent Taps and Late Bind- ing ............................ 49 4.3 Automated GUI . 55 4.4 Summary . 55 5 Results ................................ 57 5.1 Overview . 57 5.2 Methodology . 58 5.3 Impact on the Trace Window Size . 60 5.3.1 Variation in the Trace Window Size . 61 5.4 Impact on Debug Instrumentation Area . 62 vii Table of Contents 5.5 Impact on the Compile time . 67 5.6 Impact on the Frequency of User Circuit . 69 5.7 Summary . 75 6 Conclusions and Future Work ................. 76 6.1 Overview . 76 6.2 Summary . 76 6.3 Future Work . 78 Bibliography ............................... 80 Appendix ................................. 90 A A Guide to our GUI Framework ................ 91 viii List of Tables 5.1 Trace Window Size(For Flow 3) . 59 5.2 Trace Scheduler Area (For Flow 3) . 63 5.3 Total Compile Time for Flow 3 in Seconds . 65 5.4 Total Compile Time for Flow 4 in Seconds . 66 5.5 Area Breakdown (For Flow 3) . 69 5.6 Incremental Compile Overhead for Flow 3 in Seconds . 70 5.7 Frequency Results: Flow 3 vs Flow 1 . 73 5.8 Frequency Results: Flow 4 vs Flow 1 . 74 ix List of Figures 2.1 HLS Flow . 14 3.1 Debug Instrumentation . 35 3.2 Control Flow Optimization . 37 3.3 Dynamic Signal Tracing Optimizations . 44 3.4 Original Debug Flow . 45 4.1 A Naive Incremental Flow (Flow 2) . 51 4.2 Incremental Flow with Permanent Taps(Flow 3) . 52 4.3 Incremental Flow with Permanent Taps and Late Binding (Flow 4) . 53 4.4 Modified GUI to support Selective Variable Recording . 54 5.1 Trace Window Size Variation [ T racesubset ]........... 61 T racefull 5.2 Frequency Variation with the number of taps . 72 x Acknowledgements I would like to thank my advisor Dr. Steve Wilton for his constant motiva- tion and support throughout my master's journey. This research would not have been possible without his guidance. Special thanks to Jeff Goeders for helping me understand his work and providing the backbone of the framework for me to build upon. I would also like to thank other graduate students in Wilton's group for their suggestions and feedback especially during our weekly group meetings. My friends have played an important role in helping me maintain a good work life balance which was much needed at some times. Financial support for this work has been provided by the National Sci- ences and Engineering Research Council of Canada, as well as from Intel Corporation. Last but not the least, I would like to thank my parents for believing in me and providing everything I needed to reach this point. xi Chapter 1 Introduction 1.1 Motivation Recent years have seen a tremendous demand for faster computation as ap- plications become larger and more complex. However, the performance of a processor has not been increasing fast enough to meet such computation demands mainly because of the difficulty in reducing the transistor sizes and because of the increased power dissipation. This has triggered the com- munity to move towards parallel multi-core processing architectures where several processors/cores of same kind (Homogeneous computing) or different kinds (Heterogeneous computing) are used to gain performance or energy ef- ficiency. Heterogeneous computing has shown promising results in situations where only a specific part of the application is to be accelerated. One way to accelerate algorithms using these heterogeneous systems is using a cus- tomized application specific processor. This is different than a homogeneous system where a general purpose processor would have to be used irrespective of the computing workload. However, manufacturing and designing an Ap- plication Specific Integrated Circuit (ASIC) like a custom processor is very time consuming and also the associated non-recurring engineering costs are high. An alternative to using ASIC's is to use devices which are more flexible and easy to design while providing similar benefits to those of an ASIC. Field Programmable Gate Arrays (FPGAs) have emerged as reconfigurable devices with the capability of emulating any custom circuit, leading to per- formance gains over a wide range of applications.