LegUp: Open-Source High-Level Synthesis Research Framework
by Andrew Christopher Canis

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy
Graduate Department of Electrical and Computer Engineering
University of Toronto

© Copyright 2015 by Andrew Christopher Canis

Abstract

The rate of increase in computing performance has been slowing due to the end of processor frequency scaling and diminishing returns from multiple cores. We believe the industry is heading towards heterogeneous computing, an accelerator era, where specialized hardware is harnessed for better power efficiency and compute performance. A natural platform for these accelerators is the field-programmable gate array (FPGA), an integrated circuit that can implement large custom digital circuits, including complete systems-on-chip. However, programming an FPGA can be an arduous undertaking even for experienced hardware engineers. We propose raising the abstraction level by allowing a designer to incrementally move their design from a processor to a set of hardware accelerators, each automatically synthesized from a software implementation. This dissertation describes LegUp, an open-source high-level synthesis (HLS) framework that enables this new design methodology. We further present novel improvements to the quality of the synthesized circuits when targeting FPGAs.

First, we present the LegUp high-level synthesis framework with an overview of our design flow. The software is unique among academic tools in offering broad support for the ANSI C language, in targeting a hybrid processor/accelerator architecture, and in being open-source. We also show that the quality of results produced by LegUp is competitive with a commercial HLS tool.
Next, we present an FPGA architecture-specific HLS resource sharing approach. Our technique multi-pumps high-speed DSP blocks on modern FPGAs by clocking them at twice the system clock frequency. We show that multi-pumping can reduce circuit area without impacting performance. Following this, we describe a novel loop pipeline scheduling algorithm. Our approach handles complex constraints by using a backtracking method to discover better scheduling possibilities. This scheduling algorithm improves throughput for complex loop pipelines compared to prior work and a commercial tool. Finally, we examine LegUp's target memory architecture and describe how to partition memory within the circuit hierarchy using information from compiler alias analysis. We also present a method to efficiently use the block RAMs present in modern FPGAs by grouping memories together. These techniques decrease memory usage and improve performance for our HLS-generated circuits.

Acknowledgements

Many people have been involved in the LegUp project, and I am immensely lucky and grateful to have worked with them over the years. This dissertation would not have been possible without my two incredible supervisors and mentors. I would like to thank my co-advisor, Jason Anderson, for his guidance and mentorship throughout my studies. Jason dedicated significant time to the LegUp project, spending many hours in meetings, recruiting students, organizing tutorials, and spreading the word about LegUp. I admire your work ethic and I have vastly improved my ability to write and conduct research by learning from your example. Thanks also to my co-advisor, Stephen Brown, for your high-level vision and candid advice, and for giving me the flexibility to follow my own research path. Thanks to the members of my committee, Vaughn Betz, Jianwen Zhu, and Andreas Koch, for their edits and feedback on this work.

I would like to thank all the other graduate students involved with the LegUp project.
I was lucky to work with such a smart team: Blair Fort, Ruo Long (Lanny) Lian, Nazanin Calagar, Li Liu, Marcel Gort, Bain Syrowik, Joy (Yu Ting) Chen, and Julie Hsiao. In particular, I want to thank Jongsok (James) Choi, with whom I spent many long nights debugging signal waveforms and improving LegUp. Thanks also to Mark Aldham for working on the initial version of LegUp and running power simulations. Thanks to all the LegUp summer undergraduate students: Victor Zhang, Ahmed Kammoona, Stefan Hadjis, Kevin Nam, Qijing (Jenny) Huang, Ryan Xi, Emily Miao, Yolanda Wang, Yvonne Zhang, William Cai, and Mathew Hall, who were all a joy to work with and pushed the LegUp project further. Thanks to all the other graduate students from Pratt 392, especially Mehmet Avci, Jason Luu, and Braiden Brousseau, for your many entertaining discussions over the years.

Thanks to Altera employees Tomasz Czajkowski and Deshanand Singh for their feedback and initial guidance on this research direction, and to Altera for funding the project. I would also like to thank Philippe Coussy, Daniel Gajski, and Jason Cong for organizing a fascinating tutorial that I attended at DAC in 2009, which influenced the work here. I am also grateful to CMC for providing us with ModelSim licenses. Special thanks to Kelly, Judith, and Darlene for their dependable administrative support. I also appreciated the inspiring entrepreneurship talks and dinners organized by Professor Jonathan Rose.

I am grateful to the Canadian government for their generous scholarships through the Natural Sciences and Engineering Research Council and the Ontario Graduate Scholarship. I thank the Rogers Family for their generous scholarships and for supporting the ECE faculty.

Thanks to my friends and roommates for all the fun outside of school over the past six years, especially Adam, Michael, Paul, Mark, and Alex. I am truly grateful for the loving support of my parents, Anne and Frank, and my brothers, Lloyd, Stephen, and Ian.
Thanks for believing in me, supporting my education, and teaching me to always try my best. Finally, thanks to Sabrina for all the love, support, and constant thoughtfulness!

Our grand business undoubtedly is, not to see what lies dimly at a distance, but to do what lies clearly at hand.
Thomas Carlyle

Contents

1 Introduction
  1.1 Research Motivation
  1.2 Research Contributions
  1.3 Organization
2 Background and Related Work
  2.1 Introduction
  2.2 Modern Computation Platforms
  2.3 High-Level Synthesis Flow
  2.4 C Compiler: Low-Level Virtual Machine (LLVM)
  2.5 Allocation
  2.6 Scheduling
    2.6.1 SDC Scheduling
    2.6.2 Extracting Parallelism
  2.7 Binding
  2.8 FPGA Architecture
3 LegUp: Open-Source High-Level Synthesis Research Framework
  3.1 Introduction
  3.2 Background
    3.2.1 Prior HLS Tools
    3.2.2 Application-Specific Instruction-Set Processors (ASIPs)
  3.3 LegUp Overview
    3.3.1 Design Methodology
    3.3.2 Target System Architecture
  3.4 LegUp Design and Implementation
    3.4.1 Hardware Modules
    3.4.2 Device Characterization
    3.4.3 Hardware Profiling
    3.4.4 Hybrid Processor/Accelerator System
    3.4.5 Language Support and Benchmarks
    3.4.6 Circuit Correctness
    3.4.7 Extensibility of LegUp to Other FPGA Devices
  3.5 Experimental Study
    3.5.1 Experimental Results
    3.5.2 Comparison to Current LegUp Release
  3.6 Research using LegUp
  3.7 Summary
4 Multi-Pumping for Resource Reduction in FPGA High-Level Synthesis
  4.1 Introduction
  4.2 Background
  4.3 Multi-Pumped Multiplier Units: Concept and Characterization
    4.3.1 Multi-Pumped Multiplier Characterization
    4.3.2 Multi-Pumping vs. Resource Sharing
  4.4 Multi-Pumping DSPs in High-Level Synthesis
    4.4.1 DSP Inference Prediction
  4.5 Experimental Study
  4.6 Summary
5 Modulo SDC Scheduling with Recurrence Minimization in HLS
  5.1 Introduction
  5.2 Preliminaries
    5.2.1 Related Work
    5.2.2 Background: Loop Pipeline Modulo Scheduling
    5.2.3 Background: Loop Pipeline Hardware Generation
  5.3 Motivation
    5.3.1 Greedy Modulo Scheduling Example
  5.4 Modulo SDC Scheduler
    5.4.1 Detailed Scheduling Example
    5.4.2 Complexity Analysis