Fast and Flexible CAD for FPGAs and Timing Analysis

by

Kevin Edward Murray

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy Graduate Department of Electrical and Computer Engineering University of Toronto

© Copyright 2020 by Kevin Edward Murray

Abstract

Fast and Flexible CAD for FPGAs and Timing Analysis

Kevin Edward Murray Doctor of Philosophy Graduate Department of Electrical and Computer Engineering University of Toronto 2020

Field Programmable Gate Arrays (FPGAs) are a widely used platform for hardware acceleration and digital systems design, which offer high performance and power efficiency while remaining re-programmable. However, a major challenge to using FPGAs is the complex Computer Aided Design (CAD) flow required to program them, which makes FPGAs difficult and time-consuming to use. This thesis investigates techniques to address these challenges.

A key component of the CAD flow is timing analysis, which is used to verify correctness and drive optimizations. Since the dominant form of timing analysis, Static Timing Analysis (STA), is time-consuming to perform, we investigate methods to efficiently perform STA in parallel. We then develop Extended Static Timing Analysis (ESTA), which performs a more detailed and less pessimistic form of timing analysis by accounting for the probability of signal transitions.

We next study a variety of enhancements to the FPGA physical design flow, including key optimization stages like packing, placement and routing. These techniques are combined in the open-source VTR 8 design flow, and significantly outperform previous methods.

We also develop advanced architecture modelling capabilities which allow VTR 8 to target a wider range of FPGA architectures. These enhancements give FPGA architects finer-grained control of the targeted FPGA architecture, enabling more in-depth study of key architectural features such as the routing network. Additionally, we show how these enhancements enable the VPR placement and routing tool within VTR to target commercial FPGAs as part of a fully open-source design flow.

Finally, we study how Reinforcement Learning (RL) can be used to learn new effective heuristics for use in CAD algorithms, and show how this can be used to improve the run-time/quality trade-off of an FPGA placement algorithm.

Acknowledgements

First, I would like to thank my supervisor, Vaughn Betz. He has provided numerous suggestions and very helpful feedback which has greatly improved this work. Beyond research, I am deeply appreciative of the time and effort he has spent mentoring me, and helping to develop my teaching, communication and management skills. I have learned about much beyond CAD and FPGAs under your guidance.

I would also like to thank many of my fellow lab mates and friends. Thank you for your engaging discussions which have been the catalysts for many ideas, and for your encouraging friendship. I’d particularly like to thank Mohamed Abdelfattah, Jeff Cassidy, Henry Wong, Sadegh Yazdanshenas, Ibrahim Ahmed, Oleg Petelin, Kosuke Tatsumura, Sameh Attia, Abed Yassine, Andrew Boutros, Fynn Schwiegelshohn, Mathew Hall, Linda Shen, Tanner Young-Schultz, Mustafa Abbas, Mohamed ElDafrawy and Mohamed ElGammal.

I am also grateful for having been able to work with a number of undergraduate summer students including: Johnson Zhong, Jasmine Wang, Eugene Sha, Yang Su, Tina Yang, Richard Ren, Andre Pereira, Jean Wu, Matthew Walker, Hanqing Zeng and Martin Zhang. I hope you learned as much from the experience as I have.

From Imperial College London, I would like to thank George Constantinides (who hosted me at Imperial for a semester), Andrea Suardi, and the whole Circuit and Systems group.

I am also fortunate to have collaborated with many people as part of the VTR project. In particular I would like to thank Jean-Philippe Legault, Kenneth Kent, Jason Luu, Tim Ansell, Keith Rothman, Dusty DeWeese, and Jeff Goeders for their feedback, brainstorming and help supporting and developing VTR. I would also like to thank the members of my thesis committee, Jason Anderson and Jonathan Rose, for their helpful discussions and feedback over the years, and Tom Spyrou, Chris Wysocki, and Paul Leventis for helpful insights into timing analysis.

During this work I have been fortunate to receive funding from the Natural Sciences and Engineering Research Council (NSERC), the Province of Ontario, and the University of Toronto. I am also thankful for a number of individuals who were influential in leading me to this path, including particularly Ian Rowe and Stuart Taylor.

Finally, I would like to thank my parents, whose constant love, support and encouragement have made this all possible.

Contents

1 Introduction 1 1.1 Motivation ...... 1 1.2 Contributions ...... 3 1.3 Organization ...... 4

2 Background 6 2.1 Field Programmable Gate Arrays ...... 6 2.1.1 FPGA Architecture ...... 6 2.1.2 CAD for FPGAs ...... 9 2.2 CAD Flows ...... 11 2.2.1 Impact of CAD Flow on Productivity ...... 11 2.3 Timing Analysis ...... 13 2.4 FPGA Packing & Placement ...... 16 2.4.1 Optimization Objectives ...... 16 2.4.2 Optimization Algorithms ...... 17 2.4.3 Handling Legality ...... 18 2.5 FPGA Routing ...... 18

I Timing Analysis 20

3 Tatum: Fast Parallel Timing Analysis 21 3.1 Introduction ...... 21 3.2 Background & Related Work ...... 22 3.2.1 STA Background ...... 22 3.2.2 Parallel Timing Analysis ...... 23 3.3 Baseline Implementation ...... 23 3.3.1 Memory Layout Optimizations ...... 24 3.4 Parallel STA on GPUs ...... 25 3.4.1 GPU Algorithm ...... 25 3.4.2 GPU Experimental Results ...... 26 3.5 Parallel STA on CPUs ...... 28 3.5.1 CPU Algorithm A: Levelized Locks ...... 28 3.5.2 CPU Algorithm B: Levelized No-Locks ...... 28

iv 3.5.3 CPU Algorithm C: Dynamic ...... 30 3.5.4 CPU Experimental Results ...... 32 3.6 Multi-Corner Timing Analysis ...... 33 3.6.1 Parallel ...... 33 3.6.2 SIMD ...... 34 3.6.3 Analysis within Fixed CAD Run-Time Budgets ...... 35 3.7 Multi-clock Timing Analysis ...... 35 3.8 Tatum ...... 36 3.8.1 Asymptotic Scalability ...... 36 3.8.2 Comparison to OpenTimer ...... 38 3.9 Evaluation in VPR ...... 39 3.10 Improving Optimization ...... 40 3.10.1 Algorithmic Enhancements ...... 42 3.10.2 Results ...... 42 3.11 Conclusion ...... 43 3.12 Future Work ...... 44

4 Extended Static Timing Analysis 45 4.1 Introduction ...... 45 4.2 Background & Related Work ...... 46 4.3 Extended Static Timing Analysis Formulation ...... 48 4.3.1 Transition Model ...... 48 4.3.2 Using #SAT to Calculate Probabilities ...... 49 4.3.3 Combining Activation Functions ...... 50 4.3.4 Conditioning Functions ...... 51 4.4 ESTA Implementation ...... 51 4.4.1 Calculating Timing Tags ...... 52 4.4.2 Input Filtering ...... 52 4.4.3 Tag Merging ...... 53 4.4.4 Run-Time/Accuracy Trade-Offs ...... 53 4.4.5 Enhanced Algorithm ...... 54 4.4.6 Computational Complexity ...... 54 4.4.7 Comparison with Exhaustive Simulation ...... 56 4.4.8 Per-Output & Max-Delay Modes ...... 56 4.5 Non-Uniform & Correlated Conditioning Functions ...... 56 4.5.1 Non-uniform Conditioning Functions ...... 57 4.5.2 Correlated Condition Functions ...... 59 4.6 Evaluation Methodology ...... 60 4.6.1 Monte-Carlo Simulation ...... 61 4.6.2 Metrics ...... 62 4.7 Results ...... 62 4.7.1 Verification ...... 62 4.7.2 Maximum Delay Estimation with False Paths ...... 63 4.7.3 Monte-Carlo Convergence ...... 63

v 4.7.4 ESTA Run-time/Accuracy Trade-Offs ...... 63 4.7.5 ESTA Non-Uniform Conditioning Functions ...... 65 4.7.6 ESTA and MC Comparison ...... 66 4.8 Conclusion ...... 69 4.9 Future Work ...... 70

II Flexible and Efficient FPGA CAD 72

5 VTR Introduction 73 5.1 Introduction ...... 73 5.2 Related Work ...... 75 5.2.1 FPGA CAD & Architecture Exploration Frameworks ...... 75 5.2.2 Research Using VTR ...... 75

6 VTR Design Flow 78 6.1 VTR Design Flow Overview ...... 78 6.2 Design Flow Enhancements ...... 79

7 Architecture Modeling Generality & Completeness in VTR 82 7.1 Architecture Modelling Enhancements ...... 82 7.1.1 General Device Grid ...... 82 7.1.2 Flexible Routing Architectures ...... 83 7.1.3 Area & Timing Modelling ...... 86 7.2 Architecture Enhancements Case Study: Stratix IV ...... 87 7.2.1 Grid Layout ...... 87 7.2.2 Routing Network ...... 87 7.2.3 Timing Model ...... 90 7.2.4 Architecture Enhancements Comparison ...... 90

8 Enhanced FPGA Packing & Placement 92 8.1 Packing Enhancements ...... 92 8.1.1 Where to Start? ...... 93 8.1.2 What to Pack Next? ...... 94 8.1.3 When is a Cluster Full? ...... 96 8.1.4 Adapting to Available Device Resources ...... 97 8.1.5 Packing QoR Evaluation ...... 98 8.2 Placement Enhancements ...... 99 8.2.1 Placement Macro Move Generator ...... 99 8.2.2 Compressed Move Grid ...... 100

9 AIR: A Fast but Lazy Timing-Driven FPGA Router 102 9.1 Introduction ...... 102 9.2 Algorithm Overview ...... 103 9.3 A Fast but Lazy Router ...... 105

vi 9.3.1 Incremental Routing ...... 105 9.3.2 High-Fanout Nets ...... 106 9.3.3 Block Output-pin Lockdown ...... 108 9.4 Improving Quality ...... 109 9.4.1 Router Lookahead ...... 109 9.4.2 Routing Resource Base Costs ...... 111 9.5 Adapting to Congestion ...... 112 9.5.1 Dynamic Router Bounding Boxes ...... 112 9.5.2 Congestion Mode ...... 113 9.6 Multi-convergence Routing ...... 114 9.7 Routing with Non-Configurable Switches ...... 115 9.8 Speeding-Up Minimum Channel Width Search ...... 117 9.8.1 Routing Failure Predictor ...... 117 9.8.2 Minimum Routable Channel Width Hints ...... 119 9.9 AIR Evaluation & Comparison ...... 119 9.9.1 Comparison with VPR Routers ...... 120 9.9.2 Comparison with CRoute and Quartus ...... 121

10 Other VTR Enhancements 123 10.1 Logic Optimization & Technology Mapping ...... 123 10.1.1 Safe Multi-Clock Optimization ...... 123 10.2 Timing Analysis ...... 124 10.2.1 Combined Setup & Hold Analysis ...... 125 10.2.2 Comparison with VPR Classic STA ...... 125 10.3 Software Engineering ...... 125 10.3.1 VPR ...... 126 10.3.2 Compiler Settings ...... 127 10.3.3 Regression Testing ...... 128 10.3.4 Error Messages & File Parsers ...... 129 10.3.5 Documentation ...... 129

11 VTR 8 Evaluation 131 11.1 VTR Benchmarks ...... 131 11.2 Titan23 Benchmarks ...... 132 11.2.1 Comparison with VPR 7 ...... 133 11.2.2 Comparison with Quartus 18.0 ...... 133 11.3 Conclusion ...... 135 11.4 Future Work ...... 135

III An Open-Source CAD Flow for Commercial FPGA Devices 139

12 Symbiflow & VPR 140 12.1 Introduction ...... 140 12.2 Related Work ...... 142

vii 12.3 Design Flow ...... 143 12.3.1 Specifying Architectures ...... 144 12.3.2 Enhancements to Target Commercial FPGAs ...... 145 12.3.3 Verification and Correctness ...... 147 12.4 Flow Quality ...... 147 12.5 Conclusion & Future Work ...... 149

IV Reinforcement Learning Enhanced CAD 150

13 Adaptive FPGA Placement Optimization via Reinforcement Learning 151 13.1 Introduction ...... 151 13.2 Reinforcement Learning ...... 152 13.3 Simulated Annealing Placement for FPGAs ...... 152 13.3.1 Placement Moves ...... 154 13.4 Reinforcement Learning Enhanced Move Generator ...... 154 13.4.1 Reward Formulation ...... 155 13.4.2 Estimating Action Values ...... 156 13.4.3 Action Selection: Exploration versus Exploitation ...... 158 13.5 Results ...... 159 13.6 Conclusion & Future Work ...... 161

14 Conclusion & Future Work 163 14.1 Parallel STA and Tatum ...... 164 14.2 Extended Static Timing Analysis ...... 164 14.3 VTR Design Flow & Architecture Modeling ...... 165 14.4 FPGA Packing & Placement ...... 166 14.5 FPGA Routing & AIR ...... 167 14.6 VTR Evaluation ...... 168 14.7 Open-Source FPGA CAD ...... 168 14.8 Reinforcement Learning Enhanced CAD ...... 169

Appendices 170

A Synthesizing Correlated ESTA Condition Functions 170 A.1 An ILP approach ...... 170 A.1.1 Specifying Probabilities and Correlations ...... 170 A.1.2 Relating Condition Functions to the Specification ...... 171 A.1.3 Condition Function Synthesis Formulation ...... 171 A.1.4 Implementing Condition Function Synthesis ...... 172 A.1.5 Example ...... 173 A.2 A Stochastic Synthesis Approach ...... 174 A.2.1 Covariance Matrices ...... 174 A.2.2 Decorrelation with Cholesky Decomposition ...... 175 A.2.3 Stochastic Computation ...... 177

viii A.2.4 Stochastic ...... 177 A.2.5 Combining Stochastic Synthesis & Cholesky Decomposition to Generate ESTA Condition Functions ...... 178 A.3 Conclusion ...... 180

Bibliography 181

List of Tables

3.1 Performance of Memory Optimizations ...... 25 3.2 Compute Platforms ...... 27 3.3 GPU Performance ...... 27 3.4 GPU Speed-up Comparison ...... 28 3.5 Speed-Up of CPU Parallel Algorithms ...... 33 3.6 Timing Graph Memory Usage ...... 33 3.7 VPR Multi-clock STA Run-Time During Placement ...... 40 3.8 VPR Placement Total Run-Time ...... 41 3.9 Impact of STA Frequency During Quench ...... 43

4.1 Path-Delay Distribution (uniform input transition probability)...... 47 4.2 Signal Transitions. Initial/Final correspond to signal values at the start/end of a clock cycle...... 49 4.3 AND Gate Transitions...... 49 4.4 Normalized Critical Path Delays with False Paths...... 62 4.5 Impact of convergence criteria on MC Max Delay run-time of the MCNC20 Benchmarks. 64 4.6 Relative ESTA per-output run-time for various conditioning functions (d = 100, m = 104). 67 4.7 Quality and Run-Time on the MCNC20 Benchmarks...... 69

7.1 Normalized Impact of 2x3 RAM block Internal Routing Connectivity on VTR benchmarks (> 10K primitives) ...... 85 7.2 Summary of Modelled Stratix IV Routing Architecture Characteristics ...... 89 7.3 Geometric Mean Normalized Impact of Architecture Variants on the Titan Benchmarks . 91

8.1 Cumulative Normalized Impact of Packing Changes on Titan benchmarks ...... 99 8.2 Normalized Impact of Packer High Fanout Threshold on Titan benchmarks ...... 99 8.3 Cummulative Normalized Impact of Placement Move Generation Changes on Titan bench- marks ...... 100

9.1 Normalized Impact of Incremental Routing on VTR benchmarks (> 10K Primitives) . . . 106 9.2 Normalized Impact of Incremental Routing on Titan benchmarks ...... 106 9.3 Normalized Impact of High Fanout Routing on Titan benchmarks ...... 108 9.4 Normalized Impact of OPIN Lockdown on Titan benchmarks ...... 108 9.5 Normalized Impact of Router Lookahead and Base Costs on Titan benchmarks ...... 110

x 9.6 Normalized Impact of Dynamic Router Bounding Boxes on VTR benchmarks (> 10K primitives) ...... 113 9.7 Normalized Impact of Congestion Mode Threshold on VTR benchmarks (> 10K)..... 113 9.8 Normalized Impact of Routing Predictor on VTR benchmarks (> 10K primitives) . . . . 118 9.9 Normalized Impact of Minimum Channel Width Hints on VTR benchmarks ...... 119 9.10 VPR Router Comparisons on Titan v1.1.0 Benchmarks ...... 120 9.11 AIR & Academic Router Comparison on Titan Benchmarks ...... 121 9.12 AIR & Quartus Comparison on Titan Benchmarks ...... 121 9.13 AIR Congestion Oblivious vs Congestion Free on Titan Benchmarks ...... 122

10.1 Normalized Impact of ABC Multiclock Flows on VTR benchmarks (> 10K primitives) . . 124 10.2 Normalized Impact of Setup & Hold STA Traversals While Routing Titan Benchmarks . . 125 10.3 Comparison of Classic & Tatum STA (Setup Only) on Titan Benchmarks ...... 125 10.4 Normalized Impact of Atom Netlist Data Structure Refactoring on VTR Benchmarks . . 126 10.5 Cumulative Impact of VPR Peak Memory Optimizations on neuron ...... 127 10.6 Normalized Impact of Advanced Compiler Optimizations on VPR 8.0 Run-Time (Titan Benchmarks) ...... 128 10.7 VTR Code Coverage ...... 129

11.1 Summary VTR Flagship Architecture Characteristics ...... 131 11.2 VTR 7 & VTR 8 Comparison Summary on the VTR Benchmarks ...... 132 11.3 VPR Quartus QoR Summary on Titan Benchmarks ...... 134 11.4 VTR 8.0: VTR Benchmarks (Relative to VTR 7) ...... 137 11.5 VPR 8.0: Titan v1.1.0 Benchmarks (Relative to VPR 7+) ...... 137 11.6 VPR 8.0 High Effort: Titan v1.3.0 (Relative to Quartus 18.0) ...... 138

12.1 VPR 8 Quality and Run-time targeting Stratix IV Model on Titan Benchmarks ...... 148 12.2 Symbiflow & VPR Quality and Run-time on Artix 7 (XC7A50TFGG484) ...... 148

13.1 Agent State and Action Spaces ...... 158

List of Figures

1.1 FPGA Size and Serial CPU Performance ...... 2

2.1 Basic Logic Element ...... 7 2.2 Logic Block ...... 7 2.3 Uniform FPGA ...... 8 2.4 Switch Block and Connection Block ...... 8 2.5 Heterogeneous FPGA ...... 9 2.6 FPGA CAD Flow ...... 11 2.7 Research FPGA CAD Flow ...... 12 2.8 Design Implementation CAD Flow ...... 13 2.9 Timing delays between two registers...... 14 2.10 Example circuit and timing graph with annotated delays. The path b → c → d → e is the critical timing path, with a delay of 3.0 units ...... 15

3.1 GPU calculation of level i arrival times...... 26 3.2 Multi-corner STA performance for openCV at P = 4. All results except ‘Naive’ perform a single set of traversals for all timing corners...... 34 3.3 Self-relative parallel speed-up on Tau’15...... 39 3.4 Tatum Performance Scaling over Classic VPR STA...... 41

4.1 Example circuit and timing graph with annotated delays...... 47 4.2 The path b → d → e is a false-path; the common sel means transitions on b can never propagate to e after the other paths have stabilized...... 48 4.3 AND gate ...... 49 4.4 ESTA models the glitch between t = 1 and t = 2 as a F transition with arrival time 1 + δ. 49 4.5 Input filtering example. The arrival time at c is 1 + δ, but STA or a naive ESTA implementation will report 2 + δ...... 53 4.6 Per-Output Mode: Path delay distributions calculated for each output...... 57 4.7 Max-Delay Mode: Bounding path delay distribution calculated over all output...... 57 4.8 Independent Condition Functions...... 57 4.9 Non-Independent Condition Functions...... 57

4.10 Example Truth Table and conditioning functions satisfying P (fR) = 0.25, P (fF ) = 0.125,

P (fH ) = 0.125 and P (fL) = 0.5. Note that the specific minterms assigned to each transition have no impact on the analysis results...... 58 4.11 Evaluation Flow...... 62

xii 4.12 MC max-delay run-time on apex2 for various convergence criteria...... 64 4.13 ESTA run-time/accuracy tradeoffs on the dsip benchmark in per-output mode...... 65 4.14 CDFs for the pksi 83 d output of the dsip benchmark for various conditioning func- tions. The ‘ESTA Uniform’ CDF corresponds to using the conditioning functions from Equation (4.5) for each primary input. Each ‘ESTA Non-Uniform’ CDF correspond to a different set of randomly generated non-uniform conditioning functions used at the primary inputs. ESTA was run with d = 1 and m = ∞...... 66 4.15 Maximum delay CDFs on the spla benchmark...... 67

6.1 VTR-based CAD Flow Variants ...... 79 6.2 Routing Resource Graph. Dotted arrows represent switch-block switches, while dashed arrows represent connection-block switches...... 80

7.1 General Device Grid Examples ...... 83 7.2 Blocks and Routing Channels ...... 84 7.3 Switch-block connectivity within and around a unit and 2x3 sized blocks...... 85 7.4 Circuit Structures which can be modelled with non-configurable switches...... 86 7.5 Dual-Port RAM Block Internal Timing Path Example ...... 87 7.6 Placement of the leon2 benchmark with the updated grid layout and block type annotations. 88 7.7 Three sided architecture with each pin able to access one horizontal track, and one vertical track...... 89 7.8 Direct connections between adjacent blocks shown in purple, including horizontal ‘direct- link’ connections (between all blocks) and vertical carry-chain connections (between LABs). Block colors are the same as Figure 7.6...... 90

8.1 Packing example with ζ =4...... 94 8.2 Unrelated Clustering Example ...... 96 8.3 des90 high packing density (unroutable) ...... 97 8.4 des90 moderate packing density (routable) ...... 97 8.5 Macro swap examples...... 100 8.6 Region Limit Examples ...... 101

9.1 Route tree pruning example...... 105 9.2 Routing resources explored (colored by cost to reach the current target sink) while routing a high-fanout net (∼ 3500 terminals) on the neuron benchmark circuit targeted to the Stratix IV model from Section 7.2. The target sink is located near the centre of the device.108 9.3 Stratix IV-like hierarchical routing architecture with short (L4) and long (L16) wires. Long wires are only accessible via short wires, and do not directly connect to block pins...... 110 9.4 Lookahead Delay Estimates ...... 110 9.5 Lookahead Congestion Estimates ...... 111 9.6 Comparison of routing congestion by resource type and router run-time for different base costs on the neuron benchmark circuit with the map lookahead...... 112 9.7 Dynamic bounding box example...... 113 9.8 QoR Impact of multiple routing convergence on the VTR Benchmarks (> 10K primitives) 114

xiii 9.9 Non-configurable routing example. Assumes unit congestion cost and delay for wires, unit delay for configurable switches, and zero delay for non-configurable switches...... 117 9.10 Routing Congestion Trend on LU32PEEng (targeting VTR’s flagship k6 frac N10 frac chain mem32K 40nm architecture) at routable and unroutable channel widths...... 118

12.1 Left, an FPGA consisting of Configurable Logic Blocks (CLBs) for computation and Block RAMs (BRAMs) for storage, along with programmable routing to interconnect them. Right, each CLB contains Look-Up Tables (LUTs) and Flip-Flops (FFs). Routing Muxes are configured to interconnect elements within each block, or inter-block routing wires. . 142 12.2 Design flow...... 143 12.3 FPGA Architecture Representation...... 144 12.4 Architecture and Implementation Flows...... 145 12.5 Architecture Model Generation...... 146

List of Algorithms

1 Baseline Serial STA Algorithm ...... 24 2 CPU Algorithm A: Levelized Locks ...... 29 3 Levelized walk with no locks ...... 29 4 Dynamic Walk ...... 31 5 Multi-Corner Node Traversal ...... 33 6 Multi-traversal Multi-clock Timing Analysis ...... 35 7 Single-traversal Multi-clock Timing Analysis ...... 36 8 Basic ESTA Node Traversal ...... 52 9 Enhanced ESTA Node Traversal ...... 55 10 Grouped Minterm Allocation ...... 59 11 VPR 8 Packing Algorithm ...... 93 12 AIR Netlist Routing ...... 104 13 AIR Connection Routing ...... 104 14 Map Lookahead Creation ...... 109 15 AIR Path Finding ...... 115 16 AIR Node Expansion ...... 116 17 Simulated Annealing Placement ...... 153

Preface

This thesis is based in part on the following peer-reviewed works published with co-authors:

Refereed Journals

• K. E. Murray, T. Ansell, K. Rothman, A. Comodi, M. A. Elgammal and V. Betz, “Symbiflow & VPR: An Open-Source Design Flow for Commercial and Novel FPGAs”, IEEE Micro, 2020. [1]

• K. E. Murray, O. Petelin, S. Zhong, J. M. Wang, M. ElDafrawy, J.-P. Legault, E. Sha, A. G. Graham, J. Wu, M. J. P. Walker, H. Zeng, P. Patros, J. Luu, K. B. Kent and V. Betz, “VTR 8: High Performance CAD and Customizable FPGA Architecture Modelling”, ACM Trans. Reconfig. Technol. Syst. (TRETS), 2020. [2]

• K. E. Murray, A. Suardi, V. Betz, G. Constantinides, “Calculated Risks: Quantifying Timing Error Probability with Extended Static Timing Analysis”, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2018. [3]

Refereed Conference/Workshop Proceedings

• K. E. Murray, S. Zhong and V. Betz, “AIR: A Fast but Lazy Timing-Driven FPGA Router”, IEEE/ACM Asia Pacific Design Automation Conference (ASP-DAC), 2020, 1-7. [4]

• K. E. Murray and V. Betz, “Adaptive FPGA Placement Optimization via Reinforcement Learning”, ACM/IEEE Workshop on Machine Learning for CAD (MLCAD), 2019, 1-6. [5]

• K. E. Murray and V. Betz, “Tatum: Parallel Timing Analysis for Faster Design Cycles and Improved Optimization”, IEEE Int. Conf. on Field-Programmable Technology (FPT), 2018, 1-8. [6]

• K. E. Murray, A. Suardi, V. Betz, G. Constantinides, “Quantifying Error: Extending Static Timing Analysis with Probabilistic Transitions”, Design, Automation & Test in Europe (DATE), 2017, pp. 1486-1491. [7]

Chapter 1

Introduction

1.1 Motivation

Field Programmable Gate Arrays (FPGAs) offer a compelling platform for designing and implementing digital systems, offering the potential for higher performance and/or power efficiency than other programmable platforms like CPUs and GPUs [8]. While not as efficient as Application Specific Integrated Circuits (ASICs) [9], FPGAs are re-programmable and can be re-configured to implement different applications in seconds. This allows them to be re-targeted to different applications and, compared to an ASIC (which may take years to design, manufacture and verify), quickly adjust to changing application requirements.

However, one of the major barriers to using FPGAs is the challenge of implementing an application on an FPGA. While this process is typically faster than for an ASIC, it is still time consuming. A large component of this time is spent within the automated FPGA Computer Aided Design (CAD) flow which figures out how to map a designer’s high-level intent to the actual FPGA hardware. This CAD flow must make many detailed optimization decisions to produce a good implementation, and can take hours to days to complete for large designs [10]. Since multiple design iterations are typically required to converge to a working implementation, long CAD flow run-times significantly degrade designer productivity. Furthermore, as shown in Figure 1.1, FPGA device sizes continue to increase (30× more logic elements over the past decade [11, 12]), while serial CPU performance growth has slowed (only increasing 3.5× over the same period [13, 14]) – further challenging the CAD flow’s scalability. It is therefore important to improve the efficiency, scalability and robustness of the FPGA CAD flow.

Figure 1.1: FPGA Size and Serial CPU Performance (SPECInt).

FPGA architects face a wide range of options when designing a new FPGA architecture, and many of the trade-offs they face evolve over time. New application domains emerge, such as machine learning (e.g. machine translation, automated question answering, self-driving cars), new and faster communication standards (e.g. 400G Ethernet, 5G wireless), embedded FPGAs within larger SoCs [15, 16, 17], and unique customized FPGA-like devices [18, 19]. Additionally, the underlying manufacturing process technology used to build FPGAs also changes. For instance, early FPGAs were built on process nodes with large features where transistor delay dominated, but state-of-the-art FPGAs aggressively pursue leading edge processes where features are small and interconnect delay dominates (due to wire resistance) [20, 21]. These trends have also led FPGAs to become more complex, changing from relatively uniform architectures (consisting primarily of Look-Up Tables (LUTs) and wires) to become increasingly heterogeneous (with various types of I/O, computational elements like adders/multipliers, various types of RAM, numerous global clock networks with complex controls, many types of wires and new communication structures such as Networks-on-Chip (NoCs) [22, 23]) in order to deal with diverse applications and the realities of huge transistor budgets and high wire resistance. It is therefore important for FPGA architectures to adapt to these new requirements, constraints and changing trade-offs.

To fairly compare and evaluate these different ideas requires a flexible CAD flow which can target each proposed architecture. To ensure the resulting architectural conclusions are valid, it is important for the CAD flow to support modeling all important details, and produce high quality results. Traditionally, the tools used by academic researchers have made various simplifying assumptions which do not necessarily hold true in commercial FPGAs. To ensure the validity of their conclusions it is therefore important to ensure the tools available to researchers capture the important architectural details.

Since the CAD flow is such a critical component for the use of FPGAs, commercial vendors have traditionally invested heavily in their closed-source commercial tools. However the tool flows produced are not always ideally suited to the desires of FPGA designers, and their closed-source nature limits their flexibility to what the vendor chooses to support. Furthermore, the CAD flow is a significant barrier to smaller commercial FPGA vendors or other groups wishing to produce their own customized FPGA-like devices. Open-source FPGA design flows capable of targeting both commercial and customized FPGAs have the potential to alleviate these issues, allowing users to customize the CAD flow to their wishes and allowing the broader commercial and research communities to pool their resources and collaboratively develop improved tools and flows.

This thesis aims to address several of these issues. Faster CAD with higher result quality, combined with less pessimistic timing analysis, improves designer productivity, making it tractable to tackle larger design sizes. More flexible CAD allows the investigation of new FPGA architectures, including those with advanced and complex features. More detailed CAD bridges the gap between architecture research (idea) and a production CAD flow (implementation), allowing not only the prototyping of a new FPGA, but the ability to fully describe, fine-tune, and ultimately program it.

1.2 Contributions

The main contributions of this thesis are as follows:

• The investigation of parallel algorithms for Static Timing Analysis on GPUs and CPUs (Chapter 3), the result of which is the open-source Tatum STA library. Using our best algorithms, Tatum achieves a 7× larger parallel speed-up than a recent parallel ASIC timing analyzer, and runs 15.2× faster (using parallelism) than VPR 7’s (serial) STA engine. Tatum’s speed and flexibility have led to its adoption as the STA engine for VTR 8 [2] and CGRA-ME [24]. We also show how faster STA enables improved optimizations reducing critical path delay by 4%.

• The development of Extended STA (ESTA), a more accurate form of timing analysis which considers the probability of signal transitions (Chapter 4). ESTA is 75% to 37% less pessimistic than standard STA, while running 569× to 16× faster than Monte-Carlo timing simulation, while delivering stronger correctness guarantees. ESTA gives designers new insights into the timing behaviour of their designs which enable new optimizations [25].

• Significant extensions to the FPGA architecture modelling capabilities of the Verilog to Routing (VTR) project [26, 27, 2], which enable architects to explore new architectures and capture many of the details necessary to model, and even program, commercial devices (Chapters 6 and 7). As a result, VTR now supports general device grids which do not restrict blocks to be in columns or IOs to be around the perimeter. Block types of any width and height are supported at any location within the device grid. The detailed routing architecture description is now more flexible, allowing architects detailed control of the switch-block and connection-block patterns. VTR also now supports loading an arbitrary externally defined routing architecture. The routing model has been extended to efficiently support arbitrary shaped wires (e.g. ‘L’ and ‘T’) and complex structures like clock networks using non-configurable edges. VTR’s area and delay modelling have also been improved to produce more accurate results, and support key features like minimum delay timing for hold analysis and primitive internal timing paths. These capabilities are illustrated with the creation of an improved capture of Intel’s Stratix IV FPGA architecture which reduces critical path delay by 8%, area by 11%, and route time by 28% compared to what is achievable with VTR 7.

• Techniques to improve the quality of FPGA packing (Chapter 8) by producing more natural and less interconnected clusters, which improve routability (from 13 to 23 routable large Titan benchmarks), wirelength (-23%), critical path delay (-5%) and overall run-time (-30%). We also propose methods to improve FPGA placement quality (Chapter 8) through better optimization of structures like carry-chains and a better range limit formulation which improves the placement of sparse blocks; together these improve wirelength (-8%) and critical path delay (-2%) at equivalent place-time.

• A new FPGA routing algorithm called AIR (Chapter 9), which avoids unnecessary work and adapts to the targeted FPGA architecture, improving both quality (-15% wirelength, -19% critical path delay) and run-time (6.2× faster). We also present enhancements to reduce minimum channel width search time (2 − 2.7× faster), and develop a new approach to incrementally improve an existing legal routing (reducing critical path delay at minimum channel width by 2%). Additionally, we introduce extensions to support routing with non-configurable edges (which are used to model low flexibility routing structures like clock networks).

• A number of additional engineering improvements to VTR (Chapter 10), which improve other parts of the flow (like logic synthesis) as well as its overall usability, robustness and flexibility.

• A detailed evaluation of the above VTR enhancements in the context of the entire design flow (Chapter 11). These results show the combined VTR 8 design flow significantly improves quality (-15% minimum routable channel width, -41% wirelength, -12% critical path delay), run-time (5.2× faster) and memory usage (3.3× lower) compared to VTR 7 on the VTR benchmarks. Compared to Intel’s commercial Quartus CAD system, the quality gap on the large Titan benchmarks is narrowed (compared to VPR 7) from 2.0× to 1.26× for wirelength, and from 1.53× to 1.26× for critical path delay, at comparable run-time and memory usage.

• An open-source design flow (built in collaboration with others) which leverages the above contributions and can target both commercial (e.g. Xilinx Artix 7) and novel FPGA architectures (Chapter 12). This process also shows that how an FPGA architecture is modelled has a significant impact on the quality of results achieved. Compared to Xilinx’s Vivado CAD system, this design flow completes approximately 2× faster, but produces circuits which are 2.3× slower.

• A new approach to developing adaptive heuristics for use in CAD tools (Chapter 13). This technique uses Reinforcement Learning (RL) to learn heuristics and adapt to the specific optimization problem being considered without requiring a human in the loop. We demonstrate how this technique can be used to improve the run-time/quality trade-off achieved by an FPGA placement algorithm, allowing it to achieve the same quality in 20 to 50% less run-time.

1.3 Organization

Initial background about FPGA architecture, CAD and timing analysis is presented in Chapter 2. The remainder of the thesis is structured in four parts.

Part I focuses on techniques for analyzing the timing behaviour of digital circuits. Chapter 3 investigates parallel algorithms for performing STA, and presents Tatum, a fast parallel timing analysis library. Chapter 4 describes our new ESTA method for analyzing the timing behaviour of a digital circuit while accounting for input transition probabilities.

Part II describes a number of contributions relating to FPGA physical design algorithms and FPGA architecture modelling which fall under the VTR project. Chapters 5 and 6 describe VTR and the VTR design flow, along with enhancements which improve the flow’s flexibility. Chapter 7 describes extensions to VTR’s FPGA architecture modelling capabilities which allow the capture of many of the realities of modern FPGA architectures. These features are then used to produce a more detailed capture of the Intel Stratix IV FPGA architecture. Chapter 8 focuses on methods to produce more natural clusters of netlist primitives and improve placement quality. Chapter 9 describes AIR, a highly efficient timing-driven FPGA routing algorithm. A number of other engineering improvements to VTR are outlined in Chapter 10, and Chapter 11 presents a complete evaluation of the VTR tools and overall design flow.

Part III outlines how many of the enhancements from Part II can be leveraged to create a fully open-source design flow (described in Chapter 12), which can target and program both commercial and novel FPGAs.

Part IV investigates how Reinforcement Learning (RL) can be used to improve the quality and run-time of CAD algorithms (such as placement in Chapter 13) by making them more adaptive.

Finally, Chapter 14 concludes and presents directions for future work.

Chapter 2

Background

In this chapter we present an overview of necessary background and work related to the overall thesis.1 More detailed background and related work is presented where relevant in the subsequent chapters.

2.1 Field Programmable Gate Arrays

Field-Programmable Gate Arrays (FPGAs) offer many benefits as a computation platform. They provide many of the benefits of dedicated hardware, such as high performance application-customized datapaths and low power consumption [8]. However, they are also re-programmable and require significantly reduced design time and effort compared to Full Custom or Standard Cell based ASICs [29]. FPGAs have been used successfully to accelerate a wide range of applications such as molecular dynamics [30], biophotonics simulation [31], web search [32], option pricing [33], solving systems of linear equations [34], machine learning [35] and many others. The programmable nature of FPGAs however, comes at a cost. FPGAs require 21-40× more silicon area, 9-12× more dynamic power, and operate 2.8-4.5× slower than Application Specific Integrated Circuits (ASICs) [9]. These present a unique set of trade-offs compared to ASICs, Graphics Processing Units (GPUs) and Microprocessors, which have enabled FPGAs to find use in a wide range of applications.

2.1.1 FPGA Architecture

FPGAs typically contain K-input Look-up Tables (LUTs) and Flip-Flops (FFs) interconnected by pre-fabricated programmable routing. These are used to implement ‘soft logic’. Typically a LUT and FF are grouped together into a Basic Logic Element (BLE) (Figure 2.1), where the output of the LUT is optionally registered. To improve area efficiency and performance, the BLEs are usually grouped together into a Logic Block (LB) (Figure 2.2) [36, 37, 38]. An FPGA typically consists of columns of LBs, with programmable inter-block routing used to interconnect the LBs as shown in Figure 2.3. The inter-block routing consists of Connection Blocks (CBs) where adjacent LB input and output pins connect to the FPGA routing, and Switch Blocks (SBs) where routing wires interconnect (Figure 2.4) [38]. While ‘soft logic’ can be used to implement nearly any type of digital circuit, it is often more efficient to ‘harden’ certain commonly used functions into fixed-function hardware on the device. This trades off flexibility for efficiency.

1Parts of this chapter are derived from [28].



Figure 2.1: A conventional academic Basic Logic Element (BLE).
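To make the BLE of Figure 2.1 concrete, the following minimal behavioural sketch (not taken from the thesis; a 6-input LUT is assumed) models the LUT as a 64-entry truth table whose output is optionally registered by the flip-flop:

```cpp
// Minimal behavioural model of a BLE: a 6-LUT whose output is optionally registered.
// Illustrative only; a real FPGA implements this with SRAM configuration cells and a mux.
#include <bitset>
#include <cstdio>

struct BLE {
    std::bitset<64> lut_config;  // one configuration bit per 6-input combination
    bool use_ff = false;         // configuration bit: registered or combinational output
    bool ff_state = false;       // current flip-flop contents

    // Combinational LUT output: index the truth table with the 6 input bits.
    bool lut_out(unsigned inputs) const { return lut_config[inputs & 0x3F]; }

    // Output driven onto the routing: the LUT output or the registered value.
    bool output(unsigned inputs) const { return use_ff ? ff_state : lut_out(inputs); }

    // On a clock edge the FF captures the LUT output.
    void clock_edge(unsigned inputs) { ff_state = lut_out(inputs); }
};

int main() {
    BLE ble;
    ble.lut_config[0x3F] = true;  // configure the LUT as a 6-input AND gate
    std::printf("AND(111111)=%d AND(111110)=%d\n",
                (int)ble.output(0x3F), (int)ble.output(0x3E));
    return 0;
}
```

Any 6-input Boolean function can be realized simply by loading a different 64-bit truth table, which is what makes the soft logic fully programmable.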

Figure 2.2: A simple Logic Block (LB). ‘x’ indicates a programmable switch.


Figure 2.3: A simple homogeneous FPGA with vertical and horizontal routing channels (containing wires) between logic blocks.

Figure 2.4: A LB and associated CB and SB. Arrows terminating at the LB indicate block inputs from the general routing, arrows leaving the LB indicate block outputs driving the general routing. Horizontal and vertical routing wires are respectively shown below and to the right of the LB. ‘x’ indicates a programmable CB switch. The configurable right-going connections from the horizontal channel are shown with dotted lines.


Figure 2.5: A simple heterogeneous FPGA.

Typical examples of ‘hard’ blocks in modern FPGAs include Digital Signal Processing (DSP) blocks (with embedded hard multipliers) and Random Access Memory (RAM) blocks (Figure 2.5). This variety of block types makes modern FPGAs heterogeneous, with a variety of different resource types. Additionally, smaller ‘hard’ components can also be added to LBs to perform common operations, such as hard-adders for arithmetic operations [39].

2.1.2 CAD for FPGAs

In order to program an FPGA to implement a specific application, the designer’s high-level intent must be translated into a low level bitstream which sets the millions of individual configuration switches in the FPGA. This translation process constitutes the ‘Computer Aided Design (CAD) Flow’. Since the CAD flow takes only an abstract high-level description, but produces a detailed low level implementation, it must make numerous choices to implement the system. These choices have significant impacts on key performance metrics such as power, area and operating frequency. It is therefore key that the CAD flow makes good choices to optimize the final implementation. An example FPGA CAD flow is illustrated in Figure 2.6, and discussed below.2 [41]

High-Level Synthesis High-Level Synthesis (HLS) is a relatively recent addition to FPGA CAD flows, which aims to improve designer productivity by further increasing their level of abstraction. This is typically accomplished by allowing designers to describe their systems algorithmically, using conventional programming languages such as C or OpenCL [42, 33, 43], rather than a close-to-the-metal, cycle-by-cycle behavioural description in a Hardware Description Language (HDL) (e.g. Verilog, VHDL). Given an algorithmic description of a system, HLS selects an appropriate hardware architecture to implement the algorithm and chooses which computations occur in each clock cycle. The output of HLS is normally an HDL description of the design, in a language such as Verilog.

2It should be noted that while discrete steps in the CAD flow are described here, many modern flows blur the lines between these different stages — for example by re-optimizing the design logic after placement [40]. Confusingly, this is sometimes referred to as ‘Physical Synthesis’ in the literature. Here we take Physical Synthesis to be an encompassing term for the physically aware stages of the CAD flow (i.e. packing, placement and routing), in contrast with Logical Synthesis which encompasses the non-physically aware stages.

Elaboration Elaboration converts the behavioural description of the hardware (either provided by the designer, or generated by HLS) into a logical hardware description (i.e. set of logic operations and signals). The input to elaboration is generally an HDL, such as Verilog, and its output is a netlist of operations.

Logic Optimization Technology independent logic optimization is then performed, which involves removing redundant portions of the hardware and re-structuring the logic to improve the quality (area, speed, power) of the resulting hardware.

Technology Mapping Once logic optimization is completed, the system is then mapped to (i.e. implemented with) the primitive components found in the FPGA architecture (LUTs, FFs, multipliers etc.) to create a netlist of primitives (low level hardware components) that exist on the targeted FPGA.

Clustering Clustering (also referred to as Packing) groups together device primitives into the larger blocks (e.g. LBs, RAM blocks, DSP blocks) of the target FPGA architecture. This step is usually not found in non-FPGA CAD flows. It is typically used to enforce the strict legality constraints facing FPGAs (since all resources are pre-fabricated), and also helps to reduce the number of placeable objects. The output of clustering is a netlist of these larger, ‘clustered’ blocks.

Placement Placement decides the location for each placeable block on the target device. This makes it one of the key steps in the physical design implementation flow, since it largely determines wirelength and best-case delay. It also affects routability and power consumption.3 The output of placement is a legal location for each block in the (usually clustered) netlist.

Routing Given the locations of the various blocks determined by placement, routing determines how to interconnect the various pins in the netlist using the pre-fabricated routing wires and programmable switches on the FPGA. The output of a routing algorithm is a tree of routing wires and switches for every signal in the netlist, where each tree ensures that all the signal destinations are connected to the source, and no routing resources are overused (i.e. no two different signals are shorted together).

Analysis With the design fully implemented, it is passed through detailed analysis tools to evaluate the result. This can include confirming circuit functionality via timing analysis and performing detailed power analysis. These tools output reports that (for example) detail if the design meets timing or not, its maximum operating frequency and/or timing slack, and its expected power dissipation.

3In some CAD flows placement can also change the clustering (move primitives), while in others the placement algorithm places only the larger blocks that were formed by a prior clustering step; see Section 2.4.3 for more details.


Figure 2.6: An example FPGA CAD flow.

Bitstream Generation After routing there is finally sufficient information to determine how to set all block configuration bits and programmable interconnect switches on the FPGA to implement the designer’s original specification. Bitstream generation converts all this information into a programming file used to configure the FPGA.

2.2 CAD Flows

Two major thrusts in FPGA research are building improved FPGA architectures (Section 2.1.1) and improving FPGA CAD tools (Section 2.1.2). Both of these are typically evaluated empirically, since closed form analytical solutions are rarely applicable. A typical research CAD flow is shown in Figure 2.7. The VTR project [2] is a popular open-source example of this type of CAD flow. In a research CAD flow a set of benchmark circuits are mapped onto candidate FPGA architectures, and the results analyzed. In typical usage for FPGA architecture research, the CAD flow and benchmarks are kept constant while the target FPGA architectures are varied. Conversely for CAD tool research the benchmarks and target architectures are kept constant while the CAD flow is varied.4

2.2.1 Impact of CAD Flow on Productivity

The typical process for a designer implementing an application targeting an FPGA is shown in Figure 2.8. A designer describes his/her design using an HDL and then passes it off to the automated CAD flow for synthesis and analysis. After analysis it is determined whether the design has met its constraints (e.g. timing, power and area).

4In reality this distinction is not so clear cut, as there is an interdependence between the CAD flow and FPGA architectures. For example, if a CAD flow fails to take full advantage of an FPGA’s architectural features, or optimizes poorly, the conclusions about the architecture would not be accurate.


Figure 2.7: CAD and architecture evaluation process.

If the constraints are not satisfied then the designer must go back and modify their design and re-run the design flow. Since this iterative process is repeated numerous times during development, it is important that each iteration occur quickly; however, this is rarely the case. Firstly, the synthesis and analysis design flow, while automated, is large and complex, requiring significant computing time — on the order of hours to days for large designs [10]. Secondly, manually modifying the design to address the constraint violations may not be easy. It can involve significant design modifications, which in turn require design re-verification to ensure correctness is maintained. On large designs this may involve changes across multiple design components owned by other individuals or teams — making design modification a time-consuming process.5 As a result, there are two primary approaches to improving designer productivity:

1. Reducing the required number of design iterations, and
2. Reducing the required time for each design iteration.

The first can be addressed through improving the Quality of Result (QoR) produced by the CAD flow. If the CAD flow produces higher quality results, the designer will not need to make costly and time-consuming changes to their design to achieve their performance targets. The second can be addressed by improving the efficiency (run-time, memory usage) of the CAD flow, for instance by improving the underlying algorithms and tools.

5It should be noted that FPGA designers have less flexibility than ASIC designers to address issues during the physical stages of the CAD flow. To resolve timing issues, ASIC designers have multiple adjustments they can make, such as inserting buffers on long nets, adjusting transistor threshold voltages and adjusting transistor sizing. Most of these techniques cannot be applied on FPGAs due to their prefabricated nature. As a result, FPGA designers are often forced to address design issues by making RTL changes.


Figure 2.8: FPGA design implementation process.

2.3 Timing Analysis

An important consideration for both designers and optimization tools is the timing behaviour of a particular circuit implementation. Most digital circuits operate synchronously, where the state holding elements of a circuit (e.g. D Flip-Flops) are updated together (synchronously) relative to a shared control signal (i.e. a clock). To avoid metastability issues,6 and ensure elements update to the correct values, each state element must satisfy a number of timing constraints. Two of the most significant constraints are the setup and hold constraints. Setup constraints ensure that signals arrive at registers a sufficient amount of time before the capturing clock edge. During each cycle of operation every connection terminating at a register must satisfy:

$$t_{cq}^{(max)} + t_{pd}^{(max)} \le T_{clk} - \tau_{su} \tag{2.1}$$

where $t_{cq}^{(max)}$ is the maximum clock-to-q delay of the launching register, $t_{pd}^{(max)}$ is the longest propagation delay between the launch and capture registers, $\tau_{su}$ is the setup time of the capture register and $T_{clk}$ is the desired clock period. Long (slow) paths typically cause setup violations. Setup violations can be alleviated by increasing the clock period (giving more time for the signal to arrive), although this decreases the operating frequency of the circuit.

Hold constraints ensure signals that have arrived at registers remain stable for a sufficient amount of time after the capturing clock edge. During each cycle of operation every connection terminating at a register must satisfy:

$$t_{cq}^{(min)} + t_{pd}^{(min)} \ge \tau_{hld} \tag{2.2}$$

6Where the stored value is indeterminate, not clearly a logic ‘1’ or ‘0’ [44].


Figure 2.9: Timing delays between two registers.

where $t_{cq}^{(min)}$ is the minimum clock-to-q delay of the upstream register, $t_{pd}^{(min)}$ is the shortest propagation delay between the upstream and current register, and $\tau_{hld}$ is the required hold time of the capture register. Short (fast) paths typically cause hold violations. Unlike setup violations, hold violations cannot be fixed by changing the clock frequency. More generally, the delay from clock source to each launch and capture register may not be identical, as shown in Figure 2.9, leading to the constraints:

$$\underbrace{t_{clk}^{(launch)} + t_{cq}^{(max)} + t_{pd}^{(max)}}_{t_{arr}^{(max)}} \le \underbrace{t_{clk}^{(capture)} + T_c - \tau_{su}}_{t_{req,c}^{(setup)}} \tag{2.3}$$
$$\underbrace{t_{clk}^{(launch)} + t_{cq}^{(min)} + t_{pd}^{(min)}}_{t_{arr}^{(min)}} \ge \underbrace{t_{clk}^{(capture)} + \tau_{hld}}_{t_{req}^{(hold)}}$$

where $t_{clk}^{(launch)}$ and $t_{clk}^{(capture)}$ are the clock insertion delays to the launch and capture registers respectively,7 and $T_c$ is the timing constraint. It is helpful to define the arrival time and required time,8 which correspond respectively to the left and right hand sides of Equation (2.3):

Definition 1 (Arrival Time ($t_{arr}$)) The time at which a data signal stabilizes (arrives) at the capture register input.

Definition 2 (Required Time ($t_{req}$)) The time by which a data signal must stabilize at the capture register input to avoid metastability and ensure the correct data value is captured/latched.

We can then restate the setup and hold constraints as:

$$t_{req}^{(hold)} \le t_{arr}^{(min)} \le t_{arr}^{(max)} \le t_{req,c}^{(setup)} \tag{2.4}$$

Hence, the arrival times must fall within a timing window bounded by the required setup and hold times.
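As a concrete (purely hypothetical) numeric check of these constraints, suppose $t_{cq}^{(max)} = 0.5$ ns, $t_{pd}^{(max)} = 3.0$ ns, $\tau_{su} = 0.2$ ns and $T_{clk} = 4.0$ ns. Equation (2.1) requires $0.5 + 3.0 = 3.5 \le 4.0 - 0.2 = 3.8$, so the setup constraint is met with 0.3 ns to spare. If additionally $t_{cq}^{(min)} = 0.3$ ns, $t_{pd}^{(min)} = 0.1$ ns and $\tau_{hld} = 0.1$ ns, Equation (2.2) requires $0.3 + 0.1 = 0.4 \ge 0.1$, so the hold constraint is also met and the arrival times fall inside the window of Equation (2.4).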

7In general $T_c$ is the smallest time interval between active clock edges of the launch and capture clocks, which reduces to the clock period ($T_{clk}$) in the single-clock case.
8Sometimes referred to as the required arrival time.

Figure 2.10: Example circuit and timing graph with annotated delays. The path b → c → d → e is the critical timing path, with a delay of 3.0 units.

For optimization purposes it is also useful to define the concept of slack:

Definition 3 (Slack) The difference between required (arrival) and arrival (required) times of a setup (hold) constraint.

This is formally specified as:

$$slack_{setup,c} = t_{req,c}^{(setup)} - t_{arr}^{(max)} \tag{2.5}$$
$$slack_{hold} = t_{arr}^{(min)} - t_{req}^{(hold)}$$

where positive slack indicates the timing constraint is met, and negative slack indicates the constraint is violated. Slacks are useful to both designers and optimization tools since they provide insight into how close or far a constraint is from violation.

The above discussion focused on the timing constraints which apply between a single connected pair of registers. To analyze the timing behaviour of an entire circuit we need a way to represent all registers/primary inputs/primary outputs and logic in the design to analyze them together. A common approach is to store the topological structure of the circuit as a timing graph:

Definition 4 (Timing Graph) A directed acyclic graph9 where nodes represent the pins of circuit elements, edges represent timing dependencies and edge weights correspond to delays.

The circuit’s primary inputs and state element outputs (e.g. Flip-Flop Q pins) become timing sources: nodes with no fan-in. Conversely, the primary outputs and state element inputs (e.g. Flip-Flop D pins) become timing endpoints: nodes with no fan-out. An example circuit and timing graph are shown in Figure 2.10. The timing sources are inputs a and b, and the output e is the single timing endpoint. We can now define a timing path:

Definition 5 (Timing Path) A path in the timing graph between a timing source and timing endpoint.

This allows us to define the timing verification problem:

Definition 6 (Timing Verification Problem) Determine whether timing constraints are satisfied at all timing endpoints.

9If the timing graph contains cycles it can be transformed into an acyclic graph using standard techniques [45].

That is, verify whether the constraints in Equation (2.3) are satisfied for all timing endpoints. Typically designers and optimization tools desire more information than a yes/no answer about whether their design’s timing constraints are satisfied, so we usually consider the more general timing analysis problem:

Definition 7 (Timing Analysis Problem) Determine the slacks at all nodes in the timing graph.

Having access to slack information (Definition 3) at each node allows designers and optimization tools to focus their optimization efforts on parts of their design which are failing timing constraints.
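To make the timing analysis problem concrete, the following minimal sketch (not the thesis’ Tatum implementation; the edge set, unit delays and a single 3.0-unit setup constraint are assumed to loosely match Figure 2.10) computes arrival times, required times and slacks with one forward and one backward traversal of a small timing graph in topological order:

```cpp
// Minimal block-based STA sketch: forward (arrival) and backward (required) traversals
// over a small DAG, then slack = required - arrival at each node (Equation 2.5).
// Nodes 0..4 correspond to a, b, c, d, e; edges and unit delays are illustrative.
#include <algorithm>
#include <cstdio>
#include <limits>
#include <vector>

struct Edge { int from, to; float delay; };

int main() {
    const int num_nodes = 5;
    // Edges listed in topological order (a->d, b->c, c->d, c->e, d->e), each with 1.0 delay.
    std::vector<Edge> edges = {{0,3,1.0f}, {1,2,1.0f}, {2,3,1.0f}, {2,4,1.0f}, {3,4,1.0f}};

    // Forward traversal: arrival time is the max over fan-in of (arrival + edge delay).
    std::vector<float> arr(num_nodes, 0.0f);  // timing sources start at t = 0
    for (const Edge& e : edges)
        arr[e.to] = std::max(arr[e.to], arr[e.from] + e.delay);

    // Backward traversal: required time is the min over fan-out of (required - edge delay).
    const float constraint = 3.0f;  // assumed setup constraint at the endpoint e
    std::vector<float> req(num_nodes, std::numeric_limits<float>::infinity());
    req[4] = constraint;
    for (auto it = edges.rbegin(); it != edges.rend(); ++it)
        req[it->from] = std::min(req[it->from], req[it->to] - it->delay);

    for (int v = 0; v < num_nodes; ++v)
        std::printf("node %d: arrival=%.1f required=%.1f slack=%.1f\n",
                    v, arr[v], req[v], req[v] - arr[v]);
    return 0;
}
```

With these assumed values the nodes along b → c → d → e all end up with zero slack (they form the critical path), while input a has 1.0 units of positive slack; this per-node slack is exactly the information the timing analysis problem (Definition 7) asks for.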

2.4 FPGA Packing & Placement

FPGA placement involves determining the position of each netlist element within the FPGA device grid. There are several key considerations, including the optimization objectives, the optimization algorithms used, and how detailed legality constraints are handled.

2.4.1 Optimization Objectives

The classic objective to be optimized during FPGA placement is wirelength, the total amount of wiring needed to interconnect a particular circuit placement. Since the true wirelength is unknown until routing is complete, the approximate Half-Perimeter Wirelength (HPWL) objective is often used instead, as it is reasonably accurate and fast to calculate. For instance, VPR [46] uses a corrected HPWL cost to estimate the wirelength of a particular net:

    HPWL(k) = q(fanout(k)) · (bbx(k) + bby(k))   (2.6)

where the HPWL of net k is determined by the bounding box of its terminals in the x (bbx(k)) and y (bby(k)) directions, with q(·) a fanout dependent correction factor which avoids underestimating the wirelength for high-fanout nets. Reducing wirelength is important since excessive demand for wiring may make it impossible for the router to find a feasible routing solution10, or drastically increase router run-time. Reducing wirelength is also correlated with reductions in power usage and can decrease connection delays (due to shorter connections).

Another important objective is timing. Since interconnect delays account for the majority of delay within FPGAs, and the length of these connections is largely determined by placement, timing is an important consideration during placement. Timing optimization is often achieved by generating delay estimates for each connection (based on the current placement) and performing timing analysis to determine the slack of each connection. This slack information is then converted into a timing cost which is optimized during placement. For instance, VPR [47, 48] calculates connection setup slack11 between the combinationally connected pins i and j for constraint c as:

    slack_setup,c(i, j) = t_req,c^(setup)(i) − t_arr^(max)(j) − delay(i, j)   (2.7)

10 This is particularly relevant for FPGAs which have limited amounts of pre-fabricated routing.
11 Most timing-driven placement algorithms focus on optimizing setup timing only.

and then converts connection slacks into a 'criticality'. For a particular timing constraint c this is typically specified as:

    crit_c(i, j) = 1 − slack_setup,c(i, j) / max_{∀s ∈ sinks} { t_req,c^(setup)(s) }   (2.8)

where t_req,c^(setup)(s) is the required setup time at node s for the constraint c. The resulting connection criticality ranges between zero and one,12 with values equal to one indicating a connection on the critical path. In the presence of multiple clock domains there are multiple constraints (i.e. for each pair of interacting clock domains), and the criticality is calculated as the maximum over all constraints:

    crit(i, j) = max_{∀c ∈ constrs} { crit_c(i, j) }   (2.9)

The placer then calculates the timing cost of a connection as the product of its criticality and delay:

    TimingCost(i, j) = crit(i, j)^critexp · delay(i, j)   (2.10)

where critexp (criticality exponent) is a parameter used to ‘sharpen’ criticalities to focus the placer more on timing critical connections. The combined wirelength and timing cost function is then the summation of wirelength (over all nets) and timing cost (over all connections):

    (1 − λ) · Σ_{k ∈ nets} HPWL(k) + λ · Σ_{(i,j) ∈ conns} TimingCost(i, j)   (2.11)

where λ is an empirical weighting factor (timing trade-off) used to control the placer's focus on wirelength versus timing. This cost function guides the placer to place elements with timing critical connections between them at locations where their connection delays are lower (usually closer together).

A number of other optimization objectives have also been considered such as power [49], and routing congestion [50, 51, 52].
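To make Equations (2.6) to (2.11) concrete, the following is a minimal C++ sketch of the per-net and per-connection cost terms. The q(·) values, crit_exp, and lambda used are illustrative placeholders rather than VPR's actual tables or defaults, and the criticality helper assumes slacks have already been normalized to be non-negative.

    #include <algorithm>
    #include <cmath>
    #include <vector>

    struct Pin { int x, y; };

    // Hypothetical fanout correction factor q(.); VPR tabulates this empirically,
    // so the values here are placeholders for illustration only.
    double q(std::size_t fanout) {
        return (fanout <= 3) ? 1.0 : 1.0 + 0.02 * (fanout - 3);
    }

    // Corrected half-perimeter wirelength of one net (Equation 2.6).
    double hpwl(const std::vector<Pin>& terminals) {
        int min_x = terminals.front().x, max_x = terminals.front().x;
        int min_y = terminals.front().y, max_y = terminals.front().y;
        for (const Pin& p : terminals) {
            min_x = std::min(min_x, p.x); max_x = std::max(max_x, p.x);
            min_y = std::min(min_y, p.y); max_y = std::max(max_y, p.y);
        }
        return q(terminals.size()) * ((max_x - min_x) + (max_y - min_y));
    }

    // Connection criticality under one constraint (Equation 2.8), assuming a
    // non-negative slack and the maximum required time over all sinks.
    double criticality(double slack, double max_required_time) {
        return 1.0 - slack / max_required_time;
    }

    // Timing cost of a connection (Equation 2.10).
    double timing_cost(double crit, double delay, double crit_exp) {
        return std::pow(crit, crit_exp) * delay;
    }

    // Combined wirelength/timing cost (Equation 2.11), with lambda trading off
    // the summed wirelength term against the summed timing term.
    double combined_cost(double wirelength_sum, double timing_sum, double lambda) {
        return (1.0 - lambda) * wirelength_sum + lambda * timing_sum;
    }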

2.4.2 Optimization Algorithms

The most common algorithmic approaches currently used are based on Analytic Placement (AP), and Simulated Annealing (SA) [53].13 Commercial FPGA CAD tools often use a hybrid approach, with a fast early stage placement (e.g. analytic-based), followed by a detailed iterative improvement algorithm (e.g. annealing-based) [54].

SA Placement Perturbations

An important consideration during SA placement is how the current placement is perturbed (the perturbations used are often called moves). VPR [46] uses random swaps between blocks of the same type. To avoid proposing large perturbations which are likely to be rejected late in the anneal, VPR also uses a range limit, controlled by the parameter Rlim, which bounds the maximum distance between

12 In practice negative slacks may need further processing to ensure criticality is bounded in this way; see [48] for further details.
13 Historically, partitioning-based algorithms have also been used, but while fast, they tend to produce lower quality results.

swapped blocks [46, 55]. Initially Rlim is as large as the device, but eventually reduces to 1 (i.e. only allowing neighbouring swaps to be performed) as a function of the fraction of moves accepted.
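The range-limit update can be sketched as follows. This follows the commonly described VPR-style scheme of scaling Rlim to hold the move acceptance rate near a target of 0.44, but the exact constant and clamping shown are assumptions rather than the precise implementation.

    #include <algorithm>

    // Sketch of a VPR-style range-limit update (assumed form, not the exact code):
    // grow Rlim when too many moves are accepted, shrink it when too few are,
    // and keep it between one grid unit and the device dimension.
    double update_rlim(double rlim, double fraction_accepted, double device_size) {
        double new_rlim = rlim * (1.0 - 0.44 + fraction_accepted);
        return std::clamp(new_rlim, 1.0, device_size);  // requires C++17
    }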

2.4.3 Handling Legality

One common approach to handling legality is to pre-cluster primitive elements (e.g. LUTs and Flip-Flops) together into architecture blocks (e.g. LBs, RAM blocks, DSP blocks), and then place the resulting clusters. This is the approach taken by VPR/VTR (SA-based) [46, 27, 2], and HeAP (AP-based, followed by greedy swaps) [56]. One advantage of this approach is that the post-clustered netlist is smaller than the primitive netlist, which reduces the size of the placement problem and reduces run-time. Another key advantage is that it cleanly separates the detailed legality constraints of the FPGA's clusters from placement (e.g. any clustered block of a particular type can be placed at any location containing that block type in the device grid). As a result, it is relatively straight-forward to re-target the combined packing and placement algorithms to different FPGA architectures (which may have different legality constraints). This is particularly important for tools like VPR/VTR which are used for FPGA architecture research and need to be easily re-targetable to different FPGAs.

Other placement approaches have combined analytic placement and clustering at various stages. For instance, performing AP on fully clustered netlists [56, 57], partial BLE-level packing followed by AP [50], flat (unclustered) initial AP followed by clustered AP [51], and flat placement with embedded clustering [52, 58]. However, one drawback of many of these approaches is that non-clustered placement algorithms often embed the constraints of the FPGA architecture they target into the placement algorithm, making them applicable only to that FPGA architecture. Additionally, most of these proposed AP algorithms are not timing-driven.

2.5 FPGA Routing

A number of approaches have also been proposed for FPGA routing: the problem of finding a large number of non-overlapping route trees within the FPGA's routing network which implement the netlist connectivity. Early approaches focused on two-stage routing [59], with an initial global routing stage (assigning connections to channels) followed by a detailed routing stage (assigning connections to tracks within channels). However these approaches were outperformed by single-stage combined global-detailed routing based on the PathFinder negotiated congestion algorithm [60], as exemplified in VPR [46].

PathFinder is an iterative routing algorithm, which rips-up and re-routes all nets in a design over a number of routing iterations. The key insight of PathFinder is to allow multiple nets to use the same routing resources. Such routing resources are said to be congested. While this is electrically illegal (nets which are shorted together in such a way are no longer independent nets), it is helpful during routing since it prevents nets from blocking each other. To eventually arrive at a legal routing solution, PathFinder relies on 'congestion negotiation', a process which increases the costs of congested resources so that the number of congested resources drops over time. This is done in such a way that connections negotiate among themselves, trading off their timing criticality and wiring requirements to converge to a legal solution.

The formulation of PathFinder used in VPR is outlined below [38]. The cost of using a routing resource l as part of a connection between pins i and j consists of two parts, one based on delay and the other on congestion:

    ResourceCost(l) = crit(i, j) · delay(l) + (1 − crit(i, j)) · cong(l)   (2.12)

where the connection's criticality (2.9) is used to control the relative importance of the resource's delay and congestion costs. The congestion cost of a routing resource l is defined as:

cong(l) = b(l) · h(l) · p(l) (2.13)

The base cost b(l) represents the fundamental/basic cost of using resource l. The historical congestion cost h(l) increases in each routing iteration in which the routing resource l is overused, and helps to bias future routed connections away from the resource. The present congestion cost p(l) is zero if no other connections are using l, but is non-zero if l is already in use. Furthermore, p(l) increases exponentially over the routing iterations (it is multiplied by a constant presfac > 1 at the end of each iteration). As a result, during later routing iterations connections increasingly avoid inducing congestion by using already-in-use routing resources. By iteratively increasing resource costs in this way, PathFinder allows connections to negotiate their usage of routing resources and converge to a high-quality legal solution.

PathFinder relies on a path search algorithm to find low-cost paths through the routing network. This is usually a directed search algorithm, which costs a partial path from a source s to sink t passing through a routing resource l as:

    PathCost(s, t, l) = KnownCost(s, l) + Astarfac · ExpectedCost(l, t)   (2.14)

where KnownCost(s, l) is the sum of ResourceCosts (2.12) to reach l from s (following the current partial path), and ExpectedCost(l, t) is a look-ahead estimate of the cost to reach the target t from the candidate resource l. The path cost defined by Equation (2.14) is then used to order a priority queue of routing resources to explore using a Dijkstra/A* path search algorithm.14
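A minimal C++ sketch of Equations (2.12) to (2.14) follows. The per-resource fields and the handling of h(l), p(l) and presfac are simplified stand-ins for the bookkeeping a real negotiated-congestion router maintains, not VPR's actual data structures.

    // Simplified per-resource state for a negotiated-congestion router (illustrative).
    struct RRNode {
        double base_cost;  // b(l): basic cost of using the resource
        double hist_cost;  // h(l): grown in each iteration the resource is overused
        double pres_cost;  // p(l): zero when unused by others, scaled by presfac while congested
        double delay;      // intrinsic delay of the resource
    };

    // Congestion cost of a routing resource (Equation 2.13).
    double cong_cost(const RRNode& n) {
        return n.base_cost * n.hist_cost * n.pres_cost;
    }

    // Cost of using resource l for a connection of given criticality (Equation 2.12).
    double resource_cost(const RRNode& n, double crit) {
        return crit * n.delay + (1.0 - crit) * cong_cost(n);
    }

    // Priority used to order the directed (A*-style) search (Equation 2.14).
    double path_cost(double known_cost, double expected_cost, double astar_fac) {
        return known_cost + astar_fac * expected_cost;
    }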

14 To reduce run-time an Astarfac greater than one is usually used, which makes this an approximate A* shortest path search. Additionally, often ExpectedCost is not a strict lower bound.

Part I

Timing Analysis

Chapter 3

Tatum: Fast Parallel Timing Analysis

Static Timing Analysis (STA) is used to evaluate the correctness and performance of a digital circuit implementation. In addition to final sign-off checks, STA is called numerous times during placement and routing to guide optimization. As a result, STA consumes a significant fraction of the time required for design implementation; to make progress reducing FPGA compile times we need faster STA. We first evaluate whether a GPU or multi-core CPU is the best platform for a fast parallel timing analyzer. On core STA algorithms our GPU kernel achieves a 6.2 times kernel speed-up, but data transfer overhead reduces this to 0.9 times overall. This leads us to focus on CPU techniques, with our best CPU implementation achieving a 9.2 times parallel speed-up on 32 cores, yielding a 15.2 times overall speed-up compared to the VPR 7 analyzer, and a 6.9 times larger parallel speed-up than a recent parallel ASIC timing analyzer. We then show how reducing the run-time cost of STA can be leveraged to improve optimization quality, demonstrating that frequent timing analysis during the late stages of placement reduces critical path delay by 4%.

3.1 Introduction

A common goal when designing a digital circuit is maximizing its performance; as a result a circuit is analyzed repeatedly during design to determine its operating frequency. The most common approach to determine a digital circuit's speed is Static Timing Analysis (STA). While STA is significantly faster than other approaches such as timing simulation, it is still time-consuming. Consequently, designers and optimization tools often opt to sacrifice accuracy by performing STA only 'occasionally' during the design process, to minimize design iteration times. Even so, a placement tool like VPR will call STA hundreds of times during optimization [47]. However, this still means design decisions are made using stale (old and possibly now incorrect) timing information. This leads designers and optimization algorithms (whose decisions can benefit from accurate timing information) to assume unnecessarily pessimistic design conditions – resulting in costly over-design. Furthermore, design sizes continue to increase rapidly [61], while improvements in single-threaded CPU performance have slowed [62]. Additionally, the number of timing analyses required to fully characterize a design is also increasing due to the proliferation of timing corners [63], and the growing number of

clock domains [64]. As a result, in commercial FPGA place and route tools STA typically takes 25% of total run-time, but may dominate the optimization algorithms when designs have multiple clocks and timing constraints [65]. Moreover, modern FPGAs have performance-driven architectural features such as pulsed latches [66], beneficial skew [67, 20] and interconnect registers [68], which exacerbate hold-time issues. Evaluating these features requires additional minimum-delay timing analyses, and design tools must explicitly optimize for hold time, requiring numerous rapid calls to STA [69]. Finally, a variety of performant parallel algorithms have been proposed for FPGA placement [62, 70, 71, 72] and routing [73, 74, 75, 76]. As these parallel approaches speed-up the core optimization algorithms, timing analysis becomes an increasingly dominant portion of run-time – limiting the achievable speed-up.1 These factors all make the development of fast and scalable timing analysis algorithms, which can exploit the parallelism available from modern computing systems, key to reducing FPGA design times. In this chapter we focus on the problem of developing efficient and scalable parallel algorithms for block-based STA (described in Section 3.2.1). Our contributions include:

• Memory layout optimizations for STA data structures,

• Parallel algorithms for block-based STA and evaluations on GPUs and multi-core CPUs,

• Techniques to efficiently perform multi-corner and multi-clock STA,

• Tatum, an open-source reusable STA library built on our best algorithms,

• Analysis and comparison of Tatum with existing serial and parallel STA tools, and

• An evaluation of Tatum's performance when integrated in an existing placement tool (VPR) to guide its optimization.

Section 3.2 discusses background and related work. Sections 3.3 to 3.5 present our GPU and CPU parallel algorithms and evaluate them within a simplified STA formulation. Sections 3.6 and 3.7 describe our methods for speeding-up multi-corner and multi-clock STA, and Section 3.8 describes and evaluates our best algorithms within a full-featured STA formulation. Section 3.10 illustrates how fast STA can be exploited to improve optimization quality. The conclusion and future work are presented in Section 3.11.

3.2 Background & Related Work

Related work can be divided into two components: STA and its related extensions, and parallel algorithms.

3.2.1 STA Background

Timing analysis checks whether a digital circuit implementation will operate correctly, given a set of timing constraints. A typical timing constraint is the clock period, which requires signals to be stable before the active clock edge to avoid latching stale data or inducing meta-stability in data storage elements [44]. STA [77] is the conventional approach for timing analysis. STA operates on a timing graph (Definition 4), an abstract representation of a digital circuit where nodes represent the pins of circuit elements

1 Most works on parallel placement and routing do not account for time spent on STA when reporting speed-ups. However STA's impact is significant. For instance, at 25% of total run-time STA limits the best-case speed-up of any parallel placement or routing algorithm to only 4× (Amdahl's law).

and uni-directional edges represent the timing dependencies between them. We denote the number of nodes in the timing graph as N, and the number of levels after topological sorting as L. Two classes of STA algorithms have been proposed: path-based and block-based. Path-based algorithms perform a detailed independent analysis of every path in a circuit – offering high accuracy, but with worst-case exponential run-time (since there are O(2^N) paths). As a result more efficient block-based algorithms are usually used. Block-based STA analyzes timing behaviour at the node (rather than path) level, determining the worst-case timing behaviour of all timing paths passing through a node.2 As a result block-based STA's computation time grows linearly with the size of the circuit (i.e. O(N)).

3.2.2 Parallel Timing Analysis

Several approaches have been taken to parallelize timing analysis on CPUs. Distributed approaches partition the circuit into pieces which can be processed by multiple machines [79, 80], using message passing for synchronization and communication. Scalability is typically limited by the partitioning [80] and communication overheads. In [81] an enhanced sampling method for Monte-Carlo based SSTA is presented, which is trivially parallelized. OpenTimer [82] presents a static timing analyzer using pipeline parallelism, and is evaluated in Section 3.8.2.

Several works have looked at accelerating timing analysis on GPUs. A path-based STA engine, formulated as a Sparse-Matrix Vector Product problem, is presented briefly in [83]; however it requires O(N^2) memory for a circuit of size N, which limits scalability. [84] describes a GPU-accelerated path-based SSTA implementation using the Monte Carlo approach. However, path-based Monte Carlo SSTA is very computationally expensive, making it impractical in many industrial design flows and too expensive to call repeatedly in an optimizer. [85] describes a GPU SSTA algorithm using a dynamic batch construction scheme based on the Coffman-Graham algorithm. CASTA [86] presents a GPU-based STA implementation and is compared to our approach in Section 3.4.2.

Unlike previous work, we develop multiple parallel CPU algorithms, account for data transfer when evaluating GPUs, consider simultaneous multi-corner and multi-clock analysis, and evaluate our algorithms in an optimization tool on a variety of large benchmark circuits.

3.3 Baseline Implementation

Developing a full-featured timing analysis engine for each parallel algorithm and compute platform would require a large effort. Hence we first consider a simplified timing analysis formulation which captures the key algorithmic characteristics of STA. We develop several STA engine variants to compare a wide variety of

2 This introduces pessimism compared to path-based algorithms since worst-case analysis at each node may be pessimistic for some paths. However it is 'safe' in the sense that the worst-case behaviour ensures path delays are never under/over estimated during setup/hold analysis.

parallel approaches. These simplified engines calculate node arrival/required times for single-clock circuits using a pre-calculated delay model. In Section 3.8 we create and evaluate a full-featured parallel timing engine (Tatum) built on the most promising approach, which supports multiple clocks and additional timing constraints like false and multicycle paths.

Algorithm 1 Baseline Serial STA Algorithm
Require: G levelized timing graph to analyze
Require: Arr array of initialized arrival times (zero for timing sources, −∞ for other nodes.)
Require: Req array of initialized required times (constraint values for timing endpoints, +∞ for other nodes.)
 1: function SerialWalk(G, Arr, Req)
 2:     L ← NumLevels(G, Arr, Req)
 3:     for ℓ ∈ (0 ··· L − 2) do                       ▷ Forward traversal
 4:         for node ∈ LevelNodes(G, ℓ) do
 5:             FwdTraverseNodePush(node)
 6:
 7:     for ℓ ∈ (L − 2 ··· 0) do                       ▷ Backward traversal
 8:         for node ∈ LevelNodes(G, ℓ) do
 9:             BwdTraverseNodePull(node)
10:
11: function FwdTraverseNodePush(nsrc)
12:     for eout ∈ OutEdges(nsrc) do                   ▷ Each edge to a successor
13:         nsnk ← SinkNode(eout)                      ▷ Forward edge to successor
14:         edge_arr ← Arr(nsrc) + EdgeDelay(eout)     ▷ Calculate arrival time via edge
15:         Arr(nsnk) ← Max(Arr(nsnk), edge_arr)       ▷ Push arrival time update to sink
16:
17: function BwdTraverseNodePull(nsrc)
18:     for eout ∈ OutEdges(nsrc) do                   ▷ Each edge to a successor
19:         nsnk ← SinkNode(eout)                      ▷ Forward edge to successor
20:         edge_req ← Req(nsnk) − EdgeDelay(eout)     ▷ Calculate required time via edge
21:         Req(nsrc) ← Min(Req(nsrc), edge_req)       ▷ Pull required time update from sink

The baseline CPU algorithm is shown in Algorithm 1, which is based on a standard breadth-first traversal of the timing graph.3 The timing graph (levelized by a topological sort) is first traversed in the forward direction one level at a time (Line 3), which ensures all predecessor nodes have valid arrival times. FwdTraverseNodePush, which walks the out-going edges to update the arrival time of downstream nodes, is then called on each node in the level (Line 5). The graph is traversed similarly in the backward direction to calculate required times (Lines 7 to 9).

3.3.1 Memory Layout Optimizations

The graph traversals in Algorithm 1 have a relatively small amount of computation (arrival/required time calculation) compared to data access (graph nodes and edges). As a result, memory access patterns and caching behaviour are key to achieving high performance. We implemented several optimizations to improve this behaviour:

Struct-of-Arrays We change the memory layout from Array-of-Structs (AoS) (e.g. where a node’s arrival time, required time and out-going edges are located in adjacent memory) to Struct-of-Arrays (SoA) (e.g. where the arrival times of all nodes are located in adjacent memory). This is more efficient when only a subset of a node’s data members need to be accessed (e.g. node type), as

3 This particular implementation is based on the implementation used in VPR 7 [27].

Table 3.1: Performance of Memory Optimizations

    Data Layout                       Speed-Up
    AoS                               1.00
    SoA                               1.44
    SoA & Re-Order Edges              1.61
    SoA & Re-Order Nodes              3.98
    SoA & Re-Order Edges + Nodes      5.40

    Geomean across neuron, openCV, denoise, and gaussianblur Titan FPGA benchmarks [10] (see Section 3.9 and Table 3.8 for details).

instead of loading otherwise unused data (e.g. node fan-in) into the cache, the data members (e.g. node types) of other nearby nodes are loaded into the cache (a minimal sketch of the two layouts follows this list).

Re-order Edges We re-order timing graph edges to match the expected traversal order based on the levelization.

Re-order Nodes We similarly re-order timing graph nodes.
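The layout change can be sketched as follows. The field names are illustrative, not Tatum's actual data structures.

    #include <vector>

    // Array-of-Structs: each node's fields are adjacent in memory, so touching
    // one field of many nodes also drags the rest of each struct into the cache.
    struct NodeAoS { float arrival; float required; int first_out_edge; int type; };
    std::vector<NodeAoS> nodes_aos;

    // Struct-of-Arrays: each field is stored contiguously across all nodes, so a
    // traversal that only reads arrival times streams through one dense array.
    struct TimingGraphSoA {
        std::vector<float> arrival;         // indexed by node id
        std::vector<float> required;        // indexed by node id
        std::vector<int>   first_out_edge;  // indexed by node id
        std::vector<int>   type;            // indexed by node id
    };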

Table 3.1 shows the impact of these optimizations, which improve the spatial and temporal locality of the timing graph. Individually, the SoA layout and re-ordering edges provide moderate improvements, while re-ordering nodes causes a significant improvement. These optimizations combine to yield an overall 5.4× speed-up.4 All following comparisons are made with this optimized implementation unless otherwise noted.

While these results were measured on an Intel Xeon E5-1620 (see Table 3.2), we expect they will generalize to other CPU and GPU architectures since they focus on improving general memory access characteristics such as spatial and temporal locality. Improvements in spatial and temporal locality help improve the performance of block-based caches (where a memory access brings a block/line of nearby memory elements into the cache) and lookahead prefetchers (where a predictable stride between memory accesses facilitates speculatively loading data into the cache ahead of time), both of which are standard in modern CPU architectures [87]. They also help ensure coalesced memory access, which is important for making effective use of GPU external memory bandwidth.

3.4 Parallel STA on GPUs

GPUs are a popular platform for compute acceleration, due to their potential high performance if many threads can be utilized effectively. GPU accelerated FPGA placement [72] and routing [76] techniques have been proposed, so evaluating their ability to accelerate STA is warranted. A GPU consists of a large number of simple compute units, which execute a unique thread but usually share a common control flow [88]. Subsets of the compute units can communicate quickly using a shared local memory, while compute units outside the subset must communicate through slower global memory [88].

3.4.1 GPU Algorithm

To evaluate the suitability of GPUs, we focus on accelerating the timing graph traversals. These traversals are the most time-consuming part of STA, accounting for ∼ 90% of the run-time in the optimized CPU

4 Re-ordering nodes and edges is done once based on the timing graph structure, and takes less time than a full timing analysis. Therefore the re-ordering run-time overhead is trivial in CAD tools like VPR which perform hundreds of timing analyses.

implementation. We accelerate the forward traversal (the backward traversal is similar). The timing graph is processed one level at-a-time using two kernels as shown in Figure 3.1. In the segmented additive map kernel5, each thread is assigned a single edge and calculates the edge arrival time by adding the source node arrival time (Level i − 1) and the edge delay. Next, in the segmented max reduction kernel, each thread is assigned a sink node (at Level i) and performs a maximum reduction over all incoming edge arrival times.

The key benefit of this approach is the exposure of a large amount of parallelism to the GPU when calculating edge arrival times. We expect this to improve performance by reducing code divergence (divergent branching) caused by variable node fan-in, which is inefficient on GPUs [88]. We also expect better load-balancing compared to processing the timing graph in a strictly node-at-a-time manner, since the work performed per edge is effectively constant. The edge and node data is re-ordered as in Section 3.3.1 to ensure coalesced memory accesses, and speculatively loaded into local memories for fast access.

Figure 3.1: GPU calculation of level i arrival times. (The segmented additive map kernel adds source node arrival times from Level i − 1 to edge delays to produce edge arrival times; the segmented max reduction kernel then reduces these to sink node arrival times at Level i.)

3.4.2 GPU Experimental Results

GPU Experimental Methodology

Both the CPU and GPU implementations are evaluated using a selection of large benchmarks from the Titan FPGA benchmark suite [10], run on an Intel Q9550 (45nm 'Yorkfield') processor and an NVIDIA GTX780 (28nm 'GK110') GPU (see Table 3.2 for additional details). The GPU algorithm is implemented using OpenCL, and minimizes wall-clock time by overlapping computation and data transfer. Reported run-times are the sum over all kernel invocations required to evaluate the timing graph, and are averaged over 100 runs.

GPU Experimental Results

Table 3.3 shows the performance results. Comparing first the CPU and GPU Kernel times, we observe the GPU significantly outperforms the well optimized single-core CPU implementation from Section 3.3 by 6.2×. However, taking into account the time spent on data transfer between the host CPU and

5 This kernel takes in arrays of incoming arrival times and edge delays and adds them together in a parallel map operation. This is done in a segmented manner where subsets of edge delays (i.e. those associated with the same node) will re-use the same incoming arrival time.

Table 3.2: Compute Platforms

    Type  Model                               Cores  Sockets  PCI-E BW (MB/s)     Frequency (MHz)  Memory BW (GB/s)  Power (W)  Process (nm)  Year
    GPU   NVIDIA GTX780 (GK110)               -      -        15760 (x16 Gen 3)   900              288.4             250        28            2013
    CPU   Intel Core 2 Q9550 (Yorkfield)      4      1                            2,830            10.6              95         45            2008
    CPU   Intel Xeon E5-1620 (Sandybridge)    4      1                            3,600            51.2              95         32            2012
    CPU   Intel Xeon E5-2683 v4 (Broadwell)   16     2                            2,100            76.8              120        14            2016
    CPU   Intel Xeon Gold 6146 (Skylake)      12     2                            3,200            128.0             165        14            2017

Table 3.3: GPU Performance

    Benchmark        CPU Opt. (P=1)   GPU Kernel        GPU Total        Transf. Frac.
    neuron           0.014            0.006 (2.33×)     0.031 (0.45×)    0.81
    openCV           0.046            0.014 (3.30×)     0.077 (0.60×)    0.82
    bitcoin miner    0.179            0.007 (24.46×)    0.090 (1.97×)    0.92
    sparcT1 chip2    0.124            0.013 (9.52×)     0.107 (1.16×)    0.88
    LU230            0.100            0.020 (5.02×)     0.108 (0.92×)    0.82
    GEOMEAN          0.068            0.011 (6.17×)     0.076 (0.89×)    0.85

    Time in seconds, speed-up relative to optimized CPU baseline in brackets.

GPU, the GPU implementation is slower, taking 12% longer to complete. The data transfer overhead is significant and dominates overall run-time, on average accounting for 85% of the GPU's wall-clock time.

On a per-benchmark basis it is clear that the GPU implementation's performance varies widely. This is largely tied to the size of the benchmark and the number of levels in the timing graph. The GPU implementation performs better on larger benchmarks, with the bitcoin miner benchmark achieving the largest speed-up (2.0× overall, 24.5× kernel only). Compared to the other benchmarks bitcoin miner has substantially fewer (and wider) levels in the timing graph (it is deeply pipelined with a shallow logic depth). This increases the work per kernel invocation, improving GPU utilization and providing more opportunity to overlap computation and data transfer.

Data transfer dominates the compute time, even though the GPU implementation overlaps computation and data transfer. The small amount of computation and large data transfer time means there is relatively little opportunity for overlap to occur. The largest amount of overlap occurs on bitcoin miner where overlapping data transfer reduced wall clock time by 11%.

Table 3.4 summarizes the achieved GPU speed-ups and compares them with [86]. Comparing only GPU Kernel execution time, our method achieved a larger speed-up (33.3× vs 12.9×) than [86] when compared to baseline CPU implementations. However using an optimized CPU implementation (Section 3.3.1) our achieved speed-up shrinks to 6.2×. This is a smaller speed-up than [86] but is against a more realistic baseline.6 Including data-transfer times (GPU Total) further decreases our speed-up to 0.9×. Data-transfer times are important, since they dominate the Kernel execution time, but are not discussed in [86].7

GPU Conclusion

While our results show that GPUs are amenable to accelerating the core kernels of block-based STA, the data transfer overhead dominates the computation resulting in a net slow down. Fundamentally, unlike

6 The CPU baseline in [86] contained GPU specific modifications, and was not optimized for CPU execution.
7 While it may be possible to transfer only a subset of the data (e.g. edge delays only), this would still require transferring a quantity of data which grows linearly with the timing graph size.

Table 3.4: GPU Speed-up Comparison

                  GPU Kernel vs          GPU Kernel vs       GPU Total vs
                  CPU Baseline (P=1)     CPU Opt. (P=1)      CPU Opt. (P=1)
    This Work     33.3×                  6.2×                0.9×
    CASTA [86]    12.9×

    Blank values unreported.

many popular algorithms accelerated by GPUs, block-based STA is a linear-time algorithm – requiring a linear amount of data to be transferred. This limits (in the Amdahl's law sense) the overall performance of the GPU.8,9 Given that our GPU implementation considered only a simplified STA kernel, a more full-featured STA engine (supporting multiple clocks, more complex delay models and timing exceptions) would likely perform worse, due to increased code-divergence, increased load-imbalance between threads and additional data transfer overhead.

3.5 Parallel STA on CPUs

As shown in Section 3.4, while GPUs show promising performance on the core computational kernels, the overhead of marshalling data between the CPU and GPU is prohibitively expensive. We now focus on parallel STA algorithms targeting multi-core CPUs.10 By parallelizing on a CPU we avoid the data-transfer overhead associated with an off-chip accelerator. This also allows easier integration with existing CPU-based data structures and optimization algorithms, which can call STA a very large number of times.

3.5.1 CPU Algorithm A: Levelized Locks

The first parallel algorithm, Algorithm 2, is a straight-forward parallel extension of the serial baseline (Algorithm 1). The timing graph is still processed one level at a time (Lines 3 and 7), but the nodes within each level are processed in parallel (Lines 4 and 8). To avoid race conditions while updating arrival times (when a node pushes arrival time updates to its successors), we synchronize access using fine-grained per-node locking (Lines 16 and 18). There is no race condition when updating node required times as the uni-directional timing graph ensures only one thread updates a node’s required time (by pulling the required time updates from its predecessors).

3.5.2 CPU Algorithm B: Levelized No-Locks

The observation that the parallel backward traversal can calculate required times without explicit locking synchronization suggests the same can be done while calculating arrival times during the forward traversal. This would reduce synchronization costs and improve scalability. However, to do this efficiently, we must

8 For instance, even if the GPU kernel run-time was reduced to zero the average speed-up would be only 1.05× for the benchmarks in Table 3.3.
9 It is left for future work to investigate whether GPU-based systems where memory is tightly coupled between the host CPU and GPU (e.g. which avoid explicit data transfer via a shared address space) could alleviate the data transfer bottlenecks we observed. However we expect that such systems would require a very high-bandwidth interface between the host CPU and GPU to alleviate the challenges we observed.
10 While techniques to parallelize STA have been explored in industrial settings, to the best of our knowledge the algorithms we developed have not been previously reported in the open literature.

Algorithm 2 CPU Algorithm A: Levelized Locks
Require: G levelized timing graph to analyze
Require: Arr array of initialized arrival times (zero for timing sources, −∞ for other nodes.)
Require: Req array of initialized required times (constraint values for timing endpoints, +∞ for other nodes.)
 1: function ParallelLevelizedWalkLocks(G, Arr, Req)
 2:     L ← NumLevels(G)
 3:     for ℓ ∈ (0 ··· L − 2) do                       ▷ Forward traversal
 4:         parallel for node ∈ LevelNodes(G, ℓ) do
 5:             FwdTraverseNodePushLocks(node)
 6:
 7:     for ℓ ∈ (L − 2 ··· 0) do                       ▷ Backward traversal
 8:         parallel for node ∈ LevelNodes(G, ℓ) do
 9:             BwdTraverseNodePull(node)
10:
11: function FwdTraverseNodePushLocks(nsrc)
12:     for eout ∈ OutEdges(nsrc) do
13:         nsnk ← SinkNode(eout)
14:         edge_arr ← Arr(nsrc) + EdgeDelay(eout)     ▷ Read-only access to source node
15:
16:         Lock(nsnk)                                 ▷ Multiple threads (source nodes) may update sink in parallel
17:         Arr(nsnk) ← Max(Arr(nsnk), edge_arr)       ▷ Push arrival time update to sink
18:         UnLock(nsnk)

modify the timing graph to be bi-directional; each edge in the graph must store both its source and sink nodes. The memory costs associated with this are small and evaluated in Section 3.5.4. The improved levelized walk algorithm is shown in Algorithm 3. As before, the forward traversal

Algorithm 3 Levelized walk with no locks
Require: G levelized timing graph to analyze
Require: Arr array of initialized arrival times (zero for timing sources, −∞ for other nodes.)
Require: Req array of initialized required times (constraint values for timing endpoints, +∞ for other nodes.)
 1: function ParallelLevelizedWalkNoLocks(G, Arr, Req)
 2:     L ← NumLevels(G)
 3:     for ℓ ∈ (1 ··· L − 1) do                       ▷ Forward traversal
 4:         parallel for node ∈ LevelNodes(G, ℓ) do
 5:             FwdTraverseNodePull(node)
 6:
 7:     for ℓ ∈ (L − 2 ··· 0) do                       ▷ Backward traversal
 8:         parallel for node ∈ LevelNodes(G, ℓ) do
 9:             BwdTraverseNodePull(node)
10:
11: function FwdTraverseNodePull(nsnk)
12:     for ein ∈ InEdges(nsnk) do                     ▷ Each edge from a predecessor
13:         nsrc ← SourceNode(ein)                     ▷ Reverse edge to predecessor
14:         edge_arr ← Arr(nsrc) + EdgeDelay(ein)      ▷ Read-only access to upstream source nodes
15:         Arr(nsnk) ← Max(Arr(nsnk), edge_arr)       ▷ Only one thread updates sink node

processes each level serially (Line 3), while nodes within the level are processed in parallel (Line 4). However, a pull (instead of push) based arrival time calculation is performed (Line 5). FwdTraverseNodePull updates the arrival time of a node without requiring locking. It iterates through a node's input edges (Line 12) and uses the bi-directional timing graph to identify the source node driving the edge (Line 13). It then calculates the arrival time from that edge (Line 14) and updates the arrival time for the current node (Line 15).11 Since only a single thread updates each node no locks are required for synchronization.
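The forward traversal of Algorithm 3 maps naturally onto a per-level parallel loop. The following minimal C++ sketch uses OpenMP purely for brevity (the implementations evaluated here use Cilk Plus), and assumes a bi-directional timing graph stored as per-node incoming edge lists; the type and field names are illustrative.

    #include <algorithm>
    #include <vector>

    struct Edge { int src; float delay; };

    struct TimingGraph {
        std::vector<std::vector<int>>  level_nodes;  // node ids at each level
        std::vector<std::vector<Edge>> in_edges;     // incoming edges per node
    };

    // Lock-free levelized forward traversal: within a level each sink node is
    // updated by exactly one thread, which pulls arrival times from its
    // (already finalized) predecessors in the previous levels.
    void forward_traversal(const TimingGraph& g, std::vector<float>& arr) {
        for (std::size_t level = 1; level < g.level_nodes.size(); ++level) {
            const std::vector<int>& nodes = g.level_nodes[level];
            #pragma omp parallel for
            for (long i = 0; i < static_cast<long>(nodes.size()); ++i) {
                int snk = nodes[i];
                for (const Edge& e : g.in_edges[snk]) {
                    arr[snk] = std::max(arr[snk], arr[e.src] + e.delay);
                }
            }
        }
    }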

3.5.3 CPU Algorithm C: Dynamic

While the levelized algorithm presented in Section 3.5.2 eliminates explicit locking, it still relies on implicit barriers where all threads wait after each parallel for in Algorithm 3 to synchronize across levels. Since the amount of work per node may vary, barriers may not be the most efficient form of synchronization.12 To avoid this per-level synchronization we developed a dynamic graph walk algorithm. Instead of walking the graph in a strictly levelized manner (where nodes are processed only when all nodes in the preceding level have been processed), each node is processed whenever its dependencies become satisfied. The detailed algorithm is shown in Algorithm 4. The forward traversal13 is started by invoking in parallel the dynamic forward node traversal function (Algorithm 4 Line 4). However unlike the levelized traversals (Algorithms 1 to 3), this is done only for the initial level of the timing graph. Having been launched in this fashion the traversal proceeds dynamically through the timing graph, with the node traversal function processing nodes as their dependencies become satisfied. After each node’s arrival times are updated (Algorithm 4 Line 10) the node’s downstream dependencies are updated (Algorithm 4 Lines 15 to 17, described in more detail below). If the node was the last outstanding dependency of a downstream node (Algorithm 4 Line 19), then the downstream node is enqueued (forked) for processing (Algorithm 4 Line 20). Note that fork is performed efficiently using Cilk Plus task spawning [89], which puts each forked call into per-thread work-queues, and uses work-stealing to distribute the work-load evenly among threads.

Dependency Tracking

An important aspect of the dynamic traversal algorithm is tracking dependencies during the traversal. To ensure correct results, each node in the timing graph must only be processed after its dependencies (predecessors for the forward traversal, and successors for the backward traversal) have been fully updated. For the forward traversal14 dependencies are tracked using an array of counters (input ready counts in Algorithm 4). Each counter corresponds to a node in the timing graph, and stores the number of predecessor (input) nodes which have been fully processed (i.e. are ‘ready’). When this count corresponds to the number of incoming edges to a node, we know all upstream dependencies have been satisfied and the node can be safely processed. Rather than using high-overhead locks to synchronize access when updating the counters, we opt for a lock-free approach. This uses an atomic compare-and-swap (CAS) operation (Algorithm 4 Line 16) to avoid race conditions when updating the counters. AtomicCAS(value, expected, new) returns true and atomically updates value with new only when value = expected (i.e. no other thread has changed value), otherwise it returns false. Modern processors support efficient AtomicCAS operations in hardware [90]. The atomic load and CAS loop (Algorithm 4 Lines 15 to 17) safely increments the associated downstream node’s counter. In the event another thread updates the counter between when it is loaded (Line 15) and stored (Line 16) the CAS will fail. The current thread will then keep retrying (Line 16)

11 For SSTA, statistical addition/max operators would be used.
12 For example, one node may take substantially longer to process — leaving other threads waiting idle at the barrier.
13 The backward traversal uses the same approach.
14 The backward traversal is similar.

using the updated count (Line 17) until the CAS succeeds.15 This ensures the downstream node's ready counter is incremented atomically, while also ensuring the previous count value (prev_inputs_ready, from before the current thread's increment) is correct. This allows us to detect when the current thread satisfies the last outstanding dependency (Line 19), ensuring the downstream node is enqueued precisely once (Line 20), and as soon as its dependencies are satisfied.
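A minimal C++11 sketch of this compare-and-swap loop is shown below; std::atomic stands in for the AtomicLoad/AtomicCAS primitives of Algorithm 4, and the fanin() and spawn() names in the usage comment are illustrative rather than actual API calls.

    #include <atomic>

    // Atomically increment a downstream node's ready counter and return the value
    // it held before the increment, so the caller can detect when it has satisfied
    // the node's last outstanding dependency.
    int increment_ready_count(std::atomic<int>& counter) {
        int prev = counter.load(std::memory_order_relaxed);
        // compare_exchange_weak may fail spuriously; on failure it refreshes 'prev'
        // with the counter's current value, so we simply retry until the increment
        // is applied exactly once.
        while (!counter.compare_exchange_weak(prev, prev + 1,
                                              std::memory_order_acq_rel,
                                              std::memory_order_relaxed)) {
            // 'prev' now holds the latest counter value; loop and retry.
        }
        return prev;
    }

    // Usage sketch (illustrative names): enqueue the sink node exactly once, when
    // its final input dependency becomes ready.
    //   if (increment_ready_count(input_ready_counts[snk]) == fanin(snk) - 1) spawn(snk);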

Algorithm 4 Dynamic Walk
Require: G timing graph to analyze
Require: Arr array of initialized arrival times (zero for timing sources, −∞ for other nodes.)
Require: Req array of initialized required times (constraint values for timing endpoints, +∞ for other nodes.)
Require: input_ready_counts array of node fanins (zero initialized)
Require: output_ready_counts array of node fanouts (zero initialized)
 1: function ParallelDynamicWalk(G, Arr, Req, input_ready_counts, output_ready_counts)
 2:     L ← NumLevels(G)
 3:     parallel for node ∈ LevelNodes(G, 1) do        ▷ Launch forward traversal
 4:         FwdTraverseNodeDynamic(node)
 5:
 6:     parallel for node ∈ LevelNodes(G, L − 2) do    ▷ Launch backward traversal
 7:         BwdTraverseNodeDynamic(node)
 8:
 9: function FwdTraverseNodeDynamic(n)
10:     FwdTraverseNodePull(n)                         ▷ Update arrival times of node n as previously defined
11:
12:     for eout ∈ OutEdges(n) do                      ▷ Update dependency tracking of nodes downstream of n
13:         nsnk ← SinkNode(eout)
14:
15:         prev_inputs_ready ← AtomicLoad(input_ready_counts[nsnk])
16:         while !AtomicCAS(input_ready_counts[nsnk], prev_inputs_ready, prev_inputs_ready + 1) do
17:             prev_inputs_ready ← AtomicLoad(input_ready_counts[nsnk])
18:
19:         if prev_inputs_ready = NumInEdges(nsnk) − 1 then    ▷ n was last outstanding dependency of nsnk
20:             fork FwdTraverseNodeDynamic(nsnk)               ▷ Enqueue nsnk to be processed
21:
22: function BwdTraverseNodeDynamic(n)
23:     BwdTraverseNodePull(n)                         ▷ Update required times of node n as previously defined
24:
25:     for ein ∈ InEdges(n) do                        ▷ Update dependency tracking of nodes upstream of n
26:         nsrc ← SourceNode(ein)
27:
28:         prev_outputs_ready ← AtomicLoad(output_ready_counts[nsrc])
29:         while !AtomicCAS(output_ready_counts[nsrc], prev_outputs_ready, prev_outputs_ready + 1) do
30:             prev_outputs_ready ← AtomicLoad(output_ready_counts[nsrc])
31:
32:         if prev_outputs_ready = NumOutEdges(nsrc) − 1 then  ▷ n was last outstanding dependency of nsrc
33:             fork BwdTraverseNodeDynamic(nsrc)               ▷ Enqueue nsrc to be processed

15 We use a 'weak' CAS (which may fail spuriously due to hardware implementation details) since it can be more efficient, and we already need to retry the CAS.

3.5.4 CPU Experimental Results

Experimental Methodology

The various STA algorithms are evaluated on the same benchmarks used in Section 3.4.2. All algorithms are implemented in C++, and run on an Intel Xeon E5-1620 (32nm ‘Sandy Bridge’, 4 cores, see Table 3.2). Results are the average across 100 runs. It should be noted that the simplified timing analysis formulation evaluated here performs limited work per-node. Therefore the synchronization overhead is high, limiting the absolute parallel speed-ups. A more full-featured timing analyzer which performs more computation per node will scale better (as shown in Sections 3.6 and 3.8). However, this pessimistic scenario allows us to differentiate and evaluate the relative scalability of these algorithms in a high stress setting.

Results

The performance of the parallel algorithms is shown in Table 3.5. The 'Levelized Locks' algorithm (Section 3.5.1) achieves only limited speed-up (1.13×). Closer examination showed the forward traversal achieved no parallel speed-up – the synchronization enforced by locking effectively serialized the traversal. By avoiding locks with a bi-directional timing graph, 'Levelized No-Locks' (Section 3.5.2) produces better results, achieving an average speed-up of 1.92×. Avoiding the synchronization cost of locks (by ensuring only a single writer per node) improves scalability.

The 'Dynamic' algorithm (Section 3.5.3) performs significantly worse, running 4× slower than the serial algorithm. The cost of tracking when a node's dependencies are satisfied, and enqueuing nodes to be processed, is substantial and dominates the computation.

To determine how close these algorithms come to the best-case performance we implemented an additional 'No Dependencies' algorithm. This algorithm performs the same amount of work as the other algorithms, but ignores the dependencies between timing graph nodes.16 The 'No Dependencies' algorithm achieves an average speed-up of 2.32×. This speed-up is less than linear, indicating that memory (rather than computation and synchronization) is the remaining bottleneck. The 'Levelized No-Locks' algorithm achieves a speed-up only 21% less than 'No Dependencies', showing the levelized approach can extract nearly all of the parallelism available. The remaining difference consists of both the effect of the timing graph's dependency structure which cannot be parallelized, and the overhead of using implicit barriers for synchronization.

The memory required to store bi-directional and uni-directional edges in the timing graph is shown in Table 3.6. While storing bi-directional edges increases the timing graph size by 63% on average, the timing graph is a small fraction of a CAD tool's memory usage. For instance in VPR this corresponds to an increase of less than 0.5%. Combined with the improved speed-ups achieved with 'Levelized No-Locks' this represents a good trade-off.

16 While 'No Dependencies' does not produce a correct analysis (due to race conditions), it produces an upper-bound on the achievable performance since no synchronization is performed.

Table 3.5: Speed-Up of CPU Parallel Algorithms

    Benchmark        Levelized Locks   Levelized No-Locks   Dynamic   No Dep.
                     (P=4)             (P=4)                (P=4)     (P=4)
    neuron           0.98              1.70                 0.26      2.21
    openCV           1.09              1.87                 0.28      2.43
    bitcoin miner    1.22              2.04                 0.27      2.37
    sparcT2 chip2    1.25              1.94                 0.27      2.31
    LU230            1.13              2.06                 0.18      2.31
    GEOMEAN          1.13              1.92                 0.25      2.32

    Speed-up relative to (serial) optimized CPU baseline. P is the number of processor cores.

Table 3.6: Timing Graph Memory Usage

    Benchmark       Uni-Directional Edges   Bi-Directional Edges
    neuron          63.4 (0.72%)            103.8 (1.19%)
    opencv          147.7 (0.71%)           237.9 (1.14%)
    denoise         125.4 (0.87%)           204.0 (1.41%)
    gaussianblur    645.1 (0.78%)           1,071.6 (1.30%)

    Memory in MB. Percentage of VPR's peak memory use in brackets.

3.6 Multi-Corner Timing Analysis

An additional challenge facing digital design is the rapid increase in the number of timing corners to analyze [63], both for sign-off and to drive optimization in multi-corner aware tools [91]. The ‘Naive’ approach is to repeatedly invoke the analysis for each of the T timing corners. However, from a computational perspective, analyzing multiple corners simultaneously should be beneficial, since it increases the amount of work performed per node, better amortizing the data access and synchronization costs.17 Algorithm 5 shows the extension to arrival time calculations to support simultaneous multi-corner analysis.18 The graph traversal is identical across all timing corners, so the only modification is the extension of the single node arrival time to a vector of arrival times (one for each timing corner). The loop at Line 4 then calculates and updates the arrival times for each corner.

Algorithm 5 Multi-Corner Node Traversal

Require: nsnk the timing graph node to process, T number of timing corners
 1: function FwdTraverseNodePullMultiCorner(nsnk)
 2:     for ein ∈ InEdges(nsnk) do
 3:         nsrc ← Source(ein)
 4:         for t ∈ 0 ... T − 1 do                     ▷ For each timing corner t
 5:             α ← Arr(nsrc, t) + Delay(ein, t)
 6:             Arr(nsnk, t) ← Max(Arr(nsnk, t), α)

3.6.1 Parallel

Figure 3.2 illustrates the impact of increasing the number of timing corners (T ) on run-time. Results are again collected on a 4-core Intel Xeon E5-1620 processor (see Table 3.2).

17 This also allows us to easily explore the scaling trends for the algorithms from Section 3.5 as the arithmetic intensity increases.
18 To the best of our knowledge this approach has not been previously published.

Figure 3.2: Multi-corner STA performance for openCV at P = 4. All results except 'Naive' perform a single set of traversals for all timing corners. (Run-time in seconds versus T, the number of timing corners, for the Naive (Separate Traversals), Serial, Serial SIMD, Dynamic, Dynamic SIMD, Levelized No-Locks, and Levelized No-Locks SIMD algorithms.)

All parallel algorithms exhibit similar slopes which are substantially lower than those of the ‘Serial’ algorithm, meaning their speed-ups improve as T (work per node) increases. For instance the ‘Levelized No-Locks’ algorithm improves its speed-up from 1.7× at T = 1 to 3.8× at T = 32 (a nearly linear speed-up with 4 cores). The ‘Dynamic’ algorithm has substantial synchronization overhead (Section 3.5.4), and is always dominated by ‘Levelized No-Locks’. The per-node synchronization performed by the ‘Dynamic’ algorithm substantially outweighs the benefit of avoiding the per-level barriers used by ‘Levelized No-Locks’. It is interesting to note that performing a combined multi-corner analysis is inherently more efficient than the ‘Naive’ approach of traversing the timing graph for each corner analyzed. For example, the ‘Naive’ method runs 32× longer at T = 32 compared to T = 1, while the ‘Serial’ algorithm at T = 32 runs only 22.4× longer (since it traversed the timing graph only once). The difference is more substantial for the parallel algorithms, since their scalability improves as the amount of work increases. For instance, ‘Levelized No-Locks’ at T = 32 runs 8.7× faster than ‘Naive’.

3.6.2 SIMD

Processing multiple timing corners in parallel also exposes potential new data-level parallelism (the loop at Line 4 of Algorithm 5), since each basic operation (addition, max etc.) can be applied element-wise to the vector of arrival/required time values. This can be performed efficiently using the Single Instruction Multiple Data (SIMD) instructions found on many modern processors [88]. The Intel Xeon E5-1620 processor targeted supports 8-way SIMD on 32-bit values. To use SIMD, we manually vectorized the arrival time computation corner loop (Algorithm 5 Line 4) Chapter 3. Tatum: Fast Parallel Timing Analysis 35 using compiler intrinsics.19 This further improves scalability, flattening out the slopes of all algorithms in Figure 3.2. This results in larger speed-ups compared to ‘Serial’, with the ‘Dynamic SIMD’ and ‘Levelized No-Locks SIMD’ algorithms achieving speed-ups of 3.2× and 6.2× respectively at T = 32 using 4 cores.

3.6.3 Analysis within Fixed CAD Run-Time Budgets

In practice STA is often performed within a fixed CAD run-time budget. This could be a fixed portion of run-time dedicated to STA within an optimization tool, or a design engineer's constraint to evaluate a proposed design change (e.g. overnight). In both cases the thoroughness of the analysis is limited by what can fit within the run-time budget. Since the scalability of the parallel techniques improves as the workload increases, this changes the trade-offs. For instance, in the same fixed-time budget required by 'Naive' to analyze 3 corners, 'Serial' analyzes 6 corners, 'Levelized No-Locks' analyzes 21 corners, and 'Levelized No-Locks SIMD' analyzes 32 corners. This means with 4 cores, 'Levelized No-Locks' and 'Levelized No-Locks SIMD' can respectively analyze 7.0× and 10.6× more corners than 'Naive', enabling designers and optimization tools to perform a much more detailed analysis than would otherwise be possible.

3.7 Multi-clock Timing Analysis

So far, we have only considered timing analysis with a single clock. We now consider a full-featured timing analyzer which supports advanced features such as multi-clock timing analysis and timing exceptions. Modern designs make use of a large number of clock domains; for instance, FPGA designs can use tens to hundreds of clock domains [64]. As a result, performing efficient timing analysis in the presence of multiple clock domains is important [65].

The multi-traversal approach to multi-clock timing analysis is shown in Algorithm 6. For each pair of clock domains, the arrival times are initialized (Line 4) on the nodes in the first level of the graph (level 0), and required times are initialized (Line 5) on the last level of the graph (level L − 1). The arrival and required times for the clock domain pair are then propagated through the graph using one of the graph traversal algorithms described in Sections 3.3 and 3.5. The overall complexity of this approach with C clocks and a graph of N nodes is O(C^2 N), since the graph is traversed for all pairs of interacting clock domains (O(C^2)) and the work performed at each of the N timing graph nodes is O(1). However, a major drawback of this approach is that the graph traversal is invoked O(C^2) times, and as discussed in Section 3.3.1, the memory accesses to traverse the timing graph are expensive.

Algorithm 6 Multi-traversal Multi-clock Timing Analysis
Require: G timing graph to analyze, Clocks set of clock domains to analyze
 1: function MultiTraversalSTA(G, Clocks)
 2:     for clklaunch ∈ Clocks do
 3:         for clkcapture ∈ Clocks do
 4:             InitializeLaunchArrival(G, clklaunch)
 5:             InitializeCaptureRequired(G, clkcapture)
 6:             Walk(G)                                ▷ One of the previously described traversal algorithms

19 Compiler-driven auto-vectorization was less efficient, performing 13-42% slower for the various algorithms at T = 32.

An alternate method is the tagged approach, shown in Algorithm 7. Here, instead of tracking only a single arrival/required time at each node during the traversal, there may be multiple arrival/required times which are 'tagged' with their associated clock domains. This allows the graph to be traversed only once: the work performed per node increases from O(1) to O(C^2), but only a single graph traversal is required. While the overall complexity remains O(N C^2), the constant factors are significantly reduced since the expensive memory accesses to the timing graph occur only once and are amortized over multiple clock domains.

The tagged approach can be more efficient when clock domains are localized to only a portion of the timing graph. For instance, in the case where all C clock domains are independent and non-interacting (i.e. each node is part of a single clock domain), each node can be processed in O(1) time, and all C clocks can be analyzed in O(N) time. Similarly, this approach can be more efficient when timing exceptions (such as false paths between parts of the circuit or entire clock domains) reduce cross-clock-domain interactions. We evaluate the performance of the tag-based approach in Section 3.9.

Algorithm 7 Single-traversal Multi-clock Timing Analysis
Require: G timing graph to analyze, Clocks set of clock domains to analyze
 1: function SingleTraversalSTA(G, Clocks)
 2:     for clklaunch ∈ Clocks do
 3:         InitializeLaunchTags(G, clklaunch)
 4:     for clkcapture ∈ Clocks do
 5:         InitializeCaptureTags(G, clkcapture)
 6:     Walk(G)                                        ▷ One of the previously described traversal algorithms

3.8 Tatum

Guided by the preceding results, we developed Tatum, an open-source C++ library suitable for integration with CAD tools. Tatum provides facilities for application defined timing graphs and delay calculation.20 These capabilities facilitate the creation of timing-aware CAD tools and the development of timing-aware optimization algorithms. Tatum supports multi-clock timing analysis and timing exceptions. Tatum uses the most performant CPU parallel algorithm (‘Levelized No-Locks’ from Section 3.5.2), uses the tagged approach (Section 3.7) to perform multi-clock timing analysis, and uses Cilk Plus [89] for parallelization. Cilk Plus was selected since it provides dynamic load-balancing between threads through efficient work-stealing [89]. In contrast, contemporary versions of OpenMP supported only static (i.e. ahead-of-time) partitioning of work between threads, which was found to be less efficient since the work performed per node varies depending on node fan-in/fan-out.21
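A rough sketch of such a levelized, lock-free traversal on top of a work-stealing runtime (shown here with TBB, which Tatum later adopted per the footnote below; the names and structure are illustrative rather than Tatum’s actual internals) is:

#include <functional>
#include <vector>
#include <tbb/parallel_for.h>

// Levels are visited serially in topological order; nodes within a level have
// no dependencies on one another, so each level is a parallel_for with no
// locks or atomics. The work-stealing scheduler balances uneven per-node work.
struct TimingGraph;  // assumed: opaque graph type provided by the application

void forward_traversal(TimingGraph& g,
                       const std::vector<std::vector<int>>& levels,
                       const std::function<void(TimingGraph&, int)>& update_node) {
    for (const std::vector<int>& level : levels) {
        tbb::parallel_for(size_t(0), level.size(), [&](size_t i) {
            // Reads only already-finalized fan-in arrival times.
            update_node(g, level[i]);
        });
    }
}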

3.8.1 Asymptotic Scalability

To evaluate asymptotic parallel scalability we adopt the Work-Span model from [92].

20Tatum’s source code is available at: kmurray.github.io/tatum. 21Tatum was later converted to use Thread Building Blocks library [92], which supports the same work-stealing scheduler as Cilk Plus, and does not require explicit compiler support.

Work-Span Model

The Work-Span model describes a computation as a sequence of work-items with dependencies between them. This forms a Directed Acyclic Graph (DAG) where vertices represent work-items and edges the dependencies between them. Before a work-item can be executed, its dependencies must be satisfied. Intuitively work corresponds to the total amount of work to be performed by an algorithm, and the span is the portion of work the algorithm must perform serially (i.e. the longest path in the DAG of work-items).

Following [92], we define TP as the time for an algorithm to execute with P processor cores. T1 is then the serial execution time,22 and T∞ is the time to execute on a theoretical infinitely parallel machine.

An upper-bound on TP is Brent’s Lemma:

\[ T_P \le \frac{T_1 - T_\infty}{P} + T_\infty \qquad (3.1) \]

The speed-up with P processors is then:
\[ S_P = \frac{T_1}{T_P} = O\!\left(\frac{T_1}{\frac{T_1 - T_\infty}{P} + T_\infty}\right) \qquad (3.2) \]

Work-Span of Static Timing Analysis

STA performs a set of operations on each of the N nodes in the timing graph, which can be topologically sorted into L levels. For all parallel algorithms analyzed

\[ T_1 = \Theta(N) \qquad (3.3) \]
meaning the time-complexity of all algorithms grows linearly with the size of the timing graph, when executed on a single processor. This means the parallel algorithms are work efficient: they perform no additional work compared to a strictly sequential algorithm.

Tatum: Data Parallelism

Tatum takes a data-parallel approach which parallelizes STA operations across the N timing graph nodes. However, the structure of the timing graph imposes an unavoidable degree of serialization to ensure all dependencies are satisfied and no race conditions occur. That is, the span is equal to L, the number of levels in the timing graph.23 As a result

\[ T_\infty = \Theta(L) \qquad (3.4) \]
and the parallel speed-up is then
\[ S_P = O\!\left(\frac{N}{\frac{N-L}{P} + L}\right) \qquad (3.5) \]
which as P → ∞ becomes
\[ S_\infty = O\!\left(\frac{N}{L}\right) . \qquad (3.6) \]
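As a rough worked example (illustrative values, not measurements from this work), consider a timing graph with N = 10^6 nodes and L = 100 levels, an N/L ratio in line with that noted for the Titan benchmarks:
\[ S_{32} = O\!\left(\frac{N}{\frac{N-L}{P} + L}\right) = \frac{10^6}{\frac{10^6 - 100}{32} + 100} \approx 31.9, \qquad S_\infty = \frac{N}{L} = 10^4 . \]
For graphs of this shape the span-based bound is nearly linear at P = 32, suggesting that the sub-linear speed-ups observed in practice are dominated by the synchronization and communication costs the work-span model neglects.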

22 Intuitively, T∞ is the run-time of the inherently unparallelizable work. 23Intuitively, in the limit when N = L no parallel speed-up is possible – since the timing graph would be a serial chain of nodes.

This means that Tatum’s speed-up grows linearly with the size of the timing graph, which is inherently more scalable than OpenTimer’s constant-factor speed-up.24

3.8.2 Comparison to OpenTimer

We next compare Tatum to OpenTimer [82], an ASIC oriented tool which at the time was the only other open-source parallel static timing analyzer.

OpenTimer: Pipeline Parallelism

OpenTimer uses pipelining as its parallelization method. This corresponds to a functional decomposition of STA, where each functional task (e.g. delay calculation, arrival time propagation) is performed in parallel. To ensure all dependencies between tasks remain satisfied and no race conditions occur, each functional task is performed on a different level of the timing graph – forming a parallel pipeline [82]. If all K pipeline stages take an equal amount of time then:

\[ T_\infty = \Theta\!\left(\frac{N}{K}\right) \qquad (3.7) \]
and the parallel speed-up is then
\[ S_P = O\!\left(\frac{N}{\frac{N - N/K}{P} + \frac{N}{K}}\right) \qquad (3.8) \]
which as P → ∞ becomes

\[ S_\infty = O(K) = O(1) \qquad (3.9) \]
since the number of parallel pipeline stages is constant (K = 6 for OpenTimer). Equation (3.9) means that OpenTimer’s pipeline parallelism only speeds up the algorithm by at most a constant factor. Notably, the speed-up does not scale with the size of the problem. In addition, if the time per pipeline stage is not perfectly balanced, the slowest stage will form a bottleneck causing other stages to block until it completes – further limiting scalability.

Numeric Comparison

We evaluate Tatum and OpenTimer on the Tau 2015 ASIC Timing Contest benchmarks [93]. The smallest benchmarks, with fewer than 10^5 timing nodes (N ≤ 10^5), are excluded, as they are too small to benefit from parallelization and are much smaller than modern commercial ASIC designs. We report the self-relative speed-up achieved when fully recalculating the timing graph, averaged over 100 runs. Run-time results were collected with two Intel E5-2683 v4 CPUs (14nm ‘Broadwell’, 16 cores, see Table 3.2), for a total of 32 cores. As Figure 3.3 shows, OpenTimer achieves a maximum speed-up of 1.2×, and does not scale beyond 8 cores. OpenTimer’s speed-up is limited by its fixed task-based pipeline, which yields at most an O(1) constant-factor speed-up as the number of cores increases [92].25 In contrast, Figure 3.3 shows Tatum scales to at least 32 cores, achieving a maximum speed-up of 8.3×. Notably, Tatum’s speed-up increases monotonically with the number of cores unlike OpenTimer.

24 N 4 5 Additionally, the ratio of L is large in practise. For instance, on the order of 10 to 10 for the Titan benchmarks [10]. 25This limited scalability is in agreement with Equation (3.9), although the magnitude of the speed-up is much less than K, likely because of unbalanced work between pipeline stages, and synchronization overhead. Chapter 3. Tatum: Fast Parallel Timing Analysis 39

Figure 3.3: Self-relative parallel speed-up on Tau’15. (Geomean speed-up vs. number of cores P for Tatum and OpenTimer on benchmarks with N > 10^5, with a linear speed-up reference; Tatum reaches 8.3× at 32 cores while OpenTimer peaks at 1.2×.)

Since Tatum’s data-parallel approach scales as O(N/L), this means its speed-up improves with larger design size (as N increases). While Tatum is more scalable than OpenTimer, it scales sub-linearly with the number of cores, which is in agreement with Equation (3.5). This is due to:

• the inherently unparallelizable work (e.g. caused by the span, L in Equation (3.6)), and

• the communication and synchronization costs of real computing systems which are neglected in the work-span model.

3.9 Evaluation in VPR

We next integrated Tatum into the VPR FPGA CAD tool, which repeatedly calls STA to guide optimization [94]. We compare Tatum’s performance (and verify correctness) against the classic VPR 7 timing analyzer [27].26 We report both the run-time spent on STA and total run-time while placing the Titan FPGA benchmarks [10], which are large designs ranging in size from 90K to 1.8M netlist primitives. The Titan benchmarks are synthesized with Intel’s Quartus tool and consist of various FPGA device primitives (e.g. LUTs, FFs, adders, multipliers, RAM slices) [10]. We should note that the Titan benchmarks are substantially larger (153× more timing graph nodes on average) than the Tau’15 benchmarks, and include multi-clock circuits (which OpenTimer does not support). The results were again collected on a system with dual Intel E5-2683 v4 CPUs (32 cores, see Table 3.2).

26VPR 7’s classic analyzer performs a serial levelized graph traversal (Algorithm 1) and uses the multi-traversal approach to multi-clock analysis (Algorithm 6).

Table 3.7: VPR Multi-clock STA Run-Time During Placement

Benchmark      | Timing Graph Nodes | Clocks | Classic VPR | Tatum P = 1     | Tatum P = 32
stereo vision  | 416,747            | 4      | 93.7        | 39.6 (2.37×)    | 4.8 (19.7×)
sparcT1 core   | 513,974            | 3      | 78.9        | 39.6 (1.99×)    | 4.8 (16.5×)
minres         | 1,278,986          | 3      | 268.1       | 127.9 (2.10×)   | 10.5 (25.6×)
sparcT2 core   | 1,566,943          | 3      | 402.6       | 132.7 (3.03×)   | 11.6 (34.6×)
gsm switch     | 2,666,350          | 5      | 979.2       | 273.7 (3.58×)   | 18.6 (52.6×)
LU230          | 3,038,632          | 3      | 1,218.8     | 416.4 (2.93×)   | 40.7 (30.0×)
mes noc        | 3,261,440          | 10     | 1,855.9     | 383.4 (4.84×)   | 29.7 (62.5×)
LU Network     | 3,355,076          | 21     | 1,494.0     | 427.5 (3.49×)   | 28.7 (52.0×)
sparcT1 chip2  | 3,765,406          | 3      | 1,099.4     | 446.4 (2.46×)   | 35.7 (30.8×)
bitcoin miner  | 4,513,409          | 3      | 1,891.4     | 943.6 (2.00×)   | 66.1 (28.6×)
directrf       | 4,902,870          | 3      | 2,365.5     | 1,066.5 (2.22×) | 86.2 (27.4×)
GEOMEAN        | 2,081,093          | 4      | 673.7       | 248.5 (2.71×)   | 21.1 (32.0×)

Time in seconds; speed-up relative to ‘Classic VPR’ in brackets. P = number of cores used for STA; P = 32 uses two sockets.

STA Run-Time

Table 3.7 shows the STA run-time results on the multi-clock subset of Titan benchmarks. With a single core (P = 1) Tatum runs 2.7× faster than Classic VPR,27 showing the tagged approach (Section 3.7) performs better on multi-clock circuits. With 32 cores Tatum runs 32× faster. Figure 3.4 shows how Tatum’s performance scales with the number of cores. Tatum achieves an average speed-up of 15.2× on the full Titan benchmark set with 32 cores, and an average speed-up of 7.7× on the single-clock subset of the Titan benchmarks.

Total Run-Time

Table 3.8 shows the results for total placement run-time. With VPR’s classic timing analyzer, STA accounts for 10% of total run-time on average, with multi-clock circuits spending up to 22% of run-time on STA. When VPR is run using Tatum’s parallel timing analysis, total placement run-time is reduced by 11%.

3.10 Improving Optimization

CAD tools like VPR have been tuned to minimize the amount of run-time spent on STA by performing timing analysis relatively infrequently as they change the implementation [47]. This controls STA run-time, but has an important side effect: the optimization tools are making decisions based on timing information that is stale (not reflective of the latest circuit state). For example, during placement of the gaussianblur Titan benchmark VPR modifies the placement ∼ 747 million times, but only calls timing analysis to obtain updated delay information 150 times (i.e. once every ∼ 5 million modifications). Since Tatum reduces the run-time overhead of maintaining up-to-date timing information it is worth considering whether more frequent timing analysis can improve optimization quality. To this end we modified VPR’s placement algorithm to exploit more accurate timing information.

27Tatum and the Classic VPR timing analyzer produce identical results to within floating point round-off.

Figure 3.4: Tatum Performance Scaling over Classic VPR STA. (Geomean speed-up vs. number of cores P for the Titan23 benchmarks and their multi-clock subset, with a linear speed-up reference; the multi-clock subset reaches 32.0× and the full set 15.2× at 32 cores.)

Table 3.8: VPR Placement Total Run-Time

Benchmark      | Timing Graph Nodes | Clocks | STA Updates | VPR Classic    | VPR Tatum (P = 32) | Tatum Speed-Up
stereo vision  | 416,747            | 4      | 143         | 7.1 (22.0%)    | 5.6 (1.9%)         | 1.26×
neuron         | 451,916            | 1      | 143         | 6.7 (10.7%)    | 6.1 (1.9%)         | 1.10×
sparcT1 core   | 513,974            | 3      | 125         | 8.7 (15.1%)    | 7.5 (1.4%)         | 1.16×
cholesky mc    | 615,925            | 1      | 148         | 10.4 (11.6%)   | 9.4 (2.3%)         | 1.10×
SLAM spheric   | 756,307            | 1      | 170         | 21.3 (8.7%)    | 19.7 (1.5%)        | 1.08×
des90          | 779,404            | 1      | 140         | 14.4 (11.4%)   | 13.0 (1.7%)        | 1.11×
segmentation   | 980,309            | 1      | 193         | 31.3 (9.7%)    | 30.0 (5.8%)        | 1.04×
dart           | 1,137,708          | 1      | 126         | 20.9 (10.3%)   | 19.0 (1.2%)        | 1.10×
stap qrd       | 1,248,502          | 1      | 224         | 60.2 (7.8%)    | 56.2 (1.0%)        | 1.07×
minres         | 1,278,986          | 3      | 135         | 32.1 (13.9%)   | 27.9 (1.0%)        | 1.15×
openCV         | 1,281,693          | 1      | 146         | 25.6 (11.2%)   | 23.0 (1.3%)        | 1.11×
cholesky bdti  | 1,392,239          | 1      | 185         | 40.1 (9.4%)    | 36.8 (1.4%)        | 1.09×
bitonic mesh   | 1,409,553          | 1      | 127         | 32.6 (9.6%)    | 29.8 (1.0%)        | 1.09×
sparcT2 core   | 1,566,943          | 3      | 141         | 58.7 (11.4%)   | 52.2 (0.5%)        | 1.12×
denoise        | 2,068,769          | 1      | 245         | 115.5 (5.7%)   | 112.0 (2.8%)       | 1.03×
gsm switch     | 2,666,350          | 5      | 142         | 109.8 (14.9%)  | 94.0 (0.5%)        | 1.17×
LU230          | 3,038,632          | 3      | 176         | 182.7 (11.1%)  | 163.4 (0.6%)       | 1.12×
mes noc        | 3,261,440          | 10     | 142         | 183.9 (16.8%)  | 153.7 (0.5%)       | 1.20×
LU Network     | 3,355,076          | 21     | 166         | 202.7 (12.3%)  | 178.6 (0.5%)       | 1.13×
sparcT1 chip2  | 3,765,406          | 3      | 143         | 203.9 (9.0%)   | 186.3 (0.4%)       | 1.09×
bitcoin miner  | 4,513,409          | 3      | 272         | 355.8 (8.9%)   | 325.8 (0.5%)       | 1.09×
directrf       | 4,902,870          | 3      | 254         | 489.4 (8.1%)   | 452.0 (0.5%)       | 1.08×
gaussianblur   | 11,554,254         | 1      | 150         | 1,143.5 (1.6%) | 1,127.3 (0.2%)     | 1.01×
GEOMEAN        | 1,591,104          | 2      | 162         | 55.2 (10.0%)   | 49.8 (1.0%)        | 1.11×

Time in minutes; percentage of time spent on STA in brackets. P = number of cores used for STA; P = 32 uses two sockets.

3.10.1 Algorithmic Enhancements

We focus on critical path optimizations during VPR’s final placement optimization stage: the ‘quench’ phase of the anneal, where only modifications which improve quality are accepted. All other placement steps use the standard VPR formulation. During the quench, we call STA after each move (swap of blocks) to ensure the optimizer has a fully accurate view of the move’s timing impact. Interestingly, with VPR’s default move generator and cost formulation this resulted in no net improvement in critical path delay.28 To improve efficiency we biased the move generator to pick blocks which are connected to highly critical connections (criticality > 0.7). We also set a high criticality exponent (critexp = 50) and timing trade-off (λ = 0.99) to focus the optimization on improving the critical path, and used a small region limit (Rlim = 1) to generate small incremental modifications to the circuit.29 We observed that VPR’s connection-based criticality timing cost formulation (Equation (2.10)) [47] fails to provide accurate guidance to reduce the critical path in this final optimization, as its connection oriented approach does not directly evaluate the critical path delay. At this final stage of fine-tuning, focusing on specific paths is key to improving results, and we modified the timing cost function during the quench to directly optimize the circuit’s critical path delay. The resulting cost function (based on Equation (2.11)) used during the quench is then:
\[ \sum_{k \in nets} (1 - \lambda) \cdot HPWL(k) + \lambda \cdot CPD \qquad (3.10) \]
where CPD is the worst critical path delay calculated by timing analysis. Finally, we exit the quench when no moves have resulted in cost improvements for a significant period of time (indicating the system has frozen out and reached a minimum), or the standard VPR move limit is reached.
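A minimal sketch of the modified quench loop is shown below; the helper names are hypothetical stand-ins rather than VPR’s real API, and the sketch simply re-runs STA after every proposed swap so the cost of Equation (3.10) always reflects up-to-date delays:

#include <random>

// Assumed (hypothetical) hooks into the placer; not VPR's actual interfaces.
struct Placement;                                       // current block locations
double wirelength_cost(const Placement&);               // sum of HPWL over nets
double run_sta_and_get_cpd(Placement&);                 // full STA, returns critical path delay
void   propose_critical_swap(Placement&, std::mt19937&); // biased move generator, Rlim = 1
void   revert_last_swap(Placement&);

void quench(Placement& place, double lambda, long move_limit) {
    std::mt19937 rng(0);
    double best = (1.0 - lambda) * wirelength_cost(place)
                + lambda * run_sta_and_get_cpd(place);
    long moves_since_improvement = 0;
    for (long m = 0; m < move_limit; ++m) {
        propose_critical_swap(place, rng);              // pick blocks on critical connections
        double cost = (1.0 - lambda) * wirelength_cost(place)
                    + lambda * run_sta_and_get_cpd(place);  // per-move STA (Equation 3.10)
        if (cost < best) {                              // quench: accept only improvements
            best = cost;
            moves_since_improvement = 0;
        } else {
            revert_last_swap(place);
            if (++moves_since_improvement > 100000) break;  // frozen out: exit early
        }
    }
}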

3.10.2 Results

Table 3.9 compares the impact of using stale and up-to-date timing information while directly optimizing the critical path during the quench. Results were collected on the MCNC20 benchmarks [95] targeting a K = 4, N = 1 (single 4-LUT per logic block) architecture with length one wires, and were run on a machine with two Intel Xeon Gold 6146 (14nm ‘Skylake’, 12 cores), for a total of 24 cores. Reported values are the average over 5 placement seeds. Performing STA during the evaluation of each proposed move (Per-Move STA) improves Critical Path Delay (CPD) by 4% on average, and by up to 8% on some circuits (e.g. elliptic) compared to standard VPR. These CPD improvements are obtained with no degradation in wirelength (WL). To show that these improvements are in fact derived from frequent timing analysis (rather than the algorithmic modifications listed above), we also report results where only a single STA is performed at the start of the quench (Single STA). In this scenario, the optimizer is forced to make decisions using stale timing information. This leads it to over-optimize paths which originally appeared critical, degrading the actual critical path by 5% compared to standard VPR. While we have shown that frequent timing analysis can improve optimization quality, it comes with a run-time overhead – increasing placement run-time by 5.96×. However, this overhead is dominated by STA and can be reduced to 2.39× when Tatum uses 24 cores. This makes the approach suitable for late stages of the design process (such as final timing closure), where increasing tool effort is acceptable in return for improved quality. We expect these results to improve on heavily pipelined designs (which are increasingly common), where up-to-date timing information has also been shown to improve timing [96].

28We suspect that for detailed tweaks as proposed here, VPR’s connection based timing cost function fails to effectively guide the placer to path-level improvements. 29For definitions of these terms see Section 2.4.

Table 3.9: Impact of STA Frequency During Quench

             |              Per-Move STA                                |        Single STA
Benchmark    | WL   | CPD  | Place Time (P = 1) | Place Time (P = 24)   | WL   | CPD  | Place Time
tseng        | 1.00 | 0.97 | 3.80               | 2.75                  | 1.00 | 1.08 | 0.77
dsip         | 1.00 | 0.93 | 2.97               | 1.39                  | 1.00 | 1.07 | 0.77
diffeq       | 1.00 | 1.00 | 4.15               | 2.33                  | 1.00 | 1.02 | 0.90
bigkey       | 1.00 | 0.99 | 3.21               | 1.43                  | 1.00 | 0.99 | 0.82
s298         | 1.00 | 0.95 | 6.16               | 3.38                  | 1.00 | 1.02 | 0.88
frisc        | 1.00 | 0.96 | 9.79               | 3.38                  | 1.00 | 1.14 | 0.90
elliptic     | 1.00 | 0.92 | 11.36              | 3.23                  | 1.00 | 1.08 | 1.01
s38584.1     | 1.00 | 0.98 | 11.07              | 2.44                  | 1.00 | 1.05 | 1.04
s38417       | 1.00 | 0.97 | 12.11              | 2.80                  | 1.00 | 1.04 | 1.01
clma         | 1.00 | 0.94 | 12.75              | 3.02                  | 1.00 | 0.99 | 1.01
ex5p         | 1.00 | 0.97 | 4.45               | 3.32                  | 1.00 | 1.02 | 1.00
apex4        | 1.00 | 0.96 | 4.78               | 2.98                  | 1.00 | 1.09 | 1.05
misex3       | 1.00 | 0.95 | 4.02               | 2.10                  | 1.00 | 1.03 | 0.95
alu4         | 1.00 | 0.97 | 3.20               | 1.62                  | 1.00 | 1.06 | 0.78
des          | 1.00 | 0.99 | 4.27               | 1.84                  | 1.00 | 1.04 | 0.85
seq          | 1.00 | 0.98 | 3.82               | 1.80                  | 1.00 | 1.07 | 0.84
apex2        | 1.00 | 0.97 | 5.23               | 2.17                  | 1.00 | 1.02 | 0.97
spla         | 1.00 | 0.98 | 9.41               | 2.63                  | 1.00 | 1.07 | 1.06
pdc          | 1.00 | 0.95 | 9.87               | 2.57                  | 1.00 | 1.05 | 0.91
ex1010       | 1.00 | 0.97 | 8.51               | 2.34                  | 1.00 | 1.05 | 0.90
GEOMEAN      | 1.00 | 0.96 | 5.96               | 2.39                  | 1.00 | 1.05 | 0.92

Values normalized to VPR default. P = number of cores used for STA; P = 24 uses two sockets.

3.11 Conclusion

Static Timing Analysis is a key element in FPGA design flows and accounts for a significant fraction of design implementation time. This fraction threatens to increase further in the face of growing design sizes and increasingly complex timing conditions. Furthermore, as other algorithms such as placement and routing are parallelized serial STA becomes the dominant run-time component. Therefore speeding-up STA is fundamental to reducing FPGA design cycle times.

To address these issues we have investigated a variety of techniques to speed-up Static Timing Analysis, including data layout optimizations and parallel algorithms on both GPUs and CPUs. While GPUs were shown to offer good kernel speed-ups (average 6.2×), the overhead of data transfer to/from the host CPU negated the performance improvement. Of the CPU parallel algorithms, a level-parallel approach performed best, particularly when synchronization costs are reduced by avoiding expensive locks and atomic operations. This approach achieved an average self-relative speed-up of 1.9× with 4 cores. Analyzing multiple timing corners increases the amount of work per memory access, improving the speed-up to 3.8× with 4 cores. Exploiting SIMD techniques further improves the speed-up to 6.2×. The combination of these methods allows 14.5× more timing corners to be analyzed on 4 cores within a fixed CAD run-time budget.

With this knowledge of how to speed-up the basic operations of a timing analyzer, we moved on to more advanced STA features, describing how tagged arrival/required times improve performance on multi-clock circuits by reducing the number of timing graph traversals. We then introduced the open-source Tatum STA library, supporting efficient parallel timing analysis of complex designs with multiple clocks and timing constraints. Tatum reduces the barrier to producing timing-aware CAD tools, and has been adopted as the timing analysis engine in CGRA-ME [24] and version 8.0 of VTR [2]. We compared Tatum’s performance with OpenTimer, showing that OpenTimer’s parallel approach achieves a maximum 1.2× speed-up at 8 cores, while Tatum’s speed-up increases to 8.3× at 32 cores. We evaluated Tatum’s performance in VPR, showing that it runs on average 15.2× faster than VPR’s classic timing analyzer with 32 cores, and 32× faster on more complex multi-clock circuits; reducing total placement time of the Titan benchmarks by 11%. Finally, we showed how faster timing analysis can be used to improve optimization results, showing that more frequent STA can improve critical path delay by 4%.

As Tatum is an open-source re-usable library, we believe it can be leveraged by other researchers to facilitate the creation of advanced timing-aware CAD tools. Such tools enable new CAD algorithm explorations, and a wider variety of more accurate architectural studies.

3.12 Future Work

Another way to get up-to-date timing information after small perturbations to the design implementation (which can occur during the late optimizations discussed in Section 3.10), is to incrementally update timing information. From an algorithmic perspective this would involve identifying the subset of edges in the timing graph whose delays have changed and updating the affected nodes. However it is unclear how this would impact the performance of the proposed algorithms. For instance, while we found that level-parallel approaches such as Algorithm 3 are most efficient during full timing updates, they may perform more work than necessary during an incremental update, and the more fine-grained parallel algorithms like the dynamic approach (Algorithm 4) may be more efficient. Likely this trade-off will depend on the number of nodes which must be updated during the incremental update, but further research is needed to explore the trade-offs in this space. Other directions for future work include extending Tatum to support more advanced features such as modelling on-chip-variation and performing clock pessimism removal.

Chapter 4

Extended Static Timing Analysis: Quantifying Risks of Input-Dependent Timing Errors

As discussed in the prior chapter, timing analysis is a key step in the digital design process. However, where Chapter 3 was concerned with rapid timing analysis which scales to large designs and guides optimization efficiently, in this chapter we look for less pessimistic timing analysis, even at the expense of more computation. By modelling device delay variations Statistical Static Timing Analysis (SSTA) reduces pessimism compared to traditional Static Timing Analysis (STA). However, like STA, it ignores the circuit’s logic, which causes some timing paths to never, or only rarely, be sensitized. We introduce a general timing analysis approach and tool to calculate the probability that individual timing paths are sensitized, enabling the calculation of bounding delay distributions over all input combinations. We show how this analysis is related to the well-known #SAT problem and present approaches to improve scalability, achieving results that are on average 75% to 37% less pessimistic than Static Timing Analysis, while running 569 to 16 times faster than Monte-Carlo timing simulation and delivering stronger guarantees.

4.1 Introduction

Timing analysis, the process of determining whether a circuit will operate correctly at speed, is a key step in the digital design process. The conventional approach for performing timing analysis is Static Timing Analysis (STA) [97]. However, with smaller manufacturing process technologies the worst case and average case delay can deviate significantly, causing STA to be very pessimistic [98, 78]. Statistical Static Timing Analysis (SSTA) was introduced to reduce this pessimism, by modeling device and interconnect delay variations. This allows designers to sacrifice a small amount of delay coverage (and resulting yield) for considerable performance improvement [78]. However, both STA and SSTA assume the worst case switching behaviour across all input combinations. We propose a complementary approach which quantifies the stochastic variation across inputs rather than across delays, with the aim of sacrificing a small amount of input combination coverage to achieve

considerable performance improvement on the remaining combinations. This is motivated by applications amenable to approximate computing, where small errors may be acceptable (e.g. decoding lossy video), correctable (e.g. by a higher level algorithm), or where the input data is inherently noisy (e.g. sensor readings) and extreme accuracy is unnecessary [99]. By running systems beyond their strictly robust operating regimes, it is hoped that better trade-offs between power, area and performance can be achieved. For instance [25] studied the impact of ‘overclocking’ arithmetic operators beyond their ‘maximum’ safe operating frequencies1 for improved performance, and inspired changes to the architecture of arithmetic components [100]. However designing such systems is challenging as their behaviour is difficult to analyze. This is particularly true at the circuit-level where conventional tools like STA and SSTA assume worst-case switching behaviour to ensure exhaustive coverage of input combinations. In [25] timing analysis was performed by hand, a method which is time consuming, error prone and not scalable. To the best of our knowledge there has been no generic approach to the timing analysis of this kind of design. We aim to address this design capability gap by developing a generalization of STA. Instead of assuming worst-case switching behaviour to generate a longest path delay bound as in STA, we determine a bounding delay distribution over all input combinations.2 To simplify matters we initially consider the case where the set of inputs at cycles i and i + 1 are independent, identically uniformly distributed and statistically stationary. We later describe how these restrictions can be lifted. Our key contributions include:

• A new timing analysis formulation to determine bounding delay distributions across input combinations,

• Methods to model non-uniform and correlated inputs,

• Reduction of the analysis to #SAT,

• Techniques to improve scalability on real circuits, and

• Experimental comparison with Monte-Carlo simulation.

Section 4.2 discusses background and related work. Sections 4.3 and 4.4 present our formulation and implementation. Section 4.5 describes how non-uniform input probabilities and correlations can be modelled. Sections 4.6 and 4.7 describe the experimental methodology and results. Section 4.8 concludes and Section 4.9 outlines future work.

4.2 Background & Related Work

The conventional approach for performing timing analysis is STA [97]. STA performs a pessimistic analysis: it considers only the topological structure of the circuit, assuming that all signals switch every cycle. This topological structure is stored as a timing graph (Definition 4). We denote the number of timing sources by I. An example timing graph is shown in Figure 4.1. The timing sources are inputs a and b (I = 2), and the output e is the single timing endpoint.

1In practice non-error tolerant signals (e.g. control signals) must have sufficient slack to operate reliably at the overclocked frequency. 2This differs from conventional SSTA, where delay distributions are derived from device and interconnect delay variations.

Table 4.1: Path-Delay Distribution (uniform input transition probability).

Active Path             | Delay | Probability
b → c → d → e           | 3.0   | 0.125
a → d → e               | 2.0   | 0.1875
b → e                   | 1.0   | 0.3125
None (constant output)  | 0.0   | 0.375

Figure 4.1: Example circuit and timing graph with annotated delays.

We are interested in setup time (or maximum delay) analysis, as setup time limits the clock speed of circuits. For setup analysis STA calculates the delay of the longest (or critical) timing paths (Definition 5), by calculating the latest (worst-case) arrival time (Definition 1) of signals at each node in the graph (that is, the latest time the signal can become stable). In Figure 4.1, the path b → c → d → e is the critical timing path, with a delay of 3.0 units. Conventional STA always performs a robust analysis (never underestimating delay), but can be quite pessimistic in practice. There are two primary sources of pessimism: the use of worst-case delays and the assumption of worst-case switching behaviour (e.g. ignoring logic functions). SSTA [78] has been developed to address the pessimism introduced by worst-casing delay values, which becomes particularly problematic in the face of increasing device and interconnect variation. SSTA directly models the statistical variation of device and interconnect delays, calculating delay distributions rather than the fixed worst-case delays used by conventional STA. Directly modelling delay variations reduces pessimism since worst-case delay combinations (which are unlikely to occur) can be ignored.3 Both STA and SSTA assume worst-case switching behaviour, by assuming all signals switch every clock cycle. In real operation this assumption does not hold. The most obvious cases are false paths, timing paths which are impossible to exercise in practice. An example is shown in Figure 4.2. There has been a variety of previous work investigating these issues. In [101] a framework for comparing false path identification criteria is presented. [102] presents algorithms for detecting near-critical false paths. Algebraic Decision Diagrams (ADDs) are used in [103] to identify and eliminate false paths. [104] describes how Automatic Test Pattern Generation techniques can be applied to a transformed version of the circuit to identify false paths. In [105] a circuit’s true delay bounds are determined by checking whether critical paths are sensitizable using CNF-based SAT solvers. However, unlike this work, these works consider only false path identification and elimination. They do not consider the probability of (true) paths being activated. In [106], toggle rate information (similar to vector-less power estimation) is used to adjust the results of SSTA to reduce pessimism. However only the average toggling behaviour (i.e. across many cycles) is considered and the toggling of individual timing paths cannot be distinguished. The problem of re-convergent paths which cause correlations is also not addressed.

3In practice, the designer chooses an acceptable level of timing yield for their design, accepting the failure of some devices.

Figure 4.2: The path b → d → e is a false-path; the common sel means transitions on b can never propagate to e after the other paths have stabilized.

4.3 Extended Static Timing Analysis Formulation

To present the Extended Static Timing Analysis (ESTA) formulation we begin by defining some terminology. Definition 8 (Path Sensitization Probability) The probability of a timing path experiencing a transition during a single clock cycle.

By associating a delay with each path sensitization we can build path-delay distributions: Definition 9 (Path-delay Distribution) A set of paths, delays, and their associated sensitization probabilities.

Table 4.1 shows the path-delay distribution (determined by exhaustive timing simulation) for the circuit in Figure 4.1 assuming uniform input transition probability, where rise, fall, static high, and static low each have 25% probability (i.e. each case is equally likely in a given clock cycle). We observe that the longest path is active only 12.5% of the time, much less likely than other paths. This occurs when b undergoes a falling transition (p = 0.25), and a simultaneously undergoes either a static high or rising transition (p = 0.5). Interestingly no timing paths are active 37.5% of the time since the output (e) remains constant. A path-delay distribution is different from the delay distribution produced by SSTA. Under SSTA the delays are distributed according to a statistical delay model (i.e. due to device and interconnect delay variation), while in ESTA delays are distributed according to the probability of individual timing paths being activated.4 We can now define the ESTA problem which we focus on for the remainder of this chapter: Definition 10 (Extended Static Timing Analysis Problem) Determine the path-delay distributions at all timing endpoints.

4.3.1 Transition Model

To analyze timing path behaviour, we must model the signal transitions which can occur. We define a transition as: Definition 11 (Transition) The state change of a signal value between the start and end of a clock cycle.

The transitions we consider are shown in Table 4.2 which correspond to a transition mode [107] model with four types of transitions: R (rising), F (falling), H (high), and L (low).

4While we use a deterministic delay model for each timing graph edge in this work, it does not preclude the use of a statistical delay model.

Table 4.2: Signal Transitions. Initial/Final correspond to signal values at the start/end of a clock cycle.

Initial State | Final State | Transition
0             | 1           | R
1             | 0           | F
1             | 1           | H
0             | 0           | L

Table 4.3: AND Gate Transitions (output transition c for input transitions a and b).

       | a = R | a = F | a = H | a = L
b = R  | R     | L     | R     | L
b = F  | L     | F     | F     | L
b = H  | R     | F     | H     | L
b = L  | L     | L     | L     | L

Figure 4.3: AND gate.

As an example, consider the AND gate in Figure 4.3. For this simple circuit we can enumerate the possible output transitions, as shown in Table 4.3. Note that circuits may have glitches, where the signal value changes (potentially multiple times in a clock cycle) before settling to its final value. In ESTA, these temporary glitches are modelled by the logical transition which occurs and an appropriate delay. For example, Figure 4.4 shows a high signal which temporarily glitches before settling to low. We modelled this as a F transition (initial state high, final state low) with arrival time 1 + δ. Note that as a result of glitches, logically static transitions (i.e. H or L) are modelled as having a (potentially non-zero) delay (i.e. the delay after which they no longer glitch).
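A small self-contained sketch (illustrative names, not the ESTA tool’s code) of mapping input transitions to an output transition by evaluating a gate’s logic function at the initial and final input values is shown below; for a 2-input AND it reproduces Table 4.3 (e.g. inputs H and R give R):

#include <functional>
#include <vector>

enum class Trans { R, F, H, L };

bool initial_value(Trans t) { return t == Trans::F || t == Trans::H; }
bool final_value(Trans t)   { return t == Trans::R || t == Trans::H; }

// Evaluate the gate logic g() at the initial and final input values and
// classify the resulting output state change as a transition.
Trans output_transition(const std::function<bool(const std::vector<bool>&)>& g,
                        const std::vector<Trans>& input_transitions) {
    std::vector<bool> init, fin;
    for (Trans t : input_transitions) {
        init.push_back(initial_value(t));
        fin.push_back(final_value(t));
    }
    bool v0 = g(init), v1 = g(fin);      // output before and after the cycle
    if (!v0 &&  v1) return Trans::R;
    if ( v0 && !v1) return Trans::F;
    if ( v0 &&  v1) return Trans::H;
    return Trans::L;
}

// Example: auto and2 = [](const std::vector<bool>& v){ return v[0] && v[1]; };
//          output_transition(and2, {Trans::H, Trans::R}) == Trans::R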

4.3.2 Using #SAT to Calculate Probabilities

To calculate a path-delay distribution we need to determine the delays and sensitization probabilities of different timing paths. The delay of a path can be calculated by traversing the timing graph and adding up the annotated delays (as in conventional STA), but calculating path sensitization probabilities is more involved. We first define an activation function: Definition 12 (Activation Function) A Boolean function which evaluates to true whenever a path is sensitized by a transition.

Figure 4.4: ESTA models the glitch between t = 1 and t = 2 as a F transition with arrival time 1 + δ.

Note that for our purposes (as described in Section 4.3.1) it is necessary to model logically static values as transitions to capture the impact of glitches. For example, if we had two events A and B which together produced an event C, we could define the activation of event C as:

\[ f_c = f_a \wedge f_b \qquad (4.1) \]
where fa and fb are the activation functions for events A and B. fc is the logical conjunction of fa and fb since event C occurs only when both A and B occur together. Given a Boolean activation function f with support size |f| (i.e. the number of Boolean variables f depends on) we can calculate its sensitization probability as:

\[ p = \frac{\#SAT(f)}{2^{|f|}} \qquad (4.2) \]
where #SAT(f) represents the number of satisfying assignments to f, and 2^|f| represents the total number of possible assignments to a Boolean function of |f| variables. #SAT is a well established problem in theoretical computer science closely related to Boolean satisfiability (SAT) [108]. Where SAT seeks to find a satisfying assignment to a Boolean function, #SAT seeks the number of satisfying assignments. There are a number of algorithms for solving #SAT which are more efficient than naively enumerating the satisfying assignments with SAT [108]. Provided we can build appropriate activation functions for the paths under analysis we can use Equation (4.2) to calculate their sensitization probabilities. This generalizes previous work, in that a path with zero sensitization probability is by definition a false path.
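For illustration, Equation (4.2) can be evaluated by brute-force enumeration of the truth table (the ESTA tool instead counts minterms on a BDD); the following sketch uses a plain predicate over bit-packed assignments:

#include <cstdint>
#include <functional>

// Sensitization probability = #SAT(f) / 2^|f|, computed by enumeration.
double sensitization_probability(const std::function<bool(uint64_t)>& f,
                                 unsigned support_size) {
    uint64_t num_sat = 0;
    uint64_t num_assignments = uint64_t(1) << support_size;  // 2^|f|
    for (uint64_t assignment = 0; assignment < num_assignments; ++assignment) {
        if (f(assignment)) ++num_sat;                         // count satisfying assignments
    }
    return double(num_sat) / double(num_assignments);
}

// Example: f = x1 AND x2 (support size 2) has probability 1/4:
//   sensitization_probability([](uint64_t a){ return (a & 1) && (a & 2); }, 2);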

4.3.3 Combining Activation Functions

We can now define a timing tag, which intuitively corresponds to the delay of a transition along a particular path:

Definition 13 (Timing Tag) A tuple (τ, ν, f) ∈ ℝ × {R, F, H, L} × (B^Q → B), corresponding to a path and transition combination. τ is the arrival time, ν the signal transition, f the activation function, and Q is the support size of f.

For example, a tag (15, R, x1 ∧ x2) corresponds to a rising transition with an arrival time of 15 units, which occurs only when the Boolean function x1 ∧ x2 evaluates true. Since f is a general Boolean function encoding all the scenarios where the timing tag applies, false and re-convergent paths can be accounted for by appropriately constructing f.
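A timing tag maps naturally onto a small record; the sketch below (illustrative types, with the activation function shown as a plain predicate rather than the tool’s BDD representation) encodes the example tag (15, R, x1 ∧ x2):

#include <cstdint>
#include <functional>

enum class Trans { R, F, H, L };

struct TimingTag {
    double tau;                           // arrival time
    Trans  nu;                            // signal transition
    std::function<bool(uint64_t)> f;      // activation function over Q variables
};

// The example tag (15, R, x1 ∧ x2), with bit 0 = x1 and bit 1 = x2:
TimingTag example{15.0, Trans::R,
                  [](uint64_t x) { return (x & 1) && (x & 2); }};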

Consider the AND gate from Figure 4.3. If we have two timing tags ta and tb arriving at the gate inputs and a gate delay of δAND we can construct the corresponding output tag tc as:

\[ t_c = \big( \delta_{AND} + \max(t_a.\tau,\ t_b.\tau),\ \ AND(t_a.\nu,\ t_b.\nu),\ \ t_a.f \wedge t_b.f \big) \qquad (4.3) \]
where δAND + max(ta.τ, tb.τ) is the latest arrival time of a transition at the output, AND(ta.ν, tb.ν) is the resulting transition (e.g. determined from Table 4.3), and ta.f ∧ tb.f is the logical conjunction (AND) of the input activation functions.

More generally for a K-input gate with delay δgate implementing the logic function g(x1, x2, . . . , xK ) and incoming tags t1, t2, . . . , tK the output tag tgate can be defined as:

\[ t_{gate} = \big( \delta_{gate} + \max(t_1.\tau, t_2.\tau, \ldots, t_K.\tau),\ \ G(t_1.\nu, t_2.\nu, \ldots, t_K.\nu),\ \ t_1.f \wedge t_2.f \wedge \ldots \wedge t_K.f \big) \qquad (4.4) \]
where G(ν1, ν2, . . . , νK) is the transition function derived from the logic function g().5 Equation (4.4) produces an STA-like delay estimate which is a safely pessimistic upper-bound, ensuring no paths will be underestimated. The activation function is specified as the logical conjunction of all the incoming tag activation functions, since all the tags must arrive to generate the corresponding output transition and arrival time.

4.3.4 Conditioning Functions

Equation (4.4) describes how to propagate timing tags through a gate (or wire6), allowing us to construct timing tags – including their associated activation functions – by walking through the timing graph. However we still require some base activation functions at timing sources, and a method to specify their transition probabilities. We can accomplish this by defining a set of Boolean conditioning functions at each timing source. Intuitively these functions model the statistical behaviour of the circuit’s primary inputs. For the simplest case of uniform probability we can define the following conditioning functions:

\[ f_R(x, x') = \neg x \wedge x' \qquad f_F(x, x') = x \wedge \neg x' \qquad f_H(x, x') = x \wedge x' \qquad f_L(x, x') = \neg x \wedge \neg x' \qquad (4.5) \]
where intuitively x and x′ represent the current and next state of the source. Each function in Equation (4.5) corresponds to a particular transition occurring on the source. Note each conditioning function in Equation (4.5) is satisfied 25% of the time (e.g. #SAT(f_R)/2^{|f_R|} = 1/4), assuming uniformly random x and x′. This yields a uniform probability for each transition. In general arbitrary conditioning functions can be used, allowing for non-uniform probabilities and correlations between sources. This is discussed in more detail in Section 4.5. By combining Equation (4.4) with conditioning functions such as those in Equation (4.5) we can propagate timing tags from timing sources to timing endpoints. The probabilities of the tags at all timing endpoints can then be calculated using Equation (4.2) to construct the path-delay distributions – completing the ESTA analysis.

4.4 ESTA Implementation

We have developed a tool to perform ESTA.7 The tool is written in C++ and uses Binary Decision Diagrams (BDDs) [109] (via the CUDD library [110]) to represent the netlist logic and timing tag activation functions. BDDs allow easy manipulation of Boolean functions and enable #SAT to be solved efficiently once the BDDs are constructed [108].

5G() can be determined by evaluating g() twice; first at the initial and then at the final values of the input transitions (e.g. first 0, then 1 for a R input transition). 6Wires can be treated as single-input ‘gates’ implementing logical identity. 7The source code is available from github.com/kmurray/esta.

4.4.1 Calculating Timing Tags

The basic procedure to calculate a node’s output timing tags is shown in Algorithm 8. Provided with the set of tags arriving at each of the K inputs, we enumerate the Cartesian product of the input tag sets (Line 3) thereby ensuring all possible cases of input transitions and arrival times are considered.8 Lines 4 to 6 evaluate a specific set of InputTags (one tag per input) according to Equation (4.4). For each case the resulting tag is recorded (Line 7), and the full set of output tags returned (Line 8) for use by downstream nodes.

Algorithm 8 Basic ESTA Node Traversal
Require: Intags(1), . . . , Intags(K) sets of tags on each input, δ input to output delay, G node logic function
 1: function TraverseNode(Intags(1), . . . , Intags(K), δ, G)
 2:   OutTags ← ∅
 3:   for each InputTags ∈ Intags(1) × . . . × Intags(K) do
 4:     τ ← δ + Max(InputTags[1].τ, . . . , InputTags[K].τ)
 5:     ν ← G(InputTags[1].ν, . . . , InputTags[K].ν)
 6:     f ← InputTags[1].f ∧ . . . ∧ InputTags[K].f
 7:     OutTags.Append((τ, ν, f))
 8:   return OutTags

After all nodes in the timing graph have been processed the tags at all endpoints can be evaluated with #SAT to build the path-delay distributions.9

4.4.2 Input Filtering

Consider the timing diagram in Figure 4.5 for the AND gate from Figure 4.3. Initially, both inputs (a, b) are high, producing a high output (c). At t = 1 input a falls, producing a falling transition on the output c with some delay. The later transition on input b at t = 2 produces no change in the output c, since input a remained low controlling the output. In this case input a can be said to ‘filter’ transitions on input b. While Algorithm 8 handles these cases correctly, it does so in an unnecessarily pessimistic manner, always using the latest arrival time, even if the associated transition would be filtered and have no effect. To counteract this, as each input arrives we restrict the node logic function to the input’s post-transition value. We can then use Boolean difference to identify subsequently arriving input transitions which do not affect the output. Such input tags are ignored during arrival time calculation, removing the unnecessary pessimism. It should be noted that this approach maintains the monotone speed-up property10 necessary for a robust analysis [112]. While input filtering is performed in a delay dependent manner it is safe since:

• Decreasing the delay of an input transition can only increase the amount of filtering performed (since the signal would be stable earlier).

• Increased filtering can only decrease the delay of a resulting output transition (if an additional late arriving input transition was filtered).

• Input filtering never prevents a transition from propagating through the circuit.

Together these ensure that the delay of any timing path can never increase if a component delay decreases, and that all timing paths are considered during the analysis – none are ‘dropped’ in a delay-dependent manner.

Figure 4.5: Input filtering example. The arrival time at c is 1 + δ, but STA or a naive ESTA implementation will report 2 + δ.

8This can result in a large number of tags for nodes deep in the timing graph. Sections 4.4.2 to 4.4.4 discuss techniques to safely reduce the number of tags considered. 9While our implementation always performs a full timing analysis, it could be extended to perform an incremental timing analysis using the same techniques as conventional STA [111, 82]. 10Monotone speed-up ensures that decreasing the delay of circuit element(s) cannot increase the critical path delay.

4.4.3 Tag Merging

In the worst case Algorithm 8 can produce O(ℓ^K) output tags, where ℓ is the maximum number of tags across all K inputs. While the number of tags produced is often smaller in practice it can still grow large – particularly since the output tags of a node become the inputs to subsequent nodes.

To counteract this rapid growth, we can merge tags together. Suppose we have the tags t1, t2, . . . , tn with the same transition ν. We can produce a new tag tmerged which approximates the original tags:

\[ t_{merged} = \big( \max(t_1.\tau, t_2.\tau, \ldots, t_n.\tau),\ \ \nu,\ \ t_1.f \vee t_2.f \vee \ldots \vee t_n.f \big) . \qquad (4.6) \]

We ensure a safely pessimistic approximation by using the maximum arrival time, and only merging tags with the same transition.11 The activation function of the merged tag is the logical disjunction (OR) of the original tags, since any of the tags could produce this bounding transition. Note the approximation is exact if the original tags have the same arrival time. Our implementation always merges such tags.
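A minimal sketch of the merge in Equation (4.6) is shown below, again representing activation functions as plain predicates for illustration rather than the tool’s BDDs:

#include <algorithm>
#include <cstdint>
#include <functional>
#include <vector>

using Activation = std::function<bool(uint64_t)>;
enum class Trans { R, F, H, L };

struct Tag { double tau; Trans nu; Activation f; };

// Merge tags that share the same transition: latest arrival time (pessimistic
// bound) and logical disjunction of the activation functions.
Tag merge_tags(const std::vector<Tag>& tags) {
    Tag merged = tags.front();                       // precondition: same 'nu' for all
    for (size_t i = 1; i < tags.size(); ++i) {
        merged.tau = std::max(merged.tau, tags[i].tau);
        Activation a = merged.f, b = tags[i].f;
        merged.f = [a, b](uint64_t x) { return a(x) || b(x); };  // OR of activations
    }
    return merged;
}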

4.4.4 Run-Time/Accuracy Trade-Offs

Although precise tag merging helps, for larger circuits the number of tags (and hence run-time) can grow prohibitively large. Accordingly we have also developed several different techniques to trade-off accuracy for reduced run-time. They all rely on Equation (4.6) to safely merge tags.

11Since the arrival time defines the point at which a signal is stable (i.e. after glitching), taking the maximum is safely pessimistic even for signals which glitch.

Fixed Delay Binning: Merges output tags with the same transition, and delay within the same delay bin. For a delay bin size of d, the delay bins are defined as [k · d, (k + 1) · d), k ∈ N. As an example, consider a set of five tags with the same transition and delays 75, 80, 110, 120, 155. Performing Fixed Delay Binning with d = 100 would produce two tags with delays 80 and 155.

Adaptive Binning: Merges input tags to limit the size of the Cartesian product evaluated at a node to at most m. This is performed by iteratively re-binning the node’s input tags at increasing bin-sizes, until the size of the Cartesian product drops below m.

Percentile Binning: Merges output tags which cannot generate s-th percentile critical paths. For example, s = 0.05 would ensure no merging occurred on the top 5% of critical paths, while all other tags would be merged. This allows a detailed analysis to be efficiently performed for the near-critical paths of interest. This approach requires conventional STA to be run first to determine the worst-case required times at each node, and infer which tags can be merged during ESTA without affecting the top s percentile of paths.

In practice these various techniques can be used in combination, and many other binning formulations could be developed.
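As an illustration of the Fixed Delay Binning scheme described above (illustrative types; activation functions omitted for brevity), tags sharing a transition and a delay bin can be collapsed to the latest-arriving representative:

#include <algorithm>
#include <cmath>
#include <map>
#include <utility>
#include <vector>

enum class Trans { R, F, H, L };
struct Tag { double tau; Trans nu; };  // activation functions omitted in this sketch

// Merge tags with the same transition whose delays fall in the same bin
// [k*d, (k+1)*d), keeping the pessimistic (maximum-delay) representative.
std::vector<Tag> bin_tags(const std::vector<Tag>& tags, double d) {
    std::map<std::pair<Trans, long>, Tag> bins;   // key: (transition, bin index)
    for (const Tag& t : tags) {
        long k = static_cast<long>(std::floor(t.tau / d));
        auto key = std::make_pair(t.nu, k);
        auto it = bins.find(key);
        if (it == bins.end()) bins.emplace(key, t);
        else it->second.tau = std::max(it->second.tau, t.tau);
    }
    std::vector<Tag> out;
    for (const auto& kv : bins) out.push_back(kv.second);
    return out;   // e.g. delays {75, 80, 110, 120, 155} with d = 100 -> {80, 155}
}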

4.4.5 Enhanced Algorithm

Algorithm 9 shows the procedure for traversing nodes including the enhancements from Sections 4.4.2 to 4.4.4. Lines 3 to 5 calculate the Cartesian product of tags, limiting the number of cases evaluated to at most m. To perform input filtering we first sort the K input tags of a single case by ascending order of arrival time (Line 10).12 We then check whether the transitions associated with each input can affect the output (Line 12). If a transition does affect the output (is not filtered) we ‘restrict’ the node’s logic function13 (Line 13), so the current input’s stable post-transition value will be considered when filtering later arriving inputs. Finally, we bin the resulting tags into bins of size d (Line 17). In our implementation activation function construction (Line 15) is performed lazily. As a result the cost of constructing activation function BDDs is incurred only during the final #SAT evaluations, and not during the graph traversal.

4.4.6 Computational Complexity

The computational complexity of our ESTA implementation is dominated by two components: evaluating tags, and performing #SAT.

The complexity of manipulating BDDs is O(2^{n_vars}), where n_vars is the total number of Boolean variables in the BDD. The size of a node’s Cartesian tag product is O(L^K), where K is the maximum number of inputs to any node, and L is the maximum number of tags on any node input. Sorting the input tags for filtering takes O(K log K) time, and manipulating the node’s logic function (as a BDD) takes O(2^K) time. The complexity of evaluating a single node is then O(L^K (2^K + K log K)). This must be done across all n nodes in the timing graph, taking O(n L^K (2^K + K log K)) time.

12This enforces causality by ensuring that only stable inputs can filter subsequently arriving transitions. 13This is a commonly supported operation in logic manipulation packages (such as CUDD [110]), which effectively fixes a variable to a constant value and propagates the constant through the function. For example, if the logic function was f(x1, x2, x3, . . . , xn) = x1 ∧ x2 ∧ x3 ∧ · · · ∧ xn, and x2 was fixed/restricted to 0, the restricted logic function would be f′(x1, 0, x3, . . . , xn) = 0.

Algorithm 9 Enhanced ESTA Node Traversal
Require: Intags(1), . . . , Intags(K) sets of tags on each input, δ input to output delay, G node logic function, m maximum Cartesian product size, d delay bin size
 1: function TraverseNode(Intags(1), . . . , Intags(K), δ, G, m, d)
 2:   Outtags ← ∅
 3:   AllCases ← Intags(1) × . . . × Intags(K)    ▷ Cartesian product
 4:   if |AllCases| > m then    ▷ Limit product size to m
 5:     AllCases ← ReduceProduct(AllCases, m)
 6:   for InputTags ∈ AllCases do
 7:     τ ← ∅
 8:     ν ← G(InputTags[1].ν, . . . , InputTags[K].ν)
 9:     f ← True
10:     SortAscendingArrival(InputTags)
11:     for InputTag ∈ InputTags do    ▷ Each node input
12:       if not Filtered(G, InputTag) then
13:         G ← Restrict(G, InputTag)    ▷ Constrain input
14:         τ ← Max(τ, δ + InputTag.τ)    ▷ Update delay
15:         f ← f ∧ InputTag.f    ▷ Update activation function
16:     Outtags.Append((τ, ν, f))
17:   Outtags ← ReduceTags(Outtags, d)    ▷ Delay binning
18:   return Outtags

To solve #SAT we must construct BDDs which in the worst case takes O(2^Q) time, where Q is the total number of Boolean conditioning function variables. The resulting complexity of ESTA is
\[ O\!\left(n L^K (2^K + K \log K) + 2^Q\right) . \qquad (4.7) \]

In typical circuits K is bounded by a small constant (e.g. 6), and the complexity simplifies to

\[ O\!\left(n L^K + 2^Q\right) . \qquad (4.8) \]

If we define q as the number of conditioning function variables per timing source then Q = qI. For the conditioning functions in Equation (4.5) q = 2 and the complexity becomes

\[ O\!\left(n L^K + 4^I\right) . \qquad (4.9) \]

L can be controlled using the methods in Section 4.4.4, and as a result run-time is typically dominated by BDD construction for #SAT. However in practice decreasing L also reduces #SAT run-time, since fewer BDDs covering more of the input space (which are typically simpler to encode) are required. While the time complexity of ESTA is high, this is fundamental to the problem of analyzing the logic behaviour of a circuit under all possible inputs. For instance, all known general false-path detection algorithms also have exponential time complexity, since false-path identification is reducible to SAT: an NP-complete problem. The ESTA problem (calculating the probability that timing paths are activated) is more general than the false-path identification problem (finding timing paths with zero activation probability), and therefore subject to the same theoretical limits. It should also be noted that the O(4^I) term represents the worst-case complexity of BDD construction, however it can be more efficient in practice – such as when using techniques like dynamic variable re-ordering [113].

4.4.7 Comparison with Exhaustive Simulation

It is informative to compare the complexity of ESTA and Exhaustive Simulation, which takes

\[ O\!\left(n\, 4^I\right) \qquad (4.10) \]
time. An important distinction between Equations (4.9) and (4.10) is the coefficient of the O(4^I) term. In exhaustive simulation the coefficient is n (circuit size), while in ESTA the coefficient is a constant. This results in a significant difference: exhaustive simulation’s complexity grows as the product of circuit size and the number of possible input combinations, while ESTA’s complexity grows as the sum of a term linear in circuit size and an O(4^I) term. Intuitively, ESTA achieves this lower complexity since it traverses the timing graph only once during its analysis. In contrast, exhaustive simulation must evaluate the whole circuit for each of the 4^I possible sets of input transitions. In addition, constructing BDDs to evaluate #SAT is far more efficient in practice than enumerating a vast input space.14

4.4.8 Per-Output & Max-Delay Modes

Our ESTA tool can be run in two possible operating modes as illustrated in Figures 4.6 and 4.7:

Per-Output Mode: Calculates a unique path-delay distribution for each timing endpoint. This is analogous to the maximum arrival time at each timing endpoint reported by STA.

Max-Delay Mode: Calculates a single path-delay distribution corresponding to the maximum delay across all timing endpoints. This is analogous to the Critical Path Delay (CPD) reported by STA. However the maximum path-delay distribution calculated may involve multiple timing paths, since different paths will be maximal for different sets of input transitions. This requires the ESTA tool to track which timing path (tag) is responsible for the maximum delay of each set of input transitions. To reduce this overhead, binning is applied when calculating the maximum delay distribution to reduce the number of unique tags.

It is important to note that the max-delay distribution cannot be calculated from the path-delay distributions produced in per-output mode, since the distributions contain only aggregate information and do not capture which path is maximal for a given set of primary input transitions.

4.5 Non-Uniform & Correlated Conditioning Functions

In traditional STA a design is typically analyzed with multiple delay models to capture the impact of delay variation at different ‘timing corners’, and in multiple operational ‘modes’ to model the behaviour of the design in different use cases [114]. Like traditional STA, ESTA can be performed using different combinations of delay models and operating modes. However, ESTA generalizes STA’s concept of a mode to include the transition probability distribution of the primary inputs. So far we have considered uniform and independent inputs, which are modelled with the conditioning functions in Equation (4.5). In general, inputs may have non-uniform probabilities and may not be independent. In practice, the probabilities and correlations could be determined from designer knowledge, simulation, the upstream circuit logic, or derived from word-level statistics [115]. In order to model these statistical characteristics we need to develop different conditioning functions. Intuitively, conditioning functions can be viewed as a form of pre-processing which transforms a set of uniform and independent Boolean variables (e.g. x and x′) into some output distributions. As shown in Figure 4.8, if unique independent variables are used for the conditioning functions associated with each primary input, then the resulting conditioning functions are independent. However, if some variables are shared across conditioning functions, as in Figure 4.9, the resulting conditioning functions may not be independent and could exhibit correlations.

14For example, exhaustive simulation of the clma benchmark requires evaluating > 10^249 sets of input transitions; counting the satisfying assignments is far faster.

Figure 4.6: Per-Output Mode: Path delay distributions calculated for each output.
Figure 4.7: Max-Delay Mode: Bounding path delay distribution calculated over all outputs.
Figure 4.8: Independent Condition Functions.
Figure 4.9: Non-Independent Condition Functions.

4.5.1 Non-uniform Conditioning Functions

As described by Equation (4.2), the probability of a particular transition is proportional to the number of satisfying assignments (i.e. minterms) of the associated conditioning function. To model non-uniform

x1  x2  x3 | f
0   0   0  | R
0   0   1  | F
0   1   0  | H
0   1   1  | L
1   0   0  | R
1   0   1  | L
1   1   0  | L
1   1   1  | L

fR(x1, x2, x3) = (¬x1 ∧ ¬x2 ∧ ¬x3) ∨ (x1 ∧ ¬x2 ∧ ¬x3)

fF(x1, x2, x3) = (¬x1 ∧ ¬x2 ∧ x3)

fH(x1, x2, x3) = (¬x1 ∧ x2 ∧ ¬x3)

fL(x1, x2, x3) = (¬x1 ∧ x2 ∧ x3) ∨ (x1 ∧ ¬x2 ∧ x3) ∨ (x1 ∧ x2 ∧ ¬x3) ∨ (x1 ∧ x2 ∧ x3)

Figure 4.10: Example Truth Table and conditioning functions satisfying P(fR) = 0.25, P(fF) = 0.125, P(fH) = 0.125 and P(fL) = 0.5. Note that the specific minterms assigned to each transition have no impact on the analysis results.

input probabilities, we can assign the minterms in proportion to the desired probabilities. However, care must be taken to ensure no minterms are shared between conditioning functions associated with the same primary input, as the transitions are mutually exclusive (e.g. R and F cannot occur simultaneously). Furthermore, by adding additional variables to the conditioning functions' support, probabilities can be assigned at finer granularity.

Figure 4.10 shows how a set of conditioning functions for a non-uniform primary input can be created. Each transition is assigned to a unique subset of the possible minterms, in proportion to the desired probability. Note that the order in which minterms are assigned to transitions has no functional impact, as only the number of minterms assigned to each transition determines its probability. As a result, there are many possible conditioning functions which satisfy a set of target probabilities.

The algorithm we use to construct non-uniform conditioning functions is shown in Algorithm 10. This approach tries to assign adjacent minterms to a particular transition. Since the minterms for a particular transition are grouped together, the conditioning function is easier to encode as a BDD.15 First, the conditioning functions are initialized on Lines 2 to 5. Next, the transitions are sorted (Line 7) so the highest probability transition is processed first. The number of minterms to assign to each transition is calculated on Line 10, based on the transition probability and the number of Boolean variables per input (q). The relevant minterms are then set in the conditioning function on Line 13, and the conditioning functions are returned on Line 15.

15Evaluation on 12 of the MCNC20 benchmarks showed this approach reduced ESTA run-time by 1.8× compared to a round-robin allocation, such as shown in Figure 4.10.

Algorithm 10 Grouped Minterm Allocation
Require: q the number of Boolean variables per primary input, P the desired probabilities of the R/F/H/L transitions
 1: function AllocateGroupedMinterms(q, P)
 2:     f[R] ← False
 3:     f[F] ← False
 4:     f[H] ← False
 5:     f[L] ← False
 6:     Trans ← (R, F, H, L)
 7:     SortDescendingProbability(Trans, P)
 8:     m ← 0                                   ▷ Current minterm number
 9:     for t ∈ Trans do
10:         M ← P[t] · 2^q                      ▷ Number of minterms to allocate
11:         for 1 ... M do
12:             fm ← GetMintermFunction(q, m)
13:             f[t] ← f[t] ∨ fm                ▷ Set minterm
14:             m ← m + 1
15:     return (f[R], f[F], f[H], f[L])
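A minimal sketch of this allocation is shown below. It is illustrative only: it represents each conditioning function as a set of minterm indices rather than a BDD, and the helper and variable names are not those of our implementation:

```python
# Illustrative sketch of Algorithm 10 (Grouped Minterm Allocation). Conditioning
# functions are modelled as sets of minterm indices instead of BDDs.

def allocate_grouped_minterms(q, probs):
    """q: Boolean variables per primary input; probs: dict mapping R/F/H/L to probability."""
    f = {t: set() for t in ("R", "F", "H", "L")}        # initially empty (False) functions
    trans = sorted(probs, key=probs.get, reverse=True)  # highest probability first (Line 7)
    m = 0                                               # current minterm number
    for t in trans:
        num_minterms = round(probs[t] * 2 ** q)         # minterms to allocate (Line 10)
        for _ in range(num_minterms):
            f[t].add(m)                                 # set minterm (Line 13)
            m += 1
    return f

# Example: the target probabilities of Figure 4.10 with q = 3 (8 minterms in total).
funcs = allocate_grouped_minterms(3, {"R": 0.25, "F": 0.125, "H": 0.125, "L": 0.5})
assert sum(len(s) for s in funcs.values()) == 2 ** 3    # the functions partition the minterms
```

Because each transition receives a contiguous block of minterm indices, the resulting functions encode compactly as BDDs, which is the motivation for the grouped allocation.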

4.5.2 Correlated Condition Functions

It is also possible to construct conditioning functions which capture correlations between inputs. In such cases the number of minterms shared between two conditioning functions describes their degree of correlation.

Consider the following joint probability specification between inputs a and b:

      aR     aF     aH     aL
bR    1/16   0      0      1/8
bF    1/16   0      1/8    0
bH    1/16   1/8    1/4    0         (4.11)
bL    1/16   1/8    0      0

where each entry corresponds to the probability of specific transitions occurring simultaneously on a and b.

In Equation (4.11) aR is uncorrelated with the transitions on input b, since it is equally likely to occur with any transition on b. In contrast, aF occurs only when b is undergoing an H or L transition, aH occurs only if b is undergoing an F or H transition, and aL occurs only if b is undergoing an R transition.
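A quick numeric check of these observations, computed directly from the joint probabilities of Equation (4.11), is shown below:

```python
# Verify, from the joint probabilities of Equation (4.11), that given aR every
# transition on b is equally likely (i.e. aR is uncorrelated with b's transitions).
joint = {  # joint[(a, b)] = probability of transition a on input a and transition b on input b
    ("R", "R"): 1/16, ("R", "F"): 1/16, ("R", "H"): 1/16, ("R", "L"): 1/16,
    ("F", "R"): 0,    ("F", "F"): 0,    ("F", "H"): 1/8,  ("F", "L"): 1/8,
    ("H", "R"): 0,    ("H", "F"): 1/8,  ("H", "H"): 1/4,  ("H", "L"): 0,
    ("L", "R"): 1/8,  ("L", "F"): 0,    ("L", "H"): 0,    ("L", "L"): 0,
}
p_aR = sum(p for (a, b), p in joint.items() if a == "R")        # marginal P(aR) = 1/4
cond_b_given_aR = {b: joint[("R", b)] / p_aR for b in "RFHL"}   # 1/4 for every transition on b
print(cond_b_given_aR)
```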

A set of conditioning functions which models this behaviour is:

faR(x0, x1, x2, x3) = (x0 ∧ x1 ∧ x2 ∧ x3) ∨ (x0 ∧ x2 ∧ x3) ∨ (x0 ∧ x1 ∧ x2 ∧ x3)

faF(x0, x1, x2, x3) = (x1 ∧ x2 ∧ x3) ∨ (x0 ∧ x1 ∧ x2)

faH(x0, x1, x2, x3) = (x1 ∧ x2 ∧ x3) ∨ (x0 ∧ x1 ∧ x2) ∨ (x0 ∧ x1 ∧ x3) ∨ (x0 ∧ x2 ∧ x3) ∨ (x0 ∧ x1 ∧ x2 ∧ x3)

faL(x0, x1, x2, x3) = (x0 ∧ x1 ∧ x2 ∧ x3) ∨ (x0 ∧ x1 ∧ x2 ∧ x3)                                   (4.12)

fbR(x0, x1, x2, x3) = (x0 ∧ x1 ∧ x2 ∧ x3) ∨ (x0 ∧ x1 ∧ x2 ∧ x3) ∨ (x0 ∧ x1 ∧ x2 ∧ x3)

fbF(x0, x1, x2, x3) = (x0 ∧ x1 ∧ x2) ∨ (x0 ∧ x1 ∧ x2 ∧ x3)

fbH(x0, x1, x2, x3) = (x1 ∧ x3) ∨ (x2 ∧ x3) ∨ (x0 ∧ x1 ∧ x2)

fbL(x0, x1, x2, x3) = (x0 ∧ x1 ∧ x2 ∧ x3) ∨ (x0 ∧ x1 ∧ x2)

Where, for example:

P(aH ∧ bF) = #SAT(faH ∧ fbF) / 2^4
           = #SAT((x0 ∧ x1 ∧ x2 ∧ x3) ∨ (x0 ∧ x1 ∧ x2 ∧ x3)) / 16          (4.13)
           = 2/16 = 1/8

as desired, since faH and fbF share the two minterms x0 ∧ x1 ∧ x2 ∧ x3 and x0 ∧ x1 ∧ x2 ∧ x3.

In general, constructing a set of conditioning functions that satisfies an arbitrary set of joint probabilities for a large number of inputs is challenging. This task can be formulated as a constrained minterm assignment problem and solved with Integer Linear Programming (ILP). However, this approach is computationally expensive, and as a result it is only practical for groups of fewer than 4 correlated circuit inputs. For details of the ILP formulation, and attempts to generate conditioning functions by Stochastic Synthesis, see Appendix A.
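The sketch below (with deliberately simple, hypothetical conditioning functions rather than those of Equation (4.12)) shows the underlying computation of Equation (4.13): count the satisfying assignments of the conjunction of two conditioning functions and divide by the size of the assignment space:

```python
# Toy #SAT-based joint probability computation in the style of Equation (4.13).
# The conditioning functions here are hypothetical and chosen so that they share
# exactly two of the 2^4 minterms.
from itertools import product

def count_sat(f, num_vars):
    """Number of assignments of num_vars Boolean variables satisfying f (brute force)."""
    return sum(1 for bits in product([False, True], repeat=num_vars) if f(*bits))

f_aH = lambda x0, x1, x2, x3: x0 and x1       # 4 minterms
f_bF = lambda x0, x1, x2, x3: x1 and not x2   # 4 minterms
joint = lambda *xs: f_aH(*xs) and f_bF(*xs)   # shared minterms: x0=1, x1=1, x2=0

p = count_sat(joint, 4) / 2 ** 4              # 2 / 16 = 1/8
print(p)
```

In the actual implementation the conjunction is represented as a BDD and the satisfying assignments are counted symbolically rather than by enumeration.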

4.6 Evaluation Methodology

To evaluate our ESTA implementation we compare it against post-place-and-route timing simulation performed with Mentor Graphics Modelsim SE 10.4c. The evaluation flow used is shown in Figure 4.11. We take in a circuit netlist which is mapped onto a 40nm 6-input Look-Up-Table (6-LUT) based FPGA using VPR [27] to generate an SDF file with both logic and routing delays. The SDF file is then used to annotate identical delays in both Modelsim and ESTA. To avoid unrealistic glitch filtering Modelsim was run using the transport delay model. We evaluate all tools on the 20 largest MCNC benchmarks [95] which are listed in Table 4.7. Any state elements (e.g. Flip-Flops) were replaced with primary inputs and outputs.16

16While our ESTA implementation does not perform clock-pessimism removal, the same techniques used in traditional STA could be applied [116, 117].

4.6.1 Monte-Carlo Simulation

While exhaustive simulation is useful for verification it quickly becomes impractical, since the number of cases to be simulated grows as Θ(4^I). To enable the evaluation of larger benchmarks, and provide a more realistic run-time comparison to ESTA, we also developed a Monte-Carlo (MC) based simulation framework for calculating path-delay distributions. It is important to distinguish between the strength of guarantees that MC and ESTA provide. ESTA guarantees it will always produce safely pessimistic upper bounds of path-delay distributions. MC cannot provide any such guarantees.

In the MC framework we uniformly generate random sets of input transitions to sample the input space. This sampling procedure was run for 48 hours on each benchmark. All quality comparisons are based on the larger and more accurate 48-hour sample, while run-times are determined by finding the smallest sample size which meets some convergence criterion. In order to define the MC convergence criteria it is helpful to define some additional metrics.

Definition 14 (Population Max Delay Probability) Let p be the probability of activating maximum delay paths across all input transition combinations.

In other words, p is the true probability of activating a maximum delay path (i.e. the 'population' statistic). The obvious way to calculate p is to exhaustively simulate all input transition combinations and count the fraction which activate a maximum delay path (i.e. exhaustive simulation). However, this is impractical for large circuits as the number of input combinations is far too large. To address this, we use MC to estimate p by sampling a subset of input transition combinations (i.e. simulating a smaller, non-exhaustive set of randomly selected input transition combinations).

Definition 15 (Sample Max Delay Probability) Let p̂ be the probability of activating maximum delay paths from a uniformly random sample of simulated transitions.

In other words, p̂ is an estimate of p based on a particular sample (i.e. p̂ is a 'sample' statistic). We can also calculate, for a particular sample, a confidence interval17 around p̂ (i.e. LB, UB such that p̂ ∈ [LB, UB]), where the size of the confidence interval (i.e. |UB − LB|) gives us insight into how 'good' an estimate p̂ is of p.18 As a result, we determine convergence as follows:

Definition 16 (MC Max-Delay Convergence) Given a sample with max-delay probability confidence interval [LB, UB] at α confidence, we say the sample has converged if (UB − LB)/p̂ < ∆p̂rel.

For instance, choosing α = 0.95 and ∆p̂rel = 0.05 corresponds to a 95% confidence that the relative error in p̂ is less than 5%. MC run-time is dependent on how α and ∆p̂rel are chosen, and their effect is evaluated in Section 4.7.3. The reported Monte-Carlo run-times include only the simulation run-time, and exclude the large amount of post-processing required to extract useful transition and delay information, and to determine convergence.
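A sketch of this convergence check is shown below. It assumes SciPy is available for the Clopper-Pearson interval; it mirrors Definition 16 but is not our exact evaluation code:

```python
# Sketch of the MC max-delay convergence check of Definition 16 using a
# Clopper-Pearson confidence interval (via SciPy's beta distribution).
from scipy.stats import beta

def mc_converged(num_max_delay, num_samples, confidence=0.95, delta_rel=0.05):
    """num_max_delay: samples that activated a maximum-delay path."""
    sig = 1.0 - confidence                                   # two-sided significance level
    k, n = num_max_delay, num_samples
    lb = 0.0 if k == 0 else beta.ppf(sig / 2, k, n - k + 1)  # Clopper-Pearson lower bound
    ub = 1.0 if k == n else beta.ppf(1 - sig / 2, k + 1, n - k)
    p_hat = k / n
    return p_hat > 0 and (ub - lb) / p_hat < delta_rel

# e.g. after 10^6 random input-transition samples, 10,000 of which hit a max-delay path:
print(mc_converged(10_000, 10**6))
```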

17We model the activation of a max delay path as a Bernoulli process and calculate the confidence interval using the Clopper-Pearson method [118], using the implementation from [119].
18This is not strictly accurate: the confidence interval defines the region where we would expect the p̂'s of different independently drawn samples to fall with some probability (i.e. the confidence level). There is no guarantee that the true 'population' p is actually contained within a particular confidence interval [120].

Table 4.4: Normalized Critical Path Delays with False Paths.

Benchmark   STA     MC      ESTA
s298        1.000   0.981   0.981
clma        1.000   0.959„  0.984
frisc       1.000   0.965   0.999*
elliptic    1.000   0.974
* ESTA upper-bound, „ MC optimistic

Figure 4.11: Evaluation Flow.

4.6.2 Metrics

To compare the Quality of Results (QoR) between STA, ESTA and MC we used the Earth Mover's Distance (EMD) metric [121], a measure of the 'distance' between two probability distributions (commonly used to compare image histograms). Intuitively, EMD considers each probability distribution as a pile of earth, and the EMD is the minimum amount of work required to transform one pile of earth into the other. This can be formulated as a flow transportation problem on a bipartite flow network, where the sources and sinks correspond to the discrete bins of the two probability functions being compared, and the cost to transport a flow between two bins is proportional to the distance between them. The value of the minimum cost (subject to the detailed constraints outlined in [121]) corresponds to the EMD.

To account for different critical path delays across benchmarks we normalize EMD by the EMD between MC and STA. The resulting normalized EMD ∈ [0, 1] describes how closely the MC distribution is approximated. A value of 1 corresponds to an STA-like analysis (worst-case switching behaviour), and a value of 0 corresponds to the MC distribution (assumed true switching behaviour).

For each benchmark we report the run-times for determining the maximum delay across all outputs, and for determining the delay distribution of each output. We also report the mean normalized EMD of the resulting path-delay distributions.
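The sketch below shows the normalized EMD computation on hypothetical path-delay distributions, using SciPy's one-dimensional Wasserstein distance as a stand-in for the flow-based formulation of [121]:

```python
# Normalized EMD between binned path-delay distributions: EMD(ESTA, MC) / EMD(STA, MC).
# The distributions below are hypothetical; SciPy's 1-D Wasserstein distance is used
# in place of the flow-network formulation of [121].
from scipy.stats import wasserstein_distance

def emd(dist_a, dist_b):
    """dist_*: list of (delay_ps, probability) bins."""
    da, pa = zip(*dist_a)
    db, pb = zip(*dist_b)
    return wasserstein_distance(da, db, u_weights=pa, v_weights=pb)

mc   = [(0, 0.13), (2000, 0.50), (4000, 0.33), (6230, 0.04)]  # assumed 'true' behaviour
esta = [(2000, 0.40), (4000, 0.40), (6230, 0.20)]             # safely pessimistic bound
sta  = [(6230, 1.0)]                                          # critical path delay in all cases

norm_emd = emd(esta, mc) / emd(sta, mc)   # 0 = matches MC, 1 = STA-like
print(norm_emd)
```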

4.7 Results

Using the experimental methodology from Section 4.6 and the implementation from Section 4.4 we perform several experiments to investigate the characteristics of ESTA. Unless otherwise noted ESTA was run with d = 100 ps, m = 10^4 and s = 0.0, which were found experimentally to offer good run-time/quality trade-offs as shown in Section 4.7.4. For MC we used α = 0.95 and ∆p̂rel = 0.05 unless otherwise noted. Results were collected on an Intel Xeon E5-2643v3 machine with 256GB of memory.

4.7.1 Verification

To verify the correctness of our ESTA tool we exhaustively compared with Modelsim on a set of small micro-benchmarks including simple logic circuits, ripple-carry adders and array multipliers. We verified for each possible set of input transitions that ESTA agreed with simulation on the resulting output transitions and that ESTA's delay estimate was an upper-bound of the simulation delay.

4.7.2 Maximum Delay Estimation with False Paths

By running MC, STA, and ESTA (with percentile binning) we can investigate the impact of false paths. While most circuits in the MCNC20 benchmarks produced the same critical path delay in all tools, the existence of false paths caused divergence on the benchmarks in Table 4.4. ESTA was able to identify the true critical path delay on the s298 and clma benchmarks, and confirm false paths exist on frisc.19 Notably, on clma MC reported an unsafe (optimistic) delay – indicating the worst-case path was never sampled. This illustrates the utility of ESTA's strong guarantees; it never underestimates delay.

4.7.3 Monte-Carlo Convergence

The specific criteria chosen for MC convergence (Definition 16) can affect whether MC converges and its associated run-time. Figure 4.12 illustrates the impact of varying these parameters on the run-time of the apex2 benchmark, a representative example. Increasing the confidence level (α) increases run-time, particularly as the confidence approaches 1.0.20 Decreasing the allowed relative error in max-delay probability (∆p̂rel) strongly increases run-time. Reducing the allowed relative max-delay probability variation from 5% to 2.5% causes run-time to more than triple, and to exceed 48 hours for confidences beyond 0.99.

Table 4.5 shows the impact, across all benchmarks, of varying ∆p̂rel and α on MC max-delay convergence. For relatively loose convergence criteria (∆p̂rel = 0.05 and α = 0.95), only 10 of the 20 MCNC benchmarks converge. Increasing the confidence level (α) increases run-time, but decreasing the allowed relative error in p̂ has a more substantial impact, causing large increases in the median run-time and reducing the number of converged benchmarks. At ∆p̂rel = 0.005 and α = 0.999, MC fails to converge on all benchmarks.

It should be noted that to equal ESTA's strong upper-bound guarantee (analogous to 1.0 confidence) MC effectively reverts to exhaustive simulation. Furthermore, all comparisons we make between MC and ESTA are performed with relatively loose MC convergence criteria (∆p̂rel = 0.05, α = 0.95), and both MC convergence and run-time would be much worse if tighter convergence criteria were used.

4.7.4 ESTA Run-time/Accuracy Trade-Offs

Figure 4.13 shows run-time/accuracy trade-offs for the binning methods described in Section 4.4.4. Increasing the fixed bin size (d) from 0 to 100 at m = ∞ speeds up ESTA by 2.2×, with only a 4% degradation in QoR. Beyond d = 100 the QoR degrades significantly for little or no run-time improvement. Decreasing m from ∞ to 10^6 has limited impact on QoR. Further decreasing m to 10^4 causes some impact, resulting in a 40% QoR degradation for a speed-up of 1.5× compared to m = ∞ at d = 10. At m = 10^2 speed-up compared to m = ∞ improves further (e.g. a further 1.8× compared to m = ∞ at d = 100), but at the cost of significant QoR degradation (e.g. a further 222% compared to m = ∞ at d = 100).

In general d = 100 and m in the range 10^4 to 10^6 provide good run-time/accuracy trade-offs. d is the most effective, and setting it to a small but non-negligible portion of the circuit delay ensures only timing

19ESTA exceeded memory limits on frisc due to the large number of nearly critical false paths, and exceeded 48 hours run-time on elliptic.
20Higher α causes the size of the calculated confidence interval to increase, slowing convergence.

Figure 4.12: MC max-delay run-time on apex2 for various convergence criteria.

Table 4.5: Impact of convergence criteria on MC Max-Delay run-time of the MCNC20 Benchmarks.

∆p̂rel    α       # Converged Benchmarks   Median MC Run-time (hours)
0.05     0.95    10                        1.09
0.05     0.99    10                        1.88
0.05     0.999   10                        3.07
0.025    0.95    10                        4.34
0.025    0.99    8                         3.62
0.025    0.999   8                         5.92
0.01     0.95    6                         11.60
0.01     0.99    5                         17.57
0.01     0.999   5                         28.71
0.005    0.95    3                         21.11
0.005    0.99    2                         33.13
0.005    0.999   0
Blank entries exceeded 48 hours run-time.

tags with similar delays are merged together. This reduces run-time with minimal QoR impact. m serves to limit ESTA's worst-case complexity by bounding the O(L^K) term in Equation (4.9). Setting m to a large, but finite value ensures designs with large numbers of convergent timing paths remain tractable, at the cost of some additional pessimism.

Figure 4.13: ESTA run-time/accuracy tradeoffs on the dsip benchmark in per-output mode.
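One plausible realization of these two controls (not ESTA's exact implementation) is sketched below: delays are rounded up to a d-wide bin edge, so the result remains a safe upper bound, and the number of tags is capped at m by pessimistically merging the smallest-delay tags:

```python
# Illustrative delay binning of timing tags. Each tag is a (delay_ps, probability) pair.
# Rounding delays up keeps the binned distribution a safe upper bound; the merge policy
# used when more than m tags remain is one plausible (pessimistic) choice.

def bin_tags(tags, d, m):
    binned = {}
    for delay, prob in tags:
        key = delay if d == 0 else -(-delay // d) * d     # round delay up to a bin edge
        binned[key] = binned.get(key, 0.0) + prob
    merged = sorted(binned.items())
    while len(merged) > m:                                # merge the two smallest-delay
        (d0, p0), (d1, p1) = merged[0], merged[1]         # tags into the larger delay
        merged = [(d1, p0 + p1)] + merged[2:]
    return merged

print(bin_tags([(120, 0.3), (180, 0.2), (410, 0.5)], d=100, m=2))
# [(200, 0.5), (500, 0.5)]
```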

4.7.5 ESTA Non-Uniform Conditioning Functions

As described in Section 4.5, ESTA can model non-uniform and correlated inputs with appropriate conditioning functions. Figure 4.14 illustrates the impact of different conditioning functions on the delay-probability Cumulative Distribution Functions (CDFs) of the pksi 83 d output of the dsip benchmark. The conditioning functions for the 'ESTA Non-Uniform' curves are constructed to satisfy randomly generated target transition probabilities. By changing the random number generator's seed, different target transition probabilities can be created. In this instance the non-uniform conditioning functions result in a sharper delay transition around 2000ps compared to the uniform conditioning functions. Notably, the different input probability distributions can result in significantly different delay behaviour; for instance the 99th-percentile delay for 'ESTA Non-Uniform Seed 2' is 2656ps, while for 'ESTA Uniform' it is 3235ps, 22% higher.

It is also important to consider the impact of conditioning functions on ESTA's run-time. A key factor is how many Boolean variables (q) are used to form the support of the conditioning functions of each input. Larger values of q allow probabilities to be specified at finer granularity, since the smallest probability that can be specified is 1/2^q. Table 4.6 shows the run-time impact of using the uniform conditioning functions from Equation (4.5), or randomly generated non-uniform conditioning functions, for various values of q. Increasing q has a strong effect on run-time, since the total number of Boolean variables is directly proportional to q. Increasing q to 4 and 6 allows probabilities to be specified with 6.25% and 1.56% granularity respectively. However this comes at a cost to run-time, with ESTA taking 3.2× and 7.4× longer on the common benchmarks which completed. Clearly, the smallest value of q which captures the non-uniform input

distribution should be used to minimize run-time.21

Figure 4.14: CDFs for the pksi 83 d output of the dsip benchmark for various conditioning functions. The 'ESTA Uniform' CDF corresponds to using the conditioning functions from Equation (4.5) for each primary input. Each 'ESTA Non-Uniform' CDF corresponds to a different set of randomly generated non-uniform conditioning functions used at the primary inputs. ESTA was run with d = 1 and m = ∞.

4.7.6 ESTA and MC Comparison

Figure 4.15 plots the maximum path-delay CDFs for the spla benchmark. STA, which assumes worst-case switching behaviour, produces a single maximum delay estimate of 6230ps (the critical path delay) for all cases (p = 1). In contrast MC, which directly simulates switching behaviour, produces a path-delay distribution, showing 13% of input transitions cause no delay (i.e. don’t affect the output), and only 4% of input transitions produce delays > 5000ps. ESTA is always safely conservative compared to MC and less pessimistic than STA, with its CDF always falling between MC and STA. The form of ESTA’s CDF follows the shape of MC’s CDF, with larger m producing a more accurate result. Table 4.7 quantitatively compares ESTA and MC.

Max-Delay Comparison

First, we consider the behaviour of ESTA and MC when evaluated in Max-Delay mode (Section 4.4.8), where a single path-delay distribution represents the maximum delay over all timing endpoints. For MC max-delay probability (p̂) we observe that benchmarks with relatively large max-delay probability tend to converge, while those with smaller probability (p̂ ≤ 10^-4) tend not to converge. A relatively high p̂ means max-delay paths are observed relatively frequently, and the confidence interval around p̂ shrinks quickly. In contrast, on benchmarks with relatively rare max-delay paths (e.g. p̂ ≤ 10^-4) MC often fails to converge within 48 hours. Intuitively, it is difficult to determine from a given sample if a small p̂ is caused by an inherently rare path, or by insufficient sample size; this causes MC to converge slowly.

21It should be noted that despite these slow-downs ESTA would still outperform MC (Section 4.7.6).

Table 4.6: Relative ESTA per-output run-time for various conditioning functions (d = 100, m = 10^4).

            Uniform   Non-Uniform   Non-Uniform
Benchmark   q = 2     q = 4         q = 6
ex5p        1.00      1.08          1.73
apex4       1.00      4.33          7.10
ex1010      1.00      1.26          1.87
s298        1.00      2.82          6.60
misex3      1.00      5.58          8.75
alu4        1.00      8.58          14.07
spla        1.00      1.34          7.30
pdc         1.00      2.91          12.27
apex2       1.00
seq         1.00      41.12
des         1.00      4.54          18.25
clma        1.00
tseng       1.00
diffeq
dsip        1.00      4.86          10.34
bigkey      1.00      6.23          12.00
frisc
elliptic
s38584.1    1.00      12.22
s38417
Geomean     1.00      4.37          7.42
Blank entries exceeded 48 hours run-time.

Figure 4.15: Maximum delay CDFs on the spla benchmark.

Now considering Max-Delay QoR for ESTA m = 10^4, we see ESTA's analysis falls between STA and MC, with normalized EMD ranging between 0.96 (nearly STA-like) and 0.33 (more MC-like), with a geometric mean of 0.63 across the benchmarks which completed. Increasing m to 10^6 (Max-Delay ESTA m = 10^6) reduces the geometric mean normalized EMD by 24% to 0.48 on the common benchmarks which completed.

The run-time performance of Max-Delay ESTA and MC is also shown in Table 4.7. Looking at Max-Delay Run-time for MC, it converges on only 10 of the 20 benchmarks within 48 hours, and fails to converge even on benchmarks with few inputs (e.g. s298). Max-Delay ESTA with m = 10^4 completed 11 of 20 benchmarks and shows more stable run-time, completing all of the benchmarks with 41 or fewer inputs and also the largest benchmark (in terms of logic), clma. For those benchmarks which completed under both MC and ESTA, ESTA m = 10^4 and m = 10^6 achieved geometric mean speed-ups of 164× and 16× respectively. For all benchmarks with > 415 inputs ESTA in Max-Delay mode exceeded 48 hours run-time during #SAT evaluation.

Per-Output Comparison

We can also evaluate ESTA and MC in Per-Output mode (Section 4.4.8), where a unique path-delay distribution is calculated for each timing endpoint.22 Since there are multiple path-delay distributions produced, we calculate the normalized EMD and report the arithmetic mean across all timing endpoints. For MC convergence we report the time required for the maximum delay of each timing endpoint to converge, according to Definition 16.

Table 4.7 also shows the Per-Output QoR; ESTA produces a more accurate analysis than in Max-Delay mode, with a geometric mean EMD of 0.28 and 0.25 for m = 10^4 and m = 10^6 respectively. Looking at Per-Output Run-time in Table 4.7 we can see that MC converges on only 6 of 20 benchmarks, and total run-time increased 4.3× compared to MC Max-Delay mode for those which converged. In comparison, in Per-Output mode ESTA completes more benchmarks: 16 and 13 for m = 10^4 and m = 10^6 respectively. On the common benchmarks which completed, ESTA's geometric mean speed-ups over MC were 569× and 109× for m = 10^4 and m = 10^6 respectively.

Discussion

The QoR gap between ESTA and MC is derived from four factors. First, binning (Section 4.4.4) introduces additional pessimism since the true distribution is approximated with fewer timing tags. Second, Modelsim performs more aggressive transition filtering than ESTA (Section 4.4.2). Since Modelsim simulates glitching behaviour it can filter transitions when some inputs have not yet stabilized to their final values. In contrast, ESTA does not directly model glitching behaviour, instead calculating upper bounds on the stable arrival times, and only filters based on the stable post-transition signal value. To illustrate this difference consider Figure 4.4. ESTA will filter based on the signal state only after t = 1 + δ (when the signal is guaranteed to be stable and glitch-free), while Modelsim will filter based on the signal state for the full time period (even before the signal has stabilized).23 Third, Modelsim optimistically treats simultaneous transitions at a gate input with no logical effect (e.g. simultaneous

22This is more indicative of how an ESTA-like tool would be used to analyze an overclocking-style design, since the acceptable error rates are likely to be endpoint dependent.
23ESTA's behaviour ensures it maintains the monotone speed-up property [112]. Modelsim's analysis does not satisfy the monotone speed-up property, since it filters transitions in a delay-dependent manner.

Table 4.7: Quality and Run-Time on the MCNC20 Benchmarks.

                               Max-Delay   Max-Delay QoR      Max-Delay Run-time         Per-Output QoR     Per-Output Run-time
                               Prob.       (norm. EMD)        (minutes)                  (mean norm. EMD)   (minutes)
                                           ESTA     ESTA               ESTA     ESTA     ESTA     ESTA               ESTA     ESTA
Benchmark    I      LUTs       MC p̂        m=10^4   m=10^6    MC       m=10^4   m=10^6   m=10^4   m=10^6    MC       m=10^4   m=10^6
ex5p         8      740        3.9·10^-3   0.73     0.37      151.7    0.5      4.6      0.40     0.33      305.1    0.7      5.6
apex4        9      970        5.9·10^-3   0.85     0.74      118.1    1.5      8.7      0.63     0.48      712.0    1.2      8.7
ex1010       10     3,093      2.7·10^-3   0.96     0.85      922.1    5.8      40.4     0.84     0.65               10.1     127.4
s298         11     1,301                  0.87     0.79               5.7      59.4     0.54     0.45               15.7     118.6
misex3       14     1,158      1.5·10^-3   0.66     0.46      781.3    5.2      54.9     0.41     0.34               4.4      68.3
alu4         14     1,173      2.5·10^-3   0.83     0.46      473.6    2.4      58.9     0.58     0.44      1,966.4  1.8      62.3
spla         16     3,005                  0.40     0.25               6.3      207.1    0.26     0.19               12.7     108.3
pdc          16     3,627                  0.56     0.31               11.3     118.4    0.39     0.25               15.5     282.7
apex2        39     1,478      6.9·10^-4   0.33               2,068.2  87.1              0.31                        80.1
seq          41     1,325                  0.34                        57.9              0.19     0.15               5.7      164.5
des          256    554        1.8·10^-2                      60.7                       0.27     0.23      1,101.8  0.6      2.8
clma         415    6,239                  0.92                        319.2             0.13                        129.4
tseng        436    798                                                                  0.15                        87.0
diffeq       440    871
dsip         460    880        3.5·10^-2                      49.8                       0.15     0.14      56.1     0.5      0.6
bigkey       494    883        1.2·10^-2                      150.6                      0.17     0.16      151.2    0.4      0.5
frisc        905    3,028
elliptic     1,252  2,135
s38584.1     1,332  4,486      3.2·10^-3                      2,829.4                    0.09     0.07               26.3     111.8
s38417       1,545  3,465
Common
Geomean                        3.0·10^-3   0.80     0.55      360.8    2.2      22.1     0.31     0.27      398.3    0.7      3.6

'Common Geomean' is the geomean of the subset of benchmarks which completed in MC, ESTA m = 10^4 and ESTA m = 10^6 in the same mode. Note this means Max-Delay and Per-Output geomeans are not comparable. Blank entries exceeded 48 hours run-time.

R/F in Table 4.3) as producing no output transition. To remain safely pessimistic ESTA models these with an appropriate R/F transition. Fourth, on some benchmarks MC fails to converge (e.g. s298), meaning our assumed ground-truth may not actually be the ground-truth. In such cases, if ESTA's calculated path-delay distributions are closer to the real ground-truth than MC's, the reported normalized EMD may be pessimistic.

It is also interesting to note that MC performs worse (converges on fewer benchmarks and runs slower) when run in Per-Output mode, while ESTA exhibits the opposite behaviour. For MC, Per-Output mode requires that all timing endpoints converge according to Definition 16, while MC Max-Delay mode requires only that the maximum delay converges. Thus Per-Output is a stricter convergence criterion, since it requires potentially non-maximal delay outputs to converge. For ESTA, Per-Output mode is faster since it does not require tracking the covered input transitions, and produces better QoR since it avoids the final stage of tag merging and binning used to compute the maximum distribution over all timing endpoints in Max-Delay mode (Section 4.4.8).

Overall, these results show that ESTA can be used to quickly calculate bounding path-delay distributions on most of the circuits, including some with hundreds or thousands of primary inputs. This would enable automated analysis of practical design components such as the 12-input overclocked designs considered in [25].

4.8 Conclusion

In conclusion, we have presented ESTA, a new timing analysis method which accounts for non-worst-case switching behaviour, calculating safe bounding path-delay distributions over all input combinations. We showed how the sensitization probability of timing paths can be calculated using #SAT, allowing path-delay distributions to be constructed. We presented a BDD-based implementation, including approaches to improve accuracy and scalability. We also described how non-uniform probabilities and correlations can be modelled in ESTA.

Our experimental comparisons of ESTA and Monte-Carlo timing simulation showed ESTA on average runs 16 to 569× faster while achieving results within 25 to 63% of Monte-Carlo, representing a 75 to 37% reduction in pessimism compared to STA. ESTA is more general than the previous state of the art for calculating path delay distributions (manual analysis by hand), and is more scalable than Monte-Carlo timing simulation while performing a more robust analysis with stronger correctness guarantees.

ESTA's high complexity does limit its worst-case scalability, and analysis of large-scale designs remains intractable, as it does for all known general methods that analyze input-dependent circuit behaviour. However, ESTA is suitable for use on high value sub-circuits where the additional design effort, pessimism reduction and strong correctness guarantees are warranted.

4.9 Future Work

There are a variety of directions for future work. The key algorithmic challenge for ESTA is scalability, with #SAT being the main run-time bottleneck in our implementation. One potential approach is to simplify the transition model from the 4-state model ('R', 'F', 'H', 'L') used in this work to a simpler 2-state model (e.g. 'Static', 'Switch'). This would reduce ESTA's complexity to O(nL^K + 2^I), but would likely increase pessimism since filtering (Section 4.4.2) would become less effective.

There have also been a variety of works proposing accuracy/guarantee trade-offs while approximating the solution to #SAT [108]. These approximations could substantially reduce the time required to perform #SAT provided some potential for error in the analysis is acceptable, or #SAT can be approximated pessimistically (e.g. only ever safely over-estimating activation probabilities). Whether these accuracy/guarantee/complexity trade-offs are worthwhile requires further investigation. Another avenue for investigation is using CNF-based solvers instead of BDDs for solving #SAT, which could be more efficient for some problems, although they are subject to the same worst-case complexity bounds as BDDs (#SAT is a #P-complete problem). However, prior work has found that in practice some #SAT problems are easier to solve with BDDs, and others with CNF-based solvers, indicating that hybrid techniques which combine both may be a promising approach.

It would also be interesting to compare ESTA's efficiency at identifying false paths with other dedicated false path detection algorithms. Improved run-time/quality trade-offs (Section 4.4.4), particularly those that actively consider the impact on quality, would also improve results. For instance percentile binning, which analyzes only the most critical paths, warrants further investigation.

There are also open questions driven by the application of ESTA. While it is possible to model arbitrary correlations in ESTA, it is not clear how to automatically construct such conditioning functions in an efficient manner. It is also not clear how best to model the switching behaviour of state elements like Flip-Flops. ESTA's approach of analyzing circuit behaviour over the input combination space using #SAT could potentially also be applied to areas such as vector-less power analysis.

It would also be useful to combine both ESTA and SSTA (i.e. a statistical delay model). In this sense ESTA and SSTA are complementary, and combining them would allow determination of the path-delay distribution while considering both device-level delay variation (SSTA) and input combination variation (ESTA) simultaneously. The use of statistical delays may impact how pruning should be performed in ESTA, but the traditional maximum delay operations used in SSTA should still be applicable.

Finally, since ESTA enables automated evaluation of the path-delay distribution associated with different circuit implementations, it could be used to drive design optimization. This would require new metrics to guide optimization which combine path delay and path activation probability information. Potential metrics could include a probabilistically weighted slack, or an error-rate slack. Such metrics would allow designers and optimization tools (such as technology mapping or placement) to directly optimize a design's error-rate based on its physical implementation.
This would enable new trade-offs between delay, path sensitization probability, and other circuit characteristics which would have interesting applications to approximate computing techniques such as overclocking.

Part II

Flexible and Efficient FPGA CAD

Chapter 5

VTR Introduction

Developing Field Programmable Gate Array (FPGA) architectures is challenging due to the competing requirements of various application domains, and changing manufacturing process technology. This is compounded by the difficulty of fairly evaluating FPGA architectural choices, which requires sophisticated, high-quality Computer Aided Design (CAD) tools to target each potential architecture. This part of the thesis describes enhancements and extensions to the open source Verilog To Routing (VTR) project (included in version 8 of VTR), which provides such a design flow. These extensions expand the scope of FPGA architectures which can be modelled, allowing VTR to target and model many details of both commercial and proposed FPGA architectures. The VTR design flow also serves as a baseline for evaluating new CAD algorithms. It is therefore important, for both CAD algorithm comparisons and the validity of architectural conclusions, that VTR produce high-quality circuit implementations. Many of these enhancements significantly improve optimization quality (reductions of 15% minimum routable channel width, 41% wirelength, and 12% critical path delay), run-time (5.3× faster) and memory footprint (3.3× lower). Finally, we demonstrate that with these enhancements VTR is run-time and memory footprint efficient, while producing circuit implementations of reasonable quality compared to highly-tuned architecture-specific industrial tools – showing that architecture generality, good implementation quality and run-time efficiency are not mutually exclusive goals.

5.1 Introduction

The application domains targeting FPGAs and the underlying manufacturing process technology used to build FPGAs both change rapidly. As a result FPGA architectures cannot remain static – they must evolve to optimize their efficiency for the target process technology, and both legacy and emerging application domains. Furthermore novel architectural ideas to improve FPGAs are constantly being proposed. Being able to quantitatively compare different architectural ideas is key to developing a high quality FPGA architecture. However performing these comparisons is challenging, as it requires well-optimized implementations for a suite of representative benchmark circuits across many architectural variations, along with accurate area, delay and power estimates. This requires a full high quality automated design flow to target each architecture. Since creating such optimization algorithms for a specific FPGA architecture is itself a complex research problem [122], it is impractical to develop unique optimization algorithms for every

architectural variant. Similarly, accurately estimating the delay, area and power of a completed design implementation on an FPGA also requires sophisticated algorithms and tools [6, 123, 124, 125]. The approach pioneered by Versatile Place and Route (VPR) [46, 126] and the Verilog To Routing (VTR) project [26, 27]1 is to build highly flexible Computer Aided Design (CAD) tools and optimization algorithms which can target and adapt to a wide variety of FPGA architectures. This allows FPGA architects to efficiently evaluate different architectural choices while ensuring well optimized implementations are produced for each benchmark circuit. This facilitates fair comparisons across architectures.

We introduce a variety of extensions and enhancements to the open-source Verilog To Routing project (included in version 8.0 of VTR).2 The main contributions include:

ˆ Extending VTR’s FPGA architecture modelling capabilities to capture a broader range of FPGA architectures,

ˆ An improved capture of the Intel Stratix IV FPGA architecture,

ˆ Enhancing VTR’s interoperability with other tools and design flows,

ˆ More complete and flexible timing analysis,

ˆ Significant improvements to optimization quality (41% wirelength, 12% critical path delay, and 15% minimum routeable channel width reductions on the VTR benchmarks), robustness (completes 9 more large Titan benchmarks), run-time (5.3× faster on the VTR benchmarks, reducing average run-time on the large VTR designs from over 35 minutes to less than 7 minutes, and from 3.4 hours to less than 40 minutes on the larger Titan benchmarks) and memory usage (3.3× lower on the VTR benchmarks), and

ˆ A detailed evaluation comparing VPR to Intel’s commercial Quartus CAD flow which shows that VPR 8, despite its flexibility, is competitive in run-time and reasonable in optimization quality compared to a highly-optimized architecture-specific tool.3

In combination, these enhancements mean a wider range of architectures can be explored with both more detailed analysis and higher quality optimization, which increase confidence in the resulting architectural conclusions. The more flexible architectural representation also means it can model and target commercial FPGA devices, and program them through architecture-specific bitstream generators which translate VTR’s implementation results into bitstreams (discussed in detail in Chapter 12). This part of the thesis is organized as follows:

ˆ the remainder of this chapter (Chapter 5) presents related work,

ˆ Chapter 6 introduces the VTR CAD flow and its enhancements,

ˆ Chapter 7 describes enhancements to VTR’s architecture modelling capabilities, and illustrates how these features enable an improved capture of Stratix IV FPGAs.

The following chapters then describe algorithmic and engineering improvements to the CAD flow:

1We use VPR to refer to the architecture modelling and place & route tool, and VTR to refer to the design flow including ODIN, ABC and VPR (Chapter 6). 2VTR 8 can be downloaded from verilogtorouting.org. 3Note that since VTR 7 [27], the VTR and VPR version numbers have been aligned. Therefore VPR 8 refers to the version of VPR used with VTR 8, and VPR 7 to the version used in VTR 7.

ˆ Chapter 8 studies improvements to packing and placement,

ˆ Chapter 9 investigates improvements to routing,

ˆ Chapter 10 discusses improvements to logic optimization and technology mapping, timing analysis and addresses software engineering concerns, and

Finally, Chapter 11 performs a detailed evaluation of the enhanced VTR 8 CAD flow.

5.2 Related Work

Related work falls into two broad categories: FPGA CAD and architecture exploration frameworks, and research which builds upon and compares to VTR.

5.2.1 FPGA CAD & Architecture Exploration Frameworks

There have been a variety of FPGA-related frameworks produced which target different aspects of the design implementation and architecture exploration space. Commercial FPGA vendors have internal tools which they use to develop and architect their next generation devices. For instance, Intel/Altera use their FPGA Modelling Toolkit, originally based on a heavily modified version of VPR, to explore new FPGA architectures [127]. However these tools are closed-source and proprietary – making them unavailable to other researchers.

Torc [128] and RapidSmith [129, 130] provide infrastructure for manipulating the design implementation of commercial Xilinx FPGAs, but both rely on the XDL interface provided by Xilinx's legacy ISE tools. More recently, RapidWright has enabled low-level access to the design implementation of more recent Xilinx FPGAs [131]. However these tools can only target fixed devices from a single manufacturer, making it impractical to perform architecture research within these frameworks. Furthermore, while these tools provide the infrastructure to build CAD tools targeting these devices, they provide simple proof-of-concept algorithms for optimization stages like placement and routing (e.g. non-timing driven, conflict unaware) – making them unsuitable for architecture evaluation and as baselines for comparing CAD algorithms. Independence [132] is a very general placement and routing tool for evaluating FPGA architectures, but it runs more than 4 orders of magnitude slower than VPR – too slow to allow broad exploration of architectures with large-scale benchmarks.

5.2.2 Research Using VTR

VTR is commonly used as a baseline upon which other tools and research build. These works broadly fall into three categories: FPGA Architecture, CAD algorithms and hybrid design flows. VTR’s ability to target a wide range of novel FPGA architectures, combined with its adaptive optimization algorithms make it a common choice for exploring and evaluating FPGA architectural ideas. Some examples include:

ˆ Introducing new hard blocks to accelerate dynamic memory access [133],

ˆ Evaluating the trade-offs between Look-up Table (LUT) size and logic block size [134], and LUT fracturability and logic block routing connectivity [135].

ˆ Exploring new logic elements such as AND-Inverter Cones [136, 137], NAND-NOR Cones [138], dedicated Muxes [139], and reduced flexibility LUTs [140, 141]

ˆ Optimizing logic blocks for arithmetic operations [142, 39],

ˆ Developing routing architectures which exploit mixtures of routing wire lengths and complex switch block patterns [143],

ˆ Quantifying the impact of 3D stacking [144] and interposer [145] technologies,

ˆ Investigating the efficiency of different RAM block architectures [146], and implementation tech- nologies [147],

ˆ Exploring the impact of different semiconductor technologies including DRAM [148], threshold logic [149], and non-volatile technologies like MTJ [150], PCM [151] and RRAM [152],

ˆ Evaluating power reduction methods like charge recycling [153], power gating [154] and dynamic voltage scaling [155], and

ˆ Developing uncloneable and secure FPGAs [156].

VTR’s CAD algorithms strike a effective balance between flexibility and optimization quality. This makes VTR a common infrastructure for exploring and evaluating new CAD algorithms and optimization techniques. These include new or enhanced algorithms which aim to improve run-time/quality trade offs during packing [157, 158], placement [57, 158, 159, 160, 5], and also include cross-stage techniques like multi-clock timing optimization [48] and CAD parameter auto-tuning [161]. Other approaches include reducing run-time through parallel processing during packing [157], placement [71, 70, 72], routing [75, 74, 76, 162] and timing analysis [6]. VTR has also been used to explore novel CAD techniques like device-aging aware routing [163] and the insertion of debug logic into unused resources [164]. In addition to architecture and CAD research, VTR has also been used to create various hybrid flows which often involve targeting or modelling fabricated FPGA devices and often mix commercial and open-source tools. Some examples include:

ˆ Interfacing with Intel/Altera tools for front-end HDL synthesis [165, 10],

ˆ Programming Xilinx FPGAs by interfacing with Xilinx’s tools [166, 167],

ˆ Automatically generating standard cell realizations of VTR generated FPGA architectures [168, 169, 170, 171, 172], and

ˆ Programming fabricated FPGAs using device-specific bitstream generators (which translate VTR’s circuit implementation into bitstreams) targeting: realizations of VTR generated FPGA architec- tures [173, 18, 19, 171, 168, 170, 169, 174, 156], non-VTR generated FPGA architectures [172, 175], and commercial QuickLogic [176], Xilinx & Lattice FPGAs [177].

The VTR code base has also been used by commercial companies such as Intel/Altera and Texas Instruments, as well as numerous FPGA start-up companies (such as Efinix [178]) – often forming the basis of their internal place and route tools. Given the many uses of VTR, improving the flexibility of the tool flow, the quality of its results and the software engineering of its code base is very important.

Previous work [10, 167] has shown significant gaps in quality and run-time between commercial and open-source/academic FPGA design flows. In addition FPGA sizes continue to grow, putting increasing pressure on CAD flow run-time to allow efficient design of large-scale systems. This has led some to question whether architecture-flexible tools (like VTR) can be adapted to target commercial devices while generating high quality implementations in reasonable run-time – or whether specialized device-specific tools (either commercial [179, 180] or open-source [181, 182]) are required. We will show that architecture generality, good implementation quality and efficient run-times are not mutually exclusive goals. In the following chapters we will show how to improve architecture modelling flexibility, along with optimization quality and run-time – showing how these competing goals can be well balanced.

Chapter 6

VTR Design Flow

In this chapter, we describe the VTR design flow and our extensions and enhancements of it. These enhancements improve the design flow's flexibility and interoperability, opening up many new potential flows and flow variations. These capabilities will also be exploited in Chapter 12 to support targeting commercial FPGAs.

6.1 VTR Design Flow Overview

The VTR design flow and several of its variants are shown in Figure 6.1. It provides a full design implementation flow which maps an application onto a target FPGA architecture. Importantly, and unlike commercial FPGA CAD flows, the VTR design flow is flexible and can target a wide range of FPGA architectures. This makes it a useful design flow for evaluating and comparing different FPGA architectures. The VTR flow takes two primary inputs:

ˆ a human-readable FPGA architecture description, and

ˆ an application benchmark described in behavioural Hardware Description Language (HDL).

The design flow will build a model of the specified FPGA architecture and map the given application onto it. The design flow proceeds in several stages. First, the behavioural HDL is elaborated and synthesized into a circuit consisting of soft logic and FPGA architectural primitives (e.g. FFs, multipliers, adders) using Odin II [183]. Second, the circuit logic is passed to ABC [184, 185] which performs technology independent combinational and sequential logic optimizations, and then technology maps the soft logic to LUTs. Third, the resulting technology-mapped circuit netlist is then passed to VPR which generates the physical implementation of the circuit on the target FPGA architecture.1 VPR first builds a detailed in-memory model of the target FPGA device based on the FPGA architecture description. VPR then [38]:

ˆ Packs the circuit netlist into the blocks available on the FPGA device (e.g. Logic/DSP/RAM Blocks).

1For details on the file formats used by VTR see the documentation at docs.verilogtorouting.org.


Figure 6.1: VTR-based CAD Flow Variants

ˆ Places the clustered blocks onto valid grid locations.

ˆ Routes all connections in the netlist through the FPGA’s interconnect network.

At each stage VPR optimizes the implementation for area and speed. Finally, VPR analyzes the resulting circuit implementation to produce area, speed and power results, and a post-implementation netlist. The tools which make up the VTR design flow can be re-used and re-combined. An increasingly common use case is to use portions of the VTR CAD flow. These use cases often relate to targeting fabricated FPGA devices or mixing open source and proprietary tools. As a result, improving the ability to mix-and-match tools and increase the flexibility of the design flow is an area we have focused on enhancing in VTR 8. For example, the Titan Flow [165] uses Intel’s Quartus [179] FPGA CAD tool to perform logic synthesis, optimization and technology mapping. Through the use of a conversion tool (VQM to BLIF) it is then possible to use the resulting technology mapped netlist with VPR. Other synthesis tools like Yosys [186] can also be used, provided they generate technology mapped netlists with primitives matching the targeted FPGA architecture.

6.2 Design Flow Enhancements

VTR 8 extends the capabilities of the design flow compared to previous releases. These enhancements make it more flexible, and improve interoperability with other tools.

Complete External Implementation State and Flexible Analysis

In addition to support for loading a design's packing or placement from external files, VPR 8 adds support for also loading routing from an external file. This enables the complete design implementation

Figure 6.2: Routing Resource Graph: (a) Routing Architecture; (b) Corresponding Routing Resource Graph. Dotted arrows represent switch-block switches, while dashed arrows represent connection-block switches.

(labeled in Figure 6.1) to be loaded into VPR from external files. This allows individual optimization stages to be run independently, and allows other tools to modify the design implementation. This facilitates a variety of alternative design flows, as shown in Figure 6.1. For instance, tools other than VPR can perform some or all of the design implementation, while still making use of VPR’s device modelling or area, timing and power analysis features. Another use case is to re-analyze a fixed design implementation under different operating conditions, such as different supply voltages [155].

External Routing Resource Graph

The detailed routing architecture is a key component of any FPGA. VPR models the detailed routing architecture using a Routing Resource Graph (RR-Graph), which represents conductors (wires and pins) as nodes, and the switches between them as edges. The RR-Graph is a key abstraction which allows VPR’s optimization and analysis engines to easily target a wide variety of FPGA routing architectures. Figure 6.2 illustrates how a particular detailed routing architecture can be modelled by an RR-Graph. Historically, VPR has generated RR-graphs internally from a high-level specification provided in an FPGA architecture description file [187]. Using a parameterizable RR-Graph generator has been very productive for FPGA research as it allows rapid exploration and evaluation of a wide range of routing architectures. While VTR 8 significantly extends the capabilities of VPR’s RR-Graph generator (Section 7.1.2), this approach has limitations when targeting pre-fabricated FPGA devices. To address these limitations VPR 8 adds support for loading the targeted device’s RR-Graph from an external file.2 This yields several advantages:

ˆ First, it enables an RR-Graph to be loaded which exactly matches a fabricated device. This is crucial, as any disagreement between the CAD tool’s internal device model and the physical device may result in invalid or functionally incorrect bitstreams.

ˆ Second, it allows VPR to target routing architectures with features which are either difficult to specify, or not supported by VPR’s RR-Graph generator.3 Provided an RR-Graph describing such an architecture can be generated (e.g. by another tool), it can be targeted by VPR.

ˆ Third, it decouples the RR-Graph targeted by VPR from VPR’s RR-Graph generation algorithm. The routing architecture specification provided to the RR-Graph generator is under-specified, and hence does not uniquely describe a particular architecture. This is highly beneficial since it is easy

2The RR-Graph is a low-level detailed description. This makes it highly flexible, but generally not practical to create by hand. Typically it would be created by another program or a layout extractor. To aid in this process, VPR can generate an RR-Graph file from its internal RR-graph representation. VPR also performs sanity checks to ensure the loaded RR-Graph is reasonably structured. 3VPR performs basic sanity checks to ensure the provided RR graph is well formed.

for the architect to specify, but means the RR-Graph generator has significant freedom to choose how it constructs the routing architecture. As a consequence, the generated routing architecture is dependent on the exact RR-Graph generation algorithm used, and changes to this algorithm (e.g. to add support for new features, or improve the quality of the generated architecture) may change the resulting routing architecture.

ˆ Fourth, the RR-Graph becomes an interchange format describing the FPGA’s routing architecture. When committing a VPR generated routing architecture to silicon, the RR-Graph becomes the specification the physical layout must implement. Alternately, an RR-Graph produced from a physical layout becomes the device specification CAD tools (e.g. VPR or other tools like bitstream generators) target.

Support for external RR-Graphs improves VPR’s interoperability with other tools, and helps to decouple VPR’s device model generation, optimization and analysis stages. For instance, the Symbiflow [177, 1] (discussed in Chapter 12) and PRGA [172] projects generate RR-Graphs in order to target their architectures with VPR.
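To make the RR-Graph abstraction concrete, the sketch below shows a toy graph structure in the spirit of Figure 6.2. The types and node kinds are simplified illustrations, not VPR's actual data structures or RR-Graph file format:

```python
# Toy Routing Resource Graph: conductors (wires and pins) are nodes, programmable
# switches between them are directed edges.
from dataclasses import dataclass, field

@dataclass
class RRNode:
    id: int
    kind: str                                     # e.g. "OPIN", "CHANX", "CHANY", "IPIN"
    fan_out: list = field(default_factory=list)   # node ids reachable through a switch

@dataclass
class RRGraph:
    nodes: dict = field(default_factory=dict)

    def add_node(self, node_id, kind):
        self.nodes[node_id] = RRNode(node_id, kind)

    def add_edge(self, src, dst):                 # a connection- or switch-block switch
        self.nodes[src].fan_out.append(dst)

rrg = RRGraph()
for node_id, kind in [(0, "OPIN"), (1, "CHANX"), (2, "IPIN")]:
    rrg.add_node(node_id, kind)
rrg.add_edge(0, 1)   # block output pin drives a horizontal wire (connection-block switch)
rrg.add_edge(1, 2)   # the wire reaches a block input pin
```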

Post-Implementation Netlist & Verification

VPR supports generating a post-implementation netlist which structurally models the design after complete implementation in the FPGA. It can also produce delay information for each primitive and interconnect element in the FPGA. These can be used to simulate the design implementation, with realistic delays, to check its functional correctness. In VPR 8 this ability has been extended to enable the post-implementation netlist to be produced in BLIF format (in addition to Verilog). This allows ABC to formally verify (prove) the original netlist and final design implementation are logically equivalent. This provides a strong check ensuring the tool flow correctly implements the original design.

Chapter 7

Chapter 7

Architecture Modeling Generality & Completeness in VTR

A key feature of the VTR project [2] is its ability to be re-targeted to a wide range of FPGA architectures. While VTR could already model many different FPGA architectures, its modelling capabilities had limitations which often precluded incorporating and investigating features found in modern commercial FPGAs. In this chapter, we outline a variety of enhancements included in VTR 8 which improve its architecture modelling generality. These allow architectures to be modelled in greater detail and with much higher accuracy and completeness. They will also be leveraged to allow open-source tools to target commercial FPGAs in Chapter 12. Section 7.1 outlines the various modelling enhancements made in VTR 8. To illustrate the utility of these enhancements, and quantify their impact, we present a more accurate model of a commercial Intel Stratix IV FPGA architecture in Section 7.2.

7.1 Architecture Modelling Enhancements

FPGA architectures continue to evolve as new application domains emerge, the underlying process technology changes, and new architecture features are proposed. To facilitate exploration of these changing characteristics we have extended and improved VTR’s modelling capabilities. In particular VTR now supports: more general device grids which are not restricted to columns with perimeter I/O (Section 7.1.1), more general and flexible routing architecture descriptions (Section 7.1.2), as well as a variety of area and delay modelling improvements (Section 7.1.3).

7.1.1 General Device Grid

Historically, island-style FPGAs used column-based architectures (where all tiles in a column were of the same type) with I/Os located around the device perimeter as shown in Figure 7.1a. However, modern FPGAs have more complex and non-uniform device grids, with I/Os often placed in columns instead of around the perimeter [188], and other large blocks such as hardened accelerators [189], embedded microprocessors [190, 23] and PLLs [191] located throughout the device grid. Other architectural ideas, such as embedding a Network-on-Chip (NoC) into the FPGA fabric (Figure 7.1c), may also result in non-columnar device grids [192, 22]. To this end we have extended VTR to support arbitrary device grids – there are no longer any location or size limitations for architectural blocks. For example Figure 7.1b shows a non-perimeter IO device with logic blocks, RAMs, I/Os and a hardened PCIE accelerator. Notably, the I/Os are not restricted to the device perimeter, and the accelerator block can be located at an arbitrary location spanning multiple rows and columns. Large blocks also interact with the routing network architecture, requiring enhancements to the routing architecture generator (see Section 7.1.2). VTR 8’s placement, routing and analysis routines (like timing and area estimation) have all been updated to adapt automatically to the generated architecture. For instance, the VPR placer uses a new grid representation (Section 8.2) which helps efficiently perturb placements with more complex and irregular placement grids.

Figure 7.1: General Device Grid Examples. (a) Traditional column-based layout with perimeter I/O. (b) Non-column based layout with non-perimeter I/O. (c) Layout with embedded Network-on-Chip.

7.1.2 Flexible Routing Architectures

Detailed Routing Architecture

To optimize for the characteristics and availability of different metal layers, modern FPGA routing architectures make use of wire types with different lengths, electrical characteristics and connectivity [143]. For instance long wires, which are usually implemented in the lower resistance upper metal layers, are used to quickly cross large distances but are relatively rare and usually have more restricted connectivity. Previously, VTR supported only a limited number of switch blocks (Wilton [193], Subset/Planar/Disjoint [194] and Universal [195]) which defined a routing architecture’s wire-to-wire connectivity pattern. These switch block patterns were developed with unit-length wires in mind, and do not describe different patterns between wire types, such as the hierarchical wire type connectivity used in commercial FPGA architectures [127, 196]. Similarly, VTR previously provided only limited control over the connectivity between block pins and different wire types.

In VTR 8 we have extended VTR’s routing architecture generation capabilities to allow the architect detailed control of the switch block (wire-to-wire), connection block (pin-to-wire) and direct-connect (pin-to-pin) patterns. The switch block pattern is now highly customizable,1 allowing detailed control of how different wire types and individual tracks interconnect along their lengths. The connection block specification is also more flexible, allowing the connectivity between individual block pins and wire types to be specified. Finally, support for direct connections between block pins has also been extended to fully support connectivity between blocks of differing sizes (as shown as purple connections in Figure 7.8).

1Developed in collaboration with Oleg Petelin.

Figure 7.2: Blocks and Routing Channels, showing the horizontal and vertical routing channels, routing tracks, and channel segments associated with each grid location.

These extensions give the architect more flexibility to define higher quality switch patterns which exploit different wire types [197], and more control over how the inter-block routing network connects to block pins. This also enables more accurate modelling of commercial routing architectures with hierarchical connectivity, as discussed in Section 7.2.

Routing Channels & Large Blocks

VTR groups routing tracks which are logically adjacent into routing channels which are associated with a particular location in the FPGA device grid as shown in Figure 7.2. We refer to the wires above or to the right of a block as channel segments. In addition to unit-sized logic blocks (Figure 7.3a), most heterogeneous FPGA architectures now include larger blocks (Figure 7.3b), such as large DSP blocks [189], large high-density RAM blocks (e.g. UltraRAM [198]), or other hardened accelerators (e.g. PCI-E, Ethernet [199]). These types of blocks are often created following different design methodologies from the rest of the FPGA fabric. For instance, a DSP block or hardened PCI-E interface may be designed with a standard-cell-based methodology, while RAM blocks are usually built around a full-custom memory array. This makes integrating these blocks into the FPGA routing fabric challenging, particularly when the blocks are large enough to contain entire routing channels (both width and height > 1). There are several different approaches which can be taken, all of which are now supported in VTR 8.2

The simplest approach is to forbid the FPGA routing fabric from crossing the large block, as shown in Figure 7.3c. This allows the large block to be implemented independently from the FPGA fabric, but results in large routing blockages within the FPGA fabric. Another approach is to reserve some of the metal layers above the large block, allowing FPGA fabric routing wires to cross over the block (Figure 7.3d). This has a smaller impact on the large block implementation, since only the reserved metal layers cannot be used by it and no routing fabric switching needs to be pushed down into the block’s active layers. Since signals can cross over the block this provides some routability improvement, but the lack of full switching capabilities between wires (switch-block) is still restrictive. This approach can also produce longer routing wires, since the wires on either side of the block are electrically shorted together, increasing their delay. A more complex approach is to fully integrate the routing fabric into the large block (Figure 7.3b).

2Previous releases of VTR assumed all large blocks contained switch-blocks as in Figure 7.3b.

Figure 7.3: Switch-block connectivity within and around unit-sized and 2x3 sized blocks. (a) Unit-sized blocks. (b) Large block with full internal switch-block. (c) Large block with no internal connections. (d) Large block with shorted straight connections.

Table 7.1: Normalized Impact of 2x3 RAM block Internal Routing Connectivity on VTR benchmarks (> 10K primitives)

           Wmin    Routing Area    Routed WL      Crit. Path Delay    Route Time
                   (1.3 · Wmin)    (1.3 · Wmin)   (1.3 · Wmin)        (1.3 · Wmin)
  none     1.00    1.00            1.00           1.00                1.00
  short    0.86    0.86            1.06           1.02                1.01
  full     0.86    0.92            1.00           1.00                0.94

Area is silicon area measured in Minimum Width Transistor Areas (MWTAs).

This requires both reserving metal layers and pushing down routing fabric switches into the active layers of the large block’s implementation. While this ensures maximum routing flexibility and performance it is highly disruptive to the large block’s implementation, and may not be feasible in many cases.3 As a result this approach likely only makes sense for extremely common blocks where the other approaches would cause too much disruption to the routing fabric. VTR 8 supports all the above architectural options, allowing architects to specify how the routing network is implemented within and around different block types, enabling them to evaluate the different trade-offs between design time, area, delay, and routability.4 Table 7.1 shows the impact of these approaches on an architecture with length 4 wires and large 2x3 grid location RAM blocks (evaluated with the large VTR benchmarks). Compared to the least flexible routing approach (none, Figure 7.3c), allowing wires to cross-over the block (short, Figure 7.3d) 5 significantly improves routability, with minimum routable channel width (Wmin) decreasing by 14%. Interestingly, allowing full switch-blocks within the block (full, Figure 7.3b) does not improve minimum routable channel width compared to short, but increases area.6 However full does improve routed wirelength, critical path delay, and route time compared to short.7

Non-configurable Switches

Traditionally, VTR has assumed that all switches in the routing network are configurable – meaning they can be configured to either connect or disconnect their input and output wires.

3For instance it may be difficult or impossible to embed routing switches into a RAM block’s full-custom memory array.
4All of these architectural choices are used in various blocks of a variety of commercial FPGA architectures.
5Critical path delay and wirelength also increase moderately. As wires crossing the large block are shorted together in this architecture the additional loading will increase their delay. It may also increase wirelength, since they can no longer be used independently.
6The area increases due to the additional switch-block switches; however our area estimate neglects the likely significant area cost of disrupting the block’s layout.
7While full achieves similar wirelength and critical path delay as none it does so at substantially smaller channel widths.

However some common circuit structures, such as inline non-configurable buffers along a wire and electrical shorts between wires, do not satisfy this assumption, as they always connect their input and output wires. To alleviate this, we added support for non-configurable switches which cannot be disconnected or turned off, but must be used if their associated wiring is used. This allows VTR to model structures such as wiring with inline buffers as shown in Figure 7.4a (which often appear in structures like clock networks), and non-rectilinear wiring (e.g. L, T or other shapes [200, 201]) as shown in Figure 7.4b which can be modelled as a combination of horizontal and vertical wire segments connected together with an electrical ‘short’. Section 9.7 describes the routing algorithm enhancements required to support non-configurable edges.

Figure 7.4: Circuit Structures which can be modelled with non-configurable switches. (a) Wire with inline buffer. (b) Diagonal or ‘L’-shaped wire modelled as rectilinear wires with an electrical short.
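As an illustrative sketch (hypothetical types, not VPR’s implementation), a non-configurable switch can be modelled as a flag on an RR-Graph edge; the set of wires a router is forced to use together is then found by following only non-configurable edges:

    #include <cstddef>
    #include <stack>
    #include <vector>

    // Each edge records whether its switch can be turned off. Non-configurable
    // edges (inline buffers, electrical shorts) are always 'on', so using any
    // node of a non-configurably connected set implies using the entire set.
    struct Edge { std::size_t to; bool configurable; };

    // adjacency[i] lists the edges leaving node i (assumed symmetric for shorts).
    std::vector<std::size_t> non_configurable_set(
            const std::vector<std::vector<Edge>>& adjacency, std::size_t start) {
        std::vector<bool> seen(adjacency.size(), false);
        std::vector<std::size_t> members;
        std::stack<std::size_t> work;
        work.push(start);
        seen[start] = true;
        while (!work.empty()) {
            std::size_t n = work.top(); work.pop();
            members.push_back(n);
            for (const Edge& e : adjacency[n]) {
                if (!e.configurable && !seen[e.to]) {  // follow only forced connections
                    seen[e.to] = true;
                    work.push(e.to);
                }
            }
        }
        return members;
    }

The ‘L’-shaped wire of Figure 7.4b, for example, would form a two-node set joined by a non-configurable short, and a router selecting either segment must account for the cost and legality of both.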

7.1.3 Area & Timing Modelling

VTR’s modelling and analysis capabilities have also been extended.

Area Modelling

VTR now uses the COFFE area model [123], which provides more accurate area estimates on modern process technologies. In particular, it better captures the effects of N-well spacing and the combined effects of increasing transistor width and adding parallel diffusion regions.

Size Dependent Mux Delays

As VTR constructs an FPGA routing architecture it builds muxes with various numbers of inputs. While the VTR area model accounts for this, VTR 7 assumed a fixed delay regardless of mux size. VTR 8 now supports specifying a range of size-dependent mux delays8, facilitating more accurate delay estimates.
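As a sketch of the idea (a hypothetical helper, not VTR’s actual delay model or architecture file syntax), the delay of a routing mux can be looked up from a small table of size/delay pairs supplied by the architect:

    #include <map>

    // Delays are specified for a few representative mux sizes; other sizes use
    // the nearest specified entry at or above the requested fanin.
    // Assumes delay_by_size is non-empty.
    float mux_delay(const std::map<int, float>& delay_by_size, int fanin) {
        auto it = delay_by_size.lower_bound(fanin);     // first size >= fanin
        if (it == delay_by_size.end()) {
            return delay_by_size.rbegin()->second;      // larger than any entry: use largest
        }
        return it->second;
    }

For example, with (hypothetical) entries for 4-, 8- and 16-input muxes, a 6-input mux would use the 8-input delay.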

Internal Primitive Timing Paths

Many components of an FPGA architecture are modelled as architectural primitives (e.g. LUTs, Flip-Flops). With increasing heterogeneity, a wider variety of primitives are used. Internally these components can be fairly complex and contain internal timing paths which will impact the timing behaviour of the overall system (e.g. DSP blocks, RAM blocks, hard memory controllers). VTR 8 now supports modelling these internal timing paths, which can occur between a primitive’s sequential input/output pins. It also supports multi-clock primitives where different ports are controlled by different clock domains. Figure 7.5 illustrates these capabilities. The input ports we1, addr1 and data1 are sequential inputs clocked by clk1. These are connected to the sequential output data2 (clocked by clk2) by an internal timing path. In contrast, the input addr2 is a combinational input which connects to data2. VTR 8 supports any collection of combinational and sequential inputs (with multiple clocks) and internal timing edges/paths between them.

Figure 7.5: Dual-Port RAM Block Internal Timing Path Example.

8Implemented by Oleg Petelin based on architectural discussions between us.

Improved Delay Modelling

Traditionally, VTR has only supported the specification of maximum delay timing models. These models are used with Static Timing Analysis (STA) to guide implementation optimization, and report circuit critical paths and maximum operating frequencies.9 However maximum-delay setup-time checks are not the only timing constraints which determine correct operation. It is also important to analyze minimum-delay timing for hold-time checks. To this end, VTR now supports the specification of minimum (in addition to maximum) timing information on all block delays, which is used for hold-time analysis (Section 10.2).
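For reference, the two checks exercise opposite delay corners. Neglecting clock skew, the standard constraints are:

    t_cq,max + t_comb,max + t_su  ≤  T_clk      (setup check: maximum delays)
    t_cq,min + t_comb,min  ≥  t_hold            (hold check: minimum delays)

where t_cq is the launching register’s clock-to-q delay, t_comb the combinational path delay, t_su and t_hold the capturing register’s setup and hold times, and T_clk the clock period. The minimum-delay specifications added in VTR 8 supply the t_cq,min and t_comb,min terms required by the hold check.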

7.2 Architecture Enhancements Case Study: Stratix IV

The new architecture modelling capabilities described in Section 7.1 make it possible to model a much wider variety of FPGA architectures. To illustrate their utility, we now present improvements to the Stratix IV architecture capture used with the Titan benchmark suite [10].10

7.2.1 Grid Layout

Using the generalized grid support (Section 7.1.1), the device grid layout was adjusted to more accurately reflect real Stratix IV devices, as shown in Figure 7.6. PLLs are now placed within the I/O periphery and the I/Os are positioned so they can access vertical and horizontal routing regardless of where they are located. The M9K and M144K RAM blocks along with DSP blocks (which contain multipliers and accumulators) are arranged in columns. The remainder of the device is filled with Logic Array Blocks (LABs) consisting of Look-Up-Tables (LUTs), adders and Flip-Flops (FFs).

7.2.2 Routing Network

The routing network connectivity pattern was one of the largest approximations in the original Titan Stratix IV architecture capture from [10], since VTR 7 offered very limited control of the generated connectivity pattern.

9The delay values for an architecture can be derived from the low-level architecture component implementations using tools like COFFE [123].
10Later Stratix series devices [196, 202] are similar to Stratix IV in the architecture dimensions considered here.

Figure 7.6: Placement of the leon2 benchmark with the updated grid layout and block type annotations (LAB, M9K, M144K, DSP, I/O and PLL blocks).

Wire Connectivity, Switch Block & Routing Mux Sizes

VTR 7 supported only a small number of fixed switch blocks, with the Wilton switch block being the most commonly used. However, the Wilton switch block was originally designed only considering wires of the same length. When used with multiple wirelengths the twisting pattern used by the Wilton switch block to improve routability means long wires rarely connect to each other. This defeats one of the primary purposes of having long wires in a routing architecture, which is to provide fast routes across long distances (accomplished by stringing together multiple long wires). By leveraging VTR 8’s custom switch block support (Section 7.1.2) we can define a switch block which better captures the characteristics of the Stratix IV routing architecture. The characteristics modelled are summarized in Table 7.2. As in previous work [165, 10] we model the routing network as a mixture of 87% L4 and 13% L16 wires with a channel width of 300 tracks throughout the device. Using VTR 8 we can correctly model the connectivity between the different wire types and between wires and block pins. For instance, the L4 wires connect to block pins and connect to other wires via switch blocks at every grid tile (L4ISBD = 1). The L16 wires are only accessible from the L4 wires and can neither drive nor be driven by block pins.11 L16 wires also only connect to other wires via switch blocks at every 4th grid tile (L16ISBD = 4),12 and we correctly model this restriction.

We also take care to match the size of the wire driver muxes, something which can be explicitly controlled with the custom switch block support in VTR 8. The L4 Driver Input Muxes (DIMs in Intel terminology) are sized to be 12:1. We distribute the inputs to these muxes such that 6 inputs come from block pins (determined by Fc), and the remaining 6 inputs come from a combination of other L4 and L16 wires. For the L16 DIMs Stratix IV uses larger muxes to increase the number of potential paths onto the long-wire network. Using VTR 8’s custom switch-block support we were able to specify a routing architecture matching these characteristics. Since the precise details of Stratix IV’s switch pattern have never been disclosed, we experimentally evaluated a variety of routing patterns which matched the characteristics in Table 7.2 to find a pattern which performed well.13

11Intuitively, this makes the rare L16 wires easier to use since the more numerous L4 wires serve as a feeder network for getting on to and off of the L16 wires.
12This is done in Stratix IV to reduce the expensive deep via stacks required to access transistors from the upper metal layers [127].
13Note these experiments, which evaluated which wires and which position along those wires were selected to drive each mux, were not possible to perform in VTR 7 due to limited user control over the generated switch pattern.

Table 7.2: Summary of Modelled Stratix IV Routing Architecture Characteristics

  Metric                            Expression    Value
  Channel Width                     W             300
  L4 Wires                          0.8667 · W    260
  L16 Wires                         0.1333 · W    40
  L4 Fcin                           L4Fcin        0.055
  L4 Fcout                          L4Fcout       0.075
  L16 Fcin                          L16Fcin       0.000
  L16 Fcout                         L16Fcout      0.000
  L4 Driver Mux Size                L4DIM         12:1
  L16 Driver Mux Size               L16DIM        40:1
  L4 inter-switch-block distance    L4ISBD        1
  L16 inter-switch-block distance   L16ISBD       4

Figure 7.7: Three sided architecture with each pin able to access one horizontal track, and one vertical track.

Pin Locations

Stratix IV uses a three-sided routing architecture where each block can reach routing channels which are above, left or right of it [203]. By leveraging the enhanced pin location specifications, and improved control of routing connectivity around multi-height blocks (Section 7.1.2) this style of architecture can be accurately modelled. In particular, for each logical pin on a block we create two physically equivalent pins: one on the top edge of the block, and the other on either the left or right side of the block. This allows each logical block pin to access both horizontal (above) and vertical (either left or right) routing wires, matching Stratix IV’s capability (see Figure 7.7). By scaling the fraction of routing tracks to which each pin connects (Fc) appropriately we ensure the total number of wires connected to each logical pin remains constant.
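As a hypothetical numerical illustration of this scaling: if a single physical pin with Fc = 0.1 in a channel of W = 300 tracks connects to 0.1 · 300 = 30 wires, then after splitting it into two physically equivalent pins each pin is given Fc = 0.05, so the logical pin still reaches 2 · (0.05 · 300) = 30 wires in total.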

Direct-Connect

We also extend the modelling of direct-link connections (which were previously modelled only between LABs) to include connections between blocks of differing types. In particular care was taken to ensure multi-height blocks like DSPs and M144K RAM blocks connect correctly to adjacent blocks which may have differing heights as shown in Figure 7.8. We have also updated the architecture capture to include pack-patterns [204, 178] for all DSP block modes, ensuring that complex uses of the DSP block (like multiply-accumulate) are packed efficiently and use the appropriate dedicated circuitry.

Figure 7.8: Direct connections between adjacent blocks shown in purple, including horizontal ‘direct-link’ connections (between all blocks) and vertical carry-chain connections (between LABs). Block colors are the same as Figure 7.6.

7.2.3 Timing Model

The timing model has been extended to include block internal timing paths (Section 7.1.3), such as those between registered RAM address ports and data output ports, and within registered DSP blocks. A minimum delay timing model calibrated to match the component delays reported by Quartus STA is also included to enable hold-time analysis (Section 7.1.3).

7.2.4 Architecture Enhancements Comparison

Table 7.3 shows the impact of various architectural enhancements. In this experiment the Titan v1.1.0 benchmarks are run targeting both the original Stratix IV architecture capture (A) from [10] (which was describable in VTR 7), and several variants of the enhanced architecture with the custom switch-block and various DIM sizes (describable in VTR 8). The same version and configuration of VPR 8 is used in all cases. Compared to the baseline architecture (A), the enhanced architecture (E) improves routing area (silicon area used for routing) by 11% and critical path delay by 8%. The critical path delay improvement is achieved in part by appropriately modelling the hierarchical long and short wire connectivity of Stratix IV. This ensures long wires always connect to each other, allowing critical signals to quickly traverse large distances by staying on the long-wire network.14 The area reduction results from using smaller wire driver muxes than in the baseline architecture. Finally, the enhanced architecture is also easier to route, reducing route-time by 28%.

We also explored several other routing architecture variants, which tweak the size of the wire driver muxes. Note these experiments cannot be performed in VTR 7, as the driver mux sizes cannot be directly controlled. Compared to the baseline architecture (A), moving to the customized switch-block with hierarchical wire connectivity and fixed 12:1 driver muxes (B) reduces routing area by 13% and improves critical path delay by 5% while decreasing route time by 22%, at the cost of a small 3% increase in wirelength. Increasing the L16 driver muxes to 72:1 (C) makes it easier for critical connections to reach the fast long wires, further reducing critical path delay. While these muxes are much larger, there are relatively few L16 wires, so the overall area impact is less significant, but it does reduce the area improvement to 8%. Increasing the much more common L4 driver muxes to 16:1 (D) reduces wirelength by 4% and critical path delay by 10% compared to the baseline, but at the cost of 9% higher area than (C) (1% higher area than the baseline). Interestingly, we found that decreasing the L16 driver mux size to 40:1 (like Stratix IV) had no significant impact on wirelength or delay, but reclaims some area compared to using 72:1 muxes. Based on these results the enhanced architecture (E) was selected as the best/closest approximation to Stratix IV. Unless otherwise noted, all results targeting Stratix IV in VPR use this architecture. For instance, this architecture is used when comparing VPR to Quartus in Chapter 11.

14This is not the case with the Wilton switch-block used in the legacy architecture, as it shuffles tracks in a manner which makes it unlikely the relatively few long wires will connect to other long wires.

Table 7.3: Geometric Mean Normalized Impact of Architecture Variants on the Titan Benchmarks

     Switch-block   L4 DIM   L16 DIM   Routing Area Per-Tile   Routed WL   Crit. Path Delay   Route Time
  A  wilton         -        -         1.00                    1.00        1.00               1.00
  B  custom         12:1     12:1      0.87                    1.03        0.95               0.78
  C  custom         12:1     72:1      0.92                    0.99        0.92               0.76
  D  custom         16:1     72:1      1.01                    0.96        0.90               0.79
  E  custom         12:1     40:1      0.89                    1.00        0.92               0.72

Area is silicon area measured in Minimum Width Transistor Areas (MWTAs).

Chapter 8

Enhanced FPGA Packing & Placement

Packing and Placement are key steps in the FPGA design flow. This chapter outlines enhancements to VPR’s packing and placement algorithms included in VTR 8 [2], which significantly improve result quality, routability and run-time.

8.1 Packing Enhancements

VTR supports a highly general packing algorithm based on greedy bottom-up seed-based clustering, which can target a wide variety of FPGA logic block and hard block architectures. Seed-based clustering algorithms can achieve a very dense packing (high utilization per cluster) which helps minimize the number of clustered blocks required to implement a circuit. However, achieving high logic utilization has also been found to harm routability [205], and has been identified as a potential cause of routability issues with VTR 7 [10]. To this end we have focused on improving VPR’s packing quality to produce more natural clusters, which better reflect the true relationships between netlist primitives. These more natural clusters are typically less dense and less interconnected which improves routability and overall implementation quality. Algorithm 11 shows the key features of VPR 8’s packing algorithm. Given a target architecture and primitive netlist to cluster, the packer groups primitives into molecules (Line 2), which are groups of primitives which must be packed together.1 Next a seed molecule is selected (Line 5, described in Section 8.1.1) and used to create a new open cluster (Line 6). The cluster is then grown by selecting an unpacked molecule ‘related’ to the current cluster (Line 9, described in Section 8.1.2). The algorithm then attempts to add this candidate molecule into the cluster (Line 12). This continues until either the cluster is deemed to be sufficiently full (Line 8, described in Section 8.1.3), or no more ‘related’ molecules can be found (Line 10). At that point the current open cluster is closed (Line 15), and a new seed is selected from the remaining unpacked molecules (Line 5). This repeats until all molecules have been packed into clusters (Line 4). For more details on the generality of VPR’s packing algorithm and how legality is ensured see [204, 178].

1An example of a multi-primitive molecule is a carry-chain. Each adder in the chain must be packed together to use a logic block’s dedicated carry logic.


Algorithm 11 VPR 8 Packing Algorithm
Require: The target FPGA architecture, the primitive netlist to be packed
Returns: The packed clusters implementing the primitive netlist
 1: function vpr pack(architecture, primitive netlist)
 2:     unpacked mols ← build molecules(architecture, primitive netlist)
 3:     clusters ← ∅
 4:     while unpacked mols ≠ ∅ do
 5:         seed mol ← select seed(unpacked mols)
 6:         cluster ← create cluster(architecture, seed mol)
 7:         untried mols ← unpacked mols
 8:         while space remains(cluster) do
 9:             next mol ← pick next molecule(cluster, untried mols)
10:             if next mol = ∅ then
11:                 break                                        ▷ Nothing productive to add
12:             if try add molecule to cluster(cluster, next mol) then
13:                 unpacked mols ← unpacked mols \ next mol     ▷ Molecule is now packed
14:             untried mols ← untried mols \ next mol           ▷ Do not retry molecule
15:         clusters ← clusters ∪ cluster                        ▷ Commit cluster
16:     return clusters

Some FPGA clustering approaches have bypassed bottom-up clustering [206, 207, 157, 208] using either partitioning or flat placement. However these techniques often relax legality constraints, are architecture specific, or still use bottom-up clustering as the final legalization step. The VPR packer faces unique constraints as it must support many different FPGA architectures which include different heterogeneous block types (e.g. DSP, RAM and custom blocks, in addition to logic blocks) with arbitrary internal connectivity (e.g. depopulated crossbars) and legality constraints (e.g. external/internal pin limits, special structures like carry-chains). Focusing on a bottom-up clustering approach makes it feasible to continue supporting this flexibility. Consistent with [209] we found that avoiding high complexity clusters was important for improving routability and wirelength.

8.1.1 Where to Start?

Selecting a good seed primitive to start a cluster is an important consideration, since it impacts the order in which different parts of the netlist will be clustered. This is particularly important when transitive connectivity is used to determine relatedness. This is illustrated in Figure 8.1, which shows the different types of connectivity the packer considers from the perspective of the seed primitive LUT1. Initially (Figure 8.1a) only FF4 is considered to be transitively related to LUT1 (since both LUT1 and FF4 connect to RAM1). However, after the RAM slices (RAM1, RAM2, RAM3) have been packed into a RAM block (Figure 8.1b), the packer gains additional information as it sees that FF5 and FF6 are also transitively related to LUT1 (via the RAM block). It is therefore beneficial to pack large blocks like RAMs and DSPs early during packing, so the packer has more complete transitive connectivity information when packing soft logic.2 However, we found that VPR 7 did not always pack such blocks early. As a result we changed the seed selection criteria for timing-driven clustering to prefer this behaviour. This was accomplished by using a different criterion to rank potential seed molecules:

    gain = 0.5 · norm input pins + 0.2 · norm used input pins
         + 0.2 · norm primitives in molecule + 0.1 · primitive criticality        (8.1)

Where norm input pins, norm used input pins, norm primitives in molecule and primitive criticality are respectively: the number of input pins on the primitive (normalized to the primitive type with the most pins), the number of connected input pins (normalized to the primitive type’s number of input pins), the number of primitives in the molecule (normalized to the largest molecule in the circuit), and primitive timing criticality. The new gain function favours selecting primitives with a larger number of pins (such as DSPs and RAMs), ensuring they are clustered early in the packing process. In the case of a tie, the gain function favours blocks with more used inputs (i.e. which are connected in the circuit), and primitives which form parts of large multi-block molecules like carry-chains. Timing criticality is used as a final tie-breaker.

2RAMs and DSPs usually have less packing flexibility and so benefit less from transitive connectivity information.

Figure 8.1: Packing example with ζ = 4. (a) Connectivity before clustering. (b) Connectivity after RAM block clustered. (c) After packing prioritizing large net connectivity. (d) After packing prioritizing transitive connectivity.
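A minimal sketch of Equation 8.1 as code (hypothetical struct and field names; not VPR’s exact implementation):

    // Sketch of the VPR 8 seed gain of Equation 8.1.
    struct MoleculeStats {
        float norm_input_pins;         // pins on primitive / max pins of any type
        float norm_used_input_pins;    // connected input pins / primitive's input pins
        float norm_primitives_in_mol;  // primitives in molecule / largest molecule
        float primitive_criticality;   // timing criticality in [0, 1]
    };

    float seed_gain(const MoleculeStats& m) {
        return 0.5f * m.norm_input_pins
             + 0.2f * m.norm_used_input_pins
             + 0.2f * m.norm_primitives_in_mol
             + 0.1f * m.primitive_criticality;
    }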

8.1.2 What to Pack Next?

After a seed molecule has been selected and a new cluster opened, the packer searches for unclustered molecules to add. This process is guided by attraction functions which estimate the gain (benefit) of adding an unclustered molecule to the current open cluster. We observed that VPR 7’s attraction functions tend to produce somewhat unnatural clusters, with a mixture of related, loosely related, and unrelated logic. To address this we have made several enhancements to the attraction functions in VPR 8. Chapter 8. Enhanced FPGA Packing & Placement 95

Prioritization of Connectivity Types

The packer considers three types of connectivity when selecting molecules to try adding to the current cluster. Small net connectivity considers molecules connected to the current cluster via low-fanout nets, those with fanout below the high-fanout net threshold ζ (in VTR 7 ζ = 256). In Figure 8.1a the connections between FF1, FF2, RAM1 and LUT1 are examples of this type of connectivity. Selecting candidate molecules connected by small nets is beneficial as it encourages nets to become completely absorbed within a cluster, reducing the number of inter-block connections which must be routed through the inter-block routing network [38]. Large net connectivity considers molecules connected to the current cluster via high-fanout nets (fanout ≥ ζ). In Figure 8.1a the connections from FF7 are examples of this type of connectivity. Large high-fanout nets are less beneficial for guiding packing than small nets, as they can rarely be well localized (e.g. absorbed into a single cluster). Transitive connectivity considers molecules which are connected to the current cluster indirectly through another molecule or cluster. In Figure 8.1b, FF4, FF5, and FF6 are transitively connected to LUT1 through the RAM block. Transitive connectivity can be beneficial since it helps keep logic which is indirectly related together.3

VPR 7 prioritized small net connectivity first, followed by large net connectivity, and finally transitive connectivity. Since many molecules connect to at least one high-fanout net this meant VPR 7 only occasionally exploited transitive connectivity information. As a result, molecules which were only weakly related through high-fanout connectivity were often packed together – even when they were strongly related (e.g. via low-fanout connectivity) to other logic which had yet to be packed. The clusters which resulted often ended up with split personalities: some portion of the logic was strongly related, but other parts were only weakly interconnected with the rest of the cluster (and may have had stronger connectivity to logic which ended up in other clusters). This is illustrated in Figure 8.1c and Figure 8.1d. In Figure 8.1c, from the seed LUT1 small net connectivity guides the packer to pack FF1 and FF2 into the cluster. Then, following large net connectivity, the only loosely related FF7, FF8 and FF9 are added to the cluster. In contrast Figure 8.1d follows the same small net connectivity to pack FF1 and FF2, but then follows transitive connectivity to select FF4, FF5 and FF6. This results in a more natural clustering.

In VPR 8 we have de-emphasized the importance of large net connectivity through two techniques. First, we prioritize transitive connectivity over large net connectivity. This ensures transitively related molecules are added to a cluster before those connected to high fanout nets. Second, we have lowered the threshold for high fanout nets (ζ) from 256 to 32 for logic blocks and 128 for all other block types.4
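The resulting candidate ranking can be sketched as follows (a simplified illustration with hypothetical types; the real packer also handles legality checks and gain computation):

    #include <algorithm>
    #include <vector>

    // Candidates are ranked first by connectivity class (small-net before
    // transitive before large-net), then by attraction gain within a class.
    enum class Relation { SmallNet = 0, Transitive = 1, LargeNet = 2 };

    struct Candidate {
        int      molecule_id;
        Relation relation;
        float    gain;   // attraction to the current open cluster
    };

    int pick_next_molecule(std::vector<Candidate> candidates) {
        if (candidates.empty()) return -1;  // nothing related: close the cluster
        std::sort(candidates.begin(), candidates.end(),
                  [](const Candidate& a, const Candidate& b) {
                      if (a.relation != b.relation) return a.relation < b.relation;
                      return a.gain > b.gain;      // higher gain first within a class
                  });
        return candidates.front().molecule_id;
    }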

Unrelated Logic

As VPR packs a cluster it may be unable to find any unpacked molecules which are related to the current cluster. In VPR 7 the packer will continue to try filling up the cluster by finding unrelated molecules which will fit. While this minimizes the number of clusters created (by maximizing the utilization of each cluster) it can be detrimental to quality.

3For instance the FFs in a register bank may drive related logic but have no direct connectivity between themselves.
4A lower threshold treats more nets as high-fanout, further focusing the packer on small net connectivity.

Figure 8.2: Unrelated Clustering Example. (a) Unclustered. (b) Naturally clustered. (c) Clustered with unrelated logic.

Consider the unclustered netlist in Figure 8.2a, which has a clear structure and can be naturally clustered as shown in Figure 8.2b by placing only related logic in each cluster. Allowing unrelated logic as in Figure 8.2c may result in fewer clusters but they are typically much more strongly interconnected; in this case leading to two highly connected clusters connected to other logic on opposite sides of the device. Since the packer (both in VPR 7 and VPR 8) operates on a single open cluster at a time, ‘unrelated’ logic is only unrelated from the viewpoint of the current cluster and may be strongly related to other yet-to-be-packed parts of the netlist. As a result unrelated logic packing tends to scatter connected logic (which may have otherwise formed new more natural clusters) across many clusters. The resulting clusters are then coupled together through the unrelated logic making them more difficult to place and route. In VPR 8, unrelated logic packing is disabled unless too many clusters are produced to fit on a fixed-size device. The impact of these changes on the des90 benchmark is illustrated in Figures 8.3 and 8.4, which correspond to VTR 7-style and VTR 8-style packings respectively. In Figure 8.3 unrelated logic packing is enabled, producing fewer logic blocks which are more tightly coupled together. This coupling induces the placer to produce a tighter placement at the centre of the device to minimize wirelength (Figure 8.3a). However as Figure 8.3b shows, this causes significant routing congestion at the centre of the device, where 5066 channel segments5 use ≥ 90% of available wiring, resulting in an unroutable implementation. In contrast, with unrelated logic packing disabled (Figure 8.4a) the clusters are less interconnected and the placer is able to spatially localize the circuit’s connectivity resulting in a more spread-out and natural placement. As Figure 8.4b shows, this significantly reduces routing congestion, and the design routes easily – with routing utilization ≥ 90% in only two channel segments.

8.1.3 When is a Cluster Full?

Determining when a cluster should have no additional logic packed into it is also important for producing natural clusters. When attempting to fill each cluster as full as possible (as VTR 7 does) some clusters may end up using all (or nearly all) of their external routing ports (i.e. cluster pins which connect to inter-block routing). While this high density packing may be beneficial from a utilization perspective (minimizing the number of logic blocks), it can make routing more difficult. Clusters with a large number of external routing connections will put additional stress on their adjacent routing channels – potentially inducing congestion.

5A channel segment is all the wires in either a vertical or horizontal routing channel which overlap a particular grid tile, as illustrated in Figure 7.2.

Figure 8.3: des90 high packing density (unroutable). (a) Placement. (b) Routing utilization.

Figure 8.4: des90 moderate packing density (routable). (a) Placement. (b) Routing utilization.

Clustered Block Pin Utilization Targets

To avoid this behaviour we have modified the packer to consider the open cluster’s pin utilization as part of the criteria for deciding when it should be closed. If the current open cluster exceeds its pin utilization targets it is considered full and is closed. These targets bias the packer away from creating difficult to route high pin utilization clusters. Importantly, these targets are treated as soft (rather than hard) constraints, which ensures netlist structures which may legitimately require more pins than the utilization target (e.g. large carry-chains) can still be packed. VPR 8 now sets the default input pin utilization target of the logic block type to 80% of the available pins. All other blocks default to a 100% pin utilization target. This serves to guide the packer away from producing unnecessary high pin utilization logic blocks, at the potential cost of increasing the number of clusters created.
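A minimal sketch of this check (hypothetical names; the packer applies it as a soft constraint, so structures like large carry-chains can still legitimately exceed the target):

    // Decide whether an open cluster has hit its pin utilization target
    // (defaults: 80% of input pins for logic blocks, 100% for other block types).
    struct ClusterPins {
        int used_inputs;       // cluster input pins currently driven by external nets
        int available_inputs;  // physical input pins on the cluster's block type
    };

    bool exceeds_pin_utilization_target(const ClusterPins& c, float target = 0.8f) {
        return c.used_inputs > target * c.available_inputs;
    }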

8.1.4 Adapting to Available Device Resources

While VPR is often allowed to dynamically re-size the device grid based on the benchmark circuit, it also supports targeting fixed-size devices. When targeting fixed-size devices it is possible for the packer to produce more blocks of a particular resource than are available on the device. In VTR 8 we have enhanced the packer to be resource aware and adapt to the resource mix of the targeted device. Together these changes ensure VTR produces a higher quality natural packing when there is sufficient space, and a denser clustering only if required.

Resource Balancing

The packer can optionally attempt to balance the resource utilization of different block types to better match the resources available on the device. For instance, if a netlist primitive can map to multiple block types (e.g. RAMs of two different sizes), the packer will attempt to produce a packing which balances the usage of the different block types in proportion to their quantity on the device. This is implemented by tracking the usage of each block type for the current partial netlist packing, and comparing it to the number of each block type available in the device grid. Specifically, if a netlist primitive could map to multiple block types, resource balancing causes the packer to select the candidate block type which has lower utilization.6 On the Titan benchmarks (Section 11.2) targeting the Stratix IV model from Section 7.2 this results in more RAMs being implemented as larger M144Ks (which are typically underutilized), instead of the more common M9Ks. This decreased the average device size needed to implement RAM limited circuits by 10%.
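A sketch of this selection heuristic (hypothetical types; assumes every candidate block type has at least one instance available on the device):

    #include <cstddef>
    #include <limits>
    #include <vector>

    // When a primitive can map to several block types, pick the type whose
    // current utilization (blocks used / blocks available on the device) is lowest.
    struct BlockTypeUsage {
        int used;       // blocks of this type created so far in the partial packing
        int available;  // blocks of this type in the targeted device grid
    };

    int pick_balanced_block_type(const std::vector<BlockTypeUsage>& candidates) {
        int best_idx = -1;
        double best_util = std::numeric_limits<double>::max();
        for (std::size_t i = 0; i < candidates.size(); ++i) {
            double util = static_cast<double>(candidates[i].used) / candidates[i].available;
            if (util < best_util) {
                best_util = util;
                best_idx = static_cast<int>(i);
            }
        }
        return best_idx;  // index of the least-utilized candidate type (-1 if none)
    }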

Adaptive Re-packing

The packer initially focuses on producing a natural packing, with unrelated logic packing disabled (Section 8.1.2) and reasonable pin utilization targets (Section 8.1.3). If this packing uses more resources than are available, packing is re-started from scratch – with the aim of achieving a higher packing density and reducing resource utilization. During this re-packing unrelated logic packing is enabled along with resource balancing, and the pin utilization targets are relaxed.
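The overall two-pass flow can be sketched as follows (hypothetical options structure and callback; not VPR’s actual interface):

    #include <functional>

    // Two-pass strategy: attempt a 'natural' packing first, and only re-pack
    // more densely if the result does not fit on the targeted fixed-size device.
    struct PackOptions {
        bool  allow_unrelated_clustering;
        bool  balance_block_type_usage;
        float lab_input_pin_util_target;
    };

    void adaptive_pack(const std::function<bool(const PackOptions&)>& pack_and_check_fit) {
        // First pass: natural packing (unrelated clustering off, 80% pin target).
        if (pack_and_check_fit({false, false, 0.8f})) return;
        // Second pass: denser packing (unrelated clustering and resource balancing
        // on, pin utilization targets relaxed), re-packed from scratch.
        pack_and_check_fit({true, true, 1.0f});
    }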

8.1.5 Packing QoR Evaluation

Table 8.1 shows the cumulative impact of these modifications on the Titan benchmarks (Section 11.2) targeting the Stratix IV model from Section 7.2,7 where baseline corresponds to a VTR 7 style packing. Turning off unrelated logic packing (no-unrel.) had the largest impact, reducing routed wirelength by 16%, decreasing router run-time by > 2×, and allowing seven additional benchmarks to complete, at the (small) cost of 4% more logic blocks. Using the new seed formulation (seed) and an 80% input pin utilization target on logic blocks (pin util.) had limited effect. However, prioritizing transitive connectivity (prioritize transitive) further reduced (compared to the baseline) wirelength by 19% and improved critical path delay by 3%, while also enabling two more benchmarks to complete.

Table 8.2 shows the impact of modifying the packer’s high-fanout net threshold (ζ) with other parameters set to those of the last row of Table 8.1. Decreasing the threshold from 256 to 64 improves wirelength by 4% and reduces route time by 10% while increasing the number of M144K RAM blocks by 8%. Further decreasing the threshold to 48 or 32 allows the largest Titan benchmark (gaussianblur) to complete, but further increases the number of RAM blocks. These results indicate that high-fanout net connectivity is useful for densely packing coarser blocks like RAMs and DSPs but can hurt the routability of the more fine-grained logic blocks. Decreasing the LAB high-fanout threshold to 32 while leaving the threshold at 128 for other block types (128 LAB:32) achieves the same routability improvement without increasing the number of DSP and RAM blocks. This validates the use of these values as the VTR 8 default.

6This is still a very simple heuristic based on resource balancing, although still an advancement over VPR 7 which simply used a first-fit heuristic. Notably, resource balancing does not consider the efficiency of implementing primitives in different block types, which is left for future work.
7The VTR benchmarks showed similar trends but smaller improvements.

Table 8.1: Cumulative Normalized Impact of Packing Changes on Titan benchmarks

                         # Circuits                                 Routed   Crit. Path   Pack   Place   Route   Total
                         Completed   LABs   DSPs   M9Ks   M144Ks    WL       Delay        Time   Time    Time    Time
  baseline               13          1.00   1.00   1.00   1.00      1.00     1.00         1.00   1.00    1.00    1.00
  no-unrel.              20          1.04   1.00   1.00   1.00      0.84     0.99         0.83   1.04    0.44    0.78
  seed                   20          1.05   1.00   1.00   1.00      0.84     0.98         0.86   1.04    0.42    0.78
  pin util.              20          1.05   1.00   1.00   1.00      0.84     0.99         0.86   1.08    0.44    0.80
  prioritize transitive  22          1.05   1.00   1.00   1.00      0.81     0.97         0.86   1.07    0.43    0.80

Normalized values are the geometric means over the common set of benchmark circuits which completed for all tool parameters.

Table 8.2: Normalized Impact of Packer High Fanout Threshold on Titan benchmarks

               # Circuits                                 Routed   Crit. Path   Pack   Place   Route   Total
               Completed   LABs   DSPs   M9Ks   M144Ks    WL       Delay        Time   Time    Time    Time
  256          22          1.00   1.00   1.00   1.00      1.00     1.00         1.00   1.00    1.00    1.00
  64           22          1.01   1.00   1.00   1.08      0.96     0.99         0.97   0.98    0.90    0.97
  48           23          1.01   1.00   1.14   1.21      0.98     0.98         0.88   0.85    0.85    0.89
  32           23          1.03   1.01   1.19   1.22      0.97     0.99         0.93   0.97    0.85    0.97
  128 LAB:64   22          1.01   1.00   1.00   1.00      0.97     0.99         0.87   0.89    0.89    0.90
  128 LAB:32   23          1.03   1.00   1.00   1.00      0.95     0.98         0.83   0.87    0.85    0.87
  128 LAB:16   23          1.07   1.00   1.00   1.00      0.94     0.99         0.80   0.91    0.78    0.88

Normalized values are the geometric means over the common set of benchmark circuits which completed for all tool parameters.

8.2 Placement Enhancements

VPR’s Simulated Annealing (SA) based placer iteratively makes a perturbation to the current placement (called a move) which is then evaluated and either accepted or rejected [46]. A desirable property for any move generator used in SA-based placement is that all possible configurations (i.e. placements) should be reachable through some (hopefully short) sequence of moves. Failure to ensure this means some parts of the placement solution space will either be impossible or very difficult for the placer to reach, potentially excluding higher quality placements. In VPR 8 we have made enhancements to the move generator to improve robustness and optimization quality.

8.2.1 Placement Macro Move Generator

VPR’s placer supports placement macros which enforce relative placement constraints between specific blocks in the packed netlist. This is primarily motivated by architectural features such as fast dedicated routing for carry chains which require the connected blocks to have specific relative placements. As noted in [210], VPR 7’s move generator had limited support for moves involving placement macros, and immediately aborted (gave up on) any moves involving two or more placement macros. This meant the placement of carry chains was not always well optimized as the placer required longer sequences of indirect moves to change the placement of two macros.8 In VPR 8 we have extended the placer’s move generator to support moving two or more macros simultaneously.

Table 8.3 shows the cumulative impact of supporting different types of increasingly complex moves involving multiple placement macros on the Titan benchmarks targeting the Stratix IV model from Section 7.2. Compared to the VPR 7-style baseline placer (baseline, which does not support moves involving more than one macro), supporting moves with two macros (macro swap, Figure 8.5a) improves routed wirelength by 2% and reduces route-time by 4%.9 This more complex move generator increases placement time by only 1%.

We observed that despite this, many moves involving macros were still being aborted; particularly late during the anneal. This occurred because VPR’s region limit10 is reduced as placement progresses and as a result moves involving macros often either overlapped themselves or overlapped multiple nearby macros. For example, a move could propose shifting a carry chain by a distance shorter than the carry chain’s length, resulting in it overlapping its previous position (Figure 8.5b). Furthermore circuit structures like adder trees may cause carry chains to be placed close together, and as a result moving one carry chain may require shifting multiple carry chains to avoid overlaps (Figure 8.5c). Adding support for these types of overlapping macro moves (overlapping macros in Table 8.3) produces similar average QoR to macro swap, but enables the large carry-chain heavy bitcoin miner design to successfully route.11 On that design many proposed macro moves result in overlaps, so supporting such moves is key to allowing the placer to fine-tune the placement of carry chains.

8This was particularly problematic on designs with many carry-chains, as proposed moves are more likely to involve multiple carry-chains in such cases. For instance in the Titan bitcoin miner design 44% of logic blocks are part of a carry chain, and as a result > 49% of proposed moves were aborted.

Figure 8.5: Macro swap examples. (a) Simple independent macro-macro swap. (b) Self-overlapping macro swap (shift). (c) Overlapping macro-macro swap: moving macro a overlaps macro b, which when swapped also overlaps macro d.

Table 8.3: Cumulative Normalized Impact of Placement Move Generation Changes on Titan benchmarks

                          Completed    Placed   Placed   Routed   Routed CPD   Place   Route   Total
                          Benchmarks   WL       CPD      WL       (geomean)    Time    Time    Time
  baseline                21           1.00     1.00     1.00     1.00         1.00    1.00    1.00
  macro swap              21           0.97     0.98     0.99     1.00         1.01    0.96    0.98
  overlapping macros      22           0.97     1.00     0.99     1.01         1.01    0.94    0.99
  compressed move grid    22           0.92     0.98     0.94     1.02         1.01    0.86    0.97

QoR and run-time metrics are normalized geomeans over mutually completed benchmarks.

8.2.2 Compressed Move Grid

When proposing moves, VPR 7 uses a region limit to control the distance between the involved blocks. Figure 8.6a shows the region limit around a selected block; the block can only be swapped with blocks within the region limit. As placement progresses the region limit is decreased; focusing the placer on smaller, less disruptive changes which help fine-tune the placement. One challenge with this approach is that it can prematurely lock down blocks which are relatively sparse within the FPGA device grid, as shown in Figure 8.6b. Once the region limit shrinks below the inter-column distance of a particular block type, those blocks become stuck within their current column and will never be able to move to another column. If this happens too early in the anneal such blocks may be prematurely locked down in sub-optimal positions.

To counteract this, VTR 8 does not apply the region limit directly to the FPGA grid. Instead, it constructs a compressed grid space for each block type to which the region limit is applied. This ensures blocks of a particular type in the FPGA grid which are logically adjacent to each other are considered as such during move generation. As shown in Figures 8.6c and 8.6d this prevents the range limit from artificially restricting the movement of some block types.12 Using these compressed grid spaces also ensures VPR can always quickly find adjacent blocks of the same type.13 The results of this change are listed as compressed move grid in Table 8.3. Using compressed move grids further improves routed wirelength and route-time by 6% and 14% respectively. While routed critical path is shown to increase by 2%, this excludes the improvements on benchmarks like bitcoin miner which did not complete in the baseline. However, the post-placement estimated critical path delay (which includes all benchmarks) improves by 2%.

Figure 8.6: Region Limit Examples. (a) VTR 7 move region limit for a common block. (b) VTR 7 move region limit preventing a sparse block from moving between columns. (c) VTR 8 move region limit for a block type in a compressed grid space; common blocks can move across columns of other block types. (d) VTR 8 move region limit for a sparse block in a compressed grid space; sparse blocks can move between columns even at low range limits.

9On bitcoin miner the move abort rate was reduced to 22.8%.
10The region limit controls how far apart the blocks involved in a move are allowed to be.
11The bitcoin miner move abort rate was further reduced to 0.6%.

12Effectively, the region limit Rlim (Section 2.4.2) is scaled based on the density of each block type within the FPGA device grid.
13Previously in VTR 7 the placer would abort some moves if it was unable to quickly find another block of the correct type.
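To summarize the mechanism, here is a small sketch (hypothetical structure, not VPR’s code) of applying the range limit in a per-block-type compressed column space:

    #include <algorithm>
    #include <utility>
    #include <vector>

    // For one block type, keep the sorted list of grid columns containing that
    // type. The region limit is applied to indices in this list, so sparse block
    // types can still change columns even when the range limit is small.
    struct CompressedGrid {
        std::vector<int> columns_with_type;  // sorted x coordinates containing the type
    };

    // Range of real columns a block at column 'x' may move to, given a region
    // limit 'rlim' measured in compressed column indices. Assumes a non-empty grid.
    std::pair<int, int> column_move_range(const CompressedGrid& g, int x, int rlim) {
        int n   = static_cast<int>(g.columns_with_type.size());
        auto it = std::lower_bound(g.columns_with_type.begin(),
                                   g.columns_with_type.end(), x);
        int idx = std::min(static_cast<int>(it - g.columns_with_type.begin()), n - 1);
        int lo  = std::max(0, idx - rlim);
        int hi  = std::min(n - 1, idx + rlim);
        return {g.columns_with_type[lo], g.columns_with_type[hi]};
    }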

Chapter 9

AIR: A Fast but Lazy Timing-Driven FPGA Router

Routing is a key step in the FPGA design process, which significantly impacts design implementation quality. Routing is also very time-consuming, and can scale poorly to very large designs. This chapter describes the Adaptive Incremental Router (AIR) [4], the high-performance timing-driven FPGA router used in VTR 8 [2]. AIR dynamically adapts to the routing problem, which it solves ‘lazily’ to minimize work. Compared to the widely used VPR 7 router, AIR significantly reduces route-time (7.1× faster), while also improving quality (15% wirelength, and 18% critical path delay reductions). We also show how these techniques enable efficient incremental improvement of existing routing.

9.1 Introduction

Field Programmable Gate Arrays (FPGAs) have pre-fabricated routing resources along with configurable switches to interconnect them. This allows the routing resources to be re-configured to implement different designs, but also means the interconnect network has limited and restricted connectivity. The interconnect network typically accounts for a majority of delay, and 50%+ device area [38, 66]. As a result the routing of signals through the network has a large effect on the final design implementation quality (wirelength, power consumption and critical path delay). Unlike Application Specific Integrated Circuits (ASICs) where custom interconnect topologies can be constructed, FPGA routing requires finding a legal embedding of the design’s interconnect into the pre-fabricated network. Routing also accounts for a large portion of FPGA CAD flow run-time (41-86% in commercial and academic tools [10]) and can scale poorly to very large designs due to high-fanout nets and difficult to resolve congestion. In the FPGA CAD flow there are no stages following routing which can fix up routability or timing issues (e.g. no buffer insertion or gate re-sizing). The only way for designers to address these issues is to restructure their design’s logic (to reduce congestion and/or improve timing), which is extremely disruptive and time-consuming. It is therefore very important for FPGA routers to be robust (find legal routings) and high quality (minimize wirelength and delay). An FPGA’s routing resources and switches are typically modelled as a Routing Resource (RR) graph as shown in Figure 6.2, where conductors (wires/pins) and switches correspond to nodes and edges respectively. The FPGA routing problem is to find a large number of non-overlapping trees within the


RR graph which implement the design's connectivity, while simultaneously minimizing metrics such as wirelength and critical path delay, all within reasonable run-time. This is a challenging task given the large size of the RR graph1 and its limited connectivity (sparseness).

The most successful FPGA routing techniques are based upon the PathFinder negotiated congestion algorithm (described in Section 2.5) [60], where multiple nets are allowed to use the same routing resources. Such overused resources are said to be congested. Allowing congestion prevents nets from blocking each other, an important characteristic given the interconnect network's limited flexibility. To resolve congestion and arrive at a legal solution, nets are repeatedly ripped-up and re-routed while modifying routing resource costs.

A variety of works have studied FPGA routing. Early works focused on two-stage global-detailed routing [211] and investigated heuristics for Steiner point selection to improve quality [212]. However, both were outperformed by single-stage negotiated congestion routing [46]. Different criteria for ranking paths during graph search were explored in [213], improving quality at the cost of significantly higher run-time. [214] investigated methods to prune the search space and build good initial 'spines' for high-fanout nets. More recently, CRoute [215] proposed decomposing nets into independent connections to reduce route-time. Unlike CRoute, our approach maintains the natural net-based structure of the problem, which improves routing resource re-use, reduces redundant work during graph search, and facilitates further net-based optimizations. Another popular approach has been to exploit parallel computing resources to speed up routing [75, 216, 76, 162]. However, these techniques have also faced scalability challenges due to synchronization costs and load imbalances. Our approach is complementary, as it reduces the work required to route individual nets, particularly large high-fanout nets which often limit parallel speed-up [75].

We focus on developing the Adaptive Incremental Router (AIR), which improves scalability and quality compared to previous approaches. We describe the core algorithm in Section 9.2 and detail our enhancements in Sections 9.3 to 9.5. We then present the incremental improvement of existing routing in Section 9.6, and extensions to support non-configurable switches in Section 9.7. Finally, improvements to minimum channel width search are described in Section 9.8, and Section 9.9 evaluates the enhanced router.

9.2 Algorithm Overview

The AIR netlist routing algorithm is outlined in Algorithm 12. The algorithm operates over multiple routing iterations (Line 4). During each iteration, connections are routed between each net's driver and sinks (Lines 6 and 7). Importantly, each connection is allowed to use routing resources already used by other nets. Such overused routing resources are said to be congested. Once all nets have been routed (Line 5), the routing resource costs are selectively increased (Line 16) with the aim of reducing congestion during subsequent routing iterations. This process repeats until a legal routing is found (Lines 8 and 11), or the design is deemed unroutable, either by prediction (Line 14) or by hitting the maximum iteration limit (Line 4). Finally, the best legal routing found (if any) is returned (Line 20).

1 Tens to hundreds of millions of nodes for modern FPGAs.

Algorithm 12 AIR Netlist Routing
Require: The nets to route, α the maximum number of routing convergences
Returns: The best routing found
 1: function air route(nets, α)
 2:     best ← ∅, current ← ∅                          ▷ Best and current route trees for each net
 3:     convergences ← 0                               ▷ How many times a legal routing has been found
 4:     for iter ∈ 1 . . . max iters do
 5:         for net ∈ nets do
 6:             for sink ∈ unrouted sinks(net, current[net]) do    ▷ Incrementally route
 7:                 current[net] ← air route connection(net, current[net], sink)
 8:         if is legal(current) then
 9:             best ← best routing(best, current)     ▷ Keep best routing so far
10:             convergences ← convergences + 1
11:             if convergences = α then               ▷ Convergence limit reached
12:                 break
13:             reset pres fac()                       ▷ Reduce present congestion cost to focus on delay
14:         if predict unroutable() then               ▷ Early abort
15:             break
16:         update costs()                             ▷ Update pathfinder costs
17:         for net ∈ nets do                          ▷ Prepare for incremental re-routing
18:             ripup illegal connections(current[net])
19:             ripup delay degraded connections(current[net])
20:     return best

Algorithm 13 AIR Connection Routing
Require: The net to route, its existing route tree, the target sink to be added
Returns: The updated route tree with a branch connecting to sink
 1: function air route connection(net, route tree, sink)
 2:     path ← ∅
 3:     if fanout(net) ≥ β and criticality(sink) < γ then
 4:         heap ← initialize nearby routing(route tree, sink)     ▷ High-fanout opt.
 5:         path ← find path from heap(heap, sink)
 6:     if path = ∅ then
 7:         heap ← initialize full route tree(route tree)
 8:         path ← find path from heap(heap, sink)
 9:     update route tree(route tree, path)
10:     return route tree

(a) Route tree from driver/root A to sinks/leaves {E, H, I, G}. Node F is illegal (also used by another net).
(b) Pruned legal partial route tree. Only sinks H and I must be re-routed. Any of {A, B, C, D, E, G} could be new branch points.
(c) Incrementally re-routed legal route tree. Sinks H and I are now legally routed through J.
Figure 9.1: Route tree pruning example.

9.3 A Fast but Lazy Router

A key component to AIR’s efficient run-times compared to previous approaches is its ‘lazy’ approach. Wherever possible AIR attempts to minimize performing redundant work. This section outlines a number of enhancements made to AIR which improve its run-time performance by making it lazier.

9.3.1 Incremental Routing

In PathFinder-based routers each net's routing is represented as a route tree: a tree of routing resources spanning from the net's driver (tree root) to each of its sinks (tree leaves). Each path in the route tree (from root to leaf) corresponds to a net connection (from driver to sink). To reduce wirelength it is usually desirable for connections within the same net to share some routing resources. Traditional PathFinder-based routing algorithms rip-up and re-route all nets every routing iteration. However, in practice many nets (or portions of nets) may have already been legally routed. In such cases it is not strictly necessary to re-route these connections. AIR exploits this by re-routing nets incrementally on a per-connection basis. This 'lazy' approach avoids redundantly ripping up (and then re-routing) the legal portions of a net.2 First, a net's current route tree is walked to identify congested sub-trees as shown in Figure 9.1a. Illegal sub-trees are then pruned as shown in Figure 9.1b to leave only legal routing (Algorithm 12 Line 18). Second, the pruned net sinks are re-routed during the next routing iteration (Algorithm 12 Line 6), using the net's remaining legal routing as potential branch points, as shown in Figure 9.1c.3

CRoute [215] proposed decomposing nets into independent connections, which allowed the creation of a routing algorithm with reduced run-time. Unlike CRoute, our approach does not strictly decompose the routing problem into connections and maintains its natural net-based structure. This allows partial routing from previous connections to be re-used, which reduces the amount of graph exploration required and enables further net-based optimizations such as high-fanout routing (Section 9.3.2).

2 This is a much more fine-grained approach than [216], which only avoided ripping-up completely legal nets.
3 Note the router already stores route trees for each net, so no additional memory is required for incremental routing.
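The pruning step of Figure 9.1 can be sketched as a simple post-order walk over the route tree. The Python below is purely illustrative (RouteTreeNode, is_congested and the surrounding names are hypothetical stand-ins, not VPR's data structures); a real implementation would additionally trim legal branches that no longer lead to any sink.

from dataclasses import dataclass, field
from typing import List

@dataclass
class RouteTreeNode:
    rr_node: int                                      # RR graph node id
    is_sink: bool = False
    children: List["RouteTreeNode"] = field(default_factory=list)

def prune_route_tree(node, is_congested, lost_sinks):
    """Return the legal portion of the tree rooted at `node`, or None.

    Sinks removed along with an illegal sub-tree are appended to
    `lost_sinks` so they can be incrementally re-routed next iteration.
    """
    if is_congested(node.rr_node):
        collect_sinks(node, lost_sinks)               # whole sub-tree is dropped
        return None
    node.children = [kept for kept in (prune_route_tree(c, is_congested, lost_sinks)
                                       for c in node.children) if kept is not None]
    return node

def collect_sinks(node, lost_sinks):
    if node.is_sink:
        lost_sinks.append(node.rr_node)
    for c in node.children:
        collect_sinks(c, lost_sinks)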

Table 9.1: Normalized Impact of Incremental Routing on VTR benchmarks (> 10K Primitives)

δ         Wmin    Routed WL    Crit. Path Delay   Route Time    Route Time    Peak     Total Flow
                  (1.3·Wmin)   (1.3·Wmin)         (find Wmin)   (1.3·Wmin)    Memory   Time
Baseline  1.00    1.00         1.00               1.00          1.00          1.00     1.00
8         1.01    0.99         1.01               0.63          0.77          1.01     0.72
16        1.01    0.98         1.03               0.66          0.77          1.01     0.76
32        1.02    0.99         1.00               0.75          0.81          1.01     0.84
64        1.01    0.99         1.01               0.82          0.90          1.01     0.86
128       1.00    0.99         1.01               0.82          0.97          1.00     0.90

Table 9.2: Normalized Impact of Incremental Routing on Titan benchmarks

δ         Routed WL   Crit. Path Delay   Route Time   Peak Memory   Total Flow Time
Baseline  1.00        1.00               1.00         1.00          1.00
8         0.98        1.00               0.85         1.00          0.93
16        0.99        1.00               0.81         1.00          0.87
32        0.99        1.00               0.85         1.00          0.87
64        1.00        1.00               0.85         1.00          0.88
128       1.00        1.00               0.87         1.00          0.91
Excluding: directrf, gaussianblur, dart

While incremental routing is primarily a run-time optimization, it is important to consider whether failing to rip-up legal connections can have any impact on QoR. In particular, since PathFinder uses a present congestion cost factor, not ripping up all connections may result in some timing critical nets taking more indirect routes to avoid present congestion. During experiments we found that in some instances this could degrade critical path delay. To alleviate this behaviour we also perform delay-based rip-up (Algorithm 12 Line 19), which will force critical connections4 whose delay has also degraded (compared to previous results)5 to be re-routed even if they have a legal routing.

While this technique can save significant work for large nets, on small low-fanout nets the work performed pruning the route tree is comparable to the work required to re-route the net. As a result we apply incremental net re-routing only for nets beyond a certain fanout threshold (δ).

Tables 9.1 and 9.2 compare incremental routing at various fanout thresholds to the Baseline method (where nets are always ripped up) on the VTR benchmarks (targeting VTR's flagship k6 frac N10 frac chain mem32K 40nm architecture) and Titan benchmarks (targeting the Stratix IV model from Section 7.2) respectively, with all other routing optimizations enabled. The results show that incremental routing has effectively no impact on quality of results (minimum routable channel width, wirelength, or critical path delay), while reducing the run-time of the router. For the VTR benchmarks (Table 9.1) and Titan benchmarks (Table 9.2), fixed channel width route-time is minimized at a fanout threshold of 16, reducing average route-time by 23% and 19% respectively. Importantly, the impact of incremental routing is more significant on larger circuits (run-time reduction of 2.3× on directrf), which shows it improves scalability.
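Concretely, the per-connection rip-up decision can be sketched as below, using the thresholds from footnotes 4 and 5 (criticality within 90% of the maximum, and delay more than 10% worse than the best seen). The connection object and its fields are hypothetical stand-ins for VPR's internal bookkeeping.

def should_ripup(conn, is_congested, max_criticality,
                 crit_frac=0.9, delay_growth=1.1):
    """Decide whether a routed connection must be ripped up and re-routed.

    `conn` is assumed to carry: .rr_nodes (nodes on its path), .criticality,
    .delay (this iteration) and .best_delay (best delay seen so far).
    """
    # 1) Illegal connections (using a congested node) are always re-routed.
    if any(is_congested(n) for n in conn.rr_nodes):
        return True
    # 2) Timing-critical connections whose delay has degraded relative to the
    #    best seen value are also re-routed, even though they are legal.
    is_critical = conn.criticality >= crit_frac * max_criticality
    has_degraded = conn.delay > delay_growth * conn.best_delay
    return is_critical and has_degraded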

9.3.2 High-Fanout Nets

AIR routes each net one connection at a time (Algorithm 12 Line 7) using Algorithm 13. To avoid wasting large amounts of wiring, any existing routing (from a net’s previously routed connections) is added to the heap with zero cost, allowing it to be re-used as a branch-point for subsequent connections (Algorithm 13 Line 7).

4 Those with criticality ≥ 0.9 · max criticality.
5 Where new delay > 1.1 · best seen delay.

Most designs have a relatively small number of high-fanout nets which span a large portion of the device. On a subset of the Titan benchmarks we found VPR spent 12-34% of its run-time routing these few high-fanout nets. For a net with k sinks, putting the entire route tree into the heap means the time complexity to route the net grows quickly. For a single sink, pushing the net's O(k) route tree into the heap has a time complexity6 of

O(k log k).    (9.1)

Since this is done for each of the k sinks, the overall time complexity to route a net grows as:

O(k² log k).    (9.2)

The O(k²) component of Equation (9.2) is problematic, as it leads to very high route-time for high-fanout nets. To avoid this, VTR 8 uses an approach based on [217]. Rather than putting the entire route tree into the heap, only routing resources which are spatially near the target sink are added (Algorithm 13 Line 4). Since the number of such resources is bounded by a small constant, this reduces the per-sink O(k log k) term of Equation (9.2) to O(1), and the complexity of routing a net falls to:

O(k),    (9.3)

which is much more scalable.

Figure 9.2 illustrates the differences between these approaches, by showing the routing resources examined by the router when routing to a particular high-fanout net sink located near the center of the device. Figure 9.2a shows the traditional approach looks at a large number of routing resources (i.e. considers the entirety of the large route tree), many of which will not lead to the target sink (since they are far from it). In contrast, Figure 9.2b shows the spatial approach is much more efficient, with the router only examining a few routing resources which are near the target sink.7

However, we found that on some FPGA architectures this technique was not robust, and no path could be found between the spatially nearby previous routing and the target sink. This can occur in routing architectures which have sparse connectivity to certain routing resources such as long wires. In such cases we fall back to placing the full route tree onto the heap (Algorithm 13 Lines 6 to 8). Since such instances are rare, this maintains the speed-up of spatially aware high-fanout routing while ensuring robustness. Furthermore, we observed that this high-fanout routing technique can also degrade timing performance, since high-fanout timing-critical connections will have less flexibility to find potentially faster routes. To alleviate this we route all timing-critical high-fanout connections8 using the full previous route tree (Algorithm 13 Line 3).

While these optimizations had no significant effect on the VTR benchmarks, they do impact the larger Titan benchmarks. Table 9.3 shows the results on the Titan benchmarks (targeting the Stratix IV model from Section 7.2)9 for several fanout (β) and criticality (γ) thresholds (Algorithm 13 Line 3), with all other routing optimizations enabled.

6 Assuming a binary heap with O(log k) insertions.
7 Note that even in commercial FPGAs which often have dedicated networks for high-fanout connections (e.g. global clock networks), this technique should still be beneficial, as it helps avoid examining the entire global network when routing a new sink.
8 For the first routing iteration we treat all connections as timing critical. For later iterations the criticality is determined based on the delays of the previous routing iteration.
9 The modelled routing architecture has no high-fanout optimized global routing networks, and as a result clocks are not routed in these results. However other high-fanout nets like resets are routed in the regular routing.
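The spatially limited heap seeding can be sketched as follows. This Python sketch is illustrative only: node_xy and the linear scan over route-tree nodes stand in for the spatial bin lookup a real router would use (a linear scan would re-introduce the per-sink O(k) cost the optimization is meant to avoid).

import heapq

def seed_heap_near_sink(route_tree_nodes, node_xy, sink_xy, window=3):
    """Push only route-tree nodes spatially near the target sink onto the heap.

    route_tree_nodes: RR node ids already used by this net's routing.
    node_xy / sink_xy: (x, y) grid locations (hypothetical helpers).
    Returns a heap of (cost, node) entries; an empty heap signals the caller
    to fall back to seeding the full route tree (Algorithm 13 Lines 6 to 8).
    """
    sx, sy = sink_xy
    heap = []
    for node in route_tree_nodes:
        x, y = node_xy(node)
        if abs(x - sx) <= window and abs(y - sy) <= window:
            # Existing routing is free to re-use, so it is seeded at zero cost.
            heapq.heappush(heap, (0.0, node))
    return heap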

(a) Traditional approach   (b) Spatial approach
Figure 9.2: Routing resources explored (colored by cost to reach the current target sink) while routing a high-fanout net (∼ 3500 terminals) on the neuron benchmark circuit targeted to the Stratix IV model from Section 7.2. The target sink is located near the centre of the device.

Table 9.3: Normalized Impact of High Fanout Routing on Titan benchmarks

                   Routed WL   Crit. Path Delay   Route Time
off                1.00        1.00               1.00
β = 32, γ = 1.0    1.00        1.03               0.84
β = 64, γ = 1.0    1.00        1.03               0.86
β = 64, γ = 0.9    1.00        1.00               0.77

While high-fanout routing reduces run-time for a wide range of these parameters, the best result occurs with a fanout threshold of 64 and a criticality threshold of 0.9, where route time is reduced by 23% with no impact on wirelength or critical path delay. Notably, setting an active criticality threshold (γ < 1) improves route time since it likely avoids having to rip-up and re-route otherwise delay sub-optimal high-fanout connections.

9.3.3 Block Output-pin Lockdown

To facilitate faster routing convergence, the VPR 7 router locks down block output pins after a fixed number of routing iterations. This is a search-space pruning technique, which aims to prevent signals from oscillating between different logically equivalent output pins as congestion is resolved. As shown in Table 9.4 (with all other routing optimizations enabled), disabling this feature reduces both wirelength and route-time on the Titan benchmarks targeting the Stratix IV model from Section 7.2. AIR's incremental re-routing enhancement (Section 9.3.1) avoids unnecessarily ripping up and re-routing uncongested output pins, and so naturally avoids the scenario output pin lockdown tried to address in VPR 7. As a result, output pin lockdown is not performed by AIR.

Table 9.4: Normalized Impact of OPIN Lockdown on Titan benchmarks

      Routed WL   Crit. Path Delay   Route Time   Peak Memory
on    1.00        1.00               1.00         1.00
off   0.98        1.01               0.94         1.00

9.4 Improving Quality

In addition to being run-time performant, it is important that an FPGA router like AIR produce high quality results. This section describes enhancements which focus on improving the quality of AIR's routing. Often, these enhancements also lead to further run-time improvements, as a higher quality routing leads AIR to converge more quickly.

9.4.1 Router Lookahead

VPR’s connection-based router makes use of a predictive lookahead, to estimate the cost (delay and congestion) of reaching the target sink through the node being expanded (Algorithm 16 Lines 10 and 11). This lookahead is used to quickly guide the router to find a low cost path to the target.10 The lookahead used by prior versions of VPR, which we call the classic router lookahead, makes a variety of simplifying assumptions which may not hold true on modern FPGA architectures. In particular, it assumes different wire types do not interconnect, and all wire types connect to block pins. As a result, the classic lookahead can mislead the router on modern architectures (where its assumptions typically do not hold, as illustrated in Figure 9.3), harming delay and wirelength. As a result, AIR uses a new lookahead based on an enhanced version of [143], which adapts to the targeted routing architecture. The new lookahead uses an undirected Djikstra-based search to quickly profile a large number of different routes through the RR graph. As shown in Algorithm 14, this search is performed for a handful of sample locations, and every wire type and orientation (vertical or horizontal). This produces the delay and congestion costs to travel different horizontal and vertical distances. This data is reduced into a concise map of the routing network by making the common assumption of translational invariance in the FPGA routing network, so only differences in position need to be stored.

Algorithm 14 Map Lookahead Creation
 1: function build map lookahead()
 2:     map ← init map inf()                          ▷ Initialize all entries to ∞
 3:     for loc ∈ SAMPLE LOCS do                      ▷ Each sample location
 4:         for w ∈ WIRE TYPES do                     ▷ Each wire type (e.g. length)
 5:             for c ∈ {CHANX, CHANY} do             ▷ Each wire orientation
 6:                 ref node ← pick ref node(loc, w, c)
 7:                 costs ← dijkstra flood from(ref node)
 8:                 for x ∈ 1 . . . W do              ▷ Each horizontal position
 9:                     for y ∈ 1 . . . H do          ▷ Each vertical position
10:                         dx, dy ← |loc.x − x|, |loc.y − y|    ▷ Position difference
11:                         delay ← min(costs[x][y].delay, map[w][c][dx][dy].delay)
12:                         cong ← min(costs[x][y].cong, map[w][c][dx][dy].cong)
13:                         map[w][c][dx][dy].delay ← delay
14:                         map[w][c][dx][dy].cong ← cong
15:     return map
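Once built, the map is queried during node expansion using only the horizontal and vertical distance to the target, exploiting the translational invariance assumption. A minimal Python sketch of such a lookup, with hypothetical accessors and an illustrative map layout (not VPR's actual data structures):

from collections import namedtuple

Cost = namedtuple("Cost", ["delay", "cong"])

def lookahead_estimate(cost_map, wire_type, chan, node_xy, target_xy):
    """Return the profiled (delay, congestion) cost from `node_xy` to `target_xy`.

    cost_map[wire_type][chan][(dx, dy)] -> Cost, as produced by a build step
    like Algorithm 14.  Only the absolute position difference is needed.
    """
    dx = abs(target_xy[0] - node_xy[0])
    dy = abs(target_xy[1] - node_xy[1])
    return cost_map[wire_type][chan][(dx, dy)]

# Tiny illustrative map: from an L4 wire in a horizontal channel, moving
# (2, 0) tiles is profiled to cost 1.5 ns of delay and 8 units of congestion.
example_map = {"L4": {"CHANX": {(2, 0): Cost(delay=1.5e-9, cong=8.0)}}}
print(lookahead_estimate(example_map, "L4", "CHANX", (10, 5), (12, 5)))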

Table 9.5 shows the impact of the different lookaheads on the Titan benchmarks (targeting the Stratix IV model from Section 7.2), with all other router optimizations enabled. First, compared to the VPR 7-style lookahead (classic), the new map lookahead (map) achieves much better result quality, reducing critical path delay by 9% and wirelength by 2%, but at the cost of increasing route-time by nearly 6× (the high run-time is alleviated by adjusting the base costs as described in Section 9.4.2).

10 This is an approximate variant of the A* algorithm [218], where the lookahead is akin to a heuristic lower bound but is not strictly admissible.

Table 9.5: Normalized Impact of Router Lookahead and Base Costs on Titan benchmarks

                 Routed WL   Crit. Path Delay   Route Time   Peak Memory
classic          1.00        1.00               1.00         1.00
map              0.98        0.91               5.94         1.00
classic length   0.93        0.99               0.76         1.00
map length       0.92        0.91               0.89         1.00
Values are normalized geomeans over mutually completed benchmarks.

Figure 9.3: Stratix IV-like hierarchical routing architecture with short (L4) and long (L16) wires. Long wires are only accessible via short wires, and do not directly connect to block pins.

As illustrated in Figure 9.3, modern FPGAs often have high-speed long wire sub-networks which are only accessible from a subset of the more plentiful short wires [127, 66] (Section 7.2.2). The map lookahead improves critical path delay since it understands this hierarchical structure from its profiling of the routing network. This is illustrated in Figure 9.4, which shows the delay estimates for various distances starting from a short wire for both lookaheads. In Figure 9.4a the classic lookahead assumes all connections use the same type of short wire, leading the lookahead's delay estimate to rapidly increase with distance. In contrast, Figure 9.4b shows the map lookahead's delay estimate, which increases much more slowly, particularly at longer distances. Because the map lookahead understands the interconnections between the different wire types, it knows it is possible to get onto the faster long wire network even when starting from a short wire, and its delay estimate therefore captures the effect of using fast long wires to traverse long distances.

These differences guide the router to make different choices for timing critical long distance connections. With the classic lookahead, the router immediately starts driving towards the target using the short-wire network, since it does not understand faster paths may exist. It will only use the long wire sub-network if it happens to encounter it during its directed exploration. The map lookahead instead guides the router to search the short-wire network more thoroughly to find a way onto the faster long wire network. Getting onto the long wire network early leads to significant delay reductions for long distance connections.

(a) Classic L4   (b) Map L4
Figure 9.4: Lookahead Delay Estimates

(a) Classic L4   (b) Classic L16   (c) Map Length L16
Figure 9.5: Lookahead Congestion Estimates

9.4.2 Routing Resource Base Costs

The lookaheads also provide congestion/resource cost estimates to guide the router's search process. Figure 9.5 shows the congestion cost estimates. Figures 9.5a and 9.5b show the classic lookahead's congestion cost estimates when starting from short or long wires. Interestingly, this shows it is much cheaper to travel an equivalent distance using the long rather than short wires, even though the long wires are much larger and rarer routing resources.11 This cost difference biases the router to prefer using the long wire network even for non-timing-critical connections. The original map lookahead suffers from the same bias. Since the map lookahead also has a stronger preference for using long wires for timing-critical connections, this led to significant congestion in the long wire network. While this congestion would eventually be resolved through congestion negotiation, that is a very slow process, requiring connections to be repeatedly ripped-up and re-routed over many routing iterations.

To address this issue we adjusted the wire base costs to be proportional to each wire's length. For instance, length 4 and 16 wires are automatically assigned base costs of 4 and 16 units respectively (previously both would have been allocated unit base costs). The resulting congestion cost estimates for long wires with the map lookahead are shown in Figure 9.5c. These costs make long wires more expensive than short wires, particularly when using a long wire to travel distances shorter than its length.12 This guides short distance and non-timing-critical connections to use the more plentiful short wires. As a result, wirelength improves, and run-time decreases (as congestion is resolved quickly).

This is illustrated in Figure 9.6, which compares how base costs affect congestion resolution and route-time. For non-length-scaled base costs (Figure 9.6a) both the L4 and L16 wires are initially heavily congested. However, the L16 wires are proportionally much more congested than the L4 wires.13 Congestion in both wire lengths then resolves very slowly over many routing iterations. In contrast, for length-scaled base costs (Figure 9.6b) congestion in both wire lengths resolves quickly. This is particularly true of the L16s, which are congestion free much earlier (6th iteration) than with unit base costs (11th iteration). As a result the router converges in fewer routing iterations, and requires significantly less run-time (50 vs. 80 seconds).14

11 This is derived from the original VPR formulation [38], which used a single base cost for all wire lengths.
12 For instance, with the previous uniform base costs using an L16 wire to move 8 units was half the cost of using 2 L4 wires. Using length-scaled base costs the L16 wire would be twice as expensive as using 2 L4s, which is intuitively consistent as half the L16 wire would be unused.
13 There are ∼ 26× fewer L16 than L4 wires (Table 7.2), but initially only 5× fewer congested L16s in Figure 9.6.
14 Re-routing in the presence of congestion requires the router to repeatedly search more and more of the RR graph to find uncongested paths. As a result, the run-time benefits of reducing congestion tend to improve with the larger RR graphs needed for bigger designs.

(a) Unit base costs.   (b) Length-scaled base costs.
Figure 9.6: Comparison of routing congestion by resource type and router run-time for different base costs on the neuron benchmark circuit with the map lookahead.

The impact of using length-scaled base costs with the two lookaheads is also shown in Table 9.5. With length-scaled base costs the map lookahead (map length) improves route-time substantially, making it 11% faster than the classic lookahead. Length-scaled base costs also improve the classic lookahead's run-time (classic length), though the relative improvement is smaller. For both lookaheads the length-scaled base costs reduce routed wirelength by 7-8%.15
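The base-cost change itself is small. The sketch below (a hypothetical WireSegment type with a length attribute, not VPR's data structures) assigns each wire segment a base cost proportional to its length, in place of the previous uniform unit cost:

from dataclasses import dataclass

@dataclass(frozen=True)
class WireSegment:
    name: str
    length: int   # wire length in tiles (e.g. 4 for an L4, 16 for an L16)

def base_cost(segment, length_scaled=True):
    """Congestion base cost of one wire segment.

    With length_scaled=True an L16 wire costs 4x an L4, so using a long wire
    to travel a short distance is no longer artificially cheap.
    """
    return float(segment.length) if length_scaled else 1.0

print(base_cost(WireSegment("L4", 4)), base_cost(WireSegment("L16", 16)))  # 4.0 16.0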

9.5 Adapting to Congestion

Difficult-to-resolve routing congestion is a significant challenge in FPGA routing. This section outlines some enhancements which allow AIR to adapt dynamically to congestion, improving its robustness when faced with difficult congestion.

9.5.1 Dynamic Router Bounding Boxes

To reduce router run-time, negotiated congestion routers like VPR have historically restricted each net to route within a fixed region derived from the bounding box of the net's driver and sink locations.16 While this limiting of the router's search space saves run-time, it can make it more difficult to resolve congestion. In particular, the fixed bounding box limits how far a net can 'move away' from a congested region (as it is restricted to remain inside the bounding box). Such an approach can make it difficult to resolve congestion, since it forms a hard limit which may prevent a connection from routing around congestion, as shown in the left of Figure 9.7. To avoid this, AIR dynamically expands the search limit when a net uses a routing resource adjacent to a bounding box edge, as shown in the right of Figure 9.7. This ensures no hard limit restricts how far nets can move out of the way to alleviate congestion. Table 9.6 shows the impact of dynamic bounding boxes with all other routing optimizations enabled. Dynamic bounding boxes improve minimum routable channel width by 2% on the large VTR benchmarks (targeting VTR's flagship k6 frac N10 frac chain mem32K 40nm architecture), and reduce run-time in

15 This likely indicates long wires were previously sub-optimally used by connections traversing short distances. This was also independently identified and fixed by [215].
16 The default router bounding box is the net's bounding box expanded by 3 routing channels on each side [46].

Figure 9.7: Dynamic bounding box example. Left: in iteration i the fixed bounding box makes the congestion between source s and target t unavoidable. Right: once the net's routing reaches the bounding box edge the box is expanded, and in iteration i + 1 the congestion is avoided.

Table 9.6: Normalized Impact of Dynamic Router Bounding Boxes on VTR benchmarks (> 10K primitives)

          Wmin   Route Time    Routed WL   Crit. Path Delay   Route Time
                 (find Wmin)   (Fixed W)   (Fixed W)          (Fixed W)
static    1.00   1.00          1.00        1.00               1.00
dynamic   0.98   1.11          1.00        1.00               0.98

the less congested fixed channel width (1.3 · Wmin) case by 2%. While the minimum channel width search time increases moderately, this is not significant, as minimum channel width run-time is sensitive to small changes in routability (which may cause the binary search to explore a different sequence of channel widths, some of which may be more challenging to route). Given that both the minimum routable channel width and the route time at a fixed channel width are improved, we believe this indicates an overall routability improvement.
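The dynamic expansion can be sketched as follows: whenever a resource used by the net's routing lies on (or outside) a bounding-box edge, that edge is pushed outward before the next iteration, clamped to the device grid. The field names and expansion amount are illustrative, not VPR's exact implementation.

def expand_bbox_if_at_edge(bbox, used_xy, grid_w, grid_h, delta=1):
    """Grow a net's routing bounding box when its routing touches an edge.

    bbox: dict with 'xmin', 'xmax', 'ymin', 'ymax' (inclusive, in tiles).
    used_xy: iterable of (x, y) locations used by the net's current routing.
    Returns the (possibly) expanded bounding box, clamped to the device grid.
    """
    for x, y in used_xy:
        if x <= bbox["xmin"]:
            bbox["xmin"] = max(0, bbox["xmin"] - delta)
        if x >= bbox["xmax"]:
            bbox["xmax"] = min(grid_w - 1, bbox["xmax"] + delta)
        if y <= bbox["ymin"]:
            bbox["ymin"] = max(0, bbox["ymin"] - delta)
        if y >= bbox["ymax"]:
            bbox["ymax"] = min(grid_h - 1, bbox["ymax"] + delta)
    return bbox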

9.5.2 Congestion Mode

If the number of routing iterations is approaching the maximum allowed, AIR will enter a high-effort congestion resolution mode. This mode disables delay-driven incremental re-routing (so the router only rips-up congested connections) and also increases net bounding box sizes, giving the router more freedom to find alternative (less congested) routes. Table 9.7 shows the impact of different congestion thresholds on the large VTR benchmarks (targeting VTR's flagship k6 frac N10 frac chain mem32K 40nm architecture), with all routing optimizations enabled. Decreasing the iteration congestion threshold increases router effort in hard-to-route instances. With thresholds of 80% and 50% of maximum routing iterations respectively, the minimum routable channel width decreases by 1% and 2%, while minimum channel width route time increases by 8% and 57% respectively. There is no run-time impact at relaxed channel widths since congestion is less severe. The default threshold is set to 1.0, which leaves congestion mode disabled by default. However, end-users can decrease the threshold to increase router effort on difficult-to-route designs.
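The trigger itself is a simple fraction-of-iterations check; a minimal sketch, assuming the router tracks its current iteration and limit (with the default threshold of 1.0 the mode never engages):

def in_congestion_mode(iteration, max_iterations, threshold=1.0):
    """Return True once the router is within `threshold` of its iteration limit.

    e.g. threshold=0.8 engages the high-effort mode for the last 20% of
    allowed routing iterations; threshold=1.0 (default) disables it.
    """
    return iteration >= threshold * max_iterations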

Table 9.7: Normalized Impact of Congestion Mode Threshold on VTR benchmarks (> 10K primitives)

Threshold (fraction of       Wmin   Routed WL    Crit. Path Delay   Route Time    Route Time   VPR      Total Flow
maximum router iterations)          (1.3·Wmin)   (1.3·Wmin)         (find Wmin)   (1.3·Wmin)   Memory   Time
1.0                          1.00   1.00         1.00               1.00          1.00         1.00     1.00
0.8                          0.99   1.01         1.00               1.08          1.00         1.00     1.01
0.5                          0.98   1.01         1.00               1.57          1.00         1.00     1.23

Figure 9.8: QoR impact of multiple routing convergences on the VTR benchmarks (> 10K primitives). The panels plot normalized geomean critical path delay, wirelength, and route time against the routing convergence count, at channel widths of 1.00, 1.05, 1.10 and 1.30 × Wmin.

9.6 Multi-convergence Routing

During negotiated congestion routing, the router attempts to resolve congestion while minimizing the impact on critical path delay. However, in highly congested designs congestion costs can end up dominating timing costs, and critical connections may be detoured, degrading the critical path delay. As a result the achieved critical path delay can be somewhat chaotic in the presence of significant routing congestion.

Multi-convergence routing aims to address this by continuing to improve the routing of timing-critical connections after the first legal routing solution is found. In particular, the router will continue re-routing timing-critical connections with the aim of improving their delay. This allows the router to fix up timing issues it neglected while focusing on resolving routing congestion. When the routing next converges to a legal congestion-free solution, the routing with the best QoR is kept.

The pseudo-code for multi-convergence routing is shown in Algorithm 12. When a legal routing is found (Algorithm 12 Line 8), the best routing is updated17 (Algorithm 12 Line 9). In preparation for the rip-up and re-routing of delay sub-optimal connections, the present congestion cost factor (pres fac) is reset to its initial value (default zero) (Algorithm 12 Line 13). This allows timing-critical connections to focus on finding faster routes instead of avoiding congestion. Incremental re-routing (Section 9.3.1) then rips up delay sub-optimal connections (Algorithm 12 Line 19). It is worth noting that once re-routing is 'kicked off' in this manner, any resulting congestion is naturally handled by incremental re-routing, and less critical (but newly congested) connections will be detoured away to resolve congestion.

Figure 9.8 shows the impact of multi-convergence routing on the large VTR benchmarks (targeting VTR's flagship k6 frac N10 frac chain mem32K 40nm architecture) at various levels of routing stress, and with all other router optimizations enabled. We can make several interesting observations. Firstly, independent of multi-convergence routing, increasing channel width improves both delay and wirelength, and reduces route-time. The additional routing resources mean the router does not need to detour as drastically to resolve congestion. Secondly, multi-convergence routing improves critical path delay and wirelength in high-stress settings at or near the minimum routable channel width. For instance, compared to only a single convergence, allowing two convergences reduces critical path delay at minimum channel width by 2.3%. However these gains diminish as channel width increases. Allowing more than two routing convergences offers minimal quality benefit. Thirdly, multi-convergence routing is run-time efficient, with subsequent convergences increasing run-time by far less than the initial convergence (which routed the entire netlist). For instance, at minimum channel width the second convergence only increased overall route-time by 30%. The run-time overhead of multi-convergence routing also decreases in less stressful routing conditions (where it offers less benefit). The connection-based re-route and high-fanout optimizations (Section 9.3) greatly reduce work when a routing is almost legal, which keeps the run-time overhead of multi-convergence routing low.

It is interesting to note that multi-convergence routing with delay-based rip-up accomplishes many of the same goals as the delay-targeted routing approach proposed in [219]. In particular, it reduces the often chaotic impact of routing congestion on critical path delay at narrow channel widths. Multi-convergence routing should be more run-time efficient as it only re-routes the relevant connections in the netlist. In contrast, delay-targeted routing requires the full netlist to be re-routed multiple times in search of an appropriate delay target. Finally, multi-convergence routing naturally extends to multi-clock designs where there is no longer a single delay target.

17 Provided the new routing is better, defined as having lower critical path delay, with wirelength used as a tie-breaker.

9.7 Routing with Non-Configurable Switches

As described in Section 7.1.2, VPR 8 supports non-configurable connections between routing resources (i.e. connections which must always be used and cannot be disabled), such as non-tristate-able buffers (which are common in clock networks and can be useful for long wires) and electrical shorts. Using non-configurable switches to create electrical shorts allows the modeling of a broader range of wires, such as more complex 'L' and 'T' shaped wires, while allowing the fundamental RR node data structure to remain a simple one-dimensional wire. This is important because the RR node data structures form the largest component of VPR's peak memory footprint, so RR nodes must be described using compact and efficient data structures, and it keeps the far more common case of a linear wire efficient. The VPR router has traditionally assumed each switch, modelled as an edge in the Routing Resource (RR) graph, was configurable, meaning it could be enabled/disabled to connect/disconnect two routing resources. Supporting non-configurable switches required modifications to the router's path finding algorithm.

Algorithm 15 AIR Path Finding
Require: The heap containing start locations, the target node to be found
Returns: A low-cost path to target from elements initially in heap
 1: function find path from heap(heap, target)
 2:     while not empty(heap) do
 3:         elem ← pop smallest(heap)
 4:         if target = elem.node then
 5:             return traceback path(elem)                ▷ Found path to target
 6:         for edge ∈ outgoing edges(elem.node) do        ▷ Explore neighbours
 7:             next elem ← expand node(elem, sink node(edge), target)
 8:             push(heap, next elem)
 9:     return ∅                                           ▷ No path to target

Algorithm 15 details AIR’s algorithm for finding paths for each net connection through the RR-graph. Like a standard Djikstra/A*-based path finding algorithm AIR maintains a heap (priority queue) of Chapter 9. AIR: A Fast but Lazy Timing-Driven FPGA Router 116

Algorithm 16 AIR Node Expansion
Require: The previous element used to reach the current node, the target node to be found
Returns: An element to be placed on the heap with the costs of using node to reach target
 1: function expand node(prev, node, target)
 2:     if reached by configurable edge(prev, node) then
 3:         nodes = nonconfig connected nodes(node)        ▷ Includes node
 4:         node congestion = sum congestion(nodes)        ▷ All non-config. connected nodes
 5:     else                                               ▷ Part of previously reached non-config. node set
 6:         node congestion = 0                            ▷ Already included in non-config. connected node set
 7:     congestion = prev.congestion + node congestion
 8:     delay = prev.delay + delay(prev.node, node)        ▷ Only delay of current node
 9:     γ = criticality(target)
10:     cost = (1 − γ) · (congestion + expected congestion(node, target))
11:            + γ · (delay + expected delay(node, target))
12:     return element(node, congestion, delay, cost)

currently explored nodes in the graph. The lowest cost node is popped off the heap (Line 3), and its neighbours are explored and added to the heap (Lines 6 to 8). This continues until either the target is found and the resulting path returned (Lines 4 and 5), or the heap is emptied (Line 2) indicating no path exists (Line 9). There are several challenges to supporting non-configurable edges. First, the router must ensure all non-configurably connected nodes are always used together. This is achieved by pre-processing the RR-graph to identify any sets of non-configurably connected nodes in the graph, and by updating the path traceback routine (Algorithm 15 Line 5) to add all non-configurably connected nodes to the route tree when a path is found. Second, it is crucial to cost such nodes appropriately so the router effectively optimizes their use. In particular, the router must:

• see the full congestion cost impact of using any node in a non-configurably connected set (since choosing to use at least one node from the set implies using all nodes in the set), and

• account for the delay impact of choosing particular nodes to form the routing path (since the individual nodes selected determine the delay).

Algorithm 16 shows how this is done by AIR. Whenever a node is explored via a configurable edge the relevant nodes are identified (Algorithm 16 Line 3)18 and their congestion costs summed (Algorithm 16 Line 4).19 When a node is expanded via a non-configurable edge (Algorithm 16 Line 6), by definition it is part of a non-configurably connected node set whose congestion cost is already included (it was added to the previous congestion when the current partial path first encountered the set), so no additional incremental congestion cost is added. The remainder of the cost calculations proceed as usual, weighting the delay and congestion costs by target criticality, and including the expected congestion and delay costs to the target (Algorithm 16 Lines 9 to 11).

Figure 9.9 illustrates how non-configurable switches affect costing. As the router explores from the source s in Figure 9.9a it may expand to node d1, which is a member of the non-configurably connected node set D = {d1, d2, d3}. As shown in Figure 9.9b, the congestion cost to reach d1 includes the cost of

18 With pre-processing this can be done in constant time.
19 For regular nodes this is just the individual node's congestion costs, while for a node in a non-configurably connected set this is the total congestion cost of all the nodes in the set.

18With pre-processing this can be done in constant time. 19For regular nodes this is just the individual node’s congestion costs, while for a node in a non-configurably connected set this is the total congestion cost of all the nodes in the set. Chapter 9. AIR: A Fast but Lazy Timing-Driven FPGA Router 117

(a) Example routing resource graph with source s and target t. Dashed edges are configurable, solid edges are non-configurable.
(b) Path costs from s:

Node   Cong.   Delay   Est. Delay to t
s      0       0       7
a      1       2       5
b      2       4       3
c      3       6       1
d1     4       4       2
d2     4       5       1
d3     4       5       5

Figure 9.9: Non-configurable routing example. Assumes unit congestion cost and delay for wires, unit delay for configurable switches, and zero delay for non-configurable switches.

using all members of the set, but its delay cost only reflects the delay to reach d1. When expanding from a to d1, the router pushes only d1 onto the heap (not all of D), but costs it to reflect the full congestion cost of using D. This ensures the router does not underestimate the cost to use d1 (since choosing to use d1 actually requires using all of D). If the router later pops d1 off the heap the other members of

D (i.e. d2, d3) will be expanded, accounting for their individual timing impact, but with no additional congestion cost. It is important that each member of D is tracked separately in the heap, since each member may have different expected congestion/delay costs to reach the target. Tracking each node separately ensures the router can prioritize exploring promising nodes which move it closer to the target (e.g. expanding d2 before d3). This is particularly important for large non-configurably connected node sets which may fan out to a large number of nodes, many of which may not be relevant for reaching a particular target.20 With this formulation the router correctly determines that the path a → b → c is better for non-timing-critical connections (lower congestion cost), while a → d1 → d2 is better for timing-critical connections (lower delay).
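A Python sketch of this costing, transliterating Algorithm 16 with hypothetical helper callables/mappings standing in for the RR-graph queries (nonconfig_set would be precomputed from the graph): when a node is first reached through a configurable edge the congestion of its whole set is charged, while expansion along a non-configurable edge adds no further congestion.

def expand_node(prev, node, target, via_configurable_edge,
                nonconfig_set, node_congestion, node_delay,
                expected_congestion, expected_delay, criticality):
    """Compute the cost element for reaching `node` from the `prev` element.

    All helpers (nonconfig_set, node_congestion, ...) are illustrative
    stand-ins for RR-graph queries; prev is a dict-like heap element.
    """
    if via_configurable_edge:
        # Charge the congestion of every node in the non-configurable set,
        # since using one member implies using them all.
        members = nonconfig_set.get(node, [node])
        cong_incr = sum(node_congestion(n) for n in members)
    else:
        # Already charged when the set was first entered.
        cong_incr = 0.0

    congestion = prev["congestion"] + cong_incr
    delay = prev["delay"] + node_delay(prev["node"], node)   # only this node's delay
    crit = criticality(target)
    cost = ((1.0 - crit) * (congestion + expected_congestion(node, target))
            + crit * (delay + expected_delay(node, target)))
    return {"node": node, "congestion": congestion, "delay": delay, "cost": cost}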

9.8 Speeding-Up Minimum Channel Width Search

In the architecture evaluation flow, AIR will be invoked multiple times at various fixed channel widths as part of a binary search to identify the architecture’s minimum routable channel width (Wmin). As a result, all the enhancements included in AIR which have been described so far also benefit the minimum channel width search. However AIR includes several further enhancements targeted particularly at reducing the run-time of minimum channel width search.

9.8.1 Routing Failure Predictor

Router run-time is highly dependent on the amount and difficulty of resolving congestion. In uncongested designs the router explores only a limited portion of the routing network and runs quickly. However, in highly congested designs the router must repeatedly explore much more of the routing network in order to find uncongested paths, which significantly increases run-time. For designs which are unroutable this means the router will continually explore most of the routing network in a (futile) attempt to resolve congestion. To improve the run-time behaviour we designed

20 Such as a clock network.


Figure 9.10: Routing Congestion Trend on LU32PEEng (targeting VTR’s flagship k6 frac N10 frac chain mem32K 40nm architecture) at routable and unroutable channel widths.

a routing predictor21 which will abort routing early if it is not expected to complete successfully (Algorithm 12 Line 14). The routing predictor is also beneficial when routing at a fixed channel width, since it reduces the time to inform the user a design is unroutable. However, its largest benefit comes when an FPGA architect is attempting to find the minimum routable channel width, since the binary search run-time is dominated by routings which occur at unroutable channel widths.

As shown in Figure 9.10, congestion typically decreases exponentially over routing iterations.22 The routing predictor uses this information to construct a model of congestion over time and estimate when congestion will be resolved. We use a simple linear regression model fit using least-squares to the logarithm of the congested routing resource count.23 At the end of each routing iteration the model is constructed using the congestion information from the most recent 50% of previous routing iterations.24 If the estimated iteration count is greater than 3 times (safe setting) or 1.5 times (aggressive setting) the maximum number of allowed routing iterations, the routing is aborted.

Table 9.8: Normalized Impact of Routing Predictor on VTR benchmarks (> 10K primitives)

             Wmin   Routed WL    Crit. Path Delay   Route Time    Route Time   VPR      Total Flow
                    (1.3·Wmin)   (1.3·Wmin)         (find Wmin)   (1.3·Wmin)   Memory   Time
off          1.00   1.00         1.00               1.00          1.00         1.00     1.00
safe         1.00   1.00         1.00               0.71          1.02         1.00     0.77
aggressive   1.00   1.00         1.00               0.52          1.05         1.00     0.64

Table 9.8 shows the impact of the routing predictor on routing QoR and run-time metrics on the large circuit subset of the VTR benchmarks (targeting VTR's flagship k6 frac N10 frac chain mem32K 40nm architecture), with all other routing optimizations enabled. For the safe and aggressive predictor settings there are no changes in quality metrics, but minimum channel width route time is reduced by 29% and 48% respectively.
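The prediction itself can be sketched as follows, assuming per-iteration congested-resource counts are recorded: a straight line is fit to the logarithm of the most recent half of the history, and solved for the iteration at which the fitted log-count reaches zero (i.e. roughly one congested node). This is an illustrative sketch of the idea, not VPR's exact implementation.

import math

def predict_convergence_iteration(congested_counts):
    """Estimate the routing iteration at which congestion reaches zero.

    congested_counts[i] is the congested RR node count after iteration i+1.
    Returns float('inf') if congestion is not decreasing.
    """
    if congested_counts and congested_counts[-1] == 0:
        return 0.0                                   # already congestion free
    history = [(i + 1, c) for i, c in enumerate(congested_counts) if c > 0]
    recent = history[len(history) // 2:]             # most recent 50% only
    if len(recent) < 2:
        return float("inf")
    xs = [it for it, _ in recent]
    ys = [math.log(c) for _, c in recent]
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    if slope >= 0.0:
        return float("inf")                          # congestion not shrinking
    intercept = y_mean - slope * x_mean
    return -intercept / slope                        # iteration where log(count) = 0

def should_abort(congested_counts, max_iters, factor=3.0):
    """Abort if resolution is predicted beyond factor*max_iters (safe=3.0, aggressive=1.5)."""
    return predict_convergence_iteration(congested_counts) > factor * max_iters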

21 Building on an initial implementation by Andre Pereira and Jason Luu which used linear interpolation.
22 A linear trend on a log-linear plot such as Figure 9.10.
23 Using the logarithm allows a linear model to fit an exponential trend.
24 Using the most recent 50% of routing iterations means only recent history is considered, but the length of history increases as routing proceeds. This helps minimize noise caused by the typically small number of congested routing resources late in the routing process.

Table 9.9: Normalized Impact of Minimum Channel Width Hints on VTR benchmarks

               Wmin   Routed WL    Crit. Path Delay   Route Time    Route Time   Total Flow   VPR
                      (1.3·Wmin)   (1.3·Wmin)         (find Wmin)   (1.3·Wmin)   Time         Memory
No Hints       1.00   1.00         1.00               1.00          1.00         1.00         1.00
-30% Hint      1.02   1.00         1.00               1.10          0.97         1.05         1.00
-10% Hint      1.01   1.00         1.00               0.86          1.00         0.93         1.02
Correct Hint   1.00   1.00         1.00               0.72          0.99         0.85         1.02
+10% Hint      1.01   1.00         1.00               0.69          1.02         0.83         1.00
+30% Hint      1.02   1.00         1.00               1.09          0.98         1.04         0.99

9.8.2 Minimum Routable Channel Width Hints

Often when performing minimum routable channel width experiments the architect has some insight into what the expected channel widths will be.25 VPR 8’s minimum channel width search can now take such estimates as optional ‘hints’ to the minimum channel width binary search. These hints are used as the starting point of the binary search, and if accurate allow a large part of the search space to be quickly eliminated. Additionally, if the hint value is found to be routable a smaller initial step (10% channel width reduction, instead of the default 50%) is taken in hopes of quickly identifying an unroutable channel width and pruning the number of such channel widths explored. Table 9.9 shows the impact of minimum channel width hints on the VTR benchmarks (targeting VTR’s flagship k6 frac N10 frac chain mem32K 40nm architecture) with all other routing optimizations enabled. Providing accurate hints (Correct Hint) results in the same minimum channel width as not providing a hint (No Hint), but reduces minimum channel width search time by 28%. It should be noted that the accuracy of the hints does not significantly affect the resulting minimum channel width found by the binary search. Less accurate hints can still reduce minimum channel width search time (+10% Hint, -10% Hint). Inaccurate hints (+30% Hint, -30% Hint) only moderately increase search time since the number of additional channel widths explored is bounded.
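A sketch of how a hint can seed the search (the routes_at() callback is a stand-in for a full routing attempt, and the maximum width is assumed routable): a routable hint becomes the upper bound and is followed by a small 10% downward step, while an unroutable hint simply tightens the lower bound of the usual binary search.

def find_min_channel_width(routes_at, w_max, hint=None):
    """Binary-search the minimum routable channel width.

    routes_at(w) -> bool stands in for a full routing attempt at width w.
    """
    lo, hi = 1, w_max                        # [lo, hi] brackets Wmin
    if hint is not None and routes_at(hint):
        hi = hint
        first = max(lo, int(hint * 0.9))     # small 10% step instead of halving
        if routes_at(first):
            hi = first
        else:
            lo = first + 1
    elif hint is not None:
        lo = hint + 1                        # hint was unroutable
    while lo < hi:
        mid = (lo + hi) // 2
        if routes_at(mid):
            hi = mid
        else:
            lo = mid + 1
    return hi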

It is interesting to note that the obtained Wmin values in Table 9.9 vary by small amounts depending on the hints given. VPR uses a binary search to determine Wmin, which assumes routing failures are monotonic in W. In practice the routability is nearly monotonic in W, but there is some small variation

(algorithmic noise) in routability near Wmin which is dependent on the detailed connectivity pattern produced by the RR graph generator. This can cause some small variation in the calculated Wmin depending on the exact sequence of channel widths evaluated by the binary search. If desired, end users can configure VPR to only give up after routing has failed two times at nearby channel widths.26

9.9 AIR Evaluation & Comparison

We now evaluate AIR (which is the VPR 8 Timing-Driven router) and compare it with other academic routers (VPR 7 Timing-Driven, VPR 7 Breadth-First, VPR 8 Breadth-First, CRoute) and the commercial Intel Quartus router.

25 For instance they may have an estimate based on prior experience with the same or similar architectures.
26 This is controlled via the --verify binary search option.

Table 9.10: VPR Router Comparisons on Titan v1.1.0 Benchmarks

                                        # Routable     Routed    Routed    Route
                                        Benchmarks     WL        CPD       Time
AIR/VPR8 Timing-Driven (Timing + WL)    21             (1.00×)   (1.00×)   (1.00×)
AIR/VPR8 Timing-Driven (WL only)        21             (0.97×)   (1.47×)   (0.70×)
VPR8 Breadth-First (WL only)            14             (1.10×)   (1.66×)   (394.54×)
VPR7+ Timing-Driven (Timing + WL)       20             (1.18×)   (1.16×)   (6.16×)
VPR7+ Breadth-First (WL only)           14             (1.08×)   (1.56×)   (304.45×)
All routers used identical packings/placements/RR graphs produced by VPR 8.
QoR is the geomean of the mutually routable subset of 14 benchmarks.

9.9.1 Comparison with VPR Routers

To evaluate the effect of the router enhancements, Table 9.10 compares the various routing algorithms in VPR 7+27 and VPR 8 under identical conditions (same netlist, packing, placement, and RR graphs produced by VPR 8) on the Titan v1.1.0 benchmarks and the Stratix IV model from [10] (which is compatible with both VPR 8 and VPR 7+).28 VPR contains two different routing algorithms: the primary timing-driven router (whose enhanced form in VPR 8, AIR, is described above), which can be run to either optimize wirelength and timing simultaneously (Timing + WL) or wirelength only (WL only), and the breadth-first router, which optimizes wirelength only and is included for historical reasons [46].

First considering the VPR 8 and VPR 7+ timing-driven routers (Timing + WL), we observe the AIR/VPR 8 router is substantially more run-time efficient (6.2× faster) while also improving QoR (18% wirelength and 16% critical path delay reductions). It also successfully routes an additional benchmark compared to VPR 7+. Second, the simpler breadth-first routers in VPR 7+ and VPR 8 both run more than two orders of magnitude slower (> 300×) than AIR. They also only successfully route 14 (vs. 21) benchmarks, while using 8 to 10% more wiring and producing 56 to 66% longer critical paths. Third, the AIR timing-driven router is very efficient at trading-off wirelength and timing. When run with a wirelength-only objective (WL only) which ignores timing, it only further reduces wirelength by 3% and run-time by 30%. This shows that when run with a combined timing and wirelength objective, the timing-driven router is able to effectively prioritize the routing of timing critical signals without significantly degrading the routing of other connections. As a result, there is little reason to run with only a wirelength objective.

These results show AIR is superior in both run-time and QoR to the breadth-first routers and the VPR 7 timing-driven router. Therefore other works building on and/or comparing to VPR should ensure they compare to AIR. Choosing to compare to the non-default VPR 8 breadth-first router will result in misleading conclusions. For instance, parallel routing algorithms typically report speed-ups of 2 to 8× [75]; however, not all authors use the same serial baseline. While some works have used the VPR 7 timing-driven router as a baseline, others used the VPR 7 breadth-first router which is orders of magnitude slower. This means not all published parallel routing speed-ups are significant. Going forward, parallel routing researchers should ensure they use a state-of-the-art serial routing algorithm as their baseline.

27 VPR 7+ is the enhanced version of VPR 7 used in [10] which is compatible with the Titan v1.1.0 benchmarks.
28 Note that in the following results clock nets are not being routed, as clock network modelling support was added after these results were collected.

Table 9.11: AIR & Academic Router Comparison on Titan Benchmarks

               Routed       Routed CPD      Route
               Wirelength   (worst clock)   Time
VPR 7+         1.00         1.00            1.00
CRoute [215]   0.89         0.95            0.30
AIR            0.85         0.81            0.15
Normalized geomean of mutually routable benchmarks

Table 9.12: AIR & Quartus Comparison on Titan Benchmarks

                   Routed       Routed CPD        Route
                   Wirelength   (clock geomean)   Time
Quartus 18.0       1.00         1.00              1.00
VPR8 Place + AIR   1.27         1.23              0.23
Normalized geomean of mutually routable benchmarks

9.9.2 Comparison with CRoute and Quartus

To evaluate AIR’s effectiveness we compare it to the routers from VPR 7+ [10], CRoute [215], and the industrial Intel Quartus 18.0 router [179]. To fairly evaluate the core routing algorithm run-time, all algorithms were run serially. All tools are evaluated using the Titan FPGA benchmarks [10] and a Stratix IV-like architecture, which are representative of modern FPGA usage. To isolate the effect of the VPR 7+ and AIR routers, identical packings and placements29 and routing architectures are used, while wirelength and Critical Path Delay (CPD) metrics are calculated by VPR 8 [2]. Table 9.11 compares AIR and CRoute’s improvements relative to VPR 7. Note that CRoute results are from [215] which uses a different CAD flow for packing and placement – making it impossible to perfectly isolate the impact of the routers. With that caveat, the results show AIR’s circuit implementations operate 17% faster and use 5% less wirelength than CRoute’s. This was achieved while also completing routing 2.0× faster, and routing 2 more benchmarks (21 vs 19) than CRoute. Finally, following the methodology from [10], we can compare AIR with the industrial Intel Quartus 18.0 router. Quartus results use Quartus’ packing and placement which are higher quality/more routable than those produced by VPR [10].30 Despite routing from a lower quality placement, Table 9.12 shows AIR (targeting the Stratix IV model from Section 7.2) completes routing 4.3× faster than the Quartus router on the Titan benchmarks – further illustrating its scalability. While the circuits routed by AIR use 27% more wirelength and operate 23% slower, this is due to the lower quality placement being used. This can be confirmed by comparing the quality of the initial routing (congestion oblivious and routed for minimum delay), and final legal routing (congestion free) produced by AIR.31 Table 9.13 shows the final legal routings produced by AIR on the Titan benchmarks (targeting the Stratix IV model from Section 7.2) do not degrade critical path delay or wirelength. In fact, wirelength improves since non-critical connections are re-routed for wirelength. The reduction in wirelength may seem somewhat surprising, but results from routing all connections during the first routing iteration

29 Generated by VPR 8 with inner_num set to match Quartus' average placement time.
30 There are additional caveats which make this an approximate ballpark comparison. AIR uses the automatically generated adaptive lookahead from Section 9.4.1, while Quartus uses a highly tuned hand-written lookahead which is hard-coded for the routing architecture. The routing architecture targeted by AIR is an approximate capture (Section 7.2) of the Stratix IV devices, as the precise routing patterns used in Stratix IV are not publicly available. Quartus also performs timing-driven LUT rotation, while VPR does not, and Quartus optimizes for additional objectives such as hold time. Finally, the higher quality Quartus placement may present a more difficult routing problem if there are more highly critical timing paths (i.e. the path delay distribution has fewer outliers due to better/more balanced placement) which are more challenging for routing to balance.
31 Congestion oblivious routing produces a critical path delay which is a near lower bound for the given placement.

Table 9.13: AIR Congestion Oblivious vs Congestion Free on Titan Benchmarks
                                     Routed       Routed CPD
                                     Wirelength   (worst clock)
Congestion Oblivious (min. delay)    1.00         1.00
Congestion Free (legal)              0.86         1.00
Normalized geomean of routable benchmarks

Chapter 10

Other VTR Enhancements

In addition to the core enhancements to VTR described in Chapters 7 to 9, a large number of other enhancements and a substantial amount of engineering work went into improving VTR. This chapter provides a brief overview of some of these contributions.

10.1 Logic Optimization & Technology Mapping

VTR uses ABC to perform logic optimization and technology mapping. The version of ABC included with VTR 8 has been updated from an older 2012 version to a newer 2018 version [185]. We have also re-written our synthesis scripts to make use of ABC's new capabilities. The first three rows of Table 10.1 show the impact of these changes on the large (> 10K primitive) VTR benchmarks (targeting VTR's flagship k6_frac_N10_frac_chain_mem32K_40nm architecture). Note that in these results only ABC is being modified; all other parts of the flow remain constant.

Compared to using the old version of ABC (2012) and VTR's old synthesis script (2012), the new version of ABC (2018) produces smaller circuits (8% fewer primitives), resulting in a 10% reduction in routed wirelength, while ABC run-time decreases by 20%. Combining the new version of ABC (2018) with a new synthesis script1 (2018) further improves results, decreasing netlist primitives by 13% and logic depth by 10%, resulting in a 12% reduction in wirelength and a 6% reduction in critical path delay. However, this substantially increases ABC's run-time (> 9×) compared to the 2012 ABC and script. This results in only a small increase in total flow time when searching for Wmin, since run-time is dominated by that search. However, when running an implementation flow targeting a fixed channel width (and hence performing a single routing), ABC accounts for 35% of flow run-time on average, increasing overall flow time by 1.5×.2

10.1.1 Safe Multi-Clock Optimization

Multiple clock domains are a key characteristic of modern FPGA applications [220, 64]. One of the challenges with new versions of ABC is that all clock-related information is dropped during optimization.3 This is dangerous, since it means sequential optimizations can occur across clock domains, which can produce incorrect results. For instance, sequential elements with common fan-in but controlled by different clock domains could be incorrectly merged, or flip-flops controlled by one clock domain could be re-timed into logic related to another clock domain. Furthermore, special care is needed to safely transfer data across clock domains while avoiding synchronization and metastability issues [44], and sequential optimizations may interfere with these transfers. Since VTR supports multi-clock circuits it is important to ensure synthesis is performed safely and all clock-domain information is correctly maintained.

1 Based on suggestions from Alan Mishchenko at UC Berkeley.
2 ABC can dominate run-time on some benchmarks, for instance accounting for 74% of overall run-time on mcml.
3 Effectively, ABC assumes all sequential circuit elements operate on a single shared clock domain. ABC also ignores their initial state.


Table 10.1: Normalized Impact of ABC Multiclock Flows on VTR benchmarks (> 10K primitives)
ABC      ABC     Multi-clock      Primitive   Logic          Routed WL      Routed Crit. Path    ABC    Total Flow Time   Total Flow Time
Version  Script  Black-box Flow   Blocks      Depth   Wmin   (1.3 · Wmin)   Delay (1.3 · Wmin)   Time   (find Wmin)       (1.3 · Wmin)
2012     2012    none             1.00        1.00    1.00   1.00           1.00                 1.00   1.00              1.00
2018     2012    none             0.92        0.98    1.01   0.90           1.02                 0.80   0.73              0.87
2018     2018    none             0.87        0.90    1.01   0.88           0.94                 9.19   1.05              1.51
2018     2018    all              0.98        0.95    0.99   0.94           0.98                 2.05   0.79              0.99
2018     2018    iterative        0.87        0.89    0.99   0.87           0.94                 8.22   1.07              1.54

To address this, VTR 8 adds support for several different approaches to ensure safe multi-clock logic optimization. The most straightforward way to accomplish this is to hide a circuit's sequential elements from ABC by replacing them with black-box primitives which ABC does not optimize. However, as shown in the fourth row of Table 10.1, black-boxing all sequential primitives (all) substantially degrades the quality of the resulting netlist compared to performing no black-boxing (none). Instead, VTR 8 defaults to an iterative black-boxing flow, where ABC is invoked multiple times with only a single clock domain exposed as sequential elements; all sequential elements in other clock domains are black-boxed.4 ABC therefore safely performs sequential optimizations on only one clock domain at a time. As shown in the fifth row of Table 10.1, this approach (iterative) results in Quality of Results (QoR) similar to exposing sequential elements in all clock domains to ABC simultaneously (none), but ensures multi-clock circuits are implemented correctly.
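The core of the iterative flow can be sketched as follows. This is a simplified illustration only: the Netlist type and run_abc() placeholder are hypothetical stand-ins for VTR's BLIF writer/reader and its invocation of ABC, not the actual implementation.

    #include <string>
    #include <vector>

    // Toy netlist model for illustration: each latch records its clock domain
    // and whether it is currently hidden from ABC behind a black box.
    struct Latch { std::string clock; bool black_boxed = false; };
    struct Netlist {
        std::vector<std::string> clocks;
        std::vector<Latch> latches;
    };

    // Placeholder: the real flow writes a BLIF netlist, invokes ABC on it, and
    // reads the optimized result back. Stubbed out here so the sketch compiles.
    Netlist run_abc(Netlist netlist) { return netlist; }

    // Iterative black-boxing: ABC sees the sequential elements of only one clock
    // domain per invocation, so it cannot merge or retime registers across domains.
    Netlist optimize_multiclock(Netlist netlist) {
        for (const std::string& clk : netlist.clocks) {
            for (Latch& l : netlist.latches)
                l.black_boxed = (l.clock != clk);  // hide the other domains
            netlist = run_abc(netlist);            // safe sequential optimization
        }
        for (Latch& l : netlist.latches)
            l.black_boxed = false;                 // expose all latches again
        return netlist;
    }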

10.2 Timing Analysis

VTR 8 uses Tatum [6] (Chapter 3) as its Static Timing Analysis (STA) engine, which was designed to be more complete, more flexible, and higher performance than VPR 7's classic STA engine. Tatum provides various new timing analysis features such as hold (min-delay) timing analysis, support for multi-clock netlist primitives (e.g. dual-port RAMs with different clocks), additional timing constraints and exceptions, and support for paths completely contained within primitives (such as registered block RAM read paths). As a result, VTR 8 also supports more advanced timing reports, including detailed multi-path timing reports which help users understand and evaluate the speed-limiting paths in their design. Furthermore, Tatum produces the same analysis results as VPR 7 (to within floating point round-off) for setup analysis, allowing critical path delays to be fairly compared between VPR 7 and VPR 8 when identical delay models are used. As STA is called hundreds of times to evaluate the timing characteristics of the design implementation and to drive further optimization, it is key for STA to be fast. As described in Chapter 3, Tatum includes several enhancements which improve its performance over VPR 7's classic STA engine:

4 This approach was developed by Jean-Philippe Legault and Kenneth B. Kent at the University of New Brunswick with assistance from the author.

Table 10.2: Normalized Impact of Setup & Hold STA Traversals While Routing Titan Benchmarks
            STA Traversal Time
Separate    1.00
Combined    0.70

Table 10.3: Comparison of Classic & Tatum STA (Setup Only) on Titan Benchmarks
                 Pack Time   Place Time   Route Time   Total Time   STA Time
Classic          1.00        1.00         1.00         1.00         1.00
Tatum serial     1.00        0.88         0.95         0.94         0.47
Tatum parallel   0.99        0.80         0.91         0.90         0.07

• Tatum performs a single traversal of the timing graph, simultaneously analyzing all clock domains. This is more efficient than VPR 7's classic STA engine, which traversed the timing graph for each pair of clock domains.

• If both setup and hold timing analyses are required (e.g. during combined long and short path timing optimization), Tatum can perform a combined setup and hold analysis during the same traversal of the timing graph.

• Tatum supports parallel execution across multiple cores to further speed-up STA.

10.2.1 Combined Setup & Hold Analysis

Table 10.2 quantifies the effect of combining the setup and hold analyses into a single traversal (Combined) compared to performing two independent traversals (Separate) while routing the Titan benchmarks (targeting the Stratix IV model from Section 7.2). Performing a single set of combined traversals reduces the total STA traversal time by 30% due to improved cache locality: with combined traversals the timing graph does not need to be reloaded into the cache separately for the setup and hold analyses.
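To illustrate why the combined traversal is cheaper, the simplified sketch below (not Tatum's actual API or data structures) updates both the setup (latest) and hold (earliest) arrival times while each timing-graph edge is already resident in the cache, rather than streaming the graph through memory twice.

    #include <algorithm>
    #include <cstddef>
    #include <limits>
    #include <vector>

    // Simplified timing graph: nodes are stored in topological order and each
    // node lists its fan-in edges (source node index and edge delay).
    struct Edge { int src; double delay; };
    struct Node { std::vector<Edge> fanin; };

    struct ArrivalTimes {
        std::vector<double> setup;  // latest (max) arrival, used for setup checks
        std::vector<double> hold;   // earliest (min) arrival, used for hold checks
    };

    // A single forward traversal computes both analyses; each node and edge is
    // touched once, so the graph is only loaded into the cache once.
    ArrivalTimes combined_arrival(const std::vector<Node>& topo_order) {
        const double inf = std::numeric_limits<double>::infinity();
        ArrivalTimes t{std::vector<double>(topo_order.size(), -inf),
                       std::vector<double>(topo_order.size(), inf)};
        for (std::size_t i = 0; i < topo_order.size(); ++i) {
            if (topo_order[i].fanin.empty()) {        // primary input / source node
                t.setup[i] = t.hold[i] = 0.0;
                continue;
            }
            for (const Edge& e : topo_order[i].fanin) {
                t.setup[i] = std::max(t.setup[i], t.setup[e.src] + e.delay);
                t.hold[i]  = std::min(t.hold[i],  t.hold[e.src]  + e.delay);
            }
        }
        return t;
    }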

10.2.2 Comparison with VPR Classic STA

Table 10.3 compares the run-time of using VPR’s classic timing analyzer and Tatum when implementing the Titan benchmarks on the Stratix IV model from Section 7.2. Overall STA time in VPR is reduced by 2.1× serially and 14.3× in parallel with 24 cores.5 As a result overall run-time is reduced by 6% (Tatum serial) to 10% (Tatum parallel), with the largest run-time reductions occurring in placement (12-20%) and routing (5-9%). However it is worth noting that for some designs STA is a larger proportion of overall run-time, and the run-time reductions are larger.6

10.3 Software Engineering

VTR is a large complex software project, and as a result requires significant software engineering efforts. These efforts are multifaceted and include:

5 Compared to VPR's classic STA engine, Tatum performs best on large multi-clock designs, with circuits like mes_noc and gsm_switch running 7.5-6.2× faster serially, and 59.6-48.0× faster with 24 cores.
6 For example, stereo_vision's total run-time is reduced by 21% using Tatum with parallelism.

Table 10.4: Normalized Impact of Atom Netlist Data Structure Refactoring on VTR Benchmarks
              Pack Time   Place Time   Route Time     Route Time      Total VPR
                                       (find Wmin)    (1.3 · Wmin)    Time
Baseline      1.00        1.00         1.00           1.00            1.00
Refactored    0.63        1.00         0.92           0.93            0.86

• Optimizing the implementations of key algorithms and data structures,

• Improving robustness and stability,

• Providing support and documentation for users, and

• Facilitating collaboration between developers.

Some of these items have a significant impact on the technical merits of the project such as Quality of Results (QoR) and run-time, while others affect VTR’s usefulness to end-users, and the project’s longer-term health and sustainability. This section highlights a subset of this work.

10.3.1 VPR

VPR is developed in C/C++, and has a long development history, being continually extended and enhanced over time. As a result it is important to improve code quality, which makes it easier for both developers and end-users to make future enhancements and modifications to VPR. Additionally, for a high performance CAD tool solving large-scale optimization problems, the implementation details of various data structures and algorithms have a significant impact on run-time and memory footprint. These enhancements are beneficial to the research community as it makes it easier for researchers to develop and evaluate new CAD algorithms within VPR, or customize VPR to explore new architectural features.

VPR Netlists

One of the significant software engineering efforts in VPR 8 was refactoring the netlist data structures which store the technology mapped (or ‘atom’) netlist, and the post-packing ‘clustered’ netlist. These data structures have been significantly re-written in order to produce a well-defined common API for accessing and modifying the netlist data. During this process the underlying data layout and representation was modified to reduce indirect memory accesses (pointer-chasing) and improve cache utilization. In particular, the data layout was changed from an Array-of-Structs (AoS) to a Struct-of-Arrays (SoA) representation, which is more cache friendly when only a few data fields are accessed at a time. Table 10.4 shows the atom netlist refactoring reduced pack time, route time and total VPR run-time by 37%, 7-8% and 14% respectively on the VTR benchmarks when targeting VTR's flagship k6_frac_N10_frac_chain_mem32K_40nm architecture.
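As a rough illustration of the layout change (hypothetical types, not VPR's actual netlist classes), the Struct-of-Arrays form lets a loop that reads a single field, such as each pin's net, stream through one contiguous array rather than striding over whole structs:

    #include <cstddef>
    #include <cstdint>
    #include <string>
    #include <vector>

    using NetId   = std::uint32_t;
    using BlockId = std::uint32_t;

    // Array-of-Structs: iterating over one field also drags every other field
    // of each pin through the cache.
    struct PinAoS {
        std::string name;
        BlockId     block;
        NetId       net;
        bool        is_output;
    };
    using NetlistAoS = std::vector<PinAoS>;

    // Struct-of-Arrays: each field is its own contiguous array, so the common
    // field-wise loops in packing and placement are far more cache friendly.
    struct NetlistSoA {
        std::vector<std::string> pin_name;
        std::vector<BlockId>     pin_block;
        std::vector<NetId>       pin_net;
        std::vector<bool>        pin_is_output;
    };

    // Example field-wise query: how many pins are attached to a given net?
    std::size_t pins_on_net(const NetlistSoA& netlist, NetId net) {
        std::size_t count = 0;
        for (NetId n : netlist.pin_net)  // touches only the pin_net array
            if (n == net) ++count;
        return count;
    }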

Contexts

VPR has historically made use of numerous global variables to represent the device model and design implementation state, which made it challenging to reason about and modify the code base. To address this we have restructured and grouped together the various data structures representing the device model and implementation state into contexts. The device context contains all data structures used to represent the targeted FPGA device, including block type definitions, the device grid and the RR graph. There are also contexts representing different aspects of the implementation state, such as the clustering, placement or routing. Each part of the code base (e.g. the router) now explicitly asks for the contexts it requires, and specifies whether read-only or read-write access is needed. This helps to minimize coupling between different parts of the code base, while making dependencies more explicit.
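The following heavily simplified sketch (hypothetical structures, not VPR's real context definitions) illustrates the idea: code declares which contexts it needs through its interface, and whether it needs them read-only or read-write, instead of reaching into global variables.

    #include <utility>
    #include <vector>

    struct DeviceContext {            // read-mostly description of the target FPGA
        int grid_width = 0, grid_height = 0;
        // ... block types, RR graph, etc.
    };

    struct PlacementContext {         // mutable implementation state
        std::vector<int> block_x, block_y;
    };

    struct AllContexts {
        DeviceContext device;
        PlacementContext placement;

        const DeviceContext&    device_ctx() const      { return device; }
        const PlacementContext& placement_ctx() const   { return placement; }
        PlacementContext&       mutable_placement_ctx() { return placement; }
    };

    // The placer's dependencies (and required access) are explicit in its signature.
    void swap_blocks(const DeviceContext& device, PlacementContext& placement,
                     int blk_a, int blk_b) {
        (void)device;  // e.g. legality checks against the device grid would go here
        std::swap(placement.block_x[blk_a], placement.block_x[blk_b]);
        std::swap(placement.block_y[blk_a], placement.block_y[blk_b]);
    }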

Table 10.5: Cumulative Impact of VPR Peak Memory Optimizations on neuron
Optimization                          Code Area         Peak Memory (MiB)   Ratio
Baseline                                                9,030               1.00
Exact RR Graph alloc.                 RR Construction   5,874               0.65
Contiguous SB pattern                 RR Construction   5,423               0.60
Flyweight segment details             RR Construction   4,573               0.51
Sparse Intra-block routing            Packer            3,866               0.43
Contiguous RR Edge & RR Node fanin    RR Construction   3,431               0.38

Redundant Work

After performing a move during placement, VPR iterates through the affected nets to update wirelength and timing information. In VPR 7, the affected nets would be iterated through multiple times. We have improved this in VPR 8, by ensuring that all updates required occur during a single iteration through the affected nets. Similarly to performing a single set of timing graph traversals (Section 10.2.1), this improves cache locality by not repeatedly reloading the netlist and placement information into the cache.
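Conceptually, the change folds the separate per-quantity loops into one pass over the affected nets, as in the sketch below (simplified types, not VPR's actual placer code):

    #include <vector>

    struct Net { double bb_cost; double timing_cost; };  // simplified per-net data

    struct MoveCosts { double wirelength = 0.0; double timing = 0.0; };

    // One traversal of the nets touched by a move updates both cost terms while
    // each net's data is already in the cache; VPR 7 walked the list once per term.
    MoveCosts recompute_affected(const std::vector<const Net*>& affected_nets) {
        MoveCosts cost;
        for (const Net* net : affected_nets) {
            cost.wirelength += net->bb_cost;
            cost.timing     += net->timing_cost;
        }
        return cost;
    }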

Memory Usage Optimizations

It was previously noted that VPR 7's peak memory usage was substantially higher than that of commercial tools like Quartus [10], which meant very high-memory machines were required to implement the largest FPGA designs. As a result we have worked to reduce peak memory usage. The largest data structure in VPR is typically the RR graph, which can contain millions of nodes and tens of millions of edges. This data structure is carefully packed to minimize the size of each node and edge (improving cache utilization), and common data is factored out in a flyweight pattern [221]. Table 10.5 shows the impact of the most recent set of optimizations on the Titan neuron benchmark (targeting the Stratix IV model from Section 7.2). VPR's peak memory usage typically occurs during RR graph construction, when both the RR graph and the auxiliary RR-graph-sized data structures used to build it are allocated simultaneously. During RR graph construction, exactly sizing allocations and allocating contiguous regions of memory significantly reduced memory lost to fragmentation, as did flyweighting additional non-unique data. Moving to a sparse representation of the intra-block routing within each cluster also reduced memory usage, since much of that routing is unused in any particular cluster. Together these optimizations reduced peak memory use by 2.6×. As discussed in Section 11.2, VPR's peak memory usage is now comparable to Quartus on the Titan benchmarks.
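The flyweight idea can be sketched as follows (simplified, not VPR's actual RR graph data structures): data shared by many nodes, such as wire segment electrical properties, is stored once in a small side table, and each node keeps only a compact index into it.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    struct SegmentDetails {            // shared ("intrinsic") data: a handful of entries
        float resistance;
        float capacitance;
        std::int16_t length;
    };

    struct RRNode {                    // per-node ("extrinsic") data, kept as small as possible
        std::int16_t xlow, ylow, xhigh, yhigh;
        std::uint16_t segment_idx;     // flyweight reference instead of a full copy
    };

    struct RRGraph {
        std::vector<SegmentDetails> segments;  // tens of entries
        std::vector<RRNode>         nodes;     // millions of entries

        const SegmentDetails& node_segment(std::size_t node) const {
            return segments[nodes[node].segment_idx];
        }
    };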

10.3.2 Compiler Settings

Modern C++ compilers support a variety of advanced optimizations such as Inter-Procedural/Link-Time Optimization (IPO/LTO) and Profile-Guided Optimization (PGO). Table 10.6 shows the impact of building VPR with these optimizations enabled using GCC 8 when implementing the Titan benchmarks on the enhanced Stratix IV model from Section 7.2. IPO offers significant improvements, reducing place, route and STA time by 23-26% and total time by 19%.

Table 10.6: Normalized Impact of Advanced Compiler Optimizations on VPR 8.0 Run-Time (Titan Benchmarks)
             Pack Time   Place Time   Route Time   Misc. Time   Total Time   STA Time
Standard     1.00        1.00         1.00         1.00         1.00         1.00
IPO          1.00        0.77         0.74         0.93         0.81         0.76
IPO + PGO    0.89        0.70         0.62         0.92         0.74         0.68
STA Time is included in Pack/Place/Route/Misc. and Total Times

PGO (using a profile generated from a run of neuron) improves the total run-time reduction further to 26%. Given that the run-time reductions from these compilation options are significant, it is important for future work (e.g. work comparing to VTR) to use consistent compilation settings, so as not to mis-attribute run-time improvements to algorithmic changes when they are in fact due to compiler settings.

10.3.3 Regression Testing

VTR has a large code base consisting of several different tools, each of which has a large number of features. To ensure robustness it is important to have tests which catch when existing features are broken by code changes. Additionally, it is vital to track VTR's Quality of Results (QoR) to detect any quality or run-time regressions, as it is otherwise very easy to inadvertently degrade VTR's optimization algorithms or data structures. The full regression test suite requires ∼168 hours to run serially, and covers over 370 architecture, benchmark and tool option combinations. Since each combination is independent, the run-time can be reduced to less than 40 hours with moderate parallelism (3 or 4 cores).

Benchmark & Architecture Coverage

VTR 8 regression tests now cover a wider variety of architectures including: classic (soft-logic only) architectures, modern FPGA architectures like the VTR Flagship and Titan architectures, architectures with custom hard blocks, and architectures using a variety of different routing technologies (classic bi-directional, uni-directional, complex custom switch patterns). We now also test a wider variety of benchmark sets including the classic MCNC20 [95], VTR benchmarks, Titan ‘other’ and Titan23. In particular including the Titan23 ensures VTR’s QoR and scalability are regularly evaluated on large complex benchmarks.

Feature Coverage

In addition to covering additional architectures and benchmarks, VTR 8 also includes more extensive tests which verify the functionality of specific features. While most VTR tests are integration tests that exercise the overall functionality of the flow or specific tools, we have also begun including unit tests to verify the functionality of smaller parts of the code base. While not a perfect metric, code coverage quantifies how well the test suite covers the various parts of the code base. Table 10.7 compares VTR 7 & 8's code line count and code coverage.7 Compared to VTR 7, the code base for Odin II has grown by 7% while VPR has grown by 40%. The large growth in VPR is due to the additional architecture modelling features, improved code structure, better encapsulated data structures, and enhancements to the various optimization algorithms (particularly the router). Despite this growth, VPR's overall code coverage has increased to 86%, with the largest coverage improvements occurring in the router and base functionality.

7 Results were collected with the gcov utility [222].

Table 10.7: VTR Code Coverage
               Norm. Lines           Test Coverage
               VTR 7    VTR 8        VTR 7    VTR 8
odin           1.00     1.07         0.70     0.69
vpr            1.00     1.40         0.75     0.86
vpr base       1.00     1.26         0.58     0.86
vpr pack       1.00     1.25         0.85     0.86
vpr place      1.00     1.17         0.92     0.94
vpr route      1.00     2.40         0.79     0.84
vpr timing     1.00     0.75         0.68     0.87
vpr power      1.00     1.19         0.88     0.87
vpr util       1.00     1.96         0.77     0.77
libarchfpga    1.00     1.20         0.70     0.80
libvtrutil                                    0.80
libtatum                                      0.82

Formal Verification

In addition to integration and unit tests which measure the functionality and QoR of VTR, in VTR 8 we have also added correctness tests. These leverage VPR’s improved netlist writer and ABC’s formal verification capabilities (Section 6.2) to formally prove the resulting implementation is equivalent to the original netlist. These tests ensure the various tools correctly implement the design and do not change the design’s functionality.

10.3.4 Error Messages & File Parsers

A common issue encountered by users of previous versions of VTR has been limited and often unhelpful error messages. We have made significant efforts in VTR 8 to provide more detailed, easier to interpret, and actionable error messages. A large part of this effort has involved improving the robustness of VPR's file format parsers, particularly those dealing with human-editable formats such as the FPGA architecture file, SDC constraints, and BLIF netlist. To facilitate this, several of the file parsers (SDC, BLIF) have been re-written using a formal format specification and parser generators [223, 224], which leads to helpful error messages when invalid syntax is provided. The FPGA architecture file has also been updated to use a robust XML parsing library [225] which removes artificial syntax restrictions, while the logic which interprets the XML file has been improved to emit more user-friendly error messages when invalid or unexpected values are encountered.

10.3.5 Documentation

We have also worked to improve VTR’s documentation, which is key for helping users understand and use VTR. VTR’s documentation is now centralized and automatically generated from the VTR source tree, ensuring it is easy to update and keep synchronized with the source code. The documentation is now searchable and available online at docs.verilogtorouting.org, and includes:

• Tutorials for common VTR use cases,

• Design flow documentation,

• Tool specific documentation,

• Common file-format specifications, and

• Developer guidelines.

The resulting VTR 8 user manual runs over 250 pages.

Chapter 11

VTR 8 Evaluation

With the wide range of new algorithms, enhancements and features, it is important to quantify the overall impact of these changes on VTR 8. This chapter carefully evaluates VTR 8, including the enhancements described in Chapters 6 to 10. We believe it is important to ground these comparisons, so in addition to comparing against the previous release of VTR (VTR 7), we also compare to Intel's commercial Quartus FPGA CAD system. Unless otherwise noted, all results were collected on an identical system with two Intel Xeon Gold 6146 CPUs (24 cores) and 768 GB of RAM. VTR was compiled with GCC 8 with full optimization (-O3), IPO, and PGO using a profile generated from single runs of the stereovision1 and neuron benchmarks.

11.1 VTR Benchmarks

One of VTR's purposes is to serve as a flow for architecture research and evaluation. To that end, we now evaluate VTR 8's QoR in an architecture exploration context. These results were collected on the VTR benchmarks while targeting VTR's flagship k6_frac_N10_frac_chain_mem32K_40nm architecture, whose characteristics are listed in Table 11.1. Each benchmark is run through the standard VTR flow (Figure 6.1), starting from a behavioural Verilog description which is synthesized by ODIN II and optimized by ABC. VPR then performs packing and placement, followed by a binary search over channel width (W) to find the minimum routable channel width (Wmin).

Wirelength and critical path delay are then measured after routing at a relaxed channel width set to 1.3 · Wmin. Table 11.2 summarizes VTR 8's achieved QoR relative to VTR 7 for benchmarks with > 10K netlist primitives (to avoid small benchmarks skewing the results). We include results for the default VTR 8 (ODIN8, ABC'18, VPR8), VTR 7 (ODIN7, ABC'12, VPR7), and hybrid flows mixing tool versions, which allow us to isolate the impact of improvements to ODIN, ABC, and VPR separately.

Table 11.1: Summary VTR Flagship Architecture Characteristics
Metric                        Expression   Value
Process Technology                         40nm
LUT Size                      K            6 (fracturable)
BLEs per Logic Block          N            10
Adder Bits per Logic Block                 20
Memory Block Size                          32 Kb
DSP Block Multiplier Size                  36x36 (fracturable)
Switchblock Type                           Wilton
Wiretype                                   Length 4


Table 11.2: VTR 7 & VTR 8 Comparison Summary on the VTR Benchmarks
                           Netlist      ABC                     Routed WL      Routed CPD     Odin   ABC    Pack   Place   Route Time    Route Time     Total   Peak
                           Primitives   Depth   CLBs    Wmin    (1.3 · Wmin)   (1.3 · Wmin)   Time   Time   Time   Time    (Find Wmin)   (1.3 · Wmin)   Time    Memory
VTR 7: ODIN7/ABC'12/VPR7   1.00         1.00    1.00    1.00    1.00           1.00           1.00   1.00   1.00   1.00    1.00          1.00           1.00    1.00
ODIN8/ABC'12/VPR7          0.86         0.99    0.88    1.00    0.88           0.98           1.29   0.73   0.87   0.97    1.15          0.96           1.10    0.90
ODIN8/ABC'12/VPR8          0.86         0.99    1.13    0.83    0.74           0.97           1.68   0.72   0.55   0.63    0.10          0.33           0.15    0.49
VTR 8: ODIN8/ABC'18/VPR8   0.74         0.89    0.74    0.85    0.59           0.88           1.53   5.15   0.58   0.56    0.08          0.30           0.19    0.30
Values are normalized geomeans over mutually completed benchmarks

We first evaluate the impact of enhancements to ODIN (ODIN8/ABC'12/VPR7), which shows ODIN produces fewer netlist primitives due to improved sweeping of unused logic and hard blocks [39]. We next compare the impact of enhancements to VPR (ODIN8/ABC'12/VPR8), which shows significant improvements in minimum routable channel width and wirelength, and drastic improvements in pack/place/route run-times and peak memory usage, with the overall flow completing significantly faster. Finally, we can compare the full VTR 8 (ODIN8/ABC'18/VPR8), showing overall a 15% reduction in Wmin, a 41% reduction in routed wirelength and a 12% reduction in critical path delay. Additionally, the run-times of the physical optimization stages are substantially reduced (1.7× packing, 1.8× placement, 12.5× to find Wmin and 3.3× routing at 1.3 · Wmin), reducing total flow run-time by 5.2×.

Peak memory usage is also reduced by 3.3×. Notably, while the newer ABC performs better optimizations its run-time is also substantially increased. These results show that improvements to all stages of the CAD flow contribute to the overall QoR and run-time improvements.

Table 11.4 shows VTR 8's detailed QoR on the VTR benchmarks and the normalized results compared to VTR 7. We again focus on the results for benchmarks with > 10K netlist primitives. The improved logic sweeping and optimization in VTR 8 reduces the number of netlist primitives by 26% and logic depth by 11%. As a result, despite decreasing packing density (Section 8.1) VTR 8 uses 26% fewer CLBs than VTR 7. VTR 8 produces significantly better implementation quality than VTR 7, improving minimum routable channel width by 15%, reducing wirelength by 41%, and reducing critical path delay by 12%. These quality improvements are achieved while cutting the overall VTR flow run-time by 5.2×, and peak memory consumption by 3.3×. The run-time gains come primarily from reductions in minimum channel width search time (12.5× faster), and place time (1.8× faster). Pack and relaxed channel width route times were also reduced by 1.7× and 3.3× respectively. ODIN II's run-time increased by 1.5× but remains a very small fraction (0.6%) of overall run-time. While the new version of ABC performs better optimizations, it runs substantially slower (5.2×) and accounts for 39% of overall run-time in the architecture exploration flow (find Wmin), which is comparable to the time spent on the physical implementation (packing, placement and routing). ABC run-time is an even larger fraction (62%) of run-time in the design implementation (W = 1.3 · Wmin) flow. As a result, despite significant run-time improvements during the physical implementation, VTR 8's run-time is comparable to VTR 7's when targeting a fixed channel width.

11.2 Titan23 Benchmarks

To compare VTR and Intel's commercial Quartus [179] tool we used the Titan Flow (Figure 6.1), where designs are synthesized and technology mapped using Quartus. The physical implementation (packing, placement, routing) can then be performed using either Quartus or VPR. Unless otherwise noted, VPR was run with its default options, except the maximum number of routing iterations was increased to 400, and the router used the new map lookahead (Section 9.4.1) in VPR 8. For all benchmarks a 48 hour wall-clock run-time limit was imposed. Any benchmarks exceeding the limit were classified as failures. Critical path delay is reported as the geometric mean over all clock domains in a circuit, excluding any virtual I/O clocks.

11.2.1 Comparison with VPR 7

To compare VPR 7 and 8 on the Titan benchmarks, we used VPR 8 and VPR 7+. Both tools use the same Titan v1.1.0 benchmarks synthesized with Quartus 12.1, and target the same legacy Stratix IV architecture model from [10]. Table 11.5 compares VPR 8 and VPR 7+ on the Titan v1.1.0 benchmarks. VPR 8 completes 5 more benchmarks than VPR 7+, showing that it is more robust and better able to handle large complex designs. VPR 8 packs less densely, using 6% more LABs and 13% more DSPs than VPR 7+ (Section 8.1.2). VPR 8 reduces routed wirelength by 34% and critical path delay by 11% on the common benchmarks which completed in both tools. From an overall run-time perspective VPR 8 runs substantially faster (5.3×) and uses much less memory (5.0×) than VPR 7+. The largest run-time improvement came from routing which completes 33.3× faster in VPR 8, highlighting the utility of the router enhancements detailed in Chapter 9 and improvements to packing (Chapter 8). Pack time is also reduced by 1.5× and place time by 1.1×.1

11.2.2 Comparison with Quartus 18.0

To compare VPR and Quartus on the Titan benchmarks, we used VPR 8 and Quartus 18.0. Both VPR and Quartus used the more recent Titan v1.3.0 benchmarks synthesized and technology mapped with Quartus 18.0. Quartus 18.0 targeted Stratix IV, while VPR 8 targeted the Stratix IV architecture model from Section 7.2 (which improves upon the model used in [10]). However the precise details of the Stratix IV routing architecture are not publicly available, so it is not possible to perform a perfect comparison.2 Our Stratix IV model uses a timing model calibrated to Quartus STA’s reported component delays, and models the key characteristics of both block and routing architectures and was validated in [10]. We believe this allows us to perform a reasonable comparison between VPR and Quartus. Quartus is configured to use ‘auto’ device selection which selects the smallest device with sufficient resources to implement the design. Similarly, VPR will automatically construct the smallest device with sufficient resources to implement the design. Since several of the Titan designs are larger than the largest commercially available Stratix IV device Quartus refuses to fit them, while VPR can build a sufficiently large device. Quartus was run in STANDARD FIT mode to ensure full optimization effort. To ensure run-times remain comparable, both Quartus and VPR were run using a single thread. VPR was run at its default effort level and at an increased High Effort (HE) level (inner num = 2, astar fac = 1) which resulted in run-times comparable to Quartus. Both Quartus and VPR were given equivalent timing constraints as follows. Paths between clock domains were cut, except for those to/from external I/Os which are constrained on a virtual I/O clock.

1 Miscellaneous time increases in VPR 8 since the routing graph needs to be profiled to create the map lookahead, but remains a small fraction overall.
2 We expect VPR's automatically generated RR graph, while matching Stratix IV's major characteristics, is of lower quality than the extensively hand-optimized pattern used in Stratix IV.

Table 11.3: VPR Quartus QoR Summary on Titan Benchmarks
                            # Benchmarks Completed
                            VPR    Quartus   LABs   DSPs   M9Ks   M144Ks   Routed WL   Routed CPD   Pack Time   Place Time   Route Time   Total Time   Peak Memory
VPR 7+ / Quartus 12.0 *     14     20        0.82   1.13   1.13   1.69     2.02        1.53         2.36        0.87         8.23         2.82         6.21
VPR 8 / Quartus 18.0 *      23     20        0.85   0.96   1.33   0.00     1.38        1.32         1.39        0.46         0.30         0.51         0.96
VPR 8 HE / Quartus 18.0 *   23     20        0.85   0.96   1.33   0.00     1.31        1.26         1.39        1.04         0.35         0.87         0.97
VPR 8 HE / Quartus 18.0 †   23     20        0.95   0.97   1.30   0.00     1.26        1.20         1.18        1.00         0.34         0.83         0.96
* QoR from common subset of 13 completed benchmarks across all tool pairs
† QoR from common subset of 20 completed benchmarks across VPR 8 and Quartus 18.0
VPR 7+/Quartus 12.0 results from [10].

Each clock domain is given an aggressive 1ns clock period target. Extremely small clock domains (those fanning out to < 0.1% of netlist primitives) were ignored to avoid skewing the average critical path delay and instead focus on primary system clocks.

Table 11.3 summarizes the QoR comparison between various versions of VPR, normalized to Quartus. Compared to VPR 7+, VPR 8 completes significantly more circuits (23 vs. 14). When VPR 8 is run with its default parameters it requires 38% more wire and produces circuits which are 32% slower than Quartus 18.0, but runs nearly 2× faster and uses less memory. Increasing VPR 8's effort level so its run-time is comparable to Quartus 18.0, and including all 20 benchmarks which mutually complete (VPR 8 HE / Quartus 18.0), the wirelength and circuit speed gaps narrow to 26% wirelength and 20% critical path delay.3 VPR 8 substantially narrows the quality gap, with the wirelength and critical path delay gaps reduced by 1.5× and 1.2× respectively compared to VPR 7+. Notably, including the larger benchmarks which VPR 8 can now complete further narrows the gap with Quartus, indicating the scalability of VPR's optimizations. VPR 8's run-time is also substantially improved (5.5× to 3.2× faster overall compared to VPR 7+) with route time seeing the largest improvement (27× to 24× faster), while peak memory use is reduced by 6.5×.

Table 11.6 compares VPR 8 (in high effort mode) and Quartus 18.0 in detail. VPR 8 completes all 23 of 23 feasible Titan benchmarks, and Quartus completes 20 of 21 feasible benchmarks. This shows that VPR and Quartus' benchmark completion rates are similar. In terms of resource utilization VPR 8 uses 5% fewer LABs and 3% fewer DSPs than Quartus. However, VPR 8 uses more M9Ks and fewer M144Ks since it uses the larger and rarer M144Ks only when RAMs would otherwise not fit in M9Ks. Due to resource utilization differences VPR 8's auto-sized devices can be either smaller or larger than the devices used by Quartus, but on average they are 15% larger. As expected, those designs where VPR uses more resources of relatively scarce types (e.g. M9Ks, DSPs) than Quartus tend to produce the largest differences in device sizes. From a run-time perspective, VPR 8 HE runs 1.2× faster than Quartus and has a 4% smaller memory footprint. VPR 8's packer runs 18% slower than Quartus', but is much more general purpose. For these large benchmarks, placement dominates run-time (78% of total time), and VPR 8's placer run-time is comparable to Quartus. The VPR 8 router is more efficient and runs 2.9× faster than Quartus even in this high effort mode. Compared to VPR 8, Quartus spends approximately 3.3× more run-time on STA. VPR 8's QoR and run-time improvements result from two key factors:

1. CAD algorithm enhancements, which improve the quality and robustness of VPR’s optimizations, and

2. Improved architecture modelling capabilities (Section 7.1), which allow VPR to target a more realistic model of Stratix IV (Section 7.2), and in particular a more accurate capture of Stratix IV's routing architecture.4

3 For multi-clock circuits geometric mean and worst critical path delay results were similar.
4 However, it should be noted the detailed routing pattern used by VPR is still automatically generated, and likely remains of lower quality than the heavily hand-optimized routing pattern used in Stratix IV.


This shows the benefit of improving architecture modelling capabilities and CAD algorithms together.

11.3 Conclusion

This part of the thesis has presented the enhancements and extensions included in version 8.0 of the open-source Verilog To Routing project. VTR 8 substantially enhances VTR’s architecture modelling capabilities allowing VTR to model a wider range of modern FPGA characteristics including: general device grids, highly customizable routing architectures, and improved area and delay estimates. These features can be used to both facilitate new research (e.g. by allowing more control over characteristics such as the detailed routing pattern), and to better model and target modern commercial FPGAs with suitable bitstream generators [177]. In addition to these features we have also significantly enhanced the CAD algorithms and tools used to implement and optimize designs in VTR. These enhancements have significantly improved implementation quality of the VTR benchmarks (15% minimum channel width, 41% wirelength, and 12% critical path delay reductions) while also reducing tool run-time and memory usage by 5.2× and 3.3× respectively. On the larger industrial-scale Titan benchmarks, VPR 8 now completes a similar proportion of mutually feasible designs as a commercial tool like Quartus. We have also reduced the quality gap between VPR and the commercial Quartus tool to 26% wirelength and 20% critical path delay (reductions of 1.5× and 1.2× compared to previous results), while simultaneously running 17% faster and using similar amounts of memory. These results show that through careful code architecture and algorithm design it is possible for general re-targetable CAD tools like VPR to be both run-time and memory footprint efficient, while producing circuit implementations which are of reasonable quality compared to those produced by highly tuned architecture-specific industrial tools. Reducing the implementation quality gap also helps ensure the architectural conclusions produced using these tools are valid and not skewed due to poor optimization.

11.4 Future Work

In the future we plan to continue extending VTR's modelling capabilities to enable the exploration of an even broader range of FPGA architectures and to enable full modelling of advanced features found in commercial FPGAs. We will also continue to focus on improving the quality of VTR's optimization algorithms. We expect that a large portion of the remaining quality gaps result from clustering quality. In particular, we have observed that while VPR 8 produces clusters which are more natural than VPR 7's, related logic can still sometimes be scattered between different clusters, which end up placed far apart. The result is that such signals end up crossing large parts of the chip, increasing wirelength. Inspection of the resulting critical paths shows a similar trend, with timing-critical connections sometimes spanning large portions of the chip. We believe it is important to consider cross-boundary optimizations between the traditional pack/place/route stages, such as being able to take apart clusters during placement [226].5

It would also be interesting to continue exploring the application of Machine Learning [227, 5] techniques to various stages of the CAD flow (one instance of this is discussed in Chapter 13). From a run-time perspective, further investigation into hierarchical and parallel compilation techniques also seems promising.

We also plan to continue extending VPR to be an end-to-end CAD flow capable of both targeting existing FPGA devices and facilitating the creation of physical devices based on VTR architectures. To that end, it would be useful to extend the RR graph file (Section 6.2) to become a description of the entire FPGA device (including the device grid and block architectures). Such a ‘device file’ could be generated from VTR's higher-level architecture file description, or through other means, to model new architectures or architectural features not currently supported by VTR's higher-level descriptions. This would allow VTR's design implementation and optimization algorithms to be better decoupled from the high-level architecture description, and to be more easily re-targeted to other architectures.

While we believe that VTR's improved optimization quality allows it to make accurate CAD and architecture conclusions, it would be interesting to verify its fidelity by, for instance, comparing the impact of different optimizations in both VTR and commercial CAD systems. It would additionally be interesting to compare the quality of routing patterns generated by VTR when targeting our model of Stratix IV to the heavily optimized actual routing patterns used in commercial devices such as Stratix IV. It is also future work to consider how VTR could be extended to support multi-context FPGAs and partial reconfiguration.

5 Previous measurements of Quartus [10] indicate taking apart clusters allows Quartus to reduce wirelength and critical path delay by 9% and 10% respectively.

Table 11.4: VTR 8.0: VTR Benchmarks (Relative to VTR 7)
Columns: Benchmark, Netlist Primitives, Clocks, Logic Depth, IOs, CLBs, DSPs, RAMs, Wmin, Routed Wirelength (1.3 · Wmin), Routed CPD (1.3 · Wmin), Odin II Time, ABC Time, Pack Time, Place Time, Route Time (Find Wmin), Route Time (1.3 · Wmin), VTR Flow Elapsed Time, VTR Flow Elapsed Time (excl. Find Wmin), Peak Memory
mcml 165,809 (0.86×) 1 27 (0.96×) 36 (1.00×) 7,196 (0.94×) 27 (0.90×) 159 (1.00×) 128 (0.98×) 1,007,246 (0.63×) 45.83 (1.03×) 24.94 (1.81×) 3,172.5 (17.49×) 204.7 (0.44×) 679.4 (0.80×) 1,321.6 (0.12×) 91.2 (0.38×) 5,597.2 (0.42×) 4,275.6 (2.33×) 1.93 (0.36×)
LU32PEEng 101,542 (0.78×) 1 99 (0.95×) 114 (1.00×) 6,919 (0.79×) 32 (1.00×) 167 (0.99×) 136 (0.96×) 1,275,909 (0.72×) 75.25 (0.94×) 29.82 (4.25×) 853.4 (13.69×) 222.5 (0.83×) 536.1 (0.79×) 1,673.7 (0.02×) 146.9 (0.56×) 3,549.4 (0.04×) 1,875.7 (1.41×) 1.66 (0.28×)
stereovision2 42,077 (0.93×) 1 3 (1.00×) 149 (1.00×) 1,703 (0.75×) 202 (0.95×) 88 (0.71×) 417,138 (0.57×) 14.25 (0.88×) 1.01 (1.09×) 6.6 (0.68×) 24.2 (0.78×) 88.7 (0.71×) 321.8 (0.03×) 11.4 (0.16×) 484.6 (0.05×) 162.8 (0.63×) 0.89 (0.49×)
LU8PEEng 31,396 (0.79×) 1 103 (0.99×) 114 (1.00×) 2,062 (0.80×) 8 (1.00×) 44 (0.98×) 94 (0.98×) 295,304 (0.69×) 77.62 (0.89×) 2.44 (1.30×) 73.5 (4.46×) 64.2 (0.90×) 68.6 (0.92×) 177.5 (0.11×) 17.9 (0.34×) 427.7 (0.23×) 250.2 (1.09×) 0.53 (0.31×)
bgm 24,865 (0.43×) 1 14 (0.52×) 257 (1.00×) 2,136 (0.54×) 11 (1.00×) 74 (0.90×) 281,735 (0.46×) 19.25 (0.62×) 9.07 (2.30×) 146.4 (6.87×) 39.5 (0.52×) 58.1 (0.33×) 101.8 (0.07×) 8.2 (0.25×) 388.3 (0.22×) 286.6 (0.88×) 0.46 (0.18×)
stereovision0 21,789 (0.70×) 1 5 (0.83×) 157 (0.93×) 704 (0.63×) 56 (0.88×) 59,060 (0.50×) 3.54 (0.79×) 0.91 (1.00×) 8.3 (1.48×) 7.6 (0.37×) 8.6 (0.33×) 11.9 (0.18×) 1.3 (0.30×) 47.5 (0.36×) 35.6 (0.54×) 0.20 (0.26×)
stereovision1 19,549 (0.71×) 1 3 (1.00×) 115 (0.86×) 702 (0.65×) 40 (1.05×) 78 (0.72×) 115,572 (0.56×) 5.44 (0.92×) 1.02 (1.01×) 40.0 (7.36×) 7.3 (0.34×) 11.1 (0.39×) 60.1 (0.07×) 3.8 (0.25×) 134.5 (0.15×) 74.4 (0.92×) 0.24 (0.33×)
arm_core 18,437 1 18 133 1,034 40 116 212,114 20.97 1.11 65.6 29.6 23.0 170.7 14.4 340.5 169.8 0.36 ◦
blob_merge 11,415 (0.85×) 1 5 (1.00×) 36 (1.00×) 623 (0.88×) 62 (0.74×) 68,514 (0.60×) 14.95 (1.00×) 0.30 (1.21×) 36.0 (9.09×) 14.9 (0.78×) 7.8 (0.57×) 19.4 (0.21×) 1.3 (0.31×) 86.3 (0.63×) 66.9 (1.49×) 0.14 (0.31×)
or1200 4,530 (0.82×) 1 8 (1.00×) 385 (1.00×) 255 (0.88×) 1 (1.00×) 2 (1.00×) 94 (1.15×) 45,258 (0.81×) 9.06 (1.01×) 0.24 (1.03×) 3.9 (2.52×) 6.4 (0.88×) 5.6 (0.87×) 10.2 (0.06×) 2.7 (1.23×) 32.6 (0.18×) 22.4 (1.15×) 0.09 (0.43×)
mkDelayWorker32B 4,145 (0.36×) 1 5 (0.71×) 506 (0.99×) 505 (0.99×) 44 (1.02×) 40 (0.53×) 18,099 (0.14×) 6.52 (0.86×) 0.60 (1.25×) 5.4 (1.61×) 3.0 (0.23×) 10.1 (0.60×) 6.7 (0.00×) 0.9 (0.07×) 32.1 (0.02×) 25.4 (0.52×) 0.25 (0.49×)
raygentop 2,934 (0.61×) 1 3 (0.75×) 214 (0.90×) 110 (0.53×) 8 (1.14×) (0.00×) 56 (0.74×) 20,468 (0.62×) 4.89 (0.97×) 0.19 (1.10×) 1.0 (1.20×) 1.9 (0.33×) 2.0 (0.53×) 7.0 (0.27×) 0.8 (0.44×) 14.9 (0.38×) 7.9 (0.57×) 0.06 (0.39×)
mkSMAdapter4B 2,852 (0.69×) 1 4 (0.67×) 193 (0.99×) 199 (0.99×) 5 (1.00×) 50 (0.83×) 17,952 (0.69×) 5.33 (0.93×) 0.20 (1.22×) 1.6 (1.57×) 2.7 (0.48×) 2.3 (0.73×) 5.9 (0.35×) 0.3 (0.42×) 15.0 (0.52×) 9.1 (0.76×) 0.06 (0.39×)
sha 2,744 (0.83×) 1 3 (0.60×) 38 (1.00×) 154 (0.76×) 74 (1.19×) 15,641 (0.86×) 10.13 (1.01×) 0.56 (1.74×) 245.3 (67.95×) 1.8 (0.60×) 1.5 (0.63×) 6.0 (0.19×) 0.3 (0.46×) 260.1 (6.10×) 254.1 (22.45×) 0.06 (0.43×)
spree 1,229 1 15 45 65 1 3 70 11,210 10.52 0.12 0.7 2.0 0.8 2.4 0.3 7.5 5.1 0.04 ◦
mkPktMerge 1,160 (0.88×) 1 2 (0.67×) 311 (1.00×) 29 (1.38×) 15 (1.00×) 40 (0.80×) 13,486 (0.90×) 4.51 (1.09×) 0.10 (1.11×) 0.1 (0.60×) 0.5 (0.79×) 1.5 (0.68×) 2.6 (0.08×) 0.5 (0.20×) 6.9 (0.18×) 4.3 (0.69×) 0.06 (0.69×)
boundtop 1,141 (0.20×) 1 3 (0.38×) 142 (0.52×) 94 (0.37×) (0.00×) 48 (0.89×) 3,152 (0.11×) 3.38 (0.54×) 0.25 (1.06×) 0.4 (0.28×) 0.7 (0.09×) 0.8 (0.19×) 0.9 (0.11×) 0.1 (0.08×) 4.9 (0.20×) 3.9 (0.25×) 0.04 (0.22×)
diffeq1 886 (0.87×) 1 6 (1.00×) 162 (1.00×) 32 (0.91×) 5 (1.00×) 50 (0.81×) 8,945 (0.86×) 17.02 (1.01×) 0.03 (1.24×) 0.2 (0.79×) 0.3 (0.46×) 0.7 (0.90×) 6.3 (0.62×) 0.3 (0.42×) 8.7 (0.65×) 2.4 (0.74×) 0.04 (0.91×)
diffeq2 599 (0.81×) 1 6 (1.20×) 66 (1.00×) 22 (0.85×) 5 (1.00×) 52 (0.90×) 7,411 (0.83×) 12.97 (1.02×) 0.02 (1.89×) 0.1 (0.80×) 0.4 (0.53×) 0.6 (0.76×) 2.3 (0.21×) 0.3 (0.63×) 4.5 (0.34×) 2.2 (0.87×) 0.04 (0.97×)
ch_intrinsics 493 (0.55×) 1 3 (0.75×) 99 (1.00×) 64 (1.73×) 1 (1.00×) 54 (1.13×) 1,304 (0.33×) 2.58 (0.68×) 0.06 (1.58×) 0.2 (0.59×) 0.1 (0.20×) 0.3 (0.81×) 0.6 (0.64×) 0.0 (0.35×) 2.1 (0.67×) 1.5 (0.68×) 0.03 (0.91×)
stereovision3 321 (0.92×) 2 5 (1.25×) 11 (1.00×) 14 (1.08×) 30 (0.94×) 791 (1.06×) 2.65 (0.95×) 0.04 (1.28×) 0.1 (0.92×) 0.3 (1.02×) 0.1 (0.60×) 0.3 (0.98×) 0.0 (0.88×) 1.6 (1.21×) 1.3 (1.29×) 0.02 (1.30×)
GEOMEAN 5,450.6 (0.68×) 1 7.2 (0.82×) 112.1 (0.95×) 291.6 (0.82×) 10.2 (1.00×) 15.7 (1.00×) 65.7 (0.87×) 36,594.1 (0.56×) 10.25 (0.89×) 0.46 (1.39×) 5.7 (2.39×) 4.4 (0.48×) 5.8 (0.60×) 15.4 (0.13×) 1.3 (0.33×) 50.6 (0.29×) 31.3 (0.99×) 0.14 (0.44×)
GEOMEAN (> 10K) 33,242.9 (0.74×) 1 13.6 (0.89×) 105 (0.97×) 1,700.5 (0.74×) 29.2 (0.98×) 82.7 (0.99×) 88.6 (0.85×) 254,151.9 (0.59×) 19.68 (0.88×) 2.58 (1.53×) 81.1 (5.15×) 34.4 (0.58×) 50.7 (0.56×) 146.2 (0.08×) 10.8 (0.30×) 406.1 (0.19×) 241.7 (1.05×) 0.49 (0.30×)
% TOTAL (> 10K) 0.64% 39.1% 5.7% 13.4% 34.9% 2.7% 100.0% 64.9%
Run-time in Seconds, Memory in GiB, WL in grid tiles, CPD in ns. ◦ VTR 7 error.

Table 11.5: VPR 8.0: Titan v1.1.0 Benchmarks (Relative to VPR 7+)
Columns: Benchmark, Netlist Primitives, Clocks, IOs, LABs, DSPs, M9Ks, M144Ks, Device Grid Tiles, Routed Wirelength, Routed CPD (geomean), Pack Time, Place Time, Route Time, Misc. Time, Total Time, Peak Memory
gaussianblur 1,860,045 1 558 105,457 4 12 0 117,410 126.1 (0.76×) 576.8  ◦
bitcoin_miner 1,062,214 2 385 29,533 (1.02×) 1,668 (1.14×) 0 (0.00×) 43,139 12,604,393 1.01 12.1 (0.23×) 169.2 (1.40×) 23.8 12.3 217.4 12.28 
directrf 934,809 2 319 58,474 252 2,535 0 73,162 23.5 217.6 N ◦
sparcT1_chip2 766,880 1 1,890 29,716 (0.96×) 3 (1.00×) 580 (1.09×) 55,622 7,207,537 (0.61×) 19.93 (0.89×) 25.1 (0.47×) 88.2 (0.94×) 7.8 (0.02×) 14.3 (5.20×) 135.4 (0.23×) 11.41 (0.24×)
LU_Network 630,376 19 405 31,026 112 1,175 0 35,478 11.8 86.0  ◦
LU230 568,367 2 373 17,049 (1.11×) 116 (1.00×) 5,040 (1.09×) 16 (0.05×) 135,676 18,171,449 10.13 20.6 (0.51×) 62.9 20.8 32.7 137.0 19.67 ◦
mes_noc 549,051 9 5 25,570 (1.03×) 800 (1.00×) 29,106 5,093,677 (0.56×) 9.90 24.8 (0.73×) 77.9 (0.85×) 9.8 (0.03×) 8.0 (3.59×) 120.5 (0.24×) 8.15 (0.21×)
gsm_switch 491,989 4 138 20,827 (1.08×) 1,848 (1.15×) 0 (0.00×) 47,124 6,566,158 (0.70×) 5.59 (0.85×) 12.7 (0.50×) 39.3 (0.83×) 5.6 (0.00×) 11.8 (5.48×) 69.4 (0.02×) 9.17 (0.14×)
denoise 343,755 1 852 18,202 (1.05×) 48 (1.00×) 377 (1.03×) 21,125 3,897,065 (0.66×) 864.44 (0.93×) 8.4 (0.38×) 79.3 (1.09×) 7.5 (0.01×) 6.0 (4.10×) 101.2 (0.10×) 6.17 (0.24×)
sparcT2_core 288,458 1 451 14,215 (1.09×) 266 (1.00×) 16,280 4,604,317 (0.84×) 18.81 (1.53×) 11.2 (0.61×) 41.0 (1.29×) 6.4 (0.04×) 4.4 (4.40×) 63.0 (0.31×) 4.62 (0.25×)
cholesky_bdti 256,234 1 162 9,678 (1.04×) 132 (1.00×) 600 (1.00×) 20,832 2,562,018 (0.66×) 9.34 (0.77×) 4.8 (0.61×) 14.2 (0.74×) 4.5 (0.04×) 5.1 (4.55×) 28.6 (0.20×) 5.02 (0.20×)
minres 252,687 2 229 7,693 (1.07×) 149 (1.17×) 1,458 (1.27×) 0 (0.00×) 36,244 2,749,531 (0.61×) 6.28 (0.86×) 5.4 (0.64×) 13.5 (0.96×) 3.2 (0.03×) 8.3 (6.61×) 30.3 (0.21×) 6.61 (0.15×)
stap_qrd 237,347 1 150 16,067 (1.04×) 74 (1.00×) 553 (1.31×) 0 (0.00×) 18,212 2,635,810 (0.62×) 8.06 (0.82×) 4.3 (0.56×) 25.7 (0.86×) 4.1 (0.10×) 4.7 (4.09×) 38.9 (0.47×) 4.52 (0.19×)
openCV 212,826 1 208 7,135 (1.10×) 248 (1.47×) 787 (1.01×) 40 (0.83×) 38,930 3,271,085 11.50 5.3 (0.64×) 12.1 (0.76×) 6.5 8.9 32.9 6.65 4
dart 202,439 1 69 6,898 (1.07×) 530 (1.07×) 0 (0.00×) 13,837 2,533,682 13.32 6.6 (0.79×) 11.4 (0.92×) 3.5 3.6 25.1 3.58 4
bitonic_mesh 191,783 1 119 7,671 (1.12×) 169 (1.84×) 1,664 (1.07×) 0 (0.00×) 43,139 4,139,332 (0.53×) 15.33 (0.81×) 8.8 (0.78×) 17.9 (0.94×) 5.9 (0.00×) 9.9 (7.39×) 42.4 (0.03×) 7.05 (0.13×)
segmentation 168,365 1 441 8,641 (1.06×) 21 (1.00×) 488 (1.33×) 0 (0.00×) 13,500 2,060,810 (0.67×) 880.13 (0.96×) 3.8 (0.43×) 23.2 (0.98×) 3.2 (0.01×) 3.5 (4.27×) 33.7 (0.09×) 3.55 (0.20×)
SLAM_spheric 125,679 1 479 6,657 (1.04×) 66 (1.06×) 87 (1.00×) 10,890 2,116,663 82.97 3.7 (0.59×) 12.6 (0.95×) 3.7 2.7 22.7 2.99 
des90 109,928 1 117 4,252 (1.07×) 88 (1.80×) 860 (1.05×) 0 (0.00×) 21,125 2,192,517 (0.59×) 12.50 (0.80×) 4.7 (0.85×) 8.4 (0.91×) 3.6 (0.02×) 4.8 (6.03×) 21.6 (0.09×) 3.92 (0.14×)
cholesky_mc 108,499 1 262 4,792 (1.05×) 58 (1.00×) 444 (1.09×) 16 (0.80×) 11,193 1,203,602 (0.78×) 8.37 (0.95×) 1.9 (0.74×) 4.4 (0.82×) 2.2 (0.11×) 2.7 (4.11×) 11.3 (0.40×) 2.75 (0.17×)
stereo_vision 93,171 3 506 3,038 (1.12×) 80 (1.05×) 135 (1.00×) 12,160 702,858 (0.73×) 3.19 (0.88×) 3.7 (2.96×) 2.3 (0.67×) 1.1 (0.19×) 2.7 (5.90×) 9.8 (0.89×) 2.68 (0.29×)
sparcT1_core 91,580 1 310 3,910 (1.04×) 1 (1.00×) 131 (1.00×) 4,661 1,255,399 (0.78×) 8.50 (0.84×) 4.4 (1.02×) 4.7 (0.94×) 1.7 (0.08×) 1.3 (2.93×) 12.1 (0.40×) 1.90 (0.26×)
neuron 90,857 1 77 3,310 (1.05×) 97 (1.04×) 185 (1.00×) 14,076 873,185 (0.69×) 6.07 (0.80×) 1.6 (0.88×) 2.7 (0.80×) 1.5 (0.13×) 3.3 (7.01×) 9.1 (0.51×) 2.94 (0.29×)
GEOMEAN 286,921 1.6 236.2 12,112.4 (1.06×) 51.4 (1.13×) 530.2 (1.08×) 21.7 (0.31×) 26,191.7 3,071,830.2 (0.66×) 14.80 (0.89×) 8.1 (0.65×) 26.1 (0.92×) 4.6 (0.03×) 5.7 (4.88×) 38.5 (0.19×) 5.25 (0.20×)
Run-time in Minutes, Memory in GiB, WL in grid tiles, CPD in ns
N VPR 8 time-out;  VPR 8 unroute; 4 VPR 7+ time-out;  VPR 7+ unroute; ◦ VPR 7+ error

) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) ) × × × × × × × × × × × × × × × × × × × × × 96 88 88 12 02 89 90 08 99 94 01 78 96 78 88 09 13 80 22 13 92 ...... (0 (0 (0 (1 (1 (0 (0 (1 (0 (0 07 95 03 79 68 38 54 67 36 91 47 75 76 (1 55 (0 86 (0 39 (0 22 24 (0 19 (1 73 (1 98 (0 10 (1 69 (1 92 (0 ...... 7 4 6 4 3 3 3 2 2 27 16 10 ) ) ) ) ) ) ) ) ) ) ) 8 ) 2 ) 5 ) 11 ) 5 ) 5 ) 6 ) 1 ) 17 ) 5 ) 2 × × × × × × × × × × × × × × × × × × × × × 22 21 33 22 39 26 52 25 32 20 27 36 30 44 53 22 24 28 25 38 30 ...... (0 (0 (0 (0 (0 (0 (0 (0 (0 (0 6 9 1 1 4 5 8 2 7 8 6 8 4% 93 11 (0 4 (0 2 (0 8 (0 6 (0 3 (0 4 (0 0 (0 1 (0 5 (0 2 (0 ...... 8 6 3 2 2 2 1 1 0 0 2 22 10 ) ) ) ) ) ) ) ) ) ) ) 4 ) 1 ) 3 ) 9 ) 3 ) 2 ) 3 ) 1 ) 8 ) 2 ) 1 × × × × × × × × × × × × × × × × × × × × × 01 77 94 05 33 48 62 10 86 88 59 70 83 73 60 87 88 70 81 14 04 ...... (1 (0 (0 (1 (1 (0 (0 (1 (0 (0 0 7 2 1 1 0 3 7 5 6 8 5 0% 29 6 (0 8 (0 6 (0 6 (0 2 (0 8 (0 4 (0 9 (0 3 (0 1 (1 5 (1 ...... 73 79 41 48 39 13 14 438 540 218 197 106 100 , 1 ) ) )) 152 ) ) ) ) )) 28 ) ) ) 91 ) 461 ) 131 ) 59 ) 93 ) 16 ) 439 ) 62 ) 21 × × × × × × × × × × × × × × × × × × × × × 36 28 49 36 67 55 11 68 38 69 07 84 49 24 43 55 59 38 64 98 68 ...... (0 (0 (0 (0 (0 (0 (0 (0 (1 (0 5 9 3 1 28 (0 2 9 8 8 08 (0 1 2 99% (0 0 (0 9 271 7 (0 7 (0 8 (0 5 (0 9 (0 3 (0 1 (0 ...... 8 4 9 4 3 3 5 3 3 4 33 20 10 ) ) ) ) ) ) ) ) ) ) ) 6 ) 12 ) 2 ) 11 ) 33 ) 4 ) 5 ) 8 ) 10 ) 3 ) 1 × × × × × × × × × × × × × × × × × × × × × 20 26 18 38 97 14 23 44 38 35 34 35 54 09 67 19 01 99 15 98 20 ...... (0 (0 (0 (0 (0 (0 (0 (0 (0 (0 1 2 1 7 3 2 5 0 5 7 6 9 08% (0 1 (0 1 (0 0 (0 24 (0 15 5 (0 2 (1 3 (0 5 (0 8 (0 3 (0 ...... 6 7 6 3 5 2 2 2 0 0 9 77 84 ) 6 ) ) ) 17 ) ) ) ) ) )) 3 ) ) ) 10 ) 6 ) 183 ) 4 ) 8 ) 6 ) 5 ) 2 ) 1 × × × × × × × × × × × × × × × × × × × × × × 00 37 87 64 44 32 65 82 66 73 33 86 05 97 85 07 65 91 17 42 17 83 ...... (1 (0 (1 (1 (1 (0 (0 (1 (0 (1 6 (1 9 (0 4 (0 7 3 9 1 3 6 3 4 0 7 9 9 0% 3 (0 9 (0 4 (1 1 (0 2 (1 7 (1 7 (1 0 (0 3 (0 ...... 8 8 83 54 64 29 39 26 78 200 418 188 158 , 1 ) 66 ) 110 ) 20 ) ) ) ) ) ) ) ) ) ) ) 428 ) 221 ) 200 ) 41 ) 42 ) 68 ) 13 ) 10 ) 115 × × × × × × × × × × × × × × × × × × × × × × 18 16 80 69 51 89 03 76 61 33 70 92 98 31 53 75 97 55 28 15 44 13 ...... (0 (1 (1 (1 (0 (1 (1 (2 (0 (0 4 (1 7 (1 2 (1 7 3 9 2 7 0 5 5 2 5 2 4 2% 3 (0 2 (0 6 (0 7 (0 3 (1 3 (2 0 (1 0 (3 6 (1 ...... Pack Time Place Time Route Time Misc. Time Total Time STA Time Peak Memory 6 4 6 3 4 1 1 7 17 12 23 11 126 ) 7 ) 12 ) 3 ) 12 ) 21 ) 4 ) 5 ) 8 ) 2 ) 4 ) ) ) )) 6 ) ) ) ) ) ) × × × × × × × × × × × × × × × × × × × × × 20 25 11 93 03 68 24 32 58 13 92 16 09 90 12 04 29 92 60 96 32 ...... (0 (1 (1 (1 (2 (1 (0 (1 (0 (1 00 (1 10 (1 06 (1 24 71 05 38 83 16 67 93 22 71 47 56 38 (0 0002 (1 27 09 (1 20 (1 74 (1 26 (1 67 (1 00 (0 ...... (geomean) 9 5 8 4 7 3 5 Routed CPD 10 11 12 929 848 ) 16 ) 6 ) 79 ) 8 ) ) 10 ) ) ) )) 9 ) 11 ) ) 13 ) ) )) 7 )) 8 ) 851 × × × × × × × × × × × × × × × × × × × × × 26 95 15 06 19 45 03 06 46 48 53 46 02 80 19 26 96 52 00 27 10 ...... (1 (1 (1 (1 (1 (1 (1 (1 (1 (2 9 (1 . 
814 (0 905 (1 344 160 973002 (1 606573 (1 5 119 (1 630 (1 536 (0 099 (1 016 (1 344 841 316 873510 (1 613 081 886 716 530 195 , , , , , , , , , , , , , , , , , , , , , , , , 314 671 427 738 238 269 706 448 572 997 258 235 225 652 942 562 067 708 595 117 704 147 608 765 , , , , , , , , , , , , , , , , , , , , , , 5 4 4 2 2 2 1 2 Routed Wirelength 30 12 DSP DSP LAB LAB LAB LAB LAB M9K M9K M9K M9K M9K Limiting Resource ) M9K 5 ) LAB 1 ) 3 ) LAB) 10 ) IO)) M9K 7 15 ) ) )) DSP)) 2 DSP)) 2 M9K 4 ) )) M9K) 1 ) LAB 1 ) LAB 3 × × × × × × × × × × × × × × × × × × × × × × 15 50 78 77 72 56 64 36 23 20 58 90 02 16 08 47 33 29 03 59 85 87 ...... (1 (0 (0 (1 (1 (1 (0 (1 (2 (1 2 (1 . 342 195 (1 984 (0 170 (3 395 (1 318 (1 625 (1 718 184495 (1 753860 (1 936 176 912184 (0 125762 (1 076 736 717 384 002384 (0 , , , , , , , , , , , , , , , , , , , , , , , , 74 35 27 17 37 18 14 13 21 12 12 116 ) 25 ) 48 ) ) ) 137 ) 32 ) 43 ) ) 11 ) × × × × × × × × × × 87 00 00 00 00 83 00 00 80 00 ...... (0 (0 (0 (0 7 (0 . 0 0 0 0 VPR 8.0 High Effort: Titan v1.3.0 (Relative to Quartus 18.0) ) 21 ) 0 (0 ) )) )) ) 16) (1 37 ) ) 57 ) )) ) 40) (0 ) 16 0) (0 21 ) ) 16) (0 )) 5 × × × × × × × × × × × × × × × × × × × × × × × 15 30 00 02 23 00 26 05 19 00 74 31 00 00 17 00 03 87 00 19 00 01 00 ...... (1 (1 (1 (1 (1 (1 (1 (1 (1 (1 (1 (4 1 (1 . 12 848 (1 331535 (1 175 040 (4 458 664 (1 800 260 553 530 481 860 113 136 , , , , , , , 2 1 1 ) 551 ) 6 ) 506) (1 5 ) 359) (2 600) (1 785) (1 1 ) 444) (1 128 (1 ) ) ) ) ) ) ) × × × × × × × × × × × × × × × × × Table 11.6: 97 56 00 00 73 00 68 71 00 00 00 93 00 83 50 00 81 ...... (1 (0 (1 (0 (0 (4 (0 7 (0 . 2 78 74 15 44 76 89 240 112 ) 39 ) 1 ) 37 (0 )) 3) (1 116 (1 1 ) 24) (0 132 (1 ) 213 (1 ) 85 (0 ) 59) (1 1 (1 ) ) ) ) ) ) ) ) ) ) ) ) × × × × × × × × × × × × × × × × × × × × × × × × 95 72 68 00 73 51 79 09 84 50 88 95 12 75 47 81 04 63 82 83 78 75 87 68 ...... (1 (1 (1 (0 (1 (0 (1 (0 (0 (0 (0 (0 2 (0 . 534 (0 607 (0 977 405 (4 619 (0 434 (0 128 (0 824 (0 802 382824 (1 050 167 663 030 725029 (1 325 381093 (0 248 327 983136 (0 , , , , , , , , , , , , , , , , , , , , , , , , 8 7 7 4 3 3 60 31 24 14 16 103 ) 21 ) 5 ) 11 ) ) ) 33 ) ) 16 ) ) ) 14 ) ) )) 7 ) ) )) 4 ) ) 32 ) 9 ) 7 ) 3 × × × × × × × × × × × × × × × × × × × × × × × × 01 79 95 81 98 70 04 00 00 95 87 67 95 62 00 82 88 98 30 74 71 00 88 94 ...... (0 (0 (2 (1 (0 (0 (0 (1 (0 (0 (1 (0 4 (0 . 5 69 77 558 319 411 451 229 150 441 117 506 891 (1 , Quartus unroute

6 236 . 1 2 4 9 1 2 1 1 1 1 2 1 Clocks IOs LABs DSPs M9Ks M144Ks Device Grid Tiles 8 1 . 070 3 138 (1 354 1 479 (0 371 989 079 568 220 480 177 402 832 549 090 875 412 21001 1 2786 373 (1 1553 852 (0 1 208 (0 592 1 262 (0 014 537 1 385 (0 478 1746 162 (1 1 119 (0 975 1 310 (0 , , , , , , , , , , , , , , , , , , , , , , , , 94 86 91 Netlist 490 111 930 630 547 300 257 234 202 137 110 760 568 274 212 108 859 087 255 190 , , Primitives 1 1 mc bdti core mesh core miner chip2 qrd noc vision switch spheric dart Network LU230 des90 minres openCV neuron mes denoise stap directrf % TOTAL Benchmark LU gsm Quartus device size exceeded; GEOMEAN 280 cholesky sparcT2 bitonic SLAM sparcT1 gaussianblur segmentation bitcoin sparcT1 cholesky stereo Run-time in Minutes, MemorySTA in Time GiB, is WL included in„ in grid Pack/Place/Route/Misc. tiles, and CPD Total in Times ns Part III

An Open-Source CAD Flow for Commercial FPGA Devices

Chapter 12

Symbiflow & VPR: An Open-Source Design Flow for Commercial and Novel FPGAs

As the benefits of Moore’s Law diminish, computing performance and efficiency gains are increasingly achieved through specializing hardware to a specific domain of computation. However this limits the hardware’s generality and flexibility. Field Programmable Gate Arrays (FPGAs), microchips which can be re-programmed to implement arbitrary digital circuits, enable the benefits of specialization while remaining flexible. A challenge to using FPGAs is the complex computer aided design flow required to efficiently map a computation onto an FPGA. Traditionally these design flows are closed-source and highly specialized to a particular vendor’s devices. We propose an alternate data-driven approach which uses highly adaptable and re-targetable open-source tools to target both commercial and research FPGA architectures. While challenges remain, we believe this approach makes the development of novel and commercial FPGA architectures faster and more accessible. Furthermore, it provides a path forward for industry, academia, and the open-source community to collaborate and combine their resources to advance FPGA technology.

12.1 Introduction

Moore’s Law has tracked our ability to perform increasingly efficient and complex computation over the past 55 years, enabling general purpose devices like CPUs (and more recently GPUs) to continually improve performance and power efficiency. However as the traditional benefits of manufacturing process technology scaling diminish, so do the performance and efficiency gains of these devices. At the same time, demand for computation from domains like machine learning and wireless signal processing continues to rapidly increase. This has led computer architects to investigate domain-specific computing architectures, which focus on efficiently implementing a specific domain of related computational applications. While such approaches offer significant benefits, these are largely derived from their specialization and lack of flexibility. Furthermore, developing such architectures is expensive (designing and fabbing a custom chip

may cost hundreds of millions of dollars), and is risky (e.g. what if some unsupported operation becomes required?). As a result relatively few application domains are stable enough and garner sufficient financial support for this approach.

Field-Programmable Gate Arrays (FPGAs) present an alternative approach which combines flexibility and general applicability, with the performance and efficiency benefits of Domain Specific Architectures (DSAs). Instead of designing a DSA which must then be manufactured as an Application Specific Integrated Circuit (ASIC) (a process which usually takes years to design, manufacture and verify), a DSA can be implemented and re-programmed onto an FPGA in a few hours or days. As discussed in Chapter 2, and shown in Figure 12.1, an FPGA consists of programmable logic blocks (which can implement arbitrary boolean logic functions), data storage (Flip-Flops and BRAMs) and a programmable routing fabric to interconnect them. This enables an FPGA to be quickly re-programmed to implement an arbitrary digital circuit. The architecture of FPGAs themselves continues to evolve in many ways, such as the integration of increasingly heterogeneous blocks such as digital signal processing blocks, diverse I/O controllers, embedded Networks-on-Chip [22, 192] and more (see Section 2.1.1).

The evolution of FPGAs and the creation of FPGAs with unique features to benefit new computational domains is hindered by the very complex Computer-Aided Design (CAD) system needed to map a high-level computation description to the millions of configuration bits which control the FPGA. The creation of such tooling is a daunting task: FPGA vendors such as Xilinx and Intel employ more software engineers than hardware engineers in their FPGA divisions to develop their (closed-source) CAD systems [61]. Furthermore, without a prototype CAD system for each FPGA architecture of interest, an FPGA architect cannot quantitatively evaluate new architectural ideas.

In this chapter we discuss the SymbiFlow project, which combines VPR and other open-source CAD tools to build a fully open-source design flow for FPGAs that can be used not only to program commercial FPGAs, but also to evaluate new FPGA architectures.1 This dual goal creates a challenge. On the one hand, creating a programming file for a specific FPGA requires that every device detail is perfectly supported by the CAD system. On the other hand, for a CAD system to be able to target a wide range of FPGA architectures it must be very flexible and avoid chip-specific coding. We show that a data-driven approach is possible: SymbiFlow (using VPR) can fully map designs to commercial Xilinx Artix 7 devices with open-source synthesis, placement, routing, and bitstream generation tools that can still be retargeted to other (existing or novel) FPGAs. While the VTR flow developed in Chapters 5 to 11 allows exploration of FPGA architectures without significant extra coding, before this work it could not capture all the details required to program a production FPGA architecture. In this chapter, we quantify the current feature support and result quality of this flow when targeting commercial devices. While much remains to be done, we believe SymbiFlow has the potential both to allow a much larger community to develop CAD flows for current FPGAs and to make the development of new FPGA architectures faster and more accessible.

1This work was done in collaboration with Google (Tim Ansell and Keith Rothman), and Antmicro (Alessandro Commodi and others) [1]. The contribution towards this thesis was to generalize a number of parts of VPR to support specifying and modelling the details required to target commercial FPGAs with VPR, as well as providing high-level guidance for how the design flow should be architected and evolve in a data-driven way. Google and Antmicro were responsible for creating the Artix 7 device model for use in VPR, the open-source bitstream generator, and verifying the correctness of the resulting flow. They also proposed fixes and enhancements to VPR to address issues targeting Artix 7. Experimental results for Symbiflow & VPR, and Vivado were collected by Antmicro, with additional assistance from the author and Mohamed A. Elgammal.

Figure 12.1: Left, an FPGA consisting of Configurable Logic Blocks (CLBs) for computation and Block RAMs (BRAMs) for storage, along with programmable routing to interconnect them. Right, each CLB contains Look-Up Tables (LUTs) and Flip-Flops (FFs). Routing Muxes are configured to interconnect elements within each block, or inter-block routing wires.

12.2 Related Work

FPGA CAD flows serve two main purposes: implementing designs in a completely-specified FPGA architecture, and quantitatively evaluating new hardware architectures before their creation. Each FPGA vendor has a CAD flow for the first purpose – generating a programming file for their devices – and several also have CAD flows to evaluate new architecture ideas for future microchips, such as Intel/Altera’s FPGA Modeling Toolkit (FMT) [68]. These tools are closed source however, so they cannot be used by academic or other researchers to test new CAD algorithms, to investigate new architectures, or to implement designs on novel FPGAs.

Some interfaces to commercial tools have been documented, allowing some interaction between external open source tools and these commercial CAD flows. Torc [128] and RapidSmith II [130] use Xilinx Design Language (XDL) to pass intermediate CAD result information to Xilinx’s (now legacy) ISE CAD flow. RapidWright [131] uses similar techniques to read and write partial CAD results to and from Xilinx’s Vivado CAD tool, enabling customization of some portions of the design flow. While these interfaces are helpful to the open source community, they do not address all usage scenarios. First, device programming information is not exposed. Second, the code bases to which they interface are closed, and the devices targeted are limited to existing commercial devices. This places limits on how much the CAD flow can be changed and precludes investigation of, or support for, novel FPGA architectures.

To overcome these limitations, several standalone open-source frameworks have been created. The VTR framework combines Odin II for Verilog synthesis [183], ABC for technology mapping [184] and VPR for packing, placement and routing [2]. VPR takes a human-readable description of an FPGA architecture as input and has been used for a wide variety of FPGA architecture research in academia and industry (Altera’s FMT was originally derived from VPR). VPR/VTR also serve as a framework for investigating many new CAD algorithms, allowing researchers to implement or modify only their novel part of the CAD flow.

Figure 12.2: Design flow.

While VTR has been extensively used for research, it has historically seen less use for programming actual devices. VTR-to-Bitstream [166] proposed a toolchain based on VTR that can program a Virtex 6 commercial FPGA, using Xilinx’s XDL-based tools only for the final programming. Several start-ups, such as Efinix [178], have made private copies of VPR and developed implementation tool flows for their devices on top of it – but their tools are closed source. Similarly, several academic research teams created novel spatial architectures which use VTR as their CAD flow (e.g. [18]), but they all required significant custom coding to create a full device model and programming flow. This has led to significant duplicate work across these projects. Hence a major recent focus of the VTR project has been to enhance the tool flow in a data-driven way to support full implementation flows for both existing and future novel FPGAs with little (or ideally no) custom coding.

The OpenFPGA project further builds on the VTR infrastructure to allow automatic layout of a full FPGA, enabling a complete idea-evaluation-layout-programming flow [228]. Nextpnr [182] is another open source FPGA placement and routing tool that has been created to enable a complete open-source FPGA implementation flow for the Lattice iCE40 and ECP5 devices. Unlike VTR, its main purpose is to target existing commercial FPGA architectures with an open-source flow; as such it is more amenable to custom coding.

In this work we detail Symbiflow: enhancements to VPR and a data-driven bitstream generator that allow a complete open-source implementation flow for a Xilinx Artix 7 commercial device, without significant custom coding. We believe this new flow fills an important gap by enabling rapid creation of full implementation CAD flows for both existing and new FPGA devices.

12.3 Design Flow

The open source design flow we focus on is shown in Figure 12.2. The end-user provides a behavioural specification of the computations they wish to perform in a Hardware Description Language (HDL). This description is then transformed by several tools to map the computation to the target architecture, finally producing a bitstream used to configure the FPGA. This mapping occurs in several stages.

Figure 12.3: FPGA Architecture Representation. Levels of description, from abstract to detailed: Architecture (LUT size K, cluster size N, channel width W, switch block Fs, connection block Fc, wire types); CAD Device Model (netlist primitives, block types, device grid, RR graph, bitstream); Circuit (detailed description of routing wires, muxes, buffers, SRAM); Fabrication (detailed process technology & mask layout: transistors, vias, metal layers).

First, Yosys converts the behavioural HDL into a circuit consisting of soft logic (boolean equations) and hard architectural primitives (like adders, multipliers, and RAMs). For hard primitives Yosys’ technology mapping library lowers generic HDL operations (such as addition) to architecture specific primitives (e.g. full adders).2 The soft logic is then optimized by ABC and technology mapped to the primitive computational elements (e.g. LUTs). VPR then takes the technology mapped netlist along with a detailed model of the FPGA device and determines where to place each circuit element (placement) and how to interconnect them (routing) while minimizing the wiring required and maximizing circuit speed. The resulting circuit implementation is then converted to a low level FPGA Assembly (FASM) description which can be directly translated into the binary bitstream which programs an FPGA.
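As a concrete illustration, the script below sketches how such a flow might be driven end-to-end. It is a minimal sketch rather than the actual SymbiFlow scripts: the file names (top.v, arch.xml, rr_graph.xml), the specific Yosys synthesis recipe, and the FASM/bitstream utility invocations (genfasm, fasm2bitstream) are assumptions for illustration only.

import subprocess

def run(cmd):
    # Run one flow stage, failing loudly if the tool reports an error.
    print(">>", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1) Synthesis: Yosys elaborates the HDL and maps it to architecture primitives
#    (ABC is invoked internally to technology-map the soft logic to LUTs).
run(["yosys", "-p", "read_verilog top.v; synth_xilinx -top top; write_blif top.blif"])

# 2) Pack/Place/Route: VPR reads the device model (architecture description and,
#    optionally, a pre-built routing-resource graph) and implements the netlist.
run(["vpr", "arch.xml", "top.blif", "--read_rr_graph", "rr_graph.xml"])

# 3) Bitstream generation: convert the implementation to FASM features, then
#    assemble the binary bitstream (both tool names here are placeholders).
run(["genfasm", "arch.xml", "top.blif"])
run(["fasm2bitstream", "top.fasm", "top.bit"])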

12.3.1 Specifying Architectures

FPGA architectures can be described at various levels of detail, as shown in Figure 12.3. While the lower level descriptions are more complete they are also unstructured, making it impractical for CAD tools to target them directly. Instead CAD tools operate on a ‘Device Model’ which captures the key information about an FPGA architecture/device in a structured manner. This allows tools to effectively optimize and implement designs based on the targeted device model. In our data-driven approach, the Device Model serves as an Intermediate Representation (IR) which can model a range of hypothetical and commercial FPGA devices. FPGA architects developing new architectures often take a top-down approach (i.e. the upper half of Figure 12.4), using high-level descriptions to quickly describe and evaluate a wide range of architectures and architectural parameters [229]. These descriptions leave many aspects under-specified, leaving it up to the tools to automatically fill in the details (i.e. the ‘Elaborate’ step in Figure 12.4). This approach is highly productive for architecture exploration and research, and if an FPGA device matches the data model well, it can produce a reasonably accurate Device Model. For instance in [10] (Section 7.2) we used

2Further lowering is later performed to ensure primitives match the VPR device model.

Figure 12.4: Architecture and Implementation Flows. Top: top-down architecture evaluation, where candidate architectures are elaborated into a Device Model and evaluated on benchmarks for area and speed. Bottom: bottom-up design implementation, where a Device Model is extracted from existing device data and used to implement designs and produce bitstreams.

VPR to model commercial Intel Stratix IV FPGAs. However this method can make it challenging to precisely model the details of a specific device, which may not fit cleanly into the available architectural parameters. Implementing designs targeting commercial FPGAs and producing valid bitstreams (i.e. the bottom half of Figure 12.4) requires modelling the target device with bit-level accuracy. This requires a bottom-up approach which constructs the Device Model from the low-level device details (i.e. the ‘Extract’ step in Figure 12.4). With such a Device Model the CAD flow in Figure 12.2 can then be used to map applications to that device. Recently, these low-level device descriptions have become available for some FPGAs, including Lattice’s iCE40 and ECP5 families, and Xilinx’s Artix 7. These low-level descriptions provide detailed information about the FPGA structure (logic and how routing resources interconnect), and how their use is described in the bitstream. However, as noted previously, merely having access to the low-level details is insufficient to successfully target these devices. The CAD flow requires higher-level context and structure to correctly implement and effectively optimize applications for the device. The process of converting these low-level device descriptions into the higher-level Device Model can be challenging, as the low-level details are relatively unstructured. This process is typically performed through a combination of automated processing and human-driven modelling (‘Extract’ in Figure 12.4). While much of this work can be re-used within a particular device family (or potentially across related families from a particular vendor), each vendor makes a variety of different architectural, implementation and terminology choices. The broadness and significance of these differences have often been at the root of concerns about whether flexible architecture agnostic tools like VPR can target commercial FPGAs.

12.3.2 Enhancements to Target Commercial FPGAs

A variety of new tools and enhancements are required in order to enable a fully open-source FPGA tool flow which can be re-used and re-targeted to multiple devices.

Figure 12.5: Architecture Model Generation.

VPR Enhancements

In order to support commercial devices we have extended VPR’s device model and flexibility to capture the details required to produce valid bitstreams. The changes include:

ˆ Support for loading the detailed routing architecture from a file (rather than building it from a high-level specification) as described in Chapter 6 [2],

ˆ Generic device grids (to model the precise layouts of commercial devices) as described in Chapter 7 [2],

ˆ Non-configurable routing graph edges (to model multi-segment and non-linear wiring) as described in Chapter 7, and support for routing them as described in Chapter 9 [2],

ˆ Support for routing clocks on dedicated clock networks, and

ˆ Extensions to timing analysis to capture clock network delays and fan-out related loading effects.

Device Model Extraction

As described above, low-level device databases exist for some commercial FPGAs but are very low-level and unstructured. This makes them an unsuitable description for place and route tools which aim to optimize higher-level characteristics like resource usage, wirelength and timing. It is therefore necessary to extract additional higher-level structural information which tools like VPR can target. Figure 12.5 illustrates at a high level the major steps in Symbiflow required to convert the low-level device database for Xilinx’s Artix 7 family into an appropriate Device Model for use in VPR (for more details see [1]). The resulting Device Model captures all the details correctly and includes the higher-level structural information required for effective place and route optimizations. This process involves extracting higher-level structural information from the low-level database, including the available block types and operating modes, device floorplan/grid, and inter-block routing network (i.e. RR Graph). Some architectural details also need to be transformed to fit into the VPR data model. For instance, the Artix 7 family includes ‘U’-shaped wires which are converted into three linear wires connected by non-configurable edges (Sections 7.1.2 and 9.7).

These steps are sufficient to generate a functionally correct Device Model, which captures the low-level details of the device and can be used with VPR to implement designs and generate valid bitstreams for real Artix 7 devices. However, additional information is required to enable VPR to optimize circuit implementations well. To enable timing-driven optimization it is necessary to provide a timing model for the various architecture components (routing wires, LUTs, FFs etc.). This is done by creating a timing model (‘Build Timing Model’ in Figure 12.5) from a database of timing information extracted from Vivado. The timing model is used by Tatum [6] (Chapter 3), VPR’s Static Timing Analysis (STA) engine, for timing analysis and to drive timing-based optimizations.

Additionally, at each stage it is necessary to extract higher-level information to enable VPR to effectively optimize the use of various routing and logic resources. For example, grouping the numerous wires in the RR graph into a smaller number of ‘types’ which share similar electrical and connectivity characteristics (e.g. rare but fast long wires, common but slower and more flexible short wires), enables VPR to trade off which signals use the different wire types. Another example is the importance of aligning the block and routing network coordinate systems, which ensures both the placer and router have aligned understandings of what components of the device are physically close together.
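As a small illustration of this kind of structure recovery, the sketch below groups raw routing wires into a handful of ‘types’ based on shared length and similar delay. The Wire fields, example names and the bucketing rule are invented for illustration; the real importer derives this information from the Artix 7 database.

from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Wire:
    name: str        # raw wire identifier from the low-level device database
    length: int      # number of tiles the wire spans
    delay_ps: float  # extracted intrinsic wire delay

def group_wires_by_type(wires, delay_bucket_ps=50.0):
    # Bucket wires by (length, coarse delay) so the CAD tools can reason about
    # a few wire types (e.g. rare fast long wires vs. plentiful short wires)
    # instead of thousands of individual wires.
    types = defaultdict(list)
    for w in wires:
        key = (w.length, round(w.delay_ps / delay_bucket_ps))
        types[key].append(w)
    return dict(types)

# Example with made-up wires:
wires = [Wire("EE4_0", 4, 210.0), Wire("EE4_1", 4, 205.0), Wire("SS2_0", 2, 120.0)]
for (length, bucket), members in group_wires_by_type(wires).items():
    print("type: length", length, "delay bucket", bucket, "->", len(members), "wires")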

Generic Bitstream Assembler

To create a generic bitstream assembler, a generic FPGA assembly (FASM) format [230] is used. FASM is a textual format which lists ‘features’ to enable within the FPGA fabric. A feature might be the LUT truth-table, or the input wire selected by a routing mux. Generally a FASM feature will enable 1 or more bits, and may also require that 1 or more bits remain cleared in the output bitstream. FPGA bitstream contents can be roughly broken down into three categories: primitive block configurations (e.g. LUTs, FFs), intra-block routing muxes and inter-block routing muxes. During Device Model extraction, information about the FASM features associated with each of these categories is tagged as metadata on the relevant components. The corresponding FASM features are then set based on component usage in the final design implementation as determined by VPR.
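To give a feel for this step, the sketch below shows a few illustrative FASM-style feature lines and a toy lookup that expands each enabled feature into the (frame, bit) positions it sets. All feature names and bit addresses are invented; the real names and addresses come from the extracted Artix 7 device database, and value-carrying features (such as LUT INITs) additionally need their values expanded into individual bits.

# Illustrative FASM-style feature lines (names are hypothetical):
fasm_lines = [
    "CLB_X5Y10.SLICE0.ALUT.INIT[15:0]=16'hBEEF",  # a LUT truth-table (value-carrying)
    "CLB_X5Y10.SLICE0.AFF.ZINI",                  # a flip-flop configuration bit
    "INT_X5Y10.IMUX12.EE2_1",                     # a routing mux input selection
]

# Toy feature -> (frame, bit) database, as would be produced during extraction.
feature_bits = {
    "CLB_X5Y10.SLICE0.AFF.ZINI": [(0x0040, 17)],
    "INT_X5Y10.IMUX12.EE2_1": [(0x0041, 3), (0x0041, 4)],
}

def assemble(lines, db):
    # Return the set of (frame, bit) positions to set; unknown or value-carrying
    # features are skipped here for brevity.
    set_bits = set()
    for line in lines:
        feature = line.split("=")[0]
        set_bits.update(db.get(feature, []))
    return set_bits

print(sorted(assemble(fasm_lines, feature_bits)))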

12.3.3 Verification and Correctness

Multiple levels of verification have been performed to ensure the resulting bitstreams are functionally correct on various example and benchmark designs. This includes programming the bitstreams onto real devices and testing functionality (e.g. manually, or using Built In Self Test), re-importing the produced implementation into Vivado, and formally verifying the equivalence of the post-synthesis and post-place-and-route netlists.

12.4 Flow Quality

To quantitatively evaluate the quality and run-time of our open-source design flow we compare it in two scenarios: one where VPR understands the higher level structure of the architecture, and another which captures the full details of a commercial Xilinx device. In the first scenario we use a mixed open and closed source flow, where we use Intel’s Quartus (closed-source) for logic synthesis and technology mapping, and either Quartus or VPR (our open-source place and route tool) to perform the physical implementation.3 In this scenario VPR targets a somewhat

3This allows us to evaluate only the impact of placement and routing (since the same synthesis result is used).

Table 12.1: VPR 8 Quality and Run-time targeting Stratix IV Model on Titan Benchmarks

LABs | Routed WL | Routed CPD | Pack Time | Place Time | Route Time | Total Time
0.95 | 1.26      | 1.20       | 1.18      | 1.00       | 0.34       | 0.83

Geomean of 20 mutually implementable Titan benchmarks [10], normalized to Intel Quartus 18.0 [2] (Reproduced from Table 11.3).

Table 12.2: Symbiflow & VPR Quality and Run-time on Artix 7 (XC7A50TFGG484)

Benchmark | Netlist Primitives | CLBs | RAMB18s | Crit. Path Delay (ns) | Synth. Time (sec) | Pack & Place Time (sec) | Route Time (sec) | Assembly Time (sec) | Total Time (sec)
picosoc | 8,108 | 895 (1.14×) | 3 (1.50×) | 18.47 (1.93×) | 29.2 (0.56×) | 62.0 (4.43×) | 11.1 (0.56×) | 9.5 (1.19×) | 115.9 (0.83×)
murax | 4,558 | 392 (1.23×) | 4 (0.67×) | 12.22 (1.45×) | 18.4 (0.63×) | 23.0 (1.92×) | 3.9 (0.28×) | 9.0 (1.00×) | 58.3 (0.58×)
top_bram_n3 | 3,595 | 361 (4.25×) | 1 (1.00×) | 14.56 (3.17×) | 14.8 (0.82×) | 9.9 (1.41×) | 4.2 (0.38×) | 6.5 (0.81×) | 37.8 (0.49×)
top_bram_n2 | 2,619 | 264 (3.72×) | 1 (1.00×) | 14.07 (2.84×) | 13.6 (0.68×) | 7.2 (1.02×) | 3.0 (0.25×) | 6.2 (0.77×) | 32.4 (0.40×)
top_bram_n1 | 1,661 | 162 (2.49×) | 1 (1.00×) | 13.36 (2.57×) | 11.0 (0.65×) | 7.9 (1.12×) | 1.2 (0.10×) | 5.2 (0.65×) | 27.6 (0.36×)
dram_test_64x1d | 1,035 | 115 (1.47×) | 0 | 10.95 (2.12×) | 7.8 (0.39×) | 6.6 (0.94×) | 0.6 (0.06×) | 4.3 (0.53×) | 21.1 (0.26×)
GEOMEAN | 2,902.7 | 292.2 (2.08×) | 1.6 (1.00×) | 13.75 (2.27×) | 14.5 (0.61×) | 13.2 (1.53×) | 2.7 (0.21×) | 6.5 (0.80×) | 41.1 (0.45×)
% TOTAL |  |  |  |  | 32.7% | 38.8% | 8.0% | 14.1% | 100.0%

Crit. Path Delay analyzed by Xilinx Vivado 2017.2. Ratios normalized to Xilinx Vivado 2017.2 in parentheses.

abstracted model [2] (Section 7.2) of Intel’s Stratix IV FPGA family, since the low level details of Stratix IV are unavailable. Both tools are used to implement the Titan benchmark suite [10], which consists of modern large scale benchmarks (90K-1.9M netlist primitives) which use all the types of heterogeneous resources available in the Stratix IV family. Table 12.1 shows that compared to Quartus, VPR uses similar amounts of logic (LABs) and requires a comparable amount of run-time to implement all designs. However VPR’s implementations use approximately 26% more wiring and operate 20% slower than those produced by Quartus. These results show that when our tools comprehend the overall device (including higher-level structure) their implementation and optimization algorithms achieve good run-time and reasonable result quality compared to a highly tuned architecture specific tool like Quartus.

In the second scenario we use the fully open-source Symbiflow (with VPR) design flow shown in Figure 12.2 to target a Xilinx Artix 7 device for which the full low-level device details are available, and compare the results with Xilinx’s closed-source Vivado 2017.2 tool suite.4

The results are shown in Table 12.2. Here we see that Symbiflow uses roughly double the amount of logic (CLBs) to implement these designs, and produces circuits which have 2.3× longer critical paths. A significant portion of this gap is due to logic synthesis and technology mapping, with Vivado producing smaller and better optimized netlists.5 From a tool run-time perspective, Symbiflow’s average run-time is lower than Vivado’s. Symbiflow spends less time on synthesis, more time on placement and completes routing faster than Vivado.

It is important to note that the results in Table 12.2 are achieved while targeting the full details of Artix 7, and as a result produce valid bitstreams which can be programmed onto an Artix 7 device without any closed-source tooling. We believe that many of the run-time and quality issues can be alleviated through either improving the modelling of Artix 7’s high-level characteristics, or improving the adaptability of VPR’s algorithms.

4Note the reported critical path delays are calculated by Vivado’s timing analyzer for both design flows. 5When Vivado uses the same netlist as VPR, VPR uses 24% fewer CLBs than Vivado (indicating Vivado packs the design less densely than VPR) and the critical path delay gap reduces to 1.6×. This indicates that the remaining gap may be derived from optimization differences and the quality of the device model.

12.5 Conclusion & Future Work

FPGAs offer a compelling approach to achieving the performance, power and cost benefits of domain specific architectures while retaining the flexibility to adapt to changing computational requirements and supporting a wide range of application domains. However the complex CAD flow required to target FPGAs has hindered progress in this direction. We believe open-source tools for FPGAs are an important step towards addressing this challenge, allowing the broader academic, commercial and open-source communities to pool their efforts in order to advance this technology.

We’ve shown that with a data-driven approach, it is possible to create architecture agnostic tools, like VPR, which can target the full details of commercial FPGAs and produce valid bitstreams. These same tools enable FPGA architects to experiment with and evaluate new FPGA architectures, while also providing a ready-made functional CAD flow for targeting such devices.

However there is plenty of work still to be done. When architectures match VPR’s data model well, it is possible to achieve reasonable implementation quality and run-time (Table 12.1). However for architectures which do not match well the results are more mixed. It will be important to continue improving the quality of architecture captures given to VPR, and continue extending VPR’s device model to allow better descriptions of such architectures.

Working from low-level device descriptions is challenging as many of the important structural characteristics of an FPGA architecture are implicit. This requires complex processing and manual efforts to extract this higher-level structure from the low-level data, but is key to achieving high quality optimization and reasonable run-times. If vendors provided this higher-level device information along with their chips this effort would be eased significantly.

Finally, there remains significant room for improving the quality and run-time of VPR’s optimization algorithms. There are many identified areas for improvement [2] which we hope the community will collaborate to address.

Part IV

Reinforcement Learning Enhanced CAD

Chapter 13

Adaptive FPGA Placement Optimization via Reinforcement Learning

Developing new or improved optimization heuristics for Computer Aided Design (CAD) tools is challenging and time consuming, relying on empirical experimentation and researcher experience. In this chapter we study how this process can be improved by using Reinforcement Learning (RL) to learn effective and adaptive heuristics. Applying these techniques to Field Programmable Gate Array (FPGA) placement, we show our RL-enhanced algorithm outperforms the standard VTR 8 placer, achieving a better run-time & quality trade-off (up to 2× faster for equivalent quality). RL enables the placer to more efficiently explore the solution space and to dynamically adapt to the specific problem instance being solved. We expect further application of advanced RL methods will improve these results, which motivates further exploration of how RL can be applied in the CAD flow.

13.1 Introduction

Computer Aided Design (CAD) often requires solving complex large scale optimization problems. The large solution spaces necessitate the use of heuristics to find good quality solutions in reasonable run-times. However, the process of designing these heuristic optimization algorithms is itself challenging and time consuming. The process typically involves two stages:

ˆ first, a CAD researcher develops a new algorithm or algorithm enhancement;

ˆ second, they tune their algorithms’ parameters to produce suitable quality and run-time trade-offs.

The tuning process is key to producing good results but is often opaque, relying on the researcher’s intuition/experience (e.g. which parameters to expose and how to set them) combined with empirical experimentation. Machine Learning techniques offer the potential to automate and improve this process. In this work we investigate the use of Reinforcement Learning (RL) to improve a Simulated Annealing (SA) based Field Programmable Gate Array (FPGA) placement algorithm. Unlike previous work looking at CAD parameter auto-tuning (which typically wraps around an existing algorithm) [231, 232], we


focus on using Reinforcement Learning to develop policies (which also adapt online) for use within the optimization algorithm.

Figure 13.1: Reinforcement Learning Problem.

13.2 Reinforcement Learning

Reinforcement Learning is an approach to machine learning which trains an Agent, through experience, to achieve an objective by interacting with its environment. The agent receives feedback about its performance via a numeric reward, with the RL agent’s goal being to maximize its total reward. RL is a very general problem formulation, allowing RL techniques to be applied to a wide range of problems, including robotics [233], finding good neural network architectures [234], and website recommendations [235]. RL techniques have also been shown to be effective for hard search and decision problems such as Chess, Go [236], and real time strategy games [237].

Figure 13.1 outlines the RL problem in more detail. At timestep t, the agent selects an action at to perform, from the set of possible actions A. At the next timestep the environment is observed, producing observation ot+1. This observation is then interpreted to produce a reward rt+1 ∈ R and the agent’s perception of the environment’s state st+1 (from the set of possible states S). On the basis of the states and rewards previously observed the agent selects a new action at+1 to perform. For more background on RL see [238].

Many RL techniques rely upon estimating action values: the long-term value of taking a particular action (i.e. considering both immediate and future rewards). This estimate is often defined as a function Q(s, a): S × A → R. How Q is estimated from the agent’s experience (action value estimation), along with how Q is used to select the next action (action selection) are important considerations for many RL algorithms. An important special case of the RL problem is the class of so-called K-armed bandit problems. In such problems there is only a single state (i.e. |S| = 1), and Q becomes a function only of the actions: Q(a): A → R.

13.3 Simulated Annealing Placement for FPGAs

As described in Chapters 2 and 11, FPGA placement (which determines where to place design logic while minimizing the amount of wiring required and critical path delays) is often the most time-consuming step in the automated FPGA CAD flow. As shown in Figure 13.2 design logic can only be placed at


Figure 13.2: FPGA placement grid

locations where pre-fabricated resources exist; as a result, FPGA placement is a discrete optimization problem with many legality constraints. It is also key to generating a high quality implementation, as placement is the primary determinant of both routability and timing.1 These considerations have made Simulated Annealing (SA) a popular algorithm for FPGA placement [46, 53].

As shown in Algorithm 17, SA starts with an initial solution (Line 2). The annealer then repeatedly generates moves (Line 6) which perturb the current solution. The associated change in cost (Line 7) is used to decide whether the move should be accepted or rejected. SA accepts all moves which decrease cost, and probabilistically accepts moves which increase cost as a function of the current temperature (Line 8). Accepting cost increases allows the annealer to hill climb and escape local minima. After making M moves (Line 5) the temperature is decreased (Line 10). This focuses the annealer on cost improvements (less likely to accept uphill moves), guiding it to converge to a high quality solution. The process repeats until the exit conditions are met (Line 11).

Algorithm 17 Simulated Annealing Placement

Require: Pinit initial placement, Tinit initial temp., M moves per temp.
Returns: An optimized placement
 1: function sa_place(Pinit, Tinit, M)
 2:     P ← Pinit
 3:     T ← Tinit
 4:     repeat
 5:         for 1 ... M do
 6:             P′ ← generate_move(P)              ▷ Perturb Solution
 7:             ∆cost ← evaluate_move(P, P′)
 8:             if ProbabilisticAccept(∆cost, T) then
 9:                 P ← P′                          ▷ Accepted Move
10:         T ← update_temp(T)
11:     until exit_condition(P, T)
12:     return P                                    ▷ Optimized Placement
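The acceptance test on Line 8 is typically the standard Metropolis criterion; a minimal sketch (one common form, not necessarily VPR's exact implementation) is:

import math
import random

def probabilistic_accept(delta_cost, temperature, rng=random):
    # Always accept improvements; accept cost increases with probability
    # exp(-delta_cost / T), so hill-climbing becomes rarer as T decreases.
    if delta_cost <= 0.0:
        return True
    if temperature <= 0.0:
        return False
    return rng.random() < math.exp(-delta_cost / temperature)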

1Unlike Application Specific Integrated Circuits (ASICs), there are no later stages (e.g. buffer insertion, gate re-sizing) to fix-up timing issues.

13.3.1 Placement Moves

It is useful to clarify what a move means in the context of FPGA placement. Perhaps the simplest type of move is to swap two blocks of the same type (e.g. two DSP blocks). These are the types of moves used by the standard VTR 8 placer [2], which randomly selects a block from the netlist with uniform probability, and randomly swaps it with another block (or empty location) of the same type. Random swaps are a ‘complete’ type of move as they ensure any valid placement configuration can be reached from any other. This is desirable since it means potentially better configurations are always reachable through some sequence of moves. Despite this, reaching a significantly better configuration may be challenging, requiring a large number of swaps (potentially many of them uphill if already in a deep local minimum).

However, moves are not limited to simple random swaps. By exploiting knowledge about the structure of the problem more powerful and intelligent moves can be applied. For instance, directed moves [239, 62] use circuit structure, current placement, and timing information to intelligently move blocks towards their optimal positions.2 These moves make the annealer more effective at escaping local minima (by making larger steps through the solution space), achieving better quality in fewer moves, which reduces run-time.

An interesting and more general way of viewing these more intelligent moves is as a set of algorithms the annealer selects from to improve the current placement. There is no fundamental restriction on what types of algorithms could be used for move generation. Moves could be very significant, potentially encompassing what would traditionally be considered entirely separate placement algorithms, for instance based on partitioning, assignment or analytic placement techniques [240]. In this sense annealing acts as a framework for combining these algorithms together, and as a true meta-heuristic selecting among their results.

However designing such a system which selects among numerous other algorithms is inherently challenging. It is unclear when or how often the different move types should be used. Which type of move will be effective is likely to be situationally dependent, for instance depending on circuit structure and target technology (which varies across designs), the current optimization progress, and the run-time budget. These factors, combined with the large number of possibly diverse move types, likely make it intractable for a human algorithm designer to tune this type of system well. As a result, to control complexity move generators are typically statically configured with limited coupling to the rest of the optimizer.
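For reference, the baseline uniform-random same-type swap described above might look like the short sketch below; the Block and Placement structures are invented placeholders, not VPR's data structures.

import random
from dataclasses import dataclass

@dataclass
class Block:
    name: str
    type: str

@dataclass
class Placement:
    blocks: list   # all netlist blocks
    sites: dict    # block type -> list of legal (x, y) locations

def generate_random_swap(placement, rng=random):
    # Pick a block uniformly at random, then a random legal location of the
    # same type; the move swaps the block with whatever occupies that location.
    block = rng.choice(placement.blocks)
    target = rng.choice(placement.sites[block.type])
    return block, target

# Toy example:
p = Placement(blocks=[Block("alu0", "Logic"), Block("mult0", "DSP")],
              sites={"Logic": [(1, 1), (1, 2)], "DSP": [(5, 1)]})
print(generate_random_swap(p))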

13.4 Reinforcement Learning Enhanced Move Generator

To address this challenge we will apply RL to learn effective policies for controlling more powerful and flexible move generators, aiming to make the optimizer more adaptable. For instance, enabling the optimizer to adapt to both the static characteristics of the problem instance (e.g. circuit structure), and the evolving dynamics of the optimization process (e.g. which types of moves are most productive, which regions of the circuit would benefit from further optimization). This allows the optimizer to focus effort where it is most needed, and explore the solution space more effectively. An illustration of our RL-enhanced move generator is shown in Figure 13.3. The ‘Agent’ chooses between several different types of moves. In this case selecting a different type of block to be randomly

2For instance, the commercial Quartus placer uses > 10 different types of directed moves [62].

swapped.3 The impact of a move selected by the agent is not known to the agent until after it has been performed. After selecting a move type, the blocks are swapped and the move evaluated using the existing tool infrastructure to calculate the associated change in cost (∆cost). The annealer then determines whether the move is accepted or rejected (Algorithm 17 Line 8) as usual. The ∆cost and whether the move was accepted or rejected (i.e. reverted) are provided to the agent to calculate the reward. Based on this feedback, over time the agent learns which move types yield high reward and focuses the placer’s effort there.

Figure 13.3: RL-based move generator.
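Putting the pieces together, one inner-loop step of the RL-enhanced annealer might look like the sketch below. The agent interface (select_move_type/update), the placement object and the move_generators mapping are placeholders for illustration; the reward computed from (∆cost, accepted) is defined in Section 13.4.1.

import math

def annealing_step(agent, placement, temperature, move_generators, rng):
    # The agent only chooses *which kind* of move to try; it never sees how
    # the move generator works internally.
    move_type = agent.select_move_type()
    proposed = move_generators[move_type](placement, rng)

    # Evaluate and accept/reject exactly as the standard annealer would.
    delta_cost = placement.evaluate(proposed)
    accepted = (delta_cost <= 0.0) or (rng.random() < math.exp(-delta_cost / temperature))
    if accepted:
        placement.apply(proposed)

    # Feed the outcome back so the agent can compute its reward and adapt.
    agent.update(move_type, delta_cost, accepted)
    return accepted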

13.4.1 Reward Formulation

To guide the agent towards the desired behaviour we must formulate a reward which captures this intent. The agent will then attempt to maximize this reward through its actions. One reward formulation we considered was:

rt = −∆cost (13.1)

Since placement is a minimization problem (we seek decreases in wirelength and timing costs), but the agent (by convention) seeks to maximize reward, we use the negative of the cost change. Intuitively, this seems to capture our goal: for moves which decrease cost the agent’s actions are affirmed (with a positive reward), while for moves which increase cost the agent is penalized (negative reward). However, we found Equation (13.1) did not perform well in practice. In particular, the agent preferred move types which had low probability of producing negative rewards. A common example of this is moving IO blocks. Since IO connectivity is not the dominant factor in the placer’s cost functions, and given that many IO configurations have similar cost, this allowed the agent to avoid the (potentially)

large negative rewards associated with moving other (more significant) block types. Furthermore, as optimization progresses, it becomes harder to find moves which decrease cost (since the circuit is becoming increasingly well optimized), and correspondingly easier to find moves which increase cost (although such moves become more likely to be rejected as the anneal progresses). This leads the agent to become very risk-averse, focusing on actions with limited chance of a downside – even though it would ultimately be more productive to propose a number of moves with high probability of negative reward (which are likely to be rejected) in the hope of finding some moves which decrease cost. To better capture this intent we used an alternative reward defined as:

rt = −∆cost if the move was accepted, and rt = 0 if it was rejected (13.2)

3While we use these simple moves as a proof of concept, in general they could be more complex. Notably, the agent does not have any insight into what any of the moves actually do; it only needs to be aware that they exist and can be selected.

This reward formulation does not penalize the agent for proposing moves which are ultimately rejected by the annealer. By reducing the downside the agent becomes more exploratory and focuses on proposing moves which ultimately lead to cost improvements. It is interesting to note this formulation has another advantage: the sum of all rewards is equal to the total cost change of the circuit, which is our ultimate goal. It is also worth considering how this reward function treats hill climbing moves (which increase cost, but are accepted by the annealer). In this case the agent receives a negative reward which, if a common result, will lead the agent away from proposing that move type. It is unclear whether this is a good choice. Hill climbing moves are helpful for escaping local minima, so penalizing them may not be ideal. On the other hand, it does guide the agent away from unproductive move types, allowing it to focus on more productive types.
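Equation (13.2) is straightforward to express in code; a minimal sketch:

def reward(delta_cost, accepted):
    # Equation (13.2): credit the (negated) cost change only for accepted moves;
    # rejected proposals are neither rewarded nor penalized.
    return -delta_cost if accepted else 0.0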

13.4.2 Estimating Action Values

Our agent’s action space is shown in the first row of Table 13.1. Each action corresponds to swapping a random block of a particular type. The FPGA architecture we targeted has four block types, but in general there could be arbitrary types of moves (such as those described in Section 13.3.1). The agent only needs to know the number of potential move types; it does not need to know what they do, or how they work. The agent learns to use them effectively by trying them out and observing the resulting rewards. To determine what action to perform next the agent estimates/learns action values. The key difference between action values and rewards is the time scale. A reward corresponds to the outcome of the previous action, while action values provide numeric estimates of the long term value of performing each of the available actions. The agent estimates action values from the sequence of rewards it receives for performing particular actions. As the second row of Table 13.1 shows, our agent has only a single state, and hence treats the move type selection problem as a K-armed bandit problem [238]. Intuitively, this means the agent doesn’t gain any information about the environment (e.g. current placement, circuit structure) in which it operates,4 and hence it only acts based on the sequence of rewards it has received. This is a significant simplifying assumption, and we believe extending the state space will allow the agent to recognize different situations

4A single state provides no differentiation.

(e.g. optimization progress, previous moves, circuit structure) and respond accordingly.5 However this is left for future work (Section 13.6). One aspect of the move type selection problem which differs from many conventional RL problems is that the problem statistics are non-stationary. That is, the distribution of rewards offered by the different move types evolves over time as optimization progresses. For instance, a particular move type may offer the highest expected reward early in placement optimization, but later becomes sub-optimal (i.e. when the placement is better optimized). It is therefore important for our agent to continually adapt to the changing dynamics of the move type rewards. To estimate the value of taking each potential action, our agent uses a tabular Q function which stores the current estimated reward of taking a particular action at. After performing action at and receiving reward rt+1 the Q value is updated as:

Q(at) = Q(at) + α(rt+1 − Q(at)) (13.3) where rt+1 − Q(at) corresponds to the ‘error’ between the received reward and the current estimate, and α is the size of step taken to reduce the error. In order to track the evolving reward statistics we weight the rewards of recent moves more heavily using a weighted exponential average. We correspondingly set α as:

α = 1 − e^(log(γ)/M) (13.4) where M is the number of moves per temperature, and γ is the fraction of weight given to moves which occurred > M moves ago.6 We empirically found small γ values (0.1 to 0.001) performed best as they focus the agent on tracking the shorter term dynamics of which move types were most effective. We expect an expanded state representation (Section 13.6) will be needed to effectively capture longer term trends. Figure 13.4 illustrates how the agent’s Q estimates change during early placement with γ = 0.1.7 We can see that the Q values for each move type change rapidly as placement proceeds. It is interesting to note the ‘best’ (highest Q value) move type changes in an oscillatory fashion, alternating between the different block types (Logic, DSP and RAM). Previous work has found this approach (cyclically optimizing different block types) to be an effective heuristic [242, 56].8 It is particularly noteworthy that in previous work this heuristic was manually programmed into the placers by their developers. In contrast, we provided no such heuristic guidance to our placer. The agent learned this technique by itself, based on its experience and the feedback provided by the reward function. This points toward the potential of using RL to find new improved heuristics and reduce the development effort required to build high quality CAD tools.
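A minimal sketch of this tabular estimator, implementing Equations (13.3) and (13.4) (the move-type names follow Table 13.1; the example values are illustrative):

import math

class ActionValueEstimator:
    # Tabular Q estimates with an exponential recency-weighted average.

    def __init__(self, move_types, moves_per_temperature, gamma=0.001):
        # Equation (13.4): choose alpha so that rewards older than M moves
        # carry only a fraction gamma of the total weight.
        self.alpha = 1.0 - math.exp(math.log(gamma) / moves_per_temperature)
        self.q = {t: 0.0 for t in move_types}

    def update(self, move_type, reward):
        # Equation (13.3): step the estimate towards the observed reward.
        self.q[move_type] += self.alpha * (reward - self.q[move_type])

# Example: four move types, M = 100,000 moves per temperature.
est = ActionValueEstimator(["IO", "Logic", "DSP", "RAM"], 100_000)
est.update("Logic", 3.2e-6)
print(est.q)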

5For instance, when RL was successfully applied to playing Atari video games, the state representation used was the raw pixels from the last four video frames [241]. 6γ controls the ‘length’ of the agent’s memory. As γ approaches 1 the agent has the longest term memory (Q approaches the average of all previous moves). At γ = 0 the agent has the shortest term memory (Q equals the reward of the previous move). 7This places 90% of weight on the most recent M moves. On the large VTR benchmarks considered, this corresponds to α values ranging from 1.6 × 10^−5 to 1.3 × 10^−3. 8Intuitively, after optimizing the placement of a particular block type (e.g. DSPs), it is likely the placement of other block types (e.g. Logic, RAM) could be improved to account for the first type’s new placement.

Table 13.1: Agent State and Action Spaces

Space      | Values
Action (A) | {IO, Logic, DSP, RAM}
State (S)  | {s0}

Figure 13.4: Agent’s perceived action values (Q) during early placement on the LU8PEEng benchmark.

13.4.3 Action Selection: Exploration versus Exploitation

Once the agent has an estimate of the value of potential actions it must select which action to perform. The agent faces two competing factors:

ˆ exploring the set of potential actions (A) to improve its action value estimates (Q), and

ˆ performing high pay-off actions to maximize reward.

This is typically referred to in the RL literature as the exploration versus exploitation trade-off [238]. Failing to sufficiently explore possible actions means the agent’s value estimates will not reflect the true values – leading to poor choices. However each exploratory action also has a cost, since it is not spent making further progress towards the goal. ε-Greedy action selection is one approach to handle this trade-off. With this technique, the agent mostly selects the highest value (greedy) action – the action it believes will yield maximum reward. However for some fraction of actions ε the agent instead performs a random exploratory action. In stationary RL problems (where the rewards of actions do not evolve) it is common to decrease ε over time, to focus on exploitation, once action value estimates have converged. In our case, the reward statistics are non-stationary so we use a fixed ε. Figure 13.5 shows how varying the fixed ε changes the obtained Quality of Result (QoR) and run-time. First looking at QoR, we observe varying ε has only a small impact on Wirelength (WL) and Critical Path Delay (CPD) for most of its range (0.9 to 0.005).9
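ε-greedy selection on top of these estimates takes only a few lines; a minimal sketch (working on any dictionary of Q values, such as the estimator sketched earlier):

import random

def epsilon_greedy(q_values, epsilon, rng=random):
    # With probability epsilon explore (uniform random action); otherwise
    # exploit by taking the action with the highest estimated value.
    actions = list(q_values)
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=q_values.get)

# Example:
print(epsilon_greedy({"IO": 0.1, "Logic": 2.5, "DSP": 0.7, "RAM": 0.3}, epsilon=0.005))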

9Interestingly, CPD degrades more slowly than WL which is desirable. FPGA designers are typically more concerned with timing than wirelength – so long as their design remains routable.

Figure 13.5: Run-time trade-off for varying degrees of exploration (ε) with γ = 0.001, normalized to the default VTR 8 placer. Results are the average over the same benchmark circuits following the methodology of Section 13.5.

Despite this some exploration is key for good QoR, as decreasing ε (below 0.0005 and particularly to 0.0) significantly degrades QoR. For very small ε the agent performs insufficient exploration (resulting in misleading action value estimates) which leads the agent to myopically focus on sub-optimal move types. It was surprising such small ε values (e.g. 0.005) did not cause more significant QoR degradation, as they make the agent more exploitative/greedy and hence likely to get stuck in local minima. One reason for this is the agent’s relatively short term ‘memory’ (Section 13.4.2), which means it tends to forget about historical action performance (whether poor or successful). This ensures the agent continues to adapt, which also makes the agent inherently explorative. Additionally, the move generation process still has a random component, as the locations blocks are moved to are chosen randomly.

Now considering run-time, we also see in Figure 13.5 that decreasing ε achieves a significant 2.0× to 2.8× run-time reduction for ε from 0.005 to 0.0001. Figure 13.5 also shows the total number of moves the annealer performs (Num. Moves), which tends to decrease with ε. Making the agent more exploitative (decreasing ε) focuses it on high pay-off move types, which quickly improves the placement in fewer moves. The annealer then exits when it detects no further quality improvement seems likely [46], reducing run-time.

13.5 Results

We evaluate our RL-enhanced placer (Section 13.4) on the standard VTR benchmark circuits targeting the k6_frac_N10_frac_chain_mem32K_40nm FPGA architecture [2]. We exclude small benchmark circuits with < 10K primitives, and average results over three random number generator seeds to reduce algorithmic noise. We compare to the standard VTR 8 placer (Chapter 8) [2], with two variants of our algorithm using different agents:

ˆ Random: selects a block type to move at random (ignoring any learned action values)

ˆ RL: selects the block type to move based on action values (exploiting learned action values) using an ε-greedy approach.

Run-time and QoR are both important metrics when evaluating and comparing placement algorithms. Most placement algorithms have some tuneable parameters which allow a trade-off between run-time and quality. This is useful since different run-time/quality trade-offs may be desirable at different stages of the design. For instance, early in the design process design engineers may value faster turn-around time at the cost of some quality, but at later stages (e.g. during final timing closure) may be willing to spend additional run-time for improved quality. Therefore the entire run-time/quality trade-off curve of a placement algorithm is of interest.

Figure 13.6 shows the run-time versus quality trade-offs for WL and CPD achieved by the different algorithms. For the VTR 8 placer, this was achieved by varying the number of moves per temperature (M in Algorithm 17), which is the best run-time/quality trade-off parameter as determined by the original developers. As M is decreased the placer performs fewer moves which decreases run-time, at the cost of some QoR. While the quality degradation is initially moderate it rises quickly for run-time below 0.4, as the solution space is not being explored sufficiently to find a good solution.

Our placer using the RL-enhanced move generator (RL Agent) outperforms the VTR 8 placer, achieving a better trade-off – particularly at low run-times. This was achieved by varying ε and γ at the same values of M used by the VTR 8 placer. The RL Agent curve is shifted left (better run-time) and below (better quality) the VTR 8 curve, running up to 2× faster at equivalent quality. The improvements are more pronounced at low run-times where a small number of moves are being made. Here the RL Agent significantly outperforms VTR 8, achieving results which are otherwise unachievable by the VTR 8 placer in the same run-time budget. The RL Agent is able to focus the placer’s efforts on high reward moves, which is particularly valuable when only limited exploration is possible. At higher run-times, the improvement is less pronounced, likely because enough moves are performed for the standard placer to sample these high reward moves.

To verify our agent is responsible for these improvements Figure 13.6 also includes results for a ‘Random Agent’ which uses the same move generator, but randomly selects the type of move to perform (i.e. ignores any learned action values). The Random Agent performs uniformly worse than both our RL Agent and VTR 8. This shows these improvements are not simply due to an improved move generation mechanism, but are a result of the RL agent’s ability to learn and adapt as optimization proceeds.

To further illustrate this, we can look at how the RL agent’s move proposal distribution changes as shown in Figure 13.7. First, comparing across the different benchmark circuits we can see the agent chooses to emphasize moves of different types on different circuits. For instance on mcml it focuses primarily on moving Logic blocks (> 80% of all proposed moves), with less than 4% of moves spent on DSP blocks. In contrast, stereovision2 has a more even distribution, with 33% DSP moves, 48% Logic moves, and 19% IO moves.10 We can also look at how the move proposal distribution changes during the anneal, showing how the RL agent adapts online.11

10It is interesting to note that the RL agent broadly recovers the uniform move generator’s rough ranking of block type importance (which is determined statically by the frequency of block types in the netlist). However, unlike the uniform approach, the RL agent tends to perform more moves affecting the less common block types (which tend to be larger).
11In contrast, the move proposal distribution of the standard VTR 8 placer is fixed and statically determined by the frequency of block types in the netlist.

Figure 13.6: Pareto optimal run-time versus quality trade-offs (wirelength and critical path delay) for the VTR 8 placer, the Random Agent and the RL Agent on the VTR Benchmarks (> 10K primitives, geomean over 3 seeds). Run-time and quality metrics are normalized to the VTR 8 default settings.

13.6 Conclusion & Future Work

In conclusion, we’ve presented an RL-enhanced FPGA placer which uses an RL agent to control how the placer explores the solution space. The agent controls a more flexible SA move generator, and learns online what types of moves are productive. This enables the placer to adapt based on circuit characteristics and how optimization progresses. Compared to the VTR 8 placer, our results show the RL-enhanced placer achieves an improved run-time/quality trade-off. This allows our placer to achieve the same QoR while running up to 2× faster. Our approach is particularly effective in the low run-time regime where few moves are made, and performing ‘good’ moves is more beneficial. This illustrates how RL can be used to make more adaptable CAD tools and learn effective optimization heuristics with reduced human design effort. RL offers a very general framework for formulating these types of problems and so should be applicable at various stages of the CAD flow.

There are many potential avenues for future work. While we showed our technique was effective with a relatively small number of simple moves (based on block type), we believe using a wider variety of more powerful moves will make the techniques presented here more effective, as they will give the agent a broad range of capabilities which it can use to improve placement quality. The reward formulations (Section 13.4.1) have also not been well explored and can likely be improved. One factor not accounted for is the run-time impact of different move types, as some moves may be more computationally intensive than others. We would like the agent to account for this, enabling it to effectively trade off both quality and run-time.

Currently our agent has only a single state (Section 13.4.2), which limits its ability to learn longer time scale phenomena and situationally dependent behaviour. Additional state information (e.g. circuit and optimizer statistics) would likely help it make better decisions. The agent also uses ε-greedy action selection (Section 13.4.3). One drawback of this approach is that if several actions are perceived to have nearly equal value, the agent will usually greedily select the highest-value action. A less greedy approach to action selection (e.g. soft-max) would likely better balance the agent’s actions with its perceived action values.

Our agent also only learns online. While we believe online learning is important (since it allows the agent to adapt to the current problem characteristics), it means the agent begins learning from scratch at the start of each placement. We expect off-line training (e.g. on a suite of benchmark circuits) would help the agent learn general techniques based on a wider range of experience, which would further improve results. Online learning could then be used to ensure the agent still adapts to the specific placement problem.

Additionally, our agent uses relatively simple K-armed bandit-based RL algorithms. While these approaches are fast and run-time efficient (important since they are used in the placer’s inner loops), there is a wide variety of more powerful RL algorithms which could be applied, such as Temporal Difference Learning and Policy Gradients [238]. We expect more complex RL techniques will need to be invoked periodically (to keep their run-time overheads low) with the goal of setting the general optimization direction. This direction can then be fine-tuned online with fast RL algorithms like those used in this work.

Finally, we believe RL techniques can be used with other CAD algorithms, such as analytic placement and negotiated congestion routing (e.g. determining which connections conflict, when to rip-up and re-route connections, and how congestion costs should change). However the application of RL to such algorithms remains future work, although other recent works have begun investigating the use of RL to determine good orders for applying optimization passes in logic [243] and high level [244] synthesis.

Figure 13.7: Cumulative number of moves performed during the anneal for several benchmark circuits (mcml, stereovision2, blob_merge and LU8PEEng), broken down by block type (IO, Logic, DSP, RAM) for the RL and uniform move generators. The RL-enhanced move generator adapts online to both benchmark characteristics and optimization progress, while the traditional VPR move generator (uniform) uses a fixed method.

Chapter 14

Conclusion & Future Work

In this thesis, we have presented several major components:

1. Parallel Algorithms for Static Timing Analysis (STA) on CPUs and GPUs, and the open-source Tatum STA library (Chapter 3),

2. Extended Static Timing Analysis (ESTA), a new approach and algorithms to analyze the timing behaviour of digital circuits under input transition probability distributions (Chapter 4),

3. Improvements to the VTR design flow (Chapters 5 and 6) which increase its flexibility and interoperability with other tools,

4. Extensions to VTR’s architecture modelling capabilities (Chapter 7) which allow VTR to capture many of the important details of both commercial and novel FPGAs, and an enhanced capture of Stratix IV for use with the Titan benchmarks,

5. Enhanced approaches to FPGA clustering which produce more ‘natural’ clusters, and improvements to placement optimization which improve wirelength, timing and routability (Chapter 8),

6. A scalable and efficient approach to FPGA routing (AIR) which is significantly faster than previous approaches, and simultaneously improves wirelength and timing (Chapter 9),

7. The overall VTR 8 design flow (Chapters 5 to 10) which substantially improves optimization quality and run-time compared to VTR 7 along with detailed comparisons to commercial tools (Chapter 11),

8. An open-source FPGA design flow (built in collaboration with others) using parts of the VTR design flow, and leveraging its flexibility and adaptability to target both commercial and novel FPGAs (Chapter 12),

9. A new approach to improve the run-time and quality of CAD tools, by leveraging Reinforcement Learning (RL) to learn new optimization heuristics and fine-tune optimizer behaviour to the specific problem being solved (Chapter 13).

In this concluding chapter we discuss the key conclusions from each component, and future research directions.


14.1 Parallel STA and Tatum

STA is a time consuming part of digital design flows, and is a key requirement to enable timing-driven optimizations. We presented a number of approaches for performing parallel STA, which exploit the parallelism available on modern computer systems (Chapter 3). While we found that GPUs performed well on the core STA traversals (6.2×), the overhead of data transfers limited the overall speed-up (0.9×). We also investigated a number of CPU-parallel algorithms for performing STA, finding that restructuring the traversals to reduce hazards and minimizing explicit locking were key to scalability. We also presented further optimizations to perform multi-clock and multi-corner STA efficiently. These techniques were combined into the open-source Tatum STA library, which was shown to be more scalable (7× larger parallel speed-up) than other published approaches. Integrating Tatum into VPR, we found Tatum achieved speed-ups of 15 to 32× compared to the VTR 7 timing analyzer, and showed how this enables more up-to-date timing information to be used to improve timing optimization quality. Tatum has been adopted as the STA engine within both VTR 8 [2] and the CGRA-ME framework [24], illustrating its flexibility and performance.

Future Work

There are a number of potential directions for future work. Firstly, it would be interesting and useful to extend Tatum to support incremental updates, which would further reduce the cost of using up-to-date timing information during optimization.1 Incremental updates are also likely to make parallelization more challenging, since the work to be performed is not known ahead-of-time but is ‘discovered’ on the fly, likely requiring more synchronization which may harm parallel scalability. There are also a number of additional STA features which Tatum could be extended to support, including derating, path-based analysis, and statistical delays.

14.2 Extended Static Timing Analysis

Extended Static Timing Analysis (ESTA) presented a new approach to analyzing the timing behaviour of digital circuits, which unlike traditional STA or SSTA, considers the probability of inputs and internal signals transitioning (Chapter 4). This provides designers with new insight into the statistical behaviour of their circuits with respect to the data fed into them. This kind of analysis is useful for approximate computing design styles like overclocking, where designers must understand this behaviour to properly bound the accuracy of their circuits. ESTA is the first automated technique suitable for analyzing the behaviour of these kinds of designs (such analysis was previously performed by hand). ESTA produces an analysis with strong guarantees (always producing a bounding delay distribution over all input combinations), while remaining far more scalable than exhaustive timing simulation. Furthermore, ESTA runs 16 to 569× faster than Monte Carlo timing simulation (which provides no strong guarantees) and reduces pessimism by 37 to 75% compared to traditional STA.

1Initial results to this end show significant run-time savings, 31× faster placement on 15 of the Titan benchmarks when performing frequent STA (i.e. after every move).

Future Work

Scalability is a fundamental challenge for ESTA, as it is for all techniques which attempt to analyze input-dependent circuit behaviour. Potential directions for future work to improve scalability include: investigating alternative switching models, evaluating different solution techniques for #SAT (e.g. CNF-based solvers, approximation techniques), and developing improved heuristics for merging path-delay distributions.

In particular, further investigation of the percentile-binning approach to merging path-delay distributions seems like an important next step. This approach has the potential to significantly improve scalability by focusing the analysis on the longest timing paths, which are usually of interest to designers.

There are also open questions about how best to model statistical correlations between inputs and generate the conditioning functions used by ESTA. While simple correlations can be captured, it is unclear how to handle more complex inter-dependent correlations across multiple circuit inputs. Further investigation into techniques like Stochastic Synthesis (Appendix A) may prove fruitful.

It would be interesting to investigate how ESTA could be combined with SSTA, to produce timing analysis results which account for both input-dependent variation and component delay variation. There is also future work to be done studying how ESTA can be integrated into design methodologies and CAD tools to drive input-dependent variation aware optimizations.

14.3 VTR Design Flow & Architecture Modeling

Extending the flexibility of the VTR design flow (Chapter 6) makes it easier to combine the various design tools within VTR (along with other tools outside VTR) into new and novel CAD flows. These provide researchers and end-users with much greater flexibility to create CAD flows which are suited to their interests. Combined with the architecture modelling enhancements (Chapter 7), such as general device grids, improved detailed routing architecture descriptions, externally defined RR graphs and non-configurable switches, this allows VTR to capture many of the details of commercial FPGAs which were previously not considered, or had limited end-user configurability.

This flexibility allows architects to more thoroughly explore a wider range of potential FPGA architectures using VTR. It also enables improved detailed modeling of commercial architectures such as Intel’s Stratix IV, capturing key characteristics like hierarchical routing network connectivity which are important for achieving high performance.

Additionally, we have made a number of engineering and usability enhancements to VTR which make it more robust and easier to use (Chapter 10). This is an important contribution to the overall research community, as it reduces the friction, effort and duplicate work involved in studying new FPGA architectures and new CAD algorithms within VTR.

Future Work

While we have improved the flexibility and modeling capabilities of VTR, there is still much that could be done. For instance, many of the interfaces between or within the tools in VTR could be further standardized, and there is no generally agreed upon format for transferring design or architecture representations between VTR and other tools. The development of such representations would greatly improve tool interoperability.

From an architecture modeling perspective there is still potential to generalize these capabilities to a wider range of architectures. For instance, some commercial architectures fit more ‘cleanly’ into the VTR model than others, since they are closer in conceptual structure to what VTR expects. While it is often still possible to model these other architectures, it cannot always be done in an easy and efficient way. We expect that further generalizing VTR’s modeling capabilities and removing implicit assumptions would further simplify and improve this process.

14.4 FPGA Packing & Placement

Having separate packing/clustering and placement stages is very helpful for allowing VTR to target a wide range of FPGA architectures, as it allows many of the detailed device legality constraints to be considered only during packing. We presented a number of techniques and heuristics which helped to produce more ‘natural’ clusters which were less interconnected (Chapter 8). This in turn helped to improve routability (from 13/23 routable to 23/23 routable Titan benchmarks), reduced wirelength by 23%, and improved critical path delay by 5%. These more natural clusters also reduced overall run-time by 30% as later optimization stages were able to converge more quickly.

We also made improvements to VTR’s SA-based placer (Chapter 8), which now performs better optimization of complex structures like carry chains, and uses a compressed representation of the placement grid to allow more efficient exploration of the solution space. These changes also improved routability (completing an additional Titan benchmark), while reducing placed wirelength and critical path delay by 8% and 2% respectively with no increase in placement time.

Future Work

There are a number of approaches that could be taken to further improve the quality and run-time of packing and placement. For instance, the hard barrier between packing and placement as currently exists in VTR is likely limiting the degree to which the placement can be further optimized. There are a number of approaches which can be considered to alleviate this.

First, the quality of the initial packing can likely be significantly improved by seeding the packer with additional physical information. This would provide the existing packing algorithm with improved guidance about which netlist primitives are closely related, and should be packed together into the same cluster. An approach like a flat analytic placement which considers only coarse legality constraints seems like a good candidate for generating this information, as such an approach could run quickly and take into consideration both timing and wirelength.

Second, the placer could be enhanced to support taking apart clusters during placement. This would allow the placer to perform better detailed optimization by moving smaller groups of (or individual) netlist primitives, allowing it to fix up ‘mistakes’ made by the initial clustering due to incomplete information.

Third, the scalability of the SA-based placer could be enhanced by performing directed moves [62, 239]. Such moves would exploit information about the current placement to propose perturbations which aim to optimize specific design characteristics like wirelength and timing.

Fourth, a hybrid placer could be developed performing an initial analytic placement which is then legalized and refined by the SA-based placer. This approach has the potential to further improve the placer’s scalability by avoiding the less efficient high temperature annealing regime.

14.5 FPGA Routing & AIR

AIR (Chapter 9) is an FPGA routing algorithm which is both run-time efficient and produces high quality results. Many of AIR’s run-time benefits are derived from avoiding unnecessary or redundant work. This is achieved by incrementally re-routing only congested connections, and efficiently determining where to expand from while routing high-fanout nets. AIR also adapts to the targeted FPGA architecture by creating an adaptive lookahead and adjusting routing resource base costs to the specific characteristics of the targeted FPGA architecture. Together these enhancements enable AIR to find legal routings significantly faster (6.2×) than the widely used VPR 7 router, while significantly reducing routed wirelength (15% less) and critical path delay (19% faster). We also presented extensions for routing with non-configurable edges (which allow modeling of non-linear wiring patterns), and extensions to significantly reduce the time required for finding the minimum routable channel width (2.0 to 2.7× faster), as is done during architecture evaluation. We also described a new algorithm which exploits AIR’s incremental capabilities to improve the quality of an existing legal routing. As a result of its performance and scalability AIR is used as the default router in VTR 8 [2].

Future Work

It would be useful, particularly when targeting commercial devices, to support hold-time aware routing in AIR. It would be interesting to investigate how hold-aware routing techniques like RCV [69] could be used in AIR.

While AIR uses an enhanced map lookahead, it could be further improved. In particular the map lookahead fails to account for the characteristics of different block input/output pins. Since the connectivity between input/output pins and the inter-block routing network is usually identical for each instance of a block type in the device grid, this connectivity could be profiled efficiently in a side lookahead mapping from output pins → routing wires, and another from routing wires → input pins. These two sub-lookaheads could then be combined with the map lookahead’s existing inter-block routing wire lookahead to capture wire↔wire, pin→wire and wire→pin connectivity effects in the routing network.

As currently implemented, AIR is a serial algorithm which does not exploit the parallel processing power available on modern systems. It would therefore be useful to investigate parallelizing the AIR algorithm. In particular, AIR’s connection-based routing approach is promising in this respect, as a connection-based parallelization may expose significantly more parallelism than more traditional approaches which have relied upon coarser spatial or net-based parallelism.

Another potential approach to investigate would be to study the efficacy of differential/landmark-based lookaheads [245, 246], which to our knowledge have not yet been considered for FPGA routing. These techniques also profile the graph to be searched in order to generate a lookahead, but do not assume translational invariance (which may be helpful in some FPGA architectures, for instance where large hard blocks like hard-processor sub-systems intrude into the FPGA fabric). However, it is unclear how tenable the likely higher memory use of these methods would be. Another potential advantage of differential based lookaheads is that they could be updated during routing to account for congestion costs, which are ignored by current lookaheads.

While AIR’s incremental approaches substantially reduce the work required by only re-routing congested connections, the underlying path search still repeatedly searches the routing resource graph

(since the lowest cost path may have changed due to changes in circuit timing or congestion). Since these costs often change slowly or only affect small portions of the graph, it would be interesting to investigate how incremental shortest path algorithms (like incremental A* [247] or D*-lite [248]) could be applied to this problem. These algorithms can incrementally re-calculate shortest paths based on changes to graph node and edge costs. However the additional state that must be maintained may mean these approaches significantly increase memory usage.

14.6 VTR Evaluation

To quantify the impact of the above enhancements to packing, placement and routing, and other engineering improvements (Chapters 8 to 10), we also evaluated the overall VTR design flow, and compared it to the previous VTR 7 release and Intel’s commercial Quartus CAD system. These enhancements significantly improve quality (15% minimum routable channel width, 41% wirelength, and 12% critical path delay reductions), run-time (5.2× faster) and memory use (3.3× lower) compared to VTR 7 on the VTR benchmarks. On the larger Titan benchmarks wirelength and critical path delay were reduced by 44% and 10% respectively, while run-time and memory usage were reduced by 5.3× and 5.0× respectively compared to VPR 7. These improvements also allow VPR to complete a similar number of circuits as Quartus, and narrow the quality gaps from 2.0× to 1.26× for wirelength and 1.53× to 1.26× for critical path delay at comparable run-time and memory usage.

Future Work

It would be interesting to further investigate how other CAD flows compare to VTR, such as Vivado. Additionally, it would be interesting to evaluate how well VTR adapts to a wider range of FPGA architectures.

14.7 Open-Source FPGA CAD

We have also described how the architecture modelling enhancements previously outlined, along with other enhancements, can be used to target commercial FPGAs using VPR (Chapter 12). This allows for a fully open-source end-to-end design flow which can produce valid and functionally correct bitstreams for commercial devices. This design flow can be re-targeted to both novel and commercial FPGAs. Compared to Xilinx’s commercial Vivado tool, this design flow completes approximately 2× faster, but produces circuits which are 2.3× slower. Approximately half of this gap is due to open source logic synthesis (which produces larger, less optimized netlists than Vivado) with the remainder occurring during physical design (in VPR).

Future Work

An important consideration raised by this work is how an FPGA architecture is modeled, which was found to have a significant impact on both tool run-time and implementation quality. It is therefore important future work to continue refining techniques for generating efficient FPGA architectural models which appropriately capture the architectural details needed by the tools to perform good quality optimization in reasonable run-time.

This work also revealed some implicit assumptions made by VPR which do not necessarily hold on all FPGA architectures. While many of these have been resolved, it illustrates there is still room for improvement in VPR’s architectural modelling capabilities, in particular to allow cleanly capturing many of the commercial device details. Further work is needed to identify and resolve the remaining causes of the quality gap. We suspect it can be reduced through a combination of improved architectural modeling and more adaptive optimization algorithms.

14.8 Reinforcement Learning Enhanced CAD

Developing high quality and performant heuristics for CAD algorithms is an iterative design process, where CAD developers propose new/enhanced heuristics and then empirically fine-tune their parameters. This process is challenging and time-consuming as it requires a ‘human-in-the-loop’. In an effort to address this, we have shown how Reinforcement Learning (RL) can be used to automate parts of the fine-tuning process. As an illustrative example, we studied SA-based FPGA placement, where we used an RL agent to choose among different types of placement perturbations. This allows the overall placement algorithm to dynamically adapt on-line to the circuit being placed, the target FPGA architecture, and optimization progress. This approach improves the placer’s run-time quality trade-off, yielding the same quality in 20% less run-time in the high run-time regime, and 50% less run-time in low run-time regimes.

Future Work

In the context of FPGA SA-based placement with RL there are a number of important next steps. To give the agent greater control of the optimization process, it makes sense to increase the types of moves/perturbations it can perform. There is also significant scope for determining better reward functions and action selection strategies for the agent.

It will also be important to expand the agent’s state representation (i.e. moving beyond a K-armed bandit), combined with off-line training. We expect this will be key to allowing the agent to learn more complex heuristics and useful situation specific behaviours. This will require the use of more advanced RL training algorithms, and determining what information/features to include in the state representation. As shown with high performance RL systems in other fields, it seems likely that using neural networks (i.e. deep reinforcement learning) will be helpful to encode the likely complex state representation. Since such agents are likely to be more computationally intensive, they will likely need to be called less frequently (e.g. once per temperature) to set the general optimization approach. The optimization approach could then be fine-tuned with simpler but much faster RL algorithms (like those we presented), allowing the system to continue adapting on-line after every move.

More broadly, it would be interesting to investigate how RL techniques can be applied to other areas of the CAD flow. A particularly compelling direction seems to be replacing simple heuristic or random choices (e.g. what annealing moves to make, the order of routing nets and connections) with an RL agent which can adapt and learn more complex heuristics.

Appendix A

Synthesizing Correlated ESTA Condition Functions

In ESTA (Chapter 4) the basic formulation models primary-inputs as:

• each having uniform probability (i.e. Rise/Fall/High/Low transitions all have probability 0.25), and

• each being independent (i.e. no correlations between input transitions)

If the circuit being analyzed is embedded in a larger system these assumptions may not hold. It is therefore desirable to allow users to specify both the probabilities of different transitions, and the correlations between them. This can be done by creating appropriate condition functions (Section 4.5) which capture these correlations. This appendix considers two approaches for synthesizing such condition functions from a user-provided specification. The first (Section A.1) is based on constructing an Integer Linear Programming (ILP) feasibility problem. The second (Section A.2) is a constructive approach based on stochastic logic synthesis. Ultimately only the ILP-based approach was successful; the stochastic synthesis approach was unable to fully capture the required correlations (its limitations are discussed in Section A.2.5).

A.1 An ILP approach

The first approach is based on formulating an Integer Linear Program whose solution yields the desired condition functions.

A.1.1 Specifying Probabilities and Correlations

The first step is to formulate a way of describing the probabilities and transition correlations between primary inputs. Let us assume a circuit with two primary inputs a and b. We can then build a matrix which describes the probabilities and correlations between these two inputs (i.e. the joint probability distribution)1:

1Note that this captures only pair-wise correlations and does not capture any higher-order correlations between more than two primary inputs


\[
\begin{array}{c|cccc}
    & a_R & a_F & a_H & a_L \\
\hline
b_R & p_{a_R b_R} & p_{a_F b_R} & p_{a_H b_R} & p_{a_L b_R} \\
b_F & p_{a_R b_F} & p_{a_F b_F} & p_{a_H b_F} & p_{a_L b_F} \\
b_H & p_{a_R b_H} & p_{a_F b_H} & p_{a_H b_H} & p_{a_L b_H} \\
b_L & p_{a_R b_L} & p_{a_F b_L} & p_{a_H b_L} & p_{a_L b_L} \\
\end{array}
\]

where, for instance, $p_{a_R b_R}$ represents the probability of a rising (R) transition occurring simultaneously on both input a and input b. As a joint probability distribution, the sum of each row/column corresponds to the marginal probability of a particular transition occurring on the associated input. Similarly the sum of all entries must be 1. In general there will be a joint probability distribution for each pair of circuit primary inputs. However these distributions must be consistent; in particular the marginal distribution for a particular input must be the same across the joint probability distributions of all pairs of inputs. For example, if we had a circuit with inputs a, b and c, then for the $a_R$ transition to be consistent:

\[
p_{a_R} = p_{a_R b_R} + p_{a_R b_F} + p_{a_R b_H} + p_{a_R b_L} = p_{a_R c_R} + p_{a_R c_F} + p_{a_R c_H} + p_{a_R c_L} \tag{A.1}
\]
must hold. Similar constraints must hold between all inputs for all transitions.
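As a small illustration of this consistency requirement, the following Python sketch (with hypothetical data structures, not part of ESTA itself) checks that the marginal distribution of each input agrees across all of the pairwise joint distributions it appears in, per Equation (A.1).

```python
TRANSITIONS = ["R", "F", "H", "L"]

def marginal(joint, axis):
    """Marginal of one input from a pairwise joint: joint[(t, u)] = P(first=t, second=u)."""
    if axis == 0:
        return {t: sum(joint[(t, u)] for u in TRANSITIONS) for t in TRANSITIONS}
    return {u: sum(joint[(t, u)] for t in TRANSITIONS) for u in TRANSITIONS}

def check_consistency(pairwise_joints, tol=1e-9):
    """Verify Equation (A.1): each input's marginal agrees across all pairwise joints.

    pairwise_joints maps an ordered input pair (i, j) to its joint distribution.
    """
    seen = {}
    for (i, j), joint in pairwise_joints.items():
        for inp, axis in ((i, 0), (j, 1)):
            m = marginal(joint, axis)
            if inp in seen and any(abs(seen[inp][t] - m[t]) > tol for t in TRANSITIONS):
                return False
            seen.setdefault(inp, m)
    return True
```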

A.1.2 Relating Condition Functions to the Specification

Given a set of joint probability distributions describing the desired input statistics, we now need to relate these statistics to the boolean condition functions which are used to model the inputs in the ESTA formulation (Section 4.5). This amounts to constructing a set of boolean functions $f_{it}$ where i is the primary input (e.g. a), and $t \in \{R,F,H,L\}$ is the transition. If the condition functions all depend on the same N boolean variables2 then we require the condition functions to satisfy the following:

\[
\frac{\#SAT(f_{it} \wedge f_{ju})}{2^N} = p_{itju} \qquad \forall i,j \in \mathrm{combinations}(inputs),\ \forall t,u \in \{R,F,H,L\} \tag{A.2}
\]
which ensures that they match the pair-wise probability specification.

A.1.3 Condition Function Synthesis Formulation

The problem of synthesizing condition functions to meet a probability specification is analogous to assigning a fixed number of minterms (satisfying assignments) to the set of condition functions such that:

\[
\#SAT(f_{it} \wedge f_{ju}) = 2^N \cdot p_{itju} \qquad \forall i,j \in \mathrm{combinations}(inputs),\ \forall t,u \in \{R,F,H,L\} \tag{A.3}
\]

2This implies that all condition functions have the same support; it is possible other formulations could require different/partitioned supports per primary-input.

This can be formulated as a constraint satisfaction problem, where the goal is to find any valid assignment of minterms to condition functions such that Equation (A.3) holds. If we assume there are N boolean variables, then there are $2^N$ minterms to be assigned, and the problem can be formulated as:

\[
\begin{aligned}
\text{find feasible } x \ \text{ s.t. } \quad
& \sum_{m \in [1, 2^N]} x_{mit}\, x_{mju} = \#SAT(f_{it} \wedge f_{ju}) && \forall i,j \in \mathrm{combinations}(inputs),\ \forall t,u \in \{R,F,H,L\} \\
& \sum_{t \in \{R,F,H,L\}} x_{mit} = 1 && \forall i \in inputs
\end{aligned}
\tag{A.4}
\]

where $x_{mit}$ is a boolean decision variable3 which is true if minterm $m \in [1, 2^N]$ is assigned to condition function $f_{it}$. The result of (A.4) is an assignment of minterms to each condition function. The constraint $\sum_m x_{mit} x_{mju} = \#SAT(f_{it} \wedge f_{ju})$ ensures that the pair-wise conditions have the correct number of common minterms (i.e. probability). The constraint $\sum_t x_{mit} = 1$ ensures that each minterm is assigned precisely once for each primary input, guaranteeing that the resulting condition functions are well formed

(i.e. cover all minterms, or analogously $f_{iR} \vee f_{iF} \vee f_{iH} \vee f_{iL} = 1\ \forall i \in inputs$). If there are I primary inputs, N variables and T transitions then this formulation requires $O(2^N \cdot T^2 \cdot I)$ decision variables and $O(\binom{I}{2} \cdot T^2 + I)$ constraints.

A.1.4 Implementing Condition Function Synthesis

Equation (A.4) can be solved directly as a Constraint Satisfaction Problem, however this has so far proved too slow to be practical4. Since the decision variables of (A.4) are boolean it can also be formulated as an Integer Linear Programming (ILP) problem. However the formulation in (A.4) includes the non-linear term $x_{mit} x_{mju}$ (logical AND) in the constraints, making a direct implementation impossible. Fortunately we can transform this non-linear term into a set of linear constraints on an additional set of decision variables:

\[
\begin{aligned}
\text{find feasible } x, y \ \text{ s.t. } \quad
& \sum_{m \in [1, 2^N]} y_{mitju} = \#SAT(f_{it} \wedge f_{ju}) && \forall i,j \in \mathrm{combinations}(inputs),\ \forall t,u \in \{R,F,H,L\} \\
& \sum_{t \in \{R,F,H,L\}} x_{mit} = 1 && \forall i \in inputs \\
& x_{mit} + x_{mju} - y_{mitju} \leq 1 && \forall i,j \in \mathrm{combinations}(inputs),\ \forall t,u \in \{R,F,H,L\} \\
& y_{mitju} - x_{mit} \leq 0 && \forall i,j \in \mathrm{combinations}(inputs),\ \forall t,u \in \{R,F,H,L\} \\
& y_{mitju} - x_{mju} \leq 0 && \forall i,j \in \mathrm{combinations}(inputs),\ \forall t,u \in \{R,F,H,L\}
\end{aligned}
\tag{A.5}
\]

where $y_{mitju}$ is a boolean variable which is true if minterm m is assigned to both conditions $f_{it}$ and $f_{ju}$. The constraints $x_{mit} + x_{mju} - y_{mitju} \leq 1$, $y_{mitju} - x_{mit} \leq 0$ and $y_{mitju} - x_{mju} \leq 0$ ensure that $y_{mitju}$ is always equal to $x_{mit} x_{mju}$.

3Note that the x’s here are decision variables for the optimization problem, and not the boolean variables which form the support of the condition functions.
4Based on initial experiments with Google’s OR-Tools [249]; however further investigation using different solvers may be warranted.

This formulation requires $O(2^N \cdot T^2 \cdot (I + \binom{I}{2}))$ decision variables and $O(\binom{I}{2} \cdot T^2 + I)$ constraints.
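The following is a minimal sketch of this linearized formulation using the CP-SAT solver from Google OR-Tools. It is illustrative only: the function and argument names are hypothetical, and it uses CP-SAT rather than the CBC ILP solver used in our experiments, but it encodes the same constraints as (A.5).

```python
from itertools import combinations
from ortools.sat.python import cp_model

TRANSITIONS = ["R", "F", "H", "L"]

def synthesize_conditions(inputs, required_counts, num_vars):
    """Assign minterms to condition functions per the linearized formulation (A.5).

    required_counts[((i, t), (j, u))] is the required number of shared minterms,
    i.e. #SAT(f_it AND f_ju), keyed in the order the inputs appear in `inputs`.
    Returns {(input, transition): [minterms]} or None if infeasible.
    """
    model = cp_model.CpModel()
    minterms = range(2 ** num_vars)

    # x[m, i, t] == 1 iff minterm m is assigned to condition function f_it.
    x = {(m, i, t): model.NewBoolVar(f"x_{m}_{i}_{t}")
         for m in minterms for i in inputs for t in TRANSITIONS}

    # Each minterm is assigned to exactly one transition per input (well-formedness).
    for m in minterms:
        for i in inputs:
            model.Add(sum(x[m, i, t] for t in TRANSITIONS) == 1)

    # Pairwise shared-minterm counts, with y linearizing the product x_mit * x_mju.
    for (i, j) in combinations(inputs, 2):
        for t in TRANSITIONS:
            for u in TRANSITIONS:
                ys = []
                for m in minterms:
                    y = model.NewBoolVar(f"y_{m}_{i}_{t}_{j}_{u}")
                    model.Add(x[m, i, t] + x[m, j, u] - y <= 1)
                    model.Add(y <= x[m, i, t])
                    model.Add(y <= x[m, j, u])
                    ys.append(y)
                model.Add(sum(ys) == required_counts[((i, t), (j, u))])

    solver = cp_model.CpSolver()
    if solver.Solve(model) not in (cp_model.FEASIBLE, cp_model.OPTIMAL):
        return None
    return {(i, t): [m for m in minterms if solver.Value(x[m, i, t])]
            for i in inputs for t in TRANSITIONS}
```

For the three-input example in the next section this would be called with num_vars = 3 and entries such as required_counts[(("b", "R"), ("c", "R"))] = 2.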

A.1.5 Example

Let us consider a circuit with 3 inputs (a, b and c), and the following joint probability distributions:

\[
\begin{array}{c|cccc}
    & a_R & a_F & a_H & a_L \\
\hline
b_R & 1/8 & 1/8 & 0 & 0 \\
b_F & 1/8 & 1/8 & 0 & 0 \\
b_H & 1/8 & 1/8 & 0 & 0 \\
b_L & 1/8 & 1/8 & 0 & 0 \\
\end{array}
\qquad
\begin{array}{c|cccc}
    & a_R & a_F & a_H & a_L \\
\hline
c_R & 1/4 & 1/4 & 0 & 0 \\
c_F & 1/4 & 1/4 & 0 & 0 \\
c_H & 0 & 0 & 0 & 0 \\
c_L & 0 & 0 & 0 & 0 \\
\end{array}
\qquad
\begin{array}{c|cccc}
    & c_R & c_F & c_H & c_L \\
\hline
b_R & 1/4 & 0 & 0 & 0 \\
b_F & 0 & 1/4 & 0 & 0 \\
b_H & 1/4 & 0 & 0 & 0 \\
b_L & 0 & 1/4 & 0 & 0 \\
\end{array}
\]

Since the smallest probability is $1/8 = 2^{-3}$, we require at least 3 variables to be able to precisely express the probabilities. We then multiply each joint probability distribution by $2^3$ to yield the required number of minterms shared between all pairs of condition functions:

\[
\begin{array}{c|cccc}
    & a_R & a_F & a_H & a_L \\
\hline
b_R & 1 & 1 & 0 & 0 \\
b_F & 1 & 1 & 0 & 0 \\
b_H & 1 & 1 & 0 & 0 \\
b_L & 1 & 1 & 0 & 0 \\
\end{array}
\qquad
\begin{array}{c|cccc}
    & a_R & a_F & a_H & a_L \\
\hline
c_R & 2 & 2 & 0 & 0 \\
c_F & 2 & 2 & 0 & 0 \\
c_H & 0 & 0 & 0 & 0 \\
c_L & 0 & 0 & 0 & 0 \\
\end{array}
\qquad
\begin{array}{c|cccc}
    & c_R & c_F & c_H & c_L \\
\hline
b_R & 2 & 0 & 0 & 0 \\
b_F & 0 & 2 & 0 & 0 \\
b_H & 2 & 0 & 0 & 0 \\
b_L & 0 & 2 & 0 & 0 \\
\end{array}
\]

This information can then be used to formulate the ILP in (A.5) (e.g. $\#SAT(f_{b_R} \wedge f_{c_R}) = 2$). Passing it to an ILP solver then yields the following minterm assignments:

• $f_{a_R} = [0, 2, 4, 7]$

• $f_{a_F} = [1, 3, 5, 6]$

• $f_{a_H} = []$

• $f_{a_L} = []$

• $f_{b_R} = [2, 5]$

• $f_{b_F} = [4, 6]$

• $f_{b_H} = [0, 1]$

• $f_{b_L} = [3, 7]$

• $f_{c_R} = [0, 1, 2, 5]$

• $f_{c_F} = [3, 4, 6, 7]$

• $f_{c_H} = []$

• $f_{c_L} = []$

It is important to note the same minterms must be assigned to a condition function across each pair of inputs. As an example consider $f_{b_R}$, which was assigned the minterms 2 and 5. According to the required minterm counts, one of these minterms must be shared with $f_{a_R}$ (minterm 2) and the other with $f_{a_F}$ (minterm 5). Furthermore, these two minterms must also be shared with $f_{c_R}$. Similar constraints apply for all other condition functions.

The highly constrained nature of this problem makes it difficult to break it down into smaller sub-problems, necessitating the use of ‘global’ search techniques like ILP. In our experiments using the CBC ILP solver (included with Google OR-Tools [249]), we found that this approach was capable of synthesizing condition functions for up to 3 correlated inputs in reasonable run-time. However the large number of ILP variables and constraints limits its scalability. Despite this, it would be interesting to evaluate scalability with industrial ILP solvers like Gurobi, which may be more efficient.
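The shared-minterm structure described above can be checked directly. The following sketch simply recomputes the pairwise intersection sizes of the (non-empty) condition functions reported above; each count should match the corresponding entry of the scaled joint probability distributions.

```python
from itertools import combinations

# Minterm assignments reported above (empty condition functions omitted).
conditions = {
    ("a", "R"): {0, 2, 4, 7}, ("a", "F"): {1, 3, 5, 6},
    ("b", "R"): {2, 5}, ("b", "F"): {4, 6}, ("b", "H"): {0, 1}, ("b", "L"): {3, 7},
    ("c", "R"): {0, 1, 2, 5}, ("c", "F"): {3, 4, 6, 7},
}

# Pairwise shared-minterm counts between condition functions of different inputs.
for (i, ti), (j, tj) in combinations(conditions, 2):
    if i != j:
        shared = conditions[(i, ti)] & conditions[(j, tj)]
        print(f"#SAT(f_{i}{ti} AND f_{j}{tj}) = {len(shared)}")
```

For instance, this prints a count of 2 for the pair ($f_{b_R}$, $f_{c_R}$), matching the required value above.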

A.2 A Stochastic Synthesis Approach

While the ILP-based approach described above is functional, it involves solving a large and complex search problem which limits its scalability. An alternative approach is to construct the condition functions directly, in such a way that they satisfy the probability and correlation constraints. It was hoped that this correct-by-construction style of approach would prove more scalable.

A.2.1 Covariance Matrices

For each transition/input combination we can define a Bernoulli random variable which is 1 if the specific transition/input combination occurs, and 0 otherwise. We can then define x as a vector of all these Bernoulli random variables. For example, if we have the inputs a and b, each subject to 4 types of transitions (R, F, H, L), then:

\[
x = [a_R, a_F, a_H, a_L, b_R, b_F, b_H, b_L] \tag{A.6}
\]

and $E[x] = [p_{a_R}, p_{a_F}, p_{a_H}, p_{a_L}, p_{b_R}, p_{b_F}, p_{b_H}, p_{b_L}]$ corresponds to the probabilities of each transition type occurring on each input. The joint probability distributions of Section A.1.1 can then be viewed as the off-diagonal blocks of the matrix $E[x \cdot x^T]$. Continuing the above example, $E[x \cdot x^T]$ would be:

\[
\begin{array}{c|cccccccc}
    & a_R & a_F & a_H & a_L & b_R & b_F & b_H & b_L \\
\hline
a_R & p_{a_R} & 0 & 0 & 0 & p_{a_R b_R} & p_{a_R b_F} & p_{a_R b_H} & p_{a_R b_L} \\
a_F & 0 & p_{a_F} & 0 & 0 & p_{a_F b_R} & p_{a_F b_F} & p_{a_F b_H} & p_{a_F b_L} \\
a_H & 0 & 0 & p_{a_H} & 0 & p_{a_H b_R} & p_{a_H b_F} & p_{a_H b_H} & p_{a_H b_L} \\
a_L & 0 & 0 & 0 & p_{a_L} & p_{a_L b_R} & p_{a_L b_F} & p_{a_L b_H} & p_{a_L b_L} \\
b_R & p_{a_R b_R} & p_{a_F b_R} & p_{a_H b_R} & p_{a_L b_R} & p_{b_R} & 0 & 0 & 0 \\
b_F & p_{a_R b_F} & p_{a_F b_F} & p_{a_H b_F} & p_{a_L b_F} & 0 & p_{b_F} & 0 & 0 \\
b_H & p_{a_R b_H} & p_{a_F b_H} & p_{a_H b_H} & p_{a_L b_H} & 0 & 0 & p_{b_H} & 0 \\
b_L & p_{a_R b_L} & p_{a_F b_L} & p_{a_H b_L} & p_{a_L b_L} & 0 & 0 & 0 & p_{b_L} \\
\end{array}
\]

Note that $E[x \cdot x^T]$ is symmetric along the diagonal, and that some entries are always zero. The zero entries indicate that only a single transition can occur on a single input (i.e. transitions on a single input are mutually exclusive).

With this information specified, we can derive the covariance matrix Q:

\[
Q = COV(x) = E[x \cdot x^T] - E[x]E[x^T] \tag{A.7}
\]
which fully captures the correlations between inputs. As a concrete example, consider the probabilities described for a and b in Section A.1.5. The corresponding $E[x \cdot x^T]$ is then:

\[
\begin{array}{c|cccccccc}
    & a_R & a_F & a_H & a_L & b_R & b_F & b_H & b_L \\
\hline
a_R & 1/2 & 0 & 0 & 0 & 1/8 & 1/8 & 1/8 & 1/8 \\
a_F & 0 & 1/2 & 0 & 0 & 1/8 & 1/8 & 1/8 & 1/8 \\
a_H & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
a_L & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
b_R & 1/8 & 1/8 & 0 & 0 & 1/4 & 0 & 0 & 0 \\
b_F & 1/8 & 1/8 & 0 & 0 & 0 & 1/4 & 0 & 0 \\
b_H & 1/8 & 1/8 & 0 & 0 & 0 & 0 & 1/4 & 0 \\
b_L & 1/8 & 1/8 & 0 & 0 & 0 & 0 & 0 & 1/4 \\
\end{array}
\]

and $E[x]E[x^T]$ is:

\[
\begin{array}{c|cccccccc}
    & a_R & a_F & a_H & a_L & b_R & b_F & b_H & b_L \\
\hline
a_R & 1/4 & 1/4 & 0 & 0 & 1/8 & 1/8 & 1/8 & 1/8 \\
a_F & 1/4 & 1/4 & 0 & 0 & 1/8 & 1/8 & 1/8 & 1/8 \\
a_H & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
a_L & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
b_R & 1/8 & 1/8 & 0 & 0 & 1/16 & 1/16 & 1/16 & 1/16 \\
b_F & 1/8 & 1/8 & 0 & 0 & 1/16 & 1/16 & 1/16 & 1/16 \\
b_H & 1/8 & 1/8 & 0 & 0 & 1/16 & 1/16 & 1/16 & 1/16 \\
b_L & 1/8 & 1/8 & 0 & 0 & 1/16 & 1/16 & 1/16 & 1/16 \\
\end{array}
\]

which yields the covariance matrix $Q = E[x \cdot x^T] - E[x]E[x^T]$:

\[
\begin{array}{c|cccccccc}
    & a_R & a_F & a_H & a_L & b_R & b_F & b_H & b_L \\
\hline
a_R & 1/4 & -1/4 & 0 & 0 & 0 & 0 & 0 & 0 \\
a_F & -1/4 & 1/4 & 0 & 0 & 0 & 0 & 0 & 0 \\
a_H & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
a_L & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
b_R & 0 & 0 & 0 & 0 & 3/16 & -1/16 & -1/16 & -1/16 \\
b_F & 0 & 0 & 0 & 0 & -1/16 & 3/16 & -1/16 & -1/16 \\
b_H & 0 & 0 & 0 & 0 & -1/16 & -1/16 & 3/16 & -1/16 \\
b_L & 0 & 0 & 0 & 0 & -1/16 & -1/16 & -1/16 & 3/16 \\
\end{array}
\]

A.2.2 Decorrelation with Cholesky Decomposition

Our ultimate goal is to generate condition functions which transform a set of independent 0/1 random variables into a set of correlated random variables with a desired covariance matrix Q. One approach is to use the Cholesky decomposition, which decomposes a matrix into the product of a lower triangular matrix L and its transpose $L^T$:

\[
Q = LL^T \tag{A.8}
\]

Let $X = [x_0, x_1, \ldots, x_n]$ be a vector of random variables with $E[x_i] = 0$ and $\sigma^2_{x_i} = 1$, with each $x_i$ independent. It follows that:
\[
E[XX^T] = I \tag{A.9}
\]

Let us define Y as:
\[
Y = LX \tag{A.10}
\]
where L is derived from Cholesky’s decomposition. Y represents a set of variables $y_i$ constructed as linear combinations of the independent $x_i$’s. Then:

\[
E[YY^T] = E[(LX)(LX)^T] = E[LXX^T L^T] = L\,E[XX^T]\,L^T = LIL^T = LL^T \tag{A.11}
\]

Additionally:
\[
E[Y] = E[LX] = L\,E[X] = 0 \tag{A.12}
\]
and similarly,
\[
E[Y^T] = 0 \tag{A.13}
\]

Then:
\[
COV(Y) = E[YY^T] - E[Y]E[Y^T] = E[YY^T] = LL^T = Q \tag{A.14}
\]
as desired. That is, the set of transformed variables $y_i$ constructed as linear combinations of the independent variables $x_i$ have the desired covariance matrix Q.

It can also be shown that for $E[x_i] = 0.5$ and $\sigma^2_{x_i} = 0.25$ (e.g. $x_i$ either 0 or 1, instead of -1 or 1) the transformation becomes:
\[
Y = L\hat{X} = 2LX \tag{A.15}
\]

In summary, to generate a set of correlated random variables Y with covariance matrix Q from a set of independent 0-1 (Bernoulli) random variables X we do the following:

1. Construct the desired covariance matrix Q

2. Calculate the Cholesky Decomposition L of Q

3. Generate the correlated variables: Y = 2LX

This yields a set of linear equations defining each correlated random variable $y_i$ as a linear combination of the independent random variables $x_i$:
\[
y_i = \alpha_{i1} x_1 + \alpha_{i2} x_2 + \ldots + \alpha_{in} x_n \tag{A.16}
\]

However it is important to note that the coefficients ($\alpha_{ij}$) in the linear combination, and the resulting $y_i$, are real numbers, and hence do not directly yield the desired boolean condition functions.
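The three-step procedure above is easy to exercise numerically; the sketch below does so with numpy. Note that it uses a small, hypothetical, strictly positive-definite covariance matrix purely for illustration, since the example matrices above are singular (the mutually exclusive transitions of one input are linearly dependent) and so have no ordinary Cholesky factor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical strictly positive-definite covariance matrix (illustration only).
Q = np.array([[0.25, 0.10, 0.05],
              [0.10, 0.25, 0.08],
              [0.05, 0.08, 0.25]])

L = np.linalg.cholesky(Q)                    # Step 2: Q = L @ L.T
X = rng.integers(0, 2, size=(3, 200_000))    # independent 0/1 (Bernoulli) variables
Y = 2 * L @ X                                # Step 3: Y = 2 L X

print(np.allclose(np.cov(Y), Q, atol=1e-2))  # empirical covariance of Y is approximately Q
```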

A.2.3 Stochastic Computation

Stochastic Computing (SC) is an approach where numbers are represented by sequences of boolean values in time (often called bitstreams) [250]. This is in contrast to the traditional dense binary encoding where a number is represented by a set of spatially parallel boolean values. A simple SC encoding is the unipolar (UP) encoding, which encodes a number’s value as the fraction of 1’s appearing in the bitstream [250]. For example, the bitstream ‘1010111010’ has length 10 and contains 6 1’s, yielding the value $\frac{6}{10} = 0.6$. SC naturally operates over the real numbers, although with a limited range. In particular the UP encoding is limited to values within [0, 1], while other encodings such as bipolar (BP) and inverted-bipolar (IBP) [250] are limited to values within [-1, 1]. It has been shown that a variety of common mathematical operations (e.g. addition/subtraction, multiplication) can be implemented using simple boolean logic gates [250]. The link SC provides between mathematical operations over the real numbers and boolean logic seemed promising, as it could allow us to generate boolean condition functions by implementing Equation (A.16) using SC.
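As a small illustration of the UP encoding (a sketch which assumes long, independent random bitstreams, and is not part of ESTA), bitwise AND of two UP bitstreams approximates multiplication of the encoded values:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000  # bitstream length

def up_encode(value, n):
    """Unipolar encoding: each bit is 1 with probability equal to the encoded value."""
    return rng.random(n) < value

a = up_encode(0.6, n)
b = up_encode(0.5, n)
product = a & b           # an AND gate multiplies independent UP bitstreams
print(product.mean())     # approximately 0.6 * 0.5 = 0.3
```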

A.2.4 Stochastic Logic Synthesis

A variety of approaches have been proposed to perform SC synthesis. STRAUSS [250] introduces a flexible approach to performing SC synthesis, which allows multilinear functions over the real numbers to be synthesized to boolean logic. STRAUSS describes how the Walsh-Hadamard transformation (a Fourier transform using the Walsh functions as a basis) can be used to either analyze (e.g. determine the characteristic equation defining the SC behaviour of a boolean function) or synthesize a boolean function (e.g. convert from an equation over the real numbers to a boolean function). In practice this means that applying the inverse Walsh-Hadamard transform to an IBP multilinear equation yields the truth table (TT) of the corresponding function. [250] notes three possible cases for the resulting TT:

1. The elements are +1 (false) or -1 (true), and the function can be directly converted to a boolean circuit.

2. The elements fall within [-1,+1]. Any element not equal to +1 or -1 is treated as a stochastic constant, and additional inputs and logic are required to generate it.

3. The elements fall outside [-1,+1]. Such a function is said to not be SC-implementable (although it is possible a scaled version may be SC-implementable).

As an example, consider the case of multiplying the two SC numbers $X_1$ and $X_2$ in UP format:

\[
F(X_1, X_2) = X_1 X_2 \tag{A.17}
\]

To synthesize this characteristic equation we first convert it into IBP format using the relation $\hat{X} = 1 - 2X$ (where X is the value in UP, and $\hat{X}$ is in IBP):

\[
\frac{1 - \hat{F}(\hat{X}_1, \hat{X}_2)}{2} = \left(\frac{1 - \hat{X}_1}{2}\right)\left(\frac{1 - \hat{X}_2}{2}\right) \implies
\hat{F}(\hat{X}_1, \hat{X}_2) = \frac{1}{2}\left(1 + \hat{X}_1 + \hat{X}_2 - \hat{X}_1 \hat{X}_2\right) \tag{A.18}
\]

We then apply the inverse Walsh-Hadamard transform to yield the TT $\vec{f}$:

\[
\vec{f} = \mathcal{F}^{-1}(\vec{\hat{F}}) = H_n \vec{\hat{F}} \tag{A.19}
\]

where $H_n$ is the Hadamard matrix of size n, and $\vec{\hat{F}}$ is the coefficients of $\hat{F}$ in appropriate vector form [250]. This yields:
\[
\vec{f} =
\begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix}
\begin{bmatrix} 0.5 \\ 0.5 \\ 0.5 \\ -0.5 \end{bmatrix}
=
\begin{bmatrix} 1 \\ 1 \\ 1 \\ -1 \end{bmatrix} \tag{A.20}
\]
which is the truth table for an AND gate where only the 4th minterm is set (in IBP -1 is true, and 1 is false). This is the known implementation of a UP multiplier [250].
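The transform in (A.19) and (A.20) can be reproduced numerically in a few lines of numpy. This sketch assumes the coefficient ordering $[1, \hat{X}_2, \hat{X}_1, \hat{X}_1\hat{X}_2]$ implied by (A.20):

```python
import numpy as np

# Sylvester-construction Hadamard matrix H_4 (matches the matrix in Equation (A.20)).
H2 = np.array([[1, 1], [1, -1]])
H4 = np.kron(H2, H2)

# Coefficients of F_hat = 0.5*(1 + X1_hat + X2_hat - X1_hat*X2_hat),
# in the (assumed) basis ordering [1, X2_hat, X1_hat, X1_hat*X2_hat].
F_hat = np.array([0.5, 0.5, 0.5, -0.5])

f = H4 @ F_hat   # inverse Walsh-Hadamard transform, yielding the IBP truth table
print(f)         # -> [ 1.  1.  1. -1.]: the AND gate / UP multiplier of (A.20)
```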

A.2.5 Combining Stochastic Synthesis & Cholesky Decomposition to Generate ESTA Condition Functions

The hope was that Stochastic Synthesis would allow us to convert the correlation transformations determined with the Cholesky decomposition into a set of boolean functions. While each component (Cholesky decomposition and Stochastic Synthesis) works correctly in isolation, combining them together was ultimately unsuccessful. In particular, the following issues limited this approach:

1. The transformations produced by Cholesky were not always SC-implementable (e.g. the TT coefficients fall outside [-1,+1]).

2. Mutually exclusive conditions require non-linear characteristic equations, while Cholesky always yields linear equations (even in the presence of mutual exclusivity).

Additionally, a large number of stochastic constants were required, which requires a large number of additional inputs (boolean variables) and hurts ESTA’s scalability. Although many of the stochastic constant values were the same, the stochastic constant generators could not be shared among condition functions, since that would introduce spurious correlations.

Independent Inputs Counter Example

Consider the following example, where we have two inputs a and b which are independent (i.e. uncorrelated) with each transition (e.g. R, F, H, L) having uniform probability on each input. The corresponding $E[x \cdot x^T]$ is then:

\[
\begin{array}{c|cccccccc}
    & a_R & a_F & a_H & a_L & b_R & b_F & b_H & b_L \\
\hline
a_R & 1/4 & 0 & 0 & 0 & 1/16 & 1/16 & 1/16 & 1/16 \\
a_F & 0 & 1/4 & 0 & 0 & 1/16 & 1/16 & 1/16 & 1/16 \\
a_H & 0 & 0 & 1/4 & 0 & 1/16 & 1/16 & 1/16 & 1/16 \\
a_L & 0 & 0 & 0 & 1/4 & 1/16 & 1/16 & 1/16 & 1/16 \\
b_R & 1/16 & 1/16 & 1/16 & 1/16 & 1/4 & 0 & 0 & 0 \\
b_F & 1/16 & 1/16 & 1/16 & 1/16 & 0 & 1/4 & 0 & 0 \\
b_H & 1/16 & 1/16 & 1/16 & 1/16 & 0 & 0 & 1/4 & 0 \\
b_L & 1/16 & 1/16 & 1/16 & 1/16 & 0 & 0 & 0 & 1/4 \\
\end{array}
\]

This scenario with uniform probabilities across independent inputs has a known set of condition functions (Section 4.3.4):

\[
\begin{aligned}
f_{a_R}(x_1, x_1') &= \overline{x_1} \wedge x_1' & \qquad f_{a_F}(x_1, x_1') &= x_1 \wedge \overline{x_1'} \\
f_{a_H}(x_1, x_1') &= x_1 \wedge x_1' & \qquad f_{a_L}(x_1, x_1') &= \overline{x_1} \wedge \overline{x_1'} \\
f_{b_R}(x_2, x_2') &= \overline{x_2} \wedge x_2' & \qquad f_{b_F}(x_2, x_2') &= x_2 \wedge \overline{x_2'} \\
f_{b_H}(x_2, x_2') &= x_2 \wedge x_2' & \qquad f_{b_L}(x_2, x_2') &= \overline{x_2} \wedge \overline{x_2'}
\end{aligned}
\tag{A.21}
\]

where the variables $x_1$, $x_1'$, $x_2$ and $x_2'$ are independent boolean variables. If the SC synthesis approach is to be successful, it should yield an equivalent set of condition functions for this basic scenario. One approach to verifying whether this is possible is to determine the characteristic equations of these known condition functions (via the Walsh-Hadamard transform), and see if the resulting characteristic equation could be constructed.

Consider the first condition function $f_{a_R} = \overline{x_1} \wedge x_1'$. Its corresponding characteristic equation is:

\[
F_{a_R} = 0.5 X_0 X_1 - 0.5 X_0 + 0.5 X_1 + 0.5 \tag{A.22}
\]

Notably, the characteristic equation is non-linear, containing the product term $X_0 X_1$. Such non-linear terms cannot be constructed via a linear transformation (such as the Cholesky decomposition). This result indicates that some type of non-linear transformation is required to generate stochastic condition functions with mutually exclusive states. Provided such a transformation could be determined, the stochastic synthesis approach outlined above would be a feasible approach to generating ESTA conditioning functions. However the exact type or form of such a non-linear transformation remains unclear.

One potential direction for future work in this area would be to investigate whether product terms such as $X_0 X_1$ in (A.22) could be constructed by synthesizing multi-level stochastic circuits, for instance by defining $B = X_0 X_1$ (an SC multiplier) and feeding that into the next level of the SC equation. Applied to (A.22) this would yield the multi-level equation:

\[
\begin{aligned}
B &= X_0 X_1 \\
F_{a_R} &= 0.5B - 0.5X_0 + 0.5X_1 + 0.5
\end{aligned}
\tag{A.23}
\]

However, the correctness of this approach, and how it could be used for synthesis requires further investigation and evaluation.

A.3 Conclusion

In conclusion, we have investigated how ESTA condition functions subject to arbitrary correlation constraints could be synthesized. In Section A.1 we described an ILP formulation for synthesizing condition functions. While this formulation works, it takes a brute force approach which fails to scale beyond 2 or 3 variables. In Section A.2 we described a Stochastic Computing approach to construct condition functions directly from their specification (covariance matrix). This approach fails to produce the correct condition functions, at least in part since generating independent condition functions requires non-linear characteristic SC equations. However further investigation is needed, with multi-level SC circuits showing some potential promise.

Bibliography

[1] K. E. Murray, et al. “Symbiflow & VPR: An Open-Source Design Flow for Commercial and Novel FPGAs.” IEEE Micro, 2020, doi:10.1109/MM.2020.2998435. To Appear.

[2] K. E. Murray, O. Petelin, et al. “VTR 8: High Performance CAD and Customizable FPGA Architecture Modelling.” ACM Transactions on Reconfigurable Technology and Systems, 2020. To Appear.

[3] K. E. Murray, A. Suardi, V. Betz, and G. Constantinides. “Calculated Risks: Quantifying Timing Error Probability With Extended Static Timing Analysis.” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 38 (4), pp. 719–732, 2019.

[4] K. E. Murray, S. Zhong, and V. Betz. “AIR: A Fast but Lazy Timing-Driven FPGA Router.” In Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 1–7. 2020.

[5] K. E. Murray and V. Betz. “Adaptive FPGA Placement Optimization via Reinforcement Learning.” In ACM/IEEE Workshop on Machine Learning for CAD (MLCAD), pp. 1–6. 2019.

[6] K. E. Murray and V. Betz. “Tatum: Parallel Timing Analysis for Faster Design Cycles and Improved Optimization.” In Int. Conf. on Field-Programmable Technology (FPT). 2018.

[7] K. E. Murray, A. Suardi, V. Betz, and G. Constantinides. “Quantifying error: Extending static timing analysis with probabilistic transitions.” In Design, Automation Test in Europe Conference Exhibition (DATE), 2017, pp. 1486–1491. 2017, doi:10.23919/DATE.2017.7927226.

[8] J. Richardson et al. “Comparative Analysis of HPC and Accelerator Devices: Computation, Memory, I/O, and Power.” In Int. Workshop on High-Performance Reconfigurable Comput. Technol. and Appl. (HPRCTA), pp. 1–10. 2010.

[9] I. Kuon and J. Rose. “Measuring the gap between FPGAs and ASICs.” In Proceedings of the international symposium on Field programmable gate arrays - FPGA’06, p. 21. ACM Press, New York, New York, USA, 2006, doi:10.1145/1117201.1117205.

[10] K. E. Murray, S. Whitty, S. Liu, J. Luu, and V. Betz. “Timing-Driven Titan: Enabling Large Benchmarks and Exploring the Gap Between Academic and Commercial CAD.” ACM Trans. Reconfigurable Technol. Syst., 8 (2), pp. 10:1–10:18, 2015.

Stratix III Device Family Overview. Altera Corp., SIII51001-1.8, 2010. https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/hb/stx3/stx3_siii51001.pdf.


[12] Intel Stratix 10 GX/SX Product Table. Intel Corp., Gen-1023-1.8, 2003. https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/pt/stratix-10-product-table.pdf.

[13] Fourth Quarter 2008 SPEC CPU2006 Results. Standard Performance Evaluation Corporation, 2008. https://www.spec.org/cpu2006/results/res2008q4/.

[14] Fourth Quarter 2019 SPEC CPU2017 Results. Standard Performance Evaluation Corporation, 2019. https://www.spec.org/cpu2017/results/res2019q4/.

“Speedcore eFPGAs.”, Achronix, PB02B v1.7. 2019. https://www.achronix.com/sites/default/files/docs/Speedcore_eFPGA_Product_Brief_PB028.pdf.

“EFLX Embedded FPGA.”, FlexLogic, 2020. https://flex-logix.com/wp-content/uploads/2020/04/2020-04-EFLX-4-page-Overview-TGF.pdf.

“ArcticPro.”, QuickLogic, 2019. https://www.quicklogic.com/wp-content/uploads/2019/09/ArcticPro-Flyer_June-2019_Online_FINAL.pdf.

“Reconfigurable Imaging (ReImagine).”, DARPA, 2016. https://www.darpa.mil/attachments/Final_Compiled_ReImagineProposersDay.pdf.

“Reconfigurable Integrated Circuits for ReImagine.”, MIT Lincoln Lab, 2016. https://www.darpa.mil/attachments/MITLL_ProposerDaySlides%20v3.pdf.

[20] B. Gaide, D. Gaitonde, C. Ravishankar, and T. Bauer. “Xilinx Adaptive Compute Acceleration Platform: Versal Architecture.” In Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA ’19, pp. 84–93. Association for Computing Machinery, New York, NY, USA, 2019, doi:10.1145/3289602.3293906.

[21] R. Ho, K. W. Mai, and M. A. Horowitz. “The Future of Wires.” Proceedings of the IEEE, 89 (4), pp. 490–504, 2001.

[22] I. Swarbrick, D. Gaitonde, S. Ahmad, B. Gaide, and Y. Arbel. “Network-on-Chip Programmable Platform in Versal ACAP Architecture.” In ACM/SIGDA Int. Symp. on Field-programmable Gate Arrays (FPGA), pp. 212–221. 2019.

[23] B. Gaide, D. Gaitonde, C. Ravishankar, and T. Bauer. “Xilinx Adaptive Compute Acceleration Platform: Versal Architecture.” In ACM/SIGDA Int. Symp. on Field-programmable Gate Arrays (FPGA), FPGA ’19, pp. 84–93. 2019.

[24] K. P. Niu and J. H. Anderson. “Compact Area and Performance Modelling for CGRA Architecture Evaluation.” In FPT. 2018.

[25] K. Shi, D. Boland, and G. A. Constantinides. “Accuracy-Performance Tradeoffs on an FPGA through Overclocking.” In IEEE FCCM, pp. 29–36. 2013, doi:10.1109/FCCM.2013.10.

[26] J. Rose, et al. “The VTR Project: Architecture and CAD for FPGAs from Verilog to Routing.” In ACM/SIGDA Int. Symp. on Field Programmable Gate Arrays (FPGA), pp. 77–86. 2012.

[27] J. Luu, et al. “VTR 7.0: Next Generation Architecture and CAD System for FPGAs.” ACM Trans. Reconfigurable Technol. Syst., 7 (2), pp. 6:1–6:30, 2014.

[28] K. E. Murray. Divide-and-conquer Techniques for Large Scale FPGA Design. Master’s thesis, Department of Electrical and Computer Engineering, University of Toronto, 2015. http://hdl.handle.net/1807/69673.

[29] “Standard Cell ASIC to FPGA Design Methodology and Guidelines.” Technical report, Altera Corporation, 2009.

[30] N. Azizi, I. Kuon, A. Egier, A. Darabiha, and P. Chow. “Reconfigurable Molecular Dynamics Simulator.” In 12th Annual IEEE Symposium on Field-Programmable Custom Computing Machines, pp. 197–206. IEEE, 2004, doi:10.1109/FCCM.2004.48.

[31] J. Cassidy, L. Lilge, and V. Betz. “Fast, Power-Efficient Biophotonic Simulations for Cancer Treatment Using FPGAs.” In Int. Symp. on Field-Programmable Custom Computing Machines (FCCM), pp. 133–140. IEEE Computer Society, 2014, doi:10.1109/FCCM.2014.43.

[32] A. Putnam, et al. “A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services.” In 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA), pp. 13–24. IEEE, 2014, doi:10.1109/ISCA.2014.6853195.

[33] “Implementing FPGA Design with the OpenCL Standard.” Technical report, Altera Corporation, 2012.

[34] W. Zhang, V. Betz, and J. Rose. “Portable and scalable FPGA-based acceleration of a direct linear system solver.” ACM TRETS, 5 (1), pp. 6:1–6:26, 2012.

[35] E. Chung, et al. “Serving DNNs in Real Time at Datacenter Scale with Project Brainwave.” IEEE Micro, 38 (2), pp. 8–20, 2018.

[36] A. S. Marquardt, V. Betz, and J. Rose. “Using cluster-based logic blocks and timing-driven packing to improve FPGA speed and density.” In Proceedings of the 1999 ACM/SIGDA seventh international symposium on Field programmable gate arrays - FPGA ’99, pp. 37–46. ACM Press, New York, New York, USA, 1999, doi:10.1145/296399.296426.

[37] V. Betz and J. Rose. “Cluster-based logic blocks for FPGAs: Area-efficiency vs. input sharing and size.” In IEEE Custom Integrated Circuits Conference, pp. 551–554. IEEE, 1997.

[38] V. Betz, J. Rose, and A. Marquardt. Architecture and CAD for Deep-Submicron FPGAs. Kluwer Academic Publishers, 1999.

[39] K. E. Murray, et al. “Optimizing FPGA Logic Block Architectures for Arithmetic.” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., 28 (6), pp. 1378–1391, 2020, doi:10.1109/TVLSI.2020.2965772.

[40] D. Singh, V. Manohararajah, and S. Brown. “Two-stage Physical Synthesis for FPGAs.” In Proceedings of the IEEE 2005 Custom Integrated Circuits Conference, 2005., pp. 170–177. IEEE, 2005, doi:10.1109/CICC.2005.1568635.

[41] D. Chen, J. Cong, and P. Pan. “FPGA Design Automation: A Survey.” Foundations and Trends in Electronic Design Automation, 1 (3), pp. 195–334, 2006, doi:10.1561/1000000003.

[42] A. Canis, et al. “LegUp: High-level synthesis for FPGA-based processor/accelerator systems.” In FPGA, pp. 33–36. 2011.

[43] “Vivado Design Suite User Guide: High-Level Synthesis.” Technical report, Xilinx Incorporated, 2014.

[44] D. Chen, et al. “A Comprehensive Approach to Modeling, Characterizing and Optimizing for Metastability in FPGAs.” In ACM/SIGDA Int. Symp. on Field Programmable Gate Arrays (FPGA), pp. 167–176. 2010.

[45] Y.-C. Hsu, S. Sun, and D. H. C. Du. “Finding the longest simple path in cyclic combinational circuits.” In ICCAD. 1998.

[46] V. Betz and J. Rose. “VPR: A new packing, placement and routing tool for FPGA research.” In Field-Programmable Logic and Appl., pp. 213–222. 1997.

[47] A. Marquardt, V. Betz, and J. Rose. “Timing-driven Placement for FPGAs.” In FPGA, pp. 203–213. ACM, New York, NY, USA, 2000.

[48] M. Wainberg and V. Betz. “Robust Optimization of Multiple Timing Constraints.” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 34 (12), pp. 1942–1953, 2015.

[49] J. Lamoureux and S. J. E. Wilton. “On the Interaction Between Power-Aware FPGA CAD Algorithms.” In Proceedings of the 2003 IEEE/ACM International Conference on Computer-Aided Design, ICCAD ’03, p. 701. IEEE Computer Society, USA, 2003.

[50] G. Chen, et al. “RippleFPGA: Routability-Driven Simultaneous Packing and Placement for Modern FPGAs.” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 37 (10), pp. 2022–2035, 2018.

[51] W. Li, S. Dhar, and D. Z. Pan. “UTPlaceF: A Routability-Driven FPGA Placer With Physical and Congestion Aware Packing.” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 37 (4), pp. 869–882, 2018.

[52] Z. Abuowaimer, et al. “GPlace3.0: Routability-Driven Analytic Placer for UltraScale FPGA Architectures.” ACM Trans. Des. Autom. Electron. Syst., 23 (5), 2018, doi:10.1145/3233244.

[53] M. Hutton, V. Betz, and J. Anderson. “FPGA Synthesis and Physical Design.” In Electronic Design Automation for IC Implementation, Circuit Design, and Process Technology, chapter 16. CRC Press, 2nd edition, 2016.

[54] “Spectra-Q Engine.” Technical Report BG-1002-1.1, Altera Corp., 2015. https://www.intel.com/content/dam/www/programmable/us/en/pdfs/literature/backgrounder/spectra-q-engine-backgrounder.pdf.

[55] C. Sechen and A. Sangiovanni-Vincentelli. “The TimberWolf placement and routing package.” IEEE Journal of Solid-State Circuits, 20 (2), pp. 510–522, 1985.

[56] M. Gort and J. H. Anderson. “Analytical placement for heterogeneous FPGAs.” In Int. Conf. on Field Programmable Logic and Applications, pp. 143–150. 2012, doi:10.1109/FPL.2012.6339278.

[57] D. Vercruyce, E. Vansteenkiste, and D. Stroobandt. “Liquid: High quality scalable placement for large heterogeneous FPGAs.” In Int. Conf. on Field Programmable Technology (FPT), pp. 17–24. 2017.

[58] W. Li, Y. Lin, and D. Z. Pan. “elfPlace: Electrostatics-based Placement for Large-Scale Heterogeneous FPGAs.” In 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), pp. 1–8. 2019.

[59] S. Brown, J. Rose, and Z. G. Vranesic. “A detailed router for field-programmable gate arrays.” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 11 (5), pp. 620–628, 1992.

[60] L. McMurchie and C. Ebeling. “PathFinder: A Negotiation-based Performance-driven Router for FPGAs.” In Field Programmable Gate Arrays, pp. 111–117. 1995.

[61] S. M. Trimberger. “Three Ages of FPGAs: A Retrospective on the First Thirty Years of FPGA Technology.” Proceedings of the IEEE, 103 (3), 2015.

[62] A. Ludwin and V. Betz. “Efficient and Deterministic Parallel Placement for FPGAs.” ACM Transactions on Design Automation of Electronic Systems, 16 (3), pp. 22:1–22:23, 2011.

[63] A. B. Kahng. “New Game, New Goal Posts: A Recent History of Timing Closure.” In DAC, pp. 4:1–4:6. 2015.

[64] C. Ebeling et al. “Stratix 10 High Performance Routable Clock Networks.” In Int. Symp. on FPGAs, pp. 64–73. 2016.

[65] M. Hutton et al. “Efficient Static Timing Analysis and Applications Using Edge Masks.” In FPGA, pp. 174–183. 2005.

[66] D. Lewis, et al. “Architectural Enhancements in Stratix V.” In Field Programmable Gate Arrays, pp. 147–156. 2013.

[67] I. Ganusov and B. Devlin. “Time-borrowing platform in the Xilinx UltraScale+ family of FPGAs and MPSoCs.” In 2016 26th International Conference on Field Programmable Logic and Applications (FPL), pp. 1–9. 2016.

[68] D. Lewis, G. Chiu, et al. “The Stratix 10 Highly Pipelined FPGA Architecture.” In FPGA, pp. 159–168. 2016.

[69] R. Fung, V. Betz, and W. Chow. “Slack Allocation and Routing to Improve FPGA Timing While Repairing Short-Path Violations.” IEEE TCAD, 27 (4), pp. 686–697, 2008, doi:10.1109/TCAD.2008.917585.

[70] J. B. Goeders, G. G. F. Lemieux, and S. J. E. Wilton. “Deterministic Timing-Driven Parallel Placement by Simulated Annealing Using Half-Box Window Decomposition.” In ReConFig, pp. 41–48. 2011.

[71] M. An, J. G. Steffan, and V. Betz. “Speeding Up FPGA Placement: Parallel Algorithms and Methods.” In Int. Symp. on Field-Programmable Custom Computing Machines (FCCM), pp. 178–185. 2014.

[72] C. Fobel, G. Grewal, and D. Stacey. “A scalable, serially-equivalent, high-quality parallel placement methodology suitable for modern multicore and GPU architectures.” In Int. Conf. on Field Programmable Logic and Applications (FPL), pp. 1–8. 2014.

[73] M. Gort and J. H. Anderson. “Deterministic multi-core parallel routing for FPGAs.” In FPT, pp. 78–86. 2010, doi:10.1109/FPT.2010.5681758.

[74] C. H. Hoo, A. Kumar, and Y. Ha. “ParaLaR: A parallel FPGA router based on Lagrangian relaxation.” In Int. Conf. on Field Programmable Logic and Applications (FPL), pp. 1–6. 2015.

[75] M. Stojilovic. “Parallel FPGA Routing: Survey and Challenges.” In Field Programmable Logic and Appl., pp. 1–8. 2017, doi:10.23919/FPL.2017.8056782.

[76] M. Shen and G. Luo. “Corolla: GPU-Accelerated FPGA Routing Based on Subgraph Dynamic Expansion.” In Field Programmable Gate Arrays, pp. 105–114. 2017.

[77] J. Bhasker and R. Chadha. Static Timing Analysis for Nanometer Designs: A Practical Approach. Springer, 2009.

[78] D. Blaauw, K. Chopra, A. Srivastava, and L. Scheffer. “Statistical Timing Analysis: From Basic Principles to State of the Art.” IEEE Trans. Comput.-Aided Design Integ. Circuits. Syst., 27 (4), pp. 589–607, 2008, doi:10.1109/TCAD.2007.907047.

[79] W. Donath and D. Hathaway. “Distributed Static Timing Analysis.” US Patent 6,557,151, 1998.

[80] A. Holder et al. “Prototype for a large-scale static timing analyzer running on an IBM Blue Gene.” In IEEE Int. Sym. on Paral. Distr. Proc., pp. 1–8. 2010.

[81] V. Veetil et al. “Efficient Monte Carlo based incremental statistical timing analysis.” In DAC, pp. 676–681. 2008.

[82] T.-W. Huang and M. D. F. Wong. “OpenTimer: A High-Performance Timing Analysis Tool.” In ICCAD, pp. 895–902. 2015.

[83] Y. Deng, B. D. Wang, and S. Mu. “Taming irregular EDA applications on GPUs.” In ICCAD, p. 539. 2009.

[84] K. Gulati et al. “Accelerating statistical static timing analysis using graphics processing units.” In ASP-DAC, pp. 260–265. 2009.

[85] Y. Shen and J. Hu. “GPU acceleration for PCA-based statistical static timing analysis.” In ICCAD, pp. 674–679. 2015.

[86] H. Wang et al. “CASTA: CUDA-Accelerated Static Timing Analysis for VLSI Designs.” In Int. Conf. on Paral. Proc., pp. 192–200. 2014.

[87] J. L. Hennessy and D. A. Patterson. “Memory Hierarchy Design.” In Computer Architecture: A Quantitative Approach. Morgan Kaufmann, 5th edition, 2011.

[88] J. L. Hennessy and D. A. Patterson. “Data-Level Parallelism in Vector, SIMD, and GPU Architectures.” In Computer Architecture: A Quantitative Approach. Morgan Kaufmann, 5th edition, 2011.

[89] “Cilk Plus.”, Intel Corporation, 2017. www.cilkplus.org.

[90] T. David, R. Guerraoui, and V. Trigonakis. “Everything You Always Wanted to Know about Synchronization but Were Afraid to Ask.” In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, SOSP ’13, pp. 33–48. Association for Computing Machinery, New York, NY, USA, 2013, doi:10.1145/2517349.2522714.

[91] R. Fung and V. Betz. “Optimizing long-path and short-path timing and accounting for manufacturing and operating condition variability.” US Patent 7,254,789, 2007.

[92] M. McCool, J. Reinders, and A. Robison. Structured Parallel Programming: Patterns for Efficient Computation. Morgan Kaufmann, 2012.

[93] “TAU 2015 Contest Resources.”, https://sites.google.com/site/taucontest2015, 2015. Accessed: April 12, 2018.

[94] V. Betz and J. Rose. “VPR: A new packing, placement and routing tool for FPGA research.” In FPL, pp. 213–222. 1997.

[95] S. Yang. “Logic Synthesis and Optimization Benchmarks User Guide 3.0.” Technical report, MCNC, 1991.

[96] K. Eguro and S. Hauck. “Enhancing Timing-driven FPGA Placement for Pipelined Netlists.” In DAC, pp. 34–37. 2008.

[97] S. Sapatnekar. “Static Timing Analysis.” In L. Lavagno, L. Scheffer, and G. Martin, eds., EDA for IC Implementation, Circuit Design, and Process Technology, chapter 6. CRC Press, 2006.

[98] S. R. Nassif. “Modeling and forecasting of manufacturing variations.” In Int. Workshop on Statistical Metrology, pp. 2–10. 2000, doi:10.1109/IWSTM.2000.869299.

[99] J. Han and M. Orshansky. “Approximate computing: An emerging paradigm for energy-efficient design.” In European Test Symp., pp. 1–6. 2013, doi:10.1109/ETS.2013.6569370.

[100] K. Shi, D. Boland, E. Stott, S. Bayliss, and G. A. Constantinides. “Datapath Synthesis for Overclocking: Online Arithmetic for Latency-Accuracy Trade-offs.” In DAC, pp. 190:1–190:6. 2014, doi:10.1145/2593069.2593118.

[101] H. C. Chen and D. H. C. Du. “Path sensitization in critical path problem.” In ICCAD Digest of Technical Papers, pp. 208–211. 1991, doi:10.1109/ICCAD.1991.185233.

[102] D. H. C. Du, S. H. C. Yen, and S. Ghanta. “On the General False Path Problem in Timing Analysis.” In DAC, pp. 555–560. 1989, doi:10.1145/74382.74475.

[103] R. I. Bahar, H. Cho, G. D. Hachtel, E. Macii, and F. Somenzi. “Timing analysis of combinational circuits using ADDs.” In Proceedings of European Design and Test Conference EDAC-ETC-EUROASIC, pp. 625–629. 1994, doi:10.1109/EDTC.1994.326813.

[104] P. Ashar and S. Malik. “Functional timing analysis using ATPG.” IEEE Trans. Comput.-Aided Design Integ. Circuits. Syst., 14 (8), pp. 1025–1030, 1995, doi:10.1109/43.402501.

[105] Y.-T. Chung and J.-H. R. Jiang. “Functional Timing Analysis Made Fast and General.” IEEE Trans. Comput.-Aided Design Integ. Circuits. Syst., 32 (9), pp. 1421–1434, 2013.

[106] B. Liu. “Signal Probability Based Statistical Timing Analysis.” In DATE, pp. 562–567. 2008, doi:10.1109/DATE.2008.4484736.

[107] S. Devadas, K. Keutzer, and S. Malik. “Computation of floating mode delay in combinational circuits: theory and algorithms.” IEEE Trans. Comput.-Aided Design Integ. Circuits. Syst., 12 (12), pp. 1913–1923, 1993, doi:10.1109/43.251155.

[108] C. P. Gomes, A. Sabharwal, and B. Selman. “Model Counting.” In A. Biere, M. Heule, and H. van Maaren, eds., Handbook of satisfiability, chapter 20. IOS press, 2009.

[109] R. E. Bryant. “Symbolic Boolean manipulation with ordered binary-decision diagrams.” ACM Computing Surveys, 24 (3), pp. 293–318, 1992.

[110] F. Somenzi. “CUDD: CU Decision Diagram package release 2.5.1.”, University of Colorado at Boulder, 2015. http://vlsi.colorado.edu/~fabio/CUDD.

[111] C. Visweswariah, et al. “First-Order Incremental Block-Based Statistical Timing Analysis.” IEEE Trans. Comput.-Aided Design Integ. Circuits. Syst., 25 (10), pp. 2170–2180, 2006, doi:10.1109/TCAD.2005.862751.

[112] P. C. McGeer and R. K. Brayton. Integrating Functional and Temporal Domains in Logic Design: The False Path Problem and Its Implications. Kluwer Academic Publishers, Norwell, MA, USA, 1991.

[113] R. Rudell. “Dynamic Variable Ordering for Ordered Binary Decision Diagrams.” In ICCAD, pp. 42–47. 1993.

[114] J. Bhasker and R. Chadha. Static Timing Analysis for Nanometer Designs: A Practical Approach. Springer Science + Business Media, New York, NY, USA, 2009.

[115] S. Ramprasad, N. R. Shanbhag, and I. N. Hajj. “Analytical estimation of signal transition activity from word-level statistics.” IEEE Trans. Comput.-Aided Design Integ. Circuits. Syst., 16 (7), pp. 718–733, 1997, doi:10.1109/43.644033.

[116] J. Zejda and P. Frain. “General framework for removal of clock network pessimism.” In ICCAD, pp. 632–639. 2002, doi:10.1109/ICCAD.2002.1167599.

[117] T.-W. Huang, P.-C. Wu, and M. D. F. Wong. “UI-timer: An Ultra-fast Clock Network Pessimism Removal Algorithm.” In ICCAD, pp. 758–765. 2014.

[118] C. J. Clopper and E. S. Pearson. “The Use of Confidence or Fiducial Limits Illustrated in the Case of the Binomial.” Biometrika, 26 (4), pp. 404–413, 1934, doi:10.1093/biomet/26.4.404.

[119] S. Seabold and J. Perktold. “statsmodels: Econometric and statistical modeling with python.” In 9th Python in Science Conference. 2010.

[120] S. Greenland, et al. “Statistical tests, P values, confidence intervals, and power: A guide to misinterpretations.” European Journal of Epidemiology, 31, 2016, doi:10.1007/s10654-016-0149-3.

[121] Y. Rubner, C. Tomasi, and L. J. Guibas. “The earth mover’s distance as a metric for image retrieval.” Int. J. of Computer Vision, 40 (2), pp. 99–121, 2000.

[122] D. Chen, J. Cong, and P. Pan. “FPGA Design Automation: A Survey.” Found. Trends Electron. Des. Autom., 1 (3), pp. 139–169, 2006.

[123] C. Chiasson and V. Betz. “COFFE: Fully-automated transistor sizing for FPGAs.” In Int. Conf. on Field-Programmable Technology (FPT), pp. 34–41. 2013.

[124] F. F. Khan and A. Ye. “An evaluation on the accuracy of the minimum width transistor area models in ranking the layout area of FPGA architectures.” In Int. Conf. on Field Programmable Logic and Applications (FPL), pp. 1–11. 2016.

[125] J. B. Goeders and S. J. E. Wilton. “VersaPower: Power estimation for diverse FPGA architectures.” In Int. Conf. on Field-Programmable Technology (FPT), pp. 229–234. 2012.

[126] J. Luu, et al. “VPR 5.0: FPGA CAD and Architecture Exploration Tools with Single-driver Routing, Heterogeneity and Process Scaling.” ACM Trans. Reconfigurable Technol. Syst., 4 (4), pp. 32:1–32:23, 2011.

[127] D. Lewis, et al. “The Stratix Routing and Logic Architecture.” In Field Programmable Gate Arrays, pp. 12–20. 2003.

[128] N. Steiner, et al. “Torc: Towards an Open-source Tool Flow.” In ACM/SIGDA Int. Symp. on Field Programmable Gate Arrays (FPGA), pp. 41–44. 2011.

[129] C. Lavin, et al. “RapidSmith: Do-It-Yourself CAD Tools for Xilinx FPGAs.” In Int. Conf. on Field Programmable Logic and Applications (FPL), pp. 349–355. 2011.

[130] T. Haroldsen, B. Nelson, and B. Hutchings. “RapidSmith 2: A Framework for BEL-level CAD Exploration on Xilinx FPGAs.” In ACM/SIGDA Int. Symp. on Field-Programmable Gate Arrays (FPGA), pp. 66–69. 2015.

[131] C. Lavin. “Building domain-specific implementation tools using the RapidWright framework.” In FPGA. 2019.

[132] A. Sharma, S. Hauck, and C. Ebeling. “Architecture-adaptive routability-driven placement for FPGAs.” In Int. Conf. on Field Programmable Logic and Applications (FPL), pp. 427–432. 2005.

[133] X. Niu, W. Luk, and Y. Wang. “EURECA: On-Chip Configuration Generation for Effective Dynamic Data Access.” In ACM/SIGDA Int. Symp. on Field-Programmable Gate Arrays (FPGA), pp. 74–83. 2015.

[134] G. Zgheib and P. Ienne. “Evaluating FPGA clusters under wide ranges of design parameters.” In Int. Conf. on Field Programmable Logic and Applications (FPL), pp. 1–8. 2017.

[135] S. Yazdanshenas and V. Betz. “COFFE 2: Automatic Modelling and Optimization of Complex and Heterogeneous FPGA Architectures.” ACM Trans. Reconfigurable Technol. Syst., 12 (1), pp. 3:1–3:27, 2019.

[136] H. Parandeh-Afshar, H. Benbihi, D. Novo, and P. Ienne. “Rethinking FPGAs: Elude the Flexibility Excess of LUTs with And-inverter Cones.” In ACM/SIGDA Int. Symp. on Field Programmable Gate Arrays (FPGA), pp. 119–128. 2012.

[137] G. Zgheib, et al. “Revisiting And-inverter Cones.” In ACM/SIGDA Int. Symp. on Field-programmable Gate Arrays (FPGA), pp. 45–54. 2014.

[138] Z. Huang, et al. “NAND-NOR: A Compact, Fast, and Delay Balanced FPGA Logic Element.” In ACM/SIGDA Int. Symp. on Field-Programmable Gate Arrays (FPGA), pp. 135–140. 2017.

[139] S. A. Chin, J. Luu, S. Huda, and J. H. Anderson. “Hybrid LUT/Multiplexer FPGA Logic Architectures.” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 24 (4), pp. 1280–1292, 2016.

[140] I. Ahmadpour, B. Khaleghi, and H. Asadi. “An efficient reconfigurable architecture by characterizing most frequent logic functions.” In Int. Conf. on Field Programmable Logic and Applications (FPL), pp. 1–6. 2015.

[141] Z. Ebrahimi, B. Khaleghi, and H. Asadi. “PEAF: A Power-Efficient Architecture for SRAM-Based FPGAs Using Reconfigurable Hard Logic Design in Dark Silicon Era.” IEEE Transactions on Computers, 66 (6), pp. 982–995, 2017.

[142] J. Luu, et al. “On Hard Adders and Carry Chains in FPGAs.” In Int. Symp. on Field-Programmable Custom Computing Machines (FCCM), pp. 52–59. 2014.

[143] O. Petelin and V. Betz. “The Speed of Diversity: Exploring Complex FPGA Routing Topologies for the Global Metal Layer.” In Field Programmable Logic and Appl., pp. 1–10. 2016.

[144] K. Siozios and D. Soudris. “A Customizable Framework for Application Implementation onto 3-D FPGAs.” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 35 (11), pp. 1783–1796, 2016.

[145] E. Nasiri, J. Shaikh, A. H. Pereira, and V. Betz. “Multiple Dice Working as One: CAD Flows and Routing Architectures for Silicon Interposer FPGAs.” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 24 (5), pp. 1821–1834, 2016.

[146] E. Kadric, D. Lakata, and A. DeHon. “Impact of Memory Architecture on FPGA Energy Con- sumption.” In ACM/SIGDA Int. Symp. on Field-Programmable Gate Arrays (FPGA), pp. 146–155. 2015.

[147] S. Yazdanshenas, K. Tatsumura, and V. Betz. “Don’t Forget the Memory: Automatic Block RAM Modelling, Optimization, and Architecture Exploration.” In ACM/SIGDA Int. Symp. on Field-Programmable Gate Arrays (FPGA), pp. 115–124. 2017.

[148] M. Gao, et al. “DRAF: A Low-power DRAM-based Reconfigurable Acceleration Fabric.” SIGARCH Comput. Archit. News, 44 (3), pp. 506–518, 2016.

[149] N. Kulkarni, J. Yang, and S. Vrudhula. “A fast, energy efficient, field programmable threshold-logic array.” In Int. Conf. on Field-Programmable Technology (FPT), pp. 300–305. 2014.

[150] K. Huang, R. Zhao, W. He, and Y. Lian. “High-Density and High-Reliability Nonvolatile Field- Programmable Gate Array With Stacked 1D2R RRAM Array.” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 24 (1), pp. 139–150, 2016.

[151] C. Wei, A. Dhar, and D. Chen. “A Scalable and High-density FPGA Architecture with Multi-level Phase Change Memory.” In Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 1365–1370. 2015.

[152] X. Tang, P. Gaillardon, and G. De Micheli. “A high-performance low-power near-Vt RRAM-based FPGA.” In Int. Conf. on Field-Programmable Technology (FPT), pp. 207–214. 2014.

[153] S. Huda and J. H. Anderson. “Leveraging Unused Resources for Energy Optimization of FPGA Interconnect.” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 25 (8), pp. 2307–2320, 2017.

[154] Z. Seifoori, B. Khaleghi, and H. Asadi. “A power gating switch box architecture in routing network of SRAM-based FPGAs in dark silicon era.” In Design, Automation Test in Europe Conference Exhibition (DATE), pp. 1342–1347. 2017.

[155] I. Ahmed, L. L. Shen, and V. Betz. “Becoming More Tolerant: Designing FPGAs for Variable Supply Voltage.” In Int. Conf. on Field Programmable Logic and Applications (FPL). 2019.

[156] B. Erbagci, N. E. Can Akkaya, C. Erbagci, and K. Mai. “An Inherently Secure FPGA using PUF Hardware-Entanglement and Side-Channel Resistant Logic in 65nm Bulk CMOS.” In ESSCIRC 2019 - IEEE 45th European Solid State Circuits Conference (ESSCIRC), pp. 65–68. 2019, doi:10.1109/ESSCIRC.2019.8902789.

[157] D. Vercruyce, E. Vansteenkiste, and D. Stroobandt. “How Preserving Circuit Design Hierarchy During FPGA Packing Leads to Better Performance.” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 37 (3), pp. 629–642, 2018.

[158] Y. Chen, S. Chen, and Y. Chang. “Efficient and effective packing and analytical placement for large-scale heterogeneous FPGAs.” In Int. Conf. on Computer-Aided Design (ICCAD), pp. 647–654. 2014.

[159] S. Chen and Y. Chang. “Routing-architecture-aware analytical placement for heterogeneous FPGAs.” In ACM/EDAC/IEEE Design Automation Conference (DAC), pp. 1–6. 2015.

[160] J. Yuan, L. Wang, X. Zhou, Y. Xia, and J. Hu. “RBSA: Range-based simulated annealing for FPGA placement.” In Int. Conf. on Field Programmable Technology (FPT), pp. 1–8. 2017.

[161] C. Xu, et al. “A Parallel Bandit-Based Approach for Autotuning FPGA Compilation.” In Int. Symp. on Field-Programmable Gate Arrays (FPGA), pp. 157–166. 2017.

[162] C. H. Hoo and A. Kumar. “ParaDRo: A Parallel Deterministic Router Based on Spatial Partitioning and Scheduling.” In Field-Programmable Gate Arrays, pp. 67–76. 2018.

[163] B. Khaleghi, B. Omidi, H. Amrouch, J. Henkel, and H. Asadi. “Stress-aware routing to mitigate aging effects in SRAM-based FPGAs.” In Int. Conf. on Field Programmable Logic and Applications (FPL), pp. 1–8. 2016.

[164] E. Hung and S. J. E. Wilton. “Incremental Trace-Buffer Insertion for FPGA Debug.” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 22 (4), pp. 850–863, 2014.

[165] K. E. Murray, S. Whitty, S. Liu, J. Luu, and V. Betz. “Titan: Enabling Large and Complex Benchmarks in Academic CAD.” In Int. Conf. on Field Programmable Logic and Applications (FPL). 2013.

[166] E. Hung, F. Eslami, and S. J. E. Wilton. “Escaping the Academic Sandbox: Realizing VPR Circuits on Xilinx Devices.” In Int. Symp. on Field-Programmable Custom Computing Machines (FCCM), pp. 45–52. 2013.

[167] E. Hung. “Mind the (synthesis) gap: Examining where academic FPGA tools lag behind industry.” In International Conference on Field Programmable Logic and Applications (FPL), pp. 1–4. 2015.

[168] J. H. Kim and J. H. Anderson. “Synthesizable Standard Cell FPGA Fabrics Targetable by the Verilog-to-Routing CAD Flow.” ACM Trans. Reconfigurable Technol. Syst., 10 (2), pp. 11:1–11:23, 2017.

[169] J. Teifel, M. E. Land, and R. D. Miller. “Improving ASIC Reuse with Embedded FPGA Fabrics.” In Government Microcircuit Applications & Critical Technology Conference. 2016.

[170] B. Grady and J. H. Anderson. “Synthesizable Verilog Backend for the VTR FPGA Evaluation Framework.” In Int. Conf. on Field-Programmable Technology (FPT). 2018.

[171] B. Chauviere, A. Alacchi, E. Giacomin, X. Tang, and P.-E. Gaillardon. “OpenFPGA: Complete Open Source Framework for FPGA Prototyping.” In Workshop on Open Source Design Automation. 2019.

[172] A. Li and D. Wentzlaff. “PRGA: An Open-source Framework for Building and Using Custom FPGAs.” In Workshop on Open Source Design Automation. 2019.

[173] G. Yu, et al. “FPGA with Improved Routability and Robustness in 130nm CMOS with Open-Source CAD Targetability.” ArXiv e-prints, 2017. arXiv:1712.03411.

[174] H. J. Liu. Archipelago - An Open Source FPGA with Toolflow Support. Master’s thesis, EECS Department, University of California, Berkeley, 2014. http://www2.eecs.berkeley.edu/Pubs/TechRpts/2014/EECS-2014-43.html.

[175] J. Tian, et al. “A field programmable transistor array featuring single-cycle partial/full dynamic reconfiguration.” In Design, Automation Test in Europe Conference Exhibition (DATE), 2017, pp. 1336–1341. 2017, doi:10.23919/DATE.2017.7927200.

[176] QuickLogic Open Reconfigurable Computing (QORC) MCU + eFPGA SoC Open Source Software Tools. QuickLogic, 2020. https://ir.quicklogic.com/press-releases/detail/535/quicklogic-announces-open-reconfigurable-computing.

[177] “SymbiFlow Project.”, 2019. https://symbiflow.github.io/.

[178] J. Luu. Architecture-aware Packing and CAD Infrastructure for Field-Programmable Gate Arrays. Ph.D. thesis, University of Toronto, 2014. http://hdl.handle.net/1807/68469.

[179] Quartus 18.0. Intel Corporation, 2019. https://www.intel.ca/content/www/ca/en/software/programmable/quartus-prime/overview.html.

[180] “Vivado.”, Xilinx Corporation, 2019. https://www.xilinx.com/products/design-tools/vivado.html.

[181] “Arachne-pnr.”, 2019. https://github.com/YosysHQ/arachne-pnr.

[182] D. Shah, et al. “Yosys+nextpnr: an Open Source Framework from Verilog to Bitstream for Commercial FPGAs.” In Int. Symp. on Field-Programmable Custom Computing Machines (FCCM). 2019.

[183] P. Jamieson, K. B. Kent, F. Gharibian, and L. Shannon. “Odin II - An Open-Source Verilog HDL Synthesis Tool for CAD Research.” In Int. Symp. on Field-Programmable Custom Computing Machines (FCCM), pp. 149–156. 2010.

[184] R. Brayton and A. Mishchenko. “ABC: An Academic Industrial-Strength Verification Tool.” In T. Touili, B. Cook, and P. Jackson, eds., Computer Aided Verification, pp. 24–40. Springer Berlin Heidelberg, Berlin, Heidelberg, 2010.

[185] Berkeley Logic Synthesis and Verification Group. “ABC: A System for Sequential Synthesis and Verification.”, 2018. Revision 1fc200ffacabed1796639b562181051614f5fedb. http://www.eecs.berkeley.edu/~alanmi/abc/.

[186] C. Wolf. “Yosys Open SYnthesis Suite.”, 2019. http://www.clifford.at/yosys/.

[187] V. Betz and J. Rose. “Automatic Generation of FPGA Routing Architectures from High-level Descriptions.” In ACM/SIGDA Int. Symp. on Field Programmable Gate Arrays (FPGA), pp. 175–184. 2000.

[188] C. Huriaux, O. Sentieys, and R. Tessier. “Effects of I/O routing through column interfaces in embedded FPGA fabrics.” In Int. Conf. on Field Programmable Logic and Applications (FPL), pp. 1–9. 2016.

[189] Speedster7t FPGAs. Achronix Semiconductor, 2019. PB003 v1.0.

[190] Zynq UltraScale+ MPSoC Data Sheet. Xilinx Inc., 2018. DS891 v1.7.

[191] Stratix V Device Handbook. Altera Corporation, 2015. SV5V1.

[192] M. S. Abdelfattah and V. Betz. “Design tradeoffs for hard and soft FPGA-based Networks-on-Chip.” In Int. Conf. on Field-Programmable Technology (FPT), pp. 95–103. 2012.

[193] S. Wilton. Architectures and Algorithms for Field-Programmable Gate Arrays with Embedded Memories. Ph.D. thesis, University of Toronto, 1997.

[194] Xilinx Inc. “The Programmable Logic Data Book.”, 1994.

[195] Y.-W. Chang, D. F. Wong, and C. K. Wong. “Universal Switch Modules for FPGA Design.” ACM Trans. Des. Autom. Electron. Syst., 1 (1), pp. 80–101, 1996.

[196] D. Lewis, et al. “Architectural Enhancements in Stratix V.” In Proceedings of the ACM/SIGDA International Symposium on Field Programmable Gate Arrays, FPGA ’13, pp. 147–156. ACM, New York, NY, USA, 2013, doi:10.1145/2435264.2435292.

[197] O. Petelin. CAD Tools and Architectures for Improved FPGA Interconnect. Master’s thesis, University of Toronto, 2016. http://hdl.handle.net/1807/75854.

[198] UltraRAM: Breakthrough Embedded Memory Integration on UltraScale+ Devices. Xilinx Inc., 2016. WP477 v1.0.

[199] Versal Architecture and Product Data Sheet. Xilinx Inc., 2019. DS950 v1.1.

[200] S. Sivaswamy, et al. “HARP: Hard-wired Routing Pattern FPGAs.” In ACM/SIGDA Int. Symp. on Field-programmable Gate Arrays (FPGA), pp. 21–29. 2005.

[201] X. Sun, H. Zhou, and L. Wang. “Bent Routing Pattern for FPGA.” In 2019 29th Inter- national Conference on Field Programmable Logic and Applications (FPL), pp. 9–16. 2019, doi:10.1109/FPL.2019.00012.

[202] D. Lewis, et al. “The Stratix 10 Highly Pipelined FPGA Architecture.” In Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, FPGA ’16, pp. 159–168. ACM, New York, NY, USA, 2016, doi:10.1145/2847263.2847267.

[203] Stratix IV Device Handbook. Altera, 2016.

[204] J. Luu, J. Rose, and J. Anderson. “Towards Interconnect-adaptive Packing for FPGAs.” In ACM/SIGDA Int. Symp. on Field-programmable Gate Arrays (FPGA), pp. 21–30. 2014.

[205] A. DeHon. “Balancing Interconnect and Computation in a Reconfigurable Computing Array (or, Why You Don’t Really Want 100% LUT Utilization).” In ACM/SIGDA Int. Symp. on Field-programmable Gate Arrays (FPGA), pp. 69–78. 1999.

[206] Z. Abuowaimer, et al. “GPlace3.0: Routability-Driven Analytic Placer for UltraScale FPGA Architectures.” ACM Trans. Des. Autom. Electron. Syst., 23 (5), pp. 66:1–66:33, 2018.

[207] W. Feng, J. Greene, K. Vorwerk, V. Pevzner, and A. Kundu. “Rent’s Rule Based FPGA Packing for Routability Optimization.” In ACM/SIGDA Int. Symp. on Field-programmable Gate Arrays (FPGA), pp. 31–34. 2014.

[208] D. T. Chen, K. Vorwerk, and A. Kennings. “Improving Timing-Driven FPGA Packing with Physical Information.” In Int. Conf. on Field Programmable Logic and Applications (FPL), pp. 117–123. 2007.

[209] A. Singh and M. Marek-Sadowska. “Efficient Circuit Clustering for Area and Power Reduction in FPGAs.” In ACM/SIGDA Int. Symp. on Field-programmable Gate Arrays (FPGA), pp. 59–66. 2002.

[210] K. Honda, T. Imagawa, and H. Ochi. “Placement algorithm for mixed-grained reconfigurable architecture with dedicated carry chain.” In IEEE International System-on-Chip Conference (SOCC), pp. 80–85. 2017.

[211] G. G. Lemieux and S. D. Brown. “A Detailed Routing Algorithm for Allocating Wire Segments in Field-Programmable Gate Arrays.” In Proc. Physical Design Workshop, pp. 215–226. 1993.

[212] M. J. Alexander and G. Robins. “New performance-driven FPGA routing algorithms.” IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 15 (12), pp. 1505–1517, 1996, doi:10.1109/43.552083.

[213] K. So. “Enforcing Long-path Timing Closure for FPGA Routing with Path Searches on Clamped Lexicographic Spirals.” In Field Programmable Gate Arrays, pp. 24–34. 2008.

[214] X. Chen, J. Zhu, and M. Zhang. “Timing-Driven Routing of High Fanout Nets.” In Field Programmable Logic and Appl., pp. 423–428. 2011, doi:10.1109/FPL.2011.84.

[215] D. Vercruyce, E. Vansteenkiste, and D. Stroobandt. “CRoute: A Fast High-quality Timing-driven Connection-based FPGA Router.” In Field-Programmable Custom Comput. Mach. 2019.

[216] M. Gort and J. H. Anderson. “Accelerating FPGA Routing Through Parallelization and Engineering Enhancements.” IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 31 (1), pp. 61–74, 2012, doi:10.1109/TCAD.2011.2165715.

[217] J. S. Swartz, V. Betz, and J. Rose. “A Fast Routability-driven Router for FPGAs.” In Int. Symp. on Field Programmable Gate Arrays, pp. 140–149. 1998.

[218] P. E. Hart, N. J. Nilsson, and B. Raphael. “A Formal Basis for the Heuristic Determination of Minimum Cost Paths.” IEEE Trans. on Syst. Sci. and Cybern., 4 (2), pp. 100–107, 1968.

[219] R. Y. Rubin and A. M. DeHon. “Timing-driven Pathfinder Pathology and Remediation: Quantifying and Reducing Delay Noise in VPR-pathfinder.” In Field Programmable Gate Arrays, pp. 173–176. 2011.

[220] M. Hutton, D. Karchmer, B. Archell, and J. Govig. “Efficient Static Timing Analysis and Applications Using Edge Masks.” In ACM/SIGDA 13th Int. Symp. on Field-programmable Gate Arrays (FPGA), pp. 174–183. 2005.

[221] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of Reusable Object-Oriented Software. Addison-Wesley Professional Computing Series. Pearson Education, 1994.

[222] “gcov: A Test Coverage Program.”, 2019. https://gcc.gnu.org/onlinedocs/gcc/Gcov.html.

[223] “Flex: The Fast Lexical Analyzer.”, 2019. https://www.gnu.org/software/flex/.

[224] “Bison.”, 2019. https://www.gnu.org/software/bison/.

[225] “Pugixml: Light-weight, simple and fast XML parser for C++.”, 2019. https://pugixml.org/.

[226] G. Chen and J. Cong. “Simultaneous Timing Driven Clustering and Placement for FPGAs.” In Int. Conf. on Field Programmable Logic and Applications (FPL), pp. 158–167. 2004.

[227] C. Yu and Z. Zhang. “Painting on Placement: Forecasting Routing Congestion Using Conditional Generative Adversarial Nets.” In Proceedings of the 56th Annual Design Automation Conference 2019, DAC ’19. Association for Computing Machinery, New York, NY, USA, 2019, doi:10.1145/3316781.3317876.

[228] B. Chauviere, A. Alacchi, E. Giacomin, X. Tang, and P.-E. Gaillardon. “OpenFPGA: Complete Open Source Framework for FPGA Prototyping.” In Workshop on Open Source Design Automation. 2019.

[229] V. Betz and J. Rose. “Automatic Generation of FPGA Routing Architectures from High-Level Descriptions.” In Proceedings of the 2000 ACM/SIGDA Eighth International Symposium on Field Programmable Gate Arrays, FPGA ’00, pp. 175–184. Association for Computing Machinery, New York, NY, USA, 2000, doi:10.1145/329166.329203.

[230] “FPGA ASM (FASM) Specification.”, 2019. https://symbiflow.readthedocs.io/en/latest/fasm/docs/specification.html.

[231] C. Xu, et al. “A Parallel Bandit-Based Approach for Autotuning FPGA Compilation.” In ACM/SIGDA Int. Symp. on Field Programmable Gate Arrays, pp. 157–166. 2017.

[232] N. Kapre, H. Ng, K. Teo, and J. Naude. “InTime: A Machine Learning Approach for Efficient Selection of FPGA CAD Tool Parameters.” In ACM/SIGDA Int. Symp. on Field Programmable Gate Arrays, pp. 23–26. 2015.

[233] J. Kober and J. Peters. Reinforcement Learning in Robotics: A Survey, pp. 9–67. Springer International Publishing, 2014.

[234] B. Zoph and Q. V. Le. “Neural Architecture Search with Reinforcement Learning.” ArXiv, abs/1611.01578, 2016.

[235] L. Li, W. Chu, J. Langford, and R. E. Schapire. “A contextual-bandit approach to personalized news article recommendation.” Int. Conf. on World Wide Web, 2010, doi:10.1145/1772690.1772758.

[236] D. Silver, et al. “A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play.” Science, 362 (6419), pp. 1140–1144, 2018, doi:10.1126/science.aar6404.

[237] O. Vinyals, et al. “Grandmaster level in StarCraft II using multi-agent reinforcement learning.” Nature, 2019.

[238] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. The MIT Press, 2nd edition, 2018.

[239] K. Vorwerk, A. Kennings, and J. W. Greene. “Improving Simulated Annealing-based FPGA Placement with Directed Moves.” Trans. Comp.-Aided Des. Integ. Cir. Sys., 28 (2), pp. 179–192, 2009, doi:10.1109/TCAD.2008.2009167.

[240] I. L. Markov, J. Hu, and M. Kim. “Progress and Challenges in VLSI Placement Research.” Proc. of the IEEE, 103 (11), pp. 1985–2003, 2015, doi:10.1109/JPROC.2015.2478963.

[241] V. Mnih, et al. “Playing Atari With Deep Reinforcement Learning.” In NIPS Deep Learning Workshop. 2013.

[242] D. Vercruyce, E. Vansteenkiste, and D. Stroobandt. “Liquid: High quality scalable placement for large heterogeneous FPGAs.” In Int. Conf. on Field Programmable Technology, pp. 17–24. 2017, doi:10.1109/FPT.2017.8280116.

[243] A. Hosny, S. Hashemi, M. Shalan, and S. Reda. “DRiLLS: Deep Reinforcement Learning for Logic Synthesis.” In 2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 581–586. 2020.

[244] Q. Huang, et al. “AutoPhase: Juggling HLS Phase Orderings in Random Forests with Deep Reinforcement Learning.”, 2020. arXiv:2003.00671.

[245] M. Goldenberg, N. Sturtevant, A. Felner, and J. Schaeffer. “The Compressed Differential Heuristic.” In Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, AAAI ’11, pp. 24–29. AAAI Press, 2011.

[246] A. V. Goldberg and C. Harrelson. “Computing the Shortest Path: A* Search Meets Graph Theory.” In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’05, pp. 156–165. Society for Industrial and Applied Mathematics, USA, 2005.

[247] S. Koenig and M. Likhachev. “Incremental A*.” In T. G. Dietterich, S. Becker, and Z. Ghahramani, eds., Advances in Neural Information Processing Systems 14, pp. 1539–1546. MIT Press, 2002.

[248] S. Koenig and M. Likhachev. “Fast replanning for navigation in unknown terrain.” IEEE Transactions on Robotics, 21 (3), pp. 354–363, 2005.

[249] “Google OR-Tools.”, 2017. https://developers.google.com/optimization.

[250] A. Alaghi and J. P. Hayes. “STRAUSS: Spectral Transform Use in Stochastic Circuit Synthesis.” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 34 (11), pp. 1770–1783, 2015, doi:10.1109/TCAD.2015.2432138.