Hardware Acceleration of Electronic Design Automation
Total Page:16
File Type:pdf, Size:1020Kb
HARDWARE ACCELERATION OF ELECTRONIC DESIGN AUTOMATION ALGORITHMS A Dissertation by KANUPRIYA GULATI Submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY December 2009 Major Subject: Computer Engineering HARDWARE ACCELERATION OF ELECTRONIC DESIGN AUTOMATION ALGORITHMS A Dissertation by KANUPRIYA GULATI Submitted to the Office of Graduate Studies of Texas A&M University in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY Approved by: Chair of Committee, Sunil P. Khatri Committee Members, Peng Li Jim Ji Duncan M. Walker Desmond A. Kirkpatrick Head of Department, Costas N. Georghiades December 2009 Major Subject: Computer Engineering iii ABSTRACT Hardware Acceleration of Electronic Design Automation Algorithms. (December 2009) Kanupriya Gulati, B.E., Delhi College of Engineering, New Delhi, India; M.S., Texas A&M University Chair of Advisory Committee: Dr. Sunil P. Khatri With the advances in very large scale integration (VLSI) technology, hardware is going parallel. Software, which was traditionally designed to execute on single core microproces- sors, now faces the tough challenge of taking advantage of this parallelism, made available by the scaling of hardware. The work presented in this dissertation studies the accelera- tion of electronic design automation (EDA) software on several hardware platforms such as custom integrated circuits (ICs), field programmable gate arrays (FPGAs) and graphics processors. This dissertation concentrates on a subset of EDA algorithms which are heav- ily used in the VLSI design flow, and also have varying degrees of inherent parallelism in them. In particular, Boolean satisfiability, Monte Carlo based statistical static timing analysis, circuit simulation, fault simulation and fault table generation are explored. The architectural and performance tradeoffs of implementing the above applications on these alternative platforms (in comparison to their implementation on a single core micropro- cessor) are studied. In addition, this dissertation also presents an automated approach to accelerate uniprocessor code using a graphics processing unit (GPU). The key idea is to partition the software application into kernels in an automated fashion, such that multiple instances of these kernels, when executed in parallel on the GPU, can maximally benefit from the GPU’s hardware resources. The work presented in this dissertation demonstrates that several EDA algorithms can be successfully rearchitected to maximally harness their performance on alternative plat- forms such as custom designed ICs, FPGAs and graphic processors, and obtain speedups iv upto 800 . The approaches in this dissertation collectively aim to contribute towards en- × abling the computer aided design (CAD) community to accelerate EDA algorithms on ar- bitrary hardware platforms. v To My Grandmas, Late Gurcharan Kaur Gulati and Prakash Kaur Arora, for their unassailable faith and patience. vi ACKNOWLEDGMENTS I can no other answer make, but thanks, and thanks, and ever thanks. - William Shakespeare This acknowledgment is an insufficient platform to express my deep sense of gratitude, but here’s my heartfelt attempt. In order to present this dissertation in its entirety, I cannot not acknowledge my Ph.D. adviser and mentor Dr. Sunil Khatri for his remarkable guidance, undying enthusiasm, high expertise and especially his let’s get it done attitude. But for his constant support and timely critisism, this work may have never seen the light of day. My thanks are due to my mentor at my Intel internships, and my Ph.D. committee member, Dr. Desmond Kirkpatrick, for inspiring me with his ingenious knowledge base, and for providing me the opportunity to intern at Strategic CAD Labs (SCL), and for ex- tending his confidence towards my research. I am grateful to my Ph.D. committee members, Drs. Hank Walker, Peng Li and Jim Ji, for their valuable feedback and encouragement for my research, and even more for being outstanding teachers. During the (rather long) course of my graduate studies, I have had the opportunity to work with some extremely smart students in my research group. I would especially like to thank Nikhil and Rajesh, for being excellent lab mates, and inspiring me with their intense sincerity and hard work; Suganth, for letting me win at squash (at least until he learned it); Kalyan, for sharing a (teeny) part of his Linux expertise; Karan and Charu, for their tenacity at work and cheerful demeanor; and all past and present group members, for their infectious dedication. I was fortunate to befriend some excellent folks at Texas A&M University. I thank vii Prasenjit, for helping me believe over ’sushi and fish fry’ that ’it all works out’; Richa, for interesting conversations over ’chai and paranthas’; Rouella and Harneet, for insanely delicious home-cooked dinners; and Gaurav for always ensuring that I ’Don’t Panic’, for driving thousands of miles and for being patient with me (more often than not). I am grateful to my friends and family who helped me in ways one too many to list. Above all, I owe it to my family - Mom, Dad, Samaira, Geety and Ashu, for being there, always. viii TABLE OF CONTENTS CHAPTER Page I INTRODUCTION .......................... 1 I-A. Hardware Platforms Considered in This Dissertation . 4 I-B. EDAAlgorithmsStudiedinThisDissertation . 5 I-B.1. ControlDominatedApplications . 6 I-B.2. ControlPlusDataParallelApplications . 8 I-C. Automated Approach for GPU Based Software Acceleration 11 I-D. ChapterSummary...................... 12 II HARDWARE PLATFORMS ..................... 14 II-A. ChapterOverview. 14 II-B. Introduction ......................... 15 II-C. Hardware Platforms Studied in This Dissertation . 15 II-C.1. CustomICs ..................... 15 II-C.2. FPGAs........................ 16 II-C.3. GraphicsProcessors. 16 II-D. General Overview and Architecture . 17 II-E. ProgrammingModelandEnvironment . 21 II-F. Scalability.......................... 23 II-G. DesignTurn-aroundTime. 24 II-H. Performance......................... 24 II-I. CostofHardware ...................... 27 II-J. FloatingPointOperations. 27 II-K. SecurityandRealTimeApplications . 28 II-L. Applications......................... 29 II-M. ChapterSummary. 30 III GPU ARCHITECTURE AND THE CUDA PROGRAMMING MODEL ............................... 31 III-A. ChapterOverview. 31 III-B. Introduction ......................... 31 III-C. HardwareModel . .. .. .. .. .. .. 33 III-D. MemoryModel ....................... 34 III-E. ProgrammingModel . 37 III-F. ChapterSummary. 40 ix CHAPTER Page IV ACCELERATING BOOLEAN SATISFIABILITY ON A CUS- TOM IC ............................... 41 IV-A. ChapterOverview. 41 IV-B. Introduction ......................... 42 IV-C. PreviousWork........................ 45 IV-D. HardwareArchitecture . 47 IV-D.1. AbstractOverview . 47 IV-D.2. HardwareOverview. 48 IV-D.3. HardwareDetails . 49 IV-D.3.a. DecisionEngine . 49 IV-D.3.b. ClauseCell . 50 IV-D.3.c. BaseCell . 55 IV-D.3.d. PartitioningtheHardware . 59 IV-D.3.e. Inter-bankCommunication . 61 IV-E. AnExampleofConflictClauseGeneration. 64 IV-F. PartitioningtheCNFInstance. 66 IV-G. ExtractionoftheUnsatisfiableCore . 68 IV-H. ExperimentalResults . 70 IV-I. ChapterSummary. .. .. .. .. .. .. 76 V ACCELERATING BOOLEAN SATISFIABILITY ON AN FPGA . 77 V-A. ChapterOverview. 77 V-B. Introduction ......................... 78 V-C. PreviousWork........................ 79 V-D. HardwareArchitecture . 82 V-D.1. ArchitectureOverview . 82 V-E. Solving a CNF Instance Which Is Partitioned into SeveralBins......................... 83 V-F. PartitioningtheCNFInstance. 85 V-G. HardwareDetails . .. .. .. .. .. .. 87 V-H. ExperimentalResults . 90 V-H.1. CurrentImplementation. 90 V-H.2. PerformanceModel . 92 V-H.2.a. FPGAResources . 92 V-H.2.b. Clauses/VariableRatio . 93 V-H.2.c. CyclesVs.BinSize . 94 V-H.2.d. BinsTouchedVs.BinSize . 95 V-H.2.e. BinSize.................... 96 x CHAPTER Page V-H.3. Projections ..................... 97 V-I. ChapterSummary. .101 VI ACCELERATING BOOLEAN SATISFIABILITY ON A GRAPH- ICS PROCESSING UNIT ...................... 102 VI-A. ChapterOverview. .102 VI-B. Introduction . .103 VI-C. RelatedPreviousWork . .105 VI-D. OurApproach .. .. .. .. .. .. ..108 VI-D.1. SurveySATandtheGPU . .108 VI-D.1.a. SurveySAT . .109 VI-D.1.b. SurveySATontheGPU . 113 VI-D.1.c. SurveySATResultsontheGPU . 115 VI-D.2. MiniSAT Enhanced with Survey Propagation (MESP) .......................117 VI-E. ExperimentalResults . .120 VI-F. ChapterSummary. .121 VII ACCELERATING STATISTICAL STATIC TIMING ANALY- SIS USING GRAPHICS PROCESSORS ............... 123 VII-A. ChapterOverview. .123 VII-B. Introduction . .124 VII-C. PreviousWork. .127 VII-D. OurApproach . .129 VII-D.1. StaticTimingAnalysis(STA)ataGate . 130 VII-D.2. Statistical Static Timing Analysis (SSTA) at a Gate 133 VII-E. ExperimentalResults . .136 VII-F. ChapterSummary. .139 VIII ACCELERATING FAULT SIMULATION USING GRAPHICS PROCESSORS ............................ 141 VIII-A. ChapterOverview. .141 VIII-B. Introduction . .142 VIII-C. PreviousWork. .144 VIII-D. OurApproach . .146 VIII-D.1. LogicSimulationataGate . 147 VIII-D.2. FaultInjectionataGate. 151 VIII-D.3. FaultDetectionataGate . 152 xi CHAPTER Page VIII-D.4. FaultSimulationofaCircuit . 153 VIII-E. ExperimentalResults . .155 VIII-F. ChapterSummary . .157 IX FAULT TABLE GENERATION USING GRAPHICS PROCESSORS 159 IX-A. ChapterOverview. .159 IX-B. Introduction . .160 IX-C. PreviousWork. .163 IX-D.