Parallel Multi-Core Verilog HDL Simulation
University of Massachusetts Amherst
ScholarWorks@UMass Amherst
Doctoral Dissertations, Dissertations and Theses
Summer August 2014

Parallel Multi-core Verilog HDL Simulation
Tariq B. Ahmad, University of Massachusetts Amherst

Part of the Computer and Systems Architecture Commons, Digital Circuits Commons, Hardware Systems Commons, and the VLSI and Circuits, Embedded and Hardware Systems Commons

Recommended Citation
Ahmad, Tariq B., "Parallel Multi-core Verilog HDL Simulation" (2014). Doctoral Dissertations. 45.
https://doi.org/10.7275/5382659
https://scholarworks.umass.edu/dissertations_2/45

This Open Access Dissertation is brought to you for free and open access by the Dissertations and Theses at ScholarWorks@UMass Amherst. It has been accepted for inclusion in Doctoral Dissertations by an authorized administrator of ScholarWorks@UMass Amherst.

PARALLEL MULTI-CORE VERILOG HDL SIMULATION

A Dissertation Presented
by
TARIQ BASHIR AHMAD

Submitted to the Graduate School of the University of Massachusetts Amherst in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

May 2014

Electrical and Computer Engineering

© Copyright by Tariq Bashir Ahmad 2014
All Rights Reserved

Approved as to style and content by:
Maciej J. Ciesielski, Chair
Sandip Kundu, Member
Michael Zink, Member
Charles Weems, Member
Christopher V. Hollot, Department Head, Electrical and Computer Engineering

To all those who believe.

ACKNOWLEDGMENTS

I would like to thank Professor Maciej Ciesielski for helping me when I needed it most and for his constant support and mentorship. I also want to thank all the committee members. I must thank Professor C.M. Krishna as well for his help. I am grateful to Dusung Kim for helping me start this project.
I want to acknowledge my friend Dr. Faisal M. Kashif for his constant support and mentorship. I am indebted to Fulbright (United States Educational Foundation in Pakistan) for their efforts to help me during my PhD. I cannot forget their favors, and I will always remember Dr. Grace Clark and Rita Akhtar for what they did for me. I must also mention that my technical life was transformed when I was offered an internship at Marvell, where I discovered my technical weaknesses and how to overcome them. I am greatly indebted to Awais Nemat and Guy Hutchison for their constant feedback, willingness to help, and guidance. It was because of their help, Dr. Faisal's help, and Fulbright's support that I was able to overcome a major obstacle in my PhD in the fall of 2010. The way to this internship started at the house of Amer Haider's parents in spring 2009. I must thank Amer, his mother Ayesha Haider, his father Muzaffar Haider, and the Hidaya Foundation for being hospitable and becoming the means to where I am today. I must thank Ameen Ashraf for helping me get an internship at Apple Computer in summer 2011. Last but not least, I want to thank my parents, my family, and everyone around me who has been a positive influence in my life.

ABSTRACT

PARALLEL MULTI-CORE VERILOG HDL SIMULATION

MAY 2014

TARIQ BASHIR AHMAD
B.S., GIK INSTITUTE OF ENGINEERING
M.S., UNIVERSITY OF MASSACHUSETTS AMHERST
Ph.D., UNIVERSITY OF MASSACHUSETTS AMHERST

Directed by: Professor Maciej J. Ciesielski

In the era of multi-core computing, the push for creating true parallel applications that can run on individual CPUs is on the rise. Application of parallel discrete event simulation (PDES) to hardware design verification looks promising, given the complexity of today's hardware designs. Unfortunately, the challenges imposed by lack of inherent parallelism, suboptimal design partitioning, synchronization and communication overhead, and load balancing render this approach largely ineffective.
This thesis presents three techniques for accelerating simulation at three levels of abstraction, namely RTL, functional gate-level (zero-delay), and gate-level timing. We review contemporary solutions and then propose new ways of speeding up simulation at the three levels of abstraction. We demonstrate the effectiveness of the proposed approaches on several industrial hardware designs.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
ABSTRACT
LIST OF TABLES
LIST OF FIGURES

CHAPTER

1. INTRODUCTION
   1.1 Importance of Simulation
   1.2 Problems with Parallel Simulation
       1.2.1 Design Partitioning
       1.2.2 Communication and Synchronization between Partitions
       1.2.3 Applicability of Parallel Simulation to Large Designs
   1.3 Parallel Simulation Applications
   1.4 Formal Verification
       1.4.1 Equivalence Checking (EC)
       1.4.2 Model Checking and Property Checking
   1.5 Static Timing Analysis
   1.6 Why Gate-level Simulation?

2. PREVIOUS WORK ON PARALLEL SIMULATION
   2.1 Factors Affecting the Performance of Parallel HDL Simulation
       2.1.1 Timing Granularity
       2.1.2 Hardware Architecture
       2.1.3 Issues in Design Partitioning
       2.1.4 Time Synchronization
   2.2 Prediction-based Parallel Simulation
   2.3 Multi-level Temporal Parallel Event-Driven Simulation
   2.4 Differences between Distributed Simulation and MULTES
   2.5 Parallel Computer Architecture
       2.5.1 Introduction
       2.5.2 New Trends in Computer Architecture
             2.5.2.1 Parallelism on Single Core Machine
       2.5.3 Classification of Parallel Architectures
             2.5.3.1 Single-Instruction, Multiple-Data (SIMD)
             2.5.3.2 Multiple-Instruction, Multiple-Data (MIMD)
             2.5.3.3 Symmetric Multiprocessing (SMP)
             2.5.3.4 Non-uniform Memory Access (NUMA)
       2.5.4 Memory Organization in Multi-core Machines
             2.5.4.1 Distributed Memory Machines (DMM)
             2.5.4.2 Shared Memory Machines (SMM)
       2.5.5 Thread Level Parallelism (TLP)

3. PARALLEL MULTI-CORE VERILOG HDL SIMULATION BASED ON FUNCTIONAL PARTITIONING
   3.1 Predicting Input Stimulus
   3.2 Preliminary Results of Predictor
   3.3 Quantitative Overhead Measurement in Multi-Core Simulation Environment
   3.4 Prediction-based Multi-Core Simulation
       3.4.1 Basic Idea
       3.4.2 Dealing with Mismatches
   3.5 Architecture of Prediction-based Gate-level Simulation
   3.6 Experiments on Real Designs
   3.7 Dealing with Resynthesized and Retimed Designs
   3.8 Conclusion
   3.9 Appendix A: Profiling
   3.10 Appendix B: Simulation Plots
   3.11 Appendix C: Designs Unsuitable for Multi-core Simulation

4. EXTENDING PARALLEL MULTI-CORE VERILOG HDL SIMULATION PERFORMANCE BASED ON DOMAIN PARTITIONING USING VERILATOR AND OPENMP
   4.1 Introduction
   4.2 Simulator Internals
   4.3 Parallelizing using OpenMP
   4.4 Results
   4.5 Dependencies in the Testbench

5. ACCELERATING RTL SIMULATION IN TEMPORAL DOMAIN
   5.1 Introduction
       5.1.1 Issues with Co-Simulation
       5.1.2 Issues with Multi-Core Simulators
   5.2 Temporal Parallel Simulation
       5.2.1 Preliminaries
       5.2.2 Integration with the current ASIC/FPGA design flow
   5.3 Exploring Circuit Unrolling option for Parallel Simulation
   5.4 Experiments and Results
       5.4.1 Setup
       5.4.2 Simulation of Small Custom Design Circuit
       5.4.3 Simulation by varying the Unroll factor (F)
       5.4.4 Simulation by varying the number of cores
   5.5 Multi-core Architecture of Temporal RTL Simulation
       5.5.1 Load Balancing in the Multi-core Architecture
       5.5.2 Simulation of industry standard design
   5.6 Conclusion

6. ACCELERATING GATE-LEVEL TIMING SIMULATION
   6.1 Introduction
       6.1.1 Issues with Simulation
   6.2 Hybrid Approach to Gate-level Timing Simulation
       6.2.1 Basic Concept
       6.2.2 Design Partitioning for Gate level Simulation
       6.2.3 Integration with the existing ASIC/FPGA Design Flow
       6.2.4 Early Gate-level Timing Simulation
   6.3 Experiments
       6.3.1 Experimental Setup
       6.3.2 Results
   6.4 Verification of Simulation Results
   6.5 New Gate-level Timing Simulation Flow
   6.6 Conclusion and Future Directions