Chapter 8 Timing Closure
Recommended publications
Chapter 8 – Timing Closure VLSI Physical Design
© KLMH
Chapter 8 – Timing Closure
VLSI Physical Design: From Graph Partitioning to Timing Closure
Original Authors: Andrew B. Kahng, Jens Lienig, Igor L. Markov, Jin Hu

Chapter 8 – Timing Closure
8.1 Introduction
8.2 Timing Analysis and Performance Constraints
    8.2.1 Static Timing Analysis
    8.2.2 Delay Budgeting with the Zero-Slack Algorithm
8.3 Timing-Driven Placement
    8.3.1 Net-Based Techniques
    8.3.2 Embedding STA into Linear Programs for Placement
8.4 Timing-Driven Routing
    8.4.1 The Bounded-Radius, Bounded-Cost Algorithm
    8.4.2 Prim-Dijkstra Tradeoff
    8.4.3 Minimization of Source-to-Sink Delay
8.5 Physical Synthesis
    8.5.1 Gate Sizing
    8.5.2 Buffering
    8.5.3 Netlist Restructuring
8.6 Performance-Driven Design Flow
8.7 Conclusions

8.1 Introduction
[Figure: VLSI design flow. System Specification; Architectural Design; Functional Design and Logic Design; Circuit Design; Physical Design (Partitioning, Chip Planning, Placement, Clock Tree Synthesis, Signal Routing, Timing Closure); Physical Verification and Signoff (DRC, LVS, ERC); Fabrication; Packaging and Testing; Chip]

8.1 Introduction
• IC layout must satisfy geometric constraints and timing constraints
  − Setup (long-path) constraints
  − Hold (short-path) constraints
• Chip designers must complete timing closure
  − Optimization process that meets timing constraints
  − Integrates point optimizations discussed in previous chapters, e.g., placement and routing, with specialized methods to improve circuit performance

8.1 Introduction
Components of timing closure covered in this lecture:
• Timing-driven placement (Sec.
Implementation Considerations for “Soft” Embedded Programmable Logic Cores
IMPLEMENTATION CONSIDERATIONS FOR “SOFT” EMBEDDED PROGRAMMABLE LOGIC CORES
by James Cheng-Huan Wu
B.A.Sc., University of British Columbia, 2002
A thesis submitted in partial fulfillment of the requirements for the degree of Master of Applied Science in The Faculty of Graduate Studies, Department of Electrical and Computer Engineering
The University of British Columbia, September 2004
© James C.H. Wu, 2004

ABSTRACT
As integrated circuits become increasingly more complex and expensive, the ability to make post-fabrication changes will become much more attractive. This ability can be realized using programmable logic cores. Currently, such cores are available from vendors in the form of “hard” macro layouts. An alternative approach for fine-grain programmability is possible: vendors supply an RTL version of their programmable logic fabric that can be synthesized using standard cells. Although this technique may suffer in terms of speed, density, and power overhead, the task of integrating such cores is far easier than the task of integrating “hard” cores into an ASIC or SoC. When the required amount of programmable logic is small, this ease of use may be more important than the increased overhead.

In this thesis, we identify potential implementation issues associated with such cores, and investigate in depth the area, speed and power overhead of using this approach. Based on this investigation, we attempt to improve the performance of programmable cores created in this manner. Using a test-chip implementation, we identify three main issues: core size selection, I/O connections, and clock-tree synthesis.
Chapter 8 – Timing Closure
© KLMH
VLSI Physical Design: From Graph Partitioning to Timing Closure
Chapter 8 – Timing Closure
Original Authors: Andrew B. Kahng, Jens Lienig, Igor L. Markov, Jin Hu

Chapter 8 – Timing Closure
8.1 Introduction
8.2 Timing Analysis and Performance Constraints
    8.2.1 Static Timing Analysis
    8.2.2 Delay Budgeting with the Zero-Slack Algorithm
8.3 Timing-Driven Placement
    8.3.1 Net-Based Techniques
    8.3.2 Embedding STA into Linear Programs for Placement
8.4 Timing-Driven Routing
    8.4.1 The Bounded-Radius, Bounded-Cost Algorithm
    8.4.2 Prim-Dijkstra Tradeoff
    8.4.3 Minimization of Source-to-Sink Delay
8.5 Physical Synthesis
    8.5.1 Gate Sizing
    8.5.2 Buffering
    8.5.3 Netlist Restructuring
8.6 Performance-Driven Design Flow
8.7 Conclusions

8.1 Introduction
[Figure: VLSI design flow. System Specification; Architectural Design; Functional Design and Logic Design; Circuit Design; Physical Design (Partitioning, Chip Planning, Placement, Clock Tree Synthesis, Signal Routing, Timing Closure); Physical Verification and Signoff (DRC, LVS, ERC); Fabrication; Packaging and Testing; Chip]

8.1 Introduction
• IC layout must satisfy geometric constraints, electrical constraints, power & thermal constraints as well as timing constraints
  − Setup (long-path) constraints
  − Hold (short-path) constraints
• Chip designers must complete timing closure
  − Optimization process that meets timing constraints
  − Integrates point optimizations discussed in previous chapters, e.g., placement and routing, with specialized methods to improve circuit performance

8.1 Introduction
Components of timing closure covered in this lecture:
• Timing-driven placement (Sec.
Timing Closure Problem: Review of Challenges at Advanced Process Nodes and Solutions
IETE Technical Review
ISSN: 0256-4602 (Print) 0974-5971 (Online)
Journal homepage: http://www.tandfonline.com/loi/titr20

Timing Closure Problem: Review of Challenges at Advanced Process Nodes and Solutions
Sneh Saurabh, Hitarth Shah and Shivendra Singh
Department of ECE, IIIT, Delhi, India

To cite this article: Sneh Saurabh, Hitarth Shah & Shivendra Singh (2018): Timing Closure Problem: Review of Challenges at Advanced Process Nodes and Solutions, IETE Technical Review
To link to this article: https://doi.org/10.1080/02564602.2018.1531733
Published online: 22 Oct 2018.

ABSTRACT
Attaining timing closure marks the culmination of an arduous VLSI design process. The targets set for timing closure and the time taken to achieve it can critically impact the success of a product in a highly competitive semiconductor market. Therefore, methodologies employed in VLSI design process are strategized to attain quick timing closure along with reasonable design metrics. However, at advanced process nodes, attaining timing closure becomes quite challenging. As a result, at advanced process nodes, innovative solutions are required to be incorporated in VLSI design processes, as well as in Electronic Design Automation (EDA) tools and technologies. In this review paper, we discuss timing closure problem explaining the root cause of its difficulty.

KEYWORDS
Advanced process nodes; Design flow; Gate-delay; Process variations; Timing closure; Wire-delay
FPGA Emulation for Critical-Path Coverage Analysis
FPGA Emulation for Critical-Path Coverage Analysis
by Kyle Balston
B.A.Sc., Simon Fraser University, 2010
A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Applied Science in THE FACULTY OF GRADUATE STUDIES (Electrical and Computer Engineering)
The University Of British Columbia (Vancouver), October 2012
© Kyle Balston, 2012

Abstract
A major task in post-silicon validation is timing validation: it can be incredibly difficult to ensure a new chip meets timing goals. Post-silicon validation is the first opportunity to check timing with real silicon under actual operating conditions and workloads. However, post-silicon tests suffer from low observability, making it difficult to properly quantify test quality for the long-running random and directed system-level tests that are typical in post-silicon. In this thesis, we propose a technique for measuring the quality of long-running system-level tests used for timing coverage through the use of on-chip path monitors to be used with FPGA emulation. We demonstrate our technique on a non-trivial SoC, measuring the coverage of 2048 paths (selected as most critical by static timing analysis) achieved by some pre-silicon system-level tests, a number of well-known benchmarks, booting Linux, and executing randomly generated programs. The results show that the technique is feasible, with area and timing overheads acceptable for pre-silicon FPGA emulation.

Preface
Preliminary work related to this thesis has been published in a conference paper and a journal paper. The work itself will be published as an invited conference paper. The first paper, Post-Silicon Code Coverage Evaluation with Reduced Area Overhead for Functional Verification of SoC, was published as [33].
Appendix A Solutions to Chapter Exercises
Appendix A
Solutions to Chapter Exercises

Chapter 2: Netlist and System Partitioning
Chapter 3: Chip Planning
Chapter 4: Global and Detailed Placement
Chapter 5: Global Routing
Chapter 6: Detailed Routing
Chapter 7: Specialized Routing
Chapter 8: Timing Closure

Chapter 2: Netlist and System Partitioning

Exercise 1: KL Algorithm
Maximum gains and the swaps of nodes with positive gain per iteration i are given.
i = 1: Δg1 = D(a) + D(f) – 2c(a,f) = 2 + 1 – 0 = 3 → swap nodes a and f, G1 = 3.
i = 2: Δg2 = D(c) + D(e) – 2c(c,e) = -2 + 0 – 0 = -2 → swap nodes c and e, G2 = 1.
i = 3: Δg3 = D(b) + D(d) – 2c(b,d) = -1 + 0 – 0 = -1 → swap nodes b and d, G3 = 0.
Maximum gain is Gm = 3 with m = 1. Therefore, swap nodes a and f.
[Figure: Graph after Pass 1 (right), with partitions A and B and nodes a, b, c, d, e, f]

Exercise 2: Critical Nets and Gain During the FM Algorithm
(a) Table recording (1) the cell moved, (2) critical nets before the move, (3) critical nets after the move, and (4) which cells require a gain update.

Cell moved | Critical nets before the move | Critical nets after the move | Which cells require a gain update
a          | --                            | N1                           | b
b          | --                            | N1                           | a
c          | --                            | N1                           | d
d          | N3                            | N1,N3                        | c,h,i
e          | N2                            | N2                           | f,g
f          | N2                            | N2                           | e,g
g          | N2                            | N2                           | e,f
h          | N3                            | N3                           | d,i
i          | N3                            | N3                           | d,h

(b) Gains for each cell.
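The gain bookkeeping in Exercise 1 (KL Algorithm) above can be checked with a few lines of Python. This is only a minimal sketch: the D values and connection costs are copied from the three iterations listed in the solution, while the names `swap_gain`, `steps`, and `best_m` are hypothetical and not from the book.

```python
from itertools import accumulate

def swap_gain(d_a, d_b, c_ab):
    """Gain of swapping two nodes: D(a) + D(b) - 2*c(a,b)."""
    return d_a + d_b - 2 * c_ab

# (node 1, node 2, D(node 1), D(node 2), c(node 1, node 2)),
# taken from iterations i = 1..3 of the Exercise 1 solution.
steps = [
    ("a", "f", 2, 1, 0),
    ("c", "e", -2, 0, 0),
    ("b", "d", -1, 0, 0),
]

# Per-iteration gains (delta g_i) and cumulative gains (G_i).
gains = [swap_gain(d1, d2, c) for _, _, d1, d2, c in steps]
cumulative = list(accumulate(gains))

# The pass keeps the first m swaps, where m maximizes the cumulative gain.
best_m = max(range(len(cumulative)), key=lambda i: cumulative[i]) + 1

print(gains, cumulative, best_m)
```

Under these inputs the sketch reproduces the values in the solution: per-iteration gains 3, -2, -1, cumulative gains 3, 1, 0, and m = 1, so only the swap of a and f is kept.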
The Coming of Age of Physical Synthesis
The Coming of Age of Physical Synthesis
Charles J. Alpert, Manager, Design Productivity Group, IBM Research, Austin, TX 78758
Chris Chu, Associate Professor, Department of ECE, Iowa State University, Ames, IA 50011
Paul G. Villarrubia, Senior Technical Staff Member, Server and Technology Group, IBM Corp., Austin, TX 78758

Abstract: Physical synthesis, the integration of logic synthesis with physical design information, was born in the mid to late 1990s, which means it is about to enter its teenage years. Today, physical synthesis tools are a major part of the EDA industry, accounting for hundreds of millions of dollars in revenue. This work looks at how technology and design trends have affected physical synthesis over the last decade and also how physical synthesis will continue to evolve on its way to adulthood.

I. INTRODUCTION
Physical synthesis is motivated by the technology trend that is driving the increasing magnitude of wire delay as a percentage of total circuit performance. Several technology generations ago, when wire delays were insignificant, the

[...] targets, designer cell movement constraints, and routability directives.
3. Timing Analysis. After placement, one can run static timing analysis to see how much timing degradation has occurred and identify nets with signal integrity problems.
4. Electrical Correction. After placement, one may find gates which drive load above the allowed specification and long wires for which the signal exceeds the designer specified slew rate. Electrical correction fixes these problems, typically through buffering and gate sizing, thereby getting the design into a reasonably good timing state. One can also employ a logical effort [2] type of approach to get the design into a reasonably good global state in terms of timing closure.
Common Path Pessimism Removal in Static Timing Analysis
Scholars' Mine
Masters Theses, Student Theses and Dissertations
Summer 2015

Common path pessimism removal in static timing analysis
Chunyu Wang

Recommended Citation
Wang, Chunyu, "Common path pessimism removal in static timing analysis" (2015). Masters Theses. 7440. https://scholarsmine.mst.edu/masters_theses/7440

COMMON PATH PESSIMISM REMOVAL IN STATIC TIMING ANALYSIS
by CHUNYU WANG
A THESIS Presented to the Faculty of the Graduate School of the MISSOURI UNIVERSITY OF SCIENCE AND TECHNOLOGY In Partial Fulfillment of the Requirements for the Degree MASTER OF SCIENCE IN COMPUTER ENGINEERING
2015
Approved by Yiyu Shi, Advisor; Minsu Choi; Jun Fan
2015 Chunyu Wang, All Rights Reserved

ABSTRACT
Static timing analysis is a key process to guarantee timing closure for modern IC designs. However, additional pessimism can significantly increase the difficulty to achieve timing closure. Common path pessimism removal (CPPR) is a prevalent step to achieve accurate timing signoff. To speed up the existing exhaustive exploration on all paths in a design, this thesis introduces a fast multi-threading timing analysis for removing common path pessimism based on block-based static timing analysis. Experimental results show that the proposed method has faster runtime in removing excess pessimism from clock paths.
Introducing Vivado ML
Introducing Vivado® ML Editions
Nick Ni, Director of Marketing, Software & AI Solutions
© Copyright 2021 Xilinx

EDA Challenges Today
1. Lack of QoR Breakthrough
   Typically in 1-3% gain range
2. Capacity + Routability
   Compile and iteration time rising
[Figure: Design Sizes - 38% CAGR. Years and process nodes: 2011 (28nm), 2014 (20nm), 2016 (16nm), 2019 (7nm). Transistor counts shown: 7B, 19B, 35B, 92B.]

Machine Learning is All About Data
EDA = decades of data + massive design reuse
[Figure: Training and Inference. Applications: Classification, Object Detection, Segmentation, Speech Recognition, Anomaly Detection. Now… EDA]

Machine Learning: Next Big Leap for QoR
Tipping point: EDA engineers “know-how” heuristics to data-driven ML models
Breakthrough in QoR gain: 1% → 10%+
Faster design iteration for a QoR target

EDA Task                    | ML
AIG/MIG optimizer selection | DNN
Classify optimal flow       | CNN
Generate optimal flow       | GCN, RL
Placement decisions         | GCN, RL
Evaluate potential paths    | SVM, NN
Routing congestion          | CNN, GAN, ML, MARS
Predict wirelength          | LDA, KNN

Table Source: Guyue Huang, et al. 2021. Machine Learning for Electronic Design Automation: A Survey. arXiv:2102.03357
Photo credit: Rohit Sharma, Machine Intelligence in Design Automation
CNN = Convolutional Neural Network, RL = Reinforcement Learning, KNN = K-Nearest-Neighbor, GAN = Generative Adversarial Network, SVM = Support Vector Machine, GCN = Graph Convolutional Networks, DNN = Deep Neural Network, MARS = Multivariate Adaptive Regression Splines, LDA = Linear Discriminant Analysis
Divide-And-Conquer Techniques for Large Scale FPGA Design by Kevin
Divide-and-Conquer Techniques for Large Scale FPGA Design
by Kevin Edward Murray
A thesis submitted in conformity with the requirements for the degree of Master of Applied Science, Graduate Department of Electrical and Computer Engineering, University of Toronto
© Copyright 2015 by Kevin Edward Murray

Abstract
The exponential growth in Field-Programmable Gate Array (FPGA) size afforded by Moore's Law has greatly increased the breadth and scale of applications suitable for implementation on FPGAs. However, the increasing design size and complexity challenge the scalability of the conventional approaches used to implement FPGA designs, making FPGAs difficult and time-consuming to use. This thesis investigates new divide-and-conquer approaches to address these scalability challenges. In order to evaluate the scalability and limitations of existing approaches, we present a new large FPGA benchmark suite suitable for exploring these issues. We then investigate the practicality of using latency insensitive design to decouple timing requirements and reduce the number of design iterations required to achieve timing closure. Finally we study floorplanning, a technique which spatially decomposes the FPGA implementation to expose additional parallelism during the implementation process. To evaluate the impact of floorplanning on FPGAs we develop Hetris, a new automated FPGA floorplanning tool.

Acknowledgements
First, I would like to thank my supervisor Vaughn Betz. His suggestions and feedback have been invaluable in improving the quality of this work. Furthermore, I am deeply appreciative of the time and effort he has invested in mentoring me.
WP331 (V1.0.2) January 18, 2008
White Paper: ISE 6.1i
WP331 (v1.0.2) January 18, 2008
Timing Closure 6.1i
By: Rhett Whatcott

NOTE: The material contained within this document is out of date, but it is provided as a historical reference.

The high performance of today's Xilinx devices can overcome the speed limitations of other technologies and older devices. Furthermore, designs that formerly only fit or ran at high clock frequencies in an ASIC are finding their way into Xilinx FPGAs. However, it is imperative that designers have a proven methodology for obtaining their performance objectives. This article is written specifically to address timing closure in high-performance applications. Consider these guidelines a road map for improving performance and meeting your timing objectives. Similar to a road map, you may find detours, different and faster paths, and newer roads that will help you to achieve your goals.

© 2004-2008 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated brands included herein are trademarks of Xilinx, Inc. All other trademarks are the property of their respective owners.

Introduction
This updated flow includes new techniques, and updates to software flow and tools based on ISE™ 6.1i design tools and FPGA architectures.

Figure 1: Timing Closure Flowchart

Timing Closure Flow
1. Use proper coding techniques. This includes many topics expanded on below, but primarily involves the implementation of synchronous design techniques, use of Xilinx specific coding, and use of cores.
Fast and Flexible CAD for FPGAs and Timing Analysis by Kevin Edward
Fast and Flexible CAD for FPGAs and Timing Analysis
by Kevin Edward Murray
A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy, Graduate Department of Electrical and Computer Engineering, University of Toronto
© Copyright 2020 by Kevin Edward Murray

Abstract
Field Programmable Gate Arrays (FPGAs) are a widely used platform for hardware acceleration and digital systems design, which offer high performance and power efficiency while remaining re-programmable. However a major challenge to using FPGAs is the complex Computer Aided Design (CAD) flow required to program them, which makes FPGAs difficult and time-consuming to use. This thesis investigates techniques to address these challenges.

A key component of the CAD flow is timing analysis, which is used to verify correctness and drive optimizations. Since the dominant form of timing analysis, Static Timing Analysis (STA), is time consuming to perform, we investigate methods to efficiently perform STA in parallel. We then develop Extended Static Timing Analysis (ESTA), which performs a more detailed and less pessimistic form of timing analysis by accounting for the probability of signal transitions.

We next study a variety of enhancements to the FPGA physical design flow, including to key optimization stages like packing, placement and routing. These techniques are combined in the open-source VTR 8 design flow, and significantly outperform previous methods. We also develop advanced architecture modelling capabilities which allow VTR 8 to target a wider range of FPGA architectures.