UNIVERSITY of CALIFORNIA SAN DIEGO IC Physical Design

UNIVERSITY OF CALIFORNIA SAN DIEGO IC Physical Design Methodologies for Advanced Process Nodes A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Electrical Engineering (Computer Engineering) by Kwangsoo Han Committee in charge: Professor Andrew B. Kahng, Chair Professor Chung-Kuan Cheng Professor Rajesh Gupta Professor Ryan Kastner Professor Farinaz Koushanfar 2018 Copyright Kwangsoo Han, 2018 All rights reserved. The dissertation of Kwangsoo Han is approved, and it is ac- ceptable in quality and form for publication on microfilm and electronically: Chair University of California San Diego 2018 iii DEDICATION To my family, without whose love, encouragement and sacrifice this thesis would not have been finished. iv TABLE OF CONTENTS Signature Page . iii Dedication . iv Table of Contents . v List of Figures . viii List of Tables . xiii Acknowledgments . xv Vita ............................................. xvii Abstract of the Dissertation . xx Chapter 1 Introduction . 1 1.1 Challenges in Physical Design . 2 1.1.1 Limitation of Manufacturing . 4 1.1.2 Severe Process Variation . 5 1.1.3 Escalating Interconnect RC Delay . 7 1.2 This Thesis . 8 Chapter 2 Manufacturing-Aware Design Methodologies . 12 2.1 MILP-Based Optimization of 2D Block Masks for Timing-Aware Dummy Segment Removal in Self-Aligned Multiple Patterning Layouts . 13 2.1.1 Background . 16 2.1.2 MILP-based 2D Block Mask Optimization . 21 2.1.3 Overall Flow . 31 2.1.4 Experimental Setup and Results . 34 2.1.5 Conclusion . 42 2.2 Scalable Detailed Placement Legalization for Complex Sub-14nm Constraints . 43 2.2.1 N10 FEOL and Cell Placement Constraints . 43 2.2.2 Related Work . 47 2.2.3 Our Approach . 48 2.2.4 Experimental Setup and Results . 57 2.2.5 Conclusion . 61 2.3 Evaluation of BEOL Design Rule Impacts Using an Optimal ILP-based Detailed Router . 62 2.3.1 Related Work . 64 2.3.2 Optimal Routing Formulation . 65 2.3.3 Empirical Studies . 72 2.3.4 Conclusion . 79 v 2.4 Acknowledgments . 79 Chapter 3 Process-Aware Design Methodologies . 81 3.1 OCV-Aware Top-Level Clock Tree Optimization . 82 3.1.1 Motivation and Related Work . 82 3.1.2 Our Approach . 84 3.1.3 Clock Tree Optimization . 86 3.1.4 Experimental Setup and Results . 92 3.1.5 Conclusion . 98 3.2 A Global-Local Optimization Framework for Simultaneous Multi-Mode Multi-Corner Clock Skew Variation Reduction . 100 3.2.1 Related Work . 101 3.2.2 Problem Formulation . 102 3.2.3 Optimization Framework . 103 3.2.4 Experimental Setup and Results . 113 3.2.5 Conclusion . 117 3.3 Improved Performance of 3DIC Implementations Through Inherent Awareness of Mix-and-Match Die Stacking . 118 3.3.1 Related Work . 120 3.3.2 Problem Formulation . 122 3.3.3 ILP-Based Partitioning Methodology . 123 3.3.4 Heuristic Partitioning Methodology . 125 3.3.5 Experimental Setup and Results . 131 3.3.6 Conclusion . 135 3.4 Acknowledgments . 136 Chapter 4 Interconnect-Aware Design Methodologies . 137 4.1 Prim-Dijkstra Revisited: Achieving Superior Timing-driven Routing Trees . 138 4.1.1 Related Work . 141 4.1.2 Problem Formulation . 142 4.1.3 The PD-II Spanning Tree Construction . 144 4.1.4 The Detour-Aware Steinerization Algorithm (DAS)..... 148 4.1.5 Experimental Setup and Results . 149 4.1.6 Conclusion . 156 4.2 A Study of Optimal Cost-Skew Tradeoff and Remaining Suboptimality in Interconnect Tree Constructions . 158 4.2.1 Related Work . 160 4.2.2 Flow-based ILP Formulation . 162 4.2.3 Experimental Setup and Results . 167 4.2.4 Conclusion . 169 4.3 Optimal Generalized H-Tree Topology and Buffering for High-Performance and Low-Power Clock Distribution . 170 4.3.1 Motivation and Related Work . 172 4.3.2 Our Approach . 176 4.3.3 Experimental Setup and Results . 190 vi 4.3.4 Conclusion . 198 4.4 Acknowledgments . 199 Chapter 5 Conclusion . 200 Bibliography . 204 vii LIST OF FIGURES Figure 1.1: Design cost and transistor count trends [140]. 1 Figure 1.2: Design Capability Gap [102]. 2 Figure 1.3: Roadmap of future technology [193]. 3 Figure 1.4: History of timing signoff [101]. 3 Figure 1.5: Wafer price trend [215]. 4 Figure 1.6: Design rules and operations explosion [214]. 5 Figure 1.7: Timing corner explosion [101]. 6 Figure 1.8: Cost of design margin. 7 Figure 1.9: Interconnect RC increase as technology nodes scale down [44]. 7 Figure 1.10: Scope and organization of this thesis. 8 Figure 2.1: SAMP process: (a) post-route layout; (b) cut mask application; (c) layout after cut mask application; (d) block mask application; and (e) final layout after block mask application. 14 Figure 2.2: Block mask rules: (a) minimum width and length rules; (b) minimum over- lap rule; (c) minimum U-shape rule; and (d) minimum L-shape rule. 16 Figure 2.3: Illustration of the material selectivity-based block approach. 18 Figure 2.4: Comparison between selective block and non-selective block: (a) selective block mask in red removes only red segments; (b) selective block mask in green removes only green segments; and (c) a complex non-selective block mask is required to remove the same dummy segments. 19 Figure 2.5: Cut mask rules: minimum spacing. 20 Figure 2.6: Comparison between selective cuts and non-selective LELE cuts. (a) Selec- tive cut mask in red (resp. green) realizes EOL only for red (resp. green) segments, and is transparent to green (resp. red) segments. (b) Non-selective LELE cuts realize EOL for both colors. 20 Figure 2.7: Shapes and block candidates for Shape 2. 23 Figure 2.8: Illustration of a U-shape block mask rule violation. 25 Figure 2.9: Cut and block mask co-optimization: (a) block candidates; (b) cut candidates; and (c) a possible final layout. 26 0 Figure 2.10: Illustration of binary variable e : cut candidate c1,1 and block candidate v1,3 are selected. 29 Figure 2.11: Gate delay vs. net capacitance for a specific gate instance. 30 Figure 2.12: Comparison of timing results from Tempus (Golden) and our estimation (Estimated). (a) Path delay and (b) stage delay comparisons. The maximum errors are -4ps and -23ps for stage delay and path delay, respectively. 32 Figure 2.13: Illustration of conflict list enumeration for minimum spacing constraint, showing horizontally and vertically conflicting pairs. 33 Figure 2.14: Distributed optimization: (a) - (d) respectively illustrate the first, second, third and fourth iteration in our approach. Since target clips (yellow) for an iteration do not share their boundaries with each other, each target is independently optimizable. 33 Figure 2.15: Overall optimization flow. 34 viii Figure 2.16: Sensitivity study results: sensitivity of dummy removal rate to (a) block candidate length and (b) clip size. ...................... 39 Figure 2.17: Layouts of M4 layer before and after dummy fill removal: (a) initial layout with dummy fill; (b) layout covered by the selective block mask (red); (c) layout covered by the selective block mask (blue); and (d) layout after timing-aware dummy fill removal with optimized selective block masks. 42 Figure 2.18: Illustration of inverter cell layout in N10 node. 44 Figure 2.19: (a) Examples of minimum implant width violations [209]. (b) The design rule for OD jogs. 45 Figure 2.20: (a) Drain-drain abutment violation with an example standard cell layout. (b) Use of dummy poly gates in the library design style can avoid DDA violation in a correct-by-construction manner. 45 Figure 2.21: (a) Inter-row variable mrq for IW1. (b) Intra-row variable hrq for IW2. The color (gray and white) of regions indicates Vth................ 51 Figure 2.22: Overall flow of detailed placement legalization. 54 Figure 2.23: Partitioning of layout for parallel global optimization. 56 Figure 2.24: Remaining violations vs. runtime. Each dot indicates an iteration; after the third iteration, local optimization is performed. The diamond-shaped markers represent third-iteration points. 60 Figure 2.25: (a) Layout with DRVs before optimization. (b) Layout without DRVs after optimization. 61 Figure 2.26: Example showing multi-pin nets and the routing solution. 67 Figure 2.27: Via shape. (a) 2 × 2 square via. (b) 2 × 1 bar via. 68 Figure 2.28: (a) SADP-specific design rules and (b) example showing that via location does not provide enough information to distinguish the upper and lower cases, i.e., to check SADP-aware rules. 69 Figure 2.29: An example of a routing graph. (a) The p variable of a vertex vi is deter- mined by flow variables of edges with vertex vi’s neighbor vertices vt, vb, k vl, vr. (b) Wire segment geometries that respectively result when pr,i = 1 k and pl,i = 1.................................. 70 Figure 2.30: (a) A wire segment, of which the EOL is located at vertex vi with the wire coming from the right side. (b) Forbidden via locations for other wire segments with pl,j = 1. (c) Forbidden via locations for other wire segments with pr,j = 1.................................. 72 Figure 2.31: Overall flow of BEOL rule evaluation. 73 Figure 2.32: Routing clips from (a) N28-12T, (b) N28-9T and (c) N7-9T. Standard cell boundaries and power/ground rail are highlighted with white lines and yellow dashed lines, respectively..

UNIVERSITY of CALIFORNIA SAN DIEGO IC Physical Design

A Comprehensive Stochastic Design Methodology for Hold-Timing Resiliency in Voltage-Scalable Design

Overcoming the Challenges in Very Deep Submicron for Area Reduction, Power Reduction and Faster Design Closure

Design Closure: Power Constraints, Best Practices for an Accurate Report Power Estimation

NOT for Printing

SEMICONDUCTOR COLLABORATIVE DESIGN PROCESS Enable Collaborative Design for Complex Semiconductor Projects

Design for Manufacturing (Dfm) in Submicron Vlsi Design

Designing a Chip Challenges, Trends, and Latin America Opportunity

Physical Design of a 3D-Stacked Heterogeneous Multi-Core Processor

A Semi-Custom Design Flow in High-Performance Microprocessor Design Gregory A

LAMDA: Learning-Assisted Multi-Stage Autotuning for FPGA Design Closure

Ultrafast Design Methodology Guide for the Vivado Design Suite

Constraint Driven I/O Planning and Placement for Chip-Package Co-Design ∗