Sequential Optimization for Low Power Digital Design
Aaron P. Hurst
Electrical Engineering and Computer Sciences
University of California at Berkeley

Technical Report No. UCB/EECS-2008-75
http://www.eecs.berkeley.edu/Pubs/TechRpts/2008/EECS-2008-75.html
May 30, 2008

Copyright © 2008, by the author(s). All rights reserved.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.

Acknowledgement

Advisor: Robert Brayton

Sequential Optimization for Low Power Digital Design
by Aaron Paul Hurst

B.S. (Carnegie Mellon University) 2002
M.S. (Carnegie Mellon University) 2002

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Electrical Engineering and Computer Science in the GRADUATE DIVISION of the UNIVERSITY OF CALIFORNIA, BERKELEY

Committee in charge:
Professor Robert K. Brayton, Chair
Professor Andreas Kuehlmann
Professor Margaret Taylor

Spring 2008

The dissertation of Aaron Paul Hurst is approved.

Copyright © 2008 by Aaron Paul Hurst

Abstract

The power consumed by digital integrated circuits has grown with increasing transistor density and system complexity. One of the particularly power-hungry design features is the generation, distribution, and utilization of one or more synchronization signals (clocks).
In many state-of-the-art designs, up to 30%-50% of the total power is dissipated in the clock distribution network.

In this work, we examine the application of sequential logic synthesis techniques to reduce the dynamic power consumption of the clocks. These optimizations are sequential because they alter the structural location, functionality, and/or timing of the synchronization elements (registers) in a circuit netlist. A secondary focus is on developing algorithms that scale well to large industrial designs.

The first part of the work deals with the use of retiming to minimize the number of registers and therefore the capacitive load on the clock network. We introduce a new formulation of the problem and then show how it can be extended to include necessary constraints on the worst-case timing and initializability of the resulting netlist. It is then demonstrated how retiming can be combined with the orthogonal technique of intentional clock skewing to minimize the combined capacitive load under a timing constraint.

The second part introduces a new technique for inserting clock gating logic, whereby a clock's propagation is conditionally blocked for subsets of the registers in the design that are not actively switching logic state. The conditions under which the clock is disabled are detected through the use of random simulation and Boolean satisfiability checking. This process is quite scalable and also offers the potential for additional logic simplification.

Professor Robert K. Brayton
Dissertation Committee Chair

Contents

Contents
List of Figures
List of Tables
Acknowledgements

1 Introduction
  1.1 Low Power Digital Design
    1.1.1 Technological
    1.1.2 Commercial
    1.1.3 Environmental
  1.2 Sequential Optimization
    1.2.1 Retiming
    1.2.2 Clock Skew Scheduling
  1.3 Organization of this Dissertation
2 Unconstrained Min-Register Retiming
  2.1 Problem
    2.1.1 Motivation
  2.2 Previous Work
    2.2.1 LP Formulation
    2.2.2 Min-Cost Network Circulation Formulation
  2.3 Algorithm
    2.3.1 Definitions
    2.3.2 Single Frame
    2.3.3 Multiple Frames
  2.4 Analysis
    2.4.1 Proof
    2.4.2 Complexity
    2.4.3 Limitations
  2.5 Experimental Results
    2.5.1 Setup
    2.5.2 Runtime
    2.5.3 Characteristics
    2.5.4 Large Artificial Benchmarks
  2.6 Summary
3 Timing-Constrained Min-Register Retiming
  3.1 Problem
  3.2 Previous Work
    3.2.1 LP Formulation
    3.2.2 Minaret
  3.3 Algorithm
    3.3.1 Single Frame
    3.3.2 Multiple Frames
    3.3.3 Examples
  3.4 Analysis
  3.5 Proof
    3.5.1 Complexity
  3.6 Experimental Results
    3.6.1 Runtime
    3.6.2 Characteristics
  3.7 Summary
4 Guaranteed Initializability Min-Register Retiming
  4.1 Problem
  4.2 Previous Work
    4.2.1 Initial State Computation
    4.2.2 Constraining Retiming
  4.3 Algorithm
    4.3.1 Feasibility Constraints
    4.3.2 Incremental Bias
  4.4 Analysis
    4.4.1 Proof
    4.4.2 Complexity
  4.5 Experimental Results
  4.6 Summary
5 Min-Cost Combined Retiming and Skewing
  5.1 Problem
    5.1.1 Motivation
    5.1.2 Definitions
  5.2 Previous Work
  5.3 Algorithm: Exact
  5.4 Algorithm: Heuristic
    5.4.1 Incremental Retiming
    5.4.2 Overview
  5.5 Experimental Results
  5.6 Summary
6 Clock Gating
  6.1 Problem
    6.1.1 Implementation
  6.2 Previous Work
    6.2.1 Structural Analysis
    6.2.2 Symbolic Analysis
    6.2.3 RTL Analysis
    6.2.4 ODC-Based Gating
  6.3 Algorithm
    6.3.1 Definitions
    6.3.2 Power Model
    6.3.3 Overview
    6.3.4 Literal Collection
    6.3.5 Candidate Pruning
    6.3.6 Candidate Proof
    6.3.7 Candidate Grouping
    6.3.8 Covering
  6.4 Circuit Minimization
  6.5 Experimental Results
    6.5.1 Setup
    6.5.2 Structural Analysis
    6.5.3 Power Savings
    6.5.4 Circuit Minimization
  6.6 Summary
7 Conclusion
  7.1 Minimizing Total Clock Capacitance
  7.2 Minimizing Effective Clock Switching Frequency
Bibliography
A Benchmark Characteristics

List of Figures

1.1 Tradeoff of performance and power.
1.2 Cost of IC cooling system technologies.
1.3 Overview of US power consumption. [1]
1.4 Forward and backward retiming moves.
1.5 A circuit and its corresponding retiming graph.
1.6 Retiming to improve worst-case path length.
1.7 Retiming to reduce the number of registers.
1.8 Intentional clock skewing.
2.1 The elimination of clock endpoints also reduces the number of distributive elements required.
2.2 A scan chain for manufacturing test.
2.3 A three bit binary counter with enable.
2.4 An example circuit requiring unit backward flow.
2.5 An example circuit requiring multiple backward flow.
2.6 Fan-out sharing in flow graph.
2.7 The illegal retiming regions induced by the primary input/outputs.
2.8 The corresponding flow problem for a combinational network.
2.9 Flow chart of min-register retiming over multiple frames.
2.10 A cut in the unrolled circuit.
2.11 Retiming cut composition.
2.12 The runtime of flow-based retiming vs. CS2 and MCF for the largest designs.
2.13 The runtime of flow-based retiming vs. CS2 and MCF for the medium designs.
2.14 The distribution of design size vs. total number of iterations in the forward and backward directions.
2.15 The percentage of register savings contributed by each direction / iteration.
3.1 Bounding timing paths using ASAP and ALAP positions.
3.2 The computation of conservative long path timing constraints.
3.3 The implementation of conservative timing constraints.