IETE Technical Review

ISSN: 0256-4602 (Print) 0974-5971 (Online) Journal homepage: http://www.tandfonline.com/loi/titr20

Timing Closure Problem: Review of Challenges at Advanced Process Nodes and Solutions

Sneh Saurabh, Hitarth Shah & Shivendra Singh

To cite this article: Sneh Saurabh, Hitarth Shah & Shivendra Singh (2018): Timing Closure Problem: Review of Challenges at Advanced Process Nodes and Solutions, IETE Technical Review

To link to this article: https://doi.org/10.1080/02564602.2018.1531733

Published online: 22 Oct 2018.



Department of ECE, IIIT, Delhi, India

ABSTRACT
Attaining timing closure marks the culmination of an arduous VLSI design process. The targets set for timing closure and the time taken to achieve it can critically impact the success of a product in a highly competitive semiconductor market. Therefore, methodologies employed in VLSI design process are strategized to attain quick timing closure along with reasonable design metrics. However, at advanced process nodes, attaining timing closure becomes quite challenging. As a result, at advanced process nodes, innovative solutions are required to be incorporated in VLSI design processes, as well as in Electronic Design Automation (EDA) tools and technologies. In this review paper, we discuss timing closure problem explaining the root cause of its difficulty. We also explain traditional techniques that address timing closure problem. Furthermore, we highlight new challenges that appear at advanced process nodes and discuss solutions to these problems that are being employed or are proposed in literature.

KEYWORDS
Advanced process nodes; Design flow; Gate-delay; Process variations; Timing closure; Wire-delay

© 2018 IETE

1. INTRODUCTION

Designing an integrated circuit is a complicated process and involves making trade-offs among several conflicting design parameters. To simplify the overall design process, the full flow is broken down into several distinct steps that are carried out one after the other, as shown in Figure 1(a). The design starts with a specification and is taken through a series of transformations and abstract representations. Some of the critical steps in the design flow are logic synthesis, floorplanning, placement, clock tree synthesis and routing. At the end of the design flow, a layout is obtained.

Before sending the final layout of a circuit to a foundry for fabrication, a set of design parameters and rules are checked. These final checks are known as signoff checks and ensure that the fabricated chip will be able to meet the given specification. Among signoff checks, verifying whether a circuit is able to meet the given timing constraints is critical. At the end of a design flow, if the final layout of a circuit is able to meet the given timing constraints, then timing closure is said to be attained [1–4].

A design flow as shown in Figure 1(a) is an arduous process and requires a huge amount of effort from designers [3,5]. The defined targets and the strategies employed to attain timing closure decide the time taken to sign off a given design and have a critical impact on the competitiveness of the corresponding product in the semiconductor market [5]. Therefore, timing closure is an important aspect of a design flow and careful attention must be paid to it right from the beginning of a design process.

At each stage of a design process, several aspects of a circuit such as functionality, timing, area, power, testability, yield, and reliability are examined and verified. The operation of each design step can be modeled as an interaction between a design database, an analysis engine and an implementation engine, as shown in Figure 1(b). A design database contains appropriate information of a design such as design hierarchy, the interconnection of gates (netlist), timing constraints, floorplan, etc. An analysis engine computes or estimates design parameters related to timing, power, testability, etc. An implementation engine transforms a design, creating additional information for the design and ensuring that timing, power, area, testability, and other design constraints are met.

Figure 1: (a) Steps involved in a typical design process or design flow. (b) Interaction between design database, analysis engine and implementation engine

In this paper, we review the basics of the timing closure problem and explain the difficulty in achieving it. Furthermore, we describe traditional techniques that are employed to ameliorate the problem of timing closure and highlight their inadequacies at advanced process nodes. The problem of timing closure becomes more difficult at advanced process nodes due to: (a) increased complexity of designs, (b) new device phenomena, and (c) lower supply voltages (VDD), higher clock speeds and an increased impact of process variations, which tighten timing constraints. Furthermore, we review techniques that can be applied to ease the timing closure problem at advanced process nodes. It is important to point out that, in this paper, by advanced process nodes we mean “14-nm” onward technology nodes. Therefore, this review will also help VLSI designers and researchers in appreciating impending challenges of future technology nodes.

The rest of this paper is organized as follows. In Section 2, we explain the basics of the timing closure problem. In Sections 3 and 4, we highlight challenges of timing closure at advanced process nodes and possible solutions. In Section 5, we make concluding remarks.

2. THE PROBLEM

The problem of timing closure becomes difficult due to the following characteristics of design processes:

(1) High level of abstraction in the early phases of a design: During early phases of a design, the level of abstraction is high and there is less information content in the design, as illustrated in Figure 2(a). For example, in the gate-based netlist, there is no information about the nature of interconnects. Therefore, implementation engines that operate on a gate-based netlist are forced to estimate interconnect delay using some heuristics. However, the estimated delay can be widely different from the actual delay computed in the later stages of a design flow [2,3]. As a result, actual timing violations are discovered only in later stages and create timing closure problems. Furthermore, the timing closure problem is aggravated by the fact that, during later stages of a design flow, the flexibility to make changes in a design decreases. For example, after detailed routing, it is difficult to do logic optimization since some new cells can be created/destroyed, which would entail repeating certain physical design steps such as placement and routing.

(2) Conflict among timing and other metrics of a design: The design process involves considering several parameters such as timing, area, power, signal integrity, reliability, etc. In general, some of these parameters such as power and area are in conflict with the timing of a circuit. Therefore, when an implementation engine improves power or area of a circuit, the timing degrades. Furthermore, the problem is complicated by the fact that, in the early phases of a design, the timing-critical portion of a design is not exactly known. Therefore, even in the timing-critical portion of a circuit an implementation engine can choose to trade off timing and improve other parameters of the circuit such as power, area, testability, reliability, etc. This exacerbates the problem of timing closure.

The above-mentioned characteristics of design processes make a design flow iterative and it can never be guaranteed that a design flow will finally converge [3]. As an illustration, assume that for a given design, it is found after detailed routing that many of the violating paths pass through a particular ripple-carry adder (RCA). One of the techniques to fix this problem can be to change the architecture of the adder from RCA to carry look-ahead (CLA). This will require changing a decision made during RTL synthesis, as shown in Figure 2(b). Changing an RCA to CLA can result in an increased number of cells, which can force displacement and re-routing of many cells and, in turn, result in another set of timing violations. Thus, in practice, multiple iterations of different design steps are carried out before attaining timing closure.

Figure 2: (a) Information content increases and flexibility to make change decreases, as a design flow proceeds. (b) A design flow becomes iterative when decisions of preceding steps need to be reverted

Traditionally, the following strategies are employed to mitigate the problem of timing closure:

(1) Introduce pessimism in the early phases of a design: Though the goal of timing closure is to meet the timing constraints at the end of a design flow, static timing analysis (STA) is carried out at individual design steps of Figure 1(a). This ensures that the timing-critical portion of a design can be optimized for better timing during early stages of a design flow, thus mitigating the timing closure problem. The difficulty in finding the actual timing-critical portion of a circuit during early phases of a design flow is handled by taking a pessimistic view of the design attributes such as wire-delay or by putting additional timing margins [2,6,7]. These strategies compensate for the lack of information in early phases of a design flow by over-designing, which often results in an increased area and power [2].

(2) Predict and prevent: One of the strategies to mitigate the timing closure problem is by predicting the timing problems early in a design flow and taking preventive measures to avoid them [8,9]. This strategy can be implemented in various ways depending on the information content of a design [2,8,9]. For example, logic synthesis can take its decisions based on estimated interconnect delays [9]. The predictability of interconnect delays can be improved by keeping a companion placed-cell model or by interleaving logic synthesis with placement [2]. The effectiveness of “predict and prevent” strategies depends on the correlation between the estimated design attributes and the actual design attributes computed in the later phase of a design flow. Practically, ensuring this correlation is difficult.

(3) Timing-driven implementation engines: The timing closure problem can be mitigated by making implementation engines take their decisions based on circuit timing. For example, a timing-driven placement engine internally invokes an STA engine to find critical paths in the design [3,10]. Subsequently, it assigns higher weights to nets that are more timing-critical [3,11]. As a result, gates connected with the critical nets are placed together, reducing the delay of the critical path. Similarly, timing-driven buffer insertion and routing can mitigate the timing closure problem [12–14].

3. CHALLENGES AT ADVANCED PROCESS NODES

Despite the above-mentioned techniques, with advancement in process nodes, attaining timing closure becomes more difficult, as described in this section.

3.1 Increased Complexity of Designs

With the advancement in technology, designs become more complicated and achieving timing closure becomes challenging due to the following reasons:

(1) Increased size of problem: Following Moore’s law, the number of transistors in a circuit increases exponentially. As a result, the computational resource required for attaining timing closure in contemporary billion-gate designs becomes very high [5].

(2) Tighter timing constraints: In general, with the advancement in technology, the target clock frequency of a design increases. Therefore, the time available for a signal to propagate decreases and meeting timing constraints becomes more difficult.

(3) Design techniques that trade off timing: Some of the design techniques that are increasingly employed at advanced process nodes include clock gating, multi-threshold CMOS (MTCMOS) or power gating, multiple clock and power domains, dynamic voltage and frequency scaling (DVFS), dynamic threshold voltage adjustment, adaptive body bias (ABB), near-threshold computing (NTC), etc. [15–17]. Most of these design techniques trade off timing to achieve lower power consumption. For example, increasing the threshold voltage (VT) of a transistor by reverse body bias (RBB) reduces leakage power and is accompanied by a loss in performance [18]. Similarly, NTC increases the energy-efficiency by 10× and is accompanied by a 10× increase in delay of a circuit [16]. Therefore, meeting timing constraints in designs employing these novel techniques becomes more difficult.

(4) Dominance of interconnect delay: With advancement in technology, the contribution of interconnect delay to the overall delay of a path has been increasing. In contemporary technologies, interconnect delays are comparable to gate-delays. Therefore, conventional techniques such as upsizing and decreasing input capacitances of load transistors can only reduce a portion of a path delay and can be incapable of fixing timing violations. Furthermore, with scaling, the resistivity of copper interconnects increases exponentially due to an increased impact of electron scatterings at surfaces and at grain boundaries [19]. As a result, with scaling, delay due to interconnect resistance increases at a higher rate compared to interconnect capacitance [20]. Therefore, techniques such as employing low-k material to reduce interconnect capacitance are not expected to reduce the interconnect delays significantly, exacerbating the problem of timing closure [20,21].

(5) Reliability and aging constraints: At advanced processes, issues of reliability and aging become important due to increased negative bias temperature instability (NBTI), positive bias temperature instability (PBTI), hot carrier injection (HCI) and electromigration [22]. To address these issues, techniques are adopted such as reducing the current levels, reducing supply voltages or reducing the capacitance for the signals with a high slew. However, in general, these techniques of mitigating reliability problems increase the delay of a circuit. Similarly, VT-shift during the lifetime of a product due to bias temperature instability (BTI) needs to be taken care of by adding appropriate margins, adaptive body bias (ABB) and adaptive supply voltage (ASV) techniques, which further complicates the timing closure problem [23]. Furthermore, to accurately account for aging effects, machine learning-based STA (LSTA) can be used [24].

(6) Manufacturing and yield-related constraints: At advanced process nodes, additional design constraints emerge due to lithography compliance or emerging design challenges. Examples include constraints due to multiple patterning lithography (MPL), the drain-to-drain (D2D) abutment constraint, the minimum implant area (MinIA) constraint, constraints due to multiple-row height cells, the metal-density uniformity requirement, etc. [25–28]. These constraints increase the complexity of physical design and timing closure.

3.2 Increased Impact of Process Variations

With advancement in technology, the process of fabrication becomes more complicated [29,30]. Furthermore, limitations due to quantum-mechanical effects introduce greater process variations [31]. Process variations are manifested as mismatches in the electrical characteristics of identically designed devices and interconnects [31]. The spatial extent of process variations can be across different dies (known as die-to-die variations or inter-die variations) or located inside the same die (known as within-die variations or intra-die variations). Furthermore, process variations can be systematic (deterministic variations that can be predicted by analyzing the layout of a circuit) or random (variations that are uncertain and cannot be predicted) [7]. Based on fabrication steps, process variations can be classified as:

(1) Front end of line (FEOL) process variations: The variations in FEOL processes impact electrical properties of transistors and are due to line edge roughness (LER), random dopant fluctuations (RDFs), metal-gate work function variations (WFV), gate dielectric thickness (tox) variations, effective channel length (Leff) variations, fin width variations (for FinFETs), source/drain length variations and spacer length variations [32,33]. These variation sources become dominant at advanced processes due to the small size of device structures. Furthermore, sensitivities of the electrical parameters to the process variations increase for smaller devices due to drain-induced barrier lowering (DIBL), velocity overshoot, etc. [34].

(2) Back end of line (BEOL) process variations: The variations in BEOL processes become crucial at advanced process nodes due to increased contribution of the interconnect resistance and capacitance (RC) to the delay of a circuit [31,35]. The process variations in interconnects are primarily manifested as variations in thickness (tconn) and width (wconn) of interconnect and thickness of dielectric layer (tlayer) between consecutive layers of interconnect [36]. Therefore, BEOL variations are expected to result in variations in delay of a circuit [20,36].

Thus, at advanced process nodes, timing closure methodologies must account for greater FEOL and BEOL variations.

3.3 Increased Multi-Modes Multi-Corner (MMMC) Combination

The impact of process variations and changes in the environment are generally modeled by analyzing a circuit at some discrete set of process, voltage and temperature (PVT) conditions or PVT corners. For a standard cell-based design, the timing characteristics of a cell are specified in a set of libraries (typically in Synopsys Liberty format) and the characteristics of interconnects are specified in a set of Standard Parasitic Exchange Format (SPEF) files. The impact of process variations on standard cells is taken into account by using different libraries for different PVT corners. Similarly, the impact of process variations on interconnects is taken into account by using different SPEF files for different RC corners [37]. Furthermore, contemporary designs are expected to work under different modes, with possibly different clock frequencies and different values of signals such as test-mode, scan-mode, etc. Therefore, timing closure must be established for all the modes of a given design. The technique that is commonly employed to establish timing closure at all possible cases of PVT corners, RC corners and modes of a design is multi-mode multi-corner (MMMC) analysis [37]. Under an MMMC analysis, different combinations of PVT corners, interconnect corners and modes of the design, commonly known as scenarios, are considered simultaneously, as shown in Figure 3(a). In contrast to carrying out STA separately at different scenarios, an MMMC analysis offers a faster solution since multiple loading of designs, libraries and SPEF files is avoided and parallel processing of different scenarios can be invoked. However, with the advancement in technology, establishing timing closure using an MMMC analysis becomes tedious as the number of scenarios increases dramatically due to the following reasons:

(1) The impact of FEOL variations (PVT corners) and BEOL variations (RC corners) are considered distinctly in an MMMC analysis. With advancement in technology, the number of FEOL and BEOL corners both increase. Furthermore, the combination of PVT corners and RC corners that results in the worst-case scenario is often determined by the circuit structure [7]. Therefore, to impose a worst-case bound on the delays of a circuit subjected to all FEOL and BEOL variations, many combinations of PVT and RC corners must be considered.

(2) In a PVT corner-based timing analysis, typically, it is assumed that a cell-delay increases with temperature. However, due to competing effects of temperature on the mobility and VT of a MOSFET, this assumption breaks down at low VDD [39]. As a result, the cell-delay decreases with temperature at sufficiently low VDD. This phenomenon, known as inverted temperature dependence (ITD), makes it difficult to determine the temperature for which a cell-delay can be maximized or minimized [39]. Therefore, both low and high-temperature corners must be analyzed.

(3) The contemporary power-constrained designs employ a wider range of VDD and multiple power domains which increase PVT corners.

Figure 3: (a) Combination of PVT corner, RC corner and modes give different scenarios. (b) Functional noise and delay noise [38]
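As a rough illustration of the scenario combination in Figure 3(a), the following minimal sketch enumerates scenarios as the Cartesian product of PVT corners, RC corners and modes, and then picks the scenario with the smallest slack. The corner names, mode names and slack values are hypothetical placeholders and are not taken from any library, tool or the cited references; in a real MMMC flow the slacks would come from an STA engine.

```python
from itertools import product

# Hypothetical corner and mode names, chosen only for illustration.
PVT_CORNERS = ["ss_0.72V_125C", "ss_0.72V_m40C", "ff_0.88V_m40C"]
RC_CORNERS = ["cworst", "rcworst", "cbest"]
MODES = ["functional", "scan", "test"]


def enumerate_scenarios(pvt_corners, rc_corners, modes):
    """Return every (PVT corner, RC corner, mode) combination."""
    return list(product(pvt_corners, rc_corners, modes))


def worst_scenario(slack_by_scenario):
    """Return the (scenario, slack) pair with the smallest slack."""
    return min(slack_by_scenario.items(), key=lambda item: item[1])


scenarios = enumerate_scenarios(PVT_CORNERS, RC_CORNERS, MODES)

# Placeholder slacks in ns; one scenario is made to violate timing.
slacks = {scenario: 0.10 for scenario in scenarios}
slacks[("ss_0.72V_m40C", "rcworst", "functional")] = -0.05

print(len(scenarios))
print(worst_scenario(slacks))
```

With three entries in each list, the product yields 27 scenarios, and each additional PVT corner adds nine more. This multiplicative growth is one way to see why the scenario count rises quickly when corners and modes increase.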

(4) The attributes of design such as multiple clocks and is greater than a certain threshold, then an incor- power-up/down states tend to increase the number rect value can be latched and propagated, causing a of modes. Clock skew requirements among these functional failure. modesmaynotbecompatiblewitheachother.So (2) Delay noise: When a victim and aggressor nets clock tree network should maintain all the constraint are both transitioning, the delay of signals can be among each mode, which complicates the timing affected and a timing violation can be triggered. closure problem. The injected noise on the victim net is proportional to The increased number of scenarios increases the com- the mutual capacitance of the nets and the rate of change plexity and resource requirement of an MMMC-based ofvoltageoftheaggressornet.Theimpactofcrosstalk STA. Similarly, implementation engines must consider all noise aggravates at advanced technology nodes due to scenarios during timing optimization and typically suffer the following reasons: (a) increased mutual capacitance from “ping-pong effect”, i.e. fixing violations in one of the due to reduced spacing and increased aspect ratio of scenarios can lead to additional violations in other sce- wires, (b) lower VT increases the susceptibility of the vic- narios [4,40–42]. The “ping-pong effect” is aggravated at timtoglitchpropagation,(c)lowerVDD results in an advanced process nodes due to a wider variation in sensi- increased VT/VDD ratio, reducing effective hold strength tivities of gate-delays and wire-delays with respect to VDD of the drivers, (d) faster transitions due to faster devices and temperature variations, thus leading to different crit- increases noise injection, and (e) decreased clock period ical paths in different scenarios. Therefore, an MMMC- reducesmarginfornoisetolerance[8,43]. based timing closure becomes increasingly tedious and difficult at advanced process nodes. 
Itisimportanttopointoutthatthenoiseanalysisiscar- riedoutattheendofadesignflow,oncedetailedroutes are available and mutual capacitance can be extracted 3.4 Impact of Noise [8]. Typically, simple coupling models are combined with The signal integrity issues such as immunity of a design switching-windows overlap determined using STA tools against crosstalk noise become critical at advanced pro- to compute the glitches and delay adjustments [8,44]. cess nodes [38,43]. The crosstalk noise is observed as a Since noise analysis and corresponding fixes are made at change in the voltage waveform of a net (known as vic- the time of signoff, it exacerbates timing closure problem. tim)duetotheswitchingactivityinsomeneighboring nets (known as aggressors) due to the mutual capacitance 3.5 Pessimism in STA betweenthem.AsillustratedinFigure 3(b), crosstalk noise can manifest itself in two ways [38,43]: Ingeneral,STAtakesaconservativeapproachandascer- tains whether a given circuit is safe to operate under (1) Functional noise: Due to crosstalk noise, glitches are the worst-case. The worst-case analysis allows STA algo- generated in a steady state signal and if the change rithms to run in linear time with respect to the size of S. SAURABH ET AL.: TIMING CLOSURE PROBLEM 7 acircuit[7,45]. However, for a given design, it is possi- As technology advances signal integrity becomes very ble that the worst-case considered by STA is unrealistic difficult. To incorporate this with timing closure problem and impossible to occur. These sources of pessimism in STA time will increase. Machine learning can be useful STA can be removed and a higher maximum operating for this issue. frequency can be obtained. Since, achieving timing clo- sure is more difficult at advanced processes, removing pessimism from STA becomes crucial. 4. 
POSSIBLE SOLUTIONS Someofthepossiblesolutionstothechallengesmen- 3.6 Accuracy of Delay Models tioned above are summarized in Table 1 and described in detail in this section. Currently, STA typically employs non-linear delay model (NLDM) to compute gate-delays. NLDM characterizes delay and output slew of a cell as a discrete func- 4.1 Handling Challenges of Increased Design tion of input slew and output load [6,37,46,47]. An Complexities implicit assumption in NLDM is that the input and the output waveforms are linear and the slope completely The issues arising out of increased design complexities characterizes a waveform shape. However, this assump- can be addressed using following approaches: tion breaks down and NLDM becomes inadequate at advanced process nodes due to back-Miller effects, non- (1) Improvements in Electronic Design Automation linear pin capacitance and need for complex noise anal- (EDA) tools: With the advancement in technol- ysis [48,49]. Moreover, with reducing clock period in ogy, EDA tools have been consistently enhanced to advanced designs, it becomes necessary to achieve closer handle larger and more complicated circuits. Tradi- correlation between the delay models and the behav- tionally, these improvements are obtained using effi- ior of the actual hardware. Therefore, a conventional cient data structures and devising algorithms with delay model becomes inadequate at advanced process reducedcomplexity.Moreover,duringlast10years, nodes [47,48]. the power of parallel computation has been exploited to deliver improved performance [51]. Currently, Moreover, most delay models assume that that the side allmajorcommercialtoolsforsynthesis,placement, inputs of a gate are held constant when the source of a routing, STA, etc., employ parallel processing and timing arc makes a transition. However, in realistic cir- exhibit good scalability. 
Recently, machine learn- cuits,formultipleinputlogicgates,multipleinputpins ing techniques are also being explored to address can make transitions at the same time depending on VLSI design problems [52]. In future, approaches the arrival time of the signals. This is especially true for such as designing at a higher level of abstraction gates immediately driven by a bank of flip-flop triggered look promising [5]. EDA tools also mitigate tim- simultaneously at the arrival of a clock-edge [50]. How- ing closure problem by exploiting newer degrees of ever, ignoring multi-input switching (MIS) can result in freedom in fixing timing issues. For example, useful both optimistic and pessimistic computation of a gate- skews (by speeding or delaying clock signals reach- delay, as illustrated in Figure 4, potentially leading to a ing a register) are now aggressively employed to chip-failure [4,50]. improve the slack of a design [10,42]. (2) Integrated flow: The difficulty in implementing “predict and prevent” methodology has led to an integrated approach to synthesis, placement and routing steps [8]. Some of the features of an inte- grated flow are (a) Different implementation and analysis engines operate together. For example, synthesis, place- ment and STA engines can be integrated together. (b) The functioning of each individual engine is modeled as a sum of small incremental opera- Figure 4: MIS delay is greater than SIS delay when both inputs rise together. Similarly, when both inputs fall together, an tions. MIS delay can be less compared to the corresponding SIS (c) For a quick timing closure, incremental opera- delay [50] tions must be non-disruptive, i.e. they should 8S.SAURABHET AL.: TIMING CLOSURE PROBLEM

Table 1: Summary of challenges of timing closure at advanced process nodes and possible solutions S. no. Attributes Challenges Possible solutions 1 Design complexity • Increased size of problem • Improvement in EDA tools using innovative data-structure and algorithms and exploiting parallel processing • Tighter timing constraints • Integrated design flows with incremental operations • Conflict between timing and other design • Exploiting new degrees of freedom in design constraints optimization • Additional manufacturing and yield-related • “Divide and conquer” strategies for large designs constraints • Dominance of interconnect delay • Reduce interconnect resistance using optimized aspect ratio, hybrid interconnect structure and novel materials 2 Process variations • Both FEOL and BEOL variations increase • Employ AOCV or POCV along with LVF. SSTA can also become viable in future • Variations in device and circuit attributes increase • Employ design techniques to mitigate the impact of process variations • STA and implementation tools must account for increased variations 3 MMMC analysis • Large number of scenarios • Exclude certain scenarios based on design properties and their boundedness • Increased computational resource requirement • “Ping-pong effect” in optimization 4 Noise analysis • Increased impact of crosstalk noise • Use noise aware CCS/ECSM models • Models must be capable to take care of noise • Consider noise optimization throughout a design flow • Design must be tolerant to noise • Fix final noise failures using suitable design techniques 5AccuracyofSTA • Unnecessary pessimism in STA must be removed • Employ PBA, CPPR and SHPR techniques to ease timing closure problem • Use tightened BEOL corners 6 Accuracy of delay models • Consider non-linearity, overshoot and • Employ CCS and ECSM under-shoot in waveforms • Consider MIS in delay models • Use MCSM or suitable derating for MIS

not make large changes in the untouched portion of a design [53]. Moreover, the incremental operations can be undone, and reverting a bad decision should be possible.

(d) A tight integration of different engines is made. An incremental operation of one engine can trigger an incremental operation of some other engine. For example, placement of a group of cells can trigger incremental timing updates.

(e) A sequence of incremental operations of different engines is carried out in such a manner that a quicker timing closure is achieved.

(3) Novel strategies: To obtain quicker timing closure on large designs, “divide and conquer”-based strategies are used. A popular strategy is to divide a given design into smaller blocks and first achieve timing closure on each individual block. Subsequently, an abstract model of a given block, known as extracted timing model (ETM), is created [54,55]. An ETM contains information of only the interface paths of a block and omits the details of the internal flop-to-flop paths. Thereafter, timing closure is achieved at the top-level by instantiating ETMs instead of the corresponding blocks. As a result, during top-level STA, the analysis of paths contained fully inside a block becomes unnecessary. Thus, the “divide and conquer” approach enables carrying out STA even for billion-gate designs.

(4) Handling large interconnect delays: Since the impact of interconnect resistance dominates at advanced process nodes, techniques such as increasing the aspect ratio of interconnects can be considered [20]. Furthermore, a hybrid interconnect architecture can be employed as follows: (i) use low resistance aluminum for short wires that are not prone to electromigration, and (ii) use copper for long wires and global signals that are prone to electromigration [21]. Moreover, employing new materials such as carbon nanotubes and graphene nanoribbons can be explored [20,21]. Furthermore, optical interconnects can be a viable alternative in the future [56].

4.2 Handling Increased Impact of Process Variations

The following approaches are considered at advanced process nodes to account for process variations:

(1) On-chip variation (OCV) derating factors: The inter-die variations (global variations that affect all devices on a given die in the same manner) are typically considered in STA using PVT/RC corners [37,45]. The intra-die variations (local variations in devices and interconnects on a given die) are typically taken into account by OCV derating factors. An OCV derating factor can be defined differently for different corners based on the type of analysis (early or late), type of path (data path or clock path) and type of delay (gate-delay or wire-delay). An OCV derating factor is multiplied with the base delay for a given timing arc, and thus provides a mechanism to define upper/lower bounds on the delays. It may be noted that defining OCV derating factors involves a trade-off between aggressive timing and considering safe bounds for the delays [45]. As a result, designers typically specify conservative derating factors, which can limit the achievable clock frequency. It is worth pointing out that the OCV derating methodology makes the following implicit assumptions [57]:

(a) All delays for certain groups of timing arcs get scaled by the same factor. For example, in the late analysis, typically, all gate-delays on a data path are scaled by the same derating factor. Therefore, it is implicitly assumed that a perfect positive correlation exists among a certain group of timing arcs.

(b) All delays for certain groups of timing arcs get scaled by different scaling factors. For example, in the late analysis, typically, all gate-delays on a launch clock path are scaled by a factor more than 1 and all gate-delays on the corresponding capture clock path are scaled by a factor less than 1. Therefore, it is implicitly assumed that a perfect negative correlation exists among a certain group of timing arcs.

However, a perfect correlation is unrealistic and, therefore, the OCV methodology is pessimistic.

(2) Statistical STA (SSTA): One of the techniques to remove the above-mentioned pessimism is to treat the impact of process variations statistically using statistical STA (SSTA) [7,58]. In a typical SSTA methodology, device and interconnect parameters that are sensitive to process variations are treated as Gaussian random variables. For example, gate length, gate width, VT, metal line width, etc., can be treated as Gaussian random variables. Furthermore, the delay of a given timing arc is assumed to be linearly dependent on these random variables. The goal of SSTA, given a statistical distribution of process-sensitive parameters, is to compute the statistical distribution of the latest arrival time at the given sink nodes. An SSTA methodology is widely implemented using a block-based method which can run in a linear time with respect to the number of nodes in a timing graph, similar to conventional STA [58]. The arrival times are computed topologically in a breadth-first manner on a timing graph. The crux of this method is to compute the arrival time distribution using two elementary operations [7,58]:

(a) Sum operation: Given statistical distributions of two or more random variables (random variables can be delay or arrival time), a sum operation computes their statistical sum (the result is defined by a new statistical distribution). In general, the sum operation is straightforward to implement.

(b) Max operation: Given statistical distributions of two or more random variables, a max operation computes the statistical distribution of the maximum value among the given distributions. The max operation is typically carried out on arrival times and the result of the max operation is subsequently used in the computation of arrival times at downstream nodes. Therefore, the topological and spatial correlation among variables must be preserved by the max operation. This complicates the implementation of the max operation.

SSTA has been intensively investigated since the early 2000s. Despite its accuracy in modeling process variations, currently, it is not being used for timing closure, primarily because of the unfavorable cost–benefit of statistical delay models and the complexities of SSTA deployment in design processes [4,57]. Alternatively, at advanced process nodes, simpler techniques as described below are employed.

(3) Advanced on-chip variation (AOCV): An AOCV technique enhances the OCV methodology by defining cell derating factors as a function of path attributes. A derating table is specified for a cell as a function of:

(a) Spatial extent of the path in which a cell is located: If the spatial extent of the path in which a cell is located is increased, then the derating factor is increased. It is based on the observation that the cells that are closer together exhibit lesser systematic variations than the cells that are located further apart.

(b) Logic depth of the path in which a cell is located: If the logic depth of the path in which a cell is located is increased, then the derating factor is decreased. This is due to the fact that for paths consisting of a higher number of logic levels, the effect of random variation of one cell tends to cancel the effect of random variation of another cell, and on an average smaller random variation is contributed by each cell.
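The difference between a flat OCV derate and an AOCV table lookup can be illustrated with a minimal sketch. All numbers, table breakpoints and function names below are hypothetical and chosen only for illustration; the sketch also assumes a simple step lookup, whereas production tools may interpolate between table entries.

```python
from bisect import bisect_right

# Hypothetical late-analysis AOCV table (illustrative values only).
DEPTHS = [1, 2, 4, 8, 16]            # logic-depth breakpoints
DISTANCES_UM = [0, 500, 1000]        # spatial-extent breakpoints (um)
LATE_DERATE = [
    [1.20, 1.15, 1.10, 1.07, 1.05],  # extent >= 0 um
    [1.22, 1.17, 1.12, 1.09, 1.07],  # extent >= 500 um
    [1.25, 1.20, 1.15, 1.12, 1.10],  # extent >= 1000 um
]


def _step_index(breakpoints, value):
    """Index of the largest breakpoint <= value, clamped to the table."""
    return max(bisect_right(breakpoints, value) - 1, 0)


def aocv_late_derate(depth, distance_um):
    """Derate grows with spatial extent and shrinks with logic depth."""
    row = _step_index(DISTANCES_UM, distance_um)
    col = _step_index(DEPTHS, depth)
    return LATE_DERATE[row][col]


def late_path_delay_ocv(base_delays, flat_derate):
    """Flat OCV: every arc on the path is scaled by the same factor."""
    return sum(base_delays) * flat_derate


def late_path_delay_aocv(base_delays, distance_um):
    """AOCV: the factor depends on the path's depth and spatial extent."""
    derate = aocv_late_derate(len(base_delays), distance_um)
    return sum(base_delays) * derate


# An 8-stage path with 0.10 ns base delay per stage (hypothetical).
stage_delays = [0.10] * 8
ocv = late_path_delay_ocv(stage_delays, flat_derate=1.20)
aocv = late_path_delay_aocv(stage_delays, distance_um=300)
print(f"flat OCV: {ocv:.3f} ns, AOCV: {aocv:.3f} ns")
```

In this sketch the flat derate has to be set for the worst case (a single-stage path), so the deeper path is charged more margin under flat OCV than under the depth-aware lookup.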

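Returning briefly to the block-based SSTA of item (2), the sum and max operations on Gaussian random variables can be sketched as follows. This is a minimal illustration, not the method of the cited works: it assumes each quantity is summarized only by a mean and a variance, that the pairwise correlation coefficient `rho` is supplied by the caller, and that the max is re-approximated as a Gaussian by moment matching (Clark's approximation, one common choice). The class and function names are hypothetical, and the sketch does not track correlation with downstream variables, which is the difficult part noted in the text.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Gaussian:
    mean: float
    var: float


def _pdf(x):
    """Standard normal probability density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def stat_sum(a, b, rho=0.0):
    """Sum of two jointly Gaussian variables with correlation rho."""
    cov = rho * math.sqrt(a.var * b.var)
    return Gaussian(a.mean + b.mean, a.var + b.var + 2.0 * cov)


def stat_max(a, b, rho=0.0):
    """Moment-matched Gaussian approximation of max(a, b)."""
    cov = rho * math.sqrt(a.var * b.var)
    theta_sq = a.var + b.var - 2.0 * cov
    if theta_sq <= 1e-12:
        # The difference a - b is (almost) deterministic.
        return a if a.mean >= b.mean else b
    theta = math.sqrt(theta_sq)
    alpha = (a.mean - b.mean) / theta
    p_a = _cdf(alpha)          # probability that a is the larger one
    p_b = 1.0 - p_a
    first = a.mean * p_a + b.mean * p_b + theta * _pdf(alpha)
    second = ((a.mean ** 2 + a.var) * p_a
              + (b.mean ** 2 + b.var) * p_b
              + (a.mean + b.mean) * theta * _pdf(alpha))
    return Gaussian(first, max(second - first * first, 0.0))


# Two input arrival times plus their arc delays converge on one output
# (all values hypothetical, in ns and ns^2).
at_via_a = stat_sum(Gaussian(1.00, 0.0100), Gaussian(0.50, 0.0025))
at_via_b = stat_sum(Gaussian(1.10, 0.0400), Gaussian(0.35, 0.0025))
at_out = stat_max(at_via_a, at_via_b)
print(at_out)
```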
The biggest demerit of AOCV is that it does not take into account the effect of load and input slew on derating factors.

(4) Parametric on-chip variation (POCV): A POCV technique considers the impact of local variations in a statistical framework. In contrast to full SSTA, a POCV methodology does not require statistical library characterization or statistical RC extraction. However, it requires information of the relative variation in gate-delay and the RC of interconnects. This information is easier to gather for standard cells, for example using Monte-Carlo simulation of a single cell or a few stages of cells. Using this information, along with the SSTA block-based method and statistical operations, the delay and the arrival time are computed as a function of the given relative variation. Furthermore, POCV can also account for the dependency of delay variations on the slew and load by taking information from Liberty variation format (LVF) files [57,59]. An LVF file models the variation in delay, output slew and constraints (setup/hold check values) as a function of input slew and output load. Therefore, POCV is similar to a single-variable SSTA without requiring costly statistical models. Nevertheless, it removes artificial pessimism of OCV derates, and in contrast to AOCV, is able to model variations at the level of timing arcs [57].

It is important to note that the increased impact of process variations at advanced process nodes must also be mitigated by the implementation engines at the circuit level [8].

4.3 Increased Multi-Modes Multi-Corner (MMMC) Combination

The challenge of an increased number of scenarios in an MMMC STA is firstly handled by:

(1) Excluding certain corner-mode combinations that cannot exist in a realistic design.

(2) Identifying certain scenarios which are contained in some other scenarios, i.e. the worst slack in one scenario is guaranteed to be bounded by the worst slack in some other scenario.

Nevertheless, even after this reduction, the number of scenarios can be overwhelmingly large. In modern EDA tools, a large number of scenarios is typically handled by exploiting the benefits of multithreading and distributed computing. Furthermore, to avoid ping-pong effects in optimization, techniques based on machine-learning are being explored [41].

4.4 Preventing and Fixing Crosstalk Noise Issues

In general, crosstalk noise issues are discovered only at the end of a design flow. However, to achieve quicker timing closure, crosstalk issues must be considered and prevented throughout a design flow. For example, during floorplanning, narrow channels must be avoided since parallel wires running within a narrow channel can create crosstalk issues. During physical optimization, some upper limit must be specified on the slew and, during routing, long parallel runs of wires should be avoided [8]. Despite these measures, noise problems typically get discovered during signoff and can be fixed by (a) increasing the slew of an aggressor (downsizing the driver of the aggressor net), (b) upsizing the driver of a victim net, (c) increasing the spacing between an aggressor and the corresponding victim, and (d) adding a shielding wire between an aggressor and the corresponding victim if routing resources are available [8].

4.5 Reducing Pessimism in the Timing Analysis

Some of the strategies employed to reduce pessimism in STA and ease the timing closure problem are as follows:

(1) Path-based analysis (PBA): In a conventional STA, the worst-case scenario of a circuit is determined using graph-based analysis (GBA). In GBA, for a given output pin of a gate, only the worst possible values of the arrival time and the output slew are stored and propagated, as illustrated in Figure 5(a). However, for a given output of a multi-input gate, the worst possible values of the arrival time and the output slew can correspond to two different timing arcs, which is unrealistic. Therefore, a GBA-based STA gives a loose bound on the operating frequency and is pessimistic. An alternative technique, referred to as path-based analysis (PBA), removes the pessimism of GBA by carrying out the analysis of a given path by strictly using the timing arcs of that particular path, as illustrated in Figure 5(a). At advanced process nodes, given the difficulty in achieving timing closure, PBA is carried out for the top few thousand paths that exhibit the worst slacks. Though PBA can consume a significant amount of computing and engineering resources, it is now widely employed to reclaim performance lost due to conventional GBA-based STA.

(2) Common path pessimism removal (CPPR): Another pessimism in a conventional STA is due to common circuit elements in a launch clock path and the corresponding capture clock path. For late analysis, the worst-case scenario is determined by considering the latest arrival time for a launch clock

Figure 5: (a) For GBA in late analysis, at the output port OUT, the arrival time is MAX(AT_A + D_A, AT_B + D_B) and the slew is MAX(S_A, S_B). For PBA, when the relevant arc is A → OUT, at the output port OUT, the arrival time is (AT_A + D_A) and the slew is S_A. (b) Assuming different delay values for the common path (shown as shaded element) for a launch clock path and the corresponding capture clock path introduces pessimism.
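The two panels of Figure 5 can be illustrated numerically with a short sketch. The arc values, the linear slew-dependent delay of the downstream stage, the derating factors and all function names are hypothetical and serve only to show where the pessimism comes from: panel (a) compares the GBA and PBA values at OUT, and panel (b) computes the credit obtained when the common clock segment is no longer assigned two different delays.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    arrival_in: float  # arrival time AT at the gate input (ns)
    delay: float       # arc delay D from that input to OUT (ns)
    slew_out: float    # output slew S produced through this arc (ns)


def gba_at_out(arcs):
    """GBA (late): worst arrival and worst slew, possibly from different arcs."""
    arrival = max(a.arrival_in + a.delay for a in arcs)
    slew = max(a.slew_out for a in arcs)
    return arrival, slew


def pba_at_out(arc):
    """PBA: arrival and slew strictly from the arc on the analysed path."""
    return arc.arrival_in + arc.delay, arc.slew_out


def next_stage_delay(slew_in, intrinsic=0.20, slew_sensitivity=0.50):
    """Hypothetical linear delay model for the gate driven by OUT."""
    return intrinsic + slew_sensitivity * slew_in


def cppr_credit(common_base_delay, late_derate, early_derate):
    """Pessimism caused by derating the common clock segment both ways."""
    return common_base_delay * (late_derate - early_derate)


def setup_slack(period, launch_late, data_late, capture_early, setup, credit=0.0):
    """Setup slack = required time - arrival time, plus any CPPR credit."""
    return (period + capture_early - setup) - (launch_late + data_late) + credit


# Panel (a): arc A arrives latest, arc B produces the worst slew.
arc_a = Arc(arrival_in=1.0, delay=0.5, slew_out=0.10)
arc_b = Arc(arrival_in=0.6, delay=0.4, slew_out=0.30)
gba_arrival, gba_slew = gba_at_out([arc_a, arc_b])
pba_arrival, pba_slew = pba_at_out(arc_a)
gba_next = gba_arrival + next_stage_delay(gba_slew)
pba_next = pba_arrival + next_stage_delay(pba_slew)

# Panel (b): 1.0 ns common clock segment, 0.5 ns private segment per path.
late, early = 1.1, 0.9
launch_late = (1.0 + 0.5) * late
capture_early = (1.0 + 0.5) * early
credit = cppr_credit(1.0, late, early)
slack_raw = setup_slack(3.0, launch_late, 2.0, capture_early, 0.1)
slack_cppr = setup_slack(3.0, launch_late, 2.0, capture_early, 0.1, credit)
print(gba_next, pba_next, slack_raw, slack_cppr)
```

In this sketch GBA pairs the arrival time of arc A with the slew of arc B, a combination that no real path sees, while the CPPR credit equals the spread between the late and early delay of the shared segment.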

path and the earliest arrival time for the corresponding capture clock path. However, when the launch clock path and the capture clock path have common circuit elements, as shown in Figure 5(b), assuming different delay values for the same circuit elements is unrealistic and introduces pessimism. Therefore, common path pessimism removal (CPPR) techniques can ease timing closure problems. Moreover, during the implementation of a circuit, the clock network can be shared between launch and capture paths to claim CPPR credits.

(3) Remove pessimism in BEOL corners: The variations in different BEOL layers are not fully correlated and the impacts of random variations in different BEOL layers tend to cancel each other. Since the probability of all BEOL layers simultaneously taking the worst possible value is quite low, corner-based STA is pessimistic. By tightening the BEOL corners, using the methodology suggested in [36], some pessimism can be removed.

(4) Setup-hold pessimism reduction (SHPR): In a conventional STA, the setup time, the hold time and the clock-to-q delay of a flip-flop are characterized independently. However, there exists an interdependence between them. For example, if the setup time or the hold time is reduced below a certain threshold, a sharp increase in the clock-to-q delay is observed [60]. These interdependencies can be exploited in STA to remove some pessimism, by employing flexible timing models and SHPR techniques [60].

4.6 Accuracy of Delay Model

To fulfill the accuracy requirements of a delay model at advanced process nodes, the composite current source (CCS) model and the effective current source model (ECSM) are being used [47,48]. These models are able to compute delay with considerable accuracy for high impedance interconnects and distorted signals. The CCS driver model characterizes the drawn current as a function of input slew, output load and time. The ECSM driver model characterizes the voltage response at the output of a gate as a function of input slew and output load. However, since both these models are represented as 3-D tables, timing libraries become quite bulky [48]. The large size of timing libraries increases the runtime of EDA tools and can impact the schedule of timing closure [46].

The problem of MIS is proposed to be handled using a multi-port current source model (MCSM) comprising non-linear voltage-controlled resistors and capacitors [50]. Recently, MIS has been proposed to be handled by simple gate-specific derating that avoids any major change in a timing library [4]. However, it appears that enhancing the delay model to account for MIS will be required at advanced process nodes.

5. CONCLUSION

In this paper, we discussed the timing closure problem and highlighted critical challenges of timing closure at advanced process nodes and possible solutions. However, it is worth mentioning that the highlighted challenges and solutions cannot be considered exhaustive. Depending on the process technology and design specifics, newer challenges do emerge. Furthermore, continuing device innovations and design innovations will continue posing new timing closure challenges in future.

DISCLOSURE STATEMENT

No potential conflict of interest was reported by the authors.

REFERENCES

1. J. Lou, W. Chen, and M. Pedram, “Concurrent logic restructuring and placement for timing closure,” in Proceedings of the 1999 IEEE/ACM International Conference on Computer-aided Design, IEEE Press, 1999, pp. 31–36.

2. O. Coudert, “Timing and design closure in physical design flows,” in Proceedings International Symposium on Quality Electronic Design, 2002. IEEE, 2002, pp. 511–516.

3. C. J. Alpert, S. K. Karandikar, Z. Li, G.-J. Nam, S. T. Quay, H. Ren, C. N. Sze, P. G. Villarrubia, and M. C. Yildiz, “Techniques for fast physical synthesis,” Proc. IEEE, Vol. 95, no. 3, pp. 573–99, 2007.

4. A. B. Kahng, “New game, New goal posts: A recent history of timing closure,” in Design Automation Conference (DAC), 2015 52nd ACM/EDAC/IEEE. IEEE, 2015, pp. 1–6.

5. W. J. Dally, C. Malachowsky, and S. W. Keckler, “21st century digital design tools,” in Proceedings of the 50th Annual Design Automation Conference, ser. DAC ’13. New York, NY: ACM, 2013, pp. 94:1–94:6.

6. H. Bhatnagar, Advanced ASIC Chip Synthesis: Using Synopsys’ Design Compiler and PrimeTime. Norwell, MA: Kluwer Academic, 1999.

7. D. Blaauw, K. Chopra, A. Srivastava, and L. Scheffer, “Statistical timing analysis: From basic principles to state of the art,” IEEE Trans. Comput. Aided Des. Integrated Circ. Syst., Vol. 27, no. 4, pp. 589–607, 2008.

8. L. Lavagno, I. Markov, L. Scheffer, and G. Martin, EDA for IC Implementation, Circuit Design, and Process Technology (Electronic Design Automation for Integrated Circuits Handbook). Boca Raton, FL: CRC Press, 2016.

9. L. Amarú, P. Vuillod, J. Luo, and J. Olson, “Logic optimization and synthesis: Trends and directions in industry,” in 2017 Design, Automation & Test in Europe Conference & Exhibition (DATE), IEEE, 2017, pp. 1303–1305.

10. S. Kim, S. Do, and S. Kang, “Fast predictive useful skew methodology for timing-driven placement optimization,” in 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC), Jun. 2017, pp. 1–6.

11. T. T. Kong, “A Novel net weighting algorithm for timing-driven placement,” in Proceedings of the 2002 IEEE/ACM International Conference on Computer-aided Design, ser. ICCAD ’02. New York, NY: ACM, 2002, pp. 172–176.

12. C. Alpert and A. Devgan, “Wire segmenting for improved buffer insertion,” in Proceedings of the 34th Annual Design Automation Conference, ACM, 1997, pp. 588–593.

13. S.-W. Hur, A. Jagannathan, and J. Lillis, “Timing-driven maze routing,” IEEE Trans. Comput. Aided Des. Integrated Circ. Syst., Vol. 19, no. 2, pp. 234–41, 2000.

14. C. Chu, “Flute: Fast lookup table based wirelength estimation technique,” in IEEE/ACM Int. Conf. Comput. Aided Des., 2004. ICCAD-2004. November 2004, pp. 696–701.

15. E. Le Sueur and G. Heiser, “Dynamic voltage and frequency scaling: The laws of diminishing returns,” in Proceedings of the 2010 International Conference on Power Aware Computing and Systems, 2010, pp. 1–8.

16. R. G. Dreslinski, M. Wieckowski, D. Blaauw, D. Sylvester, and T. Mudge, “Near-threshold computing: Reclaiming moore’s law through energy efficient integrated circuits,” Proc. IEEE, Vol. 98, no. 2, pp. 253–66, 2010.

17. K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand, “Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits,” Proc. IEEE, Vol. 91, no. 2, pp. 305–27, 2003.

18. J. W. Tschanz, J. T. Kao, S. G. Narendra, R. Nair, D. A. Antoniadis, A. P. Chandrakasan, and V. De, “Adaptive body bias for reducing impacts of die-to-die and within-die parameter variations on microprocessor frequency and leakage,” IEEE J. Solid-State Circ., Vol. 37, no. 11, pp. 1396–402, 2002.

19. A. Pyzyna, R. Bruce, M. Lofaro, H. Tsai, C. Witt, L. Gignac, M. Brink, M. Guillorn, G. Fritz, H. Miyazoe, D. Klaus, E. Joseph, K. P. Rodbell, C. Lavoie, and D. G. Park, “Resistivity of copper interconnects beyond the 7 nm node,” in 2015 Symposium on VLSI Technology, Jun. 2015, pp. T120–T121.

20. C. Pan and A. Naeemi, “A paradigm shift in local interconnect technology design in the era of nanoscale multigate and gate-all-around devices,” IEEE Electron Device Lett., Vol. 36, no. 3, pp. 274–76, Mar. 2015.

21. C. Pan and A. Naeemi, “A proposal for a novel hybrid interconnect technology for the end of roadmap,” IEEE Electron Device Lett., Vol. 35, no. 2, pp. 250–52, 2014.

22. J. W. McPherson, “Reliability challenges for 45 nm and beyond,” in Proceedings of the 43rd Annual Design Automation Conference, ser. DAC ’06. New York, NY: ACM, 2006, pp. 176–181.

23. T. B. Chan, W. T. J. Chan, and A. B. Kahng, “On aging-aware signoff for circuits with adaptive voltage scaling,” IEEE Trans. Circ. Syst., Vol. 61, no. 10, pp. 2920–30, October 2014.

24. S. Bian, M. Hiromoto, M. Shintani, and T. Sato, “LSTA: Learning-based static timing analysis for high-dimensional correlated on-chip variations,” in 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC), Jun. 2017, pp. 1–6.

25. Y. Du and M. D. Wong, “Optimization of standard cell based detailed placement for 16 nm finfet process,” in Design, Automation and Test in Europe Conference and Exhibition (DATE), 2014, IEEE, 2014, pp. 1–6.

26. A. B. Kahng and H. Lee, “Minimum implant area-aware gate sizing and placement,” in Proceedings of the 24th Edition of the Great Lakes Symposium on VLSI, ACM, 2014, pp. 57–62.

27. Y. Lin, B. Yu, and D. Z. Pan, “Detailed placement in advanced technology nodes: A survey,” in 2016 13th IEEE International Conference on Solid-State and Integrated Circuit Technology, Oct. 2016, pp. 836–839.

28. P. Debacker, K. Han, A. B. Kahng, H. Lee, P. Raghavan, and L. Wang, “Vertical m1 routing-aware detailed placement for congestion and wirelength reduction in sub-10nm nodes,” in 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC), Jun. 2017, pp. 1–6.

29. S. Natarajan, M. Agostinelli, S. Akbar, M. Bost, A. Bowonder, V. Chikarmane, S. Chouksey, A. Dasgupta, K. Fischer, and Q. Fu et al., “A 14 nm logic technology featuring 2nd generation FinFET, air-gapped interconnects, self-aligned double patterning and a 0.0588 µm² SRAM cell size,” in Electron Devices Meeting (IEDM), 2014 IEEE International. IEEE, 2014, pp. 3–7.

30. R. Xie, P. Montanini, K. Akarvardar, N. Tripathi, B. Haran, S. Johnson, T. Hook, B. Hamieh, D. Corliss, and J. Wang et al., “A 7 nm FinFET technology featuring euv patterning and dual strained high mobility channels,” in Electron Devices Meeting (IEDM), 2016 IEEE International, IEEE, 2016, pp. 2–7.

31. S. R. Nassif, “Design for variability in DSM technologies [deep submicron technologies],” in Proceedings IEEE 2000 First International Symposium on Quality Electronic Design (Cat. No. PR00525), 2000, pp. 451–454.

32. K. Patel, T. J. K. Liu, and C. J. Spanos, “Gate line edge roughness model for estimation of finfet performance variability,” IEEE Trans. Electron Device., Vol. 56, no. 12, pp. 3055–63, Dec. 2009.

33. J.-S. Yoon, C.-K. Baek, and R.-H. Baek, “Process-induced variations of 10-nm node bulk nFinFETs considering middle-of-line parasitics,” IEEE Trans. Electron Device., Vol. 63, no. 9, pp. 3399–405, 2016.

34. K. J. Kuhn, M. D. Giles, D. Becher, P. Kolar, A. Kornfeld, R. Kotlyar, S. T. Ma, A. Maheshwari, and S. Mudanai, “Process technology variation,” IEEE Trans. Electron Device., Vol. 58, no. 8, pp. 2197–208, Aug. 2011.

35. D. Prasad, A. Ceyhan, C. Pan, and A. Naeemi, “Adapting interconnect technology to multigate transistors for optimum performance,” IEEE Trans. Electron Device., Vol. 62, no. 12, pp. 3938–44, 2015.

36. T. B. Chan, S. Dobre, and A. B. Kahng, “Improved signoff methodology with tightened BEOL corners,” in 2014 IEEE 32nd International Conference on Computer Design (ICCD), Oct. 2014, pp. 311–316.

37. J. Bhasker and R. Chadha, Static Timing Analysis for Nanometer Designs: A Practical Approach. 1st ed. Springer, 2009.

38. K. L. Shepard and V. Narayanan, “Noise in deep submicron digital design,” in Proceedings of the 1996 IEEE/ACM International Conference on Computer-aided Design, IEEE Computer Society, 1997, pp. 524–531.

39. A. Dasdan and I. Hom, “Handling inverted temperature dependence in static timing analysis,” ACM Transactions on Design Automation of Electronic Systems (TODAES), Vol. 11, no. 2, pp. 306–24, 2006.

40. W. Shen, Y. Cai, W. Chen, Y. Lu, Q. Zhou, and J. Hu, “Useful clock skew optimization under a multi-corner multi-mode design framework,” in 2010 11th International Symposium on Quality Electronic Design (ISQED), March 2010, pp. 62–68.

41. K. Han, J. Li, A. B. Kahng, S. Nath, and J. Lee, “A global-local optimization framework for simultaneous multi-mode multi-corner clock skew variation reduction,” in Proceedings of the 52nd Annual Design Automation Conference, ser. DAC ’15. New York, NY, USA: ACM, 2015, pp. 26:1–26:6.

42. S. Roy, P. M. Mattheakis, L. Masse-Navette, and D. Z. Pan, “Clock tree resynthesis for multi-corner multi-mode timing closure,” IEEE Trans. Comput. Aided Des. Integrated Circ. Syst., Vol. 34, no. 4, pp. 589–602, 2015.

43. M. Becer, R. Vaidyanathan, C. Oh, and R. Panda, “Crosstalk noise control in an SoC physical design flow,” IEEE Trans. Comput. Aided Des. Integrated Circ. Syst., Vol. 23, no. 4, pp. 488–97, 2004.

44. K. Tseng and V. Kariat, “Static noise analysis with noise windows,” in Proceedings 2003. Design Automation Conference (IEEE Cat. No.03CH37451), June 2003, pp. 864–868.

45. R. Chen, L. Zhang, V. Zolotov, C. Visweswariah, and J. Xiong, “Static timing: Back to our roots,” in Proceedings of the 2008 Asia and South Pacific Design Automation Conference, ser. ASP-DAC ’08. Los Alamitos, CA: IEEE Computer Society Press, 2008, pp. 310–315.

46. S. Saurabh and P. Mittal, “A practical methodology to compress technology libraries using recursive polynomial representation,” in 2018 31st International Conference on VLSI Design and 2018 17th International Conference on Embedded Systems (VLSID), January 2018, pp. 301–306.

47. J. F. Croix and D. Wong, “Blade and razor: Cell and interconnect delay analysis using current-based models,” in Proceedings Design Automation Conference, 2003 IEEE, 2003, pp. 386–389.

48. I. Keller, K. H. Tam, and V. Kariat, “Challenges in gate level modeling for delay and SI at 65 nm and below,” in Proceedings of the 45th Annual Design Automation Conference, ser. DAC ’08. New York, NY: ACM, 2008, pp. 468–473.

49. S. Saurabh and N. Kumar, “Method and apparatus for efficient generation of compact waveform-based timing models,” August 8, 2017, US Patent 9,727,676.

50. C. Amin, C. Kashyap, N. Menezes, K. Killpack, and E. Chiprout, “A multi-port current source model for multiple-input switching effects in CMOS library cells,” in 2006 43rd ACM/IEEE Design Automation Conference, July 2006, pp. 247–252.

51. B. Catanzaro, K. Keutzer, and B.-Y. Su, “Parallelizing CAD: A timely research agenda for EDA,” in 2008 45th ACM/IEEE Design Automation Conference, June 2008, pp. 12–17.

52. A. B. Kahng, “Machine learning applications in physical design: Recent results and directions,” in Proceedings of the 2018 International Symposium on Physical Design, ser. ISPD ’18. New York, NY: ACM, 2018, pp. 68–73.

53. M. Pan and C. Chu, “IPR: An integrated placement and routing algorithm,” in Proceedings of the 44th Annual Design Automation Conference, ser. DAC ’07. New York, NY: ACM, 2007, pp. 59–62.

54. C. W. Moon, H. Kriplani, and K. P. Belkhale, “Timing model extraction of hierarchical blocks by Graph reduction,” in Proceedings of the 39th Annual Design Automation Conference, ser. DAC ’02. New York, NY: ACM, 2002, pp. 152–157.

55. S. Saurabh, N. Kumar, and I. Keller, “Method and apparatus for comprehension of common path pessimism during timing model extraction,” January 20, 2015, US Patent 8,938,703.

56. J. W. Goodman, F. J. Leonberger, S.-Y. Kung, and R. A. Athale, “Optical interconnections for VLSI systems,” Proc. IEEE, Vol. 72, no. 7, pp. 850–66, July 1984.

57. A. Mutlu, J. Le, R. Molina, and M. Celik, “A parametric approach for handling local variation effects in timing analysis,” in Proceedings of the 46th Annual Design Automation Conference, ACM, 2009, pp. 126–129.

58. C. Visweswariah, K. Ravindran, K. Kalafala, S. G. Walker, S. Narayan, D. K. Beece, J. Piaget, N. Venkateswaran, and J. G. Hemmett, “First-order incremental block-based statistical timing analysis,” IEEE Trans. Comput. Aided Des. Integrated Circ. Syst., Vol. 25, no. 10, pp. 2170–80, October 2006.

59. B. Bautz and S. Lokanadham, “A slew/load-dependent approach to single-variable statistical delay modeling,” in Proceedings Tau Workshop, 2014, pp. 1–18.

60. E. Salman, A. Dasdan, F. Taraporevala, K. Kucukcakar, and E. G. Friedman, “Exploiting setup and hold-time interdependence in static timing analysis,” IEEE Trans. Comput. Aided Des. Integrated Circ. Syst., Vol. 26, no. 6, pp. 1114–25, June 2007.

Authors

Sneh Saurabh is an Assistant Professor at IIIT Delhi in the Department of Electronics and Communication Engineering. He obtained his Ph.D. from IIT Delhi in the year 2012 and B.Tech. (EE) from IIT Kharagpur in the year 2000. Before joining IIIT Delhi in June 2016, he worked in the semiconductor industry for around sixteen years. He has contributed at various technical and managerial positions at Cadence Design Systems, Synopsys India, Magma Design Automation and Atrenta India. He has expertise in the areas of STA, Logic and Physical Synthesis and Formal Verification. Currently, he is Senior Member, IEEE and an Editor of IETE Technical Review.

Corresponding author. E-mail: [email protected]

Hitarth Shah is pursuing M.Tech. in VLSI and Embedded Systems from IIIT Delhi. He has completed B.E. in Electrical Engineering from GTU. His research interests are in Mixed-signal design and Memory design.

E-mail: [email protected]

Shivendra Singh is pursuing M.Tech. in VLSI and Embedded Systems from IIIT Delhi. He has completed B.E. in Electronics and Communications Engineering from RGPV Bhopal. His research interests are in Mixed-signal design and Nanoelectronics.

E-mail: [email protected]