POWER ANALYSIS AND REDUCTION FOR NANOSCALE CIRCUITS

LEAKAGE CURRENT IN THE NANOMETER REGIME HAS BECOME A SIGNIFICANT PORTION OF POWER DISSIPATION IN CMOS CIRCUITS AS THRESHOLD VOLTAGE, CHANNEL LENGTH, AND GATE OXIDE THICKNESS SCALE DOWNWARD. VARIOUS TECHNIQUES ARE AVAILABLE TO REDUCE LEAKAGE POWER IN HIGH-PERFORMANCE SYSTEMS.

Amit Agarwal, Intel Corp.
Saibal Mukhopadhyay, Arijit Raychowdhury, and Kaushik Roy, Purdue University
Chris H. Kim, University of Minnesota

In the nanometer regime, leakage currents make up a significant portion of the total power consumption in high-performance digital circuits. High-performance systems must work within a predefined power budget, so leakage power reduces the power available for computation and impacts performance. Leakage also contributes to power consumption during standby operation, which reduces battery life. Designers therefore need techniques that reduce leakage power while maintaining high performance. Different leakage components become more important as technology scales, so each leakage reduction technique must be reevaluated in scaled technologies, where subthreshold conduction is not the only leakage mechanism. Designers will need new low-power circuit techniques to reduce total leakage in high-performance nanoscale circuits.

Leakage components

CMOS devices have scaled downward aggressively in each technology generation to achieve higher integration density and performance. However, leakage current has increased drastically with technology scaling and has become a major contributor to total IC power. Different leakage mechanisms contribute to the total leakage in a device. As Figure 1 shows, the three major types of leakage mechanisms are

• subthreshold,
• gate, and
• reverse-biased, drain- and source-substrate junction band-to-band tunneling (BTBT).1

With technology scaling, each of these leakage components increases drastically, and the total leakage current rises with them. This growth has two major implications for leakage estimation and low-power logic design. First, the individual increases add up to a dramatic increase in total leakage. More importantly, each of the leakage components becomes equally important in nanoscaled devices.2 The relative magnitudes of the leakage components therefore play a major role in low-leakage logic design.3

68 Published by the IEEE Computer Society 0272-1732/06/$20.00 © 2006 IEEE

Figure 1. Major leakage components in a transistor. (Diagram labels: gate leakage, subthreshold leakage, and reverse-biased junction BTBT; n+ source and drain in the bulk.)

Figure 2. Contribution of different leakage components in NMOS devices4 at different technology generations. The leakage values are extracted using device simulation in Medici. VDD values are chosen following the ITRS guidelines (0.7 V at 25 nm, 0.9 V at 50 nm, and 1.2 V at 90 nm). (Log-scale leakage from 10^−13 to 10^−6 for Leff = 25, 50, and 90 nm; components: gate, subthreshold, and junction BTBT.)

In addition to the three major leakage components, there are others, such as gate-induced drain leakage (GIDL) and punch-through current. These components are not serious in normal modes of operation. GIDL becomes a concern when VGD < 0 (VGD being the voltage across the gate and the drain of the transistor), a condition that pass-gate logic certainly creates. However, for the range of VDD suggested by the International Technology Roadmap for Semiconductors, the most negative VGD is −VDD. At that value, GIDL is not significant.

As technology scales downward, the supply voltage must also scale down to reduce dynamic power and maintain reliability. However, a lower supply voltage requires a lower Vth to maintain a reasonable gate overdrive. Vth scaling, together with the further Vth reduction caused by short-channel effects (SCEs) such as drain-induced barrier lowering (DIBL),2 results in an exponential increase in subthreshold current. To control the SCE and to increase the transistor drive strength, oxide thickness must also become thinner in each technology generation. Aggressive scaling of oxide thickness results in a high direct-tunneling current through the transistor's gate insulator.2 Scaled devices also require higher substrate doping densities and "halo" profiles (implants of a high doping region near the source and drain junctions of the channel). These reduce the width of the depletion region for the source- and drain-substrate junctions.2 A narrower depletion region helps to control the short-channel effect. However, the high doping density near the source- and drain-substrate junctions causes a significantly large BTBT current through these junctions under high reverse bias.2 We conclude that each of the leakage components increases with technology scaling, as Figure 2 shows.

Figure 3. Variation of different leakage components with technology generation and oxide thickness (a), and doping profile (b). Doping-1 has a stronger halo profile than Doping-2. We extracted leakage values using device simulation in Medici. We chose VDD as 0.7 V at 25 nm and 0.9 V at 50 nm. (Panel a: current (A), log scale, for Leff = 25 nm and 50 nm at Tox values between 1.1 and 1.4 nm. Panel b: current (A) for Doping-1 and Doping-2. Components: gate, BTBT, and subthreshold.)

MARCH–APRIL 2006 69 POWER REDUCTION

Figure 3 shows the different leakage components of NMOS devices with physical gate lengths of 25 and 50 nm.4 The plot also shows device-simulation results at different oxide thicknesses. For each technology node, we varied only the oxide thickness and kept the doping constant. The gate and subthreshold leakages correlate strongly with oxide thickness; a high oxide thickness results in low gate leakage. Long-channel MOSFET theory holds that a higher oxide thickness increases the threshold voltage, but a thicker oxide also worsens the short-channel effect.2 If the short-channel effect is not very strong (as in the 50-nm device in Figure 3a), increasing Tox might reduce the subthreshold leakage. However, in a nanoscale device where SCE is extremely severe (here, the 25-nm device), an increase in the oxide thickness will increase the subthreshold leakage (Figure 3a).

Similarly, the subthreshold leakage and the junction BTBT are strongly coupled through the doping profile. Figure 3b shows the different leakage components of a 25-nm device at different doping profiles, with oxide thickness and VDD held constant. A strong halo doping reduces the subthreshold current but results in a high BTBT. Reducing the halo strength lowers the BTBT, but increases subthreshold current considerably. We conclude that the magnitudes of the leakage components, and which of them dominates, depend strongly on device geometry and doping profile.

The basic physical mechanisms governing the different leakage current components have different temperature dependences. Subthreshold current is governed by carrier diffusion, which increases with temperature. The tunneling probability of an electron through a potential barrier does not depend directly on temperature, so gate tunneling and junction band-to-band tunneling are less sensitive to temperature variations. However, increasing temperature reduces silicon's band gap, which is the barrier height for tunneling in BTBT. Junction BTBT should therefore increase with temperature.

Figure 4 shows the effect of temperature on each leakage component of the previously mentioned 25-nm NMOS device, based on device simulation. The subthreshold leakage increases exponentially with temperature, the junction BTBT increases slowly, and the gate leakage is almost independent of temperature. For this particular NMOS device, gate leakage is the dominant component at T = 300 K (a possible temperature in standby mode). At T = 400 K (a possible temperature in active mode), the subthreshold and BTBT leakages become dominant. Hence, the individual leakage components and the total leakage depend strongly on temperature, and therefore on the mode of operation.

Figure 4. Simulation result for variation of different leakage components with temperature for an NMOS device of Leff = 25 nm.4 (Current from 0 to 1.5 × 10^−8 A/μm versus temperature from 300 to 400 K; curves: gate, junction BTBT, subthreshold, and total.)

It is evident that in nanoscaled devices all of the different leakage components become important. Their magnitudes depend strongly on the device structure, doping profile, and temperature.
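The temperature trends in Figure 4 can be reproduced qualitatively with first-order textbook models. The sketch below is illustrative only and is not the Medici simulation used in the article. It makes three modeling assumptions:

• Subthreshold current (at Vgs = 0) is proportional to μ·vT²·exp(−Vth/(n·vT)), with mobility μ ∝ T^−1.5 and a linear Vth(T).
• Junction BTBT depends on temperature only through bandgap narrowing. The bandgap comes from the Varshni fit for silicon, and B_BTBT is a hypothetical lumped exponent.
• Gate current is constant with temperature.

Every numeric parameter except the Varshni constants is hypothetical.

```python
import math

K_B = 8.617e-5  # Boltzmann constant (eV/K); K_B * T is the thermal voltage in volts

# Hypothetical parameters, chosen only for illustration (not from the article).
VTH_300 = 0.30       # threshold voltage at 300 K (V)
DVTH_DT = -0.8e-3    # threshold-voltage temperature coefficient (V/K)
N_SLOPE = 1.5        # subthreshold slope factor
B_BTBT = 20.0        # lumped BTBT exponent coefficient with the field folded in (eV^-1.5)
I_SUB_300 = 1e-9     # subthreshold leakage at 300 K (A)
I_BTBT_300 = 5e-10   # junction BTBT leakage at 300 K (A)
I_GATE = 2e-9        # gate tunneling leakage, assumed temperature independent (A)


def bandgap_si(t):
    """Silicon bandgap (eV), Varshni fit: Eg0 = 1.170 eV, alpha = 4.73e-4 eV/K, beta = 636 K."""
    return 1.170 - 4.73e-4 * t * t / (t + 636.0)


def _sub_shape(t):
    """Unnormalized off-state (Vgs = 0) subthreshold current at temperature t (K)."""
    vt = K_B * t
    vth = VTH_300 + DVTH_DT * (t - 300.0)
    mobility = (t / 300.0) ** -1.5
    return mobility * vt * vt * math.exp(-vth / (N_SLOPE * vt))


def subthreshold(t):
    """Subthreshold leakage (A), normalized to I_SUB_300 at 300 K."""
    return I_SUB_300 * _sub_shape(t) / _sub_shape(300.0)


def junction_btbt(t):
    """BTBT leakage (A): tunneling ~ exp(-B * Eg^1.5), so a narrower gap means more current."""
    return I_BTBT_300 * math.exp(-B_BTBT * (bandgap_si(t) ** 1.5 - bandgap_si(300.0) ** 1.5))


def gate(t):
    """Gate leakage (A): no direct temperature dependence in this model."""
    return I_GATE


if __name__ == "__main__":
    for t in (300, 350, 400):
        print(f"{t} K: sub={subthreshold(t):.2e} A, "
              f"btbt={junction_btbt(t):.2e} A, gate={gate(t):.2e} A")
```

With these assumed values, gate leakage dominates at 300 K and subthreshold leakage dominates at 400 K, while BTBT grows only modestly. This matches the qualitative ordering described above; where the components actually cross depends entirely on the device.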

70 IEEE MICRO

Circuit techniques to reduce leakage in logic

Circuits are mostly designed for the highest performance, for example to satisfy overall system cycle time requirements. As a result, they typically consist of large gates and highly parallel architectures with logic duplication, and their leakage power consumption is substantial. However, not every application requires a fast circuit to operate at the highest performance level all the time. Modules in which computation is bursty (such as certain functional units or cache sections) are often idle. This idle time is an opportunity to reduce the leakage power these circuits consume. Researchers have proposed different circuit techniques that use this slack to reduce leakage energy without impacting performance.

In Table 1, we categorize these techniques based on when and how they use the available timing slack. Dual Vth statically assigns a high Vth to some transistors in the noncritical paths at design time to reduce leakage current. Techniques that use the slack at runtime fall into two groups, depending on whether they reduce standby or active leakage. Standby-leakage reduction techniques put the entire system in a low-leakage mode when computation is not required. Active-leakage reduction techniques slow down the system by dynamically changing Vth to reduce leakage when maximum performance is not needed. In active mode, the operating temperature increases because of the transistors' switching activity. Subthreshold leakage grows exponentially with this temperature rise, so it becomes the dominant leakage component during active mode and amplifies the leakage problem.

Table 1. Circuit techniques to reduce leakage.
Design time techniques | Runtime techniques: standby leakage reduction | Runtime techniques: active leakage reduction
Dual Vth | Natural stacking | DVTS
 | Sleep transistor |
 | FBB/RBB |

Figure 5. A dual Vth CMOS circuit (a), and a path distribution of dual Vth and single Vth CMOS (b). (Panel a: low-Vth gates on the critical path and high-Vth gates on the noncritical path. Panel b: path delay distributions from 0.1 to 0.7 ns for single low Vth, single high Vth, and dual Vth.)

Design time techniques

Design time techniques exploit the delay slack in noncritical paths to reduce leakage. These techniques are static: once they become part of the design, there is no way to change them dynamically while the circuit is operating.

Dual-threshold CMOS

In logic, designers can assign a high Vth to some transistors in the noncritical paths to reduce subthreshold leakage current. Low-Vth transistors remain in the critical path(s), which preserves performance.5 This technique does not require additional circuitry and can support both high performance and low leakage simultaneously. Figure 5a illustrates the basic idea of a dual-Vth circuit. Figure 5b shows the path distribution of dual- and single-Vth standard CMOS for a 32-bit adder. Dual-Vth CMOS has the same critical delay as the single low-Vth CMOS circuit, but the transistors in the noncritical paths can use a high Vth to reduce leakage power. Dual-threshold CMOS is effective in reducing leakage power during both standby and active modes.

Researchers have proposed many design techniques for dual-Vth design.6 Some upsize a high-Vth transistor to improve performance. Others upsize an additional low-Vth transistor to create more delay slack and then convert it to high Vth to reduce leakage power. Upsizing a transistor affects switching power and die area. Designers can trade off these effects against using a low-Vth transistor, which increases leakage power.
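The dual-Vth idea can be sketched as a greedy assignment over a gate-level timing graph:

1. Start with every gate at low Vth and record the critical delay.
2. Try raising each gate to high Vth in turn.
3. Keep each change only if the critical delay does not grow.

This is a simple illustrative heuristic, not the specific method of references 5 or 6. The two-entry gate library (delay and leakage per Vth flavor) and the six-gate netlist are hypothetical.

```python
# Hypothetical library: (delay in ns, leakage in nA) for each threshold flavor.
LIB = {"low": (0.10, 10.0), "high": (0.13, 1.0)}


def critical_delay(fanin, order, vth):
    """Latest arrival time (ns) over all gates, from a forward pass in topological order."""
    arrival = {}
    for g in order:
        start = max((arrival[p] for p in fanin[g]), default=0.0)
        arrival[g] = start + LIB[vth[g]][0]
    return max(arrival.values())


def total_leakage(vth):
    """Sum of per-gate leakage (nA) for a given Vth assignment."""
    return sum(LIB[v][1] for v in vth.values())


def assign_dual_vth(fanin, order, eps=1e-12):
    """Greedily move gates to high Vth without slowing the all-low-Vth critical path."""
    vth = {g: "low" for g in order}
    target = critical_delay(fanin, order, vth)
    for g in order:
        vth[g] = "high"
        if critical_delay(fanin, order, vth) > target + eps:
            vth[g] = "low"  # this gate has no slack; keep it fast
    return vth, target


if __name__ == "__main__":
    # g1 -> g2 -> g3 -> g4 is the critical path; g5 -> g6 -> g4 is a short side path.
    fanin = {"g1": [], "g2": ["g1"], "g3": ["g2"], "g5": [], "g6": ["g5"], "g4": ["g3", "g6"]}
    order = ["g1", "g2", "g3", "g5", "g6", "g4"]
    vth, target = assign_dual_vth(fanin, order)
    print(vth)
    print("leakage (nA):", total_leakage({g: "low" for g in order}), "->", total_leakage(vth))
```

In this toy netlist only the side path (g5, g6) moves to high Vth. The critical delay stays at its all-low-Vth value and the summed leakage drops. A production flow would also weigh leakage savings per unit of slack and consider the upsizing options mentioned above.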


Domino logic can be susceptible to leakage, especially in wide-OR domino gates. Low-threshold evaluation logic reduces noise immunity. So, for scaled technologies, domino logic can require larger keeper transistors, which in turn can affect speed. Figure 6 shows a typical dual-Vth domino logic circuit for low-leakage, noise-immune operation.7

Because the transition directions in domino logic are fixed, designers can easily assign low Vth to all transistors that switch during evaluate mode and high Vth to all transistors that switch during precharge mode. When a dual-Vth domino logic stage goes into standby mode, the domino clock must be high (evaluate) to shut off the high-Vth devices (that is, P1, I2 PMOS, and I3 NMOS). In addition, the initial inputs into the domino gate must be set high. This keeps the internal node at a solid logic zero, which turns off the high-Vth keeper and I1 NMOS.

Figure 6. A dual Vth domino gate with low Vth devices shaded.7 (Labels: Clock_n and Clock_n+1, devices P1 and I1 to I3, inputs IN_n and IN_n+1, and a low-Vt pulldown network.)

The fabrication process can achieve a high-Vth device by varying different parameters: changing the doping profile, using a higher oxide thickness, or increasing the channel length. Each parameter has its own trade-off in terms of process cost, effect on different leakage components, and SCEs.

Doping profile

Increasing the channel doping densities8 is a commonly used technique to achieve higher threshold voltages. It requires two additional masks, resulting in high process cost. In addition, the threshold voltage can vary because the doping density is not uniformly distributed. This makes it difficult to achieve dual threshold voltages when the two values are very close to each other.

A high Vth is also achievable by adjusting the halo:

• increasing the strength of the halo,
• increasing the peak doping,
• moving the halo's lateral peak closer to the channel's center, and
• moving the halo's vertical peak away from the bottom junction and toward the surface.

However, increasing the halo strength increases junction tunneling (Figure 3b). This might become severe in nanoscaled devices, where junction tunneling is a significant portion of total leakage.

Oxide thickness

A higher oxide thickness (Tox) can yield a high-Vth device for dual-threshold CMOS circuits. A higher Tox reduces not only the subthreshold leakage but also gate oxide tunneling, since the oxide tunneling current decreases exponentially as oxide thickness increases. Because a higher oxide thickness reduces the gate capacitance, it is also beneficial for dynamic power reduction.8 However, in a nanoscale device where SCEs are extremely severe (as in 25-nm devices), an increase in the oxide thickness will increase the subthreshold leakage (Figure 3a). To suppress SCEs, the high-Tox device must have a longer channel length than the low-Tox devices.8 Fabricating multiple oxide thicknesses requires advanced process technology.

Channel length

For short-channel transistors, the threshold voltage increases with channel length (Vth roll-off). A multiple-channel-length design uses conventional CMOS technology. However, for transistors with feature sizes close to 0.1 μm, designs can use halo implants2 to suppress SCE. The halo causes a very sharp Vth roll-off, so it is nontrivial to control the threshold voltages near the minimum feature size in such technologies. The longer lengths of the high-Vth transistors also increase the gate capacitance, which hurts both performance and power.

Vth variation is increasing while the supply voltage keeps scaling down. This makes it difficult to maintain a sufficient gap among the low Vth, the high Vth, and the supply voltage, as dual-Vth design requires. Furthermore, dual-Vth design increases the number of critical paths in a die. It has been shown that as the number of critical paths on a die increases, within-die delay variation causes both the mean and standard deviation of the die's frequency distribution to become smaller, resulting in reduced performance.9
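The critical-path effect cited above can be illustrated with a small Monte Carlo experiment. The sketch makes three assumptions:

• Each of n equally critical paths on a die has an independent Gaussian delay (mean 1 ns, sigma 50 ps; both values hypothetical).
• A die runs at the frequency set by its slowest path.
• Path delays are uncorrelated. Real within-die variation is spatially correlated, so the sketch shows the trend, not magnitudes.

```python
import random
import statistics


def die_frequency(n_paths, rng, mean_ns=1.0, sigma_ns=0.05):
    """Frequency (GHz) of one die, limited by the slowest of n_paths critical paths."""
    slowest = max(rng.gauss(mean_ns, sigma_ns) for _ in range(n_paths))
    return 1.0 / slowest


def frequency_stats(n_paths, n_dies=2000, seed=1):
    """Mean and standard deviation of die frequency over n_dies simulated dies."""
    rng = random.Random(seed)
    freqs = [die_frequency(n_paths, rng) for _ in range(n_dies)]
    return statistics.mean(freqs), statistics.stdev(freqs)


if __name__ == "__main__":
    for n in (1, 10, 100):
        mean, std = frequency_stats(n)
        print(f"{n:4d} critical paths: mean {mean:.3f} GHz, std {std:.4f} GHz")
```

Taking the maximum of more path delays shifts the slowest-path delay upward and narrows its spread. Both the mean and the standard deviation of die frequency therefore fall as the number of critical paths grows.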

Runtime techniques

A common architectural technique to keep the power of fast, hot circuits within bounds has been to freeze the circuits, placing them in a standby state any time they are not needed. Standby-leakage reduction techniques exploit this idea to place certain sections of the circuit in standby (low-leakage) mode when they are not required.

Exploiting natural transistor stacks

Leakage currents in NMOS or PMOS transistors depend exponentially on the voltages at the four transistor terminals. Increasing the source voltage of an NMOS transistor reduces its subthreshold leakage exponentially because of four effects: a negative Vgs, a lowered signal rail (VDD − Vs), reduced DIBL, and the body effect. This effect, also called self-reverse biasing of the transistor, can be obtained by turning off a stack of transistors. Turning off more than one transistor in a stack raises the stack's internal (source) voltage, which acts as a reverse bias on the source.

Figure 7a depicts a simple pull-down network of a four-input NAND gate, which forms a stack of four transistors. If some transistors stay off for a long time, the circuit reaches a steady state. Leakage through each transistor is then equal, and the voltage across each transistor settles to a steady-state value.

If only one NMOS device is off, the voltage at its source node is virtually zero, because all the other transistors remain on and act as a short circuit. Hence, there is no self-reverse biasing effect, and the leakage across the off transistor is large. If more than one transistor is off, the source voltages of the off transistors are greater than zero, except for the one connected to ground through the on transistors. Subthreshold leakage is an exponential function of gate-source voltage, so the most negatively self-reverse-biased transistor is the main determiner of overall leakage.

Figure 7a shows the internal voltages when all four transistors are off. These internal voltages self-reverse bias the off transistors, which makes the leakage across each off transistor very small. Figure 7b shows the subthreshold leakage current versus the number of off transistors in a stack. There is a large difference in leakage current between the one- and two-transistor-off cases. Turning off a third transistor improves subthreshold leakage further, but provides a diminishing return.

Figure 7. Effect of transistor stacking on source voltage (a), and leakage current versus number of transistors off in stack (b). (Panel a: four-transistor stack with all gates at 0 V and a 1.5-V supply; internal node voltages of 89 mV, 84 mV, and 14 mV at nodes 1, 2, and 3. Panel b: leakage (nA) versus number of transistors off in the stack.)

The voltages at the internal nodes depend on the input applied to the stack. Functional blocks such as NAND, NOR, or other complex gates have a ready stack of transistors. Applying proper input vectors maximizes the number of off transistors in these natural stacks and can reduce the standby leakage of a functional block. Lee et al.10 propose a model and algorithm to estimate leakage and to select the proper input vectors to minimize the leakage in logic blocks. Table 2 shows the quiescent current (IDDQ) flowing into different functional blocks for the best- and worst-case input vectors. All results are based on HSpice simulation using 0.18-μm technology with VDD = 1.5 V. The results show that applying the proper input vector can efficiently reduce the total subthreshold leakage in the standby mode of operation.

Table 2. Input vector control.
Circuit | Input vector | IDDQ (nA) | Type of case
Four-input NAND | ABCD = 0000 | 0.60 | Best
 | ABCD = 1111 | 24.1 | Worst
Three-input NOR | ABC = 111 | 0.13 | Best
 | ABC = 000 | 29.5 | Worst
Full adder | ABCi = 111 | 7.8 | Best
 | ABCi = 001 | 62.3 | Worst
4-bit ripple adder | ABC = 111 | 91.3 | Best
 | A = B = 1111, Ci = 1 | 94.0 | Best
 | A = B = 0101, Ci = 1 | 282.9 | Worst
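The stacking effect and the dependence on input vectors can be reproduced with a first-order subthreshold model that includes DIBL and body effect. The model is

I = I0·exp((Vgs − Vth)/(n·vT))·(1 − exp(−Vds/vT)), with Vth = Vth0 + γ'·Vsb − η·Vds.

The sketch below finds the steady-state current of k series off NMOS devices by bisection, using the condition that every device carries the same current, as described above. It then scores every input vector of a four-input NAND. This is a minimal illustration, not the model of Lee et al., and it rests on these assumptions:

• All device parameters and the 1.0-V supply are hypothetical.
• On transistors are treated as short circuits, as in the text.
• The PMOS pull-up devices share the NMOS parameters.

```python
import itertools
import math

VT = 0.0259  # thermal voltage kT/q near 300 K (V)

# Hypothetical device parameters (illustrative only).
I0, VTH0, N_SLOPE, ETA, GAMMA = 1e-7, 0.30, 1.5, 0.08, 0.15


def i_sub(vgs, vds, vsb):
    """Subthreshold current (A) with DIBL (ETA) and linearized body effect (GAMMA)."""
    vth = VTH0 + GAMMA * vsb - ETA * vds
    return I0 * math.exp((vgs - vth) / (N_SLOPE * VT)) * (1.0 - math.exp(-vds / VT))


def _drain_voltage(vs, current, vdd):
    """Drain voltage at which an off NMOS (gate 0 V, source vs) carries `current`, or None."""
    if i_sub(-vs, vdd - vs, vs) < current:
        return None  # even the full remaining voltage cannot push this much current
    lo, hi = vs, vdd
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if i_sub(-vs, mid - vs, vs) < current:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def _internal_nodes(current, n_off, vdd):
    """Internal node voltages, walking up from ground, for a trial current (None if infeasible)."""
    nodes, vs = [], 0.0
    for _ in range(n_off - 1):
        vs = _drain_voltage(vs, current, vdd)
        if vs is None:
            return None
        nodes.append(vs)
    return nodes


def stack_leakage(n_off, vdd=1.0):
    """Steady-state leakage (A) and internal node voltages (V) of n_off stacked off NMOS."""
    single = i_sub(0.0, vdd, 0.0)
    if n_off == 1:
        return single, []
    lo, hi = 1e-30, single  # a stack never leaks more than one off device
    for _ in range(100):
        current = math.sqrt(lo * hi)  # bisect on a logarithmic scale
        nodes = _internal_nodes(current, n_off, vdd)
        if nodes is None or i_sub(-nodes[-1], vdd - nodes[-1], nodes[-1]) < current:
            hi = current  # the top device cannot carry the trial current
        else:
            lo = current
    current = math.sqrt(lo * hi)
    return current, _internal_nodes(current, n_off, vdd)


def nand4_leakage(vector, vdd=1.0):
    """Approximate standby leakage (A) of a four-input NAND for one input vector."""
    if all(vector):
        # Output low: four parallel pull-up PMOS are off with VDD across each
        # (PMOS assumed to match the NMOS parameters, a simplification).
        return 4 * i_sub(0.0, vdd, 0.0)
    # Output high: on NMOS act as shorts, so only the number of off devices matters.
    return stack_leakage(vector.count(0), vdd)[0]


if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        current, nodes = stack_leakage(k)
        print(k, "off:", f"{current:.2e} A", [f"{v * 1e3:.1f} mV" for v in nodes])
    leak = {v: nand4_leakage(v) for v in itertools.product((0, 1), repeat=4)}
    print("best vector:", min(leak, key=leak.get), "worst vector:", max(leak, key=leak.get))
```

With these assumed parameters, the two-device stack settles at an internal node of a few tens of millivolts and leaks several times less than a single off device. Adding a third device helps much less, which is the diminishing return described above. The best NAND4 vector is 0000 and the worst is 1111, the same ordering as Table 2.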


Gate and junction leakage are also important in scaled technologies and can be a significant portion of total leakage. We must therefore reexamine the stack-of-transistors technique for these types of leakage. Researchers have shown that with high gate leakage, the traditional way of using stacked transistors fails to reduce leakage and, in the worst case, might increase overall leakage.3 Gate leakage depends on the voltage drop across different regions of a transistor. Applying 00 as the input to a two-transistor stack reduces only subthreshold leakage and does not change the gate leakage component. Applying 10 reduces the voltage drop across the terminals where gate leakage dominates. This lowers the gate leakage while offering only a marginal improvement in subthreshold leakage.3 Therefore, in scaled technologies where gate leakage dominates the total leakage, 10 might save more leakage than 00. The source-substrate and drain-substrate junction BTBT leakage is a weak function of input voltage, so we neglect it in this analysis.

Forced-stack (sleep) transistor

This technique inserts an extra series-connected (sleep) transistor in the pull-down or pull-up path of a gate. The sleep transistor is on during normal operation and is turned off in standby mode, which provides substantial savings in standby leakage current. However, the extra stacked transistor lowers the drive current of the forced-stack gates, which increases delay. This technique is therefore usable only in noncritical paths. If the sleep transistor's Vth is high, extra leakage savings are possible. A circuit topology that incorporates a high-Vth sleep transistor is known as multithreshold CMOS or MTCMOS.11 Figure 8 shows an example.

Figure 8. Schematic of an MTCMOS circuit with low Vth device shaded.11 (Labels: VDD and virtual VDD (VDDV), VSS and virtual VSS (VSSV), each pair separated by a sleep control transistor driven by SL.)

In fact, only one type of high-Vth sleep transistor (either PMOS or NMOS) is sufficient for leakage reduction. The NMOS insertion scheme is preferable because an NMOS has lower on-resistance than a PMOS of the same width, so the NMOS can be smaller than a corresponding PMOS.12 However, MTCMOS can reduce leakage power only in standby mode, and the large inserted sleep transistors can increase area and delay. Moreover, if the circuit must retain data in standby mode, it requires an extra high-Vth memory circuit to hold the data.

Instead of high-Vth sleep transistors, a super cutoff CMOS (SCCMOS) circuit uses low-Vth sleep transistors with an inserted gate bias generator.13 In standby mode, this internal generator drives the sleep transistor's gate to VDD + 0.4 V for PMOS (VSS − 0.4 V for NMOS), fully cutting off the leakage current. In MTCMOS, the high-Vth sleep transistor becomes difficult to turn on at very low supply voltages. SCCMOS circuits, in contrast, can operate at very low supply voltages.

Heo and Asanovic14 proposed a sleep transistor technique to save leakage in domino gates. Figure 9 shows two small sleep transistors added to a conventional CMOS domino gate.14 In standby mode, the clock is high and the sleep signal is asserted. If the data input was high, node 1 has already been discharged. If the data input was low, node 1 is high, but leakage through the NMOS dynamic pull-down stack slowly discharges it to ground. While the dynamic node discharges, the NMOS sleep transistor prevents any short-circuit current in the static output logic. As the static pull-up turns on, node 2 rises. This turns on the NMOS transistors in the pull-down stacks of the following domino gates and accelerates the discharge of their internal dynamic nodes. Because the sleep transistors are not in the critical (evaluation) path, this technique incurs a minimal performance loss.
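The area cost of the sleep transistors discussed above can be estimated with a common first-order sizing rule. The sleep device is made wide enough that the block's peak current through its on-resistance drops no more than a small fraction of VDD across the virtual rail. The sketch rests on these assumptions:

• The on-resistance R_on scales as 1/W.
• The per-micron on-resistances of the high-Vth NMOS and PMOS are hypothetical.
• Discharge patterns, dynamic effects, and everything else a real sizing flow would model are ignored.

```python
def sleep_transistor_width(i_peak_a, r_on_ohm_um, vdd, max_drop_fraction=0.05):
    """Minimum sleep-transistor width (um) so that I_peak * R_on <= max_drop_fraction * VDD.

    r_on_ohm_um is the on-resistance of a 1-um-wide device (ohm * um), so R_on = r_on_ohm_um / W.
    """
    if i_peak_a <= 0 or r_on_ohm_um <= 0 or vdd <= 0 or not 0 < max_drop_fraction < 1:
        raise ValueError("inputs must be positive and the drop fraction must lie in (0, 1)")
    return i_peak_a * r_on_ohm_um / (max_drop_fraction * vdd)


if __name__ == "__main__":
    # Hypothetical values: 10 mA peak block current, VDD = 1.0 V, 5% allowed rail drop.
    r_on = {"NMOS": 2000.0, "PMOS": 5000.0}  # assumed ohm * um for high-Vth sleep devices
    for kind, r in r_on.items():
        print(kind, f"{sleep_transistor_width(0.010, r, 1.0):.0f} um")
```

The assumed PMOS on-resistance per micron is 2.5 times the NMOS value, so the PMOS sleep device must be 2.5 times wider for the same rail drop. This is the reason the text gives for preferring NMOS insertion.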

Figure 9. Domino gate with sleep transistor.14 (Labels: Sleep and Clock inputs, both at 1 in standby; precharge device P1; input IN; node 1 (1→0) and node 2 (0→1); low-Vt pulldown network.)

Forward or reverse body biasing

Variable-threshold CMOS (VTCMOS) is a body-biasing design technique.15 Figure 10a shows the VTCMOS scheme. To achieve different threshold voltages, this scheme uses a self-substrate-bias circuit to control the body bias. In active mode, the VTCMOS technique applies a zero body bias (ZBB). In standby mode, it applies a deep reverse body bias (RBB) to raise the threshold voltage and cut off the leakage current. Supplying the body bias voltage requires routing a body bias grid, which adds to the overall chip area.

Keshavarzi et al. reported that RBB lowers IC leakage by three orders of magnitude in a 0.35-μm technology.16 However, more recent data shows that RBB's effectiveness in lowering Ioff decreases as technology scales. The cause is an exponential increase in band-to-band tunneling leakage at the source-substrate and drain-substrate p-n junctions, which comes from halo doping in scaled devices.16 Moreover, shorter channel lengths and lower channel doping (used to reduce Vth) worsen SCE and diminish the body effect. This in turn weakens RBB's ability to modulate Vth.

For scaled technologies, researchers have recently proposed forward body biasing (FBB) to achieve better current drive with fewer SCEs.17 This approach uses high-Vth transistors (high channel doping) to reduce leakage in standby mode and applies FBB in active mode to achieve high performance. Both high channel doping and FBB reduce SCEs, relaxing the limits on channel-length scaling imposed by Vth roll-off and DIBL. The result is a higher Ion than low-Vth designs for the same worst-case Ioff, which improves performance.

A design can use RBB in standby mode together with FBB to further reduce the leakage current. Researchers have shown that combining FBB and high Vth with RBB provides a 20× leakage reduction, compared with 3× for RBB and low Vth. FBB devices, however, have larger junction capacitance and body effect, which reduce the delay improvement, especially in stacked circuits. It is also possible to combine FBB with a lower VDD to reduce switching and standby leakage power while achieving the same performance as a high VDD.

Raising the NMOS source voltage while tying the NMOS body to ground produces the same effect as RBB. Similarly, applying a negative source voltage with respect to a grounded body provides FBB. Figure 10b illustrates the circuit diagram for these techniques.18 The main advantage is that the target system and the control circuitry can share the same substrate, which eliminates the need for a deep-N-well or triple-well process.


Figure 10. Variable threshold CMOS15 (a), and realizing body biasing by changing the source voltage with respect to body voltage, which is grounded18 (b).
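RBB and FBB both rely on the body effect to shift the threshold. The standard long-channel expression for it is

Vth = Vth0 + γ(√(2φF + Vsb) − √(2φF)),

with Vsb > 0 for reverse bias and Vsb < 0 for forward bias. Raising the source while grounding the body, as in Figure 10b, also gives Vsb > 0. The sketch converts the threshold shift into an approximate change in off current using a subthreshold swing S. The parameter values are hypothetical. The long-channel formula also overstates the effect in scaled devices, where, as noted above, SCEs weaken the body effect.

```python
import math

# Hypothetical NMOS parameters (illustrative only).
VTH0 = 0.30    # zero-bias threshold voltage (V)
GAMMA = 0.20   # body-effect coefficient (V^0.5)
PHI2F = 0.80   # surface potential 2*phi_F (V)
SWING = 0.090  # subthreshold swing (V/decade)


def vth(vsb):
    """Threshold voltage (V) for source-to-body bias vsb (> 0 reverse, < 0 forward)."""
    if PHI2F + vsb <= 0:
        raise ValueError("forward bias too large for this model")
    return VTH0 + GAMMA * (math.sqrt(PHI2F + vsb) - math.sqrt(PHI2F))


def off_current_ratio(vsb):
    """Approximate Ioff(vsb) / Ioff(0): each S volts of Vth shift is one decade of current."""
    return 10.0 ** (-(vth(vsb) - VTH0) / SWING)


if __name__ == "__main__":
    for label, vsb in (("RBB 0.5 V", 0.5), ("ZBB", 0.0), ("FBB 0.3 V", -0.3)):
        print(f"{label}: Vth = {vth(vsb):.3f} V, Ioff x{off_current_ratio(vsb):.2f}")
```

With these numbers, 0.5 V of RBB raises Vth by about 50 mV and cuts off current roughly 3.5×. An FBB of 0.3 V lowers Vth by about 37 mV and raises off current roughly 2.6×, which is why FBB is applied only in active mode.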


Figure 11. Dynamic Vth scaling system. (Blocks: counters on the system clock and on a critical-path replica, a summing node that forms Error(n) = Fclk(n) − Fcp(n), a feedback algorithm, and charge pumps that set the well bias VNWELL.)

Active-leakage reduction techniques

During active mode, circuits work at high temperatures. As Figure 4 shows, subthreshold leakage increases exponentially with temperature, junction BTBT increases slowly, and gate leakage is almost independent of temperature. Because of this exponential increase, active-leakage power in sub-100-nm generations accounts for a large fraction of total power consumption, even during runtime. However, not every application requires a fast circuit to operate at the highest performance level all the time. Active-leakage reduction techniques exploit this idea to intermittently slow down the fast circuitry, reducing both leakage and dynamic power when maximum performance is not required.

Dynamic Vth scaling (DVTS)

The DVTS scheme uses body biasing to adaptively change Vth based on the performance demand. When the highest performance is required, the circuit delivers the lowest Vth via ZBB. When the performance demand is low, the scheme reduces the clock frequency and raises Vth via RBB to reduce runtime leakage power. When there is no workload at all, the circuit can raise Vth to its upper limit to significantly reduce standby leakage power. By tracking the optimal Vth, the scheme delivers just enough throughput for the current workload. It considerably reduces leakage power by intermittently slowing down the circuit. Several DVTS system implementations have been proposed.19,20

Figure 11 shows DVTS hardware that uses continuous body bias control to track the optimal Vth for a given workload. A clock speed scheduler embedded in the operating system determines the reference clock frequency at runtime. The DVTS controller adjusts the PMOS and NMOS body bias so that the oscillator frequency of the critical-path replica tracks the reference clock frequency. The error signal is the frequency difference between the reference clock and the oscillator, and it feeds the feedback controller. The continuous feedback loop can also compensate for process, supply voltage, and temperature variations.

A simpler method, called the Vth hopping scheme, dynamically switches between low and high Vth depending on the performance demand.20 Figure 12 shows the schematic diagram for this scheme. Unlike the continuous body bias control in Figure 11, this discrete control has only two levels of Vth. When control signal VTHlow_Enable is asserted, the transistors in the target system are forward body biased and Vth is low. When the system can trade off performance for lower power consumption, VTHhigh_Enable is asserted and Vth is high. The target system operates at fCLK when Vth is low and at fCLK/2 when Vth is high. Researchers have verified an algorithm that adaptively changes Vth depending on the workload by applying it to an MPEG4 video encoding system.

As mentioned in the previous section, RBB's effectiveness is diminishing because of worsening SCE and increasing BTBT leakage at the source-substrate and drain-substrate junctions. Applying FBB together with RBB can achieve a better performance-leakage trade-off for DVTS systems.
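The two control styles above can be sketched in a few lines.

The first function models the continuous DVTS loop of Figure 11 as a discrete-time integral controller:

• Each step computes Error(n) = Fclk(n) − Fcp(n).
• It nudges a single body-bias variable (positive for forward bias, negative for reverse bias) in proportion to that error.
• The bias is clamped to assumed limits.

The replica-frequency model, gain, and limits are hypothetical. The real hardware uses counters and charge pumps rather than floating-point voltages.

The second function mimics the two-level Vth hopping decision. It uses an assumed deadline-based policy to choose between fCLK with low Vth and fCLK/2 with high Vth. The article does not describe the policy of the MPEG4 algorithm.

```python
def replica_frequency(vbb, f_zbb=1.0e9, sensitivity=0.4):
    """Hypothetical critical-path replica frequency (Hz); vbb > 0 is forward body bias."""
    return f_zbb * (1.0 + sensitivity * vbb)


def dvts_track(f_ref, steps=200, gain=0.5e-9, vbb=0.0, vbb_min=-1.0, vbb_max=0.3):
    """Integral control of body bias so the replica frequency tracks the reference clock."""
    for _ in range(steps):
        error = f_ref - replica_frequency(vbb)  # Error(n) = Fclk(n) - Fcp(n)
        vbb = min(vbb_max, max(vbb_min, vbb + gain * error))  # clamp to bias limits
    return vbb


def vth_hopping(cycles, deadline_s, f_clk=1.0e9):
    """Two-level choice: use high Vth at fCLK/2 if the work still meets its deadline."""
    if cycles / (f_clk / 2) <= deadline_s:
        return "VTHhigh_Enable", f_clk / 2
    return "VTHlow_Enable", f_clk


if __name__ == "__main__":
    print(dvts_track(0.8e9))  # light load: settles in reverse bias
    print(dvts_track(1.5e9))  # unreachable target: clamps at the forward-bias limit
    print(vth_hopping(4e8, 1.0), vth_hopping(8e8, 1.0))
```

With a loop gain of 0.2 per step, the error shrinks geometrically. A 0.8-GHz reference settles at −0.5 V of body bias (reverse bias, lower leakage). A target the replica cannot reach saturates at the forward-bias limit. The hopping function is the discrete counterpart: it only chooses between the two operating points.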

76 IEEE MICRO Circuit techniques to reduce leakage in cache memories VTHlow_Enable Figure 13a shows the 7 available terminals VTHlow_Enable in a conventional six-transistor SRAM cell: VBSP2 VSL, VPWELL, VNWELL, VDL, VWL, VBL, and VBLB. Continuous VTH controller Researchers have proposed various SRAM cell VBSP1 architectures that control one or more of the seven terminal voltages during standby mode, Frequency controller to reduce the leakage components in Figure VBSN2 13b. Each technique exploits the fact that the V fCLK or fCLK/2 BSN1 active portion of a cache is very small, which Control blockVTH selector Processor core gives the opportunity to put the large idle por- tion in a low-leakage sleep mode. We evaluate the effectiveness and overhead of each tech- Figure 12. Vth hopping scheme. nique based on the following criteria:

• Leakage reduction. Although subthreshold leakage still dominates Ioff at high temperatures, ultrathin oxides and high doping concentrations have led to a rapid increase in direct-tunneling gate leakage and BTBT leakage at the source and drain junctions in the nanometer regime. Each leakage reduction technique therefore needs reevaluation in scaled technologies where subthreshold conduction is not the only leakage mechanism.
• Performance. Some techniques, such as source biasing, lengthen the delay of reading from or writing to memory because they add circuitry in the critical path. This negative effect must be weighed against a given technique's benefits.
• Mode transition overhead. Although creating an alternate mode can save power, most systems have a limited time and energy budget for mode transitions. An assessment should consider the overhead in terms of transition latency and energy.
• Stability. The leakage reduction technique should not noticeably affect SRAM cell stability or soft-error rate.

Table 3 shows our assessment of the various techniques based on these criteria.

Figure 13. Seven terminals of the 6T SRAM cell (a), and dominant leakage components in a 6T SRAM (b). (Schematic labels: transistors M1 to M6; terminals VBL, VBLB, VDL, VSL, VWL, VNWELL, VPWELL; stored values '1' and '0'; panel (b) marks the gate-leakage and subthreshold-leakage paths.)

MARCH–APRIL 2006 77 POWER REDUCTION

Table 3. Low-leakage SRAM cell techniques.

| Criteria | Source biasing (VSL) | RBB/FBB (VPWELL, VNWELL) | Dynamic VDD (VDL) | Leakage biased (VBL, VBLB) | Negative word line (VWL) |
|---|---|---|---|---|---|
| Leakage reduction | Subthreshold, gate: ↓↓ | Subthreshold: ↓↓; BTBT: ↑ | Subthreshold, gate: ↓; bit line leakage: ↑ | Subthreshold, gate: ↓ | Subthreshold: ↓; gate: ↑ |
| Performance | Delay increase | No delay increase | No delay increase | No delay increase | No delay increase |
| Overhead | Medium transition overhead | Large transition overhead | Large transition overhead | Precharge latency overhead | Low charge pump efficiency |
| Stability | Increases soft-error rate | No impact on soft-error rate | Worst soft-error rate | No impact on soft-error rate | No impact on soft-error rate; high voltage stress |

The source biasing scheme raises the source line voltage (VSL) in sleep mode,21,22 which reduces subthreshold leakage because of the three effects described earlier. It also reduces the gate leakage in the cell because of the relaxed signal rail, VDD − VSL.22 Such a scheme requires an extra NMOS transistor connected in series in the pull-down path to cut off the source line from ground during sleep mode; this in turn imposes an extra access delay. The reduced signal charge in sleep mode also causes the soft-error rate to rise, requiring additional error correction code circuits.

Applying RBB to the NMOS or PMOS transistors reduces subthreshold leakage via the body effect but does not affect access time, because the cell switches to ZBB in active mode.23 The body bias transition does, however, impose a large latency or energy overhead because of the large VBB swing and the substrate capacitance. This scheme becomes less attractive in scaled technologies because the body coefficient decreases with smaller dimensions, and RBB also increases the source and drain junction BTBT leakage. For scaled technologies, a recently proposed design uses FBB to reduce subthreshold leakage and to achieve better current drive while maintaining reasonable junction BTBT leakage.24 A new high-Vth device optimized for FBB either changes the doping profile by adjusting the peak halo doping (channel engineering) or uses a gate material with a higher work function (work function engineering).24 This scheme resolves the drawback of RBB SRAM and offers a viable way to reduce leakage in nanoscale memories.

A dynamic VDD SRAM (DVSRAM) lowers the supply voltage,25 which in turn reduces subthreshold, gate, and BTBT leakage. For equivalent leakage savings, this scheme requires a smaller signal rail (VDL − VGND) than the source-biased SRAM (SBSRAM). Although there is no impact on delay in active mode, the large VDD swing between sleep and active modes imposes a larger latency or energy transition overhead than SBSRAM does. Moreover, the greatest drawback of the DVSRAM is that it increases bit line leakage in sleep mode, because the voltage level at the stored node also drops with lower VDD. This scheme is therefore not suitable for dual-Vth designs in which the speed-critical access transistors may already be low-Vth devices with high leakage.

Researchers have also proposed a technique that biases the bitlines to an intermediate level to reduce access transistor leakage via the DIBL effect.26 Because only the access transistors benefit from the leakage reduction, the overall leakage savings are moderate. Unlike the three techniques described above, this scheme must be applied to an entire subarray because different cache lines share the bit line. Its main limitation is the precharge latency incurred whenever a new subarray is accessed. If the precharge instant is not known ahead of time, an architectural modification is needed to handle the resulting multiple hit times.

The negative word line scheme27 pulls VWL down to a negative voltage during standby to suppress subthreshold leakage through the access transistors. However, it increases gate leakage and voltage stress in the access transistors. Although this technique has no impact on performance or soft-error rate, it loses power because the negative bias is generated with charge pumps. This loss becomes more serious as the supply voltage scales.

In each more advanced technology generation, semiconductor devices scale downward to achieve high integration density. At the same time, supply voltage also scales downward to

achieve a lower switching energy per device. However, high performance also requires a commensurate scaling of the transistor threshold voltage, which in turn causes an exponential increase in subthreshold leakage current. Aggressive device scaling into the nanometer regime therefore not only increases subthreshold leakage but also has other negative impacts, such as increased drain-induced barrier lowering, Vth roll-off, a reduced on- to off-current ratio, and increased source-drain resistance. Avoiding these SCEs requires oxide thickness scaling and higher, nonuniform doping, which result in an exponential increase in gate and junction BTBT leakage. Collectively, these factors increase total leakage, making leakage current a major component of total power consumption. Hence, leakage reduction techniques are becoming indispensable to future designs.

Acknowledgments
The research was funded in part by Semiconductor Research Corp. (SRC 1078.001), the Gigascale System Research Center, Intel, and IBM.

References
1. S. Borkar, "Design Challenges of Technology Scaling," IEEE Micro, vol. 19, no. 4, July-Aug. 1999, pp. 23-29.
2. Y. Taur and T.H. Ning, Fundamentals of Modern VLSI Devices, Cambridge Univ. Press, 1998.
3. S. Mukhopadhyay et al., "Gate Leakage Reduction for Scaled Devices Using Transistor Stacking," IEEE Trans. VLSI Systems, vol. 11, no. 4, Aug. 2003, pp. 716-730.
4. D.A. Antoniadis et al., "'Well-Tempered' Bulk-Si NMOSFET Device Home Page," http://www-mtl.mit.edu/researchgroups/Well/.
5. M. Ketkar et al., "Standby Power Optimization via Transistor Sizing and Dual Threshold Voltage Assignment," Proc. Int'l Conf. Computer-Aided Design (ICCAD 02), IEEE Press, 2002, pp. 375-378.
6. T. Karnik et al., "Total Power Optimization by Simultaneous Dual-Vt Allocation and Device Sizing in High-Performance Microprocessors," Proc. 39th Design Automation Conf. (DAC 02), IEEE Press, 2002, pp. 486-491.
7. J.T. Kao et al., "Dual-Threshold Voltage Techniques for Low-Power Digital Circuits," IEEE J. Solid-State Circuits, vol. 35, no. 7, July 2000, pp. 1,009-1,018.
8. N. Sirisantana, L. Wei, and K. Roy, "High-Performance Low-Power CMOS Circuits Using Multiple Channel Length and Multiple Oxide Thickness," Proc. Int'l Conf. Computer Design (ICCD 00), IEEE CS Press, 2000, pp. 227-234.
9. K.A. Bowman et al., "Impact of Die-to-Die and Within-Die Parameter Fluctuations on the Maximum Clock Frequency Distribution for Gigascale Integration," IEEE J. Solid-State Circuits, vol. 37, no. 2, Feb. 2002, pp. 183-190.
10. Z. Lee et al., "Two-Dimensional Doping Profile Characterization of MOSFETs by Inverse Modeling Using I-V Characteristics in the Subthreshold Region," IEEE Trans. Electron Devices, vol. 46, no. 8, Aug. 1999, pp. 1,640-1,649.
11. S. Mutoh et al., "1-V Power Supply High-Speed Digital Circuit Technology with Multithreshold-Voltage CMOS," IEEE J. Solid-State Circuits, vol. 30, no. 8, Aug. 1995, pp. 847-854.
12. J. Kao, A. Chandrakasan, and D. Antoniadis, "Transistor Sizing Issues and Tool for Multi-threshold CMOS Technology," Proc. 34th ACM/IEEE Design Automation Conf., ACM Press, June 1997, pp. 409-414.
13. H. Kawaguchi, K. Nose, and T. Sakurai, "A CMOS Scheme for 0.5V Supply Voltage with Pico-Ampere Standby Current," Proc. IEEE Int'l Solid-State Circuits Conf., IEEE Press, 1998, pp. 111-116.
14. S. Heo and K. Asanovic, "Leakage-Biased Domino Circuits for Dynamic Fine-Grain Leakage Reduction," Proc. Symp. VLSI Circuits, IEEE Press, June 2002, pp. 316-319.
15. T. Kuroda et al., "A 0.9V 150 MHz 10 mW 4 mm² 2-D Discrete Cosine Transform Core Processor with Variable-Threshold-Voltage Scheme," Proc. IEEE Int'l Solid-State Circuits Conf., IEEE Press, 1996, pp. 166-167.
16. A. Keshavarzi et al., "Effectiveness of Reverse Body Bias for Low Power CMOS Circuits," Proc. 8th NASA Symp. VLSI Design, IEEE Press, 1999, pp. 231-239.
17. S. Narendra et al., "Forward Body Bias for Microprocessors in 130-nm Technology Generation and Beyond," IEEE J. Solid-State Circuits, IEEE Press, May 2003, pp. 696-701.
18. H. Mizuno et al., "An 18-μA Standby Current 1.8-V, 200-MHz Microprocessor with Self-Substrate-Biased Data-Retention Mode," IEEE J. Solid-State Circuits, vol. 34, no. 11, Nov. 1999, pp. 1,492-1,500.
19. C.H. Kim and K. Roy, "Dynamic Vth Scaling Scheme for Active Leakage Power Reduction," Proc. Design, Automation, and Test in Europe, IEEE CS Press, 2002, pp. 163-167.
20. K. Nose et al., "Vth-Hopping Scheme for 82 Percent Power Saving in Low-Voltage Processors," Proc. IEEE Custom Integrated Circuits Conf., IEEE Press, 2001, pp. 93-96.
21. A. Agarwal, H. Li, and K. Roy, "A Single-Vt Low-Leakage Gated-Ground Cache for Deep Submicron," IEEE J. Solid-State Circuits, IEEE Press, 2003, pp. 319-328.
22. A. Agarwal and K. Roy, "Noise Tolerant Cache Design to Reduce Gate and Subthreshold Leakage in Nanometer Regime," Proc. Int'l Symp. Low Power Electronics and Design (ISLPED 03), 2003, pp. 18-21.
23. C.H. Kim and K. Roy, "Dynamic Vt SRAM: A Leakage Tolerant Cache Memory for Low Voltage Microprocessors," Proc. Int'l Symp. Low Power Electronics and Design (ISLPED 02), ACM Press, 2002, pp. 251-254.
24. C.H. Kim et al., "A Forward Body-Biased Low-Leakage SRAM Cache: Device and Architecture Considerations," Proc. Int'l Symp. Low Power Electronics and Design (ISLPED 03), ACM Press, 2003, pp. 6-9.
25. K. Flautner, "Drowsy Caches: Simple Techniques for Reducing Leakage Power," Proc. 29th Ann. Int'l Symp. Computer Architecture (ISCA-29), IEEE CS Press, 2002, pp. 148-157.
26. S. Heo et al., "Dynamic Fine-Grain Leakage Reduction Using Leakage-Biased Bitlines," Proc. Int'l Symp. Computer Architecture (ISCA-29), IEEE CS Press, 2002, pp. 137-147.
27. K. Itoh et al., "A Deep Sub-V, Single Power-Supply SRAM Cell with Multi-Vt, Boosted Storage Node and Dynamic Load," Proc. Symp. VLSI Circuits Digest of Technical Papers, IEEE Press, 1996, pp. 132-133.

Amit Agarwal is a research engineer in the circuit research lab at Intel. His research interests include low-power, high-performance, process-tolerant cache and register file design; low-power integrated device, circuit, and architecture design; and reconfigurable architecture design with unreliable components for yield improvement. Agarwal has an MS and a PhD in electrical and computer engineering from Purdue University and a BTech in electrical engineering from the Indian Institute of Technology, Kanpur, India. He is a member of the IEEE.

Saibal Mukhopadhyay is a PhD candidate in electrical and computer engineering at Purdue University. His research interests include the analysis and design of low-power and robust circuits using nanoscale CMOS and circuit design using double-gate transistors. Mukhopadhyay has a BE in electronics and telecommunication engineering from Jadavpur University, Calcutta, India. He is a student member of the IEEE.

Arijit Raychowdhury is a PhD candidate in electrical and computer engineering at Purdue University. His research interests include device and circuit design for scaled silicon and nonsilicon devices. Raychowdhury has a BE in electronics and telecommunication engineering from Jadavpur University, Calcutta, India.

Kaushik Roy is the Roscoe H. George professor of electrical and computer engineering at Purdue University. His research interests include VLSI design and CAD for nanoscale silicon and nonsilicon technologies; low-power electronics for portable computing and wireless communications; circuit-level system integration testing and verification; and reconfigurable computing. Roy has a PhD in electrical and computer engineering from the University of Illinois at Urbana-Champaign and a BTech in electronics and electrical communications engineering from the Indian Institute of Technology, Kharagpur, India. He is a Fellow of the IEEE.

Chris H. Kim is an assistant professor in the electrical and computer engineering department at the University of Minnesota. His research interests include theoretical and experimental aspects of VLSI system design in nanoscale technologies. Kim has a PhD in electrical and computer engineering from Purdue University, and an MS in biomedical engineering and a BS in electrical engineering from Seoul National University, Korea. He is a member of the IEEE.

Direct questions and comments about this article to Amit Agarwal, Mailstop: JF2-04, 2111 NE 25th Ave., Hillsboro, OR 97124; [email protected].