<<

COVER FEATURE Leakage Current: Moore’s Law Meets Static Power

Microprocessor design has traditionally focused on dynamic power consumption as a limiting factor in system integration. As feature sizes shrink below 0.1 micron, static power is posing new low-power design challenges.

Nam Sung ower consumption is now the major tech- power. However, the power they consume has Kim nical problem facing the semiconductor increased dramatically with increases in device Todd Austin industry. In comments on this problem at speed and chip density. the 2002 International Electron Devices The research community has recognized the sig- David P Meeting, Intel chairman Andrew Grove nificance of this increase for some time. Figure 1 Blaauw cited off-state current leakage in particular as a lim- shows total chip dynamic and static power con- Trevor iting factor in future microprocessor integration.1 sumption trends based on 2002 statistics normal- Off-state leakage is static power, current that ized to the 2001 International Technology Road- Mudge 2 University of leaks through even when they are turned map for Semiconductors. The ITRS projects a Michigan, off. It is one of two principal sources of power dis- decrease in dynamic power per device over time. Ann Arbor sipation in today’s microprocessors. The other is However, if we assume a doubling of on-chip dynamic power, which arises from the repeated devices every two years, total dynamic power will Krisztián capacitance charge and discharge on the output of increase on a per-chip basis. Packaging and cooling Flautner the hundreds of millions of gates in today’s chips. costs as well as the limited power capacity of bat- ARM Ltd. Until very recently, only dynamic power has been teries make this trend unsustainable. a significant source of power consumption, and Figure 1 also shows exponential increases pro- Jie S. Hu Moore’s law has helped to control it. Shrinking jected for the two principal components of static processor technology has allowed and, below 100 power consumption: Mary Jane nanometers, actually required reducing the supply Irwin voltage. Dynamic power is proportional to the • subthreshold leakage, a weak inversion current Mahmut square of supply voltage, so reducing the voltage across the device; and Kandemir significantly reduces power consumption. • gate leakage, a tunneling current through the Unfortunately, smaller geometries exacerbate leak- insulation. Vijaykrishnan age, so static power begins to dominate the power Narayanan consumption equation in microprocessor design. The ITRS expects the rate of these increases to level Pennsylvania State out in 2005 but to remain substantial nonetheless. University PROCESS TRENDS Even today, total power dissipation from chip leak- Historically, complementary metal-oxide semi- age is approaching the total from dynamic power, conductor technology has dissipated much less and the projected increases in off-state subthreshold power than earlier technologies such as - leakage show it exceeding total dynamic power transistor and emitter-coupled logic. In fact, when consumption as technology drops below the 65-nm not switching, CMOS transistors lost negligible feature size.

68 Computer Published by the IEEE Computer Society 0018-9162/03/$17.00 © 2003 IEEE 100 300 Gate length

250 1 Dynamic If they reach mainstream production, emerging power 200 techniques to moderate the gate-oxide tunneling effect—primarily by using high-k to bet- Possible trajectory ter insulate the gate from the channel—could bring 0.01 if high-k dielectrics 150 gate leakage under control by 2010. reach mainstream Sub- production As leakage current becomes the major contribu- threshold tor to power consumption, the industry must leakage 100 Physical gate length (nm) reconsider the power equation that limits system 0.0001 performance, chip size, and cost. 50 Normalized total chip power dissipation Gate-oxide POWER BASICS leakage Five equations model the power-performance 0.0000001 0 19901995 20002005 2010 2015 2020 tradeoffs for CMOS logic circuits. We present them here in simplifications that capture the basics for logic designers, architects, and system builders. The first Figure 1. Total chip three are common in the low-power literature.3 The 0.5) will reduce the operating frequency by more dynamic and static last two model subthreshold and gate-oxide leakage. than half (fnorm ≈ 0.3). power dissipation Operating frequency and voltage Overall power consumption trends based on the International The first relation shows the dependency of oper- The third equation defines overall power con- Technology ating frequency on supply voltage: sumption as the sum of dynamic and static power: Roadmap for

α 2 Semiconductors. f ∝ (V − Vth) / V (1) P = ACV f + VIleak (3) The two power plots for static power where V is the transistor’s supply voltage, Vth is its The first term is the dynamic power lost from represent the 2002 threshold or switching voltage, and the exponent charging and discharging the processor’s capaci- ITRS projections α is an experimentally derived constant that, for tive loads: A is the fraction of gates actively switch- normalized to current technology, is approximately 1.3. ing and C is the total capacitance load of all gates. those for 2001. We can use this relation to develop an equation The second term models the static power lost due The dynamic power relating frequency and supply voltage. First, con- to leakage current, Ileak. increase assumes a sider an operating voltage Vnorm and frequency We have ignored power lost to the momentary doubling of on-chip fnorm, which are normalized to the maximum oper- short circuit at a gate’s output whenever the gate devices every two ating voltage Vmax and frequency fmax. Then switches. The loss is relatively small; it contributes years. approximate a linear relationship of frequency to to dynamic power loss, and the equation’s first term voltage with the following equation: can absorb it, if necessary. When dynamic power is the dominant source of Vnorm =β1 +β2 ⋅ fnorm (2) power consumption—as it has been and as it remains today in many less aggressive fabrication where the constants β1 = Vth / Vmax and β2 = 1 −β1. technologies—we can approximate Equation 3 From Equation 1 we see that f = 0 corresponds to with just the first term. Its V2 factor suggests reduc- Vnorm = Vth / Vmax, which for today’s technology is ing supply voltage as the most effective way to approximately 0.3. The simple relationship that decrease power consumption. In fact, halving the Equation 2 expresses closely matches recent indus- voltage will reduce the power consumption by a trial data.4 factor of four. But Equation 2 shows that halving Note that fmax corresponds to Vmax and that, as the voltage will reduce the processor’s maximum the relation in Equation 1 specifies, the frequency operating frequency by more than half. drops to zero when V is reduced to Vth. Equation To compensate for this performance loss, we can 2 also indicates that reducing the operating fre- use either parallel or pipelined implementations. If quency by a particular percentage from fmax will the implementation runs the original serial com- reduce the operating voltage by a smaller percent- putation as two parallel subtasks or as two age. For instance, if we assume β1 = 0.3, reducing pipelined subtasks, the dynamic power consump- the frequency by 50 percent (fnorm = 0.5) will reduce tion can decrease by more than a factor of two the operating voltage by 35 percent (Vnorm = 0.65). compared to the serial case. Of course, the reduc- Conversely, reducing the voltage by half (Vnorm = tion depends on significant parallelism being pre-

December 2003 69 sent in the computation, but many important Tox will reduce gate leakage. Unfortunately, it also code classes approximate this condition, degrades the transistor’s effectiveness because Tox Pipelining including digital signal processing and image must decrease proportionally with process scaling is the low-power processing. to avoid short channel effects. Therefore, increas- architectural ing Tox is not an option. The research community Leakage current is instead pursuing the development of high-k solution. But how useful are parallelism and pipelin- gate insulators. ing for reducing power when static power As with subthreshold leakage, a die’s combined consumption becomes a major component? gate width is a convenient measure of total oxide As noted, leakage current, the source of sta- leakage. tic power consumption, is a combination of sub- threshold and gate-oxide leakage: Ileak = Isub + Iox. Low-power architectural options Subthreshold power leakage. An equation that Because subthreshold and oxide leakage both Anantha Chandrakasan, William Bowhill, and depend on total gate width or, approximately, gate Frank Fox5 present shows how subthreshold leak- count, a pipelined implementation’s contribution age current depends on threshold voltage and sup- to leakage is comparable to the simple serial case, ply voltage: apart from the extra gates that latches introduce to separate the pipe stages. Pipelined implementations (4) can run at a lower voltage, which can reduce power consumption for both dynamic and static power K1and n are experimentally derived, W is the gate compared to the serial case. width, and Vθ in the exponents is the thermal volt- Parallel implementations can also run at a lower age. At room temperature, Vθ is about 25 mV; it voltage, but only by roughly doubling the amount increases linearly as temperature increases. If Isub of hardware. Thus, depending on some of the equa- grows enough to build up heat, Vθ will also start to tions’ experimental constants, the parallel case rise, further increasing Isub and possibly causing could leak more power than the serial case—even . to the point of offsetting any savings in dynamic Equation 4 suggests two ways to reduce Isub. power. First, we could turn off the supply voltage—that is, Pipelining is therefore the low-power solution. It set V to zero so that the factor in parentheses also will always leak less power than the parallel case becomes zero. Second, we could increase the thresh- because it has about half the hardware, and it will old voltage, which—because it appears as a nega- leak less power than the serial case because it runs tive exponent—can have a dramatic effect in even at a lower voltage. In fact, pipelining’s combined small increments. On the other hand, we know dynamic and static power leakage will be less than from Equation 1 that increasing Vth will reduce that of the serial case. speed. The problem with the first approach is loss of state; the problem with the second approach is REDUCING STATIC POWER CONSUMPTION the loss of performance. Unlike dynamic power, leakage is not activity Gate width W is the other contributor to sub- based, so reducing node switching when there is no threshold leakage in a particular transistor. Designers work does not help reduce power consumption. often use the combined widths of all the processor’s Shutting off the inactive part of the system does transistors as a convenient measure of total sub- help, but it results in loss of state. threshold leakage. Gate-oxide power leakage. Gate-oxide leakage is less Retention flip-flops well understood than subthreshold leakage. For When a device is inactive for a long period of our purposes, a simplification of equations from time, a “snooze” mode may help if the save/restore Chandrakasan, Bowhill, and Fox5 is sufficient to cost is small compared to the power that the snooze illustrate the key factors: time conserves. For shorter inactive periods, researchers have developed “balloon” logic, also called retention flip-flops. The idea is to use high- (5) Vth latches to duplicate those latches that must pre- serve state. As Equation 4 shows, the high-Vth K2 and α are experimentally derived. The term of latches have a dramatically reduced subthreshold interest is oxide thickness, Tox. Clearly, increasing leakage. In snooze mode, the control logic copies

70 Computer BL BL Figure 2. Leakage (1V) (1V) current paths in a memory cell. The bitline leakage current flows P1 P2 through the access the main latch’s state to the retention latch and (0V) transistor N3 from (1V) N4 turns off the main latch to save energy. N3 the bitline, while A transistor’s threshold voltage depends on its N1 N2 the cell leakage design and process technology. Typical values are flows through 450 mV in 180-nm processes and 350 mV in 130- WL WL (0V) transistors P1 nm technologies. Using techniques or apply- (0V) and N2. ing a bias voltage to the substrate can increase threshold voltage by 100 mV. This in turn reduces Bitline leakage path leakage by a factor of about 10, but it increases Cell leakage path switching time by about 15 percent. Thus, low- leakage retention flops are only useful in saving state energy efficiently—their use on the proces- age level to access it.10 Waking up the drowsy cache sor’s critical path would slow it down. They also lines is treated as a pseudo cache miss and incurs incur die area increases for the duplicated latches, one additional cycle overhead. which can be significant: 20 percent of the logic size Other proposed state-preserving techniques and 10 percent of total chip area are not unex- include gradually decreasing threshold voltages11 pected for small embedded cores. and using preferred data values.12 All these tech- niques reduce leakage less than turning a cache line Controlling memory leakage off completely, but accessing the low-leakage state On-chip caches constitute the major portion of incurs much less penalty. Moreover, while state-pre- the processor’s transistor budget and account for a serving techniques can only reduce leakage by about significant share of leakage. In fact, leakage is pro- a factor of 10, compared to more than a factor of jected to account for 70 percent of the cache power 1,000 for destructive techniques, the net difference budget in 70-nm technology.6 in power consumed by the two techniques is less Figure 2 illustrates the various leakage current than 10 percent. When the reduced wake-up time is paths in a typical memory cell. The current through factored into overall program runtime, state-pre- the access transistor N3 from the bitline is referred serving techniques usually perform better. They have to as bitline leakage, while the current flowing the additional benefit of not requiring an L2 cache. through transistors P1 and N2 is cell leakage. Control techniques. There are two broad categories Both bitline and cell leakage result from sub- of control techniques for leakage-saving features: threshold conduction—current flowing from the source to drain even when gate-source voltage is • application-sensitive controls, based on run- below the threshold voltage. In addition, gate-oxide time performance feedback,9,13 and leakage current is flowing through the transistor • application-insensitive controls, which peri- gates. odically turn off cache lines.6,7,10 Circuit techniques. Two broad categories of circuit techniques aim to reduce leakage: state-destructive The degree of control varies significantly within and state-preserving. each group. For example, an application-sensitive State-destructive techniques use ground gating, technique9 may indicate that the processor can turn also called gated-Vdd. Ground gating adds an off 25 percent of the cache because of a very high NMOS (n-channel metal-oxide semiconductor) hit rate, but it provides no guidance about which 75 sleep transistor to connect the memory storage cell percent of the cache lines will be used in the near and the power supply’s ground.7-9 Turning a cache future. On the other hand, an insensitive technique line off saves maximum leakage power, but the loss like the cache-decay algorithm7 keeps track of sta- of state exposes the system to incorrect turn-off tistics for each cache line and thus may provide bet- decisions. Such decisions can in turn induce signif- ter predictive behavior. icant power and performance overhead by causing Application-insensitive algorithms do not opti- additional cache misses that off-chip memories mize for the specific workload. Instead, they aim must satisfy. for good average behavior and minimize the down- State-preserving techniques vary. Drowsy caches side of misprediction. Periodically turning off a multiplex supply voltages according to the state of cache line is an example of such a scheme.10 Its suc- each cache line or block. The caches use a low- cess depends on how well the selected period retention voltage level for drowsy mode, retaining reflects the rate at which the instruction or data the data in a cache region and requiring a high volt- working set changes. Specifically, the optimum

December 2003 71 Hotspots and Code Sequentiality

Researchers at the Pennsylvania State University have developed a scheme to manage instruction cache leakage that is sensitive to changes in temporal and spatial locality during program execution.1 The scheme builds on drowsy cache techniques. It associates each period may change not only across applications but cache line with a mode bit that controls whether the line is awake and also within the different phases of one application. accessible or in low-leakage drowsy mode. Periodically, a global sleep In such cases, the algorithm must either keep the signal resets all the mode bits to put all the cache lines in drowsy mode. cache lines awake longer than necessary or turn off This approach is based on the notion that working sets change peri- the lines that hold the current instruction working odically. In reality, the change is gradual, so asserting the sleep signal set, which slows performance and wastes energy. will unnecessarily put some lines that are part of the working set into Addressing the first problem by decreasing the drowsy mode. period will exacerbate the second problem. On the plus side, this approach is simple, has little imple- Identifying hotspots mentation overhead, and works well for data One improvement on this basic idea prevents inadvertent mode caches. transitions by augmenting each cache line with a local voltage-con- Application-insensitive algorithms can also exhibit trol-mask bit. When set, the VCM masks the influence of the global pathologically bad behavior on certain code types. 6 sleep signal and prevents mode transition. The VCM bits are set based For example, one technique wakes up (and puts to on information from an enhanced branch target buffer. The BTB mon- sleep) cache subbanks as execution moves between itors how often an application accesses the different basic blocks and them. If the first part of the loop is in one bank and uses this information to identify whether they belong to a program the latter part in another, the algorithm can cause hotspot. Once the BTB determines that a program is within a hotspot, frequent drowsy-awake transitions that negatively the processor sets a global mask bit, then sets the VCM bits of all impact the loop’s performance. Moreover, the leak- accessed cache lines to indicate the program hotspot. The processor age current dissipated by lines that are awake and updates the BTB access-frequency counters and the VCM bits peri- not accessed can waste significant energy. odically to reflect the change in program phase. Compiler techniques Predicting transitions Using compiler directives might make it possible A second improvement focuses on predictively transitioning the cache to keep some loops within bank boundaries, assum- lines that an application program will access next from sleep to normal ing that the compiler knows the bank structure. mode. The predictive strategy avoids the performance and associated However, a typical large application will likely have leakage penalty incurred from accessing a cache line in sleep mode. Since to divide some loops across banks. sequentiality is the norm in code execution, the program counter pre- The compiler can also provide application-sen- dictively transitions the next cache line to the normal mode when it sitive leakage control. For example, a program’s accesses the current line. This technique is referred to as just-in-time source code could include explicit loop-level cache 13 activation. line turn-off instructions. However, this scheme demands sophisticated program analysis and mod- Tracking access moves ification support as well as modifications to the Finally, asserting the global sleep signal when accesses move from instruction set architecture. one cache sub-bank to another enables the processor to identify oppor- The “Hotspots and Code Sequentiality” sidebar tunities for sleep mode transition when spatial locality changes. This describes a leakage management technique that differs from asserting the global sleep signal periodically to capture exploits two main characteristics of instruction temporal locality changes. access patterns: the confinement of program exe- cution mainly to program hotspots and the sequen- Performance improvements tial access pattern that instructions exhibit. When averaged across 14 SPEC2000 benchmarks, these three improvements provide an average leakage energy savings of 63 per- TECHNOLOGY TRENDS AND CHALLENGES cent in the instruction cache compared to using no leakage manage- Equation 4 shows that subthreshold leakage ment, 49 percent compared to the bank-based turnoff scheme, and 29 exhibits a strong dependence on temperature. percent over the compiler-based turnoff scheme. Therefore, one approach to reducing subthreshold leakage is to actively refrigerate the chip. While this option seems promising for controlling the sub- Reference threshold leakage point, it does not address gate- 1. J. Hu et al., “Exploiting Program Hotspots and Code Sequentiality for oxide leakage. Further, its practical application Instruction Cache Leakage Management,” Proc. Int’l Symp. Low-Power faces significant technological and cost challenges. and Design (ISLPED 03), ACM Press, 2003, pp. 402-407. Multiple threshold voltages A more promising approach in terms of sub-

72 Computer Figure 3. Leakage 90 current and gate 80 length. (a) Impact of gate-length on 70 leakage current, and (b) expected 60 threshold current employs multiple threshold volt- probability age technologies. Today’s processes typically offer 50 distribution of two threshold voltages. Designers assign a low the leakage 40 threshold voltage to a few performance-critical Leakage current (pA) current for transistors and a high threshold voltage to the 30 devices on a chip. majority of less timing-critical transistors. This approach incurs a high subthreshold leakage cur- 0.16 0.17 0.18 0.19 0.20 rent for the performance-critical transistors, but it (a) Drawn gate length (µm) can significantly reduce the overall leakage. Further, future technologies are likely to offer 1.00 three threshold voltages—low, high, and extra high—or even more. This opens the way to new 0.80 leakage optimizations within different portions of a cache or at different levels of its hierarchy. For 0.60 instance, the address-decoder and bus-driver cir- cuits in a cache consume a significant portion of 0.40 total access time, so a designer could construct them from high-Vth transistors, while constructing 0.20 the more numerous bit cells from extra-high-Vth Normalized frequency count devices and reserving low-Vth devices for speed-crit- 0 ical parts of the processor core. Furthermore, new 20 30 40 50 60 70 80 90 100 tradeoffs become possible between the cache size (b) Leakage current (pA) and its threshold voltage. Another possible power optimization is to sig- nificantly reduce leakage by increasing the L2 long “tail” for high-leakage currents corresponds cache’s threshold voltage. To compensate for the to devices with small gate lengths. This distribu- increased L2 access time, a designer could increase tion implies that a small set of devices experience the L1 cache size or use faster (and leakier) tran- significantly more subthreshold leakage current sistors to construct it. Trading off cache size and than the average device. Since caches comprise a threshold voltages at different levels of the on-chip large number of devices, they have a high proba- cache hierarchy can reduce the total leakage with- bility of containing a few “extremely leaky” gates. out sacrificing overall performance. Intradie process variations will require new These techniques do not address gate-oxide approaches to reducing cache leakage, specifically leakage. targeting bit cells with high-leakage currents. One approach would be to identify such cells at test time Gate length and swap them with redundant cells. As CMOS technology is scaled, variations in gate length, oxide thickness, and doping concentrations Oxide tunneling are becoming more significant—particularly for The growing significance of oxide-tunneling cur- intradie variations that occur among the devices of rent poses another challenge to reducing cache a single die. leakage. Process scaling has consistently reduced Figure 3a shows the impact of gate length varia- the gate-oxide layer’s thickness to provide sufficient tions on leakage current. Short-channel effects, current drive at reduced voltage supplies. The such as drain-induced barrier lowering, give sub- resulting gate-tunneling leakage current is signifi- threshold leakage current an exponential depen- cant, as Equation 5 shows. Iox arises from the finite dence on the gate length. Thus, even a 10 percent (nonzero) probability of an electron directly tun- variation from the nominal length can change the neling through the insulating silicon oxide layer. leakage current by a factor of three. Figure 1 shows that gate-oxide current leakage Figure 3b shows the expected probability distri- will catch up to subthreshold leakage in magni- bution of the leakage current for devices on a chip, tude—and in many cases, it already has. The main given a Gaussian distribution of gate lengths with approach to reduce gate-oxide leakage applies a standard deviation equal to 5 percent of mean for aggressive high-k materials with dielectric con- a 180-nm process. The exponential increase in leak- stants of 25 to 50, such as hafnium oxide. These age current generates a log-normal distribution; the materials greatly diminish gate-oxide leakage, but

December 2003 73 they also pose numerous process integration prob- age-Scalable and Frequency-Scalable 32b PowerPC lems. The ITRS does not project them reaching Processor,” Proc. Int’l Solid-State Circuits Conf. mainstream production until 2010.4 (ISSCC), IEEE Press, 2002, pp. 340-341. The emerging importance of gate-oxide leakage 5. A. Chandrakasan, W. Bowhill, and F. Fox, Design of raises new challenges for leakage reduction in on- High-Performance Microprocessor Circuits, IEEE chip caches. One possible approach employs tech- Press, 2001. nologies with dual-oxide thicknesses. Designers 6. N. Kim et al., “Drowsy Instruction Caches: Leakage could use thick-oxide transistors for cache bit cells, Power Reduction Using Dynamic Voltage Scaling and for example, reducing overall gate-oxide leakage Cache Sub-bank Prediction,” Proc. 35th Ann. Int’l in the same way high threshold voltage devices can Symp. Microarchitecture (MICRO-35), IEEE CS reduce subthreshold leakage. Press, 2002, pp. 219-230. Other methods of reducing subthreshold leakage 7. S. Kaxiras, Z. Hu, and M. Martonosi, “Cache Decay: may apply to gate-tunneling current as well, but Exploiting Generational Behavior to Reduce Cache their effects require further research. For example, Leakage Power,” Proc. 28th Int’l Symp. Computer gate-oxide leakage—unlike subthreshold leakage— Architecture (ISCA 28), IEEE CS Press, 2001, pp. has a very weak dependence on temperature. Thus, 240-251.

as subthreshold leakage decreases with decreases in 8. M. Powell et al., “Gated-Vdd: A Circuit Technique to operating temperature, gate-oxide leakage becomes Reduce Leakage in Deep-Submicron Cache Memo- more dominant. This will require technology and ries,” Proc. Int’l Symp. Low-Power Electronics and engineering that specifically target gate-oxide reduc- Design (ISLPED 00), ACM Press, 2000, pp. 90-95. tion in standby mode. 9. M. Powell et al., “Reducing Leakage in a High-Per- formance Deep-Submicron Instruction Cache,” IEEE Trans. VLSI, Feb. 2001, pp. 77-89. ower consumption has become a primary con- 10. K. Flautner et al., “Drowsy Caches: Simple Tech- straint in microprocessor design, along with niques for Reducing Leakage Power,” Proc. 29th P performance, clock frequency, and die size. Ann. Int’l Symp. Computer Architecture (ISCA 29), Researchers in both industry and academia are IEEE CS Press, 2002, pp. 148-157. focusing on ways to reduce it. This community has 11. H. Kim and K. Roy, “Dynamic Vth SRAMs for Low become adept at overcoming seemingly insur- Leakage,” Proc. Int’l Symp. Low-Power Electronics mountable barriers—for example, the supposed and Design (ISLPED 02), ACM Press, 2002, pp. 251- feature size limits for optical lithography dictated 254. by the wavelength of visible light. We can expect 12. N. Azizi, A. Moshovos, and F.N. Najm, “Low-Leak- the community to meet the power challenge in the age Asymmetric-Cell SRAM,” Proc. Int’l Symp. next few years as well. Low-Power Electronics and Design (ISLPED 02), ACM Press, 2002, pp. 48-51. 13. W. Zhang et al., “Compiler-Directed Instruction Acknowledgments Cache Leakage Optimization,” Proc. 35th Ann. Int’l This work was supported by ARM, an Intel Symp. Microarchitecture (MICRO-35), IEEE CS Graduate Fellowship, the US Defense Advanced Press, 2002, pp. 208-218. Research Projects Agency, the Semiconductor Re- search Corporation, the Gigascale Silicon Research Nam Sung Kim is a PhD candidate at the Univer- Center, and the National Science Foundation. sity of Michigan. His research interests include low- power and low-complexity microarchitecture design at the circuit and microarchitectural bound- References ary. Kim received an MS in electrical engineering 1. R. Wilson and D. Lammers, “Grove Calls Leakage from the Korea Advanced Institute of Science and Chip Designers’ Top Problem,” EE Times, 13 Dec. Technology, Taejon, Korea. He is a student mem- 2002; www.eetimes.com/story/OEG20021213S0040. ber of the IEEE and the ACM. Contact him at 2. Semiconductor Industry Assoc., International Tech- [email protected]. nology Roadmap for Semiconductors, 2002 Update; http://public.itrs.net. Todd Austin is an associate professor of electrical 3. T. Mudge, “Power: A First-Class Architectural Design engineering and computer science at the University Constraint,” Computer, Apr. 2001, pp. 52-58. of Michigan. His research interests include com- 4. K. Nowka et al., “A 0.9V to 1.95V Dynamic Volt- puter architecture, compilers, computer system ver-

74 Computer ification, and performance analysis tools and tech- puter science from the University of Illinois. She is niques. Austin received a PhD in computer science an IEEE Fellow, an ACM Fellow, and a member of from the University of Wisconsin. Contact him at the National Academy of Engineering. Contact her [email protected]. at [email protected].

David Blaauw is an associate professor of electri- Mahmut Kandemir is an assistant professor of com- cal engineering and computer science at the Uni- puter science and engineering at Pennsylvania State versity of Michigan. His research interests include University. His research interests include optimiz- VLSI design and CAD with emphasis on circuit ing compilers, I/O-intensive applications, and analysis and optimization problems for high-per- power-aware computing. He received a PhD in elec- formance and low-power microprocessor designs. trical engineering and computer science from Syra- Blaauw received a PhD in computer science from cuse University. He is a member of the IEEE and the University of Illinois, Urbana-Champaign. the ACM. Contact him at [email protected]. Contact him at [email protected]. Vijaykrishnan Narayanan is an associate profes- Trevor Mudge is the Bredt Professor of Electrical sor of computer science and engineering at Penn- Engineering and Computer Science at the Univer- sylvania State University. His research interests are sity of Michigan. In addition, he runs Idiot Savants, in embedded systems, energy-efficient designs, com- a chip design consultancy, and advises several ven- puter architecture, and VLSI design. Narayanan ture firms. His research interests include computer received a PhD in computer science from the Uni- architecture, computer-aided design, and compil- versity of South Florida. He has received several ers. Mudge received a PhD in computer science awards including the IEEE Computer Society’s from the University of Illinois, Urbana-Cham- Richard E. Merwin Award. Contact him at paign. He is a Fellow of the IEEE and a member [email protected]. of the ACM, the IEE, and the British Computer Society. Contact him at [email protected].

Krisztián Flautner is a principal researcher at ARM Limited and the architect of ARM’s Intelligent Energy Management technology. His research JOIN A interests address high-performance, low-power processing platforms to support advanced software environments. Flautner received a PhD in computer science and engineering from the University of THINK Michigan. Contact him at krisztian.flautner@arm. com. TANK Jie S. Hu is a PhD candidate at Pennsylvania State University. His research interests include high-per- ooking for a community targeted to your formance, low-power microprocessor design; area of expertise? IEEE Computer Society power-efficient memory architectures; and com- L Technical Committees explore a variety plexity-effective instruction issue queues. Hu of computing niches and provide forums for received an ME in signal and information process- dialogue among peers. These groups influence ing from Peking University. He is a student mem- our standards development and offer leading ber of the ACM and the IEEE. Contact him at conferences in their fields. [email protected]. Mary Jane Irwin is the A. Robert Noll Chair in Join a community that targets your discipline. Engineering in the Department of Computer Sci- ence and Engineering at Pennsylvania State Uni- versity. Her research interests include computer In our Technical Committees, you’re in good company. architecture, embedded and mobile computing sys- tems design, low-power design, and electronic design automation. Irwin received a PhD in com- computer.org/TCsignup/