<<

EMBEDDED COMPUTING

ing. We also anticipate the emergence of relatively simple, disposable devices Mobile that support the pervasive computing infrastructure—for example, sensor network nodes. The requirements of low-end devices are increasing exponentially, and com- puter architectures must adapt to keep Todd Austin, David Blaauw, Scott Mahlke, up. Some elements of high-end devices and Trevor Mudge, University of Michigan are already present in cell phones Chaitali Chakrabarti, Arizona State University from the major manufacturers. High- Wayne Wolf, Princeton University end PDAs also include an amazing range of features, such as networking and cameras.

oore’s law has held sway over the past several Current trends in decades, with the number architecture and power cannot M of transistors per chip doubling every 18 meet the demands of mobile months. As a result, a fairly inexpen- supercomputing. Significant sive CPU can perform hundreds of millions of operations per second— innovation is required. performance that would have cost mil- lions of dollars two decades ago. We should be proud of our achieve- puters. Rather than worrying solely SUPERCOMPUTING REQUIREMENTS ments and rest on our laurels, right? about performance, with the occa- A mobile will employ Unfortunately, no. sional nod to power consumption and natural I/O interfaces to the mobile The human appetite for computation cost, we need to judge by user. For example, input could come has grown even faster than the pro- their performance-power-cost product. through a continuous real-time speech- cessing power that Moore’s law pre- This new way of looking at proces- processing component. Device output dicted. We need even more powerful sors will lead us to new computer will include high-bandwidth graphics processors just to keep up with mod- architectures and new ways of think- display, either as a semitransparent ern applications like interactive multi- ing about computer system design. heads-up display or an ocular interface media communications. To make such as a retinal projector. An audio matters more difficult, we need these A MOBILE COMPUTING WORLD channel will support output for audio powerful processors to use much less Untethered digital devices are reception and sound cues. Finally, the electrical energy than we have been already ubiquitous. The world has device will include a high-bandwidth accustomed to. more than 1 billion active cell phones, interface for network and In other words, we need mobile each a sophisticated multiprocessor. telecommunication access. supercomputers that provide massive With sales totaling about $400 million This platform will have to execute computational performance from the every year, the cell phone has arguably many computationally intensive appli- power in a battery. These supercom- become the dominant computing plat- cations: soft radio, cryptography, aug- puters will make our personal devices form, a candidate for replacing the per- mented reality, speech recognition, and much easier to use. They will perform sonal computer. mobile applications such as e-mail and real-time speech recognition, video We expect to see both the types and word processing. We expect this plat- transmission and analysis, and high- numbers of mobile digital devices form to require about 16 times as much bandwidth communication. And they increase in the near future. New devices computing horsepower as a 2-GHz Intel will do so without us having to worry will improve on the by Pentium 4 processor, for a total per- about where the next electrical outlet incorporating advanced functionality, formance payload of 10,000 SPECInt will be. such as always-on Internet access and benchmark units (www.specbench.org). But to achieve this functionality, we human-centric interfaces that integrate To remain mobile, the device must must rethink the way we design com- voice recognition and videoconferenc- achieve this extremely high perfor-

May 2004 81 Embedded Computing

• higher instruction throughput via 10k SPEC Int2000 instruction-level parallelism. 10,000 Moore’s law Performance The combined strength of these opti- speedup gap 1,000 mizations has led to the industry’s impressive performance gains. The study points out that clock-rate 100 improvements from pipelining must soon diminish because current designs 10 Process improvement (relative FO4 delay) have little logic within pipe stages. As Pipelining (relative FO4 gates/stage) such, latch delay and clock skew will Performance (SPECInt2000) 1 ILP (relative SPECInt/MHz) soon dominate the clock period. The Performance pipeline curve in Figure 1 illustrates 0.1 i386 i486 Pentium Pentium Pentium Pentium Pentium One Two Three this leveling off. For example, Intel’s Pro II III 4 Gen Gen Gen Pentium 4 microprocessor has only 12 fanout-of-four (FO4) gate delays per Figure 1. Performance trends for desktop processors. The star indicates the mobile super- stage, leaving little logic that can be computer’s requirement. bisected to produce higher clocked rates. mance using only a small battery for trends will not meet mobile supercom- The negative trend of the instruc- power. Given the slow growth trend puting demands anytime soon—and tion-level parallelism curve in Figure 1 for batteries—5 percent capacity without significant innovation, per- suggests that increased instruction increase per year—we estimate that a haps they never will. throughput cannot make up for antic- mobile supercomputer (circa 2006) ipated clocking limits. The Pentium 4 will require a 1,475 mA-hr battery General-purpose limits microprocessor achieves only about 80 weighing 4 oz. With a five-day battery For more than three decades, archi- percent of its predecessor Pentium III’s lifetime under a 20 percent duty cycle tects have lavished attention on the instruction throughput for some appli- (peak load versus standby), we esti- design and optimization of general- cations (measured in SPECInt/Mhz for mate that the system’s peak power purpose processors. As a result, cur- the same technology). requirement must not exceed 75 mW. rent designs feature many advanced As architectural optimizations reach techniques such as superpipelining, their limits, they threaten a primary PERFORMANCE AND superscalar execution, dynamic sched- source of value in the computer in- POWER TRENDS uling, multilevel memory caching, and dustry, namely ongoing performance Unfortunately, mobile supercom- aggressive speculation. Combined with increases. puting’s requirements are in contrast fabrication technology improvements, to the trends we see in both computer these optimizations have resulted in a Nanometer impedences architecture and power for future steady doubling of processor perfor- Circuit-level effects in nanometer devices. mance every 18 months. devices are also a leading barrier to con- Figure 1 shows the trends in perfor- But a growing body of evidence sug- tinued performance scaling. Short- mance, measured in SPECInt, for a gests that general-purpose processor channel effects already prevent gate family of Intel x86 processors. Figure optimizations are diminishing in value. delay from scaling with feature size as 2 shows the power consumption trends A study examining the scalability of originally expected. Figure 1 shows the in the same processors. The graphs rep- future general-purpose processor technology curve flattening. Capacitive resent the published data for proces- designs (V. Agarwal et al., “Clock Rate and inductive coupling and increased sors ranging from the 386 (in 1990) to versus IPC: The End of the Road for interconnect lengths pose a serious dif- the Pentium 4 (in 2002) in roughly Conventional Microarchitectures,” ficulty for fast signal transmission across two-year steps. The predicted trends Proc. 27th Ann. Int’l. Symp. Computer the die. through 2008 are derived from the Architecture (ISCA 00), IEEE CS Press, Furthermore, as Figure 2 shows, the 2003 edition of the International 2000, pp. 248-259) identified two sharp rise in static leakage current in Technology Roadmap for Semicon- kinds of general-purpose processor nanometer designs is impeding contin- ductors (http://public.itrs.net/). optimizations: ued improvements in processor power The star on each graph indicates our consumption. The leakage current orig- mobile supercomputer’s performance • increased clock speed through inates in a dramatic increase in both and power requirements. Clearly, the pipelining, and subthreshold current and gate-oxide

82 Computer leakage current. In fact, static power consumption is now a primary issue in 1,000 deep submicron design and is projected Total power to account for as much as 50 percent of Dynamic power Static power the total power dissipation for high-end 100 processors in 90-nm technology. Power gap 10 REVOLUTIONARY CHANGES In mobile applications, a device can be in standby mode a significant por- Power (W) 1 tion of the time. In this case, leakage power dominates total power dissi- 0.1 pation and threatens the ability to 75 mW meet the power requirements for high- peak power performance mobile processors. It is 0.01 i386 i486 Pentium Pentium Pentium Pentium Pentium One Two Three becoming clear that incremental im- Pro II III 4 Gen Gen Gen provements within the architecture and circuit subdomains are not going to deliver the extra performance and Figure 2. Power trends for desktop processors. The star indicates the mobile power efficiency that high-end mobile supercomputer’s requirement. applications will demand. Furthermore, future generations of advantage of advanced circuit fea- accurately predict when to shut down VLSI technology will not provide the tures, parts of the system to save energy. reliable operation that we have so long • programs that automatically ex- assumed. The small size of future ploit application-specific architec- devices will make them vulnerable to tures, and Todd Austin is an associate professor radiation-induced upsets, circuit noise, • software to glue the layers to- of electrical engineering and computer and other factors that produce enough gether and allow existing off-the- at the University of Michigan. operational, transient failures to require shelf applications to use the sys- Contact him at [email protected]. architectural designs that can compen- tem efficiently. sate for them. This means diverting a David Blaauw is an associate profes- significant amount of the processor’s We can also turn logic checking to our sor of electrical engineering and com- computational effort to check the advantage by using it to run the system puter science at the University of results. Thus, we must be even more at its targeted performance level—even Michigan. Contact him at blaauw@ clever about how we squeeze perfor- in the presence of errors caused by umich.edu. mance out of our machines, particu- infrequently occurring “worst-case” larly since all that checking logic scenarios. We can tune our designs for Scott Mahlke is an assistant professor consumes energy that we can ill afford “better-than-worst-case” operation of electrical engineering and computer to lose. and remove the large safety margins science at the University of Michigan. necessary to insure against worst-case Contact him at [email protected]. JOINT OPTIMIZATIONS situations. To build practical mobile super- Trevor Mudge is Bredt Family Chair computers, system architects need to of electrical engineering and computer jointly optimize across algorithms, f course, we must apply energy science at the University of Michigan. architectures, and circuits. We don’t management aggressively at all Contact him at [email protected]. have all the answers today about how O abstraction levels to meet mobile to solve all the problems inherent in supercomputing requirements. This Chaitali Chakrabarti is a professor of mobile supercomputing, but we believe means optimizing the hardware so that electrical engineering at Arizona State that we have identified some useful components can be turned on and off University. Contact her at chaitali@ approaches. quickly. It also requires extracting pro- asu.edu. We can control tradeoffs in a verti- gram data during compilation for use cally integrated manner: in guiding energy management. Finally, Wayne Wolf is a professor of electrical we need to develop sophisticated user engineering at Princeton University. • microarchitectures that can take monitoring systems that can more Contact him at [email protected].

May 2004 83