IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 32, NO. 1, JANUARY 2013

Underdesigned and Opportunistic Computing in Presence of Hardware Variability

Puneet Gupta, Member, IEEE, Yuvraj Agarwal, Member, IEEE, Lara Dolecek, Member, IEEE, Nikil Dutt, Fellow, IEEE, Rajesh K. Gupta, Fellow, IEEE, Rakesh Kumar, Member, IEEE, Subhasish Mitra, Member, IEEE, Alexandru Nicolau, Member, IEEE, Tajana Simunic Rosing, Member, IEEE, Mani B. Srivastava, Fellow, IEEE, Steven Swanson, Member, IEEE, and Dennis Sylvester, Fellow, IEEE

Abstract—Microelectronic circuits exhibit increasing variations in performance, power consumption, and reliability parameters across the manufactured parts and across use of these parts over time in the field. These variations have led to increasing use of overdesign and guardbands in design and test to ensure yield and reliability with respect to a rigid set of datasheet specifications. This paper explores the possibility of constructing computing machines that purposely expose hardware variations to various layers of the system stack, including software. This leads to the vision of underdesigned hardware that utilizes a software stack that opportunistically adapts to a sensed or modeled hardware. The envisioned underdesigned and opportunistic computing (UnO) machines face a number of challenges related to the sensing infrastructure and software interfaces that can effectively utilize the sensory data. In this paper, we outline specific sensing mechanisms that we have developed and their potential use in building UnO machines.

Index Terms—Design automation, design for manufacture, digital integrated circuits, microprocessors, reliability, system software.

Manuscript received March 28, 2012; revised August 5, 2012; accepted September 16, 2012. Date of current version December 19, 2012. This work was supported in part by the National Science Foundation Variability Expedition award under Grants CCF-1029030, CCF-1028831, CCF-1028888, CCF-1029025, and CCF-1029783. The focus of the National Science Foundation Computing Expedition on Variability-Aware Software for Efficient Computing with Nanoscale Devices (http://www.variability.org) is to address these challenges and develop prototypical UnO computing machines. This paper was recommended by Associate Editor V. Narayanan.

P. Gupta, L. Dolecek, and M. B. Srivastava are with the University of California, Los Angeles, CA 90024 USA (e-mail: [email protected]; [email protected]; [email protected]). Y. Agarwal, R. K. Gupta, T. S. Rosing, and S. Swanson are with the University of California, San Diego, CA 92093 USA (e-mail: [email protected]; [email protected]; [email protected]; [email protected]). N. Dutt and A. Nicolau are with the University of California, Irvine, CA 92617 USA (e-mail: [email protected]; [email protected]). S. Mitra is with Stanford University, Stanford, CA USA (e-mail: [email protected]). R. Kumar is with the University of Illinois at Urbana-Champaign, Urbana, IL 61820 USA (e-mail: [email protected]). D. Sylvester is with the University of Michigan, Ann Arbor, MI 48105 USA (e-mail: [email protected]).

I. Introduction

THE PRIMARY driver for innovations in computer systems has been the phenomenal scalability of the semiconductor manufacturing process that has allowed us to literally print circuits and build systems at exponentially growing capacities for the last three decades. After reaping the benefits of the Moore's law driven cost reductions, we are now beginning to experience dramatically deteriorating effects of material properties on (active and leakage) power and die yields. So far, the problem has been confined to the hardware manufacturers, who have responded with increasing guardbands applied to microelectronic chip designs. However, the problem continues to worsen with critical dimensions shrinking to atomic scale. For instance, the oxide in a 22 nm process is only five atomic layers thick, and the gate length is only 42 atomic diameters across. This has translated into exponentially increasing costs of fabrication and equipment to achieve increasingly precise control over manufacturing quality and scale.

We are beginning to see early signs of trouble that may make the benefits of the semiconductor industry unattainable for all but the most popular (hence highest volume) consumer gadgets. Most problematic is the substantial fluctuation in critical device/circuit parameters of the manufactured parts across the die, die-to-die, and over time due to increasing susceptibility to errors and circuit aging-related wear-out. Consequently, the microelectronic substrate on which modern computers are built is increasingly plagued by variability in performance (speed, power) and error characteristics, both across multiple instances of a system and in time over its usage life. Consider the variability in circuit delay and power as it has increased over time, as shown in Fig. 1. The most immediate impact of such variability is on chip yields: a growing number of parts are thrown away since they do not meet the timing and power related specifications. Left unaddressed, this trend toward parts that scale in neither capability nor cost will cripple the computing and information technology industries.

The solution may stem from the realization that the problem is not variability per se, but rather how computer system designers have been treating the variability. While chip components no longer function like the precisely chiseled parts of the past, the basic approach to the design and operation of computers has remained unchanged. Software assumes hardware of fixed specifications that hardware designers try hard to meet. Yet hardware designers must accept conservative estimates of hardware specifications and cannot fully leverage software's inherent flexibility and resilience.


Fig. 2. Frequency variation in an 80-core processor within a single die in Intel’s 65 nm technology [4].

Fig. 1. Circuit variability as predicted by ITRS [2].

Guardband for hardware design increases cost—maximizing performance incurs too much area and power overhead [1]. It also leaves enormous performance and energy potential untapped, as the software assumes lower performance than what a majority of instances of that platform may be capable of most of the time. In this paper, we outline a novel flexible hardware–software stack and interface that use adaptation in software to relax variation-induced guardbands in hardware design. We envision a new paradigm for computer systems, one where nominally designed (and hence underdesigned) hardware parts work within a software stack that opportunistically adapts to variations.

Fig. 3. Sleep power variability across temperature for ten instances of an off-the-shelf ARM Cortex M3 processor [6]. Only 5 out of 10 measured boards are shown for clarity.

This paper is organized as follows. The next section discusses sources of variability and exemplifies them with measurements from actual systems. Section III gives an introduction to underdesigned and opportunistic computing. Section IV outlines various methods of variation monitoring with examples. Section V discusses underdesigned and opportunistic computing (UnO) hardware, while Section VI discusses UnO software stack implications. We conclude in Section VII.

II. Measured Variability in Contemporary Computing Systems

Scaling of physical dimensions in semiconductor circuits faster than the tolerances of the equipment used to fabricate them has resulted in increased variability in circuit metrics such as power and performance. In addition, new device and interconnect architectures, as well as new methods of fabricating them, are changing the sources and nature of variability. In this section, we briefly discuss the underlying physical sources of variation and illustrate the magnitude of the variability visible to the software layers using measurements of off-the-shelf hardware components. There are four sources of hardware variation.

1) Manufacturing. The International Technology Roadmap for Semiconductors (ITRS) highlights power/performance variability and reliability management in the next decade as a red brick (i.e., a problem with unknown solutions) for the design of computing hardware. As an example, Fig. 2 shows within-die 3σ performance variation of more than 25% at 0.8 V in a recent experimental Intel processor. Similarly, on the memory/storage front, recent measurements done by us show 27% and 57% operation energy variation and 50% bit error rate variation between nominally identical flash devices (i.e., same part number) [3]. This variability can also include variations coming from manufacturing the same design at multiple silicon foundries.

2) Environment. ITRS projects Vdd variation to be 10%, while the operating temperature can vary from −30 °C to 175 °C (e.g., in the automotive context [5]), resulting in over an order of magnitude sleep power variation (see Fig. 3), several tens of percent performance change, and large power deviations.

3) Vendor. Parts with almost identical specifications can have substantially different power, performance, or reliability characteristics, as shown in Fig. 4. This variability is a concern as single vendor sourcing is difficult for large-volume systems.

4) Aging. Wires and transistors in integrated circuits suffer substantial wear-out, leading to power and performance changes over time of usage (see Fig. 5). Physical mechanisms leading to circuit aging include bias temperature instability (BTI, where the threshold voltage of transistors degrades over time), hot carrier injection (HCI), where transistor threshold voltage degrades every time it switches, and electromigration (EM), where wire width shrinks as more current passes through it.

In the remainder of this section, we give a few examples of variability measurement in modern off-the-shelf computing components.
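To make the scale of the environmental effect concrete, the short sketch below extrapolates sleep power across temperature using an assumed exponential model; the functional form, the coefficient, and the per-board values are illustrative assumptions, not measurements from this paper.

```c
#include <math.h>
#include <stdio.h>

/* Illustrative leakage model: P_sleep(T) = P_ref * exp(k * (T - T_ref)).
 * P_ref is a per-instance value calibrated at T_ref; k is a fitted
 * temperature coefficient. All constants here are made up for the sketch. */
static double sleep_power_uw(double p_ref_uw, double t_ref_c, double k_per_c, double t_c)
{
    return p_ref_uw * exp(k_per_c * (t_c - t_ref_c));
}

int main(void)
{
    /* Two hypothetical board instances calibrated at 20 C. */
    double board_a = 10.0, board_b = 90.0;   /* microwatts, assumed */
    double k = 0.04;                          /* 1/C, assumed fit    */
    for (double t = 20.0; t <= 60.0; t += 10.0)
        printf("T=%2.0fC  A=%6.1fuW  B=%7.1fuW\n", t,
               sleep_power_uw(board_a, 20.0, k, t),
               sleep_power_uw(board_b, 20.0, k, t));
    return 0;
}
```

Even with identical temperature coefficients, the instance-to-instance spread at the calibration point carries through the entire temperature range.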

Fig. 4. Power variation across five 512 MB DDR2-533 DRAM parts [7].

Fig. 6. Measured active power variability for ARM Cortex M3 processor. Only 5 out of 10 measured boards are shown for clarity.

A. Power Variability in Contemporary Embedded Processors

Energy management methods in embedded systems rely on knowledge of the power consumption of the underlying computing platform in various modes of operation. In previous work [6], we measured and characterized instance- and temperature-dependent power consumption for the Atmel SAM3U, a contemporary embedded processor based on an ARM Cortex M3 core, and discussed the implications of this variation for system lifetime and potential software adaptation. Fig. 3 shows that at room temperature the measured sleep power spread was 9×, which increased to 14× if temperature variation (20 °C to 60 °C) was included as well (all temperature experiments were conducted in a controlled temperature chamber; further details can be found in [6]). Such large variation in leakage power is to be expected given its near exponential dependence on varying manufacturing process parameters (e.g., gate length and threshold voltage) as well as temperature. To the best of our knowledge, this embedded processor was manufactured in a 90 nm or older technology. Given that leakage as well as leakage variability are likely to increase with technology generations, we expect future embedded processors to fare much worse in terms of variability. Another interesting point to note is that six out of the ten boards that we measured were out of datasheet specification, highlighting the difficulty of ensuring a pessimistic datasheet specification for sleep power.

On the same processor core, we measured active power variation as well. The power spread at room temperature is about 5%, and larger if temperature dependence is included as well (see Fig. 6). Actual switching power has a weak linear dependence on temperature and its process dependence is also relatively small. Most of the temperature-dependent variation observed is likely due to variation in short-circuit power at lower temperatures and the leakage component of active power at higher temperatures.

Fig. 5. Normalized frequency degradation in a 65 nm ring oscillator due to NBTI [8].

B. Power Consumption Variability in Contemporary General Purpose Processors

On the other end of the power and performance spectrum are the higher end modern processors from Intel and AMD that are meant for laptop, desktop, and server class devices. Energy management is critically important in these platforms due to battery lifetime concerns (mobile devices) or rising energy costs (desktops and servers). In modern platforms, the processor is often a dominant energy consumer and, with increasingly energy efficient designs, has a very large dynamic range of operation. As a result, characterizing processor power consumption accurately, so that adaptive algorithms can manage power consumption, is key.

However, characterizing power consumption on just one instance of a processor may not be enough, since part-to-part power variability may be significant, and knowing its extent is therefore critical. Using a highly instrumented Calpella platform from Intel, we were able to isolate the power consumption of multiple subsystems within contemporary x86 processors (Intel Nehalem class). Using this Calpella platform and an extensive suite of single-threaded and modern multithreaded benchmarks, such as the SPEC CPU2006 and PARSEC suites, we characterized the variability in power consumption across multiple cores on the same die, and also across cores on different dies. We used Linux cpusets to pin the benchmark and the OS threads to specific cores for these experiments. To minimize experimental error (which is eventually very small as measured by standard deviations across runs), we ran each benchmark configuration multiple times (n = 5); we swapped the processors in and out and repeated the experiments to eliminate any effects due to the thermal coupling of the heatsink, and rebooted the Calpella platform multiple times to account for any effects due to OS scheduling decisions.
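A minimal sketch of the core-pinning step in such an experiment is shown below; it uses sched_setaffinity rather than the cpusets interface used in the actual study, and the spin-loop "benchmark" and external power sampling are placeholders for the real workloads and instrumentation.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Pin the calling process to a single core so that the power drawn by
 * that core can be attributed to the running benchmark. */
static void pin_to_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        exit(EXIT_FAILURE);
    }
}

/* Placeholder compute kernel standing in for a SPEC/PARSEC benchmark. */
static double spin_kernel(long iters)
{
    volatile double acc = 0.0;
    for (long i = 0; i < iters; i++)
        acc += (double)i * 1e-9;
    return acc;
}

int main(int argc, char **argv)
{
    int core = (argc > 1) ? atoi(argv[1]) : 0;
    pin_to_core(core);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    spin_kernel(200000000L);              /* power is sampled externally here */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("core %d: %.3f s (energy = average power x time)\n", core, secs);
    return 0;
}
```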

TABLE I
Variability in Power and Energy Consumption Measured Across Six Identical Core i5-540M Parts, With Different Processor Features Enabled or Disabled

                                                 | TurboBoost = OFF                | TurboBoost = ON
                                                 | C-States = ON  | C-States = OFF | C-States = ON  | C-States = OFF
                                                 | Power | Energy | Power | Energy | Power | Energy | Power | Energy
Maximum variation (for any particular benchmark) | 22%   | 16.7%  | 16.8% | 16.8%  | 11.7% | 10.6%  | 15.1% | 14.7%
Average variation (across all benchmarks)        | 14.8% | 13.4%  | 14.7% | 13.9%  | 8.6%  | 7.9%   | 13.2% | 11.7%

The variability across parts is significant, and it decreases with TurboBoost and processor C-states enabled due to the reduction in leakage power. The maximum variation row reports the variability for the particular benchmark that shows the largest variation, while the average variation row reports the average variation in power and energy across all benchmarks for a given configuration.

Table I summarizes the variability in power consumption across cores on six instances of identical dual-core Core i5-540M processors for different configurations, such as turning TurboBoost ON or OFF and turning processor sleep states (C-states) ON or OFF. We do not show the variability across multiple cores on the same processor (within-die variability) since it was quite low, and instead focus on the variability across instances (across-die variability). We also measured benchmark completion times and used them to compute the variability in energy consumption. The data presented in the table are for our final benchmark set, which consists of 19 benchmarks from the SPEC CPU2006 benchmark suite [9] chosen to cover a wide range of resource usage patterns.

As shown in Table I, the variation in power consumption observed across dies is significant (maximum 22% for a particular benchmark and average 12.8% across all benchmarks) and depends on the processor configuration. Our data provide evidence that the power variation we observe is dominated by the differences in leakage power across processor dies. For instance, disabling the processor sleep states (C-states = OFF) increases the parts of the chip that remain powered on, thereby increasing the leakage power consumption and in turn increasing the power variability. As shown in Table I, the average power variation across all benchmarks indeed increases when C-states are disabled; the variation increases from 8.6% (TurboBoost = ON, C-states = ON) to 13.2% (TurboBoost = ON, C-states = OFF). Further details about our measurement methodology and dataset, extended results under different processor configurations, and discussions of potential causes of this observed variability can be found in [10].

Fig. 7. Programming performance exhibits wide and predictable variation across pages within a single flash block. Since program power is constant, the same effect leads to energy efficiency variation as well. The chip identifier denotes the manufacturer (the initial letter), cell type (MLC or SLC), and capacity. Error bars: one standard deviation.

C. Variation in Memories

The push toward "big data" applications has pushed the performance, reliability, and cost of memory technologies to the forefront of system design. Two technologies are playing particularly important roles: DRAM (because it is very fast) and NAND flash (because it is nonvolatile, very dense/cheap, and much faster than disk). In both technologies, variability manifests itself as bit errors, differences in performance, and differences in energy efficiency.

a) Flash Memory: To attain ever-greater flash memory densities, flash memory manufacturers are pushing the limits of the process technology and the principles of operation that underlie flash devices. The result is growing levels of variability in flash performance at multiple scales.

Careful measurements of flash behavior [3] allow us to quantify variability in flash's read, write, and erase performance, raw bit error rate, and energy consumption. The results show both systematic variation within flash blocks and random variation at multiple scales. Fig. 7 shows the systematic variation we observe in program latency for multilevel cell (MLC) flash devices. Power measurements of the same operations show that power consumption is roughly constant across pages, leading to a similar variation in energy per programmed bit. We have demonstrated that SSD firmware and the operating system can leverage this variation to improve the performance of high-priority IO requests or reduce energy consumption during, for instance, battery powered operation.

Rising density has had an especially detrimental effect on flash's bit error rate and the number of program/erase cycles a flash block can sustain before it becomes unusable. We frequently observe variation of 10× or more in raw bit error rate across blocks on the same chip. Even more insidious is the decrease in data retention (i.e., how long data remain intact in an idle block) with program/erase cycling: accelerated aging measurements show that by the end of its program/erase lifetime, data retention time drops by 90% (from 10 years to 1). We are using these and other measurements to predict when individual flash blocks will go bad, and to allocate different types of data (e.g., long-term storage versus temporary files) to different regions of the flash memory within an SSD.
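The sketch below illustrates the kind of block-level bookkeeping and placement policy this enables; the health records, thresholds, and policy are illustrative assumptions rather than the actual firmware of [3].

```c
#include <stdio.h>

/* Per-block health record maintained by hypothetical SSD firmware. */
struct block_info {
    unsigned pe_cycles;      /* program/erase cycles so far        */
    double   raw_ber;        /* measured raw bit error rate        */
    double   retention_days; /* estimated remaining data retention */
};

enum data_class { DATA_LONG_TERM, DATA_TEMPORARY };

/* Illustrative placement policy: long-lived data goes to young, low-error
 * blocks; short-lived data can use blocks nearing end of life. */
static int block_suitable(const struct block_info *b, enum data_class c)
{
    if (c == DATA_LONG_TERM)
        return b->raw_ber < 1e-6 && b->retention_days > 365.0;
    return b->retention_days > 7.0;   /* temporary files: a week is enough */
}

static int pick_block(const struct block_info *blocks, int n, enum data_class c)
{
    for (int i = 0; i < n; i++)
        if (block_suitable(&blocks[i], c))
            return i;
    return -1;  /* no suitable block: caller must reclaim or report wear-out */
}

int main(void)
{
    struct block_info blk[3] = {
        { 500,   2e-7, 3650.0 },   /* healthy block                  */
        { 4800,  5e-6,  400.0 },   /* aging block                    */
        { 9900,  4e-5,   20.0 },   /* near end of program/erase life */
    };
    printf("long-term data -> block %d\n", pick_block(blk, 3, DATA_LONG_TERM));
    printf("temporary data -> block %d\n", pick_block(blk, 3, DATA_TEMPORARY));
    return 0;
}
```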

b) DRAM: [11] tested 22 double data rate (DDR3) dual inline memory modules (DIMMs) and found that power usage in DRAMs depends both on operation type (write, read, and idle) and on data, with write operations consuming more power than reads, and 1s in the data generally costing more power than 0s. Temperature had little effect (1%–3%) across the −50 °C to 50 °C range. Variations were up to 12.29% and 16.40% for idle power within a single model and for different models from the same vendor, respectively. In the scope of all tested 1 gigabyte (GB) modules, deviations were up to 21.84% in write power. The measured variability results are summarized in Fig. 8. Our ongoing work addresses memory management methods to leverage such power variations (for instance, variability-aware Linux virtual memory management [12]).

Fig. 8. Maximum variations in write, read, and idle power by DIMM category, 30 °C. Instrumentation measurement error is less than 1%.

III. UnO Computing Machines

UnO computing machines provide a unified way of addressing variations due to manufacturing, operating conditions, and even reliability failure mechanisms in the field: difficult-to-predict spatiotemporal variations in the underlying hardware, instead of being hidden behind conservative specifications, are fully exposed to and mitigated by multiple layers of software. A taxonomy of UnO machines is shown in Fig. 9. They can be classified along the following two axes.

1) Type of Underdesign. The hardware may use parametrically underprovisioned circuits (e.g., voltage overscaling as in [13], [14]) or be implemented with an explicitly altered functional description (e.g., [15]–[17]).

2) Type of Operation. Erroneous operation may rely upon the application's level of tolerance to limited errors (as in [18]–[20]) to ensure continued operation. In contrast, error-free UnO machines correct all errors (e.g., [13]) or operate hardware within correct-operation limits (e.g., [6], [21]). Furthermore, for erroneous systems, the exact error characteristics may or may not be known. In parametrically underdesigned erroneous systems, it may not be easy to infer the exact error behavior. For instance, voltage overscaling can cause timing errors, thereby changing the functional behavior of the circuit, but the exact input dependence of such errors may be tough to predict, especially in the presence of process or environmental variations. On the other hand, such input-to-error mapping is known a priori for functionally underdesigned systems (though it may be too complex to keep track of).

Fig. 9. Classification of UnO computing machines.

A large class of applications is tolerant to certain kinds of errors due to algorithmic or cognitive noise tolerance. For example, certain multimedia and iterative applications are tolerant to certain numerical errors and other errors in data. A numerical or data error often simply affects the quality of output for such applications. The number of errors tolerable by such applications is bounded by the degree to which a degradation in output quality is acceptable. System correctness can be carefully relaxed for such applications for improved power and performance characteristics. Note that only certain classes of errors are acceptable: the system still needs to avoid errors in control, or errors that may cause exceptions or I/O errors. Our recent work [22] presents a compiler analysis framework for identifying vulnerable operations in error-tolerant GPU applications. A large class of other applications, e.g., mission-critical applications, do not tolerate any error in the output. For such applications, the system needs to guarantee correct operation under all conditions, and hardware monitoring mechanisms could be used to identify an aggressive, but safe, operating point. Alternatively, hardware error resilience mechanisms can be used to guarantee that no hardware error is exposed to applications.

The IC design flow will use software adaptability and error resilience to relax implementation and manufacturing constraints. Instead of crashing and recovering from errors, UnO machines make proactive measurements and predict parametric and functional deviations to ensure continued operation and availability. This preempts impact on software applications, rather than just reacting to failures (as is the case in fault-tolerant computing) or under-delivering along the power/performance/reliability axes. Fig. 10 shows the UnO adaptation scheme. Underdesigned hardware may be acceptable for appropriate software applications (i.e., ones that degrade gracefully in the presence of errors or are made robust). In other cases, software adaptation may be aided by hardware signatures (i.e., measured hardware characteristics). There are several ways to implement such adaptation, ranging from software-guided hardware power management to just-in-time recompilation strategies. Some examples are described later in Section V. For the hardware design flow, the objective is to develop efficient design and test methods, given that the goal is no longer to optimize yield for fixed specifications, but rather to ensure that designs exhibit well-behaved variability characteristics that a well-configured software stack can easily exploit.

Fig. 10. Examples of UnO adaptation aided by embedded hardware monitoring.
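As a concrete and purely illustrative picture of what exposing a hardware signature to software might look like, the sketch below shows a hypothetical signature record consumed in both the error-free style (derating the operating point) and the erroneous style (checking the error rate against an application-specific tolerance); the fields, thresholds, and values are assumptions, not an interface defined in this paper.

```c
#include <stdio.h>

/* Hypothetical hardware signature exposed to the software stack. */
struct hw_signature {
    double max_freq_mhz;     /* measured achievable frequency  */
    double sleep_power_uw;   /* measured sleep power           */
    double error_rate;       /* observed error rate (errors/s) */
};

/* Error-free UnO operation: stay within correct-operation limits by
 * derating the requested operating point using the signature. */
static double safe_freq(const struct hw_signature *s, double requested_mhz)
{
    double margin = 0.95;    /* assumed guardband retained in software */
    double limit = s->max_freq_mhz * margin;
    return requested_mhz < limit ? requested_mhz : limit;
}

/* Erroneous UnO operation: an error-tolerant application accepts a
 * nonzero error rate up to an application-specific bound. */
static int acceptable_for_app(const struct hw_signature *s, double max_tolerable)
{
    return s->error_rate <= max_tolerable;
}

int main(void)
{
    struct hw_signature sig = { 980.0, 45.0, 0.002 };   /* sensed values (made up) */
    printf("run at %.0f MHz\n", safe_freq(&sig, 1200.0));
    printf("error-tolerant codec OK? %s\n",
           acceptable_for_app(&sig, 0.01) ? "yes" : "no, fall back to robust path");
    return 0;
}
```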

IV. Sensing and Exposing Hardware Signatures

A hardware signature provides a way of capturing one or more hardware characteristics of interest and transmitting these characteristics to the software at a given point in time. Examples of hardware signatures include performance, power, temperature, error rate, delay fluctuations, and working memory size.

Hardware signatures may be collected at various levels of spatial and temporal granularity depending on the hardware and software infrastructure available to collect them and their targeted use in the UnO context. UnO machines expose these signatures to opportunistic software via architectural assists to dynamically recognize and use interesting behaviors. Key challenges in realizing this vision are as follows:

1) identification of hardware characteristics of interest and derivation of corresponding signatures that can be effectively sensed and utilized by the architecture and software layers;
2) techniques (hardware, software, or combinations thereof) to enable effective sensing;
3) design of the overall platform to support a signature-driven adaptive software stack.

A. Production Test of UnO Machines

Signatures can be collected at various points in time, for example during production testing or boot-time, periodically during runtime, or adaptively based on environmental conditions (voltage, temperature), workload, and actions triggered by other sensing actions. While runtime sensing is central to UnO, it is supported by production-time sensing that goes beyond traditional production testing methods, which generally focus on screening defective parts and/or binning manufactured parts according to their power/performance characteristics. Some of the challenges associated with production-time sensing are listed below.

1) Today, performance/power binning is generally limited to high-end processors. For complex SoCs with a large mix of heterogeneous components (including a significant proportion of nonprocessor components), it is difficult to create comprehensive tests to achieve high-confidence binning. In contrast, we expect UnO machines to support a large number of highly granular bins. This requires innovations in production testing techniques.

2) A wide range of parameters beyond traditional chip-level power and performance may need to be measured and calibrated for UnO machines. Examples include the noise margins and erroneous behaviors of memory cells and sequential elements, error rates, and the degradation behaviors of various design blocks and canary circuits under voltage/frequency overscaling or voltage/temperature stress. With conventional production test methodologies, it may be highly difficult to accomplish these tasks cost-effectively.

3) Today's production testing methodologies typically do not factor in end-user application intent (some exceptions exist). Factoring in application intent can improve manufacturing yield, since several chips give acceptable application behavior despite being defective (see [18]). Establishing this defect-to-application-behavior mapping, especially in the case of programmable, general purpose computing systems, is highly difficult. Factoring in application behaviors during production testing may require complex fault diagnosis (which can result in expensive test time) and/or extensive use of functional tests. Functional test generation is generally considered to be a "black art" and difficult to automate.

B. Runtime Sensing Methods

Techniques for collecting signatures during system runtime can be classified into the following four categories.

1) Monitors. By monitors, we refer to structures that are generally "decoupled" from the functional design (but can be aware of it). Such structures are inserted (in a sparse manner) to capture hardware characteristics of interest. Several examples exist in the literature to monitor various circuit characteristics (e.g., achievable frequency [23], leakage power [24], or aging [25]). The challenge is to minimize the overheads introduced by the monitoring circuits; these overheads take the form of area, delay, and/or power, as well as design complexity, while retaining accuracy.

2) Error Detection Circuits and In Situ Sensors. Circuit delay variation can be characterized using a variety of delay fault detection flip-flop designs (provided timing-critical paths are exercised) [26]–[30]. These approaches can be used either for concurrent error detection (i.e., detect a problem after errors appear in system data and states) or for circuit failure prediction (i.e., provide early indications of impending circuit failures before errors appear in system data and states) [27]. Depending on the final objective, such error detection circuits may be employed in a variety of ways, such as: a) simply keeping them activated all the time; b) activating them periodically; or c) activating them based on operating conditions or signatures collected by (simpler) monitors.

TABLE II
Comparing Different Hardware Signature Generation Methods

Attribute                                    | Monitors                                                         | Error indicators                        | On-line self-test and diagnostics     | Software inferences
Hardware costs                               | Low                                                              | Generally high                          | Medium                                | Low
Accuracy                                     | Low                                                              | High                                    | High                                  | Low
Coverage                                     | Low                                                              | High                                    | High                                  | Low
Hardware changes                             | Required                                                         | Generally required                      | Generally required                    | Minimal
Diagnosability                               | Low                                                              | Medium to low                           | High                                  | Low
On-line operation                            | Yes                                                              | Yes                                     | Partial, with system support          | Yes, with multiprogramming
Applicability (failure mechanisms addressed) | Distributed effects; localized events (generally) not covered    | General, as long as errors are created  | Transient errors cannot be detected   | General, but diagnosis of problems is difficult

3) Online Self-Test and Diagnostics. Online self-test and diagnostics techniques allow a system to test itself concurrently during normal operation without any downtime visible to the end user. Major technology trends such as the emergence of many-core SoC platforms with lots of cores and accelerators, the availability of inexpensive off-chip nonvolatile storage, and the widespread use of test compression techniques for low-cost manufacturing test create several new opportunities for online self-test and diagnostics that overcome the limitations of traditional pseudorandom logic built-in self-test (Logic BIST) techniques [31]. In addition to hardware support, effective online self-test and diagnostics must be orchestrated using software layers such as virtualization [32] and the operating system [33].

4) Software-Based Inference. A compiler can use the functional specification to automatically insert (in each component) software-implemented test operations to dynamically probe and determine which parts are working. Such test operations may be used, for instance, to determine the working parts of the instruction set architecture (ISA) for a given processor core. Software-implemented test operations may be instrumented using a variety of techniques: special test sequences with known inputs and outputs, self-checking tests, and quick error detection tests [34]–[41].

One area of recent work in monitors that could be employed in UnO machines is aging sensors that estimate or measure the degree of aging (due to different mechanisms, such as negative bias temperature instability, NBTI, or oxide degradation) that a particular chip has experienced. One design was presented in [25], where a unified NBTI and gate-oxide wear-out sensor was demonstrated in a 45 nm CMOS technology. With a very small area, sprinkling hundreds or even thousands of such sensors throughout a design becomes feasible, providing a fairly reliable indication of expected system lifetime.

Alternatively, in situ sensors directly measure the hardware used in the chip itself. Such designs tend to incur relatively higher overhead in area, performance, or some other metric. One recent in situ design [42] reuses header devices aimed at leakage reduction to evaluate the nature of leakage in a circuit block. Area overhead in this case can be kept in the 5% range, though such checks must be scheduled (this can be done without affecting user performance).

Since each method of runtime sensing inevitably makes a fundamental tradeoff between cost, accuracy, and applicability across various stages of system lifetime (see Table II), it is necessary to combine these approaches to meet the area/power/test time/accuracy constraints. Intelligent algorithms drawing from recent work in information theory, compressive sensing (e.g., [43]), and similar fields can be used to adjust sampling rates based on prior observations, detect data outliers, and otherwise exploit the vast information being provided by the runtime sensing techniques. In addition, to build models of the hardware variability that can be used by software layers (for example, to decide which code versions to make available), randomness and dependencies must be captured, which has initially been approached using simplified probabilistic models [44]. As mentioned earlier, the collective use of many sensors brings up questions of the appropriate monitoring granularity (i.e., what to test when). Temporal granularity is determined by the time dependence of wear-out mechanisms, the timescales of ambient/operating variability, and the initial design-time guardband. Statistical system adaptation techniques must be able to take advantage of fine granularity. This establishes a close tie between system design and software adaptability versus test procedures during manufacturing, and both must be co-optimized for the best result (e.g., examples shown in [45]). Frequently measuring high precision signatures (speed, power, and error rate) at a large number of points in a circuit incurs large overheads. Preliminary work [21] shows that for discrete algorithm configurations or code versions, it is possible to derive optimal quantization strategies (e.g., what exact frequencies to test). Spatial granularity depends on the magnitude of within-die process variations, heterogeneity in design (e.g., use of multiple device types necessitating multiple measurements), and opportunities for leveraging heterogeneity/deviations in the ISA (e.g., LOAD consuming more power while ADD consumes less than expected).
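The following sketch gives a flavor of how the software-implemented test operations of Section IV-B can probe such per-operation behavior and fold the outcome into a "working-ISA" bitmap that higher layers could treat as part of the hardware signature. The specific checks and encoding are illustrative assumptions, not the instrumentation of [34]–[41].

```c
#include <stdio.h>

/* Software-implemented known-answer tests. The volatile operands keep the
 * operations from being folded away at compile time. */
static volatile int a = 12345, b = 678;

static int test_integer_alu(void)
{
    return (a + b == 13023) && (a * b == 8369910) && (a - b == 11667);
}

static int test_shifts(void)
{
    return ((unsigned)a << 3) == 98760u && ((unsigned)a >> 2) == 3086u;
}

int main(void)
{
    unsigned signature = 0;               /* one bit per probed capability */
    if (test_integer_alu()) signature |= 1u << 0;
    if (test_shifts())      signature |= 1u << 1;

    printf("working-ISA signature: 0x%x\n", signature);
    /* A runtime layer could now steer work away from capabilities whose
     * bit is clear, or schedule deeper diagnostics for them. */
    return 0;
}
```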

Providing architectural and system support for test and monitoring is equally important. For instance, CASP [31] integrates microarchitecture support, as well as software-assisted techniques utilizing virtual machine monitors and operating systems, to efficiently orchestrate online self-test and diagnostics at the system level. CASP provides efficient online test access mechanisms to fetch high-quality test patterns from off-chip storage and apply them to various components in an SoC in four phases (see Fig. 11).

Fig. 11. Four phases of CASP operation that include software orchestration to minimize the performance impact of self-test.

A simple technique that stalls the component-under-test can result in significant system performance degradation or even visible system unresponsiveness [46]. An example software optimization is CASP-aware OS scheduling [33]. We modified the scheduler of a Linux operating system to consider the unavailability of cores undergoing online self-test and diagnostics. Experimental results on a dual quad-core Xeon system show that the performance impact of CASP on noninteractive applications in the PARSEC benchmark suite is negligible (e.g., < 1%) when spare cores are available.

V. Variability-Aware Software

Variability has typically been addressed by process, device, and circuit designers, with software designers remaining isolated from it by a rigid hardware–software interface, which leads to decreased chip yields and increased costs [1]. Recently, there have been some efforts to handle variability at higher layers of abstraction. For instance, software schemes have been used to address voltage [47] or temperature variability [48]. Hardware "signatures" are used to guide adaptation in quality-sensitive multimedia applications in [49]. In embedded sensing, [50], [51] propose sensor node deployment schemes based on the variability in power across nodes.

A software stack that changes its functions to exploit and adapt to runtime variations can operate the hardware platform close to its operational limits. The key to this achievement is to understand the information exchanges that need to take place, and the design choices that exist for the responses to variability, where they occur (i.e., which layer of software), and when they occur (design or run time).

The hardware variability may be visible to the software in several ways: changes in the availability of modules in the platform (e.g., a processor core not being functional); changes in module speed or energy performance (e.g., the maximum feasible frequency for a processor core becoming lower or its energy efficiency degraded); and changes in the error rate of modules (e.g., an ALU computing wrong results for a higher fraction of inputs). The range of possible responses that the software can make is rich: alter the computational load (e.g., throttle the data rate to match what the platform can sustain), use a different set of hardware resources (e.g., use instructions that avoid a faulty module or minimize use of a power hungry module), change the algorithm (e.g., switch to an algorithm that is more resilient to computational errors), and change the hardware's operational setting (e.g., tune software-controllable knobs such as voltage/frequency). These changes can occur at different places in the software stack with corresponding tradeoffs in agility, flexibility, and effectiveness: explicitly coded by the application developer making use of special language features and API frameworks; automatically synthesized during static or just-in-time compilation given information about application requirements and platform variability; or transparently managed by the operating system resource scheduler.

A. Selective Use of Hardware Resources

One strategy for coping with variability is to selectively choose units to perform a task from a pool of available hardware. This can happen at different granularities, where variability may manifest itself across different cores in a multicore processor, different memory banks, or different servers in a data center. We briefly describe a few examples below.

A large fraction of all integrated circuit failures are related to thermal issues [52], [53]. A number of strategies are used to try to minimize the effect of thermal variability on general purpose designs, starting on-chip, then within a single server, and finally at the large system level such as within datacenters. The cost of cooling can reach up to 60% of datacenter running costs [54]. The problem is further complicated by the fact that software layers are designed to be completely unaware of hardware process and thermal variability. [55] studied the effect of temperature variability on the mean time to failure (MTTF) of general purpose processors. It compared the default Linux scheduler with various thermal and power management techniques implemented at the OS and hardware levels of a 16-core high-end general purpose processor. The difference in MTTF between the default case and the best performing example (including different dynamic power management schemes) is as high as 73%. With relatively minor changes to the operating system layer, the MTTF of devices can more than double at lower energy cost by effectively leveraging hardware characteristics.
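A toy illustration of variability-aware selection among cores is sketched below: a task is placed on the core whose reported temperature and power signature suggest the least thermal stress. The scoring rule and the numbers are assumptions for illustration only; the actual study in [55] modified the OS scheduler itself.

```c
#include <stdio.h>

/* Per-core signature as it might be reported by runtime monitors. */
struct core_sig {
    double temp_c;       /* current temperature           */
    double active_pw_w;  /* measured active power (watts) */
};

/* Pick the core expected to age slowest for a long-running task:
 * prefer cool, low-power cores (a crude proxy for MTTF impact). */
static int pick_core(const struct core_sig *c, int ncores)
{
    int best = 0;
    double best_score = c[0].temp_c + 10.0 * c[0].active_pw_w;
    for (int i = 1; i < ncores; i++) {
        double score = c[i].temp_c + 10.0 * c[i].active_pw_w;
        if (score < best_score) { best_score = score; best = i; }
    }
    return best;
}

int main(void)
{
    struct core_sig cores[4] = {
        { 72.0, 9.1 }, { 61.0, 8.2 }, { 66.0, 7.4 }, { 58.0, 9.8 }
    };
    printf("schedule long-running task on core %d\n", pick_core(cores, 4));
    return 0;
}
```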

In [19], the authors present an error resilient system architecture (ERSA), a robust system architecture that targets emerging applications such as recognition, mining, and synthesis (RMS) with inherent error resilience, and ensures a high degree of resilience at low cost. While the resilience of RMS applications to errors in low-order bits of data is well known, execution of such applications on error-prone hardware significantly degrades output quality (due to high-order bit errors and crashes). ERSA avoids these pitfalls using a judicious combination of the following key ideas.

1) Asymmetric reliability in many-core architectures: ERSA consists of a small number of highly reliable components, together with a large number of components that are less reliable but account for most of the computation capacity. Within an application, by assigning the control-related code to reliable components and the computation-intensive code to relaxed-reliability components, it is possible to avoid highly conservative overall system design.

2) Error-resilient algorithms at the core of probabilistic applications: instead of the traditional concurrent error checks typically used in fault-tolerant computing, ERSA relies on lightweight error checks such as timeouts, illegal memory access checks, and application-aware error compensation using the concepts of convergence filtering and convergence damping.

3) Intelligent software optimizations: error injection experiments on a multicore ERSA hardware prototype demonstrate that even at very high error rates of 20 errors/flip-flop/10^8 cycles (equivalent to 25 000 errors/core/s), ERSA maintains 90% or better accuracy of output results, together with minimal impact on execution time, for probabilistic applications such as K-Means clustering, LDPC decoding, and Bayesian network inference. We also demonstrate the effectiveness of ERSA in tolerating high rates of static memory errors that are characteristic of emerging challenges related to SRAM Vccmin problems and erratic bit errors.

In addition to processors, variation is also found in memory subsystems. Experiments have shown that random access memory chips are also subject to variations in power consumption. [11] found up to 21.8% power variability across a series of 1 GB DIMMs and up to 16.4% power variation across 1 GB DIMMs belonging to the same vendor. Variation in memory power consumption, error, and latency characteristics can be handled by allowing the run-time system to select different application memory layout, initialization, and allocation strategies for different application requirements. An architecture for hardware-assisted variability-aware memory virtualization (VaMV) that allows programmers to exploit application-level semantic information [56] by annotating data structures and partitioning their address space into regions with different power, performance, and fault-tolerance guarantees (e.g., mapping look-up tables into low-power fault-tolerant space or pixel data into low-power non-fault-tolerant space) was explored in [57]. The VaMV memory supervisor allocates memory based on priority, application annotations, device signatures based on the memory subsystem characteristics (e.g., power consumption), and current memory usage. Experimental results on embedded benchmarks show that VaMV is capable of reducing dynamic power consumption by 63% on average while reducing total execution time by an average of 34% by exploiting: 1) SRAM voltage scaling; 2) DRAM power variability; and 3) efficient dynamic policy-driven variability-aware memory allocation.

B. Software Adaptations for Quality-Complexity Tradeoffs

Application parameters can be dynamically adapted to explore energy, quality, and performance tradeoffs [58]. For example, Green [59] provides a software adaptation modality where the programmer provides "breakable" loops and a function to evaluate quality of service for a given number of iterations. The system uses a calibration phase to make approximation decisions based on the quality of service requirements specified by the programmer. At runtime, the system periodically monitors quality of service and adapts the approximation decisions as needed. Such adaptations can be used to maximally leverage the underlying hardware platform in the presence of variations.

Fig. 12. Hardware-guided adaptation improves PSNR (for samples of video sequences encoded using adaptive and nonadaptive methods, please see http://nanocad.ee.ucla.edu/Main/Codesign).

In the context of multimedia applications, [21] demonstrated how, by adapting application parameters to the post-manufacturing hardware characteristics of each die, it is possible to compensate for application quality losses that might otherwise be significant in the presence of process variations. This adaptation in turn results in improved manufacturing yield, relaxed requirements for hardware overdesign, and better application quality. Fig. 12 shows that the quality-variation tradeoff can differ significantly for different hardware instances of a video encoder. In H.264 encoding, adapting parameters such as subpixel motion estimation, FFT transform window size, run length encoding mechanism, and the size of the motion estimation search window can lead to significant yield improvements (as much as 40 percentage points at 0% overdesign), a reduction in overdesign (by as much as 10 percentage points at 80% yield), as well as application quality improvements (about 2.6 dB increase in average PSNR at 80% yield). Though this approach follows the error-free UnO machine model, exploiting errors as an additional quality tuning axis has been explored in [60]. In the context of erroneous UnO machines, several alternatives exist, ranging from detecting and then correcting faults within the application as they arise (e.g., [61], [62]) to designing applications to be inherently error-tolerant (e.g., application robustification [63]). In addition, many applications have inherent algorithmic and cognitive error tolerance. For such applications, significant performance and energy benefits can be obtained by selectively allowing errors.
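The calibrate-and-monitor pattern shared by Green [59] and the hardware-guided H.264 adaptation of [21] can be caricatured as follows; the single "knob", the quality model, and the target are placeholders standing in for real encoder parameters and measured PSNR.

```c
#include <stdio.h>

/* Stand-in for a measured quality metric: more effort (larger knob) and a
 * better hardware instance both raise quality. The coefficients are assumed. */
static double measured_quality(int knob, double hw_capability)
{
    return 30.0 + 0.8 * knob + 2.0 * hw_capability;
}

int main(void)
{
    const double target_psnr_db = 38.0;
    double hw_capability = 1.5;      /* derived from this instance's signature */
    int knob = 1, max_knob = 16;

    /* Calibration/monitoring loop: raise effort until the target is met or
     * the knob saturates; a real system would also react to drift over time. */
    while (knob < max_knob && measured_quality(knob, hw_capability) < target_psnr_db)
        knob++;

    printf("selected knob = %d, predicted quality = %.1f dB\n",
           knob, measured_quality(knob, hw_capability));
    return 0;
}
```

A better-than-nominal instance reaches the quality target at a lower knob setting, which is precisely the slack that hardware-guided adaptation converts into yield or energy savings.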

C. Alternate Code Paths

One broad class of approaches for coping with hardware variability is for the software to switch to a different code path in anticipation of, or in response to, a variability event. Petabricks [64] and Eon [65] feature language extensions that allow programmers to provide alternate code paths. In [64], the compiler automatically generates and profiles combinations of these paths for different quality/performance points. In [65], the runtime system dynamically chooses paths based on energy availability. Levels is an energy-aware programming abstraction for embedded sensors based on alternative tasks [66]. Programmers define task levels, which provide identical functionality with different quality of service and energy usage characteristics. The run-time system chooses the highest task levels that will meet the required battery lifetime.

While the systems described in [64], [65], and [66] do not consider hardware variability, similar mechanisms for expressing application elasticity can be leveraged in a variability-aware software system. For example, the application can read the current hardware signature, or register interest in receiving notifications or exceptions when a specific type or magnitude of change occurs in that signature, as illustrated in scenarios 1 and 2 of Fig. 13. Application response to variability events could be structured as transitions between different run-levels, with code blocks being activated or deactivated as a result of transitions.

Algorithms and libraries with multiple implementations can be matched to the underlying hardware configuration to deliver the best performance [67], [68]. Such libraries can be leveraged to choose the algorithm that best tolerates the predicted hardware variation and delivers performance satisfying the quality-of-service requirement. With support from the OS, the switch to alternative algorithms may be done in an application-transparent fashion by relinking a different implementation of a standard library function.

D. Adjusting Hardware Operating Point

Variation-aware adjustment of the hardware operating point, whether in the context of adaptive circuits (e.g., [69]–[71]), adaptive microarchitectures (e.g., [28], [72]–[74]), or software-assisted hardware power management (e.g., [4], [75], [76]), has been explored extensively in the literature. The UnO software stack will be cognizant of the actuation mechanisms available in hardware and utilize them in an application-aware manner.

E. An Example UnO Software Stack for Embedded Sensing

Throttling the workload is a common strategy to achieve system lifetime objectives in battery powered systems. A particularly common technique is duty cycling, where the system is by default in a sleep state and is woken up periodically to attend to pending tasks and events. A higher duty cycle rate typically translates into higher quality of service, and a typical application-level goal is to maximize quality of data through higher duty cycles while meeting a lifetime goal. Duty cycling is particularly sensitive to variations in sleep power at low duty cycling ratios. Variability implies that any fixed, network-wide choice of duty cycle ratio that is selected to ensure the desired lifetime needs to be overly conservative and results in lower quality of sensing or lifetime.

In order to maximize the sensing quality in the presence of power variation, an opportunistic sensing software stack can help discover and adapt the application duty cycle ratio to the power variations across parts and over time. The run-time system for the opportunistic stack has to keep track of changes in hardware characteristics and provide this information through interfaces accessible to either the system or the applications. Fig. 13 shows several different ways such an opportunistic stack may be organized; the scenarios shown differ in how the sense-and-adapt functionality is split between applications and the operating system. Scenario 1 relies on the application polling the hardware for its current "signature." In the second scenario, the application handles variability events generated by the operating system. In the last scenario, handling of variability is largely offloaded to the operating system.

Fig. 13. Designing a software stack for variability-aware duty cycling.

Using an architecture similar to that of scenario 3 in Fig. 13, [6] explored a variability-aware duty cycle scheduler where application modules specify a range of acceptable duty cycling ratios, and the scheduler selects the actual duty cycle based on run-time monitoring of operational parameters and a power-temperature model that is learned off-line for the specific processor instance. Two programming abstractions are provided: tasks with variable iterations and tasks with variable period. For the first, the programmer provides a function that can be invoked repeatedly a bounded number of times within each fixed period. For the second, the application programmer provides a function that is invoked once within each variable but bounded period of time. The system adjusts the number of iterations or the period of each task based on the allowable duty cycle. The scheduler determines an allowable duty cycle based on: 1) sleep and active power versus temperature curves for each instance; 2) the temperature profile for the application, which can be precharacterized or learned dynamically; 3) the lifetime requirement; and 4) battery capacity.

In an evaluation with ten instances of Atmel SAM3U processors, [6] found that variability-aware duty cycling yields a 3–22x improvement in total active time over schedules based on worst-case estimations of power, with an average improvement of 6.4x across a wide variety of deployment scenarios based on collected temperature traces. Conversely, datasheet power specifications fail to meet required lifetimes by 7%–15%, with an average of 37 days short of a required lifetime of one year. Finally, a target localization application using variability-aware duty cycling yields a 50% improvement in quality of results over one based on worst-case estimations of power consumption.

With current technology characteristics, a variability-aware schedule is beneficial for smaller duty cycles (< 5%). With the expected projection of sleep and active power variation with the scaling of technology, there will be significant benefits of a variability-aware schedule even for higher duty cycles. Some classes of real-time, highly synchronized embedded sensing applications are not amenable to this type of adaptation. Furthermore, this scheme adds some complexity to the application, in the form of bounds to task activations, which may in turn lead to further complexities in data storage, inference, and communication strategies. Nevertheless, the benefits of variability-aware duty cycling outweigh the added complexity for a large class of sensing applications. While this adaptation scheme indirectly leads to a form of energy-based load balancing across a network of sensors, further opportunities for network-wide adaptation exist in role selection for nodes, where a node could take different roles (e.g., data collector, router, aggregator) depending on its respective energy rank in the network.
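A much-simplified version of the duty-cycle selection underlying this scheduler is sketched below: it balances per-instance sleep and active power against the average power budget implied by battery capacity and required lifetime. The real scheduler in [6] additionally folds in learned power-temperature curves; the numbers here are invented.

```c
#include <stdio.h>

/* Pick the largest duty cycle whose average power fits the energy budget
 * implied by battery capacity and required lifetime, clipped to the
 * application's acceptable range [dmin, dmax]. Returns -1 if infeasible. */
static double allowable_duty_cycle(double p_active_mw, double p_sleep_mw,
                                   double battery_mwh, double lifetime_h,
                                   double dmin, double dmax)
{
    double budget_mw = battery_mwh / lifetime_h;           /* average power budget */
    if (budget_mw <= p_sleep_mw) return -1.0;               /* lifetime infeasible  */
    double d = (budget_mw - p_sleep_mw) / (p_active_mw - p_sleep_mw);
    if (d > dmax) d = dmax;
    return (d < dmin) ? -1.0 : d;
}

int main(void)
{
    /* Two instances of the same part with very different sleep power. */
    double duty_a = allowable_duty_cycle(30.0, 0.02, 2000.0, 24.0 * 365, 0.005, 0.30);
    double duty_b = allowable_duty_cycle(30.0, 0.15, 2000.0, 24.0 * 365, 0.005, 0.30);
    printf("instance A duty cycle: %.3f\n", duty_a);
    printf("instance B duty cycle: %.3f\n", duty_b);
    return 0;
}
```

With these made-up numbers, the low-leakage instance gets a usable duty cycle while the high-leakage instance cannot meet the lifetime within the application's acceptable range, which is exactly the situation that a fixed network-wide duty cycle cannot handle.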

F. Variability Simulation Challenges Due to the statistical nature of variability, it is important to test variability-aware software techniques on large number of hardware platforms, which, unfortunately is impractical in most cases. It is, therefore, essential to develop simulation and emulation infrastructures for UnO computing systems which are scalable while retaining variability modeling accuracy. The Fig. 15. Different microarchitectures exhibit different error rate behaviors, work in this direction has been somewhat limited (e.g., [77] in demonstrating the potential to influence the energy efficiency of a timing speculative architecture through microarchitectural optimizations. (Notations context of performance variability for a specific system or error describe the number of ALUs, issue queue length, number of physical emulation in [19], [63]) and a key challenge for UnO software registers, and load store queue size for a microarchitecture.) methods is building such emulation platforms for large class of computing systems which model all manifestations (error, errors can be tolerated by software. Similarly, software should delay, power, aging, etc.) of variability. designed to make forward progress in spite of the errors produced by hardware. The resulting system will potentially be significantly more power efficient as hardware can now be VI. Implications for Circuits and Architectures underdesigned exactly to meet the software reliability needs. Software on UnO machines can react to hardware signature Design methodologies that are aware of the error resilience of a resource by either changing its utilization profile or mechanisms used during the design process can result in changing the hardware’s operating point. In addition, the significantly lower power processor designs (e.g., [80]). computing platform can be implemented just to report the Unfortunately, current design methodologies are not always hardware signature (with or without self-healing) or it may suitable for such underdesign. For example, conventional allow (software-driven) adaptation. The power/area cost will processors are optimized such that all the timing paths are depend on the desired spatial granularity of healing/adaptation critical or near-critical (“timing slack wall”). This means that (i.e., instruction level, component level, and platform level). any time an attempt is made to reduce power by trading off Since the mechanisms for such healing or adaptation (e.g., reliability (by reducing voltage, for example), a catastrophi- shutdown using power gating; power/performance change us- cally large number of timing errors is seen [20]. The slack ing voltage scaling, etc.) are fairly well understood, future distribution can be manipulated (for example, to make it look research focus should be on optimizing application-dependent gradual, instead of looking like a wall (Fig. 14) to reduce the tradeoffs and reducing hardware implementation overheads. number of errors when power is reduced [81]. Furthermore, The embedded sensing and H.264 encoder examples, the frequently exercised components (or timing paths) that discussed previously, fall into the category of parametrically contribute the most to the error rate can be optimized at underdesigned, error-free UnO systems. Certain classes the expense of infrequently exercised components to perform of applications are inherently capable of absorbing some the slack redistribution without an overhead. 
VI. Implications for Circuits and Architectures

Software on UnO machines can react to the hardware signature of a resource by either changing its utilization profile or changing the hardware's operating point. In addition, the computing platform can be implemented just to report the hardware signature (with or without self-healing), or it may allow (software-driven) adaptation. The power/area cost will depend on the desired spatial granularity of healing/adaptation (i.e., instruction level, component level, or platform level). Since the mechanisms for such healing or adaptation (e.g., shutdown using power gating, power/performance change using voltage scaling, etc.) are fairly well understood, future research should focus on optimizing application-dependent tradeoffs and reducing hardware implementation overheads.

The embedded sensing and H.264 encoder examples discussed previously fall into the category of parametrically underdesigned, error-free UnO systems. Certain classes of applications are inherently capable of absorbing some amount of error in their computations, which allows quality to be traded off for power. Errors in hardware that are circuit observable but system insignificant (such as errors in the speculation units of a processor [78]) can be permitted in order to improve other design metrics such as power, area, and delay. Error probability can be traded off against design metrics by adjusting verification thresholds, selective guardbanding, selective redundancy, etc. Recent work ("stochastic computing" [79]) advocates that hardware should be allowed to produce errors even during nominal operation if such errors can be tolerated by software. Similarly, software should be designed to make forward progress in spite of the errors produced by hardware. The resulting system will potentially be significantly more power efficient, as the hardware can now be underdesigned exactly to meet the software reliability needs.

Design methodologies that are aware of the error resilience mechanisms used during the design process can result in significantly lower power processor designs (e.g., [80]). Unfortunately, current design methodologies are not always suitable for such underdesign. For example, conventional processors are optimized such that all the timing paths are critical or near-critical (the "timing slack wall"). This means that any time an attempt is made to reduce power by trading off reliability (by reducing voltage, for example), a catastrophically large number of timing errors is seen [20]. The slack distribution can be manipulated (for example, to make it gradual instead of wall-like, Fig. 14) to reduce the number of errors when power is reduced [81]. Furthermore, the frequently exercised components (or timing paths) that contribute the most to the error rate can be optimized at the expense of infrequently exercised components, performing the slack redistribution without overhead. The resulting processor produces only a small number of errors for large power savings, and the error rate increases gradually with increased power savings [80]. Hardware can also be explicitly designed for a target error rate: the errors are either detected and corrected by a hardware error resilience mechanism or allowed to propagate to an error-tolerant application, where they manifest themselves as reduced performance or output quality.

Corresponding opportunities exist at the microarchitectural level. Architectural design decisions have always been, and still are, made in the context of correctness. Until now, error resilience has been viewed as a way to increase the efficiency of a design that has been architected for correctness but operates outside the bounds of the correctness guarantee. However, optimizing a design for correctness can result in significant inefficiency when the actual intent is to operate the design at a nonzero error rate and allow errors to be tolerated or corrected by the error resilience mechanism [82]. In other words, one would make different, sometimes counterintuitive, architectural design choices when optimizing a processor specifically for error resilience than when optimizing for correctness [83]. This is not surprising, considering that many microarchitectural decisions, such as increasing the size of caches and register files, tend to modify the slack distribution of the processor to look more like a wall. Similarly, microarchitectural decisions such as increasing the size of instruction and load-store queues worsen the delay scalability of the processor. When processors are optimized for nonzero error rates, such decisions need to be avoided (see Fig. 15).

Fig. 15. Different microarchitectures exhibit different error rate behaviors, demonstrating the potential to influence the energy efficiency of a timing speculative architecture through microarchitectural optimizations. (Notations describe the number of ALUs, issue queue length, number of physical registers, and load store queue size for a microarchitecture.)
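The "slack wall" effect and the benefit of slack redistribution can be illustrated with the following hypothetical sketch, in which each timing path is a (nominal delay, activity) pair and a path contributes errors once its scaled delay exceeds the clock period. The path counts, delays, and activities are invented for illustration and do not reproduce the analyses of [80], [81].

# Minimal sketch: error rate versus delay inflation (a proxy for voltage
# reduction) for a wall-like slack distribution and a redistributed one.
def error_rate(paths, delay_scale, period=1.0):
    """paths: list of (nominal_delay, activity); returns activity-weighted error rate."""
    total = sum(a for _, a in paths)
    failing = sum(a for d, a in paths if d * delay_scale > period)
    return failing / total

# Wall: every path is near-critical and heavily exercised.
wall = [(0.99, 1.0) for _ in range(100)]
# Redistributed: frequently exercised paths get slack, rarely exercised ones stay critical.
gradual = [(0.80, 1.0) for _ in range(90)] + [(0.99, 0.1) for _ in range(10)]

for scale in (1.00, 1.02, 1.10, 1.30):   # delay inflation from voltage reduction
    print(scale, error_rate(wall, scale), error_rate(gradual, scale))

With these made-up numbers, the wall-like design jumps from zero errors to failing on essentially every exercised path after a small delay inflation, while the redistributed design stays near a 1% error rate over a wide range before eventually failing as well, which is the qualitative behavior the slack-redistribution argument relies on.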
Though most existing work [84]–[86] introduces errors by intelligently overscaling voltage supplies, there has been some work in the direction of introducing error into a system via manipulation of its logic function, for adders [87], [88] as well as for generic combinational logic [17]. Let us consider an example functionally underdesigned integer multiplier circuit [15] with potentially erroneous operation. We use a modified 2×2 multiplier as a building block, which computes 3 × 3 as 7 instead of 9. By representing the output using three bits (111) instead of the usual four (1001), we are able to significantly reduce the complexity of the circuit. These inaccurate multipliers achieve an average power saving of 31.78%–45.4% over corresponding accurate multiplier designs, for an average percentage error of 1.39%–3.32%. The design can be enhanced (albeit at a power cost) to allow for correct operation of the multiplier using a correction unit, for non-error-resilient applications that share the hardware resource. Of course, the benefits are strongly design dependent: in the multiplier case, the savings translate to an 18% saving for an FIR filter but only 1.5% for a small embedded microprocessor. Functional underdesign will be a useful power reduction and/or performance improvement knob, especially for robust and gracefully degrading applications.
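A behavioral sketch of this building block is given below. The single inexact case (3 × 3 returning 7) follows the description above; the gate-level circuit and the reported power and error numbers are from [15], while the shift-and-add recombination used to compose a 4×4 multiplier from 2×2 blocks is only one generic way to do the composition and is our assumption.

# Minimal sketch of the functionally underdesigned multiplier idea.
def approx_2x2(a, b):
    """2-bit x 2-bit multiplier, inexact only for 3 * 3 (returns 7, not 9)."""
    return 7 if (a == 3 and b == 3) else a * b

def approx_4x4(a, b):
    # Split 4-bit operands into 2-bit halves and recombine partial products.
    ah, al = a >> 2, a & 0b11
    bh, bl = b >> 2, b & 0b11
    return ((approx_2x2(ah, bh) << 4) + (approx_2x2(ah, bl) << 2)
            + (approx_2x2(al, bh) << 2) + approx_2x2(al, bl))

# Average relative error over all nonzero 4-bit operand pairs.
errs = [abs(approx_4x4(a, b) - a * b) / (a * b)
        for a in range(1, 16) for b in range(1, 16)]
print("mean error: %.2f%%, max error: %.2f%%" % (100 * sum(errs) / len(errs),
                                                 100 * max(errs)))

Running the sketch over all nonzero 4-bit operand pairs shows that only products whose 2-bit sub-products include the 3 × 3 case deviate from the exact result, which is why the average error of the composed multiplier stays small.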
Much research needs to be done to functionally or parametrically underdesign a large, general class of circuits automatically. Mechanisms to pass application intent to physical implementation flows (especially to logic synthesis, in the case of functional underdesign) need to be developed. Formally quantifying the impact of hardware errors on applications is also important (e.g., [89] computes the fundamental limits of accuracy of LDPC error control decoders in the presence of noise in transmission as well as underdesigned hardware units). Error steering toward less important (depending on the application) parts of the ISA during design or at run time can be enabled by ISA extensions that specify latency and reliability constraints to the hardware. Annotations indicating access patterns, reliability requirements [90], etc., of code segments as well as data can be used effectively, especially in the case of heterogeneous circuits and systems (e.g., [19]).

VII. Conclusion

An adaptive software stack cognizant of the exact hardware state enables a fluid specification for UnO computing platforms, so that meeting a rigid specification becomes less important than selecting good power/performance/area/reliability operating points. Using an adaptable software stack, and knowing the bounds of this adaptability, we can relax the notion of harsh constraints in the design of computing systems and alleviate the "last-MHz problem" that results in a very high cost to achieve marginally higher performance. Similarly, an application-aware fluid test specification can lead to large gains in parametric manufacturing yield. For instance, in our recent simulation study [21], we show a 30% improvement in hardware yield, or equivalently an 8% reduction in overdesign, for a simple H.264 video encoder by leveraging a software application that adapts to the underlying hardware.

At its core, handling variability of the hardware specification at run time amounts to detecting the variability using the hardware- or software-based sensing mechanisms discussed earlier, followed by selecting a different execution strategy in the UnO machine's hardware, software, or both. Though the focus in UnO systems is adaptation in the software layer, we realize that flexibility, agility, accuracy, and overhead requirements may dictate the choice of layer (in the hardware–software stack) where adaptation happens for different variation sources and manifestations. To realize the UnO vision, several challenges exist.
1) What are the most effective ways to detect the nature and extent of various forms of hardware variations?
2) What are the important software-visible manifestations of these hardware variations, and what are appropriate abstractions for representing the variations?
3) What software mechanisms can be used to opportunistically exploit variability in hardware performance and reliability, and how should these mechanisms be distributed across the application, compiler-generated code, and run-time operating system services?
4) How can hardware designers and design tools leverage application characteristics and an opportunistic software stack?
5) What is the best way of verifying and testing hardware when there are no strict constraints at the hardware–software boundary and the hardware and application behaviors are no longer constant?

Our early results illustrating the UnO hardware–software stack are very promising. For instance, our implementation of variability-aware duty cycling in TinyOS gave over 6X average improvement in active time for embedded sensing. With shrinking dimensions approaching the physical limits of semiconductor processing, this variability, and the resulting benefits from UnO computing, are likely to increase in the future. Such a vision of a redefined, and likely flexible/adaptive, hardware/software interface presents us with an opportunity to substantially improve the energy efficiency, performance, and cost of computing systems.

References
[1] K. Jeong, A. B. Kahng, and K. Samadi, "Impacts of guardband reduction on design process outcomes: A quantitative approach," IEEE Trans. Semicond. Manuf., vol. 22, no. 4, pp. 552–565, Nov. 2009.
[2] ITRS. (2009). The International Technology Roadmap for Semiconductors [Online]. Available: http://public.itrs.net/
[3] L. Grupp, A. M. Caulfield, J. Coburn, E. Yaakobi, S. Swanson, and P. Siegel, "Characterizing flash memory: Anomalies, observations, and applications," in Proc. IEEE/ACM Int. Symp. Microarchit., Dec. 2009, pp. 24–33.
[4] S. Dighe, S. Vangal, P. Aseron, S. Kumar, T. Jacob, K. Bowman, J. Howard, J. Tschanz, V. Erraguntla, N. Borkar, V. De, and S. Borkar, "Within-die variation-aware dynamic-voltage-frequency scaling core mapping and thread hopping for an 80-core processor," in Proc. IEEE Int. Solid-State Circuits Conf., Feb. 2010, pp. 174–175.
[5] R. W. Johnson, J. L. Evans, P. Jacobsen, J. R. Thompson, and M. Christopher, "The changing automotive environment: High temperature electronics," IEEE Trans. Electron. Packag. Manuf., vol. 27, no. 3, pp. 164–176, Jul. 2004.
[6] L. Wanner, C. Apte, R. Balani, P. Gupta, and M. Srivastava, "Hardware variability-aware duty cycling for embedded sensors," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., to be published.
[7] H. Hanson, K. Rajamani, J. Rubio, S. Ghiasi, and F. Rawson, "Benchmarking for power and performance," in Proc. SPEC Benchmarking Workshop, 2007.
[8] R. Zheng, J. Velamala, V. Reddy, V. Balakrishnan, E. Mintarno, S. Mitra, S. Krishnan, and Y. Cao, "Circuit aging prediction for low-power operation," in Proc. IEEE Custom Integr. Circuits Conf., Sep. 2009, pp. 427–430.
[9] J. McCullough, Y. Agarwal, J. Chandrashekhar, S. Kuppuswamy, A. C. Snoeren, and R. K. Gupta, "Evaluating the effectiveness of model-based power characterization," in Proc. USENIX Annu. Tech. Conf., 2011.
[10] B. Balaji, J. McCullough, R. Gupta, and Y. Agarwal, "Accurate characterization of variability in power consumption in modern computing platforms," in Proc. USENIX Workshop Power-Aware Comput. Syst. (HotPower), 2012.
[11] M. Gottscho, A. Kagalwalla, and P. Gupta, "Power variability in contemporary DRAMs," IEEE Embedded Syst. Lett., vol. 4, no. 2, pp. 37–40, Jun. 2012.
[12] L. Bathen, M. Gottscho, N. Dutt, P. Gupta, and A. Nicolau, "ViPZonE: OS-level memory variability-aware physical address zoning for energy savings," in Proc. IEEE/ACM CODES+ISSS, 2012.
[13] T. Austin, D. Blaauw, T. Mudge, and K. Flautner, "Making typical silicon matter with Razor," IEEE Comput., vol. 37, no. 3, pp. 57–65, Mar. 2004.
[14] V. K. Chippa, D. Mohapatra, A. Raghunathan, K. Roy, and S. T. Chakradhar, "Scalable effort hardware design: Exploiting algorithmic resilience for energy efficiency," in Proc. IEEE/ACM Design Automat. Conf., Jun. 2010, pp. 555–560.
[15] P. Kulkarni, P. Gupta, and M. Ercegovac, "Trading accuracy for power in a multiplier architecture," J. Low Power Electron., vol. 7, pp. 482–489, 2011.
[16] M. R. Choudhury and K. Mohanram, "Approximate logic circuits for low overhead, non-intrusive concurrent error detection," in Proc. IEEE/ACM Design, Automat. Test Eur., Mar. 2008, pp. 903–908.
[17] D. Shin and S. K. Gupta, "Approximate logic synthesis for error tolerant applications," in Proc. IEEE/ACM Design, Automat. Test Eur., Mar. 2010, pp. 957–960.
[18] M. Breuer, S. Gupta, and T. Mak, "Defect and error tolerance in the presence of massive numbers of defects," IEEE Design Test Comput., vol. 21, no. 3, pp. 216–227, May–Jun. 2004.
[19] H. Cho, L. Leem, and S. Mitra, "ERSA: Error resilient system architecture for probabilistic applications," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 31, no. 4, pp. 546–558, Apr. 2012.
[20] A. B. Kahng, S. Kang, R. Kumar, and J. Sartori, "Designing processors from the ground up to allow voltage/reliability tradeoffs," in Proc. Int. Symp. High-Perform. Comput. Archit., Jan. 2010, pp. 1–11.
[21] A. Pant, P. Gupta, and M. van der Schaar, "AppAdapt: Opportunistic application adaptation to compensate hardware variation," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 20, no. 11, pp. 1986–1996, Nov. 2012.
[22] J. Sartori and R. Kumar, "Branch and data herding: Reducing control and memory divergence for error-tolerant GPU applications," IEEE Trans. Multimedia, to be published.
[23] T. B. Chan, P. Gupta, A. B. Kahng, and L. Lai, "DDRO: A novel performance monitoring methodology based on design-dependent ring oscillators," in Proc. IEEE Int. Symp. Quality Electron. Design, Mar. 2012, pp. 633–640.
[24] C. Kim, K. Roy, S. Hsu, R. Krishnamurthy, and S. Borkar, "An on-die CMOS leakage current sensor for measuring process variation in sub-90 nm generations," in Proc. IEEE VLSI Circuits Symp., May 2004, pp. 221–222.
[25] P. Singh, E. Karl, D. Sylvester, and D. Blaauw, "Dynamic NBTI management using a 45 nm multi-degradation sensor," in Proc. IEEE Custom Integr. Circuits Conf., Sep. 2010, pp. 1–4.
[26] P. Franco and E. McCluskey, "On-line testing of digital circuits," in Proc. IEEE VLSI Test Symp., 1994, pp. 167–173.
[27] M. Agarwal, B. Paul, M. Zhang, and S. Mitra, "Circuit failure prediction and its application to transistor aging," in Proc. IEEE VLSI Test Symp., May 2007, pp. 277–286.
[28] D. Ernst, N. S. Kim, S. Das, S. Pant, R. Rao, T. Pham, C. Ziesler, D. Blaauw, T. Austin, K. Flautner, and T. Mudge, "Razor: A low-power pipeline based on circuit-level timing speculation," in Proc. IEEE/ACM Int. Symp. Microarchit., Dec. 2003, pp. 7–18.
[29] K. Bowman, J. Tschanz, N. S. Kim, J. Lee, C. Wilkerson, S.-L. Lu, T. Karnik, and V. De, "Energy-efficient and metastability-immune resilient circuits for dynamic variation tolerance," IEEE J. Solid-State Circuits, vol. 44, no. 1, pp. 49–63, Jan. 2009.
[30] M. Nicolaidis, "Time redundancy based soft-error tolerance to rescue nanometer technologies," in Proc. IEEE VLSI Test Symp., 1999, pp. 86–94.
[31] Y. Li, S. Makar, and S. Mitra, "CASP: Concurrent autonomous chip self-test using stored test patterns," in Proc. IEEE/ACM Design, Automat. Test Eur., Mar. 2008, pp. 885–890.
[32] H. Inoue, L. Yanjing, and S. Mitra, "VAST: Virtualization-assisted concurrent autonomous self-test," in Proc. IEEE Int. Test Conf., Oct. 2008, pp. 1–10.
[33] Y. Li, Y. Kim, E. Mintarno, D. Gardner, and S. Mitra, "Overcoming early-life failure and aging challenges for robust system design," IEEE Design Test Comput., vol. 26, no. 6, p. 1, Dec. 2009.
[34] P. Parvathala, K. Maneparambil, and W. Lindsay, "FRITS: A microprocessor functional BIST method," in Proc. IEEE Int. Test Conf., 2002, pp. 590–598.
[35] J. Shen and J. Abraham, "Native mode functional test generation for processors with applications to self test and design validation," in Proc. IEEE Int. Test Conf., Oct. 1998, pp. 990–999.
[36] M.-L. Li, P. Ramachandran, S. K. Sahoo, S. V. Adve, V. S. Adve, and Y. Zhou, "Understanding the propagation of hard errors to software and implications for resilient system design," in Proc. Archit. Support Programm. Languages Operating Syst., Mar. 2008, pp. 265–276.
[37] M.-L. Li, P. Ramachandran, S. K. Sahoo, S. V. Adve, V. S. Adve, and Y. Zhou, "Trace-based microarchitecture-level diagnosis of permanent hardware faults," in Proc. IEEE Conf. Dependable Syst. Netw., Jun. 2008, pp. 22–31.
[38] S. K. Sahoo, M.-L. Li, P. Ramachandran, S. V. Adve, V. S. Adve, and Y. Zhou, "Using likely program invariants to detect hardware errors," in Proc. IEEE Int. Conf. Dependable Syst. Netw., Jun. 2008, pp. 70–79.
[39] D. Lin, T. Hong, F. Fallah, N. Hakim, and S. Mitra, "Quick detection of difficult bugs for effective post-silicon validation," in Proc. IEEE/ACM Conf. Dependable Syst. Netw., 2012, pp. 561–566.
[40] N. Oh, S. Mitra, and E. McCluskey, "ED4I: Error detection by diverse data and duplicated instructions," IEEE Trans. Comput., vol. 51, no. 2, pp. 180–199, Feb. 2002.
[41] T. Hong, "QED: Quick error detection tests for effective post-silicon validation," in Proc. IEEE Int. Test Conf., Nov. 2010, pp. 1–10.
[42] P. Singh, Z. Foo, M. Wieckowski, S. Hanson, M. Fojtik, D. Blaauw, and D. Sylvester, "Early detection of oxide breakdown through in situ degradation sensing," in Proc. IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 2010, pp. 1–10.
[43] X. Li, R. R. Rutenbar, and R. D. Blanton, "Virtual probe: A statistically optimal framework for minimum-cost silicon characterization of nanoscale integrated circuits," in Proc. IEEE/ACM Int. Conf. Comput.-Aided Design, 2009, pp. 433–440.
[44] M. Qazi, M. Tikekar, L. Dolecek, D. Shah, and A. Chandrakasan, "Loop flattening and spherical sampling: Highly efficient model reduction techniques for SRAM yield analysis," in Proc. IEEE/ACM Design, Automat. Test Eur. Conf. Exhib., Mar. 2010, pp. 801–806.
[45] E. Mintarno, E. Skaf, J. R. Zheng, J. Velamala, Y. Cao, S. Boyd, R. Dutton, and S. Mitra, "Self-tuning for maximized lifetime energy-efficiency in the presence of circuit aging," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 30, no. 5, pp. 760–773, May 2011.
[46] Y. Li, O. Mutlu, D. Gardner, and S. Mitra, "Concurrent autonomous self-test for uncore components in system-on-chips," in Proc. IEEE VLSI Test Symp., 2010, pp. 232–237.
[47] V. J. Reddi, M. S. Gupta, M. D. Smith, G.-Y. Wei, D. Brooks, and S. Campanoni, "Software-assisted hardware reliability: Abstracting circuit-level challenges to the software stack," in Proc. IEEE/ACM Design Automat. Conf., Jul. 2009, pp. 788–793.
[48] J. Choi, C.-Y. Cher, H. Franke, H. Hamann, A. Weger, and P. Bose, "Thermal-aware task scheduling at the system software level," in Proc. IEEE/ACM Int. Symp. Low Power Electron. Design, 2007, pp. 213–218.
[49] A. Pant, P. Gupta, and M. van der Schaar, "Software adaptation in quality sensitive applications to deal with hardware variability," in Proc. Great Lakes Symp. VLSI, 2010, pp. 213–218.
[50] T. Matsuda, T. Takeuchi, H. Yoshino, M. Ichien, S. Mikami, H. Kawaguchi, C. Ohta, and M. Yoshimoto, "A power-variation model for sensor node and the impact against life time of wireless sensor networks," in Proc. IEEE Int. Conf. Commun. Electron., Oct. 2006, pp. 106–111.
[51] S. Garg and D. Marculescu, "On the impact of manufacturing process variations on the lifetime of sensor networks," in Proc. IEEE/ACM CODES+ISSS, Sep.–Oct. 2007, pp. 203–208.
[52] M. Pedram and S. Nazarian, "Thermal modeling, analysis, and management in VLSI circuits: Principles and methods," Proc. IEEE, vol. 94, no. 8, pp. 1487–1501, Aug. 2006.
[53] S. Sankar, M. Shaw, and K. Vaid, "Impact of temperature on hard disk drive reliability in large datacenters," in Proc. Int. Conf. Dependable Syst. Netw., Jun. 2011, pp. 530–537.
[54] E. Kursun and C.-Y. Cher, "Temperature variation characterization and thermal management of multicore architectures," IEEE Micro, vol. 29, no. 1, pp. 116–126, Jan.–Feb. 2009.
[55] A. Coskun, R. Strong, D. Tullsen, and T. S. Rosing, "Evaluating the impact of job scheduling and power management on processor lifetime for chip multiprocessors," in Proc. SIGMETRICS, 2009, pp. 169–180.
[56] S. Novack, J. Hummel, and A. Nicolau, "A simple mechanism for improving the accuracy and efficiency of instruction-level disambiguation," in Proc. Int. Workshop Languages Compilers Parallel Comput., 1995.
[57] L. Bathen, N. Dutt, A. Nicolau, and P. Gupta, "VaMV: Variability-aware memory virtualization," in Proc. IEEE/ACM Design, Automat. Test Eur. Conf. Exhib., Mar. 2012, pp. 284–287.
[58] H. Hoffmann, S. Sidiroglou, M. Carbin, S. Misailovic, A. Agarwal, and M. Rinard, "Dynamic knobs for responsive power-aware computing," SIGARCH Comput. Archit. News, vol. 39, pp. 199–212, Mar. 2011.
[59] W. Baek and T. M. Chilimbi, "Green: A framework for supporting energy-conscious programming using controlled approximation," SIGPLAN Not., Jun. 2010.
[60] S. Narayanan, J. Sartori, R. Kumar, and D. Jones, "Scalable stochastic processors," in Proc. IEEE/ACM Design, Automat. Test Eur. Conf. Exhib., Mar. 2010, pp. 335–338.
[61] R. Hegde and N. R. Shanbhag, "Energy-efficient signal processing via algorithmic noise-tolerance," in Proc. IEEE/ACM Int. Symp. Low Power Electron. Design, Aug. 1999, pp. 30–35.
[62] K.-H. Huang and J. Abraham, "Algorithm-based fault tolerance for matrix operations," IEEE Trans. Comput., vol. 33, no. 6, pp. 518–528, Jun. 1984.
[63] J. Sloan, D. Kesler, R. Kumar, and A. Rahimi, "A numerical optimization-based methodology for application robustification: Transforming applications for error tolerance," in Proc. IEEE Dependable Syst. Netw., Jun.–Jul. 2010, pp. 161–170.
[64] J. Ansel, C. Chan, Y. L. Wong, M. Olszewski, Q. Zhao, A. Edelman, and S. Amarasinghe, "PetaBricks: A language and compiler for algorithmic choice," SIGPLAN Not., vol. 44, Jun. 2009.
[65] J. Sorber, A. Kostadinov, M. Garber, M. Brennan, M. D. Corner, and E. D. Berger, "Eon: A language and runtime system for perpetual systems," in Proc. Int. Conf. Embedded Netw. Sens. Syst., 2007.
[66] A. Lachenmann, P. J. Marrón, D. Minder, and K. Rothermel, "Meeting lifetime goals with energy levels," in Proc. Int. Conf. Embedded Netw. Sens. Syst., 2007, pp. 131–144.
[67] M. Frigo and S. G. Johnson, "The design and implementation of FFTW3," Proc. IEEE, vol. 93, no. 2, pp. 216–231, Feb. 2005.
[68] X. Li, M. Garzaran, and D. Padua, "Optimizing sorting with machine learning algorithms," in Proc. IEEE Int. Parallel Distributed Symp., Mar. 2007, pp. 1–6.
[69] S. Borkar, T. Karnik, S. Narendra, J. Tschanz, A. Keshavarzi, and V. De, "Parameter variations and impact on circuits and microarchitecture," in Proc. IEEE/ACM Design Automat. Conf., Jun. 2003, pp. 338–342.
[70] S. Ghosh, S. Bhunia, and K. Roy, "CRISTA: A new paradigm for low-power, variation-tolerant, and adaptive circuit synthesis using critical path isolation," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 26, no. 11, pp. 1947–1956, Nov. 2007.
[71] A. Agarwal, B. Paul, S. Mukhopadhyay, and K. Roy, "Process variation in embedded memories: Failure analysis and variation aware architecture," IEEE J. Solid-State Circuits, vol. 40, no. 9, pp. 1804–1814, Sep. 2005.
[72] D. Sylvester, D. Blaauw, and E. Karl, "ElastIC: An adaptive self-healing architecture for unpredictable silicon," IEEE Design Test Comput., vol. 23, no. 6, pp. 484–490, Jun. 2006.
[73] K. Meng and R. Joseph, "Process variation aware cache leakage management," in Proc. IEEE/ACM Int. Symp. Low Power Electron. Design, 2006, pp. 262–267.
[74] A. Tiwari, S. R. Sarangi, and J. Torrellas, "ReCycle: Pipeline adaptation to tolerate process variation," in Proc. Int. Symp. Comput. Archit., 2007, pp. 323–334.
[75] S. Chandra, K. Lahiri, A. Raghunathan, and S. Dey, "Variation-tolerant dynamic power management at the system-level," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 9, pp. 1220–1232, Sep. 2009.
[76] R. Teodorescu and J. Torrellas, "Variation-aware application scheduling and power management for chip multiprocessors," in Proc. Int. Symp. Comput. Archit., Jun. 2008, pp. 363–374.
[77] V. J. Kozhikkottu, R. Venkatesan, A. Raghunathan, and S. Dey, "VESPA: Variability emulation for system-on-chip performance analysis," in Proc. IEEE/ACM Design, Automat. Test Eur. Conf. Exhib., Mar. 2011, pp. 1–6.
[78] S. Mukherjee, C. Weaver, J. Emer, S. Reinhardt, and T. Austin, "A systematic methodology to compute the architectural vulnerability factors for a high-performance microprocessor," in Proc. IEEE/ACM Int. Symp. Microarchit., Dec. 2003, p. 29.
[79] N. Shanbhag, R. Abdallah, R. Kumar, and D. Jones, "Stochastic computation," in Proc. IEEE/ACM Design Automat. Conf., Jun. 2010, pp. 859–864.
[80] A. B. Kahng, S. Kang, R. Kumar, and J. Sartori, "Recovery-driven design: A methodology for power minimization for error tolerant processor modules," in Proc. IEEE/ACM Design Automat. Conf., Jun. 2010, pp. 825–830.
[81] A. B. Kahng, S. Kang, R. Kumar, and J. Sartori, "Slack redistribution for graceful degradation under voltage overscaling," in Proc. 15th IEEE/SIGDA Asia South Pacific Design Automat. Conf., Jan. 2010, pp. 825–831.
[82] N. Zea, J. Sartori, and R. Kumar, "Optimal power/performance pipelining for timing error resilient processors," in Proc. IEEE Int. Conf. Comput. Design, Oct. 2010, pp. 356–363.
[83] J. Sartori and R. Kumar, "Exploiting timing error resilience in architecture of energy-efficient processors," ACM Trans. Embedded Comput. Syst., to be published.
[84] K. V. Palem, "Energy aware computing through probabilistic switching: A study of limits," IEEE Trans. Comput., vol. 54, no. 9, pp. 1123–1137, Sep. 2005.
[85] M. S. Lau, K.-V. Ling, and Y.-C. Chu, "Energy-aware probabilistic multiplier: Design and analysis," in Proc. ACM CASES, Oct. 2009, pp. 281–290.
[86] J. George, B. Marr, B. E. S. Akgul, and K. V. Palem, "Probabilistic arithmetic and energy efficient embedded signal processing," in Proc. ACM CASES, 2006, pp. 158–168.
[87] D. Shin and S. K. Gupta, "A re-design technique for datapath modules in error tolerant applications," in Proc. Asian Test Symp., Nov. 2008, pp. 431–437.
[88] D. Kelly and B. Phillips, "Arithmetic data value speculation," in Proc. Asia-Pacific Comput. Syst. Archit. Conf., 2005, pp. 353–366.
[89] S. M. S. T. Yazdi, H. Cho, Y. Sun, S. Mitra, and L. Dolecek, "Probabilistic analysis of Gallager B faulty decoder," in Proc. IEEE Int. Conf. Commun. (ICC), Emerging Data Storage Technol., Jun. 2012, pp. 43–48.
[90] J. Henkel et al., "Design and architectures for dependable embedded systems," in Proc. IEEE/ACM CODES+ISSS, Oct. 2011, pp. 69–78.

Puneet Gupta (S'02–M'08) received the Ph.D. degree from the University of California, San Diego, in 2007. He co-founded Blaze DFM, Inc., Sunnyvale, CA, in 2004. He is currently a Faculty Member with the Department of Electrical Engineering, University of California, Los Angeles, and leads the IMPACT+ Center. His current research interests include the design-manufacturing interface for lowered costs, increased yield, and improved predictability of integrated circuits and systems. Dr. Gupta was the recipient of the National Science Foundation CAREER Award and the ACM/SIGDA Outstanding New Faculty Award.

Yuvraj Agarwal (M) received the Ph.D. degree in computer engineering from the University of California (UC), San Diego, in 2009. He is currently a Research Scientist with the Department of Computer Science and Engineering, UC San Diego. His current research interests include the intersection of systems and networking and embedded systems. In recent years, his research has focused on green computing, mobile computing, and energy efficient buildings.

Lara Dolecek (M) received the Ph.D. degree from the University of California (UC), Berkeley, in 2007. She is currently an Assistant Professor with the Department of Electrical Engineering, UC Los Angeles. Her current research interests include coding and information theory, graphical models, and statistical inference. Dr. Dolecek received the 2007 David J. Sakrison Memorial Prize at UC Berkeley and the 2012 NSF CAREER Award.

Nikil Dutt (F) received the Ph.D. degree in computer science from the University of Illinois at Urbana-Champaign, Urbana, in 1989. He is currently a Chancellor's Professor of computer science with the Department of Electrical Engineering and Computer Science, University of California, Irvine. His current research interests include embedded systems, electronic design automation, computer architecture, systems software, formal methods, and brain-inspired architectures and computing. Dr. Dutt is an ACM Distinguished Scientist and an IFIP Silver Core Awardee. He has served as the Editor-in-Chief of the ACM Transactions on Design Automation of Electronic Systems, and currently serves as an Associate Editor for the ACM Transactions on Embedded Computing Systems and the IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

Rajesh K. Gupta (F) received the Ph.D. degree in electrical engineering from Stanford University, Stanford, CA, in 1994. He is the QUALCOMM Chair Professor of computer science and engineering with the University of California, San Diego. His recent contributions include SystemC modeling and SPARK parallelizing high-level synthesis, both of which are publicly available and have been incorporated into industrial practice. He currently leads the National Science Foundation Expedition on Variability. Dr. Gupta and his students received the Best Paper Award at DCOSS'08 and the Best Demonstration Award at IPSN/SPOTS'05.

Rakesh Kumar (M) received the B.Tech. degree in computer science and engineering from the Indian Institute of Technology Kharagpur, Kharagpur, India, in 2001, and the Ph.D. degree in computer engineering from the University of California, San Diego, in September 2006. He is currently an Assistant Professor with the Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana. His current research interests include error resilient computer systems and low power computer architectures for emerging workloads.

Subhasish Mitra (M) received the Ph.D. degree in electrical engineering from Stanford University, Stanford, CA, in 2000. He is currently the Director of the Robust Systems Group of the Department of Electrical Engineering and the Department of Computer Science, Stanford University, Stanford, CA, where he is the Chambers Faculty Scholar of Engineering. Prior to joining Stanford, he was a Principal Engineer with Intel Corporation. His current research interests include robust system design, VLSI design, CAD, validation and test, and emerging nanotechnologies. His research results, jointly with his students and collaborators, have seen widespread proliferation in industry and have been recognized by several major awards.

Alexandru Nicolau (M) received the Ph.D. degree from Yale University, New Haven, CT, in 1984. He is currently a Professor with the Department of Computer Science and the Department of Electrical Engineering and Computer Science, University of California, Irvine. His current research interests include systems, electronic design automation, computer architecture, systems software, parallel processing, and high performance computing. He has authored or co-authored more than 250 peer-reviewed articles and multiple books. Dr. Nicolau serves as the Editor-in-Chief of IJPP and is on the steering/program committees of the ACM International Conference on Supercomputing, the Symposium on Principles and Practice of Parallel Programming, the IEEE International Conference on Parallel Architectures and Compilation Technology, MICRO, and the Workshop on Languages and Compilers for Parallel Computing.

Tajana Simunic Rosing (M) received the Masters degree in engineering management and the Ph.D. degree in 2001 from Stanford University, Stanford, CA. She is currently an Associate Professor with the Department of Computer Science, University of California, San Diego. Prior to this, she was a Full-Time Researcher with HP Labs while working part-time at Stanford University. Her current research interests include energy efficient computing, and embedded and wireless systems.

Mani B. Srivastava (F'08) received the Ph.D. degree from the University of California, Berkeley, in 1992. He is a Faculty Member at the University of California, Los Angeles (UCLA), where he is a Professor with the Department of Electrical Engineering and is also affiliated with the Department of Computer Science. Prior to joining the University of California, Los Angeles, in 1992, he was with Bell Labs Research. His current research interests include embedded wireless systems, power-aware systems, wireless networks, and pervasive sensing.

Steven Swanson (M) received the Ph.D. degree from the University of Washington, Seattle, in 2006. He is currently an Associate Professor with the Department of Computer Science, University of California, San Diego, and the Director of the Non-Volatile Systems Laboratory, San Diego. His current research interests include systems, architectures, security, and reliability issues surrounding nonvolatile, solid-state memories.

Dennis Sylvester (S'95–M'00–SM'04–F'11) received the Ph.D. degree from the University of California, Berkeley, in 1999. He is currently a Professor with the Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor. He is a co-founder of Ambiq Micro, a fabless semiconductor company developing ultralow power mixed-signal solutions for compact wireless devices. His current research interests include the design of millimeter-scale computing systems and energy efficient near-threshold computing for a range of applications.