Underdesigned and Opportunistic Computing in Presence Of

8 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 32, NO. 1, JANUARY 2013 Underdesigned and Opportunistic Computing in Presence of Hardware Variability Puneet Gupta, Member, IEEE, Yuvraj Agarwal, Member, IEEE, Lara Dolecek, Member, IEEE, Nikil Dutt, Fellow, IEEE, Rajesh K. Gupta, Fellow, IEEE, Rakesh Kumar, Member, IEEE, Subhasish Mitra, Member, IEEE, Alexandru Nicolau, Member, IEEE, Tajana Simunic Rosing, Member, IEEE, Mani B. Srivastava, Fellow, IEEE, Steven Swanson, Member, IEEE, and Dennis Sylvester, Fellow, IEEE Abstract—Microelectronic circuits exhibit increasing variations print circuits and build systems at exponentially growing ca- in performance, power consumption, and reliability parameters pacities for the last three decades. After reaping the benefits of across the manufactured parts and across use of these parts the Moore’s law driven cost reductions, we are now beginning over time in the field. These variations have led to increasing use of overdesign and guardbands in design and test to ensure to experience dramatically deteriorating effects of material yield and reliability with respect to a rigid set of datasheet properties on (active and leakage) power, and die yields. So far, specifications. This paper explores the possibility of constructing the problem has been confined to the hardware manufacturers computing machines that purposely expose hardware variations who have responded with increasing guardbands applied to to various layers of the system stack including software. This microelectronic chip designs. However, the problem continues leads to the vision of underdesigned hardware that utilizes a software stack that opportunistically adapts to a sensed or mod- to worsen with critical dimensions shrinking to atomic scale. eled hardware. The envisioned underdesigned and opportunistic For instance, the oxide in 22 nm process is only five atomic computing (UnO) machines face a number of challenges related layers thick, and gate length is only 42 atomic diameters to the sensing infrastructure and software interfaces that can across. This has translated into exponentially increasing costs effectively utilize the sensory data. In this paper, we outline of fabrication and equipment to achieve increasingly precise specific sensing mechanisms that we have developed and their potential use in building UnO machines. control over manufacturing quality and scale. We are beginning to see early signs of trouble that may Index Terms—Computer architecture, design automation, make benefits of semiconductor industry unattainable for all design for manufacture, digital integrated circuits, micro- procesors, reliability, system software. but the most popular (hence highest volumes) of consumer gadgets. Most problematic is the substantial fluctuation in I. Introduction critical device/circuit parameters of the manufactured parts HE PRIMARY driver for innovations in computer sys- across the die, die-to-die and over time due to increasing T tems has been the phenomenal scalability of the semicon- susceptibility to errors and circuit aging-related wear-out. ductor manufacturing process that has allowed us to literally Consequently, the microelectronic substrate on which modern computers are built is increasingly plagued by variability in Manuscript received March 28, 2012; revised August 5, 2012; accepted September 16, 2012. Date of current version December 19, 2012. This performance (speed, power) and error characteristics, both work was supported in part by the National Science Foundation Variability across multiple instances of a system and in time over its usage Expedition award under Grants CCF-1029030, CCF-1028831, CCF-1028888, life. Consider the variability in circuit delay and power as it has CCF-1029025, and CCF-1029783. The focus of the National Science Foun- dation Computing Expedition on Variability-Aware Software for Efficient increased over time as shown in Fig. 1. The most immediate Computing with Nanoscale Devices (http://www.variability.org) is to address impact of such variability is on chip yields: a growing number these challenges and develop prototypical UnO computing machines. This of parts are thrown away since they do not meet the timing paper was recommended by Associate Editor V. Narayanan. P. Gupta, L. Dolecek, and M. B. Srivastava are with the University and power related specifications. Left unaddressed, this trend of California, Los Angeles, CA 90024 USA (e-mail: [email protected]; toward parts that scale in neither capability nor cost will [email protected]; [email protected]). cripple the computing and information technology industries. Y. Agarwal, R. K. Gupta, T. S. Rosing, and S. Swanson are with the Univer- sity of California, San Diego, CA 92093 USA (e-mail: [email protected]; The solution may stem from the realization that the problem [email protected]; [email protected]; [email protected]). is not variability per se, rather how computer system designers N. Dutt and A. Nicolau are with the University of California, Irvine, CA have been treating the variability. While chip components no 92617 USA (e-mail: [email protected]; [email protected]). S. Mitra is with Stanford University, Stanford, CA 447337 USA (e-mail: longer function like the precisely chiseled parts of the past, [email protected]). the basic approach to the design and operation of computers R. Kumar is with the University of Illinois at Urbana-Champaign, Urbana, has remained unchanged. Software assumes hardware of fixed IL 61820 USA (e-mail: [email protected]). D. Sylvester is with the University of Michigan, Ann Arbor, MI 48105 USA specifications that hardware designers try hard to meet. Yet (e-mail: [email protected]). hardware designers must accept conservative estimates of Color versions of one or more of the figures in this paper are available hardware specifications and cannot fully leverage software’s online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TCAD.2012.2223467 inherent flexibility and resilience. Guardband for hardware 0278-0070/$31.00 c 2012 IEEE GUPTA et al.: UNDERDESIGNED AND OPPORTUNISTIC COMPUTING IN PRESENCE OF HARDWARE VARIABILITY 9 Fig. 2. Frequency variation in an 80-core processor within a single die in Intel’s 65 nm technology [4]. Fig. 1. Circuit variability as predicted by ITRS [2]. design increases cost—maximizing performance incurs too much area and power overheads [1]. It leaves enormous performance and energy potential untapped as the software assumes lower performance than what a majority of instances of that platform may be capable of most of the time. In this paper, we outline a novel flexible hardware–software Fig. 3. Sleep power variability across temperature for ten instances of an off- stack and interface that use adaptation in software to re- the-shelf ARM Cortex M3 processor [6]. Only 5 out of 10 measured boards lax variation-induced guard-bands in hardware design. We are shown for clarity. envision a new paradigm for computer systems, one where nominally designed (and hence underdesigned) hardware parts performance variation of more than 25% at 0.8 V in a work within a software stack that opportunistically adapts to recent experimental Intel processor. Similarly, on the variations. memory/storage front, recent measurements done by us This paper is organized as follows. The next section dis- show 27% and 57% operation energy variation and 50% cusses sources of variability and exemplifies them with mea- bit error rate variation between nominally identical flash surements from actual systems. Section III gives an introduc- devices (i.e., same part number) [3]. This variability tion of underdesigned and opportunistic computing. Section can also include variations coming from manufacturing IV outlines various methods of variation monitoring with ex- the same design at multiple silicon foundries. amples. Section V discusses underdesigned and opportunistic 2) Environment. ITRS projects Vdd variation to be 10% computing (UnO) hardware while Section VI discusses UnO while the operating temperature can vary from −30 °C software stack implications. We conclude in Section VII. to 175°C (e.g., in the automotive context [5]) resulting in over an order of magnitude sleep power variation (see Fig. 3) resulting in several tens of percent performance II. Measured Variability in Contemporary change and large power deviations. Computing Systems 3) Vendor. Parts with almost identical specifications can Scaling of physical dimensions in semiconductor circuits have substantially different power, performance, or relia- faster than tolerances of equipments used to fabricate them bility characteristics, as shown in Fig. 4. This variability has resulted in increased variability in circuit metrics such as is a concern as single vendor sourcing is difficult for power and performance. In addition, new device and intercon- large-volume systems. nect architectures as well as new methods of fabricating them 4) Aging. Wires and transistors in integrated circuits suffer are changing sources and nature of variability. In this section, substantial wear-out leading to power and performance we discuss underlying physical sources of variation briefly, and changes over time of usage (see Fig. 5). Physical mech- illustrate the magnitude of this variability visible to the soft- anisms leading to circuit aging include bias temperature ware layers by measured off-the-shelf hardware components. instability (i.e., BTI where threshold voltage of tran- There are four sources of hardware

Underdesigned and Opportunistic Computing in Presence Of

Network Processors the Morgan Kaufmann Series in Systems on Silicon Series Editor: Wayne Wolf, Georgia Institute of Technology

About the Editors

2016 Esweek Program October 2-7, 2016 | Pittsburgh, Pa | Usa

On-Chip Communication Architectures the Morgan Kaufmann Series in Systems on Silicon Series Editor,Wayne Wolf, Georgia Institute of Technology

NIKIL DUTT Professor of CS and EECS University of California, Irvine

Memory Hierarchy Hardware-Software Co-Design in Embedded Systems

On-Chip Communication Architectures