Energy-Efficient Abundant-Data Computing: The N3XT 1,000×

COVER FEATURE / REBOOTING COMPUTING

Mohamed M. Sabry Aly, Mingyu Gao, Gage Hills, Chi-Shuen Lee, Greg Pitner, Max M. Shulaker, Tony F. Wu, and Mehdi Asheghi, Stanford University; Jeff Bokor, University of California, Berkeley; Franz Franchetti, Carnegie Mellon University; Kenneth E. Goodson and Christos Kozyrakis, Stanford University; Igor Markov, University of Michigan, Ann Arbor; Kunle Olukotun, Stanford University; Larry Pileggi, Carnegie Mellon University; Eric Pop, Stanford University; Jan Rabaey, University of California, Berkeley; Christopher Ré, H.-S. Philip Wong, and Subhasish Mitra, Stanford University

Next-generation information technologies will process unprecedented amounts of loosely structured data that overwhelm existing computing systems. N3XT improves the energy efficiency of abundant-data applications 1,000-fold by using new logic and memory technologies, 3D integration with fine-grained connectivity, and new architectures for computation immersed in memory.

COMPUTER, December 2015. Published by the IEEE Computer Society. 0018-9162/15/$31.00 © 2015 IEEE

The rising demand for high-performance IT services with human-like interfaces is driving the quest for the next generation of energy-efficient computers. These computers will operate on abundant data that can be highly unstructured and often streamed in terabytes. Abundant-data workloads arise from social networks, e-commerce transactions, genome sequences, and multimedia analytics. Within 10 years, trillions of sensors will be connected to the Internet, creating a massive data deluge that could overwhelm communication bandwidths. Computers must be able to process, understand, classify, and organize relevant data in real time and in an energy- and cost-efficient manner.

The slowdown of silicon CMOS (Dennard) scaling has prompted comprehensive research on faster, more energy-efficient switches. However, better switches alone will not deliver the necessary leaps in performance. In particular, abundant-data applications expose gross inefficiencies in traditional architectures: poor locality leads to excessive cache misses, causing massive and slow off-chip traffic to pin-limited DRAMs that face their own scaling challenges. Thus, only a small fraction of the system's time and energy is spent on computation itself, presenting an opportunity for major improvements.

N3XT: AN END-TO-END APPROACH

Our Nano-Engineered Computing Systems Technology (N3XT) approach capitalizes on several recent nanotechnology breakthroughs (see Figure 1). Instead of focusing solely on improving transistors or memory cells, N3XT adopts an integrated approach for a new system technology that promises to breathe new life into computing. Key N3XT components include the following:

› High-performance and energy-efficient field-effect transistors (FETs) based on atomic-scale nanomaterials, such as 1D carbon nanotubes (CNTs) and 2D layered semiconductors.
› Massive amounts of nonvolatile storage such as low-voltage resistive RAM (RRAM) and magnetoresistive memories such as spin-transfer torque magnetic RAM (STT-MRAM). These diverse technologies offer complementary tradeoffs among high density, quick access, long data retention, and read/write endurance. Their advantages can be exploited and their drawbacks avoided through a carefully designed memory hierarchy and tight integration with computation units.
› Fine-grained (for example, monolithic) 3D integration of computing and memory elements with ultradense connectivity between layers. Such fine-grained monolithic 3D integration is natural to the N3XT transistor and memory technologies, enabled by low-temperature layer transfer techniques. This unique approach decouples high-temperature nanomaterial synthesis (to achieve high-quality materials) from low-temperature monolithic 3D integration.
› Embedded cooling technologies targeting a range of application domains (for example, handheld devices versus servers) to overcome power density challenges. Examples include conduction using 2D materials, management of thermal transients based on phase change, and convective copper nanomesh structures connected to chip-periphery microfluidics.
› New microarchitectures and system runtimes for scalable computation immersed in memory, which lead to massive amounts of active data, enabled by the above technology components and their fine-grained integration. Cross-layer resilience techniques overcome yield and reliability challenges.

[Figure 1 image: the five key N3XT components, (1) energy-efficient FETs (1D CNTs, 2D layered nanomaterials), (2) high-density nonvolatile memories (3D RRAM for massive storage, STT-MRAM for quick access), (3) fine-grained monolithic 3D integration (compute and memory elements, ultradense connectivity using nanoscale vias), (4) efficient heat removal (on-chip nanoconvection/conduction solutions), and (5) computation immersed in memory (alternating logic and memory layers); experimental demonstrations: (a) 3D RRAM cells, (b) efficient heat removal solutions with thermal storage (copper nanomesh and phase change; 3 µm scale bar), (c) monolithic 3D "high-rise chip".]

FIGURE 1. Monolithically integrated 3D system enabled by Nano-Engineered Computing Systems Technology (N3XT). On the right are the five key N3XT components. On the left are images of experimental technology demonstrations: (a) transmission electron microscopy (TEM) of a 3D resistive RAM (RRAM) for massive storage, (b) scanning electron microscopy (SEM) of nanostructured materials for efficient heat removal (left: microscale capillary advection; right: copper nanomesh with phase-change thermal storage), and (c) SEM of a monolithic 3D chip for high-performance and energy-efficient computation. CNTs: carbon nanotubes; FETs: field-effect transistors; STT-MRAM: spin-transfer torque magnetic RAM.

We demonstrate the effectiveness of N3XT using the system-level energy-delay product (EDP) metric, the product of a software program's total energy consumption and total execution time, subject to power density constraints. Because speed can be traded for energy and vice versa, EDP is an important metric for quantifying computing system performance.1 To enable new frontiers of abundant-data applications for both mobile devices and the cloud, we target EDP improvements of 1,000×. For traditional multiprocessor workloads, N3XT targets 10× to 100× EDP benefits. As we show, N3XT experimental prototypes can be built today.

Such significant benefits are generally rare and cannot be achieved with evolutionary improvements in architectures, transistors, or memory cells alone; an end-to-end approach such as N3XT is essential. Consider, for example, the total delay of a processor pipeline or the total energy of processor cores and memories: for the total to improve substantially, each component must show comparable improvement. N3XT improves each component and finds symbiotic relations that enhance key performance metrics across components. Additional synergies arise; for example, faster memory accesses cut core idle times, reducing both energy consumption and overall execution time. Further improvements come from ultradense monolithic 3D integration with fine-grained connectivity and increased memory bandwidth, which enable many concurrent memory accesses and significantly reduce memory access contention, and from nonvolatile memories, which dramatically reduce idle energy consumption and simplify memory access mechanisms.

N3XT TECHNOLOGY FOUNDATIONS

Table 1 summarizes the primary nanotechnologies that form the foundations of N3XT. They work synergistically to overcome the limitations of existing approaches while meeting application-level thermal constraints.

Atomically thin logic devices

N3XT logic devices capitalize on the unique properties of atomic-scale nanomaterials, including 1D CNTs and 2D layered semiconductors (for example, black phosphorus and WSe2). These nanomaterials are ideal for building highly scaled FETs that can deliver large drive currents at low supply voltages. Such FETs exhibit excellent electrostatic control (resulting from the atomically thin bodies of 1D CNTs, approximately 1 nm in diameter, and of 2D layered semiconductors) while simultaneously achieving excellent carrier transport.

CNTs are hollow cylindrical nanostructures of carbon atoms with exceptional electrical, thermal, and mechanical properties. A carbon nanotube FET (CNFET) consists of multiple CNTs connected in parallel to form the transistor channel (see Figure 2a). CNFETs promise an order-of-magnitude better EDP versus silicon CMOS at the digital

TABLE 1. Nano-Engineered Computing Systems Technology (N3XT) technology foundations.

Technology | Impact on computation | Impact on storage | Impact on memory access
Field-effect transistors: 1D carbon nanotubes and 2D layered semiconductors | Highly energy-efficient digital systems (including logic and interconnects) | NA | Energy-efficient memory controllers and peripheral circuits
Emerging nonvolatile memory: spin-transfer torque magnetic RAM | NA | Quick access, high endurance | No refresh; simple control; energy-efficient management by turning off unused banks
Emerging nonvolatile memory: 3D resistive RAM | NA | Very high density, long retention |
Fine-grained (monolithic) 3D integration | Computation immersed in memory; high computation density for a … | Massive on-chip storage; integration of heterogeneous memory technologies | High bandwidth and low latency
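EDP multiplies total energy by total execution time, so gains in the two factors compound. This means a 1,000× target can be reached through different splits. The following Python sketch rests on stated assumptions. The baseline run of 2.0 J over 0.5 s is invented for illustration. Average power (energy divided by time) stands in for the power-density constraint, which assumes a fixed die area. The helper names `edp`, `edp_improvement`, and `avg_power` are hypothetical and do not come from the article.

```python
import math

def edp(energy_j: float, delay_s: float) -> float:
    """Energy-delay product: total energy (J) multiplied by total execution time (s)."""
    return energy_j * delay_s

def edp_improvement(base: tuple, new: tuple) -> float:
    """Baseline EDP divided by new EDP; a value above 1 means the new system is better."""
    return edp(*base) / edp(*new)

def avg_power(energy_j: float, delay_s: float) -> float:
    """Average power (W). With a fixed die area (assumed here), it tracks power density."""
    return energy_j / delay_s

# Hypothetical baseline run: 2.0 J over 0.5 s (illustrative numbers only).
baseline = (2.0, 0.5)

# Scenario A: 10x less energy and 100x less execution time.
scenario_a = (baseline[0] / 10, baseline[1] / 100)

# Scenario B: the same factor k in both energy and time; k = sqrt(1000), about 31.6.
k = math.sqrt(1000)
scenario_b = (baseline[0] / k, baseline[1] / k)

for name, run in [("A", scenario_a), ("B", scenario_b)]:
    # Print the EDP gain over the baseline and the resulting average power.
    print(name, round(edp_improvement(baseline, run)), round(avg_power(*run), 1))
# A 1000 40.0
# B 1000 4.0
```

Both scenarios reach 1,000×. Scenario A, however, gets there by drawing 10× more average power than the baseline (40 W versus 4 W in this toy example), which a power-density budget may not allow. Scenario B keeps average power unchanged.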

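The argument that each component (pipeline stages, cores, memories) must improve comparably is essentially Amdahl's law, applied here to energy or delay. The following is a minimal Python sketch of that reasoning. The component breakdown (10% computation, 60% memory access, 30% idle) is a hypothetical assumption. It was chosen to echo the article's observation that computation accounts for only a small fraction of system time and energy, and it is not a figure reported by the authors. The function name `overall_improvement` is likewise illustrative.

```python
def overall_improvement(fractions: dict, factors: dict) -> float:
    """Amdahl-style estimate: each component's share of the total (energy or delay)
    shrinks by its own improvement factor; components without a factor stay unchanged."""
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    new_total = sum(share / factors.get(name, 1.0) for name, share in fractions.items())
    return 1.0 / new_total

# Hypothetical energy breakdown for an abundant-data workload (illustrative only):
# most energy goes to memory access and idle time rather than computation.
breakdown = {"compute": 0.10, "memory_access": 0.60, "idle": 0.30}

# Improving only computation by 100x barely moves the total.
only_compute = overall_improvement(breakdown, {"compute": 100})

# Improving every component by 10x improves the total by 10x.
balanced = overall_improvement(breakdown, {"compute": 10, "memory_access": 10, "idle": 10})

print(round(only_compute, 2), round(balanced, 2))  # 1.11 10.0
```

In this toy breakdown, even an unlimited improvement in computation alone caps out at 1/0.9, about 1.11×. That is why an end-to-end approach also has to attack memory access and idle energy, which N3XT targets through 3D integration and nonvolatile memories alongside the new transistors.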