IBM Research

System Trends and their Impact on Future Design

Tilak Agerwala Vice President, Systems IBM Research

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research

Agenda

System and application trends Impact on architecture and The Memory Wall Cellular architectures and IBM's Blue Gene Summary

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research TRENDSTRENDS

Microprocessors < 10 GHz in systems 64-256 Way SMP 65-45nm, Copper, Highest performance SOI Best MP Scalability 1-2 GHz Leading edge technology 4-8 Way SMP RAS, virtualization ~100nm technology 10+ of GHz 4-8 Way SMP SMP/Large 65-45nm, Copper, SOI Systems Highest Frequency Cost and power Low GHz sensitive Leading edge process Uniprocessor technology Desktop ~100nm technology 2-4 GHz, Uniproc, Component-based and Game ~100nm, Copper, SOI Consoles Lowest Power / Lowest Multi MHz cost designs SoC capable Embedded Uniprocessor ASIC / Foundry ~100-200nm technologies Systems technology

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research TRENDSTRENDS

Large system application trends

Traditional commercial applications Databases, transaction processing, business apps like payroll etc. The internet has driven the growth of new commercial applications New life sciences applications are commercial and high-growth Drug discovery and genetic engineering research needs huge amounts of compute power (e.g. protein folding simulations) Important applications will scale out Large-scale parallelism Little or no interaction between computations e.g., web application serving, life sciences, softswitch, video streaming, financial front-ends, ERP, CRM, eProcurement

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research TRENDSTRENDS

Large SMP systems

SMPs support upward scalable workloads Workloads that scale well with more processors under a single system image Close interaction between threads e.g., databases, decision-support systems SMPs are also efficient with workloads that scale out Other cost-effective solutions are available Today: larger than 16-32 way SMPs 64-256 way systems foreseen for future workloads Robust RAS characteristics Robust virtualization/logical partitioning capabilities Staple on the mainframes for decades Cross-pollination with other high-end platforms is happening now Likely to trickle down to every platform except, maybe, embedded

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research

Synchronization and coherence structures create new bottlenecks in SMPs SMP CPI Even the single performance worsens as the number of finite CPI perfect cache CPI processors increases Perfect cache performance worsens because of locking and synchronization Greater OS pathlength than on a uniprocessor

Synchronization operations are slow CPI Finite cache performance deterioration Interaction among processors

1 6 13 # of processors

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research TRENDSTRENDS

Large "blade" systems

Usually intended for workloads that scale out Logically same as clusters Classic distributed memory multicomputer systems "Data center" or "grid" in a box Many inexpensive, possibly heterogeneous, nodes called blades Shared power/storage infrastructure Must minimize total system power and cost Each blade can be a small SMP One OS instance on each blade (typical) Simplified system management

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research TRENDSTRENDS

Large high-performance systems

100000 New trends are towards 10000 High compute density

IBM Blue Gene™ Low cost 1000 Power efficiency 84% CGR Riken MDM* High system reliability 100 TeraFLOPS ASCI 10 IBM Deep Blue®*

1 1995 2000 2005 2010 2015 Year * Special purpose machines, i.e. chess, molecular dynamics, protein folding.

Source: ASCI Roadmap www.llnl.gov/asci, IBM Moravec 1998, www.transhumanist.com/volume1/moravec.htm

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research TRENDSTRENDS

Evolution of game applications and infrastructure

Time

Full interaction, Nature "Transactional" immersive Enhanced user interaction of Pre-rendered Highly collaborative Automatic generation Applications Fixed content Dynamic synthesis, On-demand generation Timely updates of versions Adaptive Global

Distribution The Internet "On-demand" Media CD/DVD

Infrastructure

2001 2003 2005+

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research TRENDSTRENDS

Online gaming potential Rev $B 5.8 8.0 11.0 13.3 16.3 20.0 Online gaming set to become a significant part of the total games 74% market Off-line Online Console-based online gaming is expected to pass PC-based online %Total Games 26%

gaming in 2003-2004 timeframe Rev 256 344 569 1,228 2,660 5,279 $M

PC Console 25%

%Total Online 75% 100%

Year 2000 2001 2002 2003 2004 2005

Source: Forrester research 8/2000 - US Market only

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research TRENDSTRENDS

Game processors and systems

Huge growth in console game processors Desktop processors will push the frequency curve for the foreseeable future Desktop processors can be retooled, adapted, or used without modifications in game consoles Special hardware (e.g. nVidia add-on cards) Attached processors (e.g. Sony PlayStation2) Dedicated Special purpose functional units (e.g., AltiVec, SSE2) Game consoles could grow to become set-top boxes, home media and entertainment servers

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research TRENDSTRENDS

Embedded Systems

Embedded Systems are specialized or dedicated used to control appliances, devices and machines Platforms that use embedded systems include consumer appliances, IT devices and industrial/commercial machines Consumer: PDAs, game consoles, set top boxes, automotive control systems, home appliances IT: Printers, copiers, faxes, teller machines, telecom and routers, modems, videoconferencing, disk controllers Industrial/commercial: robotics, data acquisition, manufacturing control, process control, medical imaging and monitoring, aerospace, satellite systems, radar systems

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research TRENDSTRENDS

Growth in Embedded Devices

Embedded devices will Embedded Programmable Devices be pervasive 22 On average, 3 20 embedded 18 16 devices/person on the 14 planet by 2011 12 10 8

Billions of Units 6 4 2 0 1999 2001 2003 2005 2007 2009 2011 CAGR 10.3%

Source: Gartner 2002: Microprocessor, and Digital Signal Forecast Through 2005

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research

Embedded systems characteristics

New embedded applications are media-rich e.g., video-condap41i 0 0 1 0 6l Tf hone.1765 0.7137 0.702 rg ET 142.8 417 m 393138.6 313.4 3902.8 409.2 385S Q BT 162 407.4 380 0 0 rg /F3 181 T3030.1271 07 -0.1deoDSPmedia-be0 1 uslicto fulfi Tfations are -specific requirements in .1765 -22 0 0 071 216-0.1271 0-0.1deoed systems chara.1765 0.7137 0.702 rg ET 142.8 441 l 37.815.2 441 l 3453 434.4 3453 434.4 37.815b* BT 135.6 430.8 3 TD 0 0 rg /F0 21.0282 Tf -0.3658 Tc 2 Tw (e.g.,)Ceristics

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research

Agenda

System and application trends Impact on architecture and microarchitecture Architecture impact Microarchitecture impact and system bottlenecks Energy efficiency Component-based design System reliability Virtualization and logical partitioning The Memory Wall Cellular architectures and IBM's Blue Gene Summary

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research

Impact of the

Architecture Special functions for the trends Enhanced ISA functionality game and embedded Virtualization/LPAR space

Power Power-aware microarchitecture Low-power cpu cores/components Low-power circuits Low-power process technology Microarchitecture Power-aware pipelines Meet the frequency curve Availability Need for ILP and continues Intra-processor redundancy Balanced designs for power/performance System component redundancy

Cost SoC-like design Component-based design SoC Better tools

LargeLarge systemssystems DesktopDesktop andand GameGame EmbeddedEmbedded

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research IMPACTIMPACT ofof thethe TRENDSTRENDS

Architecture: ISA alone not a performance differentiator New applications are fairly ISA-agnostic But legacy applications are tightly connected to legacy ISAs RISC vs. CISC performance battle is over Modern CISCs implemented as RISCs internally Only the total system performance matters The new challenge is functionality Virtualization/logical partitioning; SIMD/DSP extensions; power Clear advantage to having single ISA from high-end to low-end

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research IMPACTIMPACT ofof thethe TRENDSTRENDS

PowerPC spans the entire spectrum for new applications

Power 4+ High-end Power 4 Power 3 Power 2

970 Price/Performance Leadership 750FX 750 603e

440GP

401 405GP

405LP

Today Future

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research IMPACTIMPACT ofof thethe TRENDSTRENDS

Microarchitecture: exploiting parallelism

Emphasis on Instruction-level and task parallelism High-issue rate O-O-O execution Multithreading and chip MPs Large shared on-chip caches (L1, L2), on-chip L3 directories Pressure to stay on the frequency curve Deeper pipelines, with many more latches Need better branch prediction and active Balancing degree of pipelining with Branch predictability and data forwarding latencies Power consumption Cache access latency and bandwidth Balancing issue width with Clock frequency Complexity of design and verification Design cost

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research IMPACTIMPACT ofof thethe TRENDSTRENDS

Power-aware processors and systems

Power-aware Better understand pipeline depth and power consumption tradeoffs Frequency, voltage scaling and Eliminate redundancy and speculation to conserve energy while minimizing performance impact Power efficient circuits and semiconductor technologies Power-efficient circuits, latches Process technologies like Si-on-Insulator (SOI) offer better power/performance IBM products show the advantages Power4+, PowerPC405LP Processor power density is a hard problem Processor is the hardest system component to cool Uneven heat densities on the processor chip Software-controlled power consumption optimization 25% of papers, 2 tutorials in this symposium related to this issue

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research IMPACTIMPACT ofof thethe TRENDSTRENDS

Wattage affects size

Physical Size = "What matters most to the 14.5 sq. designers at Google is not speed, but ft power -- low power, because data centers can consume as much electricity as a city." Service Floor- - Eric Schmidt, CEO Google (Quoted in Size = Loading NY Times, 9/29/02) 25 sq. ft Size = 36 sq. ft

Cooling Size = 190 sq. ft

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research IMPACTIMPACT ofof thethe TRENDSTRENDS

Component-based design driven by cost and power System-on-chip (SoC) technology today

STI n+ n+ STI p SRAM/ Device uProc. eDRAM

IP Blocks + (Cores) DSP Cells Ethernet Core Connect Controller SoC technology is still maturing A typical embedded SoC Portable/reusable CPU cores Embedded memory Interfaces to the world (USB, PCI, Ethernet et al.) Mixed signal blocks (optional) Programmable hardware (optional) ROM (holds firmware/software) Today: approx. 500K+ gates PowerPC 405GP

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research IMPACTIMPACT ofof thethe TRENDSTRENDS

High-end microprocessor design: an SOC-like approach Lower design cost, fast turnaround

System-on-a-Chip Technology Evolution System-on-a-Chip Examples

SMP Other uP uP Fabric Processor core Chips High-End core L3 SMP Server L2 L3 Cache Processor Cache Cntls Chip uP Mem Memory core Cntls

Increasing area is available for systems-on-a-chip uP I/O Cntlr I/O More cores Modular core Bladed Server L2 L3 Larger caches L3 Cache functions Processor Cache Cntlr Chip uP Hardware accelerators Mem core Memory Increased redundancy for Cntlr reliability

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research

Towards autonomic infrastructures

Self-protecting System designed to protect itself from any unauthorized access anywhere Self-optimizing System designed to automatically manage resources to allow the servers to meet the enterprise needs in the most efficient fashion Self-healing Autonomic problem determination and resolution Self-configuring systems designed to define itself "on the fly"

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research IMPACTIMPACT ofof thethe TRENDSTRENDS

Availability Systems must meet higher availability standards The 5-nines target: 99.999% uptime (~315 seconds/year) Soft error rates are going up as voltages and feature sizes are dropping Detection and correction of soft errors Redundancy in the microarchitecture ECC protection in cache and register structures control for failure detection and restart

Examples of hardware support Pipeline mirroring Pipeline mirroring in IBM mainframe processors in IBM z900 mainframe processor High availability mechanisms are essential for an autonomic infrastructure

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research IMPACTIMPACT ofof thethe TRENDSTRENDS

Research opportunity: Availability

Availability must be handled as a system issue Software Front-end apps (edge-server applications) Back-end apps (e.g. databases) OS, middleware Hardware Network connectivity Motherboard and peripherals Package Coherence structures , buses, I/O Processor

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research IMPACTIMPACT ofof thethe TRENDSTRENDS

Virtualization and logical partitioning Architectural support for virtualization Distinguishing between user/supervisor/hypervisor state Hypervisor mode instructions to create and protect logical partitions or full virtual machines There could be ISA obstacles to full virtualization This is hard stuff S/360 has been a pioneer since 1968 with the VM environment Success of Linux on mainframes: 100s of Linux's in a box LPAR on Power4: simple addition to the address translation hardware efficiently implements dynamic LPAR Significant cost and utilization advantage for customers Huge research opportunity Improving performance of virtual machines via dynamic adaptation Designing streamlined architectural support

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research IMPACTIMPACT ofof thethe TRENDSTRENDS

Design complexity Potential Design Complexity vs. Designer Productivity complexity is 1E+7 1E+8 increasing 1E+6 1E+7 Designer productivity cannot keep up Time-to-market is critical 1E+5 1E+6 Design and deliver within fixed time budget 1E+4 1E+5 Better, more intelligent design tools and automation methodologies are 1E+3 1E+4 necessary Extensive work is ongoing at IBM 1E+2 1E+3 Research to solve problems in this area 1E+1 1E+2 Productivity Trans./Staff-Mo.

System-level design tools Logic Transistors per Chip (K) Behavioral synthesis 1E+0 1E+1 Logic synthesis 1980 1985 1990 1995 2000 2005 2010 Circuit tuning Place/route, placement-driven synthesis Circuit analysis/extraction Manufacturing enhancement

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research

Impact of the

Architecture Special functions for the trends Enhanced ISA functionality game and embedded Virtualization/LPAR space

Power Power-aware microarchitecture Low-power cpu cores/components Low-power circuits Low-power process technology Microarchitecture Power-aware pipelines Meet the frequency curve Availability Need for ILP and task parallelism continues Intra-processor redundancy Balanced designs for power/performance System component redundancy

Cost SoC-like design Component-based design SoC Better tools

LargeLarge systemssystems DesktopDesktop andand GameGame EmbeddedEmbedded

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research

Agenda

System and workload trends Impact on architecture and microarchitecture The Memory Wall Cellular architectures and IBM's Blue Gene Summary Overcoming challenges of design and verification complexity

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research TheThe MemoryMemory WallWall

The Memory Wall A prehistoric problem that won't go away anytime soon!

Processor speedup expected ~60% p.a.; DRAM speedup ~10% p.a. Increasing number of processor cycles as processor speeds have increased by an order of magnitude Implication is significant increase in CPI Typical on-chip L2 cache performance degradation now more than 2x the ideal (i.e. perfect) cache For multiprocessors, inter-cache latencies increase this degradation to 3x or more for 4 processors and up

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research TheThe MemoryMemory WallWall

Some approaches to attack the Memory Wall

Traditional techniques Larger caches, deeper cache structures Latency hiding via prefetching (h/w, s/w, both) Compilers/application software cognizant of the memory hierarchy Hardware multithreading Emerging research opportunities Reduced intercache/scaling effects via affinity scheduling of tasks Machine learning applied to code prefetching and code pre-positioning Self-optimizing cooperation between the hardware and software directives New computing paradigms with programming models designed to better tolerate memory latency

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research

Agenda

System and workload trends Impact on architecture and microarchitecture The Memory Wall Cellular architectures and IBM's Blue Gene Summary

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research CellularCellular Arch.Arch. andand BlueBlue GeneGene

Building cellular architectures with embedded processors

Compares favorably to 600 conventional systems Embedded Power efficiency 10-50x better processor Cost/performance 10x better 400 Cost is drastically lower than High perf conventional systems processor Exploitation of high 200 Performance redundancy helps availability 0 0 50 100 150 200 Power (Watts)

Web Serving performance: pages served (MB/s)

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research CellularCellular Arch.Arch. andand BlueBlue GeneGene

IBM's Blue Gene

Advance the state of the art in computer design and software for extremely large scale systems Blue Gene is a cellular architecture A homogeneous collection of simple independent processing units called cells, each with its own system image All cells have the same computational and communications capabilities (interchangeable from OS or application view) Integrated connection hardware provides a straightforward path to scalable systems with thousands/millions of cells Future blade systems could be cellular architectures Low-cost, high-performance, better power characteristics For many new high-growth apps

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research CellularCellular Arch.Arch. andand BlueBlue GeneGene

Cellular architectures user component-based design Building Blue Gene System (64 cabinets, 64x32x32)

Cabinet (32 Node boards, 8x8x16)

Node Board (32 chips, 4x4x2) 16 Compute Cards Compute Card (2 chips, 2x1x1)

Chip (2 processors) 180/360 TF/s 16 TB DDR 2.9/5.7 TF/s 256 GB DDR 440 core EDRAM 90/180 GF/s 440 core I/O 8 GB DDR 5.6/11.2 GF/s 2.8/5.6 GF/s 0.5 GB DDR 4 MB

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research CellularCellular Arch.Arch. andand BlueBlue GeneGene

Cellular architectures offer high levels of availability

We expect transient errors to be 10 times more common than permanent errors (1 per day vs. 1 in 10 days) Blue Gene supports semiautomatic, coordinated checkpointing Application re-launched on same partition after loading data from previous checkpoint - handles transient errors Approach insufficient for dealing with undetected soft errors - may occur once/month for machine with 64K nodes

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research

Agenda

System and workload trends Impact on architecture and microarchitecture The Memory Wall Cellular architectures and IBM's Blue Gene Summary

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation IBM Research

Summary

Exciting opportunities for microarchitecture, architecture, and design Integrated system design approach is key to performance, functionality, and development cost Memory Wall Architects must focus on enhanced functionality Virtualization Power awareness Autonomic computing support Low design cost and complexity demand component-based designs Computation model shift in new applications Leverage high-volume embedded processors This talk is dedicated to John Cocke, the father of Reduced Instruction-Set Computing (RISC).

System Trends and their Impact | MICRO 35 | Tilak Agerwala © 2002 IBM Corporation