C O N F I D E N T I A L

Future of Supercomputing: The Computational Element

Tommy Toles 512-327-5389 [email protected] Agenda

• HPC – The New World Order

• Whole > Sum-of-Parts?

• Key Challenges

• A Look Ahead

2 All Worldwide Servers Compared To HPC – Revenue, Units & Processors All Servers Worldwide 2003 to 2006 2005 to 2006 2003 2004 2005 2006 CAGR CAGR Total Factory Revenue ($B) $46,149 $49,146 $51,268 $52,251 4.2% 1.9% Units Shipped (same as nodes) 5,278,222 6,307,484 7,050,099 7,472,649 12.3% 6.0% Processor Dies Shipped 8,662,823 10,134,624 11,712,766 12,779,159 13.8% 9.1% Source: IDC 2007

HPC Technical Servers Worldwide 2003 to 2006 2005 to 2006 2003 2004 2005 2006 CAGR CAGR HPC Server Revenue ($B) $5,698 $7,393 $9,208 $10,030 20.7% 8.9% Adjusted Revenues (To match enterprise$5,128 revenue definitions)$6,654 $8,287 $9,027 20.7% 8.9% Node Units Shipped 411,327 734,510 1,215,735 1,419,221 51.1% 16.7% Processor Elements Shipped 1,002,905 1,657,827 2,681,079 3,351,843 49.5% 25.0% Source: IDC 2007

HPC As A Ratio Of All Servers 2003 2004 2005 2006 Revenue ($B) 12.3% 15.0% 18.0% 19.2% Adjusted Revenues (Apples-to-apples) 11.1% 13.5% 16.2% 17.3% Units Shipped (Nodes) 7.8% 11.6% 17.2% 19.0% Processors Shipped 11.6% 16.4% 22.6% 26.1% Source: IDC 2007 Total HPC Revenue by Processor Type Source: IDC

100% 90% 80% 70% -64 60% x86-32 50% VECTOR 40% RISC 30% EPIC 20%

10% Revenuepercentage typeby CPU 0% 2000 2001 2002 2003 2004 2005 2006

X86 bases system took 63% share in 2006 Total HPC Revenue by OS Source: IDC

100%

80%

60% Linux Unix W/NT . 40% Other

20% Revenue percentagebyO/S 0% 2000 2001 2002 2003 2004 2005 2006

Linux systems accounted for 66% of the total revenue in 2006 Historical Performance Metrics

Performance

Time

6 AMD VSP Custo mer New Metric: Performance per Watt

Increase Performance within Power Envelope

Performance Innovations

Time

7 AMD VSP Custo mer Power – The final frontier…

The combined total of data centers in California for 2004 were estimated to require ~300MW of energy.

That’s equivalent to ~5000 barrels of oil a day!

SOURCE: California Energy Commission http://www.energy.ca.gov/reports/2004-04-07_500-04-004.PDF

9/27/ 8 2005 Feeding the Beast (Watts)

More High-Capacity CRAC More High-Voltage switch units (air-conditioners) Equipment Requirements $$$ $$$ More Back-up Power Generator Requirements $$$

More UPS equipment requirements $$$

Lower Density/ Unusable Data Center Expansion Floor Space $$$ $$$

9/27/ 9 2005 IT Knows it has a Problem

Power Consumption/Cooling Issues How Are Companies Addressing Tracked By Company: 71% These Issues

Increased Amount Of Power Consumption 12% Power Supplied To Data Center 44% Stopped Buying More Servers And/Or 26% Cooling 21% Consolidated Existing Equipment Implemented "Cool Aisle/Hot Aisle“ Both Equally Layout 25% Important 38% Increased Size Of Data Center 23% Suspect That Both are Issues but do not Track at this Time 12% Other 16%

Neither Presents An 17% Issue at this Time None Of The Above 15%

Base: 1,177 IT Decision Makers November 2005 Strategy Group/Ziff-Davis

Average of 18% of total rack space wasted due to power and cooling issues

9/27/ 10 2005 Watts From: Memory Next-Generation Power Comparison Northbridge CPU

In 2006 Next-Generation 83W AMD ™ Defined 83W A New Standard In 22W Performance-Per-Watt 22W

With Energy-Efficient DDR2 83W Memory and Improved 35W 35W AMD PowerNow!™ 22W Capabilities 290W In mid-2007 We Plan to 266W Offer Quad-Core AMD 190W 180W 190W Opteron in the Same DDR2-based Platforms at the Same Power Dual-Core Quad-Core Efficiency

Xeon Xeon Rev F „Dempsey‟ „Wood- „Clover- Quad-Core Crest‟‟ Town‟‟

Wattage based on 2P systems with 8 DIMMs at max CPU wattage; Wattage for „Dempsey‟, „Woodcrest‟ and „Clovertown‟ is estimated based on currently publicly available values (see, eg: 11 http://www.reghardware.co.uk/2006/05/25/intel_clovertown_power_specs/) and is subject to change. The examples contained herein are intended for informational purposes only. Other factors will affect real-world power consumption. At the Wall Power Comparison

12 AMD Power Efficiency Innovation

Independent Dynamic AMD CoolCore™ Technology Core Technology

Same Power And Thermal Envelopes As Dual-Core!

Dual Dynamic Power Low-Power DDR2 Memory Management™

13 Computational Requirements

14 Source: Andy Bechtolsheim , IEEE Grid & Cluster Conference, 2007 What Makes a Great CPU even better?

• Frequency Lift

• Instruction Set Enhancement

• Increasing Cores

• Core Capabilities

• Memory Access (Capacity, Bandwidth & Latency)

• CPU-CPU connectivity

• IO Connectivity (Data access)

• Software Scaling

15 Legacy Northbridge Server Architecture

FSB Sharing: •Memory Server Server •CPU-CPU Processor Processor •I/O nd Memory access delayed 2 Processor competes by passing through NB For FSB B/W I/O & memory compete for FSB B/W North Bridge PCI-X DDR PCI-X (MCH) Bridge

B/W bottlenecks: link B/W < I/O device B/W

IDE, FDC, South PCI USB, Etc. Bridge

16 AMD VSP Custo mer AMD Opteron™ “Direct Connect” Server Architecture

DDR AMD AMD Opteron™ Opteron™ DDR Processor Processor

•Reduced Bus Contention HyperTransport ™ bus •Separate Mem. And I/O Paths has ample bandwidth for I/O devices •Dedicated Memory Path AMD-8131™ •Reduced Memory Latency PCI-X PCI-X Bridge •Better Scalability

AMD-8111™ IDE, FDC, I/O PCI USB, Etc. Hub Fewer chips needed for basic server

17 AMD VSP Custo mer AMD Opteron™ Platform Historic MP Server

HyperTransport™ HyperTransport™ Technology Maximum of Four Technology Buses Enable Buses for Glueless I/O or CPU FSB Bus Bandwidth Shared Processors per Memory Glueless Expansion for up Expansion Across All Four Processors Controller Hub to 8-way Servers

Limited Memory Processor Processor AMD AMD Processor Processor DDR DDR Bandwidth 144-bit Opteron™ Opteron 144-bit Shared by All Memory

I/O & Memory Share FSB Memory DDR AMD AMD DDR DDR Address 144-bit Buffer4 144-bit Opteron Opteron 144-bit PCI-X PCI-X Bridge2 Memory DDR Address HyperTransport Memory Separate Memory and 144-bit Buffer PCI-X I/O Paths Eliminates Link Has Ample Ctlr Hub PCI-X Bridge Most Bus Contention Bandwidth For Memory (MCH) 1 I/O Devices DDR Address 144-bit Buffer PCI-X PCI-X PCI-X PCI-X PCI-X Bridge Bridge * Bridge Memory DDR Address 144-bit Buffer Bandwidth IDE, FDC, I/O Bottlenecks: Other Other IDE, USB, I/O PCI USB, Etc. 3 Link Bandwidth < I/O I/O LPC, Etc. Hub** Hub Bridge Key Bridge Bandwidth Memory Traffic Fewer Chips Needed for 4-way Server (Reduces Cost) I/O Traffic 9-Chip Chipset Needed for 4-way Server IPC Traffic

•Scalable memory and I/O bandwidth •System scalability limited by Northbridge – Up to 8 processors without glue logic – Maximum of 4 processors – Each processor adds more memory o Processors compete for FSB bandwidth – Each processor adds additional HyperTransport™ – Memory size and bandwidth are limited buses for more PCI-X and other I/O bridges – Maximum of 3 PCI-X bridges – Fewer chips required – Many more chips required

1 ServerWorks CMIC HE Memory Controller Hub (MCH) 2 ServerWorks CIOB-X 64-bit PCI/PCI-X Controller Hub *AMD-8131™ HyperTransport PCI-X Tunnel **AMD-8111™ HyperTransport I/O Hub 3 ServerWorks CSB5 I/O Controller Hub 4 ServerWorks REMC Memory Address18 Buffer AMD VSP Custo mer AMD Performance Innovation

AMD Wide Floating-Point AMD Memory Optimizer Accelerator Technology

~150%

100%

Comprehensive Dual-Core Quad-Core Performance Enhancements!

Rapid Virtualization Indexing™ AMD Balanced Smart Cache

19 Dual-Core to Quad-Core Uplift

Dual-Core AMD OpteronTM 2200 Series vs. Quad-Core AMD Opteron Model 2350 2 Socket Performance Scaling

VMmark >124% >124%

SPECint_rate2006 57%59%57%

SPECfp_rate2006 49%57% 54% Average Performance Stream memory bandwidth 23% 49% Increase

SPECompMbase2001 17%

50 100 150 200 250

100% = Dual-Core AMD Opteron Processor Performance

SPEC and the benchmark name SPECint, SPECfp and SPECOMPM are registered trademarks of the Standard Performance Evaluation Corporation. Benchmark results stated above for Dual-Core AMD Opteron™ processor Model 2222 reflect results published on www.spec.org as of Sep 9, 2007. The comparison presented above is based on results for Quad- Core AMD Opteron processor Model 2350 under submission to SPEC as of Sep 9, 2007. For the latest results visit http://www.spec.org/cpu2006/results/ and http://www.spec.org/omp/results/. Stream and VMmark results based on internal measurements at AMD performance labs.

20 Quad-Core vs. Dual-Die

AMD‟s design will be a TRUE quad-core processor without compromising performance, power or heat

Intel may rush a “dual die” architecture to market in order to claim “first to market”, only to change the design to true quad-core later - more churn and increased customer TCO

Native Quad-Core Design Dual Die (Dual Cavity) • Optimum performance • Can hinder performance (FSB design) • Same power & thermal • Publicly known thermal design power envelopes as dual-core ranges higher than dual-core products

21 Native Quad-Core Benefit: Faster Data Sharing Situation: Core 1 needs data in Core 3 cache … How Does it Get There?

Native Quad-Core AMD Opteron™ Quad-Core Clovertown Core 1 Core 2 Core 3 Core 4 L3 100011

Core 1 Core 2 Core 3 Core 4 L2 L2 100011 L2 L2 L2 L2 Front-Side Bus Front-Side Bus System Request Queue Crossbar

Hyper Transport™ Memory Controller Memory Controller

Northbridge

1. Core 1 probes Core 3 cache, data 1. Core 1 sends a request to the memory is copied directly back to Core 1 controller, which probes Core 3 cache 2. Core 3 sends data back to the memory controller, which forwards it to Core 1

This happens at processor frequency This happens at front-side bus frequency Result: Improved Result: Reduced Quad-Core Performance Quad-Core Performance

22 Performance-Per-Watt Leadership

Quad-Core AMD Opteron™ Processor Model 2350 (75 Watt ) vs. Xeon 5345 (80 Watt, without Additional Watts of Memory Controller and FBDIMM)

Fluent 6.4.3Fluent (sedan_4m) sedan_4m 67%

SPECompMBase2001SPECompM2001 Base 36%

SPECfp_rate_base2006 SPECfp_rate2006Both on GCC gcc 30%

SPECfp_rate2006 SPECfp_rate2006 PGI 27% Intel compiler vs. PGI compiler

LSDynaLSDyna 3 Vehicle 3 Vehicle Collision Collison 12% 26% Average Performance SPECint_rate_base2006 9% Increase SPECint_rate2006Both on GCCgcc

SPECint_rate2006SPECint_rate2006 PGI Intel compiler vs. PGI compiler -5%

50 70 90 110 130 150 170 190

100% = Intel Xeon 5345

SPEC and the benchmark name SPECint, SPECfp and SPECOMPM are registered trademarks of the Standard Performance Evaluation Corporation. Competitive benchmark results stated above reflect results published on www.spec.org as of Sep 9, 2007. The comparison presented above is based on results for Quad-Core AMD Opteron processor Model 2350 and Xeon 5345 (specint_rate2006 gcc and SPECompM2001 base) under submission to SPEC as of Sep 9, 2007. For the latest results visit http://www.spec.org/cpu2006/results/. Fluent and LSDyna result based on internal measurements at AMD performance labs.

23 AMD Server Platform Roadmap

Single Core Dual Core Quad Core CPU

Maximum Performance: 120W (Special Edition)

Mainstream: 95W Power Blades/Rack Dense: up to 68W

DDR-3

Registered DDR-2

Memory Registered DDR-1

HT – 2.6 GHz

HyperTransport - 1 GHz Interconnect 2003 2005 2007 2009

24 Investment Protection: Stable Platform Progression Long-term success for partners and end-customers 45nm 2003 2004 2005 2006 2007 2008 2009Octal-core 2010 2011 2012

rd 65nm 45nm 3 Generation Platform Quad-core Quad-core 90nm Dual Core

2nd Generation Platform

130nm 90nm Single Core Dual Core

1st Generation Platform

Stable platforms deliver better long-term value and logical transitions for partners and customers

25 Platforms in Market Today

PowerEdge DL585 2003 vs. SC 1435 2970 PowerEdge 2007 DL385 6950 X4600 DL365 X2200 DL145 X4500 BL465c IBM eServer 325 X4200/4100 X2100 BL685c

U20 & U40 WS xw9400

Blade X6220 E-9422R Blade X8420 E-9522R

E-9722R X3455 X3655 E-9222T

X3755 XT3™ LS21

LS41 XT4 1st Generation AMD Opteron™

AMD Validated Solutions BladeFrame G5450 ES and EX X630 S2

26 Customers are Responding! AMD Opteron Server Market Share

2005 2006

WW total 11.9% 16.0%

WW x86 12.8% 17.1%

WW x86 2-way 13.6% 18.2%

WW x86 4-way 28.2% 40.1%

US x86 20.8% 27.4%

US x86 2-way 21.2% 27.9%

US x86 4-way 36.0% 56.2%

Source: Gartner, IDC, end of year results

27 The Heterogeneous Processing Imperative Dual-Core Opteron

AMD64 HD, DRM

3D, digital media

486 Processing

-

CPU

bit -

Java, XML, web- services

64

bit

CPU+xPU -

32-bit Multi

DIVERSITY

64

Homogeneous Homogeneous Heterogeneous SingleE-mail, GUI, PERF. PowerPoint, web browsers Core Core Single

≤16-bit POWER/PERF. Platform Co Platform Single Spreadsheets,PERF. word-processing Core

1981 1990s 2000s 2010s By the end of the decade, homogenous multi-core becomes increasingly inadequate

28 Continuum of Solutions

“Fusion"

""

Accelerator Accelerator C

CPU P U

HTX Accelerator AMD NB Processor PCI-E Chipset Silicon level Package level integration Accelerator integration PCIe Accelerator (MCM)

Add-in Chipset Accelerator Accelerated Processors Opteron Socket Fusion – AMD’s code name for: Accelerated "Stream" Socket Processors (integrated acceleration) general compatible Torrenza – AMD’s code name for: purpose GPU accelerator slot or socket based acceleration

Stream – Specific example of a GPGPU Slot or Socket Acceleration accelerator under Torrenza

29 Torrenza: Accelerated Computing Today

• Direct Connect Accelerators in sockets or slots deliver • Familiar programming interfaces speed superior performance without bridge chips time to implementation • 100s of GFlops to solve complex math • Application specific optimization

• Application Partners: Partners: • • Stream Libraries • CTM • Celoxica • ASICs • Compilers • Celoxica • DRC • FPGA • OpenFPGA • Lattice • Hardware • Peakstream • Interfaces • XtremeData • Rapidmind Torrenza • IB Partners: • XML Partners: • Bay Microsystems • iSCSI • 3Leaf Systems • Commex • 10Gb E • Liquid • NetLogic Computing • Qlogic • Search • Mannheim • RMI • Storage • Scale Up • Newisys • • Security • Virtualization • Panta Systems • Woven

• Specialized Direct Connect devices for high- • cHT and HT provide peer level interfaces to build throughput, low-latency processing systems from commodity building blocks

30 March 2007 AMD Commercial How Do We Get There?

Silicon

Budgets Carbon

Collaboration

Facilities Algorithms

31 AMD Collaboration Resources

• Green Grid http://www.thegreengrid.org/home

• Developer Pages http://developer.amd.com/

• Torrenza Forum (accelerators) http://enterprise.amd.com/us-en/AMD-Business/Technology- Home/Torrenza.aspx

• Lightweight Profiling for increased parallellism http://developer.amd.com/lwp.jsp

• HyperTransport interconnect

http://www.hypertransport.org/

32 My Prediction…

Texas A&M 38

OSU 13

Gig „Em Aggies!

33 Disclaimer and Trademark Attribution

DISCLAIMER

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including, but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN OR FOR THE PERFORMANCE OR OPERATION OF ANY PERSON, INCLUDING, WITHOUT LIMITATION, ANY LOST PROFITS, BUSINESS INTERRUPTION, DAMAGE TO OR DESTRUCTION OF PROPERTY, OR LOSS OF PROGRAMS OR OTHER DATA, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

© 2006 , Inc. AMD, the AMD Arrow logo, AMD Opteron, and combinations thereof, are trademarks of Advanced Micro Devices, Inc. Windows is a registered trademark of Microsoft Corporation in the U.S. and/or other countries. Linux is a registered trademark of Linus Torvalds. WebBench and NetBench are trademarks of Ziff Davis Publishing Holdings Inc., an affiliate of Veritest Inc. Other product and company names are for informational purposes only and may be trademarks of their respective companies.

34