Intel's new Core™
Software and Solutions Group
Holger Gruen, Senior Application Engineer, Global Developer Relationship Division
July 14, 2006

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY RELATING TO SALE AND/OR USE OF INTEL PRODUCTS, INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT, OR OTHER INTELLECTUAL PROPERTY RIGHT.

Intel may make changes to specifications, product descriptions, and plans at any time, without notice.

All dates provided are subject to change without notice.

* Other names and brands may be claimed as the property of others.

Copyright © 2006, Intel Corporation.

7/17/2006

Agenda

• Intel's Multi-Core Roadmap
• Intel's new processor
• Game Performance
• Microarchitecture Details
• Summary
• Q&A


Intel's Multi-Core Roadmap

Timeline: 2005, 2006, 2007, 2008

Desktop:
• Smithfield (2C:2T), EM64T
• Smithfield XE (2C:4T), EM64T
• Presler (2C:2T), EM64T
• Presler XE (2C:4T), EM64T
• Conroe (2C:2T), EM64T
• Kentsfield (4C:4T), EM64T: 1st quad-core

Callouts: "The future is multi-core" and "P4 based D CPUs"

Mobile:
• Yonah (2C:2T): 1st dual-core mobile CPU
• (name not preserved) (2C:2T), EM64T

Converged core line: Intel® Core™ Microarchitecture


Introducing Intel's new Microarchitecture

The Intel® Core™ 2 Duo Processor Microarchitecture combines:
• Lessons learned from the NetBurst® Microarchitecture
• Well working features of the Mobile Microarchitecture
• New innovations

Innovations:
• Wider and Deeper Machine (4 wide vs 3 wide)
• Improved SSE performance
• Improved Memory Access
• Improved Cache
• Improved Power Saving

Other improvements:
• Shorter pipeline: 14 stages
• 1 additional integer port
• Improved latency for many ops


Intel® Core™ 2 Duo 3D Game Performance

Pre-production processor code-name "Conroe" (4 MB L2, 2.66 GHz, 1066 MHz FSB) vs. AMD* Athlon64* FX60 processor (2x1 MB L2 Cache, 2.60 GHz)

This is not the fastest Conroe.

Conroe measured on pre-production hardware and drivers. Final performance information may vary from these results.

Screen resolution was set to 1024x768x32. Games use medium settings. These settings are used for comparing CPU contribution to game performance.

Most of these games use only one core!

[Chart: processor score normalized to the AMD* Athlon64* FX60 (2x1 MB L2 Cache, 2.60 GHz) = 1.00, axis from 0.00 to 2.00. Games: Unreal* Tournament 2004 "Botmatch", Doom* 3 build 1062, Quake* 4 patch 1.0.5 SMP=1, F.E.A.R* v1.02, Half-Life* 2 build 2596 Lost Coast. Conroe bars: 1.36, 1.32, 1.30, 1.30 and 1.26, labeled Conroe 36%, 32%, 30%, 30% and 26% above comp; the per-game assignment of the bars is not recoverable here.]

Source: Intel. Configuration: Pre-production processor code-name "Conroe" (4MB L2 Cache, 2.66 GHz, 1066 MHz FSB), Intel® 975X Express Chipset, modified Intel® D975XBX board, pre-production BIOS; Memory 1GB DDR2 667 5-5-5-15 (2x512MB); Intel® Chipset Software Installation Utility 7.2.2.1006; AMD* Athlon* 64 FX60 (2x1MB, 2.60 GHz), ATI* Radeon* express* 200 chipset, DFI LanPartyUT RDX200 CF-DR board, BIOS RDXDC23.BIN 12/23/2005, Drivers = ATI 5-8-igp_xp-2k_dd_ccc_wdm_sb_gart_enu_25203.exe; Memory 1GB DDR 400 2-2-2-5 (2x512MB); All Platforms: Dual ATI* Radeon* X1900 XTX PCIe, ATI Catalyst 6.3 Driver Suite: display driver version: 8.23.1; Maxtor* DiamondMax* 10 6B300S0 300GB NCQ Serial ATA (7200 RPM, 16MB cache), DirectX 9.0c, Operating System: Windows* XP Professional Build 2600 SP2 NTFS.

Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, visit http://www.intel.com/performance/resources/index.htm

AMD expected to add DDR2 support mid 2006.

*Other names and brands may be claimed as the property of others.
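The "above comp" labels in the game performance chart follow directly from the normalized scores. A minimal Python sketch of that arithmetic, assuming the baseline processor is fixed at 1.00 as in the chart (the function name `percent_above` is hypothetical and only used for illustration):

```python
def percent_above(normalized_score, baseline=1.00):
    """Convert a normalized score into a whole-number percentage above the baseline."""
    return round((normalized_score / baseline - 1.0) * 100)


# Normalized Conroe scores as they appear in the chart (baseline = 1.00).
conroe_scores = [1.36, 1.32, 1.30, 1.30, 1.26]
percentages = [percent_above(score) for score in conroe_scores]
print(percentages)
```

Because every score is normalized to the same baseline, the percentages only compare CPU contribution under the stated settings; they say nothing about absolute frame rates.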

Wider and Deeper Machine

• Queue of LRU code window
• Small loops can reside in queue
• Faster to fetch from queue
• 4 decoders vs 1 in NetBurst
• 4-5 instructions decoded in parallel
• CMP+JMP macro-fused to 1 µop
• Improved µops + microfusion
• 4 wide register renaming
• 4 wide µop allocation
• 4 wide in order µop retirement
• Out of order scheduling
• Out of order execution

[Block diagram labels: Instruction Fetch and PreDecode; Instruction Queue; Decode with uCode ROM; Rename/Alloc; Retirement Unit (ReOrder Buffer); Schedulers; execution units ALU, ALU, ALU, Branch, FAdd, FMul, MMX/SSE (x3), FPmove (x3), Load, Store; L1 D-Cache and D-TLB; 2M/4M shared L2 Cache; FSB up to 10.4 Gb/s. Widths marked between stages: 5, 4, 4.]
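To make the CMP+JMP macro-fusion point concrete, here is a toy Python model that counts micro-ops for a short instruction sequence. It is only a sketch under loud assumptions: every instruction is assumed to decode to exactly one micro-op unless fused, any mnemonic starting with "J" is treated as a jump, and the function name `count_uops` is hypothetical. It does not describe the real decoder.

```python
def count_uops(mnemonics):
    """Count micro-ops, fusing a CMP that is directly followed by a jump into one micro-op.

    Assumption: every other instruction costs exactly one micro-op.
    """
    uops = 0
    i = 0
    while i < len(mnemonics):
        current = mnemonics[i].upper()
        following = mnemonics[i + 1].upper() if i + 1 < len(mnemonics) else ""
        if current == "CMP" and following.startswith("J"):
            i += 2  # the compare and the jump travel together as one micro-op
        else:
            i += 1
        uops += 1
    return uops


loop_body = ["MOV", "ADD", "CMP", "JNE"]
fused = count_uops(loop_body)
unfused = len(loop_body)
print(fused, unfused)
```

In this model a four-instruction loop body costs three micro-ops instead of four, which is the kind of saving that matters in small, hot loops.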

Improved SSE performance

[Figure residue: "Only ... per clock cycle"; the values it compared are not recoverable.]

SSE latency is down to 1 cycle vs 3 cycles.

Improved Memory Access 1/2: Memory Disambiguation

[Diagram: the instruction sequence Store1 Y, Load2 Y, Store3 W, Load4 X against memory holding Data W, Data X, Data Y and Data Z, shown once without and once with disambiguation.]

Without disambiguation: subsequent loads must wait.
With disambiguation: subsequent loads can decouple from stores.
Result: improved latency hiding.
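The idea behind memory disambiguation can be sketched with the four operations from the diagram. The Python below is a simplified model, not the hardware mechanism: it assumes all addresses are known up front, whereas a real processor has to predict whether a load conflicts with an earlier store. The function name `loads_free_of_earlier_stores` is hypothetical.

```python
def loads_free_of_earlier_stores(ops):
    """Return the loads that do not read an address written by any earlier store.

    Each op is a (kind, name, address) tuple in program order.
    """
    stored_addresses = set()
    independent = []
    for kind, name, address in ops:
        if kind == "store":
            stored_addresses.add(address)
        elif kind == "load" and address not in stored_addresses:
            independent.append(name)
    return independent


program = [
    ("store", "Store1", "Y"),
    ("load", "Load2", "Y"),   # depends on Store1, must see its data
    ("store", "Store3", "W"),
    ("load", "Load4", "X"),   # no earlier store touches X
]
early_loads = loads_free_of_earlier_stores(program)
print(early_loads)
```

Here only Load4 is free to start ahead of the stores in front of it; Load2 reads the address Store1 writes and has to wait for that data.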

Improved Memory Access 2/2: Prefetchers

[Diagram: loads Load1 (oldest) to Load4 (youngest) are served from the L1 Data Cache, the shared L2 Data Cache, or memory.]

Memory is far away and slow.
Caches are closer and fast when they have the data.
Prefetchers now detect applications' data reference patterns and bring the data closer to caches to avoid latencies.

• 8 prefetchers per two-core processor
  ⇒ 2 data prefetchers per core
  ⇒ 1 instruction prefetcher per core
  ⇒ able to handle multiple simultaneous patterns
• 2 per CPU prefetchers in the L2 cache tracking multiple patterns per core
• Prefetchers monitor demand traffic and regulate "aggression"
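One simple kind of data reference pattern a prefetcher can detect is a constant stride. The following Python sketch illustrates that idea only; the slides do not describe the actual detection logic, so the three-access rule, the `depth` parameter and the name `predict_next_addresses` are all assumptions made for illustration.

```python
def predict_next_addresses(history, depth=2):
    """Predict the next `depth` addresses if the last three accesses share a constant stride.

    Returns an empty list when no stable non-zero stride is visible.
    """
    if len(history) < 3:
        return []
    latest_stride = history[-1] - history[-2]
    previous_stride = history[-2] - history[-3]
    if latest_stride != previous_stride or latest_stride == 0:
        return []
    return [history[-1] + latest_stride * step for step in range(1, depth + 1)]


sequential = predict_next_addresses([0, 64, 128])   # walking an array line by line
irregular = predict_next_addresses([0, 64, 200])    # no stable stride
print(sequential, irregular)
```

Access patterns that look like the first call (linear walks over arrays) give such a detector something to lock onto, while pointer-chasing patterns like the second do not.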

Improved Cache

Shared Cache: Core1 and Core2 share one L2 Cache.

Benefits:
• Fast data sharing without FSB
• Fast bandwidth sharing
• No replicated data

Plus:
• 2X BW to L1 caches
• Split stores down to 11 cycles

Independent Cache (today): Core1 and Core2 each have their own L2 Cache.

Shared Cache adapts to mismatched loads. Independent Cache can thrash a heavy app even when the other cache is under-utilized.
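The mismatched-load argument can be illustrated with a small LRU simulation in Python. This is a toy model under stated assumptions: caches are fully associative with LRU replacement, sizes are counted in cache lines, and the working-set sizes (6 lines for the heavy core, 2 for the light one) are made-up numbers chosen to show the effect. None of this reflects the real L2 organization.

```python
from collections import OrderedDict


def count_misses(trace, capacity):
    """Count misses of a fully associative LRU cache holding `capacity` lines."""
    cache = OrderedDict()
    misses = 0
    for line in trace:
        if line in cache:
            cache.move_to_end(line)        # mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


ROUNDS = 10
heavy_round = [("core1", i) for i in range(6)]  # heavy app: 6-line working set
light_round = [("core2", i) for i in range(2)]  # light app: 2-line working set

# Independent caches: 4 lines per core, each core only sees its own trace.
split_misses = (count_misses(heavy_round * ROUNDS, 4)
                + count_misses(light_round * ROUNDS, 4))

# Shared cache: the same 8 lines in total, used by both cores together.
shared_misses = count_misses((heavy_round + light_round) * ROUNDS, 8)
print(split_misses, shared_misses)
```

With the same total capacity, the heavy core thrashes its private 4-line cache on every access while half of the light core's cache sits idle; the shared cache holds both working sets and only takes the initial cold misses.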

Improved Power Saving

• Ultra fine-grained power control
• Turn on only the required logic systems
• Even during performance execution, parts can be shut off

Improved Power Saving delivers more performance per watt


Summary

• New architecture provides excellent single-core (SC) speedup: the biggest non-frequency related speedup in years
• 2 cores result in even bigger speedup
• Use the speedup for more advanced games
• New architecture provides excellent MIPS/Watt
• Contact me mailto:[email protected]


Q&A
