Intel's new Core™ Microarchitecture
Software and Solutions Group
Holger Gruen, Senior Application Engineer, Global Developer Relationship Division
July 14, 2006
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY RELATING TO SALE AND/OR USE OF INTEL PRODUCTS, INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT, OR OTHER INTELLECTUAL PROPERTY RIGHT.
Intel may make changes to specifications, product descriptions, and plans at any time, without notice.
All dates provided are subject to change without notice.
* Other names and brands may be claimed as the property of others.
Copyright © 2006, Intel Corporation.
2 7/17/2006 Agenda
• Intel's Multi-Core Roadmap
• Intel's new processor
• Game Performance
• Microarchitecture Details
• Summary
• Q&A
Intel's Multi-Core Roadmap
Timeline: 2005, 2006, 2007, 2008

Desktop
• Smithfield XE (2C:4T), EM64T
• Smithfield (2C:2T), EM64T
• Presler XE (2C:4T), EM64T
• Presler (2C:2T), EM64T
• Conroe (2C:2T), EM64T
• Kentsfield (4C:4T), EM64T: 1st quad-core
Note: P4 based Pentium D CPUs

Mobile
• Yonah (2C:2T): 1st dual-core mobile CPU
• Merom (2C:2T), EM64T

The future is multi-core
Converged core line: Intel® Core™ Microarchitecture
Introducing Intel's New Microarchitecture
The Intel® Core™ 2 Duo Processor Microarchitecture combines:
• Lessons learned from NetBurst® Microarchitecture
• Well working features of the Mobile Microarchitecture
• New innovations

Innovations:
• Wider and Deeper Machine (4 wide vs 3 wide)
• Improved SSE performance
• Improved Memory Access
• Improved Cache
• Improved Power Saving

Other improvements:
• Shorter pipeline: 14 stages
• 1 additional integer port
• Improved latency for many ops
Intel® Core™ 2 Duo 3D Game Performance
Chart: normalized score per game, AMD* Athlon64* FX60 = 1.00, axis from 0.00 to 2.00.
Legend:
• Pre-production Processor code-name "Conroe" (4 MB L2, 2.66 GHz, 1066 MHz FSB)
• AMD* Athlon64* FX60 Processor (2x1 MB L2 Cache, 2.60 GHz)
Games: Unreal* Tournament 2004 "Botmatch"; Doom* 3 build 1062; Quake* 4 patch 1.0.5, SMP=1; F.E.A.R* v1.02; Half-Life* 2 build 2596, Lost Coast.
Conroe scores: 1.36, 1.32, 1.30, 1.30 and 1.26, that is 26% to 36% above comp.
• This is not the fastest Conroe
• Most of these games use only one core!
Conroe measured on pre-production hardware and drivers. Final performance information may vary from these results.
Screen resolution was set to 1024x768x32. Games use medium settings. These settings are used for comparing CPU contribution to game performance.
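As a rough cross-check of the chart, the five normalized Conroe scores can be summarized in a few lines. This sketch uses only the values printed on the slide. The helper name `percent_above` is an illustrative choice, and the geometric mean is one common way to average ratios, not something the slide itself computes.

```python
from statistics import geometric_mean

# Conroe scores from the chart, normalized to the Athlon 64 FX60 = 1.00.
# Only the set of values is used here, not which game each belongs to.
conroe_normalized = [1.36, 1.32, 1.30, 1.30, 1.26]


def percent_above(score, baseline=1.00):
    """Convert a normalized score into 'percent above the baseline'."""
    return round((score / baseline - 1.0) * 100)


percents = [percent_above(s) for s in conroe_normalized]
mean_speedup = geometric_mean(conroe_normalized)

print(percents)
print(f"geometric mean speedup: {mean_speedup:.2f}x")
```

The percentages reproduce the "above comp" callouts, and the mean lands at roughly 1.31x.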
Source: Intel. Configuration: Pre-Production Processor code-name "Conroe" (4MB L2 Cache, 2.66 GHz, 1066 MHz FSB), Intel® 975X Express Chipset, modified Intel® D975XBX board, pre-production BIOS; Memory 1GB DDR2 667 5-5-5-15 (2x512MB); Intel® Chipset Software Installation Utility 7.2.2.1006; AMD* Athlon* 64 FX60 (2x 1MB, 2.60 GHz), ATI* Radeon* express* 200 chipset, DFI LanPartyUT RDX200 CF-DR board, BIOS RDXDC23.BIN 12/23/2005, Drivers = ATI 5-8-igp_xp-2k_dd_ccc_wdm_sb_gart_enu_25203.exe; Memory 1GB DDR 400 2-2-2-5 (2x512MB); All Platforms: Dual ATI* Radeon* X1900 XTX PCIe, ATI Catalyst 6.3 Driver Suite: display driver version: 8.23.1; Maxtor* DiamondMax* 10 6B300S0 300GB NCQ Serial ATA (7200 RPM, 16MB cache), DirectX 9.0c, Operating System: Windows* XP Professional Build 2600 SP2 NTFS.
Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, visit http://www.intel.com/performance/resources/index.htm
AMD expected to add DDR2 support mid 2006.
Wider and Deeper Machine
Block diagram, top to bottom: Instruction Fetch and PreDecode; Instruction Queue; Decode (with uCode ROM); Rename/Alloc; Retirement Unit (ReOrder Buffer); Schedulers; execution ports; L1 D-Cache and D-TLB. Path widths marked in the diagram: 5, 4, 4.
Execution ports:
• ALU, Branch, MMX/SSE, FPmove
• ALU, FAdd, MMX/SSE, FPmove
• ALU, FMul, MMX/SSE, FPmove
• Load
• Store
Memory side: 2M/4M shared L2 Cache; FSB up to 10.4 Gb/s.

• Queue of LRU code window
• Small loops can reside in queue
• Faster to fetch from queue
• 4 decoders vs 1 in NetBurst
• 4-5 instructions decoded in parallel
• CMP+JMP macro-fused to 1 µop
• Improved µops + microfusion
• 4 wide register renaming
• 4 wide µop allocation
• 4 wide in order µop retirement
• Out of order scheduling
• Out of order execution
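To make the macro-fusion bullet concrete, here is a toy model, not Intel's actual decoder. It assumes every instruction is exactly one micro-op, treats any mnemonic starting with "j" as a jump, and uses a hypothetical five-instruction loop body. It shows how fusing CMP with the following jump lets five instructions pass through a 4-wide stage in one cycle.

```python
# Toy model: count micro-ops after macro-fusion and the cycles a
# 4-wide pipeline stage needs to pass them along.
WIDTH = 4  # micro-ops per cycle, per the "4 wide" figures on the slide


def fuse_cmp_jmp(instructions):
    """Fuse each 'cmp' that is immediately followed by a jump into one micro-op."""
    uops = []
    i = 0
    while i < len(instructions):
        op = instructions[i]
        nxt = instructions[i + 1] if i + 1 < len(instructions) else None
        if op == "cmp" and nxt is not None and nxt.startswith("j"):
            uops.append("cmp+" + nxt)  # one fused micro-op
            i += 2
        else:
            uops.append(op)
            i += 1
    return uops


def cycles_needed(uop_count, width=WIDTH):
    """Ceiling division: cycles to move uop_count micro-ops through the stage."""
    return -(-uop_count // width)


loop_body = ["mov", "add", "inc", "cmp", "jne"]  # hypothetical loop body
fused = fuse_cmp_jmp(loop_body)

print("without fusion:", len(loop_body), "uops,", cycles_needed(len(loop_body)), "cycles")
print("with fusion:   ", len(fused), "uops,", cycles_needed(len(fused)), "cycle")
```

Real decoders have further restrictions on which instruction pairs can fuse, so treat the numbers as illustrative only.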
Improved SSE Performance
Only
Per clock cycle
SSE Latency is down to 1 cycle vs 3 cycles
Improved Memory Access 1/2: Memory Disambiguation
Diagram: four memory operations numbered 1 to 4 (Store1 Y, Load2 Y, Store3 W, Load4 X) accessing Data W, Data X, Data Y and Data Z in memory, shown in two panels.
• Without Disambiguation: Subsequent Loads Must Wait
• With Disambiguation: Subsequent Loads can decouple from Stores ⇒ Improved Latency Hiding
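The difference between the two panels can be sketched with a small model. This is a simplification under stated assumptions: all addresses are known up front (real hardware has to predict whether a load aliases an older store), and the function name `stores_blocking` is made up for illustration. The operation sequence is the one from the diagram.

```python
# The four operations from the diagram, in their numbered order.
program = [("store", "Y"), ("load", "Y"), ("store", "W"), ("load", "X")]


def stores_blocking(program, index, disambiguate):
    """Return the older stores that the load at `index` has to wait for."""
    kind, addr = program[index]
    if kind != "load":
        raise ValueError("index must point at a load")
    older_stores = [(i, a) for i, (k, a) in enumerate(program[:index]) if k == "store"]
    if not disambiguate:
        return older_stores  # wait for every older store
    # With disambiguation, only stores to the same address are real conflicts.
    return [(i, a) for i, a in older_stores if a == addr]


for idx, (kind, addr) in enumerate(program):
    if kind == "load":
        without = stores_blocking(program, idx, disambiguate=False)
        with_d = stores_blocking(program, idx, disambiguate=True)
        print(f"load {addr}: waits for {without} without, {with_d} with disambiguation")
```

In this model Load4 X no longer waits for either store, while Load2 Y still has to respect Store1 Y because both touch the same address.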
Improved Memory Access 2/2: Prefetchers

Diagram: Load1 (oldest) to Load4 (youngest), L1 Data Cache, Shared L2 Data Cache.
• Memory is far away and slow
• Caches are closer and fast when they have the data
• Prefetchers now detect applications' data reference patterns
• And bring the data closer to caches to avoid latencies

• 8 Prefetchers per two-core processor
  ⇒ 2 data prefetchers per core
  ⇒ 1 instruction prefetcher per core
  ⇒ able to handle multiple simultaneous patterns
• 2 per CPU prefetchers in the L2 cache tracking multiple patterns per core
• Prefetchers monitor demand traffic and regulate "aggression"
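How a prefetcher might "detect data reference patterns" can be illustrated with a minimal stride detector. This is a generic textbook-style sketch, not the mechanism of the Core microarchitecture. The function names, the two-confirmation rule, the `lookahead` parameter (a stand-in for "aggression") and the 64-byte stride are all assumptions.

```python
def detect_stride(addresses, confirmations=2):
    """Return the stride if the last `confirmations` deltas are equal and nonzero."""
    if len(addresses) < confirmations + 1:
        return None
    recent = addresses[-(confirmations + 1):]
    deltas = [b - a for a, b in zip(recent, recent[1:])]
    if deltas[0] != 0 and all(d == deltas[0] for d in deltas):
        return deltas[0]
    return None


def simulate(addresses, lookahead=1):
    """Count demand accesses that an earlier prefetch had already requested."""
    prefetched = set()
    history = []
    covered = 0
    for addr in addresses:
        if addr in prefetched:
            covered += 1
        history.append(addr)
        stride = detect_stride(history)
        if stride is not None:
            # Request the next `lookahead` addresses along the detected stride.
            for k in range(1, lookahead + 1):
                prefetched.add(addr + k * stride)
    return covered


stream = [0x1000 + 64 * i for i in range(8)]  # hypothetical regular access stream
print(simulate(stream), "of", len(stream), "accesses covered by prefetch")
```

The first three accesses are needed to establish the pattern; an irregular stream never triggers a prefetch in this model.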
Improved Cache
Shared Cache: Core1 and Core2 share one L2 Cache.
Benefits:
• Fast data sharing without FSB
• Fast bandwidth sharing
• No replicated data
Plus:
• 2X BW to L1 caches
• Split stores down to 11 cycles

Independent Cache (today): Core1 and Core2 each have their own L2 Cache.

Shared Cache adapts to mismatched loads. Independent Cache can thrash heavy app even when other cache is under-utilized.
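The "mismatched loads" point can be sketched with a tiny LRU simulation. Everything here is a simplifying assumption: fully associative caches, capacities of 8 lines shared versus 4 + 4 lines private, and a hypothetical heavy core cycling over 6 lines while a light core reuses 1 line.

```python
from collections import OrderedDict


class LRUCache:
    """Fully associative LRU cache that counts hits; capacity is in lines."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()
        self.hits = 0

    def access(self, line):
        if line in self.lines:
            self.hits += 1
            self.lines.move_to_end(line)  # mark as most recently used
        else:
            if len(self.lines) >= self.capacity:
                self.lines.popitem(last=False)  # evict least recently used
            self.lines[line] = True


def hit_rate(shared, rounds=100):
    heavy = [("core1", i) for i in range(6)]  # hypothetical 6-line working set
    light = [("core2", 0)]                    # hypothetical 1-line working set
    if shared:
        c1 = c2 = LRUCache(8)                 # one 8-line cache for both cores
    else:
        c1, c2 = LRUCache(4), LRUCache(4)     # two private 4-line caches
    for _ in range(rounds):
        for line in heavy:
            c1.access(line)
        for line in light:
            c2.access(line)
    hits = c1.hits if shared else c1.hits + c2.hits
    return hits / (rounds * (len(heavy) + len(light)))


print(f"shared: {hit_rate(True):.2f}  independent: {hit_rate(False):.2f}")
```

With private caches the heavy core thrashes its 4 lines while most of the other cache sits idle; the shared cache holds both working sets after warm-up.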
Improved Power Saving
• Ultra fine-grained power control
• Turn on only required logic systems
• Even during performance execution, parts can be shut off
Improved Power Saving delivers more performance per watt
Summary
• New Arch provides excellent SC speedup
  - Biggest non-frequency related speedup in years
• 2 cores result in even bigger speedup
• Use speedup for more advanced games
• New Arch provides excellent Mips/Watt
• Contact me mailto:[email protected]
Q&A