Distinguished Lecture 2003
Intel Distinguished Lecture 2003

Billion Transistor Processor Chips in Mainstream Enterprise Platforms of the Future
Dileep Bhandarkar
Architect at Large, Enterprise Platforms Group, Intel Corporation
February 10th, 2003
Ninth International Symposium on High Performance Computer Architecture, Anaheim, California
Copyright © 2002 Intel Corporation.

Outline
• Semiconductor Technology Evolution
• Moore’s Law Video
• Parallelism in Microprocessors Today
• Multiprocessor Systems
• The Billion Transistor Chip
• Summary
©2002, Intel Corporation. Intel, the Intel logo, Pentium, Itanium and Xeon are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.
*Other names and brands may be claimed as the property of others.

Birth of the Revolution: The Intel 4004
• Introduced November 15, 1971
• 108 kHz, 50 KIPS, 2,300 transistors on a 10 µm process

2001 – Pentium® 4 Processor
• Introduced November 20, 2000 @1.5 GHz core, 400 MT/s bus
• 42 million 0.18 µm transistors
• August 27, 2001 @2 GHz, 400 MT/s bus
 – 640 SPECint_base2000*
 – 704 SPECfp_base2000*
Source: http://www.specbench.org/cpu2000/results/

30 Years of Progress
4004 to Pentium® 4 processor
 – Transistor count: 20,000x increase
 – Frequency: 20,000x increase
 – 39% compound annual growth rate

2002 – Pentium® 4 Processor
• November 14, 2002 @3.06 GHz, 533 MT/s bus
 – 1099 SPECint_base2000*
 – 1077 SPECfp_base2000*
• 55 million transistors, 130 nm process
Source: http://www.specbench.org/cpu2000/results/

Itanium® 2 Processor Overview
• 0.18 µm bulk, 6-layer Al process
• 8-stage, fully stalled in-order pipeline
• Symmetric six integer-issue design
• IA-32 execution engine integrated
• 3 levels of cache on-die totaling 3.3 MB
• 221 million transistors
• 130 W @1 GHz, 1.5 V
• 421 mm² die
• 142 mm² CPU core
[Die photo, 21.6 mm x 19.5 mm. Labeled blocks: Branch Unit, Floating Point Unit, IA32, Pipeline Control, L1I cache, ALAT, Multi-Media unit, Integer Datapath, Int RF, L1D cache, CLK, HPW, DTLB, L2D Array and Control, L3 Tag, Bus Logic, L3 Cache]

Madison Processor
• ~1.5 GHz
• 6 MB L3 cache, 24-way set associative
• ~130 W
• 410 M transistors
• 374 square mm
[Die photo. Labeled blocks: Branch Unit, Floating Point Unit, Pipeline Control, L1I Cache, IA32, ALAT, Multi-Media Unit, Integer DataPath, Int RF, L1D Cache, CLK, HPW, DTLB, L2D Cache, L3 Tag/LRU, PADS, Bus Logic, L3 Cache]

Continuing at this Rate by End of the Decade
[Chart: transistors per chip in millions (log scale, 0.001 to 10,000) versus year (’70 to ’10). Plotted: 4004, 8008, 8080, 8085, 8086, 286, 386, 486, Pentium proc, Pentium® Pro, Pentium® 4, Itanium® 2]
• Itanium® 2: 221M in 2002
• Itanium® 2: 410M in 2003
Billion transistors possible within 4 years

“If the automobile industry advanced as rapidly as the semiconductor industry, a Rolls Royce would get 1/2 million miles per gallon and it would be cheaper to throw it away than to park it.”
Gordon Moore, Intel Corporation

Nanotechnology Advancements
Node     Lgate    Production
90 nm    50 nm    2003
65 nm    30 nm    2005
45 nm    20 nm    2007
30 nm    15 nm    2009
[Micrographs: influenza virus (50nm) alongside the transistor gate at each node]
Process Advancements Fulfill Moore’s Law
Source: Intel

Semiconductor Manufacturing Process Evolution
Actual (left) to Forecast (right)
Process name     P852     P854     P856     P858     Px60     P1262    P1264
Production       1993     1995     1997     1999     2001     2003     2005
Generation       0.50 µm  0.35 µm  0.25 µm  0.18 µm  130 nm   90 nm    65 nm
Gate Length      0.50 µm  0.35 µm  0.20 µm  0.13 µm  <70 nm   <50 nm   <35 nm
Wafer Size (mm)  200      200      200      200      200/300  300      300
New generation every 2 years

Moore’s Law

Parallelism at Multiple Levels
• Within a processor
 – multiple issue processors with lots of execution units
 – wider superscalar
 – explicit parallelism
• Multiple processors on a chip
 – Hardware Multi Threading
 – Multiple cores
• System Level Multiprocessors

EPIC Architecture Features
Explicitly Parallel Instruction Computing
• Enable wide execution by providing processor implementations that the compiler can take advantage of
• Performance through parallelism
 – Multiple execution units and issue ports in parallel
 – 2 bundles (up to 6 instructions) dispatched every cycle
• Massive on-chip resources
 – 128 general registers, 128 floating point registers
 – 64 predicate registers, 8 branch registers
 – Exploit parallelism
 – Efficient management engines (register stack engine)
• Provide features that enable the compiler to reschedule programs using advanced features (predication, speculation)
• Enable, enhance, express, and exploit parallelism

Instruction Formats: Bundles
Bits 127..87     Instruction Slot 2  (41 bits)
Bits 86..46      Instruction Slot 1  (41 bits)
Bits 45..5       Instruction Slot 0  (41 bits)
Bits 4..0        Template            (5 bits)
Total: 128 bits
Template identifies types of instructions in bundle and delineates independent operations (through “stops”)
• Instruction types
 – M: Memory
 – I: Shifts and multimedia
 – A: ALU
 – B: Branch
 – F: Floating point
 – L+X: Long
• Template encodes types
 – MII, MLX, MMI, MFI, MMF, MI_I, M_MI
 – Branch: MIB, MMB, MFB, MBB, BBB
• Template encodes parallelism
 – All come in two flavors: with and without stop at end

Itanium® 2 Processor Architecture
• High speed system bus: 6.4 GB/s, 128 bits wide, 400 MHz
• Large on-die cache, reduced latency: 3 MB L3, 256k L2, 32k L1, all on-die
• 8-stage pipeline
• 11 issue ports
• 328 on-board registers
• Large number of execution units: 6 Integer, 3 Branch, 2 FP, 1 SIMD, 2 Load & 2 Store
• Lots of parallelism within a single core: 1 GHz, 6 instructions / cycle
• 221 million transistors total, 25 million in CPU core logic

Itanium® 2 Processor Block Diagram
[Block diagram. Labeled blocks: L1 Instruction Cache and Fetch/Pre-fetch Engine, ITLB, Branch Prediction, IA-32 Decode and Control, Instruction Queue (8 bundles), 11 Issue Ports (B B B M M M M I I F F), Register Stack Engine / Re-Mapping, Branch & Predicate Registers, 128 Integer Registers, 128 FP Registers, Branch Units, Integer and MM Units, Quad-Port L1 Data Cache, DTLB, ALAT, Floating Point Units, Scoreboard/Predicate/NaTs/Exceptions, Quad Port L2 Cache, L3 Cache, Bus Controller, ECC on the cache arrays]

Integer & FP Performance
[Bar chart: SPECint2000_base and SPECfp2000_base, scale 0 to 1400, for Power 4 1.3 GHz, Alpha 21264C 1.25 GHz, Itanium 2 Processor 1 GHz, UltraSPARC III Cu 1.056 GHz, Pentium 4 Processor 2.8 GHz]
Source: www.specbench.org, 9/10/2002
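
Note on the bundle layout: the bit positions given on the Instruction Formats slide (a 5-bit template in bits 4..0, then three 41-bit slots) can be sketched as a pack/unpack pair. This is a minimal illustration only. The function names `pack_bundle` and `unpack_bundle` are hypothetical, and the template and slot values in the usage line are arbitrary placeholders, not real Itanium encodings.

```python
# Field widths from the bundle slide: a 5-bit template in bits 4..0,
# followed by three 41-bit instruction slots (slot 0 is the lowest).
TEMPLATE_BITS = 5
SLOT_BITS = 41
BUNDLE_BITS = TEMPLATE_BITS + 3 * SLOT_BITS  # 5 + 123 = 128


def pack_bundle(template, slot0, slot1, slot2):
    """Pack a template and three slots into one 128-bit integer.

    Hypothetical helper for illustration; values are not validated
    against any real template encoding.
    """
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template must fit in 5 bits")
    bundle = template
    for index, slot in enumerate((slot0, slot1, slot2)):
        if not 0 <= slot < (1 << SLOT_BITS):
            raise ValueError("each slot must fit in 41 bits")
        # Slot 0 starts at bit 5, slot 1 at bit 46, slot 2 at bit 87.
        bundle |= slot << (TEMPLATE_BITS + index * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a 128-bit integer into (template, slot0, slot1, slot2)."""
    if not 0 <= bundle < (1 << BUNDLE_BITS):
        raise ValueError("bundle must fit in 128 bits")
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slot_mask = (1 << SLOT_BITS) - 1
    slots = tuple(
        (bundle >> (TEMPLATE_BITS + index * SLOT_BITS)) & slot_mask
        for index in range(3)
    )
    return (template, *slots)


# Placeholder values only: round-trip a bundle through pack and unpack.
example = pack_bundle(0x10, 1, 2, 3)
fields = unpack_bundle(example)
```

Keeping the widths as named constants makes the arithmetic on the slide checkable: three 41-bit slots plus the 5-bit template account for all 128 bits, and each slot's shift matches the bit ranges shown.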

Long Latency DRAM Accesses: Needs Memory Level Parallelism (MLP)
[Bar chart: Peak Instructions during DRAM Access, scale 0 to 1400, for Pentium® Processor 66 MHz, Pentium-Pro Processor 200 MHz, Pentium III Processor 1100 MHz, Pentium 4 Processor 2 GHz, Future CPUs]

Multithreading
• Introduced on Intel® Xeon™ Processor MP
• Two logical processors for < 5% additional die area
• Executes two tasks simultaneously
 – Two different applications
 – Two threads of same application
• CPU maintains architecture state for two processors
 – Two logical processors per physical processor
• Power efficient performance gain
• 20-30% performance improvement on many throughput oriented workloads

HyperThreading Technology: What was added?
[Die photo. Labeled additions: Instruction Streaming Buffers, Next Instruction Pointer, Return Stack Predictor, Trace Cache Next IP, Trace Cache Fill Buffers, Instruction TLB, Register Alias Tables]
<5% die size (& max power), up to 30% performance increase

IBM Power4 Dual Processor on a Chip
• Two cores (~30M transistors each)
• Large Shared L2: Multi-ported: 3 independent slices
• L3 & Mem Controller: L3 tags on-die for full-speed coherency checks
• Chip-to-Chip & MCM-to-MCM Fabric: Glueless SMP

HP PA-8800 Dual Processor on a Chip

Outline
• Semiconductor Technology Evolution
• Moore’s Law Video
• Parallelism in Microprocessors Today
• Multiprocessor Systems
• The Billion Transistor Chip
• Summary
©2002, Intel Corporation Intel, the Intel logo, Pentium, Itanium and Xeon are