Inside the Pentium® 4 Processor Micro-Architecture
Total Page:16
File Type:pdf, Size:1020Kb
InsideInside thethe Fall 2000 PentiumPentium®® 44 ProcessorProcessor MicroMicro--architecturearchitecture NextNext GenerationGeneration IAIA--3232 MicroMicro--architecturearchitecture DougDoug CarmeanCarmean PrincipalPrincipal ArchitectArchitect IntelIntel ArchitectureArchitecture GroupGroup August 24, 2000 Intel Copyright © 2000 Intel Corporation. Labs AgendaAgenda Fall 2000 l IA-32 Processor Roadmap l Design Goals l Frequency l Instructions Per Cycle l Summary Intel Copyright © 2000 Intel Corporation. PDX ® ® IntelIntel PentiumPentium 44 ProcessorProcessor Fall 2000 Intel® NetBurst™ Micro-Architecture Today P6 Micro-Architecture Performance Performance P5 Micro-Architecture 486 Micro-architecture Time Intel Copyright © 2000 Intel Corporation. PDX Intel®Intel® Pentium®Pentium® 44 ProcessorProcessor DesignDesign GoalsGoals Fall 2000 l Deliver world class performance across both existing and emerging applications l Deliver performance headroom and scalability for the future MicroMicro--architecturearchitecture that that will will Drive Drive Performance Performance LeadershipLeadership for for the the Next Next Several Several Years Years Intel Copyright © 2000 Intel Corporation. PDX Intel®Intel® NetBurst™NetBurst™ MicroMicro--architecturearchitecture Fall 2000 400 MHz Advanced System Advanced Transfer Bus Transfer Cache Advanced Dynamic Execution Hyper Pipelined Rapid Rapid Technology Execution Engine Streaming SIMD Extensions 2 Execution Enhanced Floating Trace Cache Point / Multi-Media Intel Copyright © 2000 Intel Corporation. PDX Pentium®Pentium® 44 ProcessorProcessor BlockBlock DiagramDiagram L2 Cache and Control Store BTB AGU Load AGU TLB ALU TLB - - ALU ALU TLB TLB 3.2 GB/s System Interface - - Integer RF 3 3 Integer RF ALU Queues Queues Decoder Decoder Schedulers Schedulers uop uop Cache and D BTB & I Cache and D BTB & I Trace Cache Trace Cache FP move - - Rename/Alloc Rename/Alloc FP store FMul FP RF L1 D FP RF FAdd L1 D uCode MMX ROM SSE Pentium®Pentium® 44 ProcessorProcessor BlockBlock DiagramDiagram L2 Cache and Control Store BTB AGU Load AGU TLB ALU TLB - - ALU ALU TLB TLB 3.2 GB/s System Interface - - Integer RF 3 3 Integer RF ALU Queues Queues Decoder Decoder Schedulers Schedulers uop uop Cache and D BTB & I Cache and D BTB & I Trace Cache Trace Cache FP move - - Rename/Alloc Rename/Alloc FP store FMul FP RF L1 D FP RF FAdd L1 D uCode MMX ROM SSE CPUCPU ArchitectureArchitecture 101101 Fall 2000 DeliveredDelivered PerformancePerformance == FrequencyFrequency ** InstructionsInstructions PerPer CycleCycle FrequencyFrequencyFrequency Intel Copyright © 2000 Intel Corporation. PDX FrequencyFrequency Fall 2000 l What limits frequency? –Process technology –Microarchitecture l On a given process technology –Fewer gates per pipeline stage will deliver higher frequency FrequencyFrequency isis drivendriven byby MicroMicro--architecturearchitecture Intel Copyright © 2000 Intel Corporation. PDX NetburstNetburstTM MicroMicro--architecturearchitecture PipelinePipeline vsvs P6P6 Fall 2000 Basic P6 Pipeline Intro at 733MHz 1 2 3 4 5 6 7 8 .18µ9 10 Fetch Fetch Decode Decode Decode Rename ROB Rd Rdy/Sch Dispatch Exec Basic Pentium® 4 Processor Pipeline Intro at 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16³³ 171.4GHz18 19 20 TC Nxt IP TC Fetch Drive Alloc Rename Que Sch Sch Sch Disp Disp RF RF Ex.18µFlgs Br Ck Drive HyperHyper pipelinedpipelined TechnologyTechnology enablesenables industryindustry leadingleading performanceperformance andand clockclock raterateIntel Copyright © 2000 Intel Corporation. PDX HyperHyper PipelinedPipelined Fall 2000 TechnologyTechnology 20 Netburst™ Micro- Today Architecture 1.13GHz ³³1.4GHz 10 P6 Micro-Architecture Frequency Frequency 233MHz 166MHz 5 60MHz P5 Micro-Architecture Introduction Time Intel Copyright © 2000 Intel Corporation. PDX HyperHyper pipelinedpipelined TechnologyTechnology Fall 2000 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 TC Nxt IP TC Fetch Drive Alloc Rename Que Sch Sch Sch Disp Disp RF RF Ex Flgs Br Ck Drive TC Nxt IP: Trace cache next instruction pointer Pointer from the BTB, indicating location of next instruction. L2 Cache and Control BTB AGU AGU ALU TLB - ALU ALU TLB System Interface - 3 3 Integer RF ALU Queues Decoder Schedulers uop BTB & I Trace Cache Cache and D - Rename/Alloc Fms L1 D FP RF Fop ROM Intel Copyright © 2000 Intel Corporation. PDX HyperHyper PipelinedPipelined TechnologyTechnology Fall 2000 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 TC Nxt IP TC Fetch Drive Alloc Rename Que Sch Sch Sch Disp Disp RF RF Ex Flgs Br Ck Drive TC Fetch: Trace cache fetch Read the decoded instructions (uOPs) out of the Execution Trace Cache L2 Cache and Control BTB AGU AGU ALU TLB - ALU ALU TLB System Interface - 3 3 Integer RF ALU Queues Decoder Schedulers uop BTB & I Trace Cache Cache and D - Rename/Alloc Fms Trace Cache L1 D FP RF Fop ROM Intel Copyright © 2000 Intel Corporation. PDX HyperHyper PipelinedPipelined TechnologyTechnology Fall 2000 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 TC Nxt IP TC Fetch Drive Alloc Rename Que Sch Sch Sch Disp Disp RF RF Ex Flgs Br Ck Drive Drive: Wire delay Drive the uOPs to the allocator L2 Cache and Control BTB AGU AGU ALU TLB - ALU ALU TLB System Interface - 3 3 Integer RF ALU Queues Decoder Schedulers uop BTB & I Trace Cache Cache and D - Rename/Alloc Fms L1 D FP RF Fop ROM Intel Copyright © 2000 Intel Corporation. PDX HyperHyper PipelinedPipelined TechnologyTechnology Fall 2000 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 TC Nxt IP TC Fetch Drive Alloc Rename Que Sch Sch Sch Disp Disp RF RF Ex Flgs Br Ck Drive Alloc: Allocate Allocate resources required for execution. The resources include Load buffers, Store buffers, etc.. L2 Cache and Control BTB AGU AGU ALU TLB - ALU ALU TLB System Interface - 3 3 Integer RF ALU Queues Decoder Schedulers uop BTB & I Trace Cache Cache and D - Rename/Alloc Rename/Alloc Fms L1 D FP RF Fop ROM Intel Copyright © 2000 Intel Corporation. PDX HyperHyper PipelinedPipelined TechnologyTechnology Fall 2000 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 TC Nxt IP TC Fetch Drive Alloc Rename Que Sch Sch Sch Disp Disp RF RF Ex Flgs Br Ck Drive Rename: Register renaming Rename the logical registers (EAX) to the physical register space (128 are implemented). L2 Cache and Control BTB AGU AGU ALU TLB - ALU ALU TLB System Interface - 3 3 Integer RF ALU Queues Decoder Schedulers uop BTB & I Trace Cache Cache and D - Rename/Alloc Rename/Alloc Fms L1 D FP RF Fop ROM Intel Copyright © 2000 Intel Corporation. PDX HyperHyper PipelinedPipelined TechnologyTechnology Fall 2000 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 TC Nxt IP TC Fetch Drive Alloc Rename Que Sch Sch Sch Disp Disp RF RF Ex Flgs Br Ck Drive Que: Write into the uOP Queue uOPs are placed into the queues, where they are held until there is room in the schedulers L2 Cache and Control BTB AGU AGU ALU TLB - ALU ALU TLB System Interface - 3 3 Integer RF ALU Queues Queues Decoder Schedulers uop BTB & I Trace Cache Cache and D uop - Rename/Alloc Fms L1 D FP RF Fop ROM Intel Copyright © 2000 Intel Corporation. PDX HyperHyper PipelinedPipelined TechnologyTechnology Fall 2000 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 TC Nxt IP TC Fetch Drive Alloc Rename Que Sch Sch Sch Disp Disp RF RF Ex Flgs Br Ck Drive Sch: Schedule Write into the schedulers and compute dependencies. Watch for dependency to resolve. L2 Cache and Control BTB AGU AGU ALU TLB - ALU ALU TLB System Interface - 3 3 Integer RF ALU Queues Decoder Schedulers uop BTB & I Trace Cache Cache and D - Rename/Alloc Schedulers Fms L1 D FP RF Fop ROM Intel Copyright © 2000 Intel Corporation. PDX HyperHyper PipelinedPipelined TechnologyTechnology Fall 2000 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 TC Nxt IP TC Fetch Drive Alloc Rename Que Sch Sch Sch Disp Disp RF RF Ex Flgs Br Ck Drive Disp: Dispatch Send the uOPs to the appropriate execution unit. L2 Cache and Control BTB AGU AGU ALU TLB - ALU ALU TLB System Interface - 3 3 Integer RF ALU Queues Decoder Schedulers uop BTB & I Trace Cache Cache and D - Rename/Alloc Fms L1 D FP RF Fop ROM Intel Copyright © 2000 Intel Corporation. PDX HyperHyper PipelinedPipelined TechnologyTechnology Fall 2000 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 TC Nxt IP TC Fetch Drive Alloc Rename Que Sch Sch Sch Disp Disp RF RF Ex Flgs Br Ck Drive RF: Register File Read the register file. These are the source(s) for the pending operation (ALU or other). L2 Cache and Control BTB AGU AGU ALU TLB - ALU Integer RF ALU TLB System Interface - 3 3 Integer RF ALU Queues Decoder Schedulers uop BTB & I Trace Cache Cache and D - Rename/Alloc Fms L1 D FP RF FP RF Fop ROM Intel Copyright © 2000 Intel Corporation. PDX HyperHyper PipelinedPipelined TechnologyTechnology Fall 2000 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 TC Nxt IP TC Fetch Drive Alloc Rename Que Sch Sch Sch Disp Disp RF RF Ex Flgs Br Ck Drive Ex: Execute Execute the uOPs on the appropriate execution port. L2 Cache and Control BTB AGUAGU AGUAGU ALUALU TLB - ALUALU ALUALU TLB System Interface - 3 3 Integer RF ALUALU Queues Decoder Schedulers uop BTB & I Trace Cache Cache and D - Rename/Alloc Fms L1 D FP RF FopFop ROM Intel Copyright © 2000 Intel Corporation.