Architectures of Processor Chips Basic Architectural Features
University of Athens, Department of Informatics and Telecommunications
Constantin Halatsis

Architectures of processor chips
- Basic architectural features
- Single-core processor chip
  - Scalar processors: pipelining
  - Superscalars: ILP, threading
- Amdahl's Law
- Models of multicores
- Multicores
  - Symmetric multicore chips
  - Asymmetric multicore chips
  - Dynamic multicore chips
- Multi-threading
- Accelerators

Basic architectural features
- Memory hierarchy (caches, main memory, hard disks, ...)
- Pipelining
- Branch prediction
- Instruction-Level Parallelism (ILP)
- Data-Level Parallelism (DLP)
- Thread-Level Parallelism (TLP)

Example
[Figure]

Basic techniques
- Scoreboarding: based on the CDC 6000 architecture. Important feature: the scoreboard. Hazards handled per stage: issue: WAW; decode: RAW; execute and write results: WAR.
- Reorder buffer
- Tomasulo algorithm: implemented in the IBM 360/91's floating-point unit.
  Important feature: reservation stations and the CDB (Common Data Bus).
  - Issue: tag operands that are not yet available; copy those that are.
  - Execute: stall on RAW hazards while monitoring the CDB.
  - Write results: send results to the CDB and dump the store buffer contents.
  - Exception handling: no instructions can be issued until a branch is resolved.
- Register renaming

Single core processors
Multiple-issue architectures:
- Increase the number of instructions per cycle (IPC)
- Exploit instruction-level parallelism (ILP)

| Common name | Issue structure | Hazard detection | Scheduling | Distinguishing characteristics | Examples |
|---|---|---|---|---|---|
| Superscalar (static) | Dynamic | Hardware | Static | In-order execution | Sun UltraSPARC II and III |
| Superscalar (dynamic) | Dynamic | Hardware | Dynamic | Some out-of-order execution | IBM Power2 |
| Superscalar (speculative) | Dynamic | Hardware | Dynamic with speculation | Out-of-order execution with speculation | Pentium 3 and 4 |
| VLIW / LIW | Static | Software | Static | No hazards between issue packets | Trimedia, i860 |
| EPIC | Mostly static | Mostly software | Mostly static | Explicit dependences marked by compiler | Itanium |

Related techniques: register renaming, Tomasulo algorithm, reorder buffer, scoreboarding.

Intel's x86 line of processors
[Plot: SPECint92 on a logarithmic axis (0.2 to 10000) against year (1979 to 2005), with data points from the 8088/5 through the 80286, 386, 486, Pentium, Pentium Pro, PII and PIII up to the P4 Prescott (2M). Annotations: "~100x / 10 years" along the trend and "Leveling off" at the top of the curve.]
Figure: The integer performance of Intel's x86 line of processors

Processor performance (SPECint2000)
[Figure] Source: F. Labonte, http://mos.stanford.edu/papers/flabonte_thesis.pdf

Processor clock rate
[Figure] Source: F. Labonte, http://mos.stanford.edu/papers/flabonte_thesis.pdf

Power density evolution (W/mm^2)
[Figure] Source: F. Labonte, http://mos.stanford.edu/papers/flabonte_thesis.pdf

ILP processing: Pipelining
Paradigms of ILP processing:
- Temporal parallelism: pipeline processors

Pipelining schemes
Types of temporal parallelism in ILP processors:
- Sequential processing
- Prefetching: overlapping the fetch and further phases
- Pipelined EUs: overlapping the execute phases through pipelining
- Pipeline processors: overlapping all phases
(F: fetch cycle, D: decode cycle, E: execute cycle, W: write cycle)
Examples named on the slide:
- Mainframes: early mainframes, Stretch (1961), Atlas (1963), IBM 360/91 (1967), CDC 7600 (1969)
- Microprocessors: i80286 (1982), M68020 (1985), i80386 (1985), R2000 (1988), M68030 (1988)

Sources of raising clock frequencies (3)
[Plot: number of pipeline stages (axis 0 to 40) by year. Labeled points include P4 Prescott (~30) and Pentium 4 (~20), followed by Core Duo, Conroe, Pentium Pro and Athlon-64 at lower stage counts.]
Also plotted: Athlon, Pentium and K6, with labeled stage counts between 5 and 14; x-axis: year, 1990 to 2005.
Figure 4.2: Number of pipeline stages in Intel's and AMD's processors

POWER5 vs. POWER6 pipeline comparison
[Diagram: POWER5 pipe stages (22 FO4) beside POWER6 pipe stages (13 FO4). Stage labels include Pre-Dec, I Cache, Xmit, IBUF, Rotate, Group Form, Decode, Assembly, Group Xfer, Pre-dispatch, Group Disp, Rename, Issue, RF, AG/RF, AG/Ex, Ex, DCache, Fmt, Writeback, Finish, Completion and Chkpt. Latency annotations cover I-cache redirect and branch resolution (5, 14, 3-4 and 12 cycles), Load-load, Load-Fx and Fx-Fx (1 to 4 cycles), 6 cycle FP-FP, 5 cycle LD-FP, 6 cycle FP to local FP use, 8 cycle FP to remote FP use, and 0 cycle LD to FP use.]

ILP processing: VLIW
Paradigms of ILP processing:
- Temporal parallelism: pipeline processors
- Issue parallelism, static dependency resolution: VLIW processors

VLIW processing
Instructions: independent instructions (static dependency resolution).
[Diagram: the processor fetches (F) and executes (E) three instructions in parallel.]
VLIW: Very Long Instruction Word

ILP processing: Superscalars
Paradigms of ILP processing:
- Temporal parallelism: pipeline processors
- Issue parallelism
  - Static dependency resolution: VLIW processors
  - Dynamic dependency resolution: superscalar processors

VLIW processing vs. superscalar processing
Instructions: independent (VLIW), dependent (superscalar).
Dependency resolution: static for VLIW, dynamic for superscalar.
[Diagram: in both cases the processor fetches (F) and executes (E) three instructions in parallel.]
VLIW: Very Long Instruction Word

ILP processing
Paradigms of ILP processing:
- Temporal parallelism: pipeline processors
- Issue parallelism
  - Static dependency resolution: VLIW processors
  - Dynamic dependency resolution: superscalar processors
- Data parallelism: SIMD extension

ILP processing
[Timeline: sequential processing; temporal parallelism (pipeline processors, ~'85); issue parallelism with static dependency resolution (VLIW processors, EPIC processors) or dynamic dependency resolution (superscalar processors, ~'90); data parallelism (superscalar processors with SIMD extension, ~'95-'00).]
Figure 1.3: The emergence of ILP paradigms and processor types

Performance of ILP processors
Columns on the slide: ideal case, real case, absolute performance. Absolute performance per processor type:
- Sequential: $P_{ai} = f_C \cdot \frac{1}{CPI}$
- Pipeline: $P_{ai} = f_C \cdot \frac{1}{CPI}$
- VLIW / superscalar: $P_{ai} = f_C \cdot \frac{1}{CPI} \cdot IP$
- SIMD extension: $P_{ao} = f_C \cdot \frac{1}{CPI} \cdot IP \cdot OPI$

Performance components of ILP processors:
$$P_{ao} = f_C \cdot \frac{1}{CPI} \cdot IP \cdot OPI \cdot \eta$$
- $f_C$: clock frequency
- $\frac{1}{CPI}$: temporal parallelism
- $IP$: issue parallelism
- $OPI$: data parallelism
- $\eta$: efficiency of speculative execution
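As a rough numerical illustration of the five-factor product above, the following Python sketch multiplies the factors for three design points. The function name `absolute_performance` and every sample value (a 2 GHz clock, a CPI of 1, an efficiency of 0.5, and so on) are assumptions chosen for illustration; none of them come from the slides.

```python
def absolute_performance(f_c_hz, cpi, ip, opi, eta):
    """Return f_C * (1/CPI) * IP * OPI * eta, in operations per second.

    f_c_hz: clock frequency in Hz
    cpi:    cycles per instruction issue (temporal parallelism is 1/CPI)
    ip:     issue parallelism
    opi:    data parallelism (operations per instruction)
    eta:    efficiency of speculative execution, between 0 and 1
    """
    if cpi <= 0:
        raise ValueError("CPI must be positive")
    return f_c_hz * (1.0 / cpi) * ip * opi * eta


# Hypothetical design points (illustrative numbers only).
pipeline = absolute_performance(2e9, cpi=1.0, ip=1, opi=1, eta=1.0)
superscalar = absolute_performance(2e9, cpi=1.0, ip=4, opi=1, eta=0.5)
simd = absolute_performance(2e9, cpi=1.0, ip=4, opi=2, eta=0.5)

for name, p in [("pipeline", pipeline),
                ("superscalar", superscalar),
                ("superscalar + SIMD", simd)]:
    print(f"{name:20s} {p:.2e} operations/s")
```

With these assumed values the clock frequency is held constant, so any difference between the three results comes from the issue-parallelism, data-parallelism and efficiency factors alone.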
$$P_{ao} = f_C \cdot IPC_{eff}$$
with
$$IPC_{eff} = \frac{1}{CPI} \cdot IP \cdot OPI \cdot \eta$$
- Clock frequency $f_C$: depends on technology and microarchitecture.
- Per-cycle efficiency $IPC_{eff}$: depends on ISA, microarchitecture, system architecture, OS, compiler and application.

Options to implement issue parallelism
Starting from pipeline processing:
- VLIW (EPIC) instruction issue: static dependency resolution (3.2)
- Superscalar instruction issue: dynamic dependency resolution (3.3)
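Static dependency resolution, the VLIW option above, can be sketched as a compiler-like pass that groups instructions into issue packets so that no packet holds two instructions with a RAW, WAR or WAW dependence between them. The instruction format (a destination register plus source registers), the names `depends` and `pack`, and the packet width of 3 are assumptions made for illustration. The sketch is deliberately conservative and keeps program order; a real VLIW compiler would also reorder instructions and respect functional-unit constraints.

```python
from typing import List, Tuple

# Hypothetical instruction format: (destination register, source registers).
Instr = Tuple[str, Tuple[str, ...]]


def depends(later: Instr, earlier: Instr) -> bool:
    """True if `later` has a RAW, WAR or WAW dependence on `earlier`."""
    l_dst, l_src = later
    e_dst, e_src = earlier
    raw = e_dst in l_src   # later reads what earlier writes
    war = l_dst in e_src   # later overwrites what earlier reads
    waw = l_dst == e_dst   # both write the same register
    return raw or war or waw


def pack(program: List[Instr], width: int = 3) -> List[List[Instr]]:
    """Greedy in-order packing into issue packets of at most `width`
    mutually independent instructions."""
    packets: List[List[Instr]] = []
    current: List[Instr] = []
    for ins in program:
        # Close the packet if it is full or the next instruction
        # depends on something already inside it.
        if len(current) == width or any(depends(ins, e) for e in current):
            packets.append(current)
            current = []
        current.append(ins)
    if current:
        packets.append(current)
    return packets


program: List[Instr] = [
    ("r1", ("r2", "r3")),  # r1 = r2 op r3
    ("r4", ("r5", "r6")),  # independent of the instruction above
    ("r7", ("r1", "r4")),  # RAW on r1 and r4: must start a new packet
    ("r2", ("r8", "r9")),  # independent of the r7 instruction
]

for i, packet in enumerate(pack(program)):
    print(f"packet {i}: {packet}")
```

A superscalar processor performs an equivalent check in hardware at run time (dynamic dependency resolution), whereas here the grouping is fixed before execution.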