Memory Subsystem
POWER7
Dan Christiani, Kyle Wieschowski

History 1980 - 2000
● 1980 RISC Prototype
● 1990 POWER1 (Performance Optimization With Enhanced RISC) (1 um)
● 1993 IBM launches 66 MHz POWER2 (.35 um)
● 1997 POWER2 ‘Super Chip’
● 1998 POWER3 (.22 um) 64-bit (POWER2 + PowerPC)

History 2000 - 2010
● 2001 POWER4 (180 nm) - Dual Core
● 2004 POWER5 (130 nm) - SMT
● 2006 POWER6 (65 nm) - High Frequency
  ○ 4.7 GHz - Dual Core
  ○ First server to hold all major benchmark records
  ○ 3x faster than the comparable Intel Itanium processor
● 2010 POWER7 (45 nm) - Cores, eDRAM

POWER7 Architectural Focus
● Reduce core area and power
  ○ Frequency is lowered to reduce power
● Fit the chip in the same sockets as POWER6
● Utilize the same SMP and I/O buses
  ○ At higher frequencies
● Remove external L3 cache chips
● Double the floating-point capability of each core

Architecture Overview
● 8 Cores
  ○ 12 execution units
  ○ Four-way SMT
  ○ Integrated L2 cache
● 2 Memory Controllers
  ○ 4 channels of DDR3
● Shared L3 Cache
● 5 SMP Links
  ○ Allows 32 sockets

The Core
● 6 Primary Units
  ○ IFU, ISU, LSU, FXU, VSU, and decimal FPU
● 12 Execution Units
  ○ 2 fixed point, 2 load/store, 4 double-precision floating point, 1 vector, 1 branch, 1 decimal FP, 1 condition register
● In a given cycle:
  ○ Fetch up to 8 instructions
  ○ Decode and dispatch up to 6 instructions
  ○ Issue and execute up to 8 instructions

Instruction Fetch Unit (IFU)
● Feeds the pipeline with the most likely instructions
  ○ Based on branch prediction
● Maintains balance of instruction execution
  ○ Based on software-defined thread priority
● Decodes and groups instructions
● Executes branch instructions

Instruction-Sequencing Unit (ISU)
● Dispatches instructions
  ○ As groups, each from a single thread
● Renames registers
● Completes instructions
  ○ Global Completion Table
  ○ Also as groups
● Handles exception conditions
● Is in charge of flushing the core

Load/Store Unit (LSU)
● 2 symmetric LS execution pipelines (OoO)
  ○ 1 load or store operation each
● Dependencies:
  ○ 1 stall between load and FXU operations
  ○ 2 stalls between load and VSU operations
● Also executes FX add and logical instructions
● SRQ (store reorder queue) - 32 outstanding stores can be issued
● LRQ (load reorder queue) - 32 outstanding loads can be issued

Fixed-Point Unit (FXU)
● Two identical pipelines
● Each containing:
  ○ Multiport GPR file
  ○ ALU, Divider, and Multiplier
  ○ Rotator
  ○ Count leading zeros unit
  ○ Bit-select unit
  ○ Miscellaneous unit (executes population count, parity, and binary-coded decimal assist instructions)

Vector and Scalar Unit (VSU)
● Vector instructions for
  ○ Vector modification, e.g. Merge, Shift
  ○ Load/Store
  ○ Arithmetic - no divide
  ○ Floating-Point Arithmetic - no divide

Cache
● Private 32 KB Level 1 caches
  ○ Instruction Cache integrated with the IFU
  ○ Data Cache integrated with the LSU
● Private 256 KB Level 2 caches
  ○ 8-way set associative
● 32 MB Level 3 cache
  ○ 4 MB of Local L3 (composed of 32 eDRAM macros)
  ○ 28 MB of Global L3

Memory Subsystem
● 2 Memory Controllers
  ○ Synchronous Region:
    ■ Services reads and writes
    ■ Arbitrates among conflicting requests
    ■ Manages coherence directory information
  ○ Asynchronous Region:
    ■ Manages traffic through channels/buffer chips
    ■ Schedules reads, writes, and maintenance
    ■ Balances utilization of resources

The Future of Power
POWER8 (mid-2014)
● 22 nm Design
● SMT8
● 12 Cores:
  ○ 10 Issue
  ○ 16 Execution Pipes
    ■ 2 FXU, 2 LSU, 2 LU
    ■ 4 FPU, 2 VMX
    ■ 1 Crypto, 1 DFU
    ■ 1 CR, 1 BR
  ○ 64 KB L1, external 128 MB L4
  ○ Estimated 2x performance at maximum SMT

OpenPower
“The OpenPOWER Consortium brings together an ecosystem of hardware, system software, and enterprise applications that will provide powerful computing systems based on NVIDIA GPUs and POWER CPUs”
POWER8 + Infrastructure + CUDA = NextGen DataCenter

Questions?
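
The POWER7 figures quoted in the Architecture Overview, Core, and Cache slides can be cross-checked with a little arithmetic. The sketch below is only an illustration: the names (`EXECUTION_UNITS`, `chip_summary`) are hypothetical, and it assumes that the 32 MB L3 is the sum of one 4 MB local region per core and that "Global L3" means the remaining L3 as seen from a single core.

```python
# Figures are copied from the slides; the names and layout are illustrative only.
EXECUTION_UNITS = {
    "FX": 2,          # fixed point
    "LS": 2,          # load/store
    "DP FP": 4,       # double-precision floating point
    "vector": 1,
    "branch": 1,
    "decimal FP": 1,
    "CR": 1,          # the remaining register-unit entry on the Core slide
}

CORES = 8            # cores per chip
SMT_WAYS = 4         # four-way SMT
LOCAL_L3_MB = 4      # local L3 region per core
L3_TOTAL_MB = 32     # shared L3 per chip


def chip_summary():
    """Derive per-chip totals from the per-core figures on the slides."""
    return {
        "execution_units_per_core": sum(EXECUTION_UNITS.values()),
        "hardware_threads_per_chip": CORES * SMT_WAYS,
        # Assumption: the shared L3 is the sum of the per-core local regions.
        "l3_from_local_regions_mb": CORES * LOCAL_L3_MB,
        # Assumption: "global" L3 is whatever is not local to a given core.
        "global_l3_mb": L3_TOTAL_MB - LOCAL_L3_MB,
    }


if __name__ == "__main__":
    for name, value in chip_summary().items():
        print(f"{name}: {value}")
```

Under those assumptions the unit counts sum to the 12 execution units stated on the Core slide, and the local and global L3 sizes add up to the 32 MB total.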
References
http://www-05.ibm.com/cz/events/febannouncement2012/pdf/power_architecture.pdf
https://www-950.ibm.com/events/wwe/grp/grp030.nsf/vLookupPDFs/Tour%20P8%20Charts/$file/Tour%20P8%20Charts.pdf
http://www.theregister.co.uk/2013/08/27/ibm_power8_server_chip/
http://studies.ac.upc.edu/ETSETB/SEGPAR/microprocessors/power2%20%28mpr%29.pdf