UltraSPARC T1: SPARC History, Sun + SPARC = UltraSPARC
ULTRASPARC T1
THE PROCESSOR FORMERLY KNOWN AS “NIAGARA”

This work supported by UNSW and HP through the Gelato Federation

SPARC HISTORY
➜ Scalable Processor ARCHitecture
➜ 1985 – Sun Microsystems
➜ Berkeley RISC – 1980-1984
➜ MIPS – 1981-1984
Architecture v Implementation:
➜ SPARC Architecture
➜ SPARC V7 – 1986
➜ SPARC International, Ltd – 1989
➜ SPARC V8 – 1990
➜ SPARC V9 – 1994

SUN + SPARC = ULTRASPARC

Processor          Cores    Threads/Core  Clock           L1D     L1I        L2 Cache
UltraSPARC IIi     1        1             550MHz, 650MHz  16KiB   16KiB      512KiB
UltraSPARC IIIi    1        1             1.593GHz        I       D          1MB (a)
UltraSPARC III     1        1             1.05-1.2GHz     64KiB   32KiB      8MiB (b)
UltraSPARC IV      2 (c)    1             1.05-1.35GHz    64KiB   32KiB      16MiB (d)
UltraSPARC IV+     1        2             1.5GHz          I       D          2MiB (e)
UltraSPARC T1      8        4             1.2GHz          32KiB   16KiB (f)  3MiB (g)
UltraSPARC T2 (h)  16 (?)   8             2GHz+ (?)       ?       ?          ?

(a) On-chip
(b) External, on chip tags
(c) UltraSPARC III cores
(d) 8MiB per core
(e) 32MiB off chip L3
(f) I/D Cache per core
(g) 4 way banked
(h) Second-half 2007

INSTRUCTION SET
➜ RISC!
➜ Load–store only through registers
➜ Fixed size instructions (32 bits)
  ➜ register + register
  ➜ register + 13 bit immediate
➜ Branch delay slot
✗ Condition Codes
  ✓ (V9) CC and non-CC instructions
  ✓ (V9) Compare on integer registers
➜ Synthesised instructions
➜ Privileged v Non-Privileged

CODE EXAMPLE

    void addr(void)
    {
            int i = 0xdeadbeef;
    }

    00000054 <addr>:
      54: 9d e3 bf 90   save  %sp, -112, %sp         ! new register window, 112 byte frame
      58: 03 37 ab 6f   sethi %hi(0xdeadbc00), %g1   ! upper 22 bits of the constant
      5c: 82 10 62 ef   or    %g1, 0x2ef, %g1        ! lower 10 bits of the constant
      60: c2 27 bf f4   st    %g1, [ %fp + -12 ]     ! store i on the stack
      64: 81 e8 00 00   restore                      ! back to the caller's window
      68: 81 c3 e0 08   retl
      6c: 01 00 00 00   nop                          ! branch delay slot

REGISTERS
➜ Large Register File
➜ Fixed register windows
➜ Registers renamed
➜ save and restore

General        Windowed     Description
%r0 - %r7      %g0 - %g7    Global (all)
%r8 - %r15     %o0 - %o7    Window output
%r16 - %r23    %l0 - %l7    Window local
%r24 - %r31    %i0 - %i7    Window input

V9 REGISTER WINDOWS
[Figure: V9 register windows. Labels: Caller window (%i0 - %i7, %l0 - %l7, %o0 - %o7), Callee window (%i0 - %i7, %l0 - %l7, %o0 - %o7), Register Window, Register File, %r8 <- %r32, %r32 -> %r8.]

REGISTER WINDOWS TODAY
✗ (V8) Window buffer needs to be flushed
✗ Kernel code has deep call chains
  ➜ Walks up and down a lot
✗ Question over studies showing advantages
  ➜ State required for C compared to higher-level languages
✗ Less is not more
  ➜ Superscalar needs a lot of registers
➜ Itanium
  ✓ Variable sized windows
  ✓ RSE deals with fill/spill
  ✓ Allows for growth of underlying register file

CONTEXTS
[Figure: Context Registers (Primary ASID, Secondary ASID, Nucleus ASID), Processes, Address Spaces, Operating System.]
➜ Multiple Address spaces
➜ Primary, Alternate and explicit load instructions

THROUGHPUT COMPUTING
[Figure: stacked bar chart. Y axis: Percent of Total Issue Cycles (0 to 100). X axis: Applications (alvinn, doduc, eqntott, espresso, fpppp, hydro2d, li, mdljdp2, mdljsp2, nasa7, ora, su2cor, swm, tomcatv, composite). Legend: memory conflict, long fp, short fp, long integer, short integer, load delays, control hazards, branch misprediction, dcache miss, icache miss, dtlb miss, itlb miss, processor busy.]
➜ 8 issue machine requires 1/8th (12.5%) filled for CPI 1

TSB
➜ Translation Store Buffer is a direct mapped cache of ...
➜ Translation Table Entries
➜ sun4u, sun4v
➜ Hardware pre-computes index into TSB (for 2 specified page sizes)
➜ Software in fast fault handler can check if TTE valid
[Figure: the Virtual Address Space holds Small Pages and Large Pages; their TTEs live in a Small Page TSB and a Large Page TSB.]

ULTRASPARC T1
➜ 8 cores
➜ 4 threads / core – thread group
➜ 32 way multi-threaded
➜ 5.76 IPC (CPI 0.17, efficiency 71%)
➜ Good luck finding the clock speed (1.2GHz)
➜ 70 Watts – “Green Processor”
➜ UltraSPARC Architecture 2005
➜ Same underlying principles as SPARC V9

ULTRASPARC T1 PIPELINE
[Figure: pipeline stages Fetch, Thread select, Decode, Execute, Memory, Writeback. Labels: ICache, ITLB, Instruction buffer × 4, Thread select Mux, Decode, Register file × 4, ALU, MUL, Shifter, DIV, DCache, DTLB, store buffers × 4, Crossbar interface, PC logic × 4. Thread select logic inputs: Instruction type, Misses, Traps and interrupts, Resource conflicts.]

CHIP RESOURCES
➜ Per Thread
  ➜ Registers
    ➜ Working Set and Architectural Set
  ➜ Instruction Buffer
➜ Per Core
  ➜ L1I 16KiB, 4-way set associative, 32 byte lines
  ➜ L1D 8KiB, 4-way set associative, 16 byte lines
  ➜ I/DTLB 64-entry, fully associative
  ➜ Execution Units
➜ Shared
  ➜ L2 3MiB, 12-way set associative, 4-way banked
  ➜ I/O

CHIP RESOURCES (2)
[Figure: chip block diagram. Eight Sparc pipes (4-way MT each) connect through a Crossbar to four L2 banks (L2 B0 to L2 B3), each with a DRAM control channel (Channel 0 to Channel 3) to DDR, plus I/O and shared functions and the I/O interface.]

THREAD SWITCHING
➜ Default – switch per cycle, LRU
➜ Other heuristics go into thread swapping logic
  ➜ Predecoded Information – long latency instructions
  ➜ Traps – system calls, exceptions
  ➜ Resource Conflicts – execution resources
  ➜ Cache Miss
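The default policy on the thread switching slide (switch every cycle, least recently used thread first, skipping threads that are stalled) can be sketched in C. This is only an illustration under stated assumptions: the `hw_thread` structure, its `ready` flag and the `select_thread` and `issue_cycle` functions are hypothetical names, and the real selection logic weighs more inputs than a single ready bit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define THREADS_PER_CORE 4

/* Hypothetical per-thread state; field names are illustrative only. */
struct hw_thread {
    bool     ready;      /* false while stalled (cache miss, trap, long latency op) */
    uint64_t last_issue; /* cycle at which this thread last issued */
};

/*
 * Pick the ready thread that issued least recently.
 * Ties go to the lowest thread index.
 * Returns the thread index, or -1 if every thread is stalled.
 */
int select_thread(const struct hw_thread t[THREADS_PER_CORE])
{
    assert(t != NULL);
    int pick = -1;
    for (int i = 0; i < THREADS_PER_CORE; i++) {
        if (!t[i].ready)
            continue;
        if (pick < 0 || t[i].last_issue < t[pick].last_issue)
            pick = i;
    }
    return pick;
}

/* Issue from the selected thread in the given cycle and record the issue time. */
int issue_cycle(struct hw_thread t[THREADS_PER_CORE], uint64_t cycle)
{
    int pick = select_thread(t);
    if (pick >= 0)
        t[pick].last_issue = cycle;
    return pick;
}
```

With all four threads ready, successive calls to `issue_cycle` rotate through threads 0, 1, 2, 3 and back to 0. Clearing `ready` on a thread, as a cache miss would, removes it from the rotation until it is set again.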
HYPERVISOR
➜ Unprivileged, Privileged, Hyperprivileged
➜ The hypervisor is the “hardware”
➜ API and source code published
➜ Hyperprivileged resources
  ➜ MMU
  ➜ Interrupts
  ➜ PCI
➜ Machine Description mechanism
➜ Fast and Slow trap mechanisms to privileged mode
➜ VA extended with partition ID

HYPERVISOR IN ACTION
[Figure: translation flow. Labels: User requests VA; Virtual Address (VPN, Offset); Context Registers (Primary ASID, Secondary ASID, Nucleus ASID); ASID taken from primary or secondary context register; Hypervisor intercepts; Calculate TSB offset, raise OS fault; TSB (TAG: ASID, VPN; DATA: Real Address); Operating System; OS requests TLB insert; Hypervisor adds partition ID; To TLB (Partition ID, ASID, VPN, Physical); Per OS Hypervisor State; Partition Real Addresses; Hypervisor TLB translation; Physical Hardware.]

OPEN SOURCE
✓ First open source processor
➜ http://opensparc.sunsource.net/
➜ Mailing Lists, Forums
➜ Bug reports and feedback

THANK YOU
QUESTIONS?
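The partition ID tagging from the hypervisor slides (a TLB entry matches only when the partition ID, the ASID and the VPN all agree) can be sketched in C. This is a minimal sketch, not the hardware's implementation: the structure and function names are hypothetical, the 64 entry size, 8KiB page size and round-robin replacement are assumptions, and the real-to-physical step performed by the hypervisor is collapsed into the physical page number passed to `tlb_insert`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_ENTRIES 64  /* assumed size for this sketch */
#define PAGE_SHIFT  13  /* assumes 8KiB pages */

/* Hypothetical TLB entry: the tag is (partition ID, ASID, VPN). */
struct tlb_entry {
    bool     valid;
    uint8_t  partition_id; /* added by the hypervisor, not visible to the OS */
    uint16_t asid;         /* address space ID from a context register */
    uint64_t vpn;          /* virtual page number */
    uint64_t ppn;          /* physical page number */
};

struct tlb {
    struct tlb_entry e[TLB_ENTRIES];
    size_t next;           /* round-robin replacement index (an assumption) */
};

/* Insert a translation on behalf of a guest OS running in partition pid. */
void tlb_insert(struct tlb *tlb, uint8_t pid, uint16_t asid,
                uint64_t va, uint64_t pa)
{
    assert(tlb != NULL);
    struct tlb_entry *e = &tlb->e[tlb->next];
    tlb->next = (tlb->next + 1) % TLB_ENTRIES;
    e->valid = true;
    e->partition_id = pid;
    e->asid = asid;
    e->vpn = va >> PAGE_SHIFT;
    e->ppn = pa >> PAGE_SHIFT;
}

/* Fully associative lookup: every tag field must match for a hit. */
bool tlb_lookup(const struct tlb *tlb, uint8_t pid, uint16_t asid,
                uint64_t va, uint64_t *pa)
{
    uint64_t vpn = va >> PAGE_SHIFT;
    uint64_t offset = va & ((UINT64_C(1) << PAGE_SHIFT) - 1);
    for (size_t i = 0; i < TLB_ENTRIES; i++) {
        const struct tlb_entry *e = &tlb->e[i];
        if (e->valid && e->partition_id == pid &&
            e->asid == asid && e->vpn == vpn) {
            *pa = (e->ppn << PAGE_SHIFT) | offset;
            return true;
        }
    }
    return false;
}
```

Because the partition ID is part of the tag, two guest operating systems can use the same ASID and virtual address without seeing each other's translations: a lookup from a different partition misses.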