NIAGARA
Presented by Linda Pescatore

NIAGARA: It’s about throughput
§ Key performance metric: Sustained throughput of client requests
§ Both multicore and multithreaded
Less romantic name: UltraSPARC T1 with CoolThreads technology
Released Nov 14, 2005
SPARC = Scalable Processor ARChitecture

NIAGARA: Amdahl’s at it again
Improving performance of a single thread using
§ Multiple instruction issue
§ Out of order processing, and
§ Aggressive branch prediction
mostly reduces compute time, not memory access time. (True?)

NIAGARA: We like throughput
§ Optimized for multithreaded performance
§ Commercial server apps: high TLP, low to medium ILP
§ 8 “Sparc pipe” thread groups of 4
§ 32 threads total (64 in Niagara 2)
§ “Parallel execution of many threads … hides memory latency.”
§ No aggressive branch prediction
§ Speculative thread: low priority

NIAGARA: We got the power
§ Power efficiency:
  § Dissipate 60W expected
  § Resource-sharing
  § Clock speed not pushed to limit
§ Sun’s SWaP = Performance / (Space x Power Consumption)
“The performance per watt is four to 10 times better than any other chip.” (Nathan Brookwood, Analyst, Insight64)
§ Conserve space
Photo: Hydroelectric power dam at the Robert Moses generating facility, fed by conduits under the city of Niagara Falls.

Block Diagrams
Source: Kongetira
Credit: David Halko, Creative Commons license

NIAGARA: Ceci n’est pas une Sparc Pipe
• Adds Thread Select Logic
• Controls when to fetch, when to decode and execute.
• Thread selection policy:
  – Switch between available threads every cycle
  – Prioritize least recently used
Source: Kongetira

Niagara makes a splash
§ The T1 processor is in:
  § Sun/Fujitsu/Fujitsu Siemens SPARC Enterprise T1000 and T2000 servers
  § Sun Fire T1000, T2000 servers
  § Sun Netra T2000 server
  § Sun Netra CP3060 Blade
  § Sun Blade T6300 server module
§ UltraSPARC T2 (N2, Victoria Falls): 8x8
  § 2x threads = area-efficient, enhance cryptography, incorporate FGU
  § New “pick” pipe stage chooses 2 of 8 threads to execute each cycle
  § Double set associativity of L1-I to 8
  § Double fully associative DTLB to 128 entries
  § Double L2 banks to 8
§ UltraSPARC T2 Plus: 16 cores x 8 threads
§ UltraSPARC T3: 16 x 8
§ UltraSPARC T4 (2011!): 8 cores, OOO
§ UltraSPARC T5: 16 cores, 28 nm process

Niagara 1 and 2 are open source!
§ First and only 64-bit chip multithreaded microprocessors ever open-sourced, according to OpenSparc.net. Find:
  § Processor design source code (Verilog)
  § Simulation tools
  § Design verification suites
  § Hypervisor source code
§ OpenSPARC can boot real off-the-shelf commercial operating systems (e.g., Solaris, Linux, FreeBSD). Use a real design for your study or research!

Related work: Piranha
• Piranha: Compaq 2000; Niagara: Sun 2005
• Niagara paper refs Piranha*
• Almost identical rationales
• High BW, low latency
  – 1.6 GB/sec x 8 = 12.8 GB/sec
• 8 Alpha single-issue in-order cores (RISC), individual L1 data and instruction caches, Intra-Chip Switch, shared L2
• 8-stage pipeline:
  – Instruction fetch
  – Register read
  – ALU stages 1-5 (incl FP & mult.)
  – Write back
* “Other studies have also indicated the significant performance gains possible using this approach on multithreaded workloads.” (Kongetira)
Photo: A piranha at the Memphis zoo, by Alexdi, Creative Commons license

Remember Niagara
Photos of Niagara Falls courtesy of GoCanada

Circular registers
Sources: Kongetira paper; The SPARC Architecture Manual Version 9

Niagara 2 vs UltraSparc T1
Source: Golla, slide 8
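
Sketch: thread selection policy
The policy on the Sparc pipe slide (switch between available threads every cycle, prioritize least recently used) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the T1 hardware logic: the function name `select_threads`, its parameters, and the stall pattern in the example are hypothetical, and details such as the low priority of speculative threads are left out. The group size of 4 threads per Sparc pipe is taken from the slides.

```python
from collections import deque


def select_threads(ready_by_cycle, num_threads=4):
    """Pick one thread per cycle; among ready threads, the least recently selected wins.

    ready_by_cycle: list of sets, the thread ids available (not stalled) in each cycle.
    Returns the chosen thread id for each cycle, or None when no thread is ready.
    (Hypothetical helper for illustration; not taken from the Niagara design.)
    """
    # Front of the deque is the least recently selected thread.
    lru_order = deque(range(num_threads))
    schedule = []
    for ready in ready_by_cycle:
        # Scan from least to most recently used and take the first ready thread.
        chosen = next((t for t in lru_order if t in ready), None)
        if chosen is not None:
            lru_order.remove(chosen)
            lru_order.append(chosen)  # the chosen thread becomes most recently used
        schedule.append(chosen)
    return schedule


if __name__ == "__main__":
    all_ready = {0, 1, 2, 3}
    # Assumed scenario: thread 1 is stalled (say, on a cache miss) in cycles 1 to 3.
    cycles = [all_ready, all_ready - {1}, all_ready - {1},
              all_ready - {1}, all_ready, all_ready]
    print(select_threads(cycles))  # prints [0, 2, 3, 0, 1, 2]
```

While thread 1 is stalled, the other three threads keep the pipe busy every cycle; once it is available again it is the least recently used thread and is picked first. That is the sense in which parallel execution of many threads hides memory latency.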