OpenSPARC – An Open Platform for Hardware Reliability Experimentation
Ishwar Parulkar and Alan Wood, Sun Microsystems, Inc.
James C. Hoe and Babak Falsafi, Carnegie Mellon University
Sarita V. Adve and Josep Torrellas, University of Illinois at Urbana-Champaign
Subhasish Mitra, Stanford University
IEEE SELSE 4 - March 26, 2008 www.OpenSPARC.net
Outline
1. Chip Multi-threading (CMT)
2. OpenSPARC T2 and T1 processors
3. Reliability in OpenSPARC processors
4. What is available in OpenSPARC
5. Current university research using OpenSPARC
6. Future research directions
World's First 64-bit Open Source Microprocessor
OpenSPARC.net Governed by GPLv2
Complete processor architecture & implementation
Register Transfer Level (RTL)
Hypervisor API
Verification suite and architectural models
Simulation model for operating system bringup on s/w
Chip Multithreading (CMT)
[Table: instruction-level parallelism, thread-level parallelism, instruction/data working set and data sharing, each rated Low/Medium/High or Medium/Large across workloads]
Memory Bottleneck
[Figure: relative performance (log scale, 1 to 10000) versus year, 1980 to 2005. CPU frequency doubles every 2 years; DRAM speed doubles every 6 years; the gap between them widens.]
Source: Sun World Wide Analyst Conference, Feb. 25, 2003
Single Threading
Hurry up and wait! Up to 85% of cycles are spent waiting for memory.
Single-threaded performance: typical processor utilization is 15–25%.
[Figure: one thread over time, alternating compute (C) and memory latency (M) phases]
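The growth rates quoted on the Memory Bottleneck slide (CPU 2x every 2 years, DRAM 2x every 6 years) can be turned into a rough estimate of how fast the gap widens. The sketch below assumes clean exponential growth from a common starting point; the function names are illustrative and not from the presentation.

```python
def relative_speed(years: float, doubling_period_years: float) -> float:
    """Speed relative to year 0 for something that doubles every given period."""
    return 2.0 ** (years / doubling_period_years)


def cpu_dram_gap(years: float,
                 cpu_doubling_years: float = 2.0,
                 dram_doubling_years: float = 6.0) -> float:
    """Ratio of CPU speed-up to DRAM speed-up after `years` (1.0 means no gap)."""
    return (relative_speed(years, cpu_doubling_years)
            / relative_speed(years, dram_doubling_years))


if __name__ == "__main__":
    for years in (6, 12, 24):
        print(f"after {years} years the gap has grown by {cpu_dram_gap(years):.0f}x")
```

Under these assumptions the gap itself doubles every 3 years, which is consistent with a steadily growing share of cycles spent waiting on memory.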
The Power of CMT
Chip multi-threaded (CMT) performance: UltraSPARC T1 core processor utilization is up to 85%.
[Figure: four threads (Thread 1 to Thread 4) over time, each alternating compute (C) and memory latency (M) phases, interleaved on one core]
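The utilization figures on these two slides can be related with a simple back-of-the-envelope model. The sketch below assumes every thread repeats the same compute phase followed by a memory stall, and that the core switches threads at no cost whenever one stalls; both are idealizations, and the function name and parameters are illustrative.

```python
def core_utilization(threads: int, compute_cycles: float, memory_cycles: float) -> float:
    """Fraction of cycles a core spends computing with `threads` interleaved threads.

    Idealized: identical threads, zero-cost switching, no contention.
    """
    if threads < 1:
        raise ValueError("need at least one thread")
    # One thread alone is busy only during its compute phase.
    single_thread = compute_cycles / (compute_cycles + memory_cycles)
    # Extra threads fill the stall time until the core is saturated.
    return min(1.0, threads * single_thread)


if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(n, "threads:", core_utilization(n, compute_cycles=20, memory_cycles=80))
```

With a single-thread utilization in the 15–25% range, four threads per core land in the neighborhood of the "up to 85%" quoted above; real cores fall short of the ideal because of switching overhead and shared-resource contention.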
Chip Multi-Threading (CMT)
CMP (chip multiprocessing): n cores per processor
HMT (hardware multithreading): m threads per core
CMT (chip multithreading): n x m threads per processor
CMT Paradigm Shift!
CMT technology allows simple, compact system designs, which deliver:
> Higher reliability
> Better performance
> Lower cost
> Faster installation
> More efficient energy use
> Lower HVAC cost
> Faster time-to-repair
> ... and more
Everybody has changed to multi-core (CMP) and/or chip multi-threaded (CMT) processors: Sun (CMT), IBM (CMT), Intel (CMP), AMD (CMP)
UltraSPARC T2 and T1 CMT Processors
UltraSPARC T2
8 SPARC cores, 8 threads each
Shared 4MB L2, 8 banks, 16-way associative
Four dual-channel FBDIMM memory controllers
Two 10/1 Gb Enet ports w/onboard packet classification and filtering
One PCI-E x8 port
Cryptographic coprocessor on chip
1831 pins, 711 signal I/O
342 mm² die in 65nm
[Figure: die photo]
UltraSPARC T2 Block Diagram
[Figure: block diagram]
UltraSPARC T2
[Figure]
UltraSPARC T2 Reliability
Extensive error detection and correction
  Parity protection on I$, D$ tags and data, ITLB, DTLB, CAM and data, modular arithmetic, store address buffer
  ECC on integer RF, floating point RF, store data buffer, trap stack, L2$ and other internal arrays
Combination of hardware and software correction flows
  Hardware re-fetch for I$ and D$
  Software recovery for other errors
Offlining of a thread, group of threads or physical core
Hardware error injection for verification
Selective disabling of detection and reporting for bringup
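The combination of parity detection, hardware re-fetch and error injection listed above can be illustrated with a toy model. This is a minimal sketch, not the T2 implementation: it assumes one even-parity bit per cached word and that a clean copy always exists in the next level of the hierarchy. The class and method names are hypothetical.

```python
def parity(word: int) -> int:
    """Even-parity bit: 1 if the word has an odd number of set bits."""
    return bin(word).count("1") & 1


class ParityCache:
    """Toy cache that stores a parity bit per word and re-fetches on a mismatch."""

    def __init__(self, backing: dict):
        self.backing = backing      # stands in for the next level (L2 or memory)
        self.lines = {}             # addr -> (word, stored parity bit)
        self.refetches = 0

    def fill(self, addr: int) -> int:
        word = self.backing[addr]
        self.lines[addr] = (word, parity(word))
        return word

    def read(self, addr: int) -> int:
        if addr not in self.lines:
            return self.fill(addr)
        word, stored = self.lines[addr]
        if parity(word) != stored:
            # Parity error detected: discard the line and re-fetch a clean copy.
            self.refetches += 1
            return self.fill(addr)
        return word

    def inject_bit_flip(self, addr: int, bit: int) -> None:
        """Error injection hook: flip one data bit without updating parity."""
        word, stored = self.lines[addr]
        self.lines[addr] = (word ^ (1 << bit), stored)
```

A single parity bit only detects an odd number of flipped bits and cannot correct them, which is why re-fetch needs a clean copy elsewhere and why arrays holding the only copy of their data are ECC-protected on the slide above.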
UltraSPARC T2 Reliability: Faster Can Be Cooler (1)
[Figure: thermal map of a single-core processor, temperature scale from 58C to 107C (not to scale)]
UltraSPARC T2 Reliability: Faster Can Be Cooler (2)
[Figure: thermal maps of a single-core processor and the T2 processor (cores C1 to C8), temperature scale from 58C to 107C (not to scale)]
OpenSPARC
OpenSPARC Communities
Academia/Universities: Architecture, ISA, VLSI course work; Threading, Scaling, Parallelization; Benchmarks
EDA Vendors: Benchmarking; Reference flow; FPGA Emulation; Verification; Physical Design; Multi-threaded tools
CMT Tools: Compilers, Threading Optimization; Performance Analysis
Hardware IP Suppliers: PCI cores, SERDES etc.
Operating Systems: OpenSolaris, Linux, BSD variants, Embedded OSs
Chip Designers: SoC designs, Hard macros; Telecom applications
What's Available in OpenSPARC
1. Chip design and verification
UltraSPARC Architecture 2005 spec
UltraSPARC T2/T1 implementation spec
Full RTL (Verilog) of OpenSPARC T2/T1 (8 cores, 64/32 threads – more than 4 million lines of code!)
Verification test suites
Full OpenSPARC simulation environment
Synthesis scripts for RTL
FPGA implementation support
  Reduced (to fit capacity), synthesizable version of RTL
  Synplicity scripts for FPGA synthesis
What's Available in OpenSPARC
2. Architecture and performance modeling
SAM – SPARC Architectural Model (including source code)
Legion – Instruction accurate simulator (incl. source code)
OBP – Open Boot PROM source code
Hypervisor source code
Solaris images for simulation
RST Trace Tool – trace format for SPARC instruction-level traces
What's Available in OpenSPARC
3. Tools for tuning and debug
ATS – Binary reoptimization and recompilation tool for tuning and troubleshooting applications
Corestat – Online monitoring of core and FPU utilization
Discover – Runtime detection of programming errors in allocating and using program memory
Thread Analyzer – Checking of multi-threaded programming errors such as data races and deadlocks
More...
What's Available in OpenSPARC
4. Tools for software developers
Sun Studio 12 – C, C++, Fortran compilers for Solaris/Linux combined with Netbeans, etc.
BIT – Binary Improvement Tool analyzes and optimizes SPARC binaries for performance and code coverage
SPOT – produces detailed report on conditions that impact performance of an application
Source code analysis tool to identify incompatible APIs between Solaris and Linux to speed up migration
More...
University research in hardware reliability using OpenSPARC
Architectural Fingerprints
Problem: Error detection for the processor pipeline (soft, wearout, …)
Solution: Architectural fingerprints
  Summarize retiring architectural updates into compact hash (regs, stores)
  Periodically compare hash with reference (another core, previous execution)
Results:
  Multithreaded OpenSPARC T1 RTL implementation: less than 4% area overhead
  Scalable to wide-issue superscalar BW
  Soft fault injection: effective detection for errors propagated to arch. state
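The fingerprint idea described above (hash the retiring updates, compare periodically against a reference) can be sketched in software. This is an illustration under stated assumptions, not the Carnegie Mellon hardware design: CRC-32 from Python's zlib stands in for the hash, updates are modeled as (kind, destination, value) tuples, and the names Fingerprint and first_mismatch are hypothetical.

```python
import zlib
from typing import Iterable, Optional, Tuple

Update = Tuple[str, int, int]   # (kind, destination, value); kind is "reg" or "store"
MASK64 = (1 << 64) - 1


class Fingerprint:
    """Accumulates retiring architectural updates into a compact CRC-32 hash."""

    def __init__(self) -> None:
        self.crc = 0

    def retire(self, update: Update) -> None:
        kind, dest, value = update
        data = (kind.encode()
                + (dest & MASK64).to_bytes(8, "little")
                + (value & MASK64).to_bytes(8, "little"))
        self.crc = zlib.crc32(data, self.crc)   # running CRC over all updates so far

    def take(self) -> int:
        """Return the current fingerprint and start a new comparison interval."""
        value, self.crc = self.crc, 0
        return value


def first_mismatch(stream_a: Iterable[Update], stream_b: Iterable[Update],
                   interval: int) -> Optional[int]:
    """Index of the first comparison interval whose fingerprints differ, else None."""
    fp_a, fp_b = Fingerprint(), Fingerprint()
    window = 0
    pending = 0
    for upd_a, upd_b in zip(stream_a, stream_b):
        fp_a.retire(upd_a)
        fp_b.retire(upd_b)
        pending += 1
        if pending == interval:
            if fp_a.take() != fp_b.take():
                return window
            window += 1
            pending = 0
    # Compare any trailing partial interval as well.
    if pending and fp_a.take() != fp_b.take():
        return window
    return None
```

Comparing one small hash per interval instead of every update is what keeps the comparison bandwidth low; the interval length trades detection latency against that bandwidth.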
[Figure: fingerprint hardware alongside the Decode, Ex, Mem and Writeback pipeline stages, with fingerprint hash and compare/match logic; bar chart of the fraction of errors (0.0 to 1.0) ending in silent data corruption, hang or loop, per SPARC core unit (byp, exu, fcl, fdp, lsu, swl, tlu)]
Prof. Hoe and Prof. Falsafi @ Carnegie Mellon University
FIRST – Detecting Emerging Wearout Faults
Problem: Detecting device wearout during soft breakdown stage
  Faults initially hidden by guardbands & masking
Solution: Periodically test processor cores for signs of growing wearout
  Reduce freq./voltage guardbands until marginal
  Test w/Arch. or μArch. fingerprints
  Observe fails at incr. conservative conditions
Results: Wearout fault injection in OpenSPARC
  Arch. and μArch. fingerprints equivalent for wide-spread wearout
  μArch. needed for isolated wearout
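The FIRST test procedure above can be read as a periodic margin measurement. The sketch below is one possible interpretation, not the authors' implementation: stress is expressed in picoseconds past the guardband, a hypothetical passes(stress) callback stands in for running a fingerprint test at that operating point, and the toy core model is invented for illustration.

```python
import math
from typing import Callable, Iterable, Sequence


def first_failing_stress(passes: Callable[[int], bool],
                         stress_levels_ps: Iterable[int]) -> float:
    """Smallest stress (ps past guardband) at which the test fails; inf if none fail."""
    for stress in sorted(stress_levels_ps):
        if not passes(stress):
            return float(stress)
    return math.inf


def wearout_suspected(history: Sequence[float]) -> bool:
    """True if the failing point has moved toward nominal conditions over time."""
    return len(history) >= 2 and history[-1] < history[0]


def make_core(slack_ps: int) -> Callable[[int], bool]:
    """Toy core model: the test passes while the applied stress is below the slack."""
    return lambda stress_ps: stress_ps < slack_ps
```

Running first_failing_stress at each periodic test and appending the result to a history gives a trend; a failing point that creeps toward zero stress is the sign of growing wearout that the slide describes, seen before the fault escapes the guardband in normal operation.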
[Figure: fraction of fails detected (series: μArch, Arch, Timeout) vs. stress past guardband, 0 to 200 ps]
Prof. Hoe and Prof. Falsafi @ Carnegie Mellon University
SWAT – SoftWare Anomaly Treatment
Motivation
  Low cost solutions needed for in-field detection, diagnosis, recovery and repair for failures due to aging, soft errors, inadequate burn-in, design defects, …
SWAT Framework Components
• Detection: Software symptoms, minimal backup hardware
• Recovery: Software/hardware checkpoint and rollback
• Diagnosis: Firmware-controlled rollback/replay on multicore
• Repair/reconfiguration: Redundant, reconfigurable hardware
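The detection and recovery components listed above can be illustrated with a small checkpoint-and-rollback loop. This is a software toy under stated assumptions, not the SWAT implementation: program state is a plain dictionary, a symptom is modeled as a hypothetical SymptomDetected exception, and all names are invented for illustration.

```python
import copy
from typing import Callable, Dict, List, Tuple


class SymptomDetected(Exception):
    """Stands in for a software-visible symptom such as a fatal trap or a hang."""


def run_with_rollback(state: Dict, steps: List[Callable[[Dict], None]],
                      checkpoint_interval: int,
                      max_rollbacks: int = 3) -> Tuple[Dict, int]:
    """Run steps in order; on a symptom, restore the last checkpoint and replay."""
    checkpoint = (0, copy.deepcopy(state))
    rollbacks = 0
    i = 0
    while i < len(steps):
        if i % checkpoint_interval == 0:
            checkpoint = (i, copy.deepcopy(state))   # periodic checkpoint
        try:
            steps[i](state)
            i += 1
        except SymptomDetected:
            rollbacks += 1
            if rollbacks > max_rollbacks:
                raise          # symptom keeps recurring: hand over to diagnosis
            i, saved = checkpoint
            state = copy.deepcopy(saved)             # roll back, then replay
    return state, rollbacks


def increment(state: Dict) -> None:
    state["acc"] += 1


def make_transient_fault() -> Callable[[Dict], None]:
    """Step that corrupts state and raises a symptom on its first execution only."""
    fired = {"done": False}

    def step(state: Dict) -> None:
        if not fired["done"]:
            fired["done"] = True
            state["acc"] = -999                      # corruption caused by the fault
            raise SymptomDetected("fatal trap")
        state["acc"] += 1

    return step
```

A symptom that disappears on replay points to a transient fault, while one that recurs after every rollback suggests a permanent fault or a software bug, which is the kind of distinction the rollback/replay diagnosis step is meant to draw.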
[Figure: timeline showing checkpoints, fault, error, symptom detected, then recovery, diagnosis and repair; annotated "Always-on, zero or low cost" and "May have high overhead, rarely invoked"]
Prof. S. Adve, V. Adve and Y. Zhou @ University of Illinois at U-C
SWAT – Status and Ongoing Work
Status
  Detection techniques with > 95% coverage for most structures [ASPLOS’08, SELSE’08, DSN’08]
  Microarchitecture level, firmware-driven diagnosis with > 97% coverage [SELSE’08, DSN’08]
  So far, used microarchitecture-level fault injection in simulation
Ongoing/future work with OpenSPARC
  Gate-level fault modeling
  Hypervisor implementation
Prof. S. Adve, V. Adve and Y. Zhou @ University of Illinois at U-C
SWAT – Ongoing Work
High-level fault models and validation
Goals
  Understand how gate level faults propagate to microarch & s/w
  Abstract fault models at microarchitecture level
  Evaluate reliability solutions and validate results
Methodology
  Perform fault injections at gate level
  For better simulation speed: hierarchical integration of microarchitecture level full system simulator with lower-level simulation of faulty unit
  Using OpenSPARC Verilog model
Prof. S. Adve, V. Adve and Y. Zhou @ University of Illinois at U-C
SWAT – Future Work
Hypervisor implementation
Plan to use OpenSPARC hypervisor to prototype and evaluate firmware part of SWAT
Methodology
  Leverage, extend interface between hypervisor/hardware and hypervisor/OS
  Extend hypervisor for functionality
  Use for error detection, recovery, diagnosis, repair
Prof. S. Adve, V. Adve and Y. Zhou @ University of Illinois at U-C
VARIUS – Process Parameter Variation
Problem: Parameter variation in present and future multicore chips
Goals:
  Model parameter variation and resulting timing errors
  Design multicore microarchitectures to detect and tolerate variation-induced errors
  Develop new microarchitectural techniques to mitigate variation and variation-induced errors
Prof. Torrellas @ University of Illinois at U-C
VARIUS – Process Parameter Variation
Accomplishments
VARIUS model of parameter variation and resulting timing errors for microarchitects [TSM08]
ReCycle: Pipeline rebalance under process variation [ISCA07]
Fine-grain adaptive body bias (ABB) to mitigate variation in multicores [MICRO07]
Workload scheduling and DVFS power management in multicores under variation [ISCA08]
Paceline: Core pairing for reliability under process variation [PACT07]
Prof. Torrellas @ University of Illinois at U-C
VARIUS – Process Parameter Variation
Using OpenSPARC
Goal: Get insights into the effect of parameter variation on a real processor
  Measure the distribution of the path delays
  Apply the variation model
Evaluation Flow:
  Synopsys dc_shell-t: compile RTL to gate-level netlist
  Cadence SOCEncounter: floorplan, placement, routing, timing analysis
  Synopsys Primetime: static timing analysis & timing debugging
  Cadence NCSim: simulation
[Figure: flow diagram from design entry (RTL, timing constraints, library) through Compile RTL (dc_shell-t) and Placement & Routing (SOCEncounter) to Primetime and NCSim, passing netlist, timing constraints, physical library, timing report and timing info between stages]
Prof. Torrellas @ University of Illinois at U-C
CASP – Concurrent Autonomous Chip Self-test using Stored Patterns
Motivation
[Figure: bathtub curve of failure rate vs. time with infant mortality, normal lifetime and wearout phases; annotated "Burn-in difficult" and "Circuit aging dominant"]
Soft errors: effective techniques exist
Solution: EXTREMELY THOROUGH online self-test
Prof. Mitra @ Stanford University
CASP – Test Flow
1. Test Scheduling: Core 4 selected for test (schedule test on next core); Core N in normal operation
2. Pre-processing: Core 4 temporarily isolated (prepare core for test); Core N in normal operation
3. Test Application: Core 4 under test (thorough scan & functional testing); Core N in normal operation
4. Post-processing: Core 4 resumes operation (bring core from test to normal operation; recovery if failed); Core N in normal operation
Prof. Mitra @ Stanford University
OpenSPARC Modifications for CASP
[Figure: chip block diagram with 8 processor cores and L2 cache (both modified for CASP support), crossbar switch, FPU, DRAM control, Jbus interface, CASP controller with CASP control signals, on-chip buffer for scan test data (7.5KB) and off-chip storage (52MB)]
Architectural modifications
➢ Before a core is tested
  ➢ stalling/draining pipeline
  ➢ disabling communication with core under test
  ➢ saving critical state
  ➢ invalidating D$
➢ After a core is tested
  ➢ restoring critical state
  ➢ enabling communication with core under test
  ➢ restarting pipeline
● 8000 lines of new Verilog code
● Verification regression used to simulate normal operation of chip
Prof. Mitra @ Stanford University
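The CASP test flow and the before/after steps listed above can be summarized as a loop that takes one core offline at a time. The sketch below is a software illustration only, not the Verilog implementation: the Core class, the run_patterns callback and the policy of leaving a failing core offline are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Core:
    core_id: int
    online: bool = True
    state: Dict[str, int] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)


def casp_test_core(core: Core, run_patterns: Callable[[Core], bool]) -> bool:
    """Isolate one core, apply stored test patterns, and restore it if it passes."""
    # Pre-processing: the steps listed under "Before a core is tested".
    core.log += ["stall/drain pipeline", "disable communication",
                 "save critical state", "invalidate D$"]
    saved = dict(core.state)
    core.online = False
    core.state = {}                  # scan testing overwrites the core's state

    passed = run_patterns(core)      # test application

    # Post-processing: the steps listed under "After a core is tested".
    if passed:
        core.state = saved
        core.log += ["restore critical state", "enable communication",
                     "restart pipeline"]
        core.online = True
    # Assumption: a failing core stays offline and is left to recovery.
    return passed


def casp_round(cores: List[Core], run_patterns: Callable[[Core], bool]) -> List[int]:
    """Test every core in turn while the others keep running; return failing ids."""
    failed = []
    for core in cores:               # test scheduling: one core at a time
        if not casp_test_core(core, run_patterns):
            failed.append(core.core_id)
    return failed
```

Testing one core at a time is what lets the rest of the chip continue normal operation, at the cost of temporarily losing that core's threads during its test slot.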
Future research possibilities in hardware reliability using OpenSPARC
Future research possibilities
Using CMT hardware resources for error detection and recovery
  cores, threads, structures used by cores/threads
Understanding errors in the context of CMT architectural constructs
  thread arbitration and scheduling
  speculative threading
Validate error management solutions using a state-of-the-art microprocessor design
Future research possibilities
Study impact of reliability solutions on microprocessor performance
  use performance tools available in OpenSPARC
Firmware and software solutions for hardware reliability
  FPGA implementation and T1000/2000 servers with Solaris/Hypervisor source and other tools
Study impact of error detectors in processor on chip level and application failure rates
  enable error detection selectively, use simulators
Several more...
Conclusions
OpenSPARC is an open source community based around UltraSPARC T1 and T2 CMT microprocessors
OpenSPARC provides a rich, state-of-the-art infrastructure for research in hardware reliability
Many universities are actively using OpenSPARC in their research, with a lot of success
There is a lot more research in hardware reliability that can be done using OpenSPARC
Acknowledgment
We would like to acknowledge the students (past and present) from Carnegie Mellon University, University of Illinois at U-C and Stanford University who contributed to the research described in this presentation.