OpenSPARC – An Open Platform for Hardware Reliability Experimentation

Ishwar Parulkar and Alan Wood , Inc.
James C. Hoe and Babak Falsafi, Carnegie Mellon University
Sarita V. Adve and Josep Torrellas, University of Illinois at Urbana-Champaign
Subhasish Mitra, Stanford University

IEEE SELSE 4 - March 26, 2008 www.OpenSPARC.net

Outline

1. Chip Multi-threading (CMT)
2. OpenSPARC T2 and T1 processors
3. Reliability in OpenSPARC processors
4. What is available in OpenSPARC
5. Current university research using OpenSPARC
6. Future research directions


World's First 64-bit Open Source

OpenSPARC.net

• Governed by GPLv2
• Complete architecture & implementation
• Register Transfer Level (RTL)
• Hypervisor API
• Verification suite and architectural models
• Simulation model for operating system bringup on s/w


Chip Multithreading (CMT)

Instruction-level Parallelism: Low, Low, Low, Medium, Low, High
Thread-level Parallelism: High, High, High, High, High
Instruction/Data Working Set: Large, Large, Medium, Large, Large
Data Sharing: Low, Medium, High, Medium, High, Medium

Memory Bottleneck

[Chart: relative performance (log scale, 1 to 10000) versus year (1980 to 2005). CPU frequency: 2x every 2 years. DRAM speeds: 2x every 6 years. The gap between the two curves widens over time.]

Source: Sun World Wide Analyst Conference, Feb. 25, 2003

Single Threading: HURRY UP AND WAIT!

Up to 85% of cycles waiting for memory

Single Threaded Performance

Typical processor utilization: 15–25%

[Figure: timeline of one thread alternating compute (C) and memory latency (M) phases: C M C M C M]
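The low utilization follows from the two growth rates quoted on the Memory Bottleneck chart. As a rough worked example (a sketch that assumes both trends are clean exponentials, which the chart only approximates), the gap itself doubles roughly every 3 years. The function names are illustrative.

```python
def growth(years: float, doubling_period: float) -> float:
    """Relative improvement after `years` for something that doubles every `doubling_period` years."""
    return 2.0 ** (years / doubling_period)


def cpu_dram_gap(years: float, cpu_doubling: float = 2.0, dram_doubling: float = 6.0) -> float:
    """Ratio of CPU improvement to DRAM improvement after `years`.

    The defaults are the rates on the slide: CPU 2x every 2 years, DRAM 2x every 6 years.
    """
    return growth(years, cpu_doubling) / growth(years, dram_doubling)


if __name__ == "__main__":
    for years in (6, 12, 25):
        print(f"after {years:2d} years the CPU/DRAM gap is about {cpu_dram_gap(years):.0f}x")
```

Under these assumptions the gap is 4x after 6 years and 16x after 12 years, so a core that simply waits on each miss spends a growing share of its cycles idle.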

The Power of CMT

Chip Multi-threaded (CMT) Performance

UltraSPARC T1 core processor utilization: up to 85%

[Figure: timelines of Thread 1 to Thread 4, each alternating compute (C) and memory latency (M) phases]
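The single-thread and CMT utilization figures can be related with a very simple model. This is a sketch, not a description of the T1 pipeline: it assumes every thread repeats a fixed compute phase followed by a fixed memory stall, and that the core can always switch to a ready thread at no cost. The function name and cycle counts are hypothetical.

```python
def core_utilization(threads: int, compute_cycles: float, memory_cycles: float) -> float:
    """Fraction of cycles in which the core does useful work.

    One thread is busy for compute_cycles out of every
    (compute_cycles + memory_cycles). With perfect interleaving, each extra
    thread fills part of the stall time until the core saturates at 100%.
    """
    single = compute_cycles / (compute_cycles + memory_cycles)
    return min(1.0, threads * single)


if __name__ == "__main__":
    # Illustrative numbers: 1 compute cycle for every 4 stall cycles (20% utilization).
    for n in (1, 2, 4, 8):
        print(f"{n} thread(s): {core_utilization(n, 1, 4):.0%}")
```

With a single-thread utilization of 20% (inside the 15–25% range quoted earlier), four threads reach 80% in this model, which is the same order as the "up to 85%" figure on the slide.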

Chip Multi-Threading (CMT)

CMP (chip multiprocessing): n cores per processor
HMT (hardware multithreading): m threads per core
CMT (chip multithreading): n x m threads per processor
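As a small worked example of the n x m relation, the sketch below plugs in the core and thread counts quoted elsewhere in this deck (8 cores, and 64/32 threads for T2/T1). The class and field names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CmtChip:
    name: str
    cores: int              # n: cores per processor (CMP dimension)
    threads_per_core: int   # m: hardware threads per core (HMT dimension)

    @property
    def hardware_threads(self) -> int:
        """n x m threads per processor."""
        return self.cores * self.threads_per_core


T2 = CmtChip("UltraSPARC T2", cores=8, threads_per_core=8)
T1 = CmtChip("UltraSPARC T1", cores=8, threads_per_core=4)

if __name__ == "__main__":
    for chip in (T2, T1):
        print(f"{chip.name}: {chip.cores} x {chip.threads_per_core} = {chip.hardware_threads} threads")
```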

CMT Paradigm Shift!

CMT technology allows simple, compact system designs, which deliver:
> Higher reliability
> Better performance
> Lower cost
> Faster installation
> More efficient energy use
> Lower HVAC cost
> Faster time-to-repair
> ... and more

Everybody has changed to multi-core (CMP) and/or chip multi-threaded (CMT) processors:
• Sun (CMT), IBM (CMT), Intel (CMP), AMD (CMP)

UltraSPARC T2 and T1 CMT Processors

UltraSPARC T2

• 8 SPARC cores, 8 threads each
• Shared 4MB L2, 8 banks, 16-way associative
• Four dual-channel FBDIMM memory controllers
• Two 10/1 Gb Enet ports w/onboard packet classification and filtering
• One PCI-E x8 port
• Cryptographic coprocessor on chip
• 1831 pins, 711 signal I/O
• 342 mm² die in 65nm

[Figure: die photo]

UltraSPARC T2 Block Diagram

UltraSPARC T2

UltraSPARC T2 Reliability

• Extensive error detection and correction
  • Parity protection on I$, D$ tags and data, ITLB, DTLB, CAM and data, modular arithmetic, store address buffer
  • ECC on integer RF, floating point RF, store data buffer, trap stack, L2$ and other internal arrays
• Combination of hardware and software correction flows
  • Hardware re-fetch for I$ and D$
  • Software recovery for other errors
• Offlining of a thread, group of threads or physical core
• Hardware error injection for verification
• Selective disabling of detection and reporting for bringup
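The difference between parity (detect only, so a clean copy must be re-fetched) and ECC (correct in place) can be shown with textbook codes. This is a generic sketch: the deck does not say which codes or word widths the T2 arrays use, so the single parity bit and the Hamming(7,4) code below are stand-ins chosen for brevity.

```python
def parity(word: int) -> int:
    """Even-parity bit of a non-negative integer: 1 if the number of set bits is odd."""
    return bin(word).count("1") & 1


def hamming74_encode(nibble: int) -> list:
    """Encode 4 data bits as 7 bits laid out p1 p2 d0 p4 d1 d2 d3 (positions 1..7)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]   # covers positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]   # covers positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]   # covers positions 4, 5, 6, 7
    return [p1, p2, d[0], p4, d[1], d[2], d[3]]


def hamming74_decode(code: list) -> tuple:
    """Return (corrected nibble, syndrome). A non-zero syndrome is the 1-based flipped position."""
    c = list(code)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 | (s2 << 1) | (s4 << 2)
    if syndrome:
        c[syndrome - 1] ^= 1          # single-bit error: flip it back
    nibble = c[2] | (c[4] << 1) | (c[5] << 2) | (c[6] << 3)
    return nibble, syndrome


if __name__ == "__main__":
    stored = 0b1011
    # Parity: a single bit flip is detected, but the original value is not recoverable.
    print("parity mismatch:", parity(stored) != parity(stored ^ 0b0100))
    # ECC: the same kind of flip is located and corrected.
    word = hamming74_encode(stored)
    word[5] ^= 1
    print("decoded:", hamming74_decode(word))
```

The split mirrors the correction flows listed above: a parity error in a cache can be handled by re-fetching, whereas a register file has no other copy, so it needs a code that corrects.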

UltraSPARC T2 Reliability: Faster Can Be Cooler (1)

[Figure: thermal map of a single-core processor with a temperature scale from 58C to 107C; core outlines C1 to C8 overlaid; not to scale]

UltraSPARC T2 Reliability: Faster Can Be Cooler (2)

[Figure: thermal maps of a single-core processor and the T2 processor (cores C1 to C8) on a common temperature scale from 58C to 107C; not to scale]

OpenSPARC

OpenSPARC Communities

• Academia/Universities: Architecture, ISA, VLSI course work; Threading, Scaling, Parallelization; Benchmarks
• EDA Vendors: Benchmarking; Reference flow; FPGA Emulation; Verification; Physical Design; Multi-threaded tools
• CMT Tools: Compilers, Threading Optimization, Performance Analysis
• Hardware IP Suppliers: PCI cores, SERDES etc.
• Operating Systems: OpenSolaris, , BSD variants, Embedded OSs
• Chip Designers: SoC designs, Hard macros, Telecom applications

What's Available in OpenSPARC

1. Chip design and verification
• UltraSPARC Architecture 2005 spec
• UltraSPARC T2/T1 implementation spec
• Full RTL () of OpenSPARC T2/T1 (8 cores, 64/32 threads; more than 4 million lines of code!)
• Verification test suites
• Full OpenSPARC simulation environment
• Synthesis scripts for RTL
• FPGA implementation support
  • Reduced (to fit capacity), synthesizable version of RTL
  • Synplicity scripts for FPGA synthesis

What's Available in OpenSPARC

2. Architecture and performance modeling
• SAM – SPARC Architectural Model (including source code)
• Legion – Instruction accurate simulator (incl. source code)
• OBP – Open Boot PROM source code
• Hypervisor source code
• Solaris images for simulation
• RST Trace Tool – trace format for SPARC instruction-level traces

What's Available in OpenSPARC

3. Tools for tuning and debug
• ATS – Binary reoptimization and recompilation tool for tuning and troubleshooting applications
• Corestat – Online monitoring of core and FPU utilization
• Discover – Runtime detection of programming errors in allocating and using program memory
• Thread Analyzer – Checking of multi-threaded programming errors such as data races and deadlocks
• More...

What's Available in OpenSPARC

4. Tools for software developers
• Sun Studio 12 – C, C++, Fortran compilers for Solaris/Linux combined with Netbeans, etc.
• BIT – Binary Improvement Tool analyzes and optimizes SPARC binaries for performance and code coverage
• SPOT – produces detailed report on conditions that impact performance of an application
• Source code analysis tool to identify incompatible APIs between Solaris and Linux to speed up migration
• More...

University research in hardware reliability using OpenSPARC

Architectural Fingerprints

Problem: Error detection for the processor pipeline (soft, wearout, …)

Solution: Architectural fingerprints
• Summarize retiring architectural updates into compact hash (regs, stores)
• Periodically compare hash with reference (another core, previous execution)

Results:
• Multithreaded OpenSPARC T1 RTL implementation: less than 4% area overhead
• Scalable to wide-issue superscalar BW
• Soft fault injection: effective detection for errors propagated to arch. state
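The summarize-and-compare idea can be sketched in software. The block below is a minimal illustration, not the authors' hardware design: it uses CRC32 from Python's zlib as a stand-in hash, and the update-record format, the interval length, and the function names are all assumptions.

```python
import zlib
from itertools import zip_longest


def fingerprints(updates, interval=4):
    """Hash retired architectural updates into one CRC32 value per `interval` updates.

    `updates` is a sequence of (kind, destination, value) tuples, for example
    ("reg", 3, 42) for a register write or ("store", 0x100, 7) for a store.
    """
    hashes, crc, count = [], 0, 0
    for kind, dest, value in updates:
        crc = zlib.crc32(f"{kind}:{dest}:{value};".encode(), crc)
        count += 1
        if count % interval == 0:      # end of a comparison interval
            hashes.append(crc)
            crc = 0
    if count % interval:               # flush a final partial interval
        hashes.append(crc)
    return hashes


def first_mismatch(reference, observed):
    """Index of the first interval whose fingerprints differ, or None if all match."""
    for index, (ref, obs) in enumerate(zip_longest(reference, observed)):
        if ref != obs:
            return index
    return None


if __name__ == "__main__":
    golden = [("reg", 1, 10), ("reg", 2, 20), ("store", 0x100, 30),
              ("reg", 3, 40), ("reg", 1, 50), ("store", 0x108, 60)]
    faulty = list(golden)
    faulty[4] = ("reg", 1, 51)         # injected error in the fifth retired update
    print(first_mismatch(fingerprints(golden), fingerprints(faulty)))
```

Only one small hash per interval has to be exchanged with the reference, which is what keeps the comparison bandwidth low. An error that never reaches a retired register write or store leaves the fingerprint unchanged, consistent with the result above that detection applies to errors propagated to architectural state.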

[Figure: pipeline diagram (Decode, Ex, Mem, Writeback) showing ALU, RegFile, Store Buffer, and D-Cache to L2, with fingerprint (FP) hash, queue, compare, and match logic]

[Chart: fraction of errors (0.0 to 1.0) by outcome (Silent Data Corruption, Hang, Loop) for SPARC units byp, exu, fcl, fdp, lsu, swl, tlu; series labeled arch. and Full Hash]

Prof. Hoe and Prof. Falsafi @ Carnegie Mellon University

FIRST – Detecting Emerging Wearout Faults

Problem: Detecting device wearout during soft breakdown stage
• Faults initially hidden by guardbands & masking

Solution: Periodically test processor cores for signs of growing wearout
• Reduce freq./voltage guardbands until marginal
• Test w/Arch. or μArch. fingerprints
• Observe fails at incr. conservative conditions

Results:
• Wearout fault injection in OpenSPARC
• Arch. and μArch. fingerprints equivalent for wide-spread wearout
• μArch. needed for isolated wearout

[Chart: fraction of fails detected (0 to 1) versus stress past guardband (0 to 200 ps); series: μArch, Arch, Timeout]

Prof. Hoe and Prof. Falsafi @ Carnegie Mellon University

SWAT – SoftWare Anomaly Treatment

Motivation
Low-cost solutions needed for in-field detection, diagnosis, recovery and repair for failures due to aging, soft errors, inadequate burn-in, design defects, …

SWAT Framework Components
• Detection: Software symptoms, minimal backup hardware
• Recovery: Software/hardware checkpoint and rollback
• Diagnosis: Firmware-controlled rollback/replay on multicore
• Repair/reconfiguration: Redundant, reconfigurable hardware

[Figure: timeline with labels Chkpoint, Fault, Error, Symptom detected, Recovery, Diagnosis, Repair; annotations: "Always-on, zero or low cost" and "May have high overhead, rarely invoked"]

Prof. S. Adve, V. Adve and Y. Zhou @ University of Illinois at U-C

SWAT – Status and Ongoing Work

Status
• Detection techniques with > 95% coverage for most structures [ASPLOS’08, SELSE’08, DSN’08]
• level, firmware-driven diagnosis with > 97% coverage [SELSE’08, DSN’08]
• So far, used microarchitecture-level fault injection in simulation

Ongoing/future work with OpenSPARC
• Gate-level fault modeling
• Hypervisor implementation

Prof. S. Adve, V. Adve and Y. Zhou @ University of Illinois at U-C

SWAT – Ongoing Work: High-level fault models and validation

Goals
• Understand how gate level faults propagate to microarch & s/w
• Abstract fault models at microarchitecture level
• Evaluate reliability solutions and validate results

Methodology
• Perform fault injections at gate level
• For better simulation speed
  • Hierarchical integration of microarchitecture level full system simulator with lower-level simulation of faulty unit
  • Using OpenSPARC Verilog model
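To illustrate what gate-level fault injection measures, the sketch below injects stuck-at faults on the internal nets of a one-bit full adder and reports how often each fault reaches the outputs. It is a toy stand-in written in Python rather than a simulation of the OpenSPARC Verilog model; the net names and the stuck-at fault model are assumptions made for the example.

```python
from itertools import product


def full_adder(a, b, cin, stuck=None):
    """Gate-level full adder. `stuck` is an optional (net_name, value) stuck-at fault."""
    def net(name, value):
        # Override the computed value if this net carries the injected fault.
        if stuck is not None and stuck[0] == name:
            return stuck[1]
        return value

    x1 = net("x1", a ^ b)
    a1 = net("a1", a & b)
    a2 = net("a2", x1 & cin)
    total = net("sum", x1 ^ cin)
    cout = net("cout", a1 | a2)
    return total, cout


def propagation_rate(net_name, value):
    """Fraction of input vectors for which the fault changes an output (is not masked)."""
    vectors = list(product((0, 1), repeat=3))
    visible = sum(full_adder(*v) != full_adder(*v, stuck=(net_name, value)) for v in vectors)
    return visible / len(vectors)


if __name__ == "__main__":
    for name in ("x1", "a1", "a2"):
        for value in (0, 1):
            print(f"{name} stuck-at-{value}: visible on {propagation_rate(name, value):.0%} of inputs")
```

Even in this tiny circuit, a1 stuck-at-0 is visible on only 2 of the 8 input vectors and is masked on the rest. Masking of this kind is the effect that a microarchitecture-level fault model has to capture if it is to stand in for gate-level injection.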

Prof. S. Adve, V. Adve and Y. Zhou @ University of Illinois at U-C

SWAT – Future Work: Hypervisor implementation

Plan to use OpenSPARC hypervisor to prototype and evaluate firmware part of SWAT

Methodology
• Leverage, extend interface between hypervisor/hardware and hypervisor/OS
• Extend hypervisor for functionality
• Use for error detection, recovery, diagnosis, repair

Prof. S. Adve, V. Adve and Y. Zhou @ University of Illinois at U-C

VARIUS – Process Parameter Variation

• Problem:
  • Parameter variation in present and future multicore chips
• Goals:
  • Model parameter variation and resulting timing errors
  • Design multicore to detect and tolerate variation-induced errors
  • Develop new microarchitectural techniques to mitigate variation and variation-induced errors
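The link between parameter variation and timing errors can be illustrated with a small Monte Carlo experiment. This is a generic sketch and not the VARIUS model: it assumes each critical path delay is the nominal delay scaled by an independent Gaussian factor, and every parameter name and value is hypothetical.

```python
import random


def timing_error_rate(nominal_ps, sigma_frac, clock_ps, paths=50, chips=2000, seed=0):
    """Fraction of simulated chips whose slowest path misses the clock period.

    nominal_ps : delay of every critical path without variation
    sigma_frac : standard deviation of the per-path delay as a fraction of nominal
    clock_ps   : clock period; a chip fails if any path is slower than this
    """
    rng = random.Random(seed)          # fixed seed keeps the experiment repeatable
    failing = 0
    for _ in range(chips):
        worst = max(nominal_ps * (1.0 + rng.gauss(0.0, sigma_frac)) for _ in range(paths))
        if worst > clock_ps:
            failing += 1
    return failing / chips


if __name__ == "__main__":
    for clock in (1050, 1100, 1150, 1200):
        rate = timing_error_rate(nominal_ps=1000, sigma_frac=0.05, clock_ps=clock)
        print(f"clock period {clock} ps: {rate:.1%} of chips have a failing path")
```

Sweeping the clock period exposes the trade-off behind the goals above: a design either runs slowly enough to cover the slowest path or accepts timing errors and detects and tolerates them.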

Prof. Torrellas @ University of Illinois at U-C

VARIUS – Process Parameter Variation: Accomplishments

• VARIUS model of parameter variation and resulting timing errors for microarchitects [TSM08]
• ReCycle: Pipeline rebalance under process variation [ISCA07]
• Fine-grain adaptive body bias (ABB) to mitigate variation in multicores [MICRO07]
• Workload scheduling and DVFS power management in multicores under variation [ISCA08]
• Paceline: Core pairing for reliability under process variation [PACT07]

Prof. Torrellas @ University of Illinois at U-C

VARIUS – Process Parameter Variation: Using OpenSPARC

• Goal: Get insights into the effect of parameter variation on a real processor
  • Measure the distribution of the path delays
  • Apply the variation model
• Evaluation Flow:
  • Synopsys dc_shell-t: compile RTL to gate-level netlist
  • Cadence SOCEncounter: Floorplan, Placement, Routing, Timing analysis
  • Synopsys Primetime: Static timing analysis & timing debugging
  • Cadence NCSim: Simulation

[Figure: flow diagram with stages Design entry; RTL & Timing Constraints & Library; Compile RTL (dc_shell-t); Netlist & Timing Constraints & Physical library; Placement & Routing (SOCEncounter); Timing report; Netlist & Timing info; Primetime; NCSim]

Prof. Torrellas @ University of Illinois at U-C

CASP – Concurrent Autonomous Chip Self-test using Stored Patterns

Motivation

[Figure: failure rate versus time (bathtub curve) with three regions]
• Infant mortality: burn-in difficult
• Normal lifetime: soft errors, effective techniques exist
• Wearout: circuit aging dominant

Solution: EXTREMELY THOROUGH online self-test

Prof. Mitra @ Stanford University

CASP – Test Flow

[Figure: four-stage cycle; in each stage the remaining cores (... Core N) stay in normal operation]
1. Test Scheduling: Core 4 selected for test. Next step: schedule test on next core.
2. Pre-processing: Core 4 temporarily isolated. Next step: prepare core for test.
3. Test Application: Core 4 under test; thorough scan & functional testing.
4. Post-processing: Core 4 resumes operation. Bring core from test to normal operation; recovery if failed.

Prof. Mitra @ Stanford University

OpenSPARC Modifications for CASP

[Figure: chip block diagram with 8 processor cores, crossbar switch, L2 cache, FPU, DRAM control, Jbus interface, CASP controller (CASP control), on-chip buffer for scan test data (7.5KB), and off-chip storage (52MB); two blocks are labeled "Modified for CASP support"]

Architectural modifications
➢ Before a core is tested
  ➢ stalling/draining pipeline
  ➢ disabling communication with core under test
  ➢ saving critical state
  ➢ invalidating D$
➢ After a core is tested
  ➢ restoring critical state
  ➢ enabling communication with core under test
  ➢ restarting pipeline

● 8000 lines of new Verilog code
● Verification regression used to simulate normal operation of chip

Prof. Mitra @ Stanford University
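The CASP test flow and the before/after steps listed above can be tied together in a short software model. This is only a sketch of the control sequence: the real controller is hardware, the round-robin order is an assumption, and `run_test` is a hypothetical callback standing in for the scan and functional tests.

```python
BEFORE_TEST = ("stall/drain pipeline", "disable communication with core",
               "save critical state", "invalidate D$")
AFTER_TEST = ("restore critical state", "enable communication with core",
              "restart pipeline")


def casp_round(num_cores, run_test):
    """Test every core once, one at a time, while the other cores keep running.

    run_test(core) -> bool is a placeholder for applying the stored patterns.
    Returns (log, failed): the ordered list of (core, action) steps and the
    cores whose test failed.
    """
    log, failed = [], []
    for core in range(num_cores):            # test scheduling: next core in turn
        for step in BEFORE_TEST:             # pre-processing: isolate the core
            log.append((core, step))
        passed = run_test(core)              # test application
        log.append((core, "pass" if passed else "fail"))
        if not passed:
            failed.append(core)              # placeholder: recovery would start here
        for step in AFTER_TEST:              # post-processing: return to normal operation
            log.append((core, step))
    return log, failed


if __name__ == "__main__":
    log, failed = casp_round(8, run_test=lambda core: core != 4)
    print("failed cores:", failed)
    for entry in log[:8]:
        print(entry)
```

Because only one core is taken out of service at a time, the remaining cores keep running the workload. The recovery step is left abstract here because the slides do not describe it.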

Future research possibilities in hardware reliability using OpenSPARC

Future research possibilities

• Using CMT hardware resources for error detection and recovery
  • cores, threads, structures used by cores/threads
• Understanding errors in the context of CMT architectural constructs
  • thread arbitration and scheduling
  • speculative threading
• Validate error management solutions using a state-of-the-art microprocessor design

Future research possibilities

• Study impact of reliability solutions on microprocessor performance
  • use performance tools available in OpenSPARC
• Firmware and software solutions for hardware reliability
  • FPGA implementation and T1000/2000 servers with Solaris/Hypervisor source and other tools
• Study impact of error detectors in processor on chip level and application failure rates
  • enable error detection selectively, use simulators
• Several more...

Conclusions

• OpenSPARC is an open source community based around UltraSPARC T1 and T2 CMT
• OpenSPARC provides a rich, state-of-the-art infrastructure for research in hardware reliability
• Many universities are actively using OpenSPARC in their research, with a lot of success
• There is a lot more research in hardware reliability that can be done using OpenSPARC

Acknowledgment

We would like to acknowledge the students (past and present) from Carnegie Mellon University, University of Illinois at U-C and Stanford University who contributed to the research described in this presentation.
