A Case for the Runtime Validation of Hardware

Sharad Malik, Princeton University
IBM Verification Conference, Haifa, 13 November 2005

Research supported by the Microelectronics Advanced Research Consortium (MARCO) through the Gigascale Systems Research Center (GSRC)

Outline

• The Verification Gap/Crisis
• Different Failure Modes and Runtime Validation
• Microarchitectural Solutions using Runtime Validation
• Runtime Property Checking
• A General RTL Methodology
• Summary and Conclusions

Rate of Increase of Design Complexity

• Moore’s Law: growth rate of transistors/chip is exponential
• Corollary 1: growth rate of state bits/chip is exponential
• Corollary 2: growth rate of state space (a proxy for complexity) is doubly exponential

But…

• Corollary 3: growth rate of compute power is exponential

Thus…

• Growth rate of complexity is exponential relative to our ability to deal with it
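The corollaries can be made concrete with a little arithmetic. This uses an illustrative assumption that does not appear on the slides: state bits double every T years and compute power doubles every T′ years. The symbols n_0, P_0, T, and T′ are hypothetical.

```latex
\begin{aligned}
n(t)   &= n_0\,2^{t/T}                     && \text{state bits per chip (Corollary 1)}\\
|S(t)| &= 2^{n(t)} = 2^{\,n_0 2^{t/T}}     && \text{size of the state space (Corollary 2)}\\
P(t)   &= P_0\,2^{t/T'}                    && \text{compute power (Corollary 3)}\\
\frac{|S(t)|}{P(t)} &= 2^{\,n_0 2^{t/T} \;-\; t/T' \;-\; \log_2 P_0} && \text{complexity relative to compute}
\end{aligned}
```

The term n_0·2^(t/T) dominates the exponent of the ratio and grows exponentially, while t/T′ grows only linearly. Even on a logarithmic scale, then, complexity pulls away from compute power exponentially.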

But what about…

• The fact that not all state bits contribute equally to complexity (e.g., memory)
• The fact that design reuse enables use of pre-verified IP

So maybe it’s not that bad…

The Verification Gap

[Chart: design size and simulation speed, 1980 to 2005, for processors and GPUs from the i286 and i386 through the Pentium IV, Prescott, and GeForce FX; the widening distance between the two curves is the verification gap]

Source: Valeria Bertacco, Univ. of Michigan

The Verification Gap

[Figure: © EETimes 03/18/2004; original source: SIA]

Data from Microprocessor Verification

• Functional validation is a major bottleneck
  - Deeply pipelined, complex microarchitectures

[Chart: pre-silicon logic bugs per generation: Pentium 800, Pentium Pro 2240, following generation 7855, "Next ?" 25000. Source: Tom Schubert, DAC 2003]

• Logic bugs increase at 3 to 4 times per generation
  - The bug count grows exponentially over time

High Cost of Verification

[Chart: verification effort vs. design size (logic gates). 1995: 1M gates, 20 engineer-years, 100M simulation vectors; 2001: 10M gates, 200 engineer-years, 10B vectors; 2007: 100M gates, 2000 engineer-years, 1000B vectors. Source: Synopsys]

Data from SoC designs. Source: G. Spirakis, keynote address at DATE 2004

Yet Increasing Number of Bug Escapes…

[Chart: percent of IC designs vs. spin count (1, 2, 3, 4, ≥5), for 2000 and 2002 designs. Source: Collett International Research]

71% of SoC re-spins are due to logic bugs

• High cost of re-spins
  - Increasing mask costs ($5M and rising)
• Cost of recalls even higher
  - Pentium FDIV bug recall: about $475 million

Time to Market Costs

• Delays in re-spins ⇒ lost time to market
  - Strong pressure for “first time correct” silicon
  - Chips verified to death

Runtime Validation

• Increasingly, we need to reconcile ourselves to the fact that hardware, like software, will be shipped with bugs
• Runtime verification (through error detection and recovery) offers a potentially scalable solution
  - Provides robustness in the face of inevitable bug escapes
• Significantly reduces verification costs
  - Verify chips “to life” rather than “to death”


Sources of Errors

• Operational failures
  - Process/operating-condition variations, aging
    - Device failure
    - Timing failure
  - Soft errors
  - Aggressive deployment
    - Low Vdd
    - Overclocking
    - Pushing the design to the edge, which can result in device/timing failures
  - Manufacturing defects
• Functional failures
  - Design complexity

Increasing Process Variations

Original source: Sani Nassif, IBM

[Chart: percentage of total variation accounted for by within-die variation (device and interconnect)]

• Increase in variation of process parameters over generations
• Worst-case design getting more expensive
• “Better than worst-case” design must be error tolerant

Other Variations

[Chart: heat flux (W/cm²), resulting in Vcc variation, and temperature variation (°C), resulting in hot spots]

[Chart: random dopant fluctuations: mean number of dopant atoms vs. technology node, from 1000 nm down to 32 nm]

Source: Shekhar Borkar, Intel

Soft-Error Trends

[Chart: soft-error rate (FIT/chip, 10^-7 to 10^4) vs. technology generation, from 600 nm (1992) to 50 nm (2011), for SRAM, latches (6 FO4s), and logic (6 FO4s). Source: P. Shivakumar et al., DSN 2002]

• SER per chip of logic circuits
  - Nine orders of magnitude increase from 600 nm to 50 nm
  - Dominant source of soft errors after 50 nm

Classification of Error Modes

• Two axes of classification
  - Temporal (same instance, different invocations): transient vs. permanent
  - Spatial (different instances, possibly on the same chip): probabilistic vs. deterministic
• Runtime error detection and correction is applicable to all error modes; however, the error mode determines which class of techniques can be applied:
  - Replication works for probabilistic errors, but not for deterministic errors.
  - Repeating computation works for transient errors, but not for permanent ones.
• Design errors dominate along both axes

[Figure: soft errors, process/OC variations, aggressive deployment, manufacturing defects, and design errors placed on the spatial (probabilistic/deterministic) and temporal (transient/permanent) axes]

Runtime Validation: Quick Analysis

• Pros
  - Do not have to deal with all possible behavior, only the actual behavior
    - No state explosion problem
  - Checking circuits only need to recognize the symptoms when the bug is exercised, not its cause, e.g.:
    - Check the results of division using multiplication
    - Check if a bus is deadlocked
  - Recovery does not need to fix the bug, only ensure forward progress
• Cons
  - Recovery is likely to be hard
    - Need a trusted alternate computation path
    - Trade off performance for simplicity, e.g., a simpler, slower divider or a simpler, slower protocol
  - Additional complexity
    - Who checks the checker?
  - Performance, power, and area overhead
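The idea of checking division with multiplication, together with a simpler, slower divider as the fallback path, can be sketched in Python. This is an illustrative model, not a circuit from the talk. It assumes unsigned operands that fit in `width` bits and a nonzero divisor. `buggy_divider` is a hypothetical stand-in for a fast but untrusted divider.

```python
from typing import Callable, Tuple

Divider = Callable[[int, int], Tuple[int, int]]


def restoring_divide(a: int, b: int, width: int = 32) -> Tuple[int, int]:
    """Simple, slow shift-and-subtract divider used as the trusted fallback path."""
    q, r = 0, 0
    for i in range(width - 1, -1, -1):
        r = (r << 1) | ((a >> i) & 1)  # bring down the next dividend bit
        if r >= b:                     # subtract the divisor when it fits
            r -= b
            q |= 1 << i
    return q, r


def check_division(a: int, b: int, q: int, r: int) -> bool:
    """Checker: detects a wrong result (the symptom) with one multiply-add."""
    return a == q * b + r and 0 <= r < b


def checked_divide(a: int, b: int, fast: Divider) -> Tuple[int, int, bool]:
    """Use the fast divider; if the check fails, recompute on the trusted path.

    Returns (quotient, remainder, recovered).
    """
    q, r = fast(a, b)
    if check_division(a, b, q, r):
        return q, r, False
    q, r = restoring_divide(a, b)
    return q, r, True


def buggy_divider(a: int, b: int) -> Tuple[int, int]:
    """Hypothetical fast divider with an injected bug whenever the divisor is 3."""
    q, r = divmod(a, b)
    return (q + 1, r) if b == 3 else (q, r)
```

For example, `checked_divide(100, 3, buggy_divider)` detects the bad quotient and returns `(33, 1, True)`. `checked_divide(100, 7, buggy_divider)` passes the check and returns `(14, 2, False)`. The checker never needs to know why the divider is wrong: it recognizes the symptom, not the cause.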


Instruction Set Processor Validation

• Perform online checking to detect and rectify faults [Austin, Micro 1999]

[Figure: DIVA pipeline. The core (IF, ID, REN, REG, SCHEDULER, EX/MEM) provides performance and passes speculative instructions, in order and with their inputs and outputs, to the checker (CHK) before commit (CT), which provides correctness. Source: Todd Austin, Univ. of Michigan]

• Key observations
  - Correctness: the checker implements the same abstract functional interface
    - A simple checker design enables high-quality functional verification
  - Performance: leverage the core processor’s pre-execution to streamline the checker
    - Cache prefetching and branch/value prediction
  - Robustness: focus on the reliability of the checker
    - A simpler, slower checker can be built with large timing margins and large transistors

Validating Simultaneous Multithreading

• Motivation
  - High practical application
    - General processing trend
  - High potential for general lessons
    - Race conditions
    - Synchronization
    - Access to shared data
    - …
• So what do we need to do beyond checking each thread with a checker processor…

Kaiyu Chen and Sharad Malik, “Runtime Validation of Multithreaded Processors,” Technical Report, Dept. of Electrical Engineering, Princeton University, May 2005. Available by email from the authors.

Memory Consistency Between Threads

flag is initialized to 0. Thread 1 and Thread 2 both guard the shared data with this spin lock:

    lock:   t&s reg, flag   ; test-and-set: reg <- flag, flag <- 1
            bnz lock        ; spin while the lock was already held
            ...             ; critical section (shared data)
    unlock: st  flag, #0    ; release the lock

[Figure: timeline of the two threads’ t&s and st operations as executed on the SMT processor and as re-executed on the checker processors, with a point marked “Error?”]

Correctness of Synchronization Instructions

Two semaphores, sem1 and sem2, are initialized to 0:

    Thread 1:        Thread 2:
    ...              ...
    SEMV(sem1);      SEMV(sem2);
    SEMP(sem2);      SEMP(sem1);
    ...              ...

[Figure: timeline of Thread 1 and Thread 2 each executing SEMV and then SEMP; together the two pairs act as a barrier]

• What is the correct semantics for these operations in the checker processors?

Forwarding of Erroneous Results

    Thread 1:                      Thread 2:
    ...                            ...
    if (flag == 0) key = data;     i = i + 1;
    result = foo(key);             data = i;
    ...                            ...

[Figure: timeline. Thread 1’s bne resolves with a correct prediction. Thread 2’s add completes with an error, and its st is buffered in the LSQ. The value is forwarded from the LSQ to Thread 1’s ld. Is the ld valid?]

The Integrated Runtime Validation Solution

[Figure: the SMT processor dispatches per-thread retired instructions to DIVA checker processors, alongside a hardware synchronization unit and runtime monitoring hardware with a context status register; the architected state comprises the register file and memory]

Experimental Results

[Chart: normalized execution time (0.9 to 1.2) on the SPLASH-2 benchmarks FFT, LU, CHOLESKY, BARNES, FMM, WATER-NSQUARED, and WATER-SPATIAL, for the runtime validation configuration at fault rates of 1/100, 1/1K, 1/10K, 1/100K, and 1/1M]

Microarchitectural Support for Validation

[Figure: prior work classified by fault detection, fault recovery, and fault detection + recovery. Single-thread domain (uniprocessor correctness): DIVA; AR-SMT, SRT, DUSD; RSE; Fingerprinting; Cherry. Multi-thread domain (memory consistency, cache coherence): SafetyNet, Cantin, Meixner, Sorin.]

Microarchitectural Support for Validation: References

• [DIVA] Todd Austin. “DIVA: A Reliable Substrate for Deep Submicron Microarchitecture Design.” 32nd Annual ACM/IEEE Int’l Symposium on Microarchitecture (MICRO), 1999.
• [AR-SMT] E. Rotenberg. “AR-SMT: A Microarchitectural Approach to Fault Tolerance in Microprocessors.” 29th Int’l Symposium on Fault-Tolerant Computing, June 1999.
• [SRT] S. K. Reinhardt and S. S. Mukherjee. “Transient Fault Detection via Simultaneous Multithreading.” 27th Annual Int’l Symposium on Computer Architecture (ISCA), 2000.
• [DUSD] Joydeep Ray, James C. Hoe, and Babak Falsafi. “Dual Use of Superscalar Datapath for Transient-Fault Detection and Recovery.” Int’l Symposium on Microarchitecture (MICRO), December 2001.
• [RSE] Nithin Nakka, Jun Xu, Zbigniew Kalbarczyk, and Ravishankar K. Iyer. “An Architectural Framework for Providing Reliability and Security Support.” IEEE Int’l Conference on Dependable Systems and Networks (DSN), 2004.
• [Fingerprinting] Jared C. Smolens, Brian T. Gold, Jangwoo Kim, Babak Falsafi, James C. Hoe, and Andreas G. Nowatzyk. “Fingerprinting: Bounding Soft-Error Detection Latency and Bandwidth.” Int’l Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), October 2004.
• [Cherry] J. F. Martínez, J. Renau, M. C. Huang, M. Prvulovic, and J. Torrellas. “Cherry: Checkpointed Early Resource Recycling in Out-of-Order Microprocessors.” Int’l Symposium on Microarchitecture (MICRO), November 2002.
• [SafetyNet] Daniel J. Sorin, Milo M. K. Martin, Mark D. Hill, and David A. Wood. “SafetyNet: Improving the Availability of Shared Memory Multiprocessors with Global Checkpoint/Recovery.” 29th Int’l Symposium on Computer Architecture (ISCA), May 2002.
• [Meixner] Albert Meixner and Daniel J. Sorin. “Dynamic Verification of Sequential Consistency.” 32nd Annual Int’l Symposium on Computer Architecture (ISCA), June 2005.
• [Cantin] J. Cantin, M. Lipasti, and J. E. Smith. “Dynamic Verification of Cache Coherence Protocols.” Workshop on Memory Performance Issues, Gothenburg, Sweden, June 2001.
• [Sorin] Daniel J. Sorin, Mark D. Hill, and David A. Wood. “Dynamic Verification of End-to-End Multiprocessor Invariants.” Int’l Conference on Dependable Systems and Networks (DSN), June 2003.

Assertion-Based Runtime Validation

• Terminology
  - Property: a quality or trait belonging to a design (based on specifications)
    - e.g., any request should be acknowledged eventually
  - Assertion: typically a syntactic statement of a property that should hold
    - PSL: assert always (req -> eventually! ack);
    - Assertions are available from the design, simulation, and formal verification processes
  - Runtime checker: hardware responsible for validating a property or properties
    - Assertions can be synthesized [Abarbanel99]
• Runtime validation based on assertions/specifications
  - Find/write the assertions, e.g., G(Req → X Ack)
  - Generate hardware models for error detectors based on those assertions
  - Implement a recovery mechanism (design specific), e.g., invalidate all requests

[Figure: checker automaton for G(Req → X Ack), with transitions labeled Req, Ack, !Req, !Ack and an error output Err]
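Below is a minimal sketch of what a generated error detector for G(Req → X Ack) computes. It is written as a cycle-by-cycle Python model rather than HDL. The class and function names are hypothetical, and the recovery step (e.g., invalidating all requests) is left to whoever consumes the Err signal.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class NextCycleAckChecker:
    """Runtime checker for G(Req -> X Ack), modeled one clock cycle per step().

    In hardware this is one flip-flop (`pending`) plus an AND gate with one
    inverted input.
    """
    pending: bool = False  # register: was Req high in the previous cycle?

    def step(self, req: bool, ack: bool) -> bool:
        err = self.pending and not ack  # last cycle's obligation not met
        self.pending = req              # latch this cycle's Req
        return err


def run_checker(trace: Iterable[Tuple[bool, bool]]) -> List[bool]:
    """Feed a (req, ack) trace to a fresh checker; return Err for each cycle."""
    checker = NextCycleAckChecker()
    return [checker.step(req, ack) for req, ack in trace]
```

A Req on the last cycle of a finite trace leaves an obligation that this model never reports, because Err is raised only on the following cycle.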

A Classification of Properties

• Based on time
  - Liveness: good things will eventually happen
    - e.g., G(Req → F Ack)
    - Bounded liveness: fix a time bound by which this must happen
  - Safety: bad things should never happen
    - e.g., G(Req → X X Ack)
• Based on spatial distribution
  - Local: all necessary information can be gathered easily (e.g., within one clock cycle)
    - e.g., G(Req → X !Req)
  - Distributed: signals are separated in space and difficult to gather
    - e.g., dual ownership of cache lines in a shared-memory multiprocessor system
• Based on recovery requirement
  - Soft: only control bits may be corrupted
    - e.g., a deadlock situation or bad control state; data is safe
  - Hard: both data and control bits may be corrupted
    - e.g., dual ownership of cache lines in a shared-memory multiprocessor system
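Bounded liveness is what makes a liveness property checkable at runtime. The unbounded G(Req → F Ack) becomes "Ack within k cycles", which a counter can monitor. The sketch below is a Python model of such a checker. It assumes a single outstanding request at a time, and `bound` and `counter_bits` are hypothetical parameters (the 10-bit default is an arbitrary illustrative choice).

```python
from typing import Iterable, List, Tuple


class BoundedLivenessChecker:
    """Checker for bounded liveness: every Req gets an Ack within `bound` cycles.

    Simplifying assumption: only the oldest unacknowledged Req is timed, and
    any Ack discharges it. The counter saturates at `bound`, so a hardware
    version needs only enough bits to hold `bound`.
    """

    def __init__(self, bound: int, counter_bits: int = 10) -> None:
        if not 0 <= bound < 2 ** counter_bits:
            raise ValueError("bound must fit in the counter")
        self.bound = bound
        self.waiting = False  # an unacknowledged Req is outstanding
        self.count = 0        # cycles elapsed since that Req

    def step(self, req: bool, ack: bool) -> bool:
        """Advance one clock cycle; return True while the bound is violated."""
        if ack:  # obligation discharged (an Ack in the Req cycle also counts)
            self.waiting, self.count = False, 0
            return False
        if self.waiting:
            self.count = min(self.count + 1, self.bound)  # saturating counter
        elif req:
            self.waiting, self.count = True, 0            # start timing a new Req
        return self.waiting and self.count >= self.bound


def run_bounded(bound: int, trace: Iterable[Tuple[bool, bool]]) -> List[bool]:
    """Feed a (req, ack) trace to a fresh checker; return the error flag per cycle."""
    checker = BoundedLivenessChecker(bound)
    return [checker.step(req, ack) for req, ack in trace]
```

With `bound=3`, an Ack three cycles after the Req is accepted. If no Ack has arrived by then, the error flag rises on that third cycle and stays high until an Ack appears.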

Compositional Reasoning using Runtime Validation

[Figure: applicable technique by state-space size and property type. Small state space: RV and MC for local properties, MC for distributed properties. Large state space: RV for local properties, ??? for distributed properties.]

• Complementary relation between runtime validation (RV) and model checking (MC)
  - RV does not suffer from state-space explosion
  - MC handles distributed properties easily
• Basic idea
  - Runtime-validate some property R
  - Use it in offline model checking of property P
  - Proving P with assumption R using MC guarantees (R → P)
  - As long as R holds at runtime, P holds at runtime
• When does this help?
  - MC does not “complete” checking P without assumption R
  - MC does not “complete” checking R, i.e., RV is necessary for validating R at runtime

Using Runtime Abstracters

• Runtime abstraction: use abstraction in model checking, and validate the correctness of the abstraction at runtime
  - Allows a wider range of abstractions (e.g., property specific, not just parametric)

[Figure: three panels. Design: A and B. Checker: C checks abstracter B’ at runtime. Model checking: uses A with abstracter B’.]

Experimental Results

Model checking single-cluster CCP

    Processor   With          Without       Checking
    count       assumptions   assumptions   assumptions
    4           0.08          0.51          9.87
    6           0.33          8.94          143.57
    8           0.61          29.06         348.95
    10          0.8           68.86         1606.61
    12          5.13          141.33        8008.13
    14          3.07          183.51        145989.05
    16          13.2          656.41        TIME-OUT

Model checking multi-cluster TokenShare (A = assumptions)

    Unit #   Safety/A   Safety   Check A   Liveness/A   Liveness
    4        2.03       4.45     27.11     67.19        385.72
    6        8.16       14.84    105.47    294.94       85428.15
    8        50.12      74.56    448.99    1840.63      TIME-OUT
    10       66.09      57.02    448.32    2007.45      TIME-OUT
    12       133.82     82.88    723.48    4212.76      TIME-OUT
    14       285.83     120.39   1110.78   9053.72      TIME-OUT
    16       408.03     158.98   1423.31   13810.5      TIME-OUT

Results for using runtime assumptions. Time unit is seconds; TIME-OUT is 2 days.

• Runtime abstraction: replacing one 16-unit cluster with the runtime abstracter reduces TokenShare verification time from 408 seconds to 56.2 seconds
• HDL implementation of CCP: 1600 lines of Verilog code. No memory overhead for safety checkers. Bounded liveness checkers use a 10-bit counter for the 4-cluster, 16-processor configuration.

Ali Bayazit and Sharad Malik, “Complementary Use of Runtime Validation and Model Checking,” in ICCAD’05: Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2005.

General Design with Integrated Runtime Validation

• Conceptually, runtime-validated systems have three essential components:
  - Primary implementation of the design (D)
  - Runtime checkers (C)
  - Design-specific runtime error recovery (R)
• Usually, the designer has to worry about the interactions between D, C, and R and ensure that:
  - Checkers and recovery are “silent” under normal operation
  - The design halts when an error is detected
  - Recovery kicks in safely when an error is detected
  - Recovery occurs correctly
  - The design continues execution safely after recovery
• Given D, C, and R, can we relieve the designer of correctly implementing their interactions?

A Proposal for an RTL Methodology Solution

• Clearly separate D, C, and R in the specification
• Language semantics should enforce useful properties of the interactions between D, C, and R
  - Checkers and recovery are “silent” under normal operation
  - The design halts when an error is detected
  - Recovery kicks in safely when an error is detected
  - Recovery occurs correctly
  - The design continues execution safely after recovery
• Leave the actual implementation of the interactions between D, C, and R to synthesis
  - Use a generic hardware implementation template that respects these semantics
• Additional checking/recovery-specific language features
  - e.g., a checkpointed register data type for rollback and recovery, implemented using automated checkpointing

Example

• A possible specification could look like this:

    { design D:   usual HDL description of the design }
    while
    { checker C:  monitor property at runtime }
    else
    { recovery R: recovery procedure }
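The while/else construct can be given an executable meaning with a small Python model. This is only one possible interpretation. It assumes a single clock, a checker that reads D’s current state, and recovery that takes effect in the same cycle the error is flagged. The function and example names are hypothetical.

```python
from typing import Callable, Dict, Tuple

State = Dict[str, int]


def run_d_c_r(state: State,
              design: Callable[[State], State],
              checker: Callable[[State], bool],
              recovery: Callable[[State], State],
              cycles: int) -> Tuple[State, int]:
    """One cycle-level reading of `{design D} while {checker C} else {recovery R}`.

    Each cycle, C examines D's state. If the property holds, D takes its normal
    step and R stays silent; otherwise D is stalled for that cycle and R may
    rewrite D's state. Returns (final state, number of recoveries).
    """
    recoveries = 0
    for _ in range(cycles):
        if checker(state):
            state = design(state)    # normal operation
        else:
            recoveries += 1          # error: stall D ...
            state = recovery(state)  # ... and let R repair its state
    return state, recoveries


# Hypothetical example: a mod-10 counter whose design forgets to wrap at 9.
def buggy_counter(s: State) -> State:
    return {"count": s["count"] + 1}


def count_in_range(s: State) -> bool:
    return 0 <= s["count"] < 10


def reset_counter(s: State) -> State:
    return {"count": 0}
```

Here `run_d_c_r({"count": 0}, buggy_counter, count_in_range, reset_counter, 12)` returns `({"count": 1}, 1)`: the out-of-range value 10 is caught once, R resets it, and D carries on. In this reading the erroneous state is visible for one cycle before repair. A forward-recovery variant would instead check D’s proposed next state before committing it.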

Example (contd.)

General Design Template

• C and D operate in parallel
• C can examine D’s state
• On detection of an error, D gets stalled, and R is triggered
• R is permitted to change D’s state to perform recovery
• D continues operation after recovery

Backward Error Recovery

• Checkpoint the state regularly
• In case of an error detected by C:
  - C stalls D
  - D is rolled back using the checkpointed state
  - R performs the specified recovery

[Figure: timeline: checkpoint, compute, compute, compute, checkpoint, …; each computation is followed by a check. If the check is OK, no stall is needed. If it fails (!), a stall is issued, followed by rollback and recovery.]

• The stall can arrive after multiple computation cycles
• Pipeline for throughput
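Below is a minimal Python model of the backward scheme. All three roles are hypothetical choices made for illustration: D is a toy accumulator, C is a recompute-and-compare checker, and R is plain re-execution from the checkpoint.

```python
import copy
from typing import Dict, List, Set, Tuple

State = Dict[str, int]


def run_backward_recovery(inputs: List[int], interval: int,
                          transient_faults: Set[int]) -> Tuple[State, int, int]:
    """Backward error recovery on a toy accumulator design D.

    D adds one input per cycle. `transient_faults` lists hypothetical cycles
    whose first execution flips bit 4 of D's result. C recomputes the sum and
    compares; on a mismatch, D is rolled back to the last checkpoint and the
    cycles since then are re-executed (R is plain re-execution here).
    Returns (final state, rollbacks, total cycles executed).
    """
    state: State = {"acc": 0}
    checkpoint, checkpoint_cycle = copy.deepcopy(state), 0
    pending = set(transient_faults)
    rollbacks = executed = 0
    cycle = 0
    while cycle < len(inputs):
        if cycle % interval == 0:                   # periodic checkpoint
            checkpoint, checkpoint_cycle = copy.deepcopy(state), cycle
        result = state["acc"] + inputs[cycle]       # D computes
        if cycle in pending:                        # transient: corrupts once only
            pending.discard(cycle)
            result ^= 1 << 4
        executed += 1
        if result != state["acc"] + inputs[cycle]:  # C checks
            rollbacks += 1
            state = copy.deepcopy(checkpoint)       # stall D and roll back
            cycle = checkpoint_cycle                # R: re-execute from checkpoint
            continue
        state = {"acc": result}                     # check OK: no stall needed
        cycle += 1
    return state, rollbacks, executed
```

With inputs 1 to 8, `interval=4`, and a fault at cycle 5, the run rolls back once to the checkpoint taken at cycle 4. It ends with an accumulator of 36 after executing 10 cycles instead of 8. A fault late in an interval costs more re-execution, which is the tradeoff in choosing how often to checkpoint. If the fault were permanent rather than transient, plain re-execution would never make progress, so R would need a different path, such as the simpler, slower alternate computation mentioned earlier.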

Forward Error Recovery

• State in D is committed only after the checker approves
  - The checker “enables” or “stalls” the design
• In case of an error detected by C:
  - C does not enable the state update of D
  - R is triggered to perform the prescribed recovery

[Figure: timeline: compute, compute, …; each computation is checked before the state update. If the check is OK, no stall is needed and the state is updated. If it fails (!), a stall is issued and recovery follows.]

• The stall arrives before the next computation cycle
• Pipeline for throughput

Synthesis Tasks

• Coordinate the actions of D, C, and R
• Guarantee the timing aspects of the recovery semantics
• Identify the parts of D that need to be stalled during recovery
• Make R electrically robust and less susceptible to process variations and noise (compared to D)
• Help with generating checker/recovery circuits?

Advantages

• Separation between the blocks by intent makes checking and recovery easy to reason about
  - Implementation guarantees the language semantics
  - Generic, reusable implementation template
• Tools can handle different blocks differently
• Easier to maintain the design


Summary and Conclusions

• Design complexity is increasing exponentially faster than our ability to handle it
  - Increasing cost of verification
  - Increasing bug escapes
• Runtime validation is inevitable for increasing operational failures
  - Consider functional failures in the same framework
• Runtime validation as an insurance policy for functional failures
  - Learning to live with bug escapes
• Already being considered for specific instances
• Possible use in property checking
• Can we bring this into general RTL design?
  - Clean separation of design, checking, and recovery through language semantics and synthesis support
