A Case for the Runtime Validation of Hardware
Sharad Malik, Princeton University
IBM Verification Conference, Haifa, 13 November 2005
Research supported by the Microelectronics Advanced Research Consortium (MARCO) through the Gigascale Systems Research Center (GSRC)

Outline
The Verification Gap/Crisis
Different Failure Modes and Runtime Validation
Microarchitectural Solutions using Runtime Validation
Runtime Property Checking
A General RTL Methodology
Summary and Conclusions

Rate of Increase of Design Complexity
Moore’s Law: Growth rate of transistors/chip is exponential
Corollary 1: Growth rate of state bits/chip is exponential
Corollary 2: Growth rate of state space (proxy for complexity) is doubly exponential
But…
Corollary 3: Growth rate of compute power is exponential
Thus…
Growth rate of complexity is exponential relative to our ability to deal with it
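
A rough worked version of this argument, under the idealized assumption (not stated on the slide) that state bits n(t) and compute power c(t) both double every T years; the symbols n_0, c_0, S and T are introduced here only for illustration:

```latex
\begin{align*}
n(t) &= n_0\, 2^{t/T} && \text{(state bits, Corollary 1)}\\
S(t) &= 2^{n(t)} = 2^{\,n_0 2^{t/T}} && \text{(state space, Corollary 2)}\\
c(t) &= c_0\, 2^{t/T} = \frac{c_0}{n_0}\, n(t) && \text{(compute power, Corollary 3)}\\
\frac{S(t)}{c(t)} &= \frac{1}{c_0}\, 2^{\,n_0 2^{t/T} - t/T},
\qquad S = 2^{(n_0/c_0)\, c}
\end{align*}
```

Under these assumptions the state space is exponential in the available compute power, which is the sense in which complexity grows exponentially relative to our ability to deal with it.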

But what about…
The fact that not all state bits contribute equally to complexity (e.g. memory)
The fact that design reuse enables use of pre-verified IP
So maybe it's not that bad…

The Verification Gap
[Figure: design size (transistors) and simulation speed (sim. cycles/sec) versus year, 1980 to 2005, for CPUs (i286, i386, i486, Pentium, Pentium Pro, Pentium II, K7, Crusoe, Pentium IV, Prescott, Power IV) and GPUs (Nvidia NV2, Riva TNT2, GeForce II, GeForce IV, Radeon 97, GeForce FX). The growing distance between the two curves is labeled "Verification gap".]
Source: Valeria Bertacco, Univ. of Michigan

The Verification Gap
[Figure] © EETimes 03/18/2004. Original source: SIA

Data from Microprocessor Verification
Functional validation is a major bottleneck
Deeply pipelined complex microarchitectures
Pre-silicon logic bugs per generation (Source: Tom Schubert, Intel, DAC 2003)
  Pentium: 800
  Pentium Pro: 2240
  Pentium 4: 7855
  Next (?): 25000
Logic bugs increase at 3-4 times/generation
The bug count grows exponentially over time
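
As a quick check of the per-generation growth, the following Python sketch computes the ratio between consecutive bars of the chart. The counts are the ones shown on the slide; the last entry is the slide's projection rather than a measurement, and the names `BUGS_PER_GENERATION` and `growth_factors` are illustrative only.

```python
# Pre-silicon logic bug counts as read off the chart (Schubert, DAC 2003).
# The last entry is the slide's projection for the next generation.
BUGS_PER_GENERATION = [
    ("Pentium", 800),
    ("Pentium Pro", 2240),
    ("Pentium 4", 7855),
    ("Next (projected)", 25000),
]


def growth_factors(series):
    """Return (previous, current, ratio) for each pair of consecutive generations."""
    return [
        (prev_name, curr_name, curr_bugs / prev_bugs)
        for (prev_name, prev_bugs), (curr_name, curr_bugs) in zip(series, series[1:])
    ]


factors = growth_factors(BUGS_PER_GENERATION)
for prev_name, curr_name, ratio in factors:
    print(f"{prev_name} -> {curr_name}: x{ratio:.2f}")
```

The ratios come out at roughly 2.8, 3.5 and 3.2, broadly in line with the 3-4 times/generation quoted above.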

High Cost of Verification
Data from SoC designs (Source: Synopsys)
  Year   Logic Gates   Engineer Years   Simulation Vectors
  1995   1M            20               100M
  2001   10M           200              10B
  2007   100M          2000             1000B
Source: G. Spirakis, keynote address at DATE 2004

Yet Increasing Number of Bug Escapes…
[Figure: Number of spins in IC designs. Percent of designs (0% to 60%) versus spin count (1, 2, 3, 4, ≥5), for 2000 and 2002. Source: Collett International Research]
71% of SoC re-spins are due to logic bugs
High cost of re-spins
  Increasing mask costs ($5M and increasing)
Cost of recalls even higher
  Pentium FDIV Bug Recall – several million $

Time to Market Costs
Delays in re-spins ⇒ Lost time to market
Strong pressure for “first time correct” silicon
Chips verified to death

Runtime Validation
Increasingly need to reconcile ourselves to the fact that hardware, like software, will be shipped with bugs
Runtime verification (through error detection and recovery) offers a potentially scalable solution
Provide robustness in the face of inevitable bug escapes
Significantly reduce verification costs
Verify chips “to life” rather than “to death”

Sources of Errors
Operational failures (device failure, timing failure)
  • Process/operating condition variations, aging
  • Soft errors
  • Aggressive deployment (low Vdd, overclocking): push the design to the edge so as to result in device/timing failures
  • Manufacturing defects
Functional failures
  • Design complexity

Increasing Process Variations
[Figure: Percentage of total variation accounted for by within-die variation (device and interconnect). Original source: Sani Nassif, IBM]
Increase in variation of process parameters over generations
Worst-case design getting more expensive
“Better than worst-case” design must be error tolerant

Other Variations
[Figure: Heat flux (W/cm²); results in Vcc variation]
[Figure: Temperature variation (°C); results in hot spots]
[Figure: Random dopant fluctuations. Mean number of dopant atoms (10 to 10000, log scale) versus technology node (1000, 500, 250, 130, 65, 32 nm)]
Source: Shekhar Borkar, Intel

Soft-Error Trends
[Figure: Soft error rate (FIT/chip, 1.0E-07 to 1.0E+04, log scale) versus technology generation, from 600 nm (1992) to 50 nm (2011), for SRAM, latch (6 FO4s) and logic (6 FO4s). Source: P. Shivakumar et al., DSN 2002]
SER per chip of logic circuits
• Nine orders of magnitude increase from 600 nm to 50 nm
• Dominant source of soft errors after 50 nm

Classification of Error Modes
Two axes of classification
  Temporal (same instance, different invocations): transient or permanent
  Spatial (different instances, possibly on the same chip): deterministic or probabilistic
[Figure: grid with a spatial axis (deterministic, probabilistic) and a temporal axis (transient, permanent). Deterministic: design errors. Probabilistic and transient: soft errors, process/OC variations, aggressive deployment. Probabilistic and permanent: manufacturing defects.]
Design errors dominate along both axes
Runtime error detection and correction applicable to all; however, error mode determines which class of techniques can be applied:
  Replication works for probabilistic errors, but not deterministic errors.
  Repeating computation works for transient errors but not permanent ones.

Runtime Validation: Quick Analysis
Pros
  Do not have to deal with all possible behavior, only the actual behavior
    No state explosion problem
  Checking circuits only need to recognize the symptoms when the bug is exercised, not its cause, e.g.
    Check the results of division using multiplication
    Check if a bus is deadlocked
  Recovery does not need to fix the bug, rather only ensure forward progress
Cons
  Recovery likely to be hard
    Need a trusted alternate computation path
    Tradeoff performance for simplicity
      • Simpler slower divider
      • Simpler slower protocol
  Additional complexity
    Who checks the checker?
  Performance, power, area overhead
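
The division example can be sketched in a few lines of Python. This is a minimal illustration, assuming non-negative dividends and positive divisors; `fast_divide` is a stand-in for a complex divider with a deliberately injected, hypothetical bug, and all function names are invented for this sketch. The checker only tests the symptom (does the quotient multiply back?), and recovery falls back to a simpler, slower path.

```python
def fast_divide(a: int, b: int) -> int:
    """Stand-in for a complex, fast divider with an injected (hypothetical) bug."""
    if a == 1000:              # corner case on which the bug is exercised
        return a // b + 1
    return a // b


def slow_divide(a: int, b: int) -> int:
    """Simpler, slower trusted path: repeated subtraction."""
    q = 0
    while a >= b:
        a -= b
        q += 1
    return q


def check_quotient(a: int, b: int, q: int) -> bool:
    """Symptom check by multiplication: q is correct iff 0 <= a - q*b < b."""
    return 0 <= a - q * b < b


def checked_divide(a: int, b: int):
    """Return (quotient, recovered): use the fast path unless the check fails."""
    q = fast_divide(a, b)
    if check_quotient(a, b, q):
        return q, False
    return slow_divide(a, b), True   # recovery ensures forward progress


print(checked_divide(100, 7))
print(checked_divide(1000, 7))
```

The checker never needs to know why the fast divider was wrong, and the recovery path does not fix the bug; it only produces a correct result for this invocation.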

Instruction Set Processor Validation
Perform online checking to detect/rectify faults [Austin, Micro 1999]
[Figure: DIVA pipeline. Performance: speculative core (IF, ID, REN, REG, SCHEDULER, EX/MEM). Correctness: checker (CHK, CT). The core passes speculative instructions, in order and with inputs and outputs, to the checker. Source: Todd Austin, Univ. of Michigan]
Key observations
Correctness: checker implements the same abstract functional interface
  Simple checker design enables high-quality functional verification
Performance: leverage the core processor pre-execution to streamline checker
  Cache pre-fetching and branch/value prediction
Robustness: focus on reliability of the checker
  Simpler, slower checker can be constructed with large time margin, large transistors

Validating Simultaneous Multithreading
Motivation
  High practical application
    General processing trend
  High potential for general lessons
    Race conditions
    Synchronization
    Access to shared data
    …
So what do we need to do beyond checking each thread with a checker processor…
Kaiyu Chen and Sharad Malik, “Runtime Validation of Multithreaded Processors,” Technical Report, Dept. of Electrical Engineering, Princeton University, May 2005. Available by email from [email protected]

Memory Consistency Between Threads
Shared data: flag is initialized to 0
Thread 1 and Thread 2 each execute:
  lock:   t&s reg, flag
          bnz lock
          Critical Section
  unlock: st flag, #0
[Figure: two timelines of the value of flag (0 or 1) as the threads' t&s reg, flag and st flag, #0 instructions execute, one for the SMT processor and one for the checker processors. The checker-processor timeline is marked "Error?".]

Correctness of Synchronization Instructions
Two semaphores sem1, sem2 are initialized to 0
  Thread 1:        Thread 2:
  ...              ...
  SEMV(sem1);      SEMV(sem2);
  SEMP(sem2);      SEMP(sem1);
  …                …
[Figure: timeline of SEMV and SEMP in Thread 1 and Thread 2, together acting as a barrier]
What’s the correct semantics in checker processors?

Forwarding of Erroneous Results
  Thread 1:          Thread 2:
  ...                ...
  if (flag == 0)     key = data;
    i = i + 1;       result = foo(key);
  data = i;          …
  …
[Figure: timeline.
  Thread 1: bne (resolved with correct prediction); add (completed with error); st (buffered in LSQ)
  Thread 2: ld (forwarded from LSQ). Valid?]
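
To see why per-thread checking alone is not enough for the test-and-set lock on the memory consistency slide, here is a minimal Python sketch. It assumes (this is not stated on the slides) that each checker replays its thread's retired instructions in program order, with no ordering enforced between threads. `tas_results`, `CORE_ORDER` and `CHECKER_ORDER` are illustrative names, and both interleavings are made up for the example.

```python
def tas_results(schedule):
    """Run (thread, op) pairs against a shared flag; return what each t&s read, per thread."""
    flag = 0                      # flag is initialized to 0
    seen = {1: [], 2: []}
    for thread, op in schedule:
        if op == "t&s":           # test-and-set: return the old value, leave flag at 1
            seen[thread].append(flag)
            flag = 1
        elif op == "st0":         # st flag, #0 (unlock)
            flag = 0
        else:
            raise ValueError(f"unknown op {op!r}")
    return seen


# Hypothetical interleaving on the SMT core: thread 1 wins the lock,
# thread 2 spins once, then acquires the lock after thread 1 releases it.
CORE_ORDER = [(1, "t&s"), (2, "t&s"), (1, "st0"), (2, "t&s"), (2, "st0")]

# Hypothetical replay order on the checkers: same per-thread program order,
# but thread 2's checker happens to run ahead of thread 1's.
CHECKER_ORDER = [(2, "t&s"), (1, "t&s"), (2, "t&s"), (1, "st0"), (2, "st0")]

core = tas_results(CORE_ORDER)
checker = tas_results(CHECKER_ORDER)
mismatch = {thread: core[thread] != checker[thread] for thread in core}
print(core, checker, mismatch)
```

In this sketch both checkers disagree with a core that behaved correctly, so the checkers need some way to respect the inter-thread ordering observed by the core.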

The Integrated Runtime Validation Solution
[Figure: block diagram with the following components: SMT processor; per-thread retired instructions; retired-instruction dispatch to DIVA checker processors; hardware runtime monitoring; synchronization unit; hardware context status register; architected state (register file, memory)]

Experimental Results
[Figure: Normalized execution time (0.9 to 1.2) on SPLASH-2 benchmarks (FFT, LU, CHOLESKY, BARNES, FMM, WATER-NSQUARED, WATER-SPATIAL) for the runtime validation configuration at fault rates of 1/100, 1/1K, 1/10K, 1/100K and 1/1M]

Microarchitectural Support for Validation
[Figure: map of prior work by scope (single-thread domain: uniprocessor correctness; multi-thread domain: memory consistency, cache coherence) and by capability (fault detection, fault recovery, fault detection + recovery). Works shown: DIVA, AR-SMT, SRT, DUSD, RSE, Fingerprinting, Cherry, SafetyNet, Cantin, Meixner, Sorin]

Microarchitectural Support for Validation - References
[DIVA] Todd Austin. “DIVA: A Reliable Substrate for Deep Submicron Microarchitecture Design”. ACM/IEEE 32nd Annual Symposium on Microarchitecture (Micro), 1999.
[AR-SMT] E. Rotenberg. “AR-SMT: A Microarchitectural Approach to Fault Tolerance in Microprocessors”. 29th Int’l Symposium on Fault-Tolerant Computing, June 1999.
[SRT] S. K. Reinhardt and S. S. Mukherjee. “Transient Fault Detection via Simultaneous Multithreading”. 27th Annual Int’l Symposium on Computer Architecture (ISCA), 2000.
[DUSD] Joydeep Ray, James C. Hoe and Babak Falsafi. “Dual Use of Superscalar Datapath for Transient-Fault Detection and Recovery”. Proceedings of International Symposium on Microarchitecture (MICRO), December 2001.
[RSE] Nithin Nakka, Jun Xu, Zbigniew Kalbarczyk and Ravishankar K. Iyer. “An Architectural Framework for Providing Reliability and Security Support”. IEEE Int’l Conf. on Dependable Systems and Networks (DSN), 2004.
[Fingerprinting] Jared C. Smolens, Brian T. Gold, Jangwoo Kim, Babak Falsafi, James C. Hoe, and Andreas G. Nowatzyk. “Fingerprinting: Bounding Soft-Error Detection Latency and Bandwidth”. (ASPLOS), October 2004.
[Cherry] J.F. Martínez, J. Renau, M.C. Huang, M. Prvulovic, and J. Torrellas. “Cherry: Checkpointed Early Resource Recycling in Out-of-Order Microprocessors”. Int’l Symposium on Microarchitecture (Micro), Nov. 2002.
[SafetyNet] Daniel J. Sorin, Milo M. K. Martin, Mark D. Hill, and David A. Wood. “SafetyNet: Improving the Availability of Shared Memory Multiprocessors with Global Checkpoint/Recovery”. 29th International Symposium on Computer Architecture (ISCA), May 2002.
[Meixner] Albert Meixner and Daniel J. Sorin. “Dynamic Verification of Sequential Consistency”. 32nd Annual International Symposium on Computer Architecture (ISCA), June 2005.
[Cantin] J. Cantin, M. Lipasti, J. E. Smith. “Dynamic Verification of Cache Coherence Protocols”. Workshop on Memory Performance Issues, Gothenburg, Sweden, June 2001.
[Sorin] Daniel J. Sorin, Mark D. Hill, and David A. Wood. “Dynamic Verification of End-to-End Multiprocessor Invariants”. International Conference on Dependable Systems and Networks (DSN), June 2003.

Assertion Based Runtime Validation
Terminology
  Property: A quality or trait belonging to a design (based on specifications)
    e.g. Any request should be acknowledged eventually
  Assertion: Typically syntactical statement of a property that should hold
    PSL: assert always (req → eventually! ack);
    Assertions available from design/simulation/formal verification processes
  Runtime Checker: Hardware responsible for validating a property or properties
    Assertions can be synthesized [Abarbanel99]
Runtime Validation based on Assertions/Specifications
  Find/Write the assertions
    E.g. G(Req → X Ack)
  Generate hardware models for error detectors based on those assertions
  Implement recovery mechanism (design specific)
    e.g. Invalidate all requests
[Figure: checker state machine for G(Req → X Ack), with transitions on Req, !Req, Ack, !Ack and an error state Err]
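
A software model of such an error detector for G(Req → X Ack) might look like the following Python sketch. The class and function names are invented for illustration, and the sticky `err` flag is an assumption of this sketch rather than something the slide specifies; a synthesized checker would be a small state machine clocked with the design.

```python
class NextCycleAckChecker:
    """Monitor for G(Req -> X Ack): every request must be acknowledged in the next cycle."""

    def __init__(self):
        self.pending = False   # a Req was seen in the previous cycle
        self.err = False       # sticky error flag (assumption of this sketch)

    def step(self, req: bool, ack: bool) -> bool:
        """Advance one clock cycle with the sampled Req/Ack values; return Err."""
        if self.pending and not ack:
            self.err = True
        self.pending = bool(req)
        return self.err


def first_violation(trace):
    """Return the cycle index at which Err is first raised, or None if it never is."""
    checker = NextCycleAckChecker()
    for cycle, (req, ack) in enumerate(trace):
        if checker.step(req, ack):
            return cycle
    return None


print(first_violation([(1, 0), (0, 1), (0, 0)]))   # request acknowledged one cycle later
print(first_violation([(1, 0), (0, 0)]))           # acknowledgment missing
```

The detector holds one bit of state per property instance, which is why local safety properties of this kind are cheap to check at runtime.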

A Classification of Properties
Based on time
  Liveness: Good things will eventually happen
    e.g. G ( Req → F ack)
    Bounded liveness: fix time bound for when this will happen
  Safety: Bad things should never happen
    e.g. G ( Req → XX Ack)
Based on spatial distribution
  Local: All necessary information can be gathered easily (e.g. one clock cycle)
    e.g. G ( req → X !req)
  Distributed: Signals separated in space and difficult to gather
    e.g. dual-ownership of cache lines in a shared memory multi-processor system
Based on recovery requirement
  Soft: Only control bits may be corrupted
    e.g. deadlock situation, bad control state, data is safe
  Hard: Both data and control bits may be corrupted
    e.g. dual-ownership of cache lines in a shared memory multi-processor system
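
Unbounded liveness cannot be refuted by looking at a finite prefix of execution, so a runtime checker typically monitors the bounded form. The Python sketch below models one such checker with a cycle counter. It assumes a single outstanding request and that an acknowledgment in the same cycle as the request does not count; the class name, `run_trace` and the bound of 3 are all illustrative.

```python
class BoundedLivenessChecker:
    """Flags an error if a request is not acknowledged within `bound` cycles."""

    def __init__(self, bound: int):
        self.bound = bound
        self.waiting = False   # a request is outstanding
        self.count = 0         # cycles waited since the request

    def step(self, req: bool, ack: bool) -> bool:
        """Advance one clock cycle; return True if the bound is violated in this cycle."""
        err = False
        if self.waiting:
            if ack:
                self.waiting = False
            else:
                self.count += 1
                err = self.count >= self.bound
        if req and not self.waiting:   # start timing a new request
            self.waiting = True
            self.count = 0
        return err


def run_trace(bound, trace):
    """Feed a list of (req, ack) samples to a fresh checker; return the error flag per cycle."""
    checker = BoundedLivenessChecker(bound)
    return [checker.step(req, ack) for req, ack in trace]


print(run_trace(3, [(1, 0), (0, 0), (0, 0), (0, 1)]))
print(run_trace(3, [(1, 0), (0, 0), (0, 0), (0, 0)]))
```

Fixing the bound turns the liveness property into a safety property, at the cost of a counter whose width depends on the chosen bound.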

Compositional Reasoning using Runtime Validation
Complementary relation between Runtime Validation (RV) and Model Checking (MC)
  RV does not suffer from state space explosion
  MC handles distributed properties easily
[Figure: grid of applicable techniques. Small state space: RV, MC for local properties; MC for distributed properties. Large state space: RV for local properties; ??? for distributed properties]
Basic Idea
  Runtime Validate some property R
  Use it in off-line model checking of property P
  Proving P with assumption R using MC guarantees (R → P) at runtime
  As long as R holds at runtime, P holds at runtime
When does this help?
  MC does not “complete” checking P without assumption R
  MC does not “complete” checking R, i.e. RV is necessary for validating R

Using Runtime Abstracters
Runtime Abstraction: Using abstraction in model checking and validating correctness of abstraction at runtime. Allows for wider range of abstractions (e.g. property specific and not just parametric)
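
The idea of model checking P under an assumption R that is then validated at runtime can be illustrated on a toy system. The Python sketch below is not the CCP or TokenShare model; it uses a hypothetical 2-entry buffer, with R = "the environment never pushes into a full buffer" and P = "the buffer never overflows", and a plain breadth-first reachability search standing in for the model checker.

```python
from collections import deque

CAPACITY = 2
INPUTS = [(push, pop) for push in (0, 1) for pop in (0, 1)]


def step(occ, push, pop):
    """Next occupancy of the toy buffer; CAPACITY + 1 encodes overflow."""
    if occ > CAPACITY:
        return occ                      # overflow is absorbing
    if pop and occ > 0:
        occ -= 1
    return occ + 1 if push else occ


def assumption_r(occ, push, pop):
    """R: the environment never pushes into a full buffer."""
    return not (push and occ == CAPACITY)


def property_p(occ):
    """P: the buffer never overflows."""
    return occ <= CAPACITY


def model_check(assume=None):
    """Explore all reachable states; return (P holds everywhere, reachable set)."""
    seen, frontier = {0}, deque([0])
    while frontier:
        occ = frontier.popleft()
        for push, pop in INPUTS:
            if assume and not assume(occ, push, pop):
                continue                # transition excluded by the assumption
            nxt = step(occ, push, pop)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return all(property_p(s) for s in seen), seen


print(model_check(assumption_r))   # P proved under R
print(model_check())               # P fails without R
```

Here the offline search establishes (R → P); a runtime checker that evaluates `assumption_r` every cycle would then carry the guarantee over to the deployed design for as long as it stays silent.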
[Figure: three panels. Design: blocks A and B. Checker C checks abstracter B’ at runtime: blocks A, B, B’ and C. Model checking uses abstracter B’: blocks A and B’.]

Experimental Results
Model Checking Single-cluster CCP
  Processor Count   with Assumptions   without Assumptions   Checking Assumptions
  4                 0.08               0.51                  9.87
  6                 0.33               8.94                  143.57
  8                 0.61               29.06                 348.95
  10                0.8                68.86                 1606.61
  12                5.13               141.33                8008.13
  14                3.07               183.51                145989.05
  16                13.2               656.41                TIME-OUT

Model Checking Multi-cluster TokenShare
  Unit #   Safety/A   Safety   Check A   Liveness/A   Liveness
  4        2.03       4.45     27.11     67.19        385.72
  6        8.16       14.84    105.47    294.94       85428.15
  8        50.12      74.56    448.99    1840.63      TIME-OUT
  10       66.09      57.02    448.32    2007.45      TIME-OUT
  12       133.82     82.88    723.48    4212.76      TIME-OUT
  14       285.83     120.39   1110.78   9053.72      TIME-OUT
  16       408.03     158.98   1423.31   13810.5      TIME-OUT

Results for using runtime assumptions. Times are in seconds; TIME-OUT is 2 days.
• Runtime abstraction: Replacing one 16-unit cluster with the runtime abstracter drops verification time to 56.2 seconds from 408 seconds in TokenShare
• HDL implementation of CCP: 1600 lines of Verilog code. No memory overhead for safety checkers. Bounded liveness checkers use a 10-bit counter for the 4-cluster, 16-processor configuration.
Ali Bayazit and Sharad Malik, “Complementary Use of Runtime Validation and Model Checking,” in ICCAD’05: Proceedings of the IEEE/ACM International Conference on Computer-Aided Design, 2005.

General Design with Integrated Runtime Validation
Conceptually, runtime validated systems have 3 essential components
  Primary implementation of design (D)
  Runtime checkers (C)
  Design-specific runtime error recovery (R)
Usually, the designer has to worry about interactions between D, C, and R and ensure that:
  Checkers and recovery are “silent” under normal operation
  Design halts when an error is detected
  Recovery kicks in safely when an error gets detected
  Recovery occurs correctly
  Design continues execution safely post-recovery
Given D, C, R can we relieve the designer of correctly implementing their interactions?

A Proposal for an RTL Methodology Solution
Clearly separate D, C and R in specification
Language semantics should enforce useful properties of interactions between D, C and R
  Checkers and recovery are “silent” under normal operation
  Design halts when an error is detected
  Recovery kicks in safely when an error gets detected
  Recovery occurs correctly
  Design continues execution safely post-recovery
Leave actual implementation of the interactions between D, C and R to synthesis
  Use a generic hardware implementation template that respects these semantics
Additional checking/recovery specific language features
  e.g. check-pointed register data type for rollback and recovery, implemented using automated checkpointing

Example
A possible specification could look like this
  {design D
      usual HDL description of design
  } while {checker C
      monitor property at runtime
  } else {recovery R
      recovery procedure
  }

Example (contd.)
General Design Template
C and D operate in parallel
C can examine D’s state
On detection of error, D gets stalled, and R is triggered
R permitted to change D’s state to perform recovery
D continues operation after recovery
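
As a software analogy of this template (not the proposed hardware implementation), the Python sketch below steps a design D, lets a checker C inspect its state each cycle, and hands control to a recovery routine R when C objects. The loop is sequential, whereas in hardware C and D run in parallel; the buggy counter and all names are hypothetical.

```python
def run_with_validation(state, design_step, checker, recovery, cycles):
    """Generic D/C/R loop: C observes D's state every cycle; on error D stalls and R runs."""
    events = []
    for cycle in range(cycles):
        state = design_step(state)          # D advances by one cycle
        if not checker(state):              # C examines D's state without changing it
            events.append((cycle, "error", state))
            state = recovery(state)         # D is stalled while R rewrites its state
            events.append((cycle, "recovered", state))
    return state, events                    # D continues after recovery


def buggy_counter(s):
    """Hypothetical modulo-8 counter with an injected bug: 5 is followed by 9."""
    return 9 if s == 5 else (s + 1) % 8


def in_range(s):
    """C: the counter must stay within 0..7."""
    return 0 <= s < 8


def reset(_s):
    """R: a deliberately simple recovery that restarts the counter."""
    return 0


final_state, events = run_with_validation(0, buggy_counter, in_range, reset, cycles=8)
print(final_state, events)
```

Note that D, C and R are passed in as three separate pieces and only `run_with_validation` knows how they interact, which mirrors the separation the methodology asks the language and synthesis to enforce.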

Backward Error Recovery
Checkpoint the state regularly
In case of error detected by C
  C stalls D
  D is rolled back using checkpointed state
  R performs the specified recovery
[Figure: timeline. checkpoint, compute, compute, compute, checkpoint, …; each compute is followed by a check of the computation. If the check is OK, no stall is needed; on error (!), a stall is issued, followed by rollback and recovery.]
Stall can arrive after multiple computation cycles
Pipeline for throughput
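
A minimal Python sketch of backward recovery follows, using a software stand-in for a check-pointed register. It simplifies the picture above by checking every cycle with zero latency, so the saved state is always known-good; the fault at cycle 5, the checkpoint interval and all names are assumptions made for the illustration.

```python
import copy


class CheckpointedValue:
    """Holds a value together with one saved copy for rollback."""

    def __init__(self, value):
        self.value = value
        self._saved = copy.deepcopy(value)

    def checkpoint(self):
        self._saved = copy.deepcopy(self.value)

    def rollback(self):
        self.value = copy.deepcopy(self._saved)


def run_backward(reg, compute, check, recover, cycles, interval):
    """Checkpoint every `interval` cycles; on a failed check, roll back and then recover."""
    rollbacks = 0
    for cycle in range(cycles):
        if cycle % interval == 0:
            reg.checkpoint()
        reg.value = compute(reg.value, cycle)
        if not check(reg.value):            # C detects the error and stalls D
            reg.rollback()                  # D returns to the checkpointed state
            reg.value = recover(reg.value)  # R performs the specified recovery
            rollbacks += 1
    return rollbacks


def faulty_accumulate(value, cycle):
    """Adds 1 per step, except for a hypothetical transient fault at cycle 5."""
    total, steps = value
    increment = 100 if cycle == 5 else 1
    return (total + increment, steps + 1)


def totals_match(value):
    """Invariant checked by C: the running total equals the number of steps taken."""
    total, steps = value
    return total == steps


reg = CheckpointedValue((0, 0))
rollbacks = run_backward(reg, faulty_accumulate, totals_match, lambda v: v, cycles=8, interval=2)
print(reg.value, rollbacks)
```

The work done between the last checkpoint and the error is lost, which is the price of letting the design run ahead of the checker.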

Forward Error Recovery
State in D only committed after checker approves
  Checker “enables” or “stalls” design
In case of error detected by C
  C does not enable state update of D
  R is triggered to perform prescribed recovery
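
In contrast to rollback, forward recovery never lets an unchecked value reach the committed state. A minimal Python sketch, with a hypothetical faulty incrementer and illustrative names throughout:

```python
def run_forward(state, compute, check, recover, cycles):
    """Commit each proposed next state only if the checker approves it first."""
    recoveries = 0
    for cycle in range(cycles):
        candidate = compute(state, cycle)   # D proposes a next state
        if check(state, candidate):         # C enables the state update
            state = candidate
        else:                               # C withholds the enable; R supplies the update
            state = recover(state)
            recoveries += 1
    return state, recoveries


def faulty_increment(state, cycle):
    """Adds 1 per cycle, except for a hypothetical fault at cycle 3."""
    return state + 5 if cycle == 3 else state + 1


def is_successor(state, candidate):
    """C: the proposed state must be exactly one more than the committed state."""
    return candidate == state + 1


def trusted_increment(state):
    """R: a trusted alternate path that computes the update itself."""
    return state + 1


committed, recoveries = run_forward(0, faulty_increment, is_successor, trusted_increment, cycles=6)
print(committed, recoveries)
```

No checkpoint storage is needed here, but the check sits on the critical path of every state update.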
[Figure: timeline. Each compute is followed by a check of the computation. If the check is OK, no stall is needed and the state update proceeds; on error (!), a stall is issued, followed by recovery.]
Stall arrives before the next computation cycle
Pipeline for throughput

Synthesis Tasks
Coordinate actions of D, C and R
Guarantee the timing aspects of the recovery semantics
Identify parts of D that need to be stalled during recovery
Make R electrically robust and less susceptible to process variations and noise (compared to D)
Help with generating checker/recovery circuits?

Advantages
Separation between the blocks by intent makes the checking and recovery easy to reason about
Implementation guarantees language semantics
Generic reusable implementation template
Tools can handle different blocks differently
Easier to maintain the design

Summary and Conclusions
Design complexity increasing exponentially faster than our ability to handle it
  Increasing cost of verification
  Increasing bug escapes
Runtime Validation inevitable for increasing operational failures
  Consider functional failures in the same framework
Runtime Validation as an insurance policy for functional failures
  Learning to live with bug escapes
  Already being considered for specific instances
  Possible use in property checking
Can we bring this into general RTL design?
  Clean separation of design, checking and recovery through language semantics and synthesis support