Mitigating the Dangers of Multi-Clock Designs
David Landoll Applications Architect Mentor Graphics Corp.
National Software and Complex Electronic Hardware Conference August 20-21, 2008 Denver
Mentor Graphics Around the World
[Figure: world map of R&D sites and sales offices]
4400 employees, 44 sales offices, 28 R&D locations, ~60% R&D outside NA
Who is Mentor Graphics?
2007 revenues $880M
Projecting ~$915M for 2008
Focus on growth through internal technology development — R&D 29% of total revenues, invested >$1.5B since 1996
Product portfolio that includes numerous market leaders — Calibre, Questa, 0-In, Expedition, Catapult, TestKompress…
Acquisitions that build on leadership positions
Mentor Graphics Serves the Mil-Aero Market
Cable/Harness Systems
• Capital platform
• Veysys family
• TransDesign
Embedded SW
• EDGE Developer Suite
• Nucleus Secure RTOS
• Inflexion Application Platform
PCB Design
• Board station
• Expedition Enterprise flow
• PADS
• Data Management System
FPGA
• HDL Designer
• Precision synthesis
• ModelSim simulator
• Questa Adv. Verification
ASIC
• C synthesis
• Questa adv verification platform
• Veloce HW-assisted verification
• Seamless HW/SW co-design
• Advanced M/S mixed signal design
• Physical Implementation
• Manufacturing tests
Today’s FPGAs
[Figure: block diagram of an FPGA system-on-chip integrating four designs (Design 1: DMA Controller, Design 2: AMBA AHB/APB, Design 3: CPU, Design 4: UART) alongside SpaceWire, HDLC, CAN, and memory controllers]
• Fabrication advances provide more available silicon area
• More functionality can weigh less and take up less space
• Integrating/reusing capabilities lowers cost
Avionic Integration – New challenges
[Figure: Integrated Avionics Processing linked to Flight Management, Weather Radar & Avoidance, Flight Control Systems, Maintenance Diagnostics, Communications, and Image Processing]
Boeing 787: Integration’s Next Step
From its central processor to its common data network, surveillance system and navigation system, the theme of the Boeing 787 Dreamliner is integration. James W. Ramsey
Today’s Avionics…
Today’s CEH challenges:
■ Complex systems
■ Complex intercommunication
■ New ARINC serial communication
■ Contain “Multi-clock designs”
All Multi-clock designs have CDCs
■ So, what are “CDCs”?
■ Why should you care?
Clock Domain Crossing Errors
Unpredictable Loss of Data
■ CDC problems
— corrupt control and data signals
— are subtle, intermittent, unpredictable
— are the 2nd major cause of respins
— are difficult to reproduce and debug
— are temperature, voltage, and process sensitive
— will only occur in hardware; often in the final design
■ Traditional verification techniques do not work for CDC signals
A CDC Verification methodology is needed to reduce the risk of CDC related data errors
FPGA/ASIC Verification Trends
[Figure: bar chart "IC/ASIC Designs Requiring Re-Spins by Type of Flaw", comparing the 2002 and 2004 market studies; x-axis: percent of designs requiring two or more silicon spins (0% to 100%); categories: Logic/Functional (71% and 75%), Clocking, Tuning Analog Circuit, Fast Path, Yield/Reliability, Delays/Glitches, Slow Path, Mixed-Signal Interface, Power Consumption, IR Drops, Firmware, Other]
■ 50% of ASICs require more than one spin
■ Principal contributors:
  ■ Functional bugs
  ■ Clocking related bugs
[Figure: chart "Effort Allocation of Dedicated Verification Engineers by Type of Activity", showing Verification, Debug, and Testbench Development, with 40% and 60% callouts]
■ Traditional methods failing to catch bugs
■ Debugging becomes the main bottleneck
■ Usually indeterminate amount of time
Source: 2004/2002 IC/ASIC Functional Verification Study, Collett International Research, Used with Permission
Agenda
■ Digital Clocks, Registers, Metastability Explained
■ Why are Clock Domain Crossings Dangerous?
■ Where are CDCs likely on today’s CEH?
■ Mitigating the Clock Domain Crossing issue
■ Verifying the mitigation was performed correctly
■ Recommendations
CDC = Clock Domain Crossing
Metastability: What the heck is it, anyway?
■ What is a clock?
— Periodic pulsing signal
— Digital logic uniformly connected to this signal
— Acts as the Symphony Conductor – keeps logic in sync
— Action happens across the logic at one specific point
  ■ Typically the “rising edge”
[Figure: clock waveform swinging between the supply rails Vcc/Vdd (+5V, +3.3V) and Vee/Vss (GND, 0V)]
Metastability: What the heck is it, anyway?
■ What’s in a register?
— (Also known as a latch, flip-flop, etc.)
— Contains transistors that “trap” the input value at the appropriate time
  ■ E.g. rising edge of the clock
— How does this happen?
Metastability: The Physics of a Register
■ Let’s take a look at a register -- simple D-type flip-flop
— CMOS D-type transmission gate flip-flop

    process(CLK) begin
      if rising_edge(CLK) then
        Q <= D;
      end if;
    end process;

[Figure: transistor model of a D flip-flop with inputs D and CLK and output Q, animated over four steps]
— Step 1: D = 0, CLK = 0, Q = 0
— Step 2: D = 1, CLK = 0, Q = 0 (Q holds its old value until the clock edge)
— Step 3: D = 1, CLK = 1, Q = 1 (the rising edge captures D)
— Step 4: D = 0, CLK = 0, Q = 1 (Q holds the captured value)
■ Only works if D has a “good value” at the rising edge of the clock (no set-up/hold time violations)
Metastability: The Physics of a Register
■ When setup/hold conditions are violated, the output of a storage element becomes unpredictable
[Figure: D flip-flop waveforms showing D changing inside the setup/hold window around the rising edge of CLK, with Q unresolved]

$$\mathrm{MTBF} = \frac{1}{f_{clk} \times f_{in} \times t_d}$$

— $f_{clk}$ = Clock Frequency
— $f_{in}$ = Input Signal Frequency
— $t_d$ = Duration of critical time window
■ This effect is called metastability
■ If not contained, metastability can propagate…
Metastability is UNAVOIDABLE in designs with multiple asynchronous clocks
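To give a feel for the scale of the MTBF expression, here is a worked example. The three input values are illustrative assumptions, not figures from any particular device: a 100 MHz receiving clock, an asynchronous input toggling at 1 MHz, and a 100 ps critical window.

```latex
\begin{aligned}
f_{clk} &= 100\,\mathrm{MHz} = 10^{8}\,\mathrm{s^{-1}}, \qquad
f_{in} = 1\,\mathrm{MHz} = 10^{6}\,\mathrm{s^{-1}}, \qquad
t_d = 100\,\mathrm{ps} = 10^{-10}\,\mathrm{s} \\
\mathrm{MTBF} &= \frac{1}{f_{clk} \times f_{in} \times t_d}
  = \frac{1}{10^{8} \times 10^{6} \times 10^{-10}\,\mathrm{s^{-1}}}
  = 10^{-4}\,\mathrm{s} = 100\,\mu\mathrm{s}
\end{aligned}
```

Under these assumed numbers, an unsynchronized register would sample inside its critical window roughly ten thousand times per second, which is why the effect cannot simply be ignored.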
Clock Domain Crossings
Guaranteed to Cause Metastability
When 2 or more designs run on disparate clocks:
— The clocks will continually skew, guaranteeing setup/hold violations
— Signals from one design to another are “Clock Domain Crossings” (CDCs)
[Figure: a register in the Sensor System (Clock A) driving a register in the Guidance System (Clock B) through a Clock Domain Crossing signal; waveforms show Tx changing inside the setup/hold window of Clock B]
Signals that cross asynchronous clock domains (CDC signals) WILL violate setup and hold conditions
Aircraft CEH
Where can CDC issues occur?
■ Any time 2 or more systems (or parts within a system) are run by unique clocks (e.g. PLLs)
■ Could be in ANY system
■ Obvious Examples
— ARINC 818
  ■ Digital Video Bus – based on Fibre Channel
— ARINC 664 (AFDX)
  ■ Ethernet Data Bus
— Why these?
  ■ Both “recover” the clock from the incoming serial bitstream
[Figure: incoming serial data entering a SERDES]
Aircraft CEH
Where can CDC issues occur?
■ ARINC 818
— Digital Video Bus – based on Fibre Channel
■ ARINC 664 (AFDX)
— Ethernet Data Bus
■ Recovered Clocks
— Technology is widely used and understood,
— But…the design needs to handle the clock…
  ■ Clock can shut off
  ■ Clock period and duty cycle can vary
■ Best solution: Transfer data to a more stable clock domain
[Figure: incoming serial data entering a SERDES (Serializer / Deserializer)]
Aircraft CEH
Where can these issues occur?
Serializer / Deserializer (A.K.A. SERDES)
Mitigating Clock Domain Crossing Issues
■ Problem:
— Signals crossing a clock domain will violate set-up/hold
— Impact: Control/data signals will be dropped/corrupted
  ■ Loss of Data
■ Approaches:
— Avoid having systems that have multiple clocks
— Only allow one clock, one edge
— Only one problem…
These systems ALREADY have multiple clocks…
Mitigating Clock Domain Crossing Issues
■ Problem:
— Signals crossing a clock domain will violate set-up/hold
— Impact: Control/data signals will be dropped/corrupted
  ■ Loss of Data
■ Approaches:
— Avoid having systems that have multiple clocks
  ■ So, although sensible, it’s becoming impossible
— Design around the problem
  ■ Designer can add “synchronizers” to the design
  ■ Metastability still happens, but nobody else sees it
— E.g. 2DFF, FIFO, etc.
— “Fences in” metastability
Isolate Metastability: Synchronizers
■ Designers add synchronizers to reduce the probability of metastable signals
■ Synchronizers are sub-circuits that can prevent metastable values from being sampled across clock domains
— Take unpredictable metastable signals and create predictable behavior
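As a rough illustration of the 2DFF synchronizer mentioned earlier, the following is a minimal sketch for a single-bit signal. The entity name `sync_2dff` and the port names are hypothetical, chosen only for this example. The first register may go metastable; the second gives it one full receive-clock period to settle before the rest of the design sees the value.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical two-flip-flop (2DFF) synchronizer for ONE bit.
-- Names are illustrative assumptions, not taken from the slides.
entity sync_2dff is
  port (
    rx_clk  : in  std_logic;  -- clock of the receiving domain
    d_async : in  std_logic;  -- signal arriving from another clock domain
    q_sync  : out std_logic   -- version that is safe to use in the rx_clk domain
  );
end entity sync_2dff;

architecture rtl of sync_2dff is
  signal meta   : std_logic := '0';  -- first stage: may go metastable
  signal stable : std_logic := '0';  -- second stage: samples meta one cycle later
begin
  process (rx_clk)
  begin
    if rising_edge(rx_clk) then
      meta   <= d_async;  -- may violate setup/hold; metastability is contained here
      stable <= meta;     -- meta has had a full clock period to resolve
    end if;
  end process;

  q_sync <= stable;
end architecture rtl;
```

This sketch is only appropriate for a single-bit, slowly changing signal; multi-bit buses and fast pulses need a transfer protocol or a FIFO.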
Mitigating Clock Domain Crossing Issues
Isolate Metastability: Synchronizers
[Figure: timing diagram of Clock A, Tx, Clock B, Rx, and Q; Tx values i-1 through i+2 change inside the metastability window of Clock B, so the synchronized values i-1 through i+3 arrive with varying delay]
When metastability occurs, the delay through a synchronizer becomes unpredictable
Synchronizer Delays Can Reconverge
with unexpected results
■ CDC signals cross with an assumed relationship
■ Can be combinational, sequential, or deeply sequential
■ Unpredictable delays on CDC paths lead to reconvergence errors
— Designs need logic to correctly handle reconvergence
— Can occur on single-bit or multiple-bit signals
[Figure: a Gray encoder drives tx_d0, tx_d1, and tx_d2 through Sync 0, Sync 1, and Sync 2 into a Gray decoder feeding an FSM with states S1 to S4; example bit sequences show one crossing decoded as an Invalid Command and another as a Valid Command, but delayed]
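The Gray encoder and decoder in the figure above can be sketched as a pair of conversion functions. This is a minimal illustration; the package and function names are hypothetical, and the argument is assumed to be a non-empty vector. The point of Gray coding is that consecutive values differ in exactly one bit, so a one-cycle delay on any single synchronizer yields either the old or the new value rather than an unrelated one.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper package; names are illustrative assumptions.
package gray_code_pkg is
  function bin_to_gray (b : std_logic_vector) return std_logic_vector;
  function gray_to_bin (g : std_logic_vector) return std_logic_vector;
end package gray_code_pkg;

package body gray_code_pkg is

  -- Gray bit i is the XOR of binary bits i+1 and i; the MSB is copied.
  function bin_to_gray (b : std_logic_vector) return std_logic_vector is
    variable bn : std_logic_vector(b'length - 1 downto 0) := b;  -- normalize index range
    variable g  : std_logic_vector(b'length - 1 downto 0);
  begin
    g(g'high) := bn(bn'high);
    for i in g'high - 1 downto 0 loop
      g(i) := bn(i + 1) xor bn(i);
    end loop;
    return g;
  end function bin_to_gray;

  -- Binary bit i is the XOR of the already decoded bit i+1 and Gray bit i.
  function gray_to_bin (g : std_logic_vector) return std_logic_vector is
    variable gn : std_logic_vector(g'length - 1 downto 0) := g;  -- normalize index range
    variable b  : std_logic_vector(g'length - 1 downto 0);
  begin
    b(b'high) := gn(gn'high);
    for i in b'high - 1 downto 0 loop
      b(i) := b(i + 1) xor gn(i);
    end loop;
    return b;
  end function gray_to_bin;

end package body gray_code_pkg;
```

Gray coding only helps when the transmitted value moves one step at a time (a counter or pointer, for example). An arbitrary command word can still change several bits at once, which is the situation the slide warns about.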
And, Synchronizers Fail if Misused
■ Synchronization between clock domains requires a transfer protocol
— Ensures data is predictably transferred between domains
■ These protocols must be verified
■ When protocol is violated
— Data is lost
— Simulation may not show a failure
— Silicon will eventually show a functional error
Synchronizer won’t function properly if the required Transfer Protocol is violated
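One way to make a transfer protocol checkable in simulation is to express its rule as an assertion. The sketch below assumes a simple rule that the slides do not spell out: transmit data must not change while a request signal is held high. The entity name, port names, and the rule itself are assumptions for illustration only.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical simulation-only checker for an ASSUMED protocol rule:
-- tx_data must stay stable for as long as tx_req remains asserted.
entity cdc_stability_check is
  generic (WIDTH : positive := 8);
  port (
    tx_clk  : in std_logic;                              -- transmitting domain clock
    tx_req  : in std_logic;                              -- request held high during a transfer
    tx_data : in std_logic_vector(WIDTH - 1 downto 0)    -- data being transferred
  );
end entity cdc_stability_check;

architecture sim of cdc_stability_check is
  signal prev_req  : std_logic := '0';
  signal prev_data : std_logic_vector(WIDTH - 1 downto 0) := (others => '0');
begin
  process (tx_clk)
  begin
    if rising_edge(tx_clk) then
      -- Only check when the request was high on the previous edge and still is.
      if prev_req = '1' and tx_req = '1' then
        assert tx_data = prev_data
          report "CDC protocol violation: tx_data changed while tx_req was held high"
          severity error;
      end if;
      prev_req  <= tx_req;
      prev_data <= tx_data;
    end if;
  end process;
end architecture sim;
```

A checker like this only reports violations that the testbench happens to provoke, so it complements rather than replaces formal proof of the same property.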
Must Verify All Three CDC Problems
[Figure: example circuit annotated with a missing sync problem, a possible protocol problem, and a reconvergence problem]
Clock domain crossings need:
— Structured synchronization
— Transfer protocols
— Global reconvergence checking
Mitigating Clock Domain Crossing Issues
■ Problem:
— Signals crossing a clock domain will violate set-up/hold
— Impact: Control/data signals will be dropped/corrupted
■ Approaches:
— Avoid having systems that have multiple clocks
— Designer can add “synchronizers” to the design
— Designer-added synchronizers + full CDC verification
  ■ Assures synchronizers are present and used correctly
Recommendations
During design planning
1. Create systems/designs using 1 clk, 1 edge when possible
2. If multiple clocks are required, try to use 1 designer for both clock domains
3. When multi-clock design is required, plan for proper verification
   1. How do we accomplish this?
Verifying CDC Synchronization
■ Problem:
— Missing synchronizers will create metastability
— Correctly placed but misused synchronizers won’t work
— Reconvergence of synchronized signals can create unexpected behavior
■ Approaches:
— Simulation
  ■ Digital logic simulators do NOT model transistor behavior
  ■ Do not model “metastability”
For example …
[Figure: waveforms of D, CLK, Q in simulation, and Q in silicon, shown side by side for a setup violation and a hold violation]
— Setup Violation: simulation captures a ‘1’ while silicon produces either a ‘1’ or ‘0’
— Hold Violation: simulation captures a ‘0’ while silicon produces either a ‘1’ or ‘0’
Simulation Does NOT Reflect Silicon Behavior
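To make the gap concrete, a simulation model can be given the uncertainty that silicon has. The sketch below is one hypothetical way to do that and is not a description of any commercial tool: whenever the asynchronous input has changed since the previous receive-clock edge, the first stage randomly captures either the new or the old value, so the synchronized output arrives one cycle early or late. All names and the 50/50 probability are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.math_real.all;  -- provides the uniform() random number procedure

-- Hypothetical SIMULATION-ONLY model of a 2DFF synchronizer with random
-- one-cycle latency jitter. Not synthesizable; names are illustrative.
entity sync_2dff_jitter_model is
  port (
    rx_clk  : in  std_logic;
    d_async : in  std_logic;
    q_sync  : out std_logic
  );
end entity sync_2dff_jitter_model;

architecture sim of sync_2dff_jitter_model is
  signal meta   : std_logic := '0';
  signal stage2 : std_logic := '0';
begin
  process (rx_clk)
    variable seed1  : positive := 17;  -- arbitrary seeds for uniform()
    variable seed2  : positive := 42;
    variable r      : real;
    variable prev_d : std_logic := '0';
  begin
    if rising_edge(rx_clk) then
      stage2 <= meta;
      if d_async /= prev_d then
        -- Input changed near this edge: silicon could resolve either way.
        uniform(seed1, seed2, r);
        if r < 0.5 then
          meta <= d_async;  -- new value captured on this edge
        end if;             -- otherwise meta keeps the old value for one more cycle
      else
        meta <= d_async;    -- input was stable: behaves like an ordinary register
      end if;
      prev_d := d_async;
    end if;
  end process;

  q_sync <= stage2;
end architecture sim;
```

Swapping a model like this in for each synchronizer during simulation exercises the downstream logic with the delay variation that ordinary RTL simulation never produces.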
Verifying CDC Synchronization
■ Problem:
— Missing synchronizers will create metastability
— Correctly placed but misused synchronizers won’t work
— Reconvergence of synchronized signals → control logic bugs
■ Approaches:
— Simulation
  ■ Won’t model CDCs correctly to detect errors
— Static Timing Analysis
  ■ Can be used to identify signals that cross domains
  ■ Can be used as input for a manual review
  ■ But…won’t detect missing or incorrectly used synchronizers, or reconvergence
Verifying CDC Synchronization
■ Problem:
— Missing synchronizers will create metastability
— Correctly placed but misused synchronizers won’t work
— Reconvergence of synchronized signals → control logic bugs
■ Approaches:
— Simulation
  ■ Won’t model CDCs correctly to detect errors
— Static Timing Analysis
  ■ Identifies signals for manual review, but otherwise useless
— Manual Design Reviews
  ■ Error prone (and very time consuming)
  ■ Typically only identifies synchronizer structures, misses reconvergence and invalid sync protocol usage
  ■ Evidence suggests at least some synchronizers will be missed
For Example… Trivial Reconvergence Error
■ Reconverging synchronized CDC signals - timing is unpredictable.
■ Need to verify the downstream logic can handle variations
— Manually identifying the reconvergence is very hard
— Manually identifying all possible behaviors is harder
— Manually assuring logic will behave correctly – typically intractable
Verifying CDC Synchronization
■ Problem:
— Missing synchronizers will create metastability
— Correctly placed but misused synchronizers won’t work
— Reconvergence of synchronized signals → control logic bugs
■ Approaches:
— Simulation - Won’t model CDCs correctly to detect errors
— Timing Analysis - Identifies signals for review, but otherwise useless
— Manual Design Reviews - error prone, incomplete
— Lab Verification?
  ■ Problem is intermittent, debug is impossible
— SPICE simulation? – It *does* model transistors, but…
  ■ Where will you get the “SPICE deck”? (transistor level model)
  ■ Would be far too slow on a large FPGA
Verifying CDC Synchronization
■ Problem:
— Missing synchronizers will create metastability
— Correctly placed but misused synchronizers won’t work
— Reconvergence of synchronized signals → control logic bugs
■ Approaches:
— So - we need a new method that reliably:
  ■ Identifies ALL CDC signals, structures, reconvergence
  ■ Assures ALL connected, functioning correctly
  ■ Creates reports for manual reviews
■ → The EDA industry has responded
— 6 commercial tools now available…and counting
— But…most won’t identify all 3 of our CDC issues
CDC Verification Technology
Caveat – To the best of my knowledge…
[Table: level of support (Minimal, Moderate, Strong) offered by each tool for each criterion]
Tools compared: Mentor 0-In CDC, Cadence Conformal, Synopsys Leda, Atrenta Spyglass-CDC, Real Intent CIV, Aldec CDC
Criteria:
— Structural Verification
— Protocols Verification
— Reconvergence Verification
— DO-254 Robustness (marked “?” for five of the tools)
Mentor’s CDC Verification Technology
Who’s using our technology?
■ DO-254
— Honeywell, Inc.
— L-3 Communications
— Lockheed Martin Co
— Ministry of Aerospace & Aeronautics
— Northrop Grumman Corp
— Raytheon
— Rockwell Collins Inc.
— SAAB Group
— Thales
■ Commercial
— Widely used in commercial space
■ The market leader in CDC verification
Example Value from One Customer
■ Design
— IEEE standard serial communications core
— Used in 50-60 other COMMERCIAL ASIC products
— Widely deployed (millions in use daily)
■ Placed core in a sensor guidance system
— Found issues in the lab
— Debugged FPGA for weeks
— Suspected a CDC issue, but not sure…
■ Deployed Mentor’s CDC solution
— Results same day
— Found 199 serious CDC bugs!
  ■ 45 Missing Synchronizers
  ■ 83 Incorrect Synchronizers
  ■ 76 Reconverging Signals
  ■ 11 other problems
— Most resulting from “more stressful” usage
■ In production:
— Commercial ASIC: Customer issue – device is erratic, locks up
— Avionics: Could result in an Airworthiness Directive
Summary Recommendations
During design planning
1. Create systems/designs using 1 clk, 1 edge when possible
2. If multiple clocks are required, try to use 1 designer for all clock domains
3. When multi-clock design is required, plan for proper verification

During verification
1. Watch for multiple clocks in designs (Tip – Count PLLs)
2. Ask how CDC issues are mitigated (remember there are 3)

Utilize commercial tools designed for detecting these problems
1. Verify all 3 classes of CDC problems
   1. Structural Verification
   2. Protocols Verification
   3. Reconvergence Verification
2. Use reports to aid manual reviews
3. Use CDC tools to support ROBUSTNESS
In Conclusion …
■ Every multi-clock design is subject to metastability
■ ARINC 664, 818 standards require multi-clock designs
■ Traditional verification methodologies CANNOT assure robustness
■ To properly mitigate the dangers of CDC, we strongly recommend a solution that:
— Supports Manual Reviews
— Automatically reports all sources of CDC problems
— Has a proven CDC verification methodology & customer success
Further Learning
■ Visit our web site: www.mentor.com/go/do-254
■ Has numerous resources & papers including:
— “Automating Clock-Domain Crossing Verification for DO-254 (and other Safety-Critical) Designs”
— “Achieving Quality and Traceability in FPGA/ASIC Flows for DO-254 Aviation Projects”
— “The Use of Advanced Verification Methods to Address DO-254 Design Assurance”
— “Effective Functional Verification Methodologies for DO-254 Level A/B and Other Safety-Critical Devices”
— “Assessing the ModelSim Tool for Use in DO-254 and ED-80 Projects”
— “DO-254 Compliant Design and Verification with VHDL-AMS”
David Landoll Applications Architect Mentor Graphics Corp. [email protected] www.mentor.com/go/do-254
Appendix
The 0-In CDC Verification Solution
■ Structural CDC analysis
— Automatically recognizes a large set of synchronizers
— Comprehensive modal analysis
■ Protocol verification
— Automatic generation of CDC protocol assertions
— These can either be proven or verified through simulation
■ Reconvergence verification
— Structural analysis to identify potential reconvergence issues
— Metastability simulation to verify the design correctly handles reconvergence
■ Tuned for capacity, debug, and ease of use
Multi-clock designs often exhibit design flaws that simulation alone can’t find!
0-In CDC – The only complete RTL-level CDC solution
Background: What’s a “SERDES” (and why do I care?)
Transmission – pretty straightforward
• Take the data, bit by bit – shift it out fast enough
• Done by a “serializer”
[Figure: the Avionics Computer Clock and the “Transmit Clock” driving the Outgoing Signal]
Background: What’s a “SERDES” (and why do I care?)
Reception – This is where things get clever
• Run a really fast clock – try to identify the characters coming in
• “Lock” the clock (Recovered Clock) on whatever shift allowed you to start seeing characters
• Declare yourself “synced” after recognizing n characters without errors
• This is done by the “Deserializer”
[Figure: “Receive PLL Clock” and “Receive Clock” waveforms aligned to the Incoming Signal 00111110101100000101]
Background: What’s a “SERDES” (and why do I care?)
Reception – So, what’s the problem?
• If the transmitter sends garbage – you don’t get a “recovered clock”…
Running a digital design when the clock turns off doesn’t work very well…
The solution is to “synchronize” the incoming data to another stable clock domain ASAP
[Figure: “Receive PLL Clock” and “Receive Clock” waveforms aligned to the Incoming Signal 00111110101100000101]