Mitigating the Dangers of Multi-Clock Designs

David Landoll Applications Architect Mentor Graphics Corp.

National Software and Complex Electronic Hardware Conference, August 20-21, 2008, Denver

Mentor Graphics Around the World

[Figure: world map of R&D sites and sales offices]

4400 employees, 44 sales offices, 28 R&D locations, ~60% R&D outside NA

2008 National SW & CEH Conference, Mentor Confidential

Who is Mentor Graphics?

■ 2007 revenues $880M
■ Projecting ~$915M for 2008
■ Focus on growth through internal technology development
— R&D 29% of total revenues, invested >$1.5B since 1996
■ Product portfolio that includes numerous market leaders
— Calibre, Questa, 0-In, Expedition, Catapult, TestKompress…
■ Acquisitions that build on leadership positions

Mentor Graphics Serves the Mil-Aero Market

Cable/Harness Systems
• Capital platform
• Veysys family
• TransDesign

Embedded SW
• EDGE Developer Suite
• Nucleus Secure RTOS
• Inflexion Application Platform

PCB Design
• Board station
• Expedition Enterprise flow
• PADS
• Data Management System

FPGA
• HDL Designer
• Precision synthesis
• ModelSim simulator
• Questa Adv. Verification

ASIC
• C synthesis
• Questa adv verification platform
• Veloce HW-assisted verification
• Seamless HW/SW co-design
• Advanced M/S mixed signal design
• Physical Implementation
• Manufacturing tests

Today’s FPGAs

[Figure: block diagram of an onboard computer system on chip in an FPGA. A Leon CPU with FPU, a memory controller with EDAC, boot PROM, timers, interrupt controller, and I/O port sit on an AMBA AHB interconnect (masters, slaves, arbiter, address/control and data muxes) with an AHB/APB bridge to DMA, UART, SpaceWire, HDLC, and CAN controllers. External parts include SDRAM data and parity memory, configuration PROM, clock generator, and a 1.5V linear regulator. Callouts: Design 1: DMA Controller, Design 2: AMBA AHB Interconnect, Design 3: CPU, Design 4: UART]

• Fabrication advances provide more available silicon area
• More functionality can weigh less and take up less space
• Integrating/reusing capabilities lowers cost

Avionic Integration – New challenges

[Figure: integrated avionics functions: Flight Management, Weather Radar & Avoidance Systems, Flight Control, Integrated Processing, Maintenance Diagnostics, Communications, Image Processing]

Boeing 787: Integration’s Next Step
“From its central processor to its common data network, surveillance system and navigation system, the theme of the Boeing 787 Dreamliner is integration.” (James W. Ramsey)

Today’s Avionics…

Today’s CEH challenges:
■ Complex systems
■ Complex intercommunication
■ New ARINC
■ Contain “Multi-clock designs”

All Multi-clock designs have CDCs
■ So, what are “CDCs”?
■ Why should you care?

Clock Domain Crossing Errors: Unpredictable Loss of Data

■ CDC problems
— corrupt control and data signals
— are subtle, intermittent, unpredictable
— are the 2nd major cause of respins
— are difficult to reproduce and debug
— are temperature, voltage, and process sensitive
— will only occur in hardware; often in the final design
■ Traditional verification techniques do not work for CDC signals

A CDC verification methodology is needed to reduce the risk of CDC related data errors

FPGA/ASIC Verification Trends

IC/ASIC Designs Requiring Re-Spins by Type of Flaw

[Figure: bar chart of the percent of designs requiring two or more silicon spins (0% to 100%), comparing Market Study 2002 and Market Study 2004 [Collet 2005]. Flaw types: Logic/Functional (71% and 75%), Clocking, Tuning Analog Circuit, Fast Path, Yield/Reliability, Delays/Glitches, Slow Path, Mixed-Signal Interface, Power Consumption, IR Drops, Firmware, Other]

■ 50% of ASICs require more than one spin
■ Principal contributors:
— Functional bugs
— Clocking related bugs

Effort Allocation of Dedicated Verification Engineers by Type of Activity

[Figure: chart of verification effort; legend: Verification Debug, Testbench Development; labels: 40%, 60%]

■ Traditional methods failing to catch bugs
■ Debugging becomes the main bottleneck
■ Usually indeterminate amount of time

Source: 2004/2002 IC/ASIC Functional Verification Study, Collett International Research, Used with Permission

Agenda

■ Digital Clocks, Registers, Metastability Explained
■ Why are Clock Domain Crossings Dangerous?
■ Where are CDCs likely on today’s CEH?
■ Mitigating the Clock Domain Crossing issue
■ Verifying the mitigation was performed correctly
■ Recommendations

CDC = Clock Domain Crossing

Metastability: What the heck is it, anyway?

■ What is a clock?
— Periodic pulsing signal
— Digital logic uniformly connected to this signal
— Acts as the Symphony Conductor – keeps logic in sync
— Action happens across the logic at one specific point
  ■ Typically the “rising edge”

[Figure: clock waveform swinging between Vcc/Vdd (+5V, +3.3V) and Vee/Vss (GND, 0V)]

Metastability: What the heck is it, anyway?

■ What’s in a register?
— (Also known as a latch, flip-flop, etc)
— Contain transistors that “trap” the input value at the appropriate time
  ■ E.g. rising edge of the clock
— How does this happen?

Metastability: The Physics of a Register

■ Let’s take a look at a register -- simple D-type flip-flop
— CMOS D-type transmission gate flipflop

    process(CLK)
    begin
      if rising_edge(CLK) then
        Q <= D;
      end if;
    end process;

[Figure: transistor model of a D flip-flop, shown in four animation steps]
— Step 1: D = 0, CLK = 0, Q = 0
— Step 2: D = 1, CLK = 0, Q = 0
— Step 3: D = 1, CLK = 1, Q = 1
— Step 4: D = 0, CLK = 0, Q = 1

Only works if D has a “good value” at the rising edge of the clock (no set-up/hold time violations)

Metastability: The Physics of a Register

■ When setup/hold conditions are violated, the output of a storage element becomes unpredictable

$$\mathrm{MTBF} = \frac{1}{f_{clk} \times f_{in} \times t_d}$$

— $f_{clk}$ = Clock Frequency
— $f_{in}$ = Input Signal Frequency
— $t_d$ = Duration of critical time window

[Figure: D flip-flop with D, CLK, and Q waveforms; D changes inside the setup/hold window and Q becomes unpredictable]

■ This effect is called metastability
■ If not contained, metastability can propagate…

Metastability is UNAVOIDABLE in designs with multiple asynchronous clocks
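
As a rough numerical illustration of the MTBF relation on this slide, MTBF = 1 / (fclk × fin × td), the Python sketch below plugs in example values. The function name and the sample frequencies and window width are assumptions chosen for illustration, not figures from the presentation. Detailed MTBF models usually add a resolution-time term that this simplified form omits.

```python
def mtbf_seconds(f_clk_hz: float, f_in_hz: float, t_d_s: float) -> float:
    """Mean time between setup/hold violations: 1 / (f_clk * f_in * t_d)."""
    if min(f_clk_hz, f_in_hz, t_d_s) <= 0:
        raise ValueError("all inputs must be positive")
    return 1.0 / (f_clk_hz * f_in_hz * t_d_s)


# Assumed example values: 100 MHz receiving clock, input toggling at 1 MHz,
# and a 100 ps critical window.
example = mtbf_seconds(100e6, 1e6, 100e-12)
print(f"MTBF is about {example * 1e6:.0f} microseconds")
```

With these assumed numbers an unsynchronized crossing would see a violation roughly every 100 microseconds, and a faster clock or a busier input shortens the interval further.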

Clock Domain Crossings: Guaranteed to Cause Metastability

When 2 or more designs run on disparate clocks:
— The clocks will continually skew, guaranteeing setup/hold violations
— Signals from one design to another are “Clock Domain Crossings” (CDCs)

[Figure: a Clock Domain Crossing signal from a flip-flop in the Sensor System (Tx, Clock A) to a flip-flop in the Guidance System (Clock B), with the Tx change falling inside the setup/hold window of Clock B]

Signals that cross asynchronous clock domains (CDC signals) WILL violate setup and hold conditions
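
The claim that unrelated clocks are bound to violate setup/hold can be checked with a small counting experiment. The Python sketch below is only an idealized model: the clock periods, window widths, and function name are assumptions picked for illustration, and real clocks also have jitter and drift.

```python
def count_violations(tx_period_ps, rx_period_ps, setup_ps, hold_ps,
                     n_tx_edges, tx_phase_ps=0):
    """Count Tx data changes that land inside an Rx setup/hold window.

    Rx samples at multiples of rx_period_ps; Tx data changes at
    tx_phase_ps + k * tx_period_ps. Integer picoseconds avoid float drift.
    """
    violations = 0
    for k in range(n_tx_edges):
        t = tx_phase_ps + k * tx_period_ps
        offset = t % rx_period_ps          # time since the last Rx edge
        if offset <= hold_ps or offset >= rx_period_ps - setup_ps:
            violations += 1
    return violations


# Unrelated clocks (10.0 ns vs 7.3 ns): the Tx edges drift through every phase.
async_hits = count_violations(10_000, 7_300, 100, 100, n_tx_edges=730)
# Same clock with a fixed half-period phase offset: never inside the window.
sync_hits = count_violations(10_000, 10_000, 100, 100, n_tx_edges=730,
                             tx_phase_ps=5_000)
print(async_hits, sync_hits)
```

In this model the single-clock case never hits the window, while the two-clock case hits it repeatedly no matter how the starting phase is chosen.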

Aircraft CEH: Where can CDC issues occur?

■ Any time 2 or more systems (or parts within a system) are run by unique clocks (e.g. PLLs)
■ Could be in ANY system
■ Obvious Examples
— ARINC 818
  ■ Digital Video Bus – based on Fibre Channel
— ARINC 664 (AFDX)
  ■ Ethernet Data Bus
— Why these?
  ■ Both “recover” the clock from the incoming serial bitstream

[Figure: Incoming Serial Data feeding a SERDES]

Aircraft CEH: Where can CDC issues occur?

■ ARINC 818
— Digital Video Bus – based on Fibre Channel
■ ARINC 664 (AFDX)
— Ethernet Data Bus
■ Recovered Clocks
— Technology is widely used and understood,
— But…the design needs to handle the clock…
  ■ Clock can shut off
  ■ Clock period and duty cycle can vary
■ Best solution: Transfer data to a more stable clock domain

[Figure: Incoming Serial Data feeding a SERDES (Serializer / Deserializer)]

Aircraft CEH: Where can these issues occur?

Serializer / Deserializer (A.K.A. SERDES)


Mitigating Clock Domain Crossing Issues

■ Problem:
— Signals crossing a clock domain will violate set-up/hold
— Impact: Control/data signals will be dropped/corrupted
  ■ Loss of Data
■ Approaches:
— Avoid having systems that have multiple clocks
— Only allow one clock, one edge
— Only one problem…

These systems ALREADY have multiple clocks…

Mitigating Clock Domain Crossing Issues

■ Problem:
— Signals crossing a clock domain will violate set-up/hold
— Impact: Control/data signals will be dropped/corrupted
  ■ Loss of Data
■ Approaches:
— Avoid having systems that have multiple clocks
  ■ So, although sensible, it’s becoming impossible
— Design around the problem
  ■ Designer can add “synchronizers” to the design
  ■ Metastability still happens, but nobody else sees it
  ■ E.g. 2DFF, FIFO, etc.
  ■ “Fences in” metastability

Isolate Metastability: Synchronizers

■ Designers add synchronizers to reduce the probability of metastable signals
■ Synchronizers are sub-circuits that can prevent metastable values from being sampled across clock domains
— Take unpredictable metastable signals and create predictable behavior

Mitigating Clock Domain Crossing Issues
Isolate Metastability: Synchronizers

[Figure: timing diagram of a Tx signal (Clock A) crossing into an Rx synchronizer (Clock B), with the metastability window marked and cycles i-1 through i+3 labeled]

When metastability occurs, the delay through a synchronizer becomes unpredictable
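
To make the unpredictable delay concrete, here is a minimal behavioral sketch in Python of a two-flip-flop (2DFF) synchronizer. It is an assumed toy model, not the presentation's circuit: the class, method, and parameter names are hypothetical, and the metastable first stage is simply assumed to settle to either the old or the new input value before the next Rx edge.

```python
class TwoFlopSynchronizer:
    """Behavioral model of a two-flip-flop (2DFF) synchronizer."""

    def __init__(self):
        self.ff1 = 0        # first stage: the one that may go metastable
        self.ff2 = 0        # second stage: the only output downstream logic sees
        self._last_d = 0    # input value at the previous Rx clock edge

    def clock(self, d, violation=False, settles_to_new=True):
        """Advance one Rx clock edge and return the synchronized output."""
        self.ff2 = self.ff1                  # stage 2 samples the settled stage 1
        if violation and not settles_to_new:
            self.ff1 = self._last_d          # metastable stage fell back to old value
        else:
            self.ff1 = d                     # clean capture, or settled to new value
        self._last_d = d
        return self.ff2


def rise_latency(settles_to_new):
    """Rx edges until the output shows a 0->1 change that violated setup/hold."""
    sync = TwoFlopSynchronizer()
    edges = 1
    out = sync.clock(1, violation=True, settles_to_new=settles_to_new)
    while out != 1:
        out = sync.clock(1)
        edges += 1
    return edges


print(rise_latency(True), rise_latency(False))
```

The output is always a clean 0 or 1, but the same input change can appear one Rx cycle earlier or later depending on how the first stage settles, which is the delay uncertainty the slide describes.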

Synchronizer Delays Can Reconverge with unexpected results

■ CDC signals cross with an assumed relationship
■ Can be combinational, sequential, or deeply sequential
■ Unpredictable delays on CDC paths lead to reconvergence errors
— Designs need logic to correctly handle reconvergence
— Can occur on single-bit or multiple-bit signals

[Figure: a Grey encoder drives tx_d0, tx_d1, and tx_d2 through separate synchronizers (Sync 0, Sync 1, Sync 2) into a Grey decoder feeding the input of an FSM with states S1 to S4. The waveforms show one case producing an invalid command and another producing a valid command that arrives delayed]
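
The reconvergence hazard on a multi-bit bus can be enumerated directly. The Python sketch below assumes each bit passes through its own synchronizer and, on the cycle of interest, has independently either arrived or not; the function names and the 3-bit example values are assumptions for illustration.

```python
from itertools import product


def to_gray(n):
    """Binary to Gray code: adjacent values differ in exactly one bit."""
    return n ^ (n >> 1)


def from_gray(g):
    """Gray code back to binary."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def possible_samples(old, new, width):
    """Values a receiver may see when each bit is independently old or new."""
    seen = set()
    for arrived in product((False, True), repeat=width):
        value = 0
        for bit, ok in enumerate(arrived):
            source = new if ok else old
            value |= source & (1 << bit)
        seen.add(value)
    return seen


# Plain binary 3 -> 4 flips all three bits, so any 3-bit value can appear.
binary_seen = possible_samples(3, 4, width=3)
# Gray-coded 3 -> 4 flips one bit, so only the old or new value can appear.
gray_seen = {from_gray(v) for v in possible_samples(to_gray(3), to_gray(4), width=3)}
print(sorted(binary_seen), sorted(gray_seen))
```

Gray coding only helps when the value moves one step at a time; a jump of more than one step changes several bits at once and brings the invalid intermediate values back.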

And, Synchronizers Fail if Misused

■ Synchronization between clock domains requires a transfer protocol
— Ensures data is predictably transferred between domains
■ These protocols must be verified
■ When protocol is violated
— Data is lost
— Simulation may not show a failure
— Silicon will eventually show a functional error

Synchronizer won’t function properly if the required Transfer Protocol is violated
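
One simple example of a transfer protocol is a stability rule: a signal sent into a slower clock domain must be held long enough to be sampled at least once. The Python sketch below illustrates that one rule under assumed clock periods; it is not the specific protocol set the presentation has in mind, the names are hypothetical, and setup/hold margins are ignored for brevity.

```python
def pulse_is_sampled(start, width, rx_period):
    """True if an Rx edge (at multiples of rx_period) falls in [start, start + width)."""
    first_edge = -(-start // rx_period) * rx_period   # first Rx edge at or after start
    return first_edge < start + width


TX_PERIOD, RX_PERIOD = 10, 25   # assumed: fast sender, slower receiver (ns)
starts = range(0, 200, TX_PERIOD)

# A pulse held for one Tx cycle is shorter than an Rx period and can be missed.
missed_1 = [s for s in starts if not pulse_is_sampled(s, 1 * TX_PERIOD, RX_PERIOD)]
# A pulse held for three Tx cycles spans more than one Rx period.
missed_3 = [s for s in starts if not pulse_is_sampled(s, 3 * TX_PERIOD, RX_PERIOD)]
print(missed_1, missed_3)
```

The synchronizer itself is identical in both cases; only the sender's behavior differs, which is why the protocol has to be verified separately from the structure.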

Must Verify All Three CDC Problems

[Figure: example circuit annotated with a missing sync problem, a possible protocol problem, and a reconvergence problem]

Clock domain crossings need:
— Structured synchronization
— Transfer protocols
— Global reconvergence checking

Mitigating Clock Domain Crossing Issues

■ Problem:
— Signals crossing a clock domain will violate set-up/hold
— Impact: Control/data signals will be dropped/corrupted
■ Approaches:
— Avoid having systems that have multiple clocks
— Designer can add “synchronizers” to the design
— Designer-added synchronizers + full CDC verification
  ■ Assures synchronizers are present and used correctly

Recommendations

During design planning
1. Create systems/designs using 1 clk, 1 edge when possible
2. If multiple clocks are required, try to use 1 designer for both clock domains
3. When multi-clock design is required, plan for proper verification
   1. How do we accomplish this?


Verifying CDC Synchronization

■ Problem:
— Missing synchronizers will create metastability
— Correctly placed but misused synchronizers won’t work
— Reconvergence of synchronized signals can create unexpected behavior
■ Approaches:
— Simulation
  ■ Digital logic simulators do NOT model transistor behavior
  ■ Do not model “metastability”

For example …

[Figure: two timing diagrams, Setup Violation and Hold Violation, each showing D, CLK, Q in simulation, and Q in silicon]

— Setup violation: simulation captures a ‘1’ while silicon produces either a ‘1’ or ‘0’
— Hold violation: simulation captures a ‘0’ while silicon produces either a ‘1’ or ‘0’

Simulation Does NOT Reflect Silicon Behavior

Verifying CDC Synchronization

■ Problem:
— Missing synchronizers will create metastability
— Correctly placed but misused synchronizers won’t work
— Reconvergence of synchronized signals → control logic bugs
■ Approaches:
— Simulation
  ■ Won’t model CDCs correctly to detect errors
— Static Timing Analysis
  ■ Can be used to identify signals that cross domains
  ■ Can be used as input for a manual review
  ■ But…won’t detect missing or incorrectly used synchronizers, or reconvergence

Verifying CDC Synchronization

■ Problem:
— Missing synchronizers will create metastability
— Correctly placed but misused synchronizers won’t work
— Reconvergence of synchronized signals → control logic bugs
■ Approaches:
— Simulation
  ■ Won’t model CDCs correctly to detect errors
— Static Timing Analysis
  ■ Identifies signals for manual review, but otherwise useless
— Manual Design Reviews
  ■ Error prone (and very time consuming)
  ■ Typically only identifies synchronizer structures, misses reconvergence and invalid sync protocol usage
  ■ Evidence suggests at least some synchronizers will be missed

For Example… Trivial Reconvergence Error

■ Reconverging synchronized CDC signals - timing is unpredictable.
■ Need to verify the downstream logic can handle variations
— Manually identifying the reconvergence is very hard
— Manually identifying all possible behaviors is harder
— Manually assuring logic will behave correctly – typically intractable

Verifying CDC Synchronization

■ Problem:
— Missing synchronizers will create metastability
— Correctly placed but misused synchronizers won’t work
— Reconvergence of synchronized signals → control logic bugs
■ Approaches:
— Simulation - Won’t model CDCs correctly to detect errors
— Timing Analysis - Identifies signals for review, but otherwise useless
— Manual Design Reviews - error prone, incomplete
— Lab Verification?
  ■ Problem is intermittent, debug is impossible
— Spice simulation? – It *does* model transistors, but…
  ■ Where will you get the “Spice deck”? (transistor level model)
  ■ Would be far too slow on a large FPGA

Verifying CDC Synchronization

■ Problem:
— Missing synchronizers will create metastability
— Correctly placed but misused synchronizers won’t work
— Reconvergence of synchronized signals → control logic bugs
■ Approaches:
— So - we need a new method that reliably:
  ■ Identifies ALL CDC signals, structures, reconvergence
  ■ Assures ALL connected, functioning correctly
  ■ Creates reports for manual reviews
■ → The EDA industry has responded
— 6 commercial tools now available…and counting
— But…most won’t identify all 3 of our CDC issues

CDC Verification Technology
Caveat – To the best of my knowledge…

[Table: comparison of commercial CDC tools. Columns (vendors): Mentor, Cadence, Synopsys, Atrenta, Real Intent, Aldec, with product labels 0-In CDC, Conformal, Leda, Spyglass, CIV, and CDC. Rows: Structural Verification, Protocols Verification, Reconvergence Verification, DO-254 Robustness (marked “?” for five of the tools). Cells were rated graphically on a scale of Minimal Support, Moderate Support, Strong Support]

Mentor’s CDC Verification Technology

Who’s using our technology?
■ DO-254
— Honeywell, Inc.
— L-3 Communications
— Lockheed Martin Co
— Ministry of Aerospace & Aeronautics
— Northrop Grumman Corp
— Raytheon
— Rockwell Collins Inc.
— SAAB Group
— Thales
■ Commercial
— Widely used in commercial space
■ The market leader in CDC verification

Example Value from One Customer

■ Design
— IEEE standard serial communications core
— Used in 50-60 other COMMERCIAL ASIC products
— Widely deployed (millions in use daily)
■ Placed core in a sensor guidance system
— Found issues in the lab
— Debugged FPGA for weeks
— Suspected a CDC issue, but not sure…
■ Deployed Mentor’s CDC solution
— Results same day
— Found 199 serious CDC bugs!
  ■ 45 Missing Synchronizers
  ■ 83 Incorrect Synchronizers
  ■ 76 Reconverging Signals
  ■ 11 other problems
— Most resulting from “more stressful” usage
■ In production:
— Commercial ASIC: Customer issue – device is erratic, locks up
— Avionics: Could result in an Airworthiness Directive

Summary Recommendations

During design planning
1. Create systems/designs using 1 clk, 1 edge when possible
2. If multiple clocks are required, try to use 1 designer for all clock domains
3. When multi-clock design is required, plan for proper verification

During verification
1. Watch for multiple clocks in designs (Tip – Count PLLs)
2. Ask how CDC issues are mitigated (remember there are 3)

Utilize commercial tools designed for detecting these problems
1. Verify all 3 classes of CDC problems
   1. Structural Verification
   2. Protocols Verification
   3. Reconvergence Verification
2. Use reports to aid manual reviews
3. Use CDC tools to support ROBUSTNESS

In Conclusion …

■ Every multi-clock design is subject to metastability
■ ARINC 664, 818 standards require multi-clock designs
■ Traditional verification methodologies CANNOT assure robustness
■ To properly mitigate the dangers of CDC, we strongly recommend a solution that:
— Supports Manual Reviews
— Automatically reports all sources of CDC problems
— Has a proven CDC verification methodology & customer success

Further Learning

■ Visit our web site: www.mentor.com/go/do-254
■ Has numerous resources & papers including:

— “Automating Clock-Domain Crossing Verification for DO-254 (and other Safety-Critical) Designs”

— “Achieving Quality and Traceability in FPGA/ASIC Flows for DO-254 Aviation Projects”

— “The Use of Advanced Verification Methods to Address DO-254 Design Assurance”

— “Effective Functional Verification Methodologies for DO-254 Level A/B and Other Safety-Critical Devices”

— “Assessing the ModelSim Tool for Use in DO-254 and ED-80 Projects”

— “DO-254 Compliant Design and Verification with VHDL-AMS”

David Landoll, Applications Architect, Mentor Graphics Corp.
www.mentor.com/go/do-254

Appendix: The 0-In CDC Verification Solution

■ Structural CDC analysis
— Automatically recognizes a large set of synchronizers
— Comprehensive modal analysis
■ Protocol verification
— Automatic generation of CDC protocol assertions
— These can either be proven or verified through simulation
■ Reconvergence verification
— Structural analysis to identify potential reconvergence issues
— Metastability simulation to verify the design correctly handles reconvergence
■ Tuned for capacity, debug, and ease of use

Multi-clock designs often exhibit design flaws that simulation alone can’t find!

0-In CDC – The only complete RTL-level CDC solution

Background: What’s a “SERDES” (and why do I care?)

Transmission – pretty straightforward
• Take the data, bit by bit – shift it out fast enough
• Done by a “serializer”

[Figure: the Avionics Computer Clock and a faster “Transmit Clock” driving the Outgoing Signal]

Background: What’s a “SERDES” (and why do I care?)

Reception – This is where things get clever
■ Run a really fast clock – try to identify the characters coming in
■ “Lock” the clock (Recovered Clock) on whatever shift allowed you to start seeing characters
■ Declare yourself “synced” after recognizing n characters without errors
■ This is done by the “Deserializer”

[Figure: “Receive PLL Clock” and “Receive Clock” waveforms aligned against the Incoming Signal bitstream 00111110101100000101]

Background: What’s a “SERDES” (and why do I care?)

Reception – So, what’s the problem?
■ If the transmitter sends garbage – you don’t get a “recovered clock”…

Running a digital design when the clock turns off doesn’t work very well…
The solution is to “synchronize” the incoming data to another stable clock domain ASAP

[Figure: “Receive PLL Clock” and “Receive Clock” waveforms aligned against the Incoming Signal bitstream 00111110101100000101]