UNDERSTANDING ISSUES OF CLOCK DOMAIN CROSSING

Neenu Preetam Isukametla

M.Tech (Microelectronics), Department of ECE, SEEC, Manipal University Jaipur (MUJ)

Abstract - Current System-on-a-chip (SoC) designs contain increased levels of functional and structural complexity within a single system. With the integration of multiple designs, various clock domains are introduced. In this paper, we discuss various clock domain crossing issues.

Keywords - Clock Domain Crossing, Metastability, Data Loss, Data Incoherency

I. INTRODUCTION

SoCs are becoming more complex these days. A lot of functionality is being added to chips, and data is frequently transferred from one clock domain to another. Hence, clock domain crossing verification has become one of the major verification challenges in deep submicron designs.

A clock domain crossing occurs whenever data is transferred from a flop driven by one clock to a flop driven by another clock.

Figure: Clock domain crossing.

In the Figure, signal A is launched by the C1 clock domain and needs to be captured properly by the C2 clock domain. Depending on the relationship between the two clocks, there can be different types of problems in transferring data from the source clock to the destination clock, and the solutions to those problems can differ as well.

Traditional methods like simulation and static timing analysis alone are not sufficient to verify that the data is transferred consistently and reliably across clock domains. Hence, new verification methodologies are required, but before devising a new methodology it is important to understand the issues related to clock domain crossings properly. Different types of clock domain crossings are discussed here, along with the possible issues encountered in each one of them and their solutions. A new verification methodology is then proposed which will ensure that data is transferred correctly across clock domains.

In all the subsequent sections, the signal names shown in the Figure are used directly. For example, C1 and C2 denote the source and destination clocks respectively. Similarly, A and B are the source and destination flop outputs respectively. The source and destination flops are assumed to be positive edge triggered.

II. CLOCK DOMAIN CROSSING TYPES

1. SYNCHRONOUS CLOCK DOMAIN CROSSINGS

Clocks which have a known phase and frequency relationship between them are known as synchronous clocks. The clocks originate from the same clock root, and a crossing between such clocks is known as a synchronous clock domain crossing. Synchronous CDC is divided into several categories based on the phase and frequency relationship of the source and destination clocks, as follows:

• Clocks with the same frequency and zero phase difference
• Clocks with the same frequency and constant phase difference
• Clocks with different frequency and variable phase difference
    • Integer multiple clocks
    • Non-integer multiple clocks

2. ASYNCHRONOUS CLOCK DOMAIN CROSSINGS

Clocks which do not have a known phase or frequency relationship between them are known as asynchronous clocks. The phase difference between the clocks is unpredictable, and there can be metastability issues if their active edges arrive very close to each other.

III. GLITCH PROBLEM IN CDC

Consider two synchronous clock domains which can be run at various integer and non-integer clock ratios, for example, 1:1, 2:1, 3:2, etc. The intent of the design is to avoid fractional cycle times across clock domains, i.e., the timing constraint between flops across clock domains is at least 1 cycle time of the fastest clock. In this paper, we consider the non-integer 3:2 clock ratio as an example. Figure 1 shows a typical 3:2 clock ratio waveform.

Figure 1: Typical 3:2 clock ratio waveform

In the waveform, we can see that the possible time periods from a posedge of the faster clock (Clk1) to a posedge of the slower clock (Clk2) in a 3:2 clock ratio are 1.5T, 1T, and 0.5T, where T is the cycle time of the faster clock. The design intent is to have a set-up time of at least 1T.

Consider a sample CDC design snippet as shown in Figure 2a. Flops F1 and F2 are in the faster Clk1 clock domain. Flop F3 is in the slower Clk2 clock domain. Signals T1 and T2 are launched by the Clk1 clock domain and go through combinatorial logic before being captured by the Clk2 clock domain. As the design intent is at least 1T cycle time, flops F1 and F2 in the Clk1 domain must not launch at the 0.5T edge, shown as X in Figure 2b. When data is transferred from the Clk1 to the Clk2 domain at clock edge X, there is a possibility of a glitch which cannot be seen in functional simulation but will appear in the hardware.

Figure 2a: Sample CDC design

Figure 2b shows the timing diagram where a glitch can happen in the real circuit with a 3:2 clock ratio. Say that at clock edge X, inputs T1 and T2 of the AND gate change from 1->0 and 0->1 respectively at the positive edge of Clk1. This should effectively maintain a value of 0 at the output of the AND gate. But if the arrival times of T1 and T2 differ because of real circuit delays, there may be a glitch at the output of the AND gate before the output T3 settles down to 0. So, if the Clk2 clock edge which is 0.5T later were to sample the output of the AND gate, this edge could capture the glitch, which can cause a functional failure of the design. This glitch can be avoided if flops F1 and F2 in the Clk1 domain do not launch at clock edge X. By using a combination of formal verification and static timing analysis (STA), we prove that for all clock ratios, a possible glitch generated in one clock domain cannot cause a timing violation in the other clock domain.

Figure 2b: Glitch in CDC for 3:2 clock ratio

IV. CLOCK DOMAIN CROSSING ISSUES

This section describes three main issues which can possibly occur whenever there is a clock domain crossing. The solutions for those issues are also described.

A. Metastability

Problem. If the transition on signal A happens very close to the active edge of clock C2, it could lead to a setup or hold violation at the destination flop "FB". As a result, the output signal B may oscillate for an indefinite amount of time. Thus the output is unstable and may or may not settle down to some stable value before the next clock edge of C2 arrives. This phenomenon is known as metastability, and the flop "FB" is said to have entered a metastable state. Metastability in turn can have the following consequences from a design perspective:

1. If the unstable data is fed to several other places in the design, it may lead to a high current flow and even chip burnout in the worst case.
2. Different fan-out cones may read different values of the signal, and may cause the design to enter an unknown functional state, leading to functional issues in the design.
3. The destination domain output may settle down to the new value or may return to the old value. However, the propagation delay could be high, leading to timing issues.

For example, see Figure 2. If the input signal A transitions very close to the posedge of clock C2, the output of the destination flop can be metastable. As a result it can be unstable and may finally settle to 1 or 0, as depicted by signals B1 and B2.

Figure: Metastability has consequences.

Solution. The effects of metastability can be contained by adding special structures known as synchronizers in the destination domain. The synchronizers allow sufficient time for the oscillations to settle down, so that with high probability a stable output is obtained in the destination domain. A commonly used synchronizer is the multi-flop synchronizer shown in the Figure.

Figure: Multi-flop synchronization.

This structure is mainly used for single and multi-bit control signals and single bit data signals in the design. Other types of synchronization schemes, such as MUX recirculation, handshake, and FIFO, are required for multi-bit data signals.

B. Data Loss

Problem. Whenever new source data is generated, it may not be captured by the destination domain in the very first cycle of the destination clock because of metastability. As long as each transition on the source signal is captured in the destination domain, data is not lost. In order to ensure this, the source data should remain stable for some minimum time, so that the setup and hold time requirements are met with respect to at least one active edge of the destination clock.

If the active clock edges of C1 and C2 arrive close together, the first clock edge of C2 which comes after the transition on source data A is not able to capture it. The data finally gets captured by the second edge of clock C2 (Figure 4). However, if there is sufficient time between the transition on data A and the active edge of clock C2, the data is captured in the destination domain in the first cycle of C2.

Hence, there may not be a cycle by cycle correspondence between the source and destination domain data. Whatever the case, it is important that each transition on the source data gets captured in the destination domain.

For example: Assume that the source clock C1 is twice as fast as the destination clock C2 and there is no phase difference between the two clocks. Further assume that the input data sequence "A" generated on the positive edge of clock C1 is "00110011". The data B captured on the positive edge of clock C2 will be "0101". Here, since all the transitions on signal A are captured by B, the data is not lost. This is depicted in the Figure.

Figure: No data is lost in this case.

However, if the input sequence is "00101111", then the output in the destination domain will be "0011". Here the third data value in the input sequence, which is "1", is lost as shown in the Figure.
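The two input sequences in the data loss example can be checked with a small behavioural model. The following is only a sketch under stated assumptions: the function names capture, transitions and data_lost are hypothetical, and the sampling model (each C2 edge sees the last of every two consecutive source bits) is an assumption chosen because it reproduces the captured sequences quoted in the text. It is not a timing-accurate simulation.

```python
def capture(source_bits: str, ratio: int = 2) -> str:
    """Return the bits seen by the destination flop.

    Assumed model (not taken from the paper): C1 is `ratio` times faster
    than C2 with zero phase difference, so each C2 edge samples the last
    of every `ratio` consecutive source bits.
    """
    return source_bits[ratio - 1::ratio]


def transitions(bits: str) -> int:
    """Count the 0->1 and 1->0 changes in a bit string."""
    return sum(1 for prev, cur in zip(bits, bits[1:]) if prev != cur)


def data_lost(source_bits: str, ratio: int = 2) -> bool:
    """True if at least one source transition never reaches the destination.

    Subsampling can only remove transitions, never add them, so fewer
    transitions on the captured side means something was missed.
    """
    return transitions(capture(source_bits, ratio)) < transitions(source_bits)


for seq in ("00110011", "00101111"):
    print(seq, "->", capture(seq), "lost" if data_lost(seq) else "ok")
# prints:
# 00110011 -> 0101 ok
# 00101111 -> 0011 lost
```

Counting transitions rather than comparing values directly mirrors the criterion stated in the text: data is considered lost only when a transition on the source signal is never seen in the destination domain.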

Figure: Data is lost in this case.

Solution. In order to prevent data loss, the data should be held constant in the source domain long enough to be properly captured in the destination domain. In other words, after every transition on source data, at least one destination clock edge should arrive where there is no setup or hold violation, so that the source data is captured properly in the destination domain. There are several techniques to ensure this.
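The hold requirement can be turned into a rough rule for how many source clock cycles a value must stay unchanged. The sketch below is an illustration and not the paper's method: the function name min_source_hold_cycles and its parameters are hypothetical, all times are assumed to be integers in picoseconds, and the worst case of an arbitrary phase between the clocks is assumed. Under that assumption, a stable window of one destination period plus the setup and hold times always contains a destination edge whose setup and hold requirements are both met.

```python
def min_source_hold_cycles(t_src_ps: int, t_dst_ps: int,
                           t_setup_ps: int = 0, t_hold_ps: int = 0) -> int:
    """Whole source clock cycles for which the data should be held stable.

    Assumption: the phase between the clocks is arbitrary, so the stable
    window must cover one full destination period plus the setup and hold
    times of the destination flop.
    """
    window_ps = t_dst_ps + t_setup_ps + t_hold_ps
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-window_ps // t_src_ps)


# Source clock at 10 ns, destination clock at 20 ns, ideal flops.
print(min_source_hold_cycles(10_000, 20_000))            # 2
# Same clocks with 200 ps setup and 100 ps hold at the destination flop.
print(min_source_hold_cycles(10_000, 20_000, 200, 100))  # 3
```

When the phase relationship is known, as with synchronous clocks, a tighter bound may be possible; the value computed here is only a conservative starting point.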

Figure: Effect of metastability on data capture.

One such technique is a finite state machine (FSM) that generates source data at a rate such that it is stable for at least 1 complete cycle of the destination clock. This is generally useful for synchronous clocks when their frequencies are known. For asynchronous clock domain crossings, techniques like handshake and FIFO are more suitable.

C. Data Incoherency

Problem. As seen in the previous section, whenever new data is generated in the source clock domain, it may take 1 or more destination clock cycles to capture it, depending on the arrival time of active clock edges. Consider a case where multiple signals are being transferred from one clock domain to another and each signal is synchronized separately using a multi-flop synchronizer. If all the signals are changing simultaneously and the source and destination clock edges arrive close together, some of the signals may get captured in the destination domain in the first clock cycle while some others may be captured in the second clock cycle by virtue of metastability. This may result in an invalid combination of values on the signals at the destination side. Data coherency is said to have been lost in such a case.

If these signals are together controlling some function of the design, then this invalid state may lead to functional errors.

For example: Assume that "00" and "11" are two valid values for a signal X[0:1] generated by clock C1. As shown in Figure 7, initially there is a transition from 1->0 on both the bits of X. Both the transitions get captured by clock C2 in the first cycle itself. Hence the signal Y[0:1] becomes "00".

Next, there is a transition from 0->1 on both the bits of signal X. Here the rising edge of clock C2 comes close to the transition on signal X. While the transition on X[0] is captured in the first clock cycle, the transition on X[1] gets captured in the second clock cycle of C2. This results in an intermediate value of "10" on Y[0:1], which is an invalid state. Data coherency is lost in this case.

Figure: Data coherency is lost in this case.

Solution. In the above example, the problem arises because all the bits are not changing to a new state in the same cycle of the destination clock. If all the bits either retain their original value or change to the new value in the same cycle, then the design either remains in the original state or goes to a correct new state.

Now, if the circuit is designed in such a way that only one bit change is required while changing the design from one state to another, then that bit either changes to the new value or retains the original value. Since all the other bits have the same value in both the states, the complete bus will either change to the new value or retain the original value in this case.

This in turn implies that if the bus is Gray-encoded, the problem is resolved and an invalid state is never obtained. However, this is applicable only for control busses, as it may not be possible to Gray-encode the data busses. In such cases, other techniques like handshake, FIFO and MUX recirculation can be used to generate a common control logic to transfer data correctly. The MUX recirculation technique is shown in the Figure.

V. CHALLENGES AND SOLUTION OF CDC's

• Advanced semiconductor processes have made it possible to integrate hundreds of millions of gates of digital logic on a die. What has made this practical, however, has been the shift to block-based design, in which many large functional blocks from a variety of sources are quickly integrated into a new SoC. Without the ability to reuse design blocks, it would be impractical, and perhaps even impossible, to take full advantage of the capabilities of an advanced process in any reasonable timescale; designing all that functionality from scratch is simply too complex.

• What abstraction to the block level gives with one hand, it tends to take away with the other. Even if each block can be relied upon to behave properly within its boundaries, a complex SoC design attempts to integrate and then coordinate many such blocks, despite the fact that each may have been designed by a different group using a different strategy. Each block, for example, may expect a different clock rate, may dynamically adjust its clock to match its workload, and may employ sophisticated clock-gating strategies to minimise power consumption.

• Clock domain crossing (CDC) bugs can happen when a signal crosses from one asynchronous domain to another and arrives too close to the receiving clock edge, so that the captured value is non-deterministic due to set-up or hold-time violations. This metastable state results in incorrect values being propagated through the design, causing functional errors. The problem with such errors is that standard methods for tracking them down, such as simulation or static timing analysis, are not effective, because the failures arise from corner-case combinations of timing and functionality that can make them unpredictable and intermittent.

• Some of the tools used to tackle such bugs have limited capacity, which can force designers into partitioning their SoCs artificially, so that each sub-block can be handled within the tool's capacity. Other methodologies produce so many false positives that their diagnostic value is self-limiting: there are just too many potential errors to consider.

• Our current thinking is that designers need a configurable solution, which uses both structural and functional analysis to ensure that signals which cross between domains on both ASICs and FPGAs are received reliably. Such a tool also needs high capacity and must work hierarchically, so that a design can be partitioned for analysis in a way that matches the original design intent but does not sacrifice top-level, full-chip precision on extremely large designs.

• If we can get this right, then we have a chance of helping designers tackle the new class of issues that has emerged as chip design has evolved into a block-integration challenge.

VI. CONCLUSION

In today's SoCs, the convergence of multiple functions causes very complex interaction between advanced interface protocols and logic. Combined with the growing number of IPs used, the very large design sizes, and complex clocking, this is causing the number of paths crossing clock domains to explode. These forces combine to create the demand to find all CDC issues as early as possible in the design cycle, to avoid tedious debugging late in the verification stages or, far worse, missing a CDC bug, resulting in expensive silicon re-spins or even more expensive field returns. A clock domain crossing (CDC) verification solution must address this verification challenge while maximizing overall productivity and effectiveness. It needs to cover clock domain analysis and structural and functional verification, addressing both RTL and gate-level verification needs.

VII. REFERENCES

[1] Cadence Technical Paper, "Closing the loop on clock domain functional implementation problems."
[2] Clifford E. Cummings, "Synthesis and Scripting Techniques for Designing Multi-Asynchronous Clock Designs," SNUG-2001.
[3] Carver Mead, et al., "Introduction to VLSI Systems," Addison Wesley, pp. 237-242, Oct. 1980.
[4] Saurabh Verma, Ashima S. Dabare, Atrenta, "Design How-To: Understanding Clock Domain Crossing Issues," EETimes, 2007.
[5] Prakash Rashinkar, et al., "System-on-a-chip Verification: Methodology & Techniques," Kluwer Academic Publishers, 2001.
[6] Techdesignforums.com/practice/technique/clock-domain-crossing-solution/
[7] Kesava R. Talupuru, Sanjai Athi, "Achieving Glitch-Free Clock Domain Crossing Signals Using Formal Verification, Static Timing Analysis, and Sequential Equivalence Checking."

Author :

Mr. I. Neenu Preetam is currently pursuing an M.Tech (Microelectronics) at SEEC, Manipal University Jaipur, and is working as an ASIC Design Verification Intern at CVC Pvt Ltd, Bangalore. He completed his B.Tech in ECE from CMRCET, JNTU, Hyderabad. He has published an IEEE paper on the Optimized Kernel Fuzzy C Means (OKFCM) Clustering Algorithm on Level Set Method for Noisy Images, and has also published various papers in international and national conferences and international journals. His areas of interest are Verification, VLSI, Signal Processing, Communications and Image Processing.