Gradual Synchronization
Total Page:16
File Type:pdf, Size:1020Kb
Gradual Synchronization Sandra Jackson and Rajit Manohar Computer Systems Laboratory, Cornell University Ithaca, NY 14853, U.S.A. fsandra, [email protected] Abstract—System-on-Chip (SoC) designs using multiple clock sender receiver domains are gaining importance due to clock distribution difficul- ties and increasing in-die process variations. For the same reasons more emerging SoC designs utilize clock-less domains for parts of the system. Both clock domain crossing and clocked/clock- less domain crossing require a mechanism for inter-domain data transfer that re-synchronizes data to the clock domain of the receiver and avoids metastability. These synchronizers introduce s r added latency and reduce throughput. This paper proposes φ φ merging synchronization with computation in order to reduce latency while keeping throughput high. The method, called Fig. 1. Classic two flip-flop synchronizer. 'r is the receiver clock and 's is Gradual Synchronization (GSync), can reduce synchronization the sender clock. latency at maximum operating frequency by up to 37 percent, with even greater benefit at slower frequencies. We show the benefits of this approach in the scenario of an asynchronous signal to be synchronized is passed through two sequential flip- NoC with synchronous end-points. flops both of which are clocked by the receiver’s clock. The presence of two flip-flops ensures that at least a full cycle is I. INTRODUCTION permitted for the resolution of any metastability. This interval The presence of multiple clock domains or mixed is deemed to be sufficient for all practical purposes, as it can clocked/clock-less domains in emerging system-on-chip (SoC) provide an extremely low failure probability [5], [6]. While designs creates the possibility of errors during communica- this solution is very robust, it incurs a high penalty in terms of tion across domain boundaries. The classic problem of re- synchronization overhead—both in terms of latency as well as synchronizing an asynchronous signal to a clock is known throughput. Attempts to “optimize” this synchronizer to reduce to exhibit the phenomenon of metastability. Metastability is its latency can result in erroneous implementations [6]. an instance of Buridan’s principle (named after fourteenth In this paper, we present an alternative approach that merges century French philosopher Jean Buridan) which can be stated synchronization into pipelined computation stages. This ap- as follows: a discrete decision based on a continuous range of proach reduces the number of cycles lost to synchronization. values cannot be made within a bounded length of time [1]. II. RELATED WORK In this particular case, the decision to be made is whether a signal is at logic zero or logic one, and it must be based on Correct use of the ”two flip-flop” synchronizer requires the voltage of the signal—a continuous value. The amount of some flow control between the two communicating clock time required for a circuit (known as a synchronizer) to “make domains. This is often accomplished in the form of requests up its mind” and determine whether an asynchronous signal is and acknowledges. Unfortunately, both must be synchronized logic zero or logic one cannot be bounded, and the net effect to their respective receiving clocks. Carefully optimizing this is that this decision may require more time than allocated synchronizer in the context of the flow control structure to it by a synchronous circuit—causing the synchronous achieves a lower latency and higher throughput capability [7], logic to potentially malfunction. This phenomenon was first the modifications do result in a reduction of the mean time experimentally demonstrated by Chaney and Molnar [2]. between failures (MTBF). A fast alternative is the even/odd Fortunately, there are well-known synchronizer implemen- synchronizer [8]. This synchronizer computes a phase esti- tations that ensure the probability that the circuit has “not mate based on the relative frequency of the two clocks. The made up its mind” decays exponentially with time [3], [4]. transmitter writes registers on alternate cycles and the receiver The probability that it takes more than time T to make a uses the phase estimate to determine which register to sample. decision, Prft > T g, is given by the expression e−T/τ where This synchronizer must be integrated with some form of flow τ is the time constant of the synchronizer. An adequate level of control in order to ensure every data is sampled. protection from synchronization failure is obtained by simply In order to keep the synchronization throughput high both waiting long enough—i.e., by choosing a large enough T . Seizovic [9] and Nowick [10] proposed FIFO based syn- The most common synchronizer is the “two flip-flop” syn- chronizers. In Seizovic’s work an asynchronous FIFO is con- chronizer shown in Figure 1. In this design, the transmitted structed, each FIFO stage attempts to further synchronize the φ Aφ ME R A Fig. 3. An ME element with one input connected to a clock. Fig. 2. A detailed view of a pipeline synchronizer acknowledged at any one time. Connecting one of the ME data so that the final output of the FIFO is synchronized to the element’s input signals to a clock has the effect of adding receiver clock, effectively creating pipelined synchronization. a variable delay to the acknowledge of the other input (see Section II-A discusses this work in greater detail. Nowick’s Figure 3). If T is the clock period, the ME will block R from dual-clock FIFO design only incurs a synchronization over- passing to A for half the clock period, and let R pass to A in head when the FIFO is nearly full or nearly empty. While this the other half of the clock period. If R shows up within Tw of solution will not have a significant synchronization penalty the relevant clock edge then a metastability occurs. However, when the FIFO is busy, is leads to high synchronization latency since the clock oscillates, the ME element will either exit the when the FIFO is either completely full or completely empty. metastable state on its own or be forced out of the metastable When two domains communicate with differing frequencies, state when the clock makes its next transition. it seems likely that the connection between the two domains The asynchronous FIFO element follows the specification: will encounter this high latency scenario more often than not. Other work has focused on periodic clock domains, where ∗[[Ri]; Ai;Ro;[Ao]] (1) a state-machine is used to block communication between two The clocks are two phase non overlapping clocks, '0 flip-flops that are in different clock domains if the commu- and '1. As the total number of stages (k) increases, the nication may lead to metastability. These solutions rely on a probability of encountering a metastability in the kth stage well-defined and predictable clock periodicity, and in some decreases. The alternating clock ensures that if no metastability cases they require that the clocks have similar frequencies or is encountered, a blocking phase will be encountered and the frequencies that bear a rational relationship [11], [12], [13]. synchronizer will enter its steady state. In the steady state Most of these approaches also have a significant communica- each stage takes T=2 and the signals are synchronized with tion latency, because the core of the design involves communi- the clock. cation between two flip-flops in different clock domains. Also, The MTBF can be adjusted to meet the need of the this class of solutions do not operate through gated clocks as system by changing the number of stages in the synchro- they exploit the periodicity of the clock signal, although this T nizer. Each stage introduces a latency of 2 . This latency is may not be a serious drawback if the number of crossings the major disadvantage of pipeline synchronization. Gradual between synchronization domains is small. synchronization uses a similar staged approach and also uses The situation is different if the receiver is completely Seitz’s implementation of the ME element as a synchronizer, asynchronous. An idle asynchronous circuit is ready to receive but GSync does not introduce a large latency penalty since its inputs at any time, and therefore will not encounter the it allows synchronization to occur in parallel with existing metastability problem in the form described above [3]. Even system computation. if an asynchronous circuit wishes to sample a signal while it is changing, it can do so safely because the rest of the III. GRADUAL SYNCHRONIZATION asynchronous circuit can wait for the synchronizer to “make up Gradual Synchronization merges synchronization with com- its mind” [14]. In certain special cases, one can even optimize putation by dispersing existing pipeline stages into synchro- the synchronizer design so as to have minimal impact on nizer stages as shown in Figure 4. In each stage data compu- overall system performance [15]. tation (CL) occurs while the synchronizer (S) block attempts to synchronize the request to the receiving clock. The two A. Pipeline Synchronization tasks which normally occur in isolation now occur in parallel. The pipeline synchronizer attempts to change the arrival Designed in this manner the synchronizer now resembles two- time of an input signal to a time that is synchronized with its phase pipeline operation with the FIFOs as the latches between clock input by the end of k stages. Each stage consists of one the stages. synchronizer and one FIFO element. The stages are cascaded to form the full pipeline allowing more than one data set to A. Gradual Synchronizer be passing through the synchronizer at a time as shown in Each FIFO in the pipeline requires stable data to be present figure 2.