Impact of C-Elements in Asynchronous Circuits
Total Page:16
File Type:pdf, Size:1020Kb
Impact of C-Elements in Asynchronous Circuits Matheus Moreira, Bruno Oliveira, Fernando Moraes, Ney Calazans Faculdade de Informática Pontifícia Universidade Católica do Rio Grande do Sul Porto Alegre, Brazil {matheus.moreira, bruno.scherer}@acad.pucrs.br {fernando.moraes, ney.calazans}@pucrs.br Abstract single cell level and the application, or core, level. For the Asynchronous circuits are a potential solution to address former scope, delay and power consumption were measured some of the obstacles in deep submicron (DSM) design. One through electrical simulations at the standard cell level after of the most frequently used devices to build asynchronous electrical extraction of the physical layout. For the latter circuits is the C-element, a device present as a basic building scope, the different C-elements were used to build block in several asynchronous design styles. This work asynchronous implementations of an oscillator ring and an measures the impact of three different C-element types. The RSA cryptographic core. A comparison of performance and paper compares the use of each implementation to build a real required silicon area of the case studies made it possible to case asynchronous circuit, an RSA cryptographic core, and scrutinize the systemic effect of using different C-element reports results of precise electrical simulations of each C- implementations when designing asynchronous circuits in element. Findings in this work show that previous results in DSM technologies. The rest of the paper is organized in seven sections. the literature about C-element implementation types must be Section II describes basic concepts on asynchronous circuits re-evaluated when using C-elements in DSM technologies. and Section III presents a discussion on related works. Section IV presents details on the CMOS design and implementation Keywords of C-elements. Section V approaches the comparison of each asynchronous circuits; standard cell; C-element; deep implementation at the cell level. Section VI presents the case submicron study circuits using each of the three C-element implementations. Finally, Section 0 draws some conclusions I. INTRODUCTION and directions for further work. The interest in non-synchronous circuits is increasing. The International Technology Roadmap for Semiconductors II. ASYNCHRONOUS CIRCUITS (ITRS) in its 2008 edition [1] describes a clear need for A digital circuit is synchronous if its design implies the asynchronous communication protocols for control and use of a single clock signal controlling all circuit events. synchronization in integrated circuits (ICs) for the next Otherwise it is called non-synchronous. As a special case, a decades. However, one major problem for adopting the digital circuit is asynchronous when no clock signal is used to asynchronous design paradigm is that most commercial EDA control any sequencing of events. These employ explicit tools assume that a single (or a few) clock signal(s) globally handshaking among their components to synchronize, controls an entire IC. Moreover, basic components required to communicate and operate [2]. The resulting behavior is implement asynchronous communication, like the C-element, similar to a synchronous system where registers are clocked are not available off-the-shelf in typical standard cell libraries. only when and where necessary. Characterizing an The C-element is a fundamental primitive for building asynchronous design style requires: (i) the choice of a delay asynchronous logic and implementing the synchronization model, (ii) an information encoding method and (iii) a set of required by most handshaking protocols, which provide the basic devices. Each of these are explored in the rest of this basis for asynchronous communication. Three static CMOS Section. implementations are the Sutherland pull-up pull-down, the 1 Asynchronous circuits can be classified according to weak feedback and the van Berkel implementation. Some several criteria. One important criterion is based on the delays previous work comparing C-element types also mention a of wires and gates. The most robust and restrictive delay fourth type, the dynamic C-element. Since the objective of this model is the delay-insensitive (DI) model, which operates work is to investigate the use of C-elements in the general correctly regardless of gate and wire delay values. scope of asynchronous implementations, where no systematic Unfortunately, this class is too restrictive. The addition of an refresh of dynamic circuits is generally available, this latter assumption on wire delays in some carefully selected forks type is ignored here. enables to define the quasi-delay-insensitive (QDI) circuit This work presents a comparison between the three first C- class. Here, signal transitions must occur at the same time element mentioned implementations. To do so, these were only at each end point of the mentioned forks. QDI circuits are designed and implemented as standard cells in the quite common, although other models, such as bundled-data STMicroelectronics (STM) 65nm CMOS technology. Two [2] are still used in specific contexts. This work assumes the different scopes served to compare the implementations, the use of QDI as target model. 1 There are different ways to encode data to adequately Some works in the literature call the van Berkel C-element symmetric, support delay models. The use of regular binary encoding of because of its special circuit topology and its transition effects. This paper avoids this nomenclature to prevent confusion with the behavioral variations data implies the use of separate request-acknowledge control of C-elements, which are named symmetric and asymmetric or generic. signals. While this makes design straightforward for those 978-1-4673-1036-9/12/$31.00 ©2012 IEEE 437 13th Int'l Symposium on Quality Electronic Design used to synchronous techniques, the timing relationship In [7], Elissati et al. conducted a study comparing four between control and data signals need to be guaranteed at different CMOS implementations of the C-element for a every handshake point, making design of large asynchronous specific case of application, self-timed rings. The work modules difficult and hard to scale. As an alternative, DI compares the performance of rings using each C-element encodings are robust to wire delay variations, because request implementation and through simulation for a 65nm signals are embedded within data signals. An example is the technology. Results show that the van Berkel implementation dual-rail encoding, that uses two wires to represent each bit, seems to be the best trade-off between low power and and can represent bit values as well as the absence of data. operating speed. However, the work does not take into The request signal is computed from the data and therefore account any layout consideration either. Besides, the results demands extra hardware. Throughout this work, circuits will presented are valid mostly for self-timed rings only. In [8], employ dual rail encoding. More efficient DI encodings exist Bastos et al. evaluated transient-fault effects on four C- and are discussed in detail in several publications e.g. in [3]. element implementations. The work shows that the weak- Most of the asynchronous design techniques proposed to feedback C-element is the most transient-fault robust date require devices other than ordinary logic gates and flip- implementation. Moreover, results show that the van Berkel flops available in current standard cell sets. These include e.g. C-element is the fastest and lowest power and area consuming metastability filters, event fork, join and merge devices. implementation. Albeit the work provides concrete results on Although most of these may be built from logic gates this is the robustness of each implementation for transient faults, inefficient. A fundamental device that enables to build such speed, power consumption and area results were not evaluated elements more effectively is the C-element. Its importance through a layout-aware approach. Wire loads and parasitic comes from the fact that C-elements operate as an event were also not taken into account during simulations. synchronizer. Figure 1depicts the truth table and state diagram This paper stands off by scrutinizing the electrical for a C-element with symmetric behavior. Its output only behavior of each C-element CMOS topology after the switches when all inputs have the same logical value. When extraction of the physical implementation. Moreover, it inputs A and B are equal, the output Q assumes this same presents a layout- and application-aware comparison of them value. However, when the inputs are different, the output in generic asynchronous circuits. Comparisons presented here keeps the previous logic value. It is possible to build and use are impartial, because the physical design of the C-elements is several alternative similar behaviors, by individually negating based in a parameterizable flow that allows the generation of the inputs or the output, increasing the number of inputs and different cells with the same driving capability. Results associating differentiated logic behavior to distinct inputs. showed that, differently from previous works concluded, the This last characteristic produces the asymmetric C-elements, van Berkel is not the best option for low area, low power and which are discussed for example in [4]. On the rest of this high-speed designs. An in-depth discussion on electrical paper the discussion restricts attention to different CMOS design and simulation of C-elements is conducted to implementation of the Figure 1 C-element. demonstrate that each C-element type is advantageous in some context. This is, as far as the Authors could verify, the first A B Qi 0 0 0 work to compare C-elements taking into account parasitic Q devices and wire loads, guaranteeing a fairer comparison. 0 1 i- 1 Q IV. C-ELEMENT CMOS IMPLEMENTATION 1 0 i- 1 Three different static CMOS topologies of the C-element 1 1 1 were implemented and compared in this work. Figure 2(a) Figure 1 - A basic C-element truth table and state diagram for symmetric shows the Sutherland pull-up pull-down implementation, behavior with regard to the inputs.