Asynchronous Circuits

Maitham Shams Jo C Eb ergen

Department of Electrical and Engineering Sun Microsystems Lab oratories

UniversityofWaterlo o Waterlo o Ont CANADA Mountain View CA USA

Mohamed I Elmasry

Department of Electrical and Computer Engineering

UniversityofWaterlo o Waterlo o Ont CANADA

Digital VLSI circuits are usually classied into synchronous and asynchronous circuits

Synchronous circuits are generally controlled by global synchronization signals provided by

a clo ck Asynchronous circuits on the other hand do not use such global synchroniza

tion signals Between these extremes there are various hybrids Digital circuits in to days

commercial pro ducts are almost exclusively synchronous Despite this big dierence in p op

ularity there are a numb er of reasons whyasynchronous circuits are of interest

In this article we presentabriefoverview of asynchronous circuits First we address some

of the motivations for designing asynchronous circuits Then we discuss dierent classes of

asynchronous circuits and briey explain some asynchronous design metho dologies Finally

we presenta typical asynchronous design in detail

Motivations for Asynchronous Circuits

Throughout the years researchers have had various reasons for studying and building asyn

chronous circuits Some of the often mentioned advantages of asynchronous circuits are

sp eed low energy dissipation mo dular design imm unity to metastable b ehavior freedom

from clo ckskew and low generation of and low susceptibility to electromagnetic interfer

ence We elab orate here on some of these p otentials and indicate when they have b een

demonstrated through comparative case studies

Sp eed

Sp eed has always b een a motivation for designing asynchronous circuits The main reasoning

b ehind this advantage is that synchronous circuits exhibit worstcase b ehavior whereas

asynchronous circuits exhibit averagecase b ehavior The sp eed of a is

governed by its clo ck frequency The clo ck p erio d should b e large enough to accommo date

the worstcase propagation delay in the critical path of the circuit the maximum clo ckskew

and a safety factor due to uctuations in the chip fabrication pro cess op erating temp erature

and supply voltage Thus synchronous circuits exhibit worstcase p erformance This worst

case b ehavior is dictated by the global clo ck and in spite of the fact that the worstcase

propagation in many circuits particularly arithmetic units is improbable and maybemuch

longer than the averagecase propagation

Many asynchronous circuits are controlled bylocalcommunications and are based on the

principle of initiating a computation waiting for its completion and then initiating the next

one When a computation is completed early the next computation can start earlyFor this

reason the sp eed of asynchronous circuits equipp ed with completiondetection mechanisms

dep end on the computation time of the data b eing pro cessed not the worstcase timing

Accordinglysuchasynchronous circuits exhibit averagecase p erformance An example of

an asynchronous circuit where the averagecase p otential is nicely exploited is rep orted in

an asynchronous divider that is twice as fast as its synchronous counterpart Nevertheless

to date there are few concrete examples demonstrating that the averagecase p erformance

of asynchronous circuits is higher than that of synchronous circuits p erforming similar func

tions The reason is that the averagecase p erformance advantage is often counterbalanced

by the overhead in control circuitry and completiondetection mechanisms

Besides demonstrating the averagecase p otential there are case studies in which the

sp eed of an asynchronous design is compared to the sp eed of a corresp onding synchronous

version Molnar et al rep ort a case study of an asynchronous FIFO that is every bit as fast as

any synchronous FIFO using the same data latches Furthermore the asynchronous FIFO

has the additional b enets that it op erates under lo cal control and is easily expandable At

the end of this article we give an example of a FIFO with a slightly dierent control circuit

Immunity to Metastable Behavior

Any circuit with a numb er of stable states also has metastable states When such a cir

cuit gets into a metastable it can remain there for an indenite p erio d of time b efore

resolving into a stable state Metastable b ehavior o ccurs for example in circuit prim

itives that realize mutual exclusion b etween pro cesses called arbiters and comp onents that

synchronize indep endent signals of a system called synchronizers Although the probabil

ity that metastable b ehavior lasts longer than p erio d t decreases exp onentially with tit

is p ossible that metastable b ehavior in a synchronous circuit lasts longer than one clo ck

p erio d Consequently when metastable b ehavior o ccurs in a synchronous circuit erroneous

data may b e sampled at the the computation time of the clo ck pulses An asynchronous

circuit deals gracefully with metastable b ehavior by simply delaying the computation until

the metastable b ehavior has disapp eared and the element has resolved into a stable state

Mo dularity

Modularity in design is an advantage exploited bymany asynchronous design styles The

basic idea is that an is comp osed of functional mo dules communi

cating along welldened interfaces Comp osing asynchronous systems is simply a matter

of connecting the prop er mo dules with matching interfacial sp ecications The interfacial

sp ecications describ e only the sequences of events that can take place and do not sp ecify

any restrictions on the timing of these events This characteristic reduces the design time

and complexity of an asynchronous circuit b ecause the designer do es not havetoworry

ab out the delays incurred in individual mo dules or the delays inserted by connection wires

Designers of synchronous circuits on the other hand often pay considerable attention to

satisfying the detailed interfacial timing sp ecications

Besides ease of comp osability mo dular design also has the p otential for b etter technology

migration ease of incremental improvement and reuse of mo dules Here the idea is

that an asynchronous system adapts itself more easily to the advances in technology The

obsolete parts of an asynchronous system can b e replaced with new parts to improve system

p erformance Synchronous systems cannot takeadvantage of new parts as easily b ecause

they must b e op erated with the old clo ck frequency or other mo dules must b e redesigned to

op erate at the new clo ck frequency

One of the earliest pro jects that exploited mo dularity in designing asynchronous circuits

is the Macromo dules pro ject Another nice example where mo dular design has b een

demonstrated is the TANGRAM compiler develop ed at Philips Research Lab oratories

LowPower

Due to rapid growth in the use of p ortable equipment and the trend in highp erformance

pro cessors towards unmanageable p ower dissipation energy eciency has b ecome crucial

in VLSI design Asynchronous circuits are attractive for energyecient designs mainly

b ecause of the elimination of the clo ck In systems with a global clo ck all of the latches and

registers op erate and consume dynamic energy during each clo ck pulse in spite of the fact

that many of those latches and registers mightnothave new data to store There is no such

waste of energy in asynchronous circuits b ecause computations are initiated only when they

need to b e done

Two notable examples that demonstrated the p otential of asynchronous circuits when in

energyecient design are the work done at Philips Research Lab oratories and at Manchester

University The Philips group designed a fully asynchronous digital compactcassette DCC

error detector which consumed less energy than a similar synchronous version The

AMULET group at Manchester University successfully implemented an asynchronous version

of the ARM micropro cessor one of the most energyecient synchronous micropro cessors

The asynchronous version achieved a p ower dissipation comparable to the fourth generation

of ARM around mW in a similar technology

Recentlypower management techniques are b eing used in synchronous systems to turn

the clo ck on and o conditionally However these techniques are only worthwhile imple

menting at the level of functional units or higher Besides the comp onents that monitor the

environment for switching the clo ckcontinue dissipating energy

It is also worth mentioning that unlike synchronous circuits most asynchronous circuits

do not waste energy on hazardswhich are spurious changes in a signal Asynchronous

circuits are essentially designed to b e hazardfree Hazards can b e resp onsible for up to

of energy loss in synchronous circuits

Freedom from Clo ckSkew

Because asynchronous circuits generally do not have clo cks they do not havemany of the

problems asso ciated with clo cks One such problem is clock skew the technical term for

the maximum dierence in clo ck arrival time at dierent parts of a circuit In synchronous

circuits it is crucial that all mo dules op erating with a common clo ck receive this signal

simultaneously that is within a tolerable p erio d of time Minimizing clo ckskew is a dicult

problem for large circuits Various techniques have b een prop osed to control clo ckskew but

generally they are exp ensive in terms of silicon area and energy dissipation For instance the

clo ck distribution network of the DEC Alpha a MHz micropro cessor at a V supply

o ccupies of the chip area and has a share in the total chip p ower consumption

Although asynchronous circuits do not havetheclockskew problem they have their own

set of problems in minimizing the overhead needed for synchronization among the parts

Mo dels and Metho dologies

There are many mo dels and metho dologies for analyzing and designing asynchronous circuits

Asynchronous circuits can b e categorized by the following criteria signaling proto col and

data enco ding underlying delay mo del mo de of op eration and formalism for sp ecifying and

designing circuits This section presents an informal explanation of these criteria

Signaling Proto cols and Data Enco dings

Mo dules in an asynchronous circuit communicate data with some signaling proto col con

sisting of request and acknowledgment signals There are two common signaling proto cols

for communicating data b etween a sender and a receiver the fourphase and the twophase

proto col In addition to the signaling proto col there are dierentways to enco de data The

most common enco dings are singlerail and dualrail enco ding We explain the two signaling

proto cols rst and then discuss the data enco dings

If the sender and receiver communicate through a twophase signaling proto col then each

communication cycle has two distinct phases The rst phase consists of a request initiated

by the sender The second phase consists of an acknowledgmentby the receiver The request

and acknowledgment signals are often implemented byvoltage transitions on separate wires

No distinction is made b etween the directions of voltage transitions Both rising and falling

transitions denote a signaling event

ol consists of four phases a request followed byanac The fourphase signaling protoc

knowledgment followed by a second request and nally a second acknowledgment If the

request and acknowledgment are implemented byvoltage transitions then at the end of

every four phases the signaling wires return to the same voltage levels as at the start of the

four phases Because the initial voltage is usually zero this typ e of signaling is also called

returntozero signaling Other names for twophase and fourphase signaling are twocycle

and fourcycle signaling resp ectivelyor transition and level signaling resp ectively

Both signaling proto cols can b e used with single and dualrail data enco dings In single

rail data encoding each bit is enco ded with one wire whereas in dualrail encodingeachbit

is enco ded with two wires

In singlerail enco ding the value of the bit is represented by the voltage on the data

wire When communicating n data bits with a singlerail enco ding during p erio ds where

the data wires are guaranteed to remain stable wesay that the data are valid During p erio ds

where the data wires are p ossibly changing wesay the data are invalid Atwophase or

fourphase signaling proto col is used to tell the receiver when data are valid or invalid The

sender informs the receiver ab out the validity of the data through the request signal and the

receiver in turn informs the sender of the receipt of the data through the acknowledgment

signal Therefore to communicate n bits of data a total number of n wires are necessary

between the sender and the receiver The connection pattern for singlerail enco ding and

two or fourphase signaling is depicted in Figure a

Request R R S E S E E n C E 2n C N E N E DATA DATA D I D I E V E V R E R E AcknowledgeR Acknowledge R

(a) Bundled Data Convention (b) Dual-Rail Data Encoding

Figure Two dierentdatacommunication schemes

Figure a shows the sequence of events in a twophase signaling proto col The events

include the times when the data b ecome valid and invalid The transparent bars indicate

the p erio d when data are valid during the other p erio ds data are invalid Notice that a

request signal o ccurs only after data b ecome valid This is an imp ortant timing restriction

asso ciated with these communication proto cols namely the request signal that indicates

that data are valid should always arrive at the receiver after all data wires have attained

their prop er value The restriction is referred to as the bund ling constraintFor this reason

the communication proto col is often called the bund led data protocol Figure b shows a

sequence of events in a fourphase proto col and singlerail data enco ding Other sequences

are also applicable for the fourphase proto col

The dualrail enco ding scheme uses two wires for every data bit There are several dual

rail enco ding schemes All combine the data enco ding and signaling proto col There is

no explicit request signal and the dualrail enco ding schemes all require n wires as Request Request

Data Data

Acknowledge Achnowledge

One Cycle One Cycle

(a) (b)

Figure Data transfer in twophase signaling a and fourphase signaling b

illustrated in Figure b In the case of fourphase signaling there are several enco dings

that can b e used to transmit a data bit The most common enco ding has the following

meaning for the four states in whicheach pair of wires can b e in reset valid

valid and is an unused state Every pair of wires has to go through the reset state

b efore b ecoming valid again In the rst phase of the fourphase signaling proto col every

pair of wires leaves the reset state for a valid or state The receiver detects the arrival of a

new set of valid data when all pairs of wires have left the reset state This detection replaces

an explicit request signal The second phase consists of an acknowledgment to inform the

sender that data has b een consumed The third phase consists of the reset of all pairs of

wires to the reset state and the fourth phase is the reset of the acknowledgment

In a twophase signaling proto col a dierent dualrail enco ding is used An example of

an enco ding is as follows Each pair of wires has one wire asso ciated with a and one wire

asso ciated with a A transition on the wire asso ciated with represents the communication

of a whereas a transition on the other wire represents a communication of a Thus a

transition on one wire of each pair signals the arrival of a new bit value A transition on

b oth wires is not allowed In the rst phase of the twophase signaling proto col every pair of

wires communicates a or a The second phase is an acknowledgment sentby the receiver

Of all data enco dings and signaling proto cols the most p opular are the singlerail enco d

ing and fourphase signaling proto col The main advantages of these proto cols are the small

numb er of connection wires and the simplicity of the enco ding which allows using conven

tional techniques for implementing data op erations The disadvantages of these proto cols

are the bundling constraints that must b e satised and the extra energy and time wasted

in the additional two phases compared with twophase signaling Dualrail data enco dings

have b een used to communicate data in async hronous circuits free of any timing constraints

Dualrail enco dings however are exp ensive in practice b ecause of the manyinterconnec

tion wires the extra circuitry to detect completion of a transfer and the dicultyindata

pro cessing

Delay Mo dels

An imp ortantcharacteristic distinguishing dierent asynchronous circuit styles is the delay

mo del on which they are based For each circuit primitive gate or wire a delay model

stipulates the sort of delay it imp oses and the range of the delays Delay mo dels are needed

to analyze all p ossible b ehavior of a circuit for various correctness conditions like the absence

of hazards

A circuit is comp osed of gates and interconnection wires all of which imp ose delays on

the signals propagating through them The delay mo dels are categorized into two classes

pure delay mo dels and inertial delay mo dels In a pure delay mo del the delay asso ciated with

a circuit comp onent pro duces only a time shift in the voltage transitions In reality a circuit

comp onentmay shift the signals and also lter out pulses of small width A delaymodel

which captures this fact is called an inertial delay mo del Both classes of delay mo dels can

have several ranges for the delay shifts We distinguish the zerodelay xeddelay bounded

delayandunboundeddelay mo dels In the zerodelay mo del the values of the delays are

zero In the xeddelay mo del the values of the delays are constant whereas in the b ounded

delay mo del the values of the delays vary within a b ounded range The unb oundeddelay

mo del do es not imp ose any restriction on the value of the delays except that they cannot

b e innite Sometimes two dierent delay mo dels are assumed for the wires and the gates

in an asynchronous circuit For example the op eration of a class of asynchronous circuits is

based on the zerodelay mo del for wires and the unb oundeddelay mo del for gates Formal

denitions of the various delay mo dels are given in

A concept closely related to the delay mo del of a circuit is its mode of operation The

mo de of op eration characterizes the interaction b etween a circuit and its environment Clas

sical asynchronous circuits op erate in the fundamental mode which assumes that the

environmentchanges only one input signal and waits until the circuit reaches a stable state

Then the environment is allowed to apply the next change to one of the input signals Many

mo dern asynchronous circuits op erate in the inputoutput mo de In contrast to the funda

mental mo de the inputoutput mo de allows for input changes immediately after receiving

an appropriate resp onse to a previous input change even if the entire circuit has not yet

stabilized The fundamental mo de was intro duced in the sixties to simplify the analysis and

design of gate circuits with Bo olean algebra The inputoutput mo de evolved in the eighties

from eventbased formalisms to describ e mo dular design metho ds that abstracted from the

internal op eration of a circuit

Formalisms

Just as in any other design discipline designers of asynchronous circuits use various for

malisms to master the complexities in the design and analysis of their artifacts The for

malisms used in asynchronous circuit design can b e categorized into two classes formalisms

based on Bo olean algebra and formalisms based on sequences of events Most design metho d

ologies in asynchronous circuits use some mixture of b oth formalisms

The design of many asynchronous circuits is based on Bo olean algebra or its derivative

switching theory Such circuits often use the fundamental mo de of op eration the b ounded

delay mo del and have as primitive elements gates that corresp ond to the basic func

tions like and or and inversion These formalisms are convenient for implementing logic

functions analyzing circuits for the presence of hazards and synthesizing fundamentalmo de

circuits

Eventbased formalisms deal with sequences of events rather than binary logic variables

Circuits designed with an eventbased formalism op erate in the inputoutput mo de un

der an unb oundeddelay mo del and have as primitive elements the join the toggle

and the merge for example Eventbased formalisms are particularly convenient for de

signing asynchronous circuits when a high degree of concurrency is involved Several to ols

have b een generated for the automatic verication of asynchronous circuits with eventbased

formalisms Examples of eventbased formalisms are Trace Theory DI Alge

bra Petri nets and Signal Transition Graphs

Design Techniques

This section intro duces the most p opular typ es of asynchronous circuits and briey describ es

some of their design techniques

Typ es of Asynchronous Circuits

There are sp ecial typ es of asynchronous circuits for which formal and informal sp ecications

have b een given Here are brief informal descriptions of some of them in a historical context

There are twotyp es of logic circuits combinational and sequential The output of a

combinational circuit dep ends only on the current inputs whereas the output of a sequential

circuit dep ends also on the previous sequences of the inputs With this denition of a

sequential circuit almost all asynchronous circuit styles fall into this category However

the term asynchronous sequential circuits or machines generally refers to those asynchronous

circuits based on nite state machines similar to those in synchronous sequential circuits

Muller was the rst to give a rigorous formalization of a sp ecial typ e of circuits for which

he coined the name speedindependent circuits An account of this formalization is given

in Informally a sp eedindep endent circuit is a network of gates that satises its

sp ecication irresp ectiveofany gate delays

From a design discipline that was develop ed as part of the Macromo dules pro ject at

Washington University in St Louis the concept of another typeofasynchronous circuits

evolved whichwas given the name delayinsensitive circuit that is a network of mo dules

that satises its sp ecication irresp ectiveofany element and wire delays It was realized

that prop er formalization of this concept was needed to sp ecify and design such circuits in

en by Udding awelldened manner Such a formalization was giv

Another name frequently used in designing asynchronous circuits is selftimed systems

This name was intro duced by Seitz A selftimed system is describ ed recursively as

either a selftimed element or a legal connection of selftimed systems The idea is that

selftimed elements can b e implemented with their own timing discipline and some may

even have synchronous implementations In comp osing selftimed systems from selftimed

elements however no reference to the timing of events is made only the sequence of events

is relevant In other words the elements keep time to themselves

Some have found the unb ounded gateandwire delay assumption on which the concept

of a delayinsensitive circuit is based to b e to o restrictive in practice For example the

unb ounded gateandwire delay assumption implies that a signal senttomultiple recipients

by a fork can incur a dierentunb ounded delay for each of the recipients They prop osed

to relax this delay assumption slightly byusing isochronic forks An iso chronic fork is a

fork whose dierence in the delays of its branches is negligible compared with the delays in

the element to which it is connected A delayinsensitive circuit that uses iso chronic forks

is called a quasidelayinsensitive circuit Although the use of iso chronic forks gives

more design freedom in exchange for less delay insensitivity care has to b e taken with its

implementation

Asynchronous Sequential Machines

The design of asynchronous sequential nite state machines was initiated with the pioneer

ing work of Human He prop osed a structure similar to that of synchronous sequential

circuits consisting of a combinational logic circuit inputs outputs and state variables

Human circuitshowever store the state variables in feedback lo ops containing delay ele

ments instead of in latches or ipops as synchronous sequential circuits do The design

pro cedure b egins with creating a ow table and reducing it through some state minimiza

tion technique After a state assignment the pro cedure obtains the Bo olean expressions

and implements them in combinational logic with the aid of a logic minimization program

To guarantee a hazardfree op eration Human circuits adopt the restrictive singleinput

change fundamental mo de that is the environmentchanges only one input and waits until

the circuit b ecomes stable b efore changing another input This requirement can substantially

degrade the circuit p erformance Hollaar realized this fact and intro duced a new structure

in which the fundamental mo de assumption is relaxed In his implementation the state

variables are stored in nand latches so that inputs are allowed to change earlier than the

fundamentalmodewould allow Although Hollaars metho d improves the p erformance it

suers from the danger of pro ducing hazards Besides neither technique seem to b e adequate

for designing concurrent systems Mo dels and algorithms for the analysis of asynchronous

sequential circuits have b een develop ed byBrzozowski and Seger

The quest for more concurrency higher p erformance and hazardfree op eration resulted

in the formulation of a new generation of asynchronous sequen tial circuits known as burst

mode machines A burstmo de circuit do es not react until the environment p erforms

anumb er of input changes called an input burstTheenvironment in turn is not allowed

to intro duce the next input burst until the circuit pro duces a numb er of outputs called an

output burst A state graph is used to sp ecify the transitions caused by the input and output

bursts Twosynthesis metho ds have b een prop osed and automated for implementing burst

mo de circuits The rst metho d employs a lo cally generated clo cktoavoid some hazards

The second metho d uses threedimensional ow tables and is based on Human circuits

One limitation of burst mo de circuits is that they restrict concurrency within a burst

Sp eedIndep endent Circuits and STG synthesis

Sp eedindep endent circuits are usually designed by a form of Petri nets A p opular ver

sion of Petri nets signal transition graphs STG was intro duced byChu He also develop ed

asynthesis technique for transforming STGs into sp eedindep endent circuits Chus work

was extended by Meng who pro duced an STGbased to ol for synthesizing sp eedindep endent

circuits from highlevel sp ecications In this technique a circuit is comp osed of com

putational blo cks and interconnection blo cks Computational blo cks range from a simple

shifter mo dule to more complicated ones such as ALUs RAMs and ROMs Interconnec

tion blo cks synchronize the op eration of computational blo cks by pro ducing appropriate

control signals Computational blo cks generate completion signals after their output data

b ecome valid The interconnection blo cks use the completion signals to generate fourphase

handshake proto cols

DelayInsensitive Circuits and Compilation

Several researchers have prop osed techniques for designing delayinsensitive circuits Eb er

gen has develop ed a synthesis metho d based on the formalism of TraceTheory The

metho d consists of sp ecifying a comp onentby a program and then transforming this program

into a delayinsensitivenetwork of basic elements The program notation allows sp ecifying

parallel b ehavior Eb ergens metho d has b een applied to the design of small comp onents

like stacks various counters and arbiters

Martin prop oses a metho d that starts with a sp ecication of an asynchronous circuit

in a highlevel programming language similar to Hoares Communicating Sequential Processes

CSP An asynchronous circuit is sp ecied as a group of pro cesses communicating over

channels After various transformations the program is mapp ed into a network of gates

This metho d led to the design of an asynchronous micropro cessor in Martins

metho d yields quasidelayinsensiti ve circuits

Van Berkel has designed a compiler based on a highlevel language called TangramA

Tangram program also sp ecies a set of pro cesses communicating over channels A Tangram

program is rst translated into a handshake circuit Then these handshake circuits are

mapp ed into various target architectures dep ending on the dataenco ding techniques or

standardcell libraries used The translation is syntaxdirected which means that every

op eration o ccurring in a Tangram program corresp onds to a primitive in the translated

handshake circuit This prop erty is exploited byvarious to ols that quickly estimate the

area p erformance and energy dissipation of the nal design by analyzing the Tangram

program Van Berkels metho d also yields quasidelayinsensiti ve circuits

Other translation metho ds from a CSPlike language to a quasi delayinsensitive circuit

can b e found in

ATypical Asynchronous Design

In this section we presentatypical asynchronous design a micropipeline The circuit

uses singlerail enco ding with the twophase signaling proto col to communicate data b etween

stages of the pip eline The control circuit for the pip eline is a delayinsensitive circuit First

we present the primitives for the control circuit then we present the latches that store the

data and nally we present the complete design

The Control Primitives

Figure shows a few simple primitives used in eventbased design styles The schematic

symb ol for each primitive is depicted opp osite its name

WIRE ab

IWIRE ab

a JOIN c b

a MERGE M c b

b TOGGLE a

c

Figure Some delayinsensitive primitives

The simplest primitive is the wireatwoterminal element that pro duces an output

event on its output terminal b after every input event on its input terminal a Input and

output events in a wire must alternate An input event a must b e followed by an output

event b b efore another event a o ccurs A wire is physically realizable with a wire and events

are implemented byvoltage transitions An initialized wireor iwireisvery similar to a

wire except that it starts by pro ducing an output event b instead of accepting an input

event a after this its b ehavior exactly resemblesthatofawire

The primitive for synchronization is the join also called the rendezvous A join

has twoinputsa and b and one output cThejoin p erforms the and op eration of twoevents

a and b It pro duces an output event c only after b oth of its inputs a and b received an

event The inputs can change again after an output is pro duced A join can b e implemented

bya Mul ler Celement explained in the next section

The merge comp onent p erforms the or op eration of twoevents If a merge comp onent

receives an event on either of its inputs a or b it pro duces an output event c After an input

event there must b e an output event successive input events are not allowed A merge

can b e implemented byaxor gate

The toggle has a single input a and two outputs b and c After an event on input aan

event o ccurs on output b The next eventon a results in a transition on output c An input

eventmust b e followed byanoutputevent b efore another input event can o ccur Thus

output events alternate or toggle after each input event The dot in the toggle schematic

indicates the output which pro duces the rst event

The Muller CElement

The Muller Celement is named for its inventor D E Muller Traditionall y its logical

behavior is describ ed as follows If b oth inputs are then the output b ecomes

otherwise the output remains the same For the prop er op eration of the Celement it is also

assumed that once b oth inputs b ecome they will not change again until the output

changes A state diagram is given in Figure The b ehavior of the output c of the Celement

100 011 a b a b c 000 110 111 001

b a b a 010 101

c

Figure State diagram of the Celement

is expressed in terms of the inputs a and b and the previous state of the outputc by the

following Bo olean function

c c a b a b

The Celement can b e used for implementing the join whichhasaslightly more re

strictiveenvironment b ehavior in the sense that an input is not allowed to change twice in

succession A state graph for the join is pro duced by replacing the bidirectional arcs by

unidirectional arcs

There are many implementations of the Celement Wehave given two p opular CMOS

implementations in Figure Implementation a is a conventional pullup pulldown imple

mentation suggested by Sutherland Implementation b is suggested byVan Berkel

Each implementation has its own characteristics Implementation b is the b est choice for sp eed and energy eciency

VDD V DD

a P1 c P3 b a P3 P4 b a P1 P5 P6 b P2 P5 b p2 P4 a P6 c’ c c’ c

b N2 N5 b N2 N4 a N6 N6 N5 a N1 b N3 N4 a a N1 c N3 b

(a) (b)

Figure Two CMOS implementations of the Celement a conventional and b symmetric

Storage Primitives

Nowwe discuss twoeventcontrolled latches due to Sutherland as depicted in Figure

Their op eration is managed through two input control signals capture and pass lab eled c

and p resp ectively They also havetwo output control signals capture done cd and pass

done pd The input data is lab eled D and the output data is lab eled Q Implementation

a is comp osed of three socalled doublethrow switches Implementation b includes a

mergeatoggle and a levelcontrolled latch consisting of a doublethrow switch and an

inverter A doublethrowswitchisschematically represented byaninverter and a switching

tail The tail toggles b etween two p ositions based on the logic value of a controlling signal

A doublethrow switch in fact is a twoinput multiplexer that pro duces an inverted version

of its selected input A CMOS implementation of the doublethrow switch is shown in

Figure The p osition of the switch corresp onds to the state where c is low c p cp

M D D Q Q

cd pd cd pd

(a) (b)

Figure Twoeventdriven latch implementations

VDD

c’ c c x y x z z y x y

cc’

Figure A CMOS implementation of a doublethrow switch

An eventcontrolled latch can assume two states transparent and opaque In the trans

parent state no data is latched but the output replicates the input b ecause a path of two

inverting stages exists b etween the input and the output In the opaque state this path is

disconnected so that the input data can change without aecting the output the current

data at the output however is latched Implementations in Figures a and b are b oth

shown in their initial transparent states The capture and pass signals in an eventcontrolled

latch always alternate Up on a transition on cthelatch captures the current input data and

b ecomes opaque The following transition on cd is an acknowledgment to the data provider

that the current data has b een captured and that the input data can b e changed safelyA

subsequent transition on p returns the latchback to its transparent state to pass the next

data to its output The p signal is acknowledged by a transition on pd Notice that in

implementation a of Figure signals cd and pd are merely delayed and p ossibly amplied

versions of c and p resp ectively

A group of eventcontrolled latc hes similar to implementation a of Figure can b e

connected sharing a capture wire and a pass wire to form an eventcontrolled register of

arbitrary data width Implementation b of Figure can b e generalized similarly into a

register by inserting additional levelcontrolled latches b etween the merge and the toggle

A comparison of dierent micropip eline latches is rep orted in and later in

Pip elining

Pip elining is a p owerful technique for constructing highp erformance pro cessors Micropip elines

are elegant asynchronous circuits that have gained much attention in the asynchronous com

munity Many VLSI circuits based on micropip elines have b een successfully fabricated The

AMULET micropro cessor is one example

The simplest form of a micropip eline is a FIFO A fourstage FIFO is shown in Figure

It has a control circuit comp osed solely of interconnected joinsandadatapathofevent

controlled registers The control signals are indicated by dashed lines The thick arrows

show the direction of data ow Data is implemented with singlerail enco ding and the data

path is as wide as the registers can accommo date Adjacent stages of the FIFO communicate

through a twophase bundleddata signaling proto col This means that a request arriveat

the next stage only when the data for that stage b ecomes valid A bubble at the input of

a join is a shorthand for a join with an iwire on that input It implies that initially

an event has already o ccurred on the input with the bubble and the join can pro duce an

output event immediately up on receiving an event on the other input

Rin a1 r2 a3 Rout

c pd cd p c pd cd p

Din REG REG REG REG Dout

cd p c pd cd p c pd

Ain r1 a2 r3 Aout

Figure A fourstage micropip eline FIFO structure

Initially all control wires of the FIFO are at lowvoltage and the data in the registers are

not valid The FIFO is activated by a rising transition on R which indicates that input

in

data is valid Subsequently the rststage join pro duces a rising output transition This

signal is a request to the rststage register to capture the data and b ecome opaque After

capturing the data the register pro duces a rising transition on its cd output terminal This

causes a transition on A and a transition on r which is a request to the second stage of

in

the FIFO Meanwhile the data has pro ceeded to the secondstage register and has arrived

there b efore the transition on r o ccurs If the environment do es not send any new data

the rst stage remains idle and the data and the request signals propagate further to the

right Notice that each time the data is captured by a stage an acknowledgmentissent

back to the previous stage which causes its latch to b ecome transparent again When the

data has propagated to the last register it is stored and a request signal R is forwarded

out

to the consumer of the FIFO Atthispoint all control signals are at high voltage except

for A If the data is not removed out of the FIFO that is A remains low the next

out out

data coming from the pro ducer advance only up to the thirdstage register b ecause the

fourthstage join cannot pro duce an output Finally A also b ecomess high when the

out

consumer acknowledges receipt of the data Further data storage and removal follows the

same pattern The op eration of each join can b e interpreted as follows If the previous

stage has sent a request for data capture and the present stage is empty then send a signal

to capture the data in the present stage

The FIFO can b e mo died easily to include data pro cessing A fourstage micropip eline

in its general form is illustrated in Figure Now the data path consists of alternately p o

sitioned eventdriven registers and combinational logic circuits The eventdriven registers

store the input and output data of the combinational circuits and the combinational cir

cuits p erform the necessary data pro cessing To satisfy the data bundling constraint delay

elements may o ccasionally b e required to slowdown the propagation of the request signals

A delayelementmust at least match the delay through its corresp onding combinational

logic circuit either by some completion detection mechanism or through the insertion of a

worstcase delay

A micropip eline FIFO is exible in the numb er of data items it buers There is no

restriction on the rate at whichdataenters or exits the micropip eline except for the delays

imp osed by the circuit elements That is why this FIFO and micropip elines generally are

termed elasticIncontrast in an ordinary synchronous pip eline the rates at which data enter

and exit the pip eline are the same dictated by the external clo ck signal A micropip eline

is also exible in the amount of energy it dissipates which is prop ortional to the number

of data movements Aclocked pip eline however continuously dissipates energy as if all hin er go or more e bibliog erication ed pip eline v k t this feature e feature of a er that within ev w Acloc hronous circuits and y t areas of v b ecause it should nev e hop e ho hanism to implemen erhead selftimed ns b CMOS v ol pp No t mec v t to the area of async cuits an tly consumes energy hronous circuits W uts o when there is no activit k managemen witz A zeroo vided enough information for further readings F Din er constan epro Ain ev v Rin w eha Among the topics omitted are the imp ortan REG cd c hronous circuits please see or A comprehensiv pd 1r a3 r2 a1 p Figure A general fourstage micropip eline structure hanism ho DELAY hronous circuits can b e found in Uptodate information on researc IEEE Journal of SolidState Cir hed only on a few topics relev LOGIC y others r1 e touc v Concluding Remarks REG c cd hronous circuit design can b e found at divider y of async The authors wish to thank Bill Coates for his generous criticisms of a previous draft of pd a2 p eha T E Williams and M A Horo omitted man the scop e of these pages w testing and p erformance analysis of async information on async idle raph async W stages of the pip elinemicropip eline capture is and that pass it automatically data sh all the time Another attractiv on the other hand requires a sp ecialThis clo c sensing mec this article References DELAY

LOGIC REG cd c pd p DELAY

LOGIC r3 REG c cd pd p DELAY

LOGIC Aout Rout Dout

C E Molnar I W Jones B Coates and J Lexau A FIFO ring oscillator p erformance

exp eriment in Proc International Symposium on AdvancedResearch in Asynchronous

Circuits and Systems IEEE Computer So ciety Press Apr

T J Chaney and C E Molnar Anomalous b ehavior of synchronizer and arbiter

circuits IEEE Transactions on vol C pp Apr

L R Marino General theory of metastable op eration IEEE Transactions on Com

putersvol C pp Feb

R F Sproull and I E Sutherland Asynchronous Systems Palo Alto Sutherland

Sproull and Asso ciates Vol I Intro duction Vol I I Logical eort and asyn

chronous mo dules Vol I I I Case studies

W A Clark and C E Molnar Macromo dular computer systems in Computers in

Biomedical Research R W Stacy and B D Waxman eds vol IV ch pp

Academic Press

K v Berkel and M Rem VLSI programming of asynchronous circuits for lowpower

in Asynchronous Digital Circuit Design G Birtwistle and A Davis eds Workshops

in Computing pp SpringerVerlag

K v Berkel R Burgess J Kessels A Peeters M Roncken and F Schalij A fully

asynchronous lowp ower error corrector for the DCC player IEEE Journal of Solid

State Circuitsvol pp Dec

S Furb er Computing without clo cks Micropip elining the ARM pro cessor in Asyn

chronous Digital Circuit Design G Birtwistle and A Davis eds Workshops in Com

puting pp SpringerVerlag

A P Chandrakasan S Sheng and R W Bro dersen Lowp ower CMOS digital de

sign IEEE Journal of SolidState Circuitsvol pp Apr

D W Dobb erpuhl and et al A mhz b dualissue cmos micropro cessor IEEE

Journal of SolidState Circuitsvol pp Nov

J A Brzozowski and CJ H Seger Asynchronous Circuits SpringerVerlag

E J McCluskeyFundamental mo de and pulse mo de sequential circuits in Proc of

IFIP Congress pp NorthHolland

S H Unger Asynchronous Sequential Switching Circuits New York Wiley

Interscience John Wiley Sons Inc

D L Dill Trace Theory for Automatic Hierarchical Verication of SpeedIndependent

CircuitsACM Distinguished Dissertations MIT Press

J Eb ergen and S Gingras A verier for network decomp ositions of commandbased

sp ecications in Proc Hawaii International Conf System Sciencesvol I IEEE Com

puter So ciety Press Jan

K v Berkel Handshake Circuits an Asynchronous Architecture for VLSI Program

mingvol of International Series on Paral lel Computation Cambridge University

Press

J C Eb ergen J Segers and I Benko Parallel program and asynchronous circuit

design in Asynchronous Digital Circuit Design G Birtwistle and A Davis eds

Workshops in Computing pp SpringerVerlag

T Verho e A Theory of DelayInsensitive Systems PhD thesis Dept of Math and

CS Eindhoven Univ of TechnologyMay

erview of DI algebra in Proc Hawaii Inter M B Josephs and J T Udding An ov

national Conf System Sciencesvol I IEEE Computer So ciety Press Jan

TA Chu Synthesis of SelfTimedVLSICircuits from GraphTheoretic Specications

PhD thesis MIT Lab oratory for Computer Science June

T HY Meng Asynchronous Design for Processing Architectures PhD

thesis UC Berkely

D A Human The synthesis of sequential switching circuits IRE Transactions on

Electronic Computersvol no

D E Muller and W S Bartky A theory of asynchronous circuits in Proceedings of an

International Symposium on the Theory of Switching pp Harvard University

Press Apr

R E Miller Sequential Circuits and Machinesvol of Switching Theory John Wiley

Sons

J T Udding Classication and Composition of DelayInsensitive Circuits PhD thesis

Dept of Math and CS Eindhoven Univ of Technology

C L Seitz System timing in Introduction to VLSI Systems C A Mead and L A

Conway eds ch AddisonWesley

A J Martin Programming in VLSI From communicating pro cesses to delay

insensitive circuits in Developments in Concurrency and Communication C A R

Hoare ed UT Year of Programming Series pp AddisonWesley

K v Berkel Beware the iso chronic fork Integration the VLSI journalvol

pp June

L A Hollaar Direct implementation of asynchronous control units IEEE Transac

tions on Computersvol C pp Dec

B Coates A Davis and K Stevens The Post Oce exp erience Designing a large

asynchronous chip Integration the VLSI journalvol pp Oct

A Davis Synthesizing asynchronous circuits Practice and exp erience in Asyn

chronous Digital Circuit Design G Birtwistle and A Davis eds Workshops in Com

puting pp SpringerVerlag

S M Nowick and D L Dill Automatic synthesis of lo callyclo cked asynchronous state

machines in Proc International Conf ComputerAided Design ICCAD pp

IEEE Computer So ciety Press Nov

K Y Yun and D L Dill Automatic synthesis of D asynchronous state machines in

Proc International Conf ComputerA ided Design ICCAD pp IEEE Com

puter So ciety Press Nov

J L Peterson Petri nets Computing Surveysvol pp Sept

T HY Meng R W Bro dersen and D G Messerschmitt Automatic synthesis of

asynchronous circuits from highlevel sp ecications IEEE Transactions on Computer

Aided Designvol pp Nov

J C Eb ergen Translating Programs into DelayInsensitive CircuitsvolofCWI

TractCentre for Mathematics and Computer Science

C A R Hoare Communicating Sequential Processes PrenticeHall

A J Martin S M Burns T K Lee D Borkovic and P J Hazewindus The design

of an asynchronous micropro cessor in AdvancedResearch in VLSI Proceedings of the

Decennial Caltech Conference on VLSI C L Seitz ed pp MIT Press

E Brunvand and R F Sproull Translating concurrent programs into delayinsensitive

circuits in Proc International Conf ComputerAided Design ICCAD pp

IEEE Computer So ciety Press Nov

S Web er B Blo om and G Brown Compiling Joy to silicon in Proceedings of

BrownMIT ConferenceonAdvancedResearch in VLSI and Paral lel Systems T Knight

and J Savage eds pp MIT Press Mar

I E Sutherland Micropip elines Communications of the ACMvol pp

June

M Shams J Eb ergen and M Elmasry A comparison of CMOS implementations of

an asynchronous circuits primitive the Celement in International Symposium on Low

Power Electronics and Design pp Aug

PDay and J V Wo o ds Investigation into micropip eline latch design styles IEEE

Transactions on VLSI Systemsvol pp June

K Y Yun P A Beerel and J Arceo Highp erformance asynchronous pip eline cir

cuits in Proc International Symposium on AdvancedResearch in Asynchronous Cir

cuits and Systems IEEE Computer So ciety Press Mar

wick Asynchronous circuit design Motivation background and A Davis and S M No

metho ds in Asynchronous Digital Circuit Design G Birtwistle and A Davis eds

Workshops in Computing pp SpringerVerlag

S Hauck Asynchronous design metho dologies An overview Proceedings of the IEEE

vol Jan

A Peeters The Asynchronous BibliographyBibT X database le asyncbib

E

ftpftpwintuenlpubtexasyncbibZ Corresp onding email address

asyncbibwintuenl

J Garside The Asynchronous Logic Homepage

httpwwwcsmanacukamuletasync