ELEC3221 Digital IC & Sytems Design

Iain McNally Design Digital IC & Sytems Design Content • – Introduction – Overview of Technologies Iain McNally – Layout – CMOS Processing 15 lectures ≈ – Design Rules and Abstraction – Cell Design and Euler Paths Koushik Maharatna – System Design using Standard Cells – Wider View 15 lectures ≈ Notes & Resources • https://secure.ecs.soton.ac.uk/notes/bim/notes/icd/

1001 1003

Digital IC & Sytems Design History

Assessment 1947 First Transistor • John Bardeen, Walter Brattain, and William Shockley (Bell Labs) 10% Coursework L-Edit Gate Design (BIM) 90% Examination 1952 Integrated Circuits Proposed Geoffrey Dummer (Royal Establishment) - prototype failed... Books • 1958 First Integrated Circuit Integrated Circuit Design (Texas Instruments) - Co-inventor a.k.a. Principles of CMOS VLSI Design - A Circuits and Systems Perspective 1959 First Planar Integrated Circuit Neil Weste & David Harris (Fairchild) - Co-inventor Pearson, 2011 1961 First Commercial ICs Digital System Design with SystemVerilog Simple logic functions from TI and Fairchild Mark Zwolinski 1965 Moore’s Law Pearson Prentice-Hall, 2010 Gordon Moore (Fairchild) observes the trends in integration.

1002 1004 History History

Moore’s Law Moore’s Law; a Self-fulfilling Prophesy Predicts exponential growth in the number of components per chip. The whole industry uses the Moore’s Law curve to plan new fabrication facilities.

1965 - 1975 Doubling Every Year Slower - wasted investment In 1965 Gordon Moore observed that the number of components per chip had Must keep up with the Joneses2. doubled every year since 1959 and predicted that the trend would continue through to 1975. Faster - too costly Moore describes his initial growth predictions as ”ridiculously precise”. Cost of capital equipment to build ICs doubles approximately ev- ery 4 years. 1975 - 201? Doubling Every Two Years In 1975 Moore revised growth predictions to doubling every two years. Moore’s law is not dead (at least not quite), although there are worries that below 20nm, Growth would now depend only on process improvements rather than on clever processing required for smaller transistors means that cost per transistor is going more efficient packing of components. up rather than down. In 2000 he predicted that the growth would continue at the same rate for an- How will you future engineers increase the number of transistors? other 10-15 years before slowing due to physical limits. 2or the Intels 1005 1007

History

Moore’s Law at Intel1

18−core Xeon E5 v3 10,000,000,000 72−core Dual Core Itantium 2 10−core XeonXeon Phi 1,000,000,000 Itantium 2 Westfield−Ex Itantium 100,000,000 Pentium 4 Pentium III 10,000,000 Pentium II Pentium 1,000,000 486 DX 286 386 100,000 8086

10,000 4004 8080 8008 1,000 1970 1980 1990 2000 2010

1Intel was founded by Gordon Moore and Robert Noyce from Fairchild 1006 1947 Point Contact transistor1961 Fairchild Bipolar RTL RS Flip−Flop (4 Transistors)

Source: Bell Labs Source: Fairchild Moore’s Law (1965) Number of transistor has doubled every year and will continue to do so until 1975

Moore’s Law (1975) Number of transistors will double every two years

18−core Xeon E5 v3 10,000,000,000 72−core Dual Core Itantium 2 Xeon Phi 1,000,000,000 10−core Xeon Itantium 2 Westfield−Ex Itantium Self−fulfilling 100,000,000 Prophecy Pentium 4 Pentium III 10,000,000 Pentium II Pentium 1,000,000 486 DX 286 386 100,000 8086

10,000 4004 8080 8008 1,000 1970 1980 1990 2000 2010

1000 Overview of Technologies Overview of Technologies

Components for Logic All functions can be realized using only the NOR gates1 available in the RTL logic 2 Diode family.

PN

Bipolar Transistors

N P P N N P

MOS Transistors

Enhancement Depletion Silicon Oxide Insulator

G G N−Channel

SD SDNNP SUB SUB P−Channel 1Note that an inverter is a special case of a NOR gate with only one input. Field induced N−channel 2NAND gates could be used instead for logic families which support only NAND gates. 2001 2003

Overview of Technologies Overview of Technologies

Other Bipolar Technologies TTL NAND Gate ECL OR/NOR Gate A A Z Z B B Z

RTL Inverter and NOR gate VCC VCC

Z

VCC A VCC A A Z Z Z Z B B AB

Z Z

A AB VEE

TTL gives faster switching than RTL at the expense of greater complexity3. The • characteristic multi-emitter transistor reduces the overall component count. ECL is a very high speed, high power, non-saturating technology. • 3Most TTL families are more complex than the basic version shown here 2002 2004 Overview of Technologies Overview of Technologies

NMOS - a VLSI technology. CMOS logic V V DD DD CMOS - state of the art VLSI.

A A Z Z VDD B B

V Z Z DD

A

AZ A Z AB B

Circuit function determined by series/parallel combination of devices. • Depletion transistor acts as non-linear load resistor. • An active PMOS device complements the NMOS device giving: Resistance increases as the enhancement device turns on, thus reducing power • consumption. – rail to rail output swing. The low output voltage is determined by the size ratio of the devices. – negligible static power consumption. •

2005 2007

Overview of Technologies Digital CMOS Circuits

Alternative transistors representations for NMOS circuits Alternative transistor representations for CMOS circuits

VDD VDD VDD VDD VDD

V N−Channel DD P−Channel Depletion Enhancement

Z Z Z Z Z

A A A A A N−Channel Enhancement N−Channel Enhancement B B B B B Digital CMOS circuits4 tend to use simplified symbols like their NMOS counter- parts. Various shorthands are used for simplifying NMOS circuit diagrams. In general substrate connections are not drawn where they connect to Vdd Substrate connections need not be drawn since all must connect to Gnd. • • (PMOS) and Gnd (NMOS). Since source and drain are indistinguishable in the layout, there is no need to • All CMOS devices are enhancement mode. show the source on the circuit diagram. • Transistors act as simple digitally controlled switches. Note that schematic tools not designed for IC design will usually include inappropriate • 3-terminal symbols. 4in analog CMOS circuits we may have wells not connected to Vdd/GND 2006 2008 Digital CMOS Circuits Digital CMOS Circuits

Compound Gate Example Static CMOS complementary gates

A A A Z B Z B Z VDD B C C Pull Up Network

,,, Symbol

A B Z C D Pull Down Network

,,,

For any set of inputs there will exist either a path to Vdd or a path to Gnd. GND •

2009 2011

Digital CMOS Circuits

Compound Gates

A A B B Z C D Z C E F

All compound gates are inverting. • Realisable functions are arbitrary AND/OR expressions with inverted output. •

2010 RTL NOR Gate TTL NAND Gate ECL OR/NOR Gate A Z Z A A B Z B B Z

VCC VCC VCC

Z Z

AB A Z Z B AB

VEE

NMOS Compund Gate CMOS Compund Gate VDD VDD A A B B Z Z C C

Bipolar Transitors with Resistors - MSI/LSI • RTL - NOR TTL - NAND ECL - OR/NOR MOS Transistors (no resistors) - VLSI • NMOS CMOS - No static power! Both allow construction of NOR, NAND & Compound gate (always inverting) 2000 Components for IC Design Components for IC Design

Diodes and Bipolar Transistors MOS Transistors

Diode Simple NMOS Transistor Thin Oxide Polysilicon G Thick Oxide PN G G S D P N

NN SD S NNP D P−substrate SUB SUB SUB P N

G Ideal structure - 1D Polysilicon Mask • SD Real structure - 3D • Active Area Mask Depth controlled implants. •

3001 3003

Components for IC Design Components for IC Design

Diodes and Bipolar Transistors Simple NMOS Transistor

NPN Transistor Active Area mask defines extent of Thick Oxide. • EBC Polysilicon mask also controls extent of Thin Oxide (alias Gate Oxide). • N PN N N-type implant has no extra mask. P N • – It is blocked by thick oxide and by polysilicon. – The implant is Self Aligned. Substrate connection is to bottom of wafer. • EBC – All substrates to ground. NPN Gate connection not above transistor area. • – Design Rule.

Two n-type implants. •

3002 3004 Interconnect Components for IC Design

Contact Via Metal 2 Layer Resistors Metal 1 Layer Examples for Metal Example for Polysilicon assuming Rs = 0.1 ohms per square assuming Rs = 200 ohms per square N 1 1 2 3 45 6 7 8 9 10 R = 1 1 2 3 4 5 6 7 8 9 10 3 18 17 16 15 14 13 12 11 2

  19     420 21 22 23 24 25 26 27 5   R = R = 28 7 36 35 34 33 32 31 30 29 6 37 8 38 39 40 41 42 43 44 45 46 Contact Window Mask Via 1 Mask R = Polycrystaline Silicon (Polysilicon/Poly) Conductors: Diffusion (controlled by Active Area Mask) Metal 1 Metal 2 for larger resistances we need minimum width poly (often combined with a • Crossing conductors on different masks do not interact1. serpantine shape) to save on area • - Explicit contact/via is required for connection. corner squares count as half2 squares • Crossing conductors on the same mask are always connected. for predicatability and matching we may need wider tracks without corners • • 1the exception to this rule is that polysilicon crossing diffusion gives us a transistor 2effective resistance 0.56R ≈ s 3005 3007

Interconnect Components for IC Design

Resistance Capacitors

ρ l R = B w t t w     w s where ρ is the resistivity constant 8 l 3.2 10− Ωm for aluminium l A × 8 1.7 10− Ωm for copper B fringe capacitor × B Since t and ρ are fixed for a paricular mask layer, the value that is normally used is A A ρ the sheet resistance: Rs = t .  l w R = Rs parallel tracks over underlying conductor metal−insulator−metal capacitor poly−insulator−poly capacitor w (e.g. bulk silicon) (specialist analog process with 2 poly layers)   where Rs is sheet resistance Capacitance to underlying conductor C = C w l + 2 C l l a f 0.1Ω/ for 170nm thick copper •  Coupling capacitance to adjacent track C = C l/s • c where Ca, Cf , Cc are constants for a given layer and process Rs = resistance of a square (i.e. w = l) so the units for Rs are Ω/ (ohms per square). in digital designs our only aim is to minimise parasitic capacitance 3006 3008 Diode NPN Transistor NMOS Enhancement transistor NMOS Process

P N G EBC

SD

P EBC N SD N P N NN P

SUB

Resistor Capacitors

1 2 3 4 5 6 7 8 9 1 B B 10 3 18 17 16 15 14 13 12 11 2 19 420 21 22 23 24 25 26 27 5 28 7 36 35 34 33 32 31 30 29 6 37 8 38 39 40 41 42 43 44 45 46 A A

3000 CMOS CMOS

NMOS Transistor – with top substrate connection CMOS Inverter G G SUB S D

P +NN + + SD N P well SUB P G

P

P well mask N P P+ implant mask SUB SD N N−well Active Area N implant N P implant P N+ implant mask Polysilicon Contact Window Metal

4001 4003

CMOS CMOS

NMOS Transistor – with top substrate connection CMOS Inverter Where it is not suitable for substrate connections to be shared, a more complex process is used.

Five masks must be used to define the transistor: The process described here is an N Well process since it has only an N Well. • • – P Well P Well and Twin Tub processes also exist. – Active Area Note that the P-N junction between chip substrate and N Well will remain re- • – Polysilicon verse biased. – N+ implant Thus the transistors remain isolated. – P+ implant N implant defines NMOS source/drain and PMOS substrate contact. • P Well, for isolation. P implant defines PMOS source/drain and NMOS substrate contact. • • Top substrate connection. • P+/N+ implants produce good ohmic contacts. •

4002 4004 Processing – Photolithography

Photoresist Exposure UV Light Contact/Proximity Mask Glass Mask N−well Mask Pattern P N (chrome) Active Area defines Thick Oxide P N Photoresist Development Silicon Wafer Polysilicon defines Thin Oxide P N P implant aligned to AA and Poly P P P Oxide Growth Oxide Etch PN SiO 2 N implant aligned to AA and Poly PNNPPN PN

Photoresist Deposition Photoresist Photoresist Strip Contact Window (positive) PNNPPN PN

Metal PNNPPN PN

4005 4007

Processing – Mask Making CMOS - Short Gate Techiques

Reticle written by scanning Pattern reproduced on wafer (or contact/proximity mask) Shallow Trench Isolation electron beam by step and repeat with optical reduction

PNN+ + +NN + + UV Light P well Reticle Mask

Lens

Rather than grow the thick oxide, dig1 a trench and deposit silicon oxide in the • space. Deeper oxide, sharper edges, better isolation. Optical reduction allows narrower line widths. • • 1etch away 4006 4008 CMOS - Short Gate Techiques

Fin FET

With the aid of trenches we raise the active area above the bulk silicon. • We can then wrap the gate around the channel. • Avoids an effect where a channel is created in a region which is closer to the • drain than the gate.

4009 PMOS Enhancement transistor CMOS Inverter Twin Tub CMOS Process CMOS Process

N Well UV Light

PNNPPN PNI P type silicon wafer CMOS Inverter N−Well CMOS Processs (with explicit N+ implant mask) P

P

P

N well P

N well P PNNPPN P N many steps for a single mask Active Area CMOS Inverter N−Well CMOS Processs (without explicit N+ implant mask)

N well P Polysilicon

N well P P+ Implant

PP+ + N well P N+ Implant PNNPPN PP+ +N + P N N well P Features may be determined by a number of masks e.g. NMOS source drain: ActiveArea AND NOT(NWell OR Poly OR PImplant)

4000 Design Rules Design Rules

To prevent chip failure, designs must conform to design rules: 0.5 µm CMOS inverter Single layer rules •

N−well Active Area Multi-layer rules • P implant = NOT{ N implant } Polysilicon Contact Window Metal

5001 5003

Derivation of Design Rules Abstraction

Levels of Abstraction

Isotropic Etching Mask Misalignment Optical Focus over 3D terrain

Reticle Mask

Lateral Diffusion

Misalignment can be Cumulative Mask Level Design N • – Laborious Technology/Process dependent. – Design rules may change during a design! NN Transistor Level Design • is aligned to NN – Process independent, Technology dependent. is aligned to Gate Level Design • – Process/Technology independent.

5002 5004 Abstraction - Stick Diagrams Digital CMOS Design

Stick Diagrams

Stick diagrams give us many of the benefits of abstraction: Much easier/faster than full mask specification. • Process independent (valid for any CMOS process). • Easy to change. • while avoiding some of the problems: Optimized layout may be generated much more easily from a stick diagram • than from transistor or gate level designs.1

1note that all IC designs must end at the mask level. 5005 5007

Digital CMOS Design Digital CMOS Design

Stick Diagrams Stick Diagrams

Explore your Design Space. •

– Implications of crossovers.

– Number of contacts.

Metal – Arrangement of devices and connections. Polysilicon N+ P+ Process independent layout. Contact • Tap Combined contact & tap Easy to expand to a full layout for a particular process. •

5006 5008 Sticks and CAD - Symbolic Capture Sticks and CAD - Magic

L=0.5 W=2.2

L=0.5 W=1.1

Transistors are placed and explicitly sized. Log style design (sticks with width) - DRC errors are flagged immediately. • • - components are joined with zero width wires. - again contacts are automatically selected as required. - contacts are automatically selected as required. On-line DRC leads to rapid generation of correct designs. • A semi-automatic compaction process will create DRC correct layout. - symbolic capture style compaction is available if desired. • 5009 5010 Design Rules − width, separation, overlap Optimised Mask Layout Equivalent Stick Diagram

Metal Polysilicon N+ P+ Contact Tap Combined contact & tap

5000 Digital CMOS Design Digital CMOS Design

A logical approach to gate layout. Finding an Euler Path Algorithms

All complementary gates may be designed using a single row of n-transistors above or • It is relatively easy for a computer to consider all possible arrangements of below a single row of p-transistors, aligned at common gate connections. • transistors in search of a suitable Euler path. This is not so easy for the human designer.

One Human Algorithm

Find a path which passes through all n-transistors exactly once. • Express the path in terms of the gate connections. • Is it possible to follow a similarly labelled path through the p-transistors? • – Yes – you’ve succeeded. – No – try again (you may like to try a p path first this time).

6001 6003

Digital CMOS Design Digital CMOS Design

Euler Path Finding an Euler Path

For the majority of these gates we can find an arrangement of transistors such • that we can butt adjoining transistors. A C – Careful selection of transistor ordering. Z B – Careful orientation of transistor source and drain. Referred to as line of diffusion. •

Here there are four possible Euler paths.

6002 6004 Digital CMOS Design Digital CMOS Design

Finding an Euler Path Finding an Euler Path

A B C D Z E F

ODD PDN Topolgy: ODD

ODD

ODD

No possible path through n-transistors!

6005 6007

Digital CMOS Design Digital CMOS Design

Euler Path Example Finding an Euler Path

1 2 3 4 5

1 2 3 4 5

1. Find Euler path 3. Route power nodes 5. Route remaining nodes 2. Label poly columns 4. Route output node 6. Add taps1 for PMOS and NMOS A combined contact and tap, , may be used only where a power contact exists at the end of a line of diffusion. Where this is not the case a simple tap, , should be used. 11 tap is good for about 6 transistors – insufficient taps may leave a chip vulnerable to latch-up 6006 6008 Digital CMOS Design Digital CMOS Design

Finding an Euler Path Philosophers vs. Engineers

A A D D E E B B F Z F Z G G C C H PUN Topolgy: H I EVEN I

ODD ODD ODD

EVEN

No possible path through p-transistors. The philosopher is happy to prove that there is no Euler path to be found. • • No re-arrangement will create a solution! The engineer will use partial Euler paths to reach the best solution. • •

6009 6010 Investigation of Euler paths leads to more efficient layout*

ODD 1 5 EVEN 2 EVEN 6 EVEN 3 7 ODD 4 EVEN ODD 6

ODD ODD 7 5 EVEN 4 2 ODD

1 ODD ODD 3

1 23 4 5 6 7 1 23 4 5 6 7 1 23 4 5 6 7

1 23 4 5 6 7 1 23 4 5 6 7 1 23 4 5 6 7

*not all gates will support a common Euler path for both PMOS and NMOS

6000 Digital CMOS Design Digital CMOS Design

Multiple gates Two-layer Metal

SABY

A 1 Y B 0

S 1 METAL 2 METAL LAYER LAYERS

A Y B

S

Most modern VLSI processes support two or more metal layers. The norm is to use only metal for inter-cell routing. SABY Metal1 horizontal (and for power rails) usually for inter-cell routing Metal2 vertical (and for cell inputs and outputs).

7001 7003

Digital CMOS Design Standard Cell Design

Multiple gates Many ICs are designed using the standard cell method. Cell Library Creation • 1 Gates should all be of same height. A cell library, containing commonly used logic gates is created for a process. • This is often carried out by or on behalf of the foundry. – Power and ground rails will then line up when butted. ASIC2 Design • The ASIC designer must design a circuit using the logic gates available in the All gate inputs and outputs are available at top and bottom. library. • – All routing is external to cells. The ASIC designer usually has no access to the full layout of the standard cells and doesn’t create any new cells for the library. – Preserves the benefits of hierarchy. Layout work performed by the ASIC designer is divided into two stages: – Placement Interconnect is via two conductor routing. • – Routing – In this case Polysilicon vertically and Metal horizontally. 1note that a standard cell may include transistors from more than one basic function (e.g. NAND + inverter to give AND) but will normally be designed flat i.e. without layout hierarchy. 2Application Specific Integrated Circuit 7002 7004 Standard Cell Design Placement & Routing

Choosing a set of cells for a cell library Placement There is no set size for a cell library. • Theoretically just one cell ( ) or one type of cell ( , , , ...) is • sufficient for a cell library. The use of more complex cells allows for designs optimised for area and/or • performance: Multiplexer standard cell (single cell − no hierarchy)

A 1 A Y Y B 0 B

S S SABY

Which basic gates; which compound gates; which sequential gates? • Do we provide different versions of the same gate (e.g. small area version, high • Cells are placed in one or several equal length lines with inter-digitated power and drive version)? If so, how many different versions? ground rails.

7005 7007

Standard Cell Design Placement & Routing

Layout and Abstract Views of a Standard Cell Routing

ABY ABY

Vdd! Vdd! Vdd! Vdd!

LAYOUT ABSTRACT

GND! GND! GND! GND!

A B Y A B Y The partial cell layout usually given to the ASIC designer is known as a black box or abstract view. The abstract:

must include cell ports and a cell boundary • may include some or all of the metal mask information In the routing channels between the cells we route metal1 horizontally and metal2 • vertically.

7006 7008 Placement & Routing Standard Cell Design

Two conductor routing More Metal Layers

Conductor A horizontal for inter-cell routing 3 • Conductor B vertical This logical approach means that we should never have to worry about signals • crossing. This makes life considerably easier for a computer (or even a human) to com- plete the routing. We must only ensure that two signals will not meet in the same horizontal or • vertical channel. Computer algorithms can be used to ensure placement of cells such that wires • are short.4 Further computer algorithms can be used to optimize the routing itself. • With this approach we can route safely over the cell to the specified pins leading 3In the two-metal example Conductor A is Metal1 and Conductor B is Metal2 4In VLSI circuits we often find that inter-cell wiring occupies more area than the cells themselves. to much smaller gaps between cell rows. 7009 7011

Standard Cell Design Standard Cell Design

More Metal Layers With three or more metal layers it is possible to take a different approach. Alternative Placement Style The simplest example uses three metal layers. Vdd! Vdd! Vdd! Vdd! Standard Cells • GND! GND! GND! GND! Use only metal1 except for I/O which is in metal2 GND! GND! GND! GND! GND! Two Conductor Routing • Vdd! Vdd! Vdd! Vdd! Vdd! Uses metal2 and metal3 Vdd! Vdd! Vdd! Vdd! Vdd!

GND! GND! GND! GND! GND! GND! GND! GND! GND! Vdd! Vdd! Vdd! Vdd! Vdd! Vdd!

ABY ABY ABY By flipping every second row it may be possible to eliminate gaps between rows. N-wells are merged and power or ground rails are shared. GND! GND! This approach is normally associated with sparse rows and non channel based LAYOUT ABSTRACT ROUTING routing algorithms.

7010 7012 LAYOUT ABSTRACT ROUTING

A B Y

Vdd! Vdd! Vdd! Vdd! A B Y A B Y 1 METAL NAND2 NAND2 NAND2 LAYER

GND! A B Y GND! A B Y GND! GND!

A B Y

ABY

Vdd! Vdd! Vdd! Vdd! ABY ABY 2 METAL NAND2 NAND2 NAND2 LAYERS

ABY ABY GND! GND! GND! GND!

A B Y Vdd! Vdd! Vdd! Vdd! 3 METAL NAND2 NAND2 NAND2 LAYERS ABY ABY ABY GND! GND! GND! GND!

7000 Static CMOS Complementary Gates Pass Transistor Circuits

Vdd Transmission Gate PUN • – For static circuits we would normally use a CMOS transmission gates: OUT

PDN EN

GND IN OUT Static • After the appropriate propagation delay the ouput becomes valid and remains valid.1 EN Complementary • For any set of inputs there will exist either a path to Vdd or a path to GND. -- balanced n and p pass transistors Where this condition is not met we have either a high impedence output or a -- faster pull-up conflict in which the strongest path succeeds. Static CMOS Non-complementary -- slower pull-down gates make use of these possibilities.

1c.f. Dynamic logic which uses circuit capicitance to store state for a short time. 8001 8003

Pass Transistor Circuits Pass Transistor Circuits

Pass Transistor Transmission Gate Layout • • IN OUT EN

ENABLE EN

IN OUT IN OUT IN OUT – Provides very compact circuits. EN – Good transmission of logic ‘0’.

– Poor transmission of logic ‘1’. EN EN EN -- slow rise time -- degradation of logic value – note that these circuits are not fully complementary3 hence they do not im- mediately lend themselves to a line of diffusion implementation. The pass transistor is used in many dynamic CMOS circuits2.

2where pull-up is performed by an alternative method 3since there are sets of inputs for which the output is neither pulled low nor high 8002 8004 Pass Transistor Circuits Bus Distributed Multiplexing

Transmission Gate Multiplexor Decoder • & 0 1 Sequencer MUX S fn ALU IN1 Operation Code IN1 IN1 PC ACC OUT S OUT OUT IR IN0 IN0 Operand IN0 S S MAR MDR S S

– very few transistors 4 (+2 for inverter) ME Address Data WR – difficult layout may offset this advantage OE RAM -- prime candidate for 2 level metal Ideal for signals with many drivers from different modules.

8005 8007

Pass Transistor Circuits Bus Distributed Multiplexing

Bus Wiring Implementation of bitslice ALU:5 • A[n] M[n] A[n+1] A[n−1]

A0 B0 C0 Subtract

A1 B1 C1 0

C[n−1] C[n] A2 B2 C2 FA

A3 B3 C3

AS AS BSBS CS CS A[n]’ A M 0 A+M A&M A|M A^M A>>1 A<<1 A−M – distributed multiplexing4 Separate circuit for each function – only one inverter required per bank of transmission gates • Connected via distributed multiplexor – greatly simplifies global wiring • 5Note that transmission gates have no drive capability in themselves. Here a good drive is 4internal chip bus should never be allowed to float high impedance ensured by providing buffers. 8006 8008 Bus Distributed Multiplexing Pass Transistor Circuits

A[n] M[n] Functions Supported: Tristate Inverter A A+M A>>1 ~A • F5 0 1 M A−M A<<1 ~M F4 0 A&M F3 A|M A^M ~(A^M) A&~M A|~M EN C[n−1] C[n] FA IN OUT EN A[n+1]

F2 F1 0 1 2 3 4 F0 3

A[n]’ – Alternatively the transmission gate may be incorporated into the gate. Single optimized ALU module -- one connection is removed - easier to layout • Multiplexing is not distributed -- also easier to simulate! • Multiplexor implementation may use transmission gates • 8009 8011

Pass Transistor Circuits Pass Transistor Circuits

Tristate Inverter Tristate Inverter Layout • •

EN

IN OUT

EN

– Any gate may have a tri-state output by combining it with a transmission gate. INEN EN OUT EN IN OUT

8010 8012 Pass Transistor Circuits

Tristate Inverter Bus Driver •

A0 B0 C0

A1 B1 C1

A2 B2 C2

A3 B3 C3

AS AS BSBS CS CS

– a tristate inverting buffer is often used to drive high capacitance bus signals – transistors may be sized as required

8013 Pass Transistor CMOS Transmission Gate EN EN IN OUT IN OUT IN OUT IN OUT EN EN EN

Transmission Gate Multiplexor EN EN S OUT Tri−state gates are used for Multiplexing IN1 IN1 Distributed Multiplexing OUT S OUT using transmission gates IN0 IN0 IN1 IN0

S S S S using tri−state inverters *note distictive polysilicon crossover

Tri−state Inverter

using tri−state buffers

IN OUT IN OUT EN EN

Bi−directional I/O EN IN OUT Tri−state Buffer

IN OUT IN OUT

EN EN *this is another form of multiplexing

8000 Latches and Flip-Flops Latches and Flip-Flops

A simpler layout may be achieved using tristate inverters. • CMOS transmission gate latch •

D D 1 Q 0 Q D Q LOAD

LOAD LD LD

A simple transparent latch can be build around a transmission gate multiplexor D LD LD Q – transparent when load is high – latched when load is low – this design requires two additional transistors but may well be more com- – two inverters are required since the transmission gate cannot drive itself pact.

9001 9003

Latches and Flip-Flops Latches and Flip-Flops

For use in simple synchronous circuits we use a pair of latches in a master slave • Transmission gate latch layout configuration. •

D 1 1 Q 0 0 D Q CLOCK CLOCK

LD LD

DQLD LD – this avoids the race condition in which a transparent latch drives a second transparent latch operating on the same clock phase. – the circuit behaves as a rising edge triggered D type flip-flop. – a compact layout is possible using 2 layer metal

9002 9004 Latches and Flip-Flops Latches and Flip-Flops

Transmission gate implementation Layout of master slave D type. • •

D Q

CLK D CLK 16 Transistors Q D Q

Tristate inverter implementation • CLK D CLK 16 Transistors

Q CLK CLK CLK CLK

– very compact using alternative configuration. CLK CLK 20 Transistors

9005 9007

Latches and Flip-Flops Latches and Flip-Flops

Alternative configuration For the same functionality we could use an edge triggered D type: • • CLK D Q D 1 1 Q 0 0

CLOCK CLOCK D

Q – Implementation

CLK 26 Transistors D Q

– a few more transistors CLK – more complex wiring CLK 16 Transistors – simpler clock distribution

9006 9008 Register File

Where we have large amounts of storage the use of individual latches can lead to space saving.

Z

A D Q Reg1 ENAr1 LDr1 Load D Q B ENBr1 Clock D Q Reg2 ENAr2 LDr2 Load Reg1 Clock ENBr2 Reg2 D Q Reg3 Reg1 ENAr3 LDr3 Load AB Z ENBr3

Load signals must be glitch free with tightly controlled timing. • Edge Triggered D-type prevents a race condition (Reg1 Reg1 + Reg2). • ←

9009 Latch Master Slave D−Type Flip−Flop

D D 1 1 Q Q 0 0 D Q D Q

CLOCK CLOCK LD LD

LD LD CLK CLK CLK CLK

D D 1 1 Q Q 0 0 D Q D Q

CLOCK CLOCK

LD LD

LD LD CLK CLK CLK CLK *muliplexor based latches and flip−flops include distictive polysilicon crossover

Edge Triggered D−Type Flip−Flop

D

Q

CLK 26 Transistors

CLK D Q Euler path analysis is applied creatively to these multi−gate cells − gates are often linked via the common gnd/pwr node Final layouts will be more complex where clock buffers, reset circuitry and metal 2 i/o are included

9000 PLAs, ROMs and RAMs PLAs, ROMs and RAMs

PLA structures PLA structure

Programmable Logic Array structures provide a logical and compact method of P4 implementing multiple SOP (Sum of Products) or POS expressions. P1 A X P3 SOP = AND−OR AND plane OR plane P2 B Y P2 P1 P1 P3 C Z A X A X P2 P2 P4 P1 B Y B Y P3 P3 C Z C Z P4 P4

ABCXYZ

Most PLA structures employ pseudo-NMOS NOR gates using a P-channel device A regular layout is employed, with columns for inputs and outputs and rows in place of the NMOS depletion load. • for intermediate expressions.

10001 10003

PLAs, ROMs and RAMs PLAs, ROMs and RAMs

Pseudo-NMOS NOR gate PLA structure

AND plane cells OR plane cells

P4

OUT P3 I1 I2 I3 I4 I5 I6 I7 P2 Peripheral cells

P1 Unlike complementary CMOS circuits, these gates will dissipate power under • static conditions (since the P device is always on). The P and N channel devices must be ratioed in order to create the required • low output voltage. ABCXYZ

This ratioing results in a slower gate, although there is a trade-off between gate Layout is simply a matter of selecting and placing rectangular cells from a lim- • • speed and static power dissipation. ited set.

10002 10004 PLAs, ROMs and RAMs PLAs, ROMs and RAMs

PLA structure Static RAM

VDD GND VDD GND Used for high density storage on a standard CMOS process. • P4 P4 Short lived conflict during write - NMOS transistors offer stronger path. • P3 Differential amplifiers are used for speedy read. P3 • P2 P2 BIT BIT BIT BIT Vdd P1 P1 ACACB B

ACACB B

Only AND plane cells are shown here

WORD WORD

GND Conversion to sticks is straight forward with opportunities for further opti- • mization. Standard 6 transistor static RAM cell.

10005 10007

PLAs, ROMs and RAMs SRAM Structure

ROMs A ROM may simply be a PLA with fixed decoder plane1 and programmable • Din[0] Dout[0] data plane. Decode Plane Data Plane W7 Din[1] Dout[1] W6

W5

W4 Din[2] Dout[2] Word Lines W3

W2 nWE

W1

W0 A[0]

A[1]

A2 A2 A1 A1 A0 A0 D3 D2 D1 D0 Address Lines Data Lines nOE 1RAM structures can make use of the same decode plane. 10006 10008 SRAM Write SRAM Write

0 1BIT 01 BIT 0 1 0 1BIT 01 BIT 0 1

Word Select Word Select

Write Enable Write Enable

10009 10009 PLA and ROM SRAM

BIT BIT

WORD select

AND Plane OR Plane

ROM is PLA with fixed AND (decoder) plane programmable OR (data) plane Cells are designed to butt together in two dimensions leading to efficient layout PLA layout efficiency will depend on the actual function implemented (e.g. number of common product terms)

10000 System Design Choices Programmable Logic

Programmable Logic • – PLD e.g. Lattice ispGAL22V10, Atmel ATF1502 CPLD – Field Programmable Gate Array (FPGA) e.g. Intel Cyclone V, Xilinx Artix-7/Zync-7000

Semi-Custom Design • – Mask Programmable Gate Array e.g. ECS CMOS Gate Array Intel HardCopy II structured ASICs

– Standard Cell Design ICT PEEL22CV10 Source: ICT e.g. AMS CORELIB 0.35µm cell library One time use - Fuse programmable. • Full Custom Design Reprogrammable - UV/Electrically Erasable. • •

11001 11003

System Design Choices Field Programmable Gate Array – Xilinx XC4000

Programmable I/O Buffers Programmable Logic START HERE Interconnection • Point – Best possible design turnaround time CLB CLB – Cheapest for prototyping CLB CLB – Best time to market Switching Matrix

– Minimum skill required CLB Semi-Custom Design I/O Buffers I/O Buffers • Vertical Routing Channel Full Custom Design Horizontal Routing Channel • – Cheapest for mass production – Fastest I/O Buffers – Lowest Power 2 1 Configurable Logic Blocks (CLBs) & I/O Blocks – Highest Density • – Most skill required Programmable Interconnect •

1optimization limited by speed/power/area trade off 2Xilinx XC4013 has 576 (24 24) CLBs and up to 192 (4 48) user I/O pins. × × 11002 11004 Slice Description

X-Ref Target - Figure 2-3

SRHI D SRLO Reset Type INIT1 Q CE Sync/Async COUT INIT0 CK SR FF/LAT

DX DMUX DI2 D6:1 A6:A1 W6:W1 D D O6 FF/LAT O5 DX INIT1 Q DQ D INIT0 CK DI1 SRHI CE SRLO WEN MC31 SRHI D SRLO CK Q SR DI INIT1 CE INIT0 CK SR

CX CMUX DI2 C6:1 A6:A1 W6:W1 C C O6 O5 FF/LAT CX INIT1 Q CQ D INIT0 CK DI1 CE SRHI SRLO WEN MC31 SRHI CK D SRLO SR CI INIT1 Q CE INIT0 CK SR

BX 3 BMUX Product Obsolete or Under Obsolescence R DI2 Field Programmable Gate Array – Xilinx XC4000 CLB Xilinx Artix-7B6:1 –A6:A1 SLICEM CLB W6:W1 B XC4000E and XC4000X Series Field Programmable Gate Arrays B O6 O5 FF/LAT BX INIT1 Q BQ D DI1 INIT0 CK CE SRHI SRLO WEN MC31 SRHI CK 4 D SRLO SR C • • • C BI 1 4 INIT1 Q CE INIT0 CK SR

AX H1 DIN/H2 SR/H0 EC AMUX DI2 A6:1 A6:A1 W6:W1 A A G S/R Bypass O6 4 O5 FF/LAT CONTROL AX INIT1 Q AQ D INIT0 DI1 DIN YQ CK CE SRHI G LOGIC SRLO 3 F' SD WEN MC31 CK FUNCTION D Q G' G' SR OF AI 0/1 H' G2 G1-G4 SR CE CLK CK G1 WEN LOGIC WE CIN EC UG474_c2_02_110510 FUNCTION G' RD OF H' H' Figure 2-3: Diagram of SLICEM F', G', 1 AND Y H1 Source: Xilinx F 4 S/R Bypass 7Series FPGAs CLB User Guide www.xilinx.com 19 CONTROL DIN XQ 4x 6-input Look-UpUG474 (v1.7) November Tables 17, 2014 (LUTs) for combinational logic F LOGIC 3 F' SD FUNCTION D Q • F' G' OF H' F2 F1-F4 Carry chain supporting fast carry lookahead • F1 8x storage elements EC RD • K (CLOCK) 1 LUTs can be alternatively configured as H' X F'

Multiplexer Controlled 256 bits RAM by Configuration Program X6692 • 32-bit shift register Figure 1: Simplified Block Diagram of XC4000 Series CLB (RAM and Carry Logic functions not shown) • Source: Xilinx 3Xilinx XC7A200T has 16,825 CLBs (each containing 2 slices) and up to 500 user I/O pins. Flip-Flops Clock Enable The CLB can pass the combinatorial output(s) to the inter-11005The clock enable signal (EC) is active High. The EC pin is 11007 connect network, but can also store the combinatorial shared by both storage elements. If left unconnected for results or other incoming data in one or two flip-flops, and either, the clock enable for that storage element defaults to connect their outputs to the interconnect network as well. the active state. EC is not invertible within the CLB. The two edge-triggered D-type flip-flops have common clock (K) and clock enable (EC) inputs. Either or both clock Table 2: CLB Storage Element Functionality inputs can also be permanently enabled. Storage element (active rising edge isSlice shown) Description functionality is described in Table 2. Mode K EC SR D Q X-Ref Target - Figure 2-3

Power-UpSRHI or Latches (XC4000X only) D SRLO Reset Type INIT1 Q X X X X SR FPGA - System On Chip CE Sync/Async COUT INIT0GSR CK The CLB storage elements can also be configured as SR FF/LAT latches. The two latches have common clock (K) and clock X X 1 X SR DX Flip-Flop __/ 1*DMUX 0* D D enable (EC) inputs. StorageDI2 element functionality is D6:1 A6:A1 described in Table 2. W6:W1 0 XD 0* X Q D O6 FF/LAT O5 DX 1INIT1 Q 1*DQ 0* X Q INIT0 Modern FPGAs are big enough for: D Clock Input CK DI1 Latch SRHI CE SRLO WEN MC31 SRHI 0 1* 0* D D D SRLO CK INIT1 Q SR Each flip-flop can be triggeredDI on either the rising or falling CE INIT0 Both X 0 0* X Q clock edge. The clock pin is shared by both storage ele- CK SR Legend: CX One or more soft-core processors ments. However, the clock is individually invertible for each CMUX DI2 X Don’t care • C6:1 A6:A1 storage element. Any inverter placedW6:W1 on the clock input is __/ Rising edge C C O6 SR Set or ResetFF/LAT value. Reset is default. automatically absorbed into the CLB.O5 Program memory CX INIT1 Q CQ 0* Input Dis LowINIT0 or unconnected (default value) CK DI1 SRHI • CE SRLO WEN MC31 SRHI 1* Input CKis High or unconnected (default value) D SRLO SR CI INIT1 Q CE INIT0 Data memory CK SR •

BX BMUX DI2 + specialist hardware B6:1 A6:A1 W6:W1 B B O6 6-10 O5 FF/LAT May 14, 1999 (Version 1.6) BX INIT1 Q BQ D DI1 INIT0 CK CE SRHI SRLO WEN MC31 SRHI CK The new trend is for FPGAs with hard processors built in: D SRLO SR BI INIT1 Q CE INIT0 CK SR

AX AMUX Xilinx Zync-7000 includes dual-core ARM A9 DI2 A6:1 A6:A1 • W6:W1 A A O6 O5 FF/LAT AX INIT1 Q AQ D INIT0 Intel Cyclone V SE includes dual-core ARM A9 DI1 CK CE SRHI SRLO • WEN MC31 CK SR AI 0/1 4 SR Cypress PSoC 4 includes ARM Cortex-M0 and programmable digital and ana- CE CLK • CK log blocks WEN WE CIN UG474_c2_02_110510

Figure 2-3: Diagram of SLICEM Artix-7 – SLICEM CLB Source: Xilinx 4here the digital block is PLD rather than FPGA

7Series FPGAs CLB User Guide www.xilinx.com 19 UG474 (v1.7) November 17, 2014 11006 11008 Mask Programmable Gate Array Standard Cell Design

Logic Functions 9 Output Pads • VDD Pad

8 Input Pads 68 Gate Sites arranged as 4 columns of 17 sites each.

Auto Generated Macro Blocks • – PLA – ROM – RAM

8 Input Pads System Level Blocks • GND Pad – Microprocessor core5 9 Output Pads 5Will support System On Chip applications. 11009 11011

Mask Programmable Gate Array Full Custom

GND Vdd GND Vdd X X X X All design styles need full custom designers O O O O

C C C C to design the base programmable logic chips B B B B • to design building blocks for semi-custom A A A A •

Where large ASICs use full custom techniques they are likely to be used alongside semi-custom techniques. Vdd ACB e.g. Hand-held computer game chip O C Full custom bitslice datapath • B hand crafted for optimum area efficiency and low power consumption A GND Standard cell controller • Macro block RAM, ROM Customize Metal and Contact Window masks only. • •

11010 11012 Programmable Logic Semi−Custom Full Custom

Skill Required Little Lots

Time to Market Quick Slow One−off Cheap Manufacturing Cost Expensive Unit Cost Cheap In Production Expensive

Speed Fast Slow

Area Small Big

Power Low High

All design styles need full custom designers A large ASIC (especially SoC) may mix Semi−Custom and Full Custom

11000