4/27/2019

Indian Institute of Technology Jodhpur, Year 2018‐2019 Digital Logic and Design (Course Code: EE222) Lecture 33‐34: Memory contd.. Programmable Logic Devices

Course Instructor: Shree Prakash Tiwari Email: [email protected] Office: 210, Phone: 0291‐244‐1356

Webpage: http://home.iitj.ac.in/~sptiwari/ Course related documents will be uploaded on http://home.iitj.ac.in/~sptiwari/DLD/

Note: The information provided in the slides is taken from textbooks on Digital Electronics (including Mano & Ciletti) and various other resources from the internet, for teaching/academic use only.

Programmable Logic Overview

° Programmable logic offers designers opportunity to customize chips ° Programmable logic devices have a fixed logic structure ° contain AND-OR circuits • First introduced in early 1980’s ° Field programmable gate arrays (FPGAs) contain small blocks that implement truth tables • First introduced in 1985 ( Corporation) ° Software used to convert user designs to programming information


Design Implementation • Chip creation is a long and difficult process • Millions of dollars required to create custom silicon - Simulation, synthesis, fabrication (lots of jobs for engineers)

Programmable Logic Design

° “Generic” chip created and then customized by designer ° Programming information used • Like a ROM ° Analogy – sign making • Custom sign – more expensive, customized by manufacturer, difficult to change • Sign built by consumer from individual letters – less expensive, not quite as nice, easier to change (remix letters)


Programmable Logic Devices

 Programmable Logic Arrays (PLA)

 Programmable Array Logic (PAL)

 Simple Programmable Logic Device (SPLD)

 Complex Programmable Logic Device (CPLD)

 Field Programmable Gate Array (FPGA)

PLD Summary


PLA Example

Programmable Array Logic

• Implements sum-of-products expressions • Four external inputs (and complements) • Feedback path from output F1 • Product term connections made via switches


Programmable Array Logic

• Consider implementing the following expression: F1 = I1 I2 I3 + I2’ I3’ I4 + I1 I4

• Note that only functions of up to three product terms can be implemented - Larger functions need to be chained together via the feedback path
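The PAL structure above can be sketched in Python as a programmable AND plane feeding a fixed OR. This is only an illustration: the fuse-map layout and the names `PRODUCT_TERMS`, `MAX_TERMS`, and `pal_output` are assumptions made for this example, not part of any real device's programming format.

```python
# Hypothetical fuse map for F1 = I1 I2 I3 + I2' I3' I4 + I1 I4.
# Each product term maps an input name to the polarity wired into its AND gate:
# True = the input itself is connected, False = its complement is connected.
PRODUCT_TERMS = [
    {"I1": True, "I2": True, "I3": True},
    {"I2": False, "I3": False, "I4": True},
    {"I1": True, "I4": True},
]

MAX_TERMS = 3  # the PAL on the slide ORs at most three product terms


def pal_output(inputs, product_terms=PRODUCT_TERMS):
    """Evaluate one PAL output: a fixed OR of programmable AND terms."""
    if len(product_terms) > MAX_TERMS:
        raise ValueError("too many product terms; chain via the feedback path")
    return any(
        all(inputs[name] == polarity for name, polarity in term.items())
        for term in product_terms
    )


if __name__ == "__main__":
    example = {"I1": True, "I2": False, "I3": False, "I4": True}
    print(pal_output(example))
```

A function with more than three product terms would have to be split, with the first output fed back as an input to a second OR group.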

Reconfigurable Hardware

[Figure: logic element with inputs A, B, C, D configured as a four-input AND; Out = A · B · C · D]

• Each logic element operates on four one-bit inputs. • Output is one data bit. • Can perform any Boolean function of four inputs
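A four-input logic element can be modelled as a 16-entry truth table that is filled in at programming time and indexed by the inputs at run time. The sketch below assumes that model; the function names `program_lut4` and `read_lut4` are illustrative only.

```python
from itertools import product


def program_lut4(func):
    """Return the 16 truth-table bits for a Boolean function of four inputs.

    Entry k holds func(a, b, c, d) where k = 8a + 4b + 2c + d.
    """
    return [int(bool(func(a, b, c, d))) for a, b, c, d in product((0, 1), repeat=4)]


def read_lut4(table, a, b, c, d):
    """Look up the output bit for one input combination."""
    return table[(a << 3) | (b << 2) | (c << 1) | d]


if __name__ == "__main__":
    and4 = program_lut4(lambda a, b, c, d: a & b & c & d)
    parity = program_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
    print(read_lut4(and4, 1, 1, 1, 1), read_lut4(parity, 1, 0, 1, 1))
```

Because the table has one entry per input combination, the same hardware implements the four-input AND, parity, or any other four-input function; only the stored bits change.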


Field-Programmable Gate Array

[Figure: array of logic elements (LEs) separated by interconnect tracks]

• Each logic element outputs one data bit. • Interconnect programmable between elements. • Interconnect tracks grouped into channels.

FPGA Architecture Issues

Logic Element

• Need to explore architectural issues. • How much functionality should go in a logic element? • How many routing tracks per channel? • Switch “population”?


Translating a Design to an FPGA

[Figure: C program (C = A+B) translated to a circuit, then mapped onto the array]

• CAD to translate circuit from text description to physical implementation well understood. • CAD to translate from C program to circuit not well understood. • Very difficult for application designers to successfully write high-performance applications

Need for design automation!

Circuit Compilation

1. Technology Mapping: circuit mapped to LUTs.

2. Placement: assign a logical LUT to a physical location.

3. Routing: select wire segments and switches for interconnection.
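The placement step can be illustrated with a toy example. The sketch below assumes a cost of total Manhattan wire length and tries every assignment of LUTs to sites; real placement tools rely on heuristics because exhaustive search does not scale. The names `wirelength` and `best_placement`, and the grid and netlist in the demo, are made up for illustration.

```python
from itertools import permutations


def wirelength(placement, nets):
    """Total Manhattan distance over all two-pin nets."""
    total = 0
    for src, dst in nets:
        (x1, y1), (x2, y2) = placement[src], placement[dst]
        total += abs(x1 - x2) + abs(y1 - y2)
    return total


def best_placement(luts, sites, nets):
    """Exhaustively assign each logical LUT to a distinct physical site."""
    best = None
    for chosen in permutations(sites, len(luts)):
        placement = dict(zip(luts, chosen))
        cost = wirelength(placement, nets)
        if best is None or cost < best[0]:
            best = (cost, placement)
    return best


if __name__ == "__main__":
    luts = ["a", "b", "c"]
    sites = [(0, 0), (0, 1), (1, 0), (1, 1)]  # a 2 x 2 array of LUT sites
    nets = [("a", "b"), ("b", "c")]           # a drives b, b drives c
    cost, placement = best_placement(luts, sites, nets)
    print(cost, placement)
```

Routing would then pick concrete wire segments and switches for each net of the chosen placement.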


Two Bit Adder

Made of full adders: A + B = D

[Figure: full adder (FA) with inputs A, B, Ci and outputs S, Co]

Logic synthesis tool reduces circuit to SOP form

S = ABCi + A’B’Ci + AB’Ci’ + A’BCi’

[Figure: two LUTs, each with inputs A, B, Ci; one produces Co and the other S]

Co = ABCi + A’BCi + AB’Ci + ABCi’
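The two SOP expressions can be checked against ordinary arithmetic, and two such full adders can be rippled into the two-bit adder. The following is a minimal sketch; the function names `sum_sop`, `carry_sop`, and `add_two_bit` are chosen for this example.

```python
from itertools import product


def sum_sop(a, b, ci):
    """S = ABCi + A'B'Ci + AB'Ci' + A'BCi'."""
    na, nb, nci = 1 - a, 1 - b, 1 - ci
    return (a & b & ci) | (na & nb & ci) | (a & nb & nci) | (na & b & nci)


def carry_sop(a, b, ci):
    """Co = ABCi + A'BCi + AB'Ci + ABCi'."""
    na, nb, nci = 1 - a, 1 - b, 1 - ci
    return (a & b & ci) | (na & b & ci) | (a & nb & ci) | (a & b & nci)


def add_two_bit(a, b):
    """Ripple two full adders; a and b are integers in 0..3."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    s0, c0 = sum_sop(a0, b0, 0), carry_sop(a0, b0, 0)
    s1, c1 = sum_sop(a1, b1, c0), carry_sop(a1, b1, c0)
    return (c1 << 2) | (s1 << 1) | s0


if __name__ == "__main__":
    for a, b, ci in product((0, 1), repeat=3):
        # Each row of this table is one entry of the S LUT and the Co LUT.
        print(a, b, ci, sum_sop(a, b, ci), carry_sop(a, b, ci))
```

The eight printed rows are exactly the contents that would be loaded into the S and Co LUTs.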

Dynamic Reconfiguration


• What if I want to exchange part of the design in the device with another piece? • Need to create architectures and software to incrementally change designs. • Effectively a “configuration cache”


Field Programmable Gate Array Capabilities

° Dominant digital design implementation ° Ability to re-configure FPGA to implement any digital logic function • Partial re-configuration allows a portion of the FPGA to be continuously running while another portion is being re-configured ° FPGAs also contain analog circuitry features including a programmable slew rate and drive strength, and differential comparators on I/O designed to be connected to differential signaling channels ° Mixed-signal FPGAs contain ADCs and DACs with analog signal conditioning blocks, allowing them to operate as a system-on-chip (SoC)

FPGA Architectures

° Early FPGAs • N x N array of unit cells (CLB + routing) - Special routing along center axis ° Next Generation FPGAs • M x N unit cells • Small block RAMs around edges ° More recent FPGAs • Added block RAM arrays • Added multiplier cores • Added processor cores

CLB = configurable logic block


FPGA Architecture Trends

° Memories • Single & Dual-port RAMs • FIFO (first-in first-out) • ECC (error correcting codes) ° Digital Signal Processors • Multipliers • Accumulators • Arithmetic Logic Units (ALUs) ° Embedded Processors • Hard core (dedicated processors) - Dedicated program and data memories - Programmable RAM in FPGA can be used in conjunction with the processor to provide program and data memories • Soft core (synthesized from an HDL)

Basic FPGA Architecture

•More recent FPGA architectures have small block RAM arrays (usually placed in center column), multipliers, processor cores, DSP cores w/ multipliers, and I/O cells along columns for ball grid arrays (BGAs)


FPGA Operation

User writes configuration memory which defines the function of the system. This includes: the connectivity between the CLBs and the I/O cells, the logic to be implemented onto the CLBs, and the I/O blocks.

By changing the data in the configuration memory, the function of the system changes as well. This change in data can be implemented at any time during FPGA operation (run-time configuration).

Configurable Logic Blocks (CLBs) Architecture ° CLBs consist of: • Look-up Tables (LUT) which implement the entries of a logic function's truth table - Some FPGAs can use LUTs to implement small Random Access Memory (RAM) • Carry and Control Logic - Implements fast arithmetic operations (adders/subtractors) - Can also be configured for additional operations (Built-in-Self Test iterative-OR chain) • Memory Elements - Configurable Flip Flops (FFs)/Latches (programmable clock edges, set/reset, and clock enable) - These memory elements usually can be configured as shift registers


Configurable Logic Blocks

A single CLB can be made up of several slices. Xilinx Virtex-5 FPGAs (right) have two slices per CLB: SLICEL (logic) and SLICEM (memory).

In addition to the basic CLB architecture, the Virtex-5 contains wide-function MUXs which can implement: - 4:1 MUX using 1 LUT - 8:1 MUX using 2 LUTs - 16:1 MUX using 4 LUTs

Look-up Tables (2:1 MUX Example)

° Configuration memory holds the output of each truth table entry ° Input signals connect to the control (select) signals of MUXs, which select one value of the truth table for any given combination of inputs
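The MUX-based read path can be sketched as a tree of 2:1 MUXs that halves the stored truth table once per input. The code below assumes that structure and programs a 3-input LUT as a 2:1 MUX; the names `mux2`, `lut_read`, and `MUX2_CONFIG`, and the input ordering (d0, d1, sel), are choices made for this example.

```python
def mux2(sel, in0, in1):
    """A single 2:1 multiplexer."""
    return in1 if sel else in0


def lut_read(config_bits, inputs):
    """Select one stored truth-table bit using one layer of 2:1 MUXs per input.

    inputs[0] is the least significant address bit.
    """
    if len(config_bits) != 2 ** len(inputs):
        raise ValueError("need 2**n configuration bits for n inputs")
    level = list(config_bits)
    for bit in inputs:
        level = [mux2(bit, level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# Configuration bits for out = d1 if sel else d0,
# with address = d0 + 2*d1 + 4*sel.
MUX2_CONFIG = [
    ((addr >> 1) & 1) if (addr >> 2) & 1 else (addr & 1)
    for addr in range(8)
]

if __name__ == "__main__":
    print(MUX2_CONFIG)
    print(lut_read(MUX2_CONFIG, (1, 0, 0)))  # d0=1, d1=0, sel=0
```

Changing `MUX2_CONFIG` to a different list of eight bits turns the same MUX tree into any other three-input function.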


LUT-Based RAM

° Normal LUT mode performs read operation ° Address decoder with write enable (WE) generates clock signals to latches for write operation ° Smaller RAMs can be combined to create larger RAMs (up to 64-bit in Virtex-5)
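A behavioural sketch of LUT-based RAM follows, assuming one-bit-wide cells and a simple scheme in which the top address bit chooses between two smaller RAMs. The classes `LutRam` and `WideRam` are hypothetical models, not a description of the Virtex-5 circuitry.

```python
class LutRam:
    """A LUT used as a small 1-bit-wide RAM."""

    def __init__(self, address_bits):
        self.cells = [0] * (2 ** address_bits)

    def read(self, address):
        # Reading is the normal LUT look-up.
        return self.cells[address]

    def write(self, address, value, write_enable):
        # The addressed latch is only updated when WE is asserted.
        if write_enable:
            self.cells[address] = value & 1


class WideRam:
    """Two LutRams combined; the top address bit selects the bank."""

    def __init__(self, address_bits):
        self.bank_size = 2 ** (address_bits - 1)
        self.banks = [LutRam(address_bits - 1), LutRam(address_bits - 1)]

    def _locate(self, address):
        return self.banks[address // self.bank_size], address % self.bank_size

    def read(self, address):
        bank, offset = self._locate(address)
        return bank.read(offset)

    def write(self, address, value, write_enable):
        bank, offset = self._locate(address)
        bank.write(offset, value, write_enable)


if __name__ == "__main__":
    ram = WideRam(address_bits=6)  # 64 x 1 built from two 32 x 1 RAMs
    ram.write(40, 1, write_enable=True)
    ram.write(3, 1, write_enable=False)  # ignored: WE is low
    print(ram.read(40), ram.read(3))
```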

FPGA Programmable Interconnection Network ° Horizontal and vertical mesh of wire segments interconnected by programmable switches called programmable interconnect points (PIPs). These PIPs are implemented using a transmission gate controlled by a memory bit from the configuration memory. ° Consists of global routing connecting PLBs to I/O buffers, non-adjacent PLBs, and other embedded components. Local routing connects PLBs to other adjacent PLBs and PLBs to global routing (done through a switch matrix)

° Several types of PIPs are used • Cross-point = connects vertical or horizontal wire segments allowing turns • Break-point = connects or isolates 2 wire segments • Decoded MUX = group of 2^n cross-points connected to a single output, configured by n configuration bits • Non-decoded MUX = n wire segments each with a configuration bit (n segments) • Compound cross-point = 6 break-point PIPs (can isolate two isolated signal nets)
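The difference between the decoded and non-decoded MUX PIPs is how many configuration bits they need. The sketch below models both. It assumes that the decoded MUX treats its n bits as a binary index (most significant bit first) and that exactly one bit is set in the non-decoded MUX; both function names are hypothetical.

```python
def decoded_mux_pip(config_bits, segments):
    """n configuration bits select one of 2**n wire segments."""
    if len(segments) != 2 ** len(config_bits):
        raise ValueError("a decoded MUX with n bits needs 2**n segments")
    index = 0
    for bit in config_bits:  # most significant bit first (an assumption)
        index = (index << 1) | bit
    return segments[index]


def non_decoded_mux_pip(config_bits, segments):
    """One configuration bit per wire segment; the enabled one drives the output."""
    if len(config_bits) != len(segments):
        raise ValueError("one configuration bit per segment")
    selected = [seg for bit, seg in zip(config_bits, segments) if bit]
    if len(selected) != 1:
        # Assumption for this sketch: enabling several segments at once is an error.
        raise ValueError("exactly one configuration bit must be set")
    return selected[0]


if __name__ == "__main__":
    wires = ["w0", "w1", "w2", "w3"]
    print(decoded_mux_pip([1, 0], wires))            # 2 bits for 4 segments
    print(non_decoded_mux_pip([0, 0, 1, 0], wires))  # 4 bits for 4 segments
```

Both calls select the same wire; the decoded form spends fewer configuration bits at the cost of decode logic.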


Programmable Input/Output Cells ° Bi-directional Buffers • Programmable for inputs or outputs • Tri-state controls bi-directional operation • Pull-up/down resistors • FFs/Latches are used to improve timing - Set-up and hold times - Clock-to-out delay ° Routing Resources • Connections to core of array ° Programmable I/O voltage and current levels

Boundary Scan Access

FPGA Configuration Interfaces ° Master (Serial or Parallel) • FPGA retrieves configuration from ROM at initial power-up ° Slave (Serial or Parallel) • FPGA configured by an external source (e.g., microprocessor/other FPGA) • Used for dynamic partial re-configuration ° Boundary Scan • 4-wire IEEE standard serial interface used for testing • Write and read access to configuration memory • Interfaces to FPGA core internal routing network


Boundary Scan Configuration

Multi-FPGA Emulation Framework to support NoC design and verification (UNLV NSIL) Developed to test interconnect between chips on PCB

Daisy Chain Configuration: Test Access Port (TAP) controller composed of a 16-state FSM
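The 16-state TAP controller can be written as a transition table indexed by the TMS input. The sketch below uses the state names from the IEEE 1149.1 boundary scan standard; the table layout and the helper names `tap_step` and `tap_run` are choices made for this example.

```python
# For each state: (next state when TMS = 0, next state when TMS = 1).
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}


def tap_step(state, tms):
    """Advance the TAP controller by one TCK edge."""
    return TAP_NEXT[state][tms]


def tap_run(state, tms_bits):
    """Apply a sequence of TMS values and return the final state."""
    for tms in tms_bits:
        state = tap_step(state, tms)
    return state


if __name__ == "__main__":
    # From reset, TMS = 0, 1, 0, 0 walks to the data-register shift state.
    print(tap_run("Test-Logic-Reset", [0, 1, 0, 0]))
```

A useful property of this table is that holding TMS high for five clocks returns the controller to Test-Logic-Reset from any state, which is how a daisy chain of devices is brought to a known state.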

FPGA Configuration Techniques ° Full configuration and readback • Simple configuration interface - Automatic internal calculation of frame address • Larger FPGAs have a longer download time

° Compressed configuration • Requires multiple frame write capability - Identical frames of configuration data are written to multiple frame addresses • Extension of partial re-configuration interface capabilities - Frame address is much smaller than frame of configuration data • Reduces download time for initial configuration depending on regularity of system function and the array percent that is utilized

° Partial re-configuration and readback • Only change portions of configuration memory with respect to reference design - Reduces download time for re-configuration
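The download-time savings of the compressed and partial techniques can be illustrated with a simplified model in which the configuration memory is a list of equally sized frames. Real bitstream formats are device-specific; the function names and the cost model (one address word per frame write) are assumptions made for this sketch.

```python
def compress_frames(frames):
    """Group frame addresses by identical frame data (multiple frame write)."""
    groups = {}
    for address, data in enumerate(frames):
        groups.setdefault(data, []).append(address)
    return list(groups.items())


def download_words(groups, frame_words, address_words=1):
    """Words sent: each distinct frame once, plus one address per target frame."""
    return sum(frame_words + address_words * len(addrs) for _, addrs in groups)


def partial_frames(reference, new_design):
    """Frames that differ from the reference design, with their addresses."""
    return [
        (address, new)
        for address, (old, new) in enumerate(zip(reference, new_design))
        if old != new
    ]


if __name__ == "__main__":
    blank = bytes(4)
    logic = bytes([0xFF, 0x00, 0xFF, 0x00])
    frames = [blank, logic, blank, blank]
    groups = compress_frames(frames)
    print(groups)
    print(download_words(groups, frame_words=4))
    print(partial_frames(frames, [blank, logic, logic, blank]))
```

The more regular the design and the less of the array it uses, the more frames are identical and the larger the saving, as the compressed configuration bullet notes.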


Xilinx Virtex-5 FPGAs

Virtex-5 FPGA Platforms

•Over 320,000 PLBs on the largest Virtex-5

•ExpressFabric interconnect structure and 12 levels of metal interconnect allowing implementation of complex logic functions, allowing connections to neighboring PLBs in fewer hops than Virtex-4

•Each PLB contains 8 LUTs, 8 configurable memory elements (can be configured as RAM/ROM/shift register)

•Enhanced DSP functions on 25 x 18-bit multipliers (ability to be cascaded)

•Clock management contains one PLLC and two managers which can drive global clock buffers and filter jitter (cascaded)

Five Virtex-5 Platforms

1. LX: general logic applications
2. LXT: logic with advanced serial connectivity
3. SXT: signal processing applications with advanced serial connectivity
4. TXT: high performance systems with double density advanced serial connectivity
5. FXT: high performance embedded systems with advanced serial connectivity


Virtex-5 CLB

A single CLB in Virtex-5 consists of two slices: SLICEL (logic) and SLICEM (memory). Each CLB is connected to a switch matrix which can access a general routing (global) matrix.

Every slice contains four LUTs, wide-function MUXs, carry logic, and configurable memory elements. SLICEM supports storing data using distributed RAM and data shifting with 32-bit shift registers

FPGA Design Comparison: Virtex-5, Virtex-6, and Spartan-6

Virtex-6 CLBs have the same setup as Virtex-5 (SLICEL & SLICEM)

Virtex-6 devices add four additional storage elements which can only be configured as edge-triggered D-FFs. The D inputs are driven by the output of the LUTs or by the bypass slice inputs AX-DX


FPGA Design Comparison: Virtex-5, Virtex-6, and Spartan-6

Spartan-6 CLB columns are separated into two columns: 1 column for a new SLICEX and 1 column for alternating SLICEL and SLICEM. SLICEX is a basic CLB without any carry logic added

Back to Virtex-5 CLB LUT ° Up to 207,360 LUTs (6-input) with greater than 13 million configuration bits. ° Can be configured as dual-output 5-input LUTs. In single 6-input LUT mode, O6 is the primary output.
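One way to picture the dual-output mode is as a single 64-bit table read through two ports. The sketch below assumes that O6 indexes all 64 bits with six inputs while a second output, called O5 here, indexes only the lower 32 bits with the first five inputs; this arrangement and the names `lut6_outputs` and `program_dual` are assumptions for illustration and should be checked against the device documentation.

```python
def lut6_outputs(init_bits, inputs):
    """Return (O6, O5) for a 64-entry table; inputs[0] is the least significant bit."""
    addr5 = sum(bit << k for k, bit in enumerate(inputs[:5]))
    addr6 = addr5 | (inputs[5] << 5)
    o6 = init_bits[addr6]  # full 6-input look-up
    o5 = init_bits[addr5]  # look-up restricted to the lower 32 entries
    return o6, o5


def program_dual(f, g):
    """Pack two 5-input functions: f in the lower half, g in the upper half."""
    return [f(addr) for addr in range(32)] + [g(addr) for addr in range(32)]


if __name__ == "__main__":
    and5 = lambda addr: int(addr == 31)              # all five inputs high
    parity5 = lambda addr: bin(addr).count("1") & 1  # XOR of five inputs
    init = program_dual(and5, parity5)
    # With the sixth input tied high, O6 reads g and O5 reads f.
    print(lut6_outputs(init, (1, 1, 1, 1, 1, 1)))
```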


Virtex-5 Programmable I/O

The I/O cells in Virtex-5 have output logic blocks (OLOGIC), input logic blocks (ILOGIC), I/O delay blocks, and a bidirectional I/O buffer.

OLOGIC implements registers to improve system clock-to-output timing and supports single data-rate (SDR) and double data-rate (DDR) transmission of data. It can also perform parallel-to-serial conversion of output data (2 & 6 bits) in Serializer/De-serializer (SerDes) mode.

ILOGIC implements registers to improve set-up and hold times and supports SDR and DDR reception of data. It can perform serial-to-parallel conversion of input data (2 & 6 bits) when in SerDes mode.

Two I/O cells are grouped to form a single I/O tile. In master/slave mode, two I/O cells in the same I/O tile are connected via dedicated shift routing to support larger data widths.

FPGAs see diminishing benefits with scaling

° 90% of FPGA logic area is programmable interconnect ° Performance and power penalties are a direct result of the area (70% Virtex-2) ° Interconnect needs to increase faster than number of gates to keep up (Rent's rule)

[Figure: dynamic power in Virtex-2 (Shang FPGA’02): Interconnect 60%, Logic 16%, Clocking 14%, IOB 10%]


3D Integrated Circuits •More functionality in a smaller space → extends Moore’s Law •More transistors in a package → larger designs •Shorter interconnects → less RC delay → better chip performance •Power decrease → shorter wires reduce power consumption by producing less capacitance (also less inductance) •Bandwidth → large number of vertical vias between layers allows construction of wide-bandwidth buses between functional blocks in different layers

3D Integrated Circuit

Metal layers

Device layer 2

Metal layers

Device layer 1

Si Substrate


Summary

° Programmable logic allows for designers to easily create custom designs ° Programmable array logic contains AND-OR structures to implement SOP equations ° FPGAs contain small memories and numerous wires for routing ° Designers create designs in • Design translated to the chip via software
