2384-10

ICTP Latin-American Advanced Course on FPGADesign for Scientific Instrumentation

19 November - 7 December, 2012

FPGA Introduction

SISTERNA Cristian Alejandro Universidad Nacional de San Juan Instituto de Investigaciones Antisismicas Av. San Martin 1290 - Oeste 5400 San Juan ARGENTINA

FPGA INTRODUCTION

Cristian AlejandroSisterna, MSc Universidad Nacional San Juan Argentina

ICTP 2012 - CUBA Agenda

2

Introduction

FPGA Architecture

Configuration and routing cells

Basic slice resources available in FPGAs

Basic I/O resources available in Xilinx FPGAs

Clocking resources

Memory blocks and distributed memory

Multipliers and DSP blocks

Routing

Spartan 6, Virtex 6, Virtex 7

FPGA Configuration

Basic Architecture 2 Cristian Sisterna ICTP 2012 Introduction

Cristian Sisterna ICTP 2012 PLDs Evolution

4

Cristian Sisterna ICTP 2012 FPGA?

5

Field Programmable Gate Array

Cristian Sisterna ICTP 2012 FPGA: Competitive Market

6

Altera Inc. Corp. Lucent Technology Philips Semiconductors FPGA Corp. Motorola Semiconductors 2000 QuickLogic Cypress 2012 Lattice Semiconductors AMD Vantis Xilinx

Cristian Sisterna ICTP 2012 FPGA Architecture

Cristian Sisterna ICTP 2012 FPGA? – What is it?

8

Programmable Logic, Interconnections and Routing Programmable in System (ISP) Dedicated Blocks: Memory Clock Control DSP blocks Embedded processor(s) Gigabits serial transmission/reception Ethernet controller Memory controllers

Cristian Sisterna ICTP 2012 FPGA? – What is it? (cont.)

9

Up to1200 I/O More than 40 I/O standards. Single ended, Differentials More than 40.000 Flips-Flops and Look-Up-Tables (LUTs) Soft-Coded Processors, 8051, ARM3 PLLs and DLLs available (2-12) per device. Up to 550MHz. Programmable output impedance Dedicated hard coded blocks: Processors PCI E interface Gigabit transceivers Dedicated DSP blocks Memory blocks

Cristian Sisterna ICTP 2012 FPGA Architecture

10

RAM Memory Block

f CLBs iR lo aw s DSP Block

I/O Block columns Cristian Sisterna ICTP 2012 FPGA Architecture (cont.)

11

CLBs

RAM Memory RAM

DSP Block

I/O Block

Com Cristian Sisterna InterfacesICTP 2012 Spartan-6 FPGA Architecture

12

CLB I/O CMT Memory Controller BUFG BUFIO MGT Block RAM PCIe Endpoint DSP48

Cristian Sisterna ICTP 2012 13

Spartan 3 Internal View

Cristian Sisterna ICTP 2012 General FPGA Architecture

14

Cristian Sisterna ICTP 2012 FPGA Silicon View

15

Cristian Sisterna ICTP 2012 Resources Available in an FPGA

16

ASMBL™ th 500 MHz 4 Generation Column-Based SmartRAM™ Advanced Logic Architecture BRAM/FIFO

Integrated 450 MHz PowerPC Cores 0.6 - 11.1 Gbps RocketIO™

Integrated Tri-Mode Ethernet MAC SelectIO with Cores ChipSync™ Technology: Integrated 500 MHz 500 MHz - 1 Gbps LVDS System Monitor Xesium™ Clocking Xtreme DSP™ Slice - 600 Mbps SE

Cristian Sisterna ICTP 2012 Xilinx FPGA Architecture Alignment

17

Virtex-6 FPGAs Spartan-6 FPGAs

760K Common Resources 150K Logic Cell Logic Cell Device LUT-6 CLB Device

BlockRAM

DSP Slices High-performance Clocking

FIFO Logic Parallel I/O Hardened Memory Controllers

Tri-mode EMAC HSS Transceivers* 3.3 Volt compatible I/O

System Monitor PCIe® Interface

Basic Architecture 17 Cristian Sisterna ICTP 2012 Xilinx FPGAs Overview

18

Cristian Sisterna ICTP 2012 FPGA Configuration and Routing Cell

Cristian Sisterna ICTP 2012 FPGA Logic Configuration Options

20

Cristian Sisterna ICTP 2012 FPGA Routing Options

21

Cristian Sisterna ICTP 2012 Logic and Routing Configuration

22

L

Configuration & Routing Bits 110101011101010010001

Cristian Sisterna ICTP 2012 FPGA Configuration Cells

23

‰ Tipos de Celdas ‰ SRAM ‰ Anti-Fuse ‰ Flash ‰ Flash y SRAM

Cristian Sisterna ICTP 2012 FPGA Cell Comparison

24 SRAM Anti_fuse Flash

One or Two One or Two Technology Latest generation behind generation behind

Speed Slower Faster Slower Volatility yes No No Power Poor Better Medium

Density Good Best Medium

Radiation Tolerant Poor Best Medium External Configuration Yes No No Cell size 1 1/10 1/7 Reprogrammable Yes No Si Instant-On No Yes Yes

Security Poor Very Good Very Good

Config. Transistors 6 Transistors Tiny 2 transistors Cristian Sisterna ICTP 2012 FPGA Architecture: Big Fight

25

In general the FPGA arquitecture is similar among the largest vendors. Even though each vendor states the its FPGA is the BEST. . .

Who is the big winner of this The final user fight?????

Cristian Sisterna ICTP 2012 FPGA Architecture

26

The FPGAs from Xilinx are divided in

Spartan 2-3-6 Virtex 2-2P-4-5-6

Good performance High performance Great relation price/performance Expensive

Cristian Sisterna ICTP 2012 CLB SLICEs

Cristian Sisterna ICTP 2012 Spartan 3 – FPGA

28

Cristian Sisterna ICTP 2012 S3 - Configurable Logic Block (CLB)

29

CLB

Cristian Sisterna ICTP 2012 S3 - CLB – Actual Internal View

30

Cristian Sisterna ICTP 2012 S3 - CLB – Main Logic

31

Two LUTs Tow flip-flops Four outputs Two combinationals Two registered Control Input for FFs

I/O carry chain

Cristian Sisterna ICTP 2012 S3 – Half Slice Detailed View

32

Cristian Sisterna ICTP 2012 S3 - Look-Up Table

33

ABCDZ 00000 00010 00100 00111 01001 01011 ... 11000 11010 11100 11111

The LUT configuration is not responsibility of the designer

Cristian Sisterna ICTP 2012 S3 - CLB Register Elements

34

Cristian Sisterna ICTP 2012 S3 - CLB – Carry Logic

35

Cristian Sisterna ICTP 2012 S3 - Different Type of SLICEs

36

Cristian Sisterna ICTP 2012 37

CLB Internal View

Cristian Sisterna ICTP 2012 S6 - SLICEs

38

SLICEM: Full slice † LUT can be used for logic and SLICEX memory/SRL † Has wide and carry SLICEM chain SLICEL: Logic and arithmetic only or † LUT can only be used for logic (not memory) † Has wide multiplexers and carry chain SLICEX SLICEX: Logic only † LUT can only be used for logic (not SLICEL memory) † No wide multiplexers or carry chain

Cristian Sisterna ICTP 2012 S6 – SLICEs

39

SliceM (25%) SliceL (25%) SliceX (50%)

ƒ LUT6 ƒ LUT6 ƒ LUT6 ƒ 8 Registers ƒ 8 Registers ƒ Optimized for Logic ƒ Carry Logic ƒ Carry Logic ƒ 8 Registers ƒ Wide Function Muxes ƒ Wide Function Muxes ƒ Distributed RAM / SRL logic

Basic Architecture 39 Cristian Sisterna ICTP 2012 S6 - SLICE

Four LUTs

Eight storage elements LUT/RAM/SRL † Four flip-flop/latches † Four flip-flops

F7MUX and F8MUX LUT/RAM/SRL † Connects LUT outputs to create wide functions † Output can drive the flip-flop/latches LUT/RAM/SRL Carry chain (Slice0 only) † Connected to the LUTs and the four

flip-flop/latches LUT/RAM/SRL

0 1

Basic Architecture 40 Cristian Sisterna ICTP 2012 S6 - 6-Input LUT with Dual Output

6-input LUT can be two 5-input LUTs with common inputs † Minimal speed impact to 6-LUT a 6-input LUT A6

† One or two outputs A5 A5 A4 A4 D † Any function of six variables or A3 A3 5-LUT two independent functions of A2 A2 A1 A1 five variables O6

A5 A4 D O5 A3 5-LUT A2 A1

Basic Architecture 41 Cristian Sisterna ICTP 2012 Configuring LUTs as a Shift Register (SRL)

LUT D DQ CE CE CLK

DQ CE

DQ Q CE

DQ LUT CE

A[4:0] Q31 (cascade out)

Basic Architecture 42 Cristian Sisterna ICTP 2012 Shift Register LUT Example

20 Cycles

Operation A Operation B 64 8 Cycles 12 Cycles 64 Operation C Operation D - NOP

3 Cycles 17 Cycles Paths are Statically Balanced 20 Cycles

Operation D - NOP must add 17 pipeline stages of 64 bits each † 1,088 flip-flops (hence 136 slices) or † 64 SRLs (hence 16 slices)

Basic Architecture 43 Cristian Sisterna ICTP 2012 44

V5 Slice L

Cristian Sisterna ICTP 2012 I/O Resources

Cristian Sisterna ICTP 2012 46

Spartan 3 I/O Block (IOB)

Cristian Sisterna ICTPTP 202012 I/O Block

47

Cristian Sisterna ICTP 2012 Spartan 6 I/O Block Diagram

Logical Resources Electrical Resources

‰ IOSERDES

IODELAY ‰ Parallel to serial IOLOGIC P converter

IOSERDES (serializer) ‰ Serial to parallel

LVDS converter Termination (De-serializer) ‰ IODELAY ‰ Selectable fine-

IOLOGIC IODELAY grained delay Interconnect to FPGA fabric Interconnect to FPGA N ‰ SDR and DDR IOSERDES resources

Basic Architecture 48 Cristian Sisterna ICTP 2012 S6 - FPGA Supports 40+ Standards

Each input can be 3.3 V compatible

LVCMOS (3.3 V, 2.5 V, 1.8 V, 1.5 V, and 1.2 V)

LVCMOS_JEDEC

LVPECL (3.3 V, 2.5 V)

PCI

I2C*

HSTL (1.8 V, 1.5 V; Classes I, II, III, IV) † DIFF_HSTL_I, DIFF_HSTL_I_18 † DIFF_HSTL_II*

SSTL (2.5 V, 1.8 V; Classes I, II) † DIFF_SSTL_I, DIFF_SSTL18_I † DIFF_SSTL_II*

LVDS, Bus LVDS

RSDS_25 (point-to-point)

Basic Architecture 49 Cristian Sisterna ICTP 2012 S6 - FPGA I/O Bank Structure

BANK 0 All I/Os are on the edges of the chip

I/Os are grouped into banks BANK 3 BANK 1 † 30 ~ 83 I/O per banks BANK 2 † Eight clock pins per edge Chip View † Common VCCO, VREF (LX45/T and Smaller) „ Restricts mixture of standards in one bank BANK 0 The differential driver is only available BANK 5 in BANK 4

BANK 1 Bank0 and Bank2 BANK 3 † Differential receiver is available in all banks BANK 2 † On-chip termination is available in all banks Chip View (LX100/T and Larger) Basic Architecture 50 Cristian Sisterna ICTP 2012 I/O Resources

51

Digital Controlled Impedance (DCI) Drive Strenght Slew Rate Bus Hold (Bus keeper) Pull-up/Pull-down Differential Termination IODelay (V5, V6, V7) † Fixed † Variable

Cristian Sisterna ICTP 2012 FPGA Memory

Cristian Sisterna ICTP 2012 FPGA Block RAM (BRAM) Features

18 kb size 18k Memory † With multiple size configuration

Multiple configuration options Dual-Port † True dual-port, simple dual-port, single-port BRAM Two independent ports access common data † Individual address, clock, write enable, clock eenenableable † Independent widths for each port

Byte-write enable

Different modes: † Write first † Read first, then write † No change Basic Architecture 53 Cristian Sisterna ICTP 2012 S3 - Memory Block (BRAM)

54

Cristian Sisterna ICTP 2012 BRAMs Usages

55

Cristian Sisterna ICTP 2012 BRAM Configuration Size

56

1 2 4

0 0 0

4Kx4

8Kx2 4.095

8

16Kx1 8.191 0 2Kx8

2047

16+2 0

16.383 1023

Cristian Sisterna ICTP 2012 BRAM Forced Location

57

Location Constraint:

LOC = RAMB16_X#Y# Cristian Sisterna ICTP 2012 SLICEM Used as Distributed SelectRAM Memory

Uses the same storage that is used for the look-up table function Single Dual Simple Quad Port Port Dual Port Port Synchronous write, asynchronous read † Can be converted to synchronous read 32x2 32x2D 32x6SDP 32x2Q using the flip-flops available in the slice 32x4 32x4D 64x3SDP 64x1Q 32x6 64x1D Various configurations 32x8 64x2D † Single port 64x1 128x1D „ One LUT6 = 64x1 or 32x2 RAM 64x2 „ Cascadable up to 256x1 RAM 64x3 † Dual port (D) 64x4 „ 1 read / write port + 1 read-only port 128x1 † Simple dual port (SDP) 128x2 „ 1 write-only port + 1 read-only port 256x1 † Quad-port (Q) „ 1 read / write port + 3 read-only ports

Basic Architecture 58 Cristian Sisterna ICTP 2012 S6 - Memory Controller

59

Only low cost FPGA with a “hard” memory controller

Guaranteed memory interface performance providing † Reduced engineering & board design time † DDR, DDR2, DDR3 & LP DDR support † Up to 12.8Mbps bandwidth for each memory controller

Automatic calibration features DRAM

DRAM Multiport structure for user interface SRAMAAMM DDR † Six 32-bit programmable ports from fabric Spartan-6 DDR2 DDR3 FLASHASSH LP DDR † Controller interface to 4, 8 or 16 bit memories devicesce

EEPROM

Basic Architecture 59 Cristian Sisterna ICTP 2012 FPGA Multipliers and DSP Blocks

Cristian Sisterna ICTP 2012 Different Multipliers/DSP Blocks

Cristian Sisterna ICTP 2012 S3 – Multiplier Locations

62

Cristian Sisterna ICTP 2012 Spartan 3 - Multiplier

63

P = A x B

36 = 18 x 18

Pipelining (optional)

Cristian Sisterna ICTP 2012 Virtex 5/6 - Conexiones del DSP48E

64

Cristian Sisterna ICTP 2012 Virtex 5/6 - Bloque DSP48E

65

Cristian Sisterna ICTP 2012 FPGA Routing

Cristian Sisterna ICTP 2012 Routing

67

Transistor de Paso

Y0

Y

M

PIP Cristian Sisterna ICTP 2012 Routing (cont.)

68

Cristian Sisterna ICTP 2012 Routing Delay Report

69

Cristian Sisterna ICTP 2012 Routing (cont.)

70

Cristian Sisterna ICTP 2012 Routing (cont.)

71

Cristian Sisterna ICTP 2012 Clock Resources

Cristian Sisterna ICTP 2012 S3 – Digital Clock Manager

73

Cristian Sisterna ICTP 2012 S3 – Digital Clock Manager (cont.)

74

CLKIN CLK0

CLKFB CLK2X

CLK2X180 DCM

CLKDIV

CLKFX

CLKFX180

Cristian Sisterna ICTP 2012 DCM Purposes

75

Elimintating clock skew

Clock phase shifting † Variable † Fixed

Multiply and Divide input clock, to generate a new frequency

Duty cycle 50%

Rebuffer clock input

Cristian Sisterna ICTP 2012 DCM Aplication

76

Skew elimination on internal clock signals

Skew elimination on external clock signal

Cristian Sisterna ICTP 2012 Dedicated Clokc Routing

77

GEFH

DCM DCM

8 8 4 H H G G

F F E 8 8 8 8 E D D C C B B 8 A 8 4 A

DCM DCM CABD

Cristian Sisterna ICTP 2012 78

Dedicated Clock Routing: Real application

Cristian Sisterna ICTP 2012 Spartan-6 FPGA I/O Clock Network

BUFIO2 IO bank P N P N P N P N BUFPLL

IOLOGIC IOLOGIC IOLOGIC IOLOGIC

CMT PLL

Special clock network dedicated to I/O logical resources † Independent of global clock resources † Speeds up to 1 GHz Multiple sources for clocking I/O logic † BUFIO2: for high-speed dedicated I/O clock signals † BUFPLL: for clocks driven by the PLL in the CMT

Basic Architecture 79 Cristian Sisterna ICTP 2012 Spartan-6 FPGA Clock Management Tile (CMT)

Clocks from BUFG

Feedback clocks from BUFIO2FB

GCLK Inputs

CLKIN 6 CLKOUT<5:0> pll_clkout<5:0> PLL CLKFB

CLKIN 10 CLKOUT<9:0> dcm1_clkout<9:0> CLKFB DCM

CLKIN 10 CLKOUT<9:0> dcm2_clkout<9:0> CLKFB DCM

Basic Architecture 80 Cristian Sisterna ICTP 2012 Spartan 6 - Virtex 6 - Virtex 7

Cristian Sisterna ICTP 2012 Virtex® Product & Process Evolution

82

Virtex-6 40-nm Virtex-5 65-nm Virtex-4 90-nm Virtex-II Pro 130-nm Virtex-II 150-nm Virtex-E 180-nm Virtex 220-nm

1st Generation 2nd Generation 3rd Generation 4th Generationn 5th Generation 6th Generation

Basic Architecture 82 Cristian Sisterna ICTP 2012 Virtex-6 and Spartan-6 FPGA Sub-Families

Virtex-6 Virtex-6 Virtex-6 Virtex-6 CXT FPGA LXT FPGA SXT FPGA HXT FPGA

• Upto 3.75Gbps serial connectivity• High Logic Density • High Logic Density • High Logic Density and corresponding logic • High-Speed Serial • High-Speed Serial • Ultra High-Speed Serial performance Connectivity Connectivity Connectivity Spartan-6 Spartan-6 • Enhanced DSP LX FPGA LXT FPGA

Logic Block RAM DSP Parallel I/O Serial I/O

• Lowest Cost Logic • Lowest Cost Logic • Low-Cost Serial Connectivity

Basic Architecture 83 Cristian Sisterna ICTP 2012 Virtex 7

84

Common elements enable easy IP reuse for quick design portability across all 7 series families Artix™-7 FPGA † Design scalability from low-cost to high-performance † Expanded eco-system support † Quickest TTM

Logic Fabric Precise, Low Jitter Clocking LUT-6 CLB MMCMs Kintex™-7 FPGA

On-Chip Memory Enhanced Connectivity 36Kbit/18Kbit Block RAM PCIe® Interface Blocks

DSP Engines Hi-perf. Parallel I/O Connectivity DSP48E1 Slices SelectIO™ Technology

Hi-performance Serial I//O Connectivity Transceiver Technology Virtex®-7 FPGA

Basic Architecture 84 Cristian Sisterna ICTP 2012 The Xilinx 7 Series FPGAs

Industry’s Lowest Power and First Unified Architecture

† Spanning Low-Cost to Ultra High-End applications

Three new device families with breakthrough innovations in power efficiency, performance-capacity and price-performance

Basic Architecture 85 PageCristian 85 Sisterna ICTP 2012 FPGA Configuration

Cristian Sisterna ICTP 2012 FPGA Master

87

Cristian Sisterna ICTP 2012 FPGA Slave

88

Cristian Sisterna ICTP 2012