2384-10
ICTP Latin-American Advanced Course on FPGADesign for Scientific Instrumentation
19 November - 7 December, 2012
FPGA Introduction
SISTERNA Cristian Alejandro Universidad Nacional de San Juan Instituto de Investigaciones Antisismicas Av. San Martin 1290 - Oeste 5400 San Juan ARGENTINA
FPGA INTRODUCTION
Cristian AlejandroSisterna, MSc Universidad Nacional San Juan Argentina
ICTP 2012 - CUBA Agenda
2
Introduction
FPGA Architecture
Configuration and routing cells
Basic slice resources available in Xilinx FPGAs
Basic I/O resources available in Xilinx FPGAs
Clocking resources
Memory blocks and distributed memory
Multipliers and DSP blocks
Routing
Spartan 6, Virtex 6, Virtex 7
FPGA Configuration
Basic Architecture 2 Cristian Sisterna ICTP 2012 Introduction
Cristian Sisterna ICTP 2012 PLDs Evolution
4
Cristian Sisterna ICTP 2012 FPGA?
5
Field Programmable Gate Array
Cristian Sisterna ICTP 2012 FPGA: Competitive Market
6
Altera Inc. Actel Corp. Lucent Technology Philips Semiconductors FPGA Intel Corp. Motorola Semiconductors 2000 QuickLogic Cypress 2012 Lattice Semiconductors AMD Vantis Xilinx
Cristian Sisterna ICTP 2012 FPGA Architecture
Cristian Sisterna ICTP 2012 FPGA? – What is it?
8
Programmable Logic, Interconnections and Routing Programmable in System (ISP) Dedicated Blocks: Memory Clock Control DSP blocks Embedded processor(s) Gigabits serial transmission/reception Ethernet controller Memory controllers
Cristian Sisterna ICTP 2012 FPGA? – What is it? (cont.)
9
Up to1200 I/O More than 40 I/O standards. Single ended, Differentials More than 40.000 Flips-Flops and Look-Up-Tables (LUTs) Soft-Coded Processors, 8051, ARM3 PLLs and DLLs available (2-12) per device. Up to 550MHz. Programmable output impedance Dedicated hard coded blocks: Processors PCI E interface Gigabit transceivers Dedicated DSP blocks Memory blocks
Cristian Sisterna ICTP 2012 FPGA Architecture
10
RAM Memory Block
f CLBs iR lo aw s DSP Block
I/O Block columns Cristian Sisterna ICTP 2012 FPGA Architecture (cont.)
11
CLBs
RAM Memory RAM
DSP Block
I/O Block
Com Cristian Sisterna InterfacesICTP 2012 Spartan-6 FPGA Architecture
12
CLB I/O CMT Memory Controller BUFG BUFIO MGT Block RAM PCIe Endpoint DSP48
Cristian Sisterna ICTP 2012 13
Spartan 3 Internal View
Cristian Sisterna ICTP 2012 General Altera FPGA Architecture
14
Cristian Sisterna ICTP 2012 FPGA Silicon View
15
Cristian Sisterna ICTP 2012 Resources Available in an FPGA
16
ASMBL™ th 500 MHz 4 Generation Column-Based SmartRAM™ Advanced Logic Architecture BRAM/FIFO
Integrated 450 MHz PowerPC Cores 0.6 - 11.1 Gbps RocketIO™
Integrated Tri-Mode Ethernet MAC SelectIO with Cores ChipSync™ Technology: Integrated 500 MHz 500 MHz - 1 Gbps LVDS System Monitor Xesium™ Clocking Xtreme DSP™ Slice - 600 Mbps SE
Cristian Sisterna ICTP 2012 Xilinx FPGA Architecture Alignment
17
Virtex-6 FPGAs Spartan-6 FPGAs
760K Common Resources 150K Logic Cell Logic Cell Device LUT-6 CLB Device
BlockRAM
DSP Slices High-performance Clocking
FIFO Logic Parallel I/O Hardened Memory Controllers
Tri-mode EMAC HSS Transceivers* 3.3 Volt compatible I/O
System Monitor PCIe® Interface
Basic Architecture 17 Cristian Sisterna ICTP 2012 Xilinx FPGAs Overview
18
Cristian Sisterna ICTP 2012 FPGA Configuration and Routing Cell
Cristian Sisterna ICTP 2012 FPGA Logic Configuration Options
20
Cristian Sisterna ICTP 2012 FPGA Routing Options
21
Cristian Sisterna ICTP 2012 Logic and Routing Configuration
22
L
Configuration & Routing Bits 110101011101010010001
Cristian Sisterna ICTP 2012 FPGA Configuration Cells
23
Tipos de Celdas SRAM Anti-Fuse Flash Flash y SRAM
Cristian Sisterna ICTP 2012 FPGA Cell Comparison
24 SRAM Anti_fuse Flash
One or Two One or Two Technology Latest generation behind generation behind
Speed Slower Faster Slower Volatility yes No No Power Poor Better Medium
Density Good Best Medium
Radiation Tolerant Poor Best Medium External Configuration Yes No No Cell size 1 1/10 1/7 Reprogrammable Yes No Si Instant-On No Yes Yes
Security Poor Very Good Very Good
Config. Transistors 6 Transistors Tiny 2 transistors Cristian Sisterna ICTP 2012 FPGA Architecture: Big Fight
25
In general the FPGA arquitecture is similar among the largest vendors. Even though each vendor states the its FPGA is the BEST. . .
Who is the big winner of this The final user fight?????
Cristian Sisterna ICTP 2012 FPGA Architecture
26
The FPGAs from Xilinx are divided in
Spartan 2-3-6 Virtex 2-2P-4-5-6
Good performance High performance Great relation price/performance Expensive
Cristian Sisterna ICTP 2012 CLB SLICEs
Cristian Sisterna ICTP 2012 Spartan 3 – FPGA
28
Cristian Sisterna ICTP 2012 S3 - Configurable Logic Block (CLB)
29
CLB
Cristian Sisterna ICTP 2012 S3 - CLB – Actual Internal View
30
Cristian Sisterna ICTP 2012 S3 - CLB – Main Logic
31
Two LUTs Tow flip-flops Four outputs Two combinationals Two registered Control Input for FFs
I/O carry chain
Cristian Sisterna ICTP 2012 S3 – Half Slice Detailed View
32
Cristian Sisterna ICTP 2012 S3 - Look-Up Table
33
ABCDZ 00000 00010 00100 00111 01001 01011 ... 11000 11010 11100 11111
The LUT configuration is not responsibility of the designer
Cristian Sisterna ICTP 2012 S3 - CLB Register Elements
34
Cristian Sisterna ICTP 2012 S3 - CLB – Carry Logic
35
Cristian Sisterna ICTP 2012 S3 - Different Type of SLICEs
36
Cristian Sisterna ICTP 2012 37
CLB Internal View
Cristian Sisterna ICTP 2012 S6 - SLICEs
38
SLICEM: Full slice LUT can be used for logic and SLICEX memory/SRL Has wide multiplexers and carry SLICEM chain SLICEL: Logic and arithmetic only or LUT can only be used for logic (not memory) Has wide multiplexers and carry chain SLICEX SLICEX: Logic only LUT can only be used for logic (not SLICEL memory) No wide multiplexers or carry chain
Cristian Sisterna ICTP 2012 S6 – SLICEs
39
SliceM (25%) SliceL (25%) SliceX (50%)
LUT6 LUT6 LUT6 8 Registers 8 Registers Optimized for Logic Carry Logic Carry Logic 8 Registers Wide Function Muxes Wide Function Muxes Distributed RAM / SRL logic
Basic Architecture 39 Cristian Sisterna ICTP 2012 S6 - SLICE
Four LUTs
Eight storage elements LUT/RAM/SRL Four flip-flop/latches Four flip-flops
F7MUX and F8MUX LUT/RAM/SRL Connects LUT outputs to create wide functions Output can drive the flip-flop/latches LUT/RAM/SRL Carry chain (Slice0 only) Connected to the LUTs and the four
flip-flop/latches LUT/RAM/SRL
0 1
Basic Architecture 40 Cristian Sisterna ICTP 2012 S6 - 6-Input LUT with Dual Output
6-input LUT can be two 5-input LUTs with common inputs Minimal speed impact to 6-LUT a 6-input LUT A6
One or two outputs A5 A5 A4 A4 D Any function of six variables or A3 A3 5-LUT two independent functions of A2 A2 A1 A1 five variables O6
A5 A4 D O5 A3 5-LUT A2 A1
Basic Architecture 41 Cristian Sisterna ICTP 2012 Configuring LUTs as a Shift Register (SRL)
LUT D DQ CE CE CLK
DQ CE
DQ Q CE
DQ LUT CE
A[4:0] Q31 (cascade out)
Basic Architecture 42 Cristian Sisterna ICTP 2012 Shift Register LUT Example
20 Cycles
Operation A Operation B 64 8 Cycles 12 Cycles 64 Operation C Operation D - NOP
3 Cycles 17 Cycles Paths are Statically Balanced 20 Cycles
Operation D - NOP must add 17 pipeline stages of 64 bits each 1,088 flip-flops (hence 136 slices) or 64 SRLs (hence 16 slices)
Basic Architecture 43 Cristian Sisterna ICTP 2012 44
V5 Slice L
Cristian Sisterna ICTP 2012 I/O Resources
Cristian Sisterna ICTP 2012 46
Spartan 3 I/O Block (IOB)
Cristian Sisterna ICTPTP 202012 I/O Block
47
Cristian Sisterna ICTP 2012 Spartan 6 I/O Block Diagram
Logical Resources Electrical Resources
IOSERDES
IODELAY Parallel to serial IOLOGIC P converter
IOSERDES (serializer) Serial to parallel
LVDS converter Termination (De-serializer) IODELAY Selectable fine-
IOLOGIC IODELAY grained delay Interconnect to FPGA fabric Interconnect to FPGA N SDR and DDR IOSERDES resources
Basic Architecture 48 Cristian Sisterna ICTP 2012 S6 - FPGA Supports 40+ Standards
Each input can be 3.3 V compatible
LVCMOS (3.3 V, 2.5 V, 1.8 V, 1.5 V, and 1.2 V)
LVCMOS_JEDEC
LVPECL (3.3 V, 2.5 V)
PCI
I2C*
HSTL (1.8 V, 1.5 V; Classes I, II, III, IV) DIFF_HSTL_I, DIFF_HSTL_I_18 DIFF_HSTL_II*
SSTL (2.5 V, 1.8 V; Classes I, II) DIFF_SSTL_I, DIFF_SSTL18_I DIFF_SSTL_II*
LVDS, Bus LVDS
RSDS_25 (point-to-point)
Basic Architecture 49 Cristian Sisterna ICTP 2012 S6 - FPGA I/O Bank Structure
BANK 0 All I/Os are on the edges of the chip
I/Os are grouped into banks BANK 3 BANK 1 30 ~ 83 I/O per banks BANK 2 Eight clock pins per edge Chip View Common VCCO, VREF (LX45/T and Smaller) Restricts mixture of standards in one bank BANK 0 The differential driver is only available BANK 5 in BANK 4
BANK 1 Bank0 and Bank2 BANK 3 Differential receiver is available in all banks BANK 2 On-chip termination is available in all banks Chip View (LX100/T and Larger) Basic Architecture 50 Cristian Sisterna ICTP 2012 I/O Resources
51
Digital Controlled Impedance (DCI) Drive Strenght Slew Rate Bus Hold (Bus keeper) Pull-up/Pull-down Differential Termination IODelay (V5, V6, V7) Fixed Variable
Cristian Sisterna ICTP 2012 FPGA Memory
Cristian Sisterna ICTP 2012 FPGA Block RAM (BRAM) Features
18 kb size 18k Memory With multiple size configuration
Multiple configuration options Dual-Port True dual-port, simple dual-port, single-port BRAM Two independent ports access common data Individual address, clock, write enable, clock eenenableable Independent widths for each port
Byte-write enable
Different modes: Write first Read first, then write No change Basic Architecture 53 Cristian Sisterna ICTP 2012 S3 - Memory Block (BRAM)
54
Cristian Sisterna ICTP 2012 BRAMs Usages
55
Cristian Sisterna ICTP 2012 BRAM Configuration Size
56
1 2 4
0 0 0
4Kx4
8Kx2 4.095
8
16Kx1 8.191 0 2Kx8
2047
16+2 0
16.383 1023
Cristian Sisterna ICTP 2012 BRAM Forced Location
57
Location Constraint:
LOC
Uses the same storage that is used for the look-up table function Single Dual Simple Quad Port Port Dual Port Port Synchronous write, asynchronous read Can be converted to synchronous read 32x2 32x2D 32x6SDP 32x2Q using the flip-flops available in the slice 32x4 32x4D 64x3SDP 64x1Q 32x6 64x1D Various configurations 32x8 64x2D Single port 64x1 128x1D One LUT6 = 64x1 or 32x2 RAM 64x2 Cascadable up to 256x1 RAM 64x3 Dual port (D) 64x4 1 read / write port + 1 read-only port 128x1 Simple dual port (SDP) 128x2 1 write-only port + 1 read-only port 256x1 Quad-port (Q) 1 read / write port + 3 read-only ports
Basic Architecture 58 Cristian Sisterna ICTP 2012 S6 - Memory Controller
59
Only low cost FPGA with a “hard” memory controller
Guaranteed memory interface performance providing Reduced engineering & board design time DDR, DDR2, DDR3 & LP DDR support Up to 12.8Mbps bandwidth for each memory controller
Automatic calibration features DRAM
DRAM Multiport structure for user interface SRAMAAMM DDR Six 32-bit programmable ports from fabric Spartan-6 DDR2 DDR3 FLASHASSH LP DDR Controller interface to 4, 8 or 16 bit memories devicesce
EEPROM
Basic Architecture 59 Cristian Sisterna ICTP 2012 FPGA Multipliers and DSP Blocks
Cristian Sisterna ICTP 2012 Different Multipliers/DSP Blocks
Cristian Sisterna ICTP 2012 S3 – Multiplier Locations
62
Cristian Sisterna ICTP 2012 Spartan 3 - Multiplier
63
P = A x B
36 = 18 x 18
Pipelining (optional)
Cristian Sisterna ICTP 2012 Virtex 5/6 - Conexiones del DSP48E
64
Cristian Sisterna ICTP 2012 Virtex 5/6 - Bloque DSP48E
65
Cristian Sisterna ICTP 2012 FPGA Routing
Cristian Sisterna ICTP 2012 Routing
67
Transistor de Paso
Y0
Y
M
PIP Cristian Sisterna ICTP 2012 Routing (cont.)
68
Cristian Sisterna ICTP 2012 Routing Delay Report
69
Cristian Sisterna ICTP 2012 Routing (cont.)
70
Cristian Sisterna ICTP 2012 Routing (cont.)
71
Cristian Sisterna ICTP 2012 Clock Resources
Cristian Sisterna ICTP 2012 S3 – Digital Clock Manager
73
Cristian Sisterna ICTP 2012 S3 – Digital Clock Manager (cont.)
74
CLKIN CLK0
CLKFB CLK2X
CLK2X180 DCM
CLKDIV
CLKFX
CLKFX180
Cristian Sisterna ICTP 2012 DCM Purposes
75
Elimintating clock skew
Clock phase shifting Variable Fixed
Multiply and Divide input clock, to generate a new frequency
Duty cycle 50%
Rebuffer clock input
Cristian Sisterna ICTP 2012 DCM Aplication
76
Skew elimination on internal clock signals
Skew elimination on external clock signal
Cristian Sisterna ICTP 2012 Dedicated Clokc Routing
77
GEFH
DCM DCM
8 8 4 H H G G
F F E 8 8 8 8 E D D C C B B 8 A 8 4 A
DCM DCM CABD
Cristian Sisterna ICTP 2012 78
Dedicated Clock Routing: Real application
Cristian Sisterna ICTP 2012 Spartan-6 FPGA I/O Clock Network
BUFIO2 IO bank P N P N P N P N BUFPLL
IOLOGIC IOLOGIC IOLOGIC IOLOGIC
CMT PLL
Special clock network dedicated to I/O logical resources Independent of global clock resources Speeds up to 1 GHz Multiple sources for clocking I/O logic BUFIO2: for high-speed dedicated I/O clock signals BUFPLL: for clocks driven by the PLL in the CMT
Basic Architecture 79 Cristian Sisterna ICTP 2012 Spartan-6 FPGA Clock Management Tile (CMT)
Clocks from BUFG
Feedback clocks from BUFIO2FB
GCLK Inputs
CLKIN 6 CLKOUT<5:0> pll_clkout<5:0> PLL CLKFB
CLKIN 10 CLKOUT<9:0> dcm1_clkout<9:0> CLKFB DCM
CLKIN 10 CLKOUT<9:0> dcm2_clkout<9:0> CLKFB DCM
Basic Architecture 80 Cristian Sisterna ICTP 2012 Spartan 6 - Virtex 6 - Virtex 7
Cristian Sisterna ICTP 2012 Virtex® Product & Process Evolution
82
Virtex-6 40-nm Virtex-5 65-nm Virtex-4 90-nm Virtex-II Pro 130-nm Virtex-II 150-nm Virtex-E 180-nm Virtex 220-nm
1st Generation 2nd Generation 3rd Generation 4th Generationn 5th Generation 6th Generation
Basic Architecture 82 Cristian Sisterna ICTP 2012 Virtex-6 and Spartan-6 FPGA Sub-Families
Virtex-6 Virtex-6 Virtex-6 Virtex-6 CXT FPGA LXT FPGA SXT FPGA HXT FPGA
• Upto 3.75Gbps serial connectivity• High Logic Density • High Logic Density • High Logic Density and corresponding logic • High-Speed Serial • High-Speed Serial • Ultra High-Speed Serial performance Connectivity Connectivity Connectivity Spartan-6 Spartan-6 • Enhanced DSP LX FPGA LXT FPGA
Logic Block RAM DSP Parallel I/O Serial I/O
• Lowest Cost Logic • Lowest Cost Logic • Low-Cost Serial Connectivity
Basic Architecture 83 Cristian Sisterna ICTP 2012 Virtex 7
84
Common elements enable easy IP reuse for quick design portability across all 7 series families Artix™-7 FPGA Design scalability from low-cost to high-performance Expanded eco-system support Quickest TTM
Logic Fabric Precise, Low Jitter Clocking LUT-6 CLB MMCMs Kintex™-7 FPGA
On-Chip Memory Enhanced Connectivity 36Kbit/18Kbit Block RAM PCIe® Interface Blocks
DSP Engines Hi-perf. Parallel I/O Connectivity DSP48E1 Slices SelectIO™ Technology
Hi-performance Serial I//O Connectivity Transceiver Technology Virtex®-7 FPGA
Basic Architecture 84 Cristian Sisterna ICTP 2012 The Xilinx 7 Series FPGAs
Industry’s Lowest Power and First Unified Architecture
Spanning Low-Cost to Ultra High-End applications
Three new device families with breakthrough innovations in power efficiency, performance-capacity and price-performance
Basic Architecture 85 PageCristian 85 Sisterna ICTP 2012 FPGA Configuration
Cristian Sisterna ICTP 2012 FPGA Master
87
Cristian Sisterna ICTP 2012 FPGA Slave
88
Cristian Sisterna ICTP 2012