ECE 448 Lecture 5
FPGA Devices & FPGA Design Flow
ECE 448 – FPGA and ASIC Design with VHDL
George Mason University

Required reading
• Spartan-6 FPGA Configurable Logic Block User Guide
  § CLB Overview
  § Slice Description
What is an FPGA?
[Figure: FPGA layout with Configurable Logic Blocks, Block RAMs, and I/O Blocks]
Modern FPGA
[Figure: a modern FPGA with RAM blocks, multipliers/DSP units, and logic blocks]
A modern FPGA is characterized by (#Logic resources, #Multipliers/DSP units, #RAM_blocks).
Graphics based on The Design Warrior’s Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043. Copyright © 2004 Mentor Graphics Corp. (www.mentor.com)

Major FPGA Vendors
SRAM-based FPGAs
• Xilinx, Inc. (~51% of the market)
• Altera Corp. (~34% of the market)
  (Xilinx and Altera together: ~85%)
• Lattice Semiconductor
• Atmel
• Achronix
• Tabula
Flash & antifuse FPGAs
• Microsemi SoC Products Group (formerly Actel Corp.)
• QuickLogic Corp.
Xilinx
• Primary products: FPGAs and the associated CAD software
  - Programmable Logic Devices
  - ISE Alliance and Foundation Series Design Software
• Main headquarters in San Jose, CA
• Fabless* semiconductor and software company; foundry partners:
  - UMC (Taiwan) {*Xilinx acquired an equity stake in UMC in 1996}
  - Seiko Epson (Japan)
  - TSMC (Taiwan)
  - Samsung (Korea)
Xilinx FPGA Families

Technology | Low-cost                | High-performance
-----------+-------------------------+-------------------------
220 nm     |                         | Virtex
180 nm     | Spartan-II, Spartan-IIE |
120/150 nm |                         | Virtex-II, Virtex-II Pro
90 nm      | Spartan-3               | Virtex-4
65 nm      |                         | Virtex-5
45 nm      | Spartan-6               |
40 nm      |                         | Virtex-6
28 nm      | Artix-7                 | Virtex-7
Spartan-6 FPGA Family
CLB Structure
General structure of an FPGA
[Figure: programmable logic blocks connected by programmable interconnect]
Figure based on The Design Warrior’s Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043. Copyright © 2004 Mentor Graphics Corp. (www.mentor.com)
Xilinx Spartan-6 CLB
Row & Column Relationship Between CLBs & Slices
SLICEX
4-input LUT (Look-Up Table) (used in earlier families of FPGAs)
[Figure: 16-row truth tables of functions y(x1, x2, x3, x4), and a 4-input LUT with inputs x1–x4 and output y that stores such a table]
• Look-up tables are the primary elements for logic implementation
• Each LUT can implement any function of 4 inputs
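Because a LUT is a small memory addressed by its inputs, "any function of 4 inputs" simply means choosing the 16 stored bits. A minimal Python sketch of that idea; the helper names `lut4_init` and `lut4_eval`, and the choice of x1 as the most significant address bit, are illustrative assumptions rather than a vendor convention:

```python
from typing import Callable


def lut4_init(f: Callable[[int, int, int, int], int]) -> int:
    """Return the 16 LUT bits for f: bit k holds f at the inputs whose
    binary value x1 x2 x3 x4 equals k (x1 assumed to be the MSB)."""
    init = 0
    for k in range(16):
        x1, x2, x3, x4 = (k >> 3) & 1, (k >> 2) & 1, (k >> 1) & 1, k & 1
        init |= (f(x1, x2, x3, x4) & 1) << k
    return init


def lut4_eval(init: int, x1: int, x2: int, x3: int, x4: int) -> int:
    """The inputs only form an address; the output is the stored bit."""
    address = (x1 << 3) | (x2 << 2) | (x3 << 1) | x4
    return (init >> address) & 1


def example(x1: int, x2: int, x3: int, x4: int) -> int:
    """An arbitrary 4-input function: y = NOT(x1 AND x2) OR (x3 XOR x4)."""
    return (1 - (x1 & x2)) | (x3 ^ x4)


INIT = lut4_init(example)
# Reading every address back reproduces the function exactly.
for k in range(16):
    bits = ((k >> 3) & 1, (k >> 2) & 1, (k >> 1) & 1, k & 1)
    assert lut4_eval(INIT, *bits) == example(*bits)
print(f"INIT = 0x{INIT:04X}")   # INIT = 0x6FFF
```

Each of the 16 bits can be chosen independently, so a 4-input LUT can hold any of 2^16 = 65,536 functions; the example function itself is arbitrary.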
6-Input LUT of Spartan-6
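The Spartan-6 LUT has six inputs and two outputs, O6 and O5. It implements either one function of up to six inputs on O6, or two functions of up to five shared inputs on O6 and O5, which is why the map report later in this lecture counts LUTs "using O6 output only" separately from LUTs "using O5 and O6". A behavioral sketch, assuming the LUT6_2 primitive's convention that O6 reads all 64 INIT bits through I5..I0 while O5 reads only the lower 32 bits through I4..I0; the Python names are hypothetical:

```python
def lut6_2(init: int, i: int) -> tuple[int, int]:
    """Return (O6, O5) for a 64-bit INIT and a 6-bit input value I5..I0.

    O6 reads INIT[I5:I0] (all 64 bits); O5 reads INIT[I4:I0] (lower 32 bits only).
    """
    o6 = (init >> (i & 0x3F)) & 1
    o5 = (init >> (i & 0x1F)) & 1
    return o6, o5


def truth_table5(f) -> int:
    """32-bit truth table of a 5-input function f, indexed by the 5-bit input value."""
    return sum((f(a) & 1) << a for a in range(32))


def and5(a: int) -> int:
    return int(a == 0b11111)          # AND of all five inputs


def parity5(a: int) -> int:
    return bin(a).count("1") % 2      # XOR of all five inputs


# Upper half feeds O6 when I5 = 1; the lower half always feeds O5.
INIT64 = (truth_table5(and5) << 32) | truth_table5(parity5)

for a in range(32):
    assert lut6_2(INIT64, 0b100000 | a) == (and5(a), parity5(a))   # I5 tied high
```

With I5 tied high, one physical LUT delivers two unrelated 5-input functions, as long as they share the same five inputs.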
Reset and Set Configurations
• No set or reset
• Synchronous set
• Synchronous reset
• Asynchronous set (preset)
• Asynchronous reset (clear)
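The five configurations differ in two ways: whether the SR input forces Q to 1 (set) or 0 (reset), and whether it acts immediately (asynchronous) or only at the next active clock edge (synchronous). A toy cycle-level Python model of that difference; `SliceFF` and its mode strings are hypothetical and do not describe the slice circuitry:

```python
class SliceFF:
    """Toy cycle-level model of a flip-flop with one set/reset configuration."""

    MODES = ("none", "sync_set", "sync_reset", "async_set", "async_reset")

    def __init__(self, mode: str, q: int = 0) -> None:
        if mode not in self.MODES:
            raise ValueError(f"unknown mode: {mode}")
        self.mode, self.q, self.sr = mode, q, 0

    def _forced_value(self) -> int:
        # Reset (clear) forces 0; set (preset) forces 1.
        return 0 if self.mode.endswith("reset") else 1

    def drive_sr(self, level: int) -> None:
        """Change the SR input; asynchronous modes act at once, without a clock edge."""
        self.sr = level & 1
        if self.sr and self.mode.startswith("async"):
            self.q = self._forced_value()

    def clock_edge(self, d: int) -> None:
        """Rising clock edge: an asserted SR (if configured) wins over D."""
        if self.sr and self.mode != "none":
            self.q = self._forced_value()
        else:
            self.q = d & 1


sync_ff, async_ff = SliceFF("sync_reset", q=1), SliceFF("async_reset", q=1)
sync_ff.drive_sr(1)
async_ff.drive_sr(1)
print(sync_ff.q, async_ff.q)   # 1 0: only the asynchronous reset acted without a clock
sync_ff.clock_edge(d=1)
print(sync_ff.q)               # 0: the synchronous reset takes effect at the edge
```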
Three Different Types of Slices
[Figure: share of each slice type in Spartan-6 devices: SLICEX 50%, SLICEL 25%, SLICEM 25%]
SLICEL
Fast Carry Logic
• Each CLB contains separate logic and routing for the fast generation of sum and carry signals
  - Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters
• Carry logic is independent of normal logic and routing resources
[Figure: the carry chain runs from the LSB to the MSB through dedicated carry logic and carry routing]
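A sketch of what the dedicated carry logic computes per bit, modeled on the MUXCY/XORCY structure of Xilinx carry chains (MUXCY is the element counted in the map report later in this lecture): the LUT produces the propagate signal p = a XOR b, the carry multiplexer either passes the incoming carry (p = 1) or takes a_i (p = 0), and an XOR forms the sum bit. The function names are hypothetical, and the comparator reuses the adder the way a subtraction-based A < B would:

```python
def carry_chain_add(a: int, b: int, width: int, cin: int = 0) -> tuple[int, int]:
    """Add two unsigned width-bit operands bit by bit, LSB to MSB.

    Per bit i: p = a_i XOR b_i (propagate, from the LUT);
               s_i = p XOR c_i (XORCY);
               c_(i+1) = c_i if p else a_i (MUXCY: pass the carry on, or generate/kill it).
    Returns (sum modulo 2**width, carry out of the MSB).
    """
    carry, total = cin & 1, 0
    for i in range(width):
        a_i, b_i = (a >> i) & 1, (b >> i) & 1
        p = a_i ^ b_i
        total |= (p ^ carry) << i
        carry = carry if p else a_i     # when p = 0, a_i == b_i: both 1 generate, both 0 kill
    return total, carry


def less_than(a: int, b: int, width: int) -> bool:
    """A < B computed as A + NOT(B) + 1: a missing carry out means a borrow occurred."""
    mask = (1 << width) - 1
    _, carry = carry_chain_add(a, ~b & mask, width, cin=1)
    return carry == 0


# Exhaustive check for 4-bit operands.
for x in range(16):
    for y in range(16):
        s, c = carry_chain_add(x, y, 4)
        assert s | (c << 4) == x + y
        assert less_than(x, y, 4) == (x < y)
```

The carry signal never leaves the chain between bits, which is what lets the dedicated routing outrun an adder built from general-purpose LUTs and interconnect.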
Accessing Carry Logic
• All major synthesis tools can infer carry logic for arithmetic functions:
  - Addition (SUM <= A + B)
  - Subtraction (DIFF <= A - B)
  - Comparators (if A < B then…)
  - Counters (count <= count + 1)
SLICEM
Xilinx Multipurpose LUT (MLUT)
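[Figure: a multipurpose LUT (MLUT) can serve as logic (LUT/ROM), RAM, or a shift register (SR). Labels for the 4-input version: 4-input LUT (logic), 16 x 1 RAM, 16-bit SR; for the 6-input version: 64 x 1 ROM, 64 x 1 RAM, 32-bit SR]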
Figure based on The Design Warrior’s Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043. Copyright © 2004 Mentor Graphics Corp. (www.mentor.com)
Single-port 64 x 1-bit RAM
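A single MLUT used as RAM stores 64 bits; a write happens on the clock edge when write enable is high, while the read output follows the address without waiting for a clock (asynchronous read). A minimal Python sketch of that behavior; the class is hypothetical, and the port names A, D, WE, and O loosely follow the RAM64X1S primitive:

```python
class Ram64x1S:
    """Single-port 64 x 1-bit distributed RAM: synchronous write, asynchronous read."""

    def __init__(self, init: int = 0) -> None:
        self.bits = init & ((1 << 64) - 1)    # the 64 memory cells of one MLUT

    def read(self, a: int) -> int:
        """Output O follows address A immediately (no clock involved)."""
        return (self.bits >> (a & 0x3F)) & 1

    def clock(self, a: int, d: int, we: int) -> None:
        """Rising write-clock edge: store D at address A only if WE = 1."""
        if we:
            a &= 0x3F
            self.bits = (self.bits & ~(1 << a)) | ((d & 1) << a)


ram = Ram64x1S()
ram.clock(a=5, d=1, we=1)
ram.clock(a=6, d=1, we=0)          # ignored: write enable is low
print(ram.read(5), ram.read(6))    # 1 0
```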
Memories Built of Neighboring MLUTs
Memories built of 2 MLUTs:
• Single-port 128 x 1-bit RAM: RAM128x1S
• Dual-port 64 x 1-bit RAM: RAM64x1D
Memories built of 4 MLUTs:
• Single-port 256 x 1-bit RAM: RAM256x1S
• Dual-port 128 x 1-bit RAM: RAM128x1D
• Quad-port 64 x 1-bit RAM: RAM64x1Q
• Simple-dual-port 64 x 3-bit RAM: RAM64x3SDP (one address for read, one address for write)
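The listed memories combine MLUTs in two ways: for more depth, each MLUT holds part of the words and the extra address bit selects which output is used; for more ports, every MLUT receives the same write but has its own read address. A self-contained Python sketch of both patterns for the 2-MLUT cases (all class names hypothetical; SPO and DPO follow the output names of the RAM64X1D primitive):

```python
class Mlut64:
    """One MLUT as 64 x 1 RAM: synchronous write, asynchronous read."""

    def __init__(self) -> None:
        self.bits = 0

    def read(self, a: int) -> int:
        return (self.bits >> (a & 0x3F)) & 1          # uses the low 6 address bits

    def write(self, a: int, d: int) -> None:
        a &= 0x3F
        self.bits = (self.bits & ~(1 << a)) | ((d & 1) << a)


class Ram128x1S:
    """Depth expansion: two MLUTs, address bit 6 chooses which one is read or written."""

    def __init__(self) -> None:
        self.halves = (Mlut64(), Mlut64())

    def read(self, a: int) -> int:
        return self.halves[(a >> 6) & 1].read(a)      # 2:1 multiplexer on the extra address bit

    def clock(self, a: int, d: int, we: int) -> None:
        if we:
            self.halves[(a >> 6) & 1].write(a, d)     # only the addressed MLUT is written


class Ram64x1D:
    """Port expansion: both MLUTs take every write; each one has its own read address."""

    def __init__(self) -> None:
        self.copies = (Mlut64(), Mlut64())

    def read(self, a: int, dpra: int) -> tuple[int, int]:
        """(SPO, DPO): read at the write address A and at the read-only address DPRA."""
        return self.copies[0].read(a), self.copies[1].read(dpra)

    def clock(self, a: int, d: int, we: int) -> None:
        if we:
            for lut in self.copies:
                lut.write(a, d)


deep = Ram128x1S()
deep.clock(a=100, d=1, we=1)
print(deep.read(100), deep.read(100 - 64))   # 1 0: address 36 lives in the other MLUT

dual = Ram64x1D()
dual.clock(a=10, d=1, we=1)
print(dual.read(a=3, dpra=10))               # (0, 1)
```

The 4-MLUT variants in the list extend the same two ideas: more halves for depth, more copies for read ports.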
Dual-port 64 x 1 RAM
• Dual-port 64 x 1-bit RAM: RAM64x1D
• Single-port 128 x 1-bit RAM: RAM128x1S
Total Size of Distributed RAM
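Every Spartan-6 slice has four 6-input LUTs, and only SLICEM LUTs can act as 64 x 1 RAM, so the total distributed RAM is (#SLICEM) x 4 x 64 bits. A worked Python sketch using the map report later in this lecture, which lists 1,200 LUTs usable as memory ("Number used as Memory: 0 out of 1,200"); the helper name is hypothetical:

```python
LUTS_PER_SLICE = 4     # each Spartan-6 slice (SLICEX, SLICEL, or SLICEM) has four 6-input LUTs
BITS_PER_MLUT = 64     # one MLUT used as RAM stores 64 x 1 bits


def distributed_ram_bits(n_slicem: int) -> int:
    """Total distributed RAM (bits) when every SLICEM LUT is used as 64 x 1 RAM."""
    return n_slicem * LUTS_PER_SLICE * BITS_PER_MLUT


# The map report lists 1,200 LUTs usable as memory, i.e. 1,200 / 4 = 300 SLICEMs.
n_slicem = 1_200 // LUTS_PER_SLICE
bits = distributed_ram_bits(n_slicem)
print(n_slicem, bits, bits // 1024)    # 300 76800 75
```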
MLUT as a 32-bit Shift Register (SRL32)
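In shift-register mode, the MLUT shifts a new bit in on every clock edge with clock enable high, and a 5-bit address selects which of the 32 stages drives the output, giving a delay line of 1 to 32 cycles; the last stage is also available for cascading into another SRL32. A Python sketch loosely following the ports of the SRLC32E primitive (D, CE, A, Q, Q31); the class itself is hypothetical:

```python
class Srl32:
    """Behavioral sketch of a 32-bit addressable shift register in one MLUT."""

    def __init__(self) -> None:
        self.stages = [0] * 32            # stages[0] holds the most recently shifted-in bit

    def clock(self, d: int, ce: int = 1) -> None:
        """Rising clock edge: if CE = 1, every stage moves one place and D enters stage 0."""
        if ce:
            self.stages = [d & 1] + self.stages[:-1]

    def q(self, a: int) -> int:
        """Dynamic tap: stage A (0..31), i.e. D delayed by A + 1 clock edges."""
        return self.stages[a & 0x1F]

    def q31(self) -> int:
        """Last stage, used to cascade into the next shift register."""
        return self.stages[31]


srl = Srl32()
srl.clock(1)                 # a single 1 enters the register
for _ in range(3):
    srl.clock(0)             # followed by three 0s
print(srl.q(3), srl.q(2))    # 1 0: with A = 3 the pulse appears after 4 clock edges
```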
Input/Output Blocks (IOBs)
Basic I/O Block Structure
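[Figure: basic I/O block with three flip-flops, each with D, Q, clock enable (EC), and set/reset (SR), sharing the Clock and Set/Reset inputs: the three-state path (registered three-state control), the output path (registered output), and the input path (direct or registered input)]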
IOB Functionality
• IOB provides the interface between the package pins and the CLBs
• Each IOB can work as a uni- or bidirectional I/O
• Outputs can be forced into high impedance
• Inputs and outputs can be registered
  - advised for high-performance I/O
• Inputs can be delayed
Spartan-6 Family Attributes
Spartan-6 FPGA Family Members
FPGA device present on the Digilent Nexys 3 board
XC6SLX16-CSG324C
• XC6S: Spartan-6 family
• LX: logic optimized
• 16: size
• CSG: package type (chip-scale ball grid)
• 324: number of pins
• C: commercial temperature range, 0 °C – 85 °C
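The fields annotated above can be split mechanically. A small Python sketch that handles only the format shown on this slide; other variants (for example, part numbers that include a speed grade) are outside its assumed pattern, and `decode_part` is a hypothetical helper:

```python
import re

# Assumed pattern: exactly the format shown on the slide, e.g. "XC6SLX16-CSG324C".
PART_RE = re.compile(r"^XC6SLX(?P<size>\d+)-(?P<package>[A-Z]+)(?P<pins>\d+)(?P<temp>[A-Z])$")
TEMPERATURE = {"C": "commercial, 0 °C to 85 °C"}   # only the grade shown on the slide


def decode_part(part: str) -> dict[str, str]:
    """Split a Spartan-6 LX part number into the fields annotated on the slide."""
    m = PART_RE.match(part.strip().upper())
    if m is None:
        raise ValueError(f"unsupported part number format: {part!r}")
    return {
        "family": "Spartan-6",
        "type": "logic optimized (LX)",
        "size": m["size"],
        "package": m["package"],
        "pins": m["pins"],
        "temperature": TEMPERATURE.get(m["temp"], "grade " + m["temp"] + " (not covered here)"),
    }


print(decode_part("XC6SLX16-CSG324C"))
```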
FPGA Design Flow
FPGA Design process (1)
Specification / Pseudocode
Example: "Design and implement a simple unit that speeds up encryption on an 8031 microcontroller, using an RC5-like cipher with a fixed key set. Unlike in Experiment 5, this time your unit has to be able to perform the encryption algorithm by itself, executing 32 rounds….."
On-paper hardware design (Block diagram & ASM chart)
VHDL description (Your Source Files)
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;   -- IEEE-standard arithmetic (std_logic_unsigned is a non-standard Synopsys package)

entity RC5_core is
  port(
    clock, reset, encr_decr : in  std_logic;
    data_input              : in  std_logic_vector(31 downto 0);
    data_output             : out std_logic_vector(31 downto 0);
    out_full                : in  std_logic;
    key_input               : in  std_logic_vector(31 downto 0);
    key_read                : out std_logic   -- last port: no trailing semicolon
  );
end RC5_core;   -- the end label must match the entity name

Functional simulation
FPGA Design process (2)
Synthesis
Post-synthesis simulation
Implementation
Timing simulation
Configuration
On chip testing

Tools used in FPGA Design Flow
[Figure: tool flow]
Design → functionally verified VHDL code
VHDL code → Synthesis (Xilinx XST, Synplify Premier) → Netlist
Netlist → Implementation (Xilinx ISE) → Bitstream

Synthesis
Synthesis Tools
• Xilinx XST
• Synplify Premier
• … and others
Logic Synthesis
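VHDL description:

architecture MLU_DATAFLOW of MLU is
  signal A1 : STD_LOGIC;
  signal B1 : STD_LOGIC;
  signal Y1 : STD_LOGIC;
  signal MUX_0, MUX_1, MUX_2, MUX_3 : STD_LOGIC;
  signal SEL : STD_LOGIC_VECTOR(1 downto 0);  -- typed select signal: "L1 & L0" alone has an ambiguous type, rejected by many VHDL-93 tools
begin
  -- optional inversion of the inputs and of the output
  A1 <= A  when (NEG_A = '0') else not A;
  B1 <= B  when (NEG_B = '0') else not B;
  Y  <= Y1 when (NEG_Y = '0') else not Y1;

  -- the four candidate logic functions
  MUX_0 <= A1 and B1;
  MUX_1 <= A1 or B1;
  MUX_2 <= A1 xor B1;
  MUX_3 <= A1 xnor B1;

  -- L1 & L0 choose which function drives Y1
  SEL <= L1 & L0;
  with SEL select
    Y1 <= MUX_0 when "00",
          MUX_1 when "01",
          MUX_2 when "10",
          MUX_3 when others;
end MLU_DATAFLOW;

[Figure: circuit netlist synthesized from this description]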
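To see what the synthesized netlist must compute, the architecture can be modeled bit by bit. This Python sketch (a behavioral model written for illustration, using lowercase versions of the port names) enumerates all 2^7 input combinations and checks that the output depends on all seven inputs, so the function cannot fit into a single 6-input LUT:

```python
from itertools import product


def mlu(a: int, b: int, neg_a: int, neg_b: int, neg_y: int, l1: int, l0: int) -> int:
    """Bit-level model of architecture MLU_DATAFLOW."""
    a1 = a ^ neg_a                       # A1 <= A when NEG_A = '0' else not A
    b1 = b ^ neg_b                       # B1 <= B when NEG_B = '0' else not B
    candidates = (a1 & b1, a1 | b1, a1 ^ b1, 1 - (a1 ^ b1))   # MUX_0 .. MUX_3
    y1 = candidates[(l1 << 1) | l0]      # selected by L1 & L0
    return y1 ^ neg_y                    # Y <= Y1 when NEG_Y = '0' else not Y1


TABLE = {bits: mlu(*bits) for bits in product((0, 1), repeat=7)}


def depends_on(i: int) -> bool:
    """True if flipping input i alone changes the output for some input combination."""
    return any(
        TABLE[bits] != TABLE[bits[:i] + (1 - bits[i],) + bits[i + 1:]] for bits in TABLE
    )


print(len(TABLE), all(depends_on(i) for i in range(7)))   # 128 True
```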
Circuit netlist (RTL view)
Mapping
[Figure: netlist logic mapped onto device primitives: look-up tables LUT0, LUT1, LUT2 and flip-flops FF1, FF2]
Implementation
• After synthesis, the entire implementation process is performed by FPGA vendor tools
Translation
[Figure: Translation combines the circuit netlist from Synthesis with the constraints, entered in the Timing Constraint Editor or a text editor and saved as a UCF (User Constraint File), into an NGD (Native Generic Database) file]
Mapping
[Figure: netlist logic mapped onto LUT0, LUT1, LUT2 and flip-flops FF1, FF2]
Placing
[Figure: mapped logic placed into the CLB slices of the FPGA]
Routing
[Figure: placed slices connected through the FPGA's programmable connections]
Configuration
• Once a design is implemented, you must create a file that the FPGA can understand
• This file is called a bitstream: a BIT file (.bit extension)
• The BIT file can be downloaded directly to the FPGA, or can be converted into a PROM file which stores the programming information
Two main stages of the FPGA Design Flow: Synthesis and Implementation

RTL Synthesis (technology independent)
- Code analysis
- Derivation of main logic constructions
- Technology independent optimization
- Creation of "RTL View"

Map (technology dependent)
- Mapping of extracted logic structures to device primitives
- Technology dependent optimization
- Application of "synthesis constraints"
- Netlist generation
- Creation of "Technology View"

Place & Route (technology dependent)
- Placement of generated netlist onto the device
- Choosing best interconnect structure for the placed design
- Application of "physical constraints"

Configure (technology dependent)
- Bitstream generation
- Burning device

Synthesis Report Example – Resource Utilization (1)
Device utilization summary:
---------------------------
Selected Device : 6slx4tqg144-3

Slice Logic Utilization:
 Number of Slice Registers:              53  out of   4800     1%
 Number of Slice LUTs:                  163  out of   2400     6%
    Number used as Logic:               163  out of   2400     6%

Slice Logic Distribution:
 Number of LUT Flip Flop pairs used:    198
   Number with an unused Flip Flop:     145  out of    198    73%
   Number with an unused LUT:            35  out of    198    17%
   Number of fully used LUT-FF pairs:    18  out of    198     9%
   Number of unique control sets:         7
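The figures above can be cross-checked: each LUT-FF pair uses a LUT only, a flip-flop only, or both, so the three categories must add up to the pair count, and the LUT and register totals follow from them. A small Python check; the percentage rule is an assumption inferred from this report, since every value shown here matches truncation (for example, 163/2400 = 6.79% is printed as 6%):

```python
# Counts copied from the synthesis report above.
PAIRS, UNUSED_FF, UNUSED_LUT, FULLY_USED = 198, 145, 35, 18


def luts_and_ffs_used(pairs: int, unused_ff: int, unused_lut: int, fully_used: int) -> tuple[int, int]:
    """Each LUT-FF pair is LUT-only, FF-only, or fully used; derive the totals."""
    if unused_ff + unused_lut + fully_used != pairs:
        raise ValueError("the three categories must add up to the number of pairs")
    luts = pairs - unused_lut    # pairs whose LUT is used
    ffs = pairs - unused_ff      # pairs whose flip-flop is used
    return luts, ffs


def report_percent(used: int, available: int) -> int:
    """Whole-number percentage; assumed to be truncated, which matches this report."""
    return used * 100 // available


print(luts_and_ffs_used(PAIRS, UNUSED_FF, UNUSED_LUT, FULLY_USED))   # (163, 53)
print([report_percent(u, n) for u, n in [(53, 4800), (163, 2400), (145, 198), (35, 198), (18, 198)]])
# [1, 6, 73, 17, 9]
```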
Synthesis Report Example – Resource Utilization (2)
IO Utilization:
 Number of IOs:                          43
 Number of bonded IOBs:                  43  out of    102    42%

Specific Feature Utilization:
 Number of BUFG/BUFGCTRLs:                1  out of     16     6%
 Number of DSP48A1s:                      5  out of      8    62%
Synthesis Report Example – Timing
Timing Summary:
---------------
Speed Grade: -3

   Minimum period: 6.031ns (Maximum Frequency: 165.817MHz)
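The maximum frequency is the reciprocal of the minimum period. A one-line Python check (the helper name is hypothetical); the small difference from the reported 165.817 MHz presumably comes from the period being printed rounded to three decimals:

```python
def fmax_mhz(min_period_ns: float) -> float:
    """Maximum clock frequency in MHz for a minimum period in ns (f = 1 / T)."""
    return 1_000.0 / min_period_ns


print(round(fmax_mhz(6.031), 2))   # 165.81
```

For comparison, the post-place-and-route report later in this lecture gives a minimum period of 7.530 ns, about 132.8 MHz; the synthesis figure is an estimate made before placement and routing.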
Map Report Example – Resource Utilization (1)
Design Summary
--------------
Slice Logic Utilization:
  Number of Slice Registers:                    54 out of   4,800    1%
    Number used as Flip Flops:                  53
    Number used as Latches:                      0
    Number used as Latch-thrus:                  0
    Number used as AND/OR logics:                1
  Number of Slice LUTs:                        149 out of   2,400    6%
    Number used as logic:                      148 out of   2,400    6%
      Number using O6 output only:             133
      Number using O5 output only:               0
      Number using O5 and O6:                   15
      Number used as ROM:                        0
    Number used as Memory:                       0 out of   1,200    0%
    Number used exclusively as route-thrus:      1
Map Report Example – Resource Utilization (2)
Slice Logic Distribution:
  Number of occupied Slices:                    58 out of     600    9%
  Number of MUXCYs used:                        32 out of   1,200    2%
  Number of LUT Flip Flop pairs used:          162
    Number with an unused Flip Flop:           109 out of     162   67%
    Number with an unused LUT:                  13 out of     162    8%
    Number of fully used LUT-FF pairs:          40 out of     162   24%
    Number of unique control sets:               7
    Number of slice register sites lost
      to control set restrictions:              35 out of   4,800    1%

IO Utilization:
  Number of bonded IOBs:                        43 out of     102   42%
Map Report Example – Resource Utilization (3)
Specific Feature Utilization:
  Number of RAMB16BWERs:                         0 out of      12    0%
  Number of RAMB8BWERs:                          0 out of      24    0%
  …….
  Number of DSP48A1s:                            5 out of       8   62%
  …….
Post-PAR Static Timing Report
Clock to Setup on destination clock clk_i
---------------+---------+---------+---------+---------+
               | Src:Rise| Src:Fall| Src:Rise| Src:Fall|
Source Clock   |Dest:Rise|Dest:Rise|Dest:Fall|Dest:Fall|
---------------+---------+---------+---------+---------+
clk_i          |    7.530|         |         |         |
---------------+---------+---------+---------+---------+
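PAR Report
------------------------------------------------------------------------------------------------
  Constraint                                | Check | Worst Case |  Best Case | Timing | Timing
                                            |       |    Slack   | Achievable | Errors |  Score
------------------------------------------------------------------------------------------------
  Autotimespec constraint for clock net clk | SETUP |         N/A|     7.530ns|     N/A|      0
  _i_BUFGP                                  | HOLD  |     0.457ns|            |       0|      0
------------------------------------------------------------------------------------------------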
Timing Report (1)
Timing constraint: Default period analysis for net "clk_i_BUFGP"

 3354 paths analyzed, 309 endpoints analyzed, 0 failing endpoints
 0 timing errors detected. (0 setup errors, 0 hold errors)
 Minimum period is 7.530ns.
--------------------------------------------------------------------------------
Delay (setup path):     7.530ns (data path - clock path skew + uncertainty)
  Source:               a_register/q_o_4 (FF)
  Destination:          x_reg_inst/q_o_3 (FF)
  Data Path Delay:      7.453ns (Levels of Logic = 2)
  Clock Path Skew:      -0.042ns (0.513 - 0.555)
  Source Clock:         clk_i_BUFGP rising
  Destination Clock:    clk_i_BUFGP rising
  Clock Uncertainty:    0.035ns
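The reported setup-path delay follows the formula printed next to it: data path delay minus clock path skew plus clock uncertainty, with the skew given as the difference of the two clock delays in parentheses. A Python check against both paths in this report (the helper name is hypothetical):

```python
def setup_path_delay(data_path_ns: float, clock_skew_ns: float, uncertainty_ns: float) -> float:
    """Delay (setup path) = data path - clock path skew + uncertainty, as labeled in the report."""
    return data_path_ns - clock_skew_ns + uncertainty_ns


# (data path delay, the two clock delays in the skew's parentheses, clock uncertainty), in ns
paths = {
    "a_register/q_o_4":   (7.453, 0.513, 0.555, 0.035),
    "a_register/q_o_7_1": (7.391, 0.513, 0.571, 0.035),
}
for source, (data, clk_a, clk_b, unc) in paths.items():
    skew = clk_a - clk_b                  # -0.042 ns and -0.058 ns, as printed
    print(source, round(setup_path_delay(data, skew, unc), 3))
# a_register/q_o_4 7.53
# a_register/q_o_7_1 7.484
```

A negative skew (the destination clock arriving earlier than the source clock) adds to the effective delay, which is why the worst path exceeds its 7.453 ns data path.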
Timing Report (2)
Maximum Data Path at Slow Process Corner: a_register/q_o_4 to x_reg_inst/q_o_3
  Location             Delay type         Delay(ns)  Physical Resource
                                                     Logical Resource(s)
  -------------------------------------------------  -------------------
  SLICE_X4Y36.AQ       Tcko                  0.447   a_register/q_o<4>
                                                     a_register/q_o_4
  DSP48_X0Y3.B4        net (fanout=21)       1.194   a_register/q_o<4>
  DSP48_X0Y3.M3        Tdspdo_B_M            3.364   Mmult_mult_unsigned
                                                     Mmult_mult_unsigned
  SLICE_X8Y39.C4       net (fanout=1)        2.050   mult_unsigned<3>
  SLICE_X8Y39.CLK      Tas                   0.398   x_reg_inst/q_o<3>
                                                     Mmux_x_57
                                                     Mmux_x_4_f7_2
                                                     Mmux_x_2_f8_2
                                                     x_reg_inst/q_o_3
  -------------------------------------------------  ---------------------------
  Total                                      7.453ns (4.209ns logic, 3.244ns route)
                                                     (56.5% logic, 43.5% route)
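The path total is the sum of its rows, with the cell delays (Tcko, Tdspdo_B_M, Tas) counted as logic and the net delays counted as route. A Python sketch that reproduces the summary line; the row classification mirrors this report, and the helper name is hypothetical:

```python
# (delay type, delay in ns, counted as logic?) for each row of the path above
PATH = [
    ("Tcko", 0.447, True),              # source flip-flop, clock to output
    ("net (fanout=21)", 1.194, False),  # routing to the DSP48 B input
    ("Tdspdo_B_M", 3.364, True),        # DSP48 multiplier delay (B input to M output, per its name)
    ("net (fanout=1)", 2.050, False),   # routing to the destination slice
    ("Tas", 0.398, True),               # destination slice, through to the register setup
]


def summarize(rows):
    """Return (total, logic, route, logic %) the way the report's summary line does."""
    logic = sum(delay for _, delay, is_logic in rows if is_logic)
    route = sum(delay for _, delay, is_logic in rows if not is_logic)
    total = logic + route
    return round(total, 3), round(logic, 3), round(route, 3), round(100 * logic / total, 1)


print(summarize(PATH))   # (7.453, 4.209, 3.244, 56.5)
```

Replacing the first net delay with 1.132 ns reproduces the 7.391 ns path of Timing Report (4), since the other rows are identical.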
Timing Report (3)
--------------------------------------------------------------------------------
Delay (setup path):     7.484ns (data path - clock path skew + uncertainty)
  Source:               a_register/q_o_7_1 (FF)
  Destination:          x_reg_inst/q_o_3 (FF)
  Data Path Delay:      7.391ns (Levels of Logic = 2)
  Clock Path Skew:      -0.058ns (0.513 - 0.571)
  Source Clock:         clk_i_BUFGP rising
  Destination Clock:    clk_i_BUFGP rising
  Clock Uncertainty:    0.035ns

  Clock Uncertainty:          0.035ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter (TSJ):  0.070ns
    Total Input Jitter (TIJ):   0.000ns
    Discrete Jitter (DJ):       0.000ns
    Phase Error (PE):           0.000ns
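The clock uncertainty formula printed above can be evaluated directly; with only system jitter present it reduces to TSJ / 2. A Python sketch (the helper name is hypothetical):

```python
from math import sqrt


def clock_uncertainty(tsj: float, tij: float, dj: float, pe: float) -> float:
    """((TSJ^2 + TIJ^2)^(1/2) + DJ) / 2 + PE, all values in ns."""
    return (sqrt(tsj ** 2 + tij ** 2) + dj) / 2 + pe


print(round(clock_uncertainty(tsj=0.070, tij=0.000, dj=0.000, pe=0.000), 3))   # 0.035
```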
Timing Report (4)
Maximum Data Path at Slow Process Corner: a_register/q_o_7_1 to x_reg_inst/q_o_3
  Location             Delay type         Delay(ns)  Physical Resource
                                                     Logical Resource(s)
  -------------------------------------------------  -------------------
  SLICE_X2Y33.AQ       Tcko                  0.447   a_register/q_o_7_2
                                                     a_register/q_o_7_1
  DSP48_X0Y3.B7        net (fanout=13)       1.132   a_register/q_o_7_1
  DSP48_X0Y3.M3        Tdspdo_B_M            3.364   Mmult_mult_unsigned
                                                     Mmult_mult_unsigned
  SLICE_X8Y39.C4       net (fanout=1)        2.050   mult_unsigned<3>
  SLICE_X8Y39.CLK      Tas                   0.398   x_reg_inst/q_o<3>
                                                     Mmux_x_57
                                                     Mmux_x_4_f7_2
                                                     Mmux_x_2_f8_2
                                                     x_reg_inst/q_o_3
  -------------------------------------------------  ---------------------------
  Total                                      7.391ns (4.209ns logic, 3.182ns route)
                                                     (56.9% logic, 43.1% route)