ECE 448 Lecture 5
FPGA Devices
ECE 448 – FPGA and ASIC Design with VHDL
George Mason University
Required reading
• 7 Series FPGAs Configurable Logic Block: User Guide
§ Overview § Functional Details
What is an FPGA?
[Figure: FPGA composed of Configurable Logic Blocks, Block RAMs, and I/O Blocks]
Modern FPGA
[Figure: modern FPGA with RAM blocks, multipliers/DSP units, and logic resources]
(#Logic resources, #Multipliers/DSP units, #RAM_blocks)
Graphics based on The Design Warrior's Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043. Copyright © 2004 Mentor Graphics Corp. (www.mentor.com)
Major FPGA Vendors
SRAM-based FPGAs
• Xilinx, Inc.: ~ 51% of the market
• Intel: ~ 34% of the market (until 2015, Altera Corp.)
  (Xilinx and Intel combined: ~ 85% of the market)
• Lattice Semiconductor
• Atmel (since 2016, subsidiary of Microchip Technology)
• Achronix Semiconductor
• Tabula (went out of business in 2015)
Flash & antifuse FPGAs
• Microsemi SoC Products Group (until 2010, Actel)
• QuickLogic Corp.
Xilinx
• Primary products: FPGAs (programmable logic devices) and the associated CAD software
• Main headquarters in San Jose, CA
• Fabless* semiconductor and software company
  • TSMC (Taiwan)
  • UMC (Taiwan)
  • Seiko Epson (Japan)
  • Samsung (Korea)
* Xilinx acquired an equity stake in UMC in 1996
Xilinx FPGA Families
Technology    Low-cost                   Mid-range             High-performance
220 nm        -                          -                     Virtex
180 nm        Spartan-II, Spartan-IIE    -                     -
120/150 nm    -                          -                     Virtex-II, Virtex-II Pro
90 nm         Spartan-3                  -                     Virtex-4
65 nm         -                          -                     Virtex-5
45 nm         Spartan-6                  -                     -
40 nm         -                          -                     Virtex-6
28 nm         Artix-7                    Kintex-7              Virtex-7
20 nm         -                          Kintex UltraScale     Virtex UltraScale
16 nm         -                          Kintex UltraScale+    Virtex UltraScale+
FPGA Family
Artix-7 FPGA Family
CLB Structure
General structure of an FPGA
Programmable interconnect
Programmable logic blocks
Xilinx Artix-7 CLB
Row & Column Relationship Between CLBs & Slices
Basic Components of the Slice
• LUTs
• Storage Elements
[Figure 2-4: Diagram of SLICEL. Source: 7 Series FPGAs CLB User Guide, UG474 (v1.7), November 17, 2014, www.xilinx.com, p. 20]
Example of a 4-input LUT (Look-Up Table) (used in earlier families of FPGAs)
• Look-Up tables are primary elements for logic implementation
• Each LUT can implement any function of 4 inputs

Two example LUT contents (inputs x1 x2 x3 x4, output y):

x1 x2 x3 x4 | y        x1 x2 x3 x4 | y
 0  0  0  0 | 1         0  0  0  0 | 0
 0  0  0  1 | 1         0  0  0  1 | 1
 0  0  1  0 | 1         0  0  1  0 | 0
 0  0  1  1 | 1         0  0  1  1 | 0
 0  1  0  0 | 1         0  1  0  0 | 0
 0  1  0  1 | 1         0  1  0  1 | 1
 0  1  1  0 | 1         0  1  1  0 | 0
 0  1  1  1 | 1         0  1  1  1 | 1
 1  0  0  0 | 1         1  0  0  0 | 0
 1  0  0  1 | 1         1  0  0  1 | 1
 1  0  1  0 | 1         1  0  1  0 | 0
 1  0  1  1 | 1         1  0  1  1 | 0
 1  1  0  0 | 0         1  1  0  0 | 1
 1  1  0  1 | 0         1  1  0  1 | 1
 1  1  1  0 | 0         1  1  1  0 | 0
 1  1  1  1 | 0         1  1  1  1 | 0

[Figure: equivalent logic circuits; the circuit for the left table has inputs x1, x2 and output y]
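To make the idea of a LUT as a small stored truth table concrete, here is a minimal VHDL sketch. The entity name lut4_example and the constant INIT are illustrative names, not part of the lecture. The stored pattern is 1 for the first twelve input combinations and 0 for the last four, which is the function y = x1 nand x2.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: a 4-input LUT modeled as a 16-entry table.
entity lut4_example is
  port (
    x1, x2, x3, x4 : in  std_logic;
    y              : out std_logic
  );
end lut4_example;

architecture dataflow of lut4_example is
  -- Truth table contents; entry 0 (x1 x2 x3 x4 = "0000") is the leftmost bit.
  -- Entries 0..11 hold '1', entries 12..15 hold '0', i.e. y = x1 nand x2.
  constant INIT : std_logic_vector(0 to 15) := "1111111111110000";
  signal addr   : unsigned(3 downto 0);
begin
  -- The four inputs form the table address; x1 is the most significant bit.
  addr <= x1 & x2 & x3 & x4;
  -- Reading the table is all a LUT does.
  y <= INIT(to_integer(addr));
end dataflow;
```

Changing only the INIT constant changes the implemented function, which is why one LUT can realize any function of its inputs.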
LUT of Artix-7
Chapter 2: Functional Details
There are four additional storage elements that can only be configured as edge-triggered D-type flip-flops. The D input can be driven by the O5 output of the LUT or the BYPASS slice inputs via AX, BX, CX, or DX input. When the original four storage elements are configured as latches, these four additional storage elements cannot be used. Figure 2-5 shows both the register only and the register/latch configuration in a slice.
[Figure 2-5: Two Versions of Configuration in a Slice: 4 Registers Only and 4 Register/Latch. In both versions the elements share CE, CLK, and SR, and the reset type is selectable as Sync or Async.]
Control Signals
The control signals clock (CLK), clock enable (CE), and set/reset (SR) are common to all storage elements in one slice. When one flip-flop in a slice has SR or CE enabled, the other flip-flops used in the slice also have SR or CE enabled by the common signal. Only the CLK signal has programmable polarity. Any inverter placed on the clock signal is automatically absorbed. The CE and SR signals are active-High.
These initialization options are available for storage elements:
• SRLOW: Synchronous or asynchronous Reset when CLB SR signal is asserted
• SRHIGH: Synchronous or asynchronous Set when CLB SR signal is asserted
Reset and Set Configurations
• No set or reset
• Synchronous set
• Synchronous reset
• Asynchronous set (preset)
• Asynchronous reset (clear)
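The difference between the synchronous and asynchronous variants is easiest to see in code. The sketch below, with the hypothetical entity name ff_reset_styles, describes one flip-flop with a synchronous reset and one with an asynchronous set; the other configurations follow the same two templates with the forced value swapped.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity showing two of the listed configurations.
entity ff_reset_styles is
  port (
    clk, ce, sr : in  std_logic;
    d           : in  std_logic;
    q_sync_rst  : out std_logic;
    q_async_set : out std_logic
  );
end ff_reset_styles;

architecture rtl of ff_reset_styles is
begin
  -- Synchronous reset: SR is examined only at the rising clock edge.
  sync_reset : process (clk)
  begin
    if rising_edge(clk) then
      if sr = '1' then
        q_sync_rst <= '0';
      elsif ce = '1' then
        q_sync_rst <= d;
      end if;
    end if;
  end process;

  -- Asynchronous set (preset): SR takes effect immediately,
  -- so it appears in the sensitivity list and is tested before the clock.
  async_set : process (clk, sr)
  begin
    if sr = '1' then
      q_async_set <= '1';
    elsif rising_edge(clk) then
      if ce = '1' then
        q_async_set <= d;
      end if;
    end if;
  end process;
end rtl;
```

Omitting the sr branch entirely gives the "no set or reset" configuration.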
Two Different Types of Slices in Artix-7
SLICEL
[Figure 2-4: Diagram of SLICEL]
Fast Carry Logic
• Each SliceL and SliceM contains separate logic and routing for the fast generation of sum & carry signals
  • Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters
• Carry logic is independent of normal logic and routing resources
[Figure: carry logic chain running from LSB to MSB, separate from the general routing]
Accessing Carry Logic
• All major synthesis tools can infer carry logic for arithmetic functions
  • Addition (SUM <= A + B)
  • Subtraction (DIFF <= A - B)
  • Comparators (if A < B then…)
  • Counters (count <= count +1)
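As an illustration of the arithmetic forms listed above, the following sketch collects an adder, a comparator, and a counter in one entity. The entity name carry_inference and the generic W are assumptions made for this example. Whether a given tool actually maps each operator onto the dedicated carry chain depends on the tool, its settings, and the operand width.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: arithmetic written so that synthesis
-- can infer carry logic instead of building it from plain LUTs.
entity carry_inference is
  generic (W : positive := 8);
  port (
    clk    : in  std_logic;
    a, b   : in  unsigned(W-1 downto 0);
    sum    : out unsigned(W-1 downto 0);
    a_lt_b : out std_logic;
    count  : out unsigned(W-1 downto 0)
  );
end carry_inference;

architecture rtl of carry_inference is
  signal count_r : unsigned(W-1 downto 0) := (others => '0');
begin
  -- Addition
  sum <= a + b;

  -- Comparator
  a_lt_b <= '1' when a < b else '0';

  -- Counter
  process (clk)
  begin
    if rising_edge(clk) then
      count_r <= count_r + 1;
    end if;
  end process;
  count <= count_r;
end rtl;
```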
Carry Logic
In addition to function generators, dedicated fast lookahead carry logic is provided to perform fast arithmetic addition and subtraction in a slice. A 7 series FPGA CLB has two separate carry chains, as shown in Figure 1-1. The carry chains are cascadable to form wider add/subtract logic, as shown in Figure 2-2. The carry chain runs upward and has a height of four bits per slice. For each bit, there is a carry multiplexer (MUXCY) and a dedicated XOR gate for adding/subtracting the operands with a selected carry bit. The dedicated carry path and carry multiplexer (MUXCY) can also be used to cascade function generators for implementing wide logic functions. Figure 2-24 illustrates the carry chain with associated logic in a slice.
[Figure 2-24: Fast Carry Logic Path and Associated Elements. The carry chain block (CARRY4) receives CIN from the previous slice and passes COUT to the next slice. For each of the four bits, S0-S3 come from the O6 outputs of LUTA-LUTD and DI0-DI3 from the O5 outputs or the AX-DX inputs; the outputs O0-O3 and CO0-CO3 go to AMUX-DMUX or, optionally registered, to AQ-DQ.]
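The per-bit behavior described above (one MUXCY and one XOR gate per bit, four bits per slice) can be written as a small behavioral model. This is a simplified sketch, not the vendor primitive: the entity name carry4_model is hypothetical and the CYINIT input is left out.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical behavioral model of a 4-bit carry chain.
entity carry4_model is
  port (
    cin : in  std_logic;                     -- carry from the previous slice
    s   : in  std_logic_vector(3 downto 0);  -- select inputs (from LUT O6)
    di  : in  std_logic_vector(3 downto 0);  -- data inputs (from LUT O5 or AX..DX)
    o   : out std_logic_vector(3 downto 0);  -- XOR outputs (sum bits)
    co  : out std_logic_vector(3 downto 0)   -- carry out of each bit
  );
end carry4_model;

architecture behavioral of carry4_model is
begin
  process (cin, s, di)
    variable c : std_logic;
  begin
    c := cin;
    for i in 0 to 3 loop
      -- Dedicated XOR gate: combines the select input with the incoming carry.
      o(i) <= s(i) xor c;
      -- MUXCY: propagate the incoming carry when s(i) = '1',
      -- otherwise take the carry from di(i).
      if s(i) = '0' then
        c := di(i);
      end if;
      co(i) <= c;
    end loop;
  end process;
end behavioral;
```

For an adder, each LUT would supply s(i) = a(i) xor b(i) and di(i) = a(i): when the operand bits are equal the carry out equals either of them, and when they differ the incoming carry is passed on.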
SLICEM
[Figure 2-3: Diagram of SLICEM. Source: 7 Series FPGAs CLB User Guide, UG474 (v1.7), p. 19]
Xilinx Multipurpose LUT (MLUT)
• 32-bit SR
• 64 x 1 RAM
• 64 x 1 ROM (logic)
Single-port 64 x 1-bit RAM
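A single-port 64 x 1-bit memory of this kind is usually described behaviorally and left to the synthesis tool to map. The sketch below assumes the usual distributed-RAM coding style (synchronous write, asynchronous read); the entity name ram64x1s_sketch is illustrative, and the final mapping to an MLUT is the tool's decision.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: single-port 64 x 1-bit RAM.
entity ram64x1s_sketch is
  port (
    clk  : in  std_logic;
    we   : in  std_logic;
    addr : in  std_logic_vector(5 downto 0);
    d    : in  std_logic;
    o    : out std_logic
  );
end ram64x1s_sketch;

architecture rtl of ram64x1s_sketch is
  type ram_t is array (0 to 63) of std_logic;
  signal ram : ram_t := (others => '0');
begin
  -- Synchronous write
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(unsigned(addr))) <= d;
      end if;
    end if;
  end process;

  -- Asynchronous read from the same address
  o <= ram(to_integer(unsigned(addr)));
end rtl;
```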
Memories Built of Neighboring MLUTs
Memories built of 2 MLUTs:
• Single-port 128 x 1-bit RAM: RAM128x1S
• Dual-port 64 x 1-bit RAM: RAM64x1D
Memories built of 4 MLUTs:
• Single-port 256 x 1-bit RAM: RAM256x1S
• Dual-port 128 x 1-bit RAM: RAM128x1D
• Quad-port 64 x 1-bit RAM: RAM64x1Q
• Simple-dual-port 64 x 3-bit RAM: RAM64x3SDP (one address for read, one address for write)
Dual-port 64 x 1 RAM
• Dual-port 64 x 1-bit RAM: 64x1D
• Single-port 128 x 1-bit RAM: 128x1S
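A dual-port 64 x 1-bit memory adds a second, read-only address to the single-port case. The following is a behavioral sketch under that assumption; the entity name ram64x1d_sketch and the port names are chosen for this example, with one shared write port and two asynchronous read ports.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: dual-port 64 x 1-bit RAM.
entity ram64x1d_sketch is
  port (
    clk  : in  std_logic;
    we   : in  std_logic;
    a    : in  std_logic_vector(5 downto 0);  -- read/write address
    dpra : in  std_logic_vector(5 downto 0);  -- read-only address
    d    : in  std_logic;
    spo  : out std_logic;                     -- data at address a
    dpo  : out std_logic                      -- data at address dpra
  );
end ram64x1d_sketch;

architecture rtl of ram64x1d_sketch is
  type ram_t is array (0 to 63) of std_logic;
  signal ram : ram_t := (others => '0');
begin
  -- Single synchronous write port
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(unsigned(a))) <= d;
      end if;
    end if;
  end process;

  -- Two independent asynchronous read ports
  spo <= ram(to_integer(unsigned(a)));
  dpo <= ram(to_integer(unsigned(dpra)));
end rtl;
```

The two read ports explain why this memory needs two MLUTs: both hold the same contents, and each is read with its own address.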
Total Size of Distributed RAM in Artix-7
MLUT as a 32-bit Shift Register (SRL32)
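A shift register that a tool can map into an MLUT is typically written without any reset and with a single serial input. The sketch below follows that style; the entity name srl32_sketch is hypothetical, and the address a selects which of the 32 stages drives the output.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example entity: 32-bit shift register with selectable tap.
entity srl32_sketch is
  port (
    clk : in  std_logic;
    ce  : in  std_logic;
    d   : in  std_logic;
    a   : in  std_logic_vector(4 downto 0);  -- tap select, 0 to 31
    q   : out std_logic
  );
end srl32_sketch;

architecture rtl of srl32_sketch is
  signal sr : std_logic_vector(31 downto 0) := (others => '0');
begin
  -- Shift in one bit per enabled clock edge; note the absence of a reset.
  process (clk)
  begin
    if rising_edge(clk) then
      if ce = '1' then
        sr <= sr(30 downto 0) & d;
      end if;
    end if;
  end process;

  -- Tap a gives a delay of a + 1 enabled clock cycles.
  q <= sr(to_integer(unsigned(a)));
end rtl;
```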
Input/Output Blocks (IOBs)
Basic I/O Block Structure
[Figure: basic I/O block with three paths, each with an optional flip-flop (D, Q, EC, SR): three-state control (Three-State FF Enable), output path (Output FF Enable), and input path (Input FF Enable, with Direct Input and Registered Input). The flip-flops share Clock and Set/Reset.]
IOB Functionality
• IOB provides interface between the package pins and CLBs
• Each IOB can work as uni- or bi-directional I/O
• Outputs can be forced into High Impedance
• Inputs and outputs can be registered
  • advised for high-performance I/O
• Inputs can be delayed
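The IOB features listed above correspond to a familiar coding pattern for a bidirectional pin. The sketch below uses a hypothetical entity iob_sketch with registered output, registered three-state control, and registered input. Whether these registers are actually placed inside the IOB depends on the tool and on constraints, so treat the placement as an assumption.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example entity: bidirectional pin with registered paths.
entity iob_sketch is
  port (
    clk      : in    std_logic;
    out_data : in    std_logic;  -- value to drive onto the pin
    out_en   : in    std_logic;  -- '1' drives the pin, '0' releases it
    in_data  : out   std_logic;  -- registered value read from the pin
    pad      : inout std_logic   -- package pin
  );
end iob_sketch;

architecture rtl of iob_sketch is
  signal out_reg : std_logic := '0';
  signal tri_reg : std_logic := '1';  -- start in high impedance
  signal in_reg  : std_logic := '0';
begin
  process (clk)
  begin
    if rising_edge(clk) then
      out_reg <= out_data;    -- output path register
      tri_reg <= not out_en;  -- three-state control register
      in_reg  <= pad;         -- input path register
    end if;
  end process;

  -- The output is forced into high impedance when tri_reg = '1'.
  pad     <= out_reg when tri_reg = '0' else 'Z';
  in_data <= in_reg;
end rtl;
```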
Family Attributes
Artix-7 FPGA Family
FPGA device present on the Digilent Nexys 4 DDR board
XC7A35T-1CPG236C
• XC7A: Artix-7 family
• 35T: Size
• -1: Speed Grade
• CPG: Package type
• 236: 236 pins
• C: Commercial temperature range, 0 °C to 85 °C
FPGA Design Process
FPGA Design process (1)
Specification / Pseudocode
Design and implement a simple unit that speeds up encryption with an RC5-like cipher with a fixed key set on an 8031 microcontroller. Unlike in experiment 5, this time your unit has to be able to perform the encryption algorithm by itself, executing 32 rounds…..
On-paper hardware design (Block diagram & ASM chart)
VHDL description (Your Source Files)
  library IEEE;
  use ieee.std_logic_1164.all;
  use ieee.std_logic_unsigned.all;

  entity RC5_core is
    port(
      clock, reset, encr_decr: in std_logic;
      data_input: in std_logic_vector(31 downto 0);
      data_output: out std_logic_vector(31 downto 0);
      out_full: in std_logic;
      key_input: in std_logic_vector(31 downto 0);
      key_read: out std_logic
    );
  end RC5_core;

Functional simulation
Synthesis
Post-synthesis simulation
FPGA Design process (2)
Implementation
Timing simulation
Results
Configuration
On chip testing
Synthesis
Logic Synthesis
VHDL description

  architecture MLU_DATAFLOW of MLU is
    signal A1: STD_LOGIC;
    signal B1: STD_LOGIC;
    signal Y1: STD_LOGIC;
    signal MUX_0, MUX_1, MUX_2, MUX_3: STD_LOGIC;
    signal SEL: STD_LOGIC_VECTOR(1 downto 0);
  begin
    -- Optional inversion of inputs and output
    A1 <= A when (NEG_A='0') else not A;
    B1 <= B when (NEG_B='0') else not B;
    Y  <= Y1 when (NEG_Y='0') else not Y1;

    -- Candidate logic functions
    MUX_0 <= A1 and B1;
    MUX_1 <= A1 or B1;
    MUX_2 <= A1 xor B1;
    MUX_3 <= A1 xnor B1;

    -- Selector named explicitly; the type of a bare (L1 & L0) is ambiguous in VHDL-93
    SEL <= L1 & L0;
    with SEL select
      Y1 <= MUX_0 when "00",
            MUX_1 when "01",
            MUX_2 when "10",
            MUX_3 when others;
  end MLU_DATAFLOW;

Circuit netlist
Circuit netlist (RTL view)
Implementation
Mapping
[Figure: netlist logic mapped onto LUT0, LUT1, LUT2 and flip-flops FF1, FF2]
Placing
[Figure: mapped logic placed into CLB slices of the FPGA]
Routing
[Figure: placed slices of the FPGA connected through programmable connections]
Two main stages of the FPGA Design Flow
Stages: Synthesis and Implementation
• RTL Synthesis (technology independent)
  - Code analysis
  - Derivation of main logic constructions
  - Technology independent optimization
  - Creation of RTL View
• Map (technology dependent)
  - Mapping of extracted logic structures to device primitives
  - Technology dependent optimization
  - Application of synthesis constraints
  - Netlist generation
  - Creation of Technology View
• Place & Route
  - Placement of generated netlist onto the device
  - Choosing best interconnect structure for the placed design
  - Application of physical constraints
• Configure
  - Bitstream generation
  - Burning device