ECE 448 Lecture 5

FPGA Devices

ECE 448 – FPGA and ASIC Design with VHDL, George Mason University

Required reading

• 7 Series FPGAs Configurable Logic Block User Guide (UG474)
   § Overview
   § Functional Details

What is an FPGA?

[Figure: FPGA floorplan showing Configurable Logic Blocks, Block RAMs, and I/O Blocks]

Modern FPGA

[Figure: modern FPGA with RAM blocks, Multipliers/DSP units, and Logic resources]

(#Logic resources, #Multipliers/DSP units, #RAM_blocks)

Graphics based on The Design Warrior’s Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043. Copyright © 2004 Mentor Graphics Corp. (www.mentor.com)

Major FPGA Vendors

SRAM-based FPGAs
• Xilinx, Inc.: ~51% of the market
• Altera: ~34% of the market (until 2015, Altera Corp.; since 2016, a subsidiary of Intel)
• Xilinx and Altera together: ~85% of the market
• Lattice Semiconductor
• Tabula (went out of business in 2015)

Flash & antifuse FPGAs
• Microsemi SoC Products Group (until 2010, Actel)
• Quick Logic Corp.

Xilinx

• Primary products: FPGAs and the associated CAD software

[Figure: Programmable Logic Devices and CAD Software]

• Main headquarters in San Jose, CA

• Fabless* Semiconductor and Software Company
   • TSMC (Taiwan)
   • UMC (Taiwan) {*Xilinx acquired an equity stake in UMC in 1996}
   • Seiko Epson (Japan)
   • Samsung (Korea)

Xilinx FPGA Families

Technology    Low-cost                   Mid-range             High-performance
220 nm                                                         Virtex
180 nm        Spartan-II, Spartan-IIE
120/150 nm                                                     Virtex-II, Virtex-II Pro
90 nm         Spartan-3                                        Virtex-4
65 nm                                                          Virtex-5
45 nm         Spartan-6
40 nm                                                          Virtex-6
28 nm         Artix-7                    Kintex-7              Virtex-7
20 nm                                    Kintex UltraScale     Virtex UltraScale
16 nm                                    Kintex UltraScale+    Virtex UltraScale+

Artix-7 FPGA Family

CLB Structure

General structure of an FPGA

Programmable interconnect

Programmable logic blocks

Graphics based on The Design Warrior’s Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043. Copyright © 2004 Mentor Graphics Corp. (www.mentor.com)

Xilinx Artix-7 CLB

Row & Column Relationship Between CLBs & Slices

Basic Components of the Slice

[Figure 2-4: Diagram of SLICEL, annotated to highlight the LUTs and the storage elements (FF/LAT)]

Source: 7 Series FPGAs CLB User Guide, UG474 (v1.7), November 17, 2014, Chapter 2: Functional Details (www.xilinx.com)

Example of a 4-input LUT (Look-Up Table) (used in earlier families of FPGAs)

• Look-Up tables are primary elements for logic implementation
• Each LUT can implement any function of 4 inputs

[Figure: a 4-input LUT with inputs x1, x2, x3, x4 and output y, shown with two example truth tables and their equivalent logic circuits]

x1 x2 x3 x4 | y (example 1) | y (example 2)
 0  0  0  0 |       1       |       0
 0  0  0  1 |       1       |       1
 0  0  1  0 |       1       |       0
 0  0  1  1 |       1       |       0
 0  1  0  0 |       1       |       0
 0  1  0  1 |       1       |       1
 0  1  1  0 |       1       |       0
 0  1  1  1 |       1       |       1
 1  0  0  0 |       1       |       0
 1  0  0  1 |       1       |       1
 1  0  1  0 |       1       |       0
 1  0  1  1 |       1       |       0
 1  1  0  0 |       0       |       1
 1  1  0  1 |       0       |       1
 1  1  1  0 |       0       |       0
 1  1  1  1 |       0       |       0
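To make the idea of a LUT as a stored truth table concrete, here is a minimal behavioral sketch in VHDL. It is not a vendor primitive: the entity name lut4_model, the generic INIT, and the choice of x1 as the most significant index bit are assumptions made for illustration. The default INIT value encodes y = x1 nand x2, which ignores x3 and x4.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical behavioral model of a 4-input LUT (not a Xilinx primitive).
-- The 16-bit INIT generic is the truth table: bit i holds the output for
-- input value i, where the index is formed as x1 & x2 & x3 & x4 (x1 = MSB).
entity lut4_model is
  generic (
    -- Default: y = x1 nand x2, i.e. rows 0..11 give '1', rows 12..15 give '0'.
    INIT : std_logic_vector(15 downto 0) := x"0FFF"
  );
  port (
    x1, x2, x3, x4 : in  std_logic;
    y              : out std_logic
  );
end lut4_model;

architecture behavioral of lut4_model is
  signal addr : std_logic_vector(3 downto 0);
begin
  -- The inputs only select one of the 16 stored bits; no gates are evaluated.
  addr <= x1 & x2 & x3 & x4;
  y    <= INIT(to_integer(unsigned(addr)));
end behavioral;
```

Changing INIT changes the implemented function without changing the structure, which is why any function of 4 inputs fits in one such LUT.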

LUT of Artix-7

Chapter 2: Functional Details

There are four additional storage elements that can only be configured as edge-triggered D-type flip-flops. The D input can be driven by the O5 output of the LUT or the BYPASS slice inputs via AX, BX, CX, or DX input. When the original four storage elements are configured as latches, these four additional storage elements cannot be used. Figure 2-5 shows both the register only and the register/latch configuration in a slice.

[Figure 2-5: Two Versions of Configuration in a Slice: 4 Registers Only and 4 Register/Latch. The register-only elements (AFF to DFF) take their D input from the LUT O5 output or from AX to DX; the register/latch elements (AFF/LATCH to DFF/LATCH) can be configured as FF or LATCH. All elements share CE, CLK, and SR, with a common Sync/Async reset type.]

Control Signals

The control signals clock (CLK), clock enable (CE), and set/reset (SR) are common to all storage elements in one slice. When one flip-flop in a slice has SR or CE enabled, the other flip-flops used in the slice also have SR or CE enabled by the common signal. Only the CLK signal has programmable polarity. Any inverter placed on the clock signal is automatically absorbed. The CE and SR signals are active-High.

These initialization options are available for storage elements:
• SRLOW: Synchronous or asynchronous Reset when CLB SR signal is asserted
• SRHIGH: Synchronous or asynchronous Set when CLB SR signal is asserted

Reset and Set Configurations

• No set or reset
• Synchronous set
• Synchronous reset
• Asynchronous set (preset)
• Asynchronous reset (clear)
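The difference between the synchronous and asynchronous configurations is easiest to see in code. The following is a minimal sketch with hypothetical names (ff_reset_styles, q_sync, q_async); it shows only the two reset variants with an active-High SR and CE. The corresponding set variants would assign '1' instead of '0'.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: two D flip-flops that differ only in reset style.
entity ff_reset_styles is
  port (
    clk, ce, sr : in  std_logic;
    d           : in  std_logic;
    q_sync      : out std_logic;  -- flip-flop with synchronous reset
    q_async     : out std_logic   -- flip-flop with asynchronous reset (clear)
  );
end ff_reset_styles;

architecture rtl of ff_reset_styles is
begin
  -- Synchronous reset: sr is examined only at the rising clock edge,
  -- so it is not in the sensitivity list.
  sync_ff : process (clk)
  begin
    if rising_edge(clk) then
      if sr = '1' then
        q_sync <= '0';
      elsif ce = '1' then
        q_sync <= d;
      end if;
    end if;
  end process;

  -- Asynchronous reset: sr overrides the clock and takes effect immediately.
  async_ff : process (clk, sr)
  begin
    if sr = '1' then
      q_async <= '0';
    elsif rising_edge(clk) then
      if ce = '1' then
        q_async <= d;
      end if;
    end if;
  end process;
end rtl;
```

Because CLK, CE, and SR are shared by all storage elements in a slice, flip-flops written with different control signals generally cannot be packed into the same slice.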

Two Different Types of Slices in Artix-7

[Figure 2-4: Diagram of SLICEL (UG474, Chapter 2: Functional Details)]

Fast Carry Logic

• Each SliceL and SliceM contains separate logic and routing for the fast generation of sum & carry signals
   • Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters
• Carry logic is independent of normal logic and routing resources

[Figure: carry logic and its routing running through the slices from LSB to MSB]

Accessing Carry Logic

• All major synthesis tools can infer carry logic for arithmetic functions
   • Addition (SUM <= A + B)
   • Subtraction (DIFF <= A - B)
   • Comparators (if A < B then…)
   • Counters (count <= count + 1)
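The four arithmetic patterns listed above can be collected in one small design. This is a sketch under assumptions: the entity name carry_examples, the generic N, and the port names are hypothetical, and it uses ieee.numeric_std rather than the ieee.std_logic_unsigned package used elsewhere in these slides. Whether a particular tool maps each operator to the dedicated carry chain depends on the tool and on the operand width.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical example of arithmetic descriptions from which synthesis
-- tools typically infer carry logic.
entity carry_examples is
  generic (N : positive := 8);
  port (
    clk    : in  std_logic;
    a, b   : in  std_logic_vector(N-1 downto 0);
    sum    : out std_logic_vector(N-1 downto 0);
    diff   : out std_logic_vector(N-1 downto 0);
    a_lt_b : out std_logic;
    count  : out std_logic_vector(N-1 downto 0)
  );
end carry_examples;

architecture rtl of carry_examples is
  signal count_r : unsigned(N-1 downto 0) := (others => '0');
begin
  -- Addition and subtraction (results wrap around modulo 2**N).
  sum  <= std_logic_vector(unsigned(a) + unsigned(b));
  diff <= std_logic_vector(unsigned(a) - unsigned(b));

  -- Unsigned magnitude comparator.
  a_lt_b <= '1' when unsigned(a) < unsigned(b) else '0';

  -- Free-running counter.
  counter : process (clk)
  begin
    if rising_edge(clk) then
      count_r <= count_r + 1;
    end if;
  end process;

  count <= std_logic_vector(count_r);
end rtl;
```

No carry primitive is instantiated here; the operators alone give the tool enough information to choose the carry chain.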

Carry Logic

In addition to function generators, dedicated fast lookahead carry logic is provided to perform fast arithmetic addition and subtraction in a slice. A 7 series FPGA CLB has two separate carry chains, as shown in Figure 1-1. The carry chains are cascadable to form wider add/subtract logic, as shown in Figure 2-2. The carry chain runs upward and has a height of four bits per slice. For each bit, there is a carry multiplexer (MUXCY) and a dedicated XOR gate for adding/subtracting the operands with the selected carry bits. The dedicated carry path and carry multiplexer (MUXCY) can also be used to cascade function generators for implementing wide logic functions. Figure 2-24 illustrates the carry chain with associated logic in a slice.

[Figure 2-24: Fast Carry Logic Path and Associated Elements. The Carry Chain Block (CARRY4) takes CIN from the previous slice (or CYINIT) and drives COUT to the next slice. For each bit i = 0 to 3, the select input Si comes from the O6 output of the corresponding LUT (A to D) and the data input DIi comes from the O5 output or the AX to DX input; a MUXCY produces COi and the XOR produces Oi. Outputs are available on AMUX to DMUX and, optionally registered, on AQ to DQ if the unregistered/registered outputs are free.]

[Figure 2-3: Diagram of SLICEM (UG474, Slice Description)]

Xilinx Multipurpose LUT (MLUT)

• 32-bit SR (shift register)
• 64 x 1 RAM
• 64 x 1 ROM (logic)

The Design Warriors Guide to FPGAs Devices, Tools, and Flows. ISBN 0750676043 Copyright © 2004 Mentor Graphics Corp. (www.mentor.com)

Single-port 64 x 1-bit RAM
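A single-port 64 x 1-bit memory of this kind can be described behaviorally as shown below. This is a sketch, not the vendor primitive: the entity name ram64x1s_model and the port names are hypothetical. It follows the distributed-RAM behavior of a synchronous write and an asynchronous read, which is the pattern synthesis tools usually need in order to map a memory onto LUTs rather than block RAM.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical behavioral model of a single-port 64 x 1-bit distributed RAM.
entity ram64x1s_model is
  port (
    clk  : in  std_logic;
    we   : in  std_logic;                     -- write enable
    addr : in  std_logic_vector(5 downto 0);  -- shared read/write address
    din  : in  std_logic;
    dout : out std_logic
  );
end ram64x1s_model;

architecture rtl of ram64x1s_model is
  type ram_t is array (0 to 63) of std_logic;
  signal ram : ram_t := (others => '0');
begin
  -- Synchronous write.
  write_port : process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(unsigned(addr))) <= din;
      end if;
    end if;
  end process;

  -- Asynchronous read: the output follows the address without a clock edge.
  dout <= ram(to_integer(unsigned(addr)));
end rtl;
```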

Memories Built of Neighboring MLUTs

Memories built of 2 MLUTs:

• Single-port 128 x 1-bit RAM: RAM128x1S
• Dual-port 64 x 1-bit RAM: RAM64x1D

Memories built of 4 MLUTs:

• Single-port 256 x 1-bit RAM: RAM256x1S
• Dual-port 128 x 1-bit RAM: RAM128x1D
• Quad-port 64 x 1-bit RAM: RAM64x1Q
• Simple-dual-port 64 x 3-bit RAM: RAM64x3SDP (one address for read, one address for write)

Dual-port 64 x 1 RAM

• Dual-port 64 x 1-bit RAM: 64x1D
• Single-port 128 x 1-bit RAM: 128x1S
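The dual-port variant adds a second, read-only address. A minimal behavioral sketch follows; the entity name ram64x1d_model and the port names (addr_a, addr_b, dout_a, dout_b) are hypothetical and chosen for readability, not taken from the vendor library. Port A is used for writing and reading, port B only for reading, and both reads are asynchronous.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical behavioral model of a dual-port 64 x 1-bit distributed RAM.
entity ram64x1d_model is
  port (
    clk    : in  std_logic;
    we     : in  std_logic;                     -- write enable for port A
    addr_a : in  std_logic_vector(5 downto 0);  -- write address, also read address A
    addr_b : in  std_logic_vector(5 downto 0);  -- read-only address B
    din    : in  std_logic;
    dout_a : out std_logic;
    dout_b : out std_logic
  );
end ram64x1d_model;

architecture rtl of ram64x1d_model is
  type ram_t is array (0 to 63) of std_logic;
  signal ram : ram_t := (others => '0');
begin
  -- Single synchronous write port.
  write_port : process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(unsigned(addr_a))) <= din;
      end if;
    end if;
  end process;

  -- Two independent asynchronous read ports over the same contents.
  dout_a <= ram(to_integer(unsigned(addr_a)));
  dout_b <= ram(to_integer(unsigned(addr_b)));
end rtl;
```

The second read port is why this memory costs 2 MLUTs for the same 64 bits: each LUT holds a copy of the data and is addressed independently for reading.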

Total Size of Distributed RAM in Artix-7

MLUT as a 32-bit Shift Register (SRL32)
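The shift-register mode can be sketched behaviorally as follows. The entity name srl32_model and the port names are hypothetical. The description deliberately has no reset, since a shift register with a reset on every stage usually cannot be packed into a single LUT; whether a given tool actually maps this code onto an SRL is tool dependent.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical behavioral model of a 32-bit shift register with a
-- dynamically selectable output tap.
entity srl32_model is
  port (
    clk : in  std_logic;
    ce  : in  std_logic;                     -- shift enable
    din : in  std_logic;                     -- serial input
    tap : in  std_logic_vector(4 downto 0);  -- selects the delay: tap + 1 cycles
    q   : out std_logic;                     -- output at the selected tap
    q31 : out std_logic                      -- last stage, for cascading
  );
end srl32_model;

architecture rtl of srl32_model is
  signal sr : std_logic_vector(31 downto 0) := (others => '0');
begin
  shift : process (clk)
  begin
    if rising_edge(clk) then
      if ce = '1' then
        -- New bit enters at index 0; older bits move toward index 31.
        sr <= sr(30 downto 0) & din;
      end if;
    end if;
  end process;

  q   <= sr(to_integer(unsigned(tap)));
  q31 <= sr(31);
end rtl;
```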

Input/Output Blocks (IOBs)

Basic I/O Block Structure

[Figure: basic I/O block with three paths. Three-State Control: a flip-flop (D, Q, EC, SR) with Three-State and FF Enable inputs. Output Path: a flip-flop (D, Q, EC, SR) with Output and FF Enable inputs. Input Path: Direct Input, plus a Registered Input through a flip-flop (D, Q, EC, SR) with FF Enable. Clock and Set/Reset are shared.]

IOB Functionality

• IOB provides interface between the package pins and CLBs
• Each IOB can work as uni- or bi-directional I/O
• Outputs can be forced into High Impedance
• Inputs and outputs can be registered
   • advised for high-performance I/O
• Inputs can be delayed
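A bidirectional pin with a three-state output and registered paths can be described as in the sketch below. The entity name iob_model and all port names are hypothetical. The example registers the output data, the output enable, and one copy of the input, mirroring the three flip-flops of the basic I/O block; whether a tool places these registers inside the IOB depends on tool settings and constraints.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical model of a bidirectional I/O with registered output,
-- registered three-state control, and direct plus registered input.
entity iob_model is
  port (
    clk     : in    std_logic;
    oe      : in    std_logic;  -- '1' drives the pad, '0' releases it
    dout    : in    std_logic;  -- data from the fabric to the pad
    din     : out   std_logic;  -- direct (unregistered) input
    din_reg : out   std_logic;  -- registered input
    pad     : inout std_logic   -- package pin
  );
end iob_model;

architecture rtl of iob_model is
  signal out_r : std_logic := '0';
  signal oe_r  : std_logic := '0';
begin
  io_regs : process (clk)
  begin
    if rising_edge(clk) then
      out_r   <= dout;  -- output path register
      oe_r    <= oe;    -- three-state control register
      din_reg <= pad;   -- input path register
    end if;
  end process;

  -- Output forced to high impedance when not enabled.
  pad <= out_r when oe_r = '1' else 'Z';

  -- Direct input path.
  din <= pad;
end rtl;
```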

Family Attributes

Artix-7 FPGA Family

FPGA device present on the Digilent Basys 3 board

XC7A35T-1CPG236C

• XC7A: Artix-7 family
• 35T: Size
• -1: Speed Grade
• CPG: Package type
• 236: 236 pins
• C: Commercial temperature range, 0 °C – 85 °C

FPGA Design Process

FPGA Design process (1)

Specification / Pseudocode

    Design and implement a simple unit that speeds up encryption with an RC5-like cipher with a fixed key set on an 8031 microcontroller. Unlike in Experiment 5, this time your unit has to be able to perform an encryption algorithm by itself, executing 32 rounds…..

On-paper hardware design (Block diagram & ASM chart)

VHDL description (Your Source Files)

    library ieee;
    use ieee.std_logic_1164.all;
    use ieee.std_logic_unsigned.all;

    entity RC5_core is
      port (
        clock, reset, encr_decr : in  std_logic;
        data_input              : in  std_logic_vector(31 downto 0);
        data_output             : out std_logic_vector(31 downto 0);
        out_full                : in  std_logic;
        key_input               : in  std_logic_vector(31 downto 0);
        key_read                : out std_logic  -- no semicolon after the last port
      );
    end RC5_core;  -- must match the entity name

Functional simulation

Synthesis

Post-synthesis simulation

FPGA Design process (2)

Implementation

Timing simulation

Results

Configuration

On-chip testing

Synthesis

Logic Synthesis

VHDL description → Circuit netlist

    architecture MLU_DATAFLOW of MLU is
      signal A1 : STD_LOGIC;
      signal B1 : STD_LOGIC;
      signal Y1 : STD_LOGIC;
      signal MUX_0, MUX_1, MUX_2, MUX_3 : STD_LOGIC;
    begin
      A1 <= A when (NEG_A = '0') else not A;
      B1 <= B when (NEG_B = '0') else not B;
      Y  <= Y1 when (NEG_Y = '0') else not Y1;

      MUX_0 <= A1 and B1;
      MUX_1 <= A1 or B1;
      MUX_2 <= A1 xor B1;
      MUX_3 <= A1 xnor B1;

      with (L1 & L0) select
        Y1 <= MUX_0 when "00",
              MUX_1 when "01",
              MUX_2 when "10",
              MUX_3 when others;
    end MLU_DATAFLOW;
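The architecture above refers to an entity MLU that is not shown on the slide. For completeness, a matching declaration might look like the following sketch. The port names are taken from the signals used in the architecture; their directions and the std_logic type are assumptions inferred from how they are used.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Assumed entity for the MLU_DATAFLOW architecture: port names come from
-- the architecture body, directions and types are inferred, not given.
entity MLU is
  port (
    NEG_A, NEG_B, NEG_Y : in  std_logic;  -- optional inversion of A, B, and Y
    A, B                : in  std_logic;  -- data inputs
    L1, L0              : in  std_logic;  -- select and / or / xor / xnor
    Y                   : out std_logic   -- result
  );
end MLU;
```

Some VHDL analyzers reject a bare concatenation such as (L1 & L0) as a selector because its type is ambiguous. A common workaround is to assign the concatenation to an intermediate std_logic_vector(1 downto 0) signal and select on that signal.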

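Circuit netlist (RTL view)

Implementation

Mapping

[Figure: the netlist is mapped onto device primitives: LUT0, LUT1, LUT2, FF1, FF2]

Placing

[Figure: mapped logic placed into CLB slices of the FPGA]

Routing

[Figure: placed slices connected through the programmable connections of the FPGA]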

Two main stages of the FPGA Design Flow

The flow has two main stages, Synthesis and Implementation, and moves from technology-independent to technology-dependent steps: RTL Synthesis, Map, Place & Route, Configure.

RTL Synthesis
- Code analysis
- Derivation of main logic constructions
- Technology independent optimization
- Creation of RTL View

Map
- Mapping of extracted logic structures to device primitives
- Technology dependent optimization
- Application of synthesis constraints
- Netlist generation
- Creation of Technology View

Place & Route
- Placement of generated netlist onto the device
- Choosing best interconnect structure for the placed design
- Application of physical constraints

Configure
- Bitstream generation
- Burning device