ECE 448 Lecture 7
FPGA Devices & FPGA Design Flow
George Mason University

Required reading
• P. Chu, RTL Hardware Design using VHDL
  Chapter 1, Introduction to Digital System Design
• Spartan-6 FPGA CLB, User Guide
  § CLB Overview
  § Slice Description
Two competing implementation approaches

ASIC (Application Specific Integrated Circuit)
• designed all the way from behavioral description to physical layout
• designs must be sent for expensive and time-consuming fabrication in a semiconductor foundry

FPGA (Field Programmable Gate Array)
• no physical layout design; design ends with a bitstream used to configure a device
• bought off the shelf and reconfigured by designers themselves
Which Way to Go?

ASICs
• High performance
• Low power
• Low cost in high volumes

FPGAs
• Off-the-shelf
• Low development cost
• Short time to market
• Reconfigurability
What is an FPGA?
[Figure: FPGA floorplan with configurable logic blocks, I/O blocks, and block RAMs]
Modern FPGA
[Figure: FPGA floorplan with RAM blocks, multipliers/DSP units, and logic resources]
(#Logic resources, #Multipliers/DSP units, #RAM_blocks)
Graphics based on The Design Warrior’s Guide to FPGAs: Devices, Tools, and Flows. ISBN 0750676043. Copyright © 2004 Mentor Graphics Corp. (www.mentor.com)

Major FPGA Vendors
SRAM-based FPGAs
• Xilinx, Inc. ~ 51% of the market
• Altera Corp. ~ 34% of the market
  (Xilinx and Altera together: ~ 85%)
• Lattice Semiconductor
• Atmel

Flash & antifuse FPGAs
• Actel Corp. (Microsemi SoC Products Group)
• QuickLogic Corp.
Xilinx
• Primary products: FPGAs and the associated CAD software
  (Programmable Logic Devices; ISE Alliance and Foundation Series Design Software)
• Main headquarters in San Jose, CA
• Fabless* Semiconductor and Software Company
  • UMC (Taiwan) {*Xilinx acquired an equity stake in UMC in 1996}
  • Seiko Epson (Japan)
  • TSMC (Taiwan)
  • Samsung (Korea)
Xilinx FPGA Families

Technology    Low-cost                  High-performance
220 nm        -                         Virtex
180 nm        Spartan II, Spartan IIE   -
120/150 nm    -                         Virtex II, Virtex II Pro
90 nm         Spartan 3                 Virtex 4
65 nm         -                         Virtex 5
45 nm         Spartan 6                 -
40 nm         -                         Virtex 6
28 nm         Artix 7                   Virtex 7

Altera FPGA Families

Technology    Low-cost      Mid-range   High-performance
130 nm        Cyclone       -           Stratix
90 nm         Cyclone II    -           Stratix II
65 nm         Cyclone III   Arria I     Stratix III
40 nm         Cyclone IV    Arria II    Stratix IV
28 nm         Cyclone V     Arria V     Stratix V
Spartan 6 FPGA Family

CLB Structure

General structure of an FPGA
[Figure: an array of programmable logic blocks connected by programmable interconnect]
Xilinx Spartan 6 CLB

Row & Column Relationship Between CLBs & Slices

Three Different Types of Slices
[Figure: share of each slice type: SLICEX 50%, SLICEL 25%, SLICEM 25%]

SLICEX

SLICEL
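Xilinx Multipurpose LUT (MLUT)
[Figure: one multipurpose LUT can serve as a 64 x 1 ROM (logic), a 64 x 1 RAM, or a 32-bit shift register (SR); the underlying figure was drawn for a 4-input LUT with a 16 x 1 RAM and a 16-bit SR]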
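As a rough illustration of the RAM mode, the VHDL sketch below describes a 64 x 1 memory with a synchronous write and an asynchronous read. The entity and port names (ram64x1, we, addr, din, dout) are hypothetical, and whether a given synthesis tool maps this description onto LUT-based RAM depends on the tool and its settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 64 x 1 RAM: one data bit per address, 6 address bits.
entity ram64x1 is
  port (
    clk  : in  std_logic;
    we   : in  std_logic;                     -- write enable
    addr : in  std_logic_vector(5 downto 0);  -- selects one of 64 bits
    din  : in  std_logic;
    dout : out std_logic
  );
end entity ram64x1;

architecture rtl of ram64x1 is
  signal mem : std_logic_vector(0 to 63) := (others => '0');
begin
  -- Synchronous write: the addressed bit is updated on the rising clock edge.
  write_port : process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        mem(to_integer(unsigned(addr))) <= din;
      end if;
    end if;
  end process write_port;

  -- Asynchronous read: the addressed bit appears without waiting for a clock.
  dout <= mem(to_integer(unsigned(addr)));
end architecture rtl;
```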
4-input LUT (Look-Up Table) in the Basic ROM Mode
• Look-Up tables are primary elements for logic implementation
• Each LUT can implement any function of 4 inputs

Two example LUT contents (each LUT stores one 16-entry column y):

x1 x2 x3 x4 | y (example 1) | y (example 2)
 0  0  0  0 |       1       |       0
 0  0  0  1 |       1       |       1
 0  0  1  0 |       1       |       0
 0  0  1  1 |       1       |       0
 0  1  0  0 |       1       |       0
 0  1  0  1 |       1       |       1
 0  1  1  0 |       1       |       0
 0  1  1  1 |       1       |       1
 1  0  0  0 |       1       |       0
 1  0  0  1 |       1       |       1
 1  0  1  0 |       1       |       0
 1  0  1  1 |       1       |       0
 1  1  0  0 |       0       |       1
 1  1  0  1 |       0       |       1
 1  1  1  0 |       0       |       0
 1  1  1  1 |       0       |       0

[Figure: each truth table drawn next to a LUT with inputs x1 to x4 and output y, and next to an equivalent gate-level circuit]
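The ROM view of a 4-input LUT can be sketched in VHDL as a 16-bit constant indexed by the inputs. The sketch below is illustrative only: the entity name lut4_rom and the constant name INIT are hypothetical, and the stored function (0 only when x1 and x2 are both 1) is chosen as an example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical model of a 4-input LUT used as a 16 x 1 ROM.
entity lut4_rom is
  port (
    x1, x2, x3, x4 : in  std_logic;
    y              : out std_logic
  );
end entity lut4_rom;

architecture rom_mode of lut4_rom is
  -- ROM contents, indexed 0 to 15 by the address x1 x2 x3 x4 (x1 is the MSB).
  -- Entries 12 to 15 (x1 = x2 = 1) hold 0; all other entries hold 1.
  constant INIT : std_logic_vector(0 to 15) := "1111111111110000";
  signal addr   : std_logic_vector(3 downto 0);
begin
  addr <= x1 & x2 & x3 & x4;
  -- Evaluating the function is a single table lookup.
  y <= INIT(to_integer(unsigned(addr)));
end architecture rom_mode;
```

Changing only the INIT string yields any other function of the same four inputs, which is the sense in which one LUT can implement any 4-input function.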
6-Input LUT of Spartan 6

Reset and Set Configurations
• No set or reset
• Synchronous set
• Synchronous reset
• Asynchronous set (preset)
• Asynchronous reset (clear)
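The difference between a synchronous and an asynchronous reset shows up directly in how the flip-flop process is written. The following is a minimal sketch with hypothetical names (ff_resets, q_sync, q_async); the set variants follow the same pattern with '1' assigned instead of '0'.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: two flip-flops that differ only in reset style.
entity ff_resets is
  port (
    clk, rst, d : in  std_logic;
    q_sync      : out std_logic;  -- flip-flop with synchronous reset
    q_async     : out std_logic   -- flip-flop with asynchronous reset (clear)
  );
end entity ff_resets;

architecture rtl of ff_resets is
begin
  -- Synchronous reset: rst is sampled only on the rising clock edge,
  -- so it does not appear in the sensitivity list.
  sync_reset : process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        q_sync <= '0';
      else
        q_sync <= d;
      end if;
    end if;
  end process sync_reset;

  -- Asynchronous reset: rst takes effect immediately, independent of clk.
  async_reset : process (clk, rst)
  begin
    if rst = '1' then
      q_async <= '0';
    elsif rising_edge(clk) then
      q_async <= d;
    end if;
  end process async_reset;
end architecture rtl;
```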
MLUT as a 32-bit Shift Register (SRL32)
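A 32-stage shift register of the kind an MLUT can hold might be described as follows. This is a sketch with hypothetical names (shift32, ce, din, dout); it deliberately has no reset, on the assumption that a reset on every stage would make a LUT-based implementation less likely, and the actual mapping is up to the synthesis tool.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 32-bit serial-in, serial-out shift register.
entity shift32 is
  port (
    clk  : in  std_logic;
    ce   : in  std_logic;  -- clock enable: shift only when '1'
    din  : in  std_logic;
    dout : out std_logic   -- din delayed by 32 enabled clock cycles
  );
end entity shift32;

architecture rtl of shift32 is
  signal sr : std_logic_vector(31 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if ce = '1' then
        -- New bit enters at position 0; the oldest bit leaves at position 31.
        sr <= sr(30 downto 0) & din;
      end if;
    end if;
  end process;

  dout <= sr(31);
end architecture rtl;
```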
Fast Carry Logic
• Each CLB contains separate logic and routing for the fast generation of sum & carry signals
  • Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters
• Carry logic is independent of normal logic and routing resources
[Figure: carry logic with its own routing, running from LSB to MSB]

Accessing Carry Logic
• All major synthesis tools can infer carry logic for arithmetic functions
  • Addition (SUM <= A + B)
  • Subtraction (DIFF <= A - B)
  • Comparators (if A < B then…)
  • Counters (count <= count +1)
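The four arithmetic patterns listed above can be collected in one small design. The sketch below uses ieee.numeric_std and hypothetical names (carry_examples, a_lt_b, count_r); the 8-bit width is an arbitrary choice, and whether the carry chain is actually used is decided by the synthesis tool.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical collection of operators that tools typically map to carry logic.
entity carry_examples is
  port (
    clk, reset : in  std_logic;
    a, b       : in  unsigned(7 downto 0);
    sum        : out unsigned(7 downto 0);
    diff       : out unsigned(7 downto 0);
    a_lt_b     : out std_logic;
    count      : out unsigned(7 downto 0)
  );
end entity carry_examples;

architecture rtl of carry_examples is
  signal count_r : unsigned(7 downto 0) := (others => '0');
begin
  sum    <= a + b;                    -- addition (result wraps modulo 256)
  diff   <= a - b;                    -- subtraction (result wraps modulo 256)
  a_lt_b <= '1' when a < b else '0';  -- comparator

  -- Counter with synchronous reset
  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        count_r <= (others => '0');
      else
        count_r <= count_r + 1;
      end if;
    end if;
  end process;

  count <= count_r;
end architecture rtl;
```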
Full-adder
[Figure: full-adder block FA with inputs x, y, cin and outputs cout, s]

x + y + c_in = (c_out s)_2

x  y  cin | cout  s
0  0   0  |  0    0
0  0   1  |  0    1
0  1   0  |  0    1
0  1   1  |  1    0
1  0   0  |  0    1
1  0   1  |  1    0
1  1   0  |  1    0
1  1   1  |  1    1

Carry & Control Logic in Xilinx FPGAs
x  y | COUT
0  0 | y
0  1 | CIN
1  0 | CIN
1  1 | y

Propagate = x ⊕ y
Generate = y
Sum = Propagate ⊕ CIN = x ⊕ y ⊕ CIN

Carry & Control Logic in Spartan 6 FPGAs
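The propagate, generate, and sum relations can be sketched in VHDL as a one-bit adder cell. In this sketch the XOR that forms propagate stands for the function placed in the LUT, while the multiplexer and the final XOR stand for the hardwired carry logic; the entity and signal names (carry_cell, propagate) are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical one-bit adder cell built from propagate/generate logic.
entity carry_cell is
  port (
    x, y, cin : in  std_logic;
    s, cout   : out std_logic
  );
end entity carry_cell;

architecture dataflow of carry_cell is
  signal propagate : std_logic;
begin
  -- LUT function: propagate is 1 when exactly one of x, y is 1.
  propagate <= x xor y;

  -- Carry multiplexer: pass CIN when propagating, otherwise output y
  -- (when x = y, the carry out equals that common value).
  cout <= cin when propagate = '1' else y;

  -- Sum XOR: Sum = Propagate xor CIN = x xor y xor CIN.
  s <= propagate xor cin;
end architecture dataflow;
```

For every input combination this cell agrees with the full-adder truth table, for example x = 1, y = 1, cin = 0 gives propagate = 0, cout = y = 1, and s = 0.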
[Figure: a LUT with inputs x and y feeding the hardwired (fast) carry logic]

Examples:
Determine the amount of Spartan 6 resources needed to implement a given circuit
Circuit 1: Top level
[Figure: registers R0 to R15 clocked by clk, a function block F with inputs a, b, c, d and output y, a two-input selector (inputs 0 and 1), and signals run, m, w]

Circuit 1: F – function
[Figure: a 2-to-4 decoder (inputs w1, w0, En; outputs y3 to y0), a block with ports numbered 0 to 7, a <<<3 unit (inputs x3 to x0, outputs y3 to y0), and a full adder (inputs x, y, cin; outputs s, cout); signals a to h and y]

Circuit 2: Top level
[Figure: registers R0 to R15 clocked by clk, a function block F with inputs a, b, c, d, e and output y, a two-input selector (inputs 0 and 1), and signals run, z]

Circuit 2: F – function
[Figure: a priority encoder (inputs w3 to w0; outputs y1, y0, z), a block with ports numbered 0 to 7, a >>2 unit (inputs x3 to x0, outputs y3 to y0), and a half adder (inputs x, y; outputs s, cout); signals a to i and y]

Circuit 3: Top level

Input/Output Blocks (IOBs)
Basic I/O Block Structure
[Figure: an IOB with three flip-flops, each with D, Q, EC, and SR pins, sharing Clock and Set/Reset:
 • three-state path: Three-State FF Enable, Three-State Control
 • output path: Output FF Enable, Output
 • input path: Input FF Enable, Direct Input, Registered Input]
IOB Functionality
• IOB provides interface between the package pins and CLBs
• Each IOB can work as uni- or bi-directional I/O
• Outputs can be forced into High Impedance
• Inputs and outputs can be registered
  • advised for high-performance I/O
• Inputs can be delayed
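A bidirectional, registered I/O pin with a high-impedance state can be described in VHDL roughly as follows. The names (bidir_pin, oe, pad) are hypothetical, and placing these registers inside the IOB rather than in a CLB is a decision made by the implementation tools and their constraints, not by this code alone.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical bidirectional pin with registered input, output, and enable.
entity bidir_pin is
  port (
    clk      : in    std_logic;
    oe       : in    std_logic;  -- '1' drives the pin, '0' releases it
    data_out : in    std_logic;  -- value to drive onto the pin
    data_in  : out   std_logic;  -- registered value read from the pin
    pad      : inout std_logic   -- the package pin
  );
end entity bidir_pin;

architecture rtl of bidir_pin is
  signal out_reg, oe_reg, in_reg : std_logic := '0';
begin
  -- Register the output, the three-state control, and the input.
  process (clk)
  begin
    if rising_edge(clk) then
      out_reg <= data_out;
      oe_reg  <= oe;
      in_reg  <= pad;
    end if;
  end process;

  -- Three-state buffer: 'Z' forces the output into high impedance.
  pad     <= out_reg when oe_reg = '1' else 'Z';
  data_in <= in_reg;
end architecture rtl;
```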
Clock Management

A simple clock tree
[Figure: a clock signal from the outside world enters through a special clock pin and pad and is distributed by a clock tree to the flip-flops]
Clock Manager
[Figure: a clock signal from the outside world enters through a special clock pin and pad into the clock manager, which produces daughter clocks used to drive internal clock trees or output pins etc.]
Jitter
[Figure: an ideal clock signal (cycles 1 to 4) compared with a real clock signal with jitter; superimposing cycles 1 to 4 of the real clock shows the spread of its edges]
Removing Jitter
[Figure: a clock signal from the outside world with jitter enters through a special clock pin and pad; the clock manager outputs “clean” daughter clocks used to drive internal clock trees or output pins etc.]
Frequency Synthesis
[Figure: daughter clocks at 1.0 x, 2.0 x, and 0.5 x the original clock frequency]
Phase shifting
[Figure: daughter clocks phase shifted by 0°, 90°, 180°, and 270°]
The Design Warrior’s Guide to FPGAs: Devices, Tools, and Flows, Figure 4-20.
Clock Management Tiles
• DCM – Digital Clock Manager
• PLL – Phase Locked Loop

Spartan-6 Family Attributes

Spartan-6 FPGA Family Members

FPGA device present on the Digilent Nexys 3 board

XC6SLX16-CSG324C
• XC6S: Spartan 6 family
• LX: Logic Optimized
• 16: Size
• CSG: Package type (Ball Chip-Scale)
• 324: 324 pins
• C: Commercial temperature range, 0° C – 85° C
FPGA Design Flow

FPGA Design process (1)

Specification / Pseudocode
Design and implement a simple unit that speeds up encryption with an RC5-like cipher with a fixed key set on an 8031 microcontroller. Unlike in experiment 5, this time your unit has to be able to perform the encryption algorithm by itself, executing 32 rounds…..
On-paper hardware design (Block diagram & ASM chart)
VHDL description (Your Source Files)
library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_unsigned.all;

entity RC5_core is
    port(
        clock, reset, encr_decr : in  std_logic;
        -- 32-bit data path
        data_input  : in  std_logic_vector(31 downto 0);
        data_output : out std_logic_vector(31 downto 0);
        out_full    : in  std_logic;
        -- 32-bit key interface
        key_input   : in  std_logic_vector(31 downto 0);
        key_read    : out std_logic
    );
end RC5_core;

Functional simulation
Synthesis
Post-synthesis simulation

FPGA Design process (2)

Implementation
Timing simulation

Configuration
On chip testing

Tools used in FPGA Design Flow
1. Design: produces functionally verified VHDL code
2. Synthesis (Xilinx XST or Synplify Premier): VHDL code to netlist
3. Implementation (Xilinx ISE): netlist to bitstream

Synthesis
Synthesis Tools
• Xilinx XST
• Synplify Premier
• … and others
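Logic Synthesis

VHDL description

library ieee;
use ieee.std_logic_1164.all;

-- Entity declaration assumed from the signals used in the architecture;
-- it is not part of the original slide.
entity MLU is
    port(
        A, B                : in  STD_LOGIC;
        NEG_A, NEG_B, NEG_Y : in  STD_LOGIC;
        L1, L0              : in  STD_LOGIC;
        Y                   : out STD_LOGIC
    );
end MLU;

architecture MLU_DATAFLOW of MLU is
    signal A1 : STD_LOGIC;
    signal B1 : STD_LOGIC;
    signal Y1 : STD_LOGIC;
    signal MUX_0, MUX_1, MUX_2, MUX_3 : STD_LOGIC;
    signal SEL : STD_LOGIC_VECTOR(1 downto 0);
begin
    -- Optional inversion of the inputs and of the output
    A1 <= A when (NEG_A='0') else not A;
    B1 <= B when (NEG_B='0') else not B;
    Y  <= Y1 when (NEG_Y='0') else not Y1;

    -- The four candidate logic operations
    MUX_0 <= A1 and B1;
    MUX_1 <= A1 or B1;
    MUX_2 <= A1 xor B1;
    MUX_3 <= A1 xnor B1;

    -- A named selector avoids the ambiguous type of the expression (L1 & L0)
    SEL <= L1 & L0;
    with SEL select
        Y1 <= MUX_0 when "00",
              MUX_1 when "01",
              MUX_2 when "10",
              MUX_3 when others;
end MLU_DATAFLOW;

Circuit netlist
[Figure: circuit netlist produced from the VHDL description]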
Circuit netlist (RTL view)

Mapping
[Figure: the netlist mapped onto FPGA resources LUT0 to LUT5 and flip-flops FF1 and FF2]
Xilinx XST Inputs/Outputs

Xilinx XST Inputs
• RTL VHDL and/or Verilog files
• Constraints – XCF
  Xilinx constraints file in which you can specify synthesis, timing, and specific implementation constraints that can be propagated to the NGC file.
• Core files
  These files can be in either NGC or EDIF format. XST does not modify cores. It uses them to inform area and timing optimization surrounding the cores.

Xilinx XST Outputs
• NGC
  Netlist file with constraint information
• NGR
  This is a schematic representation of the pre-optimized design shown at the Register Transfer Level (RTL). This representation is in terms of generic symbols, such as adders, multipliers, counters, AND gates, and OR gates, and is generated after the HDL synthesis phase of the synthesis process.
• LOG
  This report contains the results from the synthesis run, including area and timing estimation.

RTL view in Synplify Premier