Announcing Vivado™ Built from the Ground Up for the Next Decade of ‘All Programmable’ Devices
© Copyright 2012 Xilinx Announcing Vivado Design Suite
IP & System Centric Next Generation Design Environment
For the Next Decade of ‘All Programmable’ Devices
Accelerating Integration & Implementation up to 4X
Built from the Ground Up for the Next Decade of Programmable Design
Why Now?
Programmable Logic Devices: Enables Programmable ‘Logic’
ALL Programmable Devices: Enables Programmable Systems ‘Integration’
Bottlenecks are Shifting
System Integration Bottlenecks
– Design and IP reuse
– Integrating algorithmic and RTL level IP
– Mixing DSP, embedded, connectivity, logic
– Verification of blocks and “systems”
Implementation Bottlenecks
– Hierarchical chip planning
– Multi-domain and multi-die physical optimization
– Predictable ‘design’ vs. ‘timing’ closure
– Late ECOs and rippling effect of changes
Vivado: Accelerating Productivity up to 4X
[Figure: starting from an iterative RTL-to-bitstream approach (1X), the Vivado next-generation design system accelerates integration up to 4X (IP & System-centric Integration with Fast Verification) and implementation up to 4X (Fast, Hierarchical and Deterministic Closure Automation w/ ECO).]
Vivado Design Suite Elements
[Figure: the Integrated Design Environment, Shared Scalable Data Model, and Debug and Analysis span two pillars. Accelerating Integration: IP & System-centric Integration with Fast Verification. Accelerating Implementation: Fast, Hierarchical and Deterministic Closure Automation w/ ECO.]
Scalable to 100M Gates
Vivado Key Enabling Technologies: Shared, Scalable Data Model
Progressive estimation accuracy across the entire flow
Reduced iterations late in the cycle
[Figure: estimation tracked across IP Integration, RTL Design, Synthesis, and Place & Route, all on the Shared, Scalable Data Model]
Shares design information between implementation steps
– Ensures fast convergence and timing closure
Highly efficient memory utilization
– Scalable to future families > 10M logic cells (100M Gates)
Enables cross-probing across the entire design
[Figure: cross-probing between RTL, schematics, placement, and timing-report paths, with tool settings, code changes, and placement edits feeding the shared model]
Vivado Design Suite Elements
[Figure: within the Integrated Design Environment, Shared Scalable Data Model, and Debug and Analysis, the Accelerating Integration pillar (IP & System-centric Integration with Fast Verification) comprises ESL Algorithm IP Synthesis, IP & Systems Integration, Stds Based IP Reuse / IP Assembly, and Fast Simulation & HW Co-sim.]
Scalable to 100M Gates
IP-Centric Integration with Fast Verification
                       Hand-coded VHDL   Vivado HLS C
Design Time (weeks)    12                1
Latency (ms)           37                21
Memory (RAMB18E1)      134 (16%)         10 (1%)
Memory (RAMB36E1)      273 (65%)         138 (33%)
Registers              29686 (9%)        14263 (4%)
LUTs                   28152 (18%)       24257 (16%)
[Figure: example system with Memory Interfaces, Processor System, PCIe, Display, Embedded Interconnect, and Processing Datapath, built from User IP, Xilinx IP, and 3rd Party IP]
[Figure: ESL Algorithm IP Synthesis; IP & HW-SW Integration with IP Integrator; Stds Based IP Reuse / IP Assembly (Tcl, SDC); Fast Simulation & HW Co-Sim. Chart: runtime of Vivado ISim with HW co-sim]
Package Designs into System-Level IP for Reuse
Standardized IP-XACT representation
[Figure: the IP Packager bundles Source (C, RTL, IP, etc), Simulation Models, Documentation, Example Designs, and Test Bench into Xilinx IP, 3rd Party IP, and User IP for use in Vivado IP Integrator]
Share IP within your team, project or company
3rd party IP delivered with a common look and feel
Reuse IP at any point in the implementation process
– Source, placed, or placed and routed
Reuse in different designs
Reuse multiple times
Seamless IP Access and Customization
Integrated IP catalog
– Powerful search capabilities
– Single-click access to IP functionality and collateral
IP customization and generation
– Instant access to customization GUI
– Generate output products in project or remote directory
– Customize graphically or via Tcl
Selectable IP Targets
Flexible output targets
– On-demand generation of IP output targets
– Generate testbench, example, etc.
Integrated example designs
– Evaluate IP directly as an instantiated source in a Vivado project
Multiple options for IP synthesis
– IP sources with overall design
– IP pre-synthesized as a netlist prior to design synthesis
IP Packager: IP-XACT
IEEE 1685 IP-XACT is an industry-standard way to represent data about IP (meta-data)
– Port information
– Latency
– Configurable parameters
– Etc.
ASCII XML based
Enables IP to be used in multiple vendors' tool flows
IP Packager: Generate IP-XACT for your IP
Wizard-based flow automates generation of IP-XACT IP
Prepare an IP for distribution to customers or colleagues
Many pieces of meta-data automatically inferred
Users can add additional meta-data
IP Packager: Create System-Level IP
1. Run IP Packager from Tools menu
2. Package sources from Vivado project as IP
3. Provide information to uniquely identify your IP
Extensible IP Catalog: Add Packaged IP
1. Unzip IP to a local directory
2. Right-click on IP Catalog
3. Add directory to IP Catalog
Vivado IP Integrator
A graphical design environment to enable rapid and accurate connection of complex IP
– Connections made at the interface level, not the individual signal level
– Automatic setting and propagation of IP parameters
– Automated generation of RTL
– Full support for arbitrary levels of design hierarchy
– Capable of processor-based or non-processor based design creation
Tight integration with Vivado IP Packager flow for rapid IP and subsystem reuse
IP Integrator User Interface
Hierarchy Support
System Hierarchy Interface Connections View with Real-time DRCs
TCL Console
IP Integrator Real-time DRCs
IP and system configuration rules can be very complex
– User will require help to correctly connect IPs
IP Integrator provides immediate feedback on design errors/optimization
IP Integrator Real-time DRCs (cont)
All IP Integrator automation services can issue DRCs from:
– IP configuration XGUI
– IP specific automation services
– Built-in automation services
– System optimization services
Not just errors: Intelligent DRCs may also include solutions
– Goal: Proposed solutions can be actual Tcl code, not just passive text
GUI collects, prioritizes and reports DRCs to the user so they can make informed choices
Apply solution 1
Vivado IP Integrator – Demo
Graphical design to enable rapid and accurate connection of complex IP
– Connections made at the interface level, not the individual signal level
– Automatic setting and propagation of IP parameters
– Automated generation of RTL
– Full support for arbitrary levels of design hierarchy
– Capable of processor-based or non-processor based design creation
Tight integration with Vivado IP Packager flow for rapid IP and subsystem reuse
[Figure: example system with Memory Interfaces, Processor System, PCIe, Display, Embedded Interconnect, and Processing Datapath, built from User IP, Xilinx IP, and 3rd Party IP]
Vivado High-Level Synthesis Accelerates IP Development and Design Space Exploration
Ideal for DSP, video and high performance compute applications
QoR that rivals hand coded RTL
– Fast compilation and design exploration
– Algorithm/architecture feasibility
Comprehensive coverage
– C/C++/SystemC
– Arbitrary precision
– Floating-point
Accelerated verification
– 2 to 3 orders of magnitude faster than RTL for larger designs
BDTI certified and production proven at 20+ customer sites
ESL Design Methodology
[Figure: abstraction levels and the steps between them]
Functionality (Model) to Architecture (RTL): High-Level Synthesis, Model-Based ESL Design
Architecture (RTL) to Gates (Netlist): Synthesis
Gates (Netlist) to Silicon (Layout): Place & Route
ESL Solutions
Electronic System Level ESL High-Level Synthesis Model-Based Design HLS MBD Language Structure Input Method C-based (functions) C-based (bus-functional Simulation models) Behavior Signals Operation Level (no clocks/resets) High-Level Synthesis IP Implementation Method (allocation/scheduling) • Flexibility • Available libraries • Architecture exploration • Result analysis/visualization Benefits Portability Verification speed Quality of results
Model Based Design – System Generator
Easily create System Generator DSP sources
Add existing files or create new Simulink models
Model Based Design – System Generator
Fully integrated into the Vivado IDE
– Launch System Generator from the Vivado IDE
HLS: What’s different?
Established specification language
– C/C++/SystemC standards
Quality of Results
– Extracting parallel execution from sequential specification
Accommodates datapath and control
Complement RTL-based tools
Acknowledgement of verification needs
Consideration for physical interfaces
Source: IEEE Design&Test of Computers (2009) Volume: 26, Issue: 4, Publisher: IEEE Computer Society, Pages: 18-25
Vivado HLS Design Flow
Starts at C
– C
– C++
– SystemC
Produces RTL
– Verilog
– VHDL
– SystemC
Automates Flow
– RTL Verification
– IP Packaging
[Figure: in Vivado HLS, the Function stage (C Specification with C Design and C Test Bench, checked by C Verification) goes through C Synthesis to the Architecture stage (RTL Design, checked by RTL Verification through a C Wrapper); Packaging then produces an IP Block for Vivado IP Packager, Vivado IP Integrator, and System Generator]
Core Technology
[Figure: C coding-style constructs (Function, Parameter, CTHREAD, Loop, Statement, Array) are mapped by Tcl directives (UNROLL, INLINE, STREAM, PIPELINE, DATAFLOW, PARTITION, ALLOCATION, INTERFACE) onto architecture elements (Module, Port, Clock Domain, Process, Operation, Memory), giving a portable IP block with an AXI4 interface and defined performance and resources]
Function versus Architecture
Function: Sequential

void top (
  int& dout1, int& dout2,
  int din1, int din2
) {
  dout1 = din1+din2;
  dout2 = din1*din2;
}

Architecture: Interface, Storage, State machine, Datapath (parallel processes)

// Interface
module top (dout1,dout2,
            din1,din2,
            ovld,ivld,
            clk,rst);
  output [31:0] dout1,dout2;
  output ovld;
  input [31:0] din1,din2;
  input ivld;
  input clk;
  input rst;

  // Storage
  reg [31:0] tmp,rdin1,rdin2;
  reg [31:0] dout1,dout2;
  reg ovld;
  reg [1:0] state, next_state;
  parameter
    RESET=2'b00,INPUT=2'b01,
    CALC=2'b10,OUTPUT=2'b11;

  // State machine
  always @(posedge clk)
  begin
    if (rst == 1'b1)
      state <= RESET;
    else
      state <= next_state;
    case (state)
      RESET:
        next_state <= INPUT;
      INPUT:
        if (ivld == 1'b1)
        begin
          next_state <= CALC;
          rdin1 <= din1;
          rdin2 <= din2;
        end
      CALC:
        next_state <= OUTPUT;
      OUTPUT:
        next_state <= RESET;
      default: next_state <= RESET;
    endcase
  end

  // Datapath
  always @(posedge clk)
    case (state)
      RESET:
        begin
          dout1 <= 32'b0;
          dout2 <= 32'b0;
          ovld <= 1'b0;
        end
      CALC:
        begin
          dout1 <= din1+din2;
          tmp <= din1*din2;
        end
      OUTPUT:
        begin
          dout2 <= tmp;
          ovld <= 1'b1;
        end
    endcase
endmodule

[Figure: datapath with din1 and din2 feeding an adder (dout1) and a multiplier (dout2)]
[Figure: state diagram. rst enters RESET; RESET goes to INPUT; INPUT waits while !ivld and goes to CALC on ivld; CALC goes to OUTPUT; OUTPUT returns to RESET]
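To make the function-versus-architecture contrast concrete, the following is a minimal C++ sketch that steps a four-state controller one clock at a time and compares its result with the sequential function. It is an illustration only, not output of Vivado HLS. The names `TopModel`, `step`, and `run_until_valid` are hypothetical, the model advances its state directly each cycle (it does not reproduce the registered `next_state` of the RTL above), and it computes from the values latched in INPUT.

```cpp
#include <cassert>

// Functional view: the sequential function from the slide.
void top(int& dout1, int& dout2, int din1, int din2) {
    dout1 = din1 + din2;
    dout2 = din1 * din2;
}

// Architectural view (hypothetical model): one call to step() is one clock
// edge of a RESET -> INPUT -> CALC -> OUTPUT controller.
struct TopModel {
    enum State { RESET, INPUT, CALC, OUTPUT };
    State state = RESET;
    int rdin1 = 0, rdin2 = 0, tmp = 0;  // storage registers
    int dout1 = 0, dout2 = 0;           // output registers
    bool ovld = false;                  // output-valid flag

    void step(bool ivld, int din1, int din2) {
        switch (state) {
        case RESET:   // clear outputs
            dout1 = 0; dout2 = 0; ovld = false;
            state = INPUT;
            break;
        case INPUT:   // wait for valid input, then latch it
            if (ivld) { rdin1 = din1; rdin2 = din2; state = CALC; }
            break;
        case CALC:    // assumption: compute from the latched inputs
            dout1 = rdin1 + rdin2;
            tmp = rdin1 * rdin2;
            state = OUTPUT;
            break;
        case OUTPUT:  // publish the product and flag the result as valid
            dout2 = tmp;
            ovld = true;
            state = RESET;
            break;
        }
    }
};

// Clocks the model with a constant valid input until ovld is raised.
// Returns the number of clock edges that took.
int run_until_valid(TopModel& m, int din1, int din2) {
    int cycles = 0;
    while (!m.ovld) {
        m.step(true, din1, din2);
        ++cycles;
        assert(cycles < 16);  // guard: the four-state loop must terminate
    }
    return cycles;
}
```

The function produces both results in one call, while the model needs several clock edges and explicit storage to reach the same values. That scheduling and storage is what the architecture column of the slide spells out by hand.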
Vivado HLS C Development
CDT based
– Simplified for HLS user
– Windows
  • MinGW/msys included
– Linux
SystemC libraries included
Video/Image functional verification: 10000x speed versus RTL simulation
Standard Input
– Superior language support
  • C: Easy, familiar
  • C++: Methodical
  • SystemC: Standard
Directives
– Tcl: Efficient Exploration
– Pragma: Self-documenting
[Figure: design specification languages. C (C99), Structured Programming: Function. CPP (Standard C++), Object Oriented Programming (OOP): Class, Template (STL), Arbitrary precision. OSCI SystemC (IEEE 1666-2005), System Modeling: Module/Port, Parallel process, Time (Simulation Kernel), Abstraction (TLM).]
Vivado HLS Synthesis
Fast for architecture exploration
– Tcl-based batch mode
Scalable for ultra-large IP blocks
– 200K LUT/hour
Quality of results
– DSP48 inferencing
– Parallelization (instruction and task)
[Chart: Synthesis Time (s), 0 to 250, versus Design Size (LUT), 0 to 20000]
DSP Applications
Arbitrary Precision
– C
  • Simulation and Synthesis
– C++
  • Rounding and Saturation

C version (integer arithmetic):

void yuv2rgb (
  pixel_t *in,
  pixel_t *out
) {
  uint8 R, G, B;
  int9 C, D, E, Y, U, V;
  const int11 Wyuv[3][3] = {
    {298, 0, 409},
    {298, -100, -208},
    {298, 516, 0},
  };

  Y = in->col1;
  U = in->col2;
  V = in->col3;
  C = Y - 16;
  D = U - 128;
  E = V - 128;
  R = CLIP(( Wyuv[0][0] * C + Wyuv[0][2] * E + 128) >> 8);
  G = CLIP(( Wyuv[1][0] * C + Wyuv[1][1] * D + Wyuv[1][2] * E + 128) >> 8);
  B = CLIP(( Wyuv[2][0] * C + Wyuv[2][1] * D + 128) >> 8);
  out->col1 = R;
  out->col2 = G;
  out->col3 = B;
}

C++ version (fixed-point types with rounding and saturation):

void yuv2rgb (
  pixel_t *in,
  pixel_t *out
) {
  hls_ufixed<8,8,HLS_RND,HLS_SAT> R, G, B;
  hls_fixed<8,8,HLS_RND,HLS_SAT> Y, U, V;
  const ap_fixed<11,2,HLS_RND> Wyuv[3][3] = {
    {1, 0, 1.13983},
    {1,-0.39465,-0.5806},
    {1, 2.03211, 0},
  };
  Y = in->col1;
  U = in->col2;
  V = in->col3;
  R = Wyuv[0][0] * Y + Wyuv[0][2] * V;
  G = Wyuv[1][0] * Y + Wyuv[1][1] * U + Wyuv[1][2] * V;
  B = Wyuv[2][0] * Y + Wyuv[2][1] * U ;
  out->col1 = R;
  out->col2 = G;
  out->col3 = B;
}
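The integer version above relies on tool-specific types (`uint8`, `int9`, `int11`) and a `CLIP` macro that the slide does not define. As a hedged, self-contained sketch, the same arithmetic can be tried on a desktop compiler with standard types. The `pixel_t` layout and the helper `clip_shift8` are assumptions made for illustration; plain `int` stands in for the arbitrary-precision types, so this shows the arithmetic only, not the hardware bit widths.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical pixel layout; the slide only shows the fields col1..col3.
struct pixel_t {
    std::uint8_t col1, col2, col3;
};

// Scales a fixed-point sum by 2^-8 and saturates to 0..255.
// Negative sums are clamped before the shift, which gives the same result
// as clipping after an arithmetic shift but avoids shifting a negative value.
static std::uint8_t clip_shift8(int v) {
    if (v < 0) return 0;
    v >>= 8;
    return static_cast<std::uint8_t>(v > 255 ? 255 : v);
}

// YUV to RGB with the 8-bit fixed-point coefficients from the slide.
void yuv2rgb(const pixel_t* in, pixel_t* out) {
    assert(in != nullptr && out != nullptr);
    static const int Wyuv[3][3] = {
        {298,    0,  409},
        {298, -100, -208},
        {298,  516,    0},
    };
    const int C = in->col1 - 16;   // luma offset
    const int D = in->col2 - 128;  // chroma offsets
    const int E = in->col3 - 128;
    // The +128 term rounds to nearest before the divide by 256.
    out->col1 = clip_shift8(Wyuv[0][0] * C + Wyuv[0][2] * E + 128);
    out->col2 = clip_shift8(Wyuv[1][0] * C + Wyuv[1][1] * D + Wyuv[1][2] * E + 128);
    out->col3 = clip_shift8(Wyuv[2][0] * C + Wyuv[2][1] * D + 128);
}
```

Clamping matters here: for a saturated red input the green and blue sums go to zero or below, which an unclamped 8-bit store would wrap.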
DSP Applications
DSP48 inferencing
#include "ap_int.h" #include
typedef ap_int<18> data_t; typedef complex
• Expression matching: pre/mult/post • Coding Style • Attributes
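The multiply-add structure this slide targets is a complex multiplication: four products combined by one subtraction and one addition. The sketch below shows that arithmetic in portable C++. It is an assumption-laden stand-in: `std::int32_t` with an 18-bit range check replaces the slide's `ap_int<18>`, and the names `cplx_t`, `cprod_t`, `fits_18bit`, and `cmult` are hypothetical. Whether a synthesis tool maps these expressions onto DSP48 slices depends on the tool and coding style, which this sketch does not demonstrate.

```cpp
#include <cassert>
#include <cstdint>

typedef std::int32_t data_t;  // stand-in for an 18-bit signed value
typedef std::int64_t prod_t;  // wide enough for the sum of two 36-bit products

struct cplx_t  { data_t real, imag; };
struct cprod_t { prod_t real, imag; };

// True when v fits the signed 18-bit range [-2^17, 2^17 - 1].
static bool fits_18bit(data_t v) {
    return v >= -(1 << 17) && v < (1 << 17);
}

// Complex multiply: four multiplications, one subtraction, one addition.
cprod_t cmult(cplx_t a, cplx_t b) {
    assert(fits_18bit(a.real) && fits_18bit(a.imag));
    assert(fits_18bit(b.real) && fits_18bit(b.imag));
    cprod_t r;
    r.real = static_cast<prod_t>(a.real) * b.real - static_cast<prod_t>(a.imag) * b.imag;
    r.imag = static_cast<prod_t>(a.real) * b.imag + static_cast<prod_t>(a.imag) * b.real;
    return r;
}
```

Widening before the multiply keeps the full-precision result, which is the reason the product type is larger than the operand type.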
[Figure: complex multiplier datapath. Inputs areal, aimag, breal, and bimag feed four multipliers; a subtractor produces rreal and an adder produces rimag]
HPC Applications
Floating-Point
– Performance
– Latency
IEEE 754 Compliance
[Figure: Allocated Vectorization vs. Full Vectorization]

#include "fir.h"

data_t fir(data_t x) {
  const coef_t c[N] = {
    #include "fir.inc"
  };
  // Delay line has extra delay at input
  static data_t z[N];
  acc_t acc = 0;
  int i,j;
  taps: for (i = N-1; i >= 0; i--) {
    z[i] = (i==0) ? x : z[i-1];
    acc += z[i] * c[i];
  }
  return acc;
}

typedef float coef_t;
typedef float data_t;
typedef float acc_t;
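The FIR routine above depends on `fir.h` and `fir.inc`, which the slide does not show. A minimal runnable variant, under stated assumptions, looks like this: the tap count `N = 4` and the coefficient values in `kCoef` are made up for illustration, and the delay line is moved from a function-local `static` into a hypothetical `Fir` struct so that separate filter instances can be tested independently.

```cpp
#include <cassert>

typedef float coef_t;
typedef float data_t;
typedef float acc_t;

const int N = 4;  // assumed tap count
const coef_t kCoef[N] = {0.5f, 0.25f, 0.125f, 0.125f};  // assumed coefficients

struct Fir {
    data_t z[N] = {};  // delay line, newest sample at index 0

    // Shifts x into the delay line and returns the new filter output.
    data_t step(data_t x) {
        acc_t acc = 0;
        // Walk from the oldest tap to the newest so each slot is read
        // before it is overwritten, as in the slide's loop.
        for (int i = N - 1; i >= 0; --i) {
            z[i] = (i == 0) ? x : z[i - 1];
            acc += z[i] * kCoef[i];
        }
        return acc;
    }
};

// Filters n samples from in[] into out[] using one filter instance.
void fir_block(Fir& f, const data_t* in, data_t* out, int n) {
    assert(in != nullptr && out != nullptr && n >= 0);
    for (int k = 0; k < n; ++k) out[k] = f.step(in[k]);
}
```

Feeding a unit impulse returns the coefficients in order, which is a quick functional check to run in C before any RTL exists.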
Vivado High-Level Synthesis – Demo
Bridging the Gap
– Algorithm designer to Hardware designer
Creating with C
– Concise Specification
– Algorithm Exploration
– Fast Verification
Reusing with HLS
– Architecture Exploration
– Cost/Power Reduction
– Portable IP
[Figure: Vivado HLS flow from Specification & Debug (C Design, C Test Bench) through High-Level Synthesis to RTL Design, Architectural Verification through a C Wrapper, and Packaging for IP Integrator, IP Packager, System Generator, and RTL]
Vivado Design Suite Elements
[Figure: within the Integrated Design Environment, Shared Scalable Data Model, and Debug and Analysis, the Accelerating Implementation pillar comprises Hierarchical Chip Planning, Fast Synthesis w/ SDC Constraints, and Deterministic P&R with Closure Automation, alongside Power and ECO]
Scalable to 100M Gates
Deterministic Design Closure
[Figure: design hierarchy with TOP containing DATA, CPU, CNTRL, MEM, and DMA blocks]
[Figure: Hierarchical Chip Planning; Fast Synthesis w/ SDC Constraints; Deterministic P&R, Closure Automation; Power Analyzer and Optimizer; ECO]
[Chart: compile time T (hrs), full compiles compared with incremental compiles]
Vivado Synthesis
Tightly integrated into Vivado IDE
Superior SystemVerilog support
3x faster runtime
– 15x with “quick-synthesis” option
– Scales to multi-million Logic Cells
Design for debug
– Easy to navigate schematic with cross-probing to HDL
– Mark nets for hardware debug
Higher-level RTL Synthesis: Control Data Flow Graph Optimization
Example: Counting 1’s in a vector

Original code:
  c = 0;
  for (i=0; i<8; i=i+1)
    if (a[i] == 1)
      c = c + 1;

Optimized & unrolled:
  c = a[0] + a[1] + a[2] + a[3] + a[4] + a[5] + a[6] + a[7];

Traditional synthesis tools
– Area: 8 ADD + 8 MUX
– Depth: O(n)
Optimized by Vivado Synthesis
– Area: 4 smaller ADD (4x smaller)
– Depth: O(log(n))
[Figure: a chain of +1 increment and mux stages over a[0] … a[7], compared with a balanced adder tree over the same inputs]
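The depth difference between the two forms can be seen in software by writing both out. The sketch below is illustrative C++, not synthesis output: `count_ones_chain` mirrors the original loop, where each step depends on the previous one, and `count_ones_tree` sums pairwise in three levels. All function names are hypothetical, and the tree here uses seven two-input additions; it does not attempt to reproduce the area figure quoted on the slide.

```cpp
#include <cassert>
#include <cstdint>

// Chain form: one conditional increment per bit, so the result of step i
// depends on step i-1 (dependency depth grows linearly with the width).
int count_ones_chain(std::uint8_t a) {
    int c = 0;
    for (int i = 0; i < 8; ++i)
        if (((a >> i) & 1) == 1)
            c = c + 1;
    return c;
}

// Tree form: sums adjacent pairs level by level, so 8 inputs need only
// log2(8) = 3 levels of addition.
int count_ones_tree(std::uint8_t a) {
    int bit[8];
    for (int i = 0; i < 8; ++i) bit[i] = (a >> i) & 1;
    // Level 1: four independent additions
    const int s01 = bit[0] + bit[1];
    const int s23 = bit[2] + bit[3];
    const int s45 = bit[4] + bit[5];
    const int s67 = bit[6] + bit[7];
    // Level 2: two independent additions
    const int s0123 = s01 + s23;
    const int s4567 = s45 + s67;
    // Level 3: final addition
    return s0123 + s4567;
}

// Exhaustively checks that both forms agree for every 8-bit input.
void check_forms_agree() {
    for (int v = 0; v < 256; ++v)
        assert(count_ones_chain(static_cast<std::uint8_t>(v)) ==
               count_ones_tree(static_cast<std::uint8_t>(v)));
}
```

Both functions return the same count for every input; only the shape of the dependency graph differs, which is the property the optimization exploits.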
Vivado Key Enabling Technologies: Analytical Place & Route Engine
[Figure: timing cost f(x) versus placement solution x. Random moves from an initial random seed reach the best solution found by random moves and seeds, while the optimal solution is not found; part of the solution space is not routable]

“Cost” Criteria
– Traditional P&R: 1 dimension: timing minimization
– Vivado P&R: 3 dimensions: timing, congestion, wire length minimization
Primary Algorithm
– Traditional P&R: “Simulated Annealing”: random, iterative search based on initial seed
– Vivado P&R: Analytical: solves simultaneous equations to minimize all dimensions
Runtime
– Traditional P&R: Unpredictable, due to random nature of algorithm. Exponential with congestion
– Vivado P&R: Very predictable. Manages congestion
Scalability
– Traditional P&R: Poor results as design approaches 1M logic cells
– Vivado P&R: Will handle 10M+ logic cells with predictable results
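As a rough intuition for "solves simultaneous equations", consider a toy one-dimensional problem: movable cells connected in a chain between two fixed pads, with cost equal to the sum of squared wire lengths. Setting the gradient to zero gives one linear equation per cell (each cell sits at the average of its two neighbours). The C++ sketch below solves that system with Gauss-Seidel sweeps. This is a textbook quadratic-placement toy, not Vivado's algorithm, and it models wire length only; `place_chain` and `wire_cost` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Places n movable cells, connected in a chain between fixed pads at `left`
// and `right`, so that the sum of squared wire lengths is minimal.
// The optimality condition x[i] = (x[i-1] + x[i+1]) / 2 is a linear system,
// solved here by repeated Gauss-Seidel sweeps from a fixed starting point
// (no random seed is involved).
std::vector<double> place_chain(int n, double left, double right, int sweeps = 2000) {
    assert(n > 0 && left <= right && sweeps > 0);
    std::vector<double> x(n, left);
    for (int s = 0; s < sweeps; ++s) {
        for (int i = 0; i < n; ++i) {
            const double prev = (i == 0) ? left : x[i - 1];
            const double next = (i == n - 1) ? right : x[i + 1];
            x[i] = 0.5 * (prev + next);
        }
    }
    return x;
}

// Sum of squared wire lengths along the chain, pads included.
double wire_cost(const std::vector<double>& x, double left, double right) {
    double cost = 0.0;
    double prev = left;
    for (double xi : x) {
        cost += (xi - prev) * (xi - prev);
        prev = xi;
    }
    cost += (right - prev) * (right - prev);
    return cost;
}
```

Because the start point and the update rule are fixed, the same inputs always give the same placement, which is the kind of run-to-run repeatability the table contrasts with seed-dependent random search.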
Runtime Advantage of Analytical Place & Route
Up to 4x faster than alternative solutions
More predictable runtimes
[Chart: Runtime (hours), 0 to 25, versus Design size (LC), 0 to 2,000,000, for Vivado, ISE, and a competitor, with trend annotations of 12h/MLC and 4.6h/MLC]
Based on a benchmark suite of 100+ designs
Vivado Design Example: 1.2M LC Virtex-7 2000T Design*
                ISE       Vivado
P&R runtime     13 hrs    5 hrs
Memory usage    16 GB     9 GB
Wire length and congestion: significantly reduced
Customer proven: All SSI customers using Vivado today!
*Zynq Emulation Platform
Vivado Design Example: Kintex-7 325T Design
                   ISE                 Vivado
Component usage    82% LUT, 98% DSP    94% LUT, 63% DSP
Congestion         Does not route      Routes!
Analytical placer discovers a more optimal solution
Vivado Design Example: Virtex-7 485XT Design
                   ISE                      Vivado
Component usage    27% LUT, 15% FF          27% LUT, 15% FF
Wire length        Could not meet timing    Timing met!
More effective placement eases timing closure
Vivado IDE
Design & Analysis Environment
– IP Integrator & IP Catalog
– Estimations at any stage of design flow
– Implementation, incremental editing
– Cross-probing / Tcl interaction
– Floorplanner and Design Editor
– Report and log browser
XDC – Xilinx Design Constraints
XDC is an extension of Synopsys Design Constraints (SDC)
– Standard SDC for timing, plus physical constraints
– Constraints from synthesis through P&R
– Vivado sign-off static timing analysis (STA)
Powerful debug and analysis
– Fast custom timing reports
– What-If Analysis with STA
– Extendable and customizable
Tool Control
– Complete automation for design compiles
– 3rd Party tools use same interface
– Cross-platform scripting (Linux and Windows)
[Figure: Vivado Tcl contains XDC, which contains industry-standard SDC]
Tcl-driven Environment
Interaction between Tcl and IDE views
Direct in-memory access to the design database
– Batch mode: start_gui or stop_gui at any time for graphical analysis
– Analyze or even make changes
Project vs. Batch Flows
Vivado supports two flows:
– Project
– Batch (Project-less)
Project Flow
– Project infrastructure
– IDE or Tcl script
– Reports automatically generated
Batch Flow
– No project infrastructure
– Tcl based
– Use GUI for visualization via start_gui
– Must manually create reports and checkpoints

Project Based (IDE or script):
  create_project …
  add_files …
  import_files …
  …
  launch_runs synth_1
  wait_on_run …
  open_run …
  report_timing_summary
  launch_runs impl_1
  wait_on_run …
  open_run …
  report_timing_summary

Batch Based (script):
  read_verilog …
  read_vhdl …
  read_edif …
  …
  synth_design …
  report_timing_summary
  write_checkpoint …
  opt_design
  place_design
  report_timing_summary
  write_checkpoint …
  route_design
  report_timing_summary
  write_checkpoint …
Analysis at Schematic View
Create Schematic from selected Timing Path(s)
View logic levels across critical timing paths
Expand connectivity and select parent hierarchies
Analysis at Placed Design View
Select Paths in Timing Results, view objects cross-selected in other windows
Detailed Placement View
Zoom in for detailed device view
– Timing paths display from actual pins of primitives
Slice usage shown
– Data flow through a slice displayed
Analysis at Routed Design View
View Routing Resources
View route used
Cross-probing enabled
Change layers & colors
Vivado Implementation – Demo
Easy to Use
– Single IDE to learn
– Push-button flows
– Task-based “Views”
Fast Synthesis
Fast Implementation
Rapid design analysis
Make Last Minute Design Changes without Re-building the Entire Design
Incremental Implementation
– Ideal for small changes (<5%)
– Timing preservation from run to run
  • Timing changes limited to modified areas
– Runtime: 2.5x faster
[Chart: compile time T (hrs), full compiles compared with incremental compiles]
Post P&R with Device Editor
– All FPGA elements are visible with exact placement and precise routing topology
– Rapid post-route design editing
  • Make design changes and auto-route nets
  • Modify routing, placement, logic
– GUI or Tcl based
Hierarchical Design Flows: Design Reuse
Design Reuse Flow enables parallel implementation for Team Design
– Place & Route modules without top level design
– Assemble results with exact preservation
– Leverages natively-hierarchical data model
Package IP and reuse in new designs
– Reuse module as a pre-verified placed & routed result
Design Reuse support starts at 2012.3
Hierarchical Design Flows: Partial Reconfiguration
System Flexibility
– Swap functions and perform remote updates while system is operational
Cost and Size Reduction
– Time-multiplexing hardware requires a smaller FPGA
– Reduces board space
– Minimizes bitstream storage
Power Reduction
– Via smaller and/or fewer devices
– Swap out power-hungry tasks
Partial Reconfiguration support starts in 2013
Power Optimization
Fine-grain clock gating reduces dynamic power by up to 30%
Before After
Access via Tcl command
– power_opt_design
Push button power reduction for the entire design
Targeted optimizations for specified resources
– Clock domains (set_power_opt -clocks)
– BRAM, Registers, SRL (set_power_opt -cell_types)
– Instances (set_power_opt -include_cells / -exclude_cells)
Power Analysis
Accurate power and thermal analysis
Power estimates at every stage after synthesis
Analyze power by consumption type
– Each view deconstructed with low-level details
What-if analysis by varying switching activity
Extensive debug capabilities via cross-probing
Export data to Xilinx Power Estimator