
Announcing Vivado™ Built from the Ground Up for the Next Decade of ‘All Programmable’ Devices

© Copyright 2012 Xilinx Announcing Vivado Design Suite

IP & System Centric Next Generation Design Environment

For the Next Decade of ‘All Programmable’ Devices

Accelerating Integration & Implementation up to 4X

Built from the Ground Up for the Next Decade of Programmable Design

Why Now?

Programmable Logic Devices
– Enables Programmable ‘Logic’

ALL Programmable Devices
– Enables Programmable Systems ‘Integration’

Bottlenecks are Shifting

System Integration Bottlenecks
– Design and IP reuse
– Integrating algorithmic and RTL level IP
– Mixing DSP, embedded, connectivity, logic
– Verification of blocks and “systems”

Implementation Bottlenecks
– Hierarchical chip planning
– Multi-domain and multi-die physical optimization
– Predictable ‘design’ vs. ‘timing’ closure
– Late ECOs and rippling effect of changes

Vivado: Accelerating Productivity up to 4X

[Figure: productivity chart with two axes, Accelerating Integration and Accelerating Implementation, each running from 1X to up to 4X. The 1X baseline is “RTL to Bit-stream with Iterative Approach”; the Vivado Next Generation Design System is shown at up to 4X.]

Accelerating Integration
– IP & System-centric Integration with Fast Verification
Accelerating Implementation
– Fast, Hierarchical and Deterministic Closure Automation w/ ECO

Vivado Design Suite Elements

Integrated Design Environment
Shared Scalable Data Model
Debug and Analysis
Accelerating Integration
– IP & System-centric Integration with Fast Verification
Accelerating Implementation
– Fast, Hierarchical and Deterministic Closure Automation w/ ECO
Scalable to 100M Gates

Vivado Key Enabling Technologies: Shared, Scalable Data Model

Progressive estimation accuracy across the entire flow
– Reduced iterations late in the cycle
[Figure: Estimation at each step of the flow (IP Integration, RTL Design, Synthesis, Place & Route), all built on the Shared, Scalable Data Model]

Shares design information between implementation steps
– Ensures fast convergence and timing closure
Highly efficient memory utilization
– Scalable to future families > 10M logic cells (100M Gates)
Enables cross-probing across the entire design
[Figure: cross-probing between RTL, Schematics, Placement, Tool Settings, Code Changes, Placement Edits, the Timing Report (Timing Path #1, #2, #3) and Reports]

Vivado Design Suite Elements

Integrated Design Environment
Shared Scalable Data Model
Debug and Analysis
Accelerating Integration: IP & System-centric Integration with Fast Verification
– ESL Algorithm to IP Synthesis
– Systems Integration
– IP Assembly
– Stds Based IP Reuse
– Fast Simulation & HW Co-sim
Scalable to 100M Gates

IP-Centric Integration with Fast Verification

                      Hand-coded VHDL    Vivado HLS
Design Time (weeks)   12                 1
Latency (ms)          37                 21
Memory (RAMB18E1)     134 (16%)          10 (1%)
Memory (RAMB36E1)     273 (65%)          138 (33%)
Registers             29686 (9%)         14263 (4%)
LUTs                  28152 (18%)        24257 (16%)

[Figure: system block diagram with Memory Interfaces, Processor System, PCIe, Display, Embedded Interconnect and Processing Datapath, built from User IP, Xilinx IP and 3rd Party IP]

IP & System-centric Integration with Fast Verification
– ESL Algorithm to IP Synthesis
– IP & HW-SW Integration
– IP Assembly with Integrator
– Stds Based IP Reuse
– Tcl, SDC
– Fast Simulation & HW Co-Sim
[Figure: Runtime chart comparing ISim, Vivado and Vivado w/ HW Co-sim]

Package Designs into System-Level IP for Reuse

Standardized IP-XACT representation
[Figure: Source (C, RTL, IP, etc), Simulation Models, Documentation, Example Designs and Test Bench feed the IP Packager; the resulting User IP, Xilinx IP and 3rd Party IP are assembled in Vivado IP Integrator around the Memory Interfaces, Processor System, PCIe, Display, Embedded Interconnect and Processing Datapath]

Share IP within your team, project or company
3rd party IP delivered with a common look and feel
Reuse IP at any point in the implementation process
– Source, placed, or placed and routed
Reuse in different designs
Reuse multiple times

Seamless IP Access and Customization

Integrated IP catalog
– Powerful search capabilities
– Single-click access to IP functionality and collateral

IP customization and generation
– Instant access to customization GUI
– Generate output products in project or remote directory
– Customize graphically or via Tcl

Selectable IP Targets

Flexible output targets
– On-demand generation of IP output targets
– Generate testbench, example, etc.

Integrated example designs
– Evaluate IP directly as an instantiated source in a Vivado project

Multiple options for IP synthesis
– IP sources with overall design
– IP pre-synthesized as a netlist prior to design synthesis

IP Packager: IP-XACT

IEEE 1685 IP-XACT is an industry-standard way to represent data about IP (meta-data)
– Port information
– Latency
– Configurable parameters
– Etc.
ASCII XML based
Enables IP to be used in multiple vendors' tool flows

IP Packager: Generate IP-XACT for your IP

Wizard-based flow automates generation of IP-XACT IP
Prepare an IP for distribution to customers or colleagues
Many pieces of meta-data automatically inferred
Users can add additional meta-data

IP Packager: Create System-Level IP

1. Run IP Packager from Tools menu
2. Package sources from Vivado project as IP
3. Provide information to uniquely identify your IP

Extensible IP Catalog: Add Packaged IP

1. Unzip IP to a local directory
2. Right-click on IP Catalog
3. Add directory to IP Catalog

Vivado IP Integrator

A graphical design environment to enable rapid and accurate connection of complex IP
– Connections made at the interface level, not the individual signal level
– Automatic setting and propagation of IP parameters
– Automated generation of RTL
– Full support for arbitrary levels of design hierarchy
– Capable of processor-based or non-processor based design creation
Tight integration with Vivado IP Packager flow for rapid IP and subsystem reuse

IP Integrator User Interface

[Figure: IP Integrator screenshot with callouts: Hierarchy Support, System Hierarchy, Interface Connections View with Real-time DRCs, Tcl Console]

IP Integrator Real-time DRCs

IP and system configuration rules can be very complex
– User will require help to correctly connect IPs
IP Integrator provides immediate feedback on design errors/optimization

IP Integrator Real-time DRCs (cont)

All IP Integrator automation services can issue DRCs from:
– IP configuration XGUI
– IP specific automation services
– Built-in automation services
– System optimization services
Not just errors: Intelligent DRCs may also include solutions
– Goal: Proposed solutions can be actual Tcl code, not just passive text
GUI collects, prioritizes and reports DRCs to the user so they can make informed choices
[Figure: DRC dialog offering an “Apply solution 1” action]

Vivado IP Integrator – Demo

Graphical design to enable rapid and accurate connection of complex IP
– Connections made at the interface level, not the individual signal level
– Automatic setting and propagation of IP parameters
– Automated generation of RTL
– Full support for arbitrary levels of design hierarchy
– Capable of processor-based or non-processor based design creation
Tight integration with Vivado IP Packager flow for rapid IP and subsystem reuse

Vivado High-Level Synthesis Accelerates IP Development and Design Space Exploration

Ideal for DSP, video and high performance compute applications
QoR that rivals hand-coded RTL
– Fast compilation and design exploration
– Algorithm/architecture feasibility
Comprehensive coverage
– C/C++/SystemC
– Arbitrary precision
– Floating-point
Accelerated verification
– 2 to 3 orders of magnitude faster than RTL for larger designs

BDTI certified and production proven at 20+ customer sites

ESL Design Methodology

[Figure: abstraction levels and the steps between them.
Functionality (Model) to Architecture (RTL): ESL Design, by High-Level Synthesis or Model-Based Design.
Architecture (RTL) to Gates (Netlist): Synthesis.
Gates (Netlist) to Silicon (Layout): Place & Route.]

ESL Solutions

Electronic System Level (ESL): High-Level Synthesis (HLS) and Model-Based Design (MBD)

                        HLS                                      MBD
Input Method            Language: C-based (functions)            Structure: C-based (bus-functional simulation models)
Operation Level         Behavior (no clocks/resets)              Signals
Implementation Method   High-Level Synthesis                     IP
                        (allocation/scheduling)
Benefits                • Flexibility                            • Available libraries
                        • Architecture exploration               • Result analysis/visualization
Benefits (both)         Portability, Verification speed, Quality of results

Model Based Design – System Generator

Easily create System Generator DSP sources
Add existing files or create new Simulink models

Model Based Design – System Generator

Fully integrated into the Vivado IDE
– Launch System Generator from the Vivado IDE

HLS: What’s different?

Established specification language
– C/C++/SystemC standards
Quality of Results
– Extracting parallel execution from sequential specification
Accommodates datapath and control
Complements RTL-based tools
Acknowledgement of verification needs
Consideration for physical interfaces

Source: IEEE Design&Test of Computers (2009) Volume: 26, Issue: 4, Publisher: IEEE Computer Society, Pages: 18-25

Vivado HLS Design Flow

Starts at C
– C
– C++
– SystemC
Produces RTL
– VHDL
– SystemC
Automates Flow
– RTL Verification
– IP Packaging

[Figure: Vivado HLS flow. The C Specification (Function: C Design and C Test Bench) goes through C Verification and C Synthesis to an RTL Design (Architecture); a C Wrapper reuses the test bench for RTL and Behavioral Verification; Packaging produces an IP Block (IP Package) for Vivado IP Packager, Vivado IP Integrator and System Generator.]

Core Technology

[Figure: core technology layers.
Coding Style (C): Function, Parameter, CTHREAD, Loop, Statement, Array.
Directives (Tcl): UNROLL, INLINE, STREAM, PIPELINE, DATAFLOW, PARTITION, ALLOCATION, INTERFACE.
Architecture: Port, Clock Domain, Module, Process, Operation, Memory.
AXI4 Portable IP: AXI4 Interface, Performance, Resources, IP Block.]

Function versus Architecture

Function (Sequential)

    void top (
      int& dout1, int& dout2,
      int din1, int din2
    ) {
      dout1 = din1+din2;
      dout2 = din1*din2;
    }

Architecture (Parallel): Interface, Storage, State machine, Datapath

    // Interface
    module top (dout1,dout2,
                din1,din2,
                ovld,ivld,
                clk,rst);
    output [31:0] dout1,dout2;
    output ovld;
    input [31:0] din1,din2;
    input ivld;
    input clk;
    input rst;

    // Storage
    reg [31:0] tmp,rdin1,rdin2;
    reg [31:0] dout1,dout2;
    reg ovld;
    reg [1:0] state, next_state;
    parameter
      RESET=2'b00,INPUT=2'b01,
      CALC=2'b10,OUTPUT=2'b11;

    // State machine (Process)
    always @(posedge clk)
    begin
      if (rst == 1'b1)
        state <= RESET;
      else
        state <= next_state;
      case (state)
        RESET:
          next_state <= INPUT;
        INPUT:
          if (ivld == 1'b1)
          begin
            next_state <= CALC;
            rdin1 <= din1;
            rdin2 <= din2;
          end
        CALC:
          next_state <= OUTPUT;
        OUTPUT:
          next_state <= RESET;
        default: next_state <= RESET;
      endcase
    end

    // Datapath
    always @(posedge clk)
      case (state)
        RESET:
        begin
          dout1 <= 32'b0;
          dout2 <= 32'b0;
          ovld <= 1'b0;
        end
        CALC:
        begin
          dout1 <= din1+din2;
          tmp <= din1*din2;
        end
        OUTPUT:
        begin
          dout2 <= tmp;
          ovld <= 1'b1;
        end
      endcase
    endmodule

[Figure: datapath with an adder (din1 + din2 to dout1) and a multiplier (din1 * din2 to dout2), and a state diagram: rst enters RESET, then INPUT (waits while !ivld, advances on ivld), CALC, OUTPUT, back to RESET]
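The gap between the two views can be explored in plain C++ before any RTL exists. The following is a minimal sketch, not the code Vivado HLS generates: `TopRtlModel` and `run_model` are hypothetical names, and the model simplifies the slide's RTL by using a single state register and by computing from the registered inputs `rdin1`/`rdin2`.

```cpp
#include <cassert>

// Sequential "function" view, as on the slide.
void top(int& dout1, int& dout2, int din1, int din2) {
    dout1 = din1 + din2;
    dout2 = din1 * din2;
}

// Hypothetical cycle-level model of the "architecture" view.
enum class State { RESET, INPUT, CALC, OUTPUT };

struct TopRtlModel {
    State state = State::RESET;
    int rdin1 = 0, rdin2 = 0, tmp = 0;  // storage registers
    int dout1 = 0, dout2 = 0;
    bool ovld = false;

    // Models one rising clock edge.
    void clock(bool ivld, int din1, int din2) {
        switch (state) {
        case State::RESET:                 // clear outputs
            dout1 = 0; dout2 = 0; ovld = false;
            state = State::INPUT;
            break;
        case State::INPUT:                 // wait for a valid input, then latch it
            if (ivld) { rdin1 = din1; rdin2 = din2; state = State::CALC; }
            break;
        case State::CALC:                  // adder and multiplier work in parallel
            dout1 = rdin1 + rdin2;
            tmp = rdin1 * rdin2;
            state = State::OUTPUT;
            break;
        case State::OUTPUT:                // publish the product, flag output valid
            dout2 = tmp;
            ovld = true;
            state = State::RESET;
            break;
        }
    }
};

// Clocks the model with a constantly valid input until ovld is raised.
// Returns the number of clock edges used.
int run_model(TopRtlModel& m, int din1, int din2) {
    int cycles = 0;
    do {
        m.clock(true, din1, din2);
        ++cycles;
        assert(cycles < 16);               // safety bound for this sketch
    } while (!m.ovld);
    return cycles;
}
```

Calling `run_model` on a fresh `TopRtlModel` and comparing `dout1`/`dout2` with the results of `top` checks that the state machine computes the same function, only spread over several clock edges.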

Vivado HLS C Development

CDT based
– Simplified for HLS user
– Windows
  • MinGW/msys included
– Linux
SystemC libraries included
Video/Image functional verification: 10000x speed versus RTL simulation

Standard Input

Design Specification
– Superior language support
  • C: Easy, familiar
  • C++: Methodical
  • SystemC: Standard
Directives
– Tcl: Efficient Exploration
– Pragma: Self-documenting

[Figure: the three input languages and their constructs.
C (Structured Programming): Function.
CPP, Standard C++ (Object Oriented Programming, OOP): Class, Template (STL), Arbitrary precision.
OSCI SystemC, IEEE 1666-2005 (System Modeling): Module/Port, Parallel process, Time (Simulation Kernel), Abstraction (TLM).]

Vivado HLS Synthesis

Fast for architecture exploration
– Tcl-based batch mode
Scalable for ultra-large IP blocks
– 200K LUT/hour
Quality of results
– DSP48 inferencing
– Parallelization (instruction and task)

[Figure: Synthesis Time (s), 0 to 250, versus Design Size (LUT), 0 to 20000]

DSP Applications

Arbitrary Precision
– C
  • Simulation and Synthesis
– C++
  • Rounding and Saturation

C++ version (fixed-point types):

    void yuv2rgb (
      pixel_t *in,
      pixel_t *out
    ) {
      hls_ufixed<8,8,HLS_RND,HLS_SAT> R, G, B;
      hls_fixed<8,8,HLS_RND,HLS_SAT> Y, U, V;
      const ap_fixed<11,2,HLS_RND> Wyuv[3][3] = {
        {1, 0, 1.13983},
        {1,-0.39465,-0.5806},
        {1, 2.03211, 0},
      };
      Y = in->col1;
      U = in->col2;
      V = in->col3;
      R = Wyuv[0][0] * Y + Wyuv[0][2] * V;
      G = Wyuv[1][0] * Y + Wyuv[1][1] * U + Wyuv[1][2] * V;
      B = Wyuv[2][0] * Y + Wyuv[2][1] * U ;
      out->col1 = R;
      out->col2 = G;
      out->col3 = B;
    }

C version (arbitrary-width integer types):

    void yuv2rgb (
      pixel_t *in,
      pixel_t *out
    ) {
      uint8 R, G, B;
      int9 C, D, E, Y, U, V;
      const int11 Wyuv[3][3] = {
        {298, 0, 409},
        {298, -100, -208},
        {298, 516, 0},
      };
      Y = in->col1;
      U = in->col2;
      V = in->col3;
      C = Y - 16;
      D = U - 128;
      E = V - 128;
      R = CLIP(( Wyuv[0][0] * C + Wyuv[0][2] * E + 128) >> 8);
      G = CLIP(( Wyuv[1][0] * C + Wyuv[1][1] * D + Wyuv[1][2] * E + 128) >> 8);
      B = CLIP(( Wyuv[2][0] * C + Wyuv[2][1] * D + 128) >> 8);
      out->col1 = R;
      out->col2 = G;
      out->col3 = B;
    }
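The integer version above can be tried on a desktop compiler without the HLS headers. This is a minimal sketch under stated assumptions: `Pixel` is a hypothetical stand-in for the slide's `pixel_t`, standard integer types replace `uint8`/`int9`/`int11`, and `scale_and_clip` is an assumed helper that plays the role of `CLIP` together with the `+ 128` rounding and `>> 8` scaling.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the slide's pixel_t (three 8-bit channels).
struct Pixel { std::uint8_t col1, col2, col3; };

// Rounds, scales by 1/256 and saturates to 0..255.
static std::uint8_t scale_and_clip(int sum) {
    int v = sum + 128;            // add half an LSB so the shift rounds
    if (v < 0) return 0;          // saturate low before shifting
    v >>= 8;                      // divide by 256
    return static_cast<std::uint8_t>(v > 255 ? 255 : v);
}

// Integer YUV to RGB conversion using the coefficient matrix from the slide.
void yuv2rgb(const Pixel* in, Pixel* out) {
    static const int W[3][3] = {
        {298,    0,  409},
        {298, -100, -208},
        {298,  516,    0},
    };
    const int C = in->col1 - 16;   // luma offset
    const int D = in->col2 - 128;  // chroma offsets
    const int E = in->col3 - 128;
    out->col1 = scale_and_clip(W[0][0] * C + W[0][2] * E);
    out->col2 = scale_and_clip(W[1][0] * C + W[1][1] * D + W[1][2] * E);
    out->col3 = scale_and_clip(W[2][0] * C + W[2][1] * D);
}
```

Saturating negative sums before the shift keeps the sketch independent of how a given compiler shifts negative values; in the HLS source the narrow types and rounding/saturation modes express the same intent directly in the type system.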

DSP Applications

DSP48 inferencing

    #include "ap_int.h"
    #include <complex>
    using namespace std;

    typedef ap_int<18> data_t;
    typedef complex<data_t> cdata_t;

    cdata_t cmult (
      cdata_t a,
      cdata_t b
    ) {
      return a*b;
    }

• Expression matching: pre/mult/post
• Coding Style
• Attributes
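Written out by hand, the complex product is four multiplications, one subtraction and one addition, which is the multiply-then-add pattern that DSP inferencing looks for. The sketch below uses plain integers instead of `ap_int<18>`; `CData` and `CProd` are hypothetical types introduced only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins: operands hold 18-bit values, results are widened.
struct CData { std::int32_t re, im; };
struct CProd { std::int64_t re, im; };

// Complex multiply expanded into its four real multiplications.
CProd cmult(CData a, CData b) {
    CProd r;
    r.re = static_cast<std::int64_t>(a.re) * b.re
         - static_cast<std::int64_t>(a.im) * b.im;   // real part
    r.im = static_cast<std::int64_t>(a.re) * b.im
         + static_cast<std::int64_t>(a.im) * b.re;   // imaginary part
    return r;
}
```

For example, multiplying 1+2i by 3+4i gives a real part of 3 - 8 = -5 and an imaginary part of 4 + 6 = 10.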

[Figure: complex multiplier datapath: areal, aimag, breal and bimag feed four multipliers; a subtractor produces rreal and an adder produces rimag]

HPC Applications

Floating-Point
– Performance
– Latency
IEEE 754 Compliance
[Figure labels: Allocated Vectorization, Full Vectorization]

    #include "fir.h"
    data_t fir(data_t x) {
      const coef_t c[N] = {
        #include "fir.inc"
      };
      // Delay line has extra delay at input
      static data_t z[N];
      acc_t acc = 0;
      int i,j;
      taps: for (i = N-1; i >= 0; i--) {
        z[i] = (i==0) ? x : z[i-1];
        acc += z[i] * c[i];
      }
      return acc;
    }

    typedef float coef_t;
    typedef float data_t;
    typedef float acc_t;
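A self-contained variant of the FIR loop can be run on a desktop compiler to check the algorithm before synthesis. This sketch makes several assumptions: the tap count `N` and the coefficient values in `kCoef` are invented for illustration (the slide pulls them from `fir.h` and `fir.inc`), and the delay line lives in a hypothetical `Fir` struct rather than a `static` array so that each instance starts from a clean state.

```cpp
#include <cassert>

typedef float coef_t;
typedef float data_t;
typedef float acc_t;

const int N = 4;                                              // assumed tap count
const coef_t kCoef[N] = {0.5f, 0.25f, 0.125f, 0.0625f};       // assumed coefficients

// FIR filter with an explicit delay line (the slide uses a static array).
struct Fir {
    data_t z[N] = {0, 0, 0, 0};

    data_t step(data_t x) {
        acc_t acc = 0;
        // Walk the taps from oldest to newest, shifting the delay line as we go.
        for (int i = N - 1; i >= 0; i--) {
            z[i] = (i == 0) ? x : z[i - 1];
            acc += z[i] * kCoef[i];
        }
        return acc;
    }
};
```

Feeding a single 1 followed by zeros returns the coefficients one per call, which is a quick sanity check of the shift-and-accumulate loop.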

Vivado High-Level Synthesis – Demo

Bridging the Gap
– Algorithm designer to Hardware designer

Creating with C
– Concise Specification
– Algorithm Exploration
– Fast Verification

Reusing with HLS
– Architecture Exploration
– Cost/Power Reduction
– Portable IP

[Figure: Vivado HLS flow from Specification & Debug (C Design, C Test Bench) through High-Level Synthesis to RTL Design, with C Wrapper for Architectural Verification and Packaging for IP Integrator, IP Packager, System Generator and RTL]

Vivado Design Suite Elements

Integrated Design Environment
Shared Scalable Data Model
Debug and Analysis
Accelerating Implementation
– Hierarchical Chip Planning
– Fast Synthesis w/ SDC Constraints
– Deterministic P&R, Closure Automation
– Power
– ECO
Scalable to 100M Gates

Deterministic Design Closure

[Figure: design hierarchy (TOP; DATA, CPU, CNTRL; MEM, DMA) alongside the implementation flow: Hierarchical Chip Planning, Fast Synthesis w/ SDC Constraints, Deterministic P&R and Closure Automation, with Power (Analyzer, Optimizer) and ECO; a T (hrs) chart compares Full compile and Incremental runs]

Vivado Synthesis

Tightly integrated into Vivado IDE
Superior SystemVerilog support
3x faster runtime
– 15x with “quick-synthesis” option
– Scales to multi-million Logic Cells
Design for debug
– Easy to navigate schematic with cross-probing to HDL
– Mark nets for hardware debug

Higher-level RTL Synthesis
Control Data Flow Graph Optimization

Example: Counting 1’s in a vector

Original code:

    c = 0;
    for (i=0; i<8; i=i+1)
      if (a[i] == 1)
        c = c + 1;

Optimized & unrolled:

    c = a[0] + a[1] + a[2] +
        a[3] + a[4] + a[5] +
        a[6] + a[7];

Traditional Synthesis tools
– Area: 8 ADD + 8 MUX
– Depth: O(n)
Optimized by Vivado Synthesis
– Area: 4 smaller ADD, 4x smaller
– Depth: O(log(n))

[Figure: a serial chain of +1 incrementers and multiplexers over a[0] … a[7], next to a balanced adder tree over the same bits]
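The equivalence of the two forms is easy to check in software. The sketch below is an illustration of the idea, not Vivado Synthesis output: `count_ones_loop` mirrors the original loop, and `count_ones_tree` (a hypothetical name) sums the bits pairwise so that the longest chain of additions is three levels deep for eight inputs.

```cpp
#include <cassert>
#include <cstdint>

// Loop form from the slide: a chain of conditional increments, depth O(n).
int count_ones_loop(std::uint8_t a) {
    int c = 0;
    for (int i = 0; i < 8; i = i + 1)
        if (((a >> i) & 1) == 1)
            c = c + 1;
    return c;
}

// Balanced adder tree over the same bits: depth O(log n).
int count_ones_tree(std::uint8_t a) {
    int b[8];
    for (int i = 0; i < 8; ++i)
        b[i] = (a >> i) & 1;
    // Level 1: four 1-bit additions.
    const int s01 = b[0] + b[1], s23 = b[2] + b[3];
    const int s45 = b[4] + b[5], s67 = b[6] + b[7];
    // Level 2: two 2-bit additions.
    const int s0123 = s01 + s23, s4567 = s45 + s67;
    // Level 3: final addition.
    return s0123 + s4567;
}
```

Both functions return the same count for every 8-bit input; only the shape of the computation differs, which is what determines logic depth once it is mapped to hardware.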

Vivado Key Enabling Technologies: Analytical Place & Route Engine

[Figure: Timing Cost f(x) plotted against Placement Solution x. Starting from an initial random seed, random moves reach the best solution found (found by random moves and seeds); the optimal solution is not found, and part of the solution space is not routable.]

                    Traditional P&R                           Vivado P&R
“Cost” Criteria     1 dimension: timing minimization          3 dimensions: timing, congestion,
                                                              wire length minimization
Primary Algorithm   “Simulated Annealing”: random,            Analytical: solves simultaneous
                    iterative search based on initial seed    equations to minimize all dimensions
Runtime             Unpredictable, due to random nature       Very predictable. Manages congestion
                    of algorithm. Exponential with
                    congestion
Scalability         Poor results as design approaches         Will handle 10M+ logic cells
                    1M logic cells                            with predictable results
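To give a feel for what "solves simultaneous equations" means in placement, here is a textbook-style toy, not Vivado's algorithm: a one-dimensional chain of movable cells between two fixed pads, placed by minimizing the sum of squared wire lengths. Setting the derivative to zero for each cell gives the linear system 2x_i - x_(i-1) - x_(i+1) = 0, which the hypothetical function `place_chain` solves directly with the Thomas algorithm for tridiagonal systems. Timing and congestion terms are left out of this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Places n movable cells connected in a chain between two fixed pads by
// minimizing the sum of squared net lengths. Returns the cell positions.
std::vector<double> place_chain(int n, double left_pad, double right_pad) {
    assert(n >= 1);
    // Right-hand side: only the end cells see the fixed pads.
    std::vector<double> rhs(n, 0.0);
    rhs[0] += left_pad;
    rhs[n - 1] += right_pad;

    // Thomas algorithm for the system with diagonal 2 and off-diagonals -1.
    std::vector<double> cp(n), dp(n), x(n);
    cp[0] = -1.0 / 2.0;
    dp[0] = rhs[0] / 2.0;
    for (int i = 1; i < n; ++i) {
        const double m = 2.0 + cp[i - 1];      // pivot after elimination
        cp[i] = -1.0 / m;
        dp[i] = (rhs[i] + dp[i - 1]) / m;
    }
    // Back substitution.
    x[n - 1] = dp[n - 1];
    for (int i = n - 2; i >= 0; --i)
        x[i] = dp[i] - cp[i] * x[i + 1];
    return x;
}
```

With pads at 0 and 4 and three cells, the solution spreads the cells evenly at 1, 2 and 3. The answer comes from one deterministic solve, with no random seed or move sequence involved, which is the property the table contrasts with simulated annealing.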

Runtime Advantage of Analytical Place & Route

Up to 4x faster than alternative solutions
More predictable runtimes

[Figure: Runtime (hours), 0 to 25, versus Design size (LC), 0 to 2,000,000, for Vivado, ISE and a Competitor; slope annotations of 12h/MLC and 4.6h/MLC]

Based on a benchmark suite of 100+ designs

Vivado Design Example: 1.2 M LC Virtex-7 2000T Design*

                             ISE        Vivado
P&R runtime                  13 hrs     5 hrs
Memory usage                 16 GB      9 GB
Wire length and congestion              Significantly reduced

Customer proven: All SSI customers using Vivado today!
*Zynq Emulation Platform

Vivado Design Example: Kintex-7 325T Design

                   ISE                 Vivado
Component usage    82% LUT, 98% DSP    94% LUT, 63% DSP
Congestion         Does not route      Routes!

Analytical placer discovers a more optimal solution

Vivado Design Example: Virtex-7 485XT Design

                   ISE                     Vivado
Component usage    27% LUT, 15% FF         27% LUT, 15% FF
Wire length        Could not meet timing   Timing met!

More effective placement eases timing closure

Vivado IDE

Design & Analysis Environment
– IP Integrator & IP Catalog
– Estimations at any stage of design flow
– Implementation, incremental editing
– Cross-probing / Tcl interaction
– Floorplanner and Design Editor
– Report and log browser

XDC – Xilinx Design Constraints

XDC is an extension of Synopsys Design Constraints (SDC)
– Standard SDC for timing, plus physical constraints
– Constraint for Synthesis through P&R
– Vivado sign-off static timing analysis (STA)
Powerful debug and analysis
– Fast custom timing reports
– What-If Analysis with STA
– Extendable and customizable
Industry Standard Tool Control
– Complete automation for design compiles
– 3rd Party tools use same interface
– Cross-platform scripting (Linux and Windows)

[Figure: nested sets: Vivado Tcl contains XDC, which contains SDC]

Tcl-driven Environment

Interaction between Tcl and IDE views
Direct in-memory access to the design database
– Batch mode: start_gui or stop_gui at any time for graphical analysis
– Analyze or even make changes

Project vs. Batch Flows

Vivado supports two flows:
– Project
– Batch (Project-less)

Project Flow
– Project infrastructure
– IDE or Tcl script
– Reports automatically generated

Batch Flow
– No project infrastructure
– Tcl based
– Use GUI for visualization via start_gui
– Must manually create reports and checkpoints

Project Based (IDE or Script):

    create_project …
    add_files …
    import_files …
    …
    launch_runs synth_1
    wait_on_run …
    open_run …
    report_timing_summary
    launch_runs impl_1
    wait_on_run …
    open_run …
    report_timing_summary

Batch Based (Script):

    read_verilog …
    read_vhdl …
    read_edif …
    …
    synth_design …
    report_timing_summary
    write_checkpoint …
    opt_design
    place_design
    report_timing_summary
    write_checkpoint …
    route_design
    report_timing_summary
    write_checkpoint …

Analysis at Schematic View

Create Schematic from selected Timing Path(s)
View logic levels across critical timing paths
Expand connectivity and select parent hierarchies

Analysis at Placed Design View

Select Paths in Timing Results, view objects cross-selected in other windows

Detailed Placement View

Zoom in for detailed device view
– Timing paths display from actual pins of primitives
Slice usage shown
– Data flow through a slice displayed

Analysis at Routed Design View

View Routing Resources
View route used
Cross-probing enabled
Change layers & colors

Vivado Implementation – Demo

Easy to Use
– Single IDE to learn
– Push-button flows
– Task-based “Views”

Fast Synthesis

Fast Implementation

Rapid design analysis

Make Last Minute Design Changes without Re-building the Entire Design

Incremental Implementation
– Ideal for small changes (<5%)
– Timing preservation from run to run
  • Timing changes limited to modified areas
– Runtime: 2.5x faster
[Figure: T (hrs) chart comparing Full compile and Incremental runs]

Post P&R with Device Editor
– All FPGA elements are visible with exact placement and precise routing topology
– Rapid post-route design editing
  • Make design changes and auto-route nets
  • Modify routing, placement, logic
– GUI or Tcl based

Hierarchical Design Flows: Design Reuse

Design Reuse Flow enables parallel implementation for Team Design
– Place & Route modules without top level design
– Assemble results with exact preservation
– Leverages natively-hierarchical data model
Package IP and reuse in new designs
– Reuse module as a pre-verified placed & routed result

Design Reuse support starts in release 2012.3

Hierarchical Design Flows: Partial Reconfiguration

System Flexibility
– Swap functions and perform remote updates while system is operational

Cost and Size Reduction
– Time-multiplexing hardware requires a smaller FPGA
– Reduces board space
– Minimizes bitstream storage

Power Reduction
– Via smaller and/or fewer devices
– Swap out power-hungry tasks

Partial Reconfiguration support starts in 2013

Power Optimization

Fine-grain clock gating reduces dynamic power by up to 30%
[Figure: Before and After views]

Access via Tcl command
– power_opt_design
Push button power reduction for the entire design
Targeted optimizations for specified resources
– Clock domains (set_power_opt -clocks)
– BRAM, Registers, SRL (set_power_opt -cell_types)
– Instances (set_power_opt -include_cells / -exclude_cells)

Power Analysis

Accurate power and thermal analysis
Power estimates at every stage after synthesis
Analyze power by consumption type
– Each view deconstructed with low-level details
What-if analysis by varying switching activity
Extensive debug capabilities via cross-probing
Export data to Xilinx Power Estimator
