Embedded Processing Using FPGAs Agenda

• Why FPGA Platform Based Embedded Processing • Embedded Use Models And Their FPGA Based Solutions • Architecture/Topology Choices • A Reoccurring Question: Hardware Or • Reconfigurable Hardware • Tool Flows For FPGA Based Embedded Systems

2 - Embedded Processing using FPGAs www..com Agenda

• Why FPGA Platform Based Embedded Processing • Embedded Use Models And Their FPGA Based Solutions • Architecture/Topology Choices • A Reoccurring Question: Hardware Or Software • Reconfigurable Hardware • Tool Flows For FPGA Based Embedded Systems

3 - Embedded Processing using FPGAs www.xilinx.com Why Use Processors In the First Place • Microcontrollers (µC) and Microprocessors (µP) Provide a Higher Level of Design Abstraction – Most µC functions can be implemented using VHDL or - downsides are parallelism & complexity – Using /C++ abstraction & serial execution make certain functions much easier to implement in a µC • Discrete µCs are Inexpensive and Widely Used – µCs have years of momentum and software designers have vast experience using them

4 - Embedded Processing using FPGAs www.xilinx.com Why Embedded Design using FPGAs In Addition To The Universal Drive Towards Smaller Cheaper Faster With Less Power….

1 Difficult to Find the Required Mix of Peripherals in Off the Shelf (OTS) Microcontroller Solutions •2 Selecting a Single Core with Long Term Solution Viability is Difficult at Best •3 Without Direct Ownership of the Processing Solution, Obsolescence is Always a Concern •4 Many Microprocessor Based Solutions Provide Limited On-Chip Peripheral Support

5 - Embedded Processing using FPGAs www.xilinx.com Rarely the Ideal Mix

1• Difficult to Find the Required Mix of Peripherals in Off the Shelf (OTS) Microcontroller Solutions – Even with variants, compromises must still be made

Microcontroller #1 Microcontroller #2 System Requirements FLASHFLASH UARTUART FLASHFLASH UARTUART 2@ UART RAM UART DDR CAN TIMER RAM UART DDR CAN I2C GPIOGPIO SPISPI GPIOGPIO SPISPI CPU Core CPU CPU Core CPU ? SPI CPU Core CPU CPU Core CPU GPIO Timer I2C Timer FLASH Timer I2C Timer DDR SDRAM Lacks I2C & Includes Lacks a Second UART & RAM vs DDR SDRAM Includes Unnecessary IP – Must decide between inventory impact of multiple configurations or device cost of comprehensive IP

6 - Embedded Processing using FPGAs www.xilinx.com Changing Requirements

2• Selecting a Single Processor Core with Long Term Solution Viability is Difficult at Best – Future proofing system requirements is difficult at best

Microcontroller Today’s System Future System

Requirements FLASHFLASH UARTUART Requirements 100 MHz 150 MHz 2@ UART DDRDDR UARTUART 2@ UART TIMER TIMER I2C GPIOGPIO SPISPI I2C ? SPI SPI 100 MHz Core MHz 100 100 MHz Core MHz 100 GPIO I2CI2C TimerTimer GPIO FLASH FLASH DDR SDRAM Completely meets current DDR SDRAM 10/100 Ethernet system requirements – Changing processor cores to accommodate new requirements consumes valuable design resources

7 - Embedded Processing using FPGAs www.xilinx.com Here Today, Gone Tomorrow

3• Without Direct Ownership of the Processing Solution, Obsolescence is Always a Concern – A single core is used to create a family of µCs

Microcontroller Variant #1 - High Volume Automotive FLASHFLASH CANCAN Microcontroller Variant #2 - Moderate Volume Consumer RAMRAM FLASHFLASH UARTUART SPISPI Microcontroller Variant #3 - Lower Volume Niche GPIOGPIO RAMRAM FLASHFLASH SPISPI UARTUART I2CI2C CPU Core CPU CPU Core CPU

GPIOGPIO RAMRAM TimerTimer SPISPI UARTUART CPU Core CPU CPU Core CPU

GPIOGPIO TimerTimer SPISPI CPU Core CPU

CPU Core CPU ?

TimerTimer – The sheer number of µC configurations can lead to obsolescence of lower volume configurations/variants

8 - Embedded Processing using FPGAs www.xilinx.com Chipset Solutions

4• Many Microprocessor Based Solutions Provide Limited On-Chip Peripheral Support – Manufacturers create I/O devices to extend the functionality of the base processor core

SerialSerial Microprocessor Chip-Set ParallelParallel MPU Interface MPU MPU Interface MPU PWMPWM Pre-defined interface

Microprocessor limits performance Microprocessor I/O Expansion Device – An external, pre-defined interface between a µP and its support device limits overall system performance

9 - Embedded Processing using FPGAs www.xilinx.com FPGA Embedded Design

1• FPGAs Allow For The Implementation Of An Ideal Mix Of Peripherals And System Infrastructure 2• New System Requirements Can Be Supported Without Changing Core Design 3• Longevity Of FPGAs Approaches The Longest Available Microcontrollers In The Market 4• FPGAs Are Used To Augment µp Functionality - Absorbing The Core Is The Next Natural Step

10 - Embedded Processing using FPGAs www.xilinx.com Agenda

• Why FPGA Platform Based Embedded Processing • Embedded Use Models And Their FPGA Based Solutions • Architecture/Topology Choices • A Reoccurring Question: Hardware Or Software • Reconfigurable Hardware • Tool Flows For FPGA Based Embedded Systems

11 - Embedded Processing using FPGAs www.xilinx.com Processor Use Models

1 2 3

State Machine Microcontroller Custom Embedded • Lowest Cost, No • Medium Cost, Some • Highest Integration, Peripherals, No RTOS Peripherals, Possible Extensive Peripherals, & No Bus Structures RTOS & Bus Structures RTOS & Bus Structures • VGA & LCD Controllers • Control & Instrumentation • Networking & Wireless • Low/High Performance • Moderate Performance • High Performance

Range of Use Models

12 - Embedded Processing using FPGAs www.xilinx.com Xilinx Processor Solutions

1 2 3

State Machine Microcontroller Custom Embedded

Range of Solutions

13 - Embedded Processing using FPGAs www.xilinx.com PicoBlaze 8-Bit Microcontroller Reference Design

• Free PicoBlaze macro for implementing complex state machines and micro sequencers • Supports Spartan-3, Virtex-4, Virtex-II/Pro FPGAs and CoolRunner-II CPLDs • Minimal logic size – Only 96 Spartan-3 slices, 5% of XC3S200 device • High performance – Up to 44 MIPS in Spartan-3 and 100 MIPS in Virtex-4 • 100% embedded capability – Everything in FPGA – no external components required • http://www.xilinx.com/picoblaze

14 - Embedded Processing using FPGAs www.xilinx.com • 3-Stage Parallel Pipeline MicroBlaze • Optional Instruction and Data Caches – 2KB to 64KB Each 123 DMIPS @ 150 MHz in • Optional Multiply Unit Virtex-II Pro (0.82 DMIPs/MHz) • 32 x 32-bit GPRs Program • Dedicated Fast Simplex Links - FSL Counter Execution Unit ( Add, Sub, Shift, – 32-bit, Point-to-Point Links Logical, Multiply ) between GPRs in the Register File and User Logic Instruction – 8 Input & 8 Output Decode ( includes – Blocking & Non-Blocking inst buffer ) FSL Register File Transfers – Predefined Insts ( 32 X 32b ) • Local Memory Bus Interfaces - LMB Data-Side Bus Interface Data-Side Local MemoryLocal - Bus LMB Local MemoryLocal - Bus LMB Data-Side Bus Interface Data-Side Local MemoryLocal - Bus LMB Local MemoryLocal - Bus LMB • Provide Access to Inst/Data Instruction-Side Bus Interface

Instruction-Side Bus Interface I-Cache D-Cache Memory Without Bus Access ( 2KB to 64KB ) ( 2KB to 64KB ) • 100 MHz+ On-Chip Peripheral Bus – 32-bit Semi-Duplex Bus Firm IP Core Relatively Placed Macro (RPM) • Fast access “Cache Link” interface Remaining FPGA Fabric On -Chip Peripheral Bus - OPB Cache Link

15 - Embedded Processing using FPGAs www.xilinx.com PowerPC Processor Block

10/100/1000 InstructionInstruction SideSide EMAC On Chip Memory OnFPGA Chip Logic Memory Core Client PHY InterfaceInterface

Processor Block ISOCM Control Auxiliary ISOCM Processor EMAC Statistics APU APU FCM Unit Control PowerPC 405 Core DCR Host I/F ISPLB Processor Local

DSPLB Bus Interface

DCR DCR Statistics EMAC Control DSOCM Device Configuration DSOCM Reset, Register Control Debug Register InterfaceInterface

10/100/1000 Client PHY Ext DCR Data Side EMAC On Chip Memory Core InterfaceInterface

16 - Embedded Processing using FPGAs www.xilinx.com Integrating a PowerPC 405

• IP-Immersion™ • ActiveInterconnect™ – Advanced Technology Metal – Segmented Routing Enables High “Headroom” Enables True IP Immersion Bandwidth Interface –1 IP-Immersion Tiles Provide Direct –2 General Purpose Routing Passes Connectivity to Virtex-II Pro Fabric Over Hard IP (Hex & Long)

Metal 9 Metal 8 Metal 7 2 Metal 6 Metal 5 Metal 4 Metal 4 Metal 4 Metal 3 Metal 3 Metal 3 Embedded Metal 2 Metal 2 Metal 2 Tile Tile PowerPC 405 1 Metal 1 Metal 1 Metal 1 IP-Immersion IP-Immersion Poly IP-Immersion Poly Poly

Silicon Substrate Replaces only 1,024 LUTs and 8 Block RAMs

17 - Embedded Processing using FPGAs www.xilinx.com UltraController

Easy to use 5-port HDL module JTAG sys_clk sys_rst

PowerPC 405 ResetReset JTAGJTAG

BRAM BRAM gpio_in ISOCM DSOCM ISOCM DSOCM gpio_out

Code stored in block RAM Virtex-II Pro Fabric

Simple processor/fabric interface uses minimal FPGA resources

18 - Embedded Processing using FPGAs www.xilinx.com UltraController II

Easy to use HDL module Interrupt

sys_clk sys_rst_out

RR PowerPC 405 ee sys_rst ss ee tt JJ gpio_in JTAG TT

A ISOCM

A DSOCM ISOCM DSOCM gpio_out GG

Code Loaded FPGAFPGA Fabric Fabric and stored in Cache Operates at maximum CPU frequency Simple processor/fabric interface uses minimal FPGA resources

19 - Embedded Processing using FPGAs www.xilinx.com Virtex 5 Processor Block

• 5 x 2 Crossbar Connection (128-bit) including Arbiters – Configure through bit-stream or DCR – Up to1:1 freq w/ processor • 1 PLB master interface to FPGA fabric Processor Block DMA LocalLink • 2 PLB slave interfaces from FPGA fabric DMA LocalLink • PLB Master/Slave ports PLB-S0 I/F – operate at diff freq 440 FCM APU Core Memory Ctlr I/F • 4 Thirty-two bit DMA Channels Interface Control IPLB DRPLB PLB-M I/F • Memory Ctlr Interface to connect DWPLB soft logic based Memory controller CPM/ PLB-S1 I/F • Auxiliary Processing Unit Controller Control

– 128 bit interface, pipelining supported DCR DMA LocalLink Internal • DCR Controller Module Registers DMA LocalLink

DCR Slave DCR Master • CPM and Control Module Interface Interface

20 - Embedded Processing using FPGAs www.xilinx.com The System Infrastructure

Bus infrastructure ranges from logic efficient to maximum bandwidth

MasterMaster MasterMaster MasterMaster MasterMaster MasterMaster MasterMaster

3-State3-State Bi-directionalBi-directional BusBus CentralCentral ArbitrationArbitration ControlControl

ArbArb ArbArb

SlaveSlave SlaveSlave SlaveSlave SlaveSlave SlaveSlave SlaveSlave

Bi-directional Bus Centrally Arbitrated Slave-side Arbitration Single channel of communication, Full duplex communication, N-channels of communication, least amount of logic required and moderate amount of logic required most expensive logic requirement moderate routing requirement and minimal routing requirement and extensive routing required

21 - Embedded Processing using FPGAs www.xilinx.com System Peripherals - IP

ArbiterArbiter PrimaryPrimary BusBus BridgeBridge SecondarySecondary BusBus ArbiterArbiter

Slower peripherals such as I/O functions are typically placed 10/10010/100 DDRDDR SPISPI WatchdogWatchdog UARTUART IntcIntc on secondary bus structures EthernetEthernet SDRAMSDRAM EEPROMEEPROM TimerTimer

Infrastructure I/O Based Peripherals Memory Interfaces High Level Functions High speed memory and system interfaces are typically placed on primary bus structures

22 - Embedded Processing using FPGAs www.xilinx.com High Levels of Integration

PowerPC 405 Data Control Register Bus - DCR SystemSystem ResetReset BRAM UserUser PeripheralPeripheral ISOCM DSOCM ISOCM DSOCM UserUser LogicLogic IPIFIPIF JTAGJTAG (master(master oror BlockBlock slave)slave) Instruction Data

Arbiter Processor Local Bus - PLB Bridge Secondary PLB Arbiter

10/100/1G10/100/1G MemoryMemory 10/10010/100 PCIPCI 32/3332/33 GPIOGPIO UARTUART EthernetEthernet ControllerController EthernetEthernet

MemoryMemory DDR SDRAM

SDRAM ZBT SSRAM Full System Customization & High Performance

23 - Embedded Processing using FPGAs www.xilinx.com Complete Customization

Dual Port Instruction-Side Dual Port Data-Side Local Local Memory Bus BlockBlock RAMRAM Memory Bus

MicroBlaze SystemSystem UserUser ResetReset PeripheralPeripheral

JTAG Inst LMB JTAG Data LMB IPIF Inst Inst LMB IPIF Data Data LMB BlockBlock

Processor Local Bus - PLB Arbiter

MemoryMemory EthernetEthernet UART1UART1 UARTUART 7 7 SPISPI ControllerController oror CANCAN

FPGA Fabric DDR SDRAM ZBT SSRAM SDRAM

24 - Embedded Processing using FPGAs www.xilinx.com Gigabit System Reference Design (GSRD) • Designed for maximum TCP/IP performance with the PowerPC 405 • GSRD building blocks – LocalLink Gigabit Ethernet MAC – Communications DMA Controller – Multi Port Memory Controller • High Performance TCP transmit – Over 900Mbps (with Treck TCP/IP stack) • Applications – High performance bridging – TCP/IP and user data interfaces – IP over GigE Ethernet Switching – High Speed Remote Monitoring – Remote Image/Video Capture

25 - Embedded Processing using FPGAs www.xilinx.com Multiported Memory Controller

26 - Embedded Processing using FPGAs www.xilinx.com Today You Have Multiple Topology Choices

Custom Local Memory Coprocessors

JTAG Point-to-Point MicroBlaze or PowerPC Debug Trace Connections for Shared Bus for higher performance PLBv46 smaller area Multi Port Interrupt Memory Controller Controller DMA Timer/PWM

Ethernet MAC I2C/SPI

PCI UART

Generic Peripheral Controller GPIO

CAN/MOSTGPIO Custom I/O GPIO Peripherals USBGPIOGPIO 2.0 Virtex or Spartan FPGA

27 - Embedded Processing using FPGAs www.xilinx.com Multicore Topologies

Memory Map. SW responsibility for non- PLBv46 PLBv46 conflicting resource partition XPS_INTC XPS_INTC Private boot memory Private boot memory Data_IRQ_A Data_IRQ_B

XPS BRAM XPS BRAM Shared external memory

Interrupt Interrupt Pass small sized messages between a IPLB2 IPLB IXCL IPLB MPMC DPLB DPLB2 DPLB DXCL sender and a receiver (External Memory) PPC405 MicroBlaze Partitioning clocking PLB XCL domains Use synchronous-polling or PLB XCL assync interrupts LL PLB Treatment of CPU and Free LL system reset

Other LL XPS_LLTEMAC Peripheral

_Data_IRQ_A _Data_IRQ_B Processor Subsystem 1 Send FIFO Processor Subsystem 2

Other XPS Receive FIFO Other XPS Peripherals Peripherals UART, GPIO, TIMER XPS_Mailbox Core UART, GPIO, TIMER

Mutex array

XPS_Mutex Core

Shared BRAM memory

CPUs with shared XPS BRAM You can still use shared resources need to sync. memory techniques with BRAM Configure number of

PLB_v46 BRIDGE mutex registers to test and set

28 - Embedded Processing using FPGAs www.xilinx.com Virtex 5 Processor Block

• 5 x 2 Crossbar Connection (128-bit) including Arbiters – Configure through bit-stream or DCR – Up to1:1 freq w/ processor • 1 PLB master interface to FPGA fabric Processor Block DMA LocalLink • 2 PLB slave interfaces from FPGA fabric DMA LocalLink • PLB Master/Slave ports PLB-S0 I/F – operate at diff freq 440 FCM APU Core Memory Ctlr I/F • 4 Thirty-two bit DMA Channels Interface Control IPLB DRPLB PLB-M I/F • Memory Ctlr Interface to connect DWPLB soft logic based Memory controller CPM/ PLB-S1 I/F • Auxiliary Processing Unit Controller Control

– 128 bit interface, pipelining supported DCR DMA LocalLink Internal • DCR Controller Module Registers DMA LocalLink

DCR Slave DCR Master • CPM and Control Module Interface Interface

29 - Embedded Processing using FPGAs www.xilinx.com FPGA logic

FPGA logic FPGA logic

Significant reduction in amount of FPGA logic required to buil d a system using integrated functionality of Virtex -5 Processor Block

30 - Embedded Processing using FPGAs www.xilinx.com Agenda

• Why FPGA Platform Based Embedded Processing • Embedded Use Models And Their FPGA Based Solutions • Architecture/Topology Choices • The Reoccurring Question: Hardware Or Software • Reconfigurable Hardware • Tool Flows For FPGA Based Embedded Systems

31 - Embedded Processing using FPGAs www.xilinx.com Two Ways to meet Performance Requirements

APU or FSL 5 GHz 5 MHz

CPU HW Accele FPGAFPGA rator µµCC

Brut Force Intelligent

33 - Embedded Processing using FPGAs www.xilinx.com Co-Processor Interfacing

FPGA,FPGA, ASICASIC Conventional µC µC oror ASSPASSP approach Bus Interface

• Fast • Deterministic • Easy Xilinx APU Solutions FSL implementation Co-Processor / Co-Processor / Peripheral Peripheral

34 - Embedded Processing using FPGAs www.xilinx.com Conventional vs. APU Implementations

Bus Conventional Co-ProcessorCo -Processor PowerPC BUS InterfaceInterface Logic Approach BUS Interface Logic Virtex-II PRO Logic

Overhead required to connect soft logic

Benefits APU centric • Less overhead logic APU approach PowerPC Controller Logic • Faster transfer Virtex-4 • Efficient software implementation

35 - Embedded Processing using FPGAs www.xilinx.com Bus Cycle Improvement via APU

Wr Execution Res With APU

Write Operand 1 Write Operand 2 Execution Read Result Read Status Without APU

Reduce number of bus cycles by factor of 10 by using APU

36 - Embedded Processing using FPGAs www.xilinx.com Up to 13x Performance Boost PPC 405 Single Precision FPU

Performance Algorithm Software Emulation of Single Precision Speed Up Floating Pt * FPU

FIR Filter 846.09 ms 63.94 ms 13x

Video Editing Algorithm 248.38 ms 27.93 ms 9x

PID Loop 1.66 us 0.37 us 4x

1024pt FFT 19.48 ms 5.42 ms 4x

(*C-Code is compiled using IBM Performance Libs delivered with EDK 8.2i)

www.xilinx.com Floating Point Accelerated by APU

• Algorithm description - FPU w/ FIR filter • PowerPC operating frequency – 400 MHz • APU with direct interface to fabric- XtremeDSP blocks

SW 2MFlops

PLBPLB ++ FPUFPU 2121 MFlopsMFlops

OCMOCM ++ FPUFPU 3030 MFlopsMFlops

APUAPU ++ FPUFPU 4545 MFlopsMFlops (est)(est)

APU improves algorithm execution 20X over software emulationemulation

38 - Embedded Processing using FPGAs www.xilinx.com Accelerated MPEG De-Compression Example: Video application - IDCT MPEG de-compression

• Leverages Virtex-4 Integrated Features PLB – PowerPC, APU, XtremeDSP Slice – Software Emulation – 13000 IDCTs/sec • HW acceleration over software – Lower latency and high bandwidth APU Soft PowerPC Auxiliary Controller • Efficient HW/SW design partitioning APU FPGA Processor – Optimized implementation I/F Interface ExtermeDSP • Significant performance increase ExtremeDSP Hard Processor Block – 20x improvement over software ExtremeDSP – Utilizing APU and ExtremeDSP OCM FPGA Fabric

39 - Embedded Processing using FPGAs www.xilinx.com Tailor the System to Achieve Performance Take MP3 Decoding with Custom Hardware Logic

100MHz100MHz MicroBlazeMicroBlaze TMTM ,, purepure softwaresoftware 1X == 146146 secondsseconds IMDCT DCT LL MAC

100MHz100MHz MicroBlazeMicroBlaze TMTM +FSL++FSL+ LLLL MACMAC 16X == 99 secondsseconds

TM Custom Hardware Logic HardwareCustom 100MHz100MHz MicroBlazeMicroBlaze TM +FSL+FSL +DCT+DCT ++ IMDCTIMDCT ++ LLLL MACMAC 21X == 77 secondsseconds 1x 2x 8x 10x 20x 50x 100x Performance Improvement

40 - Embedded Processing using FPGAs www.xilinx.com Note: MicroBlaze TM v4.00 core, ML40x board, 100MHz system clock, EDK8.1 An Example: MicroBlaze TM FPU in Motor Control

Function Configuration Improvement Fault GPIO Calls/Second MicroBlaze Feedback MicroBlaze TM UART w/FPU Over 82,000 Over 15X with FPU SPI Motor TM Contrl. MicroBlaze BRAM

Power Stage Power Over 5,300 1X CAN without FPU

IIC

Quad Isolation Ethernet Decoder

Memory HALL Interface Decoder Benefit: XC3S1500 The customer now have the extra processor bandwidth for use on other things. Extra cycles needed for specialized peripherals

• Design implications: – You will get more done for a given clock frequency – Reducing the clock frequency (if possible) reduces power dissipation

Note:41 - Embedded Used MicroBlaze Processing usingTM v4.00 FPGAs core on Spartan 3S1500.www.xilinx.com Auxiliary Processing Solutions

PLB Fabric Floating Point Control Application Module Unit Specific Hard Processor Block Interface -V4

APU Auxiliary 405 Core APU I/F Auxiliary Control Processor Processor FSL C to HDL

Poseidon Design OCM Impulse Systems General Celoxica Solution FPGA Fabric Technologies Solution

42 - Embedded Processing using FPGAs www.xilinx.com C to HDL Tools

Code Profile • Create Code • Profile Code • Identify Critical Code Segments

• Translate Critical Code to HDL HDL • Build HW Accelerators from HDL • Replace Critical Code with Calls to HW Accelerators Code Accelerator Call

43 - Embedded Processing using FPGAs www.xilinx.com Programming Model • System of Independent sequential execution units – Called Processes – Can be executed in HW or SW and run concurrently • Processes communicate with each other – Via self-synchronizing FIFO buffers called streams, or – Via shared memories

SW Only SW FPGA Fabric Faster Producer Producer Execution Function Function Stream Function

Co Developer™ Critical Critical Function

Function PROCESS Function Consumer Consumer Function Stream PROCESS PROCESS

Sequential execution only execution Sequential Function Function Concurrent Software and Hardware execution

46 - Embedded Processing using FPGAs www.xilinx.com Impulse C-HDL Tool Flow

47 - Embedded Processing using FPGAs www.xilinx.com Agenda

• Why FPGA Platform Based Embedded Processing • Embedded Use Models And Their FPGA Based Solutions • Architecture/Topology Choices • A Reoccurring Question: Hardware Or Software • Reconfigurable Hardware • Tool Flows For FPGA Based Embedded Systems

48 - Embedded Processing using FPGAs www.xilinx.com What is Partial Reconfiguration?

• Dynamically modify blocks of logic in an FPGA by downloading partial bit files while unmodified logic continues to operate

• Logic is LUTs, FFs, BRAMs, DSP blocks, MGTs...

FPGA

Reconfig Block

49 - Embedded Processing using FPGAs www.xilinx.com PR Applications Today • JTRS Software Defined Radio – SWaP and multi-channel concurrency

• Microwave Communications – Flexibility and extended functionality • Cryptography – Physical isolation in one FPGA

50 - Embedded Processing using FPGAs www.xilinx.com SDR Development Platform

RJ45 – 100BT RJ45 – 100BT RJ45 – 100BT (Data) (Voice) (Ctlr)

Ethernet Switch

Ethernet interface Core Connect

WAVEFORM B WAVEFORM WAVEFORM A A WAVEFORM

Waveform combiner Interpolation filter Decimation filter TX NCO RX NCO D/A Interface A/D Interface (14 bits -- 300 MHz) (12 bits -- 200 MHz)

DAC ADC

51 - Embedded Processing using FPGAs www.xilinx.com PR Application: Dynamic Network Interface Protocol Switching

Config Memory Storage FPGA 10 GigE tx/rx 10 GigE tx/rx

OC48 tx/rx OC48 tx/rx Fibre tx/rx Switch uP Custom tx/rx Fibre tx/rx

Custom tx/rx

52 - Embedded Processing using FPGAs www.xilinx.com Partial Reconfiguration Applications

• Reduce FPGA size by multiplexing FPGA hardware resources

External Memory with full system configuration and PR bit streams

Memory Waveform D Waveform C Waveform B Waveform A

CPU

Reconfiguration Port Internal configuration access port

54 - Embedded Processing using FPGAs www.xilinx.com Agenda

• Why FPGA Platform Based Embedded Processing • Embedded Use Models And Their FPGA Based Solutions • Architecture/Topology Choices • A Reoccurring Question: Hardware Or Software • Reconfigurable Hardware • Tool Flows For FPGA Based Embedded Systems

55 - Embedded Processing using FPGAs www.xilinx.com Necessary Tools

• A Full Complement of Tools are Required to Design an Embedded Processor System – Processor System Generation – Hardware Implementation & Software Compilation – Hardware & Software Debug • HW Implementation & SW Compilation are the two Main Flows that Must be Addressed – The Embedded Flows Should Mirror Traditional Flows

56 - Embedded Processing using FPGAs www.xilinx.com System Generation

Tools are needed to create the system infrastructure, connect peripherals and clock configure the peripherals as desired interrupts reset CoreCore

ArbiterArbiter PrimaryPrimary BusBus BridgeBridge SecondarySecondary BusBus ArbiterArbiter

Databits = 8 FLASHFLASH TimerTimer UARTUART Baudrate = 9600 ? Parity = Off

SRAMSRAM Once the system is defined, the hardware can be implemented and software compiled

57 - Embedded Processing using FPGAs www.xilinx.com 58 - Embedded Processing using FPGAs www.xilinx.com Co-processor Creation Auto-generated HW & SW Templates

HW instantiated Test application code

SW Drivers created

61 - Embedded Processing using FPGAs www.xilinx.com Hardware Implementation

Source Code SynthesisSynthesis VHDL or Verilog

System Netlist Standard FPGA HW Development Flow

Constraints BuildBuild && MapMap Simulation/Synthesis

Build & Map Merged & Mapped Design Place & Route

Memory Map for Configuration File Place & Route Internal Code Storage Place & Route

Configuration File

62 - Embedded Processing using FPGAs www.xilinx.com Eclipse Based SDK for the Software Savvy Engineer

Powerful Software Development • Eclipse open source Software IDE allows an industry standard method for seamless tool integration • Consists of: • Perspectives • Views • Editors • Environment window displays one or more perspectives • A perspective contains views (like the Navigator) and editors

63 - Embedded Processing using FPGAs www.xilinx.com Software IP

Board Support • Device drivers are provided for Package (BSP) each hardware device • Device drivers are written in ANSI RTOS Adaptation Layer C and are designed to be portable across processors • Device drivers allow the user to

Driver select the desired functionality to UART 16550 UART Device Driver Device Device Driver Device Device Driver Device Device Drivers Device

Ethernet 10/100 Ethernet minimize the required memory IIC IIC MasterSlave & ATM Utopia Device ATM Peripheraln, n+1… Initialization Code • Device drivers in all layers have Boot Code common characteristics and API

64 - Embedded Processing using FPGAs www.xilinx.com Board Support Package

ArbiterArbiter PrimaryPrimary BusBus BridgeBridge SecondarySecondary BusBus ArbiterArbiter libxil.a

XEmac_mWriteReg ( … ); XEmac_SendFrame ( … ); DDRDDR 10/10010/100 SPISPI UARTUART TimerTimer IntcIntc ... SDRAMSDRAM EthernetEthernet EEPROMEEPROM XSpi_mSendByte ( … ); XSpi_GetStatusReg ( … ); ... xtmrctr.h XUartLite_SendByte ( … ); xemac.h XUartLite_GetStats ( … ); XTmrCtr_mWriteReg ( … ); ... XEmac_mWriteReg ( … ); XTmrCtr_IsExpired ( … ); XTmrCtr_mWriteReg ( … ); XEmac_SendFrame ( … ); XTmrCtr_IsExpired ( … ); ... XIntc_mEnableIntr ( … ); ... … uartlite.h ... XIntc_mEnableIntr ( … ); ... xspi.h XUartLite_SendByte ( … ); XUartLite_GetStats ( … ); XSpi_mSendByte ( … );... XSpi_GetStatusReg ( … ); ...

Board Support Packages (BSPs) are collections of parameterized drivers for a specific processor system

65 - Embedded Processing using FPGAs www.xilinx.com Software Compilation

C Code CompilerCompiler Source Code

Standard Embedded Assembly Code SW Development Flow

C/C++ Cross Compiler AssemblerAssembler Source Code

Assembler Relocatable Linker Object Code

Machine Code Mapfile and Linker Linker Linker Script

Machine Code

66 - Embedded Processing using FPGAs www.xilinx.com System Design Flow

C Code VHDL or Verilog

Standard Embedded Embedded Standard FPGA SW Development Flow Developers Kit HW Development Flow

Code Entry Board Support HDL Entry System Netlist Package Instantiate the C/C++Include Cross the Compiler BSP Simulation/Synthesis and Compile the ‘System Netlist’ Software Image and Implement Linker Data2MEM Implementationthe FPGA

? 2 Compiled ELF 3 Compiled BIT 1 ?

Download Combined Load Software Download Bitstream Image to FPGA Into FLASH Into FPGA

Debugger Chipscope

RTOS, Board Support Package

67 - Embedded Processing using FPGAs www.xilinx.com HW & SW Debug

Internal & external logic analysis is JTAG Port used to determine where problems reside Debug Port clock interrupts IntegratedIntegrated reset CoreCore BusBus AnalyzerAnalyzer

ArbiterArbiter PrimaryPrimary BusBus BridgeBridge SecondarySecondary BusBus ArbiterArbiter

SW debug typically IntegratedIntegrated FLASHFLASH SRAMSRAM TimerTimer UARTUART consists of JTAG based LogicLogic AnalyzerAnalyzer runtime control of the target processor core

External Connection

68 - Embedded Processing using FPGAs www.xilinx.com “Platform Debug” Integrated HW/SW Debuggers

• Cross Trigger HW and SW Debuggers to Find and Fix Bugs Faster!

• Enable better insight into the HW / SW code dynamics

69 - Embedded Processing using FPGAs www.xilinx.com Key Messages

• FPGA Based Embedded Systems Solve Classic Integration Issues (Size, Cost, Power) • They Also Creating Novel Solutions Not Easily Realized Using Other Technologies • FPGA Platforms Enable Flexible and Dynamic Systems • Intelligent Tools Accelerate Development And Enable You To Optimize The Balance Of Feature Set, Performance, Area & Cost Of Your Design

72 - Embedded Processing using FPGAs www.xilinx.com Q&A

73 - Embedded Processing using FPGAs www.xilinx.com