Accelerate High-Performance Real-Time Video & Imaging
Accelerate High-Performance Real-Time Video & Imaging Applications with FPGA & Programmable DSP

Agenda
• Overview
• FPGA/PDSP HW/SW Development System Platform: TI EVM642 + Xilinx XEVM642-2VP20
• FPGA for Algorithm Acceleration: H.264/AVC SD Video Encoder
• Xilinx MPEG-4 Codec Reference Design
• Xilinx SysGen Co-Sim Design and C/C++

Copyright 2004. All rights reserved

Analysts See Explosive Growth in Digital Media Market
[Chart: Advanced Codec Unit Shipments (in millions), 2003 to 2008, vertical axis 0 to 250. Advanced codecs include MPEG-4, H.264 and WMV9.]
Source: In-Stat/MDR, 6/04

Using FPGAs and DSPs Together for Video Processing
[Chart: application examples plotted by codec (JPEG, MPEG2, H.263, MPEG4, H.264), number of channels (few to many), coding mode (decode, encode, simultaneous encode/decode) and resolution (QCIF, CIF, D1, SD, HD), with regions marked "DSP only" and "DSP + FPGA".]

Targeted Video Applications
Features
• 30fps CIF resolution encode & decode
• Integrated audio, video & streaming DSP controller
• Headroom available for feature enhancements & codec extensions
Features
• Real-time 30fps TV/VGA resolution encode & decode
• Integrated audio, video & streaming DSP controller
• Headroom available for High-Def feature enhancements & codec extensions
Features
• Real-time 30fps TV/VGA resolution encode & decode
• Integrated audio, video & streaming DSP controller
• Headroom available for High-Def feature enhancements & codec extensions

DSPs and FPGAs: Complementary Solutions
• FPGAs Suitable for Parallel Data-Path Bound Functions/Problems
• SW/HW Co-Design Inner-Loop Rule: “Any C/C++ that requires tight inner-loop assembly codes probably should be in hardware”
• FPGAs typically complement programmable DSPs in high-performance real-time systems in one or more of the following ways:
– System logic muxing and consolidation
– New peripheral or bus interface implementation
– Performance acceleration in the signal processing chain

FPGA Complements Programmable DSP
[Figure: two block diagrams; labels as recovered from the slide.]
• FPGA as pre-processor: 148.5 MHz sample rate/pixel clock; 20.74 MHz; DM64x™; 3.04 MHz
• FPGA as co-processor: 20.74 MHz sample rate/pixel clock; DM64x™; 3.04 MHz

TI DSP Architecture
• Multiple bus architecture
– Large number of simultaneous inputs and outputs (avoids bottleneck in processing)
– Allows parallel fetching of an instruction and data
• Extensive pipelining
– Executes parts of several instructions in a single cycle
• Special instructions
– Combines several operations into a single cycle
[Block diagram: TMS320C64x™ core with VelociTI.2™. L1 Program Cache; Instruction Fetch, Instruction Dispatch, Instruction Decode; Control Registers, Interrupt Control, Emulation; Data Path 1 (Register File A, A15-A0 and A31-A16, units L1 S1 M1 D1) and Data Path 2 (Register File B, B15-B0 and B31-B16, units D2 M2 S2 L2); L1 Data Cache; L2 Cache/Memory; Enhanced DMA Controller; peripherals: Video port-0, port-1, port-2, 10/100 Ethernet MAC, 66 MHz PCI, 64-Bit EMIF; pipeline stages P F D A R X.]
Courtesy of Texas Instruments
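The sample rates quoted on the "FPGA Complements Programmable DSP" slide can be reproduced as samples per frame times frame rate. The sketch below is illustrative only: the raster sizes (a 2200 x 1125 total raster at 60 Hz, 720 x 480 at 60 Hz, and CIF 352 x 288 at 30 fps) are assumptions picked because they match the slide's 148.5, 20.74 and 3.04 MHz figures. The slides do not state which formats were meant, and `sample_rate_mhz` is a hypothetical helper name.

```python
# Sketch: a video sample rate is samples-per-frame times frames-per-second.
# All raster sizes are ASSUMPTIONS chosen to match the slide's numbers.

def sample_rate_mhz(width, height, frames_per_second):
    """Return the sample rate in MHz for a width x height raster."""
    return width * height * frames_per_second / 1e6

# Assumed HD input: 2200 x 1125 total samples (active video plus blanking), 60 Hz.
hd_rate = sample_rate_mhz(2200, 1125, 60)
# Assumed SD stream handed to the DSP: 720 x 480 samples, 60 Hz.
sd_rate = sample_rate_mhz(720, 480, 60)
# Assumed compressed-domain resolution: CIF, 352 x 288 samples, 30 fps.
cif_rate = sample_rate_mhz(352, 288, 30)

print(f"HD  input : {hd_rate:.2f} MHz")
print(f"SD  stream: {sd_rate:.2f} MHz")
print(f"CIF stream: {cif_rate:.2f} MHz")
print(f"HD to SD reduction handled before the DSP: {hd_rate / sd_rate:.1f}x")
```

Under these assumptions the pre-processor case amounts to the FPGA absorbing roughly a sevenfold reduction in sample rate before the DSP sees the data.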

TMS320DM642/DM641/DM640 – Digital Media Processors
Feature | DM642 | DM641 | DM640
Performance | 2880/2400/2000 MMACs | 2400/2000 MMACs | 1600 MMACs
L2 cache/memory | 256 KB | 128 KB | 128 KB
L1P cache | 16 KB | 16 KB | 16 KB
L1D cache | 16 KB | 16 KB | 16 KB
Video ports | 3 Video Ports (20-Bits) | 2 Video Ports (8-Bits) | 1 Video Port (8-Bits)
McASP | 8-bit | 4-bit | 4-bit
Ethernet MAC | Yes | Yes | Yes
HPI | 32-bit | 16-bit | --
PCI | 66 MHz | -- | --
EMIF | 64-bit | 32-bit | 32-bit
Power (total internal) | 666 mW @ 500 MHz, 1.2 V; 1.11 W @ 600 MHz, 1.4 V | 666 mW @ 500 MHz, 1.2 V; 1.11 W @ 600 MHz, 1.4 V | 552 mW @ 400 MHz, 1.2 V
2005 Price (10Ku) | $59.99 (720 MHz); $42.70 (600 MHz); $37.95 (500 MHz) | $35.59 (600 MHz); $31.02 (500 MHz) | $19.95 (400 MHz)
[Block diagram: TMS320DM642. C64x™ DSP Core; L1P Cache 16 KBytes; L1D Cache 16 KBytes; L2 Cache/Memory 256 KBytes; Enhanced DMA Controller; Video port-0, port-1, port-2; McASP; 10/100 Ethernet MAC; 32-bit HPI; 66 MHz PCI; EMIF to SDRAM, 64-bit wide at 133 MHz.]
Courtesy of Texas Instruments

FPGAs for High-Performance Co-DSP Applications
[Figure: FPGA and DSP connected by two links labeled "80 MHz – 20 bit" and "1 GB/s 64-bit".]
FPGA
• Internal Memory
– BRAM
– Video Line buffers/memory deep and wide
– Cache tag memory/dual port RAM
– Large FIFOs/packet buffers
• Serial Interfaces
– Rapid I/O, SR I/O
– 10GigBit Ethernet
– HyperTransport
• Memory interface
– SDRAM/DDR
DSP
• Internal memory
– L1/L2 cache
– 256K RAM
– Configurable cache/RAM
• Serial Interfaces
– McBSP
– EMAC Ethernet
• Memory interface
– SDRAM
– 4 CE spaces

Virtex-4 SX
The SX family emphasizes Xilinx commitment to Co-DSP applications by providing a strong skew toward dedicated arithmetic units versus logic
• 4VSX25: 2,650 CLB, 128 BRAM, 128 XtremeDSP Slices to augment DSP Math
• Largest device, 4VSX55: 6,144 CLB, 320 BRAM, 512 XtremeDSP Slices to augment DSP Math
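The throughput figures on the DM642 family and Co-DSP slides follow from clock rate times work per cycle. The following is a rough check, assuming four 16-bit MACs per C64x cycle (an assumption that reproduces the table's MMACs values) and treating the 64-bit link as the 133 MHz EMIF shown in the DM642 block diagram. The helper names are hypothetical.

```python
# Sketch: peak figures as clock rate times work per cycle.
MACS_PER_CYCLE = 4  # ASSUMPTION: four 16-bit MACs per cycle on the C64x core

def mmacs(clock_mhz, macs_per_cycle=MACS_PER_CYCLE):
    """Peak millions of multiply-accumulates per second."""
    return clock_mhz * macs_per_cycle

def bandwidth_mbytes_per_s(clock_mhz, width_bits):
    """Peak bus bandwidth in MB/s, one transfer per clock."""
    return clock_mhz * width_bits / 8

# Clock grades taken from the price row of the table.
dm642 = [mmacs(f) for f in (720, 600, 500)]
dm641 = [mmacs(f) for f in (600, 500)]
dm640 = [mmacs(f) for f in (400,)]

emif_bw = bandwidth_mbytes_per_s(133, 64)       # 64-bit EMIF at 133 MHz
video_port_bw = bandwidth_mbytes_per_s(80, 20)  # 20-bit video port at 80 MHz

print("DM642 MMACs:", dm642)
print("DM641 MMACs:", dm641)
print("DM640 MMACs:", dm640)
print(f"EMIF       : {emif_bw:.0f} MB/s")
print(f"Video port : {video_port_bw:.0f} MB/s")
```

With these inputs the EMIF comes out at 1064 MB/s, which is consistent with the "1 GB/s 64-bit" label, and the 20-bit video port at 80 MHz at 200 MB/s.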

Agenda
• Overview
• FPGA/TI-DSP HW/SW Development System Platform: TI EVM642 + Xilinx XEVM642-2VP20
• FPGA for Algorithm Acceleration: H.264/AVC SD Video Encoder
• Xilinx MPEG-4 Codec Reference Design
• Xilinx SysGen Co-Sim Design and C/C++

Xilinx Daughter Card & TI EVM Board Reference Design
• TI TMS320DM642 EVM Board getting excitement and traction at TI DSP accounts
– Texas Instruments TMS320DM642 specifically designed for video DSP
  • 720 MHz, 8 instructions per clock, 5.7 GIPS
– Designed by Spectrum Digital
  • TI’s primary DSP Board and Board Sales Channel
– Designed to support DSP algorithm development and demonstration for streaming video
• Xilinx Daughter Card to augment DSP Math
– P20, 88 x 200 MHz = 17 GMACs
– P50, 232 x 200 MHz = 46 GMACs
– SX55, 512 x 500 MHz = 256 GMACs
– High bandwidth, large depth Frame Buffer Memory

FPGA as DSP Accelerator
• V2P20-50 1st G
• V4SX25-55 2nd G
• H.264/AVC SD Codec
• Microsoft WMV-9 (VC-9) SD Codec
[Photo: Xilinx XEVM642-2VP20 daughter card on the Spectrum Digital TI DM642 EVM Board. Labels: TI Analog and Power Modules; FPGA FF1152; GPIO, JTAG, RS232, LEDs, Switches, etc.]
Source: Spectrum Digital

TI EVM642 and Xilinx XEVM642-2VP20
[Photo: board labels Xilinx, Spartan, DM642.]

XEVM642-2VP20 as a Video DSP Pre-processor
[Block diagram: FPGA Pre-processor on the XEVM642-2VP20 Board; EVM Interface; TI DM642 EVM Board; To Display.]

XEVM642-2VP20 as a Video DSP Co-processor
[Block diagram: Video Input; FPGA Co-processor on the XEVM642-2VP20 Board; EVM Interface; TI DM642 EVM Board; To Display.]
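The daughter-card GMAC figures on the reference design slide are multiplier count times clock rate, and the DSP figure is clock rate times instructions per clock. A minimal sketch of that arithmetic follows, assuming one MAC per multiplier per clock. The function and variable names are hypothetical.

```python
# Sketch: peak throughput as (parallel units) x (clock rate).
# ASSUMPTION: every multiplier completes one MAC on every clock.

def gmacs(multipliers, clock_mhz):
    """Peak giga multiply-accumulates per second."""
    return multipliers * clock_mhz / 1000

def gips(instructions_per_clock, clock_mhz):
    """Peak giga instructions per second."""
    return instructions_per_clock * clock_mhz / 1000

# (multiplier count, clock in MHz) as quoted on the slide.
fpga_options = {"P20": (88, 200), "P50": (232, 200), "SX55": (512, 500)}

fpga_gmacs = {name: gmacs(n, f) for name, (n, f) in fpga_options.items()}
dsp_gips = gips(8, 720)  # DM642: 8 instructions per clock at 720 MHz

for name, value in fpga_gmacs.items():
    print(f"{name:5s}: {value:.1f} GMACs")
print(f"DM642: {dsp_gips:.2f} GIPS")
```

The exact products are 17.6, 46.4 and 256 GMACs and 5.76 GIPS, so the slide's 17, 46 and 5.7 are the same values truncated.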

FPGA VHDL Block Diagram
[Figure: FPGA HDD Top Level Flow. Blocks: DM642 EDMA State Machine, FIFO, HDD DMA State Machine, HDD Sectors. Signals: Host_EDMA_Strobe, HDD_Xfer_InProgress, Host_Data_InOut, HDD_Sector_Count, HDD_Data_InOut, HDD_DMA, HDD_IORDY.]
• The FPGA consists of two functional blocks that control the arbitration of the DMA Engines of the HDD and the DM642 DSP
– DM642 EDMA State Machine controls the continuous streaming of Data to/from the EMIF A Peripheral of the DM642 DSP
– UltraDMA HDD State Machine controls the continuous streaming of Data from the FIFO to/from the HDD Sectors

Virtex-4 SX/DM642 Cable Box
Statistical-Remultiplexor: Cable head-end box, residing with the local cable-service provider, that
- Takes in multiplexed digital channels
- Encodes & inserts local programming content
- Incorporates local/targeted advertisements
- Outputs a specific subscriber’s cable package
- Uses FPGAs in Design
[Block diagram: Input/Output Control and Video Processing Units. Labels: Satellite; Transrating/transcoding; 800 Mbps; DVB ASI/DHEI (two ports); Virtex-4 SX (two devices); PCI Local Bus; DM642 + SDRAM #1, #2, #3; Shared RAM; 12-18 Video channels/Transponder; Host I/F Bridge; Buffer; RISC; ROM; Ethernet 10/100Mbps.]

Video Solution Support
• Video DSPs
– Texas Instruments
• Board & Manufacturing
– Spectrum Digital, Inc.
• DSP Algorithm Support
– Ittiam
– UBVideo
– W&W Communications (DSP Research)
• Xilinx Expert Design Services
– Nuvation
• Reference Designs Available
– H.264/AVC SD Codec (Q1’05)
– Multi-channel Video PIP + JPEG Web Server (Q4’04)
• Contact Xilinx DSP Marketing for Details

Demo #1

Agenda
• Overview
• FPGA/PDSP HW/SW Development System Platform: TI EVM642 + Xilinx XEVM642-2VP20
• FPGA for Algorithm Acceleration: H.264/AVC SD Video Encoder
• Xilinx MPEG-4 Codec Reference Design
• Xilinx SysGen Co-Sim Design and C/C++

Chronological Table of Video Coding Standards
[Timeline, 1990 to 2003]
• ITU-T VCEG: H.261 (1990), H.263 (1995/96), H.263+ (1997/98), H.263++ (2000)
• Joint ITU-T and ISO/IEC: MPEG-2 (H.262) (1994/95), H.264 (MPEG-4 Part 10) (2003)
• ISO/IEC MPEG: MPEG-1 (1993), MPEG-4 v1 (1998/99), MPEG-4 v2 (1999/00), MPEG-4 v3 (2001)

Applications for H.264/AVC
• Entertainment Video (1 - 8+ Mbps, higher latency)
– Broadcast / Satellite / Cable / DVD / VoD / FS-VDSL / …
• Conversational H.32X Services (usu. <1Mbps, low latency)
– H.320 Conversational
– 3GPP Conversational H.324/M
– H.323 Conversational Internet/unmanaged/best effort IP/RTP
– 3GPP Conversational IP/RTP/SIP
• Streaming Services (usu. lower bit rate, higher latency)
– 3GPP Streaming IP/RTP/RTSP