Video & Imaging Acceleration Overview
Sean Gardner
Sr. Marketing Mgr., Video & Imaging, DCG
August 2019
© Copyright 2019 Xilinx

What we’ll cover today
˃ Market Overview
    Market challenges / pain points
    Challenges with present solution(s)
    Xilinx Two Pronged Strategy: Soft IP (high quality) and hardened ZU+ EV devices (high density)
˃ Xilinx DCG Video Soft IP strategy: High Quality
    Xilinx solutions competitive offering and positioning
    Xilinx delivers value to customers’ bottom line
    Video Software Solution Stack and Ecosystem
    Video Innovative Solutions with Adaptive Acceleration
    ‒ Socionext new H.264 High Quality encoder
    ‒ VNova Perseus+: Accelerating standard base codecs
˃ Xilinx DCG Video Zynq UltraScale+ EV strategy: High Density
    What solutions and platforms are out there and what’s coming
    Video Software Solution Stack
    Roadmap for enhancements for the VCU

2 BIG challenges facing our Industry
˃ Capital expenditures are out of control
    CAPEX: expenses in servers/hardware and data centers
˃ Operating expenditures are now the 3rd largest cost
    OPEX: expenses related to streaming/bandwidth or CDN costs

Live Video: a $70B Market by 2021
$70B worldwide by 2021
Source: www.rethinkresearch.biz, published July 2019

China is #1 in Mobile Live Video

Network Provisioning Costs are an Issue
Provisioning for peak traffic will grow in inefficiencies
[Chart callout: 77%]

Live Video is the Most Expensive

Quick Comparison VOD vs. UGC (Live)
1000’s to 100,000’s files 10,000’s to 100 files CDN CDN Daily Hrs.
per file RealTime
Netflix may ingest 100 files daily
Twitch has 100,000 concurrent streamers today

Compute costs up 35x (480p30 H.264 to 4kp30 AV1)
Does not include massive growth in video traffic volume
[Chart: increase in resolution (x) across SD (480), HD (720), FHD (1080), UHD (4k), 8k; callout "96x more pixels"]
[Chart: increase in codec complexity (x) across MPEG2, H.264, HEVC / VP9, AV1; callout "25x more complex"]

Revenue vs Costs (OPEX)
[Chart: revenue vs. streaming costs, with an unprofitable zone and a profit zone]

Bandwidth Costs (OPEX)
Company       Total Revenue                  Bandwidth % of Revenue   Bandwidth Cost
BiliBili      4,130 RMB (M) / $616M USD      6%                       247.8 RMB (M) / $37M USD
Huya          4,663 RMB (M) / $678M USD      9%                       359 RMB (M) / $92M USD
YY Inc.       15,764 RMB (M) / $1.5B USD     6%                       967 RMB (M) / $144M USD
iQIYI Inc.    24,120 RMB (M) / $3.6B USD     8.9%                     2,318 RMB (M) / $346M USD
Douyu Inc.    3,496 RMB (M) / $520M USD      15.9%                    555.9 RMB (M) / $82.8M USD
Momo Inc.     13,408 RMB (M) / $1.9B USD     6% (est.)                804 RMB (M) / $117M USD
Conversion rate of 6.78 RMB to USD used when not provided

Distribution of Video Workloads (80/20 rule)
˃ 20% of Streams, 80% of bits: OPEX Challenge, lowest cost per bit
˃ 80% of Streams, 20% of bits: CAPEX Challenge, lowest cost per channel
[Chart axes: # of Channels, # of Viewers / Bandwidth]
Example streams: Ninja 600k Viewers, Alfred E.
Neuman 4 viewers
[Chart labels: Safe City; # of Streams (1k’s, 10k’s, Millions)]

The 80/20 Rule Holds True (13% of Streams generate 74%)
Source: https://techcrunch.com/2019/07/12/twitchcontinuestodominatelivestreamingwithitssecondbiggestquartertodate/amp/

Distribution of Video Workloads (80/20 rule)
Xilinx: Two Pronged Strategy
˃ Cost per bit optimized: High Quality
˃ Cost per channel optimized: High Density Transcoding
[Chart: # of Viewers / Bandwidth and # of Accelerators / Channels against # of Streams (1k’s, 10k’s, 1000K); workloads shown: Online Gaming, eSports, Live Video, Smart Retail, Safe City]

DCG Video Soft IP Strategy
High VQ (Video Quality) / Low bitrate strategy

Xilinx has a Solution for all Video Workloads
All live video traffic maps to one or both of these models
˃ OPEX Focused: Lowest Bandwidth (Cost per bit)
˃ CAPEX Focused: Highest Density (Cost per Channel)
[Chart: # of Viewers against # of Streams; reference points shown: x264 Med. Preset, Intel QSV, Safe City]

Software encoders sacrifice efficiency for speed (fps)
What they live with for realtime applications
[Chart labels: Best compression, Worst throughput]
https://youtu.be/x9wn633vl_c

Xilinx & x265 bitrate comparison at same PSNR
Bitrate savings vs x265 Medium at the same PSNR
[Bar chart, 60% to 105%: x265 medium, Xilinx HEVC, x265 slow; callouts "14% less bits" and "17% less bits"]
Note: 1920x1080 encoding done using E5 2666 Dual socket server used for CPU measurements

Xilinx vs CPUs: 20x faster @ same quality
Encode speed (fps) at same Quality
[Bar chart, 0 to 140 fps: x264 Very Slow vs. XLX, x265 Slow vs. XLX, Libvpx vs. XLX; callouts 12x, 12x, 20x]
Note: 1920x1080 encoding done using E5 2666 Dual socket server used for CPU measurements

Why Bits matter: Bandwidth = Cost
“Quarterly bandwidth costs increased by 66.8% to US$25.3 million 3Q2018 from same period of 2017, primarily due to bandwidth usage as a result of increased user base and enhanced live
streaming video quality improvement”

Xilinx vs Nvidia: 40% less bits @ same quality
[Chart: PSNR against bitrate for the ParkJoy sequence, NVenc HEVC vs. NGCodec HEVC; callout "40% lower bitrate for same quality"]
Tested using Nvidia P4

30% Less Bandwidth Saves Millions of Dollars

Video Solutions Available Today on Alveo
Codec        Partner          Description                 Channels per card
H.264 HDE    Alma             High Density Encoder        12 x 1080p60
H.264 HQE    Socionext        High Quality Encoder        2 x 1080p60
H.264 HDD    VYU Synch        High Density Decoder        12 x 1080p60
HEVCHQE      NGCodec          High Density Encoder        2 x 1080p60 (June 19)
HEVCHDD      Path Partner     High Density Decoder        12 x 1080p60
HEVCHEIFD    Path Partner     HEIF Image Decoder          10 x 4kp15
VP9HQE       NGCodec          High Quality Encoder        2 x 1080p60 (July 19)
Perseus+     VNova+NGCodec    High Quality HEVC Encoder   1 x 4kp60
WebPE        Xilinx           High Density Encoder        Resolution dependent
ABR Scaler   Xilinx           Multichannel Scaler
JPEGHDE      Deepoly          High Density Encoder        Today (resolution dependent)
JPEGHDD      CTAccel          High Density Decoder        Today (resolution dependent)

Xilinx Video Solution Stack: Seamless FFmpeg integration
Customer Application
Plugins (video codecs, scalers, compositing etc.): FPGA h.264 encode, FPGA HEVC encode, FPGA VP9 encode, Xilinx h.264 decode, Xilinx ABR Scaler, Xilinx Yolo
Xilinx Media Acceleration API
Xilinx Runtime API
Xilinx Accelerator Binary
x86 Server
Xilinx Alveo Accelerator Card

NO FPGA EXPERIENCE REQUIRED

    $ ffmpeg \
        -f rawvideo -pix_fmt yuv420p -s:v 1920x1080 -r 30 -an \
        -i /home/ffmpeg/VU9P/TestSequences/Kimono1_1920x1080_24.yuv \
        -frames 240 -c:v libx264 -preset medium -profile:v high -crf 23 \
        -bf 4 -refs 3 -g 30 -b:v 4000k -maxrate 4000k -bufsize 8000k \
        -f h264 -r 30 -y ./sw_outdir/x264_medium_out0_br4000k.h264

    $ ffmpeg \
        -f rawvideo -pix_fmt yuv420p -s:v 1920x1080 -r 30 -an \
        -i /home/ffmpeg/VU9P/TestSequences/Kimono1_1920x1080_24.yuv \
        -frames 240 -b:v 4000k -g 30 -c:v xlnx_h264_enc-hq \
        -f h264 -y ./hw_outdir/out0_br4000k.h264

    $ ffmpeg \
        -f rawvideo -pix_fmt yuv420p -s:v 1920x1080 -r 30 -an \
        -i /home/ffmpeg/VU9P/TestSequences/Kimono1_1920x1080_24.yuv \
        -frames 240 -b:v 4000k -g 30 -c:v xlnx_HEVC_enc \
        -f h265 -y ./hw_outdir/out1_br4000k.h264

˃ As simple as changing 20 characters to get acceleration
https://trac.ffmpeg.org/wiki/EncodingForStreamingSites

Video Innovative Solutions

Socionext H.264 Encoder

Socionext uses Xilinx FPGA for next Gen Solution

Socionext H.264 Enc. vs. x264 Enc. Very Slow setting

Major testing to ensure high performance
PSNR Testing
VMAF Testing

FPGA H.264 Enc vs. Nvidia T4 H.264 Enc.

VNova Perseus+ IP

HD video services over any network
Actual screen shot (27th May 2019)

PERSEUS: a new approach
PERSEUS Plus
PERSEUS Plus is based on hierarchical image representation that is far more efficient than the traditional block-based codecs. Combining PERSEUS Plus with an existing base codec improves the overall quality, throughput and resiliency of the picture.
VNova becoming a standard:
PERSEUS Pro undergoing standardization as VC6/ST2117
PERSEUS Plus in process for “Low Complexity Codec Enhancements”

Perseus+ NGCodec improves overall performance
Solution                            4kp60 on          Performance   Cost      Power
VNova Perseus+ NGC HEVC             single VU9P       Best          Lowest    Lowest
NGC HEVC only                       4 x VU9P          Medium        Medium    Medium
x265 Software (very slow preset)    80 x x86 cores    Lowest        Highest   Highest

Why PERSEUS Plus Xilinx?
Unbeatable Density
• 4x increase in density on FPGA
• 50x denser than the equivalent software-only implementation
• UHDp60 in single FPGA
Bandwidth savings
• Up to 50% more efficient
• Live UHDp60 @8Mbps, 1080p60 @3Mbps
• Increase reach, improve quality of experience, reduce cost
Codec Agnostic
• PERSEUS Plus is codec agnostic
• Works with h.264, HEVC, VP9 and even AV1 when available
• Maximum compatibility with existing workflow

Xilinx DCG Video ZU+ EV strategy: High Density

Xilinx has a Solution for all Video Workloads
All live video traffic maps to one or both of these models
˃ OPEX Focused: Lowest Bandwidth (Cost per bit)
˃ CAPEX Focused: Highest Density (Cost per Channel)
x264 Med.