Video & Imaging Acceleration Overview
Sean Gardner, Sr. Marketing Mgr., Video & Imaging, DCG
August 2019
© Copyright 2019 Xilinx What we’ll cover today…
˃ Market Overview
  Market challenges / pain points
  Challenges with present solution(s)
  Xilinx Two Pronged Strategy: Soft IP (high quality) and hardened ZU+ EV devices (high density)
˃ Xilinx DCG Video Soft IP strategy: High Quality
  Xilinx solutions competitive offering and positioning
  Xilinx delivers value to customers bottom line
  Video Software Solution Stack and Ecosystem
  Video Innovative Solutions with Adaptive Acceleration
  ‒ SocioNext new H.264 High Quality encoder
  ‒ Vnova Perseus +: Accelerating standard base codecs
˃ Xilinx DCG Video Zynq UltraScale+ EV strategy: High Density
  What solutions and platforms are out there and what’s coming
  Video Software Solution Stack
  Roadmap for enhancements for the VCU
2 BIG challenges facing our Industry
˃ Capital expenditures are out of control
  CAPEX: expenses in servers/hardware and data centers
˃ Operating expenditures are now 3rd largest cost
  OPEX: expenses related to streaming/bandwidth or CDN costs
Live Video a $70B Market by 2021
$70B Worldwide by 2021
Source: www.rethinkresearch.biz, published July 2019
China is #1 in Mobile Live Video
Network Provisioning Costs are an Issue
Provisioning for peak traffic will become increasingly inefficient
77% Live Video is the Most Expensive
Quick Comparison VOD vs. UGC (Live)
[Diagram: VOD and UGC (Live) pipelines, each delivering through a CDN. Labels: 1000’s to 100,000’s files; 10,000’s to 100 files; Daily; Hrs. per file; RealTime.]
VOD: Netflix may ingest 100 files daily
UGC (Live): Twitch has 100,000 concurrent streamers today
Compute costs up 35x (480p30 H.264 to 4kp30 AV1)
Does not include massive growth in video traffic volume
[Chart, left panel: Increase in resolution (x) across SD (480), HD (720), FHD (1080), UHD (4k), 8k; annotation: 96x more pixels.]
[Chart, right panel: Increase in codec complexity (x) across MPEG2, H.264, HEVC / VP9, AV1; annotation: 25x more complex.]
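The "96x more pixels" annotation can be checked with simple arithmetic. The following is a minimal sketch, assuming common frame dimensions for each tier (for example 720x480 for SD and 7680x4320 for 8k); the slide names only the tiers, so the dimensions and the `pixel_ratio` helper are illustrative assumptions.

```python
# Frame dimensions per tier. These are assumed values: the slide only
# names the tiers (SD, HD, FHD, UHD, 8k), not their exact sizes.
RESOLUTIONS = {
    "SD (480)": (720, 480),
    "HD (720)": (1280, 720),
    "FHD (1080)": (1920, 1080),
    "UHD (4k)": (3840, 2160),
    "8k": (7680, 4320),
}


def pixel_ratio(name, baseline="SD (480)"):
    """Return how many times more pixels per frame `name` has than `baseline`."""
    width, height = RESOLUTIONS[name]
    base_width, base_height = RESOLUTIONS[baseline]
    return (width * height) / (base_width * base_height)


for tier in RESOLUTIONS:
    print(f"{tier:>10}: {pixel_ratio(tier):5.1f}x the pixels of SD")
```

Under these assumed dimensions, 8k works out to 96 times the pixels of SD, and UHD (4k) to 24 times.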
Revenue vs Costs (OPEX)
Streaming Costs
Unprofitable zone Profit zone
Bandwidth Costs (OPEX)
Company | Total Revenue | Bandwidth Cost | Bandwidth % of Revenue
BiliBili | 4,130 RMB (M), $616M USD | 247.8 RMB (M), $37M USD | 6%
Huya | 4,663 RMB (M), $678M USD | 359 RMB (M), $92M USD | 9%
YY Inc. | 15,764 RMB (M), $1.5B USD | 967 RMB (M), $144M USD | 6%
iQIYI Inc. | 24,120 RMB (M), $3.6B USD | 2,318 RMB (M), $346M USD | 8.9%
Douyu Inc. | 3,496 RMB (M), $520M USD | 555.9 RMB (M), $82.8M USD | 15.9%
Momo Inc. | 13,408 RMB (M), $1.9B USD | 804 RMB (M), $117M USD | 6% (est.)
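The bandwidth share of revenue can be recomputed from the RMB figures. This is a minimal sketch, assuming the first RMB figure in each row is total revenue and the second is bandwidth cost; the helper names are hypothetical. The slide's own percentages and USD values may come from different reporting periods or conversion rates, so the computed numbers need not match them exactly.

```python
# (total revenue, bandwidth cost) in millions of RMB, as listed on the slide.
# Which figure is revenue and which is cost is an assumption based on the headers.
FIGURES_RMB_M = {
    "BiliBili": (4130, 247.8),
    "Huya": (4663, 359),
    "YY Inc.": (15764, 967),
    "iQIYI Inc.": (24120, 2318),
    "Douyu Inc.": (3496, 555.9),
    "Momo Inc.": (13408, 804),
}

RMB_PER_USD = 6.78  # conversion rate quoted in the deck


def bandwidth_share(revenue, bandwidth_cost):
    """Bandwidth cost as a percentage of total revenue."""
    return 100.0 * bandwidth_cost / revenue


def to_usd_m(rmb_m):
    """Convert millions of RMB to millions of USD at the quoted rate."""
    return rmb_m / RMB_PER_USD


for company, (revenue, cost) in FIGURES_RMB_M.items():
    share = bandwidth_share(revenue, cost)
    print(f"{company:<11} {share:5.1f}% of revenue, about ${to_usd_m(cost):.0f}M USD")
```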
Note: a conversion rate of 6.78 RMB to USD is used when not provided.
Distribution of Video Workloads (80/20 rule)
[Figure: # of Viewers / Bandwidth and # of Channels versus # of Streams (1k’s, 10k’s, Millions). 20% of Streams, 80% of bits: the OPEX challenge, lowest cost per bit (example: Ninja, 600k viewers). 80% of Streams, 20% of bits: the CAPEX challenge, lowest cost per channel (examples: Alfred E. Neuman, 4 viewers; Safe City).]
The 80/20 Rule Holds True
(13% of Streams generate 74%)
https://techcrunch.com/2019/07/12/twitchcontinuestodominatelivestreamingwithitssecondbiggestquartertodate/amp/
Distribution of Video Workloads (80/20 rule)
Xilinx: Two Pronged Strategy
[Figure: # of Viewers / Bandwidth and # of Accelerators / Channels versus # of Streams (1k’s, 10k’s, 1000K). Two regions: High Quality (cost per bit optimized) and High Density Transcoding (cost per channel optimized). Workload labels: Online Gaming, eSports, Live Video, Smart Retail, Safe City.]
DCG Video Soft IP Strategy
High VQ (Video Quality) / Low bitrate strategy
Xilinx has a Solution for all Video Workloads
All live video traffic maps to one or both of these models
OPEX Focused: Lowest Bandwidth (Cost per bit), x264 Med. Preset
CAPEX Focused: Highest Density (Cost per Channel), Intel QSV
[Figure: # of Viewers versus # of Streams; labeled workload: Safe City]
Software encoders sacrifice efficiency for speed (fps)
What they live with for realtime applications
Best compression Worst throughput
https://youtu.be/x9wn633vl_c
Xilinx & x265 bitrate comparison at same PSNR
[Chart: Bitrate savings vs x265 Medium at the same PSNR (axis 60% to 105%). Bars: x265 medium, Xilinx HEVC, x265 slow; annotations: 14% less bits, 17% less bits.]
Note: 1920x1080 encoding; a dual-socket E5-2666 server was used for the CPU measurements.
Xilinx vs CPUs: 20x faster @ same quality
[Chart: Encode speed (fps) at same quality (axis 0 to 140). Pairs: x264 Very Slow vs XLX, x265 Slow vs XLX, Libvpx vs XLX; speedup annotations: 12x, 12x, 20x.]
Note: 1920x1080 encoding; a dual-socket E5-2666 server was used for the CPU measurements.
Why Bits matter: Bandwidth = Cost
“Quarterly bandwidth costs increased by 66.8% to US$25.3 million 3Q2018 from same period of 2017, primarily due to bandwidth usage as a result of increased user base and enhanced live streaming video quality improvement”
Xilinx vs Nvidia: 40% less bits @ same quality
[Chart: PSNR (axis 22 to 30) versus bitrate (axis 0 to 1.4e+07) for the ParkJoy sequence, comparing NVenc HEVC and NGcodec HEVC; annotation: 40% lower bitrate for same quality.]
Tested using Nvidia P4
30% Less Bandwidth Saves Millions of Dollars
Video Solutions Available Today on Alveo
Codec | Partner | Description | Channels per card
H.264 HDE | Alma | High Density Encoder | 12 x 1080p60
H.264 HQE | Socionext | High Quality Encoder | 2 x 1080p60
H.264 HDD | VYU Synch | High Density Decoder | 12 x 1080p60
HEVC HQE | NGCodec | High Quality Encoder | 2 x 1080p60 (June 19)
HEVC HDD | PathPartner | High Density Decoder | 12 x 1080p60
HEVC HEIFD | PathPartner | HEIF Image Decoder | 10 x 4kp15
VP9 HQE | NGCodec | High Quality Encoder | 2 x 1080p60 (July 19)
Perseus+ | V-Nova + NGCodec | High Quality HEVC Encoder | 1 x 4kp60
WebP E | Xilinx | High Density Encoder | Resolution dependent
ABR Scaler | Xilinx | Multichannel Scaler |
JPEG HDE | Deepoly | High Density Encoder | Today (resolution dependent)
JPEG HDD | CTAccel | High Density Decoder | Today (resolution dependent)
Xilinx Video Solution Stack: Seamless FFmpeg integration
[Diagram, top to bottom:]
Customer Application
Plugins (video codecs, scalers, compositing etc.): FPGA H.264 encode plugin, FPGA HEVC encode plugin, FPGA VP9 encode plugin, Xilinx H.264 decode plugin, Xilinx ABR Scaler plugin, Xilinx Yolo plugin
Xilinx Media Acceleration API
Xilinx Runtime API
Xilinx Accelerator Binary
x86 Server
Xilinx Alveo Accelerator Card
NO FPGA EXPERIENCE REQUIRED
Software encode (libx264):
$ ffmpeg \
  -f rawvideo -pix_fmt yuv420p -s:v 1920x1080 -r 30 -an \
  -i /home/ffmpeg/VU9P/TestSequences/Kimono1_1920x1080_24.yuv \
  -frames 240 -c:v libx264 -preset medium -profile:v high -crf 23 -bf 4 -refs 3 -g 30 \
  -b:v 4000k -maxrate 4000k -bufsize 8000k -f h264 -r 30 -y ./sw_outdir/x264_medium_out0_br4000k.h264
Hardware H.264 encode:
$ ffmpeg \
  -f rawvideo -pix_fmt yuv420p -s:v 1920x1080 -r 30 -an \
  -i /home/ffmpeg/VU9P/TestSequences/Kimono1_1920x1080_24.yuv \
  -frames 240 -b:v 4000k -g 30 -c:v xlnx_h264_enc-hq -f h264 -y ./hw_outdir/out0_br4000k.h264
Hardware HEVC encode:
$ ffmpeg \
  -f rawvideo -pix_fmt yuv420p -s:v 1920x1080 -r 30 -an \
  -i /home/ffmpeg/VU9P/TestSequences/Kimono1_1920x1080_24.yuv \
  -frames 240 -b:v 4000k -g 30 -c:v xlnx_HEVC_enc -f h265 -y ./hw_outdir/out1_br4000k.h264
˃ As simple as changing 20 characters to get acceleration
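The point that acceleration comes down to swapping the encoder name can be illustrated by building two argument lists and comparing them. This is a minimal sketch that only constructs and prints the commands (it does not run ffmpeg). The input and output paths and the `build_command` helper are hypothetical, and the encoder name `xlnx_h264_enc-hq` is taken from the slide rather than verified against any particular FFmpeg build.

```python
import shlex

# Shared raw-video input settings, mirroring the slide's commands.
# The input path is a hypothetical placeholder.
COMMON_INPUT = [
    "-f", "rawvideo", "-pix_fmt", "yuv420p", "-s:v", "1920x1080",
    "-r", "30", "-an", "-i", "input_1920x1080.yuv",
    "-frames", "240",
]

# Shared rate-control settings used by both commands in this sketch.
RATE_CONTROL = ["-b:v", "4000k", "-g", "30"]


def build_command(codec, output):
    """Assemble an ffmpeg argument list for a given video encoder name."""
    return ["ffmpeg", *COMMON_INPUT, *RATE_CONTROL,
            "-c:v", codec, "-f", "h264", "-y", output]


software = build_command("libx264", "sw_out.h264")
hardware = build_command("xlnx_h264_enc-hq", "hw_out.h264")  # name as shown on the slide

# Both lists have the same length, so a positional comparison shows what changed.
changed = [(s, h) for s, h in zip(software, hardware) if s != h]

print(shlex.join(software))
print(shlex.join(hardware))
for old, new in changed:
    print(f"changed: {old} -> {new}")
```

In this sketch only the encoder name and the output file differ between the two invocations; the software command on the slide also carries libx264-specific tuning flags that the hardware commands omit.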
https://trac.ffmpeg.org/wiki/EncodingForStreamingSites
Video Innovative Solutions
Socionext H.264 Encoder
Socionext uses Xilinx FPGA for next Gen Solution
Socionext H.264 Enc. vs. x264 Enc. Very Slow setting
Major testing to ensure high performance
PSNR Testing VMAF Testing
FPGA H.264 Enc vs. Nvidia T4 H.264 Enc.
V-Nova Perseus+ IP
HD video services over any network
Actual screen shot (27th May 2019)
PERSEUS: a new approach
PERSEUS Plus
PERSEUS Plus is based on hierarchical image representation that is far more efficient than the traditional block-based codecs. Combining PERSEUS Plus with an existing base codec improves the overall quality, throughput and resiliency of the picture.
V-Nova becoming a standard:
PERSEUS Pro undergoing standardization as VC-6 / ST 2117
PERSEUS Plus in process for “Low Complexity Codec Enhancements”
Perseus+ NGCodec improves overall performance
Platform | Solution | Performance | Cost | Power
4kp60 on single VU9P | V-Nova Perseus+ with NGC HEVC | Best | Lowest | Lowest
4kp60 on 4 x VU9P | NGC HEVC only | Medium | Medium | Medium
4kp60 on 80 x x86 cores | x265 Software (very slow preset) | Lowest | Highest | Highest
Why PERSEUS Plus + Xilinx?
Unbeatable Density
• 4x increase in density on FPGA
• 50x denser than the equivalent software-only implementation
• UHDp60 in single FPGA
Bandwidth savings
• Up to 50% more efficient
• Live UHDp60 @8Mbps, 1080p60 @3Mbps
• Increase reach, improve quality of experience, reduce cost
Codec Agnostic
• PERSEUS Plus is codec agnostic
• Works with H.264, HEVC, VP9 and even AV1 when available
• Maximum compatibility with existing workflow
Xilinx DCG Video ZU+ EV strategy: High Density
Xilinx has a Solution for all Video Workloads
All live video traffic maps to one or both of these models
OPEX Focused: Lowest Bandwidth (Cost per bit), x264 Med. Preset
CAPEX Focused: Highest Density (Cost per Channel), Intel QSV
[Figure: # of Viewers versus # of Streams; labeled workload: Safe City]
Aupera: Delivering High Density Video Datacenter Solutions
Single 3RU high density chassis replaces 30 Xeon E5 servers
Single Aupera chassis with 48 x ZU7EV enabling 384 x 1080p30 simultaneous transcodes @ 750W
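The chassis figures imply a per-device and per-transcode budget. This is a minimal sketch of that arithmetic using only the numbers quoted above; the per-server figure additionally assumes the 30 Xeon E5 servers would carry the same 384 transcodes, which the slide implies but does not state.

```python
# Figures quoted on the slide.
DEVICES_PER_CHASSIS = 48      # ZU7EV devices
TRANSCODES_PER_CHASSIS = 384  # simultaneous 1080p30 transcodes
CHASSIS_POWER_W = 750
SERVERS_REPLACED = 30         # Xeon E5 servers

# Simple ratios derived from those figures.
transcodes_per_device = TRANSCODES_PER_CHASSIS / DEVICES_PER_CHASSIS
watts_per_transcode = CHASSIS_POWER_W / TRANSCODES_PER_CHASSIS
# Assumption: the replaced servers would handle the same total workload.
transcodes_per_server = TRANSCODES_PER_CHASSIS / SERVERS_REPLACED

print(f"1080p30 transcodes per ZU7EV: {transcodes_per_device:.0f}")
print(f"Watts per 1080p30 transcode: {watts_per_transcode:.2f}")
print(f"Implied transcodes per replaced server: {transcodes_per_server:.1f}")
```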
Competitive overview
 | Intel VCA2 | Nvidia P4 | Aupera with Xilinx
Device | 3 x E3-1585L | Pascal | ZU+ 7EV
Size | Large | Medium | Very Small
Max Power (W) | 235W | 75W | 20W
Avg cost | $$$$ | $$$ | $$
1080p (fps) | QSV TU1 180fps | NvEnc HQ/SP 300fps | XLX 265 240fps
Power per 1080p60 (W) | 78 | 15 | 5
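The "Power per 1080p60" row appears to follow from the two rows above it: maximum power divided by the number of 1080p60 channels, where one channel is 60 fps of 1080p throughput. This is a minimal sketch of that reading; the formula and the `watts_per_1080p60` helper are an interpretation of the table, not something the slide states.

```python
# Max power (W) and 1080p encode throughput (fps) as listed in the table.
PLATFORMS = {
    "Intel VCA2": {"max_power_w": 235, "fps_1080p": 180},
    "Nvidia P4": {"max_power_w": 75, "fps_1080p": 300},
    "Aupera with Xilinx": {"max_power_w": 20, "fps_1080p": 240},
}


def watts_per_1080p60(max_power_w, fps_1080p):
    """Max power divided by the number of 60 fps channels the device sustains."""
    channels = fps_1080p / 60
    return max_power_w / channels


for name, spec in PLATFORMS.items():
    print(f"{name:<20} {watts_per_1080p60(**spec):5.1f} W per 1080p60")
```

Rounded to whole watts, this reproduces the 78, 15 and 5 shown in the table.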
Encoder Latency Modes
Xilinx provides support for all high density applications
˃ Ultra low latency encoder: Ultra Low Latency
  1. Custom Lambda
  2. Scaling List
˃ Low latency encoder: SubFrame Low Latency
  1. Smart AQ
  2. Good RC improvements
˃ High VQ encoder: High VQ 1 Frame Latency
  1. Dynamic GOP
  2. Better RC improvements
  3. Smarter AQ
One Solution Stack for both Workloads/Solutions
[Diagram: two parallel stacks, High Quality and High Density, sharing the same layers from top to bottom:]
Transcoder Application (FFmpeg)
XMA API
Plugins (across both stacks): H264/H265 Dec. Plugin, ABR Scaler Plugin, H264/H265 Enc. Plugin, VP9 Enc. Plugin
Xilinx Runtime (XRT)
XRT Linux Kernel Drivers
PCIe
Accelerator: Alveo FPGA Accelerator (High Quality), Alveo ZU+7EV Accelerator (High Density)
Growing hardware ecosystem
FH / FL Dbl Slot: 4 x Zynq MPSoC
FH / HL Single Slot: 2 x Zynq MPSoC
HH / HL Single Slot: 2 x Zynq MPSoC
HH / HL Single Slot: 1 x Zynq MPSoC
HH / HL Single Slot: 1 x Zynq MPSoC
New partners coming in 2019
Xilinx Helping with Industry Challenges
˃ Helping reduce need for servers (CAPEX) with higher density platforms
  Aupera high density chassis, Alveo U30 PCIe card, Inspur PCIe card
˃ Reducing bandwidth while reducing number of servers
  New Socionext H.264 Encoder
  New V-Nova LCEVC encoding technology
  NGCodec acquisition to ensure future leading edge compression
Adaptable. Intelligent.