Video & Imaging Acceleration Overview
Sean Gardner, Sr. Marketing Mgr., Video & Imaging, DCG
August 2019

© Copyright 2019 Xilinx

What we’ll cover today…

˃ Market Overview
  Market challenges / pain points
  Challenges with present solution(s)
  Xilinx Two Pronged Strategy: Soft IP (high quality) and hardened ZU+ EV devices (high density)
˃ Xilinx DCG Video Soft IP strategy: High Quality
  Xilinx solutions competitive offering and positioning
  Xilinx delivers value to customers bottom line
  Video Software Solution Stack and Ecosystem
  Video Innovative Solutions with Adaptive Acceleration
  ‒ Socionext new H.264 High Quality encoder
  ‒ V-Nova Perseus+: Accelerating standard base codecs
˃ Xilinx DCG Video Zynq UltraScale+ EV strategy: High Density
  What solutions and platforms are out there and what’s coming
  Video Software Solution Stack
  Roadmap for enhancements for the VCU

2 BIG challenges facing our Industry

˃ Capital expenditures are out of control
  CAPEX: expenses in servers/hardware and Data Centers

˃ Operating expenditures are now the 3rd largest cost
  OPEX: expenses related to streaming/bandwidth or CDN costs

Live Video a $70B Market by 2021

$70B Worldwide by 2021

www.rethinkresearch.biz Published July 2019

China is #1 in Mobile Live Video

Network Provisioning Costs are an Issue

Provisioning for peak traffic will become increasingly inefficient

77% Live Video is the Most Expensive

Quick Comparison VOD vs. UGC (Live)

1000’s to 100,000’s files 10,000’s to 100 files CDN CDN Daily

Hrs. per file Real-Time

Netflix may ingest 100 files daily Twitch has 100,000 concurrent streamers today

Compute costs up 35x (480p30 H.264 to 4kp30 AV1)

Does not include massive growth in video traffic volume

Chart: Increase in resolution (SD (480), HD (720), FHD (1080), UHD (4k), 8k): 96x more pixels
Chart: Increase in codec complexity (MPEG2, H.264, HEVC / VP9, AV1): 25x more complex
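The "96x more pixels" annotation can be checked with a short calculation. The sketch below assumes common frame sizes for each tier (720x480 for SD, 7680x4320 for 8k, and so on); the slide names only the tiers, so these dimensions are assumptions. It does not attempt to reproduce the 35x compute figure, whose derivation the slide does not give.

```python
# Frame sizes are assumptions: the slide names the tiers, not exact dimensions.
RESOLUTIONS = {
    "SD (480)": (720, 480),
    "HD (720)": (1280, 720),
    "FHD (1080)": (1920, 1080),
    "UHD (4k)": (3840, 2160),
    "8k": (7680, 4320),
}


def pixel_ratio(target: str, base: str = "SD (480)") -> float:
    """Return how many times more pixels per frame `target` has than `base`."""
    target_w, target_h = RESOLUTIONS[target]
    base_w, base_h = RESOLUTIONS[base]
    return (target_w * target_h) / (base_w * base_h)


if __name__ == "__main__":
    for name in RESOLUTIONS:
        print(f"{name:>10}: {pixel_ratio(name):5.1f}x the pixels of SD")
```

Under these assumed sizes, 8k carries 96 times the pixels of SD, which matches the chart annotation, and UHD carries 24 times.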

Revenue vs Costs (OPEX)

Streaming Costs

Unprofitable zone Profit zone

Bandwidth Costs (OPEX)

Company       Total Revenue                   Bandwidth % of Revenue   Bandwidth Cost
BiliBili      4,130 RMB (M) / $616M USD       6%                       247.8 RMB (M) / $37M USD
Huya          4,663 RMB (M) / $678M USD       9%                       359 RMB (M) / $92M USD
YY Inc.       15,764 RMB (M) / $1.5B USD      6%                       967 RMB (M) / $144M USD
iQIYI Inc.    24,120 RMB (M) / $3.6B USD      8.9%                     2,318 RMB (M) / $346M USD
Douyu Inc.    3,496 RMB (M) / $520M USD       15.9%                    555.9 RMB (M) / $82.8M USD
Momo Inc.     13,408 RMB (M) / $1.9B USD      6% (est.)                804 RMB (M) / $117M USD
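As a rough cross-check of the table, the bandwidth share of revenue can be recomputed from the RMB columns, and an RMB amount can be converted with the 6.78 RMB-to-USD rate the slide states. This is a minimal sketch covering three of the rows; the function names are illustrative, and some other rows on the slide do not reproduce exactly this way, presumably because the companies reported their own USD figures.

```python
RMB_PER_USD = 6.78  # conversion rate stated on the slide

# (total revenue, bandwidth cost) in millions of RMB, copied from the table.
COMPANIES = {
    "BiliBili": (4130.0, 247.8),
    "Douyu Inc.": (3496.0, 555.9),
    "Momo Inc.": (13408.0, 804.0),
}


def bandwidth_share(revenue_rmb_m: float, bandwidth_rmb_m: float) -> float:
    """Bandwidth cost as a percentage of total revenue."""
    return 100.0 * bandwidth_rmb_m / revenue_rmb_m


def rmb_m_to_usd_m(amount_rmb_m: float) -> float:
    """Convert millions of RMB to millions of USD at the slide's rate."""
    return amount_rmb_m / RMB_PER_USD


if __name__ == "__main__":
    for name, (revenue, bandwidth) in COMPANIES.items():
        share = bandwidth_share(revenue, bandwidth)
        usd = rmb_m_to_usd_m(bandwidth)
        print(f"{name:<12} {share:5.1f}% of revenue, about ${usd:.0f}M USD")
```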

Note: 6.78 RMB to USD conversion rate used when not provided

Distribution of Video Workloads (80/20 rule)

20% of Streams, 80% of bits; 80% of Streams, 20% of bits

Figure: # of Viewers / Bandwidth (# of Channels) plotted against # of Streams (1k’s, 10k’s, Millions). At the head, Ninja with 600k viewers: OPEX challenge, lowest cost per bit. In the tail, Alfred E. Neuman with 4 viewers and Safe City: CAPEX challenge, lowest cost per channel.

The 80/20 Rule Holds True

(13% of Streams generate 74%)

https://techcrunch.com/2019/07/12/twitch-continues-to-dominate-live-streaming-with-its-second-biggest-quarter-to-date/amp/

Distribution of Video Workloads (80/20 rule)

Xilinx: Two Pronged Strategy

Figure: # of Viewers / Bandwidth and # of Accelerators / Channels plotted against # of Streams (1k’s, 10k’s, 1000K). The cost-per-bit-optimized region is labeled High Quality; the cost-per-channel-optimized region is labeled High Density Transcoding. Workload labels: Safe City, Online Gaming, eSports, Smart Retail, Live Video.

DCG Video Soft IP Strategy

High VQ (Video Quality) / Low bitrate strategy

Xilinx has a Solution for all Video Workloads

All live video traffic maps to one or both of these models

OPEX Focused CAPEX Focused

Lowest Bandwidth Highest Density (Cost per bit) (Cost per Channel)

Med. Preset Intel QSV # of Viewers Safe City # of Streams

Software encoders sacrifice efficiency for speed (fps)

What they live with for real-time applications

Best compression Worst throughput

https://youtu.be/x9wn633vl_c

Xilinx & bitrate comparison at same PSNR

Chart: Bitrate savings vs x265 Medium at the same PSNR. Bars: x265 medium, Xilinx HEVC, x265 slow. Annotations: 14% less bits, 17% less bits.

Note: 1920x1080 encoding; a dual-socket E5 2666 server was used for the CPU measurements

Xilinx vs CPUs: 20x faster @ same quality

Chart: Encode speed (fps) at same quality. x264 Very Slow vs XLX: 12x; x265 Slow vs XLX: 12x; Libvpx vs XLX: 20x.

Note: 1920x1080 encoding; a dual-socket E5 2666 server was used for the CPU measurements

Why Bits matter: Bandwidth = Cost

“Quarterly bandwidth costs increased by 66.8% to US$25.3 million 3Q2018 from same period of 2017, primarily due to bandwidth usage as a result of increased user base and enhanced live streaming video quality improvement”

Xilinx vs Nvidia: 40% less bits @ same quality

Chart: PSNR vs. bitrate (2M to 12M) for the ParkJoy sequence, NVenc HEVC vs. NGcodec HEVC; 40% lower bitrate for same quality.

Tested using Nvidia P4

30% Less Bandwidth Saves Millions of Dollars
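To put the 30% figure in dollar terms, the sketch below applies it to bandwidth costs taken from the Bandwidth Costs (OPEX) table earlier in the deck. It assumes bandwidth cost scales linearly with the bits delivered, which the slides imply but do not state; the function name is illustrative.

```python
# Bandwidth costs in millions of USD, copied from the Bandwidth Costs (OPEX) table.
BANDWIDTH_COST_USD_M = {
    "iQIYI Inc.": 346.0,
    "YY Inc.": 144.0,
    "Douyu Inc.": 82.8,
}


def bandwidth_savings(cost_usd_m: float, bitrate_reduction: float = 0.30) -> float:
    """Savings in millions of USD, assuming cost is proportional to bits sent."""
    return cost_usd_m * bitrate_reduction


if __name__ == "__main__":
    for name, cost in BANDWIDTH_COST_USD_M.items():
        print(f"{name:<12} saves about ${bandwidth_savings(cost):.1f}M USD")
```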

Video Solutions Available Today on Alveo

Codec         Partner            Description                   Channels per card
H.264 HDE     Alma               High Density Encoder          12 x 1080p60
H.264 HQE     Socionext          High Quality Encoder          2 x 1080p60
H.264 HDD     VYU Synch          High Density Decoder          12 x 1080p60
HEVC-HQE      NGCodec            High Quality Encoder          2 x 1080p60 (June 19)
HEVC-HDD      Path Partner       High Density Decoder          12 x 1080p60
HEVC-HEIFD    Path Partner       HEIF Image Decoder            10 x 4kp15
VP9-HQE       NGCodec            High Quality Encoder          2 x 1080p60 (July 19)
Perseus+      V-Nova+NGCodec     High Quality HEVC Encoder     1 x 4kp60
WebP-E        Xilinx             High Density Encoder          Resolution dependent
ABR Scaler    Xilinx             Multi-channel Scaler
JPEG-HDE      Deepoly            High Density Encoder          Today (resolution dependent)
JPEG-HDD      CTAccel            High Density Decoder          Today (resolution dependent)

Xilinx Video Solution Stack: Seamless FFmpeg integration

Stack layers, top to bottom:

Customer Application
Plugins: FPGA h.264 encode plugin, FPGA HEVC encode plugin, FPGA VP9 encode plugin, Xilinx h.264 decode plugin, Xilinx ABR Scaler plugin, Xilinx Yolo plugin (video codecs, scalers, compositing etc.)
Xilinx Media Acceleration API
Xilinx Run-time API
Xilinx Accelerator Binary
x86 Server
Xilinx Alveo Accelerator Card

NO FPGA EXPERIENCE REQUIRED

$ ffmpeg \
    -f rawvideo -pix_fmt yuv420p -s:v 1920x1080 -r 30 -an -i /home/ffmpeg/VU9P/TestSequences/Kimono1_1920x1080_24.yuv \
    -frames 240 -c:v libx264 -preset medium -profile:v high -crf 23 -bf 4 -refs 3 -g 30 -b:v 4000k -maxrate 4000k -bufsize 8000k -f h264 -r 30 -y ./sw_outdir/x264_medium_out0_br4000k.h264

$ ffmpeg \
    -f rawvideo -pix_fmt yuv420p -s:v 1920x1080 -r 30 -an -i /home/ffmpeg/VU9P/TestSequences/Kimono1_1920x1080_24.yuv \
    -frames 240 -b:v 4000k -g 30 -c:v xlnx_h264_enc-hq -f h264 -y ./hw_outdir/out0_br4000k.h264

$ ffmpeg \
    -f rawvideo -pix_fmt yuv420p -s:v 1920x1080 -r 30 -an -i /home/ffmpeg/VU9P/TestSequences/Kimono1_1920x1080_24.yuv \
    -frames 240 -b:v 4000k -g 30 -c:v xlnx_HEVC_enc -f h265 -y ./hw_outdir/out1_br4000k.h264

˃ As simple as changing 20 characters to get acceleration
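For an application that launches FFmpeg programmatically, the same idea can be sketched as a command builder in which only the codec arguments change. This is an illustration, not Xilinx tooling: the helper name and file paths are hypothetical, the encoder name is copied from the slide and is assumed to exist only in an FFmpeg build that includes the Xilinx plugins, and the software arguments are a subset of those shown above. Nothing is executed; the commands are only printed.

```python
import shlex


def build_encode_command(src, dst, codec_args):
    """Assemble an FFmpeg argument list for a raw 1080p30 YUV input (not executed)."""
    return [
        "ffmpeg",
        "-f", "rawvideo", "-pix_fmt", "yuv420p", "-s:v", "1920x1080", "-r", "30",
        "-an", "-i", src,
        "-frames", "240",
        *codec_args,
        "-f", "h264", "-y", dst,
    ]


# Software baseline: a subset of the libx264 options shown on the slide.
SOFTWARE_ARGS = ["-c:v", "libx264", "-preset", "medium", "-b:v", "4000k", "-g", "30"]
# Accelerated variant: encoder name copied from the slide (assumed to require
# an FFmpeg build with the Xilinx plugins).
HARDWARE_ARGS = ["-c:v", "xlnx_h264_enc-hq", "-b:v", "4000k", "-g", "30"]

if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    sw = build_encode_command("input_1920x1080.yuv", "sw_out.h264", SOFTWARE_ARGS)
    hw = build_encode_command("input_1920x1080.yuv", "hw_out.h264", HARDWARE_ARGS)
    print(shlex.join(sw))
    print(shlex.join(hw))
```

Keeping the input and output handling in one place makes the swap between software and accelerated encoding a one-argument change in the caller.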

https://trac.ffmpeg.org/wiki/EncodingForStreamingSites

Video Innovative Solutions

Socionext H.264 Encoder

Socionext uses Xilinx FPGA for next Gen Solution

Socionext H.264 Enc. vs. x264 Enc. Very Slow setting

Major testing to ensure high performance

PSNR Testing VMAF Testing

FPGA H.264 Enc vs. Nvidia T4 H.264 Enc.

V-Nova Perseus+ IP

HD video services over any network

Actual screen shot (27th May 2019)

PERSEUS: a new approach

PERSEUS Plus
PERSEUS Plus is based on hierarchical image representation that is far more efficient than the traditional block-based codecs. Combining PERSEUS Plus with an existing base codec improves the overall quality, throughput and resiliency of the picture.

V-Nova becoming a standard:
PERSEUS Pro undergoing standardization as VC-6/ST-2117
PERSEUS Plus in process for “Low Complexity Codec Enhancements”

Perseus+ NGCodec improves overall performance

Solution       V-Nova Perseus+ NGC HEVC    NGC HEVC only           x265 Software (very slow preset)
Platform       4kp60 on single VU9P        4kp60 on 4 x VU9P       4kp60 on 80 x x86 cores
Performance    Best Performance            Medium Performance      Lowest Performance
Cost           Lowest Cost                 Medium Cost             Highest Cost
Power          Lowest Power                Medium Power            Highest Power

Why PERSEUS Plus Xilinx?

Unbeatable Density
• 4x increase in density on FPGA
• 50x denser than the equivalent software-only implementation
• UHDp60 in single FPGA

Bandwidth savings
• Up to 50% more efficient
• Live UHDp60 @8Mbps, 1080p60 @3Mbps
• Increase reach, improve quality of experience, reduce cost

Codec Agnostic
• PERSEUS Plus is codec agnostic
• Works with h.264, HEVC, VP9 and even AV1 when available
• Maximum compatibility with existing workflow
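One way to read the quoted operating points (UHDp60 @8Mbps, 1080p60 @3Mbps) is as bits spent per pixel. The sketch below is illustrative only: it assumes UHD means 3840x2160 and 1080p means 1920x1080, and bits per pixel is a generic yardstick rather than a metric the slide itself uses.

```python
def bits_per_pixel(bitrate_bps: float, width: int, height: int, fps: float) -> float:
    """Average number of encoded bits spent on each pixel of each frame."""
    return bitrate_bps / (width * height * fps)


if __name__ == "__main__":
    # Frame sizes are assumptions; the slide gives only "UHDp60" and "1080p60".
    uhd = bits_per_pixel(8_000_000, 3840, 2160, 60)
    fhd = bits_per_pixel(3_000_000, 1920, 1080, 60)
    print(f"UHDp60 @ 8 Mbps:  {uhd:.4f} bits per pixel")
    print(f"1080p60 @ 3 Mbps: {fhd:.4f} bits per pixel")
```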

Xilinx DCG Video ZU+ EV strategy: High Density

Xilinx has a Solution for all Video Workloads

All live video traffic maps to one or both of these models

OPEX Focused CAPEX Focused

Lowest Bandwidth Highest Density (Cost per bit) (Cost per Channel)

x264 Med. Preset Intel QSV # of Viewers Safe City # of Streams

Aupera: Delivering High Density Video Datacenter Solutions

Single 3RU high density chassis replaces 30 Xeon E5 servers

Single Aupera chassis with 48 x ZU7EV enabling 384 x 1080p30 simultaneous transcodes @ 750W
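The chassis figures imply a per-device channel count and a per-channel power budget. The following sketch derives them from the numbers quoted above (48 devices, 384 transcodes, 750W, 30 servers replaced); the function names are illustrative, and the per-server figure is only what the "replaces 30 Xeon E5 servers" claim implies.

```python
def channels_per_unit(total_channels: int, units: int) -> float:
    """Average number of simultaneous transcodes handled by each unit."""
    return total_channels / units


def watts_per_channel(total_watts: float, total_channels: int) -> float:
    """Chassis power divided evenly across all simultaneous transcodes."""
    return total_watts / total_channels


if __name__ == "__main__":
    print(f"Transcodes per ZU7EV: {channels_per_unit(384, 48):.0f}")
    print(f"Watts per 1080p30 transcode: {watts_per_channel(750, 384):.2f}")
    print(f"Implied transcodes per replaced Xeon server: {channels_per_unit(384, 30):.1f}")
```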

Competitive overview

                         Intel VCA2        Nvidia P4             Aupera with Xilinx
Device                   3 x E3-1585L      Pascal                ZU+ 7EV
Size                     Large             Medium                Very Small
Max Power (W)            235W              75W                   20W
Avg cost                 $$$$              $$$                   $$
1080p (fps)              QSV TU1 180fps    NvEnc HQ/SP 300fps    XLX 265 240fps
Power per 1080p60 (W)    78                15                    5
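The "Power per 1080p60" row appears consistent with dividing each device's maximum power by the number of 60 fps channels its 1080p throughput supports. That derivation is an inference from the numbers, not something the slide states; the sketch below reproduces the row under that assumption.

```python
# (max power in watts, 1080p throughput in fps), copied from the comparison table.
DEVICES = {
    "Intel VCA2": (235.0, 180.0),
    "Nvidia P4": (75.0, 300.0),
    "Aupera with Xilinx": (20.0, 240.0),
}


def power_per_1080p60(max_power_w: float, fps_1080p: float) -> float:
    """Watts per 1080p60 channel, assuming throughput divides evenly into 60 fps streams."""
    channels = fps_1080p / 60.0
    return max_power_w / channels


if __name__ == "__main__":
    for name, (watts, fps) in DEVICES.items():
        print(f"{name:<20} {power_per_1080p60(watts, fps):5.1f} W per 1080p60")
```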

Encoder Latency Modes

Xilinx provides support for all high density applications

˃ Ultra low latency encoder (Ultra Low Latency, Sub-Frame)
  1. Custom Lambda
  2. Scaling List
˃ Low latency encoder (Low Latency)
  1. Smart AQ
  2. Good RC improvements
˃ High VQ encoder (High VQ, 1 Frame Latency)
  1. Dynamic GOP
  2. Better RC improvements
  3. Smarter AQ

One Solution Stack for both Workloads/Solutions

The High Quality and High Density stacks share the same layers, top to bottom:

Transcoder Application (FFmpeg)
XMA API
Plugins: H264/H265 Dec, ABR Scaler, VP9 Enc., H264/H265 Enc.
Xilinx Runtime (XRT)
XRT Linux Kernel Drivers
PCIe
Accelerator: Alveo FPGA Accelerator (High Quality) or Alveo ZU+7EV Accelerator (High Density)

Growing hardware ecosystem

FH / FL Dbl Slot 4 x Zynq MPSoC

FH / HL Single Slot 2 x Zynq MPSoC

HH / HL Single Slot 2 x Zynq MPSoC

HH / HL Single Slot 1 x Zynq MPSoC

HH / HL Single Slot 1 x Zynq MPSoC

New partners coming in 2019

Xilinx Helping with Industry Challenges

˃ Helping reduce need for servers (CAPEX) with higher density platforms
  Aupera high density chassis, Alveo U30 PCIe card, Inspur PCIe card
˃ Reducing bandwidth while reducing number of servers
  New Socionext H.264 Encoder
  New V-Nova LC-EVC encoding technology
  NGCodec acquisition to ensure future leading edge compression

Adaptable. Intelligent.
