How will Deep Learning Change Internet Video Delivery?

Hyunho Yeo 2 Ph.D. Candidate@KAIST • Work on DL-based video delivery • Publish papers at top-tier system/network conferences

Neural Adaptive Content-aware Internet Video Delivery Hyunho Yeo, Youngmok Jung, Jaehong Kim, Jinwoo Shin, Dongsu Han USENIX OSDI 2018

How will Deep Learning Change Internet Video Delivery? Hyunho Yeo, Sunghyun Do, Dongsu Han ACM HotNets 2017 3 Overview

“How will Deep Learning Change Internet Video Delivery?”

1. Observation/Limitation of Current Video Delivery

2. Recent research: DL-based adaptive streaming [OSDI 18]

3. Vision of DL-based Video Delivery 4 Era of Internet Video Delivery Internet video traffic has exponentially grown over last decade! [Global Internet Video Traffic]1 [Percentage of downstream traffic]2 3000 (Peak period in North America - 2014) 1,260,000,000 2000 1TB

1000

0 10 12 14 16 18 20 22 Video traffic (EB per year) per (EB traffic Video Year 1: CISCO Visual Networking Index, 16 data was interpolated 2: https://digitalbusinessblog.wordpress.com/2014/11/25/who-are-the-biggest-bandwidth-hogs/ 5 Observation on Current Video Ecosystem To handle bandwidth heterogeneity over space and time, Adaptive streaming has been widely deployed

Video server Client

Quality Video chunk Bandwidth Quality High ABR

Med. Low Time Time Time 6 Traditional Approaches

Optimizing ABR algorithms Pensieve [SIGCOMM 17], MPC [SIGCOMM 15]

Choosing better servers, CDNs Content Multihoming [SIGCOMM 12], VDN [SIGCOMM 15]

Leveraging centralized control plan Video Control Plane [SIGCOMM 12], Pythease [NSDI 17]

Goal: Find how to best utilize the network resource 7 Limitation of Current Video Delivery Video quality heavily depends on available bandwidth

Video server Client Network Low Quality Congestion Directly affect 8 Limitation of Current Video Delivery Client computing power is scarcely utilized other than for decoding Mobile GPU Desktop GPU 500 12 GTX 1080Ti GTX 1080 375 9 GTX 980Ti GTX 780Ti 250 6 GTX 780 (TFLOPs) (GFLOPs) 125 3 GTX 480 GTX 680 GTX 980 GTX 580 Computing power 0 Computing power 0 2009 2012 2015 2018 2009 2012 2015 2018 Year Year 9 Observation on Current Video Ecosystem Standard codecs efficiently reduce redundancy only inside GOP Group of Pictures (GOP) : 2—10 seconds Seamless switching Video GOP Video Quality Standard codecs (H.26x, VPx, AV1)

Time Compressed : Intra-frame[Adaptive coding streaming] : Inter-frame coding I-frame P,B-frames I-frame 10 Limitation of Current Video Delivery

Group of Pictures (GOP) : 2—10 seconds

Video

Time I-frame I-frame I-frame I-frame

Redundancy (large timescales)

Standard codecs lack any mechanisms for exploiting redundancy that occurs at large timescales 11 What Deep Neural Network (DNN) Can Do? 1. Utilize client computation to enhance video quality

Video server Client Network Congestion

DNN Low quality High quality

Client computing device 12 What Deep Neural Network (DNN) Can Do? 1. Utilize client computation to enhance video quality

Video server Client Network Congestion

Low resolution Super-resolution DNN High resolution

Client computing device 13 What Deep Neural Network (DNN) Can Do? 2. Trained and operate in large timescales (video)

Redundancy Redundancy GOP GOP GOP GOP

Standard codecs DNN 14 What Deep Learning (DL) can Do?

Can we overcome the current limitations via DNN?

How much quality improvement can we achieve?

To answer these, let’s move to our recent research, NAS [OSDI18] 15 NAS: DL-based Adaptive Streaming Apply super-resolution DNN on top of bitrate adaptation : Client computation Super-resolution Bandwidth 240p 1080p

360p 1080p

480p 1080p

1080p 16 17 NAS: System Target 1. Content: Video on demand (VOD) Example

2. Computing device: GTX 10 series Example GTX 1050 Ti (Entry-level) Titan Xp (High-end)

Price $139 $1,200 18 NAS: Two Initial Challenges NAS utilizes DNN and client computation, but … 19 NAS: Two Initial Challenges NAS utilizes DNN and client computation, but … 1. DNN testing accuracy is unreliable for unseen/new content • Even worse, degradation can occur (below figure) Unseen content EDSR [CVPRW 17] ↓Quality (Trained on DIV2K dataset)

SSIM = 0.86 SSIM = 0.84

For the real-world deployment, DNN accuracy should be guaranteed 20 NAS: Two Initial Challenges NAS utilizes DNN and client computation, but … 2. Client must process DNN at real-time, but computing power varies across space and time

GPU > x5 slower! GPU

Client A: Entry-level GPU Client B: High-end GPU (1.98 TFLOPS – 1050 Ti) (10.79 TFLOPS – Titan Xp)

Adaptation to computing power is required 21 Key Design 1: Content-aware DNN Challenge: Providing reliable DNN quality 1. New video admission 2. Generates a content-aware 3. Provide (video, DNN) Video 1 Content-aware DNN 1

Video 2 Content-aware DNN 2

Video server

Content-aware DNN delivers the reliable (over-fitted) training accuracy instead of the unpredictable testing accuracy. 22 Training a content-aware super-resolution

1. Prepares training data 2. Updates the DNN parameters

Updates parameters Raw high-resolution (1080p) frames Output Target Input DNN

Compressed low-resolutions (240p—720p) 23 Content-agnostic vs. Content-aware

“PSNR 2~4 dB gain over content-agnostic” 24 Bicubic vs. Content-aware DNN Average PSNR: 28.28dB Average PSNR: 34.40dB 25 Bicubic vs. Content-aware DNN Average PSNR: 26.15dB Average PSNR: 30.42dB 26 Key Design 2: Multiple Quality DNNs Challenge: Enabling real-time super-resolution on heterogeneous clients

Downloads several MBs? Video server  Delay video streaming Client

Quality: Low High Size: Small (93KB) Large (2,143KB) Compute: Low High 1. Provides multiple quality DNN options 27 Key Design 2: Multiple Quality DNNs Challenge: Enabling real-time super-resolution on heterogeneous clients Manifest file

2. Delivers DNN description Video server (#Layer, #Channel) Client MPD (Media Presentation Description) Period Adaptation Set (Video) Quality: Low High 1080p 720p 480p 4.8Mb/s 2.4Mb/s 1.2Mb/s Size: Small (93KB) Large (2,143KB) Compute: Low High Adaptation Set (DNN) (#layer, #filter) Low-240p Med.-240p High-240p 1. Provides multiple quality DNN options (20, 9) (20, 21) (20, 32) 28 Key Design 2: Multiple Quality DNNs Challenge: Enabling real-time super-resolution on heterogeneous clients Manifest file

2. Delivers DNN description Video server (#Layer, #Channel) Client 53 fps 52 fps

Quality: Low High 38 fps 21 fps Size: Small (93KB) Large (2,143KB) Computing device Selected (GTX 1080) Compute: Low High Mock DNNs 1. Provides multiple quality DNN options 3. Test-runs and selects the highest-quality running at real-time 29 NAS: Two Additional Challenges NAS streams video with a content-aware DNN, but …

: DNN data : Video data Video server Client 30 NAS: Two Additional Challenges NAS streams video with a content-aware DNN, but … 1. Takes long time to download and utilize a DNN

: DNN data : Video data Video server Client Ultra-high (2,145KB) 360p video (400Kbps) 1 x 21 seconds x

Need to provide incremental benefit during downloading a DNN 31 NAS: Two Additional Challenges NAS streams video with a content-aware DNN, but … 2. A DNN competes bandwidth with video

: DNN data : Video data Video server Client (-) Aggressive download: rebuffering, low video quality (-) Conservative download: low DNN benefit

Need to carefully decide how/when to download a DNN model 32 Key Design 3: Scalable DNN Challenge: Takes a long time to utilize a DNN

Video server Client st nd th : Required : Optional 1 chunk 2 chunk 5 chunk No DNN Input Output Layer Layer Layer Layer Layer Layer

additional by-passing paths Time 1. Implement a scalable DNN 2. Download/Apply a partial DNN (+ divide into similar-size chunks) NAS DNN Architecture

1. Per-resolution DNN: enable real-time processing MDSR [1] # Channel NAS-MDSR ( : Bicubic interpolation) 1080p 1080p 1080p 270p 540p 240p 360p 480p

Input: 240p 360p 480p Shared 2. Additional bypassing paths: enable anytime prediction Residual paths 480p image 1080p image Conv Sum Sum Conv Conv Conv Conv ReLU Bicubic Subpixel By-passing paths [1] Enhanced Deep Residual Networks for Single Image Super-Resolution, CVPRW, 2017 34 Key Design 4: Integrated ABR Challenge: How to decide when to download a DNN ABR already handles unpredictable bandwidth variations

3

2 Bitrate Bandwidth Bitrate ABR 1 Bandiwdth

0 0 50 100 150 200 250 300 Time  Integrate DNN download decisions with existing RL-based ABR (Pensieve [1]) DNN Bitrate 3 Integrated 2 Bitrate Bandwidth or DNN ABR 1

DNN Bandiwdth 0 0 50 100 150 200 250 300 But, non-trivial! Time [1]: Mao, Hongzi, Ravi Netravali, and Mohammad Alizadeh. “Neural adaptive video streaming with pensieve.”, SIGCOMM, 2017. [2]: Upper right figure is from the slide of“Neural adaptive video streaming with pensieve.”, 35 Key Design 4: Integrated ABR Challenge: How to decide when to download a DNN • Integrate DNN download decisions with existing RL-based ABR (Pensieve) [1]

QoE metric = bitrate - rebuffering – smoothness Reward Pensieve agent State Action 𝑟𝑟𝑡𝑡 Bandwidth 240p Environment 𝑡𝑡 𝑡𝑡 Bitrate𝑠𝑠 𝑎𝑎 Playback buffer 1080p

Goal: Maximize the total QoE over an entire video [1]: Mao, Hongzi, Ravi Netravali, and Mohammad Alizadeh. "Neural adaptive video streaming with pensieve.“, SIGCOMM, 2017. 36 Key Design 4: Integrated ABR Challenge: How to decide when to download a DNN • Integrate DNN download decisions with existing RL-based ABR (Pensieve) [1]

Reward QoE metric = DNN(bitrate)bitrate - rebuffering- rebuffering– smoothness– smoothness 𝑟𝑟𝑡𝑡 NAS agent State Action Bandwidth 𝑡𝑡 240p Environment Bitrate𝑠𝑠 𝑎𝑎𝑡𝑡 Playback buffer 1080p # Remaining DNN DNN Goal: Maximize the total QoE reflecting DNN-based quality enhancement Putting All Together: Implementation

Video

DNN Server NAS Player (dash.js) ∆1.7K LOC (8.8%) DNN Processor Integrated ABR 6.3K LOC 5.5K LOC 38 Evaluation

1) How much benefit does NAS deliver?

2) What are the cost and benefit of NAS ?

3) Does NAS effectively adapt to heterogeneous clients? 39 NAS vs. Existing Video Delivery : QoE • 17.8 hours real-world network traces: collected from 3G network and broadband (average bandwidth: 1.31Mbps) • 27 YouTube videos: 5-24 minutes, encoded at {400, 800, 1200, 2400, 4800}kbps • Computing device: NVIDIA Titan Xp, DNN quality: Ultra-high • Video player: Chromium browser, Video server: Apache server 40 QoE Metric Quantify user experience of video streaming

1080p vs. 240p Better experience Higher quality Less rebuffering

+(Quality) -(rebuffering) -(smoothness) Generalized QoE model1,2,3 : +(Quality) -(rebuffering) -(smoothness)( ) • : Perceptual quality of nth video chunk bitrate 𝑞𝑞 𝑅𝑅𝑛𝑛 − 𝜇𝜇𝑇𝑇𝑛𝑛 − 𝑞𝑞 𝑅𝑅𝑛𝑛 − 𝑞𝑞 𝑅𝑅𝑛𝑛−1 • : Rebuffering time for downloading nth video chunk 𝑛𝑛 𝑛𝑛 𝑞𝑞 𝑅𝑅 1: MPC-SIGCOMM15, Pensieve-SIGCOMM17, Oboe-SIGCOMM18 𝑅𝑅 𝑇𝑇𝑛𝑛 41 NAS vs. Existing Video Delivery : QoE • 17.8 hours real-world network traces: collected from 3G network and broadband (average bandwidth: 1.31Mbps) • 27 YouTube videos: 5-24 minutes, encoded at {400, 800, 1200, 2400, 4800}kbps • Computing device: NVIDIA Titan Xp, DNN quality: Ultra-high • Video player: Chromium browser, Video server: Apache server 1 better BOLA 0.5 RobustMPC

CDF Pensieve NAS 0 -0.5 0.5 1.5 2.5 Average QoE NAS improves user QoE by 43.08% compared to Pensieve and 92.28% compared to BOLA using same amount of bandwidth. 42 NAS vs. Existing Video Delivery : Cost

Pensieve CDN NAS Pensieve 2.5 = 0.085$/GB 2 NAS CDN 1.5 Training cost

= 0.085$/GB better 1

: ↓17.13% bandwidth cost ($) Total 0.5 for same quality 0 10 mins 1.4$/hour 0 10 20 30 40 50 60 = 0.23$/minute of video Total viewing time (hour)

When the total viewing reaches 30 hours (per minute of video), NAS CDN recoups the initial training cost. 43 Heterogeneous Clients

Each GPU processes at real-time BOLA RobustMPC Pensieve (> 30fps for all resolutions) Low Medium High Ultra-high DNN quality GPU model (Price) better Low GTX 1050 Ti ($139) Medium GTX 1060 ($249)

High GTX 1070 Ti ($449) CDF GTX 1080 ($559) Ultra-high GTX 1080 Ti ($669) Titan Xp ($1,200) Average QoE NAS adapts to heterogeneous devices, and a device with higher computing power receives greater benefit. 44 NAS: DL-based Adaptive Streaming NAS

• NAS shows that applying DNN on video content utilizing client computation can significantly enhance user QoE. • NAS accommodates four key designs: Content-aware DNN, Multiple quality DNNs, Scalable DNN, Integrated ABR. 45 What’s Next? NAS = Adaptive streaming + VoD contents + Desktop-class GPUs

• Integrate DL with various parts in video delivery infrastructure • Apply DL on diverse video applications (e.g., Live/4K/AR/VR) • Deploy DL-based streaming on commercial mobile devices

DL-based video ingest DL-based Live, 4K, AR/VR Content Provider CDN Client

DL-based video storage 46 Conclusion “How will Deep Learning Change Internet Video Delivery?” • The advance of deep learning presents unseen opportunities Current streaming Bandwidth DL-based streaming Client computation Content redundancy Better experience • Rethinking the video delivery infrastructure is required to take advantage of the new opportunities

: First step toward this direction Thank you

• Personal homepage http://ina.kaist.ac.kr/~hyunho/

• Lab homepage http://ina.kaist.ac.kr/

• Project homepage http://ina.kaist.ac.kr/~nas/ OSDI conference @ Carlsbad, CA, USA Content-agnostic vs. Content-aware Original agDNN awDNN 37 34 31

PSNR(dB) 28 25 1 2 3 4 5 6 7 8 9 Content type QoE breakdown 2.5 BOLA R-MPC Pensieve NAS 2 1.5

1 0.05 0.05 0.16 0.15 0.5 0.02 0.05 0.07 0.10 0 Bitrate utility Rebuffer penalty Smoothness penalty Average QoE over Content Types

BOLA R-MPC Pensieve NAS 1 0.8 0.6 0.4 0.2 0

Normalized average QoE average Normalized 1 2 3 4 5 6 7 8 9 Content type 51 Scalable DNN 1 1 0.8 0.8 0.6 17.54% 0.6

CDF NAS 0.4 CDF 0.4 NAS NAS FULL 0.2 0.2 NAS FULL Pensieve Pensieve 0 0 0.3 0.5 0.7 0.9 0.3 1.3 2.3 3.3 Average QoE Average QoE Integrated ABR (Quality-awareness) 240p 360p 480p 720p 1080p unaware lin aware

type unaware log aware QoE unaware hd aware 0 0.5 1 Ratio Integrated ABR (DNN downloads) 2 Pensieve (10%) 1.5 Pensieve (100%) NAS 1 0.15 0.5 0.06 0.09 0.06 0.07 0.09 0 Bitrate utility Rebuffering Smoothness penalty penalty Time Play time = 20 sec 20 = Play time DNN Player Processor 0sec 1 Download manifestDownload file : Video chunk 2 Mock DNN test DNN Mock Download 1 3 4 5 24 sec, 1 24 sec, 6 st 7 video chunk : DNN chunk 15 sec st 1 DNN Case Study 8 Apply 1 6 9 : DNN processing 32 sec, 2 32 sec, 1 2 st st 7 DNN chunk to 6 Download 2 nd DNN 8 30sec 2 9 3 nd nd : Timeline 44 sec, 3 44 sec, DNN chunk th video chunk 10 Download time rd Processing time Processing 11 DNN 4 3 rd 12 52 sec, 4 52 sec, 45sec 13 5 4 th 14 th DNN DNNs arefully downloaded 15 Full DNN - 60 sec, full DNN 17 18 60sec 5 th 19