Salsify: Low-Latency Network Video Through Tighter Integration Between a Video Codec and a Transport Protocol

Sadjad Fouladi Stanford University

Joint work with: John Emmons, Emre Orbay, Catherine Wu, Riad S. Wahby, Keith Winstein

snr.stanford.edu

What is real-time video?

Sender network Receiver

Real-time video latency target: tens of milliseconds

the amount of time between when something happens and when you see it

Low latency is required to maintain the interactivity of the application

Interaction

Sender network Receiver

Cloud Video Gaming

Low-Latency Video Stream

Remote Server

Teleoperation of Robots

Remote Surgery

Video Conferencing

WebRTC (Google Chrome)

slow reaction to network variations ⇒ stalls and glitches

Enter Salsify

• Salsify is a new architecture for real-time Internet video.

• Salsify tightly integrates a video-aware transport protocol with a functional video codec, allowing it to respond quickly to changing network conditions.

Conventional design: two control loops at arm’s length

[Diagram: video codec and transport protocol]

The narrow interface between codec and transport

transport protocol → video codec: new target bit rate

video codec → transport protocol: compressed frames

Decades of research and development on these components

video codec: MPEG-1, MPEG-2, H.263, H.264, H.265, VC-1, VP8, VP9, AV1

transport protocol: Sprout, NADA, GCC, BBR, LEDBAT, CDG, RemyCC, FBRA

“Let’s improve the transport”

[Figure: Skype throughput (kbps) and delay (ms) vs. time (s).]

K. Winstein, A. Sivaraman, H. Balakrishnan, “Stochastic Forecasts Achieve High Throughput and Low Delay over Cellular Networks,” NSDI ’13

[Figure: Sprout throughput (kbps) and delay (ms) vs. time (s).]

Improving the network component didn’t save the day…

The video codec can only achieve the bit rate on average

[Figure: frame size (KB) vs. frame number, compared against the target bit rate (2 Mbps).]

The problem: codec and transport are too decoupled

• The codec can only respond to changes in target bit rate over coarse time intervals.

• Individual frames may cause packet loss/queueing.

• The transport has little control over what the codec produces.
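To make the "on average" point concrete, here is a small sketch. Only the 2 Mbps target comes from the slides; the 30 fps frame rate and the per-frame sizes are made-up illustrative values, not measurements.

```python
# Illustrative only: the frame rate and frame sizes are assumptions.
TARGET_BITRATE_BPS = 2_000_000   # 2 Mbps target, from the slide
FRAME_RATE = 30                  # assumption: 30 frames per second

# Per-frame byte budget implied by the target bit rate.
budget_bytes = TARGET_BITRATE_BPS / 8 / FRAME_RATE

# Hypothetical encoder output: the mean matches the budget,
# but individual frames land well above or below it.
frame_sizes = [budget_bytes * f for f in (0.5, 0.6, 2.4, 0.7, 0.8, 1.0)]

average = sum(frame_sizes) / len(frame_sizes)
# Indices of frames that exceed the per-frame budget.
oversized = [i for i, s in enumerate(frame_sizes) if s > budget_bytes]

print(f"budget per frame: {budget_bytes:.0f} bytes")
print(f"average frame:    {average:.0f} bytes")
print(f"oversized frames: {oversized}")
```

In this toy run the encoder hits the target on average, yet one frame is 2.4 times the budget. A frame like that is what can cause queueing or packet loss when the network has no headroom.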

Salsify explores a more tightly-integrated design

transport protocol & video codec

Salsify’s architecture: Video-aware transport protocol

transport protocol & video codec

Video-aware transport protocol

“What should be the size of the next frame?”

• There’s no notion of bit rate, only the next frame size!

• Inspired by packet pair and Sprout-EWMA, the transport uses the packet inter-arrival time reported by the receiver.
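A minimal sketch of how a transport could turn receiver-reported inter-arrival times into a next-frame size. This is not Salsify's actual formula: the class name, the EWMA gain, the delay budget, and the packet size are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FrameSizeEstimator:
    """Toy estimator; every constant here is an illustrative assumption."""
    alpha: float = 0.1            # EWMA gain (assumed)
    delay_budget_s: float = 0.1   # tolerated queueing delay (assumed)
    packet_bytes: int = 1400      # payload bytes per packet (assumed)
    avg_interarrival_s: Optional[float] = None

    def on_receiver_report(self, interarrival_s: float) -> None:
        """Fold one receiver-reported inter-arrival time into the EWMA."""
        if self.avg_interarrival_s is None:
            self.avg_interarrival_s = interarrival_s
        else:
            self.avg_interarrival_s = (
                self.alpha * interarrival_s
                + (1 - self.alpha) * self.avg_interarrival_s
            )

    def next_frame_budget(self, packets_in_flight: int) -> int:
        """Bytes the next frame may use without exceeding the delay budget."""
        if self.avg_interarrival_s is None:
            return 0  # no feedback yet, so send nothing in this sketch
        # Packets the path can drain within the delay budget.
        deliverable = int(self.delay_budget_s / self.avg_interarrival_s)
        return max(0, deliverable - packets_in_flight) * self.packet_bytes


est = FrameSizeEstimator()
est.on_receiver_report(0.003)          # packets arriving every 3 ms
fast = est.next_frame_budget(packets_in_flight=3)
est.on_receiver_report(0.03)           # the network slows down
slow = est.next_frame_budget(packets_in_flight=20)
```

The point of the sketch is the interface: the transport answers "how many bytes can the next frame be?" rather than publishing a long-term bit rate.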

Salsify’s architecture: Functional video codec

transport protocol & video codec

It’s challenging for any codec to choose the appropriate quality settings upfront to meet a target size; they tend to over-/undershoot the target.

The encoder can only know the output size after the fact.

[Figure: frame size (KB) vs. frame number.]

The challenge: Getting an accurate frame out of an inaccurate codec

• Trial and error: Encode with different quality settings, pick the one that fits. SOUNDS GOOD, DOESN’T WORK!

Video encoder turns frames into a compressed bitstream

frame 0 frame 1 frame 2 frame 3

Video Encoder

00011000010 01110001000 11011010011 00000001001 01100001011 00110000111 01000110110 00101100000

Encoder is stateful

frame 0 frame 1 frame 2 frame 3

Video Encoder

00011000010 01110001000 11011010011 00000001001 01100001011 00110000111 01000110110 00101100000

There’s no way to undo an encoded frame in current codecs

codec.encode([image, image, ...]) → bytestream…

The state is internal to the encoder—no way to save/restore the state.

Functional video codec to the rescue

encode(state, image) → state′, frame

Salsify’s functional video codec exposes the state that can be saved/restored.
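As a rough illustration of the functional interface, here is a toy encoder written as a pure function. It is not Salsify's codec: the `State` type, the residual scheme, and the extra `quantizer` parameter are assumptions made so the example stays small.

```python
from typing import NamedTuple, Tuple

Pixels = Tuple[int, ...]


class State(NamedTuple):
    """Explicit codec state: the last reconstructed picture (toy model)."""
    reference: Pixels


def encode(state: State, image: Pixels, quantizer: int) -> Tuple[State, Pixels]:
    """Pure function: returns (new_state, frame) and never mutates `state`."""
    # The frame is the quantized difference from the reference picture.
    frame = tuple((p - r) // quantizer for p, r in zip(image, state.reference))
    # The new state is what a decoder would reconstruct from that frame.
    reconstructed = tuple(r + d * quantizer for r, d in zip(state.reference, frame))
    return State(reconstructed), frame


s0 = State((0, 0, 0))
image = (10, 20, 30)

# Encode the same image twice from the same saved state.
s_hi, frame_hi = encode(s0, image, quantizer=1)   # finer quantization
s_lo, frame_lo = encode(s0, image, quantizer=8)   # coarser quantization
```

Because the state is an ordinary immutable value, trying a second quality setting does not disturb the first attempt. The caller keeps whichever returned state matches the frame it decides to send.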

Order two, pick the one that fits!

• Salsify’s functional video codec can explore different execution paths without committing to them.

• For each frame, the codec presents the transport with three options:

  A slightly-higher-quality version (better; 50 KB in this example),

  A slightly-lower-quality version (worse; 10 KB in this example),

  ❌ Discarding the frame.
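The three-way choice above can be sketched as a small selection function. The function name and the rule "prefer the largest option that fits" are assumptions for illustration; the sketch treats a larger frame as the higher-quality one.

```python
from typing import Optional, Sequence


def pick_option(option_sizes_kb: Sequence[float], target_kb: float) -> Optional[int]:
    """Return the index of the largest option that fits, or None to discard."""
    fitting = [i for i, size in enumerate(option_sizes_kb) if size <= target_kb]
    if not fitting:
        return None  # neither version fits: discard the frame
    return max(fitting, key=lambda i: option_sizes_kb[i])


options = [50, 10]                  # better and worse versions, in KB
print(pick_option(options, 60))     # both fit: take the better one
print(pick_option(options, 30))     # only the worse one fits
print(pick_option(options, 5))      # nothing fits: discard
```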

Salsify’s architecture: Unified control loop

transport protocol & video codec

Codec → Transport: “Here’s two versions of the current frame.”

Option 1 (better): 50 KB

Option 2 (worse): 25 KB

Target frame size: 30 KB

Transport → Codec: “I picked option 2. Base the next frame on its exiting state.”

Picked: 25 KB

Target frame size: 30 KB

Codec → Transport: “Here’s two versions of the latest frame.”

Option 1 (better): 50 KB

Option 2 (worse): 25 KB

Target frame size: 55 KB

Transport → Codec: “I picked option 1. Base the next frame on its exiting state.”

Picked: 50 KB

Target frame size: 55 KB

Codec → Transport: “Here’s two versions of the latest frame.”

70 KB

better

worse

5025 KB

target frame size 5 KB

Transport → Codec: “I cannot send any frames right now. Sorry, but discard them.”

Target frame size: 5 KB

Codec → Transport: “Fine. Here’s two versions of the latest frame.”

Option 1 (better): 45 KB

Option 2 (worse): 20 KB

Target frame size: 50 KB

Transport → Codec: “I picked option 1. Base the next frame on its exiting state.”

Picked: 45 KB

Target frame size: 50 KB
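The exchange above can be replayed as one loop. This is a sketch under assumptions: `fake_encode` is a hypothetical stand-in for a functional codec, the state is just a counter of frames sent, and the smaller option for the third frame (50 KB) is an assumed value.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# (state, image, quality) -> (exiting_state, frame_size_kb)
Encoder = Callable[[int, str, str], Tuple[int, float]]


def run_loop(encode: Encoder, images: Sequence[str],
             targets_kb: Sequence[float]) -> Tuple[int, List[Optional[float]]]:
    """One control loop: encode two versions, let the transport's target decide."""
    state = 0
    sent: List[Optional[float]] = []
    for image, target in zip(images, targets_kb):
        # Both versions start from the same saved state.
        better = encode(state, image, "better")
        worse = encode(state, image, "worse")
        choice = next((opt for opt in (better, worse) if opt[1] <= target), None)
        if choice is None:
            sent.append(None)      # discard; the state is left untouched
            continue
        state, size = choice       # base the next frame on the chosen exiting state
        sent.append(size)
    return state, sent


# Hypothetical sizes (KB) per image: (better, worse).
SIZES = {"a": (50, 25), "b": (50, 25), "c": (70, 50), "d": (45, 20)}


def fake_encode(state: int, image: str, quality: str) -> Tuple[int, float]:
    better, worse = SIZES[image]
    return state + 1, better if quality == "better" else worse


final_state, sent = run_loop(fake_encode, "abcd", [30, 55, 5, 50])
```

Here `sent` records the size chosen for each frame, with `None` marking the discarded frame, and `final_state` counts only the frames that were actually sent.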

There’s no notion of frame rate or bit rate in the system. Frames are sent when the network can accommodate them.

Evaluation of Salsify

Network Variations

[Video comparison: WebRTC (Google Chrome 65.0 dev) vs. Salsify]

Network Outages

[Video comparison: WebRTC (Google Chrome 65.0 dev) vs. Salsify]

Measurement System

[Diagram labels: Sender, Receiver, Network Simulator, emulated network, Video System, AV.io, barcoded video, video in/out (HDMI), HDMI to USB camera, receiver HDMI output.]

Sent Image Timestamp: T+0.000s

Received Image Timestamp: T+0.765s

Quality: 9.76 dB SSIM

Evaluation results: Verizon LTE Trace

[Figure: Video Quality (SSIM dB) vs. Video Delay (95th percentile ms) on the Verizon LTE trace; higher quality and lower delay are better. Systems plotted: Salsify, Salsify (conventional codec), Status Quo (conventional transport and codec), Skype, WebRTC, WebRTC (VP9-SVC), FaceTime, Hangouts.]

Evaluation results: AT&T LTE Trace

[Figure: Video Quality (SSIM dB) vs. Video Delay (95th percentile ms) on the AT&T LTE trace. Systems plotted: Salsify, WebRTC (VP9-SVC), FaceTime, WebRTC, Hangouts, Skype.]

Evaluation results: T-Mobile UMTS Trace

[Figure: Video Quality (SSIM dB) vs. Video Delay (95th percentile ms) on the T-Mobile UMTS trace. Systems plotted: Salsify, WebRTC (VP9-SVC), Skype, WebRTC, FaceTime, Hangouts.]

Evaluation results: No variations

[Figure: Video Quality (SSIM dB) vs. Video Delay (95th percentile ms) with no network variations. Systems plotted: WebRTC (VP9-SVC), WebRTC, Salsify, FaceTime, Hangouts, Skype.]
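The two axes in the evaluation plots can be computed as follows. This is a sketch under assumptions: it uses the common convention SSIM (dB) = −10·log10(1 − SSIM) and a nearest-rank percentile, which may differ from the exact procedure behind the plotted results. The sample delays are invented, except for the 765 ms value taken from the timestamps on the measurement slide.

```python
import math
from typing import Sequence


def ssim_to_db(ssim: float) -> float:
    """Convert an SSIM score in [0, 1) to decibels (assumed convention)."""
    return -10.0 * math.log10(1.0 - ssim)


def percentile_95(delays_ms: Sequence[float]) -> float:
    """95th percentile by the nearest-rank method; needs a non-empty input."""
    ordered = sorted(delays_ms)
    rank = (95 * len(ordered) + 99) // 100   # integer ceiling of 0.95 * n
    return ordered[rank - 1]


# Per-frame delay is the received timestamp minus the sent timestamp.
sent_s, received_s = 0.000, 0.765
delay_ms = (received_s - sent_s) * 1000

# Hypothetical per-frame delays (ms) for one run.
delays = [120, 95, 110, delay_ms, 105]
p95 = percentile_95(delays)
quality_db = ssim_to_db(0.9)
```

A tail percentile matters here because a handful of very late frames is what a viewer perceives as a stall, even when the median delay looks fine.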

Individual components of Salsify are not exactly new…

• The transport protocol is a dumbed-down version of packet pair and Sprout.

• The video format, VP8, is 13 years old.

• The functional codec was introduced in [Fouladi et al., NSDI ’17].

• Its compression efficiency and speed are far lower than those of commercial codecs.

It’s the architecture that’s new!

• The functional abstraction separates the control from the algorithm.

• The system can now jointly control the codec and the transport in one control loop…

• …and respond faster to network variability.

In this context, improvements to video codecs may have reached the point of diminishing returns, but changes to the architecture of video systems can still yield significant benefits.

Takeaways

• Salsify is a new architecture for real-time Internet video.

• Salsify tightly integrates a video-aware transport protocol with a functional video codec, allowing it to respond quickly to changing network conditions.

• Salsify achieves 4.6× lower p95 delay and 2.1 dB higher SSIM (visual quality) on average when compared with FaceTime, Hangouts, Skype, and WebRTC.

• The code is open-source, and the paper and raw data are open-access: https://snr.stanford.edu/salsify

Thank you: NSF, DARPA, Google, Dropbox, VMware, Huawei, Facebook, Stanford Platform Lab, and James.