Salsify: Low-Latency Network Video Through Tighter Integration Between a Video Codec and a Transport Protocol

Salsify: Low-Latency Network Video Through Tighter Integration Between a Video Codec and a Transport Protocol Sadjad Fouladi Stanford University Joint work with: John Emmons, Emre Orbay, Catherine Wu, Riad S. Wahby, Keith Winstein ! snr.stanford.edu What is real-time video? Sender network Receiver 2 Real-time video latency target: tens of milliseconds the amount of time between when something happens you see it and 3 Low latency is required to maintain the interactivity of the application Interaction Sender network Receiver 4 Cloud Video Gaming Low-Latency Video Stream Remote Server Teleoperation of Robots Remote Surgery Video Conferencing WebRTC (Google Chrome) slow reaction to network variations ⇒ stalls and glitches Enter Salsify • Salsify is a new architecture for real-time Internet video. • Salsify tightly integrates a video-aware transport protocol, with a functional video codec, allowing it to respond quickly to changing network conditions. 10 Conventional design: two control loops at arm’s length video transport codec protocol 11 The narrow interface between codec and transport new target bit rate video transport codec protocol compressed frames 12 Decades of research and development on these components MPEG-1 Sprout NADA VC-1 MPEG-2 video GCC transportBBR LEDBAT H.265 H.264codec H.263 CDGprotocolRemyCC VP8 VP9 AV1 FBRA 13 “Let’s improve the transport” 1000 (kbps) Skype t 100 throughpu 0 10 20 30 40 50 60 5000 1000 500 (ms) y 100 dela 50 0 10 20 30 40 50 60 time (s) K. Winstein, A. Sivaraman, H. Balakrishnan, “Stochastic Forecasts Achieve High Throughput and Low Delay over Cellular Networks,” NSDI’13 14 “Let’s improve the transport” 1000 Sprout (kbps) t 100 throughpu improving the0 network10 20 component30 40 didn’t50 save60 the day… 5000 1000 500 (ms) y 100 dela 50 0 10 20 30 40 50 60 time (s) K. Winstein, A. Sivaraman, H. Balakrishnan, “Stochastic Forecasts Achieve High Throughput and Low Delay over Cellular Networks,” NSDI’13 15 The video codec can only achieve the bit rate on average 24 18 12 target bit rate (2 Mbps) frame size (KB) 6 0 0 10 20 30 frame number 16 The problem: codec and transport are too decoupled • The codec can only respond to changes in target bit rate over coarse time intervals. • Individual frames may cause packet loss/queueing. • The transport has little control over what codec produces. 17 Salsify explores a more tightly-integrated design transport protocol & video codec Salsify’s architecture: Video-aware transport protocol transport protocol & video codec 19 Video-aware transport protocol “What should be the size of the next frame?” • There’s no notion of bit rate, only the next frame size! • Inspired by packet pair and Sprout-EWMA, transport uses packet inter- arrival time, reported by the receiver. 20 Salsify’s architecture: Functional video codec transport protocol & video codec The encoder can only know the output size after the fact. 24 It’s challenging for any codec to 18 choose the appropriate 12 quality settings upfront to meet a frame size (KB) target size—they tend to over-/ 6 undershoot the target. 0 0 10 20 30 frame number 22 The challenge: Getting an accurate frame out of an inaccurate codec • Trial and error Encode with differentSOUNDS quality settings, GOOD, pick the one that fits. DOESN’T WORK! 23 Video encoder turns frames into a compressed bitstream frame 0 frame 1 frame 2 frame 3 Video Encoder 00011000010 01110001000 11011010011 00000001001 01100001011 00110000111 01000110110 00101100000 24 Encoder is stateful frame 0 frame 1 frame 2 frame 3 Video Encoder 00011000010 01110001000 11011010011 00000001001 01100001011 00110000111 01000110110 00101100000 25 There’s no way to undo an encoded frame in current codecs codec.encode([�, �, ...]) → bytestream… The state is internal to the encoder—no way to save/restore the state. 26 Functional video codec to the rescue encode(state, �) → state′, frame Salsify’s functional video codec exposes the state that can be saved/restored. 27 Order two, pick the one that fits! • Salsify’s functional video codec can explore different execution paths without committing to them. • For each frame, codec presents the transport with three options: A slightly-higher-quality version, 50 KB A slightly-lower-quality version, better ❌ Discarding the frame. worse 10 KB 28 Salsify’s architecture: Unified control loop transport protocol & video codec 29 Codec → Transport “Here’s two versions of the current frame.” 50 KB better worse 25 KB target frame size 30 KB 30 Transport → Codec “I picked option 2. Base the next frame on its exiting state.” 25 KB target frame size 30 KB 31 Codec → Transport “Here’s two versions of the latest frame.” 50 KB better worse 25 KB target frame size 55 KB 32 Transport → Codec “I picked option 1. Base the next frame on its exiting state.” 50 KB target frame size 55 KB 33 Codec → Transport “Here’s two versions of the latest frame.” 70 KB better worse 5025 KB target frame size 5 KB 34 Transport → Codec “I cannot send any frames right now. Sorry, but discard them.” target frame size 5 KB 35 Codec → Transport “Fine. Here’s two versions of the latest frame.” 45 KB better worse 20 KB target frame size 50 KB 36 Transport → Codec “I picked option 1. Base the next frame on its exiting state.” 45 KB target frame size 50 KB 37 There’s no notion of frame rate or bit rate in the system. Frames are sent when the network can accommodate them. Evaluation of Salsify Network Variations WebRTC (Google Chrome 65.0 dev) Salsify Network Outages WebRTC (Google Chrome 65.0 dev) Salsify Sender Receiver Network Simulator Video AV.io System Measurement System emulated network receiver HDMI output barcoded video video in/out (HDMI) HDMI to USB camera Sent Image Timestamp: T+0.000s Received Image Timestamp: T+0.765s Quality: 9.76 dB SSIM Evaluation results: Verizon LTE Trace 18 16 Skype WebRTC (VP9-SVC) IM dB) S 14 WebRTC FaceTime 12 o Quality (S 10 er e t Hangouts id V Bet 8 7000 5000 2000 1000 700 500 Video Delay (95th percentile ms) 45 Evaluation results: Verizon LTE Trace 18 16 Skype WebRTC (VP9-SVC) IM dB) S 14 WebRTC FaceTime 12 Status Quo (conventional transport o Quality (S 10 e and codec) Hangouts id V 8 7000 5000 2000 1000 700 500 Video Delay (95th percentile ms) 46 Evaluation results: Verizon LTE Trace 18 16 Skype WebRTC (VP9-SVC) IM dB) S 14 WebRTC FaceTime 12 Salsify (conventional codec) Status Quo (conventional transport o Quality (S 10 e and codec) Hangouts id V 8 7000 5000 2000 1000 700 500 Video Delay (95th percentile ms) 47 Evaluation results: Verizon LTE Trace 18 Salsify 16 Skype WebRTC (VP9-SVC) IM dB) S 14 WebRTC FaceTime 12 Salsify (conventional codec) Status Quo (conventional transport o Quality (S 10 e and codec) Hangouts id V 8 7000 5000 2000 1000 700 500 Video Delay (95th percentile ms) 48 Evaluation results: AT&T LTE Trace 16 Salsify 15 WebRTC (VP9-SVC) 14 FaceTime IM dB) S 13 WebRTC 12 Hangouts 11 o Quality (S er e 10 t id V 9 Bet Skype 8 5000 2000 1000 700 500 300 200 Video Delay (95th percentile ms) 49 Evaluation results: T-Mobile UMTS Trace 14 Salsify 13 WebRTC (VP9-SVC) IM dB) S Skype 12 WebRTC 11 FaceTime o Quality (S er e t id 10 Hangouts V Bet 9 18000 14000 10000 7000 5000 3500 Video Delay (95th percentile ms) 50 Evaluation results: No variations 12 WebRTC (VP9-SVC) 11 IM dB) S WebRTC 10 Salsify 9 FaceTime o Quality (S er e t id 8 Hangouts V Bet Skype 7 15000 5000 2000 1000 700 500 300 Video Delay (95th percentile ms) 51 Individual component of Salsify are not exactly new… • The transport protocol is a dumbed-down version of packet pair and Sprout. • The video format, VP8, is 13 years old. • The functional codec was introduced in [Fouladi et al., NSDI ’17]. • Its compression efficiency & speed is way lower than commercial codecs. 52 It’s the architecture that’s new! • The functional abstraction separates the control from the algorithm. • The system can now jointly control the codec and the transport in one control loop… • …and respond faster to network variability. 53 In this context, improvements to video codecs may have reached the point of diminishing returns, but changes to the architecture of video systems can still yield signiﬁcant beneﬁts. Takeaways • Salsify is a new architecture for real-time Internet video. • Salsify tightly integrates a video-aware transport protocol, with a functional video codec, allowing it to respond quickly to changing network conditions. • Salsify achieves 4.6⨉ lower p95-delay and 2.1 dB SSIM higher visual quality on average when compared with FaceTime, Hangouts, Skype, and WebRTC. • The code is open-source, and the paper and raw data are open-access: https://snr.stanford.edu/salsify Thank you: NSF, DARPA, Google, Dropbox, VMware, Huawei, Facebook, Stanford Platform Lab, and James..

Salsify: Low-Latency Network Video Through Tighter Integration Between a Video Codec and a Transport Protocol

Improving Mobile IP Handover Latency on End-To-End TCP in UMTS/WCDMA Networks

OEM Cobranet Module Design Guide

Public Address System Network Design Considerations

Aperformance Analysis for Umts Packet Switched

Action-Sound Latency: Are Our Tools Fast Enough?

5G Ultra-Reliable Low-Latency Communication Implementation Challenges and Operational Issues with Iot Devices

Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads Sadjad Fouladi, Riad S

Audio Networking Special 2017

15. Voice Over IP: Speech Transmission Over Packet Networks Voicej

Low Latency Audio Processing

Low Delay MPEG2 Video Encoding N\ P

Trends in Mobile Communications Outline of Presentation