Salsify: Low-Latency Network Video Through Tighter Integration Between a Video Codec and a Transport Protocol
Sadjad Fouladi Stanford University
Joint work with: John Emmons, Emre Orbay, Catherine Wu, Riad S. Wahby, Keith Winstein
� snr.stanford.edu What is real-time video?
Sender network Receiver
2 Real-time video latency target: tens of milliseconds
the amount of time between when
something happens you see it
and
3 Low latency is required to maintain the interactivity of the application
Interaction
Sender network Receiver
4 Cloud Video Gaming
Low-Latency Video Stream
Remote Server Teleoperation of Robots Remote Surgery Video Conferencing WebRTC (Google Chrome) slow reaction to network variations ⇒ stalls and glitches Enter Salsify
• Salsify is a new architecture for real-time Internet video.
• Salsify tightly integrates a video-aware transport protocol, with a functional video codec, allowing it to respond quickly to changing network conditions.
10 Conventional design: two control loops at arm’s length
video transport codec protocol
11 The narrow interface between codec and transport
new target bit rate
video transport codec protocol
compressed frames
12 Decades of research and development on these components
MPEG-1 Sprout NADA VC-1 MPEG-2 video GCC transportBBR LEDBAT H.265 H.264codec H.263 CDGprotocolRemyCC
VP8 VP9 AV1 FBRA
13 “Let’s improve the transport”
1000 (kbps)
Skype t
100 throughpu
0 10 20 30 40 50 60
5000
1000 500 (ms) y
100 dela 50
0 10 20 30 40 50 60 time (s)
K. Winstein, A. Sivaraman, H. Balakrishnan, “Stochastic Forecasts Achieve High Throughput and Low Delay over Cellular Networks,” NSDI’13 14 “Let’s improve the transport”
1000 Sprout (kbps) t
100 throughpu improving the0 network10 20 component30 40 didn’t50 save60 the day… 5000
1000 500 (ms) y
100 dela 50
0 10 20 30 40 50 60 time (s)
K. Winstein, A. Sivaraman, H. Balakrishnan, “Stochastic Forecasts Achieve High Throughput and Low Delay over Cellular Networks,” NSDI’13 15 The video codec can only achieve the bit rate on average
24
18
12 target bit rate (2 Mbps) frame size (KB) 6
0 0 10 20 30 frame number
16 The problem: codec and transport are too decoupled
• The codec can only respond to changes in target bit rate over coarse time intervals.
• Individual frames may cause packet loss/queueing.
• The transport has little control over what codec produces.
17 Salsify explores a more tightly-integrated design
transport protocol & video codec Salsify’s architecture: Video-aware transport protocol
transport protocol & video codec
19 Video-aware transport protocol
“What should be the size of the next frame?”
• There’s no notion of bit rate, only the next frame size!
• Inspired by packet pair and Sprout-EWMA, transport uses packet inter- arrival time, reported by the receiver.
20 Salsify’s architecture: Functional video codec
transport protocol & video codec undershoot the target. target size quality settings upfront to meet a the appropriatchoose It’s challenging for The encoder onlycan know the output size —they tend to over-/ any codec e to
frame size (KB) 12 18 24 0 6 0 after the fact. frame number 10 20 30 22 The challenge: Getting an accurate frame out of an inaccurate codec
• Trial and error Encode with differentSOUNDS quality settings, GOOD, pick the one that fits. DOESN’T WORK!
23 Video encoder turns frames into a compressed bitstream
frame 0 frame 1 frame 2 frame 3
Video Encoder
00011000010 01110001000 11011010011 00000001001 01100001011 00110000111 01000110110 00101100000
24 Encoder is stateful
frame 0 frame 1 frame 2 frame 3
Video Encoder
00011000010 01110001000 11011010011 00000001001 01100001011 00110000111 01000110110 00101100000
25 There’s no way to undo an encoded frame in current codecs
codec.encode([�, �, ...]) → bytestream…
The state is internal to the encoder—no way to save/restore the state.
26 Functional video codec to the rescue
encode(state, �) → state′, frame
Salsify’s functional video codec exposes the state that can be saved/restored.
27 Order two, pick the one that fits!
• Salsify’s functional video codec can explore different execution paths without committing to them.
• For each frame, codec presents the transport with three options: A slightly-higher-quality version,
50 KB
A slightly-lower-quality version, better
❌ Discarding the frame. worse
10 KB
28 Salsify’s architecture: Unified control loop
transport protocol & video codec
29 Codec → Transport “Here’s two versions of the current frame.”
50 KB
better
worse
25 KB
target frame size 30 KB
30 Transport → Codec “I picked option 2. Base the next frame on its exiting state.”
25 KB
target frame size 30 KB
31 Codec → Transport “Here’s two versions of the latest frame.”
50 KB
better
worse
25 KB
target frame size 55 KB
32 Transport → Codec “I picked option 1. Base the next frame on its exiting state.”
50 KB
target frame size 55 KB
33 Codec → Transport “Here’s two versions of the latest frame.”
70 KB
better
worse
5025 KB
target frame size 5 KB
34 Transport → Codec “I cannot send any frames right now. Sorry, but discard them.”
target frame size 5 KB
35 Codec → Transport “Fine. Here’s two versions of the latest frame.”
45 KB
better
worse
20 KB
target frame size 50 KB
36 Transport → Codec “I picked option 1. Base the next frame on its exiting state.”
45 KB
target frame size 50 KB
37 There’s no notion of frame rate or bit rate in the system. Frames are sent when the network can accommodate them. Evaluation of Salsify Network Variations
WebRTC (Google Chrome 65.0 dev) Salsify Network Outages
WebRTC (Google Chrome 65.0 dev) Salsify Sender Receiver
Network Simulator
Video AV.io System
Measurement System emulated network
receiver HDMI output barcoded video
video in/out (HDMI) HDMI to USB camera Sent Image Timestamp: T+0.000s
Received Image Timestamp: T+0.765s Quality: 9.76 dB SSIM Evaluation results: Verizon LTE Trace
18
16 Skype WebRTC (VP9-SVC) IM dB) S 14 WebRTC FaceTime
12
o Quality (S 10 er e t Hangouts id
V Bet 8
7000 5000 2000 1000 700 500 Video Delay (95th percentile ms) 45 Evaluation results: Verizon LTE Trace
18
16 Skype WebRTC (VP9-SVC) IM dB) S 14 WebRTC FaceTime
12 Status Quo (conventional transport
o Quality (S 10 e and codec) Hangouts id V 8
7000 5000 2000 1000 700 500 Video Delay (95th percentile ms) 46 Evaluation results: Verizon LTE Trace
18
16 Skype WebRTC (VP9-SVC) IM dB) S 14 WebRTC FaceTime
12 Salsify (conventional codec) Status Quo (conventional transport
o Quality (S 10 e and codec) Hangouts id V 8
7000 5000 2000 1000 700 500 Video Delay (95th percentile ms) 47 Evaluation results: Verizon LTE Trace
18
Salsify 16 Skype WebRTC (VP9-SVC) IM dB) S 14 WebRTC FaceTime
12 Salsify (conventional codec) Status Quo (conventional transport
o Quality (S 10 e and codec) Hangouts id V 8
7000 5000 2000 1000 700 500 Video Delay (95th percentile ms) 48 Evaluation results: AT&T LTE Trace
16 Salsify 15 WebRTC (VP9-SVC) 14 FaceTime IM dB)
S 13 WebRTC 12 Hangouts 11
o Quality (S er e 10 t id
V 9 Bet Skype
8 5000 2000 1000 700 500 300 200 Video Delay (95th percentile ms) 49 Evaluation results: T-Mobile UMTS Trace
14 Salsify
13 WebRTC (VP9-SVC) IM dB)
S Skype 12 WebRTC
11 FaceTime
o Quality (S er e t
id 10 Hangouts V Bet
9 18000 14000 10000 7000 5000 3500 Video Delay (95th percentile ms) 50 Evaluation results: No variations
12 WebRTC (VP9-SVC)
11 IM dB)
S WebRTC 10 Salsify 9 FaceTime
o Quality (S er e t
id 8 Hangouts
V Bet Skype
7 15000 5000 2000 1000 700 500 300 Video Delay (95th percentile ms)
51 Individual component of Salsify are not exactly new…
• The transport protocol is a dumbed-down version of packet pair and Sprout.
• The video format, VP8, is 13 years old.
• The functional codec was introduced in [Fouladi et al., NSDI ’17].
• Its compression efficiency & speed is way lower than commercial codecs.
52 It’s the architecture that’s new!
• The functional abstraction separates the control from the algorithm.
• The system can now jointly control the codec and the transport in one control loop…
• …and respond faster to network variability.
53 In this context, improvements to video codecs may have reached the point of diminishing returns, but changes to the architecture of video systems can still yield signifcant benefts. Takeaways
• Salsify is a new architecture for real-time Internet video.
• Salsify tightly integrates a video-aware transport protocol, with a functional video codec, allowing it to respond quickly to changing network conditions.
• Salsify achieves 4.6⨉ lower p95-delay and 2.1 dB SSIM higher visual quality on average when compared with FaceTime, Hangouts, Skype, and WebRTC.
• The code is open-source, and the paper and raw data are open-access: https://snr.stanford.edu/salsify
Thank you: NSF, DARPA, Google, Dropbox, VMware, Huawei, Facebook, Stanford Platform Lab, and James.