Performance Analysis of Receive-Side Real-Time Congestion Control for WebRTC

Varun Singh, Aalto University, Finland, varun.singh@aalto.fi
Albert Abello Lozano, Aalto University, Finland, albert.abello.lozano@aalto.fi
Jörg Ott, Aalto University, Finland, jorg.ott@aalto.fi

Abstract—In the forthcoming deployments of WebRTC systems, we speculate that high-quality video conferencing will see wide adoption. It is currently being deployed on the Google Chrome and Firefox web browsers, while desktop and mobile clients are under development. Without a standardized signaling mechanism, service providers can enable various types of topologies, ranging from full-mesh to centralized video conferencing and everything in between. In this paper, we evaluate the performance of various topologies using endpoints implementing WebRTC. We specifically evaluate the performance of the congestion control currently implemented and deployed in these web browsers, Receive-side Real-Time Congestion Control (RRTCC). We use transport impairments like varying throughput, loss and delay, and varying amounts of cross-traffic to measure the performance. Our results show that RRTCC performs well by itself, but starves when competing with TCP. When competing with self-similar media streams, the late-arriving flow temporarily starves the existing media flows.

I. INTRODUCTION

The development of WebRTC systems is going to encourage wide adoption of video conferencing on the Internet. The main reason is the shift from a desktop or stand-alone real-time communication (RTC) application to an RTC-enabled web browser. The standardization activity for WebRTC is split between the W3C¹ and the IETF². The W3C is defining the JavaScript APIs and the IETF is profiling the existing media components (SRTP, SDP, ICE, etc.) for use with WebRTC. Until now, enabling voice and video calls from within the browser required each service to implement its real-time communication stack as a plugin, which the user downloads. Consequently, each voice and video service needed to build a plugin containing a complete multimedia stack. With WebRTC the multimedia stack is built into the web-browser internals and developers need only use the appropriate HTML5 API. However, WebRTC does not mandate a specific signaling protocol, and implementers are free to choose from pre-existing protocols (SIP, Jingle, etc.) or write their own. In so doing, WebRTC provides the flexibility to deploy video calls in any application-specific topology (point-to-point, mesh, centralized mixer, overlays, hub-and-spoke).

While multimedia communication has existed for over a decade (via Skype, etc.), there is no standardized congestion control, although many have been proposed in the past. To tackle congestion control, the IETF has chartered a new working group, RMCAT³, to standardize congestion control for real-time communication, which is expected to be a multi-year process [1]; but early implementations are already available.

In this paper, we evaluate the performance of WebRTC video calls over different topologies and with varying amounts of cross-traffic. All experiments are conducted using the Chrome browser and our testbed. With the testbed we are able to control the bottleneck link capacity, the end-to-end latency, the link loss rate and the queue size of intermediate routers. Consequently, in this testbed we can investigate the following: 1) utilization of the bottleneck link capacity, 2) (un)friendliness to other media flows or TCP cross-traffic, and 3) performance in multiparty calls. The results in the paper complement the results in [2].

We structure the remainder of the paper as follows. Section II discusses the congestion control currently implemented by the browsers and Section III describes the architecture for monitoring the video calls. Section IV describes the testbed and the performance of the congestion control in various scenarios. Section V puts the results in perspective and compares them to existing work. Section VI concludes the paper.

¹ http://www.w3.org/2011/04/webrtc/
² http://tools.ietf.org/wg/rtcweb/
³ http://tools.ietf.org/wg/rmcat/

II. RECEIVE-SIDE REAL-TIME CONGESTION CONTROL

The Real-time Transport Protocol (RTP) [3] carries media data over the network, typically using UDP. In WebRTC, the media packets (in RTP) and the feedback packets (in RTCP) are multiplexed on the same port [4] to reduce the number of NAT bindings and Interactive Connectivity Establishment (ICE) overhead. Furthermore, multiple media sources (e.g., video and audio) are also multiplexed together and sent/received on the same port [5]. In topologies with multiple participants [6], media streams from all participants are also multiplexed together and received on the same port [7].

Currently, WebRTC-capable browsers implement a congestion control algorithm proposed by Google called Receiver-side Real-time Congestion Control (RRTCC) [8]. In the following sub-sections, we briefly discuss the important features and assumptions of RRTCC, which are essential for the evaluation. RRTCC has two components, a receiver-side and a sender-side; for the algorithm to function properly, both need to be implemented.

A. Receiver Side Control

The receiver estimates the overuse or underuse of the bottleneck link based on the timestamps of incoming frames relative to their generation timestamps. At high bit rates, video frames exceed the MTU size and are fragmented over multiple RTP packets, in which case the receive timestamp of the last packet is used. When a video frame is fragmented, all of its RTP packets carry the same generation timestamp [3]. Formally, the jitter is calculated as J_i = (t_i − t_{i−1}) − (T_i − T_{i−1}), where t is the receive timestamp, T is the RTP timestamp, and i and i−1 are successive frames. Typically, if J_i is positive, it corresponds to congestion. Further, [8] models the inter-arrival jitter as a function of serialization delay, queuing delay and network jitter:

    J_i = (Size(i) − Size(i−1)) / Capacity + m(i) + v(i)

Here, m(i) and v(i) are samples of a stochastic process, which is modeled as a white Gaussian process. When the mean of the Gaussian process, m(i), is zero, there is no ongoing congestion. When the bottleneck is overused, m(i) is expected to increase, and when the congestion is abating it is expected to decrease. v(i), however, is expected to be zero for frames that are roughly the same size, but sometimes the encoder produces key-frames (also called I-frames), which are an order of magnitude larger and are expected to take more time to percolate through the network. Consequently, these packets are going to have an increasing inter-arrival jitter just due to their sheer size.

The receiver tracks J_i and the frame size, while the Capacity (C(i)) and m(i) are estimated. [8] uses a Kalman filter to compute [1/C(i), m(i)]^T and recursively updates the matrix. Overuse is triggered only when m(i) exceeds a threshold value for at least a certain duration and a set number of frames are received. Underuse is signaled when m(i) falls below a certain threshold value. When m(i) is zero, it is considered a stable situation and the old rate is kept. The current receiver estimate, A_r(i), is calculated as follows:

    A_r(i) = 0.85 × RR       if m(i) > 0 (over-use)
    A_r(i) = A_r(i−1)        if m(i) = 0 (stable)
    A_r(i) = 1.5 × RR        if m(i) < 0 (under-use)

The receiving endpoint sends an RTCP feedback message containing the Receiver Estimated Media Bitrate (REMB) [9], A_r, at 1-second intervals.
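For illustration only, the sketch below restates the receiver-side logic described above in simplified JavaScript. It is not Chrome's implementation: the Kalman filter is replaced by a plain exponential smoother, the threshold and smoothing factor are assumed values, the duration and frame-count conditions for triggering over-use are omitted, and all names (ReceiverEstimator, onFrame, receiveRateBps) are ours.

```javascript
// Minimal sketch, assuming both timestamps are already converted to milliseconds
// and the receive rate RR is measured elsewhere. Not the browser's code.
class ReceiverEstimator {
  constructor(initialBitrateBps) {
    this.m = 0;                   // crude stand-in for the Kalman-filtered m(i)
    this.threshold = 12.5;        // over/under-use threshold in ms (assumed value)
    this.ar = initialBitrateBps;  // current receiver estimate A_r
    this.prev = null;             // previous frame {recvTs, rtpTs}
  }

  // Called once per complete video frame (last RTP packet of the frame).
  onFrame(recvTs, rtpTs, receiveRateBps) {
    if (this.prev) {
      // J_i = (t_i - t_{i-1}) - (T_i - T_{i-1})
      const jitter = (recvTs - this.prev.recvTs) - (rtpTs - this.prev.rtpTs);
      // Exponential smoothing instead of the Kalman filter of [8].
      this.m = 0.9 * this.m + 0.1 * jitter;

      if (this.m > this.threshold) {
        this.ar = 0.85 * receiveRateBps;   // over-use: reduce the estimate
      } else if (this.m < -this.threshold) {
        this.ar = 1.5 * receiveRateBps;    // under-use: ramp the estimate up
      }                                    // stable: keep the previous A_r
    }
    this.prev = { recvTs, rtpTs };
    return this.ar;                        // value carried in REMB (~every 1 s)
  }
}

// Example: const est = new ReceiverEstimator(500000);
//          est.onFrame(recvTsMs, rtpTsMs, measuredReceiveRateBps);
```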
B. Sender Side Control

The sender side uses the TFRC equation [10] to estimate the sending rate based on the observed loss rate (p), the measured RTT and the bytes sent. If no feedback is received for two feedback intervals, the media rate is halved. When an RTCP receiver report is received, the sender estimate A_s(i) is calculated based on the number of observed losses (p):

    A_s(i) = A_s(i−1) × (1 − 0.5 p)       if p > 0.10
    A_s(i) = A_s(i−1)                     if 0.02 ≤ p ≤ 0.10
    A_s(i) = 1.05 × (A_s(i−1) + 1000)     if p < 0.02

Lastly, A_s(i) should lie between the rate estimated by the TFRC equation and the REMB value signaled by the receiver. An interesting aspect of the sender-side algorithm is that it does not react to losses under 10%, and for losses below 2% it even increases the rate by a bit over 5%. In these cases, in addition to the media stream, the endpoint generates FEC packets to protect the media stream; the generated FEC packets are sent on top of the media sending rate. This is noticeable in the Chrome browser when the rate-controller observes packet loss between 2% and 10%. In low-delay networks, we also observe retransmission requests sent by the receiver in the form of Negative Acknowledgements (NACKs) and Picture Loss Indications (PLIs). Full details can be found in the Internet-Draft [8].
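As a sketch of the loss-based sender behaviour above (again, not the browser's code), the following function applies the three loss regimes and keeps the result between the TFRC estimate and the signaled REMB. The inputs lossFraction, tfrcRateBps and rembBps are assumed to be computed elsewhere.

```javascript
// Minimal sketch of the sender-side update described above.
function updateSenderEstimate(prevAsBps, lossFraction, tfrcRateBps, rembBps) {
  let as;
  if (lossFraction > 0.10) {
    as = prevAsBps * (1 - 0.5 * lossFraction);   // heavy loss: multiplicative decrease
  } else if (lossFraction < 0.02) {
    as = 1.05 * (prevAsBps + 1000);              // negligible loss: probe upwards (~5%)
  } else {
    as = prevAsBps;                              // 2-10% loss: hold the current rate
  }
  // Keep A_s(i) between the TFRC equation rate and the signalled REMB value.
  const lo = Math.min(tfrcRateBps, rembBps);
  const hi = Math.max(tfrcRateBps, rembBps);
  return Math.min(Math.max(as, lo), hi);
}
```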
III. MONITORING ARCHITECTURE

We run a basic web application to establish WebRTC calls. It runs on a dedicated server and uses nodejs⁴. The signaling between the browsers and the server is carried over websockets⁵. Figure 1 gives an overview of this architecture. We run multiple TURN servers (e.g., restund⁶) on dedicated machines to traverse NATs and firewalls. The WebRTC-capable browsers (Chrome⁷ and Firefox⁸) establish a connection over HTTPS with our web server and download the WebRTC JavaScript (call.js), which configures the browser internals during call establishment. The media flows either directly between the two participants or via the TURN servers if the endpoints are behind NATs or firewalls.

[Figure 1. Description of simple testing environment topology for WebRTC.]

⁴ http://www.nodejs.org/, v0.10.6
⁵ http://socket.io/
⁶ http://www.creytiv.com/restund.html
⁷ https://www.google.com/intl/en/chrome/browser/, v27.0.1453.12
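To illustrate the signaling path described in Section III, the sketch below shows a minimal socket.io relay of the kind such a web application could use. It is not the actual server or call.js used in the testbed: the event names ("join", "signal"), the two-party room handling, and the use of the current socket.io API (rather than the 2013-era version noted in footnote 5) are our assumptions.

```javascript
// Minimal signaling relay sketch (assumed design, not the testbed's code).
const http = require('http');
const { Server } = require('socket.io');

const httpServer = http.createServer();
const io = new Server(httpServer);

io.on('connection', (socket) => {
  // Caller and callee join the same room, identified by a call id.
  socket.on('join', (room) => socket.join(room));

  // SDP offers/answers and ICE candidates produced by RTCPeerConnection
  // are forwarded verbatim to the other participant in the room.
  socket.on('signal', ({ room, payload }) => {
    socket.to(room).emit('signal', payload);
  });
});

httpServer.listen(8080);
```

The browsers would exchange the offer/answer and ICE candidates generated by RTCPeerConnection over these events, after which media flows directly between the participants or via the TURN servers.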