Real-Time Bandwidth Prediction and Rate Adaptation for Video Calls Over Cellular Networks

Real-time Bandwidth Prediction and Rate Adaptation for Video Calls over Cellular Networks Eymen Kurdoglu Yong Liu Yao Wang [email protected] [email protected] [email protected] Department of Electrical and Computer Engineering NYU Tandon School of Engineering, Brooklyn, NY Yongfang Shi ChenChen Gu Jing Lyu [email protected] [email protected] [email protected] Tencent Co. Ltd. Shenzhen, China, 518057 ABSTRACT 1. INTRODUCTION We study interactive video calls between two users, where Advances in networking and video encoding technologies at least one of the users is connected over a cellular net- in the last decade have made the real-time video delivery ap- work. It is known that cellular links present highly-varying plications, including video calls and conferencing [17][9][3], network bandwidth and packet delays. If the sending rate an integral part of our lives. Despite their popularity in of the video call exceeds the available bandwidth, the video wired and Wi-Fi networks, real-time video applications have frames may be excessively delayed, destroying the interac- not found much use over cellular networks. The fundamental tivity of the video call. In this paper, we present Rebera, challenge of delivering real-time video over cellular networks a cross-layer design of proactive congestion control, video is to simultaneously achieve high-rate and low-delay video encoding and rate adaptation, to maximize the video trans- transmission on highly volatile cellular links with rapidly- mission rate while keeping the one-way frame delays suffi- changing available bandwidth (ABW), packet delay and loss. ciently low. Rebera actively measures the available band- On a cellular link, increasing the video sending rate beyond width in real-time by employing the video frames as packet what is made available by the PHY and MAC layers leads to trains. Using an online linear adaptive filter, Rebera makes self-congestion, and intolerable packet delays, hence frame a history-based prediction of the future capacity, and deter- delays. Excessively delayed frames will have to be treated mines a bit budget for the video rate adaptation. Rebera as lost. On the other hand, a conservative sending rate uses the hierarchical-P video encoding structure to provide clearly leads to the under-utilization of the cellular channel error resilience and to ease rate adaptation, while maintain- and consequently a lower quality than what is possible. ing low encoding complexity and delay. Furthermore, Re- The tight design space calls for a joint application and bera decides in real time whether to send or discard an en- transport cross-layer design, involving real-time video encoded frame, according to the budget, thereby preventing coding, bitrate control, sending rate adjustment, and error self-congestion and minimizing the packet delays. Our ex- control. Ideally, the transmitted video rate should closely periments with real cellular link traces demonstrate Rebera track the ABW on the cellular links. However, traditional can, on average, deliver higher bandwidth utilization and reactive congestion control algorithms [30, 2], which adjust shorter packet delays than Apple’s FaceTime. the sending rate through packet loss and/or packet delay in- formation, are too slow to adapt to the changes in the ABW, leading to either bandwidth under-utilization or significant CCS Concepts packet delays [27]. It is more preferable to design proactive •Networks → Cross-layer protocols; Network experi- congestion control algorithms that calculate the sending rate mentation; Network measurement; Mobile networks; based on cellular link ABW forecasts. Meanwhile, for video adaptation, video encoder can adjust various video encod- Keywords ing parameters, so that the resulting video bitrate matches the target sending rate determined by the congestion control real-time; cross-layer; forecasting; hierarchical-p algorithm. However, accurate rate control is very challenging for low-delay encoding, and significant rate mismatch is Permission to make digital or hard copies of all or part of this work for personal or often still present with the state-of-the-art video encoders. classroom use is granted without fee provided that copies are not made or distributed In addition, what makes the problem even more challenging for profit or commercial advantage and that copies bear this notice and the full cita- tion on the first page. Copyrights for components of this work owned by others than is that the lost and late packets can render not only their ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or re- corresponding frames, but also other frames non-decodable publish, to post on servers or to redistribute to lists, requires prior specific permission at the receiver. The encoder and the transport layer should and/or a fee. Request permissions from [email protected]. be designed to be error resilient so that lost and late packets MMSys’16, May 10-13, 2016, Klagenfurt, Austria have minimal impact on the decoded video. c 2016 ACM. ISBN 978-1-4503-4297-1/16/05. $15.00 In this study, we propose a new real-time video delivery DOI: http://dx.doi.org/10.1145/2910017.2910608 system, Rebera, designed for cellular networks, where we aim been an active area of research for decades. Window-based to maximize the sending rate of the video source and error re- congestion control algorithms are the dominant form of con- silience, while keeping the one-way frame delays sufficiently gestion control on the Internet, which reactively adjust the small. Our system consists of a proactive congestion con- window size, and hence the sending rate according to a troller, a temporal layered encoder, and a dynamic frame congestion signal. TCP variants such as Tahoe and New selection module. Our proactive congestion controller uses Reno [14] use packet losses, while Vegas [5], FAST [26] and the video frames themselves to actively measure the current Compound [21] react to packet round trip times. However, ABW in real-time, and then employs the well-known linear the additive-increase multiplicative-decrease (AIMD) prob- adaptive filtering methods [11] to predict the future capacity, ing method used in TCP, along with the retransmission of based on the past and present capacity measurements. For every single lost packet in these protocols renders them less error resilience, we resort to layered encoding, which enables desirable for interactive video calls. TFRC [8] and TCP unequal error protection (UEP). However spatial and quality Cubic [10] control the sending rate with smaller variations, layering incurs significant encoding complexity and coding however, their delay performances deteriorate in the face efficiency loss, making them unattractive for practical de- of fast ABW variations. As a result, rate-based congestion ployment. Thus we consider only temporal layering, which control protocols are dominantly used in commercial video provides a certain level of error resilience even without using call applications, such as Microsoft Skype, Google Hangouts explicit UEP. To minimize the delays for real-time delivery, and Apple FaceTime. Nonetheless, these protocols also be- we use hierarchical-P (hierP) coding structure for temporal have reactively when adjusting the sending rate, and there- layering. To address the rate control inaccuracy of the en- fore suffer from the same self-congestion problem over highly coder, we propose a dynamic frame selection algorithm for volatile links. Authors of [27] proposed a proactive conges- hierarchical-P, where the goal is to select in real-time which tion control scheme for realtime video delivery in cellular encoded frames to send, subject to a budget determined networks. They model cellular links as single-server queues by the predicted capacity value. Our frame selection algo- emptied out by a doubly-stochastic service process. For the rithm takes into account quality implications of the layers, ABW estimation, we, unlike [27], assume no particular time- decoding dependencies between the frames, and the smooth- evolution model for the link capacity. Furthermore, [27] fo- ness of frame inter-arrivals to maximize the delivered video cused only on congestion control without considering video quality under the said budget. We have implemented the adaptation. complete system, called “Rebera” for real-time bandwidth Adapting the video rate in real-time according to the send- estimation and rate adaptation, to evaluate its performance ing rate determined is crucial for a low-delay video applica- and compare it with Apple’s FaceTime video call application. This task is usually handled at the video encoder only. tion. Our implementation relies on an open source real-time However, if the rate control is not accurate and the encoded H.264 video encoder [24], which we have modified to produce video rate exceeds the rate constraint, sending every encoded a hierarchical-P stream. As a result, we can directly control frame will cause self-congestion. This problem can be allevi- the encoded video rate according to the measured capac- ated if the video stream has temporal layering, by allowing ity. Thanks to the combination of all these components we the sender to prioritize from lower to higher layers, until the have mentioned, Rebera is able to achieve higher bandwidth sending rate stays just below the rate constraint. However, utilization and lower frame delays than FaceTime. In this on-the-fly decision of discarding a higher-layer frame that study, we do not consider UEP among

Load more