Real-time Bandwidth Prediction and Rate Adaptation for Video Calls over Cellular Networks

Eymen Kurdoglu, Yong Liu, Yao Wang
[email protected], [email protected], [email protected]
Department of Electrical and Computer Engineering, NYU Tandon School of Engineering, Brooklyn, NY

Yongfang Shi, ChenChen Gu, Jing Lyu
[email protected], [email protected], [email protected]
Tencent Co. Ltd., Shenzhen, China, 518057

ABSTRACT

We study interactive video calls between two users, where at least one of the users is connected over a cellular network. It is known that cellular links present highly-varying network bandwidth and packet delays. If the sending rate of the video call exceeds the available bandwidth, the video frames may be excessively delayed, destroying the interactivity of the video call. In this paper, we present Rebera, a cross-layer design of proactive congestion control, video encoding and rate adaptation, to maximize the video transmission rate while keeping the one-way frame delays sufficiently low. Rebera actively measures the available bandwidth in real-time by employing the video frames as packet trains. Using an online linear adaptive filter, Rebera makes a history-based prediction of the future capacity, and determines a bit budget for the video rate adaptation. Rebera uses the hierarchical-P video encoding structure to provide error resilience and to ease rate adaptation, while maintaining low encoding complexity and delay. Furthermore, Rebera decides in real time whether to send or discard an encoded frame, according to the budget, thereby preventing self-congestion and minimizing the packet delays. Our experiments with real cellular link traces demonstrate that Rebera can, on average, deliver higher bandwidth utilization and shorter packet delays than Apple's FaceTime.

CCS Concepts
•Networks → Cross-layer protocols; Network experimentation; Network measurement; Mobile networks;

Keywords
real-time; cross-layer; forecasting; hierarchical-p

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
MMSys'16, May 10-13, 2016, Klagenfurt, Austria
© 2016 ACM. ISBN 978-1-4503-4297-1/16/05. . . $15.00
DOI: http://dx.doi.org/10.1145/2910017.2910608

1. INTRODUCTION

Advances in networking and video encoding technologies in the last decade have made real-time video delivery applications, including video calls and conferencing [17][9][3], an integral part of our lives. Despite their popularity in wired and Wi-Fi networks, real-time video applications have not found much use over cellular networks. The fundamental challenge of delivering real-time video over cellular networks is to simultaneously achieve high-rate and low-delay video transmission on highly volatile cellular links with rapidly-changing available bandwidth (ABW), packet delay and loss. On a cellular link, increasing the video sending rate beyond what is made available by the PHY and MAC layers leads to self-congestion and intolerable packet delays, hence frame delays. Excessively delayed frames will have to be treated as lost. On the other hand, a conservative sending rate clearly leads to the under-utilization of the cellular channel and consequently a lower quality than what is possible.

The tight design space calls for a joint application and transport cross-layer design, involving real-time video encoding, bitrate control, sending rate adjustment, and error control. Ideally, the transmitted video rate should closely track the ABW on the cellular links. However, traditional reactive congestion control algorithms [30, 2], which adjust the sending rate through packet loss and/or packet delay information, are too slow to adapt to the changes in the ABW, leading to either bandwidth under-utilization or significant packet delays [27]. It is preferable to design proactive congestion control algorithms that calculate the sending rate based on cellular link ABW forecasts. Meanwhile, for video adaptation, the video encoder can adjust various video encoding parameters, so that the resulting video bitrate matches the target sending rate determined by the congestion control algorithm. However, accurate rate control is very challenging for low-delay encoding, and significant rate mismatch is often still present with the state-of-the-art video encoders. In addition, what makes the problem even more challenging is that the lost and late packets can render not only their corresponding frames, but also other frames non-decodable at the receiver. The encoder and the transport layer should be designed to be error resilient so that lost and late packets have minimal impact on the decoded video.

In this study, we propose a new real-time video delivery system, Rebera, designed for cellular networks, where we aim to maximize the sending rate of the video source and error resilience, while keeping the one-way frame delays sufficiently small. Our system consists of a proactive congestion controller, a temporal layered encoder, and a dynamic frame selection module. Our proactive congestion controller uses the video frames themselves to actively measure the current ABW in real-time, and then employs the well-known linear adaptive filtering methods [11] to predict the future capacity, based on the past and present capacity measurements. For error resilience, we resort to layered encoding, which enables unequal error protection (UEP). However, spatial and quality layering incur significant encoding complexity and coding efficiency loss, making them unattractive for practical deployment. Thus we consider only temporal layering, which provides a certain level of error resilience even without using explicit UEP. To minimize the delays for real-time delivery, we use the hierarchical-P (hierP) coding structure for temporal layering. To address the rate control inaccuracy of the encoder, we propose a dynamic frame selection algorithm for hierarchical-P, where the goal is to select in real-time which encoded frames to send, subject to a budget determined by the predicted capacity value. Our frame selection algorithm takes into account quality implications of the layers, decoding dependencies between the frames, and the smoothness of frame inter-arrivals to maximize the delivered video quality under the said budget. We have implemented the complete system, called "Rebera" for real-time bandwidth estimation and rate adaptation, to evaluate its performance and compare it with Apple's FaceTime video call application. Our implementation relies on an open source real-time H.264 video encoder [24], which we have modified to produce a hierarchical-P stream. As a result, we can directly control the encoded video rate according to the measured capacity. Thanks to the combination of all these components, Rebera is able to achieve higher bandwidth utilization and lower frame delays than FaceTime. In this study, we do not consider UEP among the temporal layers and the error resilience aspect of the system. These will be investigated in future studies.

2. RELATED WORK

Rate adaptation is a key problem for video transmission over best-effort networks. Most of the previous studies focus on one-way streaming of live or recorded video, where a few seconds of video buffering at the receiver can be tolerated. Due to buffering, a temporary mismatch between the video rate and the ABW does not directly impact the video playback, as long as the buffer does not drain out. The recent industry trend here is Dynamic Adaptive Streaming over HTTP (DASH) [20]. Various rate adaptation algorithms have recently been proposed [6, 22, 13, 29], and some of them were specifically designed for wireless networks, e.g. [19, 23]. In contrast, a video call involves two-way real-time video streaming. To facilitate live interaction, a video call does not have the luxury of seconds of video buffering. Consequently, a mismatch between the selected video rate and the ABW will directly translate into quality degradation of video playback, such as long video frame delays, delay jitters and video freezes. Thus, for a video call, real-time bandwidth estimation and video rate adaptation are more challenging, compared with one-way video streaming.

Sending rate determination, or congestion control, has been an active area of research for decades. Window-based congestion control algorithms are the dominant form of congestion control on the Internet, which reactively adjust the window size, and hence the sending rate, according to a congestion signal. TCP variants such as Tahoe and New Reno [14] use packet losses, while Vegas [5], FAST [26] and Compound [21] react to packet round trip times. However, the additive-increase multiplicative-decrease (AIMD) probing method used in TCP, along with the retransmission of every single lost packet in these protocols, renders them less desirable for interactive video calls. TFRC [8] and TCP Cubic [10] control the sending rate with smaller variations; however, their delay performances deteriorate in the face of fast ABW variations. As a result, rate-based congestion control protocols are dominantly used in commercial video call applications, such as Microsoft Skype, Google Hangouts and Apple FaceTime. Nonetheless, these protocols also behave reactively when adjusting the sending rate, and therefore suffer from the same self-congestion problem over highly volatile links. Authors of [27] proposed a proactive congestion control scheme for real-time video delivery in cellular networks. They model cellular links as single-server queues emptied out by a doubly-stochastic service process. For the ABW estimation, we, unlike [27], assume no particular time-evolution model for the link capacity. Furthermore, [27] focused only on congestion control without considering video adaptation.

Adapting the video rate in real-time according to the determined sending rate is crucial for a low-delay video application. This task is usually handled at the video encoder only. However, if the rate control is not accurate and the encoded video rate exceeds the rate constraint, sending every encoded frame will cause self-congestion. This problem can be alleviated if the video stream has temporal layering, by allowing the sender to prioritize from lower to higher layers, until the sending rate stays just below the rate constraint. However, the on-the-fly decision of discarding a higher-layer frame that was encoded before the more important lower-layer frames is not trivial, since the sizes of the upcoming encoded frames are yet unknown. To the best of our knowledge, there is no published work addressing this problem.

3. PROPOSED SYSTEM OVERVIEW

We examine a real-time video delivery scenario between a sender and a receiver, where at least one user is connected to a cellular network (Figure 1). We denote the source device by S, the destination device by D, and the corresponding base stations by $B_S$ and $B_D$, respectively. We call the directed path from S to D the forward path, and the directed path from D to S in the reverse direction the backward path. We assume that the in-network directed path $(B_S, B_D)$ that connects the base stations has higher ABW, and constant queuing and propagation delay. Therefore, the overall ABW along the forward path $(S, B_S, B_D, D)$ is equal to the minimum of the bandwidths along the cellular uplink $(S, B_S)$ and the cellular downlink $(B_D, D)$.

According to the queuing model of [27], all packets destined to or sent from a given mobile device that is connected to a base station are queued up in isolated buffers, which are located on the mobile device for the uplink and in the base station for the downlink. These buffers are not shared by any other flow to or from other users; that is, there is no cross-traffic in these queues. The backlogged packets leave

their respective buffers once they are successfully transmitted over the link. Thus, how fast these buffers are emptied out directly reflects the capacity of the cellular links, and consequently the end-to-end ABW.

Figure 1: Cellular links between the mobile devices and their respective base stations are in red.

Figure 2: Proposed Rebera real-time video delivery system for cellular networks

As for the video stream, we assume that the sender uses a layered encoder so that it can easily adjust the sending rate by adjusting the number of video layers transmitted. Layered coding also enables unequal error protection; i.e., a basic level of quality can be guaranteed with high likelihood by providing more protection to the base layer. We consider only temporal layering to keep the encoding complexity and overhead at a minimum. In order to minimize the encoding delay, we further assume that the hierarchical-P structure (Figure 3) is used to achieve temporal scalability. Starting with the highest temporal layer, the frames can be discarded to reduce the video rate. In the example shown in Figure 3, each Group of Picture (GoP) consists of 4 frames, which leads to three temporal layers (TLs). We assume that the encoder inserts an I-frame every N frames, and we denote the time duration covering all N frames from an I-frame up to but excluding the next I-frame by an "intra-period." Then, the time duration T for an intra-period is equal to N/f, where f is the frame rate of the captured video.

We can now summarize the operation of the proposed system. Since rate control is usually performed once per intra-period in conventional video encoders, we predict the average ABW for each new intra-period. The prediction is based on the average ABWs for the past intra-periods, which are measured by the receiver and fed back to the sender. Specifically, the receiver periodically measures the ABW using the video frames that arrived within the last T-second window, and then feeds the result back to the sender. The window slides forward every $\Delta$ seconds. In order to have as fresh feedback messages as possible, we have $\Delta \ll T$. The sender, in turn, records the most recent measurement, and updates its value with the arrival of each new measurement. Then, at the beginning of the next intra-period k, the value of the most recent measurement is taken as the ABW $\tilde{c}_{k-1}$ measured during the last intra-period k − 1. This value is input to an adaptive linear prediction filter, which then updates its prediction $\hat{c}_k$ regarding the ABW during the new intra-period k using the past bandwidth values $\tilde{c}_{k-1}, \ldots, \tilde{c}_{k-M}$. Using this prediction, the sender calculates the bit budget $b_k$, which is the maximum number of bits that the sender is allowed to send during this intra-period so that all the data that have been sent arrive at the receiver by the end of the intra-period with a high probability. The components of our design can be seen in Figure 2.

4. SENDING RATE CONTROL

4.1 Measuring the Available Bandwidth

Packet pair/train methods [18] are well-known active capacity measurement schemes for finding the minimum capacity along a network path. The performance of these methods improves if there is no cross-traffic on the links, making them suitable to measure the cellular link capacity according to our model. In our system, we propose measuring the average ABW $c(t_1, t_2)$ actively at the destination, using the video frames received in $(t_1, t_2]$ as packet trains. Using the video frames as packet trains enables us to directly exploit the video data flow for capacity measurements and to avoid sending additional measurement traffic. Specifically, at the sender side, we first packetize each frame regardless of its size into $p \ge 2$ packets, and then send them together in a burst. The resulting instantaneous sending rate is likely to be higher than the instantaneous capacity of the link. As a result, packets queue up in the bottleneck; i.e. the base station buffer for the downlink or the cellular device buffer for the uplink, where they are transmitted one by one. Then, at the receiver side, we take capacity measurements $\{m_n\}$, where the sample $m_n$ is obtained by using the arriving video frame n as a packet train. Let us denote the inter-arrival time between packets i − 1 and i by $a_i$, and the size of the packet i by $z_i$. Then, we can calculate the capacity sampled by frame n as:

$$m_n = \frac{z_2 + \cdots + z_p}{a_2 + \cdots + a_p} \triangleq \frac{Z_n}{A_n}. \qquad (1)$$

For any time period $(t_1, t_2]$, we can estimate the average capacity $c(t_1, t_2)$ over this time simply by

$$\tilde{c}(t_1, t_2) = \frac{\sum_{n \in \mathcal{N}} Z_n}{\sum_{n \in \mathcal{N}} A_n}, \qquad (2)$$

where $\mathcal{N}$ is the set of all frames that arrived in $(t_1, t_2]$. Note that Eq. (2) is equivalent to taking a weighted average of all the capacity samples in $\{m_n\}$, where the sample $m_n$ is weighted in proportion to its measurement duration $A_n$ with weight $w_n = A_n / \sum_{n \in \mathcal{N}} A_n$. Having completed the average capacity measurement regarding $(t_1, t_2]$, the receiver prepares a small feedback packet and sends it back to the source. Note that we are ultimately interested in measuring the ABW $c_k$ during $(T_k, T_{k+1}]$, where $T_k$ denotes the start of the intra-period k. However, since the sender and receiver have different clock times in general, the receiver cannot know when exactly an intra-period starts. Furthermore, the feedback packets are subject to time-varying delays in the network. In short, we cannot guarantee that the feedback packets will arrive at the sender on time for predicting the capacity of the next intra-period. To address this issue, the receiver measures the average capacity within the last T seconds every $\Delta$ seconds, where $\Delta \ll T$. Each of these measurements is immediately sent back to the sender. Specifically, a measurement generated at time t is the average capacity in $(t - T, t]$, while the next measurement that is generated at $t + \Delta$ is the average bandwidth in $(t - T + \Delta, t + \Delta]$. The sender then uses the latest feedback received before $T_k$ to predict the ABW in the next intra-period $(T_k, T_{k+1}]$. Lastly, assuming we keep the sending rate below the capacity, our measurement accuracy depends on the difference between the sending rate and the capacity of the link. If the sending rate equals or, by any chance, exceeds the capacity, we would have very high measurement accuracy, but this may lead to saturated links and long queueing delays, which are detrimental to video call quality.

Robustness against bursts

It is known that the cellular links occasionally experience channel outages that may last for up to several seconds, during which the capacity essentially drops to zero, and the packets in transit are backlogged in the respective buffers. As a result, the sender should stop sending any more packets as soon as an outage is detected. When the outage ends, all the packets queued up in the buffer are usually transmitted and arrive at the receiver as a burst. If the receiver uses these packets for capacity measurement, the burst rate, which is on the order of several Mbps, can severely disrupt the learning process of the predictor. In order to protect our system against these bursty measurements, we simply detect them through the sample measurement duration $A_n$. In our system, we consider a measurement bursty if $A_n < 10$ ms. Bursty measurements are simply discarded.

4.2 Predicting the Available Bandwidth

History-based forecast is a popular method for prediction [12], where the past measurement values are used to deter- [...] cal autocorrelation matrix of the measured capacities. The overall procedure is summarized in Algorithm 1.

Table 1: Notation regarding the RLS predictor

Symbol           Meaning
$M$              number of filter taps
$\lambda$        forgetting factor
$\theta$         initializer parameter for $P$
$w(k)$           filter tap vector of length $M$
$P(k)$           inverse of empirical autocorrelation matrix, $M \times M$
$g(k)$           gain vector of length $M$
$\tilde{c}_k$    measured capacity
$c(k)$           vector of most recently measured capacity values, length $M$
$\epsilon_k$     a priori prediction error

Algorithm 1 Recursive Least Squares
1: $P(0) = \theta^{-1} I$, $w(0) = 0$, $c(0) = 0$   (Initialization)
2: for all intra-period $k \ge 1$ do
3:   $g(k) = \dfrac{\lambda^{-1} P(k-1)\, c(k-1)}{1 + \lambda^{-1} c^T(k-1)\, P(k-1)\, c(k-1)}$
4:   $\epsilon_k = \tilde{c}_k - w^T(k-1)\, c(k-1)$
5:   $w(k) = w(k-1) + \epsilon_k\, g(k)$
6:   $P(k) = \lambda^{-1} \left[ P(k-1) - g(k)\, c^T(k-1)\, P(k-1) \right]$
7:   $\hat{c}_{k+1} = w^T(k)\, c(k)$
8: end for

4.3 Determining the Sending Rate

Our ultimate goal is to ensure that all the frames sent during an intra-period finish their transmission before the start of the next one. In other words, we aim to have each I-frame encounter empty buffers with high probability. Let us denote our sending rate in the intra-period k + 1 by $r_{k+1}$. We determine $r_{k+1}$ such that the probability to exceed the capacity $c_{k+1}$ is low; that is, Pr(c_{k+1} [...]
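The measurement and prediction steps of Sections 4.1 and 4.2 can be illustrated with a short sketch. The Python code below is only an illustration under stated assumptions, not the authors' implementation: the names `frame_sample`, `average_capacity`, `RLSPredictor` and `BURST_THRESHOLD_S` are hypothetical, packet sizes are assumed to be in bits and arrival times in seconds, and the measurement vector is assumed to hold the newest value first. It computes one capacity sample per frame as in Eq. (1), averages a window of samples as in Eq. (2) while discarding samples with a duration below 10 ms, and then follows the update steps listed in Algorithm 1.

```python
BURST_THRESHOLD_S = 0.010  # Section 4.1: samples with A_n < 10 ms are bursts


def frame_sample(sizes_bits, arrival_times_s):
    """Return (Z_n, A_n) for one frame used as a packet train, as in Eq. (1)."""
    if len(sizes_bits) < 2 or len(sizes_bits) != len(arrival_times_s):
        raise ValueError("need p >= 2 packets with one arrival time each")
    z_n = sum(sizes_bits[1:])                       # z_2 + ... + z_p
    a_n = arrival_times_s[-1] - arrival_times_s[0]  # a_2 + ... + a_p
    return z_n, a_n


def average_capacity(samples):
    """Eq. (2) over a window of (Z_n, A_n) pairs, ignoring bursty samples."""
    kept = [(z, a) for z, a in samples if a >= BURST_THRESHOLD_S]
    if not kept:
        return None  # nothing usable in this window
    return sum(z for z, _ in kept) / sum(a for _, a in kept)


class RLSPredictor:
    """One-step capacity predictor following the steps of Algorithm 1."""

    def __init__(self, taps=5, lam=0.999, theta=0.001):
        self.m, self.lam = taps, lam
        self.w = [0.0] * taps   # filter taps, w(0) = 0
        self.c = [0.0] * taps   # most recent measurements, newest first
        self.P = [[1.0 / theta if i == j else 0.0 for j in range(taps)]
                  for i in range(taps)]  # P(0) = I / theta

    def update(self, measured):
        """Take the measurement for period k, return the forecast for k + 1."""
        m, lam, c, P = self.m, self.lam, self.c, self.P
        Pc = [sum(P[i][j] * c[j] for j in range(m)) for i in range(m)]
        cP = [sum(c[i] * P[i][j] for i in range(m)) for j in range(m)]
        denom = lam + sum(c[i] * Pc[i] for i in range(m))
        g = [x / denom for x in Pc]                                 # step 3
        err = measured - sum(wi * ci for wi, ci in zip(self.w, c))  # step 4
        self.w = [wi + err * gi for wi, gi in zip(self.w, g)]       # step 5
        self.P = [[(P[i][j] - g[i] * cP[j]) / lam for j in range(m)]
                  for i in range(m)]                                # step 6
        self.c = [measured] + c[:-1]   # slide the measurement window
        return sum(wi * ci for wi, ci in zip(self.w, self.c))       # step 7
```

In this sketch the gain is written as `Pc / (lam + c^T P c)`, which is the expression in step 3 of Algorithm 1 with numerator and denominator multiplied by the forgetting factor. A sender would call `update` once per intra-period with the latest fed-back measurement and use the returned value as the capacity forecast for the next intra-period; how that forecast is turned into a bit budget is not covered here.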

6. SIMULATIONS AND EXPERIMENTS

6.1 Forecasting via Adaptive Filtering

We start our evaluations by motivating the use of the RLS linear adaptive filter for capacity prediction. We compare the prediction performance of the RLS with the simple and popular EWMA predictor [12]. In our experience, the filter length and the forgetting factor parameters do not significantly affect the prediction errors provided that we choose M < 10 and λ > 0.99. Therefore, we have selected M = 5, λ = 0.999, θ = 0.001 and fixed this configuration for the rest of the evaluations. We collected six real cellular link capacity traces (Figure 7) following the methodology in [27], over [...]

Table 3: Comparing the prediction error RMS of the RLS predictor with those of the best and worst EWMA predictors with corresponding parameters. RLS, Best and Worst columns are in kbps.

Trace   RLS   α_Best   Best   α_Worst   Worst
Tr1     53    0.55     55     0.05      87
Tr2     88    0.7      90     0.05      120
Tr3     158   0.55     157    0.05      209
Tr4     186   0.4      178    0.05      211
Tr5     250   0.2      235    1         293
Tr6     244   0.4      242    0.05      291

[...] selection (DFS) algorithm against the layer-push (LP) and frame-push (FP) algorithms. LP also estimates the frame size in each temporal layer using the same approach as in DFS, but then decides on the highest layer $l_{max}$ that may be sent. In other words, only the frames from layers up to $l_{max}$ are eligible for sending. Among these frames, following the encoding order, the algorithm sends as many frames as possible until the bit budget is exhausted. FP, on the other hand, does not consider layer information and sends as many frames as possible following their encoding order, until the bit budget is exhausted.

For each algorithm, we evaluate the total number of frames sent, the mean and the standard deviation of the resulting frame intervals, and finally the fraction of the unused bit budget. Here, a frame interval represents the temporal distance between a pair of consecutive frames that have been selected to be sent. The frame interval statistics are calculated using the fraction of time each interval lasts as the probability to observe that interval. We use the JM encoder [15] to encode the video sequence "Crew" [28] with a hierarchical-P structure having three temporal layers (GoP length = 4) and an intra-period of 32 frames. We used a fixed quantization parameter (QP) of 36, yielding an average bitrate of 415 kbps when all frames are included. The resulting video sequence has a frame rate of 30 fps and comprises 9 intra-periods, with an intra-period of T = 32/30 seconds. For the proposed algorithm, we used γ = 0.75, which was found to perform the best, and the frame priority order is π = (0, 4, 8, 12, 16, 20, 24, 28, 30, 14, 6, 22, 26, 18, 10, 2, 31, 15, 7, 23, 27, 19, 11, 3, 1, 5, 9, 13, 17, 21, 25, 29).

In these simulations, we assume that the bit budget $b_k$ is constant for each intra-period k of the video and we want to compare the performances of the algorithms described above under different $b_k$ values, from 10 kB to 80 kB. In Figure 4, we see that FP sends the most frames by sending as many frames as possible. However, it also has the largest mean frame interval and the largest frame interval variation, making the displayed video jittery. On the other hand, the LP algorithm sends the lowest number of frames but also with lower mean frame interval and frame interval variance. The proposed DFS algorithm achieves a good compromise between sending more frames, consequently utilizing ABW more closely, and reducing the frame distance variation. In fact, DFS outperforms both methods in terms of the mean and standard deviation of the frame intervals, while sending almost as many frames as the FP. Finally, the plot in the upper right shows the fraction of the unused bandwidth for each method, where we see that the performance of DFS is very similar to FP, whereas LP is not as efficient.

Figure 4: Comparison of DFS with FP and LP; number of frames sent (upper-left), unused budget (upper-right), mean and standard deviation of the frame intervals (lower-left and lower-right). Encoding frame rate is 30 Hz, thus frame-time is 1/30 sec.

6.3 Evaluation on the Testbed

For system evaluation, we developed a testbed to compare Rebera with popular video call applications. On this testbed (Figure 5), S and D are the source and destination end-points running the video call application under test, while the nodes CS and CD are cellular link emulators running the CellSim software [27]. The emulators are connected to each other through the campus network, and to their respective end-points via Ethernet. For cellular link emulation, we use the uplink and downlink capacity traces collected (Table 2). For evaluation, we report the ABW utilization, the 95-percentile one-way packet delays, and the 95-percentile one-way frame delays as the performance metrics. In order to calculate the bandwidth utilization, we count how many bytes were sent out by the video call application under test and compare it with the minimum of the capacities of the sender link and the receiver link. The one-way end-to-end delays are collected by different means: in Rebera experiments, for each packet that made it to the receiver, the receiver sends an acknowledgement packet back to the sender over an Ethernet cable on which there is no other traffic (Figure 5). As a result, the measured round-trip times are almost equal to the one-way delays, enabling us to measure the delay for each individual packet and frame. In FaceTime experiments, we used Wireshark to sniff the packets on the emulator hosts. We also note that FaceTime sends voice packets even after the voice is muted, at a constant rate of 32 kbps. Rebera, on the other hand, does not send audio. In order to compensate for this in the bandwidth utilization calculations, we subtract 32 kbps from the sending rate achieved by Rebera. In each test, we loop the video sequence "Crew", which is more challenging in terms of the video rate than "Akiyo" and somewhat captures hand/arm movements present in video calls.

Figure 5: Illustration of the testbed. Purple arrows indicate the flow direction of the video, whereas the acks follow the green arrow from D to S.

Rebera is able to encode the video in real-time thanks to the open source x264 video encoder [24]. This allows us to change the video rate according to the predicted ABW, for each new intra-period, using x264's rate control module. We have modified x264's code, so that the encoded video has a hierarchical-P coding structure, by changing the reference frames used before encoding each frame according to the H.264/AVC standard. Specifically, in our modification, the GoP length is set to 4, giving rise to 3 temporal layers. In all our experiments in the lab, the minimum and maximum encoding rates were set as 200 kbps and 3 Mbps, respectively. The video and RLS parameters used are the same as in Sections 6.2 and 6.1. Specifically, the encoding frame rate is 30 Hz, and the intra-period length T is 32 frames, or 1.066 seconds. The initial sending rate is set to 120 kbps. In each experiment, we evaluate the sending rate over consecutive periods of T seconds. Please note that FaceTime may not be using a constant intra-period, let alone the same intra-period T as Rebera. Furthermore, FaceTime's sending rate is, in general, the sum of FEC and the video data rates. In order to feed the same looped test video into FaceTime, we used the ManyCam [25] virtual webcam on Mac OS 10.10.4; detailed explanations can be found online at [1].

6.3.1 Evaluation with Piecewise Constant Bandwidth

In this experiment, we use a piecewise constant bandwidth trace, with steps of 100 kbps lasting 100 seconds, between 300 and 600 kbps. We set the packet loss rate to zero. In Figure 6, we can see Rebera's (i) measured bandwidth, (ii) rate reduction due to the estimated number of backlogged bits, (iii) overall budget and (iv) the sending rate, along with FaceTime's sending rate. On average, the bandwidth utilization of Rebera is 86.21%, while FaceTime achieves a utilization of 78.78%. Moreover, we can observe that Rebera is able to measure the current bandwidth very accurately, and thus react to the changes in the bandwidth rapidly.

Figure 6: Sending rate of Rebera and FaceTime under piecewise constant bandwidth. Vertical axis: kbps, horizontal axis: intra-period index.

6.3.2 Evaluation with Cellular Capacity Traces

In this set of experiments, we use cellular bandwidth traces (Figure 7) to emulate the cellular links. Each experiment lasts for 1000 intra-periods (1066 sec). We first present the results involving a single cellular link along the end-to-end path. Specifically, we start by examining the particular scenario where the sender is connected through a cellular network, and traces 5 and 6 were used to emulate the forward and backward end-to-end ABW, respectively. The receiver is assumed to have a wired connection. In the top plot of Figure 9, we present Rebera's and FaceTime's sending rates over time. Here, Rebera achieves a forward bandwidth utilization of 75.6%, while FaceTime's utilization is 65.2%. Furthermore, 95-percentile packet and frame delays are observed to be 204 and 232 ms for Rebera, and 307 and 380 ms for FaceTime. The empirical packet and frame delay CDFs for both systems can be seen in Figure 8.

Figure 7: Traces used in the experiments. Vertical axis: capacity (Mbps), horizontal axis: intra-period index. Traces 2, 3, 4 and 5 are used as forward capacities, 1 and 6 are used as backward capacities.

Figure 8: Empirical packet (left) and frame (right) delay CDFs of Rebera and FaceTime, where forward and backward bandwidths were emulated via traces 5 and 6, respectively.

Similarly, we employ traces 2, 3, 4 and 5 as the forward, and traces 1 and 6 as the backward end-to-end ABW. Results are summarized as bandwidth utilization, 95-percentile packet and frame delay tuples in Tables 4 and 5. We can see from Tables 4 and 5 that in all experiments, Rebera achieves a higher utilization of the forward ABW with shorter delays. Averaged over these experiments, Rebera provides 20.5% higher bandwidth utilization compared to FaceTime, and a reduction of 122 ms and 102 ms in the 95-percentile packet and frame queueing delays, respectively. When a more challenging backward capacity (trace 1 in Table 4) is used for the backward path, the information fed back to the sender side undergoes a longer delay for both Rebera and FaceTime, decreasing the ABW utilization of both systems. FaceTime's delay performance also degrades, whereas Rebera is still able to provide similar delays.

Table 4: Evaluation over single cellular link, using Trace 1 as the backward capacity. Reported values are bandwidth utilization percentage, 95-perc. packet delay, and 95-perc. frame delay, respectively.

Fwd. Cap.   Rebera (%, ms, ms)   FaceTime (%, ms, ms)
Trace 2     68.8, 402, 402       58.1, 447, 415
Trace 3     63.8, 338, 334       32.7, 631, 528
Trace 4     73.5, 177, 201       63.0, 383, 392
Trace 5     71.8, 216, 243       58.7, 317, 341
Average     69.5, 283, 295       53.1, 444, 419

Table 5: Evaluation over single cellular link, using Trace 6 as the backward capacity. Reported values are bandwidth utilization percentage, 95-perc. packet delay, and 95-perc. frame delay, respectively.

Fwd. Cap.   Rebera (%, ms, ms)   FaceTime (%, ms, ms)
Trace 2     69.5, 381, 394       59.2, 426, 406
Trace 3     66.4, 307, 313       61.9, 341, 349
Trace 4     76.1, 168, 189       70.6, 276, 307
Trace 5     75.6, 204, 232       65.2, 307, 380
Average     71.9, 265, 282       64.2, 337, 360

Next, we consider the scenarios where both users are connected over different cellular links. We assume there exist three different cellular connections, which we denote by A, [...]

Table 6: Evaluation when both users are connected over different cellular networks. Reported values are bandwidth utilization percentage, 95-perc. packet delay, and 95-perc. frame delay, respectively.

            Rebera (%, ms, ms)   FaceTime (%, ms, ms)
A to B      60.5, 300, 312       47.9, 529, 508
B to A      58.1, 483, 498       48.5, 485, 483
A to C      59.9, 432, 442       44.0, 588, 518
C to A      61.3, 1066, 1019     43.3, 1278, 851
B to C      61.2, 416, 419       28.2, 1180, 996
C to B      59.5, 804, 809       44.0, 3230, 1090
Average     60.1, 583, 583       42.6, 1215, 741

Table 7: Effect of the tolerance parameter on Rebera over single cellular link. Forward-backward capacity: traces 3-6

δ                             0.05   0.1    0.2    0.5
ABW utilization (%)           66.4   69.6   72.7   75.36
95-perc. packet delay (ms)    307    354    371    486
95-perc. frame delay (ms)     313    367    403    516

[...] more frequent large packet and frame delays, in exchange for higher bandwidth utilization, which could be the case for video applications with less stringent delay requirements.

6.3.4 Rebera Behavior in Presence of Packet Loss

The purpose of this evaluation is to demonstrate that Rebera can still track the link capacity in the presence of packet loss. Note that additional studies are needed to investigate the error resilience provided by the temporal layering, and how to further improve it through unequal error protection. To examine the performance of Rebera in presence of packet loss, we employ CellSim to introduce random iid losses. We tested Rebera when the packet loss rate is 5% and 10%, and the results are given in Table 8. The ABW utilization drops with the loss rate, although not significantly, as there are fewer packets crossing the links. Furthermore, the delays experienced by the received frames decrease, since there is less backlog in the buffers.
B and C, where the uplink and downlink ABW pairs for each connection are given as traces 5 and 6, traces 3 and 4, and traces 1 and 2, respectively. In other words, connection Table 8: Effect of the packet losses on Rebera over A provides the highest mean ABW, while the connection single cellular link. Forward-backward capacities: C provides the lowest. We evaluate all six cases for which traces 3-6 the sender and the receiver have different connections. The results can be seen in Table 6. In all scenarios, Rebera pro- Packet loss rate 0% 5% 10% vides a significantly higher ABW utilization, while still de- ABW utilization (%) 66.4 63.4 61.1 livering shorter packet and frame delays on average and in 95-perc. packet delay (ms) 307 281 286 most cases. 6.3.3 Effect of the Tolerance Parameter Next, we investigate the effect of the tolerance parameter 6.4 Evaluation over Cellular Networks δ in Section 4.3 on Rebera. We vary δ from0.05upto0.5, Finally, we evaluate Rebera over a real cellular network. and record the utilization and 95-percentile packet delays in The setup we used for this experiment can be seen in Figure Table 7. Having a larger δ value means the system is willing 10. Here, a mobile device (Motorola Nexus 6) is tethered to to tolerate more frequent capacity overshoots, and hence the sender host via USB, acting as a modem. The sender sending rates 3000 Forward capacity Rebera sending rate 2500 FaceTime sending rate 2000 1500 kbps 1000 500 0 0 100 200 300 400 500 600 700 800 900 1000 time(sec) 1-way e2e frame delay for Rebera 5000 4000 3000 2000 1000

frame delay(ms) 0 0 100 200 300 400 500 600 700 800 900 1000 time(sec)

1-way e2e frame delay for FaceTime 5000 4000 3000 2000 1000 frame delay(ms) 0 0 100 200 300 400 500 600 700 800 900 1000 time(sec)

Figure 9: Video sending rates of Rebera and FaceTime (top) over a single cellular link with traces 5 & 6 as the forward & backward capacities, respectively. We also present the one-way end-to-end frame delays of Rebera (middle) and FaceTime (bottom), where the horizontal axis represents the sending time of each frame. The average ABW utilization and 95-perc. packet and frame delay are reported in Table 5, row 4.

is stationary during the experiments, which last for 10 minutes. The receiver host is inside the campus network, and has a public IP address. The experiment was done over the T-Mobile network, using LTE and HSPA on December 4, 2015 at 6 PM. In Figure 11, we can see that, in the HSPA uplink, which provides an average ABW of 1.055 Mbps, the outages may last as long as 20 seconds. On the other hand, LTE provides an almost outage-free uplink channel, with an average ABW of 5.95 Mbps. Table 9 summarizes the experiment results. Note that during the experiment over LTE, Rebera's maximum encoding rate is set as 10 Mbps. This change serves as a means to utilize the ABW better, and is not necessary for Rebera's operation. The empirical packet delay distributions for Rebera using either access technology are given in Figure 12.

Figure 10: Setup for experiments in-the-wild. [Diagram labels: sender, usb, T-Mobile Network, Internet, NYU, receiver (public IP).]

Table 9: Rebera's performance over T-Mobile network with HSPA and LTE technologies

| Rebera                      | HSPA | LTE  |
|-----------------------------|------|------|
| average sending rate (Mbps) | 0.81 | 5.17 |
| 95-perc. packet delay (ms)  | 221  | 105  |
| 95-perc. frame delay (ms)   | 298  | 137  |

Figure 11: Rebera sending rate and measured bandwidth over LTE and HSPA. [Panel title: Rebera in-the-wild. Legend: sending rate LTE, measured bandwidth LTE, sending rate HSPA, measured bandwidth HSPA. Axes: Mbps versus intra-period.]

Figure 12: One-way packet and frame delay CDFs of Rebera over LTE and HSPA. [Legend: LTE, HSPA. Horizontal axes: msec.]

7. CONCLUSIONS

Video calls over cellular links have to adapt to fast-changing network bandwidth and packet delay. In this study, we proposed a new real-time video delivery system, Rebera, designed for cellular networks. Rebera's proactive congestion controller uses the video frames to actively measure the capacity of cellular links, and through these measurements makes a safe forecast for future capacity values, using well-known adaptive filtering techniques. Through its dynamic frame selection module designed for temporal layered streams, Rebera ensures that its video sending rate never violates the forecast by discarding higher layer frames, thereby preventing self-congestion, and reducing the packet and consequently the frame delays. Our experiments showed that Rebera is able to deliver higher bandwidth utilization and shorter packet and frame delays compared with Apple's FaceTime on average. In the future, we will consider UEP among temporal layers to improve the system performance in the presence of packet loss.

8. ACKNOWLEDGMENTS

This research is supported in part by Tencent Co. Ltd, the NYU WIRELESS center, and the New York State Center for Advanced Technology in Telecommunications (CATT).

9. REFERENCES

[1] Video Lab @NYU. http://vision.poly.edu/index.html/index.php?n=HomePage.Rebera.
[2] H. Alvestrand and S. Holmer. A Google Congestion Control Algorithm for Real-Time Communication on the World Wide Web. 2012.
[3] Apple. Facetime. http://www.apple.com/mac/facetime/.
[4] A. Arasu and G. S. Manku. Approximate counts and quantiles over sliding windows. In Proceedings of ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pages 286–296. ACM, 2004.
[5] L. S. Brakmo and L. L. Peterson. TCP Vegas: End-to-end Congestion Avoidance on a Global Internet. IEEE JSAC, 13(8):1465–1480, 1995.
[6] L. De Cicco, S. Mascolo, and V. Palmisano. Feedback control for adaptive live video streaming. In Proceedings of the second annual ACM conference on Multimedia systems, pages 145–156. ACM, 2011.
[7] B. Farhang-Boroujeny. Adaptive filters: theory and applications. John Wiley & Sons, 2013.
[8] S. Floyd, M. Handley, J. Padhye, and J. Widmer. Equation-based congestion control for unicast applications, volume 30. ACM, 2000.
[9] Google. Hangouts. http://www.google.com/+/learnmore/hangouts/.
[10] S. Ha, I. Rhee, and L. Xu. Cubic: a new tcp-friendly high-speed tcp variant. ACM SIGOPS Operating Systems Review, 42(5):64–74, 2008.
[11] S. Haykin. Adaptive filter theory. Prentice Hall, 2002.
[12] Q. He, C. Dovrolis, and M. Ammar. On the predictability of large transfer tcp throughput. ACM SIGCOMM Computer Communication Review, 35(4):145–156, 2005.
[13] T.-Y. Huang, R. Johari, N. McKeown, M. Trunnell, and M. Watson. A buffer-based approach to rate adaptation: Evidence from a large video streaming service. In Proceedings of the 2014 ACM conference on SIGCOMM, pages 187–198. ACM, 2014.
[14] V. Jacobson. Congestion avoidance and control. In ACM SIGCOMM computer communication review, volume 18, pages 314–329. ACM, 1988.
[15] JM Ref. Software. http://iphome.hhi.de/suehring/tml/.
[16] Y. Liu, Z. G. Li, and Y. C. Soh. A Novel Rate Control Scheme for Low Delay Video Communication of H.264/AVC Standard. IEEE Transactions on Circuits and Systems for Video Technology, 17(1):68–78, 2007.
[17] Microsoft. Skype. http://www.skype.com.
[18] R. Prasad, C. Dovrolis, M. Murray, and K. Claffy. Bandwidth estimation: metrics, measurement techniques, and tools. IEEE Network, 17(6):27–35, 2003.
[19] W. Seo and R. Zimmermann. Efficient video uploading from mobile devices in support of http streaming. ACM Multimedia Systems, 2012.
[20] I. Sodagar. The MPEG-DASH Standard for Multimedia Streaming over the Internet. IEEE MultiMedia, (4):62–67, 2011.
[21] K. Tan, J. Song, Q. Zhang, and M. Sridharan. A Compound TCP Approach for High-speed and Long Distance Networks. In Proceedings of IEEE INFOCOM, 2006.
[22] G. Tian and Y. Liu. Towards Agile and Smooth Video Adaptation in Dynamic HTTP Streaming. In ACM International Conference on emerging Networking EXperiments and Technologies (CoNEXT), 2012.
[23] G. Tian and Y. Liu. On Adaptive Http Streaming to Mobile Devices. In Packet Video Workshop. IEEE, 2013.
[24] VideoLAN. x264. http://www.videolan.org/developers/x264.html.
[25] Visicom Media. ManyCam. https://manycam.com.
[26] D. X. Wei, C. Jin, S. H. Low, and S. Hegde. FAST TCP: motivation, architecture, algorithms, performance. IEEE/ACM Transactions on Networking (ToN), 14(6):1246–1259, 2006.
[27] K. Winstein, A. Sivaraman, and H. Balakrishnan. Stochastic Forecasts Achieve High Throughput and Low Delay over Cellular Networks. In USENIX Symposium on Networked Systems Design and Implementation, pages 459–471, 2013.
[28] Xiph.org. Test Media. https://media.xiph.org/video/derf/.
[29] X. Yin, A. Jindal, V. Sekar, and B. Sinopoli. A Control-Theoretic Approach for Dynamic Adaptive Video Streaming over HTTP. In Proceedings of the ACM Conference on Special Interest Group on Data Communication, pages 325–338. ACM, 2015.
[30] X. Zhang, Y. Xu, H. Hu, Y. Liu, Z. Guo, and Y. Wang. Profiling Skype Video Calls: Rate Control and Video Quality. In Proceedings of IEEE INFOCOM, 2012.
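
As a companion to the evaluation in Section 6.3, the following is a minimal sketch of how the reported statistics (ABW utilization and 95-percentile delays) could be computed from per-intra-period rate logs and per-packet delay logs. The function names are hypothetical, and both definitions are assumptions rather than the authors' stated procedure: utilization is taken as total sent rate over total available rate, and the percentile uses the nearest-rank rule. The last lines check that the "20.5% higher bandwidth utilization" figure is consistent with a relative gain computed from the averages in Tables 4 and 5.

```python
import math
from typing import Sequence


def utilization_percent(sent_kbps: Sequence[float],
                        capacity_kbps: Sequence[float]) -> float:
    """Share of the offered capacity that was used, in percent.

    Assumption: both sequences hold one average rate per intra-period,
    so the ratio of the sums equals the ratio of transferred bits.
    """
    if len(sent_kbps) != len(capacity_kbps) or not sent_kbps:
        raise ValueError("need two non-empty sequences of equal length")
    return 100.0 * sum(sent_kbps) / sum(capacity_kbps)


def percentile_nearest_rank(values: Sequence[float], q: float) -> float:
    """q-th percentile (0 < q <= 100) using the nearest-rank rule."""
    if not values or not 0 < q <= 100:
        raise ValueError("need a non-empty sequence and 0 < q <= 100")
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def relative_gain_percent(ours: float, baseline: float) -> float:
    """Relative improvement of `ours` over `baseline`, in percent."""
    return 100.0 * (ours / baseline - 1.0)


# Average utilizations copied from the "Average" rows of Tables 4 and 5.
rebera_avg = (69.5 + 71.9) / 2
facetime_avg = (53.1 + 64.2) / 2
gain = relative_gain_percent(rebera_avg, facetime_avg)
print(round(gain, 1))  # prints 20.5
```

Under these assumptions, the 20.5% figure reads as a relative gain (a ratio of the two averages), not a difference in percentage points.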