Cross-Layer Adaptive Bitrate Streaming
Research Collection, Master Thesis: Cross-layer Adaptive Bitrate Streaming
Author(s): Dan, Alexandru
Publication Date: 2020
Permanent Link: https://doi.org/10.3929/ethz-b-000443424
Rights / License: In Copyright - Non-Commercial Use Permitted

Cross-layer Adaptive Bitrate Streaming
Master Thesis
Alexandru Dan
12 September, 2020
Advisors: Prof. Dr. Ankit Singla, Melissa Licciardello
Department of Computer Science, ETH Zürich

Abstract

Video traffic makes up most of today's internet traffic. Yet most transport-layer protocols are oblivious to the nature of the traffic they carry. Videos are divided into chunks and streamed at the application level over HTTP connections, so the underlying transport cannot tell whether the user is watching a video or browsing the web. We examine the opportunities that arise from designing a cross-layer adaptive bitrate streaming algorithm. Can a cross-layer algorithm improve the user experience without overwhelming competing traffic flows?

Contents

1 Introduction
2 Background
  2.1 Adaptive Bitrate Streaming
  2.2 Streaming Algorithms
    2.2.1 Rate-based approaches
    2.2.2 Buffer-based approaches
    2.2.3 Modern algorithms
  2.3 Congestion Control and ABR
    2.3.1 QUIC and BBR
    2.3.2 Minerva
  2.4 Data-centric approaches
    2.4.1 Reinforcement and Q-learning
    2.4.2 Reinforcement learning and congestion control
3 Streaming DASH segments over QUIC
  3.1 Server-side ABR Decision
  3.2 DASH.js player adaptation
  3.3 Experimental setup
4 Designing cross-layer algorithms
  4.1 Problem formulation
  4.2 WorthedAbr: VMAF-aware planning
  4.3 TargetAbr: low-liberty long-term planning
  4.4 AutoTarget: generalizing congestion control
  4.5 GapAbr: specializing congestion control
  4.6 Design space
5 Evaluation
  5.1 Experimental setup
  5.2 Competitive Scenarios
    5.2.1 Heterogeneous Videos
    5.2.2 Homogeneous Videos
    5.2.3 Multiple Flows
  5.3 Performance on Traces
  5.4 Self Fairness
  5.5 Background TCP Traffic
  5.6 Summary
6 Conclusions
Bibliography

Chapter 1 Introduction

Video traffic makes up most of today's internet traffic. Yet most transport-layer protocols are oblivious to the nature of the traffic they carry, while video players rarely take the low-level characteristics of the videos into account. In a typical setup, videos are divided into chunks and streamed at the application level over HTTP connections. However, the underlying transport cannot tell whether the user is watching a video or browsing the web, since the video player simply delivers segment after segment. This thesis examines the opportunities that arise from designing cross-layer adaptive bitrate streaming algorithms.

Since HTTP runs on top of transport-layer protocols, the performance of video players is directly affected both by network conditions and by the implementation of the transport layer. Congestion control adjusts the data transmission rate in order to avoid and recover from network congestion. We want to assess whether jointly optimizing congestion control and adaptive bitrate streaming can lead to a superior viewing experience.
We continue prior work [18], which shows that communication between these two layers is indeed beneficial. We take the concept of inter-layer communication one step further and provide algorithms that jointly optimize for a better user experience.

Contributions

We introduce a framework for making on-the-fly decisions about the quality of the served video segments directly on the server side (chapter 3), by exploiting long polling and the features of QUIC [14].

We propose a series of cross-layer algorithms (chapter 4), among which GapAbr (section 4.5) stands out by achieving better quality of experience results when running on its own (section 5.3). Furthermore, GapAbr behaves fairly towards other ABR algorithms by taking advantage of the video's structure and adapting accordingly (section 5.4). In chapter 4, we also highlight the most important design decisions, along with the implementation details and heuristics needed to achieve this outcome. Besides GapAbr, we explore a diverse set of algorithms ranging from short-sighted heuristic planning (section 4.2) to reinforcement learning (section 4.5) and control theory (section 4.5).

Finally, we evaluate both the efficiency and the fairness of our solutions by experimenting with four different videos in multiple scenarios (chapter 5). We compare our algorithms against three baselines (RobustMPC [26], Dynamic [22] and Minerva [18]). We explore the performance and fairness of GapAbr relative to these baselines in a series of diverse scenarios: isolated performance and fairness (section 5.4), performance on traces (section 5.3), competitive scenarios (section 5.2) and competition against background TCP traffic (section 5.5).

Chapter 2 Background

2.1 Adaptive Bitrate Streaming

Adaptive video streaming is a technique for streaming video and audio media over networks. The approach partitions videos into segments of fixed or variable size and uses HTTP to serve them in order.
The segments are available at different bitrates to accommodate varying network conditions. An adaptive bitrate (ABR) algorithm can change the bitrate of the fetched segments dynamically in order to improve the viewing experience. A buffer of segments is kept on the client side, and the buffered segments are consumed in order as the user watches the video. Increasing the bitrate improves the user's experience, but under varying network conditions a higher bitrate may drain the buffer completely, stalling the video player. Hence, the ABR algorithm has to balance the opportunity to download higher qualities against the risk of unwanted rebuffering events. Furthermore, experience [15] suggests that sudden changes in bitrate can degrade the perceived quality of the frames rendered during the change. Therefore, the ABR algorithm also has to avoid oscillating between qualities too often.

MPEG-DASH is the first adaptive bitrate HTTP-based streaming solution to become an international standard [4]. Alternative segments are encoded at different bitrates and cover aligned short intervals of playback time. While the MPEG-DASH client plays the content, a bitrate adaptation algorithm automatically selects the segment with the highest rate that still allows playback without stalls or rebuffering events.

There are multiple factors that users care about when watching a video [26]. Using higher bitrates yields better video quality, but at the same time increases the risk of rebuffering events: if not enough segments have been downloaded, the video stalls. The total duration of these events is called the rebuffering time. Similarly, the startup time is the time users wait at the beginning of a video before playback starts.
Finally, ABR algorithms have to take quality changes into consideration: users notice sudden variations in quality, which degrades their experience. In the literature, all of the factors above are combined into diverse Quality of Experience (QoE) metrics [26, 20, 18].

Buffer dynamics are often formulated in continuous time and can be succinctly described by the following equations from [26]:

$$ t_{k+1} = t_k + \frac{d_k(R_k)}{C_k} + \Delta t_k \tag{2.1} $$

$$ C_k = \frac{1}{t_{k+1} - t_k - \Delta t_k} \int_{t_k}^{t_{k+1} - \Delta t_k} C_t \, dt \tag{2.2} $$

$$ B_{k+1} = \max\left( \max\left( B_k - \frac{d_k(R_k)}{C_k},\ 0 \right) + L - \Delta t_k,\ 0 \right) \tag{2.3} $$

where $B_k \in [0, B_{max}]$ is the buffer occupancy when the download of the $k$-th segment starts, $t_k$ is the time at which segment $k$ starts downloading, and $d_k(R_k)$ is the size of segment $k$ encoded at rate $R_k$. The times $\Delta t_k$ represent the network latency until the server side receives the segment request, while $B_{max}$ is an upper limit often imposed on the client buffer.

Equation 2.1 describes how the download start times of the chunks evolve. Chunk $k+1$ starts downloading at $t_{k+1}$, which is the start time $t_k$ of segment $k$, plus the time $d_k(R_k)/C_k$ needed to download a segment of size $d_k(R_k)$ at the corresponding bitrate $R_k$, plus the delay $\Delta t_k$. Segment $k$ is downloaded at an average network capacity $C_k$, which equation 2.2 obtains by averaging the instantaneous capacity $C_t$ over the interval $[t_k, t_{k+1} - \Delta t_k]$. The buffer occupancy $B_{k+1}$ (equation 2.3) is obtained from the current buffer $B_k$, drained by the download time $d_k(R_k)/C_k$ of the segment and floored at zero, plus the $L$ seconds of video gained by downloading the current segment, minus $\Delta t_k$. The value $\Delta t_k$ describes the delay between the downloads of two consecutive segments, which amounts to half an RTT if the ABR runs on the client side.

QoE therefore depends on several components, such as video quality, rebuffering time, startup time and quality changes [26].
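The recursion in equations 2.1 and 2.3 is straightforward to simulate. The Python sketch below steps the buffer model one segment at a time, records the stall of each segment as $(d_k(R_k)/C_k - B_k)^+$, and scores a session with a linear QoE in the spirit of [26]: the sum of qualities minus weighted switching, rebuffering and startup penalties. It is a minimal illustration under our own assumptions, not the thesis implementation: the function names, the bitrate ladder and the weights `lam`, `mu` and `mu_s` are hypothetical placeholders, $C_k$ is passed in directly instead of being computed via equation 2.2, $\Delta t_k$ is taken as constant, and $B_{max}$ is not enforced.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Step:
    next_start: float   # t_{k+1}, equation (2.1)
    next_buffer: float  # B_{k+1}, equation (2.3)
    stall: float        # rebuffering while segment k downloads: (d_k/C_k - B_k)^+


def step(t_k: float, b_k: float, size_k: float, capacity_k: float,
         seg_len: float, delta_t: float) -> Step:
    """Advance the buffer model by one segment.

    size_k is d_k(R_k) in bits, capacity_k is C_k in bits/s,
    seg_len is L in seconds and delta_t is Δt_k in seconds.
    """
    download_time = size_k / capacity_k
    next_start = t_k + download_time + delta_t            # (2.1)
    drained = max(b_k - download_time, 0.0)               # inner max of (2.3)
    next_buffer = max(drained + seg_len - delta_t, 0.0)   # outer max of (2.3)
    stall = max(download_time - b_k, 0.0)
    return Step(next_start, next_buffer, stall)


def simulate(sizes: Sequence[float], capacities: Sequence[float],
             seg_len: float, delta_t: float,
             b0: float = 0.0, t0: float = 0.0) -> Tuple[float, float, List[float]]:
    """Apply step() to every segment; returns (final t, final B, per-segment stalls)."""
    t, b = t0, b0
    stalls: List[float] = []
    for size, cap in zip(sizes, capacities):
        s = step(t, b, size, cap, seg_len, delta_t)
        stalls.append(s.stall)
        t, b = s.next_start, s.next_buffer
    return t, b, stalls


def linear_qoe(qualities: Sequence[float], stalls: Sequence[float],
               startup: float, lam: float, mu: float, mu_s: float) -> float:
    """Sum of qualities minus weighted switching, rebuffering and startup penalties."""
    switches = sum(abs(q2 - q1) for q1, q2 in zip(qualities, qualities[1:]))
    return sum(qualities) - lam * switches - mu * sum(stalls) - mu_s * startup


if __name__ == "__main__":
    ladder = [1e6, 2e6, 4e6]                      # hypothetical bitrates R_k in bits/s
    sizes = [r * 4.0 for r in ladder]             # d_k(R_k) for L = 4 s segments
    t, b, stalls = simulate(sizes, [2e6] * 3, seg_len=4.0, delta_t=0.0, b0=4.0)
    print(t, b, stalls)                           # 14.0 4.0 [0.0, 0.0, 2.0]
    print(linear_qoe([1.0, 2.0, 4.0], stalls, startup=0.0,
                     lam=1.0, mu=4.0, mu_s=4.0))  # -4.0
```

In the example, the 4 Mbit/s segment takes 8 s to download over a 2 Mbit/s link while only 6 s of video are buffered, so the model charges 2 s of rebuffering, which dominates the QoE.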
Measuring QoE is not straightforward, because the mapping between a segment's bitrate and its perceived quality is not obvious; rather, it depends on the particular video being watched. Netflix's approach to measuring the quality of a segment is Video Multi-method Assessment Fusion, or VMAF for short. As stated in [15], VMAF is a video quality metric that combines human vision modeling with machine learning. VMAF captures differences between codecs, as well as scaling artifacts, in a way that is better correlated with perceptual quality.
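To illustrate why the bitrate-to-quality mapping matters, the Python sketch below replaces the raw bitrate in the quality term of a QoE metric with a per-segment VMAF lookup. It is only an illustration under our own assumptions: the video names, the bitrate ladder and the scores in `VMAF_TABLES` are hypothetical placeholders rather than measurements, and the helper names are invented for this example. Real scores would have to be computed for each encoded segment against its source, for example with Netflix's open-source libvmaf.

```python
from typing import Dict, Sequence

# Hypothetical bitrate (kbps) -> VMAF (0-100) tables for two videos.
# Illustrative placeholders only, not measured scores.
VMAF_TABLES: Dict[str, Dict[int, float]] = {
    "talk_show": {300: 70.0, 750: 88.0, 1200: 93.0, 2850: 96.0},
    "sports":    {300: 40.0, 750: 60.0, 1200: 72.0, 2850: 88.0},
}


def quality_term(video: str, chosen_kbps: Sequence[int]) -> float:
    """Quality reward of a session, scoring each segment by VMAF instead of bitrate."""
    table = VMAF_TABLES[video]
    return sum(table[r] for r in chosen_kbps)


def upgrade_gain(video: str, low_kbps: int, high_kbps: int) -> float:
    """Perceived-quality gain of fetching one segment at high_kbps instead of low_kbps."""
    table = VMAF_TABLES[video]
    return table[high_kbps] - table[low_kbps]


if __name__ == "__main__":
    # The same 1200 -> 2850 kbps upgrade is worth far more on the sports clip.
    print(upgrade_gain("talk_show", 1200, 2850))  # 3.0
    print(upgrade_gain("sports", 1200, 2850))     # 16.0
```

With a bitrate-based quality term both videos would reward this upgrade equally; a VMAF-based term lets an ABR algorithm spend bandwidth where the perceptual gain is largest.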