Optimized Bitrate Ladders for Adaptive Video Streaming with Deep Reinforcement Learning
Total Page:16
File Type:pdf, Size:1020Kb
Optimized Bitrate Ladders for Adaptive Video Streaming with Deep Reinforcement Learning ∗ Tianchi Huang1, Lifeng Sun1,2,3 1Dept. of CS & Tech., 2BNRist, Tsinghua University. 3Key Laboratory of Pervasive Computing, China ABSTRACT Transcoding Online Stage Video Quality Stage In the adaptive video streaming scenario, videos are pre-chunked Storage Cost and pre-encoded according to a set of resolution-bitrate/quality Deploy pairs on the server-side, namely bitrate ladder. Hence, we pro- … … pose DeepLadder, which adopts state-of-the-art deep reinforcement learning (DRL) method to optimize the bitrate ladder by consid- Transcoding Server ering video content features, current network capacities, as well Raw Videos Video Chunks NN-based as the storage cost. Experimental results on both Constant Bi- Decison trate (CBR) and Variable Bitrate (VBR)-encoded videos demonstrate Network & ABR Status Feedback that DeepLadder significantly improvements on average video qual- ity, bandwidth utilization, and storage overhead in comparison to Figure 1: An Overview of DeepLadder’s System. We leverage prior work. a NN-based decision model for constructing the proper bi- trate ladders, and transcode the video according to the as- CCS CONCEPTS signed settings. • Information systems → Multimedia streaming; • Computing and solve the problem mathematically. In this poster, we propose methodologies → Neural networks; DeepLadder, a per-chunk video transcoding system. Technically, we set video contents, current network traffic distributions, past KEYWORDS actions as the state, and utilize a neural network (NN) to deter- Bitrate Ladder Optimization, Deep Reinforcement Learning. mine the proper action for each resolution autoregressively. Unlike the traditional bitrate ladder method that outputs all candidates ACM Reference Format: at one step, we model the optimization process as a Markov Deci- Tianchi Huang, Lifeng Sun. 2020. Optimized Bitrate Ladders for Adaptive Video Streaming with Deep Reinforcement Learning. In Annual confer- sion Process (MDP), which can effectively reduce the action space. ence of the ACM Special Interest Group on Data Communication on the DeepLadder employs Dual-PPO [12], the state-of-the-art deep re- applications, technologies, architectures, and protocols for computer com- inforcement learning method, to learn the policy by interacting munication (SIGCOMM ’20 Demos and Posters), August 10–14, 2020, Vir- with the environments without any presumptions, attempting to tual Event, USA. ACM, New York, NY, USA, Article 4,3 pages. https: balance a variety of goals such as maximizing perceptual video //doi.org/10.1145/3405837.3411387 quality, improving bandwidth utilization, and reducing the storage overhead. 1 INTRODUCTION We implement DeepLadder-CBR and DeepLadder-VBR for deal- The Global Internet Phenomena Report COVID-19 Spotlight [9] ing videos with different encoding types (i.e. constant bitrate (CBR)- reveals that at least 51.43% of the video streams are delivered by encoded and variable bitrate(VBR)-encoded videos). Using trace- adaptive video streaming technologies. In such technologies, on driven experiments, first, comparing DeepLadder-CBR with ex- the server-side, video streams are pre-chunked and pre-encoded isting methods under various networks and videos, we observe at various bitrates or quality levels. On the client-side, users often that DeepLadder improves the average quality by 8.49%-14.25%, en- adopt adaptive bitrate (ABR) algorithms to pick the proper bitrates hances the bandwidth utilization by 3.66%-11.68%, and reduces the for chunks to ensure the quality of experience (QoE). We often storage cost by 39.77%-54.84%. Further, the results on DeepLadder- called the server-side encoding settings as bitrate ladder [1]. VBR also illustrate the significant improvements in the average The majority of existing bitrate ladder optimization schemes quality (3.8%-7.7%) and the storage costs (21.26%-44.88%). Finally, often use fixed encoding ladders (known as one-size-fits-all [3]), we demonstrate that DeepLadder is practical and can be deployed or only consider video contents [2] or network conditions [7], into the transcoding framework. Testing results on both CPU and GPU devices indicate that DeepLadder achieves the aforementioned * Corresponding Author. {htc19@mails, sunlf@}tsinghua.edu.cn. Permission to make digital or hard copies of part or all of this work for personal or performance with only 3.4% on extra costs. classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation 2 SYSTEM MECHANISM on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). Overview. As illustrated in Figure1, DeepLadder’s basic system SIGCOMM ’20 Demos and Posters, August 10–14, 2020, Virtual Event, USA work-flow is mainly composed of the transcoding stage andthe © 2020 Copyright held by the owner/author(s). ACM ISBN 978-1-4503-8048-5/20/08...$15.00 online stage. In the transcoding stage, we leverage an NN-based https://doi.org/10.1145/3405837.3411387 decision model for constructing the proper bitrate ladders and use Nf-Greedy Pen-Set Nf-Greedy Pen-Set RMPC-Set DeepLadder RMPC-Set DeepLadder Table 1: Real-world implementation of DeepLadder. 75 75 Video Content Inference Transcoding Extra 50 50 Process 25 25 Average Value Average Value Extraction Timespan Time Cost (%) 0 0 Video Quality Bandwidth Storage Video Quality Bandwidth Storage CPU(ms) 439 3 12060 3.67 (VMAF) Utilization (%) Overhead(50KB) (VMAF) Utilization (%) Overhead(50KB) GPU(ms) 94 12 3407 3.11 Figure 2: DeepLadder-CBR vs. existing approaches. ⊲ RobustMPC-Settings [13] (RMPC-Set): an another fixed encoding ladder settings with {300, 700, 1000, 1500, 2000, 3000}kbps. X265 DeepLadder,w=0 X265 DeepLadder,w=0 Furthermore, we select the following baselines for the VBR- Fixed5 DeepLadder,w=1 100 Fixed5 DeepLadder,w=1 100 encoded video scenario: 50 50 ⊲ Fixed-25 [5]: recent studies find that using the CRF value of Average Value Average Value 25 can provide good viewing quality. 0 0 Video Quality Bandwidth Storage Video Quality Bandwidth Storage ⊲ X265 [11]: the default CRF is fixed as 28 [8]. (VQAlog) Utilization (%) Overhead(50KB) (VQAlog) Utilization (%) Overhead(50KB) Trace-driven Evaluation. As shown in Figure2, we test the Figure 3: DeepLadder-VBR vs. basselines. performance of each scheme by the video under the HSDPA and FCC network environment and report the average video qual- ity, bandwidth utilization and storage overhead. We can see that the assigned settings to transcode the video for each chunk. In the DeepLadder outperforms previously proposed methods, with the online stage, the encoded chunks are deployed on the server and improvements on i) average video quality of 8.49% - 12.94% on the wait for fetching. The NN instantly obtains the video quality and HSDPA dataset and 8.78%-14.25% on the FCC dataset; ii) 3.66%- storage from the transcoding stage, as well as periodically receives 11.68% in terms of the bandwidth utilization on HSDPA dataset. the network status feedback from the online stage. Meanwhile, DeepLadder reduces the storage overhead by 39.77%- Details. We mainly categorize the DeepLadder’s state input 54.84% compared with the baselines. Figure3 illustrates the re- into four parts, that is, current network traffic distributions, video sults for the Oboe and HDFS network conditions. As expected, features, past actions, and desired reward weights F. DeepLadder DeepLadder-VBR (F = 0)’s bandwidth utilization is within 8.37%- is allowed to support diversity of coding methods. For the CBR- 15.03% compared with fixed encoding ladder schemes. Moreover, encoded videos, we set the the action 08 = ¹#8−1 ¸ #8 º/2, where DeepLadder-VBR (F = 1) effectively preserves the average video #8 represents the i-th throughput categorized by the bandwidth quality and reduces storage cost by 29.8% on the Oboe traces, 21.26%- histogram method. For the VBR-encoded videos, we set the action 44.88% on the HDFS network. size of DeepLadder-VBR as 20, symbolizing that the videos will be Practical Implementation. We deploy the proposed scheme encoded by Constant Rate Factor (CRF)= 20 − 39. Such settings as a service into the transcoding frameworks and list the time have already covered the best quality for most videos. The reward consumption for each process in Table1, tested on an Intel Core function is listed in Eq.1, given a network condition 퐶, we aim to i7 CPU2.2GHz and an Nvidia 1080Ti GPU. In particular, in the maximize the bandwidth utilization, the average video quality and GPU test, we use the FFmpeg’s integration of NVIDIA Video Codec minimize the storage cost. Here * is the bandwidth utilization, & SDK that enables high-performance hardware-accelerated video is the average video quality and (I is the storage cost, F is a scalar pipelines [10]. We show that DeepLadder (in gray) only consumes which controls the importance of the storage metric. Moreover, 3.11%-3.67% on the extra cost compared with the original system. we use Video Multimethod Assessment Fusion (VMAF) [6] as the quality metrics and measure the quality metric for each video chunk 4 SUMMARY AND FUTURE WORK with all the resolution and encoding bitrate tracks. We present DeepLadder, a novel bitrate ladder construction sys- tem which utilizes state-of-the-art NN-based model to build the Õ Õ Õ proper bitrate encoding profile. DeepLadder’s NN takes current AC = * ¹0C ,퐶º ¸ & ¹0C º − F (I ¹0Cº/C. (1) C C C raw video contents, network traffic capacities, and the storage over- head into the input. We adopt state-of-the-art DRL method to train DeepLadder w.r.t a large corpus of internet network traces and col- 3 PRELIMINARY RESULTS lected videos. Experimental results on both CBR and VBR-encoded Experimental Setup. To better simulate the diversity of real- videos indicate the superiority of DeepLadder over existing ap- world network conditions, we collect over 3,000 network traces, proaches.