
ANT: Learning Accurate Network Throughput for Better Adaptive Video Streaming

Jiaoyang Yin, Yiling Xu, Hao Chen, Yunfei Zhang, Steve Appleby, Zhan Ma

arXiv:2104.12507v2 [cs.MM] 5 May 2021

ABSTRACT
Adaptive bit rate (ABR) decision plays a crucial role for ensuring satisfactory Quality of Experience (QoE) in video streaming applications, in which past network statistics are mainly leveraged for future network prediction. However, most algorithms, either rules-based or learning-driven approaches, feed throughput traces or classified traces based on traditional statistics (i.e., mean/standard deviation) to drive the ABR decision, leading to compromised performance in specific scenarios. Given the diverse network connections (e.g., WiFi, cellular and wired links) from time to time, this paper thus proposes to learn the ANT (a.k.a., Accurate Network Throughput) model to characterize the full spectrum of network throughput dynamics in the past for deriving the proper network condition associated with a specific cluster of network throughput segments (NTS). Each cluster of NTS is then used to generate a dedicated ABR model, by which we wish to better capture the network dynamics for diverse connections. We have integrated the ANT model with an existing reinforcement learning (RL)-based ABR decision engine, where different ABR models are applied to respond to the accurate network sensing for better rate decisions. Extensive experimental results show that our approach can significantly improve the user QoE by 65.5% and 31.3% respectively, compared with the state-of-the-art Pensieve and Oboe, across a wide range of network scenarios.

CCS CONCEPTS
• Information systems → Multimedia streaming; • Computing methodologies → Neural networks.

KEYWORDS
Adaptive Video Streaming, Neural Networks, Throughput Learning, Condition Recognition

1 INTRODUCTION
Recent years have witnessed an exponential increase of HTTP-based video streaming traffic volume [6]. To assure high-quality service provisioning, an adaptive bit rate (ABR) policy is devised to select appropriate video chunks for combating the network dynamics (e.g., congestion, throughput fluctuation, etc.) from time to time to achieve satisfactory QoE for heterogeneous connections.

Prevalent ABR algorithms, i.e., either rules-based or recently emerged learning-based approaches [1, 2, 4, 5, 8, 10–19, 21, 23–26, 29–37], typically run on the client side to perform the video bit rate decision through client-side playback buffer occupancy observation and/or network throughput (bandwidth) estimation. Since existing ABR methods typically model the average of different network throughput traces for bit rate prediction using a fixed control mechanism or very limited neural models, they mostly overlook the significant variations of network connections from time to time and from device to device. For example, Pensieve [24], a pioneering learning-based ABR method, characterized the averaged network behavior from a number of different network traces to drive a single neural bit rate decision model. Recently, Oboe [1] suggested two ABR decision models for rate adaptation to cover two different bit rate ranges respectively, where each ABR decision model was trained using corresponding traces belonging to the same bit rate range.

Simply averaging network throughput is not sufficient to represent its variations for any connection. Thus we propose the ANT model to learn accurate network throughput dynamics for better adaptive video streaming. Towards this, we first attempt to classify the network into multiple (e.g., five) clusters by applying K-means to analyze the characteristics of network traces, by which each cluster of NTS, a.k.a., network condition, is assumed to represent a typical network connection having its discriminative behavior. Then, we map past NTS to a specific network condition by quantitatively measuring its feature vector, where we leverage a deep neural network (DNN) to learn temporal dynamics instead of simply using the mean and/or standard deviation (std) of NTS [1]. In the meantime, we further use clustered NTS (as of different network conditions) from network traces to train corresponding neural models for ABR decision, with which we can leverage multiple neural ABR models to accurately represent diverse network connections. In this work, we train a well-known and emerging RL-based ABR decision engine to better select the bit rate for improved QoE [1, 3, 24].

Contributions. 1) We suggest the ANT model to classify incoming NTS into one of five discriminative network conditions to drive the corresponding neural ABR decision model, by which we can better and accurately represent the temporal network throughput dynamics that are often overlooked in literature.
2) Instead of simply applying the mean and/or std of incoming NTS to recognize different network conditions, we rely on DNNs to accurately characterize the feature representation of NTS dynamics over time, by which we could better differentiate typical scenarios of network connections.
3) We demonstrate our ANT model using various publicly accessible network traces and alternative Tencent traces obtained from the large-scale Tencent video hosting platform, achieving noticeable QoE improvements of 65.5% and 31.3% respectively against the state-of-the-art Pensieve [24] and Oboe [1].

2 BACKGROUND AND MOTIVATION
Various ABR algorithms are adopted in popular video streaming applications. They mostly collect network throughput [16, 21, 25, 31] and receiver states (e.g., buffer status, packet loss, etc.) [14, 15, 26, 30] in the past for future rate adaptation.

In reality, it is hard for conventional heuristic methods to sufficiently cover network variations. Thus, recent years have witnessed the growth of learning-based approaches, in which a well-known RL-based neural engine is often deployed to learn a single model to represent diverse network behaviors from a great number of public network traces [3, 24].

In this way, the learned neural model has to compromise among all possible network conditions, typically leading to acceptable but not the best performance.

[Figure 1: Necessity for network throughput learning. (Throughput in Mbps and instant QoE over time for Pensieve, Oboe and ANT; time slots with similar mean/std are marked by red dashed lines.)]

In order to deal with various network conditions during transmission, [1] proposed Oboe, which can auto-tune a video ABR algorithm to network conditions. Oboe detects changes in network states/conditions using Bayesian change point detection algorithms based on the average and std of throughput, and adjusts ABR parameters according to the detection results. Benefiting from the auto-tuning mechanism, ABR algorithms can adapt to complex network conditions and obtain optimal performance in the QoE metric and underlying metrics. However, Oboe essentially detects and recognizes network condition changes based on the average and std of throughput, which may not be enough to accurately represent dynamic and complex network conditions. In this way, the algorithm will not auto-select appropriate ABR parameters. As shown in Fig. 1, throughput trends with similar average and std values appear during the time slots between the red dashed lines. When adopting a fixed ABR model or distinguishing throughput trends according to the average or std value, subtle differences in throughput trends may not be recognized and the appropriate ABR model cannot be chosen in time, hence incurring QoE fluctuation. On the contrary, if network throughput is analyzed and learned accurately, as the proposed ANT shows, the appropriate ABR model is chosen in time and QoE fluctuation is prevented to some extent. To this end, it is necessary to establish a mechanism to learn accurate network throughput and recognize comprehensive changes in throughput trends.

3 LEARNING ANT MODEL AND ITS APPLICATION TO ABR DECISION

3.1 Architecture Overview
The overall architecture of ANT-powered adaptive video streaming is shown in Fig. 2. It consists of a network condition learning module and a condition-wise ABR decision module.

[Figure 2: Overall architecture of ANT-powered adaptive video streaming. (A media server hosting ABR models 1..n, a network condition learning module, and a client-side video player exchanging network statistics and video chunks through a content delivery network.)]

The input of the network condition learning module is the historical throughput data obtained at the client side. Then a Convolutional Neural Network (CNN) is utilized to automatically extract features of the historical network data and output recognition results of future network conditions. The ABR decision module adopts various states, such as historical bit rate, current buffer size, past chunk throughput, past download time and future chunk size, as inputs. These states are fed into one of multiple neural network models trained under various network conditions, based on the recognition result of the network condition learning module, to make the ABR decision. The decisions made by the neural network will affect the video streaming environment, and in reverse, the environment will feed the changed states back into the ABR decision module. The overall procedure of the proposed architecture can be formulated as follows:

    condition = f_1D-CNN(throughput_historical)    (1)

    action = f_ABR(state_network, state_player, condition)    (2)

3.2 Network Condition Learning Module
The network condition learning module we construct is based on a CNN, which has the ability to extract comprehensive features of input data automatically. We propose a powerful CNN framework to analyze historical throughput data and learn the current network condition for assisting future model selection. Historical network throughput data is adopted as the input of this module in both the training and testing phases.

Due to the lack of labels in current network traces to indicate real network conditions, training and validating this CNN is difficult. To tackle this issue, we propose a trace aggregation mechanism to distinguish network traces based on the data feature in the temporal dimension, as shown in Fig. 3.

[Figure 3: Illustration of the trace aggregation mechanism. (Original network traces are split into t-second segments and clustered by K-means; a trace is labeled with a certain condition if the dominant cluster reaches a percentage threshold, otherwise it is marked "uncertain".)]
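As an illustration, the segment-and-label pipeline of the trace aggregation mechanism can be sketched as follows. This is a minimal sketch under our own assumptions (one throughput sample per second, scikit-learn's `KMeans`); the function name `aggregate_traces` is ours, not from the paper:

```python
import numpy as np
from sklearn.cluster import KMeans

def aggregate_traces(traces, seg_len=20, k=5, h=2/3):
    """Split each throughput trace into seg_len-second segments,
    cluster all segments with K-means, then label every trace by its
    most frequent segment cluster ("uncertain" if below threshold h)."""
    # Collect fixed-length segments from all traces (1 sample/s assumed).
    segments, owners = [], []
    for i, trace in enumerate(traces):
        for s in range(0, len(trace) - seg_len + 1, seg_len):
            segments.append(trace[s:s + seg_len])
            owners.append(i)
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(
        np.asarray(segments))
    owners = np.asarray(owners)
    # Majority vote per trace, with an "uncertain" fallback.
    trace_labels = []
    for i in range(len(traces)):
        counts = np.bincount(labels[owners == i], minlength=k)
        if counts.max() / counts.sum() >= h:
            trace_labels.append(int(counts.argmax()))
        else:
            trace_labels.append("uncertain")
    return trace_labels
```

With the paper's settings (20-second segments, k = 5, h = 2/3), a trace whose dominant segment cluster covers at least two thirds of its segments receives that cluster as its condition label.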

Original network traces are firstly split into several segments which contain the temporal information of t seconds. Then K-means [22], a typical clustering algorithm, is adopted to achieve segment aggregation according to the similarity of temporal features within each t-second segment. After the K-means operation, throughput segments from various network traces are aggregated into k clusters, and segments with similar network conditions are classified into the same cluster. Eventually, the whole network trace can be labeled by the most frequent type of network condition among all of its component trace segments. It should be noted that if this percentage (frequency) does not reach a predefined threshold h, "uncertain" will be marked for this trace. With the network condition labels tagged for the throughput data, these network traces can be fed into the CNN to conduct training and testing.

With the indication of the condition labels, the neural network is trained to recognize the condition of the next NTS by extracting and analyzing shallow and abstract features from the input historical data. The network condition is then used to drive the ABR decision module to choose the matching ABR model for rate adaptation. The framework of the proposed neural network is shown in Fig. 4.

[Figure 4: Framework of network condition learning module. (Normalized historical throughput data passes through three multi-scale 1D convolutional layers (3×1, 5×1 and 7×1 kernels) with channel-wise concatenation, channel shuffle, a Squeeze-and-Excitation operation (average pooling, ReLU and sigmoid activations) and residual connections, then through a fully-connected network that outputs the network condition.)]

The backbone is a 1D-CNN, which is suitable for tackling high-dimensional network throughput data. The input of the neural network is the historical NTS and the output is the recognition result indicating the network condition of the next NTS. In the neural network, three convolutional layers with the same architecture but different hyper-parameters, including the size of the convolutional kernel and the number of output channels, are devised to extract hierarchical features.

In order to improve the capability of feature extraction and the accuracy of condition recognition, we add several optimized operations to the backbone network. 1) We devise a multiple-perceptual-field mechanism by using multi-scale convolutional kernels in each convolutional layer to extract multi-scale features in the network throughput data. Specifically, we use three types of convolutional scales in each layer, including 3×1, 5×1 and 7×1 kernels, and then concatenate the features of the different convolution scales in the channel dimension. 2) We add a Squeeze-and-Excitation (SE) module [9] in the network backbone to assign different weights to the corresponding channels based on their degrees of contribution to the final results. The SE operation is conducted through 1D global pooling, fully-connected layers and non-linear activation functions (Rectified Linear Unit and Sigmoid). After the SE operation, the weight of each channel is obtained and these values are multiplied with the corresponding channels of the original signal to achieve channel weighting. 3) We add a residual structure [7] by transmitting shallow features to deeper layers directly and adding the shallow features to the abstract features. This has been proved to address the problems of shallow-layer feature submerging and gradient vanishing/explosion when the CNN becomes deeper. 4) We adopt a channel shuffle operation to disturb the original order of the concatenated feature channels, which are obtained through the various convolution scales, for better generalization capability of the neural model. Feature channels are firstly divided into three equal groups. Then the feature matrix is reshaped, transposed and reshaped again to make the feature channels shuffled. 5) We also perform mean-std standardization on the input throughput data and batch normalization on the data between layers at the sample level. With these two normalization operations, the training process can be accelerated and the neural network can converge in fewer epochs. What's more, dropout is used in the fully-connected layers to avoid over-fitting in the training process.

3.3 Condition-wise ABR Decision Module
The condition-wise ABR decision module is constructed with a deep reinforcement learning (RL) neural network. It is deployed to make bit rate decisions using one of the trained ABR models indicated by the condition learning module mentioned above. The goal of the ABR decision module is to achieve the maximum reward signal, which reflects the video QoE of users. In the training phase, the agent obtains a variety of observations from the video streaming environment, including network statistics (i.e., bandwidth or throughput) and player states at the client side (i.e., buffer occupancy, bit rate and so on), and inputs them as states into the RL neural network. The neural network produces actions based on these input states. Afterward, the decisions made by the neural network will change the states and obtain the corresponding reward from the environment. The RL neural network is trained with different states to achieve a higher video QoE for users in the future. When the training reaches convergence, the neural network can enter the working phase, which makes decisions based on observations of the environment. The architecture of the condition-wise ABR decision module is shown in Fig. 5.

[Figure 5: Architecture of condition-wise ABR decision module. (Network states (bandwidth/throughput) and client states (bit rate, buffer occupancy) feed an RL neural-network agent that selects among bit rates 1..n; the reward is calculated from the QoE observed at the media player.)]

We conduct the training phase under various network conditions in order to generate several ABR models. Every once in a while, the ABR model corresponding to a certain network condition is selected to make the bit rate decision according to the recognition result of the condition learning module.

The RL algorithm we use in the ABR decision module is Asynchronous Advantage Actor-Critic (A3C) [27], a state-of-the-art actor-critic method. The RL neural network consists of two parts: an actor network, which is used to make bit rate decisions for each video chunk, and a critic network, which is used to evaluate the performance of the decisions made by the actor network. The inputs adopted for the reinforcement learning agent (RL-agent) include historical throughput, download time, future chunk size, current buffer size, last chunk bit rate and the number of chunks left. The policy gradient method is adopted to train the critic network and the actor network. The RL-agent takes an action a_t that corresponds to the bit rate for the next chunk when receiving a state s_t. After applying each action to the client-side player, the environment provides the RL-agent with a reward r_t for that video chunk. In this paper, the reward is set according to the QoE metric proposed in MPC [34] to reflect the performance of each chunk download. The primary goal of the RL-agent is to maximize the expected cumulative reward received from the environment.

4 IMPLEMENTATION OF ANT

4.1 Introduction of Network Traces
Since it is time-consuming to "experience" video downloads in a real-world streaming environment, we use simulations over a wide range of network traces in the training and testing phases, including traces collected from public datasets (the broadband dataset provided by FCC, the 3G/HSDPA mobile dataset collected in Norway, 4G/LTE bandwidth traces from Belgium, traces from Oboe and traces from the ACM Live Streaming platform) and traces collected online from the Tencent video platform (WiFi network traces and 3G/4G network traces). The network trace files contain time information (seconds) and the corresponding throughput (bandwidth) information (Mbit per second). We randomly divide all 2658 traces into a training set and a testing set by the proportion of 80% and 20% respectively. NTS of 20-second duration are provided for trace aggregation and the classification threshold mentioned above is set as 2/3. Correspondingly, throughput learning is conducted every 20 seconds.

4.2 Design of Confidence Mechanism
In the trace aggregation mechanism, there may be some traces of uncertain condition due to not reaching the threshold, as shown in Fig. 3. In order to tackle these inaccurate recognitions, we train a general ABR model under all the various network traces, and a dedicated "uncertain" model under the traces of the uncertain condition type. To be specific, when the condition learning model outputs a different result compared with that of the last period, we do not conduct model switching immediately. This is because the difference may be incurred by inaccurate recognition results (as the recognition accuracy is not 100%, see Sec. 5.2) instead of a real network condition switch. In this case, switching the ABR model may incur inappropriate model selection, which may significantly degrade the ABR performance. Thus, we propose a confidence mechanism based on a sliding window in the implementation phase. Condition learning is conducted every 20 seconds and the recognition result at each step is queued into a sliding window. The current result is admitted only if the decision is equal to two out of the past three decisions. Otherwise, the "uncertain" ABR model is selected for the following 20-second NTS's ABR decision. In addition, when the video streaming system runs in the initial period (i.e., the first 60 seconds), during which there is not enough historical throughput data to conduct condition learning, the general ABR model is selected until the condition of the confidence mechanism is met. The confidence mechanism is illustrated in Fig. 6.

[Figure 6: Illustration of confidence mechanism. (Recognition results queued in 20-second slots; the ABR model switches only when the new result matches two of the past three decisions, otherwise the uncertain or general model is used.)]
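The sliding-window rule can be sketched as follows; this is our own minimal reading of the mechanism (one 20-second recognition step, a three-step warm-up standing in for the initial 60 seconds), with hypothetical model identifiers:

```python
from collections import deque

class ConfidenceSelector:
    """Sliding-window confidence check over per-step recognition results.

    A newly recognized condition is admitted only if it equals at least
    two of the past three recognition results; otherwise the dedicated
    "uncertain" model is used. During warm-up (first 60 s, i.e. three
    20-second steps) the general model is selected.
    """

    def __init__(self, warmup_steps=3):
        self.window = deque(maxlen=3)   # past three recognition results
        self.warmup_steps = warmup_steps
        self.steps = 0

    def select(self, condition):
        self.steps += 1
        if self.steps <= self.warmup_steps:
            choice = "general"          # not enough history yet
        elif list(self.window).count(condition) >= 2:
            choice = condition          # change (or persistence) confirmed
        else:
            choice = "uncertain"        # possibly a mis-recognition
        self.window.append(condition)
        return choice
```

For example, after a stable run of condition 1, a single spurious recognition of condition 2 falls back to the "uncertain" model instead of switching; only a persistent change is admitted, matching the behavior sketched in Fig. 6.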

4.3 More Details of Implementation
We implement the RL-based ABR decision module and the CNN-based condition learning module using TensorFlow. The bit rates of the video content we choose include 135 Kbps, 340 Kbps, 835 Kbps, 1350 Kbps and 2640 Kbps. The actor network learning rate is set as 0.0001 and the critic network learning rate is set as 0.001. The entropy weight of the actor network is set as 0.5. The actor network and critic network both include one hidden 1D-CNN layer with kernel size 4 and 128 output channels, followed by a fully-connected network with 128 neurons. The number of RL-agents is set as 16. We consider the past 8 states observed from the environment, which are normalized before being input into the ABR agents. The rebuffering penalty and smoothness penalty are set as 2.64 and 1 respectively. For the neural network in the condition learning module, the three types of CNN filters are 3×1, 5×1 and 7×1 respectively. The number of output channels of each CNN layer is 64×3, 128×3 and 256×3 respectively. The kernel size we choose in the pooling layers is 2×1. The first layer of the fully-connected network has 256 neurons and 128 neurons are contained in the second layer. For all convolution and pooling operations, we set the stride to 1 and add padding to maintain the data width. As for other hyper-parameters, we set the learning rate as 0.0001 and the batch size as 80 in the training phase. The device we use to train and test the neural networks is an Ubuntu 16.04 server with an Intel Xeon CPU E5-2683 v4 @ 2.10GHz and an Nvidia GeForce GTX 1080Ti 11G GPU.

5 EXPERIMENT RESULTS AND ANALYSIS

5.1 Network Trace Clustering
Before training the various ABR models under network traces of different conditions, the traces are split into 20-second segments and feature aggregation is conducted in the segment dimension as described in Sec. 3.2. The number of clusters k is critical when running the K-means algorithm. To determine the most appropriate value of k considering all segments, we conduct K-means clustering with the parameter k varying from 2 to 8, and then make a comparison under the metrics of Sum of the Squared Errors (SSE) and Davies-Bouldin Index (DBI). The trends of these indicators with k are shown in Fig. 7.

[Figure 7: Trends of indicators with k values. ((a) Trend of SSE with k values; (b) Trend of DBI with k values.)]

The SSE value becomes gradually smaller as the number of clusters k increases. When k approaches the most appropriate value, the downward trend (slope) becomes slower until convergence. On the contrary, DBI calculates the ratio of the degree of separation between clusters to the degree of aggregation within a cluster. Therefore, the DBI value increases gradually as the number of clusters increases, until convergence. In view of these two indicators, we find the turning point where the trend slows down occurs at k = 5. Thus, we set the number of clusters as 5 for trace segments, and 6 for whole traces considering the extra "uncertain" condition after applying the proposed confidence mechanism.

5.2 Network Condition Recognition
In order to recognize network conditions based on historical throughput data, we use the clustered trace segments to train the condition learning module. The length of the input historical throughput data we adopt is 20 seconds. We choose 80% and 20% of all throughput data respectively as training data and validation data. We conduct the training process until the validation accuracy reaches convergence. After 100 epochs, the validation accuracy converges to 98.56%. This proves that throughput data of 20-second duration contains adequate information, based on which the condition learning module can recognize almost all the network conditions accurately.

With the capability of recognizing the network condition accurately, and the assistance of the confidence mechanism, the network condition learning module can drive ABR model switching for better bit rate decisions based on historical throughput data.

5.3 ABR Decision and Final QoE
In this part, we evaluate the performance of ANT for bit rate adaptation on the QoE metric and its individual components, including bit rate utility (Mbps), rebuffering penalty (seconds) and smoothness penalty (Mbps), under diverse network traces. Two state-of-the-art ABR algorithms, i.e., Pensieve, an RL-based ABR algorithm with one general model, and Oboe, an ABR algorithm with an auto-tuning mechanism according to network conditions, are adopted for performance comparison. For the Oboe algorithm, we divide all network traces into 5 parts according to the average value of throughput, i.e., 0-3 Mbps, 3-6 Mbps, 6-9 Mbps, 9-12 Mbps and over 12 Mbps. Then a similar training (to Sec. 3.3) is conducted to generate ABR models corresponding to the throughput conditions. In addition, we evaluate their performance under another set of network traces collected from the Tencent video platform. The results are shown in Fig. 8 to Fig. 10.

As shown in Fig. 8, we calculate the average values of the final QoE and its individual components for each chunk under all public traces. ANT achieves 1.575 on the final QoE metric and 1.88 Mbps in average bit rate for each video chunk, which are 65.5% and 25.3% higher respectively than the Pensieve algorithm. In addition, ANT rivals Pensieve in terms of average rebuffering time and achieves much lower average bit rate fluctuations (higher smoothness) than Pensieve. On the other hand, ANT achieves 31.28% and 16.05% improvement in view of average QoE and average bit rate, compared with Oboe's auto-tuning mechanism. Similarly, ANT shows fewer bit rate fluctuations. Although Oboe performs slightly better than the proposed ANT on average rebuffering time under the public traces, this is acceptable because of the much better performance of ANT in view of the final QoE, bit rate utility and bit rate fluctuation.

For the evaluation under the Tencent network traces, the results are also shown in Fig. 8. ANT achieves 1.79 on the final QoE metric and 2.165 Mbps in average bit rate for each video chunk, which are 25% and 5.4% higher than Pensieve respectively.
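The QoE components evaluated here combine linearly, following the MPC-style metric that the reward in Sec. 3.3 is based on. Below is our own minimal sketch of that standard linear form, using the penalty weights listed in Sec. 4.3 (2.64 for rebuffering, 1 for smoothness); the function names are ours, not from the paper:

```python
def chunk_qoe(bitrate, prev_bitrate, rebuffer,
              rebuf_penalty=2.64, smooth_penalty=1.0):
    """Linear per-chunk QoE: bit rate utility (Mbps) minus rebuffering
    (seconds) and smoothness (Mbps of bit rate switch) penalties."""
    return (bitrate
            - rebuf_penalty * rebuffer
            - smooth_penalty * abs(bitrate - prev_bitrate))

def average_qoe(bitrates, rebuffers):
    """Average per-chunk QoE over one streaming session."""
    prev = bitrates[0]
    total = 0.0
    for b, r in zip(bitrates, rebuffers):
        total += chunk_qoe(b, prev, r)
        prev = b
    return total / len(bitrates)
```

Under this form, a chunk downloaded at 1 Mbps after a 2 Mbps chunk with 0.5 s of rebuffering scores 1.0 − 2.64 × 0.5 − 1.0 = −1.32, which is how a few seconds of stalling or an abrupt bit rate switch can outweigh the raw bit rate utility in the averages reported above.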

In terms of average rebuffering time and average bit rate fluctuations, ANT outperforms Pensieve as well. Compared to Oboe's auto-tuning mechanism, ANT achieves 12.44% and 3.24% improvement in terms of average QoE and average bit rate, with less rebuffering time and fewer bit rate fluctuations throughout the whole video streaming process.

We further calculate the CDF (Cumulative Distribution Function) of the final QoE values and the corresponding individual components under both the public traces and the Tencent traces; the results are shown in Fig. 9 and Fig. 10 respectively. The CDF curves also illustrate consistent results. We can find that ANT achieves better and more robust performance than the existing state-of-the-art algorithms (i.e., Pensieve and Oboe) under various network conditions.

[Figure 8: Final performance comparison under both public traces and Tencent traces. (Average QoE, average bit rate, rebuffering penalty and smoothness penalty for Pensieve, Oboe and ANT.)]

[Figure 9: Final CDF curve under public traces. (CDFs of QoE, bit rate, rebuffering and smoothness.)]

[Figure 10: Final CDF curve under Tencent traces. (CDFs of QoE, bit rate, rebuffering and smoothness.)]

5.4 Performance Stability Analysis
Besides a good average QoE, it is necessary to maintain low instant-QoE fluctuations across adjacent video chunks. Users expect to enjoy a stable instant QoE during video streaming. ANT can acquire more stable QoE performance in the face of highly dynamic network conditions. We illustrate the stability performance of ANT and the other ABR algorithms in Fig. 11 and Fig. 12.

[Figure 11: CDF curve for the standard deviation of QoE. ((a) Public Dataset; (b) Tencent Dataset.)]

Fig. 11 shows the CDF curves for the standard deviation of QoE under the public traces and the Tencent traces. ANT achieves stability close to Pensieve and Oboe under the public traces (with better stability on about 90% of traces), but outperforms them on the metric of average QoE and its individual components. For the Tencent traces, ANT acquires better stability than the other algorithms under almost all network conditions, and performs better on the average QoE metric and its individual components at the same time. In order to demonstrate stability directly, we also present the details of QoE fluctuations at the chunk level in Fig. 12. We can find that ANT consistently outperforms the other algorithms throughout the video chunk streaming. This is because ANT can recognize the change of network conditions (as the black arrows indicate in Fig. 12) and perform ABR model switching to maintain a high instant QoE. In contrast, Pensieve cannot tune its model parameters at all and Oboe cannot detect these network condition changes, so they face a significant performance degradation of instant QoE.

[Figure 12: Fluctuation comparison of QoE performance. (Per-chunk QoE of Pensieve, Oboe and ANT on several example traces.)]

6 DISCUSSION
In this paper, ANT is proposed in an RL-based ABR architecture to tackle more complex and dynamic network conditions. However, there still exist some problems to be dealt with in future work.

Performance on more complex network conditions. The number of network conditions that our condition learning module can recognize is limited. Although the network traces we use in this paper cover a wide range of conditions, ANT may encounter more complex network conditions in real transmission environments. In this situation, the condition learning module may output inaccurate results that cause inappropriate model selection in the following bit rate decisions. Moreover, new throughput data may not contain network condition labels, which brings difficulty for neural network training. Nevertheless, we additionally devise a general ABR model and a dedicated "uncertain" ABR model in ANT to cover the above-mentioned network conditions for acceptable QoE.

Online training for ABR models under specific network conditions. The ABR server stores a limited number of pre-trained ABR decision models, which correspond to the considered network conditions. However, when an entirely different network condition appears, all pre-trained models may suffer performance degradation. At this moment, the pre-trained ABR models stored in the ABR server need to be refined online under this unseen network condition to learn a well-matched ABR model for bit rate decisions. This requires additional studies and will be deferred as our future work.

7 RELATED WORK
ABR algorithms with fixed model. Existing state-of-the-art ABR algorithms include rate-based (RB) algorithms [16, 18, 21, 25, 31, 37], buffer-based (BB) algorithms [2, 13–15, 26, 30] and hybrid control algorithms [1, 4, 5, 8, 10–12, 17, 19, 23, 24, 29, 32–36]. Due to the limited capability of rule-based algorithms, more research focuses on adopting learning-based hybrid control algorithms for the ABR decision, such as Pensieve [24] adopting the A3C architecture, [20] with the DDPG (Deep Deterministic Policy Gradient) framework, and HD3 (Distributed Dueling DQN with Discrete-Continuous Hybrid Action Spaces) [17] utilizing the deep Q-learning algorithm [28]. Benefiting from the capabilities of neural networks in feature extraction and policy learning, these algorithms tend to obtain a balance between various QoE metrics and outperform the algorithms using fixed rules. While they usually adopt a single fixed model for the ABR decision and do not specialize to different network conditions, ANT performs better by specializing ABR models for each network condition independently.

Auto-tuning ABR algorithm. In order to improve the dynamic range of ABR algorithms, Oboe [1] was proposed to auto-tune the parameters of ABR algorithms to network conditions. Oboe detects changes in network states/conditions using Bayesian change point detection algorithms based on the average and standard deviation of throughput. While Oboe utilizes limited throughput statistics (i.e., average and std) with inaccurate recognition of network condition changes, ANT performs better by learning network conditions from raw historical throughput data.

8 CONCLUSION
In this paper, we propose ANT, an accurate network throughput learning mechanism for better adaptive video streaming. To deal with the issue of performance degradation when the network condition changes frequently, a condition learning module and a condition-wise ABR decision module are devised in ANT. By collecting and analyzing the historical throughput data, the condition learning module can accurately recognize the current network condition, which is used to drive the switching of RL-based ABR models for optimal performance in the ABR decision module. Extensive experimental results verify the better performance of ANT for bit rate adaptation across a wide range of traces.

REFERENCES
[1] Zahaib Akhtar, Yun Seong Nam, Ramesh Govindan, Sanjay Rao, Jessica Chen, Ethan Katz-Bassett, Bruno Ribeiro, Jibin Zhan, and Hui Zhang. 2018. Oboe: Auto-Tuning Video ABR Algorithms to Network Conditions. In Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication (Budapest, Hungary) (SIGCOMM '18). Association for Computing Machinery, New York, NY, USA, 44–58. https://doi.org/10.1145/3230543.3230558
[2] A. Beben, P. Wiśniewski, J. Mongay Batalla, and P. Krawiec. 2016. ABMA+: Lightweight and Efficient Algorithm for HTTP Adaptive Streaming. In Proceedings of the 7th International Conference on Multimedia Systems (Klagenfurt, Austria) (MMSys '16). Association for Computing Machinery, New York, NY, USA, Article 2, 11 pages. https://doi.org/10.1145/2910017.2910596

[3] H. Chen, X. Zhang, Y. Xu, J. Ren, J. Fan, Z. Ma, and W. Zhang. 2019. T-Gaming: A Cost-Efficient Cloud Gaming System at Scale. IEEE Transactions on Parallel and Distributed Systems 30, 12 (2019), 2849–2865. https://doi.org/10.1109/TPDS.2019.2922205
[4] L. De Cicco, V. Caldaralo, V. Palmisano, and S. Mascolo. 2013. ELASTIC: A Client-Side Controller for Dynamic Adaptive Streaming over HTTP (DASH). In 2013 20th International Packet Video Workshop. 1–8. https://doi.org/10.1109/PV.2013.6691442
[5] A. Elgabli and V. Aggarwal. 2020. FastScan: Robust Low-Complexity Rate Adaptation Algorithm for Video Streaming Over HTTP. IEEE Transactions on Circuits and Systems for Video Technology 30, 7, 2240–2249. https://doi.org/10.1109/TCSVT.2019.2914388
[6] Forecast G. 2019. Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update 2017–2022. White Paper (2019).
[7] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[8] Ruying Hong, Qiwei Shen, Lei Zhang, and Jing Wang. 2019. Continuous Bitrate & Latency Control with Deep Reinforcement Learning for Live Video Streaming. In MM '19. ACM, New York, NY, USA, 2637–2641. https://doi.org/10.1145/3343031.3356063
[9] Jie Hu, Li Shen, and Gang Sun. 2018. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Tianchi Huang, Rui-Xiao Zhang, Chao Zhou, and Lifeng Sun. 2018. QARC: Video Quality Aware Rate Control for Real-Time Video Streaming Based on Deep Reinforcement Learning. In MM '18. ACM, New York, NY, USA, 1208–1216. https://doi.org/10.1145/3240508.3240545
[11] Tianchi Huang, Chao Zhou, Rui-Xiao Zhang, Chenglei Wu, Xin Yao, and Lifeng Sun. 2019. Comyco: Quality-Aware Adaptive Video Streaming via Imitation Learning. In MM '19. ACM, New York, NY, USA, 429–437. https://doi.org/10.1145/3343031.3351014
[12] Tianchi Huang, Chao Zhou, Rui-Xiao Zhang, Chenglei Wu, Xin Yao, and Lifeng Sun. 2020. Stick: A Harmonious Fusion of Buffer-based and Learning-based Approach for Adaptive Streaming. In IEEE INFOCOM 2020. 1967–1976. https://doi.org/10.1109/INFOCOM41043.2020.9155411
[13] Te-Yuan Huang, Nikhil Handigol, Brandon Heller, Nick McKeown, and Ramesh Johari. 2012. Confused, Timid, and Unstable: Picking a Video Streaming Rate is Hard. In IMC '12. ACM, New York, NY, USA, 225–238. https://doi.org/10.1145/2398776.2398800
[14] Te-Yuan Huang, Ramesh Johari, and Nick McKeown. 2013. Downton Abbey without the Hiccups: Buffer-Based Rate Adaptation for HTTP Video Streaming. In FhMN '13. ACM, New York, NY, USA, 9–14. https://doi.org/10.1145/2491172.2491179
[15] Te-Yuan Huang, Ramesh Johari, Nick McKeown, Matthew Trunnell, and Mark Watson. 2014. A Buffer-Based Approach to Rate Adaptation: Evidence from a Large Video Streaming Service. In SIGCOMM '14. ACM, New York, NY, USA, 187–198. https://doi.org/10.1145/2619239.2626296
[16] Junchen Jiang, Vyas Sekar, and Hui Zhang. 2012. Improving Fairness, Efficiency, and Stability in HTTP-Based Adaptive Video Streaming with FESTIVE. In CoNEXT '12. ACM, New York, NY, USA, 97–108. https://doi.org/10.1145/2413176.2413189
[17] Xiaolan Jiang and Yusheng Ji. 2019. HD3: Distributed Dueling DQN with Discrete-Continuous Hybrid Action Spaces for Live Video Streaming. In MM '19. ACM, New York, NY, USA, 2632–2636. https://doi.org/10.1145/3343031.3356052
[18] Eymen Kurdoglu, Yong Liu, Yao Wang, Yongfang Shi, ChenChen Gu, and Jing Lyu. 2016. Real-Time Bandwidth Prediction and Rate Adaptation for Video Calls over Cellular Networks. In MMSys '16. ACM, New York, NY, USA, Article 12, 11 pages. https://doi.org/10.1145/2910017.2910608
[19] Z. Li, X. Zhu, J. Gahm, R. Pan, H. Hu, A. C. Begen, and D. Oran. 2014. Probe and Adapt: Rate Adaptation for HTTP Video Streaming At Scale. IEEE Journal on Selected Areas in Communications 32, 4, 719–733. https://doi.org/10.1109/JSAC.2014.140405
[20] Timothy P. Lillicrap, Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. 2019. Continuous control with deep reinforcement learning. arXiv:1509.02971 [cs.LG]
[21] Chenghao Liu, Imed Bouazizi, and Moncef Gabbouj. 2011. Rate Adaptation for Adaptive HTTP Streaming. In MMSys '11. ACM, New York, NY, USA, 169–174. https://doi.org/10.1145/1943552.1943575
[22] J. B. MacQueen. 1967. Some Methods for Classification and Analysis of Multivariate Observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. 281–297.
[23] Ahmed Mansy, Bill Ver Steeg, and Mostafa Ammar. 2013. SABRE: A Client Based Technique for Mitigating the Buffer Bloat Effect of Adaptive Video Flows. In MMSys '13. ACM, New York, NY, USA, 214–225. https://doi.org/10.1145/2483977.2484004
[24] Hongzi Mao, Ravi Netravali, and Mohammad Alizadeh. 2017. Neural Adaptive Video Streaming with Pensieve. In SIGCOMM '17. ACM, New York, NY, USA, 197–210. https://doi.org/10.1145/3098822.3098843
[25] Konstantin Miller, Abdel-Karim Al-Tamimi, and Adam Wolisz. 2016. QoE-Based Low-Delay Live Streaming Using Throughput Predictions. ACM Trans. Multimedia Comput. Commun. Appl. 13, 1, Article 4 (Oct. 2016), 24 pages. https://doi.org/10.1145/2990505
[26] K. Miller, E. Quacchio, G. Gennari, and A. Wolisz. 2012. Adaptation algorithm for adaptive streaming over HTTP. In 2012 19th International Packet Video Workshop (PV). 173–178. https://doi.org/10.1109/PV.2012.6229732
[27] Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. 2016. Asynchronous Methods for Deep Reinforcement Learning. In Proceedings of The 33rd International Conference on Machine Learning (PMLR, Vol. 48). 1928–1937. http://proceedings.mlr.press/v48/mniha16.html
[28] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. 2013. Playing Atari with Deep Reinforcement Learning. arXiv:1312.5602 [cs.LG]
[29] Huan Peng, Yuan Zhang, Yongbei Yang, and Jinyao Yan. 2019. A Hybrid Control Scheme for Adaptive Live Streaming. In MM '19. ACM, New York, NY, USA, 2627–2631. https://doi.org/10.1145/3343031.3356049
[30] K. Spiteri, R. Urgaonkar, and R. K. Sitaraman. 2016. BOLA: Near-optimal bitrate adaptation for online videos. In IEEE INFOCOM 2016. 1–9. https://doi.org/10.1109/INFOCOM.2016.7524428
[31] Yi Sun, Xiaoqi Yin, Junchen Jiang, Vyas Sekar, Fuyuan Lin, Nanshu Wang, Tao Liu, and Bruno Sinopoli. 2016. CS2P: Improving Video Bitrate Selection and Adaptation with Data-Driven Throughput Prediction. In SIGCOMM '16. ACM, New York, NY, USA, 272–285. https://doi.org/10.1145/2934872.2934898
[32] Guibin Tian and Yong Liu. 2012. Towards Agile and Smooth Video Adaptation in Dynamic HTTP Streaming. In CoNEXT '12. ACM, New York, NY, USA, 109–120. https://doi.org/10.1145/2413176.2413190
[33] Cong Wang, Amr Rizk, and Michael Zink. 2016. SQUAD: A Spectrum-Based Quality Adaptation for Dynamic Adaptive Streaming over HTTP. In MMSys '16. ACM, New York, NY, USA, Article 1, 12 pages. https://doi.org/10.1145/2910017.2910593
[34] Xiaoqi Yin, Abhishek Jindal, Vyas Sekar, and Bruno Sinopoli. 2015. A Control-Theoretic Approach for Dynamic Adaptive Video Streaming over HTTP. In SIGCOMM '15. ACM, New York, NY, USA, 325–338. https://doi.org/10.1145/2785956.2787486
[35] Xiaoqi Yin, Vyas Sekar, and Bruno Sinopoli. 2014. Toward a Principled Framework to Design Dynamic Adaptive Streaming Algorithms over HTTP. In Proceedings of the 13th ACM Workshop on Hot Topics in Networks (HotNets 2014). https://doi.org/10.1145/2670518.2673877
[36] C. Zhou, C. Lin, X. Zhang, and Z. Guo. 2013. Buffer-based smooth rate adaptation for dynamic HTTP streaming. In 2013 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference. 1–9. https://doi.org/10.1109/APSIPA.2013.6694183
[37] Xuan Kelvin Zou, Jeffrey Erman, Vijay Gopalakrishnan, Emir Halepovic, Rittwik Jana, Xin Jin, Jennifer Rexford, and Rakesh K. Sinha. 2015. Can Accurate Predictions Improve Video Streaming in Cellular Networks?. In HotMobile '15. ACM, New York, NY, USA, 57–62. https://doi.org/10.1145/2699343.2699359