Pitfalls of Data-Driven Networking: a Case Study of Latent Causal Confounders in Video Streaming
Total Page:16
File Type:pdf, Size:1020Kb
Pitfalls of data-driven networking: A case study of latent causal confounders in video streaming P. C. Sruthi Sanjay Rao Bruno Ribeiro [email protected] [email protected] [email protected] Purdue University Purdue University Purdue University ABSTRACT to experience no rebuffering events if the lowest quality video This paper motivates the need to support counterfactual reasoning choice were eliminated. In analyzing such questions, it is useful to (i.e., answer "what-if" questions about events that did not occur) ask what the download time of a video segment could have been if when collecting network data. We focus on video streaming – e.g., a bitrate different than the one selected in the actual session had given logs of a video session, a video publisher may ask whether been chosen. a user would continue to experience no rebuffering events if the Answering what-if questions of this nature is also known as coun- lowest quality video choice were eliminated. We discuss potential terfactual reasoning. Counterfactual reasoning considers the effect pitfalls related to counterfactual reasoning, and argue that dynamic of events that did not occur while the data was being recorded [13], network state (e.g., bandwidth) serves as a confounding yet hidden and is often used in fields such as epidemiology14 [ ]. Unfortunately, (latent) feature that complicates such analyses. We illustrate the such reasoning with networking data is challenging because many challenges, and present preliminary methods to address them using networked systems are adaptive in that they make decisions based concrete examples. Our evaluations show that existing approaches, on the network state (e.g., bandwidth, latency), and this state itself including randomized trials (collecting data from an algorithm impacts any observable measurements (e.g., download times). For that selects bitrates randomly), are by themselves inadequate for instance, Internet video is typically split into chunks, with each counterfactual reasoning related to video streaming, and must be chunk encoded at multiple bitrates. ABR algorithms decide what supplemented by techniques that explicitly infer latent features. bitrate to pick for each chunk based on their perception of network bandwidth, but the bandwidth in turn determines the download CCS CONCEPTS time of the chunk. Thus, the network bandwidth acts as a confound- ing variable that makes it difficult to easily infer the download time • Networks → Network performance modeling; • Informa- that a video chunk would see if an alternate decision had been made tion systems → Multimedia streaming. by the algorithm. KEYWORDS While some prior work considers what-if questions (e.g., [17]), they do not consider the biases arising out of confounding variables. Data-driven networking, Video delivery, Adaptive bitrate algo- Other works have recognized the need to account for confounding rithms variables when analyzing network data [3, 10, 18], though they ACM Reference Format: do not consider counterfactual reasoning. Further, these works P. C. Sruthi, Sanjay Rao, and Bruno Ribeiro. 2020. Pitfalls of data-driven only deal with observable confounding variables such as a user’s networking: A case study of latent causal confounders in video streaming. connection type (i.e., whether a user is behind a DSL, cable modem In Workshop on Network Meets AI & ML (NetAI’20), August 14, 2020, Virtual or WiFi), for which information is available as part of the data. Event, NY, USA. ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/ Unfortunately, variables that capture dynamic network state (e.g., 3405671.3405815 bandwidth, latency) are latent, and require new methodologies not 1 INTRODUCTION covered in prior work. In medical studies, causal effects are directly measured by ran- When designing networked systems in a manner informed by data domized controlled trials (RCTs)[4], eliminating the need to deal from real deployments, it is common to ask "what-if questions", with latent confounders. In networking, for instance, this would on the potential impact if an alternate choice had been made. For imply that rather than using an ABR algorithm that picks video instance, given data collected from real video streaming sessions, bitrates based on perceived network state, we could collect data a video publisher may wish to understand the performance if a from video sessions where chunk sizes are picked at random. Unfor- different Adaptive Bitrate (ABR) algorithm were used. Alternately, tunately, we show that RCTs are not a silver bullet in networking a publisher may wish to understand whether a user would continue applications: we still need to deal with latent confounders, oth- Permission to make digital or hard copies of all or part of this work for personal or erwise, in the example above, RCTs would have no effect on our classroom use is granted without fee provided that copies are not made or distributed ability to measure the counterfactual effect of an ABR algorithm for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM picking different bitrates. must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, Motivated by these observations, we argue for a research method- to post on servers or to redistribute to lists, requires prior specific permission and/or a ology that explicitly accounts for such latent confounding variables. fee. Request permissions from [email protected]. NetAI’20, August 14, 2020, Virtual Event, NY, USA Our approach involves (i) identifying the causal dependency graph © 2020 Association for Computing Machinery. aided by domain knowledge; (ii) identifying measured and latent ACM ISBN 978-1-4503-8043-0/20/08...$15.00 confounders; (iii) inferring latent features; and (iv) grouping data https://doi.org/10.1145/3405671.3405815 NetAI’20, August 14, 2020, Virtual Event, NY, USA P. C. Sruthi, S. Rao, and B. Ribeiro on the inferred latent features. We illustrate the approach for a case study with video streaming, and an example bandwidth pro- cess. The step of inferring latent features is a crucial step which we achieve using Maximum A Posteriori estimation in our case study. While the bandwidth process we consider may seem simplistic, it is nevertheless useful in establishing the importance of accounting for latent confounders. Figure 1: Causal relationships between size, bandwidth, and We present empirical results that show that latent confounding download time. The true bandwidth (unshaded) is hidden, while variables must be accounted for while evaluating what-if questions the others are measured and available in the data. in these systems, and that —because of these latent confounders— widely used approaches like direct emulation from data incorrectly estimate the effects of changes in video streaming systems. Our T results also show that accounting for latent confounders is critical to ensure good results with randomized trials and observational Cℎ C; studies. 퐵ℎ 2 CAUSALITY IN VIDEO STREAMING 2.1 The need for causal inference In video streaming, a video is split into chunks, each encoded at Bandwidth (Mbps) multiple bitrates. To cope with variability in network bandwidth, 퐵; clients run Adaptive BitRate (ABR) algorithms [2, 6, 12, 16, 19, 20]. For each chunk, these algorithms select what bitrate to pick based \ k ! on the client’s estimated throughput, and the amount of data stored Time in the client buffer. Figure 2: An example time-varying bandwidth process Consider records of a video session with # chunks. These records = f¹ ºg# minimally contain a set of tuples of the form Data C8,B8,38 8=1, chunk size ( based on a prediction of the value of the bandwidth 퐵, where which therefore acts as a confounding variable for both ( and 퐷. C8 : start time of 8-th video chunk download request, To understand the effect of latent confounder 퐵 on counterfac- tual reasoning in ABR algorithms, consider the following scenario. B8 : the 8-th video chunk size, Assume 퐵 2 f1, 2g MBps and whenever 퐵 = 1 we have the ABR 38 : the 8-th chunk’s download time. algorithm choosing ( = 1 MB which implies 퐷 = 1 second; and Using this data, we wish to answer the following what-if counter- when 퐵 = 2 we have the ABR algorithm choosing ( = 2 MB which factual question: for a given 8 2 f1,...,# g, what would have been again implies 퐷 = 1 second. A pure data-driven approach would 0 ¹ = j = º = ¹ = j = º = the download time 38 , if chunk 8 had been downloaded starting at conclude that % 퐷 1 ( 1 % 퐷 1 ( 2 1. Hence, 0 = time C8 , and if we had requested chunk size B8 < B8 ? In Section 3, statistically, it seems like 퐷 1 regardless of the choice of (. The we show that under some conditions, even assuming no latency above mistake comes from confusing % ¹퐷j(º, which we directly 0 ¹ j ¹ ºº or transport layer effects, one cannot accurately estimate 38 using obtain from the data, with the counterfactual query % 퐷 3> ( , 0 0 38,B8,B8 , and C8 . Without an accurate estimate of 38 , we are unable which is the distribution of download times if the size had been to precisely evaluate the potential of an alternate ABR algorithm. forced to be (. Here, the challenge of counterfactual evaluation comes from the In the above example, it may appear that one could estimate 퐵ˆ = causal relationships among observed and latent (hidden) variables, (/퐷 (i.e., the average throughput observed during the download), some of which may be confounders. Consider three variables -,. and use this information in the conditional. But, the true bandwidth and /. If - influences . , but / influences both - and . , / is a con- 퐵 may vary during a download, resulting in the estimation 퐵ˆ not founder, because it introduces a correlation/dependence between being theoretically sound as we illustrate next.