Comparing Broadband ISP Performance Using Big Data from M-Lab
Xiaohong Deng, Yun Feng, Thanchanok Sutjarittham, Hassan Habibi Gharakheili, Blanca Gallego, and Vijay Sivaraman

arXiv:2101.09795v1 [cs.PF] 24 Jan 2021

Abstract—Comparing ISPs on broadband speed is challenging, since measurements can vary due to subscriber attributes such as operating system, and test conditions such as access capacity, server distance, TCP window size, time-of-day, and network segment size. In this paper, we draw inspiration from observational studies in medicine, which face a similar challenge in comparing the effect of treatments on patients with diverse characteristics, and have successfully tackled it using "causal inference" techniques for post facto analysis of medical records. Our first contribution is to develop a tool to pre-process and visualize the millions of data points in M-Lab at various time- and space-granularities to get preliminary insights on factors affecting broadband performance. Next, we analyze 24 months of data pertaining to twelve ISPs across three countries, and demonstrate that there is observational bias in the data due to disparities amongst ISPs in their attribute distributions. For our third contribution, we apply a multi-variate matching method to identify suitable cohorts that can be compared without bias, which reveals that ISPs are closer in performance than previously thought. Our final contribution is to refine our model by developing a method for estimating speed-tier, and to re-apply matching for comparison of ISP performance. Our results challenge conventional rankings of ISPs, and pave the way towards data-driven approaches for unbiased comparisons of ISPs world-wide.

Index Terms—Broadband performance, Big data, Data analytics, Measurement lab

I. INTRODUCTION

This paper asks the question: how should we compare Internet Service Providers (ISPs) in terms of the broadband speeds they provide to consumers? (Aspects such as pricing plans, quotas, and reliability are not considered in this paper.) On the face of it, determining the answer may seem simple: a subscriber's speed can be measured directly (say via a speed-test tool or an adaptive bit-rate video stream), allowing ISPs to be compared based on the average (or median) measured speed across their subscriber base. However, this approach has deep conceptual problems: an ISP-A who has many subscribers in remote areas served by low-capacity (wired or wireless) infrastructure will compare poorly to an ISP-B whose subscribers are predominantly city dwellers connected by fiber; yet, it could well be that ISP-A can provide higher speeds than ISP-B to every subscriber covered by ISP-B! The comparison bias illustrated above, arising from disparity in access capacity, is but one example of many potential confounding factors, such as latency to content servers, host and server TCP window size settings, maximum segment size in the network, and time-of-day, that directly bias measurement test results. Observational studies therefore need to understand and correct for such biases to ensure that the comparisons are fair.

In this study we draw inspiration from the field of medicine, which has grappled for decades with appropriate methods to compare new drugs/treatments. "Patients" (in our case broadband subscribers) with "attributes" such as gender, age, medical conditions, and prior medications (in our case access capacity, server latency, host settings, etc.) are given "treatments" (in our case ISPs), and their efficacy needs to be compared. The dilemma is that any given patient can only be measured taking treatment-A or treatment-B, but not both at the same time; similarly, a subscriber in our case can only be observed when connected to one ISP, so the "ground truth" of that customer's broadband performance served by other ISPs is never observed. To overcome this issue, the gold standard for medical treatment comparisons is a "randomized control trial" (RCT), wherein each patient in the cohort is randomly assigned to one of multiple treatments (one of which could be a placebo). The randomization is crucial here, in the expectation that known as well as unknown attributes that could confound the experiment outcome get evenly distributed across the groups being compared, so that statistically meaningful inferences can be drawn.

Alas, "randomized" assignment of ISPs to subscribers is not a viable option in the real world, so we have to instead rely on "observational" studies that analyze performance data given a priori assignment of ISPs to subscribers. Fortunately for us, techniques for observational studies are maturing rapidly, particularly in medicine, where analyzing big data from electronic health records is much cheaper than running controlled clinical trials, and can yield valuable insights on the causal relationship between patient attributes and treatment outcomes. In this work we have collaborated closely with a medical informatics specialist to apply "causal inference" techniques to analyzing ISP performance data. Unlike a classic supervised learning problem, causal inference works by estimating how things might look under different conditions, thereby differentiating the influence of A versus B, instead of trying to predict the outcome. We apply this method to the wealth of broadband data available from the open M-Lab platform, which holds over 40 million measurement results world-wide for the year 2016. Though no data-driven approach can guarantee that causal relationships are deduced correctly, as there could be unknown attributes that affect the outcome (the "unknown unknowns", to use the Rumsfeld phrase), we believe that the M-Lab data set captures most, if not all, of the important attributes that are likely to affect the speed measurements.

Our objective in this paper is to apply emerging data-analysis techniques to the big data from M-Lab to get new insights into ISP broadband performance comparison. Our first contribution is somewhat incidental: we develop a tool that allows researchers to easily and quickly process and depict M-Lab data to visualize performance metrics (speed, latency, loss, congestion) at various spatial (per-house, per-ISP, per-country) and temporal (hourly, monthly, yearly) granularities. Our second contribution applies our tool to over 17 million data samples taken in 2015 and 2016 spanning 12 ISPs in 3 countries, to identify the relative impact of various attributes (access speed-tier, host settings, server distance, etc.) on broadband performance. We reveal, both visually and analytically, that dominant attributes can vary significantly across ISPs, corroborating our earlier assertion that subscriber cohorts have disparate characteristics and ISP comparisons are therefore riddled with bias. Our third contribution is to apply a causal inference technique, called multi-variate matching, to filter the data sets by identifying cohorts with similar attributes across ISPs.

[…] a 30-day period [3]. While these large content providers undoubtedly have a wealth of measurement data, these are specific to their services, and neither their data nor their precise comparison methods are available in the public domain. (To be fair, Google does outline a methodology on its video quality report page, but it fails to mention important elements such as whether it only considers video streams of a certain minimum duration, whether a house that watches more video streams contributes more to the aggregate rating, and how it accounts for various factors, such as browser type, server latency, etc., that vary across subscribers and can affect the measurement outcome.) Governments are also under increasing pressure to compile consumer reports on broadband performance: for example, the FCC in the US [4] directs consumers to various speed-test tools to make their own assessment, and the ACCC in Australia [5] is running a pilot program to instrument volunteers' homes with hardware probes to measure their connection speeds. Additionally, various national regulators in Europe employ their own methods of measuring broadband speed and publish white papers, as surveyed in [6]: for example, Ofcom in the UK uses a hardware measurement unit (developed by SamKnows), several other national regulators, such as those in Italy, Austria, Germany, Portugal, and Slovenia, use specialized software solutions (developed in-house), while the regulator in Greece adopted M-Lab's NDT tool. While there is a commendable amount of effort being expended on collecting data, via either passive measurement of video traffic or active probing using hardware devices (we refer the reader to a recent survey [7] that gives an overview of measurement platforms and standardization efforts), less effort

---

X. Deng was with the School of Electrical Engineering and Telecommunications, University of New South Wales, Sydney, NSW 2052, Australia (e-mail: [email protected]). T. Sutjarittham, H. Habibi Gharakheili, and V. Sivaraman are with the School of Electrical Engineering and Telecommunications, University of New South Wales, Sydney, NSW 2052, Australia (e-mails: [email protected], [email protected], [email protected]). Y. Feng is with Shanghai Huawei Technologies, Pudong, China, ZIP 201206 (e-mail: [email protected]). B. Gallego is with the Centre for Big Data Research in Health, University of New South Wales, Sydney, NSW 2052, Australia (e-mail: [email protected]). This submission is an extended and improved version of our paper presented at the ITNAC 2015 conference [1].
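The access-capacity bias described in the introduction, and the matching remedy named in our third contribution, can be illustrated with a small synthetic sketch. All numbers below (subscriber counts, speed tiers, per-ISP "efficiency" factors) are invented purely for illustration, and matching here is done on a single attribute (access tier), whereas the paper matches cohorts on several attributes simultaneously:

```python
# Toy illustration (synthetic data, hypothetical numbers): raw speed
# averages can rank ISPs incorrectly when their subscriber attributes
# differ, and matching on those attributes corrects the bias.
import random

random.seed(0)

def sample_speed(isp, tier):
    """Measured speed (Mbps): hypothetically, ISP-A delivers ~95% of the
    access tier and ISP-B only ~80%, at EVERY tier."""
    efficiency = {"A": 0.95, "B": 0.80}[isp]
    return tier * efficiency * random.uniform(0.9, 1.1)

# ISP-A serves mostly low-tier (e.g. remote, 12 Mbps) subscribers;
# ISP-B serves mostly high-tier (e.g. urban fibre, 100 Mbps) subscribers.
cohort = (
    [("A", 12)] * 900 + [("A", 100)] * 100 +
    [("B", 12)] * 100 + [("B", 100)] * 900
)
tests = [(isp, tier, sample_speed(isp, tier)) for isp, tier in cohort]

def mean(xs):
    return sum(xs) / len(xs)

# Naive comparison: average over each ISP's whole subscriber base.
# ISP-B "wins", purely because of its access-tier mix.
naive = {isp: mean([s for i, _, s in tests if i == isp]) for isp in "AB"}

# Matched comparison: compare only within cohorts sharing the same
# attribute value (exact matching on access tier), then average the
# per-cohort means. ISP-A is faster at every tier, reversing the ranking.
matched = {
    isp: mean([mean([s for i, t, s in tests if i == isp and t == tier])
               for tier in (12, 100)])
    for isp in "AB"
}
print(naive, matched)
```

In the full multi-variate setting, each test from one ISP would be paired with tests from another ISP that have similar values of all confounders at once (access capacity, server latency, host settings, etc.), for instance via exact or nearest-neighbour matching, before any averages are compared.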