Freelab: a Free Experimentation Platform
Total Page:16
File Type:pdf, Size:1020Kb
FreeLab: A Free Experimentation Platform Matteo Varvello|; Diego Perino? |AT&T Labs – Research, ?Telefónica Research ABSTRACT In this work, we set out to build a free experimentation As researchers, we are aware of how hard it is to obtain access platform which can also be reliable and up-to-date. In classic to vantage points in the Internet. Experimentation platforms experimentation platforms applications run directly at vantage are useful tools, but they are also: 1) paid, either via a mem- points; we revert this rationale by proposing to use vantage bership fee or by resource sharing, 2) unreliable, nodes come points as traffic relays while running the application at theex- and go, 3) outdated, often still run on their original hardware perimenter’s machine(s). By leveraging free Internet relays as and OS. While one could build yet-another platform with vantage points, we can make such experimentation platform up-to-date and reliable hardware and software, it is hard to free. The drawback of this approach is the introduction of imagine one which is free. This is the goal of this paper: we extra errors (path inflation, header manipulation, bandwidth set out to build FreeLab, a free experimentation platform shrinkage) which need to be carefully corrected. which also aims to be reliable and up-to-date. The key idea This paper presents FreeLab, a free experimentation plat- behind FreeLab is that experiments run directly at its user form built atop of thousand of free HTTP(S) and SOCKS(5) machines, while traffic is relayed by free vantage points inthe Internet proxies [38]—to enable experiments based on TCP, Internet (web and SOCKS proxies, and DNS resolvers). Free- UDP, and HTTP(S)—and thousand free DNS resolvers [34]— Lab is thus free of access by design and up-to-date as far as its to enable DNS-based experiments. The key challenges of users maintain their experimenting machines. Reliability is a building FreeLab are twofold. First, free Internet resources key challenge due to the volatile nature of free resources, and need to be discovered and constantly monitored to provide the introduction of errors (path inflation, header manipulation, reliable vantage points. Second, relaying an application’s traf- bandwidth shrinkage) caused by traffic relays. fic through a vantage point is not the same as running the application at the vantage point. Additional delays and header 1 INTRODUCTION manipulations are few of the issues that can cause the two methodologies to diverge. FreeLab handle these issues by A large number of measurement platforms [3, 21, 37] and constructing correction filters before each experiment and experimentation test-beds [21, 27, 39] exist today, each with directly applying such corrections both on the fly (headers) different access rules and capabilities. The rise of cheap(er) and after the experiment is completed (packet timestamps). cloud computing has also motivated researchers to build small While FreeLab is not perfect, in this paper we discuss and scale dedicated test-beds, e.g., using Amazon EC2 [22, 23], demonstrate a fair set of experiments that can already be run for their experiments. Luminati/Hola [4, 41] have also been on it. FreeLab’s code is not yet ready for prime time, but our used to gather a large set of distributed vantage points. ongoing work consists in open sourcing it as soon as possible. These experimentation platforms share few limitations. We hope that FreeLab’s novel design will stimulate research First, they are paid either via a membership fee or by do- in how to improve the accuracy of the correction filters and nating hardware resources. Second, they are often unreliable augment the set of experiments it can run. in the sense that nodes are frequently down and exhibit un- outdated predictable behaviors. Third, they are as nodes have 2 MOTIVATION AND RELATED WORK been deployed at some point and both their hardware and software (OS) have not been updated ever since. As an exam- We are not aware of any system like FreeLab. However, ple, our measurements show that only 10% of the PlanetLab many work investigated the design and deployment of ex- (Europe) nodes are active; even worse, active nodes still run perimentation platforms; a complete survey can be found Fedora 8, which will be 10 years old in November 2017. in [1]. Recently, few works have leveraged novel ideas to gather vantage points all around the world, and are thus great Permission to make digital or hard copies of all or part of this work for motivations for our work. We briefly discuss them here. personal or classroom use is granted without fee provided that copies are not Hola [12] is a popular Chrome plugin that provides access made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components to (private) proxies from many locations around the world. of this work owned by others than ACM must be honored. Abstracting with Hola users can either pay for the service or get it for free if credit is permitted. To copy otherwise, or republish, to post on servers or to they contribute by proxying traffic for other users. A com- redistribute to lists, requires prior specific permission and/or a fee. Request mercial version of Hola is also available under the name permissions from [email protected]. Luminati [20]. Tyson et al. [41] use Hola to gather 143k van- HotNets-XVI, November 30-December 1, 2017, Palo Alto, CA, USA tage points in 3,818 Autonomous Systems (ASes) and study © 2017 Association for Computing Machinery. ACM ISBN 978-1-4503-5569-8/17/11. $15.00 header manipulations by middleboxes deployed in the Inter- https://doi.org/10.1145/3152434.3152436 net. In [4], the authors use Luminati to detect violations of In order to be a free platform, FreeLab relies on three key 10k ideas. First, it shifts the experiment logic from the vantage points to the machine(s) of an experimenter, i.e., a researcher 1k aiming to run an experiment. Second, it leverages free re- sources on the Internet as vantage points. Third, it uses cor- 100 rection filters to remove the bias caused by not running an 10 experiment directly at a vantage point. As most experimentation platforms, FreeLab is composed Count [#] 1 of vantage points, control server, and client. Figure 2 shows Downloads FreeLab and how its components are interconnected. We Total users Avg weekly users now describe each component in detail, and then discuss what Automated Downloads we envisage FreeLab can be used for. 03/17 04/01 04/15 05/01 05/15 06/01 06/15 Figure 1: Adoption and usage of our Chrome plugin. 3.1 Vantage Points In a classic experimentation platform, the vantage points de- application-level end-to-end connectivity on the Internet, i.e., fine the footprint or from how many locations experiments NXDOMAIN hijacking, HTTP content manipulation, and can be run. The software and hardware characteristics of a HTTPs certificate replacement. FreeLab shares a similar ra- vantage point define which experiments it can run, e.g., some tionale with [4, 41], in the sense that it aims at using multiple vantage points might run old OS versions where recent li- vantage points without access to such nodes. However, Free- braries cannot be installed. FreeLab’s vantage points also Lab generalizes the idea with the goal to build a free and define its footprint but the experiments they can run solely full-fledged experimentation platform. depend on which traffic type they can forward. We have iden- Perhaps the best motivation to FreeLab comes from our tified three sets of free resources in the Internet which can recent work [25] where we built ProxyTorrent, a distributed forward, respectively, HTTP(S), TCP, UDP, and DNS traffic, system to study and efficiently use free web proxies, free of allowing FreeLab to support a great number of experiments. charge intermediary boxes enabling HTTP(S) connections Free Web Proxies — These are, in theory, hundred of thou- between a client and a server. We made our system publicly sands of hosts [38] that forward HTTP(S) traffic at no cost. available on March 2017 through a Chrome’s plugin which In reality, our previous work (see Section 2) shows that most emulates Hola functionalities across free web proxies [7, 26]. hosts are down, many are volatile, and few even compromise Since its release, our plugin has attracted ∼1,500 users and user’s privacy or attempt to install malware. On average, only it has facilitated the download of ∼2TB of traffic. Figure 1 ∼2 thousand safe and reliable free web proxies are available shows the plugin’s number of weekly users, installations (to- daily (see Figure 3(a)), out of which 46% can also forward tal users), and downloads, or how many URLs have been HTTPS traffic. FreeLab uses reliable and safe free web prox- requested to be proxied, during the months after its release. ies to support experiments based on HTTP/HTTPS. The number of downloads peaked to 8,000 and 12,000 down- loads on 05/10 and 06/05, respectively. Manual inspection Free Socks Proxies — These are publicly available hosts reveals that these extra downloads are due to subsequent ses- on the Internet that support the SOCKS protocol [18], i.e., sions that retrieve the exact same amount of bytes via different a mechanism to transfer TCP/UDP traffic via a proxy. TCP proxies. This activity does not resemble human behavior but forwarding is supported by all SOCKS versions and proxies; rather some automated tools.