A Virtualized Infrastructure for Automated BitTorrent Performance Testing and Evaluation
Răzvan Deaconescu, George Milescu, Bogdan Aurelian, Răzvan Rughiniș, Nicolae Țăpuș
University Politehnica of Bucharest, Computer Science Department
Splaiul Independenței nr. 313, Bucharest, Romania
razvan.deaconescu, george.milescu, bogdan.aurelian, razvan.rughinis, [email protected]

Abstract

In the last decade, file sharing systems have generally been dominated by P2P solutions. Whereas email and HTTP were the "killer apps" of the earlier Internet, a large percentage of current Internet backbone traffic is BitTorrent traffic [15]. BitTorrent has proven to be a file sharing solution well suited to a decentralized Internet. It moves the burden from central servers to each individual station and maximizes network performance by making use of otherwise unused communication paths between clients.

Although there have been extensive studies of the performance of the BitTorrent protocol and of the impact of network and human factors on overall transfer quality, there has been little interest in evaluating, comparing and analyzing current real-world implementations. With hundreds of BitTorrent clients, each applying different algorithms and performance optimization techniques, we consider evaluating and comparing these implementations an important issue.

In this paper, we present a BitTorrent performance evaluation infrastructure that serves two purposes: to test and compare current real-world BitTorrent implementations, and to simulate complex BitTorrent swarms. Our infrastructure consists of a virtualized environment simulating complete P2P nodes and a fully automated framework. To obtain relevant data, several existing BitTorrent clients have been instrumented to output transfer status data and extensive logging information.

Keywords: BitTorrent; virtualization; automation; performance evaluation; client instrumentation

1 Introduction

P2P sharing systems are continuously developing and increasing in size. There is a large diversity of solutions and protocols for sharing data and knowledge, and they attract increasing interest from common users as well as from commercial and academic institutions [22].

It is estimated [15] that BitTorrent is responsible for a large portion of all Internet traffic. BitTorrent has proven to be the "killer" application of recent years, dominating P2P traffic on the Internet [24]. In recent years BitTorrent [16] has become the de facto P2P protocol used throughout the Internet. A large portion of Internet backbone traffic currently consists of BitTorrent traffic [18]. The decentralized nature of the protocol ensures scalability, fairness and rapid spread of knowledge and information.

As a decentralized system, a BitTorrent network is very dynamic. Download performance is influenced by many factors: swarm size, number of peers, network topology and ratio enforcement. The innate design of the BitTorrent protocol means that a client may obtain a higher download speed by choosing which peers to unchoke, since peers tend to reciprocate uploads. Firewalls and NAT devices have also been a persistent problem for modern P2P systems and reduce overall performance.

Although every client implements the BitTorrent specification [23] and possibly some extensions, each one uses different algorithms and behaves differently in a given situation. It may limit the number of peers, it may use heuristic information for an optimistic unchoke, or it may choose a better peer to download from. Another important consideration is the diversity and heterogeneity of peers on the Internet. Some peers have low-bandwidth connections, some are behind NATs and firewalls, and some use particular improvements to the protocol. These aspects make a thorough analysis of the protocol or of its implementations difficult, because there is little to no control over the parameters of a real BitTorrent swarm.

The results presented in this paper continue our previous work on BitTorrent applications, described at ICNS 2009 [1].

Our paper presents a BitTorrent performance evaluation infrastructure [30]. The infrastructure creates a contained environment for BitTorrent evaluation, allows testing of various BitTorrent implementations and offers extensive status information about each peer. This information can be used for analysis, interpretation and correlation between different implementations. It can also be used to analyze the impact of a swarm's state on download performance.

In order to simulate an environment as close to reality as possible, hundreds to thousands of computer systems are required, each running a particular BitTorrent implementation. Modern clusters could offer this environment, but the experiments require access to all systems, which makes the availability of such a cluster an issue.

The approach we propose in this paper is to use virtualization to provide a close-to-real-world testing environment for BitTorrent applications. Given the number of computer systems involved, this costs a fraction of a real hardware solution. Our virtualization solution uses the lightweight OpenVZ [21] application, which offers fast creation, limited execution overhead and low resource consumption. In this paper we show that, using commodity hardware and OpenVZ, a virtual testing environment can be created with at least ten times more simulated systems than the physical systems used for deployment.

On top of the virtualized infrastructure, we developed a fully automated BitTorrent performance evaluation framework. All tested clients have been instrumented to use command line interfaces that enable automated actions. Clients are started simultaneously and results are collected after the simulation is complete.

The paper is organized as follows. Section 2 provides background information, keywords and acronyms used throughout the article. Sections 3, 4 and 5 present the infrastructure and framework used for our BitTorrent experiments. Sections 6 and 7 describe OpenVZ and MonALISA, the virtualization and monitoring solutions we used. We present the experimental setups and the results of various experiments in Section 8. Section 9 describes the web interface architecture built on top of the framework. Sections 10 and 11 present concluding remarks and related work.

2 Background

Our paper deals with recent concepts related to peer-to-peer networks, BitTorrent in particular, and virtualization. This section defines terms used throughout the paper.

P2P networks follow the peer-to-peer paradigm, in which each peer is simultaneously a client and a server. P2P networks are decentralized systems sharing information and bandwidth, as opposed to the classical centralized client-server paradigm.

BitTorrent is the most widely used P2P protocol on the Internet. Since its creation by Bram Cohen in 2001, BitTorrent has proven to be an efficient way to distribute files among peers. The BitTorrent protocol separates a file's content from its metadata. The metadata is stored in a specialized .torrent file. This file holds piece information and hashes as well as tracker information (see below), and it is usually distributed through a web server.

BitTorrent is not a completely decentralized protocol. A special server, called a tracker, is used to mediate initial connections between peers.

A set of peers sharing a particular file (i.e. having access to and using the same .torrent file) is said to be part of the same swarm. A tracker can mediate communication in multiple swarms at the same time. Each peer within a swarm is either a seeder or a leecher. A seeder is a peer that has the complete shared file; the seeder is only uploading. For a swarm to exist, there has to be an initial seeder with access to the complete file and its associated metadata in the .torrent file. A peer is a leecher as long as it has only part of the file (i.e. it is still downloading). A healthy swarm must contain a sufficient number of seeders.

There is a great variety of BitTorrent clients, some of which were the subject of the experiments described in this paper. There are also BitTorrent libraries, such as libtorrent-rasterbar and libtorrent-rakshasa, that form the basis for particular BitTorrent implementations. Some of the more popular clients are uTorrent, Azureus, Transmission, rTorrent and BitTorrent (the official BitTorrent client, also known as Mainline).

Our paper describes the use of virtualization technology to simulate partial or complete BitTorrent swarms. For our experiments we used the OpenVZ [21] virtualization solution. OpenVZ is an operating-system-level virtualization solution. This means that each virtual machine, also known as a VE (virtual environment), runs on the same kernel as the host system. This approach has the advantage of using a small amount of resources per virtual machine. Each OpenVZ VE uses only a part of the host's resources, so multiple virtual systems can run on the same hardware node.

Each virtual machine contains the basic tools for running and compiling BitTorrent clients. Tested BitTorrent implementations have been instrumented for automated control and for outputting the status and logging information required for subsequent analysis and interpretation of results. Because the infrastructure aims to be generic across client implementations, adding a new BitTorrent client only requires adding the corresponding scripts and instrumentation.

Communication