
Small Is Not Always Beautiful

Paweł Marciniak∗ (Poznan University of Technology, Poland)
Nikitas Liogkas (UCLA, Los Angeles, CA)
Arnaud Legout (I.N.R.I.A., Sophia Antipolis, France)
Eddie Kohler (UCLA, Los Angeles, CA)

∗Work done while an intern at INRIA Sophia Antipolis.

Abstract

Peer-to-peer content distribution systems have been enjoying great popularity, and are now gaining momentum as a means of disseminating video streams over the Internet. In many of these protocols, including the popular BitTorrent, content is split into mostly fixed-size pieces, allowing a client to download data from many peers simultaneously. This makes piece size potentially critical for performance. However, previous research efforts have largely overlooked this parameter, opting to focus on others instead.

This paper presents the results of real experiments with varying piece sizes on a controlled BitTorrent testbed. We demonstrate that this parameter is indeed critical, as it determines the degree of parallelism in the system, and we investigate optimal piece sizes for distributing small and large content. We also pinpoint a related design trade-off, and explain how BitTorrent’s choice of dividing pieces into subpieces attempts to address it.

1 Introduction

Implementation variations and parameter settings can severely affect the service observed by the clients of a peer-to-peer system. A better understanding of protocol parameters is needed to improve and stabilize service, a particularly important goal for emerging peer-to-peer applications such as streaming video.

BitTorrent is widely regarded as one of the most successful swarming protocols, which divide the content to be distributed into distinct pieces and enable peers to share these pieces efficiently. Previous research efforts have focused on the algorithms believed to be the major factors behind BitTorrent’s good performance, such as the piece and peer selection strategies. However, to the best of our knowledge, no studies have looked into the optimal size of content pieces being exchanged among peers. This paper investigates this parameter by running real experiments with varying piece sizes on a controlled testbed, and demonstrates that piece size is critical for performance, as it determines the degree of parallelism available in the system. Our results also show that, for small-sized content, smaller pieces enable shorter download times, and as a result, BitTorrent’s design choice of further dividing content pieces into subpieces is unnecessary for such content. We evaluate the overhead that small pieces incur as content size grows and demonstrate a trade-off between piece size and available parallelism. We also explain how this trade-off motivates the use of both pieces and subpieces for distributing large content, the common case in BitTorrent swarms.

The rest of this paper is organized as follows. Section 2 provides a brief description of the BitTorrent protocol, and describes our experimental methodology. Section 3 then presents the results of our experiments with varying piece sizes, while Section 4 discusses potential reasons behind the poor performance of small pieces when distributing large content. Lastly, Section 5 describes related work and Section 6 concludes.

2 Background and Methodology

BitTorrent Overview

BitTorrent is a popular peer-to-peer content distribution protocol that has been shown to scale well with the number of participating clients. Prior to distribution, the content is divided into multiple pieces, while each piece is further divided into multiple subpieces. A metainfo file containing information necessary for initiating the download process is then created by the content provider. This information includes each piece’s SHA-1 hash (used to verify received data) and the address of the tracker, a centralized component that facilitates peer discovery.

In order to join a torrent (the collection of peers participating in the download of a particular content), a client retrieves the metainfo file out of band, usually from a Web site. It then contacts the tracker, which responds with a peer set of randomly selected peers. These might include both seeds, who already have the entire content and are sharing it with others, and leechers, who are still in the process of downloading. The new client can then start contacting peers in this set and request data. Most clients nowadays implement a rarest-first policy for piece requests: they first ask for the pieces that exist at the smallest number of peers in their peer set. Although peers always exchange just subpieces with each other, they only make data available in the form of complete pieces: after downloading all subpieces of a piece, a peer notifies all peers in its peer set with a have message. Peers are also able to determine which pieces others have based on a bitfield message, exchanged upon the establishment of new connections, which contains a bitmap denoting piece possession.

Each leecher independently decides who to exchange data with via the choking algorithm, which gives preference to those who upload data to the given leecher at the highest rates. Thus, once per rechoke period, typically every ten seconds, a leecher considers the receiving data rates from all leechers in its peer set. It then picks out the fastest ones, a fixed number of them, and only uploads to those for the duration of the period. Seeds, who do not need to download any pieces, follow a different unchoke strategy. Most current implementations unchoke those leechers that download data at the highest rates, to better utilize seed capacity.

Experimental Methodology

We have performed all our experiments with private torrents on the PlanetLab platform [5]. These torrents comprise 40 leechers and a single initial seed sharing content of different sizes. Leechers do not change their available upload bandwidth during the download, and disconnect after receiving a complete copy of the content. The initial seed stays connected for the duration of the experiment, while all leechers join the torrent at the same time, emulating a flash crowd scenario. The number of parallel upload slots is set to 4 for the leechers and seed. Although system behavior might be different with other peer arrival patterns and torrent configurations, there is no reason to believe that the conclusions we draw are predicated on these parameters.

The available bandwidth of most PlanetLab nodes is relatively high for typical real-world clients. We impose upload limits on the leechers and seed to model more realistic scenarios, but do not impose any download limits, as we wish to observe differences in download completion time with varying piece sizes. The upload limits for leechers follow a uniform distribution from 20 to 200 kB/s, while the seed’s upload capacity is set to 200 kB/s.

We collect our measurements using the official (mainline) BitTorrent implementation, instrumented to record interesting events. Our client is based on version 4.0.2 of the official implementation and is publicly available for download [1]. We log the client’s internal state, as well as each message sent or received along with the content of the message. Unless otherwise specified, we run our experiments with the default parameters.

The protocol does not strictly define the piece and subpiece sizes. An unofficial BitTorrent specification [3] states that the conventional wisdom is to “pick the smallest piece size that results in a metainfo file no greater than 50–75 kB”. The most common piece size for public torrents seems to be 256 kB. Additionally, most implementations nowadays use 16 kB subpieces. For our experiments, we always keep the subpiece size constant at 16 kB, and only vary the piece size. We have results for all possible combinations of different content sizes (1 MB, 5 MB, 10 MB, 20 MB, 50 MB, and 100 MB) and piece sizes (16 kB, 32 kB, 64 kB, 128 kB, 256 kB, 512 kB, 1024 kB, and 2048 kB).

3 Results

Our results, presented in this section, demonstrate that small pieces are preferable for the distribution of small-sized content. We also discuss the benefits and drawbacks of small pieces for other content sizes, and evaluate the communication and metainfo file overhead that different piece sizes incur for larger content.

3.1 Small Content

Even though most content distributed with BitTorrent is large, it is still interesting to examine the impact of piece size on distributing smaller content. In addition to gaining a better understanding of the trade-offs involved, it may also sometimes be desirable to utilize BitTorrent to avoid server overload when distributing small content, e.g., in the case of websites that suddenly become popular. Figure 2 shows the median download completion times of the 40 leechers downloading a 5 MB file, for different numbers of pieces, along with standard deviation error bars.
Clearly, smaller piece sizes en-

[Figure 1 panels: (a) download completion time, piece size of 16 kB; (b) download completion time, piece size of 512 kB; (c) average peer upload utilization, piece size of 16 kB; (d) average peer upload utilization, piece size of 512 kB. Axes for (a) and (b): cumulative fraction of peers vs. completion time (s). Axes for (c) and (d): upload utilization vs. time slot (5 s).]

Figure 1: CDFs of peer download completion times and scatterplots of average upload utilization for five-second time intervals when distributing a 5 MB content (averages over 5 runs).
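
To make the piece-count arithmetic behind these experiments concrete, the following is a minimal sketch, not taken from the instrumented client. It computes, for a given content size and piece size, the number of pieces, the number of 16 kB subpieces in a full piece, and the bytes of SHA-1 piece hashes the metainfo file would carry. It assumes that "kB" and "MB" denote 1024 and 1024 × 1024 bytes, and it counts only the 20-byte SHA-1 digests, ignoring every other metainfo field. The function name `piece_layout` is hypothetical.

```python
KB = 1024                  # assumption: the paper's "kB" is read as 1024 bytes
MB = 1024 * KB             # assumption: "MB" is read as 1024 * 1024 bytes
SUBPIECE_SIZE = 16 * KB    # subpiece size held constant in the experiments
SHA1_DIGEST_BYTES = 20     # a SHA-1 digest is 160 bits

# Piece sizes (in kB) covered by the experiments.
PIECE_SIZES_KB = [16, 32, 64, 128, 256, 512, 1024, 2048]


def ceil_div(a, b):
    """Integer ceiling division, avoiding floating point."""
    return -(-a // b)


def piece_layout(content_bytes, piece_bytes):
    """Return (pieces, subpieces per full piece, bytes of piece hashes).

    Hypothetical helper: only the per-piece SHA-1 digests are counted;
    other metainfo fields (tracker address, names, ...) are ignored.
    """
    pieces = ceil_div(content_bytes, piece_bytes)
    subpieces_per_piece = ceil_div(piece_bytes, SUBPIECE_SIZE)
    hash_bytes = pieces * SHA1_DIGEST_BYTES
    return pieces, subpieces_per_piece, hash_bytes


if __name__ == "__main__":
    content = 5 * MB  # the 5 MB file used in Section 3.1
    for size_kb in PIECE_SIZES_KB:
        pieces, subs, hash_bytes = piece_layout(content, size_kb * KB)
        print(f"{size_kb:5d} kB pieces: {pieces:4d} pieces, "
              f"{subs:3d} subpieces/piece, {hash_bytes:5d} B of hashes")
```

Under these assumptions, the 5 MB file splits into 320 pieces at 16 kB (each piece being a single subpiece) and into 10 pieces at 512 kB (32 subpieces each). The same arithmetic gives 6400 pieces and 128000 bytes of hashes for 100 MB content at 16 kB pieces, which is larger than the 50–75 kB metainfo guideline quoted in Section 2.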