Power-law revisited: A large scale measurement study of P2P content popularity

György Dán, School of Electrical Engineering, KTH, Royal Institute of Technology, Stockholm, Sweden
Niklas Carlsson, Department of Computer Science, University of Calgary, Calgary, Canada

27 April 2010, IPTPS San Jose, CA

P2P Content Popularity

• Instantaneous popularity
  - Concurrent number of peers
  - Effectiveness of locality awareness
  - Little data available
  - Power-law?

• Download popularity
  - Number of peers that downloaded the content
  - Effectiveness of caching
  - Several measurements
  - Power-law, but with a flattened head (Mandelbrot-Zipf)

• Measurements are limited in time and coverage
  - How accurate are they?
  - How accurate can they be?

L. Guo, S. Chen, Z. Xiao, E. Tan, X. Ding, and X. Zhang, "Measurements, Analysis, and Modeling of BitTorrent-like Systems," in Proc. ACM IMC, Oct. 2005.
M. Hefeeda and O. Saleh, "Traffic modeling and proportional partial caching for peer-to-peer systems," IEEE/ACM Trans. on Networking, vol. 16, no. 6, pp. 1447-1460, 2008.

Measuring P2P Content Popularity

• Overlay structure determines the measurement methodology
  - Tracker based (BitTorrent)
    - Peer harvesting
    - Tracker query (scrape)
    - Deep packet inspection
  - Unstructured (e.g., Ares, FastTrack)
    - Monitoring queries and replies
    - Deep packet inspection

• Measurement = sample of the population-wide popularity
  - Probability sampling: difficult
  - Opportunity sampling
  - Inference can be misleading

Measurement Methodology

• Screen scrape of Mininova.org
  - Largest torrent search engine
  - 31 Aug. 2008, 15 Oct. 2008, 31 Aug. 2009
  - Scrape URLs of 1690 BitTorrent trackers

• Scrape of 721 BitTorrent trackers (seeds S, leechers L, downloads D)
  - 15 Sept. 2008 to 17 Aug. 2009
  - Weekly, daily at 8pm GMT
  - Almost instantaneous (< 30 min)
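The slides do not show how scrape replies were processed. As a minimal sketch, assuming the conventional bencoded scrape reply (a `files` dictionary keyed by info hash, with `complete`, `incomplete`, and `downloaded` counters), the (S, L, D) triple per torrent could be extracted as follows. The function names and the sample reply are hypothetical.

```python
def bdecode(data: bytes, i: int = 0):
    """Decode one bencoded value starting at data[i]; return (value, next_index)."""
    c = data[i:i + 1]
    if c == b"i":                          # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":                          # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":                          # dictionary: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            value, i = bdecode(data, i)
            result[key] = value
        return result, i + 1
    if c.isdigit():                        # byte string: <length>:<bytes>
        colon = data.index(b":", i)
        length = int(data[i:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError(f"malformed bencoding at offset {i}")


def parse_scrape(raw: bytes) -> dict:
    """Map each info hash to (seeds, leechers, downloads).

    Assumes the conventional keys 'complete', 'incomplete', 'downloaded';
    missing counters default to 0.
    """
    decoded, _ = bdecode(raw)
    stats = {}
    for info_hash, entry in decoded.get(b"files", {}).items():
        stats[info_hash] = (
            entry.get(b"complete", 0),     # S: seeds
            entry.get(b"incomplete", 0),   # L: leechers
            entry.get(b"downloaded", 0),   # D: completed downloads
        )
    return stats


# Hypothetical reply for one torrent with a made-up 20-byte info hash.
SAMPLE_HASH = b"A" * 20
SAMPLE_REPLY = (
    b"d5:filesd20:" + SAMPLE_HASH +
    b"d8:completei5e10:downloadedi50e10:incompletei10eeee"
)
SAMPLE_STATS = parse_scrape(SAMPLE_REPLY)
print(SAMPLE_STATS[SAMPLE_HASH])
```

Summing S + L per torrent gives the instantaneous popularity of a snapshot, while differences in D between two snapshots give the download popularity over that interval.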

[Figure: timeline of the tracker scrapes and the Mininova scrapes]

Zipf's Law and Beyond

• Zipf's Law: $f_{\mathrm{Zipf}(f_1,\alpha)}(r) = \dfrac{f_1}{r^{\alpha}}$

  - Heavy tail: $\lim_{r\to\infty} e^{ar} f_{\mathrm{Zipf}(f_1,\alpha)}(r) = \infty$ for all $a > 0$, $\alpha > 0$

• Mandelbrot-Zipf Law: $f_{\mathrm{MZipf}(f_1,\tau,\alpha)}(r) = \dfrac{f_1}{(\tau + r)^{\alpha}}$
  - Flattened head

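• Generalized Zipf Law: $f_{\mathrm{GZipf}(f_1,\theta,\lambda,\alpha)}(r) = \dfrac{f_1}{\left(1 - \theta/\lambda + (\theta/\lambda)\, e^{\lambda r}\right)^{\alpha}}$
  - Light tail: $\lim_{r\to\infty} e^{ar} f_{\mathrm{GZipf}(f_1,\theta,\lambda,\alpha)}(r) < \infty$ for some $a > 0$, $\lambda > 0$
  - Flattened head
  - Power-law trunk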

Zipf's Law and Beyond - Example

[Figure: log-log plot of popularity vs. rank for Zipf(1e+007,1), MZipf(1e+007,50,1), and GZipf(2e+005,0.02,1e-005,1), with the head, trunk, and tail regions marked]
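The three example curves can be compared numerically. The sketch below uses the standard Zipf and Mandelbrot-Zipf forms; the generalized form is an assumption, chosen only because it has a flat head at f1, a power-law trunk, and an exponential cutoff, and because its trunk lines up with the example parameters. The parameter names (`plateau`, `theta`, `lam`) are labels introduced here.

```python
import math


def zipf(f1, alpha, r):
    """Zipf law: f1 / r^alpha."""
    return f1 / r ** alpha


def mzipf(f1, plateau, alpha, r):
    """Mandelbrot-Zipf law: f1 / (plateau + r)^alpha (flattened head)."""
    return f1 / (plateau + r) ** alpha


def gzipf(f1, theta, lam, alpha, r):
    """ASSUMED generalized form: f1 / (1 + (theta/lam) * (e^(lam*r) - 1))^alpha.

    For lam*r << 1 this behaves like f1 / (1 + theta*r)^alpha (flat head, then
    power-law trunk); for lam*r >> 1 it decays exponentially (light tail).
    """
    return f1 / (1.0 + (theta / lam) * math.expm1(lam * r)) ** alpha


# Parameters taken from the example plot legend.
for rank in (1, 10, 1_000, 100_000, 1_000_000):
    print(
        rank,
        zipf(1e7, 1, rank),
        mzipf(1e7, 50, 1, rank),
        gzipf(2e5, 0.02, 1e-5, 1, rank),
    )
```

Under this assumed form the three curves differ in the head (Zipf is not flattened), agree in the trunk, and separate again in the tail, where only the generalized curve falls off exponentially.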

What we measured (I)

• Instantaneous popularity

[Figure: log-log plot of number of peers vs. torrent rank (r) for the snapshots of 15.09.2008, 22.09.2008, 23.09.2008, and 17.08.2009. Annotations: max peers; total peers: 42 million; active swarms; "Power-law?"]

What we measured (II)

• Download popularity

[Figure: log-log plot of number of downloads (from 15 Sept. 2008 to the given date) vs. torrent rank (r) on 22 Sept. 2008 (1 week), 13 Oct. 2008 (4 weeks), 16 Mar. 2009 (26 weeks), and 17 Aug. 2009 (48 weeks). Annotations: max downloads: 50 million; total downloads: 8.3 billion; active swarms; "Power-law?" (twice)]

Instantaneous Popularity

• Instantaneous popularity, 15 Sept. 2008, 8pm GMT
• Max: $1.6 \times 10^{5}$, Total: $4.23 \times 10^{7}$, Active: $2.93 \times 10^{6}$

[Figure: log-log plot of number of peers (peers, leechers, seeds) vs. torrent rank (r)]

Power-law or Double-power-law?

• Instantaneous popularity, 15 Sept. 2008, 8pm GMT
• Max: $1.6 \times 10^{5}$, Total: $4.23 \times 10^{7}$, Active: $2.93 \times 10^{6}$

[Figure: log-log plot of number of peers (peers, leechers, seeds) vs. torrent rank (r), with the fits Zipf(1.6e+05, 0.60), Zipf(1e+06, 0.86), and GZipf(1.5e+05, 0.08, 1e-06, 0.86). Annotations: power-law trunk hypothesis (Max: $10^{6}$, Total: $6.1 \times 10^{7}$, Active: $9.5 \times 10^{6}$); "Sampling artifact?"]
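The slides do not state how the Zipf fits were obtained. One common approach, sketched here as an assumption rather than as the authors' method, is a least-squares line through (log r, log f) over a chosen rank range. The helper `active_swarms` extrapolates a fitted Zipf curve down to one peer, r = f1^(1/α); for Zipf(1e+06, 0.86) this gives roughly 9.5×10^6, which appears to be how the "Active" figure of the power-law trunk hypothesis arises. Function names and the synthetic data are hypothetical.

```python
import math


def fit_power_law(ranks, values):
    """Least-squares fit of log f = log f1 - alpha * log r; returns (f1, alpha)."""
    xs = [math.log(r) for r in ranks]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def active_swarms(f1, alpha):
    """Rank at which Zipf(f1, alpha) drops to one peer: r = f1^(1/alpha)."""
    return f1 ** (1.0 / alpha)


# Synthetic trunk (not measured data): exact Zipf(1e6, 0.86) over ranks 100..10000.
trunk_ranks = list(range(100, 10001, 100))
trunk_values = [1e6 / r ** 0.86 for r in trunk_ranks]
fitted_f1, fitted_alpha = fit_power_law(trunk_ranks, trunk_values)
print(fitted_f1, fitted_alpha, active_swarms(fitted_f1, fitted_alpha))
```

Restricting the fit to the trunk matters: including the flattened head or the cutoff tail in the regression biases the estimated exponent.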

Sampling and Exponential cutoff

• Instantaneous popularity, 15 Sept. 2008, 8pm GMT
• $2.93 \times 10^{6}$ samples from Double-Zipf in two ways
  - PropTor (discover a torrent with probability proportional to its popularity)
  - UnifTor (discover a torrent uniformly at random)

[Figure: log-log plot of number of peers vs. torrent rank (r) for the Double-Zipf fit, the measured data, PropTor, and UnifTor. Annotations: Total: $4.23 \times 10^{7}$; Total: $4.02 \times 10^{7}$; PMCC=0.99; "PropTor sampling introduces exponential cutoff"]

Download Popularity

• Download popularity over 4 and 48 weeks
• Active: $2.29 \times 10^{6}$ and $7.17 \times 10^{6}$ torrents

[Figure: log-log plot of number of downloads (from 15 Sept. 2008 to the given date) vs. torrent rank (r), measured on 13 Oct. 2008 and 17 Aug. 2009]

Power-law vs. Exponential cutoff

• Download popularity over 4 and 48 weeks
• Active: $2.29 \times 10^{6}$ and $7.17 \times 10^{6}$ torrents

[Figure: log-log plot of number of downloads (from 15 Sept. 2008 to the given date) vs. torrent rank (r)
  - 4 weeks (measured 13 Oct. 2008): Zipf(1.3e+07, 0.35), Zipf(5e+08, 1.20), GZipf(8.4e+06,0.033,1.5e-06,1.20); double power-law hypothesis: Active: $1.77 \times 10^{7}$
  - 48 weeks (measured 17 Aug. 2009): Zipf(5e+07, 0.50), Zipf(5e+08, 0.95), GZipf(3.4e+07,0.06,1.1e-06,0.95); double power-law hypothesis: Active: $1.43 \times 10^{9}$]

Sampling and Exponential cutoff

• Download popularity over 4 weeks (15 Sept. 2008 to 13 Oct. 2008)
• $2.29 \times 10^{6}$ samples from Double-Zipf in two ways
  - PropTor (discover a torrent with probability proportional to its popularity)
  - UnifTor (discover a torrent uniformly at random)

[Figure: log-log plot of number of downloads vs. torrent rank (r) for the Double-Zipf fit, the measured data, PropTor, and UnifTor. Annotations: Total: $1.31 \times 10^{9}$; Total: $1.21 \times 10^{9}$; PMCC=0.99; "PropTor sampling introduces exponential cutoff"]

Impact of Sampling

• Instantaneous popularity, 15 Sept. 2008, 8pm GMT
• $2.93 \times 10^{6}$ active torrents, $4.23 \times 10^{7}$ total peers
• Sampled in 5 ways
  - PirateBay, PropTor, UnifTor: $6.55 \times 10^{5}$ torrents
  - Mininova: $9.7 \times 10^{5}$ torrents
  - PropPeer: $4.23 \times 10^{5}$ peers (1% of total)

[Figure: log-log plot of number of peers vs. torrent rank (r) for Original, Mininova, PirateBay, PropTor, UnifTor, and PropPeer. Annotations: "Large torrents overrepresented"; "Heavy-tailed"]

Impact of Sampling

• Download popularity over 4 weeks
• $2.29 \times 10^{6}$ active torrents, $1.31 \times 10^{9}$ total downloads
• Sampled in 5 ways
  - PirateBay, PropTor, UnifTor: $1.69 \times 10^{6}$ torrents
  - Mininova: $4.95 \times 10^{5}$ active torrents
  - PropPeer: $1.31 \times 10^{6}$ peers (0.1% of total)

[Figure: log-log plot of number of downloads vs. torrent rank (r) for Original, Mininova, PirateBay, PropTor, UnifTor, and PropPeer. Annotations: "Large torrents overrepresented"; "Heavy-tailed"]

Summary

• Large measurement study of P2P content popularity
  - Instantaneous popularity
  - Download popularity

• Instantaneous popularity
  - Power-law head?, power-law trunk
  - Tail may be power-law

• Download popularity
  - Flat head, power-law trunk
  - Tail may be power-law for short periods
  - Not power-law for long periods

• Sampling and measured characteristics
  - Infer with care
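The effect of the sampling method can be illustrated with a small simulation. This is a sketch under one possible reading of PropTor (popularity-weighted draws with replacement, keeping the distinct torrents hit) and UnifTor (a uniform random subset of the same size); the population size, Zipf parameters, number of draws, and function names are all hypothetical and far smaller than the measured data set.

```python
import random


def zipf_population(n_torrents, f1, alpha):
    """Synthetic popularity of torrents ranked 1..n_torrents (index 0 = rank 1)."""
    return [f1 / r ** alpha for r in range(1, n_torrents + 1)]


def prop_tor(popularity, n_draws, rng):
    """Distinct torrents hit by n_draws popularity-weighted draws (with replacement)."""
    hits = rng.choices(range(len(popularity)), weights=popularity, k=n_draws)
    return sorted(set(hits))


def unif_tor(popularity, n_torrents, rng):
    """n_torrents distinct torrents chosen uniformly at random."""
    return sorted(rng.sample(range(len(popularity)), n_torrents))


def mean_popularity(popularity, discovered):
    """Average popularity over the discovered torrents."""
    return sum(popularity[i] for i in discovered) / len(discovered)


rng = random.Random(1)                      # fixed seed for repeatability
population = zipf_population(20_000, 1e4, 0.9)
prop_sample = prop_tor(population, 5_000, rng)
unif_sample = unif_tor(population, len(prop_sample), rng)

print("torrents discovered:", len(prop_sample))
print("mean popularity, PropTor:", mean_popularity(population, prop_sample))
print("mean popularity, UnifTor:", mean_popularity(population, unif_sample))
```

In this toy setting the popularity-weighted sample finds nearly all large torrents and misses most small ones, so its rank-popularity curve bends down earlier than the uniform sample's; a tail shape observed in such a sample says as much about the sampling as about the population.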
