BitTorrent
Chao Zhang, Prithula Dhungel, Di Wu, Zhengye Liu and Keith W. Ross

Woonhak Kang, VLDB Lab., 2010. 11. 04

Contents

• Introduction
• BitTorrent (Background)
  § Architecture and Terminology
  § Public and Private torrent sites
• Overview of BitTorrent Darknets Operation
• Analysis
  § Macroscopic
  § Medium-scopic
  § Microscopic
• Conclusion

SKKU VLDB Lab.

Introduction

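• Private torrent sites
  § Open only to registered members
  § Membership by invitation, or by signing up while a site temporarily opens registration
  § Record each user's upload and download volume
    - Restrict users according to their up/down ratio
    - Reward users with a high up/down ratio
• Motivation
  § Private sites have received little attention from researchers.
  § Their distinctive policies make them behave differently from public torrent sites.
  § Understanding the BitTorrent system as a whole requires considering both public and private sites.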

Introduction

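• Analysis
  § Macroscopic
    - Analyze more than 800 private torrent sites
    - Using Sharky's list and Alexa rank
    - Analyze the total number of torrent files, users, and peers
  § Medium-scopic
    - Analyze 4 popular private torrent sites
    - Analyze trackers, peers, users, and the actual shared files
    - Relationship between public and private sites
  § Microscopic
    - Analyze HDChina
    - Examine users' upload/download records and online time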

BitTorrent (Background)

• BitTorrent is a system for efficient and scalable replication of large amounts of static data
  § Scalable: throughput increases with the number of downloaders
  § Efficient: it uses a large share of the available network bandwidth
• The file to be distributed is split into pieces, and an SHA-1 hash is calculated for each piece
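To make the piece/hash idea concrete, here is a minimal Python sketch that splits a byte string into fixed-size pieces and computes the SHA-1 digest of each one. The 256 KiB default piece length and the helper names `split_and_hash` and `verify_piece` are illustrative choices, not values taken from the slides.

```python
import hashlib

def split_and_hash(data: bytes, piece_length: int = 256 * 1024) -> list[bytes]:
    """Split data into fixed-size pieces (the last may be shorter) and SHA-1 each piece."""
    if piece_length <= 0:
        raise ValueError("piece_length must be positive")
    return [hashlib.sha1(data[i:i + piece_length]).digest()
            for i in range(0, len(data), piece_length)]

def verify_piece(piece: bytes, expected_digest: bytes) -> bool:
    """Check a downloaded piece against the digest listed in the metadata."""
    return hashlib.sha1(piece).digest() == expected_digest

data = b"x" * 600_000            # hypothetical 600,000-byte file
hashes = split_and_hash(data)    # 262,144 + 262,144 + 75,712 bytes
print(len(hashes))               # 3
```

Because every piece has its own digest, a downloader can verify each piece as soon as it arrives instead of waiting for the whole file.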

BitTorrent (Background)

• A metadata file (.torrent) is distributed to all peers
  § Usually via HTTP
• The metadata contains:
  § The SHA-1 hashes of all pieces
  § A mapping of the pieces to files
  § A reference to the tracker (its announce URL)
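The infohash used throughout the analysis is the SHA-1 of the bencoded `info` dictionary of the .torrent file. The sketch below assumes a single-file torrent with hypothetical field values; `bencode` and `info_hash` are illustrative helpers, not a particular library's API.

```python
import hashlib

def bencode(value) -> bytes:
    """Encode ints, str/bytes, lists and dicts using BitTorrent's bencoding."""
    if isinstance(value, bool):
        raise TypeError("bool is not bencodable")
    if isinstance(value, int):
        return b"i" + str(value).encode() + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return str(len(value)).encode() + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Keys are byte strings and must appear sorted in raw byte order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")

def info_hash(info: dict) -> str:
    """The infohash is the SHA-1 of the bencoded 'info' dictionary (hex-encoded here)."""
    return hashlib.sha1(bencode(info)).hexdigest()

# Hypothetical single-file torrent: one 5-byte file, one piece.
info = {"name": "example.txt", "length": 5, "piece length": 262144,
        "pieces": hashlib.sha1(b"hello").digest()}
metainfo = {"announce": "http://tracker.example/announce", "info": info}
torrent_bytes = bencode(metainfo)
print(info_hash(info))
```

Because a `private` key lives inside `info`, adding it changes the infohash of otherwise identical content.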

BitTorrent (Background)

• The tracker is a central server keeping a list of all peers participating in the swarm • A swarm is the set of peers that are participating in distributing the same files • A peer joins a swarm by asking the tracker for a peer list and connects to those peers
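A hypothetical in-memory tracker illustrates the swarm bookkeeping described above: a peer announces an infohash, gets back a sample of the other swarm members, and is added to the swarm. `ToyTracker` and `max_peers` are made-up names, and the real HTTP announce protocol (query parameters, compact peer lists, re-announce intervals) is deliberately left out.

```python
import random
from collections import defaultdict

class ToyTracker:
    """Sketch of a tracker: remembers which peers belong to each swarm (keyed by infohash)."""

    def __init__(self, max_peers: int = 50):
        self.swarms = defaultdict(set)   # infohash -> set of (ip, port)
        self.max_peers = max_peers

    def announce(self, info_hash: str, peer: tuple[str, int]) -> list[tuple[str, int]]:
        """Return up to max_peers other swarm members, then add the caller to the swarm."""
        others = [p for p in self.swarms[info_hash] if p != peer]
        self.swarms[info_hash].add(peer)
        return random.sample(others, min(len(others), self.max_peers))

    def leave(self, info_hash: str, peer: tuple[str, int]) -> None:
        """Remove a peer that has stopped participating."""
        self.swarms[info_hash].discard(peer)

tracker = ToyTracker()
print(tracker.announce("abc", ("10.0.0.1", 6881)))  # first peer sees an empty swarm
print(tracker.announce("abc", ("10.0.0.2", 6881)))  # second peer learns about the first
```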

Source: An introduction to the BitTorrent Peer-to-Peer File-System, J.A. Pouwelse et al.

BitTorrent (Background)

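• Private vs. public torrents
  § A private torrent has the private flag set to 1
    - The flag determines whether DHT and PEX are enabled: with the flag set to 1, clients disable both and get peers only from the tracker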
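The effect of the private flag can be summarized as a small decision function. It follows BEP 27 (a torrent whose `info` dictionary has `private` set to 1 should obtain peers only from its tracker); the function name `peer_sources` and the returned labels are illustrative.

```python
def peer_sources(info: dict) -> list[str]:
    """Peer-discovery mechanisms a client may use for a torrent with this info dict."""
    if info.get("private") == 1:
        # Private torrent: tracker only, so DHT and PEX stay disabled.
        return ["tracker"]
    # Public torrent: the client may also discover peers via DHT and PEX.
    return ["tracker", "dht", "pex"]

print(peer_sources({"private": 1}))
print(peer_sources({}))
```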

Overview of BitTorrent Darknets Operation

• Darknet owner
  § Runs the web site and the tracker
• User
  § Registers on the web site and receives a personal “passkey”
  § Invitation system
  § Tracker Checker, BTRACS
• Incentive policies
  § Ratio incentive
  § Enforce a minimum ratio
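The ratio incentive can be sketched as tracker-side accounting keyed by each user's passkey. The class names, the 0.5 minimum ratio, and the grace allowance before enforcement are hypothetical parameters chosen for illustration; actual darknets set their own rules, and this sketch takes per-announce increments rather than cumulative counters for simplicity.

```python
from dataclasses import dataclass

@dataclass
class Account:
    uploaded: int = 0     # bytes
    downloaded: int = 0   # bytes

    @property
    def ratio(self) -> float:
        # Nothing downloaded yet: treat the ratio as unbounded.
        return float("inf") if self.downloaded == 0 else self.uploaded / self.downloaded

class RatioTracker:
    """Hypothetical darknet tracker that attributes traffic to users via their passkey."""

    def __init__(self, min_ratio: float = 0.5, grace_bytes: int = 5 * 2**30):
        self.accounts: dict[str, Account] = {}
        self.min_ratio = min_ratio
        self.grace_bytes = grace_bytes  # no enforcement until this much has been downloaded

    def register(self, passkey: str) -> None:
        self.accounts[passkey] = Account()

    def announce(self, passkey: str, uploaded_delta: int, downloaded_delta: int) -> bool:
        """Record traffic; return False for unknown passkeys or users below the minimum ratio."""
        acct = self.accounts.get(passkey)
        if acct is None:
            return False
        acct.uploaded += uploaded_delta
        acct.downloaded += downloaded_delta
        if acct.downloaded < self.grace_bytes:
            return True
        return acct.ratio >= self.min_ratio
```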

Analysis

• Macroscopic
  § A rough idea of
    - How many BitTorrent darknets there are
    - How many files are being shared
    - How many users participate
    - Where the trackers are located
    - Where the users of the darknets are located
• Methodology
  § Finding darknets: Sharky's list
  § Alexa rank
  § Crawler

Analysis

• Sharky's list
  § 900+ darknets in June 2009
  § 963 today
• How the list is created
  § Tracker Checker
  § Blogs and forums
  § Google search
  § IRC invite channels

Analysis

• In this paper
  § Only operational sites were kept, checked manually
  § 863 private sites
• Category analysis
  § 55% are general sites

Analysis

• Geographic distribution
  § Using MaxMind GeoIP
  § Europe leads

Analysis

• Which sites are most popular?
  § Using Alexa's rank
  § Alexa rank: reflects current usage statistics
  § Picked the 15 most popular darknets
    - 6 of them are located in the Netherlands
    - 1 in China

Analysis

• Top site: Torrents.ru
  § 612,000 torrents
  § 3.5 million user accounts
• Usage by country
  § Where is the Netherlands?

Analysis

• Total estimation
  § Regression analysis between Alexa rank and darknet size
  § Obtained statistics for 33 private sites (out of 67 sites)
  § Gathered the statistics manually from the sites (some of them partial)
  § Correlation with Alexa rank
    - # of torrents: 0.84
    - # of accounts: 0.81
    - # of peers: 0.89
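The correlation figures above come from comparing Alexa rank with site statistics. A minimal sketch of such a computation follows; comparing both quantities on a log10 scale is an assumption (they span several orders of magnitude), and the site data is made up.

```python
import math

def pearson(xs, ys) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def log_log_correlation(ranks, counts) -> float:
    # Assumption: compare log10(Alexa rank) with log10(count).
    return pearson([math.log10(r) for r in ranks], [math.log10(c) for c in counts])

# Hypothetical (Alexa rank, number of torrents) for five sites.
ranks = [500, 2_000, 10_000, 40_000, 150_000]
torrents = [600_000, 90_000, 30_000, 5_000, 1_200]
print(round(log_log_correlation(ranks, torrents), 3))
```

With this data the coefficient is negative, because a larger Alexa rank number means a less popular site.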

Analysis

• Total estimation
  § Obtain a regression equation for each quantity, where x is the Alexa rank
    - y_t: number of torrents
    - y_a: number of accounts
    - y_p: number of peers
• Estimate the aggregate totals using the equations
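One way to turn such a regression into aggregate estimates is sketched below. The power-law form y = a * x**b, fitted by least squares on log10 values, is an assumption, since the fitted equations themselves are not shown; `fit_power_law` and `estimate_total` are hypothetical helpers.

```python
import math

def fit_power_law(ranks, values):
    """Least-squares fit of log10(y) = log10(a) + b * log10(x); returns (a, b)."""
    xs = [math.log10(r) for r in ranks]
    ys = [math.log10(v) for v in values]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    log_a = my - b * mx
    return 10 ** log_a, b

def estimate_total(a: float, b: float, all_ranks) -> float:
    """Sum the fitted y = a * x**b over the Alexa ranks of all darknets."""
    return sum(a * r ** b for r in all_ranks)

# Hypothetical sample that lies exactly on y = 1e9 / x.
a, b = fit_power_law([1_000, 10_000, 100_000], [1e6, 1e5, 1e4])
print(a, b)
print(estimate_total(a, b, [1_000, 2_000, 50_000]))
```

Fitting on a handful of sites with known statistics and then summing over the ranks of every darknet is what lets a small, manually collected sample stand in for the whole population.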

Analysis

• Private vs. public
  § Public: top 5 public torrent sites (…, Pirate Bay, Torrent Reactor, Btmonster, and Torrent Portal)
  § Collected
    - 8.8 million .torrent files (4.6 million unique infohashes)
    - 38,996 trackers
  § Observed
    - 5,085,217 unique peers
• Summary
  § Darknets
    - The private world is comparable in size to the public one
    - 4.4 million torrents vs. 4.6 million infohashes
    - More active peers than on the public sites

Analysis

• Medium-scopic
  § 4 sites
    - Torrents.ru, Zamunda, BitSoup, HDChina
    - Each uses only one tracker
    - Crawled Zamunda, BitSoup, and HDChina from April 11, 2009 to June 13, 2009
    - Torrents.ru sets the private flag to 0 and uses DHT
    - Active torrent: a torrent with at least one active peer

Analysis

• Overlap with the public ecosystem
  § Infohash-based
    - Torrents with the same infohash (SHA-1)
  § Piece-based
    - Because of the private flag, the same content gets a different infohash
    - Alternative: match the hashes of individual pieces
    - Finds overlap that infohash matching misses
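Piece-based matching can be sketched by indexing the piece hashes of public torrents and counting how many of a private torrent's pieces appear in each one. The 50% threshold and all names are assumptions; note that piece hashes only line up when both torrents use the same piece length and file layout.

```python
import hashlib

def piece_hashes(data: bytes, piece_length: int) -> list[bytes]:
    """SHA-1 digest of each fixed-size piece, as listed in a torrent's metadata."""
    return [hashlib.sha1(data[i:i + piece_length]).digest()
            for i in range(0, len(data), piece_length)]

def build_index(public_torrents: dict[str, list[bytes]]) -> dict[bytes, set[str]]:
    """Map every piece hash to the public torrents that contain it."""
    index: dict[bytes, set[str]] = {}
    for tid, pieces in public_torrents.items():
        for h in set(pieces):
            index.setdefault(h, set()).add(tid)
    return index

def piece_matches(private_pieces, index, min_fraction: float = 0.5) -> list[str]:
    """Public torrent ids sharing at least min_fraction of the private torrent's distinct pieces."""
    distinct = set(private_pieces)
    counts: dict[str, int] = {}
    for h in distinct:
        for tid in index.get(h, ()):
            counts[tid] = counts.get(tid, 0) + 1
    return sorted(tid for tid, c in counts.items() if c >= min_fraction * len(distinct))

# Hypothetical content: 16 KiB of distinct bytes, and an unrelated file of the same size.
content = b"".join(i.to_bytes(4, "big") for i in range(4096))
other = b"".join(i.to_bytes(4, "little") for i in range(4096))
index = build_index({"pub-A": piece_hashes(content, 4096), "pub-B": piece_hashes(other, 4096)})
print(piece_matches(piece_hashes(content, 4096), index))
```

The private copy would carry a different infohash, yet its pieces still hash to the same values as the public copy.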

Analysis

• Overlap with the public ecosystem
  § Infohash-based
  § Piece-based
  § Comparatively low overlap between the darknets

Analysis

• Overlap between public sites
  § More than 50%

Analysis

• Title match, extended match
  § Title match
    - The same file has the same title
    - e.g., Ghost Ship.HDDVD.1080p.DTS.x264-CtrlHD
    - The title encodes the movie title, source media, resolution, codec, release team, and so on
  § Extended match
    - Title match + the same file size (within 5%)
    - Catches the same file with a different hash set, e.g., a different encoding rate or language
  § Methodology
    - Take the top 100 torrents and 100 random torrents
    - Apply the TM and EM checks
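Title match (TM) and extended match (EM) might be implemented roughly as follows. Normalizing release names by lower-casing and treating `.`, `_` and whitespace as separators, and measuring the 5% size tolerance relative to the larger file, are assumptions made for this sketch.

```python
import re

def normalize_title(name: str) -> str:
    """Lower-case a release name and collapse '.', '_' and whitespace into single spaces."""
    return " ".join(re.split(r"[.\s_]+", name.strip().lower())).strip()

def title_match(a: str, b: str) -> bool:
    """TM: both release names normalize to the same string."""
    return normalize_title(a) == normalize_title(b)

def extended_match(a_name: str, a_size: int, b_name: str, b_size: int,
                   tolerance: float = 0.05) -> bool:
    """EM: title match plus file sizes within `tolerance` of the larger size."""
    if not title_match(a_name, b_name):
        return False
    return abs(a_size - b_size) <= tolerance * max(a_size, b_size)

name = "Ghost Ship.HDDVD.1080p.DTS.x264-CtrlHD"
print(title_match(name, "ghost.ship.hddvd.1080p.dts.x264-ctrlhd"))
print(extended_match(name, 10_000, name, 10_400))
```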

Analysis

• Leakage into the public ecosystem
  § IPs of private-torrent peers have leaked into the DHT
    - Knowing these IPs, one does not need to register at the private site
  § Methodology
    - Developed a DHT crawler
    - Crawled the DHT for all the infohashes obtained from the private sites
    - Low leakage rate, except for Torrents.ru
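Given the output of a DHT crawl, a leakage rate can be computed as the share of private infohashes for which at least one peer was found. The `dht_peers` mapping stands in for hypothetical crawler output, and this definition of leakage rate is an assumption for illustration.

```python
def leakage_rate(private_infohashes, dht_peers: dict) -> float:
    """Fraction of private-site infohashes for which the DHT crawl found at least one peer.

    dht_peers: hypothetical crawler output mapping infohash -> set of (ip, port).
    """
    hashes = set(private_infohashes)
    if not hashes:
        return 0.0
    leaked = sum(1 for ih in hashes if dht_peers.get(ih))
    return leaked / len(hashes)

def leaked_peers(private_infohashes, dht_peers: dict) -> set:
    """Peers reachable through the DHT without registering at the private site."""
    found = set()
    for ih in set(private_infohashes):
        found |= dht_peers.get(ih, set())
    return found

dht = {"a": {("1.2.3.4", 6881)}, "b": set()}
print(leakage_rate(["a", "b", "c", "d"], dht))
```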

Analysis

• Characteristics of private torrents
  § Newly released torrents attract more peers
  § Popularity on private sites decays much less
    - Because of the purging policy
    - Unpopular torrents are removed

Analysis

• Characteristics of private torrents
  § The average torrent age on private sites is smaller
  § Because of the purging policy
    - Old, unpopular torrents are removed by administrators
  § Rank
    - Has a longer tail

Analysis

• Microscopic
  § HDChina
    - HD movies and TV series
    - 18,054 user accounts
    - 15,738 active torrents
    - Ratio requirements: 10 GB with a 0.3 up/down ratio; 100 GB with a 0.7 up/down ratio
    - Parsed all user data in HDChina

Analysis

• Microscopic
  § Upload/download volume
    - Reflects the incentive policy
    - Total upload/download: 17,054 TB / 2,568 TB
    - Many users upload more than 1 TB

Analysis

• Microscopic
  § Up/down ratio
    - More than 90% of users have a ratio higher than 1
    - Less than 5% have a ratio higher than 100
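The ratio statistics above amount to counting users whose upload/download ratio exceeds a threshold. In this sketch, users who have downloaded nothing are treated as having an unbounded ratio, which is an assumption; the sample users are hypothetical.

```python
def up_down_ratio(uploaded: float, downloaded: float) -> float:
    # Assumption: a user with nothing downloaded has an unbounded ratio.
    return float("inf") if downloaded == 0 else uploaded / downloaded

def fraction_above(users, threshold: float) -> float:
    """Share of users whose up/down ratio is strictly greater than threshold."""
    ratios = [up_down_ratio(up, down) for up, down in users]
    return sum(r > threshold for r in ratios) / len(ratios)

# Hypothetical (uploaded_GB, downloaded_GB) pairs: ratios 5, 0.5, 300, 2.
users = [(50, 10), (5, 10), (300, 1), (20, 10)]
print(fraction_above(users, 1), fraction_above(users, 100))  # 0.75 0.25
```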

Analysis

• Microscopic
  § Online time
    - 50% of users return within 10 hours
    - 95% of users return within 100 hours
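Return times can be derived from session logs as the gap between the end of one session and the start of the next. Pooling the gaps across all users, and the `(start_hour, end_hour)` session format, are assumptions; the sample sessions are made up.

```python
def return_gaps(sessions) -> list[float]:
    """Gaps in hours between consecutive sessions of one user; sessions are (start_hour, end_hour)."""
    ordered = sorted(sessions)
    return [nxt[0] - cur[1] for cur, nxt in zip(ordered, ordered[1:])]

def fraction_within(gaps, hours: float) -> float:
    """Share of gaps no longer than the given number of hours."""
    return sum(g <= hours for g in gaps) / len(gaps)

# Hypothetical session logs for two users; gaps are pooled across users.
logs = {
    "u1": [(0, 2), (5, 6), (50, 51)],
    "u2": [(10, 12), (200, 201)],
}
gaps = [g for s in logs.values() for g in return_gaps(s)]
print(gaps, fraction_within(gaps, 10), fraction_within(gaps, 100))
```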

Conclusion

• Investigated 800+ private torrent sites
  § In terms of geographic concentration and content distribution
  § Using Sharky's list and Alexa rank
    - Presents an informative view of the darknet landscape
    - Regression analysis gives us estimates of the totals
    - Individual private torrent sites are relatively small, but the aggregate size of the darknets is large
• Popular torrent sites
  § Private sites vs. public sites
  § Overlap, leakage

QnA

• Any question?
