<<

Distribung a Large File

F bits

d4 Peer-to-Peer File rate u s Mike Freedman d d3 COS 461: Networks 1 d hp://www.cs.princeton.edu/courses/archive/spr14/cos461/ 2 rates di

2

Server Distribung a Large File Speeding Up the File Distribuon • Sending an F-bit file to N receivers • Increase the server upload rate

– Transming NF bits at rate us – Higher link bandwidth at the server

– … takes at least NF/us me – Mulple servers, each with their own link • Receiving the at the slowest receiver • Alternave: have the receivers help

– Slowest receiver has download rate dmin= mini{di} – Receivers get a copy of the data

– … takes at least F/dmin me – … and redistribute to other receivers – To reduce the burden on the server • Download me: max{NF/us , F/dmin}

3 4

1 Peers Help Distribung a Large File Peers Help Distribung a Large File • Components of distribuon latency

F bits – Server must send each bit: min me F/us d4 – Slowest peer must receive each bit: min me F/dmin

u4 • Upload me using all upload resources upload rate us Internet – Total number of bits: NF d – Total upload bandwidth us + sumi(ui) d1 3 u • Total: max{F/u , F/d , NF/(u +sum (u ))} u1 u2 3 s min s i i d2

upload rates ui download rates d 5 i 6

Peer-to-Peer is Self-Scaling Locang the Relevant Peers • Download me grows slowly with N • Three main approaches

-server: max{NF/u s, F/dmin} – Central ()

– Peer-to-peer: max{F/us , F/dmin , NF/(us+sumi(ui))} – Query flooding () – Hierarchical overlay (, modern Gnutella) • But… – Peers may come and go • Design goals – Peers need to find each other – Scalability – Peers need to be willing to help each other – Simplicity – Robustness – Plausible deniability

7 8

2 Peer-to-Peer Networks: Napster Napster Directory Service

• Napster history: the rise • Napster history: the fall • Client contacts Napster (via TCP) – 1/99: Napster version 1.0 – Mid 2001: out of – Provides a list of files it will – 5/99: company founded business due to lawsuits – … and Napster’s central server updates the directory – Mid 2001: dozens of – 12/99: first lawsuits • Client searches on a tle or performer – 2000: 80 million users decentralized P2P alternaves – Napster idenfies online clients with the file – 2003: growth of pay – … and provides their IP addresses services like iTunes • Client requests the file from the chosen supplier – Supplier transmits the file to the client , – Both client and supplier report status to Napster Northeastern freshman 9 10

Napster Properes Napster: Limitaons of Directory • Server’s directory connually updated • is decentralized, but locang – Always know what music is currently available content is highly centralized – Point of vulnerability for legal acon – Single point of failure – Performance boleneck • Peer-to-peer file transfer – infringement – No load on the server – Plausible deniability for legal acon (but not enough) • So, later P2P systems were more distributed • Bandwidth – Gnutella went to the other extreme… – Suppliers ranked by apparent bandwidth and response me

11 12

3 14 Peer-to-Peer Networks: Gnutella Gnutella: Search by Flooding • Gnutella history • Query flooding – 2000: J. Frankel & – Join: contact a few nodes T. Pepper released to become neighbors Gnutella – Publish: no need! – Soon aer: many other – Search: ask neighbors, who search clients (e.g., , ask their neighbors xyz. Limewire, ) – Fetch: get file directly from xyz.mp3 ? – 2001: protocol another enhancements, e.g., Flooding “ultrapeers”

13

15 16 Gnutella: Search by Flooding Gnutella: Search by Flooding

xyz.mp3 xyz.mp3 ? transfer Flooding search

4 18 Gnutella: Pros and Cons Lessons and Limitaons

• Advantages • Client-Server performs well – Fully decentralized – But not always feasible: Performance not oen key issue! – Search cost distributed For the following, you should choose a system that’s – Processing per node permits powerful search (A) Flood-based (B) DHT-based (C) Either (D) Neither semancs – Organic scaling • Disadvantages – Decentralizaon of visibility and liability – Finding popular stuff – Search scope may be quite large – Finding unpopular stuff – Search me may be quite long – Fancy local queries – High overhead, and nodes come and go oen – Fancy distributed queries – Prevent data poisoning – Performance guarantees 17

19 20 Lessons and Limitaons Peer-to-Peer Networks: KaAzA

• Client-Server performs well • KaZaA history • Super-node hierarchy – But not always feasible: Performance not oen key issue! – 2001: created by Dutch – Join: on start, the client For the following, you should choose a system that’s company (Kazaa BV) contacts a super-node (A) Flood-based (B) DHT-based (C) Both (D) Neither – Single network called – Publish: client sends list – Organic scaling: C FastTrack used by other of files to its super-node – Decentralizaon of visibility and liability: C clients as well – Search: queries flooded – Finding popular stuff: A (maybe C) – Eventually protocol among super-nodes – Finding unpopular stuff: B changed so others – – Fetch: get file directly Fancy local queries: A could no longer use it – Fancy distributed queries: D from one or more peers – Prevent data poisoning: B (depends on query interface) – Performance guarantees: B 20

5 21 “Ultra/super peers” in KaZaA: Movaon for Super-Nodes KaZaA and later Gnutella • Query consolidaon – Many connected nodes may have only a few files – Propagang query to a sub-node may take more me than for the super-node to answer itself • Stability – Super-node selecon favors nodes with high up-me – How long you’ve been on is a good predictor of how long you’ll be around in the future

22

Peer-to-Peer Networks: BitTorrent: Simultaneous • BitTorrent history • Divide file into many chunks (e.g., 256 KB) – 2002: B. Cohen debuted BitTorrent – Replicate different chunks on different peers – Peers can trade chunks with other peers • Emphasis on efficient fetching, not searching – Peer can (hopefully) assemble the enre file – Distribute same file to many peers – Single publisher, many downloaders • Allows simultaneous downloading – Retrieving different chunks from different peers • Prevenng free-loading – And uploading chunks to peers – Incenves for peers to contribute – Important for very large files

23 24

6 BitTorrent: Tracker BitTorrent: Overall Architecture

• Infrastructure node Web Server Tracker

Web page – Keeps track of peers parcipang in the torrent with link to .torrent – Peers registers with the tracker when it arrives • Tracker selects peers for downloading

.torrent – Returns a random set of peer IP addresses – So the new peer knows who to contact for data C A • Can have “trackerless” system Peer Peer [Seed] B – Using distributed hash tables (DHTs) [Leech] Downloader Peer

25 26 “US” [Leech]

BitTorrent: Overall Architecture BitTorrent: Overall Architecture

Web Server Tracker Web Server Tracker

Web page Web page with link with link to .torrent to .torrent

Get-announce

Response-peer list C C A A Peer Peer Peer [Seed] Peer [Seed] B B [Leech] [Leech] Downloader Peer Downloader Peer

27 “US” [Leech] 28 “US” [Leech]

7 BitTorrent: Overall Architecture BitTorrent: Overall Architecture

Web Server Tracker Web Server Tracker

Web page Web page with link with link to .torrent to .torrent

Shake-hand pieces C C A A pieces Shake-hand Peer Peer Peer [Seed] Peer [Seed] B B [Leech] [Leech] Downloader Peer Downloader Peer

29 “US” [Leech] 30 “US” [Leech]

BitTorrent: Overall Architecture BitTorrent: Overall Architecture

Web Server Tracker Web Server Tracker

Web page Web page with link with link to .torrent to .torrent

Get-announce

pieces Response-peer listpieces C C A pieces A pieces Peer Peer pieces pieces Peer [Seed] Peer [Seed] B B [Leech] [Leech] Downloader Peer Downloader Peer

31 “US” [Leech] 32 “US” [Leech]

8 BitTorrent: Chunk Request Order BitTorrent: Rarest Chunk First • Which chunks to request? • Which chunks to request first? – Could download in order – Chunk with fewest available copies (i.e., rarest chunk) – Like an HTTP client does • Benefits to the peer • Problem: many peers have the early chunks – Avoid starvaon when some peers depart – Peers have lile to share with each other – Liming the scalability of the system • Benefits to the system • Problem: eventually nobody has rare chunks – Avoid starvaon across all peers wanng a file – E.g., the chunks need the end of the file – Balance load by equalizing # of copies of chunks – Liming the ability to complete a download • Soluons: random selecon and rarest first

33 34

Free-Riding in P2P Networks Bit-Torrent: Prevenng Free-Riding • Vast majority of users are free-riders • Peer has limited upload bandwidth – Most share no files and answer no queries – And must share it among mulple peers – Others limit # of connecons or upload speed – Tit-for-tat: favor neighbors uploading at highest rate • • A few “peers” essenally act as servers Rewarding the top four neighbors – Measure download bit rates from each neighbor – A few individuals contribung to the – Reciprocate by sending to the top four peers – Making them hubs that basically act as a server • Opmisc unchoking • BitTorrent prevent free riding – Randomly try a new neighbor every 30 seconds – Allow the fastest peers to download from you – So new neighbor has a chance to be a beer partner – Occasionally let some free loaders download

35 36

9 BitTyrant: Gaming BitTorrent Conclusions • BitTorrent can be gamed, too • Finding the appropriate peers – Peer to top N peers at rate 1/N – Centralized directory (Napster) – E.g., if N=4 and peers upload at 15, 12, 10, 9, 8, 3 – Query flooding (Gnutella) – … peer uploading at rate 9 gets treated quite well – Super-nodes (KaZaA) • Best to be the Nth peer in the list, rather than 1st • BitTorrent – Offer just a bit more bandwidth than low-rate peers – Distributed download of large files – And you’ll sll be treated well by others – An-free-riding techniques

• BitTyrant soware hp://biyrant.cs.washington.edu/ • Great example of how change can happen so – Uploads at higher rates to higher-bandwidth peers quickly in applicaon-level protocols

37 38

10