Peer-to-Peer

Credits: Slides adapted from J. Pang, B. Richardson, I. Stoica, M. Cuenca, B. Cohen, S. Ramesh, D. Qui, R. Shrikant, Xiaoyan Li and R. Martin.

Reminders

• HW1 due next class (9/14)
• PA1 to be assigned next class

Intro

• Quickly grown in popularity
  – Hundreds of file sharing applications
  – 35 million American adults use P2P networks -- 29% of all Internet users in US!
  – Audio/Video transfer now dominates traffic on the Internet
• But what is P2P?
  – Searching or location? -- DNS, Google!
  – Computers “Peering”? -- Server Clusters, IRC Networks, Internet Routing!
  – Clients with no servers? -- Doom, Quake!

Intro (2)

• Fundamental difference: Take advantage of resources at the edges of the network
• What’s changed:
  – End-host resources have increased dramatically
  – Broadband connectivity now common
• What hasn’t:
  – Deploying infrastructure still expensive

Overview

• Centralized Database – Napster
• Query Flooding – Gnutella
• Intelligent Query Flooding – KaZaA
• Swarming – BitTorrent

The Lookup Problem

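[Figure: nodes N1 to N6 connected through the Internet. A publisher holds Key=“title”, Value=MP3 data…; a client issues Lookup(“title”). The open question (“?”) is which node stores the key.]

The Lookup Problem (2)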

• Common Primitives:
  – Join: how do I begin participating?
  – Publish: how do I advertise my file?
  – Search: how do I find a file?
  – Fetch: how do I retrieve a file?

Next Topic...

• Centralized Database – Napster
• Query Flooding – Gnutella
• Intelligent Query Flooding – KaZaA
• Swarming – BitTorrent

Napster: History

• In 1999, S. Fanning launches Napster
• Peaked at 1.5 million simultaneous users
• Jul 2001, Napster shuts down

Napster: Overview

• Centralized Database:
  – Join: on startup, client contacts central server
  – Publish: reports list of files to central server
  – Search: query the server => return someone that stores the requested file
  – Fetch: get the file directly from peer

Napster: Publish

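[Figure: the peer at 123.2.21.23 publishes “I have X, Y, and Z!” to the central server, which records insert(X, 123.2.21.23), ...]

Napster: Search

[Figure: a client sends the Query “Where is file A?” to the central server; the server evaluates search(A) --> 123.2.0.18 and sends the Reply; the client then fetches the file directly from 123.2.0.18.]

Napster: Discussion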
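To make the centralized design concrete, here is a minimal sketch that assumes the server keeps an in-memory map from file name to the peers that reported it. The class name CentralIndex and its methods are hypothetical, chosen only to mirror the Publish and Search primitives; this is not Napster's actual protocol.

```python
from collections import defaultdict


class CentralIndex:
    """Hypothetical central server: maps each file name to the peers that hold it."""

    def __init__(self):
        self.holders = defaultdict(set)  # file name -> set of peer addresses

    def publish(self, peer, names):
        # Publish: a peer reports its list of files to the server.
        for name in names:
            self.holders[name].add(peer)

    def search(self, name):
        # Search: one dictionary lookup on the server; no other peer is contacted.
        return sorted(self.holders.get(name, ()))

    def leave(self, peer):
        # Drop a departing peer from every entry so searches stop returning it.
        for peers in self.holders.values():
            peers.discard(peer)


index = CentralIndex()
index.publish("123.2.21.23", ["X", "Y", "Z"])
index.publish("123.2.0.18", ["A"])
print(index.search("A"))  # the client would then fetch A directly from this peer
```

In this sketch a search touches only the server, while the server's table grows with every file any peer publishes, and nothing works if the server is unreachable.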

• Pros:
  – Simple
  – Search scope is O(1)
  – Controllable (pro or con?)
• Cons:
  – Server maintains O(N) State
  – Server does all processing
  – Single point of failure

Next Topic...

• Centralized Database – Napster
• Query Flooding – Gnutella
• Intelligent Query Flooding – KaZaA
• Swarming – BitTorrent

Gnutella: History

• In 2000, J. Frankel and T. Pepper from Nullsoft released Gnutella
• Soon many other clients: Bearshare, Morpheus, LimeWire, etc.
• In 2001, many protocol enhancements including “ultrapeers”

Gnutella: Overview

• Query Flooding:
  – Join: on startup, client contacts a few other nodes; these become its “neighbors”
  – Publish: no need
  – Search: ask neighbors, who ask their neighbors, and so on... when/if found, reply to sender.
  – Fetch: get the file directly from peer

Gnutella: Search

[Figure: a node floods the Query “Where is file A?” to its neighbors, who forward it onward; two nodes answer “I have file A.” and send a Reply back toward the requester.]

Gnutella: Discussion
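To make the flooding search and its cost concrete, the following sketch floods a query over a small neighbor graph and counts the messages it generates. The graph, the files table and the function name flood_search are illustrative assumptions, and the hop limit (ttl) stands in for the time-to-live that Gnutella queries carry; real Gnutella messages have more fields than this.

```python
from collections import deque


def flood_search(neighbors, files, start, wanted, ttl):
    """Flood a query from `start`; return (holders found, query messages sent)."""
    seen = {start}
    queue = deque([(start, ttl, None)])  # (node, hops left, node the query came from)
    holders, messages = [], 0
    while queue:
        node, hops_left, came_from = queue.popleft()
        if node != start and wanted in files.get(node, ()):
            holders.append(node)  # this node would send a reply
        if hops_left == 0:
            continue  # hop limit exhausted: do not forward further
        for peer in neighbors[node]:
            if peer == came_from:
                continue  # never send the query back where it came from
            messages += 1  # every forwarded copy costs a message
            if peer not in seen:  # a peer that already saw the query drops the duplicate
                seen.add(peer)
                queue.append((peer, hops_left - 1, node))
    return holders, messages


# Hypothetical five-node overlay; N4 and N5 hold file A.
neighbors = {
    "N1": ["N2", "N3"],
    "N2": ["N1", "N4"],
    "N3": ["N1", "N4"],
    "N4": ["N2", "N3", "N5"],
    "N5": ["N4"],
}
files = {"N4": {"A"}, "N5": {"A"}}
print(flood_search(neighbors, files, "N1", "A", ttl=2))
print(flood_search(neighbors, files, "N1", "A", ttl=3))
```

Raising the hop limit finds more holders but also sends more messages, including duplicates that are simply dropped; with no hop limit every reachable node sees the query.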

• Pros:
  – Fully de-centralized
  – Search cost distributed
• Cons:
  – Search scope is O(N)
  – Search time is O(???)
  – Nodes leave often, network unstable

Aside: Search Time?

Aside: All Peers Equal?

[Figure: peers with unequal access links: four on 1.5Mbps DSL, three on 56kbps modems, one on a 10Mbps LAN.]

Next Topic...

• Centralized Database – Napster
• Query Flooding – Gnutella
• Intelligent Query Flooding – KaZaA
• Swarming – BitTorrent
• Unstructured Overlay Routing –
• Structured Overlay Routing – Distributed Hash Tables

KaZaA: History

• In 2001, KaZaA created by Dutch company Kazaa BV
• Single network called FastTrack used by other clients as well: Morpheus, giFT, etc.
• Eventually protocol changed so other clients could no longer talk to it

KaZaA: Overview

• “Smart” Query Flooding:
  – Join: on startup, client contacts a “supernode” ... may at some point become one itself
  – Publish: send list of files to supernode
  – Search: send query to supernode, supernodes flood query amongst themselves.
  – Fetch: get the file directly from peer(s); can fetch simultaneously from multiple peers

KaZaA: Network Design

[Figure: network of “Super Nodes”]

KaZaA: File Insert

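[Figure: the peer at 123.2.21.23 publishes “I have X!” to its supernode, which records insert(X, 123.2.21.23), ...]

KaZaA: File Search

[Figure: a client asks its supernode “Where is file A?”; the supernodes flood search(A) among themselves, and Query Replies return the holders 123.2.22.50 and 123.2.0.18.]

KaZaA: Fetching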

• More than one node may have the requested file...
• How to tell?
  – Must be able to distinguish identical files
  – Not necessarily same filename
  – Same filename not necessarily same file...
• Use Hash of file
  – KaZaA uses UUHash (MD5+CRC32 checksum of staggered blocks): fast, but not secure
  – Alternatives: full MD5, SHA-1
• How to fetch?
  – Get bytes [0..1000] from A, [1001..2000] from B
  – Alternative: Erasure Codes for greater redundancy.

KaZaA: Discussion
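The two fetching questions, which peers hold the same file and how to divide the download among them, can be sketched as follows. This assumes a full SHA-1 over the file contents (one of the alternatives named on the slide, not UUHash) and a simple equal split into contiguous byte ranges; the function names, peer labels and file names are hypothetical.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Identify a file by its bytes (full SHA-1), not by its name."""
    return hashlib.sha1(data).hexdigest()


def split_ranges(size, peers):
    """Assign each peer one contiguous, inclusive byte range of the file."""
    chunk = -(-size // len(peers))  # ceiling division
    ranges = []
    for i, peer in enumerate(peers):
        start = i * chunk
        if start >= size:
            break  # more peers than bytes left to hand out
        ranges.append((peer, start, min(start + chunk, size) - 1))
    return ranges


# Two peers offer the same bytes under different names; a third reuses a name
# for different bytes.
offers = {
    ("A", "song.mp3"): b"identical bytes",
    ("B", "track01.mp3"): b"identical bytes",
    ("C", "song.mp3"): b"different bytes",
}
wanted = content_id(b"identical bytes")
sources = sorted(peer for (peer, _), data in offers.items() if content_id(data) == wanted)
plan = split_ranges(2001, sources)  # pretend the file is 2001 bytes long
print(sources, plan)
```

Matching on the hash groups A and B despite their different file names and excludes C despite its matching name. Hashing the whole file is slower than sampling staggered blocks, but a sampled hash cannot detect changes in the bytes it skips.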

• Pros:
  – Tries to take into account node heterogeneity:
    • Bandwidth
    • Host Computational Resources
    • Host Availability (?)
  – Rumored to take into account network locality (latency)
• Cons:
  – Mechanisms easy to circumvent
  – Still no real guarantees on search scope or search time
  – Easy to intentionally corrupt content.

Next Topic...

• Centralized Database – Napster
• Query Flooding – Gnutella
• Intelligent Query Flooding – KaZaA
• Swarming – BitTorrent

BitTorrent: History

• In 2002, B. Cohen debuted BitTorrent
• Key Motivation:
  – Popularity exhibits temporal locality (Flash Crowds)
  – E.g., Slashdot effect, CNN on 9/11, new movie/game release
• Focused on Efficient Fetching, not Searching:
  – Distribute the same file to all peers
  – Single publisher, multiple downloaders
• Has some “real” publishers:
  – Blizzard Entertainment using it to distribute the beta of their new game
  – May 2006 deal with Warner Brothers for content distribution.

BitTorrent Working

BitTorrent: Publish/Join

[Figure: Tracker/Seed]

BitTorrent: Fetch

BitTorrent: Overview

• Swarming:
  – Join: contact centralized “tracker” server, get a list of peers.
  – Publish: Run a tracker server.
  – Search: Out-of-band. E.g., use Google to find a tracker for the file you want.
  – Fetch: Download chunks of the file from your peers. Upload chunks you have to them.

Algorithms/Features in BT

• Pipelining of requests – to avoid delay in pieces being sent
• Piece Selection
  – Strict priority [within sub-pieces]
  – Rarest first
  – Random First piece
  – Endgame mode [for request of final subpieces]
• Choking, Optimistic unchoking, Anti-snubbing, Upload only

Free-Riding with Optimistic Unchoking

• If a peer wants to free-ride in BT, the maximum download rate it will get is 1/(n_u + 1) of the maximum rate
• This is also a very powerful result showing why BT performs so much better when compared to other P2P networks like Gnutella and Kazaa.

Notes on BitTorrent

• The BT network graph is random. Other ‘clever’ peer assignment algorithms have resulted only in network partitions
• Single point of failure (tracker)
• Lots of magic numbers
  – how many peers should a peer upload to?
  – how long should a peer wait in optimistic unchoke before moving on to other peers?
  – how many requests should be pipelined?
• “Starting multiple torrents at once is a bad idea” - Bram

BitTorrent: Sharing Strategy
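As an illustration of the rarest-first piece selection listed among the BitTorrent algorithms, here is a minimal sketch. It assumes each peer's advertised pieces are available as a set of piece indices; the function name, the peer labels and the random tie-break are illustrative choices, not taken from the BitTorrent client.

```python
import random
from collections import Counter


def rarest_first(have, peer_pieces, rng=random):
    """Pick the next piece to request: the missing piece held by the fewest peers."""
    counts = Counter()
    for pieces in peer_pieces.values():
        counts.update(pieces - have)  # count only pieces we still need
    if not counts:
        return None  # no peer offers anything we are missing
    fewest = min(counts.values())
    candidates = sorted(p for p, c in counts.items() if c == fewest)
    return rng.choice(candidates)  # break ties at random


have = {0}
peer_pieces = {
    "A": {0, 1, 2},
    "B": {1, 2},
    "C": {1, 3},
}
print(rarest_first(have, peer_pieces))  # piece 3 is held by only one peer
```

Requesting scarce pieces first spreads them through the swarm, so the download does not stall if the only peer holding a piece leaves.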

• Employ “Tit-for-tat” sharing strategy
  – “I’ll share with you if you share with me”
  – Be optimistic: occasionally let freeloaders download
    • Otherwise no one would ever start!
    • Also allows you to discover better peers to download from when they reciprocate
  – Meant to counter Prisoner’s Dilemma problem.
• Approximates Pareto Efficiency
  – Game Theory: “No change can make anyone better off without making others worse off”

BitTorrent: Summary
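The tit-for-tat strategy with optimistic unchoking can be sketched as a periodic choice of which peers to upload to. This is a simplified model under stated assumptions: download_rates holds the rate each peer has recently uploaded to us, regular_slots=4 is an assumed default rather than a value from these slides, and the function name is hypothetical.

```python
import random


def choose_unchoked(download_rates, regular_slots=4, rng=random):
    """Return (peers unchoked for reciprocating, one optimistic unchoke or None)."""
    # Tit-for-tat: reward the peers that currently give us the most.
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    regular = ranked[:regular_slots]
    choked = ranked[regular_slots:]
    # Be optimistic: give one otherwise choked peer a chance, even a freeloader.
    optimistic = rng.choice(choked) if choked else None
    return regular, optimistic


# Hypothetical rates (KB/s) at which each peer has been uploading to us.
rates = {"A": 50, "B": 10, "C": 0, "D": 30, "E": 0, "F": 20}
regular, optimistic = choose_unchoked(rates, regular_slots=3)
print(regular, optimistic)
```

In this model a peer that uploads nothing (C or E) is served only when it happens to win the optimistic slot, which is why free-riding yields only a fraction of the full download rate.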

• Pros:
  – Works reasonably well in practice
  – Gives peers incentive to share resources; avoids freeloaders
• Cons:
  – Pareto Efficiency is a relatively weak condition
  – Central tracker server needed to bootstrap swarm (is this really necessary?)

P2P: Summary

• Many different styles; remember pros and cons of each
  – centralized, flooding, swarming
• Lessons learned:
  – Single points of failure are very bad
  – Flooding messages to everyone is bad
  – Underlying network topology is important
  – Not all nodes are equal
  – Need incentives to discourage freeloading
  – Privacy and security are important

Demo of using BitTorrent client against a .torrent file on http://www.legaltorrents.com/

and

Video of Interview with BitTorrent’s Creator Bram Cohen http://24x7.com/blog/vlogs/bram_vlog.mov

Extra Slides

Only for Pirates and sharing home video?

• Legal downloads tripled in the first half of 2005, reaching 186 million.
• In May 2006, Warner Brothers announced it would use BitTorrent to distribute pay-for content: $1 per TV show, $4 to $5 per full-length movie.
• Claims “[Users] will be prevented from copying and distributing files they purchase through two mechanisms: one that requires them to enter a password before watching a file, and another that allows the file to be viewed only on the computer to which it was downloaded.”

Allowing p2p a liability for ISPs?

From TMCnet News: Unauthorized p2p file sharing banned in Spain. “It's a criminal offense for ISPs to facilitate unauthorized downloading.”

“Spain's telco giant Telefonica reports 90% of usage on its broadband lines is Internet traffic, up from 15% five years ago. Of that 90%, a massive 71% is P2P traffic.”

What can ISPs do?

• TCP/UDP port filtering?
• Deep Packet Inspection (DPI): From lightreading.com: “Network operators worldwide spent US$96.8m on DPI in 2005, but the sector is poised to grow by more than 75 per cent this year, to about US$170m, and top US$586m in 2010.”

KaZaA: Usage Patterns

• KaZaA is more than one workload!
  – Many files < 10MB (e.g., Audio Files)
  – Many files > 100MB (e.g., Movies)

from Gummadi et al., SOSP 2003

KaZaA: Usage Patterns (2)

• What we saw:
  – A few big files consume most of the bandwidth
  – Many files are fetched once per client but still very popular
• Solution?
  – Caching!

from Gummadi et al., SOSP 2003