Peer-to-Peer
15-441

Scaling Problem

• Millions of clients → server and network meltdown

P2P System

Why p2p?

• Scaling: create a system whose capacity grows with the # of clients - automatically!
• Self-managing
  – This aspect is attractive for corporate/datacenter needs
  – e.g., Amazon's ~100,000 machines, Google's 300k+
• Harness lots of "spare" capacity at end-hosts
• Eliminate centralization
  – Robust to failures, etc.
  – Robust to censorship, politics & legislation??
• Leverage the resources of client machines (peers)
  – Computation, storage, bandwidth

Today's Goal

• p2p is hot.
• There are tons and tons of instances
• But that's not the point
• Identify fundamental techniques useful in p2p settings
• Understand the challenges
• Look at the (current!) boundaries of where p2p is particularly useful

Outline

• p2p file sharing techniques
  – Downloading: whole-file vs. chunks
  – Searching
    • Centralized index (Napster, etc.)
    • Flooding (Gnutella, etc.)
    • Smarter flooding (KaZaA, …)
    • Routing (Freenet, DHTs, etc.)
• Uses of p2p - what works well, what doesn't?
  – servers vs. arbitrary nodes
  – Hard state (backups!) vs. soft-state (caches)
• Challenges

Searching & Fetching

Human: "I want to watch that great 80s cult classic 'Better Off Dead'"

1. Search: "better off dead" -> better_off_dead.mov or -> 0x539fba83ajdeadbeef
2. Locate sources of better_off_dead.mov
3. Download the file from them

Searching

[Diagram: a Publisher inserts (Key="title", Value=MP3 data…) at one of nodes N1–N6; a Client elsewhere on the Internet issues Lookup("title")]

Search Approaches

• Centralized
• Flooding
• A hybrid: flooding between "supernodes"
• Structured

Different types of searches

• Needles vs. haystacks
  – Searching for top 40, or an obscure punk track from 1981 that nobody's heard of?
• Search expressiveness
  – Whole word? Regular expressions? File names? Attributes? Whole-text search?
  – (e.g., p2p gnutella or p2p google?)


Framework

• Common primitives:
  – Join: how do I begin participating?
  – Publish: how do I advertise my file?
  – Search: how do I find a file?
  – Fetch: how do I retrieve a file?

Centralized

• Centralized database:
  – Join: on startup, client contacts the central server
  – Publish: client reports its list of files to the central server
  – Search: query the server => returns node(s) that store the requested file

Napster Example: Publish

[Diagram: the peer at 123.2.21.23 sends Publish "I have X, Y, and Z!" to the central server, which records insert(X, 123.2.21.23), ...]

Napster Example: Search

[Diagram: a client sends Query "Where is file A?" to the central server; the server's search(A) returns 123.2.0.18 in its Reply, and the client then Fetches the file directly from 123.2.0.18]
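To make the centralized scheme concrete, here is a minimal sketch of a Napster-style index server (illustrative only, not Napster's actual protocol); the class and method names are invented for this example.

# Minimal sketch of a centralized index: the server keeps a map from
# file name to the set of peers that advertise it.

class CentralIndex:
    def __init__(self):
        self.index = {}                       # file name -> set of peer addresses

    def join(self, peer_addr):
        # Join: nothing to store until the peer publishes its file list.
        return "ok"

    def publish(self, peer_addr, files):
        # Publish: peer reports the files it holds.
        for name in files:
            self.index.setdefault(name, set()).add(peer_addr)

    def search(self, name):
        # Search: a single dictionary lookup returns candidate peers;
        # the actual fetch then happens peer-to-peer.
        return sorted(self.index.get(name, set()))

# Example use:
server = CentralIndex()
server.publish("123.2.21.23", ["X", "Y", "Z"])
print(server.search("X"))                     # ['123.2.21.23']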

Napster: Discussion

• Pros:
  – Simple
  – Search scope is O(1) for even complex searches (one index, etc.)
  – Controllable (pro or con?)
• Cons:
  – Server maintains O(N) state
  – Server does all processing
  – Single point of failure
    • Technical failures + legal (Napster was shut down)

Query Flooding

• Join: must join a flooding network
  – Usually, establish peering with a few existing nodes
• Publish: no need, just reply
• Search: ask neighbors, who ask their neighbors, and so on... when/if found, reply to sender.
  – TTL limits propagation

Example: Gnutella

[Diagram: a Query "Where is file A?" floods through the overlay; the nodes that have file A send a Reply back along the query path]

Flooding: Discussion

• Pros:
  – Fully de-centralized
  – Search cost distributed
  – Processing @ each node permits powerful search semantics
• Cons:
  – Search scope is O(N)
  – Search time is O(???)
  – Nodes leave often, network unstable
• TTL-limited search works well for haystacks.
  – For scalability, it does NOT search every node. May have to re-issue the query later
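Below is a minimal sketch of TTL-limited query flooding over a toy in-memory overlay; the FloodNode class and its fields are invented for illustration.

# Sketch of TTL-limited query flooding (Gnutella-style) over an in-memory overlay.

class FloodNode:
    def __init__(self, name, files):
        self.name = name
        self.files = set(files)
        self.neighbors = []            # peering links established at join time
        self.seen = set()              # query ids already handled (avoid re-flooding)

    def search(self, query_id, filename, ttl, reply_to):
        if query_id in self.seen:
            return
        self.seen.add(query_id)
        if filename in self.files:
            reply_to.append(self.name)            # "when/if found, reply to sender"
        if ttl > 0:
            for n in self.neighbors:              # ask neighbors, who ask their neighbors...
                n.search(query_id, filename, ttl - 1, reply_to)

# Example: three nodes in a line; a TTL of 1 does not reach the node with the file.
a, b, c = FloodNode("A", []), FloodNode("B", []), FloodNode("C", ["fileA"])
a.neighbors, b.neighbors, c.neighbors = [b], [a, c], [b]
hits = []
a.search("q1", "fileA", ttl=1, reply_to=hits)
print(hits)    # [] -- TTL too small; re-issue with a larger TTL
hits = []
a.search("q2", "fileA", ttl=2, reply_to=hits)
print(hits)    # ['C']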

Supernode Flooding

• Join: on startup, client contacts a "supernode" ... may at some point become one itself
• Publish: send list of files to supernode
• Search: send query to supernode; supernodes flood the query amongst themselves.
  – Supernode network just like prior flooding net

Supernode Network Design

[Diagram: ordinary peers attach to "Super Nodes", which form a flooding overlay among themselves]

Supernode: File Insert

[Diagram: the peer at 123.2.21.23 sends Publish "I have X!" to its supernode, which records insert(X, 123.2.21.23)]

Supernode: File Search

[Diagram: a client sends Query "Where is file A?" to its supernode; the query floods among supernodes, search(A) resolves to 123.2.22.50 and 123.2.0.18, and Replies flow back to the client]
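A small sketch of the two-tier idea, assuming each supernode indexes its leaf peers and floods queries only among supernodes; all names here are illustrative.

# Sketch of supernode (two-tier) flooding: leaves publish to a supernode,
# supernodes index their leaves and flood queries among themselves.

class Supernode:
    def __init__(self, name):
        self.name = name
        self.index = {}                # filename -> set of leaf peer addresses
        self.peers = []                # other supernodes (the flooding network)

    def publish(self, leaf_addr, files):
        for f in files:
            self.index.setdefault(f, set()).add(leaf_addr)

    def search(self, filename, ttl=3, seen=None):
        seen = seen if seen is not None else set()
        if self.name in seen:
            return set()
        seen.add(self.name)
        hits = set(self.index.get(filename, set()))   # answer from the local leaf index
        if ttl > 0:
            for sn in self.peers:                     # flood only among supernodes
                hits |= sn.search(filename, ttl - 1, seen)
        return hits

# Example: two supernodes, one leaf publishing file X.
sn1, sn2 = Supernode("SN1"), Supernode("SN2")
sn1.peers, sn2.peers = [sn2], [sn1]
sn2.publish("123.2.21.23", ["X"])
print(sn1.search("X"))    # {'123.2.21.23'}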

Supernode: Which nodes?

• Often, bias towards nodes with good:
  – Bandwidth
  – Computational resources
  – Availability!

Stability and Superpeers

• Why superpeers?
  – Query consolidation
    • Many connected nodes may have only a few files
    • Propagating a query to a sub-node would take more b/w than answering it yourself
  – Caching effect
    • Requires network stability
• Superpeer selection is time-based
  – How long you've been on is a good predictor of how long you'll be around.

Superpeer results

• Basically, "just better" than flooding to all
• Gets an order of magnitude or two better scaling
• But still fundamentally: O(search) * O(per-node storage) = O(N)
  – Central: O(1) search, O(N) storage
  – Flood: O(N) search, O(1) storage
  – Superpeer: can trade between the two

Structured Search: Distributed Hash Tables

• Academic answer to p2p
• Goals
  – Guaranteed lookup success
  – Provable bounds on search time
  – Provable scalability
• Makes some things harder
  – Fuzzy queries / full-text search / etc.
• Read-write, not read-only
• Hot topic in networking since introduction in ~2000/2001

Searching Wrap-Up

Type         Search cost   Per-node storage   Fuzzy search?
Central      O(1)          O(N)               Yes
Flood        ~O(N)         O(1)               Yes
Super        < O(N)        > O(1)             Yes
Structured   O(log N)      O(log N)           Not really

DHT: Overview

• Abstraction: a distributed "hash-table" (DHT) data structure:
  – put(id, item);
  – item = get(id);
• Implementation: nodes in the system form a distributed data structure
  – Can be Ring, Tree, Hypercube, Skip List, Butterfly Network, ...

DHT: Overview (2)

• Structured overlay routing:
  – Join: on startup, contact a "bootstrap" node and integrate yourself into the distributed data structure; get a node id
  – Publish: route the publication for a file id toward a close node id along the data structure
  – Search: route a query for a file id toward a close node id. The data structure guarantees that the query will meet the publication.
  – Important difference: get(key) is for an exact match on key!
    • search("spars") will not find file("briney spars")
    • We can exploit this to be more efficient

DHT: Example - Chord

• Associate to each node and file a unique id in a one-dimensional space (a ring)
  – E.g., pick from the range [0...2^m]
  – Usually the hash of the file or of the IP address
• Properties:
  – Routing table size is O(log N), where N is the total number of nodes
  – Guarantees that a file is found in O(log N) hops
• From MIT in 2001

DHT: Chord Basic Lookup

[Diagram: on a ring of nodes N10, N32, N60, N90, N105, N120, node N105 asks "Where is key 80?"; the lookup proceeds around the ring until the answer "N90 has K80" comes back]

DHT: Consistent Hashing

[Diagram: circular ID space with nodes N10, N32, N90, N105 and keys K5, K20, K80; K80 is stored at N90]

• A key is stored at its successor: the node with the next higher ID
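A minimal sketch of the successor rule, assuming ids are derived from SHA-1 and truncated to a small m-bit ring; chord_id and successor are invented helpers, not Chord's real API.

# Sketch of consistent hashing: ids live on a ring [0, 2^m), and a key is stored
# at its successor, i.e. the first node clockwise at or after the key's id.
import hashlib
from bisect import bisect_left

M = 16                                    # small id space for the example

def chord_id(name):
    # Hash a node address or file name onto the ring.
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** M)

def successor(node_ids, key_id):
    # node_ids must be sorted; wrap around to the first node if key_id is past the last one.
    i = bisect_left(node_ids, key_id)
    return node_ids[i % len(node_ids)]

nodes = sorted(chord_id(addr) for addr in ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"])
key = chord_id("better_off_dead.mov")
print(f"key {key} is stored at node {successor(nodes, key)}")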

DHT: Chord "Finger Table"

[Diagram: node N80's fingers span 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, 1/128 of the ring]

• Entry i in the finger table of node n is the first node that succeeds or equals n + 2^i
• In other words, the ith finger points 1/2^(n-i) of the way around the ring

Node Join

• Compute ID
• Use an existing node to route to that ID in the ring.
  – Finds s = successor(id)
• Ask s for its predecessor, p
• Splice self into the ring just like a linked list
  – p->successor = me
  – me->successor = s
  – me->predecessor = p
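A toy sketch of the join splice above on an in-memory ring; it ignores Chord's concurrent-join stabilization and finger-table maintenance, and the names are illustrative.

# Sketch of the Chord-style join "splice": the new node finds its successor s,
# asks s for its predecessor p, then links itself in like a doubly linked list.

class RingNode:
    def __init__(self, node_id):
        self.id = node_id
        self.successor = self
        self.predecessor = self

    def find_successor(self, key_id):
        # Toy linear walk around the ring (real Chord uses fingers for O(log N) hops).
        node = self
        while not in_interval(key_id, node.predecessor.id, node.id):
            node = node.successor
        return node

    def join(self, bootstrap):
        s = bootstrap.find_successor(self.id)   # route to my own id
        p = s.predecessor                       # ask s for its predecessor
        p.successor = self                      # p->successor = me
        self.successor = s                      # me->successor = s
        self.predecessor = p                    # me->predecessor = p
        s.predecessor = self

def in_interval(x, a, b):
    # True if x is in the half-open ring interval (a, b].
    if a < b:
        return a < x <= b
    return x > a or x <= b                      # the interval wraps around zero

# Example: build a tiny ring 10 -> 32 -> 80 and look up key 45.
n10 = RingNode(10)
n32 = RingNode(32); n32.join(n10)
n80 = RingNode(80); n80.join(n10)
print(n10.find_successor(45).id)                # 80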

DHT: Chord Join

• Assume an identifier space [0..8]
• Node n1 joins; then node n2 joins

[Diagram: ring positions 0–7; each joining node builds a successor table with entries (i, id+2^i, succ) for i = 0, 1, 2]

DHT: Chord Join (continued)

• Nodes n0 and n6 join
• Nodes: n1, n2, n0, n6
• Items: f7, f2

[Diagram: successor tables after all joins; items f7 and f2 are stored at their successor nodes on the ring]

DHT: Chord Routing

• Upon receiving a query for item id, a node:
  – Checks whether it stores the item locally
  – If not, forwards the query to the largest node in its successor table that does not exceed id

[Diagram: query(7) is routed around the example ring using each node's successor table until it reaches the node storing item 7]

DHT: Chord Summary

• Routing table size?
  – Log N fingers
• Routing time?
  – Each hop is expected to halve the distance to the desired id => expect O(log N) hops.
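A simplified single-process sketch of this greedy forwarding rule, with precomputed finger tables standing in for real Chord routing state; the node ids and helpers are invented for the example.

# Sketch of Chord's greedy finger-table routing: at each hop, jump to the
# farthest known node that does not overshoot the target id.

M = 6                                     # 6-bit identifiers: ring positions 0..63
RING = 2 ** M

def in_half(x, a, b):
    # x in ring interval (a, b]
    return (a < x <= b) if a < b else (x > a or x <= b)

def in_open(x, a, b):
    # x in ring interval (a, b), exclusive
    return (a < x < b) if a < b else (x > a or x < b)

def successor(x, ring):
    # First node clockwise at or after position x.
    x %= RING
    return next((n for n in ring if n >= x), ring[0])

ring = sorted([1, 8, 14, 21, 32, 38, 42, 48, 51, 56])
succ_of = {n: successor(n + 1, ring) for n in ring}                            # immediate successors
fingers = {n: [successor(n + 2 ** i, ring) for i in range(M)] for n in ring}   # finger tables

def lookup(start, key, hops=0):
    # Done when the key falls between the current node and its immediate successor.
    if in_half(key, start, succ_of[start]):
        return succ_of[start], hops
    # Otherwise jump via the farthest finger that does not pass the key.
    nxt = next(f for f in reversed(fingers[start]) if in_open(f, start, key))
    return lookup(nxt, key, hops + 1)

print(lookup(1, 54))    # (56, 3): key 54 is owned by node 56, reached in 3 hops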

DHT: Discussion

• Pros:
  – Guaranteed lookup
  – O(log N) per-node state and search scope
• Cons:
  – This line used to say "not used." But: now being used in a few apps, including BitTorrent.
  – Supporting non-exact-match search is (quite!) hard

The limits of search: A peer-to-peer Google?

• Complex intersection queries ("the" + "who")
  – Billions of hits for each term alone
• Sophisticated ranking
  – Must compare many results before returning a subset to the user
• Very, very hard for a DHT / p2p system
  – Need high inter-node bandwidth
  – (This is exactly what Google does - massive clusters)
• But maybe many file sharing queries are okay...

Fetching Data

• Once we know which node(s) have the data we want...
• Option 1: fetch from a single peer
  – Problem: have to fetch from a peer who has the whole file.
    • Peers aren't useful sources until they have downloaded the whole file
    • At which point they probably log off. :)
  – How can we fix this?

Chunk Fetching

• More than one node may have the file.
• How to tell?
  – Must be able to distinguish identical files
  – Not necessarily the same filename
  – Same filename not necessarily the same file...
  – Use a hash of the file
    • Common: MD5, SHA-1, etc.
• How to fetch?
  – Get bytes [0..8000] from A, [8001...16000] from B
  – Alternative: erasure codes
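A minimal sketch of both ideas: identify a file by its content hash, then split its byte range across several peers; the chunk size and peer names are made up for illustration.

# Sketch: identify a file by its content hash (so identical files match regardless
# of filename), then split its byte range into chunks assigned to different peers.
import hashlib

def file_fingerprint(path):
    # SHA-1 over the contents; two files with the same digest are treated as identical.
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 16), b""):
            h.update(block)
    return h.hexdigest()

def plan_chunk_fetch(file_size, peers, chunk_size=8000):
    # Round-robin byte ranges across peers, e.g. [0..7999] from A, [8000..15999] from B, ...
    plan = []
    for i, start in enumerate(range(0, file_size, chunk_size)):
        end = min(start + chunk_size, file_size) - 1
        plan.append((peers[i % len(peers)], start, end))
    return plan

print(plan_chunk_fetch(20000, ["A", "B"]))
# [('A', 0, 7999), ('B', 8000, 15999), ('A', 16000, 19999)]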

BitTorrent: Overview

• Swarming:
  – Join: contact a centralized "tracker" server, get a list of peers.
  – Publish: run a tracker server.
  – Search: out-of-band. E.g., use Google to find a tracker for the file you want.
  – Fetch: download chunks of the file from your peers. Upload chunks you have to them.
• Big differences from Napster:
  – Chunk-based downloading (sound familiar? :)
  – "few large files" focus
  – Anti-freeloading mechanisms

BitTorrent

• Periodically get a list of peers from the tracker
• More often:
  – Ask each peer what chunks it has
    • (Or have them update you)
  – Request chunks from several peers at a time
• Peers will start downloading from you
• BT has some machinery to try to bias towards helping those who help you
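A sketch of the "ask peers what chunks they have, then request from several at once" step; preferring the rarest chunk first is a common heuristic in real clients, but the data structures here are illustrative.

# Sketch: collect per-peer chunk availability, then pick which chunk to request
# from which peer, preferring the rarest missing chunk first.
from collections import Counter

def choose_requests(have, peer_chunks, max_outstanding=4):
    """have: set of chunk indices we already hold.
       peer_chunks: dict peer -> set of chunk indices that peer advertises."""
    counts = Counter(c for chunks in peer_chunks.values() for c in chunks if c not in have)
    requests = []
    for chunk, _ in sorted(counts.items(), key=lambda kv: kv[1]):    # rarest first
        # Pick any peer that has this chunk (real clients also weigh peer speed/choking).
        peer = next(p for p, chunks in peer_chunks.items() if chunk in chunks)
        requests.append((peer, chunk))
        if len(requests) == max_outstanding:
            break
    return requests

peers = {"peerA": {0, 1, 2}, "peerB": {2, 3}, "peerC": {1, 2, 3}}
print(choose_requests(have={1}, peer_chunks=peers))
# [('peerA', 0), ('peerB', 3), ('peerA', 2)]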

BitTorrent: Publish/Join

[Diagram: a new peer contacts the Tracker to announce itself and obtain a list of peers in the swarm]

BitTorrent: Fetch

[Diagram: peers in the swarm exchange chunks directly with one another]

BitTorrent: Summary

• Pros:
  – Works reasonably well in practice
  – Gives peers an incentive to share resources; avoids freeloaders
• Cons:
  – Central tracker server needed to bootstrap the swarm
  – (The tracker is a design choice, not a requirement, as you know from your projects. Modern BitTorrent can also use a DHT to locate peers. But it still needs a "search" mechanism)

Writable, persistent p2p

• Do you trust your data to 100,000 monkeys?
• Node availability hurts
  – Ex: store 5 copies of data on different nodes
  – When someone goes away, you must replicate the data they held
  – Hard drives are *huge*, but cable modem upload bandwidth is tiny - perhaps 10 GBytes/day
  – Takes many days to upload the contents of a 200GB hard drive. Very expensive leave/replication situation!


What's out there?

              Central       Flood          Super-node flood             Route
Whole File    Napster       Gnutella                                    Freenet
Chunk Based   BitTorrent    eDonkey 2000   KaZaA (bytes, not chunks)    DHTs

P2P: Summary

• Many different styles; remember the pros and cons of each
  – centralized, flooding, swarming, unstructured and structured routing
• Lessons learned:
  – Single points of failure are bad
  – Flooding messages to everyone is bad
  – Underlying network topology is important
  – Not all nodes are equal
  – Need incentives to discourage freeloading
  – Privacy and security are important
  – Structure can provide theoretical bounds and guarantees

Extra Slides

KaZaA: Usage Patterns

• KaZaA is more than one workload!
  – Many files < 10MB (e.g., audio files)
  – Many files > 100MB (e.g., movies)

from Gummadi et al., SOSP 2003

KaZaA: Usage Patterns (2)

• KaZaA is not Zipf!
  – File sharing: "request-once"
  – Web: "request-repeatedly"

from Gummadi et al., SOSP 2003

KaZaA: Usage Patterns (3)

• What we saw:
  – A few big files consume most of the bandwidth
  – Many files are fetched only once per client but are still very popular
• Solution?
  – Caching!

from Gummadi et al., SOSP 2003

Freenet: History

• In 1999, I. Clarke started the Freenet project
• Basic idea:
  – Employ Internet-like routing on the overlay network to publish and locate files
• Additional goals:
  – Provide anonymity and security
  – Make censorship difficult

Freenet: Overview

• Routed queries:
  – Join: on startup, client contacts a few other nodes it knows about; gets a unique node id
  – Publish: route the file contents toward the file id. The file is stored at the node with the id closest to the file id
  – Search: route a query for the file id toward the closest node id
  – Fetch: when the query reaches a node containing the file id, it returns the file to the sender

Freenet: Routing Tables

• id – file identifier (e.g., hash of the file)
• next_hop – another node that stores the file id
• file – file identified by id, if stored on the local node
• Forwarding of a query for file id:
  – If file id is stored locally, then stop
    • Forward the data back to the upstream requestor
  – If not, search for the "closest" id in the table, and forward the message to the corresponding next_hop
  – If the data is not found, failure is reported back
    • The requestor then tries the next closest match in its routing table

Freenet: Routing

[Diagram: query(10) is forwarded n1 → n2 → ... along next_hop pointers, each node consulting its (id, next_hop, file) table, until the node holding the file is reached]
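A minimal sketch of this forwarding rule, assuming each node holds a small (id -> next_hop) table and a local file store; the closeness metric and backtracking are simplified and the names are invented.

# Sketch of Freenet-style routing: forward a query for a file id to the neighbor
# listed under the closest known id, backtracking on failure (greatly simplified).

class FreenetNode:
    def __init__(self, name, files=None, table=None):
        self.name = name
        self.files = files or {}     # id -> file contents stored locally
        self.table = table or {}     # id -> neighboring FreenetNode (next_hop)

    def query(self, file_id, visited=None):
        visited = visited if visited is not None else set()
        visited.add(self.name)
        if file_id in self.files:                     # stored locally: stop, return data upstream
            return self.files[file_id]
        # Try next_hops in order of |known id - requested id| ("closest" first).
        for known_id in sorted(self.table, key=lambda k: abs(k - file_id)):
            nxt = self.table[known_id]
            if nxt.name in visited:
                continue
            result = nxt.query(file_id, visited)
            if result is not None:
                return result                         # data flows back along the reverse path
        return None                                   # failure reported back to the requestor

# Example: n1 -> n2 -> n3, where n3 stores file id 10.
n3 = FreenetNode("n3", files={10: b"...data..."})
n2 = FreenetNode("n2", table={9: n3})
n1 = FreenetNode("n1", table={12: n2})
print(n1.query(10) is not None)    # True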

Freenet: Routing Properties

• "Close" file ids tend to be stored on the same node
  – Why? Publications of similar file ids route toward the same place
• The network tends to be a "small world"
  – A small number of nodes have a large number of neighbors (i.e., ~ "six degrees of separation")
• Consequence:
  – Most queries only traverse a small number of hops to find the file

Freenet: Anonymity & Security

• Anonymity
  – Randomly modify the source of a packet as it traverses the network
  – Can use "mix-nets" or onion routing
• Security & censorship resistance
  – No constraints on how to choose ids for files => easy to have two files collide, creating "denial of service" (censorship)
  – Solution: have an id type that requires a private-key signature that is verified when updating the file
  – Cache files on the reverse path of queries/publications => an attempt to "replace" a file with bogus data will just cause the file to be replicated more!


Freenet: Discussion

• Pros:
  – Intelligent routing makes queries relatively short
  – Search scope is small (only nodes along the search path are involved); no flooding
  – Anonymity properties may give you "plausible deniability"
• Cons:
  – Still no provable guarantees!
  – Anonymity features make it hard to measure and debug

BitTorrent: Sharing Strategy

• Employ a "tit-for-tat" sharing strategy
  – A is downloading from some other people
    • A will let the fastest N of those download from him
  – Be optimistic: occasionally let freeloaders download
    • Otherwise no one would ever start!
    • Also allows you to discover better peers to download from when they reciprocate
• Goal: Pareto efficiency
  – Game theory: "No change can make anyone better off without making others worse off"
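A sketch of the unchoking decision described above: keep uploading to the N peers that upload to you fastest, plus one optimistic slot; the numbers and function are illustrative, not BitTorrent's actual choking algorithm.

# Sketch of a tit-for-tat unchoke decision: serve the N peers currently giving us
# the best download rates, plus one random "optimistic" slot for newcomers.
import random

def pick_unchoked(download_rate, n=3, optimistic_slots=1):
    """download_rate: dict peer -> observed bytes/sec that peer is sending us."""
    by_rate = sorted(download_rate, key=download_rate.get, reverse=True)
    unchoked = set(by_rate[:n])                       # reciprocate with the fastest N
    others = [p for p in download_rate if p not in unchoked]
    for _ in range(min(optimistic_slots, len(others))):
        unchoked.add(random.choice(others))           # give a freeloader/newcomer a chance
    return unchoked

rates = {"p1": 900, "p2": 40, "p3": 700, "p4": 0, "p5": 300}
print(pick_unchoked(rates))    # {'p1', 'p3', 'p5'} plus one optimistic peer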