Peer-to-Peer
15-441

Scaling Problem
• Millions of clients → server and network meltdown

P2P System
• Leverage the resources of client machines (peers)
  – Computation, storage, bandwidth

Why p2p?
• Scaling: Create a system whose capacity grows with the number of clients – automatically!
• Self-managing
  – This aspect is attractive for corporate/datacenter needs
  – e.g., Amazon's 100,000-ish machines, Google's 300k+
• Harness lots of "spare" capacity at end-hosts
• Eliminate centralization
  – Robust to failures, etc.
  – Robust to censorship, politics & legislation??
• Create apps/services without having huge resources

Today's Goal
• p2p is hot.
• There are tons and tons of instances
• But that's not the point
• Identify fundamental techniques useful in p2p settings
• Understand the challenges
• Look at the (current!) boundaries of where p2p is particularly useful

Outline
• p2p file sharing techniques
  – Downloading: whole-file vs. chunks
  – Searching
    • Centralized index (Napster, etc.)
    • Flooding (Gnutella, etc.)
    • Smarter flooding (KaZaA, …)
    • Routing (Freenet, DHTs, etc.)
• Uses of p2p – what works well, what doesn't?
  – Servers vs. arbitrary nodes
  – Hard state (backups!) vs. soft state (caches)
• Challenges

Searching & Fetching
• Human: "I want to watch that great 80s cult classic 'Better Off Dead'"
  1. Search: "better off dead" -> better_off_dead.mov, or -> 0x539fba83ajdeadbeef
  2. Locate sources of better_off_dead.mov
  3. Download the file from them

Searching
[Figure: a Publisher inserts Key="title", Value=MP3 data… at some node; a Client issues Lookup("title") across nodes N1–N6 over the Internet]

Search Approaches
• Centralized
• Flooding
• A hybrid: flooding between "Supernodes"
• Structured

Different types of searches
• Needles vs. haystacks
  – Searching for top 40, or an obscure punk track from 1981 that nobody's heard of?
• Search expressiveness
  – Whole word? Regular expressions? File names? Attributes? Whole-text search?
  – (e.g., p2p Gnutella or p2p Google?)

Framework
• Common primitives:
  – Join: how do I begin participating?
  – Publish: how do I advertise my file?
  – Search: how do I find a file?
  – Fetch: how do I retrieve a file?

Centralized
• Centralized database:
  – Join: on startup, a client contacts the central server
  – Publish: the client reports its list of files to the central server
  – Search: query the server => return node(s) that store the requested file
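The centralized approach maps naturally onto a small in-memory index. Below is a minimal Python sketch, assuming a hypothetical CentralIndex class standing in for the Napster-style server (names and structure are illustrative, not Napster's actual protocol); the server holds only metadata, while file transfer itself happens peer-to-peer.

```python
# Sketch of a centralized index: the server maps filenames to the peers
# that advertise them; clients fetch the actual bytes from each other.

class CentralIndex:
    def __init__(self):
        self.files = {}                      # filename -> set of peer addresses

    def publish(self, peer_addr, filenames):
        # Publish: a peer reports the list of files it is willing to serve.
        for name in filenames:
            self.files.setdefault(name, set()).add(peer_addr)

    def search(self, filename):
        # Search: an O(1) dictionary lookup returning node(s) holding the file.
        return self.files.get(filename, set())

index = CentralIndex()
index.publish("123.2.21.23", ["X", "Y", "Z"])
print(index.search("X"))                     # {'123.2.21.23'}; fetch is direct
```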

Napster Example: Publish
[Figure: the peer at 123.2.21.23 sends "Publish: I have X, Y, and Z!" to the central server, which records insert(X, 123.2.21.23), ...]

Napster: Search
[Figure: a client asks the server "Query: Where is file A?" (search(A)); the server replies with 123.2.0.18, and the client then fetches the file directly from 123.2.0.18]

Napster: Discussion
• Pros:
  – Simple
  – Search scope is O(1), even for complex searches (one index, etc.)
  – Controllable (pro or con?)
• Cons:
  – Server maintains O(N) state
  – Server does all processing
  – Single point of failure
    • Technical failures + legal (Napster was shut down)

Searching Wrap-Up (filled in as we go)

Type       | O(search) | Storage | Fuzzy?
Central    | O(1)      | O(N)    | Yes
Flood      |           |         |
Super      |           |         |
Structured |           |         |

Query Flooding
• Join: must join a flooding network
  – Usually, establish peering with a few existing nodes
• Publish: no need, just reply
• Search: ask neighbors, who ask their neighbors, and so on... when/if found, reply to the sender.
  – TTL limits propagation

Example: Gnutella
[Figure: the query "Where is file A?" floods from node to node; nodes holding file A send a Reply back along the query path ("I have file A.")]
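As a rough sketch of the flooding search just described, here is a toy in-memory overlay; the Node class, neighbor links, and TTL value are illustrative stand-ins rather than Gnutella's actual wire protocol.

```python
# TTL-limited query flooding over a toy overlay: a query spreads hop by hop
# until the TTL runs out, and every node holding the file reports its address.

class Node:
    def __init__(self, addr, files):
        self.addr = addr
        self.files = set(files)
        self.neighbors = []                  # peering links made at join time

    def query(self, filename, ttl, seen=None):
        seen = set() if seen is None else seen
        if self.addr in seen:                # don't re-process the same query
            return []
        seen.add(self.addr)
        hits = [self.addr] if filename in self.files else []
        if ttl > 0:                          # TTL limits propagation
            for n in self.neighbors:
                hits += n.query(filename, ttl - 1, seen)
        return hits

a, b, c = Node("A", ["fileA"]), Node("B", []), Node("C", ["fileA"])
a.neighbors, b.neighbors, c.neighbors = [b], [a, c], [b]
print(b.query("fileA", ttl=2))               # ['A', 'C']
```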

Flooding: Discussion
• Pros:
  – Fully de-centralized
  – Search cost distributed
  – Processing @ each node permits powerful search semantics
• Cons:
  – Search scope is O(N)
  – Search time is O(???)
  – Nodes leave often, network unstable

Flooding: Discussion 2
• Overlay topology can be important
  – Connections between peers != the underlying network links
  – Might cross the country (or the world) multiple times
• TTL-limited search works well for haystacks.
  – For scalability, it does NOT search every node. May have to re-issue the query later.

Supernode Flooding
• Join: on startup, a client contacts a "supernode" ... and may at some point become one itself
• Publish: send list of files to supernode
• Search: send query to supernode; supernodes flood the query amongst themselves.
  – Supernode network just like the prior flooding net

Supernode Network Design
[Figure: ordinary peers attach to "Super Nodes", which form a flooding network amongst themselves]

Supernode: File Insert
[Figure: the peer at 123.2.21.23 publishes "I have X!" to its supernode, which records insert(X, 123.2.21.23)]

Supernode: File Search
[Figure: a client sends search(A) "Where is file A?" to its supernode; the query floods among supernodes and replies come back (e.g., search(A) --> 123.2.0.18, search(A) --> 123.2.22.50)]

Supernode: Which nodes?
• Often, bias towards nodes with good:
  – Bandwidth
  – Computational resources
  – Availability!

Stability and Superpeers
• Why superpeers?
  – Query consolidation
    • Many connected nodes may have only a few files
    • Propagating a query to a sub-node would take more b/w than answering it yourself
  – Caching effect
    • Requires network stability
• Superpeer selection is time-based
  – How long you've been on is a good predictor of how long you'll be around.

Superpeer results
• Basically, "just better" than flooding to everyone
• Gets an order of magnitude or two better scaling
• But still fundamentally: O(search) * O(per-node storage) = O(N)
  – Central: O(1) search, O(N) storage
  – Flood: O(N) search, O(1) storage
  – Superpeer: can trade between the two

Structured Search: Distributed Hash Tables
• Academic answer to p2p
• Goals
  – Guaranteed lookup success
  – Provable bounds on search time
  – Provable scalability
• Makes some things harder
  – Fuzzy queries / full-text search / etc.
• Read-write, not read-only
• Hot topic in networking since introduction in ~2000/2001

DHT: Overview
• Abstraction: a distributed "hash-table" (DHT) data structure:
  – put(id, item);
  – item = get(id);
• Implementation: nodes in the system form a distributed data structure
  – Can be a Ring, Tree, Hypercube, Skip List, Butterfly Network, ...

DHT: Overview (2)
• Structured overlay routing:
  – Join: on startup, contact a "bootstrap" node and integrate yourself into the distributed data structure; get a node id
  – Publish: route the publication for file id toward a close node id along the data structure
  – Search: route a query for file id toward a close node id. The data structure guarantees that the query will meet the publication.
  – Important difference: get(key) is for an exact match on key!
    • search("spars") will not find file("briney spars")
    • We can exploit this to be more efficient

DHT: Example - Chord
• Associate to each node and file a unique id in a uni-dimensional space (a Ring)
  – E.g., pick from the range [0...2^m]
  – Usually the hash of the file or of the node's IP address
  – Chord: from MIT in 2001

DHT: Consistent Hashing
[Figure: circular ID space with nodes N32, N90, N105 and keys K5, K20, K80 (e.g., Key 5 → K5, Node 105 → N105)]
• A key is stored at its successor: the node with the next-higher ID

DHT: Chord Basic Lookup
[Figure: "Where is key 80?" is routed around the ring (N10 → N32 → N60 → N90 ...); the answer "N90 has K80" is returned]

Node Join
• Compute ID
• Use an existing node to route to that ID in the ring.
  – Finds s = successor(id)
• Ask s for its predecessor, p
• Splice self into the ring just like a linked list
  – p->successor = me
  – me->successor = s
  – me->predecessor = p
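A minimal sketch of the consistent-hashing rule and the linked-list-style join above, using a sorted list of node ids to stand in for the ring; ident(), successor(), and join() are hypothetical helpers, and real Chord additionally maintains finger tables and handles concurrent joins.

```python
# Sketch of consistent hashing on a ring: each key lives at its successor,
# the node with the next-higher id (wrapping around the id space).
import hashlib
from bisect import bisect_left, insort

M = 2 ** 16                                    # small id space for illustration

def ident(name):
    # Unique id: hash of the file name or of the node's IP address.
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % M

ring = sorted(ident(ip) for ip in ["10.0.0.1", "10.0.0.2", "10.0.0.3"])

def successor(key_id):
    # A key is stored at its successor: the first node id >= key_id (wrapping).
    i = bisect_left(ring, key_id)
    return ring[i % len(ring)]

def join(node_ip):
    # Splicing into the ring is a sorted insert in this model; real Chord
    # asks successor(id) for its predecessor and fixes the ring pointers.
    insort(ring, ident(node_ip))

k = ident("better_off_dead.mov")
print("key", k, "stored at node", successor(k))
join("10.0.0.4")                               # a new node joins the ring
print("key", k, "stored at node", successor(k))  # may move to the new node
```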

DHT: Chord Join (example)
• Assume an identifier space [0..8]
• Nodes n1, n2, n0, and n6 join in turn; items f7 and f2 are published
[Figures: the ring after each join, showing each node's successor table (entries i, id+2^i, succ) being updated and the items f7 and f2 stored at their successor nodes]

Basic search
• Forward from successor to successor...
• But that's O(N)!

DHT: Chord "Finger Table"
[Figure: node N80's fingers point 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the ring]
• Entry i in the finger table of node n is the first node that succeeds or equals n + 2^i
• In other words, the ith finger points 1/2^(n-i) of the way around the ring

DHT: Chord Routing
• Upon receiving a query for item id, a node:
  – Checks whether it stores the item locally
  – If not, forwards the query to the largest node in its finger table that does not exceed id
[Figure: example ring with query(7); each node's successor table (i, id+2^i, succ) and stored items are shown]

DHT: Chord Summary
• Routing table size?
  – Log N fingers
• Routing time?
  – Each hop is expected to halve the distance to the desired id => expect O(log N) hops.
• Resulting properties:
  – Routing table size is O(log N), where N is the total number of nodes
  – Guarantees that a file is found in O(log N) hops
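A sketch of that routing rule on a tiny 6-bit ring: each node keeps fingers at power-of-two offsets, and a lookup repeatedly forwards to the largest finger that does not pass the key, roughly halving the remaining distance per hop. The node ids and helper names are illustrative, and this simplified model omits Chord's stabilization machinery.

```python
# Sketch of Chord finger tables and O(log N) lookup on a tiny 6-bit ring.
M_BITS = 6
SIZE = 2 ** M_BITS
node_ids = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]    # sorted node ids

def ring_successor(i):
    # First node with id >= i, wrapping around the ring.
    return next((n for n in node_ids if n >= i), node_ids[0])

def fingers(n):
    # Entry i is the first node that succeeds or equals n + 2^i.
    return [ring_successor((n + 2 ** i) % SIZE) for i in range(M_BITS)]

def in_open(x, a, b):          # x in (a, b) on the ring
    return a < x < b if a < b else (x > a or x < b)

def in_open_closed(x, a, b):   # x in (a, b] on the ring
    return a < x <= b if a < b else (x > a or x <= b)

def lookup(n, key, hops=0):
    succ = node_ids[(node_ids.index(n) + 1) % len(node_ids)]
    if in_open_closed(key, n, succ):         # key lives at n's successor
        return succ, hops
    for f in reversed(fingers(n)):           # largest finger before the key
        if in_open(f, n, key):
            return lookup(f, key, hops + 1)
    return succ, hops

print(lookup(1, 54))   # (56, 3): key 54 is stored at node 56, ~log2(N) hops
```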

Searching Wrap-Up

Type       | O(search) | Storage  | Fuzzy?
Central    | O(1)      | O(N)     | Yes
Flood      | ~O(N)     | O(1)     | Yes
Super      | < O(N)    | > O(1)   | Yes
Structured | O(log N)  | O(log N) | Not really

DHT: Discussion
• Now being used in a few apps, including BitTorrent.
• Pros:
  – Guaranteed lookup
  – O(log N) per-node state and search scope
• Cons:
  – Supporting non-exact-match search is (quite!) hard. Let's examine...

Fetching Data
• Once we know which node(s) have the data we want...
• Option 1: fetch from a single peer
  – Problem: have to fetch from a peer who has the whole file.
    • Peers aren't useful sources until they have downloaded the whole file
    • At which point they probably log off. :)
  – How can we fix this?

Chunk Fetching
• More than one node may have the file.
• How to tell?
  – Must be able to distinguish identical files
  – Not necessarily the same filename
  – The same filename is not necessarily the same file...
  – Use a hash of the file
    • Common: MD5, SHA-1, etc.
• How to fetch?
  – Get bytes [0..8000] from A, [8001...16000] from B
  – Alternative: erasure codes
    • Gets into data-oriented networking (big in research now)
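A sketch of both ideas above: identify a file by a hash of its contents (so identical files match regardless of filename) and split the byte range across several peers. fetch_range() is a hypothetical stand-in for an actual network transfer.

```python
# Sketch: identify files by content hash and fetch byte ranges from peers.
import hashlib

def file_id(data):
    # Same bytes -> same id, regardless of filename (SHA-1, as on the slide).
    return hashlib.sha1(data).hexdigest()

def split_ranges(total_len, chunk_size):
    # e.g., [(0, 7999), (8000, 15999), ...] byte ranges, one per peer request
    return [(lo, min(lo + chunk_size, total_len) - 1)
            for lo in range(0, total_len, chunk_size)]

def download(peers, total_len, chunk_size, expected_id, fetch_range):
    chunks = []
    for i, (lo, hi) in enumerate(split_ranges(total_len, chunk_size)):
        peer = peers[i % len(peers)]          # spread ranges across peers
        chunks.append(fetch_range(peer, lo, hi))
    data = b"".join(chunks)
    if file_id(data) != expected_id:          # verify the reassembled file
        raise ValueError("hash mismatch: wrong or corrupted file")
    return data

blob = b"x" * 20000
fake_fetch = lambda peer, lo, hi: blob[lo:hi + 1]   # stands in for the network
print(len(download(["A", "B"], len(blob), 8000, file_id(blob), fake_fetch)))
```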

BitTorrent: Overview
• Swarming:
  – Join: contact a centralized "tracker" server, get a list of peers.
  – Publish: contact (or run) a tracker server.
  – Search: out-of-band. E.g., use Google to find a tracker for the file you want.
  – Fetch: download chunks of the file from your peers. Upload chunks you have to them.
• Big differences from Napster:
  – Chunk-based downloading (sound familiar? :)
  – "few large files" focus

BitTorrent
• Periodically get a list of peers from the tracker
• More often:
  – Ask each peer which chunks it has
    • (Or have them update you)
• Request chunks from several peers at a time
• Peers will start downloading from you
• BT has some machinery to try to bias toward peers that reciprocate (see "BitTorrent: Sharing Strategy" below)

BitTorrent: Publish/Join
[Figure: peers contact the Tracker to join the swarm]

BitTorrent: Fetch
[Figure: peers exchange chunks of the file directly with one another]
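A very rough model of the swarming loop just described: get peers from the tracker, learn which chunks each peer holds, and request missing chunks from several of them. All class and method names are illustrative; this is not BitTorrent's real wire protocol.

```python
# Toy model of swarming: ask peers for their chunk bitmaps, then pull
# whichever chunks we are still missing from whoever holds them.
import random

class Peer:
    def __init__(self, name, num_chunks, have=()):
        self.name = name
        self.num_chunks = num_chunks
        self.chunks = {i: f"data{i}" for i in have}   # chunk index -> bytes

    def bitmap(self):
        # Which chunk indexes this peer currently holds.
        return set(self.chunks)

    def request(self, i):
        return self.chunks.get(i)

def swarm_download(tracker_peers, me):
    # tracker_peers: the peer list we periodically refresh from the tracker.
    while len(me.chunks) < me.num_chunks:
        missing = set(range(me.num_chunks)) - me.bitmap()
        progress = False
        for i in missing:
            holders = [p for p in tracker_peers if i in p.bitmap()]
            if holders:                            # request from some holder
                me.chunks[i] = random.choice(holders).request(i)
                progress = True
        if not progress:                           # no one has what's missing
            break
    return sorted(me.bitmap())

seeds = [Peer("A", 4, have=[0, 1]), Peer("B", 4, have=[2, 3])]
print(swarm_download(seeds, Peer("me", 4)))        # [0, 1, 2, 3]
```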

BitTorrent: Summary
• Pros:
  – Works reasonably well in practice
  – Gives peers an incentive to share resources; avoids freeloaders
• Cons:
  – Central tracker server needed to bootstrap the swarm
  – (The tracker is a design choice, not a requirement, as you know from your projects. Modern BitTorrent can also use a DHT to locate peers. But the approach still needs a "search" mechanism.)

P2P Challenges
• Trust!
• Difficulty of doing "rich" search in p2p
• Lots of unreliable nodes
• Trust:
  – Freeloading (many filesharing systems)
  – Corrupting files (the RIAA, etc.)
  – Malice and cheating (p2p gaming)

The limits of search: A Peer-to-peer Google?
• Complex intersection queries ("the" + "who")
  – Billions of hits for each term alone
• Sophisticated ranking
  – Must compare many results before returning a subset to the user
• Very, very hard for a DHT / p2p system
  – Need high inter-node bandwidth
  – (This is exactly what Google does - massive clusters)
• But maybe many file sharing queries are okay...

Writable, persistent p2p
• Do you trust your data to 100,000 monkeys?
• Node availability hurts
  – Ex: store 5 copies of data on different nodes
  – When someone goes away, you must replicate the data they held
  – Hard drives are *huge*, but cable modem upload bandwidth is tiny - perhaps 10 GBytes/day
  – It takes many days to upload the contents of a 200GB hard drive. Very expensive leave/replication situation!

What's out there?

            | Central    | Flood                            | Super-node flood | Route
Whole File  | Napster    | Gnutella                         |                  | Freenet
Chunk Based | BitTorrent | eDonkey 2000 (bytes, not chunks) | KaZaA            | DHTs

P2P: Summary
• Many different styles; remember the pros and cons of each
  – centralized, flooding, swarming, unstructured and structured routing
• Lessons learned:
  – Single points of failure are bad
  – Flooding messages to everyone is bad
  – Underlying network topology is important
  – Not all nodes are equal
  – Need incentives to discourage freeloading
  – Privacy and security are important
  – Structure can provide theoretical bounds and guarantees

Extra Slides

KaZaA: Usage Patterns
• KaZaA is more than one workload!
  – Many files < 10MB (e.g., audio files)
  – Many files > 100MB (e.g., movies)

from Gummadi et al., SOSP 2003


KaZaA: Usage Patterns (2)
• KaZaA is not Zipf!
  – FileSharing: "Request-once"
  – Web: "Request-repeatedly"

KaZaA: Usage Patterns (3)
• What we saw:
  – A few big files consume most of the bandwidth
  – Many files are fetched once per client but are still very popular
• Solution?
  – Caching!

from Gummadi et al., SOSP 2003

Freenet: History
• In 1999, I. Clarke started the Freenet project
• Basic idea:
  – Employ Internet-like routing on the overlay network to publish and locate files
• Additional goals:
  – Provide anonymity and security
  – Make censorship difficult

Freenet: Overview
• Routed queries:
  – Join: on startup, a client contacts a few other nodes it knows about; gets a unique node id
  – Publish: route file contents toward the file id. The file is stored at the node with the id closest to the file id
  – Search: route a query for the file id toward the closest node id
  – Fetch: when a query reaches a node containing the file id, it returns the file to the sender


Freenet: Routing Tables
• id – file identifier (e.g., hash of the file)
• next_hop – another node that stores the file id
• file – file identified by id being stored on the local node
• Forwarding of a query for file id:
  – If file id is stored locally, then stop
    • Forward the data back to the upstream requestor
  – If not, search for the "closest" id in the table, and forward the message to the corresponding next_hop
  – If the data is not found, failure is reported back
    • The requestor then tries the next-closest match in its routing table

Freenet: Routing
[Figure: query(10) is routed through nodes n1–n6, each consulting its (id, next_hop, file) routing table; backtracking occurs when a branch fails]
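A sketch of that forwarding rule, with backtracking when a branch fails; the FNode class, its table layout, and the example ids are illustrative rather than Freenet's actual protocol (which also caches files along the reverse path).

```python
# Sketch of Freenet-style routing: forward a query toward the "closest" id
# in the local routing table; on failure, the requestor tries the next hop.

class FNode:
    def __init__(self, name):
        self.name = name
        self.files = {}      # id -> file contents stored locally
        self.table = {}      # id -> next_hop FNode believed to store it

    def query(self, file_id, visited=None):
        visited = visited or set()
        visited.add(self.name)
        if file_id in self.files:                 # stored locally: stop
            return self.files[file_id]
        # Try next hops in order of how close their table id is to file_id.
        for tid in sorted(self.table, key=lambda t: abs(t - file_id)):
            hop = self.table[tid]
            if hop.name in visited:
                continue
            result = hop.query(file_id, visited)
            if result is not None:
                return result                     # data flows back upstream
        return None                               # failure reported back

n1, n2, n3, n4 = FNode("n1"), FNode("n2"), FNode("n3"), FNode("n4")
n4.files[10] = "contents of f10"
n1.table = {9: n2, 14: n3}    # id 9 is closest to 10, so n1 tries n2 first
n2.table = {}                 # dead end: failure propagates back to n1
n3.table = {10: n4}           # n1 then tries n3, which forwards to n4
print(n1.query(10))           # "contents of f10"
```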

Freenet: Routing Properties
• "Close" file ids tend to be stored on the same node
  – Why? Publications of similar file ids route toward the same place
• The network tends to be a "small world"
  – A small number of nodes have a large number of neighbors (i.e., ~ "six degrees of separation")
• Consequence:
  – Most queries only traverse a small number of hops to find the file

Freenet: Anonymity & Security
• Anonymity
  – Randomly modify the source of a packet as it traverses the network
  – Can use "mix-nets" or onion routing
• Security & censorship resistance
  – No constraints on how to choose ids for files => easy to have files collide, creating "denial of service" (censorship)
  – Solution: have an id type that requires a private-key signature that is verified when updating the file
  – Cache files on the reverse path of queries/publications => an attempt to "replace" a file with bogus data will just cause the file to be replicated more!


Freenet: Discussion
• Pros:
  – Intelligent routing makes queries relatively short
  – Search scope is small (only nodes along the search path are involved); no flooding
  – Anonymity properties may give you "plausible deniability"
• Cons:
  – Still no provable guarantees!
  – Anonymity features make it hard to measure and debug

BitTorrent: Sharing Strategy
• Employ a "tit-for-tat" sharing strategy
  – A is downloading from some other people
    • A will let the fastest N of those download from him
  – Be optimistic: occasionally let freeloaders download
    • Otherwise no one would ever start!
    • Also allows you to discover better peers to download from when they reciprocate
• Goal: Pareto efficiency
  – Game theory: "No change can make anyone better off without making others worse off"
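A sketch of that selection step: unchoke the N peers currently giving you the best download rates, plus one random "optimistic" slot so newcomers and potentially better peers get a chance. The function name, parameters, and rate numbers are illustrative.

```python
# Sketch of tit-for-tat peer selection: serve the N fastest uploaders to us,
# plus one optimistic slot chosen at random so new peers can get started.
import random

def choose_unchoked(download_rates, n=4):
    # download_rates: peer -> observed rate (bytes/s) that peer gives us.
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    unchoked = set(ranked[:n])                    # reciprocate with fastest N
    others = [p for p in download_rates if p not in unchoked]
    if others:
        unchoked.add(random.choice(others))       # optimistic unchoke
    return unchoked

rates = {"A": 900, "B": 750, "C": 20, "D": 0, "E": 400, "F": 5}
print(choose_unchoked(rates, n=2))   # e.g., {'A', 'B', 'D'}
```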