
Peer-to-Peer
15-441

Scaling Problem
• Millions of clients → server and network meltdown

P2P System
[Figure: peers exchange data directly with one another instead of through a central server]

Why p2p?
• Scaling: create a system whose capacity grows with the number of clients - automatically!
• Self-managing
– This aspect is attractive for corporate/datacenter needs
– e.g., Amazon's 100,000-ish machines, Google's 300k+
• Harness lots of "spare" capacity at end-hosts
• Eliminate centralization
– Robust to failures, etc.
– Robust to censorship, politics & legislation??
• Leverage the resources of client machines (peers)
– Computation, storage, bandwidth

Today's Goal
• p2p is hot.
• There are tons and tons of instances
• But that's not the point
• Identify fundamental techniques useful in p2p settings
• Understand the challenges
– Servers vs. arbitrary nodes
– Hard state (backups!) vs. soft state (caches)
• Look at the (current!) boundaries of where p2p is particularly useful

Outline
• p2p file-sharing techniques
– Downloading: whole-file vs. chunks
– Searching
• Centralized index (Napster, etc.)
• Flooding (Gnutella, etc.)
• Smarter flooding (KaZaA, …)
• Routing (Freenet, etc.)
• Uses of p2p - what works well, what doesn't?
• Challenges

Searching & Fetching
Human: "I want to watch that great 80s cult classic 'Better Off Dead'"
1. Search: "better off dead" -> better_off_dead.mov (or -> 0x539fba83ajdeadbeef)
2. Locate sources of better_off_dead.mov
3. Download the file from them

Searching
[Figure: nodes N1...N6 connected through the Internet; a Publisher inserts Key="title", Value=MP3 data...; a Client issues Lookup("title")]

Search Approaches
• Centralized
• Flooding
• A hybrid: flooding between "supernodes"
• Structured

Different types of searches
• Needles vs. haystacks
– Searching for top 40, or an obscure punk track from 1981 that nobody's heard of?
• Search expressiveness
– Whole word? Regular expressions? File names? Attributes? Whole-text search?
• (e.g., p2p gnutella or p2p google?)

Framework
• Common primitives:
– Join: how do I begin participating?
– Publish: how do I advertise my file?
– Search: how do I find a file?
– Fetch: how do I retrieve a file?

Centralized
• Centralized database:
– Join: on startup, a client contacts the central server
– Publish: the client reports its list of files to the central server
– Search: query the server => returns the node(s) that store the requested file

Napster Example: Publish
[Figure: peer 123.2.21.23 announces "I have X, Y, and Z!" by sending insert(X, 123.2.21.23) to the central index]

Napster: Search
[Figure: a client asks the index "Where is file A?"; the reply is search(A) --> 123.2.0.18; the client then fetches the file directly from 123.2.0.18]

Napster: Discussion
• Pros:
– Simple
– Search scope is O(1), even for complex searches (one index, etc.)
– Controllable (pro or con?)
• Cons:
– Server maintains O(N) state
– Server does all processing
– Single point of failure
• Technical failures + legal (Napster was shut down)

Query Flooding
• Join: must join a flooding network
– Usually, establish peering with a few existing nodes
• Publish: no need; just reply
• Search: ask neighbors, who ask their neighbors, and so on... when/if found, reply to the sender
– TTL limits propagation

Example: Gnutella
[Figure: a query "Where is file A?" floods hop-by-hop through the overlay; nodes holding file A ("I have file A.") send replies back along the query path]

Flooding: Discussion
• Pros:
– Fully decentralized
– Search cost distributed
– Processing @ each node permits powerful search semantics
• Cons:
– Search scope is O(N)
– Search time is O(???)
– Nodes leave often; the network is unstable
• TTL-limited search works well for haystacks
– For scalability, it does NOT search every node; you may have to re-issue the query later
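The two baseline schemes above are small enough to sketch in code. First, the Napster-style central index: Join/Publish/Search all reduce to one server-side table mapping file names to hosts. A minimal Python sketch follows; the class and method names are illustrative, not from any real Napster implementation.

    class CentralIndex:
        """Napster-style central database: peers publish their file lists;
        a search is a single table lookup, so search scope is O(1), but the
        server holds O(N) state and is a single point of failure."""
        def __init__(self):
            self.index = {}                  # filename -> set of host addresses

        def publish(self, host, files):
            """Join/Publish: on startup a client reports the files it stores."""
            for f in files:
                self.index.setdefault(f, set()).add(host)

        def search(self, filename):
            """Search: the server returns the node(s) that store the file."""
            return self.index.get(filename, set())

    index = CentralIndex()
    index.publish("123.2.21.23", ["X", "Y", "Z"])   # "I have X, Y, and Z!"
    print(index.search("X"))   # {'123.2.21.23'}; the fetch then happens peer-to-peer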
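Second, Gnutella-style query flooding. This sketch (again with hypothetical names) shows the two mechanisms the slides call out: a TTL that limits propagation, and duplicate suppression so a query arriving over two paths is not re-flooded.

    import uuid

    class Peer:
        """Gnutella-style node: no publish step; queries flood neighbor-to-neighbor."""
        def __init__(self, name):
            self.name = name
            self.files = set()       # files stored locally
            self.neighbors = []      # peering links established at join time
            self.seen = set()        # query ids already handled

        def search(self, filename, ttl=4):
            """Originate a query; returns the names of peers holding the file."""
            return self._handle(uuid.uuid4().hex, filename, ttl)

        def _handle(self, qid, filename, ttl):
            if qid in self.seen:                 # duplicate: drop the re-flood
                return []
            self.seen.add(qid)
            hits = [self.name] if filename in self.files else []
            if ttl > 0:                          # TTL limits how far the flood spreads
                for peer in self.neighbors:
                    hits += peer._handle(qid, filename, ttl - 1)
            return hits

    # Three peers in a line (a -- b -- c); only c holds file "A"
    a, b, c = Peer("a"), Peer("b"), Peer("c")
    a.neighbors, b.neighbors, c.neighbors = [b], [a, c], [b]
    c.files.add("A")
    print(a.search("A"))          # ['c'] -- found within the TTL
    print(a.search("A", ttl=1))   # []    -- TTL too small; may have to re-issue the query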
Supernode Flooding
• Join: on startup, a client contacts a "supernode" ... and may at some point become one itself
• Publish: send the list of files to the supernode
• Search: send the query to the supernode; supernodes flood the query amongst themselves
– The supernode network is just like the prior flooding network

Supernode Network Design
[Figure: ordinary peers attach to "super nodes", which form a flooding overlay among themselves]

Supernode: File Insert
[Figure: peer 123.2.21.23 announces "I have X!" and publishes insert(X, 123.2.21.23) to its supernode]

Supernode: File Search
[Figure: a client asks its supernode "Where is file A?"; the query floods between supernodes, and replies come back: search(A) --> 123.2.22.50 and search(A) --> 123.2.0.18]

Supernode: Which nodes?
• Often, bias towards nodes with good:
– Bandwidth
– Computational resources
– Availability!
• Superpeer selection is time-based
– How long you've been on is a good predictor of how long you'll be around

Stability and Superpeers
• Why superpeers?
– Query consolidation
• Many connected nodes may have only a few files
• Propagating a query to a sub-node would take more bandwidth than answering it yourself
– Caching effect
• Requires network stability

Superpeer results
• Basically, "just better" than flooding to all
• Gets an order of magnitude or two better scaling
• But still fundamentally: O(search) * O(per-node storage) = O(N)
– Central: O(1) search, O(N) storage
– Flood: O(N) search, O(1) storage
– Superpeer: can trade between the two

Structured Search: Distributed Hash Tables
• Academic answer to p2p
• Goals
– Guaranteed lookup success
– Provable bounds on search time
– Provable scalability
• Makes some things harder
– Fuzzy queries / full-text search / etc.
• Read-write, not read-only
• Hot topic in networking since its introduction in ~2000/2001

Searching Wrap-Up

Type        O(search)   Storage     Fuzzy?
Central     O(1)        O(N)        Yes
Flood       ~O(N)       O(1)        Yes
Super       < O(N)      > O(1)      Yes
Structured  O(log N)    O(log N)    Not really

DHT: Overview
• Abstraction: a distributed "hash table" (DHT) data structure:
– put(id, item);
– item = get(id);
• Implementation: nodes in the system form a distributed data structure
– Can be a ring, tree, hypercube, skip list, butterfly network, ...

DHT: Overview (2)
• Structured overlay routing:
– Join: on startup, contact a "bootstrap" node and integrate yourself into the distributed data structure; get a node id
– Publish: route the publication for a file id toward a close node id along the data structure
– Search: route a query for a file id toward a close node id; the data structure guarantees that the query will meet the publication
– Important difference: get(key) is an exact match on key!
• search("spars") will not find file("briney spars")

DHT: Example - Chord
• From MIT, 2001
• Associate to each node and each file a unique id in a one-dimensional space (a ring)
– E.g., pick from the range [0...2^m]
– Usually the hash of the file or of the node's IP address
• Properties:
– Routing table size is O(log N), where N is the total number of nodes
– Guarantees that a file is found in O(log N) hops
• We can exploit this structure to be more efficient

DHT: Chord Basic Lookup
[Figure: ring with nodes N10, N32, N90, N105, N120; N32 asks "Where is key 80?" and the lookup resolves to "N90 has K80"]

DHT: Consistent Hashing
[Figure: circular ID space with nodes N10, N32, N60, N90, N105; key K5 is stored at N10, K20 at N32, K80 at N90]
• A key is stored at its successor: the node with the next-higher ID

DHT: Chord "Finger Table"
[Figure: node N80's fingers point 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the ring]
• Entry i in the finger table of node n is the first node that succeeds or equals n + 2^i
• In other words, the i-th finger points 1/2^(m-i) of the way around the ring
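The consistent-hashing placement rule above ("a key is stored at its successor") is one line of logic. Here is a small Python sketch using the node ids from the slide's ring; the function name is illustrative.

    from bisect import bisect_left

    def successor(node_ids, key):
        """Consistent hashing: a key is stored at its successor, the node
        with the next-higher id, wrapping around the circular id space."""
        ids = sorted(node_ids)
        i = bisect_left(ids, key)    # index of the first node id >= key
        return ids[i % len(ids)]     # past the largest id: wrap to the smallest

    ring = [10, 32, 60, 90, 105, 120]   # the nodes on the slide's ring
    print(successor(ring, 80))   # 90  -- "N90 has K80"
    print(successor(ring, 5))    # 10  -- K5 lives on N10
    print(successor(ring, 121))  # 10  -- wraps around past N120

This placement is what makes churn cheap: when a node joins or leaves, only the keys between it and its predecessor move.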
Node Join
• Compute your ID
• Use an existing node to route to that ID in the ring
– This finds s = successor(id)
• Ask s for its predecessor, p
• Splice yourself into the ring just like a linked list
– p->successor = me
– me->successor = s
– me->predecessor = p

DHT: Chord Join
[Worked example: identifier space [0..8]. Node n1 joins, then n2, then n0 and n6; each node keeps a successor table with entries i -> first node that succeeds or equals n + 2^i. Items f7 and f2 are then published and stored at their successors: f7 at n0, f2 at n2. Ring diagrams omitted.]

DHT: Chord Routing
• Upon receiving a query for item id, a node:
– Checks whether it stores the item locally
– If not, forwards the query to the largest node in its successor table that does not exceed id
[Figure: a query(7) is forwarded along successor-table entries until it reaches n0, which stores f7]

DHT: Chord Summary
• Routing table size?
– Log N fingers
• Routing time?
– Each hop is expected to halve the distance to the desired id => expect O(log N) hops

The limits of search: a peer-to-peer Google?
• Complex intersection queries ("the" + "who")
– Billions of hits for each term alone
• Sophisticated ranking
– Must compare many results before returning a subset to the user
• Very, very hard for a DHT / p2p system
– Needs high inter-node bandwidth

DHT: Discussion
• Pros:
– Guaranteed lookup
– O(log N) per-node state and search scope
• Cons:
– This line used to say "not used." But: now being used in a few apps, including BitTorrent.
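To tie the Chord slides together, here is a toy end-to-end sketch in Python: a static ring, finger tables with entry i ~ successor(id + 2^i), and the greedy routing rule from the routing slide. All names are illustrative, and real Chord additionally handles joins, stabilization, and failures.

    M_BITS = 7
    RING = 2 ** M_BITS                 # id space [0 .. 2^m), as on the slides

    def dist(a, b):
        """Clockwise distance from a to b around the ring."""
        return (b - a) % RING

    class ChordNode:
        def __init__(self, node_id):
            self.id = node_id
            self.fingers = []          # entry i ~ successor(id + 2^i); filled below

    def build_ring(ids):
        """Toy static ring: compute every node's finger table directly."""
        ids = sorted(ids)
        nodes = {i: ChordNode(i) for i in ids}
        def succ(k):
            return next((i for i in ids if i >= k), ids[0])
        for n in nodes.values():
            n.fingers = [nodes[succ((n.id + 2**i) % RING)] for i in range(M_BITS)]
        return nodes

    def lookup(node, key, hops=0):
        """Greedy routing: jump to the farthest finger that does not pass the
        key; each hop is expected to halve the remaining distance, giving
        O(log N) hops."""
        if key == node.id:
            return node, hops
        succ = node.fingers[0]                   # finger 0 = immediate successor
        if dist(node.id, key) <= dist(node.id, succ.id):
            return succ, hops + 1                # key lies between us and our successor
        best = max((f for f in node.fingers
                    if dist(node.id, f.id) <= dist(node.id, key)),
                   key=lambda f: dist(node.id, f.id))
        return lookup(best, key, hops + 1)

    nodes = build_ring([10, 32, 60, 90, 105, 120])
    owner, hops = lookup(nodes[32], 80)
    print(owner.id, hops)   # 90, in 2 hops -- the "N90 has K80" lookup from the slides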