
Peer-to-Peer
15-441

Scaling Problem
• Millions of clients → server and network meltdown

P2P System
[Figure: peers exchange data directly with one another instead of through a central server]

Why p2p?
• Scaling: create a system whose capacity grows with the number of clients - automatically!
• Self-managing
– This aspect is attractive for corporate/datacenter needs
– e.g., Amazon's 100,000-ish machines, Google's 300k+
• Harness lots of "spare" capacity at end-hosts
• Eliminate centralization
– Robust to failures, etc.
– Robust to censorship, politics & legislation??
• Leverage the resources of client machines (peers)
– Computation, storage, bandwidth

Today's Goal
• p2p is hot.
• There are tons and tons of instances
• But that's not the point
• Identify fundamental techniques useful in p2p settings
• Understand the challenges
– Servers vs. arbitrary nodes
– Hard state (backups!) vs. soft state (caches)
• Look at the (current!) boundaries of where p2p is particularly useful

Outline
• p2p file-sharing techniques
– Downloading: whole-file vs. chunks
– Searching
• Centralized index (Napster, etc.)
• Flooding (Gnutella, etc.)
• Smarter flooding (KaZaA, …)
• Routing (Freenet, etc.)
• Uses of p2p - what works well, what doesn't?
• Challenges

Searching & Fetching
Human: "I want to watch that great 80s cult classic 'Better Off Dead'"
1. Search: "better off dead" -> better_off_dead.mov (or -> 0x539fba83ajdeadbeef)
2. Locate sources of better_off_dead.mov
3. Download the file from them

Searching
[Figure: nodes N1...N6 connected through the Internet; a Publisher inserts Key="title", Value=MP3 data...; a Client issues Lookup("title")]

Search Approaches
• Centralized
• Flooding
• A hybrid: flooding between "supernodes"
• Structured

Different types of searches
• Needles vs. haystacks
– Searching for top 40, or an obscure punk track from 1981 that nobody's heard of?
• Search expressiveness
– Whole word? Regular expressions? File names? Attributes? Whole-text search?
• (e.g., p2p gnutella or p2p google?)

Framework
• Common primitives:
– Join: how do I begin participating?
– Publish: how do I advertise my file?
– Search: how do I find a file?
– Fetch: how do I retrieve a file?

Centralized
• Centralized database:
– Join: on startup, a client contacts the central server
– Publish: the client reports its list of files to the central server
– Search: query the server => returns the node(s) that store the requested file

Napster Example: Publish
[Figure: peer 123.2.21.23 announces "I have X, Y, and Z!" by sending insert(X, 123.2.21.23) to the central index]

Napster: Search
[Figure: a client asks the index "Where is file A?"; the reply is search(A) --> 123.2.0.18; the client then fetches the file directly from 123.2.0.18]

Napster: Discussion
• Pros:
– Simple
– Search scope is O(1), even for complex searches (one index, etc.)
– Controllable (pro or con?)
• Cons:
– Server maintains O(N) state
– Server does all processing
– Single point of failure
• Technical failures + legal (Napster was shut down)

Query Flooding
• Join: must join a flooding network
– Usually, establish peering with a few existing nodes
• Publish: no need; just reply
• Search: ask neighbors, who ask their neighbors, and so on... when/if found, reply to the sender
– TTL limits propagation

Example: Gnutella
[Figure: a query "Where is file A?" floods hop-by-hop through the overlay; nodes holding file A ("I have file A.") send replies back along the query path]

Flooding: Discussion
• Pros:
– Fully decentralized
– Search cost distributed
– Processing @ each node permits powerful search semantics
• Cons:
– Search scope is O(N)
– Search time is O(???)
– Nodes leave often; the network is unstable
• TTL-limited search works well for haystacks
– For scalability, it does NOT search every node; you may have to re-issue the query later
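The two baseline schemes above are small enough to sketch in code. First, the Napster-style central index: Join/Publish/Search all reduce to one server-side table mapping file names to hosts. A minimal Python sketch follows; the class and method names are illustrative, not from any real Napster implementation.

    class CentralIndex:
        """Napster-style central database: peers publish their file lists;
        a search is a single table lookup, so search scope is O(1), but the
        server holds O(N) state and is a single point of failure."""
        def __init__(self):
            self.index = {}                  # filename -> set of host addresses

        def publish(self, host, files):
            """Join/Publish: on startup a client reports the files it stores."""
            for f in files:
                self.index.setdefault(f, set()).add(host)

        def search(self, filename):
            """Search: the server returns the node(s) that store the file."""
            return self.index.get(filename, set())

    index = CentralIndex()
    index.publish("123.2.21.23", ["X", "Y", "Z"])   # "I have X, Y, and Z!"
    print(index.search("X"))   # {'123.2.21.23'}; the fetch then happens peer-to-peer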
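Second, Gnutella-style query flooding. This sketch (again with hypothetical names) shows the two mechanisms the slides call out: a TTL that limits propagation, and duplicate suppression so a query arriving over two paths is not re-flooded.

    import uuid

    class Peer:
        """Gnutella-style node: no publish step; queries flood neighbor-to-neighbor."""
        def __init__(self, name):
            self.name = name
            self.files = set()       # files stored locally
            self.neighbors = []      # peering links established at join time
            self.seen = set()        # query ids already handled

        def search(self, filename, ttl=4):
            """Originate a query; returns the names of peers holding the file."""
            return self._handle(uuid.uuid4().hex, filename, ttl)

        def _handle(self, qid, filename, ttl):
            if qid in self.seen:                 # duplicate: drop the re-flood
                return []
            self.seen.add(qid)
            hits = [self.name] if filename in self.files else []
            if ttl > 0:                          # TTL limits how far the flood spreads
                for peer in self.neighbors:
                    hits += peer._handle(qid, filename, ttl - 1)
            return hits

    # Three peers in a line (a -- b -- c); only c holds file "A"
    a, b, c = Peer("a"), Peer("b"), Peer("c")
    a.neighbors, b.neighbors, c.neighbors = [b], [a, c], [b]
    c.files.add("A")
    print(a.search("A"))          # ['c'] -- found within the TTL
    print(a.search("A", ttl=1))   # []    -- TTL too small; may have to re-issue the query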
Supernode Flooding
• Join: on startup, a client contacts a "supernode" ... and may at some point become one itself
• Publish: send the list of files to the supernode
• Search: send the query to the supernode; supernodes flood the query amongst themselves
– The supernode network is just like the prior flooding network

Supernode Network Design
[Figure: ordinary peers attach to "super nodes", which form a flooding overlay among themselves]

Supernode: File Insert
[Figure: peer 123.2.21.23 announces "I have X!" and publishes insert(X, 123.2.21.23) to its supernode]

Supernode: File Search
[Figure: a client asks its supernode "Where is file A?"; the query floods between supernodes, and replies come back: search(A) --> 123.2.22.50 and search(A) --> 123.2.0.18]

Supernode: Which nodes?
• Often, bias towards nodes with good:
– Bandwidth
– Computational resources
– Availability!
• Superpeer selection is time-based
– How long you've been on is a good predictor of how long you'll be around

Stability and Superpeers
• Why superpeers?
– Query consolidation
• Many connected nodes may have only a few files
• Propagating a query to a sub-node would take more bandwidth than answering it yourself
– Caching effect
• Requires network stability

Superpeer results
• Basically, "just better" than flooding to all
• Gets an order of magnitude or two better scaling
• But still fundamentally: O(search) * O(per-node storage) = O(N)
– Central: O(1) search, O(N) storage
– Flood: O(N) search, O(1) storage
– Superpeer: can trade between the two

Structured Search: Distributed Hash Tables
• Academic answer to p2p
• Goals
– Guaranteed lookup success
– Provable bounds on search time
– Provable scalability
• Makes some things harder
– Fuzzy queries / full-text search / etc.
• Read-write, not read-only
• Hot topic in networking since its introduction in ~2000/2001

Searching Wrap-Up

Type        O(search)   Storage     Fuzzy?
Central     O(1)        O(N)        Yes
Flood       ~O(N)       O(1)        Yes
Super       < O(N)      > O(1)      Yes
Structured  O(log N)    O(log N)    Not really

DHT: Overview
• Abstraction: a distributed "hash table" (DHT) data structure:
– put(id, item);
– item = get(id);
• Implementation: nodes in the system form a distributed data structure
– Can be a ring, tree, hypercube, skip list, butterfly network, ...

DHT: Overview (2)
• Structured overlay routing:
– Join: on startup, contact a "bootstrap" node and integrate yourself into the distributed data structure; get a node id
– Publish: route the publication for a file id toward a close node id along the data structure
– Search: route a query for a file id toward a close node id; the data structure guarantees that the query will meet the publication
– Important difference: get(key) is an exact match on key!
• search("spars") will not find file("briney spars")

DHT: Example - Chord
• From MIT, 2001
• Associate to each node and each file a unique id in a one-dimensional space (a ring)
– E.g., pick from the range [0...2^m]
– Usually the hash of the file or of the node's IP address
• Properties:
– Routing table size is O(log N), where N is the total number of nodes
– Guarantees that a file is found in O(log N) hops
• We can exploit this structure to be more efficient

DHT: Chord Basic Lookup
[Figure: ring with nodes N10, N32, N90, N105, N120; N32 asks "Where is key 80?" and the lookup resolves to "N90 has K80"]

DHT: Consistent Hashing
[Figure: circular ID space with nodes N10, N32, N60, N90, N105; key K5 is stored at N10, K20 at N32, K80 at N90]
• A key is stored at its successor: the node with the next-higher ID

DHT: Chord "Finger Table"
[Figure: node N80's fingers point 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, and 1/128 of the way around the ring]
• Entry i in the finger table of node n is the first node that succeeds or equals n + 2^i
• In other words, the i-th finger points 1/2^(m-i) of the way around the ring
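The consistent-hashing placement rule above ("a key is stored at its successor") is one line of logic. Here is a small Python sketch using the node ids from the slide's ring; the function name is illustrative.

    from bisect import bisect_left

    def successor(node_ids, key):
        """Consistent hashing: a key is stored at its successor, the node
        with the next-higher id, wrapping around the circular id space."""
        ids = sorted(node_ids)
        i = bisect_left(ids, key)    # index of the first node id >= key
        return ids[i % len(ids)]     # past the largest id: wrap to the smallest

    ring = [10, 32, 60, 90, 105, 120]   # the nodes on the slide's ring
    print(successor(ring, 80))   # 90  -- "N90 has K80"
    print(successor(ring, 5))    # 10  -- K5 lives on N10
    print(successor(ring, 121))  # 10  -- wraps around past N120

This placement is what makes churn cheap: when a node joins or leaves, only the keys between it and its predecessor move.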
Node Join
• Compute your ID
• Use an existing node to route to that ID in the ring
– This finds s = successor(id)
• Ask s for its predecessor, p
• Splice yourself into the ring just like a linked list
– p->successor = me
– me->successor = s
– me->predecessor = p

DHT: Chord Join
[Worked example: identifier space [0..8]. Node n1 joins, then n2, then n0 and n6; each node keeps a successor table with entries i -> first node that succeeds or equals n + 2^i. Items f7 and f2 are then published and stored at their successors: f7 at n0, f2 at n2. Ring diagrams omitted.]

DHT: Chord Routing
• Upon receiving a query for item id, a node:
– Checks whether it stores the item locally
– If not, forwards the query to the largest node in its successor table that does not exceed id
[Figure: a query(7) is forwarded along successor-table entries until it reaches n0, which stores f7]

DHT: Chord Summary
• Routing table size?
– Log N fingers
• Routing time?
– Each hop is expected to halve the distance to the desired id => expect O(log N) hops

The limits of search: a peer-to-peer Google?
• Complex intersection queries ("the" + "who")
– Billions of hits for each term alone
• Sophisticated ranking
– Must compare many results before returning a subset to the user
• Very, very hard for a DHT / p2p system
– Needs high inter-node bandwidth

DHT: Discussion
• Pros:
– Guaranteed lookup
– O(log N) per-node state and search scope
• Cons:
– This line used to say "not used." But: now being used in a few apps, including BitTorrent.
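To tie the Chord slides together, here is a toy end-to-end sketch in Python: a static ring, finger tables with entry i ~ successor(id + 2^i), and the greedy routing rule from the routing slide. All names are illustrative, and real Chord additionally handles joins, stabilization, and failures.

    M_BITS = 7
    RING = 2 ** M_BITS                 # id space [0 .. 2^m), as on the slides

    def dist(a, b):
        """Clockwise distance from a to b around the ring."""
        return (b - a) % RING

    class ChordNode:
        def __init__(self, node_id):
            self.id = node_id
            self.fingers = []          # entry i ~ successor(id + 2^i); filled below

    def build_ring(ids):
        """Toy static ring: compute every node's finger table directly."""
        ids = sorted(ids)
        nodes = {i: ChordNode(i) for i in ids}
        def succ(k):
            return next((i for i in ids if i >= k), ids[0])
        for n in nodes.values():
            n.fingers = [nodes[succ((n.id + 2**i) % RING)] for i in range(M_BITS)]
        return nodes

    def lookup(node, key, hops=0):
        """Greedy routing: jump to the farthest finger that does not pass the
        key; each hop is expected to halve the remaining distance, giving
        O(log N) hops."""
        if key == node.id:
            return node, hops
        succ = node.fingers[0]                   # finger 0 = immediate successor
        if dist(node.id, key) <= dist(node.id, succ.id):
            return succ, hops + 1                # key lies between us and our successor
        best = max((f for f in node.fingers
                    if dist(node.id, f.id) <= dist(node.id, key)),
                   key=lambda f: dist(node.id, f.id))
        return lookup(best, key, hops + 1)

    nodes = build_ring([10, 32, 60, 90, 105, 120])
    owner, hops = lookup(nodes[32], 80)
    print(owner.id, hops)   # 90, in 2 hops -- the "N90 has K80" lookup from the slides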