
Peer-to-Peer
15-441

Scaling Problem
• Millions of clients => server and network meltdown

P2P System
• Leverage the resources of client machines (peers)
  – Computation, storage, bandwidth

Why P2P?
• Harness lots of spare capacity
  – 1 Big Fast Server: 1 Gbit/s, $10k/month++
  – 2,000 cable modems: 1 Gbit/s, $??
  – 1M end-hosts: Uh, wow.
• Build self-managing systems / Deal with huge scale
  – Same techniques attractive for both companies / servers / p2p
  – E.g., Akamai's 14,000 nodes
  – Google's 100,000+ nodes

Outline
• p2p file sharing techniques
  – Downloading: Whole-file vs. chunks
  – Searching
    • Centralized index (Napster, etc.)
    • Flooding (Gnutella, etc.)
    • Smarter flooding (KaZaA, …)
    • Routing (Freenet, etc.)
• Uses of p2p - what works well, what doesn't?
  – servers vs. arbitrary nodes
  – Hard state (backups!) vs. soft-state (caches)
• Challenges
  – Fairness, freeloading, security, …

P2P file-sharing
• Quickly grown in popularity
  – Dozens or hundreds of file sharing applications
  – 35 million American adults use P2P networks -- 29% of all Internet users in US!
  – Audio/video transfer now dominates traffic on the Internet

What's out there?

              Central      Flood          Super-flood   Route
Whole File    Napster      Gnutella                     Freenet
Chunk Based   BitTorrent   eDonkey 2000   KaZaA         DHTs
                           (bytes, not chunks)

Searching
[Diagram: nodes N1–N6 connected across the Internet; a Publisher holds Key="title", Value=MP3 data…; a Client issues Lookup("title"). How does the query find the publisher?]

Searching 2
• Needles vs. Haystacks
  – Searching for top 40, or an obscure punk track from 1981 that nobody's heard of?
• Search expressiveness
  – Whole word? Regular expressions? File names? Attributes? Whole-text search?
    • (e.g., p2p gnutella or p2p google?)

Framework
• Common Primitives:
  – Join: how do I begin participating?
  – Publish: how do I advertise my file?
  – Search: how do I find a file?
  – Fetch: how do I retrieve a file?
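The Publish/Search primitives above are easiest to see in the centralized-index style covered next. Below is a minimal sketch of a Napster-like central server; the class name and addresses are illustrative, and a real protocol would add joins, keep-alives, and keyword (not exact-name) search:

```python
class CentralIndex:
    """Napster-style central server: filename -> set of peer addresses."""
    def __init__(self):
        self.index = {}

    def publish(self, host, files):
        # Publish: on join, a peer reports the files it is sharing.
        for f in files:
            self.index.setdefault(f, set()).add(host)

    def search(self, filename):
        # Search: a single O(1) lookup at the server; the fetch itself
        # then goes directly peer-to-peer.
        return self.index.get(filename, set())

server = CentralIndex()
server.publish("123.2.21.23", ["X", "Y", "Z"])   # "I have X, Y, and Z!"
server.publish("123.2.0.18", ["A", "X"])
```

Note how the server holds O(N) state and answers every query, which is exactly the single point of failure discussed below.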
Next Topic...
• Centralized Database – Napster
• Query Flooding – Gnutella
• Intelligent Query Flooding – KaZaA
• Swarming – BitTorrent
• Unstructured Overlay Routing – Freenet
• Structured Overlay Routing – Distributed Hash Tables

Napster: History
• 1999: Sean Fanning launches Napster
• Peaked at 1.5 million simultaneous users
• Jul 2001: Napster shuts down

Napster: Overview
• Centralized Database:
  – Join: on startup, client contacts central server
  – Publish: reports list of files to central server
  – Search: query the server => return someone that stores the requested file
  – Fetch: get the file directly from peer

Napster: Publish
[Diagram: a peer at 123.2.21.23 tells the central server "I have X, Y, and Z!" via insert(X, 123.2.21.23), ...]

Napster: Search
[Diagram: a client asks the central server "Where is file A?" via search(A); the server replies 123.2.0.18; the client fetches A directly from 123.2.0.18]

Napster: Discussion
• Pros:
  – Simple
  – Search scope is O(1)
  – Controllable (pro or con?)
• Cons:
  – Server maintains O(N) state
  – Server does all processing
  – Single point of failure

Next Topic: Query Flooding – Gnutella

Gnutella: History
• In 2000, J. Frankel and T. Pepper from Nullsoft released Gnutella
• Soon many other clients: Bearshare, Morpheus, LimeWire, etc.
• In 2001, many protocol enhancements including "ultrapeers"

Gnutella: Overview
• Query Flooding:
  – Join: on startup, client contacts a few other nodes; these become its "neighbors"
  – Publish: no need
  – Search: ask neighbors, who ask their neighbors, and so on... when/if found, reply to sender.
    • TTL limits propagation
  – Fetch: get the file directly from peer

Gnutella: Search
[Diagram: "Where is file A?" floods from neighbor to neighbor; nodes holding file A reply back toward the sender]
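The TTL-limited flooding just described can be sketched on a toy graph. This is illustrative only: the topology and node names are invented, and real Gnutella uses message IDs for duplicate suppression and routes replies along the reverse path rather than collecting results centrally.

```python
# Toy sketch of Gnutella-style query flooding: each node asks its
# neighbors, who ask theirs, until the TTL runs out.
def flood_search(graph, files, start, filename, ttl):
    """Return the set of nodes holding `filename` within `ttl` hops of `start`."""
    found = set()
    visited = {start}
    frontier = [start]
    while frontier and ttl > 0:
        next_frontier = []
        for node in frontier:
            for nbr in graph[node]:
                if nbr in visited:
                    continue              # duplicate suppression
                visited.add(nbr)
                if filename in files.get(nbr, ()):
                    found.add(nbr)
                next_frontier.append(nbr)
        frontier = next_frontier
        ttl -= 1                          # TTL limits propagation
    return found

graph = {"n1": ["n2", "n3"], "n2": ["n1", "n4"],
         "n3": ["n1", "n4"], "n4": ["n2", "n3", "n5"], "n5": ["n4"]}
files = {"n4": ["A"], "n5": ["A"]}
```

A small TTL finds nearby copies cheaply (good for haystacks); finding a rare file may require re-issuing the query with a larger TTL, which is the scalability problem discussed next.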
Gnutella: Discussion
• Pros:
  – Fully de-centralized
  – Search cost distributed
  – Processing @ each node permits powerful search semantics
• Cons:
  – Search scope is O(N)
  – Search time is O(???)
  – Nodes leave often, network unstable
• TTL-limited search works well for haystacks.
  – For scalability, does NOT search every node. May have to re-issue query later

KaZaA: History
• In 2001, KaZaA created by Dutch company Kazaa BV
• Single network called FastTrack used by other clients as well: Morpheus, giFT, etc.
• Eventually protocol changed so other clients could no longer talk to it
• Most popular file sharing network today with >10 million users (number varies)

KaZaA: Overview
• "Smart" Query Flooding:
  – Join: on startup, client contacts a "supernode" ... may at some point become one itself
  – Publish: send list of files to supernode
  – Search: send query to supernode; supernodes flood query amongst themselves.
  – Fetch: get the file directly from peer(s); can fetch simultaneously from multiple peers

KaZaA: Network Design
[Diagram: ordinary nodes connect to "Super Nodes", which are interconnected with one another]

KaZaA: File Insert
[Diagram: a peer at 123.2.21.23 tells its supernode "I have X!" via insert(X, 123.2.21.23)]

KaZaA: File Search
[Diagram: a client asks its supernode "Where is file A?" via search(A); the query floods among the supernodes; replies point to 123.2.0.18 and 123.2.22.50]

KaZaA: Fetching
• More than one node may have requested file...
• How to tell?
  – Must be able to distinguish identical files
  – Not necessarily same filename
  – Same filename not necessarily same file...
• Use Hash of file
  – KaZaA uses UUHash: fast, but not secure
  – Alternatives: MD5, SHA-1
• How to fetch?
  – Get bytes [0..1000] from A, [1001..2000] from B

KaZaA: Discussion
• Pros:
  – Tries to take into account node heterogeneity:
    • Bandwidth
    • Host Computational Resources
    • Host Availability (?)
  – Rumored to take into account network locality
• Cons:
  – Mechanisms easy to circumvent
  – Still no real guarantees on search scope or search time
• Similar behavior to Gnutella, but better.
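The multi-source fetching idea above ("bytes [0..1000] from A, [1001..2000] from B", identified by a content hash) can be sketched as follows. Peers are simulated as in-memory byte strings rather than network connections, and SHA-1 stands in for KaZaA's actual (insecure) UUHash:

```python
import hashlib

def fetch_chunked(peers, length, chunk_size, expected_sha1):
    """Fetch byte ranges round-robin from peers, then verify the whole file."""
    data = bytearray(length)
    sources = list(peers.values())
    for i, start in enumerate(range(0, length, chunk_size)):
        end = min(start + chunk_size, length)
        src = sources[i % len(sources)]       # e.g. bytes [0..999] from A,
        data[start:end] = src[start:end]      # [1000..1999] from B, ...
    if hashlib.sha1(bytes(data)).hexdigest() != expected_sha1:
        raise ValueError("hash mismatch: corrupt or different file")
    return bytes(data)

original = b"x" * 1500 + b"y" * 1500          # the 3000-byte "file"
digest = hashlib.sha1(original).hexdigest()   # content hash, not filename
peers = {"A": original, "B": original}        # two peers holding the same file
```

Keying on the hash (not the filename) is what lets a client recognize that two differently-named copies are the same file, and the final check catches a peer that served different content.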
  – Alternative: Erasure Codes (cont. from KaZaA: Fetching)

Stability and Superpeers
• Why superpeers?
  – Query consolidation
    • Many connected nodes may have only a few files
    • Propagating a query to a sub-node would take more b/w than answering it yourself
  – Caching effect
    • Requires network stability
• Superpeer selection is time-based
  – How long you've been on is a good predictor of how long you'll be around.

BitTorrent: History
• In 2002, B. Cohen debuted BitTorrent
• Key Motivation:
  – Popularity exhibits temporal locality (Flash Crowds)
  – E.g., Slashdot effect, CNN on 9/11, new movie/game release
• Focused on Efficient Fetching, not Searching:
  – Distribute the same file to all peers
  – Single publisher, multiple downloaders
• Has some "real" publishers:
  – Blizzard Entertainment using it to distribute the beta of their new game

BitTorrent: Overview
• Swarming:
  – Join: contact centralized "tracker" server, get a list of peers.
  – Publish: Run a tracker server.
  – Search: Out-of-band. E.g., use Google to find a tracker for the file you want.
  – Fetch: Download chunks of the file from your peers. Upload chunks you have to them.
• Big differences from Napster:
  – Chunk based downloading (sound familiar? :)
  – "few large files" focus
  – Anti-freeloading mechanisms

BitTorrent: Publish/Join
[Diagram: a new peer contacts the tracker, gets a peer list, and joins the swarm]

BitTorrent: Fetch
[Diagram: peers simultaneously exchange different chunks of the file with one another]

BitTorrent: Sharing Strategy
• Employ "Tit-for-tat" sharing strategy
  – A is downloading from some other people
    • A will let the fastest N of those download from him
  – Be optimistic: occasionally let freeloaders download
    • Otherwise no one would ever start!
    • Also allows you to discover better peers to download from when they reciprocate
• Goal: Pareto Efficiency
  – Game Theory: "No change can make anyone better off without making others worse off"
  – Does it work? (don't know!)
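The tit-for-tat rule above amounts to a simple selection problem: unchoke the N peers currently uploading to you fastest, plus one peer at random (the optimistic unchoke). A minimal sketch, with invented peer names and rates; real clients re-run this selection periodically rather than once:

```python
import random

def choose_unchoked(upload_rates, n, rng=random):
    """upload_rates: peer -> observed rate at which they upload to us (B/s).
    Returns the set of peers we allow to download from us."""
    ranked = sorted(upload_rates, key=upload_rates.get, reverse=True)
    unchoked = set(ranked[:n])            # tit-for-tat: reward the fastest N
    rest = ranked[n:]
    if rest:
        unchoked.add(rng.choice(rest))    # optimistic unchoke: give a
    return unchoked                       # newcomer/freeloader a chance

rates = {"p1": 50_000, "p2": 120_000, "p3": 5_000, "p4": 80_000, "p5": 0}
```

The optimistic slot is what bootstraps the system (a brand-new peer has nothing to trade yet) and is also how a peer discovers faster partners than its current ones.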
BitTorrent: Summary
• Pros:
  – Works reasonably well in practice
  – Gives peers incentive to share resources; avoids freeloaders
• Cons:
  – Pareto Efficiency relatively weak condition
  – Central tracker server needed to bootstrap swarm
  – (Tracker is a design choice, not a requirement. Could easily combine with other approaches.)

Next Topic: Structured Overlay Routing – Distributed Hash Tables (DHT)

Distributed Hash Tables
• Academic answer to p2p
• Goals
  – Guaranteed lookup success
  – Provable bounds on search time
  – Provable scalability
• Makes some things harder
  – Fuzzy queries / full-text search / etc.
• Read-write, not read-only
• Hot Topic in networking since introduction in ~2000/2001

DHT: Chord Summary
• Routing table size?
  – Log N fingers
• Routing time?
  – Each hop expects to halve the distance to the desired id => expect O(log N) hops.

DHT: Discussion
• Pros:
  – Guaranteed Lookup
  – O(log N) per-node state and search scope
• Cons:
  – No one uses them? (only one file sharing app)
  – Supporting non-exact-match search is hard

When are p2p / DHTs useful?
• Caching and "soft-state" data
  – Works well! BitTorrent, KaZaA, etc., all use peers as caches for hot data
• Finding read-only data
  – Limited flooding finds hay
  – DHTs find needles
• BUT...

A Peer-to-peer Google?
• Complex intersection queries ("the" + "who")

Writable, persistent p2p
• Do you trust your data to 100,000 monkeys?
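Returning to the Chord summary above (log N fingers, each hop roughly halving the remaining distance), here is a sketch on a small 6-bit id ring. The ring membership is invented for illustration, and global knowledge of the node set stands in for real per-node routing state:

```python
M = 6                                            # id space: 2^6 = 64 ids
NODES = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]   # example ring membership

def successor(k):
    """First node clockwise from id k (the node responsible for key k)."""
    for n in NODES:
        if n >= k % 2**M:
            return n
    return NODES[0]                              # wrap around the ring

def fingers(n):
    """Finger table of node n: successor(n + 2^i) for i = 0..M-1."""
    return [successor((n + 2**i) % 2**M) for i in range(M)]

def between(x, a, b):
    """True if x lies in the open ring interval (a, b)."""
    return a < x < b if a < b else (x > a or x < b)

def lookup(start, key):
    """Greedy finger routing; returns (node owning key, hop count)."""
    node, hops = start, 0
    while True:
        nxt = successor((node + 1) % 2**M)           # immediate successor
        if key == nxt or between(key, node, nxt):    # key in (node, nxt]
            return nxt, hops
        # forward to the closest preceding finger, which roughly halves
        # the remaining distance to key
        preceding = [f for f in fingers(node) if between(f, node, key)]
        node = max(preceding, key=lambda f: (f - node) % 2**M) if preceding else nxt
        hops += 1
```

With 10 nodes, lookups here finish in a couple of hops, consistent with the O(log N) bound; each node's routing state is just its M fingers.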
A Peer-to-peer Google? (cont.)
  – Billions of hits for each term alone
• Sophisticated ranking
  – Must compare many results before returning a subset to user
• Very, very hard for a DHT / p2p system
  – Need high inter-node bandwidth
  – (This is exactly what Google does - massive clusters)

Writable, persistent p2p (cont.)
• Node availability hurts
  – Ex: Store 5 copies of data on different nodes
  – When someone goes away, you must replicate the data they held
  – Hard drives are *huge*, but cable modem upload bandwidth is tiny - perhaps 10 GBytes/day
  – Takes many days to upload the contents of a 200GB hard drive. Very expensive leave/replication situation!

P2P: Summary
• Many different styles;
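The "many days" claim in the replication discussion above follows directly from the slide's own figures (a 200 GB drive and roughly 10 GBytes/day of cable-modem uplink):

```python
# Back-of-the-envelope check of the replication cost claimed above.
drive_gb = 200            # contents of a departed node's hard drive (GB)
uplink_gb_per_day = 10    # rough cable-modem upload capacity (GB/day)

days_to_replicate = drive_gb / uplink_gb_per_day   # time to re-upload it all
```

Three weeks of saturated uplink per departure is why churn makes writable, persistent p2p storage so expensive.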