Tutorial on P2P Systems
Total Page:16
File Type:pdf, Size:1020Kb
P2P Systems Keith W. Ross Dan Rubenstein Polytechnic University Columbia University http://cis.poly.edu/~ross http://www.cs.columbia.edu/~danr [email protected] [email protected] Thanks to: B. Bhattacharjee, A. Rowston, Don Towsley © Keith Ross and Dan Rubenstein 1 Defintion of P2P 1) Significant autonomy from central servers 2) Exploits resources at the edges of the Internet P storage and content P CPU cycles P human presence 3) Resources at edge have intermittent connectivity, being added & removed 2 It’s a broad definition: U P2P file sharing U DHTs & their apps P Napster, Gnutella, P Chord, CAN, Pastry, KaZaA, eDonkey, etc Tapestry U P2P communication U P Instant messaging P2P apps built over P Voice-over-IP: Skype emerging overlays P PlanetLab U P2P computation P seti@home Wireless ad-hoc networking not covered here 3 Tutorial Outline (1) U 1. Overview: overlay networks, P2P applications, copyright issues, worldwide computer vision U 2. Unstructured P2P file sharing: Napster, Gnutella, KaZaA, search theory, flashfloods U 3. Structured DHT systems: Chord, CAN, Pastry, Tapestry, etc. 4 Tutorial Outline (cont.) U 4. Applications of DHTs: persistent file storage, mobility management, etc. U 5. Security issues: vulnerabilities, solutions, anonymity U 6. Graphical structure: random graphs, fault tolerance U 7. Experimental observations: measurement studies U 8. Wrap up 5 1. Overview of P2P U overlay networks U P2P applications U worldwide computer vision 6 Overlay networks overlay edge 7 Overlay graph Virtual edge U TCP connection U or simply a pointer to an IP address Overlay maintenance U Periodically ping to make sure neighbor is still alive U Or verify liveness while messaging U If neighbor goes down, may want to establish new edge U New node needs to bootstrap 8 More about overlays Unstructured overlays U e.g., new node randomly chooses three existing nodes as neighbors Structured overlays U e.g., edges arranged in restrictive structure Proximity U Not necessarily taken into account 9 Overlays: all in the application layer application Tremendous design transport network flexibility data link physical P Topology, maintenance P Message types P Protocol P Messaging over TCP or UDP Underlying physical net is transparent to developer application P But some overlays exploit application transport transport network proximity network data link data link physical physical 10 Examples of overlays U DNS U BGP routers and their peering relationships U Content distribution networks (CDNs) U Application-level multicast P economical way around barriers to IP multicast U And P2P apps ! 11 1. Overview of P2P U overlay networks U current P2P applications P P2P file sharing & copyright issues P Instant messaging / voice over IP P P2P distributed computing U worldwide computer vision 12 P2P file sharing U Alice runs P2P client U Asks for “Hey Jude” application on her U Application displays notebook computer other peers that have U Intermittently copy of Hey Jude. connects to Internet; U Alice chooses one of gets new IP address the peers, Bob. for each connection U File is copied from U Registers her content Bob’s PC to Alice’s in P2P system notebook: P2P U While Alice downloads, other users uploading from Alice. 13 Millions of content servers Blue NPR ER Hey Jude Star Wars Magic Flute 14 Killer deployments U Napster P disruptive; proof of concept U Gnutella P open source U KaZaA/FastTrack P Today more KaZaA traffic then Web traffic! U eDonkey / Overnet P Becoming popular in Europe P Appears to use a DHT Is success due to massive number of servers, or simply because content is free? 15 P2P file sharing software U Allows Alice to open up U Allows users to search a directory in her file nodes for content system based on keyword P Anyone can retrieve a matches: file from directory P Like Google P Like a Web server U Allows Alice to copy Seems harmless files from other users’ to me ! open directories: P Like a Web client 16 Copyright issues (1) Direct infringement: U end users who download or upload copyrighted works direct infringers Indirect infringement: U Hold an individual accountable for actions of others U Contributory U Vicarious 17 Copyright issues (2) Contributory infringer: Vicarious infringer: U knew of underlying U able to control the direct infringement, direct infringers (e.g., and terminate user U caused, induced, or accounts), and materially contributed U derived direct financial to direct infringement benefit from direct infringement (money, more users) (knowledge not necessary) 18 Copyright issues (3) Betamax VCR defense Guidelines for P2P developers U Manufacturer not U total control so that liable for contributory there’s no direct infringement infringement U “capable of substantial or non-infringing use” U no control over users – no U But in Napster case, remote kill switch, court found defense automatic updates, actively does not apply to all promote non-infringing vicarious liability uses of product U Disaggregate functions: indexing, search, transfer U No customer support 19 Instant Messaging U Alice runs IM client on U Alice initiates direct her PC TCP connection with U Intermittently Bob: P2P connects to Internet; U Alice and Bob chat. gets new IP address for each connection U Registers herself with U Can also be voice, “system” video and text. U Learns from “system” that Bob in her buddy We’ll see that Skype list is active is a VoIP P2P system 20 P2P Distributed Computing seti@home U Peer sets up TCP U Search for ET connection to central intelligence computer, downloads U Central site collects chunk radio telescope data U Peer does FFT on U Data is divided into chunk, uploads results, work chunks of 300 gets new chunk Kbytes U User obtains client, Not peer to peer, but exploits which runs in backgrd resources at network edge 21 1. Overview of P2P U overlay networks U P2P applications U worldwide computer vision 22 Worldwide Computer Vision Alice’s home computer: U Alice’s computer is U Working for biotech, moonlighting matching gene sequences U Payments come from U DSL connection downloading biotech company, movie telescope data system and backup service U Contains encrypted Your PC is only a component fragments of thousands of non-Alice files in the “big” computer U Occasionally a fragment is read; it’s part of a movie someone is watching in Paris U Her laptop is off, but it’s backing up others’ files 23 Worldwide Computer (2) Anderson & Kubiatowicz: Challenges Internet-scale OS U heterogeneous hosts U Thin software layer running U security on each host & central U payments coordinating system Central server complex running on ISOS server U complex needed to ensure privacy of sensitive data U allocating resources, U coordinating currency ISOS server complex transfer maintains databases of resource descriptions, U Supports data processing & usage policies, and task online services descriptions 24 2. Unstructured P2P File Sharing U Napster U Gnutella U KaZaA U BitTorrent U search theory U dealing with flash crowds 25 Napster U Paradigm shift U not the first (c.f. probably Eternity, from Ross Anderson in Cambridge) U but instructive for what it gets right, and U also wrong… U also had a political message…and economic and legal… 26 Napster U program for sharing files over the Internet U a “disruptive” application/technology? U history: P 5/99: Shawn Fanning (freshman, Northeasten U.) founds Napster Online music service P 12/99: first lawsuit P 3/00: 25% UWisc traffic Napster P 2/01: US Circuit Court of Appeals: Napster knew users violating copyright laws P 7/01: # simultaneous online users: Napster 160K, Gnutella: 40K, Morpheus (KaZaA): 300K 27 Napster U judge orders Napster 8M to pull plug in July ‘01 U other file sharing apps 6M take over! 4M bits per sec 2M 0.0 gnutella napster fastrack (KaZaA) 28 Napster: how did it work U Application-level, client-server protocol over point-to-point TCP U Centralized directory server Steps: U connect to Napster server U upload your list of files to server. U give server keywords to search the full list with. U select “best” of correct answers. (pings) 29 Napster napster.com 1. File list centralized directory and IP address is uploaded 30 Napster napster.com User 2. centralized directory requests search at server. Query and results 31 Napster napster.com 3. User pings centralized directory hosts that apparently have data. Looks for pings best transfer pings rate. 32 Napster napster.com 4. User chooses centralized directory server Napster’s centralized Retrieves server farm had file difficult time keeping up with traffic 33 2. Unstructured P2P File Sharing U Napster U Gnutella U KaZaA U BitTorrent U search theory U dealing with flash crowds 34 Distributed Search/Flooding 35 Distributed Search/Flooding 36 Gnutella U focus: decentralized method of searching for files P central directory server no longer the bottleneck P more difficult to “pull plug” U each application instance serves to: P store selected files P route queries from and to its neighboring peers P respond to queries if file stored locally P serve files 37 Gnutella U Gnutella history: P 3/14/00: release by AOL, almost immediately withdrawn P became open source P many iterations to fix poor initial design (poor design turned many people off) U issues: P how much traffic does one query generate? P how many hosts can it support at once? P what is the latency associated with querying? P is there a bottleneck? 38 Gnutella: limited scope query Searching by flooding: U if you don’t have the file you want, query 7 of your neighbors. U if they don’t have it, they contact 7 of their neighbors, for a maximum hop count of 10.