P2P Systems
IP networks course – Giovanni Neglia
Outline
• P2P systems overview
• The ancestor: Napster
• Only file sharing?
  • distributed computing, collaborative environments, anonymity...
• Why P2P?
  • File sharing, file sharing, file sharing...
  • with a note on structured P2P networks and DHTs
What is Peer-to-Peer (P2P)?
The Client-Server Model
• Contact a server and get the service.
• The server has all the resources and capabilities.
• No interaction among clients.
• Common model in the Internet (e.g. www).
What is a peer?
• "an entity with capabilities similar to other entities in the system"
The P2P Model
• A peer's resources are similar to the resources of the other participants
• P2P: peers communicating directly with other peers and sharing resources
• Peer = Servent = Server + Client
P2P Application Taxonomy
• Proposed for
  • file sharing (Napster, Gnutella, KaZaA, BitTorrent)
  • distributed computing (SETI@home)
  • collaboration (Jabber, Groove)
  • general-purpose platforms (JXTA)
  • audio/video conferencing -> Skype
  • censorship resistance -> Infranet, Tangler
P2P File Sharing Software
• Allows a user to open up a directory in their file system
  • Anyone can retrieve a file from the directory
  • Like a Web server
• Allows the user to copy files from other users' open directories
  • Like a Web client
• Allows users to search nodes for content based on keyword matches
  • Like Google
Outline
• P2P systems overview
• The ancestor: Napster
• Only file sharing?
  • distributed computing, collaborative environments, anonymity...
• Why P2P?
  • File sharing, file sharing, file sharing...
Napster: How Did It Work
• Application-level, client-server protocol over point-to-point TCP
• Centralized directory server
• Steps (a minimal sketch of the directory idea follows):
  • Connect to the Napster server
  • Give the server keywords to search the full list with
  • Select the "best" of the correct answers
    • One approach is selecting based on the response time of pings: the shortest response time is chosen
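To make the centralized-directory idea concrete, here is a minimal Python sketch (the class and all names are hypothetical, not Napster's actual protocol): the directory is essentially an inverted index from keywords to (peer, file) pairs.

```python
# Minimal sketch of a Napster-style centralized directory (hypothetical API).
from collections import defaultdict

class Directory:
    def __init__(self):
        self.index = defaultdict(set)        # keyword -> {(peer_ip, filename)}

    def register(self, peer_ip, filenames):
        """Step 1: a connecting peer uploads its file list and IP address."""
        for name in filenames:
            for keyword in name.lower().split():
                self.index[keyword].add((peer_ip, name))

    def search(self, keyword):
        """Step 2: the server answers keyword queries from its index."""
        return sorted(self.index[keyword.lower()])

directory = Directory()
directory.register("10.0.0.1", ["Homer - theme.mp3"])
directory.register("10.0.0.2", ["homer quotes.mp3"])
print(directory.search("homer"))             # hits from both peers
```

The search itself is trivial; the point is that all index state lives on one server, which is exactly what makes the design fast to query but fragile, as discussed below.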
Napster: How Did It Work
[Diagrams: a client interacting with the napster.com centralized directory]
1. The file list and IP address are uploaded to the centralized directory.
2. The user sends a keyword search to the server; query and results are exchanged with the directory.
3. The user pings the hosts that apparently have the data, looking for the best transfer rate.
4. The user chooses a server and retrieves the file directly from it.
• Napster's centralized server farm had a difficult time keeping up with the traffic.
Napster History
• 5/99: Shawn Fanning (freshman, Northeastern U.) founds the Napster online music service
• 12/99: first lawsuit
• 3/00: 25% of the University of Wisconsin traffic is Napster
• 2/01: US Circuit Court of Appeals: Napster knew users were violating copyright laws
Napster
• Judge orders Napster to pull the plug in July '01
• Other file sharing apps take over!
• 7/01: number of simultaneous online users:
  • Napster: 160K
  • Gnutella: 40K
  • Morpheus (KaZaA): 300K
[Chart: traffic in bits per second over time for gnutella, napster, and fastrack (KaZaA)]
Napster: Discussion
• Locates files quickly
• Vulnerable to censorship and technical failure
• Popular data becomes less accessible because of the request load on the central server
Outline
• P2P systems overview
• The ancestor: Napster
• Only file sharing?
  • distributed computing, collaborative environments, anonymity...
• Why P2P?
  • File sharing, file sharing, file sharing...
SETI@home
• Search for Extraterrestrial Intelligence
• Searches through massive amounts of radio-telescope data to look for signals
• Builds a huge virtual computer by using idle cycles on Internet computers
• Conceived 1995, launched April 1999
SETI@home
• Distributes a screen-saver-based application to users
• Applies signal-analysis algorithms to different data sets to process the radio-telescope data
[Diagram: 1. user installs the screen saver; 2. the SETI client (screen saver) starts; 3. the client gets data from the SETI@home main server and runs; 4. the client sends results back to the server]
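A hypothetical sketch of the client cycle in the diagram above; the work queue and the analysis routine are invented stand-ins for the real server protocol and signal-analysis code.

```python
# Sketch of the SETI@home-style fetch/compute/report loop (steps 2-4 above).
def analyze(work_unit):
    # Stand-in for the real signal-analysis algorithms (FFTs, pulse search, ...).
    return {"unit": work_unit, "candidate_signals": 0}

def run_client(work_queue, send_result):
    while work_queue:                  # 3. client gets data from the server...
        unit = work_queue.pop(0)
        result = analyze(unit)         #    ...and runs the analysis while idle
        send_result(result)            # 4. client sends results back

run_client(["work-unit-1", "work-unit-2"], send_result=print)
```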
Why don't we build a huge supercomputer?
• Supercomputer time is expensive
• Home PCs are commonly idle
Why don't we build a huge supercomputer?
• The Top500 supercomputer list over time follows a Zipf-like distribution: Perf(rank) ≈ C · rank^k, with k ≈ -0.68
[Plots: Linpack GFLOPS (log scale) vs. rank (log scale) for the 1995-2003 lists, and the evolution of the parameter k between about -0.68 and -0.84 over 1995-2003]
• Increasing interest in aggregating the capabilities of the machines in the tail of this distribution
• A virtual machine that aggregates the last 10 in the Top500 would rank 32nd in '95 but 14th in '03
• Grid computing is, in part, a result of this trend: an infrastructure enabling controlled, secure resource sharing (for a relatively small number of resources)
Lessons from SETI@home
• This technology can be applied to real problems
• Expected 100K participants, but got 3 million (0.5M active) in 226 countries
• 40 TB of data recorded and processed
• 25 TeraFLOPs on average over 2005
• Almost 1 million years of CPU time
• No ET signals yet, but other results
• Followers: Genome@home, Folding@home, Cancer Research Initiative
Groove P2P Work Environment
• Ideal for dynamic team work
[Diagram: users, developers, tech partners/suppliers/family, and IT managers interact through a shared virtual space on the Groove network & web-service platform: editing/uploading/storing contents, downloading/updating components, secure, faster and cost-effective online interaction, integration with existing systems, functional extension, centralized IS control for the IT manager, better real-time interaction]
Outline
• P2P systems overview
• The ancestor: Napster
• Only file sharing?
  • distributed computing, collaborative environments, anonymity...
• Why P2P?
  • File sharing, file sharing, file sharing...
Why P2P?
• Distributed systems pros...
  • scalability, reliability, savings, ...
• ...and cons
  • complexity, management, security
• The Internet has three valuable fundamental assets...
  • information
  • computing resources
  • bandwidth
• ...all of which are vastly under-utilized, partly due to the traditional client-server model
Why P2P?
• No single search engine can locate and catalog the ever-increasing amount of information on the Web in a timely way
• Moreover, a huge amount of information is transient and not subject to capture by techniques such as Web crawling
• Google claims that it searches about 1.3x10^8 web pages
• Finding useful information in real time is increasingly difficult!
Why P2P?
• Although miles of new fiber have been installed, the new bandwidth gets little use if everyone goes to Yahoo for content and to eBay for auctions
• Instead, hot spots just get hotter while cold pipes remain cold
• This is partly why most people still experience congestion over the Internet, even though a single fiber's bandwidth has increased by a factor of 10^6 since 1975, doubling every 16 months
Why P2P?
• P2P can potentially eliminate the single-source bottleneck
• P2P can be used to distribute data and control, and to load-balance requests across the Net
• P2P potentially eliminates the risk of a single point of failure
• P2P infrastructure allows direct access and shared space, which can enable remote maintenance capability
P2P impact today (1)
• Widespread adoption
  • KaZaA: 360 million downloads (1.3M/week), one of the most popular applications ever!
• Leading to (almost) zero-cost content distribution
  • forcing companies to change their business models; it might also impact copyright laws

Users per network (sources: www.slyck.com, www.kazaa.com, July '04):
  FastTrack  2,460,120
  eDonkey    1,987,097
  Overnet    1,261,568
  iMesh        803,420
  Warez        440,289
  Gnutella     389,678
P2P impact today (2)
• P2P applications market (Solomon Smith Barney estimate)
  • 5,800 million dollars in 2003
  • 36,500 million dollars in 2004
• Resource savings
  • SETI@home: $1.5M/year in additional power consumed
P2P impact today (3)
• P2P traffic in the network
  • from 20% up to 80%!!
• Driving adoption of consumer broadband
• Need for control, according to network providers and administrators
Gnutella
• The focus is on a decentralized method of searching for files
  • the central directory server is no longer the bottleneck
  • more difficult to "pull the plug"
• Each application instance serves to:
  • store selected files
  • route queries from and to its neighboring peers
  • respond to queries if the file is stored locally
  • serve files
Gnutella
• Gnutella history:
  • 3/14/00: release by AOL, almost immediately withdrawn
  • became open source
  • many iterations to fix the poor initial design (which turned many people off)
• Issues:
  • How much traffic does one query generate?
  • How many hosts can it support at once?
  • What is the latency associated with querying?
  • Is there a bottleneck?
Gnutella: Searching
• Searching by flooding (see the sketch below):
  • A Query packet might ask, "Do you have any content that matches the string 'Homer'?" If a node does not have the requested file, 7 of its neighbors (the Gnutella default) are queried.
  • If the neighbors do not have it, they contact 7 of their neighbors.
  • Maximum hop count: 10 (this is called the time-to-live, TTL)
  • Reverse path forwarding for responses (not files)
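A minimal sketch of TTL-limited flooding over an overlay graph (the data structures are hypothetical; keyword matching at each node is omitted for brevity):

```python
# Sketch of Gnutella-style TTL-limited query flooding.
def flood_query(overlay, start, ttl=10, fanout=7):
    """overlay: dict node -> list of neighbour nodes.
    Returns the set of nodes reached by the query."""
    reached = {start}
    frontier = [start]
    while frontier and ttl > 0:
        next_frontier = []
        for node in frontier:
            for neigh in overlay[node][:fanout]:   # ask up to 7 neighbours
                if neigh not in reached:
                    reached.add(neigh)
                    next_frontier.append(neigh)
        frontier = next_frontier
        ttl -= 1                                   # each hop consumes one TTL unit
    return reached
```

With fanout 7 and TTL 10 the number of query messages can grow geometrically with the hop count, which is exactly the traffic question raised in the issues above.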
Gnutella: Searching
[Diagrams: a query floods hop by hop through the overlay; QueryHits travel back along the reverse path]
• Downloading (see the sketch below)
  • Peers respond with a "QueryHit" (contains contact info)
  • File transfers use a direct connection, via the HTTP protocol's GET method
  • When there is a firewall, a "Push" packet is used: the request is rerouted via the Push path
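For the direct download, the Gnutella 0.4 description has the client name the file by its index and name in an HTTP-style request; the exact header set varies by client, so treat this sketch as illustrative rather than normative:

```python
# Rough sketch of the direct download request that follows a QueryHit.
def download_request(file_index, file_name, start_byte=0):
    return (f"GET /get/{file_index}/{file_name} HTTP/1.0\r\n"
            f"Connection: Keep-Alive\r\n"
            f"Range: bytes={start_byte}-\r\n\r\n")

print(download_request(2468, "Homer.mp3"))
```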
Gnutella: arrival of a new Peer
[Diagram: a new peer joining the overlay]
KaZaA: Architecture
• Each peer is either a supernode (SN) or is assigned to a supernode
  • ordinary-node-to-SN connections: 56 min on average
• Each SN has about 100-150 children
• Roughly 30,000 SNs
• Each supernode has TCP connections with 30-50 other supernodes
  • SN-to-SN connections: 23 min on average
Free Rider Problem
• No individual is willing to contribute towards the cost of something (a public good) when he hopes that someone else will bear the cost instead
• In P2P, many people just download files contributed by others and never share any of their files
• A few servers become the hot spots: Anonymous? Copyright? Privacy? Scalability? Is it P2P?
How to build a reliable P2P system
• Requirement: contribution should be predictable
• Peers can be motivated using economic principles:
  • monetary payment (one pays to consume resources and is paid to contribute resources)
  • differential service (peers that contribute more get better quality of service)
Real P2P systems
• Differential service (peers that contribute more get better quality of service)
  • KaZaA
    • participation level: upload/download ratio
  • BitTorrent
    • bartered "tit for tat" download bandwidth
    • download one (random) chunk from a storage peer, slowly
    • subsequent chunks are bartered with concurrent downloaders
    • the more chunks you can upload, the more you can download
    • download speed starts slow, then goes fast
BitTorrent: A community for each content
[Diagram: the peer fetches the .torrent file (GET .torrent) from a web server, sends an announce to the tracker, and receives the peer addresses (the peer set)]
Seeders and leechers
• seeders have the entire file
• leechers are still downloading and hold only part of the file
Peer and Piece Selection
• At the core of any P2P protocol
• Peer selection
  • maximize the capacity of service
  • foster reciprocation and prevent free riders
  • choice of the peers to upload to
  • efficiency criteria
Peer and Piece Selection
• Piece selection
  • which pieces to download from peers
  • should guarantee a high piece diversity
    • always find an interesting piece in any other peer
    • do not bias peer selection
25 Choke and Rarest First Algorithms
• Choke algorithm (see the sketch below)
  • Leechers: upload to the peers (unchoke them) from which we are downloading the fastest
    • re-evaluated periodically (every 10 s)
  • Optimistic unchoke
    • re-evaluated periodically (every 30 s)
  • 3 regular unchokes + 1 optimistic unchoke
  • Seeds: as leechers, but upload to the fastest downloaders
    • different algorithms in newer versions
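A sketch of one leecher choke round under the rules above (the data structures are hypothetical):

```python
# Sketch of a leecher's 10-second choke round.
import random

def choke_round(interested_peers, download_rate, n_unchoke=3):
    """interested_peers: list of remote peers; download_rate: peer -> bytes/s
    we currently receive from that peer. Returns the peers to unchoke."""
    by_rate = sorted(interested_peers,
                     key=lambda p: download_rate.get(p, 0), reverse=True)
    unchoked = set(by_rate[:n_unchoke])           # the fastest uploaders to us
    others = [p for p in interested_peers if p not in unchoked]
    if others:
        unchoked.add(random.choice(others))       # the optimistic unchoke
    return unchoked
```

The optimistic unchoke is what lets a peer discover faster partners, and it gives newcomers, who have nothing to trade yet, a way to get started. (In the real client it is rotated every 30 s, i.e. every third round.)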
Choke and Rarest First Algorithms
• Rarest first algorithm (see the sketch below)
  • choose the pieces that are locally rarest (rarest within the peer set)
  • for short: rarest first
  • random first policy: the very first pieces are chosen at random, so a new peer quickly obtains complete pieces to trade
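A sketch of rarest-first selection over the peer set's bitfields (the data structures are hypothetical):

```python
# Sketch of rarest-first piece selection.
from collections import Counter

def next_piece(my_pieces, peer_bitfields):
    """my_pieces: set of piece indices we already have;
    peer_bitfields: peer -> set of piece indices that peer holds."""
    counts = Counter()
    for pieces in peer_bitfields.values():
        counts.update(pieces - my_pieces)     # count only pieces we still miss
    if not counts:
        return None                           # nothing interesting in the peer set
    return min(counts, key=counts.get)        # the locally rarest missing piece
```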
Some Real Numbers
• Torrent characteristics
  • Torrent size: from a few peers to 100,000 peers
    • popular torrents: between 10,000 and 50,000 peers
  • Content size: from a few kB to 4 GB
    • TV series: 300 MB; movie: 600 MB; DVD image: 4 GB
  • Piece size: {256, 512, 1024} kB
    • typical case: ~1000 pieces per content (see the quick check below)
  • Peer set size: 80 peers
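A quick sanity check of the figures above: with 512 kB pieces, a 600 MB movie is split into roughly the "1000 pieces" typical case.

```python
# Piece counts for the typical content sizes quoted above (512 kB pieces).
def n_pieces(content_bytes, piece_bytes):
    return -(-content_bytes // piece_bytes)   # ceiling division

for label, size_mb in [("TV series", 300), ("Movie", 600), ("DVD image", 4096)]:
    print(f"{label}: {n_pieces(size_mb * 2**20, 512 * 2**10)} pieces of 512 kB")
```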
Evolution of BitTorrent: support for the UDP protocol
• to reduce the load on trackers
Evolution of BitTorrent: multiple trackers
• The .torrent file carries a list of trackers
• case 1): {{Tracker1},{Tracker2}}
[Diagram: announce messages and peer-address replies involving Tracker1 (1) and Tracker2 (2)]
Evolution of BitTorrent: multiple trackers
• The .torrent file carries a list of trackers
• case 2): {{Tracker1,Tracker2}}
[Diagram: announce messages and peer-address replies exchanged with Tracker1 (1) and Tracker2 (2)]
Evolution of BitTorrent: multiple trackers
• The .torrent file carries a list of trackers
• case 3): {{Tracker1,Tracker2},{Tracker3}}
• The tier semantics are sketched below.
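The tier semantics, later standardized as the multitracker metadata extension (BEP 12), can be sketched as follows: trackers within a tier are tried in random order, and a later tier is contacted only if every tracker in the earlier tiers fails. The helper below is illustrative; `try_tracker` is a hypothetical callable.

```python
# Sketch of multitracker tier handling (BEP 12-style semantics).
import random

def announce(tiers, try_tracker):
    """tiers: list of lists of tracker URLs, e.g. [["t1", "t2"], ["t3"]];
    try_tracker: callable returning a peer list, raising OSError on failure."""
    for tier in tiers:
        for url in random.sample(tier, len(tier)):   # shuffle within the tier
            try:
                return try_tracker(url)              # first responder wins
            except OSError:
                continue                             # try the next tracker
    raise RuntimeError("all trackers failed")
```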
Structured P2P
• Second generation of P2P overlay networks
  • self-organizing
  • load balanced
  • fault-tolerant
• Scalable: guarantees on the number of hops to answer a query
• Major difference with unstructured P2P systems:
  • based on a distributed hash table (DHT) interface
Distributed Hash Tables (DHT)
• Distributed version of a hash table data structure
• Stores (key, value) pairs
  • the key is like a filename
  • the value can be file contents
• Goal: efficiently insert/lookup/delete (key, value) pairs
• Each peer stores a subset of the (key, value) pairs in the system
• Core operation: find the node responsible for a key (see the sketch below)
  • map key to node
  • efficiently route insert/lookup/delete requests to this node
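A minimal sketch of the key-to-node mapping using consistent hashing on a circular identifier space, Chord-style; unlike a real DHT, it assumes global knowledge of the node set.

```python
# Sketch of the core DHT operation: map a key to its responsible node.
import hashlib
from bisect import bisect_right

ID_SPACE = 2**32

def node_id(name):
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % ID_SPACE

def responsible_node(key, nodes):
    """Map the key to a point on the ring; the first node clockwise owns it."""
    ring = sorted((node_id(n), n) for n in nodes)
    k = node_id(key)
    i = bisect_right(ring, (k, "")) % len(ring)   # wrap around the ring
    return ring[i][1]

print(responsible_node("song.mp3", ["peer-a", "peer-b", "peer-c"]))
```

A real DHT reaches the responsible node without global knowledge by routing greedily through small per-node routing tables, as the following slides explain.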
DHT Desirable Properties
• Keys are mapped evenly to all nodes in the network
• Each node maintains information about only a few other nodes
• Messages can be routed to a node efficiently
• Node arrivals/departures only affect a few nodes
Structured Overlays
• Properties
  • topology is tightly controlled
    • well-defined rules determine to which other nodes a node connects
  • files/information placed at precisely specified locations
    • a hash function maps file names to nodes
  • scalable routing based on file attributes
Lookup Problem in P2P Systems
How do you find any given data item in a large P2P system in a scalable manner, without any centralized servers or hierarchy?
Distributed Structured P2P Overlay Application
• Structured P2P systems implement the DHT abstraction API:
  • Put(key, value)
  • Remove(key)
  • value = Get(key)
[Figure 1: Application interface for structured DHT-based P2P overlay systems (Lua et al. 2004): the application sits on top of the distributed hash table, which spans the participating peers]
Distributed Hash Tables (DHTs)
• In DHT-based systems:
  • peers are assigned uniform random NodeIDs
  • data objects are assigned unique identifiers called keys
  • keys are mapped by the overlay network protocol to unique peers
• Overlay networks support the scalable storage and retrieval of {key, value} pairs:
  • Put(key, value)
  • Remove(key)
  • value = Get(key)
• Each peer maintains a small routing table consisting of its neighbouring peers' NodeIDs and IP addresses
• Lookup queries are forwarded across overlay paths to peers with NodeIDs progressively closer to the key (see the sketch below)
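A sketch of greedy lookup forwarding; XOR distance is used here as in Kademlia, while other DHTs use other notions of closeness (the data structures are hypothetical):

```python
# Sketch of greedy DHT lookup: each hop moves to the neighbour closest to the key.
def lookup(start, key_id, routing_tables):
    """routing_tables: node_id -> list of neighbour NodeIDs (all ints)."""
    node = start
    while True:
        best = min(routing_tables[node] + [node], key=lambda n: n ^ key_id)
        if best == node:        # no neighbour is closer: this node handles the key
            return node
        node = best
```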
DHT Routing Protocols
• DHT is a generic interface
• Different organization schemes for the data objects, their key space, and the routing strategies
• There are several implementations of this interface:
  • Chord [MIT]
  • Pastry [Microsoft Research UK, Rice University]
  • Tapestry [UC Berkeley]
  • Content Addressable Network (CAN) [UC Berkeley]
  • SkipNet [Microsoft Research US, Univ. of Washington]
  • Kademlia [New York University]
  • Viceroy [Israel, UC Berkeley]
  • P-Grid [EPFL Switzerland]
  • Freenet [Ian Clarke]
Content Addressable Network (CAN)
• Keys are hashed into a d-dimensional coordinate space
• Every node owns an individual, distinct zone within the overall space
• To store a pair (K, V):
  • K is mapped onto a point P in the coordinate space using a uniform hash function
  • the (key, value) pair is stored at the node that owns the zone containing P
[Figure: a 2-D unit square partitioned into zones owned by nodes A-E; e.g. node B's virtual coordinate zone is (0.5-1.0, 0.0-0.5)]
Routing in a CAN
• A CAN node maintains a routing table that holds information about its neighbour zones
• A node routes a message towards its destination by greedy forwarding to the neighbour closest to the destination (see the sketch below)
• For a d-dimensional space partitioned into n equal zones:
  • the average routing path length is (d/4) n^(1/d) hops
  • individual nodes maintain 2d neighbours
[Figure: sample routing path from node 1 to point (x,y); node 1's coordinate neighbour set = {2,3,4,5}]
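A sketch of CAN-style greedy forwarding on the unit torus (the data structures are hypothetical):

```python
# Sketch of greedy routing in a CAN's d-dimensional coordinate space.
def torus_distance(p, q, extent=1.0):
    # Euclidean distance with wrap-around per dimension (the space is a d-torus).
    return sum(min(abs(a - b), extent - abs(a - b)) ** 2
               for a, b in zip(p, q)) ** 0.5

def route(node, target, neighbours, zone_centre):
    """neighbours: node -> list of adjacent nodes; zone_centre: node -> point.
    Returns the hop-by-hop path toward the zone that contains `target`."""
    path = [node]
    while True:
        best = min(neighbours[node] + [node],
                   key=lambda n: torus_distance(zone_centre[n], target))
        if best == node:        # no neighbour is closer: this zone holds target
            return path
        node = best
        path.append(node)
```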
CAN: Node Join and Departure
• To join a CAN, a node:
  • retrieves a bootstrap node's IP address
  • gets the IP addresses of several live nodes and chooses one
  • randomly chooses a point P in the space
  • a message is routed to the node in whose zone P lies
  • that node splits its zone in half and assigns one half to the new node (see the sketch below)
  • both the new and the old node's neighbours are informed of the reallocation of space
• A node leaves a CAN:
  • it explicitly hands over its zone and (key, value) pairs to one of its neighbours
• A node fails:
  • handled through a takeover algorithm that ensures one of the failed node's neighbours takes over the zone
Evolution of BitTorrent: integration of a Kademlia DHT
[Diagram: a "query for content 1" is resolved through the DHT, which tracks the peer sets of Torrent 1 and Torrent 2]
DHT details
• Two different implementations (Azureus vs. Mainline & others)
• Where the DHT bootstrap nodes come from:
  • the .torrent file
  • embedded in the client
  • other peers