Gnutella Network
Total Page:16
File Type:pdf, Size:1020Kb
Peer-to-PeerArchitectureCaseStudy:GnutellaNetwork∗ MateiRipeanu [email protected] DepartmentofComputerScience TheUniversityofChicago 1100E.58thStreet,ChicagoIL60637 Tel:(773)955-4040(ext.57395) Fax:(773)702-8487 Abstract Despite recent excitement generated by the peer-to-peer (P2P) paradigm and the surprisingly rapiddeploymentofsomeP2Papplications,therearefewquantitativeevaluationsofP2Psystems behavior. The open architecture, achieved scale, and self-organizing structure of the Gnutella network make it an interesting P2P architecture to study. Like most other P2P applications, Gnutellabuilds,attheapplicationlevel,avirtualnetworkwithitsownroutingmechanisms.The topologyofthisvirtualnetworkandtheroutingmechanismsusedhaveasignificantinfluenceon applicationpropertiessuchasperformance,reliability,andscalability.Wehavebuilta“crawler” to extract the topology of Gnutella’s application level network. In this paper we analyze the topologygraphandevaluategeneratednetworktraffic.Thetwomajorfindingswefocusonare: (1)althoughGnutellaisnotapurepower-lawnetwork,itscurrentconfigurationhasthebenefits andthedrawbacksofapower-lawstructure,and(2)theGnutellavirtualnetworktopologydoes notmatchwelltheunderlyingInternettopology,henceleadingtoineffectiveuseofthephysical networkinginfrastructure.ThesefindingsguideustoproposechangestotheGnutellaprotocol and implementations that may bring significant performance and scalability improvements. AlthoughGnutellanetworkmightfade,webelievetheP2Pparadigmisheretostay.Inthislight, ourfindingsaswellasourmeasurementandanalysistechniquesbringpreciousinsightintoP2P systemdesigntradeoffs. Keywords:peer-to-peersystemevaluation,self-organizednetworks,power-lawnetwork, topologyanalysis. 1. Introduction Peer-to-peersystems(P2P)haveemergedasasignificantsocialandtechnicalphenomenonoverthe last year. They provide infrastructure for communities that share CPU cycles (e.g., SETI@Home, Entropia) and/or storage space (e.g., Napster, FreeNet, Gnutella), or that support collaborative environments(Groove).Twofactorshavefosteredtherecentexplosivegrowthofsuchsystems:first, thelowcostandhighavailabilityoflargenumbersofcomputingandstorageresources,andsecond, increasednetworkconnectivity.Asthesetrendscontinue,theP2Pparadigmisboundtobecomemore popular. Unliketraditionaldistributedsystems,P2Pnetworksaimtoaggregatelargenumbersofcomputersthat joinandleavethenetworkfrequentlyandthatmightnothavepermanentnetwork(IP)addresses.In ∗ AnextendedversionofthispaperwaspublishedasUniversityofChicagoTechnicalReportTR-2001-26. 1 pureP2Psystems,individualcomputerscommunicatedirectlywitheachotherandshareinformation andresourceswithoutusingdedicatedservers.Acommoncharacteristicofthisnewbreedofsystems isthattheybuild, attheapplicationlevel, avirtualnetworkwithitsownroutingmechanisms. The topology of the virtual network and the routing mechanisms used have a significant impact on application properties such as performance, reliability, and, in some cases, anonymity. The virtual topologyalsodeterminesthecommunicationcostsassociatedwithrunningtheP2Papplication,bothat individualhostsandintheaggregate.NotethatthedecentralizednatureofpureP2Psystemsmeans thatthesepropertiesareemergentproperties,determinedbyentirelylocaldecisionsmadebyindividual resources, based only on local information: we are dealing with a self-organized network of independententities. Theseconsiderationshavemotivatedustoconductadetailedstudyofthetopologyandprotocolsofa popularP2Psystem:Gnutella.Inthisstudy,webenefitedfromGnutella’slargeexistinguserbaseand open architecture, and, in effect, use the public Gnutella network as a large-scale, if uncontrolled, testbed.Weproceededasfollows.First,wecapturedthenetworktopology,itsgeneratedtraffic,and dynamicbehavior.Then,weusedthisrawdatatoperformamacroscopicanalysisofthenetwork,to evaluatecostsandbenefitsoftheP2Papproach,andtoinvestigatepossibleimprovementsthatwould allowbetterscalingandincreasedreliability. OurmeasurementsandanalysisoftheGnutellanetworkaredrivenbytwoprimaryquestions.Thefirst concernsitsconnectivitystructure.Recentresearch[1,8,7]showsthatnetworksasdiverseasnatural networksformedbymoleculesinacell,networksofpeopleinasocialgroup,ortheInternet,organize themselvessothatmostnodeshavefewlinkswhileatinynumberofnodes,calledhubs,havealarge numberoflinks.[14]findsthatnetworksfollowingthisorganizationalpattern(power-lawnetworks) display an unexpected degree of robustness: the ability of their nodes to communicate is unaffected evenbyextremelyhighfailurerates.However,errortolerancecomesatahighprice:thesenetworks are vulnerable to attacks, i.e., to the selection and removal of a few nodes that provide most of the network’s connectivity. We show here that, although Gnutella is not a pure power-law network, it preserves good fault tolerance characteristics while being less dependent than a pure power-law networkonhighlyconnectednodesthatareeasytosingleout(andattack). The second question concerns how well (if at all) Gnutella virtual network topology maps to the physicalInternetinfrastructure.Therearemultiplereasonstoanalyzethisissue.First,itisaquestion ofcrucialimportanceforInternetServiceProviders(ISP):ifthevirtualtopologydoesnotfollowthe physicalinfrastructure,thentheadditionalstressontheinfrastructureand,consequently,thecostsfor ISPs,areimmense.Thispointhasbeenraisedonvariousoccasions[9,12]but,asfarasweknow,we are the first to provide a quantitative evaluation on P2P application and Internet topology (mis)matching.Second,thescalabilityofanyP2Papplicationisultimatelydeterminedbyitsefficient useofunderlyingresources. WearenotthefirsttoanalyzetheGnutellanetwork.Inparticular,theDistributedSearchSolutions (DSS)group[15]haspublishedresultsoftheirGnutellasurveys[4,5],andothershaveusedtheirdata toanalyzeGnutellausers’behavior[2]andtoanalyzesearchprotocolsforpower-lawnetworks[6]. However,ournetworkcrawlingandanalysistechnology(developedindependentlyofthiswork)goes significantly further in terms of scale (both spatial and temporal) and sophistication. While DSS presentsonlyrawfactsaboutthenetwork,weanalyzethegeneratednetworktraffic,findpatternsin networkorganization,andinvestigateitsefficiencyinusingtheunderlyingnetworkinfrastructure. Therestofthepaperisstructuredasfollows:thenextsectionsuccinctlydescribesGnutellaprotocol andapplication.Section3introducesthecrawlerwedevelopedtodiscoverGnutella’svirtualnetwork 2 topology.InSection4weanalyzethenetworkandanswerthequestionsintroducedintheprevious paragraphs.WeconcludeinSection5. 2. GnutellaProtocol:DesignGoalsandDescription The Gnutella protocol [3] is an open, decentralized group membership and search protocol, mainly usedforfilesharing.ThetermGnutellaalsodesignatesthevirtualnetworkofInternetaccessiblehosts runningGnutella-speakingapplications(thisisthe“Gnutellanetwork”wemeasure)andanumberof smaller,andoftenprivate,disconnectednetworks. AsmostP2Pfilesharingapplications,Gnutellaprotocolwasdesignedtomeetthefollowinggoals: o Abilitytooperateinadynamicenvironment.P2Papplicationsoperateindynamicenvironments, wherehostsmayjoinorleavethenetworkfrequently.Theymustachieveflexibilityinorderto keepoperatingtransparentlydespiteaconstantlychangingsetofresources. o Performance and Scalability. P2P paradigm shows its full potential only on large-scale deploymentswherethelimitsofthetraditionalclient/serverparadigmbecomeobvious.Moreover, scalabilityisimportantasP2Papplicationsexhibitwhateconomistscallthe“networkeffect”[10]: thevalueofanetworktoanindividualuserincreaseswiththetotalnumberofusersparticipatingin the network. Ideally, when increasing the number of nodes, aggregate storage space and file availabilityshould growlinearly, responsetime shouldremainconstant, whilesearchthroughput shouldremainhighorgrow. o Reliability.Externalattacksshouldnotcausesignificantdataorperformanceloss. o Anonymity. Anonymity is valued as a means to protect privacy of people seeking or providing informationthatmaynotbepopular. Gnutella nodes, calledservents by developers, perform tasks normally associated with both SERVers andcliENTS.Theyprovideclient-sideinterfacesthroughwhichuserscanissuequeriesandviewsearch results,acceptqueriesfromotherservents,checkformatchesagainsttheirlocaldataset,andrespond withcorrespondingresults.Thesenodesarealsoresponsibleformanagingthebackgroundtrafficthat spreadstheinformationusedtomaintainnetworkintegrity. Inordertojointhesystemanewnode/serventinitiallyconnectstooneofseveralknownhoststhatare almostalwaysavailable(e.g.,gnutellahosts.com).Onceattachedtothenetwork(havingoneormore openconnectionswithnodesalreadyinthenetwork),nodessendmessagestointeractwitheachother. Messagescanbebroadcasted(i.e.,senttoallnodeswithwhichthesenderhasopenTCPconnections) orsimplyback-propagated(i.e.,sentonaspecificconnectiononthereverseofthepathtakenbyan initial, broadcasted, message). Several features of the protocol facilitate this broadcast/back-propagation mechanism. First, each message has a randomly generated identifier. Second, each node keeps a short memory of the recently routed messages, used to prevent re-broadcastingandimplementback-propagation.Third,messagesareflaggedwithtime-to-live(TTL) and“hopspassed”fields. Themessagesallowedinthenetworkare: ° GroupMembership(PINGandPONG)Messages.Anodejoiningthenetworkinitiatesabroadcasted PINGmessagetoannounceitspresence.WhenanodereceivesaPINGmessageitforwardsittoits neighborsandinitiatesaback-propagatedPONGmessage.ThePONGmessagecontainsinformation aboutthenodesuchasitsIPaddressandthenumberandsizeofsharedfiles. ° Search(QUERYandQUERYRESPONSE)Messages.QUERYmessagescontainauserspecifiedsearch