Gnutella Network

Peer-to-Peer Architecture Case Study: Gnutella Network∗

Matei Ripeanu
[email protected]
Department of Computer Science
The University of Chicago
1100 E. 58th Street, Chicago IL 60637
Tel: (773) 955-4040 (ext. 57395)
Fax: (773) 702-8487

Abstract

Despite recent excitement generated by the peer-to-peer (P2P) paradigm and the surprisingly rapid deployment of some P2P applications, there are few quantitative evaluations of P2P systems behavior. The open architecture, achieved scale, and self-organizing structure of the Gnutella network make it an interesting P2P architecture to study. Like most other P2P applications, Gnutella builds, at the application level, a virtual network with its own routing mechanisms. The topology of this virtual network and the routing mechanisms used have a significant influence on application properties such as performance, reliability, and scalability. We have built a “crawler” to extract the topology of Gnutella’s application level network. In this paper we analyze the topology graph and evaluate generated network traffic. The two major findings we focus on are: (1) although Gnutella is not a pure power-law network, its current configuration has the benefits and the drawbacks of a power-law structure, and (2) the Gnutella virtual network topology does not match well the underlying Internet topology, hence leading to ineffective use of the physical networking infrastructure. These findings guide us to propose changes to the Gnutella protocol and implementations that may bring significant performance and scalability improvements. Although Gnutella network might fade, we believe the P2P paradigm is here to stay. In this light, our findings as well as our measurement and analysis techniques bring precious insight into P2P system design tradeoffs.

Keywords: peer-to-peer system evaluation, self-organized networks, power-law network, topology analysis.

1. Introduction

Peer-to-peer systems (P2P) have emerged as a significant social and technical phenomenon over the last year.
They provide infrastructure for communities that share CPU cycles (e.g., SETI@Home, Entropia) and/or storage space (e.g., Napster, FreeNet, Gnutella), or that support collaborative environments (Groove). Two factors have fostered the recent explosive growth of such systems: first, the low cost and high availability of large numbers of computing and storage resources, and second, increased network connectivity. As these trends continue, the P2P paradigm is bound to become more popular.

Unlike traditional distributed systems, P2P networks aim to aggregate large numbers of computers that join and leave the network frequently and that might not have permanent network (IP) addresses. In pure P2P systems, individual computers communicate directly with each other and share information and resources without using dedicated servers. A common characteristic of this new breed of systems is that they build, at the application level, a virtual network with its own routing mechanisms. The topology of the virtual network and the routing mechanisms used have a significant impact on application properties such as performance, reliability, and, in some cases, anonymity. The virtual topology also determines the communication costs associated with running the P2P application, both at individual hosts and in the aggregate. Note that the decentralized nature of pure P2P systems means that these properties are emergent properties, determined by entirely local decisions made by individual resources, based only on local information: we are dealing with a self-organized network of independent entities.

∗ An extended version of this paper was published as University of Chicago Technical Report TR-2001-26.
These considerations have motivated us to conduct a detailed study of the topology and protocols of a popular P2P system: Gnutella. In this study, we benefited from Gnutella’s large existing user base and open architecture, and, in effect, use the public Gnutella network as a large-scale, if uncontrolled, testbed. We proceeded as follows. First, we captured the network topology, its generated traffic, and dynamic behavior. Then, we used this raw data to perform a macroscopic analysis of the network, to evaluate costs and benefits of the P2P approach, and to investigate possible improvements that would allow better scaling and increased reliability.

Our measurements and analysis of the Gnutella network are driven by two primary questions. The first concerns its connectivity structure. Recent research [1,8,7] shows that networks as diverse as natural networks formed by molecules in a cell, networks of people in a social group, or the Internet, organize themselves so that most nodes have few links while a tiny number of nodes, called hubs, have a large number of links. [14] finds that networks following this organizational pattern (power-law networks) display an unexpected degree of robustness: the ability of their nodes to communicate is unaffected even by extremely high failure rates. However, error tolerance comes at a high price: these networks are vulnerable to attacks, i.e., to the selection and removal of a few nodes that provide most of the network’s connectivity. We show here that, although Gnutella is not a pure power-law network, it preserves good fault tolerance characteristics while being less dependent than a pure power-law network on highly connected nodes that are easy to single out (and attack).
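The contrast between random failures and targeted attacks on hub-dominated networks can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the graph is synthetic, grown with a simple preferential-attachment rule, and is not the measured Gnutella topology; the function names, graph size, and removal fraction are hypothetical choices made for this example.

```python
import random
from collections import deque


def preferential_attachment_graph(n, m, rng):
    """Grow a synthetic graph: each new node links to m distinct existing
    nodes picked with probability proportional to their current degree."""
    adj = {i: set() for i in range(n)}
    endpoints = []  # every node appears once per incident edge
    for i in range(m + 1):  # seed: a small clique of m + 1 nodes
        for j in range(i):
            adj[i].add(j)
            adj[j].add(i)
            endpoints += [i, j]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            endpoints += [new, t]
    return adj


def largest_component_fraction(adj, removed):
    """Fraction of the ORIGINAL nodes that remain in the largest connected
    component once the nodes in `removed` are deleted."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        best = max(best, size)
    return best / len(adj)


def compare_failure_modes(n=2000, m=2, fraction=0.1, seed=1):
    """Return (after random failures, after targeted hub removal)."""
    rng = random.Random(seed)
    adj = preferential_attachment_graph(n, m, rng)
    k = int(n * fraction)
    random_removed = rng.sample(sorted(adj), k)
    # Targeted attack: remove the k nodes with the highest degree.
    hubs_removed = sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k]
    return (largest_component_fraction(adj, random_removed),
            largest_component_fraction(adj, hubs_removed))


if __name__ == "__main__":
    after_random, after_attack = compare_failure_modes()
    print("random failures:", after_random)
    print("targeted attack:", after_attack)
```

The two returned values are the fractions of the original nodes that stay in the largest connected component. On a synthetic graph of this kind, removing the hubs should fragment the network noticeably more than removing the same number of randomly chosen nodes.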
The second question concerns how well (if at all) the Gnutella virtual network topology maps to the physical Internet infrastructure. There are multiple reasons to analyze this issue. First, it is a question of crucial importance for Internet Service Providers (ISPs): if the virtual topology does not follow the physical infrastructure, then the additional stress on the infrastructure and, consequently, the costs for ISPs, are immense. This point has been raised on various occasions [9,12] but, as far as we know, we are the first to provide a quantitative evaluation of P2P application and Internet topology (mis)matching. Second, the scalability of any P2P application is ultimately determined by its efficient use of underlying resources.

We are not the first to analyze the Gnutella network. In particular, the Distributed Search Solutions (DSS) group [15] has published results of their Gnutella surveys [4,5], and others have used their data to analyze Gnutella users’ behavior [2] and to analyze search protocols for power-law networks [6]. However, our network crawling and analysis technology (developed independently of this work) goes significantly further in terms of scale (both spatial and temporal) and sophistication. While DSS presents only raw facts about the network, we analyze the generated network traffic, find patterns in network organization, and investigate its efficiency in using the underlying network infrastructure.

The rest of the paper is structured as follows: the next section succinctly describes the Gnutella protocol and application. Section 3 introduces the crawler we developed to discover Gnutella’s virtual network topology. In Section 4 we analyze the network and answer the questions introduced in the previous paragraphs. We conclude in Section 5.

2. Gnutella Protocol: Design Goals and Description

The Gnutella protocol [3] is an open, decentralized group membership and search protocol, mainly used for file sharing. The term Gnutella also designates the virtual network of Internet accessible hosts running Gnutella-speaking applications (this is the “Gnutella network” we measure) and a number of smaller, and often private, disconnected networks.
Like most P2P file-sharing applications, the Gnutella protocol was designed to meet the following goals:

• Ability to operate in a dynamic environment. P2P applications operate in dynamic environments, where hosts may join or leave the network frequently. They must achieve flexibility in order to keep operating transparently despite a constantly changing set of resources.
• Performance and Scalability. The P2P paradigm shows its full potential only on large-scale deployments where the limits of the traditional client/server paradigm become obvious. Moreover, scalability is important as P2P applications exhibit what economists call the “network effect” [10]: the value of a network to an individual user increases with the total number of users participating in the network. Ideally, when increasing the number of nodes, aggregate storage space and file availability should grow linearly, response time should remain constant, while search throughput should remain high or grow.
• Reliability. External attacks should not cause significant data or performance loss.
• Anonymity. Anonymity is valued as a means to protect the privacy of people seeking or providing information that may not be popular.

Gnutella nodes, called servents by developers, perform tasks normally associated with both SERVers and cliENTS. They provide client-side interfaces through which users can issue queries and view search results, accept queries from other servents, check for matches against their local data set, and respond with corresponding results. These nodes are also responsible for managing the background traffic that spreads the information used to maintain network integrity.

In order to join the system, a new node/servent initially connects to one of several known hosts that are almost always available (e.g., gnutellahosts.com). Once attached to the network (having one or more open connections with nodes already in the network), nodes send messages to interact with each other. Messages can be broadcasted (i.e., sent to all nodes with which the sender has open TCP connections) or simply back-propagated (i.e., sent on a specific connection on the reverse of the path taken by an initial, broadcasted, message).
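The broadcast and back-propagation steps can be sketched as a small in-memory simulation in Python. This is a simplified illustration, not the protocol implementation: it relies on a random message identifier, a per-node record of recently routed messages, and a TTL field with a hop counter, while the class and function names (`Servent`, `flood_ping`, `send_pong`) and the exact TTL handling are assumptions made for this example. Real servents exchange these messages over TCP connections.

```python
import uuid
from collections import deque


class Servent:
    """A toy Gnutella node; every name in this sketch is hypothetical."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []  # servents with an open connection to this one
        self.routed = {}     # message id -> neighbor the message came from
        self.pongs = []      # (responder name, hops) pairs received here


def connect(a, b):
    """Open a bidirectional connection between two servents."""
    a.neighbors.append(b)
    b.neighbors.append(a)


def send_pong(node, msg_id, payload):
    """Back-propagate a reply hop by hop along the reverse of the path
    taken by the broadcasted message, using each node's routing memory."""
    while node.routed[msg_id] is not None:
        node = node.routed[msg_id]
    node.pongs.append(payload)


def flood_ping(origin, ttl):
    """Broadcast a PING from `origin` and return its message identifier."""
    msg_id = uuid.uuid4().hex     # randomly generated identifier
    origin.routed[msg_id] = None  # the origin has no upstream neighbor
    queue = deque((origin, nb, ttl, 0) for nb in origin.neighbors)
    while queue:
        sender, node, ttl_left, hops = queue.popleft()
        if msg_id in node.routed:     # already routed: drop the duplicate
            continue
        node.routed[msg_id] = sender  # remember the reverse path
        ttl_left, hops = ttl_left - 1, hops + 1
        send_pong(node, msg_id, (node.name, hops))
        if ttl_left > 0:              # forward only while TTL remains
            for nb in node.neighbors:
                if nb is not sender:
                    queue.append((node, nb, ttl_left, hops))
    return msg_id


if __name__ == "__main__":
    a, b, c, d = (Servent(n) for n in "abcd")
    connect(a, b)
    connect(b, c)
    connect(c, d)
    flood_ping(a, ttl=2)
    print(a.pongs)  # replies that travelled back to the origin
```

In the chain a, b, c, d used above, a PING sent by `a` with a TTL of 2 reaches `b` and `c` but not `d`, and both replies return to `a` along the recorded reverse path. The routing memory serves two purposes at once: it suppresses duplicates that arrive over cycles and it tells each node where to send a reply.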
Several features of the protocol facilitate this broadcast/back-propagation mechanism. First, each message has a randomly generated identifier. Second, each node keeps a short memory of the recently routed messages, used to prevent re-broadcasting and implement back-propagation. Third, messages are flagged with time-to-live (TTL) and “hops passed” fields.

The messages allowed in the network are:

• Group Membership (PING and PONG) Messages. A node joining the network initiates a broadcasted PING message to announce its presence. When a node receives a PING message it forwards it to its neighbors and initiates a back-propagated PONG message. The PONG message contains information about the node such as its IP address and the number and size of shared files.
• Search (QUERY and QUERY RESPONSE) Messages. QUERY messages contain a user specified search
