Substrate Support for Peer-To-Peer Applications

SUBSTRATE SUPPORT FOR PEER-TO-PEER APPLICATIONS

A Dissertation Presented to the Faculty of the Graduate School of Cornell University in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

by Vivek Vishnumurthy
January 2009

© 2009 Vivek Vishnumurthy
ALL RIGHTS RESERVED

SUBSTRATE SUPPORT FOR PEER-TO-PEER APPLICATIONS
Vivek Vishnumurthy, Ph.D.
Cornell University 2009

Peer-to-peer (P2P) applications in general, and unstructured applications in particular, have been very popular in the recent past. In this thesis, we identify common problems encountered in the course of developing diverse peer-to-peer applications, and propose solutions to them. The two broad problems we study are (i) load-balancing in heterogeneous unstructured P2P networks, where capacities to support load differ among the members of the network, and application load is to be distributed in accordance with members’ capacities to support it, and (ii) “extreme” nearest neighbor discovery in P2P networks, where the intent is to discover the latency-wise nearest peer in a P2P network, even when the nearest peer is in the same extended LAN or campus network. The goal of this thesis is to come up with a powerful set of basic mechanisms that the developers of many different P2P applications can reuse, rather than having to repeatedly reinvent the same solutions.

We examine two main causes of load in unstructured networks: the underlying random graph, and the process of random node selection, where random peers are periodically picked to handle application load. We extensively evaluate different approaches to heterogeneous random graph construction and peer selection, and identify our Swaplinks algorithm as the best approach. Swaplinks builds robust graphs where node degrees are close to their desired degrees, provides a good base to perform random peer selection, and is virtually free of tuning knobs, making it very practical to deploy.
In our study of the extreme-nearest neighbor discovery problem, we identify and demonstrate a condition we call the clustering condition caused by the Internet last-hop architecture – under the clustering condition, many different peers are located at about the same latency from one another, making it expensive for previous nearest-peer solutions to correctly find the nearest peer. We propose different solutions to overcome this problem, and show, using preliminary evaluations, that one of them is very attractive.

BIOGRAPHICAL SKETCH

Vivek was born in the city of Shimoga in Karnataka state in India, and was brought up in Bangalore, Karnataka, India. He got his Bachelor of Technology (B.Tech.) degree from the Indian Institute of Technology at Madras (IIT-Madras) in 2002. He has been in Cornell University from August 2002 to August 2008. He received an M.S. degree from Cornell in 2007 and is expecting the Ph.D. degree to be conferred in January 2009. He will be joining MokaFive, a start-up based on desktop virtualization, as a Software Engineer in September 2008.

ACKNOWLEDGEMENTS

Firstly, I would like to convey my gratitude to my advisor Paul Francis for his guidance throughout my Ph.D. research. His insistence on simplicity and practicality in the course of research is something I hope to retain over my future career.

I would also like to thank my committee members Jon Kleinberg and David Shmoys, and Emin Gün Sirer, Robbert van Renesse, and Eva Tardos for their help and comments at various points of my research.

I am grateful to Sergio Gelato, Daniel Kartch, Steve Holland, Aleksey Nogin, Andrew Myers, and Nate Nystrom for developing the LaTeX template this thesis is based on.

Many thanks to the people who laid out the plans of beautiful Cornell University and Ithaca – towards the end of my PhD I was even enjoying the cold winters here!

I have been lucky to have friendly and helpful officemates throughout my PhD.
Heartfelt thanks to Ashwin, Oren, Liviu, Parvati, and Panda.

Many thanks to Saikat, Hitesh, Bernard, Dan, Alan, Oliver, Jed, and Kevin in the Systems Lab for all the fun technical discussions and non-technical interactions over the years.

I have been extremely fortunate to have an awesome and caring set of friends – thanks to Animashree, Ashwin, Lavanya, Mahesh, Muthu, Pappu, Parvathinathan, Parvati, Prakash Linga, Smita, Surabhi, and Vidhyashankar, without whom my stay in Cornell and Ithaca would have been far less enjoyable. Also, thanks to Lavanya, Ashwin, and Parvati for making sure (to the extent possible!) that I was actually working on my thesis when I was supposed to!

Finally, I will forever be indebted to my parents and my sister for having instilled into me the value of education and hard work, for their constant and unwavering support, and for always encouraging me to strive for the best.

TABLE OF CONTENTS

Biographical Sketch
Acknowledgements
Table of Contents
List of Tables
List of Figures

1 Introduction
  1.1 Contributions of this thesis
  1.2 Bibliography
2 Related Work
  2.1 Load-balancing in Unstructured P2P Networks
    2.1.1 Unstructured Graph Construction
    2.1.2 Random Peer Selection
  2.2 Heterogeneous Peer Selection Using Structured Approaches
  2.3 Finding the nearest peer
3 Heterogeneous Overlay Construction and Random Node Selection in Unstructured P2P Networks
  3.1 Introduction
  3.2 Initial Node Discovery
  3.3 Algorithms for graph construction
    3.3.1 Self Loops (SL)
    3.3.2 The Inverse-Probability walks
    3.3.3 Iterative Scaling (IS)
    3.3.4 Some Issues with Biased Walk Approaches
    3.3.5 Swaplinks (SW)
  3.4 Selection Walks
  3.5 Distributed Outdegree Computation
  3.6 Experimental Results
    3.6.1 Graph Construction (Homogeneous Case)
    3.6.2 Graph Construction Under Heterogeneity
    3.6.3 Quality of Random Selection on Homogeneous Graphs
    3.6.4 Selection with Heterogeneity
    3.6.5 Scaling to Larger Sizes
    3.6.6 The Cursor Approach
  3.7 Conclusions and Discussion
  3.8 Acknowledgements
4 Comparison of Structured and Unstructured Approaches to Heterogeneous Peer Selection
  4.1 Introduction
  4.2 Swaplinks Implementation
  4.3 Adapting the Bamboo DHT to Heterogeneity
    4.3.1 The Bamboo Distributed Hash Table
    4.3.2 KRB
  4.4 Performance Evaluation
    4.4.1 Evaluation under representative churn scenarios
    4.4.2 Extreme churn
    4.4.3 Evaluation over PlanetLab
    4.4.4 Smart-Pinging
  4.5 Conclusions and Discussion
  4.6 Acknowledgments
5 Finding the nearest peer in P2P networks
  5.1 Introduction
  5.2 The Last-Hop Clustering Effect in the Internet
    5.2.1 The Clustering Condition
    5.2.2 Common Assumptions Behind Nearest-Peer Algorithms
    5.2.3 Behavior of Sample Nearest-Peer Algorithms Under the Clustering Condition
  5.3 Clustering Condition in the Internet
    5.3.1 Latency Measurement Results over DNS servers
    5.3.2 Measurement over Azureus Client IP Addresses
  5.4 Meridian Simulations under the Clustering Condition
  5.5 Mechanisms to Handle Clustering Effect
  5.6 Conclusions and Future Work
  5.7 Acknowledgments
6 Summary, Conclusions and Future Work
  6.1 Limitations, Future Enhancements, and Open Problems
  6.2 Offshoots from Thesis Research
Bibliography

LIST OF TABLES

3.1 Summary of the different walk strategies for a walk at node A; a node N is a neighbor of A, and w_N is the probability that a walk at A is forwarded to N. virt-deg denotes the virtual degree.

3.2 Homogeneous graph construction: Degree distribution, diameter, and build-loads of the different mechanisms. All graphs except Scamp have exactly 5 outlinks per node, and use 10-hop build walks. Diam and Dist are the average estimated diameter and the average inter-node distance, estimated using a sample set of 20 nodes. Dev(Deg) is the standard deviation of degrees, Indeg-95pc is the average 95th percentile value, and MaxIndeg is the average maximum value of the indegree. BLoad-Add and BLoad-Kill are the loads caused due to node addition and node deletion, resp. (*) Scamp’s Indeg-95pc and MaxIndeg values correspond to the total degree, since its outdegree is not a constant.

3.3 Graph parameters for 50,000 node churned graphs.

4.1 Swaplinks results for moderate and extreme capacity distributions under high and low churn.

4.2 Modification of various timeout parameters according to churn settings. “Original” denotes the values in the original Bamboo code distribution. The first four parameters determine the frequency of pings and exchanges of neighbor-sets between different nodes. † discard nbr timeout denotes the time between when a Bamboo node suspects a neighbor to be down (due to failure of message delivery) and when it actually decides it’s down (due to lack of response to subsequent pings). ‡ KRB-period is the period between successive KRB load messages sent to random locations in the network.

4.3 KRB results for moderate and extreme capacity
