Measuring and Characterizing Properties of Peer-To-Peer Systems
Total Page:16
File Type:pdf, Size:1020Kb
MEASURING AND CHARACTERIZING PROPERTIES OF PEER-TO-PEER SYSTEMS by DANIEL STUTZBACH A DISSERTATION Presented to the Department of Computer and Information Science and the Graduate School of the University of Oregon in partial fulfillment of the requirements for the degree of Doctor of Philosophy December 2006 ii “Measuring and Characterizing Properties of Peer-to-Peer Systems,” a dissertation prepared by Daniel Stutzbach in partial fulfillment of the requirements for the Doctor of Philosophy degree in the Department of Computer and Information Science. This dissertation has been approved and accepted by: Prof. Reza Rejaie, Chair of the Examining Committee Date Committee in charge: Prof. Reza Rejaie, Chair Prof. Andrzej Proskurowski Prof. Virginia Lo Prof. David Levin Dr. Walter Willinger Accepted by: Dean of the Graduate School iii An Abstract of the Dissertation of Daniel Stutzbach for the degree of Doctor of Philosophy in the Department of Computer and Information Science to be taken December 2006 Title: MEASURING AND CHARACTERIZING PROPERTIES OF PEER-TO-PEER SYSTEMS Approved: Prof. Reza Rejaie Peer-to-peer systems are becoming increasingly popular, with millions of simul- taneous users and a wide range of applications. Understanding existing systems and devising new peer-to-peer techniques relies on access to representative models, derived from empirical observations, of user behavior and peer-to-peer system behavior on a real network. However, it is challenging to accurately capture behavior in peer-to- peer systems because they are distributed, large, and rapidly changing. While some prior work does study the properties of peer-to-peer systems, they do not quantify the accuracy of their measurement techniques, sometimes leading to significant error. This dissertation empirically explores and characterizes a wide variety of prop- erties of peer-to-peer systems. The properties examined fall into four groups, along two axes: properties of peers versus properties of how peers are connected, and static properties versus dynamic properties. To study these properties, this dissertation develops and assesses two measurement techniques: (i) a crawler for capturing global iv state and (ii) a Metropolized random walk approach for collecting samples. Using these techniques to conduct empirical studies of widely-deployed peer-to-peer sys- tems, this dissertation presents empirical results to suggest useful models for key properties of peer-to-peer systems. In the end, this dissertation significantly deepens our understanding of peer-to-peer systems and lays the groundwork for the accurate measurement of other properties of peer-to-peer systems in the future. This dissertation includes my previously published and my co-authored materials. v CURRICULUM VITAE NAME OF AUTHOR: Daniel Stutzbach PLACE OF BIRTH: Attleboro, MA, U.S.A. DATE OF BIRTH: March 28, 1977 GRADUATE AND UNDERGRADUATE SCHOOLS ATTENDED: University of Oregon Worcester Polytechnic Institute DEGREES AWARDED: Doctor of Philosophy in Computer and Information Science, 2006, University of Oregon Bachelor of Science in Electrical Engineering, 1998, Worcester Polytechnic Institute AREAS OF SPECIAL INTEREST: Peer-to-Peer Networks, Network Measurement PROFESSIONAL EXPERIENCE: President, Stutzbach Enterprises, LLC, 2006— Research Assistant, University of Oregon, 2001–2006 Software Engineer Contractor, ADC, Inc., 2001 Senior Software Engineer, Assured Digital, Inc., 1999–2001 Software Test Engineer, Assured Digital, Inc., 1998–1999 Embedded Systems Programmer, Microwave Radio Communications, 1997–1998 vi AWARDS AND HONORS: IMC Travel Grant, 2006 Clarence and Lucille Dunbar Scholarship, 2006–2007 Upsilon Pi Epsilon Membership, 2006 INFOCOM Travel Grant, 2006 First place, UO Programming Competition, 2006 SIGCOMM Travel Grant, 2005 First place, UO Programming Competition, 2005 IMC Travel Grant, 2004 ICNP Travel Grant, 2004 First place, UO Programming Competition, 2004 Honorable Mention, ACM ICPC World Finals Programming Competition, 2003 First place, UO Programming Competition, 2003 First Place, ACM ICPC Pacific Northwest Programming Competition, 2002 First place, UO Programming Competition, 2002 First place, WPI ACM Programming Competition, 1998 First place, WPI Tau Beta Pi Design Competition, 1997 Scholarship Winner, Rhode Island Distinguished Merit Competition in Computer Science, 1995 PUBLICATIONS: D. Stutzbach and R. Rejaie, “Understanding churn in peer-to-peer networks,” in Proc. Internet Measurement Conference, Rio de Janeiro, Brazil, Oct. 2006. D. Stutzbach, R. Rejaie, N. Duffield, S. Sen, and W. Willinger, “On unbiased sampling for unstructured peer-to-peer networks,” in Proc. Internet Measurement Conference, Rio de Janeiro, Brazil, Oct. 2006. D. Stutzbach and R. Rejaie, “Improving lookup performance over a widely-deployed DHT,” in Proc. IEEE INFOCOM, Barcelona, Spain, Apr. 2006. vii D. Stutzbach, R. Rejaie, N. Duffield, S. Sen, and W. Willinger, “Sampling techniques for large, dynamic graphs,” in Proc. Global Internet Symposium, Barcelona, Spain, Apr. 2006. A. Rasti, D. Stutzbach, and R. Rejaie, “On the long-term evolution of the two-tier Gnutella overlay,” in Proc. Global Internet Symposium, Barcelona, Spain, Apr. 2006. J. Li, R. Bush, Z. M. Mao, T. Griffin, M. Roughan, D. Stutzbach, and E. Purpus, “Watching data streams toward a multi-homed sink under routing changes introduced by a BGP beacon,” in Proc. Passive and Active Measurement Workshop, Adelaide, Australia, Mar. 2006. S. Zhao, D. Stutzbach, and R. Rejaie, “Characterizing files in the modern Gnutella network: A measurement study,” in Proc. Multimedia Computing and Networking, San Jose, CA, Jan. 2006. D. Stutzbach, R. Rejaie, and S. Sen, “Characterizing unstructured overlay topologies in modern P2P file-sharing systems,” in Proc. Internet Measurement Conference, Berkeley, CA, Oct. 2005, pp. 49–62. D. Stutzbach and R. Rejaie, “Characterizing the two-tier Gnutella topology,” Extended Abstract in Proc. SIGMETRICS, Banff, AB, Canada, June 2005. D. Stutzbach, D. Zappala, and R. Rejaie, “The scalability of swarming peer-to-peer content delivery,” in Proc. IFIP Networking, Waterloo, Ontario, Canada, May 2005, pp. 15–26. D. Stutzbach and R. Rejaie, “Capturing accurate snapshots of the Gnutella network,” in Proc. Global Internet Symposium, Miami, FL, Mar. 2005, pp. 127–132. D. Stutzbach and R. Rejaie, “Evaluating the accuracy of captured snapshots by peer-to-peer crawlers,” Extended Abstract in Proc. Passive and Active Measurement Workshop, Boston, MA, Mar. 2005, pp. 353–357. viii ACKNOWLEDGMENTS I would like to thank my parents for providing me with the intellectual curiosity and work ethic required for any dissertation. Alisa Rata deserves special thanks for her emotional support and encouragement that greatly accelerated my progress. Her unending patience while I worked on my papers and this dissertation are particularly appreciated. My thanks also go out to Prof. Daniel Zappala for his tutelage during my initial years in graduate school, particularly for helping me to understand the difference between science and engineering. Early in my graduate school career, I had many fruitful discusses on the Gnutella Developer Forum mailing list, particularly with Greg Bildson, Raphael Manfredi, Serguei Osokine, Gordon Mohr, and Vinnie Falco. I would also like to thank Dr. Subhabrata Sen, Dr. Walter Willinger, Dr. Nick Duffield, Prof. Andrzej Proskurowski, Prof. Virginia Lo, and Prof. David Levin for insightful comments that opened me to new ideas and developing my sense of scientific rigor. I am also grateful for my friends and fellow Ph.D. students, Peter Boothe and Chris GauthierDickey, for their companionship and many stimulating conversations. Amir Rasti, Shanyu Zhao, and John Capehart are particularly thanked for their collaborative work that contributed to Chapters 5 and 6. The help of Joel Jaeggli, Lauradel Collins, Paul Block, and other members of the systems staff at the University of Oregon are greatly appreciated for assisting with hardware and software support for my experiments and post-processing. I am particularly appreciative of their patience when dealing with security concerns from alarmist security administrators and performance issues when I slowed their file server to a crawl. Robin High provided me with several useful insights and pointers on statistically analysis. My work on this dissertation was supported in part by by the National Science Foundation (NSF) under Grant No. Nets-NBD-0627202 and an unrestricted gift from ix Cisco Systems. Any opinions, findings, and conclusions or recommendations ex- pressed in this material are those of the author and do not necessarily reflect the views of the NSF or Cisco. I am also grateful to the federal Stafford Loan program which provided me loans that greatly improved my quality of life while a graduate student. BitTorrent tracker logs used in my study of churn in Chapter 7 were provided by Ernst Biersack and others from the Institut Eurecom, the Debian organization, and 3D Gamers. I am thankfully for their generosity in releasing this data. Finally, I am very thankful for the guidance of my adviser, Prof. Reza Rejaie, who continually pushed me to dig deeper and ask the next question. x To my parents, Ron and Joan Stutzbach xi TABLE OF CONTENTS Chapter Page 1. INTRODUCTION .............................. 1 2. BACKGROUND............................... 6 2.1 History.................................. 6 2.2 ModelingandDistributions. 9 2.3 GraphTheory.............................. 14 3. RELATEDWORK.............................