Peer to Peer Computing: the Hype, the Hard Problems, and the Quest for Solutions

Krishna Kant, Ravi Iyer
Server Performance Architecture, Enterprise Architecture Lab
Intel Corporation, OR

Vijay Tewari
Peer-to-Peer Architecture, Microprocessor Research Lab
Intel Corporation, OR

• Presenter contact information: Krishna Kant, Intel Corp., Hillsboro, OR, Tel. (503) 712-3997, Email: [email protected]

• Aims/Learning objectives:

1. Gain knowledge about the emerging P2P computing paradigm.
2. Gain knowledge about past, current and future systems that support P2P.
3. Understand the major research issues in P2P computing.
4. Understand performance and architectural issues for P2P peers.

• Duration: Full day (6 hours). Can be cut down to half-day, if necessary.

• Keywords: peer-to-peer computing, distributed systems, taxonomy, file sharing, cycle sharing, performance analysis

• Target audience:

1. Researchers interested in the emerging distributed computing or Internet computing paradigms.
2. Computer scientists who want to gain a basic understanding of what is new and interesting about P2P computing.

• Prerequisite knowledge of audience: General computer science background with some appreciation of distributed systems.

• Tutorial history: The tutorial was scheduled to be given at ICNP-2001 in Nov 2001, but the tutorials were cancelled due to Sept 11 related issues. We currently have about 60 slides on the major topics (excluding details on P2P platforms such as Legion, JXTA, .NET, etc.). More slides will be added on results from experiments that are underway. We expect a total of about 120 slides when completed.

1 Extended Abstract

Popularized by Napster- and Gnutella-style file-sharing solutions, peer to peer (P2P) computing has emerged at the forefront of Internet computing. The success of these and other early P2P services led to claims that P2P can easily unlock a vast pool of unused resources and establish itself as the new paradigm for computing. A more in-depth look, however, reveals that while the P2P approach has good potential, pure P2P solutions have limited applicability, and several issues need to be addressed before P2P can attempt to replace traditional approaches such as client-server computing. This tutorial will introduce the current P2P landscape, discuss what is novel or unique about it, list the major challenges, and discuss some approaches to addressing them.

The tutorial shall start with a brief discussion of the basic services that brought P2P computing to the forefront. Potential candidates here include Napster, Gnutella, Freenet, SETI@home, I-drive, etc. Based on this, we shall present a tentative definition of peer to peer computing, which we believe includes the following major aspects: (a) no strict client-server relationship, (b) incomplete global knowledge, and (c) the ad-hoc nature of the P2P network. We shall also briefly touch upon the vast amount of work that exists on distributed systems and point out a number of distributed applications (e.g., collaborative computing in a LAN environment, telemedicine, and video-conferencing) that could be considered peer to peer applications.

Many past projects have attempted to establish an infrastructure to support the capabilities and features discussed above in various environments. This includes the work on the network of workstations (NOW), which attempts to support various collaborative applications in a departmental LAN environment. Several systems, including Globe, Globus and Legion, have attempted to provide an infrastructure to do the same in the WAN environment as well. Most of this work is a product of university projects. Very recently, some major commercial players have claimed to be developing peer to peer support. The tutorial will first provide somewhat detailed coverage of a few of these systems (including Sun’s JXTA, Microsoft’s .NET, Groove, etc.) and then discuss how these efforts support P2P computing. Based on these emerging P2P systems and others, we then identify a common set of requirements for P2P applications. We then attempt to map these requirements onto middleware services (categorized into basic and advanced services) that would help the development of P2P applications.

In order to explore the essential capabilities, features and challenges of P2P computing, we shall introduce a taxonomy that is organized according to two attribute classes: (1) application characteristics, and (2) environmental characteristics. Briefly, application characteristics include such aspects as data and metadata storage style, consistency model, resource usage style, QoS constraints, and reliability requirements. The environmental characteristics include network latency, security threats, failure environment, connectivity of peers, heterogeneity, and stability characteristics. The tutorial shall introduce the taxonomy and discuss the classification of existing applications relative to it. The main purpose of the taxonomy is to lead into a variety of research issues in various P2P computing environments, which will be addressed next.

A serious obstacle to the wide deployment of P2P applications is the frequent use of NAT (Network Address Translation) and firewalls in both business and home environments. Several controlled-access mechanisms (including the use of rendezvous points) have been proposed and will be discussed. A related problem is the lack of DNS entries for most of the peers that would be interested in participating in the P2P network, which makes addressing them difficult. The tutorial shall discuss these and several other unique issues that need to be addressed anew by the research community, since they either have not been encountered in traditional distributed computing research or were not required to scale to the level of tens of thousands of nodes, as might be the case for P2P computing.

The last part of the tutorial will concentrate on the performance considerations in successfully supporting P2P computing. We will start by discussing some possible approaches to analyzing the performance of a P2P file-sharing network. This includes the use of random graph models with non-uniform connectivity (leading to heavy-tailed degree behavior, as observed in real Gnutella networks) and the incorporation of suitable traffic characteristics. We will then present some specifics of our P2P network simulator, which models a file-sharing community. Based on extensive simulations, we study the efficiency and effectiveness of searches, the response time characteristics, the utilization at each node, and the impact of file migration on performance. In particular, we shall discuss results on effective migration techniques and the resulting issues of stability and oscillations. We also discuss the need for more architectural features (hardware, O/S, and network protocol) that would allow more efficient searches, avoidance of duplicate information, and early suppression of unneeded responses.
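The graph-based view of a file-sharing network described above can be illustrated with a small sketch. This is not the authors' simulator. It assumes preferential attachment as one simple way to obtain non-uniform connectivity with a heavy-tailed degree distribution, and a Gnutella-style flooding search limited by a time-to-live (TTL) with duplicate suppression. The function names, peer counts, file placement, and TTL value are hypothetical choices made only for illustration.

```python
import random
from collections import deque


def build_overlay(num_peers, links_per_new_peer, rng):
    """Grow an overlay by preferential attachment (an assumed model).

    Each joining peer links to `links_per_new_peer` distinct existing peers,
    chosen with probability proportional to their current degree.
    """
    neighbors = {p: set() for p in range(num_peers)}
    core = links_per_new_peer + 1
    endpoints = []  # a peer appears here once per link it holds
    # Start from a small fully connected core.
    for a in range(core):
        for b in range(a + 1, core):
            neighbors[a].add(b)
            neighbors[b].add(a)
            endpoints += [a, b]
    for new in range(core, num_peers):
        targets = set()
        while len(targets) < links_per_new_peer:
            targets.add(rng.choice(endpoints))  # well-connected peers win more often
        for t in sorted(targets):
            neighbors[new].add(t)
            neighbors[t].add(new)
            endpoints += [new, t]
    return neighbors


def flood_search(neighbors, holders, source, ttl):
    """TTL-limited flooding query with duplicate suppression.

    Returns (hits, messages, duplicates): the peers other than the source that
    hold the file and were reached, the number of query messages sent, and how
    many of those messages arrived at a peer that had already seen the query.
    """
    seen = {source}
    frontier = deque([(source, None, ttl)])
    hits, messages, duplicates = set(), 0, 0
    while frontier:
        peer, sender, hops_left = frontier.popleft()
        if hops_left == 0:
            continue
        for nxt in neighbors[peer]:
            if nxt == sender:
                continue  # never send the query back where it came from
            messages += 1
            if nxt in seen:
                duplicates += 1  # dropped by the receiver, but bandwidth was spent
                continue
            seen.add(nxt)
            if nxt in holders:
                hits.add(nxt)
            frontier.append((nxt, peer, hops_left - 1))
    return hits, messages, duplicates


if __name__ == "__main__":
    rng = random.Random(42)
    overlay = build_overlay(num_peers=1000, links_per_new_peer=2, rng=rng)
    holders = set(rng.sample(range(1000), 10))  # hypothetical file placement
    hits, messages, duplicates = flood_search(overlay, holders, source=0, ttl=4)
    print("max degree:", max(len(v) for v in overlay.values()))
    print("hits:", len(hits), "messages:", messages, "duplicates:", duplicates)
```

Varying the TTL in such a sketch shows the trade-off between search effectiveness (hits) and overhead (messages, of which duplicates are pure waste), which is the kind of question a full simulator with realistic traffic would address in much more detail.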

2 Speaker Biographies

Krishna Kant received his Ph.D. degree in Computer Science from The University of Texas at Dallas in 1981. From 1981-1984 he was an assistant professor of computer science at Northwestern University, Evanston, IL. From 1985-1989 he was an assistant professor, and from 1989-1991 an associate professor, of computer science at the Pennsylvania State University, University Park, PA. In 1988 he served with the Teletraffic Theory division at AT&T Bell Labs, Holmdel, NJ, and in 1991 with the Integrated Systems Architecture division at Bellcore, Piscataway, NJ. From 1992-1997 he was with the Network Services Performance and Control group at Bellcore, Red Bank, NJ, where he worked on a variety of narrowband and broadband signalling performance and congestion control issues. Since May 1997 he has been with the Server Architecture Lab at Intel Corp., Beaverton, OR, where he works on performance issues for Internet servers. His current research interests are in the areas of traffic characterization, performance modeling, and peer to peer computing. He is the author of a book titled Introduction to Computer System Performance Evaluation (New York: McGraw-Hill, 1992).

Ravi Iyer is currently a performance analyst in the Enterprise Architecture Laboratory at Intel Corporation. He works on Internet and database server workload characterization and on architectural and performance issues for front-end and back-end servers. He previously worked in the Computer Systems Laboratory at Hewlett Packard Laboratory and in the Server Engineering Group at Intel Corporation. He received his Ph.D. in Computer Science from Texas A&M University in 1999. His research interests include Internet computing and protocols, computer architecture, parallel processing and performance evaluation. He is currently serving as a member of the program committee for the International Conference on Information Technology (CIT-2001) and as a publicity chair for the 2nd Workshop on Performance and Architecture of Web Servers (PAWS-2001). He is a member of the IEEE.

Vijay Tewari is currently a Senior Software Engineer with the Technology Research Labs at Intel Corp. He is a member of the Peer-to-Peer architecture team, working on issues related to middleware services for P2P computing. Vijay received his Master's degree in Computer Science from the University of Minnesota, Twin Cities. During his Master's studies, Vijay worked as an intern in the Enterprise Architecture Labs at Intel Corp. on Web server performance. His interests include networking, the Internet and distributed computing. He has ten years of experience handling large communication systems for the Armed Forces in India. These include both static and mobile systems, including wireless solutions.

2.1 Personal references

1. Krishna Kant spent 10 years in academia teaching a variety of courses, with consistently high student ratings. For his more recent presentation abilities, contact Dr. Prasant Mohapatra of UC Davis ([email protected]).

2. Vijay Tewari has extensive presentation experience as a communications engineer in the Indian military. For his recent presentation abilities, contact Dr. Bob Knighten at [email protected]. (He can also be contacted about Krishna Kant and Ravi Iyer, since we have all worked for Bob.)

3 Relevant URLs

www.parc.xerox.com/istl/groups/iea/papers/gnutella
sourceforge.net/projects
www.cs.vu.nl/~steen/globe
www.entropia.com
www.ud.com
www.idrive.com
www.cs.virginia.edu/~legion
www.microsoft.com/net/
www.jxta.org
www.gotdotnet.com
www.pointerra.com
www.spinfrenzy.com
www.cutemx.com
www.imesh.com
www.scourexchange.com
www.peer-to-peerwg.org
www.napster.com
www.infrasearch.com
gnutella.wego.com
freenet.sourceforge.net
www.groove.net
www.endeavors.com

4 References

[1] T.E. Anderson, et al., “A case for NOW (Network of workstations)”, IEEE Micro, Vol 15, No 1, Feb 1995, pp. 54-64.

[2] I. Beier and H. Koenig, “GCSVA – A multiparty video conferencing system with distributed group and QoS management”, Proc. of 7th Intl. conference on computer communications and networks, 1998, pp. 594-598.

[3] R.D. Bella, et al., “An inter/intranet multimedia service for telemedicine”, Proc. of 23rd Euromicro conference, 1997, pp. 379-386.

[4] T. Bui and S. Sankaran, “Group Decision and Negotiation in Telemedicine: An application of Intelligent mobile agents as nonhuman teleworkers”, Proc. of 30th Hawaii Intl. conference on system sciences, Vol 4, 1997, pp. 120-129.

[5] I. Clarke, “A Distributed Decentralized Information Storage and Retrieval system”, M.S. Thesis, Division of Informatics, Univ of Edinburgh, UK, 1999.

[6] S.T. Chanson, A. Hui and E. Siu, “OCTOPUS – A scalable global multiparty video conferencing system”, Proc. of 8th Intl. conference on computer communications and networks, 1999, pp. 97-102.

[7] R. Corchuelo, D. Ruiz, M. Toro, and A. Ruiz, “Implementing multiparty interactions on a network computer”, Proc. of 25th Euromicro conference, Vol 2, 1999, pp. 458-465.

[8] M. Dahlin, et al., “Cooperative file caching: Using remote client memory to improve file system performance”, Proc. of first conference on O/S design and implementation, Nov 1994.

[9] L. Gautier, C. Diot, and J. Kurose, “End to end transmission control mechanisms for multiparty interactive applications on the Internet”, Proc. of IEEE INFOCOM, Vol 3, 1999, pp. 1470-1479.

[10] A. Grimshaw, et al., “Wide-Area Computing: Resource sharing on a large scale”, IEEE Computer, May 1999, pp. 1-9.

[11] R.K. Joshi and D.J. Ram, “Anonymous Remote Computing: A paradigm for parallel programming on interconnected workstations”, IEEE Trans on software engineering, Vol 25, No 1, Jan 1999, pp. 75-90.

[12] K. Kant and R. Iyer, “A Performance Model for Peer-to-Peer File Sharing Services”, submitted to an international conference, available at http://kkant.ccwebhost.com/download.html.

[13] B.E. Martin, C.H. Pedersen, and J.B. Roberts, “An object based taxonomy for distributed computing systems”, IEEE Computer, Aug 1991, pp. 17-27.

[14] S.J. Mullender, G. Rossum, et al., “Amoeba: A distributed operating system for the 1990s”, IEEE Computer, Vol 23, No 5, May 1990, pp. 44-53.

[15] F. Tandiary, et al., “Batrun: Utilizing idle workstations for large-scale computing”, IEEE parallel and distributed technology, Summer 1996, pp. 41-49.