Scooped, Again
Total Page:16
File Type:pdf, Size:1020Kb
Scooped, Again The Harvard community has made this article openly available. Please share how this access benefits you. Your story matters Citation Ledlie, Jonathan, Jeff Shneidman, Margo Seltzer, and John Huth. 2003. Scooped, again. Peer-to-peer systems II: Second international workshop, IPTPS 2003, Berkeley, California, February 21-22, 2003, ed. IPTPS 2003, Frans Kaashoek, and Ion Stoica. Berlin: Springer Verlang. Previously published in Lecture Notes in Computer Science 2735: 129-138. Published Version http://dx.doi.org/10.1007/b11823 Citable link http://nrs.harvard.edu/urn-3:HUL.InstRepos:2799042 Terms of Use This article was downloaded from Harvard University’s DASH repository, and is made available under the terms and conditions applicable to Other Posted Material, as set forth at http:// nrs.harvard.edu/urn-3:HUL.InstRepos:dash.current.terms-of- use#LAA Scooped, Again Jonathan Ledlie, Jeff Shneidman, Margo Seltzer, John Huth Harvard University jonathan,jeffsh,margo ¡ @eecs.harvard.edu [email protected] Abstract p2p Grid Users (scientists) users Users (disparate The Peer-to-Peer (p2p) and Grid infrastructure commu- groups ) ©§ nities are tackling an overlapping set of problems. In ad- App. ¢¤£¦¥§¥§¨writers place demand demand dressing these problems, p2p solutions are usually moti- demand (e.g., on vated by elegance or research interest. Grid researchers, App. writers OGSA) writers} under pressure from thousands of scientists with real file research and Infrastructure writers Infrastructure writers sharing and computational needs, are pooling their solu- development tions from a wide range of sources in an attempt to meet user demand. Driven by this need to solve large scientific Figure 1: In serving their well-defined user base, the problems quickly, the Grid is being constructed with the Grid community has needed to draw from both its ances- tools at hand: FTP or RPC for data transfer, centraliza- try of supercomputing and from Systems research (in- tion for scheduling and authentication, and an assump- cluding p2p and distributed computing). P2p has essen- tially invented its user base through its technology. tion of correct, obediant nodes. If history is any guide, the World Wide Web depicts viscerally that systems that his own work and the work of other physicists at address user needs can have enormous staying power and CERN led him to develop HTML, HTTP, and a sim- affect future research. The Grid infrastructure is a great ple browser [4]. customer waiting for future p2p products. By no means should we make them our only customers, but we should While HTTP and HTML are simple, they have at least put them on the list. If p2p research does not exhibited serious network and language-based defi- at least address the Grid, it may eventually be sidelined ciencies as the Web has grown [21]. These problems by defacto distributed algorithms that are less elegant but have been examined and patched to some extent, but were used to solve Grid problems. In essense, we’ll have this simple inelegant solution remains at the core of been scooped, again. the Web. A parallel situation exists today with the p2p and 1 Introduction Grid communities. A large group of users (scien- Butler Lampson, in his SOSP99 Invited Talk, tists) are pushing for immediately useable tools to stated that the greatest failure of the systems re- pool large sets of resources. The Grid is currently search community over the past ten years was that building these tools. The Systems community is in “we did not invent the Web” [39]. The systems danger of falling victim to the contemporary version research community laid the groundwork, but did of Lampson’s admonishment if we do not partici- not follow through. The same situation exists to- pate in this process. day with the overlapping efforts of the Grid and p2p The Grid is an active area of research and devel- communities. The former is building and using a opment. The number of academic grids has jumped global resource sharing system, while the latter is six-fold in the last year [43]. As Figure 1 depicts, repeating their mistake of the past by focusing on the “customer base” of scientists, who are often also elegant solutions without regard to a vast potential application writers, drive Grid developers to pro- user community. duce tangible solutions, even when the solutions In 1989, Tim Berner-Lee’s need to communicate are not ideal, from a computer systems perspec- tive. Most current p2p users are people sharing files; 2.3 Manifestations sometimes these people are pursuing noble causes Grids historially arose out of a need to perform like anonymous document distribution, but more of- massive computation. A manifestation of shared ten they are simply trying to circumvent copyright, computation, Condor accepts compiled jobs, sched- and rarely are they interacting with the systems’ de- ules and runs them on remote idle machines [42]. velopers. P2p does not have the driving force that Exemplifying its focus on computation, it issues interactive users provide and therefore has focused RPCs back to the job originator’s machine for data. on solutions that are interesting primarily from a re- An example of computational middleware is Globus search perspective. [23]; it “meta-schedules” jobs among Grids like The difficulty with this parallel development is Condor and ships host data between them using not that it is wasteful, but that, without the p2p com- GridFTP, an FTP wrapper. Manifestations of Data munity’s input, the Grid will most likely be built in Grids include the European Data Grid project [20]. a way incompatible and non-inclusive of many of The Grid community is currently authoring a p2p’s strong points: search and storage scalability, Web Services-oriented API called Open Grid Ser- decentralization, anonymity and pseudonymity, and vices Architecture (OGSA) [24, 25]. The next gen- denial of service prevention. eration of Globus is intended to be a reference im- In the rest of this paper, we first discuss the char- plentation of this API. ters of the two different communities. We introduce three fallacies that may have kept the communities 3 P2P separate. We then describe common problems being 3.1 What is P2P? attacked by the two communities and compare so- lutions from each camp, and discuss problems that Unintentionally, Shirky describes p2p much like seem to be truly disjoint. We conclude with a call to a Grid: “Peer-to-peer is a class of applications that action for the p2p community to examine the Grid take advantage of resources — storage, cycles, con- needs and consider future research problems in that tent, human presence — available at the edges of context. the Internet” [19]. Stoica et al. offer a more restric- tive definition focusing on decentralized and non- 2 Grids hierarchical node organization [54]. 2.1 What is the Grid? P2p’s focus on decentralization, instability, and fault tolerance exemplify areas that essentially have Buyya defines the Grid as “a type of parallel and been omitted from emerging Grid standards, but distributed system that enables the sharing, selec- will become more significant as the system grows. tion, and aggregation of resources distributed across multiple administrative domains based on their (re- 3.2 Goals of P2P sources) availability, capability, performance, cost, The goal of p2p is to take advantage of the idle and users’ quality-of-service requirements” [8]. cycles and storage of the edge of the Internet, effec- The Grid is “distinguished from conventional dis- tively utilizing its “dark matter” [19]. This overar- tributed computing by its focus on large-scale re- ching goal introduces issues including decentraliza- source sharing, innovative applications, and in some tion, anonymity and pseudonymity, redundant stor- cases, high performance orientation” [27]. age, search, locality, and authentication. 2.2 Goals of the Grid 3.3 Manifestations The Grid aims to be self-configuing, self-tuning, O’Reilly’s p2p site [46] divides p2p systems into and self-healing, similar to the goals in autonomic nineteen categories, primarily offering file sharing computing [2]. It aims to fulfill the vision of Cor- (e.g., Gnutella [28], KaZaA [36]), distributed com- bato’s Multics [13]: like a utility company, a mas- putation (e.g., distributed.net [17]), and anonymity sive resource to which a user gives his or her com- (e.g., Freenet [12], Publius [44]). Seti@home is an- putational or storage needs. The Grid’s goal is to other p2p application, although the Grid community utilize the shared storage and cycles from the mid- considers it one of theirs as well [51]. dle and edges of the Internet. Prominant research instances of p2p include tally anonymous file sharing) impose special re- Chord, Pastry, and Tapestry [48, 53, 56]. File quirements for assorted (not always technical) rea- systems built on these include CFS, PAST, and sons. But even in these situations, a general aware- Oceanstore [14, 18, 37]. File systems, however, ness of technical approaches taken by the other are not applications and, while a variety of forms community may help solve “physically private” of storage have been built, there exists no driving problems. application in this space. 4.3 “Grid projects do not have the flexibility to 4 Three Falacies That Have Kept the Com- try new algorithms/ideas because they have munities Separate to get real work done. P2P research is all about this flexibility.” This paper argues that the p2p and Grid commu- P2p research is very flexible: one version can nities are “natural” partners in research. The follow- obsolete the previous and new algorithms can be ing are some objections and responses to this claim. developed without conforming to any standard. 4.1 “The technical problems in Grid systems Within the emerging Grid standards, however, there are different from those in p2p systems.” is room for flexible research too.