Building Scientific Clouds: The Distributed, Peer-to-Peer Approach (2010)

Total Pages: 16

File Type: PDF, Size: 1020 KB

Clemson University, TigerPrints, All Theses, 5-2010

Building Scientific Clouds: The Distributed, Peer-to-Peer Approach
Linton Vadakedathu, Clemson University, [email protected]

Follow this and additional works at: https://tigerprints.clemson.edu/all_theses
Part of the Computer Sciences Commons

Recommended Citation: Vadakedathu, Linton, "Building Scientific Clouds: The Distributed, Peer-to-Peer Approach" (2010). All Theses. 786. https://tigerprints.clemson.edu/all_theses/786

This thesis is brought to you for free and open access by the Theses at TigerPrints. It has been accepted for inclusion in All Theses by an authorized administrator of TigerPrints. For more information, please contact [email protected].

Building Scientific Clouds: The Distributed, Peer-to-Peer Approach. A Dissertation Presented to the Graduate School of Clemson University in Partial Fulfillment of the Requirements for the Degree Master of Science, Computer Science, by Linton Abraham Vadakedathu, May 2010. Accepted by: Dr. Sebastien Goasguen, Committee Chair; Dr. James Martin; Dr. Pradip Srimani.

Abstract

The scientific community is constantly growing in size. The increase in personnel and projects has created a need for large amounts of storage, CPU power, and other computing resources, and it has become necessary to acquire these resources in an affordable manner that is sensitive to workloads. In this thesis, the author presents a novel approach that provides the communication platform needed to support such large-scale scientific projects. The required resources can be difficult to acquire because of NATs, firewalls, and other site-based restrictions and policies. Methods used to overcome these hurdles are discussed in detail, along with other advantages of using such a system, which include: increased availability of necessary computing infrastructure; increased grid resource utilization; reduced user dependability; and reduced job execution time. Experiments were conducted on local infrastructure on the Clemson University campus as well as on resources provided by other federated grid sites.

Dedication

This thesis is dedicated to my family, who stood by me with love and support. It is also dedicated to Dr. Sebastien Goasguen and the School of Computing at Clemson University.

Acknowledgments

The author would like to acknowledge Dr. Sebastien Goasguen for his guidance and support in all research and course work done during the Master's program. The author thanks Michael Murphy and Michael Fenn for their time and support. The author acknowledges the guidance of Dr. James Martin and Dr. Pradip Srimani towards the correction and completion of this thesis. Finally, the author would like to acknowledge David Wolinsky of the University of Florida for his assistance in various research-related undertakings.

Contents

Title Page; Abstract; Dedication; Acknowledgments; List of Tables; List of Figures; Listings
1 Introduction
  1.1 Research Questions
  1.2 Motivation for Research
  1.3 Distributed Computing
  1.4 Virtualization and Cloud Computing
  1.5 High Throughput Computing and Condor
    1.5.1 Condor at Clemson
  1.6 Peer-to-Peer Architecture vs. Client-Server Architecture
  1.7 Distributed Hash Tables
  1.8 From P2P to DHT to Brunet, then came IP Over P2P (IPOP)
    1.8.1 Architectural Overview
  1.9 Virtual Organization Cluster (VOC)
    1.9.1 Physical Administrative Domain
    1.9.2 Virtual Administrative Domain
    1.9.3 Provisioning and Execution of Virtual Machines
2 Related Work
  2.1 Chord
  2.2 Pastry
  2.3 Kademlia
  2.4 Brunet (Based on Symphony)
  2.5 Work Related to VOCs
3 Proposed System
  3.1 Overlay Networking Details
  3.2 IPOP Bootstraps and Namespaces in Clemson
  3.3 Multi-site Communication and Watchdog
  3.4 Condor IPOP Pool - Local/Single Site
  3.5 Condor IPOP Pool - Multiple Grid Sites
4 Experiments & Results
  4.1 Building the Physical and Virtual Clusters
  4.2 Local Testing
    4.2.1 Experiments
    4.2.2 Results
  4.3 VOC and the Amazon Elastic Compute Cloud (EC2)
    4.3.1 Experiments
    4.3.2 Results
5 Conclusions
Bibliography

List of Tables

4.1 Observed makespan lengths and system throughputs for 10 overlay experiment trials

List of Figures

1.1 A General Representation of Grids
1.2 Serial vs. Parallel Computation
1.3 Distributed Computing
1.4 Cloud Computing
1.5 Condor Architecture
1.6 Client-Server Architecture
1.7 Peer-to-Peer System
1.8 Distributed Hash Table
1.9 DHT Implementation
1.10 IPOP Architecture
1.11 Structure of IPOP
1.12 Virtual Organization Clusters
3.1 IPOP as a STUN Server
3.2 IPOP DHCP Configuration File
3.3 IPOP Namespace Configuration File
3.4 IPOP Bootstrap Configuration File
3.5 IPOP Node Configuration File
3.6 Single Site VOC
3.7 VOC over a Single Site
3.8 Multiple Grid Site VOC
4.1 Short Jobs without the Overlay Network
4.2 Short Jobs with the Overlay Network
4.3 Long Jobs without the Overlay Network
4.4 Long Jobs with the Overlay Network
4.5 Amazon EC2 Primary Distributed Architecture [2]
4.6 Amazon Home UI
4.7 Amazon Instance UI
4.8 Short Jobs Executing in Multiple Grid Sites over IPOP
4.9 Long Jobs Executing in Multiple Grid Sites over IPOP

Chapter 1: Introduction

"Grid computing has emerged as an important new field, distinguished from conventional distributed computing by its focus on large-scale resource sharing, innovative applications, and, in some cases, high-performance orientation." - Ian Foster [41]. This statement, made by Ian Foster in 2001, serves as a standard for all research and development conducted in the grid community. The key difference between Grid computing and Distributed computing lies in the allocation of compute resources. Grid computing requires access to exclusively dedicated compute resources, such as powerful workstations, supercomputers, and servers. Distributed computing, on the other hand, has no such stringent requirement: it needs only a subset of the resources of any single computation unit. This means that at any instant a system can be 'live' (actively deployed in a distributed environment) while simultaneously being used by its owner. Such operations are abstracted away so that the owner's work is never affected. Despite these differences, the scientific community generally agrees that Grid computing is a subset of Distributed computing.

Large-scale applications are often required for scientific simulations, research, modeling, and other workloads. Single computation units, despite having multiple processors and huge amounts of RAM and disk space, can neither satisfy the computing requirements of ever-growing data sets nor provide results in a reasonable amount of time. Therefore, a gradual but definite adoption of large-scale distributed systems is warranted [39]. Clemson University has a large supercomputing infrastructure in the Palmetto Cluster, which has over 6000 cores and is ranked 89th among the TOP500 supercomputing sites in the world.

Figure 1.1: A General Representation of Grids, comprising not just computing and networking hardware but geographic locations and people.
Figure 1.2: Serial vs. Parallel Computation - the effect of parallelizing code and using distributed resources [10].

The Cyber Infrastructure Research Group [31] at Clemson is the leading developer and strategist for distributed computing and infrastructure on campus. Led by Dr. Sebastien Goasguen, the group looks to leverage campus-wide resources, including the Palmetto Cluster, to research and develop distributed
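
The overlay at the heart of this thesis (IPOP, running over the Brunet P2P library) and the systems surveyed in Chapter 2 (Chord, Pastry, Kademlia) all rely on distributed hash tables, which deterministically map a key onto the peer responsible for it. The sketch below is a minimal, hypothetical illustration of the Chord-style rule — node IDs and keys are hashed onto a ring, and a key lives on its successor node. It is not IPOP's or Brunet's actual code, and the peer names are invented.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 16  # tiny identifier space for illustration; real DHTs use 128 or 160 bits


def ring_id(name: str) -> int:
    """Hash an arbitrary name onto the identifier ring."""
    digest = hashlib.sha1(name.encode()).digest()
    return int.from_bytes(digest, "big") % (2 ** RING_BITS)


class ToyDHT:
    """Chord-style mapping: a key lives on the first node clockwise from its ID."""

    def __init__(self, node_names):
        self.nodes = sorted((ring_id(n), n) for n in node_names)

    def successor(self, key: str) -> str:
        ids = [nid for nid, _ in self.nodes]
        i = bisect_left(ids, ring_id(key))
        return self.nodes[i % len(self.nodes)][1]  # wrap around the ring


if __name__ == "__main__":
    dht = ToyDHT([f"peer{i}.example.org" for i in range(8)])  # hypothetical peers
    for key in ["job-42.output", "vm-image.qcow2", "condor-pool-config"]:
        print(key, "->", dht.successor(key))
```

Production DHTs such as Chord, Kademlia, and the Symphony-style ring that Brunet builds on add per-node routing tables so a lookup reaches the responsible peer in O(log N) hops, rather than scanning a global node list as this toy does.
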
Recommended publications
  • AnonSocialMix: Anonymous P2P File Sharing Over Social Networks
    AnonSocialMix: Anonymous P2P File Sharing Over Social Networks. Student Name: Rajdeep Mukherjee, IIIT-D-MTech-CS-GEN-MT15051, July 2017, Indraprastha Institute of Information Technology, New Delhi. Thesis Committee: Dr. Sambuddho Chakravarty, IIIT Delhi (Advisor); Dr. Tanmoy Chakraborty, IIIT Delhi (Internal Examiner); Dr. Vinay Joseph Ribeiro, IIT Delhi (External Examiner). Submitted in partial fulfillment of the requirements for the Degree of M.Tech. in Computer Science, in Information Security Category. ©2017 IIIT-D MTech-CS-GEN-17-MT15051. All rights reserved. Certificate: This is to certify that the thesis titled "AnonSocialMix: Anonymous P2P File Sharing Over Social Networks" submitted by Rajdeep Mukherjee for the partial fulfillment of the requirements for the degree of Master of Technology in Computer Science & Engineering is a record of the bonafide work carried out by him under my guidance and supervision in the Security and Privacy group at Indraprastha Institute of Information Technology, Delhi. This work has not been submitted anywhere else for the reward of any other degree. Dr. Sambuddho Chakravarty, Indraprastha Institute of Information Technology, New Delhi. Abstract: Peer-to-peer (P2P) file sharing accounts for one of the major sources of Internet traffic. As privacy and anonymity issues continue to grow due to constant censorship and network surveillance, more and more Internet users are getting attracted towards the facilities for anonymous communication. Extensive research has been conducted over the years towards the design and development of several anonymous P2P file sharing protocols and systems. Size of the Anonymity Set plays a crucial role in determining the degree of anonymity being provided by such networks.
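
    The observation that the size of the anonymity set governs the degree of anonymity is often quantified with an entropy-based metric: the attacker's uncertainty H(X) over the set, normalized by the maximum value log2(N). The sketch below illustrates only that general metric, not anything specific to AnonSocialMix, and the probability values are made up.

```python
import math


def degree_of_anonymity(probs):
    """Entropy of the attacker's probability distribution over the anonymity set,
    normalized by the maximum entropy log2(N); 1.0 means perfect anonymity."""
    n = len(probs)
    assert n > 1 and abs(sum(probs) - 1.0) < 1e-9
    entropy = -sum(p * math.log2(p) for p in probs if p > 0)
    return entropy / math.log2(n)


# Uniform suspicion over 8 peers vs. one peer strongly singled out (illustrative numbers)
print(degree_of_anonymity([1 / 8] * 8))            # 1.0: the full set hides the sender
print(degree_of_anonymity([0.65] + [0.05] * 7))    # well below 1.0: anonymity largely lost
```
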
  • A Flexible Middleware for Metacomputing in Multi-Domain Networks
    IJCSNS International Journal of Computer Science and Network Security, VOL.6 No.2B, February 2006. A Flexible Middleware for Metacomputing in Multi-Domain Networks. Franco Frattolillo, Research Centre on Software Technology, Department of Engineering, University of Sannio, Benevento, Italy. Summary: Middlewares are software infrastructures able to harness the enormous, but often poorly utilized, computing resources existing on the Internet in order to solve large-scale problems. However, most of such resources, particularly those existing within "departmental" organizations, cannot often be considered actually available to run large-scale applications, in that they cannot be easily exploited by most of the currently used middlewares for grid computing. In fact, such resources are usually represented by computing nodes belonging to non-routable, private networks and connected to the Internet through publicly addressable IP front-end nodes. This paper presents a [...] technologies, but they require further and significant advances in mechanisms, techniques, and tools to bring together computational resources distributed over the Internet to efficiently solve a single large-scale problem according to the new scenario introduced by grid computing [2, 3, 4]. In fact, in such a scenario, cluster computing still remains a valid and actual support to high performance parallel computing, since clusters of workstations represent high performance/cost ratio computing platforms and are widely available within "departmental" organizations, such
  • Four-Dimensional Model for Describing the Status of Peers in Peer-To-Peer Distributed Systems
    Turkish Journal of Electrical Engineering & Computer Sciences, Turk J Elec Eng & Comp Sci (2013) 21: 1646-1664, http://journals.tubitak.gov.tr/elektrik/, © TÜBİTAK, Research Article, doi:10.3906/elk-1108-27. Four-dimensional model for describing the status of peers in peer-to-peer distributed systems. Seyedeh Leili MIRTAHERI (1,*), Ehsan Mousavi KHANEGHAH (1), Mohsen SHARIFI (1), Behrouz MINAEI-BIDGOLI (1), Bijan RAAHEMI (2), Mohammad Norouzi ARAB (1), Abbas Saleh ARDESTANI (3). (1) School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran; (2) School of Information Technology and Engineering, University of Ottawa, Ottawa, Ontario, Canada; (3) Department of Business Management, Faculty of Management, Islamic Azad University Central Tehran Branch, Tehran, Iran. Received: 09.08.2011 • Accepted: 15.03.2012 • Published Online: 02.10.2013 • Printed: 28.10.2013. Abstract: One of the important aspects of decision making and management in distributed systems is collecting accurate information about the available resources of the peers. The previously proposed approaches for collecting such information completely depend on the system's architecture. In the server-oriented architecture, servers assume the main role of collecting comprehensive information from the peers and the system. Next, based on the information about the features of the basic activities and the system, an exact description of the peers' status is produced. Accurate decisions are then made using this description. However, the amount of information gathered in this architecture is too large, and it requires massive processing. On the other hand, updating the information takes time, causing delays and undermining the validity of the information. In addition, due to the limitations imposed by the servers, such architecture is not scalable and dynamic enough.
  • STor: Social Network Based Anonymous Communication in Tor
    STor: Social Network based Anonymous Communication in Tor. Peng Zhou, Xiapu Luo, Ang Chen, and Rocky K. C. Chang. Department of Computing, The Hong Kong Polytechnic University, Hunghom, Hong Kong. {cspzhouroc, csxluo, csachen, csrchang}@comp.polyu.edu.hk. Abstract—Anonymity networks hide user identities with the help of relayed anonymity routers. However, the state-of-the-art anonymity networks do not provide an effective trust model. As a result, users cannot circumvent malicious or vulnerable routers, thus making them susceptible to malicious router based attacks (e.g., correlation attacks). In this paper, we propose a novel social network based trust model to help anonymity networks circumvent malicious routers and obtain secure anonymity. In particular, we design an input independent fuzzy model to determine trust relationships between friends based on qualitative and quantitative social attributes, both of which can be readily obtained from existing social networks. Moreover, we design an algorithm for propagating trust over an anonymity network. We integrate these two elements in STor, a novel social network based Tor. We have implemented STor by modifying the Tor's source code and conducted experiments on PlanetLab to evaluate the effectiveness of STor. Both simulation and PlanetLab experiment results have demonstrated that STor can achieve secure anonymity by establishing trust-based circuits in a distributed way. Although the design of STor is based on Tor network, the social network based trust model can be adopted by other anonymity networks. Index Terms—Social Network, Anonymous Communication, Tor, Fuzzy Model
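
    STor derives direct trust between friends from social attributes and then propagates it across the anonymity network. The following is not STor's fuzzy model; it is a generic, hypothetical sketch of one common propagation rule — trust along a path is the product of edge trusts, and trust in a distant router is the best value over bounded-length paths.

```python
def propagated_trust(edges, source, target, max_hops=3):
    """Best product-of-edge-trusts over simple paths of at most max_hops hops.
    edges: dict mapping node -> {neighbor: direct trust in [0, 1]}."""
    best = 0.0
    stack = [(source, 1.0, 0, {source})]
    while stack:
        node, trust, hops, seen = stack.pop()
        if node == target:
            best = max(best, trust)
            continue
        if hops == max_hops:
            continue
        for nbr, t in edges.get(node, {}).items():
            if nbr not in seen:
                stack.append((nbr, trust * t, hops + 1, seen | {nbr}))
    return best


# Hypothetical social graph annotated with direct-trust weights
g = {"alice": {"bob": 0.9, "carol": 0.6},
     "bob": {"router1": 0.8},
     "carol": {"router1": 0.95}}
print(propagated_trust(g, "alice", "router1"))  # max(0.9*0.8, 0.6*0.95) = 0.72
```

    A circuit-building client could then prefer routers whose propagated trust exceeds a threshold, which is the general effect the paper's trust-based circuits aim for.
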
  • Technology Stack for Decentralized Mobile Services
    Technology Stack for Decentralized Mobile Services, by Matouš Skála, to obtain the degree of Master of Science at the Delft University of Technology, to be defended publicly on Monday August 31, 2020 at 3:00 PM. Student number: 4893964. Project duration: November 15, 2019 – August 31, 2020. Thesis committee: Dr.ir. J.A. Pouwelse, TU Delft, supervisor; Dr. J.S. Rellermeyer, TU Delft; Dr. N. Yorke-Smith, TU Delft. An electronic version of this thesis is available at http://repository.tudelft.nl/. Preface: When I was choosing my thesis topic, I originally came up with an idea of designing a decentralized social network. After realizing how ambitious that goal was, I later decided to focus on more fundamental issues first and create a library that would allow for building any decentralized applications, running purely on an overlay network consisting of smartphones. Rather than reinventing the wheel, I took inspiration from an existing networking library developed at TU Delft over the last decade and created its wire-compatible implementation in Kotlin. Interestingly, in the end, I have even implemented a trivial social network to demonstrate the usage of the library, returning back to the original idea. I would like to thank my supervisor Johan Pouwelse for an endless stream of fresh ideas and valuable feedback, and to PhD students of the Delft Blockchain Lab for numerous coffee meetings and for serving me as a walking documentation of the existing codebase. Matouš Skála, Prague,
  • From Computational Science to Internetics: Integration of Science with Computer Science
    Syracuse University SURFACE Northeast Parallel Architecture Center College of Engineering and Computer Science 2002 From Computational Science to Internetics: Integration of Science with Computer Science Geoffrey C. Fox Syracuse University, Northeast Parallel Architectures Center Follow this and additional works at: https://surface.syr.edu/npac Part of the Computer Sciences Commons, and the Curriculum and Instruction Commons Recommended Citation Fox, Geoffrey C., "From Computational Science to Internetics: Integration of Science with Computer Science" (2002). Northeast Parallel Architecture Center. 80. https://surface.syr.edu/npac/80 This Article is brought to you for free and open access by the College of Engineering and Computer Science at SURFACE. It has been accepted for inclusion in Northeast Parallel Architecture Center by an authorized administrator of SURFACE. For more information, please contact [email protected]. From Computational Science to Internetics Integration of Science with Computer Science Geoffrey C. Fox Syracuse University Northeast Parallel Architectures Center 111 College Place Syracuse, New York 13244-4100 [email protected] http://www.npac.syr.edu In honor of John Rice at Purdue University Abstract We describe how our world dominated by Science and Scientists has been changed and will be revolutionized by technologies moving with Internet time. Computers have always been well-used tools but in the beginning only the science counted and little credit or significance was attached to any computing activities associated with scientific research. Some 20 years ago, this started to change and the area of computational science gathered support with the NSF Supercomputer centers playing a critical role. However this vision has stalled over the last 5 years with information technology increasing in importance.
  • Censorship-Resistant Collaboration with a Hybrid DTN/P2P Network
    Censorship-resistant Collaboration with a Hybrid DTN/P2P Network. Master's thesis by Philipp Hagemeister from Braunschweig, submitted to the Chair of Computer Networks and Communication Systems (Lehrstuhl für Rechnernetze und Kommunikationssysteme), Prof. Dr. Martin Mauve, Heinrich-Heine-Universität Düsseldorf, March 2012. Acknowledgments: My thanks go to Marc Fontaine for asking stupid questions that turned out to be quite clever, and for pointing out that correctness is essential both in the real and the physical world. I also thank Paul Baade for demanding impossible features which turned out to be the last piece in the puzzle. Julius Römmler has notified me of orthographical, typographical, and (inadvertently) semantical errors. And told me to use fewer big words. Thanks! I wish to thank Denis Lütke-Wiesmann for proofreading the thesis, and the footnotes. Sven Hager found lots of overly short, overly long, and overly wrong statements. Thanks! Thanks to Prof. Martin Mauve for coming up with the idea, shielding us from bureaucracy, asking for explanation and rationale at every step, and finding all the errors nobody else found. Contents: List of Figures; 1 Motivation; 1.1 Distribution of Speech; 1.2 Threat Model; 1.2.1 Nontechnical Attacks; 1.2.2 Internet Access; 1.2.3 Control over the User's Computer; 1.2.4 Total Shutoff; 1.2.5 Physical Attacks; 1.2.6 IP Blocking; 1.2.7 DNS Censorship; 1.2.8 Deep Packet Inspection; 1.2.9 Active Attacks; 1.2.10 Conclusions; 1.3 Decentralization; 1.4 Collaboration; 1.5 Structure of this Thesis
  • Unveiling the I2P Web Structure: a Connectivity Analysis
    Unveiling the I2P web structure: a connectivity analysis. Roberto Magán-Carrión, Alberto Abellán-Galera, Gabriel Maciá-Fernández and Pedro García-Teodoro. Network Engineering & Security Group, Dpt. of Signal Theory, Telematics and Communications - CITIC, University of Granada - Spain. Email: [email protected], [email protected], [email protected], [email protected]. Abstract—Web is a primary and essential service to share information among users and organizations at present all over the world. Despite the current significance of such a kind of traffic on the Internet, the so-called Surface Web traffic has been estimated in just about 5% of the total. The rest of the volume of this type of traffic corresponds to the portion of Web known as Deep Web. These contents are not accessible by search engines because they are authentication protected contents or pages that are only reachable through the well known as darknets. To browse through darknets websites special authorization or specific software and configurations are needed. Despite TOR is the most used darknet nowadays, there are [...] the literature have analyzed the content and services offered through this kind of technologies [6], [7], [2], as well as other relevant aspects like site popularity [8], topology and dimensions [9], or classifying network traffic and darknet applications [10], [11], [12], [13], [14]. Two of the most popular darknets at present are The Onion Router (TOR; https://www.torproject.org/) and The Invisible Internet Project (I2P; https://geti2p.net/en/). This paper is focused on exploring and investigating the contents and structure of the websites in I2P, the so-called eepsites.
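
    A connectivity analysis of the kind described here usually reduces to crawling eepsite-to-eepsite links into a directed graph and then inspecting degrees and components. A minimal sketch using networkx follows; the edge list is invented and merely stands in for crawled I2P link data.

```python
from collections import Counter

import networkx as nx

# Hypothetical crawl results: (source eepsite, linked eepsite)
links = [
    ("siteA.i2p", "siteB.i2p"), ("siteA.i2p", "siteC.i2p"),
    ("siteB.i2p", "siteC.i2p"), ("siteC.i2p", "siteA.i2p"),
    ("siteD.i2p", "siteB.i2p"),
]

g = nx.DiGraph()
g.add_edges_from(links)

# Basic connectivity measures over the crawled web graph
print("eepsites:", g.number_of_nodes(), "links:", g.number_of_edges())
print("in-degree distribution:", Counter(deg for _, deg in g.in_degree()))
print("weakly connected components:",
      [sorted(c) for c in nx.weakly_connected_components(g)])
print("largest strongly connected component:",
      sorted(max(nx.strongly_connected_components(g), key=len)))
```
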
  • Privacy-Preserving P2P Data Sharing with OneSwarm
    Privacy-Preserving P2P Data Sharing with OneSwarm. Tomas Isdal, Michael Piatek, Arvind Krishnamurthy, Thomas Anderson, University of Washington. ABSTRACT: Privacy—the protection of information from unauthorized disclosure—is increasingly scarce on the Internet. The lack of privacy is particularly true for popular peer-to-peer data sharing applications such as BitTorrent where user behavior is easily monitored by third parties. Anonymizing overlays such as Tor and Freenet can improve user privacy, but only at a cost of substantially reduced performance. Most users are caught in the middle, unwilling to sacrifice either privacy or performance. In this paper, we explore a new design point in this tradeoff between privacy and performance. We describe the design and implementation of a new P2P data sharing protocol, called OneSwarm, that provides users much better privacy than BitTorrent and much better performance than Tor or Freenet. [...] On one side, protocols like BitTorrent are high performance and robust, but participants are easily monitored by anyone who cares to look [33, 30, 29]. On the other, anonymization networks (e.g., Tor [14] and Freenet [12]) emphasize privacy, but offer comparatively poor performance. In this paper, we explore a new design point in this tradeoff between privacy and performance. We describe the design and implementation of a file-sharing protocol called OneSwarm, intended to provide much better performance than Tor and Freenet, and much better privacy than BitTorrent. Our goal is to provide good enough performance that users turn on privacy by default for all of their non-public data sharing. To this end, data objects shared via OneSwarm are located and transferred using disposable, temporary ad-
  • Globus: a Metacomputing Infrastructure Toolkit
    Globus: A Metacomputing Infrastructure Toolkit. Ian Foster, Carl Kesselman. http://www.globus.org. Abstract: Emerging high-performance applications require the ability to exploit diverse, geographically distributed resources. These applications use high-speed networks to integrate supercomputers, large databases, archival storage devices, advanced visualization devices and/or scientific instruments to form networked virtual supercomputers or metacomputers. While the physical infrastructure to build such systems is becoming widespread, the heterogeneous and dynamic nature of the metacomputing environment poses new challenges for developers of system software, parallel tools, and applications. In this article we introduce Globus, a system that we are developing to address these challenges. The Globus system is intended to achieve a vertically integrated treatment of application, middleware, and network. A low-level toolkit provides basic mechanisms such as communication, authentication, network information, and data access. These mechanisms are used to construct various higher-level metacomputing services, such as parallel programming tools and schedulers. Our long-term goal is to build an Adaptive Wide Area Resource Environment (AWARE), an integrated set of higher-level services that enable applications to adapt to heterogeneous and dynamically changing metacomputing environments. Preliminary versions of Globus components were deployed successfully as part of the I-WAY networking experiment. Introduction: New classes of high-performance applications are being developed
  • Using Metacomputing Tools to Facilitate Large-Scale Analyses of Biological Databases
    Pacific Symposium on Biocomputing 6:360-371 (2001). Using Metacomputing Tools to Facilitate Large-Scale Analyses of Biological Databases. Allison Waugh (1), Glenn A. Williams (1), Liping Wei (2), and Russ B. Altman (1). (1) Stanford Medical Informatics, 251 Campus Drive, MSOB X-215, Stanford, CA 94305-5479, {waugh, gaw, altman} @ smi.stanford.edu. (2) Exelixis, Inc., 260 Littlefield Ave, South San Francisco, CA 94080, [email protected]. ABSTRACT: Given the high rate at which biological data are being collected and made public, it is essential that computational tools be developed that are capable of efficiently accessing and analyzing these data. High-performance distributed computing resources can play a key role in enabling large-scale analyses of biological databases. We use a distributed computing environment, LEGION, to enable large-scale computations on the Protein Data Bank (PDB). In particular, we employ the FEATURE program to scan all protein structures in the PDB in search for unrecognized potential cation binding sites. We evaluate the efficiency of LEGION's parallel execution capabilities and analyze the initial biological implications that result from having a site annotation scan of the entire PDB. We discuss four interesting proteins with unannotated, high-scoring candidate cation binding sites. INTRODUCTION: In the last three years the Protein Data Bank (PDB) has increased from 5,811 released atomic coordinate entries to 12,110 [1]. The growth of the PDB is typical of many other biological databases. As these databases increase in size, analyzing their contents in a systematic manner becomes important. Two common analyses are (1) linear scans that examine each entry (such as looking through all DNA sequences for the presence of a new motif), and (2) "all-versus-all" comparisons in which each element is compared with all other elements (such as clustering of structures, as reported in Shindyalov and Bourne [2]).
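
    The two analysis patterns named in the introduction — a linear scan that examines every entry, and an "all-versus-all" comparison of each pair — have the general shape sketched below. The `score_site` and `compare_structures` callables are placeholders for whatever per-entry and pairwise computations (FEATURE scoring, structural alignment, clustering) are actually plugged in.

```python
from itertools import combinations


def linear_scan(entries, score_site):
    """Pattern 1: examine each database entry independently (embarrassingly parallel)."""
    return {pdb_id: score_site(structure) for pdb_id, structure in entries.items()}


def all_versus_all(entries, compare_structures):
    """Pattern 2: compare every unordered pair of entries, roughly n*(n-1)/2 comparisons."""
    return {(a, b): compare_structures(entries[a], entries[b])
            for a, b in combinations(sorted(entries), 2)}


# Toy stand-ins for real PDB structures and scoring functions
entries = {"1abc": "structure-1", "2xyz": "structure-22", "3pdb": "structure-3"}
print(linear_scan(entries, score_site=len))
print(all_versus_all(entries, compare_structures=lambda x, y: len(x) == len(y)))
```

    Both patterns parallelize naturally over independent work units, which is why a distributed environment such as LEGION can farm the per-entry or per-pair jobs out to many hosts.
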
  • Metacomputing Across Intercontinental Networks
    Future Generation Computer Systems 17 (2001) 911–918. Metacomputing across intercontinental networks. S.M. Pickles (a,*), J.M. Brooke (a), F.C. Costen (a), Edgar Gabriel (b), Matthias Müller (b), Michael Resch (b), S.M. Ord (c). (a) Manchester Computing, The University of Manchester, Oxford Road, Manchester M13 9PL, UK; (b) High Performance Computing Center, Stuttgart, Allmandring 30, D-70550 Stuttgart, Germany; (c) NRAL, Jodrell Bank, Macclesfield, Cheshire SK11 9DL, UK. Abstract: An intercontinental network of supercomputers spanning more than 10 000 miles and running challenging scientific applications was realized at the Supercomputing '99 (SC99) conference in Portland, OR using PACX-MPI and ATM PVCs. In this paper, we describe how we constructed the heterogeneous cluster of supercomputers, the problems we confronted in terms of multi-architecture and the way several applications handled the specific requirements of a metacomputer. © 2001 Elsevier Science B.V. All rights reserved. Keywords: Metacomputing; High-performance networks; PACX-MPI. 1. Overview of the SC99 global network: During SC99 a network connection was set up that connected systems in Europe, the US and Japan. For the experiments described in this paper, four supercomputers were linked together:
    • A Hitachi SR8000 at Tsukuba/Japan with 512 processors (64 nodes).
    • A Cray T3E-900/512 at Pittsburgh/USA with 512 [...]

    • national high speed networks in each partner country,
    • the European high speed research network TEN-155,
    • a transatlantic ATM connection between the German research network (DFN) and the US research network Abilene,
    • a transpacific ATM research network (TransPAC) that was connecting Japanese research networks to STAR-TAP at Chicago.
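
    PACX-MPI lets an MPI application span machines at different sites behind the standard MPI interface. The sketch below uses plain mpi4py rather than PACX-MPI itself and is only meant to show the programming model assumed here: ranks spread across hosts cooperate on a single job.

```python
import socket

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Each rank takes a strided slice of a (hypothetical) global work list
work = list(range(16))
my_items = work[rank::size]
partial = sum(x * x for x in my_items)

# Report where each rank ran and combine the partial results at rank 0
host = socket.gethostname()
results = comm.gather((rank, host, partial), root=0)
total = comm.reduce(partial, op=MPI.SUM, root=0)
if rank == 0:
    for r, h, p in results:
        print(f"rank {r} on {h}: partial = {p}")
    print("total =", total)
```

    Run locally with something like `mpirun -n 4 python sketch.py`; the point of a coupling library such as PACX-MPI is that the same source can instead run with its ranks distributed over supercomputers on different continents.
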