Introduction to Grid Computing

Front cover

Introduction to Grid Computing

Learn grid computing basics
Understand architectural considerations
Create and demonstrate a grid environment

Bart Jacob
Michael Brown
Kentaro Fukui
Nihar Trivedi

ibm.com/redbooks

International Technical Support Organization

Introduction to Grid Computing

December 2005

SG24-6778-00

Note: Before using this information, read the information in “Notices” on page ix.

First Edition (December 2005)

© Copyright International Business Machines Corporation 2005. All rights reserved.

Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

Contents

Notices
Trademarks
Preface
  The team that wrote this redbook
  Become a published author
  Comments welcome

Part 1. Grid fundamentals
Chapter 1. What grid computing is
Chapter 2. Benefits of grid computing
  2.1 Exploiting underutilized resources
  2.2 Parallel CPU capacity
  2.3 Virtual resources and virtual organizations for collaboration
  2.4 Access to additional resources
  2.5 Resource balancing
  2.6 Reliability
  2.7 Management
  2.8 Summary
Chapter 3. Grid terms and concepts
  3.1 Types of resources
    3.1.1 Computation
    3.1.2 Storage
    3.1.3 Communications
    3.1.4 Software and licenses
    3.1.5 Special equipment, capacities, architectures, and policies
  3.2 Jobs and applications
  3.3 Scheduling, reservation, and scavenging
  3.4 Grid software components
    3.4.1 Management components
    3.4.2 Distributed grid management
    3.4.3 Donor software
    3.4.4 Submission software
    3.4.5 Schedulers
    3.4.6 Communications
    3.4.7 Observation and measurement
  3.5 Intragrid and intergrid
  3.6 Summary
Chapter 4. Grid user roles
  4.1 Using a grid: A user’s perspective
    4.1.1 Enrolling and installing grid software
    4.1.2 Logging onto the grid
    4.1.3 Queries and submitting jobs
    4.1.4 Data configuration
    4.1.5 Monitoring progress and recovery
    4.1.6 Reserving resources
  4.2 Using a grid: An administrator’s perspective
    4.2.1 Planning
    4.2.2 Installation
    4.2.3 Managing enrollment of donors and users
    4.2.4 Certificate authority
    4.2.5 Resource management
    4.2.6 Data sharing
  4.3 Summary

Part 2. Grid architecture considerations
Chapter 5. Standards for grid environments
  5.1 Overview
    5.1.1 OGSA
    5.1.2 OGSI
    5.1.3 OGSA-DAI
    5.1.4 GridFTP
    5.1.5 WSRF
    5.1.6 Web services related standards
Chapter 6. Application considerations
  6.1 General application considerations
  6.2 CPU-intensive application considerations
  6.3 Data considerations
  6.4 Summary
Chapter 7. Security
  7.1 Introduction to grid security
    7.1.1 Grid security requirements
    7.1.2 Security fundamentals
    7.1.3 Important grid security terms
    7.1.4 Symmetric key encryption
    7.1.5 Asymmetric key encryption
    7.1.6 The Certificate Authority
    7.1.7 Digital certificates
  7.2 Grid security infrastructure
    7.2.1 Getting access to the grid
    7.2.2 Grid secure communication
    7.2.3 Grid security step-by-step
  7.3 Grid infrastructure security
    7.3.1 Physical security
    7.3.2 Operating system security
    7.3.3 Grid and firewalls
    7.3.4 Host intrusion detection
…
Recommended publications
  • Distributed & Cloud Computing: Lecture 1
    Distributed & cloud computing: Lecture 1. Chokchai “Box” Leangsuksun, SWECO Endowed Professor, Computer Science, Louisiana Tech University; CTO, PB Tech International Inc. Class info: class hours noon–1:50 pm, Tuesday and Thursday; Dr. Box’s contact info: www.latech.edu/~box, phone 318-257-3291. Main texts: (1) Distributed Systems: Concepts and Design by Coulouris, Dollimore, Kindberg and Blair, 5th edition, © Addison-Wesley 2012; (2) related publications and online literature. Objective: the theory, design and implementation of distributed systems; discussion of abstractions, concepts and current systems/applications; best practices in research on distributed systems and cloud computing. Course contents: the characterization of distributed computing and cloud computing; system models; networking and inter-process communication; OS support and virtualization; RAS, performance and reliability modeling; security. Introduction to cloud computing: various models and applications; deployment models; service models (SaaS, PaaS, IaaS, XaaS); public cloud: Amazon AWS; private cloud: OpenStack; cost models (cloud vs. hosting your own). Case studies: introduction to HPC; multicore and OpenMP; manycore, GPGPU and CUDA; cloud-based EKG system; distributed objects and web services (if time allows); distributed file systems (if time allows); replication and disaster recovery preparedness. Important dates: Apr 10: midterm exam; Apr 22: term paper and presentation due; May 15: final exam. Evaluation: 20% lab exercises/quizzes, 20% programming assignments, 10% term paper, 20% midterm exam, 30% final exam. Intro to distributed computing: distributed system definitions; distributed systems examples (the Internet, intranets, mobile computing, cloud computing); resource sharing; the World Wide Web; distributed systems challenges. Based on Mostafa’s lecture.
  • Distributed Algorithms with Theoretic Scalability Analysis of Radial and Looped Load Flows for Power Distribution Systems
    Electric Power Systems Research 65 (2003) 169–177, www.elsevier.com/locate/epsr. Distributed algorithms with theoretic scalability analysis of radial and looped load flows for power distribution systems. Fangxing Li*, Robert P. Broadwater, ECE Department, Virginia Tech, Blacksburg, VA 24060, USA. Received 15 April 2002; received in revised form 14 December 2002; accepted 16 December 2002. Abstract: This paper presents distributed algorithms for both radial and looped load flows for unbalanced, multi-phase power distribution systems. The distributed algorithms are developed from tree-based sequential algorithms. Formulas of scalability for the distributed algorithms are presented. It is shown that computation time dominates communication time in the distributed computing model. This provides benefits to real-time load flow calculations, network reconfigurations, and optimization studies that rely on load flow calculations. Also, test results match the predictions of derived formulas. This shows the formulas can be used to predict the computation time when additional processors are involved. © 2003 Elsevier Science B.V. All rights reserved. Keywords: Distributed computing; Scalability analysis; Radial load flow; Looped load flow; Power distribution systems. 1. Introduction: Parallel and distributed computing has been applied to many scientific and engineering computations such as weather forecasting and nuclear simulations [1,2]. It also has been applied to power system analysis calculations [3–14]. … Also, the method presented in Ref. [10] was tested in radial distribution systems with no more than 528 buses. More recent works [11–14] presented distributed implementations for power flows or power flow based algorithms like optimizations and contingency analysis. These works also targeted power transmission systems.
  • BIOINFORMATICS APPLICATIONS NOTE doi:10.1093/bioinformatics/btq011
    Vol. 26 no. 5 2010, pages 705–707. BIOINFORMATICS APPLICATIONS NOTE, doi:10.1093/bioinformatics/btq011. Databases and ontologies. Advance Access publication January 19, 2010. ProteinWorldDB: querying radical pairwise alignments among protein sets from complete genomes. Thomas Dan Otto (1,2,*), Marcos Catanho (1), Cristian Tristão (3), Márcia Bezerra (3), Renan Mathias Fernandes (4), Guilherme Steinberger Elias (4), Alexandre Capeletto Scaglia (4), Bill Bovermann (5), Viktors Berstis (5), Sergio Lifschitz (3), Antonio Basílio de Miranda (1) and Wim Degrave (1). (1) Laboratório de Genômica Funcional e Bioinformática, Instituto Oswaldo Cruz, Fiocruz, Rio de Janeiro, Brazil; (2) Pathogen Genomics, Wellcome Trust Genome Campus, Hinxton, UK; (3) Departamento de Informática, Pontifícia Universidade Católica do Rio de Janeiro, Rio de Janeiro; (4) IBM Brasil, Hortolândia, São Paulo, Brazil; (5) IBM, Austin, TX, USA. Associate Editor: Alfonso Valencia. ABSTRACT. Motivation: Many analyses in modern biological research are based on comparisons between biological sequences, resulting in functional, evolutionary and structural inferences. When large numbers of sequences are compared, heuristics are often used, resulting in a certain lack of accuracy. In order to improve and validate results of such comparisons, we have performed radical all-against-all comparisons of 4 million protein sequences belonging to the RefSeq database, using an implementation of the Smith–Waterman algorithm. … nomenclature or might have no value when inferred from previous incorrectly annotated sequences. Hence, secondary databases such as Swiss-Prot (http://www.expasy.ch/sprot/), PFAM (http://pfam.sanger.ac.uk) or KEGG (http://www.genome.ad.jp/kegg), to mention only a few, have been implemented to analyze specific functional aspects and to improve the annotation procedures and results. Dynamic programming algorithms, or a fast approximation, have been successfully applied to biological sequence comparison …
  • Cluster, Grid and Cloud Computing: a Detailed Comparison
    The 6th International Conference on Computer Science & Education (ICCSE 2011), August 3–5, 2011, SuperStar Virgo, Singapore. ThC 3.33. Cluster, Grid and Cloud Computing: A Detailed Comparison. Naidila Sadashiv, Dept. of Computer Science and Engineering, Acharya Institute of Technology, Bangalore, India; S. M. Dilip Kumar, Dept. of Computer Science and Engineering, University Visvesvaraya College of Engineering (UVCE), Bangalore, India. Abstract: Cloud computing is rapidly growing as an alternative to conventional computing. However, it is based on models like cluster computing, distributed computing, utility computing and grid computing in general. This paper presents an end-to-end comparison between Cluster Computing, Grid Computing and Cloud Computing, along with the challenges they face. This could help in better understanding these models and how they differ from their related concepts, all in one go. It also discusses the ongoing projects and different applications that use these computing models as a platform for execution. An insight into some of the tools which can be used in the three computing models to design and develop applications is given. … without any prior reservation and hence eliminates over-provisioning and improves resource utilization. To the best of our knowledge, only a few comparisons have appeared in the literature in the field of computing. In this paper we bring out a complete comparison of the three computing models. The rest of the paper is organized as follows. The cluster computing, grid computing and cloud computing models are briefly explained in Section II. Issues and challenges related to these computing models are listed …
  • Grid Computing: What Is It, and Why Do I Care?*
    Grid Computing: What Is It, and Why Do I Care?* Ken MacInnis. * Or, “Mi caja es su caja!” (“My box is your box!”). © Ken MacInnis 2004. Outline: Introduction and Motivation; Examples; Architecture, Components, Tools; Lessons Learned and The Future; Questions? What is “grid computing”? Many different definitions: utility computing (cycles for sale); distributed computing (distributed.net RC5, SETI@Home); high-performance resource sharing (clusters, storage, visualization, networking). “We will probably see the spread of ‘computer utilities’, which, like present electric and telephone utilities, will service individual homes and offices across the country.” Len Kleinrock (1969). The word “grid” doesn’t equal Grid Computing: Sun Grid Engine is a mere scheduler! Better definitions: Common protocols allowing large problems to be solved in a distributed multi-resource multi-user environment. “A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities.” Kesselman & Foster (1998). “…coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations.” Kesselman, Foster, Tuecke (2000). New Challenges for Computing: grid computing evolved out of a need to share resources; flexible, ever-changing “virtual organizations”; high-energy physics, astronomy, more; differing site policies with common needs; disparate computing needs
  • Cloud Computing Over Cluster, Grid Computing: a Comparative Analysis
    Journal of Grid and Distributed Computing, Volume 1, Issue 1, 2011, pp. 01–04. Available online at: http://www.bioinfo.in/contents.php?id=92. Cloud Computing Over Cluster, Grid Computing: a Comparative Analysis. Indu Gandotra (1), Pawanesh Abrol (2), Pooja Gupta (3), Rohit Uppal (3) and Sandeep Singh (3). (1) Department of MCA, MIET, Jammu; (2) Department of Computer Science & IT, Jammu Univ, Jammu; (3) Department of MCA, MIET, Jammu. Abstract: There are dozens of definitions of cloud computing, and each definition gives a different idea of what cloud computing exactly is. Cloud computing is not a very new concept, because it is connected to the grid computing paradigm, whose concept came into existence thirteen years ago. Cloud computing is related not only to grid computing but also to utility computing as well as cluster computing. Cloud computing is a computing platform for sharing resources that include software, business processes, infrastructures and applications. Cloud computing also relies on the technology of virtualization. In this paper, we discuss Grid computing, Cluster computing and Cloud computing, i.e. … Virtualization is a technology that enables sharing of cloud resources. A cloud computing platform can become more flexible, extensible and reusable by adopting the concept of service-oriented architecture [5]. We will not need to unwrap shrink-wrapped software and install it. The cloud is much easier: just install a single software package in the centralized facility and cover all the requirements of the company’s users [1]. II. CLUSTER COMPUTING
  • “Grid Computing”
    VISHVESHWARAIAH TECHNOLOGICAL UNIVERSITY, S.D.M College of Engineering and Technology. A seminar report on “Grid Computing”, submitted by Nagaraj Baddi (2SD07CS402), 8th semester, Department of Computer Science Engineering, 2009–10. Certificate: Certified that the seminar work entitled “Grid Computing” is a bona fide work presented by Mr. Nagaraj M. Baddi, bearing USN 2SD07CS402, in partial fulfillment for the award of the degree of Bachelor of Engineering in Computer Science Engineering of the Vishveshwaraiah Technological University, Belgaum, during the year 2009–10. The seminar report has been approved as it satisfies the academic requirements with respect to seminar work presented for the Bachelor of Engineering degree. Staff in charge: S. L. Deshpande; H.O.D. CSE: S. M. Joshi. Index: 1. Introduction; 2. History; 3. How Grid Computing Works; 4. Related technologies (4.1 Cluster computing; 4.2 Peer-to-peer computing; 4.3 Internet computing); 5. Grid Computing Logical Levels (5.1 Cluster Grid; 5.2 Campus Grid; 5.3 Global Grid); 6. Grid Architecture (6.1 Grid fabric; 6.2 Core Grid middleware; 6.3 User-level Grid middleware; 6.4 Grid applications and portals); 7. Grid Applications (7.1 Distributed supercomputing; 7.2 High-throughput computing; 7.3 On-demand computing; 7.4 Data-intensive computing; 7.5 Collaborative computing); 8. Difference: Grid Computing vs Cloud Computing; 9. …
  • Computer Systems Architecture
    CS 352H: Computer Systems Architecture. Topic 14: Multicores, Multiprocessors, and Clusters. University of Texas at Austin, Fall 2009, Don Fussell. Introduction: Goal: connecting multiple computers to get higher performance. Multiprocessors: scalability, availability, power efficiency. Job-level (process-level) parallelism: high throughput for independent jobs. Parallel processing program: single program run on multiple processors. Multicore microprocessors: chips with multiple processors (cores). Hardware and Software: hardware is serial (e.g., Pentium 4) or parallel (e.g., quad-core Xeon E5345); software is sequential (e.g., matrix multiplication) or concurrent (e.g., operating system). Sequential/concurrent software can run on serial/parallel hardware. Challenge: making effective use of parallel hardware. What We’ve Already Covered: §2.11: Parallelism and Instructions (synchronization); §3.6: Parallelism and Computer Arithmetic (associativity); §4.10: Parallelism and Advanced Instruction-Level Parallelism; §5.8: Parallelism and Memory Hierarchies (cache coherence); §6.9: Parallelism and I/O: Redundant Arrays of Inexpensive Disks. Parallel Programming: Parallel software is the problem. Need to get significant performance improvement; otherwise, just use a faster uniprocessor, …
  • Strategies for Managing Business Disruption Due to Grid Computing
    Strategies for managing business disruption due to Grid Computing, by Vidyadhar Phalke. Ph.D. Computer Science, Rutgers University, New Jersey, 1995; M.S. Computer Science, Rutgers University, New Jersey, 1992; B.Tech. Computer Science, Indian Institute of Technology, Delhi, 1989. Submitted to the MIT Sloan School of Management in partial fulfillment of the requirements for the degree of Master of Science in the Management of Technology at the Massachusetts Institute of Technology, June 2003. © 2003 Vidyadhar Phalke. All rights reserved. The author hereby grants to MIT permission to reproduce and to distribute publicly paper and electronic copies of this thesis document in whole or in part. Signature of Author: MIT Sloan School of Management, 9 May 2003. Certified by: Starling D. Hunter III, Theodore T. Miller Career Development Assistant Professor, Thesis Supervisor. Accepted by: David A. Weber, Director, Management of Technology Program. Abstract (submitted to the MIT Sloan School of Management on May 9, 2003): In technology-centric businesses, disruptive technologies displace incumbents time and again, sometimes to the extent that incumbents go bankrupt. In this thesis we address the question of which strategies are essential to prepare for and manage disruptions in the affected businesses and industries. Specifically, we look at grid computing, which is poised to disrupt (1) certain enterprise IT departments and (2) the software industry in the high-performance and web services space.
  • Taxonomy of Flynn (1966)
    Taxonomy of Flynn (1966). To describe these non-von Neumann or parallel architectures, a generally accepted taxonomy is that of Flynn (1966). The classification is based on the notion of two streams of information flow to a processor: instructions and data. These two streams can be either single or multiple, giving four classes of machines: 1. Single instruction single data (SISD); 2. Single instruction multiple data (SIMD); 3. Multiple instruction single data (MISD); 4. Multiple instruction multiple data (MIMD). Table 5.4 shows the four primary classes and some of the architectures that fit in those classes. Most of these architectures will be briefly discussed. 5.4.2.1 Single Instruction Single Data: The SISD architectures encompass standard serial von Neumann architecture computers. In a sense, the SISD category is the base metric for Flynn’s taxonomy. 5.4.2.2 Single Instruction Multiple Data: The SIMD computers are essentially array processors. This type of parallel computer architecture has n processors, each executing the same instruction, but on different data streams. Often each element in the array can only communicate with its nearest neighbour. Computer architectures that are usually classified as SIMD are the systolic and wave-front array computers. In both types of processor, each processing element executes the same (and only) instruction, but on different data. Hence these architectures are SIMD. SIMD machines are widely used for imaging computations such as matrix arithmetic and convolution. 5.4.2.3 Multiple Instruction Single Data: The MISD computer architecture lends itself naturally to those computations requiring an input to be subjected to several operations, each receiving the input in its original form.
  • Computing at Massive Scale: Scalability and Dependability Challenges
    Repository copy (accepted version) on White Rose Research Online: http://eprints.whiterose.ac.uk/105671/. Yang, R. and Xu, J. (ORCID 0000-0002-4598-167X) (2016). Computing at massive scale: Scalability and dependability challenges. In: Proceedings of the 2016 IEEE Symposium on Service-Oriented System Engineering (SOSE 2016), 29 Mar–2 Apr 2016, Oxford, United Kingdom. IEEE, pp. 386–397. ISBN 9781509022533. https://doi.org/10.1109/SOSE.2016.73. © 2016 IEEE.
  • Distributed Computing with the Berkeley Open Infrastructure for Network Computing (BOINC)
    Distributed Computing with the Berkeley Open Infrastructure for Network Computing (BOINC). Eric Myers, Mid-Hudson Valley Linux Users Group, 1 September 2010 (COVID-19 update, 8 July 2020). How BOINC works (“…for like 8 to 12 hrs!”): BOINC is the software framework that makes this all work. BOINC client: Windows, Linux, Mac OS (& Solaris, AIX, HP-UX, etc…). BOINC server: Linux; 50+ separate projects. BOINC Dataflow (diagram). SETI@home (http://setiathome.berkeley.edu): “paused” in 2020. Einstein@Home (http://einstein.phys.uwm.edu/ or http://einsteinathome.org): 78 new pulsars detected by 2020. Rosetta@home. World Community Grid (http://www.worldcommunitygrid.org/), funded and operated by IBM; projects as of 2010. Active: The Clean Energy Project - Phase 2; Help Cure Muscular Dystrophy - Phase 2; Help Fight Childhood Cancer; Help Conquer Cancer; Human Proteome Folding - Phase 2; FightAIDS@Home; Nutritious Rice for the World; Discovering Dengue Drugs - Together - Phase 2. Intermittent: AfricanClimate@Home. Completed: Help Cure Muscular Dystrophy; Influenza Antiviral Drug Search; Genome Comparison; The Clean Energy Project; Help Defeat Cancer; Discovering Dengue Drugs - Together; Human Proteome Folding. To join: 1. Download BOINC. 2. Run BOINC Manager. 3. Tools -> Add Project. Call to action (8 July 2020, HVopen): use your spare cycles to help fight COVID-19 with World Community Grid, Rosetta@home, or OpenPandemics - COVID-19. Rules and policies: run BOINC only on authorized computers, that is, computers that you own or for which you have obtained the owner’s permission.
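The ProteinWorldDB applications note above relies on the Smith–Waterman algorithm for exact local alignment. The sketch below is a minimal Python version that returns only the optimal local-alignment score. The function name `smith_waterman` and the scoring scheme (a fixed match reward, a fixed mismatch penalty and a linear gap penalty) are illustrative assumptions; protein comparisons normally use a substitution matrix such as BLOSUM62 with affine gap penalties, and nothing here is the authors' implementation.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1) -> int:
    """Return the best local-alignment score of a and b (linear gap penalty; illustrative scoring)."""
    # H[i][j] = best score of any local alignment ending at a[i-1] and b[j-1].
    # Row 0 and column 0 stay 0, so an alignment may start anywhere.
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = H[i - 1][j] + gap    # a[i-1] aligned against a gap in b
            left = H[i][j - 1] + gap  # b[j-1] aligned against a gap in a
            # Clamping at 0 is what makes the alignment local rather than global.
            H[i][j] = max(0, diag, up, left)
            best = max(best, H[i][j])
    return best


print(smith_waterman("TGTTACGG", "GGTTGACTA", match=3, mismatch=-3, gap=-2))  # 13
```

Each pair costs O(m·n) time for sequences of lengths m and n, and an all-against-all run over n sequences needs n(n − 1)/2 pairs; for 4 million sequences that is roughly 8 × 10^12 alignments, which helps explain why such comparisons call for grid-scale resources.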
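The Flynn (1966) excerpt above classifies machines by whether the instruction stream and the data stream are single or multiple. A minimal Python sketch of that two-axis lookup follows; the `Flynn` enum and the `classify` function are hypothetical names, and describing a machine by counts of concurrent streams is a simplifying assumption.

```python
from enum import Enum


class Flynn(Enum):
    """The four classes of Flynn's (1966) taxonomy."""
    SISD = "single instruction, single data"
    SIMD = "single instruction, multiple data"
    MISD = "multiple instruction, single data"
    MIMD = "multiple instruction, multiple data"


def classify(instruction_streams: int, data_streams: int) -> Flynn:
    """Map counts of concurrent instruction and data streams to a Flynn class."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("a machine needs at least one instruction stream and one data stream")
    # Flynn only distinguishes "single" from "multiple" on each axis.
    key = (instruction_streams > 1, data_streams > 1)
    return {
        (False, False): Flynn.SISD,  # serial von Neumann machine
        (False, True): Flynn.SIMD,   # array processor: one instruction, many data streams
        (True, False): Flynn.MISD,   # several operations applied to the same input
        (True, True): Flynn.MIMD,    # independent instruction and data streams
    }[key]


print(classify(1, 1).name)   # SISD
print(classify(1, 64).name)  # SIMD
print(classify(8, 8).name)   # MIMD
```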
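The BOINC slides above split volunteer computing into a client on volunteers' machines and a project server, with a dataflow diagram that is not reproduced here. The toy model below is a sketch under stated assumptions: the server hands out work units, clients compute and report, and a unit is accepted only when a quorum of matching results arrives (BOINC projects can replicate work and require matching results, but this code is not BOINC's API). `WorkUnit`, `ProjectServer`, `run_client`, `honest` and `faulty` are all hypothetical names.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class WorkUnit:
    wu_id: int
    payload: list[int]


@dataclass
class ProjectServer:
    quorum: int = 2  # matching results required before a work unit is accepted
    pending: dict[int, WorkUnit] = field(default_factory=dict)
    reports: dict[int, list[int]] = field(default_factory=lambda: defaultdict(list))
    canonical: dict[int, int] = field(default_factory=dict)

    def add(self, wu: WorkUnit) -> None:
        self.pending[wu.wu_id] = wu

    def fetch(self) -> list[WorkUnit]:
        # Hand out every unit still lacking a validated result; return a copy
        # so units can be retired while a client iterates.
        return list(self.pending.values())

    def report(self, wu_id: int, value: int) -> None:
        if wu_id in self.canonical:
            return  # already validated; late replicas are ignored
        self.reports[wu_id].append(value)
        top_value, votes = Counter(self.reports[wu_id]).most_common(1)[0]
        if votes >= self.quorum:
            self.canonical[wu_id] = top_value
            del self.pending[wu_id]


def run_client(server: ProjectServer, compute) -> None:
    """One volunteer host: fetch work, compute locally, report results."""
    for wu in server.fetch():
        server.report(wu.wu_id, compute(wu.payload))


def honest(payload: list[int]) -> int:
    return sum(x * x for x in payload)


def faulty(payload: list[int]) -> int:
    return honest(payload) + 1  # e.g. a host with unreliable hardware


server = ProjectServer(quorum=2)
for i in range(3):
    server.add(WorkUnit(i, list(range(i, i + 4))))
for client in (faulty, honest, honest):
    run_client(server, client)
print(server.canonical)  # {0: 14, 1: 30, 2: 54}
```

The faulty host's results never reach the quorum, so the two matching honest results win. Replicating every unit multiplies the compute cost, but it lets the server reject bad results without trusting any single volunteer.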