Beowulf Cluster Computing with Linux (Scientific and Engineering Computation)

Scientific and Engineering Computation
Janusz Kowalik, editor

Data-Parallel Programming on MIMD Computers, Philip J. Hatcher and Michael J. Quinn, 1991
Unstructured Scientific Computation on Scalable Multiprocessors, edited by Piyush Mehrotra, Joel Saltz, and Robert Voigt, 1992
Parallel Computational Fluid Dynamics: Implementation and Results, edited by Horst D. Simon, 1992
Enterprise Integration Modeling: Proceedings of the First International Conference, edited by Charles J. Petrie, Jr., 1992
The High Performance Fortran Handbook, Charles H. Koelbel, David B. Loveman, Robert S. Schreiber, Guy L. Steele Jr., and Mary E. Zosel, 1994
PVM: Parallel Virtual Machine–A Users’ Guide and Tutorial for Network Parallel Computing, Al Geist, Adam Beguelin, Jack Dongarra, Weicheng Jiang, Bob Manchek, and Vaidy Sunderam, 1994
Practical Parallel Programming, Gregory V. Wilson, 1995
Enabling Technologies for Petaflops Computing, Thomas Sterling, Paul Messina, and Paul H. Smith, 1995
An Introduction to High-Performance Scientific Computing, Lloyd D. Fosdick, Elizabeth R. Jessup, Carolyn J. C. Schauble, and Gitta Domik, 1995
Parallel Programming Using C++, edited by Gregory V. Wilson and Paul Lu, 1996
Using PLAPACK: Parallel Linear Algebra Package, Robert A. van de Geijn, 1997
Fortran 95 Handbook, Jeanne C. Adams, Walter S. Brainerd, Jeanne T. Martin, Brian T. Smith, and Jerrold L. Wagener, 1997
MPI—The Complete Reference: Volume 1, The MPI Core, Marc Snir, Steve Otto, Steven Huss-Lederman, David Walker, and Jack Dongarra, 1998
MPI—The Complete Reference: Volume 2, The MPI-2 Extensions, William Gropp, Steven Huss-Lederman, Andrew Lumsdaine, Ewing Lusk, Bill Nitzberg, William Saphir, and Marc Snir, 1998
A Programmer’s Guide to ZPL, Lawrence Snyder, 1999
How to Build a Beowulf, Thomas L. Sterling, John Salmon, Donald J. Becker, and Daniel F. Savarese, 1999
Using MPI: Portable Parallel Programming with the Message-Passing Interface, second edition, William Gropp, Ewing Lusk, and Anthony Skjellum, 1999
Using MPI-2: Advanced Features of the Message-Passing Interface, William Gropp, Ewing Lusk, and Rajeev Thakur, 1999
Beowulf Cluster Computing with Linux, Thomas Sterling, 2001
Beowulf Cluster Computing with Windows, Thomas Sterling, 2001

Beowulf Cluster Computing with Linux
Thomas Sterling
The MIT Press
Cambridge, Massachusetts
London, England

© 2002 Massachusetts Institute of Technology
All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.
This book was set in LaTeX by the author and was printed and bound in the United States of America.
Library of Congress Control Number 2001095383
ISBN: 0-262-69274-0
Disclaimer: Some images in the original version of this book are not available for inclusion in the eBook.

Dedicated with respect and appreciation to the memory of Seymour R. Cray, 1925–1996

Contents
Series Foreword
Foreword
Preface
1 Introduction—Thomas Sterling
1.1 Definitions and Taxonomy
1.2 Opportunities and Advantages
1.3 A Short History
1.4 Elements of a Cluster
1.5 Description of the Book

I Enabling Technologies

2 An Overview of Cluster Computing—Thomas Sterling
2.1 A Taxonomy of Parallel Computing
2.2 Hardware System Structure
2.2.1 Beowulf Compute Nodes
2.2.2 Interconnection Networks
2.3 Node Software
2.4 Resource Management
2.5 Distributed Programming
2.6 Conclusions
3 Node Hardware—Thomas Sterling
3.1 Overview of a Beowulf Node
3.1.1 Principal Specifications
3.1.2 Basic Elements
3.2 Processors
3.2.1 Intel Pentium Family
3.2.2 AMD Athlon
3.2.3 Compaq Alpha 21264
3.2.4 IA64
3.3 Motherboard
3.4 Memory
3.4.1 Memory Capacity
3.4.2 Memory Speed
3.4.3 Memory Types
3.4.4 Memory Hierarchy and Caches
3.4.5 Package Styles
3.5 BIOS
3.6 Secondary Storage
3.7 PCI Bus
3.8 Example of a Beowulf Node
3.9 Boxes, Shelves, Piles, and Racks
3.10 Node Assembly
3.10.1 Motherboard Preassembly
3.10.2 The Case
3.10.3 Minimal Peripherals
3.10.4 Booting the System
3.10.5 Installing the Other Components
3.10.6 Troubleshooting
4 Linux—Peter H. Beckman
4.1 What Is Linux?
4.1.1 Why Use Linux for a Beowulf?
4.1.2 A Kernel and a Distribution
4.1.3 Open Source and Free Software
4.1.4 A Linux Distribution
4.1.5 Version Numbers and Development Methods
4.2 The Linux Kernel
4.2.1 Compiling a Kernel
4.2.2 Loadable Kernel Modules
4.2.3 The Beowulf Kernel Diet
4.2.4 Diskless Operation
4.2.5 Downloading and Compiling a New Kernel
4.2.6 Linux File Systems
4.3 Pruning Your Beowulf Node
4.3.1 inetd.conf
4.3.2 /etc/rc.d/init.d
4.3.3 Other Processes and Daemons
4.4 Other Considerations
4.4.1 TCP Messaging
4.4.2 Hardware Performance Counters
4.5 Final Tuning with /proc
4.6 Conclusions
5 Network Hardware—Thomas Sterling
5.1 Interconnect Technologies
5.1.1 The Ethernets
5.1.2 Myrinet
5.1.3 cLAN
5.1.4 Scalable Coherent Interface
5.1.5 QsNet
5.1.6 Infiniband
5.2 A Detailed Look at Ethernet
5.2.1 Packet Format
5.2.2 NIC Architecture
5.2.3 Hubs and Switches
5.3 Network Practicalities: Interconnect Choice
5.3.1 Importance of the Interconnect
5.3.2 Differences between the Interconnect Choices
5.3.3 Strategies to Improve Performance over Ethernet
5.3.4 Cluster Network Pitfalls
5.3.5 An Example of an Ethernet Interconnected Beowulf
5.3.6 An Example of a Myrinet Interconnected Cluster
6 Network Software—Thomas Sterling
6.1 TCP/IP
6.1.1 IP Addresses
6.1.2 Zero-Copy Protocols
6.2 Sockets
6.3 Higher-Level Protocols
6.3.1 Remote Procedure Calls
6.3.2 Distributed Objects: CORBA and Java RMI
6.4 Distributed File Systems
6.4.1 NFS
6.4.2 AFS
6.4.3 Autofs: The Automounter
6.5 Remote Command Execution
6.5.1 BSD R Commands
6.5.2 SSH—The Secure Shell
7 Setting Up Clusters: Installation and Configuration—Thomas Sterling and Daniel Savarese
7.1 System Access Models
7.1.1 The Standalone System
7.1.2 The Universally Accessible Machine
7.1.3 The Guarded Beowulf
7.2 Assigning Names
7.2.1 Statically Assigned Addresses
7.2.2 Dynamically Assigned Addresses
7.3 Installing Node Software
7.3.1 Creating Tar Images
7.3.2 Setting Up a Clone Root Partition
7.3.3 Setting Up BOOTP
7.3.4 Building a Clone Boot Floppy
7.4 Basic System Administration
7.4.1 Booting and Shutting Down
7.4.2 The Node File System
7.4.3 Account Management
7.4.4 Running Unix Commands in Parallel
7.5 Avoiding Security Compromises
7.5.1 System Configuration
7.5.2 Restricting Host Access
7.5.3 Secure Shell
7.5.4 IP Masquerading
7.6 Job Scheduling
7.7 Some Advice on Upgrading Your Software
8 How Fast Is My Beowulf?—David Bailey
8.1 Metrics
8.2 Ping-Pong Test
8.3 The LINPACK Benchmark
8.4 The NAS Parallel Benchmark Suite

II Parallel Programming

9 Parallel Programming with MPI—William Gropp and Ewing Lusk
9.1 Hello World in MPI
9.1.1 Compiling and Running MPI Programs
9.1.2 Adding Communication to Hello World
9.2 Manager/Worker Example
9.3 Two-Dimensional Jacobi Example with One-Dimensional Decomposition
9.4 Collective Operations
9.5 Parallel Monte Carlo Computation
9.6 Installing MPICH under Linux
9.6.1 Obtaining and Installing MPICH
9.6.2 Running MPICH Jobs with the ch_p4 Device
9.6.3 Starting and Managing MPD
9.6.4 Running MPICH Jobs under MPD
9.6.5 Debugging MPI Programs
9.6.6 Other Compilers
9.7 Tools
9.7.1 Profiling Libraries
9.7.2 Visualizing Parallel Program Behavior
9.8 MPI Implementations for Clusters
9.9 MPI Routine Summary
10 Advanced Topics in MPI Programming—William Gropp and Ewing Lusk
10.1 Dynamic Process Management in MPI
10.1.1 Intercommunicators
10.1.2 Spawning New MPI Processes
10.1.3 Revisiting Matrix-Vector Multiplication
10.1.4 More on Dynamic Process Management
10.2 Fault Tolerance
10.3 Revisiting Mesh Exchanges
10.3.1 Blocking and Nonblocking Communication
10.3.2 Communicating Noncontiguous Data in MPI
10.4 Motivation for Communicators
10.5 More on Collective Operations
10.6 Parallel I/O
10.6.1 A Simple Example
10.6.2 A More Complex Example
10.7 Remote Memory Access
10.8 Using C++ and Fortran 90
10.9 MPI, OpenMP, and Threads
10.10 Measuring MPI Performance
10.10.1 mpptest
10.10.2 SKaMPI
10.10.3 High Performance LINPACK
10.11 MPI-2 Status
10.12 MPI Routine Summary
11 Parallel Programming with PVM—Al Geist and Stephen Scott
11.1 Overview
11.2 Program Examples
11.3 Fork/Join
11.4 Dot Product
11.5 Matrix Multiply
11.6 One-Dimensional Heat Equation
11.7 Using PVM
11.7.1 Setting Up PVM
11.7.2 Starting PVM
11.7.3 Running PVM Programs
11.8 PVM Console Details
11.9 Host File Options
11.10 XPVM
11.10.1 Network View
11.10.2 Space-Time View
11.10.3 Other Views
12 Fault-Tolerant and Adaptive Programs with PVM—Al Geist and Jim Kohl
12.1 Considerations for Fault Tolerance
12.2 Building Fault-Tolerant Parallel Applications
12.3 Adaptive Programs
