Beowulf Cluster Computing with Linux (Scientific and Engineering Computation)


Beowulf Cluster Computing with Linux

Scientific and Engineering Computation
Janusz Kowalik, editor

Data-Parallel Programming on MIMD Computers, Philip J. Hatcher and Michael J. Quinn, 1991
Unstructured Scientific Computation on Scalable Multiprocessors, edited by Piyush Mehrotra, Joel Saltz, and Robert Voigt, 1992
Parallel Computational Fluid Dynamics: Implementation and Results, edited by Horst D. Simon, 1992
Enterprise Integration Modeling: Proceedings of the First International Conference, edited by Charles J. Petrie, Jr., 1992
The High Performance Fortran Handbook, Charles H. Koelbel, David B. Loveman, Robert S. Schreiber, Guy L. Steele Jr., and Mary E. Zosel, 1994
PVM: Parallel Virtual Machine–A Users' Guide and Tutorial for Network Parallel Computing, Al Geist, Adam Beguelin, Jack Dongarra, Weicheng Jiang, Bob Manchek, and Vaidy Sunderam, 1994
Practical Parallel Programming, Gregory V. Wilson, 1995
Enabling Technologies for Petaflops Computing, Thomas Sterling, Paul Messina, and Paul H. Smith, 1995
An Introduction to High-Performance Scientific Computing, Lloyd D. Fosdick, Elizabeth R. Jessup, Carolyn J. C. Schauble, and Gitta Domik, 1995
Parallel Programming Using C++, edited by Gregory V. Wilson and Paul Lu, 1996
Using PLAPACK: Parallel Linear Algebra Package, Robert A. van de Geijn, 1997
Fortran 95 Handbook, Jeanne C. Adams, Walter S. Brainerd, Jeanne T. Martin, Brian T. Smith, and Jerrold L. Wagener, 1997
MPI—The Complete Reference: Volume 1, The MPI Core, Marc Snir, Steve Otto, Steven Huss-Lederman, David Walker, and Jack Dongarra, 1998
MPI—The Complete Reference: Volume 2, The MPI-2 Extensions, William Gropp, Steven Huss-Lederman, Andrew Lumsdaine, Ewing Lusk, Bill Nitzberg, William Saphir, and Marc Snir, 1998
A Programmer's Guide to ZPL, Lawrence Snyder, 1999
How to Build a Beowulf, Thomas L. Sterling, John Salmon, Donald J. Becker, and Daniel F. Savarese, 1999
Using MPI: Portable Parallel Programming with the Message-Passing Interface, second edition, William Gropp, Ewing Lusk, and Anthony Skjellum, 1999
Using MPI-2: Advanced Features of the Message-Passing Interface, William Gropp, Ewing Lusk, and Rajeev Thakur, 1999
Beowulf Cluster Computing with Linux, Thomas Sterling, 2001
Beowulf Cluster Computing with Windows, Thomas Sterling, 2001

Beowulf Cluster Computing with Linux
Thomas Sterling

The MIT Press
Cambridge, Massachusetts
London, England

© 2002 Massachusetts Institute of Technology

All rights reserved. No part of this book may be reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.

This book was set in LaTeX by the author and was printed and bound in the United States of America.

Library of Congress Control Number 2001095383
ISBN: 0-262-69274-0

Disclaimer: Some images in the original version of this book are not available for inclusion in the eBook.

Dedicated with respect and appreciation to the memory of Seymour R. Cray, 1925–1996
Contents

Series Foreword
Foreword
Preface

1 Introduction—Thomas Sterling
  1.1 Definitions and Taxonomy
  1.2 Opportunities and Advantages
  1.3 A Short History
  1.4 Elements of a Cluster
  1.5 Description of the Book

I Enabling Technologies

2 An Overview of Cluster Computing—Thomas Sterling
  2.1 A Taxonomy of Parallel Computing
  2.2 Hardware System Structure
    2.2.1 Beowulf Compute Nodes
    2.2.2 Interconnection Networks
  2.3 Node Software
  2.4 Resource Management
  2.5 Distributed Programming
  2.6 Conclusions

3 Node Hardware—Thomas Sterling
  3.1 Overview of a Beowulf Node
    3.1.1 Principal Specifications
    3.1.2 Basic Elements
  3.2 Processors
    3.2.1 Intel Pentium Family
    3.2.2 AMD Athlon
    3.2.3 Compaq Alpha 21264
    3.2.4 IA64
  3.3 Motherboard
  3.4 Memory
    3.4.1 Memory Capacity
    3.4.2 Memory Speed
    3.4.3 Memory Types
    3.4.4 Memory Hierarchy and Caches
    3.4.5 Package Styles
  3.5 BIOS
  3.6 Secondary Storage
  3.7 PCI Bus
  3.8 Example of a Beowulf Node
  3.9 Boxes, Shelves, Piles, and Racks
  3.10 Node Assembly
    3.10.1 Motherboard Preassembly
    3.10.2 The Case
    3.10.3 Minimal Peripherals
    3.10.4 Booting the System
    3.10.5 Installing the Other Components
    3.10.6 Troubleshooting

4 Linux—Peter H. Beckman
  4.1 What Is Linux?
    4.1.1 Why Use Linux for a Beowulf?
    4.1.2 A Kernel and a Distribution
    4.1.3 Open Source and Free Software
    4.1.4 A Linux Distribution
    4.1.5 Version Numbers and Development Methods
  4.2 The Linux Kernel
    4.2.1 Compiling a Kernel
    4.2.2 Loadable Kernel Modules
    4.2.3 The Beowulf Kernel Diet
    4.2.4 Diskless Operation
    4.2.5 Downloading and Compiling a New Kernel
    4.2.6 Linux File Systems
  4.3 Pruning Your Beowulf Node
    4.3.1 inetd.conf
    4.3.2 /etc/rc.d/init.d
    4.3.3 Other Processes and Daemons
  4.4 Other Considerations
    4.4.1 TCP Messaging
    4.4.2 Hardware Performance Counters
  4.5 Final Tuning with /proc
  4.6 Conclusions

5 Network Hardware—Thomas Sterling
  5.1 Interconnect Technologies
    5.1.1 The Ethernets
    5.1.2 Myrinet
    5.1.3 cLAN
    5.1.4 Scalable Coherent Interface
    5.1.5 QsNet
    5.1.6 Infiniband
  5.2 A Detailed Look at Ethernet
    5.2.1 Packet Format
    5.2.2 NIC Architecture
    5.2.3 Hubs and Switches
  5.3 Network Practicalities: Interconnect Choice
    5.3.1 Importance of the Interconnect
    5.3.2 Differences between the Interconnect Choices
    5.3.3 Strategies to Improve Performance over Ethernet
    5.3.4 Cluster Network Pitfalls
    5.3.5 An Example of an Ethernet Interconnected Beowulf
    5.3.6 An Example of a Myrinet Interconnected Cluster

6 Network Software—Thomas Sterling
  6.1 TCP/IP
    6.1.1 IP Addresses
    6.1.2 Zero-Copy Protocols
  6.2 Sockets
  6.3 Higher-Level Protocols
    6.3.1 Remote Procedure Calls
    6.3.2 Distributed Objects: CORBA and Java RMI
  6.4 Distributed File Systems
    6.4.1 NFS
    6.4.2 AFS
    6.4.3 Autofs: The Automounter
  6.5 Remote Command Execution
    6.5.1 BSD R Commands
    6.5.2 SSH—The Secure Shell

7 Setting Up Clusters: Installation and Configuration—Thomas Sterling and Daniel Savarese
  7.1 System Access Models
    7.1.1 The Standalone System
    7.1.2 The Universally Accessible Machine
    7.1.3 The Guarded Beowulf
  7.2 Assigning Names
    7.2.1 Statically Assigned Addresses
    7.2.2 Dynamically Assigned Addresses
  7.3 Installing Node Software
    7.3.1 Creating Tar Images
    7.3.2 Setting Up a Clone Root Partition
    7.3.3 Setting Up BOOTP
    7.3.4 Building a Clone Boot Floppy
  7.4 Basic System Administration
    7.4.1 Booting and Shutting Down
    7.4.2 The Node File System
    7.4.3 Account Management
    7.4.4 Running Unix Commands in Parallel
  7.5 Avoiding Security Compromises
    7.5.1 System Configuration
    7.5.2 Restricting Host Access
    7.5.3 Secure Shell
    7.5.4 IP Masquerading
  7.6 Job Scheduling
  7.7 Some Advice on Upgrading Your Software

8 How Fast Is My Beowulf?—David Bailey
  8.1 Metrics
  8.2 Ping-Pong Test
  8.3 The LINPACK Benchmark
  8.4 The NAS Parallel Benchmark Suite

II Parallel Programming

9 Parallel Programming with MPI—William Gropp and Ewing Lusk
  9.1 Hello World in MPI
    9.1.1 Compiling and Running MPI Programs
    9.1.2 Adding Communication to Hello World
  9.2 Manager/Worker Example
  9.3 Two-Dimensional Jacobi Example with One-Dimensional Decomposition
  9.4 Collective Operations
  9.5 Parallel Monte Carlo Computation
  9.6 Installing MPICH under Linux
    9.6.1 Obtaining and Installing MPICH
    9.6.2 Running MPICH Jobs with the ch_p4 Device
    9.6.3 Starting and Managing MPD
    9.6.4 Running MPICH Jobs under MPD
    9.6.5 Debugging MPI Programs
    9.6.6 Other Compilers
  9.7 Tools
    9.7.1 Profiling Libraries
    9.7.2 Visualizing Parallel Program Behavior
  9.8 MPI Implementations for Clusters
  9.9 MPI Routine Summary

10 Advanced Topics in MPI Programming—William Gropp and Ewing Lusk
  10.1 Dynamic Process Management in MPI
    10.1.1 Intercommunicators
    10.1.2 Spawning New MPI Processes
    10.1.3 Revisiting Matrix-Vector Multiplication
    10.1.4 More on Dynamic Process Management
  10.2 Fault Tolerance
  10.3 Revisiting Mesh Exchanges
    10.3.1 Blocking and Nonblocking Communication
    10.3.2 Communicating Noncontiguous Data in MPI
  10.4 Motivation for Communicators
  10.5 More on Collective Operations
  10.6 Parallel I/O
    10.6.1 A Simple Example
    10.6.2 A More Complex Example
  10.7 Remote Memory Access
  10.8 Using C++ and Fortran 90
  10.9 MPI, OpenMP, and Threads
  10.10 Measuring MPI Performance
    10.10.1 mpptest
    10.10.2 SKaMPI
    10.10.3 High Performance LINPACK
  10.11 MPI-2 Status
  10.12 MPI Routine Summary

11 Parallel Programming with PVM—Al Geist and Stephen Scott
  11.1 Overview
  11.2 Program Examples
  11.3 Fork/Join
  11.4 Dot Product
  11.5 Matrix Multiply
  11.6 One-Dimensional Heat Equation
  11.7 Using PVM
    11.7.1 Setting Up PVM
    11.7.2 Starting PVM
    11.7.3 Running PVM Programs
  11.8 PVM Console Details
  11.9 Host File Options
  11.10 XPVM
    11.10.1 Network View
    11.10.2 Space-Time View
    11.10.3 Other Views

12 Fault-Tolerant and Adaptive Programs with PVM—Al Geist and Jim Kohl
  12.1 Considerations for Fault Tolerance
  12.2 Building Fault-Tolerant Parallel Applications
  12.3 Adaptive Programs

Recommended publications
  • Multicomputer Cluster
    Multicomputer
    • Multiple full computers connected by a network.
    • Distributed memory: each node has its own private address space.
    • Access to data on another processor is explicit in the program, expressed by calling functions that send or receive messages.
    • No special operating system is needed; libraries providing message-passing functions are sufficient.
    • Good scalability.
    In this section we discuss network computing, in which the nodes are stand-alone computers that could be connected via a switch, local area network, or the Internet. The main idea is to divide the application into semi-independent parts according to the kind of processing needed. Different nodes on the network can be assigned different parts of the application. This form of network computing takes advantage of the unique capabilities of diverse system architectures. It also maximally leverages potentially idle resources within a large organization: unused CPU cycles may be exploited during short periods of time, resulting in bursts of activity followed by periods of inactivity. In what follows, we discuss the utilization of network technology in order to create a computing infrastructure using commodity computers.
    Cluster
    • In the 1990s the field shifted from expensive and specialized parallel machines to the more cost-effective clusters of PCs and workstations.
    • A cluster is a collection of stand-alone computers connected using some interconnection network.
    • Each node in a cluster could be a workstation.
    • It is important for the nodes to have fast processors and a fast network so the cluster can be used as a distributed system.
    • Cluster workstation components: 1. fast processor/memory and complete PC hardware; 2. freely available software; 3. fast execution and low-latency communication.
    The 1990s have witnessed a significant shift from expensive and specialized parallel machines to the more cost-effective clusters of PCs and workstations.
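    The explicit send/receive style described above is exactly what MPI provides on a Beowulf-class machine. Below is a minimal sketch in C, assuming an MPI implementation such as MPICH is installed and the job is started on at least two processes; the file name and variable names are illustrative only:

        /* msg.c - minimal sketch of explicit message passing between two nodes.
         * Compile:  mpicc msg.c -o msg
         * Run:      mpiexec -n 2 ./msg
         */
        #include <mpi.h>
        #include <stdio.h>

        int main(int argc, char **argv)
        {
            int rank, value;

            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            if (rank == 0) {
                value = 42;                 /* data lives in rank 0's private address space */
                MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);        /* explicit send */
            } else if (rank == 1) {
                MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);                               /* explicit receive */
                printf("rank 1 received %d\n", value);
            }

            MPI_Finalize();
            return 0;
        }

    Rank 0's data is invisible to rank 1 until it is explicitly sent, which is the defining property of the distributed-memory model described in the excerpt.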
  • MATRIX: Bench - Benchmarking the State-of-the-Art Task Execution Frameworks of Many-Task Computing
    MATRIX: Bench - Benchmarking the State-of-the-Art Task Execution Frameworks of Many-Task Computing
    Thomas Dubucq, Tony Forlini, Virgile Landeiro Dos Reis, and Isabelle Santos
    Illinois Institute of Technology, Chicago, IL, USA, {tdubucq, tforlini, vlandeir, isantos1}@hawk.iit.edu
    Abstract: Technology trends indicate that exascale systems will have billion-way parallelism, and each node will have about three orders of magnitude more intra-node parallelism than today's peta-scale systems. The majority of current runtime systems focus a great deal of effort on optimizing inter-node parallelism by maximizing the bandwidth and minimizing the latency of interconnection networks and storage, but suffer from the lack of scalable solutions to expose intra-node parallelism. Many-task computing (MTC) is a distributed fine-grained paradigm that aims to address the challenges of managing parallelism and locality of exascale systems. MTC applications are typically structured as directed acyclic graphs of loosely coupled short tasks with explicit input/output data dependencies.
    … Stanford University. Finally, HPX is a general-purpose C++ runtime system for parallel and distributed applications of any scale developed by Louisiana State University, and STAPL is a framework for developing parallel programs from Texas A&M. MATRIX is a many-task computing job scheduling system [3]. There are many resource management systems aimed at data-intensive applications. Furthermore, distributed task scheduling in many-task computing is a problem that has been considered by many research teams. In particular, Charm++ [4], Legion [5], Swift [6], [10], Spark [1][2], HPX [12], STAPL [13], and MATRIX [11] offer solutions to this problem and have …
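    A graph of short tasks with explicit dependencies can be executed by a simple dependency-counting scheduler, which is the core idea such frameworks build on. The sketch below is a serial toy model for illustration only; the task names and the array-based ready queue are invented for the example and are not taken from MATRIX or any of the cited systems:

        /* Illustrative sketch: dependency-counting execution of a small task DAG
         *     A -> B, A -> C, B -> D, C -> D
         * A task becomes ready once all of its predecessors have completed.
         * Serial toy model of MTC-style scheduling, not a real scheduler.
         */
        #include <stdio.h>

        #define NTASKS 4

        typedef struct {
            const char *name;
            int deps_remaining;          /* unfinished predecessors */
            int successors[NTASKS];      /* indices of dependent tasks */
            int nsucc;
        } Task;

        static void run(const Task *t) { printf("running task %s\n", t->name); }

        int main(void)
        {
            /* A=0, B=1, C=2, D=3 */
            Task tasks[NTASKS] = {
                { "A", 0, {1, 2}, 2 },
                { "B", 1, {3},    1 },
                { "C", 1, {3},    1 },
                { "D", 2, {0},    0 },   /* no successors */
            };
            int ready[NTASKS], head = 0, tail = 0;

            for (int i = 0; i < NTASKS; i++)          /* seed the ready queue */
                if (tasks[i].deps_remaining == 0)
                    ready[tail++] = i;

            while (head < tail) {                     /* execute in dependency order */
                Task *t = &tasks[ready[head++]];
                run(t);
                for (int s = 0; s < t->nsucc; s++)    /* release successors */
                    if (--tasks[t->successors[s]].deps_remaining == 0)
                        ready[tail++] = t->successors[s];
            }
            return 0;
        }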
  • Cluster Computing: Architectures, Operating Systems, Parallel Processing & Programming Languages
    Cluster Computing: Architectures, Operating Systems, Parallel Processing & Programming Languages
    Author: Richard S. Morrison
    Revision Version 2.4, Monday, 28 April 2003
    Copyright © Richard S. Morrison 1998 – 2003. This document is distributed under the GNU General Public Licence [39].
    Print date: Tuesday, 28 April 2003
    Document owner: Richard S. Morrison, [email protected], +612-9928-6881
    Document name: CLUSTER_COMPUTING_THEORY; stored: (\\RSM\FURTHER_RESEARCH\CLUSTER_COMPUTING)
    Synopsis & Acknowledgements
    My interest in supercomputing through the use of clusters has been long standing and was initially sparked by an article in Electronic Design [33] in August 1998 on the Avalon Beowulf Cluster [24]. Between August 1998 and August 1999 I gathered information from websites and parallel research groups. This culminated in September 1999 when I organised the collected material and wove a common thread through the subject matter, producing two handbooks for my own use on cluster computing. Each handbook is of considerable length, which was governed by the wealth of information and research conducted in this area over the last 5 years. The covers of the handbooks are shown in Figure 1-1 below. (Figure 1-1: Author Compiled Beowulf Class 1 Handbooks.) Through my experimentation using the Linux operating system and the undertaking of the University of Technology, Sydney (UTS) undergraduate subject Operating Systems in Autumn Semester 1999 with Noel Carmody, a systems-level focus was developed and is the core element of the material contained in this document. This led to my membership in the IEEE and the IEEE Technical Committee on Parallel Processing, where I am able to gather and contribute information and be kept up to date on the latest issues.
  • Accelerated AC Contingency Calculation on Commodity Multi-core SIMD CPUs
    Accelerated AC Contingency Calculation on Commodity Multi-core SIMD CPUs
    Tao Cui, Student Member, IEEE, Rui Yang, Student Member, IEEE, Gabriela Hug, Member, IEEE, Franz Franchetti, Member, IEEE
    Abstract: Multi-core CPUs with multiple levels of parallelism (i.e. data level, instruction level and task/core level) have become the mainstream CPUs for commodity computing systems. Based on the multi-core CPUs, in this paper we developed a high performance computing framework for AC contingency calculation (ACCC) to fully utilize the computing power of commodity systems for online and real time applications. Using the Woodbury matrix identity based compensation method, we transform and pack multiple contingency cases of different outages into a fine grained vectorized data parallel programming model. We implement the data parallel programming model using the SIMD instruction extension on x86 CPUs, therefore fully taking advantage of the CPU core's SIMD floating point capability. We also implement a thread pool scheduler for ACCC on multi-core CPUs which automatically balances the computing loads across CPU cores to fully utilize the multi-core capability.
    In the computing industry, the performance capability of the computing platform has been growing rapidly in the last several decades at a roughly exponential rate [3]. The recent mainstream commodity CPUs enable us to build inexpensive computing systems with similar computational power as the supercomputers just ten years ago. However, these advances in hardware performance result from the increasing complexity of the computer architecture, and they actually increase the difficulty of fully utilizing the available computational power for a specific application [4]. This paper focuses on fully utilizing the computing power of modern CPUs by code optimization and parallelization for specific hardware, enabling the real-time complete ACCC application for practical power grids on …
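    The packing idea, evaluating several contingency cases in lockstep with one case per SIMD lane, can be sketched with x86 AVX intrinsics as follows. This is not the authors' code: the toy multiply-add stands in for the real compensation-method arithmetic, and the array names are invented for the example:

        /* Toy sketch of lane-per-case SIMD packing (not the paper's actual kernel).
         * Four contingency cases are processed per AVX vector of doubles.
         * Compile with, e.g.:  gcc -O2 -mavx case_simd.c
         */
        #include <immintrin.h>
        #include <stdio.h>

        #define NCASES 8   /* multiple of 4 so the simple loop needs no remainder handling */

        int main(void)
        {
            /* One scalar input per contingency case (stand-ins for real quantities). */
            double a[NCASES] = {1, 2, 3, 4, 5, 6, 7, 8};
            double b[NCASES] = {8, 7, 6, 5, 4, 3, 2, 1};
            double out[NCASES];

            for (int i = 0; i < NCASES; i += 4) {
                __m256d va = _mm256_loadu_pd(&a[i]);   /* load 4 cases into the 4 lanes */
                __m256d vb = _mm256_loadu_pd(&b[i]);
                /* The same arithmetic is applied to all 4 cases at once (data parallel). */
                __m256d vr = _mm256_add_pd(_mm256_mul_pd(va, vb), va);
                _mm256_storeu_pd(&out[i], vr);
            }

            for (int i = 0; i < NCASES; i++)
                printf("case %d: %g\n", i, out[i]);
            return 0;
        }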
  • Adaptive Data Migration in Load-Imbalanced HPC Applications
    Louisiana State University, LSU Digital Commons, LSU Doctoral Dissertations, Graduate School, 10-16-2020
    Adaptive Data Migration in Load-Imbalanced HPC Applications
    Parsa Amini, Louisiana State University and Agricultural and Mechanical College
    Follow this and additional works at: https://digitalcommons.lsu.edu/gradschool_dissertations (Part of the Computer Sciences Commons)
    Recommended Citation: Amini, Parsa, "Adaptive Data Migration in Load-Imbalanced HPC Applications" (2020). LSU Doctoral Dissertations. 5370. https://digitalcommons.lsu.edu/gradschool_dissertations/5370
    This Dissertation is brought to you for free and open access by the Graduate School at LSU Digital Commons. It has been accepted for inclusion in LSU Doctoral Dissertations by an authorized graduate school editor of LSU Digital Commons. For more information, please contact [email protected].
    ADAPTIVE DATA MIGRATION IN LOAD-IMBALANCED HPC APPLICATIONS. A Dissertation Submitted to the Graduate Faculty of the Louisiana State University and Agricultural and Mechanical College in partial fulfillment of the requirements for the degree of Doctor of Philosophy in The Department of Computer Science, by Parsa Amini, B.S., Shahed University, 2013; M.S., New Mexico State University, 2015. December 2020.
    Acknowledgments: This effort has been possible, thanks to the involvement and assistance of numerous people. First and foremost, I thank my advisor, Dr. Hartmut Kaiser, who made this journey possible with their invaluable support, precise guidance, and generous sharing of expertise. It has been a great privilege and opportunity for me to be your student, a part of the STE||AR group, and the HPX development effort. I would also like to thank my mentor and former advisor at New Mexico State University, Dr.
  • Building a Beowulf Cluster
    Building a Beowulf cluster
    Åsmund Ødegård, April 4, 2001

    1 Introduction
    The main part of the introduction is only contained in the slides for this session. Some of the acronyms and names in this paper may be unknown. In Appendix B we include short descriptions of some of them. Most of this is taken from "whatis" [6].

    2 Outline of the installation
    • Install Linux on a PC
    • Configure the PC to act as an install-server for the cluster
    • Wire up the network if that isn't done already
    • Install Linux on the rest of the nodes
    • Configure one PC, e.g. the install-server, to be a server for your cluster
    These are the main steps required to build a Linux cluster, but each step can be done in many different ways. How you prefer to do it depends mainly on personal taste, though. Therefore, I will translate the given outline into this list:
    • Install Debian GNU/Linux on a PC
    • Install and configure "FAI" on the PC
    • Build the "FAI" boot-floppy
    • Assemble hardware information, and finalize the "FAI" configuration
    • Boot each node with the boot-floppy
    • Install and configure a queue system and software for running parallel jobs on your cluster

    3 Debian
    The choice of Linux distribution is most of all a matter of personal taste. I prefer the Debian distribution for various reasons. So, the first step in the cluster-building process is to pick one of the PCs as an install-server, and install Debian onto it, as follows:
    • Make sure that the computer can boot from cdrom.
  • Beowulf Clusters — an Overview
    Beowulf clusters — an overview
    Åsmund Ødegård, WinterSchool 2001, April 4, 2001

    Contents: Introduction; What is a Beowulf; The history of Beowulf; Who can build a Beowulf; How to design a Beowulf; Beowulfs in more detail; Rules of thumb; What Beowulfs are Good For; Experiments; 3D nonlinear acoustic fields; Incompressible Navier–Stokes; 3D nonlinear water wave

    Introduction: Why clusters?
    • "Work harder" – more CPU power, more memory, more everything
    • "Work smarter" – better algorithms
    • "Get help" – let more boxes work together to solve the problem – parallel processing
    • by Greg Pfister
    • Beowulfs in the Parallel Computing picture (diagram labels: Parallel Computing, MetaComputing, Clusters, Tightly Coupled, Vector, WS farms, Pile of PCs, NOW, NT/Win2k Clusters, Beowulf, CC-NUMA)

    What is a Beowulf
    • Mass-market commodity off the shelf (COTS)
    • Low cost local area network (LAN)
    • Open Source UNIX-like operating system (OS)
    • Executes parallel applications programmed with a message passing model (MPI)
    • Anything from small systems to large, fast systems. The fastest ranks as no. 84 on today's Top500.
    • The best price/performance system available for many applications
    • Philosophy: the cheapest system available which solves your problem in reasonable time

    The history of Beowulf
    • 1993: perfect conditions for the first Beowulf
      – Major CPU performance advance: 80286 → 80386
      – DRAM of reasonable costs and densities (8MB)
      – Disk drives of several 100MBs available for PCs
      – Ethernet (10Mbps) controllers and hubs cheap enough
      – Linux improved rapidly, and was in a usable state
      – PVM widely accepted as a cross-platform message passing model
    • Clustering was done with commercial UNIX, but the cost was high.
  • Beowulf Clusters Make Supercomputing Accessible
    Nor-Tech Contributes to NASA Article: Beowulf Clusters Make Supercomputing Accessible
    Original article available at NASA Spinoff: https://spinoff.nasa.gov/Spinoff2020/it_1.html
    NASA Technology
    In the Old English epic Beowulf, the warrior Unferth, jealous of the eponymous hero's bravery, openly doubts Beowulf's odds of slaying the monster Grendel that has tormented the Danes for 12 years, promising a "grim grappling" if he dares confront the dreaded march-stepper. A thousand years later, many in the supercomputing world were similarly skeptical of a team of NASA engineers trying to achieve supercomputer-class processing on a cluster of standard desktop computers running a relatively untested open source operating system. "Not only did nobody care, but there were even a number of people hostile to this project," says Thomas Sterling, who led the small team at NASA's Goddard Space Flight Center in the early 1990s. "Because it was different. Because it was completely outside the scope of the supercomputing community at that time." The technology, now known as the Beowulf cluster, would ultimately succeed beyond its inventors' imaginations. In 1993, however, its odds may indeed have seemed long. The U.S. Government, nervous about Japan's high-performance computing effort, had already been pouring money into computer architecture research at NASA and other Federal agencies for more than a decade, and results were frustrating.
    [Photo caption: Thomas Sterling, who co-invented the Beowulf supercomputing cluster at Goddard Space Flight Center, poses with the Naegling cluster at the California Institute of Technology in 1997. Consisting of 120 Pentium Pro processors, Naegling was the first cluster to hit 10 gigaflops of sustained performance.]
  • Improving MPI Threading Support for Current Hardware Architectures
    University of Tennessee, Knoxville, TRACE: Tennessee Research and Creative Exchange, Doctoral Dissertations, Graduate School, 12-2019
    Improving MPI Threading Support for Current Hardware Architectures
    Thananon Patinyasakdikul, University of Tennessee, [email protected]
    Follow this and additional works at: https://trace.tennessee.edu/utk_graddiss
    Recommended Citation: Patinyasakdikul, Thananon, "Improving MPI Threading Support for Current Hardware Architectures." PhD diss., University of Tennessee, 2019. https://trace.tennessee.edu/utk_graddiss/5631
    This Dissertation is brought to you for free and open access by the Graduate School at TRACE: Tennessee Research and Creative Exchange. It has been accepted for inclusion in Doctoral Dissertations by an authorized administrator of TRACE: Tennessee Research and Creative Exchange. For more information, please contact [email protected].
    To the Graduate Council: I am submitting herewith a dissertation written by Thananon Patinyasakdikul entitled "Improving MPI Threading Support for Current Hardware Architectures." I have examined the final electronic copy of this dissertation for form and content and recommend that it be accepted in partial fulfillment of the requirements for the degree of Doctor of Philosophy, with a major in Computer Science. Jack Dongarra, Major Professor. We have read this dissertation and recommend its acceptance: Michael Berry, Michela Taufer, Yingkui Li. Accepted for the Council: Dixie L. Thompson, Vice Provost and Dean of the Graduate School. (Original signatures are on file with official student records.)
    Improving MPI Threading Support for Current Hardware Architectures. A Dissertation Presented for the Doctor of Philosophy Degree, The University of Tennessee, Knoxville. Thananon Patinyasakdikul, December 2019. © by Thananon Patinyasakdikul, 2019. All Rights Reserved.
    To my parents Thanawij and Issaree Patinyasakdikul, my little brother Thanarat Patinyasakdikul, for their love, trust and support.
  • Exascale Computing Project -- Software
    Exascale Computing Project -- Software
    Paul Messina, ECP Director; Stephen Lee, ECP Deputy Director
    ASCAC Meeting, Arlington, VA, Crystal City Marriott, April 19, 2017
    www.ExascaleProject.org

    ECP scope and goals:
    • Develop applications to tackle a broad spectrum of mission critical problems of unprecedented complexity
    • Partner with vendors to develop computer architectures that support exascale applications
    • Support national security
    • Develop a software stack that is both exascale-capable and usable on industrial & academic scale systems, in collaboration with vendors
    • Train a next-generation workforce of computational scientists, engineers, and computer scientists
    • Contribute to the economic competitiveness of the nation

    ECP has formulated a holistic approach that uses co-design and integration to achieve capable exascale across four areas: Application Development (science and mission applications), Software Technology (scalable and productive software stack), Hardware Technology (hardware technology elements), and Exascale Systems (integrated exascale supercomputers).
    [Software stack diagram: applications; co-design; correctness, visualization, and data analysis tools; programming models, development environment, and runtimes; math libraries and frameworks; tools; workflows; resilience; system software with resource management, threading, scheduling, monitoring, and control; data management, I/O, and file system; memory and burst buffer; node OS and runtimes; hardware interface.]
    ECP's work encompasses applications, system software, hardware technologies and architectures, and workforce.
  • Parallel Data Mining from Multicore to Cloudy Grids
    Parallel Data Mining from Multicore to Cloudy Grids
    Geoffrey FOX (a,b), Seung-Hee BAE (b), Jaliya EKANAYAKE (b), Xiaohong QIU (c), and Huapeng YUAN (b)
    a) Informatics Department, Indiana University, 919 E. 10th Street, Bloomington, IN 47408 USA
    b) Computer Science Department and Community Grids Laboratory, Indiana University, 501 N. Morton St., Suite 224, Bloomington, IN 47404 USA
    c) UITS Research Technologies, Indiana University, 501 N. Morton St., Suite 211, Bloomington, IN 47404
    Abstract. We describe a suite of data mining tools that cover clustering, information retrieval and the mapping of high dimensional data to low dimensions for visualization. Preliminary applications are given to particle physics, bioinformatics and medical informatics. The data vary in dimension from low (2-20), high (thousands) to undefined (sequences with dissimilarities but not vectors defined). We use deterministic annealing to provide more robust algorithms that are relatively insensitive to local minima. We discuss the algorithm structure and their mapping to parallel architectures of different types and look at the performance of the algorithms on three classes of system: multicore, cluster and Grid, using a MapReduce style algorithm. Each approach is suitable in different application scenarios. We stress that data analysis/mining of large datasets can be a supercomputer application.
    Keywords. MPI, MapReduce, CCR, Performance, Clustering, Multidimensional Scaling
    Introduction. Computation and data intensive scientific data analyses are increasingly prevalent. In the near future, data volumes processed by many applications will routinely cross the peta-scale threshold, which would in turn increase the computational requirements. Efficient parallel/concurrent algorithms and implementation techniques are the key to meeting the scalability and performance requirements entailed in such scientific data analyses.
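    To give a flavor of the MapReduce-style parallelization the abstract refers to, the sketch below expresses the centroid-update step of a generic clustering pass in MPI: each rank accumulates partial sums over its local points, and a global reduction combines them. It illustrates the pattern with invented data; it is not the authors' deterministic-annealing implementation:

        /* Sketch of a MapReduce-style centroid update with MPI (illustrative only).
         * Each rank owns a slice of the points; partial sums are combined globally.
         * Compile: mpicc centroid.c -o centroid    Run: mpiexec -n 4 ./centroid
         */
        #include <mpi.h>
        #include <stdio.h>

        #define K 2          /* number of clusters (toy value) */
        #define DIM 2        /* dimension of each point */
        #define NLOCAL 100   /* points owned by each rank */

        int main(int argc, char **argv)
        {
            int rank;
            double local_sum[K][DIM] = {{0}}, local_cnt[K] = {0};
            double global_sum[K][DIM], global_cnt[K];

            MPI_Init(&argc, &argv);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);

            /* "Map": accumulate each local point into its assigned cluster's sums.
             * Points and assignments are faked here so the sketch is self-contained. */
            for (int i = 0; i < NLOCAL; i++) {
                int c = (rank + i) % K;        /* pretend cluster assignment */
                local_sum[c][0] += i;          /* pretend coordinates */
                local_sum[c][1] += rank;
                local_cnt[c]    += 1.0;
            }

            /* "Reduce": sum the partial results across all ranks. */
            MPI_Allreduce(local_sum, global_sum, K * DIM, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
            MPI_Allreduce(local_cnt, global_cnt, K, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

            if (rank == 0)
                for (int c = 0; c < K; c++)
                    printf("centroid %d: (%g, %g)\n", c,
                           global_sum[c][0] / global_cnt[c],
                           global_sum[c][1] / global_cnt[c]);

            MPI_Finalize();
            return 0;
        }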
  • Spark on Hadoop vs MPI/OpenMP on Beowulf
    Procedia Computer Science, Volume 53, 2015, Pages 121–130. 2015 INNS Conference on Big Data.
    Big Data Analytics in the Cloud: Spark on Hadoop vs MPI/OpenMP on Beowulf
    Jorge L. Reyes-Ortiz (1), Luca Oneto (2), and Davide Anguita (1)
    1) DIBRIS, University of Genoa, Via Opera Pia 13, I-16145, Genoa, Italy ([email protected], [email protected])
    2) DITEN, University of Genoa, Via Opera Pia 11A, I-16145, Genoa, Italy ([email protected])
    Abstract. One of the biggest challenges of the current big data landscape is our inability to process vast amounts of information in a reasonable time. In this work, we explore and compare two distributed computing frameworks implemented on commodity cluster architectures: MPI/OpenMP on Beowulf, which is high-performance oriented and exploits multi-machine/multi-core infrastructures, and Apache Spark on Hadoop, which targets iterative algorithms through in-memory computing. We use the Google Cloud Platform service to create virtual machine clusters, run the frameworks, and evaluate two supervised machine learning algorithms: KNN and Pegasos SVM. Results obtained from experiments with a particle physics data set show MPI/OpenMP outperforms Spark by more than one order of magnitude in terms of processing speed and provides more consistent performance. However, Spark shows better data management infrastructure and the possibility of dealing with other aspects such as node failure and data replication.
    Keywords: Big Data, Supervised Learning, Spark, Hadoop, MPI, OpenMP, Beowulf, Cloud, Parallel Computing
    1 Introduction. The information age brings along an explosion of big data from multiple sources in every aspect of our lives: human activity signals from wearable sensors, experiments from particle discovery research and stock market data systems are only a few examples [48].
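    A minimal sketch of the MPI/OpenMP side of that comparison, assuming a Beowulf-style cluster with an MPI library and an OpenMP-capable compiler: MPI distributes work across nodes while OpenMP threads use the cores within each node. The per-element arithmetic is a placeholder for a real kernel such as a KNN distance evaluation:

        /* Minimal hybrid MPI + OpenMP sketch (illustrative, not the paper's code).
         * Compile: mpicc -fopenmp hybrid.c -o hybrid
         * Run:     mpiexec -n 2 ./hybrid      (thread count set via OMP_NUM_THREADS)
         */
        #include <mpi.h>
        #include <omp.h>
        #include <stdio.h>

        #define N 1000000   /* elements handled per MPI process */

        int main(int argc, char **argv)
        {
            int rank, size, provided;
            double local = 0.0, global = 0.0;

            /* Request a threading level compatible with OpenMP inside each rank. */
            MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            /* OpenMP parallelizes the per-node work across cores... */
            #pragma omp parallel for reduction(+:local)
            for (int i = 0; i < N; i++)
                local += (double)i * 1e-6;   /* placeholder for a real kernel */

            /* ...and MPI combines the per-node results across the cluster. */
            MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

            if (rank == 0)
                printf("global sum = %g (%d ranks x up to %d threads each)\n",
                       global, size, omp_get_max_threads());

            MPI_Finalize();
            return 0;
        }

    MPI_THREAD_FUNNELED is requested because only the main thread makes MPI calls in this pattern; MPI_THREAD_MULTIPLE would only be needed if the worker threads communicated directly.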