PARALLEL COMPUTATIONAL MECHANICS WITH A CLUSTER OF WORKSTATIONS

By PAUL C. JOHNSON

A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

UNIVERSITY OF FLORIDA

2005

Copyright 2005 by Paul C. Johnson

ACKNOWLEDGEMENTS

I wish to express my sincere gratitude to Professor Loc Vu-Quoc for his support and guidance throughout my master's study. His steady and thorough approach to teaching has inspired me to accept any challenge with determination. I also would like to express my gratitude to the supervisory committee members: Professors Alan D. George and Ashok V. Kumar. Many thanks go to my friends, who have always been there for me whenever I needed help or friendship. Finally, I would like to thank my parents Charles and Helen Johnson, grandparents Wendell and Giselle Cernansky, and the Collins family for all of the support that they have given me over the years. I sincerely appreciate all that they have sacrificed and cannot imagine how I would have proceeded without their love, support, and encouragement.

TABLE OF CONTENTS

page

ACKNOWLEDGEMENTS ...... iii

ABSTRACT ...... xiv

1 PARALLEL COMPUTING ...... 1

1.1 Types of Parallel Processing ...... 1
1.1.1 Clusters ...... 2
1.1.2 Beowulf Cluster ...... 2
1.1.3 Network of Workstations ...... 3

2 NETWORK SETUP ...... 5

2.1 Network and Computer Hardware ...... 5
2.2 Network Configuration ...... 6
2.2.1 Configuration Files ...... 7
2.2.2 Internet Protocol Forwarding and Masquerading ...... 9
2.3 MPI–Message Passing Interface ...... 12
2.3.1 Goals ...... 12
2.3.2 MPICH ...... 14
2.3.3 Installation ...... 14
2.3.4 Enable SSH ...... 14
2.3.5 Edit Machines.LINUX ...... 16
2.3.6 Test Examples ...... 17
2.3.7 Conclusions ...... 18

3 BENCHMARKING ...... 21

3.1 Performance Metrics ...... 21
3.2 Network Analysis ...... 22
3.2.1 NetPIPE ...... 23
3.2.2 Test Setup ...... 24
3.2.3 Results ...... 26
3.3 High Performance Linpack–Single Node ...... 31
3.3.1 Installation ...... 31
3.3.2 ATLAS Routines ...... 33
3.3.3 Goto BLAS Libraries ...... 34

3.3.4 Using either Library ...... 35
3.3.5 Benchmarking ...... 36
3.3.6 Main Algorithm ...... 37
3.3.7 HPL.dat Options ...... 37
3.3.8 Test Setup ...... 42
3.3.9 Results ...... 43
3.3.10 Goto's BLAS Routines ...... 47
3.4 HPL–Multiple Node Tests ...... 49
3.4.1 Two Processor Tests ...... 49
3.4.2 Grid ...... 50
3.4.3 Three Processor Tests ...... 55
3.4.4 Four Processor Tests ...... 58
3.4.5 Conclusions ...... 60

4 CALCULIX ...... 63

4.1 Installation of CalculiX GraphiX ...... 63
4.2 Installation of CalculiX CrunchiX ...... 64
4.2.1 ARPACK Installation ...... 65
4.2.2 SPOOLES Installation ...... 66
4.2.3 Compile CalculiX CrunchiX ...... 66
4.3 Geometric Capabilities ...... 67
4.4 Pre-processing ...... 68
4.4.1 Points ...... 68
4.4.2 Lines ...... 69
4.4.3 Surfaces ...... 71
4.4.4 Bodies ...... 71
4.5 Finite-Element Mesh Creation ...... 72

5 CREATING GEOMETRY WITH CALCULIX...... 74

5.1 CalculiX Geometry Generation ...... 74
5.1.1 Creating Points ...... 75
5.1.2 Creating Lines ...... 78
5.1.3 Creating Surfaces ...... 80
5.1.4 Creating Bodies ...... 82
5.1.5 Creating the Cylinder ...... 85
5.1.6 Creating the Parallelepiped ...... 90
5.1.7 Creating Horse-shoe Section ...... 92
5.1.8 Creating the Slanted Section ...... 94
5.2 Creating a Solid Mesh ...... 95
5.2.1 Changing Element Divisions ...... 97
5.2.2 Delete and Merge Nodes ...... 103

5.2.3 Apply Boundary Conditions ...... 116
5.2.4 Run Analysis ...... 119

6 OPEN SOURCE SOLVERS ...... 121

6.1 SPOOLES ...... 121
6.1.1 Objects in SPOOLES ...... 122
6.1.2 Steps to Solve Equations ...... 122
6.1.3 Communicate ...... 123
6.1.4 Reorder ...... 125
6.1.5 Factor ...... 125
6.1.6 Solve ...... 126
6.2 Code to Solve Equations ...... 127
6.3 Serial Code ...... 127
6.3.1 Communicate ...... 127
6.3.2 Reorder ...... 129
6.3.3 Factor ...... 130
6.3.4 Communicate B ...... 133
6.3.5 Solve ...... 134
6.4 Parallel Code ...... 136
6.4.1 Communicate ...... 136
6.4.2 Reorder ...... 138
6.4.3 Factor ...... 140
6.4.4 Solve ...... 143

7 MATRIX ORDERINGS ...... 145

7.1 Ordering Optimization ...... 145
7.2 Minimum Degree Ordering ...... 147
7.3 Nested Dissection ...... 151
7.4 Multi-section ...... 152

8 OPTIMIZING SPOOLES FOR A COW ...... 153

8.1 Installation ...... 153
8.2 Optimization ...... 154
8.2.1 Multi-Processing Environment - MPE ...... 155
8.2.2 Reduce Ordering Time ...... 159
8.2.3 Optimizing the Front Tree ...... 161
8.2.4 Maxdomainsize ...... 161
8.2.5 Maxzeros and Maxsize ...... 162
8.2.6 Final Tests with Optimized Solver ...... 165
8.3 Conclusions ...... 167
8.3.1 Recommendations ...... 168

A CPI SOURCE CODE ...... 170

B BENCHMARKING RESULTS ...... 172

B.1 NetPIPE Results ...... 172
B.2 NetPIPE TCP Results ...... 173
B.3 High Performance Linpack ...... 174
B.3.1 HPL Makefiles ...... 174
B.3.2 HPL.dat File ...... 179
B.3.3 First Test Results with ATLAS ...... 179
B.3.4 HPL.dat for Second Test with ATLAS Libraries ...... 186
B.3.5 Second Test Results with ATLAS ...... 186
B.3.6 Final Test with ATLAS Libraries ...... 188
B.3.7 HPL.dat File for Multi-processor Test ...... 189
B.3.8 Goto's Multi-processor Tests ...... 190
B.3.9 HPL.dat File for Testing Broadcast Algorithms ...... 191
B.3.10 Final Test with Goto's Libraries ...... 191

C CALCULIX INSTALLATION ...... 193

C.1 ARPACK Makefile ...... 193
C.2 CalculiX CrunchiX Makefile ...... 195

D CALCULIX CRUNCHIX INPUT FILE ...... 198

E SERIAL AND PARALLEL SOLVER SOURCE CODE ...... 199

E.1 Serial Code ...... 199
E.2 Optimized Parallel Code ...... 205
E.2.1 P solver Makefile ...... 205
E.2.2 P solver Source Code ...... 206

REFERENCES ...... 217

BIOGRAPHICAL SKETCH ...... 222

LIST OF TABLES

Table page

3.1 Network hardware comparison ...... 23

3.2 ATLAS BLAS routine results ...... 46

3.3 Goto’s BLAS routine results ...... 48

3.4 Goto’s BLAS routine results–2 processors ...... 54

3.5 Goto’s BLAS routine results–3 processors ...... 57

3.6 Goto’s BLAS routine results–4 processors ...... 59

6.1 Comparison of solvers ...... 121

6.2 Utility objects ...... 123

6.3 Ordering objects ...... 123

6.4 Numeric objects ...... 124

8.1 maxdomainsize effect on solve time (seconds) ...... 162

8.2 Processor solve time (seconds)–700 maxdomainsize ...... 163

8.3 Processor solve time (seconds)–900 maxdomainsize ...... 164

8.4 Results with optimized values ...... 165

8.5 Results for large test ...... 167

LIST OF FIGURES

Figure page

1.1 Beowulf layout ...... 4

2.1 Network hardware configuration ...... 6

2.2 Original network configuration ...... 7

2.3 cpi results ...... 19

3.1 Message size vs. throughput ...... 26

3.2 MPI vs. TCP throughput comparison ...... 29

3.3 MPI vs. TCP saturation comparison ...... 29

3.4 Decrease in effective throughput with MPI ...... 30

3.5 Throughput vs. time ...... 31

3.6 Block size effect on performance for 1 node ...... 43

3.7 2D block-cyclic layout ...... 51

3.8 Block size effect on performance for 2 nodes ...... 52

3.9 Block size effect on performance for 3 nodes ...... 55

3.10 Block size effect on performance for 4 nodes ...... 58

3.11 Decrease in maximum performance ...... 61

4.1 Opening screen ...... 69

4.2 p1 with label ...... 70

4.3 Spline ...... 70

4.4 Surface ...... 71

4.5 Body created by sweeping ...... 72

5.1 Final part ...... 74

5.2 Creating points ...... 77

5.3 Selection box ...... 77

5.4 Creating lines ...... 79

5.5 Creating lines ...... 80

5.6 Creating surfaces ...... 81

5.7 Creating surface A001 ...... 82

5.8 Creating surface A002 ...... 83

5.9 Creating bodies ...... 84

5.10 Plotting bodies ...... 84

5.11 Creating the handle ...... 85

5.12 Creating the cylinder points ...... 86

5.13 Creating the cylinder lines ...... 87

5.14 Creating the cylinder surfaces ...... 88

5.15 Cylinder surfaces ...... 88

5.16 Cylinder surfaces ...... 89

5.17 Creating points for parallelepiped ...... 91

5.18 Creating lines for parallelepiped ...... 91

5.19 Creating lines for horse-shoe section ...... 93

5.20 Surfaces ...... 93

5.21 Creating body for horse-shoe section ...... 94

5.22 Creating lines for the slanted section ...... 95

5.23 Final part ...... 96

5.24 Unaligned meshes ...... 97

5.25 Changing line divisions ...... 98

5.26 Pick multiple division numbers ...... 99

5.27 Change all numbers to 9 ...... 100

5.28 Select line away from label ...... 100

5.29 Change cylinder divisions ...... 101

5.30 Change parallelepiped divisions ...... 101

5.31 Change horse-shoe section divisions ...... 102

5.32 Change horse-shoe section divisions ...... 102

5.33 Improved element spacing ...... 103

5.34 First nodal set ...... 104

5.35 Selected nodes ...... 105

5.36 Select more nodes ...... 105

5.37 Selected wrong nodes ...... 106

5.38 Correct set of nodes ...... 107

5.39 Selected extra nodes ...... 107

5.40 Select nodes to delete ...... 108

5.41 Final node set ...... 108

5.42 Select nodes ...... 110

5.43 Plot nodes ...... 110

5.44 Select nodes ...... 111

5.45 Select nodes ...... 111

5.46 Final node set ...... 112

5.47 Select nodes from the side ...... 113

5.48 Good node set ...... 113

5.49 Final node set ...... 114

5.50 Determine node distance ...... 114

5.51 Create selection box ...... 115

5.52 Final node set ...... 116

5.53 Side view of handle with nodes plotted ...... 117

5.54 Select nodes on handle inner surface ...... 118

5.55 Add nodes to set load ...... 118

5.56 von Mises stress for the part ...... 120

6.1 Three steps to numeric factorization ...... 130

6.2 Arrowhead matrix ...... 132

7.1 Original A ...... 146

7.2 Lower matrix ...... 146

7.3 Upper matrix ...... 146

7.4 Steps to elimination graph ...... 148

7.5 Minimum degree algorithm ...... 149

7.6 Modified minimum degree algorithm ...... 150

8.1 von Mises stress of cantilever ...... 155

8.2 p solver MPI communication–2 processors ...... 157

8.3 p solver MPI communication zoomed–2 processors ...... 157

8.4 p solver MPI communication–4 processors ...... 158

8.5 MPI communication for cpi ...... 159

8.6 First optimization ...... 160

8.7 Final optimization results for two processors ...... 166

8.8 Final optimization results for four processors ...... 166

8.9 Diskless cluster ...... 169

Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Science

PARALLEL COMPUTATIONAL MECHANICS WITH A CLUSTER OF WORKSTATIONS

By

Paul Johnson

December 2005

Chairman: Loc Vu-Quoc
Major Department: Mechanical and Aerospace Engineering

Presented are the steps to creating, benchmarking, and adapting an optimized parallel system of equations solver, provided by SPOOLES, to a Cluster of Workstations (CoW) constructed from Commodity Off The Shelf (COTS) components. The parallel system of equations solver is used in conjunction with the pre- and post-processing capabilities of CalculiX, a freely available three-dimensional structural finite-element program. In the first part, parallel computing is introduced and the different architectures are explained and compared. Chapter 2 explains the process of building a Cluster of Workstations: the setup of the computer and network hardware and of the underlying software that allows interprocessor communication. Next, a thorough benchmarking of the cluster with several applications that report network latency and bandwidth and overall system performance is explained. In the last chapter, the parallel solver is optimized for our Cluster of Workstations, with recommendations to further improve performance.

CHAPTER 1 PARALLEL COMPUTING

Software has traditionally been written for serial computation, performed by a single Central Processing Unit (CPU). With computational requirements always increasing along with the growing complexity of software, more computational power is constantly demanded. One way to obtain it is to increase the computational power of a single computer, but this method can become very expensive and has its limits; alternatively, a supercomputer with vector processors can be used, but that can also be very expensive. Parallel computing is another method, which utilizes the computational resources of multiple processors simultaneously by dividing the problem among the processors. Parallel computing has a wide range of uses that may not be widely known but that affect a large number of people. Some uses include predicting weather patterns, determining airplane schedules, unraveling DNA, and making automobiles safer. By using parallel computing, larger problems can be solved and the time to solve them decreases.

1.1 Types of Parallel Processing

There are several types of parallel architectures: Symmetric Multiprocessing (SMP), Massively Parallel Processing (MPP), and clusters. Symmetric multiprocessing systems contain processors that share the same memory and memory bus. These systems are limited in the number of CPUs they can use because, as the number of CPUs increases, so does the requirement for a very high speed bus to efficiently handle the data. Massively parallel processing systems overcome this limitation by using a message passing system.


The message passing scheme can connect thousands of processors each with their own memory by using a high speed, low latency network. Often the message passing systems are proprietary but the MPI [1] standard can also be used.

1.1.1 Clusters

Clusters are systems built from Commodity Off The Shelf (COTS) components connected by a high speed network. Unlike MPP systems, however, clusters largely do not use a proprietary message passing system. They often use one of the many implementations of the MPI [1] standard, such as MPICH [2] and LAM/MPI [3]. Clusters offer high availability, scalability, and the benefit of building a system with supercomputer power at a fraction of the cost [4]. By using commodity computer systems and network equipment along with the free Linux operating system, clusters can be built by large corporations or by an enthusiast in their basement. They can be built from practically any computer, from an Intel 486 based system to a high end Itanium workstation. Another benefit of using a cluster is that the user is not tied to a specific vendor or its offerings. The cluster builder can customize the cluster to their specific problem, using the hardware and software that offer the most benefit or that they are most familiar with.

1.1.2 Beowulf Cluster

A Beowulf cluster is a cluster of computers that, along with its network, is dedicated solely to parallel computing [5]. The Beowulf concept began in 1993 with Donald Becker and Thomas Sterling outlining a commodity component based cluster that would be a cost effective alternative to expensive supercomputers. In 1994, while working at the Center of Excellence in Space Data and Information Sciences (CESDIS), they started the Beowulf Project. The first Beowulf cluster was composed of sixteen Intel DX4 processors connected by channel bonded Ethernet [5]. The project was an instant success and led to further research into the possibilities of creating high performance systems based on commodity products.

A Beowulf cluster has compute nodes and a master node that presides over them. The compute nodes of a Beowulf cluster may not even have a monitor, keyboard, mouse, or video card. The compute nodes are all COTS computers, generally identical, that run open source software and a variant of the Linux or Unix operating system [6]. Linux is a robust, multitasking derivative of the Unix operating system that allows users to view the underlying source code, modify it to their needs if necessary, and also escape some of the vendor lock-in issues of some proprietary operating systems. Some benefits of using Linux are that it is very customizable, runs on multiple platforms, and can be obtained from numerous websites for free. The master node often has a monitor and keyboard, a network connection to the outside world, and another network card for connecting to the cluster. The master node performs such activities as data backup, data and workload distribution, gathering statistics on the nodes' performance or state, and allowing users to submit a problem to the cluster. Figure 1.1 is a sample configuration of a Beowulf cluster.

Figure 1.1. Beowulf layout

1.1.3 Network of Workstations

Network of Workstations (NoW) is another cluster configuration, one that strives to harness the power of underutilized workstations. This type of cluster is similar to a Cluster of Workstations (CoW) and a Pile of PCs (PoPs) [7]. The workstations can be located throughout a building or office and are connected by a high speed switched network. This type of cluster is not a Beowulf cluster because the compute nodes are also used for other activities, not just computation. A NoW has the advantage of using an existing high-speed LAN, and because workstations are continually upgraded, the technology deployed in a NoW stays current and does not suffer the technology lag often seen with traditional MPP machines [7]. The cluster that we use for our research is considered a Cluster of Workstations. This type of cluster falls between a Beowulf cluster and a Network of Workstations: the workstations are used for computation and other activities, as with a NoW, but are also more isolated from the campus network, as with a Beowulf cluster.

CHAPTER 2 NETWORK SETUP

In this chapter, different aspects of setting up a high performance computational network will be discussed: the steps taken to install the software so that the computers can communicate with each other, how the hardware is configured, how the network is secured, and also how the internal computers can still access the World Wide Web.

2.1 Network and Computer Hardware

The cluster that was built in our lab is considered a Cluster of Workstations, or CoW. Other similar clusters are Network of Workstations (NoW), and Pile of PCs (PoPs). The cluster consists of Commodity Off The Shelf (COTS) components, linked together by switched Ethernet. The cluster consists of four nodes, apollo, euclid, hydra3, and hydra4 with apollo being the master node. They are arranged as shown in Figure 2.1. Hydra3 and hydra4 each have one 2.0 GHz Pentium 4 processor with 512 KB L2 cache, Streaming SIMD Extensions 2 (SSE2), and operate on a 400 MHz system bus. Both hydra3 and hydra4 have 40 GB Seagate Barracuda hard drives, operating at 7200 rpm, with 2 MB cache. Apollo and euclid each have one 2.4 GHz Pentium 4 processor with 512 KB L2 cache, SSE2, and also operate on a 400 MHz system bus. Apollo and euclid each have a 30 GB Seagate Barracuda drive operating at 7200 rpm and with a 2 MB cache. Each computer in the cluster has 1 GB of PC2100 DDRAM. The computers are connected by a Netgear FS605 5 port 10/100 switch. As you can probably tell by the above specs, our budget is a little on the low side.


Figure 2.1. Network hardware configuration

When deciding to build a high performance computational network one aspect to consider is whether there are sufficient funds to invest in network equipment. A bottleneck for most clusters is the network. Without a high throughput and low latency network, a cluster is almost useless for certain applications. Even though the price of networking equipment is always falling, a network even for a small cluster can be expensive if Gigabit, Myrinet, or other high performance hardware is used.

2.2 Network Configuration

This section will explain the steps to get the computers in the cluster communicating with each other. One of the most important parts of a cluster is the communication backbone on which data is transferred. By properly configuring the network, the performance of a cluster is maximized and its construction is more justifiable. Each node in the cluster runs Red Hat's Fedora Core One [8] with a 2.4.22-1.2115.nptl kernel [9].

2.2.1 Configuration Files

Internet Protocol (IP) is a data-oriented method used for communication over a network between source and destination hosts. Each host on the end of an IP communication has an IP address that uniquely identifies it from all other computers. IP sends data between hosts split into packets, and the Transmission Control Protocol (TCP) puts the packets back together. Initially the computers in the lab were set up as shown in Figure 2.2.

Figure 2.2. Original network configuration

The computers could be set up as a cluster in this configuration, but it presents the problem of having the traffic between the four computers first go out to the campus servers and then back into our lab. Obviously the data traveling from the lab computers to the campus servers adds a time and bandwidth penalty, and all our parallel jobs would be greatly influenced by the traffic on the University's network. To solve this problem, an internal network was set up as illustrated in Figure 2.1. The internal network is set up so that traffic for euclid, hydra3, and hydra4 goes through apollo to reach the Internet. Apollo has two network cards, one connected to the outside world and another that connects to the internal network. Since euclid, hydra3, and hydra4 are all workstations used by students in the lab, they need to be allowed access to the outside world. This is accomplished through IP forwarding and masquerading rules within the Linux firewall, iptables. IP masquerading allows computers whose IP addresses are not known outside their own network to communicate with computers that have known IP addresses. IP forwarding allows incoming packets to be sent on to another host. Masquerading gives euclid, hydra4, and hydra3 access to the World Wide Web, with their packets passing through apollo and appearing to come from it, while forwarding allows packets destined for one of these computers to be routed through apollo to their correct destination. Excellent tutorials on configuring iptables, IP forwarding, and masquerading can be found in the Red Hat online documentation [10] and at The Linux Documentation Project [11]. To enable forwarding and masquerading, several files need to be edited: /etc/hosts, /etc/hosts.allow, and /etc/sysconfig/iptables. The hosts file maps IPs to hostnames before DNS is called, /etc/hosts.allow specifies which hosts are allowed to connect and what services they can run, and iptables enables packet filtering, Network Address Translation (NAT), and other packet mangling [12].

The format for /etc/hosts is an IP address, followed by the host's name with domain information, and an alias for the host. For our cluster, each computer lists all the other computers in the cluster and also several computers on the University's network. For example, a partial hosts file for apollo is:

192.168.0.3 euclid.xxx.ufl.edu euclid
192.168.0.5 hydra4.xxx.ufl.edu hydra4
192.168.0.6 hydra3.xxx.ufl.edu hydra3

The file /etc/hosts.allow specifies which hosts are allowed to connect and what services they are allowed to use, e.g., sshd and sendmail. This is the first checkpoint for all incoming network traffic: if a computer that is trying to connect is not listed in hosts.allow, it will be rejected. Each line has the format of the name of the daemon to which access will be granted, followed by the host that is allowed access to that daemon, and then ALLOW. For example, a partial hosts.allow file for apollo is:

ALL: 192.168.0.3: ALLOW
ALL: 192.168.0.5: ALLOW
ALL: 192.168.0.6: ALLOW

This will allow euclid, hydra4, and hydra3 access to all services on apollo.
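A common companion to hosts.allow (an addition suggested here, not a file from the original lab setup) is a catch-all /etc/hosts.deny, so that any host not explicitly listed in hosts.allow is refused by the TCP wrappers:

ALL: ALL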

2.2.2 Internet Protocol Forwarding and Masquerading

When information is sent over a network, it travels from its origin to its destination in packets. The beginning of the packet, the header, specifies its destination, where it came from, and other administrative details [12]. Using this information, iptables can filter the traffic according to specified rules, dropping or accepting packets, and redirect traffic to other computers. Rules are grouped into chains, and chains are grouped into tables. By iptables, I am also referring to the underlying netfilter framework: netfilter is a set of hooks within the kernel that inspects packets, while iptables configures the netfilter rules.

Because all the workstations in the lab require access to the Internet, iptables will be used to forward packets to specified hosts and also to allow the computers to have private IP addresses that are masqueraded to look like a public IP address. The goal of this section is not to explain in detail all the rules specified in our iptables file but simply to explain how forwarding and masquerading are set up. For our network we have an iptables script, named iptables_script, that sets the rules for iptables. The script is located in /etc/sysconfig/. To run the script, simply type as root:

root@apollo> ./iptables_script

This activates the rules defined in the script. To ensure that these rules are loaded each time the system is rebooted, create the file iptables in the directory /etc/sysconfig with the following command:

root@apollo> /sbin/iptables-save -c > iptables

To set up IP forwarding and masquerading, first open the file /etc/sysconfig/networking/devices/ifcfg-eth1. There are two network cards in apollo, eth0, which is connected to the external network, and eth1, which is connected to the internal network. Add the following lines to ifcfg-eth1:

IPADDR=192.168.0.4
NETWORK=192.168.0.0
NETMASK=255.255.255.0
BROADCAST=192.168.0.255

This will set the IP address of eth1 to 192.168.0.4. Next, open the file iptables_script and add the following lines to the beginning of the file:

# Disable forwarding
echo 0 > /proc/sys/net/ipv4/ip_forward

# load some modules (if needed)

# Flush
iptables -t nat -F POSTROUTING
iptables -t nat -F PREROUTING
iptables -t nat -F OUTPUT
iptables -F

# Set some parameters
LAN_IP_NET='192.168.0.1/24'
LAN_NIC='eth1'
FORWARD_IP='192.168.0.4'

# Set default policies
iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT

# Enable masquerading and forwarding
iptables -t nat -A POSTROUTING -s $LAN_IP_NET -j MASQUERADE
iptables -A FORWARD -j ACCEPT -i $LAN_NIC -s $LAN_IP_NET
iptables -A FORWARD -m state --state ESTABLISHED,RELATED -j ACCEPT

# Open SSH of apollo to LAN
iptables -A FORWARD -j ACCEPT -p tcp --dport 22
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 22 -j DNAT \
    --to 192.168.0.4:22

# Enable forwarding
echo 1 > /proc/sys/net/ipv4/ip_forward

Following the above lines are the rules from the original iptables_script. First, IP forwarding is disabled and the currently running iptables rules are flushed. Next, some aliases are set that make it easier to read and write the rules. After that, the default policies are set: all incoming and forwarded traffic will be dropped and all packets sent out will be allowed. With just these rules in place, no incoming traffic would be allowed, so other rules are then appended to permit certain connections. (Setting INPUT and FORWARD to ACCEPT instead would give unrestricted access to the network, NOT a good idea!) The next three lines enable masquerading of the internal network via NAT (Network Address Translation), so that all traffic appears to be coming from a single IP address, apollo's, and forwarding of IP packets to the internal network. Next, SSH is allowed between the computers on the internal network and apollo. Finally, IP forwarding is enabled in the kernel.
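As a quick sanity check (our own addition, not a step from the original setup), the rule set that is actually loaded can be listed with iptables' built-in listing options. The first command prints the filter table with the INPUT, FORWARD, and OUTPUT chains; the second prints the NAT table, where the MASQUERADE and DNAT rules set above should appear.

root@apollo> /sbin/iptables -L -n
root@apollo> /sbin/iptables -t nat -L -n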

To allow the computers on the internal network to reach the outside world through apollo, the following lines need to be added to the file /etc/sysconfig/networking/devices/ifcfg-eth0 on euclid, hydra3, and hydra4. The IP address shown below is for hydra3; the IP addresses for the internal computers are in the range set aside for internal networks by RFC 1918 [13].

BROADCAST=192.168.0.255
IPADDR=192.168.0.6
NETWORK=192.168.0.0
GATEWAY=192.168.0.4

It is important to set GATEWAY to the IP address of apollo's internal network card so that the cluster computers' traffic is routed through apollo.
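To confirm the routing on an internal node (an illustrative check, not part of the original instructions), print the kernel routing table; the default route should list 192.168.0.4, apollo's internal interface, as the gateway.

redboots@hydra3> /sbin/route -n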

2.3 MPI–Message Passing Interface

A common framework for many parallel machines is that they utilize message passing so that processes can communicate. The standardization of a message passing system began in 1992 at the Workshop on Standards for Message Passing in a Distributed Memory Environment, sponsored by the Center for Research on Parallel Computing [14]. When the Message Passing Interface, or MPI [1], was conceived, it incorporated the attractive features of several other message passing systems, and its development involved about 60 people and 40 organizations from universities, government laboratories, and industry [14].

2.3.1 Goals

By creating a message passing standard, portability between computer architectures and ease of use are achieved. With a common base of routines, vendors can efficiently implement those routines, and it is also easier to provide support for hardware. To achieve the aforementioned benefits, goals were set by the MPI Forum. These goals are [14]:

Design an API (Application Programming Interface) that defines how software components communicate with one another.

Allow efficient communication by avoiding memory-to-memory copying and allowing overlap of computation and communication.

Allow the software using MPI to be used in a heterogeneous environment.

Allow convenient C and Fortran 77 bindings for the interface.

Make the communication interface reliable.

Define an interface that is not too different from other libraries and provide extensions for greater flexibility.

Define an interface that can run on many different hardware platforms, such as distributed memory multiprocessors and networks of workstations.

Semantics of the interface should be language independent.

The interface should be designed to allow for thread safety.

The above goals offer great benefit to programmers of applications of all sizes. By keeping the logical structure of MPI language independent, programmers new to MPI will more readily grasp the concepts, while programmers of large applications will benefit from the similarity to other libraries and from the C and F77 bindings. There are some aspects that are not included in the standard. These include [14]:

Explicit shared-memory operations.

Program construction tools.

Debugging facilities.

Support for task management.

2.3.2 MPICH

There are many implementations of MPI: MPI/Pro, Chimp MPI, implementations by hardware vendors such as IBM, HP, SUN, SGI, and Digital, and others, with MPICH and LAM/MPI being the two main ones. MPICH, developed at Argonne National Laboratory and Mississippi State University, began in 1992 as an implementation that would track the MPI standard as it evolved and point out any problems that developers might encounter [15].

2.3.3 Installation

MPICH can be downloaded from the MPICH website at http://www-unix.mcs.anl.gov/mpi/mpich/. The version run in our lab is 1.2.5.2. The installation of MPICH is straightforward: download the file mpich.tar.gz and uncompress it. The directory in which MPICH is installed on our system is /home/apollo/hda8.

redboots@apollo> gunzip mpich.tar.gz
redboots@apollo> tar -xvf mpich.tar

This creates the directory mpich-1.2.5.2. The majority of the code for MPICH is device independent and is implemented on top of an Abstract Device Interface, or ADI. This allows MPICH to be more easily ported to new hardware architectures by hiding most hardware specific details [16]. The ADI used for networks of workstations is the ch_p4 device, where ch stands for "Chameleon", a symbol of adaptability and portability, and p4 stands for "portable programs for parallel processors" [15].

2.3.4 Enable SSH

The default process startup mechanism for the ch_p4 device on networks is the remote shell, rsh. Rsh allows the execution of commands on remote hosts [17] and works only if you are allowed to log into a remote machine without a password. Rsh relies on the connection coming from a known IP address on a privileged port, which creates a huge security risk because of the ease with which hackers can spoof the connection. A more secure alternative to rsh is the Secure Shell, or SSH, protocol, which encrypts the connection and uses digital signatures to positively identify the host at the other end of the connection [17]. If we were creating a computational network that was not connected to the Internet, rsh would be fine; since all our computers in the lab are connected to the Internet, using insecure communication could result in the compromise of our system by hackers. To set up SSH to work properly with MPICH, several steps need to be done. First, make sure SSH is installed on the computers on the network. Most standard installations of Linux come with SSH installed; if it is not, SSH can be downloaded from http://www.openssh.com. Next, an authentication key needs to be created. Go to the .ssh folder located in your home directory and type ssh-keygen -f identity -t rsa. When the output asks you for a passphrase, just press Enter twice.

redboots@apollo> ssh-keygen -f identity -t rsa
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in identity.
Your public key has been saved in identity.pub.
The key fingerprint is:
43:68:68:30:79:73:e2:03:d9:50:2b:f1:c1:5d:e7:60 [email protected]

This will create the two files identity and identity.pub. Now place the identity.pub key in the file $HOME/.ssh/authorized_keys, where $HOME is the user's home directory. If the user's home directory is not on a shared file system, authorized_keys should be copied into $HOME/.ssh/authorized_keys on each computer.

Also, if the file authorized_keys does not exist, create it.

redboots@apollo> touch authorized_keys
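One way to get the public key onto the other nodes (illustrative commands; any method of appending identity.pub to the remote authorized_keys file works, and the temporary file name is our own choice) is to copy it over with scp and append it on the remote machine:

redboots@apollo> scp ~/.ssh/identity.pub euclid:~/apollo_identity.pub
redboots@euclid> cat ~/apollo_identity.pub >> ~/.ssh/authorized_keys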

Finally, while in $HOME/.ssh, type:

redboots@apollo> ssh-agent $SHELL
redboots@apollo> ssh-add

The above commands allow the user to avoid typing in the passphrase each time SSH is invoked [18]. Now enter the main MPICH directory and type:

redboots@apollo> ./configure -rsh=ssh

This will configure MPICH to use SSH instead of rsh. The above steps of installing MPICH need to be performed for all the computers that are to be in the cluster.

2.3.5 Edit Machines.LINUX

In order for the master node to know which computers are available for the cluster, the file machines.LINUX needs to be edited. After MPICH is installed on all the computers, open the file /home/apollo/hda8/mpich-1.2.5.2/util/machines/machines.LINUX on the master node, apollo in our case, and edit it so that each node of the cluster is listed. In order to run an MPI program, the number of processors to use needs to be specified:

redboots@apollo> mpirun -np 4 program

In the above example, 4 is the number of processors used to run program. When execution begins, mpirun reads the file machines.LINUX to see what machines are available in the cluster. If the number of processors specified by the -np flag is more than what is listed in machines.LINUX, the difference will be made up by some processors doing more work. To achieve the best performance, it is recommended that the number of processors listed in machines.LINUX be equal to or greater than -np. The format for machines.LINUX is very straightforward: hostname:number of CPUs. On each line, a hostname is listed and, if a machine has more than one processor, a colon followed by the number of processors. For example, if there were two machines in the cluster, machine1 with one processor and machine2 with four processors, machines.LINUX would be as follows:

machine1
machine2:4

The machines.LINUX file that apollo uses is:

apollo.xxx.ufl.edu
euclid.xxx.ufl.edu
hydra4.xxx.ufl.edu
hydra3.xxx.ufl.edu

Because all our compute nodes have single processors, a colon followed by a number is not necessary.
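MPICH 1.2.x also installs a small helper script, tstmachines, in its bin directory; if memory serves, running it with the -v flag attempts a remote command on every host listed in machines.LINUX and reports any node it cannot reach, which can save debugging time before the first parallel run. Consult the MPICH installation manual for the exact usage on your version.

redboots@apollo> /home/apollo/hda8/mpich-1.2.5.2/bin/tstmachines -v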

2.3.6 Test Examples

MPICH provides several examples to test whether the network and software are set up correctly. One example computes Pi and is located in /home/apollo/hda8/mpich-1.2.5.2/examples/basic. The file cpi.c contains the source code. To calculate Pi, cpi evaluates the Gregory-Leibniz series over a user-specified number of intervals, n. MPI programs can be really small, using just six functions, or they can be very large, using over one hundred functions. The four necessary functions that cpi uses are MPI_Init, MPI_Finalize, MPI_Comm_size, and MPI_Comm_rank, with MPI_Bcast and MPI_Reduce used to send the data and reduce the returned data to a single number, respectively. The code for cpi.c is given in Appendix A. After the desired number of intervals n is defined in cpi.c, simply compile cpi by typing at a command prompt:

redboots@apollo> make cpi
/home/apollo/hda8/mpich-1.2.5.2/bin/mpicc -c cpi.c
/home/apollo/hda8/mpich-1.2.5.2/bin/mpicc -o cpi cpi.o \
    -lm

This will create the executable cpi. To run cpi, while at a command prompt in /home/apollo/hda8/mpich-1.2.5.2/examples/basic, enter the following:

redboots@apollo> mpirun -np 4 cpi
Process 0 of 4 on apollo.xxx.ufl.edu
pi is approximately 3.1415926535899033, Error is 0.00000000000011
wall clock time = 0.021473
Process 3 of 4 on hydra3.xxx.ufl.edu
Process 2 of 4 on hydra4.xxx.ufl.edu
Process 1 of 4 on euclid.xxx.ufl.edu
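To make the structure of such a program concrete, the sketch below is a stripped-down pi calculator in the same spirit as cpi.c, showing only the MPI calls named above. It is not the Appendix A listing: the hard-coded interval count, the variable names, and the use of the midpoint rule for the integral of 4/(1+x^2) are our own choices for illustration.

/* pi_sketch.c -- minimal illustration of the MPI calls used by cpi.c */
#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int n = 100000000;   /* number of intervals (assumed value) */
    int rank, size, i;
    double h, sum, x, mypi, pi;

    MPI_Init(&argc, &argv);                   /* start MPI          */
    MPI_Comm_size(MPI_COMM_WORLD, &size);     /* how many processes */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);     /* which one this is  */

    /* give every process the interval count */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* each process sums every size-th strip of the midpoint rule */
    h = 1.0 / (double)n;
    sum = 0.0;
    for (i = rank + 1; i <= n; i += size) {
        x = h * ((double)i - 0.5);
        sum += 4.0 / (1.0 + x * x);
    }
    mypi = h * sum;

    /* combine the partial sums on process 0 */
    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("pi is approximately %.16f\n", pi);

    MPI_Finalize();
    return 0;
}

It would be compiled and launched the same way as cpi, for example mpicc -o pi_sketch pi_sketch.c followed by mpirun -np 4 pi_sketch.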

Several tests were run while varying the number of intervals and processors. These results are summarized in Figure 2.3.

Figure 2.3. cpi results

2.3.7 Conclusions

Depending on the complexity of your application, MPI can be relatively simple to integrate. If the problem is easy to split and the work divides evenly among the processors, like the example that computes Pi, as few as six functions may be used. For all problems, the user needs to decide how to partition the work, how to send it to the other processors, whether the processors have to communicate with one another, and what to do with the solution that each node computes; all of this can be done with a few functions if the problem is not too complicated. When deciding whether to parallelize a program, several things should be considered. First, really understand how the serial code works. Study it and all its intricacies so that you have a good picture of how the data is moved around and operated upon; without this step, you will have no idea where to even begin the parallelization process. Also, clean up the code, remove any unnecessary functions, and simplify it as much as possible. Next, determine whether it is even possible to parallelize the program. If there are no sufficiently sized groups of data that can be solved independently on a single processor, parallelizing the code may be impossible or not worth the effort. Finally, determine whether parallelizing the code will give a speedup that justifies the effort. For example, with cpi, solving problems of fewer than one hundred million intervals simply is not worth the effort of parallelizing the code. Even though it was relatively easy to parallelize cpi, imagine trying to parallelize a program with several hundred thousand lines of code, spending many hours in the effort with the end result of an insignificant speedup. As the problem size increases and more effort is exerted by the processors than by the network, parallelization becomes more practical. With small problems, less than one hundred million intervals for the cpi example, as illustrated by the results in Figure 2.3, the penalty of network latency simply does not justify parallelization.

CHAPTER 3 BENCHMARKING

In this chapter cluster benchmarking will be discussed. There are several reasons why benchmarking a cluster is important. One is determining the sensitivity of the cluster to network parameters such as bandwidth and latency. Another reason to benchmark is to determine how scalable the cluster is: will performance scale with the addition of more compute nodes enough that the price/performance ratio is acceptable? Testing scalability helps determine the best hardware and software configuration and whether using some or all of the compute nodes is practical.
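A useful rule of thumb when thinking about scalability (our own aside; the following sections measure scaling directly) is Amdahl's law: if a fraction p of a program's work can be parallelized, the speedup on N processors is bounded by 1/((1 - p) + p/N). With p = 0.9 and N = 4, the best possible speedup is 1/(0.1 + 0.225), or about 3.1, no matter how fast the network is.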

3.1 Performance Metrics

An early measure of performance used to benchmark machines was MIPS, or Million Instructions Per Second. This benchmark refers to the number of low-level machine code instructions that a processor can execute in one second. It does not take into account, however, that all chips are not the same in the way that they handle instructions. For example, a 2.0 GHz 32-bit processor will have a 2000 MIPS rating and a 2.0 GHz 64-bit processor will also have a 2000 MIPS rating. This is an obviously flawed rating, because software written specifically for the 64-bit processor will solve a comparable problem much faster than the 32-bit processor with software written specifically for it. The widely accepted metric of processing power used today is FLOPS, or Floating Point Operations Per Second. This unit measures the number of calculations that a computer can perform on floating point numbers, that is, numbers with a certain precision. A problem with this measurement is that it does not take into account the conditions

in which the benchmark is being conducted. For example, if a machine is being benchmarked and also subjected to an intense computation simultaneously, the reported FLOPS will be lower. Despite its shortcomings, FLOPS is widely used to measure cluster performance. The real answer to all benchmark questions is found when the applications for the cluster are installed and tested: when the actual applications, and not just a benchmark suite, are tested, a much more accurate assessment of the cluster's performance is obtained.
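As a concrete illustration (the numbers are ours, not a measurement from this cluster), the High Performance Linpack benchmark introduced in Section 3.3 counts approximately 2/3 N^3 + 2 N^2 floating point operations for a problem of order N. Solving a system with N = 5000 in 25 seconds would therefore be reported as roughly (2/3)(5000)^3 / 25 ≈ 3.3 x 10^9 FLOPS, or 3.3 GFLOPS.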

3.2 Network Analysis

In this section, NetPIPE (Network Protocol Independent Performance Evaluator) will be discussed [19]. The first step in benchmarking a cluster is to determine whether the network you are using is operating efficiently and to get an estimate of its performance. From the NetPIPE website, "NetPIPE is a protocol independent tool that visually represents network performance under a variety of conditions" [19]. NetPIPE was originally developed at the Scalable Computing Laboratory by Quinn Snell, Armin Mikler, John Gustafson, and Guy Helmer. It is currently being developed and maintained by Dave Turner. A major bottleneck in high performance parallel computing is the network on which it communicates [20]. By identifying which factors affect interprocessor communication the most and reducing their effect, application performance can be greatly improved. The two major factors that affect overall application performance on a cluster are network latency, the delay between when a piece of data is sent and when it is received, and the maximum sustainable bandwidth, the amount of data that can be sent over the network continuously [20]. Some other factors that affect application performance are CPU speed, the CPU bus, the cache size of the CPU, and the I/O performance of each node's hard drive.
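As a rough model (a back-of-the-envelope illustration added here, not a formula from NetPIPE), the time to deliver a message is approximately latency + size/bandwidth. On Fast Ethernet with about 70 microseconds of latency and 100 Mbps of bandwidth, a 1 KB (8192-bit) message costs roughly 70 + 82 = 152 microseconds, so latency is about half the total; for a 1 MB message the bandwidth term grows to about 84 milliseconds and the latency becomes negligible. This is why the small-message and large-message behavior of the network are examined separately below.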

Fine tuning the performance of a network can be a very time consuming and expensive process and requires a lot of knowledge of network hardware to fully utilize the hardware's potential. For this section I will not go into too many details about network hardware. The minimum network hardware that should be used for a Beowulf today is one based on one hundred Megabit per second (100 Mbps) Ethernet technology. With the increasing popularity and decreasing cost of one thousand Megabit per second (1000 Mbps) hardware, a much better choice would be Gigabit Ethernet. If very low latency and high, sustainable bandwidth are required for your application, and cost is not too important, Myrinet [21] or other proprietary network hardware is often used. From the comparison chart, Table 3.1, both Fast Ethernet and Gigabit Ethernet technologies have a much higher latency than Myrinet hardware. The drawback of Myrinet technology, even for a small four node cluster, is its price. The data for the table were estimated using the computer hardware price search provided by http://www.pricewatch.com.

Table 3.1. Network hardware comparison

Network            Latency (microsec)   Max. Bandwidth (Mbps)   Cost ($/4 nodes)
Fast Ethernet              70                    100                  110
Gigabit Ethernet           80                   1000                  200
Myrinet                     7                   2000                 6000

3.2.1 NetPIPE

NetPIPE essentially gives an estimate of the performance of the network in a cluster. NetPIPE analyzes network performance by performing a simple ping-pong test, bouncing messages of increasing size between two processors. A ping-pong test, as the name implies, simply sends data to another processor, which in turn sends it back. Using the total time for the packet to travel between the processors and knowing the message size, the bandwidth can be calculated. Bandwidth is the amount of data that can be transferred through a network in a given amount of time; typical units of bandwidth are Megabits per second (Mbps) and Gigabits per second (Gbps). To provide a complete and accurate test, NetPIPE uses message sizes at regular intervals, and at each data point many ping-pong tests are carried out. This gives an overview of the network performance with unloaded CPUs. Applications may not reach the reported maximum bandwidth because NetPIPE only measures the network performance of unloaded CPUs; measuring the network performance with loaded CPUs is not yet possible.
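The heart of such a measurement can be written in a few lines of MPI. The sketch below is our own minimal version of a ping-pong timing loop, not NetPIPE's source; the fixed message size and repetition count are arbitrary choices, whereas NetPIPE sweeps the message size and chooses the repetition count adaptively.

/* pingpong.c -- minimal sketch of a ping-pong bandwidth/latency test.
   Run with two processes, e.g. mpirun -np 2 pingpong                  */
#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int rank, i, reps = 1000, nbytes = 1024;   /* assumed test values */
    char *buf;
    double t0, t1;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    buf = (char *)malloc(nbytes);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < reps; i++) {
        if (rank == 0) {            /* bounce the message off process 1 */
            MPI_Send(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, nbytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
        } else if (rank == 1) {
            MPI_Recv(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
            MPI_Send(buf, nbytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0) {
        double one_way = (t1 - t0) / (2.0 * reps);           /* seconds      */
        double mbps = (8.0 * nbytes) / (one_way * 1.0e6);    /* Megabits/sec */
        printf("%d bytes: %.2f usec one-way, %.2f Mbps\n",
               nbytes, one_way * 1.0e6, mbps);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}

Halving the round-trip time gives the one-way time that NetPIPE reports; for very small messages this number approaches the network latency, and for large messages the computed Mbps approaches the saturation bandwidth.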

3.2.2 Test Setup

NetPIPE can be obtained from its website at http://www.scl.ameslab.gov/netpipe/. Download the latest version and unpack it. The install directory for NetPIPE on our system is /home/apollo/hda8.

redboots@apollo> tar -xvzf NetPIPE_3.6.2.tar.gz

To install, enter the directory NetPIPE_3.6.2 that was created after unpacking the above file. Edit the file makefile with your favorite text editor so that it points to the correct compiler, libraries, include files, and directories. The file makefile did not need any changes for our setup. To make the MPI interface, make sure the compiler is set to mpicc. Next, in the directory NetPIPE_3.6.2, type make mpi:

redboots@apollo> make mpi
mpicc -O -DMPI ./src/netpipe.c ./src/mpi.c -o NPmpi -I./src

This will create the executable NPmpi. To run NPmpi, simply type at a command prompt:

mpirun -np 2 NPmpi -o np.out.mpi

This will run NetPIPE on the first two machines listed in /home/apollo/hda8/mpich-1.2.5.2/util/machines/machines.LINUX. NetPIPE will by default print the results to the command prompt and also to the file np.out.mpi specified after the -o option flag; the full np.out.mpi file for apollo and hydra4 is shown in Appendix B.1. The format of the output is as follows: the first column is the run number, the second column is the message size in bytes, the third column is the number of times the message was sent between the two nodes, the fourth column is the throughput, and the fifth column is the round-trip time divided by two. Below is a partial output from a test run between apollo and hydra4.

redboots@apollo> mpirun -np 2 NPmpi -o np.out.mpi
0: apollo
1: hydra4
Now starting the main loop
0: 1 bytes 1628 times --> 0.13 Mbps in 60.98 usec
1: 2 bytes 1639 times --> 0.25 Mbps in 60.87 usec
2: 3 bytes 1642 times --> 0.37 Mbps in 61.07 usec
3: 4 bytes 1091 times --> 0.50 Mbps in 61.46 usec
4: 6 bytes 1220 times --> 0.74 Mbps in 61.48 usec
5: 8 bytes 813 times --> 0.99 Mbps in 61.86 usec
6: 12 bytes 1010 times --> 1.46 Mbps in 62.53 usec
7: 13 bytes 666 times --> 1.58 Mbps in 62.66 usec
8: 16 bytes 736 times --> 1.93 Mbps in 63.15 usec
. . .
116: 4194304 bytes 3 times --> 87.16 Mbps in 367126.34 usec
117: 4194307 bytes 3 times --> 87.30 Mbps in 366560.66 usec
118: 6291453 bytes 3 times --> 87.24 Mbps in 550221.68 usec
119: 6291456 bytes 3 times --> 87.21 Mbps in 550399.18 usec
120: 6291459 bytes 3 times --> 87.35 Mbps in 549535.67 usec
121: 8388605 bytes 3 times --> 87.32 Mbps in 732942.65 usec
122: 8388608 bytes 3 times --> 87.29 Mbps in 733149.68 usec
123: 8388611 bytes 3 times --> 87.37 Mbps in 732529.83 usec

3.2.3 Results

NetPIPE was run on apollo and hydra4 while both CPUs were idle with the following command:

mpirun -np 2 NPmpi -o np.out.mpi

The results are found in Appendix B.1. The first set of data that is plotted compares the maximum throughput and transfer block size. This is shown in Figure 3.1.

Figure 3.1. Message size vs. throughput

This graph allows easy visualization of the maximum throughput for a network. For the network used in our cluster, a maximum throughput of around 87 Mbps was recorded, which is an acceptable rate for a 100 Mbps network. If the throughput suddenly dropped or were not at an acceptable rate, there would obviously be a problem with the network. It should be noted that a 100 Mbps network will not reach its full 100 Mbps rating; this can be attributed to the overhead introduced by the different network layers, e.g., the Ethernet card driver, the TCP layer, and the MPI routines [22].

NetPIPE also allows testing of the TCP bandwidth without MPI-induced overhead. To run this test, first create the NPtcp executable. Installing NPtcp on our cluster required no changes to the file makefile in the NetPIPE_3.6.2 directory. To create the NPtcp executable, simply type at the command prompt while in the NetPIPE_3.6.2 directory:

redboots@apollo> make tcp
cc -O ./src/netpipe.c ./src/tcp.c -DTCP -o NPtcp -I./src

This will create the executable NPtcp. The TCP benchmark requires both a sender and a receiver node; in our test, hydra4 was designated the receiver node and apollo the sender. Unlike NPmpi, this test requires you to open a terminal and install NPtcp on both machines. First, log into hydra4 and install NPtcp following the above example. For hydra4, the executable NPtcp is located in /home/hydra4/hda13/NetPIPE_3.6.2. While in this directory, at a command prompt type ./NPtcp:

redboots@hydra4> ./NPtcp
Send and receive buffers are 16384 and 87380 bytes
(A bug in Linux doubles the requested buffer sizes)

The above command allows hydra4 to act as the receiver; it needs to be retyped for each separate run. Next, log into apollo and enter the directory in which NPtcp is installed, /home/apollo/hda8/NetPIPE_3.6.2 on our system. While in this directory, at a command prompt start NPtcp, specifying hydra4 as the receiver:

redboots@apollo> ./NPtcp -h hydra4 -o np.out.tcp
Send and receive buffers are 16384 and 87380 bytes
(A bug in Linux doubles the requested buffer sizes)

Now starting the main loop
0: 1 bytes 2454 times --> 0.19 Mbps in 40.45 usec
1: 2 bytes 2472 times --> 0.38 Mbps in 40.02 usec
2: 3 bytes 2499 times --> 0.57 Mbps in 40.50 usec
3: 4 bytes 1645 times --> 0.75 Mbps in 40.51 usec
4: 6 bytes 1851 times --> 1.12 Mbps in 41.03 usec
5: 8 bytes 1218 times --> 1.47 Mbps in 41.64 usec
6: 12 bytes 1500 times --> 2.18 Mbps in 42.05 usec
7: 13 bytes 990 times --> 2.33 Mbps in 42.54 usec
. . .
116: 4194304 bytes 3 times --> 89.74 Mbps in 356588.32 usec
117: 4194307 bytes 3 times --> 89.74 Mbps in 356588.50 usec
118: 6291453 bytes 3 times --> 89.75 Mbps in 534800.34 usec
119: 6291456 bytes 3 times --> 89.75 Mbps in 534797.50 usec
120: 6291459 bytes 3 times --> 89.75 Mbps in 534798.65 usec
121: 8388605 bytes 3 times --> 89.76 Mbps in 712997.33 usec
122: 8388608 bytes 3 times --> 89.76 Mbps in 713000.15 usec
123: 8388611 bytes 3 times --> 89.76 Mbps in 713000.17 usec

Running NPtcp will create the file np.out.tcp. The -h hydra4 option specifies the hostname of the receiver, in this case hydra4; either the IP address or the hostname can be used if the receiver's hostname and corresponding IP address are listed in /etc/hosts. The -o np.out.tcp option specifies that the output file be named np.out.tcp. The format of this file is the same as the np.out.mpi file created by NPmpi, and the file np.out.tcp is found in Appendix B.2. To compare the overhead cost of MPI on maximum throughput, the throughput of both the TCP and MPI runs was plotted against message size. In Figure 3.2 the comparison of maximum throughput can be seen: the TCP test, without MPI overhead, consistently recorded a higher throughput throughout the message size range, as expected. Another interesting plot is message size versus the time the packets travel, seen in Figure 3.3. This plot shows the saturation point for sending data through TCP and with MPI routines on top of TCP. The saturation point is the position on the graph after which an increase in block size results in an almost linear increase in transfer time [23]; it is most easily located at the "knee" of the curve. For both the TCP and MPI runs, the saturation point occurred around 130 bytes. After that, both rose linearly together, with no distinction between the two after a message size of one Kilobyte. It can be concluded that the overhead induced by MPI routines does not greatly affect latency performance for message sizes above one hundred thirty bytes. The greatest decrease in throughput also occurs for small message sizes. From Figure 3.4, there is a fairly consistent percentage decrease in throughput for message sizes below one Kilobyte; in that range there is as much as a 35 percent decrease in throughput when MPI routines are added on top of TCP.

Figure 3.2. MPI vs. TCP throughput comparison

Figure 3.3. MPI vs. TCP saturation comparison

Figure 3.4. Decrease in effective throughput with MPI

The next plot, Figure 3.5, is considered the network signature graph. It plots the transfer speed versus the elapsed time for the data to travel and is also considered an "acceleration" graph [23]. To construct this plot, elapsed time was plotted on the horizontal axis using a logarithmic scale and throughput on the vertical axis. In Figure 3.5, the latency is given by the first point on the graph, which for our network occurs at around 61 microseconds. Since we are using Fast Ethernet, this is an acceptable latency [22]. Figure 3.5 also allows easy reading of the maximum throughput, around 87 Mbps for our network.

Figure 3.5. Throughput vs. time

3.3 High Performance Linpack–Single Node

High Performance Linpack (HPL) is a portable implementation of the Linpack benchmark for distributed memory computers. It is widely used to benchmark clusters and supercomputers and is used to rank the top five hundred computers in the world at http://www.top500.org. HPL was developed at the Innovative Computing Laboratory at the University of Tennessee Computer Science Department. The goal of this benchmark is to provide a "testing and timing program to quantify the accuracy of the obtained solution as well as the time it took to compute it" [24].

3.3.1 Installation

To install HPL, first go to the project webpage at http://www.netlib.org/benchmark/hpl/index.html. Near the bottom of the page there is a hyperlink for hpl.tgz. Download the package to the directory of your choice and unpack it with the command:

tar -xvzf hpl.tgz

This will create the folder hpl. Next, enter the directory hpl and copy the file Make.Linux_PII_CBLAS from the setup directory to the main hpl directory, renaming it Make.Linux_P4:

redboots@apollo> cp setup/Make.Linux_PII_CBLAS \
    Make.Linux_P4

There are several other Makefiles located in the setup folder for different architectures. We are using Pentium 4's, so the Make.Linux_PII_CBLAS Makefile was chosen and edited so that it points to the correct libraries on our system; the Makefile used for our benchmarking is shown in Appendix B.3.1. First, open the file in your favorite text editor and edit it so that it points to your MPI directory and MPI libraries. Also, edit the file so that it points to your correct BLAS (Basic Linear Algebra Subprograms) library, as described below. BLAS are routines for performing basic vector and matrix operations; the website for BLAS is found at http://www.netlib.org/blas/. Note that the libraries used for the benchmarks were either those provided by ATLAS or by Kazushige Goto, which will be discussed shortly. After the Makefile is configured for your particular setup, HPL can be compiled. To do this simply type at the command prompt:

make arch=Linux_P4

The HPL binary, xhpl, will be located in $hpl/bin/Linux_P4. Also created is the file HPL.dat, which provides a way of editing the parameters that affect the benchmarking results.

3.3.2 ATLAS Routines

For HPL, the most critical part of the software is the matrix-matrix multiplication routine, DGEMM, which is part of the BLAS. A widely used set of optimized BLAS routines is ATLAS, or Automatically Tuned Linear Algebra Software [25]. The website for ATLAS is located at http://math-atlas.sourceforge.net. The ATLAS routines strive to create optimized software for different processor architectures. To install precompiled ATLAS routines for a particular processor, first go to http://www.netlib.org/atlas/archives. On this page are links for AIX, SunOS, Windows, OS-X, IRIX, HP-UX, and Linux. Our cluster is using the Linux operating system, so the Linux link was clicked. The next page lists precompiled routines for several processors, including the Pentium 4 with Streaming SIMD Extensions 2 (SSE2), the AMD Hammer, the PowerPC, the Athlon, the Itanium, and the Pentium III. The processors that we are using in the cluster are Pentium 4s, so the file atlas3.6.0_Linux_P4SSE2.tgz was downloaded to /home/apollo/hda8 and unpacked.

redboots@apollo> tar -xvzf atlas3.6.0_Linux_P4SSE2.tgz

This creates the folder Linux_P4SSE2. Within this directory are the Makefile that was used to compile the libraries, the folder containing the precompiled libraries, lib, and, in the include directory, the C header files for the C interface to BLAS and LAPACK. To link to the precompiled ATLAS routines in HPL, simply point to them in the Makefile:

LAdir = /home/apollo/hda8/Linux_P4SSE2/lib
LAlib = $(LAdir)/libcblas.a $(LAdir)/libatlas.a

Also, for the ATLAS libraries, uncomment the line that reads:

HPL_OPTS = -DHPL_CALL_CBLAS

Finally, compile the executable xhpl as shown above. Enter the main HPL directory and type:

redboots@apollo> make arch=Linux_P4

This creates the executable xhpl and the configuration file HPL.dat.

3.3.3 Goto BLAS Libraries

Initially, the ATLAS routines were used in HPL to benchmark the cluster. The results of the benchmark using the ATLAS routines were then compared to the results using another optimized set of BLAS routines developed by Kazushige Goto [26]. The libraries developed by Kazushige Goto are located at http://www.cs.utexas.edu/users/kgoto/signup_first.html. The libraries located at this website are optimized BLAS routines for a number of processors including the Pentium III, Pentium IV, AMD Opteron, Itanium2, Alpha, and PPC. A more in-depth explanation as to why this library performs better than ATLAS is located at http://www.cs.utexas.edu/users/flame/goto/. To use these libraries on our cluster, the routines optimized for Pentium 4s with 512 KB L2 cache were used, libgoto_p4_512-r0.96.so.gz. Also, the file xerbla.f needs to be downloaded, which is located at http://www.cs.utexas.edu/users/kgoto/libraries/xerbla.f. This file is simply an error handler for the routines. To use these routines, first download the appropriate file for your architecture along with xerbla.f. For our cluster, libgoto_p4_512-r0.96.so.gz was downloaded to /home/apollo/hda8/goto_blas. Unpack the file libgoto_p4_512-r0.96.so.gz:

redboots@apollo> gunzip libgoto_p4_512-r0.96.so.gz

This creates the file libgoto_p4_512-r0.96.so. Next, download the file xerbla.f from the website listed above to /home/apollo/hda8/goto_blas. Then create the binary object file for xerbla.f:

redboots@apollo> g77 -c xerbla.f

This will create the file xerbla.o. These two files, libgoto_p4_512-r0.96.so and xerbla.o, need to be pointed to in the HPL Makefile.

LAdir = /home/apollo/hda8/goto_blas
LAlib = $(LAdir)/libgoto_p4_512-r0.96.so $(LAdir)/xerbla.o

Also, the following line needs to be commented out. Placing a pound symbol in front of a line tells make to ignore that line and treat it as a comment.

#HPL_OPTS = -DHPL_CALL_CBLAS

3.3.4 Using either Library

Two sets of tests were carried out with HPL: one using the ATLAS routines and the other using Kazushige Goto's routines. These tests were carried out for several reasons. One is to illustrate the importance of well written, well compiled software to a cluster's performance. Without software that is optimized for a particular hardware architecture or network topology, the performance of a cluster suffers greatly. Another reason two tests using the different BLAS routines were conducted is to get a more accurate assessment of our cluster's performance. By using a benchmark, we have an estimate of how our applications should perform. If the parallel applications that we use perform at a much lower level than the benchmark, we can conclude that our software is not properly tuned for our particular hardware or contains inefficient coding. Below, the process of using either the ATLAS or Kazushige Goto's BLAS routines is described. The first tests that were conducted used the ATLAS routines. Compile the HPL executable, xhpl, as described above for the ATLAS routines using the Makefile in Appendix B.3.1. After the tests are completed using the ATLAS routines, simply change the links to Goto's BLAS routines and comment out the line that selects the CBLAS (C) interface. For example, the section of the Makefile that determines which library to use is seen below. In the section shown, Goto's BLAS routines are specified.

# Below the user has a choice of using either the ATLAS or Goto
# BLAS routines. To use the ATLAS routines, uncomment the
# following 2 lines and comment the 3rd and 4th. To use Goto's BLAS
# routines, comment the first 2 lines and uncomment the 3rd and 4th.

# BEGIN BLAS specification
LAdir = /home/apollo/hda8/hpl/libraries
LAlib = $(LAdir)/libgoto_p4_512-r0.96.so $(LAdir)/xerbla.o
#LAdir = /home/apollo/hda8/Linux_P4SSE2/lib
#LAlib = $(LAdir)/libcblas.a $(LAdir)/libatlas.a
# END BLAS specification

If Goto's routines are to be used, uncomment the two lines that specify those routines and comment the two lines for the ATLAS routines. The line that selects the CBLAS (C) interface is also commented out when using Goto's BLAS routines.

#HPL_OPTS = -DHPL_CALL_CBLAS

If the ATLAS routines are to be used, the above line would be uncommented. After that, xhpl is recompiled using the method described above.

redboots@apollo> make arch=Linux_P4

3.3.5 Benchmarking

To determine a FLOPS value for the machine(s) to be benchmarked, HPL solves a random dense linear system of equations in double precision. HPL solves the random dense system by first computing the LU factorization with row-partial pivoting and then solving the upper triangular system. HPL is very scalable; it is the benchmark used on supercomputers with thousands of processors found on the Top 500 List of Supercomputing Sites and can be used on a wide variety of computer architectures.
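The Gflops value that HPL reports is computed from the approximate operation count of the factorization and solve, 2/3·N^3 + 2·N^2 floating point operations, divided by the wall-clock time of the run. A small sketch of that calculation (our own code, not HPL's; the 235-second timing below is only an assumed example) is shown here:

#include <stdio.h>

/* Sketch: estimate the Gflops rate HPL would report for an N x N
 * system solved in t seconds, using the LU operation count
 * 2/3*N^3 + 2*N^2.                                                  */
static double hpl_gflops(double n, double t)
{
    double flops = (2.0 / 3.0) * n * n * n + 2.0 * n * n;
    return flops / t / 1.0e9;
}

int main(void)
{
    /* A 10000 x 10000 problem solved in roughly 235 s corresponds to
     * about 2.84 Gflops, comparable to the single-node results
     * reported below.                                               */
    printf("%.3f Gflops\n", hpl_gflops(10000.0, 235.0));
    return 0;
}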

3.3.6 Main Algorithm

HPL solves a linear system of equations, Eq. 3.1, for x using LU factorization. It first factors the coefficient matrix A into the product of a lower and an upper triangular matrix.

Ax = b (3.1)

Next, y is solved by forward substitution in Eq. 3.2

Ly = b (3.2)

Finally, x is solved by back substitution in Eq. 3.3.

Ux = y (3.3)

To distribute the data and provide an acceptable level of load balancing, a two-dimensional P-by-Q process grid is utilized. The n-by-n+1 coefficient matrix is partitioned into nb-by-nb blocks that are cyclically distributed onto the P-by-Q process grid. In each iteration, a panel of nb columns is factorized, and the trailing submatrix is updated [24]. After the factorization is complete and a value for x is solved, HPL regenerates the input matrix and vector and substitutes the computed value of x to obtain a residual. If the residual is less than a threshold value of the order of 1.0, then the solution x is considered "numerically correct" [24]. A further explanation of the algorithm is found at the project's website [24].
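To make the three steps in Eqs. 3.1 through 3.3 concrete, the following small C sketch (our own illustration, not HPL's code) factors a 3 x 3 matrix into L and U without pivoting, unlike HPL, which uses row-partial pivoting, and then performs the forward and back substitutions:

#include <stdio.h>

#define N 3

int main(void)
{
    double a[N][N] = { {4, 2, 1}, {2, 5, 2}, {1, 2, 6} };  /* A          */
    double b[N]    = {11, 18, 23};        /* chosen so that x = (1,2,3)  */
    double y[N], x[N];
    int i, j, k;

    /* In-place LU factorization: the unit lower triangle L and the
     * upper triangle U overwrite a[][].                               */
    for (k = 0; k < N; k++)
        for (i = k + 1; i < N; i++) {
            a[i][k] /= a[k][k];               /* multiplier l_ik        */
            for (j = k + 1; j < N; j++)
                a[i][j] -= a[i][k] * a[k][j]; /* update trailing block  */
        }

    /* Forward substitution, Ly = b (L has an implicit unit diagonal). */
    for (i = 0; i < N; i++) {
        y[i] = b[i];
        for (j = 0; j < i; j++)
            y[i] -= a[i][j] * y[j];
    }

    /* Back substitution, Ux = y. */
    for (i = N - 1; i >= 0; i--) {
        x[i] = y[i];
        for (j = i + 1; j < N; j++)
            x[i] -= a[i][j] * x[j];
        x[i] /= a[i][i];
    }

    for (i = 0; i < N; i++)
        printf("x[%d] = %f\n", i, x[i]);
    return 0;
}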

3.3.7 HPL.dat Options

When HPL is compiled, a file HPL.dat is created which holds the options that direct how HPL is run. Here, the format and main options of this file will be discussed. Below is a sample HPL.dat file used during a benchmarking run.

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee

HPL.out         output file name (if any)
1               device out (6=stdout,7=stderr,file)
3               # of problems sizes (N)
1000 4800 8000  Ns
4               # of NBs
60 80 120       NBs
1               PMAP process mapping (0=Row,1=Column)
2               # of process grids (P x Q)
1 2             Ps
4 2             Qs
16.0            threshold
3               # of panel fact
0 1 2           PFACTs (0=left, 1=Crout, 2=Right)
3               # of recursive stopping criterium
2 4 8           NBMINs (>= 1)
1               # of panels in recursion
2               NDIVs
3               # of recursive panel fact.
0 1 2           RFACTs (0=left, 1=Crout, 2=Right)
6               # of broadcast
0 1 2 3 4 5     BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1               # of lookahead depth
1               DEPTHs (>=0)
2               SWAP (0=bin-exch,1=long,2=mix)
64              swapping threshold
0               L1 in (0=transposed,1=no-transposed) form
0               U in (0=transposed,1=no-transposed) form

1               Equilibration (0=no,1=yes)
8               memory alignment in double (> 0)

The first two lines of the file are not used. The third line lets the user choose the name of the file the results will be saved to, if desired, in this case HPL.out. The fourth line directs the output to either the command terminal or to the file whose name is given in the third line. The program will print to a file if the value in line four is anything other than 6 or 7; for the above example, output will be written to the file HPL.out because line four is set to 1. The fifth line allows the user to specify how many linear systems of equations will be solved, and line six specifies the sizes, Ns, of the matrices. The generated dense matrix will therefore have the dimension Ns x Ns. The limiting factor in choosing a matrix size is the amount of physical memory on the computers to be benchmarked. A benchmark will return much better results if only the physical memory, Random Access Memory (RAM), is used and not the virtual memory. Virtual memory is a method of simulating RAM by using the hard drive for data storage. To calculate the maximum matrix size that should be used, first add up the amount of RAM on the computers in the cluster. For example, our cluster has four nodes with one gigabyte of RAM each, for a total of four gigabytes of physical memory. Now multiply the total number of bytes by 1 element per 8 bytes, Eq. 3.4, since each entry in the matrix has a size of eight bytes.

number of elements = 4,000,000,000 bytes × (1 element / 8 bytes)   (3.4)

The above result gives the total number of entries allowed in the matrix, in this example 500,000,000. Taking the square root, Eq. 3.5, gives the matrix size.

matrix size = √(number of elements)   (3.5)

For this example the maximum matrix size is around 22,000 x 22,000. Allowing enough memory for the operating system and other system processes reduces the above dimensions to a maximum allowable matrix size of around 20,000 x 20,000. Lines seven and eight, respectively, specify the number of block sizes to be tested in different runs and the sizes of those blocks. The block size, used during the data distribution, helps determine the computational granularity of the problem. If the block size is too small, much more communication between the nodes is necessary in order to transfer the data. If the block size is too large, a compute node may not be able to handle the computation efficiently. Block sizes typically range from 60 to 180 depending on the network and compute node architecture [24]. Line nine specifies how the MPI processes should be mapped onto the compute nodes. Mapping is a way of specifying which processors execute which threads; the two possible mappings are row and column major. Lines ten through twelve allow the user to specify the number of grid configurations to be run and the layout of the grids. For the above example, two grid configurations will be run: the first has a 1 by 4 layout and the second a 2 by 2 layout. If the user wanted to test HPL on a single computer, the number of process grids would be 1, and likewise the values of P and Q.

1               # of process grids (P x Q)
1               Ps
1               Qs

Line thirteen specifies the threshold to which the residuals should be compared. The default value is sixteen and is the recommended value, which will cover most cases [24]. If the residuals are larger than the threshold value, the run will be marked as a failure even though the results can be considered correct. For our benchmarking, the default value of 16 will be used for large problem sizes and -16 for small problems. A negative threshold causes xhpl to skip checking of the results. The user may wish to skip the checking of results if a quick benchmark is desired without the extra time spent verifying the solution. Lines fourteen through twenty-one allow the user to choose the panel factorization, PFACTs, and recursive panel factorization, RFACTs. The panel factorization is matrix-matrix operation based and recursive, dividing the panel into NDIVs subpanels at each step [24]. The recursion continues until there are less than or equal to NBMINs columns left. For both the panel and recursive panel factorization, the user is allowed to test left-looking, right-looking, and Crout LU factorization algorithms.

3               # of panel fact
0 1 2           PFACTs (0=left, 1=Crout, 2=Right)
2               # of recursive stopping criterium
2 4             NBMINs (>= 1)
1               # of panels in recursion
2               NDIVs
3               # of recursive panel fact.
0 1 2           RFACTs (0=left, 1=Crout, 2=Right)

The above example tests the three LU factorization algorithms for both the panel factorization and the recursive panel factorization, tests two cases of stopping the recursion at two and four columns in the current panel, and tests one case of dividing the panel into two subpanels. Lines twenty-two and twenty-three specify how the panels are to be broadcast to the other processors after factorization. The six available broadcast algorithms are increasing-ring, modified increasing-ring, increasing-2-ring, modified increasing-2-ring, long bandwidth-reducing, and modified long bandwidth-reducing [24].

The remaining lines specify options that further tune the benchmark. For our runs the recommended values will be used. A further explanation of the remaining options can be found at the HPL website [24].
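The memory rule of thumb in Eqs. 3.4 and 3.5 is easy to turn into a small helper. The sketch below (our own code, not part of HPL; the 80 percent figure is an assumption, not a rule) computes the largest square problem size that fits in a given amount of RAM and then scales it back to leave room for the operating system:

#include <math.h>
#include <stdio.h>

/* Sketch of Eqs. 3.4 and 3.5: each matrix entry is an 8-byte double,
 * so N = sqrt(usable bytes / 8).                                     */
static long max_problem_size(double bytes_of_ram, double usable_fraction)
{
    return (long)sqrt(bytes_of_ram * usable_fraction / 8.0);
}

int main(void)
{
    /* Four nodes with one gigabyte of RAM each, as in the example above. */
    printf("all memory   : N = %ld\n", max_problem_size(4.0e9, 1.0));
    printf("80%% of memory: N = %ld\n", max_problem_size(4.0e9, 0.8));
    return 0;
}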

3.3.8 Test Setup

Many tests were conducted with different parameters specified in the file HPL.dat. Initially, a small problem size, Ns, was tested while varying the other options. From there it was determined which options allowed xhpl to perform the best. A small problem size was used primarily because it would take a long time to conduct all the tests with a large size; if 5 block sizes, NBs, all 6 panel broadcast methods, BCASTs, and 2 process grids were tested, that would be 60 tests. Even though the Gflops reported with a small problem size will not be the highest attainable, we are still able to determine which parameters affect relative performance. After a test with a small problem size was conducted, the options that performed the worst were eliminated and tests with the remaining, better performing options were carried out. This process of eliminating the worst performing options and rerunning the tests with the best options was continued until a set of options that performed the best was obtained. The first test conducted involved a problem size, Ns, of 1000 on apollo. Initially only apollo will be tested to get a base measurement of a node's performance. After apollo is fully tested, two, then three, then all of the nodes will be tested so that the scalability of the cluster can be determined. The HPL.dat file used for the initial testing of apollo is seen in Appendix B.3.2. It tests six block sizes; left-looking, right-looking, and Crout's method for factorization; three values for the number of subpanels to create; and four values for the number of columns at which recursive panel factorization stops. It should be noted that the threshold used for these tests will be -16. By using a negative threshold, checking of the results will not be performed. For larger tests and tests using the network, the results will be checked to determine if the network is correctly transferring data. Also, for a single CPU test, testing the panel broadcast algorithms is not necessary because the network is not being used.

3.3.9 Results

The results for the initial tests on apollo are shown in Appendix B.3.3. From the data and Figure 3.6 there is a very noticeable correlation between block size, NBs, and performance. The best performance was achieved at a block size of 160.

Figure 3.6. Block size effect on performance for 1 node

The next test conducted used a single block size, 160, and left the other options as they were for the previous test. The HPL.dat file used for this test is shown in Appendix B.3.4. The test was rerun using the command:

redboots@apollo> mpirun -np 1 xhpl

The results of the second test using 160 as the block size are shown in Appendix B.3.5. From the results, the most noticeable parameter that affects performance is NDIVs, or the number of subpanels that are created during the recursive factorization. For NDIVs equal to 2, the average is 1.891 Gflops; for NDIVs equal to 3, the average is 1.861 Gflops; and for NDIVs equal to 4, the average is 1.864 Gflops. The next test involved setting NDIVs to 2 and rerunning with the other parameters unchanged. From the data, the parameter that affects performance the most for this run is the value of NBMINs. An NBMINs value of 4 returns the best results with an average of 1.899 Gflops, compared to an average of 1.879 Gflops for NBMINs equal to 1, 1.891 Gflops for NBMINs equal to 2, and 1.892 Gflops for NBMINs equal to 8. The remaining parameters that can be changed are the algorithms used for the panel factorization and recursive panel factorization. For this test, NBMINs was set to 4 and the test was rerun as before.

redboots@apollo> mpirun -np 1 xhpl

From the results, any of the algorithms for panel factorization and recursive panel factorization produced very similar performance. Since the three factorization algorithms produced similar results, the final test using a large problem size will use Crout's algorithm for both factorizations, mainly because the algorithm implemented in SPOOLES for factorization is Crout's. The final test involving one processor will use the optimum parameters determined above and also the maximum problem size allowed by system memory. Checking of the solution will also be enabled by changing the threshold value, line 13, to a positive 16. To calculate the maximum problem size, Eqs. 3.4 and 3.5 will be used.

number of elements = 1,000,000,000 bytes × (1 element / 8 bytes) = 125,000,000

matrix size = √125000000

matrix size = 11180

A matrix of size 11180 x 11180 is the maximum that can fit into system memory. To ensure that slow virtual memory is not used, a matrix of size 10000 x 10000 will be used for the final test. The HPL.dat file used for this test is found in Appendix B.3.6. Using the ATLAS BLAS routines, apollo achieves a maximum of 2.840 Gflops. The theoretical peak performance of a 2.4 GHz Pentium 4 is calculated as follows. The processors we are using include SSE2 instructions, which allow 2 floating point operations per CPU cycle. The theoretical peak performance is then calculated by multiplying 2 floating point operations per CPU cycle by the processor frequency, 2.4 GHz, Eq. 3.6.

Theoretical Peak = 2 FP ops/CPU cycle × 2.4 GHz   (3.6)

The 2.840 Gflops reported by xhpl using the ATLAS BLAS routines is approximately 59 percent of the theoretical peak, 4.8 Gflops, of the processor. The results of the tests using the ATLAS BLAS routines are shown in Table 3.2. Table 3.2 lists problem size, block size, NDIVs, NBMINs, PFACTs, RFACTs, the average Gflops for all tested options during each run, and the average Gflops percentage of the theoretical maximum 4.8 Gflops.

Table 3.2. ATLAS BLAS routine results
Ns      Block sizes (NBs)       NDIVs   NBMINs    PFACTs   RFACTs   Gflops   % T. Max
1000    32 64 96 128 160 192    2 3 4   1 2 4 8   L C R    L C R    1.626    33.88
1000    160                     2 3 4   1 2 4 8   L C R    L C R    1.872    39.00
1000    160                     2       1 2 4 8   L C R    L C R    1.890    39.38
1000    160                     2       4         L C R    L C R    1.893    39.44
10000   160                     2       4         C        C        2.840    59.17

3.3.10 Goto’s BLAS Routines

Next, Goto's BLAS routines will be tested. First, compile xhpl using Goto's BLAS routines as shown above. Edit the file Make.Linux_P4 so that the following lines are uncommented

LAdir = /home/apollo/hda8/goto_blas
LAlib = $(LAdir)/libgoto_p4_512-r0.96.so $(LAdir)/xerbla.o

and the following line is commented

#HPL_OPTS = -DHPL_CALL_CBLAS

The tests will be carried out in the same manner as the ATLAS routines were tested. First, all options will be tested. The options with the most noticeable influence on performance will be selected and the tests rerun until a final set of optimized parameters is obtained. The results for the tests using Goto's BLAS routines are shown in Table 3.3. The results clearly show that Goto's BLAS routines perform much better than the ATLAS routines, around a 29.6% increase.

Table 3.3. Goto's BLAS routine results
Ns      Block sizes (NBs)       NDIVs   NBMINs    PFACTs   RFACTs   Gflops   % T. Max
1000    32 64 96 128 160 192    2 3 4   1 2 4 8   L C R    L C R    2.141    44.60
1000    128                     2 3 4   1 2 4 8   L C R    L C R    2.265    47.19
1000    128                     2       1 2 4 8   L C R    L C R    2.284    47.58
1000    128                     2       8         L C R    L C R    2.287    47.65
10000   128                     2       8         L        C        3.681    76.69

3.4 HPL–Multiple Node Tests

For this section, multiple nodes will be used. Knowing apollo's performance, running xhpl on more than one node will allow us to see the scalability of our cluster. Ideally, a cluster's performance should scale linearly with the number of nodes present. For example, apollo achieved a maximum of 3.681 Gflops using Goto's BLAS routines; however, because of network-induced overhead, adding another identical node to the cluster will not increase the performance to 7.362 Gflops. By determining how scalable a cluster is, it is possible to select the appropriate cluster configuration for a particular problem size.
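Scalability is usually quantified as speedup and parallel efficiency. The short sketch below (our own bookkeeping, not part of HPL) computes both from the measured Gflops of a single node and of a p-node run; the 5.364 Gflops figure is the best two-node result reported later in Table 3.4:

#include <stdio.h>

/* Sketch: speedup and parallel efficiency relative to a single node,
 * computed from measured Gflops rates.                               */
static void report(int p, double gflops_1, double gflops_p)
{
    double speedup    = gflops_p / gflops_1;
    double efficiency = speedup / p;
    printf("%d nodes: speedup %.2f, efficiency %.2f\n",
           p, speedup, efficiency);
}

int main(void)
{
    report(2, 3.681, 7.362);   /* ideal scaling: efficiency 1.00      */
    report(2, 3.681, 5.364);   /* best measured two-node HPL run      */
    return 0;
}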

3.4.1 Two Processor Tests

For tests using more than one node, the initial matrix size tested will be increased from that used in the single node runs. The reasoning behind this is illustrated in the tests of MPICH after it was installed. When running cpi, a single processor performed better than a multi-node run for small problem sizes. With a small problem size and multiple nodes, the network adds so much overhead that it would be difficult to accurately measure the cluster's performance. For small problem sizes, the full potential of a processor is not being used; instead, time is wasted on interprocessor communication. For testing multiple nodes, two options are added to those tested for a single node: the process grid and the panel broadcast algorithm. If these options were added to the options tested for a single node, the total number of initial tests would be almost eight thousand. Instead of running all tests at once, the process grid layout and panel broadcast algorithms will be tested separately. Once it is determined which process grid layout and panel broadcast parameters perform the best for a general problem regardless of the other options, the same testing methodology applied to a single node can be applied to multiple nodes.

3.4.2 Process Grid

The first test for multiple nodes will determine which process grid layout achieves the best performance. A grid is simply a splitting of the work for matrix computations into blocks and these blocks are then distributed among processors. A block matrix is a submatrix of the original matrix. This is more easily illustrated by Eq. 3.7.

[ A11 A12 ]   [ L11  0  ] [ U11 U12 ]
[ A21 A22 ] = [ L21 L22 ] [  0  U22 ]        (3.7)

A is divided into four submatrices, A11, A12, A21, and A22, of block size NBs, i.e., A11 is of size NBs x NBs. After LU factorization, A is represented by the matrices on the right hand side of Eq. 3.7. By using blocked matrices, Level 3 BLAS can be employed, which are much more efficient than Level 1 and Level 2 routines [27]. This efficiency can be attributed to blocked submatrices fitting into a processor's high speed cache. Level 3 BLAS allows for the efficient use of today's processors' hierarchical memory: shared memory, cache, and registers [28]. Level 1 BLAS operates on only one or two columns, or vectors, of a matrix at a time, Level 2 BLAS performs matrix-vector operations, and Level 3 BLAS handles matrix-matrix operations [27][29]. There are several different block partitioning schemes: one-, two-, and three-dimensional. HPL employs a two-dimensional block-cyclic P-by-Q grid of processes so that good load balance and scalability are achieved [24]. This layout is more easily visualized from Figure 3.7. For argument's sake, let Figure 3.7 illustrate a 36x36 matrix; 1, 2, 3, and 4 are the processor ids arranged in a 2x2 grid, which would equate to a block size, NBs, of six. This blocked cyclic layout has been shown to possess good scalability properties [28].

Figure 3.7. 2D block-cyclic layout
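To make the block-cyclic mapping of Figure 3.7 concrete, the sketch below (our own illustration, not HPL's internal code) prints which process owns each block of a matrix of nblk x nblk blocks on a P x Q grid: block (I, J) belongs to the process in grid position (I mod P, J mod Q). With nblk = 6 and a 2 x 2 grid it reproduces a pattern like the one in the figure; the 1-to-4 labeling is one possible numbering of the four processes.

#include <stdio.h>

int main(void)
{
    const int P = 2, Q = 2, nblk = 6;
    int I, J;

    for (I = 0; I < nblk; I++) {
        for (J = 0; J < nblk; J++) {
            int p  = I % P;            /* process row                 */
            int q  = J % Q;            /* process column              */
            int id = p * Q + q + 1;    /* one possible 1..4 labeling  */
            printf("%d ", id);
        }
        printf("\n");
    }
    return 0;
}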

The first test for a multi-computer benchmark will determine which grid layout achieves the best results. A problem size, Ns, of 3500 and two process grid layouts, 1x2 and 2x1, were tested first. The remaining options do not matter for this test; it is only being used to determine which grid layout returns the best relative result. Also, Goto's BLAS library will be used for the remaining tests because it performs the best on our architecture. The HPL.dat file used for the first multi-computer test is seen in Appendix B.3.7. To run a multi-computer test, an additional node needs to be specified in the arguments to mpirun. To run on two nodes, the following command is entered.

redboots@apollo> mpirun -np 2 xhpl

The 2 after -np specifies the number of nodes the test will be run on. The nodes are listed in the file machines.Linux located in /home/apollo/hda8/mpich-1.2.5.2/util/machines/. The two nodes used will be the first two listed in this file, apollo and euclid. From the results in Appendix B.3.8, it is clear that a "flat" grid, 1x2, performs much better than the 2x1 layout. For the remaining tests, a 1x2 process grid layout will be used. The next test involves selecting which panel broadcast algorithm performs the best. The HPL.dat file used for this test is seen in Appendix B.3.9. For this test, all 6 panel broadcast algorithms will be tested. The increasing-2-ring algorithm performed the best for this test.

The next test will use the increasing-2-ring algorithm. The option that returned the best results for this run was the block size. As seen in Figure 3.8, a block size of 128 clearly performed better than the others.

Figure 3.8. Block size effect on performance for 2 nodes

The next run will use a block size of 128 and test the remaining options. From the results, an NDIVs of 2 returned the best performance. Next, NDIVs will be set to 2 and the test rerun with the command:

redboots@apollo> mpirun -np 2 xhpl

The following results were obtained after this run. NBMINs affected performance the most, though only slightly: an NBMINs of 1 had an average of 3.0304 Gflops, 2 had an average of 3.0308 Gflops, 4 had an average of 3.0312 Gflops, and 8 had an average of 3.0306 Gflops. The final test will determine the maximum performance of the two nodes by using the largest matrix size that will fit into memory. Using Eqs. 3.4 and 3.5 and a maximum system memory of 2 GB, the maximum matrix size is around 15800x15800.

To account for the memory needed to run the operating system, a matrix of size 14500x14500 will be used. The HPL.dat file used for this test is in Appendix B.3.10. This run achieved 5.085 Gflops. Out of curiosity, different matrix sizes were also tested. The results of these tests are as follows: 13500x13500 achieved 5.245 Gflops, 12500x12500 achieved 5.364 Gflops, and 11500x11500 achieved 5.222 Gflops. The results are summarized in Table 3.4.

Table 3.4. Goto's BLAS routine results–2 processors (tuning runs at Ns = 3500 on the 2x1 and 1x2 grids ranged from 2.585 to 3.029 Gflops; the large runs on the 1x2 grid with a block size of 128 achieved 5.222 Gflops at Ns = 11500, 5.364 at 12500, 5.245 at 13500, and 5.085 at 14500, up to 55.88 percent of the two-node theoretical peak)

3.4.3 Three Processor Tests

This section will go through the steps of testing three processors and discuss the results. The steps for testing three processors are the same as for testing two: first the best grid layout is determined, then the best broadcast algorithm, and finally the remaining variables that return the best results are found. The first test determined the grid layout. As with the two processor test, a "flat" grid of 1x3 performed the best, as shown in the results. For broadcasting the panels to the other nodes, the modified increasing-ring algorithm performed the best. Next, all the options were tested. As in previous tests, the block size affected performance the most. For the three processor test, the block size that achieved the overall best performance was 96. The results are shown in Figure 3.9.

Figure 3.9. Block size effect on performance for 3 nodes

The next test used a block size of 96 and tested the remaining variables. The results from this test were inconclusive as to which parameter affects performance the most. This test was run several times with similar results. The highest recorded performance for the test was 2.853 Gflops, which occurred four times. Since no conclusion could be made from the results, the remaining variables were chosen to be Crout's algorithm for both the panel factorization and the recursive panel factorization, an NBMINs of 4, and an NDIVs of 2. The final test will determine the maximum performance of our three node cluster. To determine the maximum matrix size to use, Eqs. 3.4 and 3.5 will be used. For three gigabytes of total system memory, and accounting for operating system memory requirements, a maximum matrix size of 17800x17800 will be used. As with the two processor tests, different matrix sizes will be tested to see which achieves maximum performance.

Table 3.5. Goto's BLAS routine results–3 processors (tuning runs at Ns = 3500 ranged from 2.203 to 2.847 Gflops; the large runs on the 1x3 grid with a block size of 96 ranged from 6.666 to a maximum of 7.013 Gflops for problem sizes between 15800 and 19000)

From Table 3.5, the three node cluster achieved a maximum of 7.013 Gflops. Using Equation 3.6, the three node cluster has a theoretical peak of 14.4 Gflops but only achieved around fifty percent of that.

3.4.4 Four Processor Tests

The method of testing four processors is the same as in the previous tests. The first test will determine which grid layout delivers the best results. Besides the obvious layouts of 1x4 and 4x1, a 4 processor test can also use a 2x2 grid layout. From the results, a 2x2 grid clearly outperformed the other layouts. From Figure 3.10, a block size of 96 overall performed much better than the others. Several tests using a block size of 64 came close to that of 96, but the average Gflops was much lower. To determine the maximum matrix size to be tested, Eqs. 3.4 and 3.5 are used.

Figure 3.10. Block size effect on performance for 4 nodes

The maximum matrix size that can fit into RAM is around 21500 x 21500. Results for the four processor tests are summarized in Table 3.6.

Table 3.6. Goto's BLAS routine results–4 processors (tuning runs at Ns = 4000 on the 1x4, 4x1, and 2x2 grids ranged from 2.546 to 3.272 Gflops; the large runs on the 2x2 grid with a block size of 96 ranged from 6.939 to a maximum of 8.717 Gflops for problem sizes between 21000 and 21700)

3.4.5 Conclusions

Several conclusions can be made from the benchmarking tests. First, from the HPL test results, there is a strong correlation between software optimized for a particular processor architecture and its performance. Clearly, the BLAS routines provided by Kazushige Goto, with their optimizations for the processor's cache, outperform the optimized ATLAS routines by as much as 29%. While no changes will be made to the algorithm used by SPOOLES to solve linear equations, there is always room for improvement in computational software; the ATLAS routines were long considered the best BLAS library until Goto's routines made further improvements. Figure 3.11 illustrates another important point. Depending on the problem type, the actual maximum performance of a cluster compared to its theoretical performance generally decreases as the number of processors increases [6]. As the number of processors increases, so does the time spent by each compute node intercommunicating and sharing data. If a cluster is only able to reach a small percentage of its theoretical peak, there may be an underlying problem with the network or compute node setup. If the cluster is used to solve communication intensive problems, the network may require Gigabit Ethernet or a proprietary network that is designed for low latency and high bandwidth. On the other hand, if the problems are computationally intensive, increasing the RAM or processor speed are two possible solutions. Although the performance ratio decreases as the number of nodes increases, the advantage of using more nodes is that the maximum problem size that can be solved increases. A problem size of 12000x12000 was tested on one processor and took around two and a half hours, with the virtual memory being used extensively. Using four processors, the same problem size was solved in less than three minutes with no virtual memory being used. Although the solve efficiency of multiple processors is less than that of one processor, by spreading the work among compute nodes, large problems can be solved, if not efficiently, then in a much more reasonable time frame.

Figure 3.11. Decrease in maximum performance

The block size, NB, is used by HPL to control the data distribution and computational granularity. If the block size becomes too small, the number of messages passed between compute nodes increases, but if the block size is too large, messages will not be passed efficiently. When just apollo was benchmarked with HPL, the best results were obtained with a block size of 128, but as the number of nodes increases, so does the importance of passing data between nodes. With multiple nodes, a block size of 96 allowed the blocked matrix-multiply routines in HPL to return the best results. Depending on the problem type and network hardware, parallel programs will perform strikingly differently. For example, when running cpi for the largest problem tested, one node took 53 seconds to calculate a solution. When running two compute nodes, it took around 27 seconds, while four nodes only decreased the run-time to around 20 seconds. Running two nodes nearly increased the performance two-fold, but four nodes did not achieve the same relative gain. When running a test problem under HPL, one compute node completed a run in three minutes for a problem size of 10000x10000; two compute nodes completed the same test run in two minutes and fifteen seconds, while four nodes took one minute forty-nine seconds. For computationally intensive problems such as those solving linear systems of equations, a fast network is critical to achieving good scalability. For problems that just divide up the work and send it to the compute nodes without a lot of interprocessor communication, like cpi, a fast network is not as important to achieving acceptable speedup.

CHAPTER 4
CALCULIX

CalculiX [30] is an Open Source software package that provides the tools to create two-dimensional and three-dimensional geometry, create a mesh from the geometry, apply boundary conditions and loadings, and then solve the problem. CalculiX is free software and can be redistributed and/or modified under the terms of the GNU General Public License [31]. CalculiX was developed by a team at MTU AeroEngines in their spare time, and the developers were granted permission to publish their work.

4.1 Installation of CalculiX GraphiX

There are two separate programs that make up CalculiX: cgx (CalculiX GraphiX) and ccx (CalculiX CrunchiX). cgx is the graphical pre-processor that creates the geometry and finite-element mesh, and the post-processor that displays the results. ccx is the solver that calculates the displacements or temperatures of the nodes. First go to http://www.dhondt.de/ and scroll to near the bottom of the page. Select the link under Available downloads for the graphical interface (CalculiX GraphiX: cgx) that reads "a statically linked Linux binary". Save the file to a folder, in our case /home/apollo/hda8. Unzip the file by typing:

redboots@apollo> gunzip cgx_1.1.exe.tar.gz
redboots@apollo> tar -xvf cgx_1.1.exe.tar

This will create the file cgx_1.1.exe. Rename the file cgx_1.1.exe to cgx, become root, move the file to /usr/local/bin, and make it executable.

redboots@apollo> mv cgx_1.1.exe cgx
redboots@apollo> su


Password:
root@apollo> mv cgx /usr/local/bin
root@apollo> chmod a+rx /usr/local/bin/cgx

To view a list of commands that cgx accepts, type cgx at a command prompt.

redboots@apollo> cgx

------CALCULIX - GRAPHICAL INTERFACE - Version 1.200000

A 3-dimensional pre- and post-processor for finite elements Copyright (C) 1996, 2002 Klaus Wittig

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; version 2 of the License.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. ------

usage: cgx [-b|-g|-c|-duns2d|-duns3d] filename

-b       build-mode, geometry file must be provided
-c       read an solver input file (ccx)
-duns2d  read duns result files (2D)
-duns3d  read duns result files (3D)
-g       use element-group-numbers from the result-file

4.2 Installation of CalculiX CrunchiX

First go to http://www.dhondt.de/ and scroll to near the bottom of the page. Select the link under Available downloads for the solver (CalculiX CrunchiX: ccx) that reads "the source code". Save this file to a folder, in our case /home/apollo/hda8, and unzip.

redboots@apollo> gunzip ccx_1.1.src.tar.gz
redboots@apollo> tar -xvf ccx_1.1.src.tar

This creates the folder CalculiX with the source code inside CalculiX/ccx_1.1/src. In order to build the serial solver for CalculiX, SPOOLES and ARPACK need to be installed. ARPACK is a collection of Fortran 77 subroutines designed to solve large-scale eigenvalue problems [32], while SPOOLES provides the solver for sparse linear systems of equations [33].

4.2.1 ARPACK Installation

To download and install the ARnoldi PACKage (ARPACK), first go to the homepage at http://www.caam.rice.edu/software/ARPACK/ and click the link for Download Software. Download the zipped file arpack96.tar.gz to a directory, in our case /home/apollo/hda8/, and unpack.

redboots@apollo> gunzip arpack96.tar.gz
redboots@apollo> tar -xvf arpack96.tar

This will create the folder ARPACK. Enclosed in ARPACK are the source code, located in SRC, examples, documentation, and several Makefiles for different architectures located in ARMAKES. Copy the file $ARPACK/ARMAKES/ARmake.SUN4 to the main ARPACK directory and rename it to ARmake.inc.

redboots@apollo> cp ARMAKES/ARmake.SUN4 ARmake.inc

Edit ARmake.inc so that it points to the main ARPACK directory and the correct Fortran compiler, and specifies Linux as the platform. The ARmake.inc file used for our system is located in Appendix C.1. Finally, while in the main ARPACK directory, make the library.

redboots@apollo> make lib

This will create the ARPACK library file libarpack_Linux.a in the current directory.
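For reference, the ARmake.inc settings described above look roughly like the following. This is only a sketch: the variable names are those found in the stock ARmake.inc, but the home path, compiler, and flags shown here are assumptions for our layout; the complete file is reproduced in Appendix C.1.

home      = /home/apollo/hda8/ARPACK
PLAT      = Linux
FC        = g77
FFLAGS    = -O
ARPACKLIB = $(home)/libarpack_$(PLAT).a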

4.2.2 SPOOLES Installation

To install the SParse Object Oriented Linear Equations Solver (SPOOLES), first go to the project's website at http://www.netlib.org/linalg/spooles/spooles.2.2.html. Near the middle of the webpage there are links for documentation and software downloads. Download the file spooles.2.2.tgz, place it in the folder SPOOLES.2.2, and unpack. The main directory for SPOOLES is located in /home/apollo/hda8 for our installation.

redboots@apollo> cd /home/apollo/hda8
redboots@apollo> mkdir SPOOLES.2.2
redboots@apollo> mv spooles.2.2.tgz SPOOLES.2.2
redboots@apollo> cd SPOOLES.2.2
redboots@apollo> tar -xvzf spooles.2.2.tgz

This will unpack spooles.2.2.tgz in the folder SPOOLES.2.2. First, edit the file Make.inc so that it points to the correct compiler, MPICH install location, MPICH libraries, and include directory. The MPICH options are only for building the parallel solver, which will be discussed later. To create the library, while in the directory /home/apollo/hda8/SPOOLES.2.2, type:

redboots@apollo> make lib

This will create the library spooles.a in $SPOOLES.2.2/.

4.2.3 Compile CalculiX CrunchiX

After ARPACK and SPOOLES are installed, enter the CalculiX CrunchiX source directory.

redboots@apollo> cd /home/apollo/hda8/CalculiX/ccx_1.1/src

Edit the file Makefile so that it points to the ARPACK and SPOOLES main directories and to their compiled libraries, and also to the correct C and Fortran 77 compilers. The Makefile used for our setup is seen in Appendix C.2. After the Makefile has been edited, type:

redboots@apollo> make

This will compile the solver for CalculiX. Now copy the solver, ccx_1.1, to /usr/local/bin and make it executable and readable by users.

redboots@apollo> su
Password:
root@apollo> cp ccx_1.1 /usr/local/bin
root@apollo> chmod a+rx /usr/local/bin/ccx_1.1

4.3 Geometric Capabilities

This section briefly describes the steps from creating a model to viewing the results of a finite-element analysis. The first step is to create a model. For two-dimensional geometries, points, lines, and surfaces are defined; for three-dimensional geometries, points, lines, surfaces, and also bodies are defined. The easiest way to create a point in CalculiX is by defining its location in three-dimensional space. Each point can be assigned a name, or you can use the wildcard character ! and let CalculiX name the point for you. After points, the next geometric entities that are created are lines. Lines can be straight, an arc, or a spline. To create a straight line, 2 points are selected. For an arc, a beginning and end point are selected along with the arc's center point. For a spline, multiple points are defined and then each point is selected to create the spline. Surfaces are defined by selecting 3 to 5 lines. By using the command qsur, the

mouse can be used to select lines and generate surfaces. Surfaces may be flat or curved depending on the defining boundary lines. Bodies are defined by 5 to 7 surfaces. After a surface is created, it can be copied and translated, or swept along a trajectory, to create a body. Also, bodies can be created by creating all the necessary surfaces and using the command qbod to select them all and create the desired body.

4.4 Pre-processing

CalculiX allows for the creation of fairly complex models. Points, lines, surfaces, and bodies are created, in that order, either by typing them or by using the mouse for selection through the interface. Another option, since the file format is straightforward, is to simply write the geometry file directly in your favorite text editor. If a point is modified, any line that is partially defined by that point will also be modified, along with all related surfaces and bodies. This also holds true for modifying lines, surfaces, and bodies and their associated geometric entities. To begin creating geometry with CalculiX, issue the command:

redboots@apollo> cgx -b all.fbd

where all.fbd is the geometry file being created. This brings up the main screen seen in Figure 4.1. Creation of geometric entities can be performed in a multitude of ways, as described in the following.

4.4.1 Points

Points are created by entering their locations in three-dimensional space or by the splitting of lines. To create a point, simply type the following while the mouse cursor is within the CalculiX window:

Figure 4.1. Opening screen

pnt p1 0.5 0.2 10

This creates a point named p1 at 0.5 in the x, 0.2 in the y, and 10 in the z-direction. If the user wants CalculiX to assign names to points, the name p1 is replaced with !. Instead of using the CalculiX display, the following can be entered in the file all.fbd to create the same point.

PNT p1 0.50 0.20 10.0

In Figure 4.2 p1 is plotted with its name.

4.4.2 Lines

Lines are created by creating points and then selecting these points, or by entering a command. To create a line by selection, at least 2 points must exist. First, enter the command:

qlin

A little selection box appears. When one of the points is within the selection box, press b for begin; then, when the selection box is over the second point, press l for line. b is always

Figure 4.2. p1 with label

the first key pressed when creating a line and l is the last. If an arc is desired, press b over the first point, c for center over the second, and l over the third to create the arc. If a spline is desired, press s over all the defining points between pressing b and l. Figure 4.3 shows a spline passing through 4 points.

Figure 4.3. Spline

4.4.3 Surfaces

Surfaces can be created by using the command gsur or by using the mouse, but are limited to 3 to 5 sides. To create a surface using the mouse, the user enters the command qsur, placing a selection box over a line and pressing 1 for the first line, 2 for the second, and so on until up to 5 sides are selected. Figure 4.4 shows a surface created from 3 straight lines and the spline.

Figure 4.4. Surface

4.4.4 Bodies

The final geometric entities that can be created with CalculiX are bodies. Bodies are defined by selecting five to seven surfaces, using either selection by mouse or command line input. Another method of creating bodies is by sweeping a surface along an axis or rotating a surface about a center line. Figure 4.5 shows a body created by sweeping the surface created in Figure 4.4 along the vector 2, 10, 3.

Figure 4.5. Body created by sweeping

4.5 Finite-Element Mesh Creation

This section describes the finite-element mesh creation capabilities of CalculiX. CalculiX is able to create and handle several two-dimensional and three-dimensional elements. Two-dimensional elements that CalculiX can create are:

2 node beam element
3 node beam element
3 node triangular element
6 node triangular element
4 node shell element
8 node shell element

Three-dimensional elements that CalculiX can create are:

8 node brick element

20 node brick element

Although CalculiX cannot create other elements at this time, the finite-element solver is able to handle them and CalculiX can then post-process the results. The three-dimensional elements that can be solved but not created are:

4 node tetrahedral element

10 node tetrahedral element

6 node wedge element

15 node wedge element

After the geometry is created, CalculiX allows the user to create a finite-element mesh. To create a mesh, the user specifies the number of elements along an edge and then, to use 8-node brick elements, issues the command:

elty all he8

Twenty-node brick elements can be created by simply replacing he8 with he20. Next, to create the mesh, type:

mesh all

CHAPTER 5
CREATING GEOMETRY WITH CALCULIX

In this chapter, a tutorial is given on creating a three-dimensional part, applying a finite element mesh and boundary conditions, and solving the problem. The various methods of creating and modifying geometric entities will be covered, along with CalculiX's usefulness as a finite element pre- and post-processor.

5.1 CalculiX Geometry Generation

The part that we will be creating is shown in Figure 5.1.

Figure 5.1. Final part

We will be using many different CalculiX commands to create points, lines, surfaces, and bodies, and to modify these entities. It is a relatively simple part, but it will help illustrate the flexibility CalculiX gives the user when creating and editing parts, and also its finite element capabilities.


5.1.1 Creating Points

The first feature that will be created is the handle section. To begin, first create points that define the boundary of the cross section.

pnt p1 -0.93181 -0.185 0.37
pnt p2 -0.93181 -0.185 0.87
pnt p3 -0.93181 -0.185 1.37
pnt p4 -1.4 -0.185 0.37
pnt p5 -1.4 -0.185 0.62
pnt p6 -1.4 -0.185 0.87
pnt p7 -1.4 -0.185 1.12
pnt p8 -1.4 -0.185 1.37
pnt p9 -1.9 -0.185 0.87
pnt p10 -1.65 -0.185 0.87
pnt p11 -1.15 -0.185 0.87
pnt p12 -1.22322 -0.185 1.04678
pnt p13 -1.22322 -0.185 0.69322
pnt p14 0 0 0.37
pnt p15 0 0 0.87
pnt p16 0 0 1.37

Now type:

plot pa all

where p stands for point, a plots the name of the point, and all plots all the points. If the points are hard to see, you may move them by pressing and holding the right mouse button and moving the mouse. If you wish to rotate the points, press and hold the left mouse button, then move the mouse. By pressing and holding the middle mouse button and moving it, you can zoom in or out on entities in the screen. If a mistake was made in typing the coordinates, you may delete points using the qdel command.

qdel

After you type the qdel command press . A tiny square will appear around your mouse pointer. To make this selection box bigger, move your mouse halfway between the upper left corner and the center of the CalculiX screen and press r. Now move your mouse about halfway between the lower right corner and the center and press r again. This should make the selection box bigger. If you wish to change the size and shape of the selection box, just type r in one position, move your mouse to another position, and type r again. Anything that falls within the box can be selected. If you want to select multiple points press:

a

This brings the selection box into mode:a. To select a point, enclose a point in the selection box and press:

p where p stands for point. Other entities that may be selected are lines, l, surfaces, s, and bodies, b. To select multiple points, enclose several points within the selection box and press p. If you wish to go back to selecting only one point at a time, press:

i

Now that all the points are created, your screen should look similar to Figure 5.2. The next step is to copy these points and translate them 0.37 in the positive y-direction. To copy points, add the points to a set, then use the copy command to copy and translate that set. To create a set, type the following:

qadd set1

which adds the set set1. Go to the left side of the CalculiX screen and press the left mouse button. Select Orientation and then +y view. This will orient the screen so that you are looking in the positive y-direction. Notice that a selection box appears at the tip of the

Figure 5.2. Creating points

Figure 5.3. Selection box

mouse pointer. Now make the selection box bigger by using r to resize it. Make the box around the same size as shown in Figure 5.3.

Now press:

a

to enter into multiple selection mode. Then press:

p

to add all points within the box to set1. Now press:

q

to exit from the selection mode. Make sure points fourteen through sixteen are not selected. Now that the points are added to the set, we can copy and translate them.

copy set1 set2 tra 0 0.37 0

5.1.2 Creating Lines

Now create lines that connect these points by using the qlin command. The qlin command works by either selecting the beginning and end points to create a line or by selecting the two end points and the center to create an arc. First, type:

qlin

Notice that there appears a selection box at the tip of the mouse pointer. Make this box a little bigger so that it can select only one point. Place the box over p1 and press:

b

for begin. Place the selection box over p4 and press:

l

for line. Press:

q 79

Figure 5.4. Creating lines

to quit. Now plot the line that was created. A line should be plotted as shown in Figure 5.4. To create an arc, type qlin. Make the selection box a little bigger. Select p1 by putting the selection box over the point and pressing:

b

Put the selection box over p14 and press:

c

for center. Finally, place the selection box over p001 and press:

l

to create an arc as shown in Figure 5.5. Now create the remaining lines so that your screen looks like Figure 5.6. If an incorrect line was created, you may delete it using the qdel command as described above.

Figure 5.5. Creating lines

5.1.3 Creating Surfaces

The next step is to create surfaces which are made up of four lines. To create a surface, the command qsur and a selection box will be used to select lines that belong to the surface. qsur works by placing the selection box over a line, pressing the key 1, placing the selection box over another line, pressing the key 2, and repeating until all the lines that make up the surface are selected. First plot all lines with their labels:

plot la all

Now type:

qsur

A selection box appears at the tip of the mouse pointer. Make the selection box a little bigger and place it over line L003 as shown in Figure 5.6. Press:

1

Figure 5.6. Creating surfaces

and the line will turn red. Put the selection box over line L005 and press:

2

Now put the selection box over line L00K and press:

3

Finally, put the selection box over line L004 and press:

4

Now press:

g

to generate the surface. Press:

q 82

Figure 5.7. Creating surface A001

to end the command qsur. If a line is hard to select, rotate or zoom the screen so that the line is easier to pick. Now plot the surface that was just created with plus sa all. The screen should look similar to Figure 5.7 with the surface A001 plotted. Now create the surface that is bounded by lines L00X, L00G, L001, and L010 in the same manner as the first surface. The screen should now look like Figure 5.8 after the commands:

plot l all
plus sa all

5.1.4 Creating Bodies

The next step is to create a body out of surfaces. To create a body, the command qbod is used. qbod requires 5 to 7 surfaces to define a body, or exactly 2. For this example we will be using only 2 surfaces, connected by single lines, to create a body. If the other method were used, we would have to create 6 surfaces by selecting the 4 bounding lines for each, then select the six surfaces to create the body, which would be a longer, more

Figure 5.8. Creating surface A002

tedious approach. To create a body, first type:

qbod

Make the selection box a little bigger so that selecting a surface is easier. Place the selection box over surface A001 as shown in Figure 5.9. Press:

s to select the surface. Now place the selection box over surface A002 and press:

s

Now press:

g

to generate the body and remaining surfaces. Your screen should look similar to Figure 5.10 after typing the following commands:

Figure 5.9. Creating bodies

plot p all
plus l all
plus sa all
plus ba all

Figure 5.10. Plotting bodies

Now create the remaining surfaces and bodies so that your screen looks similar to Figure 5.11.

Figure 5.11. Creating the handle

5.1.5 Creating the Cylinder

The next feature that will be created is the cylinder. First, create points that are on the cylinder boundary.

pnt ! 0 0 0
pnt ! 0 -0.95 0
pnt ! 0 0.95 0
pnt ! 0 -0.625 0
pnt ! 0 0.625 0
pnt ! -0.93181 0.185 0
pnt ! -0.93181 -0.185 0
pnt ! -0.59699 -0.185 0
pnt ! -0.59699 0.185 0
pnt ! 0.71106 -0.63 0
pnt ! 0.71106 0.63 0
pnt ! 0.4678 0.41447 0
pnt ! 0.4678 -0.41447 0

Notice that instead of giving each point a different name, we used ! to have CalculiX generate a point number automatically. Your screen should look similar to Figure 5.12 after the commands:

plot s all plus pa all

Figure 5.12. Creating the cylinder points

Now create the lines that define the boundary of the cylinder using the qlin command as shown above. First, type the command:

plus l all

The lines should be created from the points as shown in Figure 5.13. The next step is to create the surfaces that are bounded by these lines. Use the qsur command and instead of placing the selection box over the name of each line and selecting it, place the selection box over a part of the line. As long as the selection box is placed over one line, the line will be selected. 87

Figure 5.13. Creating the cylinder lines

Make the selection box a little bigger and press numbers one through four for each line that the selection box is over. After the four lines are selected press g. This will generate the surface. After the surfaces have been generated, your screen should look similar to Figure 5.14 after the command:

plus sa all

The next step is to create the bodies that define the cylinder. First, add all surfaces belonging to the cylinder to a set called cylinder1. The name of the surfaces that appears in Figure 5.15 may be different from the names on your screen. This does not matter, just select the same surfaces. Type the command:

qadd cylinder1

A little square appears around the tip of the mouse pointer. To increase the size of the selection box, with your mouse pointer in the main screen, press r, move the mouse pointer a little lower and to the right, and press r again. Move your mouse such that the 88

Figure 5.14. Creating the cylinder surfaces

Figure 5.15. Cylinder surfaces

name of the surface you wish to select is within the selection box. Now press s to select the surface. The surface should be highlighted as shown in Figure 5.16. If you accidentally select the wrong surface, press q to quit qadd and then type the command del se setname, where setname is the name of the set that you added the wrong

Figure 5.16. Cylinder surfaces

surface to. Once the surface is added to a set, you can now create a body using the swep command:

swep cylinder1 cylinder2 tra 0 0 1.75

In this case set cylinder1 is swept to the set cylinder2 along the z-axis a length of 1.75. Again the set name cylinder2 is arbitrary. To view the bodies type the command:

plot b all

or

plot ba all

The first method plots the bodies while the second option plots the bodies with their labels. 90

5.1.6 Creating the Parallelepiped

The next feature to be created is the parallelepiped with a width of 1.26 and thickness of 0.25. First create the 8 points that define the boundary. The following points are on the cylinder:

pnt ! 0.711056 0.63 0.7
pnt ! 0.711056 -0.63 0.7
pnt ! 0.95 0 0.7
pnt ! 0 0 0.7

The following points define the circular cutout and the center of the circle.

pnt ! 3.5 0.63 0.7
pnt ! 3.5 -0.63 0.7
pnt ! 3.5 0 0.7
pnt ! 2.87 0 0.7

Now plot the points along with the surfaces with the following commands:

plot s all plus p all

The screen should look similar to Figure 5.17 Now create the lines that form the base of the parallelepiped. We will create the feature in two sections so that we can create the half circle. Use the command qlin to connect the points to form something similar to Figure 5.18. Use the following command to plot all lines.

plus l all

Figure 5.17. Creating points for parallelepiped

Figure 5.18. Creating lines for parallelepiped

Now create the bottom two surfaces. Use the qsur command and select four lines that belong to one section and generate that surface. Repeat this procedure for the remaining section. Plot the surfaces so that their labels appear and add the two surfaces to the set para1.

plot sa all

qadd para1

Now use the swep command to create a body using the sets para1 and para2.

swep para1 para2 tra 0 0 0.25

5.1.7 Creating Horse-shoe Section

The feature at the end of the part will be created next. To do this, use the pnt command to enter the following coordinates:

pnt ! 3.5 0 0.5
pnt ! 3.5 0.63 0.5
pnt ! 3.5 -0.63 0.5
pnt ! 2.87 0 0.5
pnt ! 4 0.63 0.5
pnt ! 4 -0.63 0.5
pnt ! 4 -0.25 0.5
pnt ! 4 0.25 0.5
pnt ! 3.5 0.25 0.5
pnt ! 3.5 -0.25 0.5
pnt ! 3.25 0 0.5

In this case we used the ! character to let CalculiX assign a name for each point. Next use the qlin command to connect these points so that the lines look similar to those in Figure 5.19. The next step is to create the bottom surface of the end feature. To do this, use the qsur command and pick the four lines that make up each surface. It is best to make the line labels visible. This helps in picking the lines when trying to create a surface. To view the line labels, type the command:

plus la all

Figure 5.19. Creating lines for horse-shoe section

Figure 5.20. Surfaces

Now plot all the surfaces and the screen should look similar to that in Figure 5.20. Now create the body by extruding the surface in the positive z-direction 0.75 units. First, add the surfaces to a set called Send1 and then use the swep command to create the body. 94

swep Send1 Send2 tra 0 0 0.75

The part should now look like that in Figure 5.21.

Figure 5.21. Creating body for horse-shoe section

5.1.8 Creating the Slanted Section

The next feature that will be created is the slanted section. First, create the following points.

pnt ! 0.94174 0.125 1.75
pnt ! 0.94174 -0.125 1.75
pnt ! 0.94174 -0.125 0.95
pnt ! 0.94174 0.125 0.95
pnt ! 2.88253 0.125 0.95
pnt ! 2.88253 -0.125 0.95
pnt ! 2.88253 -0.125 1.25
pnt ! 2.88253 0.125 1.25

Now create lines that connect these points to create the outline shown in Figure 5.22. Make sure you connect all eight points, creating twelve lines. Use the previously created center points to create the arcs that connect the points at the base of both circular

Figure 5.22. Creating lines for the slanted section

sections. It may be difficult to select some lines. To get around this problem, you may have to move, zoom, or rotate the part so that the line is easier to pick. The next step is to create the surfaces that are bounded by these lines. Use the qsur command and select the four lines for each surface. Plot the surfaces with the command:

plot s all

The part should look something like the one in Figure 5.23.

5.2 Creating a Solid Mesh

Now that the entire part is modeled, the next part of this tutorial is to create a finite-element mesh, apply boundary constraints and loads, and solve for the stresses and displacements. The solver part of CalculiX, CalculiX CrunchiX (ccx), reads in the input deck from a file with the extension .inp. The format of the input deck is similar to that of ABAQUS [30]. To create the input deck, files are exported through CalculiX that contain

Figure 5.23. Final part

the nodal and element data, loading data, and boundary conditions. The text files are then combined, material data is added, and the result is read by the solver. The first step is to create the mesh. If you type the command:

plot ld all

you will notice numbers on each line. These numbers are divisions that determine how many elements are along each line. These numbers may be changed to increase or decrease the number of divisions. One reason to increase the divisions is to refine a mesh around a critical area. Another reason to increase or decrease the divisions is to have the meshes of several bodies line up to each other as close as possible. One disadvantage of CalculiX is that it is not an auto-mesher. It does not merge bodies or meshes to create a single body. If a part has several bodies, it is up to the user to merge the nodes on the different bodies.

5.2.1 Changing Element Divisions

The next step is to create the mesh. To do this, first specify the type of element to be used in the mesh. For this example, eight-node brick elements are used. The command:

elty all he8

specifies that the entire mesh will be of eight-node brick elements. Another option is to use he20, which are twenty-node brick elements. Type the command:

mesh all

to mesh the part. Next type:

plot m all

Then move the pointer into the menu area, left click, select the Viewing menu, and then select Lines. The surface coloring disappears and the elements are shown in green lines. If you zoom in on the horse-shoe section of the part, and view it in the positive y-direction, you will see that the meshes do not line up as seen in Figure 5.24.

Figure 5.24. Unaligned meshes

Type the command plus n all and notice that the nodes are not close enough to each other so that we can merge them. To correct these problems, the line divisions must be changed for all lines. First, delete the mesh:

del me all

Next plot the line divisions:

plot ld all

Now enter the command:

qdiv

This command allows you to change the line division by selecting the line with your mouse and typing in a division number. Zoom in and rotate the handle section of the part so that it looks like Figure 5.25. Change all line divisions according to those displayed in

Figure 5.25. Changing line divisions

Figure 5.25. Notice that if you try to change the division of the lines near the cylinder, it changes the wrong line. To get around this, use the following method. First, type the command qdiv. Next, type a. This enters a mode where the selection box can pick multiple items at once instead of just one. Then create a selection box that includes all the numbers that are difficult to change along with the numbers that change by mistake, as shown in Figure 5.26.

Figure 5.26. Pick multiple division numbers

Next type in the number 9 while the selection box is in the position as shown in Figure 5.26. This should change the numbers within the box and possibly some surrounding numbers to 9 as shown in Figure 5.27. Next change the two lines on the cylinder to 32. One way to do this is to create a selection box and just pick part of the line away from all other lines as shown in Figure 5.28. After the handle divisions are changed, change the divisions of the cylinder to what is shown in Figure 5.29. Next, change the divisions of the parallelepiped and the feature attached above it to what is shown in Figure 5.30.

Figure 5.27. Change all numbers to 9

Figure 5.28. Select line away from label

The final section to change is the horse-shoe feature. First, go to the Orientation menu and choose the -z-orientation. Zoom in on the horse-shoe section and change the line divisions to what is shown in Figure 5.31. Finally, choose the positive y-orientation and change the divisions to what is shown in Figure 5.32.

Figure 5.29. Change cylinder divisions

Figure 5.30. Change parallelepiped divisions

Now that all the line divisions are changed, mesh the entire part again and view the elements and nodes:

mesh all

plot m all

Figure 5.31. Change horse-shoe section divisions

Figure 5.32. Change horse-shoe section divisions

plus n all

Select the Viewing menu and select Lines. Now zoom in on the handle section in the positive y-direction. The screen should look similar to Figure 5.33. 103

Figure 5.33. Improved element spacing

Now the two meshes line up much more closely than they did before, which makes it much easier to merge nodes.

5.2.2 Delete and Merge Nodes

The next step in creating the finite-element model is merging the nodes of the different features. The first nodes to be merged are those of the handle and cylinder. First, zoom in on the cylinder and handle in the +y-direction as seen in Figure 5.33. Now create a set called hcset1 for the handle-cylinder.

qadd hcset1

Enter into multiple selection mode:

a

Now create a box that encompasses the nodes in the handle-cylinder intersection as seen in Figure 5.34. Enter the letter:

n

Figure 5.34. First nodal set

to select all nodes within that box. In the terminal there should be listed all the nodes that were selected. Now press:

q to exit the selection routine. Now plot all the nodes that were selected in blue:

plot n hcset1 b

Now view the nodes in the negative z-direction. Your screen should look similar to Figure 5.35. You may have more or fewer nodes than are displayed in Figure 5.35. Now we may add another set that only includes the nodes which we want to merge.

qadd hcset2

Now create a smaller box and enter into the multiple selection routine with the command:

a

Figure 5.35. Selected nodes

Figure 5.36. Select more nodes

You should have something similar to that in Figure 5.36. Now select the node that appears in the boundary between the handle and the cylin- der. In Figure 5.37 you can see that I accidentally picked the wrong nodes near the bottom of the screen. To correct this, delete set hcset2, re-plot set hcset1 and reselect the correct nodes. 106

del se hcset2

plot n hcset1 b

Figure 5.37. Selected wrong nodes

Notice that the nodes in hcset1 appear in blue. Now repeat the above steps and select the correct nodes. After the correct nodes are selected, your screen should look similar to Figure 5.38. Now view the part in the positive y-direction. It can be seen that there is an extra row of nodes near the bottom of the part as shown in Figure 5.39. To correct this, enter the command:

qrem hcset2

Enter into the multiple selection mode:

a

Now create a box around the nodes which need to be removed from the set as shown in Figure 5.40. 107

Figure 5.38. Correct set of nodes

Figure 5.39. Selected extra nodes

Press n to remove the selected nodes. The nodes should disappear. Repeat the above steps if you need to remove additional nodes from the set. Now plot the nodes in hcset2.

plot n hcset2 b

Your screen should look similar to Figure 5.41. 108

Figure 5.40. Select nodes to delete

Figure 5.41. Final node set

The next step is to merge the nodes that are close together. This effectively joins the two features together to create one body. To merge the nodes we will use the merge command, which has the form merg n set-name gtol, where n specifies that nodes are being merged, set-name is the name of the set that contains the entities to be merged, and gtol is the maximum distance that two entities can be apart and still be considered equal.

Other options for entities are p for points, l for lines, and s for surfaces. To determine gtol, view the part in the positive y-direction. From the screen you should notice that the nodes are furthest apart near the bottom of the handle. To determine the distance between the nodes use the command:

qdis

When you enter qdis, a selection box appears. Make it large enough so that you can select one node. Put the box over one node, press the n key, go to the next node and hit the n key again. Look at the command screen and notice that the distances in the x, y, and z directions are given. Any two nodes selected at the bottom of the handle should have a z-distance of around 0.012812 apart. We will use a gtol of 0.013:

merg n hcset2 0.013

Notice that the shape of the handle deforms a little. This deformation will introduce error and the analysis results around the deformation will be inaccurate. Now the next step is to merge the slanted section and the parallelepiped to the cylinder. First, view the part in the positive y-direction with the nodes plotted. Add a set called cspset1 and select the nodes within the box as shown in Figure 5.42. Make sure you enter into the multiple selection mode by using the a command. After the nodes are selected enter the command:

plot n cspset1 b

to plot the nodes of set cspset1 in blue. The screen should look similar to Figure 5.43. Now view the part from the +z-direction. Add another set, cspset2, and select the nodes along the boundary between the cylinder, slanted section, and parallelepiped as shown in Figure 5.44. After the nodes along the boundaries are selected, plot them in blue with the command plot n cspset2 b.

Figure 5.42. Select nodes

Figure 5.43. Plot nodes

Notice that there are extra nodes contained within this set as shown in Figure 5.45. As shown above, use the qrem command to remove the extra nodes from the set so that the screen looks similar to Figure 5.46. Now determine the maximum distance up to which nodes will be merged. Notice that on the top surface of the parallelepiped, where it merges with the cylinder, the nodes

Figure 5.44. Select nodes

Figure 5.45. Select nodes

are the furthest apart. Now determine the maximum distance with the qdis command as used before. The distance is around 0.020312 units. Check other nodes to see if this is the largest. Now merge all the nodes that are a maximum of 0.0205 apart.

merg n cspset2 0.0205

Figure 5.46. Final node set

The next set of nodes to merge are those that connect the horizontal interface between the slanted section and the parallelepiped. First, view the part in the positive y-direction. Use the qadd command to add a set called spset1.

qadd spset1

Enter into the multiple selection mode:

a

And create a box that is similar to that in Figure 5.47. Select the nodes by pressing the n key. Now view the part from the negative z- direction. Next, remove all unnecessary nodes from the set by using the qrem command.

qrem spset1

Now select the extra nodes so that your screen looks similar to Figure 5.48. You can also remove additional nodes by viewing the part in the positive y-direction and removing nodes so that your screen looks similar to Figure 5.49. 113

Figure 5.47. Select nodes from the side

Figure 5.48. Good node set

The final step before merging is to determine the maximum distance between nodes. Use the qdis command and zoom in on the middle section of the parallelepiped as shown in Figure 5.50. In that section of the part you can determine that the maximum distance between any two nodes is around 0.041807 units. Now merge the nodes in spset1 so that the

Figure 5.49. Final node set

Figure 5.50. Determine node distance

maximum distance is 0.04181. It is recommended that you save your work up to this point in the frd file format.

send all frd

This will send the finite-element data that has been created so far. The nodal and element data will be written to the file all.frd. If you make a mistake in merging the nodes, you can exit without saving your work and reopen the file all.frd. Now merge the nodes:

merg n spset1 0.04181

The final part of the finite-element setup is merging the nodes between the slanted section, the parallelepiped, and the horse-shoe shaped feature. First, view the part in the positive y-direction, create a set called sphset1, and create a selection box as shown in Figure 5.51.

Figure 5.51. Create selection box

qadd sphset1

Enter multiple entity selection mode.

a

Once the nodes are selected plot the chosen nodes in blue with plot n sphset1 b.

Notice that there are many extra nodes in the set. Remove them using the qrem command so that your screen looks like Figure 5.52.

Figure 5.52. Final node set

Now determine the maximum distance between any two nodes using the qdis com- mand. The maximum distance between any two nodes occurs at the interface between the horse-shoe and slanted sections. The distance between these nodes is around 0.0279. Now merge these nodes up to a distance of 0.02791.

merg n sphset1 0.02791

Now the merging of nodes is complete.

5.2.3 Apply Boundary Conditions

This section will describe the steps in adding boundary conditions to the part and saving the data. The boundary conditions that will be applied are the following: the inside of the handle will be fixed, and the horse-shoe end face will have a 130 lbf load applied. The material for the part will be AISI 1018 steel with a modulus of elasticity of 30E+6 psi, a Poisson’s ratio of 0.33, and a yield strength of 60,000 psi.

First, plot all nodes and view the part near the handle in the positive y-direction as shown in Figure 5.53.

Figure 5.53. Side view of handle with nodes plotted

Next, add a set called fix and enter into multiple selection mode.

qadd fix

a

Make the selection box a little bigger and place it over what looks like one node as shown in Figure 5.54. Press n to select that node and all nodes behind it in the selection box. This will add all nodes within that box to the set fix. Do this for all the nodes around the inside surface of the handle. After all the nodes are added to the set fix, send the data to a file with the command:

send fix spc 123

This will send the set fix with single point constraints (spc) that fix each node in the x (1), y (2), and z (3) directions. If the nodes were only to be fixed in the x and z directions, the command would be: 118

Figure 5.54. Select nodes on handle inner surface

send fix spc 13

Next, create the set load and add all the nodes on the surface of the horse-shoe end, Figure 5.55.

Figure 5.55. Add nodes to set load

A 130 lbf load will be applied to the end of the horse-shoe surface. Since there are 130 nodes on this surface, simply apply a -1 lbf load to each node in the z-direction.

send load force 0 0 -1

Finally, send all the element and nodal data to a .msh file. This will contain the element and nodal data in a format readable by CalculiX CrunchiX.

send all abq

Now that all necessary files that define the part are created, the input deck can be created. The input deck contains the element and nodal data, the boundary conditions, loading conditions, material definitions, the type of analysis, and what data to print out for elements and nodes. The input deck for this run is seen in Appendix D.

5.2.4 Run Analysis

After the input deck is created, the analysis can now be performed. Simply type at the command line:

redboots@apollo> ccx_1.1 -i all.inp

all.inp is the name of the input deck file. After the analysis has completed, several files will be created. The file ending in the extension frd contains the results that are available. For this test, node displacement and element stress are computed. To view the results, type:

redboots@apollo> cgx all.frd

On the left-hand side of the CalculiX GraphiX interface, left-click the mouse, select Datasets, then Stress. Next, left-click on the left side of the CalculiX interface, select Datasets, Entity, and then Mises. This will plot the von Mises stress for the part, Figure 5.56.

Figure 5.56. von Mises stress for the part

CHAPTER 6 OPEN SOURCE SOLVERS

Solving large linear systems is one of the common tasks in the engineering and scientific fields. There are many packages that solve linear systems; some of the commonly known open-source packages include LAPACK, PETSc, ScaLAPACK, SPOOLES, SuperLU, and TAUCS. All of these packages can solve the problem on a single computer, while some also allow multi-threaded solving and computation over a cluster. In Table 6.1 a comparison is made between these solvers. A more comprehensive table can be found at http://www.netlib.org/utk/people/JackDongarra/la-sw.html.

Table 6.1. Comparison of solvers

Package     Real  Complex  Language  Serial  Parallel   Direct  Iterative

LAPACK      X     X        C & F77   X       --         X       --
PETSc       X     X        C & F77   X       MPI        --      Sparse
ScaLAPACK   X     X        C & F77   --      MPI, PVM   X       --
SPOOLES     X     X        C         X       MPI        Sparse  --
SuperLU     X     X        C & F77   X       MPI        Sparse  --
Taucs       X     X        C         X       --         Sparse  --

For the Parallel column, a distinction is made between which distributed memory message passing language is being used, and for the Direct and Iterative column a note is made as to which packages use sparse matrix form to contain the data.

6.1 SPOOLES

SPOOLES (SParse Object-Oriented Linear Equations Solver) is used to solve sparse real or complex linear systems, Ax = b, for x. The matrices can be symmetric, Hermitian, square non-symmetric, or overdetermined [34]. SPOOLES can use either QR or LU factorization in serial, multi-threaded, or parallel environments. SPOOLES could be considered a medium size program with over 120,000 lines of code. Being Object Oriented (OO), there are thirty-eight objects and over 1300 different functions, with each object knowing and operating on itself and knowing very little about other objects [35]. SPOOLES was supported in part by DARPA with contributions by Cleve Ashcraft, Daniel Pierce, David K. Wah, and Jason Wu.

6.1.1 Objects in SPOOLES

SPOOLES is Object Oriented software written in the C language. Object Oriented programming uses objects that interact with each other. Objects provide modularity and structure to the program and they contain both data and functions [36]. The objects that make up SPOOLES are divided into three categories: utility, ordering, and numeric. The utility objects have several uses such as storing and operating on dense matrices, creating storage for real and complex vectors, generating a random number, and storing permutation vectors. The ordering objects can represent a partition of a graph during nested dissection, model an elimination tree for factorization, represent the graph of a matrix, and provide an ordering based on minimum degree, multi-section, or nested dissection algorithms. Numeric objects can assemble a sparse matrix, store and operate on a front during factorization, hold submatrix data and operate on them, and numeric objects also solve linear systems [37]. A complete list of the available utility, ordering, and numeric objects is seen in Tables 6.2, 6.3 and 6.4 respectively.
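Although the objects serve different purposes, most of them share the same allocate/initialize/use/free life cycle that appears throughout the driver code later in this chapter. The sketch below illustrates that pattern with the DV (double precision vector) utility object; the variable name is hypothetical, and only calls that also appear in the drivers of this chapter are used.

DV *cpusDV ;                  /* double precision vector object             */
cpusDV = DV_new() ;           /* allocate the object structure              */
DV_init(cpusDV, 10, NULL) ;   /* allocate internal storage for ten entries  */
/* ... pass the object to whatever SPOOLES methods need it ... */
DV_free(cpusDV) ;             /* release the internal storage and the object */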

6.1.2 Steps to Solve Equations

There are four main steps to solving equations in SPOOLES - communicate, reorder, factor, and solve [34]. In later sections these steps will be explained in more detail for solving in serial and parallel environments. The multi-threaded capabilities of SPOOLES will not be looked at because the goal is to compare the methods that we are

Table 6.2. Utility objects

A2         dense two dimensional array
Coords     object to hold coordinates in any number of dimensions
DV         double precision vector
Drand      random number generator
I2Ohash    hash table for the factor submatrices
IIheap     simple heap object
IV         integer vector
IVL        integer list object
Ideq       simple dequeue object
Lock       abstract mutual exclusion lock
Perm       permutation vector object
Utilities  various vector and linked list utility methods
ZV         double precision complex vector

Table 6.3. Ordering objects

BKL       Block Kernighan-Lin algorithm object
BPG       bipartite graph object
DSTree    domain/separator tree object
EGraph    element graph object
ETree     front tree object
GPart     graph partitioning algorithm object
Graph     graph object
MSMD      multi-stage minimum degree algorithm object
Network   network object for solving max flow problems
SolveMap  map of submatrices to processes for solves
Tree      tree object

able to explore in our lab. The computers that we use in our cluster are all single processor Intel Pentium 4s with no HyperThreading technology.

6.1.3 Communicate

The first step, communicate, reads in the data for the matrices either from stored values in memory or from a file. The sparse matrix A is read in the following manner: from the first line, the number of rows, columns, and entries is read. Next, each remaining line gives the row and column numbers of a nonzero entry in the sparse matrix together with its value. SPOOLES

Table 6.4. Numeric objects

Chv             block chevron object for fronts
ChvList         object to hold lists of Chv objects
ChvManager      object to manage instances of Chv objects
DenseMtx        dense matrix object
FrontMtx        front matrix object
ILUMtx          simple preconditioner matrix object
InpMtx          sparse matrix object
Iter            Krylov methods for iterative solves
PatchAndGoInfo  modified factors in the presence of zero or small pivots
Pencil          object to contain a matrix pencil A + sigma*B
SemiImplMtx     semi-implicit factorization matrix object
SubMtx          object for dense or sparse submatrices
SubMtxList      object to hold lists of SubMtx objects
SubMtxManager   object to manage instances of SubMtx objects
SymbFac         algorithm object to compute a symbolic factorization

only reads in the upper triangular part of the matrix for symmetric A. For complex matrices, there are two values for each entry, the real and imaginary parts. Below is an example of an input file for a real A.

4 4 7
0 0 1
0 3 4
1 1 3
1 3 2
2 2 9
2 3 2
3 3 4

For the above example, A is a 4x4 real symmetric matrix with seven entries in the upper triangle. It should be noted that labeling of the entries begins with zero and not one.
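Written out in dense form, the seven entries listed above correspond to the symmetric matrix

A = [ 1 0 0 4 ]
    [ 0 3 0 2 ]
    [ 0 0 9 2 ]
    [ 4 2 2 4 ]

where the entries below the diagonal are implied by symmetry and are not stored in the input file.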

For the right hand side, b, the first part that is read are the number of rows for the matrix and the number of columns with data in the matrix file. Next, the row id is read along with the corresponding value of the entry. If the matrix is real, one value is read, and if the matrix is complex, two values are read, the real and the imaginary part. Below is an example of an input file for a real b.

4 1
6
2
11
14

For the above example, there are four rows in b which has one column of entries. If b was complex and four rows, the first line would be:

4 2

6.1.4 Reorder

The next step in solving the system of equations is reordering the linear system of equations. There are three options for ordering: multiple minimum degree, generalized nested dissection, and multi-section [35]. When solving a system of equations, entries in the matrix that were once zero become non-zero due to the elimination process. Because of this "fill-in", the matrix becomes less sparse and takes longer to solve. By reordering the matrix, the factorization produces less fill, thus reducing solve time. SPOOLES first finds the permutation matrix P, and then permutes Ax = b into [34]:

(PAP^T)(Px) = Pb (6.1)
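As a small illustration (not one of the systems solved elsewhere in this work), let P be the permutation matrix that swaps the two unknowns of a 2x2 system:

A = [ 4 2 ]    P = [ 0 1 ]    PAP^T = [ 3 2 ]
    [ 2 3 ]        [ 1 0 ]            [ 2 4 ]

Px and Pb are simply x and b with their entries swapped, so the permuted system has the same solution as the original one, only stored in a different order; the reordering changes the amount of fill produced during factorization, not the answer.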

6.1.5 Factor

The third step in solving linear equations involves factoring the matrix A, Ax = b, using LU factorization in the form:

A = LDU (6.2)

or

A = U^T DU (6.3)

since A is symmetric. SPOOLES uses Crout’s algorithm for performing the LU decomposition, where D is a diagonal matrix, and U is an upper triangular matrix with 1’s along the diagonal. Crout’s algorithm is further explained in [38]. During the solve, the user has a choice of using pivoting to ensure numerical stability. Pivoting is a process of sorting rows and/or columns of a matrix so that the pivot is the largest entry in the column, for partial pivoting, or the largest entry in the matrix, for complete pivoting. For SPOOLES, the magnitude of entries in both L and U are bounded by a user-specified value; the recommended values are 100 or 1000 [35]. The pivoting method used in SPOOLES is explained in more detail in [39].
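As a small numerical illustration (not one of the matrices used elsewhere in this work), the 2x2 symmetric matrix A below factors as A = U^T DU with

A = [ 4 2 ]    D = [ 4 0 ]    U = [ 1 0.5 ]
    [ 2 3 ]        [ 0 2 ]        [ 0 1   ]

Here D carries the pivots and U has 1’s along its diagonal, exactly the form described above; multiplying U^T DU back together recovers A.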

6.1.6 Solve

The last step is solving the linear equations using forward and backsolving. Substituting Eq. 6.3 into Eq. 6.1, the forward solve first determines y (where y = DUx) from

U^T y = b (6.4)

The diagonal system Dz = y is then solved for z = Ux, and the backsolve of Eq. 6.5 gives x.

Ux = z (6.5)
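Continuing the small 2x2 illustration from Section 6.1.5 (again, not one of the systems solved in this work): with b = (6, 7)^T, the forward solve U^T y = b gives y = (6, 4)^T, the diagonal solve Dz = y gives z = (1.5, 2)^T, and the backsolve Ux = z gives x = (0.5, 2)^T, which indeed satisfies Ax = b.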

6.2 Code to Solve Equations

There are three environments in which SPOOLES can solve equations: in serial, using multiple threads, and in parallel. When the serial drivers are used, only one processor is used, solving the problem sequentially. This environment is ideal for relatively small problems where the computational time is within a user-defined acceptable period. Multi-threaded SPOOLES allows the application to handle more than one operation at a time by separating tasks into individual threads. A multi-threaded application may offer some benefits on a single-processor machine, such as an Intel Pentium 4 with HyperThreading, but a multiple-processor computer is much better suited to taking full advantage of multiple threads. The parallel routines of SPOOLES were originally designed to operate on MPP machines with a fast network interconnect, but with the increasing bandwidth and decreasing latency of current networking equipment, it is viable for SPOOLES to be run on multiple computers arranged in a cluster. By using multiple computers connected by a fast network and coding applications with a parallel interface such as MPI, certain problems may be solved in less time than with a single computer.

6.3 Serial Code

Initially, the driver that CalculiX uses in solving equations connects to the serial routines supplied in SPOOLES. The code used by CalculiX is seen in Appendix E.1. The steps used by CalculiX follow the four steps into which SPOOLES divides the work.

6.3.1 Communicate

The first step, communicate, reads and stores the entries of the matrix A, Ax = b. This is carried out by the following code segment.

mtxA = InpMtx_new() ; InpMtx_init(mtxA, INPMTX_BY_ROWS, type, nent, neqns) ;
for(row=0;row<nrow;row++){
   InpMtx_inputRealEntry(mtxA, row, row, ad[row]) ;  /* diagonal entry of the row */
   /* the nonzero subdiagonal entries of the row are entered in the same way
      from au[], using the counts stored in icol[] (loop body abridged) */
}

The first object used is InpMtx. To use the InpMtx object to store entries of a sparse matrix, the following steps are carried out. First, storage is allocated for the object InpMtx with a call to InpMtx_new() and InpMtx is initialized by a call to InpMtx_init(). The first argument of InpMtx_init() is a pointer to the InpMtx object, mtxA.

The second argument sets how mtxA is stored.

The third argument, type, lets InpMtx init() know if the matrix is real or complex.

The fourth argument, nent, is an estimate of the number of nonzero entries in the matrix.

The fifth argument, neqns, is the number of rows or columns.

Next, entries of the matrix object are placed by InpMtx_inputRealEntry(). If a complex matrix were used, InpMtx_inputRealEntry() would be replaced by InpMtx_inputComplexEntry() with its appropriate arguments. The argument ad[] contains the diagonal element for a particular row, au[] contains the subdiagonal entries of the matrix, and icol[i] contains the number of nonzero subdiagonal entries in column i. ad[], au[], and icol[] are arrays that are defined within CalculiX and relay the matrix information to the SPOOLES solver.

The second part of communicate, reading and storing the entries of b, will be per- formed later.

6.3.2 Reorder

The second step of solving a linear system of equations with CalculiX is to find a low-fill ordering to reduce storage and computational requirements of a sparse matrix factorization. This falls into the reorder step. The code that performs this operation is seen below.

graph = Graph_new() ; adjIVL = InpMtx_fullAdjacency(mtxA) ; nedges = IVL_tsize(adjIVL) ; Graph_init2(graph, 0, neqns, 0, nedges, neqns, nedges, adjIVL, NULL, NULL) ; maxdomainsize=800;maxzeros=1000;maxsize=64; frontETree=orderViaBestOfNDandMS(graph,maxdomainsize,maxzeros, maxsize,seed,msglvl,msgFile);

The above code initializes the Graph object which is used to represent the graph of a matrix. This step also uses the IVL object to represent adjacency lists, one edge list for each vertex in the graph [37]. The first line, Graph_new(), allocates storage for the Graph structure. The second line creates and returns an object that holds the full adjacency structure of A + A^T.
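As a concrete example (using the small 4x4 matrix from Section 6.1.3, not a matrix produced by CalculiX), the full adjacency structure of A + A^T consists of one list per vertex, each containing the vertex itself and the vertices it is connected to:

vertex 0: { 0, 3 }
vertex 1: { 1, 3 }
vertex 2: { 2, 3 }
vertex 3: { 0, 1, 2, 3 }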

This above structure is used because the LU factorization of SPOOLES works with symmetric matrices. The ordering that serial CalculiX uses is determined by finding the better of two methods - nested dissection and multi-section. Much of the work used to find either a nested dissection or multi-section ordering is identical, so little more time is used [40]. orderViaBestOfNDandMS() splits a subgraph if it has more vertices than the maxdomainsize argument and also transforms the front tree using the maxzeros and maxsize parameters [40]. These will be explained later.

6.3.3 Factor

The next four steps of the CalculiX code make up the factor step and involve com- puting the numeric factorization and a post-factorization process. The first three steps are those outlined in [41]. The steps are illustrated in Figure 6.1.

Figure 6.1. Three steps to numeric factorization

The first step involves determining the permutation, permuting the matrix, and get- ting the symbolic factorization. The code that performs these operations is seen below:

oldToNewIV = ETree_oldToNewVtxPerm(frontETree) ; oldToNew = IV_entries(oldToNewIV) ; newToOldIV = ETree_newToOldVtxPerm(frontETree) ; newToOld = IV_entries(newToOldIV) ; ETree_permuteVertices(frontETree, oldToNewIV) ; InpMtx_permute(mtxA, oldToNew, oldToNew) ; InpMtx_mapToUpperTriangle(mtxA) ; InpMtx_changeCoordType(mtxA,INPMTX_BY_CHEVRONS); InpMtx_changeStorageMode(mtxA,INPMTX_BY_VECTORS); symbfacIVL = SymbFac_initFromInpMtx(frontETree, mtxA) ;

The above code models the elimination tree for the sparse factorization. The permu- tation vectors are extracted from the ETree object so that we get orderings for the forward and backsolve. 131

Next, the front matrix object is initialized. This is performed by the following code:

frontmtx = FrontMtx_new() ; mtxmanager = SubMtxManager_new() ; SubMtxManager_init(mtxmanager, NO_LOCK, 0) ; FrontMtx_init(frontmtx, frontETree, symbfacIVL, type, symmetryflag, FRONTMTX_DENSE_FRONTS, pivotingflag, NO_LOCK, 0, NULL, mtxmanager, msglvl, msgFile) ;

In a sequential frontal solver for finite-element codes, a ’front’ sweeps through the mesh, one element at a time, assembling the element stiffness matrices into a so-called ’frontal’ matrix [42]. FrontMtx_new() allocates storage for the FrontMtx structure, SubMtxManager_new() allocates storage for the SubMtxManager structure, and the front matrix is initialized by FrontMtx_init(). SubMtxManager handles multiple instances of the SubMtx object. The SubMtx object holds and operates on double precision or complex submatrices of a sparse matrix. FrontMtx_init() initializes the object, allocating and initializing the internal objects as necessary [37]. The numeric factorization is then calculated with the following code:

chvmanager = ChvManager_new() ; ChvManager_init(chvmanager, NO_LOCK, 1) ; DVfill(10, cpus, 0.0) ; IVfill(20, stats, 0) ; rootchv = FrontMtx_factorInpMtx(frontmtx, mtxA, tau, 0.0, chvmanager, &error,cpus, stats, msglvl, msgFile) ; ChvManager_free(chvmanager) ;

The Chv object is the block chevron object that is used to store and operate on a front during a sparse factorization. The word ”chevron” was chosen by the authors to describe the front because if you rotate Figure 6.2 forty-five degrees clockwise the entries resemble the chevron insignia of enlisted personnel in the armed forces. Also ”block” emphasizes that the diagonal may have multiple entries [37]. ChvManager_new() allocates memory for the ChvManager structure, while ChvManager_init() initializes the object. FrontMtx_factorInpMtx() performs a serial factorization of mtxA with the following parameters:


Figure 6.2. Arrowhead matrix

frontmtx - storage space for the front matrix created by FrontMtx_new().

mtxA - the matrix being factored.

tau - upper bound on the magnitude of the entries in L and U; it is used when pivoting is enabled.

0.0 (the droptol argument) - when the fronts are stored in sparse format, only entries of L and U whose magnitude is greater than this value are stored.

chvmanager - storage space for the structure that operates on the fronts during factorization.

error - returns an error code.

cpus - time involved in the factorization.

stats - information on the factorization.

msglvl - how much information about the factorization and its results is to be saved to a file.

msgFile - the data to be saved, specified by msglvl, is written to the file pointed to by msgFile.

The last part of the factorization process involves permuting the row and column adjacency objects, permuting the lower and upper matrices, and updating the block adja- cency objects [37]. This is carried out by the following code:

FrontMtx_postProcess(frontmtx, msglvl, msgFile) ;

6.3.4 Communicate B

Next, the second part of communicate, reading the right hand side b of Ax = b, is performed. This is accomplished by the following code:

mtxB = DenseMtx_new() ; DenseMtx_init(mtxB, type, 0, 0, neqns, nrhs, 1, neqns) ; DenseMtx_zero(mtxB) ; for ( jrow = 0 ; jrow < nrow ; jrow++ ) { for ( jrhs = 0 ; jrhs < nrhs ; jrhs++ ) { DenseMtx_setRealEntry(mtxB, jrow, jrhs, b[jrow]) ; } }

First, DenseMtx new() is called which allocates storage for the DenseMtx object. The DenseMtx object contains a dense matrix, not stored in sparse format, along with row and column indices. Next, DenseMtx init() initializes the DenseMtx object and calculates the bytes required for the workspace. The following parameters are specified during DenseMtx init [37].

mtxB - space created for the dense matrix B

type - specifies whether the matrix is real or complex

3rd and 4th arguments - specify row and column ids of the matrix

neqns - number of equations and also equal to the number of rows, nrow, in the matrix

nrhs - number of columns in the matrix

7th and 8th arguments - the dense matrix is stored in column major format so the row stride is 1 and the column stride is the number of equations or rows

Lastly, the entries in row jrow and column jrhs are given the value b[jrow] with the call to DenseMtx_setRealEntry(). The next step involves permuting the right hand side into the new ordering. When a low-fill ordering is found for the sparse matrix A, the new ordering arrangement is applied to the dense matrix b. This is performed by the following code:

DenseMtx_permuteRows(mtxB, oldToNewIV) ;

6.3.5 Solve

The last step, solve, is to solve the linear system for x. This is performed by the following code.

mtxX = DenseMtx_new() ; DenseMtx_init(mtxX, type, 0, 0, neqns, nrhs, 1, neqns) ; DenseMtx_zero(mtxX) ; FrontMtx_solve(frontmtx,mtxX,mtxB,mtxmanager,cpus,msglvl,msgFile) ;

The first line allocates storage for the DenseMtx structure and assigns the storage space to mtxX. DenseMtx init initializes the DenseMtx object and calculates the number of bytes required for the workspace. Next, DenseMtx zero() zeros the entries in the matrix mtxX. Finally, the system of equations is solved by FrontMtx solve(). The parameters are as follows:

frontmtx - storage space for the front matrix

mtxX - entries of x, Ax = b, are written to mtxX

mtxB - entries of B are read in from mtxB

mtxmanager - manages the working storage used during the solve

cpus - returns information on the solve process

msglvl - specifies how much information of the solve process is to be written to output

msgFile - output specified by msglvl is written here

The last step of the solve process is permuting the rows of the dense matrix mtxX from the new ordering back to the old ordering, using the permutation vector newToOldIV in the call to DenseMtx_permuteRows(). This puts the solution into the original order of the matrices before the reordering steps occurred. Without permuting the solution into the original ordering, the stress and displacements would be incorrect for the finite-element solution.

DenseMtx_permuteRows(mtxX, newToOldIV) ;

Finally, the values for x are returned to CalculiX so that stress, strain, and other data can be calculated. This is performed by:

for ( jrow = 0 ; jrow < nrow ; jrow++ ) { b[jrow]=DenseMtx_entries(mtxX)[jrow]; }

DenseMtx_entries() simply returns the entries of x for each row. The values are then stored in b[ ] and returned to CalculiX.

6.4 Parallel Code

Solving a linear system of equations using the parallel capabilities of SPOOLES is very similar to the serial version. The major steps in solving the problem are the same. Ownership, distribution, and redistribution of fronts are introduced and are handled by a message passing interface that allows for the data to be local to each processor. This message passing interface is MPI [1]. The optimized code is seen in Appendix E.2.2. The first step before any calls to the MPI routines occur is to initialize the MPI environment. This is handled by a call to MPI Init().

MPI_Init(&argc, &argv); MPI_Comm_rank(MPI_COMM_WORLD, &myid) ; MPI_Comm_size(MPI_COMM_WORLD, &nproc) ; MPI_Get_processor_name(processor_name,&namelen); if(myid==0){ printf("Solving the system of equations using SpoolesMPI\n\n"); } fprintf(stdout,"Process %d of %d on %s\n",myid, nproc, processor_name);

MPI_Comm_rank() indicates the rank of the process that calls it, in the range 0..nproc-1, where nproc is the number of processors available. nproc is determined by MPI_Comm_size(). MPI_Get_processor_name() simply returns the name of the processor on which it is called. MPI_Init() is the only necessary function in the above block of code; the other three are simply used to print out which process is running on which node.
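For reference, an MPI program like this one is normally started through the MPICH process launcher set up in Chapter 2 rather than executed directly; a typical four-process invocation (illustrative only, since the exact command depends on the MPICH installation and on how p_solver is invoked by ccx_1.1) would look like:

redboots@apollo> mpirun -np 4 p_solver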

6.4.1 Communicate

Similar to the serial code, the next step is to read in the matrices.

sprintf(buffer, "matrix.%d.input", myid) ;
inputFile = fopen(buffer, "r") ;
fscanf(inputFile, "%d %d %d", &neqns, &ncol, &nent) ;
nrow = neqns;
MPI_Barrier(MPI_COMM_WORLD) ;
mtxA = InpMtx_new() ;

InpMtx_init(mtxA, INPMTX_BY_ROWS, type, nent, 0) ; for ( ient = 0 ; ient < nent ; ient++ ) { fscanf(inputFile, "%d %d %le", &iroow, &jcol, &value) ; InpMtx_inputRealEntry(mtxA, iroow, jcol, value) ; } fclose(inputFile) ;

In the serial code, data was read from memory. For the parallel code, however, all data is written to a file and is then read and the values stored. The communicate step for the parallel version is handled in this manner because it allows us to have a solver independent of CalculiX. This gives us more flexibility and lets us solve a problem whenever we have a linear system of equations, possibly written by another program, in the format accepted by the parallel solver. For CalculiX operating in serial, the SPOOLES linear solver was compiled into the main executable ccx_1.1. The parallel solver, p_solver, operates as a separate program and is called by ccx_1.1. The second part of communicate is to read in the right hand side, b, and create the DenseMtx object for b. This is performed by the following code.

sprintf(buffer, "rhs.%d.input", myid);
inputFile = fopen(buffer, "r") ;
fscanf(inputFile, "%d %d", &nrow, &nrhs) ;
mtxB = DenseMtx_new() ;
DenseMtx_init(mtxB, type, 0, 0, nrow, nrhs, 1, nrow) ;
DenseMtx_rowIndices(mtxB, &nrow, &rowind);
for ( iroow = 0 ; iroow < nrow ; iroow++ ) {
fscanf(inputFile, "%d", rowind + iroow) ;
for ( jcol = 0 ; jcol < nrhs ; jcol++ ) {

fscanf(inputFile, "%le", &value) ; DenseMtx_setRealEntry(mtxB, iroow, jcol, value) ; } } fclose(inputFile) ;

Again, the only difference between the serial and parallel codes is that the data is read from a file for the parallel code.

6.4.2 Reorder

The next step for the parallel code, reorder, is to find a low-fill ordering, which is very similar to the serial code. graph = Graph_new() ; adjIVL = InpMtx_MPI_fullAdjacency(mtxA, stats, msglvl, msgFile, MPI_COMM_WORLD) ; nedges = IVL_tsize(adjIVL) ; Graph_init2(graph, 0, neqns, 0, nedges, neqns, nedges, adjIVL, NULL, NULL) ;

To construct the IVL object, InpMtx MPI fullAdjacency() is used. This contains the full adjacency structure of the graph of the matrix that is distributed among the processes [37]. Next, each processor computes its own ordering based on the better of a general- ized nested dissection and multi-section ordering and creates a front tree object. This is performed by the following code. frontETree = orderViaBestOfNDandMS(graph, maxdomainsize,maxzeros, maxsize, seed, msglvl,msgFile) ;

Because each processor uses a different random number seed when computing the ordering, the orderings will be different. Only one ordering can be used for the factorization, so the master node determines which ordering is the best, and then distributes that ordering to all of the processors. This is performed by the code below.

opcounts = DVinit(nproc, 0.0) ;
opcounts[myid] = ETree_nFactorOps(frontETree, type, symmetryflag) ;
MPI_Allgather((void *) &opcounts[myid], 1, MPI_DOUBLE, (void *) opcounts, 1, MPI_DOUBLE, MPI_COMM_WORLD) ;
minops = DVmin(nproc, opcounts, &root) ;
DVfree(opcounts) ;
frontETree = ETree_MPI_Bcast(frontETree, root, msglvl, msgFile, MPI_COMM_WORLD) ;

The call MPI_Allgather() takes the result stored in opcounts[myid], the number of factor operations required by that processor's ETree object, and gathers it from all processes into opcounts[]. Next, DVmin() finds the minimum entry of opcounts[], returning the minimum operation count in minops and placing the index of the owning processor in root. DVfree() simply releases the storage taken by opcounts. Finally, ETree_MPI_Bcast() is the broadcast method for an ETree object. The root processor broadcasts the ETree object that requires the fewest operations to the other nodes, and the method returns a pointer to the broadcast ETree object [37]. The next step, getting the permutations, is very similar to the serial code except that the local A and b matrices are permuted instead of global ones. The following code performs this operation.

oldToNewIV = ETree_oldToNewVtxPerm(frontETree) ;
newToOldIV = ETree_newToOldVtxPerm(frontETree) ;
ETree_permuteVertices(frontETree, oldToNewIV) ;
InpMtx_permute(mtxA, IV_entries(oldToNewIV), IV_entries(oldToNewIV)) ;
InpMtx_mapToUpperTriangle(mtxA) ;
InpMtx_changeCoordType(mtxA, INPMTX_BY_CHEVRONS) ;
InpMtx_changeStorageMode(mtxA, INPMTX_BY_VECTORS) ;
DenseMtx_permuteRows(mtxB, oldToNewIV) ;

The next step is to generate the owners map from vertices to processors using the IV object, which allows the distribution of A and b. This is performed by the following code.

cutoff = 1./(2*nproc) ;
cumopsDV = DV_new() ;
DV_init(cumopsDV, nproc, NULL) ;
ownersIV = ETree_ddMap(frontETree, type, symmetryflag, cumopsDV, cutoff) ;
DV_free(cumopsDV) ;
vtxmapIV = IV_new() ;
IV_init(vtxmapIV, neqns, NULL) ;
IVgather(neqns, IV_entries(vtxmapIV), IV_entries(ownersIV), ETree_vtxToFront(frontETree)) ;

First, cutoff is defined. If the weight of the subtree is more than cutoff times the number of factor operations, then the vertex is in the multi-sector [37]. Next, a double vector is created by the call to DV_new(). DV_init() initializes the object, and a vector of size nproc is allocated. Next, a map from the fronts to the processors is created by ETree_ddMap(). This domain decomposition map method involves mapping domains to threads; the fronts in the Schur complement are then mapped to threads, both using independent balance maps [37]. The next step is to redistribute the matrix and the right hand side.

firsttag = 0 ;
newA = InpMtx_MPI_split(mtxA, vtxmapIV, stats, msglvl, msgFile, firsttag, MPI_COMM_WORLD) ;
firsttag++ ;
InpMtx_free(mtxA) ;
mtxA = newA ;
InpMtx_changeStorageMode(mtxA, INPMTX_BY_VECTORS) ;
newB = DenseMtx_MPI_splitByRows(mtxB, vtxmapIV, stats, msglvl, msgFile, firsttag, MPI_COMM_WORLD) ;
DenseMtx_free(mtxB) ;
mtxB = newB ;
firsttag += nproc ;

First, InpMtx MPI split() splits and redistributes the InpMtx object based on the mapIV object that maps the InpMtx object’s vectors to processes [37]. Next, the InpMtx needed by each processor is placed in mtxA after its storage was cleared by InpMtx free(). Then, a new DenseMtx object is created by DenseMtx MPI splitByRows() and assigned to mtxB after its original entries were cleared by DenseMtx free.

6.4.3 Factor

The third step, factor, begins with the symbolic factorization, which can be computed in a distributed manner. At the end of the symbolic factorization each process will own a portion of the IVL object, just enough for its factorization [37]. This is performed by the following code.

symbfacIVL = SymbFac_MPI_initFromInpMtx(frontETree, ownersIV, mtxA, stats, msglvl, msgFile, firsttag, MPI_COMM_WORLD) ;
firsttag += frontETree->nfront ;

The next step is to initialize the front matrix. This operation is very similar to the serial code, the only difference being the FrontMtx object only initializes the part of the factor matrices that are owned by their respective processor. mtxmanager = SubMtxManager_new() ; SubMtxManager_init(mtxmanager, NO_LOCK, 0) ; frontmtx = FrontMtx_new() ; FrontMtx_init(frontmtx, frontETree, symbfacIVL, type, symmetryflag, FRONTMTX_DENSE_FRONTS, pivotingflag, NO_LOCK, myid, ownersIV, mtxmanager, msglvl, msgFile) ;

Notice that the ninth and tenth arguments are myid and ownersIV not 0 and NULL as they are for the serial code. This specifies what part of the factor matrices are initialized for a processor. Next, the numeric factorization is calculated. This step is also very similar to the serial code. chvmanager = ChvManager_new() ; ChvManager_init(chvmanager, NO_LOCK, 0) ; rootchv = FrontMtx_MPI_factorInpMtx(frontmtx, mtxA, tau, droptol, chvmanager, ownersIV, lookahead, &error, cpus, stats, msglvl, msgFile, firsttag, MPI_COMM_WORLD) ; ChvManager_free(chvmanager) ; firsttag += 3*frontETree->nfront + 2 ;

FrontMtx_MPI_factorInpMtx() computes the numeric factorization with added code to send and receive Chv messages identified by the parameter firsttag. After the numeric factorization is computed, it is then post-processed and the factor matrices are split into submatrices. This is very similar to the serial code, with FrontMtx_postProcess() being replaced by FrontMtx_MPI_postProcess().

FrontMtx_MPI_postProcess(frontmtx, ownersIV, stats, msglvl, msgFile, firsttag, MPI_COMM_WORLD) ; firsttag += 5*nproc ;

Next, the SolveMap object is created. This operation specifies which threads own which submatrices by using a domain decomposition map. The domain decomposition map allows submatrix operations to be assigned to processors in the forward and back- solve steps [37]. solvemap = SolveMap_new() ; SolveMap_ddMap(solvemap, frontmtx->symmetryflag, FrontMtx_upperBlockIVL(frontmtx), FrontMtx_lowerBlockIVL(frontmtx), nproc, ownersIV, FrontMtx_frontTree(frontmtx), seed, msglvl, msgFile);

SolveMap ddMap maps the off-diagonal submatrices to processors in a domain de- composition fashion [37]. The next step is to redistribute the submatrices of the factors.

FrontMtx_MPI_split(frontmtx, solvemap, stats, msglvl, msgFile, firsttag, MPI_COMM_WORLD) ;

FrontMtx_MPI_split() splits and redistributes the FrontMtx based on the solvemap object that maps submatrices to processes [37]. After this code is executed, the submatrices that a processor owns are local. The next step is for each processor to create a local DenseMtx object, mtxX, to hold the rows of the solution that it owns [34]. This is accomplished by the following code.

ownedColumnsIV = FrontMtx_ownedColumnsIV(frontmtx, myid, ownersIV, msglvl, msgFile) ;
nmycol = IV_size(ownedColumnsIV) ;
mtxX = DenseMtx_new() ;
if ( nmycol > 0 ) {
DenseMtx_init(mtxX, type, 0, 0, nmycol, nrhs, 1, nmycol) ;
DenseMtx_rowIndices(mtxX, &nrow, &rowind) ;
IVcopy(nmycol, rowind, IV_entries(ownedColumnsIV)) ;

FrontMtx_ownedColumnsIV() constructs and returns an IV object that contains the ids of the columns that belong to the fronts owned by processor myid [37]. IV_size() simply

returns the size of the vector ownedColumnsIV, that is, the number of solution rows owned by processor myid. Next, the DenseMtx object is initialized by DenseMtx_init(), and nmycol entries of IV_entries(ownedColumnsIV) are copied into rowind.

6.4.4 Solve

Next, the linear system is solved. This is performed in a manner very similar to the serial code.

solvemanager = SubMtxManager_new() ;
SubMtxManager_init(solvemanager, NO_LOCK, 0) ;
FrontMtx_MPI_solve(frontmtx, mtxX, mtxB, solvemanager, solvemap, cpus, stats, msglvl, msgFile, firsttag, MPI_COMM_WORLD) ;
SubMtxManager_free(solvemanager) ;

SubMtxManager new() initializes the SubMtx object manager. The SubMtx object manager is similar to the ChvManager object in that the SubMtx object manager handles a number of instances of SubMtx double precision matrix objects while ChvManager han- dles a number of instances of Chv objects. There are two actions that can be taken by the SubMtx manager: creating new and freeing objects, and recycling objects. SubMtxMan- ager new() allocates storage for the SubMtxManager while SubMtxManager free() frees up space for the SubMtxManager. For recycling of the objects, the manager keeps a list of free objects ordered by descending size and when an object is needed, the list is searched and the object with the required size is returned. FrontMtx MPI solve() performs the forward and backsolves. The last step is permuting the local matrix on each processor into the original order- ing and assembling the final solution onto processor 0. This is performed by the following code.

DenseMtx_permuteRows(mtxX, newToOldIV) ; IV_fill(vtxmapIV, 0) ; firsttag++ ; mtxX = DenseMtx_MPI_splitByRows(mtxX, vtxmapIV, stats, msglvl, \ msgFile, firsttag, MPI_COMM_WORLD) ; if ( myid == 0 ) { printf("%d\n", nrow); sprintf(buffer, "/home/apollo/hda8/CalculiX/ccx_1.1/src/ \ p_solver/x.result"); inputFile=fopen(buffer, "w"); for ( jrow = 0 ; jrow < ncol ; jrow++ ) { fprintf(inputFile, "%1.5e\n", DenseMtx_entries(mtxX)[jrow]); } fclose(inputFile); } MPI_Finalize() ;

First, the rows are permuted back into the original ordering by using the new-to-old permutation vector in the call to DenseMtx_permuteRows(). Next, the solution, mtxX, is assembled onto processor 0 by a call to DenseMtx_MPI_splitByRows(). Then, the solution is written to a file by calling DenseMtx_entries() to return the entries. Finally, any program that uses MPI needs a call to MPI_Finalize() so that the MPI environment is terminated [14]. This does not terminate the task, but any call to an MPI function after MPI_Finalize() returns an error.

CHAPTER 7 MATRIX ORDERINGS

In this chapter, three different matrix ordering algorithms, minimum degree, nested dissection, and multi-section, will be discussed. Their advantages, disadvantages, and a brief history of their development will also be reviewed. By optimizing the ordering of the sparse matrix, a substantial decrease in solve time can be achieved [40].

7.1 Ordering Optimization

The ordering, or graph partitioning, of the linear system of equations is crucial for the performance of the parallel code. During Gaussian elimination, an entry in A that was zero may become non-zero; this is termed fill-in. In general, fill-in is considered undesirable because it increases the amount of memory required to store the linear system and increases the processing power required to solve the problem [41]. Often, the size of the problem that can be solved is limited by the computer's main memory, which for many workstations is between one gigabyte and the four-gigabyte maximum addressable by most 32-bit machines. By reordering the original matrix A, its LU factors often become sparser, that is, they have less fill-in. Consider the matrix A in Figure 7.1. Performing an LU decomposition on A results in L and U as shown in Figures 7.2 and 7.3. As seen in the figures, both L and U have additional non-zero terms in positions where A does not. SPOOLES offers four ordering schemes, the fourth being the better of the nested dissection and multi-section orderings (a small worked example of the effect of ordering follows the list below):

minimum degree
generalized nested dissection
multi-section

Figure 7.1. Original A

Figure 7.2. Lower matrix

Figure 7.3. Upper matrix

better ordering of nested dissection and multi-section
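The arrow matrix below is a classic illustration of how strongly the ordering controls fill-in (an illustrative sketch, not taken from SPOOLES or from Figures 7.1-7.3). If the dense row and column are numbered first, eliminating the first pivot makes every remaining entry non-zero; if they are numbered last, the factors have exactly the same non-zero pattern as the matrix and no fill-in occurs. Both matrices describe the same graph; only the numbering of the vertices differs.

\[
A_{\mathrm{first}} =
\begin{pmatrix}
\times & \times & \times & \times \\
\times & \times &        &        \\
\times &        & \times &        \\
\times &        &        & \times
\end{pmatrix}
\qquad
A_{\mathrm{last}} =
\begin{pmatrix}
\times &        &        & \times \\
       & \times &        & \times \\
       &        & \times & \times \\
\times & \times & \times & \times
\end{pmatrix}
\]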

7.2 Minimum Degree Ordering

The minimum degree ordering algorithm implemented in SPOOLES is described in more detail in Liu [43]. One of the most widely used sparse matrix ordering heuristics before graph-based partitioning schemes, the minimum degree algorithm is loosely similar to the algorithm proposed by Markowitz in 1957 and has proved very effective in reducing computation time and fill-in [44]. When ordering a sparse matrix, minimum degree selects the vertex with the smallest degree and removes it from the graph by pivoting on it, which in essence turns the neighborhood of the vertex into a clique, and then moves on to the next vertex of smallest degree in the graph [43]. By choosing a vertex of minimum degree, the fill-in resulting from adding edges among the remaining vertices is kept to a minimum compared with pivoting on a vertex of higher degree [45]. These steps of selection and pivoting continue until all vertices have been eliminated. The new elimination graph is formed by deleting the vertex of smallest degree and its incident edges from the graph and then adding new edges so that the adjacent nodes of the vertex are now pairwise adjacent [43]. These steps in forming the elimination graph are illustrated in Figure 7.4 [41]. The minimum degree algorithm, however, is not without its drawbacks. One area of trouble is that after a vertex has been eliminated and a new elimination graph is formed, the degree of every vertex that has not been eliminated must be updated. This repeated creation of a new elimination graph and updating of the degrees of the remaining vertices is the most time-consuming part of the ordering process [43]. The goal of multiple external minimum degree, introduced by Liu, is to reduce the number of degree updates. Multiple elimination, as this modification is called, involves the elimination of multiple vertices with the same current minimum degree before a complete update is performed [43]. The vertices with the same current minimum degree form a set of independent nodes. The original minimum degree algorithm has basically three steps in a main loop, Figure 7.5:

Figure 7.4. Steps to elimination graph

From Figure 7.5, the first step of the loop is to select a vertex of minimum degree and eliminate it. The second step transforms the elimination graph once the vertex has been eliminated. The third step updates the degrees of the remaining vertices. What multiple elimination does to the original algorithm is introduce a loop between the selection-and-elimination step and the elimination graph transformation step. After the first two steps have executed, the degree updates take place. This modification of the minimum degree algorithm is shown in Figure 7.6. By applying multiple elimination, the ordering compared to the original minimum

Figure 7.5. Minimum degree algorithm

degree algorithm is different. This is because all vertices with the same minimum degree are eliminated before the degrees are updated and a new minimum degree is calculated [43]. The other modification of the minimum degree algorithm used in SPOOLES is modification by external degree [43]. In the original minimum degree ordering scheme, choosing a vertex with minimum degree (its true degree) and eliminating it results in forming a clique of the smallest size. With multiple elimination, the size of the resulting clique is often different from the true degree of the vertex [43]. By using external degree, the

Figure 7.6. Modified minimum degree algorithm

size of the clique formed by eliminating a vertex of a certain degree is the same as its external degree. External degree is defined as the number of neighbors of a vertex that are not indistinguishable from that vertex [43].
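As a concrete illustration of the three-step loop in Figure 7.5, the following is a small self-contained sketch (not SPOOLES code) of the basic minimum degree ordering on a toy graph stored as a dense adjacency matrix: select a vertex of minimum degree, eliminate it by making its remaining neighbors pairwise adjacent, and recompute the degrees on the next pass.

/* Illustrative sketch (not SPOOLES code) of the basic loop in Figure 7.5. */
#include <stdio.h>

#define N 6

static int degree(int adj[N][N], int gone[N], int v)
{
    int w, d = 0;
    for (w = 0; w < N; w++)
        if (w != v && !gone[w] && adj[v][w])
            d++;
    return d;
}

int main(void)
{
    int adj[N][N] = {            /* undirected example graph */
        {0,1,1,1,1,1},
        {1,0,1,0,0,0},
        {1,1,0,0,0,0},
        {1,0,0,0,1,0},
        {1,0,0,1,0,0},
        {1,0,0,0,0,0}
    };
    int gone[N] = {0}, perm[N];
    int step, v, u, w, best, bestdeg, d;

    for (step = 0; step < N; step++) {
        /* 1) select a vertex of minimum degree */
        best = -1; bestdeg = N + 1;
        for (v = 0; v < N; v++) {
            if (!gone[v]) {
                d = degree(adj, gone, v);
                if (d < bestdeg) { bestdeg = d; best = v; }
            }
        }
        perm[step] = best;
        /* 2) eliminate it: its remaining neighbors become a clique */
        for (u = 0; u < N; u++)
            for (w = 0; w < N; w++)
                if (u != w && !gone[u] && !gone[w] &&
                    adj[best][u] && adj[best][w])
                    adj[u][w] = adj[w][u] = 1;
        gone[best] = 1;
        /* 3) degrees of the remaining vertices are recomputed next pass */
    }

    printf("elimination order:");
    for (step = 0; step < N; step++)
        printf(" %d", perm[step]);
    printf("\n");
    return 0;
}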

The function call that is used for a multiple minimum degree ordering is the following:

orderViaMMD(graph, seed, msglvl, msgFile);

7.3 Nested Dissection

Nested dissection [46] is considered a top-down approach to matrix ordering. The algorithm uses global information about the structure of the matrix; that is, it examines the graph as a whole before ordering it. In comparison, minimum degree determines which vertex should be eliminated first, while nested dissection decides which vertices should be eliminated last. The nested dissection algorithm begins by selecting a separator, a set of vertices that partitions the graph into roughly equal parts. The vertices in the separator are then labeled the highest and ordered last in the elimination sequence. The nested dissection algorithm then continues this loop of finding a separator and ordering until the entire graph has been ordered. By ordering the vertices in the separator last, the matrix is permuted into a bordered block-diagonal structure. The zero off-diagonal blocks suffer no fill-in during the factorization, and by using a small separator, the size of these zero off-diagonal blocks is maximized [45]. Another reason to keep the separator small is that the separator is typically a clique. The properties of a clique limit the elimination of its vertices to a sequential operation, not a parallel one, because within the clique the vertices do not form independent sets. Because they are mutually dependent, the vertices of a clique cannot be split up among more than one processor. The function call to use nested dissection for an ordering is:

orderViaND(graph, maxdomainsize, seed, msglvl, msgFile);

The maxdomainsize parameter specifies the largest number of vertices allowed in a subgraph before it is split.
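To make the recursion concrete, the following is a small self-contained sketch (not SPOOLES code) of nested dissection on the simplest possible graph, a 1-D chain: the middle vertex is a one-vertex separator, it is numbered last, and the two halves are ordered recursively.

/* Illustrative sketch (not SPOOLES code): nested dissection of a 1-D chain. */
#include <stdio.h>

static int label = 0;              /* next elimination number to hand out */

static void nd_order(int lo, int hi, int perm[])
{
    int mid;
    if (lo > hi) return;
    mid = (lo + hi) / 2;           /* separator of the chain              */
    nd_order(lo, mid - 1, perm);   /* order the left subgraph first       */
    nd_order(mid + 1, hi, perm);   /* then the right subgraph             */
    perm[mid] = label++;           /* the separator vertex comes last     */
}

int main(void)
{
    int i, n = 7, perm[7];
    nd_order(0, n - 1, perm);
    for (i = 0; i < n; i++)
        printf("vertex %d is eliminated at step %d\n", i, perm[i]);
    return 0;
}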

7.4 Multi-section

The multi-section ordering algorithm uses a multi-sector, a subset of vertices that divides the graph into two or more subgraphs [47]. Ordering by multi-section can be compared to an incomplete nested dissection ordering. The graph is split into subgraphs, the subgraphs become graphs in their own right and are then split into further subgraphs, and this continues until the subgraphs reach some defined size. This user-defined subgraph size is critical for the performance of SPOOLES and will be discussed in more detail in later sections. The use of multi-section is closely related to domain decomposition. The domain decomposition step of multi-section occurs in three steps:

Find an initial domain decomposition of the graph
Find an initial separator formed of multi-section vertices
Improve the bisector

All vertices in the domains are ordered before the vertices in the multi-sector. Domains are independent of each other and, therefore, can be ordered independently. As in the nested dissection ordering algorithm, after the graph is subdivided, the minimum degree algorithm is used to order the subgraphs [47]. The results presented in [47] on the testing of multi-section versus nested dissection and multiple minimum degree show that multi-section is very competitive with the well-known partitioning tool METIS [48]. Several Harwell-Boeing [49] matrices were used, and the results show that multi-section performs as well as or better than nested dissection and multiple minimum degree for many of the cases [47]. The function call to use multi-section is:

orderViaMS(graph, maxdomainsize, seed, msglvl, msgFile);

CHAPTER 8 OPTIMIZING SPOOLES FOR A COW

This chapter will discuss the steps taken to optimize SPOOLES for our Cluster of Workstations. The parallel functions of SPOOLES were originally coded for Massively Parallel Processing machines that have a high-speed, low-latency proprietary interconnect. By optimizing certain parameters, SPOOLES will be tailored to our network, which is non-proprietary 100 Mbps Ethernet.

8.1 Installation

To use the parallel functions of SPOOLES, the parallel MPI library, spoolesMPI.a, needs to be built. First, download SPOOLES from the website at [33]. Download the file spooles.2.2.tgz, place it in the folder SPOOLES.2.2, and unpack it.

redboots@apollo> cd /home/apollo/hda8
redboots@apollo> mkdir SPOOLES.2.2
redboots@apollo> mv spooles.2.2.tgz SPOOLES.2.2
redboots@apollo> cd SPOOLES.2.2
redboots@apollo> tar -xvzf spooles.2.2.tgz

This will unpack spooles.2.2.tgz in the folder SPOOLES.2.2. Next, edit the file Make.inc so that it points to the correct compiler, MPICH install location, MPICH libraries, and include directory.

CC = mpicc
MPI_INSTALL_DIR = /home/apollo/hda8/mpich-1.2.5.2
MPI_LIB_PATH = -L$(MPI_INSTALL_DIR)/lib
MPI_INCLUDE_DIR = -I$(MPI_INSTALL_DIR)/include


To create the parallel library, enter the directory /home/apollo/hda8/SPOOLES.2.2/MPI and type:

make lib

This will create the library spoolesMPI.a in /home/apollo/hda8/SPOOLES.2.2/MPI/src. When p_solver is compiled, the Makefile links against this parallel library. p_solver is the parallel solver that utilizes the parallel routines of SPOOLES. The source code and binary executable p_solver are located in /home/apollo/hda8/CalculiX/ccx_1.1/src/p_solver. The source code and Makefile used to compile p_solver are shown in Appendix E.2.1 and E.2.2, respectively. To compile p_solver, type:

redboots@apollo> make
mpicc -O -I ../../../../SPOOLES.2.2 -DARCH="Linux" -c -o p_solver.o p_solver.c
mpicc p_solver.o -o p_solver /home/apollo/hda8/SPOOLES.2.2/MPI/src/spoolesMPI.a \
      /home/apollo/hda8/SPOOLES.2.2/spooles.a -lm -lpthread

p_solver is a separate program from ccx. For the tests performed during the optimization process, p_solver is run by invoking mpirun. For a two processor test, the following command is used.

redboots@apollo> mpirun -np 2 p_solver

8.2 Optimization

The test case that will be used for the optimization is a cantilever beam, ten inches long with a 1 x 1 inch cross-section and a 200 lb load applied to the end, Figure 8.1. The material is AISI 1018 steel.

Figure 8.1. von Mises stress of cantilever

The resulting mesh for the cantilever beam has 136,125 equations and 4,924,476 non-zero entries. First, a baseline test of the initial parallel solver was performed and compared to the serial solver results. The baseline parallel solver uses the recommended values for maxdomainsize, maxzeros, and maxsize of 800, 1000, and 64 respectively, with the ordering orderViaBestOfNDandMS. For the first test, the serial solver required 48 seconds while the parallel solver completed the run in 56 seconds for two nodes, 61 seconds for three nodes, and 63 seconds for four nodes. These results were a little surprising because of the longer parallel solve times. Network latency and bandwidth were assumed to be among the culprits of the poor results.

8.2.1 Multi-Processing Environment - MPE

MPICH comes with tools that help visualize what functions are being called by a running program and when. The Multi-Processing Environment (MPE) provides this logging and is located in

/home/apollo/hda8/mpich-1.2.5.2/mpe. MPE provides performance analysis tools for MPI programs through a post-processing approach. Provided are the necessary libraries to link against and a log viewer, upshot. Upshot reads the log files and presents a graphical view of the communication between nodes, what functions are called, and the time frame in which the functions are called. These tools will help determine what communication occurs between nodes and help with the optimization process. To enable logging, several options need to be added to the Makefile for p_solver. These are:

MPE_LIBDIR = /home/apollo/hda8/mpich-1.2.5.2/mpe/lib
LOG_LIBS = -L$(MPE_LIBDIR) -llmpe -lmpe

p_solver: p_solver.o
	${CC} p_solver.o -o $@ ${LIBS} ${LOG_LIBS}

When p_solver is now run, it will create a log file named p_solver.clog. Convert the *.clog file to the format for upshot, *.alog, and then run upshot. The example below is a small test problem with 5400 equations and 173,196 non-zero entries.

redboots@apollo> /home/apollo/hda8/mpich-1.2.5.2/mpe/ \
                 bin/clog2alog p_solver
redboots@apollo> /home/apollo/hda8/mpich-1.2.5.2/mpe/ \
                 viewers/upshot/bin/upshot p_solver.alog

First, p_solver was run using just two processors. The resulting log for a small test problem is shown in Figure 8.2. At the top of the screen the MPI routines are color-coded. The black arrows represent communication between nodes. On the right-hand side it can be seen that there is a lot of communication occurring between the two nodes, and each communication has inherent latency induced by the network. Viewing a zoomed portion of the log file shows that

Figure 8.2. p_solver MPI communication–2 processors

during a very short period of time, around two milliseconds, there are numerous messages being passed between nodes, Figure 8.3.

Figure 8.3. p_solver MPI communication zoomed–2 processors

A log was also generated for a four processor test, Figure 8.4. As expected, the communication between four nodes was also significant, and greater than in the two node test.

Figure 8.4. p_solver MPI communication–4 processors

As a comparison, cpi was also compiled with the option of creating a log of the MPI communication. From Figure 8.5, interprocessor communication for cpi is at a minimum. These tests illustrate an important point: depending on the application, the network can have little effect on computational performance or it can affect it hugely. As illustrated in Figure 2.3, cpi achieves almost perfect scalability for two nodes. This can be attributed to how cpi simply divides the problem into parts and sends the parts to each compute node, and the compute nodes return the result with no need for interprocessor communication during the solve process. For p_solver, on the other hand, there is constant communication between nodes: compute nodes each calculating a low-fill ordering, distribution and redistribution of sub-matrices, and mapping of fronts to processors. From Figure 8.4 it is easy to see that increasing the number of compute nodes also increases the need for a fast network with very low latency and high bandwidth. For p_solver, as nodes are added, each node has to communicate with all the other nodes

Figure 8.5. MPI communication for cpi

to share the matrix data. Although more processing power is available with more nodes, decreased solve time is not guaranteed when the network is the limiting factor.

8.2.2 Reduce Ordering Time

The first change to the code for p_solver, p_solver.c, will be to reduce the time spent finding a low-fill ordering. Finding a low-fill ordering for A is performed by all the compute nodes for the parallel solve and by just one node for the serial solve. For the parallel solver, initially all nodes computed a low-fill ordering using a different random seed, collectively decided which ordering was the best, and then used the best ordering for the factorization. Instead of having all nodes compute an ordering, only the master node will create an ordering and then broadcast that ordering to the other nodes. The function calls for the ordering are very similar to those of the serial code: the parallel call InpMtx_MPI_fullAdjacency() is replaced by InpMtx_fullAdjacency(mtxA). Also, the following code is removed:

opcounts = DVinit(nproc, 0.0) ;
opcounts[myid] = ETree_nFactorOps(frontETree, type, symmetryflag) ;
MPI_Allgather((void *) &opcounts[myid], 1, MPI_DOUBLE,
              (void *) opcounts, 1, MPI_DOUBLE, MPI_COMM_WORLD) ;
minops = DVmin(nproc, opcounts, &root) ;
DVfree(opcounts) ;
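The change follows the standard compute-once-and-broadcast idiom. Below is a minimal, self-contained sketch of that pattern using only generic MPI calls; it is not the actual p_solver code, which broadcasts SPOOLES ordering data rather than the plain integer permutation used here as a stand-in.

/* Minimal sketch of the compute-once-and-broadcast pattern (not p_solver). */
#include <mpi.h>
#include <stdio.h>

#define NEQ 8   /* hypothetical number of equations */

int main(int argc, char *argv[])
{
    int i, myid, perm[NEQ];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    if (myid == 0) {
        /* master node: compute the low-fill ordering (placeholder work) */
        for (i = 0; i < NEQ; i++)
            perm[i] = NEQ - 1 - i;
    }
    /* every node receives the master's ordering instead of recomputing it */
    MPI_Bcast(perm, NEQ, MPI_INT, 0, MPI_COMM_WORLD);

    printf("process %d: first entry of the ordering is %d\n", myid, perm[0]);

    MPI_Finalize();
    return 0;
}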

After the code was changed so that only one ordering is computed, two nodes required 51 seconds, three nodes required 52 seconds, and four nodes required 61 seconds. The reduced calculation time is illustrated by comparing Figures 8.2 and 8.6. From Figure 8.6, there is a reduction in time near the beginning of the solve process.

Figure 8.6. First optimization

8.2.3 Optimizing the Front Tree

Next, the method of creating the front tree that defines the blocking of the factor matrices will be optimized. Optimizing the creation of the frontETree affects the data structures and the computations performed during the factor and solve. The four methods to create the frontETree object are minimum degree, generalized nested dissection, multi-section, and the better ordering of nested dissection and multi-section. These are described in more detail in the Matrix Orderings chapter, Chapter 7. The next tests performed determined which ordering achieved the best results. A minimum degree ordering required 186 seconds for two nodes, compared with around 51 seconds for the better of the nested dissection and multi-section orderings, so a top-down approach to the ordering returned the best results. For equations arising from partial differential equations with several degrees of freedom at a node, and also for three-dimensional problems, multi-section and nested dissection are recommended [40]. The next parts of the ordering process to be optimized are the variables maxdomainsize, maxsize, and maxzeros. The maxdomainsize variable is used for the nested dissection and multi-section orderings; any subgraph that is larger than maxdomainsize is split [40]. As a matrix gets larger, so can the number of zero entries in a front. The maxzeros and maxsize variables affect the efficiency of the factor and solve the most [40]. maxzeros specifies the number of zeros that can be in a front, while maxsize, similar to the block size of HPL discussed in Chapter 3, influences the granularity of the Level 3 BLAS computations in the factorization and solve.

8.2.4 Maxdomainsize

First, an optimum value of maxdomainsize will be determined for our cluster. The original value used by the serial solver in CalculiX is 800, so values higher and lower than this will be tested. For the first set of tests, a maxdomainsize of 700 completed the test in 47 seconds while a maxdomainsize of 900 took 49 seconds for a two processor test.

The next tests used maxdomainsize values of 650 and 750; both completed the two processor test in 51 seconds. The results from these tests, Table 8.1, indicate that a maxdomainsize of 700 achieved the best results for two processor tests and a maxdomainsize of 900 returned the best results for three and four processors.

Table 8.1. maxdomainsize effect on solve time (seconds)

maxdomainsize   2 processors   3 processors   4 processors
650             51             57             63
700             47             54             61
750             51             54             62
800             51             52             61
850             50             51             52
900             49             51             51

8.2.5 Maxzeros and Maxsize

The next variables tested were maxzeros and maxsize. These two variables are used to transform the "front tree"; the structure of the factor matrices, as well as the structure of the computations, is controlled by this front tree, so optimizing these parameters is essential to getting the best performance from the parallel libraries [40]. For the following tests all three variables, maxdomainsize, maxzeros, and maxsize, were varied. In the maxdomainsize tests, a maxdomainsize of 700 returned the best results for two processor tests while a maxdomainsize of 900 performed much better for three and four processor tests. The results from these tests are shown in Tables 8.2 and 8.3.

Table 8.2. Solve time (seconds)–700 maxdomainsize

maxdomainsize   maxzeros   maxsize   Two processors   Three processors   Four processors
700             100        32        48.1             58.5               64.7
700             100        48        46.6             57.3               64.6
700             100        64        46.3             56.7               63.0
700             100        80        45.9             56.3               61.5
700             100        96        45.8             57.8               61.0
700             1000       32        47.7             55.6               64.1
700             1000       48        46.4             54.2               61.3
700             1000       64        47.1             54.2               61.2
700             1000       80        45.6             53.7               59.9
700             1000       96        45.5             53.8               58.9
700             2000       32        47.7             55.4               62.6
700             2000       48        46.4             54.2               62.2
700             2000       64        46.2             54.4               61.5
700             2000       80        46.0             53.6               59.6
700             2000       96        45.6             54.1               58.7

Table 8.3. Solve time (seconds)–900 maxdomainsize

maxdomainsize   maxzeros   maxsize   Two processors   Three processors   Four processors
900             100        32        52.7             54.7               54.7
900             100        48        50.6             53.0               52.4
900             100        64        50.0             53.0               52.1
900             100        80        49.8             52.6               52.6
900             100        96        51.4             53.8               52.1
900             1000       32        51.8             52.4               54.3
900             1000       48        50.1             50.9               52.1
900             1000       64        49.6             50.7               51.4
900             1000       80        51.7             51.2               51.0
900             1000       96        51.2             51.7               52.1
900             2000       32        52.1             54.2               56.3
900             2000       48        50.0             51.1               54.3
900             2000       64        49.8             50.5               53.4
900             2000       80        50.0             51.1               52.8
900             2000       96        51.0             52.2               54.4

8.2.6 Final Tests with Optimized Solver

The final tests performed used the optimized solver. From the results in Tables 8.2 and 8.3, the values of maxdomainsize, maxzeros, and maxsize that returned the best results will be used for the final tests. A two processor test will use a maxdomainsize, maxzeros, and maxsize of 700, 1000, and 96 respectively; a three processor test will use 900, 1000, and 64; a four processor test will use 900, 1000, and 80. The optimized parameters and solve times are shown in Table 8.4.

Table 8.4. Results with optimized values

# of processors   maxdomainsize   maxzeros   maxsize   Solve time (seconds)
2                 700             1000       96        45.5
3                 900             1000       64        50.7
4                 900             1000       80        51.0
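One way these tuned values could be wired into p_solver is to select the three front-tree parameters from the number of MPI processes at run time. The program below is a hypothetical, self-contained sketch of that selection; the switch logic and variable names are ours and are not part of the SPOOLES or CalculiX sources. Only the numeric values come from Table 8.4.

/* Hypothetical sketch: choose the Table 8.4 parameters from the process count. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int nproc, maxdomainsize, maxzeros, maxsize;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);

    if (nproc <= 2) {              /* two-processor runs      */
        maxdomainsize = 700; maxzeros = 1000; maxsize = 96;
    } else if (nproc == 3) {       /* three-processor runs    */
        maxdomainsize = 900; maxzeros = 1000; maxsize = 64;
    } else {                       /* four or more processors */
        maxdomainsize = 900; maxzeros = 1000; maxsize = 80;
    }

    printf("%d processes: maxdomainsize=%d maxzeros=%d maxsize=%d\n",
           nproc, maxdomainsize, maxzeros, maxsize);

    MPI_Finalize();
    return 0;
}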

The next set of tests retested the cantilever beam with 5400 equations and 173,196 non-zero entries with MPE logging enabled. This problem was tested in Section 8.2.1 and provided a visualization of the MPI communication. The purpose of retesting this small problem is to give a comparison of the solve process before and after optimization. The first test used two processors and the results are shown in Figure 8.7. From Figure 8.7, the optimized solver requires less communication between nodes and also has around ten percent less solve time for the small test problem. The next test used four processors. From Figure 8.8, the solve time for the optimized solver and also the communication between nodes decreased substantially. The decrease in the number of messages passed, represented by black arrows, can easily be seen in Figure 8.8. The optimization results in around a thirty-five percent decrease in solve time. The final test conducted used a matrix with 436,590 equations and 16,761,861 non-zero entries. The problem was solved with one, two, three, and four processors. The

Figure 8.7. Final optimization results for two processors

Figure 8.8. Final optimization results for four processors

results are summarized in Table 8.5. The results clearly show the benefit of using multiple nodes. When using just a single node, p_solver crashes and is unable to handle the large problem. An error is returned for a one processor test that says the following:

Table 8.5. Results for large test

# of processors   Solve time (seconds)
1                 ERROR
2                 3056
3                 2489
4                 1001

ALLOCATE failure : bytes 1083464, line 517, file DV.c

For the large test problem, the RAM and virtual memory were both filled and the serial solver could not allocate more memory for the one processor test. By using multiple processors the problem is split, and the workload and memory requirements are divided among the nodes. From the results in Table 8.5, using four processors for large tests returns more than a three-fold improvement over using two processors. Even though there is much more communication when using four processors, the communication penalty is offset by the additional processing power, and the additional RAM reduces the need to use slow virtual memory.
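A rough back-of-the-envelope estimate shows why a single node fills up quickly; the 8 bytes per value plus roughly 4 bytes per index assumed here are our own assumption, not figures from the solver. Storing just the non-zero entries of A already requires

\[
16{,}761{,}861 \times (8 + 4)\ \text{bytes} \approx 2.0 \times 10^{8}\ \text{bytes} \approx 0.2\ \text{GB},
\]

and the L and U factors hold many times more non-zeros than A because of fill-in, in addition to the working storage needed during factorization.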

8.3 Conclusions

There are several conclusions that can be drawn from the optimization process. First, a network tailored to the particular type of problem that will be solved is very important. Problems that split the work evenly, send it to the nodes, and simply return the results without any need for interprocessor communication perform better than problems that require a lot of interprocessor communication. The test program cpi did not require a lot of communication between nodes and scaled very well compared to p_solver, which did require a lot of communication between nodes. The communication comparison between these two types of tests is illustrated in the logs created by MPE, Figures 8.2 and 8.5. A 100 Mbps Ethernet network is sufficient for a problem such as cpi, but for problems that require heavy communication, high-bandwidth and low-latency network hardware such as Myrinet [21] is essential for the performance to scale well and to achieve acceptable solve times. The performance of a cluster is also very dependent on well written software that is optimized for the available network and computer hardware. This is well illustrated by the optimization process for p_solver. For the small test problem, around a ten percent performance gain for a two processor system and around a thirty-five percent performance increase for four nodes was achieved by optimizing p_solver for our network and computer architecture. By tuning the granularity of the Level 3 BLAS computations in the factorization and solve process through the maxsize parameter, the performance of p_solver is greatly increased. Although a maxsize of 64 is the recommended value for most systems, primarily MPPs, changing the maxsize value to 96 for two processors and to 80 for four processors tailors p_solver to our system. Although settings and program coding may be optimized for one architecture, this does not guarantee that the program will be optimized and perform well on another system.

8.3.1 Recommendations

In order to achieve better performance from our cluster, more RAM and a faster network should be purchased. Unlike an iterative solver for systems of equations, a direct solver stores the entire problem in RAM, and when the RAM is filled, slow virtual memory is used. By increasing the RAM, more of the problem can be stored in RAM, decreasing the dependence on the hard disk. From Table 8.5, the increased RAM available when using four processors disproportionately outperforms using just half of the processors. With more RAM for each node, increased performance should be expected. The network on which our cluster communicates should also be upgraded. For problems that do not require a lot of communication, such as cpi, a 100 Mbps Ethernet is sufficient. Because of the heavy communication in solving systems of equations, however, a high performance network is almost essential for acceptable performance. With the prices of network equipment always decreasing, the return on investment becomes more attractive. If a high performance network were installed, a diskless cluster could also be set up, Figure 8.9. A diskless cluster offers the benefits of easy system administration, a lower cost of ownership, decreased noise and heat production, and less power consumption [50]. With a diskless cluster, because all nodes share common storage, updates can be applied to a single filesystem, backups can also be made from a single filesystem, and with the cost per megabyte lower for large drives, using large capacity drives for storage becomes cheaper than using multiple small drives for each individual node. A disadvantage of a diskless cluster is that it requires more interprocessor communication. With a high performance network, however, interprocessor communication should become less of a limiting factor.

Figure 8.9. Diskless cluster

APPENDIX A CPI SOURCE CODE

Below is the source code for the test example cpi in Section 2.3.6.

#include "mpi.h" #include #include double f(double); double f(double a) { return (4.0 / (1.0 + a*a)); } int main(int argc,char *argv[]) { int done = 0, n, myid, numprocs, i; double PI25DT = 3.141592653589793238462643; double mypi, pi, h, sum, x; double startwtime = 0.0, endwtime; int namelen; char processor_name[MPI_MAX_PROCESSOR_NAME];

MPI_Init(&argc,&argv); MPI_Comm_size(MPI_COMM_WORLD,&numprocs); MPI_Comm_rank(MPI_COMM_WORLD,&myid); MPI_Get_processor_name(processor_name,&namelen);

fprintf(stdout,"Process %d of %d on %s\n", myid, numprocs, processor_name);

n = 0; while (!done) { if (myid == 0) { printf("Enter the number of intervals: (0 quits) "); scanf("%d",&n);

startwtime = MPI_Wtime(); } MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

if (n == 0) done = 1; else { h = 1.0 / (double) n; sum = 0.0;

for (i = myid + 1; i <= n; i += numprocs) { x = h * ((double)i - 0.5 ); sum += f(x); } mypi = h * sum;

170 171

MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

if (myid == 0) { printf("pi is approximately %.16f, Error is %.16f\n", pi, fabs(pi - PI25DT)); endwtime = MPI_Wtime(); printf("wall clock time = %f\n", endwtime-startwtime); fflush( stdout ); } } } MPI_Finalize(); return 0; } APPENDIX B BENCHMARKING RESULTS

The sections below list the results from the benchmarking tests performed in Chapter 3.

B.1 NetPIPE Results

The first set of NetPIPE results is from testing the network with MPI overhead. The first column lists the test number, the second column the message size, the third column how many messages were sent, the fourth the throughput, and the last column the round-trip time of the messages divided by two.

redboots@apollo> mpirun -np 2 NPmpi -o np.out.mpi 46: 1533 bytes 205 times--> 36.05 Mbps in 324.42usec 0: apollo 47: 1536 bytes 205 times--> 36.33 Mbps in 322.52usec 1: hydra4 48: 1539 bytes 206 times--> 36.17 Mbps in 324.62usec Now starting the main loop 49: 2045 bytes 103 times--> 42.39 Mbps in 368.03usec 0: 1 bytes 1628 times--> 0.13 Mbps in 60.98usec 50: 2048 bytes 135 times--> 42.81 Mbps in 365.00usec 1: 2 bytes 1639 times--> 0.25 Mbps in 60.87usec 51: 2051 bytes 137 times--> 42.46 Mbps in 368.53usec 2: 3 bytes 1642 times--> 0.37 Mbps in 61.07usec 52: 3069 bytes 136 times--> 51.31 Mbps in 456.36usec 3: 4 bytes 1091 times--> 0.50 Mbps in 61.46usec 53: 3072 bytes 146 times--> 51.91 Mbps in 451.47usec 4: 6 bytes 1220 times--> 0.74 Mbps in 61.48usec 54: 3075 bytes 147 times--> 51.42 Mbps in 456.29usec 5: 8 bytes 813 times--> 0.99 Mbps in 61.86usec 55: 4093 bytes 73 times--> 57.76 Mbps in 540.60usec 6: 12 bytes 1010 times--> 1.46 Mbps in 62.53usec 56: 4096 bytes 92 times--> 58.49 Mbps in 534.29usec 7: 13 bytes 666 times--> 1.58 Mbps in 62.66usec 57: 4099 bytes 93 times--> 57.81 Mbps in 540.98usec 8: 16 bytes 736 times--> 1.93 Mbps in 63.15usec 58: 6141 bytes 92 times--> 65.86 Mbps in 711.39usec 9: 19 bytes 890 times--> 2.27 Mbps in 63.73usec 59: 6144 bytes 93 times--> 66.78 Mbps in 701.92usec 10: 21 bytes 991 times--> 2.50 Mbps in 64.09usec 60: 6147 bytes 95 times--> 65.86 Mbps in 712.08usec 11: 24 bytes 1040 times--> 2.84 Mbps in 64.52usec 61: 8189 bytes 46 times--> 68.73 Mbps in 909.03usec 12: 27 bytes 1097 times--> 3.16 Mbps in 65.11usec 62: 8192 bytes 54 times--> 69.74 Mbps in 896.18usec 13: 29 bytes 682 times--> 3.38 Mbps in 65.50usec 63: 8195 bytes 55 times--> 68.77 Mbps in 909.13usec 14: 32 bytes 737 times--> 3.70 Mbps in 65.94usec 64: 12285 bytes 55 times--> 74.86 Mbps in 1251.99usec 15: 35 bytes 805 times--> 4.01 Mbps in 66.56usec 65: 12288 bytes 53 times--> 76.00 Mbps in 1233.56usec 16: 45 bytes 858 times--> 5.02 Mbps in 68.43usec 66: 12291 bytes 54 times--> 74.87 Mbps in 1252.53usec 17: 48 bytes 974 times--> 5.33 Mbps in 68.72usec 67: 16381 bytes 26 times--> 77.13 Mbps in 1620.44usec 18: 51 bytes 1000 times--> 5.60 Mbps in 69.43usec 68: 16384 bytes 30 times--> 79.39 Mbps in 1574.45usec 19: 61 bytes 564 times--> 6.52 Mbps in 71.40usec 69: 16387 bytes 31 times--> 78.86 Mbps in 1585.31usec 20: 64 bytes 688 times--> 6.76 Mbps in 72.28usec 70: 24573 bytes 31 times--> 81.03 Mbps in 2313.56usec 21: 67 bytes 713 times--> 7.03 Mbps in 72.72usec 71: 24576 bytes 28 times--> 81.61 Mbps in 2297.59usec 22: 93 bytes 738 times--> 9.16 Mbps in 77.46usec 72: 24579 bytes 29 times--> 81.05 Mbps in 2313.69usec 23: 96 bytes 860 times--> 9.42 Mbps in 77.73usec 73: 32765 bytes 14 times--> 83.01 Mbps in 3011.39usec 24: 99 bytes 871 times--> 9.62 Mbps in 78.48usec 74: 32768 bytes 16 times--> 83.61 Mbps in 2990.00usec 25: 125 bytes 463 times--> 11.52 Mbps in 82.80usec 75: 32771 bytes 16 times--> 83.00 Mbps in 3012.31usec 26: 128 bytes 599 times--> 11.69 Mbps in 83.51usec 76: 49149 bytes 16 times--> 85.03 Mbps in 4410.13usec 27: 131 bytes 608 times--> 11.86 Mbps in 84.28usec 77: 49152 bytes 15 times--> 85.60 Mbps in 4380.67usec 28: 189 bytes 615 times--> 15.27 Mbps in 94.46usec 78: 49155 bytes 15 times--> 85.00 Mbps in 4411.87usec 29: 192 bytes 705 times--> 15.52 Mbps in 94.39usec 79: 65533 bytes 7 times--> 86.08 Mbps in 5808.57usec 30: 195 bytes 711 times--> 15.58 Mbps in 95.50usec 80: 65536 bytes 8 times--> 86.70 Mbps in 5767.06usec 31: 253 bytes 365 times--> 18.25 Mbps in 105.75usec 81: 65539 bytes 8 
times--> 86.02 Mbps in 5812.68usec 32: 256 bytes 470 times--> 18.38 Mbps in 106.28usec 82: 98301 bytes 8 times--> 86.82 Mbps in 8638.06usec 33: 259 bytes 474 times--> 18.49 Mbps in 106.88usec 83: 98304 bytes 7 times--> 87.40 Mbps in 8581.43usec 34: 381 bytes 476 times--> 22.61 Mbps in 128.59usec 84: 98307 bytes 7 times--> 86.81 Mbps in 8639.50usec 35: 384 bytes 518 times--> 22.76 Mbps in 128.75usec 85: 131069 bytes 3 times--> 86.30 Mbps in 11586.99usec 36: 387 bytes 519 times--> 22.73 Mbps in 129.90usec 86: 131072 bytes 4 times--> 86.85 Mbps in 11513.74usec 37: 509 bytes 262 times--> 25.57 Mbps in 151.85usec 87: 131075 bytes 4 times--> 86.39 Mbps in 11575.89usec 38: 512 bytes 328 times--> 25.71 Mbps in 151.94usec 88: 196605 bytes 4 times--> 86.63 Mbps in 17314.26usec 39: 515 bytes 330 times--> 25.68 Mbps in 153.02usec 89: 196608 bytes 3 times--> 86.88 Mbps in 17264.84usec 40: 765 bytes 329 times--> 29.51 Mbps in 197.76usec 90: 196611 bytes 3 times--> 86.68 Mbps in 17305.98usec 41: 768 bytes 337 times--> 29.69 Mbps in 197.34usec 91: 262141 bytes 3 times--> 86.30 Mbps in 23174.34usec 42: 771 bytes 338 times--> 29.58 Mbps in 198.83usec 92: 262144 bytes 3 times--> 86.34 Mbps in 23164.51usec 43: 1021 bytes 169 times--> 32.00 Mbps in 243.39usec 93: 262147 bytes 3 times--> 86.38 Mbps in 23154.16usec 44: 1024 bytes 205 times--> 32.19 Mbps in 242.69usec 94: 393213 bytes 3 times--> 86.34 Mbps in 34744.65usec 45: 1027 bytes 206 times--> 32.09 Mbps in 244.15usec 95: 393216 bytes 3 times--> 86.34 Mbps in 34747.33usec


96: 393219 bytes 3 times--> 86.37 Mbps in 34735.02usec 111: 2097155 bytes 3 times--> 87.19 Mbps in 183506.19usec 97: 524285 bytes 3 times--> 86.61 Mbps in 46183.49usec 112: 3145725 bytes 3 times--> 87.14 Mbps in 275412.17usec 98: 524288 bytes 3 times--> 86.58 Mbps in 46199.16usec 113: 3145728 bytes 3 times--> 87.14 Mbps in 275419.01usec 99: 524291 bytes 3 times--> 86.61 Mbps in 46182.65usec 114: 3145731 bytes 3 times--> 87.26 Mbps in 275046.49usec 100: 786429 bytes 3 times--> 86.88 Mbps in 69064.00usec 115: 4194301 bytes 3 times--> 87.18 Mbps in 367065.66usec 101: 786432 bytes 3 times--> 86.84 Mbps in 69089.16usec 116: 4194304 bytes 3 times--> 87.16 Mbps in 367126.34usec 102: 786435 bytes 3 times--> 86.92 Mbps in 69027.50usec 117: 4194307 bytes 3 times--> 87.30 Mbps in 366560.66usec 103: 1048573 bytes 3 times--> 87.04 Mbps in 91907.51usec 118: 6291453 bytes 3 times--> 87.24 Mbps in 550221.68usec 104: 1048576 bytes 3 times--> 87.00 Mbps in 91949.85usec 119: 6291456 bytes 3 times--> 87.21 Mbps in 550399.18usec 105: 1048579 bytes 3 times--> 86.96 Mbps in 91994.35usec 120: 6291459 bytes 3 times--> 87.35 Mbps in 549535.67usec 106: 1572861 bytes 3 times--> 87.01 Mbps in 137916.65usec 121: 8388605 bytes 3 times--> 87.32 Mbps in 732942.65usec 107: 1572864 bytes 3 times--> 87.00 Mbps in 137927.82usec 122: 8388608 bytes 3 times--> 87.29 Mbps in 733149.68usec 108: 1572867 bytes 3 times--> 87.11 Mbps in 137763.49usec 123: 8388611 bytes 3 times--> 87.37 Mbps in 732529.83usec 109: 2097149 bytes 3 times--> 87.09 Mbps in 183727.48usec 110: 2097152 bytes 3 times--> 87.04 Mbps in 183826.49usec

B.2 NetPIPE TCP Results

The results below are for testing the network with just TCP overhead; the MPI overhead has been removed. The steps for performing this test are found in Section 3.2.3.

Send and receive buffers are 16384 and 87380 bytes 51: 2051 bytes 146 times--> 45.57 Mbps in 343.41usec (A bug in Linux doubles the requested buffer sizes) 52: 3069 bytes 145 times--> 54.56 Mbps in 429.13usec Now starting the main loop 53: 3072 bytes 155 times--> 54.70 Mbps in 428.45usec 0: 1 bytes 2454 times--> 0.19 Mbps in 40.45usec 54: 3075 bytes 155 times--> 54.69 Mbps in 428.94usec 1: 2 bytes 2472 times--> 0.38 Mbps in 40.02usec 55: 4093 bytes 77 times--> 60.32 Mbps in 517.71usec 2: 3 bytes 2499 times--> 0.57 Mbps in 40.50usec 56: 4096 bytes 96 times--> 60.36 Mbps in 517.74usec 3: 4 bytes 1645 times--> 0.75 Mbps in 40.51usec 57: 4099 bytes 96 times--> 60.34 Mbps in 518.28usec 4: 6 bytes 1851 times--> 1.12 Mbps in 41.03usec 58: 6141 bytes 96 times--> 68.00 Mbps in 689.04usec 5: 8 bytes 1218 times--> 1.47 Mbps in 41.64usec 59: 6144 bytes 96 times--> 68.04 Mbps in 688.98usec 6: 12 bytes 1500 times--> 2.18 Mbps in 42.05usec 60: 6147 bytes 96 times--> 68.04 Mbps in 689.27usec 7: 13 bytes 990 times--> 2.33 Mbps in 42.54usec 61: 8189 bytes 48 times--> 71.93 Mbps in 868.57usec 8: 16 bytes 1084 times--> 2.86 Mbps in 42.64usec 62: 8192 bytes 57 times--> 71.94 Mbps in 868.83usec 9: 19 bytes 1319 times--> 3.34 Mbps in 43.36usec 63: 8195 bytes 57 times--> 71.95 Mbps in 868.97usec 10: 21 bytes 1456 times--> 3.67 Mbps in 43.67usec 64: 12285 bytes 57 times--> 77.49 Mbps in 1209.46usec 11: 24 bytes 1526 times--> 4.15 Mbps in 44.12usec 65: 12288 bytes 55 times--> 77.50 Mbps in 1209.66usec 12: 27 bytes 1605 times--> 4.61 Mbps in 44.65usec 66: 12291 bytes 55 times--> 77.49 Mbps in 1210.15usec 13: 29 bytes 995 times--> 4.90 Mbps in 45.14usec 67: 16381 bytes 27 times--> 80.05 Mbps in 1561.30usec 14: 32 bytes 1069 times--> 5.35 Mbps in 45.60usec 68: 16384 bytes 32 times--> 80.09 Mbps in 1560.75usec 15: 35 bytes 1165 times--> 5.77 Mbps in 46.28usec 69: 16387 bytes 32 times--> 80.09 Mbps in 1561.09usec 16: 45 bytes 1234 times--> 7.11 Mbps in 48.30usec 70: 24573 bytes 32 times--> 82.95 Mbps in 2260.00usec 17: 48 bytes 1380 times--> 7.56 Mbps in 48.43usec 71: 24576 bytes 29 times--> 82.95 Mbps in 2260.29usec 18: 51 bytes 1419 times--> 7.93 Mbps in 49.05usec 72: 24579 bytes 29 times--> 82.93 Mbps in 2261.17usec 19: 61 bytes 799 times--> 9.11 Mbps in 51.07usec 73: 32765 bytes 14 times--> 84.68 Mbps in 2951.85usec 20: 64 bytes 963 times--> 9.47 Mbps in 51.54usec 74: 32768 bytes 16 times--> 84.69 Mbps in 2952.09usec 21: 67 bytes 1000 times--> 9.79 Mbps in 52.21usec 75: 32771 bytes 16 times--> 84.69 Mbps in 2952.38usec 22: 93 bytes 1029 times--> 12.44 Mbps in 57.02usec 76: 49149 bytes 16 times--> 86.13 Mbps in 4353.53usec 23: 96 bytes 1169 times--> 12.79 Mbps in 57.28usec 77: 49152 bytes 15 times--> 86.13 Mbps in 4354.13usec 24: 99 bytes 1182 times--> 13.04 Mbps in 57.92usec 78: 49155 bytes 15 times--> 86.14 Mbps in 4353.77usec 25: 125 bytes 627 times--> 15.22 Mbps in 62.64usec 79: 65533 bytes 7 times--> 87.16 Mbps in 5736.14usec 26: 128 bytes 791 times--> 15.41 Mbps in 63.36usec 80: 65536 bytes 8 times--> 87.17 Mbps in 5735.75usec 27: 131 bytes 801 times--> 15.60 Mbps in 64.06usec 81: 65539 bytes 8 times--> 87.16 Mbps in 5736.81usec 28: 189 bytes 810 times--> 19.22 Mbps in 75.01usec 82: 98301 bytes 8 times--> 87.92 Mbps in 8530.38usec 29: 192 bytes 888 times--> 19.38 Mbps in 75.58usec 83: 98304 bytes 7 times--> 87.92 Mbps in 8530.44usec 30: 195 bytes 889 times--> 19.38 Mbps in 76.77usec 84: 98307 bytes 7 times--> 87.92 Mbps in 8530.51usec 31: 253 bytes 454 times--> 22.33 Mbps in 86.46usec 85: 131069 bytes 3 
times--> 88.43 Mbps in 11308.34usec 32: 256 bytes 576 times--> 22.52 Mbps in 86.73usec 86: 131072 bytes 4 times--> 88.44 Mbps in 11306.75usec 33: 259 bytes 581 times--> 22.60 Mbps in 87.42usec 87: 131075 bytes 4 times--> 88.43 Mbps in 11308.88usec 34: 381 bytes 582 times--> 26.75 Mbps in 108.65usec 88: 196605 bytes 4 times--> 88.83 Mbps in 16886.00usec 35: 384 bytes 613 times--> 26.76 Mbps in 109.46usec 89: 196608 bytes 3 times--> 88.85 Mbps in 16882.83usec 36: 387 bytes 611 times--> 26.77 Mbps in 110.31usec 90: 196611 bytes 3 times--> 88.85 Mbps in 16883.49usec 37: 509 bytes 309 times--> 29.52 Mbps in 131.53usec 91: 262141 bytes 3 times--> 89.03 Mbps in 22463.82usec 38: 512 bytes 379 times--> 29.58 Mbps in 132.05usec 92: 262144 bytes 3 times--> 89.03 Mbps in 22463.32usec 39: 515 bytes 380 times--> 29.65 Mbps in 132.50usec 93: 262147 bytes 3 times--> 89.03 Mbps in 22464.17usec 40: 765 bytes 381 times--> 33.19 Mbps in 175.83usec 94: 393213 bytes 3 times--> 89.29 Mbps in 33599.65usec 41: 768 bytes 379 times--> 33.21 Mbps in 176.43usec 95: 393216 bytes 3 times--> 89.29 Mbps in 33600.17usec 42: 771 bytes 378 times--> 33.27 Mbps in 176.82usec 96: 393219 bytes 3 times--> 89.29 Mbps in 33598.82usec 43: 1021 bytes 190 times--> 35.24 Mbps in 221.04usec 97: 524285 bytes 3 times--> 89.40 Mbps in 44740.16usec 44: 1024 bytes 225 times--> 35.25 Mbps in 221.62usec 98: 524288 bytes 3 times--> 89.40 Mbps in 44741.81usec 45: 1027 bytes 226 times--> 35.14 Mbps in 222.98usec 99: 524291 bytes 3 times--> 89.41 Mbps in 44740.48usec 46: 1533 bytes 225 times--> 38.33 Mbps in 305.15usec 100: 786429 bytes 3 times--> 89.53 Mbps in 67012.98usec 47: 1536 bytes 218 times--> 38.42 Mbps in 304.99usec 101: 786432 bytes 3 times--> 89.54 Mbps in 67012.85usec 48: 1539 bytes 218 times--> 38.45 Mbps in 305.38usec 102: 786435 bytes 3 times--> 89.54 Mbps in 67012.31usec 49: 2045 bytes 109 times--> 45.56 Mbps in 342.44usec 103: 1048573 bytes 3 times--> 89.60 Mbps in 89289.52usec 50: 2048 bytes 145 times--> 45.59 Mbps in 342.70usec 104: 1048576 bytes 3 times--> 89.60 Mbps in 89290.17usec 174

105: 1048579 bytes 3 times--> 89.59 Mbps in 89290.98usec 115: 4194301 bytes 3 times--> 89.74 Mbps in 356587.84usec 106: 1572861 bytes 3 times--> 89.66 Mbps in 133836.16usec 116: 4194304 bytes 3 times--> 89.74 Mbps in 356588.32usec 107: 1572864 bytes 3 times--> 89.66 Mbps in 133835.48usec 117: 4194307 bytes 3 times--> 89.74 Mbps in 356588.50usec 108: 1572867 bytes 3 times--> 89.66 Mbps in 133835.67usec 118: 6291453 bytes 3 times--> 89.75 Mbps in 534800.34usec 109: 2097149 bytes 3 times--> 89.69 Mbps in 178382.67usec 119: 6291456 bytes 3 times--> 89.75 Mbps in 534797.50usec 110: 2097152 bytes 3 times--> 89.70 Mbps in 178379.85usec 120: 6291459 bytes 3 times--> 89.75 Mbps in 534798.65usec 111: 2097155 bytes 3 times--> 89.70 Mbps in 178381.51usec 121: 8388605 bytes 3 times--> 89.76 Mbps in 712997.33usec 112: 3145725 bytes 3 times--> 89.72 Mbps in 267488.52usec 122: 8388608 bytes 3 times--> 89.76 Mbps in 713000.15usec 113: 3145728 bytes 3 times--> 89.72 Mbps in 267485.17usec 123: 8388611 bytes 3 times--> 89.76 Mbps in 713000.17usec 114: 3145731 bytes 3 times--> 89.72 Mbps in 267486.83usec

B.3 High Performance Linpack

The following sections list the important files for compiling and running HPL and also give some results. The steps for installing HPL are described in Section 3.3.1.

B.3.1 HPL Makefiles

Below is the Makefile that was used to compile HPL. This file is the first input that is read when the user types make; it is named Makefile and is located in /home/apollo/hda8/hpl/.

# # -- High Performance Computing Linpack Benchmark (HPL) # HPL - 1.0a - January 20, 2004 # Antoine P. Petitet # University of Tennessee, Knoxville # Innovative Computing Laboratories # (C) Copyright 2000-2004 All Rights Reserved # # -- Copyright notice and Licensing terms: # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions # are met: # # 1. Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # # 2. Redistributions in binary form must reproduce the above copyright # notice, this list of conditions, and the following disclaimer in the # documentation and/or other materials provided with the distribution. # # 3. All advertising materials mentioning features or use of this # software must display the following acknowledgement: # This product includes software developed at the University of # Tennessee, Knoxville, Innovative Computing Laboratories. # # 4. The name of the University, the name of the Laboratory, or the # names of its contributors may not be used to endorse or promote # products derived from this software without specific written # permission. # # -- Disclaimer: # # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS # ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT # LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR 175

# A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY # OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, # SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT # LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, # DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY # THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. # ###################################################################### # # SHELL = /bin/sh # arch = Linux_P4 # ## Targets ############################################################# # all : install # # ###################################################################### # install : startup refresh build # startup : $(MAKE) -f Make.top startup_dir arch=$(arch) $(MAKE) -f Make.top startup_src arch=$(arch) $(MAKE) -f Make.top startup_tst arch=$(arch) $(MAKE) -f Make.top refresh_src arch=$(arch) $(MAKE) -f Make.top refresh_tst arch=$(arch) # refresh : $(MAKE) -f Make.top refresh_src arch=$(arch) $(MAKE) -f Make.top refresh_tst arch=$(arch) # build : $(MAKE) -f Make.top build_src arch=$(arch) $(MAKE) -f Make.top build_tst arch=$(arch) # clean : $(MAKE) -f Make.top clean_src arch=$(arch) $(MAKE) -f Make.top clean_tst arch=$(arch) # clean_arch : $(MAKE) -f Make.top clean_arch_src arch=$(arch) $(MAKE) -f Make.top clean_arch_tst arch=$(arch) # clean_arch_all : $(MAKE) -f Make.top clean_arch_all arch=$(arch) # clean_guard : $(MAKE) -f Make.top clean_guard_src arch=$(arch) $(MAKE) -f Make.top clean_guard_tst arch=$(arch) # # ######################################################################

The next file specifies which BLAS routines to use, the location of the MPI install directory, and which compiler to use. This file is named Make.Linux_P4 and is located in /home/apollo/hda8/hpl/.

# -- High Performance Computing Linpack Benchmark (HPL) 176

# HPL - 1.0a - January 20, 2004 # Antoine P. Petitet # University of Tennessee, Knoxville # Innovative Computing Laboratories # (C) Copyright 2000-2004 All Rights Reserved # # -- Copyright notice and Licensing terms: # # Redistribution and use in source and binary forms, with or without # modification, are permitted provided that the following conditions # are met: # # 1. Redistributions of source code must retain the above copyright # notice, this list of conditions and the following disclaimer. # # 2. Redistributions in binary form must reproduce the above copyright # notice, this list of conditions, and the following disclaimer in the # documentation and/or other materials provided with the distribution. # # 3. All advertising materials mentioning features or use of this # software must display the following acknowledgement: # This product includes software developed at the University of # Tennessee, Knoxville, Innovative Computing Laboratories. # # 4. The name of the University, the name of the Laboratory, or the # names of its contributors may not be used to endorse or promote # products derived from this software without specific written # permission. # # -- Disclaimer: # # THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS # ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT # LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR # A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE UNIVERSITY # OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, # SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT # LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, # DATA OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY # THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. # ###################################################################### # # ------# - shell ------# ------# SHELL = /bin/tcsh # CD = cd CP = cp LN_S = ln -s MKDIR = mkdir RM = /bin/rm -f TOUCH = touch # # ------# - Platform identifier ------# ------# ARCH = Linux_P4 # # ------# - HPL Directory Structure / HPL library ------177

# ------# HOME = /home/apollo/hda8 TOPdir = $(HOME)/hpl INCdir = $(TOPdir)/include BINdir = $(TOPdir)/bin/$(ARCH) LIBdir = $(TOPdir)/lib/$(ARCH) # HPLlib = $(LIBdir)/libhpl.a # # ------# - Message Passing library (MPI) ------# ------# MPinc tells the C compiler where to find the Message Passing library # header files, MPlib is defined to be the name of the library to be # used. The variable MPdir is only used for defining MPinc and MPlib. # MPdir = /home/apollo/hda8/mpich-1.2.5.2 MPinc = -I$(MPdir)/include MPlib = $(MPdir)/lib/libmpich.a # # ------# - Linear Algebra library (BLAS or VSIPL) ------# ------# LAinc tells the C compiler where to find the Linear Algebra library # header files, LAlib is defined to be the name of the library to be # used. The variable LAdir is only used for defining LAinc and LAlib. # #Wed, 19 Jan 2005, 22:29:33 EST # Below the user has a choice of using either the ATLAS or Goto # BLAS routines. To use the ATLAS routines, uncomment the # following 2 lines and comment the 3rd and 4th. To use Goto's BLAS # routines, comment the first 2 lines and uncomment line 3rd and 4th.

# BEGIN BLAS specification LAdir = /home/apollo/hda8/goto_blas LAlib = $(LAdir)/libgoto_p4_512-r0.96.so $(LAdir)/xerbla.o #LAdir = /home/apollo/hda8/Linux_P4SSE2/lib #LAlib = $(LAdir)/libcblas.a $(LAdir)/libatlas.a # END BLAS specification

# # ------# - F77 / C interface ------# ------# You can skip this section if and only if you are not planning to use # a BLAS library featuring a Fortran 77 interface. Otherwise, it is # necessary to fill out the F2CDEFS variable with the appropriate # options. **One and only one** option should be chosen in **each** of # the 3 following categories: # # 1) name space (How C calls a Fortran 77 routine) # # -DAdd_ : all lower case and a suffixed underscore (Suns, # Intel, ...), [default] # -DNoChange : all lower case (IBM RS6000), # -DUpCase : all upper case (), # -DAdd__ : the FORTRAN compiler in use is f2c. # # 2) C and Fortran 77 integer mapping # # -DF77_INTEGER=int : Fortran 77 INTEGER is a C int, [default] # -DF77_INTEGER=long : Fortran 77 INTEGER is a C long, # -DF77_INTEGER=short : Fortran 77 INTEGER is a C short. # 178

# 3) Fortran 77 string handling # # -DStringSunStyle : The string address is passed at the string loca- # tion on the stack, and the string length is then # passed as an F77_INTEGER after all explicit # stack arguments, [default] # -DStringStructPtr : The address of a structure is passed by a # Fortran 77 string, and the structure is of the # form: struct {char *cp; F77_INTEGER len;}, # -DStringStructVal : A structure is passed by value for each Fortran # 77 string, and the structure is of the form: # struct {char *cp; F77_INTEGER len;}, # -DStringCrayStyle : Special option for Cray machines, which uses # Cray fcd (fortran character descriptor) for # interoperation. # F2CDEFS = # # ------# - HPL includes / libraries / specifics ------# ------# HPL_INCLUDES = -I$(INCdir) -I$(INCdir)/$(ARCH) $(LAinc) $(MPinc) HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib) # # - Compile time options ------# # -DHPL_COPY_L force the copy of the panel L before bcast; # -DHPL_CALL_CBLAS call the cblas interface; # -DHPL_CALL_VSIPL call the vsip library; # -DHPL_DETAILED_TIMING enable detailed timers; # # By default HPL will: # *) not copy L before broadcast, # *) call the BLAS Fortran 77 interface, # *) not display detailed timing information. # #HPL_OPTS = -DHPL_CALL_CBLAS # # ------# HPL_DEFS = $(F2CDEFS) $(HPL_OPTS) $(HPL_INCLUDES) # # ------# - Compilers / linkers - Optimization flags ------# ------# CC = /usr/bin/mpicc CCNOOPT = $(HPL_DEFS) CCFLAGS = $(HPL_DEFS) -fomit-frame-pointer -O -funroll-loops #$(HPL_DEFS) # # On some platforms, it is necessary to use the Fortran linker to find # the Fortran internals used in the BLAS library. # LINKER = /usr/bin/mpif77 LINKFLAGS = $(CCFLAGS) -lm # ARCHIVER = ar ARFLAGS = r RANLIB = echo # # ------179

B.3.2 HPL.dat File

The section below lists an example HPL.dat file that was used for the benchmarking tests.

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out         output file name (if any)
1               device out (6=stdout,7=stderr,file)
1               # of problems sizes (N)
1000            Ns
6               # of NBs
32 64 96 128 160 192  NBs
1               PMAP process mapping (0=Row-,1=Column-major)
1               # of process grids (P x Q)
1               Ps
1               Qs
-16.0           threshold
3               # of panel fact
0 1 2           PFACTs (0=left, 1=Crout, 2=Right)
4               # of recursive stopping criterium
1 2 4 8         NBMINs (>= 1)
3               # of panels in recursion
2 3 4           NDIVs
3               # of recursive panel fact.
0 1 2           RFACTs (0=left, 1=Crout, 2=Right)
1               # of broadcast
0               BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1               # of lookahead depth
1               DEPTHs (>=0)
2               SWAP (0=bin-exch,1=long,2=mix)
64              swapping threshold
0               L1 in (0=transposed,1=no-transposed) form
0               U in (0=transposed,1=no-transposed) form
1               Equilibration (0=no,1=yes)
8               memory alignment in double (> 0)

B.3.3 First Test Results with ATLAS

Below are the results from the first test using the ATLAS libraries.

====== HPLinpack 1.0a -- High-Performance Linpack benchmark -- January 20, 2004 Written by A. Petitet and R. Clint Whaley, Innovative Computing Labs., UTK ======

An explanation of the input/output parameters follows: T/V : Wall time / encoded variant. N : The order of the coefficient matrix A. NB : The partitioning blocking factor. P : The number of process rows. Q : The number of process columns. Time : Time in seconds to solve the linear system. Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N : 1000 NB : 32 64 96 128 160 192 PMAP : Column-major process mapping P : 1 Q : 1 PFACT : Left Crout Right NBMIN : 1 2 4 8 NDIV : 2 3 4 RFACT : Left Crout Right BCAST : 1ring DEPTH : 1 SWAP : Mix (threshold = 64) L1 : transposed form U : transposed form EQUIL : yes ALIGN : 8 double precision words

[The complete per-test listing (T/V, N, NB, P, Q, Time, Gflops for all 648 combinations of PFACT, NBMIN, NDIV, and RFACT at each block size) is not reproduced here. Performance rose from roughly 1.26 Gflops at NB = 32 to approximately 1.9 Gflops at NB = 160, with NB = 192 slightly slower.]

Finished 648 tests with the following results: 648 tests completed without checking, 0 tests skipped because of illegal input values.

End of Tests. ======

B.3.4 HPL.dat for Second Test with ATLAS Libraries

Below is the HPL.dat file used for the second test.

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
1            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
1000         Ns
1            # of NBs
160          NBs
1            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
1            Ps
1            Qs
-16.0        threshold
3            # of panel fact
0 1 2        PFACTs (0=left, 1=Crout, 2=Right)
4            # of recursive stopping criterium
1 2 4 8      NBMINs (>= 1)
3            # of panels in recursion
2 3 4        NDIVs
3            # of recursive panel fact.
0 1 2        RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
0            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)

B.3.5 Second Test Results with ATLAS

Below are the results of the second test using the ATLAS libraries.

======HPLinpack 1.0a -- High-Performance Linpack benchmark -- January 20, 2004 Written by A. Petitet and R. Clint Whaley, Innovative Computing Labs., UTK ======

An explanation of the input/output parameters follows: T/V : Wall time / encoded variant. N : The order of the coefficient matrix A. NB : The partitioning blocking factor. P : The number of process rows. Q : The number of process columns. Time : Time in seconds to solve the linear system. Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N : 1000 NB : 160 PMAP : Column-major process mapping P : 1 Q : 1 PFACT : Left Crout Right NBMIN : 1 2 4 8 NDIV : 2 3 4 RFACT : Left Crout Right BCAST : 1ring DEPTH : 1 SWAP : Mix (threshold = 64) L1 : transposed form

U : transposed form EQUIL : yes ALIGN : 8 double precision words

======T/V N NB P Q Time Gflops ------WC10L2L1 1000 160 1 1 0.36 1.876e+00 WC10L3L1 1000 160 1 1 0.36 1.866e+00 WC10L4L1 1000 160 1 1 0.36 1.861e+00 WC10L2L2 1000 160 1 1 0.35 1.888e+00 WC10L3L2 1000 160 1 1 0.37 1.828e+00 WC10L4L2 1000 160 1 1 0.36 1.864e+00 WC10L2L4 1000 160 1 1 0.35 1.900e+00 WC10L3L4 1000 160 1 1 0.36 1.869e+00 WC10L4L4 1000 160 1 1 0.36 1.868e+00 WC10L2L8 1000 160 1 1 0.35 1.896e+00 WC10L3L8 1000 160 1 1 0.36 1.865e+00 WC10L4L8 1000 160 1 1 0.35 1.886e+00 WC10L2C1 1000 160 1 1 0.35 1.884e+00 WC10L3C1 1000 160 1 1 0.36 1.860e+00 WC10L4C1 1000 160 1 1 0.36 1.856e+00 WC10L2C2 1000 160 1 1 0.35 1.888e+00 WC10L3C2 1000 160 1 1 0.36 1.865e+00 WC10L4C2 1000 160 1 1 0.36 1.872e+00 WC10L2C4 1000 160 1 1 0.35 1.900e+00 WC10L3C4 1000 160 1 1 0.36 1.863e+00 WC10L4C4 1000 160 1 1 0.36 1.868e+00 WC10L2C8 1000 160 1 1 0.35 1.894e+00 WC10L3C8 1000 160 1 1 0.36 1.870e+00 WC10L4C8 1000 160 1 1 0.35 1.884e+00 WC10L2R1 1000 160 1 1 0.36 1.879e+00 WC10L3R1 1000 160 1 1 0.36 1.860e+00 WC10L4R1 1000 160 1 1 0.36 1.855e+00 WC10L2R2 1000 160 1 1 0.35 1.893e+00 WC10L3R2 1000 160 1 1 0.36 1.878e+00 WC10L4R2 1000 160 1 1 0.36 1.865e+00 WC10L2R4 1000 160 1 1 0.35 1.900e+00 WC10L3R4 1000 160 1 1 0.36 1.866e+00 WC10L4R4 1000 160 1 1 0.36 1.871e+00 WC10L2R8 1000 160 1 1 0.35 1.900e+00 WC10L3R8 1000 160 1 1 0.36 1.869e+00 WC10L4R8 1000 160 1 1 0.36 1.881e+00 WC10C2L1 1000 160 1 1 0.36 1.879e+00 WC10C3L1 1000 160 1 1 0.36 1.859e+00 WC10C4L1 1000 160 1 1 0.36 1.852e+00 WC10C2L2 1000 160 1 1 0.35 1.892e+00 WC10C3L2 1000 160 1 1 0.36 1.869e+00 WC10C4L2 1000 160 1 1 0.36 1.858e+00 WC10C2L4 1000 160 1 1 0.35 1.897e+00 WC10C3L4 1000 160 1 1 0.36 1.863e+00 WC10C4L4 1000 160 1 1 0.36 1.875e+00 WC10C2L8 1000 160 1 1 0.35 1.902e+00 WC10C3L8 1000 160 1 1 0.36 1.869e+00 WC10C4L8 1000 160 1 1 0.36 1.881e+00 WC10C2C1 1000 160 1 1 0.36 1.880e+00 WC10C3C1 1000 160 1 1 0.36 1.866e+00 WC10C4C1 1000 160 1 1 0.36 1.860e+00 WC10C2C2 1000 160 1 1 0.35 1.889e+00 WC10C3C2 1000 160 1 1 0.36 1.871e+00 WC10C4C2 1000 160 1 1 0.36 1.861e+00 WC10C2C4 1000 160 1 1 0.35 1.897e+00 WC10C3C4 1000 160 1 1 0.36 1.867e+00 WC10C4C4 1000 160 1 1 0.36 1.870e+00 WC10C2C8 1000 160 1 1 0.35 1.894e+00 WC10C3C8 1000 160 1 1 0.36 1.868e+00 WC10C4C8 1000 160 1 1 0.36 1.879e+00 WC10C2R1 1000 160 1 1 0.35 1.886e+00 WC10C3R1 1000 160 1 1 0.36 1.865e+00 WC10C4R1 1000 160 1 1 0.36 1.854e+00 WC10C2R2 1000 160 1 1 0.35 1.894e+00 WC10C3R2 1000 160 1 1 0.36 1.875e+00 WC10C4R2 1000 160 1 1 0.36 1.864e+00 WC10C2R4 1000 160 1 1 0.35 1.905e+00 WC10C3R4 1000 160 1 1 0.36 1.866e+00 WC10C4R4 1000 160 1 1 0.36 1.873e+00 WC10C2R8 1000 160 1 1 0.35 1.893e+00 WC10C3R8 1000 160 1 1 0.36 1.867e+00 WC10C4R8 1000 160 1 1 0.35 1.884e+00 WC10R2L1 1000 160 1 1 0.35 1.884e+00 WC10R3L1 1000 160 1 1 0.36 1.838e+00 WC10R4L1 1000 160 1 1 0.36 1.832e+00 WC10R2L2 1000 160 1 1 0.35 1.885e+00 WC10R3L2 1000 160 1 1 0.36 1.845e+00 WC10R4L2 1000 160 1 1 0.36 1.857e+00 WC10R2L4 1000 160 1 1 0.36 1.862e+00 WC10R3L4 1000 160 1 1 0.36 1.852e+00 188

WC10R4L4 1000 160 1 1 0.36 1.860e+00 WC10R2L8 1000 160 1 1 0.35 1.894e+00 WC10R3L8 1000 160 1 1 0.36 1.864e+00 WC10R4L8 1000 160 1 1 0.36 1.866e+00 WC10R2C1 1000 160 1 1 0.36 1.879e+00 WC10R3C1 1000 160 1 1 0.36 1.839e+00 WC10R4C1 1000 160 1 1 0.36 1.831e+00 WC10R2C2 1000 160 1 1 0.35 1.893e+00 WC10R3C2 1000 160 1 1 0.36 1.853e+00 WC10R4C2 1000 160 1 1 0.36 1.856e+00 WC10R2C4 1000 160 1 1 0.35 1.892e+00 WC10R3C4 1000 160 1 1 0.36 1.857e+00 WC10R4C4 1000 160 1 1 0.36 1.860e+00 WC10R2C8 1000 160 1 1 0.35 1.899e+00 WC10R3C8 1000 160 1 1 0.36 1.857e+00 WC10R4C8 1000 160 1 1 0.36 1.866e+00 WC10R2R1 1000 160 1 1 0.36 1.881e+00 WC10R3R1 1000 160 1 1 0.36 1.841e+00 WC10R4R1 1000 160 1 1 0.36 1.837e+00 WC10R2R2 1000 160 1 1 0.35 1.898e+00 WC10R3R2 1000 160 1 1 0.36 1.850e+00 WC10R4R2 1000 160 1 1 0.36 1.859e+00 WC10R2R4 1000 160 1 1 0.35 1.894e+00 WC10R3R4 1000 160 1 1 0.36 1.860e+00 WC10R4R4 1000 160 1 1 0.36 1.867e+00 WC10R2R8 1000 160 1 1 0.35 1.893e+00 WC10R3R8 1000 160 1 1 0.36 1.856e+00 WC10R4R8 1000 160 1 1 0.36 1.866e+00 ======

Finished 108 tests with the following results: 108 tests completed without checking, 0 tests skipped because of illegal input values. ------

End of Tests. ======

B.3.6 Final Test with ATLAS Libraries

Below is the HPL.dat file used for the final large test using the ATLAS libraries.

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
1            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
10000        Ns
1            # of NBs
160          NBs
1            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
1            Ps
1            Qs
16.0         threshold
1            # of panel fact
1            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
0            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)

Below are the results from the final test with the ATLAS libraries.

======HPLinpack 1.0a -- High-Performance Linpack benchmark -- January 20, 2004 Written by A. Petitet and R. Clint Whaley, Innovative Computing Labs., UTK ======

An explanation of the input/output parameters follows: T/V : Wall time / encoded variant. N : The order of the coefficient matrix A. NB : The partitioning blocking factor. P : The number of process rows. Q : The number of process columns. Time : Time in seconds to solve the linear system. Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N : 10000 NB : 160 PMAP : Column-major process mapping P : 1 Q : 1 PFACT : Crout NBMIN : 4 NDIV : 2 RFACT : Crout BCAST : 1ring DEPTH : 1 SWAP : Mix (threshold = 64) L1 : transposed form U : transposed form EQUIL : yes ALIGN : 8 double precision words

------

- The matrix A is randomly generated for each test. - The following scaled residual checks will be computed: 1) ||Ax-b||_oo / ( eps * ||A||_1 * N ) 2) ||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) 3) ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) - The relative machine precision (eps) is taken to be 1.110223e-16 - Computational tests pass if scaled residuals are less than 16.0

======T/V N NB P Q Time Gflops ------WC10C2C4 10000 160 1 1 234.77 2.840e+00 ------||Ax-b||_oo / ( eps * ||A||_1 * N ) = 0.0988430 ...... PASSED ||Ax-b||_oo / ( eps * ||A||_1 * ||x||_1 ) = 0.0233892 ...... PASSED ||Ax-b||_oo / ( eps * ||A||_oo * ||x||_oo ) = 0.0052280 ...... PASSED ======

Finished 1 tests with the following results: 1 tests completed and passed residual checks, 0 tests completed and failed residual checks, 0 tests skipped because of illegal input values. ------

End of Tests. ======
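The Gflops figures in these listings follow from the standard LU operation count used by the benchmark, so each entry can be checked directly from N and the wall time. For the final ATLAS run above (test WC10C2C4, N = 10000, 234.77 seconds):

\[ \mathrm{Gflops} \;=\; \frac{\tfrac{2}{3}N^{3} + 2N^{2}}{10^{9}\,T} \;=\; \frac{\tfrac{2}{3}(10000)^{3} + 2(10000)^{2}}{10^{9}\times 234.77} \;\approx\; 2.84, \]

which reproduces the 2.840e+00 reported in the table.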

B.3.7 HPL.dat File for Multi-processor Test

Below is the HPL.dat file used for the first multi-processor test using Goto’s libraries.

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
1            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
3500         Ns
1            # of NBs
128          NBs
1            PMAP process mapping (0=Row-,1=Column-major)
2            # of process grids (P x Q)
1 2          Ps
2 1          Qs
-16.0        threshold
1            # of panel fact
1            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
8            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
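This input file requests two process grids, P x Q = 1 x 2 and 2 x 1, and keeps the column-major PMAP setting used throughout these tests. Under column-major mapping, MPI rank r occupies grid position (p, q) = (r mod P, r / P); with row-major mapping it would be (r / Q, r mod Q). The short C sketch below only illustrates that mapping rule for the two grids; it is not part of HPL or of the thesis code, and the helper name grid_coords is invented for the example.

#include <stdio.h>

/* Illustration only: coordinates of each rank in a P x Q process grid
   for the two PMAP conventions selectable in HPL.dat. */
static void grid_coords(int rank, int P, int Q, int column_major,
                        int *p, int *q)
{
    if (column_major) {        /* PMAP = 1 (Column-major) */
        *p = rank % P;
        *q = rank / P;
    } else {                   /* PMAP = 0 (Row-major)    */
        *p = rank / Q;
        *q = rank % Q;
    }
}

int main(void)
{
    int grids[2][2] = { {1, 2}, {2, 1} };   /* the P x Q pairs above */
    for (int g = 0; g < 2; g++) {
        int P = grids[g][0], Q = grids[g][1];
        printf("grid %d x %d:\n", P, Q);
        for (int rank = 0; rank < P * Q; rank++) {
            int p, q;
            grid_coords(rank, P, Q, 1, &p, &q);
            printf("  rank %d -> (p = %d, q = %d)\n", rank, p, q);
        }
    }
    return 0;
}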

B.3.8 Goto’s Multi-processor Tests

Below are the results from the first test with Goto's BLAS routines.

======HPLinpack 1.0a -- High-Performance Linpack benchmark -- January 20, 2004 Written by A. Petitet and R. Clint Whaley, Innovative Computing Labs., UTK ======

An explanation of the input/output parameters follows: T/V : Wall time / encoded variant. N : The order of the coefficient matrix A. NB : The partitioning blocking factor. P : The number of process rows. Q : The number of process columns. Time : Time in seconds to solve the linear system. Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N : 3500 NB : 128 PMAP : Column-major process mapping P : 1 2 Q : 2 1 PFACT : Crout NBMIN : 8 NDIV : 2 RFACT : Crout BCAST : 1ringM DEPTH : 1 SWAP : Mix (threshold = 64) L1 : transposed form U : transposed form EQUIL : yes ALIGN : 8 double precision words

======T/V N NB P Q Time Gflops ------WC11C2C8 3500 128 1 2 9.41 3.040e+00

WC11C2C8 3500 128 2 1 13.43 2.129e+00 ======

Finished 2 tests with the following results: 2 tests completed without checking, 0 tests skipped because of illegal input values. ------

End of Tests. ======

B.3.9 HPL.dat File for Testing Broadcast Algorithms

Below is the HPL.dat file used to test the broadcast algorithms for the second test using Goto’s routines.

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
1            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
3500         Ns
1            # of NBs
128          NBs
1            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
1            Ps
2            Qs
-16.0        threshold
1            # of panel fact
1            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
8            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
6            # of broadcast
0 1 2 3 4 5  BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)

B.3.10 Final Test with Goto’s Libraries

Below is one of the HPL.dat files used for the final tests using Goto’s libraries.

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
1            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
14500        Ns
1            # of NBs
128          NBs
1            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
1            Ps
2            Qs
16.0         threshold
1            # of panel fact
1            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
2            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)

APPENDIX C CALCULIX INSTALLATION

This section lists the Makefiles used to install CalculiX as described in Chapter 4.

C.1 ARPACK Makefile

Below is the Makefile used to compile ARPACK.

########################################################################### # # Program: ARPACK # # Module: ARmake.inc # # Purpose: Top-level Definitions # # Creation date: February 22, 1996 # # Modified: # # Send bug reports, comments or suggestions to [email protected] # ############################################################################ # # %------% # | SECTION 1: PATHS AND LIBRARIES | # %------% # # # %------% # | You should change the definition of | # | home if ARPACK is built some place | # | other than your home directory. | # %------% # # Added by Paul Thu, 04 Mar 2004, 18:41:27 EST home = /home/apollo/hda8/ARPACK # # %------% # | The platform identifier to suffix to | # | the end of library names | # %------% # # Added by Paul Thu, 04 Mar 2004, 18:42:17 EST PLAT = Linux # # %------% # | The directories to find the various pieces of ARPACK | # %------% # BLASdir = $(home)/BLAS LAPACKdir = $(home)/LAPACK UTILdir = $(home)/UTIL SRCdir = $(home)/SRC # DIRS = $(BLASdir) $(LAPACKdir) $(UTILdir) $(SRCdir)


# # %------% # | Comment out the previous line and uncomment the following | # | if you already have the BLAS and LAPACK installed on your system. | # | NOTE: ARPACK assumes the use of LAPACK version 2 codes. | # %------% # #DIRS = $(UTILdir) $(SRCdir) # # %------% # | The name of the libraries to be created/linked to | # %------% # ARPACKLIB = $(home)/libarpack_$(PLAT).a LAPACKLIB = BLASLIB = # ALIBS = $(ARPACKLIB) $(LAPACKLIB) $(BLASLIB) # # # %------% # | SECTION 2: COMPILERS | # | | # | The following macros specify compilers, linker/loaders, | # | the archiver, and their options. You need to make sure | # | these are correct for your system. | # %------% # # # %------% # | Make our own suffixes' list. | # %------% # .SUFFIXES: .SUFFIXES: .f .o # # %------% # | Default command. | # %------% # .DEFAULT: @$(ECHO) "Unknown target $@, try: make help" # # %------% # | Command to build .o files from .f files. | # %------% # .f.o: @$(ECHO) Making $@ from $< @$(FC) -c $(FFLAGS) $< # # %------% # | Various compilation programs and flags. | # | You need to make sure these are correct | # | for your system. | # %------% # # Added by Paul Thu, 04 Mar 2004, 18:43:19 EST FC = g77 # Added by Paul Thu, 04 Mar 2004, 18:58:52 EST FFLAGS = -O

LDFLAGS =
CD = cd

ECHO = echo

LN = ln
LNFLAGS = -s

MAKE = /usr/bin/make

RM = rm
RMFLAGS = -f

SHELL = /bin/sh # # %------% # | The archiver and the flag(s) to use when building an archive | # | (library). Also the ranlib routine. If your system has no | # | ranlib, set RANLIB = touch. | # %------% # AR = ar ARFLAGS = rv #RANLIB = touch RANLIB = ranlib # # %------% # | This is the general help target. | # %------% # help: @$(ECHO) "usage: make ?"

C.2 CalculiX CrunchiX Makefile

Below is the Makefile used to compile CalculiX CrunchiX.

CFLAGS = -Wall -O -I ../../../SPOOLES.2.2 \ changedepterm.f \ -DARCH="Linux" -L ../../../SPOOLES.2.2 cloads.f \ FFLAGS = -Wall -O conductivities.f \ controlss.f \ CC=gcc couptempdisps.f \ FC=g77 creeps.f \ cychards.f \ .c.o : cycsymmods.f \ $(CC) $(CFLAGS) -c $< dasol.f \ .f.o : datest.f \ $(FC) $(FFLAGS) -c $< datri.f \ SCCXF = \ defplasticities.f \ add_pr.f \ defplas.f \ add_sm_ei.f \ densities.f \ add_sm_st.f \ depvars.f \ allocation.f \ deuldlag.f \ amplitudes.f \ dfluxes.f \ anisonl.f \ dgesv.f \ anisotropic.f \ diamtr.f \ beamsections.f \ dloads.f \ bounadd.f \ dot.f \ boundaries.f \ dredu.f \ buckles.f \ dsort.f \ calinput.f \ dynamics.f \ cfluxes.f \ dynsolv.f \ 196

el.f \ noelfiles.f \ elastics.f \ noelsets.f \ elements.f \ nonlinmpc.f \ elprints.f \ normals.f \ envtemp.f \ norshell.f \ equations.f \ number.f \ expansions.f \ onf.f \ extrapolate.f \ op.f \ e_c3d.f \ openfile.f \ e_c3d_th.f \ orientations.f \ e_c3d_rhs.f \ orthonl.f \ fcrit.f \ orthotropic.f \ films.f \ out.f \ finpro.f \ parser.f \ forcadd.f \ physicalconstants.f \ frd.f \ planempc.f \ frdclose.f \ plastics.f \ frequencies.f \ plcopy.f \ fsub.f \ plinterpol.f \ fsuper.f \ plmix.f \ gen3delem.f \ polynom.f \ genran.f \ profil.f \ getnewline.f \ radflowload.f \ graph.f \ radiates.f \ headings.f \ ranewr.f \ heattransfers.f \ rearrange.f \ hyperel.f \ rectcyl.f \ hyperelastics.f \ renumber.f \ hyperfoams.f \ results.f \ ident.f \ rhs.f \ ident2.f \ rigidbodies.f \ include.f \ rigidmpc.f \ incplas.f \ rootls.f \ initialconditions.f \ rubber.f \ inputerror.f \ saxpb.f \ isorti.f \ selcycsymmods.f \ isortid.f \ shape3tri.f \ isortidc.f \ shape4q.f \ isortii.f \ shape4tet.f \ isortiid.f \ shape6tri.f \ label.f \ shape6w.f \ linel.f \ shape8h.f \ lintemp.f \ shape8q.f \ lintemp_th.f \ shape10tet.f \ loadadd.f \ shape15w.f \ loadaddt.f \ shape20h.f \ mafillpr.f \ shellsections.f \ mafillsm.f \ solidsections.f \ mafillsmcs.f \ spcmatch.f \ massflowrates.f \ specificheats.f \ matdata_co.f \ statics.f \ matdata_he.f \ steps.f \ matdata_tg.f \ stiff2mat.f \ materialdata.f \ stop.f \ materials.f \ str2mat.f \ modaldampings.f \ straightmpc.f \ modaldynamics.f \ surfaces.f \ mpcs.f \ temperatures.f \ nident.f \ tempload.f \ nident2.f \ ties.f \ near2d.f \ transformatrix.f \ noanalysis.f \ transforms.f \ nodalthicknesses.f \ ucreep.f \ nodeprints.f \ uhardening.f \ nodes.f \ umat.f \ 197

umat_aniso_creep.f \ pcgsolver.c \ umat_aniso_plas.f \ preiter.c \ umat_elastic_fiber.f \ prespooles.c \ umat_ideal_gas.f \ profile.c \ umat_lin_iso_el.f \ remastruct.c \ umat_single_crystal.f \ spooles.c \ umat_tension_only.f \ strcmp1.c \ umat_user.f \ strcpy1.c \ umpc_mean_rot.f \ u_calloc.c umpc_user.f \ usermaterials.f \ SCCXMAIN = ccx_1.1.c usermpc.f \ viscos.f \ OCCXF = $(SCCXF:.f=.o) wcoef.f \ OCCXC = $(SCCXC:.c=.o) writebv.f \ OCCXMAIN = $(SCCXMAIN:.c=.o) writeev.f \ writeevcs.f \ DIR=/home/apollo/hda8/SPOOLES.2.2 writempc.f \ writesummary.f LIBS = \ $(DIR)/spooles.a \ SCCXC = \ $(DIR)/MPI/src/spoolesMPI.a \ arpack.c \ /home/apollo/hda8/ARPACK/ \ arpackbu.c \ libarpack_Linux.a -lm \ arpackcs.c \ -lpthread cascade.c \ dyna.c \ ccx_1.1: $(OCCXMAIN) ccx_1.1.a $(LIBS) frdcyc.c \ g77 -Wall -O -o $@ $(OCCXMAIN) \ insert.c \ ccx_1.1.a $(LIBS) mastruct.c \ mastructcs.c \ ccx_1.1.a: $(OCCXF) $(OCCXC) nonlingeo.c \ ar vr $@ $? APPENDIX D CALCULIX CRUNCHIX INPUT FILE

Below is the input deck for the example in Chapter 5.

************************************************************
*HEADING
User Baseline File
** Link to the element and node data
*INCLUDE,input=../thesis/all.msh
************************************
*ORIENTATION, NAME=SO, SYSTEM=RECTANGULAR
1.000000,0.000000,0.000000,0.000000,1.000000,0.000000
*MATERIAL, NAME=M1
*ELASTIC, TYPE=ISO
3.000000e+07, 3.300000e-01, 300.000000
*EXPANSION, TYPE=ISO
1.400000e-05, 300.000000
*DENSITY
7.200000e-04,300.000000
*SOLID SECTION, ELSET=Eall, MATERIAL=M1,ORIENTATION=SO
***********************************************************
** Start Load Step (1) of (1)
**
*STEP, INC=100
***STATIC,SOLVER=ITERATIVE CHOLESKY
*STATIC,SOLVER=SPOOLES
1.000000, 1.000000, 1.000000e-05, 1.000000e+30
** Link to the boundary condition files
*INCLUDE,input=../thesis/fix.spc
*INCLUDE,input=../thesis/load.frc
***************************************
*NODE FILE
U
*EL FILE
S
*NODE PRINT, NSET=Nall
U
*EL PRINT, ELSET=Eall
S
***********************************************************
** End Load Step (1) of (1)
*END STEP

APPENDIX E SERIAL AND PARALLEL SOLVER SOURCE CODE

The sections below list the source code for both the serial and parallel solvers. The serial code is the original version supplied with CalculiX, while the parallel code is the final optimized program. A short stand-alone example illustrating how the serial routine is called is given at the end of Section E.1.

E.1 Serial Code

Below is the source code for the serial solver of CalculiX CrunchiX.

/* CalculiX - A 3-dimensional finite element program */ /* Copyright (C) 1998 Guido Dhondt */

/* This program is free software; you can redistribute it and/or */ /* modify it under the terms of the GNU General Public License as */ /* published by the Free Software Foundation(version 2); */ /* */

/* This program is distributed in the hope that it will be useful, */ /* but WITHOUT ANY WARRANTY; without even the implied warranty of */ /* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the */ /* GNU General Public License for more details. */

/* You should have received a copy of the GNU General Public License */ /* along with this program; if not, write to the Free Software */ /* Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */

#include #include #include #include #include #include #include #include #include "CalculiX.h" void spooles(double *ad, double *au, double *b, int *icol, int *irow, int *neq, int *nzs){

char buffer[20]; int ipoint,ipo; DenseMtx *mtxB, *mtxX ; Chv *rootchv ;


ChvManager *chvmanager ; SubMtxManager *mtxmanager ; FrontMtx *frontmtx ; InpMtx *mtxA ; double tau = 100.; double cpus[10] ; ETree *frontETree ; FILE *msgFile, *inputFile, *densematrix, *inputmatrix; Graph *graph ; int jrow, jrhs, msglvl=0, ncol, nedges,error, nent, neqns, nrhs, nrow, pivotingflag=1, seed=7892713, symmetryflag=0, type=1,row, col,maxdomainsize,maxzeros,maxsize; int *newToOld, *oldToNew ; int stats[20] ; IV *newToOldIV, *oldToNewIV ; IVL *adjIVL, *symbfacIVL ; time_t t1, t2;

/* solving the system of equations using spooles */

printf("Solving the system of equations using spooles\n\n");

/* Compute solve time */ (void) time(&t1);

/* ------all-in-one program to solve A X = B (1) read in matrix entries and form DInpMtx object (2) form Graph object (3) order matrix and form front tree (4) get the permutation, permute the matrix and front tree and get the symbolic factorization (5) compute the numeric factorization (6) read in right hand side entries (7) compute the solution

created -- 98jun04, cca ------*/ if ( (msgFile = fopen("spooles.out", "a")) == NULL ) { fprintf(stderr, "\n fatal error in spooles.c" "\n unable to open file spooles.out\n") ; } /* ------STEP 1: read the entries from the input file and create the InpMtx object ------*/ 201

nrow=*neq; ncol=*neq; nent=*nzs+*neq; neqns=nrow; ipoint=0;
mtxA = InpMtx_new() ;
InpMtx_init(mtxA, INPMTX_BY_ROWS, type, nent, neqns) ;
/* assemble the symmetric matrix: the diagonal comes from ad; the
   off-diagonal entries come from au, with icol[row] entries stored
   per column and their 1-based row indices held in irow */
for(row=0;row<nrow;row++){
    InpMtx_inputRealEntry(mtxA,row,row,ad[row]) ;
    for(ipo=ipoint;ipo<ipoint+icol[row];ipo++){
        InpMtx_inputRealEntry(mtxA,row,irow[ipo]-1,au[ipo]) ;
    }
    ipoint=ipoint+icol[row];
}

InpMtx_changeStorageMode(mtxA, INPMTX_BY_VECTORS) ; if ( msglvl > 1 ) { fprintf(msgFile, "\n\n input matrix") ; InpMtx_writeForHumanEye(mtxA, msgFile) ; fflush(msgFile) ; } /*added to print matrix*/ /* sprintf(buffer, "inputmatrix.m"); inputmatrix = fopen(buffer, "w"); InpMtx_writeForMatlab(mtxA,"A", inputmatrix); fclose(inputmatrix); */

/*------*/ /* ------STEP 2 : find a low-fill ordering (1) create the Graph object (2) order the graph using multiple minimum degree ------*/ graph = Graph_new() ; adjIVL = InpMtx_fullAdjacency(mtxA) ; nedges = IVL_tsize(adjIVL) ; Graph_init2(graph, 0, neqns, 0, nedges, neqns, nedges, adjIVL, NULL, NULL) ; if ( msglvl > 1 ) { fprintf(msgFile, "\n\n graph of the input matrix") ; Graph_writeForHumanEye(graph, msgFile) ; fflush(msgFile) ; }

maxdomainsize=800;maxzeros=1000;maxsize=64; frontETree=orderViaBestOfNDandMS(graph,maxdomainsize,maxzeros,

maxsize,seed,msglvl,msgFile); if ( msglvl > 1 ) { fprintf(msgFile, "\n\n front tree from ordering") ; ETree_writeForHumanEye(frontETree, msgFile) ; fflush(msgFile) ; } /*------*/ /* ------STEP 3: get the permutation, permute the matrix and front tree and get the symbolic factorization ------*/ oldToNewIV = ETree_oldToNewVtxPerm(frontETree) ; oldToNew = IV_entries(oldToNewIV) ; newToOldIV = ETree_newToOldVtxPerm(frontETree) ; newToOld = IV_entries(newToOldIV) ; ETree_permuteVertices(frontETree, oldToNewIV) ; InpMtx_permute(mtxA, oldToNew, oldToNew) ; InpMtx_mapToUpperTriangle(mtxA) ; InpMtx_changeCoordType(mtxA,INPMTX_BY_CHEVRONS); InpMtx_changeStorageMode(mtxA,INPMTX_BY_VECTORS); symbfacIVL = SymbFac_initFromInpMtx(frontETree, mtxA) ; if ( msglvl > 1 ) { fprintf(msgFile, "\n\n old-to-new permutation vector") ; IV_writeForHumanEye(oldToNewIV, msgFile) ; fprintf(msgFile, "\n\n new-to-old permutation vector") ; IV_writeForHumanEye(newToOldIV, msgFile) ; fprintf(msgFile, "\n\n front tree after permutation") ; ETree_writeForHumanEye(frontETree, msgFile) ; fprintf(msgFile, "\n\n input matrix after permutation") ; InpMtx_writeForHumanEye(mtxA, msgFile) ; fprintf(msgFile, "\n\n symbolic factorization") ; IVL_writeForHumanEye(symbfacIVL, msgFile) ; fflush(msgFile) ; } /*------*/ /* ------STEP 4: initialize the front matrix object ------*/ frontmtx = FrontMtx_new() ; mtxmanager = SubMtxManager_new() ; SubMtxManager_init(mtxmanager, NO_LOCK, 0) ; FrontMtx_init(frontmtx, frontETree, symbfacIVL, type, symmetryflag, FRONTMTX_DENSE_FRONTS, pivotingflag, NO_LOCK, 0, NULL, mtxmanager, msglvl, msgFile) ; /*------*/ /* 203

------STEP 5: compute the numeric factorization ------*/ chvmanager = ChvManager_new() ; ChvManager_init(chvmanager, NO_LOCK, 1) ; DVfill(10, cpus, 0.0) ; IVfill(20, stats, 0) ; rootchv = FrontMtx_factorInpMtx(frontmtx, mtxA, tau, 0.0, chvmanager, &error,cpus, stats, msglvl, msgFile) ; ChvManager_free(chvmanager) ; if ( msglvl > 1 ) { fprintf(msgFile, "\n\n factor matrix") ; FrontMtx_writeForHumanEye(frontmtx, msgFile) ; fflush(msgFile) ; } if ( rootchv != NULL ) { fprintf(msgFile, "\n\n matrix found to be singular\n") ; exit(-1) ; } if(error>=0){ fprintf(msgFile,"\n\nerror encountered at front %d",error); exit(-1); } /*------*/ /* ------STEP 6: post-process the factorization ------*/ FrontMtx_postProcess(frontmtx, msglvl, msgFile) ; if ( msglvl > 1 ) { fprintf(msgFile, "\n\n factor matrix after post-processing") ; FrontMtx_writeForHumanEye(frontmtx, msgFile) ; fflush(msgFile) ; } /*------*/ /* ------STEP 7: read the right hand side matrix B ------*/ nrhs=1; mtxB = DenseMtx_new() ; DenseMtx_init(mtxB, type, 0, 0, neqns, nrhs, 1, neqns) ; DenseMtx_zero(mtxB) ; for ( jrow = 0 ; jrow < nrow ; jrow++ ) { for ( jrhs = 0 ; jrhs < nrhs ; jrhs++ ) { DenseMtx_setRealEntry(mtxB, jrow, jrhs, b[jrow]) ; } 204

}

if ( msglvl > 1 ) { fprintf(msgFile, "\n\n rhs matrix in original ordering") ; DenseMtx_writeForHumanEye(mtxB, msgFile) ; fflush(msgFile) ; }

/*------*/ /* ------STEP 8: permute the right hand side into the new ordering ------*/ DenseMtx_permuteRows(mtxB, oldToNewIV) ; if ( msglvl > 1 ) { fprintf(msgFile, "\n\n right hand side matrix in new ordering"); DenseMtx_writeForHumanEye(mtxB, msgFile) ; fflush(msgFile) ; } /*------*/ /* ------STEP 9: solve the linear system ------*/ mtxX = DenseMtx_new() ; DenseMtx_init(mtxX, type, 0, 0, neqns, nrhs, 1, neqns) ; DenseMtx_zero(mtxX) ; FrontMtx_solve(frontmtx,mtxX,mtxB,mtxmanager,cpus,msglvl,msgFile); if ( msglvl > 1 ) { fprintf(msgFile, "\n\n solution matrix in new ordering") ; DenseMtx_writeForHumanEye(mtxX, msgFile) ; fflush(msgFile) ; } /*------*/ /* ------STEP 10: permute the solution into the original ordering ------*/ DenseMtx_permuteRows(mtxX, newToOldIV) ; /* *ipb=DenseMtx_entries(mtxX); */ if ( msglvl > 1 ) { fprintf(msgFile, "\n\n solution matrix in original ordering") ; DenseMtx_writeForHumanEye(mtxX, msgFile) ; fflush(msgFile) ; } /*------*/

sprintf(buffer, "y.result");

inputFile=fopen(buffer, "w"); for ( jrow = 0 ; jrow < nrow ; jrow++ ) { b[jrow]=DenseMtx_entries(mtxX)[jrow]; fprintf(inputFile, "%1.5e\n", b[jrow]); } fclose(inputFile);

/* ------free memory ------*/ FrontMtx_free(frontmtx) ; DenseMtx_free(mtxX) ; DenseMtx_free(mtxB) ; IV_free(newToOldIV) ; IV_free(oldToNewIV) ; InpMtx_free(mtxA) ; ETree_free(frontETree) ; IVL_free(symbfacIVL) ; SubMtxManager_free(mtxmanager) ; Graph_free(graph) ;

/*------*/

fclose(msgFile); (void) time(&t2); printf("Time for Serial SPOOLES to solve: %d\n",(int) t2-t1); return; }
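The serial routine above receives the system in CalculiX's compressed storage: the diagonal of the symmetric stiffness matrix in ad, the off-diagonal entries in au (with icol giving the number of stored entries per column and irow their 1-based row indices, as in the assembly loop at the top of the routine), the right-hand side in b (overwritten with the solution), and the problem dimensions through neq and nzs. The following stand-alone driver is only a sketch meant to illustrate that calling convention for a 2-by-2 system; it is not part of CalculiX, and it assumes spooles() is compiled and linked against the SPOOLES library as in the Makefile of Appendix C.2.

/* Hypothetical driver for the serial spooles() routine listed above.
   Solves the symmetric system
       [ 4 1 ] [x1]   [1]
       [ 1 3 ] [x2] = [2]
   ad holds the diagonal, au the single off-diagonal entry (stored for
   column 1 with 1-based row index 2 in irow), and b is overwritten
   with the solution. */
#include <stdio.h>

void spooles(double *ad, double *au, double *b, int *icol, int *irow,
             int *neq, int *nzs);

int main(void)
{
    double ad[2]   = {4.0, 3.0};      /* diagonal entries            */
    double au[1]   = {1.0};           /* off-diagonal entry A(2,1)   */
    int    irow[1] = {2};             /* its 1-based row index       */
    int    icol[2] = {1, 0};          /* stored entries per column   */
    double b[2]    = {1.0, 2.0};      /* right-hand side / solution  */
    int    neq = 2, nzs = 1;

    spooles(ad, au, b, icol, irow, &neq, &nzs);

    /* expected solution: x = [1/11, 7/11] = [0.0909..., 0.6363...] */
    printf("x = [%f, %f]\n", b[0], b[1]);
    return 0;
}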

E.2 Optimized Parallel Code

This section lists the optimized parallel code.

E.2.1 P solver Makefile

Below is the Makefile used to compile p_solver.c.

CC = mpicc
OPTLEVEL = -O
MPE_INCDIR = /home/apollo/hda8/mpich-1.2.5.2/mpe/include
INCLUDE_DIR = -I$(MPE_INCDIR)
MPE_CFLAGS = -DMPI_LINUX -DUSE_STDARG -DHAVE_PROTOTYPES
CFLAGS = $(OPTLEVEL) -I ../../../../SPOOLES.2.2 -DARCH="Linux"

all: p_solver.o p_solver

DIR=/home/apollo/hda8/SPOOLES.2.2

MPI_INSTALL_DIR = /home/apollo/hda8/mpich-1.2.5.2
MPI_LIB_PATH = -L$(MPI_INSTALL_DIR)/lib
MPI_LIBS = $(MPI_LIB_PATH) -lmpich
MPI_INCLUDE_DIR = -I$(MPI_INSTALL_DIR)/include

#Uncomment the below two lines so that log file can be created
MPE_LIBDIR = /home/apollo/hda8/mpich-1.2.5.2/mpe/lib
LOG_LIBS = -L$(MPE_LIBDIR) -llmpe -lmpe

LIBS = \
	$(DIR)/MPI/src/spoolesMPI.a \
	$(DIR)/spooles.a -lm \
	-lpthread

p_solver: p_solver.o
	${CC} p_solver.o -o $@ ${LIBS} ${LOG_LIBS}

E.2.2 P solver Source Code

Below is the source code for the optimized parallel solver.

/* This program is free software; you can redistribute it and/or */ /* modify it under the terms of the GNU General Public License as */ /* published by the Free Software Foundation(version 2); */ /* */ /* This program is distributed in the hope that it will be useful */ /* but WITHOUT ANY WARRANTY; without even the implied warranty of */ /* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the */ /* GNU General Public License for more details. */ /* */ /* You should have received a copy of the GNU General Public */ /* License along with this program; if not, write to the Free */ /* Software Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, */ /* USA. */ #include #include #include #include #include #include #include "/home/apollo/hda8/CalculiX/ccx_1.1/src/CalculiX.h" #include "/home/apollo/hda8/SPOOLES.2.2/MPI/spoolesMPI.h" #include "/home/apollo/hda8/SPOOLES.2.2/SPOOLES.h" #include "/home/apollo/hda8/SPOOLES.2.2/timings.h" int main(int argc, char *argv[]) { char buffer[20]; DenseMtx *mtxX, *mtxB, *newB; Chv *rootchv ; ChvManager *chvmanager ; 207

SubMtxManager *mtxmanager, *solvemanager ; FrontMtx *frontmtx ; InpMtx *mtxA, *newA; double cutoff, droptol = 0.0, minops, tau = 100.; double cpus[20] ; double *opcounts ; DV *cumopsDV ; ETree *frontETree ; FILE *inputFile, *msgFile, *densematrix, *inputmatrix ; Graph *graph ; int jcol, jrow, error, firsttag, ncol, lookahead =0, msglvl=0, nedges, nent, neqns, nmycol, nrhs, ient,iroow, nrow, pivotingflag=0, root, seed=7892713, symmetryflag=0, type=1 ; int stats[20] ; int *rowind ; IV *newToOldIV, *oldToNewIV, *ownedColumnsIV, *ownersIV, *vtxmapIV ; IVL *adjIVL, *symbfacIVL ; SolveMap *solvemap ; double value;

int myid, nproc; int namelen; char processor_name[MPI_MAX_PROCESSOR_NAME]; double starttime = 0.0, endtime; int maxdomainsize, maxsize, maxzeros;

/* Solving the system of equations using Spooles */ /*------*/ /* ------Find out the identity of this process and the number of processes ------*/

MPI_Init(&argc, &argv); MPI_Comm_rank(MPI_COMM_WORLD, &myid) ; MPI_Comm_size(MPI_COMM_WORLD, &nproc) ; MPI_Get_processor_name(processor_name,&namelen); if(myid==0){ printf("Solving the system of equations using SpoolesMPI\n\n"); } fprintf(stdout,"Process %d of %d on %s\n",myid+1,nproc, processor_name);

/* Start a timer to determine how long the solve process takes */ starttime=MPI_Wtime();

/*------*/

sprintf(buffer, "res.%d", myid) ;

if ( (msgFile = fopen(buffer, "w")) == NULL ) { fprintf(stderr, "\n fatal error in spooles.c" "\n unable to open file res\n"); }

/* ------STEP 1: Read the entries from the input file and create the InpMtx object ------*/

/* Read in the input matrix, A */ sprintf(buffer, "matrix.%d.input", myid) ; inputFile = fopen(buffer, "r") ; fscanf(inputFile, "%d %d %d", &neqns, &ncol, &nent) ; nrow = neqns; MPI_Barrier(MPI_COMM_WORLD) ; mtxA = InpMtx_new() ; InpMtx_init(mtxA, INPMTX_BY_ROWS, type, nent, 0) ; for ( ient = 0 ; ient < nent ; ient++ ) { fscanf(inputFile, "%d %d %lf", &iroow, &jcol, &value) ; InpMtx_inputRealEntry(mtxA, iroow, jcol, value) ; } fclose(inputFile) ;

/* Change the storage mode to vectors */ InpMtx_sortAndCompress(mtxA); InpMtx_changeStorageMode(mtxA, INPMTX_BY_VECTORS) ; if ( msglvl > 1 ) { fprintf(msgFile, "\n\n input matrix") ; InpMtx_writeForHumanEye(mtxA, msgFile) ; fflush(msgFile) ; }

/*------*/

/* ------STEP 2: Read the right hand side entries from the input file and create the DenseMtx object for B ------*/ sprintf(buffer, "rhs.%d.input", myid); inputFile = fopen(buffer, "r") ; 209 fscanf(inputFile, "%d %d", &nrow, &nrhs) ; mtxB = DenseMtx_new() ; DenseMtx_init(mtxB, type, 0, 0, nrow, nrhs, 1, nrow) ; DenseMtx_rowIndices(mtxB, &nrow, &rowind); for ( iroow = 0 ; iroow < nrow ; iroow++ ) { fscanf(inputFile, "%d", rowind + iroow) ; for ( jcol = 0 ; jcol < nrhs ; jcol++ ) { fscanf(inputFile, "%lf", &value) ; DenseMtx_setRealEntry(mtxB, iroow, jcol, value) ; } } fclose(inputFile) ;

if ( msglvl > 1 ) { fprintf(msgFile, "\n\n rhs matrix in original ordering") ; DenseMtx_writeForHumanEye(mtxB, msgFile) ; fflush(msgFile) ; }

/*------*/

/* ------STEP 3 : Find a low-fill ordering (1) Processor 0 creates the Graph object (2) Processor 0 orders the graph using the better of Nested Dissection and Multisection (3) Optimal front matrix paremeters are chosen depending on the number of processors (4) Broadcast ordering to the other processors ------*/ if (myid==0){ graph = Graph_new() ; adjIVL = InpMtx_fullAdjacency(mtxA) ; nedges = IVL_tsize(adjIVL) ; Graph_init2(graph, 0, neqns, 0, nedges, neqns, nedges, adjIVL, NULL, NULL) ; if ( msglvl > 1 ) { fprintf(msgFile, "\n\n graph of the input matrix") ; Graph_writeForHumanEye(graph, msgFile) ; fflush(msgFile) ; }

/* Below choose the optimized values for maxdomainsize, */ /* maxzeros, and maxsize depending on the number of */

/* processors. */ if (nproc==2) { maxdomainsize=700; maxzeros=1000; maxsize=96; } else if (nproc==3) { maxdomainsize=900; maxzeros=1000; maxsize=64; } else { maxdomainsize=900; maxzeros=1000; maxsize=80; }

/* Perform an ordering with the better of nested dissection and */ /* multi-section. */ frontETree = orderViaBestOfNDandMS(graph,maxdomainsize,maxzeros, maxsize, seed,msglvl,msgFile) ;

Graph_free(graph) ;

}
else {
   /* The other processors do nothing here; they receive the */
   /* ordering in the broadcast below.                        */
}

/* The ordering is now sent to all processors with MPI_Bcast. */
frontETree = ETree_MPI_Bcast(frontETree, root, msglvl, msgFile,
                             MPI_COMM_WORLD) ;

/*------*/
/*
   --------------------------------------------------------------
   STEP 4: Get the permutations, permute the front tree, permute
           the matrix and right hand side
   --------------------------------------------------------------
*/

/* Very similar to the serial code */
oldToNewIV = ETree_oldToNewVtxPerm(frontETree) ;
newToOldIV = ETree_newToOldVtxPerm(frontETree) ;
ETree_permuteVertices(frontETree, oldToNewIV) ;
InpMtx_permute(mtxA, IV_entries(oldToNewIV),
               IV_entries(oldToNewIV)) ;
InpMtx_mapToUpperTriangle(mtxA) ;
InpMtx_changeCoordType(mtxA, INPMTX_BY_CHEVRONS) ;
InpMtx_changeStorageMode(mtxA, INPMTX_BY_VECTORS) ;
DenseMtx_permuteRows(mtxB, oldToNewIV) ;

/*------*/
/*
   --------------------------------------------------------------
   STEP 5: Generate the owners map IV object and the map from
           vertices to owners
   --------------------------------------------------------------
*/

/* This is all new from the serial code:                */
/* Obtains map from fronts to processors.  Also a map   */
/* from vertices to processors is created that enables  */
/* the matrix A and right hand side B to be distributed */
/* as necessary.                                        */
cutoff   = 1./(2*nproc) ;
cumopsDV = DV_new() ;
DV_init(cumopsDV, nproc, NULL) ;
ownersIV = ETree_ddMap(frontETree, type, symmetryflag,
                       cumopsDV, cutoff) ;
DV_free(cumopsDV) ;
vtxmapIV = IV_new() ;
IV_init(vtxmapIV, neqns, NULL) ;
IVgather(neqns, IV_entries(vtxmapIV), IV_entries(ownersIV),
         ETree_vtxToFront(frontETree)) ;
if ( msglvl > 1 ) {
   fprintf(msgFile, "\n\n map from fronts to owning processes") ;
   IV_writeForHumanEye(ownersIV, msgFile) ;
   fprintf(msgFile, "\n\n map from vertices to owning processes") ;
   IV_writeForHumanEye(vtxmapIV, msgFile) ;
   fflush(msgFile) ;
}

/*------*/

/*
   --------------------------------------------------------------
   STEP 6: Redistribute the matrix and right hand side
   --------------------------------------------------------------
*/

/* Now the entries of A and B are assembled and distributed */
firsttag = 0 ;
newA = InpMtx_MPI_split(mtxA, vtxmapIV, stats, msglvl, msgFile,
                        firsttag, MPI_COMM_WORLD) ;
firsttag++ ;
InpMtx_free(mtxA) ;
mtxA = newA ;
InpMtx_changeStorageMode(mtxA, INPMTX_BY_VECTORS) ;
if ( msglvl > 1 ) {
   fprintf(msgFile, "\n\n split InpMtx") ;

   InpMtx_writeForHumanEye(mtxA, msgFile) ;
   fflush(msgFile) ;
}
newB = DenseMtx_MPI_splitByRows(mtxB, vtxmapIV, stats, msglvl,
                                msgFile, firsttag, MPI_COMM_WORLD) ;
DenseMtx_free(mtxB) ;
mtxB = newB ;
firsttag += nproc ;
if ( msglvl > 1 ) {
   fprintf(msgFile, "\n\n split DenseMtx B") ;
   DenseMtx_writeForHumanEye(mtxB, msgFile) ;
   fflush(msgFile) ;
}

/*------*/

/*
   --------------------------------------------------------------
   STEP 7: Compute the symbolic factorization
   --------------------------------------------------------------
*/
symbfacIVL = SymbFac_MPI_initFromInpMtx(frontETree, ownersIV, mtxA,
                                        stats, msglvl, msgFile,
                                        firsttag, MPI_COMM_WORLD) ;
firsttag += frontETree->nfront ;
if ( msglvl > 1 ) {
   fprintf(msgFile, "\n\n local symbolic factorization") ;
   IVL_writeForHumanEye(symbfacIVL, msgFile) ;
   fflush(msgFile) ;
}
/*------*/

/*
   --------------------------------------------------------------
   STEP 8: Initialize the front matrix
   --------------------------------------------------------------
*/
/* Very similar to the serial code.  The arguments myid and      */
/* ownersIV tell the front matrix object to initialize only      */
/* those parts of the factor matrices that it owns.              */
mtxmanager = SubMtxManager_new() ;
SubMtxManager_init(mtxmanager, NO_LOCK, 0) ;
frontmtx = FrontMtx_new() ;
FrontMtx_init(frontmtx, frontETree, symbfacIVL, type, symmetryflag,
              FRONTMTX_DENSE_FRONTS, pivotingflag, NO_LOCK, myid,
              ownersIV, mtxmanager, msglvl, msgFile) ;
/*------*/

/*
   --------------------------------------------------------------
   STEP 9: Compute the factorization
   --------------------------------------------------------------
*/
/* Similar to the serial code */
chvmanager = ChvManager_new() ;
/* For the serial code, the 0 is replaced by a 1 */
ChvManager_init(chvmanager, NO_LOCK, 0) ;
rootchv = FrontMtx_MPI_factorInpMtx(frontmtx, mtxA, tau, droptol,
                                    chvmanager, ownersIV, lookahead,
                                    &error, cpus, stats, msglvl,
                                    msgFile, firsttag,
                                    MPI_COMM_WORLD) ;
ChvManager_free(chvmanager) ;
firsttag += 3*frontETree->nfront + 2 ;
if ( msglvl > 1 ) {
   fprintf(msgFile, "\n\n numeric factorization") ;
   FrontMtx_writeForHumanEye(frontmtx, msgFile) ;
   fflush(msgFile) ;
}
if ( error >= 0 ) {
   fprintf(stderr, "\n proc %d : factorization error at front %d",
           myid, error) ;
   MPI_Finalize() ;
   exit(-1) ;
}
/*------*/
/*
   --------------------------------------------------------------
   STEP 10: Post-process the factorization and split the factor
            matrices into submatrices
   --------------------------------------------------------------
*/
/* Very similar to the serial code */
FrontMtx_MPI_postProcess(frontmtx, ownersIV, stats, msglvl, msgFile,
                         firsttag, MPI_COMM_WORLD) ;
firsttag += 5*nproc ;
if ( msglvl > 1 ) {
   fprintf(msgFile, "\n\n numeric factorization after post-processing") ;
   FrontMtx_writeForHumanEye(frontmtx, msgFile) ;
   fflush(msgFile) ;
}
/*------*/

/*
   --------------------------------------------------------------
   STEP 11: Create the solve map object
   --------------------------------------------------------------
*/
solvemap = SolveMap_new() ;
SolveMap_ddMap(solvemap, frontmtx->symmetryflag,
               FrontMtx_upperBlockIVL(frontmtx),
               FrontMtx_lowerBlockIVL(frontmtx), nproc, ownersIV,
               FrontMtx_frontTree(frontmtx),
               seed, msglvl, msgFile) ;
if ( msglvl > 1 ) {
   SolveMap_writeForHumanEye(solvemap, msgFile) ;
   fflush(msgFile) ;
}
/*------*/

/*
   --------------------------------------------------------------
   STEP 12: Redistribute the submatrices of the factors
   --------------------------------------------------------------
*/

/* Now submatrices that a processor owns are local to */
/* that processor.                                     */
FrontMtx_MPI_split(frontmtx, solvemap, stats, msglvl, msgFile,
                   firsttag, MPI_COMM_WORLD) ;
if ( msglvl > 1 ) {
   fprintf(msgFile, "\n\n numeric factorization after split") ;
   FrontMtx_writeForHumanEye(frontmtx, msgFile) ;
   fflush(msgFile) ;
}
/*------*/

/*
   --------------------------------------------------------------
   STEP 13: Create a solution DenseMtx object
   --------------------------------------------------------------
*/
ownedColumnsIV = FrontMtx_ownedColumnsIV(frontmtx, myid, ownersIV,
                                         msglvl, msgFile) ;
nmycol = IV_size(ownedColumnsIV) ;
mtxX = DenseMtx_new() ;
if ( nmycol > 0 ) {
   DenseMtx_init(mtxX, type, 0, 0, nmycol, nrhs, 1, nmycol) ;
   DenseMtx_rowIndices(mtxX, &nrow, &rowind) ;
   IVcopy(nmycol, rowind, IV_entries(ownedColumnsIV)) ;
}
/*------*/
/*
   --------------------------------------------------------------
   STEP 14: Solve the linear system
   --------------------------------------------------------------
*/
/* Very similar to the serial code */
solvemanager = SubMtxManager_new() ;
SubMtxManager_init(solvemanager, NO_LOCK, 0) ;
FrontMtx_MPI_solve(frontmtx, mtxX, mtxB, solvemanager, solvemap,
                   cpus, stats, msglvl, msgFile, firsttag,
                   MPI_COMM_WORLD) ;
SubMtxManager_free(solvemanager) ;
if ( msglvl > 1 ) {
   fprintf(msgFile, "\n solution in new ordering") ;
   DenseMtx_writeForHumanEye(mtxX, msgFile) ;
}
/*------*/

/*
   --------------------------------------------------------------
   STEP 15: Permute the solution into the original ordering and
            assemble the solution onto processor 0
   --------------------------------------------------------------
*/
DenseMtx_permuteRows(mtxX, newToOldIV) ;
if ( msglvl > 1 ) {
   fprintf(msgFile, "\n\n solution in old ordering") ;
   DenseMtx_writeForHumanEye(mtxX, msgFile) ;
   fflush(msgFile) ;
}

IV_fill(vtxmapIV, 0) ;
firsttag++ ;
mtxX = DenseMtx_MPI_splitByRows(mtxX, vtxmapIV, stats, msglvl,
                                msgFile, firsttag, MPI_COMM_WORLD) ;

/* End the timer */
endtime = MPI_Wtime() ;

/* Determine how long the solve operation took */
fprintf(stdout, "Total time for %s: %f\n", processor_name,
        endtime - starttime) ;

/* Now gather the solution onto processor 0 and write it to a file */
if ( myid == 0 ) {
   printf("%d\n", nrow) ;
   sprintf(buffer, "x.result") ;
   inputFile = fopen(buffer, "w") ;
   for ( jrow = 0 ; jrow < ncol ; jrow++ ) {
      fprintf(inputFile, "%1.5e\n", DenseMtx_entries(mtxX)[jrow]) ;
   }
   fclose(inputFile) ;
}

/*------*/

/* End the MPI environment */
MPI_Finalize() ;

/* Free up memory */
InpMtx_free(mtxA) ;
DenseMtx_free(mtxB) ;
FrontMtx_free(frontmtx) ;
DenseMtx_free(mtxX) ;
IV_free(newToOldIV) ;
IV_free(oldToNewIV) ;
ETree_free(frontETree) ;
IVL_free(symbfacIVL) ;
SubMtxManager_free(mtxmanager) ;
return(0) ;
}
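
The listing above is compiled and launched like any other MPICH program. The commands below are only a minimal sketch: the source file name p_solver.c is hypothetical, and the archive names spooles.a and spoolesMPI.a are assumed to be the serial and MPI SPOOLES libraries built during installation; the exact file names and paths depend on the local setup. With the MPICH machines file configured as earlier in this thesis, a four-process run on the cluster would look something like

   mpicc -O p_solver.c spoolesMPI.a spooles.a -lm -o p_solver
   mpirun -np 4 p_solver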

REFERENCES
[1] Message Passing Interface Forum. Message Passing Interface Forum home page. Online, October 2003. http://www.mpi-forum.org. 2, 12, 136

[2] Mathematics and Computer Science Division, Argonne National Laboratory. MPICH - a portable implementation of MPI. Online, October 2003. http://www-unix.mcs.anl.gov/mpi/mpich/. 2

[3] LAM/MPI Parallel Computing. LAM/MPI parallel computing. Online, April 2005. http://www.lam-mpi.org. 2

[4] Thomas L. Sterling, John Salmon, Donald J. Becker, and Daniel F. Savarese. How to build a Beowulf: a guide to the implementation and application of PC clusters. MIT Press, Cambridge, MA, 1999. 2

[5] Beowulf.org. Beowulf.org: The Beowulf cluster site. Online, October 2003. http://www.beowulf.org. 2

[6] Robert G. Brown. Engineering a Beowulf-style compute cluster. Online, January 2005. http://www.phy.duke.edu/~rgb/Beowulf/beowulf_book.php. 3, 60

[7] Thomas E. Anderson, David E. Culler, and David A. Patterson. Case for networks of workstations. IEEE Micro, 15(1):54—64, February 1995. 3

[8] Fedora Project. Fedora Project, sponsored by Red Hat. Online, April 2005. http://fedora.redhat.com. 7

[9] The Linux Kernel Archives. The Linux Kernel Archives. Online, May 2005. http://www.kernel.org. 7

[10] Red Hat. Red Hat Enterprise Linux documentation. Online, May 2005. http://www.redhat.com/docs/manuals/enterprise/. 8

[11] The Linux Documentation Project. Linux IP Masquerade HOWTO. Online, April 2005. http://en.tldp.org/HOWTO/IP-Masquerade-HOWTO/index.html/. 8

[12] Netfilter. The netfilter/iptables project. Online, April 2005. http://www.netfilter.org/. 8, 9


[13] Y. Rekhter, B. Moskowitz, D. Karrenberg, and G. J. de Groot. Address allocation for private internets. Online, April 2005. ftp://ftp.rfc-editor.org/in-notes/rfc1918.txt. 12

[14] Message Passing Interface Forum. MPI: A Message-Passing Interface Standard. Message Passing Interface Forum, June 1995. http://mpi-forum.org/docs/mpi-11-html/mpi-report.html. 12, 13, 144

[15] William Gropp, Ewing Lusk, Nathan Doss, and Anthony Skjellum. High-performance, portable implementation of the MPI standard. Parallel Computing, 22(6):789-829, 1996. 14

[16] William Saphir. A survey of MPI implementations. Technical Report 1, The National HPCC Software Exchange, Berkeley, CA, November 1997. 14

[17] The Linux Documentation Project. The Linux Documentation Project. Online, April 2005. http://www.tldp.org/. 14, 15

[18] William Gropp and Ewing Lusk. Installation guide to MPICH, a portable implementation of MPI. Online, September 2004. www-unix.mcs.anl.gov/mpi/mpich/docs/install/paper.htm. 16

[19] Scalable Computing Laboratory. NetPIPE. Online, June 2004. http://www.scl.ameslab.gov/netpipe/. 22

[20] William Gropp, Ewing Lusk, and Thomas Sterling. Beowulf Cluster Computing with Linux. The MIT Press, Cambridge, MA, second edition, 2003. 22

[21] Myricom. Myricom home page. Online, March 2005. http://www.myri.com/. 23, 168

[22] J. Piernas, A. Flores, and Jose M. Garcia. Analyzing the performance of MPI in a cluster of workstations based on Fast Ethernet. In Proceedings of the 4th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface, volume 1332, pages 17-24, Cracow, Poland, November 1997. Springer. 26, 31

[23] Q. Snell, A. Mikler, and J. Gustafson. NetPIPE: A network protocol independent performance evaluator. Online, June 2004. www.scl.ameslab.gov/netpipe/. 28, 30

[24] University of Tennessee Computer Science Department. HPL - a portable implementation of the High-Performance Linpack benchmark for distributed-memory computers. Online, January 2004. http://www.netlib.org/benchmark/hpl/. 31, 37, 40, 41, 42, 50

[25] R. Clint Whaley and Jack Dongarra. Automatically tuned linear algebra software. Technical Report UT-CS-97-366, University of Tennessee, December 1997. 33

[26] Kazushige Goto. High-performance BLAS. Online, October 2004. http://www.cs.utexas.edu/users/flame/goto. 34

[27] J. Dongarra, J. Du Croz, S. Hammarling, and I. Duff. A set of level 3 basic linear algebra subprograms. ACM Transactions on Mathematical Software, 16(1):1-17, March 1990. 50

[28] J. Dongarra, R. Geijn, and D. Walker. LAPACK Working Note 43: A look at scalable dense linear algebra libraries. Technical report, University of Tennessee, Knoxville, TN, 1992. 50

[29] C. Lawson, R. Hanson, D. Kincaid, and F. Krogh. Basic linear algebra subprograms for Fortran usage. ACM Transactions on Mathematical Software, 5(3):308-323, September 1979. 50

[30] Guido Dhondt and Klaus Wittig. CalculiX: A three-dimensional structural finite element program. Online, August 2003. http://www.calculix.de/. 63, 95

[31] Free Software Foundation. The GNU operating system. Online, April 2005. http://www.gnu.org/. 63

[32] Rich Lehoucq, Kristi Maschhoff, Danny Sorensen, and Chao Yang. ARPACK - Arnoldi package. Online, December 2004. http://www.caam.rice.edu/software/ARPACK. 65

[33] Boeing Phantom Works. SPOOLES 2.2: Sparse object oriented linear equations solver. Online, October 2003. http://www.netlib.org/linalg/spooles/spooles.2.2.html. 65, 153

[34] Cleve Ashcraft and Roger Grimes. Solving linear systems using SPOOLES 2.2. Online, March 2003. http://www.netlib.org/linalg/spooles/spooles.2.2.html. 122, 125, 142

[35] Cleve Ashcraft, Roger Grimes, Daniel Pierce, and David Wah. SPOOLES: An Object-Oriented Sparse Matrix Library. Boeing Phantom Works, 2002. 122, 125, 126

[36] Wikipedia. Object-oriented programming. Online, April 2005. http://en.wikipedia.org/wiki/Object-oriented. 122

[37] Cleve Ashcraft, Daniel Pierce, David Wah, and Jason Wu. The Reference Manual for SPOOLES, Release 2.2: An Object Oriented Software Library for Solving Sparse Linear Systems of Equations. Boeing Phantom Works, 1999. 122, 129, 131, 133, 138, 139, 140, 141, 142

[38] William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. Numerical Recipes in Fortran 77 - The Art of Scientific Computing. Cambridge University Press, New York, NY, second edition, 2001. 126

[39] Cleve Ashcraft, Roger Grimes, and J.G. Lewis. Accurate symmetric indefinite linear equation solvers. Technical report, Boeing Computer Services, 1995. 126

[40] Cleve Ashcraft. Ordering sparse matrices and transforming front trees. Technical report, Boeing Shared Services Group, 1999. 129, 145, 161, 162

[41] Alan George and Joseph W. Liu. Computer Solution of Large Sparse Positive Definite Systems. Prentice-Hall, Inc, Englewood Cliffs, NJ, 1981. 130, 145, 147

[42] Timothy Walsh and Leszek Demkowicz. A parallel multifrontal solver for hp-adaptive finite elements. Technical report, The Institute for Computational Engineering and Sciences, January 1999. 131

[43] Joseph W. H. Liu. Modification of the minimum-degree algorithm by multiple elimination. ACM Transactions on Mathematical Software, 11(2):141—153, 1985. 147, 149, 150

[44] Alan George and Joseph W. H. Liu. The evolution of the minimum degree ordering algorithm. SIAM Review, 31(1):1—19, 1989. 147

[45] B. Kumar, P. Sadayappan, and C.-H. Huang. On sparse matrix reordering for parallel factorization. In ICS ’94: Proceedings of the 8th international conference on Supercomputing, pages 431—438, New York, NY, 1994. ACM Press. 147, 151

[46] Alan George. Nested dissection of a regular finite element mesh. SIAM Journal on Numerical Analysis, 10:345-363, 1973. 151

[47] Cleve Ashcraft and Joseph W. H. Liu. Robust ordering of sparse matrices using multisection. SIAM Journal on Matrix Analysis and Applications, 19(3):816—832, 1997. 152

[48] George Karypis. METIS: Family of multilevel partitioning algorithms. Online, October 2004. http://www-users.cs.umn.edu/~karypis/metis/. 152

[49] I. S. Duff, Roger G. Grimes, and John G. Lewis. Sparse matrix test problems. ACM Transactions on Mathematical Software, 15(1):1-14, 1989. 152

[50] Baris Guler, Munira Hussain, Tau Leng, and Victor Mashayekhi. The advantages of diskless HPC clusters using NAS. Online, April 2005. http://www.dell.com. 169

BIOGRAPHICAL SKETCH

Paul C. Johnson was born on October 19, 1980, and grew up in Oppenheim, NY. He began his undergraduate education in the fall of 1998 and received his Bachelor of Science in Mechanical Engineering from Manhattan College in May of 2002. Paul continued his education by pursuing a Master of Science degree in Mechanical Engineering at the University of Florida. He worked in the Computational Laboratory for Electromagnetics and Solid Mechanics under the advisement of Dr. Loc Vu-Quoc. His research interests include adapting and optimizing a parallel system-of-equations solver for a cluster of workstations.
