Using Platform LSF™ HPC
Total Page:16
File Type:pdf, Size:1020Kb
Using Platform LSF™ HPC Version 7 Update 6 Release date: August 2009 Last modified: August 20, 2009 Support: [email protected] Comments to: [email protected] Copyright © 1994-2009, Platform Computing Inc. We’d like to hear from You can help us make this document better by telling us what you think of the content, you organization, and usefulness of the information. If you find an error, or just want to make a suggestion for improving this document, please address your comments to [email protected]. Your comments should pertain only to Platform documentation. For product support, contact [email protected]. Although the information in this document has been carefully reviewed, Platform Computing Inc. (“Platform”) does not warrant it to be free of errors or omissions. Platform reserves the right to make corrections, updates, revisions or changes to the information in this document. UNLESS OTHERWISE EXPRESSLY STATED BY PLATFORM, THE PROGRAM DESCRIBED IN THIS DOCUMENT IS PROVIDED “AS IS” AND WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT WILL PLATFORM COMPUTING BE LIABLE TO ANYONE FOR SPECIAL, COLLATERAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING WITHOUT LIMITATION ANY LOST PROFITS, DATA, OR SAVINGS, ARISING OUT OF THE USE OF OR INABILITY TO USE THIS PROGRAM. Document redistribution and translation This document is protected by copyright and you may not redistribute or translate it into another language, in part or in whole. Internal redistribution You may only redistribute this document internally within your organization (for example, on an intranet) provided that you continue to check the Platform Web site for updates and update your version of the documentation. You may not make it available to your organization over the Internet. Trademarks LSF is a registered trademark of Platform Computing Inc. in the United States and in other jurisdictions. PLATFORM COMPUTING, PLATFORM SYMPHONY, PLATFORM JOBSCHEDULER, PLATFORM ENTERPRISE GRID ORCHESTRATOR, PLATFORM EGO, PLATFORM VM ORCHESTRATOR, PLATFORM VMO, ACCELERATING INTELLIGENCE, and the PLATFORM and PLATFORM LSF logos are trademarks of Platform Computing Inc. in the United States and in other jurisdictions. UNIX is a registered trademark of The Open Group in the United States and in other jurisdictions. Linux is the registered trademark of Linus Torvalds in the U.S. and other countries. Microsoft is either a registered trademark or a trademark of Microsoft Corporation in the United States and/or other countries. Windows is a registered trademark of Microsoft Corporation in the United States and other countries. Macrovision, Globetrotter, and FLEXlm are registered trademarks or trademarks of Macrovision Corporation in the United States of America and/or other countries. Topspin is a registered trademark of Topspin Communications, Inc. Intel, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Other products or services mentioned in this document are identified by the trademarks or service marks of their respective owners. Third Party License Agreements http://www.platform.com/legal-notices/third-party-license-agreements Contents 1 About Platform LSF HPC . 7 What Is Platform LSF HPC? . 8 LSF HPC Components . 11 2 Running Parallel Jobs . 13 blaunch Distributed Application Framework . 14 OpenMP Jobs . 23 PVM Jobs . 24 SGI Vendor MPI Support . 25 HP Vendor MPI Support . 28 LSF HPC Generic Parallel Job Launcher Framework . 30 How the Generic PJL Framework Works . 31 Integration Method 1 . 37 Integration Method 2 . 39 Tuning PAM Scalability and Fault Tolerance . 41 Running Jobs with Task Geometry . 42 Enforcing Resource Usage Limits for Parallel Tasks . 45 Example Integration: LAM/MPI . 47 Tips for Writing PJL Wrapper Scripts . 55 Other Integration Options . 57 3 Using Platform LSF HPC with HP-UX Processor Sets . 59 About HP-UX Psets . 60 Configuring LSF HPC with HP-UX Psets . 63 Using LSF HPC with HP-UX Psets . 66 4 Using Platform LSF HPC with IBM POE . 71 Running IBM POE Jobs . 72 Migrating IBM Load Leveler Job Scripts to Use LSF Options . 79 Controlling Allocation and User Authentication for IBM POE Jobs . 86 Submitting IBM POE Jobs over InfiniBand . 89 5 Using Platform LSF HPC for Linux/QsNet . 91 About Platform LSF HPC for Linux/QsNet . 92 Using Platform LSF HPC 3 Configuring Platform LSF HPC for Linux/QsNet . 95 Operating Platform LSF HPC for Linux/QsNet . 100 Submitting and Monitoring Jobs . 103 6 Using Platform LSF HPC with SGI Cpusets . 111 About SGI cpusets . 112 Configuring LSF HPC with SGI Cpusets . 115 Using LSF HPC with SGI Cpusets . 122 Using SGI Comprehensive System Accounting facility (CSA) . 132 Using SGI User Limits Database (ULDB—IRIX only) . 134 SGI Job Container and Process Aggregate Support . 136 7 Using Platform LSF HPC with LAM/MPI . 139 About Platform LSF HPC and LAM/MPI . 140 Configuring LSF HPC to work with LAM/MPI . 142 Submitting LAM/MPI Jobs . 143 8 Using Platform LSF HPC with MPICH-GM . 145 About Platform LSF HPC and MPICH-GM . 146 Configuring LSF HPC to Work with MPICH-GM . 148 Submitting MPICH-GM Jobs . 150 Using AFS with MPICH-GM . 151 9 Using Platform LSF HPC with MPICH-P4 . 153 About Platform LSF HPC and MPICH-P4 . 154 Configuring LSF HPC to Work with MPICH-P4 . 156 Submitting MPICH-P4 Jobs . 157 10 Using Platform LSF HPC with MPICH2 . 159 About Platform LSF HPC and MPICH2 . 160 Configuring LSF HPC to Work with MPICH2 . 162 Building Parallel Jobs . 164 Submitting MPICH2 Jobs . 165 11 Using Platform LSF HPC with MVAPICH . 167 About Platform LSF HPC and MVAPICH . 168 Configuring LSF HPC to Work with MVAPICH . 170 Submitting MVAPICH Jobs . 171 12 Using Platform LSF HPC with Intel® MPI . 173 About Platform LSF HPC and the Intel® MPI Library . 174 Configuring LSF HPC to Work with Intel MPI . 176 Working with the Multi-purpose Daemon (MPD) . 177 4 Using Platform LSF HPC Submitting Intel MPI Jobs . 178 13 Using Platform LSF HPC with Open MPI . 181 About Platform LSF HPC and the Open MPI Library . 182 Configuring LSF HPC to Work with Open MPI . 184 Submitting Open MPI Jobs . 185 14 Using Platform LSF HPC Parallel Application Integrations . 187 Using LSF HPC with ANSYS . 188 Using LSF HPC with NCBI BLAST . 191 Using LSF HPC with FLUENT . 192 Using LSF HPC with Gaussian . 196 Using LSF HPC with Lion Bioscience SRS . 197 Using LSF HPC with LSTC LS-Dyna . 198 Using LSF HPC with MSC Nastran . 204 15 Using Platform LSF HPC with the Etnus TotalView® Debugger . 207 How LSF HPC Works with TotalView . 208 Running Jobs for TotalView Debugging . 210 Controlling and Monitoring Jobs Being Debugged in TotalView . 213 16 Using Platform LSF HPC with SLURM . 215 About Platform LSF HPC and SLURM . 216 Installing a New Platform LSF HPC Cluster with SLURM . 221 Configuring Platform LSF HPC with SLURM . 224 Operating Platform LSF HPC with SLURM . 229 Submitting and Monitoring Jobs with SLURM . 236 SLURM Command Reference . 245 SLURM File Reference . 247 17 pam Command Reference . 253 SYNOPSIS . 254 DESCRIPTION . 254 OPTIONS . 255 EXIT STATUS . 257 SEE ALSO . 257 Index . 259 Using Platform LSF HPC 5 6 Using Platform LSF HPC C HAPTER 1 About Platform LSF HPC Contents ◆ “What Is Platform LSF HPC?” on page 8 ◆ “LSF HPC Components” on page 11 Using Platform LSF HPC 7 What Is Platform LSF HPC? Platform LSF™ HPC (“LSF HPC”) is the distributed workload management solution for maximizing the performance of High Performance Computing (HPC) clusters. Platform LSF HPC is fully integrated with Platform LSF, the industry standard workload management software product, to provide load sharing in a distributed system and batch scheduling for compute-intensive jobs. Platform LSF HPC provides support for: ◆ Dynamic resource discovery and allocation (resource reservation) for parallel batch job execution ◆ Full job-level control of the distributed processes to ensure no processes will become un-managed. This effectively reduces the possibility of one parallel job causing severe disruption to an organization's computer service ◆ The standard MPI interface ◆ Full integration with Platform LSF, providing heterogeneous resource-based batch job scheduling including job-level resource usage enforcement Advanced HPC scheduling policies Platform LSF HPC enhances the job management capability of your cluster through advanced scheduling policies such as: ◆ Policy-based job preemption ◆ Advance reservation ◆ Memory and processor reservation ◆ Memory and processor backfill ◆ Cluster-wide resource allocation limits ◆ User and project-based fairshare scheduling ◆ Topology-aware scheduling LSF daemons Run on every node to collect resource information such as processor load, memory availability, interconnect states, and other host-specific as well as cluster-wide resources. These agents coordinate.