Bullx Cluster Suite Administrator's Guide
Total Page:16
File Type:pdf, Size:1020Kb
bullx cluster suite Administrator's Guide extreme computing REFERENCE 86 A2 20FA 02 extreme computing bullx cluster suite Administrator's Guide Software July 2009 BULL CEDOC 357 AVENUE PATTON B.P.20845 49008 ANGERS CEDEX 01 FRANCE REFERENCE 86 A2 20FA 02 The following copyright notice protects this book under Copyright laws which prohibit such actions as, but not limited to, copying, distributing, modifying, and making derivative works. Copyright © Bull SAS 2009 Printed in France Trademarks and Acknowledgements We acknowledge the rights of the proprietors of the trademarks mentioned in this manual. All brand names and software and hardware product names are subject to trademark and/or patent protection. Quoting of brand and product names is for information purposes only and does not represent trademark misuse. The information in this document is subject to change without notice. Bull will not be liable for errors contained herein, or for incidental or consequential damages in connection with the use of this material. Preface Scope and Objectives The purpose of this guide is to explain how to configure and manage Bull extreme computing clusters, using the administration tools recommended by Bull. It is not in the scope of this guide to describe Linux administration functions in depth. For this information, please refer to the standard Linux distribution documentation. Intended Readers This guide is for administrators of bullx cluster suite system. Prerequisites The installation of all hardware and software components must have been completed. Structure This guide is organized as follows: Chapter 1. Lists Bull’s Cluster Management functions and products. Chapter 2. Initial Configuration Tasks Describes some basic configuration tasks including password management, security settings and running parallel commands. Chapter 3. Cluster Database Management Describes the commands and the tools which enable the administrator to display and to change the Cluster Database. Chapter 4. Software Deployment Describes how to use KSIS to deploy, manage, modify and check software images. Chapter 5. Kerberos – Network Authentication Protocol Describes how to set up and use Kerberos for cluster security. Chapter 6. Storage Devices Management Explains how to setup the management environment for storage devices, and how to use storage management services. Chapter 7. Parallel File Systems Explains how these file systems operate on a Bull HPC system. It describes in detail how to install, configure and manage the Lustre file system. Preface i Chapter 8. Resource Management Explains how the SLURM Resource Manager works to help ensure the optimal management of cluster resources. Chapter 9. Batch Management with PBS Professional Describes post installation checks and some useful commands for the PBS Professional Batch Manager. Chapter 10. Monitoring Describes the Bull System Manager - HPC Edition monitoring tool for Bull HPC systems. Glossary Some of the acronyms and glossary terms for bullx cluster suite are detailed in the Glossary Bibliography Refer to the manuals included on the documentation CD delivered with your system OR download the latest manuals for your bullx cluster suite release, and for your cluster hardware, from: http://support.bull.com/ The bullx cluster suite Documentation CD-ROM (86 A2 12FB) includes the following manuals: • bullx cluster suite Installation and Configuration Guide (86 A2 19FA) • bullx cluster suite Administrator’s Guide (86 A2 20FA) • bullx cluster suite User's Guide (86 A2 22FA) • bullx cluster suite Maintenance Guide (86 A2 24FA) • bullx cluster suite Application Tuning Guide (86 A2 23FA) • bullx cluster suite High Availability Guide (86 A2 25FA) • InfiniBand Guide (86 A2 42FD) • LDAP Authentication Guide (86 A2 41FD) The following document is delivered separately: • The Software Release Bulletin (SRB) (86 A2 73EJ) mportant The Software Release Bulletin contains the latest information for your delivery. This should be read first. Contact your support representative for more information. For Bull System Manager, refer to the Bull System Manager documentation suite. For clusters which use the PBS Professional Batch Manager: • PBS Professional 10.0 Administrator’s Guide (on the PBS Professional CD-ROM) • PBS Professional 10.0 User’s Guide (on the PBS Professional CD-ROM) ii bullx cluster suite - Administrator's Guide For clusters which use LSF: • LSF Installation and Configuration Guide (86 A2 39FB) (on the LSF CD-ROM) • Installing Platform LSF on UNIX and Linux (on the LSF CD-ROM) For clusters which include the Bull Cool Cabinet: • Site Preparation Guide (86 A1 40FA) • R@ck'nRoll & R@ck-to-Build Installation and Service Guide (86 A1 17FA) • Cool Cabinet Installation Guide (86 A1 20EV) • Cool Cabinet Console User's Guide (86 A1 41FA) • Cool Cabinet Service Guide (86 A7 42FA) Highlighting • Commands entered by the user are in a frame in ‘Courier’ font, as shown below: mkdir /var/lib/newdir • System messages displayed on the screen are in ‘Courier New’ font between 2 dotted lines, as shown below. Enter the number for the path : • Values to be entered in by the user are in ‘Courier New’, for example: COM1 • Commands, files, directories and other items whose names are predefined by the system are in ‘Bold’, as shown below: The /etc/sysconfig/dump file. • The use of Italics identifies publications, chapters, sections, figures, and tables that are referenced. • < > identifies parameters to be supplied by the user, for example: <node_name> WARNING A Warning notice indicates an action that could cause damage to a program, device, system, or data. Preface iii iv bullx cluster suite - Administrator's Guide Table of Contents Chapter 1. Cluster Management Functions and Corresponding Products................. 1-1 Chapter 2. Initial Configuration Tasks ................................................................ 2-1 2.1 Configuring Services ....................................................................................................... 2-1 2.2 Modifying Passwords and Creating Users .......................................................................... 2-2 2.3 Configuring Security........................................................................................................ 2-3 2.3.1 Setting up SSH ...................................................................................................... 2-3 2.4 Running Parallel Commands with pdsh .............................................................................. 2-4 2.4.1 Using pdsh............................................................................................................ 2-4 2.4.2 Using pdcp ........................................................................................................... 2-7 2.4.3 Using dshbak ........................................................................................................ 2-7 2.5 Day to Day Maintenance Operations ................................................................................ 2-9 Chapter 3. Cluster Database Management......................................................... 3-1 3.1 Architecture of ClusterDB.................................................................................................. 3-1 3.2 ClusterDB Administrator ................................................................................................... 3-2 3.3 Using Commands............................................................................................................ 3-2 3.3.1 ChangeOwnerProperties......................................................................................... 3-3 3.3.2 dbmConfig............................................................................................................ 3-5 3.3.3 dbmCluster............................................................................................................ 3-7 3.3.4 dbmNode ............................................................................................................. 3-8 3.3.5 dbmHwManager ................................................................................................. 3-11 3.3.6 dbmGroup .......................................................................................................... 3-12 3.3.7 dbmEthernet ........................................................................................................ 3-15 3.3.8 dbmIconnect........................................................................................................ 3-16 3.3.9 dbmTalim............................................................................................................ 3-17 3.3.10 dbmSerial ........................................................................................................... 3-18 3.3.11 dbmFiberChannel ................................................................................................ 3-20 3.3.12 dbmServices........................................................................................................ 3-21 3.3.13 dbmDiskArray ..................................................................................................... 3-22 3.4 Managing the ClusterDB ................................................................................................ 3-24 3.4.1 Saving and Restoring the Database........................................................................ 3-24 3.4.2 Starting and Stopping PostgreSQL ......................................................................... 3-26 3.4.3 Viewing the PostgreSQL