Scyld Clusterware Documentation Release 7.6.2
Total Page:16
File Type:pdf, Size:1020Kb
Scyld ClusterWare Documentation Release 7.6.2 Penguin Computing June 11, 2019 CONTENTS 1 Release Notes 1 1.1 Release Notes: Scyld ClusterWare Release v7.6.2-762g0000......................1 1.1.1 About This Release.......................................1 Important: Recommend using /usr/sbin/install-scyld script............1 Important for clusters using 3rd-party drivers or applications..................1 Important for clusters using Panasas storage...........................2 1.1.2 First Installation of Scyld ClusterWare 7 On A Server.....................2 1.1.3 Upgrading An Earlier Release of Scyld ClusterWare 7 to 7.6.................3 1.1.4 Post-Installation Configuration Issues.............................4 Resolve *.rpmnew and *.rpmsave configuration file differences................4 Disable SELinux and NetworkManager.............................4 Edit /etc/beowulf/conf.d/sysctl.conf as needed.................5 Optionally reduce size of /usr/lib/locale/locale-archive.....................5 Optionally configure and enable compute node CPU speed/power management........5 Optionally install a different TORQUE package.........................5 Optionally enable job manager..................................6 Optionally enable TORQUE scheduler..............................7 Optionally enable Ganglia monitoring tool...........................7 Optionally enable NFS locking..................................7 Optionally adjust the size limit for locked memory.......................7 Optionally increase the max number of processes per user...................8 Optionally enable SSHD on compute nodes...........................8 Optionally allow IP Forwarding.................................8 Optionally increase the nf_conntrack table size.........................8 Optionally configure vm.zone_reclaim_mode on compute nodes................9 Optionally configure automount on compute nodes.......................9 Optionally reconfigure node names...............................9 1.1.5 Post-Installation Configuration Issues For Large Clusters................... 10 Optionally increase the number of nfsd threads......................... 10 Optionally increase the max number of processID values.................... 10 Optionally increase the max number of open files........................ 10 Issues with Ganglia........................................ 11 1.1.6 Post-Installation Release of Updated Packages......................... 11 1.1.7 Notable Feature Enhancements And Bug Fixes........................ 11 v7.6.2 - June 11, 2019...................................... 11 v7.6.1 - April 8, 2019....................................... 11 v7.6.0 - December 14, 2018................................... 12 v7.5.4 - October 2, 2018..................................... 12 v7.5.3 - July 27, 2018...................................... 13 v7.5.2 - July 14, 2018...................................... 14 i v7.5.1 - June 4, 2018....................................... 14 v7.5.0 - June 4, 2018....................................... 15 v7.4.6 - October 3, 2018..................................... 16 v7.4.5 - March 23, 2018..................................... 16 v7.4.4 - February 8, 2018..................................... 17 v7.4.3 - January 11, 2018..................................... 17 v7.4.2 - December 26, 2017................................... 18 v7.4.1 - October 31, 2017.................................... 18 v7.4.0 - October 11, 2017.................................... 18 v7.4.0 - October 6, 2017..................................... 18 v7.3.7 - October 13, 2017.................................... 19 v7.3.6 - August 7, 2017...................................... 19 v7.3.5 - June 28, 2017...................................... 20 v7.3.4 - June 16, 2017...................................... 20 v7.3.3 - April 27, 2017...................................... 20 v7.3.2 - April 3, 2017....................................... 21 v7.3.1 - March 8, 2017...................................... 21 v7.3.0 - January 20, 2017..................................... 21 v7.2.0 - November 14, 2016................................... 22 1.1.8 Known Issues And Workarounds................................ 22 Issues with bootmodule firmware................................ 22 Kernel panic using non-invpcid old Intel nodes......................... 22 Managing environment modules .version files.......................... 22 Installing and managing concurrent versions of packages.................... 24 Issues with OpenMPI....................................... 24 Issues with Scyld ClusterWare process migration in heterogeneous clusters.......... 25 Issues with MVAPICH2 and mpirun_rsh or mpispawn..................... 26 Issues with ptrace......................................... 26 Issues with IP Forwarding.................................... 26 Issues with kernel modules.................................... 26 Issues with port numbers..................................... 27 Issues with Spanning Tree Protocol and portfast......................... 27 Issues with Gdk.......................................... 27 Caution when modifying Scyld ClusterWare scripts....................... 28 Caution using tools that modify config files touched by Scyld ClusterWare........... 28 Running nscd service on master node may cause kickbackdaemon to misbehave........ 28 Beofdisk does not support local disks without partition tables................. 29 Issues with bproc and the getpid() syscall............................ 29 2 Installation Guide 30 2.1 Preface.................................................. 30 2.2 Scyld ClusterWare System Overview.................................. 30 2.3 Recommended Components....................................... 31 2.4 Assembling the Cluster.......................................... 32 2.5 Software Components.......................................... 32 2.6 Quick Start Installation.......................................... 33 2.6.1 Introduction........................................... 33 2.6.2 Network Interface Configuration................................ 33 Cluster Public Network Interface................................. 33 Cluster Private Network Interface................................ 34 2.6.3 Network Security Settings................................... 35 2.6.4 Red Hat RHEL7 or CentOS7 Installation........................... 36 2.6.5 Scyld ClusterWare Installation................................. 36 Configure Yum To Support ClusterWare............................. 36 ii Install ClusterWare........................................ 36 2.6.6 Trusted Devices......................................... 37 2.6.7 Compute Nodes......................................... 38 2.7 Detailed Installation Instructions..................................... 39 2.7.1 Red Hat Installation Specifics................................. 39 Network Interface Configuration................................. 39 Network Security Settings.................................... 44 Package Group Selection..................................... 46 2.7.2 Updating Red Hat or CentOs Installation............................ 47 Updating Using Yum....................................... 47 Updating Using Media...................................... 48 2.7.3 Scyld ClusterWare Installation................................. 48 Configure Yum To Support ClusterWare............................. 48 Install ClusterWare........................................ 48 2.7.4 Trusted Devices......................................... 49 2.7.5 Enabling Access to External License Servers......................... 50 2.7.6 Post-Installation Configuration................................. 51 2.7.7 Scyld ClusterWare Updates................................... 51 Updating ClusterWare...................................... 51 2.8 Cluster Verification Procedures..................................... 51 2.8.1 Monitoring Cluster Status.................................... 52 bpstat............................................... 52 BeoStatus............................................. 52 2.8.2 Running Jobs Across the Cluster................................ 53 Directed Execution with bpsh.................................. 54 Dynamic Execution with beorun and mpprun.......................... 54 2.9 Troubleshooting ClusterWare...................................... 54 2.9.1 Failing PXE Network Boot................................... 54 2.9.2 Mixed Uni-Processor and SMP Cluster Nodes......................... 56 2.9.3 IP Forwarding.......................................... 56 2.9.4 SSH Traffic........................................... 56 2.9.5 Device Driver Updates..................................... 56 2.9.6 Finding Further Information.................................. 56 2.10 Compute Node Disk Partitioning.................................... 57 2.10.1 Architectural Overview..................................... 57 2.10.2 Operational Overview...................................... 57 2.10.3 Disk Partitioning Procedures.................................. 57 Typical Partitioning........................................ 58 Default Partitioning........................................ 58 Generalized, User-Specified Partitions.............................. 59 Unique Partitions......................................... 59 2.10.4 Mapping Compute Node Partitions............................... 59 2.11 Changes to Configuration Files..................................... 60 2.11.1 Changes to Red Hat Configuration Files............................ 60 2.11.2 Possible Changes to ClusterWare Configuration