HPC User Guide
Supercomputing Wales HPC User Guide
Version 2.0
March 2020

Table of Contents

Glossary of terms used in this document

1 An Introduction to the User Guide

2 About SCW Systems
  2.1 Cardiff HPC System - About Hawk
    2.1.1 System Specification:
    2.1.2 Partitions:
  2.2 Swansea HPC System - About Sunbird
    2.2.1 System Specification:
    2.2.2 Partitions:
  2.3 Cardiff Data Analytics Platform - About Sparrow (Coming soon)

3 Registration & Access
  3.1 Gaining Access for Users & Projects
    3.1.1 Applying for a user account
    3.1.2 Accessing the systems
    3.1.3 Applying for a new project
    3.1.4 Joining an existing project
  3.2 Terms & Conditions
  3.3 Accessing the Systems from the Internet
    3.3.1 Let’s Go!
    3.3.2 SSH Address Details
    3.3.3 Logging in from Windows
    3.3.4 Logging in to the Cardiff System
    3.3.5 Logging in to the Swansea System (Sunbird)
    3.3.6 Logging in from Linux/Mac
    3.3.7 Entering your Password

4 Using the Systems
  4.1 Command Line Interface & Software Access
    4.1.1 Modules
    4.1.2 Can I Pre-Load Modules in .myenv?
    4.1.3 Customising Bash
    4.1.4 Can I use a Different Shell?
    4.1.5 Changing your Password
  4.2 Running Jobs with Slurm
    4.2.1 Submitting a Job
    4.2.2 Monitoring Jobs
    4.2.3 Killing a Job
    4.2.4 Completed Jobs
    4.2.5 Specifying which project is running the job
    4.2.6 Example Jobs
  4.3 Slurm Job Directives, Variables & Partitions
    4.3.1 Job Runtime Environment in Slurm
    4.3.2 #SBATCH Directives
    4.3.3 Environment Variables
    4.3.4 System Queues and Partitions in Slurm
  4.4 Using GPUs
    4.4.1 GPU Usage
    4.4.2 CUDA Versions & Hardware Differences
    4.4.3 GPU Compute Modes
  4.5 Using the AMD EPYC Systems
    4.5.1 What are the AMD EPYC nodes?
    4.5.2 Using the AMD EPYC nodes
    4.5.3 Optimisation Options in the Compiler
    4.5.4 Use of the Intel Math Kernel Library (MKL)
    4.5.5 OpenMP
    4.5.6 Finding the Incompatibles
  4.6 Data Storage
    4.6.1 Types of Storage
    4.6.2 Checking Storage Quotas
    4.6.3 Shared Areas
  4.7 Conduct & Best Practice

5 Advanced Use
  5.1 MPI and/or OpenMP Jobs
    5.1.1 MPI
    5.1.2 OpenMP
    5.1.3 MPI + OpenMP
  5.2 Batch Submission of Serial Tasks
    5.2.1 Shell process control
    5.2.2 GNU Parallel and Slurm’s srun command
    5.2.3 Multi-Threaded Tasks
  5.3 Job Arrays
    5.3.1 Submission
    5.3.2 Monitoring