Introduction to High Performance Computing at Case Western Reserve University
Research Computing and CyberInfrastructure team
Presenters: Emily Dragowsky, Daniel Balagué Guardia, Hadrian Djohari, Sanjaya Gajurel
KSL Data Center

Bootcamp Outline
• Who we are
• Case HPC resources
• Working with the Cluster
• Basic Linux
• Job Scripting
• Open Discussion/Q&A

Who we are
• Research Computing and CyberInfrastructure Team (5th floor, around back from the elevators)
• [U]TECH
• University staff, academic ties
• CWRU grads
• Research group members
• Skilled practitioners
• Strong collaboration with the Network, Servers and Storage teams

RCCI Training Sessions: Spring 2018 (Toepfer Room, Adelbert Hall, 2nd floor)
• 2/6/18 - Advancing with Software Installation
• 2/20/18 - Working with Data Transfer
• 3/6/18 - Exploring Amazon Web Services for Researchers
• 3/21/18 - Advancing with the Linux Shell
• 4/3/18 - Advancing with Matlab

RCCI Services
• High Performance Computing
• Research Networking
• Research Storage and Archival solutions
• Secure Research Environment for computing on regulated data
• Public Cloud and Off-Premise Services

Cyberinfrastructure Support
• Education and Awareness
• Consultation and Award Pre-support services
• Database Design
• Visualization
• Programming Services
• Concierge for off-premise services (XSEDE, OSC, AWS)

CASE HPC Cluster
• Designed for computationally intensive jobs: long-running, number crunching
• Optimized for batch jobs: combine resources as needed (cpu, memory, gpu)
• Supports interactive/graphically intensive jobs
• OS version emphasizes stability: Linux (Red Hat Enterprise Linux 7.4)
• Accessible from Linux, Mac and Windows
• Some level of Linux expertise is needed, which is why we're here today!
• Clusters: redcat (slurm) and hadoop

HPC Cluster Glossary
• Head Nodes: development, analysis, job submission
• Compute Nodes: the machines that run the computations
• Panasas: engineered file system, fastest storage
• DELL Fluid File System: "Value" storage
• Data Transfer Nodes: hpctransfer, dtn1
• Science DMZ: lowest-"resistance" data pathway
• SLURM: cluster workload manager & scheduler

HPC Cluster Components
[Diagram: the redcat.case.edu resource manager (SLURM master) connects the admin nodes, head nodes and data transfer nodes to the Dell FFS and Panasas storage and to the batch, GPU and SMP compute nodes; external traffic enters through the University firewall and the Science DMZ]

Working on the Cluster
How To:
~ access the cluster
~ get my data onto the cluster
~ establish interactive sessions
<break>
~ submit jobs through the scheduler
~ monitor jobs (a.k.a. why is my job not running!?!?!)
~ work with others within the cluster

Access the Cluster
You can log in from anywhere. You will need:
• An approved cluster account
• Your CaseID and the Single Sign-On password
• An ssh (secure shell) utility [detailed instructions for all platforms]
• We recommend the x2go client; PuTTY or Cygwin (Windows) and Terminal (Mac/Linux) will work for non-graphical sessions.
If you are at an off-campus location, connect through the VPN, using two-factor authentication. Case Guest wireless counts as "off-campus".
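For a plain terminal session the whole login is one ssh command. A minimal sketch, assuming the hpc2.case.edu head node used in the scp examples later in this deck and a hypothetical CaseID abc123; substitute your own CaseID and the current head-node name from the RCCI documentation:

# log in to a cluster head node with your CaseID and SSO password
ssh [email protected]

# add -X to forward X11 graphics when you need lightweight GUI output
ssh -X [email protected]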
Access the Cluster
[Diagram: owner / group / other permissions]

Working within Group Allocations - I
• What are Linux groups?
  • Manage affiliations in the multiuser environment
  • Set "in-between" permissions
• Groups are administered — contact [email protected]
• Switching the active group: "newgrp - <groupname>"

[mrd20@hpc3 ~] groups
tas35 oscsys gaussian hpcadmin schrodinger ccm4 singularity
[mrd20@hpc3 ~] newgrp - hpcadmin
[mrd20@hpc3 ~] groups
hpcadmin oscsys gaussian tas35 schrodinger ccm4 singularity

The active Linux group affects the group ownership of new files.

Access the Cluster Graphically w/ x2goclient

Transfer Data: the scp command
scp [-12346BCpqrv] [-c cipher] [-F ssh_config] [-i identity_file] [-l limit] [-o ssh_option] [-P port] [-S program] [[user@]host1:]file1 ... [[user@]host2:]file2
• Copy from HPC to your local PC:
  scp -r [email protected]:/home/mrd20/data/vLOS.dat .
  '.' — the full stop means 'this directory'
• From your PC to HPC:
  scp orange.py mrd20@redcat:
  ':' — the colon denotes a hostname

Transfer Data: Globus
Setup instructions: https://sites.google.com/a/case.edu/hpc-upgraded-cluster/home/important-notes-for-new-users/transferring-files

HPC Environment: Your Full Cluster Resources
Your HPC account, sponsored by your PI, provides:
• Group affiliation — resources shared amongst group members
• Storage:
  • /home — permanent storage, replicated & "snapshot" protected
  • /scratch/pbsjobs — up to 1 TB of temporary storage
  • /scratch/users — small-scale temporary storage
  ➡ exceeding your quota(s) will prevent you from using the account!
• Cores: a member group's allocation is 32+ for an "8-share"
• Wall time: 320-hour limit for member shares (32 hours for guest shares)

HPC Environment: Your /home
• Allocated storage space in the HPC filesystem for your work
• Create subdirectories underneath your /home/<CaseID>; ideally each job has its own subdirectory
cd — the Linux command to change the current directory. Examples that change to "home":
‣ cd /home/<CaseID>
‣ cd ~<CaseID>
‣ cd $HOME
$HOME is an environment variable that points to /home/<CaseID>

HPC Environment: Beyond /home
Linux systems have a hierarchical directory structure:
• User files: /home
• System files: /bin, /dev, /etc, /log, /opt, /var
• Application files: /usr/local/<module>/<version>
Consider Python: 4 versions installed
‣ /bin/python — 2.6.6
‣ /usr/local/python/ — 2.7.8, 2.7.10, 3.5.2
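A quick check of which of these Python versions the shell picks up by default. This is a sketch against the Redcat-era paths listed above; on Rider, as the following slides explain, /usr/local/python does not exist and the module commands are used instead:

which python             # the first python found on $PATH (here /bin/python)
python --version         # reports the system version (2.6.6 in the listing above)
ls /usr/local/python/    # the locally installed versions (2.7.8, 2.7.10, 3.5.2 here)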
Module Hierarchies

HPC Environment: Environment Variables on Rider
Keeping organized:
‣ echo $PATH
/usr/local/intel-17/openmpi/2.0.1/bin:/usr/local/intel/17/compilers_and_libraries_2017/linux/bin/intel64:/usr/lib64/qt-3.3/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/dell/srvadmin/bin
‣ echo $LD_LIBRARY_PATH
/usr/local/intel-17/openmpi/2.0.1/lib:/usr/local/intel/17/tbb/lib/intel64/gcc4.7:/usr/local/intel/17/compilers_and_libraries_2017/linux/mkl/lib/intel64:/usr/local/intel/17/compilers_and_libraries_2017/linux/lib/intel64::/usr/local/lib
Changes: the slurm commands live in /usr/bin, so they are no longer referenced separately in the PATH, etc.

Modules and Environment
The module command: avail, spider, list, load, unload
Manage the environment necessary to run your applications (binaries, libraries, shortcuts).
Modify environment variables using module commands:
>> module avail & module spider — learn what is available and how to load it
>> module list (shows the modules loaded in your environment)
>> module load python (loads the default version)
>> module load python/3.5.1 (loads a specific version)
>> module unload python/3.5.1 (unloads a specific version)

Modules and Environment
On Rider, you might need to load a particular version of a compiler and OpenMPI in order to find your module.
• module avail
  Rider: shows the list of the currently loadable modules of a hierarchy; it also shows, visually, which modules are loaded.
  Redcat: shows the list of all installed modules and versions (almost all modules are loadable as a result of the plain system).
• module spider
  Rider: shows the list of all modules with descriptions and how to load them.
  Redcat: shows the list of all modules.
• module spider <pkg>
  Rider: shows the module's description, including instructions on how to load it.

Modules and Environment: Redcat
Redcat Linux uses a branching directory structure — independence
• User files: /home
• System files: /bin, /dev, /etc, /log, /opt, /var
• Application files: /usr/local/<module>/<version>
Consider Python: 4 versions installed
‣ /bin/python — 2.6.6
‣ /usr/local/python/ — 2.7.8, 2.7.10, 3.5.2 — how were they compiled?

Modules and Environment: Rider
Lua module hierarchies — independence & accountability
• Core — persistent, independent: no run-time dependence on other packages
• Compilers — either intel/17 or gcc/6.3.0
• MPI — currently openmpi/2.0.1 with each compiler option
Directory structure changes:
• compiler directory trees [these hold the compiler files & executables]
  /usr/local/<compiler>/<version>, e.g. /usr/local/intel/17
• compiled packages
  /usr/local/<compiler>-<version>/<package>/<version>, e.g. /usr/local/intel-17/openmpi/2.0.1

Modules and Environment: Rider
Consider Python: what versions are installed/available?
‣ Which hierarchy is active? 'module avail'
‣ which python: /bin/python — 2.7.5
‣ /usr/local/python/ — no such file or directory (run-time dependencies)
‣ 'module avail python'
  python/3.5.1  python2/2.7.13  spyder/3.2.0-python2
‣ 'module spider python' — information about the python package
  python/3.5.1
  other possible module matches: python2
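Putting the hierarchy together: a minimal sketch of loading a compiler, its MPI build, and then an application module on Rider, using the versions named above (run 'module avail' first to confirm what is current on your cluster):

module load intel/17         # pick a compiler hierarchy (gcc/6.3.0 is the alternative)
module load openmpi/2.0.1    # the MPI built against that compiler
module load python/3.5.1     # now visible inside the intel/17 + openmpi/2.0.1 hierarchy
module list                  # confirm what is loaded
which python                 # should now resolve to the module's python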
Intel + OpenMPI Hierarchy
[mrd20@hpclogin ~]$ module avail

---------------- /usr/local/share/modulefiles/MPI/intel/17/openmpi/2.0.1 ----------------
amber/16-17    eigen3/3.3.4    hdf5/1.10.1           lammps/2017 (D)  netcdf/4.4.1.1  python/3.5.1    relion/2.1.b1  vtk/8.0.1
bcftools/1.5   fftw/3.3.6-pl2  imagemagick/7.0.4-10  namd/2.12-cuda   neuron/7.5      python2/2.7.13  samtools/1.5
boost/1.63     grace/5.1.25    lammps/2017-gpu       namd/2.12 (D)    openfoam/4.1    qtgrace/0.26    spyder/3.2.0-python2
---------------------------------------------------
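The bootcamp outline closes with job scripting, and the same module-loading pattern is what goes inside a batch job. A minimal SLURM script sketch; the job name, directory, script name and resource numbers are hypothetical placeholders, and the requested wall time must stay within the share limits described earlier:

#!/bin/bash
#SBATCH --job-name=my_analysis      # hypothetical job name
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --mem=8G
#SBATCH --time=02:00:00             # well under the 320-hour member-share limit

module load intel/17                # same hierarchy as the interactive example above
module load openmpi/2.0.1
module load python/3.5.1

cd $HOME/my_analysis                # hypothetical per-job subdirectory under /home/<CaseID>
python my_script.py                 # hypothetical application script

Save it as, say, my_job.slurm, submit it with 'sbatch my_job.slurm', and watch it with 'squeue -u <CaseID>'.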