GPU Login

● ssh glogin.dragon.kaust.edu.sa
● First login auto-generates keys & ssh config
  – .ssh/config

    # GPU login nodes
    Host glogin
        Hostname glogin.dragon.kaust.edu.sa
        User $USER
        IdentityFile ~/.ssh/ksl-internal
        StrictHostKeyChecking no
        ForwardX11 yes
        ForwardX11Trusted yes

wiki.dragon.kaust.edu.sa/wiki/Tutorial0200LoggingIn

GPU Software: Modules

● Modules
  – Customized to login node (GPU, Intel, AMD)
  – New & improved GPU App Stack is being built

● Expect changes. Make requests. Stay connected.
  – Prefer newest modules
    ● legacy will be deprecated
● Some modules might not be GPU optimized

    module avail
    module load module/version

● CUDA
    MATLAB, anaconda, ansys, tensorflow, anaconda-R, relion, avizo,
    beagle-lib, schrodinger, cst, biobuilds, torch, CST, vmd,
    mathematica, cuda, medea

● OpenGL / EGL*
    vis/ParaView*, adf, caffe, adf, ansys, MATLAB, anaconda, CST,
    python-canopy, atk, anaconda-base, eman2, qiime, avizo, anaconda-R,
    GAUSSVIEW, R, comsol, anaconda3, genometools, rstudio, cst, gamma,
    ATK, gnuplot, schrodinger, mathematica, bandage, smrtanalysis, medea,
    baps, openbabel, virtualgl, molcas, biobuilds, vmd, tecplot,
    bluefish, vesta, xcrysden

* NVIDIA EGL support is coming to the cluster in a future rollout.

GPU Jobs + Constraints

● sinfo --partition=batch --format="%n %f" | fgrep gpu

    dgpu501-22-r cpu_intel_e5_2670,gpu,...,tesla_k40m
    dgpu502-01-l cpu_intel_e5_2670,gpu,...,tesla_k20m
    dgpu702-16   cpu_intel_e5_2699_v3,gpu,...,gtx1080ti
    dgpu703-01   cpu_intel_e5_2699_v3,gpu,...,p100
    dgpu703-25   cpu_intel_e5_2699_v3,gpu,...,p6000
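To pick out the nodes that offer one particular GPU type, the node/feature listing can be filtered on an exact feature name. The sketch below is illustrative only: the function name nodes_with_feature is made up for this example, and the sample lines are the ones from the slide, including the "..." where features were elided.

```shell
# Print the names of nodes whose feature list contains the given feature.
# Reads "<node> <comma-separated features>" lines on stdin, i.e. the layout
# produced by: sinfo --partition=batch --format="%n %f"
nodes_with_feature() {
    awk -v want="$1" '{
        n = split($2, feats, ",")
        for (i = 1; i <= n; i++)
            if (feats[i] == want) { print $1; break }
    }'
}

# Sample lines copied from the slide; "..." marks features elided there.
sample='dgpu501-22-r cpu_intel_e5_2670,gpu,...,tesla_k40m
dgpu502-01-l cpu_intel_e5_2670,gpu,...,tesla_k20m
dgpu702-16 cpu_intel_e5_2699_v3,gpu,...,gtx1080ti
dgpu703-01 cpu_intel_e5_2699_v3,gpu,...,p100
dgpu703-25 cpu_intel_e5_2699_v3,gpu,...,p6000'

printf '%s\n' "$sample" | nodes_with_feature p100
# prints: dgpu703-01
```

On the cluster the same function would be fed from the live command: sinfo --partition=batch --format="%n %f" | nodes_with_feature p100. Matching whole feature names (rather than substrings) keeps p100 from also matching p6000-style names.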

wiki.dragon.kaust.edu.sa/wiki/FAQConstraints

● srun --pty --time=1:00 --gres=gpu:p100:2 bash -l
● sbatch --time=1:00:00 --gres=gpu:1 --constraint="[p100|p6000]" runjob.sbat
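Note that the two --time values above are in different Slurm formats: a two-field value is read as minutes:seconds and a three-field value as hours:minutes:seconds, so 1:00 requests one minute and 1:00:00 requests one hour. The helper below is a small sketch for checking a value before submitting. The name slurm_time_to_seconds is made up for this example, and it deliberately rejects the days-... forms that Slurm also accepts.

```shell
# Convert a Slurm --time value to seconds.
# Handles "minutes", "minutes:seconds" and "hours:minutes:seconds" only.
slurm_time_to_seconds() {
    printf '%s\n' "$1" | awk -F: '
        /-/     { exit 1 }                              # days-... forms not handled
        NF == 1 { print $1 * 60; next }                 # minutes
        NF == 2 { print $1 * 60 + $2; next }            # minutes:seconds
        NF == 3 { print $1 * 3600 + $2 * 60 + $3; next } # hours:minutes:seconds
                { exit 1 }'
}

slurm_time_to_seconds 1:00      # prints: 60
slurm_time_to_seconds 1:00:00   # prints: 3600
```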

wiki.dragon.kaust.edu.sa/wiki/FAQConstraints#GPUs

● sbatch --time=1:00:00 runjob.sbat
● runjob.sbat

    #SBATCH --job-name=gpujob
    #SBATCH --gres=gpu:gtx1080ti:4
    #SBATCH --constraint="[local_500G]"
    #SBATCH --nodes=2 --ntasks-per-node=2
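Written out as a complete file, runjob.sbat might look like the sketch below. The #SBATCH directives are the ones from the slide, with the GPU type written as gtx1080ti to match the sinfo feature list. The body is an assumption added for illustration, since the slide does not show what the job runs; it only reports where the job landed. CUDA_VISIBLE_DEVICES is usually set by Slurm when GPUs are requested through --gres, but that depends on the site configuration.

```shell
#!/bin/bash
#SBATCH --job-name=gpujob
#SBATCH --gres=gpu:gtx1080ti:4
#SBATCH --constraint="[local_500G]"
#SBATCH --nodes=2 --ntasks-per-node=2

# Illustrative body (not from the slide): report where the job is running.
# Outside a Slurm job the SLURM_* variables are unset, so use placeholders.
job_id=${SLURM_JOB_ID:-none}
node_list=${SLURM_JOB_NODELIST:-$(uname -n)}
gpus=${CUDA_VISIBLE_DEVICES:-unset}

echo "job=$job_id nodes=$node_list gpus=$gpus"
```

Because the directives are ordinary shell comments, the file can also be run directly with bash for a quick syntax check before it is submitted with sbatch.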

wiki.dragon.kaust.edu.sa/wiki/FAQConstraints#GPUs

GPU Software: Modules & Compilers

● CMake
  – module load cmake
● C++
  – System default: GCC v4.8.5
  – module load gcc/6.4.0
  – module load legacy intel/2017

● CUDA
  – module load cuda/8.0.44
  – nvcc -std=c++11 -o example example.cu
● CUDNN
  – module load applications-extra
    module load cuda/8.0.44-cudNN5.1
  – nvcc -std=c++11 -o example example.cu

GPU Apps
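Before moving to packaged applications, the nvcc command from the CUDA bullet can be tried on a minimal source file. The sketch below writes a hypothetical example.cu (the slide does not show its contents) into a scratch directory and compiles it only if nvcc is on the PATH, for instance after module load cuda/8.0.44.

```shell
# Write a minimal CUDA source file into a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/example.cu" <<'EOF'
#include <cstdio>

// Each of the 4 threads in the block prints its own index.
__global__ void hello() {
    printf("hello from GPU thread %d\n", threadIdx.x);
}

int main() {
    hello<<<1, 4>>>();
    cudaDeviceSynchronize();  // wait for the kernel so its output is flushed
    return 0;
}
EOF

# Compile with the flags from the slide, but only where nvcc is available.
if command -v nvcc >/dev/null 2>&1; then
    nvcc -std=c++11 -o "$workdir/example" "$workdir/example.cu"
    status="compiled"
else
    status="skipped"
    echo "nvcc not found; load a CUDA module first" >&2
fi
```

The resulting binary needs a GPU to run, so it should be started inside a job (for example through srun with a --gres=gpu request) rather than on a node without one.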

● tensorflow/1.3.0
  – cudatoolkit=8.0, cudnn=6.0.21, python=3.6.2
  – module load tensorflow/1.3.0
  – python
    >>> import tensorflow as tf

GPU Tools

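● General Information (not scalable)
  – nvidia-smi

    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  GeForce GTX TIT...  On   | 0000:0D:00.0     Off |                  N/A |
    | 37%   56C    P2   153W / 189W |    135MiB /  6081MiB |     86%      Default |
    +-------------------------------+----------------------+----------------------+
    |   1  GeForce GTX TIT...  On   | 0000:0E:00.0     Off |                  N/A |
    | 31%   47C    P8    34W / 189W |      2MiB /  6082MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                       GPU Memory |
    |  GPU       PID  Type  Process name                               Usage      |
    |=============================================================================|
    |    0     72633    C   ../../build.cudnntraining.teneen/trainlenet    133MiB |
    +-----------------------------------------------------------------------------+

KSL provides profiling training...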
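For use in scripts, nvidia-smi can also print selected fields as CSV, for example with --query-gpu=index,name,utilization.gpu,memory.used,memory.total --format=csv,noheader,nounits. The sketch below filters such output for idle GPUs. The function name idle_gpus is made up for this example, and the sample lines are an assumption: they restate the two GPUs from the table above in that CSV layout, with the GPU name left truncated as on the slide.

```shell
# Print the index of every GPU whose utilization is 0%.
# Expects CSV lines: index, name, utilization.gpu, memory.used, memory.total
idle_gpus() {
    awk -F', ' '$3 + 0 == 0 { print $1 }'
}

# Assumed CSV form of the two GPUs shown in the nvidia-smi table.
sample='0, GeForce GTX TIT..., 86, 135, 6081
1, GeForce GTX TIT..., 0, 2, 6082'

printf '%s\n' "$sample" | idle_gpus
# prints: 1
```

On a GPU node the function would read from the live query instead of the sample text.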