Overview of Cluster GPU Software Stack

GPU Login

● ssh glogin.dragon.kaust.edu.sa
● First login auto-generates keys & ssh config
  – .ssh/config
      Host glogin #GPU login nodes
        Hostname glogin.dragon.kaust.edu.sa
        User $USER
        IdentityFile ~/.ssh/ksl-internal
        StrictHostKeyChecking no
        ForwardX11 yes
        ForwardX11Trusted yes

wiki.dragon.kaust.edu.sa/wiki/Tutorial0200LoggingIn

GPU Software: Modules

● Modules
  – Customized to login node (GPU, Intel, AMD)
  – New & improved GPU App Stack is being built
      ● Expect changes. Make requests. Stay connected.
  – Prefer newest modules
      ● legacy will be deprecated
      ● Some modules might not be GPU optimized

      module avail
      module load module/version

● CUDA
      MATLAB anaconda gromacs ansys tensorflow anaconda-R relion avizo
      beagle-lib schrodinger cst biobuilds torch lammps CST vmd
      mathematica cuda medea

● OpenGL / EGL*
      vis/ParaView* adf caffe pymol adf ansys MATLAB anaconda CST
      python-canopy atk anaconda-base eman2 qiime avizo anaconda-R
      GAUSSVIEW R comsol anaconda3 genometools rstudio cst gamma ATK
      gnuplot schrodinger mathematica bandage molden smrtanalysis medea
      baps openbabel virtualgl molcas biobuilds vmd tecplot turbomole
      bluefish vesta xcrysden

  * NVIDIA EGL support coming to Cluster in future rollout...
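The advice to prefer the newest modules can be scripted when several versions of one module are installed. The following is a minimal sketch, not part of the cluster's tooling. It assumes module names follow the name/version pattern shown in `module load module/version`. The list it uses is a hypothetical stand-in for names copied from `module avail`, and `version_key` and `newest_module` are illustrative helper names.

```python
import re


def version_key(module_name):
    """Return the numeric parts of the version, e.g. 'gcc/6.4.0' -> (6, 4, 0)."""
    _, _, version = module_name.partition("/")
    return tuple(int(part) for part in re.findall(r"\d+", version))


def newest_module(available, name):
    """Pick the highest-versioned 'name/version' entry, or None if absent."""
    candidates = [m for m in available if m.partition("/")[0] == name]
    if not candidates:
        return None
    return max(candidates, key=version_key)


# Hypothetical listing; real names come from `module avail` on the login node.
available = ["cuda/7.5.18", "cuda/8.0.44", "cuda/10.0.130", "gcc/6.4.0"]
print(newest_module(available, "cuda"))
```

Comparing versions as tuples of integers ranks 10.0.130 above 8.0.44, which a plain string sort would get wrong. Names that carry extra numbers after the version (for example a bundled library version) would need a stricter pattern than the one assumed here.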
GPU Jobs + Constraints

● sinfo --partition=batch --format="%n %f" | fgrep gpu
      dgpu501-22-r  cpu_intel_e5_2670,gpu,...,tesla_k40m
      dgpu502-01-l  cpu_intel_e5_2670,gpu,...,tesla_k20m
      dgpu702-16    cpu_intel_e5_2699_v3,gpu,...,gtx1080ti
      dgpu703-01    cpu_intel_e5_2699_v3,gpu,...,p100
      dgpu703-25    cpu_intel_e5_2699_v3,gpu,...,p6000
● srun --pty --time=1:00 --gres=gpu:p100:2 bash -l
● sbatch --time=1:00:00 --gres=gpu:1 --constraint="[p100|p6000]" runjob.sbat
● sbatch --time=1:00:00 runjob.sbat
  – runjob.sbat
      #SBATCH --job-name=gpujob
      #SBATCH --gres=gpu:gtx1080ti:4
      #SBATCH --constraint="[local_500G]"
      #SBATCH --nodes=2 --ntasks-per-node=2

wiki.dragon.kaust.edu.sa/wiki/FAQConstraints
wiki.dragon.kaust.edu.sa/wiki/FAQConstraints#GPUs

GPU Software: Modules & Compilers

● CMake
  – module load cmake
● C++
  – System default: GCC v4.8.5
  – module load gcc/6.4.0
  – module load legacy intel/2017
● CUDA
  – module load cuda/8.0.44
  – nvcc -std=c++11 -o example example.cu
● CUDNN
  – module load applications-extra
    module load cuda/8.0.44-cudNN5.1
  – nvcc -std=c++11 -o example example.cu

GPU Apps

● tensorflow/1.3.0
  – cudatoolkit=8.0, cudnn6.0.21, python=3.6.2
  – module load tensorflow/1.3.0
  – python
    >>> import tensorflow as tf

GPU Tools

● General Information (not scalable)
  – nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------|
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX TIT...  On   | 0000:0D:00.0     Off |                  N/A |
| 37%   56C    P2   153W / 189W |    135MiB /  6081MiB |     86%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX TIT...  On   | 0000:0E:00.0     Off |                  N/A |
| 31%   47C    P8    34W / 189W |      2MiB /  6082MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0     72633    C   ../../build.cudnntraining.teneen/trainlenet    133MiB |
+-----------------------------------------------------------------------------+

KSL provides profiling training...
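The nvidia-smi table above is laid out for reading by eye. For scripted checks, nvidia-smi also has a CSV query mode (`--query-gpu=... --format=csv,noheader,nounits`). The sketch below assumes that mode works with the installed driver and that every queried field comes back numeric. The sample rows are written by hand from the values in the table above, not captured output, and `parse_gpu_csv` and `query_gpus` are illustrative helper names.

```python
import subprocess

QUERY = "index,name,utilization.gpu,memory.used,memory.total"


def parse_gpu_csv(text):
    """Parse 'csv,noheader,nounits' rows into a list of dicts."""
    gpus = []
    for line in text.strip().splitlines():
        index, name, util, used, total = [f.strip() for f in line.split(",")]
        gpus.append({
            "index": int(index),
            "name": name,
            "util_pct": int(util),
            "mem_used_mib": int(used),
            "mem_total_mib": int(total),
        })
    return gpus


def query_gpus():
    """Run nvidia-smi on the current node (only works on a GPU node)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + QUERY, "--format=csv,noheader,nounits"],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


# Sample rows written by hand from the table above (names are truncated there too).
sample = """\
0, GeForce GTX TIT..., 86, 135, 6081
1, GeForce GTX TIT..., 0, 2, 6082
"""
idle = [g["index"] for g in parse_gpu_csv(sample) if g["util_pct"] == 0]
print(idle)
```

On a GPU node, `query_gpus()` would replace the hand-written sample. Keeping the parser separate from the subprocess call makes it possible to check the parsing logic on a login node without a GPU.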
