Supercomputing

Accessing CESGA Finis Terrae II

Finis Terrae II

- 4x Login nodes: 2x Intel Xeon E5-2680v3 (12c/24t), 128GB RAM, 2x 1TB local drive
- 306x Thin nodes (as the login nodes, plus): 1TB local drive
- 4x GPU nodes (as a login node, plus): 2x NVIDIA Tesla K80 GPUs
- 2x Xeon Phi nodes (as a login node, plus): 2x Intel Xeon Phi 7120P (61c)
- 1x FAT node: 8x Intel Xeon E7-8867v3 (16c/32t), 4TB RAM, ~30TB SAS storage

The nodes are interconnected with Infiniband FDR @ 56 Gbps and share a LUSTRE parallel filesystem; the login nodes provide Internet access.

First time connection

# First, let's create a public/private key pair to avoid using the password every time
$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/[my-local-user]/.ssh/id_rsa): [ ... ] [1]

# Now, authorize this key to access my Finis Terrae II account
$ ssh-copy-id [my-user]@ft2.cesga.es
[my-user]@ft2.cesga.es's password:

# Done! Now let's copy our project to the supercomputer to run it there
$ scp -r myproject/ [my-user]@ft2.cesga.es:~/
# No more password!

# Just reconnect to do our work there
$ ssh [my-user]@ft2.cesga.es
[[my-user]@fs6803 ~]$ cd myproject/
[[my-user]@fs6803 myproject]$ [ ... ]

[1] It is recommended to set a passphrase to protect your private key. Consider using a key agent (e.g., ssh-agent) to keep the unlocked key in memory so you do not have to type the passphrase every time.

Working in a supercomputer

Supercomputers often have a module system to manage their installed software (compilers, libraries, applications, multiple versions...):

$ gcc --version
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16)

vs

$ module load gcc/6.3.0
$ gcc --version
gcc (GCC) 6.3.0

Other useful commands are: module unload <module>, module avail, module list, module help [<module>]...

Also, be aware of the different storage options available in Finis Terrae II:

Name   | Purpose                                                           | Location | Size   | Speed
Home   | General-purpose filesystem; store code, test results...          | $HOME    | 10 GB  | NFS @ ~100 MB/s
Store  | Store big final computation results.                             | $STORE   | 100 GB | NFS @ ~100 MB/s
Lustre | High-speed parallel filesystem for big temporary simulation data. | $LUSTRE  | 3 TB   | LUSTRE @ ~20 GB/s

Queuing jobs

To execute on a compute node, the use of the queue manager is required. The simplest way is to ask for an interactive session (see compute --help for more details):

[username@fs6803 ~]$ compute
[ ... ]
salloc: Granted job allocation <jobid>
[username@c6601 ~]$ [ ... ]   # Do your compilation or small executions

To enqueue a simple script

The simplest way is to use the sbatch command. Prepare a simple script:

#!/bin/sh
#SBATCH --ntasks 1            # Tasks to allocate (processes)
#SBATCH --cpus-per-task 8     # Cores per task to allocate (useful to guarantee full node utilization)
#SBATCH --nodes 1             # Nodes to allocate
#SBATCH --partition thinnodes # Partition (or list of partitions) for the job to run on
#SBATCH --time 00:30:00       # Wall time to allocate for the task (30 minutes)
srun [<srun options>] <your executable>

Then submit it with sbatch:

$ sbatch ./simple_script.sh
Submitted batch job <jobid>

Slurm

The queue management system is called Slurm, and it is widely used in HPC centers. All queued jobs are identified by a unique job id.
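As an illustrative extension of the simple script above, a job can load the modules it needs and write large temporary results to the Lustre scratch space instead of $HOME. The following is only a sketch: the program name my_simulation and its output option are hypothetical, while the partition, module name and core count come from the examples in this document.

#!/bin/sh
#SBATCH --ntasks 1                 # One process
#SBATCH --cpus-per-task 24         # A full thin node (2x 12 cores)
#SBATCH --nodes 1
#SBATCH --partition thinnodes
#SBATCH --time 02:00:00            # 2 hours of wall time

module load gcc/6.3.0              # Load the toolchain the program needs

# Keep big temporary output on the fast Lustre filesystem, not in $HOME
OUTDIR=$LUSTRE/my_simulation_output
mkdir -p "$OUTDIR"

srun ./my_simulation --output "$OUTDIR"   # Hypothetical executable and option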
The final output of a queued execution will be written in the job's working directory to a file named slurm-<jobid>.out.

To query the status of a queued job, you can use the squeue command:

$ squeue -j <jobid>      # Query a specific job
$ squeue -u <username>   # Query all jobs for a given user

You can also cancel any previously queued task with the scancel command:

$ scancel <jobid>           # Cancel a specific job
$ scancel --state PENDING   # Cancel all your pending jobs

Other useful commands are:

$ scontrol show job <jobid> -dd   # Check details about a queued job
$ sinfo                           # View information about Slurm nodes
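Putting the previous commands together, a typical submit-and-monitor workflow might look like the sketch below. The --parsable option makes sbatch print just the job id; it is a standard Slurm flag, but check sbatch --help on the system if in doubt.

$ jobid=$(sbatch --parsable ./simple_script.sh)   # Submit and capture the job id
$ squeue -j "$jobid"                              # Check whether it is pending or running
$ scancel "$jobid"                                # ...or cancel it if something went wrong
$ cat slurm-${jobid}.out                          # Inspect the output once the job finishes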
