Requesting Resources on an HPC Facility

(Using the Sun Grid Engine Job Scheduler)

Deniz Savas

dsavas.staff.shef.ac.uk/teaching

June 2017

Outline

1. Using the Job Scheduler
2. Interactive Jobs
3. Batch Jobs
4. Task arrays
5. Running Parallel Jobs
6. GPUs and remote Visualisation
7. Beyond Iceberg: Accessing the N8 tier 2 facility

Running Jobs: A note on interactive jobs

• Software that requires intensive computing should be run on the worker nodes and not the head node.
• You should run compute intensive interactive jobs on the worker nodes by using the qsh or qrsh command.
• Maximum (and also default) time limit for interactive jobs is 8 hours.

Sun Grid Engine

• The two iceberg or sharc headnodes are gateways to the cluster of worker nodes.
• The headnodes' main purpose is to allow access to the worker nodes for the logged-in users.
• All CPU intensive computations must be performed on the worker nodes. This is achieved by using one of the following two commands on the headnode.

– qsh or qrsh : to start an interactive session on a worker node.
– qsub : to submit a batch job to the cluster.

• Once you log into iceberg, you are recommended to start working interactively on a worker node by simply typing qsh and working in the new shell window that is opened. The next set of slides assume that you are already working on one of the worker nodes (qsh session).

Practice Session 1: Running Applications on Iceberg (Problem 1)

• Case study: analysis of patient inflammation data
• Running an R application: how to submit jobs and run R interactively
• List available and loaded modules; load the module for the R package
• Start the R application and plot the inflammation data

Other Methods of Submitting Batch Jobs on the Sheffield HPC Clusters

Iceberg has a number of home-grown commands for submitting jobs for some of the most popular applications and packages to the batch system. These commands create suitable scripts and submit the user's job to the cluster automatically.

These are: runfluent, runansys, runmatlab and runabaqus.

To get information on how to use these commands, simply issue the command name on a worker node without any parameters.

Exercise 1: Submit a job via qsub

• Create a script file (named example.sh) using a text editor such as gedit, vi or emacs, and enter the following lines:

#!/bin/sh
#
echo "This code is running on"
/bin/hostname
/bin/date

• Now submit this script to SGE using the qsub command:

qsub example.sh
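SGE prints a job number when the script is accepted. While the job is waiting or running it can be queried with qstat, and once it has finished its screen output appears in files named after the script and the job number (see the section on output files later). A minimal sketch, assuming the reported job number was 12345:

qstat -j 12345
cat example.sh.o12345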

Tutorials

On iceberg, copy the contents of the tutorial directory to your user area into a directory named sge:

cp -r /usr/local/courses/sge sge
cd sge

In this directory the file readme.txt contains all the instructions necessary to perform the exercises.

Managing Your Jobs: Sun Grid Engine Overview

SGE is the resource management, job scheduling and batch control system. (Others are available, such as PBS, Torque/Maui and Platform LSF.)
• Starts up interactive jobs on available workers
• Schedules all batch oriented (i.e. non-interactive) jobs
• Attempts to create a fair-share environment
• Optimizes resource utilization

(Diagram: scheduling 'qsub' batch jobs on the cluster. The SGE master node holds the queues, policies, priorities, share/tickets, resources and user/project information, and places submitted jobs into job slots on the worker nodes belonging to Queue-A, Queue-B and Queue-C.)

Working with SGE as a user

Although the SGE system contains many commands and utilities, most of them are for the administration of the scheduling system only. The following list of SGE commands will be sufficient for most users; a typical sequence is sketched below the list.
– qsub : submits a batch job
– qsh or qrsh : starts an interactive session
– qstat : queries the progress of the jobs
– qdel : removes unwanted jobs
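A typical sequence, shown here as a minimal sketch (myjob.sh and the job number are placeholders):

qsub myjob.sh        # submit the job; SGE reports its job number
qstat -u $USER       # check whether it is waiting (qw) or running (r)
qdel 3067843         # remove the job if it is no longer needed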

Running interactive jobs on the cluster

1. The user asks to run an interactive job (qsh, qrsh).
2. SGE checks to see if there are resources available to start the job immediately (i.e. a free worker).
– If so, the interactive session is started under the control/monitoring of SGE on that worker.
– If resources are not available the request is simply rejected and the user notified. This is because, by its very nature, a user cannot wait for an interactive session to start.
3. The user terminates the job by typing exit or logout, or the job is terminated when the queue limits are reached (i.e. currently after 8 hours of wall-clock time usage).

Demonstration 1

Running Jobs: batch job example
Using the R package to analyse patient data

qsub example:

qsub -l h_rt=10:00:00 -o myoutputfile -j y myjob

Or, alternatively, the first few lines of the submit script myjob contain:

#!/bin/bash
#$ -l h_rt=10:00:00
#$ -o myoutputfile
#$ -j y

and you simply type: qsub myjob

Summary table of useful SGE commands

Command(s), description and intended user:
– qsub, qresub, qmon : submit batch jobs (USER)
– qsh, qrsh : submit interactive jobs (USER)
– qstat, qhost, qdel, qmon : status of queues and jobs in queues, list of execute nodes, remove jobs from queues (USER)
– qacct, qmon, qalter, qdel, qmod : monitor/manage accounts, queues, jobs etc. (SYSTEM ADMIN)

Using the qsub command to submit batch jobs

In its simplest form, any script file can be submitted to the SGE batch queue by simply typing qsub scriptfile. In this way the scriptfile is queued to be executed by SGE under default conditions and using a default amount of resources. Such use is not always desirable, as the default conditions provided may not be appropriate for that job. Also, providing a good estimate of the amount of resources needed helps SGE to schedule the tasks more efficiently. There are two alternative mechanisms for specifying the environment and resources:

1) Via parameters to the qsub command
2) Via special SGE comments (#$) in the script file that is submitted.

The meaning of the parameters is the same for both methods, and they control such things as:

– CPU time required
– number of processors needed (for multi-processor jobs)
– output file names
– notification of job activity

Method 1: Using qsub command-line parameters

Format: qsub [qsub_params] script_file [-- script_arguments]

Examples:
qsub myjob
qsub -cwd $HOME/myfolder1
qsub -l h_rt=00:05:00 myjob -- test1 -large

Note that the last example passes parameters to the script file following the -- token.

Method 2: Special comments in script files

A script file is a file containing a set of commands written in a scripting language (usually Bourne/Bash or C-shell). When the job runs, these script files are executed as if their contents were typed at the keyboard. In a script file any line beginning with # will normally be treated as a comment line and ignored. However, SGE treats the comment lines in the submitted script which start with the special sequence #$ in a special way: SGE expects to find declarations of the qsub options in these comment lines. At the time of job submission SGE determines the job resources from these comment lines. If there are any conflicts between the actual qsub command-line parameters and the special comment (#$) SGE options, the command-line parameters always override the #$ options specified in the script.

An example script containing SGE options

#!/bin/sh
# A simple job script for Sun Grid Engine.
#
#$ -l h_rt=01:00:00
#$ -m be
#$ -M [email protected]
benchtest < inputfile > myresults

More examples of #$ options in a scriptfile

#!/bin/csh
# Force the shell to be the C-shell
# On iceberg the default shell is the bash-shell
#$ -S /bin/csh

# Request 8 GBytes of virtual memory
#$ -l mem=8G

# Specify myresults as the output file
#$ -o myresults

# Compile the program
pgf90 test.for -o mytestprog

# Run the program and read the data that the program
# would have read from the keyboard from file mydata
mytestprog < mydata

Running Jobs: qsub and qsh options

-l h_rt=hh:mm:ss : The wall clock time. This parameter must be specified; failure to include it will result in the error message "Error: no suitable queues". Current default is 8 hours.
-l arch=intel* or -l arch=amd* : Force SGE to select either Intel or AMD architecture nodes. No need to use this parameter unless the code has a processor dependency.
-l mem=memory : Sets the virtual-memory limit, e.g. -l mem=10G (for parallel jobs this is per processor and not total). Current default if not specified is 6 GB.
-l rmem=memory : Sets the limit of real memory required. Current default is 2 GB. Note: the rmem parameter must always be less than mem.
-help : Prints a list of options.

-pe ompigige np, -pe openmpi-ib np, -pe openmp np : Specifies the parallel environment to be used; np is the number of processors required for the parallel job. (A short example combining several of these options is sketched below.)
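For example, several of these resource options can be combined on a single qsub command line (the times, sizes and script names are only illustrative):

qsub -l h_rt=08:00:00 -l mem=8G -l rmem=4G myjob.sh
qsub -pe openmp 4 -l h_rt=01:00:00 myompjob.sh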

Running Jobs: qsub and qsh options (continued)

-N jobname : By default a job's name is constructed from the job-script-file-name and the job-id that is allocated to the job by SGE. This option defines the jobname. Make sure it is unique, because the job output files are constructed from the jobname.
-o output_file : Output is directed to a named file. Make sure not to overwrite important files by accident.
-j y : Join the standard output and standard error output streams. Recommended.
-m [bea], -M email-address : Sends emails about the progress of the job to the specified email address. If used, both -m and -M must be specified. Select any or all of b, e and a to imply emailing when the job begins, ends or aborts.
-P project_name : Runs a job using the specified project's allocation of resources.
-S shell : Use the specified shell to interpret the script rather than the default bash shell. Use with care. A better option is to specify the shell in the first line of the job script, e.g. #!/bin/bash.
-V : Export all environment variables currently in effect to the job.

Qsub options (notifications and testing related)

-M email_address : Email address for job notifications. Example: -M [email protected]

-m b e a s : Send email(s) when the job begins, ends, is aborted or is suspended. Example: -m be

-now : Start running the job now or, if it cannot be run, exit with an error code.

-verify : Do not submit the job, but check and report on the submission.
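For example, a quick way to check what a script requests without actually queueing it (using the example.sh script from earlier):

qsub -verify example.sh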

Qsub options (output files and job names related)

When a job is submitted it is given a unique job_number. Also, by default, the name of the script file is used as the jobname. When the job starts running, the standard output and error outputs are sent to files named jobname.ojob_number and jobname.ejob_number respectively. For example:

myscript.o45612
myscript.e45612

The -o, -e, -j y and -N name parameters modify this behaviour.

Example relating to job output files

Passed to qsub as arguments during submission:

qsub -N walldesign -j y walljob

OR insert in the submit script file walljob:

#!/bin/bash
#$ -N walldesign
#$ -j y
/home/data/my_app

and submit the job: qsub walljob

Using either of these methods, when the job runs, both the normal and error output will be contained in a file named walldesign.onnnnn, where nnnnn is the unique job number SGE designated to your job at the time of submission.

More on starting interactive jobs: qsh and qrsh commands

• qsh : starts an interactive session on a worker node. Use this command if you have XWindows capabilities.

• qrsh : starts a remote command shell and optionally executes a shell script on a worker node. If you do not have XWindows capability, i.e. you are not using Exceed, Cygwin and so on, this is the way to start interactive jobs on iceberg. It is suitable when you log in using putty or ssh in line mode.

In Sheffield all interactive jobs are put into the short queues, which limit the job to 8 hours of wall-clock time. BEWARE: as soon as the time limit is exceeded, the job will terminate without any warning.

More on the qrsh command

qrsh [parameters]

• If no parameters are given it behaves exactly like qsh. This is the normal method of using qrsh.

• If there are parameters, a remote shell is started up on one of the workers and the parameters are passed to that shell for execution. For example, if a script file name is presented as a parameter, the commands in the script file are executed and the job terminates when the end of the script file is reached. Example: qrsh myscript

More on the qsh command

• qsh -display display_specifier

qsh starts up an X-terminal within which the interactive job is started. It is possible to pass any xterm parameters via the -- construct. Example: qsh -- -title myjob1. Type man xterm for a list of parameters.

• Qsh : this is a home-produced variation of the qsh command. It passes suitable -display parameters to qsh to produce a more pleasant-looking command window.

Monitoring your jobs

A submitted job will either be:
1. still waiting in the queue,
2. executing, or
3. finished execution and gone from the SGE scheduling system.

In order to monitor the progress of your job while in states (1) and (2), use the qstat or Qstat commands, which will tell you whether the job is still waiting or has started executing. The qstat command gives information about all jobs, whereas Qstat gives information about your jobs alone.

While executing (state 2): use qstat -j job_number to monitor the job's status, including time and memory consumption. This contains too much information! Better still, use qstat -j job_number | grep mem, which will give just the time and memory consumed. Also use tail -f job_output_filename to see the latest output from the job.

Finished executing (state 3): qacct is the only command that may be able to tell you about past jobs, by referring to a database of past usage. Output file names will contain the job number, so qacct -j job_number should give some information.

Monitoring your job

• If you are interested in only your own jobs, use Qstat:

job-ID   prior    name        user   state  submit/start at      queue            slots  ja-task-ID
3067843  0.50859  INTERACTIV  cs1ds  r      10/21/2010 09:03:05  [email protected]  1
3076264  0.50500  INTERACTIV  cs1ds  r      10/21/2010 16:37:37  [email protected]  1

• If you want to see all the jobs, use qstat:

job-ID   prior    name        user      state  submit/start at      queue             slots
3064625  0.57695  grd80x120x  cpp06kw   r      10/15/2010 02:35:43  [email protected].   1
3064818  0.56896  M11_NiCl.3  chp09bwh  r      10/15/2010 15:44:44  [email protected]   4
3065270  0.56657  pythonjob   co1afh    r      10/16/2010 13:34:33  [email protected].   1
3065336  0.56025  parallelNe  elp05ws   r      10/16/2010 13:34:33  [email protected].   1
3065340  0.56025  parallelNe  elp05ws   r      10/16/2010 15:31:51  [email protected].   1
3065558  0.55060  coaxialjet  zzp09hw   r      10/17/2010 14:03:53  [email protected]   17
3065934  0.54558  periodichi  mep09ww   r      10/18/2010 08:39:06  [email protected]   8
3066207  0.53523  gav1        me1ccl    r      10/18/2010 20:00:16  [email protected].   1
3066213  0.53510  ga2         me1ccl    r      10/18/2010 20:00:26  [email protected].   1
3066224  0.53645  DDNaca0     mep09ww   r      10/18/2010 23:41:56  [email protected]   12
3066226  0.53493  ga3         me1ccl    r      10/18/2010 20:00:46  [email protected].   1
3066231  0.53491  ga4         me1ccl    r      10/18/2010 20:00:46  [email protected].   1
3066415  0.53078  job         elp07dc   r      10/19/2010 09:25:24  [email protected].   32
3066896  0.52323  fluent12jo  fc1jz     qw     10/19/2010 05:32:01
3067083  0.52222  Oct_ATLAS   php09ajf  qw     10/19/2010 17:41:01

qstat command

• The qstat command will list all the jobs in the system that are either waiting to be run or running. This can be a very long list!
• qstat -f : full listing (even longer)
• qstat -u username or Qstat (recommended!)
• qstat -f -u username : detailed information

The status of a job is indicated by letters in qstat listings as follows:
qw waiting, t transferring, r running, s/S suspended, R restarted, T threshold

Deleting Jobs with the qdel command

The qdel command will remove from the queue specified jobs that are waiting to run, or abort jobs that are already running.
• Individual job: qdel 15112

• List of jobs: qdel 15154 15923 15012

• All jobs running or queueing under a given username: qdel -u username

Reasons for Job Failures

– SGE cannot find the binary file specified in the job script
– One of the environment resource limits is exceeded (see the command ulimit -a)
– Required input files are missing from the startup directory
– You have exceeded your quota and the job fails when trying to write to a file (use the quota command to check usage)
– An environment variable is not set (LM_LICENSE_FILE etc.)
– Hardware failure

Job Arrays

By using a single qsub command, it is possible to submit a series of jobs that use the same job template. These jobs are described as array jobs. For example:

qsub myanalysis -t 1-10

will submit the script named myanalysis as 10 separate jobs. Of course it is pointless to run the same job 10 times; the only justification for doing so is that all these jobs are doing different tasks. This is where a special environment variable named SGE_TASK_ID becomes essential. In the above example, in each job the variable SGE_TASK_ID will contain a unique number between 1 and 10 to differentiate these jobs from each other. Thus we can use this variable's value to control each job in a different manner. Please note that there is no guarantee about the order of execution of these tasks, i.e. there is no guarantee that task number m will start before task number n, where m < n.

#$ -S /bin/tcsh
#$ -l h_cpu=01:00:00
#$ -t 2-16:2
#$ -cwd
myprog > results.$SGE_TASK_ID < mydata.$SGE_TASK_ID

This will run 8 jobs. The jobs are considered to be independent of each other and hence may run in parallel, depending on the availability of resources. Note that the tasks will be numbered 2, 4, 6, 8, ... (steps of 2). For example, task 8 will read its data from file mydata.8 and write its output to file results.8. It is possible to make jobs dependent on each other, so as to impose an order of execution, by means of the -hold_jid parameter, as sketched below.
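A minimal sketch of such a dependency (the job names prep and analyse and the script names are placeholders; -hold_jid also accepts a job number):

qsub -N prep prepare_data.sh
qsub -hold_jid prep -N analyse analyse_data.sh

Here the second job stays in the waiting state until the job named prep has finished.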

Practice Session: Submitting A Task Array To Iceberg (Problem 4)

• Case study: fish population simulation
• Submitting jobs to Sun Grid Engine
• Instructions are in the readme file in the sge folder of the course examples
– From an interactive session:
• Run the SGE task array example
• Run test4, test5

An example OpenMP job script

OpenMP programming takes advantage of the multiple CPUs that reside in a single computer to distribute work amongst CPUs that share the same memory. Currently we have a maximum of 8 CPUs per computer, and therefore only up to 8 processors can be requested for an iceberg OpenMP job. After the next upgrade this figure will increase to a minimum of 24.

#$ -pe openmp 4
#$ -l h_rt=01:30:00
OMP_NUM_THREADS=4 ./myprog
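A slightly fuller sketch of the same idea, using the $NSLOTS variable (set by SGE to the number of slots granted, see the environment variables section later) so that the thread count always matches the request; myprog is a placeholder:

#!/bin/bash
#$ -pe openmp 4
#$ -l h_rt=01:30:00
export OMP_NUM_THREADS=$NSLOTS
./myprog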

An example MPI job script

MPI programs are harder to code, but can take advantage of interconnected multiple computers by passing messages between them (MPI = Message Passing Interface). 23 workers in the iceberg pool are connected together with fast Infiniband communications cabling to provide up to 10 Gbit/s data transfer rates between them. The rest of the workers can communicate with each other via the normal 1 Gbit/s ethernet cables.

#$ -pe mvapich2-ib 4
# limit run to 1 hour of actual clock time
#$ -l h_rt=1:00:00
mpirun_rsh -rsh -np $NSLOTS -hostfile $TMPDIR/machines ./executable

Managing Jobs: Running cpu-parallel jobs

• Many-processor tasks fall into two types:
– shared memory
– distributed memory
• The parallel environment needed for a job is specified by the -pe env np parameter of the qsub command, where env is one of the following:
– openmp : shared memory OpenMP jobs, which must therefore run on a single node using its multiple processors.

– openmpi-ib : OpenMPI library over Infiniband. These are MPI jobs running on multiple hosts using the Infiniband connection (32 Gbit/s).

– mvapich2-ib : MVAPICH library over Infiniband. As above, but using the MVAPICH MPI library.
• Compilers that support MPI: PGI, Intel, GNU

Running GPU parallel jobs

GPU parallel processing is supported on 8 Nvidia Tesla Fermi M2070s GPU units attached to iceberg.
• In order to use the GPU hardware you will need to join the GPU project by emailing [email protected]
• You can then submit jobs that use the GPU facilities by using the following three parameters to the qsub command:

-P gpu
-l arch=intel*
-l gpu=nn

where 1 <= nn <= 8 is the number of GPU modules to be used by the job. -P specifies the project that you belong to (see the next slide). A sketch of a complete GPU job script follows.
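A minimal sketch of a GPU batch script built from these parameters (the one-hour run time and the executable name my_gpu_prog are placeholders):

#!/bin/bash
#$ -P gpu
#$ -l arch=intel*
#$ -l gpu=1
#$ -l h_rt=01:00:00
./my_gpu_prog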

Demonstration 3: Running a parallel job

• Test 6 provides an opportunity to practise submitting parallel jobs to the scheduler.
• To run testmpi6, compile the MPI example:
– Load the openmpi compiler module: module load mpi/intel/openmpi/1.8.3
– Compile the diffuse program: mpicc diffuse.c -o diffuse
– qsub testmpi6
– Use qstat to monitor the job and examine the output

The progress of your batch job

• The user submits a batch job as described above, e.g. qsub myscript_file
• The job is placed in the queue and given a unique job number
• The user is informed immediately of the job number
• The user can check the progress of the job by using the qstat command. The status of the job is shown as qw (waiting), t (transferring) or r (running)
• The user can abort a job by using the qdel command at this stage
• When the job runs, the standard output and error messages are placed in files named jobname.ojob_number and jobname.ejob_number respectively

Hints

• Once you have prepared your job script you can test it by simply running it on its own for a very small or trivial problem. For example, if your script is called analyse.sh you simply type ./analyse.sh. This will immediately expose any errors in your script. Note that the qsub parameters defined using the #$ sequence will be treated as comments during this run.
• Q: Should I define the qsub parameters in the script file or as parameters at the time of issuing qsub?
A: The choice is yours. I prefer to define any parameter that is not likely to alter between runs within the script file, to save myself having to remember it at each submission.

SGE related Environment Variables

Apart from the specific environment variables passed via the -v or -V options, during the execution of a batch job the following environment variables are also available to help build unique or customised filenames, messages etc. (a short example follows the list below).

• $HOME : your own login directory
• $USER : your iceberg username
• $JOB_NAME : name of the job
• $HOSTNAME : name of the cluster node that is being used
• $SGE_TASK_ID : task number (important for task arrays)
• $NSLOTS : number of processors used (important for parallel 'openmp or mpi' jobs)
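A minimal sketch of using some of these variables inside a job script to build a unique output file name (myprog and the ten-minute limit are placeholders):

#!/bin/bash
#$ -l h_rt=00:10:00
echo "Job $JOB_NAME running as $USER on $HOSTNAME with $NSLOTS slot(s)"
./myprog > $HOME/$JOB_NAME.results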

Submitting Batch Jobs via the qmon command

If you are using an X terminal (such as provided by Exceed) then a GUI interface named qmon can also be used to make job submission easier. This command also provides an easier way of setting the job parameters.

Job submission panel of QMON

(QMON screenshots: click on the Job Submission icon, then click to browse for the job script, e.g. test2.)

Job queues

• Unlike the traditional batch queue systems, users do not need to select the queue they are submitting to. Instead SGE uses the resource needs as specified by the user to determine the best queue for the job.

• In Sheffield and Leeds the underlying queues are set up according to memory size and CPU time requirements, and also the number of CPUs needed (for MPI and OpenMP jobs).

• qstat -F displays full queue information. Also, qmon (Task: Queue Control) will allow information about the queue limits to be distilled.

Job queue configuration

Normally you will not need to know the details of each queue, as the Grid Engine will make the decisions for you in selecting a suitable queue for your job. If you feel the need to find out how the job queues are configured, perhaps to aid you in specifying the appropriate resources, you may do so by using the qconf system administrator command.

• qconf -sql will give a list of all the queues
• qconf -sq queue_name will list details of a specific queue's configuration

Monitoring the progress of your jobs

• The commands qstat and the XWindows based qmon can be used to check on the progress of your jobs through the system.

• We recommend that you use the qmon command if your terminal has X capability, as this makes it easier to view your job's progress and also to cancel or abort it, if it becomes necessary to do so.

Checking the progress of jobs with QMON

(QMON screenshots: click on the Job Control icon, then click on the Running Jobs tab.)

Managing Jobs: monitoring and controlling your jobs
http://www.sheffield.ac.uk/cics/research/hpc/using/runbatch/sge

• There are a number of commands for querying and modifying the status of a job running or waiting to run. These are:
– qstat or Qstat (query job status), e.g. qstat -u username
– qdel (delete a job), e.g. qdel jobid

– qmon (a GUI interface for SGE)

Practice Session: Submitting Jobs To Iceberg (Problems 2 & 3)

• Patient inflammation study: run the R example as a batch job
• Case study: fish population simulation
• Submitting jobs to Sun Grid Engine
• Instructions are in the readme file in the sge folder of the course examples
– From an interactive session:
• Load the compiler module
• Compile the fish program
• Run test1, test2 and test3

Managing Jobs: Reasons for job failures
http://www.shef.ac.uk/cics/research/hpc/using/requirements

– SGE cannot find the binary file specified in the job script
– You ran out of file storage. It is possible to exceed your filestore allocation limits during a job that is producing large output files. Use the quota command to check this.
– Required input files are missing from the startup directory
– An environment variable is not set correctly (LM_LICENSE_FILE etc.)
– Hardware failure (e.g. communication equipment failure for MPI jobs)

Finding out the memory requirements of a job

• Virtual memory limits:
– The default virtual memory limit for each job is 6 GBytes
– Jobs will be killed if the virtual memory used by the job exceeds the amount requested via the -l mem= parameter
• Real memory limits:
– The default real memory allocation is 2 GBytes
– Real memory resource can be requested by using -l rmem=
– Jobs exceeding the real memory allocation will not be deleted, but will run with reduced efficiency and the user will be emailed about the memory deficiency
– When you get warnings of that kind, increase the real memory allocation for your job by using the -l rmem= parameter
– rmem must always be less than mem
• Determining the virtual memory requirements for a job:
– qstat -f -j jobid | grep mem
– The reported figures will indicate the currently used memory (vmem), the maximum memory needed since startup (maxvmem) and the cumulative memory_usage*seconds (mem)
– When you run the job next, use the reported value of vmem to specify the memory requirement, as sketched below
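For example, a minimal sketch of requesting memory in a job script (the sizes are only illustrative; rmem must stay below mem):

#!/bin/bash
#$ -l h_rt=01:00:00
#$ -l mem=12G
#$ -l rmem=8G
./myprog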

Remote Visualisation

• See: Specialist High Speed Visualization Access to iceberg
– http://www.sheffield.ac.uk/cics/research/hpc/using/access/intro
• Undertake visualisation using thin clients accessing remote high quality visualisation hardware
• Remote visualisation removes the need to transfer data and allows researchers to visualise data sets on remote visualisation servers attached to the high performance computer and its storage facility

VirtualGL

• VirtualGL is an open source package which gives any UNIX or Linux remote display software the ability to run 3D applications with full hardware acceleration.
• VirtualGL can also be used in conjunction with remote display software such as VNC to provide 3D hardware accelerated rendering for OpenGL applications.
• VirtualGL is very useful in providing remote display to thin clients which lack 3D hardware acceleration.

Client Access to Visualisation Cluster

(Diagram: a VirtualGL client connects to a VirtualGL server with an NVIDIA GPU on Iceberg / the Campus Compute Cloud.)

Remote Visualisation Using SGD

• Start a browser, go to https://myapps.shef.ac.uk and log in to Sun Global Desktop
• Under Iceberg Applications start the Remote Visualisation session
• This opens a shell with instructions to either:
– Open a browser and enter the address http://iceberg.shef.ac.uk:XXXX
– or start Tiger VNCViewer on your desktop and use the address iceberg.shef.ac.uk:XXXX
• XXXX is a port address provided on the iceberg terminal
• When requested use your usual iceberg user credentials

Remote Desktop Through VNC

Remote Visualisation Using Tiger VNC and the Putty SSH Client

• Log in to iceberg using putty
• At the prompt type qsh-vis
• This opens a shell with instructions to either:
– Open a browser and enter the address http://iceberg.shef.ac.uk:XXXX
– or start Tiger VNCViewer on your desktop and use the address iceberg.shef.ac.uk:XXXX
• XXXX is a port address provided on the iceberg terminal
• When requested use your usual iceberg user credentials

Beyond Iceberg
http://www.sheffield.ac.uk/cics/research/hpc/iceberg/costs

• Iceberg is OK for many compute problems
• Purchasing dedicated resource
• N8 tier 2 facility for more demanding compute problems
• Hector/Archer: larger facilities for grand challenge problems (peer review process to access)

High Performance Computing Tiers

• Tier 1 computing: Hector, Archer
• Tier 2 computing: Polaris
• Tier 3 computing: Iceberg

Purchasing Resource
http://www.sheffield.ac.uk/cics/research/hpc/iceberg/costs

• Buying nodes using the framework
– Research groups purchase HPC equipment against their research grant; this hardware is integrated with the Iceberg cluster
• Buying a slice of time
– Research groups can purchase servers for a length of time specified by the research group (cost is 1.7p/core per hour)
• Servers are reserved for dedicated usage by the research group using a provided project name
• When reserved nodes are idle they become available to the general short queues. They are quickly released for use by the research group when required.
• For information e-mail [email protected]

The N8 Tier 2 Facility: Polaris
http://www.shef.ac.uk/cics/research/hpc/polaris

• Note: N8 is for users whose research problems require greater resource than that available through Iceberg
• Registration is through projects
– Authorisation by a supervisor or project leader to register the project with the N8
– Users obtain a project code from the supervisor or project leader
– Complete an online form providing an outline of work explaining why N8 resources are required

Polaris: Specifications

5312 Intel Sandy Bridge cores

Co-located with 4500-core Leeds HPC

Purchased through Esteem framework agreement: SGI hardware

#291 in the June 2012 Top500

National HPC Services

• Archer: UK National Supercomputing Service
• Hardware: CRAY XC30
– 2632 standard nodes
– Each node contains two Intel E5-2697 v2 12-core processors, therefore 2632 * 2 * 12 = 63168 cores
– 64 GB of memory per node
– 376 high-memory nodes with 128 GB memory
– Nodes connected to each other via the ARIES low latency interconnect
– Research Data File System: 7.8 PB of disk
– http://www.archer.ac.uk/
• EPCC: HPCC facilities
– http://www.epcc.ed.ac.uk/facilities/national-facilities
– Training and expertise in parallel computing

Sheffield University Web References

• Interactive Jobs – http://www.sheffield.ac.uk/cics/research/hpc/using/interactive

• Batch Jobs – http://www.sheffield.ac.uk/cics/research/hpc/using/runbatch

Links for Software Downloads

• Putty: http://www.chiark.greenend.org.uk/~sgtatham/putty/
• WinSCP: http://winscp.net/eng/download.php
• TigerVNC: http://sourceforge.net/projects/tigervnc/
  http://sourceforge.net/apps/mediawiki/tigervnc/index.php?title=Main_Page