
Using the IBM Opteron 1350 OSC —Batch Processing

October 19-20, 2010

Online Information

• Technical information:
  – http://www.osc.edu/supercomputing/
    • Hardware
    • Software
    • Environment
    • Training
    • Notices
  – http://www.osc.edu/supercomputing/computing/#batch
• Contact information
  – [email protected]
  – 1-800-686-6472, 1-614-292-1800

2 Table of Contents

• Batch Processing: Step by Step
• Minimum Batch Files for Glenn Cluster
• Advantages of Batch Processing
• Useful Header Lines
• Useful Batch Environment Variables
• More PBS commands

All files used in the examples can be retrieved from the Batch subdirectory of the svn repository:

    svn checkout http://svn.osc.edu/repos/softdevtools/trunk/Batch Batch

3 Interactive Processing

• The way you are used to working on a workstation or laptop!

• Enter a command; output is returned to the monitor. Based on that output, enter the next command, and repeat. The user is interacting in real time with the computer.

• Interactive use is easiest (and almost required) for tasks in which the user must analyze the output of the previous command to determine the next command.

• Common interactive tasks: file/directory searching, directory management (clean-up, reorganization, etc.), file editing, code debugging (in any manner), use of window-based software, use of performance tools (although most create a raw data file), and the list goes on …

4 Example Program

Throughout this workshop the same tasks will be carried out in different ways. The tasks are:

1) Compiling a bug-free source code file

2) Running the executable produced

3) Examining the output

4) Removing unnecessary files

5) Changing from home directory to “work” directory

5 Example Serial Program

svn checkout http://svn.osc.edu/repos/softdevtools/trunk/Batch Batch

This program (nest.c) simply adds together numbers whose values are created in a nested loop.

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    int main(int argc, char* argv[]) {
        float x, rock = 0;
        int i, j, N, M;
        M = 8; N = 4;
        if (argc == 3) {
            N = atoi(argv[1]);   // command arguments
            M = atoi(argv[2]);
        } else {
            // Exit if incorrect number of arguments
            printf("Usage : ./a.out int int\n");
            exit(1);
        }
        // Set Loop Counts
        for (i = 1; i <= N; i = i + 1) {
            for (j = 1; j <= M; j = j + 1) {
                x = log((float) ((10*i)+j));   // natural log
                rock = rock + x;               // accumulate sum
            }
        }
        /* Output */
        printf("For loop counts N=%d M=%d\n", N, M);
        printf("Sum=%f\n", rock);
        return 0;
    }

6 Example Parallel Program

This MPI program ( )

    #include
    #define N 16000

    int main(int argc, char *argv[]) {
    int rank,size;
    int i,j;
    int eleven[N],sub[N/4];
    MPI_Status info;
    MPI_Request ask,found;
    int done=0;
    int index;

    MPI_Init(&argc,&argv);
    MPI_Comm_rank(MPI_COMM_WORLD,&rank);
    MPI_Comm_size(MPI_COMM_WORLD,&size);
    printf("In node[%d], tmpdir is %s\n", rank, system("echo $TMPDIR"));

    if (rank==0) {
    for (i=0;i
    for (i=0;i sub[i]=eleven[i];
    }
    MPI_Send(&eleven[1*(N/4)],N/4,MPI_INT,1,19,MPI_COMM_WORLD);
    MPI_Send(&eleven[2*(N/4)],N/4,MPI_INT,2,29,MPI_COMM_WORLD);
    MPI_Send(&eleven[3*(N/4)],N/4,MPI_INT,3,39,MPI_COMM_WORLD);

    if (rank!=0) {
    MPI_Recv(sub,N/4, MPI_INT, 0,MPI_ANY_TAG, MPI_COMM_WORLD, &info);
    }

    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Irecv(&index,1,MPI_INT,MPI_ANY_SOURCE,MPI_ANY_TAG, MPI_COMM_WORLD,&ask);

[cont’d on next slide]

7 Example Parallel Program

[cont’d from previous slide]

i=0;

MPI_Test(&ask,&done,&info);

while (i

MPI_Wait(&ask,&info); printf("P:%d I searched up to index %d\n", rank,(i-1));

MPI_Finalize(); }

8 Interactive Session

This set of commands assumes you have downloaded nest.c and nest.pbs from the svn repository:

    opt-login1:$ cd Batch
    opt-login1:$ gcc nest.c -lm
    opt-login1:$ ./a.out 459 121
    For loop counts N=459 M=121
    Sum = 306505.406250
    opt-login1:$ rm a.out

9 Batch Processing of SAME Tasks

• Key fact: In the previous interactive session the user knew, before logging in, exactly which commands they were going to type. The code had been debugged and everything was ready for a “production” run.

• NO REAL-TIME INTERACTION IS REQUIRED → Therefore, you can put the commands in a file (“batch file”)

• A batch file is just a script (sequence of UNIX commands put into a file) that contains:

1) The EXACT SAME commands you typed on the keyboard during the interactive session.

2) Some lines at the beginning of the file that give the batch some parameters it needs to know. These opening lines are called the header of the batch file.

10 Structure of a Minimum Batch File

Header:

    #PBS -N nest
    #PBS -l walltime=00:05:00
    #PBS -j oe
    #PBS -S /bin/ksh

Same Unix commands:

    set -x
    cd $PBS_O_WORKDIR
    gcc nest.c -lm
    ./a.out 459 121
    /bin/rm a.out

• set -x: in /bin/ksh, this prints (echoes) the executed commands in your output file; in /bin/csh, the equivalent command is set echo
• $PBS_O_WORKDIR: automatically set when a batch request is submitted; it holds the path of the directory from which the batch request was submitted (more on other environment variables later)
• /bin/rm: this will remove the named file(s) immediately

11 Batch Processing Session

-bash-3.2$ qsub nest.pbs
2289978.opt-batch.osc.edu

-bash-3.2$ qstat -u yzhang
opt-batch.osc.edu:
                                                         Req'd  Req'd   Elap
Job ID          Username Queue    Jobname SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ------- ------ --- --- ------ ----- - -----
2289978.opt-bat yzhang   serial   nest    --       1 --  --     00:05 Q --

-bash-3.2$ qstat -u yzhang
2289978.opt-bat yzhang   serial   nest    --       1 --  --     00:05 Q --

-bash-3.2$ qstat -a | grep nest
2289978.opt-bat yzhang   serial   nest    402      1 --  --     00:05 R --

-bash-3.2$ qstat -u yzhang
2289978.opt-bat yzhang   serial   nest    402      1 --  --     00:05 R 00:01

12 Batch Processing Session

Results:

-bash-3.2$ ls -ltr
total 108
-rw-r--r-- 1 yzhang G-3040 276 Oct 28 14:30 run-nest.job
-rw-r--r-- 1 yzhang G-3040 295 Oct 28 14:30 run-nest3.job
-rw-r--r-- 1 yzhang G-3040 140 Oct 28 14:30 nest.pbs
-rw-r--r-- 1 yzhang G-3040 725 Oct 28 14:30 nest.c
-rw------- 1 yzhang G-3040 139 Nov  1 22:17 nest.o2289978

-bash-3.2$ cat nest.o2289978
+ cd /nfs/06/yzhang/workshops/softw09/Batch
+ gcc nest.c -lm
+ ./a.out 459 121
For loop counts N=459 M=121
Sum=306505.406250
+ rm -f a.out

13 Description of Batch File Header Lines

• Batch processing on OSC systems is implemented by the Portable Batch System.

• Header lines begin with #PBS. The # symbol indicates that this line is just a comment from the shell's point of view.

• #PBS -l walltime=00:01:00

– The character before the word walltime is a lower-case “el”, not the numeric character “1”. The “el” stands for limit.
– This header line tells PBS how much real time (the time on the clock on the wall) you expect the execution of the batch file commands to take.
– Since the time format is hh:mm:ss, the limit on the wall clock time is 1 minute in this example.
– You can estimate wall clock time by running your program once and using external or internal timing commands (see later).
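Once a run has been timed, the measured seconds have to be turned into the hh:mm:ss form that the walltime line expects. A minimal sketch follows; the helper name to_walltime and the 20% safety margin are illustrative assumptions, not OSC recommendations.

```shell
# Hypothetical helper: convert a measured run time in whole seconds to hh:mm:ss
# after adding a 20% safety margin (the margin is an assumption; adjust to taste).
to_walltime() {
    secs=$(( $1 + $1 / 5 ))
    printf '%02d:%02d:%02d\n' $(( secs / 3600 )) $(( secs % 3600 / 60 )) $(( secs % 60 ))
}

# Example: a run that was measured at 140 seconds
echo "#PBS -l walltime=$(to_walltime 140)"
```

Requesting somewhat more than the measured time guards against the job being stopped at the limit, while a very generous request may make the job harder to schedule.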

14 PBS at OSC

• PBS is installed on Glenn cluster and Bale cluster

• Glenn cluster has four partitions

• Options may vary across systems/partitions, e.g., to take advantage of system-specific hardware

• Review OSC web pages on the different computing systems/partitions.

http://www.osc.edu/supercomputing/hardware/

15 Description of Batch File Header Lines

• #PBS -N nest
  – provides a name to your batch job (important!); here the name is nest
  – the name is used by PBS in several ways:
    • appears in status (qstat command) output
    • prefix for the output file name returned by PBS, e.g. nest.o2289978
  – A PBS output log file can be thought of as a screen dump. The log file contains everything that would have appeared on your monitor if the commands had been run interactively.
• #PBS -j oe
  – By default, PBS returns two log files: one for the standard output data stream, the other for the standard error data stream
  – This option joins these two into a single log file

16 Description of Batch File Header Lines

• #PBS -S /bin/ksh
  – Sets the Linux shell which will interpret the shell commands in your batch script
  – In this example, the Korn shell is used

• Can use any available shell for your command interpreter

• The command

cat /etc/shells

shows the shells available on a system

17 Procedure to run a Batch job

1. Create a batch job-file (standard text file)

2. Use the qsub command to submit the job-file to PBS
   • When PBS begins executing batch file commands, the shell is invoked in your login directory (your home directory, $HOME).
   • Retain the job number identified in the return line!

3. Use the qstat command to check on your job status (tip: use qstat -u my_user_id to see the status of your jobs only)

4. Your batch session is complete when the job no longer appears in the qstat output table
   • Your job is started when requested resources become available
   • Your job ends when batch-file commands finish or the time limit is reached

5. Examine the file returned by PBS to see your job output
   • File appears in your home directory, or in the directory from which you submitted the job if using $PBS_O_WORKDIR
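Step 4 of the procedure can be automated with a small polling loop. The sketch below is an illustration only: the helper name poll_until_done is hypothetical, and it assumes that the status command returns a non-zero exit code once the job is no longer known to PBS, which should be checked on the system in use.

```shell
# Hypothetical helper: rerun COMMAND every INTERVAL seconds until it fails.
# poll_until_done INTERVAL COMMAND [ARGS...]
poll_until_done() {
    interval=$1
    shift
    while "$@" >/dev/null 2>&1; do
        sleep "$interval"
    done
}

# In a real session this might be used as (assumption: qstat fails for an unknown job):
#   jobid=$(qsub nest.pbs)
#   poll_until_done 30 qstat "$jobid"
# Here "false" stands in for the status command, so the loop ends at once.
poll_until_done 0 false
echo "job no longer listed"
```

A long polling interval is kinder to the batch server than a tight loop.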

18 qsub and qstat

• qsub batch_file_name
  – Submits your batch job to PBS
  – Based on the resources requested in the job header, PBS places your job in a queue into which your job “fits”
  – Returns your Job Identification Number (“Job ID” in qstat output, 2289978.opt-batch.osc.edu in our example)

• qstat -a
  – Status information on all batch jobs in PBS at that time
  – Output is a multi-columned, lengthy table containing many pieces of information
  – The first four columns identify the job:
    • JID that matches value returned from qsub, e.g., 2289978.opt-batch.osc.edu
    • Your OSC login ID, e.g., ucn1234

19 qstat -a (continued)

    • The name of the queue your job was placed in
    • The “internal” name for your job as set in the -N header line of your batch file

– The Req'd Time column shows the time limit you set in the batch file header (HH:MM:SS)

– THE MOST IMPORTANT COLUMN is the S(tatus) column, which shows a letter code for what your job is doing now:

    Code  Meaning
    R     Running
    Q     Queued
    H     Held
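The status letter can be pulled out of a job line with awk. This is a sketch under the assumption that the line has the layout shown in the example session, where the S column is the 10th whitespace-separated field; the helper name job_state is hypothetical.

```shell
# Hypothetical helper: read one job line of qstat -a / qstat -u output on
# standard input and print the meaning of the S column (the 10th field).
job_state() {
    awk '{
        s = $10
        if      (s == "R") print "Running"
        else if (s == "Q") print "Queued"
        else if (s == "H") print "Held"
        else               print "Other (" s ")"
    }'
}

# A job line copied from the example session
echo "2289978.opt-bat yzhang serial nest 402 1 -- -- 00:05 R 00:01" | job_state
```

Codes other than the three in the table are reported unchanged rather than guessed at.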

20 The PBS log file

• The log file name may seem cryptic but there is a pattern to it:
  – internal_name.oJID
  – internal_name is the name given in the “#PBS -N” header line
  – The o after the period stands for “output” (e stands for “error”)

• Log file contains the UNIX command(s) run and their output (if any)

• In our sample log file, the UNIX commands are preceded by a “+”. This is due to the set -x command (ksh/bash) executed first in the batch file. This command causes each batch file command to be echoed to the monitor with a + in front. Without the set -x command (or its equivalent) only the actual screen output would appear in the log file.

• For those who use tcsh or csh, echoing of shell commands is achieved by using the command set echo.
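The naming pattern for the log file can be written down as a one-line function. The helper name log_file_name is hypothetical; the pattern itself (job name, then .o, then the numeric part of the job ID) is the one described on this slide.

```shell
# Hypothetical helper: build the default log file name from the job name
# (#PBS -N) and the job ID returned by qsub: name.o<number>
log_file_name() {
    echo "$1.o${2%%.*}"
}

# Job name and job ID taken from the example session
log_file_name nest 2289978.opt-batch.osc.edu
```

The expansion ${2%%.*} drops everything from the first period onward, leaving only the job number.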

21 Deleting a Batch Job

• Situations may arise in which you want to delete one of your jobs from the PBS queue:
  – resource limits set incorrectly
  – missing input file(s)
  – incorrect or missing commands in the batch file
  – program is taking too long to run (“infinite-loop”)

• The PBS command to delete a batch job is qdel:

$ qdel Job_ID

22 Advantages of Batch Processing

• Interactive resource limits are too small
  – The UNIX limit command shows the interactive limits on CPU time (session & process), memory size, disk size, etc.
  – Current limits
    – 2 hours of CPU time
    – 1 GB of memory
• Improves overall system efficiency by weighing user requirements against system load
• Makes sure all users can get equal access to resources by enforcing a policy
• Automatically keeps a log of your Unix commands and their output
• Only way to access more than one node for parallel processing
• Batch processing concepts are the same for all batch software
• Learn PBS on one OSC machine, know it for all

23 Useful Batch File Header Lines

• Header Line = qsub option
• Optional resource request
• Mailing options
• Rename the Log File
• Use a Different Shell
• Parallel Processing
• Starting Date & Time
• Special Queues

24 qsub Options

• The header lines of a PBS batch file are actually options to the qsub command. For example in our batch files we have put the header line

#PBS -j oe

We could have left out that header line and used that option when the batch file is submitted:

qsub -j oe batch_file

• It is recommended to put the options in the header section of the batch file so that the user has a record of values used.
• In this chapter, other options (besides the minimum suggested) will be discussed with emphasis on the most useful.
• All qsub options can be found with

$ man qsub
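Because header lines are just qsub options, they can be read back out of a batch file and shown in command-line form. The sketch below is illustrative: the helper name pbs_options is hypothetical, and the sample file is created on the fly only so that the example is self-contained.

```shell
# Hypothetical helper: print the options carried by the #PBS header lines of a
# batch file on one line, in the form they would take on the qsub command line.
pbs_options() {
    awk '/^#PBS/ { sub(/^#PBS[ \t]*/, ""); out = out (out ? " " : "") $0 }
         END     { print out }' "$1"
}

# Build a small sample batch file and show the equivalent qsub command
batch_file=$(mktemp)
cat > "$batch_file" <<'EOF'
#PBS -N nest
#PBS -l walltime=00:05:00
#PBS -j oe
set -x
cd $PBS_O_WORKDIR
EOF
echo "qsub $(pbs_options "$batch_file") batch_file"
rm -f "$batch_file"
```

Keeping the options in the file, as the slide recommends, means this record travels with the job script.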

25 Optional Resource Request

• -l mem=amount (OPTIONAL)
  Request use of amount of memory per node. Default units are bytes; can also be expressed in megabytes (e.g., mem=1000MB) or gigabytes (e.g., mem=2GB).

• -l file=amount (OPTIONAL)
  Request use of amount of local scratch disk space per node. Default units are bytes; can also be expressed in megabytes (e.g., file=10000MB) or gigabytes (e.g., file=10GB). Only required for jobs using > 10GB of local scratch space per node.

• -l software=package[+N] (OPTIONAL)
  Request use of N licenses for package. If omitted, N=1. Only required for jobs using specific software packages with limited numbers of licenses; see software documentation for details.

EMPOWER. PARTNER. LEAD. Email from PBS

• Do I really have to keep checking on the status of my job with qstat -a to find out its progress?
  – Answer: no. By using the following batch file header lines, PBS will email you a message when your job has begun and when your job has ended, respectively (the letters can also be combined on one line, as in #PBS -m be):

    #PBS -m b
    #PBS -m e

• In the “job ending” email message the return status of the job is reported. A return status of 0 indicates success. Total time and memory consumed are also reported.
• There is also a -m a option that sends email if your job aborts
• Edit the contents of your ${HOME}/.forward file to specify the destination of the email, e.g.,

kilroywashere@your_local_supernet.net

27 Sample End Email

Date: Sunday - November 1, 2009 9:33 PM
From: root
To:
Subject: PBS JOB 2289554.opt-batch.osc.edu

PBS Job Id: 2289554.opt-batch.osc.edu
Job Name: nest
Execution terminated
Exit_status=0
resources_used.cput=00:02:18
resources_used.mem=676kb
resources_used.vmem=9824kb
resources_used.walltime=00:02:20

28 Log File Name

• Do I have to live with that awkward name for the log file returned by PBS?

• Answer: no. Add the following header line and choose the name you want:

    #PBS -o file_name

• This option stands for (o)utput. When the batch job is finished everything that would have been displayed on your monitor is contained in file_name.

29 Changing Shells

• If not otherwise specified, the shell used to execute your batch file commands is your login shell

• The user can choose to run a batch job in a different shell if they desire. The header line is:

#PBS -S /bin/[csh|ksh|bash]

• Notice the full path name of the shell command must be used

• NOTE: Echoing of commands in the C shell is enabled by the command set echo

The “echoed” commands are not preceded by a ‘+’

30 Parallel Processing for Clusters

• One batch file header line performs the most critical step needed in parallel processing: setting the number of processors your code will run on.

• The syntax for this important header line is:

    #PBS -l nodes=N:ppn=1 (1<=ppn<=8)

• The first part of this option specifies the number (N) of nodes you need. The maximum value for N depends on the nodes available on a given machine.

• Glenn cluster has 4/8/16 processors per node. The second part of the option indicates how many processors per node are used. If the ppn section is omitted, PBS will default to 1 processor per node.
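The relationship between the two parts of the request can be made concrete with a pair of small functions. Both helper names are hypothetical; the optional third argument is a node type such as olddual, as used in the batch files later in this workshop.

```shell
# Hypothetical helper: print the header line for N nodes with C processors each.
# nodes_header N C [type]
nodes_header() {
    line="#PBS -l nodes=$1:ppn=$2"
    if [ -n "${3:-}" ]; then
        line="$line:$3"
    fi
    echo "$line"
}

# Hypothetical helper: total number of processors requested (nodes times ppn).
total_procs() {
    echo $(( $1 * $2 ))
}

nodes_header 4 1 olddual
echo "processors requested: $(total_procs 4 1)"
```

Both nodes=4:ppn=1 and nodes=2:ppn=2 request four processors; they differ only in how the processors are spread over nodes.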

31 #PBS -l nodes=N:ppn=C:type

• Dual socket, dual core (type=olddual)
  – Number of cores: 4
  – Memory: 8 GB
  – To request, specify: 1 ≤ N ≤ 512, 1 ≤ C ≤ 4
  – Example: #PBS -l nodes=10:ppn=4:olddual
  – Number of machines: 877

• Quad socket, dual core (type=oldquad)
  – Number of cores: 8
  – Memory: (70) 16 GB, (16) 32 GB, (2) 64 GB
  – To request, specify: N = 1, 1 ≤ C ≤ 8
  – To request memory: #PBS -l mem=16GB
  – Example: #PBS -l nodes=1:ppn=8:oldquad
  – Number of machines: 88

• Dual socket, quad core (type=newdual)
  – Number of cores: 8
  – Memory: 24 GB
  – To request, specify: 1 ≤ N ≤ 256, 5 ≤ C ≤ 8
  – Example: #PBS -l nodes=5:ppn=8:newdual
  – Number of machines: 650

• Quad socket, quad core (type=newquad)
  – Number of cores: 16
  – Memory: 64 GB
  – To request, specify: N = 1, 9 ≤ C ≤ 16
  – To request memory: #PBS -l mem=32GB
  – Example: #PBS -l nodes=1:ppn=16:newquad
  – Number of machines: 8

Parallel Processing Batch File

• The search program uses 4 nodes, each searching a quarter of an integer array for the value 11. When one processor has found it, that processor signals the others to stop searching.
• The -n option for qstat has been used to show what physical nodes the code is actually being run on.

    #PBS -l walltime=00:04:00
    #PBS -N search
    #PBS -j oe
    #PBS -S /bin/ksh
    #PBS -l nodes=4:ppn=1:olddual
    set -x
    cd $PBS_O_WORKDIR
    mpicc search.c
    qstat -u $USER -rn
    mpiexec a.out < data3.txt
    rm a.out
    rm search.o

33 Log File (nodes=4:ppn=1)

-bash-3.2$ less search.o2289997
+ cd /nfs/06/yzhang/workshops/softw09/Batch
+ mpicc search.c
+ qstat -u yzhang -rn
opt-batch.osc.edu:
                                                              Req'd  Req'd   Elap
Job ID               Username Queue    Jobname SessID NDS TSK Memory Time  S Time
-------------------- -------- -------- ------- ------ --- --- ------ ----- - -----
2289997.opt-batch.os yzhang   parallel search  --       4 --  --     00:04 R --
   opt0345/0+opt0344/0+opt0343/0+opt0342/0
+ mpiexec a.out
+ 0< data3.txt
P:0 11 found at index=1499
P:1 I searched up to index 1789
P:0 I searched up to index 1499
P:2 I searched up to index 1406
P:3 I searched up to index 1789
+ rm -f a.out search.o

34 Log File (nodes=2:ppn=2)

-bash-3.2$ less search.o2289998
+ cd /nfs/06/yzhang/workshops/softw09/Batch
+ mpicc search.c
+ qstat -u yzhang -rn
opt-batch.osc.edu:
                                                              Req'd  Req'd   Elap
Job ID               Username Queue    Jobname SessID NDS TSK Memory Time  S Time
-------------------- -------- -------- ------- ------ --- --- ------ ----- - -----
2289998.opt-batch.os yzhang   parallel search  --       2 --  --     00:04 R --
   opt0341/1+opt0341/0+opt0340/1+opt0340/0
+ mpiexec a.out
+ 0< data3.txt
P:0 11 found at index=1499
P:1 I searched up to index 1423
P:0 I searched up to index 1499
P:2 I searched up to index 2583
P:3 I searched up to index 3542
+ rm -f a.out search.o

35 Parallel Processing for SMPs

• Specifying a processor count for SMP parallel job:

    #PBS -l nodes=1:ppn=8:oldquad
    #PBS -l walltime=10:00:00
    #PBS -j oe
    #PBS -N openmp
    #PBS -S /bin/ksh
    cd $TMPDIR
    cp $HOME/openmp/a.out .
    export OMP_NUM_THREADS=8
    ./a.out

36 Setting Execution Day & Time

• Instruct PBS to not begin executing my job until a certain date and time
• Use the header line:

#PBS -a [YYYY][MM][DD]hhmm

• This is the standard “(a)t” option. Only the hours and minutes must be set. If the time set has already passed, PBS will assume the date is tomorrow

• Let's say I wanted to pick a time in which a machine was not very busy. Say, this Saturday at 5 am. The header line would look like this:

#PBS -a 200911070500

• You can submit the job today, and it will be put in the (W)ait state until the date & time indicated. The output of qstat -a for a timed job looks as follows:

17450.nfs4.osc. osu2917  serial   hpmulti_de 16440    1 --  --     240:0 R 00:13
17453.nfs4.osc. yzhang   batch    dt         --      -- --  --     00:04 W --
17454.nfs4.osc. utl0192  parallel fractaltda 31330    8 --  --     01:05 R --
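The timestamp for the -a header line is easy to mistype, so it can help to assemble it from its parts. A minimal sketch with a hypothetical helper name, start_stamp; the arguments must be plain integers without leading zeros, since printf would otherwise try to read values such as 08 as octal.

```shell
# Hypothetical helper: build the YYYYMMDDhhmm argument for "#PBS -a".
# start_stamp YEAR MONTH DAY HOUR MINUTE  (plain integers, no leading zeros)
start_stamp() {
    printf '%04d%02d%02d%02d%02d\n' "$1" "$2" "$3" "$4" "$5"
}

# Saturday, November 7, 2009 at 5:00 am, as in the example above
echo "#PBS -a $(start_stamp 2009 11 7 5 0)"
```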

37 Queue Specification

• On many HPC systems special queues are set up and can be used only by permitted users. These queues might allow a huge amount of memory or time, a large number of processors, or access to a third-party software package.

• For example, on the Glenn cluster there is a special queue called “longserial”, dedicated to serial jobs that need to run more than 168 hours and less than 336 hours.

• If you have permission, you can specify the queue for your job with the header line

#PBS -q queue_name

Otherwise, just let PBS put your job in the appropriate queue.

• For most applications, users will be put into either the serial or parallel queues.

38 Useful Batch Environmental Variables

• Using the /tmp directory

• Changing to your work directory

• Informative environment variables

39 TMPDIR

• For many users the sizes of data files or executable files are so large they cannot be placed in their home directories.

• The /tmp directory offers a huge amount of temporary disk space (315TB in total) to all users of an OSC system. In addition, it is much faster to access than $HOME disk since it is on local disk (not NFS-mounted).

• For each batch job, there is a subdirectory of /tmp uniquely associated with that job. It comes into existence when the job begins and is deleted when the job is finished. The name of the /tmp subdirectory is stored in the environment variable TMPDIR

• In the batch file the user should copy all files needed to $TMPDIR, cd to $TMPDIR, run the code, and finally bring needed output files back to the $HOME area.

• Note that “clean-up” at the end of the batch file is not needed since the $TMPDIR directory and all its files are deleted when the job ends.
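The copy-in, run, copy-out pattern described above can be tried outside the batch system with a stand-in program. This is a sketch under stated assumptions: the function name stage_run_collect is hypothetical, sort stands in for the real executable, and a private subdirectory is created and removed by hand because outside a batch job nothing deletes the scratch area automatically.

```shell
# Sketch of the copy-in / run / copy-out pattern.
# stage_run_collect WORKDIR INPUT OUTPUT
stage_run_collect() {
    workdir=$1; input=$2; output=$3
    # Inside a batch job PBS sets TMPDIR; elsewhere fall back to /tmp.
    scratch="${TMPDIR:-/tmp}/scratch.$$"
    mkdir -p "$scratch"                                   || return 1
    cp "$workdir/$input" "$scratch/"                      || return 1   # copy in
    ( cd "$scratch" && sort "$input" > "$output" )        || return 1   # run
    cp "$scratch/$output" "$workdir/"                     || return 1   # copy out
    rm -rf "$scratch"    # manual clean-up; only needed for this stand-in directory
}

# Demonstration with a throwaway working directory
work=$(mktemp -d)
printf 'b\na\n' > "$work/input.txt"
stage_run_collect "$work" input.txt output.txt
cat "$work/output.txt"
rm -rf "$work"
```

In a real job the three middle steps map directly onto the cp, cd and run lines of the batch file.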

40 Batch File using TMPDIR

    #PBS -l walltime=00:04:00
    #PBS -N nest
    #PBS -j oe
    set -x
    cd $HOME/workshops/softw09/Batch
    cp nest.c $TMPDIR
    cd $TMPDIR
    gcc nest.c -lm
    ./a.out 3 4 > output
    cp output $HOME/workshops/softw09/Batch
    cd $HOME/workshops/softw09/Batch
    cat output

41 Returned Log File

+ cd /nfs/06/yzhang/workshops/softw09/Batch
+ cp nest.c /tmp/pbstmp.2290002
+ cd /tmp/pbstmp.2290002
+ gcc nest.c -lm
+ ./a.out 3 4
+ cp output /nfs/06/yzhang/workshops/softw09/Batch
+ cd /nfs/06/yzhang/workshops/softw09/Batch
+ cat output
For loop counts N=3 M=4
Sum=17.406998

42 pbsdcp – Distributed Copy for Parallel Jobs

• $TMPDIR directory is not shared across nodes!

• When a parallel job starts running on multiple nodes, each node has its own $TMPDIR.

• Use pbsdcp when copying files to directories not shared between nodes (e.g. /tmp or $TMPDIR)
  – Distributed copy command
  – Two modes:
    • -s scatter mode (default)
    • -g gather mode

Batch File Using pbsdcp

    #PBS -N search
    #PBS -l walltime=00:04:00
    #PBS -j oe
    #PBS -l nodes=2:ppn=2:olddual
    #PBS -S /bin/ksh

    set -x
    qstat $PBS_JOBID -n
    cd $HOME/workshops/opteron09/batch
    mpicc -O3 search.c -o search

    pbsdcp search data3.txt $TMPDIR
    cd $TMPDIR

    /usr/bin/time mpiexec search < data3.txt

Log File Returned

+ cd /nfs/06/yzhang/workshops/opteron09/batch
+ qstat 2247114.opt-batch.osc.edu -rn
opt-batch.osc.edu:
                                                              Req'd  Req'd   Elap
Job ID               Username Queue    Jobname SessID NDS TSK Memory Time  S Time
-------------------- -------- -------- ------- ------ --- --- ------ ----- - -----
2247114.opt-batch.os yzhang   parallel search  --       2 --  --     00:04 R --
   opt0871/1+opt0871/0+opt0866/1+opt0866/0

+ mpicc -O3 search.c -o search
+ pbsdcp search data3.txt /tmp/pbstmp.2247114
+ cd /tmp/pbstmp.2247114
+ /usr/bin/time mpiexec search
+ 0< data.txt
P:0 11 found at index=1499
P:0 I searched up to index 1499
P:1 I searched up to index 2236
P:2 I searched up to index 564
P:3 I searched up to index 439
0.00user 0.00system 0:01.06elapsed 1%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+1855minor)pagefaults 0swaps

PBS_O_WORKDIR

• Is there a way that PBS can automatically cd to my working directory since I always start out in my home directory?
• Answer: mostly. Once a user has used qsub to submit a batch job, the environment variable PBS_O_WORKDIR is filled with the absolute path of the directory from which qsub was executed.
• Usually, the directory that holds the files the batch job needs is also the directory the user submits from. Thus, the first line of the batch file can be made general purpose:

cd $PBS_O_WORKDIR

• The user doesn’t even have to remember the path to the directory they are working in.
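A batch file that begins with cd $PBS_O_WORKDIR fails when it is run by hand, because the variable is only set by PBS. One possible workaround, offered as an assumption rather than an OSC convention, is to fall back to the current directory; the helper name job_workdir is hypothetical.

```shell
# Hypothetical helper: directory a job script should start in.
# Uses PBS_O_WORKDIR when PBS has set it, otherwise the current directory,
# so the same script can also be tested interactively.
job_workdir() {
    echo "${PBS_O_WORKDIR:-$PWD}"
}

cd "$(job_workdir)" && pwd
```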

46 Batch File Using $PBS_O_WORKDIR

    #PBS -l walltime=00:04:00
    #PBS -N nest
    #PBS -j oe
    set -x
    cd $PBS_O_WORKDIR
    gcc nest.c -lm
    ./a.out 3 4
    rm a.out

47 Log File Returned

+ cd /nfs/06/yzhang/workshops/softw09/Batch
+ gcc nest.c -lm
+ ./a.out
+ 3 4
Enter Loop Counts
For loop counts N=3 M=4
Sum=17.406998
+ rm a.out

48 Information Variables

• PBS has a number of built-in environment variables that preserve job information:
  – PBS_O_HOST = hostname of the machine on which qsub was run
  – PBS_O_QUEUE = starting queue your job was put in
  – PBS_QUEUE = queue your job was executed in
  – PBS_JOBID = JID of your job
  – PBS_JOBNAME = “internal” name you gave job
  – PBS_NODEFILE = name of the file containing list of nodes your job used

• On the next two slides are a batch file and its return log file that shows that these variables are filled with the correct values

49 Batch File Reporting Environment Information

    #PBS -l walltime=5:00
    #PBS -N print-env-var
    #PBS -j oe

    set -x
    cd $PBS_O_WORKDIR
    qstat -u $USER -rn
    echo $PBS_O_HOST
    echo $PBS_O_QUEUE
    echo $PBS_QUEUE
    echo $PBS_JOBID
    echo $PBS_JOBNAME
    cat $PBS_NODEFILE

50 Returned Log File

+ cd /nfs/06/yzhang/workshops/softw09/Batch
+ qstat -u yzhang -rn
opt-batch.osc.edu:
                                                                 Req'd  Req'd   Elap
Job ID               Username Queue    Jobname    SessID NDS TSK Memory Time  S Time
-------------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
2290013.opt-batch.os yzhang   serial   print-env- 24320    1 --  --     00:05 R --
   opt0372/2
+ echo opt-login02.osc.edu                                        (PBS_O_HOST)
opt-login02.osc.edu
+ echo batch                                                      (PBS_O_QUEUE)
batch
+ echo serial                                                     (PBS_QUEUE)
serial
+ echo 2290013.opt-batch.osc.edu                                  (PBS_JOBID)
2290013.opt-batch.osc.edu
+ echo print-env-var                                              (PBS_JOBNAME)
print-env-var
+ cat /var/spool/batch/torque/aux//2290013.opt-batch.osc.edu      (PBS_NODEFILE)
opt0372
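The node file named by PBS_NODEFILE holds one host name per allocated processor, so it can be summarized with standard tools. In this sketch the helper names count_procs and count_nodes are hypothetical, and a temporary file imitating a nodes=2:ppn=2 job stands in for the real node file.

```shell
# Hypothetical helpers: summarize a node file (one host name per processor).
count_procs() { awk 'END { print NR }' "$1"; }
count_nodes() { sort -u "$1" | awk 'END { print NR }'; }

# Stand-in for $PBS_NODEFILE, imitating a nodes=2:ppn=2 allocation
nodefile=$(mktemp)
printf 'opt0341\nopt0341\nopt0340\nopt0340\n' > "$nodefile"
echo "processors: $(count_procs "$nodefile")"
echo "nodes:      $(count_nodes "$nodefile")"
rm -f "$nodefile"
```

Inside a batch job the same functions would be called with "$PBS_NODEFILE" as the argument.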

51 More PBS Commands

• qpeek

• qstat (more options)

• Moab Scheduler Commands

52 qpeek

• qpeek is a well-named command. It allows for the user to “peek” into the partially-completed log file of a running job. Thus, the user can see the progress of the job.

• On the next slide, the qpeek command is used to see which batch file command the following job has reached:

    #PBS -l walltime=00:20:00
    #PBS -N liver
    #PBS -j oe
    set -x
    cd liver_ia64
    ./liver
    rm liver

53 qpeek Demonstration

ipf-login1:$ qstat -a
nfs1.osc.edu:
                                                         Req'd  Req'd   Elap
Job ID          Username Queue    Jobname SessID NDS TSK Memory Time  S Time
--------------- -------- -------- ------- ------ --- --- ------ ----- - -----
17547.nfs1.osc. abe      serial   liver   18385    1 --  --     00:20 Q --
ipf-login1:$ qpeek 17547
Job 17551 is not running!
ipf-login1:$ qpeek 17547
+ cd liver_ia64
ipf-login1:$ qpeek 17547
+ cd liver_ia64
+ ./liver
ipf-login1:$ qpeek 17547
+ cd liver_ia64
+ ./liver
ipf-login1:$ qpeek 17547
qstat: Unknown Job Id 17547.nfs1.osc.edu
Job 17547 is not running!

54 qstat

• By using the qstat command the user can get a vast amount of information about jobs (as we have already seen) and the batch system queues themselves.

• We have already used these qstat options:
    -a          show status info on all jobs
    -r          show status info on running jobs only
    -n          show the nodes jobs are running on

• The new options are used to check how busy the queues are and what the queue limits/properties are:
    -Q          summary of load on each of the queues
    -q          summary of limits on each queue
    -Qf | more  detailed description of all queue properties

55 Glenn Cluster Queues

-bash-3.2$ qstat -Q
Queue       Max  Tot  Ena  Str  Que  Run  Hld  Wat  Trn  Ext  Type
----------  ---  ---  ---  ---  ---  ---  ---  ---  ---  ---  ----
longserial    0    7  yes  yes    0    7    0    0    0    0  E
parallel      0   65  yes  yes    0   65    0    0    0    0  E
serial        0  407  yes  yes   18  388    1    0    0    0  E
dedicated     0    0  yes  yes    0    0    0    0    0    0  E
batch         0    0  yes  yes    0    0    0    0    0    0  R

-bash-3.2$ qstat -q
server: opt-batch.osc.edu

Queue       Memory  CPU Time  Walltime  Node  Run  Que  Lm  State
----------  ------  --------  --------  ----  ---  ---  --  -----
longserial  --      --        336:00:0     1    7    0  --  E R
parallel    --      --        96:00:00   256   65    0  --  E R
serial      --      --        168:00:0     1  386   19  --  E R
dedicated   --      --        48:00:00   965    0    0  --  E R
batch       --      --        --          --    0    0  --  E R
                                              ---  ---
                                              458   19

56 Full Queue Description

-bash-3.2$ qstat -Qf parallel
Queue: parallel
    queue_type = Execution
    max_user_queuable = 100
    total_jobs = 65
    state_count = Transit:0 Queued:0 Held:0 Waiting:0 Running:65 Exiting:0
    resources_max.nodect = 256
    resources_max.nodes = 256:ppn=4
    resources_max.walltime = 96:00:00
    resources_min.nodect = 2
    resources_default.nodes = 2:ppn=1
    resources_default.walltime = 01:00:00
    mtime = 1237216627
    resources_assigned.mem = 0b
    resources_assigned.nodect = 823
    resources_assigned.vmem = 0b
    enabled = True
    started = True

57 Batch Request Limits For Users

• For each user
  – 128 concurrently running jobs
  – 2048 processor cores in concurrent use
• Serial jobs
  – Request only one node and up to 8 processor cores
  – 168 hour limit
• Parallel jobs
  – Request multiple nodes and up to 2048 processor cores
  – 96 hour limit
• Exceptions possible
  – Longer time limits
  – Larger processor counts
  – Contact [email protected]
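The per-job limits listed above can be checked before submitting. The following is a sketch only: the function name check_request is hypothetical, it encodes just the numbers on this slide (so it does not know about exceptions or node types), and walltime is given in whole hours.

```shell
# Hypothetical checker for the per-job limits on this slide.
# check_request NODES PPN HOURS -> prints "ok" or a reason; returns 0 or 1
check_request() {
    nodes=$1; ppn=$2; hours=$3
    cores=$(( nodes * ppn ))
    if [ "$nodes" -eq 1 ]; then
        [ "$cores" -le 8 ]    || { echo "serial job: more than 8 cores"; return 1; }
        [ "$hours" -le 168 ]  || { echo "serial job: over 168 hours"; return 1; }
    else
        [ "$cores" -le 2048 ] || { echo "parallel job: more than 2048 cores"; return 1; }
        [ "$hours" -le 96 ]   || { echo "parallel job: over 96 hours"; return 1; }
    fi
    echo "ok"
}

# nodes=10:ppn=4 for 24 hours
check_request 10 4 24
```

A request that fails this check is one for which an exception would have to be arranged with OSC first.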

Moab Scheduler Commands

• OSC PBS software has been enhanced with the use of the Moab Scheduler to improve job flow.
  – Advance reservations
  – Backfill scheduling
  – Fairshare and quality-of-service (QOS) levels

• Moab also comes with its own set of useful commands:

– showq (list currently running and queued jobs)

– showstart (estimates start time for a queued job)

– showbf (tells what processors are available to “back-fill” the system)

59 Moab Scheduling Algorithm

• Compute priorities for all jobs not currently running.

• Sort idle jobs in priority order from highest to lowest, removing any jobs which have had holds placed on them or exceed policy limits.

• Starting with the highest priority job, attempt to run each job until there are not enough resources available to run the highest priority job remaining.

• Given current system conditions, compute when is the soonest time the highest priority job could run, and create a reservation for it at that time.

• Backfill any other idle jobs which will not cause the start time for the highest priority job to slip further into the future.

60 Factors Involved in Job Priority

• Recent usage
  – How much other computing has been done by user over last several days
  – How much other computing has been done by user’s group over last several days
• Processor count requested
• How long the job has been queued
• Expansion factor (ratio of job length to queue time)

These factors tend to favor large processor-count, long-running jobs, as those are the most difficult to schedule. Smaller processor-count and/or shorter-running jobs are filled in using backfill scheduling.

NOTE: Highest priority does not mean a job will run immediately; the system must first free up enough resources (processors and memory) to run it.

61 showq Output

ACTIVE JOBS--------------------
JOBNAME   USERNAME  STATE    PROC  REMAINING  STARTTIME

2291837   osu3127   Running     1   00:28:41  Mon Nov  2 15:47:24
2293402   osu4970   Running     4   00:38:21  Tue Nov  3 10:57:04
2293187   osu1697   Running     1    1:27:13  Tue Nov  3 08:45:56
2293197   osu1697   Running     1    1:30:19  Tue Nov  3 08:49:02
2293386   osu5425   Running     4    1:32:15  Tue Nov  3 10:50:58
2293387   osu5425   Running     4    1:32:21  Tue Nov  3 10:51:04
2293388   osu5425   Running     4    1:32:27  Tue Nov  3 10:51:10
2289284   utl0253   Running     4    1:36:35  Sun Nov  1 20:55:18
2293408   kazantzi  Running     8    1:45:01  Tue Nov  3 11:03:44
2293151   osu5455   Running     8    1:58:51  Tue Nov  3 08:17:34
2293390   osu5425   Running     4    2:33:28  Tue Nov  3 10:52:11
2293391   osu5425   Running     4    2:33:34  Tue Nov  3 10:52:17
2293392   osu5425   Running     4    2:33:40  Tue Nov  3 10:52:23
2292067   osu5208   Running    16    2:37:07  Mon Nov  2 17:25:50
...

519 active jobs
5856 of 9256 processors in use by local jobs (63.27%)
1058 of 1584 nodes active (66.79%)

62 showq Output (cont’d)

IDLE JOBS----------------------
JOBNAME   USERNAME  STATE     PROC  WCLIMIT     QUEUETIME

2258855   ysu0077   Idle         4  5:05:00:00  Sun Nov  1 21:59:55
2264762   ysu0077   Idle         4  6:06:00:00  Sun Nov  1 21:59:55
2271320   osu5455   Idle         1  1:16:00:00  Sun Nov  1 21:59:55
2271321   osu5455   Idle         1  1:16:00:00  Sun Nov  1 21:59:55
2271322   osu5455   Idle         1  1:16:00:00  Sun Nov  1 21:59:55

5 Idle Jobs

BLOCKED JOBS----------------
JOBNAME   USERNAME  STATE     PROC  WCLIMIT     QUEUETIME

2275323   wsu0167   UserHold     4  2:22:00:00  Wed Dec 31 19:00:00
2288910   osu4410   Idle         2    15:00:00  Wed Dec 31 19:00:00
2288911   osu4410   Idle         2    15:00:00  Wed Dec 31 19:00:00
...

14 blocked jobs

Total jobs: 538

63 showstart Usage

-bash-3.2$ showstart 2290021
job 2290021 requires 4 procs for 00:04:00

Estimated Rsv based start in       00:00:04 on Sun Nov  1 22:52:39
Estimated Rsv based completion in  00:04:04 on Sun Nov  1 22:56:39

Best Partition: olddual

64 showbf Usage

-bash-3.2$ showbf
Partition  Tasks  Nodes  Duration     StartOffset  StartDate
---------  -----  -----  -----------  -----------  --------------
ALL        20914    237      2:02:01     00:00:00  21:57:59_11/01
ALL        20795    204     10:02:01     00:00:00  21:57:59_11/01
ALL        20777    199   1:10:02:01     00:00:00  21:57:59_11/01
ALL        20745    195  10:10:02:01     00:00:00  21:57:59_11/01
ALL        20741    194  11:10:02:01     00:00:00  21:57:59_11/01
ALL        20731    192     INFINITY     00:00:00  21:57:59_11/01
olddual    20282    148      2:02:01     00:00:00  21:57:59_11/01
olddual    20166    116     10:02:01     00:00:00  21:57:59_11/01
olddual    20148    111     INFINITY     00:00:00  21:57:59_11/01
newdual    19762      2      2:02:01     00:00:00  21:57:59_11/01
newdual    19759      1     INFINITY     00:00:00  21:57:59_11/01
oldquad    20380     85     INFINITY     00:00:00  21:57:59_11/01
newquad    19759      1     INFINITY     00:00:00  21:57:59_11/01
torque     19767      5     INFINITY     00:00:00  21:57:59_11/01

65 Why Won't My Job Run?

There are a number of reasons why your job may not run immediately, even if there appear to be sufficient resources for it to run:

• Other users' jobs may have been assigned higher priority than your job, depending on what you're asking for. Especially in cases with small processor-count, long-running jobs, the scheduler may not be able to backfill the smaller job without interfering with a higher priority job's start time.

• There may be downtime or other system reservations in place. These will often be noted in the system's message of the day (/etc/motd) and/or the OSC “Notices” web page (http://www.osc.edu/supercomputing/notices).

• You or your group may be at the maximum CPU count or running job count for a user or group. These are generally set up such that a single user can run 128 jobs and/or use 2048 processors at a time.

66 Reporting Batch Problems to OSC Help

If you are having a problem with the batch system on any of OSC's machines, you should email a description of the problem to [email protected]. Including the following information will aid OSC's Science and Technology Support (STS) staff in diagnosing your problem quickly:

• Name and telephone number
• User ID (username)
• Home institution
• Name of the system you are using (BALE cluster, Opteron cluster)
• Job ID
• Job script
• Job output and/or error messages (preferably in context).
