Table of Contents

Running Jobs with PBS
    Portable Batch System (PBS): Overview
    How PBS Schedules Jobs
    Commonly Used PBS Commands
    Commonly Used qsub Command Options

PBS on Aitken
    Preparing to Run on Aitken Rome Nodes
    Preparing to Run on Aitken Cascade Lake Nodes
    PBS Resource Request Examples
    PBS Job Queue Structure

PBS on Electra
    Preparing to Run on Electra Skylake Nodes
    Preparing to Run on Electra Broadwell Nodes
    Running Jobs Before Dedicated Time
    Checking the Time Remaining in a PBS Job from a Fortran Code
    Releasing Idle Nodes from Running Jobs
    PBS Resource Request Examples
    Running "At-Scale" Jobs in the sky_wide Queue
    PBS Job Queue Structure

PBS on Pleiades
    Preparing to Run on Pleiades Broadwell Nodes
    Preparing to Run on Pleiades Haswell Nodes
    Preparing to Run on Pleiades Ivy Bridge Nodes
    Preparing to Run on Pleiades Sandy Bridge Nodes
    Sample PBS Script for Pleiades
    Running Jobs Before Dedicated Time
    Checking the Time Remaining in a PBS Job from a Fortran Code
    Releasing Idle Nodes from Running Jobs
    Selecting a Non-Default Compute Image for Your Job
    PBS Resource Request Examples
    Pleiades devel Queue
    PBS Job Queue Structure

PBS on Endeavour
    Preparing to Run on Endeavour
    Endeavour PBS Server and Queue Structure
    Running Jobs Before Dedicated Time
    Checking the Time Remaining in a PBS Job from a Fortran Code
    Selecting a Non-Default Compute Image for Your Job

PBS on GPU Nodes
    Changes to PBS Job Requests for V100 GPU Resources
    Using GPU Nodes
    PBS Job Queue Structure
    Requesting GPU Resources

Managing Jobs
    Reserving a Dedicated Compute Node
    Running Jobs Before Dedicated Time
    Checking the Time Remaining in a PBS Job from a Fortran Code
    Releasing Idle Nodes from Running Jobs

Monitoring with myNAS
    Installing the myNAS Mobile App
    Using the myNAS Mobile App
    Authenticating to NASA's Access Launchpad
    Using the myNAS Web Portal

PBS Reference
    PBS Environment Variables
    Default Variables Set by PBS

Optimizing/Troubleshooting
    Common Reasons Why Jobs Won't Start
    Using pdsh_gdb for Debugging PBS Jobs
    Lustre Filesystem Statistics in PBS Output File
    Adjusting Resource Limits for Login Sessions and PBS Jobs
    PBS exit codes
    How to Run a Graphics Application in a PBS Batch Job
    MPT Startup Failures: Workarounds
    Checking if a PBS Job was Killed by the OOM Killer
    Common Reasons for Being Unable to Submit Jobs
    Avoiding Job Failure from Overfilling /PBS/spool

Packaging Multiple Runs
    Using HPE MPT to Run Multiple Serial Jobs
    Using GNU Parallel to Package Multiple Jobs in a Single PBS Job
    Using Job Arrays to Group Related Jobs
    Running Multiple Small-Rank MPI Jobs with GNU Parallel and PBS Job Arrays

Managing Memory
    Checking and Managing Page Cache Usage
    Using gm.x to Check Memory Usage
    Using vnuma to Check Memory Usage and Non-Local Memory Access
    Using qsh.pl to Check Memory Usage
    Using qtop.pl to Check Memory Usage
    Using qps to Check Memory Usage
    Memory Usage Overview
    How to Get More Memory for your PBS Job

Maximizing Use of Resources
    Checkpointing and Restart
    Process/Thread Pinning
    Instrumenting Your Fortran Code to Check Process/Thread Placement
    Using HPE MPT Environment Variables for Pinning
    Using the omplace Tool for Pinning
    Using Intel OpenMP Thread Affinity for Pinning
    Process/Thread Pinning Overview
    Hyperthreading
    Using the dplace Tool for Pinning
    Using the mbind Tool for Pinning

Running Jobs with PBS

Portable Batch System (PBS): Overview

All NAS facility supercomputers use the Portable Batch System (PBS) from Altair for batch job submission, job monitoring, and job management. Note that different systems may use different versions of PBS, so the available features may vary slightly from system to system.

Batch Jobs

Batch jobs run on compute nodes, not the front-end nodes. A PBS scheduler allocates blocks of compute nodes to jobs to provide exclusive access. You submit batch jobs to run on one or more compute nodes using the qsub command from an interactive session on one of the Pleiades front-end systems (PFEs).

Normal batch jobs are typically run by submitting a script. A "jobid" is assigned after submission. When the resources you request become available, your job will execute on the compute nodes. When the job is complete, the PBS standard output and standard error of the job will be returned in files available to you.

When porting job submission scripts from systems outside the NAS environment, or between the NAS supercomputers, review and adjust your existing scripts so that they work properly on the target system.

Interactive Batch Mode

PBS also supports an interactive batch mode, using the command qsub -I, where the input and output are connected to the user's terminal, but the scheduling of the job is still under control of the batch system.

Queues

The available queues on different systems vary, but all typically have constraints on maximum wall time and/or the number of CPUs allowed for a job. Some queues may also have other constraints or be restricted to serving certain users or groups. In addition, to ensure that each NASA mission directorate is granted their allocated share of resources at any given time, mission directorate limits (called "shares") are also set on Pleiades.

See man pbs for more information.

PBS was originally created by a team in the NASA Advanced Supercomputing (NAS) Division.

How PBS Schedules Jobs

This article provides a simplified explanation of the PBS scheduling policy on Pleiades, Electra, and Endeavour. According to the current policy, jobs are sorted in the following order:

1. Current processor use (by mission directorate)
2. Job priority
3. Queue priority
4. Job size (wide jobs first)

However, job sorting is adjusted frequently in response to varying demands and workloads. In each scheduling cycle, PBS examines the jobs in sorted order, starting a job if it can. If the job cannot be started immediately, it is either rescheduled or simply bypassed for this cycle.

There are numerous reasons that the scheduler may not start a job, including:

• The queue has reached its maximum run limit
• Your job is waiting for resources
• Your mission share has run out
• The system is going into dedicated time
• Scheduling is turned off
• Your job has been placed on hold
• Your home filesystem or default /nobackup filesystem is down
• Your hard quota limit for disk space has been exceeded

For more information about each item in this list, see Common Reasons Why Jobs Won't Start.

Note that a high-priority job might be blocked by one of the limits in this list, while a lower priority job from a different user or requesting fewer resources might not be blocked.

If your job is waiting in the queue, you can run the qstat command as follows to obtain information that can indicate why it has not started running:

pfe21% qstat -s job_id
or
pfe21% qstat -f job_id | grep -i comment

On Pleiades, output from the qs command shows the amount of resources used and borrowed by each mission directorate, and the resources each mission is waiting for, by processor type:

pfe21% /u/scicon/tools/bin/qs

The following command provides the order of jobs that PBS schedules to start at the current scheduling cycle. It also provides information regarding processor type(s), mission, and job priority:

pfe21% qstat -W o=+model,mission,pri -i

Note: To prevent jobs from languishing in the queues for an indefinite time, PBS reserves resources for the top N jobs (between 6 and 12), and doesn't allow lower-priority jobs to start if they would delay the start time of a higher-priority job. This is known as backfilling.

PBS Sorting Order

Mission Shares

Each NASA mission directorate is allocated a certain percentage of the processors on Pleiades and Electra. (See Mission Shares Policy on Pleiades.) A job cannot start if that action would cause the mission to exceed its share, unless another mission is using less than its share and has no jobs waiting. In this case, the high-use mission can "borrow" processors from the lower-use mission for up to a specified time (currently, max_borrow is 4 hours).

So, if the job itself needs less than max_borrow hours to run, or if a sufficient number of other jobs from the high-use mission will finish within max_borrow hours to bring the mission back under its share, then the job can borrow processors.

When jobs are sorted, jobs from missions using less of their share are picked before jobs from missions using more of their share.

Job Priority

Job priority has three components. First is the native priority (the -p parameter to qsub or qalter). Added to that is the queue priority. If the native priority is 0, then a further adjustment is made based on how long the job has been waiting for resources. Waiting jobs get a "boost" of up to 20 priority points, depending on how long they have been waiting and which queue they are in.
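For reference, you can set or adjust the native priority yourself; the value and job ID shown below are illustrative (PBS accepts priorities between -1024 and 1023):

pfe21% qsub -p 50 job_script
pfe21% qalter -p 50 12345.pbspl1.nas.nasa.gov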

This treatment is modified for queues assigned to the Human Exploration and Operations Mission Directorate (HEOMD). For those queues, job priority is set by a separate set of policies controlled by HEOMD management.

Queue priority

Some queues are given higher or lower priorities than the default (run qstat -Q to get current values). Note that because the mission share is the most significant sort criterion, job and queue priorities have little effect mission-to-mission.

Job Size

Jobs asking for more nodes are favored over jobs asking for fewer. The reasoning is that, while it is easier for narrow jobs to fill in gaps in the schedule, wide jobs need help collecting enough CPUs to start.

Backfilling

The policy described above could result in a large, high-priority job being blocked indefinitely by a steady stream of smaller, low-priority jobs. To prevent jobs from languishing in the queues for an indefinite time, PBS reserves resources for the top N jobs (N is between 6 and 12), and doesn't allow lower-priority jobs to start if they would delay the start time of one of the top jobs ("backfilling"). Additional details are given below.

As mentioned above, when PBS cannot start a job immediately, if it is one of the first N such jobs, PBS sets aside resources for the job before examining other jobs. That is, PBS looks at the currently running jobs to see when they will finish (using the wall-time estimates). From those finish times, PBS decides when enough resources (such as CPUs, memory, mission share, and job limits) will become available to run the top job.

PBS then creates a virtual reservation for those resources at that time. Now, when PBS looks at other jobs to see if they can start immediately, it also checks whether starting the job would collide with one of these reservations. Only if there are no collisions will PBS start the lower priority jobs.

Commonly Used PBS Commands

The four most commonly used PBS commands, qsub, qstat, qdel, and qhold, are briefly described below. See man pbs for a list of all PBS commands.

qsub

To submit a batch job to the specified queue using a script:

%qsub -q queue_name job_script

Queue names on Pleiades include normal, debug, long, devel, and low. When queue_name is omitted, the job is routed to the default queue, which is the normal queue.

To submit an interactive PBS job:

%qsub -I -q queue_name -l resource_list

No job_script should be included when submitting an interactive PBS job.

The resource_list typically specifies the number of nodes, CPUs, amount of memory and wall time needed for this job. The following example shows a request for a single Pleiades Ivy Bridge node and a wall time of 2 hours.

%qsub -I -lselect=1:ncpus=20:model=ivy,walltime=2:00:00

See man pbs_resources for more information on what resources can be specified.

Note: If -l resource_list is omitted, the default resources for the specified queue are used. When queue_name is omitted, the job is routed to the default queue, which is the normal queue.

qstat

To display queue information:

%qstat -Q queue_name
%qstat -q queue_name
%qstat -fQ queue_name

Each option uses a different format to display all of the queues available, and their constraints and status. The queue_name is optional.

To display job status:

-a
Display all jobs in any status (running, queued, held)
-r
Display all running or suspended jobs
-n
Display the execution hosts of the running jobs
-i
Display all queued, held, or waiting jobs
-u username
Display jobs that belong to the specified user
-s
Display any comment added by the administrator or scheduler. This option is typically used to find clues about why a job has not started running.
-f job_id
Display detailed information about a specific job
-xf job_id, -xu user_id
Display status information for finished jobs (within the past 7 days). This option is only available in the newer version of PBS, which has been installed on Pleiades and Endeavour.
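For example, the following illustrative command lists one user's running jobs together with their execution hosts:

%qstat -r -n -u username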

TIP: Some of these flags can be combined when you are checking the job status.

qdel

To delete (cancel) a job:

%qdel job_id

qhold

To hold a job:

%qhold job_id

Only the job owner or a system administrator with "su" or "root" privilege can place a hold on a job. The hold can be released using the qrls command.
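For example, to release a previously held job:

%qrls job_id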

For more detailed information on each command, see their corresponding man pages.

Commonly Used qsub Command Options

The qsub options can be read from the PBS directives of a PBS job script or from the qsub command line. For a complete list of available options, see man qsub. Commonly used options include:

-S shell_name
Specifies the shell that interprets the job script.

-V
Declares that all environment variables in the qsub command's environment are to be exported to the batch job.

-v variable_list
Lists environment variables to be exported to the job.

-q queue_name
Defines the destination of the job. On Pleiades, queue names include normal, debug, long, and low. Endeavour queue names include e-normal, e-long, and e-debug. In addition, there is a devel queue on Pleiades for development work. See Pleiades Job Queue Structure.

-l resource_list
Specifies the resources that are required by the job and establishes a limit to the amount of resources that can be consumed. Commonly specified resources include select, ncpus, walltime, and mem (memory). You can also specify filesystems. See man pbs_resources for a complete list.

-e path
Directs the standard error output produced by the request to the stated file path.

-o path
Directs the standard output produced by the request to the stated file path.

-j join
Declares that the standard output and error streams of the job should be merged (joined). Possible values for join are:
oe - Standard output and error streams are merged in the standard output file.
eo - Standard error and output streams are merged in the standard error file.

-k keep
Declares that the standard output and error streams of the job should be retained on the (first) execution host. This is not typically useful on NAS systems; however, when the d (direct) option is included, it changes the way the standard output and error files from a job are handled. Usually, these files are collected until the job ends and only then copied back to their destination files. With -kod, the job writes directly to its output file as it is running. Similarly, with -ked, the error file is written directly. You can then view those files from a front-end host to monitor how the job is running.

There are a few caveats to this feature. First, the output or error files must be available to the nodes where the job runs (the files should be on /homeX or /nobackup filesystems, not on a local filesystem such as /tmp). Second, if the files already exist, they will be appended to, rather than overwritten, as would be the case without -k. If multiple jobs specify the same output file, the outputs from the jobs can be intermingled in the final file. Third, there can be a delay of a few seconds between the time the job writes to the files and the time the change is visible on the front-end host.

-m mail_options
Defines the set of conditions under which the execution server will send a mail message about the job. See man qsub for a list of mail options.

-N name
Declares a name for the job.

-W addl_attributes
Allows for the specification of additional job attributes. The most common additional attributes are:
-W group_list=g_list - Specifies the group the job runs under.
-W depend=afterany:job_id.server_name.nas.nasa.gov - Submits a job that is to be executed after job_id has finished with any exit status (for example, 12345.pbspl1.nas.nasa.gov).
-W depend=afterok:job_id.server_name.nas.nasa.gov - Submits a job that is to be executed after job_id has finished with no errors.

-r y|n
Declares whether the job is re-runnable. See Checkpointing and Restart.
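As an illustration of combining these options (the job names, script names, and job ID are hypothetical), the following submits a solver job and then a post-processing job that starts only if the solver finishes without errors:

%qsub -N solver -j oe solver_script
%qsub -N post -W depend=afterok:12345.pbspl1.nas.nasa.gov post_script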

PBS on Aitken

Preparing to Run on Aitken Rome Nodes

To help you prepare for running jobs on Aitken's Rome nodes, this short user guide includes information on the general configuration of Rome nodes, compiling your code, and running PBS jobs.

Overview of Aitken Rome Nodes

Aitken includes 1,024 Rome nodes, which are partitioned into eight physical racks. Each node contains two 64-core AMD EPYC 7742 Rome sockets (2.25 GHz) and 512 GB of memory. The nodes are connected to the Aitken InfiniBand network (ib0 and ib1) via four-lane High Data Rate (4X HDR) devices and four-lane High Data Rate (4X HDR) switches for internode communication. The ib1 fabric is used mainly for I/O and is connected to the Pleiades Lustre filesystems. In addition, Aitken, Electra, and Pleiades share the same home filesystems, Pleiades front-end systems (PFEs), and PBS server. You can access Aitken only through PBS jobs submitted from the PFEs.

Operating System and Software Modules

The Rome compute nodes are currently running SUSE Linux Enterprise Server 15, Service Pack 2 (SLES 15 SP2), while all other Pleiades, Electra, and Aitken Intel Xeon nodes are running SLES 12.

Note: PBS jobs must specify :model=rom_ait:aoe=sles15 when using the Rome compute nodes.

Front-end nodes with the SLES 15 environment may be configured in the near future. To properly build your application with SLES 15, submit an interactive PBS job to request a Rome compute node, as described below in Running PBS Jobs on Aitken Rome Nodes.

On the Rome nodes, the following paths are included in your $MODULEPATH by default:

• /nasa/modulefiles/sles15
• /nasa/modulefiles/pkgsrc/sles15

The equivalent SLES 12 paths (/nasa/modulefiles/sles12 and /nasa/modulefiles/pkgsrc/sles12) are not included.

Note: Most of the modulefiles in the /nasa/modulefiles/sles15 and /nasa/modulefiles/pkgsrc/sles15 directories are duplicates of those in the equivalent SLES 12 directories, and they have not been tested with SLES 15. If you have any issues using any of these modulefiles, please submit a ticket to [email protected] so they can be removed and replaced.

Compilers for Rome Nodes

There are several compilers you can use to build your application:

• Intel Compilers:

Some Intel compiler modules are available in the /nasa/modulefiles/sles15 directory; however, several are in the /nasa/modulefiles/testing directory. To see what versions are available in these two directories, run:

module use /nasa/modulefiles/testing
module avail comp-intel

Existing executables that are built with support for AVX512 (for running on Skylake and Cascade Lake processors) will not run on the Rome nodes. Others that are built with support for AVX2 may run on the Rome nodes; however, you should always check the result of your runs using existing executables. To rebuild your application using Intel compilers, use -march=core-avx2.

Note: Although AMD recommends using -axCORE-AVX2, we do not recommend using it, as the resulting executable will run but will not actually take advantage of AVX2 instructions. Executables built with -xCORE-AVX2, -xSSE4.2, etc., will not run on Rome processors.

• GNU Compilers:

Support for the Rome architecture is only available with GNU Compiler Collection (GCC) compiler version gcc 9.x and above. The gcc 7.5.0 compiler is provided with SLES 15 SP2. A newly created gcc 10.2 compiler is available in the /nasa/modulefiles/pkgsrc/sles15 directory. To use it, run the following on a Rome node:

module load gcc

The recommended compiler flag to use with GCC is -march=znver2.

• AMD Optimizing C/C++ Compilers (AOCC):

A newly created AOCC compiler is available in the /nasa/modulefiles/sles15 directory. To use it, run the following on a Rome node:

module load comp-aocc

The recommended compiler flag to use with AOCC is -march=znver2.

• PGI Compilers:

Some PGI compiler modules are available in the /nasa/modulefiles/sles15 directory. The newer ones, for example, comp-pgi/20.4, are in the /nasa/modulefiles/testing directory. There is no support specifically for Rome processors. If you want to compile your code using a PGI compiler, the recommended compiler flag is -tp=zen.

For more compiler options, see Compiler Options Quick Reference Guide for AMD EPYC 7xx2 Series Processors.
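To illustrate the recommended flags above, build commands along these lines could be used (source file names are placeholders):

ifort -O3 -march=core-avx2 -o a.out solver.f90     # Intel compilers
gcc -O3 -march=znver2 -o a.out solver.c            # GCC 9.x or later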

MPI Libraries for Rome Nodes

• HPE MPT:

Use the mpi-hpe/mpt.2.23 modulefile in the /nasa/modulefiles/sles15 directory:

module load mpi-hpe/mpt

Note: There is a modulefile with the same name (mpi-hpe/mpt.2.23) in the SLES 12 modulefiles directory (/nasa/modulefiles/sles12), for use on the Pleiades, Electra, and Aitken Intel Xeon nodes. The modulefile in the SLES 12 directory is a different MPT build, and it should not be used on the Rome nodes.

• HPCX:

Use one of the two mpi-hpcx modulefiles available in the /nasa/modulefiles/testing directory:

module use /nasa/modulefiles/testing
module load mpi-hpcx_gnu

or

module load mpi-hpcx_intel

Running OpenMP or MPI/OpenMP Hybrid Codes

Each Rome node has 128 physical cores. When Simultaneous Multi-Threading (SMT) is enabled in the BIOS, there are 256 logical cores for use by OpenMP or MPI/OpenMP hybrid codes.

To check whether SMT is enabled, run the following command on a Rome compute node:

cat /proc/cpuinfo | grep processor | wc -l

If it shows 256, then SMT is enabled.

We recommend using the mbind.x tool for process and thread binding, as shown in the following example with four MPI processes and 32 OpenMP threads per process:

mpiexec -n 4 /u/scicon/tools/bin/mbind.x -t 32 ./a.out

If you use HPE MPT, you can also use the omplace tool for pinning MPI/OpenMP hybrid code.
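For example, a hybrid launch comparable to the mbind.x example above might look like the following under HPE MPT (four MPI processes with 32 threads each; exact options may vary with your MPT version):

setenv OMP_NUM_THREADS 32
mpiexec -np 4 omplace -nt 32 ./a.out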

Running PBS Jobs on Aitken Rome Nodes

To request Aitken Rome nodes, use :model=rom_ait:aoe=sles15 in your PBS script, as shown in the example below.

The devel, normal, long, debug, and low queues are available for use with Rome nodes.

Note: The Standard Billing Unit (SBU) rate for Rome processors is 4.06.

Sample PBS Script For Aitken Rome Nodes

#PBS -l select=2:ncpus=128:mpiprocs=128:model=rom_ait:aoe=sles15
#PBS -l walltime=8:00:00
#PBS -q normal

module load mpi-hpe/mpt
module load comp-intel

cd $PBS_O_WORKDIR

mpiexec -np 256 ./a.out

For more information, see AMD Rome Processors.

Preparing to Run on Aitken Cascade Lake Nodes

To help you prepare for running jobs on Aitken's Cascade Lake compute nodes, this short user guide includes information on the general configuration of Cascade Lake nodes, compiling your code, running PBS jobs, and checking allocation usage.

Overview of Aitken Cascade Lake Nodes

Aitken includes 1,152 Cascade Lake nodes, which are partitioned into eight physical racks. Each node contains two 20-core Xeon Gold 6248 sockets (2.5 GHz) and 192 GB of memory. The nodes are connected to the Aitken InfiniBand network (ib0 and ib1) via four-lane Enhanced Data Rate (4X EDR) devices and four-lane High Data Rate (4X HDR) switches for internode communication. The ib1 fabric is used mainly for I/O and is connected to the Pleiades Lustre filesystems. In addition, Aitken, Electra, and Pleiades share the same home filesystems, Pleiades front-end systems (PFEs), and PBS server. You can access Aitken only through PBS jobs submitted from the PFEs.

Compiling Your Code For Cascade Lake Nodes

The Cascade Lake processors include the Advanced Vector Extensions 512 (AVX-512). Intel AVX-512 optimizations are included in Intel compiler version 16.0 and later versions. We recommend that you test the latest Intel compiler (module load comp-intel/2020.4.304), which may include better optimizations for AVX-512.

For Cascade Lake-specific optimizations, use the compiler option -xCORE-AVX512.

If you want a single executable that will run on any of the Aitken, Electra, and Pleiades processor types, with suitable optimization to be determined at runtime, you can compile your application using the option: -O3 -axCORE-AVX512,CORE-AVX2 -xAVX.

Note: The use of either of these options, -xCORE-AVX512 or -axCORE-AVX512, could either improve or degrade performance of your code. Be sure to check performance with and without these flags before using them for production runs.
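For example, hypothetical build commands for the two approaches above might be:

ifort -O3 -xCORE-AVX512 -o a.out solver.f90                    # Cascade Lake only
ifort -O3 -axCORE-AVX512,CORE-AVX2 -xAVX -o a.out solver.f90   # single multi-target executable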

Running PBS Jobs on Aitken Cascade Lake Nodes

To request Aitken Cascade Lake nodes, use :model=cas_ait in your PBS script, as shown in the example below.

Note: Because MPT 2.15 and earlier versions do not support the ConnectX-5 host channel adapters (HCAs), the environment variables MPI_IB_XRC and MPI_XPMEM_ENABLED have been disabled for jobs running on Cascade Lake. If your MPI applications perform significant MPI collective operations and rely on having these two variables enabled to get good performance, use MPT 2.17 or newer versions. You can use the NAS-recommended MPT version with the command: module load mpi-hpe/mpt

Sample PBS Script For Aitken Cascade Lake Nodes

#PBS -l select=10:ncpus=40:mpiprocs=40:model=cas_ait
#PBS -l walltime=8:00:00
#PBS -q normal

module load mpi-hpe/mpt
module load comp-intel/2020.4.304

cd $PBS_O_WORKDIR

mpiexec -np 400 ./a.out

For more information, see Cascade Lake Processors.

PBS Resource Request Examples

When you request PBS resources for your job, you must specify one or more Aitken, Electra, or Pleiades node model (processor) types, so it is helpful to keep this information in mind:

• Usage charges on each HECC node type are based on a common Standard Billing Unit (SBU).
• The total physical memory of each node type varies from 32 GB to 512 GB (with the exception of a few nodes with extra memory, known as bigmem nodes).

The following table provides the SBU rate and memory for each Aitken, Electra, and Pleiades node type:

Processor Type                SBU Rate/Node   Memory/Node
Rome (128 cores/node)         4.06            512 GB
Cascade Lake (40 cores/node)  1.64            192 GB
Skylake (40 cores/node)       1.59            192 GB
Broadwell (28 cores/node)     1.00            128 GB
Haswell (24 cores/node)       0.80            128 GB
Ivy Bridge (20 cores/node)    0.66             64 GB
Sandy Bridge (16 cores/node)  0.47             32 GB

Note: The amount of memory available to a PBS job is less than the node's total physical memory, because the system kernel can use up to 4 GB of memory in each node.
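As a rough worked example, assuming charges scale with nodes * rate * wall-clock hours, a job that holds 10 Broadwell nodes for 8 hours would be charged about 10 * 1.00 * 8 = 80 SBUs, while the same 8-hour run on 10 Rome nodes would be charged about 10 * 4.06 * 8 = 324.8 SBUs.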

Checking Available Nodes Before Requesting Resources

If your application can run on any of the Aitken, Electra, or Pleiades processor types, you might be able to reduce wait time by requesting a type that has more free nodes (nodes currently unoccupied by other running jobs).

Before you submit your job, you can find out the current numbers of used and free nodes for each processor type by running the node_stats.sh script. For example:

Node summary according to PBS:
  Nodes Available on Pleiades : 10163   cores: 218436
  Nodes Available on Aitken   :  2213   cores: 176272
  Nodes Available on Electra  :  3247   cores: 117088
  Nodes Down Across NAS       :  1584

Nodes used/free by hardware type:
  Intel SandyBridge   cores/node:(16)  Total: 1668,  Used: 1560,  Free: 108
  Intel IvyBridge                (20)  Total: 4705,  Used: 4627,  Free:  78
  Intel Haswell                  (24)  Total: 1941,  Used: 1916,  Free:  25
  Intel Broadwell                (28)  Total: 1790,  Used: 1702,  Free:  88
  Intel Broadwell (Electra)      (28)  Total: 1066,  Used: 1014,  Free:  52
  Intel Skylake (Electra)        (40)  Total: 2181,  Used: 2165,  Free:  16
  Intel Cascadelake (Aitken)     (40)  Total: 1130,  Used: 1091,  Free:  39
  AMD ROME EPYC 7742 (Aitken)   (128)  Total: 1024,  Used: 1024,  Free:   0

Nodes currently allocated to the gpu queue:
  Sandybridge (Nvidia K80)   Total: 59,  Used: 32,  Free: 27
  Skylake (Nvidia V100)      Total: 17,  Used:  3,  Free: 14
  Cascadelake (Nvidia V100)  Total: 34,  Used: 30,  Free:  4

Nodes currently allocated to the devel queue:
  SandyBridge           Total:  72,  Used:   0,  Free: 72
  IvyBridge             Total: 872,  Used: 846,  Free: 26
  Haswell               Total:  91,  Used:  52,  Free: 39
  Broadwell             Total: 400,  Used: 381,  Free: 19
  Electra (Broadwell)   Total:   0,  Used:   0,  Free:  0
  Electra (Skylake)     Total: 100,  Used: 100,  Free:  0
  Aitken (Cascadelake)  Total:  10,  Used:  10,  Free:  0
  Aitken (Rome)         Total:   0,  Used:   0,  Free:  0
  Skylake gpus          Total:   1,  Used:   0,  Free:  1
  Cascadelake gpus      Total:   0,  Used:   0,  Free:  0

Jobs on Pleiades are:
  requesting: 14824 SandyBridge, 41922 IvyBridge, 27382 Haswell, 20928 Broadwell, 1559 Electra (B), 6144 Electra (S), 4932 Aitken (C), 0 Aitken (R) nodes
  using:       1560 SandyBridge,  4627 IvyBridge,  1916 Haswell,  1702 Broadwell, 1014 Electra (B), 2165 Electra (S), 1091 Aitken (C), 1024 Aitken (R) nodes

TIP: Add /u/scicon/tools/bin to your path in .cshrc (or other shell start-up scripts) to avoid needing to enter the complete path for the node_stats.sh tool on the command line.

You can also identify which processor models are in use for each job currently running on HECC systems. Run the following command, and check the Model field in the output:

% qstat -a -W o=+model
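For the path addition mentioned in the TIP above, a csh/tcsh line such as the following would work (adjust the syntax for your shell):

set path = ( $path /u/scicon/tools/bin )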

Resource Request Examples

As shown in these examples, use the model=[san,ivy,has,bro,bro_ele,sky_ele,cas_ait,rom_ait] attribute to request the node model type(s) for your job.

Broadwell Nodes: Both Pleiades and Electra contain Broadwell nodes. If you request only model=bro nodes, then by default PBS will run your job on whichever system has available nodes first. If you specifically want your job to only run on Pleiades, then add -l site=static_broadwell to your job request. For example:

#PBS -l select=10:ncpus=28:mpiprocs=28:model=bro
#PBS -l site=static_broadwell

For more information, see Preparing to Run on Pleiades Broadwell Nodes.

Rome Nodes: Because the Rome nodes run SUSE Linux Enterprise Server Version 15 (SLES 15), you must include aoe=sles15 in your select statement. For example:

#PBS -l select=2:ncpus=128:mpiprocs=128:model=rom_ait:aoe=sles15

For more information, see Preparing to Run on AMD Rome Nodes.

Example 1: Basic Resource Request

Each of the following sample command lines requests a single node model type for a 128-process job:

#PBS -l select=8:ncpus=16:model=san
#to run all 16 cores on each of 8 Sandy Bridge nodes

#PBS -l select=7:ncpus=20:model=ivy
#to run all 20 cores on each of the 7 Ivy Bridge nodes
#(12 cores in the 7th node will go unused)

#PBS -l select=6:ncpus=24:model=has
#to run all 24 cores on each of the 6 Haswell nodes
#(16 cores in the 6th node will go unused)

#PBS -l select=5:ncpus=28:model=bro
#to run all 28 cores on each of the 5 Broadwell nodes
#(12 cores in the 5th node will go unused)

#PBS -l select=4:ncpus=40:model=sky_ele
#to run all 40 cores on each of the 4 Skylake nodes
#(32 cores in the 4th node will go unused)

#PBS -l select=4:ncpus=40:model=cas_ait
#to run all 40 cores on each of the 4 Cascade Lake nodes
#(32 cores in the 4th node will go unused)

#PBS -l select=1:ncpus=128:model=rom_ait:aoe=sles15
#to run all 128 cores on 1 Rome node

You can specify both the queue type (-q normal, debug, long, low, or devel) and the processor type. For example:

#PBS -q normal
#PBS -l select=8:ncpus=24:model=has

Example 2: Requesting Different ncpus in a Multi-Node Job

For a multi-node PBS job, the number of CPUs (ncpus) used in each node can be different. This feature is useful for jobs that need more memory for some processes, but less for other processes. You can request resources in "chunks" for a job with varying ncpus per node.

This example shows a request of two resource chunks. In the first chunk, one CPU in one Haswell node is requested, and in the second chunk, eight CPUs in each of three other Haswell nodes are requested:

#PBS -l select=1:ncpus=1:model=has+3:ncpus=8:model=has

Example 3: Requesting Multiple Node Model Types

A PBS job can run across more than one node model type if the nodes are all from Pleiades, all from Electra, or all from Aitken, but not across the systems. This can be useful in two scenarios:

• When you cannot find enough free nodes within one model for your job
• When some of your processes need more memory while others need much less

You can accomplish this by specifying the resources in chunks, with one chunk requesting one node model type and another chunk requesting a different type.

This example shows a request for one Haswell node (which provides ~122 GB of memory per node) and three Ivy Bridge nodes (which provide ~62 GB per node):

#PBS -lplace=scatter:excl:group=model
#PBS -lselect=1:ncpus=24:mpiprocs=24:model=has+3:ncpus=20:mpiprocs=20:model=ivy

PBS Job Queue Structure

To run a PBS job on the Pleiades, Aitken, or Electra compute nodes, submit the job from a Pleiades front-end system (PFE) to one of the queues managed by the PBS server pbspl1. To view a complete list of all queues, including their defaults or limits on the number of cores, wall time, and priority, use the qstat command as follows:

pfe21% qstat -Q

To view more detailed information about each queue, such as access control, defaults, limits on various resources, and status:

pfe21% qstat -fQ queue_name

Brief summaries of the various queues are provided in the following sections.

Queues for Pleiades, Aitken, and Electra Users

You can run jobs on Pleiades, Aitken and Electra using any of the following queues:

Queue     NCPUs/      Time/
name      max/def     max/def        pr
======    ========    ============   ====
low        --/  8     04:00/00:30     -10
normal     --/  8     08:00/01:00       0
long       --/  8     120:00/01:00      0
debug      --/  8     02:00/00:30      15
devel      --/  1     02:00/  --      149

The normal, long and low queues are for production work. The debug and devel queues have higher priority and are for debugging and development work.

For the debug queue, each user is allowed a maximum of 2 running jobs using a total of up to 128 nodes. For the devel queue, each user is allowed only 1 job (either queued or running).

See the following articles for more information:

• Pleiades devel Queue
• Preparing to Run on Electra Broadwell Nodes
• Preparing to Run on Electra Skylake Nodes
• Preparing to Run on Aitken Cascade Lake Nodes

Queues for GPU Users

The k40 queue provides access to nodes with a single NVIDIA Kepler K40 GPU coprocessor.

Jobs submitted to the v100 queue are routed to run on a separate PBS server, pbspl4, for nodes with four or eight NVIDIA Volta V100 GPU cards. See the following articles for more information:

• Requesting GPU Resources
• Changes to PBS Job Requests for V100 GPU Resources

Queues for Endeavour Users

Jobs submitted from PFEs to the e_normal, e_long, e_debug, and e_low queues are routed to run on the Endeavour shared-memory system, which has its own PBS server. Access to these queues requires an allocation on Endeavour.

See Endeavour PBS Server and Queue Structure for more information.

Queue for Lou Users

The ldan queue provides access to one of the Lou data analysis nodes (LDANs) for post-processing archive data on Lou.

The ldan queue is managed by the same PBS server as all of the Pleiades queues and is accessible by submitting jobs from an LFE or PFE.

Each job on the ldan queue is limited to 1 node and 72 hours. Each user can have a maximum of 2 running jobs on this queue.

See Lou Data Analysis Nodes for more information.

Queues with Restricted Access

The vlong Queue

Access to the vlong queue requires authorization from the NAS User Services manager.

A maximum of 1024 nodes and 16 days are allowed on this queue.

To avoid losing all useful work in the event of a system or application failure, ensure that your application includes checkpoint and restart capability; this is especially important when using this queue.

The wide Queue

The wide queue is a high priority queue used for jobs that meet either one of the following conditions:

• The number of requested NCPUs (number of nodes * number of cores per node) exceeds 25% of the user's mission shares
• The number of nodes requested of a processor model exceeds 25% of the total number of nodes of that model. See the Pleiades, Electra, and Aitken configuration pages for the total number of nodes for each model.

Users do not have direct access to the wide queue. Instead, a job that meets the above criteria is moved into the wide queue by NAS staff. Only 1 job in the wide queue is allowed among all HECC systems (Pleiades, Electra, and Aitken). When more than one job meets the criteria, preference is given to the job that has been waiting longer or whose owner does not have other jobs running.

Various Special Queues

Special queues are high-priority queues for time-critical projects that must meet important deadlines. Some examples of special queues are armd_spl, astro_ops, heomd_spl, kepler, smd1, smd2, smd_ops, and so on. Requests for the creation of, and access to, a special queue are carefully reviewed by the mission directorate or by the program POC who authorizes the project's SBU allocation. A request is rarely granted unless the project's principal investigator demonstrates that access to a special queue is absolutely necessary.

Advanced Reservation Queues

An advanced reservation queue allows you to reserve resources for a certain period ahead of time. One circumstance in which the NAS User Services manager may authorize the use of a reservation queue is when a user plans to run an extra-large job, such as one exceeding 15,000 cores, which would never start running without a reservation.

A reservation queue is created by NAS staff for the user and is normally named with the letter R followed by a number.

To find the status of existing PBS advanced reservations, issue the pbs_rstat command on a PFE.

Note: Queues such as alphatst, ded_time, diags, dpr, idle, resize_test, and testing are used for staff testing purposes, and are not accessible by general users.

PBS on Electra

Preparing to Run on Electra Skylake Nodes

To help you prepare for running jobs on Electra's Skylake compute nodes, this short user guide includes information on the general configuration of Skylake nodes, compiling your code, running PBS jobs, and checking allocation usage.

Overview of Electra Skylake Nodes

Electra includes 2,304 Skylake nodes, which are partitioned into sixteen physical racks. Each node contains two 20-core Xeon Gold 6148 sockets (2.4 GHz) and 192 GB of memory. The nodes are connected to the Electra InfiniBand network (ib0 and ib1) via four-lane Enhanced Data Rate (4X EDR) devices and switches for internode communication. The ib1 fabric is used mainly for I/O and is connected to the Pleiades Lustre filesystems. In addition, Electra and Pleiades share the same home filesystems, Pleiades front-end systems (PFEs), and PBS server. You can access Electra only through PBS jobs submitted from the PFEs.

Usage of Electra Skylake nodes is charged to your project's allocation on Pleiades at the Skylake Standard Billing Unit (SBU) rate of 1.59 per node (see the SBU rate table in PBS Resource Request Examples).

Compiling Your Code For Skylake Nodes

The Skylake processors include the Advanced Vector Extensions 512 (AVX-512). Intel AVX-512 optimizations are included in Intel compiler version 16.0 and later versions. We recommend that you test the latest Intel compiler, 2020.4.304, which may include better optimizations for AVX-512.

For Skylake-specific optimizations, use the compiler option -xCORE-AVX512.

If you want a single executable that will run on any of the Electra or Pleiades processor types, with suitable optimization to be determined at runtime, you can compile your application using the option: -O3 -axCORE-AVX512,CORE-AVX2 -xAVX.

Note: The use of either of these options, -xCORE-AVX512 or -axCORE-AVX512, could either improve or degrade performance of your code. Be sure to check performance with and without these flags before using them for production runs.

Running PBS Jobs on Electra Skylake Nodes

To request Electra Skylake nodes, use :model=sky_ele in your PBS script, as shown in the example below.

Important: Because MPT 2.15 and earlier versions do not support the ConnectX-5 host channel adapters (HCAs), the environment variables MPI_IB_XRC and MPI_XPMEM_ENABLED have been disabled for jobs running on Skylake. If your MPI applications perform significant MPI collective operations and rely on having these two variables enabled to get good performance, use MPT 2.16 or newer versions. To use the NAS-recommended MPT version, use the command: module load mpi-hpe/mpt

Sample PBS Script For Electra Skylake Nodes

#PBS -l select=10:ncpus=40:mpiprocs=40:model=sky_ele
#PBS -l walltime=8:00:00
#PBS -q normal

module load mpi-hpe/mpt
module load comp-intel/2020.4.304

cd $PBS_O_WORKDIR

mpiexec -np 400 ./a.out

Checking Allocation Usage For Electra Skylake Jobs

To track allocation usage for jobs running on Electra Skylake nodes, run:

acct_query -c electra_S

Examples:

• To track the total SBU usage for one of your GIDs (for example, s0001) since 9/30/17:

% acct_query -c electra_S -ps0001 -b 9/30/17

• To track the SBU usage for each of your jobs run today:

% acct_query -c electra_S -olow

For more information, see Skylake Processors.

Preparing to Run on Electra Broadwell Nodes

To help you prepare for running jobs on Electra's Broadwell compute nodes, this short review includes information on the general configuration, compiling your code, running PBS jobs, and checking allocation usage.

Overview of Electra Broadwell Nodes

Electra is housed in a module outside of the primary NAS facility. The system has 16 Broadwell racks, each containing 72 nodes. Each of the 72 nodes contains two 14-core E5-2680v4 (2.4 GHz) processor chips and 128 GB of memory.

Electra's InfiniBand fabric (ib1) is used mainly for I/O and is connected to the Pleiades Lustre filesystems. In addition, Electra and Pleiades share the same home filesystems, Pleiades front-end systems (PFEs), and PBS server. You can access Electra only through PBS jobs submitted from the PFEs.

Usage of Electra Broadwell nodes is charged to your project's allocation on Pleiades at the same Standard Billing Unit (SBU) rate as the Pleiades Broadwell nodes.

Compiling Your Code For Broadwell Nodes

See the Compiling Your Code For Broadwell Nodes section in Preparing to Run on Pleiades Broadwell Nodes.

Running PBS Jobs on Broadwell Nodes

To request Broadwell nodes, use model=bro_ele in your PBS script.

Note: If your job requests only model=bro_ele nodes, then by default PBS will run your job on either Pleiades or Electra Broadwell nodes, whichever becomes available first. If you specifically want your job to only run on Electra, then add -l site=static_broadwell to your job request. For example:

#PBS -l select=10:ncpus=28:mpiprocs=28:model=bro_ele
#PBS -l site=static_broadwell

Similarly, a request to use Pleiades Broadwell nodes using model=bro will by default run on either Pleiades or Electra Broadwell nodes. For more information, see Preparing to Run on Pleiades Broadwell Nodes.

Sample PBS Script For Broadwell Nodes

#PBS -l select=10:ncpus=28:mpiprocs=28:model=bro_ele
#PBS -l walltime=8:00:00
#PBS -q normal

module load comp-intel/2016.2.181 mpi-hpe/mpt

cd $PBS_O_WORKDIR

mpiexec -np 280 ./a.out

For more information about Broadwell nodes, see:

• Electra Configuration Details

• Broadwell Processors

Running Jobs Before Dedicated Time

The PBS batch scheduler supports a feature called shrink-to-fit (STF). This feature allows you to specify a range of acceptable wall times for a job, so that PBS can run the job sooner than it might otherwise. STF is particularly helpful when scheduling jobs before an upcoming dedicated time.

For example, suppose your typical job requires 5 days of wall time. If there are fewer than 5 days before the start of dedicated time, the job won't run until after dedicated time. However, if you know that your job can do enough useful work running for 3 days or longer, you can submit it in the following way:

qsub -l min_walltime=72:00:00,max_walltime=120:00:00 job_script
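Equivalently, the same limits can be set as PBS directives in the job script itself (the times shown are illustrative):

#PBS -l min_walltime=72:00:00,max_walltime=120:00:00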

When PBS attempts to run your job, it will initially look for a time slot of 5 days. When no such time slot is found between now and the dedicated time, it will look for shorter and shorter time slots, down to the minimum wall time of 3 days.

If you have an existing job that is still queued, you can use the qalter command to add these min_walltime and max_walltime attributes:

qalter -l min_walltime=hh:mm:ss,max_walltime=hh:mm:ss job_id

You can also use the qalter command to change the wall time:

qalter -l walltime=hh:mm:ss job_id

Note: For jobs on Endeavour, be sure to include both the job sequence number and the PBS server name, pbspl4, in the job_id (for example, 2468.pbspl4).

If you have any questions or problems, please contact the NAS Control Room at (800) 331-8737, (650) 604-4444, or by email at [email protected].

Checking the Time Remaining in a PBS Job from a Fortran Code

During job execution, sometimes it is useful to find out the amount of time remaining for your PBS job. This allows you to decide if you want to gracefully dump restart files and exit before PBS kills the job.

If you have an MPI code, you can call MPI_WTIME and see if the elapsed wall time has exceeded some threshold to decide if the code should go into the shutdown phase.

For example:

include "mpif.h"

real (kind=8) :: begin_time, end_time

begin_time = MPI_WTIME()
  ... do work ...
end_time = MPI_WTIME()

if (end_time - begin_time > XXXXX) then
   go to shutdown
endif

In addition, the following library has been made available on Pleiades for the same purpose:

/u/scicon/tools/lib/pbs_time_left.a

To use this library in your Fortran code, you need to:

1. Modify your Fortran code to define an external subroutine and an integer*8 variable:

   external pbs_time_left
   integer*8 seconds_left

2. Call the subroutine in the relevant code segment where you want the check to be performed:

   call pbs_time_left(seconds_left)
   print*,"Seconds remaining in PBS job:",seconds_left

   Note: The return value from pbs_time_left is only accurate to within a minute or two.

3. Compile your modified code and link with the above library using, for example:

   LDFLAGS=/u/scicon/tools/lib/pbs_time_left.a
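For example, a hypothetical compile-and-link line (the compiler and file names are placeholders) might be:

ifort -o my_app my_app.f90 /u/scicon/tools/lib/pbs_time_left.a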

Releasing Idle Nodes from Running Jobs

Many jobs running across multiple nodes perform setup and pre-processing tasks on the first node, then launch computations across many nodes, and finish with post-processing and cleanup steps running only on the first node. The post-processing and cleanup steps may take minutes or even hours, during which the remaining nodes in the job are idle. You can use the pbs_release_nodes command to release these idle nodes so that you are not charged for them, and they can be made available to other jobs.

Some users have already taken steps to minimize the number of idle nodes by separating their workflow into multiple jobs of appropriate sizes. The downside to this approach is additional time waiting in the queue for these multiple jobs. Using pbs_release_nodes may allow these jobs to be combined, reducing the amount of wait time in the queue.

Node Tracking and Job Accounting

When you use the pbs_release_nodes command, the released nodes are removed from the contents of $PBS_NODEFILE. The job accounting system tracks when the nodes are released, and adjusts the amount of SBUs charged accordingly.

Running the pbs_release_nodes Command

You can either include the pbs_release_nodes command in your job script or you can run it interactively, for example, on a Pleiades front-end system (pfe[20-27]). If you run it in your job script, you can use the $PBS_JOBID environment variable to supply the job_id. You can use pbs_release_nodes multiple times in a single job.

There are two ways to use the pbs_release_nodes command. In both cases, the first node is retained.

• Keep the first node and release all other nodes
  If you want to retain the first node only, run the command as follows to release the remaining nodes:

  pfe21% pbs_release_nodes -j job_id -a

• Release one or more specific nodes
  If you want to retain one or more nodes in addition to the first node, run the command as follows to release specific idle nodes:

  pfe21% pbs_release_nodes -j job_id node1 node2 ... nodeN

To find out which nodes are being used by a job before you run the pbs_release_nodes command, use one of the following methods:

• For interactive jobs: Run qstat -f job_id. In the output, look for the nodes listed under the exec_vnode job attribute.
• For job scripts: Run cat $PBS_NODEFILE. The output provides a list of the nodes in use.

Note: $PBS_NODEFILE may have duplicate entries for nodes, depending on how resources were requested. See man pbs_resources for details.

Sample Job Script with pbs_release_nodes Command

The following sample job script keeps the first node and releases all other nodes before running post-processing steps:

#PBS -S /bin/csh
#PBS -N cfd
#PBS -l select=32:ncpus=8:mpiprocs=8:model=san
#PBS -l walltime=4:00:00
#PBS -j oe
#PBS -W group_list=a0801
#PBS -m e

module load comp-intel/2020.4.304
module load mpi-hpe/mpt

cd $PBS_O_WORKDIR

mpiexec dplace -s1 -c 4-11 ./grinder < run_input > intermediate_output

# Before executing some post-processing steps that will run only on the
# first node of the job you can release all other nodes
pbs_release_nodes -j $PBS_JOBID -a

# Now run post-processing steps
./post_process < intermediate_output > final_output

# -end of script-

PBS Resource Request Examples

When you request PBS resources for your job, you must specify one or more Aitken, Electra, or Pleiades node model (processor) types, so it is helpful to keep this information in mind:

• Usage charges on each HECC node type are based on a common Standard Billing Unit (SBU).
• The total physical memory of each node type varies from 32 GB to 512 GB (with the exception of a few nodes with extra memory, known as bigmem nodes).

The following table provides the SBU rate and memory for each Aitken, Electra, and Pleiades node type:

Processor Type                SBU Rate/Node   Memory/Node
Rome (128 cores/node)         4.06            512 GB
Cascade Lake (40 cores/node)  1.64            192 GB
Skylake (40 cores/node)       1.59            192 GB
Broadwell (28 cores/node)     1.00            128 GB
Haswell (24 cores/node)       0.80            128 GB
Ivy Bridge (20 cores/node)    0.66             64 GB
Sandy Bridge (16 cores/node)  0.47             32 GB

Note: The amount of memory available to a PBS job is less than the node's total physical memory, because the system kernel can use up to 4 GB of memory in each node.

Checking Available Nodes Before Requesting Resources

If your application can run on any of the Aitken, Electra, or Pleiades processor types, you might be able to reduce wait time by requesting a type that has more free nodes (nodes currently unoccupied by other running jobs).

Before you submit your job, you can find out the current numbers of used and free nodes for each processor type by running the node_stats.sh script. For example:

Node summary according to PBS:
  Nodes Available on Pleiades : 10163   cores: 218436
  Nodes Available on Aitken   :  2213   cores: 176272
  Nodes Available on Electra  :  3247   cores: 117088
  Nodes Down Across NAS       :  1584

Nodes used/free by hardware type:
  Intel SandyBridge   cores/node:(16)  Total: 1668,  Used: 1560,  Free: 108
  Intel IvyBridge                (20)  Total: 4705,  Used: 4627,  Free:  78
  Intel Haswell                  (24)  Total: 1941,  Used: 1916,  Free:  25
  Intel Broadwell                (28)  Total: 1790,  Used: 1702,  Free:  88
  Intel Broadwell (Electra)      (28)  Total: 1066,  Used: 1014,  Free:  52
  Intel Skylake (Electra)        (40)  Total: 2181,  Used: 2165,  Free:  16
  Intel Cascadelake (Aitken)     (40)  Total: 1130,  Used: 1091,  Free:  39
  AMD ROME EPYC 7742 (Aitken)   (128)  Total: 1024,  Used: 1024,  Free:   0

Nodes currently allocated to the gpu queue:
  Sandybridge (Nvidia K80)   Total: 59,  Used: 32,  Free: 27
  Skylake (Nvidia V100)      Total: 17,  Used:  3,  Free: 14
  Cascadelake (Nvidia V100)  Total: 34,  Used: 30,  Free:  4

Nodes currently allocated to the devel queue:
  SandyBridge           Total:  72,  Used:   0,  Free: 72
  IvyBridge             Total: 872,  Used: 846,  Free: 26
  Haswell               Total:  91,  Used:  52,  Free: 39
  Broadwell             Total: 400,  Used: 381,  Free: 19
  Electra (Broadwell)   Total:   0,  Used:   0,  Free:  0
  Electra (Skylake)     Total: 100,  Used: 100,  Free:  0
  Aitken (Cascadelake)  Total:  10,  Used:  10,  Free:  0
  Aitken (Rome)         Total:   0,  Used:   0,  Free:  0
  Skylake gpus          Total:   1,  Used:   0,  Free:  1
  Cascadelake gpus      Total:   0,  Used:   0,  Free:  0

Jobs on Pleiades are:
  requesting: 14824 SandyBridge, 41922 IvyBridge, 27382 Haswell, 20928 Broadwell, 1559 Electra (B), 6144 Electra (S), 4932 Aitken (C), 0 Aitken (R) nodes
  using:       1560 SandyBridge,  4627 IvyBridge,  1916 Haswell,  1702 Broadwell, 1014 Electra (B), 2165 Electra (S), 1091 Aitken (C), 1024 Aitken (R) nodes

TIP: Add /u/scicon/tools/bin to your path in .cshrc (or other shell start-up scripts) to avoid needing to enter the complete path for the node_stats.sh tool on the command line.

You can also identify which processor models are in use for each job currently running on HECC systems. Run the following command, and check the Model field in the output:

% qstat -a -W o=+model

Resource Request Examples

As shown in these examples, use the model=[san,ivy,has,bro,bro_ele,sky_ele,cas_ait,rom_ait] attribute to request the node model type(s) for your job.

Broadwell Nodes: Both Pleiades and Electra contain Broadwell nodes. If you request only model=bro nodes, then by default PBS will run your job on whichever system has available nodes first. If you specifically want your job to only run on Pleiades, then add -l site=static_broadwell to your job request. For example:

#PBS -l select=10:ncpus=28:mpiprocs=28:model=bro
#PBS -l site=static_broadwell

For more information, see Preparing to Run on Pleiades Broadwell Nodes.

Rome Nodes: Because the Rome nodes run SUSE Linux Enterprise Server Version 15 (SLES 15), you must include aoe=sles15 in your select statement. For example:

#PBS -l select=2:ncpus=128:mpiprocs=128:model=rom_ait:aoe=sles15

For more information, see Preparing to Run on AMD Rome Nodes.

Example 1: Basic Resource Request

Each of the following sample command lines requests a single node model type for a 128-process job:

#PBS -l select=8:ncpus=16:model=san
#to run all 16 cores on each of 8 Sandy Bridge nodes

#PBS -l select=7:ncpus=20:model=ivy
#to run all 20 cores on each of the 7 Ivy Bridge nodes
#(12 cores in the 7th node will go unused)

#PBS -l select=6:ncpus=24:model=has
#to run all 24 cores on each of the 6 Haswell nodes
#(16 cores in the 6th node will go unused)

#PBS -l select=5:ncpus=28:model=bro
#to run all 28 cores on each of the 5 Broadwell nodes
#(12 cores in the 5th node will go unused)

#PBS -l select=4:ncpus=40:model=sky_ele
#to run all 40 cores on each of the 4 Skylake nodes
#(32 cores in the 4th node will go unused)

#PBS -l select=4:ncpus=40:model=cas_ait
#to run all 40 cores on each of the 4 Cascade Lake nodes
#(32 cores in the 4th node will go unused)

#PBS -l select=1:ncpus=128:model=rom_ait:aoe=sles15
#to run all 128 cores on 1 Rome node

You can specify both the queue type (-q normal, debug, long, low, or devel) and the processor type. For example:

#PBS -q normal
#PBS -l select=8:ncpus=24:model=has

Example 2: Requesting Different ncpus in a Multi-Node Job

For a multi-node PBS job, the number of CPUs (ncpus) used in each node can be different. This feature is useful for jobs that need more memory for some processes, but less for other processes. You can request resources in "chunks" for a job with varying ncpus per node.

This example shows a request of two resource chunks. In the first chunk, one CPU in one Haswell node is requested, and in the second chunk, eight CPUs in each of three other Haswell nodes are requested:

#PBS -l select=1:ncpus=1:model=has+3:ncpus=8:model=has
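If the chunks are meant to launch MPI ranks, you would typically also set mpiprocs in each chunk, as in the other examples in this document. A minimal variant of the same request (the values are illustrative) might look like:

#PBS -l select=1:ncpus=1:mpiprocs=1:model=has+3:ncpus=8:mpiprocs=8:model=has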

Example 3: Requesting Multiple Node Model Types

A PBS job can run across more than one node model type if the nodes are all from Pleiades, all from Electra, or all from Aitken, but not across the systems. This can be useful in two scenarios:

• When you cannot find enough free nodes within one model for your job • When some of your processes need more memory while others need much less

You can accomplish this by specifying the resources in chunks, with one chunk requesting one node model type and another chunk requesting a different type.

This example shows a request for one Haswell node (which provides ~122 GB of memory per node) and three Ivy Bridge nodes (which provide ~62 GB per node):

#PBS -lplace=scatter:excl:group=model
#PBS -lselect=1:ncpus=24:mpiprocs=24:model=has+3:ncpus=20:mpiprocs=20:model=ivy

Running "At-Scale" Jobs in the sky_wide Queue

TEMPORARY NOTICE: The sky_wide queue on Electra's Skylake nodes has been disabled until further notice in order to process high-priority wide jobs.

The sky_wide queue is intended to help you run very wide ("at scale") jobs on Electra's Skylake nodes by ensuring that, at least once a week, the jobs in the queue will have enough available nodes to run. Job requests submitted to the sky_wide queue must specify between 576 and 1,152 Skylake nodes (between 25% and 50% of the total number of Skylake nodes, which is currently 2,304). The maximum wall time is currently 5 days.

Submitting Your Job to the sky_wide Queue

You can submit jobs to the queue each week. Jobs submitted to the queue by Thursday at 5:00 p.m. (Pacific Time) are scheduled to run starting the following Monday at 10:00 a.m. PBS runs the jobs one at a time, prioritized by job size; wider jobs are run before those requesting fewer nodes.
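For example, assuming your job script already contains a Skylake resource request like the one in the sample script later in this article, submission is a standard qsub call from a PFE:

pfe21% qsub -q sky_wide job_script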

When it is your job's turn to run, a reservation is created for the number of nodes requested, plus a few extra nodes. The job is then moved from the sky_wide queue into the reservation's queue, where it starts running.

If the job fails early, you can submit a replacement job to the reservation queue without losing your place in line. Eventually, the queue will go idle or the reservation will expire, and the next job will be selected.

Important: Jobs submitted to the sky_wide queue after Thursday at 5:00 p.m. (Pacific Time) will not be scheduled to run the following Monday; rather, they will carry over to the next week.

Sample Job Submission Script

The following sample script includes several instructions. Each line is described below the script.

#PBS -q sky_wide
#PBS -l select=NNN:model=sky_ele:ncpus=nn:aoe=sles12
#PBS -l max_walltime="24:00:00",min_walltime="5:30:00"
#PBS -m abe
#PBS -l site="extra_nodes=4:extra_time=20:idle_timeout=20"

Job Submission

The first two lines request the sky_wide queue and submit your job:

#PBS -q sky_wide
#PBS -l select=NNN:model=sky_ele:ncpus=nn:aoe=sles12

Wall Time Request

This line specifies how much wall time the job needs. Keep in mind that if the job fails, you can submit a replacement job, but its wall time must fit within the time remaining from the original job. By specifying a maximum and minimum useful wall time in the original job script, you can use the same script, unedited, for the replacement job. The sample script requests a total of 24 hours of time, and any replacement job will not run unless at least 5.5 hours of wall time remain:

Running "At-Scale" Jobs in the sky_wide Queue 31 #PBS -l max_walltime="24:00:00",min_walltime="5:30:00"

Note: This technique, a PBS feature known as shrink-to-fit, is described in the article Running Jobs Before Dedicated Time.

Notification

If there are several jobs in the sky_wide queue, your job might not be the first to run. You can tell PBS to send you an email notification when your job starts, ends, or aborts, so you'll know whether you need to submit a replacement job:

#PBS -m abe

Additional Options

PBS automatically adds a few extra nodes to the job reservation, beyond the number of nodes in the request, in case nodes fail during the run. That way, a replacement job will still have enough reserved nodes. The default is 12 nodes. To specify a different value, use site=extra_nodes=nn:

#PBS -l site=extra_nodes=4

Recommendation: There are four Skylake nodes per blade. Because it is possible for a failing node to take down the whole blade, it makes sense to set aside extra nodes in multiples of four.

PBS also adds extra time to each job reservation. The default is 15 minutes. To specify a different value, use site=extra_time=nn:

#PBS -l site=extra_time=20

Once your per-job reservation starts, PBS monitors the reservation. If the reservation queue is idle for a set number of minutes, the reservation is terminated early. The default is 30 minutes. To specify a different value, use site=idle_timeout=nn:

#PBS -l site=idle_timeout=20

To specify multiple site options in one line, separate them by colons (":"), as shown in the sample script:

#PBS -l site="extra_nodes=4:extra_time=20:idle_timeout=20"

Note: When you consider these requests, keep in mind that your account is billed for the full number of nodes and for the actual duration of the job reservation.

Checking Status of Wide Jobs

Run the pbs_scale_jobs command, which uses qstat to display the status of jobs submitted to the sky_wide queue. For example:

$ pbs_scale_jobs
                                                       Req'd     Elap
JobID         User    Queue     Jobname    TSK   Nds   wallt  S  wallt  Eff
------------  ------  --------  --------  -----  ----  -----  -  -----  ---
3827.testpbs  zsmith  R3853     testing3  24000   600  10:15  Q  02:26   --
3792.testpbs  zsmith  sky_wide  testing1  23040   576  00:15  Q  03:11   --
3808.testpbs  zsmith  sky_wide  testing2  23040   576  00:15  Q  03:11   --

== Jobs to consider next cycle ==

Running "At-Scale" Jobs in the sky_wide Queue 32 Req'd Elap JobID User Queue Jobname TSK Nds wallt S wallt Eff ------4001.testpbs zsmith sky_wide testing5 23040 576 00:15 Q 01:10 -- 4002.testpbs zsmith sky_wide testing6 23040 576 00:15 Q 01:10 --

In this example, the jobs named testing1-3 were submitted before the cutoff time. Jobs testing5 and testing6 were submitted too late for this week's run and will be considered next week.

The testing3 job has been moved to its run queue (R3853, in this example). Note that although testing3 was submitted after testing1 and 2, the PBS scheduler runs it first because it uses more nodes than the other jobs.

Submitting Replacement Jobs

Replacement jobs are submitted to the reservation queue associated with the original job, rather than to the sky_wide queue. This queue will have a name like R12345, which changes for each reservation.

However, there is an easier way to submit a replacement job to the reservation queue. You can use the queue's alias, which takes the form NAS_job_username (where username is your NAS user name).

Also, instead of editing your qsub script to change the queue name when you submit the replacement, you can use the original job script, but override the queue on the command line. For example:

qsub -q NAS_job_zsmith job_script

Running "At-Scale" Jobs in the sky_wide Queue 33 PBS Job Queue Structure

To run a PBS job on the Pleiades, Aitken, or Electra compute nodes, submit the job from a Pleiades front-end system (PFE) to one of the queues managed by the PBS server pbspl1. To view a complete list of all queues, including their defaults or limits on the number of cores, wall time, and priority, use the qstat command as follows: pfe21% qstat -Q

To view more detailed information about each queue, such as access control, defaults, limits on various resources, and status: pfe21% qstat -fQ queue_name

Brief summaries of the various queues are provided in the following sections.

Queues for Pleiades, Aitken, and Electra Users

You can run jobs on Pleiades, Aitken and Electra using any of the following queues:

Queue    NCPUs/    Time/
name     max/def   max/def        pr
======   =======   ============   ===
low       --/8      04:00/00:30   -10
normal    --/8      08:00/01:00     0
long      --/8     120:00/01:00     0
debug     --/8      02:00/00:30    15
devel     --/1      02:00/--      149

The normal, long and low queues are for production work. The debug and devel queues have higher priority and are for debugging and development work.

For the debug queue, each user is allowed a maximum of 2 running jobs using a total of up to 128 nodes. For the devel queue, each user is allowed only 1 job (either queued or running).

See the following articles for more information:

• Pleiades devel Queue
• Preparing to Run on Electra Broadwell Nodes
• Preparing to Run on Electra Skylake Nodes
• Preparing to Run on Aitken Cascade Lake Nodes

Queues for GPU Users

The k40 queue provides access to nodes with a single NVIDIA Kepler K40 GPU coprocessor.

Jobs submitted to the v100 queue are routed to run on a separate PBS server, pbspl4, for nodes with four or eight NVIDIA Volta V100 GPU cards. See the following articles for more information:

• Requesting GPU Resources
• Changes to PBS Job Requests for V100 GPU Resources

Queues for Endeavour Users

Jobs submitted from PFEs to the e_normal, e_long, e_debug, and e_low queues are routed to run on the Endeavour shared-memory system, which has its own PBS server. Access to these queues requires allocation on Endeavour.

See Endeavour PBS Server and Queue Structure for more information.

Queue for Lou Users

The ldan queue provides access to one of the Lou data analysis nodes (LDANs) for post-processing archive data on Lou.

The ldan queue is managed by the same PBS server as all of the Pleiades queues and is accessible by submitting jobs from an LFE or PFE.

Each job on the ldan queue is limited to 1 node and 72 hours. Each user can have a maximum of 2 running jobs on this queue.

See Lou Data Analysis Nodes for more information.

Queues with Restricted Access

The vlong Queue

Access to the vlong queue requires authorization from the NAS User Services manager.

A maximum of 1024 nodes and 16 days are allowed on this queue.

To avoid losing all useful work in the event of a system or application failure, ensure that your application includes checkpoint and restart capability; this is especially important when using this queue.

The wide Queue

The wide queue is a high priority queue used for jobs that meet either one of the following conditions:

• The number of requested NCPUs (number of nodes * number of cores per node) exceeds 25% of the user's mission shares
• The number of nodes requested of a processor model exceeds 25% of the total number of nodes of that model. See the Pleiades, Electra, and Aitken configuration pages for the total number of nodes for each model.

Users do not have direct access to the wide queue. Instead, a job that meets the above criteria is moved into the wide queue by NAS staff. Only 1 job in the wide queue is allowed among all HECC systems (Pleiades, Electra, and Aitken). When more than one job meets the criteria, preference is given to the job that has been waiting longer or whose owner does not have other jobs running.

Various Special Queues

Special queues are higher priority queues for time-critical projects that must meet important deadlines. Some examples of special queues are armd_spl, astro_ops, heomd_spl, kepler, smd1, smd2, smd_ops, and so on. Requests for the creation of, and access to, a special queue are carefully reviewed by the mission directorate or by the program POC who authorizes the project's SBU allocation. Such requests are rarely granted unless the project's principal investigator demonstrates that access to a special queue is absolutely necessary.

Advanced Reservation Queues

An advanced reservation queue allows you to reserve resources for a certain period ahead of time. One circumstance in which the NAS User Services manager may authorize the use of a reservation queue is when a user plans to run an extra-large job, such as one exceeding 15,000 cores, which would never start running without a reservation.

A reservation queue is created by NAS staff for the user and is normally named with the letter R followed by a number.

To find the status of existing PBS advanced reservations, issue the pbs_rstat command on a PFE.

Note: Queues such as alphatst, ded_time, diags, dpr, idle, resize_test, and testing are used for staff testing purposes, and are not accessible by general users.

PBS on Pleiades

Preparing to Run on Pleiades Broadwell Nodes

To help you prepare for running jobs on Pleiades Broadwell compute nodes, this short review includes the general node configuration, tips on compiling your code, and PBS script examples.

Overview of Pleiades Broadwell Nodes

Each Pleiades Broadwell rack contains 72 nodes; each node contains two 14-core E5-2680v4 (2.4 GHz) processors and 128 GB of memory, providing approximately 4.6 GB per core, slightly less than the ~5.3 GB/core provided by the Haswell nodes.

The Broadwell nodes are connected to the Pleiades InfiniBand network (ib0 and ib1) via four-lane Fourteen Data Rate (4X FDR) devices and switches for internode communication.

The home and Lustre /nobackup filesystems are accessible from the Broadwell nodes.

Compiling Your Code For Broadwell Nodes

Like the Haswell processors, the Broadwell processors support the Advanced Vector Extensions 2 (AVX2) instructions, in addition to AVX (introduced with Sandy Bridge processors), SSE4.2 (introduced with Nehalem processors), and earlier generations of SSE.

AVX2 supports floating-point fused multiply-add, integer vector instructions extended to 256-bit, and gather vectorization support, among other features. Your application may be able to take advantage of AVX2; however, not all applications can make effective use of this instruction set. We recommend that you use the latest Intel compiler, by running the command module load comp-intel/2020.4.304, and experiment with the following sets of compiler options before making production runs on the Broadwell nodes:

• -O2 -xCORE-AVX2
• -O3 -xCORE-AVX2

These compiler options generate an executable that is optimized for running on Broadwell and Haswell, but cannot run on any of the older processor types. If you prefer to generate a single executable that will run on any of the existing Pleiades processor types (Sandy Bridge, Ivy Bridge, Haswell, Broadwell, Skylake, and Cascade Lake), try using one of the following sets of compiler options:

• -O2 (or -O3)
• -O2 (or -O3) -axCORE-AVX512,CORE-AVX2 -xAVX

The use of -axCORE-AVX512,CORE-AVX2 allows the compiler to generate multiple code paths with suitable optimization to be determined at run time.

If your application does not show performance improvements with -xCORE-AVX2 or -axCORE-AVX512,CORE-AVX2 -xAVX (as compared with just -O2 or -O3) when running on Broadwell and Haswell nodes, it is more advantageous to use the executable built with just -O2 or -O3 for production runs on all Pleiades processor types.
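For example, a minimal sketch of the two approaches, assuming a Fortran source file named myapp.f90 (the source and executable names are illustrative):

module load comp-intel/2020.4.304
# Executable optimized for Broadwell and Haswell only:
ifort -O2 -xCORE-AVX2 -o myapp_avx2 myapp.f90
# Single executable with multiple code paths for all Pleiades processor types:
ifort -O2 -axCORE-AVX512,CORE-AVX2 -xAVX -o myapp_multi myapp.f90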

TIP: You can add the compiler options -ip or -ipo to instruct the compiler to look for ways to better optimize and/or vectorize your code. Also, to generate a report on how well your code is vectorized, add the compiler flag -vec-report2.

Notes:

• If you have an MPI code, we strongly recommend that you load the mpi-hpe/mpt module, which always points to the NAS recommended MPT version.
• Ensure your jobs run correctly on Broadwell nodes before you start production work.

Running PBS Jobs on Broadwell Nodes

A PBS job running with a fixed number of processes or threads should use fewer Broadwell nodes than other types of Pleiades nodes, for two reasons: Broadwell nodes have more cores and more memory.

To request Broadwell nodes, use model=bro in your PBS script. For example:

#PBS -l select=xx:ncpus=yy:model=bro

Note: If your job requests only model=bro nodes, then by default PBS will run your job on either Pleiades or Electra Broadwell nodes, whichever becomes available first. If you specifically want your job to only run on Pleiades, then add -l site=static_broadwell to your job request. For example:

#PBS -l select=10:ncpus=28:mpiprocs=28:model=bro
#PBS -l site=static_broadwell

Similarly, a request to use Electra Broadwell nodes using model=bro_ele will by default run on either Pleiades or Electra Broadwell nodes. For more information, see Preparing to Run on Electra Broadwell Nodes.

Cores per Node

There are 28 cores per Broadwell node compared to 24 cores per Haswell, 20 cores per Ivy Bridge, and 16 cores per Sandy Bridge.

For example, if you have previously run a 240-process job with 10 Haswell nodes, 12 Ivy Bridge nodes, or 15 Sandy Bridge nodes, you should request 9 Broadwell nodes (where the first node can run 16 processes while the remaining 8 nodes can run 28 processes each).

For Broadwell
#PBS -lselect=1:ncpus=16:mpiprocs=16:model=bro+8:ncpus=28:mpiprocs=28:model=bro

For Haswell
#PBS -lselect=10:ncpus=24:mpiprocs=24:model=has

For Ivy Bridge
#PBS -lselect=12:ncpus=20:mpiprocs=20:model=ivy

For Sandy Bridge
#PBS -lselect=15:ncpus=16:mpiprocs=16:model=san

Memory

Except for the Haswell nodes, the Broadwell nodes provide more memory than the other Pleiades processor types, on both a per-node and a per-core basis.

For example, to run a job that needs 4 GB of memory per process, you can fit 7 processes on a 16-core Sandy Bridge node with ~30 GB/node, 15 processes on a 20-core Ivy Bridge node with ~60 GB/node, 24 processes on a 24-core Haswell node with ~122 GB/node, and 28 processes on a 28-core Broadwell node with ~122 GB/node.

Note: For all processor types, a small amount of memory per node is reserved for system usage. Therefore, the amount of memory available to a PBS job is slightly less than the total physical memory.

Sample PBS Script For Broadwell Nodes

#PBS -lselect=10:ncpus=28:mpiprocs=28:model=bro
#PBS -q devel

module load comp-intel/2020.4.304 mpi-hpe/mpt

cd $PBS_O_WORKDIR

mpiexec -np 280 ./a.out

For more information about Broadwell nodes, see:

• Pleiades Configuration Details • Broadwell Processors

Preparing to Run on Pleiades Haswell Nodes

To help you prepare for running jobs on Pleiades Haswell compute nodes, this short review includes the general node configuration, tips on compiling your code, and PBS script examples.

Overview of Haswell Nodes

Pleiades has 29 Haswell racks, each comprising 72 nodes. Each node contains two 12-core E5-2680v3 (2.5 GHz) processor chips and 128 GB of memory, providing approximately 5.3 GB of memory per core, the highest among all Pleiades processor types.

The Haswell nodes are connected to the Pleiades InfiniBand (ib0 and ib1) network via four-lane Fourteen Data Rate (4X FDR) devices and switches for inter-node communication.

The Pleiades home and Lustre (/nobackup) filesystems are accessible from the Haswell nodes.

Compiling Your Code For Haswell Nodes

The Haswell processors support the following sets of single instruction, multiple data (SIMD) instructions:

• Advanced Vector Extensions 2 (AVX2)
• AVX (introduced with Sandy Bridge processors)
• Streaming SIMD Extensions 4.2 (SSE 4.2; introduced with Nehalem processors)
• Earlier generations of SSE

The AVX2 instruction set, introduced with Haswell processors, includes the following new features:

• Floating-point fused multiply-add (FMA) support, which can double the number of peak floating-point operations compared with those run without FMA. With 256-bit floating-point vector registers and two floating-point functional units, each capable of FMA, a Haswell core can deliver 16 floating-point operations per cycle, double the number of operations a Sandy Bridge or Ivy Bridge core can deliver.
• Integer-vector instruction support is extended from 128-bit to 256-bit capability.
• Gather vectorization support is enhanced to allow vector elements to be loaded from non-contiguous memory locations.

Your application may be able to take advantage of AVX2; however, not all applications can make effective use of this instruction set. We recommend that you use the latest Intel compiler, by running the command module load comp-intel/2020.4.304, and experiment with the following sets of compiler options before making production runs on the Haswell nodes.

To generate an executable that is optimized for running on Haswell and Broadwell nodes (but will not run on any other Pleiades processor types), use:

-O2 (or -O3) -xCORE-AVX2

To generate a single executable that will run on any of the existing Pleiades processor types (Sandy Bridge, Ivy Bridge, Haswell, Broadwell, Skylake, and Cascade Lake), use:

• -O2 (or -O3)
• -O2 (or -O3) -axCORE-AVX512,CORE-AVX2 -xAVX

Note: Using -axCORE-AVX512,CORE-AVX2 allows the compiler to generate multiple code paths with suitable optimization to be determined at run time.

If your application does not show performance improvements on Haswell using either of the two AVX2 options (-xCORE-AVX2 or -axCORE-AVX512,CORE-AVX2 -xAVX), as compared with just -O2 or -O3, we recommend that you use the executable built with just -O2 or -O3 for production runs on all Pleiades processor types. Executables built with AVX or AVX2 consume more power.

If you have an MPI code that uses the HPE MPT library, use the NAS Recommended MPT version, with the command: module load mpi-hpe/mpt

TIP: If you include the -ipo compiler option to better optimize and/or vectorize your code, be aware that using the Intel compiler with any HPE/SGI MPT libraries can produce warning messages (regarding "unresolved cpuset_xxxx" and "bitmask_xxx" routines) during linking. These messages can be ignored. To make them stop, you can add the -lcpuset -lbitmask options.

Important: Ensure your jobs run correctly on Haswell nodes before starting production work.
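For example, a hypothetical build line that silences those warnings might look like the following; the source and executable names are illustrative, and -lmpi is assumed here as the usual way to link against the MPT library:

module load comp-intel/2020.4.304 mpi-hpe/mpt
# Illustrative -ipo build; -lcpuset -lbitmask suppress the MPT link warnings
ifort -O2 -xCORE-AVX2 -ipo -o myapp myapp.f90 -lmpi -lcpuset -lbitmask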

Running PBS Jobs on Haswell Nodes

To request Haswell nodes, use :model=has in your PBS script. For example:

#PBS -l select=xx:ncpus=yy:model=has

A PBS job running with a fixed number of processes or threads should use fewer Haswell nodes than the older Pleiades processor types, for the following two reasons:

• There are 24 cores per Haswell node (compared with 20 cores per Ivy Bridge, and 16 cores per Sandy Bridge node).

For example, if you have previously run a 240-process job on 12 Ivy Bridge nodes or 15 Sandy Bridge nodes, you would request 10 Haswell nodes instead:

For Haswell
#PBS -lselect=10:ncpus=24:mpiprocs=24:model=has

For Ivy Bridge
#PBS -lselect=12:ncpus=20:mpiprocs=20:model=ivy

For Sandy Bridge
#PBS -lselect=15:ncpus=16:mpiprocs=16:model=san

• Haswell processors provide more memory per core than the other Pleiades processor types. For example, to run a job that needs 5 GB of memory per process, you can fit the following number of processes on each type:
♦ 24 processes on a 24-core Haswell node (or 28-core Broadwell node) with ~122 GB/node
♦ 12 processes on a 20-core Ivy Bridge node with ~60 GB/node
♦ 6 processes on a 16-core Sandy Bridge node with ~30 GB/node

Note: For all processor types, a small amount of memory per node is reserved for system usage. Therefore, the amount of memory available to a PBS job is slightly less than the total physical memory.

Sample PBS Script For Haswell Nodes

#PBS -lselect=10:ncpus=24:mpiprocs=24:model=has
#PBS -q devel

module load comp-intel/2020.4.304 mpi-hpe/mpt

cd $PBS_O_WORKDIR

mpiexec -np 240 ./a.out

For more detailed information about the Haswell nodes, see Haswell Processors.

Preparing to Run on Pleiades Ivy Bridge Nodes

To help you prepare for running jobs on Pleiades Ivy Bridge compute nodes, this short review includes the general node configuration, tips on compiling your code, and PBS script examples.

Overview of Ivy Bridge Nodes

Pleiades has 75 Ivy Bridge racks, each containing 72 nodes. Each node contains two 10-core E5-2680v2 (2.8 GHz) processor chips and 64 GB of memory, providing 3.2 GB of memory per core.

The Ivy Bridge nodes are connected to the Pleiades InfiniBand (ib0 and ib1) network via the four-lane Fourteen Data Rate (4X FDR) devices and switches for inter-node communication.

The Lustre filesystems, /nobackuppX, are accessible from the Ivy Bridge nodes.

Compiling Your Code For Ivy Bridge Nodes

Like the Sandy Bridge processor, the Ivy Bridge processor uses Advanced Vector Extensions (AVX), which is a set of instructions for doing Single Instruction Multiple Data (SIMD) operations on Intel architecture processors.

To take advantage of AVX, we recommend that you compile your code on Pleiades with an Intel compiler (version 12 or newer; use module load comp-intel/2020.4.304 to get the latest version), using either of the following compiler flags:

• To run only on Sandy Bridge or Ivy Bridge processors: -O2 (or -O3) -xAVX
• To run on all Pleiades processor types (including Sandy Bridge, Ivy Bridge, Haswell, Broadwell, Skylake, and Cascade Lake): -O2 (or -O3) -axCORE-AVX512,CORE-AVX2 -xAVX

You can also add the compiler options -ip or -ipo, which allow the compiler to look for ways to better optimize and/or vectorize your code.

To get a report on how well your code is vectorized, add the compiler flag -vec-report2.

If you have an MPI code that uses the HPE MPT library, use the NAS Recommended MPT version, with the command: module load mpi-hpe/mpt

Note that mpt.2.04 and earlier versions do not support FDR.

TIP: Ensure that your jobs run correctly on Ivy Bridge nodes before starting production work.

Running PBS Jobs on Ivy Bridge Nodes

To request Ivy Bridge nodes, use :model=ivy in your PBS script:

#PBS -l select=xx:ncpus=yy:model=ivy

A PBS job running with a fixed number of processes or threads should use fewer Ivy Bridge nodes than Sandy Bridge nodes for two reasons:

• There are 20 cores per Ivy Bridge node, compared with 16 cores per Sandy Bridge node.

For example, if you have previously run a 240-process job with 15 Sandy Bridge nodes, you should request 12 Ivy Bridge nodes instead.

For Ivy Bridge
#PBS -lselect=12:ncpus=20:mpiprocs=20:model=ivy

For Sandy Bridge
#PBS -lselect=15:ncpus=16:mpiprocs=16:model=san

• The Ivy Bridge processor provides more memory, per node and per core, than the Sandy Bridge processor type.

For example, to run a job that needs 2.5 GB of memory per process, you can fit the following number of processes on each type:

♦ 12 processes on a 16-core Sandy Bridge node with ~30 GB/node
♦ 20 processes on a 20-core Ivy Bridge node with ~60 GB/node

Note: For all processor types, a small amount of memory per node is reserved for system usage. Thus, the amount of memory available to a PBS job is slightly less than the total physical memory.

Sample PBS Script For Ivy Bridge

#PBS -lselect=12:ncpus=20:mpiprocs=20:model=ivy
#PBS -q normal

module load comp-intel/2020.4.304 mpi-hpe/mpt

cd $PBS_O_WORKDIR

mpiexec -np 240 ./a.out

For more information about Ivy Bridge nodes, see:

• Pleiades Configuration Details • Ivy Bridge Processors

Preparing to Run on Pleiades Sandy Bridge Nodes

To help you prepare for running jobs on Pleiades Sandy Bridge compute nodes, this short review includes the general node configuration, tips on compiling your code, and PBS script examples.

Overview of Sandy Bridge Nodes

Pleiades has 26 Sandy Bridge racks, each containing 72 nodes. Each of the 72 nodes contains two 8-core E5-2670 processor chips and has 32 GB of memory.

The E5-2670 processor speed is 2.6 GHz when Turbo Boost is OFF, and can reach 3.3 GHz when Turbo Boost is ON. When Turbo Boost is enabled, idle cores are turned off and power is channeled to the active cores, making them more efficient. The net effect is that the active cores perform above their nominal clock speed (that is, they are overclocked).

The Sandy Bridge nodes are connected to the Pleiades InfiniBand (ib0 and ib1) network via the four-lane Fourteen Data Rate (4x FDR) devices and switches for inter-node communication.

The Lustre filesystems, /nobackuppX, are accessible from the Sandy Bridge nodes.

Compiling Your Code For Sandy Bridge Nodes

One important feature of the Sandy Bridge processor is its use of the Advanced Vector Extensions (AVX), which is a set of instructions for doing Single Instruction Multiple Data (SIMD) operations on Intel architecture processors.

AVX uses 256-bit floating point registers, which are twice as wide as the 128-bit registers used in the earlier processor models on Pleiades. With two floating-point functional units and 256-bit registers (which can hold four double-precision floating-point values or eight single precision floating point values), a code with well-vectorized loops can achieve a maximum of either eight double-precision floating-point operations (flops) per cycle, per core or 16 single-precision flops per cycle, per core.

To take advantage of AVX, we recommend that you compile your code with an Intel compiler (version 12 or newer; use module load comp-intel/2020.4.304 to get the latest version) on Pleiades, using either of the following compiler flags:

• To run only on Sandy Bridge (or Ivy Bridge) processors: -O2 (or -O3) -xAVX
• To run on all Pleiades processor types (including Sandy Bridge, Ivy Bridge, Haswell, Broadwell, Skylake, and Cascade Lake): -O2 (or -O3) -axCORE-AVX512,CORE-AVX2 -xAVX

You can also add the compiler options -ip or -ipo, which allow the compiler to look for ways to better optimize and/or vectorize your code.

To get a report on how well your code is vectorized, add the compiler flag -vec-report2.

To compare the performance differences between using AVX and not using AVX, we recommend that you create separate executables: one with -xAVX or -axCORE-AVX512,CORE-AVX2 -xAVX, and another one without them. If you do not notice much performance improvement using these flags, then your code does not benefit from AVX.
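A minimal sketch of the two builds, assuming a Fortran source file named myapp.f90 (the source and executable names are illustrative):

module load comp-intel/2020.4.304
# Build with AVX enabled:
ifort -O2 -xAVX -o myapp_avx myapp.f90
# Baseline build without the AVX-specific flags, for comparison:
ifort -O2 -o myapp_noavx myapp.f90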

If you have an MPI code that uses the HPE MPT library, use the mpi-hpe/mpt module, which always points to the NAS recommended MPT version.

Note that mpt.2.04 and earlier versions do not support FDR.

TIP: Ensure that your jobs run correctly on Sandy Bridge nodes before you start production work.

Running PBS Jobs on Sandy Bridge Nodes

To request Sandy Bridge nodes, use :model=san in your PBS script:

#PBS -l select=xx:ncpus=yy:model=san

To request the devel queue, use either of the following methods:

• In your PBS script, add:

#PBS -q devel

• In your qsub command line, use:

pfe% qsub -q devel your_pbs_script

Sample PBS Script For Sandy Bridge

#PBS -lselect=3:ncpus=16:mpiprocs=16:model=san
#PBS -q devel

module load comp-intel/2020.4.304 mpi-hpe/mpt

cd $PBS_O_WORKDIR

mpiexec -np 48 ./a.out

For more information about Sandy Bridge nodes, see:

• Pleiades architecture overview • Sandy Bridge Processors

Sample PBS Script for Pleiades

#PBS -S /bin/csh
#PBS -N cfd

# This example uses the Sandy Bridge nodes.
# User job can access ~31 GB of memory per Sandy Bridge node.
# A memory intensive job that needs more than ~1.9 GB
# per process should use less than 16 cores per node
# to allow more memory per MPI process. This example
# asks for 32 nodes and 8 MPI processes per node.
# This request implies 32x8 = 256 MPI processes for the job.
#PBS -l select=32:ncpus=8:mpiprocs=8:model=san
#PBS -l walltime=4:00:00
#PBS -j oe
#PBS -W group_list=a0801
#PBS -m e

# Load a compiler you use to build your executable, for example, comp-intel/2015.0.090.
module load comp-intel/2015.0.090

# Load the NAS Recommended version of the HPE MPT library.
module load mpi-hpe/mpt

# By default, PBS executes your job from your home directory.
# However, you can use the environment variable
# PBS_O_WORKDIR to change to the directory where
# you submitted your job.
cd $PBS_O_WORKDIR

# Use of dplace to pin processes to processors may improve performance.
# Here you request to pin processes to processors 4-11 of each Sandy Bridge node.
# For other processor types, you may have to pin to different processors.

# The resource request of select=32 and mpiprocs=8 implies
# that you want to have 256 MPI processes in total.
# If this is correct, you can omit the -np 256 for mpiexec
# that you might have used before.
mpiexec dplace -s1 -c 4-11 ./grinder < run_input > output

# It is a good practice to write stderr and stdout to a file (ex: output).
# Otherwise, they will be written to the PBS stderr and stdout files in /PBS/spool,
# which has a limited amount of space. When /PBS/spool is filled up, any job
# that tries to write to /PBS/spool will die.

# -end of script-

Running Jobs Before Dedicated Time

The PBS batch scheduler supports a feature called shrink-to-fit (STF). This feature allows you to specify a range of acceptable wall times for a job, so that PBS can run the job sooner than it might otherwise. STF is particularly helpful when scheduling jobs before an upcoming dedicated time.

For example, suppose your typical job requires 5 days of wall time. If there are fewer than 5 days before the start of dedicated time, the job won't run until after dedicated time. However, if you know that your job can do enough useful work running for 3 days or longer, you can submit it in the following way:

qsub -l min_walltime=72:00:00,max_walltime=120:00:00 job_script

When PBS attempts to run your job, it will initially look for a time slot of 5 days. When no such time slot is found between now and the dedicated time, it will look for shorter and shorter time slots, down to the minimum wall time of 3 days.
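The same range can also be expressed as a directive inside the job script, using the syntax shown in the sample scripts elsewhere in this document (the values here are illustrative):

#PBS -l min_walltime="72:00:00",max_walltime="120:00:00"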

If you have an existing job that is still queued, you can use the qalter command to add these min_walltime and max_walltime attributes: qalter -l min_walltime=hh:mm:ss,max_walltime=hh:mm:ss job_id

You can also use the qalter command to change the wall time: qalter -l walltime=hh:mm:ss job_id

Note: For jobs on Endeavour, be sure to include both the job sequence number and the PBS server name, pbspl4, in the job_id (for example, 2468.pbspl4).

If you have any questions or problems, please contact the NAS Control Room at (800) 331-8737, (650) 604-4444, or by email at [email protected].

Checking the Time Remaining in a PBS Job from a Fortran Code

During job execution, sometimes it is useful to find out the amount of time remaining for your PBS job. This allows you to decide if you want to gracefully dump restart files and exit before PBS kills the job.

If you have an MPI code, you can call MPI_WTIME and see if the elapsed wall time has exceeded some threshold to decide if the code should go into the shutdown phase.

For example:

include "mpif.h"

real (kind=8) :: begin_time, end_time

begin_time=MPI_WTIME() do work end_time = MPI_WTIME()

if (end_time - begin_time > XXXXX) then go to shutdown endif

In addition, the following library has been made available on Pleiades for the same purpose:

/u/scicon/tools/lib/pbs_time_left.a

To use this library in your Fortran code, you need to:

1. Modify your Fortran code to define an external subroutine and an integer*8 variable

external pbs_time_left
integer*8 seconds_left

2. Call the subroutine in the relevant code segment where you want the check to be performed:

call pbs_time_left(seconds_left)
print*,"Seconds remaining in PBS job:",seconds_left

Note: The return value from pbs_time_left is only accurate to within a minute or two.

3. Compile your modified code and link with the above library using, for example:

LDFLAGS=/u/scicon/tools/lib/pbs_time_left.a
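For example, a hypothetical compile-and-link command line using the Intel Fortran compiler (the source and executable names are illustrative):

ifort -o my_code my_code.f90 /u/scicon/tools/lib/pbs_time_left.a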

Releasing Idle Nodes from Running Jobs

Many jobs running across multiple nodes perform setup and pre-processing tasks on the first node, then launch computations across many nodes, and finish with post-processing and cleanup steps running only on the first node. The post-processing and cleanup steps may take minutes or even hours, during which the remaining nodes in the job are idle. You can use the pbs_release_nodes command to release these idle nodes so that you are not charged for them, and they can be made available to other jobs.

Some users have already taken steps to minimize the number of idle nodes by separating their workflow into multiple jobs of appropriate sizes. The downside to this approach is additional time waiting in the queue for these multiple jobs. Using pbs_release_nodes may allow these jobs to be combined, reducing the amount of wait time in the queue.

Node Tracking and Job Accounting

When you use the pbs_release_nodes command, the released nodes are removed from the contents of $PBS_NODEFILE. The job accounting system tracks when the nodes are released, and adjusts the amount of SBUs charged accordingly.

Running the pbs_release_nodes Command

You can either include the pbs_release_nodes command in your job script or you can run it interactively, for example, on a Pleiades front-end system (pfe[20-27]). If you run it in your job script, you can use the $PBS_JOBID environment variable to supply the job_id. You can use pbs_release_nodes multiple times in a single job.

There are two ways to use the pbs_release_nodes command. In both cases, the first node is retained.

• Keep the first node and release all other nodes
If you want to retain the first node only, run the command as follows to release the remaining nodes:

pfe21% pbs_release_nodes -j job_id -a

• Release one or more specific nodes
If you want to retain one or more nodes in addition to the first node, run the command as follows to release specific idle nodes:

pfe21% pbs_release_nodes -j job_id node1 node2 ... nodeN

To find out which nodes are being used by a job before you run the pbs_release_nodes command, use one of the following methods:

• For interactive jobs: Run qstat -f job_id. In the output, look for the nodes listed under the exec_vnode job attribute.
• For job scripts: Run cat $PBS_NODEFILE. The output provides a list of the nodes in use.

Note: $PBS_NODEFILE may have duplicate entries for nodes, depending on how resources were requested. See man pbs_resources for details.
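For example, because $PBS_NODEFILE may list a node more than once, a quick way to see just the distinct node names from within a job (a minimal sketch using standard shell tools) is:

sort -u $PBS_NODEFILE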

Sample Job Script with pbs_release_nodes Command

The following sample job script keeps the first node and releases all other nodes before running post-processing steps:

#PBS -S /bin/csh
#PBS -N cfd
#PBS -l select=32:ncpus=8:mpiprocs=8:model=san
#PBS -l walltime=4:00:00
#PBS -j oe
#PBS -W group_list=a0801
#PBS -m e

module load comp-intel/2020.4.304
module load mpi-hpe/mpt

cd $PBS_O_WORKDIR

mpiexec dplace -s1 -c 4-11 ./grinder < run_input > intermediate_output

# Before executing some post-processing steps that will run only on the
# first node of the job, you can release all other nodes.
pbs_release_nodes -j $PBS_JOBID -a

# Now run the post-processing steps.
./post_process < intermediate_output > final_output

# -end of script-

Selecting a Non-Default Compute Image for Your Job

The SUSE Linux Enterprise Server (SLES) operating systems running on HECC resources are updated from time to time. In most cases, you will not need to recompile or modify scripts or make any other changes when an operating system update occurs. However, if your application experiences problems after an update, or if you would like to compare performance between different OS versions, you can use a non-default compute image when you run your PBS jobs.

Note: If you are interested in trying a non-default compute image, contact us at (650) 604-4444, (800) 331-8737, or [email protected] for an up-to-date list of available images.

To use a non-default compute image, specify it in the qsub command line or in your PBS script by appending :aoe=non-default_image to the node selection, as shown in the following examples. When compute nodes become available for your job, the nodes will be booted with the compute image you requested.

Both of the following examples request the use of a non-default compute image called sles12o:

Command Line

pfe% qsub -lselect=2:model=san:aoe=sles12o job_script

PBS Script

#PBS -lselect=2:model=san:aoe=sles12o

PBS Resource Request Examples

When you request PBS resources for your job, you must specify one or more Aitken, Electra, or Pleiades node model (processor) types, so it is helpful to keep this information in mind:

• Usage charges on each HECC node type are based on a common Standard Billing Unit (SBU).
• The total physical memory of each node type varies from 32 GB to 512 GB (with the exception of a few nodes with extra memory, known as bigmem nodes).

The following table provides the SBU rate and memory for each Aitken, Electra, and Pleiades node type:

Processor Type                 SBU Rate/Node   Memory/Node
Rome (128 cores/node)          4.06            512 GB
Cascade Lake (40 cores/node)   1.64            192 GB
Skylake (40 cores/node)        1.59            192 GB
Broadwell (28 cores/node)      1.00            128 GB
Haswell (24 cores/node)        0.80            128 GB
Ivy Bridge (20 cores/node)     0.66            64 GB
Sandy Bridge (16 cores/node)   0.47            32 GB

Note: The amount of memory available to a PBS job is less than the node's total physical memory, because the system kernel can use up to 4 GB of memory in each node.
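As a rough illustration of how these rates compare, assuming the usual accounting of nodes multiplied by the SBU rate and by wall-clock hours:

10 Broadwell nodes x 1.00 x 5 hours = 50 SBUs
10 Rome nodes      x 4.06 x 5 hours = 203 SBUs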

Checking Available Nodes Before Requesting Resources

If your application can run on any of the Aitken, Electra, or Pleiades processor types, you might be able to reduce wait time by requesting a type that has more free nodes (nodes currently unoccupied by other running jobs).

Before you submit your job, you can find out the current numbers of used and free nodes for each processor type by running the node_stats.sh script. For example:

Node summary according to PBS:
Nodes Available on Pleiades : 10163  cores: 218436
Nodes Available on Aitken   :  2213  cores: 176272
Nodes Available on Electra  :  3247  cores: 117088
Nodes Down Across NAS       :  1584

Nodes used/free by hardware type:
Intel SandyBridge  cores/node:(16)  Total: 1668, Used: 1560, Free: 108
Intel IvyBridge               (20)  Total: 4705, Used: 4627, Free:  78
Intel Haswell                 (24)  Total: 1941, Used: 1916, Free:  25
Intel Broadwell               (28)  Total: 1790, Used: 1702, Free:  88
Intel Broadwell (Electra)     (28)  Total: 1066, Used: 1014, Free:  52
Intel Skylake (Electra)       (40)  Total: 2181, Used: 2165, Free:  16
Intel Cascadelake (Aitken)    (40)  Total: 1130, Used: 1091, Free:  39
AMD ROME EPYC 7742 (Aitken)  (128)  Total: 1024, Used: 1024, Free:   0

Nodes currently allocated to the gpu queue:
Sandybridge (Nvidia K80)   Total: 59, Used: 32, Free: 27
Skylake (Nvidia V100)      Total: 17, Used:  3, Free: 14
Cascadelake (Nvidia V100)  Total: 34, Used: 30, Free:  4

Nodes currently allocated to the devel queue:
SandyBridge            Total:  72, Used:   0, Free: 72
IvyBridge              Total: 872, Used: 846, Free: 26
Haswell                Total:  91, Used:  52, Free: 39
Broadwell              Total: 400, Used: 381, Free: 19
Electra (Broadwell)    Total:   0, Used:   0, Free:  0
Electra (Skylake)      Total: 100, Used: 100, Free:  0
Aitken (Cascadelake)   Total:  10, Used:  10, Free:  0
Aitken (Rome)          Total:   0, Used:   0, Free:  0
Skylake gpus           Total:   1, Used:   0, Free:  1
Cascadelake gpus       Total:   0, Used:   0, Free:  0

Jobs on Pleiades are:
requesting: 14824 SandyBridge, 41922 IvyBridge, 27382 Haswell, 20928 Broadwell, 1559 Electra (B), 6144 Electra (S), 4932 Aitken (C), 0 Aitken (R) nodes
using:       1560 SandyBridge,  4627 IvyBridge,  1916 Haswell,  1702 Broadwell, 1014 Electra (B), 2165 Electra (S), 1091 Aitken (C), 1024 Aitken (R) nodes

TIP: Add /u/scicon/tools/bin to your path in .cshrc (or other shell start-up scripts) to avoid needing to enter the complete path for the node_stats.sh tool on the command line.

You can also identify which processor models are in use for each job currently running on HECC systems. Run the following command, and check the Model field in the output:

% qstat -a -W o=+model

Resource Request Examples

As shown in these examples, use the model=[san,ivy,has,bro,bro_ele,sky_ele,cas_ait,rom_ait] attribute to request the node model type(s) for your job.

Broadwell Nodes: Both Pleiades and Electra contain Broadwell nodes. If you request only model=bro nodes, then by default PBS will run your job on whichever system has available nodes first. If you specifically want your job to only run on Pleiades, then add -l site=static_broadwell to your job request. For example:

#PBS -l select=10:ncpus=28:mpiprocs=28:model=bro
#PBS -l site=static_broadwell

For more information, see Preparing to Run on Pleiades Broadwell Nodes.

Rome Nodes: Because the Rome nodes run SUSE Linux Enterprise Server Version 15 (SLES 15), you must include aoe=sles15 in your select statement. For example:

#PBS -l select=2:ncpus=128:mpiprocs=128:model=rom_ait:aoe=sles15

For more information, see Preparing to Run on Aitken Rome Nodes.

Example 1: Basic Resource Request

Each of the following sample command lines requests a single node model type for a 128-process job:

#PBS -l select=8:ncpus=16:model=san
#to run all 16 cores on each of 8 Sandy Bridge nodes

#PBS -l select=7:ncpus=20:model=ivy
#to run all 20 cores on each of the 7 Ivy Bridge nodes
#(12 cores in the 7th node will go unused)

#PBS -l select=6:ncpus=24:model=has
#to run all 24 cores on each of the 6 Haswell nodes
#(16 cores in the 6th node will go unused)

#PBS -l select=5:ncpus=28:model=bro
#to run all 28 cores on each of the 5 Broadwell nodes
#(12 cores in the 5th node will go unused)

#PBS -l select=4:ncpus=40:model=sky_ele
#to run all 40 cores on each of the 4 Skylake nodes
#(32 cores in the 4th node will go unused)

#PBS -l select=4:ncpus=40:model=cas_ait
#to run all 40 cores on each of the 4 Cascade Lake nodes
#(32 cores in the 4th node will go unused)

#PBS -l select=1:ncpus=128:model=rom_ait:aoe=sles15
#to run all 128 cores on 1 Rome node

You can specify both the queue type (-q normal, debug, long, low, or devel) and the processor type. For example:

#PBS -q normal
#PBS -l select=8:ncpus=24:model=has

Example 2: Requesting Different ncpus in a Multi-Node Job

For a multi-node PBS job, the number of CPUs (ncpus) used in each node can be different. This feature is useful for jobs that need more memory for some processes, but less for other processes. You can request resources in "chunks" for a job with varying ncpus per node.

This example shows a request of two resource chunks. In the first chunk, one CPU in one Haswell node is requested, and in the second chunk, eight CPUs in each of three other Haswell nodes are requested:

#PBS -l select=1:ncpus=1:model=has+3:ncpus=8:model=has

Example 3: Requesting Multiple Node Model Types

A PBS job can run across more than one node model type if the nodes are all from Pleiades, all from Electra, or all from Aitken, but not across the systems. This can be useful in two scenarios:

• When you cannot find enough free nodes within one model for your job • When some of your processes need more memory while others need much less

You can accomplish this by specifying the resources in chunks, with one chunk requesting one node model type and another chunk requesting a different type.

This example shows a request for one Haswell node (which provides ~122 GB of memory per node) and three Ivy Bridge nodes (which provide ~62 GB per node):

#PBS -lplace=scatter:excl:group=model
#PBS -lselect=1:ncpus=24:mpiprocs=24:model=has+3:ncpus=20:mpiprocs=20:model=ivy

Pleiades devel Queue

NAS provides a special queue on Pleiades, called the devel queue, that is designated for development work. Using this queue for development work provides faster turnaround than using other, non-designated queues.

Note: The devel queue should not be used for production work. Misuse of the devel queue may result in loss of access to the queue. See devel Queue Policies for more information.

Flexible Node Types

From the devel queue, you can access nodes of all Pleiades, Aitken, and Electra processor types.

Two sky_gpu nodes, each containing eight GPU cards, are accessible through the devel queue on PBS server pbspl4. To learn more, see devel Queue for the sky_gpu Nodes at the end of this article.

Requesting Resources

Because the devel queue does not have a default processor type, you must specify the processor type by including the :model=xxx attribute in your PBS resource request (where xxx is san, ivy, has, bro, bro_ele, sky_ele, or cas_ait). For example, to request five Pleiades Haswell nodes:

#PBS -q devel
#PBS -l select=5:ncpus=24:model=has

Submitting Jobs

To submit your job to the devel queue, run the qsub command:

pfe21% qsub -q devel job_script
1234.pbspl1.nas.nasa.gov

To check the status of a job submitted to the devel queue, run the qstat command: pfe21% qstat -u your_username devel

Determining Available Resources

In order to obtain the fastest possible start time, run the node_stats.sh script to view the number and types of nodes currently available in the devel queue before you submit your job. For example: pfe21% node_stats.sh

Node summary according to PBS:
Nodes Available on Pleiades : 10444  cores: 225180
Nodes Available on Aitken   :  2146  cores: 170296
Nodes Available on Electra  :  3338  cores: 120344
Nodes Down Across NAS       :  1311

Nodes used/free by hardware type:
Intel SandyBridge  cores/node:(16)  Total: 1683, Used: 1527, Free: 156
Intel IvyBridge               (20)  Total: 4809, Used: 4578, Free: 231
Intel Haswell                 (24)  Total: 1969, Used: 1891, Free:  78
Intel Broadwell               (28)  Total: 1924, Used: 1820, Free: 104
Intel Broadwell (Electra)     (28)  Total: 1098, Used: 1037, Free:  61
Intel Skylake (Electra)       (40)  Total: 2240, Used: 2200, Free:  40
Intel Cascadelake (Aitken)    (40)  Total: 1099, Used: 1022, Free:  77
AMD ROME EPYC 7742 (Aitken)  (128)  Total:  987, Used:  923, Free:  64

Nodes currently allocated to the gpu queue:
Sandybridge (Nvidia K80)   Total: 59, Used:  1, Free: 58
Skylake (Nvidia V100)      Total: 19, Used:  5, Free: 14
Cascadelake (Nvidia V100)  Total: 36, Used: 30, Free:  6

Nodes currently allocated to the devel queue:
SandyBridge            Total: 381, Used: 301, Free:  80
IvyBridge              Total: 318, Used: 201, Free: 117
Haswell                Total: 142, Used:  66, Free:  76
Broadwell              Total: 146, Used:  79, Free:  67
Electra (Broadwell)    Total:   0, Used:   0, Free:   0
Electra (Skylake)      Total:   0, Used:   0, Free:   0
Aitken (Cascadelake)   Total:   0, Used:   0, Free:   0
Aitken (Rome)          Total:   0, Used:   0, Free:   0
Skylake gpus           Total:   1, Used:   0, Free:   1
Cascadelake gpus       Total:   0, Used:   0, Free:   0

Jobs on Pleiades are:
requesting: 42 SandyBridge, 5799 IvyBridge, 4938 Haswell, 1521 Broadwell, 158 Electra (B), 1600 Electra (S), 782 Aitken (C), 460 Aitken (R) nodes
using:      1527 SandyBridge, 4578 IvyBridge, 1891 Haswell, 1820 Broadwell, 1037 Electra (B), 2200 Electra (S), 1022 Aitken (C), 923 Aitken (R) nodes

In this example, the node_stats.sh output shows that there are no free Skylake or Cascade Lake nodes allocated to the devel queue, but there are 67 free Broadwell nodes on Pleiades. If you plan to submit an MPI job with 1,120 ranks that could run on either Skylake or Cascade Lake nodes (28 nodes) or on Broadwell nodes (40 nodes), request the Broadwell nodes if you want the job to start quickly; if you request Skylake or Cascade Lake nodes, there will be a delay while nodes are allocated to the queue.

Efficient Allocation of Resources Based on Demand

In order to ensure an efficient allocation of resources, the number of nodes available for the devel queue changes dynamically depending on the level of demand. During prime devel queue usage time, PBS attempts to reserve a certain number of free nodes of each Pleiades processor type (excluding Electra, Aitken, and bigmem nodes) for the devel queue. This allows newly submitted jobs to access the free nodes and start running quickly.

At other times, the number of free nodes for the devel queue is kept small.

No free Electra, Aitken, or bigmem nodes are reserved for the devel queue at any time. Instead, they are made available in the devel queue only after a job that requests them has been submitted.

This design provides the following benefits:

• Jobs submitted to the devel queue will start quickly, even during periods of heavy devel queue usage by multiple users.
• More nodes will be available to jobs that are not in the devel queue, particularly in non-prime hours, rather than being set aside for use only by the devel queue.

The configuration of free nodes is currently set as shown in the following table:

Number of free nodes

Processor      Prime Time                   Non-Prime Time
               4 AM - 7 PM (Pacific Time)   7 PM - 4 AM (Pacific Time)
Sandy Bridge   72                           24
Ivy Bridge     128                          24
Haswell        72                           24
Broadwell      24                           24

Note: This configuration will be tuned occasionally based on devel queue responsiveness. The table values will be updated accordingly.

devel Queue Policy

When using the devel queue, be aware of the following policy:

• Each user is allowed only one job on the devel queue, either running or queued. You cannot stack up a list of queued jobs or create self-submitting jobs.
• The maximum wall-clock time of a job is always 2 hours.
• A job may request no more than 512 nodes, either of a single processor/node type or a mixture of different processor/node types.
• The PBS scheduler predominately considers which users have used the most devel queue resources to decide which jobs to run next.
• devel queue usage is applied to your mission share.

The devel queue is intended for developing and testing code and cases to prepare them for production runs in other queues, such as the normal and long queues. It should not be used for performing parameter studies or for breaking a long production run into multiple consecutive runs that each fit within the 2-hour limit.

Disregarding these devel queue policies and guidelines may result in loss of access to the queue.

devel Queue for the sky_gpu Nodes

For development work between 8:00 a.m. Monday and 5:00 p.m. Friday (Pacific Time), the two 8-GPU sky_gpu nodes on Pleiades are reserved for the devel queue, accessible on the PBS server pbspl4. These nodes each contain eight NVIDIA Volta V100 GPU cards. See the following articles for information on how to access pbspl4 and request GPU resources:

• Requesting GPU Resources • PBS Job Requests for V100 GPU Resources

PBS Job Queue Structure

To run a PBS job on the Pleiades, Aitken, or Electra compute nodes, submit the job from a Pleiades front-end system (PFE) to one of the queues managed by the PBS server pbspl1. To view a complete list of all queues, including their defaults or limits on the number of cores, wall time, and priority, use the qstat command as follows: pfe21% qstat -Q

To view more detailed information about each queue, such as access control, defaults, limits on various resources, and status: pfe21% qstat -fQ queue_name

Brief summaries of the various queues are provided in the following sections.

Queues for Pleiades, Aitken, and Electra Users

You can run jobs on Pleiades, Aitken and Electra using any of the following queues:

Queue    NCPUs/    Time/
name     max/def   max/def        pr
======   =======   ============   ===
low       --/8      04:00/00:30   -10
normal    --/8      08:00/01:00     0
long      --/8     120:00/01:00     0
debug     --/8      02:00/00:30    15
devel     --/1      02:00/--      149

The normal, long and low queues are for production work. The debug and devel queues have higher priority and are for debugging and development work.

For the debug queue, each user is allowed a maximum of 2 running jobs using a total of up to 128 nodes. For the devel queue, each user is allowed only 1 job (either queued or running).

See the following articles for more information:

• Pleiades devel Queue
• Preparing to Run on Electra Broadwell Nodes
• Preparing to Run on Electra Skylake Nodes
• Preparing to Run on Aitken Cascade Lake Nodes

Queues for GPU Users

The k40 queue provides access to nodes with a single NVIDIA Kepler K40 GPU coprocessor.

Jobs submitted to the v100 queue are routed to run on a separate PBS server, pbspl4, for nodes with four or eight NVIDIA Volta V100 GPU cards. See the following articles for more information:

• Requesting GPU Resources
• Changes to PBS Job Requests for V100 GPU Resources

Queues for Endeavour Users

Jobs submitted from PFEs to the e_normal, e_long, e_debug, and e_low queues are routed to run on the Endeavour shared-memory system, which has its own PBS server. Access to these queues requires allocation on Endeavour.

See Endeavour PBS Server and Queue Structure for more information.

Queue for Lou Users

The ldan queue provides access to one of the Lou data analysis nodes (LDANs) for post-processing archive data on Lou.

The ldan queue is managed by the same PBS server as all of the Pleiades queues and is accessible by submitting jobs from an LFE or PFE.

Each job on the ldan queue is limited to 1 node and 72 hours. Each user can have a maximum of 2 running jobs on this queue.

See Lou Data Analysis Nodes for more information.

Queues with Restricted Access

The vlong Queue

Access to the vlong queue requires authorization from the NAS User Services manager.

A maximum of 1024 nodes and 16 days are allowed on this queue.

To avoid losing all useful work in the event of a system or application failure, ensure that your application includes checkpoint and restart capability; this is especially important when using this queue.

The wide Queue

The wide queue is a high priority queue used for jobs that meet either one of the following conditions:

• The number of requested NCPUs (number of nodes * number of cores per node) exceeds 25% of the user's mission shares
• The number of nodes requested of a processor model exceeds 25% of the total number of nodes of that model. See the Pleiades, Electra, and Aitken configuration pages for the total number of nodes for each model.

Users do not have direct access to the wide queue. Instead, a job that meets the above criteria is moved into the wide queue by NAS staff. Only 1 job in the wide queue is allowed among all HECC systems (Pleiades, Electra, and Aitken). When more than one job meets the criteria, preference is given to the job that has been waiting longer or whose owner does not have other jobs running.

Various Special Queues

Special queues are higher priority queues for time-critical projects to meet important deadlines. Some examples of special queues are armd_spl, astro_ops, heomd_spl, kepler, smd1, smd2, smd_ops, and so on. A request to create or gain access to a special queue is carefully reviewed by the mission directorate or by the program POC who authorizes the project's SBU allocation. The request is rarely granted unless the project's principal investigator demonstrates that access to a special queue is absolutely necessary.

Advanced Reservation Queues

An advanced reservation queue allows you to reserve resources for a certain period ahead of time. One circumstance in which the NAS User Services manager may authorize the use of a reservation queue is when a user plans to run an extra-large job, such as one exceeding 15,000 cores, which would never start running without a reservation.

A reservation queue is created by NAS staff for the user and is normally named with the letter R followed by a number.

To find the status of existing PBS advanced reservations, issue the pbs_rstat command on a PFE.

Note: Queues such as alphatst, ded_time, diags, dpr, idle, resize_test, and testing are used for staff testing purposes, and are not accessible by general users.

PBS on Endeavour

Preparing to Run on Endeavour

KNOWN ISSUE: Due to an incompatibility between the PBS server pbspl4 and the rest of the HECC environment, you cannot currently run PBS commands on the Endeavour nodes from a front end, such as a PFE (for example, pfe% qstat -Q @pbspl4). Until this issue is resolved, the workaround is to log directly into pbspl4 and run the PBS commands from there.

The Endeavour supercomputer comprises two nodes, Endeavour3 and Endeavour4. Each of the two systems includes 896 cores and six terabytes (TB) of global shared memory. See Endeavour Configuration Details for more hardware information.

Both of the Endeavour nodes are running the SUSE Linux Enterprise Server 12 (SLES 12) operating system. The nodes use Pleiades front ends (PFEs) and filesystems, and share the Pleiades InfiniBand fabric. They use PBS server pbspl4.

Connecting to Endeavour

To access the systems, submit PBS jobs from the Pleiades PFEs with the server name pbspl4 included in the qsub command line or PBS script. You cannot log directly into Endeavour3 and 4 from the PFEs unless you have a PBS job running there.

Accessing Your Data

Use your Pleiades home filesystem (/u/username) and your Lustre filesystem (/nobackup) for both Pleiades and Endeavour.

For more information on using Lustre filesystems, see Lustre Basics and Lustre Best Practices.

Compiling and Running Your Code

There are no default modules for compilers, Message Passing Interface (MPI) components, or math libraries. To use a compiler, MPI library, or math library, you must first load the corresponding module.

OpenMP Applications

Load an Intel compiler module and use the Intel compiler -qopenmp option to build the executable. Since the Cascade Lake processors used in Endeavour3 and 4 support AVX512, you can experiment with different compiler versions and flags for better performance. For example:

pfe21% module load comp-intel/2018.3.222
pfe21% ifort -O2 -qopenmp program.f

or

pfe21% module load comp-intel/2020.4.304
pfe21% ifort -Ofast -ipo -axCORE-AVX512,CORE-AVX2,AVX -xSSE4.2 -qopenmp program.f

Set a specific number of OpenMP threads in your PBS job, and use a thread-pinning method to ensure that the threads do not get in the way of each other on the same core or migrate from

one core to another. For example:

#PBS ...
module load comp-intel/2020.4.304
setenv OMP_NUM_THREADS 28

/u/scicon/tools/bin/mbind.x -cs -t28 -v ./a.out

or

setenv KMP_AFFINITY compact   # or: setenv KMP_AFFINITY scatter
./a.out > output

or

setenv KMP_AFFINITY disabled
dplace -x2 ./a.out > output

See Process/Thread Pinning Overview for more information about different pinning methods.

Note: The default OMP_STACKSIZE is set to 200 megabytes (MB). If your application experiences a segmentation fault (segfault), experiment with increasing the $OMP_STACKSIZE value to see if it helps.
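
For example, to double the default stack size (the 400M value is only illustrative):

setenv OMP_STACKSIZE 400M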

Applications that Require Scientific and Math Libraries

The Intel Math Kernel Library (MKL) is included in Intel compiler modules (versions 11.1 and later). See MKL to learn how to link to MKL libraries. In many cases, the following commands may be sufficient:

pfe21% module load comp-intel/2020.4.304
pfe21% ifort -O2 program.f -mkl

Note: By itself, the -mkl option implies -mkl=parallel, which will link to the threaded MKL library. If your application can benefit from using multiple OpenMP threads when the MKL routines are called, you should also set the environment variable OMP_NUM_THREADS, as shown in the OpenMP example in the previous section.

If you want to use a non-threaded MKL library, change the -mkl option to -mkl=sequential.
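
For example:

pfe21% ifort -O2 program.f -mkl=sequential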

MPI Applications

HPE's latest Message Passing Toolkit (MPT) library is recommended. Load the library as shown in the following example:

pfe21% module load comp-intel/2020.4.304
pfe21% module load mpi-hpe/mpt
pfe21% ifort -O2 program.f -lmpi

During runtime, load the Intel compiler and the MPT modules and use mpiexec to launch the executable:

#PBS ...
module load comp-intel/2020.4.304
module load mpi-hpe/mpt
mpiexec -np xx ./a.out > output

Running PBS Jobs

Although the Endeavour3 and Endeavour4 nodes use PFEs and Pleiades filesystems, and share the Pleiades InfiniBand fabric for I/O, they use a separate PBS server: pbspl4. Therefore, PBS jobs cannot run across the Endeavour3 and 4 and Pleiades compute nodes.

Each of the Endeavour nodes has 32 sockets, and each socket includes 28 cores and 192 gigabytes (GB) of physical memory. In each socket, PBS has access to 28 cores and 185 GB of memory, which is known as a minimum allocatable unit (MAU). In addition, one of the sockets is set aside for system operations, leaving 31 sockets available for user jobs.

PBS does not allow jobs to share resources in the same socket. However, the amount of resources in a socket that a job will receive is restricted to the amount requested. This restriction is enforced by the Linux kernel cgroups feature. For example, if you request one core and 100 GB of memory, one socket will be assigned exclusively for your job, but 27 cores and 85 GB of memory of the socket will not be accessible by your job.
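
For instance, the one-core, 100-GB request described above could be submitted as follows (queue_name is a placeholder):

pfe21% qsub -I -lselect=1:ncpus=1:mem=100G:model=cas_end -q queue_name@pbspl4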

When job accounting is enabled, your job will be charged by the number of MAUs assigned exclusively for your job.

The maximum amount of resources a job can request is:

• Endeavour3: 868 cores with 5,750 GB are the maximum resources available for PBS jobs.
• Endeavour4: 868 cores with 5,750 GB are the maximum resources available for PBS jobs.

Endeavour PBS Queues

The PBS server pbspl4 manages jobs for Endeavour3 and 4 (:model=cas_end) and the NAS V100 GPU nodes (:model=sky_gpu and :model=cas_gpu). Among the queues listed by the command qstat -Q @pbspl4, you can run jobs on the two systems using the e_normal, e_long, e_vlong, and e_debug queues. You can also find detailed settings of each queue, such as the maximum walltime and the default or minimum ncpus and memory, using the following command line (the e_normal queue is used in the example): pfe% qstat -fQ e_normal@pbspl4

PBS Commands

Run PBS commands (such as qsub, qstat, and qdel) from the PFEs and specify the PBS server, pbspl4. For example:

pfe21% qsub -q queue_name@pbspl4 job_script
pfe21% qstat -nu username @pbspl4
pfe21% qstat job_id.pbspl4
pfe21% qdel job_id.pbspl4

If you do not normally run jobs on Pleiades compute nodes, and your primary workload is on Endeavour3 and Endeavour4, you may want to consider setting the PBS_DEFAULT environment variable to pbspl4. This allows you to simplify the PBS commands, as follows:

pfe21% setenv PBS_DEFAULT pbspl4
pfe21% qsub job_script
pfe21% qstat -nu username
pfe21% qstat job_id
pfe21% qdel job_id

Job Submission Examples

The following examples demonstrate how PBS allocates resources for your job.

Note: If you do not specify a resource attribute (such as ncpus or mem), a default value for that attribute will be assigned to your job.

• To request all the resources in two MAUs:

pfe21% qsub -I -lselect=2:ncpus=28:mem=185G:model=cas_end -q queue_name@pbspl4

• To request 56 cores:

pfe21% qsub -I -lselect=1:ncpus=56:model=cas_end -q queue_name@pbspl4

Your job will be given access to 56 cores across two sockets and only the default amount of memory (32G).

• To request 1,850 GB of memory:

pfe21% qsub -I -lselect=1:mem=1850G:model=cas_end -q queue_name@pbspl4

Your job will be given access to 1,850 GB of memory spread across ten sockets and only the default number of cores (one).

• If you do not specify which Endeavour node your job should run on, PBS will assign one for you. The following example specifies Endeavour3 and requests 28 cores and 185 GB of memory:

pfe21% qsub -I -lselect=host=endeavour3:ncpus=28:mem=185G:model=cas_end -q queue_name@pbspl4

Job Accounting

The Standard Billing Unit (SBU) rate for using one MAU per hour is 1.31.
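
For example, a job that is assigned two MAUs and runs for 10 hours would be charged roughly 2 x 10 x 1.31 = 26.2 SBUs.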

As of October 1, 2019, your group's HECC SBU allocation is charged for usage on any HECC resources including Pleiades, Electra, Aitken, and Endeavour. The output from the acct_ytd command does not have separate entries for the different systems. To find Endeavour3 and Endeavour4 usage for your group, use the acct_query command instead.

For example, to view the statistics of all of your jobs that ran on Endeavour3 on March 10, 2021, type: pfe21% acct_query -c endeavour3 -d 03/10/21 -u your_username -olow

See Job Accounting Utilities to learn more about using the acct_ytd and acct_query tools.

Endeavour PBS Server and Queue Structure

KNOWN ISSUE: Due to an incompatibility between the PBS server pbspl4 and the rest of the HECC environment, you cannot currently run PBS commands on the Endeavour nodes from a front end, such as a PFE (for example, pfe% qstat -Q @pbspl4). Until this issue is resolved, the workaround is to log directly into pbspl4 and run the PBS commands from there.

When you submit jobs to Endeavour, you must keep in mind that Endeavour uses a different PBS server and different queues than those used by Pleiades.

PBS Server

The Pleiades front-end systems (PFEs) are the common front ends for submitting and managing your PBS jobs on either Pleiades or Endeavour. However, the two systems use different PBS servers, as follows:

• Pleiades server: pbspl1
• Endeavour server: pbspl4

The default PBS server is the Pleiades server, pbspl1. If you do not include a server name in your PBS commands (such as qsub, qstat, qalter and qdel), the pbspl1 server is assumed. To submit or manage Endeavour jobs you can either specify the Endeavour server in your PBS command line, or you can change your default server from pbspl1 to pbspl4.

Specify the Endeavour Server in the Command Line

You can explicitly add the server name pbspl4 to your command line. For example: pfe21% qsub -q queue_name@pbspl4 job_script

2468.pbspl4.nas.nasa.gov

In the example above, a queue is specified. If you submit the job without specifying a queue, PBS will assign the job to an appropriate Endeavour queue. For example:

pfe21% qsub -q @pbspl4 job_script
2468.pbspl4.nas.nasa.gov

Additional examples of PBS command lines that specify the pbspl4 server:

pfe21% qstat -nu username @pbspl4
pfe21% qstat 2468.pbspl4@pbspl4
pfe21% qalter -lwalltime=2:00:00 2468.pbspl4@pbspl4
pfe21% qdel 2468.pbspl4@pbspl4

Change the Default Server to pbspl4

If your primary workload is on Endeavour, and you do not normally run jobs on Pleiades compute nodes, set the environment variable PBS_DEFAULT to pbspl4. This allows you to omit pbspl4 from your command lines. For example:

pfe21% setenv PBS_DEFAULT pbspl4
pfe21% qsub -q queue_name job_script

2468.pbspl4.nas.nasa.gov

To submit a job without specifying a queue: pfe21% qsub job_script

2468.pbspl4.nas.nasa.gov

Additional examples of PBS command lines you can use after you set the PBS_DEFAULT environment variable to pbspl4:

pfe21% qstat -nu username
pfe21% qstat 2468
pfe21% qalter -lwalltime=2:00:00 2468
pfe21% qdel 2468

Queue Structure

Each Endeavour node contains 896 cores with six terabytes (TB) of memory.

To use these resources, submit PBS jobs to one of the Endeavour queues. Endeavour queue names begin with an "e_" prefix.

View the general queues for Endeavour by using the qstat command: pfe21% qstat -Q @pbspl4

Queue     Ncpus/    Time/                   State counts
name      max/def   max/def       jm   T/Q/H/W/R/E/B   pr
========  ========  ============  ===  ==============  ===
e_normal  --/ 8     08:00/01:00   --   0/0/0/0/0/0/0     0
e_long    --/ 8     72:00/36:00   --   0/0/0/0/3/0/0     0
e_vlong   --/ 8     600:00/24:00  --   0/0/0/0/0/0/0     0
e_debug   128/ 8    02:00/00:30   --   0/0/0/0/0/0/0    15

Among the Endeavour queues, the higher-priority e_debug queue is the only one that has a limit on the maximum number of cores (128 cores) per job. If you want to use the e_debug queue, you must specify -q e_debug either in the qsub command line or inside the PBS script.
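
For example, a small interactive debugging session might be requested as follows (the resource values are illustrative):

pfe21% qsub -I -q e_debug@pbspl4 -lselect=1:ncpus=28:mem=185G:model=cas_end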

To view more information about each Endeavour queue: pfe21% qstat -fQ queue_name@pbspl4

Running Jobs Before Dedicated Time

The PBS batch scheduler supports a feature called shrink-to-fit (STF). This feature allows you to specify a range of acceptable wall times for a job, so that PBS can run the job sooner than it might otherwise. STF is particularly helpful when scheduling jobs before an upcoming dedicated time.

For example, suppose your typical job requires 5 days of wall time. If there are fewer than 5 days before the start of dedicated time, the job won't run until after dedicated time. However, if you know that your job can do enough useful work running for 3 days or longer, you can submit it in the following way: qsub -l min_walltime=72:00:00,max_walltime=120:00:00 job_script
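
If you prefer to keep these limits in the job script itself, the same request can be expressed as a directive (shown here with the same 3- and 5-day window):

#PBS -l min_walltime=72:00:00,max_walltime=120:00:00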

When PBS attempts to run your job, it will initially look for a time slot of 5 days. When no such time slot is found between now and the dedicated time, it will look for shorter and shorter time slots, down to the minimum wall time of 3 days.

If you have an existing job that is still queued, you can use the qalter command to add these min_walltime and max_walltime attributes: qalter -l min_walltime=hh:mm:ss,max_walltime=hh:mm:ss job_id

You can also use the qalter command to change the wall time: qalter -l walltime=hh:mm:ss job_id

Note: For jobs on Endeavour, be sure to include both the job sequence number and the PBS server name, pbspl4, in the job_id (for example, 2468.pbspl4).

If you have any questions or problems, please contact the NAS Control Room at (800) 331-8737, (650) 604-4444, or by email at [email protected].

Checking the Time Remaining in a PBS Job from a Fortran Code

During job execution, sometimes it is useful to find out the amount of time remaining for your PBS job. This allows you to decide if you want to gracefully dump restart files and exit before PBS kills the job.

If you have an MPI code, you can call MPI_WTIME and see if the elapsed wall time has exceeded some threshold to decide if the code should go into the shutdown phase.

For example:

include "mpif.h"

real (kind=8) :: begin_time, end_time

begin_time = MPI_WTIME()
! ... do work ...
end_time = MPI_WTIME()

if (end_time - begin_time > XXXXX) then
   ! go to the shutdown phase
endif

In addition, the following library has been made available on Pleiades for the same purpose:

/u/scicon/tools/lib/pbs_time_left.a

To use this library in your Fortran code, you need to:

1. Modify your Fortran code to define an external subroutine and an integer*8 variable:

external pbs_time_left
integer*8 seconds_left

2. Call the subroutine in the relevant code segment where you want the check to be performed:

call pbs_time_left(seconds_left)
print*,"Seconds remaining in PBS job:",seconds_left

Note: The return value from pbs_time_left is only accurate to within a minute or two.

3. Compile your modified code and link with the above library using, for example:

LDFLAGS=/u/scicon/tools/lib/pbs_time_left.a

Selecting a Non-Default Compute Image for Your Job

The SUSE Linux Enterprise Server (SLES) operating systems running on HECC resources are updated from time to time. In most cases, you will not need to recompile or modify scripts or make any other changes when an operating system update occurs. However, if your application experiences problems after an update, or if you would like to compare performance between different OS versions, you can use a non-default compute image when you run your PBS jobs.

Note: If you are interested in trying a non-default compute image, contact us at (650) 604-4444, (800) 331-8737, or [email protected] for an up-to-date list of available images.

To use a non-default compute image, specify it in the qsub command line or in your PBS script by appending :aoe=non-default_image to the node selection, as shown in the following examples. When compute nodes become available for your job, the nodes will be booted with the compute image you requested.

Both of the following examples request the use of a non-default compute image called sles12o:

Command Line

pfe% qsub -lselect=2:model=san:aoe=sles12o job_script

PBS Script

#PBS -lselect=2:model=san:aoe=sles12o

PBS on GPU Nodes

Changes to PBS Job Requests for V100 GPU Resources

As of July 16, 2020, PBS handles NVIDIA Volta V100 GPU nodes differently than in the past. The new configuration allows multiple PBS jobs to share a single node. This means that you can request and access a subset of GPUs on the node for your job, so each job will utilize only the GPUs it needs, leaving the rest of the GPUs on the node available to other jobs. This enables more efficient use of the V100 nodes.

Instead of each node being treated as one unit for exclusive access by a single job, the nodes are logically split into two virtual nodes (vnodes), one for each socket and its associated CPU cores, GPUs, and memory. Unless specified otherwise, a job will have exclusive access only to the resources specified in the job request. Other jobs might run on the same vnodes at the same time but use different CPUs, GPUs and memory.

Note: The sky_gpu nodes are used below to explain the changes. The explanation also applies to the cas_gpu nodes except that there are 24 CPU cores per socket in cas_gpu.

Configuration of Skylake GPU Vnodes

The following diagrams show the logical splitting of four-GPU and eight-GPU Skylake nodes into vnodes. This configuration is similar to the standard Electra Skylake node, except that the GPU version has 18 cores and 192 gigabytes (GB) of memory per socket.

Requesting V100 GPU Resources

As a quick review, the following definitions are paraphrased from the Altair PBS Professional User's Guide:

• A host is any computer. Hosts where jobs run are often called nodes.
• A virtual node, or vnode, is an abstract object representing a set of resources that form a usable part of a machine. This could be an entire host, or a nodeboard or a blade. A single host can be made up of multiple vnodes.
• A chunk specifies the value of each resource in a set of resources that are to be allocated as a unit to a job. It is the smallest set of resources to be allocated to a job. All of a chunk is taken from a single host. One chunk may be broken across vnodes, but all participating vnodes must be from the same host.

When you request non-v100 node types using other PBS queues, the job request looks similar to the following:

#PBS -l select=100:model=ivy:ncpus=20

where 100 indicates how many "chunks" of resources are requested. Each chunk is assigned to a different node, and the job has exclusive access to all the resources on the node. This is not the case with the new v100 queue.

Specifying Chunk Resources for the v100 Queue

The chunk resources that must be specified for the v100 queue are:

ncpus
    The number of cores. Both hyperthreads on each core will be included in the chunk.
ngpus
    The number of GPUs. PBS sets the environment variable CUDA_VISIBLE_DEVICES to a list of the appropriate IDs to access the assigned GPUs.
mpiprocs
    The number of MPI ranks.
ompthreads
    PBS sets this into the OMP_NUM_THREADS environment variable.
mem
    The amount of CPU memory. PBS enforces this limit.

Specifying the Place Statement

In addition, you must specify the place statement. For our purposes, this statement has the form place=arrangement:sharing.

Arrangement Attribute

Possible values for the arrangement attribute:

free
    Place job chunks on any vnodes.
pack
    All chunks will be taken from one host.
scatter
    Only one chunk is taken from any host.
vscatter
    Only one chunk is taken from any vnode.

Sharing Attribute

Possible values for the sharing attribute:

excl
    Only this job uses the vnodes chosen.
exclhost
    The entire host is allocated to the job.
shared
    This job can share the vnodes chosen.

Resource Request Examples

For example, suppose in the past you wanted four v100 nodes all to yourself. You probably specified something like:

#PBS -l select=4:model=sky_gpu:naccelerators=4

The equivalent new request is:

#PBS -l select=4:model=sky_gpu:mpiprocs=1:ncpus=36:ngpus=4:mem=300g
#PBS -l place=scatter:excl

The number of chunks (4) and the model type (sky_gpu) stay the same. However, in order to get access to all the cores, GPUs, and memory on the nodes, you must specify these resources explicitly. Furthermore, on the place statement, you must specify exclusive access (excl).

Now suppose you want to use just one GPU for a non-MPI program, where the program uses two OpenMP threads on two cores and 40 GB of memory. To make this request:

#PBS -l select=1:model=sky_gpu:ngpus=1:ncpus=2:ompthreads=2:mem=40g
#PBS -l place=vscatter:shared

Because the program is non-MPI, you don't need to specify mpiprocs. The other resource values come directly from your requirements. The resulting assignment is shown in the following diagram:

Note: The particular cores, GPUs, and memory might differ, but all resources will come from the same vnode. (Without vscatter, the resources might be split across both vnodes in a host.)

To give this program access to two GPUs, just change ngpus=1 to ngpus=2. This results in the following assignment:

Because these resources comprise significantly less than a whole sky_gpu node, the place statement indicates that the job can share its node with other jobs.
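
For reference, the full two-GPU request described above reads:

#PBS -l select=1:model=sky_gpu:ngpus=2:ncpus=2:ompthreads=2:mem=40g
#PBS -l place=vscatter:shared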

Suppose, instead, you have an MPI program with four ranks where each rank uses one GPU, two CPU cores, and 50 GB of memory:

#PBS -l select=4:model=sky_gpu:ngpus=1:mpiprocs=1:ncpus=2:ompthreads=2:mem=50g
#PBS -l place=pack:shared

This results in the following assignment:

Here, the place value of pack means all the chunks will come from the same host, which usually gives the best MPI performance. On the other hand, if MPI is only a small part of the job, you might use place=free:shared so you don't need to wait for all resources to be available on a single host; the resources can come from multiple hosts, wherever they are available.

You might want the GPUs to be on different sockets. In this case, change the place specification to:

#PBS -l place=vscatter:shared

Each of the four chunks will be placed on a different vnode, which means the chunks' GPUs will be attached to different sockets. The sky_gpu nodes have only two sockets, so the job will be spread over multiple hosts:

For more information, see Requesting GPU Resources.

If you have any questions, please contact the NAS Control Room at (800) 331-8737, (650) 604-4444, or [email protected].

Using GPU Nodes

A graphics processing unit (GPU) is a hardware device that can accelerate some computer codes and algorithms. This article describes the two types of GPU nodes currently available at NAS. For information about running your PBS jobs on the GPU nodes, see Requesting GPU Resources. For information about developing code for the GPU nodes, see GPU Programming.

For more general information on GPUs, see the GPGPU article on Wikipedia and the NVIDIA website, which has information for developers.

If you have an application that can take advantage of GPU technology, you can use one of the NAS GPU models:

• model=san_gpu

64 nodes total. Each node contains two Sandy Bridge CPU sockets and one NVIDIA Kepler K40 GPU coprocessor (card) connected via PCI Express bus.

The Sandy Bridge sockets used in the san_gpu nodes and those in the other Pleiades Sandy Bridge nodes are both Intel Xeon E5-2670. However, the size of memory in the sockets of the san_gpu nodes is double that of the other Pleiades Sandy Bridge nodes.

• model=sky_gpu

19 nodes total. There are two different configurations:

♦ 17* nodes, each containing two Skylake CPU sockets and four NVIDIA Volta V100 GPU cards, where the CPUs and GPUs are connected via PCI Express bus
♦ Two nodes, each containing two Skylake CPU sockets and eight NVIDIA Volta V100 GPU cards, where the CPUs and GPUs are connected via PCI Express bus

The Skylake sockets used in the sky_gpu nodes are Intel Xeon Gold 6154, while those in the Electra Skylake nodes are Intel Xeon Gold 6148. The main differences in these two configurations (sky_gpu vs. Electra Skylake) are: (1) number of cores - 18 vs. 20; (2) CPU clock speed - 3.0 vs. 2.4 GHz; (3) L3 cache size - 24.75 vs. 27.5 MB; (4) memory size - 384 vs. 192 GB; and (5) number of UPI links - 3 vs. 2.

* Three of the 17 nodes are reserved for a special project; 14 nodes are available for use.

• model=cas_gpu

38* nodes total. Each node contains two Cascade Lake CPU sockets and four NVIDIA Volta V100 GPU cards, where the CPUs and GPUs are connected via PCI Express bus.

The Cascade Lake sockets used in the cas_gpu nodes are Intel Xeon Platinum 8268 while those in the Aitken Cascade Lake nodes are Intel Xeon Gold 6248. The main differences in these two configurations (cas_gpu vs Aitken Cascade Lake) are: (1) number of cores - 24 vs 20; (2) CPU clock speed - 2.9 vs. 2.5 GHz; (3) L3 cache size - 35.75 vs. 27.5 MB; and (4) memory size (384 vs 192 GB).

* Four of the 38 nodes are reserved for a special project. 34 nodes are available for public use.

The SBU rates for the GPU nodes are:

• san_gpu: 0.94
• sky_gpu with four V100s: 9.82
• sky_gpu with eight V100s: 15.55
• cas_gpu with four V100s: 9.82

Node Details

                                     san_gpu               sky_gpu                        cas_gpu
Architecture                         In-house              Apollo 6500 Gen10              Apollo 6500 Gen10
Total # of Nodes                     64                    19                             38
Host names                           r313i[0-3]n[0-15]     r101i0n[0-11,14-15],           r101i2n[0-17],
                                                           r101i1n[0-2], and              r101i3n[0-15], and
                                                           r101i0n[12-13]*                r101i4n[0-3]

CPU
Host Processor Model                 8-core Xeon E5-2670   18-core Xeon Gold 6154         24-core Xeon Platinum 8268
# of CPU Cores/Node                  16                    36                             48
CPU Clock                            2.6 GHz               3.0 GHz                        2.9 GHz
Maximum Double Precision Floating
  Point Operations per cycle/core    8                     32                             32
Total CPU Double Precision
  Flops/Node                         332.8 GFlops          3,456 GFlops                   4,454 GFlops
Memory/Node                          64 GB (DDR3)          384 GB (DDR4)                  384 GB (DDR4)

GPU
Device Name                          Tesla K40m            Tesla V100-SXM2-32GB           Tesla V100-SXM2-32GB
Clock Rate (Base/Boost)              745 MHz/875 MHz       1290 MHz/1530 MHz              1290 MHz/1530 MHz
# of Coprocessors/Node               1                     4 (17 nodes) or 8 (2 nodes)    4
# of Single Precision CUDA
  Cores/coprocessor                  2880                  5120                           5120
# of Double Precision CUDA
  Cores/coprocessor                  960                   2560                           2560
# of Tensor Cores/coprocessor        N/A                   640                            640
Total GPU Double Precision
  Flops/coprocessor (Base/Boost)     1.43/1.68 TFlops      6.6/7.8 TFlops                 6.6/7.8 TFlops
Total GPU Double Precision
  Flops/Node (Base/Boost)            1.43/1.68 TFlops      26.4/31.2 TFlops (4 V100) or   26.4/31.2 TFlops
                                                           52.8/62.4 TFlops (8 V100)
L2 Cache                             1536 KB               6144 KB                        6144 KB
Global Memory/coprocessor            12 GB (GDDR5)         32 GB (HBM2)                   32 GB (HBM2)
Memory Clock Rate                    3004 MHz              877 MHz                        877 MHz
Memory Bus Width                     384 bits              4096 bits                      4096 bits
Memory Bandwidth                     288 GB/s              900 GB/s                       900 GB/s
GPU-to-GPU Communication             N/A                   SXM2 mezzanine connector       SXM2 mezzanine connector
                                                           with NVLink 2.0 with           with NVLink 2.0 with
                                                           300 GB/s bi-directional        300 GB/s bi-directional
                                                           bandwidth                      bandwidth
PGI Compiler Option                  -ta=tesla:cc35        -ta=tesla:cc70                 -ta=tesla:cc70

Inter-node Network
IB Device on Node                    1 FDR 2-port card     2 EDR 100 Gb 2-port cards      2 EDR 100 Gb 2-port cards

Local Disk
SSD/node                             N/A                   ** 3.2 TB raw, 1.6 TB usable   ** 3.2 TB raw, 1.6 TB usable
                                                           (mirrored); 2 x 1.6 TB SAS     (mirrored); 2 x 1.6 TB SAS
                                                           SSD connected via an           SSD connected via an
                                                           internal smart RAID card       internal smart RAID card

* r101i0n[12-13] are for the two nodes containing eight V100 cards.

** Not yet available for public usage, pending configuration decisions.

For more hardware details, access a GPU node through a PBS session and run one of the following commands:

For CPU info: cat /proc/cpuinfo

For host memory info: cat /proc/meminfo

For GPU info:

/usr/bin/nvidia-smi -q

or load a PGI compiler module, for example, comp-pgi/20.4, and run the command pgaccelinfo:

module use -a /nasa/modulefiles/testing
module avail comp-pgi

---- /nasa/modulefiles/testing ----
comp-pgi/17.10  comp-pgi/18.10  comp-pgi/18.4  comp-pgi/19.10  comp-pgi/19.5  comp-pgi/20.4
---- /nasa/modulefiles/sles12 -----
comp-pgi/16.10  comp-pgi/17.1

module load comp-pgi/20.4
pgaccelinfo

Currently, no direct communication exists between a GPU on one node and a GPU on another node. If such communication is required, the data must go from the GPU to the CPU via the PCI Express bus, and from one CPU to another via InfiniBand network using MPI.

PBS Job Queue Structure

To run a PBS job on the Pleiades, Aitken, or Electra compute nodes, submit the job from a Pleiades front-end system (PFE) to one of the queues managed by the PBS server pbspl1. To view a complete list of all queues, including their defaults or limits on the number of cores, wall time, and priority, use the qstat command as follows: pfe21% qstat -Q

To view more detailed information about each queue, such as access control, defaults, limits on various resources, and status: pfe21% qstat -fQ queue_name

Brief summaries of the various queues are provided in the following sections.

Queues for Pleiades, Aitken, and Electra Users

You can run jobs on Pleiades, Aitken and Electra using any of the following queues:

Queue     NCPUs/     Time/
name      max/def    max/def       pr
========  =========  ============  ====
low       --/ 8      04:00/00:30    -10
normal    --/ 8      08:00/01:00      0
long      --/ 8      120:00/01:00     0
debug     --/ 8      02:00/00:30     15
devel     --/ 1      02:00/ --      149

The normal, long and low queues are for production work. The debug and devel queues have higher priority and are for debugging and development work.

For the debug queue, each user is allowed a maximum of 2 running jobs using a total of up to 128 nodes. For the devel queue, each user is allowed only 1 job (either queued or running).

See the following articles for more information:

• Pleiades devel Queue
• Preparing to Run on Electra Broadwell Nodes
• Preparing to Run on Electra Skylake Nodes
• Preparing to Run on Aitken Cascade Lake Nodes

Queues for GPU Users

The k40 queue provides access to nodes with a single NVIDIA Kepler K40 GPU coprocessor.

Jobs submitted to the v100 queue are routed to run on a separate PBS server, pbspl4, for nodes with four or eight NVIDIA Volta V100 GPU cards. See the following articles for more information:

• Requesting GPU Resources
• Changes to PBS Job Requests for V100 GPU Resources

Queues for Endeavour Users

Jobs submitted from PFEs to the e_normal, e_long, e_debug, and e_low queues are routed to run on the Endeavour shared-memory system, which has its own PBS server. Access to these queues requires allocation on Endeavour.

See Endeavour PBS Server and Queue Structure for more information.

Queue for Lou Users

The ldan queue provides access to one of the Lou data analysis nodes (LDANs) for post-processing archive data on Lou.

The ldan queue is managed by the same PBS server as all of the Pleiades queues and is accessible by submitting jobs from an LFE or PFE.

Each job on the ldan queue is limited to 1 node and 72 hours. Each user can have a maximum of 2 running jobs on this queue.

See Lou Data Analysis Nodes for more information.

Queues with Restricted Access

The vlong Queue

Access to the vlong queue requires authorization from the NAS User Services manager.

A maximum of 1024 nodes and 16 days are allowed on this queue.

To avoid losing all useful work in the event of a system or application failure, ensure that your application includes checkpoint and restart capability; this is especially important when using this queue.

The wide Queue

The wide queue is a high priority queue used for jobs that meet either one of the following conditions:

• The number of requested NCPUs (number of nodes * number of cores per node) exceeds 25% of the user's mission shares
• The number of nodes requested of a processor model exceeds 25% of the total number of nodes of that model. See the Pleiades, Electra, and Aitken configuration pages for the total number of nodes for each model.

Users do not have direct access to the wide queue. Instead, a job that meets the above criteria is moved into the wide queue by NAS staff. Only 1 job in the wide queue is allowed among all HECC systems (Pleiades, Electra, and Aitken). When more than one job meets the criteria, preference is given to the job that has been waiting longer or whose owner does not have other jobs running.

Various Special Queues

Special queues are higher priority queues for time-critical projects to meet important deadlines. Some examples of special queues are armd_spl, astro_ops, heomd_spl, kepler, smd1, smd2, smd_ops, and so on. A request to create or gain access to a special queue is carefully reviewed by the mission directorate or by the program POC who authorizes the project's SBU allocation. The request is rarely granted unless the project's principal investigator demonstrates that access to a special queue is absolutely necessary.

Advanced Reservation Queues

An advanced reservation queue allows you to reserve resources for a certain period ahead of time. One circumstance in which the NAS User Services manager may authorize the use of a reservation queue is when a user plans to run an extra-large job, such as one exceeding 15,000 cores, which would never start running without a reservation.

A reservation queue is created by NAS staff for the user and is normally named with the letter R followed by a number.

To find the status of existing PBS advanced reservations, issue the pbs_rstat command on a PFE.

Note: Queues such as alphatst, ded_time, diags, dpr, idle, resize_test, and testing are used for staff testing purposes, and are not accessible by general users.

Requesting GPU Resources

KNOWN ISSUE: Due to an incompatibility between the PBS server pbspl4 and the rest of the HECC environment, you cannot currently run PBS commands on the V100 nodes from a front end, such as a PFE (for example, pfe% qstat v100@pbspl4). Until this issue is resolved, the workaround is to log directly into pbspl4 and run the PBS commands from there.

There are currently two types of GPU nodes available: NVIDIA Kepler K40 GPU nodes, and NVIDIA Volta V100 GPU nodes. This article describes how to request each type for your PBS job.

Requesting K40 GPU Nodes (san_gpu)

In your PBS script or qsub command line, specify the san_gpu model type and k40 queue name. Each san_gpu node is treated as one unit for exclusive access by a single job.

#PBS -l select=xx:ncpus=yy:model=san_gpu
#PBS -q k40

Requesting V100 GPU Nodes (sky_gpu and cas_gpu)

The method for requesting V100 GPU nodes changed on July 16, 2020. Instead of each node being treated as one unit for exclusive access by a single job, the nodes are now logically split into two vnodes, one for each socket and its associated CPU cores, GPUs, and memory. These resources are now sharable by different jobs. For more detailed information, see Changes to PBS Job Requests for V100 GPU Resources.

Important: To support these changes and new features, the V100 GPU nodes are currently being served by PBS server pbspl4, until further notice. You can either add #PBS -q queue_name@pbspl4 in your PBS script or the qsub command line, or you can ssh directly from a Pleiades front end (PFE) to pbspl4 to submit a job.

Access to the V100 GPU nodes is provided through the v100 queue and the devel queue. Each is described below.

WARNING: Your requested amount of CPU memory will be enforced. Once your job uses all of your requested amount of memory, it will be terminated.

v100 Queue

All of the sky_gpu and cas_gpu nodes that contain four V100 cards are accessed through the v100 queue at all times.

The two sky_gpu nodes that contain eight V100 cards are accessed through the v100 queue on weekends only; during the week, they are accessed through the devel queue.

To find default resources assigned to a job, run: pfe% qstat -fQ v100@pbspl4 | grep default

The following examples will allocate resources in the sky_gpu nodes. Change model=sky_gpu to model=cas_gpu if you want to use the cas_gpu nodes instead. Additionally, the maximum number of CPUs you can request for Cascade Lake-based GPUs is 48 (ncpus=48) per node, whereas the max for Skylake-based GPUs is 36.

This request will result in 1 CPU, 1 GPU, and 32 GB of CPU memory:

#PBS -q v100@pbspl4

with

#PBS -lselect=1:model=sky_gpu

or

#PBS -lselect=1:model=sky_gpu:ncpus=1:ngpus=1:mem=32GB

The minimum number of CPUs that can be allocated to a job is 1. The minimum amount of memory is 256 MB.

To request exclusive access to N number of nodes, run:

#PBS -q v100@pbspl4
#PBS -lselect=N:model=sky_gpu:ncpus=36:ngpus=4:mem=360GB
#PBS -lplace=scatter:exclhost

Note: Constraints for the v100 queue are currently set as:

• Two jobs per user in the running or queued state.
• Maximum of 16 nodes per job.
• Maximum of 24 hours walltime.

devel Queue

For development work between 8:00 a.m. Monday and 5:00 p.m. Friday (Pacific Time), the two sky_gpu nodes that contain eight V100 cards are reserved for the devel queue, with a maximum of two hours and two nodes in a job.

The syntax for requesting resources via the devel queue is the same as that described for the v100 queue, except the queue name is devel and the maximum value for ngpus is 8.
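
For example, a devel request for a single eight-GPU node might look like the following (the ncpus and mem values are illustrative; adjust them to your needs):

#PBS -q devel@pbspl4
#PBS -lselect=1:model=sky_gpu:ncpus=36:ngpus=8:mem=300GB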

Reminder: When you run GPU jobs from the devel queue, you must use the PBS server pbspl4.

TIP: You can verify the CPUs and/or GPUs assigned to your job by using one of these options:

• GPU_hostname% /u/scicon/tools/bin/clist.x -g
• GPU_hostname% nvidia-smi

For detailed instructions describing how to request V100 GPU nodes, see Changes to PBS Job Requests for V100 GPU Resources.

Checking Job Status

To check the status of your jobs submitted to the k40 queue, the v100 queue, or the devel queue, run the qstat command on the specified host as follows:

pfe% qstat k40 -u your_username

pbspl4% qstat v100 -u your_username
or
pfe% qstat v100@pbspl4 -u your_username

pbspl4% qstat devel -u your_username
or
pfe% qstat devel@pbspl4 -u your_username

Managing Jobs

Reserving a Dedicated Compute Node

The reservable front end feature allows you to reserve dedicated access to a Pleiades or Electra compute node for up to 90 days. You can use the node the same way you would use a PFE (to submit jobs and perform other tasks such as post-processing, visualization, and so on), except that the node will be reserved specifically for you (or your group) so you can use all of the node's processor and memory resources. No other users will have access to the node.

Reserving a Dedicated Node

To request a dedicated node, run the "reservable front end" command (pbs_rfe), specifying what type of node you want and for how long. The command makes a PBS advanced reservation for a node, and tells you which node is reserved.

You can submit batch or interactive jobs, check status, and so on, from your reserved node just as you would from a PFE.

Exception: You cannot run an interactive job on Electra nodes from a Pleiades node (or vice versa).

pbs_rfe Command Usage

Run pbs_rfe --help to see the command options and arguments, as shown below:

$ pbs_rfe --help
usage: pbs_rfe [-h] [--debug] [--duration DURATION] [--group GROUPS]
               [--model MODEL] [--name NAME] [--starttime STARTTIME]
               [--user USERS] [--version] [-W BILL_GROUP]
               [{request,delete,status,which}] [clue]
...
positional arguments:
  {request,delete,status,which}
                        action to perform
  clue                  node, reservation id, or reservation name

optional arguments:
  -h, --help            show this help message and exit
  --debug, -d           increase debugging verbosity
  --duration DURATION, -D DURATION
                        how long to reserve front-end. Format: [days+]hours
  --group GROUPS, -G GROUPS
                        groups whose members can also use front-end
  --model MODEL, -m MODEL
                        processor type
  --name NAME, -N NAME  name to give reservation
  --starttime STARTTIME, -R STARTTIME
                        when reservation should start. Format as for date(1) command
  --user USERS, -U USERS
                        other userid[s] who can also use front-end
  --version             show program's version number and exit
  -W BILL_GROUP, --group_list BILL_GROUP
                        GID to charge reservation to

In addition to the default "request" action, you can "delete" a reservation, get the "status" of a reservation, or see "which" node is

assigned to a reservation.

pbs_rfe delete resv_id    cancel a reservation (only one at a time)
pbs_rfe status            list your reservations and their statuses
pbs_rfe which             display the nodes assigned to you

The --starttime option allows you to specify when the reservation will start (the default is to start as soon as possible).

The --user option lets you specify other users who will be permitted to log in to your reserved host with you. For example, this might be useful during a shared debugging session. You can also use the --group option to permit all users whose primary group is in the specified list.
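
For example, to reserve a Broadwell node for five days and allow a colleague to use it as well (the username jsmith is hypothetical):

$ pbs_rfe --duration 5+ --model bro --user jsmith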

If you forget which node is assigned, you can run pbs_rfe which to identify the correct node. For example:

$ pbs_rfe which
r617i0n13

You can also use the which option in scripts, for example, when you need to learn which node is yours.

pbs_rfe Command Example

This example demonstrates how to use the pbs_rfe command to reserve a Broadwell node for 10 days:

$ pbs_rfe --duration 10+ --model bro

PBS assigns the r617i0n13 node:

$ pbs_rfe --duration 10+ --model bro
Trying to make request for 2 minutes from now
r617i0n13

To find out the status of your reservation, run pbs_rfe status. For example:

$ pbs_rfe status
Resv name   Resv ID    User    ST  Start / End                   Node
---------   --------   ------  --  ---------------------------   ---------
FE_zsmith   R3286463   zsmith  CO  Feb 05 15:38 / Feb 15 15:38   r617i0n13

Note: You cannot use the node until the reservation starts (a two minute delay, in our example). PBS will send you an email at the start of the reservation period.

When the reservation is ready, the status (ST) will change from CO to RN.

Ending the Node Reservation

When you are finished with a reserved node, you should delete the reservation using the pbs_rfe delete command. For example:

$ pbs_rfe status
Resv name   Resv ID    User    ST  Start / End                   Node
---------   --------   ------  --  ---------------------------   ---------
FE_zsmith   R3286638   zsmith  CO  Feb 09 10:00 / Feb 10 10:00   r601i2n6

$ pbs_rfe delete R3286638
R3286638.pbspl1.nas.nasa.gov deleted.

Setting Up the Node

While you can simply SSH into the reserved node from a PFE to run commands, the node is more useful if you start a screen or VNC session there. This allows your sessions on the node to survive even if the PFE you logged into crashes.

Note: This section describes how to set up VNC on the reserved node. For information on using the screen tool, see man screen or info screen.

VNC Setup

If you want to use a graphical user interface (GUI) to run commands on your node, you'll want to use VNC to set that up. We recommend reading the article VNC: A Faster Alternative to X11 to learn more about VNC and how to establish a VNC session. However, using VNC on a reserved node is simpler than described in that article, once a few preliminary steps are accomplished. These steps are described below.

Before You Begin:

• Set up ssh pass-through.
• Make sure you have TigerVNC on your local workstation. Other VNC client programs don't work as well with the current Linux Xvnc server.

Complete these steps:

1. Remove any $HOME/.vnc/xstartup file you might already have on Pleiades. When you start vncserver in the next step, a new default xstartup file will be created that uses icewm as the window manager. (Gnome does not work correctly and should not be used.)

2. Using ssh, log into your reserved node and start vncserver. Do not use the -localhost option.

$ ssh r601i1n3 vncserver
New 'r601i1n3:1 (zsmith)' desktop is r601i1n3:1
Starting applications specified in /u/zsmith/.vnc/xstartup
Log file is /u/zsmith/.vnc/r601i1n3:1.log

3. Create a network tunnel from your workstation to the reserved node. The following command logs you into a PFE with a tunnel that leads to the node:

$ ssh -L 5901:node_name:5901 pfe

where node_name is the name of the reserved node (r601i1n3, in our example). Note that since you are the only user on your node, the 5901 parts of the command line are constant, unlike when you run vncserver on a PFE node.

4. Start TigerVNC on your workstation and connect to localhost:5901. You will be connected to your graphical environment on your reserved node.

VNC Setup for Sharing with Multiple Users

If you share the reservation among multiple users, each with their own VNC session on the reserved node, then users after the first need to use a slightly modified ssh tunnel:

$ ssh -L 5901:node_name:59xx pfe

where the xx in 59xx changes for each user. That is, when a later user starts vncserver, the first line of the response will look something like this:

New 'r601i1n3:2 (username)' desktop is r601i1n3:2

The number at the end of the line, after the colon, should be added to 5900. In the example above, the number is 2, so the ssh command becomes:

$ ssh -L 5901:node_name:5902 pfe

Limitations

• The compute node reserved via pbs_rfe does not have the PBS_NODEFILE environment variable set by default. If you plan to run any application that relies on having PBS_NODEFILE set, you will have to set it manually. For example, if you try to run an MPI job on the node without presetting PBS_NODEFILE, you will get the following error message:

name_of_compute_node> mpiexec -np X a.out
Not invoked from a known Work Load Manager:
 o For PBS   : PBS_NODEFILE is not set.
 o For SLURM : SLURM_JOB_ID is not set.
mpiexec.real error: Aborting.

The solution is to create a file (for example, a file named "hfile") and put the name of your reserved node in it as many times as you want MPI ranks. Then, set the PBS_NODEFILE environment variable to the path to hfile, as shown in the sketch after this list.

(for bash) export PBS_NODEFILE=/path/to/hfile

(for csh) setenv PBS_NODEFILE /path/to/hfile

• Compute nodes do not have all of the software packages that are installed on the PFEs. You might need to load the current pkgsrc module to get access to some commands. See Using Software Packages in pkgsrc for more information.
• /tmp on a node is small and takes memory away from programs.
• The reserved nodes don't have direct network access to hosts outside Pleiades and Lou. Use your PFE login to transfer files to outside hosts.
• The cron daemon does not run on reserved nodes. Again, use a PFE for that type of activity.
• While reservations are allowed for up to 90 days, a reserved node might need to be rebooted sooner than that (e.g., for system patches), or you might run the node out of memory. You'll still have the node reserved after the reboot, but anything you were doing will be lost. Please be sure to save your work frequently.
• Currently, each user is limited to only one node reserved at a time.
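
As referenced in the PBS_NODEFILE item above, here is a minimal sketch of setting up such a node file, assuming a reserved node named r601i1n3 and four MPI ranks (both values are hypothetical):

r601i1n3> echo r601i1n3 > hfile
r601i1n3> echo r601i1n3 >> hfile
r601i1n3> echo r601i1n3 >> hfile
r601i1n3> echo r601i1n3 >> hfile
r601i1n3> setenv PBS_NODEFILE `pwd`/hfile   # csh syntax; use export for bash
r601i1n3> mpiexec -np 4 ./a.out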

Account Charging

Reserved nodes are charged Standard Billing Units (SBUs) at the same rate as PBS jobs, whether you are actively using the node or not. As of June 2018, it costs about 44 SBUs per day for a Sandy Bridge node, 61 for Ivy Bridge, 80 for Haswell, 97 for Broadwell, and 153 for Skylake. Reservations are billed daily. Be sure to delete your reservation when you no longer need it.

To reduce costs, a few users can share the same reservation. (See the --user and --group options above, and VNC Setup for Sharing with Multiple Users.)

Running Jobs Before Dedicated Time

The PBS batch scheduler supports a feature called shrink-to-fit (STF). This feature allows you to specify a range of acceptable wall times for a job, so that PBS can run the job sooner than it might otherwise. STF is particularly helpful when scheduling jobs before an upcoming dedicated time.

For example, suppose your typical job requires 5 days of wall time. If there are fewer than 5 days before the start of dedicated time, the job won't run until after dedicated time. However, if you know that your job can do enough useful work running for 3 days or longer, you can submit it in the following way: qsub -l min_walltime=72:00:00,max_walltime=120:00:00 job_script

When PBS attempts to run your job, it will initially look for a time slot of 5 days. When no such time slot is found between now and the dedicated time, it will look for shorter and shorter time slots, down to the minimum wall time of 3 days.

If you have an existing job that is still queued, you can use the qalter command to add these min_walltime and max_walltime attributes: qalter -l min_walltime=hh:mm:ss,max_walltime=hh:mm:ss job_id

You can also use the qalter command to change the wall time: qalter -l walltime=hh:mm:ss job_id

Note: For jobs on Endeavour, be sure to include both the job sequence number and the PBS server name, pbspl4, in the job_id (for example, 2468.pbspl4).

If you have any questions or problems, please contact the NAS Control Room at (800) 331-8737, (650) 604-4444, or by email at [email protected].

Checking the Time Remaining in a PBS Job from a Fortran Code

During job execution, sometimes it is useful to find out the amount of time remaining for your PBS job. This allows you to decide if you want to gracefully dump restart files and exit before PBS kills the job.

If you have an MPI code, you can call MPI_WTIME and see if the elapsed wall time has exceeded some threshold to decide if the code should go into the shutdown phase.

For example:

include "mpif.h"

real (kind=8) :: begin_time, end_time

begin_time = MPI_WTIME()
! ... do work ...
end_time = MPI_WTIME()

if (end_time - begin_time > XXXXX) then
   ! go to the shutdown phase
endif

In addition, the following library has been made available on Pleiades for the same purpose:

/u/scicon/tools/lib/pbs_time_left.a

To use this library in your Fortran code, you need to:

1. Modify your Fortran code to define an external subroutine and an integer*8 variable:

external pbs_time_left
integer*8 seconds_left

2. Call the subroutine in the relevant code segment where you want the check to be performed:

call pbs_time_left(seconds_left)
print*,"Seconds remaining in PBS job:",seconds_left

Note: The return value from pbs_time_left is only accurate to within a minute or two.

3. Compile your modified code and link with the above library using, for example:

LDFLAGS=/u/scicon/tools/lib/pbs_time_left.a

Releasing Idle Nodes from Running Jobs

Many jobs running across multiple nodes perform setup and pre-processing tasks on the first node, then launch computations across many nodes, and finish with post-processing and cleanup steps running only on the first node. The post-processing and cleanup steps may take minutes or even hours, during which the remaining nodes in the job are idle. You can use the pbs_release_nodes command to release these idle nodes so that you are not charged for them, and they can be made available to other jobs.

Some users have already taken steps to minimize the number of idle nodes by separating their workflow into multiple jobs of appropriate sizes. The downside to this approach is additional time waiting in the queue for these multiple jobs. Using pbs_release_nodes may allow these jobs to be combined, reducing the amount of wait time in the queue.

Node Tracking and Job Accounting

When you use the pbs_release_nodes command, the released nodes are removed from the contents of $PBS_NODEFILE. The job accounting system tracks when the nodes are released, and adjusts the amount of SBUs charged accordingly.

Running the pbs_release_nodes Command

You can either include the pbs_release_nodes command in your job script or you can run it interactively, for example, on a Pleiades front-end system (pfe[20-27]). If you run it in your job script, you can use the $PBS_JOBID environment variable to supply the job_id. You can use pbs_release_nodes multiple times in a single job.

There are two ways to use the pbs_release_nodes command. In both cases, the first node is retained.

• Keep the first node and release all other nodes

If you want to retain the first node only, run the command as follows to release the remaining nodes:

pfe21% pbs_release_nodes -j job_id -a

• Release one or more specific nodes

If you want to retain one or more nodes in addition to the first node, run the command as follows to release specific idle nodes:

pfe21% pbs_release_nodes -j job_id node1 node2 ... nodeN

To find out which nodes are being used by a job before you run the pbs_release_nodes command, use one of the following methods:

• For interactive jobs: Run qstat -f job_id. In the output, look for the nodes listed under the exec_vnode job attribute.
• For job scripts: Run cat $PBS_NODEFILE. The output provides a list of the nodes in use.

Note: $PBS_NODEFILE may have duplicate entries for nodes, depending on how resources were requested. See man pbs_resources for details.

Sample Job Script with pbs_release_nodes Command

The following sample job script keeps the first node and releases all other nodes before running post-processing steps:

#PBS -S /bin/csh
#PBS -N cfd
#PBS -l select=32:ncpus=8:mpiprocs=8:model=san
#PBS -l walltime=4:00:00
#PBS -j oe
#PBS -W group_list=a0801
#PBS -m e

module load comp-intel/2020.4.304
module load mpi-hpe/mpt

cd $PBS_O_WORKDIR

mpiexec dplace -s1 -c 4-11 ./grinder < run_input > intermediate_output

# Before executing some post-processing steps that will run only on the
# first node of the job you can release all other nodes
pbs_release_nodes -j $PBS_JOBID -a

# Now run post-processing steps
./post_process < intermediate_output > final_output

# -end of script-

Monitoring with myNAS

Installing the myNAS Mobile App

The myNAS application provides remote PBS job monitoring capability on your iPhone, iPad, or Android device. With myNAS, you can:

• View the status of your PBS jobs
• Receive notifications when your jobs change state or produce output
• Access job output files
• View the status and availability of HECC computing resources
• Read the latest HECC news

For more information about myNAS features, see Using the myNAS Mobile App.

Downloading and Installing myNAS

To download the myNAS app, you need a NASA Agency User ID (AUID) and password. To view job information using the app, you must have an active NAS account.

Complete these steps to download and install myNAS for iOS or Android.

1. Scan this QR code to quickly access the myNAS download page:

[QR code linking to the myNAS download page]

Note: If you prefer, you can launch a browser on your device and navigate to the myNAS page on the Apps@NASA site.

2. On the myNAS page, scroll until you see links for iOS and Android, and tap the appropriate link.
3. Tap Please sign in to install this app, then log into Launchpad using your RSA token or your AUID and password. Once you log in, the previous screen will appear again, showing the links for iOS and Android. Tap the link for your operating system again.
4. Tap the download link for your operating system (for example, Download myNAS iOS 1.1.5), and follow the installation steps on your device.
5. After the app finishes downloading, the myNAS icon will appear on your device. Tap the icon to open the app.

iOS users: If you receive the notification "Untrusted Enterprise Developer" when you attempt to open myNAS, you will need to enable NASA apps in your device settings as follows: Settings > General > Device Management > NASA > Trust "NASA." For more information, see Guidelines for installing custom enterprise apps on iOS at the Apple support website.

6. Log into myNAS using your AUID and password. You can create a PIN to use for subsequent logins.
7. Optional: In your device settings, enable push notifications for myNAS if you want to receive alerts (recommended).

After you log into myNAS, the app will retrieve your job information. This can take up to 2-3 minutes the first time you log in; subsequent job information updates will take only a few seconds.

Notes:

• Several NASA applications share the same PIN. If you already have a NASA app on your device, such as NASA Contacts or WebTADS, myNAS might request only your existing PIN rather than your AUID and password.
• If you change your AUID password, you will be prompted to re-authenticate using your full AUID and password the next time you log into myNAS. (You might also need to reset your PIN.)
• On the iPad, myNAS runs with an iPhone emulator. You can view the app in window mode, which has the same resolution as the iPhone, or in full screen mode. To switch between the two views, toggle the 2x icon in the lower right corner of the screen.

Using the myNAS Mobile App

myNAS enables you to easily monitor PBS jobs on your iPhone, iPad, or Android device. With myNAS, you can access job and resource information at your convenience and receive notifications when jobs start, stop, or change status. In addition, you can send job output to the myNAS web server to make it available for viewing in the app.

The information in this article applies to the most recent versions of myNAS:

• myNAS for iOS: Version 1.1.6
• myNAS for Android: Version 1.1.4.2

To download and install myNAS, see Installing the myNAS Mobile App.

Accessing Job Information in myNAS

Information about your jobs and NAS resources is provided in the Jobs, Resources, and News tabs. The Help tab links directly to this article.

View Your Jobs

myNAS opens on the Jobs tab, which displays a list of your most recent Pleiades, Electra, and Endeavour jobs. Tap any job in the list to see details such as tasks, models requested, time requested, efficiency, group ID, queue, job state, and mission (these options can be customized in the Settings tab). You can email job information by tapping the share icon (iOS) or the Share button (Android) on the details page.

You can also view output files in the Jobs tab if you send the output to the myNAS web server. See Viewing Job Output Files to learn more.

Note: Job information is updated approximately every two minutes, so there may be a short delay before myNAS can access information about a job event.

Display Status of HECC Resources

Tap Resources to see current system status. The default view shows Pleiades usage and node utilization. Tap the menu icon in the top left corner to see status for Electra, Endeavour, storage filesystems, home filesystems, and the number of jobs running on each system.

Get the Latest News

The News tab provides a list of HECC news articles, with the most recent at the top. Tap an item in the list to view the full text.

Customize myNAS Views

Use the Settings tab to configure the way your jobs are displayed in myNAS. Customizable settings include:

• Show jobs in states... Select the job states (finished, running, waiting, and/or held) that you want to see in the main list on the Jobs tab. The default selections are Finished, Running, and Waiting.

• Fields to show in Job List Select the PBS fields you want to see for every job shown in the main list on the Jobs tab. The default fields are job id, host machine, job name, and time elapsed.

• Fields to show in Job Details Select the PBS fields you want to see listed in each job's individual details page. Most of the available fields are selected by default.

• Number of jobs to show Select the number of jobs you want to see in the main list on the Jobs tab. The default is 20 (iOS) or 5 (Android).

Note: To change the display order of job states or PBS fields on your iOS device, hold and drag the handle icon next to the option you want to move. (This function is not yet available on Android devices.)

Receiving Notifications

myNAS can notify you when a job starts or finishes running, or when output files are available for viewing in the app. Notifications are enabled by default. To manage notifications, exit the myNAS app, open the system settings on your device, and navigate to the myNAS entry to view or change options.

Viewing Job Output Files in myNAS

In order to view output files in the myNAS app, you must first instruct your PBS job to send the files to the myNAS web server, as described in the next section. You will then be notified when the output is available for viewing on your device.

Sending Output Files to the myNAS Web Server

To send an output file to the myNAS web server, run the myNASsend command using the following syntax:

pfe21% myNASsend -j job_id -d "output_name" filename.txt

For PBS scripts, -j job_id is not required:

pfe21% myNASsend -d "output_name" filename.txt

Note: When you specify an image file, you must use the correct extension (for example, .jpg, .gif, or .png).

For more information, see man myNASsend.
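For instance, a job script might call myNASsend after the main computation so that a results file becomes viewable in the app. This is only a sketch; the application name, output file name, and description are placeholders:

#PBS -S /bin/csh
#PBS -l select=1:ncpus=28:model=bro
#PBS -l walltime=2:00:00

cd $PBS_O_WORKDIR
./my_app > run_summary.txt

# inside a PBS script, the -j job_id option is not required
myNASsend -d "run summary" run_summary.txt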

WARNING: Although your data is password protected on the myNAS web server, it is not as secure as data stored inside the NAS enclave. If you do not want to risk unauthorized access to your output files, do not instruct your PBS jobs to send output to the myNAS web server.

Viewing Your Output Files

After you send your output files to the myNAS web server, they will be available for viewing on the Jobs tab of the app. In the main list, tap a job that has output available, then tap the output filename to open it for viewing. You can email or copy the output by tapping the share icon (iOS) or the Share button (Android).

Note: On iOS devices, jobs that have output files available for viewing are identified in the main job list with an icon that indicates the type of file or files. (This function is not yet available on Android devices.) For example:

File type                      Status
Text file                      Unread
Multiple text files            Some read
Image file                     Read
Multiple text and image files  Read

Cellular Data Usage

To avoid exceeding data usage limits when you download and view output files, cellular data usage for the myNAS app is disabled by default. (This is not the same as the general cellular data option in the system settings on your device, which is enabled by default.) With cellular data usage disabled for myNAS, you can still monitor your jobs over a cellular network, but you will need WiFi access to download and view output files. To change this setting:

• iOS: In the myNAS app, tap the Settings tab to access the Use cellular data option.
• Android: In the system settings on your device, tap Data Usage, then navigate to the myNAS entry to access the Restrict app background data option.

Sending and Viewing Periodic Output Files

You can instruct a job to periodically send output to the myNAS web server by running the myNASsendTail command in the background of your PBS script, with mpiexec running in the foreground. For example:

pfe% myNASsendTail -d "output_name" -r 300 file1 file2 ... &
mpiexec ...

In this example, a tail of the output files is run every 300 seconds and results are sent to the myNAS web server for browsing on your device. New files replace previous ones if they share the same description. Files with unique descriptions are retained.

For more information, see man myNASsendTail and the annotated sample PBS scripts, tailTest.sh and tailTest.csh, which are attached at the end of this article.

Note: Running the myNASsend or myNASsendTail commands less than two minutes after submitting the job may result in an error. In this case, wait two minutes and try running the command again.

If you have any questions about using the myNAS app, please contact the NAS Control Room at (800) 331-8737, (650) 604-4444, or [email protected].

Authenticating to NASA's Access Launchpad

In order to perform some basic tasks that use NAS web applications, such as changing your NAS password or logging into the myNAS portal, you must authenticate to NASA's Access Launchpad using your NASA Smartcard or RSA SecurID token.

Using Your NASA Smartcard

This is Launchpad's default option. Make sure Smartcard (preferred) is selected, insert your NASA smartcard into your workstation's card reader, and click Smartcard Log In. Enter your PIN when prompted.

Using Your RSA SecurID Token

Select the RSA Token option, provide your Agency User ID, and enter your passcode.

If you have an RSA SecurID fob, your passcode is your PIN + the token code displayed on the fob. If you have a soft token, enter your PIN in the RSA SecurID app to get the token code, then enter only the token code in the PIN + Tokencode field.

Once you are authenticated to Launchpad, you will be connected to the NAS password change form, the myNAS web portal, or another NAS web application to complete your task.

Using the myNAS Web Portal

The myNAS web portal provides a central location where you can easily monitor all aspects of your NAS user account activity, including PBS jobs, resource allocations, and usage for the project group IDs (GIDs) you belong to. If you use the Shift tool to transfer your data, you can also check the status of your file transfers.

If you are a principal investigator (PI), you will see information about jobs and usage by all users who are assigned to your GIDs.

Follow this link to access the myNAS web portal: https://portal.nas.nasa.gov

Note: You will log into myNAS via NASA's Access Launchpad, using your NASA Smartcard or RSA SecurID token.

Home Page: Information at a Glance

Once you log in, myNAS will automatically load information for your NAS user account. On the home page, you will see:

• The number of jobs currently in each state (running, waiting, recently finished, and held)
• All of your jobs that recently changed state
• The status of your Shift file transfers
• An estimate of today's usage for each of your GIDs
• The latest HECC news and announcements
• Scheduled downtimes for all HECC resources

For more detailed information about all of your jobs, click Jobs in the navigation panel on the left side of the page. To check allocation and usage, click Accounts.

TIP: Take a quick feature tour of each myNAS page by clicking the question mark icon in the top right corner.

Jobs Page: Detailed Job Information

The Jobs page provides details about each of your running, waiting, recently finished, and held jobs, including:

• GID, job ID, and job name
• Requested queue, machine, processor model, and number of cores
• Date/time submitted, date/time started, and requested and elapsed time
• Number of standard billing units (SBUs) the job is expected to use
• Information about interrupted jobs, if applicable

You can easily sort the tables by column and filter by any text string such as partial job name, model type, machine, and so on. For example:

[Screenshot: myNAS Jobs page showing a list of running jobs]

Accounts Page: Detailed Allocation and Usage Information

The Accounts page provides information about all your GIDs, such as PI, project name, mission, allocation, expected usage to date, and actual usage.

Click the arrow in the left column of any GID listed in the table to see interactive charts that provide allocation and usage information at a glance. The thermometers show overall allocation usage for the GID, while the first donut chart shows either your own usage (if you are an individual user) or usage for all GID members (if you are a PI). The second donut chart shows usage for the GID by resource (machine). For example:

[Screenshot: myNAS allocation and usage charts for a GID]

Continuous Monitoring

We recommend bookmarking the myNAS website in your browser, and taking some time to explore all of the useful information that is available at a glance via interactive tables and charts. Once you are logged in, you can keep the browser window open to easily monitor your job and account activity.

PBS Reference

PBS Environment Variables

Several environment variables are provided to PBS jobs. Some are taken from the user's environment and carried with a job, and others are created by PBS. There are also some environment variables that you can explicitly create for exclusive use by PBS jobs.

All PBS-provided environment variable names start with the characters "PBS_". Some start with "PBS_O_", which indicates that the variable is taken from the job's originating environment (that is, the user's environment).

A few useful PBS environment variables are described in the following list:

PBS_O_WORKDIR
Contains the name of the directory from which the user submitted the PBS job

PBS_O_PATH
Value of PATH from submission environment

PBS_JOBID
Contains the PBS job identifier

PBS_JOBDIR
Pathname of job-specific staging and execution directory

PBS_NODEFILE
Contains a list of vnodes assigned to the job

TMPDIR
The job-specific temporary directory for this job. Defaults to /tmp/pbs.job_id on the vnodes.
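As an illustration, the short csh job script below simply prints a few of these variables; the resource request is an arbitrary example:

#PBS -S /bin/csh
#PBS -l select=1:ncpus=16:model=san
#PBS -l walltime=0:10:00

cd $PBS_O_WORKDIR
echo "Job ID:         $PBS_JOBID"
echo "Submission dir: $PBS_O_WORKDIR"
echo "Scratch dir:    $TMPDIR"
echo "Assigned vnodes:"
cat $PBS_NODEFILE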

Default Variables Set by PBS

You can use the env command, either in a PBS script or on the command line of a PBS interactive session, to find out what environment variables are set within a PBS job. In addition to the PBS_* environment variables, the following variables are useful to know:

NCPUS
Defaults to the number of CPUs that you requested for the node.

OMP_NUM_THREADS
Defaults to 1 unless you explicitly set it to a different number. If your PBS job runs an OpenMP or MPI/OpenMP application, this variable sets the number of threads in the parallel region.

FORT_BUFFERED
Defaults to 1. Setting this variable to 1 enables records to be accumulated in the buffer and flushed to disk later.
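For example, in a hybrid MPI/OpenMP job you would typically set OMP_NUM_THREADS explicitly before launching the application. The process and thread counts below are arbitrary placeholders, and hybrid_app is a hypothetical executable:

module load mpi-hpe/mpt
setenv OMP_NUM_THREADS 4
mpiexec -np 10 ./hybrid_app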

Optimizing/Troubleshooting

Common Reasons Why Jobs Won't Start

KNOWN ISSUE: Due to an incompatibility between the PBS server pbspl4 and the rest of the HECC environment, you cannot currently run PBS commands on the Endeavour or GPU nodes from a front end, such as a PFE (for example, pfe% qstat -Q @pbspl4). Until this issue is resolved, the workaround is to log directly into pbspl4 and run the PBS commands from there.

If your job does not run after it has been successfully submitted, it might be due to one of the following reasons:

• The queue has reached its maximum run limit
• Your job is waiting for resources
• Your mission share has run out
• The system is going into dedicated time
• Scheduling is turned off
• Your job has been placed on hold
• Your home filesystem or default /nobackup filesystem is down
• Your hard quota limit for disk space has been exceeded

Each scenario is described in the sections below, along with troubleshooting steps. The following commands provide useful information about the job that can help you resolve the issue:

tracejob job_id
Displays log messages for a PBS job on the PBS server. For Pleiades, Aitken, and Electra jobs, log into pbspl1 before running the command. For Endeavour jobs, log into pbspl4 before running the command.

qstat -as job_id
Displays all information about a job on any Pleiades front end (PFE) node. For Endeavour jobs, the job_id must include both the job sequence number and the server name, pbspl4. For example:

qstat -as 2468.pbspl4

The Queue Has Reached Its Maximum Run Limit

Some queues have a maximum run (max_run) limit: a specified maximum number of jobs that each user can run. If the number of jobs you are running has reached that limit, any jobs waiting in the queue will not begin until one of the running jobs is completed or terminated.

Currently, the debug and ldan queues each have a max_run limit of 2 per user.

Your Job is Waiting for Resources

Your job might be waiting for resources for one of the following reasons:

• All resources are busy running jobs, and no other jobs can be started until resources become available again
• PBS is draining the system and not running any new jobs in order to accommodate a high-priority job
• Users have submitted too many jobs at once (for example, more than 100), so the PBS scheduler is busy sorting jobs and cannot start new jobs effectively
• You requested a specific node or group of nodes to run your job, which might cause the job to wait in the queue longer than if nodes were not specified

To view job status and events, run the tracejob utility on the appropriate PBS server (pbspl1 for Pleiades, Aitken, and Electra jobs, or pbspl4 for Endeavour jobs). For example:

pfe26% ssh pbspl1
pbspl1% tracejob 234567

Job: 234567.pbspl1.nas.nasa.gov

02/15/2019 00:23:55  S  Job Modified at request of [email protected]
02/15/2019 00:23:55  L  No available resources on nodes
02/15/2019 00:38:47  S  Job Modified at request of [email protected]
02/15/2019 00:50:30  S  Job Modified at request of [email protected]
02/15/2019 01:51:21  S  Job Modified at request of [email protected]
02/15/2019 01:55:38  S  Job Modified at request of [email protected]
02/15/2019 02:16:44  S  Job Modified at request of [email protected]
02/15/2019 07:39:48  L  Considering job to run
02/15/2019 07:39:48  L  Job is requesting an exclusive node and node is in use

TIP: If the scheduler has not yet reviewed the job, no information will be available and the tracejob utility will not provide any output.

If your job requests an exclusive node and that node is in use, you can wait for the requested node, request a different node, or submit the job again without requesting a specific node.

To view the current node usage for each processor type, run the node_stats.sh script. For example:

pfe26% node_stats.sh

Node summary according to PBS:
 Nodes Available on Pleiades : 10444  cores: 225180
 Nodes Available on Aitken   :  2146  cores: 170296
 Nodes Available on Electra  :  3338  cores: 120344
 Nodes Down Across NAS       :  1311

Nodes used/free by hardware type:
 Intel SandyBridge   cores/node:(16)   Total: 1683, Used: 1527, Free: 156
 Intel IvyBridge                 (20)  Total: 4809, Used: 4578, Free: 231
 Intel Haswell                   (24)  Total: 1969, Used: 1891, Free:  78
 Intel Broadwell                 (28)  Total: 1924, Used: 1820, Free: 104
 Intel Broadwell (Electra)       (28)  Total: 1098, Used: 1037, Free:  61
 Intel Skylake (Electra)         (40)  Total: 2240, Used: 2200, Free:  40
 Intel Cascadelake (Aitken)      (40)  Total: 1099, Used: 1022, Free:  77
 AMD ROME EPYC 7742 (Aitken)    (128)  Total:  987, Used:  923, Free:  64

Nodes currently allocated to the gpu queue:
 Sandybridge (Nvidia K80)   Total: 59, Used:  1, Free: 58
 Skylake (Nvidia V100)      Total: 19, Used:  5, Free: 14
 Cascadelake (Nvidia V100)  Total: 36, Used: 30, Free:  6

Nodes currently allocated to the devel queue:
 SandyBridge           Total: 381, Used: 301, Free:  80
 IvyBridge             Total: 318, Used: 201, Free: 117
 Haswell               Total: 142, Used:  66, Free:  76
 Broadwell             Total: 146, Used:  79, Free:  67
 Electra (Broadwell)   Total:   0, Used:   0, Free:   0
 Electra (Skylake)     Total:   0, Used:   0, Free:   0
 Aitken (Cascadelake)  Total:   0, Used:   0, Free:   0
 Aitken (Rome)         Total:   0, Used:   0, Free:   0
 Skylake gpus          Total:   1, Used:   0, Free:   1
 Cascadelake gpus      Total:   0, Used:   0, Free:   0

Jobs on Pleiades are:
 requesting: 42 SandyBridge, 5799 IvyBridge, 4938 Haswell, 1521 Broadwell, 158 Electra (B), 1600 Electra (S), 782 Aitken (C), 460 Aitken (R) nodes
 using:      1527 SandyBridge, 4578 IvyBridge, 1891 Haswell, 1820 Broadwell, 1037 Electra (B), 2200 Electra (S), 1022 Aitken (C), 923 Aitken (R) nodes

TIP: Add /u/scicon/tools/bin to your path in .cshrc or other shell start-up scripts to avoid having to enter the complete path for this tool on the command line.
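For example, a csh user might append the directory to the path in ~/.cshrc (bash users would modify PATH in the corresponding start-up file instead):

# in ~/.cshrc
set path = ($path /u/scicon/tools/bin)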

Your Mission Share Has Run Out

If all of the cores within your mission directorate's share have been assigned or if running the new job would exceed your mission share, your job will not run. If resources appear to be available, they belong to other missions.

To view all information about your job, run qstat -as job_id. In the following sample output, the comment on the last line indicates that the job would exceed the mission share:

pfe21% qstat -as 778574
JobID          User    Queue   Jobname  CPUs  wallt  S  wallt  Eff  wallt
-------------- ------- ------- -------- ----- ------ -- ------ ---- ------
778574.pbspl1  zsmith  normal  my_GC       12  04:00  Q  07:06   --  04:00
   Job would exceed mission CPU share

To view the distribution of shares among all mission directorates, run qstat -W shares=-. For example:

pfe21% qstat -W shares=-
Group    Share%  Use%   Share  Exempt     Use   Avail  Borrowed  Ratio  Waiting
-------  ------  ----  ------  ------  ------  ------  --------  -----  -------
Overall     100     1  502872       0    9620  493252         0   0.02      400
ARMD         26    27  128245       0  135876       0      7631   1.06   579173
HEOMD        22    22  108514    1832  111840       0      3326   1.03  1783072
SMD          50    40  246626      32  201220   45406         0   0.82    94316
NAS           2     0    9864     180    2416    7448         0   0.24      128

If running your job would exceed your mission share, you might be able to borrow nodes from other mission directorates. To borrow nodes, your job must not request more than 4 hours of wall-clock time. See Mission Shares Policy on Pleiades for more information.

The System is Going into Dedicated Time

When dedicated time is scheduled for hardware and/or software work, the PBS scheduler will not start a job if its projected end-time runs past the beginning of the dedicated time.

If you can reduce the requested wall-clock time so that your job will finish running prior to dedicated time, your job can then be considered for running. To change the wall-clock time request for your job, follow the example below:

pfe21% qalter -l walltime=hh:mm:ss job_id

TIP: For Endeavour jobs, the job_id must include both the job sequence number and the Endeavour server name (for example, 2468.pbspl4).

To find out whether the system is in dedicated time, run the schedule all command. For example:

pfe21% schedule all
No scheduled downtime for the specified period.

Scheduling is Turned Off

Sometimes job scheduling is turned off by a system administrator. This is usually done when system or PBS issues need to be resolved before jobs can be scheduled to run. When this happens, you should see the following message near the beginning of the qstat -au your_userid output:

+++Scheduling turned off.

Your Job Has Been Placed On Hold ("H" Mode)

A job can be placed on hold either by the job owner or by someone who has root privilege, such as a system administrator. If your job has been placed on hold by a system administrator, you should get an email explaining the reason for the hold.

Your Home Filesystem or Default /nobackup Filesystem is Down

When a PBS job starts, the PBS prologue checks to determine whether your home filesystem and default /nobackup filesystem are available before executing the commands in your script. If your default /nobackup filesystem is down, PBS cannot run your job and will put the job back in the queue. If your PBS job does not need any file in that filesystem, you can tell PBS that your job will not use the default /nobackup to allow your job to run.

For example, if your default is /nobackupp1, you can add the following in your PBS script:

#PBS -l nobackupp1=0

Your Hard Quota Limit for Disk Space Has Been Exceeded

NAS quotas have hard limits and soft limits. Hard limits should never be exceeded. Soft limits can be exceeded temporarily, for a grace period of 14 days. If your data remains over the soft limit for more than 14 days, the soft limit is enforced as a hard limit.

See Quota Policy on Disk Space and Files for more information.

Using pdsh_gdb for Debugging PBS Jobs

You can use a script called pdsh_gdb to debug PBS jobs while the job is running. The script is available in the /u/scicon/tools/bin directory on Pleiades.

Launching this script from a Pleiades front-end node allows you to connect to each compute node of a PBS job and create a stack trace of each process. The aggregated stack trace from each process will be written to a user-specified directory (by default, it is written to ~/tmp).

Here is an example of how to use the pdsh_gdb script.

pfe21% mkdir tmp
pfe21% /u/scicon/tools/bin/pdsh_gdb -j job_id -d tmp -s -u username -n group_id

Note: The -n group_id is needed if you have specified a non-default project group ID (GID) via #PBS -W group_list=group_id for the PBS job that you want to debug with pdsh_gdb.

For Endeavour jobs, be sure to include both the job sequence number and the Endeavour server name, pbspl4, in the job_id (for example, 2468.pbspl4).

To view more detailed usage information, launch pdsh_gdb without any options:

pfe21% /u/scicon/tools/bin/pdsh_gdb

The pdsh_gdb script was developed by NAS staff member Steve Heistand.

Lustre Filesystem Statistics in PBS Output File

For a PBS job that reads or writes to a Lustre file system, a Lustre filesystem statistics block will appear in the PBS output file, just above the job's PBS Summary block. Information provided in the statistics can help you determine the I/O pattern of the job and identify possible improvements to your jobs.

The statistics block lists the job's number of Lustre operations and the volume of Lustre I/O used for each file system. The I/O volume is listed in total, and is broken out by I/O operation size.

The following Metadata Operations statistics are listed:

• Open/close of files on the Lustre file system
• Stat/statfs are query operations invoked by commands such as ls -l
• Read/write is the total volume of I/O in gigabytes

The following is an example of this listing:

=====================================================================
LUSTRE Filesystem Statistics
---------------------------------------------------------------------
nbp10  Metadata Operations
          open   close    stat  statfs  read(GB)  write(GB)
          1057    1058    1394       0         2         14
       Read
           4KB    8KB   16KB   32KB   64KB  128KB  256KB  512KB  1024KB
             9      3      1      0      1      0      3      2     319
       Write
           4KB    8KB   16KB   32KB   64KB  128KB  256KB  512KB  1024KB
           138     13      1     11     36      9     21     37   12479
_____________________________________________________________________
Job Resource Usage Summary for 11111.pbspl1.nas.nasa.gov

    CPU Time Used     : 00:03:56
    Real Memory Used  : 2464kb
    Walltime Used     : 00:04:26
    Exit Status       : 0

The read and write operations are further broken down into buckets based on I/O block size. In the example above, the first bucket reveals that nine data reads occurred in blocks between 0 and 4 KB in size, three data reads occurred with block sizes between 4 KB and 8 KB, and so on. The I/O block size data may be affected by library and system operations and, therefore, could differ from expected values. That is, small reads or writes by the program might be aggregated into larger operations, and large reads or writes might be broken into smaller pieces. If there are high counts in the smaller buckets, you should investigate the I/O pattern of the program for efficiency improvements.

For multiple tips to improve the Lustre I/O performance of your jobs, see Lustre Best Practices.

Adjusting Resource Limits for Login Sessions and PBS Jobs

There are several settings on the front-end systems (PFEs) and compute nodes that define hard and soft limits to control the amount of resources allowed per user, or per user process. You can change the default soft limits for your sessions, up to the maximum hard limits, as needed.

Keep in mind that there might be different default limits set for an SSH login session and a PBS session, and among the various front-end systems and compute nodes.

Viewing Soft and Hard Limits

To view the soft limit settings in a session, run:

• For csh: % limit
• For bash: $ ulimit -a

The output samples below show soft limit settings on a PFE.

For csh:

% limit
cputime      unlimited
filesize     unlimited
datasize     32700000 kbytes
stacksize    unlimited
coredumpsize unlimited
memoryuse    32700000 kbytes
vmemoryuse   unlimited
descriptors  1024
memorylocked unlimited
maxproc      400
maxlocks     unlimited
maxsignal    255328
maxmessage   819200
maxnice      0
maxrtprio    0
maxrttime    unlimited

For bash:

$ ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) 32700000
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 255328
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) 32700000
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) unlimited
cpu time               (seconds, -t) unlimited
max user processes              (-u) 400
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

To view the hard limits, run:

• For csh: % limit -h
• For bash: $ ulimit -Ha

Modifying Soft Limits

To modify the soft limit setting of a resource, type the name of the resource after limit (for csh) or type the option after ulimit (for bash). For example, to reset the stacksize limit to 200,000 KB, run:

• For csh: % limit stacksize 200000
• For bash: $ ulimit -Ss 200000

Common Issues with the Default Limits

The following list describes a few common issues with default limits.

• stacksize (stack size)

On the PFEs, the stacksize soft limit is set to unlimited. Some plotting packages, such as Tecplot, require a small stack size to function properly, so you will have to decrease the stacksize in order to use them. However, if the stacksize is set too small when applications are run, segmentation faults (segfaults) commonly occur. If you run plotting packages and your own applications in the same session, you will need to alternate between a small and a large stack size accordingly.

• maxproc (maximum user processes)

The maxproc limits are set to 400 (soft) and 600 (hard) on the PFEs. On the compute nodes, the limits are set to more than 100,000 (the size varies by node type).

If you encounter one of the following error messages on the PFEs, this indicates that you have reached the maxproc soft limit:

can't fork process
or
fork: Resource temporarily unavailable

This may happen during compilation, especially if you run a parallel make operation using make -j.

Note: Each Tecplot session can consume more than 100 user processes. If you run multiple instances of Tecplot on the same PFE, you might reach the maxproc soft limit.

• memoryuse and vmemoryuse (maximum memory size and virtual memory)

The soft limits for memoryuse and vmemoryuse are set to unlimited on most compute nodes. The exceptions are the PFEs, where memoryuse is set to 32 GB.

The vmemoryuse setting affects the outcome when you try to allocate memory; for example, when you use the allocate command as follows:

allocate (a(i),stat=ierror)

The memoryuse setting affects the referencing of array a. Keep in mind that running your applications on nodes with different settings will result in different behaviors.

• coredumpsize (core file size)

Core files are useful for debugging your application. However, large core files may consume too much of your disk quota. In addition, the generation of large core files in a PBS job could occupy the buffer cache of the nodes, waiting to be flushed to disk. If the buffer cache on the nodes is not flushed by the PBS epilogue of the current job and the PBS prologue of the next job, the nodes will be unusable.

PBS exit codes

The PBS exit value of a job may fall in one of four ranges:

• X = 0 (= JOB_EXEC_OK)

This is a PBS special return value indicating that the job executed successfully.

• X < 0

This is a PBS special return value indicating that the job could not be executed. These negative values are:

♦ -1 = JOB_EXEC_FAIL1 : Job exec failed, before files, no retry
♦ -2 = JOB_EXEC_FAIL2 : Job exec failed, after files, no retry
♦ -3 = JOB_EXEC_RETRY : Job exec failed, do retry
♦ -4 = JOB_EXEC_INITABT : Job aborted on MOM initialization
♦ -5 = JOB_EXEC_INITRST : Job aborted on MOM initialization, checkpoint, no migrate
♦ -6 = JOB_EXEC_INITRMG : Job aborted on MOM initialization, checkpoint, ok migrate
♦ -7 = JOB_EXEC_BADRESRT : Job restart failed
♦ -8 = JOB_EXEC_GLOBUS_INIT_RETRY : Initialization of Globus job failed. Do retry.
♦ -9 = JOB_EXEC_GLOBUS_INIT_FAIL : Initialization of Globus job failed. Do not retry.
♦ -10 = JOB_EXEC_FAILUID : Invalid UID/GID for job
♦ -11 = JOB_EXEC_RERUN : Job was rerun
♦ -12 = JOB_EXEC_CHKP : Job was checkpointed and killed
♦ -13 = JOB_EXEC_FAIL_PASSWORD : Job failed due to a bad password
♦ -14 = JOB_EXEC_RERUN_ON_SIS_FAIL : Job was requeued (if rerunnable) or deleted (if not) due to a communication failure between Mother Superior and a Sister

• 0 <= X < 128 (or 256 depending on the system)

This is the exit value of the top process in the job, typically the shell. This may be the exit value of the last command executed in the shell or the .logout script if the user has such a script (csh).

• X >= 128 (or 256 depending on the system)

This means the job was killed with a signal. The signal is given by X modulo 128 (or 256). For example, an exit value of 137 means the job's top process was killed with signal 9 (137 % 128 = 9).
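The arithmetic can be checked directly in a csh shell; the exit value 137 below is just the example from the preceding paragraph:

pfe21% @ signum = 137 % 128
pfe21% echo $signum
9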

How to Run a Graphics Application in a PBS Batch Job

In order to run a graphics application in a PBS batch job, you must provide a value for the DISPLAY environment variable—even if you do not have or need an actual display for the job. If the batch job cannot find this value, it will quit and return an error such as "Can't open display" or "Display not set."

To provide a value for the DISPLAY variable, you can set up a display connection using either X11 forwarding or a VNC session, or you can run the job on the Pleiades GPU nodes, which include graphics cards. Each of these methods is described below.

Method 1: Use X11 Forwarding to Set Up a Display

Complete these steps to use your local system as a remote X11 server, which acts as the display:

1. Set up X11 Forwarding (if it's not set up already).
2. Log in to a Pleiades front end (PFE) and submit a PBS batch job with the -V option on the qsub command line or inside the PBS script. For example:

your_local_system% ssh pfe23

pfe23% echo $DISPLAY
pfe23:30.0

pfe% qsub -V job_script

In the above example, using the -V option will export the DISPLAY environment variable (with its value set to pfe23:30.0) from your PFE login session to the PBS batch job.

Drawbacks to using this method:

• Because the connection from the PBS job to your local system must be maintained, you cannot exit from either your local system or from the PFE login window until the job completes.
• Application performance on remote X11 servers deteriorates with latency, so unless your local system is physically close to Pleiades, job performance may not be optimal.

Method 2: Use a VNC Session to Set Up a Display

Complete these steps on a PFE to set up a VNC viewer, which acts as the display:

1. Complete steps 1-4 of the instructions in VNC: A Faster Alternative to X11 to establish a VNC session.

Once the VNC viewer is set up and running, you will see the following messages (or something similar) in the PFE login window:

pfe23% vncserver

New 'X' desktop is pfe23:9

Starting applications specified in /u/username/.vnc/xstartup
Log file is /u/username/.vnc/pfe23:9.log

This information indicates that the DISPLAY value pfe23:9.0 is set, providing a valid display that your PBS batch jobs can access.

2. In your PBS batch job script, add the following line to instruct the job to use this display:

♦ For csh: setenv DISPLAY pfe23:9.0
♦ For bash: export DISPLAY=pfe23:9.0

3. Submit your PBS batch job from the PFE connection window where the VNC session is running.

Benefits of using this method:

• Job performance is not impacted.
• The VNC viewer will be maintained as long as the PFE you started it on (pfe23 in this example) is online. The session will stay open even if you exit from your local system or from the PFE connection window where the viewer is running. To reconnect to the session, simply log in to the PFE and repeat steps 3 and 4 in the VNC instructions with the same port number (5909 in this example).

Note: The number of VNC ports on each PFE is limited. You must stop the vncserver script after your PBS job completes, or the port will remain open and owned by you, which prevents other users from using it. Remember to complete step 5 in the VNC instructions to properly shut down your VNC session.

Method 3: Run Your Job on the Pleiades GPU Nodes

When you run a job on a GPU node, a display is enabled automatically. On Pleiades, there are 64 Sandy Bridge GPU nodes.

Complete these steps to run your job on a GPU node:

1. Modify your PBS script to run startx and set DISPLAY to :0.0 by adding these lines:

startx &
setenv DISPLAY :0.0      (for csh)
  or
export DISPLAY=:0.0      (for bash)
sleep 15  # wait 15 seconds to allow the startx to fully start up before using the display

2. Submit a batch job to request a GPU node using the correct queue:

To request a GPU node:

pfe23% qsub -q k40 -lselect=1:model=san_gpu pbs_script

Drawbacks to using this method:

• There are fewer GPU nodes available than non-GPU nodes.
• You cannot use this method if your graphics application needs more than 64 GB of memory.

MPT Startup Failures: Workarounds

Occasionally, PBS jobs may fail due to mpiexec timeout problems at startup. If this happens, you will typically see several error messages similar to the following:

• MPT: xmpi_net_accept_timeo/accept() timeout
• MPT ERROR: could not run executable.

When mpiexec starts, several things must happen in a timely manner before the job runs; for example, every node must access the executable and load it into memory. In most cases, this process is successful and the job starts normally. However, if something slows down this process, mpiexec may timeout before the job can start.

This can happen when the job is running an executable located on a filesystem that is not mounted during PBS prologue activities (for example, on a colleague's filesystem); mpiexec may timeout before the filesystem is mounted on each node. (Jobs running on a large number of nodes may be more likely to experience this problem than smaller jobs.)

Workaround Steps

If you encounter an mpiexec startup failure, complete all of these steps to try to resolve the problem.

1. Ensure all the filesystems are mounted before mpiexec runs. For example, if your home filesystem is /home3, your nobackup filesystem is /nobackupp2, and the executable is under /nobackupnfs2, you would add the following line to your job script:

#PBS -l site=needed=/home3+/nobackupp2+/nobackupnfs2

2. Even if all the filesystems are mounted, the filesystem servers might be slow to respond to requests to load the executable. To address this, increase the mpiexec timeout period from 20 seconds (the default) to 40 seconds by setting the value of the MPI_LAUNCH_TIMEOUT environment variable to 40:

For csh/tcsh:     setenv MPI_LAUNCH_TIMEOUT 40
For bash/sh/ksh:  export MPI_LAUNCH_TIMEOUT=40

3. Finally, it might take several attempts to ensure that the executable is loaded into memory. You can accomplish this by adding the several_tries script to the beginning of your mpiexec command line:

/u/scicon/tools/bin/several_tries mpiexec -np 2000 /your_executable

The script will then attempt the mpiexec command several times until the command succeeds (or runs longer than the maximum time set in the environment variables), or until a threshold number of attempts is exceeded (see several_tries Settings, below).

Alternative to Step 3: If it is difficult to add the several_tries tool to your job script (for example, if you have a complicated set of nested scripts and it's not clear where to insert the tool), try using the pdsh command instead, as follows:

pdsh -F $PBS_NODEFILE -a "md5sum /your_executable" | dshbak -c

This does a parallel SSH into each node and loads the executable into each node's memory so that it's available when mpiexec tries to launch it.

If the executable is already in your PATH, you do not need to include the entire path in the command line.

Sample PBS Script with Workaround

The following sample script incorporates all three steps described in the previous section.

#PBS -l select=200:ncpus=40:model=sky_ele
#PBS -l walltime=HH:MM:SS
#PBS -l site=needed=/home3+/nobackupp2+/nobackupp8+/nobackupnfs2
#PBS -j oe

cd $PBS_O_WORKDIR
setenv MPI_LAUNCH_TIMEOUT 40
set path = ($path /u/scicon/tools/bin)
several_tries mpiexec -np 8000 ./my_a.out

several_tries Settings

The following environment variables are associated with the several_tries script:

SEVERAL_TRIES_MAXTIME
Maximum time the command can run each time, and still be retried. Default: 60 seconds

SEVERAL_TRIES_NTRIES
Threshold number of attempts. Default: 6

SEVERAL_TRIES_SLEEPTIME
Sleep time between attempts. Default: 30 seconds
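For example, to allow more attempts and a longer window per attempt, you could set these variables in the job script before the several_tries line; the values shown are arbitrary:

setenv SEVERAL_TRIES_MAXTIME 120
setenv SEVERAL_TRIES_NTRIES 10
setenv SEVERAL_TRIES_SLEEPTIME 60
several_tries mpiexec -np 8000 ./my_a.out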

Checking if a PBS Job was Killed by the OOM Killer

If a PBS job runs out of memory and is killed by the kernel's out-of-memory (OOM) killer, the event is usually recorded in system log files. If the log files confirm that your job was killed by the OOM killer, you can get your job running by increasing your memory request.

Complete the following steps to check whether your job has been killed by the OOM killer:

1. Run the tracejob command to find out when your job ran, what rack numbers were used, and if the job exited with the Exit_status=137 message. For example, to trace job 140001 that ran within the past three days, run:

pfe21% ssh pbspl1
pbspl1% tracejob -n 3 140001

where 3 indicates that you want to trace a job that ran within the past 3 days and 140001 indicates the job_id. (pbspl1 is the PBS server for Pleiades, Aitken, and Electra; for Endeavour, use pbspl4.)

2. In the tracejob output, find the job's rack numbers (such as r2, r3, ...), then run grep to find messages that were recorded in the messages file, which is stored in the leader nodes of those racks. For example, to view messages for rack r2:

pfe21% grep abc.exe /net/r2lead/var/log/messages
Apr 21 00:32:50 r2i2n7 kernel: abc.exe invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=-17

Note: Often, an out-of-memory message will not be recorded in the messages file, but will instead be recorded in a consoles file named after the individual node. In that case, you'll need to grep the messages in the consoles file. For example, to look for abc.exe invoking the OOM killer on node r2i2n7:

pfe21% grep abc.exe /net/r2lead/var/log/consoles/r2i2n7
abc.exe invoked oom-killer: gfp_mask=0x201d2, order=0, oomkilladj=0

3. Use an editor to view the files and look for the hourly time markers bracketing the period of time when the job ran out of memory. An hourly time marker looks like this:

[-- MARK -- Thu Apr 21 00:00:00 2011]

Note: Sometimes a system process (such as pbs_mom or ntpd) is listed as invoking the OOM killer; this is also direct evidence that the node ran out of memory.

TIP: We provide a script called pbs_oom_check, which does these steps automatically and also parses /var/log/messages on all the nodes associated with the job ID to search for an instance of the OOM killer. The script is available in /u/scicon/tools/bin and works best when run on the host pbspl1.

If your job needs more memory, see How to Get More Memory for Your PBS Job for possible approaches. If you want to monitor the memory use of your job while it is running, you can use the tools listed in Memory Usage Overview.

Common Reasons for Being Unable to Submit Jobs

KNOWN ISSUE: Due to an incompatibility between the PBS server pbspl4 and the rest of the HECC environment, you cannot currently run PBS commands on the Endeavour or GPU nodes from a front end, such as a PFE (for example, pfe% qstat -Q @pbspl4). Until this issue is resolved, the workaround is to log directly into pbspl4 and run the PBS commands from there.

Listed below are several common reasons why a job submission might not be successful.

AUID or GID not Authorized to Use a Specific Queue

If you get the following message after submitting a PBS job: qsub: Unauthorized Request

It is possible that you tried submitting the job to a queue that is accessible only to certain groups or users. To find out, check the output of qstat -fQ @server_name and look for a users list called acl_groups or acl_users. If your group or username is not in the lists, you must submit your job to a different queue.

Note: To run jobs on Pleiades, Aitken, or Electra, use the pbspl1 server; on Endeavour, use the pbspl4 server.
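For example, one way to list any access controls on a queue is a command along these lines; the queue name normal is only an illustration, and the grep pattern assumes the attributes are named acl_users and acl_groups as described above:

pfe21% qstat -fQ normal@pbspl1 | grep -i acl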

You will also get this error if your project group ID (GID) has no allocation left. See Not Enough or No Allocation Left below for more information on troubleshooting this issue.

AUID Not Authorized to Use a Specific GID

If you get the following message after submitting a PBS job: qsub: Bad GID for job execution

It is possible that your Agency User ID (AUID) has not been added to use allocations under a specific GID. Please ask the principal investigator (PI) of that GID to submit a request to [email protected] to add your AUID under that GID.

Queue is Disabled

If you get the following message after submitting a PBS job, submit the job to a different queue that is enabled: qsub: Queue is not enabled

Queue Has a max_queued Limit

If you get the following message after submitting a PBS job: qsub: would exceed queue's generic per-user limit

It is possible that you tried submitting the job to a queue that has a max_queued limit (a specified maximum number of jobs that each user can submit to a queue, running or waiting) and you have already reached that limit.

Currently, the Pleiades devel queue is the only queue that has a max_queued limit of 1.

Resource Request Exceeds Resource Limits

If you get the following message after submitting a PBS job: qsub: Job exceeds queue resource limits

Reduce your resource request to below the limit or use a different queue.

Queue is Unknown

Be sure to use the correct queue. For Pleiades, Aitken, and Electra jobs, use the common queue names normal, long, vlong, and debug. For Endeavour jobs, use the queue names e_normal, e_long, e_vlong, and e_debug.

The PBS server pbspl1 recognizes the queue names for both Pleiades/Aitken/Electra and Endeavour, and will route them appropriately. However, the pbspl4 server only recognizes the queue names for Endeavour jobs. For example, if you submit a job to pbspl4 and specify a Pleiades/Aitken/Electra queue, you will get an error, as follows:

pfe21% qsub -q normal@pbspl4 job_script
qsub: unknown queue

Not Enough or No Allocation Left

An automated script checks GID usage each day. If usage exceeds a GID's allocation, the GID is removed from the PBS access control list and you will no longer be able to submit jobs under that GID.

You can check the amount of allocation remaining using the acct_ytd command; for more information, see Job Accounting Utilities. In addition, if your PBS output file includes a message stating that your GID allocation usage is near its limit or is already over its limit, ask your PI to request an increase in allocation.

Once the request for increased allocation is approved and added to your account, an hourly script will automatically add your GID back to the PBS access control list.

Avoiding Job Failure from Overfilling /PBS/spool

When your PBS job is running, its error and output files are kept in the /PBS/spool directory of the first node of your job. However, the space under /PBS/spool is limited, and when it fills up, any job that tries to write to /PBS/spool may die. This makes the node unusable by jobs until the spool directory is cleaned up manually.

To avoid this situation, PBS enforces a limit of 200 MB on the combined sizes of error and output files produced by a job. If a job exceeds that limit, PBS terminates the job.

To prevent this from happening to your job, do not write large amounts of content in the PBS output/error files. If your job needs to produce a lot of output, you can use the qsub -kod option to write the output directly to your final output file, bypassing the spool directory (see Commonly Used qsub Options). Alternatively, you can redirect standard out or standard error within your PBS script. Here are a few options to consider:

1. Redirect standard out and standard error to a single file:

(for csh)  mpiexec a.out >& output
(for bash) mpiexec a.out > output 2>&1

2. Redirect standard out and standard error to separate files:

(for csh)  (mpiexec a.out > output) >& error
(for bash) mpiexec a.out > output 2> error

3. Redirect only standard out to a file:

(for both csh and bash) mpiexec a.out > output

The files "output" and "error" are created under your own directory and you can view the contents of these files while your job is still running.

If you are concerned that these two files could get clobbered in a second run of the script, you can create unique filenames for each run. For example, you can add the PBS JOBID to "output" using the following:

(for csh)  mpiexec a.out >& output.$PBS_JOBID
(for bash) mpiexec a.out > output.$PBS_JOBID 2>&1

where $PBS_JOBID contains a number (jobid) and the name of the PBS server, such as 12345.pbspl1.nas.nasa.gov.

If you just want to include the numeric part of the PBS JOBID, do the following:

(for csh)
set jobid=`echo $PBS_JOBID | awk -F . '{print $1}'`
mpiexec a.out >& output.$jobid

(for bash)
export jobid=`echo $PBS_JOBID | awk -F . '{print $1}'`
mpiexec a.out > output.$jobid 2>&1

If you used the qsub -kod option mentioned above, you will be able to see the contents of the output or error files directly from a front end as they grow. If you neither used -k nor redirected the output, you can still see the contents of your PBS output/error files before your job completes by following these steps:

1. Find out the first node of your PBS job using qstat -W o=+rank0. In the following example, the output shows that the first node is r162i0n14:

%qstat -u your_username -W o=+rank0
JobID          User    Queue  Jobname   TSK  Nds      wallt  S      wallt   Eff  Rank0
-------------- ------- ------ -------- ---- ---- ---------- -- ---------- ----- ----------
868819.pbspl1  zsmith  long   ABC       512   64  5d+00:00   R  3d+08:39   100%  r162i0n14

2. Log into the first node and cd to /PBS/spool to find your PBS stderr/out file(s). You can view the contents of these files using vi or view.

%ssh r162i0n14
%cd /PBS/spool
%ls -lrt
-rw-------  1 zsmith a0800 49224236 Aug  2 19:33 868819.pbspl1.nas.nasa.gov.OU
-rw-------  1 zsmith a0800  1234236 Aug  2 19:33 868819.pbspl1.nas.nasa.gov.ER

Packaging Multiple Runs

Using HPE MPT to Run Multiple Serial Jobs

You can run multiple serial jobs within a single batch job on Aitken, Electra, or Pleiades by using the PBS scripts shown in the examples below. The maximum number of processes you can run concurrently on a single node is limited by the maximum number of processes that will fit in a given node's memory. However, for performance reasons, you might want to limit the number of serial processes per node to the number of cores on the node.

System             Processor Type  Cores/Node  Total Physical Memory/Node
Pleiades           Sandy Bridge            16   32 GB
Pleiades           Ivy Bridge              20   64 GB
Pleiades           Haswell                 24  128 GB
Pleiades, Electra  Broadwell               28  128 GB
Electra            Skylake                 40  192 GB
Aitken             Cascade Lake            40  192 GB
Aitken             Rome                   128  512 GB

Note: The amount of memory that is actually available to a PBS job is slightly less than the total physical memory of the nodes it runs on, because the operating system uses up to 4 GB of memory in each node. At the beginning of a job, the PBS prologue checks the nodes for free memory and guarantees that 85% of the total physical memory is available for the job.

PBS Script Examples

The scripts shown in these examples allow you to spawn serial jobs across nodes using the mpiexec command. The mpiexec command keeps track of $PBS_NODEFILE and properly places each serial job onto the processors listed in $PBS_NODEFILE. For this method to work for serial codes and scripts, you must set the MPI_SHEPHERD environment variable to true.

In addition, to launch multiple copies of the serial job at the same time, you must use the $MPT_MPI_RANK environment variable in order to distinguish different input/output files for each serial job. In the examples below, this is accomplished through the use of a wrapper script, called wrapper.csh, in which the input/output identifier ($rank) is calculated from the sum of $MPT_MPI_RANK and an argument provided as input by the user.

Note: These methods may not work if, in addition to mpi-hpe/mpt, you load other modules that also come with mpiexec (such as mpi-hpcx or Intel python).

Example 1

The following PBS script runs 64 copies of a serial job, assuming that 16 copies will fit in the available memory on one node and 4 nodes are used. The wrapper script is shown below the PBS script.

serial1.pbs

Packaging Multiple Runs 124 #PBS -S /bin/csh #PBS -j oe #PBS -l select=4:ncpus=16:model=san #PBS -l walltime=4:00:00 module load mpi-hpe/mpt setenv MPI_SHEPHERD true cd $PBS_O_WORKDIR mpiexec -np 64 wrapper.csh 0 wrapper.csh

#!/bin/csh -f
@ rank = $1 + $MPT_MPI_RANK
./a.out < input_${rank}.dat > output_${rank}.out

This example assumes that input files are named input_0.dat, input_1.dat, ... and that they are all located in the directory the PBS script is submitted from ($PBS_O_WORKDIR). If the input files are located in different directories, then you can modify wrapper.csh to change (cd) into different directories, as long as the directory names are differentiated by a single number that can be obtained from $rank (=0, 1, 2, 3, ...).
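For instance, a modified wrapper along the following lines would run each copy in its own directory; the run_0, run_1, ... naming scheme is only an assumption for illustration, and the directories are expected to exist before the job starts:

#!/bin/csh -f
# sketch: each serial copy runs in its own pre-existing directory
@ rank = $1 + $MPT_MPI_RANK
cd run_${rank}
../a.out < input.dat > output.out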

Be sure to include the current directory in your path, or use ./wrapper.csh in the PBS script. Also, run the chmod command as follows to ensure that you have execute permission for wrapper.csh:

chmod u+x wrapper.csh

Example 2

The following PBS script provides flexibility in situations where the total number of serial jobs may not be the same as the total number of processors requested in a PBS job. The serial jobs are divided into a few batches, and the batches are processed sequentially. Again, the wrapper script is used so that the multiple copies of the a.out program within a batch run in parallel.

serial2.pbs

#PBS -S /bin/csh
#PBS -j oe
#PBS -l select=3:ncpus=10:model=ivy
#PBS -l walltime=4:00:00

module load mpi-hpe/mpt
setenv MPI_SHEPHERD true

# Set MPI_DSM_DISTRIBUTE=0 to spread out the 10 serial processes
# across the 20 cores on each Ivy Bridge node
setenv MPI_DSM_DISTRIBUTE 0

cd $PBS_O_WORKDIR

# This will start up 30 serial jobs 10 per node at a time.
# There are 64 jobs to be run total, only 30 at a time.

# The number to run in total defaults here to 64 or the value
# of PROCESS_COUNT that is passed in via the qsub line like:
# qsub -v PROCESS_COUNT=48 serial2.pbs

# The total number to run at once is automatically determined
# at runtime by the number of CPUs available.
# qsub -v PROCESS_COUNT=64 -l select=4:ncpus=3:model=has serial2.pbs
# would make this 12 per pass not 30. No changes to script needed.

if ( $?PROCESS_COUNT ) then
  set total_runs=$PROCESS_COUNT
else
  set total_runs=64
endif

set batch_count=`wc -l < $PBS_NODEFILE`

set count=0
while ($count < $total_runs)
  @ rank_base = $count
  @ count += $batch_count
  @ remain = $total_runs - $count
  if ($remain < 0) then
    @ run_count = $total_runs % $batch_count
  else
    @ run_count = $batch_count
  endif
  mpiexec -np $run_count wrapper.csh $rank_base
end

Using GNU Parallel to Package Multiple Jobs in a Single PBS Job

GNU is an operating system that is upward-compatible with Unix. GNU parallel is a shell tool for executing jobs in parallel. It uses the lines of its standard input to modify shell commands, which are then run in parallel.

A copy of GNU parallel is available in the /usr/bin directory on Pleiades. For detailed information about this tool, see the GNU operating system website and the GNU parallel man page.

The following examples demonstrate how you can use GNU parallel to run multiple tasks in a single PBS batch job.

Example 1

This example script runs 64 copies of a serial executable file, and assumes that 4 copies will fit in the available memory of one node and that 16 nodes are used.

gnu_serial1.pbs

#PBS -lselect=16:ncpus=4:model=san
#PBS -lwalltime=4:00:00

cd $PBS_O_WORKDIR

seq 64 | parallel -j 4 -u --sshloginfile $PBS_NODEFILE \
  "cd $PWD; ./myscript.csh {}"

In the above PBS script, the last command uses the parallel command to simultaneously run 64 copies of myscript.csh located under $PBS_O_WORKDIR. Here is the specific breakdown:

seq 64
Generates a set of integers 1, 2, 3, ..., 64 that will be passed to the parallel command.

-j 4
GNU parallel will determine the number of processor cores on the remote computers and run the number of tasks as specified by -j. In this case, -j 4 tells the parallel command to run 4 tasks in parallel on one compute node.

-u
Tells the parallel command to print output as soon as possible. This may cause output from different commands to be mixed. GNU parallel runs faster with -u. This can be reversed with --group.

--sshloginfile $PBS_NODEFILE
Distributes tasks to the compute nodes listed in $PBS_NODEFILE.

"cd $PWD; ./myscript.csh {}"
Changes directory to the current working directory and runs myscript.csh located under $PWD. At this point, $PWD is the same as $PBS_O_WORKDIR. The {} is an input to myscript.csh (see below) and will be replaced by the sequence number generated from seq 64.

myscript.csh

#!/bin/csh -fe

# add the following to enable the module command:
source /usr/share/modules/init/csh

date
mkdir -p run_$1
cd run_$1

echo "Executing run $1 on" `hostname` "in $PWD"

$HOME/bin/a.out < ../input_$1 > output_$1

In the above sample script executed by the parallel command:

• $1 refers to the sequence numbers (1, 2, 3, ..., 64) from the seq command that was piped into the parallel command
• For each serial run, a subdirectory named run_$1 (run_1, run_2, ...) is created
• The echo line prints information back to the PBS stdout file
• The serial a.out is located under $HOME/bin
• The input for each run, input_$1 (input_1, input_2, ...), is located under $PBS_O_WORKDIR, which is the directory above run_$1
• The output for each run (output_1, output_2, ...) is created under run_$1

Potential Modifications to Example 1

There are multiple ways to pass arguments to parallel. For example, instead of using the seq command to pass a sequence of integers, you can also pass in a list of directory names or filenames using ls -1 or cat mylist, where the file mylist contains a list of entries.
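For example, either variant could look like the following sketch, assuming myscript.csh is written to accept a directory name or list entry as its argument (the case_* pattern and the mylist file are illustrative):

# pass directory names (one per line) instead of integers;
# each name replaces {} in the command string
ls -1d case_* | parallel -j 4 -u --sshloginfile $PBS_NODEFILE \
  "cd $PWD; ./myscript.csh {}"

# or read the entries from a file prepared ahead of time
cat mylist | parallel -j 4 -u --sshloginfile $PBS_NODEFILE \
  "cd $PWD; ./myscript.csh {}"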

Example 2

This script is similar to Example 1, except that 6 nodes are used instead of 16 nodes. This means that 24 serial a.outs can be run simultaneously, since a total of 24 cores (6 nodes x 4 cores) are requested. As each a.out completes its work on a core, another a.out is launched by the parallel command to run on the same core. myscript.csh is the same as that shown in the previous example.

gnu_serial2.pbs

#PBS -lselect=6:ncpus=4:model=san
#PBS -lwalltime=4:00:00

cd $PBS_O_WORKDIR

seq 64 | parallel -j 4 -u --sshloginfile $PBS_NODEFILE \
  "cd $PWD; ./myscript.csh {}"

Example 3

In this example, an OpenMP executable is run with 16 OpenMP threads on one Sandy Bridge node. To run 64 copies of this executable with 10 copies running simultaneously on 10 nodes:

gnu_openmp.pbs

#PBS -lselect=10:ncpus=16:mpiprocs=1:ompthreads=16:model=san
#PBS -lwalltime=4:00:00

cd $PBS_O_WORKDIR

seq 64 | parallel -j 1 -u --sshloginfile $PBS_NODEFILE \
  "cd $PWD; ./myopenmpscript.csh {}"

myopenmpscript.csh

#!/bin/csh -fe

date
mkdir -p run_$1
cd run_$1

setenv OMP_NUM_THREADS 16

echo "Executing run $1 on" `hostname` "in $PWD"

$HOME/bin/a.out < ../input_$1 > output_$1

Using Job Arrays to Group Related Jobs

A job array is a collection of jobs that differ from each other by only a single index parameter. Creating a job array provides an easy way to group related jobs together. For example, if you have a parameter study that requires you to run your application five times, each with a different input parameter, you can use a job array instead of creating five separate PBS scripts and submitting them separately.

Creating a Job Array

To create a job array, use a single PBS script and use the -J option to specify a range for the index parameter, either on the qsub command line or within your PBS script. For example, if you submit the following script, a job array with five sub-jobs will be created:

#PBS -lselect=4:ncpus=20:model=ivy
#PBS -lwalltime=8:00:00
#PBS -J 1-5

# For each sub-job, a value provided by the -J
# option is used for PBS_ARRAY_INDEX
mkdir dir.$PBS_ARRAY_INDEX
cd dir.$PBS_ARRAY_INDEX
mpiexec ../a.out < ../input.$PBS_ARRAY_INDEX
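The same range can instead be supplied on the qsub command line when you submit the script; in that case, the #PBS -J line can be omitted. For example:

% qsub -J 1-5 job_array_script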

Submitting the script to PBS will return a PBS_ARRAY_ID. For example:

% qsub job_array_script
28846[].pbspl1.nas.nasa.gov

Each sub-job in this job array will have a PBS_JOBID that includes both the PBS_ARRAY_ID and a unique PBS_ARRAY_INDEX value within the brackets. For example:

28846[1].pbspl1.nas.nasa.gov
28846[2].pbspl1.nas.nasa.gov
28846[3].pbspl1.nas.nasa.gov
28846[4].pbspl1.nas.nasa.gov
28846[5].pbspl1.nas.nasa.gov

When you develop your script, note the following:

• Interactive submission of job arrays is not allowed.
• Job arrays are required to be re-runnable. Submitting jobs with the -r n option is not allowed.
• The maximum array size is limited to 5 for array jobs submitted to the devel queue. Only one sub-job of the array job can be running in the devel queue.
• For all other queues, the maximum array size is limited to 501.

Checking Status

The status of the sub-jobs is not displayed by default. For example, the following qstat options show the job array as a single job:

% qstat -a
or
% qstat -nu your_username
or
% qstat -J

JobID           User  Queue  Jobname  TSK Nds wallt S
--------------- ----- ------ -------- --- --- ----- -
28846[].pbspl1  user  normal myscript  20   4 00:02 B

In the example above, the "B" status applies only to job arrays. This status indicates that at least one sub-job has left the "Q" (queued) state and is running or has run, but not all sub-jobs have run.

To check the status of the sub-jobs, use either the -Jt option or the -t option with an array specified. For example:

% qstat -Jt
or
% qstat -t 28846[].pbspl1.nas.nasa.gov

JobID           User  Queue  Jobname  TSK Nds wallt S
--------------- ----- ------ -------- --- --- ----- -
28846[1].pbspl1 user  normal myscript  20   4 00:02 R
28846[2].pbspl1 user  normal myscript  20   4 00:02 R
28846[3].pbspl1 user  normal myscript  20   4 00:02 Q
28846[4].pbspl1 user  normal myscript  20   4 00:02 Q
28846[5].pbspl1 user  normal myscript  20   4 00:02 Q

Deleting a Job Array or Sub-Job

To delete a job array or a sub-job, use the qdel command and specify the array or sub-job. For example:

% qdel "28846[]"

% qdel "28846[5]"

More Information

For more information about job arrays, see Chapter 8 of the PBS Professional User's Guide, which is available in the /PBS/doc directory on Pleiades.

Running Multiple Small-Rank MPI Jobs with GNU Parallel and PBS Job Arrays

If you have an MPI application with a small number of ranks and you need to run it many times, each with a different input parameter, consider using both the GNU Parallel tool and the PBS Job Arrays feature, provided your application meets these criteria:

• The number of MPI ranks is equal to or smaller than the number of physical cores in a node.
• There is enough memory in the node for multiple MPI jobs.

Example

Suppose you have a 5-rank MPI application called hello_mpi_processor, and you need to run it 400 times with 400 different input parameters (represented by the files in_1, in_2, ..., in_400 in the example). The following PBS script will divide these 400 runs into 10 PBS array sub-jobs. In each sub-job, one Ivy Bridge node is used to fit four of the 5-rank runs simultaneously. The GNU Parallel tool is used to schedule 40 of these runs in each sub-job.

runscript

#PBS -S /bin/csh
#PBS -l select=1:ncpus=20:model=ivy
#PBS -l walltime=00:10:00
#PBS -J 1-400:40
#PBS -j oe

cd $PBS_O_WORKDIR

set begin=$PBS_ARRAY_INDEX
set end=`expr $PBS_ARRAY_INDEX + 39`

seq $begin $end | parallel -j 4 -u --sshloginfile "$PBS_NODEFILE" \
  "cd $PWD; ./my_hello.scr {}"

Note that the double quotes around $PBS_NODEFILE prevent the shell from interpreting the square brackets that come with the job IDs for PBS Job Arrays.

my_hello.scr

The runscript, shown above, refers to the following script which contains all the tricky details:

#!/bin/csh -f

# need to enable module command and load modules
source ~/.cshrc
module load mpi-hpe/mpt

set case=$1
set nprocs=5

# need to create PBS_NODEFILE to run mpiexec
echo `hostname` >>! nodefile_$case
echo `hostname` >>! nodefile_$case
echo `hostname` >>! nodefile_$case
echo `hostname` >>! nodefile_$case
echo `hostname` >>! nodefile_$case
setenv PBS_NODEFILE `pwd`/nodefile_$case

# need to disable pinning to avoid having multiple processes run
# on the same set of CPUs
setenv MPI_DSM_DISTRIBUTE 0

date > out_$case
echo "running case $case on $nprocs processors" >> out_$case
mpiexec -np $nprocs ./hello_mpi_processor in_$case >> out_$case
date >> out_$case

# remove nodefile at end of run
\rm nodefile_$case

Job Submission

pfe% qsub runscript
1468444[].pbspl1.nas.nasa.gov

Output

At the end of the run, you should have files out_[1-400] and runscript.o1468444.[1,41,81,...,361].
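As a quick sanity check (not part of the original workflow), you can count the per-case output files; the command below should print 400 if every case produced output:

pfe% ls out_* | wc -l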

The following example shows sample contents of one of the out* files:

Wed Mar 15 16:05:12 PDT 2017
running case 42 on 5 processors
...
...
Wed Mar 15 16:05:13 PDT 2017

Managing Memory

Checking and Managing Page Cache Usage

When a PBS job is running, the Linux operating system uses part of the physical memory of the compute nodes to store data that is either read from disk or written to disk. The memory used in this way is called page cache. In some cases, the amount or distribution of the physical memory used by page cache can affect job performance.

For more information, see the Wikipedia entry on Page cache.

Checking Page Cache Usage

To find out how much physical memory is used by page cache in a node your PBS job is running on, connect to the node via SSH and then run one of the following commands:

• cat /proc/meminfo
• top -b -n1 | head

Examples

The following example shows output of the cat /proc/meminfo command on a node of a running job. The output reports that 60,023,544 KB (~60 GB) of the node's memory is used by page cache:

% cat /proc/meminfo
MemTotal:     132007564 kB
MemFree:         605680 kB
Buffers:              0 kB
Cached:        60023544 kB
SwapCached:           0 kB

The next example shows output from the top -b -n1 | head command, run on the same node as in the previous example. The output reports that the node has 128,913 MB total memory, that 128,314 MB of the total memory is used, and that 59,220 MB (~59 GB) of the used memory is occupied by page cache:

% top -b -n1 | head
top - 15:04:00 up 3 days, 16:22, 2 users, load average: 1.29, 1.23, 1.38
Tasks: 546 total, 2 running, 543 sleeping, 0 stopped, 1 zombie
Cpu(s): 0.5%us, 2.7%sy, 0.0%ni, 96.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem:   128913M total, 128314M used,   598M free,     0M buffers
Swap:       0M total,      0M used,     0M free, 59220M cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

Managing Page Cache

The amount of physical memory used as page cache can be significant if your code reads a large amount of data from disk or writes a large amount of data to disk. For this reason, you may want to manage page cache using one of the methods described in this section.

Assigning Rank 0 to a Single Node

In the case of an MPI application where I/O is done solely by the rank 0 process, it is possible for the page cache to fill up the physical memory of a node, leaving no memory for the other MPI ranks. To avoid this situation, you can assign the rank 0 process to a node by itself. For example, the following PBS resource request provides for 49 MPI ranks in total: rank 0 runs on a node by itself, and 24 MPI ranks are assigned to each of the other two nodes:

#PBS -l select=1:mpiprocs=1:model=has+2:mpiprocs=24:model=has

Limiting Page Cache Usage

Even if there is enough physical memory on the head node to accommodate both the page cache and the memory used by other ranks, keep in mind that the total amount of memory in a node is split between two sockets. For example, the 128 GB of memory in a Pleiades Haswell node is evenly divided between the node's two sockets, as shown in the Haswell configuration diagram. Therefore, if other MPI processes are running on the same socket as the rank 0 process, you should check whether the amount of memory on one socket is enough for both the rank 0 process and the other ranks. If there is not enough memory, the other ranks may be forced to allocate and use memory on the other socket, causing non-local memory access that may result in degraded performance.

In cases like this, you can set a limit for page cache usage to ensure there will be enough memory to accommodate both the I/O from the rank 0 process and the amount of memory used by other ranks on the same socket. You can accomplish this by running pcachem, a page cache management tool. For example, to set the limit at 8 GB (2^33 bytes = 8589934592 bytes), run:

module load pagecache-management/0.5
setenv PAGECACHE_MAX_BYTES 8589934592
mpiexec -n XX pcachem -- a.out

where XX represents the total number of MPI processes. The exact value that you specify for PAGECACHE_MAX_BYTES is not important; however, be sure you specify a value that leaves enough physical memory on the socket (after subtracting the amount used by page cache) to accommodate the other MPI ranks on the socket.

Note: You must also load an MPI module for running MPI applications.

If you are using OVERFLOW, consider the option shown in the following example, which demonstrates the use of both page cache management (with pcachem) and process binding (with mbind.x) for better performance:

module load pagecache-management/0.5
setenv PAGECACHE_MAX_BYTES 8589934592
setenv MBIND /u/scicon/tools/bin/mbind.x

$HOME/bin/overrunmpi -v -n XX -aux "pcachem -- $MBIND -v -gm -cs" YY

where YY refers to the input file for OVERFLOW. The mbind.x option -gm prints memory usage for each process, and the -cs option distributes the MPI processes among the hardware cores.

For more information about mbind.x, see Using the mbind Tool for Pinning.

Using gm.x to Check Memory Usage

You can use the gm.x tool to check the memory usage of jobs running on Pleiades, Aitken, Electra, or Endeavour. This tool reports the memory usage from each process at the end of a run. Optionally, gm.x can also report memory usage at a fixed interval during the run.

The gm.x tool is available in the /u/scicon/tools/bin directory.

TIP: Add the directory to your shell startup script so that you can invoke gm.x without typing the full path each time. For example:

set path = ( $path /u/scicon/tools/bin )

Use the -h option to find out what types of memory usage can be reported:

pfe21% gm.x -h
Usage: gm.x [-options] [-h|-he] a.out [args]

Options -
 -hwm  ; high water mark (VmHWM)
 -rss  ; resident memory size (VmRSS)
 -wrss ; weighted memory size (WRSS)
 -cs:n ; print memory usage every n seconds

Additional information about each option can be found with gm.x -he. Here are brief descriptions for some of the options:

-hwm (default)
Reports the peak resident set size (high water mark) that the kernel captures for the process. Specifically, the value is taken from the VmHWM field of the /proc/<pid>/status file.

-wrss
Calls the system function get_weighted_memory_size. (See man get_weighted_memory_size for more information about this function.)

-c
Reports the memory usage during the run; useful in the event that a job fails to complete.

-memv
Provided for backward compatibility; allows memory usage to be reported in a variable unit. Without this option, memory usage is reported in MB.

The gm.x tool can be used for either OpenMP or MPI applications (linked with HPE's MPT library). You do not need to recompile your application to use it.

To use gm.x for an MPI code, add gm.x just before the executable in the command line. For example:

mpiexec -np 4 gm.x ./a.out
Memory usage for (r1i1n0,pid=9767):  1.458 MB (rank=0)
Memory usage for (r1i1n0,pid=9768):  1.413 MB (rank=1)
Memory usage for (r1i1n0,pid=9770):  1.413 MB (rank=3)
Memory usage for (r1i1n0,pid=9769):  1.417 MB (rank=2)

You can feed the standard output into the gm_post.x script to compute the total memory used and the average memory used per process. For example:

mpiexec -np 4 gm.x ./a.out | gm_post.x -v
Memory usage for (r1i1n0,pid=9767):  1.458 MB (rank=0)
Memory usage for (r1i1n0,pid=9768):  1.413 MB (rank=1)
Memory usage for (r1i1n0,pid=9770):  1.413 MB (rank=3)
Memory usage for (r1i1n0,pid=9769):  1.417 MB (rank=2)

Number of nodes     = 1
Number of processes = 4

Processes per node  = 4
Total memory        = 5.701 MB

Memory per node     = 5.701 MB
Minimum node memory = 5.701 MB
Maximum node memory = 5.701 MB

Memory per process  = 1.425 MB
Minimum proc memory = 1.413 MB
Maximum proc memory = 1.458 MB

If you use dplace to pin the process, add gm.x after dplace in the command line:

mpiexec -np NN dplace -s1 gm.x ./a.out

The gm.x tool was developed by NAS staff member Henry Jin.

Using vnuma to Check Memory Usage and Non-Local Memory Access

vnuma is a NAS-designed tool that provides a user-friendly way for you to collect and visualize a PBS job's memory usage data as a function of time.

The tool gathers data from the /proc/meminfo and /proc/$pid/[stat,cmdline,numa_maps] files, and provides information about the following types of memory usage:

• Total memory used on a node, and also the usage by all user processes
• Memory used by each individual user process
• Local and non-local memory access by each user process on a node

Non-local memory access lowers the performance of a process, which can cause the performance of the entire job to deteriorate. The following diagram shows an example of non-local memory access, where a process running on core 6 of socket 0 is accessing memory on socket 1:

[Figure: vnuma.jpg]

You can run vnuma on NAS cluster systems, including Pleiades, Aitken, and Electra, but the tool is not supported on the shared-memory system Endeavour. vnuma offers flexibility in its data collection and visualization functions. A brief description of its usage is provided in the next few sections. You can also jump to Viewing vnuma Text Output and vnuma Options.

Collecting Memory Usage Data

You can use one of the following two methods to collect data using the vnuma command.

Method 1: Issuing the vnuma Command After a PBS Job is Submitted

After you submit a PBS job, you can issue the vnuma command from a PFE before or after the job starts running:

pfe% module load savors/2.x
pfe% qsub your_pbs_script
job_id
pfe% vnuma --save=directory_path [other options] job_id

Once the command is issued, it will not terminate even if you choose to exit the shell (unless the PFE where you issued the command is rebooted before your job starts running). vnuma checks whether the job is running, collects data, and exits when the job is completed. If the job stops unexpectedly before it completes, the data already collected is not lost. You can stop data collection by running:

pfe% vnuma --stop job_id

Note: You can view the job live without using the --save option. In this case, no data is saved for viewing afterwards.

Method 2: Issuing the vnuma Command Inside Your PBS Script

The following example demonstrates how to include the vnuma command inside your PBS script:

#PBS ..

module load savors/2.x
vnuma --save=directory_path [other options] $PBS_JOBID

mpiexec -np xx a.out

# You can stop data collection here if you do not want
# vnuma to collect data for the rest of the job, including
# usage by stream benchmark run by PBS epilogue as root
vnuma --stop $PBS_JOBID

The following tips and best practices apply to both data collection methods:

• The directory directory_path should be clean with no output from other jobs. If directory_path does not exist, it will be created.
• directory_path can be an absolute path or a relative path.
• For long running jobs, consider increasing the data collection period with the --period option. The default period is 1 second.
• For large jobs with many nodes, avoid using the --all option, which collects data from all nodes. The default is to collect data only from the head node. You can also use the --hosts option to collect data from a set of nodes. For example, --hosts=0+2i will collect data from nodes 0 (head node), 2, 4, ... from the $PBS_NODEFILE of your job. If you know the nodenames before you issue the vnuma command, you can run (for example) --hosts=r461i5n1,r463i2n9 to collect data from the nodes listed.

Viewing vnuma Output

You can analyze results by visualizing the output or by viewing the vnuma text output. Each method is described below.

Visualizing Memory Usage and Non-Local Memory Access

After you have collected the data, run the following commands:

pfe% module load savors/2.x
pfe% vnuma --load=directory_path [other options]

Use a maximum of two of the following options on the same plot. The default options are --process and --ratio.

--total
shows the total memory used on a node, and also the usage by all user processes

--process
shows the memory used (sum of local and non-local) by each individual user process

--ratio
shows the ratio of non-local to local memory access by each individual user process. A ratio > 0 means that there is non-local memory access happening. A ratio of 1 means that there is an equal amount of non-local and local memory access, which is detrimental.

Please note the following tips and best practices:

• Use the --geometry option, or the --smaxx and --smaxy options, to set the size of the display window. For example, --geometry=2000x1000 or --smaxx=0.5 --smaxy=0.5.
• You can exclude the display of some processes that use very little memory by using the --min option with the minimum size in megabytes (MB). For example: --min=5
• You can choose to include or exclude the display of certain processes with the --include or --exclude option. For example, --include='34541|34542' where 34541 and 34542 are the PIDs of two processes.
• Snapshots of the display can be automatically saved into PostScript format with the --snap and --snap-file options.
• You can pause/unpause the live display with the 'z' key. Use the 't' key for stepping in time. Use Control-s to save a snapshot of the display. Unless --snap-file is specified, the default file name of the snapshot is savors-snap.ps.
• Once in a while, the vnuma command may not produce the display. Repeating the same command on the same window later or on a different window usually solves the problem.

Sample Plots

The following plot is generated by:

vnuma --load=. --total --process --min=20 --geometry=2000x1000

The lines show the total memory usage (in MB) on the node by all processes (orange) and user processes (yellow). The dots show the per-process memory usage (in MB). Note that the scales for the lines and dots are different, as shown in the left and right axes. The large gap between the orange and yellow lines indicates a heavy use of buffer cache for I/O. The legend on the upper right provides the ID and command of each user process.

[Figure: total-process_s1_28921.jpg]

The next plot is generated by:

vnuma --load=. --include='34541|34542' --geometry=2000x1000

This plot shows that the non-local memory access for process 34542 (in orange, where the ratio reaches about 0.5) is much more severe than for process 34541 (in red, where the ratio stays below 0.1). Dots are the memory used (sum of local and non-local) by each process (in MB). Lines show the ratio of non-local to local memory access. When a large ratio occurs, which can happen at the beginning of a job, it is important to check the memory usage. If the memory usage is very small, it is likely not causing any performance degradation.

[Figure: 34541-34542_s1_28900.jpg]

Viewing vnuma Text Output

You can view the raw text data collected under the directory_path directory. The time file keeps parameters that are used internally by vnuma, such as the time when vnuma starts collecting data for your job and the name of the first node where data is collected. For each node whose data is collected, there is a file called out.rxxxixnx.

A segment of a sample vnuma output file, out.r431i4n1, is shown below. The definition of each column is shown on top for clarity. The second to last line lists the total memory used on the node, based on the difference between MemTotal and MemFree in /proc/meminfo. The last line lists the total memory, summed over the local and non-local memory, used by all user processes.

time       node     pid   psr socket local(MB) non-local(MB) command
1508445415 r431i4n1 34541   0   0    584.964    50.628  /u/.../overflowmpi
1508445415 r431i4n1 34542   1   0    451.176   194.56   /u/.../overflowmpi
1508445415 r431i4n1 34543   2   0    408.208   146.704  /u/.../overflowmpi
1508445415 r431i4n1 34544   3   0    410.096   145.752  /u/.../overflowmpi
1508445415 r431i4n1 34545   4   0    408.356   147.2    /u/.../overflowmpi
1508445415 r431i4n1 34546   5   0    407.348   186.04   /u/.../overflowmpi
1508445415 r431i4n1 34547   6   0    409.268   184.136  /u/.../overflowmpi
1508445415 r431i4n1 34548   7   0    408.736   165.908  /u/.../overflowmpi
1508445415 r431i4n1 34549   8   0    407.076   176.872  /u/.../overflowmpi
1508445415 r431i4n1 34550   9   0    408.312   184.616  /u/.../overflowmpi
1508445415 r431i4n1 34551  10   1    543.1      11.08   /u/.../overflowmpi
1508445415 r431i4n1 34552  11   1    549.036    11.772  /u/.../overflowmpi
1508445415 r431i4n1 34553  12   1    545.372    11.484  /u/.../overflowmpi
1508445415 r431i4n1 34554  13   1    559.884    10.416  /u/.../overflowmpi
1508445415 r431i4n1 34555  14   1    572.656     9.764  /u/.../overflowmpi
1508445415 r431i4n1 34556  15   1    579.368     9.212  /u/.../overflowmpi
1508445415 r431i4n1 34557  16   1    582.18      9.38   /u/.../overflowmpi
1508445415 r431i4n1 34558  17   1    582.98     10.264  /u/.../overflowmpi
1508445415 r431i4n1 34559  18   1    579.952    10.392  /u/.../overflowmpi
1508445415 r431i4n1 34560  19   1    593.952     9.608  /u/.../overflowmpi
1508445415 r431i4n1     -  -1   45041.264  0  0  All Processes
1508445415 r431i4n1     -  -1   11715.112  0  0  User Processes

vnuma Options

Run vnuma -h to see a list of command options:

pfe% vnuma -h
Usage: vnuma [OPTION]... [PBS.JOB_ID]

Options (defaults in brackets):
  --all             collect/show data from all job nodes
  --exclude=REGEX   exclude processes matching REGEX
  --geometry=GEOM   geometry of screen area to use
  -h, --help        help
  --hosts=LIST      collect/show data from comma-separated list of hosts
                    (I/Ni/I+Ni for Ith, every Nth, every Nth from I)
  --include=REGEX   include only processes matching REGEX
  --legend=REAL     fraction of width to use for legend [0.2]
  --legend-pt=INT   legend font point size [20]
  --load=DIR        load data from DIR instead of running job
  --min=INT         exclude process memory usage less than INT MB/s [0]
  --no-frame        do not show window manager frame
  --period=SECS     collect/show data every SECS seconds [1]
  --process         show memory usage per process
  --ratio           show ratio of non-local mem to local mem per process
  --save=DIR        collect job data to DIR without visualization
  --smaxx=REAL      max fraction of screen width to use [1]
  --smaxy=REAL      max fraction of screen height to use [1]
  --snap=PERIOD     take snapshot every PERIOD amount of time
                    (use suffix {m/h/d/w} for {mn/hr/dy/wk})
  --snap-file=FILE  save snapshots to FILE
  --stop            stop collecting data
  --total           show total system and total process memory usage

The vnuma tool was developed by NAS staff member Paul Kolano.

Using qsh.pl to Check Memory Usage

You can use the qsh.pl tool to check the memory usage of jobs running on Pleiades, Aitken, or Electra. This tool is a script that uses Secure Shell (SSH) protocol to access all of the nodes used by a PBS job in order to run a command that you supply.

For Pleiades, Aitken, and Electra jobs, you can run qsh.pl on the PFEs.

The qsh.pl tool is available in the /u/scicon/tools/bin directory.

TIP: Add the directory to your shell startup script so that you can invoke qsh.pl without typing the full path each time. For example:

set path = ( $path /u/scicon/tools/bin )

Syntax:

pfe21% qsh.pl job_id command

You can use this script to check the amount of free memory in the nodes of your PBS job.

Example:

pfe21% qsh.pl 30329 "cat /proc/meminfo"
running "cat /proc/meminfo" on:
r56i2n14
r56i2n15
r56i2n14 :
MemTotal:      8079728 kB
MemFree:        857936 kB
Buffers:             0 kB
Cached:        3775472 kB
...
r56i2n15 :
MemTotal:      8079728 kB
MemFree:       5840920 kB
Buffers:             0 kB
Cached:         784280 kB
...

The qsh.pl script was provided by NAS staff member Bob Hood.

Using qtop.pl to Check Memory Usage

You can use the qtop.pl tool to check the memory usage of jobs running on Pleiades, Aitken, or Electra. This tool is a Perl script that uses Secure Shell (SSH) to access the nodes of a PBS job in order to perform the top command. The output provides memory usage information for the whole node and for each process.

For Pleiades, Aitken, and Electra jobs, you can run qtop.pl on the PFEs.

The qtop.pl tool is available in the /u/scicon/tools/bin directory.

TIP: Add the directory to your shell startup script so that you can invoke qtop.pl without typing the full path each time. For example:

set path = ( $path /u/scicon/tools/bin )

Syntax:

pfe21% qtop.pl [-b] [-p n] [-P s] [-h n] [-H s] [-t s] [-N s] job_id

-b    (for running in background or batch) don't run 'resize' command
-p n  show at most n processes per host
-P s  show only procs in s, a comma-separated list of ranges
      e.g. -P 1,8-9
-h    don't show the column header line
-H s  show only header lines in s, comma-separated ranges
      e.g. -H 1-2,7
      e.g. -H 0 (don't show any lines)
-t s  pass string s (must be one argument) to top command
-n s  show output only from nodes in s, comma-separated ranges
      e.g. -n 0,2-3 (relative node numbers)
-N s  show output only from nodes in s, a comma-separated list
      e.g. -N r1i1n14,r1i1n15 (absolute node numbers)

Example: To skip the header and list 8 procs per host:

pfe21% qtop.pl -H 0 -p 8 996093
all nodes in job 996093: r184i2n12

r184i2n12
  PID USER   PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
20027 zsmith 25  0 23.8g 148m 5320 R  101  0.6 5172:37 a.out
20028 zsmith 25  0 23.8g 140m 5140 R  101  0.6 5173:35 a.out
20029 zsmith 25  0 23.9g 286m 6640 R  101  1.2 5172:23 a.out
20030 zsmith 25  0 23.9g 245m 5040 R  101  1.0 5171:18 a.out
20031 zsmith 25  0 23.9g 265m 6040 R  101  1.1 5171:46 a.out
20032 zsmith 25  0 23.9g 246m 5300 R  101  1.0 5171:00 a.out
20033 zsmith 25  0 23.8g 158m 5476 R  101  0.7 5172:41 a.out
20034 zsmith 25  0 23.8g 148m 5280 R  101  0.6 5173:02 a.out

The qtop.pl script was provided by NAS staff member Bob Hood.

Using qps to Check Memory Usage

You can use the qps tool to check the memory usage of jobs running on Pleiades, Aitken, or Electra. This tool is a Perl script that uses Secure Shell (SSH) protocol to access each node of a running job in order to get process status (ps) information on each node.

For Pleiades, Aitken, and Electra jobs, you can run qps on the PFEs.

The qps tool is available in the /u/scicon/tools/bin directory.

TIP: Add the directory to your shell startup script so that you can invoke qps without typing the full path each time. For example:

set path = ( $path /u/scicon/tools/bin )

Syntax:

pfe21% qps job_id

Example:

pfe21% qps 26130

*** Job 26130, User abc, Procs 1
NODE     TIME     %MEM %CPU STAT TASK
r1i0n14  10:17:13  2.8 99.9 RL   ./a.out
r1i0n14  10:17:12  2.9 99.9 RL   ./a.out
r1i0n14  10:17:18  2.9 99.9 RL   ./a.out
r1i0n14  10:16:34  2.9 99.8 RL   ./a.out
r1i0n14  10:17:11  2.9 99.9 RL   ./a.out
r1i0n14  10:17:13  2.9 99.9 RL   ./a.out
r1i0n14  10:17:12  2.9 99.9 RL   ./a.out
r1i0n14  10:17:15  2.9 99.9 RL   ./a.out

Note: The percentage of memory usage by a process reported by this script is a percentage of the memory in the entire node.

Memory Usage Overview

There are several factors that can affect the amount of memory available to your job and the use of memory by your applications. We provide a number of tools you can use to monitor and manage memory usage.

Physical Memory Available on a Node When a Job Starts

The total physical memory of a Pleiades, Aitken, or Electra compute node varies from 32 GB to 192 GB for Intel Xeon processors or 512 GB for the AMD Rome processors. However, the amount of memory on a node that is available for a PBS job is less than the total physical memory, because the system kernel can use up to 4 GB of memory in each node.

When a job starts running, the PBS prologue tries to clean up the memory on the nodes used by the previous job, by flushing that job's data from memory to disk. If there is a delay in this cleanup attempt (for example, due to Lustre issues), less memory will be available for the job. After the prologue has the opportunity to flush the memory, if the available memory is less than 85% of the total physical memory of the node, the prologue will terminate the job and return it to the queue. The node will be taken offline.

Memory Used By and For Your Application

Segments and Processes

When you run an application, each of the following segments takes up space in the virtual memory:

text
Executable instructions.

data
Pre-defined data: global and static variables initialized by the programmer.

bss
Uninitialized data: global and static variables that do not have explicit initialization in source code and/or are initialized to zero by the kernel.

stack
Used for storing local variables and function parameters of a stack frame, which is created when a function is called and removed when the function returns.

heap
Used for dynamic memory allocation during runtime.

shared libraries
For example, math and MPI libraries.

You can run the Linux size command to list the bytes required (but not necessarily used) by the text, data, and bss memory segments.
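For example (the executable name and the byte counts shown are only illustrative):

% size ./a.out
   text    data     bss     dec     hex filename
 904256   22120 1635048 2561424  271590 ./a.out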

Note: Some user applications have built-in reports of the memory requirement. These reports might not take into account the memory used by the shared libraries.

During runtime, the memory needed by each process in the application is paged (in 4 KB units) into the physical memory. The total amount of physical memory used by each process is called resident set size (RSS), which is reported in the RSS column of the ps command output or in the RES column of the top command output. The total physical memory used by application processes in a node can be estimated by the sum of the processes' RSS; keep in mind that the sum may be an overestimate, as the usage by the shared libraries might be counted more than once among processes.
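As a rough sketch of such an estimate for an executable named a.out on one node (the command assumes a standard Linux ps, and the result can overcount shared pages, as noted above):

# sum the RSS (reported in KB) of all a.out processes on this node
# and convert the total to MB
% ps -C a.out -o rss= | awk '{sum += $1} END {printf "%.1f MB\n", sum/1024}'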

Memory resource limits set by the system administrator for the Linux shells (such as stacksize, memoryuse, vmemoryuse, and so on) can affect your application. To see the settings, run the limit command (for csh) or ulimit -a (for bash) on a compute node. You might need to adjust one or more of the settings for your application to run properly.
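For example, you can display the limits and, if needed, raise the stack size from within a csh job script or login session (whether a given limit can be raised to unlimited depends on the system settings):

# display the current limits on the compute node (csh)
limit

# raise the stack size limit for this shell and its child processes
limit stacksize unlimited

# bash equivalents:
#   ulimit -a
#   ulimit -s unlimited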

Page Cache for Application I/O

Page cache, also known as buffer cache, is used to speed up the disk I/O of your application. Page cache usage is reported in the Cache Mem entry in the system usage section of the top command output.

If your job performs a large amount of I/O, there will be less memory available for your running processes. For some compute-intensive jobs, page cache management (limiting the amount of memory used by page cache) may improve performance. For more information, see Checking and Managing Page Cache Usage.

The /tmp Filesystem

The /tmp filesystem on a compute node is a fast, memory-based temporary local filesystem. You can access it directly via the /tmp path or by the $TMPDIR environment variable created by PBS for your job under /tmp/pbs.jobid.pbspl1.nas.nasa.gov. Usage by /tmp is included in the Cache Mem entry of the top command output.

Note: /tmp usage is limited to 50% of the total physical memory on the node. Using more than 50% results in the error No space left on device.
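For example, a minimal csh sketch of staging files through $TMPDIR (the file names are illustrative; copy any results you need back to a permanent filesystem before the job ends, since the node-local /tmp contents are not kept after the job):

# stage input to the node-local memory-based filesystem
cp $PBS_O_WORKDIR/input.dat $TMPDIR
cd $TMPDIR

# run the application against the local copy
$PBS_O_WORKDIR/a.out < input.dat > output.out

# copy results back to the submission directory before the job ends
cp output.out $PBS_O_WORKDIR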

Tools for Checking Memory Usage

To help you assess the memory usage of your job, we provide the following NAS-developed tools:

qtop.pl
Invokes the top command on the compute nodes of a job, and provides a snapshot of the amount of used and free memory of the whole node and the amount used by each running process.

qps
Invokes the ps command on the compute nodes of a job, and provides a snapshot of the %mem used by its running processes.

qsh.pl
Can be used to invoke the cat /proc/meminfo command on the compute nodes to provide a snapshot of the total and free memory in each node.

gm.x
Provides the memory high-water mark for each process of your job when the job finishes.

vnuma
Provides memory usage on the node; total memory used by all user processes; memory used by each individual user process; and ratio of non-local memory access to local memory access.

The vnuma tool is accessible via module load savors/2.x. The other tools are installed in the /u/scicon/tools/bin directory.

Notes:

• The qtop.pl, qps, qsh.pl, and vnuma tools can be used only for jobs running on Pleiades, Aitken, and Electra. The gm.x tool can be used for jobs running on Pleiades, Aitken, Electra, and Endeavour.
• For Pleiades/Aitken/Electra jobs, you can run the tools on the PFEs.
• If your Pleiades, Aitken, or Electra job uses more than one node, be aware that the memory usage reported in the PBS output file is not the total memory usage for your job; rather, it is the memory used in the first node of your job.

If Your Job Runs Out of Memory

If your Pleiades, Aitken, or Electra job runs out of memory and is killed by the kernel, this event is likely recorded in system log files. See Checking if a PBS Job was Killed by the OOM Killer for instructions on how to check for messages in the log files. If the job needs more memory, see How to Get More Memory for Your Pleiades Job for possible approaches.

For jobs running on Endeavour, the Linux kernel cgroup capability is used to enforce the resources allocated to your job, such as the number of cores and the amount of memory. When your job reaches the memory limit enforced by the cgroup, it will be terminated. If this happens, increase the memory request in your PBS script and resubmit the job.
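For example, a request of the following form raises the amount of memory allocated to the job's cgroup (the values shown are illustrative):

#PBS -l select=1:ncpus=16:mem=300GB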

For additional information, see the recording and slides for the "Memory Usage on NAS Xeon-Based Systems" webinar, which are available in the HECC webinars archive.

How to Get More Memory for your PBS Job

UPDATE IN PROGRESS: Aitken was recently expanded with 1,024 AMD Rome nodes. This article is being updated to support use of the new nodes.

If your job terminates because the nodes it is running on do not have enough available memory, try using one or more of the methods described below.

Note: The memory listed for each type of node in this article refers to the total physical memory in the node. The amount of memory that is actually available to a PBS job is slightly less than the total physical memory of the nodes it runs on, because the operating system can use up to 4 GB of memory in each node. At the beginning of a job, the PBS prologue checks the nodes for free memory and guarantees that 85% of the total physical memory is available for the job.

Use A Processor With More Memory

The following list shows the amount of total physical memory per node for the various Pleiades, Aitken, and Electra processor types:

Sandy Bridge   32 GB
Ivy Bridge     64 GB
Haswell        128 GB
Broadwell      128 GB
Skylake        192 GB
Cascade Lake   192 GB

If your job runs out of memory on a Sandy Bridge node, you can try using one of the other node types instead. For example, change:

#PBS -lselect=1:ncpus=16:model=san

to one of the following:

#PBS -lselect=1:ncpus=20:model=ivy

#PBS -lselect=1:ncpus=24:model=has

#PBS -lselect=1:ncpus=28:model=bro

#PBS -lselect=1:ncpus=40:model=sky_ele

#PBS -lselect=1:ncpus=40:model=cas_ait

Use LDANs as Sandy Bridge Bigmem Nodes

The Lou data analysis nodes (LDANs) can be dynamically switched between two modes: LDAN mode and bigmem mode. A PBS job on ldan[11-12] can access 768 GB of memory per node; on ldan[13-14], a PBS job can access 1.5 TB per node.

LDAN mode

In LDAN mode, the home filesystem of an LDAN assigned to your job is set to your Lou home filesystem in order to facilitate processing of data on Lou.

For LDAN mode, specify -q ldan.

Bigmem mode

In bigmem mode, the home filesystem of an LDAN assigned to your job is set to your Pleiades home filesystem. This allows the LDANs to be used as Sandy Bridge bigmem nodes.

To use bigmem mode, specify one of the standard PBS queues (such as devel, debug, normal, or long) together with :model=ldan. For example:

#PBS -q normal
#PBS -lselect=1:ncpus=16:model=ldan+10:ncpus=20:model=ivy

#PBS -q normal
#PBS -lselect=1:ncpus=16:model=ldan:mem=250GB+10:ncpus=16:model=san

Use Ivy Bridge Bigmem Nodes

Three Ivy Bridge nodes have 128 GB/node of memory instead of the standard 64 GB/node. To request these nodes, specify bigmem=true and model=ivy. For example:

#PBS -lselect=1:ncpus=20:model=ivy:bigmem=true

Note: The Ivy Bridge bigmem nodes are not available for jobs in the devel queue.

Run Fewer Processes per Node

If all processes use approximately the same amount of memory, and you cannot fit 16 processes per node (Sandy Bridge), 20 processes per node (Ivy Bridge), 24 processes per node (Haswell), 28 processes per node (Broadwell), or 40 processes per node (Skylake/Cascade Lake), then reduce the number of processes per node and request more nodes for your job.

For example, change:

#PBS -lselect=3:ncpus=20:mpiprocs=20:model=ivy

to the following:

#PBS -lselect=6:ncpus=10:mpiprocs=10:model=ivy

Recommendation: Add the command mbind.x -gm -cs before your executable in order to (1) monitor the memory usage of your processes and (2) balance the workload by spreading the processes between the two sockets of each node. For example:

mpiexec -np 60 /u/scicon/tools/bin/mbind.x -gm -cs ./a.out

If you are running a typical MPI job, where the rank 0 process performs the I/O and uses a large amount of buffer cache, assign rank 0 to one node by itself. For example, if rank 0 requires up to 20 GB of memory by itself, change:

#PBS -lselect=1:ncpus=16:mpiprocs=16:model=san

to the following:

#PBS -lselect=1:ncpus=1:mpiprocs=1:model=san+1:ncpus=15:mpiprocs=15:model=san

If the rank 0 process needs 20-44 GB of memory by itself, use:

#PBS -lselect=1:ncpus=1:mpiprocs=1:bigmem=true:model=ivy+1:ncpus=19:mpiprocs=19:model=ivy

Use the Endeavour System

If any process or thread in a multi-process or multi-thread job needs more than about 250 GB, the job won't run on Pleiades. Instead, run it on Endeavour, which is a shared-memory system.

Report Bad Nodes

If you suspect that certain nodes have less total physical memory than normal, report them to the NAS Control Room staff at (800) 331-8737, (650) 604-4444, or by email at [email protected]. The nodes can be taken offline while we resolve any issues, preventing them from being used before they are fixed.


Maximizing Use of Resources

Checkpointing and Restart

NAS systems do not have an automatic checkpoint capability made available by the operating system. For jobs that need a lot of resources, a long wall-clock time, or both, be sure to implement a checkpoint/restart capability in your source code or job script.

PBS automatically restarts unfinished jobs after system crashes. If you do not want PBS to restart your job, add the following line in your PBS script:

#PBS -r n
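One common job-script-level pattern, sketched below, is to have the job resubmit itself until the work is complete; it assumes your application writes its own restart files periodically and creates a marker file (here called run.finished, an illustrative name) when it is done:

#PBS -S /bin/csh
#PBS -lselect=1:ncpus=20:model=ivy
#PBS -lwalltime=8:00:00

cd $PBS_O_WORKDIR

# the application is assumed to read its restart files if present
# and to rewrite them periodically as its own checkpoint
./a.out

# resubmit this script (assumed to be saved as myjob.pbs) until the
# application reports that it has finished
if (! -e run.finished) then
  qsub myjob.pbs
endif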

Process/Thread Pinning

Instrumenting Your Fortran Code to Check Process/Thread Placement

Pinning, the binding of a process or thread to a specific core, can improve the performance of your code.

You can insert the MPI function mpi_get_processor_name and the Linux C function sched_getcpu into your source code to check process and/or thread placement. The MPI function mpi_get_processor_name returns the hostname an MPI process is running on (to be used for MPI and/or MPI+OpenMP codes only). The Linux C function sched_getcpu returns the processor number the process/thread is running on.

If your source code is written in Fortran, you can use the C code mycpu.c, which allows your Fortran code to call sched_getcpu. The next section describes how to use the mycpu.c code.

C Program mycpu.c

#include <utmpx.h>
int sched_getcpu();

int findmycpu_ ()
{
   int cpu;
   cpu = sched_getcpu();
   return cpu;
}

Compile mycpu.c as follows to produce the object file mycpu.o:

pfe21% module load comp-intel/2015.0.090
pfe21% icc -c mycpu.c

The following example demonstrates how to instrument an MPI+OpenMP source code with the above functions; the instrumentation lines are included in the listing below.

program your_program
use omp_lib
...
integer :: resultlen, tn, cpu
integer, external :: findmycpu
character (len=8) :: name

call mpi_init( ierr )
call mpi_comm_rank( mpi_comm_world, rank, ierr )
call mpi_comm_size( mpi_comm_world, numprocs, ierr )
call mpi_get_processor_name(name, resultlen, ierr)

!$omp parallel

tn = omp_get_thread_num()
cpu = findmycpu()
write (6,*) 'rank ', rank, ' thread ', tn, &
  ' hostname ', name, ' cpu ', cpu
.....
!$omp end parallel

call mpi_finalize(ierr)
end

Compile your instrumented code as follows:

pfe21% module load comp-intel/2015.0.090
pfe21% module load mpi-hpe/mpt
pfe21% ifort -o a.out -openmp mycpu.o your_program.f -lmpi

Sample PBS script

The following PBS script provides an example of running the hybrid MPI+OpenMP code across two nodes, with 2 MPI processes per node and 4 OpenMP threads per process, using the mbind tool to pin the processes and threads.

#PBS -lselect=2:ncpus=12:mpiprocs=2:model=wes
#PBS -lwalltime=0:10:00

cd $PBS_O_WORKDIR

module load comp-intel/2015.0.090
module load mpi-hpe/mpt

mpiexec -np 4 mbind.x -cs -n2 -t4 -v ./a.out

Here is a sample output:

These 4 lines are generated by mbind only if you have included the -v option:

host: r212i1n8, ncpus 24, process-rank: 0, nthreads: 4, bound to cpus: {0-3}
host: r212i1n8, ncpus 24, process-rank: 1, nthreads: 4, bound to cpus: {6-9}
host: r212i1n9, ncpus 24, process-rank: 2, nthreads: 4, bound to cpus: {0-3}
host: r212i1n9, ncpus 24, process-rank: 3, nthreads: 4, bound to cpus: {6-9}

These lines are generated by your instrumented code:

rank 0 thread 0 hostname r212i1n8 cpu 0
rank 0 thread 1 hostname r212i1n8 cpu 1
rank 0 thread 2 hostname r212i1n8 cpu 2
rank 0 thread 3 hostname r212i1n8 cpu 3
rank 1 thread 0 hostname r212i1n8 cpu 6
rank 1 thread 1 hostname r212i1n8 cpu 7
rank 1 thread 2 hostname r212i1n8 cpu 8
rank 1 thread 3 hostname r212i1n8 cpu 9
rank 2 thread 0 hostname r212i1n9 cpu 0
rank 2 thread 1 hostname r212i1n9 cpu 1
rank 2 thread 2 hostname r212i1n9 cpu 2
rank 2 thread 3 hostname r212i1n9 cpu 3
rank 3 thread 0 hostname r212i1n9 cpu 6
rank 3 thread 1 hostname r212i1n9 cpu 7
rank 3 thread 2 hostname r212i1n9 cpu 8
rank 3 thread 3 hostname r212i1n9 cpu 9

Note: In your output, these lines may be listed in a different order.

Using HPE MPT Environment Variables for Pinning

UPDATE IN PROGRESS: Starting with version 2.17, SGI MPT is officially known as HPE MPT. Use the command module load mpi-hpe/mpt to get the recommended version of the MPT library on NAS systems. This article is being updated to reflect this change.

For MPI codes built with HPE/SGI's MPT libraries, one way to control pinning is to set certain MPT memory placement environment variables. For an introduction to pinning at NAS, see Process/Thread Pinning Overview.

MPT Environment Variables

Here are the MPT memory placement environment variables:

MPI_DSM_VERBOSE

Directs MPI to display a synopsis of the NUMA and host placement options being used at run time to the standard error file.

Default: Not enabled

The setting of this environment variable is ignored if MPI_DSM_OFF is also set.

MPI_DSM_DISTRIBUTE

Activates NUMA job placement mode. This mode ensures that each MPI process gets a unique CPU and physical memory on the node with which that CPU is associated. Currently, the CPUs are chosen by simply starting at relative CPU 0 and incrementing until all MPI processes have been forked.

Default: Enabled

WARNING: If the nodes used by your job are not fully populated with MPI processes, use MPI_DSM_CPULIST, dplace, or omplace for pinning instead of MPI_DSM_DISTRIBUTE. The MPI_DSM_DISTRIBUTE setting is ignored if MPI_DSM_CPULIST is also set, or if dplace or omplace are used.

MPI_DSM_CPULIST

Specifies a list of CPUs on which to run an MPI application, excluding the shepherd process(es) and mpirun. The number of CPUs specified should equal the number of MPI processes (excluding the shepherd process) that will be used.

Syntax and examples for the list:

• Use a comma and/or hyphen to provide a delineated list:

# place MPI processes ranks 0-2 on CPUs 2-4
# and ranks 3-5 on CPUs 6-8
setenv MPI_DSM_CPULIST "2-4,6-8"

• Use a "/" and a stride length to specify CPU striding:

# Place the MPI ranks 0 through 3 stridden
# on CPUs 8, 10, 12, and 14
setenv MPI_DSM_CPULIST 8-15/2

• Use a colon to separate CPU lists of multiple hosts:

# Place the MPI processes 0 through 7 on the first host
# on CPUs 8 through 15. Place MPI processes 8 through 15
# on CPUs 16 to 23 on the second host.
setenv MPI_DSM_CPULIST 8-15:16-23

• Use a colon followed by allhosts to indicate that the prior list pattern applies to all subsequent hosts/executables:

# Place the MPI processes onto CPUs 0, 2, 4, 6 on all hosts
setenv MPI_DSM_CPULIST 0-7/2:allhosts

Examples

An MPI job requesting 2 nodes on Pleiades and running 4 MPI processes per node will get the following placements, depending on the environment variables set:

#PBS -lselect=2:ncpus=8:mpiprocs=4

module load ...
setenv ...

cd $PBS_O_WORKDIR
mpiexec -np 8 ./a.out

• setenv MPI_DSM_VERBOSE
  setenv MPI_DSM_DISTRIBUTE

MPI: DSM information
MPI: MPI_DSM_DISTRIBUTE enabled
grank lrank pinning node name cpuid
    0     0     yes   r86i3n5     0
    1     1     yes   r86i3n5     1
    2     2     yes   r86i3n5     2
    3     3     yes   r86i3n5     3
    4     0     yes   r86i3n6     0
    5     1     yes   r86i3n6     1
    6     2     yes   r86i3n6     2
    7     3     yes   r86i3n6     3

• setenv MPI_DSM_VERBOSE
  setenv MPI_DSM_CPULIST 0,2,4,6

MPI: WARNING MPI_DSM_CPULIST CPU placement spec list is too short.
MPI: MPI processes on host #1 and later will not be pinned.
MPI: DSM information
grank lrank pinning node name cpuid
    0     0     yes   r22i1n7     0
    1     1     yes   r22i1n7     2
    2     2     yes   r22i1n7     4
    3     3     yes   r22i1n7     6
    4     0      no   r22i1n8     0
    5     1      no   r22i1n8     0
    6     2      no   r22i1n8     0
    7     3      no   r22i1n8     0

• setenv MPI_DSM_VERBOSE
  setenv MPI_DSM_CPULIST 0,2,4,6:0,2,4,6

MPI: DSM information
grank lrank pinning node name cpuid
    0     0     yes  r13i2n12     0
    1     1     yes  r13i2n12     2
    2     2     yes  r13i2n12     4
    3     3     yes  r13i2n12     6
    4     0     yes   r13i3n7     0
    5     1     yes   r13i3n7     2
    6     2     yes   r13i3n7     4
    7     3     yes   r13i3n7     6

• setenv MPI_DSM_VERBOSE
  setenv MPI_DSM_CPULIST 0,2,4,6:allhosts

MPI: DSM information
grank lrank pinning node name cpuid
    0     0     yes  r13i2n12     0
    1     1     yes  r13i2n12     2
    2     2     yes  r13i2n12     4
    3     3     yes  r13i2n12     6
    4     0     yes   r13i3n7     0
    5     1     yes   r13i3n7     2
    6     2     yes   r13i3n7     4
    7     3     yes   r13i3n7     6

Using the omplace Tool for Pinning

HPE's omplace is a wrapper script for dplace. It pins processes and threads for better performance and provides an easier syntax than dplace for pinning processes and threads.

The omplace wrapper works with HPE MPT as well as with Intel MPI. In addition to pinning pure MPI or pure OpenMP applications, omplace can also be used for pinning hybrid MPI/OpenMP applications.

A few issues with omplace to keep in mind:

• dplace and omplace do not work with Intel compiler versions 10.1.015 and 10.1.017. Use the Intel compiler version 11.1 or later, instead.
• To avoid interference between dplace/omplace and Intel's thread affinity interface, set the environment variable KMP_AFFINITY to disabled or set OMPLACE_AFFINITY_COMPAT to ON.
• The omplace script is part of HPE's MPT, and is located under the /nasa/hpe/mpt/mpt_version_number/bin directory.

Syntax

For OpenMP:

setenv OMP_NUM_THREADS nthreads
omplace [OPTIONS] program args...

or

omplace -nt nthreads [OPTIONS] program args...

For MPI:

mpiexec -np nranks omplace [OPTIONS] program args...

For MPI/OpenMP hybrid:

setenv OMP_NUM_THREADS nthreads
mpiexec -np nranks omplace [OPTIONS] program args...

or

mpiexec -np nranks omplace -nt nthreads [OPTIONS] program args...

Some useful omplace options are listed below:

WARNING: For omplace, a blank space is required between -c and cpulist. Without the space, the job will fail. This is different from dplace.

-b basecpu
Specifies the starting CPU number for the effective CPU list.

-c cpulist
Specifies the effective CPU list. This is a comma-separated list of CPUs or CPU ranges.

-nt nthreads
Specifies the number of threads per MPI process. If this option is unspecified, it defaults to the value set for the OMP_NUM_THREADS environment variable. If OMP_NUM_THREADS is not set, then nthreads defaults to 1.

-v
Verbose option. Portions of the automatically generated placement file will be displayed.

-vv
Very verbose option. The automatically generated placement file will be displayed in its entirety.

For information about additional options, see man omplace.

Examples

For Pure OpenMP Codes Using the Intel OpenMP Library

Sample PBS script:

#PBS -lselect=1:ncpus=12:model=wes
module load comp-intel/2015.0.090
setenv KMP_AFFINITY disabled
omplace -c 0,3,6,9 -vv ./a.out

Sample placement information for this script is given in the application's stdout file:

omplace: placement file /tmp/omplace.file.21891
firsttask cpu=0
thread oncpu=0 cpu=3-9:3 noplace=1 exact

The above placement output may not be easy to understand. A better way to check the placement is to run the ps command on the running host while the job is still running:

ps -C a.out -L -opsr,comm,time,pid,ppid,lwp > placement.out

Sample output of placement.out

PSR COMMAND  TIME     PID   PPID  LWP
  0 openmp1  00:00:02 31918 31855 31918
 19 openmp1  00:00:00 31918 31855 31919
  3 openmp1  00:00:02 31918 31855 31920
  6 openmp1  00:00:02 31918 31855 31921
  9 openmp1  00:00:02 31918 31855 31922

Note that Intel OpenMP jobs use an extra thread that is unknown to the user, and does not need to be placed. In the above example, this extra thread is running on logical core number 19.

For Pure MPI Codes Using HPE MPT

Sample PBS script:

#PBS -l select=2:ncpus=12:mpiprocs=4:model=wes
module load comp-intel/2015.0.090
module load mpi-hpe/mpt

#Setting MPI_DSM_VERBOSE allows the placement information
#to be printed to the PBS stderr file
setenv MPI_DSM_VERBOSE

mpiexec -np 8 omplace -c 0,3,6,9 ./a.out

Sample placement information for this script is shown in the PBS stderr file:

MPI: DSM information
MPI: using dplace
grank lrank pinning  node name  cpuid
    0     0     yes  r144i3n12      0
    1     1     yes  r144i3n12      3
    2     2     yes  r144i3n12      6
    3     3     yes  r144i3n12      9
    4     0     yes  r145i2n3       0
    5     1     yes  r145i2n3       3
    6     2     yes  r145i2n3       6
    7     3     yes  r145i2n3       9

In this example, the four processes on each node are evenly distributed across the two sockets (CPUs 0 and 3 are on the first socket, while CPUs 6 and 9 are on the second socket) to minimize contention. If omplace had not been used, placement would follow the rules of the environment variable MPI_DSM_DISTRIBUTE, and all four processes would have been placed on the first socket, likely leading to more contention.

For MPI/OpenMP Hybrid Codes Using HPE MPT and Intel OpenMP

Proper placement is more critical for MPI/OpenMP hybrid codes than for pure MPI or pure OpenMP codes. The following example demonstrates what happens when no placement instruction is provided: the OpenMP threads of each MPI process step on one another, which will likely lead to very poor performance.

Sample PBS script without pinning:

#PBS -l select=2:ncpus=12:mpiprocs=4:model=wes
module load comp-intel/2015.0.090
module load mpi-hpe/mpt
setenv OMP_NUM_THREADS 2
mpiexec -np 8 ./a.out

There are two problems with the resulting placement, shown below. First, the four MPI processes on each node are placed on four cores (0,1,2,3) of the same socket, which will likely lead to more contention than if they were distributed between the two sockets.

MPI: MPI_DSM_DISTRIBUTE enabled
grank lrank pinning  node name  cpuid
    0     0     yes  r212i0n10      0
    1     1     yes  r212i0n10      1
    2     2     yes  r212i0n10      2
    3     3     yes  r212i0n10      3
    4     0     yes  r212i0n11      0
    5     1     yes  r212i0n11      1
    6     2     yes  r212i0n11      2
    7     3     yes  r212i0n11      3

The second problem is that, as demonstrated with the ps command below, the OpenMP threads are also placed on the same core where the associated MPI process is running:

ps -C a.out -L -opsr,comm,time,pid,ppid,lwp

PSR COMMAND TIME     PID  PPID LWP
  0 a.out   00:00:02 4098 4092 4098
  0 a.out   00:00:02 4098 4092 4108
  0 a.out   00:00:02 4098 4092 4110
  1 a.out   00:00:03 4099 4092 4099
  1 a.out   00:00:03 4099 4092 4106
  2 a.out   00:00:03 4100 4092 4100
  2 a.out   00:00:03 4100 4092 4109
  3 a.out   00:00:03 4101 4092 4101
  3 a.out   00:00:03 4101 4092 4107

Sample PBS script demonstrating proper placement:

#PBS -l select=2:ncpus=12:mpiprocs=4:model=wes
module load mpi-hpe/mpt
module load comp-intel/2015.0.090
setenv MPI_DSM_VERBOSE
setenv OMP_NUM_THREADS 2
setenv KMP_AFFINITY disabled

cd $PBS_O_WORKDIR

#the following two lines will result in identical placement
mpiexec -np 8 omplace -nt 2 -c 0,1,3,4,6,7,9,10 -vv ./a.out
#mpiexec -np 8 omplace -nt 2 -c 0-10:bs=2+st=3 -vv ./a.out

As shown in the PBS stderr file, the four MPI processes on each node are properly distributed across the two sockets, with processes 0 and 1 on CPUs 0 and 3 (first socket) and processes 2 and 3 on CPUs 6 and 9 (second socket).

MPI: DSM information
MPI: using dplace
grank lrank pinning  node name  cpuid
    0     0     yes  r212i0n10      0
    1     1     yes  r212i0n10      3
    2     2     yes  r212i0n10      6
    3     3     yes  r212i0n10      9
    4     0     yes  r212i0n11      0
    5     1     yes  r212i0n11      3
    6     2     yes  r212i0n11      6
    7     3     yes  r212i0n11      9

The PBS stdout file shows the placement of the two OpenMP threads for each MPI process:

omplace: This is an HPE MPI program.
omplace: placement file /tmp/omplace.file.6454
fork skip=0 exact cpu=0-10:3
thread oncpu=0 cpu=1 noplace=1 exact
thread oncpu=3 cpu=4 noplace=1 exact
thread oncpu=6 cpu=7 noplace=1 exact
thread oncpu=9 cpu=10 noplace=1 exact
omplace: This is an HPE MPI program.
omplace: placement file /tmp/omplace.file.22771
fork skip=0 exact cpu=0-10:3
thread oncpu=0 cpu=1 noplace=1 exact
thread oncpu=3 cpu=4 noplace=1 exact
thread oncpu=6 cpu=7 noplace=1 exact
thread oncpu=9 cpu=10 noplace=1 exact

To get a better picture of how the OpenMP threads are placed, use the following ps command:

ps -C a.out -L -opsr,comm,time,pid,ppid,lwp

PSR COMMAND TIME     PID  PPID LWP
  0 a.out   00:00:06 4436 4435 4436
  1 a.out   00:00:03 4436 4435 4447
  1 a.out   00:00:03 4436 4435 4448
  3 a.out   00:00:06 4437 4435 4437
  4 a.out   00:00:05 4437 4435 4446
  6 a.out   00:00:06 4438 4435 4438
  7 a.out   00:00:05 4438 4435 4444
  9 a.out   00:00:06 4439 4435 4439
 10 a.out   00:00:05 4439 4435 4445

Using Intel OpenMP Thread Affinity for Pinning

UPDATE IN PROGRESS: This article is being updated to support Skylake and Cascade Lake.

The Intel compiler's OpenMP runtime library has the ability to bind OpenMP threads to physical processing units. Depending on the system topology, application, and operating system, thread affinity can have a dramatic effect on code performance. We recommend two approaches for using the Intel OpenMP thread affinity capability.

Using the KMP_AFFINITY Environment Variable

The thread affinity interface is controlled using the KMP_AFFINITY environment variable.

Syntax

For csh and tcsh:

setenv KMP_AFFINITY type[,modifier,...][,permute][,offset]

For sh, bash, and ksh:

export KMP_AFFINITY=type[,modifier,...][,permute][,offset]

Using the -par-affinity Compiler Option

Starting with the Intel compiler version 11.1, thread affinity can be specified through the compiler option -par-affinity. The use of -openmp or -parallel is required in order for this option to take effect. This option overrides the environment variable when both are specified. See man ifort for more information.

Note: Starting with comp-intel/2015.0.090, -openmp is deprecated and has been replaced with -qopenmp.

Syntax

-par-affinity=type[,modifier,...][,permute][,offset]
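For example, a compile line using this option might look like the following (a sketch; the source file name is a placeholder, and -qopenmp assumes a compiler version in which -openmp has been deprecated):

ifort -qopenmp -par-affinity=scatter -o a.out my_openmp_code.f90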

Possible Values for type

For both of the recommended approaches, the only required argument is type, which indicates the type of thread affinity to use. Descriptions of all of the possible arguments (type, modifier, permute, and offset) can be found in man ifort.

Recommendation: Use Intel compiler versions 11.1 and later, as some of the affinity methods described below are not supported in earlier versions.

Possible values for type are:

type = none (default)
Does not bind OpenMP threads to particular thread contexts; however, if the operating system supports affinity, the compiler still uses the OpenMP thread affinity interface to determine machine topology. Specify KMP_AFFINITY=verbose,none to list a machine topology map.

type = disabled
Specifying disabled completely disables the thread affinity interfaces. This forces the OpenMP runtime library to behave as if the affinity interface was not supported by the operating system. This includes implementations of the low-level API interfaces such as kmp_set_affinity and kmp_get_affinity that have no effect and will return a nonzero error code.

Additional information from Intel:

"The thread affinity type of KMP_AFFINITY environment variable defaults to none (KMP_AFFINITY=none). The behavior for KMP_AFFINITY=none was changed in 10.1.015 or later, and in all 11.x compilers, such that the initialization thread creates a "full mask" of all the threads on the machine, and every thread binds to this mask at startup time. It was subsequently found that this change may interfere with other platform affinity mechanism, for example, dplace() on Altix machines. To resolve this issue, a new affinity type disabled was introduced in compiler 10.1.018, and in all 11.x compilers (KMP_AFFINITY=disabled). Setting KMP_AFFINITY=disabled will prevent the runtime library from making any affinity-related system calls."

type = compact
Specifying compact causes the threads to be placed as close together as possible. For example, in a topology map, the nearer a core is to the root, the more significance the core has when sorting the threads.

Usage example:

# for sh, ksh, bash
export KMP_AFFINITY=compact,verbose

# for csh, tcsh
setenv KMP_AFFINITY compact,verbose

type = scatter
Specifying scatter distributes the threads as evenly as possible across the entire system. Scatter is the opposite of compact.

Note: For most OpenMP codes, type=scatter should provide the best performance, as it minimizes cache and memory bandwidth contention for all processor models.

Usage example:

# for sh, ksh, bash
export KMP_AFFINITY=scatter,verbose

# for csh, tcsh
setenv KMP_AFFINITY scatter,verbose

type = explicit
Specifying explicit assigns OpenMP threads to a list of OS proc IDs that have been explicitly specified by using the proclist modifier, which is required for this affinity type.

Usage example:

# for sh, ksh, bash
export KMP_AFFINITY="explicit,proclist=[0,1,4,5],verbose"

# for csh, tcsh
setenv KMP_AFFINITY "explicit,proclist=[0,1,4,5],verbose"

For nodes that support hyperthreading, you can use the granularity modifier with the compact and scatter types to specify whether OpenMP threads are pinned to physical cores (granularity=core, the default) or to logical cores (granularity=fine or granularity=thread).
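For example, to pin threads to logical cores with the compact type (a sketch; substitute the type your application needs):

# for sh, ksh, bash
export KMP_AFFINITY="granularity=fine,compact,verbose"

# for csh, tcsh
setenv KMP_AFFINITY "granularity=fine,compact,verbose"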

Examples

The following examples illustrate the thread placement of an OpenMP job with four threads on various platforms with different thread affinity methods. The variable OMP_NUM_THREADS is set to 4:

# for sh, ksh, bash
export OMP_NUM_THREADS=4

# for csh, tcsh
setenv OMP_NUM_THREADS 4

The use of the verbose modifier is recommended, as it reports the resulting placement.

Sandy Bridge (Pleiades)

As seen in the configuration diagram of a Sandy Bridge node, each set of eight physical cores in a socket share the same L3 cache.

Four threads running on 1 node (16 physical cores and 32 logical cores due to hyperthreading) of Sandy Bridge will get the following thread placement:

[Image: thread placement table for a Sandy Bridge node]

Ivy Bridge (Pleiades)

As seen in the configuration diagram of an Ivy Bridge node, each set of ten physical cores in a socket share the same L3 cache.

Four threads running on 1 node (20 physical cores and 40 logical cores due to hyperthreading) of Ivy Bridge will get the following thread placement:

[Image: thread placement table for an Ivy Bridge node]

Haswell (Pleiades)

As seen in the configuration diagram of a Haswell node, each set of 12 physical cores in a socket share the same L3 cache.

Four threads running on 1 node (24 physical cores and 48 logical cores due to hyperthreading) of Haswell will get the following thread placement:

[Image: thread placement table for a Haswell node]

Broadwell (Pleiades and Electra)

As seen in the configuration diagram of a Broadwell node, each set of 14 physical cores in a socket share the same L3 cache.

Four threads running on 1 node (28 physical cores and 56 logical cores due to hyperthreading) of Broadwell will get the following thread placement:

[Image: thread placement table for a Broadwell node]

Process/Thread Pinning Overview

Pinning, the binding of a process or thread to a specific core, can improve the performance of your code by increasing the percentage of local memory accesses.

Once your code runs and produces correct results on a system, the next step is performance improvement. For a code that uses multiple cores, the placement of processes and/or threads can play a significant role in performance.

Given a set of processor cores in a PBS job, the Linux kernel usually does a reasonably good job of mapping processes/threads to physical cores, although the kernel may also migrate processes/threads. Some OpenMP runtime libraries and MPI libraries may also perform certain placements by default. In cases where the placements by the kernel or the MPI or OpenMP libraries are not optimal, you can try several methods to control the placement in order to improve performance of your code. Using the same placement from run to run also has the added benefit of reducing runtime variability.

Pay attention to maximizing data locality while minimizing latency and resource contention, and have a clear understanding of the characteristics of your own code and the machine that the code is running on.

Characteristics of NAS Systems

NAS provides two distinctly different types of systems: Pleiades, Aitken, and Electra are cluster systems, and Endeavour is a global shared-memory system. Each type is described in this section.

Pleiades, Aitken, and Electra

On Pleiades, Aitken, and Electra, memory on each node is accessible and shared only by the processes and threads running on that node. Pleiades is a cluster system consisting of different processor types: Sandy Bridge, Ivy Bridge, Haswell, and Broadwell. Electra is a cluster system that consists of Broadwell and Skylake nodes, and Aitken is a cluster system that consists of Cascade Lake and AMD Rome nodes.

Each node contains two sockets, with a symmetric memory system inside each socket. These nodes are considered non-uniform memory access (NUMA) systems, and memory is accessed across the two sockets through the inter-socket interconnect. So, for optimal performance, data locality should not be overlooked on these processor types.

However, compared to a global shared-memory NUMA system such as Endeavour, data locality is less of a concern on the cluster systems. Rather, minimizing latency and resource contention will be the main focus when pinning processes/threads on these systems.

For more information on Pleiades, Aitken, and Electra, see the following articles:

• Pleiades Configuration Details • Aitken Configuration Details • Electra Configuration Details

Endeavour

Endeavour comprises two hosts. Each host is a NUMA system that contains 32 sockets with a total of 896 cores. A process/thread can access the local memory on its socket, remote memory across sockets within the same chassis through the Ultra Path Interconnect, and remote memory across chassis through the HPE Superdome Flex ASICs, with varying latencies. So, data locality is critical for achieving good performance on Endeavour.

Note: When developing an application, we recommend that you initialize data in parallel so that each processor core initializes the data it is likely to access later for calculation.

For more information, see Endeavour Configuration Details.

Methods for Process/Thread Pinning

Several pinning approaches for OpenMP, MPI and MPI+OpenMP hybrid applications are listed below. We recommend using the Intel compiler (and its runtime library) and the HPE MPT software on NAS systems, so most of the approaches pertain specifically to them. You can also use the mbind tool for multiple OpenMP libraries and MPI environments.

OpenMP codes

• Using Intel OpenMP Thread Affinity for Pinning
• Using the dplace Tool for Pinning
• Using the omplace Tool for Pinning
• Using the mbind Tool for Pinning

MPI codes

• Using HPE MPT Environment Variables for Pinning
• Using the omplace Tool for Pinning
• Using the mbind Tool for Pinning

MPI+OpenMP hybrid codes

• Using the omplace Tool for Pinning
• Using the mbind Tool for Pinning

Checking Process/Thread Placement

Each of the approaches listed above provides some verbose capability to print out the tool's placement results. In addition, you can check the placement using the following approaches.

Use the ps Command

ps -C executable_name -L -opsr,comm,time,pid,ppid,lwp

In the generated output, use the core ID under the PSR column, the process ID under the PID column, and the thread ID under the LWP column to find where the processes and/or threads are placed on the cores.

Note: The ps command provides a snapshot of the placement at that specific time. You may need to monitor the placement from time to time to make sure that the processes/threads do not migrate.
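One simple way to monitor the placement over time is to rerun the command periodically, for example with watch (the interval and executable name here are placeholders):

watch -n 30 "ps -C a.out -L -opsr,comm,time,pid,ppid,lwp"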

Instrument your code to get placement information

• Call the mpi_get_processor_name function to get the name of the processor an MPI process is running on
• Call the Linux C function sched_getcpu() to get the processor number that the process or thread is running on

For more information, see Instrumenting your Fortran Code to Check Process/Thread Placement.

Hyperthreading

Hyperthreading is available and enabled on the Pleiades, Aitken, and Electra compute nodes. With hyperthreading, each physical core can function as two logical processors. This means that the operating system can assign two threads per core by assigning one thread to each logical processor.

Note: Hyperthreading is currently off on Aitken Rome nodes.

The following table shows the number of physical cores and potential logical processors available for each processor type:

Processor Model   Physical Cores (N)   Logical Processors (2N)
Sandy Bridge      16                   32
Ivy Bridge        20                   40
Haswell           24                   48
Broadwell         28                   56
Skylake           40                   80
Cascade Lake      40                   80

If you use hyperthreading, you can run an MPI code using 2N processes per node instead of N processes per node, so you can use half the number of nodes for your job. Each process will be assigned to run on one logical processor; in reality, two processes are running on the same physical core.

Running two processes per core can take less than twice the wall-clock time compared to running only one process per core, provided that one process does not keep the functional units in the core busy and can share the resources in the core with another process.

Benefits and Drawbacks

Using hyperthreading can improve the overall throughput of your jobs, potentially saving standard billing unit (SBU) charges. Also, requesting half the usual number of nodes may allow your job to start running sooner, which is an added benefit when the systems are loaded with many jobs. However, using hyperthreading may not always result in better performance.

WARNING: Hyperthreading does not benefit all applications. Also, some applications may show improvement with some process counts but not with others, and there may be other unforeseen issues. Therefore, before using this technology in your production run, you should test your applications with and without hyperthreading. If your application runs more than two times slower with hyperthreading than without, do not use it.

Using Hyperthreading

Hyperthreading can improve the overall throughput, as demonstrated in the following example.

Example

Consider the following scenario with a job that uses 40 MPI ranks on Ivy Bridge. Without hyperthreading, we would specify:

#PBS -lselect=2:ncpus=20:mpiprocs=20:model=ivy

and the job will use 2 nodes with 20 processes per node. Suppose that the job takes 1000 seconds when run this way. If we run the job with hyperthreading, e.g.:

#PBS -lselect=1:ncpus=20:mpiprocs=40:model=ivy

then the job will use 1 node with all 40 processes running on that node. Suppose this job takes 1800 seconds to complete.

Without hyperthreading, we used 2 nodes for 1000 seconds (a total of 2000 node-seconds); with hyperthreading, we used 1 node for 1800 seconds (1800 node-seconds). Thus, under these circumstances, if you were interested in getting the best wall-clock time performance for a single job, you would use two nodes without hyperthreading. However, if you were interested in minimizing resource usage, especially with multiple jobs running simultaneously, using hyperthreading would save you 10% in SBU charges.

Mapping of Physical Core IDs and Logical Processor IDs

Mapping between the physical core IDs and the logical processor IDs is summarized in the following table. The value of N is 16, 20, 24, 28, and 40 for Sandy Bridge, Ivy Bridge, Haswell, Broadwell, and Skylake/Cascade Lake processor types, respectively.

Physical ID   Physical Core ID   Logical Processor IDs
0             0                  0 ; N
0             1                  1 ; N+1
...           ...                ...
0             N/2 - 1            N/2 - 1 ; N + N/2 - 1
1             N/2                N/2 ; N + N/2
1             N/2 + 1            N/2 + 1 ; N + N/2 + 1
...           ...                ...
1             N - 1              N - 1 ; 2N - 1

Note: For additional mapping details, see the configuration diagrams for each processor type, or run the cat /proc/cpuinfo command on the specific node type.
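For example, the relevant fields can be pulled directly from /proc/cpuinfo on a compute node (the field names follow the standard /proc/cpuinfo layout):

grep -E "^processor|^physical id|^core id" /proc/cpuinfo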

Using the dplace Tool for Pinning

The dplace tool binds processes/threads to specific processor cores to improve your code performance. For an introduction to pinning at NAS, see Process/Thread Pinning Overview.

Once pinned, the processes/threads do not migrate. This can improve the performance of your code by increasing the percentage of local memory accesses.

dplace invokes a kernel module to create a job placement container consisting of all (or a subset of) the CPUs of the cpuset. In the current dplace version 2, an LD_PRELOAD library (libdplace.so) is used to intercept calls to the functions fork(), exec(), and pthread_create() to place tasks that are being created. Note that tasks created internal to glib are not intercepted by the preload library. These tasks will not be placed. If no placement file is being used, then the dplace process is placed in the job placement container and (by default) is bound to the first CPU of the cpuset associated with the container.

Syntax

dplace [-e] [-c cpu_numbers] [-s skip_count] [-n process_name] \
       [-x skip_mask] [-r [l|b|t]] [-o log_file] [-v 1|2] \
       command [command-args]
dplace [-p placement_file] [-o log_file] command [mpiexec -np4 a.out]
dplace [-q] [-qq] [-qqq]

As illustrated above, dplace "execs" command (in this case, without its mpiexec arguments), which executes within this placement container and continues to be bound to the first CPU of the container. As the command forks child processes, they inherit the container and are bound to the next available CPU of the container.

If a placement file is being used, then the dplace process is not placed at the time the job placement container is created. Instead, placement occurs as processes are forked and executed.

Options for dplace

Explanations for some of the options are provided below. For additional information, see man dplace.

-e and -c cpu_numbers

-e determines exact placement. As processes are created, they are bound to CPUs in the exact order specified in the CPU list. CPU numbers may appear multiple times in the list.

A CPU value of "x" indicates that binding should not be done for that process. If the end of the list is reached, binding starts over again at the beginning of the list.

-c cpu_numbers specifies a list of CPUs, optionally strided CPU ranges, or a striding pattern. For example:

• -c 1
• -c 2-4 (equivalent to -c 2,3,4)
• -c 12-8 (equivalent to -c 12,11,10,9,8)
• -c 1,4-8,3
• -c 2-8:3 (equivalent to -c 2,5,8)
• -c CS
• -c BT

Note: CPU numbers are not physical CPU numbers. They are logical CPU numbers that are relative to the CPUs that are in the allowed set, as specified by the current cpuset.

A CPU value of "x" (or *), in the argument list for the -c option, indicates that binding should not be done for that process. The value "x" should be used only if the -e option is also used.

Note that CPU numbers start at 0.

For striding patterns, any subset of the characters (B)lade, (S)ocket, (C)ore, (T)hread may be used; their ordering specifies the nesting of the iteration. For example, SC means to iterate all the cores in a socket before moving to the next CPU socket, while CB means to pin to the first core of each blade, then the second core of every blade, and so on.

For best results, use the -e option when using stride patterns. If the -c option is not specified, all CPUs of the current cpuset are available. The command itself (which is "execed" by dplace) is the first process to be placed on the CPUs listed with -c.

Without the -e option, the order of numbers for the -c option is not important.
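For example, a stride pattern could be used like this (a sketch; a.out is a placeholder executable):

# fill all cores of the first socket before moving to the next socket
dplace -e -c SC ./a.out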

-x skip_mask
  Provides the ability to skip placement of processes. The skip_mask argument is a bitmask. If bit N of skip_mask is set, then the N+1th process that is forked is not placed. For example, setting the mask to 6 prevents the second and third processes from being placed. The first process (the process named by the command) will be assigned to the first CPU. The second and third processes are not placed. The fourth process is assigned to the second CPU, and so on. This option is useful for certain classes of threaded applications that spawn a few helper processes that typically do not use much CPU time.
-s skip_count
  Skips the first skip_count processes before starting to place processes onto CPUs. This option is useful if the first skip_count processes are "shepherd" processes used only for launching the application. If skip_count is not specified, a default value of 0 is used.
-q
  Lists the global count of the number of active processes that have been placed (by dplace) on each CPU in the current cpuset. Note that CPU numbers are logical CPU numbers within the cpuset, not physical CPU numbers. If specified twice, lists the current dplace jobs that are running. If specified three times, lists the current dplace jobs and the tasks that are in each job.
-o log_file
  Writes a trace file to log_file that describes the placement actions that were made for each fork, exec, etc. Each line contains a time-stamp, process id:thread number, the CPU that the task was executing on, the task name, and the placement action. Works with version 2 only.
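To make the bitmask arithmetic concrete, the mask-of-6 example above could be written as follows (a sketch; a.out and the CPU list are placeholders):

# skip_mask 6 = binary 110: the 2nd and 3rd forked processes are not placed
dplace -x6 -c 0-7 ./a.out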

Examples of dplace Usage

For OpenMP Codes

#PBS -lselect=1:ncpus=8

#With Intel compiler versions 10.1.015 and later,
#you need to set KMP_AFFINITY to disabled
#to avoid the interference between dplace and
#Intel's thread affinity interface.

setenv KMP_AFFINITY disabled

#The -x2 option provides a skip map of 010 (binary 2) to
#specify that the 2nd thread should not be bound. This is
#because under the new kernels, the master thread (first thread)
#will fork off one monitor thread (2nd thread) which does
#not need to be pinned.

#On Pleiades, if the number of threads is less than
#the number of cores, choose how you want
#to place the threads carefully. For example,
#the following placement is good on Harpertown
#but not good on other Pleiades processor types:
dplace -x2 -c 2,1,4,5 ./a.out

To check the thread placement, you can add the -o option to create a log:

dplace -x2 -c 2,1,4,5 -o log_file ./a.out

Or use the following command on the running host while the job is still running:

ps -C a.out -L -opsr,comm,time,pid,ppid,lwp > placement.out

Sample Output of log_file

timestamp        process:thread  cpu  taskname | placement action
15:32:42.196786  31044             1  dplace   | exec ./openmp1, ncpu 1
15:32:42.210628  31044:0           1  a.out    | load, cpu 1
15:32:42.211785  31044:0           1  a.out    | pthread_create thread_number 1, ncpu -1
15:32:42.211850  31044:1           -  a.out    | new_thread
15:32:42.212223  31044:0           1  a.out    | pthread_create thread_number 2, ncpu 2
15:32:42.212298  31044:2           2  a.out    | new_thread
15:32:42.212630  31044:0           1  a.out    | pthread_create thread_number 3, ncpu 4
15:32:42.212717  31044:3           4  a.out    | new_thread
15:32:42.213082  31044:0           1  a.out    | pthread_create thread_number 4, ncpu 5
15:32:42.213167  31044:4           5  a.out    | new_thread
15:32:54.709509  31044:0           1  a.out    | exit

Sample Output of placement.out

PSR COMMAND TIME     PID   PPID  LWP
  1 a.out   00:00:02 31044 31039 31044
  0 a.out   00:00:00 31044 31039 31046
  2 a.out   00:00:02 31044 31039 31047
  4 a.out   00:00:01 31044 31039 31048
  5 a.out   00:00:01 31044 31039 31049

Note: Intel OpenMP jobs use an extra thread that is unknown to the user and does not need to be placed. In the above example, this extra thread (31046) is running on core number 0.

For MPI Codes Built with HPE's MPT Library

With HPE's MPT, only 1 shepherd process is created for the entire pool of MPI processes, and the proper way of pinning using dplace is to skip the shepherd process.

Here is an example for Endeavour:

#PBS -l ncpus=8
....

mpiexec -np 8 dplace -s1 -c 0-7 ./a.out

On Pleiades, if the number of processes in each node is less than the number of cores in that node, choose how you want to place the processes carefully. For example, the following placement works well on Harpertown nodes, but not on other Pleiades processor types:

#PBS -l select=2:ncpus=8:mpiprocs=4
...
mpiexec -np 8 dplace -s1 -c 2,4,1,5 ./a.out

To check the placement, you can set MPI_DSM_VERBOSE, which prints the placement in the PBS stderr file:

#PBS -l select=2:ncpus=8:mpiprocs=4
...
setenv MPI_DSM_VERBOSE
mpiexec -np 8 dplace -s1 -c 2,4,1,5 ./a.out

Output in PBS stderr File

MPI: DSM information
grank lrank pinning  node name  cpuid
    0     0     yes  r75i2n13       1
    1     1     yes  r75i2n13       2
    2     2     yes  r75i2n13       4
    3     3     yes  r75i2n13       5
    4     0     yes  r87i2n6        1
    5     1     yes  r87i2n6        2
    6     2     yes  r87i2n6        4
    7     3     yes  r87i2n6        5

If you use the -o log_file flag of dplace, the CPUs where the processes/threads are placed will be printed, but the node names are not printed.

#PBS -l select=2:ncpus=8:mpiprocs=4
....
mpiexec -np 8 dplace -s1 -c 2,4,1,5 -o log_file ./a.out

Output in log_file

timestamp        process:thread  cpu  taskname | placement action
15:16:35.848646  19807             -  dplace   | exec ./new_pi, ncpu -1
15:16:35.877584  19807:0           -  a.out    | load, cpu -1
15:16:35.878256  19807:0           -  a.out    | fork -> pid 19810, ncpu 1
15:16:35.879496  19807:0           -  a.out    | fork -> pid 19811, ncpu 2
15:16:35.880053  22665:0           -  a.out    | fork -> pid 22672, ncpu 2
15:16:35.880628  19807:0           -  a.out    | fork -> pid 19812, ncpu 4
15:16:35.881283  22665:0           -  a.out    | fork -> pid 22673, ncpu 4
15:16:35.882536  22665:0           -  a.out    | fork -> pid 22674, ncpu 5
15:16:35.881960  19807:0           -  a.out    | fork -> pid 19813, ncpu 5
15:16:57.258113  19810:0           1  a.out    | exit
15:16:57.258116  19813:0           5  a.out    | exit
15:16:57.258215  19811:0           2  a.out    | exit
15:16:57.258272  19812:0           4  a.out    | exit
15:16:57.260458  22672:0           2  a.out    | exit
15:16:57.260601  22673:0           4  a.out    | exit
15:16:57.260680  22674:0           5  a.out    | exit
15:16:57.260675  22671:0           1  a.out    | exit

For MPI Codes Built with MVAPICH2 Library

With MVAPICH2, 1 shepherd process is created for each MPI process. You can use ps -L -u your_userid on the running node to see these processes. To properly pin MPI processes using dplace, you cannot skip the shepherd processes and must use the following:

mpiexec -np 4 dplace -c2,4,1,5 ./a.out

Using the mbind Tool for Pinning

The mbind utility is a "one-stop" tool for binding processes and threads to CPUs. It can also be used to track memory usage. The utility, developed at NAS, works for MPI, OpenMP, and hybrid applications, and is available in the /u/scicon/tools/bin directory on Pleiades.

Recommendation: Add /u/scicon/tools/bin to the PATH environment variable in your startup configuration file to avoid having to include the entire path in the command line.
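For csh/tcsh users, this could be a line in the shell startup file, for example (a sketch; adapt to your own shell and configuration):

# prepend the NAS tools directory to the search path
set path = ( /u/scicon/tools/bin $path )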

One of the benefits of mbind is that it relieves you from having to learn the complexity of each individual pinning approach for associated MPI or OpenMP libraries. It provides a uniform usage model that works for multiple MPI and OpenMP environments.

Currently supported MPI and OpenMP libraries are listed below.

MPI:

• HPE-MPT
• MVAPICH2
• INTEL-MPI
• OPEN-MPI (including Mellanox HPC-X MPI)
• MPICH

Note: When using mbind with HPE-MPT, it is highly recommended that you use MPT 2.17r13, 2.21 or a later version in order to take full advantage of mbind capabilities.

OpenMP:

• Intel OpenMP runtime library
• GNU OpenMP library
• PGI runtime library
• Oracle Developer Studio thread library

Starting with version 1.7, the use of mbind is no longer limited to cases where the same set of CPU lists is used for all processor nodes. However, as in previous versions, the same number of threads must be used for all processes.

WARNING: The mbind tool might not work properly when used together with other performance tools.

Syntax

#For OpenMP:
mbind.x [-options] program [args]

#For MPI or MPI+OpenMP hybrid which supports mpiexec:
mpiexec -np nranks mbind.x [-options] program [args]

To find information about all available options, run the command mbind.x -help.

Here are a few recommended mbind options:

-cs, -cp, -cc; or -ccpulist
  -cs for spread (default), -cp for compact, -cc for cyclic; -ccpulist for process ranks (for example, -c0,3,6,9). CPU numbers in the cpulist are relative within a cpuset, if present. Note that the -cs option will distribute the processes and threads among the physical cores to minimize various resource contentions, and is usually the best choice for placement.
-n[n]
  Number of processes per node. By default, the number of processes per node is automatically determined by the number of processes identified by the MPI library if possible, or by the number of cores in a node. Explicit use of this option is not required in most cases.
-t[n]
  Number of threads per process. The default value is given by the OMP_NUM_THREADS environment variable; this option overrides the value specified by OMP_NUM_THREADS.
-gm[n]
  Print memory usage information for each process at the end of a run. The optional value [n] selects one of the memory usage types: 0=default, 1=VmHWM, 2=VmRSS, 3=WRSS. Recognized symbolic values for [n] are "hwm", "rss", or "wrss". For the default, the environment variable GM_TYPE may be used to select the memory usage type:
  ◊ VmHWM - high water mark
  ◊ VmRSS - resident memory size
  ◊ WRSS - weighted memory usage (if available; else, same as VmRSS)
-gmc[s:n]
  Print memory usage every [s] seconds for [n] times. The -gmc option indicates continuous printing of memory usage at a default interval of 5 seconds. Use the additional option [s:n] to control the interval length [s] and the number of printing times [n]. The environment variable GM_TIMER may also be used to set the [s:n] value.
-gmr[list]
  Print memory usage for selected ranks in the list. This option controls the subset of ranks for which memory usage is printed. [list] is a comma-separated group of numbers, with ranges allowed.
-l
  Print node information. This option prints a quick summary of node information by calling the clist.x utility.
-v[n]
  Verbose flag. Option -v or -v1 prints the process/thread-CPU binding information. With [n] greater than 1, the option prints additional debugging information; [n] controls the level of detail. Default is n=0 (OFF).
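For example, placement reporting and end-of-run memory reporting might be combined like this (a sketch; the executable and rank count are placeholders):

# print binding information and each rank's high-water-mark memory usage
mpiexec -np 8 mbind.x -cs -v -gm1 ./a.out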

Examples

Print a Processor Node Summary to Help Determine Proper Process or Thread Pinning

In your PBS script, add the following to print the summary:

#PBS -lselect=...:model=bro
mbind.x -l

...

In the sample output below for a Broadwell node, look for the listing under column CPUs(SMTs). CPUs listed in the same row are located in the same socket and share the same last level cache, as shown in this configuration diagram.

Host Name        : r601i0n3
Processor Model  : Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz
Processor Speed  : 1600 MHz (max 2401 MHz)

Level 1 Cache (D)  : 32 KB
Level 1 Cache (I)  : 32 KB
Level 2 Cache (U)  : 256 KB
Level 3 Cache (U)  : 35840 KB (shared by 14 cores)
SMP Node Memory    : 125.0 GB (122.3 GB free, 2 mem nodes)

Number of Sockets   : 2
Number of L3 Caches : 2
Number of Cores     : 28
Number of SMTs/Core : 2
Number of CPUs      : 56

Socket  Cache  Cores     CPUs(SMTs)
0       0      0-6,8-14  (0,28)(1,29)(2,30)(3,31)(4,32)(5,33)(6,34)(7,35)(8,36)
                         (9,37)(10,38)(11,39)(12,40)(13,41)
1       0      0-6,8-14  (14,42)(15,43)(16,44)(17,45)(18,46)(19,47)(20,48)(21,49)
                         (22,50)(23,51)(24,52)(25,53)(26,54)(27,55)

For Pure OpenMP Codes Using Intel OpenMP Library

Sample PBS script:

#PBS -l select=1:ncpus=28:model=bro
#PBS -l walltime=0:5:0
module load comp-intel
setenv OMP_NUM_THREADS 4
cd $PBS_O_WORKDIR
mbind.x -cs -t4 -v ./a.out
#or simply:
#mbind.x -v ./a.out

The four OpenMP threads are spread (with the -cs option) among four physical cores in a node (two on each socket), as shown in the application's stdout:

host: r635i7n14, ncpus: 56, nthreads: 4, bound to cpus: {0,1,14,15}
OMP: Warning #181: GOMP_CPU_AFFINITY: ignored because KMP_AFFINITY has been defined

The proper placement is further demonstrated in the output of the ps command below:

r635i7n14% ps -C a.out -L -opsr,comm,time,pid,ppid,lwp
PSR COMMAND TIME     PID   PPID  LWP
  0 a.out   00:02:06 37243 36771 37243
  1 a.out   00:02:34 37243 36771 37244
 14 a.out   00:01:47 37243 36771 37245
 15 a.out   00:01:23 37243 36771 37246

Note: If you use older versions of Intel OpenMP via older versions of Intel compiler modules (comp-intel/2016.181 or earlier) during runtime, the ps output will show an extra thread that does not do any work, and therefore does not accumulate any time. Since this extra thread will not interfere with the other threads, it does not need to be placed.

For Pure MPI Codes Using HPE MPT

WARNING: mbind.x disables MPI_DSM_DISTRIBUTE and overwrites the placement initially performed by MPT's mpiexec. The placement output from MPI_DSM_VERBOSE (if set) is most likely incorrect and should be ignored.

Sample PBS script where the same number of MPI ranks is used on each node:

#PBS -l select=2:ncpus=28:mpiprocs=4:model=bro
module load comp-intel
module load mpi-hpe

#setenv MPI_DSM_VERBOSE
cd $PBS_O_WORKDIR
mpiexec -np 8 mbind.x -cs -n4 -v ./a.out
#or simply:
#mpiexec mbind.x -v ./a.out

On each of the two nodes, four MPI processes are spread among four physical cores (CPUs 0, 1, 14, 15), two on each socket, as shown in the application's stdout:

host: r601i0n3, ncpus: 56, process-rank: 0 (r0), bound to cpu: 0
host: r601i0n3, ncpus: 56, process-rank: 1 (r1), bound to cpu: 1
host: r601i0n3, ncpus: 56, process-rank: 2 (r2), bound to cpu: 14
host: r601i0n3, ncpus: 56, process-rank: 3 (r3), bound to cpu: 15
host: r601i0n4, ncpus: 56, process-rank: 4 (r0), bound to cpu: 0
host: r601i0n4, ncpus: 56, process-rank: 5 (r1), bound to cpu: 1
host: r601i0n4, ncpus: 56, process-rank: 6 (r2), bound to cpu: 14
host: r601i0n4, ncpus: 56, process-rank: 7 (r3), bound to cpu: 15

Note: For readability in this article, the printout of the binding information from mbind.x is sorted by the process-rank. An actual printout will not be sorted.

Sample PBS script where different numbers of MPI ranks are used on different nodes:

#PBS -l select=1:ncpus=28:mpiprocs=1:model=bro+2:ncpus=28:mpiprocs=4:model=bro
module load comp-intel
module load mpi-hpe

#setenv MPI_DSM_VERBOSE
cd $PBS_O_WORKDIR
mpiexec -np 9 mbind.x -cs -v ./a.out
#Or simply:
#mpiexec mbind.x -v ./a.out

As shown in the application's stdout, only one MPI process is used on the first node and it is pinned to CPU 0 on that node. For each of the other two nodes, four MPI processes are spread among four physical cores (CPUs 0, 1, 14, 15):

host: r601i0n3, ncpus: 56, process-rank: 0 (r0), bound to cpu: 0
host: r601i0n4, ncpus: 56, process-rank: 1 (r0), bound to cpu: 0
host: r601i0n4, ncpus: 56, process-rank: 2 (r1), bound to cpu: 1
host: r601i0n4, ncpus: 56, process-rank: 3 (r2), bound to cpu: 14
host: r601i0n4, ncpus: 56, process-rank: 4 (r3), bound to cpu: 15
host: r601i0n12, ncpus: 56, process-rank: 5 (r0), bound to cpu: 0
host: r601i0n12, ncpus: 56, process-rank: 6 (r1), bound to cpu: 1
host: r601i0n12, ncpus: 56, process-rank: 7 (r2), bound to cpu: 14
host: r601i0n12, ncpus: 56, process-rank: 8 (r3), bound to cpu: 15

For MPI+OpenMP Hybrid Codes Using HPE MPT and Intel OpenMP

Sample PBS script:

#PBS -l select=2:ncpus=28:mpiprocs=4:model=bro
module load comp-intel
module load mpi-hpe
setenv OMP_NUM_THREADS 2
#setenv MPI_DSM_VERBOSE
cd $PBS_O_WORKDIR
mpiexec -np 8 mbind.x -cs -n4 -t2 -v ./a.out
#or simply:

#mpiexec mbind.x -v ./a.out

On each of the two nodes, the four MPI processes are spread among the physical cores. The two OpenMP threads of each MPI process run on adjacent physical cores, as shown in the application's stdout:

host: r623i5n2, ncpus: 56, process-rank: 0 (r0), nthreads: 2, bound to cpus: {0,1}
host: r623i5n2, ncpus: 56, process-rank: 1 (r1), nthreads: 2, bound to cpus: {2,3}
host: r623i5n2, ncpus: 56, process-rank: 2 (r2), nthreads: 2, bound to cpus: {14,15}
host: r623i5n2, ncpus: 56, process-rank: 3 (r3), nthreads: 2, bound to cpus: {16,17}
host: r623i6n9, ncpus: 56, process-rank: 4 (r0), nthreads: 2, bound to cpus: {0,1}
host: r623i6n9, ncpus: 56, process-rank: 5 (r1), nthreads: 2, bound to cpus: {2,3}
host: r623i6n9, ncpus: 56, process-rank: 6 (r2), nthreads: 2, bound to cpus: {14,15}
host: r623i6n9, ncpus: 56, process-rank: 7 (r3), nthreads: 2, bound to cpus: {16,17}

You can confirm this by running the following ps command line on the running nodes. Note that the HPE MPT library creates a shepherd process (shown running on PSR=18 in the output below), which does not do any work.

r623i5n2% ps -C a.out -L -opsr,comm,time,pid,ppid,lwp
PSR COMMAND TIME     PID   PPID  LWP
 18 a.out   00:00:00 41087 41079 41087
  0 a.out   00:00:12 41092 41087 41092
  1 a.out   00:00:12 41092 41087 41099
  2 a.out   00:00:12 41093 41087 41093
  3 a.out   00:00:12 41093 41087 41098
 14 a.out   00:00:12 41094 41087 41094
 15 a.out   00:00:12 41094 41087 41097
 16 a.out   00:00:12 41095 41087 41095
 17 a.out   00:00:12 41095 41087 41096

For Pure MPI or MPI+OpenMP Hybrid Codes Using other MPI Libraries and Intel OpenMP

Usage of mbind with MPI libraries such as HPC-X or Intel-MPI should be the same as with HPE MPT. The main difference is that you must load the proper mpi modulefile, as follows:

• For HPC-X:

  module load mpi-hpcx

• For Intel-MPI:

  module use /nasa/modulefiles/testing
  module load mpi-intel

Note that the Intel MPI library automatically pins processes to CPUs to prevent unwanted process migration. If you find that the placement done by the Intel MPI library is not optimal, you can use mbind to do the pinning instead. If you use version 4.0.2.003 or earlier, you might need to set the environment variable I_MPI_PIN to 0 in order for mbind.x to work properly.
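For csh/tcsh, that workaround might look like the following (a sketch; it is only needed with those older Intel MPI versions, and the executable and rank count are placeholders):

setenv I_MPI_PIN 0
mpiexec -np 8 mbind.x -cs -v ./a.out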

The mbind utility was created by NAS staff member Henry Jin.
