Running Jobs with PBS: Table of Contents

Running Jobs with PBS
  Portable Batch System (PBS): Overview
  How PBS Schedules Jobs
  Commonly Used PBS Commands
  Commonly Used qsub Command Options
  PBS on Aitken
    Preparing to Run on Aitken Rome Nodes
    Preparing to Run on Aitken Cascade Lake Nodes
    PBS Resource Request Examples
    PBS Job Queue Structure
  PBS on Electra
    Preparing to Run on Electra Skylake Nodes
    Preparing to Run on Electra Broadwell Nodes
    Running Jobs Before Dedicated Time
    Checking the Time Remaining in a PBS Job from a Fortran Code
    Releasing Idle Nodes from Running Jobs
    PBS Resource Request Examples
    Running "At-Scale" Jobs in the sky_wide Queue
    PBS Job Queue Structure
  PBS on Pleiades
    Preparing to Run on Pleiades Broadwell Nodes
    Preparing to Run on Pleiades Haswell Nodes
    Preparing to Run on Pleiades Ivy Bridge Nodes
    Preparing to Run on Pleiades Sandy Bridge Nodes
    Sample PBS Script for Pleiades
    Running Jobs Before Dedicated Time
    Checking the Time Remaining in a PBS Job from a Fortran Code
    Releasing Idle Nodes from Running Jobs
    Selecting a Non-Default Compute Image for Your Job
    PBS Resource Request Examples
    Pleiades devel Queue
    PBS Job Queue Structure
  PBS on Endeavour
    Preparing to Run on Endeavour
    Endeavour PBS Server and Queue Structure
    Running Jobs Before Dedicated Time
    Checking the Time Remaining in a PBS Job from a Fortran Code
    Selecting a Non-Default Compute Image for Your Job
  PBS on GPU Nodes
    Changes to PBS Job Requests for V100 GPU Resources
    Using GPU Nodes
    PBS Job Queue Structure
    Requesting GPU Resources
  Managing Jobs
    Reserving a Dedicated Compute Node
    Running Jobs Before Dedicated Time
    Checking the Time Remaining in a PBS Job from a Fortran Code
    Releasing Idle Nodes from Running Jobs
  Monitoring with myNAS
    Installing the myNAS Mobile App
    Using the myNAS Mobile App
    Authenticating to NASA's Access Launchpad
    Using the myNAS Web Portal
  PBS Reference
    PBS Environment Variables
    Default Variables Set by PBS
  Optimizing/Troubleshooting
    Common Reasons Why Jobs Won't Start
    Using pdsh_gdb for Debugging PBS Jobs
    Lustre Filesystem Statistics in PBS Output File
    Adjusting Resource Limits for Login Sessions and PBS Jobs
    PBS exit codes
    How to Run a Graphics Application in a PBS Batch Job
    MPT Startup Failures: Workarounds
    Checking if a PBS Job was Killed by the OOM Killer
    Common Reasons for Being Unable to Submit Jobs
    Avoiding Job Failure from Overfilling /PBS/spool
    Packaging Multiple Runs
      Using HPE MPT to Run Multiple Serial Jobs
      Using GNU Parallel to Package Multiple Jobs in a Single PBS Job
      Using Job Arrays to Group Related Jobs
      Running Multiple Small-Rank MPI Jobs with GNU Parallel and PBS Job Arrays
    Managing Memory
      Checking and Managing Page Cache Usage
      Using gm.x to Check Memory Usage
      Using vnuma to Check Memory Usage and Non-Local Memory Access
      Using qsh.pl to Check Memory Usage
      Using qtop.pl to Check Memory Usage
      Using qps to Check Memory Usage
      Memory Usage Overview
      How to Get More Memory for your PBS Job
    Maximizing Use of Resources
      Checkpointing and Restart
    Process/Thread Pinning
      Instrumenting Your Fortran Code to Check Process/Thread Placement

Details

  • File Type
    pdf
  • Content Languages
    English
  • File Pages
    182 pages
