Grid Engine Users's Guide
Total Page:16
File Type:pdf, Size:1020Kb
Univa Corporation Grid Engine Documentation Grid Engine Users’s Guide Author: Version: Univa Engineering 8.5.3 July 19, 2017 Copyright ©2012–2017 Univa Corporation. All rights reserved. Contents Contents 1 Overview of Basic User Tasks1 2 A Simple Workflow Example1 3 Displaying Univa Grid Engine Status Information5 3.1 Cluster Overview....................................5 3.2 Hosts and Queues...................................5 3.3 Requestable Resources.................................8 3.4 User Access Permissions and Affiliations....................... 10 4 Submitting Batch Jobs 13 4.1 What is a Batch Job?................................. 13 4.2 How to Submit a Batch Job.............................. 13 4.2.1 Example 1: A Simple Batch Job....................... 13 4.2.2 Example 2: An Advanced Batch Job..................... 14 4.2.3 Example 3: Another Advanced Batch Job.................. 14 4.2.4 Example 4: A Simple Binary Job....................... 15 4.3 Specifying Requirements................................ 15 4.3.1 Request Files.................................. 16 4.3.2 Requests in the Job Script........................... 17 5 Using Job Classes to Prepare Templates for Jobs 17 5.1 Examples Motivating the Use of Job Classes..................... 18 5.2 Defining Job Classes.................................. 19 5.2.1 Attributes describing a Job Class....................... 20 5.2.2 Example 1: Job Classes - Identity, Ownership, Access........... 22 5.2.3 Attributes to Form a Job Template...................... 22 5.2.4 Example 2: Job Classes - Job Template................... 25 5.2.5 Access Specifiers to Allow Deviation..................... 26 5.2.6 Example 3: Job Classes - Access Specifiers................. 28 5.2.7 Different Variants of the same Job Class................... 29 5.2.8 Example 4: Job Classes - Multiple Variants................. 30 5.2.9 Enforcing Cluster Wide Requests with the Template Job Class...... 31 Grid Engine Users’s Guide v 8.5.3 i Contents 5.3 Relationship Between Job Classes and Other Objects................ 33 5.3.1 Resources Available for Job Classes...................... 33 5.3.2 Defining Job Class Limits........................... 34 5.3.3 JSV and Job Class Interaction........................ 34 5.4 Commands to Adjust Job Classes........................... 35 5.4.1 Creating, Modifying and Deleting Job Classes................ 35 5.4.2 States of Job Classes.............................. 36 5.5 Using Job Classes to Submit New Jobs........................ 37 5.6 Example: Submit a Job Class Job and Adjust Some Parameters......... 38 5.7 Status of Job Classes and Corresponding Jobs.................... 39 6 Monitoring and Controlling Jobs 40 6.1 Getting Status Information on Jobs......................... 40 6.2 Deleting a Job..................................... 41 6.3 Re-queuing a Job.................................... 42 6.4 Modifying a Waiting Job................................ 43 6.4.1 Altering Job Requirements.......................... 43 6.5 Changing Job Priority................................. 44 6.6 Obtaining the Job History............................... 45 7 Other Job Types 46 7.1 Array Jobs....................................... 46 7.2 Interactive Jobs..................................... 48 7.2.1 qrsh and qlogin................................. 49 7.2.2 qtcsh....................................... 49 7.2.3 qmake...................................... 50 7.2.4 qsh........................................ 51 7.3 Parallel Jobs...................................... 51 7.3.1 Parallel Environments............................. 51 7.3.2 Submitting Parallel Jobs............................ 53 7.3.3 Parallel Jobs and Core Binding........................ 54 7.4 Jobs with Core Binding................................ 55 7.4.1 Showing Execution Host Topology Related Information.......... 56 7.4.2 Requesting Execution Hosts Based on the Architecture.......... 57 Grid Engine Users’s Guide v 8.5.3 ii Contents 7.4.3 Requesting Specific Cores........................... 58 7.5 NUMA Aware Jobs: Jobs with Memory Binding and Enhanced Memory Management 58 7.5.1 Memory Allocation Strategy round_robin.................. 60 7.5.2 Memory Allocation Strategy cores and cores:strict............. 61 7.5.3 Memory Allocation Strategy nlocal...................... 63 7.6 Checkpointing Jobs................................... 64 7.6.1 User-Level Checkpointing........................... 65 7.6.2 Kernel-Level Checkpointing.......................... 65 7.6.3 Checkpointing Environments......................... 65 7.6.4 Submitting a Checkpointing Job....................... 66 7.7 Immediate Jobs..................................... 66 7.8 Reservations....................................... 67 7.8.1 Advance Reservations............................. 67 7.8.2 Standing Reservations............................. 69 7.9 Jobs using Docker Containers............................. 75 7.9.1 Running a sequential job in a Docker container............... 76 7.9.2 Running a parallel Job in Docker containers................. 79 7.9.3 Running an array Job in Docker containers................. 79 7.9.4 Running a Job in a Docker image that is not available locally....... 79 7.9.5 Using placeholders to dynamically define Docker options.......... 79 8 Getting a Consistent View onto the System by Using Sessions 80 8.1 Communication with Univa Grid Engine without using Sessions.......... 81 8.2 Using sessions to communicate with the system................... 81 9 Submission, Monitoring and Control via an API 83 9.1 The Distributed Resource Management Application API (DRMAA)....... 83 9.2 Basic DRMAA Concepts................................ 84 9.3 Supported DRMAA Versions and Language Bindings................ 84 9.4 When to Use DRMAA................................. 84 9.5 Examples........................................ 84 9.5.1 Building a DRMAA Application with C**.................. 84 9.5.2 Building a DRMAA Application with Java................. 87 9.6 Further Information.................................. 89 Grid Engine Users’s Guide v 8.5.3 iii Contents 10 Advanced Concepts 89 10.1 Job Dependencies.................................... 90 10.1.1 Examples.................................... 90 10.2 Using Environment Variables............................. 92 10.3 Using the Job Context................................. 95 10.4 Transferring Data.................................... 95 10.4.1 Transferring Data within the Job Script................... 96 10.4.2 Using Delegated File Staging in DRMAA Applications........... 96 10.5 Manual, Semi-Automatic and Automatic Preemption................ 97 10.5.1 Preemption Terms............................... 98 10.5.2 Preemption Trigger and Actions....................... 98 10.5.3 Manual Preemption.............................. 100 10.5.4 Preemption Configuration........................... 101 10.5.5 Preemption in Combination with License Orchestrator........... 102 10.5.6 Common Use Cases.............................. 102 11 Submitting Jobs from or to Windows hosts 104 11.1 Job submission from a Windows submit host to a Windows execution host.... 105 11.1.1 Running Jobs in the foreground........................ 106 11.2 Job submission from an UNIX submit host to a Windows execution host..... 107 11.3 Job submission from a Windows submit host to an UNIX execution host..... 107 Grid Engine Users’s Guide v 8.5.3 iv 2 A Simple Workflow Example 1 Overview of Basic User Tasks Univa Grid Engine offers the following basic commands, tools and activities to accomplish common user tasks in the cluster: Task Command submit jobs qsub, qresub, qrsh, qlogin, qsh, qmake, qtcsh check job status qstat modify jobs qalter, qhold, qrls delete jobs qdel check job accounting after job end qacct display cluster state qstat, qhost, qselect, qquota display cluster configuration qconf Table 1: Basic tasks and their corresponding commands Note qsh is not available on Microsoft Windows submit hosts and a qsh cannot be submitted to Windows execution hosts. The next sections provide detailed descriptions of how to use these commands in a Univa Grid Engine cluster. 2 A Simple Workflow Example Using Univa Grid Engine from the command line requires sourcing the settings file to set all nec- essary environment variables. The settings file is located in the <UGE installation path>/<UGE cell>/common directory. This directory contains two settings files for Unix: settings.sh for Bourne shell, bash and compatible shells, and settings.csh for csh and tcsh. If a Windows execution, submit or admin host is part of the Univa Grid Engine cluster, there is also a settings.bat for the Windows console (also known as cmd.exe window). For simplicity, this document refers to the <UGE installation path> as $SGE_ROOT and the <UGE_CELL> as $SGE_CELL. Both environment variables are set when the settings file is sourced. Source the settings file. Choose one of the following commands to execute based on the shell type in use. Bourne shell/bash: # . $SGE_ROOT/$SGE_CELL/common/settings.sh csh/tcsh: Grid Engine Users’s Guide v 8.5.3 1 2 A Simple Workflow Example # source $SGE_ROOT/$SGE_CELL/common/settings.csh Windows console: > %SGE_ROOT%\%SGE_CELL%\common\settings.bat Now that the shell is set up to work with Univa Grid Engine, it is possible to check which hosts are available in the cluster by running the qhost command. Sample qhost output: # qhost HOSTNAME ARCH NCPU LOAD MEMTOT MEMUSE SWAPTO SWAPUS --------------------------------------------------------------------- global - - - - - - - kailua