
Supercomputers: Queue and Job management
HORT 59000, Lecture 6
Instructor: Kranthi Varala

Client/server architecture

[Diagram: User1–User4 connect over the network to a central server (file, web, etc.).]

When to use

• Need to run hundreds to thousands of similar jobs.

• Need to run a few large jobs quickly.

• Tasks can be divided into smaller portions and run in parallel.

Parallelization

• Refers to dividing a large task into smaller parts that can all be run in parallel.

• E.g., Correlation matrix of 10,000 genes.

• Can be divided into 10,000 jobs where each job works on one gene.
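The per-gene split described above can be sketched as a submission loop. This is only a sketch: the script name correlate_gene.sh and the GENE variable are hypothetical, and the qsub command is echoed rather than executed so the sketch can run anywhere:

```shell
#!/bin/sh
# Sketch: split the 10,000-gene correlation into one job per gene.
# "correlate_gene.sh" is a hypothetical per-gene job script; on a real
# cluster the echo below would be replaced by the qsub command itself.
for i in $(seq 1 3)   # in practice: seq 1 10000
do
    echo "qsub -v GENE=$i correlate_gene.sh"
done
```

Each iteration submits an independent job, so the scheduler can spread the 10,000 pieces across whatever nodes are free.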

• Need to balance the increase in computational power against the increased need for communication between parts.

Parallelization: Ideal vs. Real

By Raul654 (Own work) [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons

Supercomputer architecture

[Diagram: head node connected to compute nodes 1–8; storage servers provide home and scratch space.]

Supercomputer architecture

[Diagram: head node connected to compute nodes 1–8; storage servers provide home, scratch, and archive space.]

Terminology

• Cluster: Complex of multiple “nodes” + connecting hardware + storage

• Node: Individual computer in the cluster with its own processors and memory.

• Core: Individual processor on a node.

Head Node

• Specialized node that handles user accounts, logins, and network storage.

• Built to handle lots of user interactions and to run the job management software.

• NOT optimized to run compute intensive jobs.

• Typically interacts with archival storage.

Compute Node

• Multiple identical nodes that handle the heavy computational load and large-memory jobs.

• NOT optimized to support user interaction.

• Typically has access to home and scratch space only.

Homogeneous vs. Heterogeneous nodes

• Homogeneous: All compute nodes have identical software and hardware.

• Heterogeneous: Different kinds of compute nodes with the same software but large differences in hardware.

• Heterogeneous clusters support multiple job types, such as compute-intensive vs. memory-intensive vs. I/O-intensive jobs.

Job Management

• User submits “jobs” to the cluster with specific requests for cores, memory, walltime, etc.

• Once submitted, jobs are allocated to nodes by the job management software.

• Each job creates its own login shell and runs within it.

• E.g., PBS, SGE, SLURM, etc.

PBS Job Management

• Portable Batch System (PBS) is a very common job management system, used on all Purdue RCAC clusters.

• Each “job” is essentially a shell script that has a series of commands.
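A minimal sketch of such a job script, with illustrative resource values (the directive numbers, the cd target, and the echoed message are made up for this example):

```shell
#!/bin/sh -l
# Minimal PBS job script sketch; the resource values below are illustrative.
#PBS -l nodes=1:ppn=4        # 1 node, 4 cores
#PBS -l walltime=01:00:00    # expected run time: 1 hour
#PBS -l mem=8gb              # total memory estimate

# The commands below run on the allocated compute node, top to bottom:
cd "$PBS_O_WORKDIR"          # PBS sets this to the submission directory
echo "Hello from job $PBS_JOBID"
```

Submitted with, e.g., qsub myjob.sh (filename illustrative); qsub prints the ID assigned to the job.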

• Jobs are allocated based on resources available to the user.

PBS resources

• Nodes: No. of nodes requested for the job.

• Cores: No. of cores requested for the job. Total number cannot exceed the sum of cores available on all nodes requested.

• Memory: Total memory required for the job (estimate).

• Walltime: Expected run time for the job.

• NOTE: A job will be terminated if it exceeds the time requested and/or runs out of memory.

PBS queues

• Queues are how PBS manages job submission.

• Each queue has a set of properties: no. and/or types of nodes available to it, max. run time, user access permissions, priorities, etc.

• Each job is submitted by the user to a specific queue.

• When a node that fits the user's requirements becomes available, the job in the queue is run.
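Submitting to a particular queue can be sketched as below; the queue name standby and the script name myjob.sh are illustrative, and these commands require a PBS cluster to actually run:

```shell
qsub -q standby myjob.sh   # submit the script to the (illustrative) "standby" queue
qstat -a                   # check the status of queued and running jobs
```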

• Multiple queues may be available to you.

• Get a list of available queues by using the PBS command qlist.

• qstat -a will list the status of the queue.

Scholar queues

• Three queues are available.

• Our jobs go to the scholar queue by default.

PBS environment variables

• Job environment variables are specific to the job and exist only while the job is running.

• PBS_JOBID : ID assigned to the current job.

• PBS_O_WORKDIR : The directory from which the job was submitted.

Example PBS job

#!/bin/sh -l
#PBS -l nodes=1:ppn=24
#PBS -l walltime=24:00:00
#PBS -l naccesspolicy=singleuser
#PBS -q kvarala

cd /scratch/brown/kvarala/EvoNet
mpirun -np 24 /home/kvarala/bin/examl-AVX -t RAxML_parsimonyTree.StartingTree -m PSR -s EvoNet.m10.binary -n T1 > ExaML.log

Example SGE job

#!/bin/bash
#$ -N run_blastp
#$ -cwd
#$ -pe smp 8
#$ -l h_vmem=4G

blastp -a 8 -i infile.fasta -d database.db -o outfile

Example SLURM job

#!/bin/bash
#SBATCH --nodes 1
#SBATCH --tasks-per-node 12
#SBATCH -t 4:00:00
#SBATCH --mem 8GB

cd /scratch/kv15/subCluster
makeblastdb -dbtype 'prot' -in Gymno.Family1 -title GYMFAM1 -out GYMFAM1 -parse_seqids
blastp -db GYMFAM1 -query Gymno.Family1 -outfmt 6 -evalue 0.01 -num_threads 12 -out GYMFAM1.blastout

Modules

• Pre-installed software that is loaded when needed.

• E.g., module load blastall

• module avail shows the list of all modules available; module list shows the modules currently loaded.

• Only load the modules you need to run the current job.
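A typical sequence looks like the following; the blastall module name comes from the example above, and module availability varies by cluster, so these commands assume an Environment Modules setup:

```shell
module avail            # list every module installed on the cluster
module load blastall    # put the BLAST binaries on your PATH
module list             # confirm which modules are currently loaded
module unload blastall  # drop the module when it is no longer needed
```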