
Research Computing and CyberInfrastructure team

Working with the Shell on the CWRU HPC
31 January 2019 & 7 February 2019
Presenter: Emily Dragowsky

KSL Data Center

RCCI Team: Roger Bielefeld, Mike Warfe, Hadrian Djohari, Brian Christian, Emily Dragowsky, Jeremy Fondran, Cal Frye, Sanjaya Gajurel, Matt Garvey, Theresa Griegger, Cindy Martin, Sean Maxwell, Jeno Mozes, Nasir Yilmaz, Lee Zickel

Preface: Prepare your environment

• User account
  # .bashrc ## cli essentials
  if [ -t 1 ]
  then
    bind '"\e[A": history-search-backward'
    bind '"\e[B": history-search-forward'
    bind '"\eOA": history-search-backward'
    bind '"\eOB": history-search-forward'
  fi
• This is too useful to pass up!

Working with Linux

• Preamble
• Intro Session
• Linux Review: Finding our way
• Files & Directories: Sample work flow
• Shell Commands
• Pipes & Redirection
• Scripting Foundations
• Shell & environment variables
• Control Structures
• Regular expressions & text manipulation
• Recap & Look Ahead

Rider Cluster Components

[Diagram: Rider cluster layout. rider.case.edu and ondemand.case.edu sit behind the University Firewall; Admin Nodes, the SLURM Master, the Resource Manager, and Data Transfer (DMZ) nodes connect the Science Nodes and Disk Storage; the science nodes comprise Batch nodes, GPU nodes, and SMP nodes.]

Running a job: where is it?

(slide from Hadrian Djohari)

Storing Data on the HPC

(table from Nasir Yilmaz)

How data moves across campus

• Buildings on campus are each assigned to a zone. Data connections run from every building to the distribution switch at the center of its zone, and from there to the data centers at Kelvin Smith Library and Crawford Hall.

(slide from Cindy Martin)

Questions about HPC context?


(slide from Cindy Martin)

Sample Work Flow

• Move Data to the Cluster
• Review the "SLURM" script
  • Resource allocation directives
  • Data management
• Submit the script to the Scheduler
• Monitor the job
  • View details with scheduler tools
• Review the data locations
• Assess Resource Usage

Sample Work Flow

• Working in a collective space
  • /home/ : be aware of other users in your group(s)
  • /scratch/pbsjobs/job..hpc : short-term storage for jobs
• Preparing scripts is a fundamental skill
  • aids in documentation and reproducibility
  • scripts for job submission (see the sketch below)
  • scripts to manage and evaluate data
• Disk usage & Quota management is very important
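To make the submission step concrete, here is a minimal sketch of what such a SLURM script could look like; the job name, the input/output file names, the run_analysis program, and the $SCRATCHDIR variable are hypothetical placeholders, not site defaults.

  #!/bin/bash
  #SBATCH --job-name=sample_job      # hypothetical job name
  #SBATCH --nodes=1                  # resource allocation directives
  #SBATCH --ntasks=1
  #SBATCH --mem=4G
  #SBATCH --time=01:00:00

  # Data management: stage input into job-local scratch, run, copy results back.
  # $SCRATCHDIR stands in for whatever scratch path your site provides.
  cp "$HOME/input.dat" "$SCRATCHDIR/"
  cd "$SCRATCHDIR"

  ./run_analysis input.dat > results.out

  cp results.out "$SLURM_SUBMIT_DIR/"

Submitting with sbatch and watching the queue with squeue -u $USER covers the "submit" and "monitor" steps of the work flow.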

Commands in the Shell

• Familiar Commands
  • the necessary: , , , , .
  • the useful: , , uniq, , date, []
  • the powerful: find, , ,
• "Built-In" Shell Commands
  alias, bg, bind, break, builtin, caller, cd, command, compgen, complete, compopt, continue, declare, dirs, disown, echo, enable, eval, exec, exit, export, false, fc, fg, getopts, hash, help, history, jobs, kill, let, local, logout, mapfile, popd, printf, pushd, pwd, read, readonly, return, set, shift, shopt, source, suspend, test, times, trap, true, type, typeset, ulimit, umask, unalias, unset, wait

Commands beyond the shell

• Content
  • cat — bring file contents to standard output
  • less — text buffer viewer (page forward/backward, search)
  • head/tail — bring the 'head' or 'tail' of a file to stdout
  • wc — print the number of words or lines in a text buffer
• HPC Utilities
  • quotachk (-grp/ -prj) — check disk usage for an account, group or project
  • remaining — determine time remaining until maintenance
  • si — cluster status script

Running Commands

• Command Line: standard input & output streams
  • stdin: 0< [filename]
  • stdout: 1>, >> [filename]
  • stderr: 2>, 2>> [filename]; 2>&1 redirects stderr to stdout
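As a quick illustration of those stream numbers, the lines below redirect stdout and stderr separately and then merge them (the file names are arbitrary):

  ls /etc /no/such/dir  > listing.txt 2> errors.txt   # stdout and stderr to separate files
  ls /tmp              >> listing.txt                 # append stdout to an existing file
  ls /etc /no/such/dir  > all_output.txt 2>&1         # stderr merged into stdout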

[Diagram: Standard Input (keyboard & mouse, files) feeds a program such as ls, grep, awk, or sed, which writes Standard Output and Standard Error to monitors or files.]

Redirection Examples - I

• Apply Redirection to animals.txt
• Comma-separated alphanumeric records
• 3 i/o streams: input, output & error
• Redirect standard input from a file
• Redirect output & error
  ➡ to a file
  ➡ act upon file contents, redirect to another stream (file or command)
  ➡ to the same stream (e.g. 2>&1)

  $ cat animals.txt
  2012-11-05,deer
  2012-11-05,rabbit
  2012-11-05,raccoon
  2012-11-06,rabbit
  2012-11-06,deer
  2012-11-06,fox
  2012-11-07,rabbit
  2012-11-07,bear

Redirection Examples - II

• Select lines with head, tail
• Replace content using >
  ➡ head -n 3 animals.txt > anHead.txt
     2012-11-05,deer
     2012-11-05,rabbit
     2012-11-05,raccoon
  ➡ head -n 2 animals.txt > anHead.txt
     2012-11-05,deer
     2012-11-05,rabbit

• Append content using >>
  ➡ tail -n 2 animals.txt > anTail.txt
     2012-11-07,rabbit
     2012-11-07,bear
  ➡ tail -n 3 animals.txt >> anTail.txt
     2012-11-07,rabbit
     2012-11-07,bear
     2012-11-06,fox
     2012-11-07,rabbit
     2012-11-07,bear

Redirection Examples - III

• Input Redirection <
  yachats:data-shell drago$ wc -l animals.txt
  8 animals.txt
  yachats:data-shell drago$ wc -l < animals.txt
  8

• 'as if' command line input
• [jump to command line]

Pipes

• Pipes: Workflows using input/output redirection
• command sequences to refine programmatic output

[Diagram: two programs (ls, grep, awk, sed) chained by a pipe; the Standard Output of the first feeds the Standard Input of the second, while each program's Standard Error still goes to monitors or files.]

Pipe (Text) Buffers

[Diagram: the pipe places a buffer in memory between the Standard Output of one program and the Standard Input of the next.]

• A text file is stored on disk
• A text buffer is defined in memory
• Piping command input/output improves efficiency
• Standard buffer size may be modified to "tune" a process
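One way to experiment with that buffering on GNU/Linux is the coreutils stdbuf wrapper, which adjusts a program's stdio buffers; a small sketch (the pattern and buffer size are only illustrative):

  stdbuf -oL grep rabbit animals.txt | cat     # line-buffer grep's output into the pipe
  stdbuf -o1M sort animals.txt | uniq -c       # request a 1 MB output buffer instead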

Working with Pipes

• Pipes: Workflows using input/output redirection
• command sequences to refine programmatic output
• the symbol '|' designates that a pipe be applied to cmd output
  • cmd1 | cmd2
• cat animals.txt | sort -k1,12            # field 1, 12th character
• cat animals.txt | sort -t , -k2 -k1      # set delimiter ','; primary key, field 2; secondary key, field 1
• cat sunspot.txt | awk 'NR > 3 {print}' | sort -t ' ' -k4n -k3n | less   # numeric sort on each key
• cat animals.txt | cut -d ',' -f 2
• cat animals.txt | cut -d ',' -f 2 | sort
• cat animals.txt | cut -d ',' -f 2 | sort | uniq

Working with Pipes - II

• cat sunspot.txt:
  (* Month: 1749 01 *) 58
  (* Month: 1749 02 *) 63
  (* Month: 1749 03 *) 70
  (* Month: 1749 04 *) 56
  (* Month: 1749 05 *) 85
  (* Month: 1749 06 *) 84
  (* Month: 1749 07 *) 95
  (* Month: 1749 08 *) 66
  (* Month: 1749 09 *) 76
  (* Month: 1749 10 *) 76
• cat sunspot.txt | awk 'NR > 3 {print}' | sort -t ' ' -k4n -k3n | less   # numeric sort on each key

  Numeric Sort                  Alphabetic Sort (applies to complete line)
  (* Month: 1749 01 *) 58       (* Month: 1754 01 *) 0
  (* Month: 1750 01 *) 73       (* Month: 1808 01 *) 0
  (* Month: 1751 01 *) 70       (* Month: 1810 01 *) 0
  (* Month: 1752 01 *) 35       (* Month: 1811 01 *) 0
  (* Month: 1753 01 *) 44       (* Month: 1813 01 *) 0
  (* Month: 1754 01 *) 0        (* Month: 1822 01 *) 0
  (* Month: 1755 01 *) 10       (* Month: 1823 01 *) 0
  (* Month: 1756 01 *) 13       (* Month: 1867 01 *) 0
  (* Month: 1757 01 *) 14       (* Month: 1901 01 *) 0
  (* Month: 1758 01 *) 38       (* Month: 1912 01 *) 0

Pipes Challenge

• Challenge: extract the 10 most common cmds from History
• commands to use?

Pipes Challenge

• Challenge: extract the 10 most common cmds from History
  ➡ history — display the command history (kept in ~/.bash_history)
  ➡ tr — translate or delete characters
  ➡ cut — remove sections from each line
  ➡ sort — sort lines of text input
  ➡ uniq — report or omit repeated lines
  ➡ tail — show last lines of input

Files Manipulation

• The History command — history
• history [OPTION]... [FILE]...
• Show command history; default file location ~/.bash_history
• options/settings/notes
  • !! — repeat the previous command
  • !n — where n is a command number, repeat the nth command
  • use the up/down arrows and tab completion to navigate freely
  • export HISTFILESIZE=n — change the history file line limit to n
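A short illustration of these settings (the history numbers shown are hypothetical):

  $ history | tail -n 3          # last three entries in the history list
    501  cd /scratch
    502  ls -l
    503  history | tail -n 3
  $ !501                         # re-run command number 501
  $ !!                           # re-run the previous command
  $ export HISTFILESIZE=5000     # keep up to 5000 lines in ~/.bash_history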

Files Manipulation

• The Translate command — tr
• tr [OPTION]... SET1 [SET2]
• translate or delete characters
• options
  • -s : squeeze repeats (replace multiples from SET1 with a single instance)
  • -d : delete characters in SET1
• e.g. tr -s ' ' replaces runs of whitespace with a single space
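For example, squeezing repeated spaces (or deleting characters) before further processing:

  $ echo "deer     rabbit   raccoon" | tr -s ' '
  deer rabbit raccoon
  $ echo "2012-11-05,deer" | tr -d '-'
  20121105,deer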

Files Manipulation

• The Cut command — cut
• cut [OPTION]... [FILE]...
• Print selected parts of lines from each FILE to standard output
• options
  • -d : delimiter (default is TAB)
  • -f : select these fields
    ➡ M-N range selection (Mth through Nth field)
    ✤ M : this field
    ✤ -N : first field through the Nth field
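Applied to the animals.txt file from the redirection examples:

  $ cut -d ',' -f 2 animals.txt      # field 2: the animal name
  deer
  rabbit
  raccoon
  rabbit
  deer
  fox
  rabbit
  bear
  $ cut -d ',' -f 1 animals.txt | head -n 1
  2012-11-05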

Files Manipulation

• The Sort command — sort
• sort [OPTION]... [FILE]...
• sort lines of text files (or buffers); the default is an alphabetic sort
• options
  • -k [KEYDEF] : the "key" on which to sort
    ➡ KEYDEF: field number and character position, plus options
    ➡ e.g. F[.C][OPTS]
  • -n, -h : numeric, human-readable sort
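With animals.txt, a key-based sort on the second (comma-delimited) field groups the records by animal:

  $ sort -t ',' -k2 animals.txt
  2012-11-07,bear
  2012-11-05,deer
  2012-11-06,deer
  2012-11-06,fox
  2012-11-05,rabbit
  2012-11-06,rabbit
  2012-11-07,rabbit
  2012-11-05,raccoon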

Files Manipulation

• The Unique command — uniq
• uniq [OPTION]... [INPUT] [OUTPUT]
• filter adjacent matching lines from INPUT, writing to OUTPUT
• options
  • -c : prefix lines by the number of occurrences
  • -u : only print the unique lines
  • -d : only print the duplicate lines
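Because uniq only collapses adjacent matches, the input is normally sorted first; with animals.txt this counts the sightings per animal:

  $ cut -d ',' -f 2 animals.txt | sort | uniq -c
        1 bear
        2 deer
        1 fox
        3 rabbit
        1 raccoon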

Pipes Challenge

• Challenge: extract the 10 most common cmds from History
  ➡ history — display the command history (kept in ~/.bash_history)
  ➡ tr — translate or delete characters
  ➡ cut — remove sections from each line
  ➡ sort — sort lines of text input
  ➡ uniq — report or omit repeated lines
  ➡ tail — show last lines of input

  history | tr -s ' ' | cut -d ' ' -f 3- | sort | uniq -c | sort -n | tail
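Reading the solution one stage at a time (the same pipeline, spread across lines with comments):

  history |              # the numbered command history
    tr -s ' ' |          # squeeze runs of spaces to single spaces
    cut -d ' ' -f 3- |   # drop the leading blank field and the history number
    sort |               # group identical command lines together
    uniq -c |            # count each distinct command line
    sort -n |            # order numerically by count, smallest first
    tail                 # keep the last 10 lines: the most common commands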

Advanced: Managing Selective Output

• Example: Preserve the header for cmd/file analysis
  ➡ Examine account usage with sacct
  ➡ Sort on elapsed time of job
• sacct formatted output (slurm.schedmd.com/sacct.html)
  sacct -a -A <account> -o <format> | awk ...
• pipes in awk: awk ' (condition 1) { cmds }; (condition 2) { cmds } '

  sacct -a -A <account> -o JobId,Elapsed,AveRSS | awk ' NR < 3 {print}; NR > 2 { print | "sort -k2b -n" } '

Programming

• Shell redirection is much like structured programming
  • keep the focus on 'simple' scripts
  • achieve more complex operations through combination
• Scripts may take inputs when called
  • command line
  • reading data from files
• Outputs directed to the monitor, files or other programs
• Portability should be a design criterion

Shell Parameters & Variables

• Parameters
  • Positional Parameters — $1, $2, ...
  • Special Parameters — *, @, #, 0, $, ?, _, !, -
• Variables
  • naming: letters, numbers, '_'
  • types: string by default
  • array types: indexed (1-d); associative (key : value)
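A small sketch of a few special parameters in action (check.sh is a hypothetical script name):

  #!/bin/bash
  # check.sh: print some of bash's special parameters
  echo "script name      : $0"
  echo "number of args   : $#"
  echo "all arguments    : $@"
  echo "process id       : $$"
  ls /no/such/dir 2> /dev/null
  echo "last exit status : $?"     # non-zero because the ls above failed

Running ./check.sh a b c reports three arguments and a non-zero status from the failed ls.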

Positional Parameters

• Command line inputs
• arguments passed to a script: $0, $1, ...
• $0 is the command name
• parameters with two or more digits need braces: ${10}, ${11}, ...
• The set of parameters: $@ or $*
• We'll look at a script that wants more than 10 pos-params: pp.sh

Positional Parameters

• pp.sh

  #!/bin/bash

  # Call this script with at least 10 parameters, for example
  # ./scriptname 1 2 3 4 5 6 7 8 9 10
  MINPARAMS=10

  echo

  echo "The name of this script is \"$0\"."
  # Adds ./ for current directory
  echo "The name of this script is \"`basename $0`\"."
  # Strips out path name info (see 'basename')

• The parameters are defined by the shell, and available to the script at run time
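To see why the braces matter past $9, a continuation in the same spirit (a sketch, not the remainder of the slide's script) might read:

  echo "Parameter #1 is $1"
  echo "Parameter #9 is $9"
  echo "Parameter #10 is ${10}"     # braces required beyond $9: $10 would be read as ${1}0
  echo "All parameters: $@"
  echo "Number of parameters: $#"

  if [ $# -lt "$MINPARAMS" ]; then
      echo "This script needs at least $MINPARAMS command-line arguments!"
  fi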

Variables

• Declarations
  • declare -i icnt        # integer variable
  • declare -a FEATURES    # indexed array
  • declare -r constants   # read-only values
  • declare -f stage1      # function
• References
  • $icnt
  • ${FEATURES}, ${FEATURES[1]}
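A brief sketch of these declarations in use, including an associative array (which needs declare -A; the names are arbitrary):

  declare -i icnt=0
  icnt+=5                          # arithmetic add, because icnt has the integer attribute
  echo "$icnt"                     # prints 5

  declare -a FEATURES=(gpu smp batch)
  echo "${FEATURES[1]}"            # smp (indexed arrays count from 0)
  echo "${FEATURES[@]}"            # all elements

  declare -A owner                 # associative array: key : value
  owner[gpu]="accelerated jobs"
  echo "${owner[gpu]}"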

Control Structures: if-then-else

• Test a condition, choose instructions by result
• if [ condition expression ] ... fi
• example: test whether a variable string is empty

  if [ -n "$1" ]; then    # Tested variable is quoted.
      echo "Parameter #1 is $1"
  else
      echo "Parameter #1 is an empty string"
  fi

Control Structures: if-then-elif-else

• Test a condition, choose instructions by result
• if [ condition expression ] ... elif ... else ... fi
• example: test whether a non-empty string is gt, lt or eq 0

  if [ "$1" -gt 0 ]; then
      echo "$1 > 0"
  elif [ "$1" -lt 0 ]; then
      echo "$1 < 0"
  else
      echo "$1 = 0"
  fi

• there's lots of help online for learning the syntax of bash scripting

While Loop

• perform commands while a condition is true

  while [ condition ]
  do
      commands ...
  done

• A closed structure: updates to the conditional are required
  • x=$(( $x + 1 ))
  • counter=$(( $counter - 1 ))

While Loop: factorial example

  #!/bin/bash
  counter=$1
  factorial=1

  while [ $counter -gt 0 ]
  do
      factorial=$(( $factorial * $counter ))
      counter=$(( $counter - 1 ))
  done

  echo $factorial

While, using Input Redirection

• # Use $IFS to parse a line of input to "read", if you do not want the default to be whitespace.

  while IFS=: read name uid gid fullname ignore
  do
      echo "$name ($fullname)"
  done

• $IFS is the Internal Field Separator variable
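The loop above reads its fields from standard input, so in practice it is paired with input redirection; a minimal self-contained sketch (users.txt and its colon-separated layout are hypothetical):

  #!/bin/bash
  # users.txt holds colon-separated records, e.g.  jdoe:1001:1001:Jane Doe:/bin/bash
  while IFS=: read name uid gid fullname ignore
  do
      echo "$name ($fullname)"
  done < users.txt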

For Loop: derived loop variables

  #!/bin/bash
  echo "Bash version ${BASH_VERSION}..."

  for i in {0..10..2}
  do
      echo "Welcome $i times"
  done

• many alternate iteration forms, e.g. $(seq start incr end)
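For instance, the same loop written with command substitution over seq, equivalent to the brace expansion {0..10..2} above:

  for i in $(seq 0 2 10)      # seq START INCREMENT END
  do
      echo "Welcome $i times"
  done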

HPC Resource View - I

• Script formatting command outputs into HTML
• https://sites.google.com/a/case.edu/hpc-upgraded-cluster/servers-and-storage/cluster-resources
• Contents of interest
  • declare -a rng_beg rng_end
  • sinfocmd=`which sinfo`    # systems may have different paths
  • control structures looping over external inputs
  • SLURM sinfo: sinfo -O nodehost,features
  • pipes & text manipulation

HPC Resource View - II

  # Declare arrays to collect nodename ranges for each feature set
  declare -a rng_beg rng_end

  # Determine host-specific path to sinfo
  sinfocmd=`which sinfo`

  # Obtain current cluster configuration, emphasizing the partition
  # structure & features to specify resources
  # Using the sinfo command: sinfo -O partition,features,cpus,memory,nodeaiot -S -partition

  /usr/bin/sinfo -O partition,features,cpus,memory,nodeaiot > $SINFOFILE
  cp $SINFOFILE $SINFILE

HPC Resource View - III

  # Determine node name ranges for various feature sets:
  # What is the set of features?
  FEATURES="$($sinfocmd -O features)"

  while IFS=' ' read -ra THING; do
      for i in "${THING[@]}"; do
          if [[ $i != 'AVAIL_FEATURES' ]] ; then
              rng_beg[$i]="$($sinfocmd -O nodehost,features | awk "/$i/" | cut -d ' ' -f 1 | sort | head -n 1 )"
              rng_end[$i]="$($sinfocmd -O nodehost,features | awk "/$i/" | cut -d ' ' -f 1 | sed 's/[^0-9]*//' \
                  | sort | tail -n 1 )"
              sed "s/$i/$i ${rng_beg[$i]}--${rng_end[$i]}/" $SINFILE > $SOUTFILE; cp $SOUTFILE $SINFILE
          else
              sed "s/$i/$i Node_Range/" $SINFILE > $SOUTFILE; cp $SOUTFILE $SINFILE
          fi
      done
  done <<< "$FEATURES"

Recap

• Organize around a work flow
• Redirection, piping and output manipulation (e.g. awk) enhance efficiency in the shell
• Bash programming: introduced simple structures that support reproducibility and efficiency

References

• Advanced Bash-Scripting Guide: http://tldp.org/LDP/abs/html/index.html
• Software Carpentry: http://swcarpentry.github.io/shell-novice

RCCI Team: Roger Bielefeld, Mike Warfe, Hadrian Djohari, Brian Christian, Emily Dragowsky, Jeremy Fondran, Cal Frye, Sanjaya Gajurel, Matt Garvey, Theresa Griegger, Cindy Martin, Sean Maxwell, Jeno Mozes, Nasir Yilmaz, Lee Zickel