Research Computing and Cyberinfrastructure Team
Working with the Linux Shell on the CWRU HPC
31 January 2019 & 7 February 2019
Presenter: Emily Dragowsky, KSL Data Center
RCCI Team: Roger Bielefeld, Mike Warfe, Hadrian Djohari, Brian Christian, Emily Dragowsky, Jeremy Fondran, Cal Frye, Sanjaya Gajurel, Matt Garvey, Theresa Griegger, Cindy Martin, Sean Maxwell, Jeno Mozes, Nasir Yilmaz, Lee Zickel

Preface: Prepare your environment
• User account

    # .bashrc
    ## cli essentials
    if [ -t 1 ]
    then
      bind '"\e[A": history-search-backward'
      bind '"\e[B": history-search-forward'
      bind '"\eOA": history-search-backward'
      bind '"\eOB": history-search-forward'
    fi

• This is too useful to pass up!

Working with Linux
• Preamble
• Intro Session Linux Review: Finding our way
• Files & Directories: Sample work flow
• Shell Commands
• Pipes & Redirection
• Scripting Foundations
• Shell & environment variables
• Control Structures
• Regular expressions & text manipulation
• Recap & Look Ahead

Rider Cluster Components
(diagram: rider.case.edu and ondemand.case.edu behind the University Firewall; Admin and Head Nodes; SLURM Master as resource manager; Science DMZ with Data Transfer Nodes; Disk Storage; Batch nodes, GPU nodes, SMP nodes)

Running a job: where is it?
(slide from Hadrian Djohari)

Storing Data on the HPC
(table from Nasir Yilmaz)

How data moves across campus
• Buildings on campus are each assigned to a zone. Data connections go from every building to the distribution switch at the center of the zone, and from there to the data centers at Kelvin Smith Library and Crawford Hall.
(slide from Cindy Martin)

Questions about HPC context?
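The guard in the preface's .bashrc snippet can be tried on its own. This is a minimal sketch; the messages are illustrative and not part of the slides:

```shell
#!/bin/sh
# Mirror of the .bashrc guard: '[ -t 1 ]' is true only when file
# descriptor 1 (stdout) is attached to a terminal, so interactive-only
# setup such as the 'bind' key mappings is skipped in scripts and pipes.
if [ -t 1 ]; then
  echo "interactive: safe to run bind commands"
else
  echo "non-interactive: skip terminal-only setup"
fi
```

Run it at an interactive prompt and the first branch fires; run it inside a pipeline or command substitution, where stdout is a pipe, and the second branch fires.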
Sample Work Flow
• Move Data to the Cluster
• Review the "SLURM" script
  • Resource allocation directives
  • Data management
• Submit the script to the Scheduler
• Monitor the job
  • View details with scheduler tools
• Review the data locations
• Assess Resource Usage

Sample Work Flow
• Working in a collective space
  • /home/<caseid> : be aware of other users in your group(s)
  • /scratch/pbsjobs/job.<jobid>.hpc : short-term storage for jobs
• Preparing scripts is a fundamental skill
  • aids in documentation and reproducibility
  • scripts for job submission
  • scripts to manage and evaluate data
• Disk usage & quota management are very important

Commands in the Shell
• Familiar Commands
  • the necessary: ls, cd, cp, mv, rm
  • the useful: pwd, sort, uniq, cut, date, [which]
  • the powerful: find, grep, sed, awk
• "Built-In" Shell Commands
  alias, bg, bind, break, builtin, caller, cd, command, compgen, complete, compopt, continue, declare, dirs, disown, echo, enable, eval, exec, exit, export, false, fc, fg, getopts, hash, help, history, jobs, kill, let, local, logout, mapfile, popd, printf, pushd, pwd, read, readonly, return, set, shift, shopt, source, suspend, test, times, trap, true, type, typeset, ulimit, umask, unalias, unset, wait

Commands beyond the shell
• File Content
  • cat — bring file contents to standard output
  • less — text buffer viewer (page forward/backward, search)
  • head/tail — bring the 'head' or 'tail' of a file to stdout
  • wc — print the number of words or lines in a text buffer
• HPC Utilities
  • quotachk (-grp / -prj) — check disk usage for an account, group or project
  • remaining — determine time remaining until maintenance shutdown
  • si — cluster status script

Running Commands
• Command Line: standard input & output streams
  • stdin: 0 < [filename]
  • stdout: 1 >, >> [filename]
  • stderr: 2 >, >> [filename] ; 2>&1 redirects stderr to stdout
(diagram: standard input from keyboard & mouse feeds a program such as ls, grep, awk, or sed; standard output and standard error go to the monitor or to files)
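The three streams can be exercised in a short session; the file names below are illustrative, not from the slides:

```shell
#!/bin/sh
# Each standard stream redirected independently.
echo "2012-11-05,deer"   >  redir_demo.txt   # '>'  (i.e. 1>) truncates
echo "2012-11-05,rabbit" >> redir_demo.txt   # '>>' appends instead
wc -l < redir_demo.txt                       # '<' feeds the file to stdin
ls /no/such/dir 2> redir_err.txt             # '2>' captures stderr alone
ls /no/such/dir > redir_all.txt 2>&1         # stderr follows stdout
rm -f redir_demo.txt redir_err.txt redir_all.txt
```

Note the order in the last command: `2>&1` points stderr at wherever stdout currently points, so it must come after the stdout redirection.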
Redirection Examples - I
• Apply redirection to animals.txt
• Comma-separated alphanumeric strings
• 3 i/o streams: input, output & error
• Redirect standard input from a file
• Redirect output & error
  ➡ to a file
  ➡ act upon file contents, redirect to another stream (file or command)
  ➡ to the same stream (e.g. 2>&1)

    $ cat animals.txt
    2012-11-05,deer
    2012-11-05,rabbit
    2012-11-05,raccoon
    2012-11-06,rabbit
    2012-11-06,deer
    2012-11-06,fox
    2012-11-07,rabbit
    2012-11-07,bear

Redirection Examples - II
• Select lines with head, tail
• Replace content using >
  ➡ head -n 3 animals.txt > anHead.txt
       2012-11-05,deer
       2012-11-05,rabbit
       2012-11-05,raccoon
  ➡ head -n 2 animals.txt > anHead.txt
       2012-11-05,deer
       2012-11-05,rabbit
• Append content using >>
  ➡ tail -n 2 animals.txt > anTail.txt
       2012-11-07,rabbit
       2012-11-07,bear
  ➡ tail -n 3 animals.txt >> anTail.txt
       2012-11-07,rabbit
       2012-11-07,bear
       2012-11-06,fox
       2012-11-07,rabbit
       2012-11-07,bear

Redirection Examples - III
• Input Redirection <

    yachats:data-shell drago$ wc -l animals.txt
    8 animals.txt
    yachats:data-shell drago$ wc -l < animals.txt
    8
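The animals.txt examples can be replayed end to end. A sketch, using a heredoc to recreate the file from the listing above:

```shell
#!/bin/sh
# Recreate the slide's animals.txt, then replay the redirection examples.
cat > animals.txt <<'EOF'
2012-11-05,deer
2012-11-05,rabbit
2012-11-05,raccoon
2012-11-06,rabbit
2012-11-06,deer
2012-11-06,fox
2012-11-07,rabbit
2012-11-07,bear
EOF
head -n 3 animals.txt >  anHead.txt  # three lines written
head -n 2 animals.txt >  anHead.txt  # '>' truncates: two lines remain
tail -n 2 animals.txt >  anTail.txt  # last two records
tail -n 3 animals.txt >> anTail.txt  # '>>' appends: five lines total
wc -l < animals.txt                  # input redirection; the slide shows 8
rm -f animals.txt anHead.txt anTail.txt
```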
• input redirection delivers the file contents 'as if' typed as command line input, so no filename appears in the output
• [jump to command line]

Pipes
• Pipes: workflows using input/output redirection
• command sequences to refine programmatic output
(diagram: standard input from keyboard & mouse; the standard output of one program — ls, grep, awk, sed — feeds the standard input of the next; standard error still goes to the monitor or to files)

Pipe (Text) Buffers
(diagram: two programs — ls, grep, awk, sed — connected standard output to standard input through a pipe buffer in memory)
• A text file is stored on disk
• A text buffer is defined in memory
• Piping command input/output improves efficiency
• Standard buffer size may be modified to "tune" a process

Working with Pipes
• Pipes: workflows using input/output redirection
• command sequences to refine programmatic output
• the symbol '|' designates a pipe applied to the output of a command
• cmd1 | cmd2
• cat animals.txt | sort -k1.12        # sort key: field 1, 12th character
• cat animals.txt | sort -t , -k2 -k1  # set delimiter ','; primary key, field 2; secondary key, field 1
• cat sunspot.txt | awk 'NR > 3 {print}' | sort -t ' ' -k4n -k3n | less   # numeric sort on each key
• cat animals.txt | cut -d ',' -f 2
• cat animals.txt | cut -d ',' -f 2 | sort
• cat animals.txt | cut -d ',' -f 2 | sort | uniq -c

Working with Pipes - II
• cat sunspot.txt:

    (* Month: 1749 01 *) 58
    (* Month: 1749 02 *) 63
    (* Month: 1749 03 *) 70
    (* Month: 1749 04 *) 56
    (* Month: 1749 05 *) 85
    (* Month: 1749 06 *) 84
    (* Month: 1749 07 *) 95
    (* Month: 1749 08 *) 66
    (* Month: 1749 09 *) 76
    (* Month: 1749 10 *) 76

• cat sunspot.txt | awk 'NR > 3 {print}' | sort -t ' ' -k4n -k3n | less   # numeric sort on each key

    Numeric Sort               Alphabetic Sort (applies to complete line)
    (* Month: 1749 01 *) 58    (* Month: 1754 01 *) 0
    (* Month: 1750 01 *) 73    (* Month: 1808 01 *) 0
    (* Month: 1751 01 *) 70    (* Month: 1810 01 *) 0
    (* Month: 1752 01 *) 35    (* Month: 1811 01 *) 0
    (* Month: 1753 01 *) 44    (* Month: 1813 01 *) 0
    (* Month: 1754 01 *) 0     (* Month: 1822 01 *) 0
    (* Month: 1755 01 *) 10    (* Month: 1823 01 *) 0
    (* Month: 1756 01 *) 13    (* Month: 1867 01 *) 0
    (* Month: 1757 01 *) 14    (* Month: 1901 01 *) 0
    (* Month: 1758 01 *) 38    (* Month: 1912 01 *) 0

Pipes Challenge
• Challenge: extract the 10 most common commands from History
• Which commands could we use?

Pipes Challenge
• Challenge: extract the 10 most common commands from History
  ➡ history — displays the command history
  ➡ tr — translate or delete characters
  ➡ cut — remove sections from each line
  ➡ sort — sort lines of text input
  ➡ uniq — report or omit repeated lines
  ➡ tail — show last lines of input

Files Manipulation
• The History command — history
• history [OPTION]... [FILE]...
• Show command history; default location ~/.bash_history
• options/settings/notes
  • !! — repeat the previous command
  • !n — where n is a command number, repeat the nth command
  • use the up/down arrows and tab completion to navigate more freely
  • export HISTFILESIZE=n — change the history line limit to n

Files Manipulation
• The Translate command — tr
• tr [OPTION]... SET1 [SET2]
• translate or delete characters
• options
  • -s : squeeze repeats (replace multiples from SET1 with a single instance)
  • -d : delete characters in SET1
  • e.g. tr -s ' ' replaces runs of spaces with a single space

Files Manipulation
• The Cut command — cut
• cut [OPTION]... [FILE]...
• Print selected parts of lines from each FILE to standard output
• options
  • -d : delimiter (default is TAB)
  • -f : select these fields
    ➡ M-N range selection (Mth through Nth field)
    ✤ M : this field only
    ✤ -N : first field through the Nth field
    ✤ M- : Mth field through the last field

Files Manipulation
• The Sort command — sort
• sort [OPTION]... [FILE]...
• sort lines of text files (or buffers); the default is an alphabetic sort
• options
  • -k [KEYDEF] : the "key" on which to sort
    ➡ KEYDEF: field number and character position, plus options
    ➡ e.g. F.[C][OPTS]
  • -n, -h : numerical, human-readable sort

Files Manipulation
• The Unique command — uniq
• uniq [OPTION]... [INPUT [OUTPUT]]
• filter adjacent matching lines from INPUT, writing to OUTPUT
• options
  • -c : prefix lines by their number of occurrences
  • -u : only print the unique lines
  • -d : only print the duplicate lines

Pipes Challenge
• Solution: combine the tools above

    history | tr -s ' ' | cut -d ' ' -f 3- | sort | uniq -c | sort -n | tail

Advanced: Managing Selective Output
• Example: preserve header info for cmd/file analysis
  ➡ Examine account usage with sacct
  ➡ Sort on elapsed time of job
• sacct formatted output (slurm.schedmd.com/sacct.html)

    sacct -a -A <account> -o <format list> | awk

• pipes in awk: awk ' (condition 1) { cmds }; (condition 2) { cmds } '

    sacct -a -A <account> -o JobId,Elapsed,AveRSS | awk ' NR < 3 {print};
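The header-preserving pattern the sacct example introduces — print the first lines unconditionally, pipe the remaining rows to an external sort from inside awk — can be sketched on stand-in data. The sample file, its column layout, and the fflush() call are assumptions added here, not part of the slides:

```shell
#!/bin/sh
# Stand-in for sacct output: a title row and a rule, then one row per job.
cat > jobs_sample.txt <<'EOF'
JobID Elapsed
----- -------
102 00:05:00
101 00:01:00
103 00:02:00
EOF
# NR < 3 : pass the two header lines straight through; fflush() keeps
#          them ahead of the piped output even when stdout is not a tty.
# NR >= 3: send data rows into an external sort on the Elapsed column.
awk 'NR < 3 { print; fflush() }; NR >= 3 { print | "sort -k2" }' jobs_sample.txt
rm -f jobs_sample.txt
```

awk closes the pipe to sort when it exits, so the sorted rows appear after the header: jobs ordered 101, 103, 102 by elapsed time in this sample.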