Intro to the Line J Fass | 19 June 2017 The Philosophy

Small, sharp tools … each of does a well defined job very well.

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 The Terminal

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 The Terminal

jfass@nickel:~$ ssh cabernet ______...... /\ ___ _ \ .... / : / (_) | | \ . | : | __, | | _ ,_ _ _ _ _| _ \______| : | / | |/ \_|/ / | / |/ | |/ | | | | : \___/\_/|_/\_/ |__/ |_/ | |_/|__/|_/ ______|___| | : .genomecenter.ucdavis.edu / \ ; / \/______/ [...]

jfass@cabernet:~$

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 The Terminal

jfass@nickel:~$ ssh cabernet ______...... /\ ___ _ \ .... / : / (_) | | \ . | : | __, | | _ ,_ _ _ _ _| _ \______| : | / | |/ \_|/ / | / |/ | |/ | | | | : \___/\_/|_/\_/ |__/ |_/ | |_/|__/|_/ ______|___| | : .genomecenter.ucdavis.edu / \ ; / \/______/ [...]

jfass@cabernet:~$ -l

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 The Terminal

jfass@cabernet:~$ -l or -k … Clears terminal,

See also: reset

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 The Terminal

jfass@cabernet:~$ <-- prompt (includes $ and one space after)

Huge # of possible configurations; in this case:

@:$

( = present working )

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Don’t work on node!

jfass@cabernet:~$ srun -t 1440 -n 1 --mem 8000 This logs you into an --reservation workshop --pty /bin/ interactive session on one of the cluster nodes, so you srun: job 5461103 queued and waiting for resources don’t all play on the head node. srun: error: Lookup failed: Unknown on this later. Just log into cabernet, then enter the srun: job 5461103 has been allocated resources above and press .

jfass@c4-1:~$ From now on, I’ll omit “jfass@hostname:” for brevity!

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics

~$ Follow command with

~$ pwd E.g. “pwd” … lists your present /home/jfass

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Getting Out!

~$ command fubar : escape from entering a command … ~$ # looks like this: (notice: “#” prevent ~$ command fubar ^C interpretation of text that follows) ~$

a running command ~$ 1000 (“sleep” actively counts off the specified number of ^C seconds before letting you do anything else) ~$

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Getting Out!

~$ … escape from some interactive sessions (R, [...] python, …)

> (R is a powerful data-centered, statistical Save workspace image? [y/n/c]: n language)

~$

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Getting Out!

~$ | more q = quit:

[...] Escape from paginators! (, man, etc.) (“yes” says “y” until killed ~$ … it’s a dinosaur)

(“|” is the pipe character …

we’ll explore it more soon)

(“more” shows you a page of text, then waits for you to hit to show another, or with “q”)

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Getting Out!

~$ exit “exit” kills the current shell: the program that’s interpreting your commands for the .

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Where Am I?

~$ list in the pwd

[...]

~$ pwd present working directory

[...]

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Options!

~$ ls -R list recursively

[...] What did I just do???

[...]

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Read The Manual (RTM)!

~$ man ls man consults the manual that exists for basic, [...] OS commands. Any software author can a “ for their software, but scientific software authors [...] don’t.

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Options, options, options!

~$ ls -l man ls …

~$ ls -a

~$ ls -l -a Can combine single letter options … ~$ ls -la

~$ ls -ltrha list all files (in pwd), in long , in reverse order with human readable file sizes

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Options, options, options!

~$ “alias” allows you to define your own commands. “alias” by [...] itself only lists the aliases defined in your bash alias ll='ls -alFtrh' configuration files. Try one on for size … [...]

~$ ll

[...]

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Directory Structure

~$ ls -R ‘/’ separates directories

[...] Names can include many characters, but avoid spaces ./R/x86_64-pc---library/3.3/BH/include and other weird stuff.

[...]

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Directory Structure

www.linuxtrainingacademy.com UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - ‘.’ and ‘..’

~$ ls -a “.” = pwd

. “..” = up one level

..

.bash_history Don’t be confused between use of “.” and filenames that start with “.” … the latter [...] are valid filenames, that are just “hidden” unless you use the “ls” command’s “-a” option.

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Absolute/Relative Address

~$ ls /home/jfass/ “.” = pwd

[...] “..” = up one level

~$ ls ./

[same ...]

~$ ls ../jfass/

[same ...]

~$

[yup, same ...]

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Completion

~$ ls /home/jfas will literally save your life. Hours of it. ~$ ls /home/jfass/ A single auto-completes when it’s possible (when only a single possible completion exists).

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Completion

~$ ls /home/j will literally save your life. Hours of it. ~$ ls /home/j Two s in a row will show ~$ ls /home/j you all the possible completions, when there jacob/ [...] wasn’t a single one for the single to use. jagadish/ jagomez/ jwbucha/

[...]

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Completion

~$ ls /h Use it!

~$ ls /home/ out for RSI ...

~$ ls /home/

~$ ls /home/j

~$ ls /home/j

~$ ls /home/jf

~$ ls /home/jfass/

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Create and Destroy

~$ temp Create a directory

~$ temp/ Change directories

~/temp$ “Hello, world!” > first.txt Put text into a file

~/temp$ first.txt Concatenate file to screen

~/temp$ first.txt Remove file

~/temp$ cd ../ Up and out

~$ temp Remove (empty) directory

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Pipe and Redirect

~$ mkdir CLB; cd CLB/ “>” redirects the output from one command to a file, ~/CLB$ echo “first” > .txt instead of the screen.

~/CLB$ echo “second” > test.txt “>” replaces

~/CLB$ cat test.txt

“>>” appends ~/CLB$ echo “third” >> test.txt

~/CLB$ cat test.txt

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Pipe and Redirect

~/CLB$ -c 1-3 test.txt “cut” cuts lines of text.

~/CLB$ cat test.txt | cut -c 1-3 “|” pipes output from one command to be the input of another command.

~/CLB$ cat test.txt > cut -c 1-3 “>” is wrong here … what does this command do?

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Pipe and pipe and pipe …

~/CLB$ cat test.txt Pipes allow us to build up compound operations, ~/CLB$ cat test.txt | cut -c1-3 filtering and changing data as we go. ~/CLB$ cat test.txt | cut -c1-3 | s (“grep” searches for matches to regular expressions … patterns)

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics -

~/CLB$ history Since we often develop long commands through trial and ~/CLB$ history | head error, it helps to see and access what we’ve done. ~/CLB$ history | In “less,” up and down ~/CLB$ history | tail -n 15 arrows, pgUp, pgDn, and “q” to exit. Also, “/pattern” searches for pattern each ~/CLB$ history | less . “n” and “N” for next and previous match, “g” and “G” for beginning and end of file / stream.

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - History

558 cat test.txt | cut -c1-3 | grep s “!#” repeats command # from 559 history your history. 560 history

~/CLB$ !560

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - History Search

~/CLB$ text triggers a recursive search for “text” (reverse-i-search)`first': echo "first" > test.txt from your history. After finding the most recent match that you like, use again to get to an earlier match (and again, and …).

executes the command; left or right arrow fills the command line with the command but allows you to edit it before running it.

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - History Search

~/CLB$ … And, by the way, the up and down arrows take you backwards and forwards through your history of commands. Reach one you like, and start editing.

Also, by the way, and bring you to ~/CLB$ cat text.txt | grep s the beginning and end of your command.

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Editing Commands

~/CLB$ blah blah blah Left arrow to before the last “blah,” then … ~/CLB$ blah blah blah kills text from here ‘til the end of the line. ~/CLB$ blah blah

~/CLB$ blah blah Now, … kills preceding word. ~/CLB$ blah

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Compression

~/CLB$ gzip test.txt Compress big files using “gzip,” “bzip2.” Bzip2 ~/CLB$ file text.txt.gz compresses smaller, but takes longer. ~/CLB$ gunzip test.txt.gz (“file” gives you about ~/CLB$ bzip2 test.txt; bunzip2 test.txt.bz2 the of file you’re looking at)

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Archives

~/CLB$ Large directory trees may be ftp://igenome:[email protected]/PhiX/I compressed as “tarballs” … llumina/RTA/PhiX_Illumina_RTA..gz see “tar.”

~/CLB$ tar -xzvf PhiX_Illumina_RTA.tar.gz Let’s grab one and it.

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Forced Removal

~/CLB$ rm -rf PhiX This is a dangerous one. Remove a file / directory, do it recursively to all sub-directories, and force removal (by-pass confirmation questions).

Caution is warranted. There’s no Trash Bin, and no gauranteed way to deleted files.

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Wildcard Characters

~/CLB$ tar -xzvf PhiX_Illumina_RTA.tar.gz Let’s re-unarchive that tarball, to have something to ~/CLB$ ls PhiX/Illumina/RTA/Sequence/*/*.fa look at.

[...] List all files a few directories down that end in “.fa” …

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Wildcards and

~/CLB$ find . -name “*.fa” “*” can fill in for anything in a filename, except “/” … [...] there’s a more appropriate command to use when you don’t know which directory level the files you’re looking for are at: “find”

“?” is like “*,” except only ~/CLB$ find . -name “*.f?” fills in for a single character.

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Fasta example

~/CLB$ -k Grab a sequence file from the https://bioshare.bioinformatics.ucdavis.edu/bioshare web (“curl” copies from a /download/pdhqicmfgw2bra8/variant.neighborhoods.fa > url) … see also “wget”

regions.fa

” counts words, etc.; “-l” ~/CLB$ wc -l * only gives line count

~/CLB$ grep -c “>” regions.fa “-c” causes grep to count matches instead of printing them

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Fasta example … Pipes!

~/CLB$ grep “>” regions.fa | cut -c 2- Follow each step of these commands … the entire final ~/CLB$ grep ">" regions.fa | cut -c 2- | cut -f1 -d: command counts how many sequences come from each ~/CLB$ grep ">" regions.fa | cut -c 2- | cut -f1 -d: contig. | “sort,” “” are self-explanatory; “uniq -c” ~/CLB$ grep ">" regions.fa | cut -c 2- | cut -f1 -d: adds the counts (must be | sort | uniq -c sorted first), and “sort -rn -k1,1” sorts lines in reverse ~/CLB$ grep ">" regions.fa | cut -c 2- | cut -f1 -d: numerical order based only on | sort | uniq -c | sort -rn -k1,1 the first column (the counts).

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Symbolic Links

~/CLB$ -s “ln -s [target] [ PhiX/Illumina/RTA/Sequence/WholeGenomeFasta/genome.f location/name]” creates a a . “shortcut” to the file.

~/CLB$ ls -ltrha The target file can be deleted, which leaves a broken link. The link can be [...] genome.fa -> deleted without affecting the PhiX/Illumina/RTA/Sequence/WholeGenomeFasta/genome.f file. a Otherwise, addressing the ~/CLB$ grep -c ">" genome.fa link addresses the file.

1

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Bioinformatics At Last?

~/CLB$ bwa mem genome.fa regions.fa > aln.sam Let’s align sequences to a sequence! [E::bwa_idx_load] fail to the index files Woops … what went wrong? ~/CLB$ cat aln.sam I redirected output to a ~/CLB$ file, why didn’t it go there? (Even if I wouldn’t want that kind of stuff to end up in an

alignment file!)

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - STDOUT & STDERR

~/CLB$ bwa mem genome.fa regions.fa 1> aln.sam 2> Programs can write to two aln.err separate output streams: standard out, and standard ~/CLB$ cat aln.sam error. Former mostly used for output, latter mostly used ~/CLB$ cat aln.err for error messages.

(“1>” is equivalent to “>”) [E::bwa_idx_load] fail to locate the index files

Now we can store error messages from many jobs run at once, to separate files.

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Loops

~/CLB$ for i in {1..21}; do echo $i >> a; done For loop, for a defined number of steps. for name in {list}; do

commands

done

~/CLB$ cat a

[...]

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Loops

~/CLB$ grep “>” regions.fa | cut -c2- | while read While loop, for stopping on a header; do echo “contig_$header” >> b; done condition.

while {condition}; do

commands

done

~/CLB$ cat b

[...]

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics -

~/CLB$ paste a b “paste” jams columns of equal size together. See also [...] “” …

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Running in the Background

~/CLB$ sleep 100000 When a long job is running, use “”, then “bg” [...] to suspend it, then run it in the background. If you want to kill it before it finisheds on its own, either ^Z “kill %1” (or %2, etc., if multiple jobs), or “fg,” then . [1]+ Stopped sleep 100000

bg

[1]+ sleep 100000 &

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Running in the Background

~/CLB$ sleep 100000 & To start a job already running in the background, [1] 2166 append “&” to the command.

~/CLB$ kill %1

~/CLB$

[1]+ Terminated sleep 100000

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Running in the Background

~/CLB$ sleep 100000 & To a background job resistant to a lost [1] 2178 connection or terminal problems, use “nohup.” ~/CLB$ nohup: ignoring input and appending output to ‘nohup.out’ “jobs” will list jobs running in the background.

With nohup, you should be able to exit from the shell, ~/CLB$ jobs and the job keeps running.

[1]+ Running nohup sleep 100000 & (Not as useful in a cluster environment?)

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Table of Processes

~/CLB$ top “Top” prints a self-updating table of running processes and system stats. Try “M,” “P,” and “u” then

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Shell Scripts

#!/bin/bash “Nano” is a simple text editor; open it with “nano grep -o . $1 | \ test.sh” and type / in this … your first shell sort | \ script! Save and exit as shown at the bottom, via uniq -c | \ .

sort -rn -k1,1

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Permission Needed

~/CLB$ ll test.sh Though you can run the commands in test.sh in -rw-rw-r-- 1 jfass biocore 79 Jun 18 19:00 test.sh several ways, to really make it a script you need to give ~/CLB$ u+x test.sh yourself permission to execute it. ~/CLB$ ll test.sh Permissions now are to read and write for user and group, -rwxrw-r-- 1 jfass biocore 79 Jun 18 19:00 test.sh* and only read for the world (ignore first dash, then rw- for user, rw- for group, and r-- for world). Add execute for user as shown.

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Permission Needed

~/CLB$ ./test.sh genome.fa Execute by addressing the shell script with its , 1686 T and feed it an argument as 1292 A required in the script. 1253 G Congrats! You’ve now counted 1155 C all characters in the phiX 1 x genomic fasta file. 1 p 1 i 1 h 1 >

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Command Line Basics - Installing (Simple) Software

~/CLB$ cd ../ Simple software installations require “make” … or maybe ~$ mkdir tools “./configure” then “make” … and yield an executable in ~$ cd tools/ the main software directory. Other software can be ~$ git clone https://github.com/lh3/minimap.git massively painful to . to your admins … ~$ cd minimap/

~$ make

~$ ./minimap

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19 Questions … comments … confusion?

UC Davis Genome Center | Bioinformatics Core | J Fass Command Line 2017-06-19