● Log into the Moodle site

● Enter the “Lecture 11” area (button 12)

● At 14:00, choose “Daily Quiz 9”

● Answer the multiple choice quiz (you have 10 min to finish)

Automating the command line interface with scripts: programming basics

J. M. P. Alves

Laboratory of Genomics & Bioinformatics in Parasitology
Department of Parasitology, ICB, USP

Repetitive actions

● It is often the case that you frequently run a certain set of individual commands to perform a bigger task

● For example: maybe you have a large number of files and you need to run several commands on each and every file

● Of course, being a programmer’s system, a Unix-like system makes it easy to automate such tasks

● The automation is performed by saving commands in a file that we usually call a script or program

● Except for the simplest (but still very useful) scripts, certain techniques from computer programming can be used

J.M.P. Alves 3 / 53 BMP0260 / ICB5765 / IBI5765

Computer programming

● Obviously, computer programming is a vast subject and we could spend a semester studying it and still not cover most things

● So, in one lecture only, we will see the basics of the basics. These are still very useful for performing some amazing feats on the command-line interface (CLI) that are impossible, or at least very difficult, in a graphical shell, or that are very boring and error-prone to do manually (in the CLI or otherwise)

● But… what is a computer program anyway?


● A computer program is a set of instructions designed to perform a certain task… using a computer

● Programs are written in formally defined programming languages

● Initially, the program is written by the human as source code; later, it gets translated to machine code (zeros and ones) specific to the kind of computer processor and operating system one is using

Programming languages

● There are tons of programming languages available: Java, C/C++, Python, Perl, Fortran, COBOL, assembly, Ada etc. etc. etc.

● Some are better suited to some circumstances, while others are more efficient in other situations

● But, overall, if a language is Turing-complete (or computationally universal), then it can do whatever any other such language can do


● In practice, people use whatever they already know

● The shell is special in that it is an interface as well as a scripting language interpreter

● Huh? Scripting language? Interpreter?

● Well, there are 10 (in binary) kinds of languages…

● Interpreted or compiled (although the distinction can be somewhat arbitrary and fuzzy in some cases…)

Compiled vs. interpreted

● Compiling is the act of transforming source code into (usually) machine code

● A language like C++, for example, is compiled: one cannot run the program right after writing the source code file; it is first necessary to compile the program, and then run the generated executable file

● A language like Python, on the other hand, is interpreted: one runs the text file with the source code directly, without first having to generate a compiled file

● Of course, I am simplifying brutally here… But in practice, from the programmer’s point of view, that is a reasonable approximation

The shell language

● As mentioned above, the shell language is interpreted

● Commands are written in a file (called a script) and the shell reads the file and executes each command it finds

● How to write and run a script:

1. Write the script

2. Make the script executable

3. Put the script somewhere the shell can find it

● Steps 2 and 3 are optional, since there are ways to run the script without doing them, but they are a big help if you need to run the script frequently
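The three steps can be sketched as follows. This is only an illustration under stated assumptions: the names demo_dir and hello_demo are hypothetical, a temporary directory is used so that nothing permanent is changed, and step 3 is done by adding that directory to $PATH rather than by moving the file.

```shell
#!/bin/bash
# Hypothetical names: demo_dir and hello_demo exist only for this sketch.
demo_dir=$(mktemp -d)            # scratch directory, removed at the end

# Step 1: write the script
cat > "$demo_dir/hello_demo" <<'EOF'
#!/bin/bash
echo 'Hello, World!'
EOF

# Step 2: make the script executable
chmod +x "$demo_dir/hello_demo"

# Step 3: put the script somewhere the shell can find it
# (here the directory is added to PATH instead of moving the file)
PATH="$demo_dir:$PATH"

output=$(hello_demo)             # the script now runs by name, like any command
echo "$output"

rm -r "$demo_dir"                # clean up
```

In everyday use, a personal directory that is already listed in $PATH is a more usual home for scripts than a temporary one.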


● Pretty much anything that can be run on the CLI can also be run from a shell script

● Besides the regular commands we have learned so far, the shell also uses, among other things, control constructs and variables

● Script script_0:

    #!/bin/bash
    # This is my beautiful script
    echo 'Hello, World!' # yeah!

● This script follows an ancient tradition of computer programming: the “hello, world” first program

Remember the three steps

● How to write a script:

1. Write the script

2. Make the script executable

3. Put the script somewhere the shell can find it

● Here it goes again:

    #!/bin/bash
    # This is my beautiful script
    echo 'Hello, World!' # yeah!

● If you want to run the script without making it executable and moving it elsewhere (usually some directory in the $PATH), you can run:

bash script_0

● Let’s dissect our first script

#!/bin/bash : tells the shell which interpreter to use to execute the script; the #! part works as a magic number and is called shebang; for a Python script this line could be something like #!/usr/bin/python


● Lines starting with # (except for the shebang) are comments and are completely ignored; that is why you did not see anything about “beautiful script” in the output of the program

● Comments can appear at the end of a line too, as seen in the last line: echo 'Hello, World!' # yeah! (the comment, as expected, did not appear in the output)

Edit the script

● Add other commands to the script and run it again!

● For example:

    cd /usr/local/lib
    ls -l
    cd ~
    echo
    echo "Disk space report for $HOME:"
    df -h .
    echo
    echo 'Done!'

Variation is a good thing

● Even if all that shell scripts could do was run simple commands, they would still be very useful

● But, as with any programming language, the Bash shell can use variables too

● We have seen variables before: $PATH, $HOME, $USER etc.

● Again, think of variables as little boxes where you can store data to use later

Variation is nice

● Normal shell variables can contain any kind of data, but only one piece (a string) at a time; that is what is called a scalar variable

● To create a new variable, just give it an initial value

● For example:

● my_var="blah blah" : now, there is a variable called $my_var, containing the text “blah blah”

It is good to vary

● Notice that when using the variable, it must start with $

● To create the variable, it must NOT start with $ – a bit confusing, but that’s life

● Example, modifying our first script:

    #!/bin/bash
    # This is my beautiful script
    echo 'Hello, World!' # yeah!
    var='blah blah'
    echo '1. And I now say $var'
    echo "2. And I now say $var"

● Notice that the first echo uses single quotes and the second one uses double quotes!

Important!

● See that the variable was (in one case) expanded and its value used in the command

● Variable expansion happens inside double quotes (and in unquoted text), but never inside single quotes

● An empty variable expands into nothing (which can lead to errors if, for example, it is used in a command that expects something)

● Variable names can contain letters, digits, and the underscore sign; no other kind of character, including space, is allowed
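A minimal sketch of the quoting behaviour described above. The variable names single, double, empty, and message are illustrative choices and do not come from the slides.

```shell
#!/bin/bash
var='blah blah'
single='1. And I now say $var'    # single quotes: no expansion, $var stays literal
double="2. And I now say $var"    # double quotes: $var is replaced by its value
echo "$single"
echo "$double"

empty=''                          # an empty variable expands into nothing
message="Value: [$empty]"
echo "$message"
```

The last line prints the brackets with nothing between them, which is the kind of silent gap that can break a command expecting an argument.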


● Variable names cannot start with digits though

● It is traditional to use all uppercase letters for variables that are not intended to change their value, i.e., constants

● For example: PI=3.14159265359

More on variables

● Let’s say you have a script with the following commands, which try to rename files by adding “1” or “b” to the end of the names, respectively:

    mv $file $file1
    mv $file $fileb

● The shell has no way, in those commands, to tell that the second instance of $file in each command is the same as the first, i.e., we mean them to be the same variable

● The shell will “think” that $file1 and $fileb are new variables and that will lead to errors


● To make a variable name unambiguous, it can be surrounded by {}

● Those commands can be corrected like this:

    mv $file ${file}1
    mv $file ${file}b

● Now, the shell can tell where the variable name starts and ends
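The difference can be checked without renaming anything. In this sketch the value report is a hypothetical file name used only as text, and the variables wrong and right are introduced just for the comparison.

```shell
#!/bin/bash
unset file1                # make sure no variable called file1 exists
file='report'              # hypothetical file name, used only as text here

wrong="$file1"             # read as the (unset) variable file1: expands to nothing
right="${file}1"           # the value of file, followed by a literal 1

echo "Without braces: '$wrong'"
echo "With braces:    '$right'"
```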

Quiz time!

Go to the course page and choose Quiz 33

Controlling the flow

● Programming is not just about commands and variables though!

● Its true power comes from controlling the flow of the commands

● In the scripts we have played with so far, the flow has been linear, one command executed right after the other, from start to finish

● But any program that is useful will need to make decisions and repeat actions a certain number of times


● Flow can be changed in two main ways:

● Branching

● Looping

● In branching, the program chooses which parts to run based on a condition (e.g., is a file a directory? is x greater than y? did the user type “yes”?)

● In looping, a set of statements is repeated a certain number of times… or forever! The number of repetitions can be fixed or it can depend on a condition

Looping

● An example of looping from everyday life: slicing carrots for soup

● Here is the algorithm:

    1) Get cutting board
    2) Get knife
    3) Place carrot on cutting board
        a) Lift knife
        b) Advance carrot
        c) Cut
        d) Go to a) unless there is no carrot left
    4) End

● In my everyday practice automating the CLI, I find looping to be more useful than branching

● Again: in looping, a set of statements is repeated a certain number of times… or forever! The number of repetitions can be fixed or it can depend on a condition

● For example, in “pseudocode”:

    using numbers x = 1 to 10, do:
        print “Number $x” to the screen
        create directory called $x
        inside directory $x, create regular files a$x to z$x

● This little set of instructions will run exactly ten times

● First time, $x will be equal to 1; second, equal to 2; and so on until 10.

The for loop

● The for loop is one of the ways to run a set of statements a number of times

● In its most traditional form, the for loop iterates over a list of strings

● Example, first in “pseudocode” again:

    1) Given the words orange, apple, lime, and pineapple
    2) Put the next word in variable x
    3) Print “Making $x juice.” to standard output
    4) Go back to 2 until no more words

● Any list can be used in the for loop, as long as it contains strings (be they numbers, characters, words, file names, whatever)

● Now, the same thing in actual Bash:

    #!/bin/bash
    for x in apple orange lime pineapple
    do
        echo "Making $x juice."
    done

● As you can see, the structure of the for loop is:

    for variable in list
    do
        …
        …
    done

● For clarity and better organization, indentation should always be used! Certain languages (Python) force you to do that…

● The shell expansions and wildcards we learned about earlier in the course work here and help create the lists. Examples:

    for x in {1..100}
    do
        echo "Number $x."
    done

    for x in /usr/bin/b*
    do
        stat $x
    done
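With brace expansion available, the earlier pseudocode (numbers 1 to 10, one directory per number, files a$x to z$x inside each) can be sketched in Bash. The scratch directory work_dir and the counters n_dirs and n_files are assumptions added for the demonstration, so that the sketch does not touch the current directory.

```shell
#!/bin/bash
# work_dir is a hypothetical scratch location created by mktemp
work_dir=$(mktemp -d)

for x in {1..10}
do
    echo "Number $x"
    mkdir "$work_dir/$x"                  # create directory called $x
    touch "$work_dir/$x"/{a..z}"$x"       # inside it, create files a$x to z$x
done

n_dirs=$(ls "$work_dir" | wc -l)          # how many directories were created
n_files=$(ls "$work_dir/3" | wc -l)       # how many files are in directory 3
echo "Directories: $n_dirs; files in directory 3: $n_files"

rm -r "$work_dir"                         # clean up
```

The {a..z} expansion produces the 26 file names in a single touch command, so no inner loop is needed.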

● The output of commands can be used here as well; for example:

    for x in `cat some_file`
    do
        echo Word found: $x
    done

● Notice that we must use the ` and ` quotes (backticks): this is called command substitution; it can also be done using $( command… )

● Command substitution can, of course, be used outside of for loops:

    echo Path to ls is $(which ls)

● Will print (the exact path depends on the system): Path to ls is /bin/ls
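A small sketch of both uses of command substitution, in a loop list and in a variable assignment. The word list is hypothetical and is written to a temporary file so that the example is self-contained; count and n_lines are illustrative names.

```shell
#!/bin/bash
# Hypothetical word list, written to a temporary file for the demonstration
some_file=$(mktemp)
printf 'orange apple\nlime\n' > "$some_file"

count=0
for x in $(cat "$some_file")       # $( ... ) form of command substitution
do
    echo "Word found: $x"
    count=$(( count + 1 ))
done

n_lines=$(wc -l < "$some_file")    # store a command's output in a variable
echo "Words: $count; lines: $n_lines"

rm "$some_file"
```

The file has two lines but the loop runs three times: the substituted output is split into words, not lines.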

Hint!

● Pretty much everything that can be done in the shell script can also be done on the prompt of the CLI! No need to always create a script.

● One of our previous examples, first as written in the script and then written directly in the prompt as a so-called “one-liner”:

    for x in {1..100}
    do
        echo "Number $x."
    done

    for x in {1..100}; do echo "Number $x."; done

Now you do it!

Go to the course site and enter Practical Exercise 32

Follow the instructions to answer the questions in the exercise

Remember: in a PE, you should do things in practice before answering the question!

Other ways to loop

● The for loop iterates over a list of items, but there are other ways to be repetitive

● Another loop command is the while loop

● The while loop keeps going while a certain condition is satisfied

● For example, in “pseudocode”:

    While x is less than 100, do:
        Print “x = $x” to the screen
        Make $x equal to $x + 5

● Now we do not give the construct a list of things, but a condition that determines whether the loop should go on

The while loop

● Now, the same thing in actual Bash:

    #!/bin/bash
    x=0
    while [[ $x -lt 100 ]]
    do
        echo "x = $x"
        x=$(( x + 5 ))
    done

● As you can see, the structure of the while loop is:

    while [[ condition ]]
    do
        …
        …
    done

Wait a second...

● A lot of stuff was happening in that last script. Let’s dissect it…

[[ $x -lt 100 ]] : this is a test, in the form [[ condition ]], which returns true or false

● -lt is one of the comparison operators, meaning less than

● Other such tests for integers are: -eq (equal?), -ne (different?), -le (equal or less than?), -ge (equal or greater than?), and -gt (greater than?)
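The six integer comparisons can be tried side by side. In this sketch the values of a and b are arbitrary, and the results variable is an illustrative helper that collects the names of the tests that returned true. The && used here runs the command on its right only when the test on its left is true.

```shell
#!/bin/bash
a=7
b=12
results=''
[[ $a -eq $b ]] && results+='eq '   # equal?
[[ $a -ne $b ]] && results+='ne '   # different?
[[ $a -lt $b ]] && results+='lt '   # less than?
[[ $a -le $b ]] && results+='le '   # equal or less than?
[[ $a -gt $b ]] && results+='gt '   # greater than?
[[ $a -ge $b ]] && results+='ge '   # equal or greater than?
echo "True for a=$a and b=$b: $results"
```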

● By the way, shell arithmetic only works on integers, no decimal points anywhere! (use the bc program if you want real division; e.g. bc -l <<< 10/3)

x=$(( x + 5 ))

● You should remember this from the beginning of the course, when we played with it as a crude calculator: arithmetic expansion

● It takes the form $(( … )), and the variables inside do not need the $ (Bash accepts them with or without it)
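A short sketch of arithmetic expansion. In Bash, a variable name inside $(( … )) is evaluated whether or not it carries the $, and the result is always an integer; the variable names below are illustrative.

```shell
#!/bin/bash
x=10
with_dollar=$(( $x + 5 ))      # the $ inside the expansion is accepted
without_dollar=$(( x + 5 ))    # and so is the bare variable name
quotient=$(( x / 3 ))          # integer arithmetic: the fractional part is dropped
echo "$with_dollar $without_dollar $quotient"
```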

File tests

● Another very important kind of test is the file test; for example:

[[ $x -nt $y ]] : in this test we will have true if the file whose name is stored in $x is newer than the one whose name is in $y

● Some other such tests for files are:

    -e  : exists?
    -f  : regular file?
    -d  : directory?
    -L  : symbolic link?
    -s  : lowercase s, size > 0?
    -r  : readable?
    -w  : lowercase w, writable?
    -x  : lowercase x, executable?
    -O  : uppercase O, owned by the current user?
    -G  : owned by the current user's group?
    -ot : older than?
    -ef : same file, i.e., hard link?
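Several of these tests can be tried on scratch files. Everything in this sketch is created inside a temporary directory; the file names (empty, full, link, hard, old, missing) are hypothetical, and report is an illustrative variable that collects the tests that returned true. The && runs the command on its right only when the test on its left is true.

```shell
#!/bin/bash
# Scratch files with hypothetical names, created only for this demonstration
dir=$(mktemp -d)
touch "$dir/empty"                      # regular file, size 0
echo 'data' > "$dir/full"               # regular file, size > 0
ln -s "$dir/full" "$dir/link"           # symbolic link to full
ln "$dir/full" "$dir/hard"              # hard link to full
touch -t 200001010000 "$dir/old"        # file with a modification date in 2000

report=''
[[ -d $dir ]]                   && report+='d '    # directory?
[[ -f $dir/full ]]              && report+='f '    # regular file?
[[ -s $dir/full ]]              && report+='s '    # size > 0?
[[ -s $dir/empty ]]             && report+='s-empty '   # false: size is 0
[[ -L $dir/link ]]              && report+='L '    # symbolic link?
[[ $dir/full -ef $dir/hard ]]   && report+='ef '   # same file?
[[ $dir/full -nt $dir/old ]]    && report+='nt '   # newer than?
[[ -e $dir/missing ]]           && report+='e '    # false: does not exist
echo "Tests that returned true: $report"

rm -r "$dir"                            # clean up
```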

Adding up

● Finally, the arithmetic operators:

    +  : addition
    -  : subtraction
    *  : multiplication
    /  : integer division (5 / 2 = 2 or 7 / 3 = 2)
    ** : exponentiation
    %  : modulo, remainder (5 % 2 = 1 or 8 % 3 = 2)
    ++ : add one to variable: $((++x)) equals $((x = x+1))
    -- : subtract one from variable: $((--x)) equals $((x = x-1))

Pages 483 to 488 of the book “The Linux Command Line” list and explain these and many other operators for such expressions. Check it out!
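A brief sketch that exercises some of these operators; the variable names pre, after_inc, pre_dec, and summary are illustrative.

```shell
#!/bin/bash
echo "5 / 2   = $(( 5 / 2 ))"      # integer division
echo "5 % 2   = $(( 5 % 2 ))"      # remainder
echo "2 ** 10 = $(( 2 ** 10 ))"    # exponentiation

x=5
pre=$(( ++x ))        # x becomes 6 and the expression yields the new value
after_inc=$x          # x itself was changed by the ++
pre_dec=$(( --x ))    # x goes back to 5

summary="$(( 5 / 2 )) $(( 5 % 2 )) $(( 2 ** 10 )) $pre $after_inc $pre_dec"
echo "$summary"
```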

Quiz time!

Go to the course page and choose Quiz 34

The until loop

● The until loop is very similar to the while loop… just inverted: the loop continues as long as the test condition is NOT satisfied

● Using the same example as earlier, with exactly the same results, but now using an until loop instead:

    #!/bin/bash
    x=0
    until [[ $x -gt 95 ]]
    do
        echo "x = $x"
        x=$(( x + 5 ))
    done

● As you can see, we had to adjust the algorithm a little: e.g., if we had kept 100 in the test, the output would have been different

Branching
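To see why the test value had to change, the while version can be compared with an until version that keeps 100 in the test. This is a sketch for comparison only; last_while and last_until are illustrative variables that record the last value each loop would have printed.

```shell
#!/bin/bash
x=0
while [[ $x -lt 100 ]]        # runs while x is less than 100
do
    last_while=$x
    x=$(( x + 5 ))
done

x=0
until [[ $x -gt 100 ]]        # 100 kept in the test, as a deliberate variation
do
    last_until=$x
    x=$(( x + 5 ))
done

echo "while stops after x = $last_while"
echo "until (with 100) stops after x = $last_until"
```

With 100 in the until test, the loop body runs one extra time, for x = 100.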

● While looping is repeating a certain set of statements, branching is choosing which statements to run, and not running others at all

● This is usually based on a condition or choice

● A crude example from everyday life:

    If it is raining outside:
        Bring umbrella
    Else, if it is sunny outside:
        Bring sunglasses
    Else:
        Bring both the umbrella and the sunglasses

● In this situation, only one of the actions will be performed, depending on the test condition (observing the weather)

● Exactly the same concept is used in programming

● For example, the following script:

    #!/bin/bash
    VAR=2
    if [[ $(( VAR % 2 )) -eq 0 ]]
    then
        echo "Number $VAR is even."
    else
        echo "Number $VAR is odd."
    fi

● The main thing here is the if...then...else construct, which has the structure:

    if [[ test ]]; then commands; else commands; fi

if...then...else if...then...

● But the if construct is more powerful than that: one can test for multiple conditions

    if test; then
        command(s)…
    elif test; then
        command(s)…
    else
        command(s)…
    fi

● The elif and else parts are optional

● The elif (meaning “else if”) can be repeated as many times as needed, providing for an unlimited number of situations

● The loop constructs started with do and ended with done; the branching constructs start with their name (if or case) and end with the reverse of the name (fi or esac)


● Again, we have the test part, as seen with the while construct:

if [[ $(( VAR % 2 )) -eq 0 ]]

● Let's dissect that...

● The [[ and ]] delimit the test

● The $(( and )) delimit an arithmetic expansion

● In this case, the arithmetic operation is the modulo, or remainder: VAR % 2 (i.e., remainder of dividing $VAR by 2)

● The -eq tests for equality:

● Is the remainder of $VAR divided by 2 equal to zero?

    #!/bin/bash
    VAR=37

    if [[ $VAR -ge 0 ]] && [[ $VAR -lt 20 ]]
    then
        echo "Number in VAR: [0, 20)"
    elif [[ $VAR -ge 20 ]] && [[ $VAR -lt 40 ]]
    then
        echo "Number in VAR: [20, 40)"
    elif [[ $VAR -ge 40 ]] && [[ $VAR -lt 60 ]]
    then
        echo "Number in VAR: [40, 60)"
    elif [[ $VAR -ge 60 ]] && [[ $VAR -lt 80 ]]
    then
        echo "Number in VAR: [60, 80)"
    else
        echo "Number in VAR: [80, inf)"
    fi

Logical operators

● As we’ve seen in the example, we can use more than one test

● Different tests are combined using logical operators

● The main ones are:

● && : logical and

● || : logical or

● ! : not, negates what comes after it


● The && means that both tests must be satisfied for the whole condition to be considered true; for example: [[ $VAR -ge 0 ]] && [[ $VAR -lt 20 ]]

● Here, the whole test will be considered true only if the value of $VAR is >= 0 and < 20

● The || means that at least one of the tests must be satisfied; e.g.: [[ $VAR -ge 0 ]] || [[ $VAR -lt 20 ]]

● Now, $VAR can be either >= 0 or < 20 (or both!); note that, for this particular pair of tests, the combined condition is true for every integer
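The three operators can be compared on a single value. In this sketch the value 25 and the variables and_result, or_result, and not_result are illustrative choices.

```shell
#!/bin/bash
VAR=25

# && : true only if both tests are true (25 is >= 0, but it is not < 20)
if [[ $VAR -ge 0 ]] && [[ $VAR -lt 20 ]]
then
    and_result='true'
else
    and_result='false'
fi

# || : true if at least one of the tests is true
if [[ $VAR -ge 0 ]] || [[ $VAR -lt 20 ]]
then
    or_result='true'
else
    or_result='false'
fi

# ! : negates the test that comes after it
if ! [[ $VAR -lt 20 ]]
then
    not_result='true'
else
    not_result='false'
fi

echo "and=$and_result or=$or_result not=$not_result"
```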

Quiz time!

Go to the course page and choose Quiz 35

Reading user input

● So far, values in our variables have come from inside the script itself

● That is not good; if we want to use different values, we have to edit the script!

● The read command allows us to ask the user for something, and put it in a variable

read -p "Enter an integer: " var_name

● The -p option allows us to give a prompt: the text of the question presented to the user

● The variable(s) where the answer(s) will be recorded is (are) put last (in this case, it is var_name, but of course it can be any valid variable name)
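A minimal sketch of read with more than one variable. So that it runs without waiting for the keyboard, a here-string (<<<) stands in for what the user would type; the variable names first and rest and the input text are hypothetical.

```shell
#!/bin/bash
# The here-string replaces keyboard input, so Bash does not display the
# -p prompt here (it is only shown when input comes from a terminal).
read -p "Enter some words: " first rest <<< "alpha beta gamma"

echo "first = $first"     # the first word typed
echo "rest  = $rest"      # the last variable receives everything that is left
```

Removing the here-string turns this back into an interactive prompt.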

    read -p "Please provide a file name: " FILE

    if [ -e "$FILE" ]; then
        if [ -f "$FILE" ]; then
            echo "$FILE is a regular file."
        elif [ -d "$FILE" ]; then
            echo "$FILE is a directory."
        fi
        if [ -r "$FILE" ]; then
            echo "$FILE is readable."
        fi
        if [ -w "$FILE" ]; then
            echo "$FILE is writable."
        fi
        if [ -x "$FILE" ]; then
            echo "$FILE is executable/searchable."
        fi
    else
        echo "$FILE does not exist"
    fi

Now you do it!

Go to the course site and enter Practical Exercise 33

Follow the instructions to answer the questions in the exercise

Remember: in a PE, you should do things in practice before answering the question!

Recap

● Shell programming at its simplest consists of saving a set of commands in a file

● But the shell can also use regular programming devices such as:

● Variables

● Branching constructs

● Looping constructs

● Program flow control is usually performed using a condition or test as the deciding factor

● The for, while, and until constructs give us loops based on lists (for) or a condition (while and until)


● The if construct allows us to branch the code, executing certain parts of the script depending on certain conditions

● There is another branching construct that we have not seen: case

● We have just scratched the programming surface here

● There are many other concepts and techniques that are essential to more serious programming (in particular, functions, which are like mini-programs that can be reused)

● What we have seen today is already enough to give a lot of options in dealing with files and data on the CLI
