● Log into the Moodle site
● Enter the “Lecture 11” area (button 12)
● At 14:00, choose “Daily Quiz 9”
● Answer the multiple choice quiz (you have 10 min to finish)
Automating the command line interface with scripts: programming basics
J. M. P. Alves
Laboratory of Genomics & Bioinformatics in Parasitology
Department of Parasitology, ICB, USP
Repetitive actions
● It is often the case that you frequently run a certain set of individual commands to perform a bigger task
● For example: maybe you have a large number of files and you need to run several commands on each and every file
● Of course, being a programmer’s system, a Unix-like system such as Linux makes it easy to automate such tasks
● The automation is performed by saving commands in a file that we usually call a script or program
● Except for the simplest (but still very useful) scripts, certain techniques from computer programming can be used
J.M.P. Alves 3 / 53 BMP0260 / ICB5765 / IBI5765
Computer programming
● Obviously, computer programming is a vast subject and we could spend a semester studying it and still not cover most things
● So, in one lecture only, we will see the basics of the basics, which are still very useful for performing some amazing feats on the command-line interface that are impossible (or at least very difficult to achieve) in a graphical shell and/or very boring/error-prone to do manually (in the CLI or otherwise)
● But… what is a computer program anyway?
Computer programming
● A computer program is a set of instructions designed to perform a certain task… using a computer
● Programs are written in formally defined programming languages
● Initially, the program is written by the human as source code; later, it gets translated to machine code (zeros and ones) specific to the kind of computer processor and operating system one is using
Programming languages
● There are tons of programming languages available: Java, C/C++, Python, Perl, Fortran, COBOL, assembly, Ada etc. etc. etc.
● Some are better suited to some circumstances, while others are more efficient in other situations
● But, overall, if a language is Turing-complete (or computationally universal), then it can do whatever any other such language can do
Programming languages
● In practice, people use whatever they already know
● The shell is special in that it is an interface as well as a scripting language interpreter
● Huh? Scripting language? Interpreter?
● Well, there are 10 (in binary) kinds of languages…
● Interpreted or compiled (although the distinction can be somewhat arbitrary and fuzzy in some cases…)
Compiled vs. interpreted
● Compiling is the act of transforming source code into (usually) machine code
● A language like C++, for example, is compiled: one cannot run the program right after writing the source code file; it is first necessary to compile the program, and then run the generated executable file
● A language like Python, on the other hand, is interpreted: one runs the text file with the source code directly, without the need of first generating a compiled file
● Of course, I am simplifying brutally here… But in practice, from the programmer’s point of view, that is a reasonable approximation
The shell language
● As mentioned above, the shell language is interpreted
● Commands are written in a file (called a shell script) and the shell reads the file and executes each command it finds
● How to write and run a script:
1. Write the script
2. Make the script executable
3. Put the script somewhere the shell can find it
● Steps 2 and 3 are optional, since there are ways to run the script without doing them, but they are a big help if you need to run the script frequently
The shell language
● Pretty much anything that can be run on the CLI can also be run from a shell script
● Besides the regular commands we have learned so far, the shell also uses, among other things, control constructs and variables
● Script file format:
#!/bin/bash
# This is my beautiful script
echo 'Hello, World!' # yeah!
● This script follows an ancient tradition of computer programming: the “hello, world” first program
Remember the three steps
● How to write a script:
1. Write the script
2. Make the script executable
3. Put the script somewhere the shell can find it
● Here it goes again:
#!/bin/bash
# This is my beautiful script
echo 'Hello, World!' # yeah!
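The three steps can be sketched on the command line roughly as follows. This is only a minimal sketch under stated assumptions: the scratch directory created with mktemp and the file name script_0 are chosen for illustration, and in practice the script would more likely go into a directory that is already listed in $PATH.

```shell
#!/bin/bash
# Scratch area for the demonstration (hypothetical location)
workdir=$(mktemp -d)
mkdir -p "$workdir/bin"

# 1. Write the script
cat > "$workdir/bin/script_0" <<'EOF'
#!/bin/bash
# This is my beautiful script
echo 'Hello, World!' # yeah!
EOF

# 2. Make the script executable
chmod +x "$workdir/bin/script_0"

# 3. Put the script somewhere the shell can find it
#    (here: add its directory to the front of $PATH)
PATH="$workdir/bin:$PATH"

# Now the script can be run by name, like any other command
output=$(script_0)
echo "$output"
```

Changing $PATH this way only lasts for the current shell session; it is used here to keep the sketch self-contained.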
Remember the three steps
● If you want to run the script without making it executable and moving it elsewhere (usually some directory in the $PATH), you can run:
bash script_0
● Let’s dissect our first script
#!/bin/bash : tells the system which interpreter to use to execute the script; the #! part works as a magic number and is called the shebang; for a Python script this line could be something like #!/usr/bin/python
Remember the three steps
● Lines starting with # (except for the shebang) are comments and are completely ignored; that is why you did not see anything about “beautiful script” in the output of the program
● Comments can appear at the end of a line too, as seen in the last line: echo 'Hello, World!' # yeah! (the comment, as expected, did not appear in the output)
Edit the script
● Add other commands to the script and run it again!
● For example:
cd /usr/local/lib
ls -l
cd ~
echo
echo "Disk space report for $HOME:"
df -h .
echo
echo 'Done!'
Variation is a good thing
● Even if all that shell scripts could do was run simple commands, they would still be very useful
● But, as with any programming language, the Bash shell can use variables too
● We have seen variables before: $PATH, $HOME, $USER etc.
● Again, think of variables as little boxes where you can store data to use later
Variation is nice
● Normal shell variables can contain any kind of data, but only one piece (a string) at a time; that is what is called a scalar variable
● To create a new variable, just give it an initial value (with no spaces around the = sign)
● For example:
● my_var="blah blah" : now, there is a variable called $my_var, containing the text “blah blah”
It is good to vary
● Notice that when using the variable, it must start with $
● To create the variable, it must NOT start with $ – a bit confusing, but that’s life
● Example, modifying our first script:
#!/bin/bash
# This is my beautiful script
echo 'Hello, World!' # yeah!
var='blah blah'
echo '1. And I now say $var'
echo "2. And I now say $var"
● Notice that the first echo uses single quotes and the second one uses double quotes!
Important!
● See that the variable was (in one case) expanded and its value used in the command
● Variable expansion happens inside double quotes (and in unquoted text), but not inside single quotes
● An empty variable expands into nothing (which can lead to errors if, for example, it is used in a command that expects something)
● Variable names can contain letters, digits, and the underscore sign; no other kind of character, including space, is allowed
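The effect of quoting on expansion can be checked with a small sketch like the one below. The variable names single, double, unquoted, empty_var, and bracketed are hypothetical, chosen only for this illustration.

```shell
#!/bin/bash
var='blah blah'

single='1. And I now say $var'   # single quotes: $var is kept literally
double="2. And I now say $var"   # double quotes: $var is expanded
unquoted=$var                    # no quotes: $var is expanded as well

empty_var=''
bracketed="[$empty_var]"         # an empty variable expands into nothing

echo "$single"
echo "$double"
echo "$unquoted"
echo "$bracketed"
```

Only the single-quoted line keeps the text $var as typed; the bracketed line shows that nothing at all is inserted for an empty variable.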
Important!
● Variable names cannot start with digits though
● It is traditional to use all uppercase letters for variables that are not intended to change their value, i.e., constants
● For example: PI=3.14159265359
More on variables
● Let’s say you have a script with the following commands, which try to rename files by adding “1” or “b” to the end of the names, respectively:
mv $file $file1
mv $file $fileb
● The shell has no way, in those commands, to tell that the second instance of $file in each command is the same as the first, i.e., we mean them to be the same variable
● The shell will “think” that $file1 and $fileb are new variables and that will lead to errors
More on variables
● To make a variable name unambiguous, it can be surrounded by {}
● Those commands can be corrected like this:
mv $file ${file}1
mv $file ${file}b
● Now, the shell can tell where the variable name starts and ends
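A minimal sketch of the difference, assuming a throwaway file created with mktemp and touch (the names workdir, sample.txt, without_braces, and with_braces are hypothetical):

```shell
#!/bin/bash
workdir=$(mktemp -d)
file="$workdir/sample.txt"
touch "$file"

without_braces="$file1"    # the shell looks for a variable named file1 (unset here)
with_braces="${file}1"     # the value of file, followed by a literal 1

echo "Without braces: '$without_braces'"
echo "With braces:    '$with_braces'"

# The corrected rename from the slide, with quotes added for safety
mv "$file" "${file}1"
```

Without the braces the new name expands into nothing, which is why the original mv commands fail.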
Quiz time!
Go to the course page and choose Quiz 33
Controlling the flow
● Programming is not just about commands and variables though!
● Its true power comes from controlling the flow of the commands
● In the scripts we have played with so far, the flow has been linear, one command executed right after the other, from start to finish
● But any program that is useful will need to make decisions and repeat actions a certain number of times
Controlling the flow
● Flow can be changed in two main ways:
● Branching
● Looping
● In branching, the program chooses which parts to run based on a condition (e.g., is a file a directory? is x greater than y? did the user type “yes”?)
● In looping, a set of statements is repeated a certain number of times… or forever! The number of repetitions can be fixed or it can depend on a condition
Looping
● An example of looping from everyday life: slicing carrots for soup
● Here is the algorithm:
1) Get cutting board
2) Get knife
3) Place carrot on cutting board
   a) Lift knife
   b) Advance carrot
   c) Cut
   d) Go to a) unless there is no carrot left
4) End
Looping
● In my everyday practice automating the CLI, I find looping to be of more use
● Again: in looping, a set of statements is repeated a certain number of times… or forever! The number of repetitions can be fixed or it can depend on a condition
● For example, in “pseudocode”:
using numbers x = 1 to 10, do:
    print “Number $x” to the screen
    create directory called $x
    inside directory $x, create regular files a$x to z$x
● This little set of instructions will run exactly ten times
● First time, $x will be equal to 1; second, equal to 2; and so on until 10.
The for loop
● The for loop is one of the ways to run a set of statements a number of times
● In its most traditional form, the for loop iterates over a list of strings
● Example, first in “pseudocode” again:
1) Given the words orange, apple, lime, and pineapple
2) Put the next word in variable x
3) Print “Making $x juice.” to standard output
4) Go back to 2 until no more words
● Any list can be used in the for loop, as long as it contains strings (be they numbers, characters, words, file names, whatever)
The for loop
● Now, the same thing in actual Bash:
#!/bin/bash
for x in apple orange lime pineapple
do
    echo "Making $x juice."
done
● As you can see, the structure of the for loop is:
for variable in list
do
    …
    …
done
● For clarity and better organization, see that indentation should always be used! Certain languages (Python) force you to do that…
The for loop
● The shell expansions and wildcards we learned about earlier in the course work here and help create the lists. Examples:
for x in {1..100}
do
    echo "Number $x."
done

for x in /usr/bin/b*
do
    stat $x
done
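With brace expansion available, the pseudocode from the Looping slide (numbers 1 to 10, one directory per number, files a$x to z$x inside each) could be written roughly like this. It is a sketch, not the lecture's own solution; the scratch directory workdir is an assumption made so that nothing is created in the current directory.

```shell
#!/bin/bash
workdir=$(mktemp -d)    # hypothetical scratch directory

for x in {1..10}
do
    echo "Number $x"
    mkdir "$workdir/$x"
    # {a..z} expands to the 26 letters, so this creates a$x, b$x, ... z$x
    touch "$workdir/$x"/{a..z}"$x"
done
```

After the loop finishes, each of the ten directories holds 26 empty regular files, for example 3/a3 through 3/z3.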
The for loop
● The output of commands can be used here as well; for example:
for x in `cat some_file`
do
    echo Word found: $x
done
● Notice that we must use the ` and ` quotes (backticks): this is called command substitution; it can also be done using $( command… )
● Command substitution can, of course, be used outside of for loops:
echo Path to ls is $(which ls)
● Will print something like (the exact path depends on the system): Path to ls is /bin/ls
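A small self-contained sketch of both uses of command substitution follows. The file contents are made up for the example, and the shell builtin command -v is swapped in for which here only because it is available even on systems where which is not installed.

```shell
#!/bin/bash
some_file=$(mktemp)                       # hypothetical input file
printf 'orange apple\nlime\n' > "$some_file"

count=0
# The output of cat is split into words; each word becomes one value of x
for x in $(cat "$some_file")
do
    echo "Word found: $x"
    count=$(( count + 1 ))
done
echo "Total words: $count"

# Command substitution outside of a loop
ls_path=$(command -v ls)
echo "Path to ls is $ls_path"
```

Note that the loop runs once per word, not once per line: the first line of the file contributes two iterations.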
Hint!
● Pretty much everything that can be done in the shell script can also be done on the prompt of the CLI! No need to always create a script.
● One of our previous examples, first as written in the script and then written directly in the prompt as a so-called “one-liner”:
for x in {1..100}
do
    echo "Number $x."
done

for x in {1..100}; do echo "Number $x."; done
Now you do it!
Go to the course site and enter Practical Exercise 32
Follow the instructions to answer the questions in the exercise
Remember: in a PE, you should do things in practice before answering the question!
Other ways to loop
● The for loop iterates over a list of items, but there are other ways to be repetitive
● Another loop command is the while loop
● The while loop keeps going while a certain condition is satisfied
● For example, in “pseudocode”:
While x is less than 100, do:
    Print “x = $x” to the screen
    Make $x equal to $x + 5
● Now we do not give the construct a list of things, but a condition that determines whether the loop should go on
The while loop
● Now, the same thing in actual Bash:
#!/bin/bash
x=0
while [[ $x -lt 100 ]]
do
    echo "x = $x"
    x=$(( x + 5 ))
done
● As you can see, the structure of the while loop is:
while [[ condition ]]
do
    …
    …
done
Wait a second...
● A lot of stuff was happening in that last script. Let’s dissect it…
[[ $x -lt 100 ]] : this is a test, in the form [[ condition ]], which returns true or false
● -lt is one of the comparison operators, meaning less than
● Other such tests for integers are: -eq (equal?), -ne (different?), -le (equal or less than?), -ge (equal or greater than?), and -gt (greater than?)
Wait a second...
● By the way, shell arithmetic only works on integers, no decimal points anywhere! (use the bc program if you want real division; e.g. bc -l <<< 10/3)
x=$(( x + 5 ))
● You should remember this from the beginning of the course, when we played with it as a crude calculator: arithmetic expansion
● It takes the form $(( … )), and the variables inside do not need the $
File tests
● Another very important kind of test is the file test; for example:
[[ $x -nt $y ]] : in this test we will have true if the file whose name is stored in $x is newer than the one whose name is in $y
● Some other such tests for files are:
-ot : older than?
-ef : same file, i.e., hard link?
-d : directory?
-f : regular file?
-L : symbolic link?
-e : exists?
-G : owned by your group?
-r : readable?
-w : lowercase w, writable?
-x : lowercase x, executable?
-s : lowercase s, size > 0?
-O : uppercase O, owned by your user?
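The two-file tests can be tried out with a sketch along these lines. It borrows the if construct covered in the Branching slides, and all file names (old.txt, new.txt, link_to_old.txt) as well as the result variables are hypothetical.

```shell
#!/bin/bash
workdir=$(mktemp -d)
old="$workdir/old.txt"
new="$workdir/new.txt"
link="$workdir/link_to_old.txt"

touch -t 200001010000 "$old"   # modification time set to 1 January 2000
touch "$new"                   # modification time is now
ln "$old" "$link"              # hard link: a second name for the same file

if [[ $new -nt $old ]]; then newer="yes"; else newer="no"; fi
if [[ $old -ot $new ]]; then older="yes"; else older="no"; fi
if [[ $old -ef $link ]]; then same_file="yes"; else same_file="no"; fi
if [[ -s $new ]]; then has_content="yes"; else has_content="no"; fi

echo "new.txt newer than old.txt?        $newer"
echo "old.txt older than new.txt?        $older"
echo "old.txt same file as its link?     $same_file"
echo "new.txt has size greater than 0?   $has_content"
```

Since touch creates empty files, the last test is false here; writing anything into new.txt would make it true.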
Adding up
● Finally, the arithmetic operators:
+ : addition
- : subtraction
* : multiplication
/ : integer division (5 / 2 = 2 or 7 / 3 = 2)
** : exponentiation
% : modulo, remainder (5 % 2 = 1 or 8 % 3 = 2)
++ : add one to variable: $((++x)) equals $((x = x+1))
-- : subtract one from variable: $((--x)) equals $((x = x-1))
Pages 483 to 488 of the book “The Linux Command Line” list and explain these and many other operators for such expressions. Check it out!
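A few of these operators in action, as a quick sketch (the variable names are chosen only for this illustration):

```shell
#!/bin/bash
quotient=$(( 7 / 3 ))      # integer division: the fractional part is dropped
remainder=$(( 8 % 3 ))     # what is left over after dividing 8 by 3
power=$(( 2 ** 10 ))       # 2 raised to the 10th power

x=5
incremented=$(( ++x ))     # x itself is changed to 6, and that new value is returned

y=5
decremented=$(( --y ))     # y itself is changed to 4

echo "7 / 3   = $quotient"
echo "8 % 3   = $remainder"
echo "2 ** 10 = $power"
echo "++x     = $incremented (x is now $x)"
echo "--y     = $decremented (y is now $y)"
```

The ++ and -- operators differ from the others in that they modify the variable in addition to producing a value.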
Quiz time!
Go to the course page and choose Quiz 34
The until loop
● The until loop is very similar to the while loop… just inverted: the loop continues as long as the test condition is NOT satisfied
● Using the same example as earlier, with exactly the same results, but now using an until loop instead:
#!/bin/bash
x=0
until [[ $x -gt 95 ]]
do
    echo "x = $x"
    x=$(( x + 5 ))
done
● As you can see, we had to adjust the algorithm a little: e.g., if we had kept 100 in the test, the output would have been different
Branching
● While looping is repeating a certain set of statements, branching is choosing which statements to run, and not running others at all
● This is usually based on a condition or choice
● A crude example from everyday life:
If it is raining outside:
    Bring umbrella
If it is sunny outside:
    Bring sunglasses
Else:
    Bring both the umbrella and the sunglasses
● In this situation, only one of the actions will be performed, depending on the test condition (observing the whether)
Branching
● Exactly the same concept is used in programming
● For example, the following script:
#!/bin/bash
VAR=2
if [[ $(( VAR % 2 )) -eq 0 ]]
then
    echo "Number $VAR is even."
else
    echo "Number $VAR is odd."
fi
● The main thing here is the if...then...else construct, which has the structure:
if [[ test ]]; then commands; else commands; fi
if...then...else if...then...
● But the if construct is more powerful than that: one can test for multiple conditions
if test; then
    command(s)…
elif test; then
    command(s)…
else
    command(s)…
fi
● The elif and else parts are optional
● The elif (meaning “else if”) can be repeated as many times as needed, providing for an unlimited number of situations
● The loop constructs started with do and ended with done; the branching constructs start with their name (if or case) and end with the reverse of the name (fi or esac)
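The everyday weather example from the Branching slide maps onto if, elif, and else roughly as sketched below. The variable names weather and item are hypothetical, and the string comparison with == inside [[ ]] is an addition that the slides do not cover.

```shell
#!/bin/bash
weather="sunny"    # hypothetical value; try "raining" or anything else

if [[ $weather == "raining" ]]
then
    item="the umbrella"
elif [[ $weather == "sunny" ]]
then
    item="the sunglasses"
else
    item="both the umbrella and the sunglasses"
fi

echo "Bring $item."
```

Only one of the three branches runs: the first test that succeeds wins, and else is used when none of them does.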
if...then...else if...then...
● Again, we have the test part, as seen with the while construct:
if [[ $(( VAR % 2 )) -eq 0 ]]
● Let's dissect that...
● The [[ and ]] delimit the test
● The $(( and )) delimit an arithmetic expansion
● In this case, the arithmetic operation is the modulo, or remainder: VAR % 2 (i.e., remainder of dividing $VAR by 2)
● The -eq tests for equality:
● Is the remainder of $VAR divided by 2 equal to zero?
if...then...else if...then...
#!/bin/bash
VAR=37

if [[ $VAR -ge 0 ]] && [[ $VAR -lt 20 ]]
then
    echo "Number in VAR: [0, 20)"
elif [[ $VAR -ge 20 ]] && [[ $VAR -lt 40 ]]
then
    echo "Number in VAR: [20, 40)"
elif [[ $VAR -ge 40 ]] && [[ $VAR -lt 60 ]]
then
    echo "Number in VAR: [40, 60)"
elif [[ $VAR -ge 60 ]] && [[ $VAR -lt 80 ]]
then
    echo "Number in VAR: [60, 80)"
else
    echo "Number in VAR: [80, inf)"
fi
Logical operators
● As we’ve seen in the example, we can use more than one test
● Different tests are combined using logical operators
● The main ones are:
● && : logical and
● || : logical or
● ! : not, negates what comes after it
Logical operators
● The && means that both tests must be satisfied for the whole condition to be considered true; for example: [[ $VAR -ge 0 ]] && [[ $VAR -lt 20 ]]
● Here, the whole test will be considered true only if the value of $VAR is >= 0 and < 20
● The || means that at least one of the tests must be satisfied; e.g.: [[ $VAR -ge 0 ]] || [[ $VAR -lt 20 ]]
● Now, $VAR can be either >= 0 or < 20 (or both!), which means this particular test is true for any integer
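All three logical operators, including the ! that has not appeared in an example yet, can be tried with a sketch such as this one. The result variables and the particular ranges tested are assumptions made for the illustration.

```shell
#!/bin/bash
VAR=37

# && : true only if both tests are true
if [[ $VAR -ge 20 ]] && [[ $VAR -lt 40 ]]
then
    and_result="true"
else
    and_result="false"
fi

# || : true if at least one test is true
if [[ $VAR -lt 0 ]] || [[ $VAR -ge 80 ]]
then
    or_result="true"
else
    or_result="false"
fi

# ! : negates the test that follows it
if ! [[ $VAR -eq 0 ]]
then
    not_result="true"
else
    not_result="false"
fi

echo "$VAR is in [20, 40):          $and_result"
echo "$VAR is below 0 or 80 and up: $or_result"
echo "$VAR is not zero:             $not_result"
```

With VAR=37, both tests of the && pair succeed, neither test of the || pair does, and the negated equality test succeeds.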
Quiz time!
Go to the course page and choose Quiz 35
Reading user input
● So far, values in our variables have come from inside the script itself
● That is not good; if we want to use different values, we have to edit the script!
● The read command allows us to ask the user for something, and put it in a variable
read -p "Enter an integer: " var_name
● The -p option allows us to give a prompt: the text of the question presented to the user
● The variable(s) where the answer(s) will be recorded is (are) put last (in this case, it is var_name, but of course it can be any valid variable name)
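Here is a minimal sketch of read filling one variable and then two. To keep it runnable without anyone typing, a here-string (<<<) stands in for the user's answer; in real use, leave that part out and read will wait for the keyboard. The names first, second, and parity are hypothetical.

```shell
#!/bin/bash
# One variable: the whole answer goes into var_name
read -p "Enter an integer: " var_name <<< "42"

# Two variables: the answer is split into words, one per variable
read -p "Enter two words: " first second <<< "hello world"

# Use the value that was read, as in the even/odd example
if [[ $(( var_name % 2 )) -eq 0 ]]
then
    parity="even"
else
    parity="odd"
fi

echo "Number $var_name is $parity."
echo "First word: $first; second word: $second"
```

The prompt given with -p is only displayed when the input comes from a terminal, so it stays silent in this simulated run.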
Reading user input
read -p "Please provide a file name: " FILE

if [ -e "$FILE" ]; then
    if [ -f "$FILE" ]; then
        echo "$FILE is a regular file."
    elif [ -d "$FILE" ]; then
        echo "$FILE is a directory."
    fi
    if [ -r "$FILE" ]; then
        echo "$FILE is readable."
    fi
    if [ -w "$FILE" ]; then
        echo "$FILE is writable."
    fi
    if [ -x "$FILE" ]; then
        echo "$FILE is executable/searchable."
    fi
else
    echo "$FILE does not exist"
fi
Now you do it!
Go to the course site and enter Practical Exercise 33
Follow the instructions to answer the questions in the exercise
Remember: in a PE, you should do things in practice before answering the question!
Recap
● Shell programming at its simplest consists simply of saving a set of commands in a file
● But the shell can also use regular programming devices such as:
● Variables
● Branching constructs
● Looping constructs
● Program flow control is usually performed using a condition or test as the deciding factor
● The for, while, and until constructs give us loops based on lists (for) or a condition (while and until)
Recap
● The if construct allows us to branch the code, executing certain parts of the script depending on certain conditions
● There is another branching construct that we have not seen: case
● We have just scratched the programming surface here
● There are many other concepts and techniques that are essential to more serious programming (in particular, functions, which are like mini-programs that can be reused)
● What we have seen today is already enough to give a lot of options in dealing with files and data on the CLI