● Log into the course site

● Enter the “Lecture 6” area

● At 14:00, choose “Daily Quiz 5”

● Answer the multiple choice quiz (you have 10 minutes to finish)

Finding files and directories, deletion, standard streams, piping

J. M. P. Alves

Laboratory of Genomics & Bioinformatics in Parasitology, Department of Parasitology, ICB, USP

Removing files

● Files can be removed with the rm (for remove) command

● By default, rm only removes regular files, but not directories

● To remove a file called file_name, simply run:

rm file_name

● Notice that removing counts as “writing”, so you need the appropriate permission

● What happens if the file belongs to you but you do not have write permission? Let’s try!

BMP0260 / ICB5765 / IBI5765

Let’s do it!

● First, try to delete file smart_file from your home directory

rm smart_file

● If everything went well, the file should have disappeared. Check!

● Now, try to remove the file called ~/dummy_fileX

● Notice that, although the file is yours, you have to answer a question before deleting it

Quiz time!

Go to the course page and choose Quiz 13

● As you saw, whether you can remove a file or directory does not depend on your permissions on the item being removed, but on the permissions you have on its parent (the directory where it is located)!

● List again the contents of the test_dir directory that is in your home area

● Let’s look at the ownership and permissions for the another_dir directory:

drwxr-xr-x 2 dummy dummy 4096 Apr 4 16:15 another_dir

Let’s try that again...

● Can we remove directory another_dir or not?

● As seen before, no; we do not have write permission to the parent directory (test_dir) of another_dir

● But what happens inside of directory another_dir?

● Enter the another_dir directory

● Create a regular file there, called test_1

● Did you succeed? Always check!

● Did you notice there is another file already there?

-rw-r--r-- 1 root root 0 Apr 4 17:07 test_0

● It belongs to root, not to you. Try to delete file test_0

● Answer y to the question. What happened?

● This seems like a violation, but it is not! You did not have write permission to file test_0, but you did have it to its parent directory...

● The directory is a special file (one that users cannot change directly) containing a table of the names and inodes for the files and subdirectories contained within the directory

● A directory is just a table with two columns: file name and inode identifier (not shown here, but the first two entries of that table are . and ..)

● When you delete a file or directory, all you are doing is removing its entry from that table – the data will still be on storage for a while!

● Incidentally, when you create a hard link, you are just adding another entry to one of those tables, pointing to the same inode

● When the system sees that there are no hard links to an inode, it deletes the inode as well, and your data will eventually get overwritten
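A minimal sketch of the hard-link behaviour described above, run in a scratch directory (the names scratch, original.txt and second_name.txt are made up for this illustration). Both names should show the same inode number in the ls -i listing, and the data stays reachable through the second name after the first one is removed.

```shell
# Work in a throwaway directory so nothing real is touched
scratch=$(mktemp -d)
echo "some data" > "$scratch/original.txt"

# A hard link is just a second name (table row) pointing to the same inode
ln "$scratch/original.txt" "$scratch/second_name.txt"

# -i prints the inode number in front of each name
ls -i "$scratch"

# Removing one name only deletes that row; the inode still has one link left
rm "$scratch/original.txt"
cat "$scratch/second_name.txt"
```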

A directory as the system sees it

[Figure: a directory shown as a table of names and inodes. The directory is a file, so of course it has its own inode; this is why a dot represents the directory itself. Each file and subdirectory inside the directory is represented by a row in the table.]

So, if you have write permission to a directory, that means that you have the power to change that table, adding or removing rows (i.e., creating or deleting files, respectively), regardless of your permissions to those items located inside the directory!
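The point can be checked with a small sketch in a scratch directory (scratch and protected are hypothetical names). The file itself is made read-only, yet it can still be removed, because removal only edits the table of the parent directory.

```shell
scratch=$(mktemp -d)            # you own this directory and can write to it
touch "$scratch/protected"
chmod a-w "$scratch/protected"  # nobody may write to the file itself

# -f skips the confirmation question that rm asks for write-protected files
rm -f "$scratch/protected"

# The file is gone: only write permission on the parent directory was needed
ls -la "$scratch"
```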

Deleting directories

● Files can be removed with the rm (for remove) command

● By default, rm only removes regular files, but not directories

● As with cp, to remove a directory you must give rm the -r option:

rm -r dir_name

● Notice that this will recursively delete everything inside a directory (i.e., all files, subdirectories, files within subdirs, subdirs inside subdirs etc.)!

● If there are subdirectories to which you do not have write permission, will removing them fail or not?

Now you do it!

Go to the course site and enter Practical Exercise 14

Follow the instructions to answer the questions in the exercise

Remember: in a PE, you should do things in practice before answering the question!

Deleting directories

● As you saw in PE14, you cannot delete a directory if it contains at least one item that you are unable to delete

● By default, rm silently deletes whatever you ask it to

● That, of course, makes it a very dangerous command: there is NO undelete command, NO “trash can” or anything like that for the Linux command line

● Therefore, always be sure of what you are doing before running rm!

● Remember the ls or echo trick: replace rm with one of those commands first to make sure that only the intended files will get deleted, and nothing more!
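The echo trick can be sketched as follows, in a scratch directory with made-up file names. Because the shell expands the pattern before the command runs, echo shows exactly the argument list that rm would receive.

```shell
scratch=$(mktemp -d)
touch "$scratch/a.tmp" "$scratch/b.tmp" "$scratch/keep.txt"

# Dry run: echo only prints the command line after the shell expands *.tmp
echo rm "$scratch"/*.tmp

# The printed list looked right, so now run the real command
rm "$scratch"/*.tmp
ls "$scratch"
```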

The CLI (usually) assumes you know what you are doing and does not try to stop you from hurting yourself!

Deleting directories

● Another way to delete directories is the rmdir (remove directory) command

● rmdir can only delete directories that are empty (no files and no subdirectories inside)

● To delete multiple directories that are nested, one can use the -p option (also known as --parents)

● Examples:

rmdir dir1

rmdir -p dir1/subdirX

● The second command will only work if subdirX is empty and dir1 contains nothing besides subdirX (otherwise, rm -r dir1 should be the command used)
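A short sketch of both outcomes, using a scratch directory and hypothetical names (dir2, subdirY and some_file are not part of the course material):

```shell
scratch=$(mktemp -d)
cd "$scratch"

mkdir -p dir1/subdirX dir2/subdirY
touch dir2/subdirY/some_file

# Works: subdirX is empty, and dir1 is empty once subdirX is gone
rmdir -p dir1/subdirX

# Fails: subdirY still holds a file, so nothing is removed
rmdir -p dir2/subdirY || echo "rmdir refused: directory not empty"

# Recursive removal is the tool for non-empty directories
rm -r dir2
```

The relative path matters here: rmdir -p tries to remove every directory named in the path, so it is safest to run it from the directory that should be left in place.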

Standard streams

● Standard streams (for data flow) are automatically connected input and output communication channels between a computer program and its environment, available when the program begins execution

● They are considered a special kind of virtual text file

● One of the most important concepts in the use of the command line!

● There are three such streams:

● Standard input (stdin)

● Standard output (stdout)

● Standard error (stderr)

Standard streams

● In most operating systems before UNIX, programs had to be explicitly connected (by the programmer) to the appropriate input and output devices

● One of the advances introduced by UNIX was abstract devices, which eliminated the need for the program to know where data was coming from (or going to)

● UNIX also implemented automatic connection of each running program to the standard data streams (which tie the program to actual physical devices)

● Standard input (stdin): where data comes from to enter the program

● Standard output (stdout): where data goes when it gets out of the program

● Standard error (stderr): where program errors (or warnings or diagnostic messages) go to when issued

Standard streams

● Some programs do not require standard input, e.g., ls

● Some others do not require standard output, e.g., mkdir, cd

● Standard input is represented by number 0

● Standard output is represented by number 1

● Standard error is represented by number 2

Standard streams

● By default:

● Standard input (stdin): keyboard

● Standard output (stdout): screen

● Standard error (stderr): screen

[Figure: a text terminal. The keyboard feeds stdin (#0) of the process; stdout (#1) and stderr (#2) of the process go to the display.]

Standard streams

● If you were in class previously, you must have a program called average whose file has been placed in your $HOME/bin directory

● If you don't have that file, copy it from ~dummy/bin/ to your home directory

● The average program accepts data from the standard input and sends its output to standard output

● Start the program (remember: if it's not in a directory from your $PATH, you must give the relative or absolute path to be able to run it! Also make sure your copy of the program has execute permissions for your user)

average (or ./average or bin/average etc.)

● Notice that the program started, and it is now waiting… for data!

● Data is coming from the standard input, which is the keyboard, by default

● So, type a number, and then ENTER

● Keep doing that until you are done

● To signal the end of “file” (after all, STDIN is a “virtual text file”), press Ctrl+d by itself, at the start of a new line

● Since the average program waits for the whole input before performing any calculation, no output appears until the end of input

● Other programs could behave differently; for example:

head

● This program returns the first 10 (by default) lines of a file. Try it with STDIN! Enter lines until the program exits
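To see the same difference without typing at the keyboard, the sketch below feeds data through standard input using printf and a pipe (pipes are covered later in this lecture). The awk one-liner is only a hypothetical stand-in for the course's average program, whose actual implementation is not shown in these slides.

```shell
# Stand-in for "average": it must read ALL of STDIN before it can print anything
printf '2\n4\n6\n' | awk '{ sum += $1 } END { if (NR > 0) print sum / NR }'

# head only needs the first lines of its input (here 2 instead of the default 10)
printf 'one\ntwo\nthree\n' | head -n 2
```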

Standard streams

● Just by existing, standard streams are already very useful

● But the capability of redirection makes them even more versatile

● That way, data can come from (or go to) different places than just the keyboard (or the screen)

● Redirection can be done between a program and input and output files or between different programs

Standard streams

● This is the second most important enabler of the modularity displayed by UNIX, especially at the command line! Remember the first lecture?

● The UNIX philosophy: combining small programs that each do only one thing (but do it well), instead of having large programs that do a lot of things (but not as well):

“The power of a system comes more from the relationships among programs than from the programs themselves” – Brian W. Kernighan & Rob Pike, 1984

Quiz time!

Go to the course page and choose Quiz 18

Redirecting to (and from) files

● Typing a lot of data for the program would not be practical

● Reading a large amount of output on the screen wouldn’t be either; actually, it would be impossible in many cases

● To redirect the streams, we use redirection operators

● They are: >, >>, <, <<, and <<<

● To determine which stream you are redirecting, you can prepend its number to the operator; e.g., 2> (will redirect STDERR)

Redirecting to (and from) files

● The redirection operators are:

● > : redirect STDOUT to the file named to its right

● 2> : redirect STDERR to the file named to its right

● >> : redirect, appending STDOUT to the file named on the right

● 2>> : redirect, appending STDERR to the file named on the right

● < : redirect STDIN from the file named on the right

● << : redirect STDIN as a here-document

● <<< : redirect STDIN as a here-string

● The operators that redirect STDOUT and STDERR create the file if it does not exist

● Careful! The single version of the operator (e.g., 2>) will always overwrite the file named to its right (if it exists)!

Redirecting to (and from) files

● Notice that it is not mandatory to use the file handle numbers for the STDIN and STDOUT streams

● If nothing is given, these are the default choices for the < and > redirection operators

● It is possible to merge STDOUT and STDERR and send them to the same file by using either of these constructs:

command > file 2>&1

command &> file

● Both versions do the same thing: send the two streams to the same file (overwriting the file!)

● To add to the file without overwriting, use the >> file 2>&1 and &>> versions
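The sketch below tries these forms on a small helper function that writes one line to each stream. The function talk and all file names are hypothetical, made up for this illustration; only the portable forms (> file 2>&1 and >> file 2>&1) are used.

```shell
scratch=$(mktemp -d)

# Hypothetical helper that writes one line to each output stream
talk() {
    echo "this goes to STDOUT"
    echo "this goes to STDERR" >&2
}

# One file per stream
talk > "$scratch/out.txt" 2> "$scratch/err.txt"

# Both streams into one file (overwriting), then appended a second time
talk > "$scratch/both.txt" 2>&1
talk >> "$scratch/both.txt" 2>&1

# Line counts: 1 for out.txt, 1 for err.txt, 4 for both.txt
wc -l "$scratch"/*.txt
```

Order matters in these forms: 2>&1 has to come after the redirection to the file, since it means "send STDERR to wherever STDOUT is pointing right now".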

Redirecting to (and from) files

● Let’s try! On the remote server, run the following:

ls -l /usr/local/bin/ /usr/blah

● If you typed correctly, you should see one error and the listing of a directory

● Now run:

ls -l /usr/local/bin/ /usr/blah > ls_f1 2> ls_f2

● Where did all the data go?

● List the contents of your directory and see that you now have two new files:

ls -ltr

● (That is: list with long format, sorting by time, with newest files last)

Redirecting to (and from) files

● Now, try:

ls -l /usr/local/bin/ > ls_f3

● Use the more command to see what is inside of the file ls_f3 that you just created

● Now run:

ls -l /usr/local/lib/ > ls_f3

● Look again at the contents of file ls_f3

● Where did all the data from the first run go?

● We actually wanted to append the results from the second run to the file!

ls -l /usr/local/bin > ls_f3

ls -l /usr/local/lib >> ls_f3

Here-documents

● Here-documents are multi-line string literals

● That is, they are a way of passing multiple lines of text to standard input

● The << operator specifies that a here-document is about to start

● Here-docs are of the following general format:

command << MARK
…
…
MARK

● Everything between the two instances of the word MARK (or whatever you choose) will be redirected to the STDIN of the command to the left of <<

● Variables can be expanded inside the block of text, or not (depending on whether we use quotes around the delimiter word)

Here-documents

● Example:

wc << EOF
A quick brown fox jumps over the lazy dog
$PATH
EOF

● Since the delimiting identifier (in this case, EOF) appearing by itself on a line marks the end of the here-doc, it is a good idea to choose something that is not a real word

● The output of the command above will be something like:

2 10 115

● Now, put quotation marks (single or double, it does not matter) around the first EOF and see what happens…

● No more expansion!
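The effect of quoting the delimiter can be sketched with cat, which simply copies its STDIN to STDOUT. The variable animal and the delimiter END_TEXT are hypothetical, chosen only for this illustration.

```shell
animal="fox"   # hypothetical variable, only for this illustration

# Unquoted delimiter: $animal is expanded before the text reaches cat
cat << END_TEXT
A quick brown $animal
END_TEXT

# Quoted delimiter: the text is passed through literally
cat << 'END_TEXT'
A quick brown $animal
END_TEXT
```

The first command should print the sentence with the variable replaced by its value, and the second should print the dollar sign and variable name unchanged.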

Here-strings

● Here-strings are a shortened version of a here-doc

● Here-strings are limited to one line (containing one or more words)

● The <<< operator specifies that a here-string follows

● Here-strings have a very simple format:

command <<< STRING

● Variables can be expanded inside the string, or not (depending on what kind of quotes, single or double, we use around the string)

● For example:

wc <<< "A quick brown fox jumps over the lazy dog $PATH"

● Run the command like that, with double quotes, and then with single quotes. Different output? Why?

Quiz time!

Go to the course page and choose Quiz 19

Now you do it!

Go to the course site and enter Practical Exercise 18

Follow the instructions to answer the questions in the exercise

Remember: in a PE, you should do things in practice before answering the question!

Piping

● Another pioneering and essential UNIX concept, the pipeline is a sequence of processes chained together by their standard streams

● The standard output of the first process goes directly into the standard input of the second process, then the STDOUT of the second goes into the STDIN of the third, and so on and so forth…

Piping

The output of one process...

...becomes the input to another

Piping

● The operator for the pipe is the vertical bar: |

command1 | command2 | command3

● The STDERR does not get in the pipe, by default

● To have STDERR go along with STDOUT in the pipe, use the |& construct:

command1 |& command2 |& command3

● No space between | and & there! This construct is not used much though
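A small sketch of what does and does not travel through the pipe, using a hypothetical helper function (talk) that writes one line to each stream. In bash, |& is shorthand for 2>&1 |, so the portable spelling is used here.

```shell
# Hypothetical helper that writes one line to each output stream
talk() {
    echo "regular output"
    echo "error message" >&2
}

# Only STDOUT enters the pipe (STDERR is discarded to keep the screen clean):
# wc counts 1 line
talk 2> /dev/null | wc -l

# Merging STDERR into STDOUT before the pipe sends both lines to wc:
# wc counts 2 lines
talk 2>&1 | wc -l
```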

Piping

● The pipeline is the crucial feature enabling the UNIX philosophy

● The chaining allowed by standard streams and pipe redirection is what leads to the combination of small, generic, single purpose command line tools into very specific, sophisticated commands

Piping

● The two different kinds of redirection can be used in the same chained command (job), of course

● For example:

ls -l /usr/bin | wc -l > out_file

● This command will:

● List (long format) all files from /usr/bin (left side of the pipe)

● Count how many lines there are: one per file, plus the “total” line that ls -l prints first (right side of the pipe)

● Save the results in file out_file (redirection of STDOUT on the right)

● Right now, we haven’t seen enough data-munching commands to be able to explore the full power of piping… That’s for after the midterm exam!

Now you do it!

Go to the course site and enter Practical Exercise 19

Follow the instructions to answer the questions in the exercise

Remember: in a PE, you should do things in practice before answering the question!

Finding files or directories

● When you have lots of files (potentially thousands!) in your system, finding that one file that you know you have, but can’t remember where, can be a daunting task

● There are two different commands for finding files in Linux systems: locate and find

● The find command is probably always present, while locate might or might not be installed (although it is commonly installed)

● As “The Linux Command Line” book says:

● locate – Find Files The Easy Way

● find – Find Files The Hard Way

locate

● locate finds files exclusively by name

● The locate program performs a rapid database search of path names, and then outputs every name that matches a given query

● Let’s say we want to find every file that contains .zip in the name; the command for locate would be: locate .zip

● Try it in the remote server!

● locate, as implied above, uses a pre-made database to look up names

● The problem with that is that “newly” created files will not be found before the database is updated. Want proof? List the contents of the ~dummy directory like this:

ls ~dummy

locate

● As you can see, there is a file (newfile.zip) with .zip in the name that was not found by locate

● That is because the database used by locate is only updated at certain intervals, typically once a day

● Thus, any file changes (create/delete/rename) after the last update to the database will not be seen by locate

● That, of course, can be a disadvantage

● The advantage of using the database is the speed of the lookup

find

● find, on the other hand, performs real-time searches

● Also, it allows for much more than just searches by file name

● It searches a given directory (and its subdirectories) based on a variety of attributes (like permission, modification time, file type, etc., etc., etc.)

https://www.gnu.org/software/findutils/manual/html_mono/find.html

find

● It can be a very complex command

● First of all, let’s see what happens if we use find to look for files with .zip at the end of their names:

find ~dummy -name '*.zip' 2> /dev/null

Now you do it!

Go to the course site and enter Practical Exercise 16

Follow the instructions to answer the questions in the exercise

Remember: in a PE, you should do things in practice before answering the question!

find

● Now, file newfile.zip was found, but the others were not…

● find uses the structure:

find path_to_search -opt1 x -opt2 y -action

● The expression that find uses to select files consists of one or more primaries, each of which is a separate command line argument

● find evaluates the expression each time it processes a file

find

● An expression can contain any of the following types of primaries:

● Options

● Tests

● Actions

● Operators

● For example:

find ~ -maxdepth 3 -name '*.pdf' -and -perm 777 -delete

find

find ~ -maxdepth 3 -name '*.pdf' -and -perm 777 -delete

Anatomy of the command above:

● command: find

● path where the search should start: ~ (user’s home)

● option: -maxdepth 3

● tests: -name '*.pdf' and -perm 777

● logical operator: -and

● action: -delete

find

find ~ -maxdepth 3 -name '*.pdf' -and -perm 777 -delete

● This search will:

● Look for files and directories in the user’s home directory (and its subdirectories, but…)

● Go down at most three levels below the starting directory (e.g., it will test ~/dir1/subdir1/ and ~/dir1/subdir1/subsub2/, but not ~/dir1/subdir2/subsub1/subsubsubX/)

● Look for files whose names end in .pdf…

● AND whose permissions are 777

● Finally, find will delete files that satisfy those conditions
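Since -delete is irreversible, one cautious approach is to run the same search with -print first. The sketch below does that in a scratch directory instead of the home directory; all file and directory names are hypothetical.

```shell
scratch=$(mktemp -d)
mkdir -p "$scratch/dir1/subdir1"
touch "$scratch/dir1/open.pdf" "$scratch/dir1/private.pdf" "$scratch/dir1/subdir1/notes.txt"
chmod 777 "$scratch/dir1/open.pdf"
chmod 600 "$scratch/dir1/private.pdf"

# Dry run: same option and tests, but -print instead of -delete
find "$scratch" -maxdepth 3 -name '*.pdf' -and -perm 777 -print

# Only open.pdf satisfies both tests, so only it is deleted
find "$scratch" -maxdepth 3 -name '*.pdf' -and -perm 777 -delete
ls -R "$scratch"
```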

find – some tests

-amin, -cmin, -mmin
-anewer, -cnewer, -mnewer
-atime, -ctime, -mtime
-empty
-executable
-group
-name, -iname
-inum
-newer
-nogroup, -nouser
-path
-perm
-readable
-regex
-samefile
-size
-type
-user
-writable
etc. etc. etc.

find – actions

-delete
-exec, -execdir
-fls, -ls
-print, -fprint
-print0, -fprint0
-printf, -fprintf
-ok, -okdir
-prune
-quit

find – operators

\( \)
-not, !
-a, -and
-o, -or

find

● Use find to perform the following searches:

find /data/genomas -name '*contigs*'

find /data/genomas -iname '*contigs*'

find /data/genomas -name 'Try*'

find /data/genomas -iname 'Try*' -type f

find /data/genomas -iname 'Try*' -type f -exec ls -l {} \;

find /data/genomas -iname 'Try*' -type f -exec ls -l {} +

● Characters like ; and ( and ) have special meaning to the shell, so they must be escaped with a preceding \ (backslash): \; \( \)
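The last two searches above differ only in how -exec is terminated. A sketch of the difference, using echo in a scratch directory with made-up file names so that the number of output lines shows how many times the command was run:

```shell
scratch=$(mktemp -d)
touch "$scratch/Try1.fasta" "$scratch/Try2.fasta" "$scratch/Try3.fasta"

# \; runs the command once per file found: three echo calls, three lines
find "$scratch" -type f -name 'Try*' -exec echo "found:" {} \;

# + collects the file names and runs the command as few times as possible:
# here a single echo call that prints all three names on one line
find "$scratch" -type f -name 'Try*' -exec echo "found:" {} +
```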

find

● We can group different tests in a search command

● To do that, we use the logical operators mentioned earlier (-and, -or, -not)

● If no operator is given, -and is applied by default in most cases

● Grouping is performed with parentheses, which have to be escaped (since they have special meaning for the shell) by using a backslash

● Examples:

find /data/ -name '*contigs*' -and -type d

find /data/ -name '*contigs*' -type d (same result as above)

find /data/ \( -name 'Try*' -or -name 'Lc*' \) -and -type f

find /data/ -iname 'Try*' -not -type d
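Why the parentheses in the third example matter can be sketched in a scratch directory (all names below are hypothetical). Since -and binds more tightly than -or, leaving the parentheses out changes which items are selected.

```shell
scratch=$(mktemp -d)
mkdir "$scratch/Try_dir"
touch "$scratch/Try_file" "$scratch/Lc_file" "$scratch/other_file"

# Grouped: (Try* OR Lc*) AND regular file -> Try_file and Lc_file
find "$scratch" \( -name 'Try*' -or -name 'Lc*' \) -and -type f

# Ungrouped: read as Try* OR (Lc* AND regular file),
# so the directory Try_dir shows up as well
find "$scratch" -name 'Try*' -or -name 'Lc*' -and -type f
```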

Quiz time!

Go to the course page and choose Quiz 17

Now you do it!

Go to the course site and enter Practical Exercise 17

Follow the instructions to answer the questions in the exercise

Remember: in a PE, you should do things in practice before answering the question!

Recap

● Deleting (removing) files and directories is done mainly with the rm command (-r option for also deleting directories)

● rmdir can also remove directories – but only if they are empty!

● Deleting things is obviously dangerous and, given the power of the CLI, should be done carefully

● The find program is very powerful and can find files based on a large number of criteria, and also includes logical operators for greater flexibility

● Standard streams (STDIN, STDOUT, and STDERR) are an essential feature of UNIX, and make it very easy to redirect data flows between programs and files or between programs and other programs

● The main redirection operators are <, >, 2>, >>, and 2>>

● Redirection of standard streams between programs, called piping, allows us to chain different programs together to create more specific ones

● The pipe character, |, redirects STDOUT from the command to its left to STDIN of the command to its right (STDERR by default goes to the screen)

● Standard streams and piping are what make most of the UNIX philosophy possible in practice
