Advanced Bio-Linux Notes and Practical

Further Linux command line

Compression and archiving utilities

Linux supports a wide range of compression formats:

    Format         Suffix   Commands used to compress/uncompress
    ZIP            .zip     zip and unzip
    TAPE ARCHIVE   .tar     tar
    GZIP           .gz      gzip and gunzip
    COMPRESS       .Z       compress and uncompress
    BZIP2          .bz2     bzip2 and bunzip2

More details on tar and gzip

tar is very versatile and can operate on most file formats. It creates an archive of files, and these tar archives are often then compressed with another utility such as gzip (covered later).

To extract an archive:

    tar xvf archive.tar
    tar zxvf archive.tar.gz (or archive.tgz)
    tar jxvf archive.tar.bz2

The letters after tar alter the behaviour of the command. In this case they mean:

    x = extract files from an archive
    v = give verbose output during the operation
    f = the next argument is the name of the archive file to work on
    z = automatically deal with gzip compression
    j = automatically deal with bzip2 compression

To see what is inside an archive without extracting it:

    tar tf archive.tar
    tar ztf archive.tar.gz

Again the letters modify the behaviour of tar:

    t = list the contents of an archive

To create an archive of all .txt files in a directory:

    tar cvf archive.tar *.txt

In this context:

    c = create an archive

To do this and compress with gzip or bzip2 at the same time:

    tar zcvf archive.tar.gz *.txt
    tar jcvf archive.tar.bz2 *.txt

Note: Never use absolute paths when creating a tar archive. For example, if you want to tar up the /home partition on your Bio-Linux machine, do not run the command:

    tar cvf home_archive.tar /home

This is because an archive made this way records the absolute path, and when it is restored it can overwrite /home directly. (GNU tar strips the leading / by default, but do not rely on this with other versions of tar.) Instead, change directory to / and use:

    tar cvf /tmp/home_archive.tar home/

Always check an archive with tar tf before extracting it.

Along with tar, gzip is probably the most commonly used compression utility in UNIX.
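Because tar and gzip are so often used together, it can help to try the create, list and extract steps on throwaway data first. The following is a minimal sketch, assuming a tar that understands the z option (GNU tar does); the file names seq1.txt and seq2.txt and the restored directory are made up for illustration.

```shell
# Work in a throwaway directory so nothing real is touched
workdir=$(mktemp -d)
cd "$workdir"

# Create two small example files (hypothetical names)
echo "first sequence" > seq1.txt
echo "second sequence" > seq2.txt

# c = create, z = gzip, f = archive name; relative paths only
tar zcf archive.tar.gz seq1.txt seq2.txt

# t = list the contents without extracting anything
tar ztf archive.tar.gz

# Extract into a separate directory so the originals are not overwritten
mkdir restored
cd restored
tar zxf ../archive.tar.gz
cd ..

# diff prints nothing and succeeds when the two files are identical
diff seq1.txt restored/seq1.txt && echo "seq1.txt restored intact"
```

Extracting into an empty directory, after listing the archive, is a cautious habit: it shows exactly which paths the archive will write before they land anywhere important.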
Compressing a file with gzip:

    gzip file.txt (this creates file.txt.gz and removes file.txt)

Uncompressing a file:

    gzip -d file.txt.gz (this creates file.txt)

or

    gunzip file.txt.gz (this creates file.txt)

Note: Linux does not handle Windows self-extracting executables, unless you use WINE. Linux does not support ARJ archives or Macintosh SIT (StuffIt) archives “out of the box”. However, there are Linux applications you can use to work with these formats if you need to.

Checking file types in Linux

UNIX does not associate file types with particular suffixes, e.g. text files do not need the .txt ending. Conversely, UNIX does not assume you have a text file just because your filename ends in .txt. The file command tells you what type of file you have. Running:

    file <filename>

will tell you whether it is ASCII text, binary, GNU tar format etc. This is useful for troubleshooting. For example, if you try to submit a binary file (e.g. a Word document) to a program that expects text (e.g. BLAST) you would get an error. Running file on the input file would reveal the problem.

Input, output, error redirection and pipes

There are many ways to redirect messages from UNIX programs. If you run my_program, and my_program prints its results to screen, how do you write the results to a file instead?

    my_program > output.txt

The > operator redirects the output to the location of your choice. In the above example this is a file called output.txt, which is overwritten if it already exists. To append output to a file you use >>:

    my_program >> output.txt

A UNIX aside - STDOUT, STDIN, STDERR:

UNIX has standard ways of communicating with you. These are known as STDIN, STDOUT and STDERR (standard input, standard output and standard error). Each of these has a different number: STDIN is 0, STDOUT is 1 and STDERR is 2. Input to a program generally comes from the keyboard or a file, so most programs read STDIN from one of these. Output and errors are sent to the screen in most cases, so anything sent to STDOUT or STDERR appears on your screen.
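That STDOUT and STDERR are two separate streams is easiest to see when only one of them is redirected. The following is a minimal sketch using ls on one file that exists and one that does not; the names present.txt, missing.txt and listing.txt are made up for illustration.

```shell
# Work in a throwaway directory
workdir=$(mktemp -d)
cd "$workdir"
touch present.txt

# ls reports present.txt on STDOUT and complains about missing.txt on STDERR.
# Only STDOUT goes into listing.txt; the error message still reaches the screen.
# ls exits with a non-zero status here, which "|| true" ignores.
ls present.txt missing.txt > listing.txt || true

# The file holds just the normal output
cat listing.txt
```

By default, then, an error message stays on the screen even when the normal output is sent to a file.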
However, you can change where the input, output and errors go. This is shown in the examples below.

Normal mode:

    myprog > output.txt          (send stdout to file)
    myprog 2> error.txt          (send stderr to file)
    myprog > allout.txt 2>&1     (send stdout and stderr to file)
    myprog < inputfile           (take stdin from file)
    myprog 2> /dev/null          (chuck stderr into oblivion!)

Append:

    myprog >> output.txt         (send stdout to end of file)
    myprog 2>> error.log         (send stderr to end of file)
    myprog >> allout 2>&1        (send stdout and stderr to end of file)
    myprog <<x                   (take stdin until "x" occurs)

Pipes:

    myprog | myprog2             (pipe stdout of myprog to stdin of myprog2)
    myprog 2>&1 | myprog2        (pipe stdout and stderr of myprog to myprog2)

Who is using that file/device/socket?

Sometimes it is crucial to know who (or what) is accessing a file, socket or device. The most common complaint is “I can't unmount my CD-ROM! It says the device is busy!” This means that some process is accessing the /mnt/cdrom directory. For example, someone may have changed into the /mnt/cdrom directory in a terminal. This is enough to “busy” the device and cause the unmount to fail. There are two tools for tracking down the problem.

To find out who (or what) is using a device, we use the lsof command, e.g. for the CD-ROM:

    /usr/sbin/lsof /mnt/cdrom

To find out who (or what) is using the ssh daemon:

    /usr/sbin/lsof -i:22

The -i option specifies that we are looking at Internet (network) connections, by default on the local machine, and the 22 specifies the port to look at; 22 is the port number for ssh.

The command fuser also identifies processes using files or sockets.

To list all processes accessing the sshd port:

    sudo /sbin/fuser ssh/tcp

To list all processes using the file system that contains /home/user1 and terminate them:

    sudo /sbin/fuser -mk /home/user1

In this example the -m flag tells fuser to report every process using the mounted file system that contains the named file or directory. The -k flag means “kill the processes listed”.

Exercises:

1. List processes that access the /var partition.
2. List processes that use the ssh port
   a. using fuser
   b. using lsof
   Now ssh manager@localhost and repeat. What do you see?
3. List all manual pages relevant to passwd (hint: read the man pages for man).
4. Email the man passwd page to your email account (you will need to use “pipes” for this, ask an instructor).
5. cd ~manager/examples/further_cli/tar/ and compress all the blast report (.blastp) files into a tar.gz file.
6. cd ../compressed and expand the compressed.tar file in the current directory.
7. Compress/gzip all .prot files. What sort of compression ratio do you get? Can you alter/improve this?

Process management

What is a process?

A process is a single instance of a program running on the system. A process can be something that is spawned automatically by the operating system, such as syslogd (a system daemon which handles the logging of system messages and messages from the kernel), or a user-started process such as gedit (a text editor).

When UNIX processes start they are given a unique number on the system so that the system can keep track of them. This unique number is called a PID or Process ID. A new PID is given for every running program. If three users launch the same program, each instance of the program has a different PID.

Keeping track of processes

There are a number of ways of finding out the PID of a process. To get a snapshot of the currently running processes we use the command ps.

Exercise: Open a terminal window by clicking on the screen icon in the top toolbar. Type:

    ps

Listed are the currently running processes on the terminal you have open. You will see a zsh process (the shell running in the terminal you are using) and the ps process that you have just run. The information includes the PID, the terminal the process is running on, the CPU time the process has used and the command that the PID is associated with.

Now we will launch a program; we are going to use nedit for this example. Type:

    nedit &
    ps

Don't worry about the ampersand at the end of the first command; it will be covered later.
In the process list you should see a nedit process along with the zsh and ps processes. This only lists the processes on the current terminal. To list all the processes that are running under your username (they might be on different terminals), type:

    ps x

This lists your processes regardless of the terminal they are running on. To select by a specific user name:

    ps -u <username>

Compare the differences between ps x and ps -u <username>. To list all the processes on a system you can use the following syntax:

    ps aux

Understanding ps output

ps aux lists all the processes running on the system, with information about user ownership along with much more. The names at the top of each column tell you what the output means:

    USER: This is the user the process belongs to. System processes that are launched as the computer boots are generally owned by user root.
    PID: The process ID, the unique number assigned to each process.
    %CPU: The percentage of the CPU's time spent running the process.
    %MEM: The percentage of total memory used by the process.
    VSZ: The total virtual memory size, in blocks of 1 KB*.
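The columns described so far can also be requested by name for a single process. The following is a minimal sketch, assuming a Linux ps (procps) that accepts the -p and -o options; sleep stands in for any long-running program, and the variable name pid is arbitrary.

```shell
# Start a harmless long-running process in the background
sleep 30 &
pid=$!    # $! holds the PID of the most recent background process

# Show only the columns discussed above for that one PID
ps -p "$pid" -o user,pid,%cpu,%mem,vsz,comm

# Clean up: terminate the process and wait for it to exit
kill "$pid"
wait "$pid" 2>/dev/null || true
```

Selecting one PID keeps the output to a header line plus a single row, which makes it easier to match each value to its column name than in the full ps aux listing.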
