
Advanced Bio-Linux Notes and Practical

Further Linux command line

Compression and archiving utilities

Linux supports a wide range of compression formats:

    Format         Suffix   Commands used to compress/uncompress
    ZIP            .zip     zip and unzip
    TAPE ARCHIVE   .tar     tar
    GZIP           .gz      gzip and gunzip
    COMPRESS       .Z       compress and uncompress
    BZIP2          .bz2     bzip2 and bunzip2

More details on tar and gzip

tar is very versatile and can operate on several of these formats. tar creates an archive of files, and these tar archives are often then compressed with another utility such as gzip (covered later).

To uncompress an archive:

    tar xvf archive.tar
    tar zxvf archive.tar.gz (or archive.tgz)
    tar jxvf archive.tar.bz2

The letters after tar alter the behaviour of the command. In this case they mean:

    x = extract files from an archive
    v = give verbose output during the operation
    f = the next argument is the filename of the archive to work on
    z = automatically deal with gzip compression
    j = automatically deal with bzip2 compression

To see what is inside an archive without uncompressing it:

    tar tf archive.tar
    tar ztf archive.tar.gz

Again the letters modify the behaviour of tar:

    t = list the contents of an archive

To create an archive:

To archive all .txt files in a directory:

    tar cvf archive.tar *.txt

In this context:

    c = create an archive

To do this and compress with gzip or bzip2 at the same time:

    tar zcvf archive.tar.gz *.txt
    tar jcvf archive.tar.bz2 *.txt

Note: Never use absolute paths when creating a tar archive. For example, if you want to tar up the /home partition on your Bio-Linux machine do not run the command:

    tar cvf home_archive.tar /home

This is because when the archive is restored, it will automatically overwrite /home. Instead change directory to / and use:

    tar cvf /tmp/home_archive.tar home/

Always check an archive with tar tf before uncompressing it.
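The create, list and extract steps above can be tried safely in a scratch directory. The following is only a sketch: the directory and file names are made up, and it uses the -C option (change directory before acting), which is not covered in these notes but is available in GNU tar.

```shell
# Work in a scratch directory so nothing outside it is touched.
workdir=$(mktemp -d)
mkdir -p "$workdir/project" "$workdir/restore"
echo "first file"  > "$workdir/project/a.txt"
echo "second file" > "$workdir/project/b.txt"

# Create a gzip-compressed archive using a relative path:
# -C changes into $workdir first, so the archive stores "project/...".
tar czf "$workdir/project.tar.gz" -C "$workdir" project

# Check the contents before unpacking.
listing=$(tar tzf "$workdir/project.tar.gz")
echo "$listing"

# Unpack into a separate directory and read one file back.
tar xzf "$workdir/project.tar.gz" -C "$workdir/restore"
restored=$(cat "$workdir/restore/project/a.txt")
echo "$restored"

rm -rf "$workdir"
```

Because the archive holds relative paths only, it unpacks beneath whichever directory you choose rather than over the original location.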

Along with tar, gzip is probably the most commonly used compression utility in Linux.

Compressing a file with gzip:

    gzip file.txt (this creates file.txt.gz)

Uncompressing a file:

    gzip -d file.txt.gz (this creates file.txt)

or

    gunzip file.txt.gz (this creates file.txt)
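As a rough illustration of the round trip, the sketch below compresses a made-up, highly repetitive file, compares the sizes, and checks that gunzip restores the original exactly. The file names and contents are assumptions chosen for the example.

```shell
workdir=$(mktemp -d)

# A repetitive file compresses well; the content is made up.
i=0
while [ "$i" -lt 1000 ]; do
    echo "ACGTACGTACGTACGTACGTACGT"
    i=$((i + 1))
done > "$workdir/seq.txt"

original_size=$(wc -c < "$workdir/seq.txt" | tr -d ' ')
cp "$workdir/seq.txt" "$workdir/seq.orig"

gzip "$workdir/seq.txt"            # replaces seq.txt with seq.txt.gz
compressed_size=$(wc -c < "$workdir/seq.txt.gz" | tr -d ' ')
echo "original: $original_size bytes, compressed: $compressed_size bytes"

gunzip "$workdir/seq.txt.gz"       # restores seq.txt
if cmp -s "$workdir/seq.txt" "$workdir/seq.orig"; then
    roundtrip=identical
else
    roundtrip=different
fi
echo "round trip: $roundtrip"

rm -rf "$workdir"
```

Real data will usually compress less dramatically than this artificial file.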

Note: Linux does not handle Windows self extracting executables, unless you use WINE. Linux does not support ARJ archives or Macintosh SIT archives “out of the box”. However there are Linux applications you can use to work with these formats if you need to.

Checking file types in Linux

UNIX does not associate file types with particular suffixes, e.g. text files do not need the .txt ending. Conversely, UNIX does not assume you have a text file just because your filename ends in .txt. The file command tells you what kind of file you have. Running:

    file <filename>

will tell you whether it is ASCII text, binary, GNU tar etc. This is useful for troubleshooting. For example, if you try to submit a binary file (e.g. a Word document) to a program that expects text (e.g. BLAST) you would get an error. Running file on the input file would reveal the problem.

Input, Output, Error redirection and pipes

There are many ways to redirect messages from UNIX programs. If you run my_program, and my_program prints its results to screen, how do you send the results to a file instead?

    my_program > output.txt

The > operator redirects the output to the location of your choice. In the above example this is a file called output.txt. To append output to a file you use >>:

    my_program >> output.txt

A UNIX aside - STDIN, STDOUT, STDERR:

UNIX has standard ways of communicating with you. These are known as STDIN, STDOUT and STDERR (standard input, standard output and standard error). Each of these has a different number: STDIN is 0, STDOUT is 1 and STDERR is 2. Input to a program generally comes from the keyboard or a file, so most programs set STDIN to your keyboard or a file. Output and errors are sent to the screen in most cases, so anything sent to STDOUT or STDERR appears on your screen. However, you can change where the input, output and errors go. This is shown in the examples below.

Normal mode:

    myprog > output.txt          (send stdout to file)
    myprog 2> error.txt          (send stderr to file)
    myprog > allout.txt 2>&1     (send stdout and stderr to file)
    myprog < inputfile           (take stdin from file)
    myprog 2> /dev/null          (chuck stderr into oblivion!)

Append:

    myprog >> output.txt         (send stdout to end of file)
    myprog 2>> error.log         (send stderr to end of file)
    myprog >> allout 2>&1        (send stdout and stderr to end of file)

Pipe:

    myprog 2>&1 | myprog2        (pipe stdout and stderr of myprog to myprog2)
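The redirections can be tried without a real program. In this sketch myprog is a stand-in shell function (an assumption for the example) that writes one line to STDOUT and one line to STDERR; the file names follow those used above.

```shell
workdir=$(mktemp -d)

# A stand-in for "myprog": one line to stdout, one line to stderr.
myprog() {
    echo "result line"
    echo "error line" >&2
}

myprog >  "$workdir/output.txt" 2> "$workdir/error.txt"   # split the two streams
myprog >  "$workdir/allout.txt" 2>&1                      # both into one file
myprog >> "$workdir/allout.txt" 2>&1                      # append both again
piped=$(myprog 2>&1 | wc -l | tr -d ' ')                  # both through a pipe
quiet=$(myprog 2> /dev/null)                              # discard stderr

out=$(cat "$workdir/output.txt")
err=$(cat "$workdir/error.txt")
all_lines=$(wc -l < "$workdir/allout.txt" | tr -d ' ')
echo "stdout file: $out"
echo "stderr file: $err"
echo "lines in allout.txt: $all_lines, lines seen through the pipe: $piped"

rm -rf "$workdir"
```

Note the order in "> file 2>&1": STDOUT is pointed at the file first, and STDERR is then pointed at wherever STDOUT currently goes.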

Who is using that file/device/socket?

Sometimes it is crucial to know who (or what) is accessing a file, socket or device. The most common complaint is “I can't unmount my CD-ROM! It says the device is busy!” This means that someone is accessing the /mnt/cdrom directory. For example they may have changed into the /mnt/cdrom directory. This is enough to “busy” the device and cause a failure. There are two tools for tracking down the problem: lsof and fuser.

To find out who (or what) is using a device, we use the lsof command, e.g. for the CD-ROM:

    /usr/sbin/lsof /mnt/cdrom

To find out who (or what) is using the ssh daemon:

    /usr/sbin/lsof -i:22

The -i specifies that we are looking at an internet address (defaulting to the local machine) and the 22 specifies the port to look at; 22 is the port number for ssh.

The fuser command also identifies processes using files or sockets. To list all processes accessing the sshd port:

    /sbin/fuser ssh/tcp

To list all processes using /home/user1 and terminate them:

    sudo /sbin/fuser -mk /home/user1

In this example the -m flag tells fuser to look at every process using the filesystem that holds the named file or directory. The -k flag means “kill the processes listed”.

Exercises:

1. List processes that access the /var partition.
2. List processes that use the ssh port
   a. using fuser
   b. using lsof
   c. Now ssh manager@localhost and repeat. What do you see?
3. List all manual pages (hint: read the man pages for man) relevant to
4. Email the man passwd page to your email account (you will need to use “pipes” for this, ask an instructor).
5. cd ~manager/examples/further_cli/tar/ and compress all the blast report (.blastp) files into a tar.gz file.
6. cd ../compressed and uncompress the compressed.tar file in the current directory.
7. Compress/gzip all .prot files. What kind of compression ratio do you get? Can you alter/improve this?

Process management

What is a process?

A process is a single instance of a program running on the system. A process can be something that is spawned automatically by the system, such as syslogd (a system daemon that handles the logging of system messages and messages from the kernel), or a user started process such as gedit (a text editor). When UNIX processes start they are given a unique number on the system so that the system can keep track of them. This unique number is called a PID or Process ID. A new PID is given for every running program. If three users launch the same program, each instance of the program has a different PID.
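A quick way to see that each instance gets its own PID is to start the same program twice. This sketch assumes a Bourne-style shell such as bash; it uses the trailing ampersand and the kill command, both of which are explained later in these notes, and sleep is used only as a harmless long-running program.

```shell
# Start two instances of the same program in the background.
sleep 30 &
first_pid=$!        # $! holds the PID of the most recent background process
sleep 30 &
second_pid=$!

echo "this shell: $$, first sleep: $first_pid, second sleep: $second_pid"

# kill -0 sends no signal; it only tests whether the PID exists.
if kill -0 "$first_pid" 2>/dev/null; then
    first_alive=yes
else
    first_alive=no
fi
echo "first sleep still running: $first_alive"

# Tidy up both background processes.
kill "$first_pid" "$second_pid"
wait "$first_pid" "$second_pid" 2>/dev/null || true
```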

Keeping track of processes

There are a number of ways of finding out the PID of a process. To get a snapshot of the currently running processes we use the command ps.

Exercise:

Open a terminal window by clicking on the screen icon in the toolbar. Type:

    ps

Listed are the currently running processes on the terminal you have open. You will see a zsh process (the shell running in the terminal you are using) and a ps process that you have just run. The information includes the PID, the terminal the process is running on, the time the process has been running for and the command that the PID is associated with.

Now we will launch a program; we are going to use nedit for this example. Type:

    nedit &
    ps

Don't worry about the ampersand at the end of the command, it will be covered later. In the process list you should see a nedit process along with the zsh and ps processes. This only lists the processes on the current terminal. To list all the processes that are running as your username (they might be on different terminals), type:

    ps x

This lists your processes regardless of the terminal they are running on. To select by a specific user name:

    ps -u <username>

Compare the differences between ps x and ps -u <username>.

To list all the processes on a system you can use the following syntax: ps aux

Understanding ps output

ps aux lists all the processes running on the system with information about user ownership along with much other information. The names at the top of each column tell you what the output means:

    USER: This is the user the process belongs to. System processes that are launched as the computer boots are generally owned by user root.
    PID: The process ID, the unique number assigned to each process.
    %CPU: The percentage of the CPU's time spent running the process.
    %MEM: The percentage of total memory used by the process.
    VSZ: The total virtual memory size, in blocks of 1Kb*.
    RSS: “Resident Set Size”, the actual amount of physical memory allocated to this process.
    TTY: The terminal associated with the process. A ? indicates the process is not connected to a terminal.
    STAT: Process state codes. Common states are: S - Sleeping, R - Runnable (on the run queue), N - Low priority task, Z - Zombie process.
    START: When the process was started, in hours and minutes, or a day if the process has been running for a while.
    TIME: CPU time used by the process since it started.
    COMMAND: The command name. This can be modified by processes as they run, so do not rely on this absolutely!

*Note - Virtual memory: UNIX and Linux allow you to use all of the physical memory (RAM) in your system as well as areas of the hard disk (called swap space) designated as memory when the physical memory is used up. Virtual memory is the sum of the physical memory (RAM) and the total swap space assigned by the system administrator at system installation time.

Exercise:

Take a look at the man page for ps and use some of the other flags available.

ps captures a snapshot of currently running processes. The process table is a dynamic place though; processes change all the time. To view the process table in action we can use the command top. Type:

    top

Watch it for a while and you will see the process table changing. Launch gedit and see how long it takes to appear in the process table.

top displays a ps aux like output, but the most CPU intensive processes are listed first. top also gives information such as how long it has been since the last reboot, how many processes are running, how intensively the machine is used in different time periods (also known as the load average), how much real and virtual memory is used and how much virtual memory is cached.

Understanding top output

The process output is slightly different to that of ps aux. Notably it has these columns:

    PRI - the priority of the process.
    NI - the “nice” value of the process.

Note - “Process Priority” Each process in UNIX has a priority. The default priority for a process is 15. Long processes that need a lot of CPU time seriously affect the performance of machines, affecting all users. It is important that long, CPU intensive processes are run at low priorities.

Using nice to modify process priority

Exercise:

Type:

    top

What priority and nice value is top running at? Quit top by pressing q. Type:

    nice -n 10 top

Now what priority and nice value is top running at?

Nice values run from -20 (highest priority) to 19 (lowest priority). What happens if you try to set a higher than normal priority? Type:

    nice -n -10 top

The reason you can't do this as a user (you can with sudo, or as root) is that a) it is very dangerous to let people choose their priority, as everyone assumes their tasks have more priority than other people's, and b) it is very easy to hog the entire resources of the machine with a CPU/memory intensive application running at the highest priority.

It should also be noted that top is an interactive program, not just a display one, and it is actually possible to do process management using it. Therefore be very careful using top via sudo or as root.
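The effect of nice can also be checked without top. The sketch below relies on a behaviour of the GNU coreutils version of nice that is not described in these notes: run with no arguments, it prints the nice value it is currently running at.

```shell
# With no arguments, GNU nice prints the nice value it is running at.
base=$(nice)

# Run the same command 10 steps "nicer" (that is, at lower priority).
lowered=$(nice -n 10 nice)

echo "default nice value: $base, under nice -n 10: $lowered"
```

The adjustment is relative to the current nice value and is capped at 19, so the second number is normally the first plus 10.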

More ways to view processes

There are other ways to view processes giving more information than ps and top.

Type:

    pstree

pstree displays a process tree. Processes can launch other processes when running (for example if one program launches another one). Then you have a parent process (i.e. the process which has spawned the new process) and child processes (the processes that are spawned). You will see references to PPIDs - these are parent process IDs; each individual process will have its own PID. In the pstree output the child processes are branches of the tree. Child processes may have child processes of their own and pstree shows this clearly. To get a similar output from ps type:

    ps aux --forest

Process control

There are generally four kinds of actions associated with process handling:

    Starting a process.
    Putting a process in the background.
    Putting a process in the foreground.
    Stopping a process.

Starting a process, in this case, means “run a command”. To stop a process we can do one of three things:

If the command has a quit command, either through a keypress, window or menu item use it to stop the program cleanly.

If the process is active in the current terminal and has no quit command you can use Ctrl-C. However in some cases, the program may not exit cleanly.

If the program is not active in the terminal and has no quit command we use the command kill.

When we talk about a program “exiting cleanly” we mean that it has time to clean up after itself - cleanly release the memory it has occupied, close any files it has been using and terminate any child processes. If a program does not exit cleanly then some of these might not happen. The kill command is best read as “stop”. It does not literally kill a process unless we tell it to with a signal.

Signals: Signals are a basic form of something called interprocess communication. This means they are a way for programs to send essential messages to one another. There are a number of different signals associated with different events. Signals have names, and an associated number. If you are familiar with UNIX you may have come across segmentation faults. These are reported with a SIGSEGV signal (signal number 11). Signals that are most used by systems administrators are SIGHUP (signal 1) and SIGKILL (signal 9). To tell kill exactly what you want it to do to a process you have to give it an appropriate signal to send the process you want kill to act upon.

Exercise:

Start a gedit process in your terminal:

    gedit &

Imagine that the program had hung so that the quit command wasn't working. We can stop gedit using kill. In order to use kill we need the PID of the process so we know which process to send the signal to. Get the PID of the gedit process from:

    ps

By default kill sends a TERM (TERMINATE) signal, so to send the default signal to the process:

    kill <PID>

This should stop the process. The gedit window should disappear from your desktop. This method of stopping processes will not always work - for example if a process has crashed and is not responding to mouse clicks or keyboard interrupts. When this happens we use a more drastic form of the kill command.

Exercise:

Start gedit:

    gedit &

First we are going to send a SIGSEGV signal to the process - which will make the process think that it has segfaulted (segmentation faulted). The process aborts, with a warning message to the screen. First of all we need the PID of the gedit process:

    ps

Now send the signal with kill:

    kill -SEGV <PID>

Do you get a warning message on the screen? Now start gedit again:

    gedit &

Imagine the gedit process has crashed and cannot be quit in any of the normal ways. We can send a very terminal kill with signal SIGKILL. First get the gedit PID with:

    ps

then:

    kill -KILL <PID>

No warning message will appear on the screen. As mentioned earlier signals have names and numbers, so you can use kill -9 instead of kill -KILL. Launch gedit again and try this:

    gedit &

Get the PID:

    ps

Kill the process:

    kill -9 <PID>
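The link between signal names and numbers also shows up in the exit status the shell reports. The sketch below is an illustration only: it assumes bash (which reports 128 plus the signal number for a process ended by a signal), uses sleep in place of gedit, and the function name status_after_signal is made up.

```shell
# Start a background process, send it the named signal,
# and print the exit status the shell records for it.
status_after_signal() {
    sleep 30 &
    pid=$!
    kill "-$1" "$pid"
    status=0
    wait "$pid" 2>/dev/null || status=$?
    echo "$status"
}

term_status=$(status_after_signal TERM)   # the default signal, number 15
kill_status=$(status_after_signal KILL)   # signal number 9
echo "after TERM: $term_status, after KILL: $kill_status"
```

Subtracting 128 from each status gives back the signal number, matching the numbers listed by kill -l.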

The other common use of kill is with the HUP flag. This is often used to restart system services that do not have an /etc/.d/ entry. If you make changes to a configuration file and you don't want to stop and start the service manually, you can restart it from the command line with kill -HUP <PID>, causing the process to restart and re-read the configuration file changes. However, how a process interprets the HUP signal can vary: some processes take it as a HANGUP (terminate) signal, some as a reload signal, some as a restart and reload signal. Most system daemons will reload, or restart and reload. If you feel curious about how many signals kill can send try kill -l.

Sending processes to the background and foreground

Each time we have launched an application we have used the trailing ampersand (&) to start the program. This forces the application into the terminal background, which leaves us with access to the terminal. If we had started the command without the trailing ampersand we would not have been able to continue using the terminal at the same time. If you launch an application without the trailing ampersand it is still possible to regain control of the terminal. It is a two step process in which you first suspend the active process with Ctrl-Z and then send the process to the background with the command bg (background).

Exercise:

Launch gedit without the trailing ampersand:

    gedit

Now suspend the process in the terminal:

    Ctrl-Z

At this point you should have control of the terminal and you can type:

    bg

This puts gedit into the background and both gedit and the terminal are now accessible for use. Now try to start gedit and suspend the process but do not put the process in the background. What happens when you try to use gedit? Go back to the terminal and put gedit in the background with bg. Can you use gedit now? If you need to bring a job back into the foreground, for example to use Ctrl-C on it, you can use the command fg.

Start gedit again, and this time see what the command jobs outputs when you put it into the background.

More process management - pgrep and pkill

You can do more large scale and elegant process management with pgrep and pkill than with ps and kill. pgrep, or process grep (there is more on grep later), displays the PIDs associated with programs or users.

Exercise:

Run ps aux in one terminal:

    ps aux

Open another terminal and type:

    pgrep zsh

What is being listed here? Hopefully you should be able to see the shared PIDs between ps aux and pgrep. Now try:

    pgrep -u root

This lists all the PIDs associated with user root. Now try:

    pgrep -u root mingetty

What do you think this output represents?

pkill works similarly and takes the same arguments, but sends kill signals to multiple processes. Launch vim a few times as a background job:

    vim &

(use the up arrow to repeat the command). vim never shows any display; it just suspends itself, as it is an interactive screen program. Now try:

    pgrep vim

This should list all the PIDs that you have just created. Now try:

    pkill vim

Now try:

    pgrep vim

Are the vim processes killed? Now try:

    pkill -9 vim

and then:

    pgrep vim

The vim processes are gone.

System management - memory management

As well as keeping an eye on the processes running on the system, systems administrators also keep an eye on memory usage. We have already used top and seen that it displays memory information. You can use the command free if you do not need the additional process information you get with top. If you are interested in virtual memory usage, use the vmstat command. The output from vmstat can be cryptic so you should look up the vmstat man pages:

    man vmstat

The file /proc/meminfo contains a lot of information on all the memory states of the system:

    cat /proc/meminfo

System management - disk management

As a systems administrator, you should frequently check disk usage on your system. Running out of disk space is a common problem and results in a variety of errors depending on which partition has filled up. A full / (root) partition causes different error messages than a full /var partition. There are two tasks that systems administrators commonly need to perform:

    Checking the amount of disk space left.
    Calculating the size of a directory on the disk.

To view the amount of disk space left on the disk(s) type:

    df -m

The most important column to watch is the Use% column showing how much of the disk is used (as a percentage of the total disk space of the disk). If this exceeds 80%, or changes drastically from day to day, you need to investigate. The Available column gives you the amount of space remaining on each disk partition in megabytes. For more fine grained information, use:

    df -k

This reports sizes in kilobytes. Compare the two and stick with whichever format you prefer. 1 megabyte is not 1000 kilobytes, it is 1024 kilobytes (this can confuse when relating kilobyte to megabyte figures).

When your filesystems are low on space you need to closely examine your system. The du command can find large files and directories. In order to get the size of a directory you can use the command:

    du -m

If you prefer the readout in kilobytes then you can use:

    du -k

Another extremely useful command to find large files is:

    find ./ -size +100000k

This locates files over 100,000Kb (approximately 100Mb). The + means “more than” and the k gives the size in kilobytes; without them find looks for files of exactly that many 512-byte blocks.
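To see du and find agree on a known case, the sketch below builds a scratch directory containing one 200Kb file and one tiny file (names and sizes are made up) and then looks for files over 100Kb. It also uses du -s, a summary flag not mentioned above, to print a single total rather than one line per subdirectory.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/data"

# One 200Kb file and one tiny file; names and sizes are made up.
dd if=/dev/zero of="$workdir/data/big.dat" bs=1024 count=200 2>/dev/null
echo "small" > "$workdir/data/small.txt"

# Total size of the directory in kilobytes (first column of the output).
du -sk "$workdir/data"

# Files larger than 100Kb: + means "more than", k means kilobytes.
large_files=$(find "$workdir/data" -type f -size +100k)
echo "$large_files"

# Free space on the filesystem holding the scratch directory.
df -k "$workdir"

rm -rf "$workdir"
```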

Exercise:

Find out the size of a few of the top level directories:

    cd /
    du -m /etc/

and so on for the other directories.

Centralised system monitoring

It is possible to get all the information discussed previously in one centralised tool. Run:

    gnome-system-monitor

gnome-system-monitor allows you to view process lists, send processes signals (both functions of the command line top) and also do advanced things like show memory maps of processes (so you can see which files a process is using), check CPU load, memory usage and disk usage.

You might ask why this application was not shown first! As a systems administrator you should never rely on graphical tools. Firstly, a lot of systems administration is done remotely and it is more efficient to work at the command line. Secondly, GUIs can break, or report back erroneous information, so knowing the true source of information is invaluable.

Controlling process timing

We've seen how to start, stop, suspend, kill, background and foreground processes. Another level of process control is the ability to control when a process starts. For example: Let's say you have written a program that checks a database for newly deposited data. You manually run this every day to check for new data and download it so that you are always up to date. You then go on holiday for a week, but you don't want the data to get out of date. How can you schedule your program to run daily?

UNIX systems have a utility (daemon) which runs all the time and allows jobs to run hourly, daily, weekly, monthly etc. This daemon is called cron. cron is easy to interface with. cron examines something called a crontab file (/etc/crontab) to determine which jobs should run, and when. To view the crontab file on Bio-Linux:

    cat /etc/crontab

The fields (often marked with asterisks *****) are populated with numbers and these numbers control the time at which the job is run:

    Field          Allowed values
    minute         0 - 59
    hour           0 - 23
    day of month   1 - 31
    month          1 - 12
    day of week    0 - 7

In the “day of week” field 0 is the first day of the week, and this is Sunday.

Exercise: Look at the entries in the crontab file - can you work out exactly when they will run?

In order to run jobs with cron you can edit a crontab file (either system wide, or for a user, which is not covered in this course), but if you need a job run hourly, daily, weekly or monthly on the system and are happy to use the default times that cron runs jobs you can simply put a script in one of the following directories:

    /etc/cron.hourly
    /etc/cron.daily
    /etc/cron.weekly
    /etc/cron.monthly

The advantage of this is its simplicity, but obviously you lack the fine-grained control.
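To make the field order concrete, the sketch below takes one made-up entry in the style of the system-wide /etc/crontab (which has a user name between the time fields and the command) and splits it into named parts. The script path is hypothetical; the entry would run it as root at 04:30 every Monday.

```shell
# A made-up /etc/crontab style entry: minute hour day-of-month month
# day-of-week user command. The script path is hypothetical.
entry='30 4 * * 1 root /usr/local/bin/check_database.sh'

# Split the line on whitespace into the five time fields, user and command.
read -r minute hour day_of_month month day_of_week user command <<EOF
$entry
EOF

echo "minute:       $minute"
echo "hour:         $hour"
echo "day of month: $day_of_month"
echo "month:        $month"
echo "day of week:  $day_of_week"
echo "run as:       $user"
echo "command:      $command"
```

An asterisk in a field means “every value”, so this entry is not restricted by day of month or by month.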

Exercises:

1. Experiment with ps and its modifying switches. Refer to the man pages to alter the display.

2. top is a very system intensive command – see if you can work out how to run top so it only updates the display once every 20 seconds. Does this reduce the load?

3. Using the man pages try to figure out how to use top to do process management tasks.

4. View the tree like structure of the processes running on the machine. Which processes have the most child processes, and why do you think this is?

5. Work out how to list the processes associated with a terminal using pgrep.

6. Relate free, cat /proc/meminfo and vmstat information, can you see where the commands get their information from?

7. Get some size information from some of the directories on the filesystem. What do the larger directories contain?

8. Load up gnome-system-monitor and learn what the various displays are showing relative to the commands learnt in the session.

9. Work out what the crontab entry would be to run a program every Saturday at 2.34pm in every even numbered month.

Further Linux command line II

sed, grep and pattern matching

While single commands are useful, increasingly complex tasks can be done by stringing together commands and learning more features of UNIX. One nice feature is pattern matching with regular expressions. Two commands that use this extensively are sed and grep.

sed

sed is the stream editor. A stream editor is used to perform basic text transformations on an input stream. An input stream for our purposes is a file being read or the output of a command. So what can sed do?

Exercise:

Create a file called sed.txt with the following text:

    I like dogs, but I hate cats, and as for rats - yuk!

Save the file and try the following sed commands:

    sed -e 's/dogs/rats/g' sed.txt
    sed -e 's/dogs/rats/g' -e 's/cats/moles/g' sed.txt

Check the changes to the output from the original file. Can you see what is going on? In computer speak sed reads the standard input into the pattern space, performs a sequence of editing commands on the pattern space and then writes the pattern space to STDOUT. In simple terms sed takes an input, changes it, and outputs it to the screen. If you find that you regularly use sed to search and replace, you can put the sed command(s) in a file ending in .sed

Exercise:

Create a file 'dograt.sed' which contains only:

    s/dogs/rats/g

Now try:

    sed -f dograt.sed sed.txt

Does this produce the same output as:

    sed -e 's/dogs/rats/g' sed.txt
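One point worth checking for yourself is that sed only writes to STDOUT and leaves the input file alone. The sketch below shows this with a scratch copy of sed.txt and a substitution chosen purely for illustration; sed_new.txt is a made-up file name.

```shell
workdir=$(mktemp -d)
echo "I like dogs, but I hate cats, and as for rats - yuk!" > "$workdir/sed.txt"

# sed prints the edited text to STDOUT; sed.txt itself is not modified.
sed -e 's/rats/mice/g' "$workdir/sed.txt"

# To keep the result, redirect STDOUT to a new file.
sed -e 's/rats/mice/g' "$workdir/sed.txt" > "$workdir/sed_new.txt"

original=$(cat "$workdir/sed.txt")
edited=$(cat "$workdir/sed_new.txt")
echo "original file: $original"
echo "new file:      $edited"

rm -rf "$workdir"
```

Redirect to a different file name than the input: redirecting onto the input file itself would empty it before sed reads it.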

Using sed.txt how can you change:

    I like dogs, but I hate cats, and as for mice - yuk!

to

    I like cats, but I hate dogs, and as for rats - yuk!

in only ONE line of sed, without using sed -f?

sed can also be used to pick particular lines from a file. Add another line to sed.txt:

    Brown dogs make better pets than white cats

Save the file. Now try the following command:

    sed -e '/Brown/s/cats/rats/g' sed.txt

Do all instances of cats turn to rats? Why do you think this is?

sed can also delete lines from a file. Try the following three lines and see what they do:

    sed -e '1d' sed.txt
    sed -e '2d' sed.txt
    sed -e '1,2d' sed.txt

The remainder of sed's complexity comes from getting to grips with regular expressions.

grep and regular expressions

grep is a very useful command for searching for patterns. It is a great command for looking for particular lines in a list or a file. To see why try some of the following examples:

    grep ACCESSION genbank.txt
    grep accession genbank.txt
    grep -i accession genbank.txt

Can you work out what the -i flag is for? What does this tell you about the normal behaviour of grep? Now try the following:

    grep A.CESSION genbank.txt
    grep A..ESSION genbank.txt
    grep "A.*ESSION" genbank.txt

All of these should pull out the same result. A “.” means match any single character so that A.CESSION will match:

    ACCESSION and ABCESSION and AACESSION

Similarly “..” means match any two characters so that A..ESSION will match:

    ACCESSION and ABBESSION and AAAESSION

A “.*” means match any number of characters (including none), which need not be the same, so A.*ESSION will match:

    ACCESSION and ACCCCCCCESSION as well as ABBESSION and ABBBBBESSION

Next try the following commands:

    grep ORGANISM genbank.txt
    grep ^ORGANISM genbank.txt

Which one produces a result? The “^” means match the beginning of a line. The ORGANISM field is not at the beginning of a line in a GenBank record - it has whitespace in front of it.

Some characters are regarded as special by grep and if you use these as part of a pattern match you need to escape them so that grep knows not to treat them as special (meta) characters. The characters that need escaping in a match are: #^$\?+*|[](). Now try:

    grep ")$" genbank.txt
    grep \)$ genbank.txt

Do they produce the same result? They should - the “$” means match the end of the line. These commands match any line ending with a “)”. If we want to use grep without the double quotes we have to escape the character that is special to the shell - in this case ). In order to escape characters we prefix them with \.

With grep you can match multiple characters. You can specify the number of characters you want to match, or use “*” which matches as many characters as it can. Try the following:

    grep aaattt genbank.txt
    grep "a\{3\}t\{3\}" genbank.txt

These two match the same thing. “a\{3\}” means “match 3 a's”; note the escaped metacharacters in the line!
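If genbank.txt is not to hand, the same patterns can be tried on a tiny file. The four-line sample below is made up and only loosely shaped like a GenBank record; the sketch also uses grep -c, which is not covered above and prints the number of matching lines rather than the lines themselves.

```shell
workdir=$(mktemp -d)

# A made-up four-line sample, loosely shaped like a GenBank record.
cat > "$workdir/sample.txt" <<'EOF'
LOCUS       ABC123
ACCESSION   ABC123
  ORGANISM  Homo sapiens (human)
ORIGIN      aaatttgggccc
EOF

# grep exits non-zero when nothing matches, hence the "|| true".
any_case=$(grep -c -i 'accession' "$workdir/sample.txt" || true)   # ignore case
line_start=$(grep -c '^ORGANISM' "$workdir/sample.txt" || true)    # ^ = start of line
line_end=$(grep -c ')$' "$workdir/sample.txt" || true)             # $ = end of line
repeats=$(grep -c 'a\{3\}t\{3\}' "$workdir/sample.txt" || true)    # aaa followed by ttt

rm -rf "$workdir"
echo "$any_case $line_start $line_end $repeats"
```

The second count is 0 because ORGANISM is indented in the sample, just as it is in a real record.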

More information on grep and sed

If you're feeling brave:

    man sed
    man grep

If you want tutorials on the web:

    http://pegasus.rutgers.edu/~elflord/unix/sed.html
    http://pegasus.rutgers.edu/~elflord/unix/grep.html

Command repetition - the power of “foreach”

foreach is a command which is a component of the shell (i.e. built in). foreach allows you to iterate over n items specified in a list and perform an action on each one. Let's say you have 100 .gz files that you want to unzip. You could use foreach like this:

    foreach i (*.gz)
        gzip -d $i
    end

Or maybe you have 20 jpeg files you want to view:

    foreach i (*.jpg)
        xview $i
    end

You can even iterate over a list of elements in a file (such as filenames):

    foreach i (`cat file_of_filenames.txt`)
        <command> $i
    end
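foreach is available in zsh (the shell used on this course) and in csh/tcsh, but not in sh or bash. If you find yourself in one of those shells, the equivalent construct is a for loop. The sketch below is an illustration under that assumption; the file names are made up and it works in a scratch directory.

```shell
workdir=$(mktemp -d)

# Three small gzip files to work on; the names are made up.
for name in one two three; do
    echo "contents of $name" > "$workdir/$name.txt"
    gzip "$workdir/$name.txt"
done

# Bourne-style equivalent of:  foreach i (*.gz) ... end
for i in "$workdir"/*.gz; do
    gzip -d "$i"
done

remaining_gz=$(find "$workdir" -name '*.gz' | wc -l | tr -d ' ')
unpacked=$(find "$workdir" -name '*.txt' | wc -l | tr -d ' ')
echo "gz files left: $remaining_gz, txt files unpacked: $unpacked"

rm -rf "$workdir"
```

Quoting "$i" keeps the loop safe for file names that contain spaces.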

Systems administration and the “one liner”

One of the prized achievements of the system administrator is a cleverly crafted “one liner”. This is a single command, strung together from many others, that is a neat hack. It can be useful or amusing, and this is just considered to be a little light relief - don't worry about learning these - just try them:

To get the first 50 lines of /usr/share/dict/words:

    perl -pe 'exit if $. > 50' /usr/share/dict/words

To get all the palindromes in /usr/share/dict/words:

    perl -lne 'print if $_ eq reverse' /usr/share/dict/words

To get a man page into a text file without the control characters:

    man vi | col -b > vi.txt

(If you want to see why this is a good thing try: man vi > vi2.txt and compare vi.txt and vi2.txt)

If you want a command to warn you when your load average goes over 1.0 (which you could put in as a cron job!):

    cat /proc/loadavg | cut -f1 -d' ' | perl -n -e 'print "loadavg=$_" if $_>1'

Exercises:

1) Create a text file with a favoured line of prose or verse. Turn it into parody with sed.
2) Use grep to display genbank.txt without the AUTHOR lines.
3) Use grep to pull out lines of genbank.txt which have an EcoRI restriction site.
4) Read and try some of the things in these two links:
   http://pegasus.rutgers.edu/~elflord/unix/sed.html
   http://pegasus.rutgers.edu/~elflord/unix/grep.html
5) cat fofn.txt. Use foreach to display each file listed in fofn.txt to screen.

Setting up a printer under Bio-Linux

Setting up a printer under Bio-Linux is simple. There are five main types of printer:

    1. UNIX networked printer (lpd)
    2. local printer (attached to the machine via cable)
    3. Windows networked printer (attached to a Windows server/SAMBA)
    4. Novell networked printer (NCP)
    5. JetDirect networked printer (most LAN connected HP printers)

It is important to determine which one you use before you set it up. If the printer is connected via ethernet cable to a network point it probably has an IP address. Make sure you know it, and the correct model number of the printer. Most users have JetDirect printers. This tutorial takes you through setting up a JetDirect printer. Setting up other printers is similar - they are all set up via a wizard.

Exercise:

Start the wizard from the command line:

sudo printconf-gui

Press “New” to set up a new printer and follow the on-screen instructions. The queue name is arbitrary (unless your printer has defined queues; speak to your IT department about this). Press “Next” to get to the next stage.

Now enter the printer IP. If you want to fill in a dummy value just enter 192.168.1.1. Port 9100 is the default value for JetDirect. Press “Next”.

For JetDirect printers we strongly recommend selecting the “Raw Print Queue”. For other kinds of printers you may have to select the correct printer model. Experiment to see what works on your setup before contacting the EGTDC for help. Then press “Next” and “Finish”. Your printer is now set up.

When you return to the main menu, you will want to select the printer to be the “Default” printer and “Apply” the changes to restart the printer daemon lpd. You can then print a page from the “Test” menu.

Installing software

Software you may want to install on your Bio-Linux machine is distributed in two main ways - rpm's and source code:

RPM's

rpm stands for “RedHat Package Manager”. rpm's allow the installation, removal, upgrading and listing of packages installed on your system. They are reasonably flexible and easy to use, and RPM is the mechanism used to install the RedCarpet updates on your system. Software is precompiled for your machine and after unpacking will be ready to run.

RPM basics

In order to list all of the rpm packages on your system: rpm -qa

An rpm aside: Beware of naming conventions. If you download an rpm it will most likely be named something like super_spiffing_package-1.51.ix86.rpm. When this package is installed it will only be referred to by the package name and number, super_spiffing_package-1.51, and this is what you will see in the rpm -qa output. The ix86.rpm portion is lost.
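The naming convention in the aside can be sketched with plain shell string handling. This is an illustration only: rpm_installed_name is a hypothetical helper, it never calls rpm, and it assumes the file name follows the name-version.architecture.rpm layout used in the example above.

```shell
# rpm_installed_name FILE
# Derives the name-version string described above from a downloaded
# file name by dropping the directory, the .rpm suffix and the
# architecture field. Hypothetical helper; string handling only.
rpm_installed_name() {
    file="${1##*/}"             # drop any leading directory
    file="${file%.rpm}"         # drop the .rpm suffix
    printf '%s\n' "${file%.*}"  # drop the architecture field
}

rpm_installed_name super_spiffing_package-1.51.ix86.rpm
```

The result is the string to look for in the rpm -qa output once the package is installed.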

To install an rpm: rpm -i <package>.rpm

Another aside: Installing rpm's can sometimes be an exercise in patience. Many rpm's rely on other software being installed before they can be installed; this is known as a dependency. If an rpm has unmet dependencies you will get dependency warnings, showing what you need to install first. This can be a bit of a headache when you need to install 3 or more packages prior to getting the one you want installed. There is also the possibility of circular dependencies. This is where package 1 relies on package 2 which relies on package 3 which relies on package 1. These are uncommon but do happen.

Updating an rpm:

If you attempt to install the same, or a previous, version of an rpm, rpm will warn you that you cannot do this. To cleanly upgrade an rpm on the system use the following syntax:

rpm -Uvh <package>.rpm

The extra flags here are “v” for verbose (it prints extra information) and “h” for hash mark printing (which means you get a visual indication of how much the install has left to go).

Freshening an rpm: There is an alternative way to upgrade rpm's, called freshening. Freshen works like update but only works if a previous version of the software is installed. The update mechanism is more of a catch-all: it will update if a previous version is present, but if that version is not present then it will simply do an install. The command is:

rpm -Fvh <package>.rpm
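The difference between update and freshen described above amounts to a small decision table. The sketch below only restates that table in shell; rpm_mode_result is a hypothetical function that does not call rpm at all, and its "yes"/"no" argument stands in for whether a previous version is installed.

```shell
# rpm_mode_result MODE INSTALLED
# MODE is "U" (update) or "F" (freshen); INSTALLED is "yes" if a
# previous version of the package is on the system, "no" otherwise.
# Prints what the text above says will happen. Hypothetical helper.
rpm_mode_result() {
    case "$1:$2" in
        U:yes|F:yes) echo "upgrade" ;;   # both modes upgrade an existing package
        U:no)        echo "install" ;;   # update falls back to a plain install
        F:no)        echo "nothing" ;;   # freshen leaves the system untouched
        *)           echo "unknown"; return 1 ;;
    esac
}

rpm_mode_result U no
rpm_mode_result F no
```

In practice this means -F is the safer choice when pointing rpm at a directory of mixed packages, since it will not pull in software that was never installed.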

Removing an rpm:

At some point you may want to get rid of an rpm on the system. To do this you need the name of the rpm package. You can get this by examining the output of rpm -qa. If the rpm you are trying to remove is used by other rpm's on the system (i.e. it is a dependency for other rpm's), rpm will tell you when you try to remove it. Use this with caution, as removing the wrong rpm can easily bring your system to its knees:

rpm -e <package>

Querying an rpm:

One of the main issues with rpm's is not knowing where the software is installed or what files are installed. The rpm query mechanism is very powerful once you know the syntax. To find out this kind of information try: rpm -qpl <package>.rpm

You can also try: rpm -qip <package>.rpm This lists all kinds of useful information about a package including a description of the package.

And this very useful command: rpm -qf <file> tells you if an installed file is part of an rpm package and if so, which one. Adding d, as in rpm -qdf <file>, lists the documentation files of that package instead.

Often you will need to do some rpm troubleshooting. Outlined below are a number of common errors and their resolutions:

“Package already installed”, means what it says, the package and version number of the package match that already installed. If you really want to go ahead and install this package overwriting the previous installation you can use:

rpm -ivh --replacepkgs <package>.rpm

“<file> conflicts with file from <package>”, means one of the files the rpm wants to install is already present in another package that is already installed. It may be newer or older than the already installed file, and the package it conflicts with is not guaranteed to work if you choose to install the rpm anyway. Use this with caution:

rpm -ivh --replacefiles <package>.rpm

“error: unresolved dependencies”, means that you need to install some more packages before you can install the rpm. Sometimes, however, rpm can get confused. If you are sure that you have the dependency it is lacking installed, and rpm is in error, you can use:

rpm -ivh --nodeps <package>.rpm

Installing from source - the power user approach

Using rpm is just one way of installing new software. Many other mechanisms exist such as using () packages. However, these ways of packaging binaries are relatively new to the installation world, and many packages are still distributed as compressed source code. Source packages usually come as files ending in .tar.gz or .tgz. These are files that have been archived first with tar (tape archive) and then compressed with gzip. What these archives contain is the program code and some documentation. In order to get the package installed it must first be compiled into a working binary executable. This is normally done using a mechanism called make, which needs a file called Makefile. The actual compiling is done using gcc. Whilst installing from source code pre-dates rpm's and can appear slightly arcane to new users, it offers a great deal of control. Installing from source is an essential skill for a systems administrator.

Exercise:

In order to learn about the installation from source we are going to install snort.

An aside: snort is an IDS (Intrusion Detection System), a mature, widely used package for spotting hackers trying to get into your system. snort is only distributed as source code and has a single dependency, a code library called libpcap.

The first task with a source package is to decompress it:

tar zxvf snort-2.0.1.tar.gz

This creates a folder called snort-2.0.1 and places all the required files for installation in that directory.

Change to the snort directory:

cd snort-2.0.1

List the files and folders:

ls

The documentation in this case is stored in the doc/ directory, but documentation does not always have a directory of its own. Change to this directory:

cd doc

List the files:

ls

The files contain the following:

AUTHORS: Name and email addresses of package authors.
BUGS: Methods for reporting bugs, known issues, or debugging.
FAQ: Project FAQ if one exists.
INSTALL: Installation instructions. READ FULLY!!!
NEWS: A list of updates since the last release.
README: Copyright, synopsis, raison d'etre. Read this FIRST!
TODO: List of future goals.
USAGE: Humanised instructions for using the software.
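The unpack-and-inspect routine above can be practised without downloading anything. The sketch below builds a throwaway archive first; the package name demo-1.0 and its two files are made up for illustration and stand in for a real source tarball.

```shell
# Build a throwaway source tree so the example needs no real download.
# "demo-1.0" and its contents are invented for this illustration.
workdir=$(mktemp -d)
mkdir -p "$workdir/demo-1.0/doc"
echo "Read this first" > "$workdir/demo-1.0/doc/README"
echo "Installation instructions" > "$workdir/demo-1.0/doc/INSTALL"
( cd "$workdir" && tar zcf demo-1.0.tar.gz demo-1.0 && rm -r demo-1.0 )

# Check the contents before unpacking, then extract
cd "$workdir"
tar ztf demo-1.0.tar.gz
tar zxf demo-1.0.tar.gz

# Look for the usual documentation files, wherever the package keeps them
docs=$(find demo-1.0 -type f \( -name README -o -name INSTALL \) | sort)
echo "$docs"
```

Using find rather than assuming a doc/ directory covers packages that keep README and INSTALL in the top level directory instead.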

Normally there are three stages to the installation of a package from source.

1. Configuration:

First of all we need to be back in the top level directory:

cd ..

Now we use the following command to configure aspects of the install:

./configure

This tries to guess a number of system dependent parameters before actually compiling the code. The configure step produces Makefile(s) which ultimately control how the code is compiled.

2. Making:

Now we want to build the software:

make

The make command now builds (compiles) the software using the optimisations and parameters from the configure command. The software is ready to be used now, but is not installed.

3. Installing:

This is usually done with sudo if you want to install the newly compiled binaries system wide. It installs the binaries and also the man pages (documentation for the program): sudo make install

It is quite common to make changes to the default install values. The process above basically accepts all the default values for the configuration, compilation and installation. By default a make install will install binaries to /usr/local/bin and documentation to /usr/local/man. To change this you can do:

./configure --prefix=/new/location

Many programs allow you to configure additional functionality at configuration time. snort allows many extras to be added in the following fashion:

./configure --enable-smbalerts

If configure incorrectly guesses your system type, or isn't playing fair on your system, you can use:

./configure --host=<host>

where <host> is CPU-COMPANY-SYSTEM. For Bio-Linux you would use:

./configure --host=i686-intel-linux
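The three stages and the prefix option can be pulled together in one place. The sketch below is a dry run: source_install_steps is a hypothetical helper that only prints the commands, and its rule for suggesting sudo (the target directory is not writable by the current user) is an assumption, not something the package itself decides.

```shell
# source_install_steps PREFIX
# Prints (does not run) the three commands for building a source
# package that installs under PREFIX. sudo is suggested only when
# PREFIX, or its parent if PREFIX does not exist yet, is not writable
# by the current user. Hypothetical helper name.
source_install_steps() {
    prefix="$1"
    target="$prefix"
    [ -e "$target" ] || target=$(dirname "$prefix")
    echo "./configure --prefix=$prefix"
    echo "make"
    if [ -w "$target" ]; then
        echo "make install"
    else
        echo "sudo make install"
    fi
}

source_install_steps /usr/local
```

Printing the commands first is a cheap way to confirm where files will end up before anything is written to the system.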

Installing software using YUM

Bio-Linux 3.0 has introduced a new way of keeping your system up to date, and managing packages on the system. Yum, the "Yellowdog Updater, Modified", is a program for keeping systems up to date and simplifying the install process for RPM (Red Hat Package Manager) packages.

Yum runs in 2 ways. The first method is the "daemon". The Yum daemon starts automatically when you turn on your Bio-Linux machine. On a daily basis this contacts a remote repository of packages and sees if any new packages are available for your system. It does this by first building a list of packages on your system and comparing them against its own database remotely. If an updated package is found then it is installed without user intervention and the system is always therefore kept up to date. This mechanism replaces the RedCarpet program in Bio-Linux 2.X.

We advise systems administrators to keep an eye on their Yum updates to make sure nothing has gone wrong on the nightly updates. This can be achieved by looking at the Yum logfile in /var/log/yum.log on a weekly basis.
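The weekly check can be reduced to a single command. The sketch below is one possible way to do it: recent_updates is a hypothetical helper that simply shows the tail of the log file, and it makes no assumptions about the format of the lines yum writes.

```shell
# recent_updates [LINES] [LOGFILE]
# Shows the last LINES lines (default 20) of LOGFILE
# (default /var/log/yum.log). Hypothetical helper name.
recent_updates() {
    lines="${1:-20}"
    logfile="${2:-/var/log/yum.log}"
    if [ -r "$logfile" ]; then
        tail -n "$lines" "$logfile"
    else
        echo "cannot read $logfile" >&2
        return 1
    fi
}
```

Running recent_updates 50 after a week away shows the most recent entries. If the log is not readable by your user, the function reports that instead of failing silently.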

Yum is also a fully fledged package management system. The examples below demonstrate a useful subset of the Yum commands.

To check if there are any updates pending for your system:
sudo yum check-update

To update your system completely:
sudo yum update

To install a new package:
sudo yum install <package>

To update an existing package:
sudo yum update <package>

To remove an existing package:
sudo yum remove <package>

To get basic information on an available package:
sudo yum list <package>

To find out which package provides a certain file:
sudo yum provides <file>

To get detailed information on a package:
sudo yum info <package>

To clean out cruft from yum:
sudo yum clean

Please note that yum is a pretty smart piece of software and will quite happily accept partial names for packages in a way that rpm does not. Therefore to install a package you only need to type (for example)

sudo yum install nedit

to install nedit rather than

sudo yum install nedit_5.3-4.i386

Exercises:

1. Examine the output of rpm -qa. If you don't recognise what the packages are, use man to investigate what kind of services they are providing.

2. Attempt to install pan-0.13.4-nospell1.i686.rpm by searching out the dependencies online. Google for the errors: has anyone else had the same problem? Do their fixes work? Which forums have good information? If all else fails, install it with no dependencies and see if it works.

3. Uninstall the rpm from 2).

4. Play around with the rpm -q options. Try to make some sense of the man pages.

5. Go to http://sourceforge.net. Find a small package that comes as a .tar.gz or .tgz file (not rpm) and attempt to compile and install it.

6. Install snort to /tmp/.

7. Using yum, install ytalk.

8. Using yum, uninstall slrn.

9. Using yum, update the entire system.

10. Check /var/log/yum.log to see what yum has been installing recently.