A Genomics View of Unix

Genomics Requires Powerful Computers
• Genomic data sets are large and complex, so the computing power they require is also large
• Computers that can provide this power generally use the Unix operating system, so you must learn Unix
• Mac OS X is based on Unix, so most of the Unix information here also holds for your Mac: simply open “Terminal” (or an xterm inside X11) and you can control your PowerBook with Unix commands

Unix Advantages
• It is very popular, so it is easy to find information and get help
  – Lots of books
  – Plenty of helpful websites
  – USENET discussions (Google Groups) and e-mail lists
  – Most Comp. Sci. students know Unix
• Unix can run on virtually any computer (IBM, Sun, Compaq, Macintosh, etc.)
• Unix is free or nearly free
  – The open source software movement
  – SuSE, FreeBSD, Red Hat, LinuxPPC, etc.

Stable and Efficient
• Unix is very stable: computers running Unix almost never crash
• Unix is very efficient
  – It gets maximum number-crunching power out of your processor (and multiple processors)
  – It can smoothly manage very large amounts of data
• Most bioinformatics software is created for Unix because it is easy for the programmers

Unix has some Drawbacks
• Unix computers are controlled by a command line interface
  – NOT user-friendly
  – Difficult to learn, even more difficult to truly master
• Hackers love Unix
  – There are lots of security holes
  – Most computers on the Internet run Unix, so hackers can apply the same tricks to many different computers
• There are many different versions of Unix with subtle (or not so subtle) differences

General Unix Tips
• UNIX is case sensitive!!
  – myfile.txt and MyFile.txt do not mean the same thing
• You communicate with a Unix computer through a command program known as a shell.
• The shell interprets the commands that you type on the keyboard
• Advanced users can write shell scripts to automate program execution, which is important for large (genomic size) datasets
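To make the point about shell scripts concrete, here is a minimal sketch of a script that repeats one step over many files. The scratch directory, the file names, and the two tiny FASTA files are made up for illustration; the only fact relied on is that each FASTA record starts with a ">" header line.

```shell
#!/bin/sh
# Hypothetical demo data: a scratch directory with two tiny FASTA files.
demo=$(mktemp -d)
printf '>seq1\nACGT\n>seq2\nGGCC\n' > "$demo/a.fasta"
printf '>seq3\nTTAA\n' > "$demo/b.fasta"

# Loop over every .fasta file and count its sequences.
# grep -c counts the lines that start with ">" (one per sequence).
for f in "$demo"/*.fasta; do
    n=$(grep -c '^>' "$f")
    echo "$(basename "$f"): $n sequences"
done > "$demo/counts.txt"

# Show the summary that the loop wrote.
cat "$demo/counts.txt"
```

The same loop works unchanged whether the directory holds two files or twenty thousand, which is why scripting matters for genome-sized data.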

Tips: Control Characters
• You type Control characters by holding down the ‘ctrl’ key while also pressing the specified character.
• While you are typing a command:
  – ctrl-U erases the whole command line
• Control commands that work (almost) any time:
  – ctrl-S suspends (halts) output scrolling up on your terminal screen
  – ctrl-Q resumes the display of output on your screen
  – ctrl-C will abort any program
  – ctrl-Z will pause any program

Getting Help in Unix
• Unix is not a user-friendly computer system.
  – While not actively user-hostile, it is perfectly happy to sit there and taunt you with a blank screen and a blinking > cursor.
• There is a rudimentary Help system that consists of a set of "manual" pages for every Unix command.
• The man pages tell you which options a particular command can take, and how each option modifies the behavior of the command.
• Type man and the name of a command to read the manual page for that command.

Apple’s OS X man page for ‘ls’

    LS(1)              System General Commands Manual              LS(1)

    NAME
         ls - list directory contents

    SYNOPSIS
         ls [-ACFLRSTWacdfgiklnoqrstux1] [file ...]

    DESCRIPTION
         For each operand that names a file of a type other than directory, ls
         displays its name as well as any requested, associated information. For
         each operand that names a file of type directory, ls displays the names
         of files contained within that directory, as well as any requested,
         associated information.

         If no operands are given, the contents of the current directory are
         displayed. If more than one operand is given, non-directory operands are
         displayed first; directory and non-directory operands are sorted
         separately and in lexicographical order.

         The following options are available:

         -A      List all entries except for `.' and `..'. Always set for the

More Help (man -k)
• So what if you don’t know what command you need?
• You can search the manual pages for a given keyword in the header: man -k keyword
  – E.g. to get help on commands that deal with passwords you would type: man -k password
  – apropos works on some Unix systems as well; try man apropos on other Unix systems for help
• Get yourself a good "Intro to Unix" book

Unix Help on the Web
Here is a list of a few online Unix tutorials:
• Unix for Beginners
  http://www.ee.surrey.ac.uk/Teaching/Unix/
• Getting Started in Unix
  http://www2.ucsc.edu/cats/sc/help/unix/start.shtml
• Unix Guru Universe
  http://www.ugu.com/sui/ugu/show?help.beginners
• Getting Started With The Unix Operating System
  http://www.leeds.ac.uk/iss/documentation/beg/beg8/beg8.html

You will need to be able to:
• Type and understand basic commands
  – Manipulate files (copy, delete, examine, etc.)
  – Run bioinformatics programs (consed, etc.)
• Understand files and directories

Unix Commands
Most commands have this basic structure:

    command   -a -b -c    input.file
    (program) (modifiers) (input)

• Program: the name of the program you want to run (blast, consed, etc.)
• Modifiers (optional, usually in any order): a CASE SENSITIVE letter which controls exactly how the program will run
  – Some will be plain “-l”, some will have values “-l guest”
• Input (optional): the file(s) you wish the program to act upon.

Unix Commands (cont)
• Many Unix commands have short and cryptic names (ls, for example).
• Computer geeks like it that way. The more you use it the more you will like it
• Every command has a set of modifiers which are generally single letters preceded by a hyphen: ls -l or ls -R
• Capital letters have different functions than small letters, often completely unrelated.
• A command also generally requires one or more arguments, meaning some file on which it will act: -n mygene.
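The program / modifiers / input structure can be tried out with ls. This is a small sketch in a throwaway directory; the directory and file names (mygene.seq, .hidden, sub, inner.txt) are invented for the demonstration.

```shell
#!/bin/sh
# Hypothetical scratch directory with one visible file, one hidden file,
# and one sub-directory containing a file.
demo=$(mktemp -d)
touch "$demo/mygene.seq" "$demo/.hidden"
mkdir "$demo/sub"
touch "$demo/sub/inner.txt"

# Program + input, no modifiers: list the directory.
ls "$demo"

# One modifier: -a also shows names that start with a dot.
ls -a "$demo"

# Several modifiers, given separately or combined, in any order.
ls -l -a "$demo"
ls -al "$demo"

# Capital -R is a different option from small -r:
# -R lists sub-directories recursively, -r reverses the sort order.
ls -R "$demo"
ls -r "$demo"
```

Running man ls shows the full list of modifiers that your particular version of ls understands.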

Let's change your Password
1. Start X11 (click on its icon at the bottom of the screen)
2. You should see an “xterm” window show up where you can type commands
3. Type: passwd
   This command has no arguments
4. Enter current password (dna50)
5. Enter new password 2X
6. DO NOT FORGET!

Unix Filenames
• Unix is case sensitive
• UNIX filenames should contain only letters, numbers, and the _ (underscore), . (period), and - (dash) characters.
• Unix does not allow two files to exist in the same directory with the same name.
• Whenever a file is about to be created or copied into a directory where another file has that exact same name, the new file will overwrite (and delete) the older file.
• Some Unix setups alert you when this is about to happen, and the warning is easy to ignore; many others overwrite the older file without any warning at all.
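The overwrite behaviour is easy to see for yourself. The sketch below uses a scratch directory and made-up file names; it assumes a standard POSIX cp, where the -i option asks before replacing an existing file and reads the answer from standard input.

```shell
#!/bin/sh
# Hypothetical scratch directory and files.
demo=$(mktemp -d)
cd "$demo" || exit 1
echo "old data" > results.txt
echo "new data" > update.txt
echo "other data" > other.txt

# A plain cp replaces results.txt; the old contents are gone.
cp update.txt results.txt
cat results.txt

# With -i, cp asks first. Here the answer "n" is piped in,
# so results.txt is left alone. (Some versions of cp report a
# non-zero exit status when nothing is copied, hence "|| true".)
printf 'n\n' | cp -i other.txt results.txt || true
cat results.txt
```

If you want the question asked every time, many people set up an alias so that cp, mv and rm always run with -i; whether your account already has one depends on how it was configured.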

Filename Extensions
• By convention most UNIX filenames start with a lower case letter and end with a dot followed by one, two, or three letters (or more): myfile.txt
• However, this is just a common convention and is not required.
• It is also possible to have additional dots in the filename.
• The part of the name following the dot is called the “extension.”
• The extension is used to designate the file type
• This convention is used to help you remember what the file is. Unix does not require extensions.

Some Common Extensions
• By convention files that end in:
  – .txt are text files
  – .html are HTML files for the Web
  – .zip or .gz are compressed files
  – .fasta are DNA or peptide sequences in fasta format
  – .seq or .pep are DNA or protein sequences
• Files can have more than one extension.
  – Compressed text files would end .txt.gz
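As a small illustration of stacked extensions, the sketch below compresses a file with gzip, which adds .gz to the name, and then shows that an extension is only part of the name. It assumes gzip and gunzip are installed (they are on most Unix systems); the file mygene.fasta is a made-up example.

```shell
#!/bin/sh
# Hypothetical scratch directory with one small FASTA file.
demo=$(mktemp -d)
cd "$demo" || exit 1
printf '>mygene\nACGTACGT\n' > mygene.fasta

# gzip replaces the file with a compressed copy named mygene.fasta.gz.
gzip mygene.fasta
ls

# gunzip restores the original name and contents.
gunzip mygene.fasta.gz
ls

# The extension is just a label: a copy with a different extension
# has exactly the same contents.
cp mygene.fasta mygene.txt
cmp mygene.fasta mygene.txt && echo "same contents"
```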

Unix directories
• Unix systems can have many thousands of files
• Directories are a means of organizing your files on a Unix computer.
  – They are equivalent to folders in Windows and Macintosh computers
• Start with the “root” directory: /
• There are files and directories inside root
• There are files and other directories inside those
• There are files and other directories inside those
• and so on

Unix directories (cont)
When logged into a computer there is always one directory called the “current working directory”.

The computer will look for files in the “current working dir”, unless told specifically where to look.

Unix directories (cont)
A list of all the directories leading to a file is called the “path”.
The path can either start at root “/”:
    /Users/chris/Documents/4342files/2004/lecture1.ppt
or at the current working directory:
    Documents/4342files/2004/lecture1.ppt

Your Home Directory
• When you log in to any Unix server, you always start in your Home directory.
  – On Mac my home is /Users/chris
• Create sub-directories to store specific projects or groups of information, just as you would place folders in a filing cabinet.
• Do not accumulate thousands of files with cryptic names in your Home directory
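The two kinds of path can be compared directly. In this sketch the directory tree is built inside a scratch directory, so the names (Documents, 2004, lecture1.txt) are stand-ins rather than real files on your machine.

```shell
#!/bin/sh
# Hypothetical directory tree inside a scratch directory.
demo=$(mktemp -d)
mkdir -p "$demo/Documents/2004"
echo "lecture notes" > "$demo/Documents/2004/lecture1.txt"

# Make the scratch directory the current working directory.
cd "$demo" || exit 1

# A path that starts at root "/" works from anywhere.
cat "$demo/Documents/2004/lecture1.txt"

# A path that starts at the current working directory is shorter,
# but only works while you are in that directory.
cat Documents/2004/lecture1.txt

# After moving down one level, the path from here gets shorter again.
cd Documents || exit 1
cat 2004/lecture1.txt
```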

File & Directory Commands
• This is a minimal list of Unix commands that you must know for file management:

    ls (list files)           mkdir (make directory)
    cd (change directory)     rmdir (remove directory)
    cp (copy file)            pwd (present working directory)
    mv (move file)            more (view by page)
    rm (remove)               cat (display file on screen)

• All of these commands can be modified with many options. Learn to use Unix ‘man’ pages for more information.

Navigation
• pwd (present working directory) shows the name of the current working directory:

    > pwd
    /Users/chris

• ls (list) gives you a list of the files in the current directory:

    > ls
    Desktop Public Documents Movies Sites

• Use the ls -l (long) option to get more information

    ~/ >ls -l
    total 0
    drwx------  10 chris staff 340 Mar  9 11:11 Desktop
    drwx------  13 chris staff 442 Mar  9 11:11 Documents
    drwx------  28 chris staff 952 Feb 21 10:42 Library
    drwx------   3 chris staff 102 Dec 20 21:16 Movies
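A quick way to practice pwd and ls without touching your real files is to work in a scratch directory, as in this sketch. The two directory names are made up, and the long listing will show your own user name, sizes and dates rather than the ones on the slide.

```shell
#!/bin/sh
# Hypothetical scratch directory containing two sub-directories.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir Desktop Documents

# Where am I?
pwd

# What is here?
ls

# The long listing adds permissions, owner, size and modification time.
ls -l
```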

Sub-directories
• cd (change directory) sets a new current working dir

    > cd Documents
    > pwd
    /Users/chris/Documents

• mkdir (make directory) creates a new sub-directory inside of the current directory

    > ls
    Leah.cvs splitGTI.pl
    > mkdir subdir
    > ls
    Leah.cvs splitGTI.pl subdir

• rmdir (remove directory) deletes a sub-directory, but the sub-directory must be empty

    > rmdir subdir
    > ls
    Leah.cvs splitGTI.pl

Shortcuts
• There are some important shortcuts in Unix for specifying directories
• . (dot) means "the current directory"
• .. means "the parent directory", the directory one level above the current directory, so cd .. will move you up one level
• ~ (tilde) means your Home directory, so cd ~ will move you back to your Home.
  – Just typing a plain cd will also bring you back to your home directory
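The directory commands and shortcuts can be combined into one short walk-through. This is a sketch in a scratch directory; subdir is just an example name.

```shell
#!/bin/sh
# Hypothetical scratch directory to practice in.
demo=$(mktemp -d)
cd "$demo" || exit 1

# Make a sub-directory and move into it.
mkdir subdir
cd subdir || exit 1
pwd

# ".." is the parent directory: this moves back up one level.
cd ..
pwd

# "." is the current directory, so this lists where we are now.
ls .

# rmdir only works because subdir is empty.
rmdir subdir
ls .

# "~" is your Home directory.
cd ~ || exit 1
pwd
```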

Commands for Files
• Files are used to store information, for example, DNA sequence data or the results of some analysis.
  – You will mostly deal with text files
• cat dumps the entire contents of a file onto the screen.
  – For a long file this can be annoying, but it can also be helpful if you want to copy and paste from your screen

More
• Use the command more to view the contents of a file one screen at a time:

    > more t27054_cel.pep
    !!AA_SEQUENCE 1.0
    P1;T27054 - protein Y49E10.20 - Caenorhabditis elegans
    Length: 534 May 30, 2000 13:49 Type: P Check: 1278 ..
      1 MLKKAPCLFG SAIILGLLLA AAGVLLLIGI PIDRIVNRQV IDQDFLGYTR
     51 DENGTEVPNA MTKSWLKPLY AMQLNIWMFN VTNVDGILKR HEKPNLHEIG
    101 PFVFDEVQEK VYHRFADNDT RVFYKNQKLY HFNKNASCPT CHLDMKVTIP
    t27054_cel.pep (87%)

  – Hit the spacebar to page down through the file
  – b moves back up a page
  – At the bottom of the screen, more shows how much of the file has been displayed
  – /pattern searches ahead for “pattern”

Copy & Move
• cp lets you copy a file from any directory to any other directory, or create a copy of a file with a new name in one directory

    cp file.ext newname.ext
    cp file.ext subdir/newname.ext
    cp /Users/chris/file.ext ./subdir/newfilename.ext

• mv allows you to move files to other directories, but it is also used to rename files.
  – Filename and directory syntax for mv is exactly the same as for the cp command.

    mv filename.ext subdir/newfilename.ext

  – NOTE: When you use mv to move a file into another directory, the file no longer exists in its original location.

Delete
• Use the command rm (remove) to delete files
• There is no way to undo this command!!!
  – Some Unix setups will ask you to confirm, others will not. Be aware of this when deleting

    > ls
    af151074.gb_pr5 test.seq
    > rm test.seq
    > ls
    af151074.gb_pr5
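Here is one way to practice copying, renaming, moving and deleting safely. Everything happens in a scratch directory, and the file names are invented for the exercise.

```shell
#!/bin/sh
# Hypothetical scratch directory with one file and one sub-directory.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir subdir
echo "ACGT" > test.seq

# Copy within the same directory under a new name.
cp test.seq copy.seq

# Copy into a sub-directory, again under a new name.
cp test.seq subdir/copy2.seq

# Rename: mv with a new name in the same directory.
mv copy.seq renamed.seq

# Move: mv into another directory; renamed.seq leaves this one.
mv renamed.seq subdir/

# Delete: there is no undo for rm.
rm test.seq

# The top level now holds only subdir; the two copies are inside it.
ls . subdir
```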

Wildcards
• You may often want to work on more than one file at a time. You can use wildcard symbols to do this.
• You can substitute the * as a wildcard symbol for any number of characters in any filename.
• If you type just * after a command, it stands for all files in the current directory: cat * will dump all files to the screen
• You can mix the * with other characters to form a search pattern: a*.txt would list all files that start with “a” and end in “.txt”
• The “?” wildcard stands for any single character: draft?.doc would match draft1.doc, draft2.doc, draftb.doc, etc.

Two Tricks for everyone
• Command history
• Filename completion
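A safe way to see what a wildcard will match before using it with a command like rm is to give the pattern to echo, which simply prints the file names the shell substituted. The sketch below uses empty, made-up files in a scratch directory.

```shell
#!/bin/sh
# Hypothetical scratch directory with a handful of empty files.
demo=$(mktemp -d)
cd "$demo" || exit 1
touch apt.txt abc.txt notes.txt draft1.doc draft2.doc draftb.doc draft10.doc

# * matches any number of characters: names starting with "a", ending in ".txt".
echo a*.txt        # prints: abc.txt apt.txt

# ? matches exactly one character, so draft10.doc is NOT matched.
echo draft?.doc    # prints: draft1.doc draft2.doc draftb.doc

# A bare * stands for every (non-hidden) file in the current directory.
echo *
```

The shell expands the pattern before the command runs, so cat, ls, cp and rm all see the same list of names that echo printed.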

Command history
• The computer remembers your last 100 or so commands.
• Use the “up arrow” to scroll back through these commands.
• You can use right and left arrow to move around inside the command to edit it
• Very nice for long commands (e.g. blast searches)

File name completion
• Since the computer will only look in the current working dir for files, when you want to type in a file name all you have to type is enough characters at the beginning to uniquely define the file
• When you hit the Tab key the shell will finish typing the name for you

    > ls
    af151074.gb_pr5 test.seq apt.txt
    > rm af                  (press Tab)
    > rm af151074.gb_pr5

• Use long names, but keep the differences at the beginning to facilitate using this feature
