A Genomics View of Unix
Total Page:16
File Type:pdf, Size:1020Kb
Genomics Requires Powerful A Genomics View of Unix Computers • Genomic data sets are large and complex, the requirements for computing power are also quite large • Computers that can provide this power generally use the Unix operating system - so you must learn Unix • The mac OS X is based on Unix – most of the Unix information is true for your mac simply open “Terminal” inside X11 and you can control your powerbook with unix commands Unix Advantages Stable and Efficient • It is very popular, so it is easy to find • Unix is very stable - computers running Unix information and get help almost never crash • Lots of books • plenty of helpful websites • Unix is very efficient • USENET discussions (google groups) and e-mail lists • it gets maximum number crunching power out of your processor (and multiple processors) • most Comp. Sci. students know Unix • it can smoothly manage very large amounts of data • Unix can run on virtually any computer • Most bioinformatics software is created for (IBM, Sun, Compaq, Macintosh,etc) Unix - its easy for the programmers • Unix is free or nearly free • Linux/open source software movement • Suse, FreeBSD, Red Hat, LinuxPPC, etc. 1 Unix has some Drawbacks General Unix Tips • Unix computers are controlled by a command • UNIX is case sensitive!! line interface – myfile.txt and MyFile.txt do not mean the same thing • NOT user-friendly • difficult to learn, even more difficult to truly master • You communicate with a Unix computer through a command program known as a shell. • Hackers love Unix • The shell interprets the commands that you type on • there are lots of security holes the keyboard • most computers on the Internet run Unix , so hackers can • Advanced users can write shell scripts to automate apply the same tricks to many different computers program execution, important for large (genomic size) datasets • There are many different versions of Unix with subtle (or not so subtle) differences Tips…..Control Characters Getting Help in Unix • You type Control characters by holding • Unix is not a user-friendly computer system. down the ‘ctrl’ key while also pressing the – While not actively user-hostile, it is perfectly happy to specified character. sit there and taunt you with a blank screen and a • While you are typing a command: blinking > cursor. • ctrl-U erases the whole command line • There is a rudimentary Help system which consists of a set of "manual” pages for every Unix command. • Control commands that work (almost) any time •The man pages tell you which options a particular • ctrl-S suspends (halts) output scrolling up on your command can take, and how each option modifies terminal screen the behavior of the command. • ctrl-Q resumes the display of output on your screen • ctrl-C will abort any program • Type man and the name of a command to read the • ctrl-Z will pause any program manual page for that command. 2 ? Apple’s OS X man page for ‘ls’ More Help ( ) • So what if you don’t know what command LS(1) System General Commands Manual LS(1) NAME you need? ls - list directory contents • You can search the manual pages for a given SYNOPSIS ls [-ACFLRSTWacdfgiklnoqrstux1] [file ...] keyword in the header: DESCRIPTION For each operand that names a file of a type other than directory, ls man -k displays its name as well as any requested, associated information. For each operand that names a file of type directory, ls displays the names of files contained within that directory, as well as any requested, asso- – E.g. to get help on commands that deal with ciated information. passwords you would type: man -k password If no operands are given, the contents of the current directory are dis- played. If more than one operand is given, non-directory operands are – Apropos works on some unix systems as well, try displayed first; directory and non-directory operands are sorted sepa- rately and in lexicographical order. man apropos on other unix systems for help The following options are available: • Get yourself a good "Intro to Unix" book -A List all entries except for `.' and `..'. Always set for the Unix Help on the Web You will need to be able to: Here is a list of a few online Unix tutorials: • Unix for Beginners • Type and understand basic commands http://www.ee.surrey.ac.uk/Teaching/Unix/ – Manipulate files (copy, delete, examine etc) • Getting Started in Unix – Run Bioinformatic programs (consed, etc) http://www2.ucsc.edu/cats/sc/help/unix/start.shtml • Unix Guru Universe • Understand files and directories http://www.ugu.com/sui/ugu/show?help.beginners • Getting Started With The Unix Operating System http://www.leeds.ac.uk/iss/documentation/beg/beg8/beg8.html 3 Unix Commands Unix Commands (cont) Most commands have this basic structure: • Many Unix commands are short and cryptic like command -a -b -c input.file vi or rm. • Computer geeks like it that way. The more you use it Program modifiers input the more you will like it • Every command has a host of modifiers which are •Program: the name of the program you want to run (blast, consed, etc.) generally single letters preceded by a hyphen: ls -l or mv -R •Modifiers (optional, usually in any order): a • Capital letters have different functions than small CASE SENSITIVE letter which controls exactly letters, often completely unrelated. how the program will run • A command also generally requires one or more •Some will be plain “-l” some will have values arguments, meaning some file on which it will act: “-l guest” cat -n mygene.seq •Input (optional): the file(s) you wish the program to act upon. Lets change your Password Unix Filenames • Unix is case sensitive 1. Start X11 click on icon at bottom 2. You should see a window “xterm” • UNIX filenames should contain only letters, show up where you can type commands numbers, and the _ (underscore), . (dot), and - 3. Type: passwd (dash) characters. This command has no arguments 4. enter current password (dna50) • Unix does not allow two files to exist in the 5. enter new password 2X same directory with the same name. 6. DO NOT FORGET! • Whenever a situation occurs where a file is about to be created or copied into a directory where another file has that exact same name, the new file will overwrite (and delete) the older file. • Unix will generally alert you when this is about to happen, but it is easy to ignore the warning. 4 Filename Extensions Some Common Extensions • By convention most UNIX filenames start with a • By convention files that end in : lower case letter and end with a dot followed by – .txt are text files one, two, or three letters (or more): myfile.txt – .html are HTML files for the Web • However, this is just a common convention and is not required. • It is also possible to have additional dots in the filename. – .zip or .gz are compressed files • The part of the name following the dot is called the – .fasta are DNA or peptide sequences in fasta “extension.” format • The extension is used to designate the file type – .seq or .pep are DNA or protein sequences • This convention is used to help you remember what • Files can have more than one extension. the file is. Unix does not require extensions. – Compressed text files would end .txt.gz Unix directories Unix directories (cont) • Unix systems can have many thousands of files When logged into a computer there is always • Directories are a means of organizing your files on a Unix computer. one directory called the “current working – They are equivalent to folders in Windows and directory”. Macintosh computers • Start with the “root” directory: / The computer will look for files in the • There are files and directories inside root “current working dir” , unless told • There are files and other directories inside those specifically where to look. • There are files and other directories inside those •. •. 5 Unix directories (cont) Your Home Directory A list of all the directories to a file is called the “path”. • When you login to any Unix server, you The path can either start at root “/” always start in your Home directory. /Users/chris/Documents/4342files/2004/lecture1.ppt – On Mac my home is /Users/chris Or the current working directory • Create sub-directories to store specific Documents/4342files/2004/lecture1.ppt projects or groups of information, just as you would place folders in a filing cabinet. •Do not accumulate thousands of files with cryptic names in your Home directory File & Directory Commands Navigation • pwd (present working directory) shows the name of • This is a minimal list of Unix commands that you the current working directory: must know for file management: > pwd ls (list files) mkdir (make directory) /Users/chris cd (change directory) rmdir (remove directory) • ls (list) gives you a list of the files in the current directory: > ls cp (copy file) pwd (present working directory) Desktop Library Public Documents Movies Sites mv (move file) more (view by page) Use the ls -l (long) option to get more information rm (remove) cat (dump file on screen) ~/ >ls -l total 0 • All of these commands can be modified with many drwx------ 10 chris staff 340 Mar 9 11:11 Desktop drwx------ 13 chris staff 442 Mar 9 11:11 Documents options. Learn to use Unix ‘man’ pages for more drwx------ 28 chris staff 952 Feb 21 10:42 Library information. drwx------ 3 chris staff 102 Dec 20 21:16 Movies 6 Sub-directories Shortcuts • cd (change directory) set a new current working dir >cd Documents • There are some important shortcuts in Unix for > pwd /Users/chris/Documents specifying directories • mkdir (make directory) creates a new • .