Unix Essentials (Pdf)
Total Page:16
File Type:pdf, Size:1020Kb
Unix Essentials Bingbing Yuan Next Hot Topics: Unix – Beyond Basics (Mon Oct 20th at 1pm) 1 Objectives • Unix Overview • Whitehead Resources • Unix Commands • BaRC Resources • LSF 2 Objectives: Hands-on • Parsing Human Body Index (HBI) array data Goal: Process a large data file to get important information such as genes of interest, sorting expression values, and subset the data for further investigation. 3 Advantages of Unix • Processing files with thousands, or millions, of lines How many reads are in my fastq file? Sort by gene name or expression values • Many programs run on Unix only Command-line tools • Automate repetitive tasks or commands Scripting • Other software, such as Excel, are not able to handle large files efficiently • Open Source 4 Scientific computing resources 5 Shared packages/programs https://tak.wi.mit.edu Request new packages/programs Installed packages/programs 6 Login • Requesting a tak account http://iona.wi.mit.edu/bio/software/unix/bioinfoaccount.php • Windows PuTTY or Cygwin Xming: setup X-windows for graphical display • Macs Access through Terminal 7 Connecting to tak for Windows Command Prompt user@tak ~$ 8 Log in to tak for Mac ssh –Y [email protected] 9 Unix Commands • General syntax Command Options or switches (zero or more) Arguments (zero or more) Example: uniq –c myFile.txt command options arguments Options can be combined ls –l –a or ls –la • Manual (man) page man uniq • One line description whatis ls 10 Unix Directory Structure root / home dev bin nfs lab . jdoe BaRC_Public solexa_public solexa_lodish page /home/jdoe /lab/page 11 Accessing Shared Resources at Whitehead • Unix /nfs/BaRC_Public /lab/solexa_public /lab/page • Windows (access using Start Menu Search) \\wi-files1\BaRC_Public \\wi-files1\fink_lab \\wi-files2\page \\wi-htdata\solexa_public • Macs (access using Go Connect to Server…) cifs://wi-files1/BaRC_Public cifs://wi-htdata/solexa_public Where’s my lab’s share? • http://wi-inside.wi.mit.edu/departments/it/services/filestorage/labshares 12 Directory Contents • List files/directories ls lists the contents of a directory ls –l includes additional info (eg. permissions, time stamp) Options: -l long listing -h human readable thiruvil@tak /nfs/BaRC_Public$ ls -l total 4740 drwxrwxr-x 5 gbell barc 4096 2012-03-16 15:56 apps/ drwxrwxr-x 4 gbell barc 4096 2011-10-18 09:48 BaRC_code/ drwxrwxrwx 5 gbell barc 4096 2012-09-17 15:03 Bartel_Lab/ drwxrwsrwx 3 gbell barc 4096 2012-05-04 16:17 Cheeseman_Lab/ drwxrwsrwx 3 byuan barc 4096 2010-11-23 14:22 chip_seq/ drwxrwsrwx 2 gbell barc 4096 2012-02-21 16:26 CMT/ -rw-r--r-- 1 gbell barc 192568 2012-10-10 10:14 du.20121010a.txt Permissions Owner Group Size (bytes) Time Stamp File or directory 13 Permissions drwxrwxr-x r read Type: directory (d) User Group Others w write symbolic link(l) x execute • Use chmod to change permissions user(u), group(g), others(o), all(a) chmod u+x foo.pl (user can execute) chmod g-w foo.pl (group can’t write) permission denied error thiruvil@tak /nfs/BaRC_Public$ ls -l myFile.txt -rw-r--r-- 1 thiruvil barc 0 2012-10-10 13:32 myFile.txt thiruvil@tak /nfs/BaRC_Public$ chmod g+w myFile.txt thiruvil@tak /nfs/BaRC_Public$ ll myFile.txt -rw-rw-r-- 1 thiruvil barc 0 2012-10-10 13:32 myFile.txt 14 Navigating in Unix • pwd print working directory byuan@tak ~$ pwd /home/byuan • cd change directory cd fink_lab # if you are in /lab cd to home directory cd ~ cd to directory above cd .. cd to a specific directory cd /nfs/BaRC_Public • No such file or directory error 15 Organizing Files and Directories • Commands mkdir make a directory mkdir my_foo rmdir remove a directory (must be empty) rmdir my_foo mv move or rename a file/directory mv myOldFile myNewFile cp copy a file cp myOldFile myNewFile rm remove or delete a file rm myFile 16 Unix Tips • Use to reuse previous commands • Ctrl-c: stop a process that is running • Tab-completion: – Complete commands/file names • Unix is case-sensitive 17 Getting Files • Getting files or directories Files wget http://www.broadinstitute.org/igv/projects/downloads/IGV_2.1.17.zip Directories from (outside) servers scp -r [email protected]:/broad/lab/works . 18 (Un)Compressing Files • .gz file Compress: gzip expression.txt > expression.gz Uncompress: gunzip expression.gz • .tar.gz file Compress: tar –czvf myFiles.tar.gz myFiles Uncompress: tar –xzvf myFiles.tar.gz Options -c create an archive (files to archive, archive from files) -x extract an archive (archive to files, files from archive) -f FILE name of archive -v be verbose, list all files being archived/extracted -z create/extract archive with gzip/gunzip • View compressed files using: – zmore,zgrep 19 Editing a File • Command-line editors pico nano emacs (emacs –nw) vi • Graphical editors (Windows users need an X-windows emulator) Note: may not be part of standard installation nedit gedit xemacs • Put an & at the end of command line to run it in the background when using a graphical editor so that you can continue to use the terminal window eg. gedit myFile.txt& 20 Viewing a File • Display page-by-page basis more myFile.txt Use: to scroll, space for next page and q to quit • Display first 15 lines of a file head -15 myFile.txt • Display last 15 lines of a file tail -15 myFile.txt • Show all contents of a file cat myFile.txt Show hidden characters (^M or carriage return) cat –A myFile.txt • Display number of lines in a file wc –l myFile.txt 21 Output Redirection and Piping • Write output of a command to file Write to output file • sort myFile.txt > myFile_sorted.txt Replace to output file • sort myFile.txt >| myFile_sorted.txt Append to output file • sort myFile.txt >> myFile_sorted.txt • Piping “|”: use output of one command as input for another command sort myFile.txt | more 22 Parsing a File: cut • Select columns of interest cut –f 9,12-15 myGeneValues.txt > col_9.12to15.txt Options: -f output only these fields -d field delimiter 23 Parsing a File: sort and uniq • Sort on column(s) sort -k 3,3 myGeneExpression.txt | more Options: -n numerical sort -r reverse -k pos1,pos2 start a key at pos1, end it at pos2 • Get only unique entries ensure file is sorted before running uniq uniq mySortedGenes.txt > myUniqGenes.txt Options: -c count entries -d duplicate counts 24 Regular Expressions • Pattern matching • Easier to search • Commonly used regular expressions Regular Expression Matches . All characters * Zero or more; wildcard ^ Beginning of a line $ End of a line Example: list all txt files ls *.txt 25 Searching Within a File • grep (global regular expression print) • Find words, or patterns, occurring in lines of a file grep TMEM geneList.txt TMEM131 TMEM9B TMEM14C TMEM66 TMEM49 Options: -v select non-matching lines -i ignore case -n print line number Example: get TMEM that does not end with 9 grep TMEM geneList.txt | grep -v "TMEM14C" | more 26 BaRC Resources • jura.wi.mit.edu 27 BaRC SOP http://barcwiki.wi.mit.edu/wiki/SOPs 28 BaRC Scripts 29 Running Scripts on Unix • Perl bed2gff.pl • R run_rma_customCDF.R • Python myScript.py • Matlab matlab -nodesktop -nosplash myScript.m • Java Archive (JAR) java -Xmx1000m -jar /usr/local/share/IGVTools/igv.jar 30 Running Programs/Tools on Unix • bedtools bedtools intersect -a myGenes1.bed –b myGenes2.bed Other utilities: http://code.google.com/p/bedtools/wiki/Usage • samtools samtools view myFile.bam Other utilities: http://samtools.sourceforge.net/samtools.shtml • Fastx toolkit fastx_quality_stats -i mySeq.fastq -o fastxStats_mySeq • FastQC fastqc mySeq.fastq • BLAST blastp –task blastp -db myProtDB.fa –q myProt.fa –out out.txt 31 Commonly Used Data Locations at Whitehead Location Description /nfs/genomes Genome data: gff, gtf, fasta, bowtie indexed files, blat indexed file, etc. for several organisms /nfs/seq/Data Sequence data, including blast databases, for several organisms /nfs/BaRC_datasets Large (array/NGS) datasets: HBI, HBM 2.0 32 Scientific computing resources 33 LSF cluster jobs https://tak.wi.mit.edu 34 Load Sharing Facility (LSF) Cluster • More computing power • Multiple jobs running at the same time 35 LSF Commands • bsub to submit jobs bsub wc –l reads.fq bsub “sort foo.txt > sorted.txt” Options: -e error file -o standard out file -m machine -u email address • bjobs to view your jobs bjobs • bkill to kill a job bkill 237878 36 Further Reading • BaRC: Unix Info http://iona.wi.mit.edu/bio/education/unix_intro.php • LSF Cluster (incl. examples) http://iona.wi.mit.edu/bio/bioinfo/docs/LSF_help.php • Whitehead IT Scientific Computing Tutorials http://wi-inside.wi.mit.edu/departments/it/services/scientificcomputing/scitutorials 37 .