Unix Essentials (Pdf)

Bingbing Yuan
Next Hot Topics: Unix – Beyond Basics (Mon Oct 20th at 1pm)

Objectives
• Unix Overview
• Whitehead Resources
• Unix Commands
• BaRC Resources
• LSF

Objectives: Hands-on
• Parsing Human Body Index (HBI) array data
  Goal: Process a large data file to extract important information: find genes of interest, sort expression values, and subset the data for further investigation.

Advantages of Unix
• Processing files with thousands, or millions, of lines
    How many reads are in my fastq file?
    Sort by gene name or expression values
• Many programs run on Unix only
    Command-line tools
• Automate repetitive tasks or commands
    Scripting
• Other software, such as Excel, is not able to handle large files efficiently
• Open Source

Scientific computing resources

Shared packages/programs
https://tak.wi.mit.edu
• Request new packages/programs
• Installed packages/programs

Login
• Requesting a tak account
    http://iona.wi.mit.edu/bio/software/unix/bioinfoaccount.php
• Windows
    PuTTY or Cygwin
    Xming: set up X-windows for graphical display
• Macs
    Access through Terminal

Connecting to tak for Windows
    Command Prompt
    user@tak ~$

Log in to tak for Mac
    ssh -Y [email protected]

Unix Commands
• General syntax
    Command
    Options or switches (zero or more)
    Arguments (zero or more)
  Example:
    uniq -c myFile.txt
    (command: uniq, options: -c, arguments: myFile.txt)
  Options can be combined:
    ls -l -a   or   ls -la
• Manual (man) page
    man uniq
• One-line description
    whatis ls

Unix Directory Structure
    /  (root)
      home
        jdoe              /home/jdoe
      dev
      bin
      nfs
        BaRC_Public
      lab
        solexa_public
        solexa_lodish
        page              /lab/page
      ...

Accessing Shared Resources at Whitehead
• Unix
    /nfs/BaRC_Public
    /lab/solexa_public
    /lab/page
• Windows (access using Start Menu Search)
    \\wi-files1\BaRC_Public
    \\wi-files1\fink_lab
    \\wi-files2\page
    \\wi-htdata\solexa_public
• Macs (access using Go > Connect to Server…)
    cifs://wi-files1/BaRC_Public
    cifs://wi-htdata/solexa_public
• Where's my lab's share?
    http://wi-inside.wi.mit.edu/departments/it/services/filestorage/labshares

Directory Contents
• List files/directories
    ls      lists the contents of a directory
    ls -l   includes additional info (e.g. permissions, time stamp)
  Options:
    -l  long listing
    -h  human readable

    thiruvil@tak /nfs/BaRC_Public$ ls -l
    total 4740
    drwxrwxr-x 5 gbell barc   4096 2012-03-16 15:56 apps/
    drwxrwxr-x 4 gbell barc   4096 2011-10-18 09:48 BaRC_code/
    drwxrwxrwx 5 gbell barc   4096 2012-09-17 15:03 Bartel_Lab/
    drwxrwsrwx 3 gbell barc   4096 2012-05-04 16:17 Cheeseman_Lab/
    drwxrwsrwx 3 byuan barc   4096 2010-11-23 14:22 chip_seq/
    drwxrwsrwx 2 gbell barc   4096 2012-02-21 16:26 CMT/
    -rw-r--r-- 1 gbell barc 192568 2012-10-10 10:14 du.20121010a.txt

  Columns: Permissions, Owner, Group, Size (bytes), Time Stamp, File or directory

Permissions
    drwxrwxr-x
• First character is the type: directory (d), symbolic link (l)
• The next nine characters are three sets of rwx, for User, Group, and Others
    r  read
    w  write
    x  execute
• Use chmod to change permissions
    user (u), group (g), others (o), all (a)
    chmod u+x foo.pl    (user can execute)
    chmod g-w foo.pl    (group can't write)
• Permission denied error

    thiruvil@tak /nfs/BaRC_Public$ ls -l myFile.txt
    -rw-r--r-- 1 thiruvil barc 0 2012-10-10 13:32 myFile.txt
    thiruvil@tak /nfs/BaRC_Public$ chmod g+w myFile.txt
    thiruvil@tak /nfs/BaRC_Public$ ll myFile.txt
    -rw-rw-r-- 1 thiruvil barc 0 2012-10-10 13:32 myFile.txt

Navigating in Unix
• pwd  print working directory
    byuan@tak ~$ pwd
    /home/byuan
• cd  change directory
    cd fink_lab    # if you are in /lab
    cd to home directory:        cd ~
    cd to directory above:       cd ..
    cd to a specific directory:  cd /nfs/BaRC_Public
• No such file or directory error

Organizing Files and Directories
• Commands
    mkdir  make a directory
        mkdir my_foo
    rmdir  remove a directory (must be empty)
        rmdir my_foo
    mv     move or rename a file/directory
        mv myOldFile myNewFile
    cp     copy a file
        cp myOldFile myNewFile
    rm     remove or delete a file
        rm myFile

Unix Tips
• Use the up/down arrow keys to reuse previous commands
• Ctrl-c: stop a process that is running
• Tab-completion: complete commands/file names
• Unix is case-sensitive

Getting Files
• Getting files or directories
    Files
        wget http://www.broadinstitute.org/igv/projects/downloads/IGV_2.1.17.zip
    Directories from (outside) servers
        scp -r [email protected]:/broad/lab/works .

(Un)Compressing Files
• .gz file
    Compress:    gzip expression.txt        (replaces the file with expression.txt.gz)
    Uncompress:  gunzip expression.txt.gz
• .tar.gz file
    Compress:    tar -czvf myFiles.tar.gz myFiles
    Uncompress:  tar -xzvf myFiles.tar.gz
  Options:
    -c       create an archive (files to archive, archive from files)
    -x       extract an archive (archive to files, files from archive)
    -f FILE  name of archive
    -v       be verbose, list all files being archived/extracted
    -z       create/extract archive with gzip/gunzip
• View compressed files using zmore, zgrep

Editing a File
• Command-line editors
    pico
    nano
    emacs (emacs -nw)
    vi
• Graphical editors (Windows users need an X-windows emulator)
  Note: may not be part of standard installation
    nedit
    gedit
    xemacs
• Put an & at the end of the command line to run a graphical editor in the background, so that you can continue to use the terminal window, e.g.
    gedit myFile.txt &

Viewing a File
• Display page by page
    more myFile.txt
  Use: Enter to scroll one line, space for next page, and q to quit
• Display first 15 lines of a file
    head -n 15 myFile.txt
• Display last 15 lines of a file
    tail -n 15 myFile.txt
• Show all contents of a file
    cat myFile.txt
  Show hidden characters (^M or carriage return)
    cat -A myFile.txt
• Display number of lines in a file
    wc -l myFile.txt

Output Redirection and Piping
• Write output of a command to a file
    Write to output file
        sort myFile.txt > myFile_sorted.txt
    Overwrite output file (even when the shell's noclobber option is set)
        sort myFile.txt >| myFile_sorted.txt
    Append to output file
        sort myFile.txt >> myFile_sorted.txt
• Piping "|": use output of one command as input for another command
    sort myFile.txt | more

Parsing a File: cut
• Select columns of interest
    cut -f 9,12-15 myGeneValues.txt > col_9.12to15.txt
  Options:
    -f  output only these fields
    -d  field delimiter

Parsing a File: sort and uniq
• Sort on column(s)
    sort -k 3,3 myGeneExpression.txt | more
  Options:
    -n            numerical sort
    -r            reverse
    -k pos1,pos2  start a key at pos1, end it at pos2
• Get only unique entries
  Ensure the file is sorted before running uniq, because uniq only compares adjacent lines
    uniq mySortedGenes.txt > myUniqGenes.txt
  Options:
    -c  count entries
    -d  print only duplicated entries

Regular Expressions
• Pattern matching
• Easier to search
• Commonly used regular expressions
    Regular Expression   Matches
    .                    Any single character
    *                    Zero or more of the preceding character
    ^                    Beginning of a line
    $                    End of a line
  Example: list all txt files (in a file name pattern, * is a shell wildcard that matches any characters)
    ls *.txt

Searching Within a File
• grep (global regular expression print)
• Find words, or patterns, occurring in lines of a file
    grep TMEM geneList.txt
    TMEM131
    TMEM9B
    TMEM14C
    TMEM66
    TMEM49
  Options:
    -v  select non-matching lines
    -i  ignore case
    -n  print line number
  Example: get TMEM that does not end with 9
    grep TMEM geneList.txt | grep -v "9$" | more

BaRC Resources
• jura.wi.mit.edu

BaRC SOP
http://barcwiki.wi.mit.edu/wiki/SOPs

BaRC Scripts

Running Scripts on Unix
• Perl
    bed2gff.pl
• R
    run_rma_customCDF.R
• Python
    myScript.py
• Matlab
    matlab -nodesktop -nosplash myScript.m
• Java Archive (JAR)
    java -Xmx1000m -jar /usr/local/share/IGVTools/igv.jar

Running Programs/Tools on Unix
• bedtools
    bedtools intersect -a myGenes1.bed -b myGenes2.bed
  Other utilities: http://code.google.com/p/bedtools/wiki/Usage
• samtools
    samtools view myFile.bam
  Other utilities: http://samtools.sourceforge.net/samtools.shtml
• Fastx toolkit
    fastx_quality_stats -i mySeq.fastq -o fastxStats_mySeq
• FastQC
    fastqc mySeq.fastq
• BLAST
    blastp -task blastp -db myProtDB.fa -query myProt.fa -out out.txt

Commonly Used Data Locations at Whitehead
    Location            Description
    /nfs/genomes        Genome data: gff, gtf, fasta, bowtie indexed files, blat indexed file, etc. for several organisms
    /nfs/seq/Data       Sequence data, including blast databases, for several organisms
    /nfs/BaRC_datasets  Large (array/NGS) datasets: HBI, HBM 2.0

Scientific computing resources

LSF cluster jobs
https://tak.wi.mit.edu

Load Sharing Facility (LSF) Cluster
• More computing power
• Multiple jobs running at the same time

LSF Commands
• bsub to submit jobs
    bsub wc -l reads.fq
    bsub "sort foo.txt > sorted.txt"
  Options:
    -e  error file
    -o  standard out file
    -m  machine
    -u  email address
• bjobs to view your jobs
    bjobs
• bkill to kill a job
    bkill 237878

Further Reading
• BaRC: Unix Info
    http://iona.wi.mit.edu/bio/education/unix_intro.php
• LSF Cluster (incl. examples)
    http://iona.wi.mit.edu/bio/bioinfo/docs/LSF_help.php
• Whitehead IT Scientific Computing Tutorials
    http://wi-inside.wi.mit.edu/departments/it/services/scientificcomputing/scitutorials
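
Worked example: combining grep, sort, cut and uniq

The hands-on goal above (finding genes of interest, sorting expression values, and subsetting the data) can be sketched as a short pipeline. This is only an illustration under stated assumptions: the file name expr.txt, its three-column tab-delimited layout, and all values in it are made up here and are not the HBI data set.

```shell
#!/bin/sh
# Sketch only: file names and contents are hypothetical, not the HBI data.
set -eu
export LC_ALL=C            # byte-wise sorting and "." as the decimal point

workdir=$(mktemp -d)       # scratch directory, so no existing files are touched
cd "$workdir"

# Hypothetical tab-delimited table: gene, tissue, expression value
printf 'TMEM131\tliver\t12.5\n'  > expr.txt
printf 'ACTB\tliver\t250.0\n'   >> expr.txt
printf 'TMEM9B\tbrain\t3.2\n'   >> expr.txt
printf 'TMEM131\tbrain\t40.1\n' >> expr.txt
printf 'GAPDH\tbrain\t180.7\n'  >> expr.txt

# Count the lines in the file (tr strips the padding some wc versions add)
n_lines=$(wc -l < expr.txt | tr -d ' ')

# Genes of interest: keep lines starting with TMEM, then sort on column 3,
# numerically (-n) and largest first (-r)
grep '^TMEM' expr.txt | sort -n -r -k 3,3 > tmem_sorted.txt

# Subset: keep only the gene name and expression value (columns 1 and 3)
cut -f 1,3 tmem_sorted.txt > tmem_subset.txt

# Genes that occur more than once; uniq compares adjacent lines, so sort first
cut -f 1 expr.txt | sort | uniq -d > repeated_genes.txt

echo "lines: $n_lines"
cat tmem_subset.txt
```

Writing each step to its own file keeps the intermediate results available for inspection with more or head; the steps could equally be chained into a single pipeline with "|".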

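Worked example: compression and redirection

The compression and redirection commands from the slides can be tried safely in a scratch directory. The sketch below assumes made-up file names and contents. It uses gzip -c and gunzip -c, which write to standard output, so the original file is kept and the compressed copy can be read without unpacking it.

```shell
#!/bin/sh
# Sketch only: file names and contents are hypothetical.
set -eu
export LC_ALL=C

workdir=$(mktemp -d)       # scratch directory, so no existing files are touched
cd "$workdir"
printf 'geneB\t2\ngeneA\t1\n' > expression.txt

# Plain "gzip expression.txt" would replace the file with expression.txt.gz.
# With -c the compressed data goes to standard output and the original stays.
gzip -c expression.txt > expression_copy.txt.gz

# Read the compressed copy without uncompressing it on disk
first_gene=$(gunzip -c expression_copy.txt.gz | sort | head -n 1 | cut -f 1)

# ">" overwrites the output file, ">>" appends to it
sort expression.txt >  sorted.txt
sort expression.txt >> sorted.txt
n_sorted=$(wc -l < sorted.txt | tr -d ' ')

# Archive a directory with tar, delete it, then restore it from the archive
mkdir myFiles
cp expression.txt myFiles/
tar -czf myFiles.tar.gz myFiles
rm -r myFiles
tar -xzf myFiles.tar.gz

echo "first gene: $first_gene, lines in sorted.txt: $n_sorted"
```

After the last step, myFiles/expression.txt is restored from the archive and is identical to expression.txt.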