<<

10/19/2017

Outline

overview Introduction to Unix – Logging in to tak4 – Directory structure – Accessing Whitehead drives/finding your lab share • Basic commands • Exercises BaRC Hot Topics • BaRC and Whitehead resources Bioinformatics and Research Computing Whitehead Institute • LSF October 19th 2017 http://barc.wi.mit.edu/hot_topics/ 1 2

What is unix? and Why unix? Where can unix be used?

• Unix is a family of operating systems that are • Mac computers related to the original UNIX Come with unix • It is powerful, so large datasets can be analyzed • Windows computers need Cygwin or PuTTY: • Many repetitive analyses or tasks can be easily \\grindhouse\Software\Cygwin\Wibr‐Cygwin‐Installer‐1010.exe automated https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html • Some computer programs only run on the unix • Dedicated unix server operation system “tak4”, the Whitehead Scientific Linux • TAK4 (our unix server): lots of software and databases already installed or downloaded

3 4

1 10/19/2017

Connecting to tak for Windows Logging in to tak4 host: tak4.wi.mit.edu

• Requesting a tak4 account http://iona.wi.mit.edu/bio/software/unix/bioinfoaccount.php 3) Click “Open”

• Windows 1) Open putty 2.1) the address of the Host 2.2) Click on SSH‐>X11‐> Enable X11  PuTTY or Cygwin forwarding When you write the  Xming: setup X‐windows for graphical display password you won’t see any characters being typed. • Macs Access through Terminal Command Prompt 5 user@tak4 ~$ 6

Log in to tak4 for Mac Unix Directory Structure

root /

ssh –Y [email protected] home usr bin nfs lab . . .

jdoe BaRC_Public BaRC_training solexa_public solexa_lodish

page /home/jdoe /lab/page

See http://www.thegeekstuff.com/2010/09/linux‐file‐system‐structure/ for detailed info. nfs and lab are directories specific to Whitehead 7 8

2 10/19/2017

Accessing Shared Resources Whitehead Get ready for the exercises • Unix  /nfs/BaRC_Public  /nfs/BaRC_training • Use this link to copy the commands for the exercises  /lab/solexa_public http://barc.wi.mit.edu/education/hot_topics/hot_topics/unix_es • Windows (access using Start Menu  Search) sentials_2017/UnixEssentials_HandsOn.txt  \\wi‐files1\BaRC_Public  \\wi‐files1\BaRC_training  \\wi‐htdata\solexa_public • These folders contain today’s slides and the material for the • Macs (access using Go  Connect to Server…)  cifs://wi‐files1/BaRC_Public exercises  cifs://wi‐files1/BaRC_training \\wi‐files1\BaRC_Public\Hot_Topics\unixEssentials_2017  cifs://wi‐htdata/solexa_public cifs://wi‐files1/BaRC_Public/Hot_Topics/unixEssentials_2017

Where’s my lab’s share? • http://it.wi.mit.edu/systems/file‐storage/lab‐share‐paths

9 10

Unix Tips Basic commands listing and organizing files/folders • Use to reuse previous commands • list the contents of a directory: [only show names] ls –l [long listing: show other information too] ls –h [human readable] • Ctrl‐: stop a process that is running • a directory: • change directory: dirname • The directories . and .. • Tab‐completion: The current directory (.) cd . The parent directory (..) – Complete commands/file names cd .. (go to the parent directory) ls .. (list to the parent directory) • print working directory: • Go to your home directory: cd or cd ~ • Unix is case‐sensitive • Go to the previous directory: cd -

11 12

3 10/19/2017

Paths Unix Directory Structure

root • “Absolute path” example / /nfs/BaRC_training/userName/myWork /nfs/BaRC_training/userName/myFig

home usr bin cd /nfs/BaRC_training/userName/myWork nfs lab . . . cd /nfs/BaRC_training/userName/myFig • “Relative path” example jdoe BaRC_Public BaRC_training solexa_public solexa_lodish ../dirName if I am in “/nfs/BaRC_training/userName/myWork” I can do: page userName cd ../userName/myFig myWork myFig

cd /nfs/BaRC_training/userName/myFig 13 cd ../userName/myFig 14

Basic commands Permissions copying, moving files, getting help drwxrwxr-x • copy: r  read w  write : directory (d) User Group Others  • move: symbolic link(l) x execute • remove: • Use to change permissions  user(u), group(g), others(o), all(a) • remove directory:  chmod u+x foo.pl (user can execute)  chmod g-w foo.pl (group can’t write) • get help on a command: man {command} thiruvil@tak /nfs/BaRC_Public$ ls -l myFile.txt -rw-r--r-- 1 thiruvil barc 0 2012-10-10 13:32 myFile.txt thiruvil@tak /nfs/BaRC_Public$ chmod g+w myFile.txt thiruvil@tak /nfs/BaRC_Public$ ll myFile.txt -rw-rw-r-- 1 thiruvil barc 0 2012-10-10 13:32 myFile.txt Do Exercises 1, 2 and 3

15 16

4 10/19/2017

Displaying the contents of a file on the Editing a File screen • filename: Dump a file to the screen • Command‐line editors  pico • more filename: Progressively dump a file to the screen:  nano ENTER = one line down SPACEBAR = page down q=quit  (emacs –nw) • less filename: like more but with extended capabilities  • Graphical editors (Windows users need an X‐windows emulator) • filename: Show the first few lines of a file Note: may not be part of standard installation • head ‐n filename: Show the first n lines of a file  nedit  gedit • filename: Show the last few lines of a file  xemacs • tail ‐n filename: Show the last n lines of a file • Put an & at the end of command line to run it in the background when using a graphical editor so that you can continue to use the • clear: clear screen terminal window eg. gedit myFile.txt&

17 18

Parsing a File: Output Redirection and Piping Word count: • Write output of a command to file • Select columns of interest  Write to output file cut –f 9,12-15 myGeneValues.txt > col_9.12to15.txt • myFile.txt > myFile_sorted.txt Options:  Over‐write output file (if it exists) ‐f output only these fields • sort myFile.txt >| myFile_sorted.txt ‐d field delimiter  Append to output file • sort myFile.txt >> myFile_sorted.txt • Count number of lines/words/characters in file wc myFile • Piping “|”: use output of one command as wc ‐w (count words only) wc ‐l (count lines only) input for another command  sort myFile.txt | more Do Exercises 4, 5, 6 and 7

19 20

5 10/19/2017

Sorting and removing redundancy Searching the contents of a file sort and • Sort on column(s) • (global regular expression print) sort -k 3,3 myGeneExpression.txt | more words, or patterns, occurring in lines of a file Options: grep TMEM geneList.txt TMEM131 ‐n, ‐‐numeric‐sort compare according to string numerical value TMEM9B ‐g, ‐‐general‐numeric‐sort compare according to general numerical value TMEM14C ‐r reverse TMEM66 ‐k pos1,pos2 start sorting at pos1, end it at pos2 TMEM49

• Get only unique entries Options: uniq mySortedGenes.txt > myUniqGenes.txt ‐v select non‐matching lines Options: ‐iignore case ‐n print line number ‐c count entries Example: get TMEM but exclude TMEM14C ‐d duplicate counts grep TMEM geneList.txt | grep -v "TMEM14C" | more make sure that the file is sorted before running uniq Do Exercise 10 Do Exercises 8 and 9 21 22

Getting Files (Un)Compressing Files • .gz file • Getting files or directories Compress: gzip file.txt (file.txt.gz will be created) Uncompress: gunzip file.txt.gz (file.txt will be created) Files wget http://data.broadinstitute.org/igv/projects/downloads/2.4/IGVSource_2.4.2.zip • .tar.gz file Compress: tar –czvf myFiles.tar.gz myFiles Directories from (outside) servers Uncompress: tar –xzvf myFiles.tar.gz scp –r origin destination Options scp -r userName@serverToCopyFrom:/pathToFolderTocopyFrom . ‐c create an archive ‐x extract an archive scp -r [email protected]:/broad/lab/works . ‐f FILE name of archive ‐v be verbose, list all files being archived/extracted ‐z create/extract archive with gzip/gunzip • View compressed files using: zmore,zgrep

23 24

6 10/19/2017

BaRC SOPs BaRC Resources • barc.wi.mit.edu http://barcwiki.wi.mit.edu/wiki/SOPs

25 26

Unix Commands and BaRC Scripts Running Scripts on Unix • Perl bed2gff.pl or ./bed2gff.pl or path/bed2gff.pl • R run_rma_customCDF.R Perl UNIX R • Python myScript.py or ./myScript.py • Matlab matlab -nodesktop -nosplash myScript.m • Java Archive (JAR) java -Xmx1000m -jar /usr/local/share/IGVTools/igv.jar 27 28

7 10/19/2017

BaRC code Running Programs/Tools on Unix • bedtools Script type Location bedtools intersect -a myGenes1.bed –b myGenes2.bed Perl /nfs/BaRC_Public/BaRC_code/Perl Other utilities: http://code.google.com/p/bedtools/wiki/Usage R /nfs/BaRC_Public/BaRC_code/R • samtools Python /nfs/BaRC_Public/BaRC_code/Python samtools view myFile.bam Other utilities: http://samtools.sourceforge.net/samtools.shtml • Fastx toolkit fastx_quality_stats -i mySeq.fastq -o fastxStats_mySeq • FastQC fastqc mySeq.fastq • BLAST blastp –task blastp -db myProtDB.fa –q myProt.fa –out out.txt

29 30

Commonly Used Data Locations at Whitehead LSF Commands • bsub to submit jobs Location Description bsub wc –l reads.fq bsub “sort foo.txt > sorted.txt” /nfs/genomes Genome data: gff, gtf, fasta, bowtie Options: indexed files, blat indexed file, etc. ‐e error_file for several organisms ‐o standard_out_file ‐m machine (send the job to that machine) /nfs/seq/Data Sequence data, including blast ‐n number (use that many processors in the cluster) databases, for several organisms bsub ‐n 4 ‐R "span[hosts=1]“(use 4 processors all in the same machine in the cluster) /nfs/BaRC_datasets Large (array/NGS) datasets: HBI, • bjobs to view your jobs HBM 2.0 bjobs • bkill to a job bkill 237878

31 32

8 10/19/2017

LSF cluster jobs Further Reading http://tak4.wi.mit.edu/ • BaRC: Unix Info http://iona.wi.mit.edu/bio/education/unix_intro.php • LSF Cluster (incl. examples) http://iona.wi.mit.edu/bio/bioinfo/docs/LSF_help.php • Whitehead IT Computing Tutorials http://it.wi.mit.edu/

33 34

Upcoming Hot Topics Regular Expressions

• Unix advanced (Oct 25th) • Pattern matching • An Introduction to R ‐ November • Easier to search • An Introduction to R Graphics ‐ November • Commonly used regular expressions

Regular Expression Matches .All characters *Zero or more; wildcard ^ Beginning of a line $Endof a line example: grep chr. fileName

35 36

9