RNA-Seq: Revelation of the Messengers

Supplementary Material
Hands-On Tutorial
RNA-Seq: revelation of the messengers

Marcel C. Van Verk†,1, Richard Hickman†,1, Corné M.J. Pieterse1,2, Saskia C.M. Van Wees1
† Equal contributors
Corresponding author: Van Wees, S.C.M. ([email protected]).
Bioinformatics questions: Van Verk, M.C. ([email protected]) or Hickman, R. ([email protected]).
1 Plant-Microbe Interactions, Department of Biology, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands
2 Centre for BioSystems Genomics, PO Box 98, 6700 AB Wageningen, The Netherlands

INDEX
1. RNA-Seq tools websites
2. Installation notes
   2.1. CASAVA 1.8.2
   2.2. Samtools
   2.3. MISO
3. Quality Control: FastQC
4. Creating indexes
   4.1. Genome Index for Bowtie / TopHat alignment
   4.2. Transcriptome Index for Bowtie alignment
   4.3. Genome Index for BWA alignment
   4.4. Transcriptome Index for BWA alignment
5. Read Alignment
   5.1. Preliminaries
   5.2. Genome read alignment with Bowtie
   5.3. Transcriptome read alignment with Bowtie
   5.4. Genome read alignment with BWA
   5.5. Transcriptome read alignment with BWA
   5.6. Genome read alignment with TopHat
6. Summarization
   6.1. Summarizing counts from Bowtie transcriptome alignment
   6.2. Summarizing counts from BWA transcriptome alignment
   6.3. Summarizing counts from TopHat genome alignment using HTSeq
   6.4. Bowtie/BWA genome count summarization
7. Differential Expression
   7.1. Differential expression of summarized TopHat count data using DESeq
   7.2. Differential expression of summarized Bowtie/BWA genome-aligned count data using DESeq
8. Isoform Analysis
   8.1. Isoform analysis using MISO
   8.2. Plotting MISO data using Sashimi
   8.3. Differential expression analysis with Cufflinks
9. Visualization using IGV
10. Interoperability
   10.1. Convert SAM to BAM
   10.2. Convert BAM to SAM
   10.3. Sort a BAM file
   10.4. Index a BAM file
   10.5. Sort a SAM file
   10.6. Conversion of GTF to GFF
11. References

IMPORTANT NOTE: The sample data set provided for use with this tutorial is a small subset of a real RNA-Seq experiment and only provides read data for a few selected genes. Therefore, the features selected in the analysis of this data set differ from those that would be selected in a complete, realistic RNA-Seq experiment.
The analysis steps provided here, however, are representative of a complete RNA-Seq analysis, and none of them need to be changed when the analysis is applied to a 'real' RNA-Seq data set instead of this sample data.

1. RNA-Seq tools websites
The table below provides a selection of the most frequently used tools for RNA-Seq analysis, with hyperlinks to each tool's main website, the download location of the program, and the program manual.

Tool               Link       URL
FastQC [S1]        Main page  http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
                   Program    http://www.bioinformatics.babraham.ac.uk/projects/download.html#fastqc
                   Video      http://www.youtube.com/watch?v=bz93ReOv87Y
                   Manual     http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/
Bowtie [S2]        Main page  http://bowtie-bio.sourceforge.net/index.shtml
                   Program    http://sourceforge.net/projects/bowtie-bio/files/bowtie/ (or main page → latest releases)
                   Manual     http://bowtie-bio.sourceforge.net/manual.shtml
BWA [S3]           Main page  http://bio-bwa.sourceforge.net/
                   Program    http://sourceforge.net/projects/bio-bwa/files/ (or main page → SF download page)
                   Manual     http://bio-bwa.sourceforge.net/bwa.shtml
TopHat [S5,S6]     Main page  http://tophat.cbcb.umd.edu/
                   Program    http://tophat.cbcb.umd.edu/downloads/ (or main page → releases)
                   Manual     http://tophat.cbcb.umd.edu/manual.html
DESeq [S7]         Main page  http://bioconductor.org/packages/release/bioc/html/DESeq.html
                   Program    http://cran.us.r-project.org/ (R)
                   R package  source("http://bioconductor.org/biocLite.R"); biocLite("DESeq")
                   Manual     http://bioconductor.org/packages/release/bioc/vignettes/DESeq/inst/doc/DESeq.pdf
HTSeq [S8]         Main page  http://www-huber.embl.de/users/anders/HTSeq/
                   Program    http://www-huber.embl.de/users/anders/HTSeq/doc/install.html#install
                   Manual     http://www-huber.embl.de/users/anders/HTSeq/doc/count.html
MISO [S9]          Main page  http://genes.mit.edu/burgelab/miso/
                   Program    http://genes.mit.edu/burgelab/miso/software.html
                   Manual     http://genes.mit.edu/burgelab/miso/docs/
                              http://genes.mit.edu/burgelab/miso/docs/sashimi.html
Cufflinks [S5,S6]  Main page  http://cufflinks.cbcb.umd.edu/tutorial.html
                   Program    http://cufflinks.cbcb.umd.edu/downloads/ (or main page → releases)
                   Manual     http://cufflinks.cbcb.umd.edu/manual.html
Samtools [S10]     Main page  http://samtools.sourceforge.net/
                   Program    http://sourceforge.net/projects/samtools/files/samtools/
                   Manual     http://samtools.sourceforge.net/samtools.shtml
IGV [S11]          Main page  http://www.broadinstitute.org/igv/
                   Program    http://www.broadinstitute.org/software/igv/download
                   Manual     http://www.broadinstitute.org/software/igv/UserGuide

For a full overview of the next-generation sequencing software available for each step described here, and of more advanced analysis packages, see: http://seqanswers.com/wiki/Software/list.

2. Installation notes
The manuals of most tools describe in detail how to install the software. Still, we encountered issues installing some of the tools on our Mac OS X Lion system. The steps we took to solve these problems for specific tools are described below.

2.1 CASAVA v1.8.2
CASAVA is a pipeline supplied by Illumina to convert base calls into (demultiplexed) FastQ files. If the sequencing is outsourced, this step is usually already performed by the sequencing provider.
On Mac OS X:
- Download the latest version of cmake from: http://www.cmake.org/cmake/resources/software.html
- Install cmake using the downloaded .dmg package.
- At the command line, type:
  $ which cmake
  This gives the exact installation location of cmake.
- Run the CASAVA configure script, replacing /usr/bin/cmake (the --with-cmake value) with the location reported by which:
  $ /CASAVA_v1.8.2/src/configure --prefix=/Data/CASAVA_v1.8.2 --with-cmake=/usr/bin/cmake
  The --prefix value (here /Data/CASAVA_v1.8.2) is the directory where CASAVA will be installed on the system.
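The cmake lookup and the CASAVA configure call can be combined so that the cmake location is filled in automatically. The sketch below is a minimal illustration under stated assumptions, not part of the original protocol: the function name casava_configure_cmd is hypothetical, it uses the POSIX command -v (which plays the same role as which here), and it only prints the configure command (a dry run) so the result can be checked before it is executed. The source directory and prefix are the example paths from the text.

```shell
#!/bin/sh
# Hypothetical helper (not part of CASAVA): print the CASAVA configure command
# with the cmake location looked up on the PATH instead of typed by hand.
# Usage: casava_configure_cmd <casava_src_dir> <install_prefix>
casava_configure_cmd() {
    src_dir=$1   # e.g. /CASAVA_v1.8.2/src
    prefix=$2    # installation directory, e.g. /Data/CASAVA_v1.8.2

    # command -v is the POSIX way to locate an executable on the PATH (like 'which')
    cmake_path=$(command -v cmake) || {
        echo "cmake not found on PATH; install cmake first" >&2
        return 1
    }

    # Print (do not run) the configure command so it can be checked first
    echo "$src_dir/configure --prefix=$prefix --with-cmake=$cmake_path"
}
```

For example, casava_configure_cmd /CASAVA_v1.8.2/src /Data/CASAVA_v1.8.2 prints the configure command; on a system where cmake is installed at /usr/bin/cmake it prints exactly the command shown in the text. Printing rather than executing keeps the step reversible: if the path looks wrong, nothing has been configured yet.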
If, on other systems, any of the following libraries is missing during installation, install it as follows (the apt-get commands apply to Debian/Ubuntu-based systems):
- Gzip
  $ sudo apt-get install build-essential
- Bzip
  Go to bzip.org/downloads, download the installation package and, from the unpacked package directory, run:
  $ sudo make
  $ sudo make install
- Zlib
  $ sudo apt-get install zlib1g-dev
- Ncurses
  $ sudo apt-get install libncurses5-dev
- Libxml
  $ sudo apt-get install libxml2-dev
- Simpleperl
  $ sudo apt-get install libxml-simple-perl

2.2 SAMTOOLS
The installation of SAMtools is explained well in its manual. However, on a 64-bit system, pay attention to the following point:
- Edit the Makefile and un-comment the -m64 flag on the line containing CFLAGS. This ensures that the program is compiled as a 64-bit executable.

2.3 MISO
The installation of MISO is described well in its manual. However, MISO depends on several other Python packages. The installation procedure for MISO and its dependencies is outlined below.
- Within the downloaded MISO directory, type:
  $ sudo python setup.py install
- To check which required modules are missing, run:
  $ python module_availability
Installation of modules:
SciPy:
- On Unix systems, download SciPy from http://sourceforge.net/projects/scipy/files/ and install it.
- On 64-bit Mac OS X systems, download the installation script named install_superpack.sh from http://www.scipy.org/Download, using the link "SciPy Superpack for Python 2.7 (64-bit) installation script".
- Install SciPy by typing:
  $ bash install_superpack.sh
Simplejson:
  $ sudo pip install simplejson
Pysam:
  $ sudo pip install pysam
Matplotlib:
- On Unix systems, type:
  $ sudo pip install matplotlib
- On Mac OS X systems, type:
  $ sudo pip install git+https://github.com/matplotlib/matplotlib.git#egg=matplotlib-dev

3. Quality Control: FastQC
FastQC performs a quality control check on the output FastQ files and indicates potential
