RNA Sequencing Analysis with Tophat Booklet
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
A Comprehensive Workflow for Variant Calling Pipeline Comparison and Analysis Using R Programming
www.ijcrt.org © 2020 IJCRT | Volume 8, Issue 8 August 2020 | ISSN: 2320-2882 A COMPREHENSIVE WORKFLOW FOR VARIANT CALLING PIPELINE COMPARISON AND ANALYSIS USING R PROGRAMMING 1Mansi Ujjainwal, 2Preeti Chaudhary 1MSc Bioinformatics, 2Mtech Bioinformatics, 1Amity Institute of Biotechnology 1Amity University, Noida, India Abstract: The aim of the article is to provide variant calling workflow and analysis protocol for comparing results of the two using two variant calling platforms. Variant calling pipelines used here are predominantly used for calling variants in human whole exome data and whole genome data. The result of a variant calling pipeline is a set of variants( SNPS, insertions, deletions etc) present in the sequencing data. Each pipeline is capable of calling its certain intersecting and certain unique variants. The intersecting and unique variants can further be distinguished on the basis of their reference SNP ID and grouped on the basis of its annotation. The number of variants called can be humongous depending upon the size and complexity of the data. R programming packages and ubuntu command shell can be used to differentiate and analyse the variants called by each type of pipeline. Index Terms – Whole Exome Sequencing, Variant Calling, Sequencing data, R programming I. INTRODUCTION The Human Genome Project started in 1990, makes up the single most significant project in the field of biomedical sciences and biology. The project was set out to change how we see biology and medicine. The project was set out to sequence complete genome of Homo sapiens as well as several microorganisms including Escherichia coli, Saccharomyces cerevisiae, and metazoans such as Caenorhabtidis elegans. -
Property Graph Vs RDF Triple Store: a Comparison on Glycan Substructure Search
RESEARCH ARTICLE Property Graph vs RDF Triple Store: A Comparison on Glycan Substructure Search Davide Alocci1,2, Julien Mariethoz1, Oliver Horlacher1,2, Jerven T. Bolleman3, Matthew P. Campbell4, Frederique Lisacek1,2* 1 Proteome Informatics Group, SIB Swiss Institute of Bioinformatics, Geneva, 1211, Switzerland, 2 Computer Science Department, University of Geneva, Geneva, 1227, Switzerland, 3 Swiss-Prot Group, SIB Swiss Institute of Bioinformatics, Geneva, 1211, Switzerland, 4 Department of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, Australia * [email protected] Abstract Resource description framework (RDF) and Property Graph databases are emerging tech- nologies that are used for storing graph-structured data. We compare these technologies OPEN ACCESS through a molecular biology use case: glycan substructure search. Glycans are branched Citation: Alocci D, Mariethoz J, Horlacher O, tree-like molecules composed of building blocks linked together by chemical bonds. The Bolleman JT, Campbell MP, Lisacek F (2015) molecular structure of a glycan can be encoded into a direct acyclic graph where each node Property Graph vs RDF Triple Store: A Comparison on Glycan Substructure Search. PLoS ONE 10(12): represents a building block and each edge serves as a chemical linkage between two build- e0144578. doi:10.1371/journal.pone.0144578 ing blocks. In this context, Graph databases are possible software solutions for storing gly- Editor: Manuela Helmer-Citterich, University of can structures and Graph query languages, such as SPARQL and Cypher, can be used to Rome Tor Vergata, ITALY perform a substructure search. Glycan substructure searching is an important feature for Received: July 16, 2015 querying structure and experimental glycan databases and retrieving biologically meaning- ful data. -
Genomic Alignment (Mapping) and SNP / Polymorphism Calling
GenomicGenomic alignmentalignment (mapping)(mapping) andand SNPSNP // polymorphismpolymorphism callingcalling Jérôme Mariette & Christophe Klopp http://bioinfo.genotoul.fr/ Bioinfo Genotoul platform – Since 2008 ● 1 Roche 454 ● 1 MiSeq ● 2 HiSeq – Providing ● Data processing for quality control ● Secure data access to end users http://bioinfo.genotoul.fr/ http://ng6.toulouse.inra.fr/ 2 Bioinfo Genotoul : Services – High speed computing facility access – Application and web-server hosting – Training – Support – Project partnership 3 Genetic variation http://en.wikipedia.org/wiki/Genetic_variation Genetic variation, variations in alleles of genes, occurs both within and in populations. Genetic variation is important because it provides the “raw material” for natural selection. http://studentreader.com/genotypes-phenotypes/ 4 Types of variations ● SNP : Single nucleotide polymorphism ● CNV : copy number variation ● Chromosomal rearrangement ● Chromosomal duplication http://en.wikipedia.org/wiki/Copy-number_variation http://en.wikipedia.org/wiki/Human_genetic_variation 5 The variation transmission ● Mutation : In molecular biology and genetics, mutations are changes in a genomic sequence: the DNA sequence of a cell's genome or the DNA or RNA sequence of a virus (http://en.wikipedia.org/wiki/Mutation). ● Mutations are transmitted if they are not lethal. ● Mutations can impact the phenotype. 6 Genetic markers and genotyping ● A set of SNPs is selected along the genome. ● The phenotypes are collected for individuals. ● The SNPs are genotyped -
Genetics 211 - 2018 Lecture 3
Genetics 211 - 2018 Lecture 3 High Throughput Sequencing Pt II Gavin Sherlock [email protected] January 23rd 2018 Long “Synthetic Reads” aka Moleculo Genomic DNA Fragment Size Select (10kb) Polish, ligate amplification adaptors ~10 kb DNA Dilute to 500 molecules per well Amplify, fragment, add sequencing adaptors Pool Sequence Separate, based on well barcode Remove barcodes, assemble 10kb fragments Assemble genome from 10kb fragments Synthetic Read Characteristics 10x Genomics • Similar in concept to CPT-Seq from last week’s paper • Idea is to uniquely barcode reads that derive from a long molecule - ~50-100kb • 10x Chromium system automates much of the process for you ~10 HMW gDNA molecules per GEM 10x Barcoded Beads HMW gDNA Oil Benefits of 10x • Correct placement in difficult to align regions: Paralog A Paralog B Benefits of 10x • Correct placement in difficult to align regions: Paralog A Paralog B Benefits of 10x • Correct placement in difficult to align regions: Paralog A Paralog B Benefits of 10x • Correct placement in difficult to align regions: Paralog A Paralog B Benefits of 10x • Correct placement in difficult to align regions: Paralog A Paralog B Benefits of 10x • Haplotype phasing: Benefits of 10x • Haplotype phasing: Benefits of 10x • Haplotype phasing: Benefits of 10x • Haplotype phasing: Using Hi-C data to aid assemblies • Hi-C is a proximity ligation method, aimed at reconstructing the 3 dimensional structure of a genome • Originally developed with the idea of looking at how the genome of an organism for which a good reference exists is physically organized • But, probability of intrachromosomal contacts is much higher than that of interchromosomal contacts. -
Choudhury 22.11.2018 Suppl
The Set1 complex is dimeric and acts with Jhd2 demethylation to convey symmetrical H3K4 trimethylation Rupam Choudhury, Sukdeep Singh, Senthil Arumugam, Assen Roguev and A. Francis Stewart Supplementary Material 9 Supplementary Figures 1 Supplementary Table – an Excel file not included in this document Extended Materials and Methods Supplementary Figure 1. Sdc1 mediated dimerization of yeast Set1 complex. (A) Multiple sequence alignment of the dimerizaton region of the protein kinase A regulatory subunit II with Dpy30 homologues and other proteins (from Roguev et al, 2001). Arrows indicate the three amino acids that were changed to alanine in sdc1*. (B) Expression levels of TAP-Set1 protein in wild type and sdc1 mutant strains evaluated by Westerns using whole cell extracts and beta-actin (B-actin) as loading control. (C) Spp1 associates with the monomeric Set1 complex. Spp1 was tagged with a Myc epitope in wild type and sdc1* strains that carried TAP-Set1. TAP-Set1 was immunoprecipitated from whole cell extracts and evaluated by Western with an anti-myc antibody. Input, whole cell extract from the TAP-set1; spp1-myc strain. W303, whole cell extract from the W303 parental strain. a Focal Volume Fluorescence signal PCH Molecular brightness comparison Intensity 25 20 Time 15 10 Frequency Brightness (au) 5 0 Intensity Intensity (counts /ms) GFP1 GFP-GFP2 Time c b 1x yEGFP 2x yEGFP 3x yEGFP 1 X yEGFP ADH1 yEGFP CYC1 2 X yEGFP ADH1 yEGFP yEGFP CYC1 3 X yEGFP ADH1 yEGFP yEGFP yEGFP CYC1 d 121-165 sdc1 wt sdc1 sdc1 Swd1-yEGFP B-Actin Supplementary Figure 2. Molecular analysis of Sdc1 mediated dimerization of Set1C by Fluorescence Correlation Spectroscopy (FCS). -
RNA-‐Seq: Revelation of the Messengers
Supplementary Material Hands-On Tutorial RNA-Seq: revelation of the messengers Marcel C. Van Verk†,1, Richard Hickman†,1, Corné M.J. Pieterse1,2, Saskia C.M. Van Wees1 † Equal contributors Corresponding author: Van Wees, S.C.M. ([email protected]). Bioinformatics questions: Van Verk, M.C. ([email protected]) or Hickman, R. ([email protected]). 1 Plant-Microbe Interactions, Department of Biology, Utrecht University, Padualaan 8, 3584 CH Utrecht, The Netherlands 2 Centre for BioSystems Genomics, PO Box 98, 6700 AB Wageningen, The Netherlands INDEX 1. RNA-Seq tools websites……………………………………………………………………………………………………………………………. 2 2. Installation notes……………………………………………………………………………………………………………………………………… 3 2.1. CASAVA 1.8.2………………………………………………………………………………………………………………………………….. 3 2.2. Samtools…………………………………………………………………………………………………………………………………………. 3 2.3. MiSO……………………………………………………………………………………………………………………………………………….. 4 3. Quality Control: FastQC……………………………………………………………………………………………………………………………. 5 4. Creating indeXes………………………………………………………………………………………………………………………………………. 5 4.1. Genome IndeX for Bowtie / TopHat alignment……………………………………………………………………………….. 5 4.2. Transcriptome IndeX for Bowtie alignment……………………………………………………………………………………… 6 4.3. Genome IndeX for BWA alignment………………………………………………………………………………………………….. 7 4.4. Transcriptome IndeX for BWA alignment………………………………………………………………………………………….7 5. Read Alignment………………………………………………………………………………………………………………………………………… 8 5.1. Preliminaries…………………………………………………………………………………………………………………………………… 8 5.2. Genome read alignment with Bowtie……………………………………………………………………………………………… -
Introduction to Bioinformatics (Elective) – SBB1609
SCHOOL OF BIO AND CHEMICAL ENGINEERING DEPARTMENT OF BIOTECHNOLOGY Unit 1 – Introduction to Bioinformatics (Elective) – SBB1609 1 I HISTORY OF BIOINFORMATICS Bioinformatics is an interdisciplinary field that develops methods and software tools for understanding biologicaldata. As an interdisciplinary field of science, bioinformatics combines computer science, statistics, mathematics, and engineering to analyze and interpret biological data. Bioinformatics has been used for in silico analyses of biological queries using mathematical and statistical techniques. Bioinformatics derives knowledge from computer analysis of biological data. These can consist of the information stored in the genetic code, but also experimental results from various sources, patient statistics, and scientific literature. Research in bioinformatics includes method development for storage, retrieval, and analysis of the data. Bioinformatics is a rapidly developing branch of biology and is highly interdisciplinary, using techniques and concepts from informatics, statistics, mathematics, chemistry, biochemistry, physics, and linguistics. It has many practical applications in different areas of biology and medicine. Bioinformatics: Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data. Computational Biology: The development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems. "Classical" bioinformatics: "The mathematical, statistical and computing methods that aim to solve biological problems using DNA and amino acid sequences and related information.” The National Center for Biotechnology Information (NCBI 2001) defines bioinformatics as: "Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline. -
Supplementary Materials
Supplementary Materials Functional linkage of gene fusions to cancer cell fitness assessed by pharmacological and CRISPR/Cas9 screening 1,† 1,† 2,3 1,3 Gabriele Picco , Elisabeth D Chen , Luz Garcia Alonso , Fiona M Behan , Emanuel 1 1 1 1 1 Gonçalves , Graham Bignell , Angela Matchan , Beiyuan Fu , Ruby Banerjee , Elizabeth 1 1 4 1,7 5,3 Anderson , Adam Butler , Cyril H Benes , Ultan McDermott , David Dow , Francesco 2,3 5,3 1 1 6,2,3 Iorio E uan Stronach , Fengtang Yang , Kosuke Yusa , Julio Saez-Rodriguez , Mathew 1,3,* J Garnett 1 Wellcome Sanger Institute, Wellcome Genome Campus, Cambridge CB10 1SA, UK. 2 European Molecular Biology Laboratory – European Bioinformatics Institute, Wellcome Genome Campus, Cambridge CB10 1SD, UK. 3 Open Targets, Wellcome Genome Campus, Cambridge CB10 1SA, UK. 4 Massachusetts General Hospital, 55 Fruit Street, Boston, MA 02114, USA. 5 GSK, Target Sciences, Gunnels Wood Road, Stevenage, SG1 2NY, UK and Collegeville, 25 Pennsylvania, 19426-0989, USA. 6 Heidelberg University, Faculty of Medicine, Institute for Computational Biomedicine, Bioquant, Heidelberg 69120, Germany. 7 AstraZeneca, CRUK Cambridge Institute, Cambridge CB2 0RE * Correspondence to: [email protected] † Equally contributing authors Supplementary Tables Supplementary Table 1: Annotation of 1,034 human cancer cell lines used in our study and the source of RNA-seq data. All cell lines are part of the GDSC cancer cell line project (see COSMIC IDs). Supplementary Table 2: List and annotation of 10,514 fusion transcripts found in 1,011 cell lines. Supplementary Table 3: Significant results from differential gene expression for recurrent fusions. Supplementary Table 4: Annotation of gene fusions for aberrant expression of the 3 prime end gene. -
Meeting Booklet
LS2 Annual Meeting 2019 “Cell Biology from Tissue to Nucleus” Meeting Booklet 14-15 February, University of Zurich, Campus Irchel LS 2 - Life Sciences Switzerland Design by Dagmar Bocakova - [email protected] 2019 WELCOME ADDRESS Dear colleagues and friends It is with great pleasure that we invite you to the LS² Annual Meeting 2019, held on the 14th and 15th of February, 2019 at the Irchel Campus of the University of Zurich. This is a special year, as it is the 50th anniversary of USGEB/LS2, which we will celebrate with a Jubilee Apéro on Thursday 14th. The LS² Annual Meeting brings together scientists from all nations and backgrounds to discuss a variety of Life Science subjects. The meeting this year focusses on the cell in health and disease, with plenary talks on different aspects of cell biology and a symposium on live cell imaging approaches. You will be able to hear the latest, most exciting findings in several fields, from Molecular and Cellular Biosciences, Proteomics, Chemical Biology, Physiology, Pharmacology and more, presented by around 30 invited speakers and 45 speakers selected from abstracts in one of the seven scientific symposia and five plenary lectures. Two novelties highlight the meeting this year: The "PIs of Tomorrow" session, in which selected postdocs will present their research to a jury of professors, will be for the first time a plenary session. In addition, poster presenters will be selected to give flash talks in the symposia with the spirit to further promote young scientists. Join us for the poster session with more than 130 posters, combined with a large industry exhibition and the Jubilee Apéro. -
Software List for Biology, Bioinformatics and Biostatistics CCT
Software List for biology, bioinformatics and biostatistics v CCT - Delta Software Version Application short read assembler and it works on both small and large (mammalian size) ALLPATHS-LG 52488 genomes provides a fast, flexible C++ API & toolkit for reading, writing, and manipulating BAMtools 2.4.0 BAM files a high level of alignment fidelity and is comparable to other mainstream Barracuda 0.7.107b alignment programs allows one to intersect, merge, count, complement, and shuffle genomic bedtools 2.25.0 intervals from multiple files Bfast 0.7.0a universal DNA sequence aligner tool analysis and comprehension of high-throughput genomic data using the R Bioconductor 3.2 statistical programming BioPython 1.66 tools for biological computation written in Python a fast approach to detecting gene-gene interactions in genome-wide case- Boost 1.54.0 control studies short read aligner geared toward quickly aligning large sets of short DNA Bowtie 1.1.2 sequences to large genomes Bowtie2 2.2.6 Bowtie + fully supports gapped alignment with affine gap penalties BWA 0.7.12 mapping low-divergent sequences against a large reference genome ClustalW 2.1 multiple sequence alignment program to align DNA and protein sequences assembles transcripts, estimates their abundances for differential expression Cufflinks 2.2.1 and regulation in RNA-Seq samples EBSEQ (R) 1.10.0 identifying genes and isoforms differentially expressed EMBOSS 6.5.7 a comprehensive set of sequence analysis programs FASTA 36.3.8b a DNA and protein sequence alignment software package FastQC -
Rnaseq2017 Course Salerno, September 27-29, 2017
RNASeq2017 Course Salerno, September 27-29, 2017 RNA-seq Hands on Exercise Fabrizio Ferrè, University of Bologna Alma Mater ([email protected]) Hands-on tutorial based on the EBI teaching materials from Myrto Kostadima and Remco Loos at https://www.ebi.ac.uk/training/online/ Index 1. Introduction.......................................2 1.1 RNA-Seq analysis tools.........................2 2. Your data..........................................3 3. Quality control....................................6 4. Trimming...........................................8 5. The tuxedo pipeline...............................12 5.1 Genome indexing for bowtie....................12 5.2 Insert size and mean inner distance...........13 5.3 Read mapping using tophat.....................16 5.4 Transcriptome reconstruction and expression using cufflinks...................................19 5.5 Differential Expression using cuffdiff........22 5.6 Transcriptome reconstruction without genome annotation........................................24 6. De novo assembly with Velvet......................27 7. Variant calling...................................29 8. Duplicate removal.................................31 9. Using STAR for read mapping.......................32 9.1 Genome indexing for STAR......................32 9.2 Read mapping using STAR.......................32 9.3 Using cuffdiff on STAR mapped reads...........34 10. Concluding remarks..............................35 11. Useful references...............................36 1 1. Introduction The goal of this -
Integrative Analysis of Transcriptomic Data for Identification of T-Cell
www.nature.com/scientificreports OPEN Integrative analysis of transcriptomic data for identifcation of T‑cell activation‑related mRNA signatures indicative of preterm birth Jae Young Yoo1,5, Do Young Hyeon2,5, Yourae Shin2,5, Soo Min Kim1, Young‑Ah You1,3, Daye Kim4, Daehee Hwang2* & Young Ju Kim1,3* Preterm birth (PTB), defned as birth at less than 37 weeks of gestation, is a major determinant of neonatal mortality and morbidity. Early diagnosis of PTB risk followed by protective interventions are essential to reduce adverse neonatal outcomes. However, due to the redundant nature of the clinical conditions with other diseases, PTB‑associated clinical parameters are poor predictors of PTB. To identify molecular signatures predictive of PTB with high accuracy, we performed mRNA sequencing analysis of PTB patients and full‑term birth (FTB) controls in Korean population and identifed diferentially expressed genes (DEGs) as well as cellular pathways represented by the DEGs between PTB and FTB. By integrating the gene expression profles of diferent ethnic groups from previous studies, we identifed the core T‑cell activation pathway associated with PTB, which was shared among all previous datasets, and selected three representative DEGs (CYLD, TFRC, and RIPK2) from the core pathway as mRNA signatures predictive of PTB. We confrmed the dysregulation of the candidate predictors and the core T‑cell activation pathway in an independent cohort. Our results suggest that CYLD, TFRC, and RIPK2 are potentially reliable predictors for PTB. Preterm birth (PTB) is the birth of a baby at less than 37 weeks of gestation, as opposed to the usual about 40 weeks, called full term birth (FTB)1.