Bioinformatic Analysis of Microrna Genes in Free-Living and Parasitic Nematodes

Total Page:16

File Type:pdf, Size:1020Kb

Bioinformatic Analysis of Microrna Genes in Free-Living and Parasitic Nematodes Bioinformatic Analysis of microRNA Genes in Free-Living and Parasitic Nematodes Rina Ahmed November 2014 DISSERTATION zur Erlangung des akademischen Grades der Doktorin der Naturwissenschaften (Dr. rer. nat.) eingereicht im Fachbereich Mathematik und Informatik der Freien Universit¨at Berlin Begutachtet von: Prof. Dr. Martin Vingron Prof. Dr. Kris Gunsalus 1. Gutachter: Prof. Dr. Martin Vingron 2. Gutachterin: Prof. Dr. Kris Gunsalus Disputation: 26. Februar 2015 Preface The work that led to this thesis is part of two collaborative projects in which I par- ticipated. This thesis presents results from both projects. A panel of different bioin- formatics and statistical methods suitable to analyze small RNA deep sequencing data were identified and developed. Individual contributions for each project will be detailed here: Flexbar Project The work presented in Chapter3 was published in the special issue \Next-Generation Sequencing Approaches in Biology" in the journal Biology 1. The Flexible Barcode and Adapter Remover (FLEXBAR) originated from the Flexible Adapter Remover (FAR) and has been developed by Matthias Dodt in the bioinformat- ics group of Dr. Christoph Dieterich. As part of this project, I developed the adapter removal feature for SOLiD color space reads and focused on the application of small RNA-seq in letter and color space. Additionally, I was involved in the design of FAR and in the development of specific features of the subsequently added barcode detec- tion function for demultiplexing. The final version of FLEXBAR (paper version) has been extensively revised and enhanced by Johannes R¨ohr through the introduction of novel and extended features, a cleanup in the source code, redesigned command-line interface, and optimized parameter settings. miRNA Project The bioinformatics workflow and the analysis and results presented in Chapter2 and4 were published in Genome Biology and Evolution 2. As part of this collaborative project, I designed and performed all computational experiments. The experimental data sets were generated in the group of Dr. Christoph Dieterich at the Berlin Institute for Medical Systems Biology (BIMSB) which is part of the Max-Delbruck-Center¨ for Molecular Medicine (MDC). The total RNA of the parasite samples (Strongyloides ratti) were kindly provided by our collaborator PD. Dr. Norbert W. Brattig from the Bernhard Nocht Institute for Tropical Medicine in Hamburg. All iii next-generation sequencing was performed in the group of Dr. Wei Chen at BIMSB. Acknowledgements The research presented in this thesis was funded by the MDC- NYU Exchange Program and was carried out at BIMSB in the group of Dr. Christoph Dieterich and the Center for Genomics and Systems Biology at the New York Univer- sity (NYU) in the group of Prof. Dr. Kris Gunsalus. In the following, I would like to thank all people who have supported and helped me throughout my PhD studies: First of all, I would like to thank my supervisor Christoph Dieterich for giving me the opportunity to pursue this research in his lab and introducing me to the fascinating world of next-generation sequencing. I am very grateful for his ideas, support, and fruitful discussions throughout the years and for giving me the opportunity to attend international conferences. I especially want to thank my co-supervisor Kris Gunsalus for giving me the chance to work in a very stimulating research environment and tightly connected with the wet lab group of Fabio Piano. Kris provided a creative and open minded working environment and I am very grateful for her dedication and support inside and outside the lab. I would also like to thank Martin Vingron for taking the time to supervise my PhD thesis as my University advisor. Furthermore, I am very grateful to all present and former members of the Dietrich, Gunsalus, and Piano groups and all people from BIMSB for creating this inspiring working atmosphere with plenty of joyful coffee breaks. Special thanks goes to Nikolaus Rajewsky and Jutta Steink¨otter for their guidance and dedication and for providing this excellent research program. Moreover, I have to thank Jennifer Stewart and Sabrina Deter for helping me with all organizational issues and loads of paper work. Last but not least, I would like to thank my friends and family for their tremendous support and patience. iv Contents 1 Introduction1 1.1 Objectives and Thesis Structure . .1 1.2 The Animal Models . .2 1.2.1 Caenorhabditis elegans ........................2 1.2.2 Pristionchus pacificus ........................5 1.2.3 Relationship with Parasitic Nematodes . .5 1.3 Post-transcriptional Regulation of Gene Expression . .7 1.3.1 microRNA Genes . .8 1.4 Next-Generation Sequencing . 11 1.4.1 Illumina/Solexa System . 15 1.4.2 ABI SOLiDTM System . 17 1.4.3 Small RNA Sequencing . 19 2 Materials and Methods 23 2.1 Small RNA Sequencing . 23 2.1.1 Nematode Strains and Culture . 24 2.1.2 Total RNA Isolation and Small RNA Library Generation . 25 2.2 Data Sets . 25 2.2.1 Small RNA Sequencing Data . 25 2.2.2 Publicly Available Data . 26 2.3 Bioinformatics Methods . 27 2.3.1 Preprocessing of Small RNA Sequencing Data . 27 2.3.1.1 Quality Filtering . 28 2.3.1.2 Barcode Detection . 29 2.3.1.3 Adapter Removal . 30 2.3.2 Mapping of Short Sequencing Reads to a Reference . 32 2.3.3 Identification of microRNA Genes from Small RNA-Seq Data . 34 v CONTENTS 2.3.3.1 Quantification of microRNA Expression Levels . 35 2.3.3.2 Identification of Novel microRNA Genes . 35 2.3.4 Differential Expression Analysis . 38 2.3.4.1 Normalizing microRNA Sequencing Data . 38 2.3.4.2 Defining Differential Expression . 40 2.3.4.3 Correction for Multiple Hypothesis Testing . 42 2.3.5 Inference of microRNA Gene Families and Phylogeny . 43 2.3.5.1 Grouping of microRNA Gene Families . 44 2.3.5.2 Multiple Sequence-structure Alignments of RNA . 45 2.3.5.3 Building Phylogenetic Trees . 46 2.3.5.4 Performance Evaluation . 49 2.3.6 Single-Mutation Seed Network . 50 3 FLEXBAR - Flexible Barcode and Adapter Processing for Next- Generation Sequencing 51 3.1 Background . 52 3.2 Program Features . 52 3.2.1 Algorithmic Implementation . 54 3.2.2 Trim-end Modes . 55 3.2.3 Quality Clipping and Read Filtering . 56 3.3 Program Usage . 56 3.4 Program Validation . 58 3.4.1 Adapter Removal from microRNA Short Reads in Color Space . 59 4 Conserved microRNAs are Candidate Post-Transcriptional Regula- tors of Developmental Arrest in Free-Living and Parasitic Nematodes 63 4.1 Background . 64 4.2 Sequencing of microRNAs from Three Nematodes . 65 4.3 Unbiased Identification of Novel microRNA Genes . 67 4.4 Most microRNA Genes Are Not Conserved among Distantly Related Nematodes . 69 4.5 Evaluation of microRNA Homology Assignment . 72 4.6 microRNA Expression Changes from Sequencing Data Agree with Pub- lished qRT-PCR Results . 75 4.7 Differential Expression Analysis Identifies Cross-Species Candidate Reg- ulators . 80 vi CONTENTS 4.8 P. pacificus miR-34 Seed Neighbors are Upregulated in Dauer Larvae . 85 5 Discussion 89 5.1 FLEXBAR - Leading Solution in Barcode and Adapter Processing . 90 5.2 Comprehensive Bioinformatic Analysis Identifies Cross-Species Candi- date Regulators in Nematodes . 91 5.3 Future Directions . 96 5.4 Concluding Remarks . 97 References 99 Abbreviations 119 Summary 123 Zusammenfassung 125 Appendix A - Supplemental Material 129 Appendix B - Supplemental CD 133 Curriculum Vitae 135 Selbst¨andigkeitserkl¨arung 139 vii List of Figures 1.1 Major taxonomic groups of the phylum Nematoda . .3 1.2 Life cycle of free-living and parasitic nematodes . .4 1.3 miRNA biogenesis pathway . 10 1.4 Workflow of Sanger versus next-generation sequencing . 14 1.5 Illumina/Solexa sequencing system . 15 1.6 ABI SOLiD sequencing system . 21 2.1 Experimental setup of microRNA gene profiling . 24 2.2 Bioinformatics analysis workflow for small RNA-seq data . 28 2.3 Processing strategy of adapter recognition and removal in color space RIGHT trim mode . 32 2.4 miRDeep2 prediction strategy . 37 2.5 MA-plots before and after normalization . 41 3.1 FLEXBAR's internal workflow . 53 3.2 Sequence tag recognition with a dynamic programming matrix . 61 3.3 Graphical representation of FLEXBAR's sequence trim-end modes . 62 3.4 Benchmark V - Comparison of FLEXBAR and CUTADAPT . 62 4.1 Bioinformatic analysis workflow for 10 small RNA data sets . 66 4.2 Identified miRNA genes by miRDeep2 . 68 4.3 miRNA gene complement in C. elegans, P. pacificus, and S. ratti .... 69 4.4 miRNA homology and seed conservation . 71 4.5 miRNA graph of 51 gene families . 73 4.6 Phylogenetic tree of let-7 family miRNAs from eight animal clades and shuffled precursors . 75 4.7 Small RNA-seq expression profiles in C. elegans agree with qRT-PCR data 80 4.8 Proportion of expression changes between developmentally arrested stages 81 ix LIST OF FIGURES 4.9 mir-71 family miRNAs as cross-species candidate regulators in develop- mental arrest . 83 4.10 mir-34 family miRNAs as cross-species candidate regulators in develop- mental arrest . 84 4.11 MSA of mir-34 family miRNAs from seven animal clades . 85 4.12 Properties of single-mutation seed network. 86 4.13 Expression conservation of miR-34 seed neighbors . 87 A.1 MSA of let-7 family miRNAs from eight animal clades and five random generated precursors . 130 B.1 Multiple alignments of miRNA seed groups . 133 B.2 Multiple alignments of miRNA families . 133 B.3 Expression fold changes grouped by miRNA families . 133 x List of Tables 1.1 Comparison of Sanger and next-sequencing platforms . 13 2.1 Deep sequencing small RNA data sets profiled in nematodes .
Recommended publications
  • This Is an Open Access Document Downloaded from ORCA, Cardiff University's Institutional Repository
    This is an Open Access document downloaded from ORCA, Cardiff University's institutional repository: http://orca.cf.ac.uk/117055/ This is the author’s version of a work that was submitted to / accepted for publication. Citation for final published version: Jorge, F., White, R.S.A. and Paterson, Rachel A. 2018. Hiding in the swamp: new capillariid nematode parasitizing New Zealand brown mudfish. Journal of Helminthology 92 (3) , pp. 379-386. 10.1017/S0022149X17000530 file Publishers page: http://dx.doi.org/10.1017/S0022149X17000530 <http://dx.doi.org/10.1017/S0022149X17000530> Please note: Changes made as a result of publishing processes such as copy-editing, formatting and page numbers may not be reflected in this version. For the definitive version of this publication, please refer to the published source. You are advised to consult the publisher’s version if you wish to cite this paper. This version is being made available in accordance with publisher policies. See http://orca.cf.ac.uk/policies.html for usage policies. Copyright and moral rights for publications made available in ORCA are retained by the copyright holders. Title: Hiding in the swamp: new capillariid nematode parasitizing New Zealand brown mudfish Authors: Fátima Jorge1, Richard S. A. White2 and Rachel A. Paterson1,3 Addresses: 1Department of Zoology, University of Otago, PO Box 56, Dunedin 9054, New Zealand; 2School of Biological Sciences, University of Canterbury, Private Bag 4800, Christchurch 8140, New Zealand; 3School of Biosciences, University of Cardiff, Cardiff, CF10 3AX, United Kingdom Running headline: Capillariid nematode parasitizing New Zealand mudfish Corresponding author: Fátima Jorge Department of Zoology, University of Otago, 340 Great King Street, PO Box 56, Dunedin 9054, New Zealand e-mail: [email protected] 1 Abstract The extent of New Zealand’s freshwater fish-parasite diversity has yet to be fully revealed, with host-parasite relationships still to be described from nearly half the known fish community.
    [Show full text]
  • Ontology-Based Methods for Analyzing Life Science Data
    Habilitation a` Diriger des Recherches pr´esent´ee par Olivier Dameron Ontology-based methods for analyzing life science data Soutenue publiquement le 11 janvier 2016 devant le jury compos´ede Anita Burgun Professeur, Universit´eRen´eDescartes Paris Examinatrice Marie-Dominique Devignes Charg´eede recherches CNRS, LORIA Nancy Examinatrice Michel Dumontier Associate professor, Stanford University USA Rapporteur Christine Froidevaux Professeur, Universit´eParis Sud Rapporteure Fabien Gandon Directeur de recherches, Inria Sophia-Antipolis Rapporteur Anne Siegel Directrice de recherches CNRS, IRISA Rennes Examinatrice Alexandre Termier Professeur, Universit´ede Rennes 1 Examinateur 2 Contents 1 Introduction 9 1.1 Context ......................................... 10 1.2 Challenges . 11 1.3 Summary of the contributions . 14 1.4 Organization of the manuscript . 18 2 Reasoning based on hierarchies 21 2.1 Principle......................................... 21 2.1.1 RDF for describing data . 21 2.1.2 RDFS for describing types . 24 2.1.3 RDFS entailments . 26 2.1.4 Typical uses of RDFS entailments in life science . 26 2.1.5 Synthesis . 30 2.2 Case study: integrating diseases and pathways . 31 2.2.1 Context . 31 2.2.2 Objective . 32 2.2.3 Linking pathways and diseases using GO, KO and SNOMED-CT . 32 2.2.4 Querying associated diseases and pathways . 33 2.3 Methodology: Web services composition . 39 2.3.1 Context . 39 2.3.2 Objective . 40 2.3.3 Semantic compatibility of services parameters . 40 2.3.4 Algorithm for pairing services parameters . 40 2.4 Application: ontology-based query expansion with GO2PUB . 43 2.4.1 Context . 43 2.4.2 Objective .
    [Show full text]
  • Next Generation DNA Sequencing
    Genes 2010, 1, 385-387; doi:10.3390/genes1030385 OPEN ACCESS genes ISSN 2073-4425 www.mdpi.com/journal/genes Editorial Special Issue: Next Generation DNA Sequencing Paul Richardson PRC LLC, 25 Amber Lane, Lafayette, CA 94549, USA; E-Mail: [email protected]; Tel.: +1 925 301-5460 Received: 21 October 2010 / Accepted: 26 October 2010 / Published: 27 October 2010 Next Generation Sequencing (NGS) refers to technologies that do not rely on traditional dideoxy-nucleotide (Sanger) sequencing where labeled DNA fragments are physically resolved by electrophoresis. These new technologies rely on different strategies, but essentially all of them make use of real-time data collection of a base level incorporation event across a massive number of reactions (on the order of millions versus 96 for capillary electrophoresis for instance). The major commercial NGS platforms available to researchers are the 454 Genome Sequencer (Roche), Illumina (formerly Solexa) Genome analyzer, the SOLiD system (Applied Biosystems/Life Technologies) and the Heliscope (Helicos Corporation). The techniques and different strategies utilized by these platforms are reviewed in a number of the papers in this special issue. These technologies are enabling new applications that take advantage of the massive data produced by this next generation of sequencing instruments. In this special issue, nine papers review and demonstrate the utility and potential of next generation sequencing. One of the biggest consequences with NGS technologies is how to deal with all of the data produced by these platforms. Magi et al. [1] review the software tools available for the multiple functions needed to process and interpret the huge amounts of data produced by these instruments.
    [Show full text]
  • "Structure, Function and Evolution of the Nematode Genome"
    Structure, Function and Advanced article Evolution of The Article Contents . Introduction Nematode Genome . Main Text Online posting date: 15th February 2013 Christian Ro¨delsperger, Max Planck Institute for Developmental Biology, Tuebingen, Germany Adrian Streit, Max Planck Institute for Developmental Biology, Tuebingen, Germany Ralf J Sommer, Max Planck Institute for Developmental Biology, Tuebingen, Germany In the past few years, an increasing number of draft gen- numerous variations. In some instances, multiple alter- ome sequences of multiple free-living and parasitic native forms for particular developmental stages exist, nematodes have been published. Although nematode most notably dauer juveniles, an alternative third juvenile genomes vary in size within an order of magnitude, com- stage capable of surviving long periods of starvation and other adverse conditions. Some or all stages can be para- pared with mammalian genomes, they are all very small. sitic (Anderson, 2000; Community; Eckert et al., 2005; Nevertheless, nematodes possess only marginally fewer Riddle et al., 1997). The minimal generation times and the genes than mammals do. Nematode genomes are very life expectancies vary greatly among nematodes and range compact and therefore form a highly attractive system for from a few days to several years. comparative studies of genome structure and evolution. Among the nematodes, numerous parasites of plants and Strikingly, approximately one-third of the genes in every animals, including man are of great medical and economic sequenced nematode genome has no recognisable importance (Lee, 2002). From phylogenetic analyses, it can homologues outside their genus. One observes high rates be concluded that parasitic life styles evolved at least seven of gene losses and gains, among them numerous examples times independently within the nematodes (four times with of gene acquisition by horizontal gene transfer.
    [Show full text]
  • Morphology, Life Cycle, Pathogenicity and Prophylaxis of Trichinella Spiralis
    Paper No. : 08 Biology of Parasitism Module : 26 Morphology, Life Cycle, Pathogenicity and Prophylaxis of Trichinella Development Team Principal Investigator: Prof. Neeta Sehgal Head, Department of Zoology, University of Delhi Paper Coordinator: Dr. Pawan Malhotra ICGEB, New Delhi Content Writer: Dr. Rita Rath Dyal Singh College, University of Delhi Content Reviewer: Prof. Virender Kumar Bhasin Department of Zoology, University of Delhi 1 Biology of Parasitism ZOOLOGY Morphology, Life Cycle and Pathogenicity and Prophylaxis of Trichinella Description of Module Subject Name ZOOLOGY Paper Name Biology of Parasitism- ZOOL OO8 Module Name/Title Morphology, Life Cycle, Pathogenicity and Prophylaxis of Trichinella Module Id 26; Morphology, Life Cycle, Pathogenicity and Prophylaxis Keywords Trichinosis, pork worm, encysted larva, striated muscle , intestinal parasite Contents 1. Learning Outcomes 2. History 3. Geographical Distribution 4. Habit and Habitat 5. Morphology 5.1. Adult 5.2. Larva 6. Life Cycle 7. Pathogenicity 7.1. Diagnosis 7.2. Treatment 7.3. Prophylaxis 8. Phylogenetic Position 9. Genomics 10. Proteomics 11. Summary 2 Biology of Parasitism ZOOLOGY Morphology, Life Cycle and Pathogenicity and Prophylaxis of Trichinella 1. Learning Outcomes This unit will help to: Understand the medical importance of Trichinella spiralis. Identify the male and female worm from its morphological characteristics. Explain the importance of hosts in the life cycle of Trichinella spiralis. Diagnose the symptoms of the disease caused by the parasite. Understand the genomics and proteomics of the parasite to be able to design more accurate diagnostic, preventive, curative measures. Suggest various methods for the prevention and control of the parasite. Trichinella spiralis (Pork worm) Classification Kingdom: Animalia Phylum: Nematoda Class: Adenophorea Order: Trichocephalida Superfamily: Trichinelloidea Genus: Trichinella Species: spiralis 2.
    [Show full text]
  • Research Report 2006 Max Planck Institute for Molecular Genetics, Berlin Imprint | Research Report 2006
    MAX PLANCK INSTITUTE FOR MOLECULAR GENETICS Research Report 2006 Max Planck Institute for Molecular Genetics, Berlin Imprint | Research Report 2006 Published by the Max Planck Institute for Molecular Genetics (MPIMG), Berlin, Germany, August 2006 Editorial Board Bernhard Herrmann, Hans Lehrach, H.-Hilger Ropers, Martin Vingron Coordination Claudia Falter, Ingrid Stark Design & Production UNICOM Werbeagentur GmbH, Berlin Number of copies: 1,500 Photos Katrin Ullrich, MPIMG; David Ausserhofer Contact Max Planck Institute for Molecular Genetics Ihnestr. 63–73 14195 Berlin, Germany Phone: +49 (0)30-8413 - 0 Fax: +49 (0)30-8413 - 1207 Email: [email protected] For further information about the MPIMG please see our website: www.molgen.mpg.de MPI for Molecular Genetics Research Report 2006 Table of Contents The Max Planck Institute for Molecular Genetics . 4 • Organisational Structure. 4 • MPIMG – Mission, Development of the Institute, Research Concept. .5 Department of Developmental Genetics (Bernhard Herrmann) . 7 • Transmission ratio distortion (Hermann Bauer) . .11 • Signal Transduction in Embryogenesis and Tumor Progression (Markus Morkel). 14 • Development of Endodermal Organs (Heiner Schrewe) . 16 • Gene Expression and 3D-Reconstruction (Ralf Spörle). 18 • Somitogenesis (Lars Wittler). 21 Department of Vertebrate Genomics (Hans Lehrach) . 25 • Molecular Embryology and Aging (James Adjaye). .31 • Protein Expression and Protein Structure (Konrad Büssow). .34 • Mass Spectrometry (Johan Gobom). 37 • Bioinformatics (Ralf Herwig). .40 • Comparative and Functional Genomics (Heinz Himmelbauer). 44 • Genetic Variation (Margret Hoehe). 48 • Cell Arrays/Oligofingerprinting (Michal Janitz). .52 • Kinetic Modeling (Edda Klipp) . .56 • In Vitro Ligand Screening (Zoltán Konthur). .60 • Neurodegenerative Disorders (Sylvia Krobitsch). .64 • Protein Complexes & Cell Organelle Assembly/ USN (Bodo Lange/Thorsten Mielke). .67 • Automation & Technology Development (Hans Lehrach).
    [Show full text]
  • New Capillariid Nematode Parasitizing New Zealand Brown Mudfish Authors
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Online Research @ Cardiff Title: Hiding in the swamp: new capillariid nematode parasitizing New Zealand brown mudfish Authors: Fátima Jorge1, Richard S. A. White2 and Rachel A. Paterson1,3 Addresses: 1Department of Zoology, University of Otago, PO Box 56, Dunedin 9054, New Zealand; 2School of Biological Sciences, University of Canterbury, Private Bag 4800, Christchurch 8140, New Zealand; 3School of Biosciences, University of Cardiff, Cardiff, CF10 3AX, United Kingdom Running headline: Capillariid nematode parasitizing New Zealand mudfish Corresponding author: Fátima Jorge Department of Zoology, University of Otago, 340 Great King Street, PO Box 56, Dunedin 9054, New Zealand e-mail: [email protected] 1 Abstract The extent of New Zealand’s freshwater fish-parasite diversity has yet to be fully revealed, with host-parasite relationships still to be described from nearly half the known fish community. Whilst advancements in the number of fish species examined and parasite taxa described are being made; some parasite groups, such as nematodes, remain poorly understood. In the present study we combined morphological and molecular analyses to characterize a capillariid nematode found infecting the swim bladder of the brown mudfish Neochanna apoda, an endemic New Zealand fish from peat-swamp-forests. Morphologically, the studied nematodes are distinct from other Capillariinae taxa by the features of the male posterior end, namely the shape of the bursa lobes, and shape of spicule distal end. Male specimens were classified in three different types according to differences in shape of the bursa lobes at the posterior end, but only one was successfully molecularly characterized.
    [Show full text]
  • Next Generation Machine Learning Prediction of Protein Cellular Sorting
    TECHNISCHE UNIVERSITÄT MÜNCHEN Lehrstuhl für Bioinformatik Next Generation Machine Learning Prediction of Protein Cellular Sorting Tatyana Goldberg Vollständiger Abdruck der von der Fakultät für Informatik der Technischen Universität München zur Erlangung des akademischen Grades eines Doktors der Naturwissenschaften genehmigten Dissertation. Vorsitzender: Univ.-Prof. Dr. Daniel Cremers Prüfer der Dissertation: 1. Univ.-Prof. Dr. Burkhard Rost 2. Prof. Dr. Yana Bromberg, The State University of New Jersey/USA 3. Univ.-Prof. Dr. Iris Antes Die Dissertation wurde am 15.12.2015 bei der Technischen Universität München eingereicht und durch die Fakultät für Informatik am 22.03.2016 angenommen. ii Table of Contents Abstract ............................................................................................................................................. v Zusammenfassung ........................................................................................................................... vii Acknowledgments ............................................................................................................................ ix List of publications ........................................................................................................................... xi List of Figures and Tables ............................................................................................................... xiii 1. Introduction ...............................................................................................................................
    [Show full text]
  • Evaluation of Normalization Methods in Mammalian Microrna-Seq Data
    Downloaded from rnajournal.cshlp.org on September 27, 2021 - Published by Cold Spring Harbor Laboratory Press METHOD Evaluation of normalization methods in mammalian microRNA-Seq data LANA XIA GARMIRE1 and SHANKAR SUBRAMANIAM1 Department of Bioengineering, Jacobs School of Engineering, University of California at San Diego, La Jolla, California 92093-0412, USA ABSTRACT Simple total tag count normalization is inadequate for microRNA sequencing data generated from the next generation sequencing technology. However, so far systematic evaluation of normalization methods on microRNA sequencing data is lacking. We comprehensively evaluate seven commonly used normalization methods including global normalization, Lowess normalization, Trimmed Mean Method (TMM), quantile normalization, scaling normalization, variance stabilization, and invariant method. We assess these methods on two individual experimental data sets with the empirical statistical metrics of mean square error (MSE) and Kolmogorov-Smirnov (K-S) statistic. Additionally, we evaluate the methods with results from quantitative PCR validation. Our results consistently show that Lowess normalization and quantile normalization perform the best, whereas TMM, a method applied to the RNA-Sequencing normalization, performs the worst. The poor performance of TMM normalization is further evidenced by abnormal results from the test of differential expression (DE) of microRNA-Seq data. Comparing with the models used for DE, the choice of normalization method is the primary factor that affects the results of DE. In summary, Lowess normalization and quantile normalization are recommended for normalizing microRNA-Seq data, whereas the TMM method should be used with caution. Keywords: microRNA-Seq; next generation sequencing; statistical normalization; high-throughput data analysis INTRODUCTION much smaller than mRNAs. For example, so far there are <1000 annotated microRNAs in human that are expected to The next generation sequencing (NGS) technology has regulate z30% of genes.
    [Show full text]
  • First Analysis Steps
    FirstFirst analysisanalysis stepssteps o quality control and optimization o calibration and error modeling o data transformations Wolfgang Huber Dep. of Molecular Genome Analysis (A. Poustka) DKFZ Heidelberg Acknowledgements Anja von Heydebreck Günther Sawitzki Holger Sültmann, Klaus Steiner, Markus Vogt, Jörg Schneider, Frank Bergmann, Florian Haller, Katharina Finis, Stephanie Süß, Anke Schroth, Friederike Wilmer, Judith Boer, Martin Vingron, Annemarie Poustka Sandrine Dudoit, Robert Gentleman, Rafael Irizarry and Yee Hwa Yang: Bioconductor short course, summer 2002 and many others a microarray slide Slide: 25x75 mm Spot-to-spot: ca. 150-350 µm 4 x 4 or 8x4 sectors 17...38 rows and columns per sector ca. 4600…46000 probes/array sector: corresponds to one print-tip Terminology sample: RNA (cDNA) hybridized to the array, aka target, mobile substrate. probe: DNA spotted on the array, aka spot, immobile substrate. sector: rectangular matrix of spots printed using the same print-tip (or pin), aka print-tip-group plate: set of 384 (768) spots printed with DNA from the same microtitre plate of clones slide, array channel: data from one color (Cy3 = cyanine 3 = green, Cy5 = cyanine 5 = red). batch: collection of microarrays with the same probe layout. Raw data scanner signal resolution: 5 or 10 mm spatial, 16 bit (65536) dynamical per channel ca. 30-50 pixels per probe (60 µm spot size) 40 MB per array Raw data scanner signal resolution: 5 or 10 mm spatial, 16 bit (65536) dynamical per channel ca. 30-50 pixels per probe (60 µm spot size) 40 MB per array Image Analysis Raw data scanner signal resolution: 5 or 10 mm spatial, 16 bit (65536) dynamical per channel ca.
    [Show full text]
  • Probabilistic Models for Gene Silencing Data
    Probabilistic Models for Gene Silencing Data Florian Markowetz Dezember 2005 Dissertation zur Erlangung des Grades eines Doktors der Naturwissenschaften (Dr. rer. nat.) am Fachbereich Mathematik und Informatik der Freien Universit¨atBerlin 1. Referent: Prof. Dr. Martin Vingron 2. Referent: Prof. Dr. Klaus-Robert M¨uller Tag der Promotion: 26. April 2006 Preface Acknowledgements This work was carried out in the Computational Diagnostics group of the Department of Computational Molecular Biology at the Max Planck Institute for Molecular Genetics in Berlin. I thank all past and present colleagues for the good working atmosphere and the scientific—and sometimes maybe not so scientific—discussions. Especially, I am grateful to my supervisor Rainer Spang for suggesting the topic, his scientific support, and the opportunity to write this thesis under his guidance. I thank Michael Boutros for providing the expression data and for introducing me to the world of RNAi when I visited his lab at the DKFZ in Heidelberg. I thank Anja von Heydebreck, J¨org Schultz, and Martin Vingron for their advice and counsel as members of my PhD commitee. During the time I worked on this thesis, I enjoyed fruitful discussions with many people. In particular, I gratefully acknowledge Jacques Bloch, Steffen Grossmann, Achim Tresch, and Chen-Hsiang Yeang for their contributions. Special thanks go to Viola Gesellchen, Britta Koch, Stefanie Scheid, Stefan Bentink, Stefan Haas, Dennis Kostka, and Stefan R¨opcke, who read drafts of this thesis and greatly improved it by their comments. Publications Parts of this thesis have been published before. Chapter 2 grew out of lectures I gave in 2005 at the Instiute for Theoretical Physics and Mathematics (IPM) in Tehran, Iran, and at the German Conference on Bioinformatics (GCB), Hamburg, Germany.
    [Show full text]
  • Trichuris Trichiura
    GLOBAL WATER PATHOGEN PROJECT PART THREE. SPECIFIC EXCRETED PATHOGENS: ENVIRONMENTAL AND EPIDEMIOLOGY ASPECTS TRICHURIS TRICHIURA Ricardo Izurieta University of South Florida Tampa, United States Miguel Reina-Ortiz University of South Florida Tampa, United States Tatiana Ochoa-Capello Moffitt Cancer Center Tampa, United States Copyright: This publication is available in Open Access under the Attribution-ShareAlike 3.0 IGO (CC-BY-SA 3.0 IGO) license (http://creativecommons.org/licenses/by-sa/3.0/igo). By using the content of this publication, the users accept to be bound by the terms of use of the UNESCO Open Access Repository (http://www.unesco.org/openaccess/terms-use-ccbysa-en). Disclaimer: The designations employed and the presentation of material throughout this publication do not imply the expression of any opinion whatsoever on the part of UNESCO concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. The ideas and opinions expressed in this publication are those of the authors; they are not necessarily those of UNESCO and do not commit the Organization. Citation: Izurieta, R., Reina-Ortiz, M. and Ochoa-Capello, T. (2018). Trichuris trichiura. In: J.B. Rose and B. Jiménez-Cisneros (eds), Water and Sanitation for the 21st Century: Health and Microbiological Aspects of Excreta and Wastewater Management (Global Water Pathogen Project). (L. Robertson (eds), Part 3: Specific Excreted Pathogens: Environmental and Epidemiology Aspects - Section 4: Helminths), Michigan State University, E. Lansing, MI, UNESCO. https://doi.org/10.14321/waterpathogens.43 Acknowledgements: K.R.L. Young, Project Design editor; Website Design (http://www.agroknow.com) Last published: August 3, 2018 Trichuris trichiura Summary least in some regions of the world (Yu et al., 2016; Al- Mekhlafi et al., 2006; Norhayati et al., 1997).
    [Show full text]