RNA and DNA Sequence Analysis of the Human Transcriptome

University of Pennsylvania ScholarlyCommons
Publicly Accessible Penn Dissertations, 2013

RNA and DNA Sequence Analysis of the Human Transcriptome
Jonathan M. Toung, University of Pennsylvania

Part of the Bioinformatics Commons and the Genetics Commons

Recommended Citation
Toung, Jonathan M., "RNA and DNA Sequence Analysis of the Human Transcriptome" (2013). Publicly Accessible Penn Dissertations. 709. https://repository.upenn.edu/edissertations/709

This paper is posted at ScholarlyCommons. https://repository.upenn.edu/edissertations/709

Abstract
The manifestation of phenotype at the cellular and organismal level is determined in large part by gene expression, or the transcription of DNA into RNA. As such, the study of the transcriptome, or the characterization and quantification of all RNA produced in the cell, is important. Recent advances in sequencing technology have allowed for unprecedented interrogation of the transcriptome at single-nucleotide resolution. In the first part of this thesis, we use RNA-Sequencing (RNA-Seq) to study the human B-cell transcriptome and determine the experimental parameters necessary for sequencing-based studies of gene expression. We discover that deep sequencing is necessary to detect fully and quantify accurately the complexity of human transcriptomes. Furthermore, we find that at high sequencing depths, the vast majority of transcribed elements in human B-cells are detected. In the second part of this thesis, we utilize the sequence information provided by RNA-Seq to analyze systematic differences between DNA and RNA sequence. The transmission of information from DNA to RNA is a critical process and is expected to occur in a one-to-one fashion.
By comparing the DNA sequence to RNA sequence of the same individuals, we found all 12 types of RNA-DNA sequence differences (RDDs), the majority of which cannot be explained by known mechanisms such as RNA editing or transcriptional infidelity. We developed computational methods to robustly identify RDDs and control for false positives resulting from genotyping, sequencing, and alignment error. Finally, we explore the genetic basis of RDD levels, or the proportion of reads at a site bearing the sequence difference allele. In particular, we analyzed the levels of RNA editing in unrelated and related individuals and found that a significant portion of individual variation in A-to-G editing levels contains a genetic component. In summary, our results demonstrate that RNA-Seq is a powerful technique for comprehensive and quantitative analysis of gene expression. In addition, the resolution offered by RNA-Seq enables a detailed view of sequence differences between RNA and DNA. Future work will focus on understanding the mechanisms and factors influencing RDDs. Our results suggest that RDD levels may be considered a quantitative and heritable phenotype; as such, a genetic approach may be a sensible method for finding the determinants and mechanism of RDDs.

Degree Type: Dissertation
Degree Name: Doctor of Philosophy (PhD)
Graduate Group: Genomics & Computational Biology
First Advisor: Frederic Bushman
Second Advisor: Nancy Zhang
Keywords: Computational biology, Genome analysis, Next-generation sequencing, RNA Editing, RNA-Sequencing, Transcriptome analysis
Subject Categories: Bioinformatics | Genetics

This dissertation is available at ScholarlyCommons: https://repository.upenn.edu/edissertations/709

RNA AND DNA SEQUENCE ANALYSIS OF THE HUMAN TRANSCRIPTOME
Jonathan M. Toung

A DISSERTATION in Genomics and Computational Biology
Presented to the Faculties of the University of Pennsylvania in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
2013

Supervisor of Dissertation: Frederic Bushman, Professor of Microbiology
Co-supervisor of Dissertation: Nancy Zhang, Professor of Statistics
Graduate Group Chairperson: Maja Bucan, Professor of Genetics

Dissertation Committee:
Christopher D. Brown, Assistant Professor of Genetics
Gregory R. Grant, Research Assistant Professor of Genetics
Shane T. Jensen, Associate Professor of Statistics (chair)
Thomas A. Jongens, Associate Professor of Genetics
Ponzy Lu, Professor of Chemistry

Dedication
For Esther and my family.

Acknowledgements
While many a time it has felt that the pursuit of this degree has been a solitary and singular effort, I would be foolish not to acknowledge the many people who have helped me tremendously along the way. This journey began with Dr. Ponzy Lu intermittently badgering me to pursue a Ph.D. since the first time I sought out his advice as a freshman in the fall of 2006. My undergraduate mentor, thesis committee member, and life coach: I am truly indebted to your tireless support of me. I thank Dr. Vivian Cheung for her support of my studies. I thank Michael Morley for teaching me all there is to know about bioinformatics and programming. Much of what I have accomplished today computationally would not have been possible without your help. Thank you Dr. Isabel Xiaorong Wang. I truly appreciate your willingness to sit me down in your office and help with my committee meetings, my papers, and my public talks.
I thank past and present members of the Cheung lab for their help and support: Wendy Ankener, Will Bernal, Lauren Brady, Alan Bruzel, Jim Devlin, Susannah Elwyn, Brittany Gregory, Anna Lee, Sen Sen Liu, Yaojuan Liu, Colleen McGarry, Allison Richards, and Elizabeth So.

I would not have been able to complete this degree had it not been for individuals who saw a precarious situation and did everything in their power to rectify it. Dr. Rick Bushman, I thank you so much for your support and encouragement throughout the year. Thank you for being the first to provide a space for me in your lab to finish my studies. You have gone above and beyond your duties as an advisor to vouch and fight for me. Dr. Nancy Zhang, I thank you for your support as my advisor. I cannot forget your efforts throughout the situation to make sure I could progress and move on with my studies. Your statistical insights into the project have always been inspiring, and I am truly grateful to have learned so much from you. Dr. John Hogenesch, you have been a tremendous help for my last paper. Your encouragement and advice on how to deal with the situation were invaluable. Dr. Greg Grant, chapter 4 would not have been possible without you. Thank you for your advice and devotion to the project.

To members of my committee, Dr. Greg Grant, Dr. Shane T. Jensen, Dr. Tom Jongens, Dr. Ponzy Lu, Dr. Nancy Zhang, and members of my academic review committee, Dr. Maja Bucan, Dr. Rick Bushman, and Dr. John Hogenesch, thank you all for getting me through this wild ride. Thank you Dr. Warren Ewens and Dr. Mingyao Li for your help and guidance on my studies. Thank you to everyone in the Genomics and Computational Biology program. A special thanks to Hannah Chervitz for all that you do to make our lives easier as students.

To my parents, thank you for your support and lessons to never give up. I’m glad we made it through another journey. And lastly, to my dear girlfriend Esther.
You deserve this degree as much as I do. You have put up with a lot in the past three years. It has been an emotional roller coaster and you have been with me every step of the way. Here’s to better days ahead. Someone once told me the wise saying that life is never as good as you imagine it to be nor as bad as you think it is. It will be just okay.
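
The abstract counts 12 types of RDDs and defines an RDD level as the proportion of reads at a site bearing the sequence difference allele. The short Python sketch below illustrates both quantities. It is only an illustration under stated assumptions, not the dissertation's pipeline: the function names `rdd_types` and `rdd_level`, the per-site dictionary of base counts, and the example numbers are all hypothetical, and RNA bases are written in the cDNA alphabet (T rather than U), as sequencing reads usually are.

```python
from itertools import permutations

BASES = "ACGT"  # cDNA alphabet; an assumption for this sketch


def rdd_types():
    """Return every ordered (DNA base, RNA base) pair in which the bases differ."""
    return [f"{dna}-to-{rna}" for dna, rna in permutations(BASES, 2)]


def rdd_level(rna_base_counts, rdd_base):
    """Proportion of RNA-Seq reads at one site that carry the RDD allele.

    rna_base_counts: hypothetical dict mapping base -> number of reads.
    rdd_base: the base seen in RNA that differs from the DNA genotype.
    """
    total = sum(rna_base_counts.values())
    if total == 0:
        raise ValueError("no reads cover this site")
    return rna_base_counts.get(rdd_base, 0) / total


# Hypothetical site: DNA is homozygous A; 25 of 100 RNA reads show G.
example_counts = {"A": 75, "G": 25}
print(len(rdd_types()))                # 12 ordered base pairs
print(rdd_level(example_counts, "G"))  # 0.25
```

Four bases with three possible mismatches each give the 12 types; the level is a simple read fraction, so sites with few covering reads yield noisy estimates and would need a coverage threshold in practice.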
Recommended publications
  • Microdeletions in 16P11.2 and 13Q31.3 Associated with Developmental Delay and Generalized Overgrowth
    Microdeletions in 16p11.2 and 13q31.3 associated with developmental delay and generalized overgrowth. A.M. George1, J. Taylor2 and D.R. Love1. 1Diagnostic Genetics, LabPlus, Auckland City Hospital, Auckland, New Zealand; 2Northern Regional Genetic Service, Auckland City Hospital, Auckland, New Zealand. Corresponding author: D.R. Love. Genet. Mol. Res. 11 (3): 3133-3137 (2012). Received November 28, 2011; Accepted July 18, 2012; Published September 3, 2012. DOI http://dx.doi.org/10.4238/2012.September.3.1
    ABSTRACT. Chromosome microarray analysis of patients with developmental delay has provided evidence of small deletions or duplications associated with this clinical phenotype. In this context, a 7.1- to 8.7-Mb interstitial deletion of chromosome 16 is well documented, but within this interval a rare 200-kb deletion has recently been defined that appears to be associated with obesity, or developmental delay together with overgrowth. We report a patient carrying this rare deletion, who falls into the latter clinical category, but who also carries a second very rare deletion in 13q31.3. It remains unclear if this maternally inherited deletion acts as a second copy number variation leading to pathogenic variation, or is non-causal and the true modifiers are yet to be determined.
    Key words: Developmental delay; Obesity; Overgrowth; GPC5; SH2B1
    INTRODUCTION. Current referrals for chromosome microarray analysis (CMA) are primarily for determining the molecular basis of developmental delay and autistic spectrum disorder in childhood.
  • Rabbit Anti-Human ECH1 Polyclonal Antibody (DPABH-07145) This Product Is for Research Use Only and Is Not Intended for Diagnostic Use
    Rabbit Anti-Human ECH1 Polyclonal Antibody (DPABH-07145). This product is for research use only and is not intended for diagnostic use.
    Creative Diagnostics, 45-1 Ramsey Road, Shirley, NY 11967, USA. Tel: 1-631-624-4882. Fax: 1-631-938-8221. © Creative Diagnostics All Rights Reserved
    PRODUCT INFORMATION
    Immunogen: Recombinant Protein, antigen sequence: ISLRLTGSSAQEEASGVALGEAPDHSYESLRVTSAQKHVLHVQLNRPNKRNAMNKVFWREMVECFNKISRDADCRAVVISGAGKMFTAGIDLMDMASDILQPKGDDVARISWYLRDIITRYQETFNVIERCPK
    Isotype: IgG
    Source/Host: Rabbit
    Species Reactivity: Human
    Purification: Antigen affinity purified
    Conjugate: Unconjugated
    Applications: IHC, WB
    Format: Liquid
    Size: 100 μL
    Buffer: 40% glycerol and PBS (pH 7.2)
    Preservative: 0.02% Sodium Azide
    Storage: Store at +4°C for short term storage. Long time storage is recommended at -20°C. Gently mix before use. Optimal concentrations and conditions for each application should be determined by the user.
    BACKGROUND
    Introduction: This gene encodes a member of the hydratase/isomerase superfamily. The gene product shows high sequence similarity to enoyl-coenzyme A (CoA) hydratases of several species, particularly within a conserved domain characteristic of these proteins. The encoded protein, which contains a C-terminal peroxisomal targeting sequence, localizes to the peroxisome. The rat ortholog, which localizes to the matrix of both the peroxisome and mitochondria, can isomerize 3-trans,5-cis-dienoyl-CoA to 2-trans,4-trans-dienoyl-CoA, indicating that it is a delta3,5-delta2,4-dienoyl-CoA isomerase. This enzyme
  • PARSANA-DISSERTATION-2020.Pdf
    DECIPHERING TRANSCRIPTIONAL PATTERNS OF GENE REGULATION: A COMPUTATIONAL APPROACH by Princy Parsana. A dissertation submitted to The Johns Hopkins University in conformity with the requirements for the degree of Doctor of Philosophy. Baltimore, Maryland, July, 2020. © 2020 Princy Parsana. All rights reserved.
    Abstract: With rapid advancements in sequencing technology, we now have the ability to sequence the entire human genome, and to quantify expression of tens of thousands of genes from hundreds of individuals. This provides an extraordinary opportunity to learn phenotype relevant genomic patterns that can improve our understanding of molecular and cellular processes underlying a trait. The high dimensional nature of genomic data presents a range of computational and statistical challenges. This dissertation presents a compilation of projects that were driven by the motivation to efficiently capture gene regulatory patterns in the human transcriptome, while addressing statistical and computational challenges that accompany this data. We attempt to address two major difficulties in this domain: a) artifacts and noise in transcriptomic data, and b) limited statistical power. First, we present our work on investigating the effect of artifactual variation in gene expression data and its impact on trans-eQTL discovery. Here we performed an in-depth analysis of diverse pre-recorded covariates and latent confounders to understand their contribution to heterogeneity in gene expression measurements. Next, we discovered 673 trans-eQTLs across 16 human tissues using v6 data from the Genotype Tissue Expression (GTEx) project. Finally, we characterized two trait-associated trans-eQTLs; one in Skeletal Muscle and another in Thyroid. Second, we present a principal component based residualization method to correct gene expression measurements prior to reconstruction of co-expression networks.
  • WO 2010/075007 A2
    (12) INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)
    (19) World Intellectual Property Organization, International Bureau
    (10) International Publication Number: WO 2010/075007 A2
    (43) International Publication Date: 1 July 2010 (01.07.2010)
    (51) International Patent Classification: C12Q 1/68 (2006.01), G06F 19/00 (2006.01), C12N 15/12 (2006.01)
    (21) International Application Number: PCT/US2009/067757
    (22) International Filing Date: 11 December 2009 (11.12.2009)
    (25) Filing Language: English
    (26) Publication Language: English
    (30) Priority Data: 12/316,877, 16 December 2008 (16.12.2008), US
    (71) Applicant (for all designated States except US): DODDS, W., Jean [US/US]; 938 Stanford Street, Santa Monica, CA 90403 (US).
    (81) Designated States (unless otherwise indicated, for every kind of national protection available): AE, AG, AL, AM, AO, AT, AU, AZ, BA, BB, BG, BH, BR, BW, BY, BZ, CA, CH, CL, CN, CO, CR, CU, CZ, DE, DK, DM, DO, DZ, EC, EE, EG, ES, FI, GB, GD, GE, GH, GM, GT, HN, HR, HU, ID, IL, IN, IS, JP, KE, KG, KM, KN, KP, KR, KZ, LA, LC, LK, LR, LS, LT, LU, LY, MA, MD, ME, MG, MK, MN, MW, MX, MY, MZ, NA, NG, NI, NO, NZ, OM, PE, PG, PH, PL, PT, RO, RS, RU, SC, SD, SE, SG, SK, SL, SM, ST, SV, SY, TJ, TM, TN, TR, TT, TZ, UA, UG, US, UZ, VC, VN, ZA, ZM, ZW.
    (84) Designated States (unless otherwise indicated, for every kind of regional protection available): ARIPO (BW, GH, GM, KE, LS, MW, MZ, NA, SD, SL, SZ, TZ, UG, ZM, ZW), Eurasian (AM, AZ, BY, KG, KZ, MD, RU, TJ, TM), European (AT, BE, BG, CH, CY, CZ, DE, DK, EE,
  • Analysis of Trans Esnps Infers Regulatory Network Architecture
    Analysis of trans eSNPs infers regulatory network architecture Anat Kreimer Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences COLUMBIA UNIVERSITY 2014 © 2014 Anat Kreimer All rights reserved ABSTRACT Analysis of trans eSNPs infers regulatory network architecture Anat Kreimer eSNPs are genetic variants associated with transcript expression levels. The characteristics of such variants highlight their importance and present a unique opportunity for studying gene regulation. eSNPs affect most genes and their cell type specificity can shed light on different processes that are activated in each cell. They can identify functional variants by connecting SNPs that are implicated in disease to a molecular mechanism. Examining eSNPs that are associated with distal genes can provide insights regarding the inference of regulatory networks but also presents challenges due to the high statistical burden of multiple testing. Such association studies allow: simultaneous investigation of many gene expression phenotypes without assuming any prior knowledge and identification of unknown regulators of gene expression while uncovering directionality. This thesis will focus on such distal eSNPs to map regulatory interactions between different loci and expose the architecture of the regulatory network defined by such interactions. We develop novel computational approaches and apply them to genetics-genomics data in human. We go beyond pairwise interactions to define network motifs, including regulatory modules and bi-fan structures, showing them to be prevalent in real data and exposing distinct attributes of such arrangements. We project eSNP associations onto a protein-protein interaction network to expose topological properties of eSNPs and their targets and highlight different modes of distal regulation.
  • Autism Multiplex Family with 16P11.2P12.2 Microduplication Syndrome in Monozygotic Twins and Distal 16P11.2 Deletion in Their Brother
    European Journal of Human Genetics (2012) 20, 540–546. © 2012 Macmillan Publishers Limited. All rights reserved 1018-4813/12. www.nature.com/ejhg
    ARTICLE: Autism multiplex family with 16p11.2p12.2 microduplication syndrome in monozygotic twins and distal 16p11.2 deletion in their brother. Anne-Claude Tabet1,2,3,4, Marion Pilorge2,3,4, Richard Delorme5,6, Frédérique Amsellem5,6, Jean-Marc Pinard7, Marion Leboyer6,8,9, Alain Verloes10, Brigitte Benzacken1,11,12 and Catalina Betancur*,2,3,4
    The pericentromeric region of chromosome 16p is rich in segmental duplications that predispose to rearrangements through non-allelic homologous recombination. Several recurrent copy number variations have been described recently in chromosome 16p. 16p11.2 rearrangements (29.5–30.1 Mb) are associated with autism, intellectual disability (ID) and other neurodevelopmental disorders. Another recognizable but less common microdeletion syndrome in 16p11.2p12.2 (21.4 to 28.5–30.1 Mb) has been described in six individuals with ID, whereas apparently reciprocal duplications, studied by standard cytogenetic and fluorescence in situ hybridization techniques, have been reported in three patients with autism spectrum disorders. Here, we report a multiplex family with three boys affected with autism, including two monozygotic twins carrying a de novo 16p11.2p12.2 duplication of 8.95 Mb (21.28–30.23 Mb) characterized by single-nucleotide polymorphism array, encompassing both the 16p11.2 and 16p11.2p12.2 regions. The twins exhibited autism, severe ID, and dysmorphic features, including a triangular face, deep-set eyes, large and prominent nasal bridge, and tall, slender build. The eldest brother presented with autism, mild ID, early-onset obesity and normal craniofacial features, and carried a smaller, overlapping 16p11.2 microdeletion of 847 kb (28.40–29.25 Mb), inherited from his apparently healthy father.
  • Investigation of RNA-Mediated Pathogenic Pathways in a Drosophila Model of Expanded Repeat Disease
    Investigation of RNA-mediated pathogenic pathways in a Drosophila model of expanded repeat disease. A thesis submitted for the degree of Doctor of Philosophy, June 2010. Clare Louise van Eyk, B.Sc. (Hons.), School of Molecular and Biomedical Science, Discipline of Genetics, The University of Adelaide
    Table of Contents
    Index of Figures and Tables
    Declaration
    Acknowledgements
    Abbreviations
    Drosophila nomenclature
    Abstract
    Chapter 1: Introduction
    1.0 Expanded repeat diseases
    1.1 Translated repeat diseases
    1.1.1 Polyglutamine diseases
      Huntington’s disease
      Spinal bulbar muscular atrophy (SBMA)
      Dentatorubral-pallidoluysian atrophy (DRPLA)
      The spinal cerebellar ataxias (SCAs)
    1.1.2 Pathogenesis and aggregate formation
  • RNA Editing at Baseline and Following Endoplasmic Reticulum Stress
    RNA Editing at Baseline and Following Endoplasmic Reticulum Stress. By Allison Leigh Richards. A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Human Genetics) in The University of Michigan, 2015. Doctoral Committee: Professor Vivian G. Cheung, Chair; Assistant Professor Santhi K. Ganesh; Professor David Ginsburg; Professor Daniel J. Klionsky
    Dedication: To my father, mother, and Matt without whom I would never have made it
    Acknowledgements: Thank you first and foremost to my dissertation mentor, Dr. Vivian Cheung. I have learned so much from you over the past several years including presentation skills such as never sighing and never saying “as you can see…” You have taught me how to think outside the box and how to create and explain my story to others. I would not be where I am today without your help and guidance. Thank you to the members of my dissertation committee (Drs. Santhi Ganesh, David Ginsburg and Daniel Klionsky) for all of your advice and support. I would also like to thank the entire Human Genetics Program, and especially JoAnn Sekiguchi and Karen Grahl, for welcoming me to the University of Michigan and making my transition so much easier. Thank you to Michael Boehnke and the Genome Science Training Program for supporting my work. A very special thank you to all of the members of the Cheung lab, past and present. Thank you to Xiaorong Wang for all of your help from the bench to advice on my career. Thank you to Zhengwei Zhu who has helped me immensely throughout my thesis even through my panic.
  • Contributions to Biostatistics: Categorical Data Analysis, Data Modeling and Statistical Inference Mathieu Emily
    Contributions to biostatistics: categorical data analysis, data modeling and statistical inference. Mathieu Emily.
    To cite this version: Mathieu Emily. Contributions to biostatistics: categorical data analysis, data modeling and statistical inference. Mathematics [math]. Université de Rennes 1, 2016. tel-01439264
    HAL Id: tel-01439264. https://hal.archives-ouvertes.fr/tel-01439264. Submitted on 18 Jan 2017.
    HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
    HABILITATION À DIRIGER DES RECHERCHES, Université de Rennes 1. Contributions to biostatistics: categorical data analysis, data modeling and statistical inference. Mathieu Emily. November 15th, 2016.
    Jury:
    Christophe Ambroise, Professor, Université d’Evry Val d’Essonne, France (Reviewer)
    Gérard Biau, Professor, Université Pierre et Marie Curie, France (President)
    David Causeur, Professor, Agrocampus Ouest, France (Examiner)
    Heather Cordell, Professor, University of Newcastle upon Tyne, United Kingdom (Reviewer)
    Jean-Michel Marin, Professor, Université de Montpellier, France (Reviewer)
    Valérie Monbet, Professor, Université de Rennes 1, France (Examiner)
    Korbinian Strimmer, Professor, Imperial College London, United Kingdom (Examiner)
    Jean-Philippe Vert, Research Director, Mines ParisTech, France (Examiner)
    For M.H.A.M.
    Acknowledgements: First and foremost, I would like to thank all the members of the jury, who did me the honor of evaluating my work.
  • Microbes and Metagenomics in Human Health an Overview of Recent Publications Featuring Illumina® Technology TABLE of CONTENTS
    Microbes and Metagenomics in Human Health: An overview of recent publications featuring Illumina® technology
    TABLE OF CONTENTS
    Introduction
    Human Microbiome: Gut Microbiome; Gut Microbiome and Disease; Inflammatory Bowel Disease (IBD); Metabolic Diseases: Diabetes and Obesity; Obesity; Oral Microbiome; Other Human Biomes
    Viromes and Human Health: Viral Populations; Viral Zoonotic Reservoirs; DNA Viruses; RNA Viruses; Human Viral Pathogens; Phages; Virus Vaccine Development
    Microbial Pathogenesis: Important Microorganisms in Human Health; Antimicrobial Resistance; Bacterial Vaccines
    Microbial Populations: Amplicon Sequencing 16S: Ribosomal RNA; Metagenome Sequencing: Whole-Genome Shotgun Metagenomics; Eukaryotes; Single-Cell Sequencing (SCS); Plasmidome; Transcriptome Sequencing
    Glossary of Terms
    Bibliography
    This document highlights recent publications that demonstrate the use of Illumina technologies in immunology research. To learn more about the platforms and assays cited, visit www.illumina.com.
    INTRODUCTION
    The study of microbes in human health traditionally focused on identifying and treating pathogens in patients, usually with antibiotics. The rise of antibiotic resistance and an increasingly dense, and mobile, global population is forcing a change in that paradigm. [1, 2, 3] Improvements in high-throughput sequencing, also called next-generation sequencing (NGS), allow a holistic approach to managing microbes in human health.
    1. Roca I., Akova M., Baquero F., Carlet J., Cavaleri M., et al. (2015) The global threat of antimicrobial resistance: science for intervention. New Microbes New Infect 6: 22-29
    2. Shallcross L. J., Howard S. J., Fowler T. and Davies S. C. (2015) Tackling the threat of antimicrobial resistance: from policy to sustainable
  • Bioinformatics Tools for RNA-Seq Gene and Isoform Quantification
    Zhang, et al., Next Generat Sequenc & Applic 2016, 3:3. DOI: 10.4172/2469-9853.1000140. Journal of Next Generation Sequencing & Applications, ISSN: 2469-9853. Review Article, Open Access.
    Bioinformatics Tools for RNA-seq Gene and Isoform Quantification. Chi Zhang1, Baohong Zhang1, Michael S Vincent2 and Shanrong Zhao1*. 1Early Clinical Development, Pfizer Worldwide R&D, Cambridge, MA, USA; 2Inflammation and Immunology RU, Pfizer Worldwide R&D, Cambridge, MA, USA. *Corresponding author: Shanrong Zhao, Early Clinical Development, Pfizer Worldwide R&D, Cambridge, MA, 02139, USA, Tel: + 1-212-733-2323. Rec date: Oct 27, 2016; Acc date: Dec 15, 2016; Pub date: Dec 17, 2016. Copyright: © 2016 Zhang C, et al. This is an open-access article distributed under the terms of the creative commons attribution license, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
    Abstract: In recent years, RNA-seq has emerged as a powerful technology in estimation of gene or transcript expression. ‘Union-exon’ and transcript based approaches are widely used in gene quantification. The ‘Union-exon’ based approach is simple, but it does not distinguish between isoforms when multiple alternatively spliced transcripts are expressed from the same gene. Because a gene is expressed in one or more transcript isoforms, the transcript based approach is more biologically meaningful than the ‘union exon’-based approach.
  • Sequence Analysis Instructions
    Sequence Analysis Instructions
    To predict your drug metabolizing phenotype from your CYP2D6 gene sequence, you must determine:
    1) The assembled sequence from your two opposing sequencing reactions,
    2) Whether your PCR product even represents the human CYP2D6 gene,
    3) The location of your sequence within the CYP2D6 gene,
    4) Whether differences between your alleles and the CYP2D6*1 sequence represent sequencing errors or polymorphisms in your sequence, and
    5) The effect(s) of any polymorphisms on the CYP2D6 protein sequence.
    As you analyze your gene sequence, copy and paste your analyses and results into a text file for your final lab report. If some factor, like the quality of your sequence, prevents you from carrying out the complete analysis, your grade will not be penalized: just complete as many of the steps below as possible, and include an explanation of why you could not complete the analysis in your final report. Font appearing in bold green italics describes questions to answer and items to include in your final report.
    Part 1: Assembling your CYP2D6 sequence from both directions
    The sequencing reaction only produces 700-800 bases of good sequence. To get most of the sequence of the 1.2 kbp PCR product, sequencing reactions were performed from both directions on the PCR product. These need to be assembled into one sequence using the overlap of the two sequences. To ensure that the assembled sequence runs in the forward direction, it is best to derive the reverse complement sequence from the reverse sequencing file. Take the text file and paste it into a program like http://www.bioinformatics.org/sms/rev_comp.html.
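
The "Sequence Analysis Instructions" entry above recommends deriving the reverse complement of the reverse sequencing read before assembling the two reads. As a rough illustration of that single step, here is a minimal Python sketch. It assumes the read is a plain string of A, C, G, T, and N characters; the function name `reverse_complement` and the example read are hypothetical, and this is not the web tool the entry links to.

```python
# Translation table mapping each base to its complement (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


# Hypothetical reverse-direction read.
reverse_read = "ATGCCN"
print(reverse_complement(reverse_read))  # NGGCAT
```

Applying the function twice returns the original read, which is a quick sanity check before looking for the overlap with the forward read.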