Downloaded from orbit.dtu.dk on: Oct 09, 2021
Complete genomic and transcriptional landscape analysis using third-generation sequencing: a case study of Saccharomyces cerevisiae CEN.PK113-7D
Jenjaroenpun, Piroon; Wongsurawat, Thidathip; Pereira, Rui; Patumcharoenpol, Preecha; Ussery, David W.; Nielsen, Jens; Nookaew, Intawat
Published in: Nucleic acids research
Link to article, DOI: 10.1093/nar/gky014
Publication date: 2018
Document Version Publisher's PDF, also known as Version of record
Link back to DTU Orbit
Citation (APA): Jenjaroenpun, P., Wongsurawat, T., Pereira, R., Patumcharoenpol, P., Ussery, D. W., Nielsen, J., & Nookaew, I. (2018). Complete genomic and transcriptional landscape analysis using third-generation sequencing: a case study of Saccharomyces cerevisiae CEN.PK113-7D. Nucleic acids research, 46(7), [e38]. https://doi.org/10.1093/nar/gky014
General rights Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.
Users may download and print one copy of any publication from the public portal for the purpose of private study or research. You may not further distribute the material or use it for any profit-making activity or commercial gain You may freely distribute the URL identifying the publication in the public portal
If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.
Published online 13 January 2018 Nucleic Acids Research, 2018, Vol. 46, No. 7 e38 doi: 10.1093/nar/gky014 Complete genomic and transcriptional landscape analysis using third-generation sequencing: a case study of Saccharomyces cerevisiae CEN.PK113-7D Piroon Jenjaroenpun1,†, Thidathip Wongsurawat1,†,RuiPereira2, Preecha Patumcharoenpol1, David W. Ussery1,3, Jens Nielsen2,4 and Intawat Nookaew1,2,3,*
1Department of Biomedical Informatics, College of Medicine, University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA, 2Department of Biology and Biological Engineering, Chalmers University of Technology, Gothenburg SE-412 96, Sweden, 3Department of Physiology and Biophysics, College of Medicine, The University of Arkansas for Medical Sciences, Little Rock, AR 72205, USA and 4Novo Nordisk Foundation Center for Biosustainability, Technical University of Denmark, DK2800 Lyngby, Denmark
Received September 11, 2017; Revised January 03, 2018; Editorial Decision January 04, 2018; Accepted January 05, 2018
ABSTRACT polyadenylated non-coding RNAs, rRNAs, telomere- RNA, long non-coding RNA and antisense RNA. This Completion of eukaryal genomes can be difficult work demonstrates a strategy to obtain complete task with the highly repetitive sequences along the genome sequences and transcriptional landscapes chromosomes and short read lengths of second- that can be applied to other eukaryal organisms. generation sequencing. Saccharomyces cerevisiae strain CEN.PK113-7D, widely used as a model or- ganism and a cell factory, was selected for this study to demonstrate the superior capability of very INTRODUCTION long sequence reads for de novo genome assem- The genome of the most well studied eukaryotic model bly. We generated long reads using two common organism, Saccharomyces cerevisiae strain S288c, was se- third-generation sequencing technologies (Oxford quenced and released in 1996; it was the first complete, high Nanopore Technology (ONT) and Pacific Biosciences quality genome sequence of an eukaryal organism (1). Since (PacBio)) and used short reads obtained using Il- then, the development of DNA sequencing technologies has lumina sequencing for error correction. Assembly yielded scientific breakthroughs that enable us to obtain of the reads derived from all three technologies re- and analyze genomic DNA sequences at a faster, more eco- sulted in complete sequences for all 16 yeast chro- nomical pace (2). As of August 2017, the NCBI genome > mosomes, as well as the mitochondrial chromosome, database lists 500 Saccharomyces genomes and 4600 eu- < in one step. Further, we identified three types of karyal sequenced genomes. However, 1% (35 genomes) of these are classified as ‘complete genomes’, which harbor DNA methylation (5mC, 4mC and 6mA). Compari- contiguous chromosomal sequence(s) without gaps (defi- son between the reference strain S288C and strain nition by NCBI); this includes 1 animal (Caenorhabditis CEN.PK113-7D identified chromosomal rearrange- elegans), 29 fungi and 5 protists. All of these have rela- ments against a background of similar gene con- tively small genome sizes, most <100 Mb. Thus, >99% of tent between the two strains. We identified full-length the eukaryal genomes are drafts, and virtually all of the transcripts through ONT direct RNA sequencing larger genomes are incomplete. There are many limitations technology. This allows for the identification of tran- of draft genomes, which can lead to misinterpretations (e.g. scriptional landscapes, including untranslated re- see (3)). Complete genome status should be the objective for gions (UTRs) (5 UTR and 3 UTR) as well as differen- a genome-sequencing project, even though draft genomes tial gene expression quantification. About 91% of the can provide most of the coding sequences that are suffi- predicted transcripts could be consistently detected cient to gain functional insights about an organism. Once a genome is obtained, transcriptional analysis can be per- across biological replicates grown either on glucose formed to improve gene annotation and identify dynamic or ethanol. Direct RNA sequencing identified many signatures of gene expression. Traditional RNA-Seq has
*To whom correspondence should be addressed. Tel: +1 501-603-1968; Fax: +1 501-526-5964; Email: [email protected] † These authors contributed equally to this work as first authors.