Assessment of Next Generation Sequencing Technologies for De Novo and Hybrid Assemblies of Challenging Bacter

University of Tennessee, Knoxville
TRACE: Tennessee Research and Creative Exchange
Doctoral Dissertations, Graduate School
5-2016

Assessment of Next Generation Sequencing Technologies for De novo and Hybrid Assemblies of Challenging Bacterial Genomes

Sagar Mukund Utturkar
University of Tennessee - Knoxville, [email protected]

Follow this and additional works at: https://trace.tennessee.edu/utk_graddiss
Part of the Bioinformatics Commons

Recommended Citation
Utturkar, Sagar Mukund, "Assessment of Next Generation Sequencing Technologies for De novo and Hybrid Assemblies of Challenging Bacterial Genomes." PhD diss., University of Tennessee, 2016. https://trace.tennessee.edu/utk_graddiss/3669

This Dissertation is brought to you for free and open access by the Graduate School at TRACE: Tennessee Research and Creative Exchange. It has been accepted for inclusion in Doctoral Dissertations by an authorized administrator of TRACE: Tennessee Research and Creative Exchange. For more information, please contact [email protected].

To the Graduate Council:

I am submitting herewith a dissertation written by Sagar Mukund Utturkar entitled "Assessment of Next Generation Sequencing Technologies for De novo and Hybrid Assemblies of Challenging Bacterial Genomes." I have examined the final electronic copy of this dissertation for form and content and recommend that it be accepted in partial fulfillment of the requirements for the degree of Doctor of Philosophy, with a major in Life Sciences.

Steven D. Brown, Major Professor

We have read this dissertation and recommend its acceptance:
Christopher W. Schadt, Mitchel J. Doktycz, Dale A. Pelletier, Gladys Alexandre

Accepted for the Council:
Carolyn R. Hodges
Vice Provost and Dean of the Graduate School

(Original signatures are on file with official student records.)
Assessment of Next Generation Sequencing Technologies for De novo and Hybrid Assemblies of Challenging Bacterial Genomes

A Dissertation Presented for the Doctor of Philosophy Degree
The University of Tennessee, Knoxville

Sagar Mukund Utturkar
May 2016

Copyright © 2015 by Sagar Utturkar
All rights reserved.

Dedicated to my beloved grandmother, the late Mrs. Jayashree Gavankar,
for her prayers and unconditional love

ACKNOWLEDGEMENTS

I would like to thank the people who helped me during my research and dissertation. First and foremost, I would like to thank my advisor, Dr. Steven D. Brown, for insightful guidance, advice, support and encouragement throughout my Ph.D. curriculum. Dr. Brown showed immense patience during my early learning years, encouraged me to write manuscripts, presented me with the right opportunities to work on collaborative projects, gave me the freedom to express my ideas and included me in interactions with top scientists. His goal-oriented research, dedicated nature and friendly manner made him my role model as a scientist, mentor and teacher. He transformed me into a better scientist and a stronger person, and I will be indebted to him all my life for his kindness.

I would also like to thank my distinguished committee members, Dr. Mitch Doktycz, Dr. Dale Pelletier, Dr. Chris Schadt and Dr. Gladys Alexandre, for their time, efforts, suggestions and critical review to move my research forward. I would like to express my special thanks to Dr. Mircea Podar for external help with this dissertation research and for collaborative research opportunities.

I would like to acknowledge the Genome Science and Technology program, University of Tennessee, the Plant-Microbe Interfaces project and Oak Ridge National Laboratory for providing financial support and an excellent work environment. I was never limited by the tools and resources required to perform productive research.
I have had the pleasure of working with amazing colleagues in Dawn Klingeman, Charlotte Wilson, Kyle Sander, Miguel Rodriguez, Punita Manga, Chia-wei Wu, and Alex Dumitrache. I want to make special mention of Dawn Klingeman for teaching me various wet-lab techniques and performing all the sequencing runs that made this research possible, Charlotte Wilson for sharing thoughts and laughs, and Miguel for being a nice friend and providing a stimulating lab environment. I would also like to thank several people from the computational biology group at Oak Ridge National Laboratory, including Steve Moulton and Michael Galloway for providing the technical help required during my research and Miriam Land for providing outstanding support with various computational tools, scripts and ideas.

I express my deepest gratitude to my parents, Mr. Mukund Utturkar and Mrs. Vidya Utturkar, who were always beside me and gave me the freedom to pursue my dreams. I would not be where I am without them. I would like to thank my wife, Ketaki Bhide, for her continuous support and for staying strong during these years. I also want to thank my grandfather Mr. Vinayak Gavankar, my aunt Mrs. Varsha Agashe, and my in-laws for their encouragement and support. Finally, words are too limited to acknowledge the role of my friends in Knoxville, especially Snehal Joshi, Sarvesh Iyer and Snigdha Sewlikar, who are like my second family and never let me miss my home.

ABSTRACT

In the past decade, tremendous progress has been made in DNA sequencing methodologies in terms of throughput, speed and read length, along with a sharp decrease in per-base cost. These technologies, commonly referred to as next-generation sequencing (NGS), are complemented by the development of hybrid assembly approaches which can utilize multiple NGS platforms. In the first part of my dissertation I performed systematic evaluations and optimizations of nine de novo and hybrid assembly protocols across four novel microbial genomes.
While each protocol had strengths and weaknesses, optimization using multiple strategies yielded dramatic improvements in overall assembly size and quality. To select the best assembly, I also proposed a novel rDNA operon validation approach to evaluate assembly accuracy. Additionally, I investigated the capability of the third-generation PacBio sequencing platform and achieved automated finishing of the Clostridium autoethanogenum genome without any accessory data. These complete genome sequences facilitated comparisons which revealed rDNA operons as a major limitation for short-read technologies, and also enabled comparative and functional genomics analysis. To facilitate future assessment and algorithm development for NGS technologies, we publicly released the sequence datasets for C. autoethanogenum, which span three generations of sequencing technologies and contain six types of data from four NGS platforms.

To assess the limitations of NGS technologies, unassembled regions within Illumina and PacBio assemblies were assessed using eight microbial genomes. This analysis confirmed rDNA operons as major breakpoints within Illumina assemblies, while gaps within PacBio assemblies appear to be unaccounted-for events, with assembly quality being the cumulative effect of read depth, read quality, sample DNA quality and the presence of phage DNA or mobile genetic elements.

In a final collaborative study, an enrichment protocol was applied for the isolation of live endophytic bacteria from roots of the tree Populus deltoides. This protocol achieved a significant reduction in contaminating plant DNA and enabled the use of these samples for single-cell genomics analysis for the first time. Whole-genome sequencing of selected single-cell genomes was performed, assembly and contamination removal were optimized, and bioinformatics, phylogenetic and comparative genomics analyses followed to identify unique characteristics of these uncultured microorganisms.
TABLE OF CONTENTS

CHAPTER 1 : INTRODUCTION
  1.1 Background
  1.2 Statement of hypothesis
  1.3 Approach
  1.4 Significance
  References
  Appendix
CHAPTER 2 : EVALUATION AND VALIDATION OF DE NOVO AND HYBRID ASSEMBLY TECHNIQUES TO DERIVE HIGH QUALITY GENOME SEQUENCES
  2.1 Abstract
  2.2 Introduction
  2.3 Methods
  2.4 Results and Discussion
  2.5 Conclusions
  References
  Appendix
CHAPTER 3 : COMPARISON OF SINGLE-MOLECULE

