Complete Genome Sequence of the Hyperthermophilic Bacteria- Thermotoga Sp
COMPLETE GENOME SEQUENCE OF THE HYPERTHERMOPHILIC BACTERIA- THERMOTOGA SP. STRAIN RQ7

Rutika Puranik

A Thesis Submitted to the Graduate College of Bowling Green State University in partial fulfillment of the requirements for the degree of MASTER OF SCIENCE

May 2015

Committee:
Zhaohui Xu, Advisor
Scott Rogers
George Bullerjahn

© 2015 Rutika Puranik
All Rights Reserved

ABSTRACT

Zhaohui Xu, Advisor

The genus Thermotoga is one of the deep-rooted genera in the phylogenetic tree of life and has been studied for its thermostable enzymes and its ability to produce hydrogen at high temperatures. The current study focuses on the complete genome sequencing of T. sp. strain RQ7, using comparative genomics to identify the properties that are conserved and those that vary between this strain and the rest of its genus. A pipeline was developed to assemble the complete genome from next-generation sequencing (NGS) data. The pipeline successfully combined computational approaches with wet lab experiments to deliver a completed genome of T. sp. strain RQ7, which is 1,851,618 bp in size with a GC content of 47.1%. The genome has been submitted to GenBank under accession CP07633. Comparative genomic analysis of this genome against three other Thermotoga strains helped identify putative natural transformation and competence protein-coding genes and showed that the TneDI restriction-modification system is absent in T. sp. strain RQ7. Genome analysis also helped identify the genes unique to T. sp. strain RQ7 and its CRISPR/Cas system: the strain has 8 CRISPR loci and an array of Cas-coding genes across the genome. The genome sequence of this strain provides a platform for developing genetic tools, which would make these strains industrially applicable for biofuel generation.

I dedicate this work to my parents Mr. Rajiv Puranik and Mrs. Radhika Puranik.

ACKNOWLEDGMENTS

I sincerely thank my advisor Dr.
Zhaohui Xu for constantly guiding me throughout my Master’s program. I am grateful for her support and patience. I wish to thank my committee members Dr. Scott Rogers and Dr. George Bullerjahn for agreeing to serve on my thesis committee and for their valuable input. I appreciate the enormous support and help of my lab members Dr. Dongmei Han, Hui Xu and Dr. Uksha Saini and all my friends in the department. I am especially thankful to my husband Akshay Joshi for motivating me and believing in me.

TABLE OF CONTENTS

I. INTRODUCTION
    Genus Thermotoga
    Genome Sequencing
    Genome Annotation
II. MATERIALS AND METHODS
    Growth conditions and DNA isolation
    Genome sequencing by BGI Americas
    Genome assembly and analysis
    Overview of completing the genome
    Primer Walking and PCR
    Genome annotation
III. RESULTS
    Completion of Thermotoga sp. strain RQ7 genome
    Genome details and features
    Features analyzed in the genome
        Unique genes in T. sp. strain RQ7
        Natural Transformation
        The Type II Restriction-Modification system TneDI
        CRISPRs
IV. DISCUSSION
V. CONCLUSIONS
VI. REFERENCES

LIST OF FIGURES

Figure
1 Phylogenetic tree of life based on the small subunit rRNA sequences
2 Small subunit rRNA phylogeny of different Thermotoga strains
3 The principle of Sanger sequencing
4 Schematic representation of steps involved in Illumina sequencing technology
5 The approach of Paired-End reads
6 Multistep annotation process
7 Dataflow schematic for genome annotation
8 Method used by BGI Americas for data production and quality control
9 The pipeline of genome assembling and gap closure
10 Schematic overview of GapFish algorithm
11 Schematic representation of the steps performed during primer walking
12 Diagram summarizing the overall outline of prokaryotic genome annotation pipeline
13 Multiple genome alignment of four different Thermotoga isolates using Mauve alignment tool
14 Amplification of randomly selected genes in the big gap of T. sp. strain RQ7
15 Wet lab approach to confirm the existence of the minigaps
16 Differences in obtaining PCR products for easy genes and difficult genes found in the big gap of T. sp. strain RQ7
17 The approach of Nested PCR for amplifying the difficult genes in the big gap of T. sp. strain RQ7
18 Nested PCR with internal primers for obtaining the final PCR product of difficult gene
19 Metabolic reaction of sulfate reduction pathway
20 CDD (Conserved Domain Database) analysis using amino acid sequences of the PilZ protein in V. cholerae, P. aeruginosa and T. sp. strain RQ7
21 CDD analysis using amino acid sequences of PilB protein in V. cholerae, P. aeruginosa and T. sp. strain RQ7
22 CDD analysis of PilQ amino acid sequence in N. gonorrhoeae showing the presence of important conserved secretin domain
23 Conserved domains of PilC protein in P. stutzeri and putative PilC in T. sp. strain RQ7 analyzed by CDD
24 CDD analysis using amino acid sequences outlines the conserved domains of PilD in N. gonorrhoeae, V. vulnificus and putative PilD in T. sp. strain RQ7
25 Outline of the conserved domains of PilT protein in N. gonorrhoeae and putative PilT in T. sp. strain RQ7
26 CDD analysis using amino acid sequences of PilE in P. aeruginosa and N. gonorrhoeae and putative PilE in T. sp. strain RQ7
27 CDD analysis using amino acid sequence of ComM in H. influenzae and putative ComM in T. sp. strain RQ7
28 CDD analysis of putative ComE in T. sp. strain RQ7 and ComE in the reference organism B. subtilis
29 CDD analysis using amino acid sequence of ComEC protein in B. subtilis and putative ComEC in T. sp. strain RQ7
30 CDD analysis using amino acid sequence of putative ComFC in T. sp. strain RQ7 and ComFC in B. subtilis