View metadata, citation and similar papers at core.ac.uk brought to you by CORE

provided by Plymouth Electronic Archive and Research Library Standards in Genomic Sciences

Permanent draft genome of thioparus DSM 505T, an obligately chemolithoautotrophic member of the --Manuscript Draft--

Manuscript Number: SIGS-D-16-00109R3 Full Title: Permanent draft genome of Thiobacillus thioparus DSM 505T, an obligately chemolithoautotrophic member of the Betaproteobacteria Article Type: Short genome report

Funding Information: U.S. Department of Energy Dr Nikos C Kyrpides (DE-AC02-05CH11231) Royal Society Dr Rich Boden (RG120444)

Abstract: Thiobacillus thioparus DSM 505T is one of first two isolated strains of inorganic sulfur- oxidising . The original strain of T. thioparus was lost almost 100 years ago and the working type strain is Culture CT (=DSM 505T = ATCC 8158T) isolated by Starkey in 1934 from agricultural soil at Rutgers University, New Jersey, USA. It is an obligate chemolithoautotroph that conserves energy from the oxidation of reduced inorganic sulfur compounds using the Kelly-Trudinger pathway and uses it to fix carbon dioxide It is not capable of heterotrophic or mixotrophic growth. The strain has a genome size of 3,201,518 bp. Here we report the genome sequence, annotation and characteristics. The genome contains 3,135 protein coding and 62 RNA coding genes. Genes encoding the transaldolase variant of the Calvin-Benson-Bassham cycle were also identified and an operon encoding carboxysomes, along with Smith's biosynthetic horseshoe in lieu of Krebs' cycle sensu stricto. Terminal oxidases were identified, viz. cytochrome c oxidase (cbb3, EC 1.9.3.1) and ubiquinol oxidase (bd, EC 1.10.3.10). There is a partial sox operon of the Kelly-Friedrich pathway of inorganic sulfur- oxidation that contains soxXYZAB genes but lacking soxCDEF, there is also a lack of the DUF302 gene previously noted in the sox operon of other members of the that can use trithionate as an energy source. In spite of apparently not growing anaerobically with denitrification, the nar, nir, nor and nos operons encoding enzymes of denitrification are found in the T. thioparus genome, in the same arrangements as in the true denitrifier T. denitrificans. Corresponding Author: Rich Boden, Ph.D B.Sc University of Plymouth Plymouth, Devon UNITED KINGDOM Corresponding Author Secondary Information: Corresponding Author's Institution: University of Plymouth Corresponding Author's Secondary Institution: First Author: Lee P Hutt First Author Secondary Information: Order of Authors: Lee P Hutt Marcel Huntemann Alicia Clum Manoj Pillay Krishnaveni Palaniappan Neha Varghese Natalia Mikhailova Dimitrios Stamatis

Powered by Editorial Manager® and ProduXion Manager® from Aries Systems Corporation Tatiparthi Reddy Chris Daum Nicole Shapiro Natalia Ivanova Nikos C Kyrpides Tanja Woyke Rich Boden, Ph.D B.Sc Order of Authors Secondary Information: Response to Reviewers: In response to request to add 2 percentages to one of the tables (which could have been added in proof tbh), they are done. Hopefully this is now final.

Powered by Editorial Manager® and ProduXion Manager® from Aries Systems Corporation Manuscript Click here to download Manuscript R4.docx

Click here to view linked References 1 2 3 4 5 6 7 8 9 10 11 1 12 Permanent draft genome of Thiobacillus thioparus 13 14 T 15 2 DSM 505 , an obligately chemolithoautotrophic 16 17 18 3 member of the Betaproteobacteria 19 20 21 22 4 Lee P. Hutt1,2, Marcel Huntemann3, Alicia Clum3, Manoj Pillay3, Krishnaveni 23 24 5 Palaniappan3, Neha Varghese3, Natalia Mikhailova3, Dimitrios Stamatis3, 25

26 3 3 3 3 27 6 Tatiparthi Reddy , Chris Daum , Nicole Shapiro , Natalia Ivanova , Nikos 28 3 3 1,2 29 7 Kyrpides , Tanja Woyke , Rich Boden * 30 31 8 32 33 9 [1] School of Biological and Marine Sciences, University of Plymouth, Drake Circus, Plymouth, PL4 8AA, 34 35 10 UK. 36 37 11 [2] Sustainable Earth Institute, University of Plymouth, Drake Circus, Plymouth, PL4 8AA, UK. 38 39 40 12 [3] DOE Joint Genome Institute, Walnut Creek, CA 94598, USA. 41 42 13 * Corresponding author ([email protected]) 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 1 2 3 4 5 6 7 8 9 10 11 14 Abstract 12 13 15 Thiobacillus thioparus DSM 505T is one of first two isolated strains of inorganic sulfur-oxidising 14 15 16 Bacteria. The original strain of T. thioparus was lost almost 100 years ago and the working type strain is 16 17 Culture CT (=DSM 505T = ATCC 8158T) isolated by Starkey in 1934 from agricultural soil at Rutgers 17 18 18 University, New Jersey, USA. It is an obligate chemolithoautotroph that conserves energy from the 19 20 19 oxidation of reduced inorganic sulfur compounds using the Kelly-Trudinger pathway and uses it to fix 21 22 20 carbon dioxide It is not capable of heterotrophic or mixotrophic growth. The strain has a genome size of 23 21 3,201,518 bp. Here we report the genome sequence, annotation and characteristics. The genome contains 24 25 22 3,135 protein coding and 62 RNA coding genes. Genes encoding the transaldolase variant of the Calvin- 26 27 23 Benson-Bassham cycle were also identified and an operon encoding carboxysomes, along with Smith’s 28 24 biosynthetic horseshoe in lieu of Krebs’ cycle sensu stricto. Terminal oxidases were identified, viz. 29 30 25 cytochrome c oxidase (cbb3, EC 1.9.3.1) and ubiquinol oxidase (bd, EC 1.10.3.10). There is a partial sox 31 32 26 operon of the Kelly-Friedrich pathway of inorganic sulfur-oxidation that contains soxXYZAB genes but 33 27 lacking soxCDEF, there is also a lack of the DUF302 gene previously noted in the sox operon of other 34 35 28 members of the ‘Proteobacteria’ that can use trithionate as an energy source. In spite of apparently not 36 37 29 growing anaerobically with denitrification, the nar, nir, nor and nos operons encoding enzymes of 38 30 denitrification are found in the T. thioparus genome, in the same arrangements as in the true denitrifier T. 39 40 31 denitrificans. 41 42 32 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 1 2 3 4 5 6 7 8 9 10 11 33 Keywords: 12 13 34 Thiobacillus thioparus, Betaproteobacteria, sulfur oxidation, 14 15 35 chemolithoautotroph, carboxysome, denitrification 16 17 18 19 36 Abbreviations 20 21 37 ATCC – American Type Culture Collection 22 23 24 38 BLAST – Basic Linear Alignment Search Tool 25 26 39 CIP – Collection de L’institut Pasteur 27 28 29 40 COG – Clusters of Orthologous Groups 30 31 32 41 DSMZ - Leibniz-Institut DSMZ - Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH 33 34 42 GEBA – Genomic Encyclopedia of Bacteria and Archaea. 35 36 37 43 IMG – Integrated Microbial Genomes database. 38 39 40 44 JCM – Japan Collection of Microorganisms 41 42 45 JGI – Joint Genome Institute 43 44 45 46 KMG – 1,000 Microbial Genomes project 46 47 47 NBRC – NITE Biological Resource Center 48 49 50 48 NCBI – National Center for Biotechnology Information 51 52 53 49 RuBisCO - ribulose-1,5-bisphosphate carboxylase/oxygenase 54 55 56 57 58 59 60 61 62 63 64 65 1 2 3 4 5 6 7 8 9 10 11 50 Introduction 12 13 51 In 1902, Thiobacillus thioparus was one of the first two obligately chemolithoautrophic sulfur-oxidizing 14 15 52 Bacteria to be isolated, (along with what is now neapolitanus), and was named in 1904 16 53 [1, 2]. The original isolates were lost, but Thiobacillus thioparus is now the type species of the genus (a 17 18 54 member of the Betaproteobacteria) and the type strain is an isolate from Starkey (1934) (= Culture CT = 19 20 55 DSM 505T = ATCC 8158T = CIP 104484T = JCM 3859T = NBRC 103402T) [3, 4]. Originally, the 21 22 56 characteristic of utilizing inorganic sulfur compounds as an energy source was thought to be a taxonomic 23 57 trait unique to Thiobacillus and at its height the genus contained in at least 32 different ‘species’ [4]. With 24 25 58 phylogenetic methods however, many of these strains have since been reassigned to different, often new 26 T 27 59 genera with T. thioparus DSM 505 being one of four species with validly published names left in the 28 60 genus. Compared to other inorganic sulfur-oxidisers, surprisingly little research has been conducted on T. 29 30 61 thioparus DSM 505T in terms of physiology and biochemistry or genetics, possibly due to the low growth 31 32 62 yields of this species when compared to T. denitrificans and T. aquaesulis [5, 6] making it more 33 63 challenging to study. It may, however, give extended and contrasted insights into autotrophic sulfur- 34 35 64 oxidation in the Bacteria. It was selected for genome sequencing as part of the Department of Energy 36 37 65 DOE-CSP 2012 initiative – as type species of a genus. 38 39 40 66 Organism Information 41 42 67 Classification and features 43 44 68 Thiobacillus thioparus DSM 505T was isolated from sandy loam soil from the New Jersey Agricultural 45 46 69 Experimental Station planted with unspecified vegetable crops using 20mM thiosulfate as sole energy 47 48 70 source in basal medium at pH 8.5 by Starkey (1934) [3] and is currently one of four species with validly 49 71 published names within the genus. It forms small white colonies of 1-3 mm diameter after 2-3 days that 50 51 72 turn pink or brown with age and which become coated with yellow-ish elementary sulfur. When grown in 52 53 73 liquid media, finely divided (white) elementary sulfur is formed during early stages of growth, 54 55 56 57 58 59 60 61 62 63 64 65 1 2 3 4 5 6 7 8 9 10 11 74 particularly when thiosulfate is used as the energy source. This disappears when growth approaches 12 13 75 stationary phase. Thiosulfate is oxidized stoichiometrically to tetrathionate after 24 h accompanied by a 14 76 rise in pH, characteristic of the Kelly-Trudinger pathway. Tetrathionate is subsequently oxidized to 15 16 77 sulfate with culture pH falling to pH 4.8 by stationary phase. During continuous culture, no intermediates 17 18 78 are detected in the medium once steady-state has been established. If the dilution rate of a thiosulfate 19 79 limited chemostat is increased, a large production of elementary sulfur is observed, which disappears as a 20 21 80 new steady-state is established at the faster dilution rate. General features of T. thioparus DSM 505T are 22 23 81 summarized in Table 1. A phylogenetic tree based on the 16S rRNA gene sequence showing this 24 82 organisms position within the Betaproteobacteria and rooted with tepidarius is given 25 26 83 in Figure 1. 27 28 Cells are 1.5 – 2.0 by 0.6 to 0.8 μm and stain Gram negative. They are motile by means of a single polar 29 84 30 85 flagellum between 6 - 10 μm in length, as shown in Figure 2. The dominant respiratory quinone is 31 32 86 ubiquione-8 [7-10] and fix carbon dioxide using the Calvin-Benson-Bassham cycle at the expense of 33 34 87 inorganic sulfur oxidation. Cells accumulate polyphosphate (‘volutin’) granules in batch culture and to a 35 88 lesser extent in an energy-limited chemostat [8]. Carboxysomes are present in cells regardless of growth 36 37 89 conditions employed (Fig. 1), but are apparently not found in T. denitrificans cells, at least not under 38 39 90 growth conditions employed in previous studies [9]. T. thioparus can only use oxygen as a terminal 40 91 electron acceptor – nitrate, nitrite, thiosulfate, sulfate, elementary sulfur and ferric iron do not support 41 42 92 growth as terminal electron acceptors. The genomic DNA G+C content has been estimated using the 43 44 93 thermal denaturation [11] at 61 – 62 mol% [7, 10]. T. thioparus DSM 505T does not grow on any organic 45 94 carbon compound tested, including sugars (glucose, ribose, fructose, sucrose), intermediates of Krebs’ 46 47 95 cycle (citrate, succinate, fumarate, malate, oxaloacetate), carboxylates (glycolate, formate, acetate, 48 49 96 propionate, pyruvate), one-carbon compounds (monomethylamine, dimethylamine, trimethylamine, 50 51 97 methanol, methane), structural amino acids (all 20), substituted thiophenes (thiophene-2-carboxylate, 52 98 thiophene-3-carboxylate) or complex media (yeast extract, nutrient broth, brain-heart infusion, Columbia 53 54 55 56 57 58 59 60 61 62 63 64 65 1 2 3 4 5 6 7 8 9 10 11 99 sheep or horse blood agar, chocolate agar). Energy sources that support autotrophic growth of DSM 505T 12 13100 include thiosulfate, trithionate, tetrathionate, pentathionate, hexathionate, thiocyanate and dithionate. 14101 Some bone fide strains of T. thioparus (Tk-m and E6) grow autotrophically on carbon disulfide, 15 16102 dimethylsulfide, dimethyldisulfide and Admidate (O,O-dimethylphosphoramidothioate) [4], but it is not 17 T 18103 known if the type strain DSM 505 is capable of this. Autotrophic growth is not supported by Fe(II), 19 104 Mn(II), Cu(I), U(IV), sulfite, dimethylsulfoxide, dimethylsulfone, pyrite or formate. During batch growth 20 21105 on thiosulfate the intermediate production of tetrathionate is observed during early stages of growth, 22 23106 indicative of the Kelly-Trudinger pathway [12]. Compared to the other two members of the genus [5, 6], 24 107 T. thioparus DSM 505T has relatively low growth yields on thiosulfate (Hutt and Boden, manuscript in 25 26108 preparation) which may give insight into the physiological variances of Kelly-Trudinger pathway 27 28109 organisms even within one genus. 29 30 31110 Genome sequencing information 32 33 111 Genome project history 34 35112 This organism was selected for sequencing on the basis of its role in sulfur cycling, physiological, 36 37113 biochemical, evolutionary and biogeochemical importance, and is part of the GEBA-KMG project at the 38 39114 U.S. Department of Energy JGI. The genome project is deposited in the Genomes OnLine Database [13] 40 115 and a high-quality permanent draft genome sequence in IMG (the annotated genome is publically 41 42116 available in IMG under Genome ID 2515154076) [14]. Sequencing, finishing and annotation were 43 44117 performed by the JGI using state of the art sequencing technology [15]. A summary of the project 45 information is given in Table 2. 46118 47 48119 Growth conditions and genomic DNA preparation 49 50120 T. thioparus DSM 505T DNA was obtained from Dr Hans-Peter Klenk at the DSMZ, having been grown 51 52121 on basal salts medium pH 6.6, supplemented with 40 mM thiosulfate as the sole energy source (DSM 53 54 55 56 57 58 59 60 61 62 63 64 65 1 2 3 4 5 6 7 8 9 10 11122 Medium 36), under air at 26 °C for 72 h. DNA was extracted using the JETFLEX Genomic DNA 12 13123 Purification Kit from Genomed (Löhne, Germany) into TE Buffer. Quality was checked by agarose gel 14124 electrophoresis. 15 16 17125 Genome sequencing and assembly 18 19126 The draft genome of Thiobacillus thioparus DSM 505T was generated at the DOE JGI using the Illumina 20 21127 technology [16]. An Illumina standard shotgun library was constructed and sequenced using the Illumina 22128 HiSeq 2000 platform which generated 11,161,382 reads totalling 1,674.2 Mbp. All general aspects of 23 24129 library construction and sequencing performed at the JGI can be found at http://www.jgi.doe.gov. All raw 25 26130 Illumina sequence data was passed through DUK, a filtering program developed at JGI, which removes 27 131 known Illumina sequencing and library preparation artifacts [17]. Following steps were then performed 28 29132 for assembly: (1) filtered Illumina reads were assembled using Velvet (version 1.1.04) [18], (2) 1–3 Kbp 30 31133 simulated paired end reads were created from Velvet contigs using wgsim [19], (3) Illumina reads were 32 33134 assembled with simulated read pairs using Allpaths–LG (version r41043) [20]. Parameters for assembly 34135 steps were: 1) Velvet (velveth: 63 –shortPaired and velvetg: –very clean yes –export-Filtered yes –min 35 36136 contig lgth 500 –scaffolding no –cov cutoff 10) 2) wgsim (–e 0 –1 100 –2 100 –r 0 –R 0 –X 0) 3) 37 38137 Allpaths–LG (PrepareAllpathsInputs: PHRED 64=1 PLOIDY=1 FRAG COVERAGE=125 JUMP 39138 COVERAGE=25 LONG JUMP COV=50, RunAllpathsLG: THREADS=8 RUN=std shredpairs 40 41139 TARGETS=standard VAPI WARN ONLY=True OVERWRITE= OVERWRITE=True) . The final draft 42 43140 assembly contained 20 contigs in 20 scaffolds. The total size of the genome is 3.2 Mbp and the final 44141 assembly is based on 392.6 Mbp of Illumina data, which provides an average 122.7× coverage of the 45 46142 genome. 47 48 49143 Genome annotation 50 144 Genes were identified using Prodigal [21], followed by a round of manual curation using GenePRIMP 51 52145 [22] for finished genomes and Draft genomes in fewer than 10 scaffolds. The predicted CDSs were 53 54 55 56 57 58 59 60 61 62 63 64 65 1 2 3 4 5 6 7 8 9 10 11146 translated and used to search the NCBI nonredundant database, UniProt, TIGRFam, Pfam, KEGG, COG, 12 13147 and InterPro databases. The tRNAScanSE tool [23] was used to find tRNA genes, whereas ribosomal 14148 RNA genes were found by searches against models of the ribosomal RNA genes built from SILVA [24]. 15 16149 Other non–coding RNAs such as the RNA components of the protein secretion complex and the RNase P 17 18150 were identified by searching the genome for the corresponding Rfam profiles using INFERNAL [25]. 19 151 Additional gene prediction analysis and manual functional annotation was performed within the IMG 20 21152 platform [26] developed by the JGI, Walnut Creek, CA, USA [27, 28]. 22 23 24 25153 Genome Properties

26 T 27154 The genome of T. thioparus DSM 505 is 3,201,518 bp-long with a 62.3 mol% G+C content (Table 3). Of 28155 the 3,197 predicted genes, 3,135 were protein-coding genes and 62 were RNA genes. A total of 2,597 29 30156 genes (81.23 %) have predicted function. A total of 538 (16.83 %) were identified as pseudogenes – the 31 32157 remainder annotated as hypothetical proteins. The properties and the statistics of the genome are given in 33158 Table 3, the distribution of genes into COG functional categories is given in Table 4. The genome is the 34 35159 second largest genome of obligate chemolithoautotrophs sequenced to date and is 89 % (bp/bp) or 90 % 36 T 37160 (protein coding genes/protein coding genes) of the size of that of T. denitrificans DSM 12475 [12]. 38 39 40161 Insights from the genome sequence 41 42162 As an obligate autotroph, it would be expected for a complete Calvin-Benson-Bassham cycle and, in lieu 43 44163 of Krebs’ cycle, Smith’s biosynthetic horseshoe [12, 29, 30, 33]. Smith’s horseshoe representing a very 45 46164 near-complete Krebs’ cycle was identified, in which citrate synthase (EC 2.3.3.16), aconitase (EC 4.2.1.3), 47165 isocitrate dehydrogenase (NADP+, EC 1.1.1.42), succinyl coenzyme A synthase (ADP-forming, EC 48 49166 2.6.1.5), succinate dehydrogenase (EC 1.3.5.1) and malate dehydrogenase (oxaloacetate decarboxylating, 50 51167 NADP+, EC 1.1.1.40) genes are present, but fumarase (EC 4.2.1.2) and the E3 subunit of α-ketoglutarate 52168 dehydrogenase (NAD+, EC 1.2.4.2) genes are absent – E1 and E2 are present. This is a comparatively 53 54 55 56 57 58 59 60 61 62 63 64 65 1 2 3 4 5 6 7 8 9 10 11169 large version of Smith’s horseshoe [33], which varies between Classes of the ‘Proteobacteria’, for 12 13170 example, genes for fumarase, succinate dehydrogenase and the E1 subunit of α-ketoglutarate 14171 dehydrogenase were recently found to be missing from Thermithiobacillus tepidarius DSM 3134T [12, 31, 15 16172 32] of the . Activities of Krebs’ cycle enzymes assayed by Smith et al. [33] in cell-free 17 18173 extracts of T. thioparus found activities of all enzymes except for α-ketoglutarate dehydrogenase, 19 174 indicating the E1 and E2 subunits alone were not sufficient for activity. Interestingly, Smith detected 20 21175 fumarase activity; however, this strain of T. thioparus was isolated by the authors themselves and was not 22 23176 DSM 505T, so there may be further variation of Smith’s horseshoe at strain level. With many species 24 177 having been erroneously classified as strains of T. thioparus in the past that have subsequently been 25 26178 proven to belong to other genera [4] it is also plausible that Smith’s strain was from another specie, genus 27 28179 or even Class. The E1, E2 and E3 subunit genes for α-ketoglutarate dehydrogenase were identified in the

29 T 30180 genome of T. denitrificans ATCC 25259 [34], though enzyme activity was also absent in this strain [33, 31181 35]. It is important to stress that the full suite of enzymes of Krebs’ cycle or Smith’s horseshoe have never 32 33182 been assayed in T. thioparus DSM 505T - clearly this is needed in order to identify if genomic data reflect 34 35183 true activities in vivo [29]. 36 37184 A complete Calvin-Benson-Bassham cycle is present, with a single copy of the large (cbbL) and small 38 39185 (cbbS) form I RuBisCO (EC 4.1.1.39) subunits and, owing to the presence of a transaldolase (EC 2.2.1.2) 40 41186 and absence of a sedoheptulose-1,7-bisphosphatase (EC 3.1.3.37) gene, we can conclude that it uses the 42187 transaldolase-variant Calvin-Benson-Bassham cycle [36]. Adjacent to these genes are cbbO and cbbQ, 43 44188 consistent with form IAq RuBisCO [37], which is canonically cytoplasmic, rather than carboxysomal. A 45 46189 cluster of 11 genes that encode carboxysome shell proteins and a carboxysome carbonic anhydrase were 47 190 also identified. This cluster is located on the forward strand while on the reverse strand the RuBisCO 48 49191 cluster is located c.9 kb upstream, between which is a divergently transcribed transcriptional regulator 50 51192 (cbbR) [34]. Further evidence of carboxysome expression can be seen in transmission electron 52 micrographs (Fig 1). The carboxysome gene cluster (experimental data would be required to demonstrate 53193 54 55 56 57 58 59 60 61 62 63 64 65 1 2 3 4 5 6 7 8 9 10 11194 if it is an operon or not) in T. thioparus DSM 505T does not start with cbbLS, (as would be found in 12 13195 Halothiobacillus, and Thermithiobacillus spp. [38]) - the lack of these genes has also 14196 been observed in T. denitrificans ATCC 25259T [34] and may be a diagnostic property of this genus. 15 16 17197 Whilst T. thioparus DSM 505T cannot grow anaerobically with denitrification but T. denitrificans does, 18 19198 putative operons encoding the nitrate reductase (nar), nitrite reductase (nir), nitric oxide reductase (nor) 20199 and nitrous oxide reductase (nos) proteins of canonical denitrification are found in both T. denitrificans 21 22200 and T. thioparus – this may indicate a potential for the latter to grow with denitrification albeit not under 23 24201 conditions previously employed – for example, it may occur only under micro-oxic conditions rather than 25 202 fully anoxic conditions, alternatively, some unknown factor may prevent detection of nitrate and/or the 26 27203 expression of these genes. No evidence for any alternative denitrification pathways [56] was found 28 29 30204 With regard to respiration, five cytochromes c553, one cytochrome c556 and three cytochrome b were 31 205 present. The two high-affinity terminal oxidases were both found – with multiple gene copies of 32

33206 cytochrome c oxidase (cbb3, EC 1.9.3.1) and a single copy of cytochrome bd-type quinol oxidase (EC 34 35207 1.10.3.10). It is worth noting that T. denitrificans also possess the aa3 variant of cytochrome c oxidase 36 (EC 1.9.3.1), which may permit it greater metabolic diversity, for example, growth under more variable 37208 38209 oxygen partial pressures - this could potentially explain why T. thioparus cannot denitrify under anoxic 39 40210 conditions, even though it possesses the operons encoding enzymes of denitrification. It is known that 41 42211 microaerophilic organisms usually employ cbb3 or bd-type high-affinity terminal oxidases [56] and this 43212 may further evidence that T. thioparus and T. denitrificans may grow under such conditions, potentially 44 45213 with denitrification, though we cannot find any studies that demonstrate this in vivo thus far. 46 47 48214 Extended insights 49 50215 Enzymes of the Kelly-Trudinger pathway remain poorly understood and many of the genes are yet to be 51216 identified [12]. The oxidation of thiosulfate to tetrathionate takes place via a cytochrome c-linked 52 53217 thiosulfate dehydrogenase (EC 1.8.2.2), one gene (tsdA) for which was identified in Allochromatium 54 55 56 57 58 59 60 61 62 63 64 65 1 2 3 4 5 6 7 8 9 10 11218 vinosum, a member of the [39]. However, tsdA is not present in T. thioparus DSM 12 T 13219 505 supporting the hypothesis that more than one thiosulfate dehydrogenase is present in the 14220 ‘Proteobacteria’ or could vary at Class level. Numerous Kelly-Friedrich pathway genes (soxXYZAB) 15 16221 were present but the remaining of the conserved soxTRS-VW-XYZABCDEFGH genes being absent [40- 17 18222 42]. In spp. () soxYZ encodes a protein complex that binds thiosulfate via 19 223 a cysteine residue in the initial stage of the Kelly-Friedrich pathway while soxXA encode cytochromes c551 20

21224 and c552.5 which capture two electrons from thiosulfate oxidation. Finally soxB encodes a hydrolase that 22 23225 removes the terminal sulfone group as sulfate. Missing the SoxCD or sulfur dehydrogenase protein from 24 226 the multi-enzyme system would leave a sulfur atom attached to the SoxYZ residue, preventing the action 25 26227 of SoxB to liberate this residue to act further with another thiosulfate molecule. An unidentified protein 27 28228 may participate in the release of these sulfur atoms and potentially may explain the deposits of sulfur seen 29 30229 during initial stages of growth. This does not explain the production of tetrathionate which still would 31230 require the activity of thiosulfate dehydrogenase. If soxXYZAB genes are being expressed then they may 32 33231 be required as a functional part of the Kelly-Trudinger pathway. The action of Sox proteins (if any) in T. 34 T 35232 thioparus DSM 505 in conjunction and potentially collaboration with additional Kelly-Trudinger 36233 pathway proteins would undoubtedly be essential in resolving its chemolithoautotrophic metabolism, 37 38234 which remains poorly understood. An unidentified gene encoding a putative DUF302-family protein is 39 T 40235 present between soxA and soxB genes of Thermithiobacillus tepidarius DSM 3134 , Acidithiobacillus 41 236 thiooxidans ATCC 19377T and ATCC 51756T, and Thiohalorhabdus 42 43237 denitrificans DSM 15699T, the function of which may be important in the Kelly-Trudinger pathway [12]; 44 45238 however, DUF302 is not present on the sox operon of T. thioparus DSM 505T, although six unidentified 46 239 genes annotated as DUF302 family proteins are present elsewhere in the genome. The soxEF genes 47 48240 encode and a flavocytochrome c sulfide dehydrogenase (EC 1.8.2.3) and are both found in the T. 49 50241 denitrificans genome, separate from the main sox gene cluster, but they are not found in T. thioparus, 51 52242 whereas dissimilatory sulfite reductase (dsr) genes are found in both genomes, as are adenylyl sulfate 53 54 55 56 57 58 59 60 61 62 63 64 65 1 2 3 4 5 6 7 8 9 10 11243 reductase (aprAB, EC 1.8.99.2) genes, the presence of which has also been confirmed in T. aquaesulis 12 13244 [57]. 14 15245 It was previously noted that 5.9% of the genome (178 genes) of Thermithiobacillus tepidarius DSM 16 17246 3134T were potential horizontally transferred genes from Thiobacillus thioparus, Thiobacillus 18 19247 denitrificans and Sulfuricella denitrificans of the Betaproteobacteria [12]; 96 of these genes came from 20248 the two Thiobacillus spp. However, very little gene transfer has taken place from members of 21 22249 Acidithiobacillia to T. thioparus DSM 505T with only 6 genes from Ttb. tepidarius DSM 3134T, 4 from 23 24250 Acidithiobacillus spp. This is perhaps not surprising due to the thermophilic and acidophilic nature of 25 251 these three Acidithiobacillia compared to the mesophilic requirements of T. thioparus and the 26 27252 unlikelihood of these species co-inhabiting the same environments. A far larger portion (143 genes; 28 29253 4.82 %) of genes were attributed to transfer from members of Gammaproteobacteria, many of which 30 254 grow at more neutral pH and mesophilic temperatures. There was no distinct pattern of any particular 31 32255 metabolism pathways or resistances etc being encoded by these potentially transferred genes. 33 34 35 36256 Conclusions 37 T 38257 The genome of Thiobacillus thioparus DSM 505 gives insights into many aspects of its physiology, 39258 biochemistry and evolution. This organism uses the transaldolase variant of the Calvin-Benson-Bassham 40 41259 cycle and produces carboxysomes for carbon dioxide fixation, evident from both the genome and 42 43260 transmission electron microscopy. The expression of both carboxysomes and RuBisCO may be regulated 44 261 by the same divergently transcribed transcriptional regulator. Smith’s biosynthetic horseshoe is present in 45 46262 lieu of Krebs’ cycle sensu stricto, but this is unusually large as only 2 genes are missing, though T. 47 48263 denitrificans [34] is only missing 1 – this may in part explain the heterotrophic growth of T. aquaesulis 49 264 since just one additional gene would convert Smith’s horseshoe into a functional version of Krebs’ cycle 50 51265 [55]. Many inorganic sulfur-oxidation genes of the sox cluster were found but soxC, soxD, soxE and soxF 52 53266 are absent. The tsdA gene for a thiosulfate dehydrogenase identified in Allochromatium vinosum is absent 54 55 56 57 58 59 60 61 62 63 64 65 1 2 3 4 5 6 7 8 9 10 11267 and confirmation of the presence of a different thiosulfate dehydrogenase enzyme and gene will require 12 13268 further study. The genome sequence will enable evolutionary studies into the nature of Thiobacillus and 14269 chemolithoautotrophs in general, in particular reigniting the obligate versus facultative and the 15 16270 autotrophic versus mixotrophic debates that have been largely absent from the literature in recent years, 17 18271 but genome sequences becoming available will now answer many questions proposed over 10 years ago 19 272 [29,33]. 20 21 22 23273 Competing interests 24 25274 The authors have no competing interests. 26 27 28275 Funding 29 30 31276 The sequencing and annotation was performed under the auspices of the United States Department of 32 33277 Energy JGI, a DOE Office of Science User Facility and is supported by the Office of Science of the 34 278 United States Department of Energy under Contract Number DE-AC02-05CH11231. The authors wish to 35 36279 acknowledge the School of Biological and Marine Sciences, University of Plymouth for studentship 37 38280 funding to LH that supported the analysis of the genome and to the Royal Society Research Grant 39 40281 RG120444 awarded to RB to support the analysis of this genome. 41 42282 Authors' contributions 43 44283 LPH and RB analysed and mined the genome data in public databases for genes of interest and performed 45 46284 BLASTn/BLASTp searches to verify and validate the annotation etc and made comparisons of the sulfur 47 285 oxidation, denitrification etc operons with those in other organisms. RB constructed the phylogenetic tree. 48 49286 LPH grew the organism and performed electron microscopy at the Electron Microscopy Unit, University 50 51287 of Plymouth. All other authors contributed to the sequencing, assembly and annotation of the genome 52 53288 sequence. All authors read and approved the final manuscript. 54 55 56 57 58 59 60 61 62 63 64 65 1 2 3 4 5 6 7 8 9 10 11289 Acknowledgements 12 13290 We acknowledge Dr Hans-Peter Klenk at the DSMZ for the provision of genomic DNA. 14 15 16 17291 References 18 19292 [1] Nathanson A. Über eine neue Gruppe von Schwefelbakterien und ihren Stoffwechsel. Mitt zool Stn 20293 Neapel. 1902;15:655-680. 21 22 23294 [2] Beijerinck MW. Ueber die Bakterien, welche sich im Dunkeln mit Kohlensäure als Kohlenstoffquelle 24295 ernähren können. Centralbl Bakteriol Parasitenkd Infektionskr Hyg Abt II. 1904;11:592-599 25 26 27296 [3] Starkey RL. Isolation of some bacteria which oxidise thiosulfate. Soil Sci. 1934;39:197-219. 28 29297 [4] Boden R, Cleland D, Green PN, Katayama Y, Uchino Y, Murrell JC. & Kelly DP. Phylogenetic 30 31298 assessment of culture collection strains of Thiobacillus thioparus, and definitive 16S rRNA gene 32 299 sequences for T. thioparus, T. denitrificans, and Halothiobacillus neapolitanus. Arch Microbiol. 33 34300 2011;194:187-195. 35 36 301 [5] Justin P, Kelly DP. Growth kinetics of Thiobacillus denitrificans in anaerobic and aerobic chemostat 37 38302 culture. J Gen Microbiol. 1978;107:123-130. 39 40 [6] Wood AP, Kelly DP. Isolation and physiological characterisation of Thiobacillus aquaesulis sp. nov., 41303 42304 a novel facultatively autotrophic moderate thermophile. Arch. Microbiol. 1988;149:339-343. 43 44 45305 [7] Katayama-Fujimura Y, Tsuzaki N, Kuraishi H. Ubiquinone, fatty acid and DNA base composition 46306 determination as a guild to the of the genus Thiobacillus. J Gen Microbiol. 1982;128:1599- 47 48307 1611. 49 50308 [8] Katayama-Fujimura Y, Enokizono Y, Kaneko T, Kuraishi H. Deoxyribonucleic acid homologies 51 52309 among species of the genus Thiobacillus. J Gen Appl Microbiol. 1983;29:287-295. 53 54 55 56 57 58 59 60 61 62 63 64 65 1 2 3 4 5 6 7 8 9 10 11310 [9] Katayama-Fujimura Y, Tsuzaki N, Hirata A, Kuraishi H. Polyhedral inclusion bodies (carboxysomes) 12 13311 in Thiobacillus species with reference to the taxonomy of the Genus Thiobacillus. J Gen Appl Microbiol. 14312 1984;30:211-222. 15 16 17313 [10] Kelly DP, Wood AP. Confirmation of Thiobacillus denitrificans as a species of the genus 18314 Thiobacillus, in the β-subclass of the Proteobacteria, with strain NCIMB 9548 as the type strain. Int J Syst 19 20315 Evol Microbiol. 2000;50:547-550. 21 22 316 [11] Marmur J, Doty P. Determination of the base composition of deoxyribonucleic acid from its thermal 23 24317 denaturation temperature. J Mol Biol. 1962;5:109-118. 25 26 318 [12] Boden R, Hutt LP, Huntemann M, Clum A, Pillay M, Palaniappan K, Varghese N, Mikhailova N, 27 28319 Stamatis D, Reddy T, Ngan CY, Daum C, Shapiro N, Markowitz V, Ivanova N, Woyke T, Kyrpides N. 29 30320 Permanent draft genome of Thermithiobaclillus tepidarius DSM 3134T, a moderately thermophilic, 31 321 obligately chemolithoautotrophic member of the Acidithiobacillia. Standards in Genomic Sciences. 2016; Commented [N4L1]: "Standards in Genomic Sciences" 32 found (1/1) 33322 11:DOI 10.1186/s40793-016-0188-0 34 35 36323 [13] Pagani I, Liolios K, Jansson J, Chen IM, Smirnova T, Nosrat B, et al. The Genomes OnLine 37324 Database (GOLD) v. 4: status of genomic and metagenomic projects and their associated metadata. 38 39325 Nucleic Acids Res. 2012;40:D571–9. 40 41326 [14] Markowitz VM, Chen I-MA, Palaniappan K, Chu K, Szeto E, Pillay M, et al. IMG 4 version of the 42 43327 integrated microbial genomes comparative analysis system. Nucleic Acids Res. 2014;42:D560–7. 44 45 328 [15] Mavromatis K, Land ML, Brettin TS, Quest DJ, Copeland A, Clum A, et al. The fast changing 46 47329 landscape of sequencing technologies and their impact on microbial genome assemblies and annotation. 48 49330 PLoS One. 2012;7, e48837. 50 51331 [16] Bennett S. Solexa Ltd. Pharmacogenomics. 2004;5(4):433–8 52 53 54 55 56 57 58 59 60 61 62 63 64 65 1 2 3 4 5 6 7 8 9 10 11332 [17] Mingkun L, Copeland A, Han J. DUK, unpublished, 2011 12 13333 [18] Mingkun L. kmernorm, unpublished, 2011 14 15 16334 [19] Zerbino D, Birney E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. 17335 Genome Res. 2008;18:821–829. 18 19 20336 [20] Reads simulator wgsim [https://github.com/lh3/wgsim] 21 22337 [21] Gnerre S, MacCallum I. High–quality draft assemblies of mammalian genomes from massively 23 24338 parallel sequence data. PNAS. 2011;108:4 1513–1518. 25 26339 [22] Hyatt D, Chen GL, Lacascio PF, Land ML, Larimer FW, Hauser LJ. Prodigal: prokaryotic gene 27 28340 recognition and translation initiation site identification. BMC Bioinformatics.2010; 11:119. 29 30341 [23] Pati A, Ivanova NN, Mikhailova N, Ovchinnikova G, Hooper SD, Lykidis A, Kyrpides NC. 31 32342 GenePRIMP: a gene prediction improvement pipeline for prokaryotic genomes. Nat Methods 2010; 33 34343 7:455–457. 35 36344 [24] Lowe TM, Eddy SR. tRNAscan–SE: a program for improved detection of transfer RNA genes in 37 38345 genomic sequence. Nucleic Acids Res 1997; 25:955–964. 39 40346 [25] Pruesse E, Quast C, Knittel, Fuchs B, Ludwig W, Peplies J, Glckner FO. SILVA: a comprehensive 41 42347 online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. 43348 Nuc Acids Res 2007; 35: 2188–7196. 44 45 46349 [26] INFERNAL. Inference of RNA alignments. [http://infernal.janelia.org] 47 48350 [27] The Integrated Microbial Genomes (IMG) platform. [http://img.jgi.doe.gov] 49 50 351 [28] Markowitz VM, Mavromatis K, Ivanova NN, Chen IMA, Chu K, Kyrpides NC. IMG ER: a system 51 52352 for microbial genome annotation expert review and curation. Bioinformatics 2009; 25:2271–2278 53 54 55 56 57 58 59 60 61 62 63 64 65 1 2 3 4 5 6 7 8 9 10 11353 [29] Wood AP, Aurikko JP, Kelly DP. A challenge for 21st century molecular biology and biochemistry: 12 13354 what are the causes of obligate autotrophy and methanotrophy. FEMS Microbiol Rev. 2004;28:335-352. 14 15355 [30] Peeters TL, Liu MS, Aleem MIH. The tricarboxylic acid cycle in Thiobacillus denitrificans and 16 17356 Thiobacillus-A2. J. Gen. Microbiol. 1970;64:29-35. 18 19357 [31] Kelly DP, Wood AP. Reclassification of some species of Thiobacillus to the newly designated 20 21358 genera Acidithiobacillus gen. nov., Halothiobacillus gen. nov. and Thermithiobacillus gen. nov. Int. J. 22 359 Syst. Evol. Microbiol. 2000;50:511-516. 23 24360 [32] Hudson C, Williams K, Kelly DP. Definitive assignment by multigenome analysis of the 25 26361 Gammaproteobacterial Genus Thermithiobacillus to the Class Acidithiobacillia. Pol. J. Microbiol. 27 362 2014;63:245-247. 28 29363 [33] Smith AJ, London J, & Stanier RY. Biochemical basis of obligate autotrophy in blue- green algae 30 31364 and thiobacilli. J. Bacteriol. 1967;94:972-983. 32 33365 [34] Beller HR, Chain PSG, Letain TE, Chakicherla A, Larimer FW, Richardson PM, Coleman MA, 34 35366 Wood AP, Kelly DP. The genome sequence of the obligately chemolithoautotrophic, facultatively 36 37367 anaerobic bacterium Thiobacillus denitrificans. J. Bacteriol. 2006;188:1473-1488. 38 39368 [35] Taylor BF, Hoare DS. Thiobacillus denitrificans as an obligate chemolithoautotroph. Cell suspension 40 41369 and enzymic studies. Arch. Mikrobiol. 1971;80:262–276. 42 43370 [36] Anthony C. The biochemistry of methylotrophs. 1st ed. London: Academic Press; 1982. 44 45 371 [37] Badger MR, Bek EJ. Multiple RuBisCO forms in proteobacteria: their functional significance in 46

47372 relation to CO2 acquisition by the CBB cycle. J Exp Bot. 2008;59:1525-1541. 48 49 373 [38] Sutter M, Roberts EW, Gonzalez RC, Bates C, Dawoud S, Landry K, Cannon GC, Heinhorst S, 50 51374 Kerfeld CA. Structural characterization of a newly identified component of α-carboxysomes: the AAA+ 52 53375 domain protein CsoCBBQ. Sci. Rep. 2015;5:16243. 54 55 56 57 58 59 60 61 62 63 64 65 1 2 3 4 5 6 7 8 9 10 11376 [39] Denkmann K, Grein F, Zigann R, Siemen A, Bergmann J, van Helmont S, Nicolai A, Pereira IAC, 12 13377 Dahl C. Thiosulfate dehydrogenase: a widespread unusual acidophilic c-type cytochrome. Environmental 14378 Microbiology. 2012;14:2673-2688. 15 16 17379 [40] Wodara C, Kostka S, Egert M, Kelly DP, Friedrich CG. (1994) Identification and sequence analysis 18380 of the soxB gene essential for sulfur oxidation of Paracoccus denitrificans GB17. J. Bacteriol. 19 20381 1994;176:6188-6191. 21 22 382 [41] Wodara C, Bardischewsky F, Friedrich CG. Cloning and characterization of sulfite dehydrogenase, 23 24383 two c-type cytochromes, and a flavoprotein of Paracoccus denitrificans GB17: essential role of sulfite 25 26384 dehydrogenase in lithotrophic sulfur oxidation. J. Bacteriol. 1997;179:5014-5023. 27 28385 [42] Friedrich CG, Quentmeier A, Bardischewsky F, Rother D, Kraft R, Kostka S, Prinz H. Novel genes 29 30386 coding for lithotrophic sulfur oxidation of Paracoccus pantotrophus GB17. J. Bacteriol. 2000;182:4677- 31 32387 4687. 33 34388 [43] Field D, Garrity G, Gray T, Morrison N, Selengut J, Sterk P, et al. The minimum information about a 35 36389 genome sequence (MIGS) specification. Nat Biotechnol. 2008;26:541–7. 37 38390 [44] Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene ontology: tool for the 39 40391 unification of biology. The Gene Ontology Consortium Nat Genet. 2000;25:25–9. 41 42392 [45] Guide to GO evidence codes [http://www.geneontology.org/GO.evidence.shtml] 43 44 393 [46] Tamura K, Stecher G, Peterson D, Filipski A, Sumar S. MEGA6: Molecular Evolutionary Generics 45 46394 Analysis version 6.0. Mol Biol Evol 2013;30:2725-2719. 47 48 395 [47] Tamura K, Nei M. Estimation of the number of nucleotide substitutions in the control region of 49 50396 mitochondrial DNA in humans and chimpanzees. Mol Biol Evol 1993;10:512-526. 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 1 2 3 4 5 6 7 8 9 10 11397 [48] Woese CR, Kandler O, Wheelis ML. Towards a natural system of organisms: proposal for the 12 13398 domains Archaea, Bacteria, and Eucarya. Proc Natl Acad Sci U S A. 1990;87:4576 [PubMed]. 14 15399 [49] Garrity GM, Bell JA, Lilburn T, Phylum XIV. Proteobacteria phyl. nov. In: Garrity GM, Brenner DJ, 16 17400 Krieg NR, Staley JT, editors. Bergey’s Manual of Systematic Bacteriology, vol. 2. Second ed. New York: 18401 Springer; 2005. p. 1. Part B. 19 20 21402 [50] Garrity GM, Bell JA, Lilburn T. "Class II. Betaproteobacteria class. nov." In: D.J. Brenner, N.R. 22 403 Krieg, J.T. Staley, and G.M. Garrity (eds), Bergeys Manual of Systematic Bacteriology, second edition, 23 24404 vol. 2 (The Proteobacteria) part C (The Alpha-, Beta-, Delta-, and Epsilonproteobacteria), Springer, New 25 26405 York, 2005; 575. 27 28406 [51] Garrity GM, Bell JA, Lilburn T. "Order II. Hydrogenophilales ord. nov." In: D.J. Brenner, N.R. 29 30407 Krieg, J.T. Staley, and G.M. Garrity (eds), Bergeys Manual of Systematic Bacteriology, second edition, 31 32408 vol. 2 (The Proteobacteria), part C (The Alpha-, Beta-, Delta-, and Epsilonproteobacteria), Springer, 33409 New York, 2005; 763. 34 35 36410 [52] Garrity GM, Bell JA, Lilburn T. "Family I. fam. nov." In: D.J. Brenner, N.R. 37411 Krieg, J.T. Staley, and G.M. Garrity (eds), Bergeys Manual of Systematic Bacteriology, second edition, 38 39412 vol. 2 (The Proteobacteria), part C (The Alpha-, Beta-, Delta-, and Epsilonproteobacteria), Springer, New 40 41413 York, 2005;763. 42 43414 [53] Skerman VBD, McGowan V, Sneath PHA. Approved list of Bacterial names. Int J Syst Evol 44 45415 Microbiol. 1980;30:225-420. 46 47416 [54] Kelly D P, Wood AP, Stackebrandt E. (2005). Genus II. Thiobacillus Beijerinck 1904. In Bergey’s 48 49417 Manual of Systematic Bacteriology, Vol. 2, 2nd edn., pp. 764–769. Edited by D. J. Brenner, N. R. Krieg, 50 J. T. Staley & G. M. Garrity. The Proteobacteria, Part C, The Alpha-, Beta-, Delta-, and 51418 52419 Epsilonproteobacteria. New York:Springer. 53 54 55 56 57 58 59 60 61 62 63 64 65 1 2 3 4 5 6 7 8 9 10 11420 [55] Wood AP, Kelly DP. Isolation and characterization of Thiobacillus aquaesulis sp. nov., a novel 12 13421 facultatively autotrophic moderate thermophile. Arch Microbiol 1988;149:339-343. 14 15422 [56] Chen J, Strous M. Denitrification and aerobic respiration, hybrid electron transport chains and co- 16 17423 evolution. Biochim Biophys Acta Bioenergetics. 2013;1827:136-144. 18 19424 [57] Mayer B, Kuever J. Molecular analysis of the distribution and phylogeny of dissimilatory adenosine 20 21425 5'-phosphosulfate reductase-encoding genes (aprBA) among sulfur-oxidizing prokaryotes. Microbiology 22 426 (UK). 2007;153:3478-3498. 23 24 25427 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 1 2 3 4 5 6 7 8 9 10 11428 Table 1

12 T 13429 Table 1. Classification and general features of Thiobacillus thioparus DSM 505 according to 14430 MIGS recommendations [43] Evidence 15 MIGS ID Property Term codea 16 Classification Domain Bacteria TAS [48] 17 Phylum ‘Proteobacteria’ TAS [49] 18 Class Betaproteobacteria TAS [50] 19 Order Hydrogenophilales TAS [51] 20 Family Hydrogenophilaceae TAS [52] 21 22 Genus Thiobacillus TAS [2, 3]

23 Species Thiobacillus thioparus TAS [2, 3] T 24 (Type) strain: DSM 505 TAS [53]

25 Gram stain Negative TAS [3] 26 Cell shape Rod TAS [2, 3] 27 Motility Motile TAS [2, 3] 28 Sporulation None TAS [3, 55] 29 Temperature range N.D. NAS 30 Optimum temperature 28 °C TAS [3, 54] 31 pH range; Optimum N.D.; 7.0 NAS 32 Carbon source Carbon dioxide TAS [3, 54] 33 MIGS -6 Habitat Agricultural soil (sandy loam) TAS [3] 34 MIGS-6.3 Salinity N.D. NAS 35 MIGS-22 Oxygen requirement Aerobic TAS [3, 54] 36 MIGS-15 Biotic relationship Free-living TAS [3, 54] 37 MIGS-14 Pathogenicity Non-pathogen NAS 38 MIGS-4 Geographic location New Jersey, United States of America TAS [3] 39 MIGS-5 Sample collection 1934 TAS [3] 40 MIGS-4.1 Latitude 40° 28' 57'' N TAS [3] 41 42 MIGS-4.2 Longitude 74° 26' 14'' E TAS [3] 43 MIGS-4.4 Altitude 28 m TAS [3] 44431 a Evidence codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in 432 the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but 45433 based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the 46434 Gene Ontology project [44,45]. 47435 48 49 50436 51 52437 53 54 55 56 57 58 59 60 61 62 63 64 65 1 2 3 4 5 6 7 8 9 10 11438 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 1 2 3 4 5 6 7 8 9 10 11439 Table 2 12 13440 Table 2. Project information. 14 MIGS ID Property Term 15 MIGS 31 Finishing quality Improved High-Quality Draft 16 MIGS-28 Libraries used Illumina Standard PE 17 MIGS 29 Sequencing platforms Illumina 18 MIGS 31.2 Fold coverage 122.7 19 MIGS 30 Assemblers Allpaths/Velvet 20 MIGS 32 Gene calling method NCBI Prokaryotic Genome Annotation Pipeline 21 Locus Tag B058 22 Genbank ID ARDU00000000 23 GenBank Date of Release April 16th, 2013 24 GOLD ID Ga0025551 25 BIOPROJECT PRJNA169730 26 MIGS 13 Source Material Identifier DSM 505T 27 Project relevance GEBA-KMG 28 29441 30442 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 1 2 3 4 5 6 7 8 9 10 11443 Table 3 12 13444 Table 3. Genome statistics. 14 Attribute Value % of Total 445 15 Genome size (bp) 3,201,518 100.00446 16 DNA coding (bp) 2,937,381 91.75 17 DNA G+C (bp) 1,994,510 62.30447 18 DNA scaffolds 18 100.00 Commented [BSBCL2]: Please include the percentage of 19 Total genes 3,197 100.00448 the total number of bp that this genome attribute accounts for 20 Protein coding genes 3,135 98.06 449 21 RNA genes 62 1.94 22 Pseudo genes 538 16.83 23 450 Genes in internal clusters 267 8.35 24 Genes with function prediction 2,597 81.23 25 451 26 Genes assigned to COGs 2,258 70.63 27 Genes with Pfam domains 2,700 84.45452 28 Genes with signal peptides 376 11.76 453 29 Genes with transmembrane helices 767 23.99 30 CRISPR repeats 2 0.04 Commented [BSBCL3]: Please add the percentage of 454 the total number of bp that this genome attribute accounts 31 for to the adjacent column 32455 33 34456 35 36457 37 38458 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 1 2 3 4 5 6 7 8 9 10 11459 Table 4 12 13460 Table 4. Number of genes associated with general COG functional categories. 14 Code Value %age Description 15 J 205 8.2 Translation, ribosomal structure and biogenesis 16 A 1 0.0 RNA processing and modification 17 K 120 4.8 Transcription 18 L 83 3.3 Replication, recombination and repair 19 B 1 0.0 Chromatin structure and dynamics 20 D 38 1.5 Cell cycle control, Cell division, chromosome partitioning 21 V 61 2.4 Defense mechanisms 22 T 156 6.2 Signal transduction mechanisms 23 24 M 225 9.0 Cell wall/membrane biogenesis 25 N 101 4.0 Cell motility 26 U 56 2.2 Intracellular trafficking and secretion 27 O 144 5.8 Posttranslational modification, protein turnover, chaperones 28 C 207 8.3 Energy production and conversion 29 G 86 3.4 Carbohydrate transport and metabolism 30 E 153 6.1 Amino acid transport and metabolism 31 F 63 2.5 Nucleotide transport and metabolism 32 H 144 5.8 Coenzyme transport and metabolism 33 I 76 3.0 Lipid transport and metabolism 34 P 206 8.2 Inorganic ion transport and metabolism 35 Q 37 1.5 Secondary metabolites biosynthesis, transport and catabolism 36 R 165 6.6 General function prediction only 37 S 143 5.7 Function unknown 38 - 939 29.4 Not in COGs 39 40461 The total is based on the total number of protein coding genes in the genome. 462 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 1 2 3 4 5 6 7 8 9 10 11463 Figure legends 12464 13465 Figure 1. Maximum-likelyhood phylogenetic tree based on MUSCLE alignment of 16S rRNA gene 14 15466 sequences of the genus Thiobacillus and the closely related members of the Betaproteobacteria. Type 16 17467 strains of each species are used and only species with validly published names are shown. 18 19468 Sequences pertaining to organisms for which a publically available genome sequence exists are 20 21469 underlined. Accession numbers for the GenBank database are in parentheses. Alignment and tree were 22 23470 constructed in MEGA 6 [46]. Tree was drawn using the Tamura-Nei model for maximum-likelyhood 24471 trees [47]. Values at nodes are based on 5,000 bootstrap replicates, with values <70 % omitted. Scale-bar 25 26472 indicates 2 substitutions per 100. Thermithiobacillus tepidarius DSM 3134T from the Acidithiobacillia is 27 28473 used as the outgroup. 29 30474 Figure 2. Transmission electron micrographs of T. thioparus DSM 505T cells obtained from a thiosulfate- 31 32475 limited chemostat (20mM, D = 0.07 h-1) visualized in a JEOL JEM-1400Plus transmission electron 33 34476 microscope, operating at 120 kV. 35 36 477 (A) Negatively stained cells. Cells were applied to Formvar and carbon coated copper grid before 37 38478 washing with saline and staining in 50 mM uranyl acetate for 5 mins and washing again. 39 40 41479 (B) Sectioned cells showing the presence of an electron dense polyphosphate (‘volutin’) granule and 42 480 numerous polyhedral carboxysomes that are paler in comparison. 43 44 45481 46 47482 48 49483 50484 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 Figure 1 Click here to download Figure R1 new tree.tif Figure 2 Click here to download Figure higher qual Figure EM images.tif