Comparative Genomics of Ethanol-Producing Species
R-038

C. L. Hemme (1,2), Q. He (3), Y. Deng (1,2), Q. Tu (1,2), H. Mouttaki (1,2), Z. He (1,2), K. Barry (4), E. H. Saunders (5), H. Sun (5), M. Land (6), L. Hauser (6), A. Lapidus (4), C. S. Han (5), J. Wiegel (7), R. Tanner (2), Lee Lynd (8), P. Lawson (2), M. W. Fields (9), A. Arkin (10), C. Schadt (11), B. S. Stevenson (2), M. McInerney (2), Y. Yang (11), H. Dong (12), R. Huhnke (13), J. R. Mielenz (11), S.-Y. Ding (14), M. Himmel (14), S. Taghavi (15), D. van der Lelie (15), E. Rubin (4) and J. Zhou (1,2)

(1) Institute for Environmental Genomics, University of Oklahoma, Norman, OK
(2) Department of , University of Oklahoma, Norman, OK
(3) Department of Civil and Environmental Engineering, University of Tennessee, Knoxville, TN
(4) DOE Joint Genome Institute, Walnut Creek, CA
(5) DOE Joint Genome Institute, Los Alamos National Laboratory, Los Alamos, NM
(6) Genome Analysis and Systems Modeling Group, Biosciences Division, Oak Ridge National Laboratory, Oak Ridge, TN
(7) Department of Microbiology, University of Georgia, Athens, GA
(8) Dartmouth College, Hanover, NH
(9) Montana State University, Bozeman, MT
(10) University of California-Berkeley, Berkeley, CA
(11) Oak Ridge National Laboratory, Oak Ridge, TN
(12) Miami University, Oxford, OH
(13) Oklahoma State University, Stillwater, OK
(14) National Renewable Energy Laboratory, Golden, CO
(15) Brookhaven National Laboratory, Upton, NY

Introduction

The genomes of two ethanol-producing Thermoanaerobacter strains (T. pseudethanolicus 39E and Thermoanaerobacter sp. X514) were sequenced and compared to an outlier strain, Caldanaerobacter subterraneus subsp. tengcongensis MB4 (formerly T. tengcongensis MB4). The Thermoanaerobacter strains showed many characteristics distinguishing the genus from Caldanaerobacter, including the ability to metabolize xylan and xylose, the utilization of an archaeal V-type ATPase, and the use of a bifunctional 2° alcohol dehydrogenase/aldehyde dehydrogenase enzyme for ethanol production. X514 displays lineage-specific adaptations for survival in low-nutrient subsurface groundwater, including expansion of gene families for xylose uptake and cation efflux, and the acquisition by lateral transfer of additional carbon metabolism genes that are predicted to affect the flux of carbon through the fermentation pathway. X514 also encodes a complete set of genes for de novo biosynthesis of vitamin B12, and evidence suggests that this pathway was ancestral to the three strains and maintained only in X514 as part of a general low-nutrient survival strategy. The genomes suggest several properties that may have implications for cellulosic ethanol production in consolidated bioprocessing (CBP) schemes. Further analysis of all sequenced Clostridia genomes reveals unique properties of the Clostridia and suggests novel phylogenetic relationships, including a possible common origin of Thermoanaerobacter and C. thermocellum.

[Figure: Geographical Location of Finished Isolates]

Characteristics of Thermoanaerobacter Genomes

Genome labels: C. subterraneus subsp. tengcongensis MB4; Thermoanaerobacter sp. X514; T. pseudethanolicus 39E.

A) The Thermoanaerobacter genomes display characteristics typical of Clostridia genomes, including small genome size (<3 Mb), low G+C%, strong GC and purine skews between the leading and lagging strands, and exceptionally strong leading-strand bias. Despite a high degree of gene conservation and local synteny, the X514 genome has undergone numerous lineage-specific large-scale inversions. Rings are labeled as follows, from outer to inner: Ring 1, COG assignments; Ring 2, lateral gene transfer events (red), known phage genes (grey) and pseudogenes (blue); Ring 3, mean-centered GC content (red/blue); Ring 4, mean-centered GC skew (red/green); Ring 5, mean-centered purine skew (purple/orange); Ring 6, mean-centered synonymous codon usage order (SCUO); Ring 7 (X514 only), whole-genome alignment of X514 and 39E using the MAUVE aligner (red indicates same orientation, blue indicates opposite orientation).
B) Genome rearrangements between 39E and MB4 and between 39E and X514.
C) Model for evolution of the Thermoanaerobacter genome.

Genomic Characteristics

                            MB4             39E             X514
Genome size (bp)            2689445         2362816         2457259
%G+C                        37.6            34.5            34.5
%G+A                        49.9            50.3            51.8
# Protein-coding genes      2588            2243            2349
Leading-strand CDS          2237 (86.4%)    1969 (87.8%)    2053 (87.4%)
Structural RNAs             68              75              73
Replichore arm asymmetry    0.50            0.52            0.67
Pseudogenes                 26              48              57

[Figure: Orthologs and Unique Genes]

Comparative Genomics of Clostridia: A Novel Relationship Between Thermoanaerobacter and Cluster III Cellulolytic Clostridia

A) A genome tree constructed from concatenated alignments of 12 housekeeping genes for all sequenced Clostridia strains suggests novel relationships. The tree clearly delineates true Clostridium species and suggests monophyletic clades encompassing Dorea/Ruminococcus/Clostridium phytofermentans, Alkaliphilus/Finegoldia/Clostridium difficile, and Thermoanaerobacter/Cluster III cellulolytic Clostridia. The tree suggests that significant reclassification efforts are necessary. See also Gupta and Gao, 2009.
B) The Thermoanaerobacter/Cluster III cellulolytic Clostridia clade suggests a common thermophilic ancestor possibly capable of cellulose degradation. Furthermore, the positioning of the Clostridium cellulolyticum/Clostridium papyrosolvens strains suggests a loss of thermophily in this lineage. Lineages are labeled as follows: red, non-cellulolytic thermophiles; purple, cellulolytic thermophiles; blue, cellulolytic mesophiles; yellow, pathogens/commensals; green, cellulolytic commensals. Strains sequenced by the Zhou laboratory are labeled in red.

Comparative Genomics of Clostridia: Properties of Clostridia Genomes

Analysis of the genomic properties of finished Clostridia genomes shows several trends.
A, B) Clustering of genomes by PCA complements the previous phylogenomics analysis. True Clostridium species cluster together, while outlier species (e.g. C. phytofermentans) display different properties. Most of the variability in the genomes is accounted for by nucleotide composition (%G+C, GC skew, etc.) and CDS skew.
C) Effects of replication-associated pressure (RAP) and transcription-associated pressure (TAP) in Clostridia species, as measured by plotting the GC skew of coding sequences resulting from RAP (σGR) and TAP (σGT) against the total GC skew of coding sequences (χGcd). Analysis of all bacterial genomes suggests that the contribution of RAP to χGcd across the bacterial spectrum is ~60% (Chen and Chen, 2007). However, TAP accounts for ~70% of the GC skew in Clostridia genomes.

X514 and 39E Exhibit Different Ethanol Yields in Co-Culture with C. thermocellum. Why?

Possible Explanations:
- Increased carbon uptake
- Increased carbon flux
- Modified regulatory networks
- Co-factor biosynthesis

X514 produces more ethanol in co-culture with C. thermocellum than the corresponding co-culture with 39E. Several physiological and genetic traits suggest possible explanations for this observation. X514 shows an apparent lineage-specific expansion of xylose transporters. CAI analysis for 39E (A) and X514 (B) suggests that carbohydrate transporters are highly expressed in both strains and that lineage-specific xylose transporters in X514 are highly expressed. Thus, X514 may have a greater baseline rate of xylose uptake. The genomic region encoding xylan degradation, xylose transport and xylose metabolism (C) is highly variable between Thermoanaerobacter species. By dint of living in a nutrient-poor environment, X514 has apparently retained a complete de novo vitamin B12 biosynthesis pathway (D). B12 has been shown to enhance chitinolytic activity in some C. thermocellum strains and is an essential nutrient in some C. thermocellum co-cultures; thus the ability of X514 to synthesize its own B12 may play a role in cellulose degradation rates and ethanol yields in co-culture. All of these possibilities are currently being explored using experimental techniques.

This research was funded by the State of Oklahoma through the Oklahoma Bioenergy Center (OBC) program. Genome sequencing was performed through the DOE-JGI Community Sequencing Program.
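As an illustration of the composition statistics reported in the Genomic Characteristics table (%G+C, %G+A) and the mean-centered GC skew plotted in the genome maps, the following is a minimal sketch and not the pipeline used for this poster. It assumes the common definition GC skew = (G - C)/(G + C) computed in non-overlapping windows; the function names and the window size are hypothetical choices made for the example.

```python
from collections import Counter


def base_counts(seq):
    """Count A, C, G and T in a DNA string; other symbols (e.g. N) are ignored."""
    counts = Counter(seq.upper())
    return counts["A"], counts["C"], counts["G"], counts["T"]


def composition(seq):
    """Return (%G+C, %G+A) for a DNA string, as percentages of A/C/G/T bases."""
    a, c, g, t = base_counts(seq)
    total = a + c + g + t
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (g + c) / total, 100.0 * (g + a) / total


def gc_skew(seq):
    """GC skew (G - C)/(G + C); defined here as 0.0 when the window has no G or C."""
    _, c, g, _ = base_counts(seq)
    return (g - c) / (g + c) if (g + c) else 0.0


def mean_centered_gc_skew(seq, window):
    """GC skew per non-overlapping window, with the mean over all windows subtracted.

    The window size is an assumption of this sketch; trailing bases that do not
    fill a complete window are dropped.
    """
    skews = [gc_skew(seq[i:i + window])
             for i in range(0, len(seq) - window + 1, window)]
    if not skews:
        return []
    mean = sum(skews) / len(skews)
    return [s - mean for s in skews]


if __name__ == "__main__":
    toy = "GGGGGGCC"  # toy sequence, not real genome data
    print(composition(toy))
    print(mean_centered_gc_skew(toy, window=4))
```

For a real genome, the sequence would be read from a FASTA file and a window of several kilobases would be more typical; a sign change in the windowed skew is commonly used to locate the replication origin and terminus.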