Supplementary Information

Descriptions of proposed taxa

Description of Candidatus Mnemosynella biddleae (gen. nov., sp. nov.)
Candidatus Mnemosynella biddleae (Mne.mo.sy.nel’la. N.L. fem. dim. n. Mnemosynella, little Mnemosyne, after the Titaness of Greek mythology, goddess of memory and mother of the Muses; a reference to the species containing multiple methanogenesis markers and “remembering” its methanogenic ancestry; bidd’le.ae. N.L. gen. fem. n. biddleae, named after Jennifer Biddle, in honor of her contributions to microbial ecology).

Description of Candidatus Mnemosynella hypogeia (gen. nov., sp. nov.)
Candidatus Mnemosynella hypogeia (Description as above for the genus Mnemosynella. hy.po.gei’a. N.L. fem. adj. hypogeia, subterraneous, earth-born)

Description of Mnemosynellaceae (fam. nov.)
(Mne.mo.sy.nel.la.ce’ae. N.L. fem. n. Mnemosynella, a Candidatus genus name; -aceae, ending to denote a family; N.L. fem. pl. n. Mnemosynellaceae, the Mnemosynella family)

Description of Mnemosynellales (ord. nov.)
(Mne.mo.sy.nel.la’les. N.L. fem. n. Mnemosynella, a Candidatus genus name; -ales, ending to denote an order; N.L. fem. pl. n. Mnemosynellales, the Mnemosynella order)

Description of Candidatus Hecatella orcuttiae (gen. nov., sp. nov.)
Candidatus Hecatella orcuttiae (He.ca.tel’la. N.L. fem. dim. n. Hecatella, little Hecate, after the goddess of witchcraft and crossroads in Greek mythology; a reference to the species’ metabolism being at the “crossroads” of methanogenesis and the Wood-Ljungdahl pathway by means of its Mtr complex; or.cut’ti.ae. N.L. gen. fem. n. orcuttiae, named after Beth Orcutt, in honor of her contributions to microbial ecology, including studies on the Juan de Fuca Ridge).

Description of Hecatellaceae (fam. nov.)
(He.ca.tel.la.ce’ae. N.L. fem. n. Hecatella, a Candidatus genus name; -aceae, ending to denote a family; N.L. fem. pl. n.
Hecatellaceae, the Hecatella family)

Description of Hecatellales (ord. nov.)
(He.ca.tel.la’les. N.L. fem. n. Hecatella, a Candidatus genus name; -ales, ending to denote an order; N.L. fem. pl. n. Hecatellales, the Hecatella order)

Description of Candidatus Geothermarchaeum rappei (gen. nov., sp. nov.)
Candidatus Geothermarchaeum rappei (Ge.o.therm.ar.chae’um. Gr. fem. n. gê, the earth; Gr. masc. adj. thermos, hot; N.L. neut. n. archaeum, ancient one, archaeon, from Gr. masc. adj. archaîos, ancient; N.L. neut. n. Geothermarchaeum, an archaeon from hot earth; rap.pe’i. N.L. gen. masc. n. rappei, named after Michael Rappé, in honor of his contributions to microbial ecology, including obtaining the original samples from the Juan de Fuca Ridge used in this study).

Description of Candidatus Scotarchaeum ottlingeri (gen. nov., sp. nov.)
Candidatus Scotarchaeum ottlingeri (Scot.ar.chae’um. Gr. masc. n. skotos, darkness; archaeum, ancient one, archaeon, from Gr. masc. adj. archaîos, ancient; N.L. neut. n. Scotarchaeum, an archaeon living in darkness; ott.lin’ge.ri. N.L. gen. masc. n. ottlingeri, named after Markus Ottlinger, German visual artist depicting hydrothermal settings and biofilms).

Description of Geothermarchaeaceae (fam. nov.)
(Ge.o.therm.ar.chae.a.ce’ae. N.L. neut. n. Geothermarchaeum, a Candidatus genus name; -aceae, ending to denote a family; N.L. fem. pl. n. Geothermarchaeaceae, the Geothermarchaeum family)

Description of Geothermarchaeales (ord. nov.)
(Ge.o.therm.ar.chae.a’les. N.L. neut. n. Geothermarchaeum, a Candidatus genus name; -ales, ending to denote an order; N.L. fem. pl. n. Geothermarchaeales, the Geothermarchaeum order)

Description of Candidatus Pyrohabitans jungbluthii (gen. nov., sp. nov.)
Candidatus Pyrohabitans jungbluthii (Py.ro.ha’bi.tans. Gr. neut. n. pŷr, fire; L. pres. part. habitans, inhabiting; N.L. masc. n.
Pyrohabitans, an inhabitant of fire; jung.blu’thi.i. N.L. gen. masc. n. jungbluthii, named after Sean Jungbluth, in honor of his contributions to microbial ecology, including obtaining the original samples from the Juan de Fuca Ridge used in this study).

Supplementary Methods

Detailed description of homology search methods for Eha, Ehb, Hcg
Since several of the proteins in the Eha, Ehb, and Hcg sets were either DUFs or part of the lists from Gao & Gupta¹, we could compare the single-gene distributions and phylogenies to identify potential issues in the homology searches for the more problematic (poorly annotated, fast-evolving, etc.) proteins in each complex or pathway. HMM searches often produced >1000 hits for most subunits, so isolating our homologs of interest would have required considerable computational power, time, and manual work. For that reason, we instead searched for homologs using DIAMOND blastp with two seeds at the taxonomic extremes of each complex or pathway. For Eha the seeds were Methanothermobacter marburgensis (Methanobacteriales) and Methanolacinia petrolearia (Methanomicrobiales), for Ehb M. marburgensis and Ca. Methanosuratus petracarbonis (Verstraetearchaeota), and for Hcg M. marburgensis and Desulfurobacterium thermolithotrophum (Desulfurobacteriales). M. marburgensis was used as a reference for picking the first seed, using information from ref. 2. Other known homologs from the literature in Methanococcus maripaludis, Methanocaldococcus jannaschii, Methanothermobacter thermoautotrophicus, and Methanopyrus kandleri were used as (additional) seeds in some cases. When we obtained fewer homologs than expected (i.e. clades missing partially or entirely), we expanded the HMM profiles or DIAMOND seeds by using all hits of the previous search round to create a new HMM profile or collection of seeds, and reran the search.
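The seed-expansion step described above can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function name `expand_seeds`, the `search` callable (a stand-in for one DIAMOND blastp or HMM search round), the stopping rule (no new hits, or a maximum number of rounds), and the toy sequence identifiers are all assumptions made for illustration.

```python
from typing import Callable, Iterable, Set, Tuple


def expand_seeds(
    initial_seeds: Iterable[str],
    search: Callable[[Set[str]], Set[str]],
    max_rounds: int = 3,
) -> Tuple[Set[str], int]:
    """Rerun a homology search, reusing all hits so far as seeds for the next round.

    Stops when a round adds no new hits or when max_rounds is reached
    (both stopping rules are assumptions of this sketch).
    Returns the pooled sequences (seeds included) and the number of rounds run.
    """
    pool = set(initial_seeds)
    rounds = 0
    for _ in range(max_rounds):
        hits = search(pool)      # one search round using the current seed pool
        rounds += 1
        new_hits = hits - pool   # keep only sequences not seen before
        if not new_hits:
            break
        pool |= new_hits
    return pool, rounds


# Hypothetical toy data: each sequence "hits" the neighbours listed for it.
TOY_NEIGHBOURS = {
    "seedA": {"h1"},
    "seedB": {"h2"},
    "h1": {"h3"},
    "h2": set(),
    "h3": set(),
}


def toy_search(seeds: Set[str]) -> Set[str]:
    """Stand-in for a real search tool: union of the neighbours of all seeds."""
    found: Set[str] = set()
    for seed in seeds:
        found |= TOY_NEIGHBOURS.get(seed, set())
    return found


pool, rounds = expand_seeds({"seedA", "seedB"}, toy_search)
```

In this toy run the distant homolog `h3` is reachable only through `h1`, so it is recovered in the second round; the third round adds nothing and the loop stops. In practice the `search` callable would wrap the actual search tool, and the pooled hits would still need the manual alignment and phylogeny checks described in the text.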
To clean up the datasets and retain our homologs of interest, we first aligned the initial pool of hits with MUSCLE³. Then in Seaview⁴ we manually inspected the alignments, constructed preliminary phylogenies with BioNJ⁵ (Poisson or Observed distances) and/or PhyML⁶ (default options, no topology optimization), and isolated (monophyletic) clades containing our homologs of interest. None of these initial phylogenies were retained, since they were made on a trial-and-error basis, but they can be reproduced from the datasets in the Supplementary Data. We confirmed suspected gene losses by manually comparing the hits of the homology searches with the synteny in each taxon. Below we list for each gene whether we used the base seeds for a DIAMOND search, or whether it was found in this study as a DUF or in Gao & Gupta¹, with the corresponding Pfam or arCOG accession for the HMM profile. We also detail any deviations in the homology search methodology.

Eha
EhaA: (Gao & Gupta, 2007), PF17367, custom HMM from all the hits of the original search
EhaB: (Gao & Gupta, 2007), arCOG04828
EhaC: DUF2109
EhaD: DUF2108
EhaE: DUF2107
EhaF: DUF2106
EhaG: DUF2105
EhaH: PF10125, found from synteny
EhaI: (Gao & Gupta, 2007) contained only arCOG05034. From Uniprot cross-references for multiple sequences, we found HMM profiles for arCOG05034 and arCOG06464 and pooled the hits together. Sequences from Mnemosynellales, Persephonarchaea, and M. kandleri were not found through HMM searches, so we added them manually from expected synteny and BLAST searches against NCBI. We pooled all these sequences for recursive DIAMOND searches, using the hits from the previous search as seeds. We stopped after 3 rounds of searches, since the number of hits decreased afterwards.
EhaJ: base seeds
EhaK: (Gao & Gupta, 2007) contained only arCOG08277.
From Uniprot cross-references, we found HMM profiles for arCOG06676, arCOG08277, arCOG10247, and arCOG60928, and proceeded as with EhaI. Only 1 round of DIAMOND searches was necessary, as we found no further hits afterwards.
EhaL: DUF2104, custom HMM from all the hits of the original search.
EhaM: DUF1959, custom HMM from all the hits of the original search.
EhaN: base seeds
EhaO: base seeds
EhaP & EhaQ: In M. kandleri, the gene called EhaP in Uniprot (AAM01675.1) is an unrelated ferredoxin, distantly related to RnfB homologs in Firmicutes but also to its own EhaR* (AAM01674.1). The real EhaP homolog in M. kandleri is called EhaQ (AAM01677.1). However, when running BLASTp against NCBI non-redundant with the M. marburgensis EhaQ, the first M. kandleri hit, at twilight-zone homology, is AAM01676.1, listed in Uniprot as EhbK (marked with an asterisk in Figure 3a). Using M. maripaludis as the query, the first M. kandleri hit is the ferredoxin AAM01674.1. We did not find either of these sequences in our local DIAMOND searches, even though they could be considered EhaQ-like. Many Methanococcales also carry an EhaQ-like homolog that was not picked up in our homology searches but is part of the Eha cluster (AAB98510.1 in M. jannaschii, marked with an asterisk in Figure 3a).
EhaP: base seeds, plus MSBL1 archaeon SCGC-AAA382A20
EhaQ: M. marburgensis and M. maripaludis
EhaR: The gene called EhaR in M. marburgensis (ADL58396.1) was annotated as a ribokinase and only picked up hits in Methanobacteriales. The one called EhaR in M. maripaludis based on Uniprot and synteny (ABO34430.1) and its closest homolog in M. kandleri (AAM01674.1) did have some hits in Methanobacteriales. These included SCG85576.1 in Methanobacterium congolense and, when NCBI non-redundant was queried, ADL59095.1 (MvhB) in M. marburgensis.
However, the syntenic polyferredoxin ADL58399.1, as well as SCG85576.1 and ADL59095.1, when used in a BLASTp search in the same way, pick up AAM01674.1 as their first hit in M.