
From bathtubs to bloodfeeders: an evolutionary study of the alphaproteobacterial Gellertiella (formerly Ca. Reichenowia)

by

Kevin C. Anderson

A thesis submitted in conformity with the requirements
for the degree of Master’s of Science
Graduate Department of Ecology & Evolutionary Biology
University of Toronto

© Copyright 2021 by Kevin C. Anderson

Abstract

From bathtubs to bloodfeeders: an evolutionary study of the alphaproteobacterial Gellertiella (formerly Ca. Reichenowia)

Kevin C. Anderson
Master’s of Science
Graduate Department of Ecology & Evolutionary Biology
University of Toronto
2021

Many leeches are blood feeders and host bacterial endosymbionts within specialized organs. One example is Placobdella, which hosts the α-proteobacterial Candidatus Reichenowia. It is assumed that Reichenowia provisions Placobdella with B vitamins. Although Reichenowia consistently places within Rhizobiales, its free-living relative remains a mystery. By obtaining genome sequences of the endosymbiotic bacteria of six species of Placobdella, I address questions regarding the role of Reichenowia and its origin. B vitamin synthesis pathways remain largely intact across all taxa, with many gaps likely representing a lack of knowledge concerning alternate synthesis routes. I find robust and consistent support for the nesting of the free-living Gellertiella hungarica within Reichenowia, necessitating the dissolution of Reichenowia. The topology of this clade suggests two independent origins of endosymbiosis from a G. hungarica-like ancestor. These findings clarify the ecology of the system and point towards a potentially novel model system for investigating the early stages of endosymbiosis.

To Nana.

Acknowledgements

Many people were instrumental in the completion of this thesis, both directly and indirectly. First and foremost I owe huge thanks to my superb supervisory team in Sebastian Kvist and Alejandro Manzano-Marín for trusting me to take on this project and guiding me through my research in perhaps one of the strangest years on record. Claire Manglicmot has played a large role in making this thesis topic a possibility and in helping to generate the data which is the basis of the study presented within. My supervisory committee, composed of Megan Frederickson and Rob Ness, is also to thank for the helpful input and suggestions which greatly improved the quality of the research. The entirety of the Kvist Lab (as well as some of its recent graduates) not only helped to better my research but also were (and continue to be) dearly cherished friends; thank you to Danielle de Carle, Rafael Eiji Iwama, Maddy Foote, Claire Manglicmot, Ismay Earl, and Sophia Fan. For assistance funding this research I acknowledge, and am grateful for, the support of the Natural Sciences and Engineering Research Council of Canada (NSERC). I also wish to thank Mrs. Ellen B. Freeman for her contributions to the Reino S. Freeman graduate scholarship and for enabling research that I hope will honour the legacy of her husband. Lastly, and certainly not leastly, I wish to thank my family and friends for providing a source of support and comfort that I would not be here without. Chief among these are Georgia Harrison, Jennifer Potvin, Christopher Anderson, and Jacob Taylor. This list is not exhaustive (how could it be?) so to all those listed and unlisted, thank you again, for everything.

Contents

Acknowledgements
List of Tables
List of Figures

1 Reichenowia
  1.1 Introduction
  1.2 Materials & Methods
    1.2.1 Specimen Collection
    1.2.2 Genome Sequencing & Assembly
    1.2.3 Metabolic Reconstruction
    1.2.4 Phylogenomic Analyses
    1.2.5 Similarity analyses
  1.3 Results
    1.3.1 Genome characteristics of Gellertiella and Reichenowia
    1.3.2 Similarity analyses
    1.3.3 Phylogenomic placement of Reichenowia
    1.3.4 B-vitamin synthesis capabilities
  1.4 Discussion
    1.4.1 Taxonomic status of Candidatus Reichenowia
    1.4.2 The functional role of endosymbiotic Gellertiella
    1.4.3 Phylogenomic placement of Gellertiella
    1.4.4 The origin of endosymbiosis in Gellertiella
    1.4.5 Outstanding questions and future directions

Bibliography

Appendices

A Supplementary material

List of Tables

1.1 Host and symbiont sampling localities and accession numbers
1.2 Genome characteristics of Reichenowia and Gellertiella hungarica
1.3 ANI and 16S identity similarity matrix

A.1 Metadata for taxa used during the genome assembly process
A.2 Taxonomic information for each of the taxa used during the metabolic reconstruction
A.3 Metadata for each B vitamin pathway reconstructed
A.4 Metadata for the taxa used for the phylogenomic reconstructions
A.5 Topology test results

List of Figures

1.1 Ecological data for putative G. hungarica and Reichenowia sequences retrieved from the NCBI sequence read archive
1.2 Ribosomal phylogeny of select Rhizobiales
1.2 Ribosomal phylogeny of select Rhizobiales (continued)
1.3 An overview of the B vitamin metabolism of Gellertiella hungarica and Reichenowia spp.

A.1 Ribosomal, amino acid coded, partitioned, Bayesian tree
A.2 Ribosomal, amino acid coded, unpartitioned, maximum likelihood tree
A.3 Ribosomal, amino acid coded, unpartitioned, Bayesian tree
A.4 Ribosomal, nucleotide coded, partitioned, maximum likelihood tree
A.5 Ribosomal, nucleotide coded, unpartitioned, maximum likelihood tree
A.6 Orthologue, amino acid coded, unpartitioned, maximum likelihood tree
A.7 An overview of the B vitamin metabolism of Gellertiella hungarica
A.8 An overview of the B vitamin metabolism of Reichenowia sp. strain Ppha
A.9 An overview of the B vitamin metabolism of Reichenowia sp. strain Pmon
A.10 An overview of the B vitamin metabolism of Reichenowia sp. strain Pmul
A.11 An overview of the B vitamin metabolism of Reichenowia sp. strain Ppar
A.12 An overview of the B vitamin metabolism of Reichenowia sp. strain Prug
A.13 An overview of the B vitamin metabolism of Reichenowia sp. strain Psp

Chapter 1

Reichenowia

1.1 Introduction

Endosymbiosis is a widespread, evolutionarily convergent phenomenon amongst both prokaryotes and eukaryotes [1,2] wherein, most commonly, a prokaryotic taxon becomes intimately (i.e., intracellularly) associated with a eukaryotic host. Different endosymbionts have varied roles within their hosts depending upon the host’s specific needs (e.g., [3–5]). One such role is nutrient provisioning, which arises when a highly specialized host diet (e.g., phloem feeding) lacks essential nutrients that the host cannot synthesize [6]. In these cases, it is common for the host-associated endosymbiont to retain the genetic repertoire necessary for the synthesis of these nutrients. Although experiments that correlate symbiont removal and ad hoc nutrient exposure with host fitness and/or survival remain the gold standard for determining the functional basis of a symbiosis (e.g., [7]), recent advances in our ability to cheaply and accurately sequence genomes from even small samples of tissue have made it possible to infer an endosymbiont’s function from a genome sequence alone. However, our power to make a functional prediction regarding the role of an endosymbiont based on a genome sequence would be considerably weakened without the extremely derived features that have been observed among the genomes of (especially) obligate, vertically transmitted endosymbionts. Briefly, these consist of a (generally) drastic reduction in the size of the endosymbiont genome compared to its free-living relative, an unusually high AT bias in the genome’s base composition, an increased rate of sequence evolution (again compared to a free-living relative), and even a reduction in the ability to translate mRNA in an optimal fashion [2,6,8–10].
These genomic features are thought to have arisen as a byproduct of mostly non-adaptive processes: specifically, the clock-like population bottlenecks and lack of recombination between populations of vertically transmitted, obligate endosymbionts result in the total relaxation of selection on ‘non-essential’ elements of the genome and reduce the power of selection to maintain even those elements that are crucial for the proper functioning of the endosymbiosis. However, the rather binary overview provided above fails to account for important elements such as the relative importance of the endosymbiont (i.e., obligate vs. facultative), the time since the establishment of the partnership, the mode of symbiont transmission, and even the medium in which this transmission occurs. All of these elements can result in genomic outcomes that deviate from our general expectations of a highly reduced, AT biased, and quickly evolving obligate endosymbiont [8, 11]. My thesis work focuses on the endosymbionts of leeches (Annelida: Clitellata: Hirudinea): more specifically, leeches in the genus Placobdella.

1 Chapter 1. Reichenowia 2

Endosymbiosis in leeches

Arthropods have received the majority of research attention regarding endosymbiotic relationships, perhaps due to the ease with which they can be cultured, their short generation times, and/or their presence as agricultural and human pests (e.g., [4, 12]). This is despite the fact that leeches, being members of Annelida, represent a phylogenetically distant, yet functionally convergent model to study nutrient provisioning endosymbionts [13]. Beyond this, within Hirudinida itself, there have been two independent acquisitions of hematophagy [14], and separate clades of blood feeders seem to have independently acquired endosymbionts throughout their evolutionary history. Discussion here will be limited to the nutritional endosymbionts of leeches although other types of leech endosymbionts have been recorded from different tissues (e.g., nephridial tissues [15]). It has been suggested that bacteria have been recruited by various clades of leeches at least five times over evolutionary history [16]. Symbionts from Haementeria, Hemiclepsis, Hirudo, Parabdella, Placobdella, Placobdelloides, and Torix have been phylogenetically placed on the basis of 16S rDNA [16, 17]. In each case, the symbionts form monophyletic groups (except for ex. Placobdelloides which is rendered paraphyletic via ex. Parabdella) that are scattered across Alphaproteobacteria and Gammaproteobacteria. Excluding Hirudo, a member of the order Hirudiniformes, the genera discussed above represent members of the order Glossiphoniiformes [14, 18], which are largely composed of strictly hematophagous, proboscis-bearing leeches. There is evidence to suggest that certain hirudiniforms are not strictly hematophagous (unpublished observations) whereas members of Glossiphoniiformes seem to be either strictly hematophagous or liquidosomatophagous (feeding on body fluids and tissues of invertebrates).
It is possible that the diversity of bacteriomes (organs that specifically house intracellular symbionts in specialized cells termed bacteriocytes) observed in Glossiphoniiformes is a result of this strict dietary ecology and repeated, independent endosymbiont acquisition [13]. Notably, some genera that are strictly hematophagous do not form obvious bacteriomes, but still possess endosymbionts that are housed intracellularly: examples of these genera include Torix [17] and Theromyzon [19]. In all of these examples, it is suspected that the endosymbionts provide nutrients lacking from the diet (B vitamins), but this has only been rigorously investigated in Ca. Providencia siddallii, the gammaproteobacterial endosymbiont of Haementeria officinalis [20]. Despite a reduced genome and a high A-T sequence bias, P. siddallii retains complete pathways for the production of most B vitamins (specifically B1, B2, B5, B6, B7, B9) [20].

The Placobdella-Reichenowia system

Placobdella (discussed above) is a genus of leeches in Glossiphoniiformes: this genus consists of proboscis-bearing, obligately hematophagous leeches, and is mostly North American in distribution [21]. Placobdella spp. possess bacteriomes that are present as a pair of blind-ended sacs extending laterally from the esophagus [22]. Within these bacteriomes are bacteriocytes, which house a single species of intracellular endosymbionts of the genus Candidatus Reichenowia (hereafter Reichenowia); this genus was recognized following 16S-based phylogenetic analysis, transmission electron microscopy, and fluorescence in situ hybridization of the bacteria [22]. Although Reichenowia was discovered in 2004 [22], the genus was never formally described, due to the fact that it has not been cultured to date. Reichenowia are members of Alphaproteobacteria that form a monophyletic group within Rhizobiales and have, most recently, been placed as the sister to the + clade [23] or the Ensifer + Rhizobium clade [24]. This positions Reichenowia within a clade dominated by facultative, nitrogen-fixing symbionts of plants [25]. While it has been assumed that Reichenowia is vertically transmitted from parent to offspring, evidence to support this hypothesis remains scarce. On the one hand it was found, using alphaproteobacterial probes, that unfed juveniles contained alphaproteobacteria in the bacteriomes alone, suggesting that transmission does not occur environmentally at the time of the first blood meal [22]. However, recent work found no evidence for maternal transmission when attempting diagnostic PCRs on the ovarial tissue of seven different species of Placobdella, on the testisacs of P. parasitica, P. rugosa, and P. sp. (which produced no bands), and on a spermatophore from P. parasitica (which produced faint bands that could not be sequenced) [24].
This uncertainty is compounded when considering the fact that a 16S sequence was obtained from a sludge sample in India for a bacterium that places within Reichenowia (GenBank accession no. AY897416). This result is not entirely surprising given that the closest relatives of Reichenowia (Ensifer and Rhizobium) form similar associations with their plant hosts wherein the host acquires symbionts from the environment every generation [26].

Research questions

Owing to the fact that the functional role of Reichenowia has heretofore only been assumed, and given the functional role proposed in the analogous Haementeria-Providencia system [20], a major goal of this project is to assess the degree to which the genomes of Reichenowia spp. corroborate the B vitamin provisioning hypothesis in this system. I address this through the functional annotation of a battery of B vitamin pathways. Unlike many other metabolic pathways (e.g., essential amino acid synthesis pathways), which are understood in a relatively broad phylogenetic context, our understanding of the pathways for B vitamin synthesis has come mostly from studies based on Escherichia coli. In addition, the evolutionary pressures exerted on endosymbionts in general result in derived genomes that render the direct interpretation of automatic annotation software dubious (e.g., many genes which, based on manual assessment, are clearly pseudogenized are reported as intact and functional CDSs). For these reasons, I have limited the functional annotation to these pathways despite recognizing the need for full annotations before making larger scale metabolic and functional predictions for Reichenowia spp. Another major goal of this research is to further elucidate the phylogenetic relationships of Reichenowia. Although Reichenowia’s general placement has remained consistent across multiple studies [16, 22–24], the sister group to these endosymbiotic taxa remains a mystery. Since endosymbiosis is, as far as we understand, an inherently derived condition among bacteria, finding the (free-living or pathogenic) relative of the focal taxon is especially important in fully understanding the ecological limitations and evolutionary basis of the symbiosis in question. To address these questions, pooled tissue samples from the bacteriomes of six species of Placobdella have been sequenced, resulting in six genome sequences of varying quality.
I find genome-level evidence (i.e., pathway retention) across each genome for all B vitamin pathways with the exception of B12, suggesting that Reichenowia spp. provide their hosts with B vitamins in vivo. In addition, I find robust and consistent support for the nesting of Gellertiella hungarica str. DSM 29853 (a motile, free-living bacterium) within Reichenowia spp., necessitating the dissolution of Reichenowia in favour of Gellertiella.

1.2 Materials & Methods

1.2.1 Specimen Collection

The specimens of Placobdella used in this study were collected from 2015 to 2018 and span the Canadian provinces of Alberta, Saskatchewan, Manitoba, and Ontario, as well as the US states of Nebraska and Minnesota. Specimens were collected by the shoreline of small bodies of freshwater, more specifically, underneath rocks on the bed of the water body, on pieces of detritus (e.g., submerged logs), swimming freely, or attached to a host or collector. Following collection, specimens were relaxed in 20% ethanol for approximately 15 minutes followed by fixation in 95% ethanol. Specimen identification was conducted through the use of specialized literature, namely [27], [28], and [29–34]. For a full list of specimens used including ROMIZI accession numbers, refer to Table 1.1.

1.2.2 Genome Sequencing & Assembly

All leech specimens (and their bacteriomes) were dissected under a Leica Wild M10 stereo dissection microscope while submerged in 95% ethanol. DNA extraction was performed on dissected bacteriomes using a phenol/chloroform protocol. Prior to extraction, samples were frozen using liquid nitrogen and subsequently macerated using a pestle within an Eppendorf tube. Once macerated, tissue was centrifuged at 4 °C for a total of 15 minutes and the resultant aqueous phase was transferred to a new Eppendorf tube for further analysis. DNA quantification was performed with a TapeStation. DNA libraries were constructed using an NGS Nextera FLEX DNA library preparation kit (150 bp paired end) (Illumina Inc., San Diego, CA, USA) according to the manufacturer’s protocol. Library preparation and sequencing were performed at SickKids (Toronto, ON, CA). Sequencing was conducted on the HiSeqX platform using a single lane. Illumina reads were right-tail clipped, with a minimum quality threshold of 20 (-t 20), using ‘fastq_quality_trimmer’ from the FASTX-Toolkit ver. 0.0.14. In addition, reads shorter than 75 bp were discarded. PRINSEQ ver. 0.20.4 [35] was used to remove reads containing undefined nucleotides (‘N’) as well as those that were left without a pair after the filtering steps. The remaining reads were assembled using SPAdes ver. 3.11.1 [36] with the option ‘--only-assembler’ and k-mer sizes of 33, 55, 77, and 99. The resulting contigs were filtered with a minimum contig size of 200 and a k-mer coverage of 30x. Next, contigs were binned using two parallel strategies. In the first strategy, contigs were binned according to their best hits from a BLASTX search against a custom database made up of the proteomes of the leech Helobdella robusta, the polychaete Capitella teleta, the trypanosomatids Trypanosoma cruzi strain CL Brener and Trypanosoma brucei gambiense DAL972, the mitochondria of Haementeria officinalis and Placobdella lamothei, and the alphaproteobacteria Agrobacterium radiobacter str. K84, Rhizobium etli str. CFN 42, and Sinorhizobium fredii str. NGR234.
This binning was manually curated by checking the consistency and taxonomic assignment of proteins on each scaffold through online BLASTX searches against NCBI’s ‘nr’ database. These searches revealed the presence of bacteria (other than Reichenowia) in Placobdella rugosa (Chlamydiae and Entomoplasma) and Placobdella sp. (Yokenella sp.). Metadata for all genomes used for this binning process are found in Table A.1. In addition, a graph-based approach was used to corroborate the BLASTX results. For this, Bandage ver. 0.8.1 [37] was used to select closed graphs that included the manually curated bins. This approach confirmed that, in every case, no additional scaffolds made up these graphs. The reads were then mapped back to these manually curated bins, as well as to the mitochondrial assemblies, using Bowtie2 ver. 2.3.5.1 [38]. The reads mapping to each bin were used for targeted re-assembly following the same approach as previously described but using an additional k-mer size of 127. Finally, the resulting assemblies were manually screened for contamination and artefact scaffolds by inspecting all scaffolds, discarding mostly short low-complexity and low-coverage contigs.
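The read and contig filters described in this subsection can be illustrated with a minimal Python sketch. It is not the code used in the study (those steps were carried out with the FASTX-Toolkit, PRINSEQ, and SPAdes); the function names are hypothetical, a Phred+33 quality encoding is assumed, and contig headers are assumed to follow the usual SPAdes pattern (NODE_<n>_length_<bp>_cov_<coverage>). Removal of reads left without a pair is omitted for brevity.

```python
import re

PHRED_OFFSET = 33  # assumption: Phred+33 quality encoding


def trim_right_tail(seq, qual, min_quality=20):
    """Clip bases from the 3' end until one meets the quality threshold."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - PHRED_OFFSET < min_quality:
        end -= 1
    return seq[:end], qual[:end]


def keep_read(seq, min_length=75):
    """Keep reads that are long enough and contain no undefined bases."""
    return len(seq) >= min_length and "N" not in seq.upper()


# Assumed SPAdes-style header, e.g. "NODE_1_length_5000_cov_45.2"
HEADER = re.compile(r"length_(\d+)_cov_([\d.]+)")


def keep_contig(header, min_length=200, min_coverage=30.0):
    """Keep contigs meeting the length and k-mer coverage thresholds."""
    match = HEADER.search(header)
    if match is None:
        return False  # header does not follow the assumed pattern
    length, coverage = int(match.group(1)), float(match.group(2))
    return length >= min_length and coverage >= min_coverage
```

Under these assumptions, a read would pass through trim_right_tail and then keep_read, while keep_contig would be applied to each assembled contig header.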

1.2.3 Metabolic Reconstruction

All putative CDSs in each Reichenowia genome were identified using Prokka ver. 1.14.0 [39] under the default settings and these draft annotations were visualized with the use of UGENE ver. 36 [40]. To examine the completeness of the B-vitamin producing metabolic pathways, I retrieved all genes available on the KEGG database [41] for each of the metabolic pathways listed in Table A.3 and each of the organisms listed in Table A.2. These organisms were chosen as they: i) largely represent well characterized bacteria, ii) exhibit considerable divergence in their metabolic pathways, and iii) contain close relatives of Reichenowia [23, 24]. I began with the most extensively studied bacterium, Escherichia coli strain K-12 (MG1655) (hereafter, E. coli), wherein each gene from each pathway was manually verified via a combination of BLASTP [42] and InterPro [43], using sequence similarity as well as domain architecture to confirm the enzymatic function

Host Taxon                Symbiont Taxon             Accession No.  Locality
Placobdella montifera     Reichenowia sp. str. Pmon  ROMIZ I13692   Pine Bay, ON, Canada
Placobdella multilineata  Reichenowia sp. str. Pmul  ROMIZ I12290   Georgia, MN, USA
P. multilineata           R. sp. str. Pmul           ROMIZ I12291   Georgia, MN, USA
P. multilineata           R. sp. str. Pmul           ROMIZ I12292   Georgia, MN, USA
P. multilineata           R. sp. str. Pmul           ROMIZ I12293   Georgia, MN, USA
Placobdella rugosa        Reichenowia sp. str. Prug  ROMIZ I10089   Mijinemungshing Lake, ON, Canada
P. rugosa                 R. sp. str. Prug           ROMIZ I10090   Mijinemungshing Lake, ON, Canada
P. rugosa                 R. sp. str. Prug           ROMIZ I10110   Kenny Lake, ON, Canada
P. rugosa                 R. sp. str. Prug           ROMIZ I10112   Kenny Lake, ON, Canada
P. rugosa                 R. sp. str. Prug           ROMIZ I10118   Paquette Lake, ON, Canada
P. rugosa                 R. sp. str. Prug           ROMIZ I10119   Paquette Lake, ON, Canada
P. rugosa                 R. sp. str. Prug           ROMIZ I10120   Paquette Lake, ON, Canada
P. rugosa                 R. sp. str. Prug           ROMIZ I10253   Moore Lake, ON, Canada
P. rugosa                 R. sp. str. Prug           ROMIZ I10476   Lost Lake, MN, USA
P. rugosa                 R. sp. str. Prug           ROMIZ I10478   Lost Lake, MN, USA
P. rugosa                 R. sp. str. Prug           ROMIZ I10479   Lost Lake, MN, USA
P. rugosa                 R. sp. str. Prug           ROMIZ I10513   Unnamed Pond, NE, USA
P. rugosa                 R. sp. str. Prug           ROMIZ I11442   Anglin Lake, SK, Canada
P. rugosa                 R. sp. str. Prug           ROMIZ I11455   Anglin Lake, SK, Canada
P. rugosa                 R. sp. str. Prug           ROMIZ I11473   Algonquin Provincial Park, ON, Canada
P. rugosa                 R. sp. str. Prug           ROMIZ I11474   Algonquin Provincial Park, ON, Canada
P. rugosa                 R. sp. str. Prug           ROMIZ I11529   Unnamed pond, MB, Canada
P. rugosa                 R. sp. str. Prug           ROMIZ I13649   Cloud Bay Rapids, ON, Canada
Placobdella parasitica    Reichenowia sp. str. Ppar  ROMIZ I10340   Algonquin Provincial Park, ON, Canada
P. parasitica             R. sp. str. Ppar           ROMIZ I11476   Algonquin Provincial Park, ON, Canada
P. parasitica             R. sp. str. Ppar           ROMIZ I11613   Algonquin Provincial Park, ON, Canada
Placobdella phalera       Reichenowia sp. str. Ppha  ROMIZ I10190   Clear Lake, ON, Canada
P. phalera                R. sp. str. Ppha           ROMIZ I10191   Clear Lake, ON, Canada
P. phalera                R. sp. str. Ppha           ROMIZ I10284   Pacaud Lake, ON, Canada
P. phalera                R. sp. str. Ppha           ROMIZ I11121   Vermillion Lake, MN, USA
Placobdella sp.           Reichenowia sp. str. Psp   ROMIZ I11551   Adams Lake, MB, Canada
P. sp.                    R. sp. str. Psp            ROMIZ I11554   Adams Lake, MB, Canada
P. sp.                    R. sp. str. Psp            ROMIZ I11483   Adams Lake, SK, Canada
P. sp.                    R. sp. str. Psp            ROMIZ I11277   Half Moon Lake, AB, Canada
P. sp.                    R. sp. str. Psp            ROMIZ I11278   Half Moon Lake, AB, Canada
P. sp.                    R. sp. str. Psp            ROMIZ I11279   Half Moon Lake, AB, Canada
P. sp.                    R. sp. str. Psp            ROMIZ I11297   Beaver Lake, AB, Canada
P. sp.                    R. sp. str. Psp            ROMIZ I11356   Muir Lake, AB, Canada

Table 1.1: The columns from left to right indicate: the host species in which bacteriomes were dissected and sequenced, the bacterial species found within the bacteriomes, each leech specimen’s ROMIZI accession number, and its collection locality.

(note: the BioCyc database collection was also consulted during this step to ensure optimal coverage of potential alternative pathway routes [44]). BLAST databases were then constructed which contained the genes present in each pathway for E. coli. These databases were used to expedite the process of curation for the final dataset. Genes for each pathway (in all gammaproteobacteria) were used as the queries in BLASTP searches against the genes for their corresponding pathway in E. coli, and their current annotation was accepted if they were predicted as the same gene manually verified in E. coli and if their E value was ≤ 1e-80. Any genes that did not meet these criteria were manually verified according to the process outlined above. The cut-off value of 1e-80 was not arbitrary; it was chosen because of the presence of composite genes in some taxa, such as ribBA, which generally show high sequence similarity to any of their non-composite homologs. The same process outlined above was conducted for each alphaproteobacterium in Table A.2; however, databases analogous to those constructed for E. coli were created for Rhizobium etli strain CFN 42 (hereafter, R. etli). These R. etli pathway databases were then used to filter BLAST hits according to the criteria outlined above. All of the curated, organism-specific databases were then merged on a per-pathway basis (e.g., all thiamin synthesis-related genes were concatenated into a single file) and used as the subject sequences to aid in the detection of B vitamin synthesis-related genes in each of the Reichenowia genomes. The screening process used to detect B vitamin synthesis-related genes was the same as outlined above for the taxa contained in Table A.2. With this step completed, potential genetic gaps in B vitamin synthesis pathways were apparent in each Reichenowia genome.
To ensure that these gaps were not simply the result of a missed annotation (especially in the case of short coding sequences) or present in a truncated form on the end of a contig, tBLASTx was used to search for a significant alignment between the gene coding for the enzyme representing the presumptive gap, in a closely-related organism if possible, and the Reichenowia genome sequence in question. Finally, the literature was consulted to search for any enzymes (or alternative pathway routes) known to act as functional replacements for those missing from the pathways in question. Once identified, tBLASTx and BLASTp were again used to search for a significant alignment between the gene coding for the enzyme representing the presumptive gap and the Reichenowia genome sequence (or predicted protein in the case of BLASTP) in question.
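The acceptance rule used during curation (same gene as the manually verified reference and an E value of at most 1e-80, otherwise manual verification) can be sketched as follows. This is an illustrative sketch only: it assumes BLAST results in the standard 12-column tabular layout, where the E value is the eleventh column, and the names triage_hits, query_gene, and subject_gene are hypothetical rather than taken from the study.

```python
def parse_hit(line):
    """Parse one line of 12-column tabular BLAST output (assumed layout)."""
    fields = line.rstrip("\n").split("\t")
    return {"query": fields[0], "subject": fields[1], "evalue": float(fields[10])}


def triage_hits(lines, query_gene, subject_gene, max_evalue=1e-80):
    """Split queries into auto-accepted and manual-review lists.

    query_gene maps each query ID to its current gene annotation;
    subject_gene maps each reference ID to its manually verified gene.
    """
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit = parse_hit(line)
        current = best.get(hit["query"])
        if current is None or hit["evalue"] < current["evalue"]:
            best[hit["query"]] = hit  # keep the lowest E value per query

    accepted, review = [], []
    for query, gene in query_gene.items():
        hit = best.get(query)
        same_gene = hit is not None and subject_gene.get(hit["subject"]) == gene
        if same_gene and hit["evalue"] <= max_evalue:
            accepted.append(query)
        else:
            review.append(query)  # no hit, weak hit, or gene mismatch
    return accepted, review
```

Queries with no hit at all fall into the review list, which mirrors the manual follow-up of presumptive gaps described above.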

1.2.4 Phylogenomic Analyses

Taking into account previous findings [16, 22–24], my taxon sampling was designed to elucidate the position of Reichenowia within Rhizobiales, given its consistent placement within this order. A total of 155 genomes were used to reconstruct the phylogenetic trees depicted in this study (see: Table A.4). Of the 155, 146 taxa represent members of Rhizobiales, whereas the outgroup is represented by 8 members of Caulobacteraceae and Meganema perideroedes (Rhodobacterales) (following the results of [45]); Thalassocella blandensis (Gammaproteobacteria: Cellvibrionaceae) was used to root the phylogenies. Data for all previously sequenced taxa were downloaded from the NCBI assembly database, using the accession numbers listed in Table A.4. Two gene sets were used in parallel to reconstruct the phylogenies: one contained all of the single copy ribosomal subunit proteins found in the taxa and the other contained the single copy orthologues that were present in all taxa in the dataset (note: these datasets were not mutually exclusive with respect to the loci included). The ribosomal dataset consisted of all proteins except for the 30S ribosomal protein S21 [rpsU]. The orthogroups were found using OrthoFinder ver. 2.4.0 [46] under default settings. MAFFT ver. 7.450 [47] was used to produce alignments for each of the amino acid coded datasets and these alignments were subsequently backtranslated into their nucleotide counterparts using PAL2NAL ver. 14 [48]. Gblocks ver. 0.91b [49] was then used with default settings to clean the resultant alignments. Further analyses were run either locally (when computational resources allowed), on the CIPRES science gateway [50], or using the Niagara supercomputer (a part of SciNet) [51]. To prepare for partitioning and model selection, the generated multiple sequence alignments were converted to the nexus file format [52] using tools available in Biopython [53] and (when necessary) subsequently converted to a relaxed phylip format; this was achieved through the use of the web-based ALTER [54]. The partitioning scheme and best-fit models of the phylogenetic datasets were found using PartitionFinder2 ver. 2.1.1 [55], where each gene was assigned its own partition prior to analysis (except in the case of the unpartitioned datasets, where only model selection occurred). All maximum likelihood (ML) phylogenies were reconstructed through the use of IQ-TREE ver. 1.6.12 [56] [‘-m TESTNEW’ for unpartitioned datasets and ‘-bb 1000’ for all datasets] and all Bayesian inference (BI) trees were reconstructed using MrBayes ver. 3.2.7 [57] [relburnin = yes, burninfrac = 0.25, samplefreq = 1000, nchains = 4, starttree = random, for all analyses]. Tracer ver. 1.7.1 [58] was used to assess the Bayesian runs for convergence and stationarity. To assess the level of support for the monophyly of endosymbiosis in Reichenowia (see subsection 1.3.3), I conducted an independent, constrained tree search for each dataset analyzed under the ML method wherein I imposed the constraint that all members of Reichenowia (i.e., the newly sequenced taxa) were necessarily monophyletic and separated by at least one branch from G. hungarica str. DSM 29853 (hereafter, G. hungarica). For the purposes of the unrooted constraint tree, Agrobacterium tumefaciens was used as an outgroup taxon. In all cases, the constrained searches utilized the same set of options as their non-constrained counterpart. Topology tests were conducted for each set of trees separately (i.e., the unconstrained and constrained search for each dataset) and with the following set of options: ‘-n 0’, ‘-zb 10000’, ‘-au’.
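Assigning each gene its own partition amounts to recording where each gene alignment starts and ends within the concatenated matrix. The sketch below is a hypothetical illustration of that bookkeeping, not the procedure used in the study; the function name, the example gene names and lengths, and the "name = start-end;" line layout are assumptions chosen for clarity.

```python
def data_blocks(gene_lengths):
    """Return one 'name = start-end;' line per gene in a concatenated alignment.

    gene_lengths is an ordered list of (gene_name, alignment_length) pairs;
    coordinates are 1-based and inclusive.
    """
    lines, start = [], 1
    for name, length in gene_lengths:
        end = start + length - 1
        lines.append(f"{name} = {start}-{end};")
        start = end + 1  # the next gene begins right after this one
    return lines


# Hypothetical example with two ribosomal protein genes
for line in data_blocks([("rplA", 690), ("rpsB", 750)]):
    print(line)
```

For an unpartitioned analysis the same matrix would simply be described by a single block spanning all columns.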

1.2.5 Similarity analyses

All similarity analyses used the same set of samples, which comprised all taxa newly sequenced for this study as well as Gellertiella hungarica. To calculate a distance matrix among the above taxa, the entire 16S rRNA locus was extracted using RNAmmer [59] and a distance matrix was calculated using AliView [60]. Average nucleotide identity (ANI) comparisons were calculated with the ‘ani-matrix.bash’ script included in the enveomics collection [61], using assemblies for each taxon as the input. The IMNGS platform [62] was used to search for 16S rRNA sequences included in the NCBI sequence read archive above a 97% and 99% threshold (with a minimum sequence length of 200), in an effort to elucidate some of the general ecological features of Gellertiella and Reichenowia. For the purposes of the abundance and prevalence measures, only samples in which the putative Gellertiella or Reichenowia sequences composed at least 0.1% of the total number of sequences were considered positive for their respective taxon. Input for the IMNGS analysis was the same as for the 16S distance matrix (i.e., full 16S rRNA sequences for each taxon).
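The 0.1% rule for calling a sample positive, and the prevalence measure that follows from it, can be written out as a short sketch. The function names and the (target count, total count) input layout are hypothetical, and prevalence is assumed here to mean the fraction of samples that are positive; the actual counts in the study came from the IMNGS output.

```python
def is_positive(target_reads, total_reads, min_fraction=0.001):
    """A sample is positive if the target makes up at least 0.1% of its sequences."""
    if total_reads <= 0:
        return False  # empty samples cannot be scored
    return target_reads / total_reads >= min_fraction


def prevalence(samples, min_fraction=0.001):
    """Fraction of samples that are positive; samples is a list of (target, total)."""
    if not samples:
        return 0.0
    positives = sum(is_positive(t, n, min_fraction) for t, n in samples)
    return positives / len(samples)
```

For example, a sample with 50 matching sequences out of 10,000 (0.5%) would count as positive, whereas one with 5 out of 10,000 (0.05%) would not.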

1.3 Results

1.3.1 Genome characteristics of Gellertiella and Reichenowia

The genome assembly process resulted in genomes that ranged from 29 contigs (in the case of R. sp. Ppha [hereafter, Ppha]) to 239 contigs (in the case of R. sp. Ppar [hereafter, Ppar]). Genome size ranged from 2.01 mega base pairs (Mbp) (in the case of R. sp. strain Psp [hereafter, Psp]) to 4.43 Mbp (in the case of Ppha). For comparison, the genome of G. hungarica is 4.77 Mbp in length. The number of currently predicted protein coding sequences in the newly sequenced genomes ranges from 2315 to 4912 (in Psp and Ppha, respectively), whereas the genome of G. hungarica houses 4440 protein coding sequences. The GC content (measured as a percentage) was relatively stable among the strains of Reichenowia, ranging from 59.66% (in Psp) to 62.63% (in R. sp. strain Pmul [hereafter, Pmul]), whereas G. hungarica possessed the highest GC content at 63.54%. For a full overview of each strain’s genomic characteristics, see Table 1.2.

Organism name                         Contigs  Genome size (Mbp)  Putative coding sequences  Predicted tRNAs  GC content (%)
Gellertiella hungarica str. DSM29853  36       4.77               4440                       50               63.54
Reichenowia sp. str. Ppha             29       4.43               4912                       46               60.90
R. sp. str. Pmon                      151      3.79               4116                       46               62.49
R. sp. str. Ppar                      239      3.37               4108                       44               62.43
R. sp. str. Prug                      86       3.31               4407                       43               61.77
R. sp. str. Pmul                      215      3.07               4016                       45               62.63
R. sp. str. Psp                       41       2.02               2315                       45               59.66

Table 1.2: From left to right: the organism name (down to strain when possible), the number of contigs resulting from the assembly process, the genome size in mega base pairs (Mbp), the number of intact coding sequences found during automatic annotation, the number of predicted tRNAs, and the percentage of base pairs consisting of either guanine or cytosine throughout the assembled contigs.

Strain                                Psp            Prug           Pmon           Ppha           Ppar           Pmul           G. hungarica
Reichenowia sp. str. Psp              -              83.16 (97.64)  82.35 (97.78)  81.19 (96.64)  82.15 (96.36)  82.51 (97.11)  82.45 (96.90)
R. sp. str. Prug                      83.16 (97.64)  -              81.62 (98.32)  80.30 (97.31)  81.56 (97.04)  81.90 (97.85)  81.56 (97.17)
R. sp. str. Pmon                      82.35 (97.78)  81.62 (98.32)  -              80.72 (98.79)  83.26 (98.45)  90.29 (99.33)  83.24 (99.06)
R. sp. str. Ppha                      81.19 (96.64)  80.30 (97.31)  80.72 (98.79)  -              81.07 (97.64)  81.07 (98.12)  81.13 (99.06)
R. sp. str. Ppar                      82.15 (96.36)  81.56 (97.04)  83.26 (98.45)  81.07 (97.64)  -              83.12 (97.98)  82.76 (97.58)
R. sp. str. Pmul                      82.51 (97.11)  81.90 (97.85)  90.29 (99.33)  81.07 (98.12)  83.12 (97.98)  -              82.88 (98.38)
Gellertiella hungarica str. DSM29853  82.45 (96.90)  81.56 (97.17)  83.24 (99.06)  81.13 (99.06)  82.76 (97.58)  82.88 (98.38)  -

Table 1.3: A similarity matrix depicting two measures of nucleotide identity compared between strains of Gellertiella and Reichenowia. The unenclosed value represents the average nucleotide identity (ANI) score, whereas the value in brackets represents the shared identity across the 16S rRNA locus (both values expressed as percentages).
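The assembly-level statistics summarized in Table 1.2 (number of contigs, genome size in Mbp, and GC content) can be derived directly from a set of contigs. The sketch below is illustrative and is not the annotation pipeline used in this study; in particular, excluding ambiguous bases (e.g., N) from the GC denominator is an assumption, and the function name is hypothetical.

```python
from typing import Dict, Iterable, Union


def assembly_summary(contigs: Iterable[str]) -> Dict[str, Union[int, float]]:
    """Summarize an assembly: contig count, size in Mbp, and GC percentage."""
    n_contigs = 0
    total_bp = 0
    gc_bases = 0
    acgt_bases = 0
    for seq in contigs:
        seq = seq.upper()
        n_contigs += 1
        total_bp += len(seq)
        gc = seq.count("G") + seq.count("C")
        gc_bases += gc
        # Assumption: only unambiguous bases contribute to the GC denominator.
        acgt_bases += gc + seq.count("A") + seq.count("T")
    if acgt_bases == 0:
        raise ValueError("no unambiguous bases found")
    return {
        "contigs": n_contigs,
        "size_mbp": total_bp / 1e6,
        "gc_percent": 100.0 * gc_bases / acgt_bases,
    }


# Toy assembly of three short contigs (hypothetical sequences).
demo_summary = assembly_summary(["GGCC", "ATAT", "GCNN"])
```

For the toy input, six of the ten unambiguous bases are G or C, which gives a GC content of 60%.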

1.3.2 Similarity analyses

To characterize the newly sequenced strains, I calculated the average nucleotide identity (ANI) and 16S rRNA identity among the newly sequenced samples as well as between each of them and Gellertiella hungarica, for which the newly sequenced taxa showed a strong phylogenetic affinity (see subsection 1.3.3). The results of these analyses are presented in Table 1.3. The generally accepted thresholds for the circumscription of species boundaries are ≥ 95% for ANI scores and ≥ 98.6% for 16S identity (i.e., if the taxonomy of one sample is known, there is high confidence that the samples represent a single species if the value is above or equal to the cutoff value) [63–67]. According to the ANI scores observed in the matrix, each sample represents a different species in the genus Gellertiella, with all non-self comparisons falling between approximately 80% and 83%. The exception was the comparison between R. sp. Pmon (hereafter, Pmon) and Pmul, which yields a relatively uncommon score of 90.29%, placing it within the often observed discontinuity between intraspecific and interspecific values [63]. However, taken together with the 16S identity scores, the picture is not so simple. Using the commonly accepted intraspecific 16S identity cutoff of 98.6%, Ppar, Psp, and R. sp. Prug (hereafter, Prug) are found to be distinct from all other strains examined (although Ppar and Pmon share a value of 98.45%). On the other hand, G. hungarica, Ppha, Pmon, and Pmul tend to form a ‘species-level’ cluster. Within this cluster, all values are above the 98.6% cutoff except for the comparisons of Pmul with Ppha and with G. hungarica (16S identity of 98.12% and 98.38%, respectively). Despite this evidence, treating each of the members of this cluster as a strain of G. hungarica would render the species paraphyletic across all of the recovered trees (see subsection 1.3.3).

The results of the IMNGS search are depicted in Figure 1.1. The number of samples which tested positive for each strain is as follows: Ppha (n=77), G. hungarica (n=19), Prug (n=12), and Pmon (n=5). Based on my criteria, Pmul, Ppar, and Psp were not detected in any samples. The majority of positive samples were recovered from samples which sequenced organisms associated with the root and/or leaf tissues of Arabidopsis thaliana. Other plant-associated positive samples (namely, from Glycine max) were much less frequent than the A. thaliana derived samples. Animal-associated samples were derived from sequencing efforts conducted on mosquito microbiomes (n=6), fish guts (n=4), and Lepus europaeus. In the mosquito derived samples, G. hungarica was invariably the strain detected, whereas Ppha and Prug were found in the fish and L. europaeus samples, respectively. The remainder consisted of environmental samples and, among these, the soil (n=13) and air (n=9) metagenomes made up the bulk of the positive samples: the soil metagenomes tested positive roughly equally for Ppha and G. hungarica, whereas the air metagenomes predominantly tested positive for Prug, with one instance each of Pmon and G. hungarica. Across all of the positive samples, the proportional abundance of Gellertiella and/or Reichenowia remained relatively low. The highest abundance was observed in the air metagenomes, with a maximum value of 0.061 and a median value of 0.012. The highest median abundance was observed in the mosquito samples, with a value of 0.016 and a maximum value of 0.037. Other samples with notable abundances were the landfill/compost cover derived metagenomes, which reached a maximum of 0.018 and had a median of 0.005. The A. thaliana derived samples reached a maximum of 0.034 and had a median value of 0.003. All other median abundances were below 0.01.

Figure 1.1: The top graph depicts the frequency of sequence read archive (SRA) samples which were positive for a particular strain of Gellertiella and/or Reichenowia. These bins were sorted according to the sampling environment from which the sequences were recovered and each bin was partitioned proportionately according to the number of samples containing each strain (represented by different colours as specified in the figure legend). The lower graph depicts the proportional abundance (based on the total number of sequences recovered) and is sorted by sampling environment. For the abundance measurements, all strains/species were treated together. Each boxplot depicts the following measurements: the median (line within box), first and third quartiles (bottom and top of box, respectively), and the minimum and maximum values (lower and upper whisker, respectively).
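The way the two species-delimitation cutoffs interact with the values in Table 1.3 can be made explicit with a short sketch. The pairwise values below are transcribed from Table 1.3 (ANI first, 16S identity second); the function name and data layout are assumptions for illustration, and the example simply flags pairs for which the ANI cutoff (≥ 95%) and the 16S cutoff (≥ 98.6%) disagree.

```python
from typing import Dict, List, Tuple

ANI_CUTOFF = 95.0   # percent; pairs at or above are treated as conspecific
SSU_CUTOFF = 98.6   # percent 16S identity; same interpretation

Pair = Tuple[str, str]

# (ANI, 16S identity) for every pair of strains, transcribed from Table 1.3.
PAIRS: Dict[Pair, Tuple[float, float]] = {
    ("Psp", "Prug"): (83.16, 97.64), ("Psp", "Pmon"): (82.35, 97.78),
    ("Psp", "Ppha"): (81.19, 96.64), ("Psp", "Ppar"): (82.15, 96.36),
    ("Psp", "Pmul"): (82.51, 97.11), ("Psp", "G. hungarica"): (82.45, 96.90),
    ("Prug", "Pmon"): (81.62, 98.32), ("Prug", "Ppha"): (80.30, 97.31),
    ("Prug", "Ppar"): (81.56, 97.04), ("Prug", "Pmul"): (81.90, 97.85),
    ("Prug", "G. hungarica"): (81.56, 97.17),
    ("Pmon", "Ppha"): (80.72, 98.79), ("Pmon", "Ppar"): (83.26, 98.45),
    ("Pmon", "Pmul"): (90.29, 99.33), ("Pmon", "G. hungarica"): (83.24, 99.06),
    ("Ppha", "Ppar"): (81.07, 97.64), ("Ppha", "Pmul"): (81.07, 98.12),
    ("Ppha", "G. hungarica"): (81.13, 99.06),
    ("Ppar", "Pmul"): (83.12, 97.98), ("Ppar", "G. hungarica"): (82.76, 97.58),
    ("Pmul", "G. hungarica"): (82.88, 98.38),
}


def conflicting_pairs(pairs: Dict[Pair, Tuple[float, float]]) -> List[Pair]:
    """Return pairs for which the ANI and 16S cutoffs give different answers."""
    conflicts = []
    for pair, (ani, ssu) in pairs.items():
        same_by_ani = ani >= ANI_CUTOFF
        same_by_ssu = ssu >= SSU_CUTOFF
        if same_by_ani != same_by_ssu:
            conflicts.append(pair)
    return conflicts


demo_conflicts = conflicting_pairs(PAIRS)
```

No pair reaches the ANI cutoff, so every flagged pair is one that the 16S cutoff alone would group into a single species; these are the pairs that make up the ‘species-level’ cluster described above.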

1.3.3 Phylogenomic placement of Reichenowia

Regarding the phylogenetic trees, the majority shared a generally similar topology, with the clear exception of the ribosomal, unpartitioned, nucleotide coded, maximum likelihood (ML) tree in which Rhizobiales was rendered paraphyletic by Caulobacterales, Rhodobacter sphaeroides, and Meganema perideroedes, the taxa used as outgroups. Given this dubious topology, interpretation of this tree will be kept to a minimum. In addition, while many of the trees contained some basal nodes that exhibited relatively high support (i.e., UFBoot (MBS) ≥ 95% or posterior probability (PP) ≥ 99%), this was not observed in the aforementioned tree. Although several families were recovered as paraphyletic and/or polyphyletic across all of the trees, there was often precedent for such relationships: for example, in Bradyrhizobiaceae [68], [69, 70], Methylocystaceae [71], and Xanthobacteraceae [72, 73]. The problem of taxonomy not being representative of evolutionary relationships between bacteria has long been recognized and has led to multiple initiatives which seek to remedy this precise issue (e.g., [74, 75], which further underscore that my trees are not outliers in this regard). The topology for the ribosomal, partitioned, amino acid coded, ML phylogeny is depicted in Figure 1.2. The Rhizobiaceae clade (and the relationships of the taxa therein) was of particular interest in this study and in most (but not all) cases, this family was recovered as monophyletic with full support. This monophyly is contingent upon the inclusion of Mycoplana dimorpha, a bacterium which putatively belongs to the family Brucellaceae but was recently placed with confidence in Rhizobiaceae [76] and which, in all of the trees, places as the sister to Shinella zoogloeoides with full support, and upon the inclusion of the genera Hoeflea and Pseudohoeflea. 
An important note is that the group Rhizobiaceae was fully supported both in the case where Hoeflea and Pseudohoeflea were included in Rhizobiaceae and when Georhizobium profundi and Pararhizobium haloflavum (the basally branching members of Rhizobiaceae) were excluded from the group. Alternatively, Rhizobiaceae was rendered polyphyletic in the trees reconstructed from the nucleotide coded ribosomal dataset via the placement of Liberibacter as sister to Bartonellaceae (MBS = 88-100%). While an explicit sister relationship with Bartonellaceae has, to my knowledge, not been reported for Liberibacter, its placement does seem to be quite variable depending on the type of data used for phylogenetic reconstruction and the taxa which are sampled (e.g., [77–79]). Another possible contributor to this phylogenetic instability is the derived nature of the majority of Liberibacter genomes (the bulk of this genus is composed of obligate, intracellular plant pathogens) [80]. Following recent taxonomic revisions proposed for the family Rhizobiaceae and the genus Pseudorhizobium, which is nested within [81, 82], the genera Agrobacterium, Allorhizobium, Ciceribacter, Ensifer/Sinorhizobium, Liberibacter, Pseudorhizobium, and Rhizobium are recovered as monophyletic with full support in all trees. In contrast to the findings of [81], variable support is found for the monophyly of the ‘R. aggregatum complex’ (MBS = 69-100%) (represented in this study by R. daejeonense and R. selenitrireducens), which is rendered paraphyletic by Ciceribacter in all of the ribosomal, amino acid coded trees with the exception of the partitioned ML tree (i.e., the tree presented in Figure 1.2). In all of the phylogenetic trees reconstructed, Reichenowia forms a fully supported clade, nested within Rhizobiaceae, via the inclusion of the free-living G. hungarica [83]. For a discussion of the genomic similarities between G. hungarica and Reichenowia, see subsection 1.3.2. 
Based on this finding (in concert with the similarity analyses), and given that the genus Reichenowia was an unofficial (Candidatus) name owing to its members’ unculturability (see Appendix 11 [84]), the name Gellertiella takes taxonomic precedence and members formerly classified as Reichenowia should henceforth be referred to using the generic name Gellertiella. Interestingly, across all of the trees, Ppha was fully supported as the sister to a clade containing G. hungarica and the remainder of Reichenowia. G. hungarica was fully supported as the sister to the remainder of Reichenowia (i.e., excluding Ppha). Within this ‘endosymbiotic’ clade (fully supported in all trees), sister relationships between Pmon and Pmul, as well as between Prug and Psp, were fully supported in each tree. Whereas the general topology of internal Gellertiella and Reichenowia relationships was stable, the relationship between Ppar and its sister clade varied somewhat between trees. In all of the amino acid coded trees reconstructed from the ribosomal dataset, Ppar was recovered as the sister to a clade including Prug and Psp (MBS = 96-97%, PP = 91-95%) and this group, in turn, was recovered as the sister to a clade containing Pmon and Pmul. In all other trees reconstructed the reverse was true: Ppar was recovered as the sister to a clade containing Pmon and Pmul (MBS = 100%) and that clade was recovered as the sister to a clade containing Prug and Psp. To test the degree to which the data support a monophyletic origin of the endosymbiotic state in Reichenowia, I reconstructed trees with a constraint imposing G. hungarica (the only known free-living member of the genus) as the sister to the remainder of Reichenowia and no further constraints. According to several different types of topology tests (Table A.5), the constrained topology is significantly rejected across all datasets except for the unpartitioned, nucleotide coded, ribosomal dataset, where none of the tests reject the constrained topology.

Figure 1.2: Maximum likelihood tree based on partitioned, amino acid coded data from all single copy ribosomal loci from each taxon listed in Table A.4 (log likelihood = -383298.739875). UFBoot values are represented as percentages and depicted above and to the left of their respective nodes. Black circles on nodes depict full support recovered for that node. Triangular tips represent clades that have been collapsed to depict higher order taxonomic groupings. Branch lengths are drawn proportional to the amount of change (scale bar: 0.2 substitutions per site).
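A monophyly constraint of the kind tested here is commonly written as a multifurcating Newick tree in which the constrained taxa are grouped in one set of parentheses and the excluded taxa are listed outside it. The sketch below builds such a string; it is an assumption-laden illustration rather than the constraint file used in this study, and the function name and taxon labels are hypothetical. To my understanding, IQ-TREE 1.6 reads a constraint tree of this form through its ‘-g’ option.

```python
from typing import Sequence


def constraint_newick(constrained: Sequence[str], excluded: Sequence[str]) -> str:
    """Build an unrooted, multifurcating Newick constraint.

    The taxa in `constrained` are forced to form a clade that excludes
    every taxon listed in `excluded`.
    """
    if len(constrained) < 2:
        raise ValueError("a clade constraint needs at least two taxa")
    if len(excluded) < 2:
        # With fewer than two outside taxa the split is uninformative
        # in an unrooted tree.
        raise ValueError("at least two excluded taxa are required")
    if set(constrained) & set(excluded):
        raise ValueError("a taxon cannot be both constrained and excluded")
    inside = ",".join(constrained)
    outside = ",".join(excluded)
    return f"(({inside}),{outside});"


# Hypothetical labels: the six newly sequenced strains are constrained to be
# monophyletic to the exclusion of G. hungarica and the outgroup.
demo_constraint = constraint_newick(
    ["Ppha", "Pmon", "Ppar", "Prug", "Pmul", "Psp"],
    ["G_hungarica", "A_tumefaciens"],
)
```

Because the constraint tree is unrooted, a second taxon outside the clade (here the Agrobacterium outgroup) is what makes the separation from G. hungarica meaningful.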

1.3.4 B-vitamin synthesis capabilities

The results show that all strains of Gellertiella and Reichenowia sequenced herein retain a largely complete metabolism for the biosynthesis of the B vitamin complex (note: some pathway holes shown in Figure 1.3 are discussed fully in subsection 1.4.2). In contrast to this pattern, the pathway for cobalamin (B12) is severely degraded in all strains except for Gellertiella hungarica. For a complete overview of the B vitamin metabolism of Gellertiella hungarica and Reichenowia, see Figure 1.3.

From the thiamin pathway (B1), the genes thiI and phoA/rsgA are absent among all taxa, resulting in two potential holes in the pathway. Because the two functions of thiI are carried out via separate domains, only the rhodanese domain is necessary for the thiamin-relevant function [85] and, accordingly, proteins containing this domain were found in each genome examined. The other hole represents alkaline phosphatase (phoA), an enzyme responsible for cleaving the phosphate group from thiamin monophosphate (TMP), thereby converting it to thiamin.

The riboflavin (B2) pathway remains complete save for the absence of ybjI/yigB in all taxa and the loss of ribA in Prug and Pmul. However, regarding the former, a BLAST search against the GenBank non-redundant (nr) database using the filter ‘Rhizobiaceae (taxid:82115)’ revealed that these proteins are not found within the family: the only hit reported with an E-value < 1e−15 was a Rhizobium leguminosarum sequence (accession no: NEI53301.1) which resembles a fragment from the amino-terminal end of yigB. Another enzyme from the same protein family, serB, which traditionally carries out the dephosphorylation of phosphoserine, was found to be retained across Gellertiella and Reichenowia. 
In the case of Prug, no sequences were found with significant similarity to ribA, and in Pmul the protein appears to be present in the genome in a fragmented form wherein ONPCBJCD_01171 corresponds to the amino-terminal half of the protein while ONPCBJCD_01170 corresponds to the carboxyl-terminal half. With respect to genomic organization, and with the exceptions of G. hungarica and Ppha, ribB and ribA exist as distinct CDSs in each genome and are not located in close proximity to each other (i.e., ribB and ribA are likely transcribed separately and do not place within a ‘riboflavin operon’). In contrast, G. hungarica and Ppha house a ribBA fusion protein which catalyzes two distinct steps in the synthesis of riboflavin.

In terms of the NADP+ synthesis pathway, Gellertiella and Reichenowia, in most cases, retain a complete pathway for the de novo synthesis of NADP+ (from L-aspartate) as well as two types of salvage pathways: the PNC IV cycle (from β-nicotinamide D-ribonucleotide) and the PNC VI cycle (from nicotinamide riboside) [86, 87]. Additionally, multiple enzymes are able to catalyze the final step in the pathway, where NAD+ is phosphorylated to produce NADP+. These enzymes are: nadK (EC number: 2.7.1.23), sthA (EC number: 1.6.1.1), and pntA/pntB (EC number: 1.6.1.2). In most cases, Gellertiella and Reichenowia retain at least two of these enzymes. However, despite the general trends explained above, there is considerable variation in the number and types of pathways retained among the different strains examined. G. hungarica, Ppha, and Ppar retain the de novo synthesis, PNC IV cycle, and PNC VI cycle pathways as well as multiple ways of catalyzing the final reaction in the pathway (G. hungarica retains nadK, pntA/pntB, and sthA, whereas Ppha and Ppar retain only the former two). The NADP+ pathways of Pmul, Prug, and Psp, on the other hand, are relatively degraded. Pmul retains both the PNC IV and PNC VI cycle salvage pathways, but houses truncated forms of both nadA and nadC, rendering the de novo synthesis pathway dysfunctional. Prug retains the PNC IV cycle salvage pathway and all steps in the PNC VI cycle salvage pathway (save for pncA, which is present in a truncated form), but does not retain the repertoire needed for de novo synthesis (nadA and nadB are present in truncated forms). Psp has the most severely degraded NADP+ pathways: the early steps of de novo synthesis (nadA, nadB, and nadC) are not found in the genome, pncA (from the PNC VI cycle salvage pathway) and pncC (from the PNC IV cycle salvage pathway) are also not found in the genome, and Psp also seems to have lost both sthA and pntA/pntB.

[Figure 1.3 panels: thiamin (B1); riboflavin (B2); NADP+ (B3); coenzyme A (B5); pyridoxal 5′-phosphate (B6); biotin (B7); folic acid (B9); cobalamin (B12); lipoic acid]

Figure 1.3: Representation of each metabolic pathway involved in the biosynthesis of the B vitamin complex as found in Gellertiella and Reichenowia. Bolded names represent start or end-point compounds and black circles represent intermediates not named to conserve space. All arrows represent the conversion of one intermediate to the next, with the gene(s) responsible for producing the enzyme depicted either above or below the reaction arrow. Steps for which the enzyme’s gene(s) was not found are highlighted with a red box and denoted by gray, dotted reaction arrows. Asterisks above the name of a particular gene denote that this gene is not present in all strains examined. Grey names and arrows represent steps for which I have found a potential replacement enzyme that has not yet been demonstrated to have the action indicated. Abbreviations are as follows: aminoimidazole ribotide (AIR), L-cysteine (L-Cys), D-glyceraldehyde 3-phosphate (G-3P), thiamin monophosphate (TMP), thiamin diphosphate (TPP), guanosine 5′-triphosphate (GTP), flavin mononucleotide (FMN), flavin adenine dinucleotide (FAD), pyridoxal 5′-phosphate (PLP), L-aspartate (L-Asp), nicotinamide adenine dinucleotide (NAD+), nicotinamide adenine dinucleotide phosphate (NADP+), L-valine (L-Val), acyl-carrier protein ([acp]).

Regarding the coenzyme A pathway (B5), the genes panE and panD were not found in the genomes of Gellertiella or Reichenowia. The activity of panE has been found to be replicated by ilvC, another more general reductase [88], which is found in all taxa. For panD, a recent discovery shows that the gene amaB, which converts 3-ureidopropionic acid to β-alanine as part of the uracil degradation pathway, is found in a majority of alphaproteobacteria lacking a panD homolog [89]. 
In line with this finding, all strains (with the exception of Psp) retain a copy of amaB that likely produces the β-alanine needed for the synthesis of pantothenate.

The gene pdxB, from the pyridoxal 5′-phosphate (B6) pathway, was another ‘essential’ gene missing from the genomes of Gellertiella and Reichenowia. A BLASTp search against the GenBank non-redundant (nr) database, using the filter ‘Rhizobiaceae (taxid:82115)’ and the E. coli form of pdxB as the query sequence, only returned hits with relatively low similarity (E-values > 1e−30) that were largely annotated as general oxidoreductases and dehydrogenases, so it is unsurprising that this enzyme is not found in the genomes of Gellertiella and Reichenowia. Instead, all strains retain a putative pdxR/smc00985 enzyme which is predicted to carry out a reaction equivalent to that of pdxB in E. coli.

In the case of the biotin pathway (B7), bioC and bioH, which are responsible for initiating the first steps of biotin biosynthesis [90], are missing in all taxa. The genes bioM, bioN, and bioY (implicated in the transport of biotin for certain rhizobial taxa [91, 92]) were also absent in all taxa. The gene bioI (which encodes a cytochrome P450 [93]) can also generate the pimeloyl moiety required for biotin synthesis (as an alternative to bioC and bioH), and G. hungarica, Pmon, Ppar, and Ppha possess sequences encoding a cytochrome P450 which bears a moderate similarity to the canonical bioI (E-values of 2e−64 to 3e−69). This sequence is not found in Psp and is present in a truncated form in Pmul and Prug.

In the folic acid pathway (B9), phoA (discussed above), pabA, and pabC are missing from the metabolic repertoire of Gellertiella and Reichenowia. The case for pabC is similar to that of ybjI, yigB, and pdxB, where the absence of the canonical protein can be applied more generally to Rhizobiaceae. A BLASTp search against the GenBank non-redundant (nr) database, using the filter ‘Rhizobiaceae (taxid:82115)’ and the B. subtilis form of pabC as the query, returned hits with E-values > 1e−30 (with the exception of WP_076846432.1, a class IV aminotransferase from Agrobacterium tumefaciens with an E-value of 9e−59). Instead, a homolog of the gene smc04041 (which is adjacent to pabB and exhibits some sequence similarity with the canonical pabC) might be fulfilling the role of the seemingly missing pabC in the strains examined [94]. In accordance with this prediction, this gene is retained in all strains and its position adjacent to pabB is maintained, strongly supporting the homology of these sequences.

G. hungarica possesses the full set of genes encoding the enzymes needed to synthesize cobalamin from the starting point of glycine, utilizing the late-insertion (a.k.a. aerobic) sub-pathway for corrin ring synthesis. In contrast, none of the strains of Reichenowia retain this ability. Instead, each of these taxa possesses pathways with rampant pseudogenization and a lack of evidence for various genes essential for the proper function of this pathway.
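The reasoning used throughout this subsection, in which a pathway step counts as filled when either the canonical gene or a known alternative is present, can be sketched as follows. The step names, data layout, and the two gene inventories are hypothetical and serve only to illustrate the logic; they are not the annotation results for any strain. The alternatives encoded here (ilvC for panE, amaB for panD, and nadK, sthA, or pntA plus pntB for the final NADP+ step) follow the text above.

```python
from typing import Dict, FrozenSet, List, Set

# Each step lists alternative gene sets; one complete set is enough to fill it.
Alternatives = List[FrozenSet[str]]

PATHWAY_STEPS: Dict[str, Alternatives] = {
    "panE step": [frozenset({"panE"}), frozenset({"ilvC"})],
    "panD step": [frozenset({"panD"}), frozenset({"amaB"})],
    "NAD+ -> NADP+": [
        frozenset({"nadK"}),
        frozenset({"sthA"}),
        frozenset({"pntA", "pntB"}),  # both subunits are required together
    ],
}


def unfilled_steps(steps: Dict[str, Alternatives], genes: Set[str]) -> List[str]:
    """Return the steps for which no alternative gene set is fully present."""
    return [
        name
        for name, alternatives in steps.items()
        if not any(alt <= genes for alt in alternatives)
    ]


# Hypothetical gene inventories (not taken from any genome in this study).
inventory_complete = {"ilvC", "amaB", "nadK", "pntA", "pntB"}
inventory_reduced = {"ilvC", "pntA"}

holes_complete = unfilled_steps(PATHWAY_STEPS, inventory_complete)
holes_reduced = unfilled_steps(PATHWAY_STEPS, inventory_reduced)
```

In the reduced inventory, pntA alone does not fill the final NADP+ step because its partner subunit pntB is absent, which mirrors how a truncated or partial gene set is treated as a pathway hole.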

1.4 Discussion

1.4.1 Taxonomic status of Candidatus Reichenowia

Given the evidence presented in this study, I recommend that further studies of Reichenowia adopt the taxonomy established in [83] and refer to these alphaproteobacterial symbionts of Placobdella as members of the genus Gellertiella. Despite its earlier use, the name Reichenowia does not take precedence in this case due to its Candidatus (and thereby unrecognized) status [84]. However, the degree to which this genus should be subdivided (or not) into multiple species remains unclear. From the perspective of 16S identity alone, there exist, at a minimum, three genetic ‘species’ of Gellertiella: Psp, Prug, and Ppha + G. hungarica + Pmon + Pmul. Furthermore, the ANI scores suggest that each taxon represents a distinct species (Table 1.3). However, the derived nature of the endosymbiotic Gellertiella genomes could exaggerate the divergence implied by the ANI values, given that much of the genome is presumably derived from the same genetic background as G. hungarica but is under relaxed selection as a consequence of a host-associated lifestyle. Pending further insight regarding the ecology of endosymbiotic members of Gellertiella (e.g., evidence that these endosymbionts are inherited in a strictly vertical manner and thus are evolving separately from other populations of Gellertiella), I adopt a more conservative approach and suggest that samples of Gellertiella taken from different host species be regarded as strains of G. hungarica, albeit with the recognition that a greater understanding of this system could alter this taxonomy substantially.

1.4.2 The functional role of endosymbiotic Gellertiella

The results presented in Figure 1.3 provide the first genome-level evidence to suggest that the symbiotic members of Gellertiella act as provisioners of the B vitamin complex for their hosts. This is consistent with the hypothesis that nutrient provisioning is a role adopted by distinct groups of leech endosymbionts and that this niche is not simply restricted to the gammaproteobacterial symbiont of Haementeria officinalis [20]. To my knowledge, this represents the first known case of an alphaproteobacterial nutritional symbiont being found outside of the order Rickettsiales (a group composed of bacteria already specialized for an intracellular lifestyle [95]), with the known cases representing symbionts of some species of ticks (originating from the genus Rickettsia [96, 97]) and the Wolbachia symbionts of Cimex lectularius (bed bug) [98]. These results also suggest that a more in-depth analysis of the B vitamin provisioning capabilities of the Rickettsia known to infect other genera of Glossiphoniiformes may prove fruitful in solidifying the evidence for convergence among disparate leech hosts and their (similarly, if not more disparate) endosymbionts [99]. As stated in the results (see subsection 1.3.4), Gellertiella, with some exceptions, retains a largely complete genetic repertoire for the de novo synthesis of B vitamins (with the notable exception of B12 [cobalamin]). This places the endosymbiotic members of Gellertiella in a polyphyletic group of bacteria which, primarily based on genomic evidence, retain the ability to synthesize B vitamins (presumably for their eukaryotic hosts) despite possessing a reduced genome compared to free-living relatives [4, 97, 98, 100–102]. That being said, although these pathways remain largely conserved across various endosymbiotic strains of Gellertiella, some losses (or putative absences) warrant further discussion. 
Regarding phoA, it seems likely that its loss is compensated for by a general phosphatase found elsewhere in the genome (for a discussion of phosphatase specificity, see [103]). PhoD [104] is a potential candidate for this replacement and is a relatively closely related phosphatase found in many bacteria. Sequences with weak similarity to phoD (E-values of 2e−33 to 6e−11) were found in G. hungarica, Pmon, Pmul, and Prug, likely representing homologs of smc03243 [105]. Similarly, the genes ybjI and yigB also function as phosphatases, but represent members of the haloacid dehalogenase superfamily [106]. The step in the biosynthesis of riboflavin catalyzed by these enzymes has only been characterized quite recently and was shown in E. coli [106], so it is likely that this is a case where the enzyme used to catalyze this step in distantly related taxa is not yet known. This presumption is also supported by the fact that the haloacid dehalogenase superfamily is notoriously promiscuous with regard to its substrate specificity [103], and so I hypothesized that another member of this superfamily may be conserved (and potentially catalyzing this step) across Gellertiella and Reichenowia. This is why serB has been annotated as possibly catalyzing this step in Figure 1.3. Although it is likely that phoA is compensated for by a more general phosphatase (see above), the situation with pabA and pabC is less clear. The trpG subunit shares a common evolutionary origin with pabA [107], with both genes encoding a glutamine amidotransferase that combines with trpE (in the case of trpG) or pabB (in the case of pabA) to generate anthranilate or aminodeoxychorismate (a precursor of para-aminobenzoic acid), respectively [108]. In Bacillus subtilis the pabA subunit tends to be rather promiscuous and can form a holoenzyme with either trpE or pabB [109, 110]. 
Although speculative, this promiscuity might also occur in the homologous trpG, which is retained in the form of a trpEG fusion protein (as described in [111]) in each strain examined. Some absences of putatively necessary genes have been demonstrated in other symbiotic systems previously. The absence of phoA in all strains of Gellertiella and the absence (or truncated presence) of nadA, nadB, and/or nadC in Pmul, Prug, and Psp (for a detailed breakdown, see subsection 1.3.4) are of particular note in relation to the findings of [101], in which the primary endosymbionts of Melophagus ovinus (sheep ked), Glossina morsitans (tsetse fly), and Pediculus humanus (human louse) are observed to be missing phoA as well as nadABC. In the case of M. ovinus and G. morsitans, secondary symbionts appear to make up for this loss, but this is not the case for P. humanus, which does not harbour a secondary endosymbiont. In the case of ilvC, despite the fact that it appears to be a less productive replacement for panE in E. coli [112], this replacement has been noted previously in the aphid endosymbiont Buchnera aphidicola, where its activity was confirmed with functional complementation assays in E. coli [113]. Given the variable presence of the potential bioI homolog, the sequences found are likely not (solely, at least) responsible for the generation of the pimeloyl moiety necessary for biotin production. However, the synthesis of biotin from the precursor 8-amino-7-oxononanoate is not unprecedented: evidence for this alternate route of production has been found in other endosymbionts (e.g., various strains of primary and secondary aphid symbionts [114]). These convergent losses show that, although the loss of these genes may still be detrimental to the host, there is precedent for metabolic pathways to be truncated in this manner in other nutrient-provisioning endosymbionts.
These ‘losses with precedent’ notwithstanding, there are other intriguing possibilities regarding the fate of these missing steps. One such possibility is that genes such as ribA (missing in Psp and present but truncated in Prug) have been horizontally transferred to the host, which is then able to compensate for the loss or pseudogenization of the bacterial sequence. This phenomenon is far from unprecedented (e.g., [115–117] and see [118]) and has been observed in multiple endosymbiotic systems where genes of nutritional importance are found to be laterally transferred to the host: the most notable of these is perhaps the nested, tripartite mealybug system in which the host has acquired genes for biotin and riboflavin synthesis from bacteria that it is not currently associated with [119]. Another somewhat similar, but distinct, possibility is that a certain degree of metabolic redundancy exists between the bacterium and the host (i.e., the host is able to catalyze reactions carried out by enzymes which are not found in the more derived endosymbionts).
Such a case has been observed in the aphid-Buchnera symbiosis, in which the final step in a multitude of essential amino acid synthesis pathways is carried out via host genes which are up-regulated in bacteriocytes [120]. Interestingly, ribA, nadABC, and amaB represent the first step(s) of their respective pathways (or pathway branches in the case of amaB, see Figure 1.3). Yet another possibility is that, rather than reflecting an adaptive scenario, these endosymbionts have simply begun to lose essential elements of their genomes due to the stochastic nature of genome reduction in endosymbiotic taxa [121]. This phenomenon often leads to replacement of the symbiont (or the hosting of a co-obligate symbiont), as is demonstrated in [122] and [119], respectively. Although this study finds no evidence of a co-obligate symbiont in any of the species of Placobdella sampled (data not shown), the possibility that Gellertiella (or other unknown) endosymbionts have been replaced by the host over the course of evolutionary history cannot be ruled out (especially when considering the lack of evidence for co-speciation found in [24]). However, to discriminate between any of these hypotheses for the replacement of lineage-specific pathway holes, sequencing of the host species' genomes will be necessary.

1.4.3 Phylogenomic placement of Gellertiella

The results of the phylogenetic analyses are consistent with the results of [16, 22, 23] and [24] insofar as all of these place Gellertiella (formerly Reichenowia) in the family Rhizobiaceae. In this study, I aimed to drastically increase the taxon sampling within this family to try to determine the sister group of Gellertiella, as this relationship has only been recovered with weak-to-moderate support in previous phylogenies [23, 24]. Despite my increased sampling, a strongly supported sister group for Gellertiella is not found. Instead, Gellertiella is recovered weakly as the sister to a clade containing Allorhizobium, Ciceribacter, Agrobacterium, Neorhizobium, Pseudorhizobium, members of the ‘R. aggregatum complex’ [81], R. pseudoryzae, and R. rhizoryzae (MBS = 46–90%). This sister relationship differed slightly in the trees that were generated from the ribosomal dataset and reconstructed using Bayesian methods; in this case, Gellertiella was found, again with relatively weak support, as the sister to a clade composed of Rhizobium sensu stricto which, in turn, was sister to the clade described above (MBS = 85%, PP = 95–98%). Perhaps the most exciting (yet not entirely unprecedented) result is the finding that Gellertiella hungarica, a culturable, free-living bacterium [83], forms a fully supported (and generally topologically consistent, see subsection 1.3.3) clade with its endosymbiotic counterparts (formerly Reichenowia). Although a free-living relative of the former Reichenowia has remained elusive until now, Manglicmot et al. 2020 [24] chose to include a 16S rRNA sequence from an enriched sludge sample [123] that bore high similarity to that of Reichenowia.
Interestingly, the basal branches of the Gellertiella clade share the same topology as that found previously, wherein Ppha is found as the sister to the remainder of the clade and, within that subclade, the (putatively, in the case of [24]) free-living relative is found as the sister to the remainder of the endosymbiotic taxa. Given the results presented herein, it is highly likely that this sequence isolated from the sludge sample represents Gellertiella hungarica, or another closely related free-living member of Gellertiella. These results will have important implications for the further study of leech endosymbionts, particularly those of Placobdella spp., which are discussed fully in subsection 1.4.4.

1.4.4 The origin of endosymbiosis in Gellertiella

After longstanding speculation regarding the free-living relative of endosymbiotic Gellertiella (i.e., Reichenowia) [16, 22–24], the phylogenomic and similarity-based results make clear that the endosymbionts of Placobdella analysed to date represent members of the genus Gellertiella. In the case of at least some of these taxa (e.g., Ppha), the similarity results suggest that they have descended from a free-living ancestor shared with G. hungarica and may, in fact, represent strains of G. hungarica rather than distinct species within the genus. Pending further ecological data that support a lack of hybridization between these taxa (e.g., evidence of strict, host-intraspecific, vertical transfer), I regard these taxa as strains of G. hungarica (see subsection 1.4.1). Knowledge of the free-living relative(s) of an endosymbiotic taxon is immensely important for a variety of reasons. Chief among these is the fact that the genomic structure of obligate endosymbionts is inherently derived in nature and no known examples exist of an endosymbiotic taxon re-establishing the genomic architecture and repertoire necessary for a fully free-living life-history [9, 124]. For this reason, an understanding of the free-living relatives can provide a greater understanding of the ‘genetic background’ from which these endosymbiotic relatives have evolved, easing the process of comparative analyses and providing testable predictions regarding the metabolic limits of the more derived relatives. Additionally, one of the major questions in the field of endosymbiosis research is what processes and patterns are observed during the transition from a free-living bacterium to an intracellular endosymbiont [2]. In this regard, systems in which putatively evolutionarily recent examples of transitions to endosymbiosis are present along with their free-living relatives are of great utility.
Indeed, this appears to be the case within Gellertiella, where members such as G. hungarica Ppha bear a high similarity (and phylogenetic affinity) to G. hungarica (see subsection 1.3.2). Furthermore, what is observed across the taxa sampled is a gradient of genome reduction (Table 1.3; preliminarily confirmed on a functional level via the annotation of B vitamin synthesis pathways, Figure 1.3), which is suggestive of instances of symbiosis in Gellertiella that vary in their time of establishment (i.e., with the most derived symbionts representing the oldest, most stable instances of symbiosis in the genus and vice versa with the taxa possessing the most ‘free-living-like’ genomes). Another striking feature of this system is the extensive morphological modification that has occurred between even the most genomically similar members of Gellertiella. G. hungarica displays a rather typical morphology in that it is rod-shaped and motile via a single, polar flagellum [83]. In contrast, the endosymbiotic members of Gellertiella are pleomorphic and non-motile (no flagella are observed) ([24], unpublished data). Another tantalizing suggestion from my results is the possibility that Gellertiella has evolved to fill a symbiotic niche with Placobdella on more than one occasion (i.e., convergent genome reduction/degradation). This result is suggested by the topologies recovered in the phylogenomic analyses, in which Ppha is recovered as the sister to a clade containing G. hungarica and the remainder of endosymbiotic Gellertiella (G. hungarica is the sister to this remainder). These topologies are suggestive of at least two independent origins of endosymbiosis within the genus Gellertiella, with both potentially stemming from an ancestor of G. hungarica. In this scenario, the first instance is represented by Ppha alone, whereas the second instance presumably involves the remainder of endosymbiotic Gellertiella sampled herein.
This would be very similar (though less extensive in comparison) to the scenario observed in [125], with one notable distinction being that the functional basis of the Placobdella-Gellertiella symbiosis is thought to be understood in greater detail than that of the Euplotes-Polynucleobacter symbiosis. In this way, it could be that Gellertiella represents a group of ‘symbiotically primed’ taxa similar to what has been observed in [126]. While this system bears potentially superficial similarities to those of the Pentatomid stinkbugs [127], Trichonympha protists [128], and Coxiella tick symbionts [129], it would be distinct from these examples in that the various lineages of Gellertiella possess genomes in drastically different stages of reduction/degradation. Owing to the derived nature of these genomes, I can more or less exclude the possibility that the ancestor of Gellertiella was similar to Ppha (i.e., a relatively low predicted coding density driven by pervasive pseudogenization) and that the genome of G. hungarica was ‘reformed’ from these fragmentary elements. It is, however, important to note that drawing a direct comparison between the age of an endosymbiont and the level of genome reduction and/or degradation it displays can be a vast oversimplification of the processes involved (for example, [130–132]). This appears to be especially true in aquatic systems (of which ours represents an example), where the horizontal transfer of symbionts is relatively common (e.g., [133]) and can result in genomes that are intermediate in size regardless of the age of the association [11]. While the possibility that endosymbiosis has originated once in Gellertiella cannot (yet) be ruled out, it is important to note that the typical phylogenetic artefacts expected to be observed in endosymbionts (e.g., long branch attraction [134]) would be expected to draw these two endosymbiotic groups together rather than pull them apart as is observed in the recovered topologies. In accordance with these predictions are the results of the various topology tests performed (shown in Table A.5). For each maximum likelihood tree reconstructed, its constrained counterpart, which necessarily rendered the endosymbiotic Gellertiella (i.e., Reichenowia) strains monophyletic, was found to be a significantly worse explanation of the data. The one exception to this was the ribosomal, unpartitioned, nucleotide-coded dataset, in which both the original and constrained tree were accepted. However, given the results discussed above (see subsection 1.3.3), this finding was not surprising.
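
The reasoning behind the two-origin interpretation can be illustrated with a small sketch. It assumes that the ancestor of the clade was free-living and that endosymbiosis is never reversed (as argued above), so the minimum number of origins equals the number of maximal subtrees whose tips are all endosymbiotic. The Python below is only an illustration under those assumptions, not the analysis performed in this study. The taxon labels are shorthand taken from this section, the relationships among the strains other than Ppha are collapsed into a hypothetical polytomy, and the function names are invented for this example.

```python
# Illustrative only: trees are nested tuples, tips are strings.
# Labels are shorthand from this section; the polytomy is a placeholder.
ENDOSYMBIONTS = {"Ppha", "Pmon", "Pmul", "Prug", "Psp"}

# Topology described in the text: Ppha sister to (G. hungarica + the rest).
RECOVERED = ("Ppha", ("G_hungarica", ("Pmon", "Pmul", "Prug", "Psp")))

# Constrained alternative: all endosymbiotic strains monophyletic.
CONSTRAINED = ("G_hungarica", ("Ppha", ("Pmon", "Pmul", "Prug", "Psp")))


def _scan(tree, endosymbionts):
    """Return (all_tips_endosymbiotic, origins_needed) for a subtree."""
    if isinstance(tree, str):
        is_endo = tree in endosymbionts
        return is_endo, (1 if is_endo else 0)
    results = [_scan(child, endosymbionts) for child in tree]
    if all(is_endo for is_endo, _ in results):
        # A wholly endosymbiotic clade needs one origin on its stem branch.
        return True, 1
    # Otherwise the node is free-living (no reversals allowed), so each
    # endosymbiotic lineage below it needs its own origin.
    return False, sum(origins for _, origins in results)


def min_origins(tree, endosymbionts):
    """Minimum origins of endosymbiosis, assuming a free-living root
    and no reversion to a free-living lifestyle."""
    return _scan(tree, endosymbionts)[1]


if __name__ == "__main__":
    print("recovered topology:  ", min_origins(RECOVERED, ENDOSYMBIONTS))
    print("constrained topology:", min_origins(CONSTRAINED, ENDOSYMBIONTS))
```

Under these assumptions the recovered topology requires two origins, whereas the constrained topology, in which the endosymbiotic strains are monophyletic, requires only one. Which of the two arrangements the data support is the question addressed by the topology tests in Table A.5.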

1.4.5 Outstanding questions and future directions

Proximately, I hope to curate full genome annotations for each strain of Gellertiella analyzed in this study. This will make it possible to assess aspects of the genome such as coding density and the predicted whole-cell metabolism, which are currently unknown for these symbionts. I would also like to add P. costata (a Palearctic Placobdella) to our dataset, if possible, to gain a more complete understanding of the endosymbionts across Placobdella’s range. Given the wide range of environments Gellertiella appears to be present in, and the vast geographic distance between P. costata and its Nearctic relatives, it would be unsurprising if this species harboured a symbiont that has originated independently of the other strains analyzed in this study. Another crucial piece of data that will help to further our understanding of this system is a full genome sequence for each host species. The phylogenetic relationships within Placobdella are still not understood in great detail, possibly due to the genetic markers used thus far. Obtaining genomes for these species would allow us to more confidently assess the possibility of strict co-speciation between Placobdella and Gellertiella. In addition, the loss of genes such as ribA in Psp and Prug (which are presumably crucial in maintaining an optimally functioning symbiosis) could be compensated for by host sequences, but without host genomes, ruling out this possibility will remain difficult. Ultimately, there are other key questions which remain enigmatic with regard to even the basic biology of the system at hand. One such example is the method of transmission of Gellertiella symbionts. Fully annotated genomes might point us in the right direction: for example, Psp appears to show at least some of the hallmarks of an obligate, vertically transmitted symbiont, but ecological studies will be necessary to corroborate these findings.
Another outstanding question is the degree to which populations of Gellertiella differ from each other (either within a bacteriome, an individual leech, or some environmental pool). Beyond these questions, the emerging Placobdella-Gellertiella system is likely an exceptional one in that the evidence presented here points rather strongly towards multiple, potentially recent, origins of endosymbiosis within Placobdella. Furthermore, a close relative of these symbionts has been effectively cultured and is available as part of the German Collection of Microorganisms and Cell Cultures, facilitating future experimentation. Since these endosymbionts (for the most part) display genomes that are consistent with an evolutionarily young origin, Gellertiella may prove to be a taxon that plays a crucial role in furthering our understanding of the early stages of endosymbiosis, for which readily available examples remain scarce.

Bibliography

1. McCutcheon, J. P. & Moran, N. A. Extreme genome reduction in symbiotic bacteria. Nature Reviews Microbiology 10, 13–26. issn: 1740-1526, 1740-1534. doi:10.1038/nrmicro2670 (Jan. 2012).
2. Wernegreen, J. J. Endosymbiosis. Current Biology 22, R555–R561. issn: 09609822. doi:10.1016/j.cub.2012.06.010 (July 2012).
3. Yang, J. C. et al. The Complete Genome of Teredinibacter turnerae T7901: An Intracellular Endosymbiont of Marine Wood-Boring Bivalves (Shipworms). PLoS ONE 4 (ed Ahmed, N.) e6085. issn: 1932-6203. doi:10.1371/journal.pone.0006085 (July 2009).
4. Smith, T. A., Driscoll, T., Gillespie, J. J. & Raghavan, R. A Coxiella-Like Endosymbiont Is a Potential Vitamin Source for the Lone Star Tick. Genome Biology and Evolution 7, 831–838. issn: 1759-6653. doi:10.1093/gbe/evv016 (Mar. 2015).
5. Manzano-Marín, A., Simon, J.-C. & Latorre, A. Reinventing the Wheel and Making It Round Again: Evolutionary Convergence in Buchnera–Serratia Symbiotic Consortia between the Distantly Related Lachninae Aphids Tuberolachnus salignus and Cinara cedri. Genome Biology and Evolution 8. Publisher: Oxford Academic, 1440–1458. doi:10.1093/gbe/evw085 (May 2016).
6. Moran, N. A. & Wernegreen, J. J. Lifestyle evolution in symbiotic bacteria: insights from genomics. Trends in Ecology & Evolution 15, 321–326. issn: 01695347. doi:10.1016/S0169-5347(00)01902-9 (Aug. 2000).
7. Nogge, G. Significance of symbionts for the maintenance of an optimal nutritional state for successful reproduction in hematophagous arthropod. Parasitology 82, 101–104 (Sept. 1980).
8. Latorre, A. & Manzano-Marín, A. Dissecting genome reduction and trait loss in insect endosymbionts. Annals of the New York Academy of Sciences 1389, 52–75. issn: 00778923. doi:10.1111/nyas.13222 (Feb. 2017).
9. Bennett, G. M. & Moran, N. A. Heritable symbiosis: The advantages and perils of an evolutionary rabbit hole. Proceedings of the National Academy of Sciences 112, 10169–10176. issn: 0027-8424, 1091-6490. doi:10.1073/pnas.1421388112 (Aug. 2015).
10. Hansen, A. K. & Moran, N. A. Altered tRNA characteristics and 3′ maturation in bacterial symbionts with reduced genomes. Nucleic Acids Research 40, 7870–7884. issn: 1362-4962, 0305-1048. doi:10.1093/nar/gks503 (Sept. 2012).
11. Russell, S. L., Pepper-Tunick, E., Svedberg, J., Byrne, A., Ruelas Castillo, J., Vollmers, C., Beinart, R. A. & Corbett-Detig, R. Horizontal transmission and recombination maintain forever young bacterial symbiont genomes. PLOS Genetics 16 (ed Didelot, X.) e1008935. issn: 1553-7404. doi:10.1371/journal.pgen.1008935 (Aug. 2020).
12. McCutcheon, J. P. & von Dohlen, C. D. An Interdependent Metabolic Patchwork in the Nested Symbiosis of Mealybugs. Current Biology 21, 1366–1372. issn: 09609822. doi:10.1016/j.cub.2011.06.051 (Aug. 2011).
13. Graf, J., Kikuchi, Y. & Rio, R. V. Leeches and their microbiota: naturally simple symbiosis models. Trends in Microbiology 14, 365–371. issn: 0966842X. doi:10.1016/j.tim.2006.06.009 (Aug. 2006).
14. Tessler, M., de Carle, D., Voiklis, M. L., Gresham, O. A., Neumann, J. S., Cios, S. & Siddall, M. E. Worms that suck: Phylogenetic analysis of Hirudinea solidifies the position of Acanthobdellida and necessitates the dissolution of Rhynchobdellida. Molecular Phylogenetics and Evolution 127, 129–134. issn: 10557903. doi:10.1016/j.ympev.2018.05.001 (Oct. 2018).
15. Kikuchi, Y., Bomar, L. & Graf, J. Stratified bacterial community in the bladder of the medicinal leech, Hirudo verbana. Environmental Microbiology 11, 2758–2770. issn: 14622912, 14622920. doi:10.1111/j.1462-2920.2009.02004.x (Oct. 2009).
16. Perkins, S. L., Budinoff, R. B. & Siddall, M. E. New Gammaproteobacteria Associated with Blood-Feeding Leeches and a Broad Phylogenetic Analysis of Leech Endosymbionts. Applied and Environmental Microbiology 71, 5219–5224. issn: 0099-2240, 1098-5336. doi:10.1128/AEM.71.9.5219-5224.2005 (Sept. 2005).
17. Kikuchi, Y. & Fukatsu, T. Rickettsia Infection in Natural Leech Populations. Microbial Ecology 49, 265–271. issn: 0095-3628, 1432-184X. doi:10.1007/s00248-004-0140-5 (Feb. 2005).
18. Siddall, M. E., Budinoff, R. B. & Borda, E. Phylogenetic evaluation of systematics and biogeography of the leech family Glossiphoniidae. Invertebrate Systematics 19, 105–112. issn: 1445-5226. doi:10.1071/IS04034 (2005).
19. Kikuchi, Y. & Fukatsu, T. Endosymbiotic Bacteria in the Esophageal Organ of Glossiphoniid Leeches. Applied and Environmental Microbiology 68, 4637–4641. issn: 0099-2240, 1098-5336. doi:10.1128/AEM.68.9.4637-4641.2002 (Sept. 2002).
20. Manzano-Marín, A., Oceguera-Figueroa, A., Latorre, A., Jiménez-García, L. F. & Moya, A. Solving a Bloody Mess: B-Vitamin Independent Metabolic Convergence among Gammaproteobacterial Obligate Endosymbionts from Blood-Feeding Arthropods and the Leech Haementeria officinalis. Genome Biology and Evolution 7, 2871–2884. issn: 1759-6653. doi:10.1093/gbe/evv188 (Oct. 2015).
21. De Carle, D., Oceguera-Figueroa, A., Tessler, M., Siddall, M. E. & Kvist, S. Phylogenetic analysis of Placobdella (Hirudinea: Rhynchobdellida: Glossiphoniidae) with consideration of COI variation. Molecular Phylogenetics and Evolution 114, 234–248. issn: 10557903. doi:10.1016/j.ympev.2017.06.017 (Sept. 2017).
22. Siddall, M. E., Perkins, S. L. & Desser, S. S. Leech mycetome endosymbionts are a new lineage of alphaproteobacteria related to the Rhizobiaceae. Molecular Phylogenetics and Evolution 30, 178–186. issn: 10557903. doi:10.1016/S1055-7903(03)00184-2 (Jan. 2004).
23. Kvist, S., Narechania, A., Oceguera-Figueroa, A., Fuks, B. & Siddall, M. E. Phylogenomics of Reichenowia parasitica, an Alphaproteobacterial Endosymbiont of the Freshwater Leech Placobdella parasitica. PLoS ONE 6 (ed Planet, P. J.) e28192. issn: 1932-6203. doi:10.1371/journal.pone.0028192 (Nov. 2011).
24. Manglicmot, C., Oceguera-Figueroa, A. & Kvist, S. Bacterial endosymbionts of Placobdella (Annelida: Hirudinea: Glossiphoniidae): phylogeny, genetic distance, and vertical transmission. Hydrobiologia 847, 1177–1194. issn: 0018-8158, 1573-5117. doi:10.1007/s10750-019-04175-z (Feb. 2020).
25. Sadowsky, M. J. & Graham, P. H. in The Rhizobiaceae (eds Spaink, H. P., Kondorosi, A. & Hooykaas, P. J. J.) 155–172 (Springer Netherlands, Dordrecht, 1998). isbn: 978-0-7923-5180-1 978-94-011-5060-6. doi:10.1007/978-94-011-5060-6_8.
26. Bright, M. & Bulgheresi, S. A complex journey: transmission of microbial symbionts. Nature Reviews Microbiology 8, 218–230. issn: 1740-1526, 1740-1534. doi:10.1038/nrmicro2262 (Mar. 2010).
27. Klemm, D. J. Leeches (Annelida: Hirudinea) of North America (Environmental Monitoring and Support Laboratory, U.S. Environmental Protection Agency, Cincinnati, Ohio, 1982).
28. Sawyer, R. T. Leech Biology and Behaviour (Oxford Science Publications, 2/8 Morfa Road, Swansea, SA1 2HT, UK, 1986).
29. Moser, W. E., Richardson, D. J., Wheeler, B. A., Irwin, K. J., Daniels, B. A., Trauth, S. E. & Klemm, D. J. Placobdella cryptobranchii (Rhynchobdellida: Glossiphoniidae) on Cryptobranchus alleganiensis bishopi (Ozark Hellbender) in Arkansas and Missouri. Comparative Parasitology 75, 98–101. issn: 1525-2647. doi:10.1654/4300.1 (Jan. 2008).
30. Moser, W. E., Richardson, D. J., Hammond, C. I. & Lazo-Wasem, E. A. Redescription of Placobdella ornata (Verrill, 1872) (Hirudinida: Glossiphoniidae). Bulletin of the Peabody Museum of Natural History 53, 325–330. issn: 0079-032X. doi:10.3374/014.053.0103 (Apr. 2012).
31. Moser, W. E., Richardson, D. J., Hammond, C. I. & Lazo-Wasem, E. A. Redescription of Placobdella papillifera Verrill, 1872 (Hirudinida: Glossiphoniidae). Bulletin of the Peabody Museum of Natural History 54, 125–131. issn: 0079-032X, 2162-4135. doi:10.3374/014.054.0105 (Apr. 2013).
32. Moser, W. E., Richardson, D. J., Hammond, C. I. & Lazo-Wasem, E. A. Redescription of Placobdella parasitica (Say, 1824) Moore, 1901 (Hirudinida: Glossiphoniidae). Bulletin of the Peabody Museum of Natural History 54, 255–262. issn: 0079-032X, 2162-4135. doi:10.3374/014.054.0203 (Oct. 2013).
33. Moser, W. E., Richardson, D. J., Hammond, C. I. & Lazo-Wasem, E. A. Redescription and Molecular Characterization of Placobdella hollensis (Whitman, 1892) (Hirudinida: Glossiphoniidae). Bulletin of the Peabody Museum of Natural History 55, 49–54. issn: 0079-032X, 2162-4135. doi:10.3374/014.055.0104 (Apr. 2014).
34. Moser, W. E., Richardson, D. J., Hammond, C. I., Govedich, F. R. & Lazo-Wasem, E. A. Resurrection and Redescription of Placobdella rugosa (Verrill, 1874) (Hirudinida: Glossiphoniidae). Bulletin of the Peabody Museum of Natural History 53, 375–381. issn: 0079-032X. doi:10.3374/014.053.0203 (Oct. 2012).
35. Schmieder, R. & Edwards, R. Quality control and preprocessing of metagenomic datasets. Bioinformatics 27, 863–864. issn: 1367-4803, 1460-2059. doi:10.1093/bioinformatics/btr026 (Mar. 2011).
36. Bankevich, A. et al. SPAdes: A New Genome Assembly Algorithm and Its Applications to Single-Cell Sequencing. Journal of Computational Biology 19, 455–477. issn: 1066-5277, 1557-8666. doi:10.1089/cmb.2012.0021 (May 2012).
37. Wick, R. R., Schultz, M. B., Zobel, J. & Holt, K. E. Bandage: interactive visualization of de novo genome assemblies. Bioinformatics 31, 3350–3352. issn: 1367-4803, 1460-2059. doi:10.1093/bioinformatics/btv383 (Oct. 2015).
38. Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nature Methods 9, 357–359. issn: 1548-7091, 1548-7105. doi:10.1038/nmeth.1923 (Apr. 2012).
39. Seemann, T. Prokka: rapid prokaryotic genome annotation. Bioinformatics 30, 2068–2069. issn: 1367-4803, 1460-2059. doi:10.1093/bioinformatics/btu153 (July 2014).
40. Okonechnikov, K., Golosova, O. & Fursov, M. Unipro UGENE: a unified bioinformatics toolkit. Bioinformatics 28, 1166–1167. issn: 1367-4803, 1460-2059. doi:10.1093/bioinformatics/bts091 (Apr. 2012).
41. Ogata, H., Goto, S., Sato, K., Fujibuchi, W., Bono, H. & Kanehisa, M. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Research 27, 29–34 (1999).
42. Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic Local Alignment Search Tool. Journal of Molecular Biology 215, 403–410 (1990).
43. Quevillon, E., Silventoinen, V., Pillai, S., Harte, N., Mulder, N., Apweiler, R. & Lopez, R. InterProScan: protein domains identifier. Nucleic Acids Research 33, W116–W120. issn: 0305-1048, 1362-4962. doi:10.1093/nar/gki442 (July 2005).
44. Karp, P. D. et al. The BioCyc collection of microbial genomes and metabolic pathways. Briefings in Bioinformatics 20, 1085–1093. issn: 1467-5463, 1477-4054. doi:10.1093/bib/bbx085 (July 2019).
45. Muñoz-Gómez, S. A., Hess, S., Burger, G., Lang, B. F., Susko, E., Slamovits, C. H. & Roger, A. J. An updated phylogeny of the Alphaproteobacteria reveals that the parasitic Rickettsiales and Holosporales have independent origins. eLife 8 (eds Rokas, A., Wittkopp, P. J. & Irisarri, I.) Publisher: eLife Sciences Publications, Ltd, e42535. issn: 2050-084X. doi:10.7554/eLife.42535 (Feb. 2019).
46. Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biology 20, 238. issn: 1474-760X. doi:10.1186/s13059-019-1832-y (Dec. 2019).
47. Katoh, K. & Standley, D. M. MAFFT Multiple Sequence Alignment Software Version 7: Improvements in Performance and Usability. Molecular Biology and Evolution 30, 772–780. issn: 0737-4038, 1537-1719. doi:10.1093/molbev/mst010 (Apr. 2013).
48. Suyama, M., Torrents, D. & Bork, P. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Research 34, W609–W612. issn: 0305-1048, 1362-4962. doi:10.1093/nar/gkl315 (July 2006).
49. Castresana, J. Selection of Conserved Blocks from Multiple Alignments for Their Use in Phylogenetic Analysis. Molecular Biology and Evolution 17, 540–552. issn: 1537-1719, 0737-4038. doi:10.1093/oxfordjournals.molbev.a026334 (Apr. 2000).
50. Miller, M. A., Pfeiffer, W. & Schwartz, T. The CIPRES science gateway: a community resource for phylogenetic analyses in Proceedings of the 2011 TeraGrid Conference on Extreme Digital Discovery - TG ’11 (ACM Press, Salt Lake City, Utah, 2011), 1. isbn: 978-1-4503-0888-5. doi:10.1145/2016741.2016785.
51. Loken, C. et al. SciNet: Lessons Learned from Building a Power-efficient Top-20 System and Data Centre. Journal of Physics: Conference Series 256, 012026. issn: 1742-6596. doi:10.1088/1742-6596/256/1/012026 (Nov. 2010).
52. Maddison, D. R., Swofford, D. L. & Maddison, W. P. Nexus: An Extensible File Format for Systematic Information. Systematic Biology 46 (ed Cannatella, D.) 590–621. issn: 1076-836X, 1063-5157. doi:10.1093/sysbio/46.4.590 (Dec. 1997).
53. Cock, P. J. A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423. issn: 1367-4803, 1460-2059. doi:10.1093/bioinformatics/btp163 (June 2009).
54. Glez-Peña, D., Gómez-Blanco, D., Reboiro-Jato, M., Fdez-Riverola, F. & Posada, D. ALTER: program-oriented conversion of DNA and protein alignments. Nucleic Acids Research 38, W14–W18. issn: 1362-4962, 0305-1048. doi:10.1093/nar/gkq321 (July 2010).
55. Lanfear, R., Frandsen, P. B., Wright, A. M., Senfeld, T. & Calcott, B. PartitionFinder 2: New Methods for Selecting Partitioned Models of Evolution for Molecular and Morphological Phylogenetic Analyses. Molecular Biology and Evolution 34, 772–773. issn: 0737-4038, 1537-1719. doi:10.1093/molbev/msw260 (Dec. 2016).
56. Nguyen, L.-T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies. Molecular Biology and Evolution 32, 268–274. issn: 1537-1719, 0737-4038. doi:10.1093/molbev/msu300 (Jan. 2015).
57. Ronquist, F. & Huelsenbeck, J. P. MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics 19, 1572–1574. issn: 1367-4803, 1460-2059. doi:10.1093/bioinformatics/btg180 (Aug. 2003).
58. Rambaut, A., Drummond, A. J., Xie, D., Baele, G. & Suchard, M. A. Posterior Summarization in Bayesian Phylogenetics Using Tracer 1.7. Systematic Biology 67 (ed Susko, E.) 901–904. issn: 1063-5157, 1076-836X. doi:10.1093/sysbio/syy032 (Sept. 2018).
59. Lagesen, K., Hallin, P., Rødland, E. A., Stærfeldt, H.-H., Rognes, T. & Ussery, D. W. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Research 35, 3100–3108. issn: 1362-4962, 0305-1048. doi:10.1093/nar/gkm160 (May 2007).
60. Larsson, A. AliView: a fast and lightweight alignment viewer and editor for large datasets. Bioinformatics 30, 3276–3278. issn: 1460-2059, 1367-4803. doi:10.1093/bioinformatics/btu531 (Nov. 2014).
61. Rodriguez-R, L. M. & Konstantinidis, K. T. The enveomics collection: a toolbox for specialized analyses of microbial genomes and metagenomes preprint (PeerJ Preprints, Mar. 2016). doi:10.7287/peerj.preprints.1900v1.
62. Lagkouvardos, I., Joseph, D., Kapfhammer, M., Giritli, S., Horn, M., Haller, D. & Clavel, T. IMNGS: A comprehensive open resource of processed 16S rRNA microbial profiles for ecology and diversity studies. Scientific Reports 6, 33721. issn: 2045-2322. doi:10.1038/srep33721 (Dec. 2016).
63. Jain, C., Rodriguez-R, L. M., Phillippy, A. M., Konstantinidis, K. T. & Aluru, S. High throughput ANI analysis of 90K prokaryotic genomes reveals clear species boundaries. Nature Communications 9, 5114. issn: 2041-1723. doi:10.1038/s41467-018-07641-9 (Dec. 2018).
64. Rosselló-Móra, R. & Amann, R. Past and future species definitions for Bacteria and Archaea. Systematic and Applied Microbiology 38, 209–216. issn: 07232020. doi:10.1016/j.syapm.2015.02.001 (June 2015).
65. Konstantinidis, K. T., Rosselló-Móra, R. & Amann, R. Uncultivated microbes in need of their own taxonomy. The ISME Journal 11, 2399–2406. issn: 1751-7362, 1751-7370. doi:10.1038/ismej.2017.113 (Nov. 2017).
66. Yarza, P., Yilmaz, P., Pruesse, E., Glöckner, F. O., Ludwig, W., Schleifer, K.-H., Whitman, W. B., Euzéby, J., Amann, R. & Rosselló-Móra, R. Uniting the classification of cultured and uncultured bacteria and archaea using 16S rRNA gene sequences. Nature Reviews Microbiology 12, 635–645. issn: 1740-1526, 1740-1534. doi:10.1038/nrmicro3330 (Sept. 2014).
67. Chun, J. et al. Proposed minimal standards for the use of genome data for the taxonomy of prokaryotes. International Journal of Systematic and Evolutionary Microbiology 68, 461–466. issn: 1466-5026, 1466-5034. doi:10.1099/ijsem.0.002516 (Jan. 2018).
68. De Souza, J. A. M., Alves, L. M. C. & de Mello Varani, A. in The Prokaryotes 4th ed., 135–154 (Springer, Berlin, Heidelberg, 2014). isbn: 978-3-642-30197-1.
69. Oren, A. & Xu, X.-W. in The Prokaryotes 4th ed., 247–281 (Springer, Berlin, Heidelberg, 2014). isbn: 978-3-642-30197-1. 70. Li, X., Salam, N., Li, J.-L., Chen, Y.-M., Yang, Z.-W., Han, M.-X., Mou, X., Xiao, M. & Li, W.-J. Aes- tuariivirga litoralis gen. nov., sp. nov., a proteobacterium isolated from a water sample, and proposal of Aestuariivirgaceae fam. nov. International Journal of Systematic and Evolutionary Microbiology 69, 299–306. issn: 1466-5026, 1466-5034. doi:10.1099/ijsem.0.003087 (Feb. 2019). 71. Webb, H. K., Ng, H. J. & Ivanova, E. P. in The Prokaryotes 4th ed., 341–347 (Springer, Berlin, Heidelberg, 2014). isbn: 978-3-642-30197-1. 72. Rainey, F. A. & Wiegel, J. 16S Ribosomal DNA Sequence Analysis Confirms the Close Relationship between the Genera Xanthobacter, Azorhizobium, and Aquabacter and Reveals a Lack of Phylogenetic Coherence among Xanthobacter Species. International Journal of Systematic Bacteriology 46, 607– 610. issn: 0020-7713, 1465-2102. doi:10.1099/00207713-46-2-607 (Apr. 1996). 73. Duo, J.-L., Cha, Q.-Y., Zhou, X.-K., Zhang, T.-K., Qin, S.-C., Yang, P.-X., Zhu, M.-L., Mo, M. H. & Duan, Y.-Q. Aquabacter cavernae sp. nov., a bacterium isolated from cave soil. International Journal of Systematic and Evolutionary Microbiology 69, 3716–3722. issn: 1466-5026, 1466-5034. doi:10.1099/ ijsem.0.003585 (Dec. 2019). 74. Parks, D. H., Chuvochina, M., Waite, D. W., Rinke, C., Skarshewski, A., Chaumeil, P.-A. & Hugen- holtz, P. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nature Biotechnology 36, 996–1004. issn: 1087-0156, 1546-1696. doi:10.1038/nbt.4229 (Nov. 2018). 75. Parks, D. H., Chuvochina, M., Chaumeil, P.-A., Rinke, C., Mussig, A. J. & Hugenholtz, P. A complete domain-to-species taxonomy for Bacteria and Archaea. Nature Biotechnology 38, 1079–1086. issn: 1087-0156, 1546-1696. doi:10.1038/s41587-020-0501-8 (Sept. 2020). 76. Leclercq, S. O., Cloeckaert, A. & Zygmunt, M. S. 
Taxonomic Organization of the Family Brucellaceae Based on a Phylogenomic Approach. Frontiers in Microbiology 10, 3083. issn: 1664-302X. doi:10. 3389/fmicb.2019.03083 (Jan. 2020). 77. Leonard, M. T., Fagen, J. R., Davis-Richardson, A. G., Davis, M. J. & Triplett, E. W. Complete genome sequence of Liberibacter crescens BT-1. Standards in Genomic Sciences 7, 271–283. issn: 1944-3277. doi:10.4056/sigs.3326772 (Dec. 2012). 78. Fagen, J. R., Leonard, M. T., Coyle, J. F., McCullough, C. M., Davis-Richardson, A. G., Davis, M. J. & Triplett, E. W. Liberibacter crescens gen. nov., sp. nov., the first cultured member of the genus Liberibacter. International Journal of Systematic and Evolutionary Microbiology 64, 2461–2466. issn: 1466-5026, 1466-5034. doi:10.1099/ijs.0.063255-0 (July 2014). 79. Ferla, M. P., Thrash, J. C., Giovannoni, S. J. & Patrick, W. M. New rRNA Gene-Based Phylogenies of the Alphaproteobacteria Provide Perspective on Major Groups, Mitochondrial Ancestry and Phy- logenetic Instability. PLoS ONE 8 (ed Badger, J. H.) e83383. issn: 1932-6203. doi:10.1371/journal. pone.0083383 (Dec. 2013). 80. Fagen, J. R., Leonard, M. T., McCullough, C. M., Edirisinghe, J. N., Henry, C. S., Davis, M. J. & Triplett, E. W. Comparative Genomics of Cultured and Uncultured Strains Suggests Genes Essential for Free-Living Growth of Liberibacter. PLoS ONE 9 (ed Parkinson, J.) e84469. issn: 1932-6203. doi:10.1371/journal.pone.0084469 (Jan. 2014). BIBLIOGRAPHY 27

81. Mousavi, S. A., Willems, A., Nesme, X., de Lajudie, P. & Lindstr¨om,K. Revised phylogeny of Rhi- zobiaceae: Proposal of the delineation of Pararhizobium gen. nov., and 13 new species combinations. Systematic and Applied Microbiology 38, 84–90. issn: 07232020. doi:10.1016/j.syapm.2014.12.003 (Mar. 2015). 82. Lassalle, F. et al. Phylogenomics reveals the basis of adaptation of Pseudorhizobium species to extreme environments and supports a taxonomic revision of the genus. Systematic and Applied Microbiology 44, 126165. issn: 07232020. doi:10.1016/j.syapm.2020.126165 (Jan. 2021). 83. T´oth,E., Szur´oczki,S., K´eki,Z., B´oka, K., Szili-Kov´acs,T. & Schumann, P. Gellertiella hungarica gen. nov., sp. nov., a novel bacterium of the family Rhizobiaceae isolated from a spa in Budapest. International Journal of Systematic and Evolutionary Microbiology 67, 4565–4571. issn: 1466-5026, 1466-5034. doi:10.1099/ijsem.0.002332 (Nov. 2017). 84. International Code of Nomenclature of Prokaryotes: Prokaryotic Code (2008 Revision). International Journal of Systematic and Evolutionary Microbiology 69, S1–S111. issn: 1466-5026, 1466-5034. doi:10. 1099/ijsem.0.000778 (Jan. 2019). 85. Martinez-Gomez, N. C., Palmer, L. D., Vivas, E., Roach, P. L. & Downs, D. M. The Rhodanese Domain of ThiI Is Both Necessary and Sufficient for Synthesis of the Thiazole Moiety of Thiamine in Salmonella enterica. Journal of Bacteriology 193, 4582–4587. issn: 0021-9193. doi:10.1128/JB.05325-11 (Sept. 2011). 86. Foster, J. W., Kinney, D. M. & Moat, A. G. Pyridine nucleotide cycle of Salmonella typhimurium: isolation and characterization of pncA, pncB, and pncC mutants and utilization of exogenous nicoti- namide adenine dinucleotide. Journal of Bacteriology 137, 1165–1175. issn: 0021-9193, 1098-5530. doi:10.1128/JB.137.3.1165-1175.1979 (1979). 87. Foster, J. W. & Baskowsky-Foster, A. M. Pyridine nucleotide cycle of Salmonella typhimurium: in vivo recycling of nicotinamide adenine dinucleotide. 
Journal of Bacteriology 142, 1032–1035. issn: 0021-9193, 1098-5530. doi:10.1128/JB.142.3.1032-1035.1980 (1980). 88. Tyagi, R., Lee, Y.-T., Guddat, L. W. & Duggleby, R. G. Probing the mechanism of the bifunctional enzyme ketol-acid reductoisomerase by site-directed mutagenesis of the active site: Mutagenesis of the active site of E. coli KARI. FEBS Journal 272, 593–602. issn: 1742464X. doi:10.1111/j.1742- 4658.2004.04506.x (Jan. 2005). 89. L´opez-S´amano,M., Beltr´an,L. F. L.-A., S´anchez-Thomas, R., D´avalos, A., Villase˜nor,T., Garc´ıa- Garc´ıa,J. D. & Garc´ıa-de los Santos, A. A novel way to synthesize pantothenate in bacteria involves β-alanine synthase present in uracil degradation pathway. MicrobiologyOpen 9, e1006. issn: 2045-8827, 2045-8827. doi:10.1002/mbo3.1006 (Apr. 2020). 90. Lin, S., Hanson, R. E. & Cronan, J. E. Biotin synthesis begins by hijacking the fatty acid synthetic pathway. Nature Chemical Biology 6, 682–688. issn: 1552-4450, 1552-4469. doi:10.1038/nchembio. 420 (Sept. 2010). 91. Guill´en-Navarro, K., Encarnaci´on,S. & Dunn, M. F. Biotin biosynthesis, transport and utilization in . FEMS Microbiology Letters 246, 159–165. issn: 03781097, 15746968. doi:10.1016/j. femsle.2005.04.020 (May 2005). 92. Guill´en-Navarro, K., Ara´ıza,G., Garc´ıa-delos Santos, A., Mora, Y. & Dunn, M. F. The Rhizobium etli bioMNY operon is involved in biotin transport. FEMS Microbiology Letters 250, 209–219. issn: 03781097, 15746968. doi:10.1016/j.femsle.2005.07.020 (Sept. 2005). BIBLIOGRAPHY 28

93. Stok, J. E. & De Voss, J. J. Expression, Purification, and Characterization of BioI : A Carbon–Carbon Bond Cleaving Cytochrome P450 Involved in Biotin Biosynthesis in Bacillus subtilis. Archives of Biochemistry and Biophysics 384, 351–360. issn: 00039861. doi:10.1006/abbi.2000.2067 (Dec. 2000). 94. diCenzo, G. C., Benedict, A. B., Fondi, M., Walker, G. C., Finan, T. M., Mengoni, A. & Griffitts, J. S. Robustness encoded across essential and accessory replicons of the ecologically versatile bacterium Sinorhizobium meliloti. PLOS Genetics 14 (ed Casades´us,J.) e1007357. issn: 1553-7404. doi:10 . 1371/journal.pgen.1007357 (Apr. 2018). 95. Renvois´e, A., Merhej, V., Georgiades, K. & Raoult, D. Intracellular Rickettsiales: Insights into manip- ulators of eukaryotic cells. Trends in Molecular Medicine 17, 573–583. issn: 14714914. doi:10.1016/ j.molmed.2011.05.009 (Oct. 2011). 96. Binetruy, F., Buysse, M., Lejarre, Q., Barosi, R., Villa, M., Rahola, N., Paupy, C., Ayala, D. & Duron, O. Microbial community structure reveals instability of nutritional symbiosis during the evolutionary radiation of Amblyomma ticks. Molecular Ecology 29, 1016–1029. issn: 0962-1083, 1365-294X. doi:10. 1111/mec.15373 (Mar. 2020). 97. Hunter, D. J., Torkelson, J. L., Bodnar, J., Mortazavi, B., Laurent, T., Deason, J., Thephavongsa, K. & Zhong, J. The Rickettsia Endosymbiont of Ixodes pacificus Contains All the Genes of De Novo Folate Biosynthesis. PLOS ONE 10 (ed Pal, U.) e0144552. issn: 1932-6203. doi:10.1371/journal. pone.0144552 (Dec. 2015). 98. Nikoh, N., Hosokawa, T., Moriyama, M., Oshima, K., Hattori, M. & Fukatsu, T. Evolutionary origin of insect-Wolbachia nutritional mutualism. Proceedings of the National Academy of Sciences 111, 10257–10262. issn: 0027-8424, 1091-6490. doi:10.1073/pnas.1409284111 (July 2014). 99. Kikuchi, Y., Sameshima, S., Kitade, O., Kojima, J. & Fukatsu, T. Novel Clade of Rickettsia spp. from Leeches. Applied and Environmental Microbiology 68, 999–1004. 
issn: 0099-2240, 1098-5336. doi:10.1128/AEM.68.2.999-1004.2002 (Feb. 2002). 100. Husnik, F. Host–symbiont–pathogen interactions in blood-feeding parasites: nutrition, immune cross- talk and gene exchange. Parasitology 145, 1294–1303. issn: 0031-1820, 1469-8161. doi:10 . 1017 / S0031182018000574 (Sept. 2018). 101. Nov´akov´a,E., Husn´ık,F., Sochov´a,E.ˇ & Hypˇsa,V. Arsenophonus and Sodalis Symbionts in Louse Flies: an Analogy to the Wigglesworthia and Sodalis System in Tsetse Flies. Applied and Environmental Microbiology 81 (ed Goodrich-Blair, H.) 6189–6199. issn: 0099-2240, 1098-5336. doi:10.1128/AEM. 01487-15 (Sept. 2015).

102. Michalkova, V., Benoit, J. B., Weiss, B. L., Attardo, G. M. & Aksoy, S. Vitamin B6 Generated by Obligate Symbionts Is Critical for Maintaining Proline Homeostasis and Fecundity in Tsetse Flies. Applied and Environmental Microbiology 80, 5844–5853. issn: 0099-2240, 1098-5336. doi:10.1128/ AEM.01150-14 (Sept. 2014). 103. Kuznetsova, E. et al. Genome-wide Analysis of Substrate Specificities of the Escherichia coli Haloacid Dehalogenase-like Phosphatase Family. Journal of Biological Chemistry 281, 36149–36161. issn: 0021- 9258, 1083-351X. doi:10.1074/jbc.M605449200 (Nov. 2006). 104. Rodriguez, F., Lillington, J., Johnson, S., Timmel, C. R., Lea, S. M. & Berks, B. C. Crystal Structure of the Bacillus subtilis Phosphodiesterase PhoD Reveals an Iron and Calcium-containing Active Site. Journal of Biological Chemistry 289, 30889–30899. issn: 00219258. doi:10.1074/jbc.M114.604892 (Nov. 2014). BIBLIOGRAPHY 29

105. Zaheer, R., Morton, R., Proudfoot, M., Yakunin, A. & Finan, T. M. Genetic and biochemical properties of an alkaline phosphatase PhoX family protein found in many bacteria. Environmental Microbiology 11, 1572–1587. issn: 14622912, 14622920. doi:10.1111/j.1462-2920.2009.01885.x (June 2009). 106. Haase, I., Sarge, S., Illarionov, B., Laudert, D., Hohmann, H.-P., Bacher, A. & Fischer, M. Enzymes from the Haloacid Dehalogenase (HAD) Superfamily Catalyse the Elusive Dephosphorylation Step of Riboflavin Biosynthesis. ChemBioChem 14, 2272–2275. issn: 14394227. doi:10.1002/cbic.201300544 (Nov. 2013). 107. Kaplan, J. B., Nichols, B. P. & Brenner, S. Nucleotide sequence of Escherichia coli pabA and its evolutionary relationship to trp (G) D. Journal of Molecular Biology 168, 451–468. issn: 00222836. doi:10.1016/S0022-2836(83)80295-2 (Aug. 1983). 108. Xie, G., Keyhani, N. O., Bonner, C. A. & Jensen, R. A. Ancient Origin of the Tryptophan Operon and the Dynamics of Evolutionary Change. Microbiology and Molecular Biology Reviews 67, 303–342. issn: 1092-2172, 1098-5557. doi:10.1128/MMBR.67.3.303-342.2003 (Sept. 2003). 109. De Cr´ecy-Lagard,V., El Yacoubi, B., de la Garza, R., Noiriel, A. & Hanson, A. D. Comparative ge- nomics of bacterial and plant folate synthesis and salvage: predictions and validations. BMC Genomics 8, 245. issn: 14712164. doi:10.1186/1471-2164-8-245 (2007). 110. Kane, J. F., Holmes, W. M. & Jensen, R. A. Metabolic interlock. The dual function of a folate pathway gene as an extra-operonic gene of tryptophan biosynthesis. The Journal of Biological Chemistry 247, 1587–1592 (1972). 111. Troch, P. D., Dosselaere, F., Keijers, V., Wilde, P. d. & Vanderleyden, J. Isolation and Characterization of the Azospirillum brasilense trpE(G) Gene, Encoding Anthranilate Synthase. Current Microbiology 34, 27–32. issn: 0343-8651, 1432-0991. doi:10.1007/s002849900139 (Jan. 1997). 112. Elischewski, F., P¨uhler, A. & Kalinowski, J. 
Pantothenate production in Escherichia coli K12 by enhanced expression of the panE gene encoding ketopantoate reductase. Journal of Biotechnology 75, 135–146. issn: 01681656. doi:10.1016/S0168-1656(99)00153-4 (Oct. 1999). 113. Price, D. R. & Wilson, A. C. A substrate ambiguous enzyme facilitates genome reduction in an intracellular symbiont. BMC Biology 12, 110. issn: 1741-7007. doi:10.1186/s12915- 014- 0110- 4 (Dec. 2014). 114. Meseguer, A. S., Manzano-Mar´ın,A., Coeur d’Acier, A., Clamens, A.-L., Godefroid, M. & Jousselin, E. Buchnera has changed flatmate but the repeated replacement of co-obligate symbionts is not associated with the ecological expansions of their aphid hosts. Molecular Ecology 26, 2363–2378. issn: 09621083. doi:10.1111/mec.13910 (Apr. 2017). 115. Klasson, L., Kambris, Z., Cook, P. E., Walker, T. & Sinkins, S. P. Horizontal gene transfer between Wolbachia and the mosquito Aedes aegypti. BMC Genomics 10, 33. issn: 1471-2164. doi:10.1186/ 1471-2164-10-33 (2009). 116. Brelsfoard, C. et al. Presence of Extensive Wolbachia Symbiont Insertions Discovered in the Genome of Its Host Glossina morsitans morsitans. PLoS Neglected Tropical Diseases 8 (ed Valenzuela, J. G.) e2728. issn: 1935-2735. doi:10.1371/journal.pntd.0002728 (Apr. 2014). 117. Ren, F.-R., Sun, X., Wang, T.-Y., Yao, Y.-L., Huang, Y.-Z., Zhang, X. & Luan, J.-B. Biotin provision- ing by horizontally transferred genes from bacteria confers animal fitness benefits. The ISME Journal 14, 2542–2553. issn: 1751-7362, 1751-7370. doi:10.1038/s41396-020-0704-5 (Oct. 2020). BIBLIOGRAPHY 30

118. Husnik, F. & McCutcheon, J. P. Functional horizontal gene transfer from bacteria to eukaryotes. Nature Reviews Microbiology 16, 67–79. issn: 1740-1526, 1740-1534. doi:10.1038/nrmicro.2017.137 (Feb. 2018). 119. Husnik, F. & McCutcheon, J. P. Repeated replacement of an intrabacterial symbiont in the tripartite nested mealybug symbiosis. Proceedings of the National Academy of Sciences 113, E5416–E5424. issn: 0027-8424, 1091-6490. doi:10.1073/pnas.1603910113 (Sept. 2016). 120. Hansen, A. K. & Moran, N. A. Aphid genome expression reveals host-symbiont cooperation in the production of amino acids. Proceedings of the National Academy of Sciences 108, 2849–2854. issn: 0027-8424, 1091-6490. doi:10.1073/pnas.1013465108 (Feb. 2011). 121. Husnik, F. & Keeling, P. J. The fate of obligate endosymbionts: reduction, integration, or extinction. Current Opinion in Genetics & Development 58-59, 1–8. issn: 0959437X. doi:10.1016/j.gde.2019. 07.014 (Oct. 2019). 122. Chong, R. A. & Moran, N. A. Evolutionary loss and replacement of Buchnera, the obligate endosym- biont of aphids. The ISME Journal 12, 898–908. issn: 1751-7362, 1751-7370. doi:10.1038/s41396- 017-0024-6 (Mar. 2018). 123. Bhuvaneswari, G. Molecular Characterization of camphor utilizing bacterial isolates from refinery sludge and detection of target loci-Cytochrome P-450 cam mono oxygenase (cam C gene) by PCR and gene probe. SpringerPlus 2, 170. issn: 2193-1801. doi:10.1186/2193-1801-2-170 (Dec. 2013). 124. Moran, N. A., McCutcheon, J. P. & Nakabachi, A. Genomics and Evolution of Heritable Bacterial Sym- bionts. Annual Review of Genetics 42, 165–190. issn: 0066-4197, 1545-2948. doi:10.1146/annurev. genet.41.110306.130119 (Dec. 2008). 125. Boscaro, V., Kolisko, M., Felletti, M., Vannini, C., Lynn, D. H. & Keeling, P. J. Parallel genome reduction in symbionts descended from closely related free-living bacteria. Nature Ecology & Evolution 1, 1160–1167. issn: 2397-334X. doi:10.1038/s41559-017-0237-0 (Aug. 2017). 126. 
Perreau, J., Patel, D. J., Anderson, H., Maeda, G. P., Elston, K. M., Barrick, J. E. & Moran, N. A. Vertical transmission at the pathogen-symbiont interface: Serratia symbiotica and aphids preprint (Mi- crobiology, Sept. 2020). doi:10.1101/2020.09.01.279018. 127. Fukatsu, T. & Hosokawa, T. Capsule-Transmitted Gut Symbiotic Bacterium of the Japanese Common Plataspid Stinkbug, Megacopta punctatissima. Applied and Environmental Microbiology 68, 389–396. issn: 0099-2240, 1098-5336. doi:10.1128/AEM.68.1.389-396.2002 (Jan. 2002). 128. Takeuchi, M., Kuwahara, H., Murakami, T., Takahashi, K., Kajitani, R., Toyoda, A., Itoh, T., Ohkuma, M. & Hongoh, Y. Parallel reductive genome evolution in Desulfovibrio ectosymbionts independently acquired by Trichonympha protists in the termite gut. The ISME Journal 14, 2288–2301. issn: 1751- 7362, 1751-7370. doi:10.1038/s41396-020-0688-1 (Sept. 2020). 129. Nardi, T., Olivieri, E., Kariuki, E., Sassera, D. & Castelli, M. Sequence of a Coxiella Endosymbiont of the Tick Amblyomma nuttalli Suggests a Pattern of Convergent Genome Reduction in the Coxiella Genus. Genome Biology and Evolution 13 (ed Ochman, H.) evaa253. issn: 1759-6653. doi:10.1093/ gbe/evaa253 (Jan. 2021). 130. Manzano-Mar´ın,A. & Latorre, A. Snapshots of a shrinking partner: Genome reduction in Serratia symbiotica. Scientific Reports 6, 32590. issn: 2045-2322. doi:10.1038/srep32590 (Dec. 2016). 131. Burke, G. R. & Moran, N. A. Massive Genomic Decay in Serratia symbiotica, a Recently Evolved Symbiont of Aphids. Genome Biology and Evolution 3, 195–208. issn: 1759-6653. doi:10.1093/gbe/ evr002 (Jan. 2011). BIBLIOGRAPHY 31

132. Manzano-Mar´ın,A., Lamelas, A., Moya, A. & Latorre, A. Comparative Genomics of Serratia spp.: Two Paths towards Endosymbiotic Life. PLoS ONE 7 (ed Horn, M.) e47274. issn: 1932-6203. doi:10. 1371/journal.pone.0047274 (Oct. 2012). 133. Stewart, F. J., Young, C. R. & Cavanaugh, C. M. Lateral Symbiont Acquisition in a Maternally Transmitted Chemosynthetic Clam Endosymbiosis. Molecular Biology and Evolution 25, 673–687. issn: 0737-4038, 1537-1719. doi:10.1093/molbev/msn010 (Feb. 2008). 134. Husn´ık,F., Chrudimsk´y,T. & Hypˇsa,V. Multiple origins of endosymbiosis within the Enterobacteri- aceae (γ-): convergence of complex phylogenetic approaches. BMC Biology 9, 87. issn: 1741-7007. doi:10.1186/1741-7007-9-87 (Dec. 2011). 135. Kishino, H., Miyata, T. & Hasegawa, M. Maximum likelihood inference of protein phylogeny and the origin of chloroplasts. Journal of Molecular Evolution 31, 151–160. issn: 0022-2844, 1432-1432. doi:10.1007/BF02109483 (Aug. 1990). 136. Kishino, H. & Hasegawa, M. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in hominoidea. Journal of Molecular Evolution 29, 170–179. issn: 0022-2844, 1432-1432. doi:10.1007/BF02100115 (Aug. 1989). 137. Shimodaira, H. & Hasegawa, M. Multiple Comparisons of Log-Likelihoods with Applications to Phylo- genetic Inference. Molecular Biology and Evolution 16, 1114–1116. issn: 0737-4038, 1537-1719. doi:10. 1093/oxfordjournals.molbev.a026201 (Aug. 1999). 138. Strimmer, K. & Rambaut, A. Inferring confidence sets of possibly misspecified gene trees. Proceedings of the Royal Society of London. Series B: Biological Sciences 269, 137–142. issn: 0962-8452, 1471-2954. doi:10.1098/rspb.2001.1862 (Jan. 2002). 139. Shimodaira, H. An Approximately Unbiased Test of Phylogenetic Tree Selection. Systematic Biology 51 (ed Goldman, N.) 492–508. issn: 1076-836X, 1063-5157. doi:10.1080/10635150290069913 (May 2002). Appendix A

Supplementary material


| Species | Strain | Accession |
|---|---|---|
| Chlamydia sp. | H15-1957-10C | LT993738.1, LT993739.1 |
| Chlamydia sp. | S15-834C | LS992154.1, LS992155.1 |
| Chlamydia abortus | S26/3 | CR848038.1 |
| Parachlamydia sp. | BC.030 | PEQL00000000.1 |
| Chlamydia pneumoniae | CWL029 | AE001363.1 |
| Chlamydia trachomatis | 434/Bu | AM884176.1 |
| Chlamydia psittaci | 6BC | CP002586.1, CP002587.1 |
| Chlamydia pecorum | E58 | CP002608.1 |
| Neochlamydia sp. | S13 | AP017977.1, AP017978.1 |
| Candidatus Protochlamydia amoebophila | UWE25 | BX908798.2 |
| Candidatus Protochlamydia naegleriophila | KNic | LN879502.1, LN879503.1 |
| Parachlamydia acanthamoebae | UV-7 | FR872580.1 |
| Simkania negevensis | Z | FR872582.1, FR872581.1 |
| Chlamydia muridarum | Nigg | AE002160.2, AE002162.1 |
| Waddlia chondrophila | WSU 86-1044 | CP001928.1, CP001929.1 |
| Entomoplasma ellychniae | ELCN-1 | PHND00000000.1 |
| Entomoplasma freundtii | BARC 318 | CP024962.1 |
| Entomoplasma lucivorax | PIPN-2 | PHNE00000000.1 |
| Entomoplasma luminosum | PIMN-1 | CP024963.1 |
| Entomoplasma melaleucae | M1 | CP024964.1 |
| Entomoplasma somnilux | PYAN-1 | CP024965.1 |
| Yokenella regensburgei | NCTC11966 | UHJH00000000.1 |

Table A.1: Taxonomic and accession information for each taxon included in the local BLAST database constructed for the assembly process.

| Class | Order | Family | Genus | Species | Strain | Org Code |
|---|---|---|---|---|---|---|
| Alphaproteobacteria | Rhizobiales | Rhizobiaceae | Agrobacterium | tumefaciens | Ach5 | atf |
| Alphaproteobacteria | Rhizobiales | Rhizobiaceae | Ensifer | adhaerens | Casida A | eah |
| Alphaproteobacteria | Rhizobiales | Rhizobiaceae | Gellertiella | hungarica | DSM 29853 | N/A |
| Alphaproteobacteria | Rhizobiales | Rhizobiaceae |  |  | NZP2037 | mln |
| Alphaproteobacteria | Rhizobiales | Rhizobiaceae | Rhizobium | etli | CFN 42 | ret |
| Alphaproteobacteria | Rhizobiales | Rhizobiaceae | Sinorhizobium | meliloti | 1021 | sme |
| Alphaproteobacteria | Rickettsiales | Rickettsiaceae | Rickettsia | prowazekii | Madrid E | rpr |
| Gammaproteobacteria | Legionellales | Coxiellaceae | Coxiella | - | ex. A. americanum | cea |
| Gammaproteobacteria | Enterobacterales | Enterobacteriaceae | Escherichia | coli | K-12 (MG1655) | eco |
| Gammaproteobacteria | Pseudomonadales | Pseudomonadaceae | Pseudomonas | aeruginosa | PAO1 | pae |
| Gammaproteobacteria | Enterobacterales | Enterobacteriaceae | Ca. Riesia | pediculicola | USDA | rip |
| Gammaproteobacteria | Vibrionales | Vibrionaceae | Vibrio | cholerae | O1 El Tor N16961 | vch |
| Gammaproteobacteria | Enterobacterales | Erwiniaceae | Wigglesworthia | glossinidia | brevipalpis | wbr |

Table A.2: Pertinent taxonomic information for all organisms used to assemble the metabolic screening database. ‘Org Code’ refers to the 3-4 letter code used to designate distinct taxonomic units (species or below) in the KEGG database.

| Pathway | Active B vitamin | Path suffix |
|---|---|---|
| Thiamin metabolism | Thiamin pyrophosphate | 00730 |
| Riboflavin metabolism | Flavin mononucleotide/Flavin adenine dinucleotide | 00740 |
| Vitamin B6 metabolism | Pyridoxal phosphate | 00750 |
| Nicotinate and nicotinamide metabolism | NADPH/NADP+ | 00760 |
| Pantothenate and CoA biosynthesis | Coenzyme A | 00770 |
| Biotin metabolism | Biotin | 00780 |
| Lipoic acid metabolism | Lipoic acid | 00785 |
| Folate biosynthesis | Folic acid | 00790 |
| Porphyrin and chlorophyll metabolism | Vitamin B12 coenzyme (cobamide) | 00860 |

Table A.3: From left to right, the columns give the name of the pathway map, the active form of the B vitamin produced by the pathway, and the pathway suffix as found on KEGG.
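The organism codes of Table A.2 and the pathway suffixes of Table A.3 can be combined into organism-specific KEGG pathway identifiers, since KEGG names such maps by prefixing the five-digit map number with the organism code (for example, eco00730 for thiamin metabolism in Escherichia coli K-12 MG1655). The following Python sketch only illustrates that naming convention; the function name `kegg_pathway_ids` is hypothetical, and nothing here is meant to represent the exact procedure used to build the screening database.

```python
# Pathway suffixes copied from Table A.3 (pathway name -> KEGG map number).
PATHWAY_SUFFIXES = {
    "Thiamin metabolism": "00730",
    "Riboflavin metabolism": "00740",
    "Vitamin B6 metabolism": "00750",
    "Nicotinate and nicotinamide metabolism": "00760",
    "Pantothenate and CoA biosynthesis": "00770",
    "Biotin metabolism": "00780",
    "Lipoic acid metabolism": "00785",
    "Folate biosynthesis": "00790",
    "Porphyrin and chlorophyll metabolism": "00860",
}


def kegg_pathway_ids(org_code):
    """Return {pathway name: organism-specific KEGG pathway ID}.

    Hypothetical helper: it simply prefixes each map number with a
    KEGG organism code such as 'eco' or 'sme' (see Table A.2).
    """
    return {name: org_code + suffix for name, suffix in PATHWAY_SUFFIXES.items()}


if __name__ == "__main__":
    # List the nine B vitamin pathway maps for E. coli K-12 MG1655.
    for name, pathway_id in kegg_pathway_ids("eco").items():
        print(pathway_id, name, sep="\t")
```

Taxa without a KEGG organism code (listed as N/A in Table A.2) cannot be addressed this way and would need to be screened against the reference maps instead.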

Order Family Genus Species Strain Accession

Rhizobiales Aestuariivirgaceae Aestuariivirga litoralis KCTC 52945 GCF 003234965.1 Rhizobiales Ancalomicrobiaceae Siculibacillus lacustris SA-279 GCF 004328075.1 Rhizobiales Aurantimonadaceae Aurantimonas coralicida DSM 14790 GCF 000421645.1 Rhizobiales Aurantimonadaceae Aureimonas altamirensis DSM 21988 GCF 900141975.1 Rhizobiales Aurantimonadaceae Consotaella salsifontis USBA 369 GCF 900167365.1 Rhizobiales Aurantimonadaceae Fulvimarina pelagi HTCC2506 GCF 000153705.1 Rhizobiales Aurantimonadaceae Jiella endophytica CBS5Q-3 GCF 004519335.1 Rhizobiales Bartonellaceae Bartonella bacilliformis KC583 GCF 000015445.1 Rhizobiales Bartonellaceae Bartonella grahamii as4aup GCF 000022725.1 Rhizobiales Bartonellaceae Bartonella quintana MF1-1 GCF 009936175.1 Rhizobiales Beijerinckiaceae Beijerinckia indica indica GCF 000019845.1 Rhizobiales Beijerinckiaceae Methylocella silvestris BL2 GCF 000021745.1 Rhizobiales Beijerinckiaceae Methylovirgula ligni BW863 GCF 004135935.1 Rhizobiales Bradyrhizobiaceae Afipia felis NCTC12499 GCF 900445155.1 Rhizobiales Bradyrhizobiaceae Bosea vaviloviae Vaf18 GCF 001741865.1 Rhizobiales Bradyrhizobiaceae Bradyrhizobium diazoefficiens USDA 110 GCF 001642675.1 Rhizobiales Bradyrhizobiaceae Bradyrhizobium japonicum USDA 6 GCF 000284375.1 Rhizobiales Bradyrhizobiaceae Bradyrhizobium oligotrophicum S58 GCF 000344805.1 Rhizobiales Bradyrhizobiaceae Bradyrhizobium paxllaeri LMTR 21 GCF 001693515.2 Rhizobiales Bradyrhizobiaceae Bradyrhizobium zhanjiangense CCBAU 51778 GCF 004114935.1 Rhizobiales Bradyrhizobiaceae winogradskyi Nb-255 GCF 000012725.1 Rhizobiales Bradyrhizobiaceae Oligotropha carboxidovorans OM5 GCF 000218565.1 Rhizobiales Bradyrhizobiaceae Rhodopseudomonas palustris CGMCC 1.2180 GCF 013415845.1 Rhizobiales Bradyrhizobiaceae Tardiphaga robiniae LMG 26467 GCF 013359755.1 Rhizobiales Bradyrhizobiaceae Variibacter gotjawalensis GJW-30 GCF 002355335.1 Rhizobiales Brucellaceae Brucella melitensis 16M GCF 000007125.1 Rhizobiales Brucellaceae 
Falsochrobactrum ovis DSM 26720 GCF 003259955.1 Rhizobiales Brucellaceae Ochrobactrum anthropi OAB GCF 000742955.1 Rhizobiales Brucellaceae Ochrobactrum quorumnocens A44 GCF 002278035.1 Rhizobiales Brucellaceae Pseudochrobactrum saccharolyticum CCUG 33852 GCF 008801715.1 Rhizobiales Chelatococcaceae Camelimonas lactis DSM 22958 GCF 004342915.1 Rhizobiales Chelatococcaceae Chelatococcus sambhunathii DSM 18167 GCF 001418005.1 Rhizobiales Breoghania corrubedonensis DSM 23382 GCF 003053845.1 Rhizobiales Cohaesibacteraceae gelatinilyticus DSM 18289 GCF 900215605.1 Rhizobiales Hyphomicrobiaceae Aquabacter cavernae Sn-9-2 GCF 003993795.1 Rhizobiales Hyphomicrobiaceae Blastochloris viridis ATCC 19567 GCF 001402875.1 Rhizobiales Hyphomicrobiaceae Devosia lucknowensis L15 GCF 900177655.1 Rhizobiales Hyphomicrobiaceae Devosia ginsengisoli Gsoil 520 GCF 007859655.1 Rhizobiales Hyphomicrobiaceae Hyphomicrobium denitrificans ATCC 51888 GCF 000143145.1 Rhizobiales Hyphomicrobiaceae Maritalea myrionectae HL2708#5 GCF 003433515.1 Rhizobiales Hyphomicrobiaceae halotolerans ANSP101 GCF 013004005.1 Rhizobiales Hyphomicrobiaceae ATCC 17100 GCF 000166055.1 Rhizobiales Hyphomicrobiaceae Youhaiella tibetensis fig4 GCF 008000755.1 Rhizobiales Lichenibacteriaceae Lichenibacterium ramalinae RmlP001 GCF 004137085.1 Rhizobiales Lichenihabitantaceae Lichenihabitans psoromatis PAMC 29148 GCF 004323635.1 Rhizobiales Mabikibacteraceae Mabikibacter ruber YP382-1-A GCF 006346345.1 Rhizobiales Meganemaceae Neomegalonema perideroedes DSM 15528 GCF 000374145.1 Rhizobiales Methylobacteriaceae Enterovirga rhinocerotis DSM 25903 GCF 004363955.1 Rhizobiales Methylobacteriaceae Methylobacterium durans 17SD2-17 GCF 003173715.1 Rhizobiales Methylobacteriaceae Methylobacterium nodulans ORS 2060 GCF 000022085.1 Rhizobiales Methylobacteriaceae Methylobacterium phyllosphaerae CBMB27 GCF 001936175.1 Rhizobiales Methylobacteriaceae Methylorubrum extorquens PA1 GCF 000018845.1 Rhizobiales Methylobacteriaceae 
Microvirga ossetica V5/3M GCF 002741015.1 Rhizobiales Methylocystaceae Methylocystis parvus BRCS2 GCF 009685195.1 Rhizobiales Methylocystaceae Methylosinus trichosporium OB3b GCF 000178815.2 Rhizobiales Methylocystaceae Pleomorphomonas diazotrophica R5-392 GCF 900114935.1 Rhizobiales Notoacmeibacteraceae Notoacmeibacter marinus XMTR2A4 GCF 002238045.1 Rhizobiales NCTC10684 GCF 900445235.1 Rhizobiales Phyllobacteriaceae Aminobacter niigataensis DSM 7050 GCF 014200015.1 Rhizobiales Phyllobacteriaceae Aquamicrobium defluvii DSM 11603 GCF 004363725.1 Rhizobiales Phyllobacteriaceae Aquamicrobium aerolatum DSM 21857 GCF 900113935.1 Rhizobiales Phyllobacteriaceae Chelativorans multitrophicus DSM 9103 GCF 011317445.1 Rhizobiales Phyllobacteriaceae Chelativorans oligotrophicus LPM-4 GCF 011317505.1 Rhizobiales Phyllobacteriaceae Hoeflea marina DSM 16791 GCF 003182275.1 Rhizobiales Phyllobacteriaceae Hoeflea phototrophica DFL-43 GCF 000154705.2 Rhizobiales Phyllobacteriaceae WSM1284 GCF 001618845.1 Appendix A. Supplementary material 36

Table A.4 continued from previous page Order Family Genus Species Strain Accession

Rhizobiales Phyllobacteriaceae Mesorhizobium loti NZP2042 GCF 013170705.1 Rhizobiales Phyllobacteriaceae Mesorhizobium opportunistum WSM2075 GCF 000176035.2 Rhizobiales Phyllobacteriaceae Nitratireductor basaltis UMTGB225 GCF 000733725.1 Rhizobiales Phyllobacteriaceae Nitratireductor indicus CGMCC 1.10953 GCF 900115835.1 Rhizobiales Phyllobacteriaceae Oceaniradius stylonematis StC1 GCF 003149475.2 Rhizobiales Phyllobacteriaceae cellulosilytica KCTC 52183 GCF 004331745.1 Rhizobiales Phyllobacteriaceae Paramesorhizobium deserti A-3-E GCF 001558695.1 Rhizobiales Phyllobacteriaceae endophyticum PEPV15 GCF 003010935.1 Rhizobiales Phyllobacteriaceae Phyllobacterium zundukense Tri-48 GCF 002764115.1 Rhizobiales Phyllobacteriaceae Pseudoaminobacter salicylatoxidans DSM 6986 GCF 003148475.1 Rhizobiales Phyllobacteriaceae Pseudohoeflea suaedae YC6898 GCF 004354915.1 Rhizobiales Phyllobacteriaceae Roseitalea porphyridii MA7-20 GCF 004331955.1 Rhizobiales Phyllobacteriaceae Salaquimonas pukyongi RR3-28 GCF 001953055.1 Rhizobiales Phyllobacteriaceae Zhengella mangrovi X9-2-2 GCF 002727065.1 Rhizobiales Rhabdaerophilaceae Rhabdaerophilum calidifontis SYSU G02060 GCF 008641065.1 Rhizobiales Rhizobiaceae DSM 7138 GCF 003046475.1 Rhizobiales Rhizobiaceae Ciceribacter lividus DSM 25528 GCF 003337715.1 Rhizobiales Rhizobiaceae Ciceribacter thiooxidans F43B GCF 014126615.1 Rhizobiales Rhizobiaceae Gellertiella hungarica DSM 29853 GCF 014197145.1 Rhizobiales Rhizobiaceae Reichenowia sp. Ppha N/A Rhizobiales Rhizobiaceae Reichenowia sp. Pmon N/A Rhizobiales Rhizobiaceae Reichenowia sp. Pmul N/A Rhizobiales Rhizobiaceae Reichenowia sp. Ppar N/A Rhizobiales Rhizobiaceae Reichenowia sp. Prug N/A Rhizobiales Rhizobiaceae Reichenowia sp. 
Psp N/A Rhizobiales Rhizobiaceae Georhizobium profundi WS11 GCF 003952725.1 Rhizobiales Rhizobiaceae Kaistia adipata DSM 17808 GCF 000423225.1 Rhizobiales Rhizobiaceae Kaistia soli DSM 19436 GCF 900129325.1 Rhizobiales Rhizobiaceae Liberibacter africanus PTSAPSY GCF 001021085.1 Rhizobiales Rhizobiaceae Liberibacter crescens BT-0 GCF 001543305.1 Rhizobiales Rhizobiaceae Agrobacterium deltaense RV3 GCF 900013505.1 Rhizobiales Rhizobiaceae Agrobacterium fabrum PDC82 GCF 900100235.1 Rhizobiales Rhizobiaceae Agrobacterium salinitolerans Hayward0363 GCF 900012565.1 Rhizobiales Rhizobiaceae Agrobacterium tumefaciens MAFF210266 GCF 007002865.1 Rhizobiales Rhizobiaceae Allorhizobium vitis VAR06-30 GCF 013426735.1 Rhizobiales Rhizobiaceae Neorhizobium galegae HAMBI 540 GCF 000731315.1 Rhizobiales Rhizobiaceae Pararhizobium polonicum F5.1 GCF 001687365.1 Rhizobiales Rhizobiaceae Pararhizobium haloflavum XC0140 GCF 002750855.1 Rhizobiales Rhizobiaceae Pseudorhizobium pelagicum R2-400B4 GCF 000722625.1 Rhizobiales Rhizobiaceae Pseudorhizobium flavum YW14 GCF 902502825.1 Rhizobiales Rhizobiaceae CIAT 899 GCF 000330885.1 Rhizobiales Rhizobiaceae Rhizobium jaguaris CCGE525 GCF 003627755.1 Rhizobiales Rhizobiaceae Agrobacteriun oryzihabitans M15 GCF 010669145.1 Rhizobiales Rhizobiaceae Rhizobium acidisoli FH23 GCF 002531755.2 Rhizobiales Rhizobiaceae R650 GCF 001664385.1 Rhizobiales Rhizobiaceae Rhizobium esperanzae N561 GCF 001664265.1 Rhizobiales Rhizobiaceae Rhizobium grahamii BG7 GCF 009498215.1 Rhizobiales Rhizobiaceae Rhizobium pseudoryzae DSM 19479 GCF 011046245.1 Rhizobiales Rhizobiaceae Rhizobium rhizoryzae DSM 29514 GCF 011046895.1 Rhizobiales Rhizobiaceae Rhizobium favelukesii LPU83 GCF 000577275.2 Rhizobiales Rhizobiaceae Rhizobium multihospitium HAMBI 2975 GCF 900094585.1 Rhizobiales Rhizobiaceae Allorhizobium tabaishanense 14971 GCF 001938985.1 Rhizobiales Rhizobiaceae Rhizobium selenitrireducens ATCC BAA-1503 GCF 000518785.1 Rhizobiales Rhizobiaceae Pseudorhizobium 
halotolerans AB21 GCF 902153235.1 Rhizobiales Rhizobiaceae Pseudorhizobium endolithicum JC140 GCF 902153245.1 Rhizobiales Rhizobiaceae WSM1455 GCF 000271805.1 Rhizobiales Rhizobiaceae Rhizobium etli R22G GCF 003259755.1 Rhizobiales Rhizobiaceae Rhizobium mesoamericanum STM3625 GCF 000312665.1 Rhizobiales Rhizobiaceae USDA 1844 GCF 007827505.1 Rhizobiales Rhizobiaceae Rhizobium daejeonense CCBAU10050 GCF 011045155.1 Rhizobiales Rhizobiaceae Shinella zoogloeoides PQ7 GCF 003574625.1 Rhizobiales Rhizobiaceae Ensifer adhaerens OV14 (Casida A) GCF 000697965.2 Rhizobiales Rhizobiaceae Ensifer mexicanus ITGG R7 GCF 013488225.1 Rhizobiales Rhizobiaceae Ensifer sesbaniae CCBAU 65729 GCF 013283665.1 Rhizobiales Rhizobiaceae Ensifer sojae CCBAU 05684 GCF 002288525.1 Appendix A. Supplementary material 37

Table A.4 continued from previous page
Order Family Genus Species Strain Accession

Rhizobiales Rhizobiaceae CFNEI 73 GCF_001889105.1
Rhizobiales Rhizobiaceae Sinorhizobium arboris LMG 14919 GCF_000427465.1
Rhizobiales Rhizobiaceae CCBAU 25509 GCF_003177055.1
Rhizobiales Rhizobiaceae WSM419 GCF_000017145.1
Rhizobiales Rhizobiaceae Sinorhizobium meliloti USDA1170 GCF_004003925.1
Rhizobiales Rhodobiaceae Afifella aestuarii JA968 GCF_004023665.1
Rhizobiales Rhodobiaceae Amorphus coralli DSM 19760 GCF_000374525.1
Rhizobiales Rhodobiaceae Parvibaculum lavamentivorans DS-1 GCF_000017565.1
Rhizobiales Rhodobiaceae Pyruvatibacter mobilis CGMCC 1.15125 GCF_012848855.1
Rhizobiales Rhodobiaceae Tepidamorphus gemmatus DSM 19345 GCF_004346195.1
Rhizobiales Roseiarcaceae Roseiarcus fermentans DSM 24875 GCF_003315135.1
Rhizobiales Salinarimonadaceae Salinarimonas rosea DSM 21201 GCF_000429045.1
Rhizobiales Xanthobacteraceae Ancylobacter pratisalsi DSM 102029 GCF_010669125.1
Rhizobiales Xanthobacteraceae Azorhizobium caulinodans ORS 571 GCF_000010525.1
Rhizobiales Xanthobacteraceae Starkeya novella DSM 506 GCF_000092925.1
Rhodobacterales Rhodobacteraceae Rhodobacter sphaeroides ATCC 17029 GCF_000015985.1
Caulobacterales Caulobacteraceae Asticcacaulis excentris CB 48 GCF_000175215.2
Caulobacterales Caulobacteraceae Brevundimonas subvibrioides ATCC 15264 GCF_000144605.1
Caulobacterales Caulobacteraceae Brevundimonas naejangsanensis B1 GCF_000635915.2
Caulobacterales Caulobacteraceae Caulobacter flavus RHGG3 GCF_003722335.1
Caulobacterales Caulobacteraceae Caulobacter mirabilis FWC 38 GCF_002749615.1
Caulobacterales Caulobacteraceae Caulobacter rhizosphaerae KCTC 52515 GCF_010977555.1
Caulobacterales Caulobacteraceae Phenylobacterium zucineum HLK1 GCF_000017265.1
Caulobacterales Caulobacteraceae Woodsholea maritima DSM 17123 GCF_000382325.1
Cellvibrionales Cellvibrionaceae Thalassocella blandensis ISS155 GCF_902141825.1

Table A.4: Metadata for each taxon used in the phylogenetic reconstructions in this study. The taxonomic assignment of each taxon, down to the strain level, is displayed from left to right: class, order, family, genus, species, and strain. Also included in the table are the references for the taxon’s authority and the assembly accession number.

Input tree                    logL            bp-RELL [135]  p-KH [136]  p-SH [137]  c-ELW [138]    p-AU [139]

ribo aa parts ML              -383310.1854    0.999          0.998       1           0.999          0.999
ribo aa parts ML constrained  -383425.8483    0.0014*        0.002*      0.002*      0.00139*       0.00108*
ribo aa nopart ML             -385985.3978    1              1           1           1              1
ribo aa nopart ML constrained -386134.454     0.0001*        0.0001*     0.0001*     0.000148*      0.0000884*
ribo nuc part ML              -1097264.189    1              1           1           1              1
ribo nuc part ML constrained  -1097557.172    0*             0.0001*     0.0001*     0.000000019*   0.0000144*
ribo nuc nopart ML            -1099115.176    0.131          0.133       0.133       0.131          0.131
ribo nuc nopart ML constrained -1098974.326   0.869          0.867       1           0.869          0.869
ortho aa nopart ML            -1871319.895    1              1           1           1              1
ortho aa nopart ML constrained -1871823.009   0*             0*          0*          3.09e-143*     2.49e-61*

Table A.5: From left to right, the columns indicate: the input tree tested, the log likelihood (logL) of the tree, the proportion of RELL replicates supporting the specified topology, the p-value for the Kishino-Hasegawa (KH) test, the p-value for the Shimodaira-Hasegawa (SH) test, the c-value of the expected likelihood weights (ELW), and the p-value for the approximately unbiased (AU) test. An asterisk to the right of a given value denotes that the topology was significantly excluded from the 95% confidence set.
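As a reading aid for Table A.5, the following is a minimal sketch (not part of the original analysis) that recomputes the log-likelihood difference between each unconstrained tree and its constrained counterpart and flags whether the constrained topology falls outside the 95% confidence set under the AU test. The values are transcribed from the table; the names `ROWS`, `ALPHA`, and `summarize` are illustrative, and the 0.05 threshold is an assumption derived from the 95% confidence set mentioned in the caption.

```python
# Values transcribed from Table A.5:
# (dataset, logL unconstrained, logL constrained, p-AU of the constrained tree)
ROWS = [
    ("ribo aa parts", -383310.1854, -383425.8483, 0.00108),
    ("ribo aa nopart", -385985.3978, -386134.454, 0.0000884),
    ("ribo nuc part", -1097264.189, -1097557.172, 0.0000144),
    ("ribo nuc nopart", -1099115.176, -1098974.326, 0.869),
    ("ortho aa nopart", -1871319.895, -1871823.009, 2.49e-61),
]

# Assumed significance threshold, matching a 95% confidence set.
ALPHA = 0.05


def summarize(rows, alpha=ALPHA):
    """Return (dataset, delta logL, constrained tree rejected?) per dataset.

    delta logL is unconstrained minus constrained, so a positive value
    means the unconstrained tree has the higher likelihood.
    """
    summary = []
    for name, logl_free, logl_constrained, p_au in rows:
        delta = logl_free - logl_constrained
        summary.append((name, delta, p_au < alpha))
    return summary


if __name__ == "__main__":
    for name, delta, rejected in summarize(ROWS):
        status = "rejected" if rejected else "not rejected"
        print(f"{name}: delta logL = {delta:.4f}, constrained tree {status}")
```

Read this way, the constrained topology is rejected in four of the five datasets; the unpartitioned ribosomal nucleotide dataset is the exception, where the constrained tree has the higher log likelihood and is retained in the confidence set.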

[Figure A.1: phylogenetic tree image; tip labels and node support values are not reproduced here. Scale bar: 0.2 substitutions per site.]

Figure A.1: Bayesian tree based on partitioned, amino acid coded data from all single copy ribosomal loci from each taxon listed in Table A.4. Posterior probability values are represented as percentages and depicted atop their respective nodes. Triangular tips represent clades that have been collapsed to depict higher order taxonomic groupings. Branch lengths are drawn proportional to the amount of change.

[Figure A.2: phylogenetic tree image; tip labels and node support values are not reproduced here. Scale bar: 0.2 substitutions per site.]

Figure A.2: Maximum likelihood tree based on unpartitioned, amino acid coded data from all single copy ribosomal loci from each taxon listed in Table A.4 (log likelihood = -385985.387). UFBoot values are represented as percentages and depicted atop their respective nodes. Triangular tips represent clades that have been collapsed to depict higher order taxonomic groupings. Branch lengths are drawn proportional to the amount of change.

[Figure A.3: phylogenetic tree image; tip labels and node support values are not reproduced here. Scale bar: 0.2 substitutions per site.]

Figure A.3: Bayesian tree based on unpartitioned, amino acid coded data from all single copy ribosomal loci from each taxon listed in Table A.4. Posterior probability values are represented as percentages and depicted atop their respective nodes. Triangular tips represent clades that have been collapsed to depict higher order taxonomic groupings. Branch lengths are drawn proportional to the amount of change.

[Figure A.4: phylogenetic tree image; tip labels and node support values are not reproduced here. Scale bar: 0.4 substitutions per site.]

Figure A.4: Maximum likelihood tree based on partitioned, nucleotide coded data from all single copy ribosomal loci from each taxon listed in Table A.4 (log likelihood = -1084451.327). UFBoot values are represented as percentages and depicted atop their respective nodes. Triangular tips represent clades that have been collapsed to depict higher order taxonomic groupings. Branch lengths are drawn proportional to the amount of change.

[Figure A.5: phylogenetic tree image; tip labels and node support values are not reproduced here. Scale bar: 0.3 substitutions per site.]

Figure A.5: Maximum likelihood tree based on unpartitioned, nucleotide coded data from all single copy ribosomal loci from each taxon listed in Table A.4 (log likelihood = -1097411.166). UFBoot values are represented as percentages and depicted atop their respective nodes. Triangular tips represent clades that have been collapsed to depict higher order taxonomic groupings. Branch lengths are drawn proportional to the amount of change.

[Figure A.6: phylogenetic tree image; tip labels and node support values are not reproduced here. Scale bar: 0.3 substitutions per site.]

Figure A.6: Maximum likelihood tree based on unpartitioned, amino acid coded data from all single copy orthologues recovered from each taxon listed in Table A.4 (log likelihood = -1871319.897). UFBoot values are represented as percentages and depicted atop their respective nodes. Triangular tips represent clades that have been collapsed to depict higher order taxonomic groupings. Branch lengths are drawn proportional to the amount of change.

[Figure A.7: pathway diagram image with panels: Thiamin (B1); Riboflavin (B2), folic acid (B9); Cobalamin (B12); Coenzyme A (B5), NADP+ (B3); Pyridoxal 5'-phosphate (B6); Lipoic acid; Biotin (B7).]

Figure A.7: Representation of each metabolic pathway involved in the biosynthesis of the B vitamin complex as found in Gellertiella hungarica. Bolded names represent start or end-point compounds and black circles represent intermediates not named to conserve space. All arrows represent the conversion of one intermediate to the next, with the gene(s) responsible for producing the enzyme depicted either above or below the reaction arrow. Steps for which the gene(s) encoding the enzyme were not found are highlighted with a red box and denoted by grey, dotted reaction arrows. Asterisks above the name of a particular gene denote that this gene is not present in all strains examined. Grey names and arrows represent steps for which I have found a potential replacement enzyme that has not yet been demonstrated to have the action indicated. Abbreviations are the same as used in Figure 1.3.

[Figure A.8: pathway diagram image with panels: Thiamin (B1); Riboflavin (B2), folic acid (B9); Cobalamin (B12); Coenzyme A (B5), NADP+ (B3); Pyridoxal 5'-phosphate (B6); Lipoic acid; Biotin (B7).]

Figure A.8: Representation of each metabolic pathway involved in the biosynthesis of the B vitamin complex as found in Reichenowia sp. strain Ppha. Bolded names represent start or end-point compounds and black circles represent intermediates not named to conserve space. All arrows represent the conversion of one intermediate to the next, with the gene(s) responsible for producing the enzyme depicted either above or below the reaction arrow. Steps for which the gene(s) encoding the enzyme were not found are highlighted with a red box and denoted by grey, dotted reaction arrows. Asterisks above the name of a particular gene denote that this gene is not present in all strains examined. Grey names and arrows represent steps for which I have found a potential replacement enzyme that has not yet been demonstrated to have the action indicated. Abbreviations are the same as used in Figure 1.3.

[Figure A.9: pathway diagram image with panels: Thiamin (B1); Riboflavin (B2), folic acid (B9); Cobalamin (B12); Coenzyme A (B5), NADP+ (B3); Pyridoxal 5'-phosphate (B6); Lipoic acid; Biotin (B7).]

Figure A.9: Representation of each metabolic pathway involved in the biosynthesis of the B vitamin complex as found in Reichenowia sp. strain Pmon. Bolded names represent start or end-point compounds and black circles represent intermediates not named to conserve space. All arrows represent the conversion of one intermediate to the next, with the gene(s) responsible for producing the enzyme depicted either above or below the reaction arrow. Steps for which the gene(s) encoding the enzyme were not found are highlighted with a red box and denoted by grey, dotted reaction arrows. Asterisks above the name of a particular gene denote that this gene is not present in all strains examined. Grey names and arrows represent steps for which I have found a potential replacement enzyme that has not yet been demonstrated to have the action indicated. Abbreviations are the same as used in Figure 1.3.

[Figure A.10: pathway diagram image with panels: Thiamin (B1); Riboflavin (B2), folic acid (B9); Cobalamin (B12); Coenzyme A (B5), NADP+ (B3); Pyridoxal 5'-phosphate (B6); Lipoic acid; Biotin (B7).]

Figure A.10: Representation of each metabolic pathway involved in the biosynthesis of the B vitamin complex as found in Reichenowia sp. strain Pmul. Bolded names represent start or end-point compounds and black circles represent intermediates not named to conserve space. All arrows represent the conversion of one intermediate to the next, with the gene(s) responsible for producing the enzyme depicted either above or below the reaction arrow. Steps for which the gene(s) encoding the enzyme were not found are highlighted with a red box and denoted by grey, dotted reaction arrows. Asterisks above the name of a particular gene denote that this gene is not present in all strains examined. Grey names and arrows represent steps for which I have found a potential replacement enzyme that has not yet been demonstrated to have the action indicated. Abbreviations are the same as used in Figure 1.3.

[Figure A.11: pathway diagram image with panels: Thiamin (B1); Riboflavin (B2), folic acid (B9); Cobalamin (B12); Coenzyme A (B5), NADP+ (B3); Pyridoxal 5'-phosphate (B6); Lipoic acid; Biotin (B7).]

Figure A.11: Representation of each metabolic pathway involved in the biosynthesis of the B vitamin complex as found in Reichenowia sp. strain Ppar. Bolded names represent start or end-point compounds and black circles represent intermediates not named to conserve space. All arrows represent the conversion of one intermediate to the next, with the gene(s) responsible for producing the enzyme depicted either above or below the reaction arrow. Steps for which the gene(s) encoding the enzyme were not found are highlighted with a red box and denoted by grey, dotted reaction arrows. Asterisks above the name of a particular gene denote that this gene is not present in all strains examined. Grey names and arrows represent steps for which I have found a potential replacement enzyme that has not yet been demonstrated to have the action indicated. Abbreviations are the same as used in Figure 1.3.

[Figure A.12: pathway diagram image with panels: Thiamin (B1); Riboflavin (B2), folic acid (B9); Cobalamin (B12); Coenzyme A (B5), NADP+ (B3); Pyridoxal 5'-phosphate (B6); Lipoic acid; Biotin (B7).]

Figure A.12: Representation of each metabolic pathway involved in the biosynthesis of the B vitamin complex as found in Reichenowia sp. strain Prug. Bolded names represent start or end-point compounds and black circles represent intermediates not named to conserve space. All arrows represent the conversion of one intermediate to the next, with the gene(s) responsible for producing the enzyme depicted either above or below the reaction arrow. Steps for which the gene(s) encoding the enzyme were not found are highlighted with a red box and denoted by grey, dotted reaction arrows. Asterisks above the name of a particular gene denote that this gene is not present in all strains examined. Grey names and arrows represent steps for which I have found a potential replacement enzyme that has not yet been demonstrated to have the action indicated. Abbreviations are the same as used in Figure 1.3.

[Figure A.13: pathway diagram image with panels: Thiamin (B1); Riboflavin (B2), folic acid (B9); Cobalamin (B12); Coenzyme A (B5), NADP+ (B3); Pyridoxal 5'-phosphate (B6); Lipoic acid; Biotin (B7).]

Figure A.13: Representation of each metabolic pathway involved in the biosynthesis of the B vitamin complex as found in Reichenowia sp. strain Psp. Bolded names represent start or end-point compounds and black circles represent intermediates not named to conserve space. All arrows represent the conversion of one intermediate to the next, with the gene(s) responsible for producing the enzyme depicted either above or below the reaction arrow. Steps for which the gene(s) encoding the enzyme were not found are highlighted with a red box and denoted by grey, dotted reaction arrows. Asterisks above the name of a particular gene denote that this gene is not present in all strains examined. Grey names and arrows represent steps for which I have found a potential replacement enzyme that has not yet been demonstrated to have the action indicated. Abbreviations are the same as used in Figure 1.3.