Parallel Genomic Evolution and Metabolic Interdependence in an Ancient Symbiosis
Total Page:16
File Type:pdf, Size:1020Kb
Parallel genomic evolution and metabolic interdependence in an ancient symbiosis John P. McCutcheon†‡ and Nancy A. Moran‡§ †Center for Insect Science and ‡Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ 85721-0088 Edited by E. Peter Greenberg, University of Washington School of Medicine, Seattle, WA, and approved October 19, 2007 (received for review September 18, 2007) Obligate symbioses with nutrient-provisioning bacteria have orig- C. atlanticus (2.95*) 99 inated often during animal evolution and have been key to the 57 G. forsetii (3.80) ecological diversification of many invertebrate groups. To date, 92 Flavo. BBFL7 (3.08*) genome sequences of insect nutritional symbionts have been Flavo. MED217 (4.24*) restricted to a related cluster within Gammaproteobacteria and 40 99 Cellulophaga (3.30*) have revealed distinctive features, including extreme reduction, Flavo. HTCC2170 (3.88*) 0.2 67 100 rapid evolution, and biased nucleotide composition. Using recently R. biformata (3.53*) developed sequencing technologies, we show that Sulcia muelleri, F. johnsoniae (6.10) 100 100 a member of the Bacteroidetes, underwent similar genomic Flavo. BAL38 (2.81*) 100 changes during coevolution with its sap-feeding insect host (sharp- P. irgensii (2.75*) shooters) and the coresident symbiont Baumannia cicadellinicola Sulcia (0.25) 76 (Gammaproteobacteria). At 245 kilobases, Sulcia’s genome is ap- B. thetaiotaomicron (6.26) proximately one tenth of the smallest known Bacteroidetes ge- 100 B. fragilis YCH46 (5.28) 100 nome and among the smallest for any cellular organism. Analysis 100 100 B. fragilis NCTC 9343 (5.21) of the coding capacities of Sulcia and Baumannia reveals striking P. gingivalis (2.34) complementarity in metabolic capabilities. Algoriphagus sp. PR1 (4.78*) 98 M. marina (9.77*) Bacteroidetes ͉ insects ͉ pyrosequencing ͉ Sharpshooters ͉ 98 C. hutchinsonii (4.43) genome reduction S. ruber (3.55) embers of the Bacteroidetes are widely distributed in Fig. 1. Relationships of Sulcia muelleri to other sequenced members of the Bacteroidetes. Bootstrap values for 100 replicates are shown at the bifurca- Mnature, and have been reported as prominent members of tions. The genome sizes (in megabases) are given in parentheses after the environments as diverse as coastal marine waters (1), the human organism name. Those sizes marked with an asterisk are from incomplete gut (2, 3), and dental plaques (4). In insects, a member of the genome projects and should be considered approximate. (Scale bar: 0.2 Bacteroidetes called Sulcia muelleri (Fig. 1) has been shown to changes per site.) be an ancient symbiont of a large group of sap-feeding insects, in which the initial infection was acquired Ͼ260 million years ago (5). In addition to Sulcia, these insects have at least one other plex sample containing a mixture of DNA from the insect host, long-term heritable symbiont (5), exemplified by Baumannia in Sulcia, and Baumannia, with the host DNA constituting the the case of sharpshooters. These symbioses are models of majority fraction of the sample by weight and the Sulcia DNA codiversification over long time periods: Baumannia, Sulcia, and representing the majority of the bacterial fraction. their sharpshooter hosts seem to have diversified through strict vertical association during evolution of this insect group (6). Results Sharpshooters feed exclusively on xylem sap, which is the most By maximizing the representation of Sulcia in our sample, we dilute and unbalanced food source used by herbivores (7, 8). succeeded in obtaining deep coverage for this genome. Of the 416 Xylem composition varies depending on the plant assayed, but contigs generated from the Newbler (10) assembly, 25 were cleanly the primary components are typically a dilute mix of the amino acids glutamate, glutamine, aspartate, and asparagine; some separated by a greater average depth of coverage [supporting simple organic acids (primarily malate); and various sugars information (SI) Fig. 4]. Twenty-three of these contigs had gene (primarily glucose) (7, 8). The genome sequence of Baumannia contents suggesting they belonged to the Sulcia genome and were was recently completed, along with fragments of the Sulcia assembled into a complete circular genome based solely on data genome, both from the invasive agricultural pest Homalodisca generated from the 454 run. This genome included almost all of the vitripennis (formerly H. coagulata, also known as the Glassy- sequence contigs that had been assigned to Sulcia previously (9), Winged Sharpshooter) (9). Analysis of the Baumannia genome revealed that it primarily contributes vitamins and cofactors to Author contributions: J.P.M. and N.A.M. designed research; J.P.M. performed research; the host, while encoding at least partial pathways for two of the J.P.M. and N.A.M. analyzed data; and J.P.M. and N.A.M. wrote the paper. 10 essential amino acids (9). The partial sequence obtained for The authors declare no conflict of interest. Sulcia suggested that it is primarily responsible for essential This article is a PNAS Direct Submission. amino acid biosynthesis (9). Data deposition: The sequence of Sulcia muelleri has been deposited in the GenBank/ To fully and unambiguously assess the role of Sulcia in the EMBL/DDBJ database (accession no. CP000770). metabolism of this tripartite symbiosis, we sequenced the ge- §To whom correspondence should be addressed at: Department of Ecology and Evolution- nome using pyrosequencing (454 Life Sciences/Roche Applied ary Biology, Biosciences West, Room 310, 1041 East Lowell Street, Tucson, AZ 85721-0088. Science) (10). We attempted to enrich for Sulcia DNA in our E-mail: [email protected]. sample through dissection of the appropriate bacteriome. None- This article contains supporting information online at www.pnas.org/cgi/content/full/ theless, as in previous genome sequencing projects on noncul- 0708855104/DC1. tivable, host-associated microorganisms, we started with a com- © 2007 by The National Academy of Sciences of the USA 19392–19397 ͉ PNAS ͉ December 4, 2007 ͉ vol. 104 ͉ no. 49 www.pnas.org͞cgi͞doi͞10.1073͞pnas.0708855104 Downloaded by guest on September 24, 2021 NAD NADH NADH NAD Sulcia muelleri glyceraldehyde 3-P glycerate 1,3-P gapA malate maeA leucine ilvE leuB leuC,D leuA NAD NADH ilvE ilvCilvD ilvN,B pyruvate iscS valine acetyl-CoA cysteine alanine 2-ketovaline acoA,B; aceF; lpdA sufS/E/B/C/D Fe-S cluster thrA thrB thrC ilvA assembly isoleucine homoserinethreonine ilvB,N ilvC ilvD ilvE lysC asd dapA dapB dapD argD dapE dapF lysA aspartate lysine succinate aspartate-semialdehyde sucC,D oxaloacetate 2-ketoglutarate argA argB argC argD argE argF argG argH succinyl-CoA arginine korA,B aspC glutamate ornithine sucA,B; lpdA putA glutamine NADH carbamoyl phosphate FAD FADH carA carB 2 NAD NADH NAD proline pyrroline 5-carboxylate putA aroGaroB aroD aroE aroK aroA chorismate pheA aspC PEP phenylalanine aroC pheA erythrose-4-phosphate trpD,E trpG trpF trpC trpA trpB tryptophan malonyl-[ACP] [ACP] lipB lipA octanoyl-[ACP] lipoylated domain protein acetyl-[ACP] acetoacetyl-[ACP] fabF apo domain protein DHNA ispB prsA polyprenyl-PP menA ribulose-5-P PRPP polyprenyl(n+1)-PP + + cations heavy ubiE H H ADP NADH NAD H O ATP metals menaquinone O2 2 ? ADP ATP NADH dehydrogenase cbb3-type Mo oxidoreductase? cyt. oxidase ATP synthase cyt. c + + multidrug H H Fig. 2. The predicted metabolism of Sulcia muelleri. Essential amino acids are shown in red letters, and vitamins/cofactors are shown in blue. Gene names are written in plain letters, and names of selected compounds are written in bold and indicated by a small gray circle. Genes to which an ORF cannot be unambiguously assigned are written in gray. indicating that the binning criteria were generally accurate but not all protein-coding genes are devoted to translation-related and perfect. amino acid biosynthesis functions, respectively (SI Fig. 5). Because pyrosequencing has trouble accurately calling the Analysis of Sulcia’s predicted metabolic map (Fig. 2) reveals number of bases in homopolymeric regions, especially for single- an organism largely devoted to essential amino acid synthesis. base runs longer than 5–8 bases (10, 11), and because Sulcia’s We predict that pathways for leucine, valine, threonine, isoleu- high AϩT content results in numerous A and T homopolymeric cine, phenylalanine, and tryptophan are all complete. As ob- tracts, we expected and found inaccuracies in those regions. served in Escherichia coli, we predict that the DapC activity in Thirty homopolymeric regions that disagreed between the 454- lysine biosynthesis is fulfilled by ArgD (13). The lysine and and previously obtained Sanger-generated sequences available arginine pathways show evolutionary homology (14), and one for portions of the Sulcia genome (9) were retested by PCR and Sulcia gene cannot be definitely assigned as either argE or dapE. Sanger sequencing; in all cases, the Sanger prediction was These two homologous enzymes perform the same chemical confirmed. reaction (the cleavage of an amide bond) on related molecules, To resolve the homopolymeric regions in the Sulcia genome, and this one gene may perform both roles in Sulcia. In addition, 13,564,883 reads of 33 bases in length were generated from one we predict that the N-acetyltransferase activity of ArgA is fused partial run of an Illumina/Solexa Genome Analyzer, which to the amino terminus of ArgG, not unlike other previous affords many short, highly accurate reads (12). Mapping these examples of the mobility of this activity (15). reads onto the Sulcia genome corrected 125 additional ho- Sulcia seems to be able to generate reducing power in the form mopolymeric regions and eliminated almost all frameshifts and of NADH, to aerobically transmit this energy to a cbb-3 type in-frame stop codons in the predicted coding regions. However, cytochrome c oxidase-terminated electron transport chain to an in-frame stop in a putative cytochrome c assembly protein was generate a proton gradient, and to make ATP using this proton well supported by the Solexa data and was left in the genome, motive force.