Phylogenetic and Biochemical Evidence for Sterol Synthesis in the Bacterium Gemmata Obscuriglobus
Total Page:16
File Type:pdf, Size:1020Kb
Phylogenetic and biochemical evidence for sterol synthesis in the bacterium Gemmata obscuriglobus Ann Pearson*†, Meytal Budin*, and Jochen J. Brocks‡ Departments of *Earth and Planetary Sciences and ‡Organismic and Evolutionary Biology, Harvard University, Cambridge, MA 02138 Communicated by Andrew H. Knoll, Harvard University, Cambridge, MA, October 10, 2003 (received for review July 9, 2003) Sterol biosynthesis is viewed primarily as a eukaryotic process, and Here we conducted an exhaustive search of all microbial genetic the frequency of its occurrence in bacteria has long been a subject sequence data currently in the public domain to (i) identify the of controversy. Two enzymes, squalene monooxygenase and ox- presence of SQMO genes in prokaryotes, (ii) identify the OSC idosqualene cyclase, are the minimum necessary for initial biosyn- genes carried by organisms also containing SQMO and determine thesis of sterols from squalene. In this work, 19 protein gene whether these sequences are consistent with lanosterol synthase, sequences for eukaryotic squalene monooxygenase and 12 protein cycloartenol synthase, or SHC activity, and (iii) investigate the gene sequences for eukaryotic oxidosqualene cyclase were com- phylogeny of these genetic sequences with respect to sterol biosyn- pared with all available complete and partial prokaryotic genomes. thesis in eukaryotes. We hypothesized that sequence similarity to a The only unequivocal matches for a sterol biosynthetic pathway specific group of eukaryotes (animals, plants, fungi, or protists) were in the proteobacterium, Methylococcus capsulatus, in which would be consistent with a recent lateral gene transfer event, sterol biosynthesis is known, and in the planctomycete, Gemmata whereas divergence of the bacterial sequences into a basal clade obscuriglobus. The latter species contains the most abbreviated would indicate an ancient biochemical origin. Significantly, these sterol pathway yet identified in any organism. Analysis shows that results could impact our understanding of the coevolution of the major sterols in Gemmata are lanosterol and its uncommon metabolic pathways, the composition of the atmosphere, and the isomer, parkeol. There are no subsequent modifications of these radiation of nucleated cells (eukaryotes). products. In bacteria, the sterol biosynthesis genes occupy a contiguous coding region and possibly comprise a single operon. Methods Phylogenetic trees constructed for both enzymes show that the Genome Searches. Amino acid sequences for all identified and sterol pathway in bacteria and eukaryotes has a common ancestry. annotated eukaryotic SQMO genes were compiled from the It is likely that this contiguous reading frame was exchanged National Center for Biotechnology Information (NCBI) and between bacteria and early eukaryotes via lateral gene transfer or Department of Energy (DOE)͞Joint Genome Institute (JGI) endosymbiotic events. The primitive sterols produced by Gemmata public databases, www.ncbi.nih.gov and www.jgi.doe.gov (see the suggest that this genus could retain the most ancient remnants of first column of Table 2, which is published as supporting the sterol biosynthetic pathway. information on the PNAS web site). Amino acid sequences for OSC genes of the same organisms also were obtained when terol biosynthesis is nearly ubiquitous among eukaryotes; available (Table 2, second column). To identify SQMO genes in Sconversely, it is almost completely absent in prokaryotes (1). prokaryotes, each SQMO sequence from Table 2 was searched As a result, the presence of diverse steranes in ancient rocks is by BLAST (using default values) against all complete and partial used as evidence for eukaryotic evolution Ͼ2.7 billion years ago microbial genomes available through NCBI, as well as the genomes currently in process at the Department of Energy Joint (2). However, the occasional presence of sterols in prokaryotes Genome Institute and the Max Planck Institute (http:͞͞ is poorly understood. Sterol production by bacteria previously blast.mpi-bremen.de͞blast; accessed March, 2003). The best has been demonstrated only in the Methylococcales (3, 4) and matches (Table 1) were defined as all species that yielded at least Myxobacteriales (5, 6). Ϫ one expect similarity value Ͻ10 8 and were compiled and ranked Understanding the evolution of sterol biosynthesis is of sig- by weighted similarity score as described below. Because many nificant interest to biochemistry, evolutionary biology, and the microbial species contain OSC genes, including SHCs, searches geosciences, because the only known biosynthetic pathway re- for OSC genes were conducted only for those organisms iden- quires molecular oxygen. The first step in this pathway is the tified through the SQMO similarity searches. Phylogeny of the epoxidation of the hydrocarbon squalene, in which the addition 1 SHCs is not discussed in this work. of ⁄2O is catalyzed by the enzyme squalene monooxygenase 2 Raw DNA sequence data for the putative SQMO and OSC (SQMO) (7). Unless there are other unknown enzymes or coding regions of M. capsulatus (gnl TIGR414 contig: abiogenic reactions capable of producing squalene epoxide, this 229:Methylococcus capsulatus) and G. obscuriglobus would require the prior evolution of oxygenic photosynthesis. (gnl TIGR214688 contig:782:Gemmata obscuriglobus UQM For sterol biosynthesis to date to the last common ancestor, a 2246) were translated into all six reading frames with minimum biogenic or abiogenic peroxidation could be a potential mech- read length set to 300 aa. Resulting translated sequences were anism, although this has not yet been demonstrated. BLASTed against the Swiss-Prot protein sequence database to Cyclization of squalene epoxide to form the initial sterol identify putative functional homologues. proceeds immediately through the action of a second enzyme, oxidosqualene cyclase (OSC). It is believed that OSC evolved Sequence Alignments. All data were aligned as protein sequences. from the hopanoid pathway predecessor, bacterial squalene– Multiple alignments were done in CLUSTALW 1.8, using the BLOSUM hopene cyclase (SHC) (8, 9). In eukaryotes, the initial sterols weight matrix, gap open penalty 10, gap extension penalty 0.0, and lanosterol and cycloartenol are merely biosynthetic intermedi- ates; i.e., up to 20 additional steps are required for downstream conversion of lanosterol to the animal product, cholesterol. Abbreviations: OSC, oxidosqualene cyclase; SHC, squalene–hopene cyclase; SQMO, Synthesis of sterols, however, requires only SQMO and OSC. squalene monooxygenase. The major outstanding questions are when and in what organ- †To whom correspondence should be addressed. E-mail: [email protected]. isms did the original synthesis pathway develop? © 2003 by The National Academy of Sciences of the USA 15352–15357 ͉ PNAS ͉ December 23, 2003 ͉ vol. 100 ͉ no. 26 www.pnas.org͞cgi͞doi͞10.1073͞pnas.2536559100 Downloaded by guest on October 1, 2021 Table 1. BLAST results for SQMO and OSC genes of eukaryotes (rows) vs. prokaryotes (columns) Methylococcus Gemmata Burkholderia Bradyrhizobium Azotobacter Nostoc Pseudomonas Bacillus Thermobifida Coxiella capsulatus obscuriglobus mallei japonicum vinelandii punctiforme* aeruginosa* halodurans* fusca burnetii Arabidopsis† thaliana —͞leϪ100 7eϪ7͞3eϪ80 3eϪ06͞2eϪ45 —͞6eϪ37 —͞2eϪ43 —͞1eϪ41 —͞— 3eϪ06͞——͞——͞— (ER11) A. thaliana (ER12) — 2eϪ6 7eϪ05 ————5eϪ05 —— A. thaliana (ER13) 2eϪ15 9eϪ13 2eϪ12 2eϪ7 4eϪ4 1eϪ13 9eϪ07 4eϪ10 4eϪ5 4eϪ10 A. thaliana (NP564) 1eϪ13 1eϪ14 5eϪ08 2eϪ5 4eϪ3 1eϪ07 2eϪ07 1eϪ06 1eϪ5 — A. thaliana (NP568) 2eϪ19 2eϪ13 3eϪ12 2eϪ4 1eϪ3 8eϪ03 9eϪ08 — 7eϪ5 — Brassica napus (ER11) 1eϪ10͞n.a. 1eϪ07͞n.a. 3eϪ07͞n.a. —͞n.a. —͞n.a. —͞n.a. 1eϪ03͞n.a. 1eϪ06͞n.a. —͞n.a. —͞n.a. B. napus (ER12) 5eϪ7 8eϪ06 4eϪ05 ————3eϪ05 —— Candida albicans 2eϪ12͞2eϪ87 1eϪ18͞4eϪ78 5eϪ03͞3eϪ29 3eϪ07͞8eϪ15 8eϪ05͞9eϪ25 9eϪ05͞1eϪ25 6eϪ08͞— 3eϪ02͞— 5eϪ3͞——͞— Homo sapiens 7eϪ16͞1eϪ120 8eϪ15͞6eϪ92 1eϪ04͞5eϪ50 4eϪ02͞2eϪ47 —͞4eϪ38 2eϪ05͞2eϪ37 4eϪ07͞— 1eϪ07͞——͞——͞— Leishmania major 6eϪ14͞n.a. 6eϪ13͞n.a. —͞n.a. —͞n.a. —͞n.a. —͞n.a. —͞n.a. —͞n.a. —͞n.a. —͞n.a. Medicago truncatula 2eϪ20͞1eϪ60 6eϪ19͞2eϪ52 1eϪ15͞1eϪ36 9eϪ10͞9eϪ30 1eϪ10͞4eϪ32 9eϪ05͞6eϪ28 3eϪ17͞— 1eϪ02͞— 1eϪ7͞— 1eϪ8͞— (49) M. truncatula (48) 3eϪ14 3eϪ15 3eϪ10 5eϪ3 2eϪ4 4eϪ02 2eϪ09 — 3eϪ2 — Mus musculus 9eϪ16͞1eϪ118 3eϪ15͞2eϪ92 3eϪ09͞5eϪ52 2eϪ5͞7eϪ46 4eϪ3͞8eϪ37 3eϪ07͞2eϪ35 2eϪ08͞— 1eϪ09͞— 3eϪ3͞——͞— Oryza sativa (687) 7eϪ17͞1eϪ101 2eϪ14͞6eϪ79 3eϪ10͞2eϪ39 7eϪ6͞5eϪ41 3eϪ6͞2eϪ39 4eϪ02͞3eϪ35 6eϪ08͞— 7eϪ07͞— 4eϪ6͞——͞— O. sativa (686) 1eϪ18 2eϪ13 2eϪ08 1eϪ4 5eϪ6 — 5eϪ08 — 1eϪ3 — Panax ginseng 2eϪ18͞1eϪ103 3eϪ18͞2eϪ83 1eϪ13͞9eϪ41 3eϪ6͞5eϪ36 3eϪ7͞3eϪ38 6eϪ06͞2eϪ39 9eϪ10͞——͞— 6eϪ10͞— 6eϪ10͞— Rattus norvegicus 1eϪ14͞1eϪ119 2eϪ16͞1eϪ93 5eϪ10͞1eϪ54 6eϪ4͞2eϪ46 4eϪ4͞5eϪ39 7eϪ08͞9eϪ38 2eϪ09͞— 2eϪ11͞— 7eϪ3͞— 5eϪ4͞— Saccharomyces 2eϪ11͞9eϪ86 6eϪ18͞5eϪ75 4eϪ2͞1eϪ17 1eϪ4͞1eϪ19 5eϪ2͞2eϪ21 1eϪ03͞3eϪ24 8eϪ08͞— 4eϪ02͞— 6eϪ2͞——͞— cerevisiae Schizosaccharomyces 1eϪ13͞6eϪ98 1eϪ15͞1eϪ82 3eϪ5͞2eϪ39 3eϪ4͞4eϪ35 —͞8eϪ34 —͞1eϪ33 6eϪ04͞— 1eϪ04͞——͞——͞— pombe Weighted score (13͉82) (13͉67) (7͉33) (3͉29) (3͉28) (3͉28) (6͉0) (4͉0) (3͉0) (2͉0) Expect values are presented in the order SQMO͞OSC. Single copies of OSC genes are scored in the first row for each organism; multiple copies of SQMO genes are scored separately. The weighted average probability scores are shown in the last row. n.a., sequence not available; —, no hits. *Complete genome. †Accession numbers and sequence information are given in Table 2. hydrophilic gaps. The output