The Cyanobacterial Genome Core and the Origin of Photosynthesis
Total Page:16
File Type:pdf, Size:1020Kb
The cyanobacterial genome core and the origin of photosynthesis Armen Y. Mulkidjanian*†‡, Eugene V. Koonin§, Kira S. Makarova§, Sergey L. Mekhedov§, Alexander Sorokin§, Yuri I. Wolf§, Alexis Dufresne¶, Fre´ de´ ric Partensky¶, Henry Burdʈ, Denis Kaznadzeyʈ, Robert Haselkorn†**, and Michael Y. Galperin†§ *School of Physics, University of Osnabru¨ck, D-49069 Osnabru¨ck, Germany; ‡A. N. Belozersky Institute of Physico–Chemical Biology, Moscow State University, Moscow 119899, Russia; §National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894; ¶Station Biologique, Unite´Mixte de Recherche 7144, Centre National de la Recherche Scientifique et Universite´Paris 6, BP74, F-29682 Roscoff Cedex, France; ʈIntegrated Genomics, Inc., Chicago, IL 60612; and **Department of Molecular Genetics and Cell Biology, University of Chicago, 920 East 58th Street, Chicago, IL 60637 Contributed by Robert Haselkorn, July 14, 2006 Comparative analysis of 15 complete cyanobacterial genome se- to trace the conservation of these genes among other taxa. We quences, including ‘‘near minimal’’ genomes of five strains of analyzed the phylogenetic affinities of genes in this set and Prochlorococcus spp., revealed 1,054 protein families [core cya- identified previously unrecognized candidate photosynthetic nobacterial clusters of orthologous groups of proteins (core Cy- genes. We further used this gene set to address the identity of the OGs)] encoded in at least 14 of them. The majority of the core first phototrophs, a subject of intense discussion in recent years CyOGs are involved in central cellular functions that are shared (8, 9, 12–33). We show that cyanobacteria and plants share with other bacteria; 50 core CyOGs are specific for cyanobacteria, numerous photosynthesis-related genes that are missing in ge- whereas 84 are exclusively shared by cyanobacteria and plants nomes of other phototrophs. This observation suggests, in and͞or other plastid-carrying eukaryotes, such as diatoms or api- agreement with geological evidence, that (now extinct) anoxy- complexans. The latter group includes 35 families of uncharacter- genic ancestors of cyanobacteria are the most plausible candi- ized proteins, which could also be involved in photosynthesis. Only dates for the ancestral photoautotrophs, which apparently dis- a few components of cyanobacterial photosynthetic machinery are seminated parts of their photosynthetic apparatus to other represented in the genomes of the anoxygenic phototrophic bac- bacteria by way of LGT. teria Chlorobium tepidum, Rhodopseudomonas palustris, Chlo- roflexus aurantiacus,orHeliobacillus mobilis. These observations, Results coupled with recent geological data on the properties of the Common and Unique Protein Families in Cyanobacteria. Clustering of ancient phototrophs, suggest that photosynthesis originated in proteins encoded in the 15 complete cyanobacterial genomes the cyanobacterial lineage under the selective pressures of UV light yielded 3,188 protein families [cyanobacterial clusters of ortholo- and depletion of electron donors. We propose that the first gous groups of proteins (CyOGs)] with members encoded in at phototrophs were anaerobic ancestors of cyanobacteria (‘‘procya- least three genomes. Of these CyOGs, 892 were encoded in each nobacteria’’) that conducted anoxygenic photosynthesis using a cyanobacterial genome, and 162 more were encoded in 14 of 15 photosystem I-like reaction center, somewhat similar to the het- genomes (Table 2, which is published as supporting information erocysts of modern filamentous cyanobacteria. From procyanobac- on the PNAS web site). The combined set of 1,054 CyOGs that teria, photosynthesis spread to other phyla by way of lateral gene are missing in no more than one cyanobacterial genome is transfer. hereafter referred to as the core CyOGs. Predictably, cyanobac- teria with small genomes are over-represented in the core ͉ ͉ cyanobacteria protein families lateral gene transfer CyOGs compared with species with larger genomes. Thus, core CyOGs include 52–66% of all proteins encoded in Prochloro- yanobacteria are one of the earliest branching groups of coccus spp. but only 25% of Anabaena sp. PCC 7120 proteins Corganisms on this planet (1, 2). They are the only known (Table 3, which is published as supporting information on the prokaryotes to carry out oxygenic photosynthesis, and there is PNAS web site). little doubt that they played a key role in the formation of Analysis of CyOGs that apparently had no members in one or atmospheric oxygen Ϸ2.3 Gyr ago (2). Despite its evolutionary, more cyanobacterial genomes revealed 31 (mostly short) pro- environmental, and geochemical importance, many aspects of teins that are encoded in the respective genomes but were not cyanobacterial cell life remain obscure (3–5). Genome sequenc- called by gene-finding programs, such as subunits VI (PetL) and ing opened a new chapter in cyanobacterial research. In the last VII (PetM) of the cytochrome b6f complex (34). We also found few years, complete genome sequences of several freshwater and five full-length genes that were annotated as pseudogenes in the marine cyanobacteria became available, providing ample data original genome submissions and whose products were not for systematic analysis. A comparison of the complete genomes included in the protein database (Table 4, which is published as from three different strains of Prochlorococcus spp. demon- supporting information on the PNAS web site). strated a wide variety of gene complements within this genus due The stringent criteria used to define the core CyOGs led to to massive genome reduction in some lineages (6, 7). Studies of the genes shared by cyanobacteria and other photosynthetic organisms allowed delineation of the ‘‘photosynthetic gene set’’ Conflict of interest statement: No conflicts declared. and demonstrated a significant extent of lateral gene transfer Freely available online through the PNAS open access option. (LGT) among phototrophic bacteria (8–11). A somewhat sur- Abbreviations: CyOGs, cyanobacterial clusters of orthologous groups of proteins; LGT, prising result of the latter work has been that genes for most lateral gene transfer; PSI, photosystem I; PSII, photosystem II; RC, reaction center. proteins involved in photosynthesis (hereafter ‘‘photosynthetic Data deposition: The sequences reported in this paper have been deposited in the GenBank genes’’) were not in the photosynthetic gene set. database (accession nos. DQ831217–DQ831236). We compared proteins encoded in 15 complete cyanobacterial †To whom correspondence may be addressed: E-mail: [email protected], r-haselkorn@ genomes, including five genomes of Prochlorococcus spp., to uchicago.edu, or [email protected]. define the minimal set of genes common to all cyanobacteria and © 2006 by The National Academy of Sciences of the USA 13126–13131 ͉ PNAS ͉ August 29, 2006 ͉ vol. 103 ͉ no. 35 www.pnas.org͞cgi͞doi͞10.1073͞pnas.0605709103 Downloaded by guest on October 2, 2021 Table 1. Distribution of photosynthesis-related genes in genomes of phototrophic bacteria Gene products Cyanobacteria Plants Ctep Rpal Caur Hmob Photosystem I proteins Core RC1 subunit PsaA͞PsaB All YY——Y RC1 subunits PsaC, PsaD, PsaE, PsaF, PsaL, PsaM All Y———— RC1 subunits PsaI, PsaJ, PsaK Missing in Gvio Y———— Photosystem II proteins Core RC subunit PsbA͞PsbD All Y—YY— RC2 subunits PsbB, PsbC, PsbE, PsbF, PsbH, PsbI, PsbJ, All Y———— PsbK, PsbL, PsbM, PsbO, PsbP, PsbT, PsbW(Psb28), PsbX RC2 subunit PsbN Missing in one Pmar Y———— RC2 subunits PsbZ (Ycf9) and Psb27 Missing in Gvio Y———— RC2 subunit PsbY (Ycf32) Missing in Gvio and one Pmar Y — — — — Extrinsic protein PsbU, cytochrome c550 PsbV Missing in four Pmar Y* —* — — — Extrinsic protein PsbO Missing in Gvio and all Pmar Y — — — — Chlorophyll-binding proteins Light-harvesting complexes PcbABC͞IsiA Missing in four genomes ————— High-light-inducible proteins HLIP͞ELIP͞Scp All Y———— Chlorophyll biosynthesis enzymes ChlB, ChlL, ChlN, ChlD, ChlH, ChlI, ChlM, ChlG, ChlP All YYYYY Aerobic protoporphyrin IX cyclase AcsF (Ycf59) All Y—YY— Anaerobic protoporphyrin IX cyclase BchE Only in Synechocystis — YYY Y Cytochrome b6f complex subunits PetB, PetC All Y Y Y — Y PetA, PetD, PetG, PetM, PetN All Y — — — —† PetL Missing in Gvio Y————† Water-soluble electron carriers Plastocyanin PetE Missing in Telo Y — — Y — † Cytochrome c6 (PetJ, c553) Missing in one Pmar Y———— Calvin cycle enzymes RubisCO large subunit RbcL All Y Y‡ Y— Y‡ RubisCO small subunit RbcS All Y—Y—— Phosphoribulokinase, ribose-5-phosphate isomerase All Y—YY— Regulatory and uncharacterized chloroplast proteins Gun4, Tab2, Cp12, PM23, Ycf3, Ycf4, Ycf23, Ycf33, Ycf34, All Y———— Ycf35, Ycf37, Ycf51, Ycf58, Ycf60 Cp097, Ycf20, Ycf21, Ycf55, Ycf66 Missing in Gvio Y———— The presence or absence of genes coding for proteins of each type in the complete genomes of 15 cyanobacteria, plants, Chlorobium tepidum (Ctep), Rhodopseudomonas palustris (Rpal), Chloroflexus aurantiacus (Caur), and Heliobacillus mobilis (Hmob) was assessed as described in Materials and Methods. RubisCO, ribulose-1, 5-bisphosphate carboxylase͞oxygenase; Gvio, Gleobacter violaceus; Pmar, Prochlorococcus marinus; Telo, Thermosynechococcus elongatus; Y, gene present; —, gene absent. A detailed version of this table appears as Table 5, which is published as supporting information on the PNAS web site. *PsbU and PsbV are found in red algae, haptophytes, diatoms, and dinoflagellates, but so