What Is an Archaeon and Are the Archaea Really Unique?
Total Page:16
File Type:pdf, Size:1020Kb
1 What IS AN ARCHAEON AND ARE THE 2 ArCHAEA REALLY unique? 1* 3 Ajith Harish Department OF Cell AND Molecular Biology, StructurAL AND Molecular Biology PrOGRam, *For CORRespondence: 4 1 [email protected] Uppsala University, Uppsala, Sweden 5 6 7 AbstrACT The RECOGNITION OF THE GROUP ArCHAEA 40 YEARS AGO STIMULATED RESEARCH IN MICROBIAL 8 EVOLUTION AND MOLECULAR SYSTEMATICS THAT PROMPTED A NEW CLASSIfiCATORY SCHEME TO ORGANIZE 9 BIODIVERSITY. Advances IN DNA SEQUENCING TECHNIQUES HAVE SINCE SIGNIfiCANTLY IMPROVED THE 10 GENOMIC REPRESENTATION OF THE ARCHAEAL BIODIVERSITY. IN addition, ADVANCES IN PHYLOGENETIC 11 MODELING THAT FACILITATE LARge-scale PHYLOGENOMICS HAVE RESOLVED MANY RECALCITRANT BRANCHES OF THE 12 TREE OF Life. Despite THE TECHNICAL ADVANCES AND AN EXPANDED TAXONOMIC REPResentation, TWO 13 IMPORTANT ASPECTS OF THE ORIGINS AND EVOLUTION OF THE ArCHAEA REMAIN CONTROversial, EVEN AS WE 14 CELEBRATE THE 40th ANNIVERSARY OF THE MONUMENTAL DISCOVERY. The ISSUES CONCERN (i) THE UNIQUENESS 15 (monophyly) OF THE Archaea, AND (ii) THE EVOLUTIONARY RELATIONSHIPS OF THE ArCHAEA TO THE Bacteria 16 AND THE Eukarya; BOTH OF THESE ARE RELEVANT TO THE DEEP STRUCTURE OF THE TREE OF Life. The 17 UNCERTAINTY IS PRIMARILY DUE TO A SCARCITY OF INFORMATION IN STANDARD DATASETS—THE CORe-genes 18 DATASETS—TO RELIABLY RESOLVE THE CONflicts. These CONflICTS CAN BE RESOLVED EffiCIENTLY BY EMPLOYING 19 COMPLEX GENOMIC FEATURES AND genome-scale EVOLUTION MODELS—A DISTINCT CLASS OF PHYLOGENOMIC 20 CHARACTERS AND EVOLUTION MODELS—THAT CAN BE EMPLOYED ROUTINELY TO MAXIMIZE THE USE OF GENOME 21 SEQUENCES AS WELL AS TO MINIMIZE UNCERTAINTIES IN TESTS OF EVOLUTIONARY hypotheses. 22 23 INTRODUCTION 24 The RECOGNITION OF THE ArCHAEA AS THE so-called “THIRD FORM OF LIFE” WAS MADE POSSIBLE IN PART BY A 25 NEW TECHNOLOGY FOR SEQUENCE analysis, OLIGONUCLEOTIDE cataloging, DEVELOPED BY FrEDRIK Sanger AND 26 COLLEAGUES IN THE 1960s (1, 2). Carl WOESE’S INSIGHT OF USING THIS method, AND THE CHOICE OF THE SMALL 27 SUBUNIT RIBOSOMAL RNA (16S/SSU rRNA) AS A PHYLOGENETIC MARKer, NOT ONLY PUT MICROORGANISMS 28 ON A PHYLOGENETIC MAP (or TRee), BUT ALSO REVOLUTIONIZED THE fiELD OF MOLECULAR SYSTEMATICS THAT 29 ZukERKANDL AND Pauling HAS PREVIOUSLY ALLUDED TO (3). ComparATIVE ANALYSIS OF ORganism-specifiC 30 (oligonucleotide) SEquence-signaturES IN SSU rRNA LED TO THE RECOGNITION OF A DISTINCT GROUP OF 31 MICROORGANISMS (2, 4). INITIALLY REFERRED TO AS Archaeabacteria, THESE UNUSUAL ORGANISMS HAD 32 ‘OLIGONUCLEOTIDE SIGNATURES’ DISTINCT FROM OTHER BACTERIA (Eubacteria), AND THEY WERE LATER FOUND TO 33 BE DIffERENT FROM THOSE OF Eukarya (eukaryotes) AS well. Many OTHER FEATURes, INCLUDING molecular, 34 BIOCHEMICAL AS WELL AS ecological, CORROBORATED THE UNIQUENESS OF THE Archaea. Thus THE ARCHAEAL 35 CONCEPT WAS ESTABLISHED (2). 36 The STUDY OF MICROBIAL DIVERSITY AND EVOLUTION HAS COME A LONG WAY SINCE then: SEQUENCING 37 MICROBIAL genomes, AND DIRECTLY FROM THE ENVIRONMENT WITHOUT THE NEED FOR CULTURING IS NOW 38 ROUTINE (5, 6). This WEALTH OF SEQUENCE INFORMATION IS EXCITING NOT ONLY FOR CATALOGING AND ORGANIZING 39 BIODIVERSITY, BUT ALSO TO UNDERSTAND THE ECOLOGY AND EVOLUTION OF MICROORGANISMS – ARCHAEA AND 40 BACTERIA AS WELL AS EUKARYOTES – THAT MAKE UP A VAST MAJORITY OF THE PLANETARY BIODIVERSITY. Since 41 LARge-scale EXPLORATION BY THE MEANS OF ENVIRONMENTAL GENOME SEQUENCING BECAME POSSIBLE ALMOST 42 A DECADE ago, THERE HAS ALSO BEEN A PALPABLE EXCITEMENT AND ANTICIPATION OF THE DISCOVERY OF A 1 OF 21 43 FOURTH FORM OF LIFE OR A “FOURTH DOMAIN” OF LIFE (7). The REFERENCE HERE IS TO A FOURTH FORM OF CELLULAR 44 life, BUT NOT TO viruses, WHICH SOME HAVE ALREADY PROPOSED TO BE THE FOURTH DOMAIN OF THE TREE OF 45 Life (ToL) (7, 8). IF A FOURTH FORM OF LIFE WERE TO BE found, WHAT WOULD THE DISTINGUISHING FEATURES be, 46 AND HOW COULD IT BE MEASURed, DEfiNED AND CLASSIfied? 47 Rather THAN THE DISCOVERY OF A FOURTH domain, AND CONTRARY TO THE Expectations, HOWEver, CURRENT 48 DISCUSSION IS CENTERED AROUND THE RETURN TO A DICHOTOMOUS CLASSIfiCATION OF LIFE (9-11), DESPITE THE 49 RAPID EXPANSION OF SEQUENCED BIODIVERSITY – HUNDREDS OF NOVEL PHYLA DESCRIPTIONS (12, 13). The 50 PROPOSED DICHOTOMOUS CLASSIfiCATIONS schemes, UNFORTUNATELY, ARE IN SHARP CONTRAST TO EACH other, 51 DEPENDING on: (i) WHETHER THE ArCHAEA CONSTITUTE A MONOPHYLETIC GROUP—A UNIQUE LINE OF DESCENT 52 THAT IS DISTINCT FROM THOSE OF THE Bacteria AS WELL AS THE Eukarya; AND (ii) WHETHER THE ArCHAEA FORM 53 A SISTER CLADE TO THE Eukarya OR TO THE Bacteria. Both THE ISSUES STEM FROM DIffiCULTIES INVOLVED IN 54 RESOLVING THE DEEP BRANCHES OF THE ToL (10, 11, 14). 55 The TWIN issues, fiRST RECOGNIZED IN THE 80s BASED ON single-gene (SSU rRNA) analyses, CONTINUE 56 TO BE THE SUBJECTS OF A long-standing debate, WHICH REMAINS UNRESOLVED DESPITE LARge-scale ANALYSES 57 OF multi-gene DATASETS (5, 15-19). IN ADDITION TO THE CHOICE OF GENES TO BE analyzed, THE CHOICE OF 58 THE UNDERLYING CHARACTER EVOLUTION MODEL IS AT THE CORE OF CONTRADICTORY RESULTS THAT EITHER SUPPORTS 59 THE Three-domains TREE (5, 19) OR THE Eocyte TREE (17, 20). IN MANY cases, ADDING MORE data, EITHER 60 AS ENHANCED TAXON (species) SAMPLING OR ENHANCED CHARACTER (gene) sampling, OR both, CAN RESOLVE 61 AMBIGUITIES (21, 22). HoWEver, AS THE TAXONOMIC DIVERSITY AND EVOLUTIONARY DISTANCE INCREASES 62 AMONG THE TAXA studied, THE NUMBER OF CONSERVED MARKer-genes THAT CAN BE USED FOR PHYLOGENOMIC 63 ANALYSES DECReases. AccorDINGLY, RESOLVING THE PHYLOGENETIC RELATIONSHIPS OF THE Archaea, Bacteria 64 AND Eukarya IS RESTRICTED TO A SMALL SET OF GENES—50 AT MOST—IN SPITE OF THE LARGE INCREASE IN THE 65 NUMBERS OF GENOMES SEQUENCED AND THE ASSOCIATED DEVELOPMENT OF SOPHISTICATED PHYLOGENOMIC 66 methods. 67 Based ON A CLOSER SCRUTINY OF THE RECENT PHYLOGENOMIC DATASETS EMPLOYED IN THE ONGOING 68 debate, I WILL SHOW HERE THAT ONE OF THE REASONS FOR THIS PERSISTENT AMBIGUITY IS THAT THE ‘INFORMATION’ 69 NECESSARY TO RESOLVE THESE CONflICTS IS PRACTICALLY NONEXISTENT IN THE STANDARD MARKer-genes (i.e. CORe- 70 genes) DATASETS EMPLOYED ROUTINELY FOR phylogenomics. Further, I DISCUSS ANALYTICAL APPROACHES 71 THAT MAXIMIZE THE USE OF THE INFORMATION THAT IS IN GENOME SEQUENCE DATA AND SIMULTANEOUSLY 72 MINIMIZE PHYLOGENETIC uncertainties. IN addition, I DISCUSS SIMPLE BUT IMPORtant, YET undervalued, 73 ASPECTS OF PHYLOGENETIC HYPOTHESIS testing, WHICH TOGETHER WITH THE NEW APPROACHES HOLD PROMISE 74 TO RESOLVE THESE long-standing ISSUES EffECTIVELY. 75 Results 76 INFORMATION IN CORE GENES IS INADEQUATE TO RESOLVE THE ARCHAEAL RADIATION 77 Data-display NETWORKS (DDNs) ARE USEFUL TO EXAMINE AND VISUALIZE CHARACTER CONflICTS IN PHYLOGENETIC 78 datasets, ESPECIALLY IN THE ABSENCE OF PRIOR KNOWLEDGE ABOUT THE SOURCE OF SUCH CONflicts, IDEALLY 79 BEFORE DOWNSTREAM PROCESSING OF THE DATA FOR PHYLOGENETIC INFERENCE (23, 24). While CONGRUENT 80 DATA WILL BE DISPLAYED AS A TREE IN A DDN, INCONGRUENCES ARE DISPLAYED AS RETICULATIONS IN THE TRee. 81 Fig. 1A SHOWS A neighbor-net ANALYSIS OF THE SSU rRNA ALIGNMENT USED TO RESOLVE THE PHYLOGENETIC 82 POSITION OF THE RECENTLY DISCOVERED AsgarD ARCHAEA (20). The DDN IS BASED ON CHARACTER DISTANCES 83 CALCULATED AS THE OBSERVED GENETIC DISTANCE (p-distance) OF 1,462 CHARacters, AND SHOWS THE TOTAL 84 AMOUNT OF CONflICT IN THE DATASET THAT IS INCONGRUENT WITH CHARACTER BIPARTITIONS (splits). The EDGE 85 (branch) LENGTHS IN THE DDN CORRESPOND TO THE SUPPORT FOR THE RESPECTIVE splits. AccorDINGLY, TWO 86 well-supported SETS OF SPLITS FOR THE Bacteria AND THE Eukarya ARE observed. The Archaea, HOWEver, 87 DOES NOT FORM A distinct, well-resolved/well-supported GRoup, AND IS UNLIKELY TO CORRESPOND TO A 88 MONOPHYLETIC GROUP IN A PHYLOGENETIC TRee. 2 OF 21 FigurE 1. Data-display NETWORKS DEPICTING THE CHARACTER CONflICTS IN DIffERENT DATASETS THAT EMPLOY DIffERENT CHARACTER types. (A) SSU rRNA ALIGNMENT OF 1,462 CHARacters. Concatenated PROTEIN SEQUENCE ALIGNMENT OF (B) 29 CORe-genes, 8,563 CHARacters; (C) 48 CORe-genes, 9,868 CHARACTERS AND (D) ALSO 48 CORe-genes, 9,868 SR4 RECODED CHARACTERS (data SIMPLIfiED FROM 20 TO 4 CHARacter-states). Each NETWORK IS CONSTRUCTED FROM A neighbor-net ANALYSIS BASED ON THE OBSERVED GENETIC DISTANCE (p-distance) AND DISPLAYED AS AN EQUAL ANGLE SPLIT network. Edge (branch) LENGTHS CORRESPOND TO THE SUPPORT FOR CHARACTER BIPARTITIONS (splits), AND RETICULATIONS IN THE TREE CORRESPOND TO CHARACTER CONflicts. Datasets IN (A), (C) AND (D) ARE FROM Ref. 20, AND IN (B) IS FROM Ref. 17. 89 LikEwise, THE CONCATENATED PROTEIN SEQUENCE ALIGNMENT OF THE so-called ‘GENEALOGY DEfiNING CORE 90 OF GENES’ (25) – A SET OF CONSERVED single-copY GENES – ALSO DOES NOT SUPPORT A UNIQUE ARCHAEL lineage. 91 Fig. 1B IS A DDN DERIVED FROM A neighbor-net ANALYSIS OF 8,563 CHARACTERS IN 29 CONCATENATED CORe- 92 GENES (17), WHILE Fig. 1C,D IS BASED ON 9,868 CHARACTERS IN 44 CONCATENATED CORe-genes (also FROM 93 (20)). Even TAKEN together, NONE OF THE STANDARD MARKER GENE DATASETS ARE LIKELY TO SUPPORT THE 94 MONOPHYLY OF THE ArCHAEA — A KEY ASSERTION OF THE THRee-domains HYPOTHESIS (26). Simply put, 95 THERE IS NOT ENOUGH INFORMATION IN THE CORe-gene DATASETS TO RESOLVE THE ARCHAEAL Radiation, OR TO 3 OF 21 96 DETERMINE WHETHER THE ArCHAEA ARE REALLY UNIQUE COMPARED TO THE Bacteria AND Eukarya. HoWEver, 97 OTHER COMPLEX FEATURES — INCLUDING molecular, BIOCHEMICAL AND PHENOTYPIC CHARacters, AS WELL 98 AS ECOLOGICAL ADAPTATIONS — SUPPORT THE UNIQUENESS OF THE Archaea. These IDIOSYNCRATIC ARCHAEAL 99 CHARACTERS INCLUDE THE SUBUNIT COMPOSITION OF SUPRAMOLECULAR COMPLEXES LIKE THE ribosome, DNA- 100 AND RNA-polymerases, BIOCHEMICAL COMPOSITION OF CELL MEMBRanes, CELL walls, AND PHYSIOLOGICAL 101 ADAPTATIONS TO ENERgy-starved ENVIRonments, AMONG OTHER THINGS (27, 28). 102 CompleX PHYLOGENOMIC CHARACTERS MINIMIZE UNCERTAINTIES REGARDING THE unique- 103 NESS OF THE ArCHAEA 104 A NUCLEOTIDE IS THE SMALLEST POSSIBLE locus, AND AN AMINO ACID IS A PROXY FOR A LOCUS OF A NUCLEOTIDE 105 triplet.