Hidden Evolutionary Complexity of Nucleo-Cytoplasmic Large DNA Viruses of Eukaryotes Natalya Yutin and Eugene V Koonin*
Total Page:16
File Type:pdf, Size:1020Kb
Yutin and Koonin Virology Journal 2012, 9:161 http://www.virologyj.com/content/9/1/161 RESEARCH Open Access Hidden evolutionary complexity of Nucleo-Cytoplasmic Large DNA viruses of eukaryotes Natalya Yutin and Eugene V Koonin* Abstract Background: The Nucleo-Cytoplasmic Large DNA Viruses (NCLDV) constitute an apparently monophyletic group that consists of at least 6 families of viruses infecting a broad variety of eukaryotic hosts. A comprehensive genome comparison and maximum-likelihood reconstruction of the NCLDV evolution revealed a set of approximately 50 conserved, core genes that could be mapped to the genome of the common ancestor of this class of eukaryotic viruses. Results: We performed a detailed phylogenetic analysis of these core NCLDV genes and applied the constrained tree approach to show that the majority of the core genes are unlikely to be monophyletic. Several of the core genes have been independently acquired from different sources by different NCLDV lineages whereas for the majority of these genes displacement by homologs from cellular organisms in one or more groups of the NCLDV was demonstrated. Conclusions: A detailed study of the evolution of the genomic core of the NCLDV reveals substantial complexity and diversity of evolutionary scenarios that was largely unsuspected previously. The phylogenetic coherence between the core genes is sufficient to validate the hypothesis on the evolution of all NCLDV from a common ancestral virus although the set of ancestral genes might be smaller than previously inferred from patterns of gene presence-absence. Background of genes encoding proteins responsible for many func- Viruses are ubiquitous, obligate, intracellular parasites of tions essential for virus reproduction. all cellular life forms that rely on the host cell translation One of the largest viral divisions that seem to be system, metabolism and, in many cases, the replication monophyletic includes 6 recognized families and a 7th and transcription systems, for their reproduction. There candidate family of viruses with large DNA genomes is no evidence that all viruses have a monophyletic ori- that infect diverse eukaryotes and are collectively known gin, at least not under the traditional concept of mono- as Nucleo-Cytoplasmic Large DNA Viruses (NCLDV) phyly. Indeed, not a single gene is conserved in the [3-6]. The formally recognized NCLDV families are Pox- genomes of all known viruses although a small group of viridae, Asfarviridae, Iridoviridae, Ascoviridae, Phycod- “viral hallmark genes” encoding some of the key proteins naviridae, and Mimiviridae; in addition, the recently involved in genome replication and virion structure for- discovered Marseillevirus and the related Lausannevirus mation are shared by large, diverse subsets of viruses could not be assigned to any of the 6 families, and are [1,2]. However, several large groups of viruses infecting likely to become founding members of a new family diverse hosts do appear to share common ancestry in [7,8]. Hereinafter we speak of 7 NCLDV families for the the strict sense, that is, to have evolved from the same sake of simplicity. ancestral virus, as indicated by the conservation of sets By far the most thoroughly studied group of the NCLDV are the Poxviridae, the family of animal viruses * Correspondence: [email protected] that include a major human pathogen, the smallpox National Center for Biotechnology Information, National Library of Medicine, virus, important animal pathogens, such as rabbit National Institutes of Health, Bethesda, MD 20894, USA © 2012 Yutin and Koonin; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Yutin and Koonin Virology Journal 2012, 9:161 Page 2 of 18 http://www.virologyj.com/content/9/1/161 myxoma virus, as well as vaccinia virus (VACV), one of between the viral families. Moreover, for some of the the best characterized models of molecular biology [9]. NCLDV genes with important functions in virus Another family of the NCLDV that has recently become reproduction, indications of complex evolutionary his- a major focus of attention is the Mimiviridae that tories have been obtained. The showcase for such evolu- includes giant viruses infecting amoeba and probably tionary complexity is the viral DNA ligase which is algae [10-13]. The genome of the prototype virus of this represented by two distantly related forms across the family, Acanthamoeba polyphaga Mimivirus [14], NCLDV families. A maximum likelihood reconstruction slightly exceeds one megabase (Mb), and other related based on the presence-absence of conserved genes in viruses possess even larger genomes [15,16], so the the viral genomes has implied that one of the two forms, Mimiviridae are undisputed genome size record holders the ATP-dependent DNA ligase, was the ancestral form in the virosphere. Indeed, in terms of genome size and that was present in the genome of the prototype complexity, the NCLDV eclipse numerous parasitic bac- NCLDV but was replaced by the distantly related teria, and approach the simplest free-living prokaryotes. NAD-dependent ligase in several viral lineages. How- The NCLDV infect animals and diverse unicellular ever, when the reconstruction was supplemented by eukaryotes, and either replicate exclusively in the cyto- phylogenetic analysis of the two forms of DNA ligase, plasm of the host cells, or encompass both cytoplasmic the opposite conclusion has been reached, namely that and nuclear stages in their life cycle. Most of the the NAD –dependent ligase was the ancestral form in NCLDV do not strongly depend on the host replication NCLDV that was displaced by the ATP-dependent lig- or transcription systems for completing their replication ase on several independent occasions [18]. This change [9,17]. The autonomous life style of the NCLDV is sup- in perspective occurred because phylogenetic analysis ported by a set of conserved proteins that are encoded indicated that the ATP-dependent ligases from different in the viral genomes and mediate most of the processes lineages of the NCLDV clustered with distinct groups of essential for viral reproduction. These essential, con- eukaryotic homologs whereas the NAD–dependent served proteins include DNA polymerases, helicases, ligases of the NCLDV appeared to be monophyletic. In and primases responsible for DNA replication, RNA the same vein, complex phylogenies suggestive of mul- polymerase subunits and transcription factors that func- tiple horizontal gene transfer have been observed for tion in transcription initiation and elongation, Holliday several core NCLDV genes such as thymidine kinase or junction resolvases and topoisomerases involved in gen- the two subunits of ribonucleotide reductase [19]. ome DNA processing and maturation, ATPase pumps Taken together, these findings imply that some of the mediating DNA packaging, molecular chaperones apparently conserved genes of the NCLDV might actu- involved in capsid assembly and capsid proteins them- ally have complex histories which could include inde- selves [3-5]. Although several viral hallmark genes are pendent (convergent) acquisition of these genes from shared by NCLDV and other large DNA viruses, such as different cellular organisms as well as displacement of herpesviruses, baculoviruses and some bacteriophages ancestral viral gene by homologs of cellular provenance. [2], the conservation of the large set of core genes clearly This line of reasoning prompted us to perform a com- demarcates the NCLDV as a distinct, most likely mono- prehensive phylogenetic analysis of the set of the puta- phyletic class of viruses [4,6]. More specifically, recon- tive ancestral NCLDV genes. Here we present the results structions of the ancestral NCLDV genome composition of this analysis which suggest that, although the exist- using maximum parsimony and maximum likelihood ence of a common ancestor of the NCLDV is beyond methods have delineated a set of approximately 50 genes reasonable doubt, most of the conserved NCLDV genes that are inferred to have been responsible for the key indeed had complex evolutionary histories. functions in the reproduction of the last common ances- tor of the NCLDV [5]. Results The core, presumably ancestral set of the NCLDV Approach and rationale genes was delineated using sequence similarity-based Maximum likelihood reconstruction of the gene reper- methods that have been previously employed for identi- toire of the putative ancestral NCLDV has revealed a fication of clusters of orthologous genes in diverse cellu- core consisting of approximately 50 viral genes (Table 1). lar life forms. The comparative genomic analysis Most of these genes are present in various subsets of the underlying these reconstructions was deliberately limited NCLDV genomes rather than all genomes (Additional to the NCLDV genomes to simplify the analysis and to file 1) but high likelihoods of ancestral provenance have facilitate detection of distant relationships between viral been estimated for each of them, with the absences ac- proteins. Indeed, some of the core NCLDV proteins, cordingly attributed to lineage-specific gene loss. We such as for example the packaging ATPases and the di- constructed multiple alignments and position-specific sulfide chaperones, show only weak sequence similarity