Reconstruction of the Ancestral Metazoan Genome Reveals an Increase in Genomic Novelty
Paps, J., & Holland, P. W. H. (2018). Reconstruction of the ancestral metazoan genome reveals an increase in genomic novelty. Nature Communications, 9, [1730]. https://doi.org/10.1038/s41467-018-04136-5

Publisher's PDF (version of record). License: CC BY. This is the final published version of the article. It first appeared online via Nature Research at https://www.nature.com/articles/s41467-018-04136-5#article-info.

Jordi Paps 1,2 & Peter W.H. Holland 2

Understanding the emergence of the Animal Kingdom is one of the major challenges of modern evolutionary biology. Many genomic changes took place along the evolutionary lineage that gave rise to the Metazoa. Recent research has revealed the role that co-option of old genes played during this transition, but the contribution of genomic novelty has not been fully assessed. Here, using extensive genome comparisons between metazoans and multiple outgroups, we infer the minimal protein-coding genome of the first animal, in addition to other eukaryotic ancestors, and estimate the proportion of novelties in these ancient genomes.
Contrary to the prevailing view, this uncovers an unprecedented increase in the extent of genomic novelty during the origin of metazoans, and identifies 25 groups of metazoan-specific genes that are essential across the Animal Kingdom. We argue that internal genomic changes were as important as external factors in the emergence of animals.

1 School of Biological Sciences, University of Essex, Colchester, Essex CO4 3SQ, UK.
2 Department of Zoology, University of Oxford, Oxford OX1 3PS, UK.
Correspondence and requests for materials should be addressed to J.P. (email: [email protected])

NATURE COMMUNICATIONS | (2018) 9:1730 | DOI: 10.1038/s41467-018-04136-5 | www.nature.com/naturecommunications

Metazoa are the multicellular eukaryotic group with the largest number of described species, over 1.6 million [1]. They evolved within the eukaryotic supergroup Opisthokonta, most closely related to choanoflagellates, filastereans, and ichthyosporeans [2]. In addition to environmental and ecological triggers, biological functions encoded in the genome were crucial in this transition, including genes involved in differential gene regulation (e.g., several transcription factors, signalling pathways), cell adhesion (e.g., cadherins), cell type specification, cell cycle, and immunity [3]. Recent studies show that many genes typically associated with metazoan functions actually pre-date animals themselves [4], supporting functional co-option of ‘unicellular genes’ during the genesis of metazoans.

However, the role of genome novelty in animal origins has not been fully evaluated. We hypothesize that genomic novelty had a major impact in this transition, particularly involving biological functions which are hallmarks of animal multicellularity (gene regulation, signalling, cell adhesion, and cell cycle). Here we apply a comparative genomics approach using sophisticated methods, newly developed programs, and a comprehensive taxon sampling. The reconstruction of the ancestral genome of the last common ancestor of animals shows a set of biological functions similar to other eukaryote ancestors, while revealing an unexpected expansion of gene diversity. These analyses also highlight 25 groups of genes only found in animals that are highly retained in all their genomes, with essential functions linked to animal multicellularity.

Results

Inference of homology groups. To perform a genome-wide examination of novelty, we designed a bioinformatics pipeline to identify homologous groups of proteins within and between genomes, and determine their phylogenetic origin (Methods section and Supplementary Fig. 1). In contrast with other approaches (e.g., phylostratigraphy [5,6]), we use a large and representative taxon sampling and reciprocal sequence comparison [7] combined with Markov clustering [8] of complete proteomes to identify homology instead of one-way BLAST. We also define gene age based on occupancy in all taxa of a clade (see below) instead of using a single species as an anchor. While this is a robust methodology, it has some limitations seen in other BLAST-based analyses in that homology assignment might be affected by gene fusions, fissions, exon shuffling, lateral gene transfer, or presence of repetitive motifs [9]. We compared 62 genomes throughout Metazoa and their outgroups, including 44 genomes for 14 metazoan phyla (Fig. 1, Supplementary Fig. 2, Supplementary Data 1). These genomes were generated with different sequencing technologies, coverage, and gene prediction software. The completeness of these genomes was assessed using BUSCO [10] (Supplementary Data 1, Supplementary Fig. 3); 10 animal genomes showed percentages of missing genes above 15%, but five of them belong to animals that had suffered known genome reductions, while the others are within clades in which other genomes display very low values of missing genes in the BUSCO analyses. After performing all-vs-all BLAST, MCL displays similarities in two-dimensional space, within which the distance between proteins is proportional to the e-value of their BLAST comparison. MCL then applies graph theory and hidden Markov models to cluster sets of proteins based on their proximity in that space [8]. All 1,494,207 proteins were clustered in 268,440 homology groups (HG); all human genes are grouped within 9451 HG and fruit fly genes within 7618 HG (Supplementary Data 4).

[Figure 1: phylogenetic tree of the sampled groups, with number of genomes in parentheses: Deuterostomia (12), Ecdysozoa (12), Lophotrochozoa (12), Cnidaria (3), Placozoa (1), Ctenophora (2), Porifera (2), Choanoflagellata (2), Filasterea (1), Ichthyosporea (2), Holomycota (4), Apusozoa (1), Outgroups (8). HG counts are shown at the nodes Holozoa, Node 1, Node 2, Metazoa, Eumetazoa, Planulozoa, and Bilateria.]

Fig. 1 Reconstruction of ancestral genomes. Evolutionary relationships of the major groups included in this study [2]. Different categories of HG are indicated in each node, from top to bottom, Ancestral HG, Novel HG, Novel Core HG, and Lost HG. Values assume sponges as the sister group to other animals, and placozoans as sister group to Planulozoa (=Cnidaria + Bilateria); alternative phylogenetic hypotheses are explored in Supplementary Data 3-8. Organism outlines from phylopic.org and the authors.

[Figure 2: three panels. Panel a: % of novel HG in the genome for the nodes Holozoa, Node 1, Node 2, Metazoa, Eumetazoa, Planulozoa, and Bilateria. Panel b: % of Core HG that are Novel and % of Novel HG that are Core HG, for the same nodes. Panel c: hits per protein class (Receptor, Hydrolase, Transporter, Chaperone, Oxidoreductase, Structural protein, Enzyme modulator, Nucleic acid binding, Transcription factor, Signalling molecule, Extracellular matrix protein) for Holozoa, Node 1 (Filasterea + Node 2), Node 2 (Choanoflagellata + Metazoa), Metazoa, Eumetazoa, Planulozoa, and Bilateria.]

Fig. 2 Novelty in ancestral genomes. a Proportion of Novel HG in the Ancestral HG for different holozoan ancestors. b Percentage of Core HG that are novel, and percentage of highly preserved genes among the Novel HG across different LCA. c Number of Protein Class GO hits for the fruit fly representatives of the Novel HG for the various phylogenetic nodes.

The inferred HG provide a contrast between the hierarchical nature of gene classifications used in the literature (e.g., gene families, classes, and superfamilies) and their evolutionary dynamics (clustering based on sequence divergence), drawing a parallel with organisms’ classical taxonomy and their phylogenetic relationships. A given HG can either include genes traditionally defined as gene families (e.g., Iroquois gene family, part of the TALE class, which is part of the homeobox genes superfamily [11]), as gene classes (e.g., POU class of the homeobox superfamily, containing multiple gene families), or as superfamilies (e.g., Wnt ligands, containing multiple gene classes that comprise diverse gene families). The recovery of traditional gene families/classes/superfamilies demonstrates the reliability of the clustering approach taken here. However, the evolutionary history of some genes is evident from their clustering; for example, even though the Iroquois gene family is classified within

Table 1 Novel Core genes in the Animal Kingdom
Transcription factors    Fruit fly genes examples    Human gene examples