Insights Into Bilaterian Evolution from Three Spiralian Genomes
LETTER OPEN doi:10.1038/nature11696

Oleg Simakov1,2, Ferdinand Marletaz1†, Sung-Jin Cho2, Eric Edsinger-Gonzales2, Paul Havlak3, Uffe Hellsten4, Dian-Han Kuo2†, Tomas Larsson1, Jie Lv3, Detlev Arendt1, Robert Savage5, Kazutoyo Osoegawa6, Pieter de Jong6, Jane Grimwood4,7, Jarrod A. Chapman4, Harris Shapiro4, Andrea Aerts4, Robert P. Otillar4, Astrid Y. Terry4, Jeffrey L. Boore4†, Igor V. Grigoriev4, David R. Lindberg8, Elaine C. Seaver9†, David A. Weisblat2, Nicholas H. Putnam3,10 & Daniel S. Rokhsar2,4,11

Current genomic perspectives on animal diversity neglect two prominent phyla, the molluscs and annelids, that together account for nearly one-third of known marine species and are important both ecologically and as experimental systems in classical embryology1–3. Here we describe the draft genomes of the owl limpet (Lottia gigantea), a marine polychaete (Capitella teleta) and a freshwater leech (Helobdella robusta), and compare them with other animal genomes to investigate the origin and diversification of bilaterians from a genomic perspective. We find that the genome organization, gene structure and functional content of these species are more similar to those of some invertebrate deuterostome genomes (for example, amphioxus and sea urchin) than those of other protostomes that have been sequenced to date (flies, nematodes and flatworms). The conservation of these genomic features enables us to expand the inventory of genes present in the last common bilaterian ancestor, establish the tripartite diversification of bilaterians using multiple genomic characteristics and identify ancient conserved long- and short-range genetic linkages across metazoans. Superimposed on this broadly conserved pan-bilaterian background we find examples of lineage-specific genome evolution, including varying rates of rearrangement, intron gain and loss, expansions and contractions of gene families, and the evolution of clade-specific genes that produce the unique content of each genome.

Molluscs, annelids and numerous smaller phyla typically share stereotyped spiral cleavage patterns, cell-fate assignments and characteristic ciliated trochophore larvae, features that originated in the Precambrian era3–5. These spiralian phyla are included in the larger lophotrochozoan clade6 that is a sister group to the ecdysozoans (arthropods, nematodes and other related phyla) but whose internal branching remains controversial. However, so far the only deeply sequenced lophotrochozoan genomes are those of platyhelminth flatworms (two parasitic schistosomes7,8 and a free-living planarian9), whose comparatively rapid rates of genome evolution do not reflect a general condition of lophotrochozoans (see below). In this study, we explore spiralian diversity at the genomic level by comparative analysis of one mollusc and two annelid genomes (Supplementary Note 1).

We assembled the limpet, polychaete and leech genomes from approximately eight-fold random whole-genome shotgun coverage with Sanger dideoxy sequencing reads (Supplementary Note 2). No genetic or physical maps were available for these systems, so we reconstructed each genome as scaffolds (gap-containing sequences). The three genomes reported here each encode an estimated 23,000 to 33,000 protein-coding genes (Table 1, Supplementary Table 2.2.2 and Supplementary Note 2.2. The repetitive landscape of these genomes is discussed in Supplementary Note 3.2).

Comparing the new genomes with other metazoan sequences, we characterized 8,756 modern bilaterian gene families as likely to have arisen from single progenitor genes in the last common bilaterian ancestor (Supplementary Note 3.4). As gene loss is common and highly diverged orthologues can be difficult to detect, this is a conservative lower bound on the number of genes encoded by the last common bilaterian ancestor. Of the 8,756 gene families, 763 were newly identified as being of bilaterian ancestry based on the new spiralian genomes (Supplementary Note 3.4). These newly identified bilaterian families belong to various functional categories (Supplementary Table 3.4.1), the most prominent being members of the G-protein-coupled receptor superfamily and epithelial sodium channels (see below) as well as various metabolic enzymes. Through subsequent gene duplication, the 8,756 ancestral bilaterian families conservatively account for 47 to 85% of genes in other bilaterian species (70% of human genes; Supplementary Note 3.4). Most of the remaining genes in extant bilaterian genomes share at least one domain with the bilaterian gene families, or have a significant BLAST (Basic Local Alignment Search Tool) hit when compared against sequences from bilaterian gene families, suggesting that they have arisen through descent with modification (Supplementary Note 3.5).

Exon–intron structures are highly conserved between spiralians and other animals; thus we infer that cis-splicing of intron-rich genes was the ancestral state of metazoans, bilaterians and protostomes (Supplementary Note 5.2). In most cases, exon boundaries in the newly sequenced spiralians are precisely conserved between orthologous genes in sequenced deuterostomes (vertebrates, sea urchin and amphioxus) and non-bilaterians (Trichoplax and starlet sea anemone). For example, 75% of human introns are present in one or more of the spiralians, whereas only 14% of the same introns are found in Drosophila10,11. However, intron gain or loss rates vary markedly among the three spiralians. In particular, H. robusta also has substantially more novel introns than do the other two sequenced spiralians (Supplementary Notes 5.2 and 5.3, and Supplementary Fig. 5.2.1), the first of several indicators of a notably dynamic genome in this lineage.

Collectively and individually, the spiralian genomes reported here retain most of the inferred ancestral bilaterian gene families (8,203 out of 8,756, corresponding to a 94% retention rate, compared to 7,553 or 86% retention rate in human). In contrast, the collective retention rate of only 65% for sequenced flatworms (53% for schistosomes and 60% for Schmidtea) reflects the absence (and presumed

1 European Molecular Biology Laboratory, Meyerhofstraße 1, 69117 Heidelberg, Germany.
2 Department of Molecular and Cell Biology, University of California, Berkeley, California 94720, USA.
3 Department of Ecology & Evolutionary Biology, Rice University, PO Box 1892, Houston, Texas 77251-1892, USA.
4 DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, California 94598, USA.
5 Department of Biology, Williams College, Thompson Biology Laboratory, 59 Lab Campus Drive, Williamstown, Massachusetts 01267, USA.
6 Children’s Hospital Oakland Research Institute, 5700 Martin Luther King Jr. Way, Oakland, California 94609, USA.
7 Hudson Alpha Institute for Biotechnology, 601 Genome Way, Huntsville, Alabama 35806-2908, USA.
8 Department of Integrative Biology, University of California, Berkeley, California 94720, USA.
9 Kewalo Marine Laboratory, University of Hawaii at Manoa, 41 Ahui Street, Honolulu, Hawaii 96813, USA.
10 Department of Biochemistry and Cell Biology, Rice University, Houston, Texas 77251, USA.
11 Okinawa Institute of Science and Technology, 1919-1 Tancha, Onna-son, Okinawa 904-0495, Japan.
†Present addresses: Department of Zoology, University of Oxford, The Tinbergen Building, South Parks Road, Oxford OX1 3PS, UK (F.M.); Institute of Zoology, National Taiwan University, 1 Roosevelt Road, Taipei 10617, Taiwan (D.-H.K.); Genome Project Solutions, 1024 Promenade Street, Hercules, California 94547, USA (J.L.B.); The Whitney Laboratory for Marine Bioscience, 9505 Ocean Shore Boulevard, St. Augustine, Florida 32080-8610, USA (E.C.S.).

526 | NATURE | VOL 493 | 24 JANUARY 2013 ©2013 Macmillan Publishers Limited. All rights reserved

loss) of more than 3,018 ancestral bilaterian gene families in these flatworms. Similar losses are observed for introns (Supplementary Note 5.3), as well as synteny (see below), which indicate a higher rate of genomic turnover in platyhelminths than in the mollusc and annelid genomes reported here.

in development (for example, homeobox genes (see below, and Supplementary Notes 4.5 and 4.6)). Both mollusc and annelid genomes are also enriched in specific metabolic enzymes and pathways of unknown relevance (for example, galactoside 2-α-L-fucosyltransferase; see Supplementary Notes 4.2 and 4.3).

Table 1 | Genome sequencing and annotation summary

| Species | Size of genome assembly (Mbp) | Scaffold N50 (Mbp) | Repetitive content (%) | GC (%) | Predicted number of genes | Number of genes in orthologous clusters with other species | Number of genes in ancestral bilaterian gene families | Mean number of exons per gene (with ≥2 orthologues) | Mean exon length (bp) | Mean intron length (bp) |
|---|---|---|---|---|---|---|---|---|---|---|
| Lottia gigantea | 348 | 1.87 | 21 | 33 | 23,800 | 16,183 | 10,681 | 8 | 213 | 787 |
| Capitella teleta | 324 | 0.19 | 31 | 40 | 32,389 | 20,537 | 11,911 | 7 | 221 | 291 |
| Helobdella robusta | 228 | 3.06 | 33 | 33 | 23,400 | 13,820 | 8,707 | 8 | 203 | 526 |

GC, fraction of guanine plus cytosine nucleobases; Scaffold N50, the length such that half of the assembled sequence is in scaffolds longer than this length; Mbp, megabase pairs.
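The legend of Table 1 defines Scaffold N50 as the length such that half of the assembled sequence is in scaffolds longer than this length. A minimal Python sketch of that definition is given here for illustration. The function name `scaffold_n50` and the toy lengths are assumptions made for this example; they are not the authors' assembly pipeline or data, and the sketch follows the common convention of counting a scaffold of exactly the N50 length towards the half.

```python
def scaffold_n50(lengths):
    """Return the N50 of a collection of scaffold lengths (any unit, e.g. bp).

    Scaffolds are visited from longest to shortest; the N50 is the length of
    the scaffold at which the running total first reaches half of the
    assembled sequence.
    """
    if not lengths:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids rounding issues with total / 2.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly (not data from the paper): 300 units in total.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(scaffold_n50(toy_lengths))  # 80 + 70 = 150 reaches half of 300, so this prints 70
```

Read this way, the 3.06 Mbp value reported for Helobdella robusta means that half of its 228 Mbp assembly lies in scaffolds of at least about 3.06 Mbp.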
Against this background of conserved gene content and structure, we find several significantly (P < 0.05) expanded gene families in specific spiralian clades

In general, lineage-specific gene family expansions seem to be the norm in the evolutionary diversification of modern taxa from the bilaterian ancestor, whereas the more ancient