A Comparative Genomics Multitool for Scientific Discovery and Conservation
Total Page:16
File Type:pdf, Size:1020Kb
UC Santa Cruz UC Santa Cruz Previously Published Works Title A comparative genomics multitool for scientific discovery and conservation. Permalink https://escholarship.org/uc/item/3tg037ts Journal Nature, 587(7833) ISSN 0028-0836 Author Zoonomia Consortium Publication Date 2020-11-11 DOI 10.1038/s41586-020-2876-6 Peer reviewed eScholarship.org Powered by the California Digital Library University of California Analysis A comparative genomics multitool for scientific discovery and conservation https://doi.org/10.1038/s41586-020-2876-6 Zoonomia Consortium* Received: 17 April 2019 Accepted: 27 July 2020 The Zoonomia Project is investigating the genomics of shared and specialized traits Published online: 11 November 2020 in eutherian mammals. Here we provide genome assemblies for 131 species, of which all but 9 are previously uncharacterized, and describe a whole-genome Open access alignment of 240 species of considerable phylogenetic diversity, comprising Check for updates representatives from more than 80% of mammalian families. We fnd that regions of reduced genetic diversity are more abundant in species at a high risk of extinction, discern signals of evolutionary selection at high resolution and provide insights from individual reference genomes. By prioritizing phylogenetic diversity and making data available quickly and without restriction, the Zoonomia Project aims to support biological discovery, medical research and the conservation of biodiversity. The genomics revolution is enabling advances not only in medical We collaborated with 28 institutions to collect samples, nearly half research1, but also in basic biology2 and in the conservation of bio- (47%) of which were provided by The Frozen Zoo of San Diego Zoo Global diversity, where genomic tools have helped to apprehend poachers3 (Supplementary Table 1). Since 1975, The Frozen Zoo has stored renew- and to protect endangered populations4. However, we have only a able cell cultures for about 10,000 vertebrate animals that represent limited ability to predict which genomic variants lead to changes in over 1,100 taxa, including more than 200 species that are classified as organism-level phenotypes, such as increased disease risk—a task that, vulnerable, endangered, critically endangered or extinct by the Inter- in humans, is complicated by the sheer size of the genome (about three national Union for Conservation of Nature (IUCN)10. For 36 target spe- billion nucleotides)5. cies we were unable to acquire a DNA sample of sufficient quality, even Comparative genomics can address this challenge by identifying though our requirements were modest (Methods), which highlights a nucleotide positions that have remained unchanged across millions major impediment to expanding the phylogenetic diversity of genomics. of years of evolution6 (suggesting that changes at these positions will We used two complementary approaches to generate genome assem- negatively affect fitness), focusing the search for disease-causing blies (Extended Data Table 1). First, for 131 genomes we generated assem- variants. In 2011, the 29 Mammals Project7 identified 12-base-pair blies by performing a single lane of sequencing (2× 250-bp reads) on (bp) regions of evolutionary constraint that in total comprise 4.2% PCR-free libraries and assembling with DISCOVAR de novo11 (referred of the genome, by measuring sequence conservation in humans plus to here as ‘DISCOVAR assemblies’). This method does not require intact 28 other mammals. These regions proved to be more enriched for cells and uses less than two micrograms of medium-quality DNA (most the heritability of complex diseases than any other functional mark, fragments are over 5 kilobases (kb) in size), which allowed us to include including coding status8. By expanding the number of species and species that are difficult to access (Extended Data Figs. 1, 2) while achiev- making an alignment that is independent of any single reference ing ‘contiguous sequences constructed from overlapping short reads’ genome, the Zoonomia Project was designed to detect evolutionary (contig) lengths comparable to those of existing assemblies (median con- constraint in the eutherian lineage at increased resolution, and to tig N50 of 46.8 kb, compared to 47.9 kb for Refseq genome assemblies). provide genomic resources for over 130 previously uncharacterized For nine DISCOVAR assemblies and one pre-existing assembly (the species. lesser hedgehog tenrec (Echinops telfairi)), we increased contigu- ity 200-fold (the median scaffold length increased from 90.5 kb to 18.5 megabases (Mb)) through proximity ligation, which uses chromatin Designing a comparative-genomics multitool interaction data to capture the physical relationships among genomic When selecting species, we sought to maximize evolutionary branch regions12. Unlike short-contiguity genomes, these assemblies capture length, to include at least one species from each eutherian family, and structural changes such as chromosomal rearrangements13. The upgraded to prioritize species of medical, biological or biodiversity conservation assemblies increase the number of eutherian orders that are represented interest. Our assemblies increase the percentage of eutherian families by a long-range assembly (contig N50 > 20 kb and scaffold N50 > 10 Mb) with a representative genome from 49% to 82%, and include 9 species from 12 to 18 (out of 19). We are working on upgrading the assembly of that are the sole extant member of their family and 7 species that are the large treeshrew (Tupaia tana) for the remaining order (Scandentia). critically endangered9 (Fig. 1): the Mexican howler monkey (Alouatta palliata mexicana), hirola (Beatragus hunteri), Russian saiga (Saiga tatarica tatarica), social tuco-tuco (Ctenomys sociabilis), indri (Indri Comparative power of 240 species indri), northern white rhinoceros (Ceratotherium simum cottoni) and The Zoonomia alignment includes 120 newly generated assemblies black rhinoceros (Diceros bicornis). and 121 existing assemblies, representing a total of 240 species (the *A list of authors and their affiliations appears at the end of the paper. 240 | Nature | Vol 587 | 12 November 2020 Solenodontidae (1) Erinaceidae (1) Ctenodactylidae (1) Soricidae (2) Rodentia Thryonomyidae (1) Eulipotyphla Talpidae (3) Petromuridae (1) (shrews, moles Manidae (3) (rodents) Bathyergidae (2) and hedgehogs) Nandiniidae (0) Cuniculidae (1) Eupleridae (1) Caviidae (5) Herpestidae (3) Dasyproctidae (1) Hyaenidae (1) Primates Erethizontidae (0) Viverridae (1) Abrocomidae (0) Felidae (7) Octodontidae (1) Canidae (3) Fossa Ctenomyidae (1) Dermoptera Aye-aye Pholidota Procyonidae (0) Echimyidae (0) Mustelidae (5) (arboreal Myocastoridae (1) (pangolins) Ailuridae (1) gliding Capromyidae (1) Mephitidae (1) mammals) Chinchillidae (1) Otariidae (1) Dinomyidae (1) Odobenidae (1) Hystricidae (1) Phocidae (3) Scandentia Platacanthomyidae (0) Ursidae (2) (tree shrews) Calomyscidae (0) Equidae (3) Muridae (8) Tapiridae (2) Cricetidae (8) Carnivora Rhinocerotidae (4) Nesomyidae (2) (carnivores) Camelidae (4) Spalacidae (2) Arctic fox Cingulata Dipodidae (3) Hippopotamidae (1) (armadillos) Geoffroy’s Balaenidae (1) Anomaluridae (0) spider monkey Balaenopteridae (2) Pedetidae (1) Eschrichtiidae (1) Castoridae (1) Neobalaenidae (0) Pilosa Heteromyidae (3) Physeteridae (1) (anteaters Geomyidae (0) Perissodactyla Platanistidae (1) and sloths) Sciuridae (4) (odd-toed Ziphiidae (2) Aplodontiidae (1) ungulates) Iniidae (3) Gliridae (3) Phocoenidae (2) Sirenia Lorisidae (1) Monodontidae (2) (sea cows and Galagidae (1) Delphinidae (2) Daubentoniidae (1) Tragulidae (1) Hirola manatees) Indridae (2) Moschidae (1) Lepilemuridae (0) Southern three- Bovidae (14) Cheirogaleidae (3) banded armadillo Cetartiodactyla Girafdae (2) Hyracoidea Lemuridae (3) (whales and even- Antilocapridae (1) (hyraxes) Pitheciidae (2) Cervidae (3) Cebidae (5) toed ungulates) Tayassuidae (1) Aotidae (1) Suidae (1) Atelidae (2) Rhinopomatidae (0) Proboscidea Cercopithecidae (16) Megadermatidae (1) Hylobatidae (1) Craseonycteridae (1) (elephants) Hominidae (5) Rhinolophidae (2) Tarsiidae (1) Hipposideridae (2) Cynocephalidae (1) Chiroptera Pteropodidae (5) Bumblebee bat Afrosoricida Ptilocercidae (0) (bats) Myzopodidae (0) (golden moles Tupaiidae (2) Giant anteater Thyropteridae (0) and tenrecs) Dasypodidae (3) Noctilionidae (1) Myrmecophagidae (2) Furipteridae (0) Cyclopedidae (0) Mormoopidae (2) Macroscelidea Megalonychidae (2) Phyllostomidae (7) (elephant Bradypodidae (1) Mystacinidae (0) Trichechidae (1) Vespertilionidae (12) shrews) Dugongidae (0) Lagomorpha Molossidae (1) Procaviidae (2) (rabbits, Natalidae (0) Elephantidae (1) hares and pikas) Nycteridae (0) Tubulidentata Tenrecidae (2) Emballonuridae (0) (aardvarks) Chrysochloridae (1) Ochotonidae (1) Macroscelididae (1) Leporidae (2) Snowshoe hare Orycteropodidae (1) Brown-throated sloth Zoonomia GenBank Zoonomia and GenBank No sequenced species Fig. 1 | The Zoonomia Project brings the fraction of eutherian families that (grey boxes) and 22 had no representative genome assembly (white boxes). are represented by at least one assembly to 83%. Phylogenetic tree of the Parenthetical numbers indicate the number of species with genome assemblies mammalian families in the Zoonomia Project alignment, including both our in a given family. Image credits: fossa, Bertal/Wikimedia (CC BY-SA); Arctic fox, new assemblies and all other high-quality mammalian genomes publicly Michael Haferkamp/Wikimedia (CC BY-SA); hirola, JRProbert/Wikimedia