Knowledge from Three Major Genome Projects; Large-Scale Evolution Across Vertebrates
1/22/2018

Dissecting evolution and disease using comparative vertebrate genomics – power from 200 mammals
Kerstin Lindblad-Toh
Broad Institute of MIT and Harvard
Uppsala University

Knowledge from three major genome projects
• ~5% of mammalian genomes are functional
  – ~1.5% protein coding
  – >3.5% non-coding conserved
• ≥ 20,000 mammalian genes
  – some lineage specific expansion
• The most highly conserved non-coding elements sit around developmental genes
• ~20% of non-coding elements are lineage specific innovations in placental mammals
• Transposable elements give rise to novel functional elements

Large-scale evolution across vertebrates
• SOX2 (Lowe et al Science, 2011)
• Label conserved non-coding elements as: Eutherian, Therian, Amniote (Mikkelsen et al Nature, 2007)

Conservation predicts function
• At least 6-10% of the genome is conserved
• If something has been conserved for 100 million years, it is probably doing something
• 90% of GWAS peaks are outside coding regions
• Chart labels: coding (1.5%), conserved non-coding (~6.5%), mystery

29 mammals project
• Sanger 2x (upgraded to 7x)
• 3.6 million constraint elements
• encompasses 4.2% of the human genome
• Tree branch labels: numbers = substitutions per 100 base pairs
• Broad, WashU, Baylor
Lindblad-Toh et al Nature 2011

New genes and exons
• ~3,900 new exons
  – ~1,400 alternative exons in 850 genes
• ~1,000 new candidate genes with multiple exons
• antisense genes

Not just genes – 4.2% of human genome detected under purifying selection:
• 20% exonic
• 30% intronic
• 40% intergenic
Why were they not found before?
Half novel elements
• Too short
• Only in some organisms
Lin

3 categories of promoter constraint

Novel constraint elements & limitations
• 12bp resolution
• 10% False Discovery Rate

Synonymous constraint elements
• Both CSE regions contain enhancers driving the expression of HOXA2 in hindbrain

The 200(+) mammals project
• New genome assemblies for 137 species
• Conservation across 200+ mammals (~75 extant + 137 new)
• 1 bp resolution
• False Positive Rate = 6 × 10^-7 (~1800 in genome)
Mandrill, Dugong, Arctic Fox, Screaming Hairy Armadillo

The placental mammalian tree “today” (52)
The 200 mammals tree
Clades labelled on both trees: Glires, Primates, Afrotheria, Xenarthra, Laurasiatheria
Source: UCSC

Species selection
1. Branch length (aiming for ~1 per family)
2. Expert consultations (Ollie Ryder, Bill Murphy, Emma Teeling, Jim Patton, and many others)
3. Interesting research models, specific traits
4. Sample availability

Current status
Goal: NCBI submission of all genomes by early 2018
• Sample collection: >150 collected
• LC and sequencing: 138 sequenced
• Assembly: 132 assembled + ~75 extant genomes
Analysis, annotations, alignment
• Laurasiatheria pilot alignment (26 species)
• “Backbone” alignment in progress using extant genomes

DISCOVAR de novo genomes
• 1 ug of standard quality DNA
• Single Illumina library (~450bp insert)
• 1 lane HiSeq 2500 with 2x250 bp reads
~$5000 LC + sequencing per genome assembly
Uppsala and Broad

Assembly stats summary (N50s)
Technical or biological?
Chart series: Scaffold – Arachne/ALLPATHS; Scaffold – DISCOVAR
• Southern Three-Banded Armadillo (13kb)
• Red-shanked Douc (10kb)
• Linnaeus's Two Toed Sloth (5kb)
• Screaming Hairy Armadillo (5kb)
• Hoary Bamboo Rat (4kb)

BUSCO
Assesses genome assembly with single-copy orthologs.

DISCOVAR captures gene content
• Lifted over dog annotation to both rhino assemblies
• Almost identical total gene sequence (38.9Mb vs 38.8Mb)
• Very similar coverage of the 14K 1:1 genes (94.8% vs. 93.8%)
But – chromosome evolution not possible

BUSCO chart labels: Primates, Laurasiatheria, Dog (Discovar & canFam3.1), Rodents
Simão et al, Bioinformatics 2015

Upgrading DISCOVAR assemblies - Dovetail Chicago/HiC
Dovetail HiRise2 (first 20 genomes)
• 1 species per order (Harris Lewin @ UC Davis)

Clade           Common Name                      Status                     Dovetail N50 (Mb)
Afrotheria      Rock hyrax                       Done                       9
Afrotheria      Lesser Hedgehog tenrec           Done                       60
Afrotheria      Cape elephant shrew              No sample
Afrotheria      Aardvark                         No sample
Euarchonta      Colugo                           Done                       10
Euarchonta      Large tree shrew                 In progress
Laurasiatheria  Siberian musk deer               Done                       33
Laurasiatheria  Pronghorn                        Done                       24
Laurasiatheria  Chacoan Peccary                  Done                       37
Laurasiatheria  Narwhal                          Done                       28
Laurasiatheria  Hippopotamus                     Done                       5
Laurasiatheria  Eastern black rhinoceros         Done                       18
Laurasiatheria  North Indian muntjac             Done                       32
Laurasiatheria  Gemsbok                          Done                       47
Laurasiatheria  Masai giraffe                    Done                       57
Laurasiatheria  Solenodon                        Done                       43
Laurasiatheria  Greater Mouse Deer               Done                       19
Laurasiatheria  Tree pangolin                    Done                       10
Xenarthra       Southern three-banded armadillo  Libraries made; Failed QC
Xenarthra       Giant anteater                   Libraries made; Failed QC

Reference free alignment with Cactus
• whole genome multiple alignment
• end result: a reconstruction of the evolution of the genome along the input species tree – reconstructed ancestors “genomes” will be available
• annotate genes with Comparative Annotation Toolkit (CAT)
Joel Armstrong, Benedict Paten

Flexibility in alignments
• What other genomes are available?
• Potential to replace with newer assemblies in alignment

Pilot alignment - 26 Laurasiatheria

Name                           Species                     Clade           Contig N50 (kb)  Scaffold N50 (kb*)
Cow                            Bos taurus                  ARTIODACTYLA    97               6.4 Mb
Pig                            Sus scrofa                  ARTIODACTYLA    69               576
Siberian Reindeer              Rangifer tarandus           ARTIODACTYLA    92               106
Hippopotamus                   Hippopotamus amphibius      ARTIODACTYLA    85               99
Pronghorn                      Antilocapra americana       ARTIODACTYLA    75               91
Nilgiri Tahr                   Hemitragus hylocrius        ARTIODACTYLA    70               90
Peninsular Bighorn Sheep       Ovis canadensis             ARTIODACTYLA    63               79
Hunter's Hartebeest            Beatragus hunteri           ARTIODACTYLA    60               73
Ferret                         Mustela putorius            CARNIVORA       45               9.3 Mb
Dog                            Canis lupus                 CARNIVORA       267              45.8 Mb
Cat (Domestic)                 Felis catus                 CARNIVORA       45               18.1 Mb
South African Banded Mongoose  Mungos mungo                CARNIVORA       189              247
Meerkat                        Suricata suricatta          CARNIVORA       157              197
Fossa                          Cryptoprocta ferox          CARNIVORA       138              187
Dwarf Mongoose                 Helogale parvula            CARNIVORA       117              185
California Sea Lion            Zalophus californianus      CARNIVORA       98               143
Giant Otter                    Pteronura brasiliensis      CARNIVORA       100              131
Arctic Fox                     Vulpes lagopus              CARNIVORA       89               124
Asian Palm Civet               Paradoxurus hermaphroditus  CARNIVORA       68               77
Striped Hyena                  Hyaena hyaena               CARNIVORA       54               69
Northern Elephant Seal         Mirounga angustirostris     CARNIVORA       55               67
Narwhal                        Monodon monoceros           CETACEA         79               99
Horse                          Equus caballus              PERISSODACTYLA  112              46.7 Mb
Malayan Tapir                  Tapirus indicus             PERISSODACTYLA  234              320
South American Tapir           Tapirus terrestris          PERISSODACTYLA  184              207
Black Rhinoceros               Diceros bicornis            PERISSODACTYLA  114              152

Good coverage of reindeer by cow, hippo and meerkat
Part of the ITIH2 gene (exon 7-15)

Cactus with “backbone” alignment
• 33 high-quality genomes “backbone” – mostly extant genomes

Analysis plan
Conservation: Both SiPhy + PhastCons
• Placental mammalian conservation
• Major clades:
primate, rodent, carnivore
Conserved regulatory motifs, non-coding RNAs
CTCF sites – genome compartmentalization
Synonymous Constraint Elements (codons)
Regions under positive selection (HARs, PARs, CARs)
Convergent evolution
Couple to phenotypes and disease

Cactus with “backbone” alignment (continued)
• Attach smaller clade alignments to backbone

Helps finding candidate disease causal variants
• 5Mb targeted sequencing of 4 doberman OCD cases and 4 controls
• No coding changes
• All 4 cases had mutations in single regulatory element
• 3 DP cases + 1 DP case
Tang et al, Genome Biol 2014

Connecting genotype to phenotype
• Overlap with known mutations / functional annotations
• Non-coding element turnover
• Convergent evolution
  – venom
  – hibernation
  – loss of vitamin C metabolism
  – marine
  – echolocation
Mole vole: Lost its Y chromosome
Hispaniolan solenodon: Venomous mammal
Narwhal: The dentist’s friend

Convergent evolution
EPAS1 - High altitude adaptation
Sheep: 200 genes enriched for functions related to angiogenesis, energy production and erythropoiesis.
Wei, C. et al. Sci. Rep. 2016
Hu, Y. et al. PNAS 2017
Wang, G.-D. et al. Genome Biol. Evol. 2014

Tool for conservation genetics
• Genomic resources crucial for conservation genetics
  – 3 Northern white rhinos left on earth
  – Southern white rhinos as surrogates
  – White rhino + black rhino genome assemblies
Critical point: variation found among stored samples that can be used for reproduction
Oliver Ryder, San Diego Zoo

How to get 150 mammals ... THANK YOU!
• Eric Baitchman • Steve Goodman • Bret Pasch
• Robert Baker • Kris Helgen • Klaus-Peter Koepfli
• Erika Barthlemess • Allyson Hindle • Sébastien Puechmaille
• Matthew Breen • Hopi Hoekstra • David Ray
• Kevin Campbell • Pavel Hulva • Kelly Robertson
• Nicholas Casewell • William Israelsen • Stephen Rossiter
• Leona Chemnick • Danielle Lee • Manuel Ruedi
• Kimberly Cooper • Harris Lewin • Karen Sears
• Liliana Davalos • Matt MacManes • Ashley Seifert
• Frederic Delsuc • Phil Morin • Mark Springer
• Dan Distel • Bill Murphy • Emma Teeling
• Christopher Emerling • Alice Mouton • Anne Yoder
• Vadim Gladyshev • Michael Nachman
• Jeffrey Good • Rob Ogden

200 Mammals Collaboration
• Vertebrate Genomics @ Broad – Elinor Karlsson & Bruce Birren, Kerstin Lindblad-Toh
• Uppsala University – Kerstin Lindblad-Toh
• San Diego Zoo – Oliver Ryder
• University of California, Santa Cruz – Benedict Paten & Joel Armstrong
• Stanford – Gill Bejerano
• University of California, Davis – Harris Lewin
• Earlham Institute (UK, formerly TGAC) – Wilfried Haerty
• Institut de Biologia Evolutiva (Spain) – Tomas Marques-Bonet
• Karolinska Institutet (Sweden) – Jussi Taipale
• UMass Medical School – Manuel Garber

Connecting to phenotype
• Activity pattern (nocturnal, etc)
• Brain size
• Chemosensing
• Demography (longevity, litter size, …)
• Diet
• Habitat (aquatic, altitude, heat, cold …)
• Immunity
• Reproduction
• Skeletal

Backbone improves ancestral assemblies
Figure panels: Contiguity, Size
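
Note on the N50 statistic: the assembly slides above quote contig and scaffold N50 values as their measure of contiguity. As a reminder of what that number means, here is a minimal Python sketch. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the project's own pipeline.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig or scaffold lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of all assembled bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one sequence length")

    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first length that pushes the running sum to half of the
        # assembly is the N50.
        if running >= half_total:
            return length
    return ordered[-1]  # not reached for positive lengths


# Toy assembly (hypothetical lengths in kb): total 300, half 150.
# 80 + 70 = 150 reaches the halfway point, so the N50 is 70.
toy_lengths_kb = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_lengths_kb))
```

A larger N50 means that more of the assembly sits in long pieces. The statistic says nothing about correctness or gene content, which is why the slides pair it with a BUSCO-style completeness check.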