Whole-genome phylogenetics, gene regulation and the origin of evolutionary novelty

Feather photos: J. Trimble, MCZ “Beast Legends”: A six part adventure in science and myth

Fijian shark god Griffin Kraken

Wild man

Dragon Terror Graphics by Invisible Pictures, Inc. Beast Legends – Griffin episode Evolutionary change: genes or gene regulation? Taste receptors in mammals X X Taste receptors on the tongue inherited only the umami (meat) receptor from their dinosaur ancestors

el receptor del X gusto dulce se perdió incluso en colibríes ! Hummingbirds can taste sugar due to changes in the gene other birds use to taste meat (or insects) Can taste sugar? Taste receptor T1R1 T1R3 hummingbird ✗

swift

chicken Baldwin et al. 2014. Science 345: 929-933 Non-codingDark matter of the genome: a regulatory network?

Karyotype of an Emu CNEEs: evolutionarily conserved non-coding enhancer regions CNEEs = conserved non-exonic elements

View of a segment of human chromosome 10 using UCSC Genome Browser

Janes et al. (2011) Genome Biol. Evol. 3:102–113 Noncoding enhancers: long-range control of gene expression

Levine et al. 2014 Cell 157: 13-25 Phylogenetic hidden Markov model detects CNEEs using Phastcons*

*Siepel et al. 2005. Genome Res. 15:1034-1050 A role for gene regulation in the origin of feathers

Quanguo et al. (2010) Science Anchiornis Chen et al. (1998) Nature Sinosauroptyerx Archaeopteryx Feather photos: J. Trimble, MCZ Conserved non-exonic elements (CNEEs) act as enhancers for feather genes

• 400 abstracts • 193 feather genes • 13,307 feather gene CNEEs (2.2% of total)

Bird, amniote- and tetrapod-specific CNEEs near SHox

Lowe et al. 2015 Mol. Biol. Evol. 32:23–28 High origination rates of feather CNEEs, but not feather genes, when feathers evolved

feathers evolve 1.5 220 mya genes Relative CNEEs number of 1.0 origins

0.5Origin of most feather genes Feather gene pre-dates enhancers feathers evolved 0.0 at almost the same time as feathers Birds Turtles r MammalsReptiles Amphibians Chickens/ducks Crocodilians Bony vertebrates ancient time recent Lowe et al. 2015 Mol. Biol. Evol. 32:23–28 Bird-specific regulatory evolution: what makes a bird a bird? Bird-specific CNEEs associated with genes for limb and body size evolution

Chromosome position (chicken)

Lowe et al. 2015 Mol. Biol. Evol. 32:23–28 Bird-specific CNEEs associated with genes for limb and body size evolution

Lowe et al. 2015 Mol. Biol. Evol. 32:23–28 Bird-specific CNEEs associated with genes for limb and body size evolution

Lee et al. 2014. Science 345:562 Correlates of GC content in ~1000 mammalian coding regions

Romiguier et al. 2010. Genome Res. 20: 1001-1009 Mechanistic hypotheses for high GC content in a lineage

GC content

Isochores/Chromo Rate of biased somal location and gene conversion Population size and length (BGC) and life history effects recombination

Hughes & Hughes 1995 Nature Lynch & Connery 2003 Science Sessions & Larson 1987 Evolution Consequences of repair of double strand breaks induced by BGC or recombination

Frequency of GC alleles among gametes from a heterozygous individual:

(b=BGC bias; Ne = effective population size) Covariation of GC content and body mass in birds

large small large small populations populations populations populations Nabholz et al./Avian Phylogenomics Consortium Effects of GC variation on phylogenomic analysis

Including only genes with low-GC Including only genes with high-GC variation among lineages variation among lineages

Avian Phylogenomics Consortium Passerines

Parrots

Owls

Cormorants Penguins

Rails Gulls, Auks

Pigeons Grebes Ducks, Geese Jarvis et al. 2014, Science CNEEs and the convergent evolution of flightlessness in Palaeognathae Skeletal modifications for flightlessness

Little-spotted kiwi sternum

Emu and ostrich keelless sterna

De Bakker et al. 2013. Nature 500, 445–448. Convergent losses of flight allow comparative genomics to identify genomic regions for flightlessness

Extinct

Extant

Mitchell et al 2014 11 new palaeognath genomes

Little bush moa Rowi Kiwi Great-spotted Kiwi Little Spotted Kiwi

Southern Cassowary

Emu Lesser Rhea Greater Rhea

Elegant-crested Image (all CC): David Cook; Quartl; Jim, the Photographer, Tim Sackton 42- whole genome alignment for birds using ProgressiveCactus

DMRT1 Relative rates of different noncoding markers

Edwards et al. 2017. Syst. Biol. 66: 1028 Phylogenomic markers cover c. 3% of total genome length 12,676 CNEEs

5,016 Introns

3,158 Ultraconserved elements (UCEs)

Cloutier et al. 2019. Syst. Biol. 10.1093/sysbio/syz019 Relationships of rheas unclear

Haddrath & Baker (2012) Mitchell et al. (2014) Smith et al. (2013) 27 nuclear loci mtDNA 60 nuclear loci + mtDNA

Moa Moa Tinamou Moa Tinamou Elephant bird 49 Tinamou 1 Kiwi Kiwi Rhea 63 Emu 1 Emu Kiwi

0.67-1 Cassowary Cassowary Emu Rhea Rhea Cassowary Ostrich Ostrich Ostrich Coalescent* analyses resolve the position of rheas and reveal an ancient rapid radiation

0.061

12,676 CNEEs - 4,794,620 bp 0.073 5,016 introns - 27,890,802 bp 3,158 UCEs - 8,498,759 bp

Total: 20,850 loci; 41,184,181 bp

Branch lengths in coalescent units

*MP-EST: Liu et al. 2010. BMC Evol. Biol. Consistent accumulation of phylogenetic signal using MP-EST

Cloutier et al. 2019. Syst. Biol. 10.1093/sysbio/syz019 Gene tree distribution suggests a near polytomy at base of ratites Cassowary/ Rheas Emu Kiwis Major topology

Cassowary/ Kiwis Emu Rheas Minor topology 1

Cassowary/ Emu Kiwis Rheas Minor topology 2

Cloutier et al. 2019. Syst. Biol. 10.1093/sysbio/syz019 Anomaly zone: most common gene tree does not match the species tree

Cloutier et al. 2019. Syst. Biol. 10.1093/sysbio/syz019 Corroboration of coalescent tree from transposable elements • Chicken Repeat1 (CR1) retroelement insertions • (Virtually) homoplasy free • Binary presence/absence • No model misspecification • No GC/rate variation bias • BUT- subject to incomplete lineage sorting TS TS 5' flank D CR1 D 3' flank A TS TS 5' flank D CR1 D 3' flank B 5' flank 3' flank C Example CR1 insertion Most CR1s support the coalescent species tree topology

Cloutier et al. 2019. Syst. Biol. 10.1093/sysbio/syz019 Evolutionary change: genes or gene regulation? Searching for evidence of genome-wide amino acid convergence in ratites No evidence for genome-wide convergent amino acid substitutions in ratites

R2 = 0.95 ~1-2% of protein-coding genes show evidence of -specific positive selection

271 genes (10% FDR) 104 genes (1% FDR)

Sackton et al. 2019. Science 364: 74-78 Non-codingDark matter of the genome: a regulatory network?

Karyotype of an Emu CNEEs: evolutionarily conserved non-coding enhancer regions CNEEs = conserved non-exonic elements 284,001 long (* > 50 bp) CNEEs in data set

View of a segment of human chromosome 10 using UCSC Genome Browser

Janes et al. (2011) Genome Biol. Evol. 3:102–113 Convergent loss of function of CNEEs in ratite lineages Ratites neognaths and

250 bp Branch-specific Bayesian model of noncoding rate accelerations

for noncoding element i

Z

a = probability of gain of conserved state

b = probability of loss of conserved state

Hu, Z., et al. 2019. Mol. Biol. Evol. 36: 1086 A convergently accelerated CNEE detected with a novel Bayesian method

Anole Turtle2 Turtle1 Gharial Crocodile Alligator2 SNPs Alligator1 Ostrich Moa Tinamou4 Tinamou3 Tinamou2 Tinamou1 Rhea2 Rhea1 accelerated Emu Cassowary Kiwi3 branch Kiwi2 Kiwi1 Mallard Turkey Chicken Mesite Pigeon conserved Cuckoo Swift Hummingbird Killdeer branch Crane Ibis Fulmar Penguin2 Penguin1 Eagle neutral Courol Woodpecker Falcon Budgerigar branch Crow Ground−tit Flycatcher Finch BF1: 72 BF2: 6 r1=0.27, r2=3.08 Branch lengths relative to conserved rate Hu, Z., et al. 2019. Mol. Biol. Evol. 36: 1086 Additional examples of convergently accelerated CNEEs

Hu, Z., et al. 2019. Mol. Biol. Evol. 36: 1086 Rapid regulatory evolution near developmental genes

Sackton et al. 2019. Science 364: 74-78 Homing in on regulators for flightlessness through convergence

4,260 CNEEs 1,270 CNEEs 252 CNEEs 66 CNEEs iw i ed T inamo u ed T inamo u h_ Moa -cres t po tt ed_ K hroa t iw i _Spo tt ed_Kiw i e- t ed T inamo u ed T inamo u iw i hi t rea t ed T inamo u h_ Moa G Rowi_Kiw i Southern_Cassowar y Em u Chilean_ TI na mou Elegen t Thicket_Tinamo u W _Rh ea Lesser Greater_Rhe a Ostr ich Li tt le_ Bus Li tt le_ S ed T inamo u -cres t po tt ed_ K hroa t h_ Moa _Spo tt ed_Kiw i iw i e- t -cres t po tt ed_ K hroa t hi t ed T inamo u _Spo tt ed_Kiw i rea t ed T inamo u e- t G Southern_Cassowar y Em u Chilean_ TI na mou Elegen t Thicket_Tinamo u W Greater_Rhe a Rowi_Kiw i _Rh ea Lesser Ostr ich Li tt le_ Bus Li tt le_ S hi t rea t h_ Moa G Rowi_Kiw i Southern_Cassowar y Em u Chilean_ TI na mou Elegen t Thicket_Tinamo u W _Rh ea Lesser Greater_Rhe a Ostr ich Li tt le_ Bus Li tt le_ S -cres t po tt ed_ K hroa t _Spo tt ed_Kiw i e- t hi t rea t G Chilean_ TI na mou Elegen t Thicket_Tinamo u Rowi_Kiw i Southern_Cassowar y Em u W _Rh ea Lesser Greater_Rhe a Ostr ich Li tt le_ Bus Li tt le_ S 0. 7 0. 7 0. 7 0. 7

CNEE relaxed Same CNEE relaxed Same CNEE relaxed Same CNEE relaxed on 1 lineage in 2 lineages in 3 lineages in 4 lineages Ratite genomes exhibit higher numbers of convergently accelerating CNEEs than do vocal learners or random trios of taxa

140 ratites random pairs/trios 120 vocal learners

Number of 100 convergently accelerated 80 CNEEs 60

40

20

0 2 3 Number of convergent lineages Assay for Transposase-Accessible Chromatin ATAC-Seq identifies DNA with open chromatin, accessible to transcription factors

Stage HH24-25 chickens and rheas

Buenrostro et al. 2015. Curr Protoc.Biol. 2015; 109: 21.29.1–21.29.9. Differences in ATAC-seq peaks between rhea and chicken suggest changes in limb gene regulation Ratite accelerated element 1317692 is contained under chicken ATAC peaks …

5 kb galGal 11,278,500 11,279,000 11,279,500 11,280,000 11,280,500 11,281,000 11,281,500 11,282,000 11,282,500 11,283,000 11,283,500 11,284,000 11,284,500 11,285,000 11,285,500 11,286,000 11,286,500 11,287,000 11,287,500 CNEECNEE Chick HL 90.2857 _

0 _ Chicken 82.4286 _ 0 _ hind limb86.6667 _ 0 _

Chick FL 98.5714 _

0 _ Chicken 93.6667 _ 0 _ 84.8571 _

Forelimb 0 _

Rhea HL 14.8333 _

0 _ 18.5 _

0 _ RheaRhea FL 17 _

0 _ forelimb27.5 _

0 _ ... but rhea is missing this peak Combined information from multiple sources suggests candidate enhancers for flightlessness phenotypes

Rate acceleration ATAC-seq

16 340 33,554 42 CNEEs 71 44,396

62,299

Chip-seq (from Seki et al. 2017)

Sackton et al. 2019. Science 364: 74-78 Volant version of CNEE drives gene expression in the developing forelimb of chicken but flightless version does not Conclusions

• Ratite relationships require methods accommodating gene tree heterogeneity • Noncoding CNEEs show more evidence for convergence than do coding regions • Functional marks (ATAC-seq) suggests noncoding CNEEs are functional and related to species differences Acknowledgements Cliff Tabin Tim Sackton Phil Grayson Scott Edwards Chad Eliason

Statistics: Zhirui Hu Jun Liu

Grant numbers EAR-1355343/DEB-1355292 Michele Clamp Allan Baker Julia Clarke Alison Cloutier