Complex Molecular Evolutionary Models and Information Theoretic Approaches Provide Genomic Perspectives on Amphibian Evolution
Paul M. Hime
16 May 2017
Blue Waters Symposium

The Evolution of Life on Earth
• All life traces its origins back to a single common ancestor nearly 4 billion years ago.
• But today, there are tens of millions of species!
• Reconstructing the genealogy of life is fundamental to nearly all areas of modern biology.

The Evolution of Life on Earth
“Nothing in biology makes sense except in the light of evolution.” (Dobzhansky)
“Nothing in evolutionary biology makes sense except in the light of phylogeny.”

All Organisms on Earth Trace Their Origins Back to a Single Common Ancestor

Genomes Are Documents of Evolutionary History

Organisms’ Genomes Evolve through Time

Phylogenetic Reconstruction
• Phylogenies are hypotheses about ancestor-descendant relationships.
• These can be estimated from genetic data (in the context of a model).
• Simple case: enumerate all possible trees and pick the “best”.
• Tree space grows super-exponentially with the number of taxa: there are (2n - 5)!! unrooted binary trees for n taxa.
• Use heuristic search strategies to explore tree space and parameter space.

Models in Evolutionary Biology
• Evolutionary biology is an inherently historical discipline.
• In evolutionary biology, one cannot “replay the tape” of life.
• We use statistical approaches to compare competing models in light of the data we collect.
“All models are wrong.
Some are useful.” (George Box)

Data ≠ Information (Except in the Context of an Appropriate Model)
[Alignment: Species 1 to Species 24, every sequence an identical run of A; no site varies.]

Data ≈ Information (Except in the Context of an Appropriate Model)
[Alignment: Species 1 to Species 24, 80 sites. Every sequence matches
ACCGAGGGCTTCGATCGACTACCTTAGGGCTCTAGCCTGTTTCGTGCTAGCTGACTGATCGTAGTGTAGCTGACTGTGTG
except at the two sites marked “informative site”:
Site 10: A in Species 1, 3, 6, 9, 13; C in Species 16, 18, 21; T in all others.
Site 42: A in Species 2, 3, 4, 5; C in Species 7, 12, 17; G in Species 20, 22; T in all others.]

Models of Nucleotide Substitution
[The same 24-species alignment, annotated: multi-sequence alignment; general time-reversible substitution matrix; gamma-distributed rate heterogeneity across sites (discretized in practice).]

Codon-Based Models of Molecular Evolution

The Multispecies Coalescent Model
Gene divergences always predate species divergences.
Stochastic coalescent processes can lead to gene tree / species tree discordance.
(Modified from Leliaert et al. 2014)

Gene Tree / Species Tree Discordance
Population-Level Processes Affect the Expected Distributions of Gene Coalescence (Luay Nakhleh)
Conflicting phylogenetic signal from different loci is expected, especially for more recent divergence events and large effective population sizes.
What about at deep scales?
Many loci (regions of the genome) may be needed for difficult questions.

The Genomics Revolution in Evolutionary Biology
It has never been easier to collect genomic data in non-model organisms.
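The Phylogenetic Reconstruction slide notes that exhaustive search breaks down as taxa are added. A minimal Python sketch (illustrative only, not the author's code; the function names are hypothetical) makes the growth concrete: the number of unrooted, fully resolved topologies for n labelled taxa is the double factorial (2n - 5)!!, and a stepwise-addition enumerator, which grafts each new taxon onto every edge of every existing tree, reproduces that count for small n.

```python
from math import prod


def num_unrooted_trees(n: int) -> int:
    """Count unrooted binary topologies for n labelled taxa: (2n - 5)!!."""
    if n < 3:
        raise ValueError("need at least 3 taxa")
    # Product of the odd numbers 1, 3, 5, ..., 2n - 5.
    return prod(range(1, 2 * n - 4, 2))


def enumerate_unrooted_trees(taxa):
    """Return every unrooted binary topology on `taxa` as a list of edges.

    Leaves are taxon names (strings); internal nodes are integers.
    Stepwise addition: each new taxon is attached to the midpoint of each edge.
    """
    if len(taxa) < 3:
        raise ValueError("need at least 3 taxa")
    a, b, c = taxa[:3]
    trees = [[(0, a), (0, b), (0, c)]]  # the single 3-taxon tree
    for new_node, taxon in enumerate(taxa[3:], start=1):
        grown = []
        for edges in trees:
            for i, (u, v) in enumerate(edges):
                # Split edge (u, v) at new_node, then hang the new taxon off new_node.
                rest = edges[:i] + edges[i + 1:]
                grown.append(rest + [(u, new_node), (new_node, v), (new_node, taxon)])
        trees = grown
    return trees


if __name__ == "__main__":
    for n in range(3, 8):
        taxa = [f"T{i}" for i in range(1, n + 1)]
        print(n, len(enumerate_unrooted_trees(taxa)), num_unrooted_trees(n))
    print("24 taxa:", num_unrooted_trees(24))
```

For the 24 species in the slides' example alignment, `num_unrooted_trees(24)` is on the order of 10^26, which is why tree searches rely on heuristics rather than enumeration.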
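The two “Data ≠ / ≈ Information” slides hinge on which alignment columns can discriminate among trees. Assuming “informative site” is meant in the usual parsimony sense (at least two character states, each shared by at least two sequences), the sketch below rebuilds the slides' 24-sequence alignment from its consensus and its two variable sites and recovers positions 10 and 42; an all-A alignment yields none. The names `slide_alignment` and `informative_sites` are hypothetical helpers, not from any package.

```python
from collections import Counter

# Consensus of the slides' 24-species, 80-site example (Species 8 matches it exactly).
CONSENSUS = "ACCGAGGGCTTCGATCGACTACCTTAGGGCTCTAGCCTGTTTCGTGCTAGCTGACTGATCGTAGTGTAGCTGACTGTGTG"
# Departures from the consensus base T at the two variable sites, keyed by species number.
SITE_10 = {1: "A", 3: "A", 6: "A", 9: "A", 13: "A", 16: "C", 18: "C", 21: "C"}
SITE_42 = {2: "A", 3: "A", 4: "A", 5: "A", 7: "C", 12: "C", 17: "C", 20: "G", 22: "G"}


def slide_alignment():
    """Rebuild the example alignment as {name: sequence}."""
    aln = {}
    for sp in range(1, 25):
        seq = list(CONSENSUS)
        seq[9] = SITE_10.get(sp, "T")   # site 10 (1-based)
        seq[41] = SITE_42.get(sp, "T")  # site 42 (1-based)
        aln[f"Species {sp}"] = "".join(seq)
    return aln


def informative_sites(alignment):
    """Return 1-based positions where >= 2 states each occur in >= 2 sequences."""
    seqs = list(alignment.values())
    sites = []
    for i in range(len(seqs[0])):
        counts = Counter(seq[i] for seq in seqs)
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            sites.append(i + 1)
    return sites


if __name__ == "__main__":
    print(informative_sites(slide_alignment()))  # [10, 42]
    # Stand-in for the all-A slide (the sequence length is immaterial here).
    all_a = {f"Species {sp}": "A" * 80 for sp in range(1, 25)}
    print(informative_sites(all_a))  # []
```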
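The substitution-model slide names a general time-reversible matrix with discretized gamma rates. As a minimal sketch of why a model is needed at all, the code below uses GTR's simplest special case, the Jukes-Cantor (JC69) model (all exchangeabilities equal, equal base frequencies, no rate variation across sites): observed differences saturate toward 0.75, and the model converts an observed proportion p into expected substitutions per site d. Function names are illustrative; gamma rate heterogeneity is deliberately omitted.

```python
import math


def jc69_transition_prob(d: float, same: bool) -> float:
    """JC69 probability that a site ends as the same base (same=True) or as one
    specific different base (same=False) after d expected substitutions per site."""
    decay = math.exp(-4.0 * d / 3.0)
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay


def jc69_distance(p: float) -> float:
    """Convert an observed proportion of differing sites p into the JC69
    distance d = -3/4 * ln(1 - 4p/3)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75); larger values are saturated")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # Species 1 and Species 2 in the slide alignment differ at 2 of 80 sites.
    p = 2 / 80
    print(f"p = {p:.4f}, JC69 d = {jc69_distance(p):.4f}")
    # Saturation: the expected p-distance approaches 0.75 however large d gets.
    for d in (0.1, 1.0, 5.0):
        print(d, round(1 - jc69_transition_prob(d, same=True), 4))
```

At small distances the correction is tiny, but as d grows, many substitutions become invisible in p, which is the gap a fitted model is meant to close.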
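The coalescent slides state that gene tree / species tree discordance is expected, especially for recent divergences and large effective population sizes. For three species with no gene flow and one sampled lineage each, the standard multispecies-coalescent result gives the probability that a gene tree matches the species tree as 1 - (2/3)e^(-T), with each of the two discordant topologies at (1/3)e^(-T), where T is the internal branch length in coalescent units (generations divided by 2N_e for a diploid locus). The sketch below evaluates that formula; the function name and the example numbers are hypothetical.

```python
import math


def three_taxon_gene_tree_probs(t_generations: float, n_e: float, ploidy: int = 2):
    """Rooted gene-tree topology probabilities for species tree ((A,B),C)
    under the multispecies coalescent (no gene flow, one lineage per species).

    t_generations: length of the internal branch (time between speciations).
    n_e: effective population size along that internal branch.
    Returns (P[((A,B),C)], P[((A,C),B)], P[((B,C),A)]).
    """
    T = t_generations / (ploidy * n_e)  # internal branch in coalescent units
    p_discordant = math.exp(-T) / 3.0    # probability of each mismatching topology
    return 1.0 - 2.0 * p_discordant, p_discordant, p_discordant


if __name__ == "__main__":
    # Same 100,000-generation internal branch with increasingly large N_e (illustrative values).
    for n_e in (10_000, 100_000, 1_000_000):
        match, alt1, alt2 = three_taxon_gene_tree_probs(100_000, n_e)
        print(f"N_e={n_e:>9,}: concordant={match:.3f}, each discordant={alt1:.3f}")
```

As N_e grows (or the internal branch shortens), the concordant probability falls from near 1 toward 1/3, so any single locus becomes a weak guide to the species tree and many loci are needed.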
Data ≠ Information (Except in the Context of an