Investigating Evolutionary History Using Phylogenomics
K. Jun Tong
Faculty of Science
The University of Sydney
2018

A thesis submitted to fulfil the requirements for the degree of Doctor of Philosophy

Statement of originality

This is to certify that, to the best of my knowledge, the content of this thesis is my own work. This thesis has not been submitted for any degree or other purposes. The research described in this thesis is the original work of the author, except where specifically acknowledged.

I certify that the intellectual content of this thesis is the product of my own work and that all the assistance received in preparing this thesis and sources have been acknowledged.

K. Jun Tong

Authority of access

This thesis may be made available for loan and limited copying in accordance with the Copyright Act (1968).

K. Jun Tong

The cover image is Blossoming Almond Branch in a Glass by Vincent van Gogh, 1888.

Authorship attribution statement

The following chapters were published as research papers.

Sections 1.1 to 1.8 of Chapter 1 of this thesis are published as:
Tong KJ, Lo N, Ho SYW (2016) Reconstructing evolutionary timescales using phylogenomics. Zoological Systematics, 41, 343–351.
KJT and SYWH wrote the paper, with input from NL.

A version of Chapter 2 of this thesis is published as:
Tong KJ, Duchêne S, Ho SYW, Lo N (2015) Comment on “Phylogenomics resolves the timing and pattern of insect evolution”. Science, 349, 487.
NL, SYWH, and KJT designed the study. KJT and SD conducted the analyses. KJT wrote the draft of the manuscript. KJT, SYWH, and NL wrote the paper.

A version of Chapter 3 of this thesis is submitted as:
Tong KJ, Duchêne DA, Duchêne S, Geoghegan JL, Ho SYW (submitted) A comparison of methods for estimating substitution rates from ancient DNA sequence data. BMC Evolutionary Biology.
It is also posted as a preprint:
Tong KJ, Duchêne DA, Duchêne S, Geoghegan JL, Ho SYW (2017) A comparison of methods for estimating substitution rates from ancient DNA sequence data. bioRxiv, doi: https://doi.org/10.1101/162529
KJT, DAD, and SYWH designed the study. KJT, DAD, and SD collected the data and conducted the analyses. KJT, DAD, and SYWH wrote the paper, with input from SD and JLG.

A version of Chapter 5 of this thesis is published as:
Tong KJ, Duchêne S, Lo N, Ho SYW (2017) The impacts of drift and selection on genomic evolution in insects. PeerJ, 5, 3241.
SYWH, SD, and KJT designed the study. KJT and SD collected the data and conducted the analysis. KJT and SYWH wrote the paper, with input from SD and NL.

Parts of sections 6.1, 6.3, and 6.4 of Chapter 6 of this thesis are, respectively, published in:
Tong KJ, Lo N, Ho SYW (2016) Reconstructing evolutionary timescales using phylogenomics. Zoological Systematics, 41, 343–351;
Tong KJ, Duchêne DA, Duchêne S, Geoghegan JL, Ho SYW (submitted) A comparison of methods for estimating substitution rates from ancient DNA sequence data. BMC Evolutionary Biology; and
Tong KJ, Duchêne S, Lo N, Ho SYW (2017) The impacts of drift and selection on genomic evolution in insects. PeerJ, 5, 3241.

The following chapter is unpublished:
Chapter 4: Simon Y. W. Ho, Sebastián Duchêne, and K. Jun Tong designed the study with input from Robert Lanfear. KJT collected the data with assistance from RL. KJT conducted the analyses.

In addition to the statements above, in cases where I am not the corresponding author of a published item, permission to include published material has been granted by the corresponding author.

K. Jun Tong

As supervisors for the candidature upon which this thesis is based, we confirm that the authorship attribution statements above are correct.
Nathan Lo
Simon YW Ho

Table of contents

Statement of originality
Authority of access
Authorship attribution statement
Table of contents
List of figures
List of tables
Acknowledgements
Preface
Abstract
Chapter 1 – General introduction
  1.1 The molecular clock and evolutionary history
  1.2 Nucleotide sequences and clock calibrations
  1.3 Evolutionary rate variation
  1.4 New approaches for analysing genome-scale data
  1.5 Insights from phylogenomic dating: Mammals
  1.6 Insights from phylogenomic dating: Birds
  1.7 Insights from phylogenomic dating: Insects
  1.8 Studying evolutionary history and molecular evolution using phylogenetics
Chapter 2 – Investigating the evolutionary timescale of insect evolution
  2.1 Introduction
  2.2 Methods
  2.3 Results and discussion
Chapter 3 – A comparison of methods for estimating substitution rates from ancient DNA sequence data
  3.1 Introduction
  3.2 Methods
    3.2.1 Simulations
    3.2.2 Mitochondrial genomes
  3.3 Results
    3.3.1 Simulations
    3.3.2 Mitochondrial genomes
  3.4 Discussion
Chapter 4 – The impact of unlinking branch lengths in phylogenetic analyses of multilocus data sets
  4.1 Introduction
  4.2 Materials and Methods
  4.3 Results
  4.4 Discussion
Chapter 5 – The impacts of drift and selection on genomic evolution in insects
  5.1 Introduction
  5.2 Methods
  5.3 Results
  5.4 Discussion
    5.4.1 Evolutionary rate informs structure of branch-length patterns
    5.4.2 Enzyme function and branch-length patterns
    5.4.3 Implications for phylogenomic analysis
  5.6 Conclusions
Chapter 6 – General discussion
  6.1 The phylogenomic age
  6.2 Estimating the insect evolutionary timescale
  6.3 Phylogenetic methods and models
  6.4 Using phylogenomic data to understand genomic evolution
  6.5 Concluding remarks and future directions
References

List of figures

Figure 1.1. An illustration of gene effects, lineage effects, and their interactions (residual effects).
Figure 1.2. Phylogenomic estimates of the crown ages of major groups within mammals, birds, and insects.
Figure 2.1. Bayesian estimates of the insect evolutionary timescale using three different calibration schemes.
Figure 2.2. Bayesian estimate of the insect evolutionary timescale using 37 fossil minimum age constraints and an additional constraint in the polyneopteran clade.
Figure 3.1. A scheme outlining simulations of sequence evolution representing different combinations of mean rate, rate variation among lineages, and phylo-temporal clustering.
Figure 3.2. Error in estimates of substitution rates from sequence data produced under 12 different simulation conditions, representing different combinations of substitution rate, rate variation among lineages, and phylo-temporal clustering.
Figure 3.3. Pairwise comparisons of rate estimates from regression of root-to-tip distances in TempEst, least-squares dating in LSD, and Bayesian inference in BEAST.
Figure 3.4a. Precision of Bayesian estimates of substitution rates across 12 simulation conditions.
Figure 3.4b. Relationship between phylogenetic stemminess (ratio of internal to terminal branch lengths) and error in the Bayesian median rate estimates.
Figure 3.5. Relationships between phylogenetic stemminess and error in rate estimates using regression of root-to-tip distances in TempEst for 12 simulation treatments.
Figure 3.6. Relationships between phylogenetic stemminess and error in rate estimates using least-squares dating in LSD for 12 simulation treatments.
Figure 3.7. Relationships between phylogenetic stemminess and error in rate estimates using Bayesian inference in BEAST for 12 simulation treatments.
Figure 3.8. Estimates of substitution rates from time-structured mitogenomic data sets from six vertebrate species.
Figure 4.1. Differences between the best AICc score and the AICc score for a particular data-partitioning treatment.
Figure 5.1. A diagram illustrating the relationship between evolutionary rate and phylogenetic branch-length clusters.
Figure 5.2. A positive relationship between the ratio of estimated radical to non-radical amino acid substitutions (Kr/Kc) and evolutionary rate, as measured by gene-tree length.
Figure 5.3. A negative relationship between gene-tree length and the number of branch-length patterns.
Figure 5.4. Further analyses illustrating the relationship between evolutionary rate and phylogenetic branch-length clusters for additional data sets.
Figure 5.5. Results from simulated data show that there is no relationship between evolutionary rate and the number of branch-length patterns.
Figure 5.6. A relationship between EC number and clusters of branch-length patterns.

List of tables

Table 3.1. Details of six time-structured mitogenomic data sets from vertebrates analysed in Chapter 3.
Table 3.2. Results from analyses of six time-structured mitogenomic data sets.
Table 4.1. Details of 25 multilocus data sets analysed in Chapter 4.
Table 4.2. Details of data-partitioning treatments applied to 25 multilocus data sets.
Table 4.3. Topological distances from the tree inferred using the partitioning scheme with the lowest AICc score for 25 data sets.

Acknowledgements

I thank my supervisors, Simon Ho and Nathan Lo. I am indebted to them for their generous supervision, their endless patience, and their kind friendship. They have taught me to look deeper, stretch further, and try harder. I have disappointed them on many an occasion, but their grace is such that they have always supported and encouraged me. They pursue excellence with such ardour that working with them is both a challenge and a pleasure.

I also thank Sebastián Duchêne and David Duchêne for their invaluable contributions to my work. Their enthusiasm for research is infectious. They make long days feel brief, and they reduce difficult problems to fun puzzles. They have already made blistering starts to their careers, and I wish them every success. The best ideas in this thesis belong to these four.

I thank the MEEP lab for having me.