Evaluating, Accelerating and Extending the Multispecies Coalescent Model of Evolution

Evaluating, Accelerating and Extending the Multispecies Coalescent Model of Evolution

Evaluating, Accelerating and Extending the Multispecies Coalescent Model of Evolution Huw Alexander Ogilvie December 2017 A thesis submitted for the degree of Doctor of Philosophy of The Australian National University. © Copyright by Huw Alexander Ogilvie 2017 All Rights Reserved Declaration The length of this thesis is 30,924 words, exclusive of footnotes, tables, figures, maps, bibli- ographies and appendices. The abstract, introduction and conclusion of the thesis are my own original work, with input and advice from my chair supervisor, Professor Craig Moritz. Each chapter was based on research conducted jointly with others, and all chapters have been published in or submitted to peer-reviewed journals. Chapter 1 was published in 2016 in Systematic Biology as “Computational Performance and Statistical Accuracy of *BEAST and Comparisons with Other Methods” (volume 65, is- sue 3, pages 381–396). The authors are Huw A. Ogilvie, Joseph Heled, Dong Xie and Alexei J. Drummond. DX and AJD designed, conducted, and analysed the results of Experiment 1. JH and HAO analysed the results of Experiment 2, which was designed and conducted by JH. HAO designed, conducted, and analysed the results of Experiment 3. All authors contributed to the writing of the manuscript. Chapter 2 was published in 2017 in Molecular Biology and Evolution as “StarBEAST2 Brings Faster Species Tree Inference and Accurate Estimates of Substitution Rates” (vol- ume 34, issue 8, pages 2101–2114). The authors are Huw A. Ogilvie, Remco R. Bouckaert and Alexei J. Drummond. HAO and RRB wrote the StarBEAST2 software. HAO designed, conducted and analysed the results of all experiments with advice and input from AJD. The manuscript was written by HAO with advice and input from AJD. i Chapter 3 was published online in 2017 in Molecular Biology and Evolution as “Bayesian Inference of Species Networks from Multilocus Sequence Data”. The authors are Chi Zhang, Huw A. Ogilvie, Alexei J. Drummond and Tanja Stadler. The SpeciesNetwork software was written by HAO and CZ with input from AJD and TS. The birth-hybridization process was derived by TS and CZ. CZ designed, conducted and analysed the results of all experiments with input from HAO and AJD. The manuscript was written by CZ and HAO with input from TS and AJD. Chapter 4 has been submitted to Systematic Biology as “Inferring Species Trees Using Inte- grative Models of Species Evolution”. The authors are Huw A. Ogilvie, Timothy G. Vaughan, Nicholas J. Matzke, Graham J. Slater, Tanja Stadler, David Welch and Alexei J. Drummond. The integrative model of species evolution was formulated by AJD, TS and HAO. Software to support the model was developed by HAO with input from AJD. TGV designed, conducted and analysed the results of experiments to test the correctness of the implementation, with input from HAO. HAO designed, conducted and analysed the results of all other experiments with input from AJD, NJM and GJS. The manuscript with written by HAO with input from all other authors, and parts of the introduction were based on a grant application authored by and awarded to AJD, NJM, TS, TGV and DW. Huw Alexander Ogilvie, December 15th 2017 ii Acknowledgments This thesis is large and contains multitudes. By that I mean although I am the nominal author and wrote much of the text and code, it is the product of everyone who has supported me. Of course I rst have to thank my doctoral committee of Craig Moritz, Alexei Drummond and Jason Bragg. Each of them believed in me, and I hope I have absorbed some of their copious knowledge and wisdom in the process of completing my PhD. Thanks also to Tanja Stadler who, although not an ocial member of my committee, oversaw a good chunk of my PhD research. Prior to my PhD my honours supervisors Michael Djordjevic and Nijat Imin, and my honours examiners John Rathjen and Ulrike Mathesius, taught me much about research and academia — preparing me for what was to come. For the last ve years I have been part of a community of honours students, PhD students, and postdocs, and it was understood by everyone that we are all in this together. In alpha- betical order, the following people from this community made life and work much easier for me: Adam, Alex P, Alex G, Ana, Ariel, Armand, Bec, Bo, Chi, Christiana, Connie, Damien, Dan, David, Duncan, Eli, Fábio, Ian, Izzy, Jack, Jared, Jessica, Jessie, Jo, Joëlle, John, Jordan, Josh, Jūlija, Karen, Kelly, Kevin, Lauren, Leo, Liam, Louisa, Marta, Meenu, Megan, Michael T, Mitzy, Mozes, Nadia, Nick, Nicola, Ollie, Paul, Regina, Sally, Sasha, Stephen, Susan, Tim, Vero, Veronica, Veronika, Viri, and Xia. If anyone is missing from that list, it is because my memory is temporarily out-of-order as a side efect of writing a thesis. iii My parents, Teresa and Ian, my late grandparents, Graeme and Pix, and my brother Martin made me who I am today. My grandfather was a research scientist at CSIRO, and my admira- tion for him certainly predisposed me towards a career in science. In addition to my family and academic buddies, thanks to everyone I know in a non-science, non-blood-relative capacity (although one of them has a PhD, it’s in evolutionary compu- tation and not computational evolution). Especially Rico, Shan Shan, He Chuan, Robbie, John, Ben, Nick, Dzung, Darryl, Yin, Damien, Ron and Miri. Benji, the best dog in the world, helped too. I am legally required to acknowledge that “This research is supported by an Australian Government Research Training Program (RTP) Scholarship” iv Abstract So much research builds on evolutionary histories of species and genes. They are used in genomics to infer synteny, in ecology to describe and predict biodiversity, and in molecular biology to transfer knowledge acquired in model organisms to humans and crops. Beyond downstream applications, expanding our knowledge of life on Earth is important in its own right. From Naturalis Historia to On the Origin of Species, the acquisition of this knowledge has been a part of human development. Evolutionary histories are commonly represented as trees, where a common ancestor pro- gressively splits into descendant species or alleles. Time trees add more information by using height to represent genetic distance or elapsed time. Species and gene trees can be inferred from molecular sequences using methods which are explicitly model-based, or implicitly as- sume or are statistically consistent with a particular model of evolution. One such model, the multispecies coalescent (MSC), is the topic of my thesis. Under this model, separate trees are inferred for the species history and for each gene’s history. Gene trees are embedded within the species tree according to a coalescent process. Researchers often avoid the MSC when reconstructing time trees because of claims that available implementations are too computationally demanding. Instead, the species history is inferred using a single tree by concatenating the sequences from each gene. I began my thesis research by evaluating the effect of this approximation. In a realistic simulation based on pa- rameters inferred from empirical data, concatenation was grossly inaccurate, especially when estimating recent species divergence times. In a later simulation study I demonstrated that when using concatenation, credible intervals often excluded the true values. v To address reluctance towards using the MSC, I developed a faster implementation of the model. “StarBEAST2” is a Markov chain Monte Carlo (MCMC) method, meaning it char- acterizes the probability distribution over trees by randomly walking the parameter space. I improved computational performance by developing more efficient proposals used to traverse the space, and reducing the number of parameters in the model through analytical integration of population sizes. Despite its sophistication, the MSC has theoretical limitations. One is that the substitution rate is assumed to stay constant, or uncorrelated between lineages of different genes. How- ever substitution rates do vary and are associated with species traits like body size. I addressed this assumption in StarBEAST2 by extending the MSC to estimate substitution rates for each species. Another assumption is that genetic material cannot be transferred “horizontally”, but a more general model called the multispecies network coalescent (MSNC) permits introgres- sion of alleles across species boundaries. My collaborators and I have developed and evaluated an MCMC implementation of the the MSNC. My final thesis project was to combine the MSC with the fossilized birth-death (FBD) pro- cess, which models how species are fossilized and sampled through time. To demonstrate the utility of the FBD-MSC model, I used it to reconstruct the evolutionary history of Caninae (dogs and foxes) using fossil data and molecular sequences. vi Contents 0 Introduction 1 0.1 Inferring trees from data ............................ 2 0.2 Inferring species trees ............................. 4 0.3 Evaluating and accelerating the multispecies coalescent ............ 6 0.4 Extending the multispecies coalescent ..................... 8 1 Computational Performance and Statistical Accuracy of *BEAST and Comparisons with Other Methods 13 1.1 Introduction ................................. 14 1.2 Methods .................................... 17 1.3 Results .................................... 25 1.4 Discussion and conclusions .......................... 38 1.5 Supplementary information and data ..................... 46 1.6 Acknowledgements .............................

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    190 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us