UC Berkeley UC Berkeley Electronic Theses and Dissertations

Title: Using the Birth-Death Process to Infer Changes in the Pattern of Lineage Gain and Loss

Permalink: https://escholarship.org/uc/item/8wr120w2

Author: Hallinan, Nathaniel Malachi

Publication Date: 2011

Peer reviewed | Thesis/dissertation

Using the Birth-Death Process to Infer Changes in the Pattern of Lineage Gain and Loss

by Nathaniel Malachi Hallinan

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Integrative Biology in the Graduate Division of the University of California, Berkeley

Committee in charge: Professor David Lindberg, Chair; Professor John Huelsenbeck; Professor David Aldous

Fall 2011

Copyright 2011 by Nathaniel Malachi Hallinan

Abstract

The birth-death process has been used to study the evolution of a wide variety of biological entities from genes to species. Much recent work has turned to detecting changes in the patterns of lineage splitting by comparing data to birth-death models in which the parameters vary between lineages or over time. Here, I develop methods to investigate how the birth-death process varies under three very different circumstances: changes in the pattern of taxon diversification through time; the effect of whole genome duplications on the pattern of chromosome gain and loss; and changes in the pattern of gene gain and loss on branches of a taxon tree. For all three cases I apply my methods to some real data.
For the last fifteen years researchers have studied the distribution of branching times of a phylogeny of extant taxa in order to detect temporal changes in the process of diversification. Theoretical work on this subject has been based on different implementations of the birth-death process and has proceeded along three basic lines: the comparison of actual branching times to a birth-death process; the inference of the effects of different birth-death processes on the distribution of branching times; and the derivation of analytical results that describe various aspects of different birth-death processes. In chapter 2 I make contributions to all three lines of research for the reconstructed time variable birth-death process. Previous work had shown how to calculate the distributions of number of lineages and branching times for a reconstructed constant rate birth-death process that started with one or two reconstructed lineages at some time or ended with some number of lineages in the present. In chapter 2 I expand that work to include any time variable birth-death process that starts with any number of reconstructed lineages and/or ends with any number of reconstructed lineages at any time. I also introduce the discrete time birth-death process, which operates as an efficient and accurate numerical solution to any time variable birth-death process and allows for the analytical incorporation of sampling and mass extinctions. Furthermore, I show how to simulate random trees under any of these models.
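As a rough illustration of simulation under the simplest member of this family, the sketch below draws lineage counts through time for a constant rate birth-death process using a standard event-by-event (Gillespie-style) scheme. It is not the dissertation's method: it tracks the complete process rather than the reconstructed one, it records lineage counts rather than a tree, and the function name `simulate_lineage_counts` and its parameters are hypothetical names chosen for this example.

```python
import random

def simulate_lineage_counts(birth, death, t_max, n0=1, rng=None):
    """Simulate lineage counts under a constant rate birth-death process.

    birth and death are per-lineage rates (assumed birth + death > 0).
    Returns a list of (time, n_lineages) pairs, starting with (0.0, n0).
    """
    rng = rng or random.Random()
    t, n = 0.0, n0
    history = [(t, n)]
    while n > 0:
        # Waiting time to the next event is exponential with the total rate.
        t += rng.expovariate(n * (birth + death))
        if t >= t_max:
            break
        # The event is a birth with probability birth / (birth + death).
        if rng.random() < birth / (birth + death):
            n += 1
        else:
            n -= 1
        history.append((t, n))
    return history

if __name__ == "__main__":
    for time, count in simulate_lineage_counts(1.0, 0.5, 2.0, rng=random.Random(0)):
        print(f"{time:.3f}\t{count}")
```

A time variable version would need rates that depend on `t`, for example by thinning or by stepping through short intervals with piecewise constant rates; that extension is not shown here.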
In order to compare phylogenetic trees to these models, I use these methods to calculate two statistics that describe the fit of a set of branching times to any time variable birth-death model: the maximum likelihood, which can be compared to the distribution of the maximum likelihood for a random sample of trees or to the maximum likelihood of other birth-death models using the Akaike Information Criterion; and the Kolmogorov-Smirnov test, which is based on the fact that the branching times should be independently and identically distributed under many time variable birth-death models. I also demonstrate two new methods for visualizing the distribution of branching times: the lineage through time null plot uses a heat map to show the distribution of the number of lineages at different times; and the waiting time null plot does the same for waiting times between branching times. These plots can be used either to see how different time variable birth-death processes affect these distributions or to compare a data set to any time variable birth-death process. I use all these methods to analyze two data sets of reconstructed taxon branching times.

The study of paleopolyploidies requires the comparison of multiple whole genome sequences. If researchers could identify the branch of a phylogeny on which a whole genome duplication occurred before sequencing the genomes of multiple taxa, then they could select taxa that would give them a better picture of that whole genome duplication. In chapter 3 I describe a likelihood model in which the number of chromosomes in a genome evolves according to a Markov process with three stochastic rates: a rate of chromosome duplication and a rate of chromosome loss that are proportional to the number of chromosomes in the genome; and a rate of whole genome duplication that is constant. I implemented software that calculates the maximum likelihood under this model for a phylogeny of taxa in which the chromosome counts are known.
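The three-rate chromosome model can be sketched as a continuous-time Markov chain on chromosome counts. The code below is a minimal illustration under stated assumptions, not the software described in the dissertation: the state space is truncated at a hypothetical maximum count `max_n`, a genome with one chromosome is assumed unable to lose it, transition probabilities along a branch are approximated by uniformization (a numerical method swapped in for this example), and the names `rate_matrix` and `transition_probs` are invented here.

```python
import math

def rate_matrix(gain, loss, wgd, max_n):
    """Rate matrix Q, where q[i][j] is the rate from i+1 to j+1 chromosomes."""
    q = [[0.0] * max_n for _ in range(max_n)]
    for n in range(1, max_n + 1):
        i = n - 1
        if n + 1 <= max_n:
            q[i][i + 1] += gain * n   # single chromosome duplication, rate proportional to n
        if n > 1:
            q[i][i - 1] += loss * n   # single chromosome loss, rate proportional to n
        if 2 * n <= max_n:
            q[i][2 * n - 1] += wgd    # whole genome duplication, constant rate
        q[i][i] = -sum(q[i])          # rows of a rate matrix sum to zero
    return q

def transition_probs(q, t, tol=1e-12, max_terms=10000):
    """Approximate exp(Q t) by uniformization; entry [i][j] is P(i+1 -> j+1 in time t)."""
    size = len(q)
    identity = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    mu = max(-q[i][i] for i in range(size))
    if mu <= 0.0 or t <= 0.0:
        return identity
    # R = I + Q / mu is a stochastic matrix because mu bounds every exit rate.
    r = [[identity[i][j] + q[i][j] / mu for j in range(size)] for i in range(size)]
    weight = math.exp(-mu * t)        # Poisson(mu * t) probability of zero jumps
    covered = weight                  # total Poisson mass summed so far
    power = identity                  # R raised to the current power
    probs = [[weight * x for x in row] for row in identity]
    for k in range(1, max_terms):
        if 1.0 - covered <= tol:
            break
        power = [[sum(power[i][m] * r[m][j] for m in range(size))
                  for j in range(size)] for i in range(size)]
        weight *= mu * t / k
        covered += weight
        for i in range(size):
            for j in range(size):
                probs[i][j] += weight * power[i][j]
    return probs
```

For example, `transition_probs(rate_matrix(0.1, 0.2, 0.05, 20), 0.1)[4][9]` approximates the probability that a genome with 5 chromosomes has 10 after a branch of length 0.1. A full likelihood for a phylogeny would combine such matrices across branches, for instance with a pruning algorithm, which is not shown.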
I compared the maximum likelihoods of a model in which the genome duplication rate varies to one in which it is fixed at zero using the Akaike Information Criterion, in order to determine if a model with whole genome duplications is a good fit for the data. Once it has been determined that the data does fit the model, we infer the phylogenetic position of paleopolyploidies by using this model to calculate the posterior probability that a whole genome duplication occurred on each branch of the taxon tree. I applied this model to a phylogeny of 125 molluscan taxa and inferred three places on that phylogeny where it is very likely that a whole genome duplication occurred: a single branch within the Hypsogastropoda; one of two branches at the base of the Stylommatophora; and one or two branches near the base of Cephalopoda.

Thanks to the wealth of readily available comparative genomic data, it has become apparent that gene family expansion and contraction is critical for the evolution of organisms. Several researchers have developed likelihood methods that use counts of genes in gene families from a number of taxa to deduce on which branches of the phylogenetic tree there has been an unusual amount of gene duplication or gene loss in that gene family. Gene family counts are readily available, but there is a great deal of information in the gene family tree that is unavailable when using gene counts alone. In chapter 4, I develop a method that uses the gene family tree to infer changes in the process of gene gain and loss on a taxonomic tree. This method relies on calculating the probability of a gene tree given a taxon tree and a set of birth-death parameters by which that gene tree evolves on the taxon tree. I use a reversible-jump MCMC to sample from the joint posterior distribution of a set of birth-death parameters and assignments of those parameters to the branches of a taxon tree given a gene tree and a taxon tree.
Different assignments are compared using Bayes factors. I use simulations to show that this method has much more power than a method which relies only on counts of gene family members to determine if a gene family evolved by a different process on a pair of taxon branches, and whether that difference is a consequence of differences in the birth rate or the death rate. In section 4.5 I expand my method to include uncertainty in the gene tree topology, by using a set of gene alignments as my data rather than the fully resolved gene tree. Under this implementation I calculate the probability of those sequences given the gene tree, in addition to the probability of the gene tree given the taxon tree. I modify the reversible-jump MCMC so that it now samples from the posterior distribution of the nucleotide evolution parameters and the gene trees, in addition to the birth-death parameters and their assignments to the branches of the taxon tree. I demonstrate the use of this method on two real gene families found in the Bilateria. I found that a clade of 46 protein tyrosine kinase genes from three taxa is characterized by an increase in the gene duplication rate on the branch leading to Caenorhabditis elegans. Furthermore, a separate analysis of all the posterior Hox genes from nine taxa implies that their evolution has been characterized by massive gene loss throughout the Bilateria, with a lower rate of turnover in the chordates and at the base of the deuterostomes than is found in the protostomes or in the echinoderms.

Contents

1 Variation in the Process of Lineage Gain and Loss
  1.1 The Birth-Death Process
    1.1.1 The Birth-Death Process and Macroevolution
    1.1.2 The Birth-Death Process and Gene Family Evolution
  1.2 Variation in Evolutionary Rates
    1.2.1 Variable Rates of Taxon Diversification
    1.2.2 Variable Rates of Gene Family Diversification
  1.3 Summary of the Chapters
2 The Reconstructed Time Variable Birth-Death Process
  2.1 Introduction
  2.2 Time Variable Birth-Death Process
    2.2.1 Definitions
    2.2.2 The Birth-Death Process Divided into Time Intervals
    2.2.3 Sampling and Mass Extinctions
    2.2.4 Discrete Time Birth-Death Process
