Downloaded the Complete Data on the Fossil When They Were Not Identical
Total Page:16
File Type:pdf, Size:1020Kb
bioRxiv preprint doi: https://doi.org/10.1101/128314; this version posted April 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. i \Oxford-style" | 2017/4/18 | 19:19 | page 1 | #1 i i i A posteriori evaluation of molecular divergence dates using empirical estimates of time-heterogeneous fossilization rates Simon Gunkel,∗;1 Jes Rust,1 Torsten Wappler,2 Christoph Mayer,3 Oliver Niehuis,4 and Bernhard Misof∗;4 1Steinmann Institut f¨urGeologie, Mineralogie und Pal¨aontologie, Nussallee 8, 53115 Bonn, Germany 2Hessisches Landesmuseum, Friedensplatz 1, 64283 Darmstadt, Germany 3Zoologisches Forschungsmuseum Alexander Koenig, Zentrum f¨urmolekulare Biodiversit¨atsforschung, Adenauerallee 160, 53113 Bonn, Germany 4Albert-Ludwigs-Universit¨atFreiburg Institut f¨urBiologie I (Zoologie) Lehrstuhl Okologie,¨ Evolutionsbiologie und Biodiversit¨at,Hauptstraße 1, 79104 Freiburg, Germany ∗Corresponding author: E-mail: [email protected]; [email protected] Associate Editor: Abstract The application of molecular clock concepts in phylogenetics permits estimating the divergence times of clades with an incomplete fossil record. However, the reliability of this approach is disputed, because the resulting estimates are often inconsistent with different sets of fossils and other parameters (clock models and prior settings) in the analyses. Here, we present the λ statistic, a likelihood approach for a posteriori evaluating the reliability of estimated divergence times. The λ statistic is based on empirically derived fossilization rates and evaluates the fit of estimated divergence times to the fossil record. We tested the performance of this measure with simulated data sets. Furthermore, we applied it to the estimated divergence times of (i) Clavigeritae beetles of the family Staphylinidae and (ii) all extant insect Article orders. The reanalyzed beetle data supports the originally published results, but shows that several fossil calibrations used do not increase the reliability of the divergence time estimates. Analyses of estimated inter-ordinal insect divergences indicate that uniform priors with soft bounds marginally outperform log-normal priors on node ages. Furthermore, a posteriori evaluation of the original published analysis indicates that several inter-ordinal divergence estimates might be too young. The λ statistic allows the comparative evaluation of any clade divergence estimate derived from different calibration approaches. Consequently, the application of different algorithms, software tools, and calibration schemes can be empirically assessed. Key words: molecular clock, fossil calibration, paleobigology, node ages Introduction Molecular clock analyses can be used to infer the geological origin of the most recent common ancestor (MRCA) of any two extant species or 1 i i i i bioRxiv preprint doi: https://doi.org/10.1101/128314; this version posted April 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. i \Oxford-style" | 2017/4/18 | 19:19 | page 2 | #2 i i i clades, even with an incomplete fossil record. as a range of either explicitly or implicitly assumed This is of course very appealing to a broad prior probability distributions of node ages. range of scientists, including molecular biologists, Consequently, the application of relaxed molecular systematists, and palaeontologists, who have clocks became a standard procedure. The most extensively applied the molecular clock concept in commonly used software tools implementing their studies. The concept of the molecular clock relaxed molecular clock methods are BEAST was initially established empirically (Zuckerkandl (Drummond et al., 2007), MCMCtree (Yang, and Pauling, 1962). Its theoretical foundations 2007), and dpp-div (Heath et al., 2012). All three are based on the neutral theory of molecular software packages rely on a Bayesian approach to evolution (Kimura, 1968; King and Jukes, 1969). estimate and calibrate divergence times among Thus, the similarity of two given orthologous clades. The reliability of the inferred estimates DNA sequences is expected to decay with an using a Bayesian approach strongly depends approximately constant rate over time. DNA among other factors on a correct phylogenetic sequence divergence can therefore be used to infer assignment of fossils (Parham et al., 2012) and relative species divergence times. An absolute proper choice of node age priors (Inoue et al., time scale is achieved by calibrating the relative 2010; Warnock et al., 2012). Explicit a priori age of a node with the absolute age of a assignment of fossils and the choice of node fossil associated with the node. However, this age priors is not straightforward and is typically simple approach has been subject to substantial done in a highly subjective manner, resulting in criticism, because (i) calibration errors can substantial debate over the interpretation and accumulate (e.g.(Graur and Martin, 2004)) and utility of the results. At the same time, differences (ii) substitution rates are not constant over time in the specification of analysis parameters (Drummond, 2006). We now know that these are typically also results in different divergence time biologically unrealistic assumptions. This problem estimates. The search for methods that can inform has been partially solved by the development parameter choice is consequently of fundamental and subsequent application of so-called relaxed importance for further improving molecular tree clock methods, which do not assume a constant calibrations. While currently accepted practice, substitution rate among taxa and use multiple arbitrary choice of node age priors can drastically fossil calibrations. Relaxed molecular clock models affect results (Inoue et al., 2010; Warnock et al., not only offer an improved fit to the empirical 2012). The recently introduced fdpp-div (Heath data, but also are capable of accommodating et al., 2014) as well as other implementations seemingly contradictory calibration points, as well of the fossilized birth-death process (Zhang 2 i i i i bioRxiv preprint doi: https://doi.org/10.1101/128314; this version posted April 18, 2017. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. i \Oxford-style" | 2017/4/18 | 19:19 | page 3 | #3 i i i et al., 2016) seek to substitute the arbitrary ABCDABCDABCD choice of fossilization rate distributions with the application of a model that estimates rates of ● ● ● speciation, extinction, and fossilization as well as ● ● ● ● ● ● the proportion of sampled extant species directly ● ● ● from the data. The approach assumes a constant ((A,B),(C,D)) fossilization rate within a given taxon across a) b) c) FIG. 1. Graphical illustration of the definition of the time. This, however, is unrealistic, because the terms ghost lineage and invisible lineage. a) Earliest fossil documentation of taxa A, B, C, and D. b) Ghost lineages fossil record documents that fossilization rates (dashed lines) inferred from considering the fossil data in conjunction with the phylogenetic relationship between are not constant over time and across taxa. the taxa. c) Invisible lineages (dotted lines) inferred from considering the fossil data, the phylogenetic relationship between the taxa, and a molecular dating hypothesis on Thus, fdpp-div (Heath et al., 2014) and other the age of nodes. approaches (Zhang et al., 2016) still rely on the notion of an invisible lineage as occupying a palaeontologically unrealistic assumption. In the time interval over which the stratigraphic this study, we introduce a new a posteriori range of a clade is extended by molecular clock approach which scores dated phylogenies for dating. Invisible lineages differ from ghost lineages their congruence with the fossil record using an in requiring data beyond the ages of fossils empirically derived time-dependent fossilization and their phylogenetic position. However, both rate. The approach allows a critical a posteriori ghost lineages and invisible lineages represent appraisal of dated phylogenies and in consequence time intervals during which a taxon existed, the effect of node age priors on estimated but left no known fossil documentation. This divergence dates. absence of documentation is caused in both cases New Approaches by one of three possibilities: (i) no fossil has It has been recognized in paleontology that the been preserved, (ii) a fossil has been preserved, stratigraphic range of a taxon can be extended but relevant apomorphic traits have not been by taking into account evidence from pylogenetic preserved, (iii) a fossil has been preserved, studies. This resulted in the concept of a ghost but relevant apomorphic traits had not yet lineage, which is defined as the geological time evolved. The factors that influence (iiii) can be interval during which a clade is not documented broadly categorized into taxon-specific (intrinsic) by a fossil, but its is postulated based on the and taxon-unspecific (extrinsic) factors. Taxon- existence of a fossil from the clades sister taxon specific factors include all properties of organisms (Norell and Novacek, 1992) (Figure 1). Analogous that affect the probability of any given individual to the notion of a ghost lineage, we introduce of a taxon to be preserved