Dating Phylogenies with Fossils

Fredrik Ronquist, Dept. Bioinformatics & Genetics, Swedish Museum of Natural History

Friday 4 October 2013

Node dating

[Figure: dated tree of extant taxa A to H. Legend for node dates: fixed; drawn from prior; unconstrained. Labels: molecular analysis of extant taxa using clock model; calibration points; prior on root age.]

Fossil record is used to derive priors on calibration points
Calibration nodes are fixed; we can integrate out uncertainty concerning other nodes

Potential Shortcomings

• Phylogeny is not known with certainty, but we have to fix calibration nodes
• Unclear how to derive appropriate calibration distributions
• Fossil placement is often uncertain; unclear if this can be accommodated in the calibration distributions
• Does not incorporate all the data in the analysis
• You have to summarize many fossils in a few calibration points

Total-evidence dating

[Figure: dated tree with extant taxa A to H (black) and fossils I to L (red). Legend for node dates: fixed; drawn from prior; unconstrained. Labels: branch length based on morphology; prior on root age.]

Simultaneous analysis of extant taxa (black) and fossils (red)
Molecular data provide backbone for extant taxa
Morphological data help place fossils
Integrate out uncertainty in all parameters (topology, placement of fossils etc.)

Early radiation of the

• Documented by a number of incomplete impression fossils that are difficult to place phylogenetically
• 45 fossil and 68 extant taxa
• 343 morphological characters
• Fossil completeness 4–20%
• 5 kb sequence data from 7 markers
• Phylogenetic model:
  • Mk model of morphology
  • Codon-site-partitioned GTR+I+G : SYM+I+G
  • Non-clock, strict clock and relaxed clocks

Morphological models

• Variable state space
• Arbitrary state labels
• Sampling (ascertainment bias): only variable characters observed
• Different models for ordered and unordered characters
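A minimal sketch of the Mk model for an unordered character with k states, to make the points about state space and ascertainment bias concrete. It assumes equal rates between all states, equal state frequencies of 1/k, and a branch length v measured in expected changes per character. The function names are hypothetical, and the two-tip case is used only because it keeps the arithmetic visible.

```python
import math


def mk_transition_prob(k: int, v: float, same: bool) -> float:
    """Probability of ending in the same (or one specific different) state
    after a branch of length v under the Mk model with k states."""
    decay = math.exp(-k * v / (k - 1))
    if same:
        return 1.0 / k + (k - 1) / k * decay
    return 1.0 / k - decay / k


def two_tip_pattern_prob(k: int, v: float, x: int, y: int) -> float:
    """Probability of states x and y at two tips separated by path length v.
    State labels are arbitrary: only x == y versus x != y matters."""
    return (1.0 / k) * mk_transition_prob(k, v, x == y)


def variable_only_pattern_prob(k: int, v: float, x: int, y: int) -> float:
    """Pattern probability conditioned on the character being variable,
    a common correction when constant characters are never scored."""
    if x == y:
        raise ValueError("constant patterns are unobservable under this coding")
    p_constant = sum(two_tip_pattern_prob(k, v, s, s) for s in range(k))
    return two_tip_pattern_prob(k, v, x, y) / (1.0 - p_constant)


if __name__ == "__main__":
    # A 3-state character on a path of 0.5 expected changes.
    print(two_tip_pattern_prob(3, 0.5, 0, 1))
    print(variable_only_pattern_prob(3, 0.5, 0, 1))
```

Without the conditioning step, the unobserved constant characters would make the data look faster-evolving than they are, which inflates branch length estimates.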

Relaxed clock models

• Thorne–Kishino 2002 (TK02) model: continuous autocorrelated model
• Compound Poisson process (CPP) model: discrete autocorrelated model
• Independent gamma rates (IGR) model: uncorrelated continuous model

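CPP Relaxed Clock

[Figure: a molecular clock tree for taxa A to D, with branches measured in time (time lengths) and rate multipliers m1 to m7 placed along it, shown beside the corresponding non-clock tree for A to D, with branches measured in evolutionary change (effective branch lengths, also labeled evolutionary branch lengths).]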
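The relation between time lengths and effective branch lengths can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the analyses: a branch inherits a rate at its older end, each rate-change event on the branch multiplies the current rate by its multiplier, and the effective length is the sum of time multiplied by rate over the resulting segments. The function name, the event representation, and the `clock_rate` parameter are all hypothetical.

```python
from typing import List, Tuple


def effective_branch_length(
    time_length: float,
    start_rate: float,
    events: List[Tuple[float, float]],
    clock_rate: float = 1.0,
) -> Tuple[float, float]:
    """Convert a time branch length into an effective branch length.

    events holds (position, multiplier) pairs, with position measured in
    time units from the older end of the branch (0 <= position <= time_length).
    Returns the effective length and the rate at the younger end, which the
    descendant branches would inherit.
    """
    rate = start_rate
    elapsed = 0.0
    total = 0.0
    for position, multiplier in sorted(events):
        if not 0.0 <= position <= time_length:
            raise ValueError("event lies outside the branch")
        total += (position - elapsed) * rate  # segment before the event
        rate *= multiplier                    # the event changes the rate
        elapsed = position
    total += (time_length - elapsed) * rate   # segment after the last event
    return clock_rate * total, rate


if __name__ == "__main__":
    # One event that doubles the rate 4 time units into a 10-unit branch.
    length, end_rate = effective_branch_length(10.0, 1.0, [(4.0, 2.0)])
    print(length, end_rate)
```

Because descendants inherit the rate at the younger end, rates on neighboring branches are correlated, which is why the CPP model counts as autocorrelated.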

Tree model for total-evidence dating

• Coalescent model: not a relevant model for higher-level phylogenies
• Birth-death model: problem of modeling speciation, extinction, sampling and fossilization
• Uniform model: can be extended to serially sampled trees

Uniform prior on serially sampled trees

[Figure: four panels, (a) to (d), each showing a tree on taxa A, B, C and D.]

Two approaches to dating

• Node dating
  • 68 extant taxa
  • Seven Hymenoptera calibration points derived from 45 fossils (C-I)
  • Two outgroup calibration points (A-B)
  • Offset exponential priors, with the mean set to the minimum age of the next more inclusive calibration point
  • Calibrations set together with the leading paleontological and morphological experts on the Hymenoptera
• Total-evidence dating
  • 68 extant + 45 fossil taxa in simultaneous analysis
  • Position of fossil taxa determined by morphological characters (343 characters in total, 4–20% coded for fossils)
  • Extant phylogeny mostly determined by molecular characters (5 kb sequence data from 7 markers)
  • All calibration constraints removed except the two outgroup calibrations
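A short sketch of an offset exponential calibration prior, parameterized by a minimum age and a mean age as in the node-dating setup. The example values (minimum 235 Ma, mean 302 Ma) are those listed for the Hymenoptera calibration in Table 2; the function names are hypothetical.

```python
import math


def offset_exponential_pdf(age: float, min_age: float, mean_age: float) -> float:
    """Density of an exponential distribution shifted to start at min_age,
    with overall mean mean_age. Ages younger than min_age have density 0."""
    if mean_age <= min_age:
        raise ValueError("mean_age must be older than min_age")
    if age < min_age:
        return 0.0
    rate = 1.0 / (mean_age - min_age)
    return rate * math.exp(-rate * (age - min_age))


def offset_exponential_quantile(p: float, min_age: float, mean_age: float) -> float:
    """Age below which a fraction p of the prior mass lies (0 <= p < 1)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return min_age - (mean_age - min_age) * math.log(1.0 - p)


if __name__ == "__main__":
    min_age, mean_age = 235.0, 302.0  # Hymenoptera calibration, in Ma
    print(offset_exponential_quantile(0.5, min_age, mean_age))   # prior median
    print(offset_exponential_quantile(0.95, min_age, mean_age))  # upper 95% bound
```

The prior has a hard minimum at the fossil age and a long tail toward older ages, so the choice of mean controls how far before the oldest fossil the node is allowed to be.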

Non-clock tree retrieves expected relationships

[Figure: non-clock tree of the outgroups (Orthoptera, Paraneoptera, Lepidoptera, Mecoptera, Coleoptera Adephaga, Coleoptera Polyphaga, Neuroptera, Raphidioptera) and the extant Hymenoptera, with higher-level labels including Xyeloidea, Xyelidae, Unicalcarida, Vespina, Xiphydrioidea and Orussoidea. Calibration clades are marked A* to I*. Scale bar: 0.1 substitutions/site.]

• Rate deceleration in Xyelidae
• A - I were used in clock analyses as calibration points (all clades well supported)

Comparing strict clock to non-clock branch lengths sampled from the non-clock topology

[Figure: two plots against strict-clock branch length (x, 0.00 to 0.35). Panel "Correlation": non-clock branch length (y). Panel "Squared deviation": squared deviation of non-clock branch length. Xyelidae is labeled in both panels. Annotated line: y = 0.01867x.]
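The annotated line y = 0.01867x passes through the origin. The slides do not say how it was fitted; a least-squares fit constrained through the origin is one common choice and is sketched below. Treating the squared deviation as the squared difference between non-clock and strict-clock length is also an assumption, as are the function names and the toy numbers.

```python
from typing import List, Sequence


def squared_deviations(strict: Sequence[float], nonclock: Sequence[float]) -> List[float]:
    """Squared difference between each non-clock and strict-clock branch length
    (an assumed definition of the 'squared deviation' axis)."""
    if len(strict) != len(nonclock):
        raise ValueError("inputs must have the same length")
    return [(y - x) ** 2 for x, y in zip(strict, nonclock)]


def slope_through_origin(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Least-squares slope b for the model y = b * x (no intercept)."""
    if len(xs) != len(ys):
        raise ValueError("inputs must have the same length")
    sxx = sum(x * x for x in xs)
    if sxx == 0.0:
        raise ValueError("all x values are zero")
    return sum(x * y for x, y in zip(xs, ys)) / sxx


if __name__ == "__main__":
    # Made-up branch lengths, for illustration only.
    strict = [0.02, 0.05, 0.10, 0.20]
    nonclock = [0.03, 0.04, 0.13, 0.17]
    deviations = squared_deviations(strict, nonclock)
    print(slope_through_origin(strict, deviations))
```

A slope of this kind describes how quickly the spread of non-clock lengths grows with branch length, which is the sort of quantity a prior on rate variance needs to match.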


Finding suitable priors for the CPP model

[Figure: Poisson rate – multiplier variance space. Axes: rate of CPP process (0.25 to 2.5) and variance of rate multiplier (0.22 to 2.12). Legend: deviation from observed variance, from < 0.01 to > 1.00.]

Slight but significant rate autocorrelation

[Figure: "Branch rate ratios". Frequency against log rate ratio (absolute value, 0 to 4) for random pairs and for parent-offspring pairs of branches.]
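The comparison behind the branch rate ratio histogram can be sketched as follows, assuming an estimated rate is available for every branch and the parent of each branch is known. If rates are autocorrelated, parent-offspring pairs should show smaller absolute log rate ratios than randomly chosen pairs. The data structures, function names and toy values are hypothetical, and how the significance test on the slide was done is not stated.

```python
import math
import random
from typing import Dict, List


def abs_log_ratio(a: float, b: float) -> float:
    """Absolute value of the log ratio of two positive rates."""
    return abs(math.log(a / b))


def parent_offspring_ratios(parent_of: Dict[str, str], rate: Dict[str, float]) -> List[float]:
    """One ratio per branch that has a parent branch."""
    return [abs_log_ratio(rate[child], rate[parent]) for child, parent in parent_of.items()]


def random_pair_ratios(rate: Dict[str, float], n_pairs: int, seed: int = 0) -> List[float]:
    """Ratios for n_pairs pairs of distinct branches drawn at random."""
    rng = random.Random(seed)
    branches = sorted(rate)
    ratios = []
    for _ in range(n_pairs):
        a, b = rng.sample(branches, 2)
        ratios.append(abs_log_ratio(rate[a], rate[b]))
    return ratios


if __name__ == "__main__":
    # Toy rates on a four-branch chain, for illustration only.
    rate = {"b1": 1.0, "b2": 1.2, "b3": 1.5, "b4": 4.0}
    parent_of = {"b2": "b1", "b3": "b2", "b4": "b3"}
    po = parent_offspring_ratios(parent_of, rate)
    rnd = random_pair_ratios(rate, 100)
    print(sum(po) / len(po), sum(rnd) / len(rnd))
```

Using the absolute value of the log ratio makes the measure symmetric, so a doubling and a halving of the rate count as equally large changes.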

Relaxed clock models may need rooting constraint

[Figure: six trees showing relationships among Outgroup Hemimetabola, Outgroup Holometabola, Xyeloidea, Tenthredinoidea, Pamphilioidea, Cephoidea, Siricoidea, Xiphydriidae, Orussidae and Apocrita. Panels: Morphology; Non-clock; Strict clock; Strict clock with rooting constraint; Relaxed clock; Relaxed clock with rooting constraint.]

Rate variation across the tree

[Figure: two trees of the outgroups and extant Hymenoptera, one per panel: IGR model; CPP model. Legend values: 0.1, 0.3, 0.6, 1.6, 3.7, 9.0. Clades are marked c, d, e and f in both panels.]

Majority rule consensus with fossils

[Figure: dated majority-rule consensus tree of the extant and fossil taxa, with support values on the branches. Legend: completeness of morphology scores (0%, 20%, 40%, 60%, 80%, 100%). Time axis: 350 to 0 million years before present.]

Uncertainty in the phylogenetic position varies across fossils

[Figure: four panels, one per fossil: Mesoxyela mesozoica; Sogutia liassica; Aulisca odontura; Leptephialtites caudatus. Each panel shows a backbone tree of Outgroups, Xyelidae, Tenthredinoidea, Pamphilioidea, Cephoidea, Siricoidea, Xiphydriidae, Orussidea and Apocrita. Legend scale: 0 to 0.7.]

Estimated divergence times

[Figure: dated tree of the outgroups and extant Hymenoptera comparing total evidence dating and node dating. Time axis: 450 to 0 million years before present, with the Paleogene, Neogene and Quaternary marked.]

Posteriors on node ages

[Figure: eight panels of density against age (600 to 0 Ma): Hymenoptera; Holometabola; Xyelidae; Tenthredinoidea; Pamphilioidea; Siricoidea; Vespina; Apocrita. Legend: node dating; total evidence dating. Tree age prior: least restrictive; medium restrictive; most restrictive.]

Table 2. Fossils used in the node-dating analyses and calibration point prior settings. The minimal and mean age for the offset-exponential prior are given, along with the corresponding fossils and references.

Calibration point              Prior on age (Ma)  Fossil(s)                             Reference                  PP correct¹
A.                             min: 315           Katerinka (oldest Neoptera)           Prokop & Nel 2007
                               mean: 396          Rhyniognatha (oldest )                Engel & Grimaldi 2004
B. Holometabola                min: 302           insect gall (oldest Holometabola)     Labandeira & Philips 1996
                               mean: 396          Rhyniognatha                          Engel & Grimaldi 2004
C. Hymenoptera                 min: 235           Triassoxyela, Asioxyela               Rasnitsyn & Quicke 2002    96%
                               mean: 302          insect gall                           Labandeira & Philips 1996
D. Xyelidae²                   min: 180           Eoxyela                               Rasnitsyn 1983             0%
E. Pamphilioidea²              min: 161           Aulidontes, Pamphiliidae undescribed  Rasnitsyn & Zhang 2004     48%
F. Siricoidea²                 min: 161           Aulisca, Anaxyela, Syntexyela,        Zhang & Rasnitsyn 2006     0%
                                                  Kulbastavia, Brachysyntexis
G. Vespina²                    min: 180           Brigittepterus                        Rasnitsyn et al. 2003      7%
H. Apocrita²                   min: 176           Cleistogaster                         Rasnitsyn 1975             34%
I. Tenthredinoidea s.str.²,³   min: 140           Palaeathalia                          Zhang 1985                 100%

¹ Posterior probability (PP) from the total-evidence analysis that the fossil attaches at the position assumed in the node-dating analysis. Note that these posterior probabilities take both the morphological data and the ages of the fossils into account.

² The mean age for all intra-hymenopteran calibration points was assumed to be the minimal age of Hymenoptera, i.e. 235 Ma (Triassoxyela, Asioxyela).

³ Tenthredinoidea excluding Blasticotomidae.

Error in divergence time estimation is not influenced to a large extent by molecular character data

[Figure: branch length posteriors for different models on four example branches. Each panel plots density against effective or time branch length. Curves: non-clock length; strict clock length; IGR effective branch length; IGR time branch length; TK02 effective branch length; TK02 time branch length; CPP effective branch length; CPP time branch length. For the relaxed clock models, effective length and time length are shown separately.]

Conclusions 1(2)

• Total-evidence dating is preferable because it:
  • explicitly incorporates fossil evidence
  • allows powerful analysis of the available data
  • results in divergence times that are
    • more precise
    • less sensitive to prior assumptions
    • probably more accurate
  • provides a better platform for future development, such as explicit modeling of fossilization, speciation, extinction, and sampling

Conclusions 2(2)

• There is a limit to how much molecular characters can help reduce the errors in divergence time estimates
• Most significant improvements will come from
  • more intense study of the fossil record
  • better understanding of morphological evolution
  • better models of rate variation across sites and lineages
  • better modeling of speciation, extinction, fossilization and sampling of fossil and extant taxa
• Challenges with total-evidence dating under birth-death prior with fossilization:
  • Dealing with trees where fossils are ancestors (sit on branches)
  • Sampling probabilities and biases, both for fossils and extant taxa
  • Uniform fossilization or "slice sampling"
  • Priors for speciation and extinction rates

Acknowledgements

• Alex Rasnitsyn
• Lars Vilhelmsen
• Susanne Schulmeister
• Debra Murray
• Seraina Klopfstein
• Chi Zhang
• Tracy Heath
• Tanja Stadler
• Funding: US National Science Foundation (HymAToL), Swedish Research Council, Swiss National Science Foundation
