Molecular Clocks
Rose Hoberman

The Holy Grail
• Fossil evidence is sparse and imprecise (or nonexistent)
• Predict divergence times by comparing molecular data

Rate Constancy?
• Given
  – a phylogenetic tree
  – branch lengths (rt = rate × time)
  – a time estimate for one (or more) node
• Can we date other nodes in the tree?
• Yes... if the rate of molecular change is constant across all branches
[Figure: example tree with tips labeled C, D, R, M, H and one node dated 110 MYA]
Page & Holmes p240

Protein Variability
• Protein structures & functions differ
  – Proportion of neutral sites differs
• Rate constancy does not hold across different protein types
• However...
  – Each protein does appear to have a characteristic rate of evolution

Evidence for Rate Constancy in Hemoglobin
[Figure; one labeled point: large carnivorous marsupial]
Page and Holmes p229

Outline
• Methods for estimating time under a molecular clock
  – Estimating genetic distance
  – Determining and using calibration points
  – Sources of error
• Rate heterogeneity
  – reasons for variation
  – how it's taken into account when estimating times
• Reliability of time estimates
• Estimating gene duplication times

The Molecular Clock Hypothesis
• Amount of genetic difference between sequences is a function of time since separation.
• Rate of molecular change is constant (enough) to predict times of divergence

Measuring Evolutionary Time with a Molecular Clock
1. Estimate genetic distance
   d = number of amino acid replacements
2. Use paleontological data to determine date of common ancestor
   T = time since divergence
3. Estimate calibration rate (number of genetic changes expected per unit time)
   r = d / 2T
4. Calculate time of divergence for novel sequences
   T_ij = d_ij / 2r

Estimating Genetic Differences
• If all nucleotides are equally likely, the observed difference would plateau at 0.75
• Simply counting differences underestimates distances
• Fails to account for multiple hits
(Page & Holmes p148)

Estimating Genetic Distance with a Substitution Model
• accounts for relative frequency of different types of substitutions
• allows variation in substitution rates between sites
• given learned parameter values
  – nucleotide frequencies
  – transition/transversion bias
  – alpha parameter of gamma distribution
• can infer branch length from differences

Distances from Gamma-Distributed Rates
• rate variation among sites
  – "fast/variable" sites
    • 3rd codon positions
    • codons on surface of globular protein
  – "slow/invariant" sites
    • Tryptophan (1 codon) structurally required
    • 1st or 2nd codon position when disulfide bond needed
• alpha parameter of gamma distribution describes degree of variation of rates across positions
• modeling rate variation changes the branch length/sequence differences curve

Gamma Corrected Distances
• high-rate sites saturate quickly
• sequence difference rises much more slowly as the low-rate sites gradually accumulate differences
• Felsenstein, Inferring Phylogenies p219

The 'Sloppy' Clock
• 'Ticks' are stochastic, not deterministic
  – Mutations happen randomly according to a Poisson distribution.
• Many divergence times can result in the same number of mutations
• Actually over-dispersed Poisson
  – Correlations due to structural constraints

Poisson Variance (Assuming a Perfect Molecular Clock)
If one mutation occurs every MY:
• Poisson variance
  – 95% of lineages 15 MY old have 8-22 substitutions
  – 8 substitutions could also be 5 MYA
Molecular Systematics p532

Need for Calibrations
• Changes = rate * time
• Can explain any observed branch length
  – Fast rate, short time
  – Slow rate, long time
• Suppose 16 changes along a branch
  – Could be 2 * 8 or 8 * 2
  – No way to distinguish
  – If told time = 8, then rate = 2
• Assume rate = 2 along all branches
  – Can infer all times

Estimating Calibration Rate
• Calculate separate rate for each data set (species/genes) using known date of divergence (from fossil, biogeography)
• One calibration point
  – Rate = d/2T
• More than one calibration point
  – use regression
  – use generative model that constrains time estimates (more later)

Calibration Complexities
• Cannot date fossils perfectly
• Fossils usually not direct ancestors
  – branched off tree before (after?) splitting event.
• Impossible to pinpoint the age of last common ancestor of a group of living species

Linear Regression
• Fix intercept at (0,0)
• Fit line between divergence estimates and calibration times
• Calculate regression and prediction confidence limits
Molecular Systematics p536

Molecular Dating: Sources of Error
• Both X and Y values only estimates
  – substitution model could be incorrect
  – tree could be incorrect
  – errors in orthology assignment
  – Poisson variance is large
• Pairwise divergences correlated (Systematics p534?)
  – inflates correlation between divergence & time
• Sometimes calibrations correlated
  – if using derived calibration points
• Error in inferring slope
• Confidence interval for predictions much larger than confidence interval for slope

Rate Heterogeneity
• Rate of molecular evolution can differ between
  – nucleotide positions
  – genes
  – genomic regions
  – genomes (nuclear vs organelle)
  – species
  – over time
• If not considered, introduces bias into time estimates

Rate Heterogeneity among Lineages
Cause               Reason
Repair equipment    e.g. RNA viruses have error-prone polymerases
Metabolic rate      More free radicals
Generation time     Copies DNA more frequently
Population size     Affects mutation fixation rate

Local Clocks?
• Closely related species often share similar properties, likely to have similar rates
• For example
  – murid rodents on average 2-6 times faster than apes and humans (Graur & Li p150)
  – mouse and rat rates are nearly equal (Graur & Li p146)

Rate Changes within a Lineage
Cause                                      Reason
Population size changes                    Genetic drift more likely to fix neutral alleles in small population
Strength of selection changes over time    1. new role/environment
                                           2. gene duplication
                                           3. change in another gene

Working Around Rate Heterogeneity
1. Identify lineages that deviate and remove them
2. Quantify degree of rate variation to put limits on possible divergence dates
   – requires several calibration dates, not always available
   – gives very conservative estimates of molecular dates
3. Explicitly model rate variation

Search for Genes with Uniform Rate across Taxa
Many 'clock' tests:
  – Relative rates tests
    • compares rates of sister nodes using an outgroup
  – Tajima test
    • Number of sites in which character shared by outgroup and only one of two ingroups should be equal for both ingroups
  – Branch length test
    • deviation of distance from root to leaf compared to average distance
  – Likelihood ratio test
    • identifies deviance from clock but not the deviant sequences

Likelihood Ratio Test
• estimate a phylogeny under molecular clock and without it
  – e.g. root-to-tip distances must be equal
• 2 * (difference in log-likelihood) ~ Chi^2 with n-2 degrees of freedom (n = number of sequences)
  – asymptotically
  – when models are nested
  – when nested parameters aren't set to boundary

Relative Rates Tests
• Tests whether the distances from each of two taxa to an outgroup are equal (or average rate of two clades vs an outgroup)
  – need to compute expected variance
  – many triples to consider, and not independent
• Lacks power, esp
  – short sequences
  – low rates of change
• Given length and number of variable sites in typical sequences used for dating, (Bromham et al 2000) says:
  – unlikely to detect moderate variation between lineages (1.5-4x)
  – likely to result in substantial error in date estimates

Modeling Rate Variation: Relaxing the Molecular Clock
• Likelihood analysis
  – Assign each branch a rate parameter
    • explosion of parameters, not realistic
  – User can partition branches based on domain knowledge
  – Rates of partitions are independent
• Nonparametric methods
  – smooth rates along tree
• Bayesian approach
  – stochastic model of evolutionary change
  – prior distribution of rates
  – Bayes theorem
  – MCMC

Relaxing the Molecular Clock
[Figure: tree with nodes labeled R, N, M and D, E, F, A, B, C]
• Learn rates and times, not just branch lengths
  – Assume root-to-tip times equal
  – Allow different rates on different branches
  – Rates of descendants correlate with that of common ancestor
• Restricts choice of rates, but still too much flexibility to choose rates well

Parsimonious Approaches
• Sanderson 1997, 2002
  – infer branch lengths via parsimony
  – fit divergence times to minimize difference between rates in successive branches
  – (unique solution?)
• Cutler 2000
  – infer branch lengths via parsimony
  – rates drawn from a normal distribution (negative rates set to zero)

Bayesian Approaches
Learn rates, times, and substitution parameters simultaneously
Devise model of relationship between rates
  – Thorne/Kishino et al
    • Assigns new rates to descendant lineages from a lognormal distribution with mean equal to ancestral rate and variance increasing with branch length
  – Huelsenbeck et al
    • Poisson process generates random rate changes along tree
    • new rate is current rate * gamma-distributed random variable

Sources of Error/Variance
• Lack of rate constancy (due to lineage, population size or selection effects)
• Wrong assumptions in evolutionary model
• Errors in orthology assignment
• Incorrect tree
• Stochastic variability
• Imprecision of calibration points
• Imprecision of regression
• Human sloppiness in analysis

Comparison of Likelihood & Bayesian Approaches for Estimating Divergence Times (Yang & Yoder 2003)
• Analyzed two mitochondrial genes
  – each codon position treated separately
  – tested different model assumptions
  – used 7 calibration points
• Neither model reliable when
  – using only one codon position
  – using a single model for all positions
• Results similar for both methods
  – using the most complex model
  – use separate
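
As a minimal sketch of the four-step procedure on the "Measuring Evolutionary Time with a Molecular Clock" slide, the Python below corrects an observed proportion of differing sites for multiple hits, derives a calibration rate r = d / 2T from one dated pair, and dates a second pair with T_ij = d_ij / 2r. The Jukes-Cantor correction is an assumption: the slides do not name a model, but it is one standard choice consistent with the 0.75 plateau for equally likely nucleotides. The function names and the example proportions (0.30 and 0.15) are hypothetical; only the 110 MYA calibration date appears in the slides.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Observed proportion of aligned sites that differ (no correction)."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


def jukes_cantor_distance(p: float) -> float:
    """Expected substitutions per site under Jukes-Cantor (assumed model).

    Corrects the raw proportion p for multiple hits at the same site.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75); at 0.75 the sites are saturated")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def calibration_rate(d: float, t: float) -> float:
    """Step 3: r = d / 2T (two lineages each evolve for time T)."""
    return d / (2.0 * t)


def divergence_time(d_ij: float, r: float) -> float:
    """Step 4: T_ij = d_ij / 2r."""
    return d_ij / (2.0 * r)


# Hypothetical inputs: a calibration pair differing at 30% of sites whose
# common ancestor is dated to 110 MYA, and a novel pair differing at 15%.
d_calibration = jukes_cantor_distance(0.30)
rate = calibration_rate(d_calibration, 110.0)
d_novel = jukes_cantor_distance(0.15)
t_novel = divergence_time(d_novel, rate)

print(f"rate = {rate:.5f} substitutions/site/MY")
print(f"novel pair diverged about {t_novel:.1f} MYA")
```

Because the correction is nonlinear, halving the observed difference does not halve the estimated date: under these assumptions the 15% pair comes out at less than half of 110 MYA.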
