Molecular Clocks
Rose Hoberman
The Holy Grail

• Fossil evidence is sparse and imprecise (or nonexistent)
• Predict divergence times by comparing molecular data

Rate Constancy?

• Given
  – a phylogenetic tree
  – branch lengths (rt)
  – a time estimate for one (or more) node
• Can we date other nodes in the tree?
• Yes... if the rate of molecular change is constant across all branches

[Figure: tree with tips C, D, R, M, H and a node labeled 110 MYA]
(Page & Holmes p240)

Protein Variability

• Protein structures & functions differ
  – Proportion of neutral sites differs
• Rate constancy does not hold across different protein types
• However...
  – Each protein does appear to have a characteristic rate of evolution

Evidence for Rate Constancy in Hemoglobin

[Figure; includes the label "Large carnivorous marsupial"]
(Page and Holmes p229)

The Molecular Clock Hypothesis

• Amount of genetic difference between sequences is a function of time since separation.
• Rate of molecular change is constant (enough) to predict times of divergence

Outline

• Methods for estimating time under a molecular clock
  – Estimating genetic distance
  – Determining and using calibration points
  – Sources of error
• Rate heterogeneity
  – reasons for variation
  – how it's taken into account when estimating times
• Reliability of time estimates
• Estimating gene duplication times

Measuring Evolutionary Time with a Molecular Clock

1. Estimate genetic distance
   d = number of amino acid replacements
2. Use paleontological data to determine date of common ancestor
   T = time since divergence
3. Estimate calibration rate (number of genetic changes expected per unit time)
   r = d / (2T)
4. Calculate time of divergence for novel sequences
   T_ij = d_ij / (2r)

Estimating Genetic Differences

• If all nucleotides are equally likely, the observed difference would plateau at 0.75
• Simply counting differences underestimates distances
• Fails to account for multiple hits

(Page & Holmes p148)

Estimating Genetic Distance with a Substitution Model

• accounts for relative frequency of different types of substitutions
• allows variation in substitution rates between sites
• given learned parameter values
  – nucleotide frequencies
  – transition/transversion bias
  – alpha parameter of gamma distribution
• can infer branch length from differences

Distances from Gamma-Distributed Rates

• rate variation among sites
  – "fast/variable" sites
    • 3rd codon positions
    • codons on surface of globular protein
  – "slow/invariant" sites
    • Tryptophan (1 codon) structurally required
    • 1st or 2nd codon position when disulfide bond needed
• alpha parameter of gamma distribution describes degree of variation of rates across positions
• modeling rate variation changes the branch length / sequence difference curve

Gamma Corrected Distances

• high-rate sites saturate quickly
• sequence difference rises much more slowly as the low-rate sites gradually accumulate differences
• Felsenstein, Inferring Phylogenies, p219

The 'Sloppy' Clock

• 'Ticks' are stochastic, not deterministic
  – Mutations happen randomly according to a Poisson distribution.
• Many divergence times can result in the same number of mutations
• Actually over-dispersed Poisson
  – Correlations due to structural constraints

Poisson Variance (Assuming a Perfect Molecular Clock)

If there is one mutation every MY:
• Poisson variance
  – 95% of lineages 15 MY old have 8-22 substitutions
  – 8 substitutions could also be 5 MY old

(Molecular Systematics p532)

Need for Calibrations

• Changes = rate * time
• Can explain any observed branch length
  – Fast rate, short time
  – Slow rate, long time
• Suppose 16 changes along a branch
  – Could be 2 * 8 or 8 * 2
  – No way to distinguish
  – If told time = 8, then rate = 2
• Assume rate = 2 along all branches
  – Can infer all times

Estimating Calibration Rate

• Calculate a separate rate for each data set (species/genes) using a known date of divergence (from fossil, biogeography)
• One calibration point
  – Rate = d / (2T)
• More than one calibration point
  – use regression
  – use generative model that constrains time estimates (more later)

Calibration Complexities

• Cannot date fossils perfectly
• Fossils usually not direct ancestors
  – branched off tree before (after?) splitting event.
• Impossible to pinpoint the age of the last common ancestor of a group of living species

Linear Regression

• Fix intercept at (0,0)
• Fit line between divergence estimates and calibration times
• Calculate regression and prediction confidence limits

(Molecular Systematics p536)

Molecular Dating Sources of Error

• Both X and Y values are only estimates
  – substitution model could be incorrect
  – tree could be incorrect
  – errors in orthology assignment
  – Poisson variance is large
• Pairwise divergences correlated (Systematics p534?)
  – inflates correlation between divergence & time
• Sometimes calibrations correlated
  – if using derived calibration points
• Error in inferring slope
• Confidence interval for predictions much larger than confidence interval for slope

Rate Heterogeneity

• Rate of molecular evolution can differ between
  – nucleotide positions
  – genes
  – genomic regions
  – genomes (nuclear vs organelle)
  – species
  – over time
• If not considered, introduces bias into time estimates

Rate Heterogeneity among Lineages

  Cause              Reason
  Repair equipment   e.g. RNA viruses have error-prone polymerases
  Metabolic rate     More free radicals
  Generation time    Copies DNA more frequently
  Population size    Affects mutation fixation rate

Local Clocks?

• Closely related species often share similar properties, likely to have similar rates
• For example
  – murid rodents on average 2-6 times faster than apes and humans (Graur & Li p150)
  – mouse and rat rates are nearly equal (Graur & Li p146)

Rate Changes within a Lineage

  Cause                                     Reason
  Population size changes                   Genetic drift more likely to fix neutral alleles in small population
  Strength of selection changes over time   1. new role/environment
                                            2. gene duplication
                                            3. change in another gene

Working Around Rate Heterogeneity

1. Identify lineages that deviate and remove them
2. Quantify degree of rate variation to put limits on possible divergence dates
   – requires several calibration dates, not always available
   – gives very conservative estimates of molecular dates
3. Explicitly model rate variation

Search for Genes with Uniform Rate across Taxa

Many 'clock' tests:
  – Relative rates tests
    • compare rates of sister nodes using an outgroup
  – Tajima test
    • the number of sites at which a character is shared by the outgroup and only one of the two ingroups should be equal for both ingroups
  – Branch length test
    • deviation of distance from root to leaf compared to average distance
  – Likelihood ratio test
    • identifies deviance from clock but not the deviant sequences

Likelihood Ratio Test

• estimate a phylogeny under molecular clock and without it
  – e.g. root-to-tip distances must be equal
• 2 * (difference in log-likelihood) ~ Chi^2 with n-2 degrees of freedom (n = number of sequences)
  – asymptotically
  – when models are nested
  – when nested parameters aren't set to boundary

Relative Rates Tests

• Test whether the distances from two taxa to an outgroup are equal (or the average rates of two clades vs an outgroup)
  – need to compute expected variance
  – many triples to consider, and not independent
• Lacks power, esp
  – short sequences
  – low rates of change
• Given the length and number of variable sites in typical sequences used for dating, (Bromham et al 2000) says:
  – unlikely to detect moderate variation between lineages (1.5-4x)
  – likely to result in substantial error in date estimates

Relaxing the Molecular Clock

• Learn rates and times, not just branch lengths
  – Assume root-to-tip times equal
  – Allow different rates on different branches
  – Rates of descendants correlate with that of common ancestor
• Restricts choice of rates, but still too much flexibility to choose rates well

[Figure: tree with node labels R, N, M and tips A, B, C, D, E, F]

Modeling Rate Variation: Relaxing the Molecular Clock

• Likelihood analysis
  – Assign each branch a rate parameter
    • explosion of parameters, not realistic
  – User can partition branches based on domain knowledge
  – Rates of partitions are independent
• Nonparametric methods
  – smooth rates along tree
• Bayesian approach
  – stochastic model of evolutionary change
  – prior distribution of rates
  – Bayes theorem
  – MCMC

Bayesian Approaches

Learn rates, times, and substitution parameters simultaneously

Devise model of relationship between rates
  – Thorne/Kishino et al
    • Assigns new rates to descendant lineages from a lognormal distribution with mean equal to ancestral rate and variance increasing with branch length
  – Huelsenbeck et al
    • Poisson process generates random rate changes along tree
    • new rate is current rate * gamma-distributed random variable

Parsimonious Approaches

• Sanderson 1997, 2002
  – infer branch lengths via parsimony
  – fit divergence times to minimize difference between rates in successive branches
  – (unique solution?)
• Cutler 2000
  – infer branch lengths via parsimony
  – rates drawn from a normal distribution (negative rates set to zero)

Sources of Error/Variance

• Lack of rate constancy (due to lineage, population size or selection effects)
• Wrong assumptions in evolutionary model
• Errors in orthology assignment
• Incorrect tree
• Stochastic variability
• Imprecision of calibration points
• Imprecision of regression
• Human sloppiness in analysis

Comparison of Likelihood & Bayesian Approaches for Estimating Divergence Times (Yang & Yoder 2003)

• Analyzed two mitochondrial genes
  – each codon position treated separately
  – tested different model assumptions
  – used 7 calibration points
• Neither model reliable when
  – using only one codon position
  – using a single model for all positions
• Results similar for both methods
  – using the most complex model
  – use separate
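
Worked sketch: the four-step procedure from the "Measuring Evolutionary Time with a Molecular Clock" slide (estimate d, take T from a calibration, compute r = d / 2T, then T_ij = d_ij / 2r) can be tried out in a few lines of Python. This is only an illustration under stated assumptions. The slides do not name a specific correction for multiple hits, so the Jukes-Cantor (JC69) formula is swapped in here as the simplest model consistent with the "plateau at 0.75" remark. The function names and all numbers are hypothetical, not taken from the lecture.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned sites that differ (no multiple-hit correction)."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned, equal-length and non-empty")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs / len(seq_a)


def jc69_distance(p: float) -> float:
    """Jukes-Cantor corrected distance (assumed model, not from the slides).

    Converts an observed proportion of differing sites p into expected
    substitutions per site. Undefined at p >= 0.75, where sequences are
    saturated.
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def calibration_rate(d: float, t: float) -> float:
    """Step 3: r = d / (2T); the factor 2 counts both diverging lineages."""
    return d / (2.0 * t)


def divergence_time(d_ij: float, r: float) -> float:
    """Step 4: T_ij = d_ij / (2r)."""
    return d_ij / (2.0 * r)


# Hypothetical calibration pair: 30% of sites differ, split dated at 100 MYA.
calib_d = jc69_distance(0.30)
rate = calibration_rate(calib_d, 100.0)

# Hypothetical undated pair: 15% of sites differ.
new_d = jc69_distance(0.15)
t_new = divergence_time(new_d, rate)
print(f"rate = {rate:.5f} substitutions/site/MY, estimated split = {t_new:.1f} MYA")
```

With these made-up inputs the corrected estimate is younger than the 50 MYA that a simple proportional reading of the raw differences would give, because the correction adds more hidden substitutions to the more divergent calibration pair. A single calibration point also gives no error bars; the Poisson variance and calibration uncertainty discussed in the slides still apply.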